On Fri, 20 Nov 2020 03:51:07 +0100
Halil Pasic <pasic(a)linux.ibm.com> wrote:
On Tue, 17 Nov 2020 04:26:05 +0100
Eric Farman <farman(a)linux.ibm.com> wrote:
> Now that the vfio-ccw code has a notifier interface to request that
> a device be unplugged, let's wire that together.
I'm aware of the fact that performing an unplug is what vfio-pci does,
but I was not aware of this before, and I became curious with regard
to how this is going to mesh with migration (in the future).
The sentence 'For this to work, QEMU has to be launched with the same
arguments the two times.' from docs/devel/migration.rst should not
be taken literally, I know, but the VM configuration changing not because
it was changed via a management interface, but because of events that
may not be under the control of whoever is managing the VM, does
make things harder to reason about.
I suppose we now basically mandate that whoever manages the VM either
a) maintains their own model of the VM and listens to the events, to
update it if such a device removal happens, or
b) inspects the VM at some appropriate point in time, to figure out how
the target VM is supposed to be started.
I think libvirt does a).
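
To make option a) concrete, here is a minimal sketch of a management-side
model that drops a device when it sees a QMP DEVICE_DELETED event and then
derives the -device arguments for the migration target. The VmModel class,
its methods, the device ids and the sysfsdev path are all made up for
illustration. This is not how libvirt implements it, and the QMP transport
itself is left out.

```python
import json


class VmModel:
    """Hypothetical management-side model of a VM's devices."""

    def __init__(self, devices):
        # Map of device id -> the -device argument used at launch.
        self.devices = dict(devices)

    def handle_event(self, event):
        """Apply one decoded QMP event to the model."""
        if event.get("event") == "DEVICE_DELETED":
            # The 'device' member is optional; skip events without an id.
            dev_id = event.get("data", {}).get("device")
            if dev_id is not None:
                self.devices.pop(dev_id, None)

    def target_args(self):
        """Build the -device arguments for starting the target QEMU."""
        args = []
        for dev_id in sorted(self.devices):
            args += ["-device", self.devices[dev_id]]
        return args


# Example ids and the sysfsdev path are placeholders, not real values.
model = VmModel({
    "net0": "virtio-net-ccw,id=net0",
    "ccw0": "vfio-ccw,id=ccw0,sysfsdev=/path/to/mdev",
})

# An event as it could arrive on the QMP monitor after the unplug.
raw = (
    '{"event": "DEVICE_DELETED", '
    '"data": {"device": "ccw0", "path": "/machine/peripheral/ccw0"}, '
    '"timestamp": {"seconds": 0, "microseconds": 0}}'
)
model.handle_event(json.loads(raw))
print(model.target_args())
```

The catch is the window between the event being emitted and the model
being updated: a target started inside that window would still be given
the vfio-ccw device.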
This seems like something that would be of general interest to libvirt
folks, adding the list on cc:.
For virtual devices, QEMU and any management software are in full
control, and can simply make sure that both source and target have the
device available.
For physical devices, we still can make sure that source and target
match when we do the setup, but device failures can happen at
inconvenient times. It may suddenly be no longer possible to access
state etc. Can we propagate changes like "device foobar, don't bother
migrating" even when we already started migration? Should the handling
be different if the target system uses a different (compatible) device?
Should we fail the migration?
I also suppose such a device removal may not happen during device
migration. From the QEMU perspective, I believe that is taken care of
by the BQL.
Even if the device is not actually removed, it might still be
inaccessible.
But I'm in the dark regarding the management software/libvirt view. For
example, what happens if we get a req_notification on the target during
pre-copy memory migration? At that point the destination is already
receiving pages, thus it is already constructed.
My questions are not necessarily addressed to you, Eric. Maybe the
folks working on vfio migration can help us out. I'm also adding
our libvirt guys.
Regards,
Halil