Re: [libvirt] [Qemu-devel] [RFC 0/2] Attempt to implement the standby feature for assigned network devices

5 Dec 2018

On Wed, Dec 05, 2018 at 10:18:29 -0600, Michael Roth wrote:
...
Quoting Sameeh Jubran (2018-10-25 13:01:10)
...
On Thu, Oct 25, 2018 at 5:06 PM Sameeh Jubran <sameeh@daynix.com> wrote:
...
From: Sameeh Jubran <sjubran@redhat.com>
Migration support:
Pre migration or during setup phase of the migration we should send an
unplug request to the guest to unplug the primary device. I haven't had
the chance to implement that part yet but should do soon. Do you know
what's the best approach to do so? I wanted to have a callback to the
virtio-net device which tries to send an unplug request to the guest and
if succeeds then the migration continues. It needs to handle the case where
the migration fails and then it has to replug the primary device back.
I think that the "add_migration_state_change_notifier" API call can be used
from within the virtio-net device to achieve this, what do you think?
I think it would be good to hear from the libvirt folks (on Cc:) on this as
having QEMU unplug a device without libvirt's involvement seems like it
could cause issues. Personally I think it seems cleaner to just have QEMU
handle the 'hidden' aspects of the device and leave it to QMP/libvirt to do
the unplug beforehand. On the libvirt side I could imagine adding an option
like virsh migrate --switch-to-standby-networking or something along
that line to do it automatically (if we decide doing it automatically is
even needed on that end).
I remember talking about this approach some time ago.

In general the migration itself is a very complex process which has too
many places where it can fail. The same applies to device hotunplug.
This series proposes to merge those two together into an even more
complex behemoth.

A few scenarios that don't have a clear solution come to mind:
- The unplug request time is actually unbounded. The guest OS may
  arbitrarily reject it or execute it at any later time, so migration may
  get stuck in a halfway state without any clear rollback or failure
  scenario.

- After migration, device hotplug may fail for whatever reason, leaving
  networking crippled and again no clear single-case rollback scenario.

Then there's stuff which requires libvirt/management cooperation:
- picking of the network device on destination
- making sure that the device is present etc.

From management's point of view, bundling all this together is really not
a good idea since it creates a very big matrix of failure scenarios. In
general even libvirt will prefer that upper layer management drives this
externally, since any rollback scenario will result in a policy decision
of what to do in certain cases, and what timeouts to pick.
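
As a rough illustration only, an externally driven sequence could look
something like the sketch below. Every name in it (Policy,
migrate_with_standby, and the five callbacks) is hypothetical and stands in
for whatever the management layer would really call; none of them are
libvirt or QEMU APIs. The point is just that the timeout and each rollback
branch are explicit policy choices made by the caller.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Policy:
    """Hypothetical knobs the management layer would have to pick."""
    unplug_timeout_s: float = 30.0  # how long to wait for the guest to release the device
    poll_interval_s: float = 0.5    # how often to re-check the unplug status


def wait_until(done: Callable[[], bool], timeout_s: float, interval_s: float) -> bool:
    """Poll done() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def migrate_with_standby(
    request_unplug: Callable[[], None],    # ask the guest to release the primary device
    is_unplugged: Callable[[], bool],      # has the guest actually released it?
    migrate: Callable[[], bool],           # run the migration, True on success
    replug_source: Callable[[], bool],     # give the primary device back on the source
    plug_destination: Callable[[], bool],  # attach a primary device on the destination
    policy: Optional[Policy] = None,
) -> str:
    """Drive unplug, migration and replug as three separate, visible steps."""
    policy = policy or Policy()

    request_unplug()
    if not wait_until(is_unplugged, policy.unplug_timeout_s, policy.poll_interval_s):
        # The guest may still honour the request later; what to do about
        # that is a policy decision left to the caller.
        return "unplug-timeout"

    if not migrate():
        # Migration failed after a successful unplug: restore the source.
        replug_source()
        return "migration-failed"

    if not plug_destination():
        # Guest runs on the destination, but only on the standby device.
        return "degraded"

    return "ok"
```

Each return value corresponds to one of the failure cases listed above, so
the caller can decide per case whether to retry, abort, or carry on with
standby networking only.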

Peter Krempa