* Daniel P. Berrange (berrange(a)redhat.com) wrote:
> On Wed, May 13, 2015 at 10:00:42AM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Krempa (pkrempa(a)redhat.com) wrote:
> > > On Wed, May 13, 2015 at 09:40:23 +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Krempa (pkrempa(a)redhat.com) wrote:
> > > > > On Wed, May 13, 2015 at 09:08:39 +0100, Dr. David Alan Gilbert wrote:
> > > > > > * Peter Krempa (pkrempa(a)redhat.com) wrote:
> > > > > > > On Wed, May 13, 2015 at 11:36:26 +0800, Chen Fan wrote:
> > > > > > > > my main goal is to add support for migration with host NIC
> > > > > > > > passthrough devices while keeping network connectivity.
> > > > > > > >
> > > > > > > > this patch series is based on Shradha's patches at
> > > > > > > > https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html
> > > > > > > > which add migration support for host passthrough devices.
> > > > > > > >
> > > > > > > > 1) unplug the ephemeral devices before migration
> > > > > > > >
> > > > > > > > 2) do native migration
> > > > > > > >
> > > > > > > > 3) when migration has finished, hotplug the ephemeral devices
> > > > > > >
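As a rough sketch of what those three steps look like when driven from
outside libvirt (python bindings; the guest name, hostdev XML and
destination URI below are all made up):

import libvirt

HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </source>
</hostdev>
"""

src = libvirt.open('qemu:///system')
dst = libvirt.open('qemu+ssh://dest.example.org/system')
dom = src.lookupByName('guest')

# 1) unplug the ephemeral (passthrough) device before migration
dom.detachDeviceFlags(HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)

# 2) do a normal live migration
new_dom = dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

# 3) when migration has finished, hotplug an equivalent device on the
#    destination (its PCI address there will usually differ)
new_dom.attachDeviceFlags(HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)

(Note the detach call only issues the unplug request to the guest, which
is exactly where the problems described below come in.)
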
> > > > > > > IMHO this algorithm is something that an upper layer
> > > > > > > management app should do. The device unplug operation is
> > > > > > > complex and it might not succeed which will make the current
> > > > > > > migration thread hang or fail in an intermediate state that
> > > > > > > will not be recoverable.
> > > > > >
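It's worth noting the API side of that: virDomainDetachDeviceFlags() only
sends the unplug request; the guest actually releasing the device shows up
later as a DEVICE_REMOVED event. An app driving this can at least bound
the hang with something like the following sketch (python bindings; the
device alias and the 30s timeout are assumptions):

import threading
import time
import libvirt

# same kind of hostdev XML as in the earlier sketch
HOSTDEV_XML = ("<hostdev mode='subsystem' type='pci' managed='yes'>"
               "<source><address domain='0x0000' bus='0x03' slot='0x00'"
               " function='0x0'/></source></hostdev>")

libvirt.virEventRegisterDefaultImpl()
conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('guest')

def event_loop():
    # dispatch libvirt events in the background
    while True:
        libvirt.virEventRunDefaultImpl()

threading.Thread(target=event_loop, daemon=True).start()

removed = set()

def on_removed(conn, dom, dev_alias, opaque):
    # fires once the guest has actually completed the PCI unplug
    removed.add(dev_alias)

conn.domainEventRegisterAny(dom, libvirt.VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED,
                            on_removed, None)

dom.detachDeviceFlags(HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)

deadline = time.time() + 30               # arbitrary timeout
while 'hostdev0' not in removed:          # 'hostdev0': assumed device alias
    if time.time() > deadline:
        raise RuntimeError("guest never released the device; "
                           "don't start the migration")
    time.sleep(1)
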
> > > > > > However you wouldn't want each of the upper layer management
> > > > > > apps implementing their own hacks for this; so something
> > > > > > somewhere needs to standardise what the guest sees.
> > > > >
> > > > > The guest still will see a PCI device unplug request and will
> > > > > have to respond to it, then will be paused and after resume a new
> > > > > PCI device will appear. This is standardised. The
> > > > > non-standardised part (which can't really be standardised) is how
> > > > > the bonding or other guest-dependent stuff will be handled, but
> > > > > that is up to the guest OS to handle.
> > > >
> > > > Why can't that be standardised? Don't we need to provide the guest
> > > > with the information on what to bond, and that this process is
> > > > happening? The previous suggestion was to use the guest agent for
> > > > this.
> > >
> > > Well, since Linux alone has multiple ways to do that, including
> > > legacy init scripts on various distros, the systemd-networkd thingie
> > > (or whatever it's called) and NetworkManager, standardising this part
> > > won't be that easy. Not to mention possible different OSes.
> >
> > Right - so we need to standardise on the messaging we send to the
> > guest to tell it that we've got this bonded hotplug setup, and then
> > the different OSes can implement what they need using that
> > information.
> >
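To make "messaging we send to the guest" concrete - and to be clear, no
such guest agent command exists today, the command name and arguments
below are invented purely as an illustration - the host side of a
guest-agent based approach could look roughly like:

import json
import libvirt
import libvirt_qemu

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('guest')

# 'guest-announce-ephemeral-nic' is a hypothetical command, made up here
# only to show the kind of information the guest would need: which NIC is
# about to disappear and which device it should keep the bond on.
msg = {
    'execute': 'guest-announce-ephemeral-nic',
    'arguments': {
        'mac': '52:54:00:12:34:56',          # the VF that will be unplugged
        'standby-mac': '52:54:00:12:34:57',  # the virtio NIC to fail over to
    },
}
reply = libvirt_qemu.qemuAgentCommand(dom, json.dumps(msg), 5, 0)
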
> > > > > From libvirt's perspective this is only something that will
> > > > > trigger the device unplug and plug the devices back. And there
> > > > > are a lot of issues here:
> > > > >
> > > > > 1) the destination of the migration might not have the desired
> > > > >    devices
> > > > >
> > > > >    This will trigger a lot of problems as we will not be able to
> > > > >    guarantee that the devices reappear on the destination, and if
> > > > >    we wanted to check we'd need a new migration protocol AFAIK.
> > > >
> > > > But if it's using the bonding trick then that isn't fatal; it would
> > > > still be able to have the bonded virtio device.
> > > >
> > > > > 2) The guest OS might refuse to detach the PCI device (it might
> > > > >    be stuck before PCI code is loaded)
> > > > >
> > > > >    In that case the migration will be stuck forever and abort
> > > > >    attempts will make the domain state basically undefined
> > > > >    depending on the phase where it failed.
> > > > >
> > > > > Since we can't guarantee that the unplug of the PCI host devices
> > > > > will be atomic or that it will succeed, we basically can't
> > > > > guarantee in any way in which state the VM will end up after (a
> > > > > possibly failed) migration. To recover from such a state there
> > > > > are too many options that could be desired by the user, which
> > > > > would be hard to implement in a way that is flexible enough.
> > > >
> > > > I don't understand why this is any different to any other PCI
> > > > device hot-unplug.
> > >
> > > It's the same, but once libvirt would be doing multiple PCI unplug
> > > requests along with the migration code, things might not go well. If
> > > you then couple this with different user expectations of what should
> > > happen in various error cases it gets even more messy.
> >
> > Well, since we've got the bond it shouldn't get quite that bad; the
> > error cases don't sound that bad:
> >   1) If we can't hot-unplug then we don't migrate/cancel migration.
> >      We warn the user; if we're unlucky we're left running on the bond.
> >   2) If we can't hot-plug at the end, then we've still got the bond in,
> >      so the guest carries on running (albeit with reduced performance).
> >      We need to flag this to the user somehow.
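As a sketch of how an upper layer might map those two cases onto the
existing libvirt calls (function and variable names are made up; 'dom',
'dst_conn' and 'hostdev_xml' are whatever the app already tracks):

import logging
import libvirt

def migrate_with_ephemeral_nic(dom, dst_conn, hostdev_xml):
    # 1) if we can't hot-unplug, don't migrate; the guest keeps running
    #    on the bond and we warn the user
    #    (plus the DEVICE_REMOVED wait from the earlier sketch for the
    #    case where the guest simply never answers)
    try:
        dom.detachDeviceFlags(hostdev_xml, libvirt.VIR_DOMAIN_AFFECT_LIVE)
    except libvirt.libvirtError as err:
        logging.warning("unplug failed, not migrating: %s", err)
        return dom

    new_dom = dom.migrate(dst_conn, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

    # 2) if we can't hot-plug at the end, the guest carries on over the
    #    bonded virtio NIC; just flag the degraded state to the user
    try:
        new_dom.attachDeviceFlags(hostdev_xml, libvirt.VIR_DOMAIN_AFFECT_LIVE)
    except libvirt.libvirtError as err:
        logging.warning("passthrough NIC not restored after migration: %s", err)
    return new_dom
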
> If there are multiple PCI devices attached to the guest, we may end up
> with some PCI devices removed and some still present, and some for which
> we don't know if they are removed or present at all, as the guest may
> simply not have responded to us yet. Further, there are devices which
> are not just bonded NICs, so I'm really not happy for us to design a
> policy that works for bonded NICs but which is quite possibly going to
> be useless for other types of PCI device people will inevitably want to
> deal with later.

This is only trying to address the problem for devices that can have the
equivalent of a bond; so it's not NIC-specific; the same should work for
storage devices with multipath.

Dave
--
Dr. David Alan Gilbert / dgilbert(a)redhat.com / Manchester, UK