* Peter Krempa (pkrempa(a)redhat.com) wrote:
> On Wed, May 13, 2015 at 09:08:39 +0100, Dr. David Alan Gilbert wrote:
> > * Peter Krempa (pkrempa(a)redhat.com) wrote:
> > > On Wed, May 13, 2015 at 11:36:26 +0800, Chen Fan wrote:
> > > > My main goal is to add support for migration with host NIC
> > > > passthrough devices while keeping network connectivity.
> > > >
> > > > This patch series is based on Shradha's patches at
> > > > https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html
> > > > which add migration support for host passthrough devices.
> > > >
> > > > 1) unplug the ephemeral devices before migration
> > > >
> > > > 2) do native migration
> > > >
> > > > 3) when migration finishes, hotplug the ephemeral devices
> > >
> > > IMHO this algorithm is something that an upper-layer management app
> > > should do. The device unplug operation is complex and might not
> > > succeed, which would make the current migration thread hang, or fail
> > > in an intermediate state that is not recoverable.
> >
> > However, you wouldn't want each of the upper-layer management apps
> > implementing its own hacks for this, so something somewhere needs to
> > standardise what the guest sees.
>
> The guest will still see a PCI device unplug request and will have to
> respond to it; then it will be paused, and after resume a new PCI device
> will appear. This is standardised. The non-standardised part (which can't
> really be standardised) is how bonding or other guest-dependent
> stuff will be handled, but that is up to the guest OS to handle.
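
To make the three-step sequence (unplug, migrate, re-plug) and its failure points concrete, here is a minimal Python sketch. It does not use the libvirt API: `Guest`, `Host`, `Outcome`, `request_unplug` and `migrate_with_passthrough` are hypothetical stand-ins, and the unplug timeout is an assumption added to model a guest that answers slowly or never.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Outcome(Enum):
    """Where the VM ends up after the unplug/migrate/replug sequence."""
    MIGRATED_WITH_DEVICE = "migrated; passthrough device re-plugged"
    MIGRATED_WITHOUT_DEVICE = "migrated; no matching device on destination"
    ABORTED_DEVICE_ATTACHED = "not migrated; guest did not release the device"


@dataclass
class Host:
    # Hypothetical: passthrough devices that are free on this host.
    free_devices: Set[str] = field(default_factory=set)


@dataclass
class Guest:
    # Hypothetical: seconds the guest OS takes to acknowledge a PCI unplug
    # request, or None if it never answers (e.g. PCI code not loaded yet).
    unplug_latency_s: Optional[float] = 0.5
    devices: Set[str] = field(default_factory=set)

    def request_unplug(self, dev: str, timeout_s: float) -> bool:
        """Return True only if the guest released `dev` within the timeout."""
        if self.unplug_latency_s is None or self.unplug_latency_s > timeout_s:
            return False
        self.devices.discard(dev)
        return True


def migrate_with_passthrough(guest: Guest, dev: str, dst: Host,
                             unplug_timeout_s: float = 10.0) -> Outcome:
    # Step 1: unplug the ephemeral device, bounding the wait so that a guest
    # which never answers cannot hang the migration forever.
    if not guest.request_unplug(dev, unplug_timeout_s):
        return Outcome.ABORTED_DEVICE_ATTACHED
    # Step 2: native migration (not modelled; no passthrough state is left).
    # Step 3: hotplug a matching device on the destination, if one exists.
    if dev in dst.free_devices:
        dst.free_devices.discard(dev)
        guest.devices.add(dev)
        return Outcome.MIGRATED_WITH_DEVICE
    return Outcome.MIGRATED_WITHOUT_DEVICE
```

For example, `migrate_with_passthrough(Guest(unplug_latency_s=None, devices={"vf0"}), "vf0", Host({"vf0"}))` returns `Outcome.ABORTED_DEVICE_ATTACHED`. The sketch only classifies clean outcomes; it does not model a guest that acknowledges the unplug after the timeout has already expired, which is the kind of undefined intermediate state raised in this thread.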

Why can't that be standardised? Don't we need to tell the guest what to
bond, and that this process is happening? The previous suggestion was to
use the guest agent for this.

> From libvirt's perspective this is only something that will trigger the
> device unplug and plug the devices back in. And there are a lot of issues
> here:
> 1) The destination of the migration might not have the desired devices.
> This will trigger a lot of problems, as we will not be able to guarantee
> that the devices reappear on the destination, and if we wanted to check
> we'd need a new migration protocol AFAIK.

But if it's using the bonding trick then that isn't fatal; the guest would
still have the bonded virtio device.

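The bonding point can be illustrated with a toy model, loosely after the Linux bonding driver's active-backup mode with the passthrough NIC as the preferred slave. `ActiveBackupBond` and `simulate_migration` are hypothetical names; this does not configure a real bond, and it simply assumes the virtio slave stays up for the whole migration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ActiveBackupBond:
    """Toy active-backup bond inside the guest (hypothetical model).

    `preference` lists slave NICs from most to least preferred: the
    passthrough VF first, the always-present virtio NIC as backup.
    """
    preference: List[str]
    present: Set[str] = field(default_factory=set)

    def active(self) -> Optional[str]:
        # Traffic goes through the most-preferred slave that is plugged in.
        for nic in self.preference:
            if nic in self.present:
                return nic
        return None

    def plug(self, nic: str) -> None:
        self.present.add(nic)

    def unplug(self, nic: str) -> None:
        self.present.discard(nic)


def simulate_migration(bond: ActiveBackupBond, vf: str,
                       destination_has_vf: bool) -> List[Optional[str]]:
    """Return the active slave before, during and after migration."""
    timeline = [bond.active()]
    bond.unplug(vf)              # step 1: VF hot-unplugged on the source
    timeline.append(bond.active())
    # step 2: migration; the virtio slave is assumed to keep the link up
    if destination_has_vf:
        bond.plug(vf)            # step 3: a VF is hotplugged on the destination
    timeline.append(bond.active())
    return timeline
```

With `destination_has_vf=False` the timeline is `["vf0", "virtio0", "virtio0"]` for a bond built as `ActiveBackupBond(["vf0", "virtio0"], {"vf0", "virtio0"})`: connectivity survives, just without the passthrough NIC.
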
> 2) The guest OS might refuse to detach the PCI device (it might be stuck
> before the PCI code is loaded).
> In that case the migration will be stuck forever, and abort attempts
> will leave the domain state basically undefined, depending on the
> phase where it failed.
>
> Since we can't guarantee that the unplug of the PCI host devices will be
> atomic or that it will succeed, we basically can't guarantee in any way
> what state the VM will end up in after a (possibly failed) migration.
> To recover from such a state there are too many options the user could
> want, and they would be hard to implement in a way that is flexible
> enough.

I don't understand why this is any different to any other PCI device
hot-unplug.

Dave

> Peter
--
Dr. David Alan Gilbert / dgilbert(a)redhat.com / Manchester, UK