On 05/13/2015 10:30 PM, Laine Stump wrote:
> On 05/13/2015 05:57 AM, Daniel P. Berrange wrote:
>> On Wed, May 13, 2015 at 11:36:30AM +0800, Chen Fan wrote:
>>> add migration support for ephemeral host devices, introduce
>>> two 'detach' and 'restore' functions to unplug/plug host devices
>>> during migration.
>>>
>>> Signed-off-by: Chen Fan <chen.fan.fnst(a)cn.fujitsu.com>
>>> ---
>>> src/qemu/qemu_migration.c | 171 ++++++++++++++++++++++++++++++++++++++++++++--
>>> src/qemu/qemu_migration.h | 9 +++
>>> src/qemu/qemu_process.c | 11 +++
>>> 3 files changed, 187 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>>> index 56112f9..d5a698f 100644
>>> --- a/src/qemu/qemu_migration.c
>>> +++ b/src/qemu/qemu_migration.c
>>> +void
>>> +qemuMigrationRestoreEphemeralDevices(virQEMUDriverPtr driver,
>>> + virConnectPtr conn,
>>> + virDomainObjPtr vm,
>>> + bool live)
>>> +{
>>> + qemuDomainObjPrivatePtr priv = vm->privateData;
>>> + virDomainDeviceDefPtr dev;
>>> + int ret = -1;
>>> + size_t i;
>>> +
>>> + VIR_DEBUG("Run domain restore ephemeral devices");
>>> +
>>> + for (i = 0; i < priv->nEphemeralDevices; i++) {
>>> + dev = priv->ephemeralDevices[i];
>>> +
>>> + switch ((virDomainDeviceType) dev->type) {
>>> + case VIR_DOMAIN_DEVICE_NET:
>>> + if (live) {
>>> + ret = qemuDomainAttachNetDevice(conn, driver, vm,
>>> + dev->data.net);
>>> + } else {
>>> + ret = virDomainNetInsert(vm->def, dev->data.net);
>>> + }
>>> +
>>> + if (!ret)
>>> + dev->data.net = NULL;
>>> + break;
>>> + case VIR_DOMAIN_DEVICE_HOSTDEV:
>>> + if (live) {
>>> + ret = qemuDomainAttachHostDevice(conn, driver, vm,
>>> + dev->data.hostdev);
>>> + } else {
>>> + ret = virDomainHostdevInsert(vm->def,
>>> + dev->data.hostdev);
>>> + }
>> This re-attach step is where we actually have far far far worse problems
>> than with detach. This is blindly assuming that the guest on the target
>> host can use the same hostdev that it was using on the source host.
> (kind of pointless to comment on, since pkrempa has changed my opinion
> by forcing me to think about the "failure to reattach" condition, but
> could be useful info for others)
>
> For a <hostdev>, yes, but not for <interface type='network'> (which
> would point to a libvirt network pool of VFs).
>
>> This
>> is essentially useless in the real world.
> Agreed (for plain <hostdev>)
>
>> Even if the same vendor/model
>> device is available on the target host, it is very unlikely to be available
>> at the same bus/slot/function that it was on the source. It is quite likely
>> necessary to allocate a completely different NIC, or if using SRIOV allocate
>> a different function. It is also not uncommon to have different vendor/models,
>> so a completely different NIC may be required.
> In the case of a network device, a different brand/model of NIC at a
> different PCI address using a different guest driver shouldn't be a
> problem for the guest, as long as the MAC address is the same (for a
> Linux guest anyway; not sure what a Windows guest would do with a NIC
> that had the same MAC but used a different driver). This points out the
> folly of trying to do migration with attached hostdevs (managed at *any*
> level), for anything other than SRIOV VFs (which can have their MAC
> address set before attach, unlike non-SRIOV NICs).
>
> .
So should we focus on implementing the feature that supports migration
with SRIOV VFs at first?

Not "at first", but "only". Adding the requirement of dealing properly
with MAC address change to the guest adds a lot of complexity to that
code with not much real gain.

And based on my newfound realization of the horrible situation that
would be created by a failure to re-attach after migration was complete
(see my response to Peter Krempa yesterday), I now agree with Dan that
this shouldn't be implemented in libvirt, but in the higher level
management, which will be able to more easily/realistically deal with
such a failure.

(and by the way, I think I should apologize for leading you down the
road of the ephemeral patches in response to your earlier RFC. If only
I'd fully considered the post-migration re-attach failure case, and the
difficulty libvirt would have recovering from that prior to Peter
pointing it out so eloquently yesterday :-/)
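
For anyone following along, here is a rough sketch of the
<interface type='network'> case mentioned above, where the guest refers
to a libvirt network that hands out VFs from a pool rather than to a
fixed PCI address. The network name "vfpool", the PF name "eth3", and
the MAC address are made-up placeholders, not taken from the patch or
from anyone's real setup. The same network name would have to be defined
on both source and destination hosts, each pointing at its own local PF.

```xml
<!-- Network definition (per host): a pool of VFs taken from one PF.
     "vfpool" and "eth3" are hypothetical names. -->
<network>
  <name>vfpool</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='eth3'/>
  </forward>
</network>

<!-- Guest interface: refers to the pool by name, so no host PCI address
     is hard-coded. The MAC (a placeholder here) is what the guest keeps
     seeing regardless of which VF is picked. -->
<interface type='network'>
  <mac address='52:54:00:6d:90:02'/>
  <source network='vfpool'/>
</interface>
```

Because the MAC address can be set on a VF before it is attached, the
guest sees the same MAC whichever VF the destination host picks. This
does nothing about the failure-to-reattach problem described above.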