[libvirt] [RFC v1 0/6] Live Migration with ephemeral host NIC devices

My main goal is to add support for migration with host NIC passthrough devices while keeping network connectivity. This series is based on Shradha's patches at https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html, which added migration support for host passthrough devices:

  1) unplug the ephemeral devices before migration
  2) do the native migration
  3) when migration finishes, hotplug the ephemeral devices

TODO: keep network connectivity at the guest level via a bonding device.

Chen Fan (6):
  conf: add ephemeral element for hostdev supporting migration
  qemu: Save ephemeral devices into qemuDomainObjPrivate
  qemu: add check ephemeral devices only for PCI host devices
  migration: Migration support for ephemeral hostdevs
  managedsave: move the domain xml handling forward to stop CPU
  managedsave: add managedsave support for ephemeral host devices

 docs/schemas/domaincommon.rng                      |  10 ++
 docs/schemas/network.rng                           |   5 +
 src/conf/domain_conf.c                             |  14 +-
 src/conf/domain_conf.h                             |   1 +
 src/conf/network_conf.c                            |  13 ++
 src/conf/network_conf.h                            |   1 +
 src/network/bridge_driver.c                        |   1 +
 src/qemu/qemu_command.c                            |  11 ++
 src/qemu/qemu_domain.c                             |   5 +
 src/qemu/qemu_domain.h                             |   3 +
 src/qemu/qemu_driver.c                             |  48 +++---
 src/qemu/qemu_migration.c                          | 182 ++++++++++++++++++++-
 src/qemu/qemu_migration.h                          |   9 +
 src/qemu/qemu_process.c                            |  12 ++
 tests/networkxml2xmlin/hostdev-pf.xml              |   2 +-
 tests/networkxml2xmlin/hostdev.xml                 |   2 +-
 tests/networkxml2xmlout/hostdev-pf.xml             |   2 +-
 tests/networkxml2xmlout/hostdev.xml                |   2 +-
 .../qemuxml2argv-controller-order.xml              |   2 +-
 .../qemuxml2argv-hostdev-pci-address-device.xml    |   2 +-
 .../qemuxml2argv-hostdev-pci-address.xml           |   2 +-
 .../qemuxml2argv-hostdev-scsi-autogen-address.xml  |  22 +--
 .../qemuxml2argv-hostdev-scsi-lsi-iscsi-auth.xml   |   4 +-
 .../qemuxml2argv-hostdev-scsi-lsi-iscsi.xml        |   4 +-
 .../qemuxml2argv-hostdev-scsi-lsi.xml              |   2 +-
 .../qemuxml2argv-hostdev-scsi-rawio.xml            |   2 +-
 .../qemuxml2argv-hostdev-scsi-readonly.xml         |   2 +-
 .../qemuxml2argv-hostdev-scsi-sgio.xml             |   2 +-
 .../qemuxml2argv-hostdev-scsi-shareable.xml        |   2 +-
 ...qemuxml2argv-hostdev-scsi-virtio-iscsi-auth.xml |   4 +-
 .../qemuxml2argv-hostdev-scsi-virtio-iscsi.xml     |   4 +-
 .../qemuxml2argv-hostdev-scsi-virtio-scsi.xml      |   2 +-
 ...emuxml2argv-hostdev-usb-address-device-boot.xml |   2 +-
 .../qemuxml2argv-hostdev-usb-address-device.xml    |   2 +-
 .../qemuxml2argv-hostdev-usb-address.xml           |   2 +-
 .../qemuxml2argv-hostdev-vfio-multidomain.xml      |   2 +-
 tests/qemuxml2argvdata/qemuxml2argv-hostdev-vfio.xml |   2 +-
 .../qemuxml2argv-net-hostdev-multidomain.xml       |   2 +-
 .../qemuxml2argv-net-hostdev-vfio-multidomain.xml  |   2 +-
 .../qemuxml2argv-net-hostdev-vfio.xml              |   2 +-
 tests/qemuxml2argvdata/qemuxml2argv-net-hostdev.xml |   2 +-
 tests/qemuxml2argvdata/qemuxml2argv-pci-rom.xml    |   4 +-
 ...qemuxml2xmlout-hostdev-scsi-autogen-address.xml |  22 +--
 43 files changed, 340 insertions(+), 83 deletions(-)

-- 
1.9.3
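For illustration, with this series applied a PCI hostdev can be marked ephemeral in the domain XML so that libvirt unplugs it before migration and re-plugs it on the destination afterwards. This is a sketch of the proposed syntax, modeled on the test data updated by patch 1 (the attribute is optional and defaults to 'no'):

```xml
<!-- Proposed syntax: an ephemeral VFIO passthrough NIC.
     ephemeral='yes' asks libvirt to unplug the device before
     migration and hotplug an equivalent one after it finishes. -->
<hostdev mode='subsystem' type='pci' managed='yes' ephemeral='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/>
  </source>
</hostdev>
```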

The ephemeral flag helps support migration with PCI passthrough. An ephemeral hostdev is automatically unplugged before migration and replugged (if one is available on the destination) after migration.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 docs/schemas/domaincommon.rng                      | 10 ++++++++++
 docs/schemas/network.rng                           |  5 +++++
 src/conf/domain_conf.c                             | 14 +++++++++++++-
 src/conf/domain_conf.h                             |  1 +
 src/conf/network_conf.c                            | 13 +++++++++++++
 src/conf/network_conf.h                            |  1 +
 src/network/bridge_driver.c                        |  1 +
 src/qemu/qemu_command.c                            |  1 +
 tests/networkxml2xmlin/hostdev-pf.xml              |  2 +-
 tests/networkxml2xmlin/hostdev.xml                 |  2 +-
 tests/networkxml2xmlout/hostdev-pf.xml             |  2 +-
 tests/networkxml2xmlout/hostdev.xml                |  2 +-
 .../qemuxml2argv-controller-order.xml              |  2 +-
 .../qemuxml2argv-hostdev-pci-address-device.xml    |  2 +-
 .../qemuxml2argv-hostdev-pci-address.xml           |  2 +-
 .../qemuxml2argv-hostdev-scsi-autogen-address.xml  | 22 +++++++++++-----------
 .../qemuxml2argv-hostdev-scsi-lsi-iscsi-auth.xml   |  4 ++--
 .../qemuxml2argv-hostdev-scsi-lsi-iscsi.xml        |  4 ++--
 .../qemuxml2argv-hostdev-scsi-lsi.xml              |  2 +-
 .../qemuxml2argv-hostdev-scsi-rawio.xml            |  2 +-
 .../qemuxml2argv-hostdev-scsi-readonly.xml         |  2 +-
 .../qemuxml2argv-hostdev-scsi-sgio.xml             |  2 +-
 .../qemuxml2argv-hostdev-scsi-shareable.xml        |  2 +-
 ...qemuxml2argv-hostdev-scsi-virtio-iscsi-auth.xml |  4 ++--
 .../qemuxml2argv-hostdev-scsi-virtio-iscsi.xml     |  4 ++--
 .../qemuxml2argv-hostdev-scsi-virtio-scsi.xml      |  2 +-
 ...emuxml2argv-hostdev-usb-address-device-boot.xml |  2 +-
 .../qemuxml2argv-hostdev-usb-address-device.xml    |  2 +-
 .../qemuxml2argv-hostdev-usb-address.xml           |  2 +-
 .../qemuxml2argv-hostdev-vfio-multidomain.xml      |  2 +-
 .../qemuxml2argvdata/qemuxml2argv-hostdev-vfio.xml |  2 +-
 .../qemuxml2argv-net-hostdev-multidomain.xml       |  2 +-
 .../qemuxml2argv-net-hostdev-vfio-multidomain.xml  |  2 +-
 .../qemuxml2argv-net-hostdev-vfio.xml              |  2 +-
 tests/qemuxml2argvdata/qemuxml2argv-net-hostdev.xml |  2 +-
 tests/qemuxml2argvdata/qemuxml2argv-pci-rom.xml    |  4 ++--
 ...qemuxml2xmlout-hostdev-scsi-autogen-address.xml | 22 +++++++++++-----------
 37 files changed, 99 insertions(+), 55 deletions(-)

diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index b1d883f..6f4551c 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -2261,6 +2261,11 @@
           <ref name="virYesNo"/>
         </attribute>
       </optional>
+      <optional>
+        <attribute name="ephemeral">
+          <ref name="virYesNo"/>
+        </attribute>
+      </optional>
       <interleave>
         <element name="source">
           <optional>
@@ -3717,6 +3722,11 @@
           <ref name="virYesNo"/>
         </attribute>
       </optional>
+      <optional>
+        <attribute name="ephemeral">
+          <ref name="virYesNo"/>
+        </attribute>
+      </optional>
       <choice>
         <ref name="hostdevsubsyspci"/>
         <ref name="hostdevsubsysusb"/>
diff --git a/docs/schemas/network.rng b/docs/schemas/network.rng
index 4edb6eb..d63b066 100644
--- a/docs/schemas/network.rng
+++ b/docs/schemas/network.rng
@@ -115,6 +115,11 @@
           <ref name="virYesNo"/>
         </attribute>
       </optional>
+      <optional>
+        <attribute name="ephemeral">
+          <ref name="virYesNo"/>
+        </attribute>
+      </optional>
       <interleave>
         <choice>
           <group>
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 3d05844..a1a0602 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -4823,6 +4823,7 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node,
 {
     xmlNodePtr sourcenode;
     char *managed = NULL;
+    char *ephemeral = NULL;
     char *sgio = NULL;
     char *rawio = NULL;
     char *backendStr = NULL;
@@ -4841,6 +4842,11 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node,
         def->managed = true;
     }
 
+    if ((ephemeral = virXMLPropString(node, "ephemeral")) != NULL) {
+        if (STREQ(ephemeral, "yes"))
+            def->ephemeral = true;
+    }
+
     sgio = virXMLPropString(node, "sgio");
     rawio = virXMLPropString(node, "rawio");
 
@@ -18064,8 +18070,10 @@ virDomainActualNetDefFormat(virBufferPtr buf,
     virBufferAsprintf(buf, "<actual type='%s'", typeStr);
     if (type == VIR_DOMAIN_NET_TYPE_HOSTDEV) {
         virDomainHostdevDefPtr hostdef = virDomainNetGetActualHostdev(def);
-        if (hostdef && hostdef->managed)
+        if (hostdef && hostdef->managed)
             virBufferAddLit(buf, " managed='yes'");
+        if (hostdef && hostdef->ephemeral)
+            virBufferAddLit(buf, " ephemeral='yes'");
     }
     if (def->trustGuestRxFilters)
         virBufferAsprintf(buf, " trustGuestRxFilters='%s'",
@@ -18236,6 +18244,8 @@ virDomainNetDefFormat(virBufferPtr buf,
     virBufferAsprintf(buf, "<interface type='%s'", typeStr);
     if (hostdef && hostdef->managed)
         virBufferAddLit(buf, " managed='yes'");
+    if (hostdef && hostdef->ephemeral)
+        virBufferAddLit(buf, " ephemeral='yes'");
     if (def->trustGuestRxFilters)
         virBufferAsprintf(buf, " trustGuestRxFilters='%s'",
                           virTristateBoolTypeToString(def->trustGuestRxFilters));
@@ -19562,6 +19572,8 @@ virDomainHostdevDefFormat(virBufferPtr buf,
     if (def->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) {
         virBufferAsprintf(buf, " managed='%s'",
                           def->managed ? "yes" : "no");
+        virBufferAsprintf(buf, " ephemeral='%s'",
+                          def->ephemeral ? "yes" : "no");
 
         if (def->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI &&
             scsisrc->sgio)
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index f36315b..0d64add 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -519,6 +519,7 @@ struct _virDomainHostdevDef {
     bool missing;
     bool readonly;
     bool shareable;
+    bool ephemeral;
     union {
         virDomainHostdevSubsys subsys;
         virDomainHostdevCaps caps;
diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c
index d7c5dec..7107bb5 100644
--- a/src/conf/network_conf.c
+++ b/src/conf/network_conf.c
@@ -1710,6 +1710,7 @@ virNetworkForwardDefParseXML(const char *networkName,
     int nForwardIfs, nForwardAddrs, nForwardPfs, nForwardNats;
     char *forwardDev = NULL;
     char *forwardManaged = NULL;
+    char *forwardEphemeral = NULL;
     char *forwardDriverName = NULL;
     char *type = NULL;
     xmlNodePtr save = ctxt->node;
@@ -1733,6 +1734,12 @@ virNetworkForwardDefParseXML(const char *networkName,
         def->managed = true;
     }
 
+    forwardEphemeral = virXPathString("string(./@ephemeral)", ctxt);
+    if (forwardEphemeral != NULL &&
+        STRCASEEQ(forwardEphemeral, "yes")) {
+        def->ephemeral = true;
+    }
+
     forwardDriverName = virXPathString("string(./driver/@name)", ctxt);
     if (forwardDriverName) {
         int driverName
@@ -2631,6 +2638,12 @@ virNetworkDefFormatBuf(virBufferPtr buf,
         else
             virBufferAddLit(buf, " managed='no'");
     }
+    if (def->forward.type == VIR_NETWORK_FORWARD_HOSTDEV) {
+        if (def->forward.ephemeral)
+            virBufferAddLit(buf, " ephemeral='yes'");
+        else
+            virBufferAddLit(buf, " ephemeral='no'");
+    }
     shortforward = !(def->forward.nifs || def->forward.npfs ||
                      VIR_SOCKET_ADDR_VALID(&def->forward.addr.start) ||
                      VIR_SOCKET_ADDR_VALID(&def->forward.addr.end)
diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h
index 3e926f7..29aa4f6 100644
--- a/src/conf/network_conf.h
+++ b/src/conf/network_conf.h
@@ -188,6 +188,7 @@ typedef virNetworkForwardDef *virNetworkForwardDefPtr;
 struct _virNetworkForwardDef {
     int type; /* One of virNetworkForwardType constants */
     bool managed; /* managed attribute for hostdev mode */
+    bool ephemeral; /* ephemeral attribute for hostdev mode */
     int driverName; /* enum virNetworkForwardDriverNameType */
 
     /* If there are multiple forward devices (i.e. a pool of
diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c
index 13e1717..eb4838f 100644
--- a/src/network/bridge_driver.c
+++ b/src/network/bridge_driver.c
@@ -3903,6 +3903,7 @@ networkAllocateActualDevice(virDomainDefPtr dom,
         iface->data.network.actual->data.hostdev.def.info = &iface->info;
         iface->data.network.actual->data.hostdev.def.mode = VIR_DOMAIN_HOSTDEV_MODE_SUBSYS;
         iface->data.network.actual->data.hostdev.def.managed = netdef->forward.managed ? 1 : 0;
+        iface->data.network.actual->data.hostdev.def.ephemeral = netdef->forward.ephemeral ? 1 : 0;
         iface->data.network.actual->data.hostdev.def.source.subsys.type = dev->type;
         iface->data.network.actual->data.hostdev.def.source.subsys.u.pci.addr = dev->device.pci;
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 5303de5..fc81214 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -11493,6 +11493,7 @@ qemuParseCommandLinePCI(const char *val)
 
     def->mode = VIR_DOMAIN_HOSTDEV_MODE_SUBSYS;
     def->managed = true;
+    def->ephemeral = true;
     def->source.subsys.type = VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI;
     def->source.subsys.u.pci.addr.bus = bus;
    def->source.subsys.u.pci.addr.slot = slot;
diff --git a/tests/networkxml2xmlin/hostdev-pf.xml b/tests/networkxml2xmlin/hostdev-pf.xml
index 5b8f598..cfb0f7c 100644
--- a/tests/networkxml2xmlin/hostdev-pf.xml
+++ b/tests/networkxml2xmlin/hostdev-pf.xml
@@ -1,7 +1,7 @@
 <network>
   <name>hostdev</name>
   <uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid>
-  <forward mode='hostdev' managed='yes'>
+  <forward mode='hostdev' managed='yes' ephemeral='yes'>
     <driver name='vfio'/>
     <pf dev='eth2'/>
   </forward>
diff --git a/tests/networkxml2xmlin/hostdev.xml b/tests/networkxml2xmlin/hostdev.xml
index 03f1411..406c2df 100644
--- a/tests/networkxml2xmlin/hostdev.xml
+++ b/tests/networkxml2xmlin/hostdev.xml
@@ -1,7 +1,7 @@
 <network>
   <name>hostdev</name>
   <uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid>
-  <forward mode='hostdev' managed='yes'>
+  <forward mode='hostdev' managed='yes' ephemeral='yes'>
     <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
     <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x2'/>
     <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x3'/>
diff --git a/tests/networkxml2xmlout/hostdev-pf.xml b/tests/networkxml2xmlout/hostdev-pf.xml
index 5b8f598..cfb0f7c 100644
--- a/tests/networkxml2xmlout/hostdev-pf.xml
+++ b/tests/networkxml2xmlout/hostdev-pf.xml
@@ -1,7 +1,7 @@
 <network>
   <name>hostdev</name>
   <uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid>
-  <forward mode='hostdev' managed='yes'>
+  <forward mode='hostdev' managed='yes' ephemeral='yes'>
     <driver name='vfio'/>
     <pf dev='eth2'/>
   </forward>
diff --git a/tests/networkxml2xmlout/hostdev.xml b/tests/networkxml2xmlout/hostdev.xml
index 03f1411..406c2df 100644
--- a/tests/networkxml2xmlout/hostdev.xml
+++ b/tests/networkxml2xmlout/hostdev.xml
@@ -1,7 +1,7 @@
 <network>
   <name>hostdev</name>
   <uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid>
-  <forward mode='hostdev' managed='yes'>
+  <forward mode='hostdev' managed='yes' ephemeral='yes'>
     <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
     <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x2'/>
     <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x3'/>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-controller-order.xml b/tests/qemuxml2argvdata/qemuxml2argv-controller-order.xml
index 07db77e..b492d9e 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-controller-order.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-controller-order.xml
@@ -76,7 +76,7 @@
       <model type='cirrus' vram='16384' heads='1'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
     </video>
-    <hostdev mode='subsystem' type='usb' managed='yes'>
+    <hostdev mode='subsystem' type='usb' managed='yes' ephemeral='no'>
       <source>
         <address bus='14' device='6'/>
       </source>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-pci-address-device.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-pci-address-device.xml
index b29ef58..98b2ced 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-pci-address-device.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-pci-address-device.xml
@@ -18,7 +18,7 @@
       <source dev='/dev/HostVG/QEMUGuest2'/>
       <target dev='hda' bus='ide'/>
     </disk>
-    <hostdev mode='subsystem' type='pci' managed='yes'>
+    <hostdev mode='subsystem' type='pci' managed='yes' ephemeral='yes'>
       <source>
         <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/>
       </source>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-pci-address.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-pci-address.xml
index b9b5c14..2b9d722 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-pci-address.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-pci-address.xml
@@ -23,7 +23,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='pci' managed='yes'>
+    <hostdev mode='subsystem' type='pci' managed='yes' ephemeral='yes'>
      <source>
         <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/>
      </source>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-autogen-address.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-autogen-address.xml
index 21f112b..cd2c2ca 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-autogen-address.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-autogen-address.xml
@@ -23,68 +23,68 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host0'/>
         <address bus='0' target='0' unit='0'/>
       </source>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host1'/>
         <address bus='0' target='0' unit='1'/>
       </source>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host2'/>
         <address bus='0' target='0' unit='2'/>
       </source>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host3'/>
         <address bus='0' target='0' unit='3'/>
       </source>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host4'/>
         <address bus='0' target='0' unit='4'/>
       </source>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host5'/>
         <address bus='0' target='0' unit='5'/>
       </source>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host6'/>
         <address bus='0' target='0' unit='6'/>
       </source>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host7'/>
         <address bus='0' target='0' unit='7'/>
       </source>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host8'/>
         <address bus='0' target='0' unit='8'/>
       </source>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host9'/>
         <address bus='0' target='0' unit='9'/>
       </source>
       <address type='drive' controller='1' bus='0' target='0' unit='5'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host10'/>
         <address bus='0' target='0' unit='10'/>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-lsi-iscsi-auth.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-lsi-iscsi-auth.xml
index 3bfded4..0d9126e 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-lsi-iscsi-auth.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-lsi-iscsi-auth.xml
@@ -23,7 +23,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source protocol='iscsi' name='iqn.1992-01.com.example'>
         <host name='example.org' port='3260'/>
         <auth username='myname'>
@@ -32,7 +32,7 @@
       </source>
       <address type='drive' controller='0' bus='0' target='0' unit='4'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source protocol='iscsi' name='iqn.1992-01.com.example/1'>
         <host name='example.org' port='3260'/>
         <auth username='myname'>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-lsi-iscsi.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-lsi-iscsi.xml
index 8a05099..1de1796 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-lsi-iscsi.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-lsi-iscsi.xml
@@ -23,13 +23,13 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source protocol='iscsi' name='iqn.1992-01.com.example'>
         <host name='example.org' port='3260'/>
       </source>
       <address type='drive' controller='0' bus='0' target='0' unit='4'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source protocol='iscsi' name='iqn.1992-01.com.example/1'>
         <host name='example.org' port='3260'/>
       </source>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-lsi.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-lsi.xml
index 98c469c..d5b03f6 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-lsi.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-lsi.xml
@@ -23,7 +23,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host0'/>
         <address bus='0' target='0' unit='0'/>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-rawio.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-rawio.xml
index 69fdde3..40efa63 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-rawio.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-rawio.xml
@@ -23,7 +23,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='scsi' managed='yes' sgio='unfiltered' rawio='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no' sgio='unfiltered' rawio='yes'>
       <source>
         <adapter name='scsi_host0'/>
         <address bus='0' target='0' unit='0'/>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-readonly.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-readonly.xml
index 359bb95..ab56fb9 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-readonly.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-readonly.xml
@@ -23,7 +23,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host0'/>
         <address bus='0' target='0' unit='0'/>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-sgio.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-sgio.xml
index 21404ee..cfacc6e 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-sgio.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-sgio.xml
@@ -23,7 +23,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='scsi' managed='yes' sgio='unfiltered'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no' sgio='unfiltered'>
       <source>
         <adapter name='scsi_host0'/>
         <address bus='0' target='0' unit='0'/>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-shareable.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-shareable.xml
index f9ce8af..bff49ba 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-shareable.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-shareable.xml
@@ -23,7 +23,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host0'/>
         <address bus='0' target='0' unit='0'/>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-virtio-iscsi-auth.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-virtio-iscsi-auth.xml
index d4dba4a..e5bc3d2 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-virtio-iscsi-auth.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-virtio-iscsi-auth.xml
@@ -23,7 +23,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source protocol='iscsi' name='iqn.1992-01.com.example'>
         <host name='example.org' port='3260'/>
         <auth username='myname'>
@@ -32,7 +32,7 @@
       </source>
       <address type='drive' controller='0' bus='0' target='2' unit='4'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source protocol='iscsi' name='iqn.1992-01.com.example/1'>
         <host name='example.org' port='3260'/>
         <auth username='myname'>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-virtio-iscsi.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-virtio-iscsi.xml
index 13c0930..c1c3581 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-virtio-iscsi.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-virtio-iscsi.xml
@@ -23,13 +23,13 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source protocol='iscsi' name='iqn.1992-01.com.example'>
         <host name='example.org' port='3260'/>
       </source>
       <address type='drive' controller='0' bus='0' target='2' unit='4'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source protocol='iscsi' name='iqn.1992-01.com.example/1'>
         <host name='example.org' port='3260'/>
      </source>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-virtio-scsi.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-virtio-scsi.xml
index 5a263e7..a8a1dbd 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-virtio-scsi.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-scsi-virtio-scsi.xml
@@ -23,7 +23,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host0'/>
         <address bus='0' target='0' unit='0'/>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-address-device-boot.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-address-device-boot.xml
index c11a963..ce393df 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-address-device-boot.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-address-device-boot.xml
@@ -17,7 +17,7 @@
       <source dev='/dev/HostVG/QEMUGuest1'/>
       <target dev='hda' bus='ide'/>
     </disk>
-    <hostdev mode='subsystem' type='usb' managed='no'>
+    <hostdev mode='subsystem' type='usb' managed='no' ephemeral='no'>
       <source>
         <address bus='14' device='6'/>
       </source>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-address-device.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-address-device.xml
index c5992ef..f3319cc 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-address-device.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-address-device.xml
@@ -18,7 +18,7 @@
       <source dev='/dev/HostVG/QEMUGuest1'/>
       <target dev='hda' bus='ide'/>
     </disk>
-    <hostdev mode='subsystem' type='usb' managed='no'>
+    <hostdev mode='subsystem' type='usb' managed='no' ephemeral='no'>
       <source>
         <address bus='14' device='6'/>
       </source>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-address.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-address.xml
index 5807eff..c48a977 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-address.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-usb-address.xml
@@ -23,7 +23,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='usb' managed='no'>
+    <hostdev mode='subsystem' type='usb' managed='no' ephemeral='no'>
       <source>
         <address bus='14' device='6'/>
       </source>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-vfio-multidomain.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-vfio-multidomain.xml
index efbff38..1b29ded 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-vfio-multidomain.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-vfio-multidomain.xml
@@ -22,7 +22,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='pci' managed='yes'>
+    <hostdev mode='subsystem' type='pci' managed='yes' ephemeral='yes'>
       <driver name='vfio'/>
       <source>
         <address domain='0x55aa' bus='32' slot='15' function='3'/>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-vfio.xml b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-vfio.xml
index 8daa53a..7e6748b 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-hostdev-vfio.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-hostdev-vfio.xml
@@ -22,7 +22,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <hostdev mode='subsystem' type='pci' managed='yes'>
+    <hostdev mode='subsystem' type='pci' managed='yes' ephemeral='yes'>
       <driver name='vfio'/>
      <source>
         <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-multidomain.xml b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-multidomain.xml
index 14b9515..fa07462 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-multidomain.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-multidomain.xml
@@ -22,7 +22,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <interface type='hostdev' managed='yes'>
+    <interface type='hostdev' managed='yes' ephemeral='yes'>
       <mac address='00:11:22:33:44:55'/>
       <source>
         <address type='pci' domain='0x2424' bus='0x21' slot='0x1c' function='0x6'/>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-vfio-multidomain.xml b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-vfio-multidomain.xml
index 5e834ad..fe71993 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-vfio-multidomain.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-vfio-multidomain.xml
@@ -22,7 +22,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <interface type='hostdev' managed='yes'>
+    <interface type='hostdev' managed='yes' ephemeral='yes'>
       <mac address='00:11:22:33:44:55'/>
       <driver name='vfio'/>
       <source>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-vfio.xml b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-vfio.xml
index b4f5e72..e0ff4a6 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-vfio.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev-vfio.xml
@@ -22,7 +22,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <interface type='hostdev' managed='yes'>
+    <interface type='hostdev' managed='yes' ephemeral='yes'>
       <mac address='00:11:22:33:44:55'/>
       <driver name='vfio'/>
       <source>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev.xml b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev.xml
index f88eefc..b851944 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev.xml
@@ -22,7 +22,7 @@
     <controller type='usb' index='0'/>
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
-    <interface type='hostdev' managed='yes'>
+    <interface type='hostdev' managed='yes' ephemeral='yes'>
       <mac address='00:11:22:33:44:55'/>
       <source>
         <address type='pci' domain='0x0000' bus='0x03' slot='0x07' function='0x1'/>
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-pci-rom.xml b/tests/qemuxml2argvdata/qemuxml2argv-pci-rom.xml
index a5e59b2..f0de8fb 100644
--- a/tests/qemuxml2argvdata/qemuxml2argv-pci-rom.xml
+++ b/tests/qemuxml2argvdata/qemuxml2argv-pci-rom.xml
@@ -32,13 +32,13 @@
       <model type='virtio'/>
       <rom file='/etc/fake/bootrom.bin'/>
     </interface>
-    <hostdev mode='subsystem' type='pci' managed='yes'>
+    <hostdev mode='subsystem' type='pci' managed='yes' ephemeral='no'>
       <source>
         <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/>
       </source>
       <rom bar='off'/>
     </hostdev>
-    <hostdev mode='subsystem' type='pci' managed='yes'>
+    <hostdev mode='subsystem' type='pci' managed='yes' ephemeral='no'>
       <source>
         <address domain='0x0000' bus='0x06' slot='0x12' function='0x6'/>
       </source>
diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-hostdev-scsi-autogen-address.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-hostdev-scsi-autogen-address.xml
index e416654..a11eb2e 100644
--- a/tests/qemuxml2xmloutdata/qemuxml2xmlout-hostdev-scsi-autogen-address.xml
+++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-hostdev-scsi-autogen-address.xml
@@ -24,77 +24,77 @@
     <controller type='ide' index='0'/>
     <controller type='pci' index='0' model='pci-root'/>
     <controller type='scsi' index='1'/>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host0'/>
         <address bus='0' target='0' unit='0'/>
       </source>
       <address type='drive' controller='0' bus='0' target='0' unit='0'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host1'/>
         <address bus='0' target='0' unit='1'/>
       </source>
       <address type='drive' controller='0' bus='0' target='0' unit='1'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
      <source>
         <adapter name='scsi_host2'/>
         <address bus='0' target='0' unit='2'/>
       </source>
       <address type='drive' controller='0' bus='0' target='0' unit='2'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host3'/>
         <address bus='0' target='0' unit='3'/>
       </source>
       <address type='drive' controller='0' bus='0' target='0' unit='3'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host4'/>
         <address bus='0' target='0' unit='4'/>
       </source>
       <address type='drive' controller='0' bus='0' target='0' unit='4'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host5'/>
         <address bus='0' target='0' unit='5'/>
       </source>
       <address type='drive' controller='0' bus='0' target='0' unit='5'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host6'/>
         <address bus='0' target='0' unit='6'/>
       </source>
       <address type='drive' controller='0' bus='0' target='0' unit='6'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host7'/>
         <address bus='0' target='0' unit='7'/>
       </source>
       <address type='drive' controller='1' bus='0' target='0' unit='0'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host8'/>
         <address bus='0' target='0' unit='8'/>
       </source>
       <address type='drive' controller='1' bus='0' target='0' unit='1'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host9'/>
         <address bus='0' target='0' unit='9'/>
       </source>
       <address type='drive' controller='1' bus='0' target='0' unit='5'/>
     </hostdev>
-    <hostdev mode='subsystem' type='scsi' managed='yes'>
+    <hostdev mode='subsystem' type='scsi' managed='yes' ephemeral='no'>
       <source>
         <adapter name='scsi_host10'/>
         <address bus='0' target='0' unit='10'/>
-- 
1.9.3

after migration we should restore the ephemeral devices. so we save them to qemuDomainObjPrivate. Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> --- src/qemu/qemu_domain.c | 5 +++++ src/qemu/qemu_domain.h | 3 +++ 2 files changed, 8 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index d8a2087..5ce933d 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -425,6 +425,7 @@ static void qemuDomainObjPrivateFree(void *data) { qemuDomainObjPrivatePtr priv = data; + size_t i; virObjectUnref(priv->qemuCaps); @@ -441,6 +442,10 @@ qemuDomainObjPrivateFree(void *data) virCondDestroy(&priv->unplugFinished); virChrdevFree(priv->devs); + for (i = 0; i < priv->nEphemeralDevices; i++) + virDomainDeviceDefFree(priv->ephemeralDevices[i]); + VIR_FREE(priv->ephemeralDevices); + /* This should never be non-NULL if we get here, but just in case... */ if (priv->mon) { VIR_ERROR(_("Unexpected QEMU monitor still active during domain deletion")); diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index fe3e2b1..e75c828 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -180,6 +180,9 @@ struct _qemuDomainObjPrivate { size_t ncleanupCallbacks; size_t ncleanupCallbacks_max; + virDomainDeviceDefPtr *ephemeralDevices; + size_t nEphemeralDevices; + virCgroupPtr cgroup; virCond unplugFinished; /* signals that unpluggingDevice was unplugged */ -- 1.9.3
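The bookkeeping this patch adds — an ephemeralDevices pointer array plus a count in the private struct, appended to during detach and freed in qemuDomainObjPrivateFree — can be sketched standalone. The struct and helper names below are illustrative stand-ins for virDomainDeviceDefPtr, VIR_APPEND_ELEMENT and the free path, not the real libvirt API:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for virDomainDeviceDefPtr. */
typedef struct { int type; } DeviceDef;

/* Hypothetical stand-in for the ephemeral fields in qemuDomainObjPrivate. */
typedef struct {
    DeviceDef **ephemeralDevices;
    size_t nEphemeralDevices;
} DomainPrivate;

/* Append one device pointer, growing the array (what VIR_APPEND_ELEMENT does). */
static int appendEphemeral(DomainPrivate *priv, DeviceDef *dev)
{
    DeviceDef **tmp = realloc(priv->ephemeralDevices,
                              (priv->nEphemeralDevices + 1) * sizeof(*tmp));
    if (!tmp)
        return -1;
    priv->ephemeralDevices = tmp;
    priv->ephemeralDevices[priv->nEphemeralDevices++] = dev;
    return 0;
}

/* Free every saved device and the array itself, mirroring the loop this
 * patch adds to qemuDomainObjPrivateFree. */
static void freeEphemeral(DomainPrivate *priv)
{
    size_t i;
    for (i = 0; i < priv->nEphemeralDevices; i++)
        free(priv->ephemeralDevices[i]);
    free(priv->ephemeralDevices);
    priv->ephemeralDevices = NULL;
    priv->nEphemeralDevices = 0;
}
```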

currently, we only support PCI host devices with ephemeral flag. and USB already supports migration. so maybe in the near future we can support SCSI. Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> --- src/qemu/qemu_command.c | 10 ++++++++++ src/qemu/qemu_migration.c | 11 +++++++---- 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index fc81214..5acd8b4 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -10182,6 +10182,16 @@ qemuBuildCommandLine(virConnectPtr conn, /* PCI */ if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && + hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB && + (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && + hostdev->ephemeral)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("non-USB and non-PCI device assignment with ephemeral " + "flag are not supported by this version of qemu")); + goto error; + } + + if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) { int backend = hostdev->source.subsys.u.pci.backend; diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 83be435..56112f9 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1981,21 +1981,24 @@ qemuMigrationIsAllowed(virQEMUDriverPtr driver, virDomainObjPtr vm, def = vm->def; } - /* Migration with USB host devices is allowed, all other devices are - * forbidden. + /* + * Migration with USB and ephemeral PCI host devices host devices are allowed, + * all other devices are forbidden. 
*/ forbid = false; for (i = 0; i < def->nhostdevs; i++) { virDomainHostdevDefPtr hostdev = def->hostdevs[i]; if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS || - hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB) { + (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB && + !hostdev->ephemeral)) { forbid = true; break; } } if (forbid) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", - _("domain has assigned non-USB host devices")); + _("domain has assigned non-USB and " + "non-ephemeral host devices")); return false; } -- 1.9.3
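The revised qemuMigrationIsAllowed loop reduces to a simple predicate over the host device list: every assigned device must be either USB (already migratable) or flagged ephemeral (unplugged before migration). A standalone sketch, with simplified stand-in types rather than libvirt's real ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified view of virDomainHostdevDef. */
enum { MODE_SUBSYS = 0, MODE_CAPABILITIES };
enum { SUBSYS_USB = 0, SUBSYS_PCI, SUBSYS_SCSI };

typedef struct {
    int mode;
    int subsysType;
    bool ephemeral;
} Hostdev;

/* Returns true iff migration is allowed: every assigned host device is a
 * USB subsystem device, or is flagged ephemeral and so will be unplugged
 * before migration and re-plugged afterwards. */
static bool migrationAllowed(const Hostdev *devs, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (devs[i].mode != MODE_SUBSYS ||
            (devs[i].subsysType != SUBSYS_USB && !devs[i].ephemeral))
            return false;   /* non-USB, non-ephemeral: forbid migration */
    }
    return true;
}
```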

On Wed, May 13, 2015 at 11:36:29 +0800, Chen Fan wrote:
currently, we only support PCI host devices with ephemeral flag. and USB already supports migration. so maybe in the near future we can support SCSI.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> --- src/qemu/qemu_command.c | 10 ++++++++++ src/qemu/qemu_migration.c | 11 +++++++---- 2 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index fc81214..5acd8b4 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -10182,6 +10182,16 @@ qemuBuildCommandLine(virConnectPtr conn,
/* PCI */ if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && + hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB && + (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && + hostdev->ephemeral)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("non-USB and non-PCI device assignment with ephemeral " + "flag are not supported by this version of qemu"));
This functionality is not based on qemu support but on the libvirt implementation, so the error message is incorrect.
+ goto error; + } + + if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) { int backend = hostdev->source.subsys.u.pci.backend;
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 83be435..56112f9 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1981,21 +1981,24 @@ qemuMigrationIsAllowed(virQEMUDriverPtr driver, virDomainObjPtr vm, def = vm->def; }
- /* Migration with USB host devices is allowed, all other devices are - * forbidden. + /* + * Migration with USB and ephemeral PCI host devices host devices are allowed, + * all other devices are forbidden. */ forbid = false; for (i = 0; i < def->nhostdevs; i++) { virDomainHostdevDefPtr hostdev = def->hostdevs[i]; if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS || - hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB) { + (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB && + !hostdev->ephemeral)) { forbid = true; break; } } if (forbid) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", - _("domain has assigned non-USB host devices")); + _("domain has assigned non-USB and " + "non-ephemeral host devices")); return false; }
This patch has to be moved after you actually implement the ephemeral device unplug code, since an intermediate state would allow bypassing the check while the devices would not actually be unplugged. Peter

On 05/13/2015 04:17 PM, Peter Krempa wrote:
On Wed, May 13, 2015 at 11:36:29 +0800, Chen Fan wrote:
currently, we only support PCI host devices with ephemeral flag. and USB already supports migration. so maybe in the near future we can support SCSI.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> --- src/qemu/qemu_command.c | 10 ++++++++++ src/qemu/qemu_migration.c | 11 +++++++---- 2 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index fc81214..5acd8b4 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -10182,6 +10182,16 @@ qemuBuildCommandLine(virConnectPtr conn,
/* PCI */ if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && + hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB && + (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && + hostdev->ephemeral)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("non-USB and non-PCI device assignment with ephemeral " + "flag are not supported by this version of qemu")); This functionality is not based on qemu support but on libvirt implementation so the error message is incorrect. indeed.
thanks for pointing this out. Chen
+ goto error; + } + + if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) { int backend = hostdev->source.subsys.u.pci.backend;
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 83be435..56112f9 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1981,21 +1981,24 @@ qemuMigrationIsAllowed(virQEMUDriverPtr driver, virDomainObjPtr vm, def = vm->def; }
- /* Migration with USB host devices is allowed, all other devices are - * forbidden. + /* + * Migration with USB and ephemeral PCI host devices host devices are allowed, + * all other devices are forbidden. */ forbid = false; for (i = 0; i < def->nhostdevs; i++) { virDomainHostdevDefPtr hostdev = def->hostdevs[i]; if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS || - hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB) { + (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB && + !hostdev->ephemeral)) { forbid = true; break; } } if (forbid) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", - _("domain has assigned non-USB host devices")); + _("domain has assigned non-USB and " + "non-ephemeral host devices")); return false; } This patch has to be moved after you actually implement the ephemeral device unplug code, since an intermediate state would allow to bypass the check while the devices actually would not be unplugged.
Peter

add migration support for ephemeral host devices, introduce two 'detach' and 'restore' functions to unplug/plug host devices during migration. Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> --- src/qemu/qemu_migration.c | 171 ++++++++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_migration.h | 9 +++ src/qemu/qemu_process.c | 11 +++ 3 files changed, 187 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 56112f9..d5a698f 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -3384,6 +3384,158 @@ qemuMigrationPrepareDef(virQEMUDriverPtr driver, return def; } +int +qemuMigrationDetachEphemeralDevices(virQEMUDriverPtr driver, + virDomainObjPtr vm, + bool live) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainHostdevDefPtr hostdev; + virDomainNetDefPtr net; + virDomainDeviceDef dev; + virDomainDeviceDefPtr dev_copy = NULL; + virCapsPtr caps = NULL; + int actualType; + int ret = -1; + size_t i; + + VIR_DEBUG("Rum domain detach ephemeral devices"); + + if (!(caps = virQEMUDriverGetCapabilities(driver, false))) + return ret; + + for (i = 0; i < vm->def->nnets;) { + net = vm->def->nets[i]; + + actualType = virDomainNetGetActualType(net); + if (actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV) { + i++; + continue; + } + + hostdev = virDomainNetGetActualHostdev(net); + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS || + hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI || + !hostdev->ephemeral) { + i++; + continue; + } + + dev.type = VIR_DOMAIN_DEVICE_NET; + dev.data.net = net; + + dev_copy = virDomainDeviceDefCopy(&dev, vm->def, + caps, driver->xmlopt); + if (!dev_copy) + goto cleanup; + + if (live) { + /* nnets reduced */ + if (qemuDomainDetachNetDevice(driver, vm, dev_copy) < 0) + goto cleanup; + } else { + virDomainNetDefFree(virDomainNetRemove(vm->def, i)); + } + if (VIR_APPEND_ELEMENT(priv->ephemeralDevices, + priv->nEphemeralDevices, + dev_copy) < 0) { + 
goto cleanup; + } + dev_copy = NULL; + } + + for (i = 0; i < vm->def->nhostdevs;) { + hostdev = vm->def->hostdevs[i]; + + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS || + hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI || + !hostdev->ephemeral) { + i++; + continue; + } + + dev.type = VIR_DOMAIN_DEVICE_HOSTDEV; + dev.data.hostdev = hostdev; + + VIR_FREE(dev_copy); + dev_copy = virDomainDeviceDefCopy(&dev, vm->def, + caps, driver->xmlopt); + if (!dev_copy) + goto cleanup; + + if (live) { + /* nhostdevs reduced */ + if (qemuDomainDetachHostDevice(driver, vm, dev_copy) < 0) + goto cleanup; + } else { + virDomainHostdevDefFree(virDomainHostdevRemove(vm->def, i)); + } + if (VIR_APPEND_ELEMENT(priv->ephemeralDevices, + priv->nEphemeralDevices, + dev_copy) < 0) { + goto cleanup; + } + dev_copy = NULL; + } + + ret = 0; + cleanup: + virDomainDeviceDefFree(dev_copy); + virObjectUnref(caps); + + return ret; +} + +void +qemuMigrationRestoreEphemeralDevices(virQEMUDriverPtr driver, + virConnectPtr conn, + virDomainObjPtr vm, + bool live) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainDeviceDefPtr dev; + int ret = -1; + size_t i; + + VIR_DEBUG("Rum domain restore ephemeral devices"); + + for (i = 0; i < priv->nEphemeralDevices; i++) { + dev = priv->ephemeralDevices[i]; + + switch ((virDomainDeviceType) dev->type) { + case VIR_DOMAIN_DEVICE_NET: + if (live) { + ret = qemuDomainAttachNetDevice(conn, driver, vm, + dev->data.net); + } else { + ret = virDomainNetInsert(vm->def, dev->data.net); + } + + if (!ret) + dev->data.net = NULL; + break; + case VIR_DOMAIN_DEVICE_HOSTDEV: + if (live) { + ret = qemuDomainAttachHostDevice(conn, driver, vm, + dev->data.hostdev); + } else { + ret =virDomainHostdevInsert(vm->def, dev->data.hostdev); + } + if (!ret) + dev->data.hostdev = NULL; + break; + default: + ret = -1; + } + + if (ret == -1) + VIR_WARN("Unable to restore ephemeral device on domain %s ", + vm->def->name); + virDomainDeviceDefFree(dev); + 
} + VIR_FREE(priv->ephemeralDevices); + priv->nEphemeralDevices = 0; +} static int qemuMigrationConfirmPhase(virQEMUDriverPtr driver, @@ -3454,6 +3606,7 @@ qemuMigrationConfirmPhase(virQEMUDriverPtr driver, /* cancel any outstanding NBD jobs */ qemuMigrationCancelDriveMirror(mig, driver, vm); + qemuMigrationRestoreEphemeralDevices(driver, conn, vm, true); if (qemuMigrationRestoreDomainState(conn, vm)) { event = virDomainEventLifecycleNewFromObj(vm, @@ -4842,6 +4995,9 @@ qemuMigrationPerformJob(virQEMUDriverPtr driver, qemuMigrationStoreDomainState(vm); + if (qemuMigrationDetachEphemeralDevices(driver, vm, true) < 0) + goto endjob; + if ((flags & (VIR_MIGRATE_TUNNELLED | VIR_MIGRATE_PEER2PEER))) { ret = doPeer2PeerMigrate(driver, conn, vm, xmlin, dconnuri, uri, graphicsuri, listenAddress, @@ -4931,6 +5087,9 @@ qemuMigrationPerformPhase(virQEMUDriverPtr driver, virCloseCallbacksUnset(driver->closeCallbacks, vm, qemuMigrationCleanup); + if (qemuMigrationDetachEphemeralDevices(driver, vm, true) < 0) + goto endjob; + ret = doNativeMigrate(driver, vm, uri, cookiein, cookieinlen, cookieout, cookieoutlen, flags, resource, NULL, graphicsuri); @@ -5272,10 +5431,14 @@ qemuMigrationFinish(virQEMUDriverPtr driver, } } - if (virDomainObjIsActive(vm) && - virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) { - VIR_WARN("Failed to save status on vm %s", vm->def->name); - goto endjob; + if (virDomainObjIsActive(vm)) { + /* Check whether exist ephemeral devices, hotplug them. 
*/ + qemuMigrationRestoreEphemeralDevices(driver, dconn, vm, true); + + if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) { + VIR_WARN("Failed to save status on vm %s", vm->def->name); + goto endjob; + } } /* Guest is successfully running, so cancel previous auto destroy */ diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index 1726455..e378b30 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -177,4 +177,13 @@ int qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(5) ATTRIBUTE_RETURN_CHECK; +int +qemuMigrationDetachEphemeralDevices(virQEMUDriverPtr driver, + virDomainObjPtr vm, + bool live); +void +qemuMigrationRestoreEphemeralDevices(virQEMUDriverPtr driver, + virConnectPtr conn, + virDomainObjPtr vm, + bool live); #endif /* __QEMU_MIGRATION_H__ */ diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index d1f089d..904c447 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -4496,6 +4496,15 @@ int qemuProcessStart(virConnectPtr conn, if (qemuNetworkPrepareDevices(vm->def) < 0) goto cleanup; + /* + * Ephemeral device would be hotplugged at a later stage + * during migration. hence we should remove the reserved + * PCI address for ephemeral device. + */ + if (vmop == VIR_NETDEV_VPORT_PROFILE_OP_MIGRATE_IN_START) + if (qemuMigrationDetachEphemeralDevices(driver, vm, false) < 0) + goto cleanup; + /* Must be run before security labelling */ VIR_DEBUG("Preparing host devices"); if (!cfg->relaxedACS) @@ -5288,6 +5297,8 @@ void qemuProcessStop(virQEMUDriverPtr driver, priv->ccwaddrs = NULL; } + qemuMigrationRestoreEphemeralDevices(driver, NULL, vm, false); + qemuDomainReAttachHostDevices(driver, vm->def); def = vm->def; -- 1.9.3
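One detail worth noting in qemuMigrationDetachEphemeralDevices above is the loop that iterates without a `i++` in the for header: the index only advances when the current element is kept, because removing an element shifts the rest of the list down by one. A minimal sketch of that pattern, with illustrative names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { bool ephemeral; int id; } Dev;

/* Remove element at idx by shifting the tail left; returns the new count. */
static size_t removeAt(Dev *devs, size_t n, size_t idx)
{
    size_t j;
    for (j = idx; j + 1 < n; j++)
        devs[j] = devs[j + 1];
    return n - 1;
}

/* Detach every ephemeral device; only non-ephemeral ones advance the index,
 * matching the "nnets reduced" / "nhostdevs reduced" loops in the patch. */
static size_t detachEphemeral(Dev *devs, size_t n)
{
    size_t i = 0;
    while (i < n) {
        if (devs[i].ephemeral)
            n = removeAt(devs, n, i);   /* removed: do NOT advance i */
        else
            i++;                        /* kept: move to the next element */
    }
    return n;
}
```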

On Wed, May 13, 2015 at 11:36:30 +0800, Chen Fan wrote:
add migration support for ephemeral host devices, introduce two 'detach' and 'restore' functions to unplug/plug host devices during migration.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> --- src/qemu/qemu_migration.c | 171 ++++++++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_migration.h | 9 +++ src/qemu/qemu_process.c | 11 +++ 3 files changed, 187 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 56112f9..d5a698f 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -3384,6 +3384,158 @@ qemuMigrationPrepareDef(virQEMUDriverPtr driver, return def; }
+int +qemuMigrationDetachEphemeralDevices(virQEMUDriverPtr driver, + virDomainObjPtr vm, + bool live) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainHostdevDefPtr hostdev; + virDomainNetDefPtr net; + virDomainDeviceDef dev; + virDomainDeviceDefPtr dev_copy = NULL; + virCapsPtr caps = NULL; + int actualType; + int ret = -1; + size_t i; + + VIR_DEBUG("Rum domain detach ephemeral devices"); + + if (!(caps = virQEMUDriverGetCapabilities(driver, false))) + return ret; + + for (i = 0; i < vm->def->nnets;) { + net = vm->def->nets[i]; + + actualType = virDomainNetGetActualType(net); + if (actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV) { + i++; + continue; + } + + hostdev = virDomainNetGetActualHostdev(net); + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS || + hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI || + !hostdev->ephemeral) { + i++; + continue; + } + + dev.type = VIR_DOMAIN_DEVICE_NET; + dev.data.net = net; + + dev_copy = virDomainDeviceDefCopy(&dev, vm->def, + caps, driver->xmlopt); + if (!dev_copy) + goto cleanup; + + if (live) { + /* nnets reduced */ + if (qemuDomainDetachNetDevice(driver, vm, dev_copy) < 0) + goto cleanup;
So this is where the fun begins. qemuDomainDetachNetDevice is not designed to be called this way, since the detach API where it's used normally returns 0 in the following two cases: 1) The detach was successful, the guest removed the device 2) The detach request was successful, but the guest did not remove the device yet In the latter case you need to wait for an event to know when the device was actually removed. Since this might very well happen, the code will need to be changed to take that option into account. Please note that that step will make all the things really complicated. Peter

On Wed, May 13, 2015 at 10:36:34AM +0200, Peter Krempa wrote:
On Wed, May 13, 2015 at 11:36:30 +0800, Chen Fan wrote:
add migration support for ephemeral host devices, introduce two 'detach' and 'restore' functions to unplug/plug host devices during migration.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> --- src/qemu/qemu_migration.c | 171 ++++++++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_migration.h | 9 +++ src/qemu/qemu_process.c | 11 +++ 3 files changed, 187 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 56112f9..d5a698f 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -3384,6 +3384,158 @@ qemuMigrationPrepareDef(virQEMUDriverPtr driver, return def; }
+int +qemuMigrationDetachEphemeralDevices(virQEMUDriverPtr driver, + virDomainObjPtr vm, + bool live) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainHostdevDefPtr hostdev; + virDomainNetDefPtr net; + virDomainDeviceDef dev; + virDomainDeviceDefPtr dev_copy = NULL; + virCapsPtr caps = NULL; + int actualType; + int ret = -1; + size_t i; + + VIR_DEBUG("Rum domain detach ephemeral devices"); + + if (!(caps = virQEMUDriverGetCapabilities(driver, false))) + return ret; + + for (i = 0; i < vm->def->nnets;) { + net = vm->def->nets[i]; + + actualType = virDomainNetGetActualType(net); + if (actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV) { + i++; + continue; + } + + hostdev = virDomainNetGetActualHostdev(net); + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS || + hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI || + !hostdev->ephemeral) { + i++; + continue; + } + + dev.type = VIR_DOMAIN_DEVICE_NET; + dev.data.net = net; + + dev_copy = virDomainDeviceDefCopy(&dev, vm->def, + caps, driver->xmlopt); + if (!dev_copy) + goto cleanup; + + if (live) { + /* nnets reduced */ + if (qemuDomainDetachNetDevice(driver, vm, dev_copy) < 0) + goto cleanup;
So this is where the fun begins. qemuDomainDetachNetDevice is not designed to be called this way since the detach API where it's used normally returns 0 in the following two cases:
1) The detach was successfull, the guest removed the device 2) The detach request was successful, but guest did not remove the device yet
In the latter case you need to wait for a event to successfully know when the device was removed. Since this might very well happen the code will need to be changed to take that option into account. Please note that that step will make all the things really complicated.
Even more fun: 3) The detach request was successful, but the guest is going to ignore it forever. Really, this is not something we want to be deciding policy for inside libvirt. It is no end of trouble, and we really must let the mgmt app decide how it wants this kind of problem handled. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 05/13/2015 04:36 AM, Peter Krempa wrote:
On Wed, May 13, 2015 at 11:36:30 +0800, Chen Fan wrote:
add migration support for ephemeral host devices, introduce two 'detach' and 'restore' functions to unplug/plug host devices during migration.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> --- src/qemu/qemu_migration.c | 171 ++++++++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_migration.h | 9 +++ src/qemu/qemu_process.c | 11 +++ 3 files changed, 187 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 56112f9..d5a698f 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -3384,6 +3384,158 @@ qemuMigrationPrepareDef(virQEMUDriverPtr driver, return def; }
+int +qemuMigrationDetachEphemeralDevices(virQEMUDriverPtr driver, + virDomainObjPtr vm, + bool live) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainHostdevDefPtr hostdev; + virDomainNetDefPtr net; + virDomainDeviceDef dev; + virDomainDeviceDefPtr dev_copy = NULL; + virCapsPtr caps = NULL; + int actualType; + int ret = -1; + size_t i; + + VIR_DEBUG("Rum domain detach ephemeral devices"); + + if (!(caps = virQEMUDriverGetCapabilities(driver, false))) + return ret; + + for (i = 0; i < vm->def->nnets;) { + net = vm->def->nets[i]; + + actualType = virDomainNetGetActualType(net); + if (actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV) { + i++; + continue; + } + + hostdev = virDomainNetGetActualHostdev(net); + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS || + hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI || + !hostdev->ephemeral) { + i++; + continue; + } + + dev.type = VIR_DOMAIN_DEVICE_NET; + dev.data.net = net; + + dev_copy = virDomainDeviceDefCopy(&dev, vm->def, + caps, driver->xmlopt); + if (!dev_copy) + goto cleanup; + + if (live) { + /* nnets reduced */ + if (qemuDomainDetachNetDevice(driver, vm, dev_copy) < 0) + goto cleanup;
So this is where the fun begins. qemuDomainDetachNetDevice is not designed to be called this way since the detach API where it's used normally returns 0 in the following two cases:
1) The detach was successfull, the guest removed the device 2) The detach request was successful, but guest did not remove the device yet
In the latter case you need to wait for a event to successfully know when the device was removed.
For historical reference: omission of this bit (needing to wait for the guest to remove the device) was one of the reasons Shradha's patches couldn't be pushed.

On 05/13/2015 04:36 PM, Peter Krempa wrote:
On Wed, May 13, 2015 at 11:36:30 +0800, Chen Fan wrote:
add migration support for ephemeral host devices, introduce two 'detach' and 'restore' functions to unplug/plug host devices during migration.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> --- src/qemu/qemu_migration.c | 171 ++++++++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_migration.h | 9 +++ src/qemu/qemu_process.c | 11 +++ 3 files changed, 187 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 56112f9..d5a698f 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -3384,6 +3384,158 @@ qemuMigrationPrepareDef(virQEMUDriverPtr driver, return def; }
+int +qemuMigrationDetachEphemeralDevices(virQEMUDriverPtr driver, + virDomainObjPtr vm, + bool live) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainHostdevDefPtr hostdev; + virDomainNetDefPtr net; + virDomainDeviceDef dev; + virDomainDeviceDefPtr dev_copy = NULL; + virCapsPtr caps = NULL; + int actualType; + int ret = -1; + size_t i; + + VIR_DEBUG("Rum domain detach ephemeral devices"); + + if (!(caps = virQEMUDriverGetCapabilities(driver, false))) + return ret; + + for (i = 0; i < vm->def->nnets;) { + net = vm->def->nets[i]; + + actualType = virDomainNetGetActualType(net); + if (actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV) { + i++; + continue; + } + + hostdev = virDomainNetGetActualHostdev(net); + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS || + hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI || + !hostdev->ephemeral) { + i++; + continue; + } + + dev.type = VIR_DOMAIN_DEVICE_NET; + dev.data.net = net; + + dev_copy = virDomainDeviceDefCopy(&dev, vm->def, + caps, driver->xmlopt); + if (!dev_copy) + goto cleanup; + + if (live) { + /* nnets reduced */ + if (qemuDomainDetachNetDevice(driver, vm, dev_copy) < 0) + goto cleanup; So this is where the fun begins. qemuDomainDetachNetDevice is not designed to be called this way since the detach API where it's used normally returns 0 in the following two cases:
1) The detach was successfull, the guest removed the device 2) The detach request was successful, but guest did not remove the device yet
In the latter case you need to wait for a event to successfully know when the device was removed. Since this might very well happen the code will need to be changed to take that option into account. Please note that that step will make all the things really complicated.
Did you mean the "DEVICE_DELETED" event? I see in the code that the function qemuDomainWaitForDeviceRemoval is already used to wait for a device to be removed from the guest. Thanks, Chen
Peter

On Wed, May 13, 2015 at 11:36:30AM +0800, Chen Fan wrote:
add migration support for ephemeral host devices, introduce two 'detach' and 'restore' functions to unplug/plug host devices during migration.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com> --- src/qemu/qemu_migration.c | 171 ++++++++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_migration.h | 9 +++ src/qemu/qemu_process.c | 11 +++ 3 files changed, 187 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 56112f9..d5a698f 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c
+void +qemuMigrationRestoreEphemeralDevices(virQEMUDriverPtr driver, + virConnectPtr conn, + virDomainObjPtr vm, + bool live) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainDeviceDefPtr dev; + int ret = -1; + size_t i; + + VIR_DEBUG("Rum domain restore ephemeral devices"); + + for (i = 0; i < priv->nEphemeralDevices; i++) { + dev = priv->ephemeralDevices[i]; + + switch ((virDomainDeviceType) dev->type) { + case VIR_DOMAIN_DEVICE_NET: + if (live) { + ret = qemuDomainAttachNetDevice(conn, driver, vm, + dev->data.net); + } else { + ret = virDomainNetInsert(vm->def, dev->data.net); + } + + if (!ret) + dev->data.net = NULL; + break; + case VIR_DOMAIN_DEVICE_HOSTDEV: + if (live) { + ret = qemuDomainAttachHostDevice(conn, driver, vm, + dev->data.hostdev); + } else { + ret =virDomainHostdevInsert(vm->def, dev->data.hostdev); + }
This re-attach step is where we actually have far far far worse problems than with detach. This is blindly assuming that the guest on the target host can use the same hostdev that it was using on the source host. This is essentially useless in the real world. Even if the same vendor/model device is available on the target host, it is very unlikely to be available at the same bus/slot/function that it was on the source. It is quite likely necessary to allocate a completely different NIC, or if using SRIOV to allocate a different function. It is also not uncommon to have different vendor/models, so a completely different NIC may be required. It is impossible for libvirt to do anything sensible when picking the hostdev to use on the target host, as it does not have anywhere near enough knowledge to make a correct decision. For example, it does not know which physical network each NIC on the target host is plugged into. Even if it knew the networks, it does not know what the I/O utilization is likely to be, to be able to intelligently decide between a set of possible free NICs. In any non-trivial mgmt app, the management app itself will have this knowledge and have policies around which host device to assign to a guest given a particular set of circumstances. It may even decide not to assign a hostdev on the target and instead provide 2 or 3 or more emulated devices that could be used in bandwidth aggregation mode rather than failover mode. In OpenStack, the compute hosts don't even decide which NICs are given to which guests. This is down to an external scheduler running on a different host(s), and the compute host just hotplugs what has already been decided elsewhere. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 05/13/2015 05:57 AM, Daniel P. Berrange wrote:
On Wed, May 13, 2015 at 11:36:30AM +0800, Chen Fan wrote:
add migration support for ephemeral host devices; introduce two functions, 'detach' and 'restore', to unplug/plug host devices during migration.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 src/qemu/qemu_migration.c | 171 ++++++++++++++++++++++++++++++++++++++++++++--
 src/qemu/qemu_migration.h |   9 +++
 src/qemu/qemu_process.c   |  11 +++
 3 files changed, 187 insertions(+), 4 deletions(-)
This re-attach step is where we actually have far far far worse problems than with detach. This is blindly assuming that the guest on the target host can use the same hostdev that it was using on the source host.
(kind of pointless to comment on, since pkrempa has changed my opinion by forcing me to think about the "failure to reattach" condition, but could be useful info for others) For a <hostdev>, yes, but not for <interface type='network'> (which would point to a libvirt network pool of VFs).
This is essentially useless in the real world.
Agreed (for plain <hostdev>)
Even if the same vendor/model device is available on the target host, it is very unlikely to be available at the same bus/slot/function that it was on the source. It is quite likely neccessary to allocate a complete different NIC, or if using SRIOV allocate a different function. It is also not uncommon to have different vendor/models, so a completely different NIC may be required.
In the case of a network device, a different brand/model of NIC at a different PCI address using a different guest driver shouldn't be a problem for the guest, as long as the MAC address is the same (for a Linux guest anyway; not sure what a Windows guest would do with a NIC that had the same MAC but used a different driver). This points out the folly of trying to do migration with attached hostdevs (managed at *any* level), for anything other than SRIOV VFs (which can have their MAC address set before attach, unlike non-SRIOV NICs).
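For context, the VF pool mentioned above is libvirt's hostdev-forward network: a guest <interface type='network'> that points at such a network gets some free VF from the pool, and libvirt programs the interface's fixed MAC onto whichever VF is chosen. A minimal sketch (the PF device name and MAC address are illustrative):

```xml
<!-- Host network definition: a pool made of all VFs of one physical function -->
<network>
  <name>vf-pool</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='eth2'/>
  </forward>
</network>

<!-- Guest interface: allocated from the pool; the MAC stays stable across hosts -->
<interface type='network'>
  <source network='vf-pool'/>
  <mac address='52:54:00:6d:90:02'/>
</interface>
```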

On 05/13/2015 10:30 PM, Laine Stump wrote:
Even if the same vendor/model device is available on the target host, it is very unlikely to be available at the same bus/slot/function that it was on the source. It is quite likely neccessary to allocate a complete different NIC, or if using SRIOV allocate a different function. It is also not uncommon to have different vendor/models, so a completely different NIC may be required. In the case of a network device, a different brand/model of NIC at a different PCI address using a different guest driver shouldn't be a problem for the guest, as long as the MAC address is the same (for a Linux guest anyway; not sure what a Windows guest would do with a NIC that had the same MAC but used a different driver). This points out the folly of trying to do migration with attached hostdevs (managed at *any* level), for anything other than SRIOV VFs (which can have their MAC address set before attach, unlike non-SRIOV NICs).
So should we focus on implementing the feature that supports migration with SRIOV VFs first?
I think that is a simpler way to achieve my original goal of implementing NIC passthrough device migration, because we sometimes assign a native NIC to the guest to keep network I/O performance. Given the MAC limitation of non-SRIOV NICs, as Laine said, the cost of an SRIOV NIC is cheaper than what we are trying. Thanks, Chen

On Thu, May 14, 2015 at 10:02:39AM +0800, Chen Fan wrote:
So should we focus on implementing the feature that supports migration with SRIOV VFs first?
No, I think you should /not/ attempt to implement this in libvirt at all and instead focus on the higher level apps.

Regards, Daniel

On 2015/05/14 17:38, Daniel P. Berrange wrote:
No, I think you should /not/ attempt to implement this in libvirt at all and instead focus on the higher level apps.
Hmm, I think there are some roles which libvirt can take in the whole operation. Let me clarify how things would go with PCI passthrough + migration.

(1) the user (or high-level apps) makes a pair of PCI devices which can be replaced before/after migration.
(2) the pairs of devices on the 2 hosts are described somewhere.
(3) before starting migration, the migration initiator checks that the other side of each pair is available on the target host.
(4) unplug the PCI devices which are described as part of paired devices.
(5) migrate, checking that all PCI passthrough devices are unplugged.
(6) on success, plug in the PCI devices described as part of paired devices.
(6') on failure, plug the unplugged devices back.

I think (1) should be done by higher-level apps or the user (by hand), (2) should be a generic/VM-independent format, (3) should be checked by the migration initiator, (4) should be done by an agent in the guest, (5) should be checked by the migration initiator, and (6) should be done by an agent in the guest.

Can't libvirt take a role in (2)(3)(5)? Should all of this be done by the higher-level app? I wonder if we should have a virt tool to do this before going to the higher level.

Thanks,
-Kame

On 05/13/2015 10:02 PM, Chen Fan wrote:
So should we focus on implementing the feature that supports migration with SRIOV VFs first?
Not "at first", but "only". Adding the requirement of dealing properly with MAC address change to the guest adds a lot of complexity to that code with not much real gain. And based on my newfound realization of the horrible situation that would be created by a failure to re-attach after migration was complete (see my response to Peter Krempa yesterday), I now agree with Dan that this shouldn't be implemented in libvirt, but in the higher level management, which will be able to more easily/realistically deal with such a failure. (and by the way, I think I should apologize for leading you down the road of the ephemeral patches in response to your earlier RFC. If only I'd fully considered the post-migration re-attach failure case, and the difficulty libvirt would have recovering from that prior to Peter pointing it out so eloquently yesterday :-/)

On Wed, May 13, 2015 at 10:30:32AM -0400, Laine Stump wrote:
For a <hostdev>, yes, but not for <interface type='network'> (which would point to a libvirt network pool of VFs).
I should note that in OpenStack at least we don't ever use the libvirt <interface type='network'> feature. This is because the OpenStack scheduler needs to have better control over exactly which VFs are allocated to which guest. This code runs on a separate host, and takes into account stuff such as the NUMA affinity of the guest, the utilization of the VFs by other guests, and more besides. So even in the <interface> case this proposal is pretty limited in usefulness.

Regards, Daniel

we should save the XML information into the image header before we hot-unplug the ephemeral devices, so the XML handling is moved ahead here.

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 src/qemu/qemu_driver.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index b3263ac..86d93d2 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -3179,26 +3179,6 @@ qemuDomainSaveInternal(virQEMUDriverPtr driver, virDomainPtr dom,
     priv->job.current->type = VIR_DOMAIN_JOB_UNBOUNDED;
 
-    /* Pause */
-    if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
-        was_running = true;
-        if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_SAVE,
-                                QEMU_ASYNC_JOB_SAVE) < 0)
-            goto endjob;
-
-        if (!virDomainObjIsActive(vm)) {
-            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-                           _("guest unexpectedly quit"));
-            goto endjob;
-        }
-    }
-
-    /* libvirt.c already guaranteed these two flags are exclusive. */
-    if (flags & VIR_DOMAIN_SAVE_RUNNING)
-        was_running = true;
-    else if (flags & VIR_DOMAIN_SAVE_PAUSED)
-        was_running = false;
-
     /* Get XML for the domain. Restore needs only the inactive xml,
      * including secure. We should get the same result whether xmlin
      * is NULL or whether it was the live xml of the domain moments
@@ -3225,6 +3205,26 @@ qemuDomainSaveInternal(virQEMUDriverPtr driver, virDomainPtr dom,
         goto endjob;
     }
 
+    /* Pause */
+    if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
+        was_running = true;
+        if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_SAVE,
+                                QEMU_ASYNC_JOB_SAVE) < 0)
+            goto endjob;
+
+        if (!virDomainObjIsActive(vm)) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("guest unexpectedly quit"));
+            goto endjob;
+        }
+    }
+
+    /* libvirt.c already guaranteed these two flags are exclusive. */
+    if (flags & VIR_DOMAIN_SAVE_RUNNING)
+        was_running = true;
+    else if (flags & VIR_DOMAIN_SAVE_PAUSED)
+        was_running = false;
+
     ret = qemuDomainSaveMemory(driver, vm, path, xml, compressed,
                                was_running, flags, QEMU_ASYNC_JOB_SAVE);
     if (ret < 0)
-- 
1.9.3

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
---
 src/qemu/qemu_driver.c  | 8 ++++++++
 src/qemu/qemu_process.c | 3 ++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 86d93d2..112acb1 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -3208,6 +3208,10 @@ qemuDomainSaveInternal(virQEMUDriverPtr driver, virDomainPtr dom,
     /* Pause */
     if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
         was_running = true;
+        /* Detach ephemeral host devices first */
+        if (qemuMigrationDetachEphemeralDevices(driver, vm, true) < 0)
+            goto endjob;
+
         if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_SAVE,
                                 QEMU_ASYNC_JOB_SAVE) < 0)
             goto endjob;
@@ -3249,6 +3253,8 @@ qemuDomainSaveInternal(virQEMUDriverPtr driver, virDomainPtr dom,
                                              VIR_DOMAIN_EVENT_SUSPENDED,
                                              VIR_DOMAIN_EVENT_SUSPENDED_API_ERROR);
         }
+        qemuMigrationRestoreEphemeralDevices(driver, dom->conn, vm, true);
+
         virSetError(save_err);
         virFreeError(save_err);
     }
@@ -6404,6 +6410,8 @@ qemuDomainSaveImageStartVM(virConnectPtr conn,
     if (event)
         qemuDomainEventQueue(driver, event);
 
+    /* Restore ephemeral devices */
+    qemuMigrationRestoreEphemeralDevices(driver, NULL, vm, true);
 
     /* If it was running before, resume it now unless caller requested pause. */
     if (header->was_running && !start_paused) {
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 904c447..6519477 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -4501,7 +4501,8 @@ int qemuProcessStart(virConnectPtr conn,
      * during migration. hence we should remove the reserved
      * PCI address for ephemeral device.
      */
-    if (vmop == VIR_NETDEV_VPORT_PROFILE_OP_MIGRATE_IN_START)
+    if (vmop == VIR_NETDEV_VPORT_PROFILE_OP_MIGRATE_IN_START ||
+        vmop == VIR_NETDEV_VPORT_PROFILE_OP_RESTORE)
         if (qemuMigrationDetachEphemeralDevices(driver, vm, false) < 0)
             goto cleanup;
-- 
1.9.3

On Wed, May 13, 2015 at 11:36:26 +0800, Chen Fan wrote:
my main goal is to add support migration with host NIC passthrough devices and keep the network connectivity.
this series patch base on Shradha's patches on https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html which is add migration support for host passthrough devices.
1) unplug the ephemeral devices before migration
2) do native migration
3) when migration finished, hotplug the ephemeral devices
IMHO this algorithm is something that an upper layer management app should do. The device unplug operation is complex and it might not succeed, which will make the current migration thread hang or fail in an intermediate state that will not be recoverable. I'll point the places out in the actual patches.
TODO: keep network connectivity on guest level by bonding device.
This is out of scope for libvirt since we don't really know what is running inside the vm.
Chen Fan (6): conf: add ephemeral element for hostdev supporting migration qemu: Save ephemeral devices into qemuDomainObjPrivate qemu: add check ephemeral devices only for PCI host devices migration: Migration support for ephemeral hostdevs managedsave: move the domain xml handling forward to stop CPU managedsave: add managedsave support for ephemeral host devices
Peter

* Peter Krempa (pkrempa@redhat.com) wrote:
IMHO this algorithm is something that an upper layer management app should do. The device unplug operation is complex and it might not succeed which will make the current migration thread hang or fail in an intermediate state that will not be recoverable.
However, you wouldn't want each of the upper layer management apps implementing their own hacks for this; something somewhere needs to standardise what the guest sees. Dave
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

On Wed, May 13, 2015 at 09:08:39 +0100, Dr. David Alan Gilbert wrote:
However you wouldn't want each of the upper layer management apps implementing their own hacks for this; so something somewhere needs to standardise what the guest sees.
The guest will still see a PCI device unplug request and will have to respond to it, then will be paused, and after resume a new PCI device will appear. This is standardised. The non-standardised part (which can't really be standardised) is how the bonding or other guest-dependent stuff will be handled, but that is up to the guest OS to handle.

From libvirt's perspective this is only something that will trigger the device unplug and plug the devices back. And there are a lot of issues here:

1) The destination of the migration might not have the desired devices. This will trigger a lot of problems, as we will not be able to guarantee that the devices reappear on the destination, and if we wanted to check we'd need a new migration protocol AFAIK.

2) The guest OS might refuse to detach the PCI device (it might be stuck before the PCI code is loaded). In that case the migration will be stuck forever, and abort attempts will leave the domain state basically undefined depending on the phase where it failed.

Since we can't guarantee that the unplug of the PCI host devices will be atomic or that it will succeed, we basically can't guarantee in any way which state the VM will end up in after a (possibly failed) migration. There are too many options for recovering such a state that could be desired by the user, and they would be hard to implement in a way that is flexible enough.

Peter

* Peter Krempa (pkrempa@redhat.com) wrote:
The guest will still see a PCI device unplug request and will have to respond to it; then it will be paused, and after resume a new PCI device will appear. This is standardised. The non-standardised part (which can't really be standardised) is how the bonding or other guest-dependent setup will be handled, but that is up to the guest OS to handle.
Why can't that be standardised? Don't we need to provide the information on what to bond to the guest and that this process is happening? The previous suggestion was to use guest-agent for this.
From libvirt's perspective this is only something that will trigger the device unplug and plug the devices back. And there are a lot of issues here:
1) the destination of the migration might not have the desired devices
This will trigger a lot of problems as we will not be able to guarantee that the devices reappear on the destination and if we'd wanted to check we'd need a new migration protocol AFAIK.
But if it's using the bonding trick then that isn't fatal; it would still be able to have the bonded virtio device.
2) The guest OS might refuse to detach the PCI device (it might be stuck before PCI code is loaded)
In that case the migration will be stuck forever and abort attempts will make the domain state basically undefined depending on the phase where it failed.
Since we can't guarantee that the unplug of the PCI host devices will be atomic or that it will succeed, we basically can't guarantee in which state the VM will end up after a (possibly failed) migration. To recover such a state there are too many options that could be desired by the user, which would be hard to implement in a way that is flexible enough.
I don't understand why this is any different to any other PCI device hot-unplug. Dave
Peter
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
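The three-step flow from the cover letter (detach the ephemeral hostdevs, do a native migration, re-attach), driven from an upper-layer management app as Peter suggests, can be sketched against the libvirt-python virDomain/virConnect interfaces. This is only a sketch: the function name and the hostdev XML snippets are hypothetical, the flag values are inlined copies of the libvirt constants so the snippet is self-contained, and it deliberately glosses over the asynchronous-unplug and error-recovery problems this thread is about.

```python
# Inlined copies of libvirt constants (normally "import libvirt" and use
# libvirt.VIR_DOMAIN_AFFECT_LIVE / libvirt.VIR_MIGRATE_LIVE).
VIR_DOMAIN_AFFECT_LIVE = 1
VIR_MIGRATE_LIVE = 1


def migrate_with_ephemeral_hostdevs(dom, dest_conn, dest_uri, hostdev_xmls):
    """Detach passthrough devices, live-migrate, re-attach on the destination.

    dom and dest_conn follow the libvirt-python virDomain/virConnect APIs.
    hostdev_xmls is a list of <hostdev> XML snippets for the ephemeral
    devices. Returns (new_domain, xmls_that_failed_to_reattach).
    """
    for xml in hostdev_xmls:
        # Asynchronous: the request is sent to the guest, which may or may
        # not respond -- the crux of the problem discussed in this thread.
        dom.detachDeviceFlags(xml, VIR_DOMAIN_AFFECT_LIVE)

    # Step 2: native (hypervisor-level) migration without the hostdevs.
    new_dom = dom.migrate(dest_conn, VIR_MIGRATE_LIVE, None, dest_uri, 0)

    failed = []
    for xml in hostdev_xmls:
        try:
            new_dom.attachDeviceFlags(xml, VIR_DOMAIN_AFFECT_LIVE)
        except Exception:  # libvirt.libvirtError in real code
            # Device absent or busy on the destination: the guest keeps
            # running on its bonded virtio NIC, but the management app
            # must surface the degraded state to the user.
            failed.append(xml)
    return new_dom, failed
```

A real implementation would register for the VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event and wait (with a timeout) for every detach to actually complete before starting the migration, since the guest may never acknowledge the unplug request.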

On Wed, May 13, 2015 at 09:40:23 +0100, Dr. David Alan Gilbert wrote:
* Peter Krempa (pkrempa@redhat.com) wrote:
On Wed, May 13, 2015 at 09:08:39 +0100, Dr. David Alan Gilbert wrote:
* Peter Krempa (pkrempa@redhat.com) wrote:
[...]
Why can't that be standardised? Don't we need to provide the information on what to bond to the guest and that this process is happening? The previous suggestion was to use guest-agent for this.
Well, in Linux alone you've got multiple ways to do that, including legacy init scripts on various distros, the systemd-networkd thingie (or however it's called), or NetworkManager, so standardising this part won't be that easy. Not to speak of possibly different OSes.
[...]
I don't understand why this is any different to any other PCI device hot-unplug.
It's the same, but once libvirt would be doing multiple PCI unplug requests along with the migration code, things might not go well. If you then couple this with different user expectations of what should happen in various error cases, it gets even more messy.

Peter

* Peter Krempa (pkrempa@redhat.com) wrote:
On Wed, May 13, 2015 at 09:40:23 +0100, Dr. David Alan Gilbert wrote:
* Peter Krempa (pkrempa@redhat.com) wrote:
On Wed, May 13, 2015 at 09:08:39 +0100, Dr. David Alan Gilbert wrote:
* Peter Krempa (pkrempa@redhat.com) wrote:
[...]
Why can't that be standardised? Don't we need to provide the information on what to bond to the guest and that this process is happening? The previous suggestion was to use guest-agent for this.
Well, in Linux alone you've got multiple ways to do that, including legacy init scripts on various distros, the systemd-networkd thingie (or however it's called), or NetworkManager, so standardising this part won't be that easy. Not to speak of possibly different OSes.
Right - so we need to standardise the messaging we send to the guest to tell it that we've got this bonded hotplug setup, and then the different OSes can implement what they need using that information.
[...]
It's the same, but once libvirt would be doing multiple PCI unplug requests along with the migration code, things might not go well. If you then couple this with different user expectations of what should happen in various error cases, it gets even more messy.
Well, since we've got the bond it shouldn't get quite that bad; the error cases don't sound that bad:

1) If we can't hot-unplug, then we don't migrate / we cancel the migration. We warn the user; if we're unlucky we're left running on the bond.

2) If we can't hot-plug at the end, then we've still got the bond in, so the guest carries on running (albeit with reduced performance). We need to flag this to the user somehow.

Dave
Peter
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
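The "bonding trick" referred to throughout this thread is an active-backup bond inside the guest, with the emulated virtio NIC as the always-present slave and the passthrough VF as the preferred one. A minimal guest-side sketch using iproute2 on Linux; the interface names (eth0 for the virtio NIC, eth1 for the VF) are hypothetical, and a real guest would do this via its own network configuration system:

```shell
# Hypothetical names: eth0 = virtio-net (always present),
# eth1 = passthrough VF (ephemeral, disappears during migration)
ip link add bond0 type bond mode active-backup miimon 100
ip link set eth0 down && ip link set eth0 master bond0
ip link set eth1 down && ip link set eth1 master bond0
# Prefer the VF whenever it is present; fail over to virtio on unplug
echo eth1 > /sys/class/net/bond0/bonding/primary
ip link set bond0 up
```

When the VF is hot-unplugged before migration, the bond fails over to the virtio path, so connectivity survives; after a new VF is hotplugged on the destination, the guest's hotplug machinery must re-enslave it for traffic to move back to the fast path.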

On Wed, May 13, 2015 at 10:00:42AM +0100, Dr. David Alan Gilbert wrote:
[...]
Well, since we've got the bond it shouldn't get quite that bad; the error cases don't sound that bad:

1) If we can't hot-unplug, then we don't migrate / we cancel the migration. We warn the user; if we're unlucky we're left running on the bond.

2) If we can't hot-plug at the end, then we've still got the bond in, so the guest carries on running (albeit with reduced performance). We need to flag this to the user somehow.
If there are multiple PCI devices attached to the guest, we may end up with some PCI devices removed and some still present, and some for which we don't know if they are removed or present at all, as the guest may simply not have responded to us yet. Further, there are devices which are not just bonded NICs, so I'm really not happy for us to design a policy that works for bonded NICs but which is quite possibly going to be useless for other types of PCI device people will inevitably want to deal with later.

Regards, Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
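Peter's issue 1) above (and the partial-removal worry) is why an upper layer would normally pre-check the destination before even starting. A hedged sketch against the libvirt node-device API (https://libvirt.org/formatnode.html): the helper name and the vendor/product IDs are hypothetical, and the capability-flag value is an inlined copy of libvirt's VIR_CONNECT_LIST_NODE_DEVICES_CAP_PCI_DEV that should be verified against your binding.

```python
import xml.etree.ElementTree as ET

# Inlined copy of libvirt.VIR_CONNECT_LIST_NODE_DEVICES_CAP_PCI_DEV
# (an assumption -- verify against your libvirt-python version).
CAP_PCI_DEV = 1 << 1


def dest_has_pci_device(conn, vendor_id, product_id):
    """Return True if the destination connection exposes a PCI node device
    with the given vendor/product IDs (hex strings such as '0x8086').

    conn follows the libvirt-python virConnect interface; the node-device
    XML layout matches libvirt's formatnode documentation.
    """
    for dev in conn.listAllDevices(CAP_PCI_DEV):
        cap = ET.fromstring(dev.XMLDesc(0)).find("./capability[@type='pci']")
        if cap is None:
            continue
        vendor = cap.find("vendor")
        product = cap.find("product")
        if (vendor is not None and product is not None
                and vendor.get("id") == vendor_id
                and product.get("id") == product_id):
            return True
    return False
```

Even with such a check the device can vanish between the check and the re-attach, which is exactly why the thread concludes the recovery policy belongs in the management app rather than in libvirt.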

* Daniel P. Berrange (berrange@redhat.com) wrote:
[...]
Well, since we've got the bond it shouldn't get quite that bad; the error cases don't sound that bad:

1) If we can't hot-unplug, then we don't migrate / we cancel the migration. We warn the user; if we're unlucky we're left running on the bond.

2) If we can't hot-plug at the end, then we've still got the bond in, so the guest carries on running (albeit with reduced performance). We need to flag this to the user somehow.
If there are multiple PCI devices attached to the guest, we may end up with some PCI devices removed and some still present, and some for which we don't know if they are removed or present at all as the guest may simply not have responded to us yet. Further there are devices which are not just bonded NICs, so I'm really not happy for us to design a policy that works for bonded NICs but which is quite possibly going to be useless for other types of PCI device people will inevitably want to deal with later.
This is only trying to address the problem for devices that can have the equivalent of a bond, so it's not NIC-specific; the same should work for storage devices with multipath.

Dave
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

On Wed, May 13, 2015 at 09:40:23AM +0100, Dr. David Alan Gilbert wrote:
* Peter Krempa (pkrempa@redhat.com) wrote:
On Wed, May 13, 2015 at 09:08:39 +0100, Dr. David Alan Gilbert wrote:
[...]
However you wouldn't want each of the upper layer management apps implementing their own hacks for this; so something somewhere needs to standardise what the guest sees.
The guest will still see a PCI device unplug request and will have to respond to it; then it will be paused, and after resume a new PCI device will appear. This is standardised. The non-standardised part (which can't really be standardised) is how the bonding or other guest-dependent setup will be handled, but that is up to the guest OS to handle.
Why can't that be standardised? Don't we need to provide the information on what to bond to the guest and that this process is happening? The previous suggestion was to use guest-agent for this.
When we've had "standardized" policy decisions in libvirt before, it has come back to bite us later. The block migration code is a prime example. Someone decided that the standardized behaviour should be that block migrate skips readonly disks. Everything looked great for a while, and then a new use case came up in OpenStack for which this standardized behaviour is no longer suitable. Now we have to abandon this standardized policy and implement what we actually want in OpenStack itself. This is the key problem with applying policy decisions inside libvirt, and thus why our focus is on providing the mechanisms on which applications should build the policies they specifically desire.

Regards, Daniel

On 05/13/2015 04:28 AM, Peter Krempa wrote:
On Wed, May 13, 2015 at 09:08:39 +0100, Dr. David Alan Gilbert wrote:
* Peter Krempa (pkrempa@redhat.com) wrote:
On Wed, May 13, 2015 at 11:36:26 +0800, Chen Fan wrote:
[...]
IMHO this algorithm is something that an upper layer management app should do. The device unplug operation is complex and it might not succeed which will make the current migration thread hang or fail in an intermediate state that will not be recoverable.
However you wouldn't want each of the upper layer management apps implementing their own hacks for this; so something somewhere needs to standardise what the guest sees.
The guest will still see a PCI device unplug request and will have to respond to it; then it will be paused, and after resume a new PCI device will appear. This is standardised. The non-standardised part (which can't really be standardised) is how the bonding or other guest-dependent setup will be handled, but that is up to the guest OS to handle.
From libvirt's perspective this is only something that will trigger the device unplug and plug the devices back. And there are a lot of issues here:
1) the destination of the migration might not have the desired devices
This will trigger a lot of problems as we will not be able to guarantee that the devices reappear on the destination and if we'd wanted to check we'd need a new migration protocol AFAIK.
2) The guest OS might refuse to detach the PCI device (it might be stuck before PCI code is loaded)
In that case the migration will be stuck forever and abort attempts will make the domain state basically undefined depending on the phase where it failed.
Since we can't guarantee that the unplug of the PCI host devices will be atomic or that it will succeed, we basically can't guarantee in which state the VM will end up after a (possibly failed) migration. To recover such a state there are too many options that could be desired by the user, which would be hard to implement in a way that is flexible enough.
In the past I've been on the side of having libvirt automatically do the device detach and reattach (but definitely on the side of the guest agent and libvirt keeping their hands off of network configuration in the guest), with the thinking that 1) libvirt is in a well situated spot to do it, and 2) this would eliminate duplicate code in the upper level management.

However, Peter's points above made me consider the failure cases more closely, in particular this one:

* the destination claims to have the resources required (right type of PCI device, enough RAM), so migration is started.
* device detached on source, guest memory migrated to destination.
* guest started - no problems. (At this point, since the guest has been restarted, it's not really possible for libvirt to fail the migration in a recoverable manner (unless you want to implement some sort of "unmigration" so that the guest state on the source is updated with whatever execution occurred on the destination, and I don't think *anyone* wants to go there.))
* libvirt finds the device still available and attempts to attach it, but (for some odd reason) fails.

Now libvirt can't tell the application that the migration has succeeded, because it didn't (unless the device was marked as "optional"), but it also can't fail the migration except to say "this is such a monumental failure that your guest has simply died".

If, on the other hand, the detach and re-attach are implemented in a higher layer (oVirt/OpenStack), they will at least have the guest in a state they can deal with - it won't be pretty, but they could for example migrate the guest to another host (maybe back to the source) and re-attach there.

So this one message from Peter has nicely pointed out the error in my thinking, and I now agree that auto-detach/reattach shouldn't be implemented in libvirt - it would work nicely in an error-free world, but would crumble in the face of some errors.

(I just wish I had considered the particular failure mode above a year or two ago, so I could have been more discouraging in my emails then :-)

* Laine Stump (laine@redhat.com) wrote:
On 05/13/2015 04:28 AM, Peter Krempa wrote:
[...]
In the past I've been on the side of having libvirt automatically do the device detach and reattach (but definitely on the side of the guest agent and libvirt keeping their hands off of network configuration in the guest), with the thinking that 1) libvirt is in a well situated spot to do it, and 2) this would eliminate duplicate code in the upper level management.
However, Peter's points above made me consider the failure cases more closely, in particular this one:
* the destination claims to have the resources required (right type of PCI device, enough RAM), so migration is started.
* device detached on source, guest memory migrated to destination,
* guest started - no problems. (At this point, since the guest has been restarted, it's not really possible for libvirt to fail the migration in a recoverable manner (unless you want to implement some sort of "unmigration" so that the guest state on the source is updated with whatever execution occurred on the destination, and I don't think *anyone* wants to go there))
* libvirt finds the device still available and attempts to attach it but (for some odd reason) fails.
Now libvirt can't tell the application that the migration has succeeded, because it didn't (unless the device was marked as "optional"), but it also can't fail the migration except to say "this is such a monumental failure that your guest has simply died".
If, on the other hand, the detach and re-attach are implemented in a higher layer (ovirt/openstack), they will at least have the guest in a state they can deal with - it won't be pretty, but they could for example migrate the guest to another host (maybe back to the source) and re-attach there.
So this one message from Peter has nicely pointed out the error in my thinking, and I now agree that auto-detach/reattach shouldn't be implemented in libvirt - it would work nicely in an error free world, but would crumble in the face of some errors. (I just wish I had considered the particular failure mode above a year or two ago, so I could have been more discouraging in my emails then :-)
It's a shame to limit the utility of this by dealing with an error case that's not a fatal error. Does libvirt not have a way of dealing with non-fatal errors?

Dave

-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

On 05/13/2015 10:42 AM, Dr. David Alan Gilbert wrote:
* Laine Stump (laine@redhat.com) wrote:
On 05/13/2015 04:28 AM, Peter Krempa wrote:
On Wed, May 13, 2015 at 09:08:39 +0100, Dr. David Alan Gilbert wrote:
* Peter Krempa (pkrempa@redhat.com) wrote:
On Wed, May 13, 2015 at 11:36:26 +0800, Chen Fan wrote:
my main goal is to add support for migration with host NIC passthrough devices while keeping network connectivity.
this patch series is based on Shradha's patches at https://www.redhat.com/archives/libvir-list/2012-November/msg01324.html which add migration support for host passthrough devices.
1) unplug the ephemeral devices before migration
2) do native migration
3) when migration finished, hotplug the ephemeral devices
IMHO this algorithm is something that an upper layer management app should do. The device unplug operation is complex and it might not succeed which will make the current migration thread hang or fail in an intermediate state that will not be recoverable.
However you wouldn't want each of the upper layer management apps implementing their own hacks for this; so something somewhere needs to standardise what the guest sees.
The guest will still see a PCI device unplug request and will have to respond to it, then will be paused, and after resume a new PCI device will appear. This is standardised. The non-standardised part (which can't really be standardised) is how the bonding or other guest-dependent stuff will be handled, but that is up to the guest OS to handle.
From libvirt's perspective this is only something that will trigger the device unplug and plug the devices back. And there are a lot of issues here:
1) the destination of the migration might not have the desired devices
This will trigger a lot of problems as we will not be able to guarantee that the devices reappear on the destination and if we'd wanted to check we'd need a new migration protocol AFAIK.
2) The guest OS might refuse to detach the PCI device (it might be stuck before PCI code is loaded)
In that case the migration will be stuck forever and abort attempts will make the domain state basically undefined depending on the phase where it failed.
Since we can't guarantee that the unplug of the PCI host devices will be atomic or that it will succeed, we basically can't guarantee in any way in which state the VM will end up after a (possibly failed) migration. To recover such a state there are too many options that could be desired by the user, which would be hard to implement in a way that is flexible enough.
It's a shame to limit the utility of this by dealing with an error case that's not a fatal error. Does libvirt not have a way of dealing with non-fatal errors?
But is it non-fatal? Dan's point is that it isn't up to libvirt to decide. In the case of attached USB devices, there is an attribute called startupPolicy which can be set to "mandatory", "requisite" or "optional". The first would cause a failure of the migration if the device wasn't present on the destination of the migrate, while the other two would result in the device simply not being present on the destination. But USB works differently from PCI - I don't think it even detaches the device from the guest - so it doesn't have the same problems as a PCI device.

Although libvirt can reserve the device on the destination before the migration starts, once the guest CPUs have been restarted there is currently "no going back". The only options would be 1) fail the migration and kill the guest on the destination (is there even a state for this?), or 2) implement new code to stop the CPUs and migrate the new memory state back to the source, restart the CPUs on the source, and report the migration as failed (not implemented, and wouldn't be very pretty).

We *could* just unilaterally decide that all PCI assigned devices are "optional" on the destination, and report the migration as a success (just without the device being attached), but that is getting into the territory of "libvirt making policy decisions" as discussed by Dan.
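For reference, the startupPolicy attribute mentioned above lives on the <source> element of a USB hostdev in the domain XML. A minimal sketch (the vendor/product IDs here are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Real libvirt syntax for a USB hostdev: startupPolicy on <source> controls
# what happens when the device is missing - 'mandatory' fails the operation,
# 'requisite' fails only cold boot, 'optional' silently drops the device.
# The vendor/product IDs below are placeholders, not a real device.
usb_hostdev = """
<hostdev mode='subsystem' type='usb'>
  <source startupPolicy='optional'>
    <vendor id='0x1234'/>
    <product id='0xbeef'/>
  </source>
</hostdev>
"""

policy = ET.fromstring(usb_hostdev).find('source').get('startupPolicy')
```

With 'optional', a migration to a host lacking the device succeeds and the guest simply loses the device - exactly the behaviour the PCI case currently has no equivalent for.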

* Laine Stump (laine@redhat.com) wrote:
We *could* just unilaterally decide that all PCI assigned devices are "optional" on the destination, and report the migration as a success (just without the device being attached), but that is getting into the territory of "libvirt making policy decisions" as discussed by Dan.
I don't see it as policy; it's just that we only have a good solution for the "optional" case. It's actually not the mechanics of doing the hot-add/remove that worry me; getting a higher layer to do those is kind of OK. What I'm more worried about is standardising the mechanism to let the guest know about the pairs of devices, including when adding a new device. Since that requires some guest cooperation, I wouldn't want the guest cooperation to have to be dependent on which higher-level management system is used.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

On Wed, May 13, 2015 at 10:05:24AM +0200, Peter Krempa wrote:
IMHO this algorithm is something that an upper layer management app should do. The device unplug operation is complex and it might not succeed which will make the current migration thread hang or fail in an intermediate state that will not be recoverable.
Agreed, that's what I have said in response to this suggestion many times before. This kind of thing really falls into the realm of usage policy, and we've long said that libvirt should focus on providing the /mechanism/ and leave usage policy up to the management application. There are many possible policies, and libvirt should not be trying to decide which is best for all applications.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
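For context, one plausible shape of the mechanism this series proposes - the cover letter adds an ephemeral flag for hostdevs - is sketched below. This is a guess at the syntax based on the cover letter, not confirmed against the actual patches:

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch: a PCI hostdev marked ephemeral, so libvirt would
# unplug it before migration and replug it afterwards. The exact placement
# of the flag (attribute vs child element) is an assumption; the PCI
# address is a placeholder.
pci_hostdev = """
<hostdev mode='subsystem' type='pci' managed='yes' ephemeral='yes'>
  <source>
    <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/>
  </source>
</hostdev>
"""

ephemeral = ET.fromstring(pci_hostdev).get('ephemeral')
```

Under Dan's mechanism/policy split, such a flag is the part libvirt could own; deciding what to do when the replug fails would stay with the management application.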
participants (7): Chen Fan, Daniel P. Berrange, Dr. David Alan Gilbert, Kamezawa Hiroyuki, Laine Stump, Laine Stump, Peter Krempa