[libvirt] [PATCH 0/8] Hostdev-hybrid patches

This patch series adds the support for interface-type="hostdev-hybrid" and forward mode="hostdev-hybrid". The hostdev-hybrid mode makes migration possible along with PCI-passthrough. I had posted a RFC on the hostdev-hybrid methodology earlier on the libvirt mailing list. The RFC can be found here: https://www.redhat.com/archives/libvir-list/2012-February/msg00309.html Shradha Shah (8): RNG updates, new xml parser/formatter code for interface type=hostdev-hybrid RNG updates, new xml parser/formatter code for forward mode=hostdev-hybrid Hostdev-hybrid mode requires a direct linkdev and direct mode. ActualParent is used to store the information about the NETDEV. network: support hostdev-hybrid in network driver qemu: support netdevs from hostdev-hybrid networks Using the Ephemeral Flag to prepare for Migration Support. Migration support for hostdev-hybrid. docs/formatdomain.html.in | 29 ++++ docs/formatnetwork.html.in | 46 ++++++ docs/schemas/domaincommon.rng | 50 ++++++ docs/schemas/network.rng | 1 + include/libvirt/libvirt.h.in | 1 + src/conf/domain_conf.c | 167 +++++++++++++++++--- src/conf/domain_conf.h | 8 + src/conf/network_conf.c | 9 +- src/conf/network_conf.h | 1 + src/libvirt_private.syms | 1 + src/network/bridge_driver.c | 123 +++++++++++++-- src/qemu/qemu_command.c | 61 +++++++ src/qemu/qemu_domain.c | 6 +- src/qemu/qemu_domain.h | 3 +- src/qemu/qemu_driver.c | 6 +- src/qemu/qemu_hostdev.c | 158 +++++++++++++------ src/qemu/qemu_hotplug.c | 26 +++- src/qemu/qemu_migration.c | 106 ++++++++++++- src/qemu/qemu_process.c | 3 +- src/uml/uml_conf.c | 5 + src/util/pci.c | 2 +- src/util/pci.h | 2 + src/util/virnetdev.c | 40 +++++ src/util/virnetdev.h | 6 + src/xenxs/xen_sxpr.c | 1 + tests/networkxml2xmlin/hostdev-hybrid-pf.xml | 7 + tests/networkxml2xmlin/hostdev-hybrid.xml | 10 ++ tests/networkxml2xmlout/hostdev-hybrid-pf.xml | 7 + tests/networkxml2xmlout/hostdev-hybrid.xml | 10 ++ tests/networkxml2xmltest.c | 2 + .../qemuxml2argv-net-hostdevhybrid.args | 8 + .../qemuxml2argv-net-hostdevhybrid.xml | 35 ++++ tests/qemuxml2argvtest.c | 2 + .../qemuxml2xmlout-net-hostdevhybrid.xml | 40 +++++ tests/qemuxml2xmltest.c | 1 + 35 files changed, 884 insertions(+), 99 deletions(-) create mode 100644 tests/networkxml2xmlin/hostdev-hybrid-pf.xml create mode 100644 tests/networkxml2xmlin/hostdev-hybrid.xml create mode 100644 tests/networkxml2xmlout/hostdev-hybrid-pf.xml create mode 100644 tests/networkxml2xmlout/hostdev-hybrid.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-net-hostdevhybrid.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-net-hostdevhybrid.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-net-hostdevhybrid.xml -- 1.7.4.4

This patch introduces the new interface type='hostdev-hybrid' along with attribute managed Includes updates to the domain RNG and new xml parser/formatter code. Also introduces a ephemeral tag for hybrid hostdevs. The ephemeral tag for hybrid hostdevs will be useful for live migration support at a later stage. --- docs/formatdomain.html.in | 29 ++++++ docs/schemas/domaincommon.rng | 50 ++++++++++ src/conf/domain_conf.c | 96 +++++++++++++++++--- src/conf/domain_conf.h | 2 + src/uml/uml_conf.c | 5 + src/xenxs/xen_sxpr.c | 1 + .../qemuxml2argv-net-hostdevhybrid.args | 8 ++ .../qemuxml2argv-net-hostdevhybrid.xml | 35 +++++++ tests/qemuxml2argvtest.c | 2 + .../qemuxml2xmlout-net-hostdevhybrid.xml | 40 ++++++++ tests/qemuxml2xmltest.c | 1 + 11 files changed, 256 insertions(+), 13 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 503685f..70cf362 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -2659,6 +2659,20 @@ guest instead of <interface type='hostdev'/>. </p> + <p> + Libvirt later than 0.10.0 also supports "intelligent passthrough" + of VF in the hybrid mode. This is done by using the <interface + type='hostdev-hybrid'/> functionality. Similar to <interface + type='hostdev'/> the device's MAC address is first optionally + configured and the device is optionally associated with an 802.1Qbh + capable switch using an optionally specified <virtualport> + element (see the examples of virtualport given above for + type='direct' network devices). The Vf is passed into the guest as + a PCI device and at the same time a virtual interface with + type='direct' mode='bridge' is created in the guest. This hybrid mode + of intelligent passthrough makes Live migration possible. + </p> + <pre> ... <devices> @@ -2674,6 +2688,21 @@ </devices> ...</pre> +<pre> + ... + <devices> + <interface type='hostdev-hybrid'> + <source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> + </source> + <mac address='52:54:00:6d:90:02'> + <virtualport type='802.1Qbh'> + <parameters profileid='finance'/> + </virtualport> + </interface> + </devices> + ...</pre> + <h5><a name="elementsNICSMulticast">Multicast tunnel</a></h5> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index c2c6184..eedc255 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -1677,6 +1677,56 @@ <ref name="interface-options"/> </interleave> </group> + <group> + <attribute name="type"> + <value>hostdev-hybrid</value> + </attribute> + <optional> + <attribute name="managed"> + <choice> + <value>yes</value> + <value>no</value> + </choice> + </attribute> + </optional> + <interleave> + <element name="source"> + <choice> + <group> + <ref name="usbproduct"/> + <optional> + <ref name="usbaddress"/> + </optional> + </group> + <element name="address"> + <choice> + <group> + <attribute name="type"> + <value>pci</value> + </attribute> + <ref name="pciaddress"/> + </group> + <group> + <attribute name="type"> + <value>usb</value> + </attribute> + <attribute name="bus"> + <ref name="usbAddr"/> + </attribute> + <attribute name="device"> + <ref name="usbPort"/> + </attribute> + </group> + </choice> + </element> + </choice> + </element> + <optional> + <ref name="virtualPortProfile"/> + </optional> + <ref name="interface-options"/> + </interleave> + </group> </choice> </element> </define> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 8952b69..d8ab40c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -293,7 +293,8 @@ VIR_ENUM_IMPL(virDomainNet, VIR_DOMAIN_NET_TYPE_LAST, "bridge", "internal", "direct", - "hostdev") + "hostdev", + "hostdev-hybrid") VIR_ENUM_IMPL(virDomainNetBackend, VIR_DOMAIN_NET_BACKEND_TYPE_LAST, "default", @@ -1020,6 +1021,9 @@ virDomainActualNetDefFree(virDomainActualNetDefPtr def) case VIR_DOMAIN_NET_TYPE_HOSTDEV: virDomainHostdevDefClear(&def->data.hostdev.def); break; + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: + virDomainHostdevDefClear(&def->data.hostdev.def); + break; default: break; } @@ -1072,6 +1076,10 @@ void virDomainNetDefFree(virDomainNetDefPtr def) virDomainHostdevDefClear(&def->data.hostdev.def); break; + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: + virDomainHostdevDefClear(&def->data.hostdev.def); + break; + case VIR_DOMAIN_NET_TYPE_USER: case VIR_DOMAIN_NET_TYPE_LAST: break; @@ -1563,8 +1571,10 @@ void virDomainDefFree(virDomainDefPtr def) * so the original object must still be available during the call * to virDomainHostdevDefFree(). */ - for (i = 0 ; i < def->nhostdevs ; i++) - virDomainHostdevDefFree(def->hostdevs[i]); + for (i = 0 ; i < def->nhostdevs ; i++) { + if (def->hostdevs[i]->ephemeral == 0) + virDomainHostdevDefFree(def->hostdevs[i]); + } VIR_FREE(def->hostdevs); for (i = 0 ; i < def->nleases ; i++) @@ -4525,6 +4535,7 @@ virDomainActualNetDefParseXML(xmlNodePtr node, if (actual->type != VIR_DOMAIN_NET_TYPE_BRIDGE && actual->type != VIR_DOMAIN_NET_TYPE_DIRECT && actual->type != VIR_DOMAIN_NET_TYPE_HOSTDEV && + actual->type != VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID && actual->type != VIR_DOMAIN_NET_TYPE_NETWORK) { virReportError(VIR_ERR_INTERNAL_ERROR, _("unsupported type '%s' in interface's <actual> element"), @@ -4536,7 +4547,8 @@ virDomainActualNetDefParseXML(xmlNodePtr node, if (virtPortNode) { if (actual->type == VIR_DOMAIN_NET_TYPE_BRIDGE || actual->type == VIR_DOMAIN_NET_TYPE_DIRECT || - actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) { + actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV || + actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) { /* the virtualport in <actual> should always already * have an instanceid/interfaceid if its required, * so don't let the parser generate one */ @@ -4589,6 +4601,30 @@ virDomainActualNetDefParseXML(xmlNodePtr node, hostdev, flags) < 0) { goto error; } + } else if (actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) { + virDomainHostdevDefPtr hostdev = &actual->data.hostdev.def; + + hostdev->parent.type = VIR_DOMAIN_DEVICE_NONE; + if (VIR_ALLOC(hostdev->info) < 0) { + virReportOOMError(); + goto error; + } + hostdev->ephemeral = 1; + /* The helper function expects type to already be found and + * passed in as a string, since it is in a different place in + * NetDef vs HostdevDef. + */ + addrtype = virXPathString("string(./source/address/@type)", ctxt); + /* if not explicitly stated, source/vendor implies usb device */ + if (!addrtype && virXPathNode("./source/vendor", ctxt) && + (addrtype = strdup("usb")) == NULL) { + virReportOOMError(); + goto error; + } + if (virDomainHostdevPartsParse(node, ctxt, NULL, addrtype, + hostdev, flags) < 0) { + goto error; + } } bandwidth_node = virXPathNode("./bandwidth", ctxt); @@ -4708,7 +4744,8 @@ virDomainNetDefParseXML(virCapsPtr caps, } } else if (def->type == VIR_DOMAIN_NET_TYPE_BRIDGE || def->type == VIR_DOMAIN_NET_TYPE_DIRECT || - def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) { + def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV || + def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) { if (!(def->virtPortProfile = virNetDevVPortProfileParse(cur, VIR_VPORT_XML_GENERATE_MISSING_DEFAULTS| @@ -4968,6 +5005,27 @@ virDomainNetDefParseXML(virCapsPtr caps, } break; + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: + hostdev = &def->data.hostdev.def; + hostdev->parent.type = VIR_DOMAIN_DEVICE_NONE; + if (VIR_ALLOC(hostdev->info) < 0) { + virReportOOMError(); + goto error; + } + hostdev->ephemeral = 1; + addrtype = virXPathString("string(./source/address/@type)", ctxt); + /* if not explicitly stated, source/vendor implies usb device */ + if (!addrtype && virXPathNode("./source/vendor", ctxt) && + ((addrtype = strdup("usb")) == NULL)) { + virReportOOMError(); + goto error; + } + if (virDomainHostdevPartsParse(node, ctxt, NULL, addrtype, + hostdev, flags) < 0) { + goto error; + } + break; + case VIR_DOMAIN_NET_TYPE_USER: case VIR_DOMAIN_NET_TYPE_LAST: break; @@ -7609,7 +7667,8 @@ int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net) return -1; def->nets[def->nnets] = net; def->nnets++; - if (net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) { + if ((net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) || + (net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)) { /* hostdev net devices must also exist in the hostdevs array */ return virDomainHostdevInsert(def, &net->data.hostdev.def); } @@ -7631,7 +7690,8 @@ virDomainNetRemove(virDomainDefPtr def, size_t i) { virDomainNetDefPtr net = def->nets[i]; - if (net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) { + if ((net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) || + (net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)) { /* hostdev net devices are normally also be in the hostdevs * array, but might have already been removed by the time we * get here. @@ -8995,8 +9055,10 @@ static virDomainDefPtr virDomainDefParseXML(virCapsPtr caps, def->nets[def->nnets++] = net; - /* <interface type='hostdev'> must also be in the hostdevs array */ - if (net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV && + /* <interface type='hostdev' and 'hostdev-hybrid'> must also be in + the hostdevs array */ + if (((net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) || + (net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)) && virDomainHostdevInsert(def, &net->data.hostdev.def) < 0) { goto no_memory; } @@ -11941,7 +12003,8 @@ virDomainActualNetDefFormat(virBufferPtr buf, } virBufferAsprintf(buf, "<actual type='%s'", type); - if (def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV && + if ((def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV || + def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) && def->data.hostdev.def.managed) { virBufferAddLit(buf, " managed='yes'"); } @@ -11971,6 +12034,7 @@ virDomainActualNetDefFormat(virBufferPtr buf, break; case VIR_DOMAIN_NET_TYPE_HOSTDEV: + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: if (virDomainHostdevSourceFormat(buf, &def->data.hostdev.def, flags, true) < 0) { return -1; @@ -12011,7 +12075,8 @@ virDomainNetDefFormat(virBufferPtr buf, } virBufferAsprintf(buf, " <interface type='%s'", type); - if (def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV && + if (((def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) || + (def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)) && def->data.hostdev.def.managed) { virBufferAddLit(buf, " managed='yes'"); } @@ -12078,6 +12143,7 @@ virDomainNetDefFormat(virBufferPtr buf, break; case VIR_DOMAIN_NET_TYPE_HOSTDEV: + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: if (virDomainHostdevSourceFormat(buf, &def->data.hostdev.def, flags, true) < 0) { return -1; @@ -14620,10 +14686,12 @@ virDomainNetGetActualDirectMode(virDomainNetDefPtr iface) virDomainHostdevDefPtr virDomainNetGetActualHostdev(virDomainNetDefPtr iface) { - if (iface->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) + if ((iface->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) || + (iface->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)) return &iface->data.hostdev.def; if (iface->type == VIR_DOMAIN_NET_TYPE_NETWORK && - iface->data.network.actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) { + (iface->data.network.actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV || + iface->data.network.actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)){ return &iface->data.network.actual->data.hostdev.def; } return NULL; @@ -14636,6 +14704,7 @@ virDomainNetGetActualVirtPortProfile(virDomainNetDefPtr iface) case VIR_DOMAIN_NET_TYPE_DIRECT: case VIR_DOMAIN_NET_TYPE_BRIDGE: case VIR_DOMAIN_NET_TYPE_HOSTDEV: + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: return iface->virtPortProfile; case VIR_DOMAIN_NET_TYPE_NETWORK: if (!iface->data.network.actual) @@ -14644,6 +14713,7 @@ virDomainNetGetActualVirtPortProfile(virDomainNetDefPtr iface) case VIR_DOMAIN_NET_TYPE_DIRECT: case VIR_DOMAIN_NET_TYPE_BRIDGE: case VIR_DOMAIN_NET_TYPE_HOSTDEV: + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: return iface->data.network.actual->virtPortProfile; default: return NULL; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 3995c2d..156eb32 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -378,6 +378,7 @@ struct _virDomainHostdevDef { virDomainDeviceDef parent; /* higher level Def containing this */ int mode; /* enum virDomainHostdevMode */ unsigned int managed : 1; + unsigned int ephemeral : 1; union { virDomainHostdevSubsys subsys; struct { @@ -727,6 +728,7 @@ enum virDomainNetType { VIR_DOMAIN_NET_TYPE_INTERNAL, VIR_DOMAIN_NET_TYPE_DIRECT, VIR_DOMAIN_NET_TYPE_HOSTDEV, + VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID, VIR_DOMAIN_NET_TYPE_LAST, }; diff --git a/src/uml/uml_conf.c b/src/uml/uml_conf.c index 410f3e2..edea034 100644 --- a/src/uml/uml_conf.c +++ b/src/uml/uml_conf.c @@ -261,6 +261,11 @@ umlBuildCommandLineNet(virConnectPtr conn, _("hostdev networking type not supported")); goto error; + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("hostdev-hybrid networking type not supported")); + goto error; + case VIR_DOMAIN_NET_TYPE_LAST: break; } diff --git a/src/xenxs/xen_sxpr.c b/src/xenxs/xen_sxpr.c index 8bb3849..c94b787 100644 --- a/src/xenxs/xen_sxpr.c +++ b/src/xenxs/xen_sxpr.c @@ -1987,6 +1987,7 @@ xenFormatSxprNet(virConnectPtr conn, case VIR_DOMAIN_NET_TYPE_INTERNAL: case VIR_DOMAIN_NET_TYPE_DIRECT: case VIR_DOMAIN_NET_TYPE_HOSTDEV: + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: case VIR_DOMAIN_NET_TYPE_LAST: break; } diff --git a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdevhybrid.args b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdevhybrid.args new file mode 100644 index 0000000..a4c50d1 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdevhybrid.args @@ -0,0 +1,8 @@ +LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test /usr/bin/qemu -S \ +-M pc -m 214 -smp 1 -nographic -nodefconfig -nodefaults -monitor \ +unix:/tmp/test-monitor,server,nowait -no-acpi -boot c \ +-hda /dev/HostVG/QEMUGuest1 \ +-device rtl8139,vlan=0,id=net0,mac=00:11:22:33:44:55,bus=pci.0,addr=0x3 \ +-net user,vlan=0,name=hostnet0 -usb \ +-device pci-assign,host=03:07.1,id=hostdev0,bus=pci.0,addr=0x4 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 diff --git a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdevhybrid.xml b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdevhybrid.xml new file mode 100644 index 0000000..dcf3fd1 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdevhybrid.xml @@ -0,0 +1,35 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <interface type='hostdev-hybrid' managed='yes'> + <mac address='00:11:22:33:44:55'/> + <source> + <address type='pci' domain='0x0002' bus='0x03' slot='0x07' function='0x1'/> + </source> + <virtualport type='802.1Qbg'> + <parameters managerid='11' typeid='1193047' typeidversion='2' instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/> + </virtualport> + </interface> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 47c3f6c..4155352 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -576,6 +576,8 @@ mymain(void) DO_TEST("net-mcast", NONE); DO_TEST("net-hostdev", QEMU_CAPS_PCIDEVICE, QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG); + DO_TEST("net-hostdevhybrid", + QEMU_CAPS_PCIDEVICE, QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG); DO_TEST("serial-vc", NONE); DO_TEST("serial-pty", NONE); diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-net-hostdevhybrid.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-net-hostdevhybrid.xml new file mode 100644 index 0000000..5ab9ed3 --- /dev/null +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-net-hostdevhybrid.xml @@ -0,0 +1,40 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <interface type='hostdev-hybrid' managed='yes'> + <mac address='00:11:22:33:44:55'/> + <source> + <address type='pci' domain='0x0002' bus='0x03' slot='0x07' function='0x1'/> + </source> + <virtualport type='802.1Qbg'> + <parameters managerid='11' typeid='1193047' typeidversion='2' instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/> + </virtualport> + </interface> + <hostdev mode='subsystem' type='pci' managed='yes'> + <source> + <address domain='0x0002' bus='0x03' slot='0x07' function='0x1'/> + </source> + </hostdev> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 87d9e77..73846e2 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -181,6 +181,7 @@ mymain(void) DO_TEST("net-eth-ifname"); DO_TEST("net-virtio-network-portgroup"); DO_TEST("net-hostdev"); + DO_TEST_DIFFERENT("net-hostdevhybrid"); DO_TEST("net-openvswitch"); DO_TEST("sound"); DO_TEST("sound-device"); -- 1.7.4.4

(a couple of things I noticed in passing, which are useful general info whether or not we take these patches in their current form) On 09/07/2012 12:14 PM, Shradha Shah wrote:
This patch introduces the new interface type='hostdev-hybrid' along with attribute managed Includes updates to the domain RNG and new xml parser/formatter code. Also introduces a ephemeral tag for hybrid hostdevs. The ephemeral tag for hybrid hostdevs will be useful for live migration support at a later stage. --- docs/formatdomain.html.in | 29 ++++++ docs/schemas/domaincommon.rng | 50 ++++++++++ src/conf/domain_conf.c | 96 +++++++++++++++++--- src/conf/domain_conf.h | 2 + src/uml/uml_conf.c | 5 + src/xenxs/xen_sxpr.c | 1 + .../qemuxml2argv-net-hostdevhybrid.args | 8 ++ .../qemuxml2argv-net-hostdevhybrid.xml | 35 +++++++ tests/qemuxml2argvtest.c | 2 + .../qemuxml2xmlout-net-hostdevhybrid.xml | 40 ++++++++ tests/qemuxml2xmltest.c | 1 + 11 files changed, 256 insertions(+), 13 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 503685f..70cf362 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -2659,6 +2659,20 @@ guest instead of <interface type='hostdev'/>. </p>
+ <p> + Libvirt later than 0.10.0 also supports "intelligent passthrough" + of VF in the hybrid mode. This is done by using the <interface + type='hostdev-hybrid'/> functionality. Similar to <interface + type='hostdev'/> the device's MAC address is first optionally + configured and the device is optionally associated with an 802.1Qbh + capable switch using an optionally specified <virtualport> + element (see the examples of virtualport given above for + type='direct' network devices). The Vf is passed into the guest as + a PCI device and at the same time a virtual interface with + type='direct' mode='bridge' is created in the guest. This hybrid mode + of intelligent passthrough makes Live migration possible. + </p> + <pre> ... <devices> @@ -2674,6 +2688,21 @@ </devices> ...</pre>
+<pre> + ... + <devices> + <interface type='hostdev-hybrid'> + <source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> + </source> + <mac address='52:54:00:6d:90:02'> + <virtualport type='802.1Qbh'> + <parameters profileid='finance'/> + </virtualport> + </interface> + </devices> + ...</pre> +
<h5><a name="elementsNICSMulticast">Multicast tunnel</a></h5>
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index c2c6184..eedc255 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -1677,6 +1677,56 @@ <ref name="interface-options"/> </interleave> </group> + <group> + <attribute name="type"> + <value>hostdev-hybrid</value>
I noticed that the <group> for hostdev-hybrid is exactly the same as for hostdev. In this case, you can just change this line in the hostdev group: <value>hostdev></value> into the following 4 lines: <choice> <value>hostdev</value> <value>hostdev-hybrid</value> </choice> and avoid adding the new <group> altogether.
+ </attribute> + <optional> + <attribute name="managed"> + <choice> + <value>yes</value> + <value>no</value> + </choice> + </attribute> + </optional> + <interleave> + <element name="source"> + <choice> + <group> + <ref name="usbproduct"/> + <optional> + <ref name="usbaddress"/> + </optional> + </group> + <element name="address"> + <choice> + <group> + <attribute name="type"> + <value>pci</value> + </attribute> + <ref name="pciaddress"/> + </group> + <group> + <attribute name="type"> + <value>usb</value> + </attribute> + <attribute name="bus"> + <ref name="usbAddr"/> + </attribute> + <attribute name="device"> + <ref name="usbPort"/> + </attribute> + </group> + </choice> + </element> + </choice> + </element> + <optional> + <ref name="virtualPortProfile"/> + </optional> + <ref name="interface-options"/> + </interleave> + </group> </choice> </element> </define> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 8952b69..d8ab40c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -293,7 +293,8 @@ VIR_ENUM_IMPL(virDomainNet, VIR_DOMAIN_NET_TYPE_LAST, "bridge", "internal", "direct", - "hostdev") + "hostdev", + "hostdev-hybrid")
VIR_ENUM_IMPL(virDomainNetBackend, VIR_DOMAIN_NET_BACKEND_TYPE_LAST, "default", @@ -1020,6 +1021,9 @@ virDomainActualNetDefFree(virDomainActualNetDefPtr def) case VIR_DOMAIN_NET_TYPE_HOSTDEV: virDomainHostdevDefClear(&def->data.hostdev.def); break; + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: + virDomainHostdevDefClear(&def->data.hostdev.def); + break;
might as well combine the above two cases.
default: break; } @@ -1072,6 +1076,10 @@ void virDomainNetDefFree(virDomainNetDefPtr def) virDomainHostdevDefClear(&def->data.hostdev.def); break;
+ case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: + virDomainHostdevDefClear(&def->data.hostdev.def); + break; +
Same here I think (can't see all of the context)
case VIR_DOMAIN_NET_TYPE_USER: case VIR_DOMAIN_NET_TYPE_LAST: break; @@ -1563,8 +1571,10 @@ void virDomainDefFree(virDomainDefPtr def) * so the original object must still be available during the call * to virDomainHostdevDefFree(). */ - for (i = 0 ; i < def->nhostdevs ; i++) - virDomainHostdevDefFree(def->hostdevs[i]); + for (i = 0 ; i < def->nhostdevs ; i++) { + if (def->hostdevs[i]->ephemeral == 0) + virDomainHostdevDefFree(def->hostdevs[i]); + } VIR_FREE(def->hostdevs);
for (i = 0 ; i < def->nleases ; i++) @@ -4525,6 +4535,7 @@ virDomainActualNetDefParseXML(xmlNodePtr node, if (actual->type != VIR_DOMAIN_NET_TYPE_BRIDGE && actual->type != VIR_DOMAIN_NET_TYPE_DIRECT && actual->type != VIR_DOMAIN_NET_TYPE_HOSTDEV && + actual->type != VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID && actual->type != VIR_DOMAIN_NET_TYPE_NETWORK) { virReportError(VIR_ERR_INTERNAL_ERROR, _("unsupported type '%s' in interface's <actual> element"), @@ -4536,7 +4547,8 @@ virDomainActualNetDefParseXML(xmlNodePtr node, if (virtPortNode) { if (actual->type == VIR_DOMAIN_NET_TYPE_BRIDGE || actual->type == VIR_DOMAIN_NET_TYPE_DIRECT || - actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) { + actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV || + actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) { /* the virtualport in <actual> should always already * have an instanceid/interfaceid if its required, * so don't let the parser generate one */ @@ -4589,6 +4601,30 @@ virDomainActualNetDefParseXML(xmlNodePtr node, hostdev, flags) < 0) { goto error; } + } else if (actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) { + virDomainHostdevDefPtr hostdev = &actual->data.hostdev.def; + + hostdev->parent.type = VIR_DOMAIN_DEVICE_NONE; + if (VIR_ALLOC(hostdev->info) < 0) { + virReportOOMError(); + goto error; + } + hostdev->ephemeral = 1; + /* The helper function expects type to already be found and + * passed in as a string, since it is in a different place in + * NetDef vs HostdevDef. + */ + addrtype = virXPathString("string(./source/address/@type)", ctxt); + /* if not explicitly stated, source/vendor implies usb device */ + if (!addrtype && virXPathNode("./source/vendor", ctxt) && + (addrtype = strdup("usb")) == NULL) { + virReportOOMError(); + goto error; + } + if (virDomainHostdevPartsParse(node, ctxt, NULL, addrtype, + hostdev, flags) < 0) { + goto error; + }
I've got a feeling a lot of this is common code with hostdev. Maybe you could refactor it to avoid duplicated code.
}
bandwidth_node = virXPathNode("./bandwidth", ctxt); @@ -4708,7 +4744,8 @@ virDomainNetDefParseXML(virCapsPtr caps, } } else if (def->type == VIR_DOMAIN_NET_TYPE_BRIDGE || def->type == VIR_DOMAIN_NET_TYPE_DIRECT || - def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) { + def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV || + def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) { if (!(def->virtPortProfile = virNetDevVPortProfileParse(cur, VIR_VPORT_XML_GENERATE_MISSING_DEFAULTS| @@ -4968,6 +5005,27 @@ virDomainNetDefParseXML(virCapsPtr caps, } break;
+ case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: + hostdev = &def->data.hostdev.def; + hostdev->parent.type = VIR_DOMAIN_DEVICE_NONE; + if (VIR_ALLOC(hostdev->info) < 0) { + virReportOOMError(); + goto error; + } + hostdev->ephemeral = 1; + addrtype = virXPathString("string(./source/address/@type)", ctxt); + /* if not explicitly stated, source/vendor implies usb device */ + if (!addrtype && virXPathNode("./source/vendor", ctxt) && + ((addrtype = strdup("usb")) == NULL)) { + virReportOOMError(); + goto error; + } + if (virDomainHostdevPartsParse(node, ctxt, NULL, addrtype, + hostdev, flags) < 0) { + goto error; + } + break; + case VIR_DOMAIN_NET_TYPE_USER: case VIR_DOMAIN_NET_TYPE_LAST: break; @@ -7609,7 +7667,8 @@ int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net) return -1; def->nets[def->nnets] = net; def->nnets++; - if (net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) { + if ((net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) || + (net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)) { /* hostdev net devices must also exist in the hostdevs array */ return virDomainHostdevInsert(def, &net->data.hostdev.def); } @@ -7631,7 +7690,8 @@ virDomainNetRemove(virDomainDefPtr def, size_t i) { virDomainNetDefPtr net = def->nets[i];
- if (net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) { + if ((net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) || + (net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)) { /* hostdev net devices are normally also be in the hostdevs * array, but might have already been removed by the time we * get here. @@ -8995,8 +9055,10 @@ static virDomainDefPtr virDomainDefParseXML(virCapsPtr caps,
def->nets[def->nnets++] = net;
- /* <interface type='hostdev'> must also be in the hostdevs array */ - if (net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV && + /* <interface type='hostdev' and 'hostdev-hybrid'> must also be in + the hostdevs array */ + if (((net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) || + (net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)) && virDomainHostdevInsert(def, &net->data.hostdev.def) < 0) { goto no_memory; } @@ -11941,7 +12003,8 @@ virDomainActualNetDefFormat(virBufferPtr buf, }
virBufferAsprintf(buf, "<actual type='%s'", type); - if (def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV && + if ((def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV || + def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) && def->data.hostdev.def.managed) { virBufferAddLit(buf, " managed='yes'"); } @@ -11971,6 +12034,7 @@ virDomainActualNetDefFormat(virBufferPtr buf, break;
case VIR_DOMAIN_NET_TYPE_HOSTDEV: + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: if (virDomainHostdevSourceFormat(buf, &def->data.hostdev.def, flags, true) < 0) { return -1; @@ -12011,7 +12075,8 @@ virDomainNetDefFormat(virBufferPtr buf, }
virBufferAsprintf(buf, " <interface type='%s'", type); - if (def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV && + if (((def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) || + (def->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)) && def->data.hostdev.def.managed) { virBufferAddLit(buf, " managed='yes'"); } @@ -12078,6 +12143,7 @@ virDomainNetDefFormat(virBufferPtr buf, break;
case VIR_DOMAIN_NET_TYPE_HOSTDEV: + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: if (virDomainHostdevSourceFormat(buf, &def->data.hostdev.def, flags, true) < 0) { return -1; @@ -14620,10 +14686,12 @@ virDomainNetGetActualDirectMode(virDomainNetDefPtr iface) virDomainHostdevDefPtr virDomainNetGetActualHostdev(virDomainNetDefPtr iface) { - if (iface->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) + if ((iface->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) || + (iface->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)) return &iface->data.hostdev.def; if (iface->type == VIR_DOMAIN_NET_TYPE_NETWORK && - iface->data.network.actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) { + (iface->data.network.actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV || + iface->data.network.actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)){ return &iface->data.network.actual->data.hostdev.def; } return NULL; @@ -14636,6 +14704,7 @@ virDomainNetGetActualVirtPortProfile(virDomainNetDefPtr iface) case VIR_DOMAIN_NET_TYPE_DIRECT: case VIR_DOMAIN_NET_TYPE_BRIDGE: case VIR_DOMAIN_NET_TYPE_HOSTDEV: + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: return iface->virtPortProfile; case VIR_DOMAIN_NET_TYPE_NETWORK: if (!iface->data.network.actual) @@ -14644,6 +14713,7 @@ virDomainNetGetActualVirtPortProfile(virDomainNetDefPtr iface) case VIR_DOMAIN_NET_TYPE_DIRECT: case VIR_DOMAIN_NET_TYPE_BRIDGE: case VIR_DOMAIN_NET_TYPE_HOSTDEV: + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: return iface->data.network.actual->virtPortProfile; default: return NULL; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 3995c2d..156eb32 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -378,6 +378,7 @@ struct _virDomainHostdevDef { virDomainDeviceDef parent; /* higher level Def containing this */ int mode; /* enum virDomainHostdevMode */ unsigned int managed : 1; + unsigned int ephemeral : 1;
I'm not sure why managed was defined as a bitfield, but these days we're more inclined to use bool
union { virDomainHostdevSubsys subsys; struct { @@ -727,6 +728,7 @@ enum virDomainNetType { VIR_DOMAIN_NET_TYPE_INTERNAL, VIR_DOMAIN_NET_TYPE_DIRECT, VIR_DOMAIN_NET_TYPE_HOSTDEV, + VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID,
VIR_DOMAIN_NET_TYPE_LAST, }; diff --git a/src/uml/uml_conf.c b/src/uml/uml_conf.c index 410f3e2..edea034 100644 --- a/src/uml/uml_conf.c +++ b/src/uml/uml_conf.c @@ -261,6 +261,11 @@ umlBuildCommandLineNet(virConnectPtr conn, _("hostdev networking type not supported")); goto error;
+ case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("hostdev-hybrid networking type not supported")); + goto error; + case VIR_DOMAIN_NET_TYPE_LAST: break; } diff --git a/src/xenxs/xen_sxpr.c b/src/xenxs/xen_sxpr.c index 8bb3849..c94b787 100644 --- a/src/xenxs/xen_sxpr.c +++ b/src/xenxs/xen_sxpr.c @@ -1987,6 +1987,7 @@ xenFormatSxprNet(virConnectPtr conn, case VIR_DOMAIN_NET_TYPE_INTERNAL: case VIR_DOMAIN_NET_TYPE_DIRECT: case VIR_DOMAIN_NET_TYPE_HOSTDEV: + case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: case VIR_DOMAIN_NET_TYPE_LAST: break; } diff --git a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdevhybrid.args b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdevhybrid.args new file mode 100644 index 0000000..a4c50d1 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdevhybrid.args @@ -0,0 +1,8 @@ +LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test /usr/bin/qemu -S \ +-M pc -m 214 -smp 1 -nographic -nodefconfig -nodefaults -monitor \ +unix:/tmp/test-monitor,server,nowait -no-acpi -boot c \ +-hda /dev/HostVG/QEMUGuest1 \ +-device rtl8139,vlan=0,id=net0,mac=00:11:22:33:44:55,bus=pci.0,addr=0x3 \ +-net user,vlan=0,name=hostnet0 -usb \ +-device pci-assign,host=03:07.1,id=hostdev0,bus=pci.0,addr=0x4 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 diff --git a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdevhybrid.xml b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdevhybrid.xml new file mode 100644 index 0000000..dcf3fd1 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdevhybrid.xml @@ -0,0 +1,35 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <interface type='hostdev-hybrid' managed='yes'> + <mac address='00:11:22:33:44:55'/> + <source> + <address type='pci' domain='0x0002' bus='0x03' slot='0x07' function='0x1'/> + </source> + <virtualport type='802.1Qbg'> + <parameters managerid='11' typeid='1193047' typeidversion='2' instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/> + </virtualport> + </interface> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 47c3f6c..4155352 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -576,6 +576,8 @@ mymain(void) DO_TEST("net-mcast", NONE); DO_TEST("net-hostdev", QEMU_CAPS_PCIDEVICE, QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG); + DO_TEST("net-hostdevhybrid", + QEMU_CAPS_PCIDEVICE, QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG);
DO_TEST("serial-vc", NONE); DO_TEST("serial-pty", NONE); diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-net-hostdevhybrid.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-net-hostdevhybrid.xml new file mode 100644 index 0000000..5ab9ed3 --- /dev/null +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-net-hostdevhybrid.xml @@ -0,0 +1,40 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <interface type='hostdev-hybrid' managed='yes'> + <mac address='00:11:22:33:44:55'/> + <source> + <address type='pci' domain='0x0002' bus='0x03' slot='0x07' function='0x1'/> + </source> + <virtualport type='802.1Qbg'> + <parameters managerid='11' typeid='1193047' typeidversion='2' instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/> + </virtualport> + </interface> + <hostdev mode='subsystem' type='pci' managed='yes'> + <source> + <address domain='0x0002' bus='0x03' slot='0x07' function='0x1'/> + </source> + </hostdev> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 87d9e77..73846e2 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -181,6 +181,7 @@ mymain(void) DO_TEST("net-eth-ifname"); DO_TEST("net-virtio-network-portgroup"); DO_TEST("net-hostdev"); + DO_TEST_DIFFERENT("net-hostdevhybrid"); DO_TEST("net-openvswitch"); DO_TEST("sound"); DO_TEST("sound-device");

This patch introduces the new forward mode='hostdev-hybrid' along with attribute managed Includes updates to the network RNG and new xml parser/formatter code. --- docs/formatnetwork.html.in | 46 +++++++++++++++++++++++++ docs/schemas/network.rng | 1 + src/conf/network_conf.c | 9 +++-- src/conf/network_conf.h | 1 + tests/networkxml2xmlin/hostdev-hybrid-pf.xml | 7 ++++ tests/networkxml2xmlin/hostdev-hybrid.xml | 10 +++++ tests/networkxml2xmlout/hostdev-hybrid-pf.xml | 7 ++++ tests/networkxml2xmlout/hostdev-hybrid.xml | 10 +++++ tests/networkxml2xmltest.c | 2 + 9 files changed, 90 insertions(+), 3 deletions(-) diff --git a/docs/formatnetwork.html.in b/docs/formatnetwork.html.in index 49206dd..5a5392c 100644 --- a/docs/formatnetwork.html.in +++ b/docs/formatnetwork.html.in @@ -259,6 +259,22 @@ with <code><forward mode='hostdev'/></code>. </p> </dd> + <dt><code>hostdev-hybrid</code></dt> + <dd> + Libvirt later than 0.10.0 also supports "intelligent + passthrough" of VF in the hybrid mode. This is done by + using the <interface type='hostdev-hybrid'/> + functionality. Similar to <interface type='hostdev'/> + the device's MAC address is first optionally configured and + the device is optionally associated with an 802.1Qbh capable + switch using an optionally specified <virtualport> + element (see the examples of virtualport given above for + type='direct' network devices). The Vf is passed into the + guest as a PCI device and at the same time a virtual interface + with type='direct' mode='bridge' is created in the guest. This + hybrid mode of intelligent passthrough makes Live migration + possible. + </dd> </dl> As mentioned above, a <code><forward></code> element can have multiple <code><interface></code> subelements, each @@ -341,6 +357,36 @@ ... </pre> + Similarly hostdev-hybrid mode can be used to support PCI- + passthrough of VF in hybrid mode as described above. The + interface pool can be specified with a list of + <code><address></code> elements, each of which has + <code>< type></code> (must always be <code>'pci'</code>, + <code><domain></code>, <code><bus></code>, + <code><slot></code>, and <code><function></code> + attributes or the interface pool can also be defined using a + single physical function <code><pf></code> subelement to + call out the corresponding physical interface associated with + multiple virtual interfaces (similar to passthrough mode). + + <pre> +... + <forward mode='hostdev-hybrid' managed='yes'> + <address type='pci' domain='0' bus='4' slot='0' function='1'/> + <address type='pci' domain='0' bus='4' slot='0' function='2'/> + <address type='pci' domain='0' bus='4' slot='0' function='3'/> + </forward> +... + </pre> + + <pre> +... + <forward mode='hostdev-hybrid' managed='yes'> + <pf dev='eth0'/> + </forward> +... + </pre> + </dd> </dl> <h5><a name="elementQoS">Quality of service</a></h5> diff --git a/docs/schemas/network.rng b/docs/schemas/network.rng index 4abfd91..1f2136a 100644 --- a/docs/schemas/network.rng +++ b/docs/schemas/network.rng @@ -88,6 +88,7 @@ <value>private</value> <value>vepa</value> <value>hostdev</value> + <value>hostdev-hybrid</value> </choice> </attribute> </optional> diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 9d53d8e..443b060 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -50,7 +50,7 @@ VIR_ENUM_IMPL(virNetworkForward, VIR_NETWORK_FORWARD_LAST, - "none", "nat", "route", "bridge", "private", "vepa", "passthrough", "hostdev") + "none", "nat", "route", "bridge", "private", "vepa", "passthrough", "hostdev", "hostdev-hybrid") VIR_ENUM_DECL(virNetworkForwardHostdevDevice) VIR_ENUM_IMPL(virNetworkForwardHostdevDevice, @@ -1289,6 +1289,7 @@ virNetworkDefParseXML(xmlXPathContextPtr ctxt) case VIR_NETWORK_FORWARD_VEPA: case VIR_NETWORK_FORWARD_PASSTHROUGH: case VIR_NETWORK_FORWARD_HOSTDEV: + case VIR_NETWORK_FORWARD_HOSTDEV_HYBRID: if (def->bridge) { virReportError(VIR_ERR_XML_ERROR, _("bridge name not allowed in %s mode (network '%s')"), @@ -1590,7 +1591,8 @@ char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags) virBufferAddLit(&buf, "<forward"); virBufferEscapeString(&buf, " dev='%s'", dev); virBufferAsprintf(&buf, " mode='%s'", mode); - if (def->forwardType == VIR_NETWORK_FORWARD_HOSTDEV) { + if (def->forwardType == VIR_NETWORK_FORWARD_HOSTDEV || + def->forwardType == VIR_NETWORK_FORWARD_HOSTDEV_HYBRID) { if (def->managed == 1) virBufferAddLit(&buf, " managed='yes'"); else @@ -1608,7 +1610,8 @@ char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags) if (def->nForwardIfs && (!def->nForwardPfs || !(flags & VIR_NETWORK_XML_INACTIVE))) { for (ii = 0; ii < def->nForwardIfs; ii++) { - if (def->forwardType != VIR_NETWORK_FORWARD_HOSTDEV) { + if (def->forwardType != VIR_NETWORK_FORWARD_HOSTDEV && + def->forwardType != VIR_NETWORK_FORWARD_HOSTDEV_HYBRID) { virBufferEscapeString(&buf, "<interface dev='%s'", def->forwardIfs[ii].device.dev); if (!(flags & VIR_NETWORK_XML_INACTIVE) && diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index f49c367..a9d8b12 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -48,6 +48,7 @@ enum virNetworkForwardType { VIR_NETWORK_FORWARD_VEPA, VIR_NETWORK_FORWARD_PASSTHROUGH, VIR_NETWORK_FORWARD_HOSTDEV, + VIR_NETWORK_FORWARD_HOSTDEV_HYBRID, VIR_NETWORK_FORWARD_LAST, }; diff --git a/tests/networkxml2xmlin/hostdev-hybrid-pf.xml b/tests/networkxml2xmlin/hostdev-hybrid-pf.xml new file mode 100644 index 0000000..f8b8f9b --- /dev/null +++ b/tests/networkxml2xmlin/hostdev-hybrid-pf.xml @@ -0,0 +1,7 @@ +<network> + <name>hostdev-hybrid</name> + <uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid> + <forward mode='hostdev-hybrid' managed='yes'> + <pf dev='eth2'/> + </forward> +</network> diff --git a/tests/networkxml2xmlin/hostdev-hybrid.xml b/tests/networkxml2xmlin/hostdev-hybrid.xml new file mode 100644 index 0000000..94c9d65 --- /dev/null +++ b/tests/networkxml2xmlin/hostdev-hybrid.xml @@ -0,0 +1,10 @@ +<network> + <name>hostdev-hybrid</name> + <uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid> + <forward mode='hostdev-hybrid' managed='yes'> + <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x1'/> + <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x2'/> + <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x3'/> + <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x4'/> + </forward> +</network> diff --git a/tests/networkxml2xmlout/hostdev-hybrid-pf.xml b/tests/networkxml2xmlout/hostdev-hybrid-pf.xml new file mode 100644 index 0000000..f8b8f9b --- /dev/null +++ b/tests/networkxml2xmlout/hostdev-hybrid-pf.xml @@ -0,0 +1,7 @@ +<network> + <name>hostdev-hybrid</name> + <uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid> + <forward mode='hostdev-hybrid' managed='yes'> + <pf dev='eth2'/> + </forward> +</network> diff --git a/tests/networkxml2xmlout/hostdev-hybrid.xml b/tests/networkxml2xmlout/hostdev-hybrid.xml new file mode 100644 index 0000000..94c9d65 --- /dev/null +++ b/tests/networkxml2xmlout/hostdev-hybrid.xml @@ -0,0 +1,10 @@ +<network> + <name>hostdev-hybrid</name> + <uuid>81ff0d90-c91e-6742-64da-4a736edb9a9b</uuid> + <forward mode='hostdev-hybrid' managed='yes'> + <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x1'/> + <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x2'/> + <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x3'/> + <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x4'/> + </forward> +</network> diff --git a/tests/networkxml2xmltest.c b/tests/networkxml2xmltest.c index e57d190..d8da37a 100644 --- a/tests/networkxml2xmltest.c +++ b/tests/networkxml2xmltest.c @@ -108,6 +108,8 @@ mymain(void) DO_TEST_FULL("passthrough-pf", VIR_NETWORK_XML_INACTIVE); DO_TEST("hostdev"); DO_TEST_FULL("hostdev-pf", VIR_NETWORK_XML_INACTIVE); + DO_TEST("hostdev-hybrid"); + DO_TEST_FULL("hostdev-hybrid-pf", VIR_NETWORK_XML_INACTIVE); return ret==0 ? EXIT_SUCCESS : EXIT_FAILURE; } -- 1.7.4.4

In this mode the guest contains a Virtual network device along with a SRIOV VF passed through to the guest as a pci device. --- src/conf/domain_conf.c | 38 ++++++++++++++++++++++++++++++++++++-- src/conf/domain_conf.h | 5 +++++ src/libvirt_private.syms | 1 + src/util/pci.c | 2 +- src/util/pci.h | 2 ++ src/util/virnetdev.c | 40 ++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 6 ++++++ 7 files changed, 91 insertions(+), 3 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index d8ab40c..c59ea00 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -1022,6 +1022,7 @@ virDomainActualNetDefFree(virDomainActualNetDefPtr def) virDomainHostdevDefClear(&def->data.hostdev.def); break; case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: + VIR_FREE(def->data.hostdev.linkdev); virDomainHostdevDefClear(&def->data.hostdev.def); break; default: @@ -1077,6 +1078,7 @@ void virDomainNetDefFree(virDomainNetDefPtr def) break; case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: + VIR_FREE(def->data.hostdev.linkdev); virDomainHostdevDefClear(&def->data.hostdev.def); break; @@ -4687,6 +4689,7 @@ virDomainNetDefParseXML(virCapsPtr caps, char *mode = NULL; char *linkstate = NULL; char *addrtype = NULL; + char *pfname = NULL; virNWFilterHashTablePtr filterparams = NULL; virDomainActualNetDefPtr actual = NULL; xmlNodePtr oldnode = ctxt->node; @@ -5024,6 +5027,27 @@ virDomainNetDefParseXML(virCapsPtr caps, hostdev, flags) < 0) { goto error; } + + if (hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) { + if (virNetDevGetPhysicalFunctionFromVfPciAddr(hostdev->source.subsys.u.pci.domain, + hostdev->source.subsys.u.pci.bus, + hostdev->source.subsys.u.pci.slot, + hostdev->source.subsys.u.pci.function, + &pfname) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not get Physical Function of the hostdev")); + goto error; + } + } + if (pfname != NULL) + def->data.hostdev.linkdev = strdup(pfname); + else { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Linkdev is required in %s mode"), + virDomainNetTypeToString(def->type)); + goto error; + } + def->data.hostdev.mode = VIR_NETDEV_MACVLAN_MODE_BRIDGE; break; case VIR_DOMAIN_NET_TYPE_USER: @@ -14664,11 +14688,16 @@ virDomainNetGetActualDirectDev(virDomainNetDefPtr iface) { if (iface->type == VIR_DOMAIN_NET_TYPE_DIRECT) return iface->data.direct.linkdev; + if (iface->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) + return iface->data.hostdev.linkdev; if (iface->type != VIR_DOMAIN_NET_TYPE_NETWORK) return NULL; if (!iface->data.network.actual) return NULL; - return iface->data.network.actual->data.direct.linkdev; + if (iface->data.network.actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) + return iface->data.network.actual->data.hostdev.linkdev; + else + return iface->data.network.actual->data.direct.linkdev; } int @@ -14676,11 +14705,16 @@ virDomainNetGetActualDirectMode(virDomainNetDefPtr iface) { if (iface->type == VIR_DOMAIN_NET_TYPE_DIRECT) return iface->data.direct.mode; + if (iface->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) + return iface->data.hostdev.mode; if (iface->type != VIR_DOMAIN_NET_TYPE_NETWORK) return 0; if (!iface->data.network.actual) return 0; - return iface->data.network.actual->data.direct.mode; + if (iface->data.network.actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) + return iface->data.network.actual->data.hostdev.mode; + else + return iface->data.network.actual->data.direct.mode; } virDomainHostdevDefPtr diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 156eb32..171dd70 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -44,6 +44,7 @@ # include "virnetdevopenvswitch.h" # include "virnetdevbandwidth.h" # include "virnetdevvlan.h" +# include "virnetdev.h" # include "virobject.h" # include "device_conf.h" @@ -778,6 +779,8 @@ struct _virDomainActualNetDef { int mode; /* enum virMacvtapMode from util/macvtap.h */ } direct; struct { + char *linkdev; + int mode; virDomainHostdevDef def; } hostdev; } data; @@ -833,6 +836,8 @@ struct _virDomainNetDef { int mode; /* enum virMacvtapMode from util/macvtap.h */ } direct; struct { + char *linkdev; + int mode; virDomainHostdevDef def; } hostdev; } data; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 10af063..2d06cc3 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1399,6 +1399,7 @@ virNetDevGetMTU; virNetDevGetPhysicalFunction; virNetDevGetVLanID; virNetDevGetVirtualFunctionIndex; +virNetDevGetPhysicalFunctionFromVfPciAddr; virNetDevGetVirtualFunctionInfo; virNetDevGetVirtualFunctions; virNetDevIsOnline; diff --git a/src/util/pci.c b/src/util/pci.c index 0742d07..ba8fffc 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -802,7 +802,7 @@ pciDriverFile(char **buffer, const char *driver, const char *file) return 0; } -static int +int pciDeviceFile(char **buffer, const char *device, const char *file) { VIR_FREE(*buffer); diff --git a/src/util/pci.h b/src/util/pci.h index 8bbab07..936fee4 100644 --- a/src/util/pci.h +++ b/src/util/pci.h @@ -116,6 +116,8 @@ int pciConfigAddressToSysfsFile(struct pci_config_address *dev, int pciDeviceNetName(char *device_link_sysfs_path, char **netname); +int pciDeviceFile(char **buffer, const char *device, const char *file); + int pciSysfsFile(char *pciDeviceName, char **pci_sysfs_device_link) ATTRIBUTE_RETURN_CHECK; diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index f622f5d..4d4fcb1 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -1097,6 +1097,46 @@ virNetDevGetVirtualFunctionIndex(const char *pfname, const char *vfname, } /** + * virNetDevGetPhyscialFunctionFromVfPciAddr + * + * @domain : Domain part of VF PCI addr + * @bus : Bus part of VF PCI addr + * @slot : Slot part of VF PCI addr + * @function : Function part of VF PCI addr + * @pfname : Contains sriov physical function for Vf upon + * successful return + * + * Returns 0 on success, -1 on failure + * + */ +int +virNetDevGetPhysicalFunctionFromVfPciAddr(unsigned domain, + unsigned bus, + unsigned slot, + unsigned function, + char **pfname) +{ + char *pciConfigAddr = NULL; + char *physfn_sysfs_path = NULL; + int ret = -1; + + if (pciGetDeviceAddrString(domain, bus, slot, function, + &pciConfigAddr) < 0) { + goto cleanup; + } + if (pciDeviceFile(&physfn_sysfs_path, pciConfigAddr, "physfn") < 0) { + goto cleanup; + } + ret = pciDeviceNetName(physfn_sysfs_path, pfname); + +cleanup: + VIR_FREE(pciConfigAddr); + VIR_FREE(physfn_sysfs_path); + + return ret; +} + +/** * virNetDevGetPhysicalFunction * * @ifname : name of the physical function interface name diff --git a/src/util/virnetdev.h b/src/util/virnetdev.h index 705ad9c..2e102f2 100644 --- a/src/util/virnetdev.h +++ b/src/util/virnetdev.h @@ -99,6 +99,12 @@ int virNetDevGetVirtualFunctionIndex(const char *pfname, const char *vfname, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) ATTRIBUTE_RETURN_CHECK; +int virNetDevGetPhysicalFunctionFromVfPciAddr(unsigned domain, unsigned bus, + unsigned slot, unsigned function, + char **pfname) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4) ATTRIBUTE_RETURN_CHECK; + int virNetDevGetPhysicalFunction(const char *ifname, char **pfname) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; -- 1.7.4.4

(not a full review yet, but a couple of things I noticed in passing) On 09/07/2012 12:15 PM, Shradha Shah wrote:
In this mode the guest contains a Virtual network device along with a SRIOV VF passed through to the guest as a pci device. --- src/conf/domain_conf.c | 38 ++++++++++++++++++++++++++++++++++++-- src/conf/domain_conf.h | 5 +++++ src/libvirt_private.syms | 1 + src/util/pci.c | 2 +- src/util/pci.h | 2 ++ src/util/virnetdev.c | 40 ++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 6 ++++++ 7 files changed, 91 insertions(+), 3 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index d8ab40c..c59ea00 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -1022,6 +1022,7 @@ virDomainActualNetDefFree(virDomainActualNetDefPtr def) virDomainHostdevDefClear(&def->data.hostdev.def); break; case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: + VIR_FREE(def->data.hostdev.linkdev); virDomainHostdevDefClear(&def->data.hostdev.def); break; default: @@ -1077,6 +1078,7 @@ void virDomainNetDefFree(virDomainNetDefPtr def) break;
case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: + VIR_FREE(def->data.hostdev.linkdev); virDomainHostdevDefClear(&def->data.hostdev.def); break;
@@ -4687,6 +4689,7 @@ virDomainNetDefParseXML(virCapsPtr caps, char *mode = NULL; char *linkstate = NULL; char *addrtype = NULL; + char *pfname = NULL; virNWFilterHashTablePtr filterparams = NULL; virDomainActualNetDefPtr actual = NULL; xmlNodePtr oldnode = ctxt->node; @@ -5024,6 +5027,27 @@ virDomainNetDefParseXML(virCapsPtr caps, hostdev, flags) < 0) { goto error; } + + if (hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) { + if (virNetDevGetPhysicalFunctionFromVfPciAddr(hostdev->source.subsys.u.pci.domain, + hostdev->source.subsys.u.pci.bus, + hostdev->source.subsys.u.pci.slot, + hostdev->source.subsys.u.pci.function, + &pfname) < 0) {
It's problematic to have this call in a parsing function - it requires the actual hardware to be present on the machine doing the parse. For example, doing this in the parse causes the qemuxml2xml and qemuxml2argv tests to fail on my system, because my hardware doesn't match yours: One last thing - after applying the entire series, I'm getting the following failures in make check: VIR_TEST_DEBUG=2 ./qemuxml2xmltest TEST: qemuxml2xmltest [...] 7) QEMU XML-2-XML net-hostdevhybrid ... libvir: Domain Config error : internal error Could not get Physical Function of the hostdev ... VIR_TEST_DEBUG=2 ./qemuxml2argvtest TEST: qemuxml2argvtest [...] 113) QEMU XML-2-ARGV net-hostdevhybrid ... libvir: Domain Config error : internal error Could not get Physical Function of the hostdev I couldn't find any other uses of network device "ioctl" type calls in the conf directory. I think we at least frown on doing that in parsing/formatting, and may even forbid it. At any rate, You need to not do this here. Instead, perhaps you can leave it empty if it's not given explicitly in the input XML, and fill it in in the caller when appropriate?
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not get Physical Function of the hostdev")); + goto error; + } + } + if (pfname != NULL) + def->data.hostdev.linkdev = strdup(pfname); + else { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Linkdev is required in %s mode"), + virDomainNetTypeToString(def->type)); + goto error; + } + def->data.hostdev.mode = VIR_NETDEV_MACVLAN_MODE_BRIDGE; break;
case VIR_DOMAIN_NET_TYPE_USER: @@ -14664,11 +14688,16 @@ virDomainNetGetActualDirectDev(virDomainNetDefPtr iface) { if (iface->type == VIR_DOMAIN_NET_TYPE_DIRECT) return iface->data.direct.linkdev; + if (iface->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) + return iface->data.hostdev.linkdev; if (iface->type != VIR_DOMAIN_NET_TYPE_NETWORK) return NULL; if (!iface->data.network.actual) return NULL; - return iface->data.network.actual->data.direct.linkdev; + if (iface->data.network.actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) + return iface->data.network.actual->data.hostdev.linkdev; + else + return iface->data.network.actual->data.direct.linkdev; }
int @@ -14676,11 +14705,16 @@ virDomainNetGetActualDirectMode(virDomainNetDefPtr iface) { if (iface->type == VIR_DOMAIN_NET_TYPE_DIRECT) return iface->data.direct.mode; + if (iface->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) + return iface->data.hostdev.mode; if (iface->type != VIR_DOMAIN_NET_TYPE_NETWORK) return 0; if (!iface->data.network.actual) return 0; - return iface->data.network.actual->data.direct.mode; + if (iface->data.network.actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) + return iface->data.network.actual->data.hostdev.mode; + else + return iface->data.network.actual->data.direct.mode; }
virDomainHostdevDefPtr diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 156eb32..171dd70 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -44,6 +44,7 @@ # include "virnetdevopenvswitch.h" # include "virnetdevbandwidth.h" # include "virnetdevvlan.h" +# include "virnetdev.h" # include "virobject.h" # include "device_conf.h"
@@ -778,6 +779,8 @@ struct _virDomainActualNetDef { int mode; /* enum virMacvtapMode from util/macvtap.h */ } direct; struct { + char *linkdev; + int mode;
It appears that the mode is always "bridge", so I don't see any point in including it here.
virDomainHostdevDef def; } hostdev; } data; @@ -833,6 +836,8 @@ struct _virDomainNetDef { int mode; /* enum virMacvtapMode from util/macvtap.h */ } direct; struct { + char *linkdev; + int mode; virDomainHostdevDef def; } hostdev; } data; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 10af063..2d06cc3 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1399,6 +1399,7 @@ virNetDevGetMTU; virNetDevGetPhysicalFunction; virNetDevGetVLanID; virNetDevGetVirtualFunctionIndex; +virNetDevGetPhysicalFunctionFromVfPciAddr; virNetDevGetVirtualFunctionInfo; virNetDevGetVirtualFunctions; virNetDevIsOnline; diff --git a/src/util/pci.c b/src/util/pci.c index 0742d07..ba8fffc 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -802,7 +802,7 @@ pciDriverFile(char **buffer, const char *driver, const char *file) return 0; }
-static int +int pciDeviceFile(char **buffer, const char *device, const char *file) { VIR_FREE(*buffer); diff --git a/src/util/pci.h b/src/util/pci.h index 8bbab07..936fee4 100644 --- a/src/util/pci.h +++ b/src/util/pci.h @@ -116,6 +116,8 @@ int pciConfigAddressToSysfsFile(struct pci_config_address *dev,
int pciDeviceNetName(char *device_link_sysfs_path, char **netname);
+int pciDeviceFile(char **buffer, const char *device, const char *file); + int pciSysfsFile(char *pciDeviceName, char **pci_sysfs_device_link) ATTRIBUTE_RETURN_CHECK;
diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index f622f5d..4d4fcb1 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -1097,6 +1097,46 @@ virNetDevGetVirtualFunctionIndex(const char *pfname, const char *vfname, }
/** + * virNetDevGetPhyscialFunctionFromVfPciAddr + * + * @domain : Domain part of VF PCI addr + * @bus : Bus part of VF PCI addr + * @slot : Slot part of VF PCI addr + * @function : Function part of VF PCI addr + * @pfname : Contains sriov physical function for Vf upon + * successful return + * + * Returns 0 on success, -1 on failure + * + */ +int +virNetDevGetPhysicalFunctionFromVfPciAddr(unsigned domain, + unsigned bus, + unsigned slot, + unsigned function, + char **pfname) +{ + char *pciConfigAddr = NULL; + char *physfn_sysfs_path = NULL; + int ret = -1; + + if (pciGetDeviceAddrString(domain, bus, slot, function, + &pciConfigAddr) < 0) { + goto cleanup; + } + if (pciDeviceFile(&physfn_sysfs_path, pciConfigAddr, "physfn") < 0) { + goto cleanup; + } + ret = pciDeviceNetName(physfn_sysfs_path, pfname); + +cleanup: + VIR_FREE(pciConfigAddr); + VIR_FREE(physfn_sysfs_path); + + return ret; +} + +/** * virNetDevGetPhysicalFunction * * @ifname : name of the physical function interface name diff --git a/src/util/virnetdev.h b/src/util/virnetdev.h index 705ad9c..2e102f2 100644 --- a/src/util/virnetdev.h +++ b/src/util/virnetdev.h @@ -99,6 +99,12 @@ int virNetDevGetVirtualFunctionIndex(const char *pfname, const char *vfname, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) ATTRIBUTE_RETURN_CHECK;
+int virNetDevGetPhysicalFunctionFromVfPciAddr(unsigned domain, unsigned bus, + unsigned slot, unsigned function, + char **pfname) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4) ATTRIBUTE_RETURN_CHECK; + int virNetDevGetPhysicalFunction(const char *ifname, char **pfname) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK;

On 09/11/2012 08:15 PM, Laine Stump wrote:
(not a full review yet, but a couple of things I noticed in passing)
On 09/07/2012 12:15 PM, Shradha Shah wrote:
In this mode the guest contains a Virtual network device along with a SRIOV VF passed through to the guest as a pci device. --- src/conf/domain_conf.c | 38 ++++++++++++++++++++++++++++++++++++-- src/conf/domain_conf.h | 5 +++++ src/libvirt_private.syms | 1 + src/util/pci.c | 2 +- src/util/pci.h | 2 ++ src/util/virnetdev.c | 40 ++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 6 ++++++ 7 files changed, 91 insertions(+), 3 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index d8ab40c..c59ea00 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -1022,6 +1022,7 @@ virDomainActualNetDefFree(virDomainActualNetDefPtr def) virDomainHostdevDefClear(&def->data.hostdev.def); break; case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: + VIR_FREE(def->data.hostdev.linkdev); virDomainHostdevDefClear(&def->data.hostdev.def); break; default: @@ -1077,6 +1078,7 @@ void virDomainNetDefFree(virDomainNetDefPtr def) break;
case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: + VIR_FREE(def->data.hostdev.linkdev); virDomainHostdevDefClear(&def->data.hostdev.def); break;
@@ -4687,6 +4689,7 @@ virDomainNetDefParseXML(virCapsPtr caps, char *mode = NULL; char *linkstate = NULL; char *addrtype = NULL; + char *pfname = NULL; virNWFilterHashTablePtr filterparams = NULL; virDomainActualNetDefPtr actual = NULL; xmlNodePtr oldnode = ctxt->node; @@ -5024,6 +5027,27 @@ virDomainNetDefParseXML(virCapsPtr caps, hostdev, flags) < 0) { goto error; } + + if (hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) { + if (virNetDevGetPhysicalFunctionFromVfPciAddr(hostdev->source.subsys.u.pci.domain, + hostdev->source.subsys.u.pci.bus, + hostdev->source.subsys.u.pci.slot, + hostdev->source.subsys.u.pci.function, + &pfname) < 0) {
It's problematic to have this call in a parsing function - it requires the actual hardware to be present on the machine doing the parse. For example, doing this in the parse causes the qemuxml2xml and qemuxml2argv tests to fail on my system, because my hardware doesn't match yours:
One last thing - after applying the entire series, I'm getting the following failures in make check:
VIR_TEST_DEBUG=2 ./qemuxml2xmltest TEST: qemuxml2xmltest [...] 7) QEMU XML-2-XML net-hostdevhybrid ... libvir: Domain Config error : internal error Could not get Physical Function of the hostdev
...
VIR_TEST_DEBUG=2 ./qemuxml2argvtest TEST: qemuxml2argvtest [...] 113) QEMU XML-2-ARGV net-hostdevhybrid ... libvir: Domain Config error : internal error Could not get Physical Function of the hostdev
I couldn't find any other uses of network device "ioctl" type calls in the conf directory. I think we at least frown on doing that in parsing/formatting, and may even forbid it.
At any rate, You need to not do this here. Instead, perhaps you can leave it empty if it's not given explicitly in the input XML, and fill it in in the caller when appropriate?
Thanks for the review Laine. I had actually encountered this issue when I was working on the patches but at that moment I could not think of a workaround. Thanks for the suggestion. I will do the needful.
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not get Physical Function of the hostdev")); + goto error; + } + } + if (pfname != NULL) + def->data.hostdev.linkdev = strdup(pfname); + else { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Linkdev is required in %s mode"), + virDomainNetTypeToString(def->type)); + goto error; + } + def->data.hostdev.mode = VIR_NETDEV_MACVLAN_MODE_BRIDGE; break;
case VIR_DOMAIN_NET_TYPE_USER: @@ -14664,11 +14688,16 @@ virDomainNetGetActualDirectDev(virDomainNetDefPtr iface) { if (iface->type == VIR_DOMAIN_NET_TYPE_DIRECT) return iface->data.direct.linkdev; + if (iface->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) + return iface->data.hostdev.linkdev; if (iface->type != VIR_DOMAIN_NET_TYPE_NETWORK) return NULL; if (!iface->data.network.actual) return NULL; - return iface->data.network.actual->data.direct.linkdev; + if (iface->data.network.actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) + return iface->data.network.actual->data.hostdev.linkdev; + else + return iface->data.network.actual->data.direct.linkdev; }
int @@ -14676,11 +14705,16 @@ virDomainNetGetActualDirectMode(virDomainNetDefPtr iface) { if (iface->type == VIR_DOMAIN_NET_TYPE_DIRECT) return iface->data.direct.mode; + if (iface->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) + return iface->data.hostdev.mode; if (iface->type != VIR_DOMAIN_NET_TYPE_NETWORK) return 0; if (!iface->data.network.actual) return 0; - return iface->data.network.actual->data.direct.mode; + if (iface->data.network.actual->type == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) + return iface->data.network.actual->data.hostdev.mode; + else + return iface->data.network.actual->data.direct.mode; }
virDomainHostdevDefPtr diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 156eb32..171dd70 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -44,6 +44,7 @@ # include "virnetdevopenvswitch.h" # include "virnetdevbandwidth.h" # include "virnetdevvlan.h" +# include "virnetdev.h" # include "virobject.h" # include "device_conf.h"
@@ -778,6 +779,8 @@ struct _virDomainActualNetDef { int mode; /* enum virMacvtapMode from util/macvtap.h */ } direct; struct { + char *linkdev; + int mode;
It appears that the mode is always "bridge", so I don't see any point in including it here.
virDomainHostdevDef def; } hostdev; } data; @@ -833,6 +836,8 @@ struct _virDomainNetDef { int mode; /* enum virMacvtapMode from util/macvtap.h */ } direct; struct { + char *linkdev; + int mode; virDomainHostdevDef def; } hostdev; } data; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 10af063..2d06cc3 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1399,6 +1399,7 @@ virNetDevGetMTU; virNetDevGetPhysicalFunction; virNetDevGetVLanID; virNetDevGetVirtualFunctionIndex; +virNetDevGetPhysicalFunctionFromVfPciAddr; virNetDevGetVirtualFunctionInfo; virNetDevGetVirtualFunctions; virNetDevIsOnline; diff --git a/src/util/pci.c b/src/util/pci.c index 0742d07..ba8fffc 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -802,7 +802,7 @@ pciDriverFile(char **buffer, const char *driver, const char *file) return 0; }
-static int +int pciDeviceFile(char **buffer, const char *device, const char *file) { VIR_FREE(*buffer); diff --git a/src/util/pci.h b/src/util/pci.h index 8bbab07..936fee4 100644 --- a/src/util/pci.h +++ b/src/util/pci.h @@ -116,6 +116,8 @@ int pciConfigAddressToSysfsFile(struct pci_config_address *dev,
int pciDeviceNetName(char *device_link_sysfs_path, char **netname);
+int pciDeviceFile(char **buffer, const char *device, const char *file); + int pciSysfsFile(char *pciDeviceName, char **pci_sysfs_device_link) ATTRIBUTE_RETURN_CHECK;
diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index f622f5d..4d4fcb1 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -1097,6 +1097,46 @@ virNetDevGetVirtualFunctionIndex(const char *pfname, const char *vfname, }
/** + * virNetDevGetPhyscialFunctionFromVfPciAddr + * + * @domain : Domain part of VF PCI addr + * @bus : Bus part of VF PCI addr + * @slot : Slot part of VF PCI addr + * @function : Function part of VF PCI addr + * @pfname : Contains sriov physical function for Vf upon + * successful return + * + * Returns 0 on success, -1 on failure + * + */ +int +virNetDevGetPhysicalFunctionFromVfPciAddr(unsigned domain, + unsigned bus, + unsigned slot, + unsigned function, + char **pfname) +{ + char *pciConfigAddr = NULL; + char *physfn_sysfs_path = NULL; + int ret = -1; + + if (pciGetDeviceAddrString(domain, bus, slot, function, + &pciConfigAddr) < 0) { + goto cleanup; + } + if (pciDeviceFile(&physfn_sysfs_path, pciConfigAddr, "physfn") < 0) { + goto cleanup; + } + ret = pciDeviceNetName(physfn_sysfs_path, pfname); + +cleanup: + VIR_FREE(pciConfigAddr); + VIR_FREE(physfn_sysfs_path); + + return ret; +} + +/** * virNetDevGetPhysicalFunction * * @ifname : name of the physical function interface name diff --git a/src/util/virnetdev.h b/src/util/virnetdev.h index 705ad9c..2e102f2 100644 --- a/src/util/virnetdev.h +++ b/src/util/virnetdev.h @@ -99,6 +99,12 @@ int virNetDevGetVirtualFunctionIndex(const char *pfname, const char *vfname, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) ATTRIBUTE_RETURN_CHECK;
+int virNetDevGetPhysicalFunctionFromVfPciAddr(unsigned domain, unsigned bus, + unsigned slot, unsigned function, + char **pfname) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4) ATTRIBUTE_RETURN_CHECK; + int virNetDevGetPhysicalFunction(const char *ifname, char **pfname) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK;
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

The parent type for hostdev hybrid needs to be VIR_DOMAIN_DEVICE_NONE as the device is passed into the guest as a PCI Device. In order to store the information of the NETDEV that is the parent of the HOSTDEV in question we use a new variable actualParent. This variable also helps during VF MAC address, vlan and virtportprofile configuration. ActualParent = Parent in case of forward mode="hostdev" --- src/conf/domain_conf.c | 9 +++ src/conf/domain_conf.h | 1 + src/qemu/qemu_hostdev.c | 152 +++++++++++++++++++++++++++++++++-------------- src/qemu/qemu_hotplug.c | 2 +- 4 files changed, 118 insertions(+), 46 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index c59ea00..52c00db 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -4587,6 +4587,8 @@ virDomainActualNetDefParseXML(xmlNodePtr node, hostdev->parent.type = VIR_DOMAIN_DEVICE_NET; hostdev->parent.data.net = parent; + hostdev->actualParent.type = VIR_DOMAIN_DEVICE_NET; + hostdev->actualParent.data.net = parent; hostdev->info = &parent->info; /* The helper function expects type to already be found and * passed in as a string, since it is in a different place in @@ -4607,6 +4609,9 @@ virDomainActualNetDefParseXML(xmlNodePtr node, virDomainHostdevDefPtr hostdev = &actual->data.hostdev.def; hostdev->parent.type = VIR_DOMAIN_DEVICE_NONE; + hostdev->actualParent.type = VIR_DOMAIN_DEVICE_NET; + hostdev->actualParent.data.net = parent; + if (VIR_ALLOC(hostdev->info) < 0) { virReportOOMError(); goto error; @@ -4990,6 +4995,8 @@ virDomainNetDefParseXML(virCapsPtr caps, hostdev = &def->data.hostdev.def; hostdev->parent.type = VIR_DOMAIN_DEVICE_NET; hostdev->parent.data.net = def; + hostdev->actualParent.type = VIR_DOMAIN_DEVICE_NET; + hostdev->actualParent.data.net = def; hostdev->info = &def->info; /* The helper function expects type to already be found and * passed in as a string, since it is in a different place in @@ -5011,6 +5018,8 @@ virDomainNetDefParseXML(virCapsPtr caps, case VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID: hostdev = &def->data.hostdev.def; hostdev->parent.type = VIR_DOMAIN_DEVICE_NONE; + hostdev->actualParent.type = VIR_DOMAIN_DEVICE_NET; + hostdev->actualParent.data.net = def; if (VIR_ALLOC(hostdev->info) < 0) { virReportOOMError(); goto error; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 171dd70..adbb777 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -377,6 +377,7 @@ struct _virDomainHostdevSubsys { /* basic device for direct passthrough */ struct _virDomainHostdevDef { virDomainDeviceDef parent; /* higher level Def containing this */ + virDomainDeviceDef actualParent; /*used only in the case of hybrid hostdev*/ int mode; /* enum virDomainHostdevMode */ unsigned int managed : 1; unsigned int ephemeral : 1; diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index 46c84b5..a060a7e 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -320,43 +320,87 @@ qemuDomainHostdevNetConfigReplace(virDomainHostdevDefPtr hostdev, if (qemuDomainHostdevNetDevice(hostdev, &linkdev, &vf) < 0) return ret; - vlan = virDomainNetGetActualVlan(hostdev->parent.data.net); - virtPort = virDomainNetGetActualVirtPortProfile( - hostdev->parent.data.net); - if (virtPort) { - if (vlan) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("direct setting of the vlan tag is not allowed " - "for hostdev devices using %s mode"), - virNetDevVPortTypeToString(virtPort->virtPortType)); - goto cleanup; - } - ret = qemuDomainHostdevNetConfigVirtPortProfile(linkdev, vf, - virtPort, &hostdev->parent.data.net->mac, uuid, - port_profile_associate); - } else { - /* Set only mac and vlan */ - if (vlan) { - if (vlan->nTags != 1 || vlan->trunk) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("vlan trunking is not supported " - "by SR-IOV network devices")); + if (hostdev->parent.data.net) { + vlan = virDomainNetGetActualVlan(hostdev->parent.data.net); + virtPort = virDomainNetGetActualVirtPortProfile(hostdev->parent.data.net); + if (virtPort) { + if (vlan) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("direct setting of the vlan tag is not allowed " + "for hostdev devices using %s mode"), + virNetDevVPortTypeToString(virtPort->virtPortType)); goto cleanup; } - if (vf == -1) { + ret = qemuDomainHostdevNetConfigVirtPortProfile(linkdev, vf, + virtPort, + &hostdev->parent.data.net->mac, + uuid, + port_profile_associate); + } else { + /* Set only mac and vlan */ + if (vlan) { + if (vlan->nTags != 1 || vlan->trunk) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("vlan trunking is not supported " + "by SR-IOV network devices")); + goto cleanup; + } + if (vf == -1) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("vlan can only be set for SR-IOV VFs, but " + "%s is not a VF"), linkdev); + goto cleanup; + } + vlanid = vlan->tag[0]; + } else if (vf >= 0) { + vlanid = 0; /* assure any current vlan tag is reset */ + } + + ret = virNetDevReplaceNetConfig(linkdev, vf, + &hostdev->parent.data.net->mac, + vlanid, stateDir); + } + } + else if (hostdev->actualParent.data.net) { + vlan = virDomainNetGetActualVlan(hostdev->actualParent.data.net); + virtPort = virDomainNetGetActualVirtPortProfile(hostdev->actualParent.data.net); + if (virtPort) { + if (vlan) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("vlan can only be set for SR-IOV VFs, but " - "%s is not a VF"), linkdev); + _("direct setting of the vlan tag is not allowed " + "for hostdev devices using %s mode"), + virNetDevVPortTypeToString(virtPort->virtPortType)); goto cleanup; } - vlanid = vlan->tag[0]; - } else if (vf >= 0) { - vlanid = 0; /* assure any current vlan tag is reset */ - } + ret = qemuDomainHostdevNetConfigVirtPortProfile(linkdev, vf, + virtPort, + &hostdev->actualParent.data.net->mac, + uuid, + port_profile_associate); + } else { + /* Set only mac and vlan */ + if (vlan) { + if (vlan->nTags != 1 || vlan->trunk) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("vlan trunking is not supported " + "by SR-IOV network devices")); + goto cleanup; + } + if (vf == -1) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("vlan can only be set for SR-IOV VFs, but " + "%s is not a VF"), linkdev); + goto cleanup; + } + vlanid = vlan->tag[0]; + } else if (vf >= 0) { + vlanid = 0; /* assure any current vlan tag is reset */ + } - ret = virNetDevReplaceNetConfig(linkdev, vf, - &hostdev->parent.data.net->mac, - vlanid, stateDir); + ret = virNetDevReplaceNetConfig(linkdev, vf, + &hostdev->actualParent.data.net->mac, + vlanid, stateDir); + } } cleanup: VIR_FREE(linkdev); @@ -385,14 +429,26 @@ qemuDomainHostdevNetConfigRestore(virDomainHostdevDefPtr hostdev, if (qemuDomainHostdevNetDevice(hostdev, &linkdev, &vf) < 0) return ret; - virtPort = virDomainNetGetActualVirtPortProfile( - hostdev->parent.data.net); - if (virtPort) - ret = qemuDomainHostdevNetConfigVirtPortProfile(linkdev, vf, virtPort, - &hostdev->parent.data.net->mac, NULL, - port_profile_associate); - else - ret = virNetDevRestoreNetConfig(linkdev, vf, stateDir); + if (hostdev->parent.data.net) { + virtPort = virDomainNetGetActualVirtPortProfile(hostdev->parent.data.net); + if (virtPort) + ret = qemuDomainHostdevNetConfigVirtPortProfile(linkdev, vf, virtPort, + &hostdev->parent.data.net->mac, + NULL, + port_profile_associate); + else + ret = virNetDevRestoreNetConfig(linkdev, vf, stateDir); + } + if (hostdev->actualParent.data.net) { + virtPort = virDomainNetGetActualVirtPortProfile(hostdev->actualParent.data.net); + if (virtPort) + ret = qemuDomainHostdevNetConfigVirtPortProfile(linkdev, vf, virtPort, + &hostdev->actualParent.data.net->mac, + NULL, + port_profile_associate); + else + ret = virNetDevRestoreNetConfig(linkdev, vf, stateDir); + } VIR_FREE(linkdev); @@ -478,8 +534,10 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, continue; if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) continue; - if (hostdev->parent.type == VIR_DOMAIN_DEVICE_NET && - hostdev->parent.data.net) { + if ((hostdev->parent.type == VIR_DOMAIN_DEVICE_NET && + hostdev->parent.data.net) || + (hostdev->actualParent.type == VIR_DOMAIN_DEVICE_NET && + hostdev->actualParent.data.net)) { if (qemuDomainHostdevNetConfigReplace(hostdev, uuid, driver->stateDir) < 0) { goto resetvfnetconfig; @@ -568,8 +626,10 @@ inactivedevs: resetvfnetconfig: for (i = 0; i < last_processed_hostdev_vf; i++) { virDomainHostdevDefPtr hostdev = hostdevs[i]; - if (hostdev->parent.type == VIR_DOMAIN_DEVICE_NET && - hostdev->parent.data.net) { + if ((hostdev->parent.type == VIR_DOMAIN_DEVICE_NET && + hostdev->parent.data.net) || + (hostdev->actualParent.type == VIR_DOMAIN_DEVICE_NET && + hostdev->actualParent.data.net)) { qemuDomainHostdevNetConfigRestore(hostdev, driver->stateDir); } } @@ -827,8 +887,10 @@ void qemuDomainReAttachHostdevDevices(struct qemud_driver *driver, continue; if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) continue; - if (hostdev->parent.type == VIR_DOMAIN_DEVICE_NET && - hostdev->parent.data.net) { + if ((hostdev->parent.type == VIR_DOMAIN_DEVICE_NET && + hostdev->parent.data.net) || + (hostdev->actualParent.type == VIR_DOMAIN_DEVICE_NET && + hostdev->actualParent.data.net)) { qemuDomainHostdevNetConfigRestore(hostdev, driver->stateDir); } } diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index a8a904c..e8860f3 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1996,7 +1996,7 @@ qemuDomainDetachHostPciDevice(struct qemud_driver *driver, * For SRIOV net host devices, unset mac and port profile before * reset and reattach device */ - if (detach->parent.data.net) + if (detach->parent.data.net || detach->actualParent.data.net) qemuDomainHostdevNetConfigRestore(detach, driver->stateDir); pci = pciGetDevice(subsys->u.pci.domain, subsys->u.pci.bus, -- 1.7.4.4

This patch updates the network driver to properly utilize the new attributes/elements that are now in virNetworkDef --- src/network/bridge_driver.c | 123 ++++++++++++++++++++++++++++++++++++++----- 1 files changed, 110 insertions(+), 13 deletions(-) diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 808c843..f94f81a 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -1993,9 +1993,9 @@ networkStartNetworkExternal(struct network_driver *driver ATTRIBUTE_UNUSED, virNetworkObjPtr network ATTRIBUTE_UNUSED) { /* put anything here that needs to be done each time a network of - * type BRIDGE, PRIVATE, VEPA, HOSTDEV or PASSTHROUGH is started. On - * failure, undo anything you've done, and return -1. On success - * return 0. + * type BRIDGE, PRIVATE, VEPA, HOSTDEV, HOSTDEV-HYBRID or PASSTHROUGH + * is started. On failure, undo anything you've done, and return -1. + * On success return 0. */ return 0; } @@ -2004,9 +2004,9 @@ static int networkShutdownNetworkExternal(struct network_driver *driver ATTRIBUT virNetworkObjPtr network ATTRIBUTE_UNUSED) { /* put anything here that needs to be done each time a network of - * type BRIDGE, PRIVATE, VEPA, HOSTDEV or PASSTHROUGH is shutdown. On - * failure, undo anything you've done, and return -1. On success - * return 0. + * type BRIDGE, PRIVATE, VEPA, HOSTDEV, HOSTDEV-HYBRID or PASSTHROUGH + * is shutdown. On failure, undo anything you've done, and return -1. + * On success return 0. */ return 0; } @@ -2036,6 +2036,7 @@ networkStartNetwork(struct network_driver *driver, case VIR_NETWORK_FORWARD_VEPA: case VIR_NETWORK_FORWARD_PASSTHROUGH: case VIR_NETWORK_FORWARD_HOSTDEV: + case VIR_NETWORK_FORWARD_HOSTDEV_HYBRID: ret = networkStartNetworkExternal(driver, network); break; } @@ -2096,6 +2097,7 @@ static int networkShutdownNetwork(struct network_driver *driver, case VIR_NETWORK_FORWARD_VEPA: case VIR_NETWORK_FORWARD_PASSTHROUGH: case VIR_NETWORK_FORWARD_HOSTDEV: + case VIR_NETWORK_FORWARD_HOSTDEV_HYBRID: ret = networkShutdownNetworkExternal(driver, network); break; } @@ -2885,7 +2887,8 @@ networkCreateInterfacePool(virNetworkDefPtr netdef) { goto finish; } } - else if (netdef->forwardType == VIR_NETWORK_FORWARD_HOSTDEV) { + else if (netdef->forwardType == VIR_NETWORK_FORWARD_HOSTDEV || + netdef->forwardType == VIR_NETWORK_FORWARD_HOSTDEV_HYBRID) { /* VF's are always PCI devices */ netdef->forwardIfs[ii].type = VIR_NETWORK_FORWARD_HOSTDEV_DEVICE_PCI; netdef->forwardIfs[ii].device.pci.domain = virt_fns[ii]->domain; @@ -3056,6 +3059,8 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) } iface->data.network.actual->data.hostdev.def.parent.type = VIR_DOMAIN_DEVICE_NET; iface->data.network.actual->data.hostdev.def.parent.data.net = iface; + iface->data.network.actual->data.hostdev.def.actualParent.type = VIR_DOMAIN_DEVICE_NET; + iface->data.network.actual->data.hostdev.def.actualParent.data.net = iface; iface->data.network.actual->data.hostdev.def.info = &iface->info; iface->data.network.actual->data.hostdev.def.mode = VIR_DOMAIN_HOSTDEV_MODE_SUBSYS; iface->data.network.actual->data.hostdev.def.managed = netdef->managed; @@ -3087,6 +3092,92 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) } } + } else if (netdef->forwardType == VIR_NETWORK_FORWARD_HOSTDEV_HYBRID) { + char *pfname = NULL; + if (!iface->data.network.actual + && (VIR_ALLOC(iface->data.network.actual) < 0)) { + virReportOOMError(); + goto error; + } + iface->data.network.actual->type = actualType = VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID; + + if (netdef->nForwardPfs > 0 && netdef->nForwardIfs <= 0 && + networkCreateInterfacePool(netdef) < 0) { + goto error; + } + /* pick first dev with 0 connections*/ + + for (ii = 0; ii < netdef->nForwardIfs; ii++) { + if (netdef->forwardIfs[ii].connections == 0) { + dev = &netdef->forwardIfs[ii]; + break; + } + } + if (!dev) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("network '%s' requires exclusive access to interfaces, but none are available"), + netdef->name); + goto error; + } + iface->data.network.actual->data.hostdev.def.parent.type = VIR_DOMAIN_DEVICE_NONE; + iface->data.network.actual->data.hostdev.def.actualParent.type = VIR_DOMAIN_DEVICE_NET; + iface->data.network.actual->data.hostdev.def.actualParent.data.net = iface; + + if (VIR_ALLOC(iface->data.network.actual->data.hostdev.def.info) < 0) { + virReportOOMError(); + goto cleanup; + } + iface->data.network.actual->data.hostdev.def.ephemeral = 1; + + iface->data.network.actual->data.hostdev.def.mode = VIR_DOMAIN_HOSTDEV_MODE_SUBSYS; + iface->data.network.actual->data.hostdev.def.managed = netdef->managed; + iface->data.network.actual->data.hostdev.def.source.subsys.type = dev->type; + iface->data.network.actual->data.hostdev.def.source.subsys.u.pci = dev->device.pci; + + if (virNetDevGetPhysicalFunctionFromVfPciAddr(dev->device.pci.domain, + dev->device.pci.bus, + dev->device.pci.slot, + dev->device.pci.function, + &pfname) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not get Physical function of the selected VF")); + goto cleanup; + } + + if (pfname != NULL) + iface->data.network.actual->data.hostdev.linkdev = strdup(pfname); + else { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Linkdev is required in hostdev-hybrid mode")); + goto cleanup; + } + iface->data.network.actual->data.hostdev.mode = VIR_NETDEV_MACVLAN_MODE_BRIDGE; + + /* merge virtualports from interface, network, and portgroup to + * arrive at actual virtualport to use + */ + if (virNetDevVPortProfileMerge3(&iface->data.network.actual->virtPortProfile, + iface->virtPortProfile, + netdef->virtPortProfile, + portgroup + ? portgroup->virtPortProfile : NULL) < 0) { + goto error; + } + virtport = iface->data.network.actual->virtPortProfile; + if (virtport) { + /* make sure type is supported for hostdev connections */ + if (virtport->virtPortType != VIR_NETDEV_VPORT_PROFILE_8021QBG && + virtport->virtPortType != VIR_NETDEV_VPORT_PROFILE_8021QBH) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("<virtualport type='%s'> not supported for network " + "'%s' which uses an SR-IOV Virtual Function " + "via PCI passthrough"), + virNetDevVPortTypeToString(virtport->virtPortType), + netdef->name); + goto error; + } + } + } else if ((netdef->forwardType == VIR_NETWORK_FORWARD_BRIDGE) || (netdef->forwardType == VIR_NETWORK_FORWARD_PRIVATE) || (netdef->forwardType == VIR_NETWORK_FORWARD_VEPA) || @@ -3252,7 +3343,8 @@ validate: if (dev) { /* we are now assured of success, so mark the allocation */ dev->connections++; - if (actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV) { + if (actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV && + actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) { VIR_DEBUG("Using physical device %s, %d connections", dev->device.dev, dev->connections); } else { @@ -3319,7 +3411,8 @@ networkNotifyActualDevice(virDomainNetDefPtr iface) if (!iface->data.network.actual || (actualType != VIR_DOMAIN_NET_TYPE_DIRECT && - actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV)) { + actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV && + actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)) { VIR_DEBUG("Nothing to claim from network %s", iface->data.network.name); goto success; } @@ -3387,7 +3480,8 @@ networkNotifyActualDevice(virDomainNetDefPtr iface) VIR_DEBUG("Using physical device %s, connections %d", dev->device.dev, dev->connections); - } else /* if (actualType == VIR_DOMAIN_NET_TYPE_HOSTDEV) */ { + } else if (actualType == VIR_DOMAIN_NET_TYPE_HOSTDEV || + actualType == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) { virDomainHostdevDefPtr hostdev; hostdev = virDomainNetGetActualHostdev(iface); @@ -3426,7 +3520,8 @@ networkNotifyActualDevice(virDomainNetDefPtr iface) * current connections count must be 0 in those cases. */ if ((dev->connections > 0) && - netdef->forwardType == VIR_NETWORK_FORWARD_HOSTDEV) { + (netdef->forwardType == VIR_NETWORK_FORWARD_HOSTDEV || + netdef->forwardType == VIR_NETWORK_FORWARD_HOSTDEV_HYBRID)) { virReportError(VIR_ERR_INTERNAL_ERROR, _("network '%s' claims the PCI device at " "domain=%d bus=%d slot=%d function=%d " @@ -3496,7 +3591,8 @@ networkReleaseActualDevice(virDomainNetDefPtr iface) if ((!iface->data.network.actual) || ((actualType != VIR_DOMAIN_NET_TYPE_DIRECT) && - (actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV))) { + (actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV) && + (actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID))) { VIR_DEBUG("Nothing to release to network %s", iface->data.network.name); goto success; } @@ -3541,7 +3637,8 @@ networkReleaseActualDevice(virDomainNetDefPtr iface) VIR_DEBUG("Releasing physical device %s, connections %d", dev->device.dev, dev->connections); - } else /* if (actualType == VIR_DOMAIN_NET_TYPE_HOSTDEV) */ { + } else if (actualType == VIR_DOMAIN_NET_TYPE_HOSTDEV || + actualType == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) { virDomainHostdevDefPtr hostdev; hostdev = virDomainNetGetActualHostdev(iface); -- 1.7.4.4

For network devices allocated from a network with <forward mode='hostdev-hybrid'>, there is a need to add the newly minted hostdev to the hostdevs array. In this case we also need to call qemuPrepareHostDevices just for this one device, as the standard call to initialize all the hostdevs that were defined directly in the domain's configuration has already been made by the time we allocate a device from a libvirt network, and thus have something that needs initializing. --- src/qemu/qemu_command.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_hostdev.c | 2 +- src/qemu/qemu_hotplug.c | 24 ++++++++++++++++-- src/qemu/qemu_process.c | 3 +- 4 files changed, 85 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index a83d6de..e5388b3 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -27,6 +27,7 @@ #include "qemu_hostdev.h" #include "qemu_capabilities.h" #include "qemu_bridge_filter.h" +#include "qemu_hostdev.h" #include "cpu/cpu.h" #include "memory.h" #include "logging.h" @@ -4357,10 +4358,16 @@ qemuBuildCommandLine(virConnectPtr conn, bool emitBootindex = false; int usbcontroller = 0; bool usblegacy = false; + + virDomainObjPtr vm = NULL; + virDomainObjListPtr doms = &driver->domains; + uname_normalize(&ut); virUUIDFormat(def->uuid, uuid); + vm = virHashLookup(doms->objs, uuid); + emulator = def->emulator; /* @@ -5309,6 +5316,60 @@ qemuBuildCommandLine(virConnectPtr conn, continue; } + if (actualType == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) { + if (net->type == VIR_DOMAIN_NET_TYPE_NETWORK) { + virDomainHostdevDefPtr hostdev = virDomainNetGetActualHostdev(net); + virDomainHostdevDefPtr found; + if (vmop == VIR_NETDEV_VPORT_PROFILE_OP_CREATE) { + if (qemuAssignDeviceHostdevAlias(def, hostdev, + (def->nhostdevs-1)) < 0) { + goto error; + } + + if (virDomainHostdevFind(def, hostdev, &found) < 0) { + qemuDomainObjPrivatePtr priv = vm->privateData; + if (qemuDomainPCIAddressEnsureAddr(priv->pciaddrs, + hostdev->info) < 0) { + goto error; + } + if (virDomainHostdevInsert(def, hostdev) < 0) { + virReportOOMError(); + goto error; + } + if (qemuPrepareHostdevPCIDevices(driver, def->name, + def->uuid, &hostdev, 1) < 0) { + goto error; + } + } + else { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("PCI device %04x:%02x:%02x.%x " + "allocated from network %s is already " + "in use by domain %s"), + hostdev->source.subsys.u.pci.domain, + hostdev->source.subsys.u.pci.bus, + hostdev->source.subsys.u.pci.slot, + hostdev->source.subsys.u.pci.function, + net->data.network.name, + def->name); + goto error; + } + } + } + + int tapfd = qemuPhysIfaceConnect(def, driver, net, + qemuCaps, vmop); + if (tapfd < 0) + goto error; + + last_good_net = i; + virCommandTransferFD(cmd, tapfd); + + if (snprintf(tapfd_name, sizeof(tapfd_name), "%d", + tapfd) >= sizeof(tapfd_name)) + goto no_memory; + } + if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK || actualType == VIR_DOMAIN_NET_TYPE_BRIDGE) { /* diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index a060a7e..4851e11 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -362,7 +362,7 @@ qemuDomainHostdevNetConfigReplace(virDomainHostdevDefPtr hostdev, } } else if (hostdev->actualParent.data.net) { - vlan = virDomainNetGetActualVlan(hostdev->actualParent.data.net); + vlan = virDomainNetGetActualVlan(hostdev->actualParent.data.net); virtPort = virDomainNetGetActualVirtPortProfile(hostdev->actualParent.data.net); if (virtPort) { if (vlan) { diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index e8860f3..e26ffc9 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -694,6 +694,21 @@ int qemuDomainAttachNetDevice(virConnectPtr conn, goto cleanup; } + if (actualType == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) { + ret = qemuDomainAttachHostDevice(driver, vm, + virDomainNetGetActualHostdev(net)); + if (ret < 0) + goto cleanup; + + if ((tapfd = qemuPhysIfaceConnect(vm->def, driver, net, + priv->qemuCaps, + VIR_NETDEV_VPORT_PROFILE_OP_CREATE)) < 0) + goto cleanup; + iface_connected = true; + if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps, &vhostfd) < 0) + goto cleanup; + } + if (actualType == VIR_DOMAIN_NET_TYPE_BRIDGE || actualType == VIR_DOMAIN_NET_TYPE_NETWORK) { /* @@ -2109,7 +2124,8 @@ int qemuDomainDetachThisHostDevice(struct qemud_driver *driver, VIR_WARN("Failed to restore host device labelling"); } virDomainHostdevRemove(vm->def, idx); - virDomainHostdevDefFree(detach); + if (detach->ephemeral == 0) + virDomainHostdevDefFree(detach); } return ret; } @@ -2199,7 +2215,8 @@ qemuDomainDetachNetDevice(struct qemud_driver *driver, goto cleanup; } - if (virDomainNetGetActualType(detach) == VIR_DOMAIN_NET_TYPE_HOSTDEV) { + if ((virDomainNetGetActualType(detach) == VIR_DOMAIN_NET_TYPE_HOSTDEV) || + (virDomainNetGetActualType(detach) == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)) { ret = qemuDomainDetachThisHostDevice(driver, vm, virDomainNetGetActualHostdev(detach), -1); @@ -2272,7 +2289,8 @@ qemuDomainDetachNetDevice(struct qemud_driver *driver, virDomainConfNWFilterTeardown(detach); - if (virDomainNetGetActualType(detach) == VIR_DOMAIN_NET_TYPE_DIRECT) { + if ((virDomainNetGetActualType(detach) == VIR_DOMAIN_NET_TYPE_DIRECT) || + (virDomainNetGetActualType(detach) == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)) { ignore_value(virNetDevMacVLanDeleteWithVPortProfile( detach->ifname, &detach->mac, virDomainNetGetActualDirectDev(detach), diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 77d679a..112c113 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -4169,7 +4169,8 @@ void qemuProcessStop(struct qemud_driver *driver, def = vm->def; for (i = 0; i < def->nnets; i++) { virDomainNetDefPtr net = def->nets[i]; - if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_DIRECT) { + if ((virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_DIRECT) || + (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID)) { ignore_value(virNetDevMacVLanDeleteWithVPortProfile( net->ifname, &net->mac, virDomainNetGetActualDirectDev(net), -- 1.7.4.4

--- include/libvirt/libvirt.h.in | 1 + src/conf/domain_conf.c | 24 +++++++++++++++++++----- src/qemu/qemu_domain.c | 6 +++++- src/qemu/qemu_domain.h | 3 ++- src/qemu/qemu_driver.c | 6 +++--- src/qemu/qemu_hostdev.c | 6 ++++++ src/qemu/qemu_migration.c | 4 ++-- 7 files changed, 38 insertions(+), 12 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index deb35ec..8e7a85d 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1651,6 +1651,7 @@ typedef enum { VIR_DOMAIN_XML_SECURE = (1 << 0), /* dump security sensitive information too */ VIR_DOMAIN_XML_INACTIVE = (1 << 1), /* dump inactive domain information */ VIR_DOMAIN_XML_UPDATE_CPU = (1 << 2), /* update guest CPU requirements according to host CPU */ + VIR_DOMAIN_XML_NO_EPHEMERAL_DEVICES = (1 << 24), /* Do not include ephemeral devices */ } virDomainXMLFlags; char * virDomainGetXMLDesc (virDomainPtr domain, diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 52c00db..5e2b224 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -13174,7 +13174,8 @@ virDomainDefFormatInternal(virDomainDefPtr def, virCheckFlags(DUMPXML_FLAGS | VIR_DOMAIN_XML_INTERNAL_STATUS | VIR_DOMAIN_XML_INTERNAL_ACTUAL_NET | - VIR_DOMAIN_XML_INTERNAL_PCI_ORIG_STATES, + VIR_DOMAIN_XML_INTERNAL_PCI_ORIG_STATES | + VIR_DOMAIN_XML_NO_EPHEMERAL_DEVICES, -1); if (!(type = virDomainVirtTypeToString(def->virtType))) { @@ -13675,10 +13676,22 @@ virDomainDefFormatInternal(virDomainDefPtr def, /* If parent.type != NONE, this is just a pointer to the * hostdev in a higher-level device (e.g. virDomainNetDef), * and will have already been formatted there. + * Hostdevs marked as ephemeral are hybrid hostdevs and + * should not be formatted. */ - if (def->hostdevs[n]->parent.type == VIR_DOMAIN_DEVICE_NONE && - virDomainHostdevDefFormat(buf, def->hostdevs[n], flags) < 0) { - goto cleanup; + if (def->hostdevs[n]->parent.type == VIR_DOMAIN_DEVICE_NONE) { + if ((flags & VIR_DOMAIN_XML_NO_EPHEMERAL_DEVICES) == 0) { + if (virDomainHostdevDefFormat(buf, def->hostdevs[n], flags) < 0) { + goto cleanup; + } + } + else { + if (def->hostdevs[n]->ephemeral == 0) { + if (virDomainHostdevDefFormat(buf, def->hostdevs[n], flags) < 0) { + goto cleanup; + } + } + } } } @@ -13727,7 +13740,8 @@ virDomainDefFormat(virDomainDefPtr def, unsigned int flags) { virBuffer buf = VIR_BUFFER_INITIALIZER; - virCheckFlags(DUMPXML_FLAGS, NULL); + virCheckFlags(DUMPXML_FLAGS | VIR_DOMAIN_XML_NO_EPHEMERAL_DEVICES, + NULL); if (virDomainDefFormatInternal(def, flags, &buf) < 0) return NULL; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 0ae30b7..a05e0dd 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1335,13 +1335,17 @@ char * qemuDomainDefFormatLive(struct qemud_driver *driver, virDomainDefPtr def, bool inactive, - bool compatible) + bool compatible, + bool ephemeral) { unsigned int flags = QEMU_DOMAIN_FORMAT_LIVE_FLAGS; if (inactive) flags |= VIR_DOMAIN_XML_INACTIVE; + if (ephemeral) + flags |= VIR_DOMAIN_XML_NO_EPHEMERAL_DEVICES; + return qemuDomainDefFormatXML(driver, def, flags, compatible); } diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index dff53cf..82c5f8d 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -269,7 +269,8 @@ char *qemuDomainFormatXML(struct qemud_driver *driver, char *qemuDomainDefFormatLive(struct qemud_driver *driver, virDomainDefPtr def, bool inactive, - bool compatible); + bool compatible, + bool ephemeral); void qemuDomainObjTaint(struct qemud_driver *driver, virDomainObjPtr obj, diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 8e8e00c..3b18e80 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2748,9 +2748,9 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, virDomainDefFree(def); goto endjob; } - xml = qemuDomainDefFormatLive(driver, def, true, true); + xml = qemuDomainDefFormatLive(driver, def, true, true, false); } else { - xml = qemuDomainDefFormatLive(driver, vm->def, true, true); + xml = qemuDomainDefFormatLive(driver, vm->def, true, true, false); } if (!xml) { virReportError(VIR_ERR_OPERATION_FAILED, @@ -11218,7 +11218,7 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, } else { /* Easiest way to clone inactive portion of vm->def is via * conversion in and back out of xml. */ - if (!(xml = qemuDomainDefFormatLive(driver, vm->def, true, false)) || + if (!(xml = qemuDomainDefFormatLive(driver, vm->def, true, false, false)) || !(def->dom = virDomainDefParseString(driver->caps, xml, QEMU_EXPECTED_VIRT_TYPES, VIR_DOMAIN_XML_INACTIVE))) diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index 4851e11..877f99f 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -50,6 +50,8 @@ qemuGetPciHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs) continue; if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) continue; + if (hostdev->ephemeral == 1) + continue; dev = pciGetDevice(hostdev->source.subsys.u.pci.domain, hostdev->source.subsys.u.pci.bus, @@ -92,6 +94,8 @@ qemuGetActivePciHostDeviceList(struct qemud_driver *driver, continue; if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) continue; + if (hostdev->ephemeral == 1) + continue; dev = pciGetDevice(hostdev->source.subsys.u.pci.domain, hostdev->source.subsys.u.pci.bus, @@ -133,6 +137,8 @@ int qemuUpdateActivePciHostdevs(struct qemud_driver *driver, continue; if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) continue; + if (hostdev->ephemeral == 1) + continue; dev = pciGetDevice(hostdev->source.subsys.u.pci.domain, hostdev->source.subsys.u.pci.bus, diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 1b21ef6..4a51e11 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1166,9 +1166,9 @@ char *qemuMigrationBegin(struct qemud_driver *driver, if (!virDomainDefCheckABIStability(vm->def, def)) goto cleanup; - rv = qemuDomainDefFormatLive(driver, def, false, true); + rv = qemuDomainDefFormatLive(driver, def, false, true, true); } else { - rv = qemuDomainDefFormatLive(driver, vm->def, false, true); + rv = qemuDomainDefFormatLive(driver, vm->def, false, true, true); } cleanup: -- 1.7.4.4

This patch uses the ephemeral flag to prevent the hybrid hostdev from being formatted into the xml. Before migration the hybrid hostdev is hot unplugged and hotplugged again after migration is the specific hostdev is available on the destination host. --- src/qemu/qemu_migration.c | 102 ++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 97 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 4a51e11..5398049 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -31,6 +31,7 @@ #include "qemu_monitor.h" #include "qemu_domain.h" #include "qemu_process.h" +#include "qemu_hotplug.h" #include "qemu_capabilities.h" #include "qemu_cgroup.h" @@ -49,6 +50,7 @@ #include "storage_file.h" #include "viruri.h" #include "hooks.h" +#include "network/bridge_driver.h" #define VIR_FROM_THIS VIR_FROM_QEMU @@ -122,6 +124,79 @@ struct _qemuMigrationCookie { virDomainDefPtr persistent; }; +static void +qemuMigrationRemoveEphemeralDevices(struct qemud_driver *driver, + virDomainObjPtr vm) +{ + virDomainHostdevDefPtr dev; + virDomainDeviceDef def; + unsigned int i; + + for (i = 0; i < vm->def->nhostdevs; i++) { + dev = vm->def->hostdevs[i]; + if (dev->ephemeral == 1) { + def.type = VIR_DOMAIN_DEVICE_HOSTDEV; + def.data.hostdev = dev; + + if (qemuDomainDetachHostDevice(driver, vm, &def) >= 0) { + continue; /* nhostdevs reduced */ + } + } + } +} + +static void +qemuMigrationRestoreEphemeralDevices(struct qemud_driver *driver, + virDomainObjPtr vm) +{ + virDomainNetDefPtr net; + unsigned int i; + + /* Do nothing if ephemeral devices are present in which case this + function was called before qemuMigrationRemoveEphemeralDevices */ + + for (i = 0; i < vm->def->nhostdevs; i++) { + if (vm->def->hostdevs[i]->ephemeral == 1) + return; + } + + for (i = 0; i < vm->def->nnets; i++) { + net = vm->def->nets[i]; + + if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) { + if (qemuDomainAttachHostDevice(driver, vm, + virDomainNetGetActualHostdev(net)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Hybrid Hostdev cannot be attached after migration")); + networkReleaseActualDevice(net); + } + } + return; + } +} + +static void +qemuMigrationAttachEphemeralDevices(struct qemud_driver *driver, + virDomainObjPtr vm) +{ + virDomainNetDefPtr net; + unsigned int i; + + for (i = 0; i < vm->def->nnets; i++) { + net = vm->def->nets[i]; + + if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_HOSTDEV_HYBRID) { + if (qemuDomainAttachHostDevice(driver, vm, + virDomainNetGetActualHostdev(net)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Hybrid Hostdev cannot be attached after migration")); + networkReleaseActualDevice(net); + } + } + } + return; +} + static void qemuMigrationCookieGraphicsFree(qemuMigrationCookieGraphicsPtr grap) { if (!grap) @@ -800,6 +875,7 @@ qemuMigrationIsAllowed(struct qemud_driver *driver, virDomainObjPtr vm, virDomainDefPtr def) { int nsnapshots; + unsigned int i; if (vm) { if (qemuProcessAutoDestroyActive(driver, vm)) { @@ -817,10 +893,12 @@ qemuMigrationIsAllowed(struct qemud_driver *driver, virDomainObjPtr vm, def = vm->def; } - if (def->nhostdevs > 0) { - virReportError(VIR_ERR_OPERATION_INVALID, - "%s", _("Domain with assigned host devices cannot be migrated")); - return false; + for (i = 0; i < def->nhostdevs; i++) { + if (def->hostdevs[i]->ephemeral == 0) { + virReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("Domain with assigned host devices cannot be migrated")); + return false; + } } return true; @@ -2043,6 +2121,8 @@ static int doNativeMigrate(struct qemud_driver *driver, driver, vm, uri, NULLSTR(cookiein), cookieinlen, cookieout, cookieoutlen, flags, resource); + qemuMigrationRemoveEphemeralDevices(driver, vm); + if (STRPREFIX(uri, "tcp:") && !STRPREFIX(uri, "tcp://")) { char *tmp; /* HACK: source host generates bogus URIs, so fix them up */ @@ -2069,6 +2149,9 @@ static int doNativeMigrate(struct qemud_driver *driver, ret = qemuMigrationRun(driver, vm, cookiein, cookieinlen, cookieout, cookieoutlen, flags, resource, &spec, dconn); + if (ret != 0 ) + qemuMigrationRestoreEphemeralDevices(driver, vm); + if (spec.destType == MIGRATION_DEST_FD) VIR_FORCE_CLOSE(spec.dest.fd.qemu); @@ -2107,6 +2190,8 @@ static int doTunnelMigrate(struct qemud_driver *driver, return -1; } + qemuMigrationRemoveEphemeralDevices(driver, vm); + spec.fwdType = MIGRATION_FWD_STREAM; spec.fwd.stream = st; @@ -2153,6 +2238,8 @@ static int doTunnelMigrate(struct qemud_driver *driver, cookieoutlen, flags, resource, &spec, dconn); cleanup: + if (ret != 0) + qemuMigrationRestoreEphemeralDevices(driver, vm); if (spec.destType == MIGRATION_DEST_FD) { VIR_FORCE_CLOSE(spec.dest.fd.qemu); VIR_FORCE_CLOSE(spec.dest.fd.local); @@ -2197,7 +2284,8 @@ static int doPeer2PeerMigrate2(struct qemud_driver *driver, */ if (!(dom_xml = qemuDomainFormatXML(driver, vm, VIR_DOMAIN_XML_SECURE | - VIR_DOMAIN_XML_UPDATE_CPU, + VIR_DOMAIN_XML_UPDATE_CPU | + VIR_DOMAIN_XML_NO_EPHEMERAL_DEVICES, true))) return -1; @@ -3057,6 +3145,8 @@ qemuMigrationFinish(struct qemud_driver *driver, goto endjob; } + qemuMigrationAttachEphemeralDevices(driver, vm); + /* Guest is successfully running, so cancel previous auto destroy */ qemuProcessAutoDestroyRemove(driver, vm); } else { @@ -3154,6 +3244,8 @@ int qemuMigrationConfirm(struct qemud_driver *driver, VIR_WARN("Failed to save status on vm %s", vm->def->name); goto cleanup; } + + qemuMigrationRestoreEphemeralDevices(driver, vm); } qemuMigrationCookieFree(mig); -- 1.7.4.4

On 09/07/2012 12:12 PM, Shradha Shah wrote:
This patch series adds the support for interface-type="hostdev-hybrid" and forward mode="hostdev-hybrid".
The hostdev-hybrid mode makes migration possible along with PCI-passthrough. I had posted a RFC on the hostdev-hybrid methodology earlier on the libvirt mailing list.
The RFC can be found here: https://www.redhat.com/archives/libvir-list/2012-February/msg00309.html
Before anything else, let me outline what I *think* happens with a hostdev-hybrid device entry, and you can tell me how far off I am :-): * Any hostdev-hybrid interface definition results in 2 PCI devices being added to the guest: a) a PCI passthrough of an SR-IOV VF (done essentially the same as <interface type='hostdev'> b) a virtio-net device which is connected via macvtap "bridge" mode (? is that always the case) to the PF of the VF in (a) * Both of these devices are assigned the same MAC address. * Each of these occupies one PCI address on the guest, so a total of 2 PCI addresses is needed for each hostdev-hybrid "device". (The redundancy in this statement is to be sure that I'm right, as that's an important point :-) * On the guest, these two network devices with matching MAC addresses are put together into a bond interface, with an extra driver that causes the bond to prefer the pci-passthrough device when it is present. So, under normal circumstances *all* traffic goes through the pci-passthrough device. * At migration time, since guests with attached pci-passthrough devices can't be migrated, the pci-passthrough device (which is found by searching the hostdev array for items with the "ephemeral" flag set) is detached. This reduces the bond interface on the guest to only having the virtio-net device, so traffic now passes through that device - it's slower, but connectivity is maintained. * on the destination, a new VF is found, setup with proper MAC address, VLAN, and 802.1QbX port info. A virtio-net device attached to the PF associated with this VF (via macvtap bridge mode) is also setup. The qemu commandline includes an entry for both of these devices. (Question: Is it the virtio-net device that uses the guest PCI address given in the <interface> device info?) (Question: actually, I guess the pci-passthrough device won't be attached until after the guest actually starts running on the destination host, correct?) * When migration is finished, the guest is shut down on the source and started up on the destination, leaving the new instance of the guest temporarily with just a single (virtio-net) device in the bond. * Finally, the pci-passthrough of the VF is attached to the guest, and the guest's bond interface resumes preferring this device, thus restoring full speed networking. Is that all correct? If so, one issue I have is that one of the devices (the pci-passthrough?) doesn't have its guest-side PCI address visible anywhere in the guest's XML, does it? This is problematic, because management applications (and libvirt itself) expect to be able to scan the list of devices to learn what PCI slots are occupied on the guest, and where they can add new devices. I have other questions beyond that, but either don't understand the code enough yet to verbalize them, or will ask them next to the associated code.

On 09/11/2012 01:07 PM, Laine Stump wrote:
* On the guest, these two network devices with matching MAC addresses are put together into a bond interface, with an extra driver that causes the bond to prefer the pci-passthrough device when it is present. So, under normal circumstances *all* traffic goes through the pci-passthrough device.
* At migration time, since guests with attached pci-passthrough devices can't be migrated, the pci-passthrough device (which is found by searching the hostdev array for items with the "ephemeral" flag set) is detached. This reduces the bond interface on the guest to only having the virtio-net device, so traffic now passes through that device - it's slower, but connectivity is maintained.
And if this is the case, it means that 1) the guest must be aware that it is virtualized, and 2) can detect when it is being migrated. The ideal virtualization is one in where the guest doesn't have to be aware of anything, but the goal of this patch is not ideal guest behavior, so much as faster performance by explicitly making virtualization a leaky interface where the guest has to cooperate. Assuming I'm correct, does that have any security implications on the host? Or are we okay even if the guest is malicious, because the worst the guest can do is use the slower interface rather than the faster pci-passthrough device?
I have other questions beyond that, but either don't understand the code enough yet to verbalize them, or will ask them next to the associated code.
Seconded :) -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 09/11/2012 04:47 PM, Eric Blake wrote:
On 09/11/2012 01:07 PM, Laine Stump wrote:
* On the guest, these two network devices with matching MAC addresses are put together into a bond interface, with an extra driver that causes the bond to prefer the pci-passthrough device when it is present. So, under normal circumstances *all* traffic goes through the pci-passthrough device.
* At migration time, since guests with attached pci-passthrough devices can't be migrated, the pci-passthrough device (which is found by searching the hostdev array for items with the "ephemeral" flag set) is detached. This reduces the bond interface on the guest to only having the virtio-net device, so traffic now passes through that device - it's slower, but connectivity is maintained. And if this is the case, it means that 1) the guest must be aware that it is virtualized, and 2) can detect when it is being migrated.
Unless I'm misunderstanding, the guest doesn't explicitly know that it's virtualized or that it's being migrated - the guest OS just knows that one of its PCI devices has been unplugged, and later that it's being re-plugged (of course since there's a special driver on the guest, at least that bit has a pretty good idea that it's in a virtual machine; but that's no different from the virtio-net guest driver (not to mention the guest agent)). I'm guessing that the migration will (should, anyway) fail if the guest OS fails to detach the device in a timely fashion.
The ideal virtualization is one in where the guest doesn't have to be aware of anything, but the goal of this patch is not ideal guest behavior, so much as faster performance by explicitly making virtualization a leaky interface where the guest has to cooperate.
Assuming I'm correct, does that have any security implications on the host? Or are we okay even if the guest is malicious, because the worst the guest can do is use the slower interface rather than the faster pci-passthrough device?
I think the only completely new functionality provided by these patches is the "ephemeral" flag, which makes it possible for the pci-passthrough device to be auto-detached to allow migration. So any *new* security concern would be related to that capability. Otherwise, I don't see this as any different than defining the two devices separately, which is already possible with existing libvirt. The single hostdev-hybrid "two-in-one" device does provide a convenience factor beyond just shortening the config - the PF to use for the virtio-net device is automatically derived from the VF that's allocated for the pci-passthrough device (in my mind that's the one thing that makes it desirable to have this special device type rather than just adding the "ephemeral" flag to hostdev and requiring guest configurations to have two separate device entries. Am I missing something else?) but it would still be possible to use existing device types to provide the same virtual hardware to the guest, and the guest could use that hardware in the same manner.
I have other questions beyond that, but either don't understand the code enough yet to verbalize them, or will ask them next to the associated code. Seconded :)

* At migration time, since guests with attached pci-passthrough devices can't be migrated, the pci-passthrough device (which is found by searching the hostdev array for items with the "ephemeral" flag set) is detached. This reduces the bond interface on the guest to only having the virtio-net device, so traffic now passes through that device - it's slower, but connectivity is maintained. And if this is the case, it means that 1) the guest must be aware that it is virtualized, and 2) can detect when it is being migrated.
Unless I'm misunderstanding, the guest doesn't explicitly know that it's virtualized or that it's being migrated - the guest OS just knows that one of its PCI devices has been unplugged, and later that it's being re-plugged (of course since there's a special driver on the guest, at least that bit has a pretty good idea that it's in a virtual machine; but that's no different from the virtio-net guest driver (not to mention the guest agent)). I'm guessing that the migration will (should, anyway) fail if the guest OS fails to detach the device in a timely fashion.
The migration will fail if the guest OS fails to detach the device in a timely fashion.
The ideal virtualization is one in where the guest doesn't have to be aware of anything, but the goal of this patch is not ideal guest behavior, so much as faster performance by explicitly making virtualization a leaky interface where the guest has to cooperate.
Assuming I'm correct, does that have any security implications on the host? Or are we okay even if the guest is malicious, because the worst the guest can do is use the slower interface rather than the faster pci-passthrough device?
I think the only completely new functionality provided by these patches is the "ephemeral" flag, which makes it possible for the pci-passthrough device to be auto-detached to allow migration. So any *new* security concern would be related to that capability.
Otherwise, I don't see this as any different than defining the two devices separately, which is already possible with existing libvirt.
The hostdev-hybrid model can be configured using manual steps and 2 devices separately with existing libvirt device types.
The single hostdev-hybrid "two-in-one" device does provide a convenience factor beyond just shortening the config - the PF to use for the virtio-net device is automatically derived from the VF that's allocated for the pci-passthrough device (in my mind that's the one thing that makes it desirable to have this special device type rather than just adding the "ephemeral" flag to hostdev and requiring guest configurations to have two separate device entries. Am I missing something else?) but it would still be possible to use existing device types to provide the same virtual hardware to the guest, and the guest could use that hardware in the same manner.
I have other questions beyond that, but either don't understand the code enough yet to verbalize them, or will ask them next to the associated code. Seconded :)
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Tue, Sep 11, 2012 at 03:07:25PM -0400, Laine Stump wrote:
On 09/07/2012 12:12 PM, Shradha Shah wrote:
This patch series adds the support for interface-type="hostdev-hybrid" and forward mode="hostdev-hybrid".
The hostdev-hybrid mode makes migration possible along with PCI-passthrough. I had posted a RFC on the hostdev-hybrid methodology earlier on the libvirt mailing list.
The RFC can be found here: https://www.redhat.com/archives/libvir-list/2012-February/msg00309.html
Before anything else, let me outline what I *think* happens with a hostdev-hybrid device entry, and you can tell me how far off I am :-):
* Any hostdev-hybrid interface definition results in 2 PCI devices being added to the guest:
a) a PCI passthrough of an SR-IOV VF (done essentially the same as <interface type='hostdev'> b) a virtio-net device which is connected via macvtap "bridge" mode (? is that always the case) to the PF of the VF in (a)
* Both of these devices are assigned the same MAC address.
* Each of these occupies one PCI address on the guest, so a total of 2 PCI addresses is needed for each hostdev-hybrid "device". (The redundancy in this statement is to be sure that I'm right, as that's an important point :-)
* On the guest, these two network devices with matching MAC addresses are put together into a bond interface, with an extra driver that causes the bond to prefer the pci-passthrough device when it is present. So, under normal circumstances *all* traffic goes through the pci-passthrough device.
* At migration time, since guests with attached pci-passthrough devices can't be migrated, the pci-passthrough device (which is found by searching the hostdev array for items with the "ephemeral" flag set) is detached. This reduces the bond interface on the guest to only having the virtio-net device, so traffic now passes through that device - it's slower, but connectivity is maintained.
* on the destination, a new VF is found, setup with proper MAC address, VLAN, and 802.1QbX port info. A virtio-net device attached to the PF associated with this VF (via macvtap bridge mode) is also setup. The qemu commandline includes an entry for both of these devices. (Question: Is it the virtio-net device that uses the guest PCI address given in the <interface> device info?) (Question: actually, I guess the pci-passthrough device won't be attached until after the guest actually starts running on the destination host, correct?)
* When migration is finished, the guest is shut down on the source and started up on the destination, leaving the new instance of the guest temporarily with just a single (virtio-net) device in the bond.
* Finally, the pci-passthrough of the VF is attached to the guest, and the guest's bond interface resumes preferring this device, thus restoring full speed networking.
Is that all correct?
If so, one issue I have is that one of the devices (the pci-passthrough?) doesn't have its guest-side PCI address visible anywhere in the guest's XML, does it? This is problematic, because management applications (and libvirt itself) expect to be able to scan the list of devices to learn what PCI slots are occupied on the guest, and where they can add new devices.
If that description is correct, then I have to wonder why we need to add all this code for a new "hybrid" device type. It seems to me like we can do all this already simply by listing one virtio device and one hostdev device in the guest XML. All that's required is to add support for the 'ephemeral' against hostdevs, so they are automagically unplugged. Technically we don't even need that, since a mgmt app can already just use regular hotunplug APIs before issuing the migrate API calls. These patches seem to add alot of complexity for mere syntactic sugar over existing capabilities. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 09/12/2012 05:59 AM, Daniel P. Berrange wrote:
On Tue, Sep 11, 2012 at 03:07:25PM -0400, Laine Stump wrote:
On 09/07/2012 12:12 PM, Shradha Shah wrote:
This patch series adds the support for interface-type="hostdev-hybrid" and forward mode="hostdev-hybrid".
The hostdev-hybrid mode makes migration possible along with PCI-passthrough. I had posted a RFC on the hostdev-hybrid methodology earlier on the libvirt mailing list.
The RFC can be found here: https://www.redhat.com/archives/libvir-list/2012-February/msg00309.html Before anything else, let me outline what I *think* happens with a hostdev-hybrid device entry, and you can tell me how far off I am :-):
* Any hostdev-hybrid interface definition results in 2 PCI devices being added to the guest:
a) a PCI passthrough of an SR-IOV VF (done essentially the same as <interface type='hostdev'> b) a virtio-net device which is connected via macvtap "bridge" mode (? is that always the case) to the PF of the VF in (a)
* Both of these devices are assigned the same MAC address.
* Each of these occupies one PCI address on the guest, so a total of 2 PCI addresses is needed for each hostdev-hybrid "device". (The redundancy in this statement is to be sure that I'm right, as that's an important point :-)
* On the guest, these two network devices with matching MAC addresses are put together into a bond interface, with an extra driver that causes the bond to prefer the pci-passthrough device when it is present. So, under normal circumstances *all* traffic goes through the pci-passthrough device.
* At migration time, since guests with attached pci-passthrough devices can't be migrated, the pci-passthrough device (which is found by searching the hostdev array for items with the "ephemeral" flag set) is detached. This reduces the bond interface on the guest to only having the virtio-net device, so traffic now passes through that device - it's slower, but connectivity is maintained.
* on the destination, a new VF is found, setup with proper MAC address, VLAN, and 802.1QbX port info. A virtio-net device attached to the PF associated with this VF (via macvtap bridge mode) is also setup. The qemu commandline includes an entry for both of these devices. (Question: Is it the virtio-net device that uses the guest PCI address given in the <interface> device info?) (Question: actually, I guess the pci-passthrough device won't be attached until after the guest actually starts running on the destination host, correct?)
* When migration is finished, the guest is shut down on the source and started up on the destination, leaving the new instance of the guest temporarily with just a single (virtio-net) device in the bond.
* Finally, the pci-passthrough of the VF is attached to the guest, and the guest's bond interface resumes preferring this device, thus restoring full speed networking.
Is that all correct?
If so, one issue I have is that one of the devices (the pci-passthrough?) doesn't have its guest-side PCI address visible anywhere in the guest's XML, does it? This is problematic, because management applications (and libvirt itself) expect to be able to scan the list of devices to learn what PCI slots are occupied on the guest, and where they can add new devices. If that description is correct,
That's a big "if" - keep in mind the author of the description :-) (seriously, it's very possible I'm missing some important point)
then I have to wonder why we need to add all this code for a new "hybrid" device type. It seems to me like we can do all this already simply by listing one virtio device and one hostdev device in the guest XML.
Aside from detaching/re-attaching the hostdev, the other thing that these patches bring is automatic derivation of the <source> of the virtio-net device from the hostdev. The hostdev device will be grabbed from a pool of VFs in a <network>, then a "reverse lookup" is done in PCI space to determine the PF for that VF - that's where the virtio-net device is connected. I suppose this could be handled by 1) putting only the VFs of a single PF in any network definition's device pool, and 2) always having two parallel network definitions like this: <network> <name>net-x-vfs-hostdev</name> <forward mode='hostdev' ephemeral='yes'> <pf dev='eth3'/> <!-- makes a list of all VFs for PF 'eth3' --> </forward> </network> <network> <name>net-x-pf-macvtap</name> <forward mode='bridge'> <interface dev='eth3'/> </forward> </network> Then each guest would have: <interface type='network'> <mac address='x:x:x:x:x:x'/> <network name='net-x-vfs-hostdev'> </interface> <interface type='network'> <mac address='x:x:x:x:x:x'/> <network name='net-x-pf-macvtap'> <model type='virtio'/> </interface> The problem with this is that then you can't have a pool that uses more than a single PF-worth of VFs. For example, I have an Intel 82576 card that has 2 PFs and 7 VFs per PF. This would mean that I can only have 7 VFs in a network. Let's say I have 10 guests and want to migrate them back and forth between two hosts, I would have to make some arbitrary decision that some would use "net-x-vfs-hostdev+net-x-pf-macvtap" and some others would use "net-y-vfs-hostdev+net-y-pf-macvtap". Even worse would be if I had > 14 guests - there would be artificial limits (beyond simply "no more than 14 guests/host") on which guests could be moved to which machine at any given time (I would have to oversubscribe the 7-guest limit for one pair of networks, and no more than 7 of that subset of guests could be on the same host at the same time). If, instead, the PF used for the virtio-net device is derived from the particular VF currently assigned to the same guest's hostdev, I can have a single network definition with VFs from multiple PFs, and they all become one big pool of resources. In that case, my only limit is the far simpler "no more than 14 guests/host"; no worries about *which* of the guests those 14 are. tl;dr - the two-in-one hostdev-hybrid device simplifies administrative decisions when you have/need multiple PFs. (another minor annoyance is that the dual device allows both to use the same auto-generated MAC address, but if we just use two individual devices, the MAC must be manually specified for each when the device is originally defined (so that they will match)).
All that's required is to add support for the 'ephemeral' against hostdevs, so they are automagically unplugged. Technically we don't even need that, since a mgmt app can already just use regular hotunplug APIs before issuing the migrate API calls.
I like the idea of having that capability at libvirt's level, so that you can easily try things out with virsh (is the ephemeral flag implemented so that it also works for virsh save/restore? That would be a double plus.) A lot of us don't really use anything higher level than virsh or virt-manager, especially for testing. (I actually think there's merit to adding the ephemeral flag (can anyone think of a better name? When I hear ephemeral, I think of that TV chef - Emeril) for hostdevs in general - it would provide a method of easily allowing save/restore/migration for guests that have hostdevs that could be temporarily detached without ill consequences. I think proper operation would require that qemu notify libvirt when it's *really* finished detaching a device though (I don't have it at hand right now, but there's an open BZ requesting that from qemu).)
These patches seem to add alot of complexity for mere syntactic sugar over existing capabilities.
I agree that the two-in-one device adds a lot of complexity. If we could find a way to derive the PF used for the virtio-net device from the VF used for the hostdev without having a combined two-in-one device entry (and being able to use a common auto-generated mac address would be nice too), then I would agree that it should be left as two separate device entries (if nothing else, this gives us an obvious place to put the PCI address of the 2nd device). I'm not sure how to do that without limiting pools to a single PF though. (I know, I know - the solution is for a higher level management application to modify the guest's config during migration according to what's in use. But if we're going to do that anyway, we may as well not have network definitions defining pools of interfaces in the first place.)

Please find my comments inline. Many Thanks, Regards, Shradha Shah On 09/12/2012 08:01 PM, Laine Stump wrote:
On 09/12/2012 05:59 AM, Daniel P. Berrange wrote:
On Tue, Sep 11, 2012 at 03:07:25PM -0400, Laine Stump wrote:
On 09/07/2012 12:12 PM, Shradha Shah wrote:
This patch series adds the support for interface-type="hostdev-hybrid" and forward mode="hostdev-hybrid".
The hostdev-hybrid mode makes migration possible along with PCI-passthrough. I had posted a RFC on the hostdev-hybrid methodology earlier on the libvirt mailing list.
The RFC can be found here: https://www.redhat.com/archives/libvir-list/2012-February/msg00309.html Before anything else, let me outline what I *think* happens with a hostdev-hybrid device entry, and you can tell me how far off I am :-):
* Any hostdev-hybrid interface definition results in 2 PCI devices being added to the guest:
a) a PCI passthrough of an SR-IOV VF (done essentially the same as <interface type='hostdev'> b) a virtio-net device which is connected via macvtap "bridge" mode (? is that always the case) to the PF of the VF in (a)
* Both of these devices are assigned the same MAC address.
* Each of these occupies one PCI address on the guest, so a total of 2 PCI addresses is needed for each hostdev-hybrid "device". (The redundancy in this statement is to be sure that I'm right, as that's an important point :-)
* On the guest, these two network devices with matching MAC addresses are put together into a bond interface, with an extra driver that causes the bond to prefer the pci-passthrough device when it is present. So, under normal circumstances *all* traffic goes through the pci-passthrough device.
* At migration time, since guests with attached pci-passthrough devices can't be migrated, the pci-passthrough device (which is found by searching the hostdev array for items with the "ephemeral" flag set) is detached. This reduces the bond interface on the guest to only having the virtio-net device, so traffic now passes through that device - it's slower, but connectivity is maintained.
* on the destination, a new VF is found, setup with proper MAC address, VLAN, and 802.1QbX port info. A virtio-net device attached to the PF associated with this VF (via macvtap bridge mode) is also setup. The qemu commandline includes an entry for both of these devices. (Question: Is it the virtio-net device that uses the guest PCI address given in the <interface> device info?) (Question: actually, I guess the pci-passthrough device won't be attached until after the guest actually starts running on the destination host, correct?)
* When migration is finished, the guest is shut down on the source and started up on the destination, leaving the new instance of the guest temporarily with just a single (virtio-net) device in the bond.
* Finally, the pci-passthrough of the VF is attached to the guest, and the guest's bond interface resumes preferring this device, thus restoring full speed networking.
Is that all correct?
If so, one issue I have is that one of the devices (the pci-passthrough?) doesn't have its guest-side PCI address visible anywhere in the guest's XML, does it? This is problematic, because management applications (and libvirt itself) expect to be able to scan the list of devices to learn what PCI slots are occupied on the guest, and where they can add new devices. If that description is correct,
That's a big "if" - keep in mind the author of the description :-) (seriously, it's very possible I'm missing some important point)
then I have to wonder why we need to add all this code for a new "hybrid" device type. It seems to me like we can do all this already simply by listing one virtio device and one hostdev device in the guest XML.
Aside from detaching/re-attaching the hostdev, the other thing that these patches bring is automatic derivation of the <source> of the virtio-net device from the hostdev. The hostdev device will be grabbed from a pool of VFs in a <network>, then a "reverse lookup" is done in PCI space to determine the PF for that VF - that's where the virtio-net device is connected.
I suppose this could be handled by 1) putting only the VFs of a single PF in any network definition's device pool, and 2) always having two parallel network definitions like this:
<network> <name>net-x-vfs-hostdev</name> <forward mode='hostdev' ephemeral='yes'> <pf dev='eth3'/> <!-- makes a list of all VFs for PF 'eth3' --> </forward> </network>
<network> <name>net-x-pf-macvtap</name> <forward mode='bridge'> <interface dev='eth3'/> </forward> </network>
Then each guest would have:
<interface type='network'> <mac address='x:x:x:x:x:x'/> <network name='net-x-vfs-hostdev'> </interface> <interface type='network'> <mac address='x:x:x:x:x:x'/> <network name='net-x-pf-macvtap'> <model type='virtio'/> </interface>
The problem with this is that then you can't have a pool that uses more than a single PF-worth of VFs. For example, I have an Intel 82576 card that has 2 PFs and 7 VFs per PF. This would mean that I can only have 7 VFs in a network. Let's say I have 10 guests and want to migrate them back and forth between two hosts, I would have to make some arbitrary decision that some would use "net-x-vfs-hostdev+net-x-pf-macvtap" and some others would use "net-y-vfs-hostdev+net-y-pf-macvtap". Even worse would be if I had > 14 guests - there would be artificial limits (beyond simply "no more than 14 guests/host") on which guests could be moved to which machine at any given time (I would have to oversubscribe the 7-guest limit for one pair of networks, and no more than 7 of that subset of guests could be on the same host at the same time).
If, instead, the PF used for the virtio-net device is derived from the particular VF currently assigned to the same guest's hostdev, I can have a single network definition with VFs from multiple PFs, and they all become one big pool of resources. In that case, my only limit is the far simpler "no more than 14 guests/host"; no worries about *which* of the guests those 14 are. tl;dr - the two-in-one hostdev-hybrid device simplifies administrative decisions when you have/need multiple PFs.
(another minor annoyance is that the dual device allows both to use the same auto-generated MAC address, but if we just use two individual devices, the MAC must be manually specified for each when the device is originally defined (so that they will match)).
All that's required is to add support for the 'ephemeral' against hostdevs, so they are automagically unplugged. Technically we don't even need that, since a mgmt app can already just use regular hotunplug APIs before issuing the migrate API calls.
I like the idea of having that capability at libvirt's level, so that you can easily try things out with virsh (is the ephemeral flag implemented so that it also works for virsh save/restore? That would be a double plus.) A lot of us don't really use anything higher level than virsh or virt-manager, especially for testing.
The ephemeral flag is not currently implemented so that it works for virsh save/restore, but I can make the changes very easily if required. The ephemeral flag will be an addition to the network XML config similar to "managed".
(I actually think there's merit to adding the ephemeral flag (can anyone think of a better name? When I hear ephemeral, I think of that TV chef - Emeril) for hostdevs in general - it would provide a method of easily allowing save/restore/migration for guests that have hostdevs that could be temporarily detached without ill consequences. I think proper operation would require that qemu notify libvirt when it's *really* finished detaching a device though (I don't have it at hand right now, but there's an open BZ requesting that from qemu).)
These patches seem to add alot of complexity for mere syntactic sugar over existing capabilities.
I agree that the two-in-one device adds a lot of complexity. If we could find a way to derive the PF used for the virtio-net device from the VF used for the hostdev without having a combined two-in-one device entry (and being able to use a common auto-generated mac address would be nice too), then I would agree that it should be left as two separate device entries (if nothing else, this gives us an obvious place to put the PCI address of the 2nd device). I'm not sure how to do that without limiting pools to a single PF though. (I know, I know - the solution is for a higher level management application to modify the guest's config during migration according to what's in use. But if we're going to do that anyway, we may as well not have network definitions defining pools of interfaces in the first place.)
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Wed, Sep 12, 2012 at 03:01:08PM -0400, Laine Stump wrote:
On 09/12/2012 05:59 AM, Daniel P. Berrange wrote:
then I have to wonder why we need to add all this code for a new "hybrid" device type. It seems to me like we can do all this already simply by listing one virtio device and one hostdev device in the guest XML.
Aside from detaching/re-attaching the hostdev, the other thing that these patches bring is automatic derivation of the <source> of the virtio-net device from the hostdev. The hostdev device will be grabbed from a pool of VFs in a <network>, then a "reverse lookup" is done in PCI space to determine the PF for that VF - that's where the virtio-net device is connected.
I suppose this could be handled by 1) putting only the VFs of a single PF in any network definition's device pool, and 2) always having two parallel network definitions like this:
<network> <name>net-x-vfs-hostdev</name> <forward mode='hostdev' ephemeral='yes'> <pf dev='eth3'/> <!-- makes a list of all VFs for PF 'eth3' --> </forward> </network>
<network> <name>net-x-pf-macvtap</name> <forward mode='bridge'> <interface dev='eth3'/> </forward> </network>
Eww, that's a bit of a nasty duplication.
Then each guest would have:
<interface type='network'> <mac address='x:x:x:x:x:x'/> <network name='net-x-vfs-hostdev'> </interface> <interface type='network'> <mac address='x:x:x:x:x:x'/> <network name='net-x-pf-macvtap'> <model type='virtio'/> </interface>
The problem with this is that then you can't have a pool that uses more than a single PF-worth of VFs. For example, I have an Intel 82576 card that has 2 PFs and 7 VFs per PF. This would mean that I can only have 7 VFs in a network. Let's say I have 10 guests and want to migrate them back and forth between two hosts, I would have to make some arbitrary decision that some would use "net-x-vfs-hostdev+net-x-pf-macvtap" and some others would use "net-y-vfs-hostdev+net-y-pf-macvtap". Even worse would be if I had > 14 guests - there would be artificial limits (beyond simply "no more than 14 guests/host") on which guests could be moved to which machine at any given time (I would have to oversubscribe the 7-guest limit for one pair of networks, and no more than 7 of that subset of guests could be on the same host at the same time).
If, instead, the PF used for the virtio-net device is derived from the particular VF currently assigned to the same guest's hostdev, I can have a single network definition with VFs from multiple PFs, and they all become one big pool of resources. In that case, my only limit is the far simpler "no more than 14 guests/host"; no worries about *which* of the guests those 14 are. tl;dr - the two-in-one hostdev-hybrid device simplifies administrative decisions when you have/need multiple PFs.
(another minor annoyance is that the dual device allows both to use the same auto-generated MAC address, but if we just use two individual devices, the MAC must be manually specified for each when the device is originally defined (so that they will match)).
Why not just define a new element to put inside the <interface> tag to indicated two related devices. <paired/> and lookup the pairing based on the MAC address. Alternatively, you could define a new source type for the associated device, eg <interface type='paired'> and again asociate based on the MAC address.
All that's required is to add support for the 'ephemeral' against hostdevs, so they are automagically unplugged. Technically we don't even need that, since a mgmt app can already just use regular hotunplug APIs before issuing the migrate API calls.
I like the idea of having that capability at libvirt's level, so that you can easily try things out with virsh (is the ephemeral flag implemented so that it also works for virsh save/restore? That would be a double plus.) A lot of us don't really use anything higher level than virsh or virt-manager, especially for testing.
(I actually think there's merit to adding the ephemeral flag (can anyone think of a better name? When I hear ephemeral, I think of that TV chef - Emeril) for hostdevs in general - it would provide a method of easily allowing save/restore/migration for guests that have hostdevs that could be temporarily detached without ill consequences. I think proper operation would require that qemu notify libvirt when it's *really* finished detaching a device though (I don't have it at hand right now, but there's an open BZ requesting that from qemu).)
True, we can't safely do migration until QEMU has truely removed the PCI device from the guest, and must prevent migration if that doesn't happen. This is something that must be addressed regardless. As for the name, we already use 'ephemeral' and 'transient' in libvirt - either one of those would be reasonable choices.
These patches seem to add alot of complexity for mere syntactic sugar over existing capabilities.
I agree that the two-in-one device adds a lot of complexity. If we could find a way to derive the PF used for the virtio-net device from the VF used for the hostdev without having a combined two-in-one device entry (and being able to use a common auto-generated mac address would be nice too), then I would agree that it should be left as two separate device entries (if nothing else, this gives us an obvious place to put the PCI address of the 2nd device). I'm not sure how to do that without limiting pools to a single PF though. (I know, I know - the solution is for a higher level management application to modify the guest's config during migration according to what's in use. But if we're going to do that anyway, we may as well not have network definitions defining pools of interfaces in the first place.)
Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Laine, Please find my comments inline. Regards, Shradha Shah On 09/11/2012 08:07 PM, Laine Stump wrote:
On 09/07/2012 12:12 PM, Shradha Shah wrote:
This patch series adds the support for interface-type="hostdev-hybrid" and forward mode="hostdev-hybrid".
The hostdev-hybrid mode makes migration possible along with PCI-passthrough. I had posted a RFC on the hostdev-hybrid methodology earlier on the libvirt mailing list.
The RFC can be found here: https://www.redhat.com/archives/libvir-list/2012-February/msg00309.html
Before anything else, let me outline what I *think* happens with a hostdev-hybrid device entry, and you can tell me how far off I am :-):
* Any hostdev-hybrid interface definition results in 2 PCI devices being added to the guest:
a) a PCI passthrough of an SR-IOV VF (done essentially the same as <interface type='hostdev'> b) a virtio-net device which is connected via macvtap "bridge" mode (? is that always the case) to the PF of the VF in (a)
Yes the virtio-net device is always connected via macvtap "bridge" mode
* Both of these devices are assigned the same MAC address.
Correct.
* Each of these occupies one PCI address on the guest, so a total of 2 PCI addresses is needed for each hostdev-hybrid "device". (The redundancy in this statement is to be sure that I'm right, as that's an important point :-)
Yes a total of 2 PCI addresses are needed for each hostdev-hybrid "device"
* On the guest, these two network devices with matching MAC addresses are put together into a bond interface, with an extra driver that causes the bond to prefer the pci-passthrough device when it is present. So, under normal circumstances *all* traffic goes through the pci-passthrough device.
Correct.
* At migration time, since guests with attached pci-passthrough devices can't be migrated, the pci-passthrough device (which is found by searching the hostdev array for items with the "ephemeral" flag set) is detached. This reduces the bond interface on the guest to only having the virtio-net device, so traffic now passes through that device - it's slower, but connectivity is maintained.
Correct.
* on the destination, a new VF is found, setup with proper MAC address, VLAN, and 802.1QbX port info. A virtio-net device attached to the PF associated with this VF (via macvtap bridge mode) is also setup. The qemu commandline includes an entry for both of these devices. (Question: Is it the virtio-net device that uses the guest PCI address given in the <interface> device info?)
Yes, The virtio-net device gets the guest PCI address given in the <interface> device info. The VF gets a separate guest PCI address .
(Question: actually, I guess the pci-passthrough device won't be attached until after the guest actually starts running on the destination host, correct?)
The Vf is not attached until the migration completes because during migration qemu checks for PCI-passthrough devices on the source and on the destination.
* When migration is finished, the guest is shut down on the source and started up on the destination, leaving the new instance of the guest temporarily with just a single (virtio-net) device in the bond.
Yes, The VF is hotplug into the guest in the qemuMigrationFinish Stage
* Finally, the pci-passthrough of the VF is attached to the guest, and the guest's bond interface resumes preferring this device, thus restoring full speed networking.
Is that all correct?
If so, one issue I have is that one of the devices (the pci-passthrough?) doesn't have its guest-side PCI address visible anywhere in the guest's XML, does it? This is problematic, because management applications (and libvirt itself) expect to be able to scan the list of devices to learn what PCI slots are occupied on the guest, and where they can add new devices.
Actually the guest PCI address of the pci-passthrough device i.e. The VF is visible in the guest's XML when the guest is running. The VF will be plugged into the guest only when the guest is running or when the guest is not being migrated hence will be visible in the guest XML. i.e At any moment if a VF is plugged into the guest, its guest address will be visible in the guest XML and the PCI slot will be marked as being used. When the VF is hot-unplugged the PCI slot is free and hence can be used for some other device by management applications (libvirt).
I have other questions beyond that, but either don't understand the code enough yet to verbalize them, or will ask them next to the associated code.
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 09/13/2012 06:16 AM, Shradha Shah wrote:
On 09/11/2012 08:07 PM, Laine Stump wrote:
If so, one issue I have is that one of the devices (the pci-passthrough?) doesn't have its guest-side PCI address visible anywhere in the guest's XML, does it? This is problematic, because management applications (and libvirt itself) expect to be able to scan the list of devices to learn what PCI slots are occupied on the guest, and where they can add new devices. Actually the guest PCI address of the pci-passthrough device i.e. The VF is visible in the guest's XML when the guest is running. The VF will be plugged into the guest only when the guest is running or when the guest is not being migrated hence will be visible in the guest XML.
But there's only a place for one guest-side PCI address in each device element. Where is it showing up? Also, we really need for the same PCI address to be used each time the device is attached; although there may not appear to be a need for that now, past experience has shown that changing the PCI slot of a device over time inevitably leads to a problem somewhere with something :-/ Due to this, the general complexity of what's being done vs. what's being added, and also time/review bandwidth constraints I think that at least for this release we can't take the full hostdev-hybrid device patchset (really I think it will need to be re-thought and probably a different approach taken for specifying the two devices). However, the "ephemeral" attribute for <hostdev>, <interface type='hostdev'> and <forward mode='hostdev'> is fairly straightforward and provides generally useful new functionality (especially if it is expanded as mentioned to work for save/restore as well as migration) - with just this part of your patch, we can still get all of the desired functionality at the level of libvirt XML (with the two limitations of 1) networks limited to a single PF, and 2) duplicated mac addresses must be manually specified). How difficult would it be to create a patch with just that part of the functionality (plus the additional save/restore tweak)? If you could do that before the freeze on Tuesday AM we might be able to get it into 0.10.2 (which will be what Fedora 18 is based on).

On 09/14/2012 12:05 PM, Laine Stump wrote:
On 09/13/2012 06:16 AM, Shradha Shah wrote:
On 09/11/2012 08:07 PM, Laine Stump wrote:
If so, one issue I have is that one of the devices (the pci-passthrough?) doesn't have its guest-side PCI address visible anywhere in the guest's XML, does it? This is problematic, because management applications (and libvirt itself) expect to be able to scan the list of devices to learn what PCI slots are occupied on the guest, and where they can add new devices. Actually the guest PCI address of the pci-passthrough device i.e. The VF is visible in the guest's XML when the guest is running. The VF will be plugged into the guest only when the guest is running or when the guest is not being migrated hence will be visible in the guest XML.
But there's only a place for one guest-side PCI address in each device element. Where is it showing up?
The guest PCI address of the pci-passthrough device is visible in <hostdev> device element and that of the virtio-net is visible in the <interface> device element.
Also, we really need for the same PCI address to be used each time the device is attached; although there may not appear to be a need for that now, past experience has shown that changing the PCI slot of a device over time inevitably leads to a problem somewhere with something :-/
Due to this, the general complexity of what's being done vs. what's being added, and also time/review bandwidth constraints I think that at least for this release we can't take the full hostdev-hybrid device patchset (really I think it will need to be re-thought and probably a different approach taken for specifying the two devices).
However, the "ephemeral" attribute for <hostdev>, <interface type='hostdev'> and <forward mode='hostdev'> is fairly straightforward and provides generally useful new functionality (especially if it is expanded as mentioned to work for save/restore as well as migration) - with just this part of your patch, we can still get all of the desired functionality at the level of libvirt XML (with the two limitations of 1) networks limited to a single PF, and 2) duplicated mac addresses must be manually specified).
How difficult would it be to create a patch with just that part of the functionality (plus the additional save/restore tweak)? If you could do that before the freeze on Tuesday AM we might be able to get it into 0.10.2 (which will be what Fedora 18 is based on).
I can provide the required patches over the weekend.
participants (4)
-
Daniel P. Berrange
-
Eric Blake
-
Laine Stump
-
Shradha Shah