[libvirt PATCH 00/12] virtio failover / vfio auto-plug-on-migrate

https://bugzilla.redhat.com/1693587 'QEMU 4.2.0 and later, combined with a sufficiently recent guest virtio-net driver, supports setting up a simple network bond device comprised of one virtio emulated NIC and one hostdev NIC (which must be an SRIOV VF). The allure of this setup is that the bond will always favor the hostdev device, providing better performance, until the guest is migrated - at that time QEMU will automatically unplug the hostdev NIC and the bond will send all traffic via the virtio NIC until migration is completed, then QEMU on the destination side will hotplug a new hostdev NIC and the bond will switch back to using the hostdev for network traffic. The result is that guests desiring the extra performance of a hostdev NIC are now migratable without network downtime (performance is just degraded during migration) and without requiring a complicated bonding configuration in the guest OS network config and complicated unplug/replug logic in the management application on the host - it can instead all be accomplished in libvirt with the interface <driver> subelement "failover" and "backupAlias" attributes. Patches 1-4 are just simple refactoring with no functional change, 5-10 are the new functionality, patch 11 is documentation, and Patch 12 is an RFC for a method to solve a problem that oVirt has when trying to use this feature - while the virtio guest driver requires the pair of interfaces to have matching MAC addresses, oVirt requires every network interface to have a unique MAC. I'm not sure that I like having this hackishness in libvirt (see the commit log message), but it does solve oVirt's problem, and also makes direct config with libvirt XML simpler (since it removes the need to manually specify any MAC addresses in order to arrive at a working config, which is something that has always been the case before now). I'll leave it up to the jury to decide whether or not it's acceptable :-) Laine Stump (12): conf: refactor hostdev driver subelement format for simpler additions conf: change virDomainVirtioNet*Format() to return void conf: rename two static functions conf: refactor <interface>'s <driver> subelement parse/format qemu: add capabilities flag for failover feature conf: add failover attribute to <driver> subelement of <interface> qemu: add backupAlias attribute to <driver> subelement of hostdev devices conf: add backupAlias attribute to <interface> driver subelement qemu: allow migration with assigned PCI hostdev if backupAlias is set qemu: add wait-unplug to qemu migration status enum docs: document virtio failover / QEMU auto-plug of hostdev during migration conf/qemu: new <driver> attribute "useBackupMAC" docs/formatdomain.html.in | 74 +++ docs/news.xml | 27 + docs/schemas/domaincommon.rng | 15 + src/conf/domain_conf.c | 559 ++++++++++-------- src/conf/domain_conf.h | 57 +- src/libxl/libxl_driver.c | 2 +- src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 5 + src/qemu/qemu_domain.c | 21 +- src/qemu/qemu_hostdev.c | 5 +- src/qemu/qemu_hostdev.h | 1 + src/qemu/qemu_hotplug.c | 2 +- src/qemu/qemu_migration.c | 49 +- src/qemu/qemu_monitor.c | 1 + src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 1 + src/util/virhostdev.c | 47 +- src/util/virhostdev.h | 1 + .../caps_4.2.0.aarch64.xml | 1 + .../caps_4.2.0.x86_64.xml | 1 + .../net-virtio-failover-network.xml | 37 ++ .../qemuxml2argvdata/net-virtio-failover.args | 40 ++ .../qemuxml2argvdata/net-virtio-failover.xml | 50 ++ tests/qemuxml2argvtest.c | 4 + .../net-virtio-failover-network.xml | 51 ++ .../net-virtio-failover.xml | 66 +++ tests/qemuxml2xmltest.c | 6 + tests/virhostdevtest.c | 18 +- 29 files changed, 856 insertions(+), 289 deletions(-) create mode 100644 tests/qemuxml2argvdata/net-virtio-failover-network.xml create mode 100644 tests/qemuxml2argvdata/net-virtio-failover.args create mode 100644 tests/qemuxml2argvdata/net-virtio-failover.xml create mode 100644 tests/qemuxml2xmloutdata/net-virtio-failover-network.xml create mode 100644 tests/qemuxml2xmloutdata/net-virtio-failover.xml -- 2.24.1

This reorganizes with no functional change, to make it easier to see what is added in an upcoming patch. (Yes, the "backend != DEFAULT" clause is duplicated in the nested if - the first is to check if *any* attributes of <driver> are present, the second is checking for this specific attribute; this will be useful when we add a second attribute). Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 2a8a04cacb..0208d330c3 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -25022,16 +25022,23 @@ virDomainHostdevDefFormatSubsys(virBufferPtr buf, if (def->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && pcisrc->backend != VIR_DOMAIN_HOSTDEV_PCI_BACKEND_DEFAULT) { - const char *backend = - virDomainHostdevSubsysPCIBackendTypeToString(pcisrc->backend); - if (!backend) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("unexpected pci hostdev driver name type %d"), - pcisrc->backend); - return -1; + virBufferAddLit(buf, "<driver"); + + if (pcisrc->backend != VIR_DOMAIN_HOSTDEV_PCI_BACKEND_DEFAULT) { + const char *backend = + virDomainHostdevSubsysPCIBackendTypeToString(pcisrc->backend); + + if (!backend) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("unexpected pci hostdev driver name type %d"), + pcisrc->backend); + return -1; + } + virBufferAsprintf(buf, " name='%s'", backend); } - virBufferAsprintf(buf, "<driver name='%s'/>\n", backend); + + virBufferAddLit(buf, "/>\n"); } virBufferAddLit(buf, "<source"); -- 2.24.1

All three of these functions could only return 0 anyway, so just get rid of all the extra red tape. Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 0208d330c3..69e3b7be9a 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -25317,7 +25317,7 @@ virDomainActualNetDefFormat(virBufferPtr buf, } -static int +static void virDomainVirtioNetGuestOptsFormat(char **outstr, virDomainNetDefPtr def) { @@ -25345,11 +25345,10 @@ virDomainVirtioNetGuestOptsFormat(char **outstr, virBufferTrim(&buf, " ", -1); *outstr = virBufferContentAndReset(&buf); - return 0; } -static int +static void virDomainVirtioNetHostOptsFormat(char **outstr, virDomainNetDefPtr def) { @@ -25385,11 +25384,10 @@ virDomainVirtioNetHostOptsFormat(char **outstr, virBufferTrim(&buf, " ", -1); *outstr = virBufferContentAndReset(&buf); - return 0; } -static int +static void virDomainVirtioNetDriverFormat(char **outstr, virDomainNetDefPtr def) { @@ -25422,7 +25420,6 @@ virDomainVirtioNetDriverFormat(char **outstr, virDomainVirtioOptionsFormat(&buf, def->virtio); *outstr = virBufferContentAndReset(&buf); - return 0; } @@ -25696,15 +25693,13 @@ virDomainNetDefFormat(virBufferPtr buf, virBufferEscapeString(buf, "<model type='%s'/>\n", virDomainNetGetModelString(def)); if (virDomainNetIsVirtioModel(def)) { - int rc = 0; g_autofree char *str = NULL; g_autofree char *gueststr = NULL; g_autofree char *hoststr = NULL; - if (virDomainVirtioNetDriverFormat(&str, def) < 0 || - virDomainVirtioNetGuestOptsFormat(&gueststr, def) < 0 || - virDomainVirtioNetHostOptsFormat(&hoststr, def) < 0) - rc = -1; + virDomainVirtioNetDriverFormat(&str, def); + virDomainVirtioNetGuestOptsFormat(&gueststr, def); + virDomainVirtioNetHostOptsFormat(&hoststr, def); if (!gueststr && !hoststr) { if (str) @@ -25722,9 +25717,6 @@ virDomainNetDefFormat(virBufferPtr buf, virBufferAdjustIndent(buf, -2); virBufferAddLit(buf, "</driver>\n"); } - - if (rc < 0) - return -1; } } if (def->backend.tap || def->backend.vhost) { -- 2.24.1

On Sun, Jan 19, 2020 at 10:24:09PM -0500, Laine Stump wrote:
All three of these functions could only return 0 anyway, so just get rid of all the extra red tape.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Adding Driver to the names makes them better fit their purpose. Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 69e3b7be9a..58f4a1c14b 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -25318,8 +25318,8 @@ virDomainActualNetDefFormat(virBufferPtr buf, static void -virDomainVirtioNetGuestOptsFormat(char **outstr, - virDomainNetDefPtr def) +virDomainVirtioNetDriverGuestOptsFormat(char **outstr, + virDomainNetDefPtr def) { virBuffer buf = VIR_BUFFER_INITIALIZER; if (def->driver.virtio.guest.csum) { @@ -25349,8 +25349,8 @@ virDomainVirtioNetGuestOptsFormat(char **outstr, static void -virDomainVirtioNetHostOptsFormat(char **outstr, - virDomainNetDefPtr def) +virDomainVirtioNetDriverHostOptsFormat(char **outstr, + virDomainNetDefPtr def) { virBuffer buf = VIR_BUFFER_INITIALIZER; if (def->driver.virtio.host.csum) { @@ -25698,8 +25698,8 @@ virDomainNetDefFormat(virBufferPtr buf, g_autofree char *hoststr = NULL; virDomainVirtioNetDriverFormat(&str, def); - virDomainVirtioNetGuestOptsFormat(&gueststr, def); - virDomainVirtioNetHostOptsFormat(&hoststr, def); + virDomainVirtioNetDriverGuestOptsFormat(&gueststr, def); + virDomainVirtioNetDriverHostOptsFormat(&hoststr, def); if (!gueststr && !hoststr) { if (str) -- 2.24.1

On Sun, Jan 19, 2020 at 10:24:10PM -0500, Laine Stump wrote:
Adding Driver to the names makes them better fit their purpose.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Until now the <driver> subelement of <interface> was only parsed/used/formatted if the device model was "virtio". Making the driver member of the data object a struct instead of a union and refactoring the parsing/formatting code in this separate patch will make it easier to see exactly what is being added when an upcoming patch adds an attribute to <driver> that will be present/parsed regardless of whether or not the device model is "virtio". Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 443 +++++++++++++++++++++-------------------- src/conf/domain_conf.h | 52 ++--- 2 files changed, 254 insertions(+), 241 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 58f4a1c14b..1379ae1600 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -12059,205 +12059,210 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, if (ifname_guest_actual != NULL) def->ifname_guest_actual = g_steal_pointer(&ifname_guest_actual); - if (def->type != VIR_DOMAIN_NET_TYPE_HOSTDEV && - virDomainNetIsVirtioModel(def)) { - if (backend != NULL) { - if ((val = virDomainNetBackendTypeFromString(backend)) < 0 || - val == VIR_DOMAIN_NET_BACKEND_TYPE_DEFAULT) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("Unknown interface <driver name='%s'> " - "has been specified"), - backend); - goto error; - } - def->driver.virtio.name = val; - } - if (txmode != NULL) { - if ((val = virDomainNetVirtioTxModeTypeFromString(txmode)) < 0 || - val == VIR_DOMAIN_NET_VIRTIO_TX_MODE_DEFAULT) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("Unknown interface <driver txmode='%s'> " - "has been specified"), - txmode); - goto error; - } - def->driver.virtio.txmode = val; - } - if (ioeventfd) { - if ((val = virTristateSwitchTypeFromString(ioeventfd)) <= 0) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("unknown interface ioeventfd mode '%s'"), - ioeventfd); - goto error; - } - def->driver.virtio.ioeventfd = val; - } - if (event_idx) { - if ((val = virTristateSwitchTypeFromString(event_idx)) <= 0) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("unknown interface event_idx mode '%s'"), - event_idx); - goto error; - } - def->driver.virtio.event_idx = val; - } - if (queues) { - unsigned int q; - if (virStrToLong_uip(queues, NULL, 10, &q) < 0) { - virReportError(VIR_ERR_XML_DETAIL, - _("'queues' attribute must be positive number: %s"), - queues); - goto error; - } - if (q > 1) - def->driver.virtio.queues = q; - } - if (rx_queue_size) { - unsigned int q; - if (virStrToLong_uip(rx_queue_size, NULL, 10, &q) < 0) { - virReportError(VIR_ERR_XML_DETAIL, - _("'rx_queue_size' attribute must be positive number: %s"), - rx_queue_size); - goto error; - } - def->driver.virtio.rx_queue_size = q; - } - if (tx_queue_size) { - unsigned int q; - if (virStrToLong_uip(tx_queue_size, NULL, 10, &q) < 0) { - virReportError(VIR_ERR_XML_DETAIL, - _("'tx_queue_size' attribute must be positive number: %s"), - tx_queue_size); - goto error; - } - def->driver.virtio.tx_queue_size = q; - } - - if ((tmpNode = virXPathNode("./driver/host", ctxt))) { - if ((str = virXMLPropString(tmpNode, "csum"))) { - if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + if (def->type != VIR_DOMAIN_NET_TYPE_HOSTDEV) { + /* + * the <driver> subelement of <interface type='hostdev'> has + * already been parsed by virDomainHostdevDefParseXMLSubsys() + */ + if (virDomainNetIsVirtioModel(def)) { + if (backend != NULL) { + if ((val = virDomainNetBackendTypeFromString(backend)) < 0 || + val == VIR_DOMAIN_NET_BACKEND_TYPE_DEFAULT) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("unknown host csum mode '%s'"), - str); + _("Unknown interface <driver name='%s'> " + "has been specified"), + backend); goto error; } - def->driver.virtio.host.csum = val; + def->driver.virtio.name = val; } - VIR_FREE(str); - if ((str = virXMLPropString(tmpNode, "gso"))) { - if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + if (txmode != NULL) { + if ((val = virDomainNetVirtioTxModeTypeFromString(txmode)) < 0 || + val == VIR_DOMAIN_NET_VIRTIO_TX_MODE_DEFAULT) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("unknown host gso mode '%s'"), - str); + _("Unknown interface <driver txmode='%s'> " + "has been specified"), + txmode); goto error; } - def->driver.virtio.host.gso = val; + def->driver.virtio.txmode = val; } - VIR_FREE(str); - if ((str = virXMLPropString(tmpNode, "tso4"))) { - if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + if (ioeventfd) { + if ((val = virTristateSwitchTypeFromString(ioeventfd)) <= 0) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("unknown host tso4 mode '%s'"), - str); + _("unknown interface ioeventfd mode '%s'"), + ioeventfd); goto error; } - def->driver.virtio.host.tso4 = val; + def->driver.virtio.ioeventfd = val; } - VIR_FREE(str); - if ((str = virXMLPropString(tmpNode, "tso6"))) { - if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + if (event_idx) { + if ((val = virTristateSwitchTypeFromString(event_idx)) <= 0) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("unknown host tso6 mode '%s'"), - str); + _("unknown interface event_idx mode '%s'"), + event_idx); goto error; } - def->driver.virtio.host.tso6 = val; + def->driver.virtio.event_idx = val; } - VIR_FREE(str); - if ((str = virXMLPropString(tmpNode, "ecn"))) { - if ((val = virTristateSwitchTypeFromString(str)) <= 0) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("unknown host ecn mode '%s'"), - str); + if (queues) { + unsigned int q; + if (virStrToLong_uip(queues, NULL, 10, &q) < 0) { + virReportError(VIR_ERR_XML_DETAIL, + _("'queues' attribute must be positive number: %s"), + queues); goto error; } - def->driver.virtio.host.ecn = val; + if (q > 1) + def->driver.virtio.queues = q; } - VIR_FREE(str); - if ((str = virXMLPropString(tmpNode, "ufo"))) { - if ((val = virTristateSwitchTypeFromString(str)) <= 0) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("unknown host ufo mode '%s'"), - str); + if (rx_queue_size) { + unsigned int q; + if (virStrToLong_uip(rx_queue_size, NULL, 10, &q) < 0) { + virReportError(VIR_ERR_XML_DETAIL, + _("'rx_queue_size' attribute must be positive number: %s"), + rx_queue_size); goto error; } - def->driver.virtio.host.ufo = val; + def->driver.virtio.rx_queue_size = q; } - VIR_FREE(str); - if ((str = virXMLPropString(tmpNode, "mrg_rxbuf"))) { - if ((val = virTristateSwitchTypeFromString(str)) <= 0) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("unknown host mrg_rxbuf mode '%s'"), - str); + if (tx_queue_size) { + unsigned int q; + if (virStrToLong_uip(tx_queue_size, NULL, 10, &q) < 0) { + virReportError(VIR_ERR_XML_DETAIL, + _("'tx_queue_size' attribute must be positive number: %s"), + tx_queue_size); goto error; } - def->driver.virtio.host.mrg_rxbuf = val; + def->driver.virtio.tx_queue_size = q; } - VIR_FREE(str); - } - if ((tmpNode = virXPathNode("./driver/guest", ctxt))) { - if ((str = virXMLPropString(tmpNode, "csum"))) { - if ((val = virTristateSwitchTypeFromString(str)) <= 0) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("unknown guest csum mode '%s'"), - str); - goto error; + if ((tmpNode = virXPathNode("./driver/host", ctxt))) { + if ((str = virXMLPropString(tmpNode, "csum"))) { + if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown host csum mode '%s'"), + str); + goto error; + } + def->driver.virtio.host.csum = val; } - def->driver.virtio.guest.csum = val; - } - VIR_FREE(str); - if ((str = virXMLPropString(tmpNode, "tso4"))) { - if ((val = virTristateSwitchTypeFromString(str)) <= 0) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("unknown guest tso4 mode '%s'"), - str); - goto error; + VIR_FREE(str); + if ((str = virXMLPropString(tmpNode, "gso"))) { + if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown host gso mode '%s'"), + str); + goto error; + } + def->driver.virtio.host.gso = val; } - def->driver.virtio.guest.tso4 = val; - } - VIR_FREE(str); - if ((str = virXMLPropString(tmpNode, "tso6"))) { - if ((val = virTristateSwitchTypeFromString(str)) <= 0) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("unknown guest tso6 mode '%s'"), - str); - goto error; + VIR_FREE(str); + if ((str = virXMLPropString(tmpNode, "tso4"))) { + if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown host tso4 mode '%s'"), + str); + goto error; + } + def->driver.virtio.host.tso4 = val; } - def->driver.virtio.guest.tso6 = val; - } - VIR_FREE(str); - if ((str = virXMLPropString(tmpNode, "ecn"))) { - if ((val = virTristateSwitchTypeFromString(str)) <= 0) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("unknown guest ecn mode '%s'"), - str); - goto error; + VIR_FREE(str); + if ((str = virXMLPropString(tmpNode, "tso6"))) { + if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown host tso6 mode '%s'"), + str); + goto error; + } + def->driver.virtio.host.tso6 = val; + } + VIR_FREE(str); + if ((str = virXMLPropString(tmpNode, "ecn"))) { + if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown host ecn mode '%s'"), + str); + goto error; + } + def->driver.virtio.host.ecn = val; + } + VIR_FREE(str); + if ((str = virXMLPropString(tmpNode, "ufo"))) { + if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown host ufo mode '%s'"), + str); + goto error; + } + def->driver.virtio.host.ufo = val; + } + VIR_FREE(str); + if ((str = virXMLPropString(tmpNode, "mrg_rxbuf"))) { + if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown host mrg_rxbuf mode '%s'"), + str); + goto error; + } + def->driver.virtio.host.mrg_rxbuf = val; } - def->driver.virtio.guest.ecn = val; + VIR_FREE(str); } - VIR_FREE(str); - if ((str = virXMLPropString(tmpNode, "ufo"))) { - if ((val = virTristateSwitchTypeFromString(str)) <= 0) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("unknown guest ufo mode '%s'"), - str); - goto error; + + if ((tmpNode = virXPathNode("./driver/guest", ctxt))) { + if ((str = virXMLPropString(tmpNode, "csum"))) { + if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown guest csum mode '%s'"), + str); + goto error; + } + def->driver.virtio.guest.csum = val; + } + VIR_FREE(str); + if ((str = virXMLPropString(tmpNode, "tso4"))) { + if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown guest tso4 mode '%s'"), + str); + goto error; + } + def->driver.virtio.guest.tso4 = val; + } + VIR_FREE(str); + if ((str = virXMLPropString(tmpNode, "tso6"))) { + if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown guest tso6 mode '%s'"), + str); + goto error; + } + def->driver.virtio.guest.tso6 = val; + } + VIR_FREE(str); + if ((str = virXMLPropString(tmpNode, "ecn"))) { + if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown guest ecn mode '%s'"), + str); + goto error; + } + def->driver.virtio.guest.ecn = val; + } + VIR_FREE(str); + if ((str = virXMLPropString(tmpNode, "ufo"))) { + if ((val = virTristateSwitchTypeFromString(str)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown guest ufo mode '%s'"), + str); + goto error; + } + def->driver.virtio.guest.ufo = val; } - def->driver.virtio.guest.ufo = val; } + def->backend.vhost = g_steal_pointer(&vhost_path); } - def->backend.vhost = g_steal_pointer(&vhost_path); } def->linkstate = VIR_DOMAIN_NET_INTERFACE_LINK_STATE_DEFAULT; @@ -25388,36 +25393,40 @@ virDomainVirtioNetDriverHostOptsFormat(char **outstr, static void -virDomainVirtioNetDriverFormat(char **outstr, - virDomainNetDefPtr def) +virDomainNetDriverAttributesFormat(char **outstr, + virDomainNetDefPtr def) { virBuffer buf = VIR_BUFFER_INITIALIZER; - if (def->driver.virtio.name) { - virBufferAsprintf(&buf, " name='%s'", - virDomainNetBackendTypeToString(def->driver.virtio.name)); - } - if (def->driver.virtio.txmode) { - virBufferAsprintf(&buf, " txmode='%s'", - virDomainNetVirtioTxModeTypeToString(def->driver.virtio.txmode)); - } - if (def->driver.virtio.ioeventfd) { - virBufferAsprintf(&buf, " ioeventfd='%s'", - virTristateSwitchTypeToString(def->driver.virtio.ioeventfd)); - } - if (def->driver.virtio.event_idx) { - virBufferAsprintf(&buf, " event_idx='%s'", - virTristateSwitchTypeToString(def->driver.virtio.event_idx)); - } - if (def->driver.virtio.queues) - virBufferAsprintf(&buf, " queues='%u'", def->driver.virtio.queues); - if (def->driver.virtio.rx_queue_size) - virBufferAsprintf(&buf, " rx_queue_size='%u'", - def->driver.virtio.rx_queue_size); - if (def->driver.virtio.tx_queue_size) - virBufferAsprintf(&buf, " tx_queue_size='%u'", - def->driver.virtio.tx_queue_size); - virDomainVirtioOptionsFormat(&buf, def->virtio); + if (virDomainNetIsVirtioModel(def)) { + + if (def->driver.virtio.name) { + virBufferAsprintf(&buf, " name='%s'", + virDomainNetBackendTypeToString(def->driver.virtio.name)); + } + if (def->driver.virtio.txmode) { + virBufferAsprintf(&buf, " txmode='%s'", + virDomainNetVirtioTxModeTypeToString(def->driver.virtio.txmode)); + } + if (def->driver.virtio.ioeventfd) { + virBufferAsprintf(&buf, " ioeventfd='%s'", + virTristateSwitchTypeToString(def->driver.virtio.ioeventfd)); + } + if (def->driver.virtio.event_idx) { + virBufferAsprintf(&buf, " event_idx='%s'", + virTristateSwitchTypeToString(def->driver.virtio.event_idx)); + } + if (def->driver.virtio.queues) + virBufferAsprintf(&buf, " queues='%u'", def->driver.virtio.queues); + if (def->driver.virtio.rx_queue_size) + virBufferAsprintf(&buf, " rx_queue_size='%u'", + def->driver.virtio.rx_queue_size); + if (def->driver.virtio.tx_queue_size) + virBufferAsprintf(&buf, " tx_queue_size='%u'", + def->driver.virtio.tx_queue_size); + + virDomainVirtioOptionsFormat(&buf, def->virtio); + } *outstr = virBufferContentAndReset(&buf); } @@ -25454,6 +25463,9 @@ virDomainNetDefFormat(virBufferPtr buf, char macstr[VIR_MAC_STRING_BUFLEN]; g_auto(virBuffer) attrBuf = VIR_BUFFER_INITIALIZER; const char *prefix = xmlopt ? xmlopt->config.netPrefix : NULL; + g_autofree char *driverstr = NULL; + g_autofree char *gueststr = NULL; + g_autofree char *hoststr = NULL; /* publicActual is true if we should report the current state in * def->data.network.actual *instead of* the config (*not* in @@ -25692,33 +25704,32 @@ virDomainNetDefFormat(virBufferPtr buf, if (virDomainNetGetModelString(def)) { virBufferEscapeString(buf, "<model type='%s'/>\n", virDomainNetGetModelString(def)); - if (virDomainNetIsVirtioModel(def)) { - g_autofree char *str = NULL; - g_autofree char *gueststr = NULL; - g_autofree char *hoststr = NULL; + } - virDomainVirtioNetDriverFormat(&str, def); - virDomainVirtioNetDriverGuestOptsFormat(&gueststr, def); - virDomainVirtioNetDriverHostOptsFormat(&hoststr, def); + virDomainNetDriverAttributesFormat(&driverstr, def); - if (!gueststr && !hoststr) { - if (str) - virBufferAsprintf(buf, "<driver%s/>\n", str); - } else { - if (str) - virBufferAsprintf(buf, "<driver%s>\n", str); - else - virBufferAddLit(buf, "<driver>\n"); - virBufferAdjustIndent(buf, 2); - if (hoststr) - virBufferAsprintf(buf, "<host %s/>\n", hoststr); - if (gueststr) - virBufferAsprintf(buf, "<guest %s/>\n", gueststr); - virBufferAdjustIndent(buf, -2); - virBufferAddLit(buf, "</driver>\n"); - } - } + if (virDomainNetIsVirtioModel(def)) { + virDomainVirtioNetDriverGuestOptsFormat(&gueststr, def); + virDomainVirtioNetDriverHostOptsFormat(&hoststr, def); + } + + if (!gueststr && !hoststr) { + if (driverstr) + virBufferAsprintf(buf, "<driver%s/>\n", driverstr); + } else { + if (driverstr) + virBufferAsprintf(buf, "<driver%s>\n", driverstr); + else + virBufferAddLit(buf, "<driver>\n"); + virBufferAdjustIndent(buf, 2); + if (hoststr) + virBufferAsprintf(buf, "<host %s/>\n", hoststr); + if (gueststr) + virBufferAsprintf(buf, "<guest %s/>\n", gueststr); + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</driver>\n"); } + if (def->backend.tap || def->backend.vhost) { virBufferAddLit(buf, "<backend"); virBufferEscapeString(buf, " tap='%s'", def->backend.tap); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 6ae89fa498..5a44113681 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -927,32 +927,34 @@ struct _virDomainNetDef { bool mac_generated; /* true if mac was *just now* auto-generated by libvirt */ int model; /* virDomainNetModelType */ char *modelstr; - union { - struct { - virDomainNetBackendType name; /* which driver backend to use */ - virDomainNetVirtioTxModeType txmode; - virTristateSwitch ioeventfd; - virTristateSwitch event_idx; - unsigned int queues; /* Multiqueue virtio-net */ - unsigned int rx_queue_size; - unsigned int tx_queue_size; - struct { - virTristateSwitch csum; - virTristateSwitch gso; - virTristateSwitch tso4; - virTristateSwitch tso6; - virTristateSwitch ecn; - virTristateSwitch ufo; - virTristateSwitch mrg_rxbuf; - } host; + struct { + union { struct { - virTristateSwitch csum; - virTristateSwitch tso4; - virTristateSwitch tso6; - virTristateSwitch ecn; - virTristateSwitch ufo; - } guest; - } virtio; + virDomainNetBackendType name; /* which driver backend to use */ + virDomainNetVirtioTxModeType txmode; + virTristateSwitch ioeventfd; + virTristateSwitch event_idx; + unsigned int queues; /* Multiqueue virtio-net */ + unsigned int rx_queue_size; + unsigned int tx_queue_size; + struct { + virTristateSwitch csum; + virTristateSwitch gso; + virTristateSwitch tso4; + virTristateSwitch tso6; + virTristateSwitch ecn; + virTristateSwitch ufo; + virTristateSwitch mrg_rxbuf; + } host; + struct { + virTristateSwitch csum; + virTristateSwitch tso4; + virTristateSwitch tso6; + virTristateSwitch ecn; + virTristateSwitch ufo; + } guest; + } virtio; + }; } driver; struct { char *tap; -- 2.24.1

Presence of the virtio-net-pci option called "failover" indicates support in a qemu binary of a simplistic bonding of a virtio-net device with another PCI device. This feature allows migration of guests that have a network device assigned to a guest with VFIO, by creating a network bond device in the guest consisting of the VFIO-assigned device and a virtio-net-pci device, then temporarily (and automatically) unplugging the VFIO net device prior to migration (and hotplugging an equivalent device on the migration destination). (The feature is called "failover" because the bond device uses the vfio-pci netdev for normal guest networking, but "fails over" to the virtio-net-pci netdev once the vfio-pci device is unplugged for migration.) Full functioning of the feature also requires support in the virtio-net driver in the guest OS (since that is where the bond device resides), but if the "failover" commandline option is present for the virtio-net-pci device in qemu, at least the qemu part of the feature is available, and libvirt can add the proper options to both the virtio-net-pci and vfio-pci device commandlines to indicate qemu should attempt doing the failover during migration. This patch just adds the qemu capabilities flag "virtio-net.failover". Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_4.2.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml | 1 + 4 files changed, 5 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 84c62a4e28..15943b4a45 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -553,6 +553,7 @@ VIR_ENUM_IMPL(virQEMUCaps, "blockdev-file-dynamic-auto-read-only", "savevm-monitor-nodes", "drive-nvme", + "virtio-net.failover", ); @@ -1275,6 +1276,7 @@ static struct virQEMUCapsStringFlags virQEMUCapsDevicePropsVirtioNet[] = { { "disable-legacy", QEMU_CAPS_VIRTIO_PCI_DISABLE_LEGACY }, { "iommu_platform", QEMU_CAPS_VIRTIO_PCI_IOMMU_PLATFORM }, { "ats", QEMU_CAPS_VIRTIO_PCI_ATS }, + { "failover", QEMU_CAPS_VIRTIO_NET_FAILOVER }, }; static struct virQEMUCapsStringFlags virQEMUCapsDevicePropsSpaprPCIHostBridge[] = { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 193c19fc81..2beee3e839 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -534,6 +534,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ QEMU_CAPS_BLOCK_FILE_AUTO_READONLY_DYNAMIC, /* the auto-read-only property of block backends for files is dynamic */ QEMU_CAPS_SAVEVM_MONITOR_NODES, /* 'savevm' handles monitor-owned nodes properly */ QEMU_CAPS_DRIVE_NVME, /* -drive file.driver=nvme */ + QEMU_CAPS_VIRTIO_NET_FAILOVER, /* virtio-net-*.failover */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_4.2.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_4.2.0.aarch64.xml index a6469073fd..536092e1f1 100644 --- a/tests/qemucapabilitiesdata/caps_4.2.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_4.2.0.aarch64.xml @@ -175,6 +175,7 @@ <flag name='blockdev-file-dynamic-auto-read-only'/> <flag name='savevm-monitor-nodes'/> <flag name='drive-nvme'/> + <flag name='virtio-net.failover'/> <version>4001050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml index 4857e2f5a5..35f601dc5d 100644 --- a/tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml @@ -218,6 +218,7 @@ <flag name='blockdev-file-dynamic-auto-read-only'/> <flag name='savevm-monitor-nodes'/> <flag name='drive-nvme'/> + <flag name='virtio-net.failover'/> <version>4002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> -- 2.24.1

This attribute is only used for virtio-net devices, so it is stored in the virtio part of the anonymous union in virDomainNetDef::driver. An example of the new option: <interface type='network'> <source network='mybridge'/> <mac address='00:11:22:33:44:55'/> <model type='virtio'/> <driver failover='on'/> </interface> The corresponding qemu commandline option can only be set if the qemu binary supports it, so we check for the qemu capability before adding it. Signed-off-by: Laine Stump <laine@redhat.com> --- docs/schemas/domaincommon.rng | 5 ++ src/conf/domain_conf.c | 15 ++++++ src/conf/domain_conf.h | 1 + src/qemu/qemu_command.c | 3 ++ src/qemu/qemu_domain.c | 12 ++++- .../qemuxml2argvdata/net-virtio-failover.args | 36 +++++++++++++ .../qemuxml2argvdata/net-virtio-failover.xml | 36 +++++++++++++ tests/qemuxml2argvtest.c | 3 ++ .../net-virtio-failover.xml | 50 +++++++++++++++++++ tests/qemuxml2xmltest.c | 2 + 10 files changed, 161 insertions(+), 2 deletions(-) create mode 100644 tests/qemuxml2argvdata/net-virtio-failover.args create mode 100644 tests/qemuxml2argvdata/net-virtio-failover.xml create mode 100644 tests/qemuxml2xmloutdata/net-virtio-failover.xml diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 76d94b156f..80aea47e36 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -3050,6 +3050,11 @@ <optional> <ref name="event_idx"/> </optional> + <optional> + <attribute name="failover"> + <ref name="virOnOff"/> + </attribute> + </optional> </group> </choice> <ref name="virtioOptions"/> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 1379ae1600..29636617a2 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -11558,6 +11558,7 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, g_autofree char *txmode = NULL; g_autofree char *ioeventfd = NULL; g_autofree char *event_idx = NULL; + g_autofree char *failover = NULL; g_autofree char *queues = NULL; g_autofree char *rx_queue_size = NULL; g_autofree char *tx_queue_size = NULL; @@ -11729,6 +11730,7 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, txmode = virXMLPropString(cur, "txmode"); ioeventfd = virXMLPropString(cur, "ioeventfd"); event_idx = virXMLPropString(cur, "event_idx"); + failover = virXMLPropString(cur, "failover"); queues = virXMLPropString(cur, "queues"); rx_queue_size = virXMLPropString(cur, "rx_queue_size"); tx_queue_size = virXMLPropString(cur, "tx_queue_size"); @@ -12105,6 +12107,15 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, } def->driver.virtio.event_idx = val; } + if (failover) { + if ((val = virTristateSwitchTypeFromString(failover)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown interface failover setting '%s'"), + failover); + goto error; + } + def->driver.virtio.failover = val; + } if (queues) { unsigned int q; if (virStrToLong_uip(queues, NULL, 10, &q) < 0) { @@ -25416,6 +25427,10 @@ virDomainNetDriverAttributesFormat(char **outstr, virBufferAsprintf(&buf, " event_idx='%s'", virTristateSwitchTypeToString(def->driver.virtio.event_idx)); } + if (def->driver.virtio.failover) { + virBufferAsprintf(&buf, " failover='%s'", + virTristateSwitchTypeToString(def->driver.virtio.failover)); + } if (def->driver.virtio.queues) virBufferAsprintf(&buf, " queues='%u'", def->driver.virtio.queues); if (def->driver.virtio.rx_queue_size) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 5a44113681..af9691d62b 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -934,6 +934,7 @@ struct _virDomainNetDef { virDomainNetVirtioTxModeType txmode; virTristateSwitch ioeventfd; virTristateSwitch event_idx; + virTristateSwitch failover; unsigned int queues; /* Multiqueue virtio-net */ unsigned int rx_queue_size; unsigned int tx_queue_size; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 904d2beab5..d3c0cc0506 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3834,6 +3834,9 @@ qemuBuildNicDevStr(virDomainDefPtr def, virBufferAsprintf(&buf, ",host_mtu=%u", net->mtu); } + if (usingVirtio && net->driver.virtio.failover == VIR_TRISTATE_SWITCH_ON) + virBufferAddLit(&buf, ",failover=on"); + virBufferAsprintf(&buf, ",netdev=host%s", net->info.alias); virBufferAsprintf(&buf, ",id=%s", net->info.alias); virBufferAsprintf(&buf, ",mac=%s", diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index a6dde15bad..6f45d74bde 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6396,7 +6396,8 @@ qemuDomainValidateActualNetDef(const virDomainNetDef *net, static int -qemuDomainDeviceDefValidateNetwork(const virDomainNetDef *net) +qemuDomainDeviceDefValidateNetwork(const virDomainNetDef *net, + virQEMUCapsPtr qemuCaps) { bool hasIPv4 = false; bool hasIPv6 = false; @@ -6471,6 +6472,13 @@ qemuDomainDeviceDefValidateNetwork(const virDomainNetDef *net) _("tx_queue_size has to be a power of two")); return -1; } + + if (net->driver.virtio.failover == VIR_TRISTATE_SWITCH_ON && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTIO_NET_FAILOVER)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-net failover is not supported with this QEMU binary")); + return -1; + } } if (net->mtu && @@ -8377,7 +8385,7 @@ qemuDomainDeviceDefValidate(const virDomainDeviceDef *dev, switch ((virDomainDeviceType)dev->type) { case VIR_DOMAIN_DEVICE_NET: - ret = qemuDomainDeviceDefValidateNetwork(dev->data.net); + ret = qemuDomainDeviceDefValidateNetwork(dev->data.net, qemuCaps); break; case VIR_DOMAIN_DEVICE_CHR: diff --git a/tests/qemuxml2argvdata/net-virtio-failover.args b/tests/qemuxml2argvdata/net-virtio-failover.args new file mode 100644 index 0000000000..da41e19628 --- /dev/null +++ b/tests/qemuxml2argvdata/net-virtio-failover.args @@ -0,0 +1,36 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-i386 \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=tcg,usb=off,dump-guest-core=off \ +-m 214 \ +-realtime mlock=off \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-usb \ +-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \ +-device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \ +-netdev user,id=hostua-backup0 \ +-device virtio-net-pci,failover=on,netdev=hostua-backup0,id=ua-backup0,\ +mac=00:11:22:33:44:55,bus=pci.0,addr=0x3 \ +-netdev user,id=hostua-backup1 \ +-device virtio-net-pci,failover=on,netdev=hostua-backup1,id=ua-backup1,\ +mac=66:44:33:22:11:00,bus=pci.0,addr=0x4 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 diff --git a/tests/qemuxml2argvdata/net-virtio-failover.xml b/tests/qemuxml2argvdata/net-virtio-failover.xml new file mode 100644 index 0000000000..1f545b8d73 --- /dev/null +++ b/tests/qemuxml2argvdata/net-virtio-failover.xml @@ -0,0 +1,36 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i386</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + </disk> + <controller type='usb' index='0'/> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='virtio'/> + <driver failover='on'/> + <alias name='ua-backup0'/> + </interface> + <interface type='user'> + <mac address='66:44:33:22:11:00'/> + <model type='virtio'/> + <driver failover='on'/> + <alias name='ua-backup1'/> + </interface> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 58b4deefc6..c4fbe321d8 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1308,6 +1308,9 @@ mymain(void) QEMU_CAPS_VIRTIO_NET_RX_QUEUE_SIZE, QEMU_CAPS_VIRTIO_NET_TX_QUEUE_SIZE); DO_TEST_PARSE_ERROR("net-virtio-rxqueuesize-invalid-size", NONE); + DO_TEST("net-virtio-failover", + QEMU_CAPS_VIRTIO_NET_FAILOVER); + DO_TEST_PARSE_ERROR("net-virtio-failover", NONE); DO_TEST("net-eth", NONE); DO_TEST("net-eth-ifname", NONE); DO_TEST("net-eth-names", NONE); diff --git a/tests/qemuxml2xmloutdata/net-virtio-failover.xml b/tests/qemuxml2xmloutdata/net-virtio-failover.xml new file mode 100644 index 0000000000..7895c03dd7 --- /dev/null +++ b/tests/qemuxml2xmloutdata/net-virtio-failover.xml @@ -0,0 +1,50 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i386</emulator> + <disk type='block' device='disk'> + <driver name='qemu' type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='virtio'/> + <driver failover='on'/> + <alias name='ua-backup0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </interface> + <interface type='user'> + <mac address='66:44:33:22:11:00'/> + <model type='virtio'/> + <driver failover='on'/> + <alias name='ua-backup1'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </interface> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 3cefc64833..326e49fbcd 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -451,6 +451,8 @@ mymain(void) DO_TEST("net-eth-unmanaged-tap", NONE); DO_TEST("net-virtio-network-portgroup", NONE); DO_TEST("net-virtio-rxtxqueuesize", NONE); + DO_TEST("net-virtio-failover", + QEMU_CAPS_VIRTIO_NET_FAILOVER); DO_TEST("net-hostdev", NONE); DO_TEST("net-hostdev-bootorder", NONE); DO_TEST("net-hostdev-vfio", QEMU_CAPS_DEVICE_VFIO_PCI); -- 2.24.1

On Sun, Jan 19, 2020 at 10:24:13PM -0500, Laine Stump wrote:
This attribute is only used for virtio-net devices, so it is stored in the virtio part of the anonymous union in virDomainNetDef::driver. An
I'm not convinced that storing it only for virtio-net is the right approach. This feels like a generic concept, that just happens to only be implemented by virtio-net in QEMU today. IOW, it is valid to store it in the virDOmainNetDef in a place that's generally applicable. Any restriction to virtio-net belongs in the QEMU driver, rather than the XML parser.
example of the new option:
<interface type='network'> <source network='mybridge'/> <mac address='00:11:22:33:44:55'/> <model type='virtio'/> <driver failover='on'/> </interface>
The corresponding qemu commandline option can only be set if the qemu binary supports it, so we check for the qemu capability before adding it.
Signed-off-by: Laine Stump <laine@redhat.com> --- docs/schemas/domaincommon.rng | 5 ++ src/conf/domain_conf.c | 15 ++++++ src/conf/domain_conf.h | 1 + src/qemu/qemu_command.c | 3 ++ src/qemu/qemu_domain.c | 12 ++++- .../qemuxml2argvdata/net-virtio-failover.args | 36 +++++++++++++ .../qemuxml2argvdata/net-virtio-failover.xml | 36 +++++++++++++ tests/qemuxml2argvtest.c | 3 ++ .../net-virtio-failover.xml | 50 +++++++++++++++++++ tests/qemuxml2xmltest.c | 2 + 10 files changed, 161 insertions(+), 2 deletions(-) create mode 100644 tests/qemuxml2argvdata/net-virtio-failover.args create mode 100644 tests/qemuxml2argvdata/net-virtio-failover.xml create mode 100644 tests/qemuxml2xmloutdata/net-virtio-failover.xml
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 76d94b156f..80aea47e36 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -3050,6 +3050,11 @@ <optional> <ref name="event_idx"/> </optional> + <optional> + <attribute name="failover"> + <ref name="virOnOff"/> + </attribute> + </optional> </group> </choice> <ref name="virtioOptions"/> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 1379ae1600..29636617a2 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -11558,6 +11558,7 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, g_autofree char *txmode = NULL; g_autofree char *ioeventfd = NULL; g_autofree char *event_idx = NULL; + g_autofree char *failover = NULL; g_autofree char *queues = NULL; g_autofree char *rx_queue_size = NULL; g_autofree char *tx_queue_size = NULL; @@ -11729,6 +11730,7 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, txmode = virXMLPropString(cur, "txmode"); ioeventfd = virXMLPropString(cur, "ioeventfd"); event_idx = virXMLPropString(cur, "event_idx"); + failover = virXMLPropString(cur, "failover"); queues = virXMLPropString(cur, "queues"); rx_queue_size = virXMLPropString(cur, "rx_queue_size"); tx_queue_size = virXMLPropString(cur, "tx_queue_size"); @@ -12105,6 +12107,15 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, } def->driver.virtio.event_idx = val; } + if (failover) { + if ((val = virTristateSwitchTypeFromString(failover)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unknown interface failover setting '%s'"), + failover); + goto error; + } + def->driver.virtio.failover = val; + } if (queues) { unsigned int q; if (virStrToLong_uip(queues, NULL, 10, &q) < 0) { @@ -25416,6 +25427,10 @@ virDomainNetDriverAttributesFormat(char **outstr, virBufferAsprintf(&buf, " event_idx='%s'", virTristateSwitchTypeToString(def->driver.virtio.event_idx)); } + if (def->driver.virtio.failover) { + virBufferAsprintf(&buf, " failover='%s'", + virTristateSwitchTypeToString(def->driver.virtio.failover)); + } if (def->driver.virtio.queues) virBufferAsprintf(&buf, " queues='%u'", def->driver.virtio.queues); if (def->driver.virtio.rx_queue_size) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 5a44113681..af9691d62b 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -934,6 +934,7 @@ struct _virDomainNetDef { virDomainNetVirtioTxModeType txmode; virTristateSwitch ioeventfd; virTristateSwitch event_idx; + virTristateSwitch failover; unsigned int queues; /* Multiqueue virtio-net */ unsigned int rx_queue_size; unsigned int tx_queue_size; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 904d2beab5..d3c0cc0506 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3834,6 +3834,9 @@ qemuBuildNicDevStr(virDomainDefPtr def, virBufferAsprintf(&buf, ",host_mtu=%u", net->mtu); }
+ if (usingVirtio && net->driver.virtio.failover == VIR_TRISTATE_SWITCH_ON) + virBufferAddLit(&buf, ",failover=on"); + virBufferAsprintf(&buf, ",netdev=host%s", net->info.alias); virBufferAsprintf(&buf, ",id=%s", net->info.alias); virBufferAsprintf(&buf, ",mac=%s", diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index a6dde15bad..6f45d74bde 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6396,7 +6396,8 @@ qemuDomainValidateActualNetDef(const virDomainNetDef *net,
static int -qemuDomainDeviceDefValidateNetwork(const virDomainNetDef *net) +qemuDomainDeviceDefValidateNetwork(const virDomainNetDef *net, + virQEMUCapsPtr qemuCaps) { bool hasIPv4 = false; bool hasIPv6 = false; @@ -6471,6 +6472,13 @@ qemuDomainDeviceDefValidateNetwork(const virDomainNetDef *net) _("tx_queue_size has to be a power of two")); return -1; } + + if (net->driver.virtio.failover == VIR_TRISTATE_SWITCH_ON && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTIO_NET_FAILOVER)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-net failover is not supported with this QEMU binary")); + return -1; + } }
if (net->mtu && @@ -8377,7 +8385,7 @@ qemuDomainDeviceDefValidate(const virDomainDeviceDef *dev,
switch ((virDomainDeviceType)dev->type) { case VIR_DOMAIN_DEVICE_NET: - ret = qemuDomainDeviceDefValidateNetwork(dev->data.net); + ret = qemuDomainDeviceDefValidateNetwork(dev->data.net, qemuCaps); break;
case VIR_DOMAIN_DEVICE_CHR: diff --git a/tests/qemuxml2argvdata/net-virtio-failover.args b/tests/qemuxml2argvdata/net-virtio-failover.args new file mode 100644 index 0000000000..da41e19628 --- /dev/null +++ b/tests/qemuxml2argvdata/net-virtio-failover.args @@ -0,0 +1,36 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-i386 \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=tcg,usb=off,dump-guest-core=off \ +-m 214 \ +-realtime mlock=off \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-usb \ +-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \ +-device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \ +-netdev user,id=hostua-backup0 \ +-device virtio-net-pci,failover=on,netdev=hostua-backup0,id=ua-backup0,\ +mac=00:11:22:33:44:55,bus=pci.0,addr=0x3 \ +-netdev user,id=hostua-backup1 \ +-device virtio-net-pci,failover=on,netdev=hostua-backup1,id=ua-backup1,\ +mac=66:44:33:22:11:00,bus=pci.0,addr=0x4 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 diff --git a/tests/qemuxml2argvdata/net-virtio-failover.xml b/tests/qemuxml2argvdata/net-virtio-failover.xml new file mode 100644 index 0000000000..1f545b8d73 --- /dev/null +++ b/tests/qemuxml2argvdata/net-virtio-failover.xml @@ -0,0 +1,36 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i386</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + </disk> + <controller type='usb' index='0'/> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='virtio'/> + <driver failover='on'/> + <alias name='ua-backup0'/> + </interface> + <interface type='user'> + <mac address='66:44:33:22:11:00'/> + <model type='virtio'/> + <driver failover='on'/> + <alias name='ua-backup1'/> + </interface> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 58b4deefc6..c4fbe321d8 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1308,6 +1308,9 @@ mymain(void) QEMU_CAPS_VIRTIO_NET_RX_QUEUE_SIZE, QEMU_CAPS_VIRTIO_NET_TX_QUEUE_SIZE); DO_TEST_PARSE_ERROR("net-virtio-rxqueuesize-invalid-size", NONE); + DO_TEST("net-virtio-failover", + QEMU_CAPS_VIRTIO_NET_FAILOVER); + DO_TEST_PARSE_ERROR("net-virtio-failover", NONE); DO_TEST("net-eth", NONE); DO_TEST("net-eth-ifname", NONE); DO_TEST("net-eth-names", NONE); diff --git a/tests/qemuxml2xmloutdata/net-virtio-failover.xml b/tests/qemuxml2xmloutdata/net-virtio-failover.xml new file mode 100644 index 0000000000..7895c03dd7 --- /dev/null +++ b/tests/qemuxml2xmloutdata/net-virtio-failover.xml @@ -0,0 +1,50 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i386</emulator> + <disk type='block' device='disk'> + <driver name='qemu' type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <interface type='user'> + <mac address='00:11:22:33:44:55'/> + <model type='virtio'/> + <driver failover='on'/> + <alias name='ua-backup0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </interface> + <interface type='user'> + <mac address='66:44:33:22:11:00'/> + <model type='virtio'/> + <driver failover='on'/> + <alias name='ua-backup1'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </interface> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 3cefc64833..326e49fbcd 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -451,6 +451,8 @@ mymain(void) DO_TEST("net-eth-unmanaged-tap", NONE); DO_TEST("net-virtio-network-portgroup", NONE); DO_TEST("net-virtio-rxtxqueuesize", NONE); + DO_TEST("net-virtio-failover", + QEMU_CAPS_VIRTIO_NET_FAILOVER); DO_TEST("net-hostdev", NONE); DO_TEST("net-hostdev-bootorder", NONE); DO_TEST("net-hostdev-vfio", QEMU_CAPS_DEVICE_VFIO_PCI); -- 2.24.1
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 1/21/20 10:26 AM, Daniel P. Berrangé wrote:
This attribute is only used for virtio-net devices, so it is stored in the virtio part of the anonymous union in virDomainNetDef::driver. An I'm not convinced that storing it only for virtio-net is the right approach. This feels like a generic concept, that just happens to only be implemented by virtio-net in QEMU today. IOW, it is valid to store it in the virDOmainNetDef in a
On Sun, Jan 19, 2020 at 10:24:13PM -0500, Laine Stump wrote: place that's generally applicable. Any restriction to virtio-net belongs in the QEMU driver, rather than the XML parser.
Agreed, although in light of your comment on 8/12 (suggesting we just create an entire new subelement), I guess the point is moot :-)

On Tue, Jan 21, 2020 at 12:48:27PM -0500, Laine Stump wrote:
On 1/21/20 10:26 AM, Daniel P. Berrangé wrote:
This attribute is only used for virtio-net devices, so it is stored in the virtio part of the anonymous union in virDomainNetDef::driver. An I'm not convinced that storing it only for virtio-net is the right approach. This feels like a generic concept, that just happens to only be implemented by virtio-net in QEMU today. IOW, it is valid to store it in the virDOmainNetDef in a
On Sun, Jan 19, 2020 at 10:24:13PM -0500, Laine Stump wrote: place that's generally applicable. Any restriction to virtio-net belongs in the QEMU driver, rather than the XML parser.
Agreed, although in light of your comment on 8/12 (suggesting we just create an entire new subelement), I guess the point is moot :-)
Or rather this point just reinforces comment 8 :-) Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

The new <driver> attribute backupAlias gives the alias name ("id" in QEMU parlance) of the virtio-net device that will be paired in the guest with a vfio-pci device for "failover" when the vfio-pci device is unplugged (e.g. during migration). This patch adds backupAlias for <interface type='hostdev'> (which has its <driver> stored in the child virDomainHostdevDef); an upcoming patch will handle the case of <interface type='network'> when the network is a pool of SRIOV VFs (in that case the <driver> subelement is stored in the virDomainNetDef). Since a MAC address must be set for the hostdev in order for the failover to work (a current guest driver limitation is that the MAC addresses of the two devices must match), and that isn't possible with plain <hostdev>, backupAlias is prohibited by validation in the case of a plain <hostdev>. An example usage: <interface type='network'> <source network='mybridge'/> <mac address='00:11:22:33:44:55'/> <model type='virtio'/> <alias name='ua-backup0'/> <driver failover='on'/> </interface> <interface type='hostdev' managed='yes'> <source> <address type='pci' domain='0x0000' bus='0x03' slot='0x07' function='0x1'/> </source> <mac address='00:11:22:33:44:55'/> <driver backupAlias='ua-backup0'/> </interface> QEMU will internally associate these two devices, and will automatically unplug the hostdev at the start of any migration, and re-plug a new device on the destination when migration is complete (it is up to the guest virtio driver to handle bonding the two devices). Signed-off-by: Laine Stump <laine@redhat.com> --- docs/schemas/domaincommon.rng | 5 +++++ src/conf/domain_conf.c | 16 ++++++++++++++-- src/conf/domain_conf.h | 1 + src/qemu/qemu_command.c | 2 ++ src/qemu/qemu_domain.c | 6 ++++++ .../qemuxml2argvdata/net-virtio-failover.args | 6 +++++- tests/qemuxml2argvdata/net-virtio-failover.xml | 14 ++++++++++++++ tests/qemuxml2argvtest.c | 3 ++- .../qemuxml2xmloutdata/net-virtio-failover.xml | 18 +++++++++++++++++- tests/qemuxml2xmltest.c | 3 ++- 10 files changed, 68 insertions(+), 6 deletions(-) diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 80aea47e36..e0977d28d3 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -3002,6 +3002,11 @@ </optional> <optional> <element name="driver"> + <optional> + <attribute name="backupAlias"> + <ref name='aliasName'/> + </attribute> + </optional> <choice> <group> <attribute name="name"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 29636617a2..e0e47415ed 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2941,8 +2941,10 @@ void virDomainHostdevDefClear(virDomainHostdevDefPtr def) case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI_HOST: VIR_FREE(def->source.subsys.u.scsi_host.wwpn); break; - case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB: case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: + VIR_FREE(def->source.subsys.u.pci.backupAlias); + break; + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB: case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_MDEV: case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_LAST: break; @@ -6360,6 +6362,12 @@ virDomainHostdevDefValidate(const virDomainHostdevDef *hostdev) "'unassigned' address type")); return -1; } + if (hostdev->source.subsys.u.pci.backupAlias && + !hostdev->parentnet) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("backupAlias is not supported for plain <hostdev> - <interface type='hostdev'> is required")); + return -1; + } break; case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI: if (hostdev->info->type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE && @@ -8410,6 +8418,7 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node, } pcisrc->backend = backend; + pcisrc->backupAlias = virXPathString("string(./driver/@backupAlias)", ctxt); break; case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB: @@ -25037,7 +25046,8 @@ virDomainHostdevDefFormatSubsys(virBufferPtr buf, virDomainHostdevSubsysSCSIiSCSIPtr iscsisrc = &scsisrc->u.iscsi; if (def->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && - pcisrc->backend != VIR_DOMAIN_HOSTDEV_PCI_BACKEND_DEFAULT) { + (pcisrc->backend != VIR_DOMAIN_HOSTDEV_PCI_BACKEND_DEFAULT || + pcisrc->backupAlias)) { virBufferAddLit(buf, "<driver"); @@ -25054,6 +25064,8 @@ virDomainHostdevDefFormatSubsys(virBufferPtr buf, virBufferAsprintf(buf, " name='%s'", backend); } + virBufferEscapeString(buf, " backupAlias='%s'", pcisrc->backupAlias); + virBufferAddLit(buf, "/>\n"); } diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index af9691d62b..0bf1d2b6bf 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -235,6 +235,7 @@ struct _virDomainHostdevSubsysUSB { struct _virDomainHostdevSubsysPCI { virPCIDeviceAddress addr; /* host address */ int backend; /* enum virDomainHostdevSubsysPCIBackendType */ + char *backupAlias; /* alias id of backup virtio device for failover */ }; struct _virDomainHostdevSubsysSCSIHost { diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index d3c0cc0506..82a95e2474 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -4706,6 +4706,8 @@ qemuBuildPCIHostdevDevStr(const virDomainDef *def, return NULL; if (qemuBuildRomStr(&buf, dev->info) < 0) return NULL; + if (pcisrc->backupAlias) + virBufferAsprintf(&buf, ",failover_pair_id=%s", pcisrc->backupAlias); return virBufferContentAndReset(&buf); } diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 6f45d74bde..d32a3c0935 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6664,6 +6664,12 @@ qemuDomainDeviceDefValidateHostdev(const virDomainHostdevDef *hostdev, return -1; } } + if (hostdev->source.subsys.u.pci.backupAlias && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTIO_NET_FAILOVER)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-net failover / hostdev backupAlias is not supported with this QEMU binary")); + return -1; + } break; case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI_HOST: diff --git a/tests/qemuxml2argvdata/net-virtio-failover.args b/tests/qemuxml2argvdata/net-virtio-failover.args index da41e19628..19e7260843 100644 --- a/tests/qemuxml2argvdata/net-virtio-failover.args +++ b/tests/qemuxml2argvdata/net-virtio-failover.args @@ -33,4 +33,8 @@ mac=00:11:22:33:44:55,bus=pci.0,addr=0x3 \ -netdev user,id=hostua-backup1 \ -device virtio-net-pci,failover=on,netdev=hostua-backup1,id=ua-backup1,\ mac=66:44:33:22:11:00,bus=pci.0,addr=0x4 \ --device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 +-device vfio-pci,host=0000:03:07.1,id=hostdev0,bus=pci.0,addr=0x5,\ +failover_pair_id=ua-backup0 \ +-device vfio-pci,host=0000:03:07.2,id=hostdev1,bus=pci.0,addr=0x6,\ +failover_pair_id=ua-backup1 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 diff --git a/tests/qemuxml2argvdata/net-virtio-failover.xml b/tests/qemuxml2argvdata/net-virtio-failover.xml index 1f545b8d73..99ac37013a 100644 --- a/tests/qemuxml2argvdata/net-virtio-failover.xml +++ b/tests/qemuxml2argvdata/net-virtio-failover.xml @@ -31,6 +31,20 @@ <driver failover='on'/> <alias name='ua-backup1'/> </interface> + <interface type='hostdev' managed='yes'> + <mac address='00:11:22:33:44:55'/> + <driver backupAlias='ua-backup0'/> + <source> + <address type='pci' domain='0x0000' bus='0x03' slot='0x07' function='0x1'/> + </source> + </interface> + <interface type='hostdev' managed='yes'> + <mac address='66:44:33:22:11:00'/> + <driver backupAlias='ua-backup1'/> + <source> + <address type='pci' domain='0x0000' bus='0x03' slot='0x07' function='0x2'/> + </source> + </interface> <memballoon model='virtio'/> </devices> </domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index c4fbe321d8..411393a55a 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1309,7 +1309,8 @@ mymain(void) QEMU_CAPS_VIRTIO_NET_TX_QUEUE_SIZE); DO_TEST_PARSE_ERROR("net-virtio-rxqueuesize-invalid-size", NONE); DO_TEST("net-virtio-failover", - QEMU_CAPS_VIRTIO_NET_FAILOVER); + QEMU_CAPS_VIRTIO_NET_FAILOVER, + QEMU_CAPS_DEVICE_VFIO_PCI); DO_TEST_PARSE_ERROR("net-virtio-failover", NONE); DO_TEST("net-eth", NONE); DO_TEST("net-eth-ifname", NONE); diff --git a/tests/qemuxml2xmloutdata/net-virtio-failover.xml b/tests/qemuxml2xmloutdata/net-virtio-failover.xml index 7895c03dd7..9c73a3bee7 100644 --- a/tests/qemuxml2xmloutdata/net-virtio-failover.xml +++ b/tests/qemuxml2xmloutdata/net-virtio-failover.xml @@ -41,10 +41,26 @@ <alias name='ua-backup1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </interface> + <interface type='hostdev' managed='yes'> + <mac address='00:11:22:33:44:55'/> + <driver backupAlias='ua-backup0'/> + <source> + <address type='pci' domain='0x0000' bus='0x03' slot='0x07' function='0x1'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </interface> + <interface type='hostdev' managed='yes'> + <mac address='66:44:33:22:11:00'/> + <driver backupAlias='ua-backup1'/> + <source> + <address type='pci' domain='0x0000' bus='0x03' slot='0x07' function='0x2'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> + </interface> <input type='mouse' bus='ps2'/> <input type='keyboard' bus='ps2'/> <memballoon model='virtio'> - <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> </memballoon> </devices> </domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 326e49fbcd..c91690a030 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -452,7 +452,8 @@ mymain(void) DO_TEST("net-virtio-network-portgroup", NONE); DO_TEST("net-virtio-rxtxqueuesize", NONE); DO_TEST("net-virtio-failover", - QEMU_CAPS_VIRTIO_NET_FAILOVER); + QEMU_CAPS_VIRTIO_NET_FAILOVER, + QEMU_CAPS_DEVICE_VFIO_PCI); DO_TEST("net-hostdev", NONE); DO_TEST("net-hostdev-bootorder", NONE); DO_TEST("net-hostdev-vfio", QEMU_CAPS_DEVICE_VFIO_PCI); -- 2.24.1

For <interface type='hostdev'> the <driver> subelement (including the backupAlias attribute) is parsed directly into the hostdev child object (virDomaniHostdevDef) of the interface (using virDomainHostdevDefParseXMLSubsys()). But for <interface type='network'> where the network is a pool of hostdevs, the hostdev object doesn't exist until the network port is allocated at runtime, and so virDomainHostdevDefParseXMLSubsys() can't be called during XML parsing, and any backupAlias in the driver subelement of the XML will be lost. For this case, we need to add a backupAlias member to the interface object (virDomainNetDef), and parse it during the <driver> parsing that happens for all non-hostdev interfaces. Then when the network port is allocated at runtime and the hostdev child object is created, we can copy the backupAlias into the hostdev so it is available when building the QEMU commandline. An example usage: <interface type='network'> <source network='mybridge'/> <mac address='00:11:22:33:44:55'/> <model type='virtio'/> <alias name='ua-backup0'/> <driver failover='on'/> </interface> <interface type='network'> <source network='hostdev-pool'/> <mac address='00:11:22:33:44:55'/> <model type='virtio'/> <driver backupAlias='ua-backup0'/> </interface> Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 11 ++++ src/conf/domain_conf.h | 1 + src/qemu/qemu_domain.c | 15 +++--- .../net-virtio-failover-network.xml | 37 ++++++++++++++ .../net-virtio-failover-network.xml | 51 +++++++++++++++++++ tests/qemuxml2xmltest.c | 3 ++ 6 files changed, 112 insertions(+), 6 deletions(-) create mode 100644 tests/qemuxml2argvdata/net-virtio-failover-network.xml create mode 100644 tests/qemuxml2xmloutdata/net-virtio-failover-network.xml diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index e0e47415ed..89cccd22bc 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2428,6 +2428,7 @@ virDomainNetDefClear(virDomainNetDefPtr def) break; } + VIR_FREE(def->driver.backupAlias); VIR_FREE(def->backend.tap); VIR_FREE(def->backend.vhost); VIR_FREE(def->virtPortProfile); @@ -11563,6 +11564,7 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, g_autofree char *localaddr = NULL; g_autofree char *localport = NULL; g_autofree char *model = NULL; + g_autofree char *backupAlias = NULL; g_autofree char *backend = NULL; g_autofree char *txmode = NULL; g_autofree char *ioeventfd = NULL; @@ -11735,6 +11737,7 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, } else if (virXMLNodeNameEqual(cur, "model")) { model = virXMLPropString(cur, "type"); } else if (virXMLNodeNameEqual(cur, "driver")) { + backupAlias = virXMLPropString(cur, "backupAlias"); backend = virXMLPropString(cur, "name"); txmode = virXMLPropString(cur, "txmode"); ioeventfd = virXMLPropString(cur, "ioeventfd"); @@ -12075,6 +12078,10 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, * the <driver> subelement of <interface type='hostdev'> has * already been parsed by virDomainHostdevDefParseXMLSubsys() */ + + if (backupAlias) + def->driver.backupAlias = g_steal_pointer(&backupAlias); + if (virDomainNetIsVirtioModel(def)) { if (backend != NULL) { if ((val = virDomainNetBackendTypeFromString(backend)) < 0 || @@ -25455,6 +25462,8 @@ virDomainNetDriverAttributesFormat(char **outstr, virDomainVirtioOptionsFormat(&buf, def->virtio); } + virBufferEscapeString(&buf, " backupAlias='%s'", def->driver.backupAlias); + *outstr = virBufferContentAndReset(&buf); } @@ -30675,6 +30684,8 @@ virDomainNetDefActualFromNetworkPort(virDomainNetDefPtr iface, } actual->data.hostdev.def.source.subsys.type = VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI; actual->data.hostdev.def.source.subsys.u.pci.addr = port->plug.hostdevpci.addr; + actual->data.hostdev.def.source.subsys.u.pci.backupAlias + = g_strdup(iface->driver.backupAlias); switch ((virNetworkForwardDriverNameType)port->plug.hostdevpci.driver) { case VIR_NETWORK_FORWARD_DRIVER_NAME_DEFAULT: actual->data.hostdev.def.source.subsys.u.pci.backend = diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 0bf1d2b6bf..2b6d9bab75 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -929,6 +929,7 @@ struct _virDomainNetDef { int model; /* virDomainNetModelType */ char *modelstr; struct { + char *backupAlias; /* alias id of backup virtio device for failover */ union { struct { virDomainNetBackendType name; /* which driver backend to use */ diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index d32a3c0935..b8acbc6248 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6472,13 +6472,16 @@ qemuDomainDeviceDefValidateNetwork(const virDomainNetDef *net, _("tx_queue_size has to be a power of two")); return -1; } + } - if (net->driver.virtio.failover == VIR_TRISTATE_SWITCH_ON && - !virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTIO_NET_FAILOVER)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("virtio-net failover is not supported with this QEMU binary")); - return -1; - } + + if ((net->driver.backupAlias || + (virDomainNetIsVirtioModel(net) && + net->driver.virtio.failover == VIR_TRISTATE_SWITCH_ON)) && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTIO_NET_FAILOVER)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-net failover is not supported with this QEMU binary")); + return -1; } if (net->mtu && diff --git a/tests/qemuxml2argvdata/net-virtio-failover-network.xml b/tests/qemuxml2argvdata/net-virtio-failover-network.xml new file mode 100644 index 0000000000..e2674ed57d --- /dev/null +++ b/tests/qemuxml2argvdata/net-virtio-failover-network.xml @@ -0,0 +1,37 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i386</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + </disk> + <controller type='usb' index='0'/> + <interface type='network'> + <mac address='00:11:22:33:44:55'/> + <source network='mybridge'/> + <model type='virtio'/> + <driver failover='on'/> + <alias name='ua-backup0'/> + </interface> + <interface type='network'> + <mac address='00:11:22:33:44:55'/> + <source network='myhostdevpool'/> + <model type='virtio'/> + <driver backupAlias='ua-backup0'/> + </interface> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2xmloutdata/net-virtio-failover-network.xml b/tests/qemuxml2xmloutdata/net-virtio-failover-network.xml new file mode 100644 index 0000000000..b803de1da2 --- /dev/null +++ b/tests/qemuxml2xmloutdata/net-virtio-failover-network.xml @@ -0,0 +1,51 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i386</emulator> + <disk type='block' device='disk'> + <driver name='qemu' type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <interface type='network'> + <mac address='00:11:22:33:44:55'/> + <source network='mybridge'/> + <model type='virtio'/> + <driver failover='on'/> + <alias name='ua-backup0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </interface> + <interface type='network'> + <mac address='00:11:22:33:44:55'/> + <source network='myhostdevpool'/> + <model type='virtio'/> + <driver backupAlias='ua-backup0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </interface> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index c91690a030..45ba4db30c 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -454,6 +454,9 @@ mymain(void) DO_TEST("net-virtio-failover", QEMU_CAPS_VIRTIO_NET_FAILOVER, QEMU_CAPS_DEVICE_VFIO_PCI); + DO_TEST("net-virtio-failover-network", + QEMU_CAPS_VIRTIO_NET_FAILOVER, + QEMU_CAPS_DEVICE_VFIO_PCI); DO_TEST("net-hostdev", NONE); DO_TEST("net-hostdev-bootorder", NONE); DO_TEST("net-hostdev-vfio", QEMU_CAPS_DEVICE_VFIO_PCI); -- 2.24.1

On Sun, Jan 19, 2020 at 10:24:15PM -0500, Laine Stump wrote:
For <interface type='hostdev'> the <driver> subelement (including the backupAlias attribute) is parsed directly into the hostdev child object (virDomaniHostdevDef) of the interface (using virDomainHostdevDefParseXMLSubsys()). But for <interface type='network'> where the network is a pool of hostdevs, the hostdev object doesn't exist until the network port is allocated at runtime, and so virDomainHostdevDefParseXMLSubsys() can't be called during XML parsing, and any backupAlias in the driver subelement of the XML will be lost.
I'm thinking this is a sign that we're storing the data in the wrong way / structure internally, and possibly with the XML schema too. Overall I'm not a fan of having the data scattered across three different structs for different scenarios....
For this case, we need to add a backupAlias member to the interface object (virDomainNetDef), and parse it during the <driver> parsing that happens for all non-hostdev interfaces. Then when the network port is allocated at runtime and the hostdev child object is created, we can copy the backupAlias into the hostdev so it is available when building the QEMU commandline.
An example usage:
<interface type='network'> <source network='mybridge'/> <mac address='00:11:22:33:44:55'/> <model type='virtio'/> <alias name='ua-backup0'/> <driver failover='on'/> </interface> <interface type='network'> <source network='hostdev-pool'/> <mac address='00:11:22:33:44:55'/> <model type='virtio'/> <driver backupAlias='ua-backup0'/> </interface>
I get that a big part of the problem is that the <driver> element is parsed by different bits of code depending on the type of interface being use, so it is hard to get consistent handling of this. We could perhaps better deal with this by not using <driver> at all and instead have a new <failover> element we store in virDomainNetDef separately from the driver struct that's causing us trouble. eg <interface type='network'> <source network='mybridge'/> <mac address='00:11:22:33:44:55'/> <model type='virtio'/> <alias name='ua-backup0'/> <failover type="primary"/> </interface> <interface type='network'> <source network='hostdev-pool'/> <mac address='00:11:22:33:44:55'/> <model type='virtio'/> <failover type="secondary" primary="ua-backup0"/> </interface> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Normally a PCI hostdev can't be migrated, so qemuMigrationSrcIsAllowedHostdev() won't permit it. In the case of a PCI hostdev that has the backupAlias attribute set, QEMU will automatically unplug the device prior to migration, and re-plug a corresponding device on the destination. This patch modifies qemuMigrationSrcIsAllowedHostdev() to allow domains with those devices to be migrated. Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_migration.c | 48 +++++++++++++++++++++++++++++++++++---- 1 file changed, 44 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 29d228a8d9..f675b445d5 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1093,10 +1093,50 @@ qemuMigrationSrcIsAllowedHostdev(const virDomainDef *def) * forbidden. */ for (i = 0; i < def->nhostdevs; i++) { virDomainHostdevDefPtr hostdev = def->hostdevs[i]; - if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS || - hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB) { - virReportError(VIR_ERR_OPERATION_INVALID, "%s", - _("domain has assigned non-USB host devices")); + switch ((virDomainHostdevMode)hostdev->mode) { + case VIR_DOMAIN_HOSTDEV_MODE_CAPABILITIES: + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("cannot migrate a domain with <hostdev mode='capabilities'>")); + return false; + + case VIR_DOMAIN_HOSTDEV_MODE_SUBSYS: + switch ((virDomainHostdevSubsysType)hostdev->source.subsys.type) { + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB: + /* USB devices can be "migrated" */ + continue; + + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI: + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI_HOST: + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_MDEV: + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("cannot migrate a domain with <hostdev mode='subsystem' type='%s'>"), + virDomainHostdevSubsysTypeToString(hostdev->source.subsys.type)); + return false; + + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: + /* + * if a backupAlias is defined, the device will be auto-unplugged + * during migration. + */ + if (hostdev->source.subsys.u.pci.backupAlias) + continue; + + /* all other PCI hostdevs can't be migrated */ + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("cannot migrate a domain with <hostdev mode='subsystem' type='%s'>"), + virDomainHostdevSubsysTypeToString(hostdev->source.subsys.type)); + return false; + + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_LAST: + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("invalid hostdev subsystem type")); + return false; + } + break; + + case VIR_DOMAIN_HOSTDEV_MODE_LAST: + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("invalid hostdev mode")); return false; } } -- 2.24.1

Aside from eliminating itinerant error (effectively just warning, since the migration continues) messages due to an unrecognized response from qemu, this isn't even necessary - the migration proceeds successfully to completion anyway. (I'm not sure where to see this status reported in the API though - do we need to add an extra state, or recognition of a new event somewhere?) Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_migration.c | 1 + src/qemu/qemu_monitor.c | 1 + src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 1 + 4 files changed, 4 insertions(+) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index f675b445d5..e53ef098ab 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1453,6 +1453,7 @@ qemuMigrationUpdateJobType(qemuDomainJobInfoPtr jobInfo) case QEMU_MONITOR_MIGRATION_STATUS_SETUP: case QEMU_MONITOR_MIGRATION_STATUS_ACTIVE: case QEMU_MONITOR_MIGRATION_STATUS_CANCELLING: + case QEMU_MONITOR_MIGRATION_STATUS_WAIT_UNPLUG: case QEMU_MONITOR_MIGRATION_STATUS_LAST: break; } diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index ccd20b3740..4f547bf5ec 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -168,6 +168,7 @@ VIR_ENUM_IMPL(qemuMonitorMigrationStatus, "device", "postcopy-active", "completed", "failed", "cancelling", "cancelled", + "wait-unplug", ); VIR_ENUM_IMPL(qemuMonitorVMStatus, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 3f3b81cddd..cca2cdcb27 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -767,6 +767,7 @@ typedef enum { QEMU_MONITOR_MIGRATION_STATUS_ERROR, QEMU_MONITOR_MIGRATION_STATUS_CANCELLING, QEMU_MONITOR_MIGRATION_STATUS_CANCELLED, + QEMU_MONITOR_MIGRATION_STATUS_WAIT_UNPLUG, QEMU_MONITOR_MIGRATION_STATUS_LAST } qemuMonitorMigrationStatus; diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index e5164d218a..5d8c7e9b5e 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -3515,6 +3515,7 @@ qemuMonitorJSONGetMigrationStatsReply(virJSONValuePtr reply, case QEMU_MONITOR_MIGRATION_STATUS_INACTIVE: case QEMU_MONITOR_MIGRATION_STATUS_SETUP: case QEMU_MONITOR_MIGRATION_STATUS_CANCELLED: + case QEMU_MONITOR_MIGRATION_STATUS_WAIT_UNPLUG: case QEMU_MONITOR_MIGRATION_STATUS_LAST: break; -- 2.24.1

The information in formatdomain.html seems too detailed, but it also didn't seem right to put that information in a wiki page before the patches are even pushed... Signed-off-by: Laine Stump <laine@redhat.com> --- docs/formatdomain.html.in | 70 +++++++++++++++++++++++++++++++++++++++ docs/news.xml | 27 +++++++++++++++ 2 files changed, 97 insertions(+) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 6e86d057a8..e3ea89fe25 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -5871,6 +5871,76 @@ </devices> ...</pre> + <h5><a id="elementsVirtioFailover">Using virtio "failover" to bond an emulated/hostdev NIC pair</a></h5> + + <p> + <span class="since">Since 6.1.0 (QEMU and KVM only, requires + QEMU 4.2.0 or newer)</span> If the virtio-net driver in the + guest OS supports the virtio "failover" feature, it is possible + to setup a simple bond device comprised of one emulated virtio + NIC and one SRIOV VF "hostdev" NIC. In this configuration, the + higher-performing hostdev NIC will normally be preferred for all + network traffic, but when the VM is migrated, QEMU will + automatically unplug the VF from the VM, and then hotplug a + similar device once migration is completed; while migration is + taking place, network traffic will use the virtio NIC. (Of + course the emulated virtio NIC and the hostdev NIC must be + connected to the same subnet for bonding to work properly). The + interface <code><driver></code> subelement + attributes <code>failover</code> and <code>backupAlias</code> + are used to set this up - the virtio NIC will need to + have <code>failover='on'</code> set in + its <code><driver></code>, and the hostdev NIC will + use <code>backupAlias</code> to indicate the alias name of the + virtio NIC. + </p> + <p> + NB1: Since you must know the alias name of the virtio + NIC when configuring the hostdev NIC, it will need to be + manually set in the virtio NIC's configuration (as with all + other manually set alias names, it must start with "ua-". + </p> + <p> + NB2: Currently the only known implementation of failover in a + guest OS virtio-net driver requires that the MAC addresses of + the virtio and hostdev NIC must match. Since that may not always + be a requirement, libvirt doesn't enforce this limitation - it + is up to the person/management application that is creating the + configuration. + </p> + <p> + NB3: Since the PCI addresses of the SRIOV VFs on the hosts that + are the source and destination of the migration will almost + certainly be different, either higher level management software + will need to modify the <code><source></code> of the + hostdev NIC (<code><interface type='hostdev'></code>) at + the start of migration, or (a simpler solution) the + configuration will need to use a libvirt "hostdev" virtual + network that maintains a pool of such devices, as is implied in + the following example's use of the libvirt network named + "hostdev-pool" - as long as the hostdev network pools on both + hosts have the same name, libvirt itself will take care of + allocating an appropriate device on both ends of the migration. + </p> + +<pre> +... +<devices> + <interface type='network'> + <source network='mybridge'/> + <mac address='00:11:22:33:44:55'/> + <model type='virtio'/> + <driver failover='on'/> + <alias name='ua-backup0'/> + </interface> + <interface type='network'> + <source network='hostdev-pool'/> + <mac address='00:11:22:33:44:55'/> + <model type='virtio'/> + <driver backupAlias='ua-backup0'/> + </interface> +</devices> +...</pre> <h5><a id="elementsNICSMulticast">Multicast tunnel</a></h5> diff --git a/docs/news.xml b/docs/news.xml index 056c7ef026..051b6b3a54 100644 --- a/docs/news.xml +++ b/docs/news.xml @@ -44,6 +44,33 @@ <libvirt> <release version="v6.1.0" date="unreleased"> <section title="New features"> + <change> + <summary> + support for virtio "failover" / QEMU auto-unplug of vfio devices + </summary> + <description> + QEMU 4.2.0 and later, combined with a sufficiently recent + guest virtio-net driver, supports setting up a simple + network bond device comprised of one virtio emulated NIC and + one hostdev NIC (which must be an SRIOV VF). The allure of + this setup is that the bond will always favor the hostdev + device, providing better performance, until the guest is + migrated - at that time QEMU will automatically unplug the + hostdev NIC and the bond will send all traffic via the + virtio NIC until migration is completed, then QEMU on the + destination side will hotplug a new hostdev NIC and the bond + will switch back to using the hostdev for network + traffic. The result is that guests desiring the extra + performance of a hostdev NIC are now migratable without + network downtime (performance is just degraded during + migration) and without requiring a complicated bonding + configuration in the guest OS network config and complicated + unplug/replug logic in the management application on the + host - it can instead all be accomplished in libvirt with + the interface <driver> subelement "failover" and + "backupAlias" attributes. + </description> + </change> </section> <section title="Improvements"> </section> -- 2.24.1

Current virtio-net drivers that support the failover feature match up the virtio backup device with its corresponding hostdev device by looking for an interface with a matching MAC address. Since libvirt will assign a different random MAC address to each interface that isn't given an explicit MAC address, this means that the configuration for any failover pairs must include explicit matching MAC addresses. To make life easier, we could have libvirt populate the XML config with the same auto-generated MAC address for both interfaces when it detects a failover pair that have no MAC addresses provided (a failover pair can be detected by matching <alias name='blah'/> of the virtio interface with <driver backupAlias='blah'/> of the hostdev interface), but then we would be stuck with that behavior even if the virtio guest driver later eliminated the requirement that mac addresses match. Additionally, some management software uses the MAC address as the primary index for its list of network devices, and can't properly deal with two interfaces having the same MAC address (oVirt). Even libvirt's own virsh utility uses MAC address (combined with interface type) to uniquely identify interfaces for the virsh detach-interface command (in this case, fortunately the runtime interface type is used, so one of the interfaces will always be of type='hostdev' and the other type='something-else", so it doesn't currently cause any problem). In order to remove the necessity of explicitly setting interface MAC addresses, as well as permit the two interfaces of a failover pair to each have a unique index while still fulfilling the current guest driver requirement that the MAC addresses matchin the guest, this patch adds a new attribute "useBackupMAC" that is set on the hostdev interface of the pair. When useBackupMAC='yes', the setup for the hostdev interface will find the virtio failover interface (using backupAlias) and use that interface's MAC address to initialize the MAC address of the hostdev interface; the MAC address in the hostdev interface config remains unchanged, it just isnt used for device initialization. I made this patch to followup on https://bugzilla.redhat.com/show_bug.cgi?id=1693587#c12 (where I suggested this attribute as a possible remedy to oVirt's requirement that each network device have a unique MAC address). Truthfully, I'm not convinced that I want it though, as it seems "a bit" hackish. In particular, I've thought for a long time that the "hostdev manager" code in util/virhostdev.c should really live in the node device driver and be called from the hypervisors via a public API (so that there is one central place on the host that maintains the list of in-use PCI devices and their status), but this patch adds an obstacle to that goal by adding a virDomainDefPtr to more of the APIs in that library - if this was turned into a public API, then entire virDomainDef would need to be serialized and sent in the API call, then parsed at the other end - yuck :-/. NB: there are already functions in virhostdev.h that take a virDomainDefPtr, so maybe I'm being too sensitive. On the upside, it solves a problem, and default bevahior is unchanged. So although we would have to live with the presence of this option forever once added, at least it would never be used unknowingly - the behavior it triggers would only occur if it was explicitly called out in the config. Signed-off-by: Laine Stump <laine@redhat.com> --- docs/formatdomain.html.in | 6 ++++- docs/schemas/domaincommon.rng | 5 ++++ src/conf/domain_conf.c | 41 ++++++++++++++++++++++++++++-- src/conf/domain_conf.h | 2 ++ src/libxl/libxl_driver.c | 2 +- src/qemu/qemu_hostdev.c | 5 ++-- src/qemu/qemu_hostdev.h | 1 + src/qemu/qemu_hotplug.c | 2 +- src/util/virhostdev.c | 47 +++++++++++++++++++++++++++++------ src/util/virhostdev.h | 1 + tests/virhostdevtest.c | 18 +++++++------- 11 files changed, 106 insertions(+), 24 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index e3ea89fe25..691c54da74 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -5906,7 +5906,11 @@ the virtio and hostdev NIC must match. Since that may not always be a requirement, libvirt doesn't enforce this limitation - it is up to the person/management application that is creating the - configuration. + configuration. However, if the driver + attribute <code>useBackupMAC</code> is set to "yes", then the + hostdev NIC's own MAC address will be ignored, and libvirt will + initialize the hostdev NIC with the MAC address configured for + the virtio NIC (the backup). </p> <p> NB3: Since the PCI addresses of the SRIOV VFs on the hosts that diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index e0977d28d3..cd19a7c3b4 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -3007,6 +3007,11 @@ <ref name='aliasName'/> </attribute> </optional> + <optional> + <attribute name="useBackupMAC"> + <ref name="virYesNo"/> + </attribute> + </optional> <choice> <group> <attribute name="name"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 89cccd22bc..ca49fb825b 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -6363,10 +6363,17 @@ virDomainHostdevDefValidate(const virDomainHostdevDef *hostdev) "'unassigned' address type")); return -1; } - if (hostdev->source.subsys.u.pci.backupAlias && + if ((hostdev->source.subsys.u.pci.backupAlias || + hostdev->source.subsys.u.pci.useBackupMAC == VIR_TRISTATE_BOOL_YES) && !hostdev->parentnet) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("backupAlias is not supported for plain <hostdev> - <interface type='hostdev'> is required")); + _("backupAlias and useBackupMAC are not supported for plain <hostdev> - <interface type='hostdev'> is required")); + return -1; + } + if (hostdev->source.subsys.u.pci.useBackupMAC == VIR_TRISTATE_BOOL_YES && + !hostdev->source.subsys.u.pci.backupAlias) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("useBackupMAC requires backupAlias to also be set")); return -1; } break; @@ -8270,6 +8277,7 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node, g_autofree char *model = NULL; g_autofree char *display = NULL; g_autofree char *ramfb = NULL; + g_autofree char *useBackupMAC = NULL; /* @managed can be read from the xml document - it is always an * attribute of the toplevel element, no matter what type of @@ -8420,6 +8428,13 @@ virDomainHostdevDefParseXMLSubsys(xmlNodePtr node, pcisrc->backend = backend; pcisrc->backupAlias = virXPathString("string(./driver/@backupAlias)", ctxt); + + if ((useBackupMAC = virXPathString("string(./driver/@useBackupMAC)", ctxt)) && + (pcisrc->useBackupMAC = virTristateBoolTypeFromString(useBackupMAC)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Unknown useBackupMAC value '%s'"), useBackupMAC); + return -1; + } break; case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB: @@ -11565,6 +11580,7 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, g_autofree char *localport = NULL; g_autofree char *model = NULL; g_autofree char *backupAlias = NULL; + g_autofree char *useBackupMAC = NULL; g_autofree char *backend = NULL; g_autofree char *txmode = NULL; g_autofree char *ioeventfd = NULL; @@ -11738,6 +11754,7 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, model = virXMLPropString(cur, "type"); } else if (virXMLNodeNameEqual(cur, "driver")) { backupAlias = virXMLPropString(cur, "backupAlias"); + useBackupMAC = virXMLPropString(cur, "useBackupMAC"); backend = virXMLPropString(cur, "name"); txmode = virXMLPropString(cur, "txmode"); ioeventfd = virXMLPropString(cur, "ioeventfd"); @@ -12082,6 +12099,13 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, if (backupAlias) def->driver.backupAlias = g_steal_pointer(&backupAlias); + if (useBackupMAC && + (def->driver.useBackupMAC = virTristateBoolTypeFromString(useBackupMAC)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Unknown useBackupMAC value '%s'"), useBackupMAC); + goto error; + } + if (virDomainNetIsVirtioModel(def)) { if (backend != NULL) { if ((val = virDomainNetBackendTypeFromString(backend)) < 0 || @@ -25073,6 +25097,11 @@ virDomainHostdevDefFormatSubsys(virBufferPtr buf, virBufferEscapeString(buf, " backupAlias='%s'", pcisrc->backupAlias); + if (pcisrc->useBackupMAC) { + virBufferAsprintf(buf, " useBackupMAC='%s'", + virTristateBoolTypeToString(pcisrc->useBackupMAC)); + } + virBufferAddLit(buf, "/>\n"); } @@ -25464,6 +25493,11 @@ virDomainNetDriverAttributesFormat(char **outstr, virBufferEscapeString(&buf, " backupAlias='%s'", def->driver.backupAlias); + if (def->driver.useBackupMAC) { + virBufferAsprintf(&buf, " useBackupMAC='%s'", + virTristateBoolTypeToString(def->driver.useBackupMAC)); + } + *outstr = virBufferContentAndReset(&buf); } @@ -30686,6 +30720,9 @@ virDomainNetDefActualFromNetworkPort(virDomainNetDefPtr iface, actual->data.hostdev.def.source.subsys.u.pci.addr = port->plug.hostdevpci.addr; actual->data.hostdev.def.source.subsys.u.pci.backupAlias = g_strdup(iface->driver.backupAlias); + actual->data.hostdev.def.source.subsys.u.pci.useBackupMAC + = iface->driver.useBackupMAC; + switch ((virNetworkForwardDriverNameType)port->plug.hostdevpci.driver) { case VIR_NETWORK_FORWARD_DRIVER_NAME_DEFAULT: actual->data.hostdev.def.source.subsys.u.pci.backend = diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 2b6d9bab75..e505823a89 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -236,6 +236,7 @@ struct _virDomainHostdevSubsysPCI { virPCIDeviceAddress addr; /* host address */ int backend; /* enum virDomainHostdevSubsysPCIBackendType */ char *backupAlias; /* alias id of backup virtio device for failover */ + virTristateBool useBackupMAC; /* true to use MAC from backup virtio NIC */ }; struct _virDomainHostdevSubsysSCSIHost { @@ -930,6 +931,7 @@ struct _virDomainNetDef { char *modelstr; struct { char *backupAlias; /* alias id of backup virtio device for failover */ + virTristateBool useBackupMAC; /* true to use MAC from backup virtio NIC */ union { struct { virDomainNetBackendType name; /* which driver backend to use */ diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index f021ec9c5d..7e4266c534 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -3119,7 +3119,7 @@ libxlDomainAttachHostPCIDevice(libxlDriverPrivatePtr driver, if (virHostdevPreparePCIDevices(hostdev_mgr, LIBXL_DRIVER_NAME, vm->def->name, vm->def->uuid, - &hostdev, 1, 0) < 0) + vm->def, &hostdev, 1, 0) < 0) goto cleanup; if (libxlMakePCI(hostdev, &pcidev) < 0) diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index 1774850640..7359f67b66 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -238,6 +238,7 @@ int qemuHostdevPreparePCIDevices(virQEMUDriverPtr driver, const char *name, const unsigned char *uuid, + virDomainDefPtr def, virDomainHostdevDefPtr *hostdevs, int nhostdevs, virQEMUCapsPtr qemuCaps, @@ -249,7 +250,7 @@ qemuHostdevPreparePCIDevices(virQEMUDriverPtr driver, return -1; return virHostdevPreparePCIDevices(hostdev_mgr, QEMU_DRIVER_NAME, - name, uuid, hostdevs, + name, uuid, def, hostdevs, nhostdevs, flags); } @@ -354,7 +355,7 @@ qemuHostdevPrepareDomainDevices(virQEMUDriverPtr driver, return -1; if (qemuHostdevPreparePCIDevices(driver, def->name, def->uuid, - def->hostdevs, def->nhostdevs, + def, def->hostdevs, def->nhostdevs, qemuCaps, flags) < 0) return -1; diff --git a/src/qemu/qemu_hostdev.h b/src/qemu/qemu_hostdev.h index 23df925529..01a7dd37c5 100644 --- a/src/qemu/qemu_hostdev.h +++ b/src/qemu/qemu_hostdev.h @@ -51,6 +51,7 @@ int qemuHostdevPrepareNVMeDisks(virQEMUDriverPtr driver, int qemuHostdevPreparePCIDevices(virQEMUDriverPtr driver, const char *name, const unsigned char *uuid, + virDomainDefPtr def, virDomainHostdevDefPtr *hostdevs, int nhostdevs, virQEMUCapsPtr qemuCaps, diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 31d455505b..97fb859334 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1549,7 +1549,7 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, if (!cfg->relaxedACS) flags |= VIR_HOSTDEV_STRICT_ACS_CHECK; if (qemuHostdevPreparePCIDevices(driver, vm->def->name, vm->def->uuid, - &hostdev, 1, priv->qemuCaps, flags) < 0) + vm->def, &hostdev, 1, priv->qemuCaps, flags) < 0) return -1; /* this could have been changed by qemuHostdevPreparePCIDevices */ diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index 9b4ea30216..d42b5263d6 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -487,6 +487,7 @@ virHostdevSaveNetConfig(virDomainHostdevDefPtr hostdev, /** * virHostdevSetNetConfig: * @hostdev: config object describing a hostdev device + * @domaindef: complete domain if available (otherwise NULL) * @uuid: uuid of the domain * * If the given hostdev device is an SRIOV network VF, determine its @@ -497,9 +498,11 @@ virHostdevSaveNetConfig(virDomainHostdevDefPtr hostdev, */ static int virHostdevSetNetConfig(virDomainHostdevDefPtr hostdev, + virDomainDefPtr domaindef, const unsigned char *uuid) { g_autofree char *linkdev = NULL; + const virMacAddr *mac; const virNetDevVlan *vlan; const virNetDevVPortProfile *virtPort; int vf = -1; @@ -511,6 +514,32 @@ virHostdevSetNetConfig(virDomainHostdevDefPtr hostdev, if (virHostdevNetDevice(hostdev, -1, &linkdev, &vf) < 0) return -1; + mac = &hostdev->parentnet->mac; + if (hostdev->source.subsys.u.pci.useBackupMAC == VIR_TRISTATE_BOOL_YES) { + size_t i; + + if (!domaindef) { + /* This should never happen */ + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to get backup device MAC because domaindef is NULL")); + return -1; + } + + for (i = 0; i < domaindef->nnets; i++) { + if (STREQ_NULLABLE(domaindef->nets[i]->info.alias, + hostdev->source.subsys.u.pci.backupAlias)) { + mac = &domaindef->nets[i]->mac; + break; + } + } + if (i > domaindef->nnets) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("backup device with alias name='%s' was not found"), + hostdev->source.subsys.u.pci.backupAlias); + return -1; + } + } + vlan = virDomainNetGetActualVlan(hostdev->parentnet); virtPort = virDomainNetGetActualVirtPortProfile(hostdev->parentnet); if (virtPort) { @@ -522,12 +551,10 @@ virHostdevSetNetConfig(virDomainHostdevDefPtr hostdev, return -1; } if (virHostdevNetConfigVirtPortProfile(linkdev, vf, virtPort, - &hostdev->parentnet->mac, - uuid, port_profile_associate) < 0) + mac, uuid, port_profile_associate) < 0) return -1; } else { - if (virNetDevSetNetConfig(linkdev, vf, &hostdev->parentnet->mac, - vlan, NULL, true) < 0) + if (virNetDevSetNetConfig(linkdev, vf, mac, vlan, NULL, true) < 0) return -1; } @@ -724,6 +751,7 @@ virHostdevPreparePCIDevicesImpl(virHostdevManagerPtr mgr, const char *dom_name, const unsigned char *uuid, virPCIDeviceListPtr pcidevs, + virDomainDefPtr domaindef, /* only used if nhostdevs > 0 */ virDomainHostdevDefPtr *hostdevs, int nhostdevs, unsigned int flags) @@ -872,7 +900,7 @@ virHostdevPreparePCIDevicesImpl(virHostdevManagerPtr mgr, * the network device, set the new netdev config */ for (i = 0; i < nhostdevs; i++) { - if (virHostdevSetNetConfig(hostdevs[i], uuid) < 0) + if (virHostdevSetNetConfig(hostdevs[i], domaindef, uuid) < 0) goto resetvfnetconfig; last_processed_hostdev_vf = i; @@ -988,6 +1016,7 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr mgr, const char *drv_name, const char *dom_name, const unsigned char *uuid, + virDomainDefPtr domaindef, virDomainHostdevDefPtr *hostdevs, int nhostdevs, unsigned int flags) @@ -1001,7 +1030,8 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr mgr, return -1; return virHostdevPreparePCIDevicesImpl(mgr, drv_name, dom_name, uuid, - pcidevs, hostdevs, nhostdevs, flags); + pcidevs, domaindef, hostdevs, + nhostdevs, flags); } @@ -2151,6 +2181,7 @@ virHostdevPrepareDomainDevices(virHostdevManagerPtr mgr, if (flags & VIR_HOSTDEV_SP_PCI) { if (virHostdevPreparePCIDevices(mgr, driver, def->name, def->uuid, + def, def->hostdevs, def->nhostdevs, flags) < 0) @@ -2358,8 +2389,8 @@ virHostdevPrepareOneNVMeDevice(virHostdevManagerPtr hostdev_mgr, } if (virHostdevPreparePCIDevicesImpl(hostdev_mgr, - drv_name, dom_name, NULL, - pciDevices, NULL, 0, pciFlags) < 0) + drv_name, dom_name, NULL, pciDevices, + NULL, NULL, 0, pciFlags) < 0) goto rollback; ret = 0; diff --git a/src/util/virhostdev.h b/src/util/virhostdev.h index ae84ed3d20..1aba8b73a8 100644 --- a/src/util/virhostdev.h +++ b/src/util/virhostdev.h @@ -68,6 +68,7 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr, const char *drv_name, const char *dom_name, const unsigned char *uuid, + virDomainDefPtr domaindef, virDomainHostdevDefPtr *hostdevs, int nhostdevs, unsigned int flags) diff --git a/tests/virhostdevtest.c b/tests/virhostdevtest.c index b6260bd9c1..f05ea73125 100644 --- a/tests/virhostdevtest.c +++ b/tests/virhostdevtest.c @@ -186,7 +186,7 @@ testVirHostdevPreparePCIHostdevs_unmanaged(void) /* Test normal functionality */ VIR_TEST_DEBUG("Test 0 hostdevs"); if (virHostdevPreparePCIDevices(mgr, drv_name, dom_name, uuid, - NULL, 0, 0) < 0) + NULL, NULL, 0, 0) < 0) return -1; CHECK_PCI_LIST_COUNT(mgr->activePCIHostdevs, active_count); CHECK_PCI_LIST_COUNT(mgr->inactivePCIHostdevs, inactive_count); @@ -194,7 +194,7 @@ testVirHostdevPreparePCIHostdevs_unmanaged(void) /* Test unmanaged hostdevs */ VIR_TEST_DEBUG("Test >=1 unmanaged hostdevs"); if (virHostdevPreparePCIDevices(mgr, drv_name, dom_name, uuid, - hostdevs, nhostdevs, 0) < 0) + NULL, hostdevs, nhostdevs, 0) < 0) return -1; CHECK_PCI_LIST_COUNT(mgr->activePCIHostdevs, active_count + nhostdevs); CHECK_PCI_LIST_COUNT(mgr->inactivePCIHostdevs, inactive_count - nhostdevs); @@ -204,7 +204,7 @@ testVirHostdevPreparePCIHostdevs_unmanaged(void) inactive_count = virPCIDeviceListCount(mgr->inactivePCIHostdevs); VIR_TEST_DEBUG("Test: prepare same hostdevs for same driver/domain again"); if (virHostdevPreparePCIDevices(mgr, drv_name, dom_name, uuid, - &hostdevs[0], 1, 0) == 0) + NULL, &hostdevs[0], 1, 0) == 0) return -1; virResetLastError(); CHECK_PCI_LIST_COUNT(mgr->activePCIHostdevs, active_count); @@ -212,7 +212,7 @@ testVirHostdevPreparePCIHostdevs_unmanaged(void) VIR_TEST_DEBUG("Test: prepare same hostdevs for same driver, diff domain again"); if (virHostdevPreparePCIDevices(mgr, drv_name, "test_domain1", uuid, - &hostdevs[1], 1, 0) == 0) + NULL, &hostdevs[1], 1, 0) == 0) return -1; virResetLastError(); CHECK_PCI_LIST_COUNT(mgr->activePCIHostdevs, active_count); @@ -220,7 +220,7 @@ testVirHostdevPreparePCIHostdevs_unmanaged(void) VIR_TEST_DEBUG("Test: prepare same hostdevs for diff driver/domain again"); if (virHostdevPreparePCIDevices(mgr, "test_driver1", dom_name, uuid, - &hostdevs[2], 1, 0) == 0) + NULL, &hostdevs[2], 1, 0) == 0) return -1; virResetLastError(); CHECK_PCI_LIST_COUNT(mgr->activePCIHostdevs, active_count); @@ -272,7 +272,7 @@ testVirHostdevPreparePCIHostdevs_managed(bool mixed) /* Test normal functionality */ VIR_TEST_DEBUG("Test >=1 hostdevs"); if (virHostdevPreparePCIDevices(mgr, drv_name, dom_name, uuid, - hostdevs, nhostdevs, 0) < 0) + NULL, hostdevs, nhostdevs, 0) < 0) return -1; CHECK_PCI_LIST_COUNT(mgr->activePCIHostdevs, active_count + nhostdevs); /* If testing a mixed roundtrip, devices are already in the inactive list @@ -288,7 +288,7 @@ testVirHostdevPreparePCIHostdevs_managed(bool mixed) inactive_count = virPCIDeviceListCount(mgr->inactivePCIHostdevs); VIR_TEST_DEBUG("Test: prepare same hostdevs for same driver/domain again"); if (virHostdevPreparePCIDevices(mgr, drv_name, dom_name, uuid, - &hostdevs[0], 1, 0) == 0) + NULL, &hostdevs[0], 1, 0) == 0) return -1; virResetLastError(); CHECK_PCI_LIST_COUNT(mgr->activePCIHostdevs, active_count); @@ -296,7 +296,7 @@ testVirHostdevPreparePCIHostdevs_managed(bool mixed) VIR_TEST_DEBUG("Test: prepare same hostdevs for same driver, diff domain again"); if (virHostdevPreparePCIDevices(mgr, drv_name, "test_domain1", uuid, - &hostdevs[1], 1, 0) == 0) + NULL, &hostdevs[1], 1, 0) == 0) return -1; virResetLastError(); CHECK_PCI_LIST_COUNT(mgr->activePCIHostdevs, active_count); @@ -304,7 +304,7 @@ testVirHostdevPreparePCIHostdevs_managed(bool mixed) VIR_TEST_DEBUG("Test: prepare same hostdevs for diff driver/domain again"); if (virHostdevPreparePCIDevices(mgr, "test_driver1", dom_name, uuid, - &hostdevs[2], 1, 0) == 0) + NULL, &hostdevs[2], 1, 0) == 0) return -1; virResetLastError(); CHECK_PCI_LIST_COUNT(mgr->activePCIHostdevs, active_count); -- 2.24.1

On Sun, Jan 19, 2020 at 10:24:19PM -0500, Laine Stump wrote:
Current virtio-net drivers that support the failover feature match up the virtio backup device with its corresponding hostdev device by looking for an interface with a matching MAC address. Since libvirt will assign a different random MAC address to each interface that isn't given an explicit MAC address, this means that the configuration for any failover pairs must include explicit matching MAC addresses.
To make life easier, we could have libvirt populate the XML config with the same auto-generated MAC address for both interfaces when it detects a failover pair that have no MAC addresses provided (a failover pair can be detected by matching <alias name='blah'/> of the virtio interface with <driver backupAlias='blah'/> of the hostdev interface), but then we would be stuck with that behavior even if the virtio guest driver later eliminated the requirement that mac addresses match.
Additionally, some management software uses the MAC address as the primary index for its list of network devices, and can't properly deal with two interfaces having the same MAC address (oVirt). Even libvirt's own virsh utility uses MAC address (combined with interface type) to uniquely identify interfaces for the virsh detach-interface command (in this case, fortunately the runtime interface type is used, so one of the interfaces will always be of type='hostdev' and the other type='something-else", so it doesn't currently cause any problem).
In order to remove the necessity of explicitly setting interface MAC addresses, as well as permit the two interfaces of a failover pair to each have a unique index while still fulfilling the current guest driver requirement that the MAC addresses matchin the guest, this patch adds a new attribute "useBackupMAC" that is set on the hostdev interface of the pair. When useBackupMAC='yes', the setup for the hostdev interface will find the virtio failover interface (using backupAlias) and use that interface's MAC address to initialize the MAC address of the hostdev interface; the MAC address in the hostdev interface config remains unchanged, it just isnt used for device initialization.
I made this patch to followup on https://bugzilla.redhat.com/show_bug.cgi?id=1693587#c12 (where I suggested this attribute as a possible remedy to oVirt's requirement that each network device have a unique MAC address).
Truthfully, I'm not convinced that I want it though, as it seems "a bit" hackish. In particular, I've thought for a long time that the "hostdev manager" code in util/virhostdev.c should really live in the node device driver and be called from the hypervisors via a public API (so that there is one central place on the host that maintains the list of in-use PCI devices and their status), but this patch adds an obstacle to that goal by adding a virDomainDefPtr to more of the APIs in that library - if this was turned into a public API, then entire virDomainDef would need to be serialized and sent in the API call, then parsed at the other end - yuck :-/. NB: there are already functions in virhostdev.h that take a virDomainDefPtr, so maybe I'm being too sensitive.
On the upside, it solves a problem, and default bevahior is unchanged.
I don't believe it does solve a real problem. If a mgmt app is capable of setting useBackupMAC=yes when writing the XML doc, then I don't see why it cannot just as easily set the matching MAC address when wring the XML doc. It can still treat both NICs as having a different MAC address in its own internal code. All it has to do is use the matching MAC address when writing out the XML config it gives to libvirt. I know oVirt has a facility for hook scripts that are add-ons which can arbitrarily munge the XML that VDSM creates. So AFAICT this doesn't even need to involve changes to VDSM itself. There can be a hook script which looks for the config indicating a failover pair, and rewrites the XML to have the matching MAC addr. Such a workaround then only needs to exist for as long as the mgmt app has this problematic limitation without impacting libvirt's maint. So I don't want to take this application specific hack in libvirt. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Mon, Jan 20, 2020 at 8:33 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Sun, Jan 19, 2020 at 10:24:19PM -0500, Laine Stump wrote:
Current virtio-net drivers that support the failover feature match up the virtio backup device with its corresponding hostdev device by looking for an interface with a matching MAC address. Since libvirt will assign a different random MAC address to each interface that isn't given an explicit MAC address, this means that the configuration for any failover pairs must include explicit matching MAC addresses.
To make life easier, we could have libvirt populate the XML config with the same auto-generated MAC address for both interfaces when it detects a failover pair that have no MAC addresses provided (a failover pair can be detected by matching <alias name='blah'/> of the virtio interface with <driver backupAlias='blah'/> of the hostdev interface), but then we would be stuck with that behavior even if the virtio guest driver later eliminated the requirement that mac addresses match.
Additionally, some management software uses the MAC address as the primary index for its list of network devices, and can't properly deal with two interfaces having the same MAC address (oVirt). Even libvirt's own virsh utility uses MAC address (combined with interface type) to uniquely identify interfaces for the virsh detach-interface command (in this case, fortunately the runtime interface type is used, so one of the interfaces will always be of type='hostdev' and the other type='something-else", so it doesn't currently cause any problem).
In order to remove the necessity of explicitly setting interface MAC addresses, as well as permit the two interfaces of a failover pair to each have a unique index while still fulfilling the current guest driver requirement that the MAC addresses matchin the guest, this patch adds a new attribute "useBackupMAC" that is set on the hostdev interface of the pair. When useBackupMAC='yes', the setup for the hostdev interface will find the virtio failover interface (using backupAlias) and use that interface's MAC address to initialize the MAC address of the hostdev interface; the MAC address in the hostdev interface config remains unchanged, it just isnt used for device initialization.
I made this patch to followup on https://bugzilla.redhat.com/show_bug.cgi?id=1693587#c12 (where I suggested this attribute as a possible remedy to oVirt's requirement that each network device have a unique MAC address).
Truthfully, I'm not convinced that I want it though, as it seems "a bit" hackish. In particular, I've thought for a long time that the "hostdev manager" code in util/virhostdev.c should really live in the node device driver and be called from the hypervisors via a public API (so that there is one central place on the host that maintains the list of in-use PCI devices and their status), but this patch adds an obstacle to that goal by adding a virDomainDefPtr to more of the APIs in that library - if this was turned into a public API, then entire virDomainDef would need to be serialized and sent in the API call, then parsed at the other end - yuck :-/. NB: there are already functions in virhostdev.h that take a virDomainDefPtr, so maybe I'm being too sensitive.
On the upside, it solves a problem, and default bevahior is unchanged.
I don't believe it does solve a real problem.
If a mgmt app is capable of setting useBackupMAC=yes when writing the XML doc, then I don't see why it cannot just as easily set the matching MAC address when wring the XML doc.
It can still treat both NICs as having a different MAC address in its own internal code. All it has to do is use the matching MAC address when writing out the XML config it gives to libvirt.
I know oVirt has a facility for hook scripts that are add-ons which can arbitrarily munge the XML that VDSM creates. So AFAICT this doesn't even need to involve changes to VDSM itself. There can be a hook script which looks for the config indicating a failover pair, and rewrites the XML to have the matching MAC addr.
Such a workaround then only needs to exist for as long as the mgmt app has this problematic limitation without impacting libvirt's maint.
So I don't want to take this application specific hack in libvirt.
I see why you can see it as a hack that should not exist in libvirt. However I would try to put out my case for it. This feature creates something very similar to an in-guest bond between an sriov interface to a virtio one. Currently, we ask end users to configure the bond, set the sriov interface as the primary interface, and allow mac-spoofing on the virio interface. To me, the purpose of this feature is to remove the need for end-user intervention. The bond device no longer need to be created in the guest, it can be configured by management on the host side. Defining bonds in Linux is an established procedure. You select pre-existing interfaces, each with its different mac address, pick up a bonding mode, pick up a master interface, pick up the active mac address of the bond and start using it. I'd like to have the same experience when I configure this new type of bonding via libvirt. It just feels right to have a couple of independent interfaces, bonded together, with one selected as "master". Regards, Dan.
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Tue, Jan 21, 2020 at 12:46:38PM +0200, Dan Kenigsberg wrote:
On Mon, Jan 20, 2020 at 8:33 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Sun, Jan 19, 2020 at 10:24:19PM -0500, Laine Stump wrote:
Current virtio-net drivers that support the failover feature match up the virtio backup device with its corresponding hostdev device by looking for an interface with a matching MAC address. Since libvirt will assign a different random MAC address to each interface that isn't given an explicit MAC address, this means that the configuration for any failover pairs must include explicit matching MAC addresses.
To make life easier, we could have libvirt populate the XML config with the same auto-generated MAC address for both interfaces when it detects a failover pair that have no MAC addresses provided (a failover pair can be detected by matching <alias name='blah'/> of the virtio interface with <driver backupAlias='blah'/> of the hostdev interface), but then we would be stuck with that behavior even if the virtio guest driver later eliminated the requirement that mac addresses match.
Additionally, some management software uses the MAC address as the primary index for its list of network devices, and can't properly deal with two interfaces having the same MAC address (oVirt). Even libvirt's own virsh utility uses MAC address (combined with interface type) to uniquely identify interfaces for the virsh detach-interface command (in this case, fortunately the runtime interface type is used, so one of the interfaces will always be of type='hostdev' and the other type='something-else", so it doesn't currently cause any problem).
In order to remove the necessity of explicitly setting interface MAC addresses, as well as permit the two interfaces of a failover pair to each have a unique index while still fulfilling the current guest driver requirement that the MAC addresses matchin the guest, this patch adds a new attribute "useBackupMAC" that is set on the hostdev interface of the pair. When useBackupMAC='yes', the setup for the hostdev interface will find the virtio failover interface (using backupAlias) and use that interface's MAC address to initialize the MAC address of the hostdev interface; the MAC address in the hostdev interface config remains unchanged, it just isnt used for device initialization.
I made this patch to followup on https://bugzilla.redhat.com/show_bug.cgi?id=1693587#c12 (where I suggested this attribute as a possible remedy to oVirt's requirement that each network device have a unique MAC address).
Truthfully, I'm not convinced that I want it though, as it seems "a bit" hackish. In particular, I've thought for a long time that the "hostdev manager" code in util/virhostdev.c should really live in the node device driver and be called from the hypervisors via a public API (so that there is one central place on the host that maintains the list of in-use PCI devices and their status), but this patch adds an obstacle to that goal by adding a virDomainDefPtr to more of the APIs in that library - if this was turned into a public API, then entire virDomainDef would need to be serialized and sent in the API call, then parsed at the other end - yuck :-/. NB: there are already functions in virhostdev.h that take a virDomainDefPtr, so maybe I'm being too sensitive.
On the upside, it solves a problem, and default bevahior is unchanged.
I don't believe it does solve a real problem.
If a mgmt app is capable of setting useBackupMAC=yes when writing the XML doc, then I don't see why it cannot just as easily set the matching MAC address when wring the XML doc.
It can still treat both NICs as having a different MAC address in its own internal code. All it has to do is use the matching MAC address when writing out the XML config it gives to libvirt.
I know oVirt has a facility for hook scripts that are add-ons which can arbitrarily munge the XML that VDSM creates. So AFAICT this doesn't even need to involve changes to VDSM itself. There can be a hook script which looks for the config indicating a failover pair, and rewrites the XML to have the matching MAC addr.
Such a workaround then only needs to exist for as long as the mgmt app has this problematic limitation without impacting libvirt's maint.
So I don't want to take this application specific hack in libvirt.
I see why you can see it as a hack that should not exist in libvirt. However I would try to put out my case for it. This feature creates something very similar to an in-guest bond between an sriov interface to a virtio one. Currently, we ask end users to configure the bond, set the sriov interface as the primary interface, and allow mac-spoofing on the virio interface. To me, the purpose of this feature is to remove the need for end-user intervention. The bond device no longer need to be created in the guest, it can be configured by management on the host side. Defining bonds in Linux is an established procedure. You select pre-existing interfaces, each with its different mac address, pick up a bonding mode, pick up a master interface, pick up the active mac address of the bond and start using it. I'd like to have the same experience when I configure this new type of bonding via libvirt. It just feels right to have a couple of independent interfaces, bonded together, with one selected as "master".
This is comparing apples & oranges IMHO. It is comparing what is done as an end user in the guest with what is done as an internal config option in libvirt. If you want to compare the user experiance, then the comparison is between the guest setup and the RHEV user interface experiance. AFAICT, we can achieve the desired user experiance in RHEV and there's no functional reason to need this patch. It is just adding a policy control knob that doesn't have any impact on what it is possible to configure for the guest, while adding to maint burden of libvirt. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Tue, Jan 21, 2020 at 4:47 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Tue, Jan 21, 2020 at 12:46:38PM +0200, Dan Kenigsberg wrote:
On Mon, Jan 20, 2020 at 8:33 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Sun, Jan 19, 2020 at 10:24:19PM -0500, Laine Stump wrote:
Current virtio-net drivers that support the failover feature match up the virtio backup device with its corresponding hostdev device by looking for an interface with a matching MAC address. Since libvirt will assign a different random MAC address to each interface that isn't given an explicit MAC address, this means that the configuration for any failover pairs must include explicit matching MAC addresses.
To make life easier, we could have libvirt populate the XML config with the same auto-generated MAC address for both interfaces when it detects a failover pair that have no MAC addresses provided (a failover pair can be detected by matching <alias name='blah'/> of the virtio interface with <driver backupAlias='blah'/> of the hostdev interface), but then we would be stuck with that behavior even if the virtio guest driver later eliminated the requirement that mac addresses match.
Additionally, some management software uses the MAC address as the primary index for its list of network devices, and can't properly deal with two interfaces having the same MAC address (oVirt). Even libvirt's own virsh utility uses MAC address (combined with interface type) to uniquely identify interfaces for the virsh detach-interface command (in this case, fortunately the runtime interface type is used, so one of the interfaces will always be of type='hostdev' and the other type='something-else", so it doesn't currently cause any problem).
In order to remove the necessity of explicitly setting interface MAC addresses, as well as permit the two interfaces of a failover pair to each have a unique index while still fulfilling the current guest driver requirement that the MAC addresses matchin the guest, this patch adds a new attribute "useBackupMAC" that is set on the hostdev interface of the pair. When useBackupMAC='yes', the setup for the hostdev interface will find the virtio failover interface (using backupAlias) and use that interface's MAC address to initialize the MAC address of the hostdev interface; the MAC address in the hostdev interface config remains unchanged, it just isnt used for device initialization.
I made this patch to followup on https://bugzilla.redhat.com/show_bug.cgi?id=1693587#c12 (where I suggested this attribute as a possible remedy to oVirt's requirement that each network device have a unique MAC address).
Truthfully, I'm not convinced that I want it though, as it seems "a bit" hackish. In particular, I've thought for a long time that the "hostdev manager" code in util/virhostdev.c should really live in the node device driver and be called from the hypervisors via a public API (so that there is one central place on the host that maintains the list of in-use PCI devices and their status), but this patch adds an obstacle to that goal by adding a virDomainDefPtr to more of the APIs in that library - if this was turned into a public API, then entire virDomainDef would need to be serialized and sent in the API call, then parsed at the other end - yuck :-/. NB: there are already functions in virhostdev.h that take a virDomainDefPtr, so maybe I'm being too sensitive.
On the upside, it solves a problem, and default bevahior is unchanged.
I don't believe it does solve a real problem.
If a mgmt app is capable of setting useBackupMAC=yes when writing the XML doc, then I don't see why it cannot just as easily set the matching MAC address when wring the XML doc.
It can still treat both NICs as having a different MAC address in its own internal code. All it has to do is use the matching MAC address when writing out the XML config it gives to libvirt.
I know oVirt has a facility for hook scripts that are add-ons which can arbitrarily munge the XML that VDSM creates. So AFAICT this doesn't even need to involve changes to VDSM itself. There can be a hook script which looks for the config indicating a failover pair, and rewrites the XML to have the matching MAC addr.
Such a workaround then only needs to exist for as long as the mgmt app has this problematic limitation without impacting libvirt's maint.
So I don't want to take this application specific hack in libvirt.
I see why you can see it as a hack that should not exist in libvirt. However I would try to put out my case for it. This feature creates something very similar to an in-guest bond between an sriov interface to a virtio one. Currently, we ask end users to configure the bond, set the sriov interface as the primary interface, and allow mac-spoofing on the virio interface. To me, the purpose of this feature is to remove the need for end-user intervention. The bond device no longer need to be created in the guest, it can be configured by management on the host side. Defining bonds in Linux is an established procedure. You select pre-existing interfaces, each with its different mac address, pick up a bonding mode, pick up a master interface, pick up the active mac address of the bond and start using it. I'd like to have the same experience when I configure this new type of bonding via libvirt. It just feels right to have a couple of independent interfaces, bonded together, with one selected as "master".
This is comparing apples & oranges IMHO. It is comparing what is done as an end user in the guest with what is done as an internal config option in libvirt. If you want to compare the user experiance, then the comparison is between the guest setup and the RHEV user interface experiance. AFAICT, we can achieve the desired user experiance in RHEV and there's no functional reason to need this patch. It is just adding a policy control knob that doesn't have any impact on what it is possible to configure for the guest, while adding to maint burden of libvirt.
IMHO there are three layers of apples here. In oVirt or KubeVirt, maybe also in OpenStack, a VM owner would like to define to virtual interfaces. One that is primary, based on SR-IOV. The other is secondary, connected via virtio to a logical network to their choosing. They would to specify the second interface as a backup for the first one. They don't really care about the mac address of the interfaces. In the guest, oVirt users can currently do just this using normal Linux bonding. IMHO a user of virsh or virt-manager would like to do that, too. He or she should not bother setting the mac address of the two interfaces; no human like to do that. They rather state which interface is the backup of another one. I think that what you see as a maintenance burden stems from a design mistake^Wdecision in virtio, which expresses the "x is the backup of y" as "x and y have the same mac address and x is virtio". I (egotistically?) think of libvirt as the human-accessible API for virtualization, that let me express what I want, with little leakage of implementation issues. I'll try to express the user experience I envisage in KubeVirt terms: kind: VM spec: domain: devices: interfaces: - name: fast sriov: {} - name: backup bridge: {} backupOf: fast # <---- how I'd like to express the relation between the two nics networks: - name: fast multus: networkName: sriov-vlan-100 - name: backup multus: networkName: ovs-vlan-100 When this is fed today to a cluster with kubemacpool [1] enabled, kubemacpool would allocate a different mac to each of the interfaces. KubeVirt would be able to ignore the mac of the backup interface and feed the mac of the fast mac instead. I suppose that with enough work, oVirt would be able to do the same. Humans using virt-manager would curse us every time they copy a pseudorandom mac address from one nic to another. I think that it would be better to address this once, here in libvirt, rather than repeat it thrice higher up the management stack. Regards, Dan. [1] https://github.com/k8snetworkplumbingwg/kubemacpool

On Wed, Jan 22, 2020 at 02:59:36PM +0200, Dan Kenigsberg wrote:
On Tue, Jan 21, 2020 at 4:47 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Tue, Jan 21, 2020 at 12:46:38PM +0200, Dan Kenigsberg wrote:
On Mon, Jan 20, 2020 at 8:33 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Sun, Jan 19, 2020 at 10:24:19PM -0500, Laine Stump wrote:
Current virtio-net drivers that support the failover feature match up the virtio backup device with its corresponding hostdev device by looking for an interface with a matching MAC address. Since libvirt will assign a different random MAC address to each interface that isn't given an explicit MAC address, this means that the configuration for any failover pairs must include explicit matching MAC addresses.
To make life easier, we could have libvirt populate the XML config with the same auto-generated MAC address for both interfaces when it detects a failover pair that have no MAC addresses provided (a failover pair can be detected by matching <alias name='blah'/> of the virtio interface with <driver backupAlias='blah'/> of the hostdev interface), but then we would be stuck with that behavior even if the virtio guest driver later eliminated the requirement that mac addresses match.
Additionally, some management software uses the MAC address as the primary index for its list of network devices, and can't properly deal with two interfaces having the same MAC address (oVirt). Even libvirt's own virsh utility uses MAC address (combined with interface type) to uniquely identify interfaces for the virsh detach-interface command (in this case, fortunately the runtime interface type is used, so one of the interfaces will always be of type='hostdev' and the other type='something-else", so it doesn't currently cause any problem).
In order to remove the necessity of explicitly setting interface MAC addresses, as well as permit the two interfaces of a failover pair to each have a unique index while still fulfilling the current guest driver requirement that the MAC addresses matchin the guest, this patch adds a new attribute "useBackupMAC" that is set on the hostdev interface of the pair. When useBackupMAC='yes', the setup for the hostdev interface will find the virtio failover interface (using backupAlias) and use that interface's MAC address to initialize the MAC address of the hostdev interface; the MAC address in the hostdev interface config remains unchanged, it just isnt used for device initialization.
I made this patch to followup on https://bugzilla.redhat.com/show_bug.cgi?id=1693587#c12 (where I suggested this attribute as a possible remedy to oVirt's requirement that each network device have a unique MAC address).
Truthfully, I'm not convinced that I want it though, as it seems "a bit" hackish. In particular, I've thought for a long time that the "hostdev manager" code in util/virhostdev.c should really live in the node device driver and be called from the hypervisors via a public API (so that there is one central place on the host that maintains the list of in-use PCI devices and their status), but this patch adds an obstacle to that goal by adding a virDomainDefPtr to more of the APIs in that library - if this was turned into a public API, then entire virDomainDef would need to be serialized and sent in the API call, then parsed at the other end - yuck :-/. NB: there are already functions in virhostdev.h that take a virDomainDefPtr, so maybe I'm being too sensitive.
On the upside, it solves a problem, and default bevahior is unchanged.
I don't believe it does solve a real problem.
If a mgmt app is capable of setting useBackupMAC=yes when writing the XML doc, then I don't see why it cannot just as easily set the matching MAC address when wring the XML doc.
It can still treat both NICs as having a different MAC address in its own internal code. All it has to do is use the matching MAC address when writing out the XML config it gives to libvirt.
I know oVirt has a facility for hook scripts that are add-ons which can arbitrarily munge the XML that VDSM creates. So AFAICT this doesn't even need to involve changes to VDSM itself. There can be a hook script which looks for the config indicating a failover pair, and rewrites the XML to have the matching MAC addr.
Such a workaround then only needs to exist for as long as the mgmt app has this problematic limitation without impacting libvirt's maint.
So I don't want to take this application specific hack in libvirt.
I see why you can see it as a hack that should not exist in libvirt. However I would try to put out my case for it. This feature creates something very similar to an in-guest bond between an sriov interface to a virtio one. Currently, we ask end users to configure the bond, set the sriov interface as the primary interface, and allow mac-spoofing on the virio interface. To me, the purpose of this feature is to remove the need for end-user intervention. The bond device no longer need to be created in the guest, it can be configured by management on the host side. Defining bonds in Linux is an established procedure. You select pre-existing interfaces, each with its different mac address, pick up a bonding mode, pick up a master interface, pick up the active mac address of the bond and start using it. I'd like to have the same experience when I configure this new type of bonding via libvirt. It just feels right to have a couple of independent interfaces, bonded together, with one selected as "master".
This is comparing apples & oranges IMHO. It is comparing what is done as an end user in the guest with what is done as an internal config option in libvirt. If you want to compare the user experiance, then the comparison is between the guest setup and the RHEV user interface experiance. AFAICT, we can achieve the desired user experiance in RHEV and there's no functional reason to need this patch. It is just adding a policy control knob that doesn't have any impact on what it is possible to configure for the guest, while adding to maint burden of libvirt.
IMHO there are three layers of apples here. In oVirt or KubeVirt, maybe also in OpenStack, a VM owner would like to define to virtual interfaces. One that is primary, based on SR-IOV. The other is secondary, connected via virtio to a logical network to their choosing. They would to specify the second interface as a backup for the first one. They don't really care about the mac address of the interfaces. In the guest, oVirt users can currently do just this using normal Linux bonding. IMHO a user of virsh or virt-manager would like to do that, too. He or she should not bother setting the mac address of the two interfaces; no human like to do that. They rather state which interface is the backup of another one.
I think that what you see as a maintenance burden stems from a design mistake^Wdecision in virtio, which expresses the "x is the backup of y" as "x and y have the same mac address and x is virtio". I (egotistically?) think of libvirt as the human-accessible API for virtualization, that let me express what I want, with little leakage of implementation issues.
I think this is where the disconnect is. The libvirt API is not aiming to provide human targetted convenience. It is intending to provide a machine targetted functional mechanism with clear semantics, on which applications can construct human targetted virtualization solutions. Human convenience entails making a large number of usage policy decisions based on criteria that are appropriate for the application's use cases & knowledge of the guest OS needs.
I'll try to express the user experience I envisage in KubeVirt terms:
kind: VM spec: domain: devices: interfaces: - name: fast sriov: {} - name: backup bridge: {} backupOf: fast # <---- how I'd like to express the relation between the two nics networks: - name: fast multus: networkName: sriov-vlan-100 - name: backup multus: networkName: ovs-vlan-100
Yes, this is a perfectly sane way to expose teaming in KubeVirt. What you're showing here is the KubeVirt config though, and this will be converted into a libvirt guest XML config, at which time KubeVirt can choose to provide a matching MAC address for the two NICs it has requested. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

I forgot to mention in the cover letter - there is a bug in the QEMU 4.2.0 release that causes the qemu process to crash whenever the vfio device is unplugged, and for some reason when booting a Fedora 31 guest with a failover pair, the guest kernel attempts to unplug the vfio device during its initial boot! Fortunately that bug is already fixed, by this qemu commit: https://git.qemu.org/?p=qemu.git;a=commitdiff;h=0446f8121723b134ca1d1ed0b73e... We've also noticed that if the vfio device of the pair is manually unplugged from the guest, the DEVICE_DELETED event never makes it back up to libvirt, and so the device isn't removed from the libvirt config for the guest, is marked as still in-use, and can't be re-plugged into the same or any other guest until that particular guest is shut down. We still haven't found the root cause of that problem, but it seems likely it's not in these patches (more likely an issue in qemu or in the guest), so I posted them :-)
participants (3)
-
Dan Kenigsberg
-
Daniel P. Berrangé
-
Laine Stump