[PATCH 0/9] qemu: support passt as the backend for vhost-user network interfaces

passt (https://passt.top) provides a method of connecting QEMU virtual machines to the external network without requiring special privileges or capabilities of any participating processes - even libvirt itself can run unprivileged and create an instance of passt (which *always* runs unprivileged) that is then connected to the qemu process (and thus the virtual machine) with a unix socket. Originally passt used its own protocol for this socket, sending both control messages and data packets over the socket. This works, and is already much more efficient than the previously only-unprivileged-networking-solution slirp. But recently passt added support for using the vhost-user protocol for communication between the passt process (which is connected to the external network) and the QEMU process (and thus the VM). vhost-user also uses a unix socket, but only for control plane messages - all data packets are "sent" between the VM and passt process via a shared memory region. This is unsurprisingly much more efficient. From the point of view of QEMU, the passt process looks identical to any normal vhost-user backend, so we can run QEMU with exactly the same interface commandline options as normal vhost-user. Also, the passt process supports all of the same options as it does when used in its "traditional" mode, so really in the end all we need to do is twist libvirt around so that when <backend type='passt'/> is specified for an <interface type='vhostuser'>, it will run passt just as before (except with the added "--vhost-user" option so that passt will know to use that), and then force feed the vhost-user code in libvirt with the same ocket path used by passt. This series does that, while also switching up a few bits of code prior to adding in the new functionality. So far this has been tested both unprivileged and privileged on Fedora 40 (with latest passt packet) and selinux enabled (there are a couple of selinux policy tweaks that still need to be pushed to passt-selinux) as well as unprivileged on debian (I *think* with AppArmor enabled) and everything seems to work. (I haven't gotten to testing hotplug, but it *should* work, and I'll be testing it while (hopefully) someone is reviewing these patches.) I also need to make the patch to update formatdomain.rst before the rest of it can be pushed, but I wanted to get this posted to save time later. This series does require patch 1 of the series I posted a couple days ago that changes several functions that can't fail to return void. Also, you will need the latest (20250121) passt package. This Resolves: https://issues.redhat.com/browse/RHEL-69455 Laine Stump (9): conf: change virDomainHostdevInsert() to return void qemu: fix qemu validation to forbid guest-side IP address for type='vdpa' qemu: validate that model is virtio for vhostuser and vdpa interfaces in the same place qemu: automatically set model type='virtio' for interface type='vhostuser' qemu: do all vhostuser attribute validation in qemu driver conf/qemu: make <source> element *almost* optional for type=vhostuser qemu: use switch instead of if in qemuProcessPrepareDomainNetwork() qemu: make qemuPasstCreateSocketPath() public qemu: complete vhostuser + passt support src/conf/domain_conf.c | 107 +++++++++--------- src/conf/domain_conf.h | 2 +- src/conf/domain_validate.c | 83 ++++---------- src/conf/schemas/domaincommon.rng | 32 +++++- src/libxl/libxl_domain.c | 5 +- src/libxl/libxl_driver.c | 3 +- src/lxc/lxc_driver.c | 3 +- src/qemu/qemu_command.c | 7 +- src/qemu/qemu_driver.c | 3 +- src/qemu/qemu_extdevice.c | 6 +- src/qemu/qemu_hotplug.c | 21 +++- src/qemu/qemu_passt.c | 5 +- src/qemu/qemu_passt.h | 3 + src/qemu/qemu_postparse.c | 3 +- src/qemu/qemu_process.c | 84 +++++++++----- src/qemu/qemu_validate.c | 56 ++++++--- ...t-user-slirp-portforward.x86_64-latest.err | 2 +- .../net-vhostuser-passt.x86_64-latest.args | 42 +++++++ .../net-vhostuser-passt.x86_64-latest.xml | 72 ++++++++++++ tests/qemuxmlconfdata/net-vhostuser-passt.xml | 70 ++++++++++++ tests/qemuxmlconftest.c | 1 + 21 files changed, 429 insertions(+), 181 deletions(-) create mode 100644 tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.args create mode 100644 tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.xml create mode 100644 tests/qemuxmlconfdata/net-vhostuser-passt.xml -- 2.47.1

We haven't checked for memalloc failure in many years, and that was the only reason this function would have ever failed. Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 15 +++++---------- src/conf/domain_conf.h | 2 +- src/libxl/libxl_domain.c | 5 +---- src/libxl/libxl_driver.c | 3 +-- src/lxc/lxc_driver.c | 3 +-- src/qemu/qemu_driver.c | 3 +-- src/qemu/qemu_process.c | 3 +-- 7 files changed, 11 insertions(+), 23 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 87f87bbe56..50dc4a33a6 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -14465,12 +14465,10 @@ virDomainChrTargetTypeToString(int deviceType, return type; } -int +void virDomainHostdevInsert(virDomainDef *def, virDomainHostdevDef *hostdev) { VIR_APPEND_ELEMENT(def->hostdevs, def->nhostdevs, hostdev); - - return 0; } virDomainHostdevDef * @@ -14886,9 +14884,8 @@ virDomainDiskRemoveByName(virDomainDef *def, const char *name) int virDomainNetInsert(virDomainDef *def, virDomainNetDef *net) { /* hostdev net devices must also exist in the hostdevs array */ - if (net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV && - virDomainHostdevInsert(def, &net->data.hostdev.def) < 0) - return -1; + if (net->type == VIR_DOMAIN_NET_TYPE_HOSTDEV) + virDomainHostdevInsert(def, &net->data.hostdev.def); VIR_APPEND_ELEMENT(def->nets, def->nnets, net); return 0; @@ -19281,10 +19278,8 @@ virDomainDefParseXML(xmlXPathContextPtr ctxt, * where the actual network type is already known to be * hostdev) must also be in the hostdevs array. */ - if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_HOSTDEV && - virDomainHostdevInsert(def, virDomainNetGetActualHostdev(net)) < 0) { - return NULL; - } + if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_HOSTDEV) + virDomainHostdevInsert(def, virDomainNetGetActualHostdev(net)); } VIR_FREE(nodes); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index e51c74b6d1..9da6586e66 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -3994,7 +3994,7 @@ virDomainNetDef *virDomainNetRemove(virDomainDef *def, size_t i); virDomainNetDef *virDomainNetRemoveByObj(virDomainDef *def, virDomainNetDef *net); void virDomainNetRemoveHostdev(virDomainDef *def, virDomainNetDef *net); -int virDomainHostdevInsert(virDomainDef *def, virDomainHostdevDef *hostdev); +void virDomainHostdevInsert(virDomainDef *def, virDomainHostdevDef *hostdev); virDomainHostdevDef * virDomainHostdevRemove(virDomainDef *def, size_t i); int virDomainHostdevFind(virDomainDef *def, virDomainHostdevDef *match, diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c index a049cdb30f..6805160923 100644 --- a/src/libxl/libxl_domain.c +++ b/src/libxl/libxl_domain.c @@ -1014,10 +1014,7 @@ libxlNetworkPrepareDevices(virDomainDef *def) /* Each type='hostdev' network device must also have a * corresponding entry in the hostdevs array. */ - virDomainHostdevDef *hostdev = virDomainNetGetActualHostdev(net); - - if (virDomainHostdevInsert(def, hostdev) < 0) - return -1; + virDomainHostdevInsert(def, virDomainNetGetActualHostdev(net)); } } diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index bd858d8127..edf7b37581 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -3574,8 +3574,7 @@ libxlDomainAttachDeviceConfig(virDomainDef *vmdef, virDomainDeviceDef *dev) return -1; } - if (virDomainHostdevInsert(vmdef, hostdev) < 0) - return -1; + virDomainHostdevInsert(vmdef, hostdev); dev->data.hostdev = NULL; break; diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index e63732dbea..22266c1ab6 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -2993,8 +2993,7 @@ lxcDomainAttachDeviceConfig(virDomainDef *vmdef, _("device is already in the domain configuration")); return -1; } - if (virDomainHostdevInsert(vmdef, hostdev) < 0) - return -1; + virDomainHostdevInsert(vmdef, hostdev); dev->data.hostdev = NULL; ret = 0; break; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 772cb405d6..1d0da1028f 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -6732,8 +6732,7 @@ qemuDomainAttachDeviceConfig(virDomainDef *vmdef, _("device is already in the domain configuration")); return -1; } - if (virDomainHostdevInsert(vmdef, hostdev)) - return -1; + virDomainHostdevInsert(vmdef, hostdev); dev->data.hostdev = NULL; break; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index d015285b0d..910229a616 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -5928,8 +5928,7 @@ qemuProcessPrepareDomainNetwork(virDomainObj *vm) if (qemuDomainPrepareHostdev(hostdev, priv) < 0) return -1; - if (virDomainHostdevInsert(def, hostdev) < 0) - return -1; + virDomainHostdevInsert(def, hostdev); } } return 0; -- 2.47.1

Because all the checks for VIR_DOMAIN_NET_TYPE_VDPA were inside an else-if clause that was immediately followed by another else-if clause that forbid setting guestIP.ips or guestIP.routes, we've been allowing users to set guestIP.* for vdpa interfaces (but then not doing validation of the attributes that should have been done if we *did* support setting IPs for vdpa (but we don't anyway, so :shrug:.) This can be fixed by turning the vdpa else-if clause into a top-level if - this way vdpa interfaces will hit the "else if (net->guestIP.nips)" clause and reject guest-side IP address setting. Also, since there are currently *no* interface types for QEMU that support adding guest-side routes, we put that check by itself (I think it may be possible to set some guest routes for passt interfaces, but we don't do that) Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_validate.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 76f2eafe49..06093bc42b 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -1745,6 +1745,12 @@ qemuValidateDomainDeviceDefNetwork(const virDomainNetDef *net, bool hasIPv6 = false; size_t i; + if (net->guestIP.nroutes) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Invalid attempt to set network interface guest-side IP route, not supported by QEMU")); + return -1; + } + if (net->type == VIR_DOMAIN_NET_TYPE_USER) { virDomainCapsDeviceNet netCaps = { }; @@ -1758,12 +1764,6 @@ qemuValidateDomainDeviceDefNetwork(const virDomainNetDef *net, return -1; } - if (net->guestIP.nroutes) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("Invalid attempt to set network interface guest-side IP route, not supported by QEMU")); - return -1; - } - for (i = 0; i < net->guestIP.nips; i++) { const virNetDevIPAddr *ip = net->guestIP.ips[i]; @@ -1811,7 +1811,13 @@ qemuValidateDomainDeviceDefNetwork(const virDomainNetDef *net, } } } - } else if (net->type == VIR_DOMAIN_NET_TYPE_VDPA) { + } else if (net->guestIP.nips) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Invalid attempt to set network interface guest-side IP address info, not supported by QEMU")); + return -1; + } + + if (net->type == VIR_DOMAIN_NET_TYPE_VDPA) { if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV_VHOST_VDPA)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("vDPA devices are not supported with this QEMU binary")); @@ -1825,10 +1831,6 @@ qemuValidateDomainDeviceDefNetwork(const virDomainNetDef *net, virDomainNetModelTypeToString(net->model)); return -1; } - } else if (net->guestIP.nroutes || net->guestIP.nips) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("Invalid attempt to set network interface guest-side IP route and/or address info, not supported by QEMU")); - return -1; } if (virDomainNetIsVirtioModel(net)) { -- 2.47.1

Both vhostuser and vdpa interface types must use the virtio model in the guest (because part of the functionality is implemented in the guest virtio driver). Due to ["because that's the way it happened"] this has been validated for vhostuser in the hypervisor-agnostic validate function, but for vdpa it has been done in the QEMU-specific validate. Since these interface models are only supported by QEMU anyway, validate for both of them in the QEMU validation function. Take advantage of this change to switch to using virDomainNetIsVirtioModel(net) instead of "net->model == VIR_DOMAIN_NET_MODEL_VIRTIO" (the former also matches ...VIRTIO_TRANSITIONAL and ...VIRTIO_NON_TRANSITIONAL, so is more correct). Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_validate.c | 6 ------ src/qemu/qemu_validate.c | 11 ++++++----- 2 files changed, 6 insertions(+), 11 deletions(-) diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c index eb5e764c02..d0e2bcaccf 100644 --- a/src/conf/domain_validate.c +++ b/src/conf/domain_validate.c @@ -2186,12 +2186,6 @@ virDomainNetDefValidate(const virDomainNetDef *net) switch (net->type) { case VIR_DOMAIN_NET_TYPE_VHOSTUSER: - if (!virDomainNetIsVirtioModel(net)) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("Wrong or no <model> 'type' attribute specified with <interface type='vhostuser'/>. vhostuser requires the virtio-net* frontend")); - return -1; - } - if (net->data.vhostuser->data.nix.listen && net->data.vhostuser->data.nix.reconnect.enabled == VIR_TRISTATE_BOOL_YES) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 06093bc42b..243c499a33 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -1823,17 +1823,18 @@ qemuValidateDomainDeviceDefNetwork(const virDomainNetDef *net, _("vDPA devices are not supported with this QEMU binary")); return -1; } + } - if (net->model != VIR_DOMAIN_NET_MODEL_VIRTIO) { + if (!virDomainNetIsVirtioModel(net)) { + if (net->type == VIR_DOMAIN_NET_TYPE_VDPA || + net->type == VIR_DOMAIN_NET_TYPE_VHOSTUSER) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("invalid model for interface of type '%1$s': '%2$s'"), + _("invalid model for interface of type '%1$s': '%2$s' - must be 'virtio'"), virDomainNetTypeToString(net->type), virDomainNetModelTypeToString(net->model)); return -1; } - } - - if (virDomainNetIsVirtioModel(net)) { + } else { if (net->driver.virtio.rx_queue_size) { if (!VIR_IS_POW2(net->driver.virtio.rx_queue_size)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", -- 2.47.1

Both vdpa and vhostuser require that the guest device be virtio, and for interface type='vdpa', we already set <model type='virtio'/> if it is unspecified in the input XML, so let's be just as courteous for interface type='vhostuser'. Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_postparse.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_postparse.c b/src/qemu/qemu_postparse.c index 20ee333e0d..49009ae2e4 100644 --- a/src/qemu/qemu_postparse.c +++ b/src/qemu/qemu_postparse.c @@ -100,7 +100,8 @@ qemuDomainDeviceNetDefPostParse(virDomainNetDef *net, const virDomainDef *def, virQEMUCaps *qemuCaps) { - if (net->type == VIR_DOMAIN_NET_TYPE_VDPA && + if ((net->type == VIR_DOMAIN_NET_TYPE_VDPA || + net->type == VIR_DOMAIN_NET_TYPE_VHOSTUSER) && !virDomainNetGetModelString(net)) { net->model = VIR_DOMAIN_NET_MODEL_VIRTIO; } else if (net->type != VIR_DOMAIN_NET_TYPE_HOSTDEV && -- 2.47.1

Since vhostuser is only used/supported by the QEMU driver, and all the rest of the vhostuser-specific validation is done in QEMU's validation, lets move the final check (to see if they've tried to enable auto-reconnect when this interface is on the server side of the vhostuser socket) to the QEMU validate. Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_validate.c | 10 +--------- src/qemu/qemu_validate.c | 8 ++++++++ 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c index d0e2bcaccf..577dbab0af 100644 --- a/src/conf/domain_validate.c +++ b/src/conf/domain_validate.c @@ -2185,15 +2185,6 @@ virDomainNetDefValidate(const virDomainNetDef *net) } switch (net->type) { - case VIR_DOMAIN_NET_TYPE_VHOSTUSER: - if (net->data.vhostuser->data.nix.listen && - net->data.vhostuser->data.nix.reconnect.enabled == VIR_TRISTATE_BOOL_YES) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("'reconnect' attribute unsupported 'server' mode for <interface type='vhostuser'>")); - return -1; - } - break; - case VIR_DOMAIN_NET_TYPE_USER: if (net->backend.type == VIR_DOMAIN_NET_BACKEND_PASST) { size_t p; @@ -2217,6 +2208,7 @@ virDomainNetDefValidate(const virDomainNetDef *net) } break; + case VIR_DOMAIN_NET_TYPE_VHOSTUSER: case VIR_DOMAIN_NET_TYPE_NETWORK: case VIR_DOMAIN_NET_TYPE_VDPA: case VIR_DOMAIN_NET_TYPE_BRIDGE: diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 243c499a33..351fe38830 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -1825,6 +1825,14 @@ qemuValidateDomainDeviceDefNetwork(const virDomainNetDef *net, } } + if (net->type == VIR_DOMAIN_NET_TYPE_VHOSTUSER && + net->data.vhostuser->data.nix.listen && + net->data.vhostuser->data.nix.reconnect.enabled == VIR_TRISTATE_BOOL_YES) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("'reconnect' attribute is not supported when source mode='server' for <interface type='vhostuser'>")); + return -1; + } + if (!virDomainNetIsVirtioModel(net)) { if (net->type == VIR_DOMAIN_NET_TYPE_VDPA || net->type == VIR_DOMAIN_NET_TYPE_VHOSTUSER) { -- 2.47.1

For some reason, when vhostuser interface support was added in 2014, the parser required that the XML for the <interface> have a <source> element with type, mode, and path, all 3 also required. This in spite of the fact that 'unix' is the only possible valid setting for type, and 95% of the time the mode is set to 'client' (as I understand from comments in the code, normally a guest will use mode='client' to connect to an existing socket that is precreated (by OVS?), and the only use for mode='server' is for test setups where one guest is setup with a listening vhostuser socket (i.e. 'server') and another guest connects to that socket (i.e. 'client')). (or maybe one guest connects to OVS in server mode, and all the others connect in client mode, not sure - I don't claim to be an expert on vhost-user.) So from the point of view of existing vhost-user functionality, it seems reasonable to make 'type' and 'mode' optional, and by default fill in the vhostuser part of the NetDef as if they were 'unix' and 'client'. In theory, the <source> element itself is also not *directly* required after this patch, however, the path attribute of <source> *is* required (for now), so effectively the <source> element is still required. Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 56 ++++++++++++------------------- src/conf/schemas/domaincommon.rng | 4 ++- src/qemu/qemu_validate.c | 20 +++++++---- 3 files changed, 39 insertions(+), 41 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 50dc4a33a6..6b382eb63f 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -9776,50 +9776,38 @@ virDomainNetDefParseXML(virDomainXMLOption *xmlopt, g_autofree char *vhostuser_type = NULL; virDomainNetVhostuserMode vhostuser_mode; - if (virDomainNetDefParseXMLRequireSource(def, source_node) < 0) - return NULL; - - if (!(vhostuser_type = virXMLPropStringRequired(source_node, "type"))) - return NULL; - - if (STRNEQ_NULLABLE(vhostuser_type, "unix")) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("Type='%1$s' unsupported for <interface type='vhostuser'>"), - vhostuser_type); - return NULL; - } - if (!(def->data.vhostuser = virDomainChrSourceDefNew(xmlopt))) return NULL; + /* Default (and only valid) value of type is "unix". + * Everything else's default value is 0/NULL. + */ def->data.vhostuser->type = VIR_DOMAIN_CHR_TYPE_UNIX; - if (!(def->data.vhostuser->data.nix.path = virXMLPropStringRequired(source_node, "path"))) - return NULL; + if (source_node) { + if ((vhostuser_type = virXMLPropString(source_node, "type"))) { + if (STRNEQ(vhostuser_type, "unix")) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Type='%1$s' unsupported for <interface type='vhostuser'>"), + vhostuser_type); + return NULL; + } + } - if (virXMLPropEnum(source_node, "mode", - virDomainNetVhostuserModeTypeFromString, - VIR_XML_PROP_REQUIRED | VIR_XML_PROP_NONZERO, - &vhostuser_mode) < 0) - return NULL; + def->data.vhostuser->data.nix.path = virXMLPropString(source_node, "path"); - switch (vhostuser_mode) { - case VIR_DOMAIN_NET_VHOSTUSER_MODE_CLIENT: - def->data.vhostuser->data.nix.listen = false; - break; + if (virXMLPropEnum(source_node, "mode", virDomainNetVhostuserModeTypeFromString, + VIR_XML_PROP_NONZERO, &vhostuser_mode) < 0) { + return NULL; + } - case VIR_DOMAIN_NET_VHOSTUSER_MODE_SERVER: - def->data.vhostuser->data.nix.listen = true; - break; + if (vhostuser_mode == VIR_DOMAIN_NET_VHOSTUSER_MODE_SERVER) + def->data.vhostuser->data.nix.listen = true; - case VIR_DOMAIN_NET_VHOSTUSER_MODE_NONE: - case VIR_DOMAIN_NET_VHOSTUSER_MODE_LAST: - break; + if (virDomainChrSourceReconnectDefParseXML(&def->data.vhostuser->data.nix.reconnect, + source_node, ctxt) < 0) + return NULL; } - - if (virDomainChrSourceReconnectDefParseXML(&def->data.vhostuser->data.nix.reconnect, - source_node, ctxt) < 0) - return NULL; } break; diff --git a/src/conf/schemas/domaincommon.rng b/src/conf/schemas/domaincommon.rng index 96cedb85e8..e5da550e45 100644 --- a/src/conf/schemas/domaincommon.rng +++ b/src/conf/schemas/domaincommon.rng @@ -3485,7 +3485,9 @@ <value>vhostuser</value> </attribute> <interleave> - <ref name="unixSocketSource"/> + <optional> + <ref name="unixSocketSource"/> + </optional> <ref name="interface-options"/> </interleave> </group> diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 351fe38830..b0cf5e866c 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -1825,12 +1825,20 @@ qemuValidateDomainDeviceDefNetwork(const virDomainNetDef *net, } } - if (net->type == VIR_DOMAIN_NET_TYPE_VHOSTUSER && - net->data.vhostuser->data.nix.listen && - net->data.vhostuser->data.nix.reconnect.enabled == VIR_TRISTATE_BOOL_YES) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("'reconnect' attribute is not supported when source mode='server' for <interface type='vhostuser'>")); - return -1; + if (net->type == VIR_DOMAIN_NET_TYPE_VHOSTUSER) { + if (!net->data.vhostuser->data.nix.path) { + virReportError(VIR_ERR_XML_ERROR, + _("Missing required attribute '%1$s' in element '%2$s'"), + "path", "source"); + return -1; + } + + if (net->data.vhostuser->data.nix.listen && + net->data.vhostuser->data.nix.reconnect.enabled == VIR_TRISTATE_BOOL_YES) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("'reconnect' attribute is not supported when source mode='server' for <interface type='vhostuser'>")); + return -1; + } } if (!virDomainNetIsVirtioModel(net)) { -- 2.47.1

qemuProcessPrepareDomain()'s comments say that it should be the only place to change the "live XML" of a domain (i.e. the public parts of the virDomainDef object that is shown in the domain's status XML), and that seems like a reasonable idea (although there aren't many users of it to date). qemuProcessPrepareDomainNetwork() is called by the aforementioned qemuProcessPrepareDomain() - this patch changes the "if (type == HOSTDEV)" in that function to a "switch(type)" so it's simpler to add DomainDef modifications for various other types of virDomainNetDef, and also so that anyone who adds a new interface type is forced to look at the code and decide if anything needs to be done here for the new type. Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_process.c | 75 ++++++++++++++++++++++++++--------------- 1 file changed, 47 insertions(+), 28 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 910229a616..963d090963 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -5887,7 +5887,6 @@ qemuProcessPrepareDomainNetwork(virDomainObj *vm) for (i = 0; i < def->nnets; i++) { virDomainNetDef *net = def->nets[i]; - virDomainNetType actualType; /* If appropriate, grab a physical device from the configured * network's pool of devices, or resolve bridge device name @@ -5900,36 +5899,56 @@ qemuProcessPrepareDomainNetwork(virDomainObj *vm) return -1; } - actualType = virDomainNetGetActualType(net); - if (actualType == VIR_DOMAIN_NET_TYPE_HOSTDEV && - net->type == VIR_DOMAIN_NET_TYPE_NETWORK) { - /* Each type='hostdev' network device must also have a - * corresponding entry in the hostdevs array. For netdevs - * that are hardcoded as type='hostdev', this is already - * done by the parser, but for those allocated from a - * network / determined at runtime, we need to do it - * separately. - */ - virDomainHostdevDef *hostdev = virDomainNetGetActualHostdev(net); - virDomainHostdevSubsysPCI *pcisrc = &hostdev->source.subsys.u.pci; - - if (virDomainHostdevFind(def, hostdev, NULL) >= 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("PCI device %1$04x:%2$02x:%3$02x.%4$x allocated from network %5$s is already in use by domain %6$s"), - pcisrc->addr.domain, pcisrc->addr.bus, - pcisrc->addr.slot, pcisrc->addr.function, - net->data.network.name, def->name); - return -1; - } + switch (virDomainNetGetActualType(net)) { + case VIR_DOMAIN_NET_TYPE_HOSTDEV: + if (net->type == VIR_DOMAIN_NET_TYPE_NETWORK) { + /* Each type='hostdev' network device must also have a + * corresponding entry in the hostdevs array. For netdevs + * that are hardcoded as type='hostdev', this is already + * done by the parser, but for those allocated from a + * network / determined at runtime, we need to do it + * separately. + */ + virDomainHostdevDef *hostdev = virDomainNetGetActualHostdev(net); + virDomainHostdevSubsysPCI *pcisrc = &hostdev->source.subsys.u.pci; + + if (virDomainHostdevFind(def, hostdev, NULL) >= 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("PCI device %1$04x:%2$02x:%3$02x.%4$x allocated from network %5$s is already in use by domain %6$s"), + pcisrc->addr.domain, pcisrc->addr.bus, + pcisrc->addr.slot, pcisrc->addr.function, + net->data.network.name, def->name); + return -1; + } - /* For hostdev present in qemuProcessPrepareDomain() phase this was - * done already, but this code runs after that, so we have to call - * it ourselves. */ - if (qemuDomainPrepareHostdev(hostdev, priv) < 0) - return -1; + /* For hostdev present in qemuProcessPrepareDomain() phase this was + * done already, but this code runs after that, so we have to call + * it ourselves. */ + if (qemuDomainPrepareHostdev(hostdev, priv) < 0) + return -1; - virDomainHostdevInsert(def, hostdev); + virDomainHostdevInsert(def, hostdev); + } + break; + + case VIR_DOMAIN_NET_TYPE_DIRECT: + case VIR_DOMAIN_NET_TYPE_BRIDGE: + case VIR_DOMAIN_NET_TYPE_NETWORK: + case VIR_DOMAIN_NET_TYPE_ETHERNET: + case VIR_DOMAIN_NET_TYPE_USER: + case VIR_DOMAIN_NET_TYPE_VHOSTUSER: + case VIR_DOMAIN_NET_TYPE_SERVER: + case VIR_DOMAIN_NET_TYPE_CLIENT: + case VIR_DOMAIN_NET_TYPE_MCAST: + case VIR_DOMAIN_NET_TYPE_INTERNAL: + case VIR_DOMAIN_NET_TYPE_UDP: + case VIR_DOMAIN_NET_TYPE_VDPA: + case VIR_DOMAIN_NET_TYPE_NULL: + case VIR_DOMAIN_NET_TYPE_VDS: + case VIR_DOMAIN_NET_TYPE_LAST: + break; } + } return 0; } -- 2.47.1

When passt is used with vhostuser, the vhostuser code that builds the qemu commandline will need to have the same socket path that is given to the passt command, so this patch makes it visible outside of qemu_passt.c. Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_passt.c | 2 +- src/qemu/qemu_passt.h | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_passt.c b/src/qemu/qemu_passt.c index dd4a8bb997..8a3ac4e988 100644 --- a/src/qemu/qemu_passt.c +++ b/src/qemu/qemu_passt.c @@ -54,7 +54,7 @@ qemuPasstCreatePidFilename(virDomainObj *vm, } -static char * +char * qemuPasstCreateSocketPath(virDomainObj *vm, virDomainNetDef *net) { diff --git a/src/qemu/qemu_passt.h b/src/qemu/qemu_passt.h index 623b494b7a..e0b9aaac8d 100644 --- a/src/qemu/qemu_passt.h +++ b/src/qemu/qemu_passt.h @@ -36,3 +36,6 @@ void qemuPasstStop(virDomainObj *vm, int qemuPasstSetupCgroup(virDomainObj *vm, virDomainNetDef *net, virCgroup *cgroup); + +char *qemuPasstCreateSocketPath(virDomainObj *vm, + virDomainNetDef *net); -- 2.47.1

<interface type='vhostuser'><backend type='passt'/> needs to run the passt command just as is done for interface type='user', but then add vhostuser bits to the qemu commandline/monitor command. There are some changes to the parsing/validation along with changes to the vhostuser codepath do do the extra stuff for passt. I tried keeping them separated into different patches, but then the unit test failed in a strange way deep down in the bowels of the commandline generation, so this patch both 1) makes the final changes to parsing/formatting and 2) adds passt stuff at appropriate places for vhostuser (as well as making a couple of things *not* happen when the passt backend is chosen). The result is that you can now have: <interface type='vhostuser'> <backend type='passt'/> ... </interface> Then as long as you also have the following as a subelement of <domain>: <memoryBacking> <access mode='shared'/> </memoryBacking> your passt interfaces will benefit from the greatly improved efficiency of a vhost-user data path, and all without requiring special privileges or capabilities *anywhere* (i.e. it works for unprivileged libvirt (qemu:///session) as well as privileged libvirt). Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 36 ++++++--- src/conf/domain_validate.c | 75 +++++++------------ src/conf/schemas/domaincommon.rng | 32 +++++++- src/qemu/qemu_command.c | 7 +- src/qemu/qemu_extdevice.c | 6 +- src/qemu/qemu_hotplug.c | 21 +++++- src/qemu/qemu_passt.c | 3 + src/qemu/qemu_process.c | 14 +++- src/qemu/qemu_validate.c | 7 +- ...t-user-slirp-portforward.x86_64-latest.err | 2 +- .../net-vhostuser-passt.x86_64-latest.args | 42 +++++++++++ .../net-vhostuser-passt.x86_64-latest.xml | 72 ++++++++++++++++++ tests/qemuxmlconfdata/net-vhostuser-passt.xml | 70 +++++++++++++++++ tests/qemuxmlconftest.c | 1 + 14 files changed, 315 insertions(+), 73 deletions(-) create mode 100644 tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.args create mode 100644 tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.xml create mode 100644 tests/qemuxmlconfdata/net-vhostuser-passt.xml diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 6b382eb63f..49555efc56 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -9457,9 +9457,25 @@ virDomainNetBackendParseXML(xmlNodePtr node, g_autofree char *tap = virXMLPropString(node, "tap"); g_autofree char *vhost = virXMLPropString(node, "vhost"); - /* The VIR_DOMAIN_NET_BACKEND_DEFAULT really means 'use hypervisor's - * builtin SLIRP'. It's reported in domain caps and thus we need to accept - * it. Hence VIR_XML_PROP_NONE instead of VIR_XML_PROP_NONZERO. */ + /* In the case of NET_TYPE_USER, backend type can be unspecified + * (i.e. VIR_DOMAIN_NET_BACKEND_DEFAULT) and that means 'use + * hypervisor's builtin SLIRP (or if that isn't available, use + * passt)'. Similarly, it can also be left unspecified in the case + * of NET_TYPE_VHOSTUSER, and then it means "use the traditional + * vhost-user backend (which auto-detects between connecting to a + * socket created by OVS, or connecting to a standalone socket + * used (mostly in testing) to connect the vhost-user interface of + * one guest directly to the vhost-user interface of another + * guest. + * + * If backend type is set to 'passt', then in both cases a passt + * process will be started, and libvirt will connect that to the + * guest interface (either communicating everything over the + * socket created by passt using a specific-to-passt protocol + * (interface type='user'>), or by using the socket for control + * plane messages and shared memory for data using the vhost-user + * protocol (<interface type='vhostuser'>)). + */ if (virXMLPropEnum(node, "type", virDomainNetBackendTypeFromString, VIR_XML_PROP_NONE, &def->backend.type) < 0) { return -1; @@ -24616,7 +24632,11 @@ virDomainNetDefFormat(virBuffer *buf, break; case VIR_DOMAIN_NET_TYPE_VHOSTUSER: - if (def->data.vhostuser->type == VIR_DOMAIN_CHR_TYPE_UNIX) { + if (def->data.vhostuser->type == VIR_DOMAIN_CHR_TYPE_UNIX && + def->backend.type != VIR_DOMAIN_NET_BACKEND_PASST) { + /* in the case of BACKEND_PASST, the values of all of these are either + * fixed (type, mode, reconnect), or derived from elsewhere (path) + */ virBufferAddLit(&sourceAttrBuf, " type='unix'"); virBufferEscapeString(&sourceAttrBuf, " path='%s'", def->data.vhostuser->data.nix.path); @@ -24627,7 +24647,6 @@ virDomainNetDefFormat(virBuffer *buf, virDomainChrSourceReconnectDefFormat(&sourceChildBuf, &def->data.vhostuser->data.nix.reconnect); } - } break; @@ -24689,15 +24708,14 @@ virDomainNetDefFormat(virBuffer *buf, } case VIR_DOMAIN_NET_TYPE_USER: - if (def->backend.type == VIR_DOMAIN_NET_BACKEND_PASST) - virBufferEscapeString(&sourceAttrBuf, " dev='%s'", def->sourceDev); - break; - case VIR_DOMAIN_NET_TYPE_NULL: case VIR_DOMAIN_NET_TYPE_LAST: break; } + if (def->backend.type == VIR_DOMAIN_NET_BACKEND_PASST) + virBufferEscapeString(&sourceAttrBuf, " dev='%s'", def->sourceDev); + if (def->hostIP.nips || def->hostIP.nroutes) { if (virDomainNetIPInfoFormat(&sourceChildBuf, &def->hostIP) < 0) return -1; diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c index 577dbab0af..d38a77aa22 100644 --- a/src/conf/domain_validate.c +++ b/src/conf/domain_validate.c @@ -2163,7 +2163,8 @@ virDomainNetDefValidate(const virDomainNetDef *net) return -1; } - if (net->type != VIR_DOMAIN_NET_TYPE_USER) { + if (net->type != VIR_DOMAIN_NET_TYPE_USER && + net->type != VIR_DOMAIN_NET_TYPE_VHOSTUSER) { if (net->backend.type == VIR_DOMAIN_NET_BACKEND_PASST) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("The 'passt' backend can only be used with interface type='user'")); @@ -2171,59 +2172,37 @@ virDomainNetDefValidate(const virDomainNetDef *net) } } - if (net->nPortForwards > 0 && - (net->type != VIR_DOMAIN_NET_TYPE_USER || - (net->type == VIR_DOMAIN_NET_TYPE_USER && - net->backend.type != VIR_DOMAIN_NET_BACKEND_PASST))) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("The <portForward> element can only be used with <interface type='user'> and its 'passt' backend")); - return -1; - } + if (net->nPortForwards > 0) { + size_t p; - if (!virNetDevBandwidthValidate(net->bandwidth)) { - return -1; - } + if ((net->type != VIR_DOMAIN_NET_TYPE_USER && + net->type != VIR_DOMAIN_NET_TYPE_VHOSTUSER) || + net->backend.type != VIR_DOMAIN_NET_BACKEND_PASST) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("The <portForward> element can only be used with the 'passt' backend of interface type='user' or type='vhostuser'")); + return -1; + } - switch (net->type) { - case VIR_DOMAIN_NET_TYPE_USER: - if (net->backend.type == VIR_DOMAIN_NET_BACKEND_PASST) { - size_t p; - - for (p = 0; p < net->nPortForwards; p++) { - size_t r; - virDomainNetPortForward *pf = net->portForwards[p]; - - for (r = 0; r < pf->nRanges; r++) { - virDomainNetPortForwardRange *range = pf->ranges[r]; - - if (!range->start - && (range->end || range->to - || range->exclude != VIR_TRISTATE_BOOL_ABSENT)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("The 'range' of a 'portForward' requires 'start' attribute if 'end', 'to', or 'exclude' is specified")); - return -1; - } + for (p = 0; p < net->nPortForwards; p++) { + size_t r; + virDomainNetPortForward *pf = net->portForwards[p]; + + for (r = 0; r < pf->nRanges; r++) { + virDomainNetPortForwardRange *range = pf->ranges[r]; + + if (!range->start + && (range->end || range->to + || range->exclude != VIR_TRISTATE_BOOL_ABSENT)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("The 'range' of a 'portForward' requires 'start' attribute if 'end', 'to', or 'exclude' is specified")); + return -1; } } } - break; + } - case VIR_DOMAIN_NET_TYPE_VHOSTUSER: - case VIR_DOMAIN_NET_TYPE_NETWORK: - case VIR_DOMAIN_NET_TYPE_VDPA: - case VIR_DOMAIN_NET_TYPE_BRIDGE: - case VIR_DOMAIN_NET_TYPE_CLIENT: - case VIR_DOMAIN_NET_TYPE_SERVER: - case VIR_DOMAIN_NET_TYPE_MCAST: - case VIR_DOMAIN_NET_TYPE_UDP: - case VIR_DOMAIN_NET_TYPE_INTERNAL: - case VIR_DOMAIN_NET_TYPE_DIRECT: - case VIR_DOMAIN_NET_TYPE_HOSTDEV: - case VIR_DOMAIN_NET_TYPE_VDS: - case VIR_DOMAIN_NET_TYPE_ETHERNET: - case VIR_DOMAIN_NET_TYPE_NULL: - case VIR_DOMAIN_NET_TYPE_LAST: - break; + if (!virNetDevBandwidthValidate(net->bandwidth)) { + return -1; } return 0; diff --git a/src/conf/schemas/domaincommon.rng b/src/conf/schemas/domaincommon.rng index e5da550e45..3328a63205 100644 --- a/src/conf/schemas/domaincommon.rng +++ b/src/conf/schemas/domaincommon.rng @@ -3486,8 +3486,36 @@ </attribute> <interleave> <optional> - <ref name="unixSocketSource"/> - </optional> + <element name="source"> + <optional> + <attribute name="type"> + <value>unix</value> + </attribute> + </optional> + <optional> + <attribute name="path"> + <ref name="absFilePath"/> + </attribute> + </optional> + <optional> + <attribute name="mode"> + <choice> + <value>server</value> + <value>client</value> + </choice> + </attribute> + </optional> + <optional> + <attribute name="dev"> + <ref name="deviceName"/> + </attribute> + </optional> + <optional> + <ref name="reconnect"/> + </optional> + <empty/> + </element> + </optional> <ref name="interface-options"/> </interleave> </group> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 7370711918..54130ac4f0 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -8649,11 +8649,12 @@ qemuBuildInterfaceCommandLine(virQEMUDriver *driver, if (qemuInterfaceVhostuserConnect(cmd, net, qemuCaps) < 0) goto cleanup; - if (virNetDevOpenvswitchGetVhostuserIfname(net->data.vhostuser->data.nix.path, + if (net->backend.type != VIR_DOMAIN_NET_BACKEND_PASST && + virNetDevOpenvswitchGetVhostuserIfname(net->data.vhostuser->data.nix.path, net->data.vhostuser->data.nix.listen, - &net->ifname) < 0) + &net->ifname) < 0) { goto cleanup; - + } break; case VIR_DOMAIN_NET_TYPE_VDPA: diff --git a/src/qemu/qemu_extdevice.c b/src/qemu/qemu_extdevice.c index 954cb323a4..2384bab7a6 100644 --- a/src/qemu/qemu_extdevice.c +++ b/src/qemu/qemu_extdevice.c @@ -212,13 +212,15 @@ qemuExtDevicesStart(virQEMUDriver *driver, for (i = 0; i < def->nnets; i++) { virDomainNetDef *net = def->nets[i]; - if (net->type != VIR_DOMAIN_NET_TYPE_USER) + if (net->type != VIR_DOMAIN_NET_TYPE_USER && + net->type != VIR_DOMAIN_NET_TYPE_VHOSTUSER) { continue; + } if (net->backend.type == VIR_DOMAIN_NET_BACKEND_PASST) { if (qemuPasstStart(vm, net) < 0) return -1; - } else { + } else if (net->type == VIR_DOMAIN_NET_TYPE_USER) { if (qemuSlirpStart(vm, net, incomingMigration) < 0) return -1; } diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 6c224c9793..28ca321c5c 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1262,10 +1262,23 @@ qemuDomainAttachNetDevice(virQEMUDriver *driver, if (!(charDevAlias = qemuAliasChardevFromDevAlias(net->info.alias))) goto cleanup; - if (virNetDevOpenvswitchGetVhostuserIfname(net->data.vhostuser->data.nix.path, - net->data.vhostuser->data.nix.listen, - &net->ifname) < 0) - goto cleanup; + if (net->backend.type == VIR_DOMAIN_NET_BACKEND_PASST) { + + /* vhostuser needs socket path in this location, and when + * backend is passt, the path is derived from other info, + * not taken from config. + */ + g_free(net->data.vhostuser->data.nix.path); + net->data.vhostuser->data.nix.path = qemuPasstCreateSocketPath(vm, net); + + if (qemuPasstStart(vm, net) < 0) + goto cleanup; + } else { + if (virNetDevOpenvswitchGetVhostuserIfname(net->data.vhostuser->data.nix.path, + net->data.vhostuser->data.nix.listen, + &net->ifname) < 0) + goto cleanup; + } if (qemuSecuritySetNetdevLabel(driver, vm, net) < 0) goto cleanup; diff --git a/src/qemu/qemu_passt.c b/src/qemu/qemu_passt.c index 8a3ac4e988..b9616d1c63 100644 --- a/src/qemu/qemu_passt.c +++ b/src/qemu/qemu_passt.c @@ -180,6 +180,9 @@ qemuPasstStart(virDomainObj *vm, virCommandClearCaps(cmd); + if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_VHOSTUSER) + virCommandAddArg(cmd, "--vhost-user"); + virCommandAddArgList(cmd, "--one-off", "--socket", passtSocketName, diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 963d090963..be8e32fbc3 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -64,6 +64,7 @@ #include "qemu_backup.h" #include "qemu_dbus.h" #include "qemu_snapshot.h" +#include "qemu_passt.h" #include "cpu/cpu.h" #include "cpu/cpu_x86.h" @@ -5931,12 +5932,22 @@ qemuProcessPrepareDomainNetwork(virDomainObj *vm) } break; + case VIR_DOMAIN_NET_TYPE_VHOSTUSER: + if (net->backend.type == VIR_DOMAIN_NET_BACKEND_PASST) { + /* when using the passt backend, the path of the + * unix socket is always derived from other info + * *not* manually given in the config, but all the + * vhostuser code looks for it there. + */ + net->data.vhostuser->data.nix.path = qemuPasstCreateSocketPath(vm, net); + } + break; + case VIR_DOMAIN_NET_TYPE_DIRECT: case VIR_DOMAIN_NET_TYPE_BRIDGE: case VIR_DOMAIN_NET_TYPE_NETWORK: case VIR_DOMAIN_NET_TYPE_ETHERNET: case VIR_DOMAIN_NET_TYPE_USER: - case VIR_DOMAIN_NET_TYPE_VHOSTUSER: case VIR_DOMAIN_NET_TYPE_SERVER: case VIR_DOMAIN_NET_TYPE_CLIENT: case VIR_DOMAIN_NET_TYPE_MCAST: @@ -5948,7 +5959,6 @@ qemuProcessPrepareDomainNetwork(virDomainObj *vm) case VIR_DOMAIN_NET_TYPE_LAST: break; } - } return 0; } diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index b0cf5e866c..92e745cea1 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -1751,7 +1751,9 @@ qemuValidateDomainDeviceDefNetwork(const virDomainNetDef *net, return -1; } - if (net->type == VIR_DOMAIN_NET_TYPE_USER) { + if (net->type == VIR_DOMAIN_NET_TYPE_USER || + (net->type == VIR_DOMAIN_NET_TYPE_VHOSTUSER && + net->backend.type == VIR_DOMAIN_NET_BACKEND_PASST)) { virDomainCapsDeviceNet netCaps = { }; virQEMUCapsFillDomainDeviceNetCaps(qemuCaps, &netCaps); @@ -1826,7 +1828,8 @@ qemuValidateDomainDeviceDefNetwork(const virDomainNetDef *net, } if (net->type == VIR_DOMAIN_NET_TYPE_VHOSTUSER) { - if (!net->data.vhostuser->data.nix.path) { + if (!net->data.vhostuser->data.nix.path && + net->backend.type != VIR_DOMAIN_NET_BACKEND_PASST) { virReportError(VIR_ERR_XML_ERROR, _("Missing required attribute '%1$s' in element '%2$s'"), "path", "source"); diff --git a/tests/qemuxmlconfdata/net-user-slirp-portforward.x86_64-latest.err b/tests/qemuxmlconfdata/net-user-slirp-portforward.x86_64-latest.err index eaa934742e..e231677e57 100644 --- a/tests/qemuxmlconfdata/net-user-slirp-portforward.x86_64-latest.err +++ b/tests/qemuxmlconfdata/net-user-slirp-portforward.x86_64-latest.err @@ -1 +1 @@ -unsupported configuration: The <portForward> element can only be used with <interface type='user'> and its 'passt' backend +unsupported configuration: The <portForward> element can only be used with the 'passt' backend of interface type='user' or type='vhostuser' diff --git a/tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.args b/tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.args new file mode 100644 index 0000000000..21d78d6072 --- /dev/null +++ b/tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.args @@ -0,0 +1,42 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.config \ +/usr/bin/qemu-system-x86_64 \ +-name guest=QEMUGuest1,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain--1-QEMUGuest1/master-key.aes"}' \ +-machine pc,usb=off,dump-guest-core=off,memory-backend=pc.ram,acpi=off \ +-accel tcg \ +-cpu qemu64 \ +-m size=219136k \ +-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":224395264}' \ +-overcommit mem-lock=off \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot strict=on \ +-blockdev '{"driver":"host_device","filename":"/dev/HostVG/QEMUGuest1","node-name":"libvirt-1-storage","read-only":false}' \ +-device '{"driver":"ide-hd","bus":"ide.0","unit":0,"drive":"libvirt-1-storage","id":"ide0-0-0","bootindex":1}' \ +-chardev socket,id=charnet0,path=/var/run/libvirt/qemu/passt/-1-QEMUGuest1-net0.socket \ +-netdev '{"type":"vhost-user","chardev":"charnet0","id":"hostnet0"}' \ +-device '{"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"00:11:22:33:44:55","bus":"pci.0","addr":"0x2"}' \ +-chardev socket,id=charnet1,path=/var/run/libvirt/qemu/passt/-1-QEMUGuest1-net1.socket \ +-netdev '{"type":"vhost-user","chardev":"charnet1","id":"hostnet1"}' \ +-device '{"driver":"virtio-net-pci","netdev":"hostnet1","id":"net1","mac":"00:11:22:33:44:11","bus":"pci.0","addr":"0x3"}' \ +-chardev socket,id=charnet2,path=/var/run/libvirt/qemu/passt/-1-QEMUGuest1-net2.socket \ +-netdev '{"type":"vhost-user","chardev":"charnet2","id":"hostnet2"}' \ +-device '{"driver":"virtio-net-pci","netdev":"hostnet2","id":"net2","mac":"00:11:22:33:44:11","bus":"pci.0","addr":"0x4"}' \ +-audiodev '{"id":"audio1","driver":"none"}' \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.xml b/tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.xml new file mode 100644 index 0000000000..26aa4c8d05 --- /dev/null +++ b/tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.xml @@ -0,0 +1,72 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <cpu mode='custom' match='exact' check='none'> + <model fallback='forbid'>qemu64</model> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <disk type='block' device='disk'> + <driver name='qemu' type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0' model='none'/> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <interface type='vhostuser'> + <mac address='00:11:22:33:44:55'/> + <ip address='172.17.2.0' family='ipv4' prefix='24'/> + <ip address='2001:db8:ac10:fd01::feed' family='ipv6'/> + <portForward proto='tcp' address='2001:db8:ac10:fd01::1:10'> + <range start='22' to='2022'/> + <range start='1000' end='1050'/> + <range start='1020' exclude='yes'/> + <range start='1030' end='1040' exclude='yes'/> + </portForward> + <portForward proto='udp' address='1.2.3.4' dev='eth0'> + <range start='5000' end='5020' to='6000'/> + <range start='5010' end='5015' exclude='yes'/> + </portForward> + <portForward proto='tcp'> + <range start='80'/> + </portForward> + <portForward proto='tcp'> + <range start='443' to='344'/> + </portForward> + <model type='virtio'/> + <backend type='passt' logFile='/var/log/loglaw.blog'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </interface> + <interface type='vhostuser'> + <mac address='00:11:22:33:44:11'/> + <model type='virtio'/> + <backend type='passt'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </interface> + <interface type='vhostuser'> + <mac address='00:11:22:33:44:11'/> + <model type='virtio'/> + <backend type='passt'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </interface> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <audio id='1' type='none'/> + <memballoon model='none'/> + </devices> +</domain> diff --git a/tests/qemuxmlconfdata/net-vhostuser-passt.xml b/tests/qemuxmlconfdata/net-vhostuser-passt.xml new file mode 100644 index 0000000000..e44c91e541 --- /dev/null +++ b/tests/qemuxmlconfdata/net-vhostuser-passt.xml @@ -0,0 +1,70 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <disk type='block' device='disk'> + <driver name='qemu' type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0' model='none'/> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <interface type='vhostuser'> + <mac address='00:11:22:33:44:55'/> + <ip address='172.17.2.0' family='ipv4' prefix='24'/> + <ip address='2001:db8:ac10:fd01::feed' family='ipv6'/> + <source dev='eth42'/> + <portForward proto='tcp' address='2001:db8:ac10:fd01::1:10'> + <range start='22' to='2022'/> + <range start='1000' end='1050'/> + <range start='1020' exclude='yes'/> + <range start='1030' end='1040' exclude='yes'/> + </portForward> + <portForward proto='udp' address='1.2.3.4' dev='eth0'> + <range start='5000' end='5020' to='6000'/> + <range start='5010' end='5015' exclude='yes'/> + </portForward> + <portForward proto='tcp'> + <range start='80'/> + </portForward> + <portForward proto='tcp'> + <range start='443' to='344'/> + </portForward> + <model type='virtio'/> + <backend type='passt' logFile='/var/log/loglaw.blog'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </interface> + <interface type='vhostuser'> + <mac address='00:11:22:33:44:11'/> + <backend type='passt'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </interface> + <interface type='vhostuser'> + <mac address='00:11:22:33:44:11'/> + <source dev='eth43'/> + <model type='virtio'/> + <backend type='passt'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </interface> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <audio id='1' type='none'/> + <memballoon model='none'/> + </devices> +</domain> diff --git a/tests/qemuxmlconftest.c b/tests/qemuxmlconftest.c index 1f0068864a..34674551a4 100644 --- a/tests/qemuxmlconftest.c +++ b/tests/qemuxmlconftest.c @@ -1793,6 +1793,7 @@ mymain(void) DO_TEST_CAPS_LATEST("net-user-passt"); DO_TEST_CAPS_VER("net-user-passt", "7.2.0"); DO_TEST_CAPS_LATEST_PARSE_ERROR("net-user-slirp-portforward"); + DO_TEST_CAPS_LATEST("net-vhostuser-passt"); DO_TEST_CAPS_LATEST("net-virtio"); DO_TEST_CAPS_LATEST("net-virtio-device"); DO_TEST_CAPS_LATEST("net-virtio-disable-offloads"); -- 2.47.1

On Thu, Feb 13, 2025 at 01:19:53PM -0500, Laine Stump wrote:
The result is that you can now have:
<interface type='vhostuser'> <backend type='passt'/> ... </interface>
Then as long as you also have the following as a subelement of <domain>:
<memoryBacking> <access mode='shared'/> </memoryBacking>
your passt interfaces will benefit from the greatly improved efficiency of a vhost-user data path
The presence of this element is not validated anywhere though, so if you leave it out the network interface will just silently be non-functional. I don't think that makes for a good user experience. How does passt know where the backing file is located in the first place? It's not passed on its command line, and I didn't spot any logic to hand the information over. This part clearly works, I just don't understand how :)
+++ b/src/conf/domain_validate.c @@ -2163,7 +2163,8 @@ virDomainNetDefValidate(const virDomainNetDef *net) return -1; }
- if (net->type != VIR_DOMAIN_NET_TYPE_USER) { + if (net->type != VIR_DOMAIN_NET_TYPE_USER && + net->type != VIR_DOMAIN_NET_TYPE_VHOSTUSER) { if (net->backend.type == VIR_DOMAIN_NET_BACKEND_PASST) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("The 'passt' backend can only be used with interface type='user'"));
This error message needs to be updated.
+++ b/tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.args @@ -0,0 +1,42 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.config \ +/usr/bin/qemu-system-x86_64 \ +-name guest=QEMUGuest1,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain--1-QEMUGuest1/master-key.aes"}' \ +-machine pc,usb=off,dump-guest-core=off,memory-backend=pc.ram,acpi=off \ +-accel tcg \ +-cpu qemu64 \ +-m size=219136k \ +-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":224395264}' \ +-overcommit mem-lock=off \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot strict=on \ +-blockdev '{"driver":"host_device","filename":"/dev/HostVG/QEMUGuest1","node-name":"libvirt-1-storage","read-only":false}' \ +-device '{"driver":"ide-hd","bus":"ide.0","unit":0,"drive":"libvirt-1-storage","id":"ide0-0-0","bootindex":1}' \ +-chardev socket,id=charnet0,path=/var/run/libvirt/qemu/passt/-1-QEMUGuest1-net0.socket \ +-netdev '{"type":"vhost-user","chardev":"charnet0","id":"hostnet0"}' \ +-device '{"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"00:11:22:33:44:55","bus":"pci.0","addr":"0x2"}' \ +-chardev socket,id=charnet1,path=/var/run/libvirt/qemu/passt/-1-QEMUGuest1-net1.socket \ +-netdev '{"type":"vhost-user","chardev":"charnet1","id":"hostnet1"}' \ +-device '{"driver":"virtio-net-pci","netdev":"hostnet1","id":"net1","mac":"00:11:22:33:44:11","bus":"pci.0","addr":"0x3"}' \ +-chardev socket,id=charnet2,path=/var/run/libvirt/qemu/passt/-1-QEMUGuest1-net2.socket \ +-netdev '{"type":"vhost-user","chardev":"charnet2","id":"hostnet2"}' \ +-device '{"driver":"virtio-net-pci","netdev":"hostnet2","id":"net2","mac":"00:11:22:33:44:11","bus":"pci.0","addr":"0x4"}' \ +-audiodev '{"id":"audio1","driver":"none"}' \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on
It's a real shame that we don't have a way to include the command line for external programs (swtpm, passt) in the corresponding test case... -- Andrea Bolognani / Red Hat / Virtualization

On Fri, Feb 14, 2025 at 03:03:20AM -0800, Andrea Bolognani wrote:
On Thu, Feb 13, 2025 at 01:19:53PM -0500, Laine Stump wrote:
The result is that you can now have:
<interface type='vhostuser'> <backend type='passt'/> ... </interface>
Then as long as you also have the following as a subelement of <domain>:
<memoryBacking> <access mode='shared'/> </memoryBacking>
your passt interfaces will benefit from the greatly improved efficiency of a vhost-user data path
The presence of this element is not validated anywhere though, so if you leave it out the network interface will just silently be non-functional. I don't think that makes for a good user experience.
How does passt know where the backing file is located in the first place? It's not passed on its command line, and I didn't spot any logic to hand the information over. This part clearly works, I just don't understand how :)
IIUC the vhost user protocol passes over a pre-opened FD for the memory region(s). With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Fri, Feb 14, 2025 at 11:07:17AM +0000, Daniel P. Berrangé wrote:
On Fri, Feb 14, 2025 at 03:03:20AM -0800, Andrea Bolognani wrote:
How does passt know where the backing file is located in the first place? It's not passed on its command line, and I didn't spot any logic to hand the information over. This part clearly works, I just don't understand how :)
IIUC the vhost user protocol passes over a pre-opened FD for the memory region(s).
In the audit log (there are some known AVC denials due to the SELinux policy not having been updated yet) I'm seeing the full path to the file, which I assume wouldn't be there if passt was only provided an FD. But QEMU does indeed know the path, so it sounds reasonable that it would be the one handing that information over to the passt process as part of setting up the vhost-user connection. -- Andrea Bolognani / Red Hat / Virtualization

On 2/14/25 6:03 AM, Andrea Bolognani wrote:
On Thu, Feb 13, 2025 at 01:19:53PM -0500, Laine Stump wrote:
The result is that you can now have:
<interface type='vhostuser'> <backend type='passt'/> ... </interface>
Then as long as you also have the following as a subelement of <domain>:
<memoryBacking> <access mode='shared'/> </memoryBacking>
your passt interfaces will benefit from the greatly improved efficiency of a vhost-user data path
The presence of this element is not validated anywhere though, so if you leave it out the network interface will just silently be non-functional. I don't think that makes for a good user experience.
Keeping in mind that this is a pre-existing requirement for a pre-existing functionality that's been there since 2014... My assumption has been that whatever type='vhostuser' already does is "good enough", and at some point during testing of uncompleted code, one of the unit tests produced an error saying something about "you've selected a feature that requires shared memory but don't have shared memory turned on; your configuration may not work correctly" or something like that. Because I saw that during a failed unit test (which was later fixed) I figured that must be reported when a vhost-user config doesn't have shared memory enabled (and didn't think to actually try it), but just now I tried making a vhost-user config without shared memory enabled, and it didn't produce any error in libvirt at virsh edit time, nor did either QEMU or passt log any error (that I can see, althouygh maybe I'm not looking in all the right places), but it did produce a non-working interface for the guest! I agree with you that it doesn't "make for a good user experience" and something should be done about it. I had considered having shared memory turned on automatically for type='vhostuser', but anything about "shared memory" sets security red flags ringing (or some other mixed metaphor) so I've assumed that's why they didn't already do this for traditional vhost-user interfaces. (I did ask about this online, but sbrivio was the only one who answered (saying a variation of the above)) (admittedly it was late in the day for me, so anybody in a further east timezone was probably already offline (except sbrivio, who seems to not sleep until after it is sunrise in Australia for some strange reason ;-)). As a followup, I'd be happy to make a patch that causes a config with a vhost-user interface but no shared mem enabled to fail validation. That shouldn't hold up what's here though - as I said above, it's a preexisting condition (I don't know if you were actually suggesting that, but just want to "head it off at the pass" if you were :-)) (Now *there's* a nice American colloquialism (straight out of old western movies) - do people outside the U.S. every use it?)
How does passt know where the backing file is located in the first place? It's not passed on its command line, and I didn't spot any logic to hand the information over. This part clearly works, I just don't understand how :)
me too - double :-)
+++ b/src/conf/domain_validate.c @@ -2163,7 +2163,8 @@ virDomainNetDefValidate(const virDomainNetDef *net) return -1; }
- if (net->type != VIR_DOMAIN_NET_TYPE_USER) { + if (net->type != VIR_DOMAIN_NET_TYPE_USER && + net->type != VIR_DOMAIN_NET_TYPE_VHOSTUSER) { if (net->backend.type == VIR_DOMAIN_NET_BACKEND_PASST) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("The 'passt' backend can only be used with interface type='user'"));
This error message needs to be updated.
Oops, missed that one. I'll fix it before I push anything.
+++ b/tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.args @@ -0,0 +1,42 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.config \ +/usr/bin/qemu-system-x86_64 \ +-name guest=QEMUGuest1,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain--1-QEMUGuest1/master-key.aes"}' \ +-machine pc,usb=off,dump-guest-core=off,memory-backend=pc.ram,acpi=off \ +-accel tcg \ +-cpu qemu64 \ +-m size=219136k \ +-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":224395264}' \ +-overcommit mem-lock=off \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot strict=on \ +-blockdev '{"driver":"host_device","filename":"/dev/HostVG/QEMUGuest1","node-name":"libvirt-1-storage","read-only":false}' \ +-device '{"driver":"ide-hd","bus":"ide.0","unit":0,"drive":"libvirt-1-storage","id":"ide0-0-0","bootindex":1}' \ +-chardev socket,id=charnet0,path=/var/run/libvirt/qemu/passt/-1-QEMUGuest1-net0.socket \ +-netdev '{"type":"vhost-user","chardev":"charnet0","id":"hostnet0"}' \ +-device '{"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"00:11:22:33:44:55","bus":"pci.0","addr":"0x2"}' \ +-chardev socket,id=charnet1,path=/var/run/libvirt/qemu/passt/-1-QEMUGuest1-net1.socket \ +-netdev '{"type":"vhost-user","chardev":"charnet1","id":"hostnet1"}' \ +-device '{"driver":"virtio-net-pci","netdev":"hostnet1","id":"net1","mac":"00:11:22:33:44:11","bus":"pci.0","addr":"0x3"}' \ +-chardev socket,id=charnet2,path=/var/run/libvirt/qemu/passt/-1-QEMUGuest1-net2.socket \ +-netdev '{"type":"vhost-user","chardev":"charnet2","id":"hostnet2"}' \ +-device '{"driver":"virtio-net-pci","netdev":"hostnet2","id":"net2","mac":"00:11:22:33:44:11","bus":"pci.0","addr":"0x4"}' \ +-audiodev '{"id":"audio1","driver":"none"}' \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on
It's a real shame that we don't have a way to include the command line for external programs (swtpm, passt) in the corresponding test case...
I was thinking that myself as I was writing this code! We really should be grabbing *all* external commands and putting them somewhere to compare (I guess they could all just go into the same file, and then the unit test would be able to verify the commandline was correct, and that the commands had all been executed in the proper order).

On 2/14/25 8:49 AM, Laine Stump wrote:
On 2/14/25 6:03 AM, Andrea Bolognani wrote:
On Thu, Feb 13, 2025 at 01:19:53PM -0500, Laine Stump wrote:
The result is that you can now have:
<interface type='vhostuser'> <backend type='passt'/> ... </interface>
Then as long as you also have the following as a subelement of <domain>:
<memoryBacking> <access mode='shared'/> </memoryBacking>
your passt interfaces will benefit from the greatly improved efficiency of a vhost-user data path
The presence of this element is not validated anywhere though, so if you leave it out the network interface will just silently be non-functional. I don't think that makes for a good user experience.
Keeping in mind that this is a pre-existing requirement for a pre- existing functionality that's been there since 2014...
My assumption has been that whatever type='vhostuser' already does is "good enough", and at some point during testing of uncompleted code, one of the unit tests produced an error saying something about "you've selected a feature that requires shared memory but don't have shared memory turned on; your configuration may not work correctly" or something like that.
Because I saw that during a failed unit test (which was later fixed) I figured that must be reported when a vhost-user config doesn't have shared memory enabled (and didn't think to actually try it), but just now I tried making a vhost-user config without shared memory enabled, and it didn't produce any error in libvirt at virsh edit time, nor did either QEMU or passt log any error (that I can see, althouygh maybe I'm not looking in all the right places), but it did produce a non-working interface for the guest!
I stand corrected - qemu gives this warning (which shows up in /var/log/libvirt/qemu/${guestname}.log, but doesn't cause failure to start the guest, and doesn't get grabbed and reported by libvirt (since I think we only do that if qemu exits with an error code): qemu-kvm: Failed initializing vhost-user memory map, consider using -object memory-backend-file share=on qemu-kvm: vhost_set_mem_table failed: Invalid argument (22) qemu-kvm: unable to start vhost net: 22: falling back on userspace virtio
I agree with you that it doesn't "make for a good user experience" and something should be done about it. I had considered having shared memory turned on automatically for type='vhostuser', but anything about "shared memory" sets security red flags ringing (or some other mixed metaphor) so I've assumed that's why they didn't already do this for traditional vhost-user interfaces. (I did ask about this online, but sbrivio was the only one who answered (saying a variation of the above)) (admittedly it was late in the day for me, so anybody in a further east timezone was probably already offline (except sbrivio, who seems to not sleep until after it is sunrise in Australia for some strange reason ;-)).
As a followup, I'd be happy to make a patch that causes a config with a vhost-user interface but no shared mem enabled to fail validation. That shouldn't hold up what's here though - as I said above, it's a preexisting condition (I don't know if you were actually suggesting that, but just want to "head it off at the pass" if you were :-))
(Now *there's* a nice American colloquialism (straight out of old western movies) - do people outside the U.S. every use it?)
How does passt know where the backing file is located in the first place? It's not passed on its command line, and I didn't spot any logic to hand the information over. This part clearly works, I just don't understand how :)
me too - double :-)
+++ b/src/conf/domain_validate.c @@ -2163,7 +2163,8 @@ virDomainNetDefValidate(const virDomainNetDef *net) return -1; }
- if (net->type != VIR_DOMAIN_NET_TYPE_USER) { + if (net->type != VIR_DOMAIN_NET_TYPE_USER && + net->type != VIR_DOMAIN_NET_TYPE_VHOSTUSER) { if (net->backend.type == VIR_DOMAIN_NET_BACKEND_PASST) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("The 'passt' backend can only be used with interface type='user'"));
This error message needs to be updated.
Oops, missed that one. I'll fix it before I push anything.
+++ b/tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.args @@ -0,0 +1,42 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest1/.config \ +/usr/bin/qemu-system-x86_64 \ +-name guest=QEMUGuest1,debug-threads=on \ +-S \ +-object '{"qom- type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/ libvirt/qemu/domain--1-QEMUGuest1/master-key.aes"}' \ +-machine pc,usb=off,dump-guest-core=off,memory- backend=pc.ram,acpi=off \ +-accel tcg \ +-cpu qemu64 \ +-m size=219136k \ +-object '{"qom-type":"memory-backend- ram","id":"pc.ram","size":224395264}' \ +-overcommit mem-lock=off \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot strict=on \ +-blockdev '{"driver":"host_device","filename":"/dev/HostVG/ QEMUGuest1","node-name":"libvirt-1-storage","read-only":false}' \ +-device '{"driver":"ide- hd","bus":"ide.0","unit":0,"drive":"libvirt-1- storage","id":"ide0-0-0","bootindex":1}' \ +-chardev socket,id=charnet0,path=/var/run/libvirt/qemu/passt/-1- QEMUGuest1-net0.socket \ +-netdev '{"type":"vhost-user","chardev":"charnet0","id":"hostnet0"}' \ +-device '{"driver":"virtio-net- pci","netdev":"hostnet0","id":"net0","mac":"00:11:22:33:44:55","bus":"pci.0","addr":"0x2"}' \ +-chardev socket,id=charnet1,path=/var/run/libvirt/qemu/passt/-1- QEMUGuest1-net1.socket \ +-netdev '{"type":"vhost-user","chardev":"charnet1","id":"hostnet1"}' \ +-device '{"driver":"virtio-net- pci","netdev":"hostnet1","id":"net1","mac":"00:11:22:33:44:11","bus":"pci.0","addr":"0x3"}' \ +-chardev socket,id=charnet2,path=/var/run/libvirt/qemu/passt/-1- QEMUGuest1-net2.socket \ +-netdev '{"type":"vhost-user","chardev":"charnet2","id":"hostnet2"}' \ +-device '{"driver":"virtio-net- pci","netdev":"hostnet2","id":"net2","mac":"00:11:22:33:44:11","bus":"pci.0","addr":"0x4"}' \ +-audiodev '{"id":"audio1","driver":"none"}' \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on
It's a real shame that we don't have a way to include the command line for external programs (swtpm, passt) in the corresponding test case...
I was thinking that myself as I was writing this code! We really should be grabbing *all* external commands and putting them somewhere to compare (I guess they could all just go into the same file, and then the unit test would be able to verify the commandline was correct, and that the commands had all been executed in the proper order).

On Fri, Feb 14, 2025 at 08:49:29AM -0500, Laine Stump wrote:
On 2/14/25 6:03 AM, Andrea Bolognani wrote:
On Thu, Feb 13, 2025 at 01:19:53PM -0500, Laine Stump wrote:
The result is that you can now have:
<interface type='vhostuser'> <backend type='passt'/> ... </interface>
Then as long as you also have the following as a subelement of <domain>:
<memoryBacking> <access mode='shared'/> </memoryBacking>
your passt interfaces will benefit from the greatly improved efficiency of a vhost-user data path
The presence of this element is not validated anywhere though, so if you leave it out the network interface will just silently be non-functional. I don't think that makes for a good user experience.
Keeping in mind that this is a pre-existing requirement for a pre-existing functionality that's been there since 2014...
My assumption has been that whatever type='vhostuser' already does is "good enough", and at some point during testing of uncompleted code, one of the unit tests produced an error saying something about "you've selected a feature that requires shared memory but don't have shared memory turned on; your configuration may not work correctly" or something like that.
Because I saw that during a failed unit test (which was later fixed) I figured that must be reported when a vhost-user config doesn't have shared memory enabled (and didn't think to actually try it), but just now I tried making a vhost-user config without shared memory enabled, and it didn't produce any error in libvirt at virsh edit time, nor did either QEMU or passt log any error (that I can see, althouygh maybe I'm not looking in all the right places), but it did produce a non-working interface for the guest!
I agree with you that it doesn't "make for a good user experience" and something should be done about it. I had considered having shared memory turned on automatically for type='vhostuser', but anything about "shared memory" sets security red flags ringing (or some other mixed metaphor) so I've assumed that's why they didn't already do this for traditional vhost-user interfaces. (I did ask about this online, but sbrivio was the only one who answered (saying a variation of the above)) (admittedly it was late in the day for me, so anybody in a further east timezone was probably already offline (except sbrivio, who seems to not sleep until after it is sunrise in Australia for some strange reason ;-)).
As a followup, I'd be happy to make a patch that causes a config with a vhost-user interface but no shared mem enabled to fail validation. That shouldn't hold up what's here though - as I said above, it's a preexisting condition (I don't know if you were actually suggesting that, but just want to "head it off at the pass" if you were :-))
I was pretty much requesting that change be made, actually :) I have no idea about how things work in the pre-existing vhost-user scenarios, but vhost-user passt is obviously broken when shared memory is disabled, so we should simply not allow it. I see no reason to introduce a new feature with known flaws, especially when adding a check for this should be completely trivial. You can restrict the check to just the vhost-user passt case, leaving other vhost-user cases alone, so that we don't have to worry about accidentally breaking existing configurations. Let's not even consider enabling shared memory automatically for now, because as you say that's potentially opening a big can of worm. You need to respin to add the documentation anyway, don't you? -- Andrea Bolognani / Red Hat / Virtualization

On 2/14/25 9:21 AM, Andrea Bolognani wrote:
On Fri, Feb 14, 2025 at 08:49:29AM -0500, Laine Stump wrote:
On 2/14/25 6:03 AM, Andrea Bolognani wrote:
On Thu, Feb 13, 2025 at 01:19:53PM -0500, Laine Stump wrote:
The result is that you can now have:
<interface type='vhostuser'> <backend type='passt'/> ... </interface>
Then as long as you also have the following as a subelement of <domain>:
<memoryBacking> <access mode='shared'/> </memoryBacking>
your passt interfaces will benefit from the greatly improved efficiency of a vhost-user data path
The presence of this element is not validated anywhere though, so if you leave it out the network interface will just silently be non-functional. I don't think that makes for a good user experience.
Keeping in mind that this is a pre-existing requirement for a pre-existing functionality that's been there since 2014...
My assumption has been that whatever type='vhostuser' already does is "good enough", and at some point during testing of uncompleted code, one of the unit tests produced an error saying something about "you've selected a feature that requires shared memory but don't have shared memory turned on; your configuration may not work correctly" or something like that.
Because I saw that during a failed unit test (which was later fixed) I figured that must be reported when a vhost-user config doesn't have shared memory enabled (and didn't think to actually try it), but just now I tried making a vhost-user config without shared memory enabled, and it didn't produce any error in libvirt at virsh edit time, nor did either QEMU or passt log any error (that I can see, althouygh maybe I'm not looking in all the right places), but it did produce a non-working interface for the guest!
I agree with you that it doesn't "make for a good user experience" and something should be done about it. I had considered having shared memory turned on automatically for type='vhostuser', but anything about "shared memory" sets security red flags ringing (or some other mixed metaphor) so I've assumed that's why they didn't already do this for traditional vhost-user interfaces. (I did ask about this online, but sbrivio was the only one who answered (saying a variation of the above)) (admittedly it was late in the day for me, so anybody in a further east timezone was probably already offline (except sbrivio, who seems to not sleep until after it is sunrise in Australia for some strange reason ;-)).
As a followup, I'd be happy to make a patch that causes a config with a vhost-user interface but no shared mem enabled to fail validation. That shouldn't hold up what's here though - as I said above, it's a preexisting condition (I don't know if you were actually suggesting that, but just want to "head it off at the pass" if you were :-))
I was pretty much requesting that change be made, actually :)
I have no idea about how things work in the pre-existing vhost-user scenarios, but vhost-user passt is obviously broken when shared memory is disabled, so we should simply not allow it. I see no reason to introduce a new feature with known flaws, especially when adding a check for this should be completely trivial.
Yeah, sure sure sure, okay :-P
You can restrict the check to just the vhost-user passt case, leaving other vhost-user cases alone, so that we don't have to worry about accidentally breaking existing configurations.
I will also send a separate patch that removes the "passt-only" aspect and does it for vhost-user as well; since validation isn't run on existing configs, and any existing config that had vhost-user but no shared memory would have previously failed to function anyway, I don't see much danger in validating it for plain vhost-user as well.
Let's not even consider enabling shared memory automatically for now, because as you say that's potentially opening a big can of worm.
Yeah, it would be convenient, and *might not* even be all that unsafe, but I certainly can't say that certain thing with any certainty (hey, if sbrivio can do it so can I!) and I wouldn't want to be the one responsible for some new exploit.
You need to respin to add the documentation anyway, don't you?
If there weren't any other major problems, I wasn't planning to resend the entire series for that, just add a "Patch 10/9".

On Fri, Feb 14, 2025 at 10:59:11AM -0500, Laine Stump wrote:
On 2/14/25 9:21 AM, Andrea Bolognani wrote:
You can restrict the check to just the vhost-user passt case, leaving other vhost-user cases alone, so that we don't have to worry about accidentally breaking existing configurations.
I will also send a separate patch that removes the "passt-only" aspect and does it for vhost-user as well; since validation isn't run on existing configs, and any existing config that had vhost-user but no shared memory would have previously failed to function anyway, I don't see much danger in validating it for plain vhost-user as well.
That sounds good, as long as it's a separate patch that this series doesn't depend on. I'm highly confident that we can do that safely for the passt case, slightly less so for the general case, and I wouldn't want to hold up the former because of the latter.
You need to respin to add the documentation anyway, don't you?
If there weren't any other major problems, I wasn't planning to resend the entire series for that, just add a "Patch 10/9".
Please do a quick resend even if the changes are minimal. That gives reviewers the chance to take one last look before the series is merged. And don't forget to update the release notes too ;) -- Andrea Bolognani / Red Hat / Virtualization

On a Thursday in 2025, Laine Stump wrote: [...]
This Resolves: https://issues.redhat.com/browse/RHEL-69455
Laine Stump (9): conf: change virDomainHostdevInsert() to return void qemu: fix qemu validation to forbid guest-side IP address for type='vdpa' qemu: validate that model is virtio for vhostuser and vdpa interfaces in the same place qemu: automatically set model type='virtio' for interface type='vhostuser' qemu: do all vhostuser attribute validation in qemu driver conf/qemu: make <source> element *almost* optional for type=vhostuser qemu: use switch instead of if in qemuProcessPrepareDomainNetwork() qemu: make qemuPasstCreateSocketPath() public qemu: complete vhostuser + passt support
src/conf/domain_conf.c | 107 +++++++++--------- src/conf/domain_conf.h | 2 +- src/conf/domain_validate.c | 83 ++++---------- src/conf/schemas/domaincommon.rng | 32 +++++- src/libxl/libxl_domain.c | 5 +- src/libxl/libxl_driver.c | 3 +- src/lxc/lxc_driver.c | 3 +- src/qemu/qemu_command.c | 7 +- src/qemu/qemu_driver.c | 3 +- src/qemu/qemu_extdevice.c | 6 +- src/qemu/qemu_hotplug.c | 21 +++- src/qemu/qemu_passt.c | 5 +- src/qemu/qemu_passt.h | 3 + src/qemu/qemu_postparse.c | 3 +- src/qemu/qemu_process.c | 84 +++++++++----- src/qemu/qemu_validate.c | 56 ++++++--- ...t-user-slirp-portforward.x86_64-latest.err | 2 +- .../net-vhostuser-passt.x86_64-latest.args | 42 +++++++ .../net-vhostuser-passt.x86_64-latest.xml | 72 ++++++++++++ tests/qemuxmlconfdata/net-vhostuser-passt.xml | 70 ++++++++++++ tests/qemuxmlconftest.c | 1 + 21 files changed, 429 insertions(+), 181 deletions(-) create mode 100644 tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.args create mode 100644 tests/qemuxmlconfdata/net-vhostuser-passt.x86_64-latest.xml create mode 100644 tests/qemuxmlconfdata/net-vhostuser-passt.xml
Reviewed-by: Ján Tomko <jtomko@redhat.com> Jano

On Thu, Feb 13, 2025 at 01:19:44PM -0500, Laine Stump wrote:
passt (https://passt.top) provides a method of connecting QEMU virtual machines to the external network without requiring special privileges or capabilities of any participating processes - even libvirt itself can run unprivileged and create an instance of passt (which *always* runs unprivileged) that is then connected to the qemu process (and thus the virtual machine) with a unix socket.
[...]
So far this has been tested both unprivileged and privileged on Fedora 40 (with latest passt packet) and selinux enabled (there are a couple of selinux policy tweaks that still need to be pushed to passt-selinux) as well as unprivileged on debian (I *think* with AppArmor enabled) and everything seems to work.
Unfortunately unprivileged VMs don't actually benefit from AppArmor isolation. See [1] for a recent discussion about this.
Also, you will need the latest (20250121) passt package.
I truly appreciate the amount of information you've included in the cover letter, especially the details about required passt version and missing SELinux bits. That made it much easier for me to jump in with some confidence. Speaking of SELinux, with the current policy on Fedora 41 I get a couple of AVC denials related to accessing the shared memory file. I understand that's expected, based on the above, but it's still quite surprising to me that the VM would start at all in this case. Just like the scenario that I've mentioned in my reply to 9/9, the network interface quietly being broken doesn't make for a great user experience. I believe this specific failure scenario, unlike the other one, is pre-existing and not easy to deal with purely through XML validation, but I really think that we should spend some effort (as a follow-up) on making sure that, if passt can't set up the network interface successfully, we report a useful error to the user instead of just leaving things broken with no clear indication that they are. [1] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/R52J7... -- Andrea Bolognani / Red Hat / Virtualization

On 2/14/25 6:17 AM, Andrea Bolognani wrote:
On Thu, Feb 13, 2025 at 01:19:44PM -0500, Laine Stump wrote:
passt (https://passt.top) provides a method of connecting QEMU virtual machines to the external network without requiring special privileges or capabilities of any participating processes - even libvirt itself can run unprivileged and create an instance of passt (which *always* runs unprivileged) that is then connected to the qemu process (and thus the virtual machine) with a unix socket.
[...]
So far this has been tested both unprivileged and privileged on Fedora 40 (with latest passt packet) and selinux enabled (there are a couple of selinux policy tweaks that still need to be pushed to passt-selinux) as well as unprivileged on debian (I *think* with AppArmor enabled) and everything seems to work.
Unfortunately unprivileged VMs don't actually benefit from AppArmor isolation. See [1] for a recent discussion about this.
Also, you will need the latest (20250121) passt package.
I truly appreciate the amount of information you've included in the cover letter, especially the details about required passt version and missing SELinux bits. That made it much easier for me to jump in with some confidence.
Speaking of SELinux, with the current policy on Fedora 41 I get a couple of AVC denials related to accessing the shared memory file. I understand that's expected, based on the above, but it's still quite surprising to me that the VM would start at all in this case.
Just like the scenario that I've mentioned in my reply to 9/9, the network interface quietly being broken doesn't make for a great user experience. I believe this specific failure scenario, unlike the other one, is pre-existing and not easy to deal with purely through XML validation, but I really think that we should spend some effort (as a follow-up) on making sure that, if passt can't set up the network interface successfully, we report a useful error to the user instead of just leaving things broken with no clear indication that they are.
(I guess you're talking about reporting the selinux denial?) The difficulty here is that it's not libvirt getting the selinux denial, it's passt and/or QEMU, and we don't report errors from either of those unless they are fatal (i.e. the other process exits right away with an error code). Practically speaking though, the selinux issue you're seeing should never happen in production (as long as all packages are up to date) so I don't think it's as essential as the share memory config thing to figure out all the contortions necessary to report it. (Translated: "Error reporting is hard!!!! Let's all go shopping!!!!"). If you've got any bright ideas feel free to pontificate though :-)
[1] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/R52J7...

On Fri, Feb 14, 2025 at 09:08:36AM -0500, Laine Stump wrote:
On 2/14/25 6:17 AM, Andrea Bolognani wrote:
Speaking of SELinux, with the current policy on Fedora 41 I get a couple of AVC denials related to accessing the shared memory file. I understand that's expected, based on the above, but it's still quite surprising to me that the VM would start at all in this case.
Just like the scenario that I've mentioned in my reply to 9/9, the network interface quietly being broken doesn't make for a great user experience. I believe this specific failure scenario, unlike the other one, is pre-existing and not easy to deal with purely through XML validation, but I really think that we should spend some effort (as a follow-up) on making sure that, if passt can't set up the network interface successfully, we report a useful error to the user instead of just leaving things broken with no clear indication that they are.
(I guess you're talking about reporting the selinux denial?)
The difficulty here is that it's not libvirt getting the selinux denial, it's passt and/or QEMU, and we don't report errors from either of those unless they are fatal (i.e. the other process exits right away with an error code). Practically speaking though, the selinux issue you're seeing should never happen in production (as long as all packages are up to date) so I don't think it's as essential as the share memory config thing to figure out all the contortions necessary to report it. (Translated: "Error reporting is hard!!!! Let's all go shopping!!!!"). If you've got any bright ideas feel free to pontificate though :-)
I haven't looked into it in any detail, so no specific suggestions. And I agree that it won't be seen in production so we can proceed as is for now, and only consider improving things further as a follow-up. In abstract terms, we need to be able to catch startup errors from passt more consistently. As a point of comparison, swtpm will complain very loudly and refuse to start if it can't manufacture a TPM device; passt being unable to create the network interface backend or connect to the frontend should result in a similar VM startup failure. My impression is that in many cases passt will attempt to proceed and simply log an error message on failure, possibly because the underlying problem can be fixed after the fact. In the context of libvirt, I don't think this applies. So maybe we need a "strict mode" of sorts, where passt is more willing to call it quits whenever something doesn't work? I don't know how feasible that is. It's entirely possible that I have incorrectly described how passt does error handling. All I know for sure is that the current situation, in which a VM can successfully be started despite ending up with a non-working network interface, is very clearly not acceptable in the long run. -- Andrea Bolognani / Red Hat / Virtualization

On Fri, 14 Feb 2025 06:47:53 -0800 Andrea Bolognani <abologna@redhat.com> wrote:
On Fri, Feb 14, 2025 at 09:08:36AM -0500, Laine Stump wrote:
On 2/14/25 6:17 AM, Andrea Bolognani wrote:
Speaking of SELinux, with the current policy on Fedora 41 I get a couple of AVC denials related to accessing the shared memory file. I understand that's expected, based on the above, but it's still quite surprising to me that the VM would start at all in this case.
Just like the scenario that I've mentioned in my reply to 9/9, the network interface quietly being broken doesn't make for a great user experience. I believe this specific failure scenario, unlike the other one, is pre-existing and not easy to deal with purely through XML validation, but I really think that we should spend some effort (as a follow-up) on making sure that, if passt can't set up the network interface successfully, we report a useful error to the user instead of just leaving things broken with no clear indication that they are.
(I guess you're talking about reporting the selinux denial?)
The difficulty here is that it's not libvirt getting the selinux denial, it's passt and/or QEMU, and we don't report errors from either of those unless they are fatal (i.e. the other process exits right away with an error code). Practically speaking though, the selinux issue you're seeing should never happen in production (as long as all packages are up to date) so I don't think it's as essential as the share memory config thing to figure out all the contortions necessary to report it. (Translated: "Error reporting is hard!!!! Let's all go shopping!!!!"). If you've got any bright ideas feel free to pontificate though :-)
I haven't looked into it in any detail, so no specific suggestions. And I agree that it won't be seen in production so we can proceed as is for now, and only consider improving things further as a follow-up.
In abstract terms, we need to be able to catch startup errors from passt more consistently. As a point of comparison, swtpm will complain very loudly and refuse to start if it can't manufacture a TPM device; passt being unable to create the network interface backend or connect to the frontend should result in a similar VM startup failure.
My impression is that in many cases passt will attempt to proceed and simply log an error message on failure, possibly because the underlying problem can be fixed after the fact.
In this case what you see is not a desired behaviour of passt and it's a bit more complicated compared to the non-vhost-user case (hence the issue that nobody had thought about). With or without --vhost-user, passt goes to background once its *control interface* is up and running, which should make it convenient for libvirt, libkrun, and even "manual" users (typically using scripts). You start it without having to fork. If it forks and exits successfully, you know it's ready. Nothing fancy here, that's a pretty much established UNIX daemon thing. In the non-vhost-user case, that socket is also the data interface, so not much can go wrong after this phase. QEMU might fail to start (or fail to connect to passt and hence fail to start), and in that case libvirt will terminate passt, but that's about it. In the vhost-user case, passt still forks once the control interface is up and running: libvirt knows that QEMU can start, now. But the data connection is not ready, yet, because that's negotiated as part of the vhost-user protocol. So, QEMU connects, and passes to passt a passing file descriptor, en passant, via SCM_RIGHTS (admittedly a bit passé), to represent the shared (guest) memory that passt can map. If passt can't map the memory, it fails with a resounding: die_perror("vhost-user region mmap error"); but that's too late: passt declared success, the guest started, and libvirt can't do much. We don't have guarantees as to when this failure can happen, either. Without vhost-user, libvirt gets a NETDEV_STREAM_DISCONNECTED if passt terminates for whatever reason, and it handles that by restarting passt. QEMU's interface is configured with a --reconnect-ms option, so restarting passt should be enough to give the guest its connectivity back. For vhost-user, a patch from Laurent would introduce equivalent functionality: https://lore.kernel.org/qemu-devel/20250214072629.1033314-1-lvivier@redhat.c... but that only helps when things are up and running. If failure happens before the guest ever had working connectivity, it's pretty unclear to me what the desired behaviour is. And:
In the context of libvirt, I don't think this applies. So maybe we need a "strict mode" of sorts, where passt is more willing to call it quits whenever something doesn't work?
...while this already happens...
I don't know how feasible that is. It's entirely possible that I have incorrectly described how passt does error handling. All I know for sure is that the current situation, in which a VM can successfully be started despite ending up with a non-working network interface, is very clearly not acceptable in the long run.
...should libvirt really bring down the guest because oops, on a second thought, your network interface will never work? Again, we have no timing guarantees. Or perhaps QEMU should terminate instead? Are you aware of any similar case we can try to use as established practice? -- Stefano

On Fri, 14 Feb 2025 03:17:06 -0800 Andrea Bolognani <abologna@redhat.com> wrote:
On Thu, Feb 13, 2025 at 01:19:44PM -0500, Laine Stump wrote:
passt (https://passt.top) provides a method of connecting QEMU virtual machines to the external network without requiring special privileges or capabilities of any participating processes - even libvirt itself can run unprivileged and create an instance of passt (which *always* runs unprivileged) that is then connected to the qemu process (and thus the virtual machine) with a unix socket.
[...]
So far this has been tested both unprivileged and privileged on Fedora 40 (with latest passt packet) and selinux enabled (there are a couple of selinux policy tweaks that still need to be pushed to passt-selinux) as well as unprivileged on debian (I *think* with AppArmor enabled) and everything seems to work.
Unfortunately unprivileged VMs don't actually benefit from AppArmor isolation. See [1] for a recent discussion about this.
I quickly reported to Laine about a test I made with the workaround I proposed there. That's what it means by "working with AppArmor". It's simply passt with: https://archives.passt.top/passt-dev/20250205163101.3793658-1-sbrivio@redhat... (merged but not released yet).
Also, you will need the latest (20250121) passt package.
I truly appreciate the amount of information you've included in the cover letter, especially the details about required passt version and missing SELinux bits. That made it much easier for me to jump in with some confidence.
Speaking of SELinux, with the current policy on Fedora 41 I get a couple of AVC denials related to accessing the shared memory file. I understand that's expected, based on the above, but it's still quite surprising to me that the VM would start at all in this case.
Just for the record, it's with this: https://archives.passt.top/passt-dev/20250213221642.4085986-1-sbrivio@redhat... as you found out meanwhile. Of course, it's temporary (and not even released yet).
Just like the scenario that I've mentioned in my reply to 9/9, the network interface quietly being broken doesn't make for a great user experience. I believe this specific failure scenario, unlike the other one, is pre-existing and not easy to deal with purely through XML validation, but I really think that we should spend some effort (as a follow-up) on making sure that, if passt can't set up the network interface successfully, we report a useful error to the user instead of just leaving things broken with no clear indication that they are.
I guess that's a valid follow-up improvement regardless of the workaround I'm about to release in passt's policy.
[1] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/R52J7...
-- Stefano
participants (6)
-
Andrea Bolognani
-
Daniel P. Berrangé
-
Ján Tomko
-
Laine Stump
-
Laine Stump
-
Stefano Brivio