[libvirt] [PATCH 0/9] support use of precreated tap devices from unprivileged libvirtd

This resolves https://bugzilla.redhat.com/1723367 It has become more popular to run libvirtd in an unprivileged environment (e.g. inside a container), but until now the only possible types of network connection for a qemu started by an unprivileged libvirtd were: 1) a usermode slirp connection 2) a tap device connection to a bridge handled by running qemu-bridge-helper (a suid-root utility distributed with qemu) 3) a host network card assigned to the guest using VFIO (this requires special setup by a privileged process though) This patch series remedies that by making it possible for libvirtd to use a tap device that has been pre-created (*and* properly setup) by some other process beforehand. In order to use this, you must have a standard tap, or macvtap device that has been set to be owned by the uid that will be running libvirtd, has its MAC address already set, and has been set online (IFF_UP). For example, here are the commands to create a standard tap device named "mytap0", attach it to the host bridge device "br0" and prepare it for use by a libvirtd that is running as user "laine": ip tuntap add mode tap user laine group laine name mytap0 ip link set mytap0 up ip link set mytap0 master br0 (You may want to set a specific MAC address for the tap device, but as long as it *doesn't* match the MAC address used by the guest emulated device, it really doesn't matter) You can now add the following <interface> to a domain definition: <interface type='ethernet'> <model type='virtio'/> <mac address='52:54:00:11:11:11'/> <target dev='mytap0' managed='no'/> </interface> and start up the guest. A similar set of commands to create a macvtap device named "mymacvtap0" with MAC addres 52:54:00:11:11:11 connected to the host device "en2" would be something like this: ip link add link en2 name mymacvtap0 address 52:54:00:11:11:11 \ type macvtap mode bridge ip link set mymacvtap0 up The XML would be identical, except the name of the device <interface type='ethernet'> <model type='virtio'/> <mac address='52:54:00:11:11:11'/> <target dev='mymacvtap0' managed='no'/> </interface> (Note that in the case of macvtap, the precreated device must *match* the MAC address of the emulated guest device). If libvirtd is given a precreated device, that device will *not* be explicitly deleted when qemu is finished with it - the caller must take care of that. Laine Stump (9): util: new function virNetDevMacVLanIsMacvtap() util: make a couple virNetDevMacVlan*() functions public qemu: reorganize qemuInterfaceEthernetConnect() conf: use virXMLFormatElement for interface <target> conf: new "managed" attribute for target dev of <interface type='ethernet'> qemu: support unmanaged target tap dev for <interface type='ethernet'> qemu: support unmanaged macvtap devices with <interface type='ethernet'> qemu: explicitly delete standard tap devices only on platforms that require it docs: update news file docs/formatdomain.html.in | 48 +++++++--- docs/news.xml | 13 +++ docs/schemas/domaincommon.rng | 5 ++ src/conf/domain_conf.c | 55 +++++++++--- src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 3 + src/qemu/qemu_interface.c | 89 ++++++++++++------- src/qemu/qemu_process.c | 6 +- src/util/virnetdev.h | 2 +- src/util/virnetdevmacvlan.c | 35 ++++++-- src/util/virnetdevmacvlan.h | 12 +++ .../net-eth-unmanaged-tap.args | 32 +++++++ .../net-eth-unmanaged-tap.xml | 35 ++++++++ tests/qemuxml2argvmock.c | 16 +++- tests/qemuxml2argvtest.c | 1 + .../net-eth-unmanaged-tap.xml | 40 +++++++++ tests/qemuxml2xmltest.c | 1 + 17 files changed, 329 insertions(+), 65 deletions(-) create mode 100644 tests/qemuxml2argvdata/net-eth-unmanaged-tap.args create mode 100644 tests/qemuxml2argvdata/net-eth-unmanaged-tap.xml create mode 100644 tests/qemuxml2xmloutdata/net-eth-unmanaged-tap.xml -- 2.21.0

This function returns T if the given name is a macvtap device. This is determined by 1) getting the ifindex of the device with that name (if there is one), and 2) checking for existence of /dev/tapXX, where "XX" is the ifindex learned in (1). It's also possible to learn this by getting a netlink dump of the interface and parsing through it to look for some attributes, but that is complicated to figure out, takes longer to execute, and I'm lazy. Signed-off-by: Laine Stump <laine@redhat.com> --- src/libvirt_private.syms | 3 +++ src/util/virnetdevmacvlan.c | 23 +++++++++++++++++++++++ src/util/virnetdevmacvlan.h | 3 +++ 3 files changed, 29 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index a34d92f5ef..afea00b629 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2523,10 +2523,13 @@ virNetDevMacVLanCreate; virNetDevMacVLanCreateWithVPortProfile; virNetDevMacVLanDelete; virNetDevMacVLanDeleteWithVPortProfile; +virNetDevMacVLanIsMacvtap; virNetDevMacVLanModeTypeFromString; virNetDevMacVLanReleaseName; virNetDevMacVLanReserveName; virNetDevMacVLanRestartWithVPortProfile; +virNetDevMacVLanTapOpen; +virNetDevMacVLanTapSetup; virNetDevMacVLanVPortProfileRegisterCallback; diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c index 3302522289..79aa7ed5ac 100644 --- a/src/util/virnetdevmacvlan.c +++ b/src/util/virnetdevmacvlan.c @@ -278,6 +278,29 @@ virNetDevMacVLanReleaseName(const char *name) } +/** + * virNetDevMacVLanIsMacvtap: + * @ifname: Name of the interface + * + * Return T if the named netdev exists and is a macvtap device + * F in all other cases. + */ +bool +virNetDevMacVLanIsMacvtap(const char *ifname) +{ + int ifindex; + VIR_AUTOFREE(char *) tapname = NULL; + + if (virNetDevGetIndex(ifname, &ifindex) < 0) + return false; + + if (virAsprintf(&tapname, "/dev/tap%d", ifindex) < 0) + return false; + + return virFileExists(tapname); +} + + /** * virNetDevMacVLanCreate: * diff --git a/src/util/virnetdevmacvlan.h b/src/util/virnetdevmacvlan.h index d1b479ed9f..8ac7643e49 100644 --- a/src/util/virnetdevmacvlan.h +++ b/src/util/virnetdevmacvlan.h @@ -57,6 +57,9 @@ typedef enum { int virNetDevMacVLanReserveName(const char *name, bool quietfail); int virNetDevMacVLanReleaseName(const char *name); +bool virNetDevMacVLanIsMacvtap(const char *ifname) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK ATTRIBUTE_NOINLINE; + int virNetDevMacVLanCreate(const char *ifname, const char *type, const virMacAddr *macaddress, -- 2.21.0

On Tue, Aug 27, 2019 at 09:46:31PM -0400, Laine Stump wrote:
This function returns T if the given name is a macvtap device. This is determined by 1) getting the ifindex of the device with that name (if there is one), and 2) checking for existence of /dev/tapXX, where "XX" is the ifindex learned in (1).
It's also possible to learn this by getting a netlink dump of the interface and parsing through it to look for some attributes, but that is complicated to figure out, takes longer to execute, and I'm lazy.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/libvirt_private.syms | 3 +++ src/util/virnetdevmacvlan.c | 23 +++++++++++++++++++++++ src/util/virnetdevmacvlan.h | 3 +++ 3 files changed, 29 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index a34d92f5ef..afea00b629 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2523,10 +2523,13 @@ virNetDevMacVLanCreate; virNetDevMacVLanCreateWithVPortProfile; virNetDevMacVLanDelete; virNetDevMacVLanDeleteWithVPortProfile; +virNetDevMacVLanIsMacvtap; virNetDevMacVLanModeTypeFromString; virNetDevMacVLanReleaseName; virNetDevMacVLanReserveName; virNetDevMacVLanRestartWithVPortProfile; +virNetDevMacVLanTapOpen; +virNetDevMacVLanTapSetup;
Accidentally included
virNetDevMacVLanVPortProfileRegisterCallback;
With that removed Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

In virNetDevMacVLanOpen(), The "retries" arg has been removed and the value hardcoded as 10, since previously the function was only called from one place, so it was always 10. Signed-off-by: Laine Stump <laine@redhat.com> --- src/util/virnetdevmacvlan.c | 12 +++++------- src/util/virnetdevmacvlan.h | 9 +++++++++ 2 files changed, 14 insertions(+), 7 deletions(-) diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c index 79aa7ed5ac..9a9750f24a 100644 --- a/src/util/virnetdevmacvlan.c +++ b/src/util/virnetdevmacvlan.c @@ -374,19 +374,17 @@ int virNetDevMacVLanDelete(const char *ifname) * @ifname: Name of the macvtap interface * @tapfd: array of file descriptor return value for the new macvtap device * @tapfdSize: number of file descriptors in @tapfd - * @retries : Number of retries in case udev for example may need to be - * waited for to create the tap chardev * * Open the macvtap's tap device, possibly multiple times if @tapfdSize > 1. * * Returns 0 on success, -1 otherwise. */ -static int +int virNetDevMacVLanTapOpen(const char *ifname, int *tapfd, - size_t tapfdSize, - int retries) + size_t tapfdSize) { + int retries = 10; int ret = -1; int ifindex; size_t i = 0; @@ -446,7 +444,7 @@ virNetDevMacVLanTapOpen(const char *ifname, * * Returns 0 on success, -1 in case of fatal error. */ -static int +int virNetDevMacVLanTapSetup(int *tapfd, size_t tapfdSize, bool vnet_hdr) { unsigned int features; @@ -1040,7 +1038,7 @@ virNetDevMacVLanCreateWithVPortProfile(const char *ifnameRequested, } if (flags & VIR_NETDEV_MACVLAN_CREATE_WITH_TAP) { - if (virNetDevMacVLanTapOpen(ifnameCreated, tapfd, tapfdSize, 10) < 0) + if (virNetDevMacVLanTapOpen(ifnameCreated, tapfd, tapfdSize) < 0) goto disassociate_exit; if (virNetDevMacVLanTapSetup(tapfd, tapfdSize, vnet_hdr) < 0) diff --git a/src/util/virnetdevmacvlan.h b/src/util/virnetdevmacvlan.h index 8ac7643e49..24b17b4bd0 100644 --- a/src/util/virnetdevmacvlan.h +++ b/src/util/virnetdevmacvlan.h @@ -88,6 +88,15 @@ int virNetDevMacVLanCreateWithVPortProfile(const char *ifname, ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(6) ATTRIBUTE_NONNULL(8) ATTRIBUTE_NONNULL(10) ATTRIBUTE_RETURN_CHECK; +int virNetDevMacVLanTapOpen(const char *ifname, + int *tapfd, + size_t tapfdSize) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_RETURN_CHECK; + +int virNetDevMacVLanTapSetup(int *tapfd, size_t tapfdSize, bool vnet_hdr) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; + int virNetDevMacVLanDeleteWithVPortProfile(const char *ifname, const virMacAddr *macaddress, const char *linkdev, -- 2.21.0

On Tue, Aug 27, 2019 at 09:46:32PM -0400, Laine Stump wrote:
In virNetDevMacVLanOpen(), The "retries" arg has been removed and the value hardcoded as 10, since previously the function was only called from one place, so it was always 10.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/util/virnetdevmacvlan.c | 12 +++++------- src/util/virnetdevmacvlan.h | 9 +++++++++ 2 files changed, 14 insertions(+), 7 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

This just moves around a few things in qemuInterfaceConnect() with no functional difference (except that a few failures that would have previously resulted in a "success" audit log will now properly produce a "fail" audit). The change is so that adding support for unmanaged tap/macvtap devices will be more easily reviewable. Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_interface.c | 69 ++++++++++++++++++++------------------- 1 file changed, 35 insertions(+), 34 deletions(-) diff --git a/src/qemu/qemu_interface.c b/src/qemu/qemu_interface.c index 72ed51cb1f..1e3b7f0d06 100644 --- a/src/qemu/qemu_interface.c +++ b/src/qemu/qemu_interface.c @@ -414,6 +414,7 @@ qemuInterfaceEthernetConnect(virDomainDefPtr def, bool template_ifname = false; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); const char *tunpath = "/dev/net/tun"; + const char *auditdev = tunpath; if (net->backend.tap) { tunpath = net->backend.tap; @@ -424,43 +425,39 @@ qemuInterfaceEthernetConnect(virDomainDefPtr def, } } - if (!net->ifname || - STRPREFIX(net->ifname, VIR_NET_GENERATED_TAP_PREFIX) || - strchr(net->ifname, '%')) { - VIR_FREE(net->ifname); - if (VIR_STRDUP(net->ifname, VIR_NET_GENERATED_TAP_PREFIX "%d") < 0) - goto cleanup; - /* avoid exposing vnet%d in getXMLDesc or error outputs */ - template_ifname = true; - } - if (virDomainNetIsVirtioModel(net)) tap_create_flags |= VIR_NETDEV_TAP_CREATE_VNET_HDR; - if (virNetDevTapCreate(&net->ifname, tunpath, tapfd, tapfdSize, - tap_create_flags) < 0) { - virDomainAuditNetDevice(def, net, tunpath, false); - goto cleanup; - } - - virDomainAuditNetDevice(def, net, tunpath, true); - - /* The tap device's MAC address cannot match the MAC address - * used by the guest. This results in "received packet on - * vnetX with own address as source address" error logs from - * the kernel. - */ - virMacAddrSet(&tapmac, &net->mac); - if (tapmac.addr[0] == 0xFE) - tapmac.addr[0] = 0xFA; - else - tapmac.addr[0] = 0xFE; - - if (virNetDevSetMAC(net->ifname, &tapmac) < 0) - goto cleanup; - - if (virNetDevSetOnline(net->ifname, true) < 0) - goto cleanup; + if (!net->ifname || + STRPREFIX(net->ifname, VIR_NET_GENERATED_TAP_PREFIX) || + strchr(net->ifname, '%')) { + VIR_FREE(net->ifname); + if (VIR_STRDUP(net->ifname, VIR_NET_GENERATED_TAP_PREFIX "%d") < 0) + goto cleanup; + /* avoid exposing vnet%d in getXMLDesc or error outputs */ + template_ifname = true; + } + if (virNetDevTapCreate(&net->ifname, tunpath, tapfd, tapfdSize, + tap_create_flags) < 0) { + goto cleanup; + } + + /* The tap device's MAC address cannot match the MAC address + * used by the guest. This results in "received packet on + * vnetX with own address as source address" error logs from + * the kernel. + */ + virMacAddrSet(&tapmac, &net->mac); + if (tapmac.addr[0] == 0xFE) + tapmac.addr[0] = 0xFA; + else + tapmac.addr[0] = 0xFE; + + if (virNetDevSetMAC(net->ifname, &tapmac) < 0) + goto cleanup; + + if (virNetDevSetOnline(net->ifname, true) < 0) + goto cleanup; if (net->script && virNetDevRunEthernetScript(net->ifname, net->script) < 0) @@ -477,11 +474,15 @@ qemuInterfaceEthernetConnect(virDomainDefPtr def, goto cleanup; } + virDomainAuditNetDevice(def, net, auditdev, true); + ret = 0; cleanup: if (ret < 0) { size_t i; + + virDomainAuditNetDevice(def, net, auditdev, false); for (i = 0; i < tapfdSize && tapfd[i] >= 0; i++) VIR_FORCE_CLOSE(tapfd[i]); if (template_ifname) -- 2.21.0

On Tue, Aug 27, 2019 at 09:46:33PM -0400, Laine Stump wrote:
This just moves around a few things in qemuInterfaceConnect() with no functional difference (except that a few failures that would have previously resulted in a "success" audit log will now properly produce a "fail" audit). The change is so that adding support for unmanaged tap/macvtap devices will be more easily reviewable.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_interface.c | 69 ++++++++++++++++++++------------------- 1 file changed, 35 insertions(+), 34 deletions(-)
diff --git a/src/qemu/qemu_interface.c b/src/qemu/qemu_interface.c index 72ed51cb1f..1e3b7f0d06 100644 --- a/src/qemu/qemu_interface.c +++ b/src/qemu/qemu_interface.c @@ -414,6 +414,7 @@ qemuInterfaceEthernetConnect(virDomainDefPtr def, bool template_ifname = false; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); const char *tunpath = "/dev/net/tun"; + const char *auditdev = tunpath;
if (net->backend.tap) { tunpath = net->backend.tap; @@ -424,43 +425,39 @@ qemuInterfaceEthernetConnect(virDomainDefPtr def, } }
- if (!net->ifname || - STRPREFIX(net->ifname, VIR_NET_GENERATED_TAP_PREFIX) || - strchr(net->ifname, '%')) { - VIR_FREE(net->ifname); - if (VIR_STRDUP(net->ifname, VIR_NET_GENERATED_TAP_PREFIX "%d") < 0) - goto cleanup; - /* avoid exposing vnet%d in getXMLDesc or error outputs */ - template_ifname = true; - } - if (virDomainNetIsVirtioModel(net)) tap_create_flags |= VIR_NETDEV_TAP_CREATE_VNET_HDR;
- if (virNetDevTapCreate(&net->ifname, tunpath, tapfd, tapfdSize, - tap_create_flags) < 0) { - virDomainAuditNetDevice(def, net, tunpath, false); - goto cleanup; - } - - virDomainAuditNetDevice(def, net, tunpath, true); - - /* The tap device's MAC address cannot match the MAC address - * used by the guest. This results in "received packet on - * vnetX with own address as source address" error logs from - * the kernel. - */ - virMacAddrSet(&tapmac, &net->mac); - if (tapmac.addr[0] == 0xFE) - tapmac.addr[0] = 0xFA; - else - tapmac.addr[0] = 0xFE; - - if (virNetDevSetMAC(net->ifname, &tapmac) < 0) - goto cleanup; - - if (virNetDevSetOnline(net->ifname, true) < 0) - goto cleanup; + if (!net->ifname || + STRPREFIX(net->ifname, VIR_NET_GENERATED_TAP_PREFIX) || + strchr(net->ifname, '%')) { + VIR_FREE(net->ifname); + if (VIR_STRDUP(net->ifname, VIR_NET_GENERATED_TAP_PREFIX "%d") < 0) + goto cleanup; + /* avoid exposing vnet%d in getXMLDesc or error outputs */ + template_ifname = true; + } + if (virNetDevTapCreate(&net->ifname, tunpath, tapfd, tapfdSize, + tap_create_flags) < 0) { + goto cleanup; + } + + /* The tap device's MAC address cannot match the MAC address + * used by the guest. This results in "received packet on + * vnetX with own address as source address" error logs from + * the kernel. + */ + virMacAddrSet(&tapmac, &net->mac); + if (tapmac.addr[0] == 0xFE) + tapmac.addr[0] = 0xFA; + else + tapmac.addr[0] = 0xFE; + + if (virNetDevSetMAC(net->ifname, &tapmac) < 0) + goto cleanup; + + if (virNetDevSetOnline(net->ifname, true) < 0) + goto cleanup;
Indentation here is all off-by-1 which makes the diffstat bigger
if (net->script && virNetDevRunEthernetScript(net->ifname, net->script) < 0) @@ -477,11 +474,15 @@ qemuInterfaceEthernetConnect(virDomainDefPtr def, goto cleanup; }
+ virDomainAuditNetDevice(def, net, auditdev, true); + ret = 0;
cleanup: if (ret < 0) { size_t i; + + virDomainAuditNetDevice(def, net, auditdev, false); for (i = 0; i < tapfdSize && tapfd[i] >= 0; i++) VIR_FORCE_CLOSE(tapfd[i]); if (template_ifname) -- 2.21.0
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

This will simplify addition of another attribute to the <target> element Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index b7a342bb91..f21731b5f6 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -25316,6 +25316,7 @@ virDomainNetDefFormat(virBufferPtr buf, const char *typeStr; virDomainHostdevDefPtr hostdef = NULL; char macstr[VIR_MAC_STRING_BUFLEN]; + VIR_AUTOCLEAN(virBuffer) attrBuf = VIR_BUFFER_INITIALIZER; /* publicActual is true if we should report the current state in * def->data.network.actual *instead of* the config (*not* in @@ -25530,9 +25531,12 @@ virDomainNetDefFormat(virBufferPtr buf, (STRPREFIX(def->ifname, VIR_NET_GENERATED_TAP_PREFIX) || (prefix && STRPREFIX(def->ifname, prefix))))) { /* Skip auto-generated target names for inactive config. */ - virBufferEscapeString(buf, "<target dev='%s'/>\n", def->ifname); + virBufferEscapeString(&attrBuf, " dev='%s'", def->ifname); } + if (virXMLFormatElement(buf, "target", &attrBuf, NULL) < 0) + return -1; + if (def->ifname_guest || def->ifname_guest_actual) { virBufferAddLit(buf, "<guest"); /* Skip auto-generated target names for inactive config. */ -- 2.21.0

On Tue, Aug 27, 2019 at 09:46:34PM -0400, Laine Stump wrote:
This will simplify addition of another attribute to the <target> element
Signed-off-by: Laine Stump <laine@redhat.com> --- src/conf/domain_conf.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Although <interface type='ethernet'> has always been able to use an existing tap device, this is just a coincidence due to the fact that the same ioctl is used to create a new tap device or get a handle to an existing device. Even then, once we have the handle to the device, we still insist on doing extra setup to it (setting the MAC address and IFF_UP). That *might* be okay if libvirtd is running as a privileged process, but if libvirtd is running as an unprivileged user, those attempted modifications to the tap device will fail (yes, even if the tap is set to be owned by the user running libvirtd). We could avoid this if we knew that the device already existed, but as stated above, an existing device and new device are both accessed in the same manner, and anyway, we need to preserve existing behavior for those who are already using pre-existing devices with privileged libvirtd (and allowing/expecting libvirt to configure the pre-existing device). In order to cleanly support the idea of using a pre-existing and pre-configured tap device, this patch introduces a new optional attribute "managed" for the interface <target> element. This attribute is only valid for <interface type='ethernet'> (since all other interface types have mandatory config that doesn't apply in the case where we expect the tap device to be setup before we get it). The syntax would look something like this: <interface type='ethernet'> <target dev='mytap0' managed='no'/> ... </interface> This patch just adds managed to the grammar and parser for <target>, but has no functionality behind it. (NB: when managed='no' (the default when not specified is 'yes'), the target dev is always a name explicitly provided, so we don't auto-remove it from the config just because it starts with "vnet" (VIR_NET_GENERATED_TAP_PREFIX); this makes it possible to use the same pattern of names that libvirt itself uses when it automatically creates the tap devices.) Signed-off-by: Laine Stump <laine@redhat.com> --- docs/formatdomain.html.in | 48 +++++++++++++---- docs/schemas/domaincommon.rng | 5 ++ src/conf/domain_conf.c | 51 +++++++++++++++---- src/conf/domain_conf.h | 1 + .../net-eth-unmanaged-tap.xml | 35 +++++++++++++ .../net-eth-unmanaged-tap.xml | 40 +++++++++++++++ tests/qemuxml2xmltest.c | 1 + 7 files changed, 160 insertions(+), 21 deletions(-) create mode 100644 tests/qemuxml2argvdata/net-eth-unmanaged-tap.xml create mode 100644 tests/qemuxml2xmloutdata/net-eth-unmanaged-tap.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index fcb7c59c00..86a5261e47 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -5511,24 +5511,52 @@ <h5><a id="elementsNICSEthernet">Generic ethernet connection</a></h5> <p> - Provides a means for the administrator to execute an arbitrary script - to connect the guest's network to the LAN. The guest will have a tun - device created with a name of vnetN, which can also be overridden with the - <target> element. After creating the tun device a shell script will - be run which is expected to do whatever host network integration is - required. By default this script is called /etc/qemu-ifup but can be - overridden. + Provides a means to use a new or existing tap device (or veth + device pair, depening on the needs of the hypervisor driver) + that is partially or wholly setup external to libvirt (either + prior to the guest starting, or while the guest is being started + via an optional script specified in the config). + </p> + <p> + The name of the tap device can optionally be specified with + the <code>dev</code> attribute of the + <code><target></code> element. If no target dev is + specified, libvirt will create a new standard tap device with a + name of the pattern "vnetN", where "N" is replaced with a + number. If a target dev is specified and that device doesn't + exist, then a new standard tap device will be created with the + exact dev name given. If the specified target dev does exist, + then that existing device will be used. Usually some basic setup + of the device is done by libvirt, including setting a MAC + address, and the IFF_UP flag, but if the <code>dev</code> is a + pre-existing device, and the <code>managed</code> attribute of + the <code>target</code> element is also set to "no" (the default + value is "yes"), even this basic setup will not be performed - + libvirt will simply pass the device on to the hypervisor with no + setup at all. <span class="since">Since 5.7.0</span> Using + managed='no' with a pre-created tap device is useful because + it permits a virtual machine managed by an unprivileged libvirtd + to have emulated network devices based on tap devices. + </p> + <p> + After creating/opening the tap device, an optional shell script + (given in the <code>path</code> attribute of + the <code><script></code> element) will be run; this can + be used to do whatever extra host network integration is + required. </p> <pre> ... <devices> - <interface type='ethernet'/> - ... <interface type='ethernet'> - <target dev='vnet7'/> <script path='/etc/qemu-ifup-mynet'/> </interface> + ... + <interface type='ethernet'> + <target dev='mytap1' managed='no'/> + <model type='virtio'/> + </interface> </devices> ...</pre> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index c48f8c4f56..cae3be639e 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -2885,6 +2885,11 @@ <attribute name="dev"> <ref name="deviceName"/> </attribute> + <optional> + <attribute name="managed"> + <ref name="virYesNo"/> + </attribute> + </optional> <empty/> </element> </optional> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index f21731b5f6..d473853f0e 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -6107,6 +6107,14 @@ virDomainNetDefValidate(const virDomainNetDef *net) virDomainNetTypeToString(net->type)); return -1; } + if (net->managed_tap == VIR_TRISTATE_BOOL_NO && + net->type != VIR_DOMAIN_NET_TYPE_ETHERNET) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("unmanaged target dev is not supported on " + "interfaces of type '%s'"), + virDomainNetTypeToString(net->type)); + return -1; + } return 0; } @@ -11416,6 +11424,7 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, VIR_AUTOFREE(char *) bridge = NULL; VIR_AUTOFREE(char *) dev = NULL; VIR_AUTOFREE(char *) ifname = NULL; + VIR_AUTOFREE(char *) managed_tap = NULL; VIR_AUTOFREE(char *) ifname_guest = NULL; VIR_AUTOFREE(char *) ifname_guest_actual = NULL; VIR_AUTOFREE(char *) script = NULL; @@ -11578,13 +11587,7 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, } else if (!ifname && virXMLNodeNameEqual(cur, "target")) { ifname = virXMLPropString(cur, "dev"); - if (ifname && - (flags & VIR_DOMAIN_DEF_PARSE_INACTIVE) && - (STRPREFIX(ifname, VIR_NET_GENERATED_TAP_PREFIX) || - (prefix && STRPREFIX(ifname, prefix)))) { - /* An auto-generated target name, blank it out */ - VIR_FREE(ifname); - } + managed_tap = virXMLPropString(cur, "managed"); } else if ((!ifname_guest || !ifname_guest_actual) && virXMLNodeNameEqual(cur, "guest")) { ifname_guest = virXMLPropString(cur, "dev"); @@ -11918,6 +11921,27 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt, ctxt, &def->guestIP) < 0) goto error; + if (managed_tap) { + if (STREQ(managed_tap, "no")) { + def->managed_tap = VIR_TRISTATE_BOOL_NO; + } else if (STREQ(managed_tap, "yes")) { + def->managed_tap = VIR_TRISTATE_BOOL_YES; + } else { + virReportError(VIR_ERR_XML_ERROR, + _("invalid 'managed' value '%s'"), + managed_tap); + goto error; + } + } + + if (def->managed_tap != VIR_TRISTATE_BOOL_NO && ifname && + (flags & VIR_DOMAIN_DEF_PARSE_INACTIVE) && + (STRPREFIX(ifname, VIR_NET_GENERATED_TAP_PREFIX) || + (prefix && STRPREFIX(ifname, prefix)))) { + /* An auto-generated target name, blank it out */ + VIR_FREE(ifname); + } + if (script != NULL) VIR_STEAL_PTR(def->script, script); if (domain_name != NULL) @@ -25527,16 +25551,21 @@ virDomainNetDefFormat(virBufferPtr buf, virBufferEscapeString(buf, "<backenddomain name='%s'/>\n", def->domain_name); if (def->ifname && - !((flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE) && - (STRPREFIX(def->ifname, VIR_NET_GENERATED_TAP_PREFIX) || - (prefix && STRPREFIX(def->ifname, prefix))))) { + (def->managed_tap == VIR_TRISTATE_BOOL_NO || + !((flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE) && + (STRPREFIX(def->ifname, VIR_NET_GENERATED_TAP_PREFIX) || + (prefix && STRPREFIX(def->ifname, prefix)))))) { /* Skip auto-generated target names for inactive config. */ virBufferEscapeString(&attrBuf, " dev='%s'", def->ifname); } + if (def->managed_tap != VIR_TRISTATE_BOOL_ABSENT) { + virBufferAsprintf(&attrBuf, " managed='%s'", + virTristateBoolTypeToString(def->managed_tap)); + } if (virXMLFormatElement(buf, "target", &attrBuf, NULL) < 0) return -1; - + if (def->ifname_guest || def->ifname_guest_actual) { virBufferAddLit(buf, "<guest"); /* Skip auto-generated target names for inactive config. */ diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 33cef5b75c..9be05514b0 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1004,6 +1004,7 @@ struct _virDomainNetDef { char *script; char *domain_name; /* backend domain name */ char *ifname; /* interface name on the host (<target dev='x'/>) */ + int managed_tap; /* enum virTristateBool - ABSENT == YES */ virNetDevIPInfo hostIP; char *ifname_guest_actual; char *ifname_guest; diff --git a/tests/qemuxml2argvdata/net-eth-unmanaged-tap.xml b/tests/qemuxml2argvdata/net-eth-unmanaged-tap.xml new file mode 100644 index 0000000000..7f5a0c217b --- /dev/null +++ b/tests/qemuxml2argvdata/net-eth-unmanaged-tap.xml @@ -0,0 +1,35 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <disk type='block' device='disk'> + <driver name='qemu' type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <interface type='ethernet'> + <mac address='fe:11:22:33:44:55'/> + <target dev='mytap0' managed='no'/> + <model type='virtio'/> + </interface> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='none'/> + </devices> +</domain> diff --git a/tests/qemuxml2xmloutdata/net-eth-unmanaged-tap.xml b/tests/qemuxml2xmloutdata/net-eth-unmanaged-tap.xml new file mode 100644 index 0000000000..cdff179932 --- /dev/null +++ b/tests/qemuxml2xmloutdata/net-eth-unmanaged-tap.xml @@ -0,0 +1,40 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <disk type='block' device='disk'> + <driver name='qemu' type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <interface type='ethernet'> + <mac address='fe:11:22:33:44:55'/> + <target dev='mytap0' managed='no'/> + <model type='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </interface> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='none'/> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 0ea4ae0342..a286d6a5e1 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -408,6 +408,7 @@ mymain(void) DO_TEST("net-eth", NONE); DO_TEST("net-eth-ifname", NONE); DO_TEST("net-eth-hostip", NONE); + DO_TEST("net-eth-unmanaged-tap", NONE); DO_TEST("net-virtio-network-portgroup", NONE); DO_TEST("net-virtio-rxtxqueuesize", NONE); DO_TEST("net-hostdev", NONE); -- 2.21.0

On Tue, Aug 27, 2019 at 09:46:35PM -0400, Laine Stump wrote:
Although <interface type='ethernet'> has always been able to use an existing tap device, this is just a coincidence due to the fact that the same ioctl is used to create a new tap device or get a handle to an existing device.
Even then, once we have the handle to the device, we still insist on doing extra setup to it (setting the MAC address and IFF_UP). That *might* be okay if libvirtd is running as a privileged process, but if libvirtd is running as an unprivileged user, those attempted modifications to the tap device will fail (yes, even if the tap is set to be owned by the user running libvirtd). We could avoid this if we knew that the device already existed, but as stated above, an existing device and new device are both accessed in the same manner, and anyway, we need to preserve existing behavior for those who are already using pre-existing devices with privileged libvirtd (and allowing/expecting libvirt to configure the pre-existing device).
In order to cleanly support the idea of using a pre-existing and pre-configured tap device, this patch introduces a new optional attribute "managed" for the interface <target> element. This attribute is only valid for <interface type='ethernet'> (since all other interface types have mandatory config that doesn't apply in the case where we expect the tap device to be setup before we get it). The syntax would look something like this:
<interface type='ethernet'> <target dev='mytap0' managed='no'/> ... </interface>
This patch just adds managed to the grammar and parser for <target>, but has no functionality behind it.
(NB: when managed='no' (the default when not specified is 'yes'), the target dev is always a name explicitly provided, so we don't auto-remove it from the config just because it starts with "vnet" (VIR_NET_GENERATED_TAP_PREFIX); this makes it possible to use the same pattern of names that libvirt itself uses when it automatically creates the tap devices.)
Signed-off-by: Laine Stump <laine@redhat.com> --- docs/formatdomain.html.in | 48 +++++++++++++---- docs/schemas/domaincommon.rng | 5 ++ src/conf/domain_conf.c | 51 +++++++++++++++---- src/conf/domain_conf.h | 1 + .../net-eth-unmanaged-tap.xml | 35 +++++++++++++ .../net-eth-unmanaged-tap.xml | 40 +++++++++++++++ tests/qemuxml2xmltest.c | 1 + 7 files changed, 160 insertions(+), 21 deletions(-) create mode 100644 tests/qemuxml2argvdata/net-eth-unmanaged-tap.xml create mode 100644 tests/qemuxml2xmloutdata/net-eth-unmanaged-tap.xml
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

If managed='no', then the tap device must already exist, and setting of MAC address and online status (IFF_UP) is skipped. NB: we still set IFF_VNET_HDR and IFF_MULTI_QUEUE as appropriate, because those bits must be properly set in the TUNSETIFF we use to set the tap device name of the handle we've opened - if IFF_VNET_HDR has not been set and we set it the request will be honored even when running libvirtd unprivileged; if IFF_MULTI_QUEUE is requested to be different than how it was created, that will result in an error from the kernel. This means that you don't need to pay attention to IFF_VNET_HDR when creating the tap devices, but you *do* need to set IFF_MULTI_QUEUE if you're going to use multiple queues for your tap device. NB2: /dev/vhost-net normally has permissions 600, so it can't be opened by an unprivileged process. This would normally cause a warning message when using a virtio net device from an unprivileged libvirtd. I've found that setting the permissions for /dev/vhost-net permits unprivileged libvirtd to use vhost-net for virtio devices, but have no idea what sort of security implications that has. I haven't changed libvrit's code to avoid *attempting* to open /dev/vhost-net - if you are concerned about the security of opening up permissions of /dev/vhost-net (probably a good idea at least until we ask someone who knows about the code) then add <driver name='qemu'/> to the interface definition and you'll avoid the warning message. Note that virNetDevTapCreate() is the correct function to call in the case of an existing device, because the same ioctl() that creates a new tap device will also open an existing tap device. Resolves: https://bugzilla.redhat.com/1723367 (partially) Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_interface.c | 77 +++++++++++-------- src/qemu/qemu_process.c | 2 +- src/util/virnetdev.h | 2 +- .../net-eth-unmanaged-tap.args | 32 ++++++++ tests/qemuxml2argvmock.c | 16 +++- tests/qemuxml2argvtest.c | 1 + 6 files changed, 96 insertions(+), 34 deletions(-) create mode 100644 tests/qemuxml2argvdata/net-eth-unmanaged-tap.args diff --git a/src/qemu/qemu_interface.c b/src/qemu/qemu_interface.c index 1e3b7f0d06..446f43c364 100644 --- a/src/qemu/qemu_interface.c +++ b/src/qemu/qemu_interface.c @@ -428,36 +428,53 @@ qemuInterfaceEthernetConnect(virDomainDefPtr def, if (virDomainNetIsVirtioModel(net)) tap_create_flags |= VIR_NETDEV_TAP_CREATE_VNET_HDR; - if (!net->ifname || - STRPREFIX(net->ifname, VIR_NET_GENERATED_TAP_PREFIX) || - strchr(net->ifname, '%')) { - VIR_FREE(net->ifname); - if (VIR_STRDUP(net->ifname, VIR_NET_GENERATED_TAP_PREFIX "%d") < 0) - goto cleanup; - /* avoid exposing vnet%d in getXMLDesc or error outputs */ - template_ifname = true; - } - if (virNetDevTapCreate(&net->ifname, tunpath, tapfd, tapfdSize, - tap_create_flags) < 0) { - goto cleanup; - } - - /* The tap device's MAC address cannot match the MAC address - * used by the guest. This results in "received packet on - * vnetX with own address as source address" error logs from - * the kernel. - */ - virMacAddrSet(&tapmac, &net->mac); - if (tapmac.addr[0] == 0xFE) - tapmac.addr[0] = 0xFA; - else - tapmac.addr[0] = 0xFE; - - if (virNetDevSetMAC(net->ifname, &tapmac) < 0) - goto cleanup; - - if (virNetDevSetOnline(net->ifname, true) < 0) - goto cleanup; + if (net->managed_tap == VIR_TRISTATE_BOOL_NO) { + if (!net->ifname) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("target dev must be supplied when managed='no'")); + goto cleanup; + } + if (virNetDevExists(net->ifname) != 1) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("target managed='no' but specified dev doesn't exist")); + goto cleanup; + } + if (virNetDevTapCreate(&net->ifname, tunpath, tapfd, tapfdSize, + tap_create_flags) < 0) { + goto cleanup; + } + } else { + if (!net->ifname || + STRPREFIX(net->ifname, VIR_NET_GENERATED_TAP_PREFIX) || + strchr(net->ifname, '%')) { + VIR_FREE(net->ifname); + if (VIR_STRDUP(net->ifname, VIR_NET_GENERATED_TAP_PREFIX "%d") < 0) + goto cleanup; + /* avoid exposing vnet%d in getXMLDesc or error outputs */ + template_ifname = true; + } + if (virNetDevTapCreate(&net->ifname, tunpath, tapfd, tapfdSize, + tap_create_flags) < 0) { + goto cleanup; + } + + /* The tap device's MAC address cannot match the MAC address + * used by the guest. This results in "received packet on + * vnetX with own address as source address" error logs from + * the kernel. + */ + virMacAddrSet(&tapmac, &net->mac); + if (tapmac.addr[0] == 0xFE) + tapmac.addr[0] = 0xFA; + else + tapmac.addr[0] = 0xFE; + + if (virNetDevSetMAC(net->ifname, &tapmac) < 0) + goto cleanup; + + if (virNetDevSetOnline(net->ifname, true) < 0) + goto cleanup; + } if (net->script && virNetDevRunEthernetScript(net->ifname, net->script) < 0) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index c9921646e9..11c1ba8fb9 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -7548,7 +7548,7 @@ void qemuProcessStop(virQEMUDriverPtr driver, cfg->stateDir)); break; case VIR_DOMAIN_NET_TYPE_ETHERNET: - if (net->ifname) { + if (net->managed_tap != VIR_TRISTATE_BOOL_NO && net->ifname) { ignore_value(virNetDevTapDelete(net->ifname, net->backend.tap)); VIR_FREE(net->ifname); } diff --git a/src/util/virnetdev.h b/src/util/virnetdev.h index 563b218227..6ff98487cb 100644 --- a/src/util/virnetdev.h +++ b/src/util/virnetdev.h @@ -150,7 +150,7 @@ int virNetDevSetupControl(const char *ifname, ATTRIBUTE_RETURN_CHECK; int virNetDevExists(const char *brname) - ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; + ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK ATTRIBUTE_NOINLINE; int virNetDevSetOnline(const char *ifname, bool online) diff --git a/tests/qemuxml2argvdata/net-eth-unmanaged-tap.args b/tests/qemuxml2argvdata/net-eth-unmanaged-tap.args new file mode 100644 index 0000000000..2bb99e96da --- /dev/null +++ b/tests/qemuxml2argvdata/net-eth-unmanaged-tap.args @@ -0,0 +1,32 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-i686 \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=tcg,usb=off,dump-guest-core=off \ +-m 214 \ +-realtime mlock=off \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-usb \ +-drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \ +-device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \ +-netdev tap,fd=3,id=hostnet0,vhost=on,vhostfd=44 \ +-device virtio-net-pci,netdev=hostnet0,id=net0,mac=fe:11:22:33:44:55,bus=pci.0,\ +addr=0x3 diff --git a/tests/qemuxml2argvmock.c b/tests/qemuxml2argvmock.c index 3f0c1c3fef..a386dd17be 100644 --- a/tests/qemuxml2argvmock.c +++ b/tests/qemuxml2argvmock.c @@ -149,8 +149,12 @@ virNetDevTapCreate(char **ifname, for (i = 0; i < tapfdSize; i++) tapfd[i] = STDERR_FILENO + 1 + i; - VIR_FREE(*ifname); - return VIR_STRDUP(*ifname, "vnet0"); + if (STREQ_NULLABLE(*ifname, "mytap0")) { + return 0; + } else { + VIR_FREE(*ifname); + return VIR_STRDUP(*ifname, "vnet0"); + } } int @@ -160,6 +164,14 @@ virNetDevSetMAC(const char *ifname ATTRIBUTE_UNUSED, return 0; } + +int +virNetDevExists(const char *ifname) +{ + return STREQ_NULLABLE(ifname, "mytap0"); +} + + int virNetDevIPAddrAdd(const char *ifname ATTRIBUTE_UNUSED, virSocketAddr *addr ATTRIBUTE_UNUSED, virSocketAddr *peer ATTRIBUTE_UNUSED, diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index e9f45775e5..05060b2e9f 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1302,6 +1302,7 @@ mymain(void) DO_TEST("net-eth-ifname", NONE); DO_TEST("net-eth-names", NONE); DO_TEST("net-eth-hostip", NONE); + DO_TEST("net-eth-unmanaged-tap", NONE); DO_TEST("net-client", NONE); DO_TEST("net-server", NONE); DO_TEST("net-many-models", NONE); -- 2.21.0

On Tue, Aug 27, 2019 at 09:46:36PM -0400, Laine Stump wrote:
If managed='no', then the tap device must already exist, and setting of MAC address and online status (IFF_UP) is skipped.
NB: we still set IFF_VNET_HDR and IFF_MULTI_QUEUE as appropriate, because those bits must be properly set in the TUNSETIFF we use to set the tap device name of the handle we've opened - if IFF_VNET_HDR has not been set and we set it the request will be honored even when running libvirtd unprivileged; if IFF_MULTI_QUEUE is requested to be different than how it was created, that will result in an error from the kernel. This means that you don't need to pay attention to IFF_VNET_HDR when creating the tap devices, but you *do* need to set IFF_MULTI_QUEUE if you're going to use multiple queues for your tap device.
NB2: /dev/vhost-net normally has permissions 600, so it can't be opened by an unprivileged process. This would normally cause a warning message when using a virtio net device from an unprivileged libvirtd. I've found that setting the permissions for /dev/vhost-net permits unprivileged libvirtd to use vhost-net for virtio devices, but have no idea what sort of security implications that has. I haven't changed libvrit's code to avoid *attempting* to open /dev/vhost-net - if you are concerned about the security of opening up permissions of /dev/vhost-net (probably a good idea at least until we ask someone who knows about the code) then add <driver name='qemu'/> to the interface definition and you'll avoid the warning message.
Note that virNetDevTapCreate() is the correct function to call in the case of an existing device, because the same ioctl() that creates a new tap device will also open an existing tap device.
Resolves: https://bugzilla.redhat.com/1723367 (partially) Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_interface.c | 77 +++++++++++-------- src/qemu/qemu_process.c | 2 +- src/util/virnetdev.h | 2 +- .../net-eth-unmanaged-tap.args | 32 ++++++++ tests/qemuxml2argvmock.c | 16 +++- tests/qemuxml2argvtest.c | 1 + 6 files changed, 96 insertions(+), 34 deletions(-) create mode 100644 tests/qemuxml2argvdata/net-eth-unmanaged-tap.args
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Traditionally, macvtap devices are supported using <interface type='direct'>, but that type requires specifying a source device name and macvtap mode which can't be altered after the initial device creation (and may not even be available to the management software that's creating the XML config to feed to libvirt). But the attributes in the <source> are essentially describing how the device will be connected to the network, and if libvirt is to be supplied with the name of a macvtap device that has already been created, that device will also already be connected to the network (and the connection can't be changed). Thus it seems more appropriate to use type='ethernet', which was created explicitly for this purpose - for devices that have already been (or will be) connected to the external network by someone/something outside of libvirt. The fact that it is a *macv*tap rather than a contentional tap device is just a detail. This patch supports using an existing macvtap device with <interface type='ethernet'> by checking the supplied target dev name to see if it is a macvtap device and, when this is the case, calling virNetDevMacVLanTapOpen() instead of virNetDevTapCreate(). For consistency, this is only done when target managed='no'. Resolves: https://bugzilla.redhat.com/1723367 (partially) Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_interface.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_interface.c b/src/qemu/qemu_interface.c index 446f43c364..83580b1a82 100644 --- a/src/qemu/qemu_interface.c +++ b/src/qemu/qemu_interface.c @@ -439,9 +439,18 @@ qemuInterfaceEthernetConnect(virDomainDefPtr def, _("target managed='no' but specified dev doesn't exist")); goto cleanup; } - if (virNetDevTapCreate(&net->ifname, tunpath, tapfd, tapfdSize, - tap_create_flags) < 0) { - goto cleanup; + if (virNetDevMacVLanIsMacvtap(net->ifname)) { + auditdev = net->ifname; + if (virNetDevMacVLanTapOpen(net->ifname, tapfd, tapfdSize) < 0) + goto cleanup; + if (virNetDevMacVLanTapSetup(tapfd, tapfdSize, + virDomainNetIsVirtioModel(net)) < 0) { + goto cleanup; + } + } else { + if (virNetDevTapCreate(&net->ifname, tunpath, tapfd, tapfdSize, + tap_create_flags) < 0) + goto cleanup; } } else { if (!net->ifname || -- 2.21.0

On Tue, Aug 27, 2019 at 09:46:37PM -0400, Laine Stump wrote:
Traditionally, macvtap devices are supported using <interface type='direct'>, but that type requires specifying a source device name and macvtap mode which can't be altered after the initial device creation (and may not even be available to the management software that's creating the XML config to feed to libvirt).
But the attributes in the <source> are essentially describing how the device will be connected to the network, and if libvirt is to be supplied with the name of a macvtap device that has already been created, that device will also already be connected to the network (and the connection can't be changed). Thus it seems more appropriate to use type='ethernet', which was created explicitly for this purpose - for devices that have already been (or will be) connected to the external network by someone/something outside of libvirt. The fact that it is a *macv*tap rather than a contentional tap device is just a detail.
This patch supports using an existing macvtap device with <interface type='ethernet'> by checking the supplied target dev name to see if it is a macvtap device and, when this is the case, calling virNetDevMacVLanTapOpen() instead of virNetDevTapCreate(). For consistency, this is only done when target managed='no'.
Resolves: https://bugzilla.redhat.com/1723367 (partially) Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_interface.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

libvirt creates its tap devices without the IFF_PERSIST flag, so they will be automatically deleted when qemu is finished with them. In the case of tap devices created outside of libvirt, if the creating entity wants the devices to be deleted, it will also omit IFF_PERSIST, but if it wants them to remain (e.g. for re-use), then it will use IFF_PERSIST when creating the device. Back when support was added for autocreation by libvirt of tap devices for <interface type='ethernet'> (commit 9c17d665), code was mistakenly put in qemuProcessStop to always delete tap devices for type='ethernet'. This should only be done on platforms that have VIR_NETDEV_TAP_REQUIRE_MANUAL_CLEANUP #defined (which is only FreeBSD). This mistake has been corrected, along with the unnecessary check for non-null net->ifname (it must always be non-null), and erroneous VIR_FREE of net->ifname. Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_process.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 11c1ba8fb9..3449abf2ec 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -7548,10 +7548,10 @@ void qemuProcessStop(virQEMUDriverPtr driver, cfg->stateDir)); break; case VIR_DOMAIN_NET_TYPE_ETHERNET: - if (net->managed_tap != VIR_TRISTATE_BOOL_NO && net->ifname) { +#ifdef VIR_NETDEV_TAP_REQUIRE_MANUAL_CLEANUP + if (net->managed_tap != VIR_TRISTATE_BOOL_NO) ignore_value(virNetDevTapDelete(net->ifname, net->backend.tap)); - VIR_FREE(net->ifname); - } +#endif break; case VIR_DOMAIN_NET_TYPE_BRIDGE: case VIR_DOMAIN_NET_TYPE_NETWORK: -- 2.21.0

On Tue, Aug 27, 2019 at 09:46:38PM -0400, Laine Stump wrote:
libvirt creates its tap devices without the IFF_PERSIST flag, so they will be automatically deleted when qemu is finished with them. In the case of tap devices created outside of libvirt, if the creating entity wants the devices to be deleted, it will also omit IFF_PERSIST, but if it wants them to remain (e.g. for re-use), then it will use IFF_PERSIST when creating the device.
Back when support was added for autocreation by libvirt of tap devices for <interface type='ethernet'> (commit 9c17d665), code was mistakenly put in qemuProcessStop to always delete tap devices for type='ethernet'. This should only be done on platforms that have VIR_NETDEV_TAP_REQUIRE_MANUAL_CLEANUP #defined (which is only FreeBSD).
This isn't right. The tap devices should *always* be deleted as we don't trust that QEMU hasn't (possibly maliciously) set IFF_PERSIST itself.
This mistake has been corrected, along with the unnecessary check for non-null net->ifname (it must always be non-null), and erroneous VIR_FREE of net->ifname.
There could be a risk of net->ifname being NULL if qemuProcessStart fails early in startup before all tap devices have finished being created IIUC.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_process.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 11c1ba8fb9..3449abf2ec 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -7548,10 +7548,10 @@ void qemuProcessStop(virQEMUDriverPtr driver, cfg->stateDir)); break; case VIR_DOMAIN_NET_TYPE_ETHERNET: - if (net->managed_tap != VIR_TRISTATE_BOOL_NO && net->ifname) { +#ifdef VIR_NETDEV_TAP_REQUIRE_MANUAL_CLEANUP + if (net->managed_tap != VIR_TRISTATE_BOOL_NO) ignore_value(virNetDevTapDelete(net->ifname, net->backend.tap)); - VIR_FREE(net->ifname); - } +#endif break; case VIR_DOMAIN_NET_TYPE_BRIDGE: case VIR_DOMAIN_NET_TYPE_NETWORK: -- 2.21.0
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 9/6/19 5:16 AM, Daniel P. Berrangé wrote:
On Tue, Aug 27, 2019 at 09:46:38PM -0400, Laine Stump wrote:
libvirt creates its tap devices without the IFF_PERSIST flag, so they will be automatically deleted when qemu is finished with them. In the case of tap devices created outside of libvirt, if the creating entity wants the devices to be deleted, it will also omit IFF_PERSIST, but if it wants them to remain (e.g. for re-use), then it will use IFF_PERSIST when creating the device.
Back when support was added for autocreation by libvirt of tap devices for <interface type='ethernet'> (commit 9c17d665), code was mistakenly put in qemuProcessStop to always delete tap devices for type='ethernet'. This should only be done on platforms that have VIR_NETDEV_TAP_REQUIRE_MANUAL_CLEANUP #defined (which is only FreeBSD). This isn't right. The tap devices should *always* be deleted as we don't trust that QEMU hasn't (possibly maliciously) set IFF_PERSIST itself.
(So you're saying this is a security issue that was coincidentally fixed by commit 9c17d665?) Interesting point. But I wonder if it's also problematic that (in the case of a tap device created by someone else (not libvirt) who purposefully set IFF_PERSIST) QEMU could mistakenly/maliciously *clear* IFF_PERSIST. I guess there's really nothing we could do about that though, since the device would already be deleted by the time we found out about it... I found this bit of code specifically because I was creating tap devices with IFF_PERSIST set (that's just what "ip tuntap add" does), and they were disappearing after each use, which may or may not be what the user wants - another case of "someone" clearing IFF_PERSIST, but in this case it is *us*. And as a matter of fact I can't see a way to even force macvtap devices to be deleted by an unprivileged process - when I had libvirt try to do the standard delete, it would fail. So having this unconditional forced delete of all standard tap devices both causes an unexpected behavior for some users, as well as creating an inconsistency between tap and macvtap behavior (standard taps are always deleted, macvtaps are never deleted). (This reminds me of another inconsistency I saw while looking at this, but then later forgot - virNetDevTapDelete() is *never * called in the case of hot-unplug. So if you think that we should be unconditionally deleting all taps after use regardless of the previous state of IFF_PERSIST of pre-created taps, then we should also be doing it for hot-unplug.) So how about if we remember the setting of IFF_PERSIST prior to starting QEMU, and restore it to its previous state afterwards? That would make behavior more what was expected / consistent with macvtap. To change the topic a bit - I actually find it strange that some of the IF flags can be modified by an unprivileged process and some can't. From what I've seen (if I'm remembering my experiments correctly - I didn't take notes, but the implementation is based on the following observations), an unprivileged process can set/clear: IFF_VNET_HDR IFF_PERSIST but can't set/clear IFF_MULTI_QUEUE IFF_UP I can't really see a way to categorize the implications of these flags to justify the difference - each looks just as important/potentially disruptive/not as the other. (Also it's not possible to change the MAC address). And to top that off, the method of getting a handle to a standard tap device is via opening /dev/tun/tap and then calling TUNSETIFF (which magically makes the handle you have to /dev/tun/tap be a handle to the specific tap device), and if you request the wrong setting of an "unmodifiable" IF flag in the TUNSETIFF ifreq struct, it will fail. For example, if the tap was created with IFF_MULTI_QUEUE and libvirt doesn't set IFF_MULTI_QUEUE in the TUNSETIFF, the ioctl will fail (and the same for the opposite setting of the flags). For IFF_VNET_HDR, though, the value it was created with is essentially ignored, and the setting given in libvirt's TUNSETIFF is honored. The result is that we can't just "leave all the flags alone", we have to make sure some are set the same as at tap creation, and others are set as we want them to be (regardless of how the tap was created). (The list of which flags can/can't be set by an unprivileged process is at least consistent between tap and macvtap, although it's problematic that you can clear IFF_PERSIST for a tap (effectively deleting it), but *can't* do an RTM_DELLINK on a macvtap). I went in assuming that *none* of the flags would be changeable, and I could just not set any of them, but was sorely disappointed - I have to set IFF_VNET_HDR appropriately or performance of virtio devices will suffer, and I have to set IFF_MULTI_QUEUE appropriately or the TUNSETIFF will fail completely.
This mistake has been corrected, along with the unnecessary check for non-null net->ifname (it must always be non-null), and erroneous VIR_FREE of net->ifname. There could be a risk of net->ifname being NULL if qemuProcessStart fails early in startup before all tap devices have finished being created IIUC.
Ah, I hadn't thought of the case where qemuProcessStop() is just being used to clean up after a failed qemuProcessStart(). I'll go back and recheck.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_process.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 11c1ba8fb9..3449abf2ec 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -7548,10 +7548,10 @@ void qemuProcessStop(virQEMUDriverPtr driver, cfg->stateDir)); break; case VIR_DOMAIN_NET_TYPE_ETHERNET: - if (net->managed_tap != VIR_TRISTATE_BOOL_NO && net->ifname) { +#ifdef VIR_NETDEV_TAP_REQUIRE_MANUAL_CLEANUP + if (net->managed_tap != VIR_TRISTATE_BOOL_NO) ignore_value(virNetDevTapDelete(net->ifname, net->backend.tap)); - VIR_FREE(net->ifname); - } +#endif break; case VIR_DOMAIN_NET_TYPE_BRIDGE: case VIR_DOMAIN_NET_TYPE_NETWORK: -- 2.21.0
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list Regards, Daniel

On Fri, Sep 06, 2019 at 11:37:12AM -0400, Laine Stump wrote:
On 9/6/19 5:16 AM, Daniel P. Berrangé wrote:
On Tue, Aug 27, 2019 at 09:46:38PM -0400, Laine Stump wrote:
libvirt creates its tap devices without the IFF_PERSIST flag, so they will be automatically deleted when qemu is finished with them. In the case of tap devices created outside of libvirt, if the creating entity wants the devices to be deleted, it will also omit IFF_PERSIST, but if it wants them to remain (e.g. for re-use), then it will use IFF_PERSIST when creating the device.
Back when support was added for autocreation by libvirt of tap devices for <interface type='ethernet'> (commit 9c17d665), code was mistakenly put in qemuProcessStop to always delete tap devices for type='ethernet'. This should only be done on platforms that have VIR_NETDEV_TAP_REQUIRE_MANUAL_CLEANUP #defined (which is only FreeBSD). This isn't right. The tap devices should *always* be deleted as we don't trust that QEMU hasn't (possibly maliciously) set IFF_PERSIST itself.
(So you're saying this is a security issue that was coincidentally fixed by commit 9c17d665?)
I'm not sure I'd call it a security issue - more of a reliability issue - since its really just a denial of service at most. We just want to be sure we're not leaving anything behind when QEMU quits.
Interesting point. But I wonder if it's also problematic that (in the case of a tap device created by someone else (not libvirt) who purposefully set IFF_PERSIST) QEMU could mistakenly/maliciously *clear* IFF_PERSIST. I guess there's really nothing we could do about that though, since the device would already be deleted by the time we found out about it...
I found this bit of code specifically because I was creating tap devices with IFF_PERSIST set (that's just what "ip tuntap add" does), and they were disappearing after each use, which may or may not be what the user wants - another case of "someone" clearing IFF_PERSIST, but in this case it is *us*.
If they don't want the device deleted, then they can set managed=no now with your patch series.
And as a matter of fact I can't see a way to even force macvtap devices to be deleted by an unprivileged process - when I had libvirt try to do the standard delete, it would fail. So having this unconditional forced delete of all standard tap devices both causes an unexpected behavior for some users, as well as creating an inconsistency between tap and macvtap behavior (standard taps are always deleted, macvtaps are never deleted).
We don't currently support pre-createds macvtap, so we can fix this inconsistency so that it works the way tap devs do.
(This reminds me of another inconsistency I saw while looking at this, but then later forgot - virNetDevTapDelete() is *never * called in the case of hot-unplug. So if you think that we should be unconditionally deleting all taps after use regardless of the previous state of IFF_PERSIST of pre-created taps, then we should also be doing it for hot-unplug.)
So how about if we remember the setting of IFF_PERSIST prior to starting QEMU, and restore it to its previous state afterwards? That would make behavior more what was expected / consistent with macvtap.
I don't think we need rmemeber IFF_PERSIST when we have the "managed" flag now. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 9/6/19 11:46 AM, Daniel P. Berrangé wrote:
On Fri, Sep 06, 2019 at 11:37:12AM -0400, Laine Stump wrote:
On 9/6/19 5:16 AM, Daniel P. Berrangé wrote:
On Tue, Aug 27, 2019 at 09:46:38PM -0400, Laine Stump wrote:
libvirt creates its tap devices without the IFF_PERSIST flag, so they will be automatically deleted when qemu is finished with them. In the case of tap devices created outside of libvirt, if the creating entity wants the devices to be deleted, it will also omit IFF_PERSIST, but if it wants them to remain (e.g. for re-use), then it will use IFF_PERSIST when creating the device.
Back when support was added for autocreation by libvirt of tap devices for <interface type='ethernet'> (commit 9c17d665), code was mistakenly put in qemuProcessStop to always delete tap devices for type='ethernet'. This should only be done on platforms that have VIR_NETDEV_TAP_REQUIRE_MANUAL_CLEANUP #defined (which is only FreeBSD). This isn't right. The tap devices should *always* be deleted as we don't trust that QEMU hasn't (possibly maliciously) set IFF_PERSIST itself.
(So you're saying this is a security issue that was coincidentally fixed by commit 9c17d665?) I'm not sure I'd call it a security issue - more of a reliability issue - since its really just a denial of service at most. We just want to be sure we're not leaving anything behind when QEMU quits.
Interesting point. But I wonder if it's also problematic that (in the case of a tap device created by someone else (not libvirt) who purposefully set IFF_PERSIST) QEMU could mistakenly/maliciously *clear* IFF_PERSIST. I guess there's really nothing we could do about that though, since the device would already be deleted by the time we found out about it...
I found this bit of code specifically because I was creating tap devices with IFF_PERSIST set (that's just what "ip tuntap add" does), and they were disappearing after each use, which may or may not be what the user wants - another case of "someone" clearing IFF_PERSIST, but in this case it is *us*. If they don't want the device deleted, then they can set managed=no now with your patch series.
Right. In the case of <interface type='ethernet'> anyway. There *could* be cases where someone is using a pre-existing tap device with <interface type='bridge'> (that works, although only by the coincidence that TUNSETIFF will create the specified device if it doesn't exist, or just return a handle to any existing device), but I suppose anyone already doing that will already be accustomed to the current (since 2016) behavior. And I'm trying to think of a good reason why somebody would want to use a pre-existing tap device but still rely on libvirt to connect it to the bridge, and I can't think of anything, so the number of new people who would want to do this is probably vanishingly small. TL;DR - I'm convinced. I think for consistency I will make a patch that does the same thing both for shutting qemuProcessStop() and qemuDomainRemoveNetDevice() (hot-unplug). (Really it looks like a good reorganization is in order - there are multiple bits that cycle through all the netdefs to shutdown various things, and they all should just happen in one place, if for no other reason than so that we can sequester them off in qemu_interface.c, provide a simple API, then make that into its own module. Or "block", some would say :-)
And as a matter of fact I can't see a way to even force macvtap devices to be deleted by an unprivileged process - when I had libvirt try to do the standard delete, it would fail. So having this unconditional forced delete of all standard tap devices both causes an unexpected behavior for some users, as well as creating an inconsistency between tap and macvtap behavior (standard taps are always deleted, macvtaps are never deleted). We don't currently support pre-createds macvtap, so we can fix this inconsistency so that it works the way tap devs do.
Right. And since the only way to use a pre-created macvtap is with managed=no, we've done that (again, as long as QEMU doesn't clear IFF_PERSIST on standard taps).
(This reminds me of another inconsistency I saw while looking at this, but then later forgot - virNetDevTapDelete() is *never * called in the case of hot-unplug. So if you think that we should be unconditionally deleting all taps after use regardless of the previous state of IFF_PERSIST of pre-created taps, then we should also be doing it for hot-unplug.)
So how about if we remember the setting of IFF_PERSIST prior to starting QEMU, and restore it to its previous state afterwards? That would make behavior more what was expected / consistent with macvtap. I don't think we need rmemeber IFF_PERSIST when we have the "managed" flag now.
Yeah, the only situation that would warrant intervention would be to try and recover from QEMU maliciously clearing IFF_PERSIST, but by the time libvirt gets control it's too late anyway - the tap is already deleted. As always, thanks for the reviews!

with info about support for using precreated tap/macvtap devices in unprivileged libvirtd. Signed-off-by: Laine Stump <laine@redhat.com> --- docs/news.xml | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/docs/news.xml b/docs/news.xml index c6580e4e72..be51d6c953 100644 --- a/docs/news.xml +++ b/docs/news.xml @@ -50,6 +50,19 @@ for Hyper-V guests. </description> </change> + <change> + <summary> + qemu: Support use of precreated tap/macvtap devices by unprivileged libvirtd + </summary> + <description> + It is now possible for an unprivileged libvirtd to make use + of tap and macvtap devices that were previously created by + some other entity. Since is done by setting + <code>managed='no'</code> along with the device name in the + <code>target</code> subelement of <code><interface + type='ethernet'></code> + </description> + </change> </section> <section title="Removed features"> <change> -- 2.21.0

On Tue, 2019-08-27 at 21:46 -0400, Laine Stump wrote:
with info about support for using precreated tap/macvtap devices in unprivileged libvirtd.
Signed-off-by: Laine Stump <laine@redhat.com> --- docs/news.xml | 13 +++++++++++++ 1 file changed, 13 insertions(+)
You mentioned this was never reviewed, so here we go :)
+ <change> + <summary> + qemu: Support use of precreated tap/macvtap devices by unprivileged libvirtd + </summary> + <description> + It is now possible for an unprivileged libvirtd to make use + of tap and macvtap devices that were previously created by + some other entity. Since is done by setting
s/Since/This/
+ <code>managed='no'</code> along with the device name in the + <code>target</code> subelement of <code><interface + type='ethernet'></code>
The description should end with a period. With the above fixed, and the patch updated so that it actually applies, Reviewed-by: Andrea Bolognani <abologna@redhat.com> -- Andrea Bolognani / Red Hat / Virtualization

ping. Aside from review, I'm interested whether or not there is agreement with using type='ethernet' for precreated *macvtap* devices as well as standard tap - on one hand macvtap devices have traditionally been handled by type='direct'; on the other hand, type='direct' requires/uses info about how the device is connected to the network, which we don't have / can't use when the device is pre-created by someone else, and type='ethernet is specifically for network devices that have their connection setup "elsewhere" outside of libvirt (it's really just a quirk of their implementations that the method for getting a handle to a standard tap device is by opening /dev/tun/tap and calling ioctl(TUNSETIFF), while the way to get a handle for a macvtap device is to open /dev/tap${ifindex} - they're otherwise essentially the same thing from the POV of qemu, just an fd that's used as a gazinta/gazouta for packets). On 8/27/19 9:46 PM, Laine Stump wrote:
This resolves https://bugzilla.redhat.com/1723367
It has become more popular to run libvirtd in an unprivileged environment (e.g. inside a container), but until now the only possible types of network connection for a qemu started by an unprivileged libvirtd were:
1) a usermode slirp connection
2) a tap device connection to a bridge handled by running qemu-bridge-helper (a suid-root utility distributed with qemu)
3) a host network card assigned to the guest using VFIO (this requires special setup by a privileged process though)
This patch series remedies that by making it possible for libvirtd to use a tap device that has been pre-created (*and* properly setup) by some other process beforehand.
In order to use this, you must have a standard tap, or macvtap device that has been set to be owned by the uid that will be running libvirtd, has its MAC address already set, and has been set online (IFF_UP). For example, here are the commands to create a standard tap device named "mytap0", attach it to the host bridge device "br0" and prepare it for use by a libvirtd that is running as user "laine":
ip tuntap add mode tap user laine group laine name mytap0 ip link set mytap0 up ip link set mytap0 master br0
(You may want to set a specific MAC address for the tap device, but as long as it *doesn't* match the MAC address used by the guest emulated device, it really doesn't matter)
You can now add the following <interface> to a domain definition:
<interface type='ethernet'> <model type='virtio'/> <mac address='52:54:00:11:11:11'/> <target dev='mytap0' managed='no'/> </interface>
and start up the guest.
A similar set of commands to create a macvtap device named "mymacvtap0" with MAC addres 52:54:00:11:11:11 connected to the host device "en2" would be something like this:
ip link add link en2 name mymacvtap0 address 52:54:00:11:11:11 \ type macvtap mode bridge ip link set mymacvtap0 up
The XML would be identical, except the name of the device
<interface type='ethernet'> <model type='virtio'/> <mac address='52:54:00:11:11:11'/> <target dev='mymacvtap0' managed='no'/> </interface>
(Note that in the case of macvtap, the precreated device must *match* the MAC address of the emulated guest device).
If libvirtd is given a precreated device, that device will *not* be explicitly deleted when qemu is finished with it - the caller must take care of that.
Laine Stump (9): util: new function virNetDevMacVLanIsMacvtap() util: make a couple virNetDevMacVlan*() functions public qemu: reorganize qemuInterfaceEthernetConnect() conf: use virXMLFormatElement for interface <target> conf: new "managed" attribute for target dev of <interface type='ethernet'> qemu: support unmanaged target tap dev for <interface type='ethernet'> qemu: support unmanaged macvtap devices with <interface type='ethernet'> qemu: explicitly delete standard tap devices only on platforms that require it docs: update news file
docs/formatdomain.html.in | 48 +++++++--- docs/news.xml | 13 +++ docs/schemas/domaincommon.rng | 5 ++ src/conf/domain_conf.c | 55 +++++++++--- src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 3 + src/qemu/qemu_interface.c | 89 ++++++++++++------- src/qemu/qemu_process.c | 6 +- src/util/virnetdev.h | 2 +- src/util/virnetdevmacvlan.c | 35 ++++++-- src/util/virnetdevmacvlan.h | 12 +++ .../net-eth-unmanaged-tap.args | 32 +++++++ .../net-eth-unmanaged-tap.xml | 35 ++++++++ tests/qemuxml2argvmock.c | 16 +++- tests/qemuxml2argvtest.c | 1 + .../net-eth-unmanaged-tap.xml | 40 +++++++++ tests/qemuxml2xmltest.c | 1 + 17 files changed, 329 insertions(+), 65 deletions(-) create mode 100644 tests/qemuxml2argvdata/net-eth-unmanaged-tap.args create mode 100644 tests/qemuxml2argvdata/net-eth-unmanaged-tap.xml create mode 100644 tests/qemuxml2xmloutdata/net-eth-unmanaged-tap.xml
participants (4)
-
Andrea Bolognani
-
Daniel P. Berrangé
-
Laine Stump
-
Laine Stump