[libvirt] [PATCH v2 0/2] qemu: invoke qemu-bridge-helper from libvirtd

The <interface type='bridge'> is working mostly because of a peculiar design decision in Linux. Ideally, QEMU would run with an empty capability bounding set and would not be able to do any privileged operation (not even by running a helper program). This is not the case because dropping capabilities from the bounding set requires a capability of its own, CAP_SETPCAP; thus QEMU does *not* run with an empty bounding set if invoked via qemu:///session. This is apparently for security reasons, to avoid that dropping _some_ caps but not all of them lets you exploit untested error paths in suid binaries. This series lets libvirtd invoke the privileged helper program on its own, which is a cleaner design that would work even if the above Linux quirk was not there. Also, this adds a <target dev='tap0'/> element to the XML of an active domain using <interface type='bridge'>. Thanks to the patches that have already been committed, the recvfd and virCommand APIs make the task almost trivial. v1->v2: OOM fix in patch 1, change label name in patch 2, rebase Paolo Bonzini (2): virnetdevtap: add virNetDevTapGetName qemu: launch bridge helper from libvirtd src/libvirt_private.syms | 1 + src/qemu/qemu_command.c | 133 ++++++++++++++++++++++++++++++++++------------- src/qemu/qemu_command.h | 1 - src/qemu/qemu_hotplug.c | 25 +++------ src/util/virnetdevtap.c | 33 ++++++++++++ src/util/virnetdevtap.h | 3 ++ 6 files changed, 143 insertions(+), 53 deletions(-) -- 1.8.2

This will be used on a tap file descriptor returned by the bridge helper to populate the <target> element, because the helper does not provide the interface name. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virnetdevtap.c | 33 +++++++++++++++++++++++++++++++++ src/util/virnetdevtap.h | 3 +++ 3 files changed, 37 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index f778e9c..eac5352 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1528,6 +1528,7 @@ virNetDevOpenvswitchSetMigrateData; virNetDevTapCreate; virNetDevTapCreateInBridgePort; virNetDevTapDelete; +virNetDevTapGetName; # util/virnetdevveth.h diff --git a/src/util/virnetdevtap.c b/src/util/virnetdevtap.c index e4ce223..5e88383 100644 --- a/src/util/virnetdevtap.c +++ b/src/util/virnetdevtap.c @@ -43,6 +43,40 @@ #define VIR_FROM_THIS VIR_FROM_NONE /** + * virNetDevTapGetName: + * @tapfd: a tun/tap file descriptor + * @ifname: a pointer that will receive the interface name + * + * Retrieve the interface name given a file descriptor for a tun/tap + * interface. + * + * Returns 0 if the interface name is successfully queried, -1 otherwise + */ +int +virNetDevTapGetName(int tapfd, char **ifname) +{ +#ifdef TUNGETIFF + struct ifreq ifr; + + if (ioctl(tapfd, TUNGETIFF, &ifr) < 0) { + virReportSystemError(errno, "%s", + _("Unable to query tap interface name")); + return -1; + } + + *ifname = strdup(ifr.ifr_name); + if (*ifname == NULL) { + virReportOOMError(); + return -1; + } + return 0; +#else + return -1; +#endif +} + + +/** * virNetDevProbeVnetHdr: * @tapfd: a tun/tap file descriptor * diff --git a/src/util/virnetdevtap.h b/src/util/virnetdevtap.h index 980db61..6bfc80c 100644 --- a/src/util/virnetdevtap.h +++ b/src/util/virnetdevtap.h @@ -35,6 +35,9 @@ int virNetDevTapCreate(char **ifname, int virNetDevTapDelete(const char *ifname) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; +int virNetDevTapGetName(int tapfd, char **ifname) + ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; + typedef enum { VIR_NETDEV_TAP_CREATE_NONE = 0, /* Bring the interface up */ -- 1.8.2

On Sat, Apr 20, 2013 at 11:11:24AM +0200, Paolo Bonzini wrote:
This will be used on a tap file descriptor returned by the bridge helper to populate the <target> element, because the helper does not provide the interface name.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virnetdevtap.c | 33 +++++++++++++++++++++++++++++++++ src/util/virnetdevtap.h | 3 +++ 3 files changed, 37 insertions(+)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

<source type='bridge'> uses a helper application to do the necessary TUN/TAP setup to use an existing network bridge, thus letting unprivileged users use TUN/TAP interfaces. However, libvirt should be preventing QEMU from running any setuid programs at all, which would include this helper program. From a security POV, any setuid helper needs to be run by libvirtd itself, not QEMU. This is what this patch does. libvirt now invokes the setuid helper, gets the TAP fd and then passes it to QEMU in the normal manner. The path to the helper is specified in qemu.conf. As a small advantage, this adds a <target dev='tap0'/> element to the XML of an active domain using <interface type='bridge'>. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/qemu/qemu_command.c | 133 +++++++++++++++++++++++++++++++++++------------- src/qemu/qemu_command.h | 1 - src/qemu/qemu_hotplug.c | 25 +++------ 3 files changed, 106 insertions(+), 53 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 05c12b2..cc10dd4 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -28,12 +28,14 @@ #include "qemu_capabilities.h" #include "qemu_bridge_filter.h" #include "cpu/cpu.h" +#include "passfd.h" #include "viralloc.h" #include "virlog.h" #include "virarch.h" #include "virerror.h" #include "virutil.h" #include "virfile.h" +#include "virnetdev.h" #include "virstring.h" #include "viruuid.h" #include "c-ctype.h" @@ -47,6 +49,9 @@ #include "device_conf.h" #include "virstoragefile.h" #include "virtpm.h" +#if defined(__linux__) +# include <linux/capability.h> +#endif #include <sys/stat.h> #include <fcntl.h> @@ -194,6 +199,77 @@ error: } +/** + * qemuCreateInBridgePortWithHelper: + * @cfg: the configuration object in which the helper name is looked up + * @brname: the bridge name + * @ifname: the returned interface name + * @macaddr: the returned MAC address + * @tapfd: file descriptor return value for the new tap device + * @flags: OR of virNetDevTapCreateFlags: + + * VIR_NETDEV_TAP_CREATE_VNET_HDR + * - Enable IFF_VNET_HDR on the tap device + * + * This function creates a new tap device on a bridge using an external + * helper. The final name for the bridge will be stored in @ifname. + * + * Returns 0 in case of success or -1 on failure + */ +static int qemuCreateInBridgePortWithHelper(virQEMUDriverConfigPtr cfg, + const char *brname, + char **ifname, + int *tapfd, + unsigned int flags) +{ + virCommandPtr cmd; + int status; + int pair[2] = { -1, -1 }; + + if ((flags & ~VIR_NETDEV_TAP_CREATE_VNET_HDR) != VIR_NETDEV_TAP_CREATE_IFUP) + return -1; + + if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0) { + virReportSystemError(errno, "%s", _("failed to create socket")); + return -1; + } + + cmd = virCommandNew(cfg->bridgeHelperName); + if (flags & VIR_NETDEV_TAP_CREATE_VNET_HDR) + virCommandAddArgFormat(cmd, "--use-vnet"); + virCommandAddArgFormat(cmd, "--br=%s", brname); + virCommandAddArgFormat(cmd, "--fd=%d", pair[1]); + virCommandTransferFD(cmd, pair[1]); + virCommandClearCaps(cmd); +#ifdef CAP_NET_ADMIN + virCommandAllowCap(cmd, CAP_NET_ADMIN); +#endif + if (virCommandRunAsync(cmd, NULL) < 0) { + *tapfd = -1; + goto cleanup; + } + + do { + *tapfd = recvfd(pair[0], 0); + } while (*tapfd < 0 && errno == EINTR); + if (*tapfd < 0) { + virReportSystemError(errno, "%s", + _("failed to retrieve file descriptor for interface")); + goto cleanup; + } + + if (virNetDevTapGetName(*tapfd, ifname) < 0 || + virCommandWait(cmd, &status) < 0) { + VIR_FORCE_CLOSE(*tapfd); + *tapfd = -1; + } + +cleanup: + virCommandFree(cmd); + VIR_FORCE_CLOSE(pair[0]); + return *tapfd < 0 ? -1 : 0; +} + int qemuNetworkIfaceConnect(virDomainDefPtr def, virConnectPtr conn, @@ -271,11 +347,17 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, tap_create_flags |= VIR_NETDEV_TAP_CREATE_VNET_HDR; } - err = virNetDevTapCreateInBridgePort(brname, &net->ifname, &net->mac, - def->uuid, &tapfd, - virDomainNetGetActualVirtPortProfile(net), - virDomainNetGetActualVlan(net), - tap_create_flags); + if (cfg->privileged) + err = virNetDevTapCreateInBridgePort(brname, &net->ifname, &net->mac, + def->uuid, &tapfd, + virDomainNetGetActualVirtPortProfile(net), + virDomainNetGetActualVlan(net), + tap_create_flags); + else + err = qemuCreateInBridgePortWithHelper(cfg, brname, + &net->ifname, + &tapfd, tap_create_flags); + virDomainAuditNetDevice(def, net, "/dev/net/tun", tapfd >= 0); if (err < 0) { if (template_ifname) @@ -3745,7 +3827,6 @@ error: char * qemuBuildHostNetStr(virDomainNetDefPtr net, virQEMUDriverPtr driver, - virQEMUCapsPtr qemuCaps, char type_sep, int vlan, const char *tapfd, @@ -3754,7 +3835,6 @@ qemuBuildHostNetStr(virDomainNetDefPtr net, bool is_tap = false; virBuffer buf = VIR_BUFFER_INITIALIZER; enum virDomainNetType netType = virDomainNetGetActualType(net); - const char *brname = NULL; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); if (net->script && netType != VIR_DOMAIN_NET_TYPE_ETHERNET) { @@ -3772,14 +3852,6 @@ qemuBuildHostNetStr(virDomainNetDefPtr net, * through, -net tap,fd */ case VIR_DOMAIN_NET_TYPE_BRIDGE: - if (!cfg->privileged && - virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV_BRIDGE)) { - brname = virDomainNetGetActualBridgeName(net); - virBufferAsprintf(&buf, "bridge%cbr=%s", type_sep, brname); - type_sep = ','; - is_tap = true; - break; - } case VIR_DOMAIN_NET_TYPE_NETWORK: case VIR_DOMAIN_NET_TYPE_DIRECT: virBufferAsprintf(&buf, "tap%cfd=%s", type_sep, tapfd); @@ -6780,26 +6852,17 @@ qemuBuildCommandLine(virConnectPtr conn, if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK || actualType == VIR_DOMAIN_NET_TYPE_BRIDGE) { - /* - * If type='bridge' then we attempt to allocate the tap fd here only if - * running under a privilged user or -netdev bridge option is not - * supported. - */ - if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK || - cfg->privileged || - (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV_BRIDGE))) { - int tapfd = qemuNetworkIfaceConnect(def, conn, driver, net, - qemuCaps); - if (tapfd < 0) - goto error; + int tapfd = qemuNetworkIfaceConnect(def, conn, driver, net, + qemuCaps); + if (tapfd < 0) + goto error; - last_good_net = i; - virCommandTransferFD(cmd, tapfd); + last_good_net = i; + virCommandTransferFD(cmd, tapfd); - if (snprintf(tapfd_name, sizeof(tapfd_name), "%d", - tapfd) >= sizeof(tapfd_name)) - goto no_memory; - } + if (snprintf(tapfd_name, sizeof(tapfd_name), "%d", + tapfd) >= sizeof(tapfd_name)) + goto no_memory; } else if (actualType == VIR_DOMAIN_NET_TYPE_DIRECT) { int tapfd = qemuPhysIfaceConnect(def, driver, net, qemuCaps, vmop); @@ -6843,7 +6906,7 @@ qemuBuildCommandLine(virConnectPtr conn, if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV) && virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE)) { virCommandAddArg(cmd, "-netdev"); - if (!(host = qemuBuildHostNetStr(net, driver, qemuCaps, + if (!(host = qemuBuildHostNetStr(net, driver, ',', vlan, tapfd_name, vhostfd_name))) goto error; @@ -6867,7 +6930,7 @@ qemuBuildCommandLine(virConnectPtr conn, if (!(virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV) && virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE))) { virCommandAddArg(cmd, "-net"); - if (!(host = qemuBuildHostNetStr(net, driver, qemuCaps, + if (!(host = qemuBuildHostNetStr(net, driver, ',', vlan, tapfd_name, vhostfd_name))) goto error; diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h index 1789c20..083cf5d 100644 --- a/src/qemu/qemu_command.h +++ b/src/qemu/qemu_command.h @@ -71,7 +71,6 @@ qemuBuildChrDeviceStr (virDomainChrDefPtr serial, /* With vlan == -1, use netdev syntax, else old hostnet */ char * qemuBuildHostNetStr(virDomainNetDefPtr net, virQEMUDriverPtr driver, - virQEMUCapsPtr qemuCaps, char type_sep, int vlan, const char *tapfd, diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 238d0d7..e6a2e9a 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -732,21 +732,12 @@ int qemuDomainAttachNetDevice(virConnectPtr conn, if (actualType == VIR_DOMAIN_NET_TYPE_BRIDGE || actualType == VIR_DOMAIN_NET_TYPE_NETWORK) { - /* - * If type=bridge then we attempt to allocate the tap fd here only if - * running under a privilged user or -netdev bridge option is not - * supported. - */ - if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK || - cfg->privileged || - (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_NETDEV_BRIDGE))) { - if ((tapfd = qemuNetworkIfaceConnect(vm->def, conn, driver, net, - priv->qemuCaps)) < 0) - goto cleanup; - iface_connected = true; - if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps, &vhostfd) < 0) - goto cleanup; - } + if ((tapfd = qemuNetworkIfaceConnect(vm->def, conn, driver, net, + priv->qemuCaps)) < 0) + goto cleanup; + iface_connected = true; + if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps, &vhostfd) < 0) + goto cleanup; } else if (actualType == VIR_DOMAIN_NET_TYPE_DIRECT) { if ((tapfd = qemuPhysIfaceConnect(vm->def, driver, net, priv->qemuCaps, @@ -806,12 +797,12 @@ int qemuDomainAttachNetDevice(virConnectPtr conn, if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_NETDEV) && virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE)) { - if (!(netstr = qemuBuildHostNetStr(net, driver, priv->qemuCaps, + if (!(netstr = qemuBuildHostNetStr(net, driver, ',', -1, tapfd_name, vhostfd_name))) goto cleanup; } else { - if (!(netstr = qemuBuildHostNetStr(net, driver, priv->qemuCaps, + if (!(netstr = qemuBuildHostNetStr(net, driver, ' ', vlan, tapfd_name, vhostfd_name))) goto cleanup; -- 1.8.2

On Sat, Apr 20, 2013 at 11:11:25AM +0200, Paolo Bonzini wrote:
<source type='bridge'> uses a helper application to do the necessary TUN/TAP setup to use an existing network bridge, thus letting unprivileged users use TUN/TAP interfaces.
However, libvirt should be preventing QEMU from running any setuid programs at all, which would include this helper program. From a security POV, any setuid helper needs to be run by libvirtd itself, not QEMU.
This is what this patch does. libvirt now invokes the setuid helper, gets the TAP fd and then passes it to QEMU in the normal manner. The path to the helper is specified in qemu.conf.
As a small advantage, this adds a <target dev='tap0'/> element to the XML of an active domain using <interface type='bridge'>.
That's very good because it allows the network interfaces stats API to work
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/qemu/qemu_command.c | 133 +++++++++++++++++++++++++++++++++++------------- src/qemu/qemu_command.h | 1 - src/qemu/qemu_hotplug.c | 25 +++------ 3 files changed, 106 insertions(+), 53 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Il 20/04/2013 11:11, Paolo Bonzini ha scritto:
The <interface type='bridge'> is working mostly because of a peculiar design decision in Linux. Ideally, QEMU would run with an empty capability bounding set and would not be able to do any privileged operation (not even by running a helper program). This is not the case because dropping capabilities from the bounding set requires a capability of its own, CAP_SETPCAP; thus QEMU does *not* run with an empty bounding set if invoked via qemu:///session. This is apparently for security reasons, to avoid that dropping _some_ caps but not all of them lets you exploit untested error paths in suid binaries.
This series lets libvirtd invoke the privileged helper program on its own, which is a cleaner design that would work even if the above Linux quirk was not there. Also, this adds a <target dev='tap0'/> element to the XML of an active domain using <interface type='bridge'>.
Thanks to the patches that have already been committed, the recvfd and virCommand APIs make the task almost trivial.
v1->v2: OOM fix in patch 1, change label name in patch 2, rebase
Paolo Bonzini (2): virnetdevtap: add virNetDevTapGetName qemu: launch bridge helper from libvirtd
src/libvirt_private.syms | 1 + src/qemu/qemu_command.c | 133 ++++++++++++++++++++++++++++++++++------------- src/qemu/qemu_command.h | 1 - src/qemu/qemu_hotplug.c | 25 +++------ src/util/virnetdevtap.c | 33 ++++++++++++ src/util/virnetdevtap.h | 3 ++ 6 files changed, 143 insertions(+), 53 deletions(-)
Please apply. :) Paolo

On 04/26/2013 03:32 PM, Paolo Bonzini wrote:
Il 20/04/2013 11:11, Paolo Bonzini ha scritto:
The <interface type='bridge'> is working mostly because of a peculiar design decision in Linux. Ideally, QEMU would run with an empty capability bounding set and would not be able to do any privileged operation (not even by running a helper program). This is not the case because dropping capabilities from the bounding set requires a capability of its own, CAP_SETPCAP; thus QEMU does *not* run with an empty bounding set if invoked via qemu:///session. This is apparently for security reasons, to avoid that dropping _some_ caps but not all of them lets you exploit untested error paths in suid binaries.
This series lets libvirtd invoke the privileged helper program on its own, which is a cleaner design that would work even if the above Linux quirk was not there. Also, this adds a <target dev='tap0'/> element to the XML of an active domain using <interface type='bridge'>.
Thanks to the patches that have already been committed, the recvfd and virCommand APIs make the task almost trivial.
v1->v2: OOM fix in patch 1, change label name in patch 2, rebase
Please apply. :)
Done. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (3)
-
Daniel P. Berrange
-
Eric Blake
-
Paolo Bonzini