[libvirt-users] QEMU interface type=ethernet

With libvirt under modern kernels, you can't use <interface type='ethernet'> unless QEMU is running as root. Running QEMU as root is not ideal, but I was able to track the issue down to this Linux change:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ca...

This means that if you're seeing errors like this:

2015-03-02T18:00:51.243477Z qemu-kvm: -netdev tap,script=/tmp/vnet380622.sh,id=hostnet1: could not open /dev/net/tun: Operation not permitted
2015-03-02T18:00:51.243518Z qemu-kvm: -netdev tap,script=/tmp/vnet380622.sh,id=hostnet1: Device 'tap' could not be initialized

they can be resolved like this:

1) Edit /etc/libvirt/qemu.conf, and add "/dev/net/tun" to the cgroup_device_acl option
2) Run: setcap cap_net_admin+eip /bin/qemu-system-x86_64

This will give QEMU CAP_NET_ADMIN when it runs. Make sure you review `man capabilities` to see what this capability actually allows QEMU to do.

The downside here is that in the event a guest somehow breaks out of QEMU, CAP_NET_ADMIN gives them a bunch of scary permissions that could result in you having a seriously bad day (it's enough permissions to MITM all the machine's traffic, which could easily result in compromise).

It looks to me like libvirt already has the ability to create tap devices and pass them into QEMU (src/util/virnetdevtap.c - virNetDevTapCreateInBridgePort), however you need to actually be using a bridged network to do this. There is no way to have libvirt just create a tap device and leave the rest to user-defined scripts.

I don't think I have the necessary knowledge to add that feature in a generic way, but it seems like it would be pretty handy. I'll probably just work around it by removing the virNetDevBridgeAddPort call from our version of libvirt.
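As a rough way to confirm that the setcap step took effect, one could read the CapEff mask from /proc/<pid>/status of the running QEMU process and test bit 12, which is CAP_NET_ADMIN in linux/capability.h. The sketch below is only an illustration and is not taken from libvirt or QEMU; the function names and the SKETCH_CAP_NET_ADMIN macro are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* CAP_NET_ADMIN is capability number 12 in <linux/capability.h>.
 * The macro name is hypothetical, used only in this sketch. */
#define SKETCH_CAP_NET_ADMIN 12

/* Parse one line of /proc/<pid>/status. Returns 1 and stores the mask
 * if the line is the "CapEff:" entry, 0 otherwise. */
int parse_capeff_line(const char *line, uint64_t *mask)
{
    static const char key[] = "CapEff:";
    const char *digits = line + sizeof(key) - 1;
    char *end;
    unsigned long long value;

    if (strncmp(line, key, sizeof(key) - 1) != 0)
        return 0;
    value = strtoull(digits, &end, 16);   /* skips the leading tab */
    if (end == digits)
        return 0;                         /* no hex digits after the key */
    *mask = (uint64_t)value;
    return 1;
}

/* Returns 1 if the CAP_NET_ADMIN bit is set in the mask, 0 otherwise. */
int mask_has_net_admin(uint64_t mask)
{
    return (int)((mask >> SKETCH_CAP_NET_ADMIN) & 1u);
}

/* Returns 1 if the process has CAP_NET_ADMIN in its effective set,
 * 0 if it does not, and -1 if the status file could not be read. */
int pid_has_net_admin(const char *status_path)
{
    char line[256];
    uint64_t mask;
    int result = -1;
    FILE *fp = fopen(status_path, "r");

    if (!fp)
        return -1;
    while (fgets(line, sizeof(line), fp)) {
        if (parse_capeff_line(line, &mask)) {
            result = mask_has_net_admin(mask);
            break;
        }
    }
    fclose(fp);
    return result;
}
```

Calling pid_has_net_admin("/proc/<pid>/status") with the PID of the guest's QEMU process should return 1 once the file capability is in place, and 0 for a QEMU binary that was started without it.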

On 3/2/2015 1:41 PM, Brian Rak wrote:
In IRC, I was directed to this patch: https://www.redhat.com/archives/libvir-list/2015-February/msg01212.html ... which does exactly what I was looking for. It doesn't build cleanly in that state, but it's a pretty trivial fix (it needs actualType added to the function definition for qemuNetworkIfaceConnect, and the two calls modified to pass it).

2015-03-02 23:41 GMT+03:00 Brian Rak <brak@gameservers.com>:
In IRC, I was directed to this patch: https://www.redhat.com/archives/libvir-list/2015-February/msg01212.html ... which does exactly what I was looking for. It doesn't build cleanly in that state, but it's a pretty trivial fix (it needs actualType added to the function definition for qemuNetworkIfaceConnect, and the two calls modified to pass it).
I sent a new version of the patch one day ago, and it is waiting for review.

--
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru
jabber: vase@selfip.ru

On 3/3/2015 1:49 AM, Vasiliy Tolstov wrote:
I sent a new version of the patch one day ago, and it is waiting for review.
We've been using this patch for a couple months now, and it's working perfectly. I noticed that it no longer applied cleanly against 1.2.14, so I updated it:

diff -urw src_clean/src/qemu/qemu_command.c src/src/qemu/qemu_command.c
--- src_clean/src/qemu/qemu_command.c 2015-03-26 22:01:44.000000000 -0400
+++ src/src/qemu/qemu_command.c 2015-04-21 11:34:03.363772741 -0400
@@ -330,10 +330,41 @@
     return *tapfd < 0 ? -1 : 0;
 }
 
+/**
+ * qemuExecuteEthernetScript:
+ * @ifname: the interface name
+ * @script: the script name
+ * This function executes script for new tap device created by libvirt.
+ * Returns 0 in case of success or -1 on failure
+ */
+static int qemuExecuteEthernetScript(const char *ifname, const char *script)
+{
+    virCommandPtr cmd;
+    int ret;
+
+    cmd = virCommandNew(script);
+    virCommandAddArgFormat(cmd, "%s", ifname);
+    virCommandClearCaps(cmd);
+#ifdef CAP_NET_ADMIN
+    virCommandAllowCap(cmd, CAP_NET_ADMIN);
+#endif
+    virCommandAddEnvPassCommon(cmd);
+
+    if (virCommandRun(cmd, NULL) < 0) {
+        ret = -1;
+    } else {
+        ret = 0;
+    }
+
+    virCommandFree(cmd);
+    return ret;
+}
+
 /* qemuNetworkIfaceConnect - *only* called if actualType is
- * VIR_DOMAIN_NET_TYPE_NETWORK or VIR_DOMAIN_NET_TYPE_BRIDGE (i.e. if
- * the connection is made with a tap device connecting to a bridge
- * device)
+ * VIR_DOMAIN_NET_TYPE_NETWORK, VIR_DOMAIN_NET_TYPE_BRIDGE or
+ * VIR_DOMAIN_NET_TYPE_ETHERNET (i.e. if the connection is
+ * made with a tap device connecting to a bridge device or
+ * used ethernet tap device)
  */
 int
 qemuNetworkIfaceConnect(virDomainDefPtr def,
@@ -341,7 +372,8 @@
                         virDomainNetDefPtr net,
                         virQEMUCapsPtr qemuCaps,
                         int *tapfd,
-                        size_t *tapfdSize)
+                        size_t *tapfdSize,
+                        int actualType)
 {
     const char *brname;
     int ret = -1;
@@ -359,11 +391,6 @@
         }
     }
 
-    if (!(brname = virDomainNetGetActualBridgeName(net))) {
-        virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Missing bridge name"));
-        goto cleanup;
-    }
-
     if (!net->ifname ||
         STRPREFIX(net->ifname, VIR_NET_GENERATED_PREFIX) ||
         strchr(net->ifname, '%')) {
@@ -379,6 +406,22 @@
         tap_create_flags |= VIR_NETDEV_TAP_CREATE_VNET_HDR;
     }
 
+    if (actualType == VIR_DOMAIN_NET_TYPE_ETHERNET) {
+        if (virNetDevTapCreate(&net->ifname, tunpath, tapfd, *tapfdSize,
+                               tap_create_flags) < 0) {
+            virDomainAuditNetDevice(def, net, tunpath, false);
+            goto cleanup;
+        }
+        if (net->script) {
+            if (qemuExecuteEthernetScript(net->ifname, net->script) < 0)
+                goto cleanup;
+        }
+    } else {
+        if (!(brname = virDomainNetGetActualBridgeName(net))) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Missing bridge name"));
+            goto cleanup;
+        }
+
     if (cfg->privileged) {
         if (virNetDevTapCreateInBridgePort(brname, &net->ifname, &net->mac,
                                            def->uuid, tunpath, tapfd, *tapfdSize,
@@ -415,8 +458,8 @@
             *tapfdSize = 1;
         }
     }
-
     virDomainAuditNetDevice(def, net, tunpath, true);
+    }
 
     if (cfg->macFilter &&
         ebtablesAddForwardAllowIn(driver->ebtables,
@@ -5123,6 +5166,7 @@
     case VIR_DOMAIN_NET_TYPE_BRIDGE:
     case VIR_DOMAIN_NET_TYPE_NETWORK:
     case VIR_DOMAIN_NET_TYPE_DIRECT:
+    case VIR_DOMAIN_NET_TYPE_ETHERNET:
         virBufferAsprintf(&buf, "tap%c", type_sep);
         /* for one tapfd 'fd=' shall be used,
          * for more than one 'fds=' is the right choice */
@@ -5140,20 +5184,6 @@
         is_tap = true;
         break;
 
-    case VIR_DOMAIN_NET_TYPE_ETHERNET:
-        virBufferAddLit(&buf, "tap");
-        if (net->ifname) {
-            virBufferAsprintf(&buf, "%cifname=%s", type_sep, net->ifname);
-            type_sep = ',';
-        }
-        if (net->script) {
-            virBufferAsprintf(&buf, "%cscript=%s", type_sep,
-                              net->script);
-            type_sep = ',';
-        }
-        is_tap = true;
-        break;
-
     case VIR_DOMAIN_NET_TYPE_CLIENT:
         virBufferAsprintf(&buf, "socket%cconnect=%s:%d",
                           type_sep,
@@ -8009,7 +8039,8 @@
         /* Currently nothing besides TAP devices supports multiqueue. */
         if (net->driver.virtio.queues > 0 &&
             !(actualType == VIR_DOMAIN_NET_TYPE_NETWORK ||
-              actualType == VIR_DOMAIN_NET_TYPE_BRIDGE)) {
+              actualType == VIR_DOMAIN_NET_TYPE_BRIDGE ||
+              actualType == VIR_DOMAIN_NET_TYPE_ETHERNET)) {
             virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
                            _("Multiqueue network is not supported for: %s"),
                            virDomainNetTypeToString(actualType));
@@ -8026,7 +8057,8 @@
         }
 
         if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK ||
-            actualType == VIR_DOMAIN_NET_TYPE_BRIDGE) {
+            actualType == VIR_DOMAIN_NET_TYPE_BRIDGE ||
+            actualType == VIR_DOMAIN_NET_TYPE_ETHERNET) {
             tapfdSize = net->driver.virtio.queues;
             if (!tapfdSize)
                 tapfdSize = 1;
@@ -8039,7 +8071,7 @@
 
             if (qemuNetworkIfaceConnect(def, driver, net,
                                         qemuCaps, tapfd,
-                                        &tapfdSize) < 0)
+                                        &tapfdSize, actualType) < 0)
                 goto cleanup;
         } else if (actualType == VIR_DOMAIN_NET_TYPE_DIRECT) {
             if (VIR_ALLOC(tapfd) < 0 || VIR_ALLOC(tapfdName) < 0)
diff -urw src_clean/src/qemu/qemu_command.h src/src/qemu/qemu_command.h
--- src_clean/src/qemu/qemu_command.h 2015-03-25 03:36:59.000000000 -0400
+++ src/src/qemu/qemu_command.h 2015-04-21 11:33:26.557397006 -0400
@@ -223,7 +223,8 @@
                             virDomainNetDefPtr net,
                             virQEMUCapsPtr qemuCaps,
                             int *tapfd,
-                            size_t *tapfdSize)
+                            size_t *tapfdSize,
+                            int qemuNetworkIfaceConnect)
     ATTRIBUTE_NONNULL(2);
 
 int qemuPhysIfaceConnect(virDomainDefPtr def,
Only in src/src/qemu: qemu_command.h.orig
diff -urw src_clean/src/qemu/qemu_hotplug.c src/src/qemu/qemu_hotplug.c
--- src_clean/src/qemu/qemu_hotplug.c 2015-03-25 03:36:59.000000000 -0400
+++ src/src/qemu/qemu_hotplug.c 2015-04-21 11:34:34.810243144 -0400
@@ -898,7 +898,8 @@
     /* Currently nothing besides TAP devices supports multiqueue. */
     if (net->driver.virtio.queues > 0 &&
         !(actualType == VIR_DOMAIN_NET_TYPE_NETWORK ||
-          actualType == VIR_DOMAIN_NET_TYPE_BRIDGE)) {
+          actualType == VIR_DOMAIN_NET_TYPE_BRIDGE ||
+          actualType == VIR_DOMAIN_NET_TYPE_ETHERNET)) {
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
                        _("Multiqueue network is not supported for: %s"),
                        virDomainNetTypeToString(actualType));
@@ -906,7 +907,8 @@
     }
 
     if (actualType == VIR_DOMAIN_NET_TYPE_BRIDGE ||
-        actualType == VIR_DOMAIN_NET_TYPE_NETWORK) {
+        actualType == VIR_DOMAIN_NET_TYPE_NETWORK ||
+        actualType == VIR_DOMAIN_NET_TYPE_ETHERNET) {
         tapfdSize = vhostfdSize = net->driver.virtio.queues;
         if (!tapfdSize)
             tapfdSize = vhostfdSize = 1;
@@ -917,7 +919,7 @@
             goto cleanup;
         memset(vhostfd, -1, sizeof(*vhostfd) * vhostfdSize);
         if (qemuNetworkIfaceConnect(vm->def, driver, net,
-                                    priv->qemuCaps, tapfd, &tapfdSize) < 0)
+                                    priv->qemuCaps, tapfd, &tapfdSize, actualType) < 0)
             goto cleanup;
         iface_connected = true;
         if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps, vhostfd, &vhostfdSize) < 0)
@@ -937,13 +939,6 @@
         iface_connected = true;
         if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps, vhostfd, &vhostfdSize) < 0)
             goto cleanup;
-    } else if (actualType == VIR_DOMAIN_NET_TYPE_ETHERNET) {
-        vhostfdSize = 1;
-        if (VIR_ALLOC(vhostfd) < 0)
-            goto cleanup;
-        *vhostfd = -1;
-        if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps, vhostfd, &vhostfdSize) < 0)
-            goto cleanup;
     }
 
     /* Set device online immediately */
diff -urw src_clean/src/qemu/qemu_process.c src/src/qemu/qemu_process.c
--- src_clean/src/qemu/qemu_process.c 2015-03-30 20:36:17.000000000 -0400
+++ src/src/qemu/qemu_process.c 2015-04-21 11:33:26.559396972 -0400
@@ -5248,6 +5248,12 @@
                              cfg->stateDir));
             VIR_FREE(net->ifname);
             break;
+        case VIR_DOMAIN_NET_TYPE_ETHERNET:
+            if (net->ifname) {
+                ignore_value(virNetDevTapDelete(net->ifname, net->backend.tap));
+                VIR_FREE(net->ifname);
+            }
+            break;
         case VIR_DOMAIN_NET_TYPE_BRIDGE:
         case VIR_DOMAIN_NET_TYPE_NETWORK:
 #ifdef VIR_NETDEV_TAP_REQUIRE_MANUAL_CLEANUP
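For readers unfamiliar with what "libvirt creates the tap device and hands it to QEMU" means at the kernel level, the following is a minimal sketch of the usual Linux tun/tap sequence: open /dev/net/tun, then issue TUNSETIFF with IFF_TAP. It is not libvirt's virNetDevTapCreate, only an illustration of the same idea; tap_request_init and tap_open are hypothetical names, and the TUNSETIFF call is expected to fail for a process that lacks CAP_NET_ADMIN.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Fill in the TUNSETIFF request for a tap device called `name`.
 * Returns 0 on success, -1 if the name does not fit in IFNAMSIZ. */
int tap_request_init(struct ifreq *ifr, const char *name, int vnet_hdr)
{
    if (strlen(name) >= IFNAMSIZ)
        return -1;
    memset(ifr, 0, sizeof(*ifr));
    /* IFF_TAP: layer-2 device; IFF_NO_PI: no extra packet-info header. */
    ifr->ifr_flags = IFF_TAP | IFF_NO_PI;
    if (vnet_hdr)
        ifr->ifr_flags |= IFF_VNET_HDR;   /* virtio-net header on frames */
    strcpy(ifr->ifr_name, name);          /* length checked above */
    return 0;
}

/* Create (or attach to) the tap device and return its fd, or -1 on error.
 * The caller owns the fd and is responsible for closing it. */
int tap_open(const char *name, int vnet_hdr)
{
    struct ifreq ifr;
    int fd;

    if (tap_request_init(&ifr, name, vnet_hdr) < 0)
        return -1;
    fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0)
        return -1;
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With the patch above, the privileged side (libvirt) does the equivalent of tap_open, runs the user script with the interface name as its only argument, and QEMU only receives the already-open descriptor, so QEMU itself would not need CAP_NET_ADMIN.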
Participants (2): Brian Rak, Vasiliy Tolstov