[PATCH 0/2] qemu: Set noqueue qdisc for TAP devices

See 2/2 for detailed explanation. Long story short - we can squeeze more bandwidth from TAP devices we create for domains. Michal Prívozník (2): virnetdev: Introduce virNetDevSetRootQDisc() qemu: Set noqueue qdisc for TAP devices src/libvirt_private.syms | 1 + src/qemu/qemu_command.c | 2 ++ src/qemu/qemu_domain.c | 36 +++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 4 ++++ src/qemu/qemu_hotplug.c | 2 ++ src/util/virnetdev.c | 46 ++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 3 +++ 7 files changed, 94 insertions(+) -- 2.26.2

This helper changes the root qdisc on given interface. Ideally, it would be written using netlink but my attempts to write the code were not successful and thus I've fallen back to virCommand() + tc. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virnetdev.c | 46 ++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 3 +++ 3 files changed, 50 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 7cf8bea962..0778bc6d5b 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2581,6 +2581,7 @@ virNetDevSetOnline; virNetDevSetPromiscuous; virNetDevSetRcvAllMulti; virNetDevSetRcvMulti; +virNetDevSetRootQDisc; virNetDevSetupControl; virNetDevSysfsFile; virNetDevValidateConfig; diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index e711a6dc8a..5c7660dab4 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -3394,3 +3394,49 @@ virNetDevRunEthernetScript(const char *ifname, const char *script) return virCommandRun(cmd, NULL); } + + +/** + * virNetDevSetRootQDisc: + * @ifname: the interface name + * @qdisc: queueing discipline to set + * + * For given interface @ifname set its root queueing discipline + * to @qdisc. This can be used to replace the default qdisc + * (usually pfifo_fast or whatever is set in + * /proc/sys/net/core/default_qdisc) with different qdisc. + * + * Returns: 0 on success, + * -1 if failed to exec tc (with error reported) + * -2 if tc failed (with no error reported) + */ +int +virNetDevSetRootQDisc(const char *ifname, + const char *qdisc) +{ + g_autoptr(virCommand) cmd = NULL; + g_autofree char *outbuf = NULL; + g_autofree char *errbuf = NULL; + int status; + + /* Ideally, we would have a netlink implementation and just + * call it here. But honestly, I tried and failed miserably. + * Fallback to spawning tc. */ + cmd = virCommandNewArgList(TC, "qdisc", "add", "dev", ifname, + "root", "handle", "0:", qdisc, + NULL); + + virCommandAddEnvString(cmd, "LC_ALL=C"); + virCommandSetOutputBuffer(cmd, &outbuf); + virCommandSetErrorBuffer(cmd, &errbuf); + + if (virCommandRun(cmd, &status) < 0) + return -1; + + if (status != 0) { + VIR_DEBUG("Setting qdisc failed: output='%s' err='%s'", outbuf, errbuf); + return -2; + } + + return 0; +} diff --git a/src/util/virnetdev.h b/src/util/virnetdev.h index 5f581323ed..82943b8e08 100644 --- a/src/util/virnetdev.h +++ b/src/util/virnetdev.h @@ -312,4 +312,7 @@ int virNetDevSysfsFile(char **pf_sysfs_device_link, int virNetDevRunEthernetScript(const char *ifname, const char *script) G_GNUC_NO_INLINE; +int virNetDevSetRootQDisc(const char *ifname, + const char *qdisc); + G_DEFINE_AUTOPTR_CLEANUP_FUNC(virNetDevRxFilter, virNetDevRxFilterFree); -- 2.26.2

By default, pfifo_fast queueing discipline (qdisc) is set on newly created interfaces (including TAPs). This qdisc has three queues and packets that want to be sent through given NIC are placed into one of the queues based on TOS field. Queues are then emptied based on their priority allowing interactive sessions stay interactive whilst something else is downloading a large file. Obviously, this means that kernel has to be involved and some locking has to happen (when placing packets into queues). If virtualization is taken into account then the above algorithm happens twice - once in the guest and the second time in the host. This is arguably not optimal as it burns host CPU cycles needlessly. Guest already made it choice and sent packets in the order it wants. To resolve this, Linux kernel offers 'noqueue' qdisc which can be applied on virtual interfaces and in fact for 'lo' it is by default: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue Set it for other TAP devices we create for domains too. With this change I was able to squeeze 1Mbps more from a macvtap attached to a guest and to my 1Gbps LAN (as measured by iperf3). Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1329644 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_command.c | 2 ++ src/qemu/qemu_domain.c | 36 ++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 4 ++++ src/qemu/qemu_hotplug.c | 2 ++ 4 files changed, 44 insertions(+) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 9519861e92..eec860382c 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -8267,6 +8267,8 @@ qemuBuildInterfaceCommandLine(virQEMUDriverPtr driver, } } + qemuDomainInterfaceSetDefaultQDisc(driver, net); + if (net->mtu && virNetDevSetMTU(net->ifname, net->mtu) < 0) goto cleanup; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 9623123d3c..72da28666f 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11048,3 +11048,39 @@ qemuDomainFileWrapperFDClose(virDomainObjPtr vm, } return ret; } + + +/** + * qemuDomainInterfaceSetDefaultQDisc: + * @driver: QEMU driver + * @net: domain interface + * + * Set the noqueue qdisc on @net if running as privileged. The + * noqueue qdisc is a lockless transmit and thus faster than the + * default pfifo_fast (at least in theory). But we can modify + * root qdisc only if we have CAP_NET_ADMIN. + * + * Returns: 0 on success, + * -1 otherwise. + */ +int +qemuDomainInterfaceSetDefaultQDisc(virQEMUDriverPtr driver, + virDomainNetDefPtr net) +{ + virDomainNetType actualType = virDomainNetGetActualType(net); + + if (!driver->privileged || !net->ifname) + return 0; + + /* We want only those types which are represented as TAP + * devices in the host. */ + if (actualType == VIR_DOMAIN_NET_TYPE_ETHERNET || + actualType == VIR_DOMAIN_NET_TYPE_NETWORK || + actualType == VIR_DOMAIN_NET_TYPE_BRIDGE || + actualType == VIR_DOMAIN_NET_TYPE_DIRECT) { + if (virNetDevSetRootQDisc(net->ifname, "noqueue") < 0) + return -1; + } + + return 0; +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 9bf32e16c9..91b3b67cb6 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -1046,3 +1046,7 @@ qemuDomainOpenFile(virQEMUDriverPtr driver, int qemuDomainFileWrapperFDClose(virDomainObjPtr vm, virFileWrapperFdPtr fd); + +int +qemuDomainInterfaceSetDefaultQDisc(virQEMUDriverPtr driver, + virDomainNetDefPtr net); diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 7a54fcb221..2c184b9ba0 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1368,6 +1368,8 @@ qemuDomainAttachNetDevice(virQEMUDriverPtr driver, virNetDevSetMTU(net->ifname, net->mtu) < 0) goto cleanup; + qemuDomainInterfaceSetDefaultQDisc(driver, net); + for (i = 0; i < tapfdSize; i++) { if (qemuSecuritySetTapFDLabel(driver->securityManager, vm->def, tapfd[i]) < 0) -- 2.26.2

On 10/8/20 3:10 PM, Michal Privoznik wrote:
See 2/2 for detailed explanation. Long story short - we can squeeze more bandwidth from TAP devices we create for domains.
Michal Prívozník (2): virnetdev: Introduce virNetDevSetRootQDisc() qemu: Set noqueue qdisc for TAP devices
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Have you tried it out with older kernels? Reading the bug I understood that 4.2 and older (2015 kernels) does not have qdisc support and that you can, for example, test for NETIF_F_LLTX of the ETHTOOL_GFEATURES ioctl to verify it. I consider this to be a 'nice to have' that can be added in a follow up patch though, if applicable. By the way, do we have any documentation about "the latest Libvirt release will not care about N+ years old kernel/QEMU"? Thanks, DHB
src/libvirt_private.syms | 1 + src/qemu/qemu_command.c | 2 ++ src/qemu/qemu_domain.c | 36 +++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 4 ++++ src/qemu/qemu_hotplug.c | 2 ++ src/util/virnetdev.c | 46 ++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 3 +++ 7 files changed, 94 insertions(+)

On 10/13/20 3:09 PM, Daniel Henrique Barboza wrote:
On 10/8/20 3:10 PM, Michal Privoznik wrote:
See 2/2 for detailed explanation. Long story short - we can squeeze more bandwidth from TAP devices we create for domains.
Michal Prívozník (2): virnetdev: Introduce virNetDevSetRootQDisc() qemu: Set noqueue qdisc for TAP devices
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Thanks. Just before I merged these I realized that it will be a good idea to mock virNetDevSetRootQDisc() so that we don't run tc from the test suite. But the diff is really trivial: diff --git i/src/util/virnetdev.h w/src/util/virnetdev.h index 82943b8e08..dfef49938f 100644 --- i/src/util/virnetdev.h +++ w/src/util/virnetdev.h @@ -313,6 +313,7 @@ int virNetDevRunEthernetScript(const char *ifname, const char *script) G_GNUC_NO_INLINE; int virNetDevSetRootQDisc(const char *ifname, - const char *qdisc); + const char *qdisc) + G_GNUC_NO_INLINE; G_DEFINE_AUTOPTR_CLEANUP_FUNC(virNetDevRxFilter, virNetDevRxFilterFree); diff --git i/tests/qemuxml2argvmock.c w/tests/qemuxml2argvmock.c index 9bf4357b66..b9322f4f2a 100644 --- i/tests/qemuxml2argvmock.c +++ w/tests/qemuxml2argvmock.c @@ -286,3 +286,11 @@ qemuBuildTPMOpenBackendFDs(const char *tpmdev G_GNUC_UNUSED, *cancelfd = 1731; return 0; } + + +int +virNetDevSetRootQDisc(const char *ifname G_GNUC_UNUSED, + const char *qdisc G_GNUC_UNUSED) +{ + return 0; +} Hopefully, you are okay if I squash this to 2/2.
Have you tried it out with older kernels? Reading the bug I understood that 4.2 and older (2015 kernels) does not have qdisc support and that you can, for example, test for NETIF_F_LLTX of the ETHTOOL_GFEATURES ioctl to verify it.
I didn't, but I had this on mind when writing patches and I made the whole thing best effort. Note that qemuDomainInterfaceSetDefaultQDisc() retval is not checked for => if setting noqueue fails (for whatever reason) then worst case scenario an error is printed into logs.
I consider this to be a 'nice to have' that can be added in a follow up patch though, if applicable. By the way, do we have any documentation about "the latest Libvirt release will not care about N+ years old kernel/QEMU"?
I'm not sure what you mean. This perhaps? https://libvirt.org/platforms.html#linux-freebsd-and-macos Michal

On 10/13/20 11:06 AM, Michal Privoznik wrote:
On 10/13/20 3:09 PM, Daniel Henrique Barboza wrote:
On 10/8/20 3:10 PM, Michal Privoznik wrote:
See 2/2 for detailed explanation. Long story short - we can squeeze more bandwidth from TAP devices we create for domains.
Michal Prívozník (2): virnetdev: Introduce virNetDevSetRootQDisc() qemu: Set noqueue qdisc for TAP devices
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Thanks. Just before I merged these I realized that it will be a good idea to mock virNetDevSetRootQDisc() so that we don't run tc from the test suite. But the diff is really trivial:
diff --git i/src/util/virnetdev.h w/src/util/virnetdev.h index 82943b8e08..dfef49938f 100644 --- i/src/util/virnetdev.h +++ w/src/util/virnetdev.h @@ -313,6 +313,7 @@ int virNetDevRunEthernetScript(const char *ifname, const char *script) G_GNUC_NO_INLINE;
int virNetDevSetRootQDisc(const char *ifname, - const char *qdisc); + const char *qdisc) + G_GNUC_NO_INLINE;
G_DEFINE_AUTOPTR_CLEANUP_FUNC(virNetDevRxFilter, virNetDevRxFilterFree); diff --git i/tests/qemuxml2argvmock.c w/tests/qemuxml2argvmock.c index 9bf4357b66..b9322f4f2a 100644 --- i/tests/qemuxml2argvmock.c +++ w/tests/qemuxml2argvmock.c @@ -286,3 +286,11 @@ qemuBuildTPMOpenBackendFDs(const char *tpmdev G_GNUC_UNUSED, *cancelfd = 1731; return 0; } + + +int +virNetDevSetRootQDisc(const char *ifname G_GNUC_UNUSED, + const char *qdisc G_GNUC_UNUSED) +{ + return 0; +}
Hopefully, you are okay if I squash this to 2/2.
Yep, go ahead.
Have you tried it out with older kernels? Reading the bug I understood that 4.2 and older (2015 kernels) does not have qdisc support and that you can, for example, test for NETIF_F_LLTX of the ETHTOOL_GFEATURES ioctl to verify it.
I didn't, but I had this on mind when writing patches and I made the whole thing best effort. Note that qemuDomainInterfaceSetDefaultQDisc() retval is not checked for => if setting noqueue fails (for whatever reason) then worst case scenario an error is printed into logs.
I consider this to be a 'nice to have' that can be added in a follow up patch though, if applicable. By the way, do we have any documentation about "the latest Libvirt release will not care about N+ years old kernel/QEMU"?
I'm not sure what you mean. This perhaps?
Thanks, this is what I was looking for. DHB
Michal
participants (2)
-
Daniel Henrique Barboza
-
Michal Privoznik