[libvirt] [PATCH 0/7] QoS: Adapt to changing MAC

While we taught libvirt to listen to MAC address changes from guests we forgot to update our QoS hierarchy accordingly. The first 3 patches are pure bug fixes and can go in regardless of state of the rest. The 4th patch is a slight improvement as it makes filter priority predictable. Then, patches 5-7 actually fix the bug. Michal Privoznik (7): virNetDevBandwidthPlug: Update function description RNG schema: allow plain @floor to <bandwidth/> virDomainActualNetDefContentsFormat: Format class_id more frequently virNetDevBandwidthSet: Add priority to filter virnetdevbandwidth.c: Separate tc filter creation to a function Introduce virNetDevBandwidthUpdateFilter processNicRxFilterChangedEvent: Take appropriate actions for NET_TYPE_NETWORK too docs/schemas/networkcommon.rng | 8 +- src/conf/domain_conf.c | 3 +- src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 22 +++ src/util/virnetdevbandwidth.c | 186 ++++++++++++++++----- src/util/virnetdevbandwidth.h | 6 + .../qemuxml2argv-net-bandwidth2.xml | 65 +++++++ tests/qemuxml2xmltest.c | 1 + tests/virnetdevbandwidthtest.c | 4 +- 9 files changed, 250 insertions(+), 46 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-net-bandwidth2.xml -- 2.0.5

The comment is describing arguments passed to the function. However, there's no @ifmac argument. In 955af4d4 it was replaced with @ifmac_ptr. Unfortunately, the comment wasn't updated. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virnetdevbandwidth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 8041d52..d1c0f12 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -389,7 +389,7 @@ virNetDevBandwidthEqual(virNetDevBandwidthPtr a, * virNetDevBandwidthPlug: * @brname: name of the bridge * @net_bandwidth: QoS settings on @brname - * @ifmac: MAC of interface + * @ifmac_ptr: MAC of interface * @bandwidth: QoS settings for interface * @id: unique ID (MUST be greater than 2) * -- 2.0.5

On 04/14/2015 12:59 PM, Michal Privoznik wrote:
The comment is describing arguments passed to the function. However, there's no @ifmac argument. In 955af4d4 it was replaced with @ifmac_ptr. Unfortunately, the comment wasn't updated.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virnetdevbandwidth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 8041d52..d1c0f12 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -389,7 +389,7 @@ virNetDevBandwidthEqual(virNetDevBandwidthPtr a, * virNetDevBandwidthPlug: * @brname: name of the bridge * @net_bandwidth: QoS settings on @brname - * @ifmac: MAC of interface + * @ifmac_ptr: MAC of interface * @bandwidth: QoS settings for interface * @id: unique ID (MUST be greater than 2) *
ACK.

The <inbound/> element to <bandwidth/> has several attributes from which two are mandatory. Well, from two at least one has to be present: @average or @floor or both. Instead of inventing crazy RNG schema, let's make all the attributes optional there and rely on our parsing code to correctly handle the situation. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/schemas/networkcommon.rng | 8 ++- .../qemuxml2argv-net-bandwidth2.xml | 65 ++++++++++++++++++++++ tests/qemuxml2xmltest.c | 1 + 3 files changed, 71 insertions(+), 3 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-net-bandwidth2.xml diff --git a/docs/schemas/networkcommon.rng b/docs/schemas/networkcommon.rng index cc8b1dc..09691dc 100644 --- a/docs/schemas/networkcommon.rng +++ b/docs/schemas/networkcommon.rng @@ -152,9 +152,11 @@ </define> <define name="bandwidth-attributes"> - <attribute name="average"> - <ref name="speed"/> - </attribute> + <optional> + <attribute name="average"> + <ref name="speed"/> + </attribute> + </optional> <optional> <attribute name="peak"> <ref name="speed"/> diff --git a/tests/qemuxml2argvdata/qemuxml2argv-net-bandwidth2.xml b/tests/qemuxml2argvdata/qemuxml2argv-net-bandwidth2.xml new file mode 100644 index 0000000..10e15ee --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-net-bandwidth2.xml @@ -0,0 +1,65 @@ +<domain type='kvm'> + <name>f14-60</name> + <uuid>38644c45-5227-a936-3b38-bc4f72c355bb</uuid> + <memory unit='KiB'>1048576</memory> + <currentMemory unit='KiB'>1048576</currentMemory> + <vcpu placement='static'>2</vcpu> + <os> + <type arch='x86_64' machine='pc-0.13'>hvm</type> + <boot dev='cdrom'/> + <boot dev='hd'/> + <bootmenu enable='yes'/> + </os> + <features> + <acpi/> + <apic/> + <pae/> + </features> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>restart</on_crash> + <devices> + <emulator>/usr/bin/qemu-kvm</emulator> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2'/> + <source file='/var/lib/libvirt/images/f14-6.img'/> + <target dev='vda' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </disk> + <controller type='usb' index='0'/> + <controller type='virtio-serial' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <interface type='network'> + <mac address='52:54:00:24:a5:9f'/> + <source network='default'/> + <bandwidth> + <inbound floor='200'/> + </bandwidth> + <model type='rtl8139'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> + </interface> + <serial type='pty'> + <target port='0'/> + </serial> + <console type='pty'> + <target type='serial' port='0'/> + </console> + <input type='tablet' bus='usb'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <graphics type='vnc' port='-1' autoport='yes'/> + <sound model='ac97'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </sound> + <video> + <model type='vga' vram='16384' heads='1'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </video> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 5a5812f..c615d8e 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -456,6 +456,7 @@ mymain(void) DO_TEST("sound"); DO_TEST("sound-device"); DO_TEST("net-bandwidth"); + DO_TEST("net-bandwidth2"); DO_TEST("serial-vc"); DO_TEST("serial-pty"); -- 2.0.5

On 04/14/2015 12:59 PM, Michal Privoznik wrote:
The <inbound/> element to <bandwidth/> has several attributes from which two are mandatory. Well, from two at least one has to be present: @average or @floor or both. Instead of inventing crazy RNG schema, let's make all the attributes optional there and rely on our parsing code to correctly handle the situation.
Agreed. Using the RNG as a sanity check is nice, but it's easy to become obsessed with making it too picky. ACK.

After a360912179 the formatting of virDomainActualNetDefPtr was changed a bit. However, during the function rewrite, iface's class_id is not formatted as frequently as it could be. In fact, after rewrite it's formatted only for iface of type VIR_DOMAIN_NET_TYPE_DIRECT where it makes no sense and is unused. While where needed (_TYPE_NETWORK) is not formatted at all. This makes the daemon forget it upon daemon restart resulting in bad behaviour. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 4d7e3c9..ab4f2bf 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -18608,8 +18608,7 @@ virDomainActualNetDefContentsFormat(virBufferPtr buf, virBufferAddLit(buf, "/>\n"); } - if (actualType == VIR_DOMAIN_NET_TYPE_DIRECT && - def->data.network.actual && def->data.network.actual->class_id) { + if (def->data.network.actual && def->data.network.actual->class_id) { virBufferAsprintf(buf, "<class id='%u'/>\n", def->data.network.actual->class_id); } -- 2.0.5

On 04/14/2015 12:59 PM, Michal Privoznik wrote:
After a360912179 the formatting of virDomainActualNetDefPtr was changed a bit. However, during the function rewrite, iface's class_id is not formatted as frequently as it could be. In fact, after rewrite it's formatted only for iface of type VIR_DOMAIN_NET_TYPE_DIRECT where it makes no sense and is unused. While where needed (_TYPE_NETWORK) is not formatted at all.
Yikes! That was a typo when moving the code around. Should have been VIR_DOMAIN_NET_TYPE_NETWORK (that bit was inside "case VIR_DOMAIN_NET_TYPE_NETWORK previous to the patch)
This makes the daemon forget it upon daemon restart resulting in bad behaviour.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 4d7e3c9..ab4f2bf 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -18608,8 +18608,7 @@ virDomainActualNetDefContentsFormat(virBufferPtr buf,
virBufferAddLit(buf, "/>\n"); } - if (actualType == VIR_DOMAIN_NET_TYPE_DIRECT && - def->data.network.actual && def->data.network.actual->class_id) { + if (def->data.network.actual && def->data.network.actual->class_id) { virBufferAsprintf(buf, "<class id='%u'/>\n", def->data.network.actual->class_id); }
I suppose since the modes that wouldn't have been able to setup any bandwidth limiting would have class_id == 0, this works. If you for some reason wanted to protect against using it for hostdev interfaces, yoou could move it up into the preceding else clause, but I guess I don't see any gain from that. ACK.

On 04/16/2015 01:00 PM, Laine Stump wrote:
On 04/14/2015 12:59 PM, Michal Privoznik wrote:
After a360912179 the formatting of virDomainActualNetDefPtr was changed a bit. However, during the function rewrite, iface's class_id is not formatted as frequently as it could be. In fact, after rewrite it's formatted only for iface of type VIR_DOMAIN_NET_TYPE_DIRECT where it makes no sense and is unused. While where needed (_TYPE_NETWORK) is not formatted at all. Yikes! That was a typo when moving the code around. Should have been VIR_DOMAIN_NET_TYPE_NETWORK (that bit was inside "case VIR_DOMAIN_NET_TYPE_NETWORK previous to the patch)
This makes the daemon forget it upon daemon restart resulting in bad behaviour.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 4d7e3c9..ab4f2bf 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -18608,8 +18608,7 @@ virDomainActualNetDefContentsFormat(virBufferPtr buf,
virBufferAddLit(buf, "/>\n"); } - if (actualType == VIR_DOMAIN_NET_TYPE_DIRECT && - def->data.network.actual && def->data.network.actual->class_id) { + if (def->data.network.actual && def->data.network.actual->class_id) { virBufferAsprintf(buf, "<class id='%u'/>\n", def->data.network.actual->class_id); } I suppose since the modes that wouldn't have been able to setup any bandwidth limiting would have class_id == 0, this works. If you for some reason wanted to protect against using it for hostdev interfaces, yoou could move it up into the preceding else clause, but I guess I don't see any gain from that.
An interesting aside note - just a few days ago I noticed that at most/all SRIOV cards have support for setting a bandwidth maximum on each VF (and at least a couple support setting a minimum). Of course in this case tc wouldn't be used, so no class_id would be needed (in other words, it's not directly relevant to this patch), I just thought it was interesting - a nice simple project for someone with a few spare cycles and access to the proper SRIOV hardware for testing.

Currently, when constructing traffic shaping rules, the ingress filter is created without any priority specified on the command line. This makes kernel to make up one. While this works, it simplifies things a big if we provide the filter priority. In this case, since it's the root filter lets have it the highest priority of number 1. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virnetdevbandwidth.c | 4 ++-- tests/virnetdevbandwidthtest.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index d1c0f12..943178b 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -209,8 +209,8 @@ virNetDevBandwidthSet(const char *ifname, virCommandFree(cmd); cmd = virCommandNew(TC); virCommandAddArgList(cmd, "filter", "add", "dev", ifname, "parent", - "1:0", "protocol", "all", "handle", "1", "fw", - "flowid", "1", NULL); + "1:0", "protocol", "all", "prio", "1", "handle", + "1", "fw", "flowid", "1", NULL); if (virCommandRun(cmd, NULL) < 0) goto cleanup; diff --git a/tests/virnetdevbandwidthtest.c b/tests/virnetdevbandwidthtest.c index 3b46455..5a3f02c 100644 --- a/tests/virnetdevbandwidthtest.c +++ b/tests/virnetdevbandwidthtest.c @@ -139,7 +139,7 @@ mymain(void) TC " qdisc add dev eth0 root handle 1: htb default 1\n" TC " class add dev eth0 parent 1: classid 1:1 htb rate 1024kbps\n" TC " qdisc add dev eth0 parent 1:1 handle 2: sfq perturb 10\n" - TC " filter add dev eth0 parent 1:0 protocol all handle 1 fw flowid 1\n")); + TC " filter add dev eth0 parent 1:0 protocol all prio 1 handle 1 fw flowid 1\n")); DO_TEST_SET(("<bandwidth>" " <outbound average='1024'/>" @@ -159,7 +159,7 @@ mymain(void) TC " qdisc add dev eth0 root handle 1: htb default 1\n" TC " class add dev eth0 parent 1: classid 1:1 htb rate 1kbps ceil 2kbps burst 4kb\n" TC " qdisc add dev eth0 parent 1:1 handle 2: sfq perturb 10\n" - TC " filter add dev eth0 parent 1:0 protocol all handle 1 fw flowid 1\n" + TC " filter add dev eth0 parent 1:0 protocol all prio 1 handle 1 fw flowid 1\n" TC " qdisc add dev eth0 ingress\n" TC " filter add dev eth0 parent ffff: protocol all u32 match u32 0 0 " "police rate 5kbps burst 7kb mtu 64kb drop flowid :1\n")); -- 2.0.5

On 04/14/2015 12:59 PM, Michal Privoznik wrote:
Currently, when constructing traffic shaping rules, the ingress filter is created without any priority specified on the command line. This makes kernel to make up one. While this works, it simplifies things a big if we provide the filter priority. In
s/big/bit/
this case, since it's the root filter lets have it the highest
s/have/give/
priority of number 1.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
ACK.
--- src/util/virnetdevbandwidth.c | 4 ++-- tests/virnetdevbandwidthtest.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index d1c0f12..943178b 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -209,8 +209,8 @@ virNetDevBandwidthSet(const char *ifname, virCommandFree(cmd); cmd = virCommandNew(TC); virCommandAddArgList(cmd, "filter", "add", "dev", ifname, "parent", - "1:0", "protocol", "all", "handle", "1", "fw", - "flowid", "1", NULL); + "1:0", "protocol", "all", "prio", "1", "handle", + "1", "fw", "flowid", "1", NULL);
if (virCommandRun(cmd, NULL) < 0) goto cleanup; diff --git a/tests/virnetdevbandwidthtest.c b/tests/virnetdevbandwidthtest.c index 3b46455..5a3f02c 100644 --- a/tests/virnetdevbandwidthtest.c +++ b/tests/virnetdevbandwidthtest.c @@ -139,7 +139,7 @@ mymain(void) TC " qdisc add dev eth0 root handle 1: htb default 1\n" TC " class add dev eth0 parent 1: classid 1:1 htb rate 1024kbps\n" TC " qdisc add dev eth0 parent 1:1 handle 2: sfq perturb 10\n" - TC " filter add dev eth0 parent 1:0 protocol all handle 1 fw flowid 1\n")); + TC " filter add dev eth0 parent 1:0 protocol all prio 1 handle 1 fw flowid 1\n"));
DO_TEST_SET(("<bandwidth>" " <outbound average='1024'/>" @@ -159,7 +159,7 @@ mymain(void) TC " qdisc add dev eth0 root handle 1: htb default 1\n" TC " class add dev eth0 parent 1: classid 1:1 htb rate 1kbps ceil 2kbps burst 4kb\n" TC " qdisc add dev eth0 parent 1:1 handle 2: sfq perturb 10\n" - TC " filter add dev eth0 parent 1:0 protocol all handle 1 fw flowid 1\n" + TC " filter add dev eth0 parent 1:0 protocol all prio 1 handle 1 fw flowid 1\n" TC " qdisc add dev eth0 ingress\n" TC " filter add dev eth0 parent ffff: protocol all u32 match u32 0 0 " "police rate 5kbps burst 7kb mtu 64kb drop flowid :1\n"));

Not only this simplifies the code a bit, it prepares the environment for upcoming patches. The new virNetDevBandwidthManipulateFilter() function is capable of both removing a filter and adding a new one. At the same time! Yeah, this is not currently used anywhere but look at the next commit where you'll see it. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virnetdevbandwidth.c | 142 +++++++++++++++++++++++++++++++----------- 1 file changed, 106 insertions(+), 36 deletions(-) diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 943178b..c57c8c0 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -43,6 +43,107 @@ virNetDevBandwidthFree(virNetDevBandwidthPtr def) VIR_FREE(def); } +/** + * virNetDevBandwidthManipulateFilter: + * @ifname: interface to operate on + * @ifmac_ptr: MAC of the interface to create filter over + * @id: filter ID + * @class_id: where to place traffic + * @remove_old: whether to remove the filter + * @create_new: whether to create the filter + * + * TC filters are as crucial for traffic shaping as QDiscs. While + * QDisc acts like black boxes deciding which packets should be + * held up and which should be sent immediately, it's filter who + * places a packet into the box. So, we may end up constructing a + * set of filters on a single device (e.g. a bridge) and filter + * the traffic into QDiscs based on the originating vNET device. + * + * Long story short, @ifname is the interface where the filter + * should be created. The @ifmac_ptr is the MAC address for which + * the filter should be created (usually different to the MAC + * address of @ifname). Then, like everything - even filters have + * an @id which should be unique (per @ifname). And @class_id + * tells into which QDisc should filter place the traffic. + * + * This function can be used for both, removing stale filter + * (@remove_old set to true) and creating new one (@create_new + * set to true). Both at once for the same price! + * + * Returns: 0 on success, + * -1 otherwise (with error reported). + */ +static int ATTRIBUTE_NONNULL(1) +virNetDevBandwidthManipulateFilter(const char *ifname, + const virMacAddr *ifmac_ptr, + unsigned int id, + const char *class_id, + bool remove_old, + bool create_new) +{ + int ret = -1; + char *filter_id = NULL; + virCommandPtr cmd = NULL; + unsigned char ifmac[VIR_MAC_BUFLEN]; + char *mac[2] = {NULL, NULL}; + + if (!(remove_old || create_new)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("filter creation API error")); + goto cleanup; + } + + /* u32 filters must have 800:: prefix. Don't ask. */ + if (virAsprintf(&filter_id, "800::%u", id) < 0) + goto cleanup; + + if (remove_old) { + int cmd_ret = 0; + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "filter", "del", "dev", ifname, + "prio", "2", "handle", filter_id, "u32", NULL); + + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup; + + } + + if (create_new) { + virMacAddrGetRaw(ifmac_ptr, ifmac); + + if (virAsprintf(&mac[0], "0x%02x%02x%02x%02x", ifmac[2], + ifmac[3], ifmac[4], ifmac[5]) < 0 || + virAsprintf(&mac[1], "0x%02x%02x", ifmac[0], ifmac[1]) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + /* Okay, this not nice. But since libvirt does not necessarily track + * interface IP address(es), and tc fw filter simply refuse to use + * ebtables marks, we need to use u32 selector to match MAC address. + * If libvirt will ever know something, remove this FIXME + */ + virCommandAddArgList(cmd, "filter", "add", "dev", ifname, "protocol", "ip", + "prio", "2", "handle", filter_id, "u32", + "match", "u16", "0x0800", "0xffff", "at", "-2", + "match", "u32", mac[0], "0xffffffff", "at", "-12", + "match", "u16", mac[1], "0xffff", "at", "-14", + "flowid", class_id, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + } + + ret = 0; + cleanup: + VIR_FREE(mac[1]); + VIR_FREE(mac[0]); + VIR_FREE(filter_id); + virCommandFree(cmd); + return ret; +} + /** * virNetDevBandwidthSet: @@ -416,19 +517,15 @@ virNetDevBandwidthPlug(const char *brname, virCommandPtr cmd = NULL; char *class_id = NULL; char *qdisc_id = NULL; - char *filter_id = NULL; char *floor = NULL; char *ceil = NULL; - unsigned char ifmac[VIR_MAC_BUFLEN]; char ifmacStr[VIR_MAC_STRING_BUFLEN]; - char *mac[2] = {NULL, NULL}; if (id <= 2) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Invalid class ID %d"), id); return -1; } - virMacAddrGetRaw(ifmac_ptr, ifmac); virMacAddrFormat(ifmac_ptr, ifmacStr); if (!net_bandwidth || !net_bandwidth->in) { @@ -441,10 +538,6 @@ virNetDevBandwidthPlug(const char *brname, if (virAsprintf(&class_id, "1:%x", id) < 0 || virAsprintf(&qdisc_id, "%x:", id) < 0 || - virAsprintf(&filter_id, "%u", id) < 0 || - virAsprintf(&mac[0], "0x%02x%02x%02x%02x", ifmac[2], - ifmac[3], ifmac[4], ifmac[5]) < 0 || - virAsprintf(&mac[1], "0x%02x%02x", ifmac[0], ifmac[1]) < 0 || virAsprintf(&floor, "%llukbps", bandwidth->in->floor) < 0 || virAsprintf(&ceil, "%llukbps", net_bandwidth->in->peak ? net_bandwidth->in->peak : @@ -468,31 +561,15 @@ virNetDevBandwidthPlug(const char *brname, if (virCommandRun(cmd, NULL) < 0) goto cleanup; - virCommandFree(cmd); - cmd = virCommandNew(TC); - /* Okay, this not nice. But since libvirt does not know anything about - * interface IP address(es), and tc fw filter simply refuse to use ebtables - * marks, we need to use u32 selector to match MAC address. - * If libvirt will ever know something, remove this FIXME - */ - virCommandAddArgList(cmd, "filter", "add", "dev", brname, "protocol", "ip", - "prio", filter_id, "u32", - "match", "u16", "0x0800", "0xffff", "at", "-2", - "match", "u32", mac[0], "0xffffffff", "at", "-12", - "match", "u16", mac[1], "0xffff", "at", "-14", - "flowid", class_id, NULL); - - if (virCommandRun(cmd, NULL) < 0) + if (virNetDevBandwidthManipulateFilter(brname, ifmac_ptr, id, + class_id, false, true) < 0) goto cleanup; ret = 0; cleanup: - VIR_FREE(mac[1]); - VIR_FREE(mac[0]); VIR_FREE(ceil); VIR_FREE(floor); - VIR_FREE(filter_id); VIR_FREE(qdisc_id); VIR_FREE(class_id); virCommandFree(cmd); @@ -517,7 +594,6 @@ virNetDevBandwidthUnplug(const char *brname, virCommandPtr cmd = NULL; char *class_id = NULL; char *qdisc_id = NULL; - char *filter_id = NULL; if (id <= 2) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Invalid class ID %d"), id); @@ -525,8 +601,7 @@ virNetDevBandwidthUnplug(const char *brname, } if (virAsprintf(&class_id, "1:%x", id) < 0 || - virAsprintf(&qdisc_id, "%x:", id) < 0 || - virAsprintf(&filter_id, "%u", id) < 0) + virAsprintf(&qdisc_id, "%x:", id) < 0) goto cleanup; cmd = virCommandNew(TC); @@ -538,12 +613,8 @@ virNetDevBandwidthUnplug(const char *brname, if (virCommandRun(cmd, &cmd_ret) < 0) goto cleanup; - virCommandFree(cmd); - cmd = virCommandNew(TC); - virCommandAddArgList(cmd, "filter", "del", "dev", brname, - "prio", filter_id, NULL); - - if (virCommandRun(cmd, &cmd_ret) < 0) + if (virNetDevBandwidthManipulateFilter(brname, NULL, id, + NULL, true, false) < 0) goto cleanup; virCommandFree(cmd); @@ -557,7 +628,6 @@ virNetDevBandwidthUnplug(const char *brname, ret = 0; cleanup: - VIR_FREE(filter_id); VIR_FREE(qdisc_id); VIR_FREE(class_id); virCommandFree(cmd); -- 2.0.5

On 04/14/2015 12:59 PM, Michal Privoznik wrote:
Not only this simplifies the code a bit, it prepares the environment for upcoming patches. The new virNetDevBandwidthManipulateFilter() function is capable of both removing a filter and adding a new one. At the same time! Yeah, this is not currently used anywhere but look at the next commit where you'll see it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virnetdevbandwidth.c | 142 +++++++++++++++++++++++++++++++----------- 1 file changed, 106 insertions(+), 36 deletions(-)
diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 943178b..c57c8c0 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -43,6 +43,107 @@ virNetDevBandwidthFree(virNetDevBandwidthPtr def) VIR_FREE(def); }
+/** + * virNetDevBandwidthManipulateFilter: + * @ifname: interface to operate on + * @ifmac_ptr: MAC of the interface to create filter over + * @id: filter ID + * @class_id: where to place traffic + * @remove_old: whether to remove the filter + * @create_new: whether to create the filter + * + * TC filters are as crucial for traffic shaping as QDiscs. While + * QDisc acts like black boxes deciding which packets should be
s/QDisc acts/QDiscs act/
+ * held up and which should be sent immediately, it's filter who
s/filter who/the filter that/
+ * places a packet into the box. So, we may end up constructing a + * set of filters on a single device (e.g. a bridge) and filter + * the traffic into QDiscs based on the originating vNET device. + * + * Long story short, @ifname is the interface where the filter + * should be created. The @ifmac_ptr is the MAC address for which + * the filter should be created (usually different to the MAC + * address of @ifname). Then, like everything - even filters have + * an @id which should be unique (per @ifname). And @class_id + * tells into which QDisc should filter place the traffic.
I think this would be better as a bullet list of parameters with their purposes, rather than a chummy paragraph of text :-)
+ * + * This function can be used for both, removing stale filter + * (@remove_old set to true) and creating new one (@create_new + * set to true). Both at once for the same price!
Same here. Not that I don't appreciate the nice tone of the conversation, it's just easier to pick out what you need to know from a list than from a paragraph.
+ * + * Returns: 0 on success, + * -1 otherwise (with error reported). + */ +static int ATTRIBUTE_NONNULL(1) +virNetDevBandwidthManipulateFilter(const char *ifname, + const virMacAddr *ifmac_ptr, + unsigned int id, + const char *class_id, + bool remove_old, + bool create_new)
How about making these two flags so that it will be easier to tell what's intended when looking at a call to the function?
+{ + int ret = -1; + char *filter_id = NULL; + virCommandPtr cmd = NULL; + unsigned char ifmac[VIR_MAC_BUFLEN]; + char *mac[2] = {NULL, NULL}; + + if (!(remove_old || create_new)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("filter creation API error")); + goto cleanup; + }
I'm never sure how far to go with this type of checking on args that are only given literal values in internal-only functions (especially static). Especially if it's static and only called from one or two places, I tend to go on the light side. Every new log message is one more thing to translate, and one more bit of space taken up... ACK, modulo the above comments (any of which you can take or leave) and assuming your functional testing showed it to produce identical results to previous behavior (which I'm guessing is true, since you've just found another bug that would have been noticed by testing bandwidth management while restarting libvirtd :-)
+ + /* u32 filters must have 800:: prefix. Don't ask. */ + if (virAsprintf(&filter_id, "800::%u", id) < 0) + goto cleanup; + + if (remove_old) { + int cmd_ret = 0; + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "filter", "del", "dev", ifname, + "prio", "2", "handle", filter_id, "u32", NULL); + + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup; + + } + + if (create_new) { + virMacAddrGetRaw(ifmac_ptr, ifmac); + + if (virAsprintf(&mac[0], "0x%02x%02x%02x%02x", ifmac[2], + ifmac[3], ifmac[4], ifmac[5]) < 0 || + virAsprintf(&mac[1], "0x%02x%02x", ifmac[0], ifmac[1]) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + /* Okay, this not nice. But since libvirt does not necessarily track + * interface IP address(es), and tc fw filter simply refuse to use + * ebtables marks, we need to use u32 selector to match MAC address. + * If libvirt will ever know something, remove this FIXME + */ + virCommandAddArgList(cmd, "filter", "add", "dev", ifname, "protocol", "ip", + "prio", "2", "handle", filter_id, "u32", + "match", "u16", "0x0800", "0xffff", "at", "-2", + "match", "u32", mac[0], "0xffffffff", "at", "-12", + "match", "u16", mac[1], "0xffff", "at", "-14", + "flowid", class_id, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + } + + ret = 0; + cleanup: + VIR_FREE(mac[1]); + VIR_FREE(mac[0]); + VIR_FREE(filter_id); + virCommandFree(cmd); + return ret; +} +
/** * virNetDevBandwidthSet: @@ -416,19 +517,15 @@ virNetDevBandwidthPlug(const char *brname, virCommandPtr cmd = NULL; char *class_id = NULL; char *qdisc_id = NULL; - char *filter_id = NULL; char *floor = NULL; char *ceil = NULL; - unsigned char ifmac[VIR_MAC_BUFLEN]; char ifmacStr[VIR_MAC_STRING_BUFLEN]; - char *mac[2] = {NULL, NULL};
if (id <= 2) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Invalid class ID %d"), id); return -1; }
- virMacAddrGetRaw(ifmac_ptr, ifmac); virMacAddrFormat(ifmac_ptr, ifmacStr);
if (!net_bandwidth || !net_bandwidth->in) { @@ -441,10 +538,6 @@ virNetDevBandwidthPlug(const char *brname,
if (virAsprintf(&class_id, "1:%x", id) < 0 || virAsprintf(&qdisc_id, "%x:", id) < 0 || - virAsprintf(&filter_id, "%u", id) < 0 || - virAsprintf(&mac[0], "0x%02x%02x%02x%02x", ifmac[2], - ifmac[3], ifmac[4], ifmac[5]) < 0 || - virAsprintf(&mac[1], "0x%02x%02x", ifmac[0], ifmac[1]) < 0 || virAsprintf(&floor, "%llukbps", bandwidth->in->floor) < 0 || virAsprintf(&ceil, "%llukbps", net_bandwidth->in->peak ? net_bandwidth->in->peak : @@ -468,31 +561,15 @@ virNetDevBandwidthPlug(const char *brname, if (virCommandRun(cmd, NULL) < 0) goto cleanup;
- virCommandFree(cmd); - cmd = virCommandNew(TC); - /* Okay, this not nice. But since libvirt does not know anything about - * interface IP address(es), and tc fw filter simply refuse to use ebtables - * marks, we need to use u32 selector to match MAC address. - * If libvirt will ever know something, remove this FIXME - */ - virCommandAddArgList(cmd, "filter", "add", "dev", brname, "protocol", "ip", - "prio", filter_id, "u32", - "match", "u16", "0x0800", "0xffff", "at", "-2", - "match", "u32", mac[0], "0xffffffff", "at", "-12", - "match", "u16", mac[1], "0xffff", "at", "-14", - "flowid", class_id, NULL); - - if (virCommandRun(cmd, NULL) < 0) + if (virNetDevBandwidthManipulateFilter(brname, ifmac_ptr, id, + class_id, false, true) < 0) goto cleanup;
ret = 0;
cleanup: - VIR_FREE(mac[1]); - VIR_FREE(mac[0]); VIR_FREE(ceil); VIR_FREE(floor); - VIR_FREE(filter_id); VIR_FREE(qdisc_id); VIR_FREE(class_id); virCommandFree(cmd); @@ -517,7 +594,6 @@ virNetDevBandwidthUnplug(const char *brname, virCommandPtr cmd = NULL; char *class_id = NULL; char *qdisc_id = NULL; - char *filter_id = NULL;
if (id <= 2) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Invalid class ID %d"), id); @@ -525,8 +601,7 @@ virNetDevBandwidthUnplug(const char *brname, }
if (virAsprintf(&class_id, "1:%x", id) < 0 || - virAsprintf(&qdisc_id, "%x:", id) < 0 || - virAsprintf(&filter_id, "%u", id) < 0) + virAsprintf(&qdisc_id, "%x:", id) < 0) goto cleanup;
cmd = virCommandNew(TC); @@ -538,12 +613,8 @@ virNetDevBandwidthUnplug(const char *brname, if (virCommandRun(cmd, &cmd_ret) < 0) goto cleanup;
- virCommandFree(cmd); - cmd = virCommandNew(TC); - virCommandAddArgList(cmd, "filter", "del", "dev", brname, - "prio", filter_id, NULL); - - if (virCommandRun(cmd, &cmd_ret) < 0) + if (virNetDevBandwidthManipulateFilter(brname, NULL, id, + NULL, true, false) < 0) goto cleanup;
virCommandFree(cmd); @@ -557,7 +628,6 @@ virNetDevBandwidthUnplug(const char *brname, ret = 0;
cleanup: - VIR_FREE(filter_id); VIR_FREE(qdisc_id); VIR_FREE(class_id); virCommandFree(cmd);

On 16.04.2015 19:25, Laine Stump wrote:
On 04/14/2015 12:59 PM, Michal Privoznik wrote:
Not only this simplifies the code a bit, it prepares the environment for upcoming patches. The new virNetDevBandwidthManipulateFilter() function is capable of both removing a filter and adding a new one. At the same time! Yeah, this is not currently used anywhere but look at the next commit where you'll see it.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virnetdevbandwidth.c | 142 +++++++++++++++++++++++++++++++----------- 1 file changed, 106 insertions(+), 36 deletions(-)
diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 943178b..c57c8c0 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -43,6 +43,107 @@ virNetDevBandwidthFree(virNetDevBandwidthPtr def) VIR_FREE(def); }
+/** + * virNetDevBandwidthManipulateFilter: + * @ifname: interface to operate on + * @ifmac_ptr: MAC of the interface to create filter over + * @id: filter ID + * @class_id: where to place traffic + * @remove_old: whether to remove the filter + * @create_new: whether to create the filter + * + * TC filters are as crucial for traffic shaping as QDiscs. While + * QDisc acts like black boxes deciding which packets should be
s/QDisc acts/QDiscs act/
+ * held up and which should be sent immediately, it's filter who
s/filter who/the filter that/
+ * places a packet into the box. So, we may end up constructing a + * set of filters on a single device (e.g. a bridge) and filter + * the traffic into QDiscs based on the originating vNET device. + * + * Long story short, @ifname is the interface where the filter + * should be created. The @ifmac_ptr is the MAC address for which + * the filter should be created (usually different to the MAC + * address of @ifname). Then, like everything - even filters have + * an @id which should be unique (per @ifname). And @class_id + * tells into which QDisc should filter place the traffic.
I think this would be better as a bullet list of parameters with their purposes, rather than a chummy paragraph of text :-)
+ * + * This function can be used for both, removing stale filter + * (@remove_old set to true) and creating new one (@create_new + * set to true). Both at once for the same price!
Same here. Not that I don't appreciate the nice tone of the conversation, it's just easier to pick out what you need to know from a list than from a paragraph.
+ * + * Returns: 0 on success, + * -1 otherwise (with error reported). + */ +static int ATTRIBUTE_NONNULL(1) +virNetDevBandwidthManipulateFilter(const char *ifname, + const virMacAddr *ifmac_ptr, + unsigned int id, + const char *class_id, + bool remove_old, + bool create_new)
How about making these two flags so that it will be easier to tell what's intended when looking at a call to the function?
I was thinking about this too, but then I went with booleans. My idea was that it's shorter this way than inventing new enum items like VIR_NET_DEV_BANDWIDTH_FILTER_CREATE or VIR_NET_DEV_BANDWIDTH_FILTER_REMOVE. But if somebody prefers the other way, I can switch to that. Michal

On 04/17/2015 04:51 AM, Michal Privoznik wrote:
On 16.04.2015 19:25, Laine Stump wrote:
On 04/14/2015 12:59 PM, Michal Privoznik wrote:
Not only this simplifies the code a bit, it prepares the environment for upcoming patches. The new virNetDevBandwidthManipulateFilter() function is capable of both removing a filter and adding a new one. At the same time! Yeah, this is not currently used anywhere but look at the next commit + * + * Returns: 0 on success, + * -1 otherwise (with error reported). + */ +static int ATTRIBUTE_NONNULL(1) +virNetDevBandwidthManipulateFilter(const char *ifname, + const virMacAddr *ifmac_ptr, + unsigned int id, + const char *class_id, + bool remove_old, + bool create_new) How about making these two flags so that it will be easier to tell what's intended when looking at a call to the function? I was thinking about this too, but then I went with booleans. My idea was that it's shorter this way than inventing new enum items like VIR_NET_DEV_BANDWIDTH_FILTER_CREATE or VIR_NET_DEV_BANDWIDTH_FILTER_REMOVE. But if somebody prefers the other way, I can switch to that.
I'm fine if you want to keep it as booleans, too.

This is practically a wrapper over virNetDevBandwidthManipulateFilter() that will update the desired filter on an interface (usually a network bridge) with a new MAC address. Although, the MAC address in question usually refers to some other interface - the one that the filter is constructed for. Yeah, hard to parse. Thing is, our NATed network has a bridge where some part of QoS takes place. And vNICs from guests are plugged into the bridge. However, if a guest decides to change the MAC of its vNIC, corresponding qemu process emits an event to which we respond somehow. However, our QoS hierarchy is currently not notified, therefore it falls apart. This function (when called from the correct place) will update our QoS hierarchy and duck tape it together again. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virnetdevbandwidth.c | 38 ++++++++++++++++++++++++++++++++++++++ src/util/virnetdevbandwidth.h | 6 ++++++ 3 files changed, 45 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 7166283..547a919 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1753,6 +1753,7 @@ virNetDevBandwidthFree; virNetDevBandwidthPlug; virNetDevBandwidthSet; virNetDevBandwidthUnplug; +virNetDevBandwidthUpdateFilter; virNetDevBandwidthUpdateRate; diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index c57c8c0..0c49b41 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -680,3 +680,41 @@ virNetDevBandwidthUpdateRate(const char *ifname, VIR_FREE(ceil); return ret; } + +/** + * virNetDevBandwidthUpdateFilter: + * @ifname: interface to operate on + * @ifmac_ptr: new MAC to update the filter with + * @id: filter ID + * + * Sometimes the host environment is so dynamic, that even + * guest's MAC addresses change on the fly. That's when we must + * update our QoS hierarchy so that guest's traffic is placed + * into correct QDiscs. This function does exactly that. On the + * interface @ifname (usually a bridge) updates filter with + * unique identifier @id so that it reflects fact, that new + * address corresponding to the filter has changed to @ifmac_ptr. + * + * Returns: 0 on success, + * -1 on failure (with error reported). + */ +int +virNetDevBandwidthUpdateFilter(const char *ifname, + const virMacAddr *ifmac_ptr, + unsigned int id) +{ + int ret = -1; + char *class_id = NULL; + + if (virAsprintf(&class_id, "1:%x", id) < 0) + goto cleanup; + + if (virNetDevBandwidthManipulateFilter(ifname, ifmac_ptr, id, + class_id, true, true) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(class_id); + return ret; +} diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index efcf95a..9b1d2a6 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -73,4 +73,10 @@ int virNetDevBandwidthUpdateRate(const char *ifname, unsigned long long new_rate) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; + +int virNetDevBandwidthUpdateFilter(const char *ifname, + const virMacAddr *ifmac_ptr, + unsigned int id) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_RETURN_CHECK; #endif /* __VIR_NETDEV_BANDWIDTH_H__ */ -- 2.0.5

On 04/14/2015 12:59 PM, Michal Privoznik wrote:
This is practically a wrapper over
"This is a simple wrapper around"
virNetDevBandwidthManipulateFilter() that will update the desired filter on an interface (usually a network bridge) with a new MAC address. Although, the MAC address in question usually refers to some other interface - the one that the filter is constructed for. Yeah, hard to parse. Thing is, our NATed network has a
Again with the extra unnecessary verbiage :-)
bridge where some part of QoS takes place. And vNICs from guests are plugged into the bridge. However, if a guest decides to change the MAC of its vNIC, corresponding qemu process emits an
s/corresponding/the corresponding/
event to which we respond somehow.
"... an event which we can use to update the QoS configuration based on the new MAC address."
However, our QoS hierarchy is currently not notified, therefore it falls apart.
"falls apart" is a bit lacking on the details :-)
This function (when called from the correct place)
(when called in response to the aforementioned event)
will update our QoS hierarchy and duck tape it together again.
Technically, it's "duct" tape (although there is one company that capitalizes on the mispronunciation by calling theirs "Duck(tm) Tape".
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virnetdevbandwidth.c | 38 ++++++++++++++++++++++++++++++++++++++ src/util/virnetdevbandwidth.h | 6 ++++++ 3 files changed, 45 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 7166283..547a919 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1753,6 +1753,7 @@ virNetDevBandwidthFree; virNetDevBandwidthPlug; virNetDevBandwidthSet; virNetDevBandwidthUnplug; +virNetDevBandwidthUpdateFilter; virNetDevBandwidthUpdateRate;
diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index c57c8c0..0c49b41 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -680,3 +680,41 @@ virNetDevBandwidthUpdateRate(const char *ifname, VIR_FREE(ceil); return ret; } + +/** + * virNetDevBandwidthUpdateFilter: + * @ifname: interface to operate on + * @ifmac_ptr: new MAC to update the filter with + * @id: filter ID + * + * Sometimes the host environment is so dynamic, that even + * guest's MAC addresses change on the fly. That's when we must
s/guest's/a guest's/ s/That's when/When that happens/
+ * update our QoS hierarchy so that guest's traffic is placed
s/guest's/the guest's/
+ * into correct QDiscs.
s/correct/the correct/
This function does exactly that. On the + * interface @ifname (usually a bridge) updates filter with + * unique identifier @id so that it reflects fact, that new + * address corresponding to the filter has changed to @ifmac_ptr.
This function updates the filter for the interface @ifname with the unique identifier @id so that it uses the new MAC address of the guest interface @ifmac_ptr.
+ * + * Returns: 0 on success, + * -1 on failure (with error reported). + */ +int +virNetDevBandwidthUpdateFilter(const char *ifname, + const virMacAddr *ifmac_ptr, + unsigned int id) +{ + int ret = -1; + char *class_id = NULL; + + if (virAsprintf(&class_id, "1:%x", id) < 0) + goto cleanup; + + if (virNetDevBandwidthManipulateFilter(ifname, ifmac_ptr, id, + class_id, true, true) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(class_id); + return ret; +} diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index efcf95a..9b1d2a6 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -73,4 +73,10 @@ int virNetDevBandwidthUpdateRate(const char *ifname, unsigned long long new_rate) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; + +int virNetDevBandwidthUpdateFilter(const char *ifname, + const virMacAddr *ifmac_ptr, + unsigned int id) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_RETURN_CHECK; #endif /* __VIR_NETDEV_BANDWIDTH_H__ */
ACK with the description cleaned up a bit.

So, as partly explained in previous commit, there is an issue with NATed networks on a guest originated MAC change. When user sets @floor in <inbound/> under <bandwidth/> it results in a QoS hierarchy being built on the corresponding bridge (yes, this feature is type='network' only). The reason is we can make shaping decisions only there, where the whole traffic can be seen. In case of bridged networks it's bridge. But, on the bridge, in egress qdiscs (outgoing traffic, after routing decision), due to some Linux kernel internal implementation, we are not able to see the device packet came from. So we have a set of filters that place the packet into correct qdisc based on the MAC recorded in the packet. But with recent changes, this is not working as expected. We taught libvirt to listen to MAC change events from guests. However, the QoS hierarchy is not updated accordingly resulting in applying incorrect shaping rules on a vNIC that changed the MAC address. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_driver.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index f5a3ef9..6aab1d0 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4382,6 +4382,28 @@ processNicRxFilterChangedEvent(virQEMUDriverPtr driver, syncNicRxFilterDeviceOptions(def->ifname, guestFilter, hostFilter); } + if (virDomainNetGetActualType(def) == VIR_DOMAIN_NET_TYPE_NETWORK) { + const char *brname = virDomainNetGetActualBridgeName(def); + + if (virNetDevGetRxFilter(def->ifname, &hostFilter)) { + VIR_WARN("Couldn't get current RX filter for device %s " + "while responding to NIC_RX_FILTER_CHANGED", + def->ifname); + goto endjob; + } + + /* For TUN/TAP connections, set the following macvtap network device + * attributes to match those of the guest network device: + * - MAC address + * - QoS filters (which are based on MAC address) + */ + syncNicRxFilterMacAddr(def->ifname, guestFilter, hostFilter); + + if (virNetDevBandwidthUpdateFilter(brname, &guestFilter->mac, + def->data.network.actual->class_id) < 0) + goto endjob; + } + endjob: qemuDomainObjEndJob(driver, vm); -- 2.0.5

On 04/14/2015 12:59 PM, Michal Privoznik wrote:
So, as partly explained in previous commit, there is an issue with NATed networks on a guest originated MAC change. When user sets @floor in <inbound/> under <bandwidth/> it results in a QoS hierarchy being built on the corresponding bridge (yes, this feature is type='network' only). The reason is we can make shaping decisions only there, where the whole traffic can be seen. In case of bridged networks it's bridge. But, on the bridge, in egress qdiscs (outgoing traffic, after routing decision), due to some Linux kernel internal implementation, we are not able to see the device packet came from. So we have a set of filters that place the packet into correct qdisc based on the MAC recorded in the packet. But with recent changes, this is not working as expected. We taught libvirt to listen to MAC change events from guests. However, the QoS hierarchy is not updated accordingly resulting in applying incorrect shaping rules on a vNIC that changed the MAC address.
How about something like this? Because packets going through the egress from a bridge (where our bandwidth limiting takes place) have no information about which interface they came from, the QoS rules that we create instead use the source MAC address of the packets to make their decisions about which QDisc the packet should be in. One flaw in this is that when a guest changed the MAC address it used, packets from the guest would no longer be put into the correct QDisc, but would instead be put in an "unprivileged" class, resulting in the bandwidth "floor" (minimum guaranteed) being no longer honored. Now that libvirt has infrastructure to capture and respond to RX_FILTER_CHANGE events from qemu (sent whenever a guest interface modifies its MAC address, among other things), we can notice when a guest MAC address changes, and update the QoS rules accordingly, so that bandwidth floor is honored even after a guest MAC address change.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_driver.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index f5a3ef9..6aab1d0 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4382,6 +4382,28 @@ processNicRxFilterChangedEvent(virQEMUDriverPtr driver, syncNicRxFilterDeviceOptions(def->ifname, guestFilter, hostFilter); }
We had discussed in private mail whether this should always be done regardless of the setting of trustGuestRxFilters. In some ways it seems like it should, since the guest is free to change its MAC address anyway. On the other hand, if the admin hasn't specifically allowed changing the MAC address, there's nothing saying that we must honor the bandwidth floor for the guest even though it is "going rogue". Note that when macTableManager="libvirt", traffic from to/from the guest *won't* pass at all after a MAC address change. So maybe it *is* proper to not do this unless trustGuestRxFilters is "yes" (as you have it doing right now).
+ if (virDomainNetGetActualType(def) == VIR_DOMAIN_NET_TYPE_NETWORK) { + const char *brname = virDomainNetGetActualBridgeName(def); + + if (virNetDevGetRxFilter(def->ifname, &hostFilter)) { + VIR_WARN("Couldn't get current RX filter for device %s " + "while responding to NIC_RX_FILTER_CHANGED", + def->ifname); + goto endjob; + } + + /* For TUN/TAP connections, set the following macvtap network device + * attributes to match those of the guest network device: + * - MAC address + * - QoS filters (which are based on MAC address) + */ + syncNicRxFilterMacAddr(def->ifname, guestFilter, hostFilter);
syncNicRxFilterMacAddr() is only useful for macvtap devices. It sets the MAC address of the tap (macvtap in that case) to the same address used by the guest. Doing so for a tap device would be disastrous, as the kernel would no longer understand that it had to forward packets *through* the tap device to the guest, but would believe that the host itself was the final destination. If anything, the MAC address of the tap could be set to the guest's MAC with the initial byte changed to 0xFE, but even that isn't really necessary (the exact MAC address of the tap device isn't important, it's just important that it is unique within the L2 domain that the tap is connected to).
+ + if (virNetDevBandwidthUpdateFilter(brname, &guestFilter->mac, + def->data.network.actual->class_id) < 0) + goto endjob; + } + endjob: qemuDomainObjEndJob(driver, vm);
participants (2)
-
Laine Stump
-
Michal Privoznik