[libvirt] [PATCH 0/7] Add support for setting QoS

This patch series add support for setting traffic shaping and policing on both domain's interface and network's virtual bridge. Basically, this is done via 'tc' from iproute2 package. For shaping is HTB used, for policing we need u32 match selector. Both should be available in RHEL-6 kernel. Michal Privoznik (7): bandwidth: Define schema and create documentation bandwidth: Declare internal structures bandwidth: Add format parsing functions bandwidth: Create format functions bandwitdh: Implement functions to enable and disable QoS bandwidth: Add test cases for network bandwidth: Add domain schema test suite configure.ac | 4 + docs/formatdomain.html.in | 32 ++ docs/formatnetwork.html.in | 30 ++ docs/schemas/domain.rng | 50 +++ docs/schemas/network.rng | 51 +++ src/conf/domain_conf.c | 6 + src/conf/domain_conf.h | 1 + src/conf/network_conf.c | 8 + src/conf/network_conf.h | 1 + src/libvirt_private.syms | 6 + src/network/bridge_driver.c | 8 + src/qemu/qemu_command.c | 5 + src/util/network.c | 508 +++++++++++++++++++++++++ src/util/network.h | 28 ++ tests/domainschemadata/domain-bandwidth.xml | 72 ++++ tests/networkxml2xmlin/bandwidth-network.xml | 16 + tests/networkxml2xmlout/bandwidth-network.xml | 16 + tests/networkxml2xmltest.c | 1 + 18 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 tests/domainschemadata/domain-bandwidth.xml create mode 100644 tests/networkxml2xmlin/bandwidth-network.xml create mode 100644 tests/networkxml2xmlout/bandwidth-network.xml -- 1.7.5.rc3

Define new 'bandwidth' element with possible child element 'inbound' and 'outbound' addressing incoming and outgoing traffic respectively: <bandwidth> <inbound average='1mbit' peak='2mbit' burst='5m'/> <outbound average='0.5mbit'/> </bandwidth> Leaving any element out means not to shape traffic in that direction. Accepted units are the same as 'tc' accepts. This element can be inserted into domain's 'interface' and 'network'. --- docs/formatdomain.html.in | 32 +++++++++++++++++++++++++++ docs/formatnetwork.html.in | 30 +++++++++++++++++++++++++ docs/schemas/domain.rng | 50 +++++++++++++++++++++++++++++++++++++++++++ docs/schemas/network.rng | 51 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 163 insertions(+), 0 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 39e1a85..3bd2440 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1776,6 +1776,38 @@ qemu-kvm -net nic,model=? /dev/null <span class="since">Since 0.8.8</span> </p> + <h5><a name="elementQoS">Quality of service</a></h5> + +<pre> + ... + <devices> + <interface type='network'> + <source network='default'/> + <target dev='vnet0'/> + <b><bandwidth> + <inbound average='1mbit' peak='0.5mbps' burst='1m'/> + <outbound average='128kbit' peak='256kbit' burst='256kbit'/> + </bandwidth></b> + </interface> + <devices> + ...</pre> + + <p> + This part of interface XML provides setting quality of service. Incoming + and outgoing traffic can be shaped independently. The + <code>bandwidth</code> element can have at most one <code>inbound</code> + and at most one <code>outbound</code> child elements. Leaving any of these + children element out result in no QoS applied on that traffic direction. + So, when you want to shape only domain's incoming traffic, use + <code>inbound</code> only, and vice versa. Each of these elements have one + mandatory attribute <code>average</code>. It specifies average bit rate on + interface being shaped. Then there are two optional attributes: + <code>peak</code>, which specifies maximum rate at which interface can send + data, and <code>burst</code>, amount of bytes that can be burst at + <code>peak</code> speed. Accepted values for attributes are decimal + numbers, optionally followed by unit. See <code>man tc</code> for more + details. + </p> <h4><a name="elementsInput">Input devices</a></h4> <p> diff --git a/docs/formatnetwork.html.in b/docs/formatnetwork.html.in index 589aaff..d222704 100644 --- a/docs/formatnetwork.html.in +++ b/docs/formatnetwork.html.in @@ -102,6 +102,36 @@ 0.4.2</span></dd> </dl> + <h5><a name="elementQoS">Quality of service</a></h5> + +<pre> + ... + <forward mode='nat' dev='eth0'/> + <b><bandwidth> + <inbound average='1mbit' peak='0.5mbps' burst='1m'/> + <outbound average='128kbit' peak='256kbit' burst='256kbit'/> + </bandwidth></b> + <mac address='00:16:3E:5D:C7:9E'/> + ...</pre> + + <p> + This part of network XML provides setting quality of service. Incoming + and outgoing traffic can be shaped independently. The + <code>bandwidth</code> element can have at most one <code>inbound</code> + and at most one <code>outbound</code> child elements. Leaving any of these + children element out result in no QoS applied on that traffic direction. + So, when you want to shape only network's incoming traffic, use + <code>inbound</code> only, and vice versa. Each of these elements have one + mandatory attribute <code>average</code>. It specifies average bit rate on + interface being shaped. Then there are two optional attributes: + <code>peak</code>, which specifies maximum rate at which bridge can send + data, and <code>burst</code>, amount of bytes that can be burst at + <code>peak</code> speed. Accepted values for attributes are decimal + numbers, optionally followed by unit. See <code>man tc</code> for + more details. The rate is shared equally within domains connected + to the network. + </p> + <h3><a name="elementsAddress">Addressing</a></h3> <p> diff --git a/docs/schemas/domain.rng b/docs/schemas/domain.rng index 891662d..b15c8e5 100644 --- a/docs/schemas/domain.rng +++ b/docs/schemas/domain.rng @@ -1155,6 +1155,9 @@ <optional> <ref name="deviceBoot"/> </optional> + <optional> + <ref name="bandwidth"/> + </optional> </interleave> </define> <define name="virtualPortProfile"> @@ -2307,6 +2310,43 @@ </element> </define> + <define name="bandwidth"> + <element name="bandwidth"> + <interleave> + <optional> + <element name="inbound"> + <ref name="bandwidth-attributes"/> + <empty/> + </element> + </optional> + <optional> + <element name="outbound"> + <ref name="bandwidth-attributes"/> + <empty/> + </element> + </optional> + </interleave> + </element> + </define> + + <define name="bandwidth-attributes"> + <interleave> + <attribute name="average"> + <ref name="speed"/> + </attribute> + <optional> + <attribute name="peak"> + <ref name="speed"/> + </attribute> + </optional> + <optional> + <attribute name='burst'> + <ref name="BurstSize"/> + </attribute> + </optional> + </interleave> + </define> + <!-- Optional hypervisor extensions in their own namespace: QEmu @@ -2557,4 +2597,14 @@ <param name="maxLength">39</param> </data> </define> + <define name="speed"> + <data type="string"> + <param name="pattern">[0-9]+(\.[0-9]+)?((Ki|k|mi|m|gi|g|ti|t|)?bit|(Ki|K|Mi|M|Gi|G|Ti|T)?Bps)?</param> + </data> + </define> + <define name="BurstSize"> + <data type="string"> + <param name="pattern">[0-9]+(\.[0-9]+)?(kb|k|mb|m|mbit|kbit|b|gb|g|gbit)?</param> + </data> + </define> </grammar> diff --git a/docs/schemas/network.rng b/docs/schemas/network.rng index 6d01b06..b3b7bcf 100644 --- a/docs/schemas/network.rng +++ b/docs/schemas/network.rng @@ -87,6 +87,10 @@ </element> </optional> + <optional> + <ref name="bandwidth"/> + </optional> + <!-- <ip> element --> <zeroOrMore> <!-- The IP element sets up NAT'ing and an optional DHCP server @@ -183,4 +187,51 @@ </data> </define> + <define name="bandwidth"> + <element name="bandwidth"> + <interleave> + <optional> + <element name="inbound"> + <ref name="bandwidth-attributes"/> + <empty/> + </element> + </optional> + <optional> + <element name="outbound"> + <ref name="bandwidth-attributes"/> + <empty/> + </element> + </optional> + </interleave> + </element> + </define> + + <define name="bandwidth-attributes"> + <interleave> + <attribute name="average"> + <ref name="speed"/> + </attribute> + <optional> + <attribute name="peak"> + <ref name="speed"/> + </attribute> + </optional> + <optional> + <attribute name='burst'> + <ref name="BurstSize"/> + </attribute> + </optional> + </interleave> + </define> + + <define name="speed"> + <data type="string"> + <param name="pattern">[0-9]+(\.[0-9]+)?((Ki|k|mi|m|gi|g|ti|t|)?bit|(Ki|K|Mi|M|Gi|G|Ti|T)?Bps)?</param> + </data> + </define> + <define name="BurstSize"> + <data type="string"> + <param name="pattern">[0-9]+(\.[0-9]+)?(kb|k|mb|m|mbit|kbit|b|gb|g|gbit)?</param> + </data> + </define> </grammar> -- 1.7.5.rc3

--- src/util/network.h | 16 ++++++++++++++++ 1 files changed, 16 insertions(+), 0 deletions(-) diff --git a/src/util/network.h b/src/util/network.h index ed0b78c..568bca1 100644 --- a/src/util/network.h +++ b/src/util/network.h @@ -45,6 +45,22 @@ typedef struct { typedef virSocketAddr *virSocketAddrPtr; +typedef struct { + /* Even if we let user to input rates + * in various units, we store them in bps */ + unsigned long average; + unsigned long peak; + unsigned long burst; +} virRate; + +typedef virRate *virRatePtr; + +typedef struct { + virRate in, out; +} virBandwidth; + +typedef virBandwidth *virBandwidthPtr; + int virSocketParseAddr (const char *val, virSocketAddrPtr addr, int hint); -- 1.7.5.rc3

These functions take on input decimal numbers optionally followed by unit. Units are exactly the same as 'tc' accepts. --- src/conf/domain_conf.c | 3 + src/conf/domain_conf.h | 1 + src/conf/network_conf.c | 5 + src/conf/network_conf.h | 1 + src/libvirt_private.syms | 1 + src/util/network.c | 203 ++++++++++++++++++++++++++++++++++++++++++++++ src/util/network.h | 2 + 7 files changed, 216 insertions(+), 0 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 3d290fb..b7c88c8 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2838,6 +2838,9 @@ virDomainNetDefParseXML(virCapsPtr caps, if (virDomainDeviceBootParseXML(cur, &def->bootIndex, bootMap)) goto error; + } else if (xmlStrEqual(cur->name, BAD_CAST "bandwidth")) { + if (virBandwidthDefParseNode(cur, &def->bandwidth) < 0) + goto error; } } cur = cur->next; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index aa25e36..25fafee 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -398,6 +398,7 @@ struct _virDomainNetDef { virDomainDeviceInfo info; char *filter; virNWFilterHashTablePtr filterparams; + virBandwidth bandwidth; }; enum virDomainChrDeviceType { diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index e4765ea..ef5d31e 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -584,6 +584,7 @@ virNetworkDefParseXML(xmlXPathContextPtr ctxt) virNetworkDefPtr def; char *tmp; xmlNodePtr *ipNodes = NULL; + xmlNodePtr bandwidthNode = NULL; int nIps; if (VIR_ALLOC(def) < 0) { @@ -619,6 +620,10 @@ virNetworkDefParseXML(xmlXPathContextPtr ctxt) /* Parse network domain information */ def->domain = virXPathString("string(./domain[1]/@name)", ctxt); + if ((bandwidthNode = virXPathNode("./bandwidth", ctxt)) != NULL && + virBandwidthDefParseNode(bandwidthNode, &def->bandwidth) < 0) + goto error; + /* Parse bridge information */ def->bridge = virXPathString("string(./bridge[1]/@name)", ctxt); tmp = virXPathString("string(./bridge[1]/@stp)", ctxt); diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index 281124b..65fbedd 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -101,6 +101,7 @@ struct _virNetworkDef { size_t nips; virNetworkIpDefPtr ips; /* ptr to array of IP addresses on this network */ + virBandwidth bandwidth; }; typedef struct _virNetworkObj virNetworkObj; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index d9d3913..a2c8470 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -681,6 +681,7 @@ nlComm; # network.h +virBandwidthDefParseNode; virSocketAddrBroadcast; virSocketAddrBroadcastByPrefix; virSocketAddrIsNetmask; diff --git a/src/util/network.c b/src/util/network.c index eb16e0c..476ecde 100644 --- a/src/util/network.c +++ b/src/util/network.c @@ -10,6 +10,7 @@ #include <config.h> #include <arpa/inet.h> +#include <math.h> #include "memory.h" #include "network.h" @@ -21,6 +22,9 @@ virReportErrorHelper(VIR_FROM_THIS, code, __FILE__, \ __FUNCTION__, __LINE__, __VA_ARGS__) +#define virBandwidthError(code, ...) \ + virReportErrorHelper(VIR_FROM_THIS, code, __FILE__, \ + __FUNCTION__, __LINE__, __VA_ARGS__) /* * Helpers to extract the IP arrays from the virSocketAddrPtr * That part is the less portable of the module @@ -674,3 +678,202 @@ virSocketAddrPrefixToNetmask(unsigned int prefix, error: return result; } + +static const struct rate_suffix { + const char *name; + double scale; +} suffixes[] = { + { "bit", 1. }, + { "Kibit", 1024. }, + { "kbit", 1000. }, + { "mibit", 1024.*1024. }, + { "mbit", 1000.*1000. }, + { "gibit", 1024.*1024.*1024. }, + { "gbit", 1000.*1000.*1000. }, + { "tibit", 1024.*1024.*1024.*1024. }, + { "tbit", 1000.*1000.*1000.*1000. }, + { "Bps", 8. }, + { "KiBps", 8.*1024. }, + { "KBps", 8.*1000. }, + { "MiBps", 8.*1024.*1024. }, + { "MBps", 8.*1000.*1000. }, + { "GiBps", 8.*1024.*1024.*1024. }, + { "GBps", 8.*1000.*1000.*1000. }, + { "TiBps", 8.*1024.*1024.*1024.*1024. }, + { "TBps", 8.*1000.*1000.*1000.*1000. }, + { NULL, 0 } +}; + +static int +virRateToBps(const char *str, unsigned long *rate) +{ + char *p; + double bps; + const struct rate_suffix *s; + + if (virStrToDouble(str, &p, &bps) < 0) + return -1; + + if (p == str) + return -1; + + if (*p == '\0') { + *rate = bps / 8.; /* assume bytes/sec */ + return 0; + } + + for (s = suffixes; s->name; ++s) { + if (STRCASEEQ(s->name, p)) { + *rate = (bps * s->scale) / 8.; + return 0; + } + } + + return -1; +} + +static int +virSizeToB(const char *str, unsigned long *size) +{ + double sz; + char *p; + + if (virStrToDouble(str, &p, &sz) < 0) + return -1; + + if (p == str) + return -1; + + if (*p) { + if (STRCASEEQ(p, "kb") || STRCASEEQ(p, "k")) + sz *= 1024; + else if (STRCASEEQ(p, "gb") || STRCASEEQ(p, "g")) + sz *= 1024*1024*1024; + else if (STRCASEEQ(p, "gbit")) + sz *= 1024*1024*1024/8; + else if (STRCASEEQ(p, "mb") || STRCASEEQ(p, "m")) + sz *= 1024*1024; + else if (STRCASEEQ(p, "mbit")) + sz *= 1024*1024/8; + else if (STRCASEEQ(p, "kbit")) + sz *= 1024/8; + else if (STRCASENEQ(p, "b")) + return -1; + } + + *size = sz; + return 0; +} + +static int +virBandwidthParseChildDefNode(xmlNodePtr node, virRatePtr rate) +{ + int ret = -1; + char *average = NULL; + char *peak = NULL; + char *burst = NULL; + + if (!node || !rate) + return -1; + + average = virXMLPropString(node, "average"); + peak = virXMLPropString(node, "peak"); + burst = virXMLPropString(node, "burst"); + + if (average) { + if (virRateToBps(average, &rate->average) < 0) { + virBandwidthError(VIR_ERR_CONFIG_UNSUPPORTED, + _("could not convert %s"), + average); + goto cleanup; + } + } else { + virBandwidthError(VIR_ERR_XML_DETAIL, "%s", + _("Missing mandatory average attribute")); + goto cleanup; + } + + if (peak && virRateToBps(peak, &rate->peak) < 0) { + virBandwidthError(VIR_ERR_CONFIG_UNSUPPORTED, + _("could not convert %s"), + peak); + goto cleanup; + } + + if (burst && virSizeToB(burst, &rate->burst) < 0) { + virBandwidthError(VIR_ERR_CONFIG_UNSUPPORTED, + _("could not convert %s"), + burst); + goto cleanup; + } + + ret = 0; + +cleanup: + VIR_FREE(average); + VIR_FREE(peak); + VIR_FREE(burst); + + return ret; +} + +/** + * virBandwidthParseXML: + * @node: XML node + * @def: where to store the parsed result + * + * Parse bandwidth XML and store into given pointer + * + * Returns 0 on success, -1 on error. + */ +int +virBandwidthDefParseNode(xmlNodePtr node, virBandwidthPtr def) +{ + int ret = -1; + xmlNodePtr cur = node->children; + xmlNodePtr in = NULL, out = NULL; + + if (!node || !def || + !xmlStrEqual(node->name, BAD_CAST "bandwidth")) + return -1; + + memset(def, 0, sizeof(virBandwidth)); + while (cur) { + if (cur->type == XML_ELEMENT_NODE) { + if (xmlStrEqual(cur->name, BAD_CAST "inbound")) { + if (in) { + virBandwidthError(VIR_ERR_XML_DETAIL, "%s", + _("Only one child <inbound> " + "element allowed")); + goto cleanup; + } + in = cur; + } else if (xmlStrEqual(cur->name, BAD_CAST "outbound")) { + if (out) { + virBandwidthError(VIR_ERR_XML_DETAIL, "%s", + _("Only one child <outbound> " + "element allowed")); + goto cleanup; + } + out = cur; + } else { + virBandwidthError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Unknown element %s"), + cur->name); + goto cleanup; + } + } + cur = cur->next; + } + + if (in && virBandwidthParseChildDefNode(in, &def->in) < 0) + goto cleanup; + + if (out && virBandwidthParseChildDefNode(out, &def->out) < 0) + goto cleanup; + + ret = 0; + +cleanup: + return ret; +} diff --git a/src/util/network.h b/src/util/network.h index 568bca1..28a6402 100644 --- a/src/util/network.h +++ b/src/util/network.h @@ -20,6 +20,7 @@ # endif # include <netdb.h> # include <netinet/in.h> +# include "xml.h" typedef struct { union { @@ -106,4 +107,5 @@ int virSocketAddrPrefixToNetmask(unsigned int prefix, virSocketAddrPtr netmask, int family); +int virBandwidthDefParseNode(xmlNodePtr node, virBandwidthPtr def); #endif /* __VIR_NETWORK_H__ */ -- 1.7.5.rc3

--- src/conf/domain_conf.c | 3 + src/conf/network_conf.c | 3 + src/libvirt_private.syms | 3 + src/util/network.c | 153 ++++++++++++++++++++++++++++++++++++++++++++++ src/util/network.h | 7 ++ 5 files changed, 169 insertions(+), 0 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index b7c88c8..0dfcd5c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -8687,6 +8687,9 @@ virDomainNetDefFormat(virBufferPtr buf, virBufferAddLit(buf, " </tune>\n"); } + if (virBandwidrhDefFormat(buf, &def->bandwidth, " ") < 0) + return -1; + if (virDomainDeviceInfoFormat(buf, &def->info, flags) < 0) return -1; diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index ef5d31e..02b4cca 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -886,6 +886,9 @@ char *virNetworkDefFormat(const virNetworkDefPtr def) if (def->domain) virBufferAsprintf(&buf, " <domain name='%s'/>\n", def->domain); + if (virBandwidrhDefFormat(&buf, &def->bandwidth, " ") < 0) + goto error; + for (ii = 0; ii < def->nips; ii++) { if (virNetworkIpDefFormat(&buf, &def->ips[ii]) < 0) goto error; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index a2c8470..6640cfb 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -681,7 +681,10 @@ nlComm; # network.h +virBandwidrhDefFormat; virBandwidthDefParseNode; +virBpsToRate; +virBtoSize; virSocketAddrBroadcast; virSocketAddrBroadcastByPrefix; virSocketAddrIsNetmask; diff --git a/src/util/network.c b/src/util/network.c index 476ecde..b24f977 100644 --- a/src/util/network.c +++ b/src/util/network.c @@ -877,3 +877,156 @@ virBandwidthDefParseNode(xmlNodePtr node, virBandwidthPtr def) cleanup: return ret; } + +/** + * virBpsToRate: + * @buf: output + * @rate: input + * + * Format rate, take bps as input. Do pretty formating. + * Caller MUST free @buf after use. + * + * Return 0 on success, -1 otherwise. + */ +int virBpsToRate(char **buf, unsigned long rate) { + int ret = -1; + double tmp = (double)rate*8; + + if (tmp >= 1000.*1000.*1000.) { + if (virAsprintf(buf, "%.0fmbit", tmp/1000./1000.) < 0) + goto cleanup; + } else if (tmp >= 1000.*1000.) { + if (virAsprintf(buf, "%.0fkbit", tmp/1000.) < 0) + goto cleanup; + } else { + if (virAsprintf(buf, "%.0fbit", tmp) < 0) + goto cleanup; + } + + ret = 0; + +cleanup: + return ret; +} + +/** + * virBtoSize: + * @buf: output + * @size: input + * + * Format size, take size in bytes, produce prettier format. + * Caller MUST free @buf after use. + * + * Return 0 on success, -1 otherwise. + */ +int virBtoSize(char **buf, unsigned long size) { + int ret = -1; + double tmp = (double)size; + + if (size >= 1024*1024 && + fabs(1024*1024 * rint(tmp/(1024*1024)) - size) < 1024) { + if (virAsprintf(buf, "%gmb", rint(tmp/(1024*1024))) < 0) + goto cleanup; + } else if (size >= 1024 && + fabs(1024*rint(tmp/1024) - size) < 16) { + if (virAsprintf(buf, "%gkb", rint(tmp/1024)) < 0) + goto cleanup; + } else { + if (virAsprintf(buf, "%lub", size) < 0) + goto cleanup; + } + + ret = 0; + +cleanup: + return ret; +} + +static int +virBandwidthChildDefFormat(virBufferPtr buf, + virRatePtr def, + const char *elem_name) +{ + int ret = -1; + char *tmp = NULL; + + if (!buf || !def || !elem_name) + return -1; + + if (def->average) { + if (virBpsToRate(&tmp, def->average) < 0) + goto cleanup; + virBufferAsprintf(buf, "<%s average='%s'", elem_name, tmp); + VIR_FREE(tmp); + + if (def->peak) { + if (virBpsToRate(&tmp, def->peak) < 0) + goto cleanup; + virBufferAsprintf(buf, " peak='%s'", tmp); + VIR_FREE(tmp); + } + + if (def->burst) { + if (virBtoSize(&tmp, def->burst)) + goto cleanup; + virBufferAsprintf(buf, " burst='%s'", tmp); + VIR_FREE(tmp); + } + virBufferAddLit(buf, "/>\n"); + } + + ret = 0; + +cleanup: + return ret; +} + +/** + * virBandwidrhDefFormat: + * @buf: Buffer to print to + * @def: Data source + * @indent: prepend all lines printed with this + * + * Formats bandwidth and prepend each line with @indent. + * Passing NULL to @indent is equivalent passing "". + * + * Returns 0 on success, else -1. + */ +int +virBandwidrhDefFormat(virBufferPtr buf, + virBandwidthPtr def, + const char *indent) +{ + int ret = -1; + + if (!buf || !def) + goto cleanup; + + if (!indent) + indent = ""; + + if (!def->in.average && !def->out.average) { + ret = 0; + goto cleanup; + } + + virBufferAsprintf(buf, "%s<bandwidth>\n", indent); + if (def->in.average) { + virBufferAsprintf(buf, "%s ", indent); + if (virBandwidthChildDefFormat(buf, &def->in, "inbound") < 0) + goto cleanup; + } + + if (def->out.average) { + virBufferAsprintf(buf, "%s ", indent); + if (virBandwidthChildDefFormat(buf, &def->out, "outbound") < 0) + goto cleanup; + } + + virBufferAsprintf(buf, "%s</bandwidth>\n", indent); + + ret = 0; + +cleanup: + return ret; +} diff --git a/src/util/network.h b/src/util/network.h index 28a6402..f8cdc55 100644 --- a/src/util/network.h +++ b/src/util/network.h @@ -21,6 +21,7 @@ # include <netdb.h> # include <netinet/in.h> # include "xml.h" +# include "buf.h" typedef struct { union { @@ -108,4 +109,10 @@ int virSocketAddrPrefixToNetmask(unsigned int prefix, int family); int virBandwidthDefParseNode(xmlNodePtr node, virBandwidthPtr def); +int virBandwidrhDefFormat(virBufferPtr buf, + virBandwidthPtr def, + const char *indent); + +int virBpsToRate(char **buf, unsigned long rate); +int virBtoSize(char **buf, unsigned long size); #endif /* __VIR_NETWORK_H__ */ -- 1.7.5.rc3

These function executes 'tc' with appropriate arguments to set desired QoS setting on interface or bridge during its creation. --- configure.ac | 4 + src/libvirt_private.syms | 2 + src/network/bridge_driver.c | 8 ++ src/qemu/qemu_command.c | 5 ++ src/util/network.c | 152 +++++++++++++++++++++++++++++++++++++++++++ src/util/network.h | 3 + 6 files changed, 174 insertions(+), 0 deletions(-) diff --git a/configure.ac b/configure.ac index f816696..d563e94 100644 --- a/configure.ac +++ b/configure.ac @@ -165,6 +165,8 @@ AC_PATH_PROG([RADVD], [radvd], [radvd], [/sbin:/usr/sbin:/usr/local/sbin:$PATH]) AC_PATH_PROG([BRCTL], [brctl], [brctl], [/sbin:/usr/sbin:/usr/local/sbin:$PATH]) +AC_PATH_PROG([TC], [tc], [tc], + [/sbin:/usr/sbin:/usr/local/sbin:$PATH]) AC_PATH_PROG([UDEVADM], [udevadm], [], [/sbin:/usr/sbin:/usr/local/sbin:$PATH]) AC_PATH_PROG([UDEVSETTLE], [udevsettle], [], @@ -178,6 +180,8 @@ AC_DEFINE_UNQUOTED([RADVD],["$RADVD"], [Location or name of the radvd program]) AC_DEFINE_UNQUOTED([BRCTL],["$BRCTL"], [Location or name of the brctl program (see bridge-utils)]) +AC_DEFINE_UNQUOTED([TC],["$TC"], + [Location or name of the tc profram (see iproute2)]) if test -n "$UDEVADM"; then AC_DEFINE_UNQUOTED([UDEVADM],["$UDEVADM"], [Location or name of the udevadm program]) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 6640cfb..adf9cb0 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -683,6 +683,8 @@ nlComm; # network.h virBandwidrhDefFormat; virBandwidthDefParseNode; +virBandwidthDisable; +virBandwidthEnable; virBpsToRate; virBtoSize; virSocketAddrBroadcast; diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 4b94959..80f852f 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -1745,6 +1745,9 @@ networkStartNetworkDaemon(struct network_driver *driver, if (v6present && networkStartRadvd(network) < 0) goto err4; + if (virBandwidthEnable(&network->def->bandwidth, network->def->bridge) < 0) + goto err5; + /* Persist the live configuration now we have bridge info */ if (virNetworkSaveConfig(NETWORK_STATE_DIR, network->def) < 0) { goto err5; @@ -1836,6 +1839,11 @@ static int networkShutdownNetworkDaemon(struct network_driver *driver, unlink(stateFile); VIR_FREE(stateFile); + if (virBandwidthDisable(network->def->bridge, true) < 0) { + VIR_WARN("Failed to disable QoS on %s", + network->def->name); + } + if (network->radvdPid > 0) { char *radvdpidbase; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index b517e1a..5ef73ed 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -291,6 +291,11 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, } } + if (virBandwidthEnable(&net->bandwidth, net->ifname) < 0) { + VIR_FORCE_CLOSE(tapfd); + goto cleanup; + } + if (tapfd >= 0) { if ((net->filter) && (net->ifname)) { err = virDomainConfNWFilterInstantiate(conn, net); diff --git a/src/util/network.c b/src/util/network.c index b24f977..7bba60c 100644 --- a/src/util/network.c +++ b/src/util/network.c @@ -16,6 +16,7 @@ #include "network.h" #include "util.h" #include "virterror_internal.h" +#include "command.h" #define VIR_FROM_THIS VIR_FROM_NONE #define virSocketError(code, ...) \ @@ -1030,3 +1031,154 @@ virBandwidrhDefFormat(virBufferPtr buf, cleanup: return ret; } + +/** + * virBandwidthEnable: + * @bandwidth: rates to set + * @iface: on which interface + * + * This function enables QoS on specified interface + * and set given traffic limits for both, incoming + * and outgoing traffic. Any previous setting get + * overwritten. + * + * Return 0 on success, -1 otherwise. + */ +int +virBandwidthEnable(virBandwidthPtr bandwidth, + const char *iface) +{ + int ret = -1; + virCommandPtr cmd = NULL; + char *average = NULL; + char *peak = NULL; + char *burst = NULL; + + if (!bandwidth || !iface) + return -1; + + if (!bandwidth->in.average && + !bandwidth->out.average) { + /* nothing to be enabled */ + ret = 0; + goto cleanup; + } + + if (virBandwidthDisable(iface, true) < 0) + goto cleanup; + + if (bandwidth->in.average) { + if (virBpsToRate(&average, bandwidth->in.average) < 0) + goto cleanup; + if (bandwidth->in.peak && + (virBpsToRate(&peak, bandwidth->in.peak) < 0)) + goto cleanup; + if (bandwidth->in.burst && + (virBtoSize(&burst, bandwidth->in.burst) < 0)) + goto cleanup; + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "add", "dev", iface, "root", "handle", "1:", "htb", "default", "1", NULL); + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd,"class", "add", "dev", iface, "parent", "1:", "classid", "1:1", "htb", NULL); + virCommandAddArgList(cmd, "rate", average, NULL); + + if (peak) + virCommandAddArgList(cmd, "ceil", peak, NULL); + if (burst) + virCommandAddArgList(cmd, "burst", burst, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd,"filter", "add", "dev", iface, "parent", "1:0", "protocol", "ip", "handle", "1", "fw", "flowid", "1", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + VIR_FREE(average); + VIR_FREE(peak); + VIR_FREE(burst); + } + + if (bandwidth->out.average) { + if (virBpsToRate(&average, bandwidth->out.average) < 0) + goto cleanup; + if (virBtoSize(&burst, bandwidth->out.burst ? bandwidth->out.burst : bandwidth->out.average) < 0) + goto cleanup; + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "add", "dev", iface, "ingress", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "filter", "add", "dev", iface, "parent", "ffff:", "protocol", "ip", "u32", "match", "ip", "src", "0.0.0.0/0", "police", "rate", average, "burst", burst, "drop", "flowid", ":1", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + } + + ret = 0; + +cleanup: + virCommandFree(cmd); + VIR_FREE(average); + VIR_FREE(peak); + VIR_FREE(burst); + return ret; +} + +/** + * virBandwidthDisable: + * @iface: on which interface + * @may_fail: should be unsuccessful disable considered fatal? + * + * This function tries to disable QoS on specified interface + * by deleting root and ingress qdisc. However, this may fail + * if we try to remove the default one. + * + * Return 0 on success, -1 otherwise. + */ +int +virBandwidthDisable(const char *iface, + bool may_fail) +{ + int ret = -1; + int status; + virCommandPtr cmd = NULL; + + if (!iface) + return -1; + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "del", "dev", iface, "root", NULL); + + if ((virCommandRun(cmd, &status) < 0) || + (!may_fail && status)) + goto cleanup; + + virCommandFree(cmd); + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "del", "dev", iface, "ingress", NULL); + + if ((virCommandRun(cmd, &status) < 0) || + (!may_fail && status)) + goto cleanup; + + ret = 0; + +cleanup: + virCommandFree(cmd); + return ret; +} diff --git a/src/util/network.h b/src/util/network.h index f8cdc55..5a88e51 100644 --- a/src/util/network.h +++ b/src/util/network.h @@ -115,4 +115,7 @@ int virBandwidrhDefFormat(virBufferPtr buf, int virBpsToRate(char **buf, unsigned long rate); int virBtoSize(char **buf, unsigned long size); + +int virBandwidthEnable(virBandwidthPtr bandwidth, const char *iface); +int virBandwidthDisable(const char *iface, bool may_fail); #endif /* __VIR_NETWORK_H__ */ -- 1.7.5.rc3

--- tests/networkxml2xmlin/bandwidth-network.xml | 16 ++++++++++++++++ tests/networkxml2xmlout/bandwidth-network.xml | 16 ++++++++++++++++ tests/networkxml2xmltest.c | 1 + 3 files changed, 33 insertions(+), 0 deletions(-) create mode 100644 tests/networkxml2xmlin/bandwidth-network.xml create mode 100644 tests/networkxml2xmlout/bandwidth-network.xml diff --git a/tests/networkxml2xmlin/bandwidth-network.xml b/tests/networkxml2xmlin/bandwidth-network.xml new file mode 100644 index 0000000..a1301df --- /dev/null +++ b/tests/networkxml2xmlin/bandwidth-network.xml @@ -0,0 +1,16 @@ +<network> + <name>test-net</name> + <uuid>986fed9e-a488-186d-ef2d-17ebfd1993f8</uuid> + <forward mode='nat'/> + <bridge name='virbr1' stp='on' delay='0' /> + <mac address='52:54:00:E6:A2:C9'/> + <bandwidth> + <inbound average='1mbit' peak='2mbit' burst='1m'/> + <outbound average='2mbit'/> + </bandwidth> + <ip address='192.168.120.1' netmask='255.255.255.0'> + <dhcp> + <range start='192.168.120.2' end='192.168.120.254' /> + </dhcp> + </ip> +</network> diff --git a/tests/networkxml2xmlout/bandwidth-network.xml b/tests/networkxml2xmlout/bandwidth-network.xml new file mode 100644 index 0000000..6c31de7 --- /dev/null +++ b/tests/networkxml2xmlout/bandwidth-network.xml @@ -0,0 +1,16 @@ +<network> + <name>test-net</name> + <uuid>986fed9e-a488-186d-ef2d-17ebfd1993f8</uuid> + <forward mode='nat'/> + <bridge name='virbr1' stp='on' delay='0' /> + <mac address='52:54:00:E6:A2:C9'/> + <bandwidth> + <inbound average='1000kbit' peak='2000kbit' burst='1mb'/> + <outbound average='2000kbit'/> + </bandwidth> + <ip address='192.168.120.1' netmask='255.255.255.0'> + <dhcp> + <range start='192.168.120.2' end='192.168.120.254' /> + </dhcp> + </ip> +</network> diff --git a/tests/networkxml2xmltest.c b/tests/networkxml2xmltest.c index 468785b..a29e023 100644 --- a/tests/networkxml2xmltest.c +++ b/tests/networkxml2xmltest.c @@ -86,6 +86,7 @@ mymain(void) DO_TEST("nat-network"); DO_TEST("netboot-network"); DO_TEST("netboot-proxy-network"); + DO_TEST("bandwidth-network"); return (ret==0 ? EXIT_SUCCESS : EXIT_FAILURE); } -- 1.7.5.rc3

--- tests/domainschemadata/domain-bandwidth.xml | 72 +++++++++++++++++++++++++++ 1 files changed, 72 insertions(+), 0 deletions(-) create mode 100644 tests/domainschemadata/domain-bandwidth.xml diff --git a/tests/domainschemadata/domain-bandwidth.xml b/tests/domainschemadata/domain-bandwidth.xml new file mode 100644 index 0000000..98b989a --- /dev/null +++ b/tests/domainschemadata/domain-bandwidth.xml @@ -0,0 +1,72 @@ +<domain type='kvm'> + <name>f14-60</name> + <uuid>38644c45-5227-a936-3b38-bc4f72c355bb</uuid> + <memory>1048576</memory> + <currentMemory>1048576</currentMemory> + <vcpu>2</vcpu> + <os> + <type arch='x86_64' machine='pc-0.13'>hvm</type> + <boot dev='cdrom'/> + <boot dev='hd'/> + <bootmenu enable='yes'/> + </os> + <features> + <acpi/> + <apic/> + <pae/> + </features> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>restart</on_crash> + <devices> + <emulator>/usr/bin/qemu-kvm</emulator> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2'/> + <source file='/var/lib/libvirt/images/f14-6.img'/> + <target dev='vda' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </disk> + <disk type='file' device='cdrom'> + <driver name='qemu' type='raw'/> + <source file='/home/zippy/tmp/Fedora-14-x86_64-Live-KDE.iso'/> + <target dev='hdc' bus='ide'/> + <readonly/> + <address type='drive' controller='0' bus='1' unit='0'/> + </disk> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='virtio-serial' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> + </controller> + <interface type='network'> + <mac address='52:54:00:24:a5:9f'/> + <source network='default'/> + <bandwidth> + <inbound average='1000kbit' peak='4000kbit' burst='1mb'/> + <outbound average='128000bit' peak='256000bit' burst='32kb'/> + </bandwidth> + <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> + </interface> + <serial type='pty'> + <target port='0'/> + </serial> + <console type='pty'> + <target type='serial' port='0'/> + </console> + <input type='tablet' bus='usb'/> + <input type='mouse' bus='ps2'/> + <graphics type='vnc' port='-1' autoport='yes'/> + <sound model='ac97'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </sound> + <video> + <model type='vga' vram='9216' heads='1'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </video> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </memballoon> + </devices> +</domain> -- 1.7.5.rc3

On 06/23/2011 11:36 AM, Michal Privoznik wrote:
This patch series add support for setting traffic shaping and policing on both domain's interface and network's virtual bridge. Basically, this is done via 'tc' from iproute2 package. For shaping is HTB used, for policing we need u32 match selector. Both should be available in RHEL-6 kernel.
Some general questions: 1) Just to make sure I understand it, if <bandwidth> is specified in the network XML, that will be applied to the bridge itself, not to each interface on the bridge. (So, if we want to have the per-guest bandwidth specified in a single place, that will be where my proposed <portgroup> comes in - a <bandwidth> specified in a <portgroup> will be applied individually to each guest interface that is part of the portgroup.) 2) Following on (2). Once host bridges (ie, a bridge device that's directly connected to a physical NIC) are available as <network>s, can the limiting then be applied to the bridge device, or will it instead need to be applied to the ethernet device that the bridge is using to attach to the physical network? If the latter is the case, then we'll need a way to determine which device that is (currently I hadn't planned on putting that information in the XML, because it's configured outside of libvirt, and this could lead to mismatches between what libvirt XML shows and reality). 3) Similarly for macvtap <network>s, will the network-wide bandwidth limiting be applied to the physical ethernet device? This would have the side effect of including host traffic on that interface in the bandwidth totals, but I don't see a way around it. 4) Finally on that topic, what about <network>s that have a pool of physical ethernets to be used macvtap-style? Is there any way we can do bandwidth limiting on an aggregated collection of network interfaces? Or will attempts to configure this necessarily result in a "config not supported" log message? 5) I see you've put the data structures and parsing/formatting functions that are used by both network_conf.c and domain_conf.c in the util directory. I've been wondering myself what to do with similar structs/functions that will be used in both network and domain XML (eg, <virtualport> will soon be used by both) - does everyone agree this is the right way to handle it? Or should there be a common file in the conf directory instead? (I don't have an opinion, but want to "do the right thing" when I write my own code)

On 23.06.2011 19:17, Laine Stump wrote:
On 06/23/2011 11:36 AM, Michal Privoznik wrote:
This patch series add support for setting traffic shaping and policing on both domain's interface and network's virtual bridge. Basically, this is done via 'tc' from iproute2 package. For shaping is HTB used, for policing we need u32 match selector. Both should be available in RHEL-6 kernel.
Some general questions:
1) Just to make sure I understand it, if <bandwidth> is specified in the network XML, that will be applied to the bridge itself, not to each interface on the bridge. (So, if we want to have the per-guest bandwidth specified in a single place, that will be where my proposed <portgroup> comes in - a <bandwidth> specified in a <portgroup> will be applied individually to each guest interface that is part of the portgroup.) Yes. If I want to support per-guest traffic shaping in these patches, I need to create a separate XML for that. Neither <network> nor <domain> is not the right place. So I'd end up with the same structure as you.
2) Following on (2). Once host bridges (ie, a bridge device that's directly connected to a physical NIC) are available as <network>s, can the limiting then be applied to the bridge device, or will it instead need to be applied to the ethernet device that the bridge is using to attach to the physical network? If the latter is the case, then we'll need a way to determine which device that is (currently I hadn't planned on putting that information in the XML, because it's configured outside of libvirt, and this could lead to mismatches between what libvirt XML shows and reality). Fortunately, it is the first case. Therefore is XML inserted into <interface> and <network>. Rules are applied only on desired device. Not anywhere else.
3) Similarly for macvtap <network>s, will the network-wide bandwidth limiting be applied to the physical ethernet device? This would have the side effect of including host traffic on that interface in the bandwidth totals, but I don't see a way around it. With this patch as-is, shaping rules are applied only when creating TAP devices. This mean only network types VIR_DOMAIN_NET_TYPE_NETWORK and VIR_DOMAIN_NET_TYPE_BRIDGE.
4) Finally on that topic, what about <network>s that have a pool of physical ethernets to be used macvtap-style? Is there any way we can do bandwidth limiting on an aggregated collection of network interfaces? Or will attempts to configure this necessarily result in a "config not supported" log message? Huh, I didn't know it is possible to have a pool of devices within one <network>. So in this case, this patch silently does nothing.
5) I see you've put the data structures and parsing/formatting functions that are used by both network_conf.c and domain_conf.c in the util directory. I've been wondering myself what to do with similar structs/functions that will be used in both network and domain XML (eg, <virtualport> will soon be used by both) - does everyone agree this is the right way to handle it? Or should there be a common file in the conf directory instead? (I don't have an opinion, but want to "do the right thing" when I write my own code) Yeah, I was thinking the same way. When I wrote the code, I've decided for src/util. But I would not hesitate if somebody wants it in src/conf.
Michal

On 06/24/2011 01:34 AM, Michal Privoznik wrote:
5) I see you've put the data structures and parsing/formatting functions that are used by both network_conf.c and domain_conf.c in the util directory. I've been wondering myself what to do with similar structs/functions that will be used in both network and domain XML (eg, <virtualport> will soon be used by both) - does everyone agree this is the right way to handle it? Or should there be a common file in the conf directory instead? (I don't have an opinion, but want to "do the right thing" when I write my own code) Yeah, I was thinking the same way. When I wrote the code, I've decided for src/util. But I would not hesitate if somebody wants it in src/conf.
We already have src/util/sysinfo.[ch] for common parsing/formatting of sysinfo XML, so I'm okay with src/util. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 06/24/2011 10:38 AM, Eric Blake wrote:
On 06/24/2011 01:34 AM, Michal Privoznik wrote:
5) I see you've put the data structures and parsing/formatting functions that are used by both network_conf.c and domain_conf.c in the util directory. I've been wondering myself what to do with similar structs/functions that will be used in both network and domain XML (eg, <virtualport> will soon be used by both) - does everyone agree this is the right way to handle it? Or should there be a common file in the conf directory instead? (I don't have an opinion, but want to "do the right thing" when I write my own code) Yeah, I was thinking the same way. When I wrote the code, I've decided for src/util. But I would not hesitate if somebody wants it in src/conf. We already have src/util/sysinfo.[ch] for common parsing/formatting of sysinfo XML, so I'm okay with src/util.
Ah, good. I hadn't seen that one. So it's all settled.

On 06/24/2011 03:34 AM, Michal Privoznik wrote:
On 23.06.2011 19:17, Laine Stump wrote:
3) Similarly for macvtap<network>s, will the network-wide bandwidth limiting be applied to the physical ethernet device? This would have the side effect of including host traffic on that interface in the bandwidth totals, but I don't see a way around it. With this patch as-is, shaping rules are applied only when creating TAP devices. This mean only network types VIR_DOMAIN_NET_TYPE_NETWORK and VIR_DOMAIN_NET_TYPE_BRIDGE.
We really need to support VIR_DOMAIN_NET_TYPE_DIRECT (macvtap) as well. It also uses a tap device.
4) Finally on that topic, what about<network>s that have a pool of physical ethernets to be used macvtap-style? Is there any way we can do bandwidth limiting on an aggregated collection of network interfaces? Or will attempts to configure this necessarily result in a "config not supported" log message? Huh, I didn't know it is possible to have a pool of devices within one <network>. So in this case, this patch silently does nothing.
It isn't possible, yet. :-)

3) Similarly for macvtap <network>s, will the network-wide bandwidth limiting be applied to the physical ethernet device? This would have the side effect of including host traffic on that interface in the bandwidth totals, but I don't see a way around it. With this patch as-is, shaping rules are applied only when creating TAP devices. This mean only network types VIR_DOMAIN_NET_TYPE_NETWORK and VIR_DOMAIN_NET_TYPE_BRIDGE.
4) Finally on that topic, what about <network>s that have a pool of physical ethernets to be used macvtap-style? Is there any way we can do bandwidth limiting on an aggregated collection of network interfaces? Or will attempts to configure this necessarily result in a "config not supported" log message? Huh, I didn't know it is possible to have a pool of devices within one <network>. So in this case, this patch silently does nothing.
The IFB (Intermediate Functional Block) allows you to configure aggregate QoS on multiple interfaces. /Chris

On 24.06.2011 20:00, Christian Benvenuti (benve) wrote:
3) Similarly for macvtap <network>s, will the network-wide bandwidth limiting be applied to the physical ethernet device? This would have the side effect of including host traffic on that interface in the bandwidth totals, but I don't see a way around it. With this patch as-is, shaping rules are applied only when creating TAP devices. This mean only network types VIR_DOMAIN_NET_TYPE_NETWORK and VIR_DOMAIN_NET_TYPE_BRIDGE.
4) Finally on that topic, what about <network>s that have a pool of physical ethernets to be used macvtap-style? Is there any way we can do bandwidth limiting on an aggregated collection of network interfaces? Or will attempts to configure this necessarily result in a "config not supported" log message? Huh, I didn't know it is possible to have a pool of devices within one <network>. So in this case, this patch silently does nothing.
The IFB (Intermediate Functional Block) allows you to configure aggregate QoS on multiple interfaces.
/Chris
Yes, but we would then need to create those IFB devices on the fly (e.g. on domain startup) and I don't think there is other way than unloading and then loading the ifb module (with different parameter) which however would break other domains connections. Or am I missing something? But I agree, using ifb would be much more beautiful, because we could actually shape incoming traffic instead of dropping. Michal

-----Original Message----- From: Michal Privoznik [mailto:mprivozn@redhat.com] Sent: Monday, June 27, 2011 8:34 AM To: Christian Benvenuti (benve) Cc: Laine Stump; libvir-list@redhat.com Subject: Re: [libvirt] [PATCH 0/7] Add support for setting QoS
On 24.06.2011 20:00, Christian Benvenuti (benve) wrote:
3) Similarly for macvtap <network>s, will the network-wide bandwidth limiting be applied to the physical ethernet device? This would have the side effect of including host traffic on that interface in the bandwidth totals, but I don't see a way around it. With this patch as-is, shaping rules are applied only when creating TAP devices. This mean only network types VIR_DOMAIN_NET_TYPE_NETWORK and VIR_DOMAIN_NET_TYPE_BRIDGE.
4) Finally on that topic, what about <network>s that have a pool of physical ethernets to be used macvtap-style? Is there any way we can do bandwidth limiting on an aggregated collection of network interfaces? Or will attempts to configure this necessarily result in a "config not supported" log message? Huh, I didn't know it is possible to have a pool of devices within one <network>. So in this case, this patch silently does nothing.
The IFB (Intermediate Functional Block) allows you to configure aggregate QoS on multiple interfaces.
/Chris
Yes, but we would then need to create those IFB devices on the fly (e.g. on domain startup) and I don't think there is other way than unloading and then loading the ifb module (with different parameter) which however would break other domains connections. Or am I missing something?
Adding support for dynamic creation/deletion of IFB devices (for example via netlink) should not be that hard. If that's the only reason for not using it, I would consider the option of extending the IFB module. /Chris
But I agree, using ifb would be much more beautiful, because we could actually shape incoming traffic instead of dropping.
Michal

On 27.06.2011 19:37, Christian Benvenuti (benve) wrote:
-----Original Message----- From: Michal Privoznik [mailto:mprivozn@redhat.com] Sent: Monday, June 27, 2011 8:34 AM To: Christian Benvenuti (benve) Cc: Laine Stump; libvir-list@redhat.com Subject: Re: [libvirt] [PATCH 0/7] Add support for setting QoS
On 24.06.2011 20:00, Christian Benvenuti (benve) wrote:
3) Similarly for macvtap <network>s, will the network-wide bandwidth limiting be applied to the physical ethernet device? This would have the side effect of including host traffic on that interface in the bandwidth totals, but I don't see a way around it. With this patch as-is, shaping rules are applied only when creating TAP devices. This mean only network types VIR_DOMAIN_NET_TYPE_NETWORK and VIR_DOMAIN_NET_TYPE_BRIDGE.
4) Finally on that topic, what about <network>s that have a pool of physical ethernets to be used macvtap-style? Is there any way we can do bandwidth limiting on an aggregated collection of network interfaces? Or will attempts to configure this necessarily result in a "config not supported" log message? Huh, I didn't know it is possible to have a pool of devices within one <network>. So in this case, this patch silently does nothing.
The IFB (Intermediate Functional Block) allows you to configure aggregate QoS on multiple interfaces.
/Chris
Yes, but we would then need to create those IFB devices on the fly (e.g. on domain startup) and I don't think there is other way than unloading and then loading the ifb module (with different parameter) which however would break other domains connections. Or am I missing something?
Adding support for dynamic creation/deletion of IFB devices (for example via netlink) should not be that hard. If that's the only reason for not using it, I would consider the option of extending the IFB module.
/Chris
Sorry for mystification. Simple reading of source code for iproute2 has shown ifb module understand netlink interface and thus new ifb device can be simply created via "ip link add name <name> type ifb". So I'll rewrite and post v2. Michal

On 01.07.2011 09:20, Michal Privoznik wrote:
On 27.06.2011 19:37, Christian Benvenuti (benve) wrote:
-----Original Message----- From: Michal Privoznik [mailto:mprivozn@redhat.com] Sent: Monday, June 27, 2011 8:34 AM To: Christian Benvenuti (benve) Cc: Laine Stump; libvir-list@redhat.com Subject: Re: [libvirt] [PATCH 0/7] Add support for setting QoS
On 24.06.2011 20:00, Christian Benvenuti (benve) wrote:
3) Similarly for macvtap<network>s, will the network-wide bandwidth limiting be applied to the physical ethernet device? This would have the side effect of including host traffic on that interface in the bandwidth totals, but I don't see a way around it. With this patch as-is, shaping rules are applied only when creating TAP devices. This mean only network types VIR_DOMAIN_NET_TYPE_NETWORK and VIR_DOMAIN_NET_TYPE_BRIDGE.
4) Finally on that topic, what about<network>s that have a pool of physical ethernets to be used macvtap-style? Is there any way we can do bandwidth limiting on an aggregated collection of network interfaces? Or will attempts to configure this necessarily result in a "config not supported" log message? Huh, I didn't know it is possible to have a pool of devices within one <network>. So in this case, this patch silently does nothing.
The IFB (Intermediate Functional Block) allows you to configure aggregate QoS on multiple interfaces.
/Chris
Yes, but we would then need to create those IFB devices on the fly (e.g. on domain startup) and I don't think there is other way than unloading and then loading the ifb module (with different parameter) which however would break other domains connections. Or am I missing something?
Adding support for dynamic creation/deletion of IFB devices (for example via netlink) should not be that hard. If that's the only reason for not using it, I would consider the option of extending the IFB module.
/Chris
Sorry for mystification. Simple reading of source code for iproute2 has shown ifb module understand netlink interface and thus new ifb device can be simply created via "ip link add name<name> type ifb".
So I'll rewrite and post v2.
Michal
Although, allowing to set bandwidth limits per domain as whole (and thus use IFB) might be considered as extension to this patch, which can be added later. So I'll post a v2 which will not contain IFB feature, but will be rebased to current HEAD. I think splitting this huge QoS feature in smaller parts might increase chances of getting it in. Michal

On Jun 24, 2011, at 9:34 AM, Michal Privoznik wrote:
3) Similarly for macvtap <network>s, will the network-wide bandwidth limiting be applied to the physical ethernet device? This would have the side effect of including host traffic on that interface in the bandwidth totals, but I don't see a way around it.
With this patch as-is, shaping rules are applied only when creating TAP devices. This mean only network types VIR_DOMAIN_NET_TYPE_NETWORK and VIR_DOMAIN_NET_TYPE_BRIDGE
Bandwith management with macvtap devices does not work with this approach because the macvtap has no queue of its own, instead packets are directly queued to the corresponding ethX device. For this to work it is necessary to create a MAC filter on the ethX with the MAC of the VM. Regards, D.Herrendoerfer
participants (5)
-
Christian Benvenuti (benve)
-
D. Herrendoerfer
-
Eric Blake
-
Laine Stump
-
Michal Privoznik