[libvirt] [PATCH v2 0/7] Add support for setting QoS

This patch series add support for setting traffic shaping and policing on both domain's interface and network's virtual bridge. Basically, this is done via 'tc' from iproute2 package. For shaping is HTB used, for policing we need u32 match selector. Both should be available in RHEL-6 kernel. How this works: On an virtual interface which has limits defined a root qdisc are replaced. Ingress root for outbound traffic shaping and egress for inbound. Basically, in inbound traffic policing is applied, on outbound shaping. New qdiscs are set to limit the traffic to rate set in XML. For shaping it is possible to set the size of buffer. Accepted values for rate, peak and burst have same format as 'tc' command and are documented. Supported devices are VIR_DOMAIN_NET_TYPE_NETWORK, VIR_DOMAIN_NET_TYPE_BRIDGE and VIR_DOMAIN_NET_TYPE_DIRECT. diff to v1: -rebase to current HEAD -add support for macvtap devices Michal Privoznik (7): bandwidth: Define schema and create documentation bandwidth: Declare internal structures bandwidth: Add format parsing functions bandwidth: Create format functions bandwitdh: Implement functions to enable and disable QoS bandwidth: Add test cases for network bandwidth: Add domain schema test suite configure.ac | 4 + docs/formatdomain.html.in | 32 ++ docs/formatnetwork.html.in | 30 ++ docs/schemas/domain.rng | 50 +++ docs/schemas/network.rng | 50 +++ src/conf/domain_conf.c | 6 + src/conf/domain_conf.h | 1 + src/conf/network_conf.c | 8 + src/conf/network_conf.h | 1 + src/libvirt_private.syms | 6 + src/network/bridge_driver.c | 12 + src/qemu/qemu_command.c | 10 +- src/util/macvtap.c | 12 +- src/util/macvtap.h | 5 +- src/util/network.c | 517 +++++++++++++++++++++++++ src/util/network.h | 28 ++ tests/domainschemadata/domain-bandwidth.xml | 72 ++++ tests/networkxml2xmlin/bandwidth-network.xml | 16 + tests/networkxml2xmlout/bandwidth-network.xml | 16 + tests/networkxml2xmltest.c | 1 + 20 files changed, 873 insertions(+), 4 deletions(-) create mode 100644 tests/domainschemadata/domain-bandwidth.xml create mode 100644 tests/networkxml2xmlin/bandwidth-network.xml create mode 100644 tests/networkxml2xmlout/bandwidth-network.xml -- 1.7.5.rc3

Define new 'bandwidth' element with possible child element 'inbound' and 'outbound' addressing incoming and outgoing traffic respectively: <bandwidth> <inbound average='1mbit' peak='2mbit' burst='5m'/> <outbound average='0.5mbit'/> </bandwidth> Leaving any element out means not to shape traffic in that direction. Accepted units are the same as 'tc' accepts. This element can be inserted into domain's 'interface' and 'network'. --- docs/formatdomain.html.in | 32 ++++++++++++++++++++++++++++ docs/formatnetwork.html.in | 30 ++++++++++++++++++++++++++ docs/schemas/domain.rng | 50 ++++++++++++++++++++++++++++++++++++++++++++ docs/schemas/network.rng | 50 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 162 insertions(+), 0 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 269fc30..766b2bc 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1825,6 +1825,38 @@ qemu-kvm -net nic,model=? /dev/null <span class="since">Since 0.8.8</span> </p> + <h5><a name="elementQoS">Quality of service</a></h5> + +<pre> + ... + <devices> + <interface type='network'> + <source network='default'/> + <target dev='vnet0'/> + <b><bandwidth> + <inbound average='1mbit' peak='0.5mbps' burst='1m'/> + <outbound average='128kbit' peak='256kbit' burst='256kbit'/> + </bandwidth></b> + </interface> + <devices> + ...</pre> + + <p> + This part of interface XML provides setting quality of service. Incoming + and outgoing traffic can be shaped independently. The + <code>bandwidth</code> element can have at most one <code>inbound</code> + and at most one <code>outbound</code> child elements. Leaving any of these + children element out result in no QoS applied on that traffic direction. + So, when you want to shape only domain's incoming traffic, use + <code>inbound</code> only, and vice versa. Each of these elements have one + mandatory attribute <code>average</code>. It specifies average bit rate on + interface being shaped. Then there are two optional attributes: + <code>peak</code>, which specifies maximum rate at which interface can send + data, and <code>burst</code>, amount of bytes that can be burst at + <code>peak</code> speed. Accepted values for attributes are decimal + numbers, optionally followed by unit. See <code>man tc</code> for more + details. + </p> <h4><a name="elementsInput">Input devices</a></h4> <p> diff --git a/docs/formatnetwork.html.in b/docs/formatnetwork.html.in index f9421c3..607e1aa 100644 --- a/docs/formatnetwork.html.in +++ b/docs/formatnetwork.html.in @@ -102,6 +102,36 @@ 0.4.2</span></dd> </dl> + <h5><a name="elementQoS">Quality of service</a></h5> + +<pre> + ... + <forward mode='nat' dev='eth0'/> + <b><bandwidth> + <inbound average='1mbit' peak='0.5mbps' burst='1m'/> + <outbound average='128kbit' peak='256kbit' burst='256kbit'/> + </bandwidth></b> + <mac address='00:16:3E:5D:C7:9E'/> + ...</pre> + + <p> + This part of network XML provides setting quality of service. Incoming + and outgoing traffic can be shaped independently. The + <code>bandwidth</code> element can have at most one <code>inbound</code> + and at most one <code>outbound</code> child elements. Leaving any of these + children element out result in no QoS applied on that traffic direction. + So, when you want to shape only network's incoming traffic, use + <code>inbound</code> only, and vice versa. Each of these elements have one + mandatory attribute <code>average</code>. It specifies average bit rate on + interface being shaped. Then there are two optional attributes: + <code>peak</code>, which specifies maximum rate at which bridge can send + data, and <code>burst</code>, amount of bytes that can be burst at + <code>peak</code> speed. Accepted values for attributes are decimal + numbers, optionally followed by unit. See <code>man tc</code> for + more details. The rate is shared equally within domains connected + to the network. + </p> + <h3><a name="elementsAddress">Addressing</a></h3> <p> diff --git a/docs/schemas/domain.rng b/docs/schemas/domain.rng index 8a4e3fe..e12f5b7 100644 --- a/docs/schemas/domain.rng +++ b/docs/schemas/domain.rng @@ -1170,6 +1170,9 @@ <optional> <ref name="deviceBoot"/> </optional> + <optional> + <ref name="bandwidth"/> + </optional> </interleave> </define> <define name="virtualPortProfile"> @@ -2349,6 +2352,43 @@ </element> </define> + <define name="bandwidth"> + <element name="bandwidth"> + <interleave> + <optional> + <element name="inbound"> + <ref name="bandwidth-attributes"/> + <empty/> + </element> + </optional> + <optional> + <element name="outbound"> + <ref name="bandwidth-attributes"/> + <empty/> + </element> + </optional> + </interleave> + </element> + </define> + + <define name="bandwidth-attributes"> + <interleave> + <attribute name="average"> + <ref name="speed"/> + </attribute> + <optional> + <attribute name="peak"> + <ref name="speed"/> + </attribute> + </optional> + <optional> + <attribute name='burst'> + <ref name="BurstSize"/> + </attribute> + </optional> + </interleave> + </define> + <!-- Optional hypervisor extensions in their own namespace: QEmu @@ -2520,4 +2560,14 @@ <param name="maxLength">39</param> </data> </define> + <define name="speed"> + <data type="string"> + <param name="pattern">[0-9]+(\.[0-9]+)?((Ki|k|mi|m|gi|g|ti|t|)?bit|(Ki|K|Mi|M|Gi|G|Ti|T)?Bps)?</param> + </data> + </define> + <define name="BurstSize"> + <data type="string"> + <param name="pattern">[0-9]+(\.[0-9]+)?(kb|k|mb|m|mbit|kbit|b|gb|g|gbit)?</param> + </data> + </define> </grammar> diff --git a/docs/schemas/network.rng b/docs/schemas/network.rng index 6d9f06b..8e86b05 100644 --- a/docs/schemas/network.rng +++ b/docs/schemas/network.rng @@ -110,6 +110,9 @@ </zeroOrMore> </element> </optional> + <optional> + <ref name="bandwidth"/> + </optional> <!-- <ip> element --> <zeroOrMore> @@ -172,4 +175,51 @@ </data> </define> + <define name="bandwidth"> + <element name="bandwidth"> + <interleave> + <optional> + <element name="inbound"> + <ref name="bandwidth-attributes"/> + <empty/> + </element> + </optional> + <optional> + <element name="outbound"> + <ref name="bandwidth-attributes"/> + <empty/> + </element> + </optional> + </interleave> + </element> + </define> + + <define name="bandwidth-attributes"> + <interleave> + <attribute name="average"> + <ref name="speed"/> + </attribute> + <optional> + <attribute name="peak"> + <ref name="speed"/> + </attribute> + </optional> + <optional> + <attribute name='burst'> + <ref name="BurstSize"/> + </attribute> + </optional> + </interleave> + </define> + + <define name="speed"> + <data type="string"> + <param name="pattern">[0-9]+(\.[0-9]+)?((Ki|k|mi|m|gi|g|ti|t|)?bit|(Ki|K|Mi|M|Gi|G|Ti|T)?Bps)?</param> + </data> + </define> + <define name="BurstSize"> + <data type="string"> + <param name="pattern">[0-9]+(\.[0-9]+)?(kb|k|mb|m|mbit|kbit|b|gb|g|gbit)?</param> + </data> + </define> </grammar> -- 1.7.5.rc3

On Tue, Jul 12, 2011 at 13:57:07 +0200, Michal Privoznik wrote:
Define new 'bandwidth' element with possible child element 'inbound' and 'outbound' addressing incoming and outgoing traffic respectively:
<bandwidth> <inbound average='1mbit' peak='2mbit' burst='5m'/> <outbound average='0.5mbit'/> </bandwidth>
Leaving any element out means not to shape traffic in that direction. Accepted units are the same as 'tc' accepts.
tc is an internal implementation which we shouldn't expose externally. Instead of allowing units, we can just select one fixed unit and have plain numbers in the attributes. Kilobytes (per second) seems like a good unit for me since it's small enough that I doubt anyone would need less than that and we use kilobytes in other places as well (memory size, e.g.).
This element can be inserted into domain's 'interface' and 'network'. --- docs/formatdomain.html.in | 32 ++++++++++++++++++++++++++++ docs/formatnetwork.html.in | 30 ++++++++++++++++++++++++++ docs/schemas/domain.rng | 50 ++++++++++++++++++++++++++++++++++++++++++++ docs/schemas/network.rng | 50 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 162 insertions(+), 0 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 269fc30..766b2bc 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1825,6 +1825,38 @@ qemu-kvm -net nic,model=? /dev/null <span class="since">Since 0.8.8</span> </p>
+ <h5><a name="elementQoS">Quality of service</a></h5> + +<pre> + ... + <devices> + <interface type='network'> + <source network='default'/> + <target dev='vnet0'/> + <b><bandwidth> + <inbound average='1mbit' peak='0.5mbps' burst='1m'/> + <outbound average='128kbit' peak='256kbit' burst='256kbit'/> + </bandwidth></b> + </interface> + <devices> + ...</pre> + + <p> + This part of interface XML provides setting quality of service. Incoming + and outgoing traffic can be shaped independently. The + <code>bandwidth</code> element can have at most one <code>inbound</code> + and at most one <code>outbound</code> child elements. Leaving any of these + children element out result in no QoS applied on that traffic direction. + So, when you want to shape only domain's incoming traffic, use + <code>inbound</code> only, and vice versa. Each of these elements have one + mandatory attribute <code>average</code>. It specifies average bit rate on + interface being shaped. Then there are two optional attributes: + <code>peak</code>, which specifies maximum rate at which interface can send + data, and <code>burst</code>, amount of bytes that can be burst at + <code>peak</code> speed. Accepted values for attributes are decimal + numbers, optionally followed by unit. See <code>man tc</code> for more + details. + </p>
We shouldn't mention tc in XML documentation. The semantics should be documented in a self-contained way.
<h4><a name="elementsInput">Input devices</a></h4>
<p> diff --git a/docs/formatnetwork.html.in b/docs/formatnetwork.html.in index f9421c3..607e1aa 100644 --- a/docs/formatnetwork.html.in +++ b/docs/formatnetwork.html.in @@ -102,6 +102,36 @@ 0.4.2</span></dd> </dl>
+ <h5><a name="elementQoS">Quality of service</a></h5> + +<pre> + ... + <forward mode='nat' dev='eth0'/> + <b><bandwidth> + <inbound average='1mbit' peak='0.5mbps' burst='1m'/> + <outbound average='128kbit' peak='256kbit' burst='256kbit'/> + </bandwidth></b> + <mac address='00:16:3E:5D:C7:9E'/> + ...</pre> + + <p> + This part of network XML provides setting quality of service. Incoming + and outgoing traffic can be shaped independently. The + <code>bandwidth</code> element can have at most one <code>inbound</code> + and at most one <code>outbound</code> child elements. Leaving any of these + children element out result in no QoS applied on that traffic direction. + So, when you want to shape only network's incoming traffic, use + <code>inbound</code> only, and vice versa. Each of these elements have one + mandatory attribute <code>average</code>. It specifies average bit rate on + interface being shaped. Then there are two optional attributes: + <code>peak</code>, which specifies maximum rate at which bridge can send + data, and <code>burst</code>, amount of bytes that can be burst at + <code>peak</code> speed. Accepted values for attributes are decimal + numbers, optionally followed by unit. See <code>man tc</code> for + more details. The rate is shared equally within domains connected + to the network. + </p> +
The same applies here. ...
diff --git a/docs/schemas/domain.rng b/docs/schemas/domain.rng index 8a4e3fe..e12f5b7 100644 --- a/docs/schemas/domain.rng +++ b/docs/schemas/domain.rng @@ -2349,6 +2352,43 @@ </element> </define>
+ <define name="bandwidth"> + <element name="bandwidth"> + <interleave> + <optional> + <element name="inbound"> + <ref name="bandwidth-attributes"/> + <empty/> + </element> + </optional> + <optional> + <element name="outbound"> + <ref name="bandwidth-attributes"/> + <empty/> + </element> + </optional> + </interleave> + </element> + </define> + + <define name="bandwidth-attributes"> + <interleave> + <attribute name="average"> + <ref name="speed"/> + </attribute> + <optional> + <attribute name="peak"> + <ref name="speed"/> + </attribute> + </optional> + <optional> + <attribute name='burst'> + <ref name="BurstSize"/> + </attribute> + </optional> + </interleave> + </define> +
Remove the <interleave> around attributes, since it doesn't make sense here, attributes can always be set in any order. And make all values be just plain numbers. ...
diff --git a/docs/schemas/network.rng b/docs/schemas/network.rng index 6d9f06b..8e86b05 100644 --- a/docs/schemas/network.rng +++ b/docs/schemas/network.rng
The bandwidth XML element is not very complicated but it still seems better if we have a single rng file defining the element and make both domain.rng and network.rng include that file. We do the same with storageencryption.rng. Jirka

--- src/util/network.h | 16 ++++++++++++++++ 1 files changed, 16 insertions(+), 0 deletions(-) diff --git a/src/util/network.h b/src/util/network.h index ed0b78c..568bca1 100644 --- a/src/util/network.h +++ b/src/util/network.h @@ -45,6 +45,22 @@ typedef struct { typedef virSocketAddr *virSocketAddrPtr; +typedef struct { + /* Even if we let user to input rates + * in various units, we store them in bps */ + unsigned long average; + unsigned long peak; + unsigned long burst; +} virRate; + +typedef virRate *virRatePtr; + +typedef struct { + virRate in, out; +} virBandwidth; + +typedef virBandwidth *virBandwidthPtr; + int virSocketParseAddr (const char *val, virSocketAddrPtr addr, int hint); -- 1.7.5.rc3

On Tue, Jul 12, 2011 at 13:57:08 +0200, Michal Privoznik wrote:
--- src/util/network.h | 16 ++++++++++++++++ 1 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/src/util/network.h b/src/util/network.h index ed0b78c..568bca1 100644 --- a/src/util/network.h +++ b/src/util/network.h @@ -45,6 +45,22 @@ typedef struct {
typedef virSocketAddr *virSocketAddrPtr;
+typedef struct { + /* Even if we let user to input rates + * in various units, we store them in bps */
Let's not allow users to use various units :-)
+ unsigned long average; + unsigned long peak; + unsigned long burst; +} virRate; + +typedef virRate *virRatePtr; + +typedef struct { + virRate in, out; +} virBandwidth; + +typedef virBandwidth *virBandwidthPtr; + int virSocketParseAddr (const char *val, virSocketAddrPtr addr, int hint);
Jirka

2011/7/18 Jiri Denemark <jdenemar@redhat.com>:
On Tue, Jul 12, 2011 at 13:57:08 +0200, Michal Privoznik wrote:
--- src/util/network.h | 16 ++++++++++++++++ 1 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/src/util/network.h b/src/util/network.h index ed0b78c..568bca1 100644 --- a/src/util/network.h +++ b/src/util/network.h @@ -45,6 +45,22 @@ typedef struct {
typedef virSocketAddr *virSocketAddrPtr;
+typedef struct { + /* Even if we let user to input rates + * in various units, we store them in bps */
Let's not allow users to use various units :-)
ditto
+ unsigned long average; + unsigned long peak; + unsigned long burst; +} virRate;
Also don't use long, as it's size is compiler and architecture dependent. Make it either int if that's large enough or make it long long to guarantee 64bit size if we need that here. -- Matthias Bolte http://photron.blogspot.com

These functions take on input decimal numbers optionally followed by unit. Units are exactly the same as 'tc' accepts. --- src/conf/domain_conf.c | 3 + src/conf/domain_conf.h | 1 + src/conf/network_conf.c | 5 + src/conf/network_conf.h | 1 + src/libvirt_private.syms | 1 + src/util/network.c | 203 ++++++++++++++++++++++++++++++++++++++++++++++ src/util/network.h | 2 + 7 files changed, 216 insertions(+), 0 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index bf2eadf..c109153 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2853,6 +2853,9 @@ virDomainNetDefParseXML(virCapsPtr caps, if (virDomainDeviceBootParseXML(cur, &def->bootIndex, bootMap)) goto error; + } else if (xmlStrEqual(cur->name, BAD_CAST "bandwidth")) { + if (virBandwidthDefParseNode(cur, &def->bandwidth) < 0) + goto error; } } cur = cur->next; diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 8262d25..84ae171 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -398,6 +398,7 @@ struct _virDomainNetDef { virDomainDeviceInfo info; char *filter; virNWFilterHashTablePtr filterparams; + virBandwidth bandwidth; }; enum virDomainChrDeviceType { diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index ae479bf..c9929e2 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -742,6 +742,7 @@ virNetworkDefParseXML(xmlXPathContextPtr ctxt) char *tmp; xmlNodePtr *ipNodes = NULL; xmlNodePtr dnsNode = NULL; + xmlNodePtr bandwidthNode = NULL; int nIps; if (VIR_ALLOC(def) < 0) { @@ -777,6 +778,10 @@ virNetworkDefParseXML(xmlXPathContextPtr ctxt) /* Parse network domain information */ def->domain = virXPathString("string(./domain[1]/@name)", ctxt); + if ((bandwidthNode = virXPathNode("./bandwidth", ctxt)) != NULL && + virBandwidthDefParseNode(bandwidthNode, &def->bandwidth) < 0) + goto error; + /* Parse bridge information */ def->bridge = virXPathString("string(./bridge[1]/@name)", ctxt); tmp = virXPathString("string(./bridge[1]/@stp)", ctxt); diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index 5edcf27..565a464 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -127,6 +127,7 @@ struct _virNetworkDef { virNetworkIpDefPtr ips; /* ptr to array of IP addresses on this network */ virNetworkDNSDefPtr dns; /* ptr to dns related configuration */ + virBandwidth bandwidth; }; typedef struct _virNetworkObj virNetworkObj; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 1112398..c0e1bfa 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -682,6 +682,7 @@ nlComm; # network.h +virBandwidthDefParseNode; virSocketAddrBroadcast; virSocketAddrBroadcastByPrefix; virSocketAddrIsNetmask; diff --git a/src/util/network.c b/src/util/network.c index eb16e0c..476ecde 100644 --- a/src/util/network.c +++ b/src/util/network.c @@ -10,6 +10,7 @@ #include <config.h> #include <arpa/inet.h> +#include <math.h> #include "memory.h" #include "network.h" @@ -21,6 +22,9 @@ virReportErrorHelper(VIR_FROM_THIS, code, __FILE__, \ __FUNCTION__, __LINE__, __VA_ARGS__) +#define virBandwidthError(code, ...) \ + virReportErrorHelper(VIR_FROM_THIS, code, __FILE__, \ + __FUNCTION__, __LINE__, __VA_ARGS__) /* * Helpers to extract the IP arrays from the virSocketAddrPtr * That part is the less portable of the module @@ -674,3 +678,202 @@ virSocketAddrPrefixToNetmask(unsigned int prefix, error: return result; } + +static const struct rate_suffix { + const char *name; + double scale; +} suffixes[] = { + { "bit", 1. }, + { "Kibit", 1024. }, + { "kbit", 1000. }, + { "mibit", 1024.*1024. }, + { "mbit", 1000.*1000. }, + { "gibit", 1024.*1024.*1024. }, + { "gbit", 1000.*1000.*1000. }, + { "tibit", 1024.*1024.*1024.*1024. }, + { "tbit", 1000.*1000.*1000.*1000. }, + { "Bps", 8. }, + { "KiBps", 8.*1024. }, + { "KBps", 8.*1000. }, + { "MiBps", 8.*1024.*1024. }, + { "MBps", 8.*1000.*1000. }, + { "GiBps", 8.*1024.*1024.*1024. }, + { "GBps", 8.*1000.*1000.*1000. }, + { "TiBps", 8.*1024.*1024.*1024.*1024. }, + { "TBps", 8.*1000.*1000.*1000.*1000. }, + { NULL, 0 } +}; + +static int +virRateToBps(const char *str, unsigned long *rate) +{ + char *p; + double bps; + const struct rate_suffix *s; + + if (virStrToDouble(str, &p, &bps) < 0) + return -1; + + if (p == str) + return -1; + + if (*p == '\0') { + *rate = bps / 8.; /* assume bytes/sec */ + return 0; + } + + for (s = suffixes; s->name; ++s) { + if (STRCASEEQ(s->name, p)) { + *rate = (bps * s->scale) / 8.; + return 0; + } + } + + return -1; +} + +static int +virSizeToB(const char *str, unsigned long *size) +{ + double sz; + char *p; + + if (virStrToDouble(str, &p, &sz) < 0) + return -1; + + if (p == str) + return -1; + + if (*p) { + if (STRCASEEQ(p, "kb") || STRCASEEQ(p, "k")) + sz *= 1024; + else if (STRCASEEQ(p, "gb") || STRCASEEQ(p, "g")) + sz *= 1024*1024*1024; + else if (STRCASEEQ(p, "gbit")) + sz *= 1024*1024*1024/8; + else if (STRCASEEQ(p, "mb") || STRCASEEQ(p, "m")) + sz *= 1024*1024; + else if (STRCASEEQ(p, "mbit")) + sz *= 1024*1024/8; + else if (STRCASEEQ(p, "kbit")) + sz *= 1024/8; + else if (STRCASENEQ(p, "b")) + return -1; + } + + *size = sz; + return 0; +} + +static int +virBandwidthParseChildDefNode(xmlNodePtr node, virRatePtr rate) +{ + int ret = -1; + char *average = NULL; + char *peak = NULL; + char *burst = NULL; + + if (!node || !rate) + return -1; + + average = virXMLPropString(node, "average"); + peak = virXMLPropString(node, "peak"); + burst = virXMLPropString(node, "burst"); + + if (average) { + if (virRateToBps(average, &rate->average) < 0) { + virBandwidthError(VIR_ERR_CONFIG_UNSUPPORTED, + _("could not convert %s"), + average); + goto cleanup; + } + } else { + virBandwidthError(VIR_ERR_XML_DETAIL, "%s", + _("Missing mandatory average attribute")); + goto cleanup; + } + + if (peak && virRateToBps(peak, &rate->peak) < 0) { + virBandwidthError(VIR_ERR_CONFIG_UNSUPPORTED, + _("could not convert %s"), + peak); + goto cleanup; + } + + if (burst && virSizeToB(burst, &rate->burst) < 0) { + virBandwidthError(VIR_ERR_CONFIG_UNSUPPORTED, + _("could not convert %s"), + burst); + goto cleanup; + } + + ret = 0; + +cleanup: + VIR_FREE(average); + VIR_FREE(peak); + VIR_FREE(burst); + + return ret; +} + +/** + * virBandwidthParseXML: + * @node: XML node + * @def: where to store the parsed result + * + * Parse bandwidth XML and store into given pointer + * + * Returns 0 on success, -1 on error. + */ +int +virBandwidthDefParseNode(xmlNodePtr node, virBandwidthPtr def) +{ + int ret = -1; + xmlNodePtr cur = node->children; + xmlNodePtr in = NULL, out = NULL; + + if (!node || !def || + !xmlStrEqual(node->name, BAD_CAST "bandwidth")) + return -1; + + memset(def, 0, sizeof(virBandwidth)); + while (cur) { + if (cur->type == XML_ELEMENT_NODE) { + if (xmlStrEqual(cur->name, BAD_CAST "inbound")) { + if (in) { + virBandwidthError(VIR_ERR_XML_DETAIL, "%s", + _("Only one child <inbound> " + "element allowed")); + goto cleanup; + } + in = cur; + } else if (xmlStrEqual(cur->name, BAD_CAST "outbound")) { + if (out) { + virBandwidthError(VIR_ERR_XML_DETAIL, "%s", + _("Only one child <outbound> " + "element allowed")); + goto cleanup; + } + out = cur; + } else { + virBandwidthError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Unknown element %s"), + cur->name); + goto cleanup; + } + } + cur = cur->next; + } + + if (in && virBandwidthParseChildDefNode(in, &def->in) < 0) + goto cleanup; + + if (out && virBandwidthParseChildDefNode(out, &def->out) < 0) + goto cleanup; + + ret = 0; + +cleanup: + return ret; +} diff --git a/src/util/network.h b/src/util/network.h index 568bca1..28a6402 100644 --- a/src/util/network.h +++ b/src/util/network.h @@ -20,6 +20,7 @@ # endif # include <netdb.h> # include <netinet/in.h> +# include "xml.h" typedef struct { union { @@ -106,4 +107,5 @@ int virSocketAddrPrefixToNetmask(unsigned int prefix, virSocketAddrPtr netmask, int family); +int virBandwidthDefParseNode(xmlNodePtr node, virBandwidthPtr def); #endif /* __VIR_NETWORK_H__ */ -- 1.7.5.rc3

On Tue, Jul 12, 2011 at 13:57:09 +0200, Michal Privoznik wrote:
These functions take on input decimal numbers optionally followed by unit. Units are exactly the same as 'tc' accepts. --- src/conf/domain_conf.c | 3 + src/conf/domain_conf.h | 1 + src/conf/network_conf.c | 5 + src/conf/network_conf.h | 1 + src/libvirt_private.syms | 1 + src/util/network.c | 203 ++++++++++++++++++++++++++++++++++++++++++++++ src/util/network.h | 2 + 7 files changed, 216 insertions(+), 0 deletions(-)
This can be greatly simplified once we forbid using units in bandwidth attributes. ...
diff --git a/src/util/network.c b/src/util/network.c index eb16e0c..476ecde 100644 --- a/src/util/network.c +++ b/src/util/network.c @@ -10,6 +10,7 @@
#include <config.h> #include <arpa/inet.h> +#include <math.h>
BTW, why did you need to include math.h? Getting rid of it is another good reason for removing units.
#include "memory.h" #include "network.h" @@ -21,6 +22,9 @@
...
+/** + * virBandwidthParseXML:
The name here doesn't match the function this is supposed to document.
+ * @node: XML node + * @def: where to store the parsed result + * + * Parse bandwidth XML and store into given pointer + * + * Returns 0 on success, -1 on error. + */ +int +virBandwidthDefParseNode(xmlNodePtr node, virBandwidthPtr def) +{ + int ret = -1; + xmlNodePtr cur = node->children; + xmlNodePtr in = NULL, out = NULL; + + if (!node || !def || + !xmlStrEqual(node->name, BAD_CAST "bandwidth")) + return -1; + + memset(def, 0, sizeof(virBandwidth));
Using sizeof(*def) is better.
+ while (cur) { + if (cur->type == XML_ELEMENT_NODE) { + if (xmlStrEqual(cur->name, BAD_CAST "inbound")) { + if (in) { + virBandwidthError(VIR_ERR_XML_DETAIL, "%s", + _("Only one child <inbound> " + "element allowed")); + goto cleanup; + } + in = cur; + } else if (xmlStrEqual(cur->name, BAD_CAST "outbound")) { + if (out) { + virBandwidthError(VIR_ERR_XML_DETAIL, "%s", + _("Only one child <outbound> " + "element allowed")); + goto cleanup; + } + out = cur; + } else { + virBandwidthError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Unknown element %s"), + cur->name); + goto cleanup;
AFAIK we ignore unknown XML elements in other XML parsing code, shouldn't we do the same here? ... Jirka

--- src/conf/domain_conf.c | 3 + src/conf/network_conf.c | 3 + src/libvirt_private.syms | 3 + src/util/network.c | 153 ++++++++++++++++++++++++++++++++++++++++++++++ src/util/network.h | 7 ++ 5 files changed, 169 insertions(+), 0 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index c109153..740630f 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -8844,6 +8844,9 @@ virDomainNetDefFormat(virBufferPtr buf, virBufferAddLit(buf, " </tune>\n"); } + if (virBandwidrhDefFormat(buf, &def->bandwidth, " ") < 0) + return -1; + if (virDomainDeviceInfoFormat(buf, &def->info, flags) < 0) return -1; diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index c9929e2..43145b1 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -1093,6 +1093,9 @@ char *virNetworkDefFormat(const virNetworkDefPtr def) if (virNetworkDNSDefFormat(&buf, def->dns) < 0) goto error; + if (virBandwidrhDefFormat(&buf, &def->bandwidth, " ") < 0) + goto error; + for (ii = 0; ii < def->nips; ii++) { if (virNetworkIpDefFormat(&buf, &def->ips[ii]) < 0) goto error; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index c0e1bfa..2131108 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -682,7 +682,10 @@ nlComm; # network.h +virBandwidrhDefFormat; virBandwidthDefParseNode; +virBpsToRate; +virBtoSize; virSocketAddrBroadcast; virSocketAddrBroadcastByPrefix; virSocketAddrIsNetmask; diff --git a/src/util/network.c b/src/util/network.c index 476ecde..b24f977 100644 --- a/src/util/network.c +++ b/src/util/network.c @@ -877,3 +877,156 @@ virBandwidthDefParseNode(xmlNodePtr node, virBandwidthPtr def) cleanup: return ret; } + +/** + * virBpsToRate: + * @buf: output + * @rate: input + * + * Format rate, take bps as input. Do pretty formating. + * Caller MUST free @buf after use. + * + * Return 0 on success, -1 otherwise. + */ +int virBpsToRate(char **buf, unsigned long rate) { + int ret = -1; + double tmp = (double)rate*8; + + if (tmp >= 1000.*1000.*1000.) { + if (virAsprintf(buf, "%.0fmbit", tmp/1000./1000.) < 0) + goto cleanup; + } else if (tmp >= 1000.*1000.) { + if (virAsprintf(buf, "%.0fkbit", tmp/1000.) < 0) + goto cleanup; + } else { + if (virAsprintf(buf, "%.0fbit", tmp) < 0) + goto cleanup; + } + + ret = 0; + +cleanup: + return ret; +} + +/** + * virBtoSize: + * @buf: output + * @size: input + * + * Format size, take size in bytes, produce prettier format. + * Caller MUST free @buf after use. + * + * Return 0 on success, -1 otherwise. + */ +int virBtoSize(char **buf, unsigned long size) { + int ret = -1; + double tmp = (double)size; + + if (size >= 1024*1024 && + fabs(1024*1024 * rint(tmp/(1024*1024)) - size) < 1024) { + if (virAsprintf(buf, "%gmb", rint(tmp/(1024*1024))) < 0) + goto cleanup; + } else if (size >= 1024 && + fabs(1024*rint(tmp/1024) - size) < 16) { + if (virAsprintf(buf, "%gkb", rint(tmp/1024)) < 0) + goto cleanup; + } else { + if (virAsprintf(buf, "%lub", size) < 0) + goto cleanup; + } + + ret = 0; + +cleanup: + return ret; +} + +static int +virBandwidthChildDefFormat(virBufferPtr buf, + virRatePtr def, + const char *elem_name) +{ + int ret = -1; + char *tmp = NULL; + + if (!buf || !def || !elem_name) + return -1; + + if (def->average) { + if (virBpsToRate(&tmp, def->average) < 0) + goto cleanup; + virBufferAsprintf(buf, "<%s average='%s'", elem_name, tmp); + VIR_FREE(tmp); + + if (def->peak) { + if (virBpsToRate(&tmp, def->peak) < 0) + goto cleanup; + virBufferAsprintf(buf, " peak='%s'", tmp); + VIR_FREE(tmp); + } + + if (def->burst) { + if (virBtoSize(&tmp, def->burst)) + goto cleanup; + virBufferAsprintf(buf, " burst='%s'", tmp); + VIR_FREE(tmp); + } + virBufferAddLit(buf, "/>\n"); + } + + ret = 0; + +cleanup: + return ret; +} + +/** + * virBandwidrhDefFormat: + * @buf: Buffer to print to + * @def: Data source + * @indent: prepend all lines printed with this + * + * Formats bandwidth and prepend each line with @indent. + * Passing NULL to @indent is equivalent passing "". + * + * Returns 0 on success, else -1. + */ +int +virBandwidrhDefFormat(virBufferPtr buf, + virBandwidthPtr def, + const char *indent) +{ + int ret = -1; + + if (!buf || !def) + goto cleanup; + + if (!indent) + indent = ""; + + if (!def->in.average && !def->out.average) { + ret = 0; + goto cleanup; + } + + virBufferAsprintf(buf, "%s<bandwidth>\n", indent); + if (def->in.average) { + virBufferAsprintf(buf, "%s ", indent); + if (virBandwidthChildDefFormat(buf, &def->in, "inbound") < 0) + goto cleanup; + } + + if (def->out.average) { + virBufferAsprintf(buf, "%s ", indent); + if (virBandwidthChildDefFormat(buf, &def->out, "outbound") < 0) + goto cleanup; + } + + virBufferAsprintf(buf, "%s</bandwidth>\n", indent); + + ret = 0; + +cleanup: + return ret; +} diff --git a/src/util/network.h b/src/util/network.h index 28a6402..f8cdc55 100644 --- a/src/util/network.h +++ b/src/util/network.h @@ -21,6 +21,7 @@ # include <netdb.h> # include <netinet/in.h> # include "xml.h" +# include "buf.h" typedef struct { union { @@ -108,4 +109,10 @@ int virSocketAddrPrefixToNetmask(unsigned int prefix, int family); int virBandwidthDefParseNode(xmlNodePtr node, virBandwidthPtr def); +int virBandwidrhDefFormat(virBufferPtr buf, + virBandwidthPtr def, + const char *indent); + +int virBpsToRate(char **buf, unsigned long rate); +int virBtoSize(char **buf, unsigned long size); #endif /* __VIR_NETWORK_H__ */ -- 1.7.5.rc3

On Tue, Jul 12, 2011 at 13:57:10 +0200, Michal Privoznik wrote:
--- src/conf/domain_conf.c | 3 + src/conf/network_conf.c | 3 + src/libvirt_private.syms | 3 + src/util/network.c | 153 ++++++++++++++++++++++++++++++++++++++++++++++ src/util/network.h | 7 ++ 5 files changed, 169 insertions(+), 0 deletions(-)
Removing units makes this simpler as well. Jirka

These function executes 'tc' with appropriate arguments to set desired QoS setting on interface or bridge during its creation. --- configure.ac | 4 + src/libvirt_private.syms | 2 + src/network/bridge_driver.c | 12 +++ src/qemu/qemu_command.c | 10 +++- src/util/macvtap.c | 12 +++- src/util/macvtap.h | 5 +- src/util/network.c | 161 +++++++++++++++++++++++++++++++++++++++++++ src/util/network.h | 3 + 8 files changed, 205 insertions(+), 4 deletions(-) diff --git a/configure.ac b/configure.ac index ae747fb..bef204d 100644 --- a/configure.ac +++ b/configure.ac @@ -165,6 +165,8 @@ AC_PATH_PROG([RADVD], [radvd], [radvd], [/sbin:/usr/sbin:/usr/local/sbin:$PATH]) AC_PATH_PROG([BRCTL], [brctl], [brctl], [/sbin:/usr/sbin:/usr/local/sbin:$PATH]) +AC_PATH_PROG([TC], [tc], [tc], + [/sbin:/usr/sbin:/usr/local/sbin:$PATH]) AC_PATH_PROG([UDEVADM], [udevadm], [], [/sbin:/usr/sbin:/usr/local/sbin:$PATH]) AC_PATH_PROG([UDEVSETTLE], [udevsettle], [], @@ -178,6 +180,8 @@ AC_DEFINE_UNQUOTED([RADVD],["$RADVD"], [Location or name of the radvd program]) AC_DEFINE_UNQUOTED([BRCTL],["$BRCTL"], [Location or name of the brctl program (see bridge-utils)]) +AC_DEFINE_UNQUOTED([TC],["$TC"], + [Location or name of the tc profram (see iproute2)]) if test -n "$UDEVADM"; then AC_DEFINE_UNQUOTED([UDEVADM],["$UDEVADM"], [Location or name of the udevadm program]) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 2131108..8c4217c 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -684,6 +684,8 @@ nlComm; # network.h virBandwidrhDefFormat; virBandwidthDefParseNode; +virBandwidthDisable; +virBandwidthEnable; virBpsToRate; virBtoSize; virSocketAddrBroadcast; diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 554a8ac..750dee4 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -1804,6 +1804,13 @@ networkStartNetworkDaemon(struct network_driver *driver, if (v6present && networkStartRadvd(network) < 0) goto err4; + if (virBandwidthEnable(&network->def->bandwidth, network->def->bridge) < 0) { + networkReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot set bandwidth limits on %s"), + network->def->bridge); + goto err5; + } + /* Persist the live configuration now we have bridge info */ if (virNetworkSaveConfig(NETWORK_STATE_DIR, network->def) < 0) { goto err5; @@ -1895,6 +1902,11 @@ static int networkShutdownNetworkDaemon(struct network_driver *driver, unlink(stateFile); VIR_FREE(stateFile); + if (virBandwidthDisable(network->def->bridge, true) < 0) { + VIR_WARN("Failed to disable QoS on %s", + network->def->name); + } + if (network->radvdPid > 0) { char *radvdpidbase; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index b131ce7..08867c7 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -128,7 +128,7 @@ qemuPhysIfaceConnect(virDomainDefPtr def, rc = openMacvtapTap(net->ifname, net->mac, net->data.direct.linkdev, net->data.direct.mode, vnet_hdr, def->uuid, &net->data.direct.virtPortProfile, &res_ifname, - vmop, driver->stateDir); + vmop, driver->stateDir, &net->bandwidth); if (rc >= 0) { qemuAuditNetDevice(def, net, res_ifname, true); VIR_FREE(net->ifname); @@ -291,6 +291,14 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, } } + if (virBandwidthEnable(&net->bandwidth, net->ifname) < 0) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot set bandwidth limits on %s"), + net->ifname); + VIR_FORCE_CLOSE(tapfd); + goto cleanup; + } + if (tapfd >= 0) { if ((net->filter) && (net->ifname)) { err = virDomainConfNWFilterInstantiate(conn, net); diff --git a/src/util/macvtap.c b/src/util/macvtap.c index 30343c8..bb7784c 100644 --- a/src/util/macvtap.c +++ b/src/util/macvtap.c @@ -266,7 +266,8 @@ openMacvtapTap(const char *tgifname, virVirtualPortProfileParamsPtr virtPortProfile, char **res_ifname, enum virVMOperationType vmOp, - char *stateDir) + char *stateDir, + virBandwidthPtr bandwidth) { const char *type = "macvtap"; int c, rc; @@ -360,6 +361,15 @@ create_name: } else goto disassociate_exit; + if (virBandwidthEnable(bandwidth, cr_ifname) < 0) { + macvtapError(VIR_ERR_INTERNAL_ERROR, + _("cannot set bandwidth limits on %s"), + cr_ifname); + rc = -1; + goto disassociate_exit; + } + + return rc; disassociate_exit: diff --git a/src/util/macvtap.h b/src/util/macvtap.h index 1b85989..84bbc92 100644 --- a/src/util/macvtap.h +++ b/src/util/macvtap.h @@ -24,7 +24,7 @@ # define __UTIL_MACVTAP_H__ # include <config.h> - +# include "network.h" enum virVirtualPortType { VIR_VIRTUALPORT_NONE, @@ -95,7 +95,8 @@ int openMacvtapTap(const char *ifname, virVirtualPortProfileParamsPtr virtPortProfile, char **res_ifname, enum virVMOperationType vmop, - char *stateDir); + char *stateDir, + virBandwidthPtr bandwidth); void delMacvtap(const char *ifname, const unsigned char *macaddress, diff --git a/src/util/network.c b/src/util/network.c index b24f977..0266616 100644 --- a/src/util/network.c +++ b/src/util/network.c @@ -16,6 +16,7 @@ #include "network.h" #include "util.h" #include "virterror_internal.h" +#include "command.h" #define VIR_FROM_THIS VIR_FROM_NONE #define virSocketError(code, ...) \ @@ -1030,3 +1031,163 @@ virBandwidrhDefFormat(virBufferPtr buf, cleanup: return ret; } + +/** + * virBandwidthEnable: + * @bandwidth: rates to set + * @iface: on which interface + * + * This function enables QoS on specified interface + * and set given traffic limits for both, incoming + * and outgoing traffic. Any previous setting get + * overwritten. + * + * Return 0 on success, -1 otherwise. + */ +int +virBandwidthEnable(virBandwidthPtr bandwidth, + const char *iface) +{ + int ret = -1; + virCommandPtr cmd = NULL; + char *average = NULL; + char *peak = NULL; + char *burst = NULL; + + if (!bandwidth || !iface) + return -1; + + if (!bandwidth->in.average && + !bandwidth->out.average) { + /* nothing to be enabled */ + ret = 0; + goto cleanup; + } + + if (virBandwidthDisable(iface, true) < 0) + goto cleanup; + + if (bandwidth->in.average) { + if (virBpsToRate(&average, bandwidth->in.average) < 0) + goto cleanup; + if (bandwidth->in.peak && + (virBpsToRate(&peak, bandwidth->in.peak) < 0)) + goto cleanup; + if (bandwidth->in.burst && + (virBtoSize(&burst, bandwidth->in.burst) < 0)) + goto cleanup; + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "add", "dev", iface, "root", + "handle", "1:", "htb", "default", "1", NULL); + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd,"class", "add", "dev", iface, "parent", + "1:", "classid", "1:1", "htb", NULL); + virCommandAddArgList(cmd, "rate", average, NULL); + + if (peak) + virCommandAddArgList(cmd, "ceil", peak, NULL); + if (burst) + virCommandAddArgList(cmd, "burst", burst, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd,"filter", "add", "dev", iface, "parent", + "1:0", "protocol", "ip", "handle", "1", "fw", + "flowid", "1", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + VIR_FREE(average); + VIR_FREE(peak); + VIR_FREE(burst); + } + + if (bandwidth->out.average) { + if (virBpsToRate(&average, bandwidth->out.average) < 0) + goto cleanup; + if (virBtoSize(&burst, bandwidth->out.burst ? bandwidth->out.burst : + bandwidth->out.average) < 0) + goto cleanup; + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "add", "dev", iface, + "ingress", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "filter", "add", "dev", iface, "parent", + "ffff:", "protocol", "ip", "u32", "match", "ip", + "src", "0.0.0.0/0", "police", "rate", average, + "burst", burst, "drop", "flowid", ":1", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + } + + ret = 0; + +cleanup: + virCommandFree(cmd); + VIR_FREE(average); + VIR_FREE(peak); + VIR_FREE(burst); + return ret; +} + +/** + * virBandwidthDisable: + * @iface: on which interface + * @may_fail: should be unsuccessful disable considered fatal? + * + * This function tries to disable QoS on specified interface + * by deleting root and ingress qdisc. However, this may fail + * if we try to remove the default one. + * + * Return 0 on success, -1 otherwise. + */ +int +virBandwidthDisable(const char *iface, + bool may_fail) +{ + int ret = -1; + int status; + virCommandPtr cmd = NULL; + + if (!iface) + return -1; + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "del", "dev", iface, "root", NULL); + + if ((virCommandRun(cmd, &status) < 0) || + (!may_fail && status)) + goto cleanup; + + virCommandFree(cmd); + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "del", "dev", iface, "ingress", NULL); + + if ((virCommandRun(cmd, &status) < 0) || + (!may_fail && status)) + goto cleanup; + + ret = 0; + +cleanup: + virCommandFree(cmd); + return ret; +} diff --git a/src/util/network.h b/src/util/network.h index f8cdc55..5a88e51 100644 --- a/src/util/network.h +++ b/src/util/network.h @@ -115,4 +115,7 @@ int virBandwidrhDefFormat(virBufferPtr buf, int virBpsToRate(char **buf, unsigned long rate); int virBtoSize(char **buf, unsigned long size); + +int virBandwidthEnable(virBandwidthPtr bandwidth, const char *iface); +int virBandwidthDisable(const char *iface, bool may_fail); #endif /* __VIR_NETWORK_H__ */ -- 1.7.5.rc3

On Tue, Jul 12, 2011 at 13:57:11 +0200, Michal Privoznik wrote:
These function executes 'tc' with appropriate arguments to set desired QoS setting on interface or bridge during its creation. --- configure.ac | 4 + src/libvirt_private.syms | 2 + src/network/bridge_driver.c | 12 +++ src/qemu/qemu_command.c | 10 +++- src/util/macvtap.c | 12 +++- src/util/macvtap.h | 5 +- src/util/network.c | 161 +++++++++++++++++++++++++++++++++++++++++++ src/util/network.h | 3 + 8 files changed, 205 insertions(+), 4 deletions(-)
I didn't spot any issues in this patch but I have a general question. Does QoS state need to be cleaned up in case virBandwidthEnable succeeds but something further in the process fails? That is do we need to call virBandwidthDisable in the cleanup part? Similarly, if is any cleanup needed when some parts of virBandwidthEnable succeeds but not all of them? Jirka

--- tests/networkxml2xmlin/bandwidth-network.xml | 16 ++++++++++++++++ tests/networkxml2xmlout/bandwidth-network.xml | 16 ++++++++++++++++ tests/networkxml2xmltest.c | 1 + 3 files changed, 33 insertions(+), 0 deletions(-) create mode 100644 tests/networkxml2xmlin/bandwidth-network.xml create mode 100644 tests/networkxml2xmlout/bandwidth-network.xml diff --git a/tests/networkxml2xmlin/bandwidth-network.xml b/tests/networkxml2xmlin/bandwidth-network.xml new file mode 100644 index 0000000..a1301df --- /dev/null +++ b/tests/networkxml2xmlin/bandwidth-network.xml @@ -0,0 +1,16 @@ +<network> + <name>test-net</name> + <uuid>986fed9e-a488-186d-ef2d-17ebfd1993f8</uuid> + <forward mode='nat'/> + <bridge name='virbr1' stp='on' delay='0' /> + <mac address='52:54:00:E6:A2:C9'/> + <bandwidth> + <inbound average='1mbit' peak='2mbit' burst='1m'/> + <outbound average='2mbit'/> + </bandwidth> + <ip address='192.168.120.1' netmask='255.255.255.0'> + <dhcp> + <range start='192.168.120.2' end='192.168.120.254' /> + </dhcp> + </ip> +</network> diff --git a/tests/networkxml2xmlout/bandwidth-network.xml b/tests/networkxml2xmlout/bandwidth-network.xml new file mode 100644 index 0000000..6c31de7 --- /dev/null +++ b/tests/networkxml2xmlout/bandwidth-network.xml @@ -0,0 +1,16 @@ +<network> + <name>test-net</name> + <uuid>986fed9e-a488-186d-ef2d-17ebfd1993f8</uuid> + <forward mode='nat'/> + <bridge name='virbr1' stp='on' delay='0' /> + <mac address='52:54:00:E6:A2:C9'/> + <bandwidth> + <inbound average='1000kbit' peak='2000kbit' burst='1mb'/> + <outbound average='2000kbit'/> + </bandwidth> + <ip address='192.168.120.1' netmask='255.255.255.0'> + <dhcp> + <range start='192.168.120.2' end='192.168.120.254' /> + </dhcp> + </ip> +</network> diff --git a/tests/networkxml2xmltest.c b/tests/networkxml2xmltest.c index 065166d..9dac7b4 100644 --- a/tests/networkxml2xmltest.c +++ b/tests/networkxml2xmltest.c @@ -88,6 +88,7 @@ mymain(void) DO_TEST("netboot-proxy-network"); DO_TEST("nat-network-dns-txt-record"); DO_TEST("nat-network-dns-hosts"); + DO_TEST("bandwidth-network"); return (ret==0 ? EXIT_SUCCESS : EXIT_FAILURE); } -- 1.7.5.rc3

On Tue, Jul 12, 2011 at 13:57:12 +0200, Michal Privoznik wrote:
--- tests/networkxml2xmlin/bandwidth-network.xml | 16 ++++++++++++++++ tests/networkxml2xmlout/bandwidth-network.xml | 16 ++++++++++++++++ tests/networkxml2xmltest.c | 1 + 3 files changed, 33 insertions(+), 0 deletions(-) create mode 100644 tests/networkxml2xmlin/bandwidth-network.xml create mode 100644 tests/networkxml2xmlout/bandwidth-network.xml
Looks good except for the units. Jirka

--- tests/domainschemadata/domain-bandwidth.xml | 72 +++++++++++++++++++++++++++ 1 files changed, 72 insertions(+), 0 deletions(-) create mode 100644 tests/domainschemadata/domain-bandwidth.xml diff --git a/tests/domainschemadata/domain-bandwidth.xml b/tests/domainschemadata/domain-bandwidth.xml new file mode 100644 index 0000000..98b989a --- /dev/null +++ b/tests/domainschemadata/domain-bandwidth.xml @@ -0,0 +1,72 @@ +<domain type='kvm'> + <name>f14-60</name> + <uuid>38644c45-5227-a936-3b38-bc4f72c355bb</uuid> + <memory>1048576</memory> + <currentMemory>1048576</currentMemory> + <vcpu>2</vcpu> + <os> + <type arch='x86_64' machine='pc-0.13'>hvm</type> + <boot dev='cdrom'/> + <boot dev='hd'/> + <bootmenu enable='yes'/> + </os> + <features> + <acpi/> + <apic/> + <pae/> + </features> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>restart</on_crash> + <devices> + <emulator>/usr/bin/qemu-kvm</emulator> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2'/> + <source file='/var/lib/libvirt/images/f14-6.img'/> + <target dev='vda' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </disk> + <disk type='file' device='cdrom'> + <driver name='qemu' type='raw'/> + <source file='/home/zippy/tmp/Fedora-14-x86_64-Live-KDE.iso'/> + <target dev='hdc' bus='ide'/> + <readonly/> + <address type='drive' controller='0' bus='1' unit='0'/> + </disk> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='virtio-serial' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> + </controller> + <interface type='network'> + <mac address='52:54:00:24:a5:9f'/> + <source network='default'/> + <bandwidth> + <inbound average='1000kbit' peak='4000kbit' burst='1mb'/> + <outbound average='128000bit' peak='256000bit' burst='32kb'/> + </bandwidth> + <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> + </interface> + <serial type='pty'> + <target port='0'/> + </serial> + <console type='pty'> + <target type='serial' port='0'/> + </console> + <input type='tablet' bus='usb'/> + <input type='mouse' bus='ps2'/> + <graphics type='vnc' port='-1' autoport='yes'/> + <sound model='ac97'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </sound> + <video> + <model type='vga' vram='9216' heads='1'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </video> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </memballoon> + </devices> +</domain> -- 1.7.5.rc3

On Tue, Jul 12, 2011 at 13:57:13 +0200, Michal Privoznik wrote:
--- tests/domainschemadata/domain-bandwidth.xml | 72 +++++++++++++++++++++++++++ 1 files changed, 72 insertions(+), 0 deletions(-) create mode 100644 tests/domainschemadata/domain-bandwidth.xml
No units, please :-) Jirka

On Tue, Jul 12, 2011 at 01:57:06PM +0200, Michal Privoznik wrote:
This patch series add support for setting traffic shaping and policing on both domain's interface and network's virtual bridge. Basically, this is done via 'tc' from iproute2 package. For shaping is HTB used, for policing we need u32 match selector. Both should be available in RHEL-6 kernel.
How this works: On an virtual interface which has limits defined a root qdisc are replaced. Ingress root for outbound traffic shaping and egress for inbound. Basically, in inbound traffic policing is applied, on outbound shaping.
Okay, this should work http://events.linuxfoundation.org/events/linuxcon-japan/horman OpenVSwitch offers both egress and ingress and based on Horman feedback in slides it wasn't a good idea to use ingress (i.e. traffic shaping) for inbound but egress was fine (see slides from 18)
New qdiscs are set to limit the traffic to rate set in XML. For shaping it is possible to set the size of buffer. Accepted values for rate, peak and burst have same format as 'tc' command and are documented.
Supported devices are VIR_DOMAIN_NET_TYPE_NETWORK, VIR_DOMAIN_NET_TYPE_BRIDGE and VIR_DOMAIN_NET_TYPE_DIRECT.
diff to v1: -rebase to current HEAD -add support for macvtap devices
Okay, individual review separately, Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

On Tue, Jul 12, 2011 at 12:57 PM, Michal Privoznik <mprivozn@redhat.com> wrote:
This patch series add support for setting traffic shaping and policing on both domain's interface and network's virtual bridge. Basically, this is done via 'tc' from iproute2 package. For shaping is HTB used, for policing we need u32 match selector. Both should be available in RHEL-6 kernel.
Nice, this is a good feature. I'll keep an eye on this! Stefan

On Tue, Jul 12, 2011 at 13:57:06 +0200, Michal Privoznik wrote:
This patch series add support for setting traffic shaping and policing on both domain's interface and network's virtual bridge. Basically, this is done via 'tc' from iproute2 package. For shaping is HTB used, for policing we need u32 match selector. Both should be available in RHEL-6 kernel.
How this works: On an virtual interface which has limits defined a root qdisc are replaced. Ingress root for outbound traffic shaping and egress for inbound. Basically, in inbound traffic policing is applied, on outbound shaping. New qdiscs are set to limit the traffic to rate set in XML. For shaping it is possible to set the size of buffer. Accepted values for rate, peak and burst have same format as 'tc' command and are documented.
Supported devices are VIR_DOMAIN_NET_TYPE_NETWORK, VIR_DOMAIN_NET_TYPE_BRIDGE and VIR_DOMAIN_NET_TYPE_DIRECT.
One more thing. Would it be possible to also implement new APIs to manage network bandwidth on-the-fly? It's not for this patchset but is does sound like a good idea to have the possibility to, for example, restrict bandwidth of a running domain since more domains need to use the network or increase the bandwidth because host's owner got more money from domain's owner :-) Jirka

On 07/18/2011 09:59 AM, Jiri Denemark wrote:
One more thing. Would it be possible to also implement new APIs to manage network bandwidth on-the-fly? It's not for this patchset but is does sound like a good idea to have the possibility to, for example, restrict bandwidth of a running domain since more domains need to use the network or increase the bandwidth because host's owner got more money from domain's owner :-)
As a matter of fact, this would be a nice feature for many network (and other) settings. For example, right now if you add a host to the static host mappings in a network, you must restart that network for it to take effect. Restarting a network disconnects all the guest interfaces using that network, and either the guests must be restarted, or at the very least their interfaces must be manually detached and reattached. In reality, though all that's needed is to update a text file on disk and SIGHUP dnsmasq.
participants (6)
-
Daniel Veillard
-
Jiri Denemark
-
Laine Stump
-
Matthias Bolte
-
Michal Privoznik
-
Stefan Hajnoczi