[libvirt] [PATCH v2 0/8] Allow minimal bandwidth guarantee

This is a rework of my one year old patchset [1]. If you are brave enough and start reviewing it, you may want to read the last patch first. The whole magic hidden behind the scene is documented there. For now, the only supported combination where this works is virtual interface connected to virtual network. This network MUST have an inbound average set, at least. This tells how much bandwidth do we have, and how much can we guarantee. If we would miss this info, then we cannot guarantee anything. On the other hand, this is just a very small part of whole problem. Users MUST configure all other devices on traffic path to follow up their setting in libvirt. But that's out of our scope. 1: http://www.redhat.com/archives/libvir-list/2011-December/msg00486.html diff to v1: -Laine's suggestions worked in Michal Privoznik (8): bandwidth: Attach sfq to leaf node bandwidth: add new 'floor' attribute bandwidth: Create hierarchical shaping classes bandwidth: Create (un)plug functions bandwidth: Create rate update function bandwidth: Create network bandwidth (un)plug functions network: Create real network status files domain: Keep assigned class_id in domstatus XML docs/formatdomain.html.in | 23 ++- docs/schemas/networkcommon.rng | 5 + po/POTFILES.in | 1 + src/Makefile.am | 7 +- src/conf/domain_conf.c | 19 ++- src/conf/domain_conf.h | 1 + src/conf/netdev_bandwidth_conf.c | 67 +++++++- src/conf/netdev_bandwidth_conf.h | 3 +- src/conf/network_conf.c | 230 +++++++++++++++++++++----- src/conf/network_conf.h | 6 + src/libvirt_network.syms | 13 ++ src/libvirt_private.syms | 9 +- src/lxc/lxc_process.c | 3 +- src/network/bridge_driver.c | 253 +++++++++++++++++++++++++++- src/qemu/qemu_command.c | 3 +- src/qemu/qemu_driver.c | 2 +- src/util/virnetdevbandwidth.c | 337 +++++++++++++++++++++++++++++++++++++- src/util/virnetdevbandwidth.h | 24 +++- src/util/virnetdevmacvlan.c | 2 +- 19 files changed, 920 insertions(+), 88 deletions(-) create mode 100644 src/libvirt_network.syms -- 1.7.8.6

SFQ is qdisc which doesn't really shape any traffic but 'just' re-arrange packets in sending buffer so no stream starve. The goal is to ensure fairness. There is basically only one configuration parameter (perturb) which is set to advised value of 10. --- src/util/virnetdevbandwidth.c | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index bddb788..49fc425 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -103,6 +103,15 @@ virNetDevBandwidthSet(const char *ifname, virCommandFree(cmd); cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "add", "dev", ifname, "parent", + "1:1", "handle", "2:", "sfq", "perturb", + "10", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); virCommandAddArgList(cmd,"filter", "add", "dev", ifname, "parent", "1:0", "protocol", "ip", "handle", "1", "fw", "flowid", "1", NULL); -- 1.7.8.6

On 12/04/2012 02:18 PM, Michal Privoznik wrote:
SFQ is qdisc which doesn't really shape any traffic but 'just' re-arrange packets in sending buffer so no stream starve. The goal is to ensure fairness. There is basically only one configuration parameter (perturb) which is set to advised value of 10.
What does sfq stand for anyway? ACK.
--- src/util/virnetdevbandwidth.c | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index bddb788..49fc425 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -103,6 +103,15 @@ virNetDevBandwidthSet(const char *ifname,
virCommandFree(cmd); cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "add", "dev", ifname, "parent", + "1:1", "handle", "2:", "sfq", "perturb", + "10", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); virCommandAddArgList(cmd,"filter", "add", "dev", ifname, "parent", "1:0", "protocol", "ip", "handle", "1", "fw", "flowid", "1", NULL);

On 11.12.2012 02:01, Laine Stump wrote:
On 12/04/2012 02:18 PM, Michal Privoznik wrote:
SFQ is qdisc which doesn't really shape any traffic but 'just' re-arrange packets in sending buffer so no stream starve. The goal is to ensure fairness. There is basically only one configuration parameter (perturb) which is set to advised value of 10.
What does sfq stand for anyway?
Stochastic Fairness Queuing. I've found a nice picture and description of it: http://opalsoft.net/qos/DS-25.htm
ACK.
--- src/util/virnetdevbandwidth.c | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index bddb788..49fc425 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -103,6 +103,15 @@ virNetDevBandwidthSet(const char *ifname,
virCommandFree(cmd); cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "add", "dev", ifname, "parent", + "1:1", "handle", "2:", "sfq", "perturb", + "10", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); virCommandAddArgList(cmd,"filter", "add", "dev", ifname, "parent", "1:0", "protocol", "ip", "handle", "1", "fw", "flowid", "1", NULL);
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

This is however supported only on domain interfaces with type='network'. Moreover, target network needs to have at least inbound QoS set. This is required by hierarchical traffic shaping.
From now on, the required attribute for <inbound/> is either 'average' (old) or 'floor' (new). This new attribute can be used just for interfaces type of network (<interface type='network'/>) currently.
docs/formatdomain.html.in | 23 +++++++++++-- docs/schemas/networkcommon.rng | 5 +++ src/conf/domain_conf.c | 6 ++- src/conf/netdev_bandwidth_conf.c | 67 ++++++++++++++++++++++++++++++++------ src/conf/netdev_bandwidth_conf.h | 3 +- src/conf/network_conf.c | 4 +- src/util/virnetdevbandwidth.h | 1 + 7 files changed, 90 insertions(+), 19 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 0574e68..c5baf27 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -3068,7 +3068,7 @@ qemu-kvm -net nic,model=? /dev/null <source network='default'/> <target dev='vnet0'/> <b><bandwidth> - <inbound average='1000' peak='5000' burst='1024'/> + <inbound average='1000' peak='5000' floor='200' burst='1024'/> <outbound average='128' peak='256' burst='256'/> </bandwidth></b> </interface> @@ -3083,14 +3083,29 @@ qemu-kvm -net nic,model=? /dev/null children element out result in no QoS applied on that traffic direction. So, when you want to shape only domain's incoming traffic, use <code>inbound</code> only, and vice versa. Each of these elements have one - mandatory attribute <code>average</code>. It specifies average bit rate on - interface being shaped. Then there are two optional attributes: + mandatory attribute <code>average</code> (or <code>floor</code> as + described below). <code>average</code> specifies average bit rate on + the interface being shaped. Then there are two optional attributes: <code>peak</code>, which specifies maximum rate at which interface can send data, and <code>burst</code>, amount of bytes that can be burst at <code>peak</code> speed. Accepted values for attributes are integer numbers. The units for <code>average</code> and <code>peak</code> attributes are kilobytes per second, and for the <code>burst</code> just kilobytes. - <span class="since">Since 0.9.4</span> + <span class="since">Since 0.9.4</span> The <code>inbound</code> can + optionally have <code>floor</code> attribute. This is there for + guaranteeing minimal throughput for shaped interfaces. This, however, + requires that all traffic goes through one point where QoS decisions can + take place. That's why this attribute works only for virtual networks for + now (that is <code><interface type='network'/></code> with a + forward type of route, nat, or no forward at all). Moreover, the + virtual network the interface is connected to is required to have at least + inbound QoS set (<code>average</code> at least). Moreover, with + <code>floor<code> attribute users don't need to specify + <code>average</code>. However, <code>peak</code> and <code>burst</code> + attributes still require <code>average</code>. Currently, linux kernel + doesn't allow ingress qdiscs to have any classes therefore + <code>floor</code> can be applied only on <code>inbound</code> and not + </code>outbound</code>. <span class="since">Since 1.0.1</span> </p> <h5><a name="elementVlanTag">Setting VLAN tag (on supported network types only)</a></h5> diff --git a/docs/schemas/networkcommon.rng b/docs/schemas/networkcommon.rng index c7749e7..51ff759 100644 --- a/docs/schemas/networkcommon.rng +++ b/docs/schemas/networkcommon.rng @@ -149,6 +149,11 @@ </attribute> </optional> <optional> + <attribute name="floor"> + <ref name="speed"/> + </attribute> + </optional> + <optional> <attribute name='burst'> <ref name="BurstSize"/> </attribute> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 4ffb39d..99aa08d 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -4781,7 +4781,8 @@ virDomainActualNetDefParseXML(xmlNodePtr node, bandwidth_node = virXPathNode("./bandwidth", ctxt); if (bandwidth_node && - !(actual->bandwidth = virNetDevBandwidthParse(bandwidth_node))) + !(actual->bandwidth = virNetDevBandwidthParse(bandwidth_node, + actual->type))) goto error; vlanNode = virXPathNode("./vlan", ctxt); @@ -4969,7 +4970,8 @@ virDomainNetDefParseXML(virCapsPtr caps, goto error; } } else if (xmlStrEqual(cur->name, BAD_CAST "bandwidth")) { - if (!(def->bandwidth = virNetDevBandwidthParse(cur))) + if (!(def->bandwidth = virNetDevBandwidthParse(cur, + def->type))) goto error; } else if (xmlStrEqual(cur->name, BAD_CAST "vlan")) { if (virNetDevVlanParse(cur, ctxt, &def->vlan) < 0) diff --git a/src/conf/netdev_bandwidth_conf.c b/src/conf/netdev_bandwidth_conf.c index 5802eba..15840b9 100644 --- a/src/conf/netdev_bandwidth_conf.c +++ b/src/conf/netdev_bandwidth_conf.c @@ -26,6 +26,7 @@ #include "virterror_internal.h" #include "util.h" #include "memory.h" +#include "domain_conf.h" #define VIR_FROM_THIS VIR_FROM_NONE @@ -36,6 +37,7 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) char *average = NULL; char *peak = NULL; char *burst = NULL; + char *floor = NULL; if (!node || !rate) { virReportError(VIR_ERR_INVALID_ARG, "%s", @@ -46,40 +48,55 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) average = virXMLPropString(node, "average"); peak = virXMLPropString(node, "peak"); burst = virXMLPropString(node, "burst"); + floor = virXMLPropString(node, "floor"); if (average) { if (virStrToLong_ull(average, NULL, 10, &rate->average) < 0) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("could not convert %s"), + _("could not convert bandwidth average value '%s'"), average); goto cleanup; } - } else { + } else if (!floor) { virReportError(VIR_ERR_XML_DETAIL, "%s", - _("Missing mandatory average attribute")); + _("Missing mandatory average or floor attributes")); + goto cleanup; + } + + if ((peak || burst) && !average) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("'peak' and 'burst' require 'average' attribute")); goto cleanup; } if (peak && virStrToLong_ull(peak, NULL, 10, &rate->peak) < 0) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("could not convert %s"), + _("could not convert bandwidth peak value '%s'"), peak); goto cleanup; } if (burst && virStrToLong_ull(burst, NULL, 10, &rate->burst) < 0) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("could not convert %s"), + _("could not convert bandwidth burst value '%s'"), burst); goto cleanup; } + if (floor && virStrToLong_ull(floor, NULL, 10, &rate->floor) < 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("could not convert bandwidth floor value '%s'"), + floor); + goto cleanup; + } + ret = 0; cleanup: VIR_FREE(average); VIR_FREE(peak); VIR_FREE(burst); + VIR_FREE(floor); return ret; } @@ -87,13 +104,17 @@ cleanup: /** * virNetDevBandwidthParse: * @node: XML node + * @net_type: one of virDomainNetType * - * Parse bandwidth XML and return pointer to structure + * Parse bandwidth XML and return pointer to structure. + * @net_type tell to which type will/is interface connected to. + * Pass -1 if this is not called on interface. * * Returns !NULL on success, NULL on error. */ virNetDevBandwidthPtr -virNetDevBandwidthParse(xmlNodePtr node) +virNetDevBandwidthParse(xmlNodePtr node, + int net_type) { virNetDevBandwidthPtr def = NULL; xmlNodePtr cur; @@ -146,6 +167,20 @@ virNetDevBandwidthParse(xmlNodePtr node) /* helper reported error for us */ goto error; } + + if (def->in->floor && net_type != VIR_DOMAIN_NET_TYPE_NETWORK) { + if (net_type == -1) { + /* 'floor' on network isn't supported */ + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("floor attribute isn't supported for " + "network's bandwidth yet")); + } else { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("floor attribute is supported only for " + "interfaces of type network")); + } + goto error; + } } if (out) { @@ -158,6 +193,13 @@ virNetDevBandwidthParse(xmlNodePtr node) /* helper reported error for us */ goto error; } + + if (def->out->floor) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("'floor' attribute allowed " + "only in <inbound> element")); + goto error; + } } return def; @@ -177,13 +219,18 @@ virNetDevBandwidthRateFormat(virNetDevBandwidthRatePtr def, if (!def) return 0; - if (def->average) { - virBufferAsprintf(buf, " <%s average='%llu'", elem_name, - def->average); + if (def->average || def->floor) { + virBufferAsprintf(buf, " <%s", elem_name); + + if (def->average) + virBufferAsprintf(buf, " average='%llu'", def->average); if (def->peak) virBufferAsprintf(buf, " peak='%llu'", def->peak); + if (def->floor) + virBufferAsprintf(buf, " floor='%llu'", def->floor); + if (def->burst) virBufferAsprintf(buf, " burst='%llu'", def->burst); virBufferAddLit(buf, "/>\n"); diff --git a/src/conf/netdev_bandwidth_conf.h b/src/conf/netdev_bandwidth_conf.h index bca5c50..0080165 100644 --- a/src/conf/netdev_bandwidth_conf.h +++ b/src/conf/netdev_bandwidth_conf.h @@ -28,7 +28,8 @@ # include "buf.h" # include "xml.h" -virNetDevBandwidthPtr virNetDevBandwidthParse(xmlNodePtr node) +virNetDevBandwidthPtr virNetDevBandwidthParse(xmlNodePtr node, + int net_type) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; int virNetDevBandwidthFormat(virNetDevBandwidthPtr def, virBufferPtr buf) diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 6ce2e63..2cb9e81 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -1192,7 +1192,7 @@ virNetworkPortGroupParseXML(virPortGroupDefPtr def, bandwidth_node = virXPathNode("./bandwidth", ctxt); if (bandwidth_node && - !(def->bandwidth = virNetDevBandwidthParse(bandwidth_node))) { + !(def->bandwidth = virNetDevBandwidthParse(bandwidth_node, -1))) { goto error; } @@ -1268,7 +1268,7 @@ virNetworkDefParseXML(xmlXPathContextPtr ctxt) def->domain = virXPathString("string(./domain[1]/@name)", ctxt); if ((bandwidthNode = virXPathNode("./bandwidth", ctxt)) != NULL && - (def->bandwidth = virNetDevBandwidthParse(bandwidthNode)) == NULL) + (def->bandwidth = virNetDevBandwidthParse(bandwidthNode, -1)) == NULL) goto error; vlanNode = virXPathNode("./vlan", ctxt); diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index e046230..35f8b89 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -30,6 +30,7 @@ typedef virNetDevBandwidthRate *virNetDevBandwidthRatePtr; struct _virNetDevBandwidthRate { unsigned long long average; /* kbytes/s */ unsigned long long peak; /* kbytes/s */ + unsigned long long floor; /* kbytes/s */ unsigned long long burst; /* kbytes */ }; -- 1.7.8.6

On 12/04/2012 02:18 PM, Michal Privoznik wrote:
This is however supported only on domain interfaces with type='network'. Moreover, target network needs to have at least inbound QoS set. This is required by hierarchical traffic shaping.
From now on, the required attribute for <inbound/> is either 'average' (old) or 'floor' (new). This new attribute can be used just for interfaces type of network (<interface type='network'/>) currently.
docs/formatdomain.html.in | 23 +++++++++++-- docs/schemas/networkcommon.rng | 5 +++ src/conf/domain_conf.c | 6 ++- src/conf/netdev_bandwidth_conf.c | 67 ++++++++++++++++++++++++++++++++------ src/conf/netdev_bandwidth_conf.h | 3 +- src/conf/network_conf.c | 4 +- src/util/virnetdevbandwidth.h | 1 + 7 files changed, 90 insertions(+), 19 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 0574e68..c5baf27 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -3068,7 +3068,7 @@ qemu-kvm -net nic,model=? /dev/null <source network='default'/> <target dev='vnet0'/> <b><bandwidth> - <inbound average='1000' peak='5000' burst='1024'/> + <inbound average='1000' peak='5000' floor='200' burst='1024'/> <outbound average='128' peak='256' burst='256'/> </bandwidth></b> </interface> @@ -3083,14 +3083,29 @@ qemu-kvm -net nic,model=? /dev/null children element out result in no QoS applied on that traffic direction. So, when you want to shape only domain's incoming traffic, use <code>inbound</code> only, and vice versa. Each of these elements have one - mandatory attribute <code>average</code>. It specifies average bit rate on - interface being shaped. Then there are two optional attributes: + mandatory attribute <code>average</code> (or <code>floor</code> as + described below). <code>average</code> specifies average bit rate on + the interface being shaped. Then there are two optional attributes: <code>peak</code>, which specifies maximum rate at which interface can send data, and <code>burst</code>, amount of bytes that can be burst at <code>peak</code> speed. Accepted values for attributes are integer numbers. The units for <code>average</code> and <code>peak</code> attributes are kilobytes per second, and for the <code>burst</code> just kilobytes. - <span class="since">Since 0.9.4</span> + <span class="since">Since 0.9.4</span> The <code>inbound</code> can + optionally have <code>floor</code> attribute. This is there for + guaranteeing minimal throughput for shaped interfaces. This, however, + requires that all traffic goes through one point where QoS decisions can + take place. That's why this attribute works only for virtual networks for + now (that is <code><interface type='network'/></code> with a + forward type of route, nat, or no forward at all). Moreover, the + virtual network the interface is connected to is required to have at least + inbound QoS set (<code>average</code> at least). Moreover, with + <code>floor<code> attribute users don't need to specify + <code>average</code>. However, <code>peak</code> and <code>burst</code> + attributes still require <code>average</code>. Currently, linux kernel + doesn't allow ingress qdiscs to have any classes therefore + <code>floor</code> can be applied only on <code>inbound</code> and not + </code>outbound</code>. <span class="since">Since 1.0.1</span> </p>
<h5><a name="elementVlanTag">Setting VLAN tag (on supported network types only)</a></h5> diff --git a/docs/schemas/networkcommon.rng b/docs/schemas/networkcommon.rng index c7749e7..51ff759 100644 --- a/docs/schemas/networkcommon.rng +++ b/docs/schemas/networkcommon.rng @@ -149,6 +149,11 @@ </attribute> </optional> <optional> + <attribute name="floor"> + <ref name="speed"/> + </attribute> + </optional> + <optional> <attribute name='burst'> <ref name="BurstSize"/> </attribute> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 4ffb39d..99aa08d 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -4781,7 +4781,8 @@ virDomainActualNetDefParseXML(xmlNodePtr node,
bandwidth_node = virXPathNode("./bandwidth", ctxt); if (bandwidth_node && - !(actual->bandwidth = virNetDevBandwidthParse(bandwidth_node))) + !(actual->bandwidth = virNetDevBandwidthParse(bandwidth_node, + actual->type))) goto error;
vlanNode = virXPathNode("./vlan", ctxt); @@ -4969,7 +4970,8 @@ virDomainNetDefParseXML(virCapsPtr caps, goto error; } } else if (xmlStrEqual(cur->name, BAD_CAST "bandwidth")) { - if (!(def->bandwidth = virNetDevBandwidthParse(cur))) + if (!(def->bandwidth = virNetDevBandwidthParse(cur, + def->type))) goto error; } else if (xmlStrEqual(cur->name, BAD_CAST "vlan")) { if (virNetDevVlanParse(cur, ctxt, &def->vlan) < 0) diff --git a/src/conf/netdev_bandwidth_conf.c b/src/conf/netdev_bandwidth_conf.c index 5802eba..15840b9 100644 --- a/src/conf/netdev_bandwidth_conf.c +++ b/src/conf/netdev_bandwidth_conf.c @@ -26,6 +26,7 @@ #include "virterror_internal.h" #include "util.h" #include "memory.h" +#include "domain_conf.h"
#define VIR_FROM_THIS VIR_FROM_NONE
@@ -36,6 +37,7 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) char *average = NULL; char *peak = NULL; char *burst = NULL; + char *floor = NULL;
if (!node || !rate) { virReportError(VIR_ERR_INVALID_ARG, "%s", @@ -46,40 +48,55 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) average = virXMLPropString(node, "average"); peak = virXMLPropString(node, "peak"); burst = virXMLPropString(node, "burst"); + floor = virXMLPropString(node, "floor");
if (average) { if (virStrToLong_ull(average, NULL, 10, &rate->average) < 0) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("could not convert %s"), + _("could not convert bandwidth average value '%s'"), average); goto cleanup; } - } else { + } else if (!floor) { virReportError(VIR_ERR_XML_DETAIL, "%s", - _("Missing mandatory average attribute")); + _("Missing mandatory average or floor attributes")); + goto cleanup; + } + + if ((peak || burst) && !average) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("'peak' and 'burst' require 'average' attribute")); goto cleanup; }
if (peak && virStrToLong_ull(peak, NULL, 10, &rate->peak) < 0) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("could not convert %s"), + _("could not convert bandwidth peak value '%s'"), peak); goto cleanup; }
if (burst && virStrToLong_ull(burst, NULL, 10, &rate->burst) < 0) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("could not convert %s"), + _("could not convert bandwidth burst value '%s'"), burst); goto cleanup; }
+ if (floor && virStrToLong_ull(floor, NULL, 10, &rate->floor) < 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("could not convert bandwidth floor value '%s'"), + floor); + goto cleanup; + } + ret = 0;
cleanup: VIR_FREE(average); VIR_FREE(peak); VIR_FREE(burst); + VIR_FREE(floor);
return ret; } @@ -87,13 +104,17 @@ cleanup: /** * virNetDevBandwidthParse: * @node: XML node + * @net_type: one of virDomainNetType * - * Parse bandwidth XML and return pointer to structure + * Parse bandwidth XML and return pointer to structure. + * @net_type tell to which type will/is interface connected to. + * Pass -1 if this is not called on interface. * * Returns !NULL on success, NULL on error. */ virNetDevBandwidthPtr -virNetDevBandwidthParse(xmlNodePtr node) +virNetDevBandwidthParse(xmlNodePtr node, + int net_type) { virNetDevBandwidthPtr def = NULL; xmlNodePtr cur; @@ -146,6 +167,20 @@ virNetDevBandwidthParse(xmlNodePtr node) /* helper reported error for us */ goto error; } + + if (def->in->floor && net_type != VIR_DOMAIN_NET_TYPE_NETWORK) { + if (net_type == -1) { + /* 'floor' on network isn't supported */ + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("floor attribute isn't supported for " + "network's bandwidth yet")); + } else { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("floor attribute is supported only for " + "interfaces of type network")); + } + goto error; + } }
if (out) { @@ -158,6 +193,13 @@ virNetDevBandwidthParse(xmlNodePtr node) /* helper reported error for us */ goto error; } + + if (def->out->floor) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("'floor' attribute allowed " + "only in <inbound> element")); + goto error; + } }
return def; @@ -177,13 +219,18 @@ virNetDevBandwidthRateFormat(virNetDevBandwidthRatePtr def, if (!def) return 0;
- if (def->average) { - virBufferAsprintf(buf, " <%s average='%llu'", elem_name, - def->average); + if (def->average || def->floor) { + virBufferAsprintf(buf, " <%s", elem_name); + + if (def->average) + virBufferAsprintf(buf, " average='%llu'", def->average);
if (def->peak) virBufferAsprintf(buf, " peak='%llu'", def->peak);
+ if (def->floor) + virBufferAsprintf(buf, " floor='%llu'", def->floor); + if (def->burst) virBufferAsprintf(buf, " burst='%llu'", def->burst); virBufferAddLit(buf, "/>\n"); diff --git a/src/conf/netdev_bandwidth_conf.h b/src/conf/netdev_bandwidth_conf.h index bca5c50..0080165 100644 --- a/src/conf/netdev_bandwidth_conf.h +++ b/src/conf/netdev_bandwidth_conf.h @@ -28,7 +28,8 @@ # include "buf.h" # include "xml.h"
-virNetDevBandwidthPtr virNetDevBandwidthParse(xmlNodePtr node) +virNetDevBandwidthPtr virNetDevBandwidthParse(xmlNodePtr node, + int net_type) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; int virNetDevBandwidthFormat(virNetDevBandwidthPtr def, virBufferPtr buf) diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 6ce2e63..2cb9e81 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -1192,7 +1192,7 @@ virNetworkPortGroupParseXML(virPortGroupDefPtr def,
bandwidth_node = virXPathNode("./bandwidth", ctxt); if (bandwidth_node && - !(def->bandwidth = virNetDevBandwidthParse(bandwidth_node))) { + !(def->bandwidth = virNetDevBandwidthParse(bandwidth_node, -1))) { goto error; }
@@ -1268,7 +1268,7 @@ virNetworkDefParseXML(xmlXPathContextPtr ctxt) def->domain = virXPathString("string(./domain[1]/@name)", ctxt);
if ((bandwidthNode = virXPathNode("./bandwidth", ctxt)) != NULL && - (def->bandwidth = virNetDevBandwidthParse(bandwidthNode)) == NULL) + (def->bandwidth = virNetDevBandwidthParse(bandwidthNode, -1)) == NULL) goto error;
vlanNode = virXPathNode("./vlan", ctxt); diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index e046230..35f8b89 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -30,6 +30,7 @@ typedef virNetDevBandwidthRate *virNetDevBandwidthRatePtr; struct _virNetDevBandwidthRate { unsigned long long average; /* kbytes/s */ unsigned long long peak; /* kbytes/s */ + unsigned long long floor; /* kbytes/s */ unsigned long long burst; /* kbytes */ };
ACK. You covered everything I mentioned.

These classes can borrow unused bandwidth. Basically, only egress qdsics can have classes, therefore we can do this kind of traffic shaping only on host's outgoing, that is domain's incoming traffic. --- src/lxc/lxc_process.c | 3 +- src/network/bridge_driver.c | 3 +- src/qemu/qemu_command.c | 3 +- src/qemu/qemu_driver.c | 2 +- src/util/virnetdevbandwidth.c | 93 +++++++++++++++++++++++++++++++++++++--- src/util/virnetdevbandwidth.h | 4 +- src/util/virnetdevmacvlan.c | 2 +- 7 files changed, 97 insertions(+), 13 deletions(-) diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index 8b9d02f..3937839 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -337,7 +337,8 @@ static int virLXCProcessSetupInterfaceBridged(virConnectPtr conn, goto cleanup; if (virNetDevBandwidthSet(net->ifname, - virDomainNetGetActualBandwidth(net)) < 0) { + virDomainNetGetActualBandwidth(net), + false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), net->ifname); diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 75f3c3a..63be5e2 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -2290,7 +2290,8 @@ networkStartNetworkVirtual(struct network_driver *driver, VIR_FORCE_CLOSE(tapfd); } - if (virNetDevBandwidthSet(network->def->bridge, network->def->bandwidth) < 0) { + if (virNetDevBandwidthSet(network->def->bridge, + network->def->bandwidth, true) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), network->def->bridge); diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 1805625..2b466dc 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -292,7 +292,8 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, if (tapfd >= 0 && virNetDevBandwidthSet(net->ifname, - virDomainNetGetActualBandwidth(net)) < 0) { + virDomainNetGetActualBandwidth(net), + false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), net->ifname); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 8e838cd..259ac37 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -8983,7 +8983,7 @@ qemuDomainSetInterfaceParameters(virDomainPtr dom, sizeof(*newBandwidth->out)); } - if (virNetDevBandwidthSet(net->ifname, newBandwidth) < 0) { + if (virNetDevBandwidthSet(net->ifname, newBandwidth, false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), device); diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 49fc425..71c272e 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -45,17 +45,21 @@ virNetDevBandwidthFree(virNetDevBandwidthPtr def) * virNetDevBandwidthSet: * @ifname: on which interface * @bandwidth: rates to set (may be NULL) + * @hierarchical_class: whether to create hierarchical class * * This function enables QoS on specified interface * and set given traffic limits for both, incoming * and outgoing traffic. Any previous setting get - * overwritten. + * overwritten. If @hierarchical_class is TRUE, create + * hierarchical class. It is used to guarantee minimal + * throughput ('floor' attribute in NIC). * * Return 0 on success, -1 otherwise. */ int virNetDevBandwidthSet(const char *ifname, - virNetDevBandwidthPtr bandwidth) + virNetDevBandwidthPtr bandwidth, + bool hierarchical_class) { int ret = -1; virCommandPtr cmd = NULL; @@ -71,7 +75,7 @@ virNetDevBandwidthSet(const char *ifname, virNetDevBandwidthClear(ifname); - if (bandwidth->in) { + if (bandwidth->in && bandwidth->in->average) { if (virAsprintf(&average, "%llukbps", bandwidth->in->average) < 0) goto cleanup; if (bandwidth->in->peak && @@ -83,15 +87,89 @@ virNetDevBandwidthSet(const char *ifname, cmd = virCommandNew(TC); virCommandAddArgList(cmd, "qdisc", "add", "dev", ifname, "root", - "handle", "1:", "htb", "default", "1", NULL); + "handle", "1:", "htb", "default", + hierarchical_class ? "2" : "1", NULL); if (virCommandRun(cmd, NULL) < 0) goto cleanup; + /* If we are creating a hierarchical class, all non guaranteed traffic + * goes to the 1:2 class which will adjust 'rate' dynamically as NICs + * with guaranteed throughput are plugged and unplugged. Class 1:1 + * exists so we don't exceed the maximum limit for the network. For each + * NIC with guaranteed throughput a separate classid will be created. + * NB '1:' is just a shorter notation of '1:0'. + * + * To get a picture how this works: + * + * +-----+ +---------+ +-----------+ +-----------+ +-----+ + * | | | qdisc | | class 1:1 | | class 1:2 | | | + * | NIC | | def 1:2 | | rate | | rate | | sfq | + * | | --> | | --> | peak | -+-> | peak | --> | | + * +-----+ +---------+ +-----------+ | +-----------+ +-----+ + * | + * | +-----------+ +-----+ + * | | class 1:3 | | | + * | | rate | | sfq | + * +-> | peak | --> | | + * | +-----------+ +-----+ + * ... + * | +-----------+ +-----+ + * | | class 1:n | | | + * | | rate | | sfq | + * +-> | peak | --> | | + * +-----------+ +-----+ + * + * After the routing decision, when is it clear a packet is to be sent + * via a particular NIC, it is sent to the root qdisc (queueing + * discipline). In this case HTB (Hierarchical Token Bucket). It has + * only one direct child class (with id 1:1) which shapes the overall + * rate that is sent through the NIC. This class has at least one child + * (1:2) which is meant for all non-privileged (non guaranteed) traffic + * from all domains. Then, for each interface with guaranteed + * throughput, a separate class (1:n) is created. Imagine a class is a + * box. Whenever a packet ends up in a class it is stored in this box + * until the kernel sends it, then it is removed from box. Packets are + * placed into boxes based on rules (filters) - e.g. depending on + * destination IP/MAC address. If there is no rule to be applied, the + * root qdisc has a default where such packets go (1:2 in this case). + * Packets come in over and over again and boxes get filled more and + * more. Imagine that kernel sends packets just once a second. So it + * starts to traverse through this tree. It starts with the root qdisc + * and through 1:1 it gets to 1:2. It sends packets up to 1:2's 'rate'. + * Then it moves to 1:3 and again sends packets up to 1:3's 'rate'. The + * whole process is repeated until 1:n is processed. So now we have + * ensured each class its guaranteed bandwidth. If the sum of sent data + * doesn't exceed the 'rate' in 1:1 class, we can go further and send + * more packets. The rest of available bandwidth is distributed to the + * 1:2,1:3...1:n classes by ratio of their 'rate'. As soon as the root + * 'rate' limit is reached or there are no more packets to send, we stop + * sending and wait another second. Each class has an SFQ qdisc which + * shuffles packets in boxes stochastically, so one sender cannot + * starve others. + * + * Therefore, whenever we want to plug in a new guaranteed interface, we + * need to create a new class and adjust the 'rate' of the 1:2 class. + * When unplugging we do the exact opposite - remove the associated + * class, and adjust the 'rate'. + * + * This description is rather long, but it is still a good idea to read + * it before you dig into the code. + */ + if (hierarchical_class) { + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "add", "dev", ifname, "parent", + "1:", "classid", "1:1", "htb", "rate", average, + "ceil", peak ? peak : average, NULL); + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + } virCommandFree(cmd); cmd = virCommandNew(TC); virCommandAddArgList(cmd,"class", "add", "dev", ifname, "parent", - "1:", "classid", "1:1", "htb", NULL); - virCommandAddArgList(cmd, "rate", average, NULL); + hierarchical_class ? "1:1" : "1:", "classid", + hierarchical_class ? "1:2" : "1:1", "htb", + "rate", average, NULL); if (peak) virCommandAddArgList(cmd, "ceil", peak, NULL); @@ -104,7 +182,8 @@ virNetDevBandwidthSet(const char *ifname, virCommandFree(cmd); cmd = virCommandNew(TC); virCommandAddArgList(cmd, "qdisc", "add", "dev", ifname, "parent", - "1:1", "handle", "2:", "sfq", "perturb", + hierarchical_class ? "1:2" : "1:1", + "handle", "2:", "sfq", "perturb", "10", NULL); if (virCommandRun(cmd, NULL) < 0) diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index 35f8b89..d308ab2 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -42,7 +42,9 @@ struct _virNetDevBandwidth { void virNetDevBandwidthFree(virNetDevBandwidthPtr def); -int virNetDevBandwidthSet(const char *ifname, virNetDevBandwidthPtr bandwidth) +int virNetDevBandwidthSet(const char *ifname, + virNetDevBandwidthPtr bandwidth, + bool hierarchical_class) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; int virNetDevBandwidthClear(const char *ifname) ATTRIBUTE_NONNULL(1); diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c index d8e646a..657c484 100644 --- a/src/util/virnetdevmacvlan.c +++ b/src/util/virnetdevmacvlan.c @@ -925,7 +925,7 @@ create_name: rc = 0; } - if (virNetDevBandwidthSet(cr_ifname, bandwidth) < 0) { + if (virNetDevBandwidthSet(cr_ifname, bandwidth, false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), cr_ifname); -- 1.7.8.6

On 12/04/2012 02:18 PM, Michal Privoznik wrote:
These classes can borrow unused bandwidth. Basically, only egress qdsics can have classes, therefore we can do this kind of traffic shaping only on host's outgoing, that is domain's incoming traffic. --- src/lxc/lxc_process.c | 3 +- src/network/bridge_driver.c | 3 +- src/qemu/qemu_command.c | 3 +- src/qemu/qemu_driver.c | 2 +- src/util/virnetdevbandwidth.c | 93 +++++++++++++++++++++++++++++++++++++--- src/util/virnetdevbandwidth.h | 4 +- src/util/virnetdevmacvlan.c | 2 +- 7 files changed, 97 insertions(+), 13 deletions(-)
diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index 8b9d02f..3937839 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -337,7 +337,8 @@ static int virLXCProcessSetupInterfaceBridged(virConnectPtr conn, goto cleanup;
if (virNetDevBandwidthSet(net->ifname, - virDomainNetGetActualBandwidth(net)) < 0) { + virDomainNetGetActualBandwidth(net), + false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), net->ifname); diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 75f3c3a..63be5e2 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -2290,7 +2290,8 @@ networkStartNetworkVirtual(struct network_driver *driver, VIR_FORCE_CLOSE(tapfd); }
- if (virNetDevBandwidthSet(network->def->bridge, network->def->bandwidth) < 0) { + if (virNetDevBandwidthSet(network->def->bridge, + network->def->bandwidth, true) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), network->def->bridge); diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 1805625..2b466dc 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -292,7 +292,8 @@ qemuNetworkIfaceConnect(virDomainDefPtr def,
if (tapfd >= 0 && virNetDevBandwidthSet(net->ifname, - virDomainNetGetActualBandwidth(net)) < 0) { + virDomainNetGetActualBandwidth(net), + false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), net->ifname); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 8e838cd..259ac37 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -8983,7 +8983,7 @@ qemuDomainSetInterfaceParameters(virDomainPtr dom, sizeof(*newBandwidth->out)); }
- if (virNetDevBandwidthSet(net->ifname, newBandwidth) < 0) { + if (virNetDevBandwidthSet(net->ifname, newBandwidth, false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), device); diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 49fc425..71c272e 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -45,17 +45,21 @@ virNetDevBandwidthFree(virNetDevBandwidthPtr def) * virNetDevBandwidthSet: * @ifname: on which interface * @bandwidth: rates to set (may be NULL) + * @hierarchical_class: whether to create hierarchical class * * This function enables QoS on specified interface * and set given traffic limits for both, incoming * and outgoing traffic. Any previous setting get - * overwritten. + * overwritten. If @hierarchical_class is TRUE, create + * hierarchical class. It is used to guarantee minimal + * throughput ('floor' attribute in NIC). * * Return 0 on success, -1 otherwise. */ int virNetDevBandwidthSet(const char *ifname, - virNetDevBandwidthPtr bandwidth) + virNetDevBandwidthPtr bandwidth, + bool hierarchical_class) { int ret = -1; virCommandPtr cmd = NULL; @@ -71,7 +75,7 @@ virNetDevBandwidthSet(const char *ifname,
virNetDevBandwidthClear(ifname);
- if (bandwidth->in) { + if (bandwidth->in && bandwidth->in->average) { if (virAsprintf(&average, "%llukbps", bandwidth->in->average) < 0) goto cleanup; if (bandwidth->in->peak && @@ -83,15 +87,89 @@ virNetDevBandwidthSet(const char *ifname,
cmd = virCommandNew(TC); virCommandAddArgList(cmd, "qdisc", "add", "dev", ifname, "root", - "handle", "1:", "htb", "default", "1", NULL); + "handle", "1:", "htb", "default", + hierarchical_class ? "2" : "1", NULL); if (virCommandRun(cmd, NULL) < 0) goto cleanup;
+ /* If we are creating a hierarchical class, all non guaranteed traffic + * goes to the 1:2 class which will adjust 'rate' dynamically as NICs + * with guaranteed throughput are plugged and unplugged. Class 1:1 + * exists so we don't exceed the maximum limit for the network. For each + * NIC with guaranteed throughput a separate classid will be created. + * NB '1:' is just a shorter notation of '1:0'. + * + * To get a picture how this works: + * + * +-----+ +---------+ +-----------+ +-----------+ +-----+ + * | | | qdisc | | class 1:1 | | class 1:2 | | | + * | NIC | | def 1:2 | | rate | | rate | | sfq | + * | | --> | | --> | peak | -+-> | peak | --> | | + * +-----+ +---------+ +-----------+ | +-----------+ +-----+ + * | + * | +-----------+ +-----+ + * | | class 1:3 | | | + * | | rate | | sfq | + * +-> | peak | --> | | + * | +-----------+ +-----+ + * ... + * | +-----------+ +-----+ + * | | class 1:n | | | + * | | rate | | sfq | + * +-> | peak | --> | | + * +-----------+ +-----+ + * + * After the routing decision, when is it clear a packet is to be sent + * via a particular NIC, it is sent to the root qdisc (queueing + * discipline). In this case HTB (Hierarchical Token Bucket). It has + * only one direct child class (with id 1:1) which shapes the overall + * rate that is sent through the NIC. This class has at least one child + * (1:2) which is meant for all non-privileged (non guaranteed) traffic + * from all domains. Then, for each interface with guaranteed + * throughput, a separate class (1:n) is created. Imagine a class is a + * box. Whenever a packet ends up in a class it is stored in this box + * until the kernel sends it, then it is removed from box. Packets are + * placed into boxes based on rules (filters) - e.g. depending on + * destination IP/MAC address. If there is no rule to be applied, the + * root qdisc has a default where such packets go (1:2 in this case). + * Packets come in over and over again and boxes get filled more and + * more. Imagine that kernel sends packets just once a second. So it + * starts to traverse through this tree. It starts with the root qdisc + * and through 1:1 it gets to 1:2. It sends packets up to 1:2's 'rate'. + * Then it moves to 1:3 and again sends packets up to 1:3's 'rate'. The + * whole process is repeated until 1:n is processed. So now we have + * ensured each class its guaranteed bandwidth. If the sum of sent data + * doesn't exceed the 'rate' in 1:1 class, we can go further and send + * more packets. The rest of available bandwidth is distributed to the + * 1:2,1:3...1:n classes by ratio of their 'rate'. As soon as the root + * 'rate' limit is reached or there are no more packets to send, we stop + * sending and wait another second. Each class has an SFQ qdisc which + * shuffles packets in boxes stochastically, so one sender cannot + * starve others. + * + * Therefore, whenever we want to plug in a new guaranteed interface, we + * need to create a new class and adjust the 'rate' of the 1:2 class. + * When unplugging we do the exact opposite - remove the associated + * class, and adjust the 'rate'. + * + * This description is rather long, but it is still a good idea to read + * it before you dig into the code. + */ + if (hierarchical_class) { + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "add", "dev", ifname, "parent", + "1:", "classid", "1:1", "htb", "rate", average, + "ceil", peak ? peak : average, NULL); + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + }
One twist to this that I thought about - when you first upgrade to a version of libvirt that supports floor, you will most likely already have some networks and interfaces running that don't have the hierarchical class setup, so it won't work properly until you bring down all the domains and networks and restart them. I'm not sure what the best solution to this particular version of the live upgrade problem is, though, so I'm not going to let it stop me from giving an ACK.
virCommandFree(cmd); cmd = virCommandNew(TC); virCommandAddArgList(cmd,"class", "add", "dev", ifname, "parent", - "1:", "classid", "1:1", "htb", NULL); - virCommandAddArgList(cmd, "rate", average, NULL); + hierarchical_class ? "1:1" : "1:", "classid", + hierarchical_class ? "1:2" : "1:1", "htb", + "rate", average, NULL);
if (peak) virCommandAddArgList(cmd, "ceil", peak, NULL); @@ -104,7 +182,8 @@ virNetDevBandwidthSet(const char *ifname, virCommandFree(cmd); cmd = virCommandNew(TC); virCommandAddArgList(cmd, "qdisc", "add", "dev", ifname, "parent", - "1:1", "handle", "2:", "sfq", "perturb", + hierarchical_class ? "1:2" : "1:1", + "handle", "2:", "sfq", "perturb", "10", NULL);
if (virCommandRun(cmd, NULL) < 0) diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index 35f8b89..d308ab2 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -42,7 +42,9 @@ struct _virNetDevBandwidth {
void virNetDevBandwidthFree(virNetDevBandwidthPtr def);
-int virNetDevBandwidthSet(const char *ifname, virNetDevBandwidthPtr bandwidth) +int virNetDevBandwidthSet(const char *ifname, + virNetDevBandwidthPtr bandwidth, + bool hierarchical_class) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; int virNetDevBandwidthClear(const char *ifname) ATTRIBUTE_NONNULL(1); diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c index d8e646a..657c484 100644 --- a/src/util/virnetdevmacvlan.c +++ b/src/util/virnetdevmacvlan.c @@ -925,7 +925,7 @@ create_name: rc = 0; }
- if (virNetDevBandwidthSet(cr_ifname, bandwidth) < 0) { + if (virNetDevBandwidthSet(cr_ifname, bandwidth, false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), cr_ifname);
ACK (I'd already ACKed this one, pending answers to my questions).

These set bridge part of QoS when bringing domain's interface up. Long story short, if there's a 'floor' set, a new QoS class is created. ClassID MUST be unique within the bridge and should be kept for unplug phase. --- po/POTFILES.in | 1 + src/util/virnetdevbandwidth.c | 188 +++++++++++++++++++++++++++++++++++++++++ src/util/virnetdevbandwidth.h | 13 +++ 3 files changed, 202 insertions(+), 0 deletions(-) diff --git a/po/POTFILES.in b/po/POTFILES.in index 586aa2b..d47f445 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -158,6 +158,7 @@ src/util/virinitctl.c src/util/virkeyfile.c src/util/virlockspace.c src/util/virnetdev.c +src/util/virnetdevbandwidth.c src/util/virnetdevbridge.c src/util/virnetdevmacvlan.c src/util/virnetdevopenvswitch.c diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 71c272e..6c4920b 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -352,3 +352,191 @@ virNetDevBandwidthEqual(virNetDevBandwidthPtr a, return true; } + +/* + * virNetDevBandwidthPlug: + * @brname: name of the bridge + * @net_bandwidth: QoS settings on @brname + * @ifmac: MAC of interface + * @bandwidth: QoS settings for interface + * @id: unique ID (MUST be greater than 2) + * + * Set bridge part of interface QoS settings, e.g. guaranteed + * bandwidth. @id is an unique ID (among @brname) from which + * other identifiers for class, qdisc and filter are derived. + * However, two classes were already set up (by + * virNetDevBandwidthSet). That's why this @id MUST be greater + * than 2. You may want to keep passed @id, as it is used later + * by virNetDevBandwidthUnplug. + * + * Returns: + * 1 if there is nothing to set + * 0 if QoS set successfully + * -1 otherwise. + */ +int +virNetDevBandwidthPlug(const char *brname, + virNetDevBandwidthPtr net_bandwidth, + const virMacAddrPtr ifmac_ptr, + virNetDevBandwidthPtr bandwidth, + unsigned int id) +{ + int ret = -1; + virCommandPtr cmd = NULL; + char *class_id = NULL; + char *qdisc_id = NULL; + char *filter_id = NULL; + char *floor = NULL; + char *ceil = NULL; + unsigned char ifmac[VIR_MAC_BUFLEN]; + char ifmacStr[VIR_MAC_STRING_BUFLEN]; + char *mac[2] = {NULL, NULL}; + + if (!bandwidth || !bandwidth->in || !bandwidth->in->floor) { + /* nothing to set */ + return 1; + } + + if (id <= 2) { + virReportError(VIR_ERR_INTERNAL_ERROR, _("Invalid class ID %d"), id); + return -1; + } + + virMacAddrGetRaw(ifmac_ptr, ifmac); + virMacAddrFormat(ifmac_ptr, ifmacStr); + + if (!net_bandwidth || !net_bandwidth->in) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Bridge '%s' has no QoS set, therefore " + "unable to set 'floor' on '%s'"), + brname, ifmacStr); + return -1; + } + + if (virAsprintf(&class_id, "1:%x", id) < 0 || + virAsprintf(&qdisc_id, "%x:", id) < 0 || + virAsprintf(&filter_id, "%u", id) < 0 || + virAsprintf(&mac[0], "0x%02x%02x%02x%02x", ifmac[2], + ifmac[3], ifmac[4], ifmac[5]) < 0 || + virAsprintf(&mac[1], "0x%02x%02x", ifmac[0], ifmac[1]) < 0 || + virAsprintf(&floor, "%llukbps", bandwidth->in->floor) < 0 || + virAsprintf(&ceil, "%llukbps", net_bandwidth->in->peak ? + net_bandwidth->in->peak : + net_bandwidth->in->average) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "add", "dev", brname, "parent", "1:1", + "classid", class_id, "htb", "rate", floor, + "ceil", ceil, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "add", "dev", brname, "parent", + class_id, "handle", qdisc_id, "sfq", "perturb", + "10", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + /* Okay, this not nice. But since libvirt does not know anything about + * interface IP address(es), and tc fw filter simply refuse to use ebtables + * marks, we need to use u32 selector to match MAC address. + * If libvirt will ever know something, remove this FIXME + */ + virCommandAddArgList(cmd, "filter", "add", "dev", brname, "protocol", "ip", + "prio", filter_id, "u32", + "match", "u16", "0x0800", "0xffff", "at", "-2", + "match", "u32", mac[0], "0xffffffff", "at", "-12", + "match", "u16", mac[1], "0xffff", "at", "-14", + "flowid", class_id, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + ret = 0; + +cleanup: + VIR_FREE(mac[1]); + VIR_FREE(mac[0]); + VIR_FREE(ceil); + VIR_FREE(floor); + VIR_FREE(filter_id); + VIR_FREE(qdisc_id); + VIR_FREE(class_id); + virCommandFree(cmd); + return ret; +} + +/* + * virNetDevBandwidthUnplug: + * @brname: from which bridge are we unplugging + * @id: unique identifier (MUST be greater than 2) + * + * Remove QoS settings from bridge. + * + * Returns 0 on success, -1 otherwise. + */ +int +virNetDevBandwidthUnplug(const char *brname, + unsigned int id) +{ + int ret = -1; + int cmd_ret = 0; + virCommandPtr cmd = NULL; + char *class_id = NULL; + char *qdisc_id = NULL; + char *filter_id = NULL; + + if (id <= 2) { + virReportError(VIR_ERR_INTERNAL_ERROR, _("Invalid class ID %d"), id); + return -1; + } + + if (virAsprintf(&class_id, "1:%x", id) < 0 || + virAsprintf(&qdisc_id, "%x:", id) < 0 || + virAsprintf(&filter_id, "%u", id) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "del", "dev", brname, + "handle", qdisc_id, NULL); + + /* Don't threat tc errors as fatal, but + * try to remove as much as possible */ + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "filter", "del", "dev", brname, + "prio", filter_id, NULL); + + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup; + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "del", "dev", brname, + "classid", class_id, NULL); + + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup; + + ret = 0; + +cleanup: + VIR_FREE(filter_id); + VIR_FREE(qdisc_id); + VIR_FREE(class_id); + virCommandFree(cmd); + return ret; +} diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index d308ab2..3d4072d 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -24,6 +24,7 @@ # define __VIR_NETDEV_BANDWIDTH_H__ # include "internal.h" +# include "virmacaddr.h" typedef struct _virNetDevBandwidthRate virNetDevBandwidthRate; typedef virNetDevBandwidthRate *virNetDevBandwidthRatePtr; @@ -53,4 +54,16 @@ int virNetDevBandwidthCopy(virNetDevBandwidthPtr *dest, const virNetDevBandwidth bool virNetDevBandwidthEqual(virNetDevBandwidthPtr a, virNetDevBandwidthPtr b); +int virNetDevBandwidthPlug(const char *brname, + virNetDevBandwidthPtr net_bandwidth, + const virMacAddrPtr ifmac_ptr, + virNetDevBandwidthPtr bandwidth, + unsigned int id) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_RETURN_CHECK; + +int virNetDevBandwidthUnplug(const char *brname, + unsigned int id) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; + #endif /* __VIR_NETDEV_BANDWIDTH_H__ */ -- 1.7.8.6

On 12/04/2012 02:18 PM, Michal Privoznik wrote:
These set bridge part of QoS when bringing domain's interface up. Long story short, if there's a 'floor' set, a new QoS class is created. ClassID MUST be unique within the bridge and should be kept for unplug phase. --- po/POTFILES.in | 1 + src/util/virnetdevbandwidth.c | 188 +++++++++++++++++++++++++++++++++++++++++ src/util/virnetdevbandwidth.h | 13 +++ 3 files changed, 202 insertions(+), 0 deletions(-)
diff --git a/po/POTFILES.in b/po/POTFILES.in index 586aa2b..d47f445 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -158,6 +158,7 @@ src/util/virinitctl.c src/util/virkeyfile.c src/util/virlockspace.c src/util/virnetdev.c +src/util/virnetdevbandwidth.c src/util/virnetdevbridge.c src/util/virnetdevmacvlan.c src/util/virnetdevopenvswitch.c diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 71c272e..6c4920b 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -352,3 +352,191 @@ virNetDevBandwidthEqual(virNetDevBandwidthPtr a,
return true; } + +/* + * virNetDevBandwidthPlug: + * @brname: name of the bridge + * @net_bandwidth: QoS settings on @brname + * @ifmac: MAC of interface + * @bandwidth: QoS settings for interface + * @id: unique ID (MUST be greater than 2) + * + * Set bridge part of interface QoS settings, e.g. guaranteed + * bandwidth. @id is an unique ID (among @brname) from which + * other identifiers for class, qdisc and filter are derived. + * However, two classes were already set up (by + * virNetDevBandwidthSet). That's why this @id MUST be greater + * than 2. You may want to keep passed @id, as it is used later + * by virNetDevBandwidthUnplug. + * + * Returns: + * 1 if there is nothing to set + * 0 if QoS set successfully + * -1 otherwise. + */ +int +virNetDevBandwidthPlug(const char *brname, + virNetDevBandwidthPtr net_bandwidth, + const virMacAddrPtr ifmac_ptr, + virNetDevBandwidthPtr bandwidth, + unsigned int id) +{ + int ret = -1; + virCommandPtr cmd = NULL; + char *class_id = NULL; + char *qdisc_id = NULL; + char *filter_id = NULL; + char *floor = NULL; + char *ceil = NULL; + unsigned char ifmac[VIR_MAC_BUFLEN]; + char ifmacStr[VIR_MAC_STRING_BUFLEN]; + char *mac[2] = {NULL, NULL}; + + if (!bandwidth || !bandwidth->in || !bandwidth->in->floor) { + /* nothing to set */ + return 1; + }
I find this "return 1 when nothing needs to be done" a bit confusing, in particular the bit in 6/8 where you log the warning about "maybe you should upgrade libvirt". It seems like what you're checking for should only be possible if there's an inconsistency/error among your functions, is that right? Is there a real danger of that, or are you just being paranoid? :-) (at some point you have to be able to make assumptions about the correctness of other functions in the same subsystem, or your code ends up spending most of its energy protecting against itself)
+ + if (id <= 2) { + virReportError(VIR_ERR_INTERNAL_ERROR, _("Invalid class ID %d"), id); + return -1; + } + + virMacAddrGetRaw(ifmac_ptr, ifmac); + virMacAddrFormat(ifmac_ptr, ifmacStr); + + if (!net_bandwidth || !net_bandwidth->in) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Bridge '%s' has no QoS set, therefore " + "unable to set 'floor' on '%s'"), + brname, ifmacStr); + return -1; + } + + if (virAsprintf(&class_id, "1:%x", id) < 0 || + virAsprintf(&qdisc_id, "%x:", id) < 0 || + virAsprintf(&filter_id, "%u", id) < 0 || + virAsprintf(&mac[0], "0x%02x%02x%02x%02x", ifmac[2], + ifmac[3], ifmac[4], ifmac[5]) < 0 || + virAsprintf(&mac[1], "0x%02x%02x", ifmac[0], ifmac[1]) < 0 || + virAsprintf(&floor, "%llukbps", bandwidth->in->floor) < 0 || + virAsprintf(&ceil, "%llukbps", net_bandwidth->in->peak ? + net_bandwidth->in->peak : + net_bandwidth->in->average) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "add", "dev", brname, "parent", "1:1", + "classid", class_id, "htb", "rate", floor, + "ceil", ceil, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "add", "dev", brname, "parent", + class_id, "handle", qdisc_id, "sfq", "perturb", + "10", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + /* Okay, this not nice. But since libvirt does not know anything about + * interface IP address(es), and tc fw filter simply refuse to use ebtables + * marks, we need to use u32 selector to match MAC address. + * If libvirt will ever know something, remove this FIXME + */ + virCommandAddArgList(cmd, "filter", "add", "dev", brname, "protocol", "ip", + "prio", filter_id, "u32", + "match", "u16", "0x0800", "0xffff", "at", "-2", + "match", "u32", mac[0], "0xffffffff", "at", "-12", + "match", "u16", mac[1], "0xffff", "at", "-14", + "flowid", class_id, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + ret = 0; + +cleanup: + VIR_FREE(mac[1]); + VIR_FREE(mac[0]); + VIR_FREE(ceil); + VIR_FREE(floor); + VIR_FREE(filter_id); + VIR_FREE(qdisc_id); + VIR_FREE(class_id); + virCommandFree(cmd); + return ret; +} + +/* + * virNetDevBandwidthUnplug: + * @brname: from which bridge are we unplugging + * @id: unique identifier (MUST be greater than 2) + * + * Remove QoS settings from bridge. + * + * Returns 0 on success, -1 otherwise. + */ +int +virNetDevBandwidthUnplug(const char *brname, + unsigned int id) +{ + int ret = -1; + int cmd_ret = 0; + virCommandPtr cmd = NULL; + char *class_id = NULL; + char *qdisc_id = NULL; + char *filter_id = NULL; + + if (id <= 2) { + virReportError(VIR_ERR_INTERNAL_ERROR, _("Invalid class ID %d"), id); + return -1; + } + + if (virAsprintf(&class_id, "1:%x", id) < 0 || + virAsprintf(&qdisc_id, "%x:", id) < 0 || + virAsprintf(&filter_id, "%u", id) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "del", "dev", brname, + "handle", qdisc_id, NULL); + + /* Don't threat tc errors as fatal, but + * try to remove as much as possible */ + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "filter", "del", "dev", brname, + "prio", filter_id, NULL); + + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup; + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "del", "dev", brname, + "classid", class_id, NULL); + + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup; + + ret = 0; + +cleanup: + VIR_FREE(filter_id); + VIR_FREE(qdisc_id); + VIR_FREE(class_id); + virCommandFree(cmd); + return ret; +} diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index d308ab2..3d4072d 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -24,6 +24,7 @@ # define __VIR_NETDEV_BANDWIDTH_H__
# include "internal.h" +# include "virmacaddr.h"
typedef struct _virNetDevBandwidthRate virNetDevBandwidthRate; typedef virNetDevBandwidthRate *virNetDevBandwidthRatePtr; @@ -53,4 +54,16 @@ int virNetDevBandwidthCopy(virNetDevBandwidthPtr *dest, const virNetDevBandwidth
bool virNetDevBandwidthEqual(virNetDevBandwidthPtr a, virNetDevBandwidthPtr b);
+int virNetDevBandwidthPlug(const char *brname, + virNetDevBandwidthPtr net_bandwidth, + const virMacAddrPtr ifmac_ptr, + virNetDevBandwidthPtr bandwidth, + unsigned int id) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_RETURN_CHECK; + +int virNetDevBandwidthUnplug(const char *brname, + unsigned int id) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; + #endif /* __VIR_NETDEV_BANDWIDTH_H__ */
ACK.

This will be used whenever a NIC with guaranteed throughput is to be plugged into a bridge. It will adjust the average throughput of non guaranteed NICs (classid 1:2) to meet new requirements. --- src/util/virnetdevbandwidth.c | 49 +++++++++++++++++++++++++++++++++++++++++ src/util/virnetdevbandwidth.h | 6 +++++ 2 files changed, 55 insertions(+), 0 deletions(-) diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 6c4920b..5fa8b87 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -540,3 +540,52 @@ cleanup: virCommandFree(cmd); return ret; } + +/** + * virNetDevBandwidthUpdateRate: + * @ifname: interface name + * @classid: ID of class to update + * @new_rate: new rate + * + * This function updates the 'rate' attribute of HTB class. + * It can be used whenever a new interface is plugged to a + * bridge to adjust average throughput of non guaranteed + * NICs. + * + * Returns 0 on success, -1 otherwise. + */ +int +virNetDevBandwidthUpdateRate(const char *ifname, + const char *class_id, + virNetDevBandwidthPtr bandwidth, + unsigned long long new_rate) +{ + int ret = -1; + virCommandPtr cmd = NULL; + char *rate = NULL; + char *ceil = NULL; + + if (virAsprintf(&rate, "%llukbps", new_rate) < 0 || + virAsprintf(&ceil, "%llukbps", bandwidth->in->peak ? + bandwidth->in->peak : + bandwidth->in->average) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "change", "dev", ifname, + "classid", class_id, "htb", "rate", rate, + "ceil", ceil, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + ret = 0; + +cleanup: + virCommandFree(cmd); + VIR_FREE(rate); + VIR_FREE(ceil); + return ret; +} diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index 3d4072d..66f9e37 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -66,4 +66,10 @@ int virNetDevBandwidthUnplug(const char *brname, unsigned int id) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; +int virNetDevBandwidthUpdateRate(const char *ifname, + const char *class_id, + virNetDevBandwidthPtr bandwidth, + unsigned long long new_rate) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_RETURN_CHECK; #endif /* __VIR_NETDEV_BANDWIDTH_H__ */ -- 1.7.8.6

On 12/04/2012 02:18 PM, Michal Privoznik wrote:
This will be used whenever a NIC with guaranteed throughput is to be plugged into a bridge. It will adjust the average throughput of non guaranteed NICs (classid 1:2) to meet new requirements. --- src/util/virnetdevbandwidth.c | 49 +++++++++++++++++++++++++++++++++++++++++ src/util/virnetdevbandwidth.h | 6 +++++ 2 files changed, 55 insertions(+), 0 deletions(-)
diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 6c4920b..5fa8b87 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -540,3 +540,52 @@ cleanup: virCommandFree(cmd); return ret; } + +/** + * virNetDevBandwidthUpdateRate: + * @ifname: interface name + * @classid: ID of class to update + * @new_rate: new rate + * + * This function updates the 'rate' attribute of HTB class. + * It can be used whenever a new interface is plugged to a + * bridge to adjust average throughput of non guaranteed + * NICs. + * + * Returns 0 on success, -1 otherwise. + */ +int +virNetDevBandwidthUpdateRate(const char *ifname, + const char *class_id, + virNetDevBandwidthPtr bandwidth, + unsigned long long new_rate) +{ + int ret = -1; + virCommandPtr cmd = NULL; + char *rate = NULL; + char *ceil = NULL; + + if (virAsprintf(&rate, "%llukbps", new_rate) < 0 || + virAsprintf(&ceil, "%llukbps", bandwidth->in->peak ? + bandwidth->in->peak : + bandwidth->in->average) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "change", "dev", ifname, + "classid", class_id, "htb", "rate", rate, + "ceil", ceil, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + ret = 0; + +cleanup: + virCommandFree(cmd); + VIR_FREE(rate); + VIR_FREE(ceil); + return ret; +} diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index 3d4072d..66f9e37 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -66,4 +66,10 @@ int virNetDevBandwidthUnplug(const char *brname, unsigned int id) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK;
+int virNetDevBandwidthUpdateRate(const char *ifname, + const char *class_id, + virNetDevBandwidthPtr bandwidth, + unsigned long long new_rate) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_RETURN_CHECK; #endif /* __VIR_NETDEV_BANDWIDTH_H__ */
ACK

Network should be notified if we plug in or unplug an interface, so it can perform some action, e.g. set/unset network part of QoS. However, we are doing this in very early stage, so iface->ifname isn't filled in yet. So whenever we want to report an error, we must use a different identifier, e.g. the MAC address. --- src/Makefile.am | 7 +- src/conf/domain_conf.h | 1 + src/conf/network_conf.c | 21 ++++ src/conf/network_conf.h | 4 + src/libvirt_network.syms | 13 +++ src/libvirt_private.syms | 8 -- src/network/bridge_driver.c | 223 ++++++++++++++++++++++++++++++++++++++++++- 7 files changed, 266 insertions(+), 11 deletions(-) create mode 100644 src/libvirt_network.syms diff --git a/src/Makefile.am b/src/Makefile.am index 01cb995..04378d1 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -1373,6 +1373,10 @@ if WITH_ATOMIC_OPS_PTHREAD USED_SYM_FILES += libvirt_atomic.syms endif +if WITH_NETWORK +USED_SYM_FILES += libvirt_network.syms +endif + EXTRA_DIST += \ libvirt_public.syms \ libvirt_private.syms \ @@ -1386,7 +1390,8 @@ EXTRA_DIST += \ libvirt_sasl.syms \ libvirt_vmx.syms \ libvirt_xenxs.syms \ - libvirt_libssh2.syms + libvirt_libssh2.syms \ + libvirt_network.syms GENERATED_SYM_FILES = libvirt.syms libvirt.def libvirt_qemu.def diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 4ab15e9..b4d149b 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -807,6 +807,7 @@ struct _virDomainActualNetDef { virNetDevVPortProfilePtr virtPortProfile; virNetDevBandwidthPtr bandwidth; virNetDevVlan vlan; + unsigned int class_id; /* class ID for bandwidth 'floor' */ }; /* Stores the virtual network interface configuration */ diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 2cb9e81..441ccdc 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -47,6 +47,9 @@ #define MAX_BRIDGE_ID 256 #define VIR_FROM_THIS VIR_FROM_NETWORK +#define NEXT_FREE_CLASS_ID 3 +/* currently, /sbin/tc implementation allows up 16 bits for minor class size */ +#define CLASS_ID_BITMAP_SIZE (1<<16) VIR_ENUM_IMPL(virNetworkForward, VIR_NETWORK_FORWARD_LAST, @@ -319,10 +322,28 @@ virNetworkAssignDef(virNetworkObjListPtr nets, virNetworkObjLock(network); network->def = def; + if (!(network->class_id = virBitmapNew(CLASS_ID_BITMAP_SIZE))) { + virReportOOMError(); + goto error; + } + + if (virBitmapSetBit(network->class_id, 0) < 0 || + virBitmapSetBit(network->class_id, 1) < 0 || + virBitmapSetBit(network->class_id, 2) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("unable to initialize class id bitmap")); + goto error; + } + + network->def = def; nets->objs[nets->count] = network; nets->count++; return network; +error: + virNetworkObjUnlock(network); + virNetworkObjFree(network); + return NULL; } diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index 3e46304..20f8a51 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -38,6 +38,7 @@ # include "virnetdevvlan.h" # include "virmacaddr.h" # include "device_conf.h" +# include "bitmap.h" enum virNetworkForwardType { VIR_NETWORK_FORWARD_NONE = 0, @@ -221,6 +222,9 @@ struct _virNetworkObj { virNetworkDefPtr def; /* The current definition */ virNetworkDefPtr newDef; /* New definition to activate at shutdown */ + + virBitmapPtr class_id; /* bitmap of class IDs for QoS */ + unsigned long long floor_sum; /* sum of all 'floor'-s of attached NICs */ }; typedef struct _virNetworkObjList virNetworkObjList; diff --git a/src/libvirt_network.syms b/src/libvirt_network.syms new file mode 100644 index 0000000..fc8d415 --- /dev/null +++ b/src/libvirt_network.syms @@ -0,0 +1,13 @@ +# +# Network-specific symbols +# + +# virnetdevbandwidth.h +virNetDevBandwidthClear; +virNetDevBandwidthCopy; +virNetDevBandwidthEqual; +virNetDevBandwidthFree; +virNetDevBandwidthSet; +virNetDevBandwidthPlug; +virNetDevBandwidthUnplug; +virNetDevBandwidthUpdateRate; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 41e2629..f5648a0 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1497,14 +1497,6 @@ virNetDevSetOnline; virNetDevValidateConfig; -# virnetdevbandwidth.h -virNetDevBandwidthClear; -virNetDevBandwidthCopy; -virNetDevBandwidthEqual; -virNetDevBandwidthFree; -virNetDevBandwidthSet; - - # virnetdevbridge.h virNetDevBridgeAddPort; virNetDevBridgeCreate; diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 63be5e2..15cbc87 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -121,6 +121,11 @@ static int networkShutdownNetworkExternal(struct network_driver *driver, static void networkReloadIptablesRules(struct network_driver *driver); static void networkRefreshDaemons(struct network_driver *driver); +static int networkPlugBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface); +static int networkUnplugBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface); + static struct network_driver *driverState = NULL; static char * @@ -3469,6 +3474,7 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) enum virDomainNetType actualType = iface->type; virNetworkObjPtr network = NULL; virNetworkDefPtr netdef = NULL; + virNetDevBandwidthPtr bandwidth = NULL; virPortGroupDefPtr portgroup = NULL; virNetDevVPortProfilePtr virtport = iface->virtPortProfile; virNetDevVlanPtr vlan = NULL; @@ -3507,7 +3513,13 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) * (already in NetDef). Otherwise, if there is bandwidth info in * the portgroup, fill that into the ActualDef. */ - if (portgroup && !iface->bandwidth) { + + if (iface->bandwidth) + bandwidth = iface->bandwidth; + else if (portgroup && portgroup->bandwidth) + bandwidth = portgroup->bandwidth; + + if (bandwidth) { if (!iface->data.network.actual && (VIR_ALLOC(iface->data.network.actual) < 0)) { virReportOOMError(); @@ -3515,7 +3527,7 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) } if (virNetDevBandwidthCopy(&iface->data.network.actual->bandwidth, - portgroup->bandwidth) < 0) + bandwidth) < 0) goto error; } @@ -3528,6 +3540,10 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) */ if (iface->data.network.actual) iface->data.network.actual->type = VIR_DOMAIN_NET_TYPE_NETWORK; + + if (networkPlugBandwidth(network, iface) < 0) + goto error; + } else if ((netdef->forwardType == VIR_NETWORK_FORWARD_BRIDGE) && netdef->bridge) { @@ -4039,6 +4055,12 @@ networkReleaseActualDevice(virDomainNetDefPtr iface) } netdef = network->def; + if ((netdef->forwardType == VIR_NETWORK_FORWARD_NONE || + netdef->forwardType == VIR_NETWORK_FORWARD_NAT || + netdef->forwardType == VIR_NETWORK_FORWARD_ROUTE) && + networkUnplugBandwidth(network, iface) < 0) + goto error; + if ((!iface->data.network.actual) || ((actualType != VIR_DOMAIN_NET_TYPE_DIRECT) && (actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV))) { @@ -4242,3 +4264,200 @@ cleanup: error: goto cleanup; } + +/** + * networkCheckBandwidth: + * @net: network QoS + * @iface: interface QoS + * @new_rate: new rate for non guaranteed class + * + * Returns: -1 if plugging would overcommit network QoS + * 0 if plugging is safe (@new_rate updated) + * 1 if no QoS is set (@new_rate untouched) + */ +static int +networkCheckBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface, + unsigned long long *new_rate) +{ + int ret = -1; + virNetDevBandwidthPtr netBand = net->def->bandwidth; + virNetDevBandwidthPtr ifaceBand = iface->bandwidth; + unsigned long long tmp_floor_sum = net->floor_sum; + unsigned long long tmp_new_rate = 0; + char ifmac[VIR_MAC_STRING_BUFLEN]; + + if (!ifaceBand || !ifaceBand->in || !ifaceBand->in->floor || + !netBand || !netBand->in) + return 1; + + virMacAddrFormat(&iface->mac, ifmac); + + tmp_new_rate = netBand->in->average; + tmp_floor_sum += ifaceBand->in->floor; + + /* check against peak */ + if (netBand->in->peak) { + tmp_new_rate = netBand->in->peak; + if (tmp_floor_sum > netBand->in->peak) { + virReportError(VIR_ERR_OPERATION_INVALID, + _("Cannot plug '%s' interface into '%s' because it " + "would overcommit 'peak' on network '%s'"), + ifmac, + net->def->bridge, + net->def->name); + goto cleanup; + } + } else if (tmp_floor_sum > netBand->in->average) { + /* tmp_floor_sum can be between 'average' and 'peak' iff 'peak' is set. + * Otherwise, tmp_floor_sum must be below 'average'. */ + virReportError(VIR_ERR_OPERATION_INVALID, + _("Cannot plug '%s' interface into '%s' because it " + "would overcommit 'average' on network '%s'"), + ifmac, + net->def->bridge, + net->def->name); + goto cleanup; + } + + *new_rate = tmp_new_rate; + ret = 0; + +cleanup: + return ret; +} + +/** + * networkNextClassID: + * @net: network object + * + * Find next free class ID. @net is supposed + * to be locked already. If there is a free ID, + * it is marked as used and returned. + * + * Returns next free class ID or -1 if none is available. + */ +static ssize_t +networkNextClassID(virNetworkObjPtr net) +{ + size_t ret = 0; + bool is_set = false; + + while (virBitmapGetBit(net->class_id, ret, &is_set) == 0 && is_set) + ret++; + + if (is_set || virBitmapSetBit(net->class_id, ret) < 0) + return -1; + + return ret; +} + +static int +networkPlugBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface) +{ + int ret = -1; + int plug_ret; + unsigned long long new_rate = 0; + ssize_t class_id = 0; + char ifmac[VIR_MAC_STRING_BUFLEN]; + + /* XXX iface->ifname is not filled in, it gets allocated later, + * but not in this function. Use MAC in error messages instead */ + if ((plug_ret = networkCheckBandwidth(net, iface, &new_rate)) < 0) { + /* helper reported error */ + goto cleanup; + } + + if (plug_ret > 0) { + /* no QoS needs to be set; claim success */ + ret = 0; + goto cleanup; + } + + virMacAddrFormat(&iface->mac, ifmac); + if (iface->type != VIR_DOMAIN_NET_TYPE_NETWORK || + !iface->data.network.actual) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Cannot set bandwidth on interface '%s' of type %d"), + ifmac, iface->type); + goto cleanup; + } + + /* generate new class_id */ + if ((class_id = networkNextClassID(net)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not generate next class ID")); + goto cleanup; + } + + plug_ret = virNetDevBandwidthPlug(net->def->bridge, + net->def->bandwidth, + &iface->mac, + iface->bandwidth, + class_id); + if (plug_ret < 0) { + ignore_value(virNetDevBandwidthUnplug(net->def->bridge, class_id)); + goto cleanup; + } + + if (plug_ret > 0) { + /* Well, this is embarrassing. The networkCheckBandwidth helper + * says we need to set a QoS, but virNetDevBandwidthPlug says + * we don't need to. It's almost certainly a bug then. */ + VIR_WARN("Something strange happened. You may want to upgrade libvirt"); + ret = 0; + goto cleanup; + } + + /* QoS was set, generate new class ID */ + iface->data.network.actual->class_id = class_id; + /* update sum of 'floor'-s of attached NICs */ + net->floor_sum += iface->bandwidth->in->floor; + /* update rate for non guaranteed NICs */ + new_rate -= net->floor_sum; + if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", + net->def->bandwidth, new_rate) < 0) + VIR_WARN("Unable to update rate for 1:2 class on %s bridge", + net->def->bridge); + + ret = 0; + +cleanup: + return ret; +} + +static int +networkUnplugBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface) +{ + int ret = 0; + unsigned long long new_rate; + + if (iface->data.network.actual && + iface->data.network.actual->class_id) { + /* we must remove class from bridge */ + new_rate = net->def->bandwidth->in->average; + + if (net->def->bandwidth->in->peak > 0) + new_rate = net->def->bandwidth->in->peak; + + ret = virNetDevBandwidthUnplug(net->def->bridge, + iface->data.network.actual->class_id); + if (ret < 0) + goto cleanup; + /* update sum of 'floor'-s of attached NICs */ + net->floor_sum -= iface->bandwidth->in->floor; + /* update rate for non guaranteed NICs */ + new_rate -= net->floor_sum; + if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", + net->def->bandwidth, new_rate) < 0) + VIR_WARN("Unable to update rate for 1:2 class on %s bridge", + net->def->bridge); + /* no class is associated any longer */ + iface->data.network.actual->class_id = 0; + } + +cleanup: + return ret; +} -- 1.7.8.6

On 12/04/2012 02:19 PM, Michal Privoznik wrote:
Network should be notified if we plug in or unplug an interface, so it can perform some action, e.g. set/unset network part of QoS. However, we are doing this in very early stage, so iface->ifname isn't filled in yet. So whenever we want to report an error, we must use a different identifier, e.g. the MAC address. --- src/Makefile.am | 7 +- src/conf/domain_conf.h | 1 + src/conf/network_conf.c | 21 ++++ src/conf/network_conf.h | 4 + src/libvirt_network.syms | 13 +++ src/libvirt_private.syms | 8 -- src/network/bridge_driver.c | 223 ++++++++++++++++++++++++++++++++++++++++++- 7 files changed, 266 insertions(+), 11 deletions(-) create mode 100644 src/libvirt_network.syms
diff --git a/src/Makefile.am b/src/Makefile.am index 01cb995..04378d1 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -1373,6 +1373,10 @@ if WITH_ATOMIC_OPS_PTHREAD USED_SYM_FILES += libvirt_atomic.syms endif
+if WITH_NETWORK +USED_SYM_FILES += libvirt_network.syms +endif
Was this necessary for some reason, or did you just decide it made things more organized?
+ EXTRA_DIST += \ libvirt_public.syms \ libvirt_private.syms \ @@ -1386,7 +1390,8 @@ EXTRA_DIST += \ libvirt_sasl.syms \ libvirt_vmx.syms \ libvirt_xenxs.syms \ - libvirt_libssh2.syms + libvirt_libssh2.syms \ + libvirt_network.syms
GENERATED_SYM_FILES = libvirt.syms libvirt.def libvirt_qemu.def
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 4ab15e9..b4d149b 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -807,6 +807,7 @@ struct _virDomainActualNetDef { virNetDevVPortProfilePtr virtPortProfile; virNetDevBandwidthPtr bandwidth; virNetDevVlan vlan; + unsigned int class_id; /* class ID for bandwidth 'floor' */ };
/* Stores the virtual network interface configuration */ diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 2cb9e81..441ccdc 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -47,6 +47,9 @@
#define MAX_BRIDGE_ID 256 #define VIR_FROM_THIS VIR_FROM_NETWORK +#define NEXT_FREE_CLASS_ID 3 +/* currently, /sbin/tc implementation allows up 16 bits for minor class size */ +#define CLASS_ID_BITMAP_SIZE (1<<16)
VIR_ENUM_IMPL(virNetworkForward, VIR_NETWORK_FORWARD_LAST, @@ -319,10 +322,28 @@ virNetworkAssignDef(virNetworkObjListPtr nets, virNetworkObjLock(network); network->def = def;
+ if (!(network->class_id = virBitmapNew(CLASS_ID_BITMAP_SIZE))) { + virReportOOMError(); + goto error; + } + + if (virBitmapSetBit(network->class_id, 0) < 0 || + virBitmapSetBit(network->class_id, 1) < 0 || + virBitmapSetBit(network->class_id, 2) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("unable to initialize class id bitmap")); + goto error;
Those are all guaranteed to be successful (as long as CLASS_ID_BITMAP_SIZE > 3) and it looks like most uses of virBitmapSetBit are inside ignore_value(); this looks like another good candidate to do that - it will eliminate one more message that needs translating :-)
+ } + + network->def = def; nets->objs[nets->count] = network; nets->count++;
return network; +error: + virNetworkObjUnlock(network); + virNetworkObjFree(network); + return NULL;
}
diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index 3e46304..20f8a51 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -38,6 +38,7 @@ # include "virnetdevvlan.h" # include "virmacaddr.h" # include "device_conf.h" +# include "bitmap.h"
enum virNetworkForwardType { VIR_NETWORK_FORWARD_NONE = 0, @@ -221,6 +222,9 @@ struct _virNetworkObj {
virNetworkDefPtr def; /* The current definition */ virNetworkDefPtr newDef; /* New definition to activate at shutdown */ + + virBitmapPtr class_id; /* bitmap of class IDs for QoS */ + unsigned long long floor_sum; /* sum of all 'floor'-s of attached NICs */ };
typedef struct _virNetworkObjList virNetworkObjList; diff --git a/src/libvirt_network.syms b/src/libvirt_network.syms new file mode 100644 index 0000000..fc8d415 --- /dev/null +++ b/src/libvirt_network.syms @@ -0,0 +1,13 @@ +# +# Network-specific symbols +# + +# virnetdevbandwidth.h +virNetDevBandwidthClear; +virNetDevBandwidthCopy; +virNetDevBandwidthEqual; +virNetDevBandwidthFree; +virNetDevBandwidthSet; +virNetDevBandwidthPlug; +virNetDevBandwidthUnplug; +virNetDevBandwidthUpdateRate; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 41e2629..f5648a0 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1497,14 +1497,6 @@ virNetDevSetOnline; virNetDevValidateConfig;
-# virnetdevbandwidth.h -virNetDevBandwidthClear; -virNetDevBandwidthCopy; -virNetDevBandwidthEqual; -virNetDevBandwidthFree; -virNetDevBandwidthSet; - -
I don't really like that this movement of existing symbols into a new symbol file was mixed into this patch. Also, I'm pretty sure this isn't the intended use of the libvirt_xxx.syms files - what should be in this file are symbols that are *defined* if --with-network is passed to configure (i.e. if WITH_NETWORK is defined). What you have listed here are symbols that are *always* defined, but currently happen to only be *used* if WITH_NETWORK is defined.
# virnetdevbridge.h virNetDevBridgeAddPort; virNetDevBridgeCreate; diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 63be5e2..15cbc87 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -121,6 +121,11 @@ static int networkShutdownNetworkExternal(struct network_driver *driver, static void networkReloadIptablesRules(struct network_driver *driver); static void networkRefreshDaemons(struct network_driver *driver);
+static int networkPlugBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface); +static int networkUnplugBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface); + static struct network_driver *driverState = NULL;
static char * @@ -3469,6 +3474,7 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) enum virDomainNetType actualType = iface->type; virNetworkObjPtr network = NULL; virNetworkDefPtr netdef = NULL; + virNetDevBandwidthPtr bandwidth = NULL; virPortGroupDefPtr portgroup = NULL; virNetDevVPortProfilePtr virtport = iface->virtPortProfile; virNetDevVlanPtr vlan = NULL; @@ -3507,7 +3513,13 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) * (already in NetDef). Otherwise, if there is bandwidth info in * the portgroup, fill that into the ActualDef. */ - if (portgroup && !iface->bandwidth) { + + if (iface->bandwidth) + bandwidth = iface->bandwidth; + else if (portgroup && portgroup->bandwidth) + bandwidth = portgroup->bandwidth; + + if (bandwidth) { if (!iface->data.network.actual && (VIR_ALLOC(iface->data.network.actual) < 0)) { virReportOOMError(); @@ -3515,7 +3527,7 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) }
if (virNetDevBandwidthCopy(&iface->data.network.actual->bandwidth, - portgroup->bandwidth) < 0) + bandwidth) < 0) goto error; }
@@ -3528,6 +3540,10 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) */ if (iface->data.network.actual) iface->data.network.actual->type = VIR_DOMAIN_NET_TYPE_NETWORK; + + if (networkPlugBandwidth(network, iface) < 0) + goto error; + } else if ((netdef->forwardType == VIR_NETWORK_FORWARD_BRIDGE) && netdef->bridge) {
@@ -4039,6 +4055,12 @@ networkReleaseActualDevice(virDomainNetDefPtr iface) } netdef = network->def;
+ if ((netdef->forwardType == VIR_NETWORK_FORWARD_NONE || + netdef->forwardType == VIR_NETWORK_FORWARD_NAT || + netdef->forwardType == VIR_NETWORK_FORWARD_ROUTE) && + networkUnplugBandwidth(network, iface) < 0) + goto error;
It's not listed in the coding guidelines, but I like to use braces when the condition is multiline too.
+ if ((!iface->data.network.actual) || ((actualType != VIR_DOMAIN_NET_TYPE_DIRECT) && (actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV))) { @@ -4242,3 +4264,200 @@ cleanup: error: goto cleanup; } + +/** + * networkCheckBandwidth: + * @net: network QoS + * @iface: interface QoS + * @new_rate: new rate for non guaranteed class + * + * Returns: -1 if plugging would overcommit network QoS + * 0 if plugging is safe (@new_rate updated) + * 1 if no QoS is set (@new_rate untouched) + */ +static int +networkCheckBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface, + unsigned long long *new_rate) +{ + int ret = -1; + virNetDevBandwidthPtr netBand = net->def->bandwidth; + virNetDevBandwidthPtr ifaceBand = iface->bandwidth; + unsigned long long tmp_floor_sum = net->floor_sum; + unsigned long long tmp_new_rate = 0; + char ifmac[VIR_MAC_STRING_BUFLEN]; + + if (!ifaceBand || !ifaceBand->in || !ifaceBand->in->floor || + !netBand || !netBand->in) + return 1; + + virMacAddrFormat(&iface->mac, ifmac); + + tmp_new_rate = netBand->in->average; + tmp_floor_sum += ifaceBand->in->floor; + + /* check against peak */ + if (netBand->in->peak) { + tmp_new_rate = netBand->in->peak; + if (tmp_floor_sum > netBand->in->peak) { + virReportError(VIR_ERR_OPERATION_INVALID, + _("Cannot plug '%s' interface into '%s' because it " + "would overcommit 'peak' on network '%s'"), + ifmac, + net->def->bridge, + net->def->name); + goto cleanup; + } + } else if (tmp_floor_sum > netBand->in->average) { + /* tmp_floor_sum can be between 'average' and 'peak' iff 'peak' is set. + * Otherwise, tmp_floor_sum must be below 'average'. */ + virReportError(VIR_ERR_OPERATION_INVALID, + _("Cannot plug '%s' interface into '%s' because it " + "would overcommit 'average' on network '%s'"), + ifmac, + net->def->bridge, + net->def->name); + goto cleanup; + } + + *new_rate = tmp_new_rate; + ret = 0; + +cleanup: + return ret; +} + +/** + * networkNextClassID: + * @net: network object + * + * Find next free class ID. @net is supposed + * to be locked already. If there is a free ID, + * it is marked as used and returned. + * + * Returns next free class ID or -1 if none is available. + */ +static ssize_t +networkNextClassID(virNetworkObjPtr net) +{ + size_t ret = 0; + bool is_set = false; + + while (virBitmapGetBit(net->class_id, ret, &is_set) == 0 && is_set) + ret++; + + if (is_set || virBitmapSetBit(net->class_id, ret) < 0) + return -1; + + return ret; +} + +static int +networkPlugBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface) +{ + int ret = -1; + int plug_ret; + unsigned long long new_rate = 0; + ssize_t class_id = 0; + char ifmac[VIR_MAC_STRING_BUFLEN]; + + /* XXX iface->ifname is not filled in, it gets allocated later, + * but not in this function. Use MAC in error messages instead */
I don't know that this comment is even needed - ifname is anyway some name unknown to the user, and if there's an error it will no longer be valid anyway. Even if it was already filled in, using MAC address in the error messages is still a better idea.
+ if ((plug_ret = networkCheckBandwidth(net, iface, &new_rate)) < 0) { + /* helper reported error */ + goto cleanup; + } + + if (plug_ret > 0) { + /* no QoS needs to be set; claim success */ + ret = 0; + goto cleanup; + } + + virMacAddrFormat(&iface->mac, ifmac); + if (iface->type != VIR_DOMAIN_NET_TYPE_NETWORK || + !iface->data.network.actual) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Cannot set bandwidth on interface '%s' of type %d"), + ifmac, iface->type); + goto cleanup; + } + + /* generate new class_id */ + if ((class_id = networkNextClassID(net)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not generate next class ID")); + goto cleanup; + } + + plug_ret = virNetDevBandwidthPlug(net->def->bridge, + net->def->bandwidth, + &iface->mac, + iface->bandwidth, + class_id); + if (plug_ret < 0) { + ignore_value(virNetDevBandwidthUnplug(net->def->bridge, class_id)); + goto cleanup; + } + + if (plug_ret > 0) { + /* Well, this is embarrassing. The networkCheckBandwidth helper + * says we need to set a QoS, but virNetDevBandwidthPlug says + * we don't need to. It's almost certainly a bug then. */ + VIR_WARN("Something strange happened. You may want to upgrade libvirt");
I realize this should never happen, but is it really okay to just return success? Also, the message is incredibly vague.
+ ret = 0; + goto cleanup; + } + + /* QoS was set, generate new class ID */ + iface->data.network.actual->class_id = class_id; + /* update sum of 'floor'-s of attached NICs */ + net->floor_sum += iface->bandwidth->in->floor; + /* update rate for non guaranteed NICs */ + new_rate -= net->floor_sum; + if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", + net->def->bandwidth, new_rate) < 0) + VIR_WARN("Unable to update rate for 1:2 class on %s bridge", + net->def->bridge);
Likewise, is there a reason this is just a warning and not an error?
+ + ret = 0; + +cleanup: + return ret; +} + +static int +networkUnplugBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface) +{ + int ret = 0; + unsigned long long new_rate; + + if (iface->data.network.actual && + iface->data.network.actual->class_id) { + /* we must remove class from bridge */ + new_rate = net->def->bandwidth->in->average; + + if (net->def->bandwidth->in->peak > 0) + new_rate = net->def->bandwidth->in->peak; + + ret = virNetDevBandwidthUnplug(net->def->bridge, + iface->data.network.actual->class_id); + if (ret < 0) + goto cleanup; + /* update sum of 'floor'-s of attached NICs */ + net->floor_sum -= iface->bandwidth->in->floor; + /* update rate for non guaranteed NICs */ + new_rate -= net->floor_sum; + if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", + net->def->bandwidth, new_rate) < 0) + VIR_WARN("Unable to update rate for 1:2 class on %s bridge", + net->def->bridge);
same
+ /* no class is associated any longer */ + iface->data.network.actual->class_id = 0; + } + +cleanup: + return ret; +}
The other items are minor and wouldn't take away from an ACK, but I think the movement of virNetDevBandwidth* symbols into libvirt_network.syms was wrong.

On 11.12.2012 09:52, Laine Stump wrote:
On 12/04/2012 02:19 PM, Michal Privoznik wrote:
Network should be notified if we plug in or unplug an interface, so it can perform some action, e.g. set/unset network part of QoS. However, we are doing this in very early stage, so iface->ifname isn't filled in yet. So whenever we want to report an error, we must use a different identifier, e.g. the MAC address. --- src/Makefile.am | 7 +- src/conf/domain_conf.h | 1 + src/conf/network_conf.c | 21 ++++ src/conf/network_conf.h | 4 + src/libvirt_network.syms | 13 +++ src/libvirt_private.syms | 8 -- src/network/bridge_driver.c | 223 ++++++++++++++++++++++++++++++++++++++++++- 7 files changed, 266 insertions(+), 11 deletions(-) create mode 100644 src/libvirt_network.syms
diff --git a/src/Makefile.am b/src/Makefile.am index 01cb995..04378d1 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -1373,6 +1373,10 @@ if WITH_ATOMIC_OPS_PTHREAD USED_SYM_FILES += libvirt_atomic.syms endif
+if WITH_NETWORK +USED_SYM_FILES += libvirt_network.syms +endif
Was this necessary for some reason, or did you just decide it made things more organized?
Just more organized.
+ EXTRA_DIST += \ libvirt_public.syms \ libvirt_private.syms \ @@ -1386,7 +1390,8 @@ EXTRA_DIST += \ libvirt_sasl.syms \ libvirt_vmx.syms \ libvirt_xenxs.syms \ - libvirt_libssh2.syms + libvirt_libssh2.syms \ + libvirt_network.syms
GENERATED_SYM_FILES = libvirt.syms libvirt.def libvirt_qemu.def
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 4ab15e9..b4d149b 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -807,6 +807,7 @@ struct _virDomainActualNetDef { virNetDevVPortProfilePtr virtPortProfile; virNetDevBandwidthPtr bandwidth; virNetDevVlan vlan; + unsigned int class_id; /* class ID for bandwidth 'floor' */ };
/* Stores the virtual network interface configuration */ diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 2cb9e81..441ccdc 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -47,6 +47,9 @@
#define MAX_BRIDGE_ID 256 #define VIR_FROM_THIS VIR_FROM_NETWORK +#define NEXT_FREE_CLASS_ID 3 +/* currently, /sbin/tc implementation allows up 16 bits for minor class size */ +#define CLASS_ID_BITMAP_SIZE (1<<16)
VIR_ENUM_IMPL(virNetworkForward, VIR_NETWORK_FORWARD_LAST, @@ -319,10 +322,28 @@ virNetworkAssignDef(virNetworkObjListPtr nets, virNetworkObjLock(network); network->def = def;
+ if (!(network->class_id = virBitmapNew(CLASS_ID_BITMAP_SIZE))) { + virReportOOMError(); + goto error; + } + + if (virBitmapSetBit(network->class_id, 0) < 0 || + virBitmapSetBit(network->class_id, 1) < 0 || + virBitmapSetBit(network->class_id, 2) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("unable to initialize class id bitmap")); + goto error;
Those are all guaranteed to be successful (as long as CLASS_ID_BITMAP_SIZE > 3) and it looks like most uses of virBitmapSetBit are inside ignore_value(); this looks like another good candidate to do that - it will eliminate one more message that needs translating :-)
Yeah, I was thinking about that as another possibility but had chosen this one. Can't recall why :)
+ } + + network->def = def; nets->objs[nets->count] = network; nets->count++;
return network; +error: + virNetworkObjUnlock(network); + virNetworkObjFree(network); + return NULL;
}
diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index 3e46304..20f8a51 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -38,6 +38,7 @@ # include "virnetdevvlan.h" # include "virmacaddr.h" # include "device_conf.h" +# include "bitmap.h"
enum virNetworkForwardType { VIR_NETWORK_FORWARD_NONE = 0, @@ -221,6 +222,9 @@ struct _virNetworkObj {
virNetworkDefPtr def; /* The current definition */ virNetworkDefPtr newDef; /* New definition to activate at shutdown */ + + virBitmapPtr class_id; /* bitmap of class IDs for QoS */ + unsigned long long floor_sum; /* sum of all 'floor'-s of attached NICs */ };
typedef struct _virNetworkObjList virNetworkObjList; diff --git a/src/libvirt_network.syms b/src/libvirt_network.syms new file mode 100644 index 0000000..fc8d415 --- /dev/null +++ b/src/libvirt_network.syms @@ -0,0 +1,13 @@ +# +# Network-specific symbols +# + +# virnetdevbandwidth.h +virNetDevBandwidthClear; +virNetDevBandwidthCopy; +virNetDevBandwidthEqual; +virNetDevBandwidthFree; +virNetDevBandwidthSet; +virNetDevBandwidthPlug; +virNetDevBandwidthUnplug; +virNetDevBandwidthUpdateRate; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 41e2629..f5648a0 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1497,14 +1497,6 @@ virNetDevSetOnline; virNetDevValidateConfig;
-# virnetdevbandwidth.h -virNetDevBandwidthClear; -virNetDevBandwidthCopy; -virNetDevBandwidthEqual; -virNetDevBandwidthFree; -virNetDevBandwidthSet; - -
I don't really like that this movement of existing symbols into a new symbol file was mixed into this patch. Also, I'm pretty sure this isn't the intended use of the libvirt_xxx.syms files - what should be in this file are symbols that are *defined* if --with-network is passed to configure (i.e. if WITH_NETWORK is defined). What you have listed here are symbols that are *always* defined, but currently happen to only be *used* if WITH_NETWORK is defined.
D'oh.
# virnetdevbridge.h virNetDevBridgeAddPort; virNetDevBridgeCreate; diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 63be5e2..15cbc87 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -121,6 +121,11 @@ static int networkShutdownNetworkExternal(struct network_driver *driver, static void networkReloadIptablesRules(struct network_driver *driver); static void networkRefreshDaemons(struct network_driver *driver);
+static int networkPlugBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface); +static int networkUnplugBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface); + static struct network_driver *driverState = NULL;
static char * @@ -3469,6 +3474,7 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) enum virDomainNetType actualType = iface->type; virNetworkObjPtr network = NULL; virNetworkDefPtr netdef = NULL; + virNetDevBandwidthPtr bandwidth = NULL; virPortGroupDefPtr portgroup = NULL; virNetDevVPortProfilePtr virtport = iface->virtPortProfile; virNetDevVlanPtr vlan = NULL; @@ -3507,7 +3513,13 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) * (already in NetDef). Otherwise, if there is bandwidth info in * the portgroup, fill that into the ActualDef. */ - if (portgroup && !iface->bandwidth) { + + if (iface->bandwidth) + bandwidth = iface->bandwidth; + else if (portgroup && portgroup->bandwidth) + bandwidth = portgroup->bandwidth; + + if (bandwidth) { if (!iface->data.network.actual && (VIR_ALLOC(iface->data.network.actual) < 0)) { virReportOOMError(); @@ -3515,7 +3527,7 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) }
if (virNetDevBandwidthCopy(&iface->data.network.actual->bandwidth, - portgroup->bandwidth) < 0) + bandwidth) < 0) goto error; }
@@ -3528,6 +3540,10 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) */ if (iface->data.network.actual) iface->data.network.actual->type = VIR_DOMAIN_NET_TYPE_NETWORK; + + if (networkPlugBandwidth(network, iface) < 0) + goto error; + } else if ((netdef->forwardType == VIR_NETWORK_FORWARD_BRIDGE) && netdef->bridge) {
@@ -4039,6 +4055,12 @@ networkReleaseActualDevice(virDomainNetDefPtr iface) } netdef = network->def;
+ if ((netdef->forwardType == VIR_NETWORK_FORWARD_NONE || + netdef->forwardType == VIR_NETWORK_FORWARD_NAT || + netdef->forwardType == VIR_NETWORK_FORWARD_ROUTE) && + networkUnplugBandwidth(network, iface) < 0) + goto error;
It's not listed in the coding guidelines, but I like to use braces when the condition is multiline too.
+ if ((!iface->data.network.actual) || ((actualType != VIR_DOMAIN_NET_TYPE_DIRECT) && (actualType != VIR_DOMAIN_NET_TYPE_HOSTDEV))) { @@ -4242,3 +4264,200 @@ cleanup: error: goto cleanup; } + +/** + * networkCheckBandwidth: + * @net: network QoS + * @iface: interface QoS + * @new_rate: new rate for non guaranteed class + * + * Returns: -1 if plugging would overcommit network QoS + * 0 if plugging is safe (@new_rate updated) + * 1 if no QoS is set (@new_rate untouched) + */ +static int +networkCheckBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface, + unsigned long long *new_rate) +{ + int ret = -1; + virNetDevBandwidthPtr netBand = net->def->bandwidth; + virNetDevBandwidthPtr ifaceBand = iface->bandwidth; + unsigned long long tmp_floor_sum = net->floor_sum; + unsigned long long tmp_new_rate = 0; + char ifmac[VIR_MAC_STRING_BUFLEN]; + + if (!ifaceBand || !ifaceBand->in || !ifaceBand->in->floor || + !netBand || !netBand->in) + return 1; + + virMacAddrFormat(&iface->mac, ifmac); + + tmp_new_rate = netBand->in->average; + tmp_floor_sum += ifaceBand->in->floor; + + /* check against peak */ + if (netBand->in->peak) { + tmp_new_rate = netBand->in->peak; + if (tmp_floor_sum > netBand->in->peak) { + virReportError(VIR_ERR_OPERATION_INVALID, + _("Cannot plug '%s' interface into '%s' because it " + "would overcommit 'peak' on network '%s'"), + ifmac, + net->def->bridge, + net->def->name); + goto cleanup; + } + } else if (tmp_floor_sum > netBand->in->average) { + /* tmp_floor_sum can be between 'average' and 'peak' iff 'peak' is set. + * Otherwise, tmp_floor_sum must be below 'average'. */ + virReportError(VIR_ERR_OPERATION_INVALID, + _("Cannot plug '%s' interface into '%s' because it " + "would overcommit 'average' on network '%s'"), + ifmac, + net->def->bridge, + net->def->name); + goto cleanup; + } + + *new_rate = tmp_new_rate; + ret = 0; + +cleanup: + return ret; +} + +/** + * networkNextClassID: + * @net: network object + * + * Find next free class ID. @net is supposed + * to be locked already. If there is a free ID, + * it is marked as used and returned. + * + * Returns next free class ID or -1 if none is available. + */ +static ssize_t +networkNextClassID(virNetworkObjPtr net) +{ + size_t ret = 0; + bool is_set = false; + + while (virBitmapGetBit(net->class_id, ret, &is_set) == 0 && is_set) + ret++; + + if (is_set || virBitmapSetBit(net->class_id, ret) < 0) + return -1; + + return ret; +} + +static int +networkPlugBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface) +{ + int ret = -1; + int plug_ret; + unsigned long long new_rate = 0; + ssize_t class_id = 0; + char ifmac[VIR_MAC_STRING_BUFLEN]; + + /* XXX iface->ifname is not filled in, it gets allocated later, + * but not in this function. Use MAC in error messages instead */
I don't know that this comment is even needed - ifname is anyway some name unknown to the user, and if there's an error it will no longer be valid anyway. Even if it was already filled in, using MAC address in the error messages is still a better idea.
+ if ((plug_ret = networkCheckBandwidth(net, iface, &new_rate)) < 0) { + /* helper reported error */ + goto cleanup; + } + + if (plug_ret > 0) { + /* no QoS needs to be set; claim success */ + ret = 0; + goto cleanup; + } + + virMacAddrFormat(&iface->mac, ifmac); + if (iface->type != VIR_DOMAIN_NET_TYPE_NETWORK || + !iface->data.network.actual) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Cannot set bandwidth on interface '%s' of type %d"), + ifmac, iface->type); + goto cleanup; + } + + /* generate new class_id */ + if ((class_id = networkNextClassID(net)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not generate next class ID")); + goto cleanup; + } + + plug_ret = virNetDevBandwidthPlug(net->def->bridge, + net->def->bandwidth, + &iface->mac, + iface->bandwidth, + class_id); + if (plug_ret < 0) { + ignore_value(virNetDevBandwidthUnplug(net->def->bridge, class_id)); + goto cleanup; + } + + if (plug_ret > 0) { + /* Well, this is embarrassing. The networkCheckBandwidth helper + * says we need to set a QoS, but virNetDevBandwidthPlug says + * we don't need to. It's almost certainly a bug then. */ + VIR_WARN("Something strange happened. You may want to upgrade libvirt");
I realize this should never happen, but is it really okay to just return success?
Also, the message is incredibly vague.
It is indeed. But since it would never be printed out (it's just a warn) I guess it's okay. Moreover, each error message produces a log entry with appropriate line number, so a libvirt devel immediately knows where the error was produced, right? If we don't want to return success, we need to produce a meaningful error message then. If somebody has an idea ... Or as you say in 4/8 - I could drop the (plug_ret > 0) completely and start to believe in correctness of parts of our subsystems.
+ ret = 0; + goto cleanup; + } + + /* QoS was set, generate new class ID */ + iface->data.network.actual->class_id = class_id; + /* update sum of 'floor'-s of attached NICs */ + net->floor_sum += iface->bandwidth->in->floor; + /* update rate for non guaranteed NICs */ + new_rate -= net->floor_sum; + if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", + net->def->bandwidth, new_rate) < 0) + VIR_WARN("Unable to update rate for 1:2 class on %s bridge", + net->def->bridge);
Likewise, is there a reason this is just a warning and not an error?
This should be the part that solves the upgrading issue you mention in 3/8. That is, if we fail to upgrade 1:2 class it is likely there is no such class (okay, we could have failed because of OOM, pipe() may have failed, whatever. Don't focus on these for now). So we shouldn't fail. Or maybe we should distinguish between 'class 1:2' is not there and other fail cases.
+ + ret = 0; + +cleanup: + return ret; +} + +static int +networkUnplugBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface) +{ + int ret = 0; + unsigned long long new_rate; + + if (iface->data.network.actual && + iface->data.network.actual->class_id) { + /* we must remove class from bridge */ + new_rate = net->def->bandwidth->in->average; + + if (net->def->bandwidth->in->peak > 0) + new_rate = net->def->bandwidth->in->peak; + + ret = virNetDevBandwidthUnplug(net->def->bridge, + iface->data.network.actual->class_id); + if (ret < 0) + goto cleanup; + /* update sum of 'floor'-s of attached NICs */ + net->floor_sum -= iface->bandwidth->in->floor; + /* update rate for non guaranteed NICs */ + new_rate -= net->floor_sum; + if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", + net->def->bandwidth, new_rate) < 0) + VIR_WARN("Unable to update rate for 1:2 class on %s bridge", + net->def->bridge);
same
+ /* no class is associated any longer */ + iface->data.network.actual->class_id = 0; + } + +cleanup: + return ret; +}
The other items are minor and wouldn't take away from an ACK, but I think the movement of virNetDevBandwidth* symbols into libvirt_network.syms was wrong.
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 12/04/2012 02:19 PM, Michal Privoznik wrote:
Network should be notified if we plug in or unplug an interface, so it can perform some action, e.g. set/unset network part of QoS. However, we are doing this in very early stage, so iface->ifname isn't filled in yet. So whenever we want to report an error, we must use a different identifier, e.g. the MAC address. --- src/Makefile.am | 7 +- src/conf/domain_conf.h | 1 + src/conf/network_conf.c | 21 ++++ src/conf/network_conf.h | 4 + src/libvirt_network.syms | 13 +++ src/libvirt_private.syms | 8 -- src/network/bridge_driver.c | 223 ++++++++++++++++++++++++++++++++++++++++++- 7 files changed, 266 insertions(+), 11 deletions(-) create mode 100644 src/libvirt_network.syms
diff --git a/src/Makefile.am b/src/Makefile.am index 01cb995..04378d1 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -1373,6 +1373,10 @@ if WITH_ATOMIC_OPS_PTHREAD USED_SYM_FILES += libvirt_atomic.syms endif
+if WITH_NETWORK +USED_SYM_FILES += libvirt_network.syms +endif + EXTRA_DIST += \ libvirt_public.syms \ libvirt_private.syms \ @@ -1386,7 +1390,8 @@ EXTRA_DIST += \ libvirt_sasl.syms \ libvirt_vmx.syms \ libvirt_xenxs.syms \ - libvirt_libssh2.syms + libvirt_libssh2.syms \ + libvirt_network.syms
GENERATED_SYM_FILES = libvirt.syms libvirt.def libvirt_qemu.def
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 4ab15e9..b4d149b 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -807,6 +807,7 @@ struct _virDomainActualNetDef { virNetDevVPortProfilePtr virtPortProfile; virNetDevBandwidthPtr bandwidth; virNetDevVlan vlan; + unsigned int class_id; /* class ID for bandwidth 'floor' */ };
I also just noticed that you add this directly into the actualnetdef rather than into the virNetDevBandwidth. Did you maybe do this because you wanted the parser to be able to see the network type before allowing it? If so, I don't think that's necessary - the actualdef is never parsed from user config, so if it's there at an inappropriate time, it's either a program bug, or someone messing with the domain status file. I figured it would be simplest to pass around if it was part of the bandwidth object (and logically it fits there). Also, I think it's appropriate to add the bits for parsing/formatting a new element in the same patch where it was added to the struct, but you've delayed it to a separate patch at the end that only does that one thing. I'm okay with that, since it will still pass make check at each step, but I think it would make more sense to put that last patch earlier, and include this data definition with it.

Currently, we are only keeping a inactive XML configuration in status dir. This is no longer enough as we need to keep this class_id attribute so we don't overwrite old entries when the daemon restarts. However, since there has already been release which has just <network/> as root element, and we want to keep things compatible, detect that loaded status file is older one, and don't scream about it. --- src/conf/network_conf.c | 205 ++++++++++++++++++++++++++++++++++--------- src/conf/network_conf.h | 2 + src/libvirt_private.syms | 1 + src/network/bridge_driver.c | 27 +++++-- 4 files changed, 189 insertions(+), 46 deletions(-) diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 441ccdc..6ede26d 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -1671,6 +1671,79 @@ cleanup: return def; } +int +virNetworkObjUpdateParseFile(const char *filename, + virNetworkObjPtr net) +{ + int ret = -1; + xmlDocPtr xml = NULL; + xmlNodePtr node = NULL; + virNetworkDefPtr tmp = NULL; + xmlXPathContextPtr ctxt = NULL; + + xml = virXMLParse(filename, NULL, _("(network status)")); + if (!xml) + return -1; + + ctxt = xmlXPathNewContext(xml); + if (ctxt == NULL) { + virReportOOMError(); + goto cleanup; + } + + node = xmlDocGetRootElement(xml); + if (xmlStrEqual(node->name, BAD_CAST "networkstatus")) { + /* Newer network status file. Contains useful + * info which are not to be found in bare config XML */ + char *class_id = NULL; + char *floor_sum = NULL; + + ctxt->node = node; + class_id = virXPathString("string(./class_id[1]/@bitmap)", ctxt); + if (class_id && + virBitmapParse(class_id, ',', + &net->class_id, CLASS_ID_BITMAP_SIZE) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed 'class_id' attribute: %s"), + class_id); + VIR_FREE(class_id); + goto cleanup; + } + VIR_FREE(class_id); + + floor_sum = virXPathString("string(./floor[1]/@sum)", ctxt); + if (floor_sum && + virStrToLong_ull(floor_sum, NULL, 10, &net->floor_sum) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed 'floor_sum' attribute: %s"), + floor_sum); + VIR_FREE(floor_sum); + } + VIR_FREE(floor_sum); + } + + node = virXPathNode("//network", ctxt); + if (!node) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not find any 'network' element")); + goto cleanup; + } + + ctxt->node = node; + tmp = virNetworkDefParseXML(ctxt); + + if (tmp) { + net->newDef = net->def; + net->def = tmp; + } + + ret = 0; + +cleanup: + xmlXPathFreeContext(ctxt); + return ret; +} + static int virNetworkDNSDefFormat(virBufferPtr buf, virNetworkDNSDefPtr def) @@ -1849,24 +1922,26 @@ virPortGroupDefFormat(virBufferPtr buf, return 0; } -char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags) +static int +virNetworkDefFormatInternal(virBufferPtr buf, + const virNetworkDefPtr def, + unsigned int flags) { - virBuffer buf = VIR_BUFFER_INITIALIZER; unsigned char *uuid; char uuidstr[VIR_UUID_STRING_BUFLEN]; int ii; - virBufferAddLit(&buf, "<network"); + virBufferAddLit(buf, "<network"); if (!(flags & VIR_NETWORK_XML_INACTIVE) && (def->connections > 0)) { - virBufferAsprintf(&buf, " connections='%d'", def->connections); + virBufferAsprintf(buf, " connections='%d'", def->connections); } - virBufferAddLit(&buf, ">\n"); - virBufferAdjustIndent(&buf, 2); - virBufferEscapeString(&buf, "<name>%s</name>\n", def->name); + virBufferAddLit(buf, ">\n"); + virBufferAdjustIndent(buf, 2); + virBufferEscapeString(buf, "<name>%s</name>\n", def->name); uuid = def->uuid; virUUIDFormat(uuid, uuidstr); - virBufferAsprintf(&buf, "<uuid>%s</uuid>\n", uuidstr); + virBufferAsprintf(buf, "<uuid>%s</uuid>\n", uuidstr); if (def->forwardType != VIR_NETWORK_FORWARD_NONE) { const char *dev = NULL; @@ -1880,40 +1955,40 @@ char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags) def->forwardType, def->name); goto error; } - virBufferAddLit(&buf, "<forward"); - virBufferEscapeString(&buf, " dev='%s'", dev); - virBufferAsprintf(&buf, " mode='%s'", mode); + virBufferAddLit(buf, "<forward"); + virBufferEscapeString(buf, " dev='%s'", dev); + virBufferAsprintf(buf, " mode='%s'", mode); if (def->forwardType == VIR_NETWORK_FORWARD_HOSTDEV) { if (def->managed == 1) - virBufferAddLit(&buf, " managed='yes'"); + virBufferAddLit(buf, " managed='yes'"); else - virBufferAddLit(&buf, " managed='no'"); + virBufferAddLit(buf, " managed='no'"); } - virBufferAsprintf(&buf, "%s>\n", + virBufferAsprintf(buf, "%s>\n", (def->nForwardIfs || def->nForwardPfs) ? "" : "/"); - virBufferAdjustIndent(&buf, 2); + virBufferAdjustIndent(buf, 2); /* For now, hard-coded to at most 1 forwardPfs */ if (def->nForwardPfs) - virBufferEscapeString(&buf, "<pf dev='%s'/>\n", + virBufferEscapeString(buf, "<pf dev='%s'/>\n", def->forwardPfs[0].dev); if (def->nForwardIfs && (!def->nForwardPfs || !(flags & VIR_NETWORK_XML_INACTIVE))) { for (ii = 0; ii < def->nForwardIfs; ii++) { if (def->forwardType != VIR_NETWORK_FORWARD_HOSTDEV) { - virBufferEscapeString(&buf, "<interface dev='%s'", + virBufferEscapeString(buf, "<interface dev='%s'", def->forwardIfs[ii].device.dev); if (!(flags & VIR_NETWORK_XML_INACTIVE) && (def->forwardIfs[ii].connections > 0)) { - virBufferAsprintf(&buf, " connections='%d'", + virBufferAsprintf(buf, " connections='%d'", def->forwardIfs[ii].connections); } - virBufferAddLit(&buf, "/>\n"); + virBufferAddLit(buf, "/>\n"); } else { if (def->forwardIfs[ii].type == VIR_NETWORK_FORWARD_HOSTDEV_DEVICE_PCI) { - if (virDevicePCIAddressFormat(&buf, + if (virDevicePCIAddressFormat(buf, def->forwardIfs[ii].device.pci, true) < 0) goto error; @@ -1921,67 +1996,116 @@ char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags) } } } - virBufferAdjustIndent(&buf, -2); + virBufferAdjustIndent(buf, -2); if (def->nForwardPfs || def->nForwardIfs) - virBufferAddLit(&buf, "</forward>\n"); + virBufferAddLit(buf, "</forward>\n"); } if (def->forwardType == VIR_NETWORK_FORWARD_NONE || - def->forwardType == VIR_NETWORK_FORWARD_NAT || - def->forwardType == VIR_NETWORK_FORWARD_ROUTE) { + def->forwardType == VIR_NETWORK_FORWARD_NAT || + def->forwardType == VIR_NETWORK_FORWARD_ROUTE) { - virBufferAddLit(&buf, "<bridge"); + virBufferAddLit(buf, "<bridge"); if (def->bridge) - virBufferEscapeString(&buf, " name='%s'", def->bridge); - virBufferAsprintf(&buf, " stp='%s' delay='%ld' />\n", + virBufferEscapeString(buf, " name='%s'", def->bridge); + virBufferAsprintf(buf, " stp='%s' delay='%ld' />\n", def->stp ? "on" : "off", def->delay); } else if (def->forwardType == VIR_NETWORK_FORWARD_BRIDGE && def->bridge) { - virBufferEscapeString(&buf, "<bridge name='%s' />\n", def->bridge); + virBufferEscapeString(buf, "<bridge name='%s' />\n", def->bridge); } if (def->mac_specified) { char macaddr[VIR_MAC_STRING_BUFLEN]; virMacAddrFormat(&def->mac, macaddr); - virBufferAsprintf(&buf, "<mac address='%s'/>\n", macaddr); + virBufferAsprintf(buf, "<mac address='%s'/>\n", macaddr); } if (def->domain) - virBufferAsprintf(&buf, "<domain name='%s'/>\n", def->domain); + virBufferAsprintf(buf, "<domain name='%s'/>\n", def->domain); - if (virNetworkDNSDefFormat(&buf, def->dns) < 0) + if (virNetworkDNSDefFormat(buf, def->dns) < 0) goto error; - if (virNetDevVlanFormat(&def->vlan, &buf) < 0) + if (virNetDevVlanFormat(&def->vlan, buf) < 0) goto error; - if (virNetDevBandwidthFormat(def->bandwidth, &buf) < 0) + if (virNetDevBandwidthFormat(def->bandwidth, buf) < 0) goto error; for (ii = 0; ii < def->nips; ii++) { - if (virNetworkIpDefFormat(&buf, &def->ips[ii]) < 0) + if (virNetworkIpDefFormat(buf, &def->ips[ii]) < 0) goto error; } - if (virNetDevVPortProfileFormat(def->virtPortProfile, &buf) < 0) + if (virNetDevVPortProfileFormat(def->virtPortProfile, buf) < 0) goto error; for (ii = 0; ii < def->nPortGroups; ii++) - if (virPortGroupDefFormat(&buf, &def->portGroups[ii]) < 0) + if (virPortGroupDefFormat(buf, &def->portGroups[ii]) < 0) goto error; + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</network>\n"); + + return 0; + +error: + return -1; +} + +char * +virNetworkDefFormat(virNetworkDefPtr def, + unsigned int flags) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + + if (virNetworkDefFormatInternal(&buf, def, flags) < 0) + goto error; + + if (virBufferError(&buf)) + goto no_memory; + + return virBufferContentAndReset(&buf); + +no_memory: + virReportOOMError(); +error: + virBufferFreeAndReset(&buf); + return NULL; +} + +static char * +virNetworkObjFormat(virNetworkObjPtr net, + unsigned int flags) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + char *class_id = virBitmapFormat(net->class_id); + + if (!class_id) + goto no_memory; + + virBufferAddLit(&buf, "<networkstatus>\n"); + virBufferAsprintf(&buf, " <class_id bitmap='%s'/>\n", class_id); + virBufferAsprintf(&buf, " <floor sum='%llu'/>\n", net->floor_sum); + VIR_FREE(class_id); + + virBufferAdjustIndent(&buf, 2); + if (virNetworkDefFormatInternal(&buf, net->def, flags) < 0) + goto error; + virBufferAdjustIndent(&buf, -2); - virBufferAddLit(&buf, "</network>\n"); + virBufferAddLit(&buf, "</networkstatus>"); if (virBufferError(&buf)) goto no_memory; return virBufferContentAndReset(&buf); - no_memory: +no_memory: virReportOOMError(); - error: +error: virBufferFreeAndReset(&buf); return NULL; } @@ -2053,9 +2177,10 @@ int virNetworkSaveStatus(const char *statusDir, virNetworkObjPtr network) { int ret = -1; + int flags = 0; char *xml; - if (!(xml = virNetworkDefFormat(network->def, 0))) + if (!(xml = virNetworkObjFormat(network, flags))) goto cleanup; if (virNetworkSaveXML(statusDir, network->def, xml)) diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index 20f8a51..6f6b221 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -271,6 +271,8 @@ virNetworkDefPtr virNetworkDefParseString(const char *xmlStr); virNetworkDefPtr virNetworkDefParseFile(const char *filename); virNetworkDefPtr virNetworkDefParseNode(xmlDocPtr xml, xmlNodePtr root); +int virNetworkObjUpdateParseFile(const char *filename, + virNetworkObjPtr net); char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index f5648a0..5bd2487 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -872,6 +872,7 @@ virNetworkObjSetDefTransient; virNetworkObjUnlock; virNetworkObjUnsetDefTransient; virNetworkObjUpdate; +virNetworkObjUpdateParseFile; virNetworkRemoveInactive; virNetworkSaveConfig; virNetworkSaveStatus; diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 15cbc87..7f3baaa 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -242,7 +242,6 @@ networkFindActiveConfigs(struct network_driver *driver) { for (i = 0 ; i < driver->networks.count ; i++) { virNetworkObjPtr obj = driver->networks.objs[i]; - virNetworkDefPtr tmp; char *config; virNetworkObjLock(obj); @@ -260,12 +259,10 @@ networkFindActiveConfigs(struct network_driver *driver) { } /* Try and load the live config */ - tmp = virNetworkDefParseFile(config); + if (virNetworkObjUpdateParseFile(config, obj) < 0) + VIR_WARN("Unable to update config of '%s' network", + obj->def->name); VIR_FREE(config); - if (tmp) { - obj->newDef = obj->def; - obj->def = tmp; - } /* If bridge exists, then mark it active */ if (obj->def->bridge && @@ -4414,6 +4411,14 @@ networkPlugBandwidth(virNetworkObjPtr net, iface->data.network.actual->class_id = class_id; /* update sum of 'floor'-s of attached NICs */ net->floor_sum += iface->bandwidth->in->floor; + /* update status file */ + if (virNetworkSaveStatus(NETWORK_STATE_DIR, net) < 0) { + ignore_value(virBitmapClearBit(net->class_id, class_id)); + net->floor_sum -= iface->bandwidth->in->floor; + iface->data.network.actual->class_id = 0; + ignore_value(virNetDevBandwidthUnplug(net->def->bridge, class_id)); + goto cleanup; + } /* update rate for non guaranteed NICs */ new_rate -= net->floor_sum; if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", @@ -4448,6 +4453,16 @@ networkUnplugBandwidth(virNetworkObjPtr net, goto cleanup; /* update sum of 'floor'-s of attached NICs */ net->floor_sum -= iface->bandwidth->in->floor; + /* return class ID */ + ignore_value(virBitmapClearBit(net->class_id, + iface->data.network.actual->class_id)); + /* update status file */ + if (virNetworkSaveStatus(NETWORK_STATE_DIR, net) < 0) { + net->floor_sum += iface->bandwidth->in->floor; + ignore_value(virBitmapSetBit(net->class_id, + iface->data.network.actual->class_id)); + goto cleanup; + } /* update rate for non guaranteed NICs */ new_rate -= net->floor_sum; if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", -- 1.7.8.6

On 12/04/2012 02:19 PM, Michal Privoznik wrote:
Currently, we are only keeping a inactive XML configuration in status dir. This is no longer enough as we need to keep this class_id attribute so we don't overwrite old entries when the daemon restarts. However, since there has already been release which has just <network/> as root element, and we want to keep things compatible, detect that loaded status file is older one, and don't scream about it. --- src/conf/network_conf.c | 205 ++++++++++++++++++++++++++++++++++--------- src/conf/network_conf.h | 2 + src/libvirt_private.syms | 1 + src/network/bridge_driver.c | 27 +++++-- 4 files changed, 189 insertions(+), 46 deletions(-)
diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 441ccdc..6ede26d 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -1671,6 +1671,79 @@ cleanup: return def; }
+int +virNetworkObjUpdateParseFile(const char *filename, + virNetworkObjPtr net) +{ + int ret = -1; + xmlDocPtr xml = NULL; + xmlNodePtr node = NULL; + virNetworkDefPtr tmp = NULL; + xmlXPathContextPtr ctxt = NULL; + + xml = virXMLParse(filename, NULL, _("(network status)")); + if (!xml) + return -1; + + ctxt = xmlXPathNewContext(xml); + if (ctxt == NULL) { + virReportOOMError(); + goto cleanup; + } + + node = xmlDocGetRootElement(xml); + if (xmlStrEqual(node->name, BAD_CAST "networkstatus")) { + /* Newer network status file. Contains useful + * info which are not to be found in bare config XML */ + char *class_id = NULL; + char *floor_sum = NULL; + + ctxt->node = node; + class_id = virXPathString("string(./class_id[1]/@bitmap)", ctxt); + if (class_id && + virBitmapParse(class_id, ',', + &net->class_id, CLASS_ID_BITMAP_SIZE) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed 'class_id' attribute: %s"), + class_id); + VIR_FREE(class_id); + goto cleanup; + } + VIR_FREE(class_id); + + floor_sum = virXPathString("string(./floor[1]/@sum)", ctxt); + if (floor_sum && + virStrToLong_ull(floor_sum, NULL, 10, &net->floor_sum) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed 'floor_sum' attribute: %s"), + floor_sum); + VIR_FREE(floor_sum); + } + VIR_FREE(floor_sum); + } + + node = virXPathNode("//network", ctxt); + if (!node) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not find any 'network' element")); + goto cleanup; + } + + ctxt->node = node; + tmp = virNetworkDefParseXML(ctxt); + + if (tmp) { + net->newDef = net->def; + net->def = tmp; + } + + ret = 0; + +cleanup: + xmlXPathFreeContext(ctxt); + return ret; +} + static int virNetworkDNSDefFormat(virBufferPtr buf, virNetworkDNSDefPtr def) @@ -1849,24 +1922,26 @@ virPortGroupDefFormat(virBufferPtr buf, return 0; }
-char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags) +static int +virNetworkDefFormatInternal(virBufferPtr buf, + const virNetworkDefPtr def, + unsigned int flags) { - virBuffer buf = VIR_BUFFER_INITIALIZER; unsigned char *uuid; char uuidstr[VIR_UUID_STRING_BUFLEN]; int ii;
- virBufferAddLit(&buf, "<network"); + virBufferAddLit(buf, "<network"); if (!(flags & VIR_NETWORK_XML_INACTIVE) && (def->connections > 0)) { - virBufferAsprintf(&buf, " connections='%d'", def->connections); + virBufferAsprintf(buf, " connections='%d'", def->connections); } - virBufferAddLit(&buf, ">\n"); - virBufferAdjustIndent(&buf, 2); - virBufferEscapeString(&buf, "<name>%s</name>\n", def->name); + virBufferAddLit(buf, ">\n"); + virBufferAdjustIndent(buf, 2); + virBufferEscapeString(buf, "<name>%s</name>\n", def->name);
uuid = def->uuid; virUUIDFormat(uuid, uuidstr); - virBufferAsprintf(&buf, "<uuid>%s</uuid>\n", uuidstr); + virBufferAsprintf(buf, "<uuid>%s</uuid>\n", uuidstr);
if (def->forwardType != VIR_NETWORK_FORWARD_NONE) { const char *dev = NULL; @@ -1880,40 +1955,40 @@ char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags) def->forwardType, def->name); goto error; } - virBufferAddLit(&buf, "<forward"); - virBufferEscapeString(&buf, " dev='%s'", dev); - virBufferAsprintf(&buf, " mode='%s'", mode); + virBufferAddLit(buf, "<forward"); + virBufferEscapeString(buf, " dev='%s'", dev); + virBufferAsprintf(buf, " mode='%s'", mode); if (def->forwardType == VIR_NETWORK_FORWARD_HOSTDEV) { if (def->managed == 1) - virBufferAddLit(&buf, " managed='yes'"); + virBufferAddLit(buf, " managed='yes'"); else - virBufferAddLit(&buf, " managed='no'"); + virBufferAddLit(buf, " managed='no'"); } - virBufferAsprintf(&buf, "%s>\n", + virBufferAsprintf(buf, "%s>\n", (def->nForwardIfs || def->nForwardPfs) ? "" : "/"); - virBufferAdjustIndent(&buf, 2); + virBufferAdjustIndent(buf, 2);
/* For now, hard-coded to at most 1 forwardPfs */ if (def->nForwardPfs) - virBufferEscapeString(&buf, "<pf dev='%s'/>\n", + virBufferEscapeString(buf, "<pf dev='%s'/>\n", def->forwardPfs[0].dev);
if (def->nForwardIfs && (!def->nForwardPfs || !(flags & VIR_NETWORK_XML_INACTIVE))) { for (ii = 0; ii < def->nForwardIfs; ii++) { if (def->forwardType != VIR_NETWORK_FORWARD_HOSTDEV) { - virBufferEscapeString(&buf, "<interface dev='%s'", + virBufferEscapeString(buf, "<interface dev='%s'", def->forwardIfs[ii].device.dev); if (!(flags & VIR_NETWORK_XML_INACTIVE) && (def->forwardIfs[ii].connections > 0)) { - virBufferAsprintf(&buf, " connections='%d'", + virBufferAsprintf(buf, " connections='%d'", def->forwardIfs[ii].connections); } - virBufferAddLit(&buf, "/>\n"); + virBufferAddLit(buf, "/>\n"); } else { if (def->forwardIfs[ii].type == VIR_NETWORK_FORWARD_HOSTDEV_DEVICE_PCI) { - if (virDevicePCIAddressFormat(&buf, + if (virDevicePCIAddressFormat(buf, def->forwardIfs[ii].device.pci, true) < 0) goto error; @@ -1921,67 +1996,116 @@ char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags) } } } - virBufferAdjustIndent(&buf, -2); + virBufferAdjustIndent(buf, -2); if (def->nForwardPfs || def->nForwardIfs) - virBufferAddLit(&buf, "</forward>\n"); + virBufferAddLit(buf, "</forward>\n"); }
if (def->forwardType == VIR_NETWORK_FORWARD_NONE || - def->forwardType == VIR_NETWORK_FORWARD_NAT || - def->forwardType == VIR_NETWORK_FORWARD_ROUTE) { + def->forwardType == VIR_NETWORK_FORWARD_NAT || + def->forwardType == VIR_NETWORK_FORWARD_ROUTE) {
- virBufferAddLit(&buf, "<bridge"); + virBufferAddLit(buf, "<bridge"); if (def->bridge) - virBufferEscapeString(&buf, " name='%s'", def->bridge); - virBufferAsprintf(&buf, " stp='%s' delay='%ld' />\n", + virBufferEscapeString(buf, " name='%s'", def->bridge); + virBufferAsprintf(buf, " stp='%s' delay='%ld' />\n", def->stp ? "on" : "off", def->delay); } else if (def->forwardType == VIR_NETWORK_FORWARD_BRIDGE && def->bridge) { - virBufferEscapeString(&buf, "<bridge name='%s' />\n", def->bridge); + virBufferEscapeString(buf, "<bridge name='%s' />\n", def->bridge); }
if (def->mac_specified) { char macaddr[VIR_MAC_STRING_BUFLEN]; virMacAddrFormat(&def->mac, macaddr); - virBufferAsprintf(&buf, "<mac address='%s'/>\n", macaddr); + virBufferAsprintf(buf, "<mac address='%s'/>\n", macaddr); }
if (def->domain) - virBufferAsprintf(&buf, "<domain name='%s'/>\n", def->domain); + virBufferAsprintf(buf, "<domain name='%s'/>\n", def->domain);
- if (virNetworkDNSDefFormat(&buf, def->dns) < 0) + if (virNetworkDNSDefFormat(buf, def->dns) < 0) goto error;
- if (virNetDevVlanFormat(&def->vlan, &buf) < 0) + if (virNetDevVlanFormat(&def->vlan, buf) < 0) goto error; - if (virNetDevBandwidthFormat(def->bandwidth, &buf) < 0) + if (virNetDevBandwidthFormat(def->bandwidth, buf) < 0) goto error;
for (ii = 0; ii < def->nips; ii++) { - if (virNetworkIpDefFormat(&buf, &def->ips[ii]) < 0) + if (virNetworkIpDefFormat(buf, &def->ips[ii]) < 0) goto error; }
- if (virNetDevVPortProfileFormat(def->virtPortProfile, &buf) < 0) + if (virNetDevVPortProfileFormat(def->virtPortProfile, buf) < 0) goto error;
for (ii = 0; ii < def->nPortGroups; ii++) - if (virPortGroupDefFormat(&buf, &def->portGroups[ii]) < 0) + if (virPortGroupDefFormat(buf, &def->portGroups[ii]) < 0) goto error;
+ virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</network>\n"); + + return 0; + +error: + return -1; +} + +char * +virNetworkDefFormat(virNetworkDefPtr def, + unsigned int flags) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + + if (virNetworkDefFormatInternal(&buf, def, flags) < 0) + goto error; + + if (virBufferError(&buf)) + goto no_memory; + + return virBufferContentAndReset(&buf); + +no_memory: + virReportOOMError(); +error: + virBufferFreeAndReset(&buf); + return NULL; +} + +static char * +virNetworkObjFormat(virNetworkObjPtr net, + unsigned int flags) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + char *class_id = virBitmapFormat(net->class_id); + + if (!class_id) + goto no_memory; + + virBufferAddLit(&buf, "<networkstatus>\n"); + virBufferAsprintf(&buf, " <class_id bitmap='%s'/>\n", class_id); + virBufferAsprintf(&buf, " <floor sum='%llu'/>\n", net->floor_sum); + VIR_FREE(class_id); + + virBufferAdjustIndent(&buf, 2); + if (virNetworkDefFormatInternal(&buf, net->def, flags) < 0) + goto error; + virBufferAdjustIndent(&buf, -2); - virBufferAddLit(&buf, "</network>\n"); + virBufferAddLit(&buf, "</networkstatus>");
if (virBufferError(&buf)) goto no_memory;
return virBufferContentAndReset(&buf);
- no_memory: +no_memory: virReportOOMError(); - error: +error: virBufferFreeAndReset(&buf); return NULL; } @@ -2053,9 +2177,10 @@ int virNetworkSaveStatus(const char *statusDir, virNetworkObjPtr network) { int ret = -1; + int flags = 0; char *xml;
- if (!(xml = virNetworkDefFormat(network->def, 0))) + if (!(xml = virNetworkObjFormat(network, flags))) goto cleanup;
if (virNetworkSaveXML(statusDir, network->def, xml)) diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index 20f8a51..6f6b221 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -271,6 +271,8 @@ virNetworkDefPtr virNetworkDefParseString(const char *xmlStr); virNetworkDefPtr virNetworkDefParseFile(const char *filename); virNetworkDefPtr virNetworkDefParseNode(xmlDocPtr xml, xmlNodePtr root); +int virNetworkObjUpdateParseFile(const char *filename, + virNetworkObjPtr net);
char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags);
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index f5648a0..5bd2487 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -872,6 +872,7 @@ virNetworkObjSetDefTransient; virNetworkObjUnlock; virNetworkObjUnsetDefTransient; virNetworkObjUpdate; +virNetworkObjUpdateParseFile; virNetworkRemoveInactive; virNetworkSaveConfig; virNetworkSaveStatus; diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 15cbc87..7f3baaa 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -242,7 +242,6 @@ networkFindActiveConfigs(struct network_driver *driver) {
for (i = 0 ; i < driver->networks.count ; i++) { virNetworkObjPtr obj = driver->networks.objs[i]; - virNetworkDefPtr tmp; char *config;
virNetworkObjLock(obj); @@ -260,12 +259,10 @@ networkFindActiveConfigs(struct network_driver *driver) { }
/* Try and load the live config */ - tmp = virNetworkDefParseFile(config); + if (virNetworkObjUpdateParseFile(config, obj) < 0) + VIR_WARN("Unable to update config of '%s' network", + obj->def->name); VIR_FREE(config); - if (tmp) { - obj->newDef = obj->def; - obj->def = tmp; - }
/* If bridge exists, then mark it active */ if (obj->def->bridge && @@ -4414,6 +4411,14 @@ networkPlugBandwidth(virNetworkObjPtr net, iface->data.network.actual->class_id = class_id; /* update sum of 'floor'-s of attached NICs */ net->floor_sum += iface->bandwidth->in->floor; + /* update status file */ + if (virNetworkSaveStatus(NETWORK_STATE_DIR, net) < 0) { + ignore_value(virBitmapClearBit(net->class_id, class_id)); + net->floor_sum -= iface->bandwidth->in->floor; + iface->data.network.actual->class_id = 0; + ignore_value(virNetDevBandwidthUnplug(net->def->bridge, class_id)); + goto cleanup; + } /* update rate for non guaranteed NICs */ new_rate -= net->floor_sum; if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", @@ -4448,6 +4453,16 @@ networkUnplugBandwidth(virNetworkObjPtr net, goto cleanup; /* update sum of 'floor'-s of attached NICs */ net->floor_sum -= iface->bandwidth->in->floor; + /* return class ID */ + ignore_value(virBitmapClearBit(net->class_id, + iface->data.network.actual->class_id)); + /* update status file */ + if (virNetworkSaveStatus(NETWORK_STATE_DIR, net) < 0) { + net->floor_sum += iface->bandwidth->in->floor; + ignore_value(virBitmapSetBit(net->class_id, + iface->data.network.actual->class_id)); + goto cleanup; + } /* update rate for non guaranteed NICs */ new_rate -= net->floor_sum; if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2",
ACK.

Interfaces keeps a class_id, which is an ID from which bridge part of QoS settings is derived. We need to store class_id in domain status file, so we can later pass it to virNetDevBandwidthUnplug. --- src/conf/domain_conf.c | 13 +++++++++++++ 1 files changed, 13 insertions(+), 0 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 99aa08d..9f6d1e9 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -4777,6 +4777,17 @@ virDomainActualNetDefParseXML(xmlNodePtr node, hostdev, flags) < 0) { goto error; } + } else if (actual->type == VIR_DOMAIN_NET_TYPE_NETWORK) { + char *class_id = virXPathString("string(./class/@id)", ctxt); + if (class_id && + virStrToLong_ui(class_id, NULL, 10, &actual->class_id) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to parse class id '%s'"), + class_id); + VIR_FREE(class_id); + goto error; + } + VIR_FREE(class_id); } bandwidth_node = virXPathNode("./bandwidth", ctxt); @@ -12467,6 +12478,8 @@ virDomainActualNetDefFormat(virBufferPtr buf, break; case VIR_DOMAIN_NET_TYPE_NETWORK: + if (def->class_id) + virBufferAsprintf(buf, "<class id='%u'/>", def->class_id); break; default: virReportError(VIR_ERR_INTERNAL_ERROR, -- 1.7.8.6

On 12/04/2012 02:19 PM, Michal Privoznik wrote:
Interfaces keeps a class_id, which is an ID from which bridge part of QoS settings is derived. We need to store class_id in domain status file, so we can later pass it to virNetDevBandwidthUnplug. --- src/conf/domain_conf.c | 13 +++++++++++++ 1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 99aa08d..9f6d1e9 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -4777,6 +4777,17 @@ virDomainActualNetDefParseXML(xmlNodePtr node, hostdev, flags) < 0) { goto error; } + } else if (actual->type == VIR_DOMAIN_NET_TYPE_NETWORK) { + char *class_id = virXPathString("string(./class/@id)", ctxt); + if (class_id && + virStrToLong_ui(class_id, NULL, 10, &actual->class_id) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to parse class id '%s'"), + class_id); + VIR_FREE(class_id); + goto error; + } + VIR_FREE(class_id); }
bandwidth_node = virXPathNode("./bandwidth", ctxt); @@ -12467,6 +12478,8 @@ virDomainActualNetDefFormat(virBufferPtr buf, break;
case VIR_DOMAIN_NET_TYPE_NETWORK: + if (def->class_id) + virBufferAsprintf(buf, "<class id='%u'/>", def->class_id); break; default: virReportError(VIR_ERR_INTERNAL_ERROR,
I just added a comment to 6/8 - I think this patch should be done prior to 6/8, and include the addition of class_id to the actualNetDef (well, really I think that the class_id would be better added to virNetDevBandwidth instead). ACK with class_id moved into the bandwidth object (unless there's some reason you didn't do that), and a preference for re-ordering/re-grouping as I mentioned in the previous paragraph.

On 11.12.2012 13:03, Laine Stump wrote:
On 12/04/2012 02:19 PM, Michal Privoznik wrote:
Interfaces keeps a class_id, which is an ID from which bridge part of QoS settings is derived. We need to store class_id in domain status file, so we can later pass it to virNetDevBandwidthUnplug. --- src/conf/domain_conf.c | 13 +++++++++++++ 1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 99aa08d..9f6d1e9 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -4777,6 +4777,17 @@ virDomainActualNetDefParseXML(xmlNodePtr node, hostdev, flags) < 0) { goto error; } + } else if (actual->type == VIR_DOMAIN_NET_TYPE_NETWORK) { + char *class_id = virXPathString("string(./class/@id)", ctxt); + if (class_id && + virStrToLong_ui(class_id, NULL, 10, &actual->class_id) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to parse class id '%s'"), + class_id); + VIR_FREE(class_id); + goto error; + } + VIR_FREE(class_id); }
bandwidth_node = virXPathNode("./bandwidth", ctxt); @@ -12467,6 +12478,8 @@ virDomainActualNetDefFormat(virBufferPtr buf, break;
case VIR_DOMAIN_NET_TYPE_NETWORK: + if (def->class_id) + virBufferAsprintf(buf, "<class id='%u'/>", def->class_id); break; default: virReportError(VIR_ERR_INTERNAL_ERROR,
I just added a comment to 6/8 - I think this patch should be done prior to 6/8, and include the addition of class_id to the actualNetDef (well, really I think that the class_id would be better added to virNetDevBandwidth instead).
ACK with class_id moved into the bandwidth object (unless there's some reason you didn't do that), and a preference for re-ordering/re-grouping as I mentioned in the previous paragraph.
My motive is to keep virNetDevBandwidth clean, so it contains just bandwidth definition. Class ID should be part of network object itself. Something similar to domain definition and domain object. Definition should keep defined values, while an object should contain runtime info. In this case: Class ID is not something from user. It's rather generated piece of information which - moreover - is dependent on current implementation. I mean, it just keeps track of assigned ID so we can talk to /sbin/tc (=current impl).
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 12/11/2012 08:14 AM, Michal Privoznik wrote:
On 12/04/2012 02:19 PM, Michal Privoznik wrote:
Interfaces keeps a class_id, which is an ID from which bridge part of QoS settings is derived. We need to store class_id in domain status file, so we can later pass it to virNetDevBandwidthUnplug. --- src/conf/domain_conf.c | 13 +++++++++++++ 1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 99aa08d..9f6d1e9 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -4777,6 +4777,17 @@ virDomainActualNetDefParseXML(xmlNodePtr node, hostdev, flags) < 0) { goto error; } + } else if (actual->type == VIR_DOMAIN_NET_TYPE_NETWORK) { + char *class_id = virXPathString("string(./class/@id)", ctxt); + if (class_id && + virStrToLong_ui(class_id, NULL, 10, &actual->class_id) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to parse class id '%s'"), + class_id); + VIR_FREE(class_id); + goto error; + } + VIR_FREE(class_id); }
bandwidth_node = virXPathNode("./bandwidth", ctxt); @@ -12467,6 +12478,8 @@ virDomainActualNetDefFormat(virBufferPtr buf, break;
case VIR_DOMAIN_NET_TYPE_NETWORK: + if (def->class_id) + virBufferAsprintf(buf, "<class id='%u'/>", def->class_id); break; default: virReportError(VIR_ERR_INTERNAL_ERROR, I just added a comment to 6/8 - I think this patch should be done prior to 6/8, and include the addition of class_id to the actualNetDef (well, really I think that the class_id would be better added to virNetDevBandwidth instead).
ACK with class_id moved into the bandwidth object (unless there's some reason you didn't do that), and a preference for re-ordering/re-grouping as I mentioned in the previous paragraph. My motive is to keep virNetDevBandwidth clean, so it contains just bandwidth definition. Class ID should be part of network object itself. Something similar to domain definition and domain object. Definition should keep defined values, while an object should contain runtime info. In this case: Class ID is not something from user. It's rather generated
On 11.12.2012 13:03, Laine Stump wrote: piece of information which - moreover - is dependent on current implementation. I mean, it just keeps track of assigned ID so we can talk to /sbin/tc (=current impl).
Okay, I can see where that makes sense - the bandwidth element is also present in user config, and you don't want to pollute it there. Fine with me then :-)
participants (2)
-
Laine Stump
-
Michal Privoznik