[libvirt] [PATCH v1 00/11] Allow minimal bandwidth guarantee

This is a rework of my one year old patchset [1]. If you are brave enough and start reviewing it, you may want to read the last patch first. The whole magic hidden behind the scene is documented there. For now, the only supported combination where this works is virtual interface connected to virtual network. This network MUST have an inbound average set, at least. This tells how much bandwidth do we have, and how much can we guarantee. If we would miss this info, then we cannot guarantee anything. On the other hand, this is just a very small part of whole problem. Users MUST configure all other devices on traffic path to follow up their setting in libvirt. But that's out of our scope. 1: http://www.redhat.com/archives/libvir-list/2011-December/msg00486.html Michal Privoznik (11): bandwidth: add new 'floor' attribute bandwidth: Create hierarchical shaping classes bandwidth: Create (un)plug functions bandwidth: Create rate update function bandwidth: Create network (un)plug functions network: Create real network status files domain: Keep assigned class_id in domstatus XML network: Update status file on (un)plug network: Allow class ID to be reused bandwidth: Attach sfq to leaf node bandwidth: Add documentation docs/formatdomain.html.in | 20 ++- docs/schemas/networkcommon.rng | 5 + po/POTFILES.in | 1 + src/Makefile.am | 7 +- src/conf/domain_conf.c | 10 +- src/conf/domain_conf.h | 1 + src/conf/netdev_bandwidth_conf.c | 54 ++++++- src/conf/netdev_bandwidth_conf.h | 3 +- src/conf/network_conf.c | 230 ++++++++++++++++++++++----- src/conf/network_conf.h | 6 + src/libvirt_network.syms | 8 + src/libvirt_private.syms | 1 + src/lxc/lxc_process.c | 3 +- src/network/bridge_driver.c | 224 +++++++++++++++++++++++++- src/network/bridge_driver.h | 10 +- src/qemu/qemu_command.c | 35 +++-- src/qemu/qemu_domain.c | 66 +++++++- src/qemu/qemu_driver.c | 2 +- src/qemu/qemu_process.c | 12 ++ src/util/virnetdevbandwidth.c | 326 +++++++++++++++++++++++++++++++++++++- src/util/virnetdevbandwidth.h | 25 +++- src/util/virnetdevmacvlan.c | 2 +- 22 files changed, 956 insertions(+), 95 deletions(-) create mode 100644 src/libvirt_network.syms -- 1.7.8.6

This is however supported only on domain interfaces with type='network'. Moreover, target network needs to have at least inbound QoS set. This is required by hierarchical traffic shaping.
From now on, the required attribute for <inbound/> is either 'average' (old) or 'floor' (new). This new attribute can be used just for interfaces type of network (<interface type='network'/>) currently.
docs/formatdomain.html.in | 20 ++++++++++++-- docs/schemas/networkcommon.rng | 5 +++ src/conf/domain_conf.c | 6 +++- src/conf/netdev_bandwidth_conf.c | 54 +++++++++++++++++++++++++++++++++----- src/conf/netdev_bandwidth_conf.h | 3 +- src/conf/network_conf.c | 4 +- src/util/virnetdevbandwidth.h | 1 + 7 files changed, 78 insertions(+), 15 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index c8da33d..da8184a 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -3047,7 +3047,7 @@ qemu-kvm -net nic,model=? /dev/null <source network='default'/> <target dev='vnet0'/> <b><bandwidth> - <inbound average='1000' peak='5000' burst='1024'/> + <inbound average='1000' peak='5000' floor='200' burst='1024'/> <outbound average='128' peak='256' burst='256'/> </bandwidth></b> </interface> @@ -3062,14 +3062,28 @@ qemu-kvm -net nic,model=? /dev/null children element out result in no QoS applied on that traffic direction. So, when you want to shape only domain's incoming traffic, use <code>inbound</code> only, and vice versa. Each of these elements have one - mandatory attribute <code>average</code>. It specifies average bit rate on + mandatory attribute <code>average</code> (or <code>floor</code> as + described below). It specifies average bit rate on interface being shaped. Then there are two optional attributes: <code>peak</code>, which specifies maximum rate at which interface can send data, and <code>burst</code>, amount of bytes that can be burst at <code>peak</code> speed. Accepted values for attributes are integer numbers. The units for <code>average</code> and <code>peak</code> attributes are kilobytes per second, and for the <code>burst</code> just kilobytes. - <span class="since">Since 0.9.4</span> + <span class="since">Since 0.9.4</span> The <code>inbound</code> can + optionally have <code>floor</code> attribute. This is there for + guaranteeing minimal throughput for shaped interfaces. This, however, + requires that all traffic goes through one point where QoS decisions can + take place. That's why this attribute works only for virtual networks for + now (that is <code><interface type='network'/></code>. Moreover, the + virtual network the interface is connected to is required to have at least + inbound Qos set (<code>average</code> at least). Moreover, with + <code>floor<code> attribute users don't need to specify + <code>average</code>. However, <code>peak</code> and <code>burst</code> + attributes still require <code>average</code>. Currently, linux kernel + doesn't allow egress qdiscs to have any classes therefore + <code>floor</code> can be applied only on <code>inbound</code> and not + </code>outbound</code>. <span class="since">Since 1.0.1</span> </p> <h5><a name="elementVlanTag">Setting VLAN tag (on supported network types only)</a></h5> diff --git a/docs/schemas/networkcommon.rng b/docs/schemas/networkcommon.rng index c7749e7..51ff759 100644 --- a/docs/schemas/networkcommon.rng +++ b/docs/schemas/networkcommon.rng @@ -149,6 +149,11 @@ </attribute> </optional> <optional> + <attribute name="floor"> + <ref name="speed"/> + </attribute> + </optional> + <optional> <attribute name='burst'> <ref name="BurstSize"/> </attribute> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 99f03a9..bf23b77 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -4742,7 +4742,8 @@ virDomainActualNetDefParseXML(xmlNodePtr node, bandwidth_node = virXPathNode("./bandwidth", ctxt); if (bandwidth_node && - !(actual->bandwidth = virNetDevBandwidthParse(bandwidth_node))) + !(actual->bandwidth = virNetDevBandwidthParse(bandwidth_node, + actual->type))) goto error; vlanNode = virXPathNode("./vlan", ctxt); @@ -4930,7 +4931,8 @@ virDomainNetDefParseXML(virCapsPtr caps, goto error; } } else if (xmlStrEqual(cur->name, BAD_CAST "bandwidth")) { - if (!(def->bandwidth = virNetDevBandwidthParse(cur))) + if (!(def->bandwidth = virNetDevBandwidthParse(cur, + def->type))) goto error; } else if (xmlStrEqual(cur->name, BAD_CAST "vlan")) { if (virNetDevVlanParse(cur, ctxt, &def->vlan) < 0) diff --git a/src/conf/netdev_bandwidth_conf.c b/src/conf/netdev_bandwidth_conf.c index 261718f..38c5a5b 100644 --- a/src/conf/netdev_bandwidth_conf.c +++ b/src/conf/netdev_bandwidth_conf.c @@ -26,6 +26,7 @@ #include "virterror_internal.h" #include "util.h" #include "memory.h" +#include "domain_conf.h" #define VIR_FROM_THIS VIR_FROM_NONE @@ -36,6 +37,7 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) char *average = NULL; char *peak = NULL; char *burst = NULL; + char *floor = NULL; if (!node || !rate) { virReportError(VIR_ERR_INVALID_ARG, "%s", @@ -46,6 +48,7 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) average = virXMLPropString(node, "average"); peak = virXMLPropString(node, "peak"); burst = virXMLPropString(node, "burst"); + floor = virXMLPropString(node, "floor"); if (average) { if (virStrToLong_ull(average, NULL, 10, &rate->average) < 0) { @@ -54,9 +57,15 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) average); goto cleanup; } - } else { + } else if (!floor) { virReportError(VIR_ERR_XML_DETAIL, "%s", - _("Missing mandatory average attribute")); + _("Missing mandatory average or floor attributes")); + goto cleanup; + } + + if ((peak || burst) && !average) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("'peak' and 'burst' require 'average' attribute")); goto cleanup; } @@ -74,12 +83,20 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) goto cleanup; } + if (floor && virStrToLong_ull(floor, NULL, 10, &rate->floor) < 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("could not convert %s"), + floor); + goto cleanup; + } + ret = 0; cleanup: VIR_FREE(average); VIR_FREE(peak); VIR_FREE(burst); + VIR_FREE(floor); return ret; } @@ -87,13 +104,17 @@ cleanup: /** * virNetDevBandwidthParse: * @node: XML node + * @net_type: one of virDomainNetType * - * Parse bandwidth XML and return pointer to structure + * Parse bandwidth XML and return pointer to structure. + * @net_type tell to which type will/is interface connected to. + * Pass -1 if this is not called on interface. * * Returns !NULL on success, NULL on error. */ virNetDevBandwidthPtr -virNetDevBandwidthParse(xmlNodePtr node) +virNetDevBandwidthParse(xmlNodePtr node, + int net_type) { virNetDevBandwidthPtr def = NULL; xmlNodePtr cur = node->children; @@ -144,6 +165,13 @@ virNetDevBandwidthParse(xmlNodePtr node) /* helper reported error for us */ goto error; } + + if (def->in->floor && net_type != VIR_DOMAIN_NET_TYPE_NETWORK) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("floor attribute is supported only for " + "interfaces of type network")); + goto error; + } } if (out) { @@ -156,6 +184,13 @@ virNetDevBandwidthParse(xmlNodePtr node) /* helper reported error for us */ goto error; } + + if (def->out->floor) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("'floor' attribute allowed " + "only in <inbound> element")); + goto error; + } } return def; @@ -175,13 +210,18 @@ virNetDevBandwidthRateFormat(virNetDevBandwidthRatePtr def, if (!def) return 0; - if (def->average) { - virBufferAsprintf(buf, " <%s average='%llu'", elem_name, - def->average); + if (def->average || def->floor) { + virBufferAsprintf(buf, " <%s", elem_name); + + if (def->average) + virBufferAsprintf(buf, " average='%llu'", def->average); if (def->peak) virBufferAsprintf(buf, " peak='%llu'", def->peak); + if (def->floor) + virBufferAsprintf(buf, " floor='%llu'", def->floor); + if (def->burst) virBufferAsprintf(buf, " burst='%llu'", def->burst); virBufferAddLit(buf, "/>\n"); diff --git a/src/conf/netdev_bandwidth_conf.h b/src/conf/netdev_bandwidth_conf.h index bca5c50..0080165 100644 --- a/src/conf/netdev_bandwidth_conf.h +++ b/src/conf/netdev_bandwidth_conf.h @@ -28,7 +28,8 @@ # include "buf.h" # include "xml.h" -virNetDevBandwidthPtr virNetDevBandwidthParse(xmlNodePtr node) +virNetDevBandwidthPtr virNetDevBandwidthParse(xmlNodePtr node, + int net_type) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; int virNetDevBandwidthFormat(virNetDevBandwidthPtr def, virBufferPtr buf) diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 228951d..41831e0 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -1186,7 +1186,7 @@ virNetworkPortGroupParseXML(virPortGroupDefPtr def, bandwidth_node = virXPathNode("./bandwidth", ctxt); if (bandwidth_node && - !(def->bandwidth = virNetDevBandwidthParse(bandwidth_node))) { + !(def->bandwidth = virNetDevBandwidthParse(bandwidth_node, -1))) { goto error; } @@ -1262,7 +1262,7 @@ virNetworkDefParseXML(xmlXPathContextPtr ctxt) def->domain = virXPathString("string(./domain[1]/@name)", ctxt); if ((bandwidthNode = virXPathNode("./bandwidth", ctxt)) != NULL && - (def->bandwidth = virNetDevBandwidthParse(bandwidthNode)) == NULL) + (def->bandwidth = virNetDevBandwidthParse(bandwidthNode, -1)) == NULL) goto error; vlanNode = virXPathNode("./vlan", ctxt); diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index e046230..35f8b89 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -30,6 +30,7 @@ typedef virNetDevBandwidthRate *virNetDevBandwidthRatePtr; struct _virNetDevBandwidthRate { unsigned long long average; /* kbytes/s */ unsigned long long peak; /* kbytes/s */ + unsigned long long floor; /* kbytes/s */ unsigned long long burst; /* kbytes */ }; -- 1.7.8.6

On 11/19/2012 11:51 AM, Michal Privoznik wrote:
This is however supported only on domain interfaces with type='network'. Moreover, target network needs to have at least inbound QoS set. This is required by hierarchical traffic shaping.
From now on, the required attribute for <inbound/> is either 'average' (old) or 'floor' (new). This new attribute can be used just for interfaces type of network (<interface type='network'/>) currently.
docs/formatdomain.html.in | 20 ++++++++++++-- docs/schemas/networkcommon.rng | 5 +++ src/conf/domain_conf.c | 6 +++- src/conf/netdev_bandwidth_conf.c | 54 +++++++++++++++++++++++++++++++++----- src/conf/netdev_bandwidth_conf.h | 3 +- src/conf/network_conf.c | 4 +- src/util/virnetdevbandwidth.h | 1 + 7 files changed, 78 insertions(+), 15 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index c8da33d..da8184a 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -3047,7 +3047,7 @@ qemu-kvm -net nic,model=? /dev/null <source network='default'/> <target dev='vnet0'/> <b><bandwidth> - <inbound average='1000' peak='5000' burst='1024'/> + <inbound average='1000' peak='5000' floor='200' burst='1024'/> <outbound average='128' peak='256' burst='256'/> </bandwidth></b> </interface> @@ -3062,14 +3062,28 @@ qemu-kvm -net nic,model=? /dev/null children element out result in no QoS applied on that traffic direction. So, when you want to shape only domain's incoming traffic, use <code>inbound</code> only, and vice versa. Each of these elements have one - mandatory attribute <code>average</code>. It specifies average bit rate on + mandatory attribute <code>average</code> (or <code>floor</code> as + described below). It specifies average bit rate on
s|It|<code>average</code>|
interface being shaped. Then there are two optional attributes:
*the* interface being shaped
<code>peak</code>, which specifies maximum rate at which interface can send data, and <code>burst</code>, amount of bytes that can be burst at <code>peak</code> speed. Accepted values for attributes are integer numbers. The units for <code>average</code> and <code>peak</code> attributes are kilobytes per second, and for the <code>burst</code> just kilobytes. - <span class="since">Since 0.9.4</span> + <span class="since">Since 0.9.4</span> The <code>inbound</code> can + optionally have <code>floor</code> attribute. This is there for + guaranteeing minimal throughput for shaped interfaces. This, however, + requires that all traffic goes through one point where QoS decisions can + take place. That's why this attribute works only for virtual networks for + now (that is <code><interface type='network'/></code>.
with a forward type of route, nat, or no forward at all (i.e. forward type can't be 'bridge' or 'hostdev')
Moreover, the + virtual network the interface is connected to is required to have at least + inbound Qos set (<code>average</code> at least). Moreover, with + <code>floor<code> attribute users don't need to specify + <code>average</code>. However, <code>peak</code> and <code>burst</code> + attributes still require <code>average</code>. Currently, linux kernel + doesn't allow egress qdiscs to have any classes therefore + <code>floor</code> can be applied only on <code>inbound</code> and not + </code>outbound</code>. <span class="since">Since 1.0.1</span> </p>
<h5><a name="elementVlanTag">Setting VLAN tag (on supported network types only)</a></h5> diff --git a/docs/schemas/networkcommon.rng b/docs/schemas/networkcommon.rng index c7749e7..51ff759 100644 --- a/docs/schemas/networkcommon.rng +++ b/docs/schemas/networkcommon.rng @@ -149,6 +149,11 @@ </attribute> </optional> <optional> + <attribute name="floor"> + <ref name="speed"/> + </attribute> + </optional> + <optional> <attribute name='burst'> <ref name="BurstSize"/> </attribute> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 99f03a9..bf23b77 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -4742,7 +4742,8 @@ virDomainActualNetDefParseXML(xmlNodePtr node,
bandwidth_node = virXPathNode("./bandwidth", ctxt); if (bandwidth_node && - !(actual->bandwidth = virNetDevBandwidthParse(bandwidth_node))) + !(actual->bandwidth = virNetDevBandwidthParse(bandwidth_node, + actual->type))) goto error;
vlanNode = virXPathNode("./vlan", ctxt); @@ -4930,7 +4931,8 @@ virDomainNetDefParseXML(virCapsPtr caps, goto error; } } else if (xmlStrEqual(cur->name, BAD_CAST "bandwidth")) { - if (!(def->bandwidth = virNetDevBandwidthParse(cur))) + if (!(def->bandwidth = virNetDevBandwidthParse(cur, + def->type)))
Of course the nominal type of an interface could be "network", but when it's actually allocated it could end up being one with fordward mode='bridge|hostdev', so you need to validate again at runtime. I haven't looked ahead yet to see if you're doing that.
goto error; } else if (xmlStrEqual(cur->name, BAD_CAST "vlan")) { if (virNetDevVlanParse(cur, ctxt, &def->vlan) < 0) diff --git a/src/conf/netdev_bandwidth_conf.c b/src/conf/netdev_bandwidth_conf.c index 261718f..38c5a5b 100644 --- a/src/conf/netdev_bandwidth_conf.c +++ b/src/conf/netdev_bandwidth_conf.c @@ -26,6 +26,7 @@ #include "virterror_internal.h" #include "util.h" #include "memory.h" +#include "domain_conf.h"
#define VIR_FROM_THIS VIR_FROM_NONE
@@ -36,6 +37,7 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) char *average = NULL; char *peak = NULL; char *burst = NULL; + char *floor = NULL;
if (!node || !rate) { virReportError(VIR_ERR_INVALID_ARG, "%s", @@ -46,6 +48,7 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) average = virXMLPropString(node, "average"); peak = virXMLPropString(node, "peak"); burst = virXMLPropString(node, "burst"); + floor = virXMLPropString(node, "floor");
if (average) { if (virStrToLong_ull(average, NULL, 10, &rate->average) < 0) { @@ -54,9 +57,15 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) average); goto cleanup; } - } else { + } else if (!floor) { virReportError(VIR_ERR_XML_DETAIL, "%s", - _("Missing mandatory average attribute")); + _("Missing mandatory average or floor attributes")); + goto cleanup; + } + + if ((peak || burst) && !average) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("'peak' and 'burst' require 'average' attribute")); goto cleanup; }
I didn't understand if floor and average were mutually exclusive. If that's the case, then you need to check for it here. Otherwise, I just need to understand better :-)
@@ -74,12 +83,20 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) goto cleanup; }
+ if (floor && virStrToLong_ull(floor, NULL, 10, &rate->floor) < 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("could not convert %s"),
mention in the log that it this was for the floor attribute, e.g. "could not convert bandwidth floor value '%s'".
+ floor); + goto cleanup; + } + ret = 0;
cleanup: VIR_FREE(average); VIR_FREE(peak); VIR_FREE(burst); + VIR_FREE(floor);
return ret; } @@ -87,13 +104,17 @@ cleanup: /** * virNetDevBandwidthParse: * @node: XML node + * @net_type: one of virDomainNetType * - * Parse bandwidth XML and return pointer to structure + * Parse bandwidth XML and return pointer to structure. + * @net_type tell to which type will/is interface connected to. + * Pass -1 if this is not called on interface. * * Returns !NULL on success, NULL on error. */ virNetDevBandwidthPtr -virNetDevBandwidthParse(xmlNodePtr node) +virNetDevBandwidthParse(xmlNodePtr node, + int net_type) { virNetDevBandwidthPtr def = NULL; xmlNodePtr cur = node->children; @@ -144,6 +165,13 @@ virNetDevBandwidthParse(xmlNodePtr node) /* helper reported error for us */ goto error; } + + if (def->in->floor && net_type != VIR_DOMAIN_NET_TYPE_NETWORK) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("floor attribute is supported only for " + "interfaces of type network"));
It would be nice to have a slightly different error message when someone tried to specify floor on a network's bandwidth (like "setting the floor attribute is not supported for an entire network's bandwidth". (I don't really like the wording of what I just said, but you get the idea. MY coffee hasn't kicked in yet :-). I guess you could do this based on the magic value net_type == -1, although that sounds a bit kludgy).
+ goto error; + } }
if (out) { @@ -156,6 +184,13 @@ virNetDevBandwidthParse(xmlNodePtr node) /* helper reported error for us */ goto error; } + + if (def->out->floor) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("'floor' attribute allowed " + "only in <inbound> element")); + goto error; + } }
return def; @@ -175,13 +210,18 @@ virNetDevBandwidthRateFormat(virNetDevBandwidthRatePtr def, if (!def) return 0;
- if (def->average) { - virBufferAsprintf(buf, " <%s average='%llu'", elem_name, - def->average); + if (def->average || def->floor) { + virBufferAsprintf(buf, " <%s", elem_name); + + if (def->average) + virBufferAsprintf(buf, " average='%llu'", def->average);
if (def->peak) virBufferAsprintf(buf, " peak='%llu'", def->peak);
+ if (def->floor) + virBufferAsprintf(buf, " floor='%llu'", def->floor); + if (def->burst) virBufferAsprintf(buf, " burst='%llu'", def->burst); virBufferAddLit(buf, "/>\n"); diff --git a/src/conf/netdev_bandwidth_conf.h b/src/conf/netdev_bandwidth_conf.h index bca5c50..0080165 100644 --- a/src/conf/netdev_bandwidth_conf.h +++ b/src/conf/netdev_bandwidth_conf.h @@ -28,7 +28,8 @@ # include "buf.h" # include "xml.h"
-virNetDevBandwidthPtr virNetDevBandwidthParse(xmlNodePtr node) +virNetDevBandwidthPtr virNetDevBandwidthParse(xmlNodePtr node, + int net_type) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; int virNetDevBandwidthFormat(virNetDevBandwidthPtr def, virBufferPtr buf) diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 228951d..41831e0 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -1186,7 +1186,7 @@ virNetworkPortGroupParseXML(virPortGroupDefPtr def,
bandwidth_node = virXPathNode("./bandwidth", ctxt); if (bandwidth_node && - !(def->bandwidth = virNetDevBandwidthParse(bandwidth_node))) { + !(def->bandwidth = virNetDevBandwidthParse(bandwidth_node, -1))) { goto error; }
@@ -1262,7 +1262,7 @@ virNetworkDefParseXML(xmlXPathContextPtr ctxt) def->domain = virXPathString("string(./domain[1]/@name)", ctxt);
if ((bandwidthNode = virXPathNode("./bandwidth", ctxt)) != NULL && - (def->bandwidth = virNetDevBandwidthParse(bandwidthNode)) == NULL) + (def->bandwidth = virNetDevBandwidthParse(bandwidthNode, -1)) == NULL) goto error;
vlanNode = virXPathNode("./vlan", ctxt); diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index e046230..35f8b89 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -30,6 +30,7 @@ typedef virNetDevBandwidthRate *virNetDevBandwidthRatePtr; struct _virNetDevBandwidthRate { unsigned long long average; /* kbytes/s */ unsigned long long peak; /* kbytes/s */ + unsigned long long floor; /* kbytes/s */ unsigned long long burst; /* kbytes */ };
ACK with the minor things fixed (the bit about having a different log message for network vs. interface is optional; or maybe you can think of a single message that works better for both cases).

On 30.11.2012 13:14, Laine Stump wrote:
On 11/19/2012 11:51 AM, Michal Privoznik wrote:
This is however supported only on domain interfaces with type='network'. Moreover, target network needs to have at least inbound QoS set. This is required by hierarchical traffic shaping.
From now on, the required attribute for <inbound/> is either 'average' (old) or 'floor' (new). This new attribute can be used just for interfaces type of network (<interface type='network'/>) currently.
docs/formatdomain.html.in | 20 ++++++++++++-- docs/schemas/networkcommon.rng | 5 +++ src/conf/domain_conf.c | 6 +++- src/conf/netdev_bandwidth_conf.c | 54 +++++++++++++++++++++++++++++++++----- src/conf/netdev_bandwidth_conf.h | 3 +- src/conf/network_conf.c | 4 +- src/util/virnetdevbandwidth.h | 1 + 7 files changed, 78 insertions(+), 15 deletions(-)
diff --git a/src/conf/netdev_bandwidth_conf.c b/src/conf/netdev_bandwidth_conf.c index 261718f..38c5a5b 100644 --- a/src/conf/netdev_bandwidth_conf.c +++ b/src/conf/netdev_bandwidth_conf.c @@ -26,6 +26,7 @@ #include "virterror_internal.h" #include "util.h" #include "memory.h" +#include "domain_conf.h"
#define VIR_FROM_THIS VIR_FROM_NONE
@@ -36,6 +37,7 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) char *average = NULL; char *peak = NULL; char *burst = NULL; + char *floor = NULL;
if (!node || !rate) { virReportError(VIR_ERR_INVALID_ARG, "%s", @@ -46,6 +48,7 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) average = virXMLPropString(node, "average"); peak = virXMLPropString(node, "peak"); burst = virXMLPropString(node, "burst"); + floor = virXMLPropString(node, "floor");
if (average) { if (virStrToLong_ull(average, NULL, 10, &rate->average) < 0) { @@ -54,9 +57,15 @@ virNetDevBandwidthParseRate(xmlNodePtr node, virNetDevBandwidthRatePtr rate) average); goto cleanup; } - } else { + } else if (!floor) { virReportError(VIR_ERR_XML_DETAIL, "%s", - _("Missing mandatory average attribute")); + _("Missing mandatory average or floor attributes")); + goto cleanup; + } + + if ((peak || burst) && !average) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("'peak' and 'burst' require 'average' attribute")); goto cleanup; }
I didn't understand if floor and average were mutually exclusive. If that's the case, then you need to check for it here. Otherwise, I just need to understand better :-)
They are not mutually exclusive. The interface (of correct type as you've pointed out above) can has both 'floor' and 'average'. The first is applied on bridge the second directly on the interface. I admit that this QoS magic is kind of black box. But if you ever saw a picture of flow of a packet within linux kernel, where are which {ip,eb}tables rules applied, then you know why is that. See [1] for instance.
ACK with the minor things fixed (the bit about having a different log message for network vs. interface is optional; or maybe you can think of a single message that works better for both cases).
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
1: http://blog.propter.net/wp-content/uploads/2012/02/PacketFlow.png

On 11/19/2012 11:51 AM, Michal Privoznik wrote:
This is however supported only on domain interfaces with type='network'. Moreover, target network needs to have at least inbound QoS set. This is required by hierarchical traffic shaping.
From now on, the required attribute for <inbound/> is either 'average' (old) or 'floor' (new). This new attribute can be used just for interfaces type of network (<interface type='network'/>) currently.
Another thing that I just thought of - this patch needs to update virNetdevBandwidthEqual to check floor, otherwise qemuNetChange won't properly detect when bandwidth is changed (btw, a small patch to do the right thing when that happens (similar to the patch I recently sent to support changing filters) would be really nice :-)

On 30.11.2012 13:58, Laine Stump wrote:
On 11/19/2012 11:51 AM, Michal Privoznik wrote:
This is however supported only on domain interfaces with type='network'. Moreover, target network needs to have at least inbound QoS set. This is required by hierarchical traffic shaping.
From now on, the required attribute for <inbound/> is either 'average' (old) or 'floor' (new). This new attribute can be used just for interfaces type of network (<interface type='network'/>) currently.
Another thing that I just thought of - this patch needs to update virNetdevBandwidthEqual to check floor, otherwise qemuNetChange won't properly detect when bandwidth is changed (btw, a small patch to do the right thing when that happens (similar to the patch I recently sent to support changing filters) would be really nice :-)
Okay, I'll save it for a separate patch. Michal

These classes can borrow unused bandwidth. Basically, only egress qdsics can have classes, therefore we can do this kind of traffic shaping only on host's outgoing, that is domain's incoming traffic. --- src/lxc/lxc_process.c | 3 ++- src/network/bridge_driver.c | 3 ++- src/qemu/qemu_command.c | 3 ++- src/qemu/qemu_driver.c | 2 +- src/util/virnetdevbandwidth.c | 33 +++++++++++++++++++++++++++------ src/util/virnetdevbandwidth.h | 4 +++- src/util/virnetdevmacvlan.c | 2 +- 7 files changed, 38 insertions(+), 12 deletions(-) diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index 079bc3a..91ce2d3 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -335,7 +335,8 @@ static int virLXCProcessSetupInterfaceBridged(virConnectPtr conn, goto cleanup; if (virNetDevBandwidthSet(net->ifname, - virDomainNetGetActualBandwidth(net)) < 0) { + virDomainNetGetActualBandwidth(net), + false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), net->ifname); diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index c153d36..256b601 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -2240,7 +2240,8 @@ networkStartNetworkVirtual(struct network_driver *driver, VIR_FORCE_CLOSE(tapfd); } - if (virNetDevBandwidthSet(network->def->bridge, network->def->bandwidth) < 0) { + if (virNetDevBandwidthSet(network->def->bridge, + network->def->bandwidth, true) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), network->def->bridge); diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 22bb209..c1f8680 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -292,7 +292,8 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, if (tapfd >= 0 && virNetDevBandwidthSet(net->ifname, - virDomainNetGetActualBandwidth(net)) < 0) { + virDomainNetGetActualBandwidth(net), + false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), net->ifname); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 595c452..9ae0c1b 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -8960,7 +8960,7 @@ qemuDomainSetInterfaceParameters(virDomainPtr dom, sizeof(*newBandwidth->out)); } - if (virNetDevBandwidthSet(net->ifname, newBandwidth) < 0) { + if (virNetDevBandwidthSet(net->ifname, newBandwidth, false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), device); diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index bddb788..fcc6b56 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -45,17 +45,21 @@ virNetDevBandwidthFree(virNetDevBandwidthPtr def) * virNetDevBandwidthSet: * @ifname: on which interface * @bandwidth: rates to set (may be NULL) + * @hierarchical_class: whether to create hierarchical class * * This function enables QoS on specified interface * and set given traffic limits for both, incoming * and outgoing traffic. Any previous setting get - * overwritten. + * overwritten. If @hierarchical_class is TRUE, create + * hierarchical class. It is used to guarantee minimal + * throughput ('floor' attribute in NIC). * * Return 0 on success, -1 otherwise. */ int virNetDevBandwidthSet(const char *ifname, - virNetDevBandwidthPtr bandwidth) + virNetDevBandwidthPtr bandwidth, + bool hierarchical_class) { int ret = -1; virCommandPtr cmd = NULL; @@ -71,7 +75,7 @@ virNetDevBandwidthSet(const char *ifname, virNetDevBandwidthClear(ifname); - if (bandwidth->in) { + if (bandwidth->in && bandwidth->in->average) { if (virAsprintf(&average, "%llukbps", bandwidth->in->average) < 0) goto cleanup; if (bandwidth->in->peak && @@ -83,15 +87,32 @@ virNetDevBandwidthSet(const char *ifname, cmd = virCommandNew(TC); virCommandAddArgList(cmd, "qdisc", "add", "dev", ifname, "root", - "handle", "1:", "htb", "default", "1", NULL); + "handle", "1:", "htb", "default", + hierarchical_class ? "2" : "1", NULL); if (virCommandRun(cmd, NULL) < 0) goto cleanup; + if (hierarchical_class) { + /* If we are creating hierarchical class, all non guaranteed traffic + * goes to 1:2 class which will adjust 'rate' dynamically as NICs with + * guaranteed throughput are plugged and unplugged. Class 1:1 is there + * so we don't exceed the maximum limit for network. For each NIC with + * guaranteed throughput a separate classid will be created. + * NB '1:' is just a shorter notation of '1:0' */ + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "add", "dev", ifname, "parent", + "1:", "classid", "1:1", "htb", "rate", average, + "ceil", peak ? peak : average, NULL); + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + } virCommandFree(cmd); cmd = virCommandNew(TC); virCommandAddArgList(cmd,"class", "add", "dev", ifname, "parent", - "1:", "classid", "1:1", "htb", NULL); - virCommandAddArgList(cmd, "rate", average, NULL); + hierarchical_class ? "1:1" : "1:", "classid", + hierarchical_class ? "1:2" : "1:1", "htb", + "rate", average, NULL); if (peak) virCommandAddArgList(cmd, "ceil", peak, NULL); diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index 35f8b89..d308ab2 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -42,7 +42,9 @@ struct _virNetDevBandwidth { void virNetDevBandwidthFree(virNetDevBandwidthPtr def); -int virNetDevBandwidthSet(const char *ifname, virNetDevBandwidthPtr bandwidth) +int virNetDevBandwidthSet(const char *ifname, + virNetDevBandwidthPtr bandwidth, + bool hierarchical_class) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; int virNetDevBandwidthClear(const char *ifname) ATTRIBUTE_NONNULL(1); diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c index bd26ba9..cfe3f87 100644 --- a/src/util/virnetdevmacvlan.c +++ b/src/util/virnetdevmacvlan.c @@ -925,7 +925,7 @@ create_name: rc = 0; } - if (virNetDevBandwidthSet(cr_ifname, bandwidth) < 0) { + if (virNetDevBandwidthSet(cr_ifname, bandwidth, false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), cr_ifname); -- 1.7.8.6

On 11/19/2012 11:51 AM, Michal Privoznik wrote:
These classes can borrow unused bandwidth. Basically, only egress qdsics can have classes, therefore we can do this kind of traffic shaping only on host's outgoing, that is domain's incoming traffic. --- src/lxc/lxc_process.c | 3 ++- src/network/bridge_driver.c | 3 ++- src/qemu/qemu_command.c | 3 ++- src/qemu/qemu_driver.c | 2 +- src/util/virnetdevbandwidth.c | 33 +++++++++++++++++++++++++++------ src/util/virnetdevbandwidth.h | 4 +++- src/util/virnetdevmacvlan.c | 2 +- 7 files changed, 38 insertions(+), 12 deletions(-)
diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index 079bc3a..91ce2d3 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -335,7 +335,8 @@ static int virLXCProcessSetupInterfaceBridged(virConnectPtr conn, goto cleanup;
if (virNetDevBandwidthSet(net->ifname, - virDomainNetGetActualBandwidth(net)) < 0) { + virDomainNetGetActualBandwidth(net), + false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), net->ifname); diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index c153d36..256b601 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -2240,7 +2240,8 @@ networkStartNetworkVirtual(struct network_driver *driver, VIR_FORCE_CLOSE(tapfd); }
- if (virNetDevBandwidthSet(network->def->bridge, network->def->bandwidth) < 0) { + if (virNetDevBandwidthSet(network->def->bridge, + network->def->bandwidth, true) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), network->def->bridge); diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 22bb209..c1f8680 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -292,7 +292,8 @@ qemuNetworkIfaceConnect(virDomainDefPtr def,
if (tapfd >= 0 && virNetDevBandwidthSet(net->ifname, - virDomainNetGetActualBandwidth(net)) < 0) { + virDomainNetGetActualBandwidth(net), + false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), net->ifname); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 595c452..9ae0c1b 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -8960,7 +8960,7 @@ qemuDomainSetInterfaceParameters(virDomainPtr dom, sizeof(*newBandwidth->out)); }
- if (virNetDevBandwidthSet(net->ifname, newBandwidth) < 0) { + if (virNetDevBandwidthSet(net->ifname, newBandwidth, false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), device); diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index bddb788..fcc6b56 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -45,17 +45,21 @@ virNetDevBandwidthFree(virNetDevBandwidthPtr def) * virNetDevBandwidthSet: * @ifname: on which interface * @bandwidth: rates to set (may be NULL) + * @hierarchical_class: whether to create hierarchical class * * This function enables QoS on specified interface * and set given traffic limits for both, incoming * and outgoing traffic. Any previous setting get - * overwritten. + * overwritten. If @hierarchical_class is TRUE, create + * hierarchical class. It is used to guarantee minimal + * throughput ('floor' attribute in NIC). * * Return 0 on success, -1 otherwise. */ int virNetDevBandwidthSet(const char *ifname, - virNetDevBandwidthPtr bandwidth) + virNetDevBandwidthPtr bandwidth, + bool hierarchical_class) { int ret = -1; virCommandPtr cmd = NULL; @@ -71,7 +75,7 @@ virNetDevBandwidthSet(const char *ifname,
virNetDevBandwidthClear(ifname);
- if (bandwidth->in) { + if (bandwidth->in && bandwidth->in->average) { if (virAsprintf(&average, "%llukbps", bandwidth->in->average) < 0) goto cleanup; if (bandwidth->in->peak && @@ -83,15 +87,32 @@ virNetDevBandwidthSet(const char *ifname,
cmd = virCommandNew(TC); virCommandAddArgList(cmd, "qdisc", "add", "dev", ifname, "root", - "handle", "1:", "htb", "default", "1", NULL); + "handle", "1:", "htb", "default", + hierarchical_class ? "2" : "1", NULL);
Don't you really only need to do this if floor has been set? If you create the hierarchical class and don't need it, does this result in any extra overhead? If so, I think you should do (hierarchical_class && bandwidth->floor).
if (virCommandRun(cmd, NULL) < 0) goto cleanup;
+ if (hierarchical_class) {
similar
+ /* If we are creating hierarchical class, all non guaranteed traffic + * goes to 1:2 class which will adjust 'rate' dynamically as NICs with + * guaranteed throughput are plugged and unplugged. Class 1:1 is there + * so we don't exceed the maximum limit for network. For each NIC with + * guaranteed throughput a separate classid will be created. + * NB '1:' is just a shorter notation of '1:0' */
I have to admit that I don't follow the full discussion of the details of this, but trust that you've done some testing to verify that the results are as desired when floor is specified, and identical to previous code if floor isn't specified.
+ virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "add", "dev", ifname, "parent", + "1:", "classid", "1:1", "htb", "rate", average, + "ceil", peak ? peak : average, NULL); + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + } virCommandFree(cmd); cmd = virCommandNew(TC); virCommandAddArgList(cmd,"class", "add", "dev", ifname, "parent", - "1:", "classid", "1:1", "htb", NULL); - virCommandAddArgList(cmd, "rate", average, NULL); + hierarchical_class ? "1:1" : "1:", "classid", + hierarchical_class ? "1:2" : "1:1", "htb", + "rate", average, NULL);
if (peak) virCommandAddArgList(cmd, "ceil", peak, NULL); diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index 35f8b89..d308ab2 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -42,7 +42,9 @@ struct _virNetDevBandwidth {
void virNetDevBandwidthFree(virNetDevBandwidthPtr def);
-int virNetDevBandwidthSet(const char *ifname, virNetDevBandwidthPtr bandwidth) +int virNetDevBandwidthSet(const char *ifname, + virNetDevBandwidthPtr bandwidth, + bool hierarchical_class) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; int virNetDevBandwidthClear(const char *ifname) ATTRIBUTE_NONNULL(1); diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c index bd26ba9..cfe3f87 100644 --- a/src/util/virnetdevmacvlan.c +++ b/src/util/virnetdevmacvlan.c @@ -925,7 +925,7 @@ create_name: rc = 0; }
- if (virNetDevBandwidthSet(cr_ifname, bandwidth) < 0) { + if (virNetDevBandwidthSet(cr_ifname, bandwidth, false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), cr_ifname);
Huh. I thought I had remembered that bandwidth (like iptables rules) couldn't be applied to macvtap interfaces. Is that wrong? (unrelated to this patchset anyway, just curious). ACK with the above questions appropriately answered.

On 30.11.2012 17:52, Laine Stump wrote:
On 11/19/2012 11:51 AM, Michal Privoznik wrote:
These classes can borrow unused bandwidth. Basically, only egress qdsics can have classes, therefore we can do this kind of traffic shaping only on host's outgoing, that is domain's incoming traffic. --- src/lxc/lxc_process.c | 3 ++- src/network/bridge_driver.c | 3 ++- src/qemu/qemu_command.c | 3 ++- src/qemu/qemu_driver.c | 2 +- src/util/virnetdevbandwidth.c | 33 +++++++++++++++++++++++++++------ src/util/virnetdevbandwidth.h | 4 +++- src/util/virnetdevmacvlan.c | 2 +- 7 files changed, 38 insertions(+), 12 deletions(-)
diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index 079bc3a..91ce2d3 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -335,7 +335,8 @@ static int virLXCProcessSetupInterfaceBridged(virConnectPtr conn, goto cleanup;
if (virNetDevBandwidthSet(net->ifname, - virDomainNetGetActualBandwidth(net)) < 0) { + virDomainNetGetActualBandwidth(net), + false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), net->ifname); diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index c153d36..256b601 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -2240,7 +2240,8 @@ networkStartNetworkVirtual(struct network_driver *driver, VIR_FORCE_CLOSE(tapfd); }
- if (virNetDevBandwidthSet(network->def->bridge, network->def->bandwidth) < 0) { + if (virNetDevBandwidthSet(network->def->bridge, + network->def->bandwidth, true) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), network->def->bridge); diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 22bb209..c1f8680 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -292,7 +292,8 @@ qemuNetworkIfaceConnect(virDomainDefPtr def,
if (tapfd >= 0 && virNetDevBandwidthSet(net->ifname, - virDomainNetGetActualBandwidth(net)) < 0) { + virDomainNetGetActualBandwidth(net), + false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), net->ifname); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 595c452..9ae0c1b 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -8960,7 +8960,7 @@ qemuDomainSetInterfaceParameters(virDomainPtr dom, sizeof(*newBandwidth->out)); }
- if (virNetDevBandwidthSet(net->ifname, newBandwidth) < 0) { + if (virNetDevBandwidthSet(net->ifname, newBandwidth, false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), device); diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index bddb788..fcc6b56 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -45,17 +45,21 @@ virNetDevBandwidthFree(virNetDevBandwidthPtr def) * virNetDevBandwidthSet: * @ifname: on which interface * @bandwidth: rates to set (may be NULL) + * @hierarchical_class: whether to create hierarchical class * * This function enables QoS on specified interface * and set given traffic limits for both, incoming * and outgoing traffic. Any previous setting get - * overwritten. + * overwritten. If @hierarchical_class is TRUE, create + * hierarchical class. It is used to guarantee minimal + * throughput ('floor' attribute in NIC). * * Return 0 on success, -1 otherwise. */ int virNetDevBandwidthSet(const char *ifname, - virNetDevBandwidthPtr bandwidth) + virNetDevBandwidthPtr bandwidth, + bool hierarchical_class) { int ret = -1; virCommandPtr cmd = NULL; @@ -71,7 +75,7 @@ virNetDevBandwidthSet(const char *ifname,
virNetDevBandwidthClear(ifname);
- if (bandwidth->in) { + if (bandwidth->in && bandwidth->in->average) { if (virAsprintf(&average, "%llukbps", bandwidth->in->average) < 0) goto cleanup; if (bandwidth->in->peak && @@ -83,15 +87,32 @@ virNetDevBandwidthSet(const char *ifname,
cmd = virCommandNew(TC); virCommandAddArgList(cmd, "qdisc", "add", "dev", ifname, "root", - "handle", "1:", "htb", "default", "1", NULL); + "handle", "1:", "htb", "default", + hierarchical_class ? "2" : "1", NULL);
Don't you really only need to do this if floor has been set? If you create the hierarchical class and don't need it, does this result in any extra overhead? If so, I think you should do (hierarchical_class && bandwidth->floor).
Yes I do need that unless I want to create hierarchical classes when the first interface with bandwidth->floor is plugged in. Note that hierarchical_class is true only on bridges and since only interfaces are allowed to have a floor, only one of bandwith->floor and hierarchical_class can be non-zero. However, dynamically creating classes would complicate the code even more and doesn't really provide noticeable trade off between overhead and code complexity.
if (virCommandRun(cmd, NULL) < 0) goto cleanup;
+ if (hierarchical_class) {
similar
+ /* If we are creating hierarchical class, all non guaranteed traffic + * goes to 1:2 class which will adjust 'rate' dynamically as NICs with + * guaranteed throughput are plugged and unplugged. Class 1:1 is there + * so we don't exceed the maximum limit for network. For each NIC with + * guaranteed throughput a separate classid will be created. + * NB '1:' is just a shorter notation of '1:0' */
I have to admit that I don't follow the full discussion of the details of this, but trust that you've done some testing to verify that the results are as desired when floor is specified, and identical to previous code if floor isn't specified.
You bet I did :)
+ virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "add", "dev", ifname, "parent", + "1:", "classid", "1:1", "htb", "rate", average, + "ceil", peak ? peak : average, NULL); + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + } virCommandFree(cmd); cmd = virCommandNew(TC); virCommandAddArgList(cmd,"class", "add", "dev", ifname, "parent", - "1:", "classid", "1:1", "htb", NULL); - virCommandAddArgList(cmd, "rate", average, NULL); + hierarchical_class ? "1:1" : "1:", "classid", + hierarchical_class ? "1:2" : "1:1", "htb", + "rate", average, NULL);
if (peak) virCommandAddArgList(cmd, "ceil", peak, NULL); diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index 35f8b89..d308ab2 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -42,7 +42,9 @@ struct _virNetDevBandwidth {
void virNetDevBandwidthFree(virNetDevBandwidthPtr def);
-int virNetDevBandwidthSet(const char *ifname, virNetDevBandwidthPtr bandwidth) +int virNetDevBandwidthSet(const char *ifname, + virNetDevBandwidthPtr bandwidth, + bool hierarchical_class) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; int virNetDevBandwidthClear(const char *ifname) ATTRIBUTE_NONNULL(1); diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c index bd26ba9..cfe3f87 100644 --- a/src/util/virnetdevmacvlan.c +++ b/src/util/virnetdevmacvlan.c @@ -925,7 +925,7 @@ create_name: rc = 0; }
- if (virNetDevBandwidthSet(cr_ifname, bandwidth) < 0) { + if (virNetDevBandwidthSet(cr_ifname, bandwidth, false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot set bandwidth limits on %s"), cr_ifname);
Huh. I thought I had remembered that bandwidth (like iptables rules) couldn't be applied to macvtap interfaces. Is that wrong? (unrelated to this patchset anyway, just curious).
Not on the level we are doing things. In linux, we have these traffic shaping classes which actually limit the flow of packets which resides within them. To send a packet into appropriate class is job for the filters. These can look at the packet headers and decide to which class is packet sent. The filter can see full range of headers in terms of ISO/OSI model - that is, a decision can be made depending on TCP flag set, src/dst IP address, src/dst MAC address, iptables marks (iptables ... -j MARK --set-mark <some-mark>). And since libvirt doesn't know anything about IP addresses of guest we are left only with 2nd level of the ISO/OSI model - MAC addresses. Problem with iptables marks was - IIRC - they doesn't apply on macvtap devices. For some reason the mark weren't applied in my testing so I've brushed the idea aside. In addition, I think you have to enable something in kernel for packets from a macvtap NIC to go through a bridge.
ACK with the above questions appropriately answered.
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

These set bridge part of QoS when bringing domain's interface up. Long story short, if there's a 'floor' set, a new QoS class is created. ClassID MUST be unique within the bridge and should be kept for unplug phase. --- po/POTFILES.in | 1 + src/util/virnetdevbandwidth.c | 178 +++++++++++++++++++++++++++++++++++++++++ src/util/virnetdevbandwidth.h | 14 +++ 3 files changed, 193 insertions(+), 0 deletions(-) diff --git a/po/POTFILES.in b/po/POTFILES.in index 9768528..f51e2c7 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -154,6 +154,7 @@ src/util/virhash.c src/util/virkeyfile.c src/util/virlockspace.c src/util/virnetdev.c +src/util/virnetdevbandwidth.c src/util/virnetdevbridge.c src/util/virnetdevmacvlan.c src/util/virnetdevopenvswitch.c diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index fcc6b56..9597dcd 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -285,3 +285,181 @@ virNetDevBandwidthEqual(virNetDevBandwidthPtr a, return true; } + +/* + * virNetDevBandwidthPlug: + * @brname: name of the bridge + * @net_bandwidth: QoS settings on @brname + * @ifname: interface being plugged into @brname + * @ifmac: MAC of @ifname + * @bandwidth: QoS settings for @ifname + * @id: unique ID (MUST be greater than 2) + * + * Set bridge part of interface QoS settings, e.g. guaranteed + * bandwidth. @id is an unique ID (among @brname) from which + * other identifiers for class, qdisc and filter are derived. + * However, two classes were already set up (by + * virNetDevBandwidthSet). That's why this @id MUST be greater + * than 2. You may want to keep passed @id, as it is used later + * by virNetDevBandwidthUnplug. + * + * Returns: + * 1 if there is nothing to set + * 0 if QoS set successfully + * -1 otherwise. + */ +int +virNetDevBandwidthPlug(const char *brname, + virNetDevBandwidthPtr net_bandwidth, + const char *ifname, + const virMacAddrPtr ifmac_ptr, + virNetDevBandwidthPtr bandwidth, + unsigned int id) +{ + int ret = -1; + virCommandPtr cmd = NULL; + char *class_id = NULL; + char *qdisc_id = NULL; + char *filter_id = NULL; + char *floor = NULL; + char *ceil = NULL; + unsigned char ifmac[VIR_MAC_BUFLEN]; + char *mac[2]; + + if (!bandwidth || !bandwidth->in || !bandwidth->in->floor) { + /* nothing to set */ + return 1; + } + + if (!net_bandwidth || !net_bandwidth->in) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Bridge '%s' has no QoS set, therefore " + "unable to set 'floor' on '%s'"), + brname, ifname); + return -1; + } + + virMacAddrGetRaw(ifmac_ptr, ifmac); + + if (virAsprintf(&class_id, "1:%x", id) < 0 || + virAsprintf(&qdisc_id, "%x:", id) < 0 || + virAsprintf(&filter_id, "%u", id) < 0 || + virAsprintf(&mac[0], "0x%02x%02x%02x%02x", ifmac[2], + ifmac[3], ifmac[4], ifmac[5]) < 0 || + virAsprintf(&mac[1], "0x%02x%02x", ifmac[0], ifmac[1]) < 0 || + virAsprintf(&floor, "%llukbps", bandwidth->in->floor) < 0 || + virAsprintf(&ceil, "%llukbps", net_bandwidth->in->peak ? + net_bandwidth->in->peak : + net_bandwidth->in->average) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "add", "dev", brname, "parent", "1:1", + "classid", class_id, "htb", "rate", floor, + "ceil", ceil, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "add", "dev", brname, "parent", + class_id, "handle", qdisc_id, "sfq", "perturb", + "10", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + /* Okay, this not nice. But since libvirt does not know anything about + * interface IP address(es), and tc fw filter simply refuse to use ebtables + * marks, we need to use u32 selector to match MAC address. + * If libvirt will ever know something, remove this FIXME + */ + virCommandAddArgList(cmd, "filter", "add", "dev", brname, "protocol", "ip", + "prio", filter_id, "u32", + "match", "u16", "0x0800", "0xffff", "at", "-2", + "match", "u32", mac[0], "0xffffffff", "at", "-12", + "match", "u16", mac[1], "0xffff", "at", "-14", + "flowid", class_id, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + ret = 0; + +cleanup: + VIR_FREE(ceil); + VIR_FREE(floor); + VIR_FREE(mac[1]); + VIR_FREE(mac[0]); + VIR_FREE(filter_id); + VIR_FREE(qdisc_id); + VIR_FREE(class_id); + virCommandFree(cmd); + return ret; +} + +/* + * virNetDevBandwidthUnplug: + * @brname: from which bridge are we unplugging + * @id: unique identifier (MUST be greater than 2) + * + * Remove QoS settings from bridge. + * + * Returns 0 on success, -1 otherwise. + */ +int +virNetDevBandwidthUnplug(const char *brname, + unsigned int id) +{ + int ret = -1; + int cmd_ret = 0; + virCommandPtr cmd = NULL; + char *class_id = NULL; + char *qdisc_id = NULL; + char *filter_id = NULL; + + if (virAsprintf(&class_id, "1:%x", id) < 0 || + virAsprintf(&qdisc_id, "%x:", id) < 0 || + virAsprintf(&filter_id, "%u", id) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "del", "dev", brname, + "handle", qdisc_id, NULL); + + /* Don't threat tc errors as fatal, but + * try to remove as much as possible */ + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "filter", "del", "dev", brname, + "prio", filter_id, NULL); + + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup; + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "del", "dev", brname, + "classid", class_id, NULL); + + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup; + + ret = 0; + +cleanup: + VIR_FREE(filter_id); + VIR_FREE(qdisc_id); + VIR_FREE(class_id); + virCommandFree(cmd); + return ret; +} diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index d308ab2..19eb418 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -24,6 +24,7 @@ # define __VIR_NETDEV_BANDWIDTH_H__ # include "internal.h" +# include "virmacaddr.h" typedef struct _virNetDevBandwidthRate virNetDevBandwidthRate; typedef virNetDevBandwidthRate *virNetDevBandwidthRatePtr; @@ -53,4 +54,17 @@ int virNetDevBandwidthCopy(virNetDevBandwidthPtr *dest, const virNetDevBandwidth bool virNetDevBandwidthEqual(virNetDevBandwidthPtr a, virNetDevBandwidthPtr b); +int virNetDevBandwidthPlug(const char *brname, + virNetDevBandwidthPtr net_bandwidth, + const char *ifname, + const virMacAddrPtr ifmac_ptr, + virNetDevBandwidthPtr bandwidth, + unsigned int id) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_RETURN_CHECK; + +int virNetDevBandwidthUnplug(const char *brname, + unsigned int id) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; + #endif /* __VIR_NETDEV_BANDWIDTH_H__ */ -- 1.7.8.6

On 11/19/2012 11:51 AM, Michal Privoznik wrote:
These set bridge part of QoS when bringing domain's interface up. Long story short, if there's a 'floor' set, a new QoS class is created. ClassID MUST be unique within the bridge and should be kept for unplug phase. --- po/POTFILES.in | 1 + src/util/virnetdevbandwidth.c | 178 +++++++++++++++++++++++++++++++++++++++++ src/util/virnetdevbandwidth.h | 14 +++ 3 files changed, 193 insertions(+), 0 deletions(-)
diff --git a/po/POTFILES.in b/po/POTFILES.in index 9768528..f51e2c7 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -154,6 +154,7 @@ src/util/virhash.c src/util/virkeyfile.c src/util/virlockspace.c src/util/virnetdev.c +src/util/virnetdevbandwidth.c src/util/virnetdevbridge.c src/util/virnetdevmacvlan.c src/util/virnetdevopenvswitch.c diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index fcc6b56..9597dcd 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -285,3 +285,181 @@ virNetDevBandwidthEqual(virNetDevBandwidthPtr a,
return true; } + +/* + * virNetDevBandwidthPlug: + * @brname: name of the bridge + * @net_bandwidth: QoS settings on @brname + * @ifname: interface being plugged into @brname + * @ifmac: MAC of @ifname + * @bandwidth: QoS settings for @ifname + * @id: unique ID (MUST be greater than 2)
* If it must be > 2, then you need to check for that at the top of the function.
+ * + * Set bridge part of interface QoS settings, e.g. guaranteed + * bandwidth. @id is an unique ID (among @brname) from which + * other identifiers for class, qdisc and filter are derived. + * However, two classes were already set up (by + * virNetDevBandwidthSet). That's why this @id MUST be greater + * than 2. You may want to keep passed @id, as it is used later + * by virNetDevBandwidthUnplug. + * + * Returns: + * 1 if there is nothing to set + * 0 if QoS set successfully + * -1 otherwise. + */ +int +virNetDevBandwidthPlug(const char *brname, + virNetDevBandwidthPtr net_bandwidth, + const char *ifname, + const virMacAddrPtr ifmac_ptr, + virNetDevBandwidthPtr bandwidth, + unsigned int id) +{ + int ret = -1; + virCommandPtr cmd = NULL; + char *class_id = NULL; + char *qdisc_id = NULL; + char *filter_id = NULL; + char *floor = NULL; + char *ceil = NULL; + unsigned char ifmac[VIR_MAC_BUFLEN]; + char *mac[2]; + + if (!bandwidth || !bandwidth->in || !bandwidth->in->floor) { + /* nothing to set */ + return 1; + } + + if (!net_bandwidth || !net_bandwidth->in) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Bridge '%s' has no QoS set, therefore " + "unable to set 'floor' on '%s'"), + brname, ifname); + return -1; + } + + virMacAddrGetRaw(ifmac_ptr, ifmac); + + if (virAsprintf(&class_id, "1:%x", id) < 0 || + virAsprintf(&qdisc_id, "%x:", id) < 0 || + virAsprintf(&filter_id, "%u", id) < 0 || + virAsprintf(&mac[0], "0x%02x%02x%02x%02x", ifmac[2], + ifmac[3], ifmac[4], ifmac[5]) < 0 || + virAsprintf(&mac[1], "0x%02x%02x", ifmac[0], ifmac[1]) < 0 || + virAsprintf(&floor, "%llukbps", bandwidth->in->floor) < 0 || + virAsprintf(&ceil, "%llukbps", net_bandwidth->in->peak ? + net_bandwidth->in->peak : + net_bandwidth->in->average) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "add", "dev", brname, "parent", "1:1", + "classid", class_id, "htb", "rate", floor, + "ceil", ceil, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "add", "dev", brname, "parent", + class_id, "handle", qdisc_id, "sfq", "perturb", + "10", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + /* Okay, this not nice. But since libvirt does not know anything about + * interface IP address(es), and tc fw filter simply refuse to use ebtables + * marks, we need to use u32 selector to match MAC address. + * If libvirt will ever know something, remove this FIXME
Heck, maybe not even then - don't forget about IPv6, along with the fact that I'm not convinced the host can ever know all the IP addresses used by an untrusted guest with 100% certainty (unless we filter for them I suppose).
+ */ + virCommandAddArgList(cmd, "filter", "add", "dev", brname, "protocol", "ip", + "prio", filter_id, "u32", + "match", "u16", "0x0800", "0xffff", "at", "-2", + "match", "u32", mac[0], "0xffffffff", "at", "-12", + "match", "u16", mac[1], "0xffff", "at", "-14", + "flowid", class_id, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + ret = 0; + +cleanup: + VIR_FREE(ceil); + VIR_FREE(floor); + VIR_FREE(mac[1]); + VIR_FREE(mac[0]);
* If the expression in the if() statement setting mac[0] and mac[1] short circuits before setting them, they will be uninitialized. (that would only happen in case of OOM, but it's still not nice). You should initialize them with "= {NULL, NULL}"
+ VIR_FREE(filter_id); + VIR_FREE(qdisc_id); + VIR_FREE(class_id); + virCommandFree(cmd);
It would be simpler to visually verify everything is being cleaned up if these were in either exactly the same order, or exactly the opposite order they were defined in.
+ return ret; +} + +/* + * virNetDevBandwidthUnplug: + * @brname: from which bridge are we unplugging + * @id: unique identifier (MUST be greater than 2) + * + * Remove QoS settings from bridge. + * + * Returns 0 on success, -1 otherwise. + */ +int +virNetDevBandwidthUnplug(const char *brname, + unsigned int id)
As I'm looking at all these uses of id's, I'm wondering if there's any danger of namespace conflict with other users of tc. (This isn't any critique, just curiousity).
+{ + int ret = -1; + int cmd_ret = 0; + virCommandPtr cmd = NULL; + char *class_id = NULL; + char *qdisc_id = NULL; + char *filter_id = NULL; + + if (virAsprintf(&class_id, "1:%x", id) < 0 || + virAsprintf(&qdisc_id, "%x:", id) < 0 || + virAsprintf(&filter_id, "%u", id) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "del", "dev", brname, + "handle", qdisc_id, NULL); + + /* Don't threat tc errors as fatal, but + * try to remove as much as possible */
What's your definition of "fatal"? In this case, if tc fails you return -1, not 0.
+ if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "filter", "del", "dev", brname, + "prio", filter_id, NULL); + + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup;
same here
+ + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "del", "dev", brname, + "classid", class_id, NULL); + + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup;
and here. I don't see you being forgiving at all (is it maybe in the caller that you ignore the return status?)
+ + ret = 0; + +cleanup: + VIR_FREE(filter_id); + VIR_FREE(qdisc_id); + VIR_FREE(class_id); + virCommandFree(cmd); + return ret; +} diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index d308ab2..19eb418 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -24,6 +24,7 @@ # define __VIR_NETDEV_BANDWIDTH_H__
# include "internal.h" +# include "virmacaddr.h"
typedef struct _virNetDevBandwidthRate virNetDevBandwidthRate; typedef virNetDevBandwidthRate *virNetDevBandwidthRatePtr; @@ -53,4 +54,17 @@ int virNetDevBandwidthCopy(virNetDevBandwidthPtr *dest, const virNetDevBandwidth
bool virNetDevBandwidthEqual(virNetDevBandwidthPtr a, virNetDevBandwidthPtr b);
+int virNetDevBandwidthPlug(const char *brname, + virNetDevBandwidthPtr net_bandwidth, + const char *ifname, + const virMacAddrPtr ifmac_ptr, + virNetDevBandwidthPtr bandwidth, + unsigned int id) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_RETURN_CHECK; + +int virNetDevBandwidthUnplug(const char *brname, + unsigned int id) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; + #endif /* __VIR_NETDEV_BANDWIDTH_H__ */
ACK with the points marked as "*" fixed (other suggestions you can take or leave)

On 30.11.2012 18:53, Laine Stump wrote:
On 11/19/2012 11:51 AM, Michal Privoznik wrote:
These set bridge part of QoS when bringing domain's interface up. Long story short, if there's a 'floor' set, a new QoS class is created. ClassID MUST be unique within the bridge and should be kept for unplug phase. --- po/POTFILES.in | 1 + src/util/virnetdevbandwidth.c | 178 +++++++++++++++++++++++++++++++++++++++++ src/util/virnetdevbandwidth.h | 14 +++ 3 files changed, 193 insertions(+), 0 deletions(-)
diff --git a/po/POTFILES.in b/po/POTFILES.in index 9768528..f51e2c7 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -154,6 +154,7 @@ src/util/virhash.c src/util/virkeyfile.c src/util/virlockspace.c src/util/virnetdev.c +src/util/virnetdevbandwidth.c src/util/virnetdevbridge.c src/util/virnetdevmacvlan.c src/util/virnetdevopenvswitch.c diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index fcc6b56..9597dcd 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -285,3 +285,181 @@ virNetDevBandwidthEqual(virNetDevBandwidthPtr a,
return true; } + +/* + * virNetDevBandwidthPlug: + * @brname: name of the bridge + * @net_bandwidth: QoS settings on @brname + * @ifname: interface being plugged into @brname + * @ifmac: MAC of @ifname + * @bandwidth: QoS settings for @ifname + * @id: unique ID (MUST be greater than 2)
* If it must be > 2, then you need to check for that at the top of the function.
+ * + * Set bridge part of interface QoS settings, e.g. guaranteed + * bandwidth. @id is an unique ID (among @brname) from which + * other identifiers for class, qdisc and filter are derived. + * However, two classes were already set up (by + * virNetDevBandwidthSet). That's why this @id MUST be greater + * than 2. You may want to keep passed @id, as it is used later + * by virNetDevBandwidthUnplug. + * + * Returns: + * 1 if there is nothing to set + * 0 if QoS set successfully + * -1 otherwise. + */ +int +virNetDevBandwidthPlug(const char *brname, + virNetDevBandwidthPtr net_bandwidth, + const char *ifname, + const virMacAddrPtr ifmac_ptr, + virNetDevBandwidthPtr bandwidth, + unsigned int id) +{ + int ret = -1; + virCommandPtr cmd = NULL; + char *class_id = NULL; + char *qdisc_id = NULL; + char *filter_id = NULL; + char *floor = NULL; + char *ceil = NULL; + unsigned char ifmac[VIR_MAC_BUFLEN]; + char *mac[2]; + + if (!bandwidth || !bandwidth->in || !bandwidth->in->floor) { + /* nothing to set */ + return 1; + } + + if (!net_bandwidth || !net_bandwidth->in) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Bridge '%s' has no QoS set, therefore " + "unable to set 'floor' on '%s'"), + brname, ifname); + return -1; + } + + virMacAddrGetRaw(ifmac_ptr, ifmac); + + if (virAsprintf(&class_id, "1:%x", id) < 0 || + virAsprintf(&qdisc_id, "%x:", id) < 0 || + virAsprintf(&filter_id, "%u", id) < 0 || + virAsprintf(&mac[0], "0x%02x%02x%02x%02x", ifmac[2], + ifmac[3], ifmac[4], ifmac[5]) < 0 || + virAsprintf(&mac[1], "0x%02x%02x", ifmac[0], ifmac[1]) < 0 || + virAsprintf(&floor, "%llukbps", bandwidth->in->floor) < 0 || + virAsprintf(&ceil, "%llukbps", net_bandwidth->in->peak ? + net_bandwidth->in->peak : + net_bandwidth->in->average) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "add", "dev", brname, "parent", "1:1", + "classid", class_id, "htb", "rate", floor, + "ceil", ceil, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "add", "dev", brname, "parent", + class_id, "handle", qdisc_id, "sfq", "perturb", + "10", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + /* Okay, this not nice. But since libvirt does not know anything about + * interface IP address(es), and tc fw filter simply refuse to use ebtables + * marks, we need to use u32 selector to match MAC address. + * If libvirt will ever know something, remove this FIXME
Heck, maybe not even then - don't forget about IPv6, along with the fact that I'm not convinced the host can ever know all the IP addresses used by an untrusted guest with 100% certainty (unless we filter for them I suppose).
+ */ + virCommandAddArgList(cmd, "filter", "add", "dev", brname, "protocol", "ip", + "prio", filter_id, "u32", + "match", "u16", "0x0800", "0xffff", "at", "-2", + "match", "u32", mac[0], "0xffffffff", "at", "-12", + "match", "u16", mac[1], "0xffff", "at", "-14", + "flowid", class_id, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + ret = 0; + +cleanup: + VIR_FREE(ceil); + VIR_FREE(floor); + VIR_FREE(mac[1]); + VIR_FREE(mac[0]);
* If the expression in the if() statement setting mac[0] and mac[1] short circuits before setting them, they will be uninitialized. (that would only happen in case of OOM, but it's still not nice). You should initialize them with "= {NULL, NULL}"
+ VIR_FREE(filter_id); + VIR_FREE(qdisc_id); + VIR_FREE(class_id); + virCommandFree(cmd);
It would be simpler to visually verify everything is being cleaned up if these were in either exactly the same order, or exactly the opposite order they were defined in.
+ return ret; +} + +/* + * virNetDevBandwidthUnplug: + * @brname: from which bridge are we unplugging + * @id: unique identifier (MUST be greater than 2) + * + * Remove QoS settings from bridge. + * + * Returns 0 on success, -1 otherwise. + */ +int +virNetDevBandwidthUnplug(const char *brname, + unsigned int id)
As I'm looking at all these uses of id's, I'm wondering if there's any danger of namespace conflict with other users of tc. (This isn't any critique, just curiousity).
No, class ID is specific within an NIC. That is, classID of 3 on eth0 doesn't interfere with class on vnet0. In fact, they have nothing in common. What we could interfere with is - if user sets something afterwards on fully libvirt managed interface. But I guess we don't support this, right? It's all or nothing approach to me.
+{ + int ret = -1; + int cmd_ret = 0; + virCommandPtr cmd = NULL; + char *class_id = NULL; + char *qdisc_id = NULL; + char *filter_id = NULL; + + if (virAsprintf(&class_id, "1:%x", id) < 0 || + virAsprintf(&qdisc_id, "%x:", id) < 0 || + virAsprintf(&filter_id, "%u", id) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "del", "dev", brname, + "handle", qdisc_id, NULL); + + /* Don't threat tc errors as fatal, but + * try to remove as much as possible */
What's your definition of "fatal"? In this case, if tc fails you return -1, not 0.
No, the return value of tc is stored into cmd_ret. Since we are not passing a NULL here, virCommandRun should fail if something went really wrong - e.g. OOM, or a pipe couldn't be created, and so on.
+ if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "filter", "del", "dev", brname, + "prio", filter_id, NULL); + + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup;
same here
+ + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "del", "dev", brname, + "classid", class_id, NULL); + + if (virCommandRun(cmd, &cmd_ret) < 0) + goto cleanup;
and here. I don't see you being forgiving at all (is it maybe in the caller that you ignore the return status?)
+ + ret = 0; + +cleanup: + VIR_FREE(filter_id); + VIR_FREE(qdisc_id); + VIR_FREE(class_id); + virCommandFree(cmd); + return ret; +} diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index d308ab2..19eb418 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -24,6 +24,7 @@ # define __VIR_NETDEV_BANDWIDTH_H__
# include "internal.h" +# include "virmacaddr.h"
typedef struct _virNetDevBandwidthRate virNetDevBandwidthRate; typedef virNetDevBandwidthRate *virNetDevBandwidthRatePtr; @@ -53,4 +54,17 @@ int virNetDevBandwidthCopy(virNetDevBandwidthPtr *dest, const virNetDevBandwidth
bool virNetDevBandwidthEqual(virNetDevBandwidthPtr a, virNetDevBandwidthPtr b);
+int virNetDevBandwidthPlug(const char *brname, + virNetDevBandwidthPtr net_bandwidth, + const char *ifname, + const virMacAddrPtr ifmac_ptr, + virNetDevBandwidthPtr bandwidth, + unsigned int id) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_RETURN_CHECK; + +int virNetDevBandwidthUnplug(const char *brname, + unsigned int id) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; + #endif /* __VIR_NETDEV_BANDWIDTH_H__ */
ACK with the points marked as "*" fixed (other suggestions you can take or leave)
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 12/03/2012 10:55 AM, Michal Privoznik wrote:
On 30.11.2012 18:53, Laine Stump wrote:
As I'm looking at all these uses of id's, I'm wondering if there's any danger of namespace conflict with other users of tc. (This isn't any critique, just curiousity). No, class ID is specific within an NIC. That is, classID of 3 on eth0 doesn't interfere with class on vnet0. In fact, they have nothing in common. What we could interfere with is - if user sets something afterwards on fully libvirt managed interface. But I guess we don't support this, right? It's all or nothing approach to me.
Correct. I was only wondering if the use of the same class_id on a different interface would cause interference. If it doesn't then nothing to worry about :-)
+{ + int ret = -1; + int cmd_ret = 0; + virCommandPtr cmd = NULL; + char *class_id = NULL; + char *qdisc_id = NULL; + char *filter_id = NULL; + + if (virAsprintf(&class_id, "1:%x", id) < 0 || + virAsprintf(&qdisc_id, "%x:", id) < 0 || + virAsprintf(&filter_id, "%u", id) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "del", "dev", brname, + "handle", qdisc_id, NULL); + + /* Don't threat tc errors as fatal, but + * try to remove as much as possible */ What's your definition of "fatal"? In this case, if tc fails you return -1, not 0. No, the return value of tc is stored into cmd_ret. Since we are not passing a NULL here, virCommandRun should fail if something went really wrong - e.g. OOM, or a pipe couldn't be created, and so on.
Right. I missed the presence of cmd_ret.

This will be used whenever a NIC with guaranteed throughput is to be plugged into a bridge. It will adjust the average throughput of non guaranteed NICs (classid 1:2) to meet new requirements. --- src/util/virnetdevbandwidth.c | 49 +++++++++++++++++++++++++++++++++++++++++ src/util/virnetdevbandwidth.h | 6 +++++ 2 files changed, 55 insertions(+), 0 deletions(-) diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 9597dcd..3abe7e2 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -463,3 +463,52 @@ cleanup: virCommandFree(cmd); return ret; } + +/** + * virNetDevBandwidthUpdateRate: + * @ifname: interface name + * @classid: ID of class to update + * @new_rate: new rate + * + * This function updates the 'rate' attribute of HTB class. + * It can be used whenever a new interface is plugged to a + * bridge to adjust average throughput of non guaranteed + * NICs. + * + * Returns 0 on success, -1 otherwise. + */ +int +virNetDevBandwidthUpdateRate(const char *ifname, + const char *class_id, + virNetDevBandwidthPtr bandwidth, + unsigned long long new_rate) +{ + int ret = -1; + virCommandPtr cmd = NULL; + char *rate = NULL; + char *ceil = NULL; + + if (virAsprintf(&rate, "%llukbps", new_rate) < 0 || + virAsprintf(&ceil, "%llukbps", bandwidth->in->peak ? + bandwidth->in->peak : + bandwidth->in->average) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "change", "dev", ifname, + "classid", class_id, "htb", "rate", rate, + "ceil", ceil, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + ret = 0; + +cleanup: + virCommandFree(cmd); + VIR_FREE(rate); + VIR_FREE(ceil); + return ret; +} diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index 19eb418..a5d595e 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -67,4 +67,10 @@ int virNetDevBandwidthUnplug(const char *brname, unsigned int id) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; +int virNetDevBandwidthUpdateRate(const char *ifname, + const char *class_id, + virNetDevBandwidthPtr bandwidth, + unsigned long long new_rate) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_RETURN_CHECK; #endif /* __VIR_NETDEV_BANDWIDTH_H__ */ -- 1.7.8.6

On 11/19/2012 11:51 AM, Michal Privoznik wrote:
This will be used whenever a NIC with guaranteed throughput is to be plugged into a bridge. It will adjust the average throughput of non guaranteed NICs (classid 1:2) to meet new requirements. --- src/util/virnetdevbandwidth.c | 49 +++++++++++++++++++++++++++++++++++++++++ src/util/virnetdevbandwidth.h | 6 +++++ 2 files changed, 55 insertions(+), 0 deletions(-)
diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 9597dcd..3abe7e2 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -463,3 +463,52 @@ cleanup: virCommandFree(cmd); return ret; } + +/** + * virNetDevBandwidthUpdateRate: + * @ifname: interface name + * @classid: ID of class to update + * @new_rate: new rate + * + * This function updates the 'rate' attribute of HTB class. + * It can be used whenever a new interface is plugged to a + * bridge to adjust average throughput of non guaranteed + * NICs.
So it's changing the "average" for that NIC to something other than what was configured? Is a record of this kept anywhere? (or maybe it's not necessary).
+ * + * Returns 0 on success, -1 otherwise. + */ +int +virNetDevBandwidthUpdateRate(const char *ifname, + const char *class_id, + virNetDevBandwidthPtr bandwidth, + unsigned long long new_rate) +{ + int ret = -1; + virCommandPtr cmd = NULL; + char *rate = NULL; + char *ceil = NULL; + + if (virAsprintf(&rate, "%llukbps", new_rate) < 0 || + virAsprintf(&ceil, "%llukbps", bandwidth->in->peak ? + bandwidth->in->peak : + bandwidth->in->average) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "change", "dev", ifname, + "classid", class_id, "htb", "rate", rate, + "ceil", ceil, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + ret = 0; + +cleanup: + virCommandFree(cmd); + VIR_FREE(rate); + VIR_FREE(ceil); + return ret; +} diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index 19eb418..a5d595e 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -67,4 +67,10 @@ int virNetDevBandwidthUnplug(const char *brname, unsigned int id) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK;
+int virNetDevBandwidthUpdateRate(const char *ifname, + const char *class_id, + virNetDevBandwidthPtr bandwidth, + unsigned long long new_rate) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_RETURN_CHECK; #endif /* __VIR_NETDEV_BANDWIDTH_H__ */
ACK.

On 30.11.2012 18:56, Laine Stump wrote:
On 11/19/2012 11:51 AM, Michal Privoznik wrote:
This will be used whenever a NIC with guaranteed throughput is to be plugged into a bridge. It will adjust the average throughput of non guaranteed NICs (classid 1:2) to meet new requirements. --- src/util/virnetdevbandwidth.c | 49 +++++++++++++++++++++++++++++++++++++++++ src/util/virnetdevbandwidth.h | 6 +++++ 2 files changed, 55 insertions(+), 0 deletions(-)
diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 9597dcd..3abe7e2 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -463,3 +463,52 @@ cleanup: virCommandFree(cmd); return ret; } + +/** + * virNetDevBandwidthUpdateRate: + * @ifname: interface name + * @classid: ID of class to update + * @new_rate: new rate + * + * This function updates the 'rate' attribute of HTB class. + * It can be used whenever a new interface is plugged to a + * bridge to adjust average throughput of non guaranteed + * NICs.
So it's changing the "average" for that NIC to something other than what was configured? Is a record of this kept anywhere? (or maybe it's not necessary).
Right, but it will be used only for changing the average of the first child class on the bridge, that is 1:2 which is the class where all not guaranteed traffic is sent. And yeah - I keep track of it in the next patches - it's called floor_sum.
+ * + * Returns 0 on success, -1 otherwise. + */ +int +virNetDevBandwidthUpdateRate(const char *ifname, + const char *class_id, + virNetDevBandwidthPtr bandwidth, + unsigned long long new_rate) +{ + int ret = -1; + virCommandPtr cmd = NULL; + char *rate = NULL; + char *ceil = NULL; + + if (virAsprintf(&rate, "%llukbps", new_rate) < 0 || + virAsprintf(&ceil, "%llukbps", bandwidth->in->peak ? + bandwidth->in->peak : + bandwidth->in->average) < 0) { + virReportOOMError(); + goto cleanup; + } + + cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "class", "change", "dev", ifname, + "classid", class_id, "htb", "rate", rate, + "ceil", ceil, NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + ret = 0; + +cleanup: + virCommandFree(cmd); + VIR_FREE(rate); + VIR_FREE(ceil); + return ret; +} diff --git a/src/util/virnetdevbandwidth.h b/src/util/virnetdevbandwidth.h index 19eb418..a5d595e 100644 --- a/src/util/virnetdevbandwidth.h +++ b/src/util/virnetdevbandwidth.h @@ -67,4 +67,10 @@ int virNetDevBandwidthUnplug(const char *brname, unsigned int id) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK;
+int virNetDevBandwidthUpdateRate(const char *ifname, + const char *class_id, + virNetDevBandwidthPtr bandwidth, + unsigned long long new_rate) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_RETURN_CHECK; #endif /* __VIR_NETDEV_BANDWIDTH_H__ */
ACK.
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

Network should be notified if we plug in or unplug an interface, so it can perform some action, e.g. set/unset network part of QoS. --- src/Makefile.am | 7 ++- src/conf/domain_conf.h | 1 + src/conf/network_conf.c | 1 + src/conf/network_conf.h | 3 + src/libvirt_network.syms | 8 ++ src/network/bridge_driver.c | 165 +++++++++++++++++++++++++++++++++++++++++++ src/network/bridge_driver.h | 10 ++- src/qemu/qemu_command.c | 32 ++++++--- src/qemu/qemu_process.c | 12 +++ 9 files changed, 225 insertions(+), 14 deletions(-) create mode 100644 src/libvirt_network.syms diff --git a/src/Makefile.am b/src/Makefile.am index 1f32263..9b14ef8 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -1366,6 +1366,10 @@ if WITH_ATOMIC_OPS_PTHREAD USED_SYM_FILES += libvirt_atomic.syms endif +if WITH_NETWORK +USED_SYM_FILES += libvirt_network.syms +endif + EXTRA_DIST += \ libvirt_public.syms \ libvirt_private.syms \ @@ -1379,7 +1383,8 @@ EXTRA_DIST += \ libvirt_sasl.syms \ libvirt_vmx.syms \ libvirt_xenxs.syms \ - libvirt_libssh2.syms + libvirt_libssh2.syms \ + libvirt_network.syms GENERATED_SYM_FILES = libvirt.syms libvirt.def libvirt_qemu.def diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 091879e..09c705a 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -862,6 +862,7 @@ struct _virDomainNetDef { virNetDevBandwidthPtr bandwidth; virNetDevVlan vlan; int linkstate; + unsigned int class_id; /* Class ID for 'floor' */ }; /* Used for prefix of ifname of any network name generated dynamically diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 41831e0..80189e6 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -318,6 +318,7 @@ virNetworkAssignDef(virNetworkObjListPtr nets, } virNetworkObjLock(network); network->def = def; + network->class_id = 3; /* next free class id */ nets->objs[nets->count] = network; nets->count++; diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index 3e46304..8a8d1e4 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -221,6 +221,9 @@ struct _virNetworkObj { virNetworkDefPtr def; /* The current definition */ virNetworkDefPtr newDef; /* New definition to activate at shutdown */ + + unsigned int class_id; /* next class ID for QoS */ + unsigned long long floor_sum; /* sum of all 'floor'-s of attached NICs */ }; typedef struct _virNetworkObjList virNetworkObjList; diff --git a/src/libvirt_network.syms b/src/libvirt_network.syms new file mode 100644 index 0000000..14f76ec --- /dev/null +++ b/src/libvirt_network.syms @@ -0,0 +1,8 @@ +# +# Network-specific symbols +# + +# bridge_driver.h +virNetDevBandwidthPlug; +virNetDevBandwidthUnplug; +virNetDevBandwidthUpdateRate; diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 256b601..44fa56e 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -4192,3 +4192,168 @@ cleanup: error: goto cleanup; } + +/** + * networkCheckBandwidth: + * @net: network QoS + * @iface: interface QoS + * @new_rate: new rate for non guaranteed class + * + * Returns: -1 if plugging would overcommit network QoS + * 0 if plugging is safe (@new_rate updated) + * 1 if no QoS is set (@new_rate untouched) + */ +static int +networkCheckBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface, + unsigned long long *new_rate) +{ + int ret = -1; + virNetDevBandwidthPtr netBand = net->def->bandwidth; + virNetDevBandwidthPtr ifaceBand = iface->bandwidth; + unsigned long long tmp_floor_sum = net->floor_sum; + unsigned long long tmp_new_rate = 0; + + if (!ifaceBand || !ifaceBand->in || !ifaceBand->in->floor || + !netBand || !netBand->in) + return 1; + + tmp_new_rate = netBand->in->average; + tmp_floor_sum += ifaceBand->in->floor; + + /* check against peak */ + if (netBand->in->peak) { + tmp_new_rate = netBand->in->peak; + if (tmp_floor_sum > netBand->in->peak) { + virReportError(VIR_ERR_OPERATION_INVALID, + _("Cannot plug '%s' interface into '%s' because it " + "would overcommit 'peak' on network '%s'"), + iface->ifname, + net->def->bridge, + net->def->name); + goto cleanup; + } + } else if (tmp_floor_sum > netBand->in->average) { + /* tmp_floor_sum can be between 'average' and 'peak' iff 'peak' is set. + * Otherwise, tmp_floor_sum must be below 'average'. */ + virReportError(VIR_ERR_OPERATION_INVALID, + _("Cannot plug '%s' interface into '%s' because it " + "would overcommit 'average' on network '%s'"), + iface->ifname, + net->def->bridge, + net->def->name); + goto cleanup; + } + + *new_rate = tmp_new_rate; + ret = 0; + +cleanup: + return ret; +} + +int +networkNotifyPlug(virNetworkPtr network, + virDomainNetDefPtr iface) +{ + struct network_driver *driver = network->conn->networkPrivateData; + virNetworkObjPtr net = NULL; + int ret = -1; + int plug_ret; + unsigned long long new_rate = 0; + + networkDriverLock(driver); + net = virNetworkFindByUUID(&driver->networks, network->uuid); + networkDriverUnlock(driver); + + if ((plug_ret = networkCheckBandwidth(net, iface, &new_rate)) < 0) { + /* helper reported error */ + goto cleanup; + } + + if (plug_ret > 0) { + /* no QoS needs to be set; claim success */ + ret = 0; + goto cleanup; + } + + plug_ret = virNetDevBandwidthPlug(net->def->bridge, + net->def->bandwidth, + iface->ifname, + &iface->mac, + iface->bandwidth, + net->class_id); + if (plug_ret < 0) { + ignore_value(virNetDevBandwidthUnplug(net->def->bridge, + net->class_id)); + goto cleanup; + } + + if (plug_ret > 0) { + /* Well, this is embarrassing. The networkCheckBandwidth helper + * says we need to set a QoS, but virNetDevBandwidthPlug says + * we don't need to. It's almost certainly a bug then. */ + VIR_WARN("Something strange happened. You may want to upgrade libvirt"); + ret = 0; + goto cleanup; + } + + /* QoS was set, generate new class ID */ + iface->class_id = net->class_id; + net->class_id++; + /* update sum of 'floor'-s of attached NICs */ + net->floor_sum += iface->bandwidth->in->floor; + /* update rate for non guaranteed NICs */ + new_rate -= net->floor_sum; + if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", + net->def->bandwidth, new_rate) < 0) + VIR_WARN("Unable to update rate for 1:2 class on %s bridge", + net->def->bridge); + + ret = 0; + +cleanup: + if (net) + virNetworkObjUnlock(net); + return ret; +} + +int +networkNotifyUnplug(virDomainNetDefPtr iface) +{ + struct network_driver *driver = driverState; + virNetworkObjPtr net = NULL; + int ret = 0; + unsigned long long new_rate; + + networkDriverLock(driver); + net = virNetworkFindByName(&driver->networks, iface->data.network.name); + networkDriverUnlock(driver); + + if (iface->class_id) { + /* we must remove class from bridge */ + new_rate = net->def->bandwidth->in->average; + + if (net->def->bandwidth->in->peak > 0) + new_rate = net->def->bandwidth->in->peak; + + ret = virNetDevBandwidthUnplug(net->def->bridge, iface->class_id); + if (ret < 0) + goto cleanup; + /* update sum of 'floor'-s of attached NICs */ + net->floor_sum -= iface->bandwidth->in->floor; + /* update rate for non guaranteed NICs */ + new_rate -= net->floor_sum; + if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", + net->def->bandwidth, new_rate) < 0) + VIR_WARN("Unable to update rate for 1:2 class on %s bridge", + net->def->bridge); + /* no class is associated any longer */ + iface->class_id = 0; + } + +cleanup: + if (net) + virNetworkObjUnlock(net); + return ret; +} diff --git a/src/network/bridge_driver.h b/src/network/bridge_driver.h index 0fae275..2f4b2e4 100644 --- a/src/network/bridge_driver.h +++ b/src/network/bridge_driver.h @@ -48,8 +48,12 @@ int networkGetNetworkAddress(const char *netname, char **netaddr) int networkBuildDhcpDaemonCommandLine(virNetworkObjPtr network, virCommandPtr *cmdout, char *pidfile, - dnsmasqContext *dctx) - ; + dnsmasqContext *dctx); +int networkNotifyPlug(virNetworkPtr network, + virDomainNetDefPtr iface) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int networkNotifyUnplug(virDomainNetDefPtr iface) + ATTRIBUTE_NONNULL(1); # else /* Define no-op replacements that don't drag in any link dependencies. */ # define networkAllocateActualDevice(iface) 0 @@ -57,6 +61,8 @@ int networkBuildDhcpDaemonCommandLine(virNetworkObjPtr network, # define networkReleaseActualDevice(iface) 0 # define networkGetNetworkAddress(netname, netaddr) (-2) # define networkBuildDhcpDaemonCommandLine(network, cmdout, pidfile, dctx) 0 +# define networkNotifyPlug(network, iface) 0 +# define networkNotifyUnplug(network, iface) 0 # endif typedef char *(*networkDnsmasqLeaseFileNameFunc)(const char *netname); diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index c1f8680..efe98b3 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -207,12 +207,13 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, unsigned int tap_create_flags = VIR_NETDEV_TAP_CREATE_IFUP; bool template_ifname = false; int actualType = virDomainNetGetActualType(net); + virNetworkPtr network = NULL; + virErrorPtr errobj = NULL; if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK) { int active, fail = 0; - virErrorPtr errobj; - virNetworkPtr network = virNetworkLookupByName(conn, - net->data.network.name); + network = virNetworkLookupByName(conn, + net->data.network.name); if (!network) return -1; @@ -232,14 +233,14 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, fail = 1; } - /* Make sure any above failure is preserved */ - errobj = virSaveLastError(); - virNetworkFree(network); - virSetError(errobj); - virFreeError(errobj); - - if (fail) + if (fail) { + /* Make sure any above failure is preserved */ + errobj = virSaveLastError(); + virNetworkFree(network); + virSetError(errobj); + virFreeError(errobj); return -1; + } } else if (actualType == VIR_DOMAIN_NET_TYPE_BRIDGE) { if (!(brname = strdup(virDomainNetGetActualBridgeName(net)))) { @@ -301,6 +302,12 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, goto cleanup; } + if (tapfd >= 0 && network && + networkNotifyPlug(network, net) < 0) { + VIR_FORCE_CLOSE(tapfd); + goto cleanup; + } + if (tapfd >= 0) { if ((net->filter) && (net->ifname)) { if (virDomainConfNWFilterInstantiate(conn, def->uuid, net) < 0) @@ -310,7 +317,10 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, cleanup: VIR_FREE(brname); - + errobj = virSaveLastError(); + virNetworkFree(network); + virSetError(errobj); + virFreeError(errobj); return tapfd; } diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 29b7ae1..6ff065c 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -4047,6 +4047,18 @@ void qemuProcessStop(struct qemud_driver *driver, } } + for (i = 0; i < vm->def->nnets; i++) { + virDomainNetDefPtr net = vm->def->nets[i]; + + if (virDomainNetGetActualType(net) != VIR_DOMAIN_NET_TYPE_NETWORK) + continue; + + if (networkNotifyUnplug(net) < 0) { + VIR_WARN("Unable to remove QoS settings for interface '%s'", + net->ifname); + } + } + if (priv->agent) { qemuAgentClose(priv->agent); priv->agent = NULL; -- 1.7.8.6

On 11/19/2012 11:51 AM, Michal Privoznik wrote:
Network should be notified if we plug in or unplug an interface, so it can perform some action, e.g. set/unset network part of QoS. --- src/Makefile.am | 7 ++- src/conf/domain_conf.h | 1 + src/conf/network_conf.c | 1 + src/conf/network_conf.h | 3 + src/libvirt_network.syms | 8 ++ src/network/bridge_driver.c | 165 +++++++++++++++++++++++++++++++++++++++++++ src/network/bridge_driver.h | 10 ++- src/qemu/qemu_command.c | 32 ++++++--- src/qemu/qemu_process.c | 12 +++ 9 files changed, 225 insertions(+), 14 deletions(-) create mode 100644 src/libvirt_network.syms
diff --git a/src/Makefile.am b/src/Makefile.am index 1f32263..9b14ef8 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -1366,6 +1366,10 @@ if WITH_ATOMIC_OPS_PTHREAD USED_SYM_FILES += libvirt_atomic.syms endif
+if WITH_NETWORK +USED_SYM_FILES += libvirt_network.syms +endif + EXTRA_DIST += \ libvirt_public.syms \ libvirt_private.syms \ @@ -1379,7 +1383,8 @@ EXTRA_DIST += \ libvirt_sasl.syms \ libvirt_vmx.syms \ libvirt_xenxs.syms \ - libvirt_libssh2.syms + libvirt_libssh2.syms \ + libvirt_network.syms
GENERATED_SYM_FILES = libvirt.syms libvirt.def libvirt_qemu.def
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 091879e..09c705a 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -862,6 +862,7 @@ struct _virDomainNetDef { virNetDevBandwidthPtr bandwidth; virNetDevVlan vlan; int linkstate; + unsigned int class_id; /* Class ID for 'floor' */ };
/* Used for prefix of ifname of any network name generated dynamically diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 41831e0..80189e6 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -318,6 +318,7 @@ virNetworkAssignDef(virNetworkObjListPtr nets, } virNetworkObjLock(network); network->def = def; + network->class_id = 3; /* next free class id */
I don't really like the "magic number" characteristic of the 3 here. Can we give this a #define or something?
nets->objs[nets->count] = network; nets->count++; diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index 3e46304..8a8d1e4 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -221,6 +221,9 @@ struct _virNetworkObj {
virNetworkDefPtr def; /* The current definition */ virNetworkDefPtr newDef; /* New definition to activate at shutdown */ + + unsigned int class_id; /* next class ID for QoS */
Okay, so this is just a monotonically increasing counter so that each interface gets a new and different class_id. Maybe you should call it "nextFreeClassID" or something like that, so that everyone understands it's not a class id used by the network. Or you might want to do this with a bitmap so you can re-use id's of interfaces that are disconnected. (can virBitmaps being dynamically expanded?)
+ unsigned long long floor_sum; /* sum of all 'floor'-s of attached NICs */
And you keep track of the sum of all floors here. So, what happens if libvirtd is restarted? It looks like they both go back to their initial values. And what about hotplug? This is a similar problem to the pool of interfaces used by macvtap/hostdev modes - a network with an interface pool needs to have the "inuse" counters of each interface refreshed whenever libvirtd restarts. So the necessary hooks are already in place: networkAllocateActualDevice - called any time an interface with type='network' is initially allocated and connected. networkNotifyActualDevice - called for each type='network' interface of each active domain any time libvirtd restarts networkReleaseActualDevice - called for every type='network' interface as it is being disconnected and destroyed. These are called for *all* type='network' interfaces, not just those that need to allocate a physical netdev from a pool. Rather than adding your own new hooks (and figuring out all the places they need to be called), I think it would be better to use the existing hooks (perhaps calling a reduced/renamed version of your functions, which can possibly be moved over to ). That will have two advantages: 1) It will assure that the bandwidth functions are called at all the right times, including hotplug/unplug, and libvirtd restart 2) It will continue the process of consolidating all network-related functionality need for these three events into 3 functions which may some day be moved into their own daemom with a public (within libvirt anyway) API accessible via RPC (thus allowing non-privileged libvirts to have full networking functionality). Note that you will probably want to save the interface class_id in the iface->actual (as a matter of fact, in iface->data.network.actual->bandwidth, which can be retrieved with virDomainNetGetActualBandwidth()) so that it's saved properly in the interface's state without polluting the interface's config. Then it will be read back from the state file during the restart and available during networkNotifyActualDevice() of each interface. I guess you can then re-set the network->class_id by checking interface->class_id and setting network->class_id = MAX(network->class_id, iface->class_id + 1); for each encountered interface (or add it to a bitmap, if a) bitmaps can be enlarged dynamically or b) we can determine a reasonable maximum limit on number of active domains). At the same time, you'll recompute the network->floor_sum iteratively as the network is called with a notify for each interface.
};
typedef struct _virNetworkObjList virNetworkObjList; diff --git a/src/libvirt_network.syms b/src/libvirt_network.syms new file mode 100644 index 0000000..14f76ec --- /dev/null +++ b/src/libvirt_network.syms @@ -0,0 +1,8 @@ +# +# Network-specific symbols +# + +# bridge_driver.h +virNetDevBandwidthPlug; +virNetDevBandwidthUnplug; +virNetDevBandwidthUpdateRate; diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 256b601..44fa56e 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -4192,3 +4192,168 @@ cleanup: error: goto cleanup; } + +/** + * networkCheckBandwidth: + * @net: network QoS + * @iface: interface QoS + * @new_rate: new rate for non guaranteed class + * + * Returns: -1 if plugging would overcommit network QoS + * 0 if plugging is safe (@new_rate updated) + * 1 if no QoS is set (@new_rate untouched) + */ +static int +networkCheckBandwidth(virNetworkObjPtr net, + virDomainNetDefPtr iface, + unsigned long long *new_rate) +{ + int ret = -1; + virNetDevBandwidthPtr netBand = net->def->bandwidth; + virNetDevBandwidthPtr ifaceBand = iface->bandwidth; + unsigned long long tmp_floor_sum = net->floor_sum; + unsigned long long tmp_new_rate = 0; + + if (!ifaceBand || !ifaceBand->in || !ifaceBand->in->floor || + !netBand || !netBand->in) + return 1; + + tmp_new_rate = netBand->in->average; + tmp_floor_sum += ifaceBand->in->floor; + + /* check against peak */ + if (netBand->in->peak) { + tmp_new_rate = netBand->in->peak; + if (tmp_floor_sum > netBand->in->peak) { + virReportError(VIR_ERR_OPERATION_INVALID, + _("Cannot plug '%s' interface into '%s' because it " + "would overcommit 'peak' on network '%s'"), + iface->ifname, + net->def->bridge, + net->def->name); + goto cleanup; + } + } else if (tmp_floor_sum > netBand->in->average) { + /* tmp_floor_sum can be between 'average' and 'peak' iff 'peak' is set. + * Otherwise, tmp_floor_sum must be below 'average'. */ + virReportError(VIR_ERR_OPERATION_INVALID, + _("Cannot plug '%s' interface into '%s' because it " + "would overcommit 'average' on network '%s'"), + iface->ifname, + net->def->bridge, + net->def->name); + goto cleanup; + } + + *new_rate = tmp_new_rate; + ret = 0; + +cleanup: + return ret; +} + +int +networkNotifyPlug(virNetworkPtr network, + virDomainNetDefPtr iface) +{ + struct network_driver *driver = network->conn->networkPrivateData; + virNetworkObjPtr net = NULL; + int ret = -1; + int plug_ret; + unsigned long long new_rate = 0; + + networkDriverLock(driver); + net = virNetworkFindByUUID(&driver->networks, network->uuid); + networkDriverUnlock(driver); + + if ((plug_ret = networkCheckBandwidth(net, iface, &new_rate)) < 0) { + /* helper reported error */ + goto cleanup; + } + + if (plug_ret > 0) { + /* no QoS needs to be set; claim success */ + ret = 0; + goto cleanup; + } + + plug_ret = virNetDevBandwidthPlug(net->def->bridge, + net->def->bandwidth, + iface->ifname, + &iface->mac, + iface->bandwidth, + net->class_id); + if (plug_ret < 0) { + ignore_value(virNetDevBandwidthUnplug(net->def->bridge, + net->class_id)); + goto cleanup; + } + + if (plug_ret > 0) { + /* Well, this is embarrassing. The networkCheckBandwidth helper + * says we need to set a QoS, but virNetDevBandwidthPlug says + * we don't need to. It's almost certainly a bug then. */ + VIR_WARN("Something strange happened. You may want to upgrade libvirt"); + ret = 0; + goto cleanup; + } + + /* QoS was set, generate new class ID */ + iface->class_id = net->class_id; + net->class_id++; + /* update sum of 'floor'-s of attached NICs */ + net->floor_sum += iface->bandwidth->in->floor; + /* update rate for non guaranteed NICs */ + new_rate -= net->floor_sum; + if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", + net->def->bandwidth, new_rate) < 0) + VIR_WARN("Unable to update rate for 1:2 class on %s bridge", + net->def->bridge); + + ret = 0; + +cleanup: + if (net) + virNetworkObjUnlock(net); + return ret; +} + +int +networkNotifyUnplug(virDomainNetDefPtr iface) +{ + struct network_driver *driver = driverState; + virNetworkObjPtr net = NULL; + int ret = 0; + unsigned long long new_rate; + + networkDriverLock(driver); + net = virNetworkFindByName(&driver->networks, iface->data.network.name); + networkDriverUnlock(driver); + + if (iface->class_id) { + /* we must remove class from bridge */ + new_rate = net->def->bandwidth->in->average; + + if (net->def->bandwidth->in->peak > 0) + new_rate = net->def->bandwidth->in->peak; + + ret = virNetDevBandwidthUnplug(net->def->bridge, iface->class_id); + if (ret < 0) + goto cleanup; + /* update sum of 'floor'-s of attached NICs */ + net->floor_sum -= iface->bandwidth->in->floor; + /* update rate for non guaranteed NICs */ + new_rate -= net->floor_sum; + if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", + net->def->bandwidth, new_rate) < 0) + VIR_WARN("Unable to update rate for 1:2 class on %s bridge", + net->def->bridge); + /* no class is associated any longer */ + iface->class_id = 0; + } + +cleanup: + if (net) + virNetworkObjUnlock(net); + return ret; +} diff --git a/src/network/bridge_driver.h b/src/network/bridge_driver.h index 0fae275..2f4b2e4 100644 --- a/src/network/bridge_driver.h +++ b/src/network/bridge_driver.h @@ -48,8 +48,12 @@ int networkGetNetworkAddress(const char *netname, char **netaddr)
int networkBuildDhcpDaemonCommandLine(virNetworkObjPtr network, virCommandPtr *cmdout, char *pidfile, - dnsmasqContext *dctx) - ; + dnsmasqContext *dctx); +int networkNotifyPlug(virNetworkPtr network, + virDomainNetDefPtr iface) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int networkNotifyUnplug(virDomainNetDefPtr iface) + ATTRIBUTE_NONNULL(1); # else /* Define no-op replacements that don't drag in any link dependencies. */ # define networkAllocateActualDevice(iface) 0 @@ -57,6 +61,8 @@ int networkBuildDhcpDaemonCommandLine(virNetworkObjPtr network, # define networkReleaseActualDevice(iface) 0 # define networkGetNetworkAddress(netname, netaddr) (-2) # define networkBuildDhcpDaemonCommandLine(network, cmdout, pidfile, dctx) 0 +# define networkNotifyPlug(network, iface) 0 +# define networkNotifyUnplug(network, iface) 0 # endif
typedef char *(*networkDnsmasqLeaseFileNameFunc)(const char *netname); diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index c1f8680..efe98b3 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -207,12 +207,13 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, unsigned int tap_create_flags = VIR_NETDEV_TAP_CREATE_IFUP; bool template_ifname = false; int actualType = virDomainNetGetActualType(net); + virNetworkPtr network = NULL; + virErrorPtr errobj = NULL;
if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK) { int active, fail = 0; - virErrorPtr errobj; - virNetworkPtr network = virNetworkLookupByName(conn, - net->data.network.name); + network = virNetworkLookupByName(conn, + net->data.network.name); if (!network) return -1;
@@ -232,14 +233,14 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, fail = 1; }
- /* Make sure any above failure is preserved */ - errobj = virSaveLastError(); - virNetworkFree(network); - virSetError(errobj); - virFreeError(errobj); - - if (fail) + if (fail) { + /* Make sure any above failure is preserved */ + errobj = virSaveLastError(); + virNetworkFree(network); + virSetError(errobj); + virFreeError(errobj); return -1; + }
} else if (actualType == VIR_DOMAIN_NET_TYPE_BRIDGE) { if (!(brname = strdup(virDomainNetGetActualBridgeName(net)))) { @@ -301,6 +302,12 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, goto cleanup; }
+ if (tapfd >= 0 && network && + networkNotifyPlug(network, net) < 0) { + VIR_FORCE_CLOSE(tapfd); + goto cleanup; + } + if (tapfd >= 0) { if ((net->filter) && (net->ifname)) { if (virDomainConfNWFilterInstantiate(conn, def->uuid, net) < 0) @@ -310,7 +317,10 @@ qemuNetworkIfaceConnect(virDomainDefPtr def,
cleanup: VIR_FREE(brname); - + errobj = virSaveLastError(); + virNetworkFree(network); + virSetError(errobj); + virFreeError(errobj); return tapfd; }
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 29b7ae1..6ff065c 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -4047,6 +4047,18 @@ void qemuProcessStop(struct qemud_driver *driver, } }
+ for (i = 0; i < vm->def->nnets; i++) { + virDomainNetDefPtr net = vm->def->nets[i]; + + if (virDomainNetGetActualType(net) != VIR_DOMAIN_NET_TYPE_NETWORK) + continue; + + if (networkNotifyUnplug(net) < 0) { + VIR_WARN("Unable to remove QoS settings for interface '%s'", + net->ifname); + } + } + if (priv->agent) { qemuAgentClose(priv->agent); priv->agent = NULL;
To summarize: I think this needs to be refactored to be integrated into the network*ActualDevice() functions so that the bookkeeping is properly handled in all cases, including hotplug and libvirtd restart.

On 30.11.2012 19:43, Laine Stump wrote:
On 11/19/2012 11:51 AM, Michal Privoznik wrote:
Network should be notified if we plug in or unplug an interface, so it can perform some action, e.g. set/unset network part of QoS. --- src/Makefile.am | 7 ++- src/conf/domain_conf.h | 1 + src/conf/network_conf.c | 1 + src/conf/network_conf.h | 3 + src/libvirt_network.syms | 8 ++ src/network/bridge_driver.c | 165 +++++++++++++++++++++++++++++++++++++++++++ src/network/bridge_driver.h | 10 ++- src/qemu/qemu_command.c | 32 ++++++--- src/qemu/qemu_process.c | 12 +++ 9 files changed, 225 insertions(+), 14 deletions(-) create mode 100644 src/libvirt_network.syms
diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index 3e46304..8a8d1e4 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -221,6 +221,9 @@ struct _virNetworkObj {
virNetworkDefPtr def; /* The current definition */ virNetworkDefPtr newDef; /* New definition to activate at shutdown */ + + unsigned int class_id; /* next class ID for QoS */
Okay, so this is just a monotonically increasing counter so that each interface gets a new and different class_id. Maybe you should call it "nextFreeClassID" or something like that, so that everyone understands it's not a class id used by the network. Or you might want to do this with a bitmap so you can re-use id's of interfaces that are disconnected. (can virBitmaps being dynamically expanded?)
+ unsigned long long floor_sum; /* sum of all 'floor'-s of attached NICs */
And you keep track of the sum of all floors here.
So, what happens if libvirtd is restarted? It looks like they both go back to their initial values.
We think alike :) In the next patches I am fixing this. class_id becomes a virBitmap and among with floor_sum gets stored into network status XML. I've split it into several patches because changes are big (in number of lines at least) and I think this way it's easier to review.
And what about hotplug?
This is a similar problem to the pool of interfaces used by macvtap/hostdev modes - a network with an interface pool needs to have the "inuse" counters of each interface refreshed whenever libvirtd restarts. So the necessary hooks are already in place:
networkAllocateActualDevice - called any time an interface with type='network' is initially allocated and connected.
networkNotifyActualDevice - called for each type='network' interface of each active domain any time libvirtd restarts
networkReleaseActualDevice - called for every type='network' interface as it is being disconnected and destroyed.
These are called for *all* type='network' interfaces, not just those that need to allocate a physical netdev from a pool.
Rather than adding your own new hooks (and figuring out all the places they need to be called), I think it would be better to use the existing hooks (perhaps calling a reduced/renamed version of your functions, which can possibly be moved over to ). That will have two advantages:
1) It will assure that the bandwidth functions are called at all the right times, including hotplug/unplug, and libvirtd restart
2) It will continue the process of consolidating all network-related functionality need for these three events into 3 functions which may some day be moved into their own daemom with a public (within libvirt anyway) API accessible via RPC (thus allowing non-privileged libvirts to have full networking functionality).
Okay, renamed to networkPlugBandwidth and networkUnplugBandwidth and inserted into networkAllocateActualDevice and networkReleaseActualDevice respectively.
Note that you will probably want to save the interface class_id in the iface->actual (as a matter of fact, in iface->data.network.actual->bandwidth, which can be retrieved with virDomainNetGetActualBandwidth()) so that it's saved properly in the interface's state without polluting the interface's config. Then it will be read back from the state file during the restart and available during networkNotifyActualDevice() of each interface. I guess you can then re-set the network->class_id by checking interface->class_id and setting
network->class_id = MAX(network->class_id, iface->class_id + 1);
for each encountered interface (or add it to a bitmap, if a) bitmaps can be enlarged dynamically or b) we can determine a reasonable maximum limit on number of active domains). At the same time, you'll recompute the network->floor_sum iteratively as the network is called with a notify for each interface.
...
To summarize: I think this needs to be refactored to be integrated into the network*ActualDevice() functions so that the bookkeeping is properly handled in all cases, including hotplug and libvirtd restart.
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

Currently, we are only keeping a inactive XML configuration in status dir. This is no longer enough as we need to keep this class_id attribute so we don't overwrite old entries when the daemon restarts. However, since there has already been a libvirt release which has just <network/> as root element, and we want to keep things compatible, detect that loaded status file is older one, and don't scream about it. --- src/conf/network_conf.c | 199 ++++++++++++++++++++++++++++++++++--------- src/conf/network_conf.h | 2 + src/libvirt_private.syms | 1 + src/network/bridge_driver.c | 9 +-- 4 files changed, 165 insertions(+), 46 deletions(-) diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 80189e6..9c2e4d4 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -1645,6 +1645,78 @@ cleanup: return def; } +int +virNetworkObjUpdateParseFile(const char *filename, + virNetworkObjPtr net) +{ + int ret = -1; + xmlDocPtr xml = NULL; + xmlNodePtr node = NULL; + virNetworkDefPtr tmp = NULL; + xmlXPathContextPtr ctxt = NULL; + + xml = virXMLParse(filename, NULL, _("(network status)")); + if (!xml) + return -1; + + ctxt = xmlXPathNewContext(xml); + if (ctxt == NULL) { + virReportOOMError(); + goto cleanup; + } + + node = xmlDocGetRootElement(xml); + if (xmlStrEqual(node->name, BAD_CAST "networkstatus")) { + /* Newer network status file. Contains useful + * info which are not to be found in bare config XML */ + char *class_id = NULL; + char *floor_sum = NULL; + + ctxt->node = node; + class_id = virXPathString("string(./class_id[1]/@next)", ctxt); + if (class_id && + virStrToLong_ui(class_id, NULL, 10, &net->class_id) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed 'class_id' attribute: %s"), + class_id); + VIR_FREE(class_id); + goto cleanup; + } + VIR_FREE(class_id); + + floor_sum = virXPathString("string(./floor[1]/@sum)", ctxt); + if (floor_sum && + virStrToLong_ull(floor_sum, NULL, 10, &net->floor_sum) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed 'floor_sum' attribute: %s"), + floor_sum); + VIR_FREE(floor_sum); + } + VIR_FREE(floor_sum); + } + + node = virXPathNode("//network", ctxt); + if (!node) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not find any 'network' element")); + goto cleanup; + } + + ctxt->node = node; + tmp = virNetworkDefParseXML(ctxt); + + if (tmp) { + net->newDef = net->def; + net->def = tmp; + } + + ret = 0; + +cleanup: + xmlXPathFreeContext(ctxt); + return ret; +} + static int virNetworkDNSDefFormat(virBufferPtr buf, virNetworkDNSDefPtr def) @@ -1823,24 +1895,26 @@ virPortGroupDefFormat(virBufferPtr buf, return 0; } -char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags) +static int +virNetworkDefFormatInternal(virBufferPtr buf, + const virNetworkDefPtr def, + unsigned int flags) { - virBuffer buf = VIR_BUFFER_INITIALIZER; unsigned char *uuid; char uuidstr[VIR_UUID_STRING_BUFLEN]; int ii; - virBufferAddLit(&buf, "<network"); + virBufferAddLit(buf, "<network"); if (!(flags & VIR_NETWORK_XML_INACTIVE) && (def->connections > 0)) { - virBufferAsprintf(&buf, " connections='%d'", def->connections); + virBufferAsprintf(buf, " connections='%d'", def->connections); } - virBufferAddLit(&buf, ">\n"); - virBufferAdjustIndent(&buf, 2); - virBufferEscapeString(&buf, "<name>%s</name>\n", def->name); + virBufferAddLit(buf, ">\n"); + virBufferAdjustIndent(buf, 2); + virBufferEscapeString(buf, "<name>%s</name>\n", def->name); uuid = def->uuid; virUUIDFormat(uuid, uuidstr); - virBufferAsprintf(&buf, "<uuid>%s</uuid>\n", uuidstr); + virBufferAsprintf(buf, "<uuid>%s</uuid>\n", uuidstr); if (def->forwardType != VIR_NETWORK_FORWARD_NONE) { const char *dev = NULL; @@ -1854,40 +1928,40 @@ char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags) def->forwardType, def->name); goto error; } - virBufferAddLit(&buf, "<forward"); - virBufferEscapeString(&buf, " dev='%s'", dev); - virBufferAsprintf(&buf, " mode='%s'", mode); + virBufferAddLit(buf, "<forward"); + virBufferEscapeString(buf, " dev='%s'", dev); + virBufferAsprintf(buf, " mode='%s'", mode); if (def->forwardType == VIR_NETWORK_FORWARD_HOSTDEV) { if (def->managed == 1) - virBufferAddLit(&buf, " managed='yes'"); + virBufferAddLit(buf, " managed='yes'"); else - virBufferAddLit(&buf, " managed='no'"); + virBufferAddLit(buf, " managed='no'"); } - virBufferAsprintf(&buf, "%s>\n", + virBufferAsprintf(buf, "%s>\n", (def->nForwardIfs || def->nForwardPfs) ? "" : "/"); - virBufferAdjustIndent(&buf, 2); + virBufferAdjustIndent(buf, 2); /* For now, hard-coded to at most 1 forwardPfs */ if (def->nForwardPfs) - virBufferEscapeString(&buf, "<pf dev='%s'/>\n", + virBufferEscapeString(buf, "<pf dev='%s'/>\n", def->forwardPfs[0].dev); if (def->nForwardIfs && (!def->nForwardPfs || !(flags & VIR_NETWORK_XML_INACTIVE))) { for (ii = 0; ii < def->nForwardIfs; ii++) { if (def->forwardType != VIR_NETWORK_FORWARD_HOSTDEV) { - virBufferEscapeString(&buf, "<interface dev='%s'", + virBufferEscapeString(buf, "<interface dev='%s'", def->forwardIfs[ii].device.dev); if (!(flags & VIR_NETWORK_XML_INACTIVE) && (def->forwardIfs[ii].connections > 0)) { - virBufferAsprintf(&buf, " connections='%d'", + virBufferAsprintf(buf, " connections='%d'", def->forwardIfs[ii].connections); } - virBufferAddLit(&buf, "/>\n"); + virBufferAddLit(buf, "/>\n"); } else { if (def->forwardIfs[ii].type == VIR_NETWORK_FORWARD_HOSTDEV_DEVICE_PCI) { - if (virDevicePCIAddressFormat(&buf, + if (virDevicePCIAddressFormat(buf, def->forwardIfs[ii].device.pci, true) < 0) goto error; @@ -1895,67 +1969,111 @@ char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags) } } } - virBufferAdjustIndent(&buf, -2); + virBufferAdjustIndent(buf, -2); if (def->nForwardPfs || def->nForwardIfs) - virBufferAddLit(&buf, "</forward>\n"); + virBufferAddLit(buf, "</forward>\n"); } if (def->forwardType == VIR_NETWORK_FORWARD_NONE || - def->forwardType == VIR_NETWORK_FORWARD_NAT || - def->forwardType == VIR_NETWORK_FORWARD_ROUTE) { + def->forwardType == VIR_NETWORK_FORWARD_NAT || + def->forwardType == VIR_NETWORK_FORWARD_ROUTE) { - virBufferAddLit(&buf, "<bridge"); + virBufferAddLit(buf, "<bridge"); if (def->bridge) - virBufferEscapeString(&buf, " name='%s'", def->bridge); - virBufferAsprintf(&buf, " stp='%s' delay='%ld' />\n", + virBufferEscapeString(buf, " name='%s'", def->bridge); + virBufferAsprintf(buf, " stp='%s' delay='%ld' />\n", def->stp ? "on" : "off", def->delay); } else if (def->forwardType == VIR_NETWORK_FORWARD_BRIDGE && def->bridge) { - virBufferEscapeString(&buf, "<bridge name='%s' />\n", def->bridge); + virBufferEscapeString(buf, "<bridge name='%s' />\n", def->bridge); } if (def->mac_specified) { char macaddr[VIR_MAC_STRING_BUFLEN]; virMacAddrFormat(&def->mac, macaddr); - virBufferAsprintf(&buf, "<mac address='%s'/>\n", macaddr); + virBufferAsprintf(buf, "<mac address='%s'/>\n", macaddr); } if (def->domain) - virBufferAsprintf(&buf, "<domain name='%s'/>\n", def->domain); + virBufferAsprintf(buf, "<domain name='%s'/>\n", def->domain); - if (virNetworkDNSDefFormat(&buf, def->dns) < 0) + if (virNetworkDNSDefFormat(buf, def->dns) < 0) goto error; - if (virNetDevVlanFormat(&def->vlan, &buf) < 0) + if (virNetDevVlanFormat(&def->vlan, buf) < 0) goto error; - if (virNetDevBandwidthFormat(def->bandwidth, &buf) < 0) + if (virNetDevBandwidthFormat(def->bandwidth, buf) < 0) goto error; for (ii = 0; ii < def->nips; ii++) { - if (virNetworkIpDefFormat(&buf, &def->ips[ii]) < 0) + if (virNetworkIpDefFormat(buf, &def->ips[ii]) < 0) goto error; } - if (virNetDevVPortProfileFormat(def->virtPortProfile, &buf) < 0) + if (virNetDevVPortProfileFormat(def->virtPortProfile, buf) < 0) goto error; for (ii = 0; ii < def->nPortGroups; ii++) - if (virPortGroupDefFormat(&buf, &def->portGroups[ii]) < 0) + if (virPortGroupDefFormat(buf, &def->portGroups[ii]) < 0) goto error; + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</network>\n"); + + return 0; + +error: + return -1; +} + +char * +virNetworkDefFormat(virNetworkDefPtr def, + unsigned int flags) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + + if (virNetworkDefFormatInternal(&buf, def, flags) < 0) + goto error; + + if (virBufferError(&buf)) + goto no_memory; + + return virBufferContentAndReset(&buf); + +no_memory: + virReportOOMError(); +error: + virBufferFreeAndReset(&buf); + return NULL; +} + +static char * +virNetworkObjFormat(virNetworkObjPtr net, + unsigned int flags) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + + virBufferAddLit(&buf, "<networkstatus>\n"); + virBufferAsprintf(&buf, " <class_id next='%u'/>\n", net->class_id); + virBufferAsprintf(&buf, " <floor sum='%llu'/>\n", net->floor_sum); + + virBufferAdjustIndent(&buf, 2); + if (virNetworkDefFormatInternal(&buf, net->def, flags) < 0) + goto error; + virBufferAdjustIndent(&buf, -2); - virBufferAddLit(&buf, "</network>\n"); + virBufferAddLit(&buf, "</networkstatus>"); if (virBufferError(&buf)) goto no_memory; return virBufferContentAndReset(&buf); - no_memory: +no_memory: virReportOOMError(); - error: +error: virBufferFreeAndReset(&buf); return NULL; } @@ -2027,9 +2145,10 @@ int virNetworkSaveStatus(const char *statusDir, virNetworkObjPtr network) { int ret = -1; + int flags = 0; char *xml; - if (!(xml = virNetworkDefFormat(network->def, 0))) + if (!(xml = virNetworkObjFormat(network, flags))) goto cleanup; if (virNetworkSaveXML(statusDir, network->def, xml)) diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index 8a8d1e4..efa9dea 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -270,6 +270,8 @@ virNetworkDefPtr virNetworkDefParseString(const char *xmlStr); virNetworkDefPtr virNetworkDefParseFile(const char *filename); virNetworkDefPtr virNetworkDefParseNode(xmlDocPtr xml, xmlNodePtr root); +int virNetworkObjUpdateParseFile(const char *filename, + virNetworkObjPtr net); char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 5a07139..3ed0c97 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -858,6 +858,7 @@ virNetworkObjSetDefTransient; virNetworkObjUnlock; virNetworkObjUnsetDefTransient; virNetworkObjUpdate; +virNetworkObjUpdateParseFile; virNetworkRemoveInactive; virNetworkSaveConfig; virNetworkSaveStatus; diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 44fa56e..8dc9d19 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -236,7 +236,6 @@ networkFindActiveConfigs(struct network_driver *driver) { for (i = 0 ; i < driver->networks.count ; i++) { virNetworkObjPtr obj = driver->networks.objs[i]; - virNetworkDefPtr tmp; char *config; virNetworkObjLock(obj); @@ -254,12 +253,10 @@ networkFindActiveConfigs(struct network_driver *driver) { } /* Try and load the live config */ - tmp = virNetworkDefParseFile(config); + if (virNetworkObjUpdateParseFile(config, obj) < 0) + VIR_WARN("Unable to update config of '%s' network", + obj->def->name); VIR_FREE(config); - if (tmp) { - obj->newDef = obj->def; - obj->def = tmp; - } /* If bridge exists, then mark it active */ if (obj->def->bridge && -- 1.7.8.6

On 11/19/2012 11:51 AM, Michal Privoznik wrote:
Currently, we are only keeping a inactive XML configuration in status dir. This is no longer enough as we need to keep this class_id attribute so we don't overwrite old entries when the daemon restarts.
Aha! So you're looking at solving the problem in a different way - save everything to the status file rather than recomputing it as you restart. While I like the idea of having the network status file hold this information, I think its reliability is suspect. What if a guest's process is terminated while libvirtd isn't running? Or what if libvirtd exits unexpectedly after the commands to setup bandwidth have been executed, but before the new network state file has been written (or vice versa, depending on the code). Also, networks aren't properly shut down when the host is being shutdown (there's no equivalent to the libvirt-guests service, although at least one person a month or two ago requested it). If everybody agrees that saving this info to a file and re-reading it on start up is reliable, though, then we might as well do the same thing with the device pool (although it's currently a bit different - the inuse count is stored in the virNetworkDef rather than Obj, and is reported during net-dumpxml)
However, since there has already been a libvirt release
*a* libvirt release? :-)
which has just <network/> as root element, and we want to keep things compatible, detect that loaded status file is older one, and don't scream about it. --- src/conf/network_conf.c | 199 ++++++++++++++++++++++++++++++++++--------- src/conf/network_conf.h | 2 + src/libvirt_private.syms | 1 + src/network/bridge_driver.c | 9 +-- 4 files changed, 165 insertions(+), 46 deletions(-)
diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 80189e6..9c2e4d4 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -1645,6 +1645,78 @@ cleanup: return def; }
+int +virNetworkObjUpdateParseFile(const char *filename, + virNetworkObjPtr net) +{ + int ret = -1; + xmlDocPtr xml = NULL; + xmlNodePtr node = NULL; + virNetworkDefPtr tmp = NULL; + xmlXPathContextPtr ctxt = NULL; + + xml = virXMLParse(filename, NULL, _("(network status)")); + if (!xml) + return -1; + + ctxt = xmlXPathNewContext(xml); + if (ctxt == NULL) { + virReportOOMError(); + goto cleanup; + } + + node = xmlDocGetRootElement(xml); + if (xmlStrEqual(node->name, BAD_CAST "networkstatus")) { + /* Newer network status file. Contains useful + * info which are not to be found in bare config XML */ + char *class_id = NULL; + char *floor_sum = NULL; + + ctxt->node = node; + class_id = virXPathString("string(./class_id[1]/@next)", ctxt); + if (class_id && + virStrToLong_ui(class_id, NULL, 10, &net->class_id) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed 'class_id' attribute: %s"), + class_id); + VIR_FREE(class_id); + goto cleanup; + } + VIR_FREE(class_id); + + floor_sum = virXPathString("string(./floor[1]/@sum)", ctxt); + if (floor_sum && + virStrToLong_ull(floor_sum, NULL, 10, &net->floor_sum) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed 'floor_sum' attribute: %s"), + floor_sum); + VIR_FREE(floor_sum); + } + VIR_FREE(floor_sum); + } + + node = virXPathNode("//network", ctxt); + if (!node) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not find any 'network' element")); + goto cleanup; + } + + ctxt->node = node; + tmp = virNetworkDefParseXML(ctxt); + + if (tmp) { + net->newDef = net->def; + net->def = tmp; + } + + ret = 0; + +cleanup: + xmlXPathFreeContext(ctxt); + return ret; +} + static int virNetworkDNSDefFormat(virBufferPtr buf, virNetworkDNSDefPtr def) @@ -1823,24 +1895,26 @@ virPortGroupDefFormat(virBufferPtr buf, return 0; }
-char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags) +static int +virNetworkDefFormatInternal(virBufferPtr buf, + const virNetworkDefPtr def, + unsigned int flags) { - virBuffer buf = VIR_BUFFER_INITIALIZER; unsigned char *uuid; char uuidstr[VIR_UUID_STRING_BUFLEN]; int ii;
- virBufferAddLit(&buf, "<network"); + virBufferAddLit(buf, "<network"); if (!(flags & VIR_NETWORK_XML_INACTIVE) && (def->connections > 0)) { - virBufferAsprintf(&buf, " connections='%d'", def->connections); + virBufferAsprintf(buf, " connections='%d'", def->connections); } - virBufferAddLit(&buf, ">\n"); - virBufferAdjustIndent(&buf, 2); - virBufferEscapeString(&buf, "<name>%s</name>\n", def->name); + virBufferAddLit(buf, ">\n"); + virBufferAdjustIndent(buf, 2); + virBufferEscapeString(buf, "<name>%s</name>\n", def->name);
uuid = def->uuid; virUUIDFormat(uuid, uuidstr); - virBufferAsprintf(&buf, "<uuid>%s</uuid>\n", uuidstr); + virBufferAsprintf(buf, "<uuid>%s</uuid>\n", uuidstr);
if (def->forwardType != VIR_NETWORK_FORWARD_NONE) { const char *dev = NULL; @@ -1854,40 +1928,40 @@ char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags) def->forwardType, def->name); goto error; } - virBufferAddLit(&buf, "<forward"); - virBufferEscapeString(&buf, " dev='%s'", dev); - virBufferAsprintf(&buf, " mode='%s'", mode); + virBufferAddLit(buf, "<forward"); + virBufferEscapeString(buf, " dev='%s'", dev); + virBufferAsprintf(buf, " mode='%s'", mode); if (def->forwardType == VIR_NETWORK_FORWARD_HOSTDEV) { if (def->managed == 1) - virBufferAddLit(&buf, " managed='yes'"); + virBufferAddLit(buf, " managed='yes'"); else - virBufferAddLit(&buf, " managed='no'"); + virBufferAddLit(buf, " managed='no'"); } - virBufferAsprintf(&buf, "%s>\n", + virBufferAsprintf(buf, "%s>\n", (def->nForwardIfs || def->nForwardPfs) ? "" : "/"); - virBufferAdjustIndent(&buf, 2); + virBufferAdjustIndent(buf, 2);
/* For now, hard-coded to at most 1 forwardPfs */ if (def->nForwardPfs) - virBufferEscapeString(&buf, "<pf dev='%s'/>\n", + virBufferEscapeString(buf, "<pf dev='%s'/>\n", def->forwardPfs[0].dev);
if (def->nForwardIfs && (!def->nForwardPfs || !(flags & VIR_NETWORK_XML_INACTIVE))) { for (ii = 0; ii < def->nForwardIfs; ii++) { if (def->forwardType != VIR_NETWORK_FORWARD_HOSTDEV) { - virBufferEscapeString(&buf, "<interface dev='%s'", + virBufferEscapeString(buf, "<interface dev='%s'", def->forwardIfs[ii].device.dev); if (!(flags & VIR_NETWORK_XML_INACTIVE) && (def->forwardIfs[ii].connections > 0)) { - virBufferAsprintf(&buf, " connections='%d'", + virBufferAsprintf(buf, " connections='%d'", def->forwardIfs[ii].connections); } - virBufferAddLit(&buf, "/>\n"); + virBufferAddLit(buf, "/>\n"); } else { if (def->forwardIfs[ii].type == VIR_NETWORK_FORWARD_HOSTDEV_DEVICE_PCI) { - if (virDevicePCIAddressFormat(&buf, + if (virDevicePCIAddressFormat(buf, def->forwardIfs[ii].device.pci, true) < 0) goto error; @@ -1895,67 +1969,111 @@ char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags) } } } - virBufferAdjustIndent(&buf, -2); + virBufferAdjustIndent(buf, -2); if (def->nForwardPfs || def->nForwardIfs) - virBufferAddLit(&buf, "</forward>\n"); + virBufferAddLit(buf, "</forward>\n"); }
if (def->forwardType == VIR_NETWORK_FORWARD_NONE || - def->forwardType == VIR_NETWORK_FORWARD_NAT || - def->forwardType == VIR_NETWORK_FORWARD_ROUTE) { + def->forwardType == VIR_NETWORK_FORWARD_NAT || + def->forwardType == VIR_NETWORK_FORWARD_ROUTE) {
- virBufferAddLit(&buf, "<bridge"); + virBufferAddLit(buf, "<bridge"); if (def->bridge) - virBufferEscapeString(&buf, " name='%s'", def->bridge); - virBufferAsprintf(&buf, " stp='%s' delay='%ld' />\n", + virBufferEscapeString(buf, " name='%s'", def->bridge); + virBufferAsprintf(buf, " stp='%s' delay='%ld' />\n", def->stp ? "on" : "off", def->delay); } else if (def->forwardType == VIR_NETWORK_FORWARD_BRIDGE && def->bridge) { - virBufferEscapeString(&buf, "<bridge name='%s' />\n", def->bridge); + virBufferEscapeString(buf, "<bridge name='%s' />\n", def->bridge); }
if (def->mac_specified) { char macaddr[VIR_MAC_STRING_BUFLEN]; virMacAddrFormat(&def->mac, macaddr); - virBufferAsprintf(&buf, "<mac address='%s'/>\n", macaddr); + virBufferAsprintf(buf, "<mac address='%s'/>\n", macaddr); }
if (def->domain) - virBufferAsprintf(&buf, "<domain name='%s'/>\n", def->domain); + virBufferAsprintf(buf, "<domain name='%s'/>\n", def->domain);
- if (virNetworkDNSDefFormat(&buf, def->dns) < 0) + if (virNetworkDNSDefFormat(buf, def->dns) < 0) goto error;
- if (virNetDevVlanFormat(&def->vlan, &buf) < 0) + if (virNetDevVlanFormat(&def->vlan, buf) < 0) goto error; - if (virNetDevBandwidthFormat(def->bandwidth, &buf) < 0) + if (virNetDevBandwidthFormat(def->bandwidth, buf) < 0) goto error;
for (ii = 0; ii < def->nips; ii++) { - if (virNetworkIpDefFormat(&buf, &def->ips[ii]) < 0) + if (virNetworkIpDefFormat(buf, &def->ips[ii]) < 0) goto error; }
- if (virNetDevVPortProfileFormat(def->virtPortProfile, &buf) < 0) + if (virNetDevVPortProfileFormat(def->virtPortProfile, buf) < 0) goto error;
for (ii = 0; ii < def->nPortGroups; ii++) - if (virPortGroupDefFormat(&buf, &def->portGroups[ii]) < 0) + if (virPortGroupDefFormat(buf, &def->portGroups[ii]) < 0) goto error;
+ virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</network>\n"); + + return 0; + +error: + return -1; +} + +char * +virNetworkDefFormat(virNetworkDefPtr def, + unsigned int flags) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + + if (virNetworkDefFormatInternal(&buf, def, flags) < 0) + goto error; + + if (virBufferError(&buf)) + goto no_memory; + + return virBufferContentAndReset(&buf); + +no_memory: + virReportOOMError(); +error: + virBufferFreeAndReset(&buf); + return NULL; +} + +static char * +virNetworkObjFormat(virNetworkObjPtr net, + unsigned int flags) +{ + virBuffer buf = VIR_BUFFER_INITIALIZER; + + virBufferAddLit(&buf, "<networkstatus>\n"); + virBufferAsprintf(&buf, " <class_id next='%u'/>\n", net->class_id); + virBufferAsprintf(&buf, " <floor sum='%llu'/>\n", net->floor_sum); + + virBufferAdjustIndent(&buf, 2); + if (virNetworkDefFormatInternal(&buf, net->def, flags) < 0) + goto error; + virBufferAdjustIndent(&buf, -2); - virBufferAddLit(&buf, "</network>\n"); + virBufferAddLit(&buf, "</networkstatus>");
if (virBufferError(&buf)) goto no_memory;
return virBufferContentAndReset(&buf);
- no_memory: +no_memory: virReportOOMError(); - error: +error: virBufferFreeAndReset(&buf); return NULL; } @@ -2027,9 +2145,10 @@ int virNetworkSaveStatus(const char *statusDir, virNetworkObjPtr network) { int ret = -1; + int flags = 0; char *xml;
- if (!(xml = virNetworkDefFormat(network->def, 0))) + if (!(xml = virNetworkObjFormat(network, flags))) goto cleanup;
if (virNetworkSaveXML(statusDir, network->def, xml)) diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index 8a8d1e4..efa9dea 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -270,6 +270,8 @@ virNetworkDefPtr virNetworkDefParseString(const char *xmlStr); virNetworkDefPtr virNetworkDefParseFile(const char *filename); virNetworkDefPtr virNetworkDefParseNode(xmlDocPtr xml, xmlNodePtr root); +int virNetworkObjUpdateParseFile(const char *filename, + virNetworkObjPtr net);
char *virNetworkDefFormat(const virNetworkDefPtr def, unsigned int flags);
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 5a07139..3ed0c97 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -858,6 +858,7 @@ virNetworkObjSetDefTransient; virNetworkObjUnlock; virNetworkObjUnsetDefTransient; virNetworkObjUpdate; +virNetworkObjUpdateParseFile; virNetworkRemoveInactive; virNetworkSaveConfig; virNetworkSaveStatus; diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 44fa56e..8dc9d19 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -236,7 +236,6 @@ networkFindActiveConfigs(struct network_driver *driver) {
for (i = 0 ; i < driver->networks.count ; i++) { virNetworkObjPtr obj = driver->networks.objs[i]; - virNetworkDefPtr tmp; char *config;
virNetworkObjLock(obj); @@ -254,12 +253,10 @@ networkFindActiveConfigs(struct network_driver *driver) { }
/* Try and load the live config */ - tmp = virNetworkDefParseFile(config); + if (virNetworkObjUpdateParseFile(config, obj) < 0) + VIR_WARN("Unable to update config of '%s' network", + obj->def->name); VIR_FREE(config); - if (tmp) { - obj->newDef = obj->def; - obj->def = tmp; - }
/* If bridge exists, then mark it active */ if (obj->def->bridge &&
ACK.

On 30.11.2012 20:04, Laine Stump wrote:
On 11/19/2012 11:51 AM, Michal Privoznik wrote:
Currently, we are only keeping a inactive XML configuration in status dir. This is no longer enough as we need to keep this class_id attribute so we don't overwrite old entries when the daemon restarts.
Aha! So you're looking at solving the problem in a different way - save everything to the status file rather than recomputing it as you restart.
While I like the idea of having the network status file hold this information, I think its reliability is suspect. What if a guest's process is terminated while libvirtd isn't running? Or what if libvirtd exits unexpectedly after the commands to setup bandwidth have been executed, but before the new network state file has been written (or vice versa, depending on the code).
Also, networks aren't properly shut down when the host is being shutdown (there's no equivalent to the libvirt-guests service, although at least one person a month or two ago requested it).
If guest is being shut down, the networkReleaseActualDevice() is called, isn't it? And this should update the floor_sum. Even if libvirtd is restarted and qemu process dies meanwhile, qemuProcessStop is called and this calls the networkReleaseActualDevice() so I think we are safe here.
If everybody agrees that saving this info to a file and re-reading it on start up is reliable, though, then we might as well do the same thing with the device pool (although it's currently a bit different - the inuse count is stored in the virNetworkDef rather than Obj, and is reported during net-dumpxml)
With new code we can update this.
However, since there has already been a libvirt release
*a* libvirt release? :-)
That 'a' belongs to 'release'. It doesn't matter which release it was. Yeah, now that I re-read it again it's a mess :)
which has just <network/> as root element, and we want to keep things compatible, detect that loaded status file is older one, and don't scream about it. --- src/conf/network_conf.c | 199 ++++++++++++++++++++++++++++++++++--------- src/conf/network_conf.h | 2 + src/libvirt_private.syms | 1 + src/network/bridge_driver.c | 9 +-- 4 files changed, 165 insertions(+), 46 deletions(-)
ACK.

On 12/03/2012 10:55 AM, Michal Privoznik wrote:
On 11/19/2012 11:51 AM, Michal Privoznik wrote:
Currently, we are only keeping a inactive XML configuration in status dir. This is no longer enough as we need to keep this class_id attribute so we don't overwrite old entries when the daemon restarts. Aha! So you're looking at solving the problem in a different way - save everything to the status file rather than recomputing it as you restart.
While I like the idea of having the network status file hold this information, I think its reliability is suspect. What if a guest's process is terminated while libvirtd isn't running? Or what if libvirtd exits unexpectedly after the commands to setup bandwidth have been executed, but before the new network state file has been written (or vice versa, depending on the code).
Also, networks aren't properly shut down when the host is being shutdown (there's no equivalent to the libvirt-guests service, although at least one person a month or two ago requested it). If guest is being shut down, the networkReleaseActualDevice() is called, isn't it? And this should update the floor_sum. Even if libvirtd is restarted and qemu process dies meanwhile, qemuProcessStop is called and
On 30.11.2012 20:04, Laine Stump wrote: this calls the networkReleaseActualDevice() so I think we are safe here.
If you've tested this and it works properly, then okay.

Interfaces keeps a class_id, which is an ID from which bridge part of QoS settings is derived. We need to store class_id in domain status file, so we can later pass it to virNetDevBandwidthUnplug. --- src/conf/domain_conf.c | 4 +- src/qemu/qemu_domain.c | 66 ++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 63 insertions(+), 7 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index bf23b77..37a8875 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -10242,7 +10242,7 @@ static virDomainObjPtr virDomainObjParseXML(virCapsPtr caps, VIR_FREE(nodes); if (caps->privateDataXMLParse && - ((caps->privateDataXMLParse)(ctxt, obj->privateData)) < 0) + ((caps->privateDataXMLParse)(ctxt, obj)) < 0) goto error; return obj; @@ -14212,7 +14212,7 @@ static char *virDomainObjFormat(virCapsPtr caps, } if (caps->privateDataXMLFormat && - ((caps->privateDataXMLFormat)(&buf, obj->privateData)) < 0) + ((caps->privateDataXMLFormat)(&buf, obj)) < 0) goto error; virBufferAdjustIndent(&buf, 2); diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index e0d6951..5312946 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -258,9 +258,12 @@ static void qemuDomainObjPrivateFree(void *data) static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) { - qemuDomainObjPrivatePtr priv = data; + virDomainObjPtr vm = data; + qemuDomainObjPrivatePtr priv = vm->privateData; const char *monitorpath; enum qemuDomainJob job; + bool do_class_id = false; + int i; /* priv->monitor_chr is set only for qemu */ if (priv->monConfig) { @@ -283,7 +286,6 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) if (priv->nvcpupids) { - int i; virBufferAddLit(buf, " <vcpus>\n"); for (i = 0 ; i < priv->nvcpupids ; i++) { virBufferAsprintf(buf, " <vcpu pid='%d'/>\n", priv->vcpupids[i]); @@ -292,7 +294,6 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) } if (priv->caps) { - int i; virBufferAddLit(buf, " <qemuCaps>\n"); for (i = 0 ; i < QEMU_CAPS_LAST ; i++) { if (qemuCapsGet(priv->caps, i)) { @@ -326,15 +327,36 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) if (priv->fakeReboot) virBufferAsprintf(buf, " <fakereboot/>\n"); + for (i = 0; i < vm->def->nnets; i++) { + if (vm->def->nets[i]->class_id) { + do_class_id = true; + break; + } + } + + if (do_class_id) { + virBufferAddLit(buf, " <class_id>\n"); + for (; i < vm->def->nnets; i++) { + virDomainNetDefPtr iface = vm->def->nets[i]; + if (iface->class_id) { + virBufferAsprintf(buf, " <interface alias='%s' " + "class_id='%u'/>\n", + iface->info.alias, iface->class_id); + } + } + virBufferAddLit(buf, " </class_id>\n"); + } + return 0; } static int qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, void *data) { - qemuDomainObjPrivatePtr priv = data; + virDomainObjPtr vm = data; + qemuDomainObjPrivatePtr priv = vm->privateData; char *monitorpath; char *tmp; - int n, i; + int n, i, ii; xmlNodePtr *nodes = NULL; qemuCapsPtr caps = NULL; @@ -471,6 +493,40 @@ static int qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, void *data) priv->fakeReboot = virXPathBoolean("boolean(./fakereboot)", ctxt) == 1; + if ((n = virXPathNodeSet("./class_id/interface", ctxt, &nodes)) < 0) + goto error; + + for (i = 0; i < n; i++) { + char *alias = virXMLPropString(nodes[i], "alias"); + char *class_id = virXMLPropString(nodes[i], "class_id"); + virDomainNetDefPtr iface = NULL; + if (alias && class_id) { + for (ii = 0; ii < vm->def->nnets; ii++) { + if (STREQ(vm->def->nets[ii]->info.alias, alias)) { + iface = vm->def->nets[ii]; + break; + } + } + + if (!iface) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("No such interface '%s'"), + alias); + VIR_FREE(alias); + VIR_FREE(class_id); + goto error; + } + + if (virStrToLong_ui(class_id, NULL, 10, &iface->class_id) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed class_id attribute: %s"), + class_id); + } + } + VIR_FREE(alias); + VIR_FREE(class_id); + } + return 0; error: -- 1.7.8.6

On 11/19/2012 11:51 AM, Michal Privoznik wrote:
Interfaces keeps a class_id, which is an ID from which bridge part of QoS settings is derived. We need to store class_id in domain status file, so we can later pass it to virNetDevBandwidthUnplug.
Interesting use of alias to find the original interface matching the class_id. But we already have a "status-only" subelement in every type='network' interface, so it will be much simpler to just store it there (conveniently in the bandwidth object, as I suggested in an earlier patch). (if <actual> wasn't already saved within <interface>, I might have considered doing it this way, but that ship has already sailed.)
--- src/conf/domain_conf.c | 4 +- src/qemu/qemu_domain.c | 66 ++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 63 insertions(+), 7 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index bf23b77..37a8875 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -10242,7 +10242,7 @@ static virDomainObjPtr virDomainObjParseXML(virCapsPtr caps, VIR_FREE(nodes);
if (caps->privateDataXMLParse && - ((caps->privateDataXMLParse)(ctxt, obj->privateData)) < 0) + ((caps->privateDataXMLParse)(ctxt, obj)) < 0) goto error;
return obj; @@ -14212,7 +14212,7 @@ static char *virDomainObjFormat(virCapsPtr caps, }
if (caps->privateDataXMLFormat && - ((caps->privateDataXMLFormat)(&buf, obj->privateData)) < 0) + ((caps->privateDataXMLFormat)(&buf, obj)) < 0) goto error;
virBufferAdjustIndent(&buf, 2); diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index e0d6951..5312946 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -258,9 +258,12 @@ static void qemuDomainObjPrivateFree(void *data)
static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) { - qemuDomainObjPrivatePtr priv = data; + virDomainObjPtr vm = data; + qemuDomainObjPrivatePtr priv = vm->privateData; const char *monitorpath; enum qemuDomainJob job; + bool do_class_id = false; + int i;
/* priv->monitor_chr is set only for qemu */ if (priv->monConfig) { @@ -283,7 +286,6 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data)
if (priv->nvcpupids) { - int i; virBufferAddLit(buf, " <vcpus>\n"); for (i = 0 ; i < priv->nvcpupids ; i++) { virBufferAsprintf(buf, " <vcpu pid='%d'/>\n", priv->vcpupids[i]); @@ -292,7 +294,6 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) }
if (priv->caps) { - int i; virBufferAddLit(buf, " <qemuCaps>\n"); for (i = 0 ; i < QEMU_CAPS_LAST ; i++) { if (qemuCapsGet(priv->caps, i)) { @@ -326,15 +327,36 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) if (priv->fakeReboot) virBufferAsprintf(buf, " <fakereboot/>\n");
+ for (i = 0; i < vm->def->nnets; i++) { + if (vm->def->nets[i]->class_id) { + do_class_id = true; + break; + } + } + + if (do_class_id) { + virBufferAddLit(buf, " <class_id>\n"); + for (; i < vm->def->nnets; i++) { + virDomainNetDefPtr iface = vm->def->nets[i]; + if (iface->class_id) { + virBufferAsprintf(buf, " <interface alias='%s' " + "class_id='%u'/>\n", + iface->info.alias, iface->class_id); + } + } + virBufferAddLit(buf, " </class_id>\n"); + } + return 0; }
static int qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, void *data) { - qemuDomainObjPrivatePtr priv = data; + virDomainObjPtr vm = data; + qemuDomainObjPrivatePtr priv = vm->privateData; char *monitorpath; char *tmp; - int n, i; + int n, i, ii; xmlNodePtr *nodes = NULL; qemuCapsPtr caps = NULL;
@@ -471,6 +493,40 @@ static int qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, void *data)
priv->fakeReboot = virXPathBoolean("boolean(./fakereboot)", ctxt) == 1;
+ if ((n = virXPathNodeSet("./class_id/interface", ctxt, &nodes)) < 0) + goto error; + + for (i = 0; i < n; i++) { + char *alias = virXMLPropString(nodes[i], "alias"); + char *class_id = virXMLPropString(nodes[i], "class_id"); + virDomainNetDefPtr iface = NULL; + if (alias && class_id) { + for (ii = 0; ii < vm->def->nnets; ii++) { + if (STREQ(vm->def->nets[ii]->info.alias, alias)) { + iface = vm->def->nets[ii]; + break; + } + } + + if (!iface) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("No such interface '%s'"), + alias); + VIR_FREE(alias); + VIR_FREE(class_id); + goto error; + } + + if (virStrToLong_ui(class_id, NULL, 10, &iface->class_id) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed class_id attribute: %s"), + class_id); + } + } + VIR_FREE(alias); + VIR_FREE(class_id); + } + return 0;
error:

Status file keeps track of class_id and floor_sum. It's better to keep it updated in case libvirtd is killed. --- src/network/bridge_driver.c | 14 ++++++++++++++ 1 files changed, 14 insertions(+), 0 deletions(-) diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 8dc9d19..5a0f43f 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -4300,6 +4300,15 @@ networkNotifyPlug(virNetworkPtr network, net->class_id++; /* update sum of 'floor'-s of attached NICs */ net->floor_sum += iface->bandwidth->in->floor; + /* update status file */ + if (virNetworkSaveStatus(NETWORK_STATE_DIR, net) < 0) { + net->class_id--; + net->floor_sum -= iface->bandwidth->in->floor; + iface->class_id = 0; + ignore_value(virNetDevBandwidthUnplug(net->def->bridge, + net->class_id)); + goto cleanup; + } /* update rate for non guaranteed NICs */ new_rate -= net->floor_sum; if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", @@ -4339,6 +4348,11 @@ networkNotifyUnplug(virDomainNetDefPtr iface) goto cleanup; /* update sum of 'floor'-s of attached NICs */ net->floor_sum -= iface->bandwidth->in->floor; + /* update status file */ + if (virNetworkSaveStatus(NETWORK_STATE_DIR, net) < 0) { + net->floor_sum += iface->bandwidth->in->floor; + goto cleanup; + } /* update rate for non guaranteed NICs */ new_rate -= net->floor_sum; if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", -- 1.7.8.6

On 11/19/2012 11:51 AM, Michal Privoznik wrote:
Status file keeps track of class_id and floor_sum. It's better to keep it updated in case libvirtd is killed.
I'm not sure why you're doing this type of iterative improvement in separate patches. Since you would want this functionality in any working version of the code, and you haven't already pushed the earlier versions, why not just do it in the original patch that introduces these functions? Likewise, doing part of the functionality, then a bit of infrastructure to allow the new functionality to work better, and then another patch to improve the new functionality makes it a bit of a treasure hunt to review; I like to make my patchsets so that the first patches contain all the improvements/changes to existing infrastructure that will be needed, then the following patches introduce the new code, fully functioning from the beginning.
--- src/network/bridge_driver.c | 14 ++++++++++++++ 1 files changed, 14 insertions(+), 0 deletions(-)
diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 8dc9d19..5a0f43f 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -4300,6 +4300,15 @@ networkNotifyPlug(virNetworkPtr network, net->class_id++; /* update sum of 'floor'-s of attached NICs */ net->floor_sum += iface->bandwidth->in->floor; + /* update status file */ + if (virNetworkSaveStatus(NETWORK_STATE_DIR, net) < 0) { + net->class_id--; + net->floor_sum -= iface->bandwidth->in->floor; + iface->class_id = 0; + ignore_value(virNetDevBandwidthUnplug(net->def->bridge, + net->class_id)); + goto cleanup; + } /* update rate for non guaranteed NICs */ new_rate -= net->floor_sum; if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2", @@ -4339,6 +4348,11 @@ networkNotifyUnplug(virDomainNetDefPtr iface) goto cleanup; /* update sum of 'floor'-s of attached NICs */ net->floor_sum -= iface->bandwidth->in->floor; + /* update status file */ + if (virNetworkSaveStatus(NETWORK_STATE_DIR, net) < 0) { + net->floor_sum += iface->bandwidth->in->floor; + goto cleanup; + } /* update rate for non guaranteed NICs */ new_rate -= net->floor_sum; if (virNetDevBandwidthUpdateRate(net->def->bridge, "1:2",

On 30.11.2012 21:00, Laine Stump wrote:
On 11/19/2012 11:51 AM, Michal Privoznik wrote:
Status file keeps track of class_id and floor_sum. It's better to keep it updated in case libvirtd is killed.
I'm not sure why you're doing this type of iterative improvement in separate patches. Since you would want this functionality in any working version of the code, and you haven't already pushed the earlier versions, why not just do it in the original patch that introduces these functions?
Likewise, doing part of the functionality, then a bit of infrastructure to allow the new functionality to work better, and then another patch to improve the new functionality makes it a bit of a treasure hunt to review; I like to make my patchsets so that the first patches contain all the improvements/changes to existing infrastructure that will be needed, then the following patches introduce the new code, fully functioning from the beginning.
Yeah, these two patches can be joined together.

With current implementation, class ID is just incremented. This can lead to its exhaustion as tc accepts only 16 bits long identifiers. Hence, it's better if we allow class ID to be reused. To keep track which IDs are free and which are taken virBitmap is used. This requires network status file to change a bit: from <class_id next="5"/> to <class_id bitmap="0-4"/>. But since the previous format hasn't been released, it doesn't really matter. --- src/conf/network_conf.c | 34 +++++++++++++++++++++++++---- src/conf/network_conf.h | 3 +- src/network/bridge_driver.c | 49 ++++++++++++++++++++++++++++++++++++------- 3 files changed, 72 insertions(+), 14 deletions(-) diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 9c2e4d4..a41119c 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -47,6 +47,8 @@ #define MAX_BRIDGE_ID 256 #define VIR_FROM_THIS VIR_FROM_NETWORK +/* currently, /sbin/tc implementation allows up 16 bits for minor class size */ +#define CLASS_ID_BITMAP_SIZE (1<<16) VIR_ENUM_IMPL(virNetworkForward, VIR_NETWORK_FORWARD_LAST, @@ -317,13 +319,29 @@ virNetworkAssignDef(virNetworkObjListPtr nets, return NULL; } virNetworkObjLock(network); - network->def = def; - network->class_id = 3; /* next free class id */ + if (!(network->class_id = virBitmapNew(CLASS_ID_BITMAP_SIZE))) { + virReportOOMError(); + goto error; + } + + if (virBitmapSetBit(network->class_id, 0) < 0 || + virBitmapSetBit(network->class_id, 1) < 0 || + virBitmapSetBit(network->class_id, 2) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("unable to initialize class id bitmap")); + goto error; + } + + network->def = def; nets->objs[nets->count] = network; nets->count++; return network; +error: + virNetworkObjUnlock(network); + virNetworkObjFree(network); + return NULL; } @@ -1673,9 +1691,10 @@ virNetworkObjUpdateParseFile(const char *filename, char *floor_sum = NULL; ctxt->node = node; - class_id = virXPathString("string(./class_id[1]/@next)", ctxt); + class_id = virXPathString("string(./class_id[1]/@bitmap)", ctxt); if (class_id && - virStrToLong_ui(class_id, NULL, 10, &net->class_id) < 0) { + virBitmapParse(class_id, ',', + &net->class_id, CLASS_ID_BITMAP_SIZE) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Malformed 'class_id' attribute: %s"), class_id); @@ -2054,10 +2073,15 @@ virNetworkObjFormat(virNetworkObjPtr net, unsigned int flags) { virBuffer buf = VIR_BUFFER_INITIALIZER; + char *class_id = virBitmapFormat(net->class_id); + + if (!class_id) + goto no_memory; virBufferAddLit(&buf, "<networkstatus>\n"); - virBufferAsprintf(&buf, " <class_id next='%u'/>\n", net->class_id); + virBufferAsprintf(&buf, " <class_id bitmap='%s'/>\n", class_id); virBufferAsprintf(&buf, " <floor sum='%llu'/>\n", net->floor_sum); + VIR_FREE(class_id); virBufferAdjustIndent(&buf, 2); if (virNetworkDefFormatInternal(&buf, net->def, flags) < 0) diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index efa9dea..6f6b221 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -38,6 +38,7 @@ # include "virnetdevvlan.h" # include "virmacaddr.h" # include "device_conf.h" +# include "bitmap.h" enum virNetworkForwardType { VIR_NETWORK_FORWARD_NONE = 0, @@ -222,7 +223,7 @@ struct _virNetworkObj { virNetworkDefPtr def; /* The current definition */ virNetworkDefPtr newDef; /* New definition to activate at shutdown */ - unsigned int class_id; /* next class ID for QoS */ + virBitmapPtr class_id; /* bitmap of class IDs for QoS */ unsigned long long floor_sum; /* sum of all 'floor'-s of attached NICs */ }; diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 5a0f43f..8341a78 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -4249,6 +4249,31 @@ cleanup: return ret; } +/** + * networkNextClassID: + * @net: network object + * + * Find next free class ID. @net is supposed + * to be locked already. If there is a free ID, + * it is marked as used and returned. + * + * Returns next free class ID or -1 if none is available. + */ +static ssize_t +networkNextClassID(virNetworkObjPtr net) +{ + size_t ret = 0; + bool is_set = false; + + while (virBitmapGetBit(net->class_id, ret, &is_set) == 0 && is_set) + ret++; + + if (is_set || virBitmapSetBit(net->class_id, ret) < 0) + return -1; + + return ret; +} + int networkNotifyPlug(virNetworkPtr network, virDomainNetDefPtr iface) @@ -4258,6 +4283,7 @@ networkNotifyPlug(virNetworkPtr network, int ret = -1; int plug_ret; unsigned long long new_rate = 0; + ssize_t class_id = 0; networkDriverLock(driver); net = virNetworkFindByUUID(&driver->networks, network->uuid); @@ -4274,15 +4300,21 @@ networkNotifyPlug(virNetworkPtr network, goto cleanup; } + /* generate new class_id */ + if ((class_id = networkNextClassID(net)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not generate next class ID")); + goto cleanup; + } + plug_ret = virNetDevBandwidthPlug(net->def->bridge, net->def->bandwidth, iface->ifname, &iface->mac, iface->bandwidth, - net->class_id); + class_id); if (plug_ret < 0) { - ignore_value(virNetDevBandwidthUnplug(net->def->bridge, - net->class_id)); + ignore_value(virNetDevBandwidthUnplug(net->def->bridge, class_id)); goto cleanup; } @@ -4296,17 +4328,15 @@ networkNotifyPlug(virNetworkPtr network, } /* QoS was set, generate new class ID */ - iface->class_id = net->class_id; - net->class_id++; + iface->class_id = class_id; /* update sum of 'floor'-s of attached NICs */ net->floor_sum += iface->bandwidth->in->floor; /* update status file */ if (virNetworkSaveStatus(NETWORK_STATE_DIR, net) < 0) { - net->class_id--; + ignore_value(virBitmapClearBit(net->class_id, class_id)); net->floor_sum -= iface->bandwidth->in->floor; iface->class_id = 0; - ignore_value(virNetDevBandwidthUnplug(net->def->bridge, - net->class_id)); + ignore_value(virNetDevBandwidthUnplug(net->def->bridge, class_id)); goto cleanup; } /* update rate for non guaranteed NICs */ @@ -4348,9 +4378,12 @@ networkNotifyUnplug(virDomainNetDefPtr iface) goto cleanup; /* update sum of 'floor'-s of attached NICs */ net->floor_sum -= iface->bandwidth->in->floor; + /* return class ID */ + ignore_value(virBitmapClearBit(net->class_id, iface->class_id)); /* update status file */ if (virNetworkSaveStatus(NETWORK_STATE_DIR, net) < 0) { net->floor_sum += iface->bandwidth->in->floor; + ignore_value(virBitmapSetBit(net->class_id, iface->class_id)); goto cleanup; } /* update rate for non guaranteed NICs */ -- 1.7.8.6

On 11/19/2012 11:51 AM, Michal Privoznik wrote:
With current implementation, class ID is just incremented. This can lead to its exhaustion as tc accepts only 16 bits long identifiers. Hence, it's better if we allow class ID to be reused. To keep track which IDs are free and which are taken virBitmap is used. This requires network status file to change a bit: from <class_id next="5"/> to <class_id bitmap="0-4"/>. But since the previous format hasn't been released, it doesn't really matter.
Heh. Well, there you have it. :-) You've already implemented what I suggested in the review of 5/10. But rather than introducing one implementation that we need to review, then almost immediately replacing it with something else, why not just implement it this way to begin with?
--- src/conf/network_conf.c | 34 +++++++++++++++++++++++++---- src/conf/network_conf.h | 3 +- src/network/bridge_driver.c | 49 ++++++++++++++++++++++++++++++++++++------- 3 files changed, 72 insertions(+), 14 deletions(-)
diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 9c2e4d4..a41119c 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -47,6 +47,8 @@
#define MAX_BRIDGE_ID 256 #define VIR_FROM_THIS VIR_FROM_NETWORK +/* currently, /sbin/tc implementation allows up 16 bits for minor class size */ +#define CLASS_ID_BITMAP_SIZE (1<<16)
VIR_ENUM_IMPL(virNetworkForward, VIR_NETWORK_FORWARD_LAST, @@ -317,13 +319,29 @@ virNetworkAssignDef(virNetworkObjListPtr nets, return NULL; } virNetworkObjLock(network); - network->def = def; - network->class_id = 3; /* next free class id */
+ if (!(network->class_id = virBitmapNew(CLASS_ID_BITMAP_SIZE))) { + virReportOOMError(); + goto error; + } + + if (virBitmapSetBit(network->class_id, 0) < 0 || + virBitmapSetBit(network->class_id, 1) < 0 || + virBitmapSetBit(network->class_id, 2) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("unable to initialize class id bitmap")); + goto error; + } + + network->def = def; nets->objs[nets->count] = network; nets->count++;
return network; +error: + virNetworkObjUnlock(network); + virNetworkObjFree(network); + return NULL;
}
@@ -1673,9 +1691,10 @@ virNetworkObjUpdateParseFile(const char *filename, char *floor_sum = NULL;
ctxt->node = node; - class_id = virXPathString("string(./class_id[1]/@next)", ctxt); + class_id = virXPathString("string(./class_id[1]/@bitmap)", ctxt); if (class_id && - virStrToLong_ui(class_id, NULL, 10, &net->class_id) < 0) { + virBitmapParse(class_id, ',', + &net->class_id, CLASS_ID_BITMAP_SIZE) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Malformed 'class_id' attribute: %s"), class_id); @@ -2054,10 +2073,15 @@ virNetworkObjFormat(virNetworkObjPtr net, unsigned int flags) { virBuffer buf = VIR_BUFFER_INITIALIZER; + char *class_id = virBitmapFormat(net->class_id); + + if (!class_id) + goto no_memory;
virBufferAddLit(&buf, "<networkstatus>\n"); - virBufferAsprintf(&buf, " <class_id next='%u'/>\n", net->class_id); + virBufferAsprintf(&buf, " <class_id bitmap='%s'/>\n", class_id); virBufferAsprintf(&buf, " <floor sum='%llu'/>\n", net->floor_sum); + VIR_FREE(class_id);
virBufferAdjustIndent(&buf, 2); if (virNetworkDefFormatInternal(&buf, net->def, flags) < 0) diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index efa9dea..6f6b221 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -38,6 +38,7 @@ # include "virnetdevvlan.h" # include "virmacaddr.h" # include "device_conf.h" +# include "bitmap.h"
enum virNetworkForwardType { VIR_NETWORK_FORWARD_NONE = 0, @@ -222,7 +223,7 @@ struct _virNetworkObj { virNetworkDefPtr def; /* The current definition */ virNetworkDefPtr newDef; /* New definition to activate at shutdown */
- unsigned int class_id; /* next class ID for QoS */ + virBitmapPtr class_id; /* bitmap of class IDs for QoS */ unsigned long long floor_sum; /* sum of all 'floor'-s of attached NICs */ };
diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 5a0f43f..8341a78 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -4249,6 +4249,31 @@ cleanup: return ret; }
+/** + * networkNextClassID: + * @net: network object + * + * Find next free class ID. @net is supposed + * to be locked already. If there is a free ID, + * it is marked as used and returned. + * + * Returns next free class ID or -1 if none is available. + */ +static ssize_t +networkNextClassID(virNetworkObjPtr net) +{ + size_t ret = 0; + bool is_set = false; + + while (virBitmapGetBit(net->class_id, ret, &is_set) == 0 && is_set) + ret++; + + if (is_set || virBitmapSetBit(net->class_id, ret) < 0) + return -1; + + return ret; +} + int networkNotifyPlug(virNetworkPtr network, virDomainNetDefPtr iface) @@ -4258,6 +4283,7 @@ networkNotifyPlug(virNetworkPtr network, int ret = -1; int plug_ret; unsigned long long new_rate = 0; + ssize_t class_id = 0;
networkDriverLock(driver); net = virNetworkFindByUUID(&driver->networks, network->uuid); @@ -4274,15 +4300,21 @@ networkNotifyPlug(virNetworkPtr network, goto cleanup; }
+ /* generate new class_id */ + if ((class_id = networkNextClassID(net)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not generate next class ID")); + goto cleanup; + } + plug_ret = virNetDevBandwidthPlug(net->def->bridge, net->def->bandwidth, iface->ifname, &iface->mac, iface->bandwidth, - net->class_id); + class_id); if (plug_ret < 0) { - ignore_value(virNetDevBandwidthUnplug(net->def->bridge, - net->class_id)); + ignore_value(virNetDevBandwidthUnplug(net->def->bridge, class_id)); goto cleanup; }
@@ -4296,17 +4328,15 @@ networkNotifyPlug(virNetworkPtr network, }
/* QoS was set, generate new class ID */ - iface->class_id = net->class_id; - net->class_id++; + iface->class_id = class_id; /* update sum of 'floor'-s of attached NICs */ net->floor_sum += iface->bandwidth->in->floor; /* update status file */ if (virNetworkSaveStatus(NETWORK_STATE_DIR, net) < 0) { - net->class_id--; + ignore_value(virBitmapClearBit(net->class_id, class_id)); net->floor_sum -= iface->bandwidth->in->floor; iface->class_id = 0; - ignore_value(virNetDevBandwidthUnplug(net->def->bridge, - net->class_id)); + ignore_value(virNetDevBandwidthUnplug(net->def->bridge, class_id)); goto cleanup; } /* update rate for non guaranteed NICs */ @@ -4348,9 +4378,12 @@ networkNotifyUnplug(virDomainNetDefPtr iface) goto cleanup; /* update sum of 'floor'-s of attached NICs */ net->floor_sum -= iface->bandwidth->in->floor; + /* return class ID */ + ignore_value(virBitmapClearBit(net->class_id, iface->class_id)); /* update status file */ if (virNetworkSaveStatus(NETWORK_STATE_DIR, net) < 0) { net->floor_sum += iface->bandwidth->in->floor; + ignore_value(virBitmapSetBit(net->class_id, iface->class_id)); goto cleanup; } /* update rate for non guaranteed NICs */

On 30.11.2012 21:55, Laine Stump wrote:
On 11/19/2012 11:51 AM, Michal Privoznik wrote:
With current implementation, class ID is just incremented. This can lead to its exhaustion as tc accepts only 16 bits long identifiers. Hence, it's better if we allow class ID to be reused. To keep track which IDs are free and which are taken virBitmap is used. This requires network status file to change a bit: from <class_id next="5"/> to <class_id bitmap="0-4"/>. But since the previous format hasn't been released, it doesn't really matter.
Heh. Well, there you have it. :-) You've already implemented what I suggested in the review of 5/10. But rather than introducing one implementation that we need to review, then almost immediately replacing it with something else, why not just implement it this way to begin with?
Because I think 5/10 is big enough already :)

On 12/03/2012 10:55 AM, Michal Privoznik wrote:
On 30.11.2012 21:55, Laine Stump wrote:
On 11/19/2012 11:51 AM, Michal Privoznik wrote:
With current implementation, class ID is just incremented. This can lead to its exhaustion as tc accepts only 16 bits long identifiers. Hence, it's better if we allow class ID to be reused. To keep track which IDs are free and which are taken virBitmap is used. This requires network status file to change a bit: from <class_id next="5"/> to <class_id bitmap="0-4"/>. But since the previous format hasn't been released, it doesn't really matter. Heh. Well, there you have it. :-) You've already implemented what I suggested in the review of 5/10. But rather than introducing one implementation that we need to review, then almost immediately replacing it with something else, why not just implement it this way to begin with? Because I think 5/10 is big enough already :)
I actually don't mind large patches, as long as they're not mixing up a bunch of unrelated stuff. (And I don't mind split patches *too* much as long as a later patch doesn't undo too much stuff that was just put in with an earlier patch.)

SFQ is qdisc which doesn't really shape any traffic but 'just' re-arrange packets in sending buffer so no stream starve. The goal is to ensure fairness. There is basically only one configuration parameter (perturb) which is set to advised value of 10. --- src/util/virnetdevbandwidth.c | 10 ++++++++++ 1 files changed, 10 insertions(+), 0 deletions(-) diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 3abe7e2..b4ffc29 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -124,6 +124,16 @@ virNetDevBandwidthSet(const char *ifname, virCommandFree(cmd); cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "add", "dev", ifname, "parent", + hierarchical_class ? "1:2" : "1:1", + "handle", "2:", "sfq", "perturb", + "10", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); virCommandAddArgList(cmd,"filter", "add", "dev", ifname, "parent", "1:0", "protocol", "ip", "handle", "1", "fw", "flowid", "1", NULL); -- 1.7.8.6

On 11/19/2012 11:51 AM, Michal Privoznik wrote:
SFQ is qdisc which doesn't really shape any traffic but 'just' re-arrange packets in sending buffer so no stream starve. The goal is to ensure fairness. There is basically only one configuration parameter (perturb) which is set to advised value of 10.
If this is a part of adding floor, then it should be merged into the earlier patch (2/11 I think?). If it is a good thing to have independent of floor, then I think it should be put in right at the beginning of the series (without the reference to hierarchical_class) so that if somebody wants to backport just that patch, it will be easier (then in 2/11 you would add the reference to hierarchical_class). Otherwise ACK.
--- src/util/virnetdevbandwidth.c | 10 ++++++++++ 1 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index 3abe7e2..b4ffc29 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -124,6 +124,16 @@ virNetDevBandwidthSet(const char *ifname,
virCommandFree(cmd); cmd = virCommandNew(TC); + virCommandAddArgList(cmd, "qdisc", "add", "dev", ifname, "parent", + hierarchical_class ? "1:2" : "1:1", + "handle", "2:", "sfq", "perturb", + "10", NULL); + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + virCommandFree(cmd); + cmd = virCommandNew(TC); virCommandAddArgList(cmd,"filter", "add", "dev", ifname, "parent", "1:0", "protocol", "ip", "handle", "1", "fw", "flowid", "1", NULL);

On 30.11.2012 22:00, Laine Stump wrote:
On 11/19/2012 11:51 AM, Michal Privoznik wrote:
SFQ is qdisc which doesn't really shape any traffic but 'just' re-arrange packets in sending buffer so no stream starve. The goal is to ensure fairness. There is basically only one configuration parameter (perturb) which is set to advised value of 10.
If this is a part of adding floor, then it should be merged into the earlier patch (2/11 I think?). If it is a good thing to have independent of floor, then I think it should be put in right at the beginning of the series (without the reference to hierarchical_class) so that if somebody wants to backport just that patch, it will be easier (then in 2/11 you would add the reference to hierarchical_class).
Otherwise ACK.
Okay, I'll move it in the front.

This approach implemented in previous patches is not trivial and deserves small description. --- src/util/virnetdevbandwidth.c | 68 +++++++++++++++++++++++++++++++++++++--- 1 files changed, 62 insertions(+), 6 deletions(-) diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index b4ffc29..d32c7db 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -92,13 +92,69 @@ virNetDevBandwidthSet(const char *ifname, if (virCommandRun(cmd, NULL) < 0) goto cleanup; + /* If we are creating hierarchical class, all non guaranteed traffic + * goes to 1:2 class which will adjust 'rate' dynamically as NICs with + * guaranteed throughput are plugged and unplugged. Class 1:1 is there + * so we don't exceed the maximum limit for network. For each NIC with + * guaranteed throughput a separate classid will be created. + * NB '1:' is just a shorter notation of '1:0'. + * + * To get a picture how this works: + * + * +-----+ +---------+ +-----------+ +-----------+ +-----+ + * | | | qdisc | | class 1:1 | | class 1:2 | | | + * | NIC | | def 1:2 | | rate | | rate | | sfq | + * | | --> | | --> | peak | -+-> | peak | --> | | + * +-----+ +---------+ +-----------+ | +-----------+ +-----+ + * | + * | +-----------+ +-----+ + * | | class 1:3 | | | + * | | rate | | sfq | + * +-> | peak | --> | | + * | +-----------+ +-----+ + * ... + * | +-----------+ +-----+ + * | | class 1:n | | | + * | | rate | | sfq | + * +-> | peak | --> | | + * +-----------+ +-----+ + * + * After the routing decision, when is it clear a packet is to be send + * via NIC, it is sent to root qdisc (queueing discipline). In this case + * HTB (Hierarchical Token Bucket). It has only one direct child class + * (with id 1:1) which shapes the overall rate that is sent through NIC. + * This class have at least one child (1:2). This is meant for whole + * non-privileged (non guaranteed) traffic from all domains. Then, for + * each interface with guaranteed throughput a separate class (1:n) is + * created. Imagine a class is a box. Whenever a packet ends up in a + * class it is stored in this box until a kernel sends it in which case + * it is removed from box. Packets are placed into boxes based on rules + * (filters) - e.g. depending on destination IP/MAC address. If there is + * no rule to be applied, root qdisc have a default where such packets + * go (1:2 in this case). Packets come in over and over again and boxes + * get filled more and more. Imagine that kernel sends packets just once + * a second. So it starts to traverse through this tree. It starts with + * root qdisc and over 1:1 it gets to 1:2. It sends packets up to its + * 'rate'. Then it takes 1:3 and again sends packets up to its 'rate'. + * And the whole process is repeated until 1:n is processed. So now we + * have ensured each class its guaranteed bandwidth. If the sum of sent + * data doesn't exceed 'rate' in 1:1 class, we can go further and send + * more packets. The rest of available bandwidth is distributed to + * 1:2,1:3...1:n classes by ratio of their 'rate'. As soon as root + * 'rate' limit is reached or there are no more packets to send, we stop + * sending and wait another second. Each class has SFQ qdisc which + * shuffles packets in boxes stochastically, so one sender could not + * starve others. + * + * Therefore, whenever we want to plug a new guaranteed interface, we + * need to create a new class and adjust 'rate' of 1:2 class. When + * unplugging we do the exact opposite - remove associated class, and + * adjust the 'rate'. + * + * This description is rather longer and you'd better read it before you + * start digging into this :) + */ if (hierarchical_class) { - /* If we are creating hierarchical class, all non guaranteed traffic - * goes to 1:2 class which will adjust 'rate' dynamically as NICs with - * guaranteed throughput are plugged and unplugged. Class 1:1 is there - * so we don't exceed the maximum limit for network. For each NIC with - * guaranteed throughput a separate classid will be created. - * NB '1:' is just a shorter notation of '1:0' */ virCommandFree(cmd); cmd = virCommandNew(TC); virCommandAddArgList(cmd, "class", "add", "dev", ifname, "parent", -- 1.7.8.6

On 11/19/2012 11:51 AM, Michal Privoznik wrote:
This approach implemented in previous patches is not trivial and deserves small description. --- src/util/virnetdevbandwidth.c | 68 +++++++++++++++++++++++++++++++++++++--- 1 files changed, 62 insertions(+), 6 deletions(-)
diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c index b4ffc29..d32c7db 100644 --- a/src/util/virnetdevbandwidth.c +++ b/src/util/virnetdevbandwidth.c @@ -92,13 +92,69 @@ virNetDevBandwidthSet(const char *ifname, if (virCommandRun(cmd, NULL) < 0) goto cleanup;
+ /* If we are creating hierarchical class, all non guaranteed traffic
"creating *a* hierarchical class"
+ * goes to 1:2 class which will adjust 'rate' dynamically as NICs with
"goes to *the& 1:2 class"
+ * guaranteed throughput are plugged and unplugged. Class 1:1 is there
s/is there/exists/
+ * so we don't exceed the maximum limit for network. For each NIC with
"for *the* network"
+ * guaranteed throughput a separate classid will be created. + * NB '1:' is just a shorter notation of '1:0'. + * + * To get a picture how this works: + * + * +-----+ +---------+ +-----------+ +-----------+ +-----+ + * | | | qdisc | | class 1:1 | | class 1:2 | | | + * | NIC | | def 1:2 | | rate | | rate | | sfq | + * | | --> | | --> | peak | -+-> | peak | --> | | + * +-----+ +---------+ +-----------+ | +-----------+ +-----+ + * | + * | +-----------+ +-----+ + * | | class 1:3 | | | + * | | rate | | sfq | + * +-> | peak | --> | | + * | +-----------+ +-----+ + * ... + * | +-----------+ +-----+ + * | | class 1:n | | | + * | | rate | | sfq | + * +-> | peak | --> | | + * +-----------+ +-----+ + * + * After the routing decision, when is it clear a packet is to be send
s/send/sent/
+ * via NIC, it is sent to root qdisc (queueing discipline). In this case
"via *a particular* NIC, it is sent to *the* root qdisc" (for that NIC?)
+ * HTB (Hierarchical Token Bucket). It has only one direct child class + * (with id 1:1) which shapes the overall rate that is sent through NIC.
"through the NIC"
+ * This class have at least one child (1:2). This is meant for whole
s/have/has/ What does "is meant" mean here? A better phrase is needed s/whole/all/
+ * non-privileged (non guaranteed) traffic from all domains. Then, for + * each interface with guaranteed throughput a separate class (1:n) is
put a comma after throughput
+ * created. Imagine a class is a box. Whenever a packet ends up in a + * class it is stored in this box until a kernel sends it in which case
s/a kernel/the kernel/ s/in which case/, then/
+ * it is removed from box. Packets are placed into boxes based on rules + * (filters) - e.g. depending on destination IP/MAC address. If there is + * no rule to be applied, root qdisc have a default where such packets
"*the* root qdisc *has* a default"
+ * go (1:2 in this case). Packets come in over and over again and boxes + * get filled more and more. Imagine that kernel sends packets just once
s/kernel/the kernel/
+ * a second. So it starts to traverse through this tree. It starts with + * root qdisc and over 1:1 it gets to 1:2. It sends packets up to its s/root/the root/ s/over/through/
What does "its" refer to? 1:2? If so, use "1:2's", because "its" is ambiguous.
+ * 'rate'. Then it takes 1:3 and again sends packets up to its 'rate'.
s/takes/moves to/ Again, "its" is kind of ambiguous. I *think* you mean 1:3's rate, but because you've repeatedly referred to the kernel as "it", someone may be confused into thinking that the kernel has a 'rate', or something equally inane.
+ * And the whole process is repeated until 1:n is processed. So now we
s/And the/The/ What does "until 1:n is processed" mean? That if there are "n" classes, all of them will be processed? In that case, maybe you could change it to something like "The whole process is repeated until all classes 1:1 - 1:n are processed."
+ * have ensured each class its guaranteed bandwidth. If the sum of sent + * data doesn't exceed 'rate' in 1:1 class, we can go further and send
"exceed *the* 'rate'"
+ * more packets. The rest of available bandwidth is distributed to
s/to/to the/
+ * 1:2,1:3...1:n classes by ratio of their 'rate'. As soon as root
"as *the* root"
+ * 'rate' limit is reached or there are no more packets to send, we stop + * sending and wait another second. Each class has SFQ qdisc which
"has *an* SFQ qdisc"
+ * shuffles packets in boxes stochastically, so one sender could not
s/could/can/
+ * starve others. + * + * Therefore, whenever we want to plug a new guaranteed interface, we
s/plug/plug in/
+ * need to create a new class and adjust 'rate' of 1:2 class. When
"adjust *the* 'rate' of *the* 1:2 class."
+ * unplugging we do the exact opposite - remove associated class, and
"remove *the* associated class"
+ * adjust the 'rate'. + * + * This description is rather longer and you'd better read it before you + * start digging into this :)
"This description is rather long, but it is still a good idea to read it before you dig into the code"
+ */ if (hierarchical_class) { - /* If we are creating hierarchical class, all non guaranteed traffic - * goes to 1:2 class which will adjust 'rate' dynamically as NICs with - * guaranteed throughput are plugged and unplugged. Class 1:1 is there - * so we don't exceed the maximum limit for network. For each NIC with - * guaranteed throughput a separate classid will be created. - * NB '1:' is just a shorter notation of '1:0' */ virCommandFree(cmd); cmd = virCommandNew(TC); virCommandAddArgList(cmd, "class", "add", "dev", ifname, "parent",
ACK with those changes, but please squash it into the patch that adds the code.
participants (2)
-
Laine Stump
-
Michal Privoznik