The approach implemented in the previous patches is not trivial
and deserves a short description.
---
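Note for reviewers: the "once a second" picture used in the comment below
can be sketched as a small toy model. This is only an illustration of the
mental model from the comment, not how the kernel's HTB is implemented and
not code from this series. The names toy_class, toy_tick and min_ul are
hypothetical, and bandwidth is counted in abstract units per tick.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of one leaf class (1:2, 1:3, ... 1:n). */
typedef struct {
    unsigned long rate;     /* guaranteed units per tick ('rate') */
    unsigned long backlog;  /* units currently waiting in the "box" */
    unsigned long sent;     /* units sent during the last tick */
} toy_class;

static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/* Simulate one tick; root_rate plays the role of the 1:1 'rate'.
 * Returns the total number of units sent in this tick.
 * Assumes values are small enough that spare * rate cannot overflow. */
unsigned long toy_tick(toy_class *cls, size_t n, unsigned long root_rate)
{
    unsigned long total = 0;
    size_t i;

    assert(cls != NULL || n == 0);

    /* Pass 1: every class may send up to its own guaranteed 'rate'. */
    for (i = 0; i < n; i++) {
        unsigned long room = root_rate > total ? root_rate - total : 0;
        unsigned long want = min_ul(cls[i].rate, cls[i].backlog);

        cls[i].sent = min_ul(want, room);
        cls[i].backlog -= cls[i].sent;
        total += cls[i].sent;
    }

    /* Pass 2: share what is left among the classes that still have
     * packets queued, in proportion to their 'rate'. */
    while (total < root_rate) {
        unsigned long spare = root_rate - total;
        unsigned long rate_sum = 0;
        unsigned long progress = 0;

        for (i = 0; i < n; i++)
            if (cls[i].backlog > 0)
                rate_sum += cls[i].rate;
        if (rate_sum == 0)
            break;              /* nothing left to send */

        for (i = 0; i < n; i++) {
            unsigned long extra;

            if (cls[i].backlog == 0)
                continue;
            extra = min_ul(spare * cls[i].rate / rate_sum, cls[i].backlog);
            cls[i].sent += extra;
            cls[i].backlog -= extra;
            total += extra;
            progress += extra;
        }
        if (progress == 0)
            break;              /* shares rounded down to zero */
    }
    return total;
}
```

For example, with two classes of rate 10 and 30, both heavily backlogged,
and a root rate of 80, the first pass sends 10 and 30 units and the second
pass splits the remaining 40 units 1:3 between them.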
src/util/virnetdevbandwidth.c | 68 +++++++++++++++++++++++++++++++++++++---
1 files changed, 62 insertions(+), 6 deletions(-)
diff --git a/src/util/virnetdevbandwidth.c b/src/util/virnetdevbandwidth.c
index b4ffc29..d32c7db 100644
--- a/src/util/virnetdevbandwidth.c
+++ b/src/util/virnetdevbandwidth.c
@@ -92,13 +92,69 @@ virNetDevBandwidthSet(const char *ifname,
if (virCommandRun(cmd, NULL) < 0)
goto cleanup;
+ /* If we are creating a hierarchical class, all non-guaranteed traffic
+ * goes to class 1:2, whose 'rate' is adjusted dynamically as NICs with
+ * guaranteed throughput are plugged and unplugged. Class 1:1 is there
+ * so we don't exceed the maximum limit for the network. For each NIC with
+ * guaranteed throughput a separate classid will be created.
+ * NB '1:' is just a shorter notation for '1:0'.
+ *
+ * To get a picture of how this works:
+ *
+ *  +-----+     +---------+     +-----------+      +-----------+     +-----+
+ *  |     |     |  qdisc  |     | class 1:1 |      | class 1:2 |     |     |
+ *  | NIC |     | def 1:2 |     |   rate    |      |   rate    |     | sfq |
+ *  |     | --> |         | --> |   peak    | -+-> |   peak    | --> |     |
+ *  +-----+     +---------+     +-----------+  |   +-----------+     +-----+
+ *                                             |
+ *                                             |   +-----------+     +-----+
+ *                                             |   | class 1:3 |     |     |
+ *                                             |   |   rate    |     | sfq |
+ *                                             +-> |   peak    | --> |     |
+ *                                             |   +-----------+     +-----+
+ *                                            ...
+ *                                             |   +-----------+     +-----+
+ *                                             |   | class 1:n |     |     |
+ *                                             |   |   rate    |     | sfq |
+ *                                             +-> |   peak    | --> |     |
+ *                                                 +-----------+     +-----+
+ *
+ * After the routing decision, when it is clear a packet is to be sent
+ * via the NIC, it is passed to the root qdisc (queueing discipline), in
+ * this case HTB (Hierarchical Token Bucket). It has only one direct child
+ * class (with id 1:1) which shapes the overall rate sent through the NIC.
+ * This class has at least one child (1:2), which is meant for all
+ * non-privileged (non-guaranteed) traffic from all domains. Then, for
+ * each interface with guaranteed throughput a separate class (1:n) is
+ * created. Imagine a class is a box. Whenever a packet ends up in a
+ * class it is stored in this box until the kernel sends it, at which
+ * point it is removed from the box. Packets are placed into boxes based
+ * on rules (filters), e.g. depending on destination IP/MAC address. If
+ * no rule applies, the root qdisc has a default class where such packets
+ * go (1:2 in this case). Packets keep coming in and the boxes fill up
+ * more and more. Imagine that the kernel sends packets just once a
+ * second. So it starts to traverse this tree. It starts with the
+ * root qdisc and via 1:1 it gets to 1:2. It sends packets up to its
+ * 'rate'. Then it takes 1:3 and again sends packets up to its 'rate'.
+ * And the whole process is repeated until 1:n is processed. So now we
+ * have ensured each class its guaranteed bandwidth. If the sum of sent
+ * data doesn't exceed the 'rate' of class 1:1, we can go further and send
+ * more packets. The rest of the available bandwidth is distributed among
+ * classes 1:2, 1:3 ... 1:n in proportion to their 'rate'. As soon as the
+ * root 'rate' limit is reached or there are no more packets to send, we
+ * stop sending and wait another second. Each class has an SFQ qdisc which
+ * shuffles packets in boxes stochastically, so one sender cannot
+ * starve others.
+ *
+ * Therefore, whenever we want to plug a new guaranteed interface, we
+ * need to create a new class and adjust the 'rate' of class 1:2. When
+ * unplugging we do the exact opposite: remove the associated class and
+ * adjust the 'rate'.
+ *
+ * This description is rather long, but you'd better read it before you
+ * start digging into this :)
+ */
if (hierarchical_class) {
- /* If we are creating hierarchical class, all non guaranteed traffic
- * goes to 1:2 class which will adjust 'rate' dynamically as NICs with
- * guaranteed throughput are plugged and unplugged. Class 1:1 is there
- * so we don't exceed the maximum limit for network. For each NIC with
- * guaranteed throughput a separate classid will be created.
- * NB '1:' is just a shorter notation of '1:0' */
virCommandFree(cmd);
cmd = virCommandNew(TC);
virCommandAddArgList(cmd, "class", "add",
"dev", ifname, "parent",
--
1.7.8.6