On 04/14/2015 12:59 PM, Michal Privoznik wrote:
So, as partly explained in previous commit, there is an issue
with NATed networks on a guest originated MAC change. When user
sets @floor in <inbound/> under <bandwidth/> it results in a QoS
hierarchy being built on the corresponding bridge (yes, this
feature is type='network' only). The reason is we can make
shaping decisions only there, where the whole traffic can be
seen. In case of bridged networks it's bridge. But, on the
bridge, in egress qdiscs (outgoing traffic, after routing
decision), due to some Linux kernel internal implementation, we
are not able to see the device packet came from. So we have a set
of filters that place the packet into correct qdisc based on the
MAC recorded in the packet. But with recent changes, this is not
working as expected. We taught libvirt to listen to MAC change
events from guests. However, the QoS hierarchy is not updated
accordingly resulting in applying incorrect shaping rules on a
vNIC that changed the MAC address.
How about something like this?
Because packets going through the egress from a bridge (where our
bandwidth limiting takes place) have no information about which
interface they came from, the QoS rules that we create instead use the
source MAC address of the packets to make their decisions about which
QDisc the packet should be in.
One flaw in this is that when a guest changed the MAC address it used, packets from the
guest would no longer be put into the correct QDisc, but would instead be put in an
"unprivileged" class, resulting in the bandwidth "floor" (minimum
guaranteed) being no longer honored.
Now that libvirt has infrastructure to capture and respond to RX_FILTER_CHANGE events
from
qemu (sent whenever a guest interface modifies its MAC address, among
other things), we can notice when a guest MAC address changes, and
update the QoS rules accordingly, so that bandwidth floor is honored even after a guest
MAC address change.
Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
src/qemu/qemu_driver.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index f5a3ef9..6aab1d0 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4382,6 +4382,28 @@ processNicRxFilterChangedEvent(virQEMUDriverPtr driver,
syncNicRxFilterDeviceOptions(def->ifname, guestFilter, hostFilter);
}
We had discussed in private mail whether this should always be done
regardless of the setting of trustGuestRxFilters. In some ways it seems
like it should, since the guest is free to change its MAC address
anyway. On the other hand, if the admin hasn't specifically allowed
changing the MAC address, there's nothing saying that we must honor the
bandwidth floor for the guest even though it is "going rogue".
Note that when macTableManager="libvirt", traffic from to/from the guest
*won't* pass at all after a MAC address change.
So maybe it *is* proper to not do this unless trustGuestRxFilters is
"yes" (as you have it doing right now).
+ if (virDomainNetGetActualType(def) == VIR_DOMAIN_NET_TYPE_NETWORK) {
+ const char *brname = virDomainNetGetActualBridgeName(def);
+
+ if (virNetDevGetRxFilter(def->ifname, &hostFilter)) {
+ VIR_WARN("Couldn't get current RX filter for device %s "
+ "while responding to NIC_RX_FILTER_CHANGED",
+ def->ifname);
+ goto endjob;
+ }
+
+ /* For TUN/TAP connections, set the following macvtap network device
+ * attributes to match those of the guest network device:
+ * - MAC address
+ * - QoS filters (which are based on MAC address)
+ */
+ syncNicRxFilterMacAddr(def->ifname, guestFilter, hostFilter);
syncNicRxFilterMacAddr() is only useful for macvtap devices. It sets the
MAC address of the tap (macvtap in that case) to the same address used
by the guest. Doing so for a tap device would be disastrous, as the
kernel would no longer understand that it had to forward packets
*through* the tap device to the guest, but would believe that the host
itself was the final destination.
If anything, the MAC address of the tap could be set to the guest's MAC
with the initial byte changed to 0xFE, but even that isn't really
necessary (the exact MAC address of the tap device isn't important, it's
just important that it is unique within the L2 domain that the tap is
connected to).
+
+ if (virNetDevBandwidthUpdateFilter(brname, &guestFilter->mac,
+ def->data.network.actual->class_id)
< 0)
+ goto endjob;
+ }
+
endjob:
qemuDomainObjEndJob(driver, vm);