
On Wed, Jan 22, 2020 at 02:59:36PM +0200, Dan Kenigsberg wrote:
On Tue, Jan 21, 2020 at 4:47 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Tue, Jan 21, 2020 at 12:46:38PM +0200, Dan Kenigsberg wrote:
On Mon, Jan 20, 2020 at 8:33 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Sun, Jan 19, 2020 at 10:24:19PM -0500, Laine Stump wrote:
Current virtio-net drivers that support the failover feature match up the virtio backup device with its corresponding hostdev device by looking for an interface with a matching MAC address. Since libvirt will assign a different random MAC address to each interface that isn't given an explicit MAC address, this means that the configuration for any failover pairs must include explicit matching MAC addresses.
To make life easier, we could have libvirt populate the XML config with the same auto-generated MAC address for both interfaces when it detects a failover pair that have no MAC addresses provided (a failover pair can be detected by matching <alias name='blah'/> of the virtio interface with <driver backupAlias='blah'/> of the hostdev interface), but then we would be stuck with that behavior even if the virtio guest driver later eliminated the requirement that mac addresses match.
Additionally, some management software uses the MAC address as the primary index for its list of network devices, and can't properly deal with two interfaces having the same MAC address (oVirt). Even libvirt's own virsh utility uses MAC address (combined with interface type) to uniquely identify interfaces for the virsh detach-interface command (in this case, fortunately the runtime interface type is used, so one of the interfaces will always be of type='hostdev' and the other type='something-else", so it doesn't currently cause any problem).
In order to remove the necessity of explicitly setting interface MAC addresses, as well as permit the two interfaces of a failover pair to each have a unique index while still fulfilling the current guest driver requirement that the MAC addresses matchin the guest, this patch adds a new attribute "useBackupMAC" that is set on the hostdev interface of the pair. When useBackupMAC='yes', the setup for the hostdev interface will find the virtio failover interface (using backupAlias) and use that interface's MAC address to initialize the MAC address of the hostdev interface; the MAC address in the hostdev interface config remains unchanged, it just isnt used for device initialization.
I made this patch to followup on https://bugzilla.redhat.com/show_bug.cgi?id=1693587#c12 (where I suggested this attribute as a possible remedy to oVirt's requirement that each network device have a unique MAC address).
Truthfully, I'm not convinced that I want it though, as it seems "a bit" hackish. In particular, I've thought for a long time that the "hostdev manager" code in util/virhostdev.c should really live in the node device driver and be called from the hypervisors via a public API (so that there is one central place on the host that maintains the list of in-use PCI devices and their status), but this patch adds an obstacle to that goal by adding a virDomainDefPtr to more of the APIs in that library - if this was turned into a public API, then entire virDomainDef would need to be serialized and sent in the API call, then parsed at the other end - yuck :-/. NB: there are already functions in virhostdev.h that take a virDomainDefPtr, so maybe I'm being too sensitive.
On the upside, it solves a problem, and default bevahior is unchanged.
I don't believe it does solve a real problem.
If a mgmt app is capable of setting useBackupMAC=yes when writing the XML doc, then I don't see why it cannot just as easily set the matching MAC address when wring the XML doc.
It can still treat both NICs as having a different MAC address in its own internal code. All it has to do is use the matching MAC address when writing out the XML config it gives to libvirt.
I know oVirt has a facility for hook scripts that are add-ons which can arbitrarily munge the XML that VDSM creates. So AFAICT this doesn't even need to involve changes to VDSM itself. There can be a hook script which looks for the config indicating a failover pair, and rewrites the XML to have the matching MAC addr.
Such a workaround then only needs to exist for as long as the mgmt app has this problematic limitation without impacting libvirt's maint.
So I don't want to take this application specific hack in libvirt.
I see why you can see it as a hack that should not exist in libvirt. However I would try to put out my case for it. This feature creates something very similar to an in-guest bond between an sriov interface to a virtio one. Currently, we ask end users to configure the bond, set the sriov interface as the primary interface, and allow mac-spoofing on the virio interface. To me, the purpose of this feature is to remove the need for end-user intervention. The bond device no longer need to be created in the guest, it can be configured by management on the host side. Defining bonds in Linux is an established procedure. You select pre-existing interfaces, each with its different mac address, pick up a bonding mode, pick up a master interface, pick up the active mac address of the bond and start using it. I'd like to have the same experience when I configure this new type of bonding via libvirt. It just feels right to have a couple of independent interfaces, bonded together, with one selected as "master".
This is comparing apples & oranges IMHO. It is comparing what is done as an end user in the guest with what is done as an internal config option in libvirt. If you want to compare the user experiance, then the comparison is between the guest setup and the RHEV user interface experiance. AFAICT, we can achieve the desired user experiance in RHEV and there's no functional reason to need this patch. It is just adding a policy control knob that doesn't have any impact on what it is possible to configure for the guest, while adding to maint burden of libvirt.
IMHO there are three layers of apples here. In oVirt or KubeVirt, maybe also in OpenStack, a VM owner would like to define to virtual interfaces. One that is primary, based on SR-IOV. The other is secondary, connected via virtio to a logical network to their choosing. They would to specify the second interface as a backup for the first one. They don't really care about the mac address of the interfaces. In the guest, oVirt users can currently do just this using normal Linux bonding. IMHO a user of virsh or virt-manager would like to do that, too. He or she should not bother setting the mac address of the two interfaces; no human like to do that. They rather state which interface is the backup of another one.
I think that what you see as a maintenance burden stems from a design mistake^Wdecision in virtio, which expresses the "x is the backup of y" as "x and y have the same mac address and x is virtio". I (egotistically?) think of libvirt as the human-accessible API for virtualization, that let me express what I want, with little leakage of implementation issues.
I think this is where the disconnect is. The libvirt API is not aiming to provide human targetted convenience. It is intending to provide a machine targetted functional mechanism with clear semantics, on which applications can construct human targetted virtualization solutions. Human convenience entails making a large number of usage policy decisions based on criteria that are appropriate for the application's use cases & knowledge of the guest OS needs.
I'll try to express the user experience I envisage in KubeVirt terms:
kind: VM spec: domain: devices: interfaces: - name: fast sriov: {} - name: backup bridge: {} backupOf: fast # <---- how I'd like to express the relation between the two nics networks: - name: fast multus: networkName: sriov-vlan-100 - name: backup multus: networkName: ovs-vlan-100
Yes, this is a perfectly sane way to expose teaming in KubeVirt. What you're showing here is the KubeVirt config though, and this will be converted into a libvirt guest XML config, at which time KubeVirt can choose to provide a matching MAC address for the two NICs it has requested. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|