[libvirt] RFC: setting mac address on network devices being assigned to a guest via PCI passthrough (<hostdev>)

Friday, 20 January 2012

To refresh everyone's memory, the origin of the problem I'm trying to 
solve here is that the VFs of an SRIOV-capable ethernet card are given 
new random MAC addresses each time the card is initialized. If those VFs 
are then passed-through to a guest using the existing <hostdev> config, 
the guest will see a new MAC address each time the host is restarted, 
and will thus believe that a new ethernet card has been installed. This 
can result in anything from a dialog claiming that the guest has 
connected to a new network (MS products) to a new network device name 
showing up (Linux: "hmm, eth0 was unplugged, but here's this new
device. Let's call it 'eth1'!").

Several months ago I sent out some mail proposing a scheme for 
automatically allocating network devices from a pool to be assigned to 
guests via PCI passthrough:

  https://www.redhat.com/archives/libvir-list/2011-August/msg00937.html

My idea was to have a new <network> forward mode combined with guest 
<interface> definitions that would end up auto-generating a transient 
<hostdev> entry in the guest's config (and setting the VF's mac address 
in the process). Dan Berrange pointed out in that thread that we really 
do need to have a persistent <hostdev> entry for these devices in the 
domain xml, if for no other reason than to guarantee that the same 
guest-side PCI address is always used (thus preventing surprises in the 
guest, such as re-activation demands from Microsoft OSes). (There were 
other reasons, but that one was the real "hard stop" for me.)

I've come back to this problem, and have decided that, while having the 
actual host device auto-allocated at runtime would be nice, first 
implementing a less ambitious solution that uses a hand-picked device 
would not preclude adding the more complicated/useful functionality 
later. So, here's a new, simpler proposal.

Step 1
------

In the end, the solution will be to replace the VF's auto-generated
random MAC address with a fixed MAC address supplied by libvirt before
the VF is assigned to the guest. As a first step toward this basic
requirement, I plan to simply extend the <hostdev> XML like this:

|<hostdev mode='subsystem' type='pci' managed='yes'>
|<source>
|<address bus='0x06' slot='0x02' function='0x0'/>
|</source>
|<mac address='11:22:33:44:55:66'/>
|</hostdev>

When libvirt sees <mac address...> in the hostdev at device attach time,
it would first verify that the device is a network device (if not, it
would log an error and fail the operation). If it is a network device,
the PCI address would be converted into a network device name, that
device would have its MAC address set to the configured value, and then
the attach would proceed.
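To make the attach-time logic concrete, here is a minimal sketch in
Python (libvirt itself is C; Python is used only for brevity). It
assumes the standard Linux sysfs layout, in which a PCI network function
bound to a host driver lists its interface name under
/sys/bus/pci/devices/<address>/net/. A VF already bound to pci-stub has
no such entry, so the lookup would have to happen before the device is
detached from its host driver. The helper names and the caller-supplied
set_mac callback are hypothetical. Note also that 11:22:33:44:55:66 has
the multicast bit (bit 0 of the first octet) set, so a real NIC would
refuse it; the sketch rejects such addresses.

```python
import os
import re
import xml.etree.ElementTree as ET

MAC_RE = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}")


def parse_hostdev(xml_text):
    """Return (pci_addr, mac) from a <hostdev> element; mac is None if absent."""
    root = ET.fromstring(xml_text)
    addr = root.find("source/address")
    # Assumption: an omitted domain attribute means PCI domain 0.
    domain, bus, slot, func = (int(addr.get(k, "0"), 16)
                               for k in ("domain", "bus", "slot", "function"))
    pci_addr = f"{domain:04x}:{bus:02x}:{slot:02x}.{func:x}"
    mac_el = root.find("mac")
    return pci_addr, (mac_el.get("address") if mac_el is not None else None)


def is_valid_unicast_mac(mac):
    """Six colon-separated hex octets, not all zero, multicast (I/G) bit clear."""
    if not MAC_RE.fullmatch(mac):
        return False
    first = int(mac[:2], 16)
    return (first & 0x01) == 0 and mac.replace(":", "").strip("0") != ""


def pci_to_netdev(pci_addr, sysfs_root="/sys"):
    """Name of the host interface bound to pci_addr, or None if there is none."""
    net_dir = os.path.join(sysfs_root, "bus/pci/devices", pci_addr, "net")
    try:
        names = sorted(os.listdir(net_dir))
    except FileNotFoundError:
        return None
    return names[0] if names else None


def prepare_hostdev(xml_text, set_mac, sysfs_root="/sys"):
    """Attach-time checks from Step 1; returns (pci_addr, netdev or None)."""
    pci_addr, mac = parse_hostdev(xml_text)
    if mac is None:
        return pci_addr, None            # ordinary PCI passthrough, nothing extra
    if not is_valid_unicast_mac(mac):
        raise ValueError(f"invalid unicast MAC address {mac!r}")
    netdev = pci_to_netdev(pci_addr, sysfs_root)
    if netdev is None:
        raise ValueError(f"{pci_addr} is not a network device")
    set_mac(netdev, mac)                 # caller-supplied, e.g. wrapping "ip link"
    return pci_addr, netdev
```

Passing set_mac in as a callback keeps the checks testable without root
privileges; the real MAC change could be done via netlink or by running
"ip link set dev <name> address <mac>".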

My main questions here are:

1) Is this the right place for the new element? Or should it go into 
<source>?

2) Should we bother trying to save the original MAC address to restore
when the device is released? (I guess that might be important if, for
example, the guest config was changed to use a different device but the
same MAC address; you could end up with two devices having the same MAC
address.)

3) I've seen requests from two places to do host-side virtual port
association (i.e. VEPA / 802.1Qb[gh]). Would it be feasible to do that
association with the device after setting the MAC address and before
assigning it to the guest (and likewise for the inverse)? Or would the
act of PCI assignment screw that up somehow? One of the messages in the
earlier thread says something about the guest's device initialization
undoing necessary setup. If it would work, a <virtualport> could just
be added along with <mac address>.
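Questions 2 and 3 largely come down to ordering. A minimal sketch,
assuming a hypothetical backend with get_mac/set_mac/associate/
disassociate operations (the in-memory FakeNetBackend stands in for real
netlink or 802.1Qbg/Qbh calls): save the original MAC, set the configured
one, associate, and on release undo those steps in reverse. It says
nothing about whether the guest's driver initialization undoes the
association; that is still the open question.

```python
from dataclasses import dataclass
from typing import Optional


class FakeNetBackend:
    """In-memory stand-in for host netdev operations (hypothetical interface)."""

    def __init__(self, macs):
        self.macs = dict(macs)
        self.log = []

    def get_mac(self, dev):
        return self.macs[dev]

    def set_mac(self, dev, mac):
        self.log.append(("set_mac", dev, mac))
        self.macs[dev] = mac

    def associate(self, dev, mac, profile):
        self.log.append(("associate", dev, profile))

    def disassociate(self, dev, mac, profile):
        self.log.append(("disassociate", dev, profile))


@dataclass
class PassthroughState:
    dev: str
    original_mac: str
    mac: str
    profile: Optional[str] = None


def prepare(backend, dev, mac, profile=None):
    """Before assignment: remember the old MAC, set the new one, then associate."""
    state = PassthroughState(dev, backend.get_mac(dev), mac, profile)
    backend.set_mac(dev, mac)
    if profile is not None:
        backend.associate(dev, mac, profile)
    return state


def release(backend, state):
    """After the guest lets go: disassociate first, then restore the old MAC."""
    if state.profile is not None:
        backend.disassociate(state.dev, state.mac, state.profile)
    backend.set_mac(state.dev, state.original_mac)
```

In a real implementation the saved original MAC would probably need to
be persisted alongside the domain's runtime state, so that it survives a
libvirtd restart between attach and release.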

Beyond those three questions, this all seems rather uncontroversial, so
I'll start coding something up right away (and modify as necessary after
discussion).

Step 2
------

Once the basic functionality is in place, a further step would be to
simplify the admin's job by replacing this config:

| <source>
| <address bus='x' slot='y' function='z'/>
| </source>

with:

| <source>
| <address netdev='eth22'/>
| </source>

(or possibly it could be a separate element within <source>, e.g.
"<network dev='eth22'/>"). As long as the domain isn't running, the
config would remain like this. The first time the device was attached,
the name in netdev would be resolved to a PCI address (or the operation
would fail if the given netdev wasn't a PCI device), and the config
auto-filled as follows:

| <source>
| <address bus='x' slot='y' function='z' netdev='eth22'/>
| </source>

On subsequent attaches (i.e. when both netdev and a PCI address are
present) the netdev would again be resolved and compared to the PCI
address to make sure they still agree; if not, the operation would fail.
This would satisfy management applications' desire to see the PCI
address info of all devices assigned to guests, while retaining the
original config info (and also lead quite nicely into step 3...).
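The resolve-then-verify logic for this step could look roughly like the
following Python sketch. It assumes the usual Linux sysfs layout, where
/sys/class/net/<name>/device is a symlink to the underlying device
directory and, for PCI NICs, that directory's name is the PCI address
(USB NICs and purely virtual interfaces fail the check). The function
names are hypothetical, and writing domain only when it is nonzero is
just a choice made to match the example config above.

```python
import os
import re

PCI_RE = re.compile(r"([0-9a-f]{4}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])")


def netdev_to_pci(netdev, sysfs_root="/sys"):
    """PCI address behind an interface name, or None if it is not a PCI device."""
    link = os.path.join(sysfs_root, "class/net", netdev, "device")
    if not os.path.islink(link):
        return None                      # unknown or purely virtual interface
    target = os.path.basename(os.path.realpath(link))
    return target if PCI_RE.fullmatch(target) else None   # e.g. USB NICs fail here


def resolve_source_address(addr_el, sysfs_root="/sys"):
    """Fill in bus/slot/function on first attach; verify them on later attaches."""
    netdev = addr_el.get("netdev")
    pci = netdev_to_pci(netdev, sysfs_root)
    if pci is None:
        raise ValueError(f"{netdev} is not a PCI network device")
    actual = dict(zip(("domain", "bus", "slot", "function"),
                      PCI_RE.fullmatch(pci).groups()))
    if addr_el.get("bus") is None:
        # First attach: record the address next to the netdev name.
        for key in ("bus", "slot", "function"):
            addr_el.set(key, "0x" + actual[key])
        if actual["domain"] != "0000":
            addr_el.set("domain", "0x" + actual["domain"])
    else:
        # Later attaches: the name and the recorded address must still agree.
        for key, value in actual.items():
            if int(addr_el.get(key, "0"), 16) != int(value, 16):
                raise ValueError(f"{netdev} is now at {pci}, not the configured address")
    return pci
```

Comparing numerically rather than as strings means '0x6' and '0x06'
are treated as the same bus.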

Step 3
------

To further simplify configuration, it would be very nice if the choice
of network device could be made automatically. Since libvirt's networks
already have the concept of a pool of devices (and also of portgroups,
which can be used to set <virtualport> parameters), it makes sense to
use that. In this case, a network would be defined something like this:

| <network>
| <name>passthrough-net</name>
| <forward dev='eth20' mode='hostdev'> <!-- or "hardware" or "device" -->
| <interface dev='eth20'/>
| <interface dev='eth21'/>
| <interface dev='eth22'/>
|       ..
| </forward>
| </network>

(It could also contain a virtualport definition and/or portgroups
containing virtualport definitions. Obviously, we would have to prohibit
<bandwidth> elements (and several other things) in the definitions.)
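Reading such a definition into a device pool is straightforward. A
minimal sketch using Python's xml.etree.ElementTree, assuming 'hostdev'
as the forward mode name (the choice between that, 'hardware' and
'device' is still open) and treating <bandwidth> as the only prohibited
element, since the "several other things" aren't listed yet:

```python
import xml.etree.ElementTree as ET

# Only <bandwidth> is named so far; this set is an illustrative starting point.
PROHIBITED_TAGS = {"bandwidth"}


def parse_passthrough_network(xml_text, mode="hostdev"):
    """Return (network name, list of pool device names) for a passthrough network."""
    root = ET.fromstring(xml_text)
    name = root.findtext("name")
    forward = root.find("forward")
    if forward is None or forward.get("mode") != mode:
        raise ValueError(f"network {name} does not use forward mode '{mode}'")
    # Walk the whole tree so a <bandwidth> inside a portgroup is caught too.
    for el in root.iter():
        if el.tag in PROHIBITED_TAGS:
            raise ValueError(f"<{el.tag}> is not allowed in network {name}")
    pool = [iface.get("dev") for iface in forward.findall("interface")]
    if not pool:
        raise ValueError(f"network {name} has an empty device pool")
    return name, pool
```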

Then, in lieu of a PCI address or network device name (as "netdev"), the
<hostdev>'s <source> would have a reference to the network:

|<hostdev mode='subsystem' type='pci' managed='yes'>
|<source>
|<address network='passthrough-net'/>
|</source>
|<mac address='11:22:33:44:55:66'/>
|</hostdev>

(or, again, maybe use the separate <network> element: "<network
name='passthrough-net'/>"). At attach time, the pool of devices in
passthrough-net would be searched for a free device, and if one is
found, that device would have its MAC address changed and be assigned to
the guest. In this case, the live XML would be updated with the PCI
address information, but when the guest was destroyed, the device would
be handed back to the pool, and the PCI address info once again removed
from the config.
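The allocate-on-attach, release-on-destroy cycle could be sketched as
follows. DevicePool, attach, destroy and the resolve_pci and set_mac
callbacks are hypothetical names; resolve_pci is assumed to return
(bus, slot, function) strings for a netdev, for example via a sysfs
lookup. Only the live <address> element is modified, so the persistent
config keeps just the network reference.

```python
class DevicePool:
    """Free/in-use bookkeeping for one passthrough network (illustrative)."""

    def __init__(self, name, devs):
        self.name = name
        self.free = list(devs)
        self.in_use = {}                 # netdev name -> domain name

    def allocate(self, domain):
        if not self.free:
            raise RuntimeError(f"no free device left in network {self.name}")
        dev = self.free.pop(0)
        self.in_use[dev] = domain
        return dev

    def release(self, dev):
        if dev not in self.in_use:
            raise KeyError(f"{dev} is not allocated from {self.name}")
        del self.in_use[dev]
        self.free.append(dev)


ADDR_KEYS = ("bus", "slot", "function")


def attach(pool, domain, addr_el, resolve_pci, set_mac, mac):
    """Pick a free device, set its MAC, and record its PCI address in the live XML."""
    dev = pool.allocate(domain)
    try:
        bus_slot_func = resolve_pci(dev)
        set_mac(dev, mac)
    except Exception:
        pool.release(dev)                # hand the device back if setup fails
        raise
    for key, value in zip(ADDR_KEYS, bus_slot_func):
        addr_el.set(key, value)
    return dev


def destroy(pool, dev, addr_el):
    """Return the device to the pool and strip the PCI address again."""
    pool.release(dev)
    for key in ADDR_KEYS:
        addr_el.attrib.pop(key, None)
```

Releasing the device when setup fails keeps a guest that never started
from holding on to a pool member.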

Steps 2 and 3 probably won't be implemented right away, but I thought I
should toss the ideas out there in case they lead to something else that
would require a change in step 1.
