On 01/23/2012 09:08 AM, Paolo Bonzini wrote:
On 01/20/2012 10:50 PM, Laine Stump wrote:
> To refresh everyone's memory, the origin of the problem I'm trying to
> solve here is that the VFs of an SRIOV-capable ethernet card are given
> new random MAC addresses each time the card is initialized. If those VFs
> are then passed-through to a guest using the existing <hostdev> config,
> the guest will see a new MAC address each time the host is restarted,
> and will thus believe that a new ethernet card has been installed. This
> can result in anything from a dialog claiming that the guest has
> connected to a new network (MS products) to a new network device name
> showing up (Linux - "hmm, eth0 was unplugged, but here's this new
> device. Let's call it 'eth1'!").
>
> Several months ago I sent out some mail proposing a scheme for
> automatically allocating network devices from a pool to be assigned to
> guests via PCI passthrough:
>
>
> https://www.redhat.com/archives/libvir-list/2011-August/msg00937.html
>
>
> My idea was to have a new <network> forward mode combined with guest
> <interface> definitions that would end up auto-generating a transient
> <hostdev> entry in the guest's config (and setting the VF's mac address
> in the process). Dan Berrange pointed out in that thread that we really
> do need to have a persistent <hostdev> entry for these devices in the
> domain xml, if for no other reason than to guarantee that the same
> guest-side PCI address is always used (thus preventing surprises in the
> guest, such as re-activation demands from Microsoft OSes). (There were
> other reasons, but that one was the real "hard stop" for me.)
>
> I've come back to this problem, and have decided that, while having the
> actual host device auto-allocated at runtime would be nice, first
> implementing a less ambitious solution that uses a hand-picked device
> would not preclude adding the more complicated/useful functionality
> later. So, here's a new simpler proposal.
>
>
> Step 1
> ------
>
> In the end, the solution will be that the VF's auto-generated random MAC
> address should be replaced with a fixed MAC address supplied by libvirt
> prior to assigning the VF to the guest. As a first step to satisfy this
> basic requirement, I'm figuring to just extend the <hostdev> xml in this
> way:
>
> |<hostdev mode='subsystem' type='pci' managed='yes'>
> |<source>
> |<address bus='0x06' slot='0x02' function='0x0'/>
> |</source>
> |<mac address='11:22:33:44:55:66'/>
> |</hostdev>
AARRRGGGHH!!!! Is there no way for me to force Thunderbird to keep its
hands off the white space at the beginning of lines in XML example
snippets?? (These were all nicely indented, and I added the "|" at the
beginning of each line because I knew Thunderbird would swallow the
whitespace if it was at the beginning of the line).
In view of the discussion on SCSI passthrough, it seems to me that
this should be attached to an <interface> element:
<devices>
  <interface type='hostdev'>
    <source>
      <address type='pci' bus='0x06' slot='0x02' function='0x0'/>
    </source>
    <mac address='00:16:3e:5d:c7:9e'/>
    <address type='pci' .../>
  </interface>
</devices>
Nice! I should have thought of this in my original proposal - it's the
logical extension of having networks of type='hostdev'. I would prefer
this as well, but it hits one of Dan's criticisms of the original
proposal (from
https://www.redhat.com/archives/libvir-list/2011-August/msg01033.html ),
so I didn't further consider using a change to <interface>:
On 08/22/2011 at 05:17 AM, Dan Berrange wrote:
The issue I see is that if an application wants to know what
PCI devices have been assigned to a guest, they can no longer
just look at <hostdev> elements. They also need to look at
<interface> elements. If we follow this proposed model in other
areas, we could end up with PCI devices appearing as <disks>,
<controllers> and who knows what else. I think this is not
very desirable for applications, and it is also not good for
our internal code that manages PCI devices. ie the security
drivers now have to look at many different places to find
what PCI devices need labelling.
Did something to nullify that criticism come up in the SCSI passthrough
discussion? If so, I'll implement that instead. I guess this would just
mean that, in order to know what PCI devices have been assigned to a
guest, a scan should be done of all devices for a <source> element that
has <address type='pci' ... /> (along with <hostdev type='pci'...>). The
problem, of course, is that existing management applications will need
to be modified to recognize this, but it does seem like a nice generic
extension (assuming it could conceivably work for any type of device,
not just <interface> and <hostdev>). This syntax meets the other
criterion of preserving pci address location in the guest. It would
require new checks to disallow migration if an <interface type='hostdev'
...> was attached, though.
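
A rough sketch of the scan described above, in Python using only the standard library's xml.etree.ElementTree. It assumes the <interface type='hostdev'> syntax proposed in this thread (which is only a proposal here); the function name host_pci_sources and the sample domain XML are made up for illustration, and this is not how libvirt itself tracks assigned devices.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML mixing the existing <hostdev> syntax with the
# <interface type='hostdev'> syntax proposed in this thread.
EXAMPLE_DOMAIN = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address bus='0x06' slot='0x02' function='0x0'/>
      </source>
    </hostdev>
    <interface type='hostdev'>
      <source>
        <address type='pci' bus='0x06' slot='0x02' function='0x1'/>
      </source>
      <mac address='00:16:3e:5d:c7:9e'/>
      <address type='pci' bus='0x00' slot='0x05' function='0x0'/>
    </interface>
  </devices>
</domain>
"""


def host_pci_sources(domain_xml):
    """List (device tag, bus, slot, function) for each host-side PCI address.

    Only <address> elements nested inside <source> are considered, so the
    guest-side <address type='pci'> directly under a device is ignored.
    """
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        return []

    found = []
    for dev in devices:
        for addr in dev.findall("./source/address"):
            # Existing <hostdev type='pci'> carries the type on the device;
            # the proposed syntax carries type='pci' on the address itself.
            is_pci = addr.get("type") == "pci" or (
                dev.tag == "hostdev" and dev.get("type") == "pci"
            )
            if is_pci:
                found.append(
                    (dev.tag, addr.get("bus"), addr.get("slot"), addr.get("function"))
                )
    return found


if __name__ == "__main__":
    for entry in host_pci_sources(EXAMPLE_DOMAIN):
        print(entry)
```

Note that a scan like this still would not find devices referenced only by netdev name or by network, which is the gap discussed further down.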
(I have a feeling there's going to be blowback on the "security drivers"
front... :-)
(Note that even with *no new XML*, we already have a problem where just
scanning all the <hostdev> entries won't tell us about all host devices
that are currently assigned exclusively to guests - using a network
device via macvtap in passthrough mode is effectively the same (although
it's not directly exposed to the guest as the original PCI device, that
device isn't available for use by any other guest, or by the host)).
> 3) I've seen requests from 2 places to do host-side virtual port
> association (i.e. vepa / 802.1Qb[gh]). Would it be feasible to do that
> association with the device after setting MAC address and before
> assigning it to the guest? (and likewise for the inverse) Or would the
> act of PCI assignment screw that up somehow? (one of the messages in the
> earlier thread says something about the device initialization by the
> guest un-doing necessary setup) (if it would work, a <virtualport> could
> just be added along with <mac address>).
I know almost nothing about this, but it does sound like another hint
that augmenting <interface> is a better plan.
Agreed; that was kind of the idea of the original proposal, and I still
prefer it (especially with your logical extension). There is stuff in
<interface> that would never apply to a pci-passthrough interface (e.g.
bandwidth control), but there is just as much that does apply.
> Step 2
> ------
>
> Once the basic functionality is in place, a further step would be one
> just to simplify the admin's job - we could do this by replacing this
> config:
>
> | <source>
> | <address bus='x' slot='y' function='z'/>
> | </source>
>
> with:
>
> | <source>
> | <address netdev='eth22'/>
> | </source>
<devices>
<interface type='hostdev'>
<source dev='eth22'/>
<address type='pci' .../>
(NB: the <address type='pci'.../> you show here is used to configure the
address on the guest, not on the host)
</interface>
</devices>
Right - the one "required" feature that's missing though is that the pci
address on the host is then no longer easily grabbed by a management
application (as I mentioned before, though, that's already the case for
interfaces assigned using macvtap-passthrough, and they're just as
unavailable to other guests as pci-passthrough interfaces).
>
>> To further simplify configuration, it would be very nice if the choice
>> of network device could be done automatically. Since libvirt's networks
>> already have the concept of a pool of devices (and also of portgroups
>> which can be used to set <virtualport> parameters), it kind of makes
>> sense to use that. In this case, a network would be defined something
>> like this:
>>
>> | <network>
>> | <name>passthrough-net</name>
>> | <forward dev='eth20' mode='hostdev'> <!-- or "hardware" or "device" -->
>> | <interface dev='eth20'/>
>> | <interface dev='eth21'/>
>> | <interface dev='eth22'/>
>> | ..
>> | </forward>
>> | </network>
>>
>> (it could also contain a virtualport definition and/or portgroups
>> containing virtualport definitions. Obviously, we would have to prohibit
>> <bandwidth> elements (and several other things) in the definitions.)
>>
>> Then, in lieu of a pci address or network device name (as "netdev"), the
>> <hostdev>'s <source> would have a reference to the network:
>>
>> |<hostdev mode='subsystem' type='pci' managed='yes'>
>> |<source>
>> |<address network='passthrough-net'/>
>> |</source>
>> |<mac address='11:22:33:44:55:66'/>
>> |</hostdev>
>
> <devices>
>   <interface type='hostdev'>
>     <source network='passthrough-net'/>
>     <mac address='11:22:33:44:55:66'/>
>     <address type='pci' .../>
>   </interface>
> </devices>
And of course at runtime, the host device actually used would be listed
in the <actual> element (which would also show the "actual type" to be
"hostdev").
Again, though, the host-side pci address of the device isn't available
anywhere in the XML. I personally don't have a problem with that, but
then I'm not an author/maintainer of any management application :-)
> (or, again, maybe use the separate <network> element: "<network
> name='passthrough-net'/>".) At attach time, the pool of devices in
> passthrough-net would be searched for a free device, and if found, that
> device would have its MAC address changed and be assigned to the guest.
> In this case, the live XML would be updated with the pci address
> information, but when the guest was destroyed, the device would be
> handed back to the pool, and the pci address info once again removed
> from the config.
This sounds really nice, especially together with the auto-add VF
functionality that was committed recently.
Yep. I can't imagine doing PCI passthrough with 64 VFs by manually
entering in the PCI address of each VF.
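
For what it's worth, the attach/detach behaviour described in the quoted proposal (search the pool for a free device, give it the guest's MAC, hand it back when the guest is destroyed) can be sketched roughly as below. This is only an illustration in Python: the class PassthroughPool and its methods are hypothetical names, nothing here touches real hardware or libvirt, and the actual step of programming the MAC address into the VF is left as a comment.

```python
class PassthroughPool:
    """Toy model of a <network> whose <forward> lists passthrough devices."""

    def __init__(self, name, devices):
        self.name = name
        self._free = list(devices)   # devices not assigned to any guest
        self._in_use = {}            # guest name -> (netdev, mac)

    def attach(self, guest, mac):
        """Pick a free device for the guest and record the MAC it should use."""
        if guest in self._in_use:
            raise ValueError(f"{guest} already has a device from {self.name}")
        if not self._free:
            raise RuntimeError(f"no free devices left in {self.name}")
        dev = self._free.pop(0)
        # A real implementation would set the device's MAC address here,
        # before handing the device to the guest.
        self._in_use[guest] = (dev, mac)
        return dev

    def detach(self, guest):
        """Return the guest's device to the pool when the guest is destroyed."""
        dev, _mac = self._in_use.pop(guest)
        self._free.append(dev)
        return dev

    def actual(self, guest):
        """Device currently backing the guest (what the live XML would show)."""
        return self._in_use.get(guest)


if __name__ == "__main__":
    pool = PassthroughPool("passthrough-net", ["eth20", "eth21", "eth22"])
    print(pool.attach("guest1", "11:22:33:44:55:66"))
    print(pool.actual("guest1"))
    pool.detach("guest1")
    print(pool.actual("guest1"))
```

The point of keeping the MAC with the guest's config rather than with the device is that the guest sees the same MAC no matter which device from the pool it ends up with.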