On Mon, Aug 22, 2011 at 05:17:25AM -0400, Laine Stump wrote:
For some reason beyond my comprehension, the designers of SRIOV
ethernet cards decided that the virtual functions (VF) of the card
(each VF corresponds to an ethernet device, e.g. "eth10") should
each be given a new+different+random MAC address each time the
hardware is rebooted.
[...snip...]
This makes using SRIOV VFs via PCI passthrough very unpalatable. The
problem can be solved by setting the MAC address of the ethernet
device prior to assigning it to the guest, but of course the
<hostdev> element used to assign PCI devices to guests has no place
to specify a MAC address (and I'm not sure it would be appropriate
to add something that function-specific to <hostdev>).
In discussions at the KVM forum, other related problems were
noted too. Specifically, when using an SRIOV VF with VEPA/VNLink
we need to be able to set the port profile on the VF before
assigning it to the guest, to lock down what the guest can
do. We also likely need to specify a VLAN tag on the NIC.
The VLAN tag is actually something we need to be able to set
for normal non-PCI passthrough usage of SRIOV networks too.
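
As a rough illustration of the host-side setup being discussed (fixing a
VF's MAC address and VLAN tag before the device is handed to a guest),
here is a minimal sketch that builds the corresponding iproute2
"ip link set ... vf" command lines. It is not taken from this thread or
from libvirt: the function name, the PF name "eth2", the VF index and
the MAC/VLAN values are all assumptions chosen for the example, and
port profile setup is not covered.

```python
import shlex


def vf_setup_commands(pf, vf_index, mac=None, vlan=None):
    """Build `ip link` argument lists that configure one VF via its PF.

    pf       -- name of the physical function netdev (hypothetical, e.g. "eth2")
    vf_index -- index of the virtual function on that PF
    mac      -- MAC address to pin on the VF, or None to leave it alone
    vlan     -- VLAN ID to tag the VF's traffic with, or None to leave it alone
    """
    cmds = []
    if mac is not None:
        cmds.append(["ip", "link", "set", "dev", pf,
                     "vf", str(vf_index), "mac", mac])
    if vlan is not None:
        if not 0 <= vlan <= 4095:
            raise ValueError("VLAN ID must be in the range 0-4095")
        cmds.append(["ip", "link", "set", "dev", pf,
                     "vf", str(vf_index), "vlan", str(vlan)])
    return cmds


if __name__ == "__main__":
    # Only print the commands; actually running them requires root and
    # real SRIOV hardware.
    for cmd in vf_setup_commands("eth2", 0, mac="52:54:00:12:34:56", vlan=42):
        print(shlex.join(cmd))
```

The sketch only prints the commands, so it can be checked on a machine
without SRIOV hardware. Note that the settings are applied through the
physical function, which is why this has to happen on the host before
the VF is assigned.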
Dave Allan and I have discussed a different possible method of
eliminating this problem (using a new forward type for libvirt
networks) that I've outlined below. Please let me know what you
think - is this reasonable in general? If so, what about the
details? If not, any counter-proposals to solve the problem?
The issue I see is that if an application wants to know what
PCI devices have been assigned to a guest, it can no longer
just look at <hostdev> elements. It also needs to look at
<interface> elements. If we follow this proposed model in other
areas, we could end up with PCI devices appearing as <disks>,
<controllers> and who knows what else. I think this is not
very desirable for applications, and it is also not good for
our internal code that manages PCI devices, i.e. the security
drivers now have to look at many different places to find
what PCI devices need labelling.
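
To make the concern concrete, here is a minimal sketch of how an
application might enumerate a guest's assigned PCI devices today by
looking only at <hostdev> elements in the domain XML. The function name
and the sample XML are assumptions made up for this example, and the
address attributes are assumed to be hex strings as libvirt formats
them. Any PCI device described under some other element would be
missed by code like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain XML with one assigned PCI device.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>demo</name>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x06' slot='0x02' function='0x0'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""


def assigned_pci_devices(domain_xml):
    """Return (domain, bus, slot, function) tuples for PCI <hostdev> devices."""
    root = ET.fromstring(domain_xml)
    devices = []
    for hostdev in root.findall("./devices/hostdev[@type='pci']"):
        addr = hostdev.find("./source/address")
        if addr is None:
            continue
        # Assumption: attribute values are hex, with or without a 0x prefix.
        devices.append(tuple(int(addr.get(key, "0"), 16)
                             for key in ("domain", "bus", "slot", "function")))
    return devices


if __name__ == "__main__":
    for dom, bus, slot, func in assigned_pci_devices(SAMPLE_DOMAIN_XML):
        print("%04x:%02x:%02x.%x" % (dom, bus, slot, func))
```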
One problem this doesn't solve is that when a guest is migrated, the
PCI info for the allocated ethernet device on the destination host
will almost surely be different. Is there any provision for dealing
with this in the device passthrough code? If not, then migration
will still not be possible.
Migration is irrelevant with PCI passthrough, since we reject any
attempt to migrate a guest with assigned PCI devices. A management
app must explicitly hot-unplug all PCI devices before doing any
migration, and plug back in new ones after migration finishes.
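
The unplug / migrate / re-plug sequence described above could be
sketched roughly as follows. This is only an outline under stated
assumptions: the function name and parameters are hypothetical, "dom"
and "dest_conn" are assumed to behave like libvirt-python virDomain and
virConnect objects, and the device XML strings are supplied by the
caller. A real management application would also need to wait for each
hot-unplug to actually complete and handle errors, which this sketch
does not do.

```python
def migrate_with_pci_devices(dom, dest_conn, src_device_xmls,
                             dest_device_xmls, flags=0):
    """Hot-unplug PCI devices, migrate the guest, then plug in new devices.

    dom              -- source domain object (virDomain-like)
    dest_conn        -- connection to the destination host (virConnect-like)
    src_device_xmls  -- <hostdev> XML strings currently attached on the source
    dest_device_xmls -- <hostdev> XML strings to attach on the destination
    flags            -- migration flags passed through unchanged
    """
    # 1. Remove every assigned PCI device; migration is rejected otherwise.
    for xml in src_device_xmls:
        dom.detachDevice(xml)

    # 2. Migrate; the call returns the domain object on the destination.
    new_dom = dom.migrate(dest_conn, flags, None, None, 0)

    # 3. Attach devices that are valid on the destination host. Their PCI
    #    addresses will generally differ from those used on the source.
    for xml in dest_device_xmls:
        new_dom.attachDevice(xml)

    return new_dom
```

Keeping the source and destination device lists separate reflects the
point that the replacement devices are new ones chosen for the
destination host, not the same devices moved across.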
Although I realize that many people are predisposed to not like the
idea of PCI passthrough of ethernet devices (including me), it seems
that it's going to be used, so we may as well provide the management
tools to do it in a sane manner.
Reluctantly, I think we need to provide the necessary information
underneath the <hostdev> element. Fortunately we already have an
XML schema for port profile and such things, which we share between
the <interface> device element and the <network> schema.
Regards,
Daniel
--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|