On Mon, Jun 24, 2013 at 10:40:48AM -0400, Laine Stump wrote:
On 06/24/2013 06:06 AM, Daniel P. Berrange wrote:
> On Mon, Jun 24, 2013 at 05:54:49AM -0400, Laine Stump wrote:
>> When I first put in support for VFIO device assignment, I didn't
>> realize that groups of devices were quite as common as they actually
>> are. In particular, I didn't know that often multiple
>> seemingly-unrelated devices can end up in the same VFIO iommu group
>> due to unlucky circumstances of hardware - they may share a dma
>> controller which means that the devices can't truly be isolated from
>> each other, and thus should not be simultaneously assigned to
>> different guests (or even used by the host) - all of the devices in a
>> group should be either assigned to the same guest or, if not assigned
>> to the guest, should be isolated off in a driver to prevent them
>> from being used by the host.
>>
>> The following set of patches makes setting that up easier to deal
>> with. The end result of all the patches is the following:
>>
>> 1) The virNodeDevice API will be able to detach or re-attach all the
>> devices in a particular group with a single API call.
>>
>> 2) <hostdev managed='yes'>, <interface type='hostdev' managed='yes'>,
>> and <interface type='network' managed='yes'> devices (where the
>> network is itself a pool of SRIOV Virtual Functions) can specify:
>>
>> <driver name='vfio' group='auto'/>
>>
>> and libvirt will automatically detach (and bind to the 'vfio-pci'
>> driver for assignment/isolation) all devices in the same group as
>> the device being assigned. Likewise, when the device is detached
>> from the guest, a check will be made and, if none of the devices in
>> the same group as the device being detached is still in use by a guest
> I am concerned that group='auto' is a really incredibly dangerous
> setting from the POV of operation of the host OS.
>
> I can just imagine forum postings / docs saying to use group=auto
> and people blindly following it without much inclination as to
> what will happen. They will be trying to assign a spare NIC to
> their guest; they'll get an error saying it can't be done since it
> is part of a group; they'll search google and find a recommendation
> to use group=auto to "fix" the problem. libvirt will see that their
> SATA controller & graphics card are part of the same group as the
> NIC and automatically detach them both from the host OS. Kaboom,
> the user is screwed.
Yes, I understand (and share) your concern. These patches grew out of
claims that there would be a regression in behavior if "managed='yes'"
stopped working properly with devices that were in a group, and this was
the only way I could see to make it work.

IIUC, 2 devices are in the same group if there is no way to assign
them to different domains at the same time, eg due to lack of FLR.
If that is true, then old non-VFIO would have refused to start the
guest if you had assigned only one of the devices to the guest,
even with managed=yes.

So it doesn't sound like a regression to me - both with old style
pci assign and with vfio, you would be required to either assign
all the devices to the same guest, or manually unbind the non-assigned
devices from the host to allow the guest to start.
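
To make the rule above concrete, here is a minimal Python sketch of the
check being described: a guest may start only if every other member of the
group is either assigned to the same guest or no longer bound to a host
driver. This is an illustration only, not libvirt code; the function name
devices_blocking_start and its three parameters are hypothetical.

```python
def devices_blocking_start(group_members, assigned_to_guest, bound_to_host):
    """Return the group members that would make the guest fail to start.

    Hypothetical helper, not a libvirt function. A device blocks the guest
    if it is in the same group, is not assigned to that guest, and is still
    bound to a host driver. Nothing is detached automatically.
    """
    assigned = set(assigned_to_guest)
    host = set(bound_to_host)
    return sorted(d for d in group_members if d not in assigned and d in host)


group = ["0000:02:00.0", "0000:02:00.1"]

# Only function 0 is listed in the guest XML; function 1 still has a host
# driver, so it is reported and the guest should be refused.
print(devices_blocking_start(group, {"0000:02:00.0"}, {"0000:02:00.1"}))

# After the admin manually unbinds function 1, nothing blocks the start.
print(devices_blocking_start(group, {"0000:02:00.0"}, set()))
```

The point of returning the list rather than acting on it is that the
decision to unbind stays with the user or application.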
>
> With traditional configs, even with managed=yes, you could be sure
> that only the single device in the XML would ever be touched. If
> there was a conflict due to other devices being on the same PCI
> bridge without FLR, then the device would safely fail to be assigned
> until the user had explicitly disconnected other devices from the
> host. We never attempted to automatically disconnect anything that
> was not part of the XML
>
> Following on from that, how does an application determine what
> other devices are present in the group associated with the device
> being assigned ? Are we exposing group membership info in the node
> device XML anywhere ?
Yes. nodedev-dumpxml now has the following information for every device
that's in a group (this is added in Patch 20/22):
  <iommuGroup number='12'>
    <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
  </iommuGroup>
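
As a rough illustration of how an application could consume that element,
the Python sketch below extracts the group number and the member PCI
addresses from nodedev-dumpxml output. It assumes only the <iommuGroup>
fragment shown above; the enclosing <device> wrapper in the sample and the
function name iommu_group_members are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def iommu_group_members(nodedev_xml):
    """Return (group_number, [pci_address, ...]) or (None, []) if no group.

    Hypothetical helper: it looks for an <iommuGroup> element anywhere in
    the document and formats each <address> as domain:bus:slot.function.
    """
    root = ET.fromstring(nodedev_xml)
    group = root if root.tag == "iommuGroup" else root.find(".//iommuGroup")
    if group is None:
        return None, []
    members = []
    for addr in group.findall("address"):
        # Attribute values are hex strings such as '0x0000'.
        members.append("%04x:%02x:%02x.%x" % (
            int(addr.get("domain"), 16),
            int(addr.get("bus"), 16),
            int(addr.get("slot"), 16),
            int(addr.get("function"), 16),
        ))
    return int(group.get("number")), members


# The <device> wrapper is an assumption; the inner element is from this mail.
SAMPLE = """
<device>
  <iommuGroup number='12'>
    <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
  </iommuGroup>
</device>
"""

number, members = iommu_group_members(SAMPLE)
print(number, members)
```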
> I'm not sure what else to suggest, other than to say we should not
> add this attribute, and require that the application/user explicitly
> disconnect any other devices in the same group from the host OS. Any
> other option I can think of just sounds too dangerous.
I would suggest adding some sort of "assignment white list" to libvirt's
config, and requiring any device manipulated in any manner to be on that
white list, but in a way it's already possible to create such a list -
just manually detach all devices that will be assigned to a guest (and
any devices in the same iommu groups) and stop using managed='yes'.
Yes, as you say, you can already set up an "assignment whitelist" simply
by unbinding all allowed devices from the host & using managed=no.
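
For a managed=no setup like this, an admin might want to confirm that the
devices really are off their host drivers before starting the guest. The
Python sketch below reads the per-device "driver" symlink under
/sys/bus/pci/devices to do that. Treating 'vfio-pci' and 'pci-stub' as the
isolating drivers is an assumption of this example, and the function names
are hypothetical; the sysfs_root parameter exists only so the code can be
pointed at a test tree.

```python
import os

# Assumption: drivers that hold a device without the host using it.
ISOLATING_DRIVERS = {"vfio-pci", "pci-stub"}


def bound_driver(addr, sysfs_root="/sys"):
    """Return the name of the driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", addr, "driver")
    if not os.path.islink(link):
        return None  # no driver bound at all
    return os.path.basename(os.readlink(link))


def not_isolated(addrs, sysfs_root="/sys"):
    """Map each device still bound to a regular host driver to that driver."""
    result = {}
    for addr in addrs:
        driver = bound_driver(addr, sysfs_root)
        if driver is not None and driver not in ISOLATING_DRIVERS:
            result[addr] = driver
    return result


if __name__ == "__main__":
    # Example addresses taken from the iommuGroup listing in this thread.
    for addr, driver in not_isolated(["0000:02:00.0", "0000:02:00.1"]).items():
        print("%s is still bound to %s" % (addr, driver))
```

An empty result means every listed device is either unbound or held by one
of the assumed isolating drivers; the script never unbinds anything itself.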
(Just doing that was my original plan, but it had opposition due to the
perceived regression in behavior.)
As above I don't see any regression in behaviour - non-VFIO case would
not silently detach any devices not explicitly listed in the XML.
Daniel
--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|