[libvirt] VFIO device groups and libvirt

VFIO device assignment has a concept of device "groups"; each group is a set of devices that must all be assigned to the same guest (alternately, unassigned devices can be attached to the vfio stub driver). This is to prevent cross-guest (or guest-host) contamination of devices that share DMA mapping or some such thing and thus can't be completely isolated from each other. The vfio support currently in libvirt works with no problems if no other devices are in the same "group" as a device being assigned to a guest. If there are other devices in the group, it fails with a message noting that the other devices in the group must also be attached to the vfio stub driver (so that the host can't use them). I am now adding code to libvirt that, when receiving a request to attach a device in a group to the vfio stub, will notice the other devices in the group and attach them to the stub as well; it will then allow attaching all of these devices to the guest, but not to any other guest. This will be nice for those who *want* this behavior, but I think that very often a person trying to assign a device that's in a group doesn't even realize that it's in a group, and if they knew it was in a group, they *wouldn't* want to lose all those devices just to assign one (once a single device in a group is assigned to the guest, none of the other devices in the group are usable by the host). An example of this was encountered by Cole the other day. He had a non-SRIOV network card plugged into his system that he wanted to assign with VFIO, but it turned out that due to the hardware layout of his motherboard, this plugin network card was placed in a group with 2 devices that were part of the motherboard chipset. When this occurs, a user might want to take one of three different courses: 1) attach all those onboard devices to the vfio stub driver too (making them inaccessible to the host, but allowing the assignment of the target device via VFIO) 2) use legacy KVM device assignment instead (which is less secure, but will allow assigning one of the devices while the rest remain available to the host) 3) try plugging the card they want to assign into a different slot (which may remove the device from this group and place it in another, possibly empty, group) It's difficult to say which the user will want, so I don't want the new "attach all devices in the group to the stub" code to be automatic. Instead, I think it should only take effect if some option is specified in the device's <driver> element. For example: <hostdev managed='yes'> <driver name='vfio' group='allow'/> ... </hostdev> Without that xml option, the same error message logged now will continue to be logged. With that xml option, all the devices in the group will be attached to the vfio stub driver. Likewise, in virNodeDeviceDetachFlags(), a VIR_NODE_DEVICE_DETACH_GROUP flag will have the following effect: Single device in group - no effect Multiple devices, no flag - detach will fail Multiple devices, flag set - all devices in the same group as specified device will be detached by one call. Does this sound reasonable?

On Thu, 2013-05-30 at 12:29 -0400, Laine Stump wrote:
VFIO device assignment has a concept of device "groups"; each group is a set of devices that must all be assigned to the same guest (alternately, unassigned devices can be attached to the vfio stub driver). This is to prevent cross-guest (or guest-host) contamination of devices that share DMA mapping or some such thing and thus can't be completely isolated from each other.
The vfio support currently in libvirt works with no problems if no other devices are in the same "group" as a device being assigned to a guest. If there are other devices in the group, it fails with a message noting that the other devices in the group must also be attached to the vfio stub driver (so that the host can't use them).
I am now adding code to libvirt that, when receiving a request to attach a device in a group to the vfio stub, will notice the other devices in the group and attach them to the stub as well; it will then allow attaching all of these devices to the guest, but not to any other guest. This will be nice for those who *want* this behavior, but I think that very often a person trying to assign a device that's in a group doesn't even realize that it's in a group, and if they knew it was in a group, they *wouldn't* want to lose all those devices just to assign one (once a single device in a group is assigned to the guest, none of the other devices in the group are usable by the host).
An example of this was encountered by Cole the other day. He had a non-SRIOV network card plugged into his system that he wanted to assign with VFIO, but it turned out that due to the hardware layout of his motherboard, this plugin network card was placed in a group with 2 devices that were part of the motherboard chipset. When this occurs, a user might want to take one of three different courses:
1) attach all those onboard devices to the vfio stub driver too (making them inaccessible to the host, but allowing the assignment of the target device via VFIO)
2) use legacy KVM device assignment instead (which is less secure, but will allow assigning one of the devices while the rest remain available to the host)
3) try plugging the card they want to assign into a different slot (which may remove the device from this group and place it in another, possibly empty, group)
It's difficult to say which the user will want, so I don't want the new "attach all devices in the group to the stub" code to be automatic. Instead, I think it should only take effect if some option is specified in the device's <driver> element. For example:
<hostdev managed='yes'> <driver name='vfio' group='allow'/> ... </hostdev>
Without that xml option, the same error message logged now will continue to be logged. With that xml option, all the devices in the group will be attached to the vfio stub driver.
Likewise, in virNodeDeviceDetachFlags(), a VIR_NODE_DEVICE_DETACH_GROUP flag will have the following effect:
Single device in group - no effect Multiple devices, no flag - detach will fail Multiple devices, flag set - all devices in the same group as specified device will be detached by one call.
Does this sound reasonable?
nit, vfio is not a stub driver, it's an actual driver. vfio also whitelists some host drivers. The two so far are pci-stub and pcieport. libvirt should not disconnect root ports from pcieport unless the root port is being assigned to a guest (which isn't supported). pci-stub can be used to satisfy the requirement that the device is disconnected from the host, but doesn't allow direct access of the device through vfio. I'm also working on a kernel patch that will allow a user to specify on the kernel command line devices and sets of devices to assume have ACS support. This should allow users to strategically change device grouping if they want to opt-in to the risk of assigning devices in topologies without ACS support. The additional xml flag is unfortunate, but I can see why you want it. Thanks, Alex

On 05/30/2013 01:18 PM, Alex Williamson wrote:
nit, vfio is not a stub driver, it's an actual driver. vfio also whitelists some host drivers. The two so far are pci-stub and pcieport. libvirt should not disconnect root ports from pcieport unless the root port is being assigned to a guest (which isn't supported).
So are you saying that if someone wants to assign a device that's in the same group as a device that is already bound to pcieport, even if they say its okay to bind the entire group to vfio-pci, we should just leave the pcieport device alone? Is the same true for pci-stub?
pci-stub can be used to satisfy the requirement that the device is disconnected from the host, but doesn't allow direct access of the device through vfio.
I'm also working on a kernel patch that will allow a user to specify on the kernel command line devices and sets of devices to assume have ACS support. This should allow users to strategically change device grouping if they want to opt-in to the risk of assigning devices in topologies without ACS support.
If that's possible, then it seems the usefulness of what I'm doing becomes significantly less (or at least the need for it becomes much less urgent).
The additional xml flag is unfortunate, but I can see why you want it.
I dislike it too, since it means that we will never be able to support a simple <hostdev> assignment of a device that's in a group without specifying a <driver group='allow'/>. I'm almost inclined to just categorically state that if you want to use VFIO to assign a device that's in a group, you can't use <hostdev managed='yes'>, but instead have to manually detach the devices in the group from the host and attach them to vfio-pci/pci-stub (libvirt provides an API to do this - virNodeDeviceDetachFlags()). This would keep the XML clean for the cases of non-grouped devices (as well as devices in a group of devices that were all bound to vfio-pci, pci-stub, or pcieport); for for devices in groups that used other drivers, the user could either add the kernel commandline options you mentioned (to allow willy-nilly assignment to any combination of guests) or bind the devices in the group to vfio-pci prior to starting the guest / assigning the device. Does that sound like a reasonable level of usability?

On Thu, 2013-05-30 at 13:48 -0400, Laine Stump wrote:
On 05/30/2013 01:18 PM, Alex Williamson wrote:
nit, vfio is not a stub driver, it's an actual driver. vfio also whitelists some host drivers. The two so far are pci-stub and pcieport. libvirt should not disconnect root ports from pcieport unless the root port is being assigned to a guest (which isn't supported).
So are you saying that if someone wants to assign a device that's in the same group as a device that is already bound to pcieport, even if they say its okay to bind the entire group to vfio-pci, we should just leave the pcieport device alone? Is the same true for pci-stub?
Yes. pcieport makes sure the bridge device is enabled and has sub-drivers for things like AER that will become useful once that support is incorporated. pci-stub doesn't do anything useful for vfio, but it doesn't do anything that risks the isolation of the group either. If I had a group that contained an endpoint device and a motherboard component, I might want to attach the motherboard component to pci-stub and the endpoint to vfio-pci. The devices are not isolated from one another (that's why they share a group), but it becomes more difficult to get at the other device from userspace if it's attached to pci-stub instead of vfio-pci. It's just an extra barrier we can choose to use.
pci-stub can be used to satisfy the requirement that the device is disconnected from the host, but doesn't allow direct access of the device through vfio.
I'm also working on a kernel patch that will allow a user to specify on the kernel command line devices and sets of devices to assume have ACS support. This should allow users to strategically change device grouping if they want to opt-in to the risk of assigning devices in topologies without ACS support.
If that's possible, then it seems the usefulness of what I'm doing becomes significantly less (or at least the need for it becomes much less urgent).
We're still going to have groups with multiple devices, and often they're justified. For instance, even with the ACS overrides I'm planning, the motherboard USB functions on my system are still grouped together because they don't event claim to be PCIe devices. USB is a case where there's absolutely sharing between functions and it's dangerous to assigned different functions to different guests or leave some attached to the host. These should be treated as a package, and so far ACS does that for us.
The additional xml flag is unfortunate, but I can see why you want it.
I dislike it too, since it means that we will never be able to support a simple <hostdev> assignment of a device that's in a group without specifying a <driver group='allow'/>.
I'm almost inclined to just categorically state that if you want to use VFIO to assign a device that's in a group, you can't use <hostdev managed='yes'>, but instead have to manually detach the devices in the group from the host and attach them to vfio-pci/pci-stub (libvirt provides an API to do this - virNodeDeviceDetachFlags()). This would keep the XML clean for the cases of non-grouped devices (as well as devices in a group of devices that were all bound to vfio-pci, pci-stub, or pcieport); for for devices in groups that used other drivers, the user could either add the kernel commandline options you mentioned (to allow willy-nilly assignment to any combination of guests) or bind the devices in the group to vfio-pci prior to starting the guest / assigning the device.
Does that sound like a reasonable level of usability?
It sounds like an additional barrier to using vfio. Let's not overlook that places where libvirt allows pci-assign to attach individual functions and vfio does not should be considered a bug in libvirt's ACS enforcement. vfio actually enables secure assignment by telling libvirt the bounds of that isolation group whereas pci-assign requires libvirt to figure out that out, so the only option it has is to require the user to disable ACS checking altogether. Just because pci-assign and libvirt allow it doesn't mean that it's ok. Thanks, Alex
participants (2)
-
Alex Williamson
-
Laine Stump