On Fri, 2015-11-20 at 12:24 -0500, Laine Stump wrote:
> On 11/20/2015 11:58 AM, Andrea Bolognani wrote:
>> On Fri, 2015-11-20 at 11:33 -0500, Laine Stump wrote:
>>> Seems safe, but is this really what we want to do? I haven't
>>> read/understood the remaining patches yet, but this makes it sound like
>>> what is going to happen is that all of the devices will be unbound from
>>> vfio-pci immediately, so they are "in limbo", and will then be reprobed
>>> once all devices are unused (and therefore unbound from vfio-pci).
>>>
>>> I think that may be a bit dangerous. Instead, we should leave the
>>> devices bound to vfio-pci until all of them are unused, and at that
>>> time, we should unbind them all from vfio-pci, then reprobe them all.
>>> (Again, I may have misunderstood the direction; if so, ignore this.)
>> I agree, we should not unbind any device from vfio-pci until
>> all the devices in the IOMMU group have been detached from
>> the guest.
> ... and I've just looked back at my original comment about this in the
> BZ, and see that at that time I only suggested delaying the reprobe, but
> said nothing about delaying the unbind. And I'm not as sure about the
> necessity of waiting as I was half an hour ago. I suppose the issue is
> that it brings all those unbound devices one step closer to getting
> bound to the host driver. However, that will happen only if those
> devices' PCI addresses are written to "drivers_probe" in sysfs (right?
> Is there any other way a more "global" reprobe could happen and snatch
> up everything that's currently unbound?)
Any load of a module will snatch up any unclaimed devices that match it,
so if you unbind and leave the devices orphaned, a random module load
could cause much badness. Adding a new_id will also cause a device
scan, so if that happened to match the device: random badness.
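To make the "orphaned" state concrete: a PCI device that has been unbound and not yet reprobed simply has no `driver` link in sysfs, and that is exactly the state in which a module load or a new_id write can claim it. Below is a minimal Python sketch that lists such devices in one IOMMU group, assuming the standard sysfs layout (`/sys/kernel/iommu_groups/<N>/devices/` and `/sys/bus/pci/devices/<addr>/driver`). The function names are hypothetical, and the `sysfs_root` parameter exists only so the code can be exercised against a fake tree.

```python
import os
from typing import List, Optional


def group_devices(group: str, sysfs_root: str = "/sys") -> List[str]:
    """PCI addresses (e.g. '0000:01:00.0') that belong to IOMMU group `group`."""
    path = os.path.join(sysfs_root, "kernel", "iommu_groups", group, "devices")
    return sorted(os.listdir(path))


def bound_driver(addr: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Name of the driver bound to `addr`, or None if it is unbound."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", addr, "driver")
    if not os.path.islink(link):
        return None
    # The link target ends in .../bus/pci/drivers/<name>.
    return os.path.basename(os.readlink(link))


def orphaned_devices(group: str, sysfs_root: str = "/sys") -> List[str]:
    """Group members bound to no driver at all; any module load or
    new_id write that matches them is free to claim them."""
    return [a for a in group_devices(group, sysfs_root)
            if bound_driver(a, sysfs_root) is None]
```

On a real host, `orphaned_devices("7")` (group number hypothetical) should return an empty list whenever the "leave everything on vfio-pci" policy is being followed.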
> So maybe I'd better ask someone who knows more about this than me:
> Alex, is there an issue with unbinding some devices in an IOMMU group
> from vfio-pci at an earlier time, and leaving them unbound to any driver
> at all while some other devices in the group are still in use by the
> guest? Is there an advantage to keeping them all bound to vfio-pci until
> none of them are used, and then unbinding/reprobing them all at the same
> time? Or should we unbind each from vfio-pci immediately when they are
> detached from the guest, and reprobe them all once they're all unbound?
Unbinding them from vfio-pci leaves them susceptible to random bad
things happening, as outlined above, and potentially limits vfio's
ability to do things like bus resets. For instance, imagine a 2-port NIC
where each port is a PCI function, the functions are grouped together,
and the devices don't support any sort of internal reset. If both devices
are bound to vfio-pci, then the user owns them both and we can do a bus
reset. If one of those devices gets released from the user, as soon as
it's unbound from vfio-pci it's no longer in our control and the bus
reset option is gone.
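In sysfs terms, the precondition in this example can be approximated by checking that every function in the group still has a `driver` link pointing at vfio-pci. The Python sketch below assumes, for simplicity, that the functions affected by the bus reset are exactly the members of one IOMMU group, as in the 2-port NIC case; the real reset scope is determined by the kernel and can differ. The function name and the `sysfs_root` parameter are hypothetical.

```python
import os


def group_owned_by_vfio(group: str, sysfs_root: str = "/sys") -> bool:
    """True if every device in IOMMU group `group` is bound to vfio-pci.

    Simplification: treats the group members as exactly the set of
    functions a bus reset would affect.
    """
    devices = os.path.join(sysfs_root, "kernel", "iommu_groups", group, "devices")
    for addr in os.listdir(devices):
        link = os.path.join(sysfs_root, "bus", "pci", "devices", addr, "driver")
        # A missing 'driver' link means the function is unbound, so it is
        # no longer under vfio's (or the user's) control.
        if not os.path.islink(link):
            return False
        # Bound, but to some host driver rather than vfio-pci.
        if os.path.basename(os.readlink(link)) != "vfio-pci":
            return False
    return True
```

In the NIC example, this returns True while both ports are on vfio-pci and turns False as soon as either port is unbound, which is the point at which the bus reset option disappears.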
The best course of action would be to leave any managed devices bound to
vfio-pci until all of the devices within the group are no longer in use.
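One way to picture that policy is a small per-group tracker, sketched here in Python as a hypothetical helper (`GroupReleaser` is not a libvirt API). It keeps every managed member of the group on vfio-pci while any of them is still assigned; when the last one is detached, it unbinds them all from vfio-pci and only afterwards writes each address to `/sys/bus/pci/drivers_probe` so host drivers can claim them. The injectable `write` callable is an assumption added for testing; real code would also need error handling and locking.

```python
import os
from typing import Callable, Iterable, List, Set


def sysfs_write(path: str, value: str) -> None:
    """Write one value to a sysfs attribute (one value per write)."""
    with open(path, "w") as f:
        f.write(value)


class GroupReleaser:
    """Hypothetical helper: hand an IOMMU group back to the host only
    after every member has been detached from the guest."""

    def __init__(self, members: Iterable[str], sysfs_root: str = "/sys",
                 write: Callable[[str, str], None] = sysfs_write) -> None:
        self.members: List[str] = sorted(set(members))
        self.in_use: Set[str] = set(self.members)
        self.pci = os.path.join(sysfs_root, "bus", "pci")
        self.write = write

    def release(self, addr: str) -> bool:
        """Mark `addr` as detached from the guest.

        Returns True if this was the last device in use and the whole
        group was unbound and reprobed. Raises KeyError for an address
        that is not (or no longer) in use.
        """
        self.in_use.remove(addr)
        if self.in_use:
            # Other members are still assigned: leave everything on
            # vfio-pci so nothing is orphaned and bus resets stay possible.
            return False
        # Step 1: unbind every member from vfio-pci.
        for a in self.members:
            self.write(os.path.join(self.pci, "drivers", "vfio-pci", "unbind"), a)
        # Step 2: only now ask the PCI core to probe host drivers for them.
        for a in self.members:
            self.write(os.path.join(self.pci, "drivers_probe"), a)
        return True
```

For a group containing the (hypothetical) functions `0000:01:00.0` and `0000:01:00.1`, releasing the first returns False and writes nothing; releasing the second performs both unbinds and then both reprobes.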
Thanks,
Alex