On Wed, 2015-12-02 at 18:17 +0100, Andrea Bolognani wrote:
This series is my attempt at fixing
https://bugzilla.redhat.com/show_bug.cgi?id=1272300
[...]
The problem being solved is that, when using VFIO, IOMMU group
ownership can't be shared, i.e. two devices that are in the
same IOMMU group can't be assigned to different guests, or to
the host and a guest. If that happens, the host will probably
crash.
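To make the ownership rule concrete, here is a minimal Python model of it. This is not libvirt code. It assumes each device has exactly one owner: "host" (bound to a host driver), "idle" (bound to vfio-pci but not assigned to anyone) or a guest name. The names `group_ownership_ok` and `can_assign` are hypothetical. On a real host, group membership can be read from /sys/kernel/iommu_groups/<N>/devices/.

```python
# Hypothetical model, not libvirt code:
#   owners: device address -> "host", "idle" or a guest name
#   groups: device address -> IOMMU group number


def group_ownership_ok(owners, groups):
    """Return True if no IOMMU group is shared between distinct owners.

    "idle" devices (bound to vfio-pci, not in use) don't count as owners,
    so a group may mix "idle" devices with the host or with one guest.
    """
    active = {}  # group number -> set of non-idle owners seen so far
    for dev, group in groups.items():
        owner = owners[dev]
        if owner != "idle":
            active.setdefault(group, set()).add(owner)
    return all(len(owner_set) <= 1 for owner_set in active.values())


def can_assign(owners, groups, dev, guest):
    """Check whether assigning dev to guest would keep every group unshared."""
    proposed = dict(owners)
    proposed[dev] = guest
    return group_ownership_ok(proposed, groups)
```

For example, with 0000:01:00.0 idle and 0000:01:00.1 still on its host driver in the same group, `can_assign(..., "0000:01:00.0", "guestA")` is False, because group 1 would end up shared between the host and guestA.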
The series deals with this issue by making sure safety
conditions are met before detaching devices from the host or
reattaching them to the host. In particular, when we're asked
to reattach a device to the host but doing so would lead to
sharing IOMMU group ownership, we delay the operation until
we can guarantee this will not cause problems. As a nice side
effect of the changes we check for this when starting a guest
too, instead of assuming it will work and having QEMU error
out immediately afterwards.
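The delay logic can be sketched roughly as below. This is a minimal Python model under stated assumptions, not the code in the series: `ReattachQueue` and its methods are hypothetical, and a device's owner is "host", "idle" (bound to vfio-pci, unassigned) or a guest name. A reattach request is honored only when no other device in the same IOMMU group is still assigned to a guest; otherwise it is parked and retried whenever a guest releases a device.

```python
class ReattachQueue:
    """Hypothetical model of deferring host reattach until it is safe."""

    def __init__(self, owners, groups):
        self.owners = owners    # device -> "host", "idle" or a guest name
        self.groups = groups    # device -> IOMMU group number
        self.pending = set()    # devices waiting to go back to the host

    def _group_in_guest(self, dev):
        # True if any *other* device in dev's group is assigned to a guest.
        return any(
            other != dev
            and self.groups[other] == self.groups[dev]
            and self.owners[other] not in ("host", "idle")
            for other in self.groups
        )

    def _flush(self):
        # Reattach every pending device whose group is now guest-free.
        done = sorted(d for d in self.pending if not self._group_in_guest(d))
        for dev in done:
            self.owners[dev] = "host"
            self.pending.discard(dev)
        return done

    def request_reattach(self, dev):
        """Ask for dev to go back to the host; return devices reattached now.

        Assumes dev has already been removed from its guest ("idle").
        """
        self.pending.add(dev)
        return self._flush()

    def release_from_guest(self, dev):
        """A guest stops using dev; it stays on vfio-pci, i.e. "idle"."""
        self.owners[dev] = "idle"
        return self._flush()
```

In this model, requesting f0 while f1 is still in a guest returns an empty list, and f0 is only reattached once `release_from_guest("f1")` runs.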
Shivaprasad raised a concern on IRC, which I'm sharing here
for wider discussion. I'm CC'ing Laine and Alex; hopefully
they don't mind, but let me know otherwise.
Assume we have a PCI device with two functions. With this
series applied, when reattaching both functions to the host,
the following would happen:
f0 remove from guest
f1 remove from guest
f0 unbind from vfio-pci
f0 trigger host driver reprobe
f1 unbind from vfio-pci
f1 trigger host driver reprobe
Shivaprasad is concerned this is not actually safe, and the
proper sequence would rather be:
f0 remove from guest
f1 remove from guest
f0 unbind from vfio-pci
f1 unbind from vfio-pci
f0 trigger host driver reprobe
f1 trigger host driver reprobe
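To compare the two orderings, the sketch below only computes the sysfs writes each one would perform, as (path, value) pairs; it writes nothing. It uses the standard PCI sysfs interfaces: writing the device address to the bound driver's unbind file detaches it, and writing it to /sys/bus/pci/drivers_probe triggers a reprobe. It assumes driver_override is not pinning the device to vfio-pci, and `reattach_writes` is a hypothetical helper, not a libvirt function.

```python
SYSFS_PCI = "/sys/bus/pci"


def unbind_write(addr):
    # Writing the address to the current driver's "unbind" file detaches it;
    # here the current driver is assumed to be vfio-pci.
    return (f"{SYSFS_PCI}/drivers/vfio-pci/unbind", addr)


def reprobe_write(addr):
    # Writing the address to "drivers_probe" asks the kernel to find a driver.
    return (f"{SYSFS_PCI}/drivers_probe", addr)


def reattach_writes(addrs, group_wide):
    """Return the ordered sysfs writes for reattaching addrs to the host.

    group_wide=False: unbind and reprobe each function in turn (the series).
    group_wide=True: unbind every function first, then reprobe them all.
    """
    if group_wide:
        return [unbind_write(a) for a in addrs] + [reprobe_write(a) for a in addrs]
    writes = []
    for a in addrs:
        writes += [unbind_write(a), reprobe_write(a)]
    return writes


if __name__ == "__main__":
    # Dry run: print the group-wide sequence as shell commands.
    for path, value in reattach_writes(["0000:01:00.0", "0000:01:00.1"], True):
        print(f"echo {value} > {path}")
```

Both orderings perform exactly the same writes; the only difference is whether f0's reprobe happens before or after f1 leaves vfio-pci.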
AFAICT, doing so would basically mean duplicating the delay
logic this series adds to virHostdev into virPCI, to ensure
that devices are unbound from vfio-pci only once the same
operation has been requested for all devices in the IOMMU
group, and reprobe is triggered only after all devices have
been unbound from vfio-pci.
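Moving the delay into virPCI would roughly mean gating each IOMMU group on its set of outstanding requests. Below is a minimal Python model of that idea, with hypothetical names (`GroupReattachGate`, `request`) that do not exist in libvirt: it returns no steps until every device in the group has been asked to go back to the host, and then returns the group-wide sequence.

```python
class GroupReattachGate:
    """Hypothetical per-group gate: act only once the whole group is requested."""

    def __init__(self, groups):
        self.groups = groups    # device -> IOMMU group number
        self.requested = {}     # group number -> set of requested devices

    def members(self, group):
        return sorted(d for d, g in self.groups.items() if g == group)

    def request(self, dev):
        """Record a reattach request; return the steps to run now (maybe none)."""
        group = self.groups[dev]
        asked = self.requested.setdefault(group, set())
        asked.add(dev)
        members = self.members(group)
        if asked != set(members):
            return []  # some group members have not been requested yet: wait
        del self.requested[group]
        # Every member is accounted for: unbind all, then reprobe all.
        return [("unbind", d) for d in members] + [("reprobe", d) for d in members]
```

A device alone in its group passes the gate immediately, so this model only changes behavior for multi-device groups.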
I was under the impression that what the current series
does, i.e. sharing devices in the same IOMMU group between
the host driver and vfio-pci, is safe as long as no guest is
using them at the same time, and that devices could be
safely "moved" between the host driver (i.e. in use) and
vfio-pci (i.e. idle, waiting to be assigned to a guest) as
many times as desired without ill consequences.
Is my understanding wrong? Do I need to rework the series
so that unbinds and reprobes are always executed across the
IOMMU group?
Any suggestions or pointers to relevant documentation will
be very much appreciated.
Cheers.
--
Andrea Bolognani
Software Engineer - Virtualization Team