[libvirt] [PATCH v2 0/2] check IOMMU group devices usage when preparing device for vfio passthrough

==================================================================== If a device in the same iommu group is in use by a different vm, the guest boot fails with the below error during vfio passthrough. bash-4.2$ virsh start demo error: Failed to start domain demo error: internal error: process exited while connecting to monitor: 2014-12-02T13:43:52.020136Z qemu-system-x86_64: -device vfio-pci,host=00:1c.3,id=hostdev0,bus=pci.0,addr=0x5: vfio: error opening /dev/vfio/7: Device or resource busy Solution: ===================================================================== The patch iterates through the iommu group devices and errors out cleanly mentioning the device and the guest which is using the device. With Patch bash-4.2$ virsh start demo error: Failed to start domain demo error: Requested operation is not valid: PCI device 0000:0d:00.0 is in use by driver QEMU, domain vm10 --- Shivaprasad G Bhat (2): Implement virPCIDeviceGetAddress function check IOMMU group devices usage when preparing device for vfio passthrough src/util/virhostdev.c | 89 ++++++++++++++++++++++++++++++------------------- src/util/virpci.c | 16 +++++++++ src/util/virpci.h | 1 + 3 files changed, 72 insertions(+), 34 deletions(-) -- Signature

Basically a getter function which is implemented for accessing the address fields in virPCIDevice. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 16 ++++++++++++++++ src/util/virpci.h | 1 + 2 files changed, 17 insertions(+) diff --git a/src/util/virpci.c b/src/util/virpci.c index cd78212..831a5d8 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1655,6 +1655,22 @@ virPCIDeviceFree(virPCIDevicePtr dev) VIR_FREE(dev); } +virPCIDeviceAddressPtr virPCIDeviceGetAddress(virPCIDevicePtr dev) +{ + + virPCIDeviceAddressPtr pciAddrPtr; + + if (!dev || (VIR_ALLOC(pciAddrPtr) < 0)) + return NULL; + + pciAddrPtr->domain = dev->domain; + pciAddrPtr->bus = dev->bus; + pciAddrPtr->slot = dev->slot; + pciAddrPtr->function = dev->function; + + return pciAddrPtr; +} + const char * virPCIDeviceGetName(virPCIDevicePtr dev) { diff --git a/src/util/virpci.h b/src/util/virpci.h index 1ce9821..64b9e96 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -94,6 +94,7 @@ int virPCIDeviceSetStubDriver(virPCIDevicePtr dev, const char *driver) ATTRIBUTE_NONNULL(2); const char *virPCIDeviceGetStubDriver(virPCIDevicePtr dev); +virPCIDeviceAddressPtr virPCIDeviceGetAddress(virPCIDevicePtr dev); int virPCIDeviceSetUsedBy(virPCIDevice *dev, const char *drv_name, const char *dom_name);

On 14.01.2015 12:02, Shivaprasad G Bhat wrote:
Basically a getter function which is implemented for accessing the address fields in virPCIDevice.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 16 ++++++++++++++++ src/util/virpci.h | 1 + 2 files changed, 17 insertions(+)
diff --git a/src/util/virpci.c b/src/util/virpci.c index cd78212..831a5d8 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1655,6 +1655,22 @@ virPCIDeviceFree(virPCIDevicePtr dev) VIR_FREE(dev); }
I'm adding a comment here to note explicitly that caller must free the returned value.
+virPCIDeviceAddressPtr virPCIDeviceGetAddress(virPCIDevicePtr dev) +{ + + virPCIDeviceAddressPtr pciAddrPtr; + + if (!dev || (VIR_ALLOC(pciAddrPtr) < 0)) + return NULL; + + pciAddrPtr->domain = dev->domain; + pciAddrPtr->bus = dev->bus; + pciAddrPtr->slot = dev->slot; + pciAddrPtr->function = dev->function; + + return pciAddrPtr; +} + const char * virPCIDeviceGetName(virPCIDevicePtr dev) { diff --git a/src/util/virpci.h b/src/util/virpci.h index 1ce9821..64b9e96 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -94,6 +94,7 @@ int virPCIDeviceSetStubDriver(virPCIDevicePtr dev, const char *driver) ATTRIBUTE_NONNULL(2); const char *virPCIDeviceGetStubDriver(virPCIDevicePtr dev); +virPCIDeviceAddressPtr virPCIDeviceGetAddress(virPCIDevicePtr dev); int virPCIDeviceSetUsedBy(virPCIDevice *dev, const char *drv_name, const char *dom_name);
ACK Michal

The virsh start <domain> fails with qemu error when the hostdevices of the same iommu group are used actively by other vms. It is not clear which hostdev from the same iommu group is used by any of the running guests. User has to go through every guest xml to figure out who is using the hostdev of same iommu group. Solution: Iterate the iommu group of the hostdev and error our neatly in case a device in the same iommu group is busy. Reattach code also does the same kind of check, remove duplicate code as well. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virhostdev.c | 89 ++++++++++++++++++++++++++++++------------------- 1 file changed, 55 insertions(+), 34 deletions(-) diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index 18ff96b..78deeba 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -54,6 +54,41 @@ static virClassPtr virHostdevManagerClass; static void virHostdevManagerDispose(void *obj); static virHostdevManagerPtr virHostdevManagerNew(void); +static int virHostdevIsPCINodeDeviceUsed(virPCIDeviceAddressPtr devAddr, void *opaque) +{ + virPCIDevicePtr other; + int ret = -1; + virPCIDevicePtr pci = NULL; + virHostdevManagerPtr hostdev_mgr = opaque; + + if (!(pci = virPCIDeviceNew(devAddr->domain, devAddr->bus, + devAddr->slot, devAddr->function))) + goto cleanup; + + other = virPCIDeviceListFind(hostdev_mgr->activePCIHostdevs, pci); + if (other) { + const char *other_drvname = NULL; + const char *other_domname = NULL; + virPCIDeviceGetUsedBy(other, &other_drvname, &other_domname); + + if (other_drvname && other_domname) + virReportError(VIR_ERR_OPERATION_INVALID, + _("PCI device %s is in use by " + "driver %s, domain %s"), + virPCIDeviceGetName(pci), + other_drvname, other_domname); + else + virReportError(VIR_ERR_OPERATION_INVALID, + _("PCI device %s is in use"), + virPCIDeviceGetName(pci)); + goto cleanup; + } + ret = 0; + cleanup: + virPCIDeviceFree(pci); + return ret; +} + static int virHostdevManagerOnceInit(void) { if (!(virHostdevManagerClass = virClassNew(virClassForObject(), @@ -494,6 +529,7 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr, int last_processed_hostdev_vf = -1; size_t i; int ret = -1; + virPCIDeviceAddressPtr devAddr = NULL; if (!nhostdevs) return 0; @@ -518,7 +554,6 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr, for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); - virPCIDevicePtr other; bool strict_acs_check = !!(flags & VIR_HOSTDEV_STRICT_ACS_CHECK); if (!virPCIDeviceIsAssignable(dev, strict_acs_check)) { @@ -528,23 +563,19 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr, goto cleanup; } /* The device is in use by other active domain if - * the dev is in list activePCIHostdevs. + * the dev is in list activePCIHostdevs. VFIO devices + * belonging to same iommu group cant be shared + * across guests. */ - if ((other = virPCIDeviceListFind(hostdev_mgr->activePCIHostdevs, dev))) { - const char *other_drvname; - const char *other_domname; - - virPCIDeviceGetUsedBy(other, &other_drvname, &other_domname); - if (other_drvname && other_domname) - virReportError(VIR_ERR_OPERATION_INVALID, - _("PCI device %s is in use by " - "driver %s, domain %s"), - virPCIDeviceGetName(dev), - other_drvname, other_domname); - else - virReportError(VIR_ERR_OPERATION_INVALID, - _("PCI device %s is already in use"), - virPCIDeviceGetName(dev)); + if (STREQ(virPCIDeviceGetStubDriver(dev), "vfio-pci")) { + devAddr = virPCIDeviceGetAddress(dev); + if (!devAddr) + goto cleanup; + if (virPCIDeviceAddressIOMMUGroupIterate(devAddr, + virHostdevIsPCINodeDeviceUsed, + hostdev_mgr) < 0) + goto cleanup; + } else if (virHostdevIsPCINodeDeviceUsed(devAddr, hostdev_mgr)) { goto cleanup; } } @@ -678,6 +709,7 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr, virObjectUnlock(hostdev_mgr->activePCIHostdevs); virObjectUnlock(hostdev_mgr->inactivePCIHostdevs); virObjectUnref(pcidevs); + VIR_FREE(devAddr); return ret; } @@ -1506,29 +1538,17 @@ int virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr hostdev_mgr, virPCIDevicePtr pci) { - virPCIDevicePtr other; + virPCIDeviceAddressPtr devAddr = NULL; int ret = -1; virObjectLock(hostdev_mgr->activePCIHostdevs); virObjectLock(hostdev_mgr->inactivePCIHostdevs); - other = virPCIDeviceListFind(hostdev_mgr->activePCIHostdevs, pci); - if (other) { - const char *other_drvname = NULL; - const char *other_domname = NULL; - virPCIDeviceGetUsedBy(other, &other_drvname, &other_domname); - if (other_drvname && other_domname) - virReportError(VIR_ERR_OPERATION_INVALID, - _("PCI device %s is still in use by " - "driver %s, domain %s"), - virPCIDeviceGetName(pci), - other_drvname, other_domname); - else - virReportError(VIR_ERR_OPERATION_INVALID, - _("PCI device %s is still in use"), - virPCIDeviceGetName(pci)); + if (!(devAddr = virPCIDeviceGetAddress(pci))) + goto out; + + if (virHostdevIsPCINodeDeviceUsed(devAddr, hostdev_mgr)) goto out; - } virPCIDeviceReattachInit(pci); @@ -1540,6 +1560,7 @@ virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr hostdev_mgr, out: virObjectUnlock(hostdev_mgr->inactivePCIHostdevs); virObjectUnlock(hostdev_mgr->activePCIHostdevs); + VIR_FREE(devAddr); return ret; }

On 14.01.2015 12:03, Shivaprasad G Bhat wrote:
The virsh start <domain> fails with qemu error when the hostdevices of the same iommu group are used actively by other vms. It is not clear which hostdev from the same iommu group is used by any of the running guests. User has to go through every guest xml to figure out who is using the hostdev of same iommu group.
Solution: Iterate the iommu group of the hostdev and error our neatly in case a device in the same iommu group is busy. Reattach code also does the same kind of check, remove duplicate code as well.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virhostdev.c | 89 ++++++++++++++++++++++++++++++------------------- 1 file changed, 55 insertions(+), 34 deletions(-)
diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index 18ff96b..78deeba 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -54,6 +54,41 @@ static virClassPtr virHostdevManagerClass; static void virHostdevManagerDispose(void *obj); static virHostdevManagerPtr virHostdevManagerNew(void);
+static int virHostdevIsPCINodeDeviceUsed(virPCIDeviceAddressPtr devAddr, void *opaque) +{ + virPCIDevicePtr other; + int ret = -1; + virPCIDevicePtr pci = NULL; + virHostdevManagerPtr hostdev_mgr = opaque; + + if (!(pci = virPCIDeviceNew(devAddr->domain, devAddr->bus, + devAddr->slot, devAddr->function))) + goto cleanup; + + other = virPCIDeviceListFind(hostdev_mgr->activePCIHostdevs, pci); + if (other) { + const char *other_drvname = NULL; + const char *other_domname = NULL; + virPCIDeviceGetUsedBy(other, &other_drvname, &other_domname); + + if (other_drvname && other_domname) + virReportError(VIR_ERR_OPERATION_INVALID, + _("PCI device %s is in use by " + "driver %s, domain %s"), + virPCIDeviceGetName(pci), + other_drvname, other_domname); + else + virReportError(VIR_ERR_OPERATION_INVALID, + _("PCI device %s is in use"), + virPCIDeviceGetName(pci)); + goto cleanup; + } + ret = 0; + cleanup: + virPCIDeviceFree(pci); + return ret; +} + static int virHostdevManagerOnceInit(void) { if (!(virHostdevManagerClass = virClassNew(virClassForObject(), @@ -494,6 +529,7 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr, int last_processed_hostdev_vf = -1; size_t i; int ret = -1; + virPCIDeviceAddressPtr devAddr = NULL;
if (!nhostdevs) return 0; @@ -518,7 +554,6 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr,
for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); - virPCIDevicePtr other; bool strict_acs_check = !!(flags & VIR_HOSTDEV_STRICT_ACS_CHECK);
if (!virPCIDeviceIsAssignable(dev, strict_acs_check)) { @@ -528,23 +563,19 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr, goto cleanup; } /* The device is in use by other active domain if - * the dev is in list activePCIHostdevs. + * the dev is in list activePCIHostdevs. VFIO devices + * belonging to same iommu group cant be shared + * across guests. */ - if ((other = virPCIDeviceListFind(hostdev_mgr->activePCIHostdevs, dev))) { - const char *other_drvname; - const char *other_domname; - - virPCIDeviceGetUsedBy(other, &other_drvname, &other_domname); - if (other_drvname && other_domname) - virReportError(VIR_ERR_OPERATION_INVALID, - _("PCI device %s is in use by " - "driver %s, domain %s"), - virPCIDeviceGetName(dev), - other_drvname, other_domname); - else - virReportError(VIR_ERR_OPERATION_INVALID, - _("PCI device %s is already in use"), - virPCIDeviceGetName(dev)); + if (STREQ(virPCIDeviceGetStubDriver(dev), "vfio-pci")) { + devAddr = virPCIDeviceGetAddress(dev); + if (!devAddr) + goto cleanup; + if (virPCIDeviceAddressIOMMUGroupIterate(devAddr, + virHostdevIsPCINodeDeviceUsed, + hostdev_mgr) < 0) + goto cleanup; + } else if (virHostdevIsPCINodeDeviceUsed(devAddr, hostdev_mgr)) {
This isn't gonna fly really. I mean, imagine this is the first iteration and the stub driver for the device is not vfio-pci. So we end up here, with devAddr being NULL. And virHostdevIsPCINodeDeviceUsed will dereference it immediately. Moreover, if we have two devices on the list, both with vfio-pci stub driver, on the first iteration, devAddr will be allocated. However, on the next iteration, devAddr is allocated again(!) without it being freed previously. And I'm not mentioning the case where in the first iteration we go with vfio-pci branch and then, in later iteration we go by the else branch (where devAddr would refer to different device obviously). Fortunately, there's just a small change required: allocate devAddr unconditionally and free it when no longer needed. I'll fix it when pushing. But next time - the crasher was caught by our test suite too: virhostdevtest. If you ran 'make check' you'd see it.
goto cleanup; } } @@ -678,6 +709,7 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr, virObjectUnlock(hostdev_mgr->activePCIHostdevs); virObjectUnlock(hostdev_mgr->inactivePCIHostdevs); virObjectUnref(pcidevs); + VIR_FREE(devAddr); return ret; }
@@ -1506,29 +1538,17 @@ int virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr hostdev_mgr, virPCIDevicePtr pci) { - virPCIDevicePtr other; + virPCIDeviceAddressPtr devAddr = NULL; int ret = -1;
virObjectLock(hostdev_mgr->activePCIHostdevs); virObjectLock(hostdev_mgr->inactivePCIHostdevs); - other = virPCIDeviceListFind(hostdev_mgr->activePCIHostdevs, pci); - if (other) { - const char *other_drvname = NULL; - const char *other_domname = NULL; - virPCIDeviceGetUsedBy(other, &other_drvname, &other_domname);
- if (other_drvname && other_domname) - virReportError(VIR_ERR_OPERATION_INVALID, - _("PCI device %s is still in use by " - "driver %s, domain %s"), - virPCIDeviceGetName(pci), - other_drvname, other_domname); - else - virReportError(VIR_ERR_OPERATION_INVALID, - _("PCI device %s is still in use"), - virPCIDeviceGetName(pci)); + if (!(devAddr = virPCIDeviceGetAddress(pci))) + goto out; + + if (virHostdevIsPCINodeDeviceUsed(devAddr, hostdev_mgr)) goto out; - }
virPCIDeviceReattachInit(pci);
@@ -1540,6 +1560,7 @@ virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr hostdev_mgr, out: virObjectUnlock(hostdev_mgr->inactivePCIHostdevs); virObjectUnlock(hostdev_mgr->activePCIHostdevs); + VIR_FREE(devAddr); return ret; }
Fixed and ACKed. Michal

On 14.01.2015 12:01, Shivaprasad G Bhat wrote:
==================================================================== If a device in the same iommu group is in use by a different vm, the guest boot fails with the below error during vfio passthrough. bash-4.2$ virsh start demo error: Failed to start domain demo error: internal error: process exited while connecting to monitor: 2014-12-02T13:43:52.020136Z qemu-system-x86_64: -device vfio-pci,host=00:1c.3,id=hostdev0,bus=pci.0,addr=0x5: vfio: error opening /dev/vfio/7: Device or resource busy
Solution: ===================================================================== The patch iterates through the iommu group devices and errors out cleanly mentioning the device and the guest which is using the device.
With Patch bash-4.2$ virsh start demo error: Failed to start domain demo error: Requested operation is not valid: PCI device 0000:0d:00.0 is in use by driver QEMU, domain vm10
---
Shivaprasad G Bhat (2): Implement virPCIDeviceGetAddress function check IOMMU group devices usage when preparing device for vfio passthrough
src/util/virhostdev.c | 89 ++++++++++++++++++++++++++++++------------------- src/util/virpci.c | 16 +++++++++ src/util/virpci.h | 1 + 3 files changed, 72 insertions(+), 34 deletions(-)
I've fixed issues I've raised and pushed. Michal

On Wed, Jan 14, 2015 at 10:34 PM, Michal Privoznik <mprivozn@redhat.com> wrote:
On 14.01.2015 12:01, Shivaprasad G Bhat wrote:
==================================================================== If a device in the same iommu group is in use by a different vm, the guest boot fails with the below error during vfio passthrough. bash-4.2$ virsh start demo error: Failed to start domain demo error: internal error: process exited while connecting to monitor: 2014-12-02T13:43:52.020136Z qemu-system-x86_64: -device vfio-pci,host=00:1c.3,id=hostdev0,bus=pci.0,addr=0x5: vfio: error opening /dev/vfio/7: Device or resource busy
Solution: ===================================================================== The patch iterates through the iommu group devices and errors out cleanly mentioning the device and the guest which is using the device.
With Patch bash-4.2$ virsh start demo error: Failed to start domain demo error: Requested operation is not valid: PCI device 0000:0d:00.0 is in use by driver QEMU, domain vm10
---
Shivaprasad G Bhat (2): Implement virPCIDeviceGetAddress function check IOMMU group devices usage when preparing device for vfio passthrough
src/util/virhostdev.c | 89 ++++++++++++++++++++++++++++++------------------- src/util/virpci.c | 16 +++++++++ src/util/virpci.h | 1 + 3 files changed, 72 insertions(+), 34 deletions(-)
I've fixed issues I've raised and pushed.
Thanks for fixing and pushing Michal.. Not sure how I missed it. I'll be cautious using make check in future. Regards, Shiva
Michal
participants (3)
-
Michal Privoznik
-
Shivaprasad bhat
-
Shivaprasad G Bhat