[libvirt] [PATCH v4 00/10] Few VFIO related fixes

The series fixes few VFIO related host crash issues. The patches 3, 4, 9 and 10 are actual fixes. Patch 7 and 8 are test changes to allow testing patch 9. Rest of the patches are mostly code movements except for patch 2. --- Changes from v3 -> v4 - Andrea's suggestion not to overload the usage of stubDriver to set it in virPCINew is addressed. Not setting it any more. Fetch it fresh from sysfs. - The test case added in old P7 is improvised - Old P7 is split into 3 patches where I have moved the virpcimock changes to support iommu into a new patch. Shivaprasad G Bhat (10): Implement virPCIIsKnownStub function Add iommu group number info to virPCIDevice Refuse to reattach from vfio if the iommu group is in use by any domain Wait for vfio-pci device cleanups before reassinging the device to host driver Split reprobe action from the virPCIUnbindFromStub into a new function Pass activeDevs and inactiveDevs to virPCIDeviceUnbindFromStub and virPCIDeviceBindToStub Change the negative test case to try pciback instead of vfio-pci Add iommu info for pci on mocked sysfs Postpone reprobing till all the devices in iommu group are unbound from vfio Wait for iommmu device to go away before reprobing the host driver src/util/virhostdev.c | 28 ++++- src/util/virpci.c | 297 +++++++++++++++++++++++++++++++++++++++++-------- tests/virpcimock.c | 193 +++++++++++++++++++++++++++++--- tests/virpcitest.c | 117 +++++++++++++++++++ 4 files changed, 566 insertions(+), 69 deletions(-) -- Signature

The checks to known stubs can be easily done having this implementation. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index 35b1459..bff37d7 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1080,6 +1080,23 @@ static const char *virPCIKnownStubs[] = { NULL }; +static bool +virPCIIsKnownStub(char *driver) +{ + const char **stubTest; + bool ret = false; + + for (stubTest = virPCIKnownStubs; *stubTest != NULL; stubTest++) { + if (STREQ_NULLABLE(driver, *stubTest)) { + ret = true; + VIR_DEBUG("Found stub driver %s", *stubTest); + break; + } + } + + return ret; +} + static int virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) { @@ -1087,8 +1104,6 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) char *drvdir = NULL; char *path = NULL; char *driver = NULL; - const char **stubTest; - bool isStub = false; /* If the device is currently bound to one of the "well known" * stub drivers, then unbind it, otherwise ignore it. @@ -1105,14 +1120,7 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) goto remove_slot; /* If the device isn't bound to a known stub, skip the unbind. */ - for (stubTest = virPCIKnownStubs; *stubTest != NULL; stubTest++) { - if (STREQ(driver, *stubTest)) { - isStub = true; - VIR_DEBUG("Found stub driver %s", *stubTest); - break; - } - } - if (!isStub) + if (!virPCIIsKnownStub(driver)) goto remove_slot; if (virPCIDeviceUnbind(dev, dev->reprobe) < 0)

On 11/14/2015 03:34 AM, Shivaprasad G Bhat wrote:
The checks to known stubs can be easily done having this implementation.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 35b1459..bff37d7 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1080,6 +1080,23 @@ static const char *virPCIKnownStubs[] = { NULL };
+static bool +virPCIIsKnownStub(char *driver) +{ + const char **stubTest; + bool ret = false; + + for (stubTest = virPCIKnownStubs; *stubTest != NULL; stubTest++) { + if (STREQ_NULLABLE(driver, *stubTest)) { + ret = true; + VIR_DEBUG("Found stub driver %s", *stubTest); + break; + } + } + + return ret; +} + static int virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) { @@ -1087,8 +1104,6 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) char *drvdir = NULL; char *path = NULL; char *driver = NULL; - const char **stubTest; - bool isStub = false;
/* If the device is currently bound to one of the "well known" * stub drivers, then unbind it, otherwise ignore it. @@ -1105,14 +1120,7 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) goto remove_slot;
/* If the device isn't bound to a known stub, skip the unbind. */ - for (stubTest = virPCIKnownStubs; *stubTest != NULL; stubTest++) { - if (STREQ(driver, *stubTest)) { - isStub = true; - VIR_DEBUG("Found stub driver %s", *stubTest); - break; - } - } - if (!isStub) + if (!virPCIIsKnownStub(driver)) goto remove_slot;
if (virPCIDeviceUnbind(dev, dev->reprobe) < 0)
I haven't yet seen where/how this will be used, but it doesn't change behavior and doesn't add more than a few lines, so ACK.

On 14.11.2015 09:34, Shivaprasad G Bhat wrote:
The checks to known stubs can be easily done having this implementation.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 35b1459..bff37d7 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1080,6 +1080,23 @@ static const char *virPCIKnownStubs[] = { NULL };
+static bool +virPCIIsKnownStub(char *driver) +{ + const char **stubTest; + bool ret = false; + + for (stubTest = virPCIKnownStubs; *stubTest != NULL; stubTest++) { + if (STREQ_NULLABLE(driver, *stubTest)) { + ret = true; + VIR_DEBUG("Found stub driver %s", *stubTest); + break; + } + } + + return ret; +} + static int virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) { @@ -1087,8 +1104,6 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) char *drvdir = NULL; char *path = NULL; char *driver = NULL; - const char **stubTest; - bool isStub = false;
/* If the device is currently bound to one of the "well known" * stub drivers, then unbind it, otherwise ignore it. @@ -1105,14 +1120,7 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) goto remove_slot;
/* If the device isn't bound to a known stub, skip the unbind. */ - for (stubTest = virPCIKnownStubs; *stubTest != NULL; stubTest++) { - if (STREQ(driver, *stubTest)) { - isStub = true; - VIR_DEBUG("Found stub driver %s", *stubTest); - break; - } - } - if (!isStub) + if (!virPCIIsKnownStub(driver)) goto remove_slot;
if (virPCIDeviceUnbind(dev, dev->reprobe) < 0)
ACK Michal

The iommu group number need not be fetched from the sysfs everytime as it remains constant. Fetch it once during allocation. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/src/util/virpci.c b/src/util/virpci.c index bff37d7..89e69e2 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -75,6 +75,7 @@ struct _virPCIDevice { bool has_pm_reset; bool managed; char *stubDriver; + int iommuGroup; /* used by reattach function */ bool unbind_from_stub; @@ -1564,6 +1565,8 @@ virPCIDeviceNew(unsigned int domain, virPCIDevicePtr dev; char *vendor = NULL; char *product = NULL; + virPCIDeviceAddress devAddr = { domain, bus, + slot, function }; if (VIR_ALLOC(dev) < 0) return NULL; @@ -1611,6 +1614,8 @@ virPCIDeviceNew(unsigned int domain, goto error; } + dev->iommuGroup = virPCIDeviceAddressGetIOMMUGroupNum(&devAddr); + VIR_DEBUG("%s %s: initialized", dev->id, dev->name); cleanup:

On 11/14/2015 03:35 AM, Shivaprasad G Bhat wrote:
The iommu group number need not be fetched from the sysfs everytime as it remains constant. Fetch it once during allocation.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/src/util/virpci.c b/src/util/virpci.c index bff37d7..89e69e2 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -75,6 +75,7 @@ struct _virPCIDevice { bool has_pm_reset; bool managed; char *stubDriver; + int iommuGroup;
/* used by reattach function */ bool unbind_from_stub; @@ -1564,6 +1565,8 @@ virPCIDeviceNew(unsigned int domain, virPCIDevicePtr dev; char *vendor = NULL; char *product = NULL; + virPCIDeviceAddress devAddr = { domain, bus, + slot, function };
Ugh. Pre-existing, but virPCIDevice should have a single virPCIDeviceAddress instead of individual members for domain, bus, slot, and function. Then you wouldn't need to create a separate object just to send to virPCIDeviceAddressGetIOMMUGroupNum. That change would create a lot of diff lines though, so isn't appropriate for this patch. Nothing dangerous about this (as long as we later do the right thing when iommuGroup is < 0). ACK
if (VIR_ALLOC(dev) < 0) return NULL; @@ -1611,6 +1614,8 @@ virPCIDeviceNew(unsigned int domain, goto error; }
+ dev->iommuGroup = virPCIDeviceAddressGetIOMMUGroupNum(&devAddr); + VIR_DEBUG("%s %s: initialized", dev->id, dev->name);
cleanup:
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 14.11.2015 09:35, Shivaprasad G Bhat wrote:
The iommu group number need not be fetched from the sysfs everytime as it remains constant. Fetch it once during allocation.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/src/util/virpci.c b/src/util/virpci.c index bff37d7..89e69e2 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -75,6 +75,7 @@ struct _virPCIDevice { bool has_pm_reset; bool managed; char *stubDriver; + int iommuGroup;
/* used by reattach function */ bool unbind_from_stub; @@ -1564,6 +1565,8 @@ virPCIDeviceNew(unsigned int domain, virPCIDevicePtr dev; char *vendor = NULL; char *product = NULL; + virPCIDeviceAddress devAddr = { domain, bus, + slot, function };
I'd rather see this explicitly initialized {.domain = domain, .bus = bus, ...} because if we ever change order of virPCIDeviceAddress struct (no idea why we would do that right now, but we might) this will slip silently.
if (VIR_ALLOC(dev) < 0) return NULL; @@ -1611,6 +1614,8 @@ virPCIDeviceNew(unsigned int domain, goto error; }
+ dev->iommuGroup = virPCIDeviceAddressGetIOMMUGroupNum(&devAddr); + VIR_DEBUG("%s %s: initialized", dev->id, dev->name);
cleanup:
ACK with that fixed. Michal

It is incorrect to attempt the device reattach of a function, when some other domain is using some functions belonging to the same iommu group. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virhostdev.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index de029a0..f24ccd8 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -1590,18 +1590,35 @@ virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr hostdev_mgr, virPCIDevicePtr pci) { virPCIDeviceAddressPtr devAddr = NULL; + bool usesVfio = false; + char *drvPath = NULL; + char *drvName = NULL; struct virHostdevIsPCINodeDeviceUsedData data = {hostdev_mgr, NULL, false}; int ret = -1; + if (virPCIDeviceGetDriverPathAndName(pci, &drvPath, &drvName) < 0) + goto out; + + if (STREQ_NULLABLE(drvName, "vfio-pci")) + usesVfio = true; + virObjectLock(hostdev_mgr->activePCIHostdevs); virObjectLock(hostdev_mgr->inactivePCIHostdevs); if (!(devAddr = virPCIDeviceGetAddress(pci))) goto out; - if (virHostdevIsPCINodeDeviceUsed(devAddr, &data)) + if (usesVfio) { + /* Doesn't matter which device. If any domain is actively using the + * iommu group, refuse to reattach */ + if (virPCIDeviceAddressIOMMUGroupIterate(devAddr, + virHostdevIsPCINodeDeviceUsed, + &data) < 0) + goto out; + } else if (virHostdevIsPCINodeDeviceUsed(devAddr, &data)) { goto out; + } virPCIDeviceReattachInit(pci); @@ -1614,6 +1631,8 @@ virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr hostdev_mgr, virObjectUnlock(hostdev_mgr->inactivePCIHostdevs); virObjectUnlock(hostdev_mgr->activePCIHostdevs); VIR_FREE(devAddr); + VIR_FREE(drvPath); + VIR_FREE(drvName); return ret; }

On 11/14/2015 03:36 AM, Shivaprasad G Bhat wrote:
Refuse to reattach from vfio if the iommu group is in use by any domain
util: refuse to reattach device if any device in the same group is in use by any domain
It is incorrect to attempt the device reattach of a function,
s/function/device/
when some other domain is using some functions belonging to the same iommu group.
s/some other domain/any other domain/ s/some functions/any device/
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virhostdev.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index de029a0..f24ccd8 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -1590,18 +1590,35 @@ virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr hostdev_mgr, virPCIDevicePtr pci) { virPCIDeviceAddressPtr devAddr = NULL; + bool usesVfio = false; + char *drvPath = NULL; + char *drvName = NULL; struct virHostdevIsPCINodeDeviceUsedData data = {hostdev_mgr, NULL, false}; int ret = -1;
+ if (virPCIDeviceGetDriverPathAndName(pci, &drvPath, &drvName) < 0) + goto out; + + if (STREQ_NULLABLE(drvName, "vfio-pci")) + usesVfio = true; + virObjectLock(hostdev_mgr->activePCIHostdevs); virObjectLock(hostdev_mgr->inactivePCIHostdevs);
if (!(devAddr = virPCIDeviceGetAddress(pci))) goto out;
- if (virHostdevIsPCINodeDeviceUsed(devAddr, &data)) + if (usesVfio) { + /* Doesn't matter which device. If any domain is actively using the + * iommu group, refuse to reattach */ + if (virPCIDeviceAddressIOMMUGroupIterate(devAddr, + virHostdevIsPCINodeDeviceUsed, + &data) < 0) + goto out;
Calling the iterator seems very inefficient to me (which is why I suggested in the original BZ comments that we save the iommuGroup of each device to the activelist): 1) you know the iommu group of the current PCI device (it is in pci->iommuGroup) 2) all that virHostdevIsPCINodeDeviceUsed does is check through activePCIHostdevs for the device, and this iterator is just calling virHostdevIsPCINodeDeviceUsed for each device in the same iommuGroup as the current device. Therefore, all you really need to do is look for any devices in acticePCIHostdevs that have dev->iommuGroup == pci->iommuGroup. Calling the iterator results in a lot of accesses to sysfs, opening and closing files, printfs, etc. Unless that is necessary to catch some extra case (and I don't see that), it's much better to do it the simpler way - more efficient and easier to understand a year from now when we come back to this.
+ } else if (virHostdevIsPCINodeDeviceUsed(devAddr, &data)) { goto out; + }
virPCIDeviceReattachInit(pci);
@@ -1614,6 +1631,8 @@ virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr hostdev_mgr, virObjectUnlock(hostdev_mgr->inactivePCIHostdevs); virObjectUnlock(hostdev_mgr->activePCIHostdevs); VIR_FREE(devAddr); + VIR_FREE(drvPath); + VIR_FREE(drvName); return ret; }
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 11/20/2015 08:51 PM, Laine Stump wrote:
On 11/14/2015 03:36 AM, Shivaprasad G Bhat wrote:
Refuse to reattach from vfio if the iommu group is in use by any domain
util: refuse to reattach device if any device in the same group is in use by any domain
It is incorrect to attempt the device reattach of a function,
s/function/device/
when some other domain is using some functions belonging to the same iommu group.
s/some other domain/any other domain/ s/some functions/any device/
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virhostdev.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index de029a0..f24ccd8 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -1590,18 +1590,35 @@ virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr hostdev_mgr, virPCIDevicePtr pci) { virPCIDeviceAddressPtr devAddr = NULL; + bool usesVfio = false; + char *drvPath = NULL; + char *drvName = NULL; struct virHostdevIsPCINodeDeviceUsedData data = {hostdev_mgr, NULL, false}; int ret = -1; + if (virPCIDeviceGetDriverPathAndName(pci, &drvPath, &drvName) < 0) + goto out; + + if (STREQ_NULLABLE(drvName, "vfio-pci")) + usesVfio = true; + virObjectLock(hostdev_mgr->activePCIHostdevs); virObjectLock(hostdev_mgr->inactivePCIHostdevs); if (!(devAddr = virPCIDeviceGetAddress(pci))) goto out; - if (virHostdevIsPCINodeDeviceUsed(devAddr, &data)) + if (usesVfio) { + /* Doesn't matter which device. If any domain is actively using the + * iommu group, refuse to reattach */ + if (virPCIDeviceAddressIOMMUGroupIterate(devAddr, + virHostdevIsPCINodeDeviceUsed, + &data) < 0) + goto out;
Calling the iterator seems very inefficient to me (which is why I suggested in the original BZ comments that we save the iommuGroup of each device to the activelist):
1) you know the iommu group of the current PCI device (it is in pci->iommuGroup)
2) all that virHostdevIsPCINodeDeviceUsed does is check through activePCIHostdevs for the device, and this iterator is just calling virHostdevIsPCINodeDeviceUsed for each device in the same iommuGroup as the current device.
Therefore, all you really need to do is look for any devices in acticePCIHostdevs that have dev->iommuGroup == pci->iommuGroup.
Calling the iterator results in a lot of accesses to sysfs, opening and closing files, printfs, etc. Unless that is necessary to catch some extra case (and I don't see that), it's much better to do it the simpler way - more efficient and easier to understand a year from now when we come back to this.
I always felt relying on the activelist is not possible as its lost during libvirt restart. Now that you point it out, I realise the activelist is populated back during libvirt domain discovery so activelist is actually reliable. I'll change this to use activelist instead. Thanks, Shivaprasad
+ } else if (virHostdevIsPCINodeDeviceUsed(devAddr, &data)) { goto out; + } virPCIDeviceReattachInit(pci); @@ -1614,6 +1631,8 @@ virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr hostdev_mgr, virObjectUnlock(hostdev_mgr->inactivePCIHostdevs); virObjectUnlock(hostdev_mgr->activePCIHostdevs); VIR_FREE(devAddr); + VIR_FREE(drvPath); + VIR_FREE(drvName); return ret; }
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 14.11.2015 09:36, Shivaprasad G Bhat wrote:
It is incorrect to attempt the device reattach of a function, when some other domain is using some functions belonging to the same iommu group.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virhostdev.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index de029a0..f24ccd8 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -1590,18 +1590,35 @@ virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr hostdev_mgr, virPCIDevicePtr pci) { virPCIDeviceAddressPtr devAddr = NULL; + bool usesVfio = false; + char *drvPath = NULL; + char *drvName = NULL; struct virHostdevIsPCINodeDeviceUsedData data = {hostdev_mgr, NULL, false}; int ret = -1;
+ if (virPCIDeviceGetDriverPathAndName(pci, &drvPath, &drvName) < 0) + goto out; + + if (STREQ_NULLABLE(drvName, "vfio-pci")) + usesVfio = true; + virObjectLock(hostdev_mgr->activePCIHostdevs); virObjectLock(hostdev_mgr->inactivePCIHostdevs);
if (!(devAddr = virPCIDeviceGetAddress(pci))) goto out;
- if (virHostdevIsPCINodeDeviceUsed(devAddr, &data)) + if (usesVfio) { + /* Doesn't matter which device. If any domain is actively using the + * iommu group, refuse to reattach */
s/reattach/reattach./ And, does not matter right now, but maybe we will need to set data.usesVfio = true; in this case. Currently, this will result in no-op
+ if (virPCIDeviceAddressIOMMUGroupIterate(devAddr, + virHostdevIsPCINodeDeviceUsed, + &data) < 0) + goto out; + } else if (virHostdevIsPCINodeDeviceUsed(devAddr, &data)) { goto out; + }
virPCIDeviceReattachInit(pci);
@@ -1614,6 +1631,8 @@ virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr hostdev_mgr, virObjectUnlock(hostdev_mgr->inactivePCIHostdevs); virObjectUnlock(hostdev_mgr->activePCIHostdevs); VIR_FREE(devAddr); + VIR_FREE(drvPath); + VIR_FREE(drvName); return ret; }
And as you and Laine agreed, this is going to be turned into a list, so this patch probably ends up being dropped anyway. Michal

Before unbind from stub driver libvirt should be sure the guest is not using the device anymore. The libvirt today waits for pci-stub driver alone in /proc/iommu. The same wait is needed for vfio devices too. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virhostdev.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index f24ccd8..9f15f34 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -756,6 +756,13 @@ virHostdevReattachPCIDevice(virPCIDevicePtr dev, virHostdevManagerPtr mgr) usleep(100*1000); retries--; } + } else if (STREQ(virPCIDeviceGetStubDriver(dev), "vfio-pci")) { + int retries = 100; + while (virPCIDeviceWaitForCleanup(dev, "vfio-pci") + && retries) { + usleep(100*1000); + retries--; + } } if (virPCIDeviceReattach(dev, mgr->activePCIHostdevs,

On Sat, 2015-11-14 at 14:06 +0530, Shivaprasad G Bhat wrote:
Before unbind from stub driver libvirt should be sure the guest is not using the device anymore. The libvirt today waits for pci-stub driver alone in /proc/iommu. The same wait is needed for vfio devices too.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virhostdev.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index f24ccd8..9f15f34 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -756,6 +756,13 @@ virHostdevReattachPCIDevice(virPCIDevicePtr dev, virHostdevManagerPtr mgr) usleep(100*1000); retries--; } + } else if (STREQ(virPCIDeviceGetStubDriver(dev), "vfio-pci")) { + int retries = 100; + while (virPCIDeviceWaitForCleanup(dev, "vfio-pci") + && retries) { + usleep(100*1000); + retries--; + } }
if (virPCIDeviceReattach(dev, mgr->activePCIHostdevs,
Hi, Laine pointed out this patch to me, and it seems entirely unnecessary. Can you explain what it fixes? The reason that pci-stub and legacy KVM device assignment needs to do this little delay is because pci-stub itself is not in control of the device, it will happily release the device at any point in time, regardless of whether KVM is still making use of it. With vfio-pci, all the device access occurs through the vfio-pci driver, so you can be sure that if the vfio-pci driver unbinds from the device it is unused. In fact, the unbind will block until it is unused. Are you doing this to make sure you don't get stuck in that blocked unbind and trigger an eject request to the guest? It certainly needs an explanation beyond pci-stub did this, so we should too. Thanks, Alex

On Fri, Nov 20, 2015 at 4:54 AM, Alex Williamson <alex.williamson@redhat.com> wrote:
On Sat, 2015-11-14 at 14:06 +0530, Shivaprasad G Bhat wrote:
Before unbind from stub driver libvirt should be sure the guest is not using the device anymore. The libvirt today waits for pci-stub driver alone in /proc/iommu. The same wait is needed for vfio devices too.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virhostdev.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index f24ccd8..9f15f34 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -756,6 +756,13 @@ virHostdevReattachPCIDevice(virPCIDevicePtr dev, virHostdevManagerPtr mgr) usleep(100*1000); retries--; } + } else if (STREQ(virPCIDeviceGetStubDriver(dev), "vfio-pci")) { + int retries = 100; + while (virPCIDeviceWaitForCleanup(dev, "vfio-pci") + && retries) { + usleep(100*1000); + retries--; + } }
if (virPCIDeviceReattach(dev, mgr->activePCIHostdevs,
Hi,
Laine pointed out this patch to me, and it seems entirely unnecessary. Can you explain what it fixes?
The reason that pci-stub and legacy KVM device assignment needs to do this little delay is because pci-stub itself is not in control of the device, it will happily release the device at any point in time, regardless of whether KVM is still making use of it. With vfio-pci, all the device access occurs through the vfio-pci driver, so you can be sure that if the vfio-pci driver unbinds from the device it is unused. In fact, the unbind will block until it is unused.
Hi Alex, Thanks for taking a look. I didn't want to leave out any case where the host crash would occur because of early release of the device. I could see the /dev/iommu reporting the vfio-pci using the device when the guest is actually using it. The behavior of vfio-pci that you mentioned, is it true across architectures? If yes, we can confidently drop this wait. Thanks, Shivaprasad
Are you doing this to make sure you don't get stuck in that blocked unbind and trigger an eject request to the guest? It certainly needs an explanation beyond pci-stub did this, so we should too. Thanks,
Alex
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Fri, 2015-11-20 at 12:26 +0530, Shivaprasad bhat wrote:
On Fri, Nov 20, 2015 at 4:54 AM, Alex Williamson <alex.williamson@redhat.com> wrote:
On Sat, 2015-11-14 at 14:06 +0530, Shivaprasad G Bhat wrote:
Before unbind from stub driver libvirt should be sure the guest is not using the device anymore. The libvirt today waits for pci-stub driver alone in /proc/iommu. The same wait is needed for vfio devices too.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virhostdev.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index f24ccd8..9f15f34 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -756,6 +756,13 @@ virHostdevReattachPCIDevice(virPCIDevicePtr dev, virHostdevManagerPtr mgr) usleep(100*1000); retries--; } + } else if (STREQ(virPCIDeviceGetStubDriver(dev), "vfio-pci")) { + int retries = 100; + while (virPCIDeviceWaitForCleanup(dev, "vfio-pci") + && retries) { + usleep(100*1000); + retries--; + } }
if (virPCIDeviceReattach(dev, mgr->activePCIHostdevs,
Hi,
Laine pointed out this patch to me, and it seems entirely unnecessary. Can you explain what it fixes?
The reason that pci-stub and legacy KVM device assignment needs to do this little delay is because pci-stub itself is not in control of the device, it will happily release the device at any point in time, regardless of whether KVM is still making use of it. With vfio-pci, all the device access occurs through the vfio-pci driver, so you can be sure that if the vfio-pci driver unbinds from the device it is unused. In fact, the unbind will block until it is unused.
Hi Alex,
Thanks for taking a look.
I didn't want to leave out any case where the host crash would occur because of early release of the device. I could see the /dev/iommu reporting the vfio-pci using the device when the guest is actually using it. The behavior of vfio-pci that you mentioned, is it true across architectures? If yes, we can confidently drop this wait.
As Laine suggests, if we paper over every kernel bug with random delays instead of fixing the actual problem, we're left with a very fragile solution that nobody understands. There are two aspects to returning devices to host drivers, the first is the integrity of the group, which is libvirt's responsibility. If libvirt reassigns a device within a group to the host driver while other devices within the group are still in use, it compromises the isolation of the group and the kernel will necessarily BUG_ON. This is improper group management. Adding delays and testing unintentional markers just seems like fragile ways to avoid addressing the real issue here. The other aspect is a device that should be free and clear of any sort of group management issues and we're simply unbinding from the vfio driver and re-binding to the host driver. As I mentioned, vfio-pci won't release a device until it's unused, so needing to add a delay and test another file to avoid a kernel crash either sounds like a bug in the kernel or a hack to work around not addressing the problem of proper group management above. Thanks, Alex

The reprobe needs to be called multiple times for vfio devices for each device in the iommu group in future patch. So split the reprobe into a new function. No functional change. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 47 +++++++++++++++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 16 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index 89e69e2..febf100 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1099,6 +1099,36 @@ virPCIIsKnownStub(char *driver) } static int +virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, + const char *driver, + const char *drvdir) +{ + char *path = NULL; + int ret = -1; + + /* Trigger a re-probe of the device is not in the stub's dynamic + * ID table. If the stub is available, but 'remove_id' isn't + * available, then re-probing would just cause the device to be + * re-bound to the stub. + */ + if (driver && !(path = virPCIDriverFile(driver, "remove_id"))) + goto cleanup; + + if (!driver || !virFileExists(drvdir) || virFileExists(path)) { + if (virFileWriteStr(PCI_SYSFS "drivers_probe", dev->name, 0) < 0) { + virReportSystemError(errno, + _("Failed to trigger a re-probe for PCI device '%s'"), + dev->name); + goto cleanup; + } + } + ret = 0; + cleanup: + VIR_FREE(path); + return ret; +} + +static int virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) { int result = -1; @@ -1150,24 +1180,9 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) goto cleanup; } - /* Trigger a re-probe of the device is not in the stub's dynamic - * ID table. If the stub is available, but 'remove_id' isn't - * available, then re-probing would just cause the device to be - * re-bound to the stub. - */ - VIR_FREE(path); - if (driver && !(path = virPCIDriverFile(driver, "remove_id"))) + if (virPCIDeviceReprobeHostDriver(dev, driver, drvdir) < 0) goto cleanup; - if (!driver || !virFileExists(drvdir) || virFileExists(path)) { - if (virFileWriteStr(PCI_SYSFS "drivers_probe", dev->name, 0) < 0) { - virReportSystemError(errno, - _("Failed to trigger a re-probe for PCI device '%s'"), - dev->name); - goto cleanup; - } - } - result = 0; cleanup:

On 11/14/2015 03:36 AM, Shivaprasad G Bhat wrote:
The reprobe needs to be called multiple times for vfio devices for each device in the iommu group in future patch. So split the reprobe into a new function. No functional change.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 47 +++++++++++++++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 16 deletions(-)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 89e69e2..febf100 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1099,6 +1099,36 @@ virPCIIsKnownStub(char *driver) }
static int +virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, + const char *driver, + const char *drvdir) +{ + char *path = NULL; + int ret = -1; + + /* Trigger a re-probe of the device is not in the stub's dynamic
As long as you're moving the code, s/of/if/
+ * ID table. If the stub is available, but 'remove_id' isn't + * available, then re-probing would just cause the device to be + * re-bound to the stub. + */ + if (driver && !(path = virPCIDriverFile(driver, "remove_id"))) + goto cleanup; + + if (!driver || !virFileExists(drvdir) || virFileExists(path)) { + if (virFileWriteStr(PCI_SYSFS "drivers_probe", dev->name, 0) < 0) { + virReportSystemError(errno, + _("Failed to trigger a re-probe for PCI device '%s'"), + dev->name); + goto cleanup; + } + } + ret = 0; + cleanup: + VIR_FREE(path); + return ret; +} + +static int virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) { int result = -1; @@ -1150,24 +1180,9 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) goto cleanup; }
- /* Trigger a re-probe of the device is not in the stub's dynamic - * ID table. If the stub is available, but 'remove_id' isn't - * available, then re-probing would just cause the device to be - * re-bound to the stub. - */ - VIR_FREE(path); - if (driver && !(path = virPCIDriverFile(driver, "remove_id"))) + if (virPCIDeviceReprobeHostDriver(dev, driver, drvdir) < 0) goto cleanup;
- if (!driver || !virFileExists(drvdir) || virFileExists(path)) { - if (virFileWriteStr(PCI_SYSFS "drivers_probe", dev->name, 0) < 0) { - virReportSystemError(errno, - _("Failed to trigger a re-probe for PCI device '%s'"), - dev->name); - goto cleanup; - } - } - result = 0;
cleanup:
Seems safe, but is this really what we want to do? I haven't read/understood the remaining patches yet, but this makes it sound like what is going to happen is that all of the devices will be unbound from vfio-pci immediately, so they are "in limbo", and will then be reprobed once all devices are unused (and therefore unbound from vfio-pci). I think that may be a bit dangerous. Instead, we should leave the devices bound to vfio-pci until all of them are unused, and at that time, we should unbind them all from vfio-pci, then reprobe them all. (again, I may have misunderstood the direction, if so ignore this). If I am misunderstanding, and unbinding from vfio-pci will also be delayed until all devices are unused, then ACK.

On Fri, 2015-11-20 at 11:33 -0500, Laine Stump wrote:
Seems safe, but is this really what we want to do? I haven't read/understood the remaining patches yet, but this makes it sound like what is going to happen is that all of the devices will be unbound from vfio-pci immediately, so they are "in limbo", and will then be reprobed once all devices are unused (and therefore unbound from vfio-pci).
I think that may be a bit dangerous. Instead, we should leave the devices bound to vfio-pci until all of them are unused, and at that time, we should unbind them all from vfio-pci, then reprobe them all. (again, I may have misunderstood the direction, if so ignore this).
I agree, we should not unbind any device from vfio-pci until all the devices in the IOMMU group have been detached from the guest. Cheers. -- Andrea Bolognani Software Engineer - Virtualization Team

On 11/20/2015 11:58 AM, Andrea Bolognani wrote:
Seems safe, but is this really what we want to do? I haven't read/understood the remaining patches yet, but this makes it sound like what is going to happen is that all of the devices will be unbound from vfio-pci immediately, so they are "in limbo", and will then be reprobed once all devices are unused (and therefore unbound from vfio-pci).
I think that may be a bit dangerous. Instead, we should leave the devices bound to vfio-pci until all of them are unused, and at that time, we should unbind them all from vfio-pci, then reprobe them all. (again, I may have misunderstood the direction, if so ignore this). I agree, we should not unbind any device from vfio-pci until all the devices in the IOMMU group have been detached from
On Fri, 2015-11-20 at 11:33 -0500, Laine Stump wrote: the guest.
... and I've just looked back at my original comment about this in the BZ, and see that at that time I only suggested delaying the reprobe, but said nothing about delaying the unbind. And I'm not as sure about the necessity of waiting as I was 1/2 an hour ago. I suppose the issue is that it brings all those unbound devices one step closer to getting bound to the host driver. However, that will happen only if those device's PCI addresses are written to "drivers_reprobe" in sysfs (right? is there any other way a more "global" reprobe could happen and snatch up everything that's currently unbound?) So maybe I'd better ask someone who knows more about this than me - Alex, is there an issue with unbinding some devices in an iommu group from vfio-pci at an earlier time, and leaving then unbound to any driver at all while some other devices in the group are still in use by the guest? Is there an advantage to keeping them all bound to vfio-pci until none of them are used, and then unbinding/reprobing them all at the same time? Or should we unbind each from vfio-pci immediately when they are detached from the guest, and reprobe them all once they're all unbound? Sorry for waffling so much on this. It's just that going through the code in virpci.c makes my head hurt.

On Fri, 2015-11-20 at 12:24 -0500, Laine Stump wrote:
On 11/20/2015 11:58 AM, Andrea Bolognani wrote:
Seems safe, but is this really what we want to do? I haven't read/understood the remaining patches yet, but this makes it sound like what is going to happen is that all of the devices will be unbound from vfio-pci immediately, so they are "in limbo", and will then be reprobed once all devices are unused (and therefore unbound from vfio-pci).
I think that may be a bit dangerous. Instead, we should leave the devices bound to vfio-pci until all of them are unused, and at that time, we should unbind them all from vfio-pci, then reprobe them all. (again, I may have misunderstood the direction, if so ignore this). I agree, we should not unbind any device from vfio-pci until all the devices in the IOMMU group have been detached from
On Fri, 2015-11-20 at 11:33 -0500, Laine Stump wrote: the guest.
... and I've just looked back at my original comment about this in the BZ, and see that at that time I only suggested delaying the reprobe, but said nothing about delaying the unbind. And I'm not as sure about the necessity of waiting as I was 1/2 an hour ago. I suppose the issue is that it brings all those unbound devices one step closer to getting bound to the host driver. However, that will happen only if those device's PCI addresses are written to "drivers_reprobe" in sysfs (right? is there any other way a more "global" reprobe could happen and snatch up everything that's currently unbound?)
Any load of a module will snatch up any unclaimed devices that match it, so if you unbind and leave the devices orpaned, a random module load could cause much badness. Adding a new_id will also cause a device scan, so if that happened to match the device: random badness.
So maybe I'd better ask someone who knows more about this than me - Alex, is there an issue with unbinding some devices in an iommu group from vfio-pci at an earlier time, and leaving then unbound to any driver at all while some other devices in the group are still in use by the guest? Is there an advantage to keeping them all bound to vfio-pci until none of them are used, and then unbinding/reprobing them all at the same time? Or should we unbind each from vfio-pci immediately when they are detached from the guest, and reprobe them all once they're all unbound?
Unbinding them from vfio-pci leaves them susceptible to random bad things happen, as outlined above, and potentially limits vfio's ability to do things like bus resets. For instance imagine a 2-port NIC where each port is a PCI function, the functions are grouped together and the devices don't support any sort of internal reset. If both devices are bound to vfio-pci, then the user owns them both and we can do a bus reset. If one of those devices gets released from the user, as soon as it's unbound from vfio-pci it's no longer in our control and the bus rest option is gone. The best course of action would be to leave any managed devices bound to vfio-pci until all of the devices within the group are no longer in use. Thanks, Alex

On 11/20/2015 11:30 PM, Alex Williamson wrote:
On Fri, 2015-11-20 at 12:24 -0500, Laine Stump wrote:
Seems safe, but is this really what we want to do? I haven't read/understood the remaining patches yet, but this makes it sound like what is going to happen is that all of the devices will be unbound from vfio-pci immediately, so they are "in limbo", and will then be reprobed once all devices are unused (and therefore unbound from vfio-pci).
I think that may be a bit dangerous. Instead, we should leave the devices bound to vfio-pci until all of them are unused, and at that time, we should unbind them all from vfio-pci, then reprobe them all. (again, I may have misunderstood the direction, if so ignore this). I agree, we should not unbind any device from vfio-pci until all the devices in the IOMMU group have been detached from
On Fri, 2015-11-20 at 11:33 -0500, Laine Stump wrote: the guest. ... and I've just looked back at my original comment about this in the BZ, and see that at that time I only suggested delaying the reprobe, but said nothing about delaying the unbind. And I'm not as sure about the necessity of waiting as I was 1/2 an hour ago. I suppose the issue is
On 11/20/2015 11:58 AM, Andrea Bolognani wrote: that it brings all those unbound devices one step closer to getting bound to the host driver. However, that will happen only if those device's PCI addresses are written to "drivers_reprobe" in sysfs (right? is there any other way a more "global" reprobe could happen and snatch up everything that's currently unbound?) Any load of a module will snatch up any unclaimed devices that match it, so if you unbind and leave the devices orpaned, a random module load could cause much badness. Adding a new_id will also cause a device scan, so if that happened to match the device: random badness.
So maybe I'd better ask someone who knows more about this than me - Alex, is there an issue with unbinding some devices in an iommu group from vfio-pci at an earlier time, and leaving then unbound to any driver at all while some other devices in the group are still in use by the guest? Is there an advantage to keeping them all bound to vfio-pci until none of them are used, and then unbinding/reprobing them all at the same time? Or should we unbind each from vfio-pci immediately when they are detached from the guest, and reprobe them all once they're all unbound? Unbinding them from vfio-pci leaves them susceptible to random bad things happen, as outlined above, and potentially limits vfio's ability to do things like bus resets. For instance imagine a 2-port NIC where each port is a PCI function, the functions are grouped together and the devices don't support any sort of internal reset. If both devices are bound to vfio-pci, then the user owns them both and we can do a bus reset. If one of those devices gets released from the user, as soon as it's unbound from vfio-pci it's no longer in our control and the bus rest option is gone.
The best course of action would be to leave any managed devices bound to vfio-pci until all of the devices within the group are no longer in use. Thanks, Hi Laine, Alex,
I am actually queuing the unbind from vfio until the last device reattach is requested when any device in the iommu group is in use by the guest. So, I believe this is taken care. Patch 9 is doing this. Thanks, Shiva
Alex

On Fri, 2015-11-20 at 11:33 -0500, Laine Stump wrote:
On 11/14/2015 03:36 AM, Shivaprasad G Bhat wrote:
The reprobe needs to be called multiple times for vfio devices for each device in the iommu group in future patch. So split the reprobe into a new function. No functional change.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 47 +++++++++++++++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 16 deletions(-)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 89e69e2..febf100 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1099,6 +1099,36 @@ virPCIIsKnownStub(char *driver) }
static int +virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, + const char *driver, + const char *drvdir) +{ + char *path = NULL; + int ret = -1; + + /* Trigger a re-probe of the device is not in the stub's dynamic
As long as you're moving the code, s/of/if/
+ * ID table. If the stub is available, but 'remove_id' isn't + * available, then re-probing would just cause the device to be + * re-bound to the stub. + */ + if (driver && !(path = virPCIDriverFile(driver, "remove_id"))) + goto cleanup; + + if (!driver || !virFileExists(drvdir) || virFileExists(path)) { + if (virFileWriteStr(PCI_SYSFS "drivers_probe", dev->name, 0) < 0) { + virReportSystemError(errno, + _("Failed to trigger a re-probe for PCI device '%s'"), + dev->name); + goto cleanup; + } + } + ret = 0; + cleanup: + VIR_FREE(path); + return ret; +} + +static int virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) { int result = -1; @@ -1150,24 +1180,9 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) goto cleanup; }
- /* Trigger a re-probe of the device is not in the stub's dynamic - * ID table. If the stub is available, but 'remove_id' isn't - * available, then re-probing would just cause the device to be - * re-bound to the stub. - */ - VIR_FREE(path); - if (driver && !(path = virPCIDriverFile(driver, "remove_id"))) + if (virPCIDeviceReprobeHostDriver(dev, driver, drvdir) < 0) goto cleanup;
- if (!driver || !virFileExists(drvdir) || virFileExists(path)) { - if (virFileWriteStr(PCI_SYSFS "drivers_probe", dev->name, 0) < 0) { - virReportSystemError(errno, - _("Failed to trigger a re-probe for PCI device '%s'"), - dev->name); - goto cleanup; - } - } - result = 0;
cleanup:
Seems safe, but is this really what we want to do? I haven't read/understood the remaining patches yet, but this makes it sound like what is going to happen is that all of the devices will be unbound from vfio-pci immediately, so they are "in limbo", and will then be reprobed once all devices are unused (and therefore unbound from vfio-pci).
I think that may be a bit dangerous. Instead, we should leave the devices bound to vfio-pci until all of them are unused, and at that time, we should unbind them all from vfio-pci, then reprobe them all. (again, I may have misunderstood the direction, if so ignore this).
If I am misunderstanding, and unbinding from vfio-pci will also be delayed until all devices are unused, then ACK.
Why don't we start making use of the driver_override feature that we've had in the kernel instead of continuing to hack on the racy add_id/remove_id stuff? We've already solved the problem in the kernel. Want to bind a device to vfio-pci: echo vfio-pci > /sys/bus/pci/devices/<dev>/driver_override if [ -e /sys/bus/pci/devices/<dev>/driver ]; then echo <dev> > /sys/bus/pci/devices/<dev>/driver/unbind fi echo <dev> > /sys/bus/pci/drivers_probe To rebind, replace vfio-pci in the first echo with null and repeat the rest. The affects are limited to a single device, we're not going to have surprise unbound devices show up bound to the driver, and we don't race anyone manipulating other devices. Thanks, Alex

On 11/20/2015 12:51 PM, Alex Williamson wrote:
On Fri, 2015-11-20 at 11:33 -0500, Laine Stump wrote:
On 11/14/2015 03:36 AM, Shivaprasad G Bhat wrote:
The reprobe needs to be called multiple times for vfio devices for each device in the iommu group in future patch. So split the reprobe into a new function. No functional change.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 47 +++++++++++++++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 16 deletions(-)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 89e69e2..febf100 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1099,6 +1099,36 @@ virPCIIsKnownStub(char *driver) }
static int +virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, + const char *driver, + const char *drvdir) +{ + char *path = NULL; + int ret = -1; + + /* Trigger a re-probe of the device is not in the stub's dynamic As long as you're moving the code, s/of/if/ + * ID table. If the stub is available, but 'remove_id' isn't + * available, then re-probing would just cause the device to be + * re-bound to the stub. + */ + if (driver && !(path = virPCIDriverFile(driver, "remove_id"))) + goto cleanup; + + if (!driver || !virFileExists(drvdir) || virFileExists(path)) { + if (virFileWriteStr(PCI_SYSFS "drivers_probe", dev->name, 0) < 0) { + virReportSystemError(errno, + _("Failed to trigger a re-probe for PCI device '%s'"), + dev->name); + goto cleanup; + } + } + ret = 0; + cleanup: + VIR_FREE(path); + return ret; +} + +static int virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) { int result = -1; @@ -1150,24 +1180,9 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) goto cleanup; }
- /* Trigger a re-probe of the device is not in the stub's dynamic - * ID table. If the stub is available, but 'remove_id' isn't - * available, then re-probing would just cause the device to be - * re-bound to the stub. - */ - VIR_FREE(path); - if (driver && !(path = virPCIDriverFile(driver, "remove_id"))) + if (virPCIDeviceReprobeHostDriver(dev, driver, drvdir) < 0) goto cleanup;
- if (!driver || !virFileExists(drvdir) || virFileExists(path)) { - if (virFileWriteStr(PCI_SYSFS "drivers_probe", dev->name, 0) < 0) { - virReportSystemError(errno, - _("Failed to trigger a re-probe for PCI device '%s'"), - dev->name); - goto cleanup; - } - } - result = 0;
cleanup:
Seems safe, but is this really what we want to do? I haven't read/understood the remaining patches yet, but this makes it sound like what is going to happen is that all of the devices will be unbound from vfio-pci immediately, so they are "in limbo", and will then be reprobed once all devices are unused (and therefore unbound from vfio-pci).
I think that may be a bit dangerous. Instead, we should leave the devices bound to vfio-pci until all of them are unused, and at that time, we should unbind them all from vfio-pci, then reprobe them all. (again, I may have misunderstood the direction, if so ignore this).
If I am misunderstanding, and unbinding from vfio-pci will also be delayed until all devices are unused, then ACK. Why don't we start making use of the driver_override feature that we've had in the kernel instead of continuing to hack on the racy add_id/remove_id stuff? We've already solved the problem in the kernel.
How far back was this available? I'm guessing it is safe in any kernel that also supports vfio? What about earlier than that? (okay, I think I just answered my own question by booting a RHEL6 guest (kernel 2.6.32) and seeing that it doesn't have driver_override. So we have to keep the existing code that uses add_id/remove_id, but can add use of driver_override in the cases that it is available. This can/should be done orthogonally to this current patch series).
Want to bind a device to vfio-pci:
echo vfio-pci > /sys/bus/pci/devices/<dev>/driver_override if [ -e /sys/bus/pci/devices/<dev>/driver ]; then echo <dev> > /sys/bus/pci/devices/<dev>/driver/unbind fi echo <dev> > /sys/bus/pci/drivers_probe
If multiple devices will be rebound to their host drivers, is there an advantage to doing all of the driver_override writing for all devices first, then doing the drivers_probe once at the end? Since libvirt has historically dealt with a single device at a time, the two were always done in immediate succession, but now we're talking about binding multiple devices at a time.
To rebind, replace vfio-pci in the first echo with null and repeat the rest. The affects are limited to a single device, we're not going to have surprise unbound devices show up bound to the driver, and we don't race anyone manipulating other devices. Thanks,
Alex

On 14.11.2015 09:36, Shivaprasad G Bhat wrote:
The reprobe needs to be called multiple times for vfio devices for each device in the iommu group in future patch. So split the reprobe into a new function. No functional change.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 47 +++++++++++++++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 16 deletions(-)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 89e69e2..febf100 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1099,6 +1099,36 @@ virPCIIsKnownStub(char *driver) }
static int +virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, + const char *driver, + const char *drvdir) +{ + char *path = NULL; + int ret = -1; + + /* Trigger a re-probe of the device is not in the stub's dynamic + * ID table. If the stub is available, but 'remove_id' isn't + * available, then re-probing would just cause the device to be + * re-bound to the stub. + */ + if (driver && !(path = virPCIDriverFile(driver, "remove_id"))) + goto cleanup; + + if (!driver || !virFileExists(drvdir) || virFileExists(path)) { + if (virFileWriteStr(PCI_SYSFS "drivers_probe", dev->name, 0) < 0) { + virReportSystemError(errno, + _("Failed to trigger a re-probe for PCI device '%s'"), + dev->name); + goto cleanup; + } + } + ret = 0; + cleanup: + VIR_FREE(path); + return ret; +} + +static int virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) { int result = -1; @@ -1150,24 +1180,9 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) goto cleanup; }
- /* Trigger a re-probe of the device is not in the stub's dynamic - * ID table. If the stub is available, but 'remove_id' isn't - * available, then re-probing would just cause the device to be - * re-bound to the stub. - */ - VIR_FREE(path); - if (driver && !(path = virPCIDriverFile(driver, "remove_id"))) + if (virPCIDeviceReprobeHostDriver(dev, driver, drvdir) < 0) goto cleanup;
- if (!driver || !virFileExists(drvdir) || virFileExists(path)) { - if (virFileWriteStr(PCI_SYSFS "drivers_probe", dev->name, 0) < 0) { - virReportSystemError(errno, - _("Failed to trigger a re-probe for PCI device '%s'"), - dev->name); - goto cleanup; - } - } - result = 0;
cleanup:
ACK Michal

The inactiveDevs need to be selectively altered for more than one device in case of vfio devices. So, pass the whole list. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 36 +++++++++++++++++++----------------- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index febf100..a8a22d1 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1129,7 +1129,9 @@ virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, } static int -virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) +virPCIDeviceUnbindFromStub(virPCIDevicePtr dev, + virPCIDeviceListPtr activeDevs ATTRIBUTE_UNUSED, + virPCIDeviceListPtr inactiveDevs) { int result = -1; char *drvdir = NULL; @@ -1186,6 +1188,9 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) result = 0; cleanup: + if ((result == 0) && inactiveDevs) + virPCIDeviceListDel(inactiveDevs, dev); + /* do not do it again */ dev->unbind_from_stub = false; dev->remove_slot = false; @@ -1201,7 +1206,9 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) static int virPCIDeviceBindToStub(virPCIDevicePtr dev, - const char *stubDriverName) + const char *stubDriverName, + virPCIDeviceListPtr activeDevs, + virPCIDeviceListPtr inactiveDevs) { int result = -1; bool reprobe = false; @@ -1328,9 +1335,15 @@ virPCIDeviceBindToStub(virPCIDevicePtr dev, VIR_FREE(driverLink); VIR_FREE(path); + /* Add *a copy of* the dev into list inactiveDevs, if + * it's not already there. */ + if ((result == 0) && inactiveDevs && !virPCIDeviceListFind(inactiveDevs, dev) && + virPCIDeviceListAddCopy(inactiveDevs, dev) < 0) { + result = -1; + } if (result < 0) { VIR_FREE(newDriverName); - virPCIDeviceUnbindFromStub(dev); + virPCIDeviceUnbindFromStub(dev, activeDevs, inactiveDevs); } else { VIR_FREE(dev->stubDriver); dev->stubDriver = newDriverName; @@ -1377,16 +1390,9 @@ virPCIDeviceDetach(virPCIDevicePtr dev, return -1; } - if (virPCIDeviceBindToStub(dev, dev->stubDriver) < 0) - return -1; - - /* Add *a copy of* the dev into list inactiveDevs, if - * it's not already there. - */ - if (inactiveDevs && !virPCIDeviceListFind(inactiveDevs, dev) && - virPCIDeviceListAddCopy(inactiveDevs, dev) < 0) { + if (virPCIDeviceBindToStub(dev, dev->stubDriver, + activeDevs, inactiveDevs) < 0) return -1; - } return 0; } @@ -1402,13 +1408,9 @@ virPCIDeviceReattach(virPCIDevicePtr dev, return -1; } - if (virPCIDeviceUnbindFromStub(dev) < 0) + if (virPCIDeviceUnbindFromStub(dev, activeDevs, inactiveDevs) < 0) return -1; - /* Steal the dev from list inactiveDevs */ - if (inactiveDevs) - virPCIDeviceListDel(inactiveDevs, dev); - return 0; }

On 14.11.2015 09:36, Shivaprasad G Bhat wrote:
The inactiveDevs need to be selectively altered for more than one device in case of vfio devices. So, pass the whole list.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 36 +++++++++++++++++++----------------- 1 file changed, 19 insertions(+), 17 deletions(-)
diff --git a/src/util/virpci.c b/src/util/virpci.c index febf100..a8a22d1 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1129,7 +1129,9 @@ virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, }
static int -virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) +virPCIDeviceUnbindFromStub(virPCIDevicePtr dev, + virPCIDeviceListPtr activeDevs ATTRIBUTE_UNUSED, + virPCIDeviceListPtr inactiveDevs) { int result = -1; char *drvdir = NULL; @@ -1186,6 +1188,9 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev) result = 0;
cleanup: + if ((result == 0) && inactiveDevs) + virPCIDeviceListDel(inactiveDevs, dev); + /* do not do it again */ dev->unbind_from_stub = false; dev->remove_slot = false; @@ -1201,7 +1206,9 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev)
static int virPCIDeviceBindToStub(virPCIDevicePtr dev, - const char *stubDriverName) + const char *stubDriverName, + virPCIDeviceListPtr activeDevs, + virPCIDeviceListPtr inactiveDevs) { int result = -1; bool reprobe = false; @@ -1328,9 +1335,15 @@ virPCIDeviceBindToStub(virPCIDevicePtr dev, VIR_FREE(driverLink); VIR_FREE(path);
+ /* Add *a copy of* the dev into list inactiveDevs, if + * it's not already there. */ + if ((result == 0) && inactiveDevs && !virPCIDeviceListFind(inactiveDevs, dev) && + virPCIDeviceListAddCopy(inactiveDevs, dev) < 0) { + result = -1; + } if (result < 0) { VIR_FREE(newDriverName); - virPCIDeviceUnbindFromStub(dev); + virPCIDeviceUnbindFromStub(dev, activeDevs, inactiveDevs); } else { VIR_FREE(dev->stubDriver); dev->stubDriver = newDriverName; @@ -1377,16 +1390,9 @@ virPCIDeviceDetach(virPCIDevicePtr dev, return -1; }
- if (virPCIDeviceBindToStub(dev, dev->stubDriver) < 0) - return -1; - - /* Add *a copy of* the dev into list inactiveDevs, if - * it's not already there. - */ - if (inactiveDevs && !virPCIDeviceListFind(inactiveDevs, dev) && - virPCIDeviceListAddCopy(inactiveDevs, dev) < 0) { + if (virPCIDeviceBindToStub(dev, dev->stubDriver, + activeDevs, inactiveDevs) < 0) return -1; - }
return 0; } @@ -1402,13 +1408,9 @@ virPCIDeviceReattach(virPCIDevicePtr dev, return -1; }
- if (virPCIDeviceUnbindFromStub(dev) < 0) + if (virPCIDeviceUnbindFromStub(dev, activeDevs, inactiveDevs) < 0) return -1;
- /* Steal the dev from list inactiveDevs */ - if (inactiveDevs) - virPCIDeviceListDel(inactiveDevs, dev); - return 0; }
ACK Michal

The next few patches implement the vfio-pci tests. So, change the test case to test the negative test case on pciback instead. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- tests/virpcimock.c | 2 +- tests/virpcitest.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 0b49290..0724a36 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -815,7 +815,7 @@ init_env(void) MAKE_PCI_DRIVER("iwlwifi", 0x8086, 0x0044); MAKE_PCI_DRIVER("i915", 0x8086, 0x0046, 0x8086, 0x0047); MAKE_PCI_DRIVER("pci-stub", -1, -1); - pci_driver_new("vfio-pci", PCI_ACTION_BIND, -1, -1); + pci_driver_new("pciback", PCI_ACTION_BIND, -1, -1); # define MAKE_PCI_DEVICE(Id, Vendor, Device, ...) \ do { \ diff --git a/tests/virpcitest.c b/tests/virpcitest.c index d4d3253..25591f9 100644 --- a/tests/virpcitest.c +++ b/tests/virpcitest.c @@ -269,7 +269,7 @@ testVirPCIDeviceDetachFail(const void *opaque) if (!dev) goto cleanup; - if (virPCIDeviceSetStubDriver(dev, "vfio-pci") < 0) + if (virPCIDeviceSetStubDriver(dev, "pciback") < 0) goto cleanup; if (virPCIDeviceDetach(dev, NULL, NULL) < 0) {

On 11/14/2015 03:37 AM, Shivaprasad G Bhat wrote:
The next few patches implement the vfio-pci tests. So, change the test case to test the negative test case on pciback instead.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- tests/virpcimock.c | 2 +- tests/virpcitest.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 0b49290..0724a36 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -815,7 +815,7 @@ init_env(void) MAKE_PCI_DRIVER("iwlwifi", 0x8086, 0x0044); MAKE_PCI_DRIVER("i915", 0x8086, 0x0046, 0x8086, 0x0047); MAKE_PCI_DRIVER("pci-stub", -1, -1); - pci_driver_new("vfio-pci", PCI_ACTION_BIND, -1, -1); + pci_driver_new("pciback", PCI_ACTION_BIND, -1, -1);
Is there a specific reason for making this the name of a stub driver that we support in the real driver, but not in the test driver? Or would the test be served just as well with some other random name? Or, maybe we really do want it to be vfio-pci so that we will test failure paths when the driver is vfio-pci. Either way, I'm not certain that changing this to "pciback" is the right thing. (not certain that it *isn't* either, which is why I Cc'ed jdenemar :-)
# define MAKE_PCI_DEVICE(Id, Vendor, Device, ...) \ do { \ diff --git a/tests/virpcitest.c b/tests/virpcitest.c index d4d3253..25591f9 100644 --- a/tests/virpcitest.c +++ b/tests/virpcitest.c @@ -269,7 +269,7 @@ testVirPCIDeviceDetachFail(const void *opaque) if (!dev) goto cleanup;
- if (virPCIDeviceSetStubDriver(dev, "vfio-pci") < 0) + if (virPCIDeviceSetStubDriver(dev, "pciback") < 0) goto cleanup;
if (virPCIDeviceDetach(dev, NULL, NULL) < 0) {
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 11/20/2015 10:35 PM, Laine Stump wrote:
On 11/14/2015 03:37 AM, Shivaprasad G Bhat wrote:
The next few patches implement the vfio-pci tests. So, change the test case to test the negative test case on pciback instead.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- tests/virpcimock.c | 2 +- tests/virpcitest.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 0b49290..0724a36 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -815,7 +815,7 @@ init_env(void) MAKE_PCI_DRIVER("iwlwifi", 0x8086, 0x0044); MAKE_PCI_DRIVER("i915", 0x8086, 0x0046, 0x8086, 0x0047); MAKE_PCI_DRIVER("pci-stub", -1, -1); - pci_driver_new("vfio-pci", PCI_ACTION_BIND, -1, -1); + pci_driver_new("pciback", PCI_ACTION_BIND, -1, -1);
Is there a specific reason for making this the name of a stub driver that we support in the real driver, but not in the test driver? Or would the test be served just as well with some other random name?
I dont see why vfio-pci was chosen for negative test case earlier . I chose pciback because none of the existing test cases target this driver. So, felt safe to choose this driver for the negative test case. I think we can even choose a random test driver. Thanks, Shiva
Or, maybe we really do want it to be vfio-pci so that we will test failure paths when the driver is vfio-pci.
Either way, I'm not certain that changing this to "pciback" is the right thing. (not certain that it *isn't* either, which is why I Cc'ed jdenemar :-)
# define MAKE_PCI_DEVICE(Id, Vendor, Device, ...) \ do { \ diff --git a/tests/virpcitest.c b/tests/virpcitest.c index d4d3253..25591f9 100644 --- a/tests/virpcitest.c +++ b/tests/virpcitest.c @@ -269,7 +269,7 @@ testVirPCIDeviceDetachFail(const void *opaque) if (!dev) goto cleanup; - if (virPCIDeviceSetStubDriver(dev, "vfio-pci") < 0) + if (virPCIDeviceSetStubDriver(dev, "pciback") < 0) goto cleanup; if (virPCIDeviceDetach(dev, NULL, NULL) < 0) {
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

The iommu group info can be used to check if the devices are bound/unbound from vfio at the group level granualrity. Add the info to mock as to help add test cases later. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- tests/virpcimock.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 164 insertions(+), 16 deletions(-) diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 0724a36..837976a 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -127,9 +127,15 @@ struct pciDevice { int vendor; int device; int class; + int iommu; struct pciDriver *driver; /* Driver attached. NULL if attached to no driver */ }; +struct pciIommuGroup { + int iommu; + size_t nDevicesBoundToVFIO; /* Indicates the devices in the group */ +}; + struct fdCallback { int fd; char *path; @@ -141,6 +147,9 @@ size_t nPCIDevices = 0; struct pciDriver **pciDrivers = NULL; size_t nPCIDrivers = 0; +struct pciIommuGroup **pciIommuGroups = NULL; +size_t npciIommuGroups = 0; + struct fdCallback *callbacks = NULL; size_t nCallbacks = 0; @@ -325,6 +334,7 @@ pci_device_new_from_stub(const struct pciDevice *data) char *configSrc; char tmp[32]; struct stat sb; + char *iommugrouppath, *deviommupath, *iommugroupdevs = NULL; if (VIR_STRDUP_QUIET(id, data->id) < 0) ABORT_OOM(); @@ -387,6 +397,25 @@ pci_device_new_from_stub(const struct pciDevice *data) ABORT("@tmp overflow"); make_file(devpath, "class", tmp, -1); + if (virAsprintfQuiet(&deviommupath, "%s/iommu_group", devpath) < 0 || + virAsprintfQuiet(&iommugrouppath, "%s/iommu_groups/%d", + fakesysfsdir, dev->iommu) < 0) + ABORT("@deviommupath overflow"); + + if (symlink(iommugrouppath, deviommupath) < 0) + ABORT("Unable to link device to iommu group"); + + VIR_FREE(deviommupath); + if (virAsprintfQuiet(&iommugroupdevs, "%s/devices/%s", + iommugrouppath, dev->id) < 0) + ABORT("@iommugroupdevs overflow"); + + if (symlink(devpath, iommugroupdevs) < 0) + ABORT("Unable to link iommu group devices to current device"); + + VIR_FREE(iommugrouppath); + VIR_FREE(iommugroupdevs); + if (pci_device_autobind(dev) < 0) ABORT("Unable to bind: %s", data->id); @@ -435,7 +464,95 @@ pci_device_autobind(struct pciDevice *dev) return pci_driver_bind(driver, dev); } +static void +pci_iommu_new(int num) +{ + char *iommupath; + struct pciIommuGroup *iommuGroup; + + if (VIR_ALLOC_QUIET(iommuGroup) < 0) + ABORT_OOM(); + + iommuGroup->iommu = num; + iommuGroup->nDevicesBoundToVFIO = 0; /* No device bound to vfio by default */ + + if (virAsprintfQuiet(&iommupath, "%s/iommu_groups/%d/devices", fakesysfsdir, num) < 0) + ABORT_OOM(); + + if (virFileMakePath(iommupath) < 0) + ABORT("Unable to create: %s", iommupath); + + if (VIR_APPEND_ELEMENT_QUIET(pciIommuGroups, npciIommuGroups, iommuGroup) < 0) + ABORT_OOM(); +} + +static int +pci_vfio_release_iommu(struct pciDevice *device) +{ + char *vfiopath = NULL; + int ret = -1; + size_t i = 0; + + for (i = 0; i < npciIommuGroups; i++) { + if (pciIommuGroups[i]->iommu == device->iommu) + break; + } + + if (i != npciIommuGroups) { + if (pciIommuGroups[i]->nDevicesBoundToVFIO == 0) { + ret = 0; + goto cleanup; + } + pciIommuGroups[i]->nDevicesBoundToVFIO--; + if (!pciIommuGroups[i]->nDevicesBoundToVFIO) { + if (virAsprintfQuiet(&vfiopath, "%s/dev/vfio/%d", + fakesysfsdir, device->iommu) < 0) { + errno = ENOMEM; + goto cleanup; + } + if (unlink(vfiopath) < 0) + goto cleanup; + } + } + + ret = 0; + cleanup: + VIR_FREE(vfiopath); + return ret; +} + +static int +pci_vfio_lock_iommu(struct pciDevice *device) +{ + char *vfiopath = NULL; + int ret = -1; + size_t i = 0; + int fd = -1; + + for (i = 0; i < npciIommuGroups; i++) { + if (pciIommuGroups[i]->iommu == device->iommu) + break; + } + if (i != npciIommuGroups) { + if (!pciIommuGroups[i]->nDevicesBoundToVFIO) { + if (virAsprintfQuiet(&vfiopath, "%s/dev/vfio/%d", + fakesysfsdir, device->iommu) < 0) { + errno = ENOMEM; + goto cleanup; + } + if ((fd = realopen(vfiopath, O_CREAT)) < 0) + goto cleanup; + } + pciIommuGroups[i]->nDevicesBoundToVFIO++; + } + + ret = 0; + cleanup: + realclose(fd); + VIR_FREE(vfiopath); + return ret; +} /* * PCI Driver functions */ @@ -556,6 +673,10 @@ pci_driver_bind(struct pciDriver *driver, if (symlink(devpath, driverpath) < 0) goto cleanup; + if (STREQ(driver->name, "vfio-pci")) + if (pci_vfio_lock_iommu(dev) < 0) + goto cleanup; + dev->driver = driver; ret = 0; cleanup: @@ -590,6 +711,10 @@ pci_driver_unbind(struct pciDriver *driver, unlink(driverpath) < 0) goto cleanup; + if (STREQ(driver->name, "vfio-pci")) + if (pci_vfio_release_iommu(dev) < 0) + goto cleanup; + dev->driver = NULL; ret = 0; cleanup: @@ -801,6 +926,8 @@ init_syms(void) static void init_env(void) { + char *devvfio; + if (fakesysfsdir) return; @@ -809,12 +936,33 @@ init_env(void) make_file(fakesysfsdir, "drivers_probe", NULL, -1); + if (virAsprintfQuiet(&devvfio, "%s/dev/vfio", fakesysfsdir) < 0) + ABORT_OOM(); + + if (virFileMakePath(devvfio) < 0) + ABORT("Unable to create: %s", devvfio); + VIR_FREE(devvfio); + + pci_iommu_new(1); + pci_iommu_new(2); + pci_iommu_new(3); + pci_iommu_new(4); + pci_iommu_new(5); + pci_iommu_new(6); + pci_iommu_new(7); + pci_iommu_new(8); + pci_iommu_new(9); + pci_iommu_new(10); + pci_iommu_new(11); + + # define MAKE_PCI_DRIVER(name, ...) \ pci_driver_new(name, 0, __VA_ARGS__, -1, -1) - MAKE_PCI_DRIVER("iwlwifi", 0x8086, 0x0044); - MAKE_PCI_DRIVER("i915", 0x8086, 0x0046, 0x8086, 0x0047); + MAKE_PCI_DRIVER("i915", 0x8086, 0x0046, 0x8086, 0x0047, 0x8086, 0x0048, 0x1033, 0x0035, 0x1033, 0x00e0); + MAKE_PCI_DRIVER("iwlwifi", 0x8086, 0x0044, 0x0086, 0x105e, 0x0086, 0x105d); MAKE_PCI_DRIVER("pci-stub", -1, -1); + MAKE_PCI_DRIVER("vfio-pci", -1, -1); pci_driver_new("pciback", PCI_ACTION_BIND, -1, -1); # define MAKE_PCI_DEVICE(Id, Vendor, Device, ...) \ @@ -824,20 +972,20 @@ init_env(void) pci_device_new_from_stub(&dev); \ } while (0) - MAKE_PCI_DEVICE("0000:00:00.0", 0x8086, 0x0044); - MAKE_PCI_DEVICE("0000:00:01.0", 0x8086, 0x0044); - MAKE_PCI_DEVICE("0000:00:02.0", 0x8086, 0x0046); - MAKE_PCI_DEVICE("0000:00:03.0", 0x8086, 0x0048); - MAKE_PCI_DEVICE("0001:00:00.0", 0x1014, 0x03b9, .class = 0x060400); - MAKE_PCI_DEVICE("0001:01:00.0", 0x8086, 0x105e); - MAKE_PCI_DEVICE("0001:01:00.1", 0x8086, 0x105e); - MAKE_PCI_DEVICE("0005:80:00.0", 0x10b5, 0x8112, .class = 0x060400); - MAKE_PCI_DEVICE("0005:90:01.0", 0x1033, 0x0035); - MAKE_PCI_DEVICE("0005:90:01.1", 0x1033, 0x0035); - MAKE_PCI_DEVICE("0005:90:01.2", 0x1033, 0x00e0); - MAKE_PCI_DEVICE("0000:0a:01.0", 0x8086, 0x0047); - MAKE_PCI_DEVICE("0000:0a:02.0", 0x8286, 0x0048); - MAKE_PCI_DEVICE("0000:0a:03.0", 0x8386, 0x0048); + MAKE_PCI_DEVICE("0000:00:00.0", 0x8086, 0x0044, .iommu = 1); + MAKE_PCI_DEVICE("0000:00:01.0", 0x8086, 0x0044, .iommu = 2); + MAKE_PCI_DEVICE("0000:00:02.0", 0x8086, 0x0046, .iommu = 3); + MAKE_PCI_DEVICE("0000:00:03.0", 0x8086, 0x0048, .iommu = 4); + MAKE_PCI_DEVICE("0001:00:00.0", 0x1014, 0x03b9, .iommu = 5, .class = 0x060400); + MAKE_PCI_DEVICE("0001:01:00.0", 0x8086, 0x105e, .iommu = 6); + MAKE_PCI_DEVICE("0001:01:00.1", 0x8086, 0x105d, .iommu = 6); + MAKE_PCI_DEVICE("0005:80:00.0", 0x10b5, 0x8112, .iommu = 7, .class = 0x060400); + MAKE_PCI_DEVICE("0005:90:01.0", 0x1033, 0x0035, .iommu = 8); + MAKE_PCI_DEVICE("0005:90:01.1", 0x1033, 0x0035, .iommu = 8); + MAKE_PCI_DEVICE("0005:90:01.2", 0x1033, 0x00e0, .iommu = 8); + MAKE_PCI_DEVICE("0000:0a:01.0", 0x8086, 0x0047, .iommu = 9); + MAKE_PCI_DEVICE("0000:0a:02.0", 0x8286, 0x0048, .iommu = 10); + MAKE_PCI_DEVICE("0000:0a:03.0", 0x8386, 0x0048, .iommu = 11); }

I'm responding to this only to say that is someone with prior experience with virpcimock.c could review it instead, that would be *much* better. (If not, I can come back and do it). On 11/14/2015 03:38 AM, Shivaprasad G Bhat wrote:
The iommu group info can be used to check if the devices are bound/unbound from vfio at the group level granualrity. Add the info to mock as to help add test cases later.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- tests/virpcimock.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 164 insertions(+), 16 deletions(-)
diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 0724a36..837976a 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -127,9 +127,15 @@ struct pciDevice { int vendor; int device; int class; + int iommu; struct pciDriver *driver; /* Driver attached. NULL if attached to no driver */ };
+struct pciIommuGroup { + int iommu; + size_t nDevicesBoundToVFIO; /* Indicates the devices in the group */ +}; + struct fdCallback { int fd; char *path; @@ -141,6 +147,9 @@ size_t nPCIDevices = 0; struct pciDriver **pciDrivers = NULL; size_t nPCIDrivers = 0;
+struct pciIommuGroup **pciIommuGroups = NULL; +size_t npciIommuGroups = 0; + struct fdCallback *callbacks = NULL; size_t nCallbacks = 0;
@@ -325,6 +334,7 @@ pci_device_new_from_stub(const struct pciDevice *data) char *configSrc; char tmp[32]; struct stat sb; + char *iommugrouppath, *deviommupath, *iommugroupdevs = NULL;
if (VIR_STRDUP_QUIET(id, data->id) < 0) ABORT_OOM(); @@ -387,6 +397,25 @@ pci_device_new_from_stub(const struct pciDevice *data) ABORT("@tmp overflow"); make_file(devpath, "class", tmp, -1);
+ if (virAsprintfQuiet(&deviommupath, "%s/iommu_group", devpath) < 0 || + virAsprintfQuiet(&iommugrouppath, "%s/iommu_groups/%d", + fakesysfsdir, dev->iommu) < 0) + ABORT("@deviommupath overflow"); + + if (symlink(iommugrouppath, deviommupath) < 0) + ABORT("Unable to link device to iommu group"); + + VIR_FREE(deviommupath); + if (virAsprintfQuiet(&iommugroupdevs, "%s/devices/%s", + iommugrouppath, dev->id) < 0) + ABORT("@iommugroupdevs overflow"); + + if (symlink(devpath, iommugroupdevs) < 0) + ABORT("Unable to link iommu group devices to current device"); + + VIR_FREE(iommugrouppath); + VIR_FREE(iommugroupdevs); + if (pci_device_autobind(dev) < 0) ABORT("Unable to bind: %s", data->id);
@@ -435,7 +464,95 @@ pci_device_autobind(struct pciDevice *dev) return pci_driver_bind(driver, dev); }
+static void +pci_iommu_new(int num) +{ + char *iommupath; + struct pciIommuGroup *iommuGroup; + + if (VIR_ALLOC_QUIET(iommuGroup) < 0) + ABORT_OOM(); + + iommuGroup->iommu = num; + iommuGroup->nDevicesBoundToVFIO = 0; /* No device bound to vfio by default */ + + if (virAsprintfQuiet(&iommupath, "%s/iommu_groups/%d/devices", fakesysfsdir, num) < 0) + ABORT_OOM(); + + if (virFileMakePath(iommupath) < 0) + ABORT("Unable to create: %s", iommupath); + + if (VIR_APPEND_ELEMENT_QUIET(pciIommuGroups, npciIommuGroups, iommuGroup) < 0) + ABORT_OOM(); +} + +static int +pci_vfio_release_iommu(struct pciDevice *device) +{ + char *vfiopath = NULL; + int ret = -1; + size_t i = 0; + + for (i = 0; i < npciIommuGroups; i++) { + if (pciIommuGroups[i]->iommu == device->iommu) + break; + } + + if (i != npciIommuGroups) { + if (pciIommuGroups[i]->nDevicesBoundToVFIO == 0) { + ret = 0; + goto cleanup; + } + pciIommuGroups[i]->nDevicesBoundToVFIO--; + if (!pciIommuGroups[i]->nDevicesBoundToVFIO) { + if (virAsprintfQuiet(&vfiopath, "%s/dev/vfio/%d", + fakesysfsdir, device->iommu) < 0) { + errno = ENOMEM; + goto cleanup; + } + if (unlink(vfiopath) < 0) + goto cleanup; + } + } + + ret = 0; + cleanup: + VIR_FREE(vfiopath); + return ret; +} + +static int +pci_vfio_lock_iommu(struct pciDevice *device) +{ + char *vfiopath = NULL; + int ret = -1; + size_t i = 0; + int fd = -1; + + for (i = 0; i < npciIommuGroups; i++) { + if (pciIommuGroups[i]->iommu == device->iommu) + break; + }
+ if (i != npciIommuGroups) { + if (!pciIommuGroups[i]->nDevicesBoundToVFIO) { + if (virAsprintfQuiet(&vfiopath, "%s/dev/vfio/%d", + fakesysfsdir, device->iommu) < 0) { + errno = ENOMEM; + goto cleanup; + } + if ((fd = realopen(vfiopath, O_CREAT)) < 0) + goto cleanup; + } + pciIommuGroups[i]->nDevicesBoundToVFIO++; + } + + ret = 0; + cleanup: + realclose(fd); + VIR_FREE(vfiopath); + return ret; +} /* * PCI Driver functions */ @@ -556,6 +673,10 @@ pci_driver_bind(struct pciDriver *driver, if (symlink(devpath, driverpath) < 0) goto cleanup;
+ if (STREQ(driver->name, "vfio-pci")) + if (pci_vfio_lock_iommu(dev) < 0) + goto cleanup; + dev->driver = driver; ret = 0; cleanup: @@ -590,6 +711,10 @@ pci_driver_unbind(struct pciDriver *driver, unlink(driverpath) < 0) goto cleanup;
+ if (STREQ(driver->name, "vfio-pci")) + if (pci_vfio_release_iommu(dev) < 0) + goto cleanup; + dev->driver = NULL; ret = 0; cleanup: @@ -801,6 +926,8 @@ init_syms(void) static void init_env(void) { + char *devvfio; + if (fakesysfsdir) return;
@@ -809,12 +936,33 @@ init_env(void)
make_file(fakesysfsdir, "drivers_probe", NULL, -1);
+ if (virAsprintfQuiet(&devvfio, "%s/dev/vfio", fakesysfsdir) < 0) + ABORT_OOM(); + + if (virFileMakePath(devvfio) < 0) + ABORT("Unable to create: %s", devvfio); + VIR_FREE(devvfio); + + pci_iommu_new(1); + pci_iommu_new(2); + pci_iommu_new(3); + pci_iommu_new(4); + pci_iommu_new(5); + pci_iommu_new(6); + pci_iommu_new(7); + pci_iommu_new(8); + pci_iommu_new(9); + pci_iommu_new(10); + pci_iommu_new(11); + + # define MAKE_PCI_DRIVER(name, ...) \ pci_driver_new(name, 0, __VA_ARGS__, -1, -1)
- MAKE_PCI_DRIVER("iwlwifi", 0x8086, 0x0044); - MAKE_PCI_DRIVER("i915", 0x8086, 0x0046, 0x8086, 0x0047); + MAKE_PCI_DRIVER("i915", 0x8086, 0x0046, 0x8086, 0x0047, 0x8086, 0x0048, 0x1033, 0x0035, 0x1033, 0x00e0); + MAKE_PCI_DRIVER("iwlwifi", 0x8086, 0x0044, 0x0086, 0x105e, 0x0086, 0x105d); MAKE_PCI_DRIVER("pci-stub", -1, -1); + MAKE_PCI_DRIVER("vfio-pci", -1, -1); pci_driver_new("pciback", PCI_ACTION_BIND, -1, -1);
# define MAKE_PCI_DEVICE(Id, Vendor, Device, ...) \ @@ -824,20 +972,20 @@ init_env(void) pci_device_new_from_stub(&dev); \ } while (0)
- MAKE_PCI_DEVICE("0000:00:00.0", 0x8086, 0x0044); - MAKE_PCI_DEVICE("0000:00:01.0", 0x8086, 0x0044); - MAKE_PCI_DEVICE("0000:00:02.0", 0x8086, 0x0046); - MAKE_PCI_DEVICE("0000:00:03.0", 0x8086, 0x0048); - MAKE_PCI_DEVICE("0001:00:00.0", 0x1014, 0x03b9, .class = 0x060400); - MAKE_PCI_DEVICE("0001:01:00.0", 0x8086, 0x105e); - MAKE_PCI_DEVICE("0001:01:00.1", 0x8086, 0x105e); - MAKE_PCI_DEVICE("0005:80:00.0", 0x10b5, 0x8112, .class = 0x060400); - MAKE_PCI_DEVICE("0005:90:01.0", 0x1033, 0x0035); - MAKE_PCI_DEVICE("0005:90:01.1", 0x1033, 0x0035); - MAKE_PCI_DEVICE("0005:90:01.2", 0x1033, 0x00e0); - MAKE_PCI_DEVICE("0000:0a:01.0", 0x8086, 0x0047); - MAKE_PCI_DEVICE("0000:0a:02.0", 0x8286, 0x0048); - MAKE_PCI_DEVICE("0000:0a:03.0", 0x8386, 0x0048); + MAKE_PCI_DEVICE("0000:00:00.0", 0x8086, 0x0044, .iommu = 1); + MAKE_PCI_DEVICE("0000:00:01.0", 0x8086, 0x0044, .iommu = 2); + MAKE_PCI_DEVICE("0000:00:02.0", 0x8086, 0x0046, .iommu = 3); + MAKE_PCI_DEVICE("0000:00:03.0", 0x8086, 0x0048, .iommu = 4); + MAKE_PCI_DEVICE("0001:00:00.0", 0x1014, 0x03b9, .iommu = 5, .class = 0x060400); + MAKE_PCI_DEVICE("0001:01:00.0", 0x8086, 0x105e, .iommu = 6); + MAKE_PCI_DEVICE("0001:01:00.1", 0x8086, 0x105d, .iommu = 6); + MAKE_PCI_DEVICE("0005:80:00.0", 0x10b5, 0x8112, .iommu = 7, .class = 0x060400); + MAKE_PCI_DEVICE("0005:90:01.0", 0x1033, 0x0035, .iommu = 8); + MAKE_PCI_DEVICE("0005:90:01.1", 0x1033, 0x0035, .iommu = 8); + MAKE_PCI_DEVICE("0005:90:01.2", 0x1033, 0x00e0, .iommu = 8); + MAKE_PCI_DEVICE("0000:0a:01.0", 0x8086, 0x0047, .iommu = 9); + MAKE_PCI_DEVICE("0000:0a:02.0", 0x8286, 0x0048, .iommu = 10); + MAKE_PCI_DEVICE("0000:0a:03.0", 0x8386, 0x0048, .iommu = 11); }
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 14.11.2015 09:38, Shivaprasad G Bhat wrote:
The iommu group info can be used to check if the devices are bound/unbound from vfio at the group level granualrity. Add the info to mock as to help add test cases later.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- tests/virpcimock.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 164 insertions(+), 16 deletions(-)
diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 0724a36..837976a 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -127,9 +127,15 @@ struct pciDevice { int vendor; int device; int class; + int iommu; struct pciDriver *driver; /* Driver attached. NULL if attached to no driver */ };
+struct pciIommuGroup { + int iommu; + size_t nDevicesBoundToVFIO; /* Indicates the devices in the group */ +}; + struct fdCallback { int fd; char *path; @@ -141,6 +147,9 @@ size_t nPCIDevices = 0; struct pciDriver **pciDrivers = NULL; size_t nPCIDrivers = 0;
+struct pciIommuGroup **pciIommuGroups = NULL; +size_t npciIommuGroups = 0; + struct fdCallback *callbacks = NULL; size_t nCallbacks = 0;
@@ -325,6 +334,7 @@ pci_device_new_from_stub(const struct pciDevice *data) char *configSrc; char tmp[32]; struct stat sb; + char *iommugrouppath, *deviommupath, *iommugroupdevs = NULL;
if (VIR_STRDUP_QUIET(id, data->id) < 0) ABORT_OOM(); @@ -387,6 +397,25 @@ pci_device_new_from_stub(const struct pciDevice *data) ABORT("@tmp overflow"); make_file(devpath, "class", tmp, -1);
+ if (virAsprintfQuiet(&deviommupath, "%s/iommu_group", devpath) < 0 || + virAsprintfQuiet(&iommugrouppath, "%s/iommu_groups/%d", + fakesysfsdir, dev->iommu) < 0) + ABORT("@deviommupath overflow");
Technically this is not overflow rather than OOM. I guess you've just copied a pattern just above these lines (not to be seen in this patch though), where we really are snprintf()-ing into 32 bytes wide string. Here we have all the memory, so this should be ABORT_OOM();
+ + if (symlink(iommugrouppath, deviommupath) < 0) + ABORT("Unable to link device to iommu group"); + + VIR_FREE(deviommupath); + if (virAsprintfQuiet(&iommugroupdevs, "%s/devices/%s", + iommugrouppath, dev->id) < 0) + ABORT("@iommugroupdevs overflow");
Same here.
+ + if (symlink(devpath, iommugroupdevs) < 0) + ABORT("Unable to link iommu group devices to current device"); + + VIR_FREE(iommugrouppath); + VIR_FREE(iommugroupdevs); + if (pci_device_autobind(dev) < 0) ABORT("Unable to bind: %s", data->id);
@@ -435,7 +464,95 @@ pci_device_autobind(struct pciDevice *dev) return pci_driver_bind(driver, dev); }
+static void +pci_iommu_new(int num) +{ + char *iommupath; + struct pciIommuGroup *iommuGroup; + + if (VIR_ALLOC_QUIET(iommuGroup) < 0) + ABORT_OOM(); + + iommuGroup->iommu = num; + iommuGroup->nDevicesBoundToVFIO = 0; /* No device bound to vfio by default */ + + if (virAsprintfQuiet(&iommupath, "%s/iommu_groups/%d/devices", fakesysfsdir, num) < 0) + ABORT_OOM(); + + if (virFileMakePath(iommupath) < 0) + ABORT("Unable to create: %s", iommupath); + + if (VIR_APPEND_ELEMENT_QUIET(pciIommuGroups, npciIommuGroups, iommuGroup) < 0) + ABORT_OOM();
@iommupath is no longer needed and leaked.
+} + +static int +pci_vfio_release_iommu(struct pciDevice *device) +{ + char *vfiopath = NULL; + int ret = -1; + size_t i = 0; + + for (i = 0; i < npciIommuGroups; i++) { + if (pciIommuGroups[i]->iommu == device->iommu) + break; + } + + if (i != npciIommuGroups) { + if (pciIommuGroups[i]->nDevicesBoundToVFIO == 0) { + ret = 0; + goto cleanup; + }
This is somewhat confusing to me. This can happen due to a bug in our code - in that case I'd expect an error to be thrown.
+ pciIommuGroups[i]->nDevicesBoundToVFIO--; + if (!pciIommuGroups[i]->nDevicesBoundToVFIO) { + if (virAsprintfQuiet(&vfiopath, "%s/dev/vfio/%d", + fakesysfsdir, device->iommu) < 0) { + errno = ENOMEM; + goto cleanup; + } + if (unlink(vfiopath) < 0) + goto cleanup; + } + } + + ret = 0; + cleanup: + VIR_FREE(vfiopath); + return ret; +} + +static int +pci_vfio_lock_iommu(struct pciDevice *device) +{ + char *vfiopath = NULL; + int ret = -1; + size_t i = 0; + int fd = -1; + + for (i = 0; i < npciIommuGroups; i++) { + if (pciIommuGroups[i]->iommu == device->iommu) + break; + }
+ if (i != npciIommuGroups) { + if (!pciIommuGroups[i]->nDevicesBoundToVFIO) { + if (virAsprintfQuiet(&vfiopath, "%s/dev/vfio/%d", + fakesysfsdir, device->iommu) < 0) { + errno = ENOMEM; + goto cleanup; + } + if ((fd = realopen(vfiopath, O_CREAT)) < 0) + goto cleanup; + } + pciIommuGroups[i]->nDevicesBoundToVFIO++; + } + + ret = 0; + cleanup: + realclose(fd); + VIR_FREE(vfiopath); + return ret; +}
Missing empty line.
/* * PCI Driver functions */ @@ -556,6 +673,10 @@ pci_driver_bind(struct pciDriver *driver, if (symlink(devpath, driverpath) < 0) goto cleanup;
+ if (STREQ(driver->name, "vfio-pci")) + if (pci_vfio_lock_iommu(dev) < 0) + goto cleanup; + dev->driver = driver; ret = 0; cleanup: @@ -590,6 +711,10 @@ pci_driver_unbind(struct pciDriver *driver, unlink(driverpath) < 0) goto cleanup;
+ if (STREQ(driver->name, "vfio-pci")) + if (pci_vfio_release_iommu(dev) < 0) + goto cleanup; + dev->driver = NULL; ret = 0; cleanup: @@ -801,6 +926,8 @@ init_syms(void) static void init_env(void) { + char *devvfio; + if (fakesysfsdir) return;
@@ -809,12 +936,33 @@ init_env(void)
make_file(fakesysfsdir, "drivers_probe", NULL, -1);
+ if (virAsprintfQuiet(&devvfio, "%s/dev/vfio", fakesysfsdir) < 0) + ABORT_OOM(); + + if (virFileMakePath(devvfio) < 0) + ABORT("Unable to create: %s", devvfio); + VIR_FREE(devvfio); + + pci_iommu_new(1); + pci_iommu_new(2); + pci_iommu_new(3); + pci_iommu_new(4); + pci_iommu_new(5); + pci_iommu_new(6); + pci_iommu_new(7); + pci_iommu_new(8); + pci_iommu_new(9); + pci_iommu_new(10); + pci_iommu_new(11); + + # define MAKE_PCI_DRIVER(name, ...) \ pci_driver_new(name, 0, __VA_ARGS__, -1, -1)
- MAKE_PCI_DRIVER("iwlwifi", 0x8086, 0x0044); - MAKE_PCI_DRIVER("i915", 0x8086, 0x0046, 0x8086, 0x0047); + MAKE_PCI_DRIVER("i915", 0x8086, 0x0046, 0x8086, 0x0047, 0x8086, 0x0048, 0x1033, 0x0035, 0x1033, 0x00e0); + MAKE_PCI_DRIVER("iwlwifi", 0x8086, 0x0044, 0x0086, 0x105e, 0x0086, 0x105d);
The whole idea was that we have some PCI devices not attached to any driver. I'd like to keep a device or two that way.
MAKE_PCI_DRIVER("pci-stub", -1, -1); + MAKE_PCI_DRIVER("vfio-pci", -1, -1); pci_driver_new("pciback", PCI_ACTION_BIND, -1, -1);
# define MAKE_PCI_DEVICE(Id, Vendor, Device, ...) \ @@ -824,20 +972,20 @@ init_env(void) pci_device_new_from_stub(&dev); \ } while (0)
- MAKE_PCI_DEVICE("0000:00:00.0", 0x8086, 0x0044); - MAKE_PCI_DEVICE("0000:00:01.0", 0x8086, 0x0044); - MAKE_PCI_DEVICE("0000:00:02.0", 0x8086, 0x0046); - MAKE_PCI_DEVICE("0000:00:03.0", 0x8086, 0x0048); - MAKE_PCI_DEVICE("0001:00:00.0", 0x1014, 0x03b9, .class = 0x060400); - MAKE_PCI_DEVICE("0001:01:00.0", 0x8086, 0x105e); - MAKE_PCI_DEVICE("0001:01:00.1", 0x8086, 0x105e); - MAKE_PCI_DEVICE("0005:80:00.0", 0x10b5, 0x8112, .class = 0x060400); - MAKE_PCI_DEVICE("0005:90:01.0", 0x1033, 0x0035); - MAKE_PCI_DEVICE("0005:90:01.1", 0x1033, 0x0035); - MAKE_PCI_DEVICE("0005:90:01.2", 0x1033, 0x00e0); - MAKE_PCI_DEVICE("0000:0a:01.0", 0x8086, 0x0047); - MAKE_PCI_DEVICE("0000:0a:02.0", 0x8286, 0x0048); - MAKE_PCI_DEVICE("0000:0a:03.0", 0x8386, 0x0048); + MAKE_PCI_DEVICE("0000:00:00.0", 0x8086, 0x0044, .iommu = 1); + MAKE_PCI_DEVICE("0000:00:01.0", 0x8086, 0x0044, .iommu = 2); + MAKE_PCI_DEVICE("0000:00:02.0", 0x8086, 0x0046, .iommu = 3); + MAKE_PCI_DEVICE("0000:00:03.0", 0x8086, 0x0048, .iommu = 4); + MAKE_PCI_DEVICE("0001:00:00.0", 0x1014, 0x03b9, .iommu = 5, .class = 0x060400); + MAKE_PCI_DEVICE("0001:01:00.0", 0x8086, 0x105e, .iommu = 6); + MAKE_PCI_DEVICE("0001:01:00.1", 0x8086, 0x105d, .iommu = 6);
Okay, you want two slightly different WiFi cards. Thank God for git show --word-diff.
+ MAKE_PCI_DEVICE("0005:80:00.0", 0x10b5, 0x8112, .iommu = 7, .class = 0x060400); + MAKE_PCI_DEVICE("0005:90:01.0", 0x1033, 0x0035, .iommu = 8); + MAKE_PCI_DEVICE("0005:90:01.1", 0x1033, 0x0035, .iommu = 8); + MAKE_PCI_DEVICE("0005:90:01.2", 0x1033, 0x00e0, .iommu = 8); + MAKE_PCI_DEVICE("0000:0a:01.0", 0x8086, 0x0047, .iommu = 9); + MAKE_PCI_DEVICE("0000:0a:02.0", 0x8286, 0x0048, .iommu = 10); + MAKE_PCI_DEVICE("0000:0a:03.0", 0x8386, 0x0048, .iommu = 11); }
This is the diff that I've came up with so far: diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 837976a..5ef5eac 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -400,7 +400,7 @@ pci_device_new_from_stub(const struct pciDevice *data) if (virAsprintfQuiet(&deviommupath, "%s/iommu_group", devpath) < 0 || virAsprintfQuiet(&iommugrouppath, "%s/iommu_groups/%d", fakesysfsdir, dev->iommu) < 0) - ABORT("@deviommupath overflow"); + ABORT_OOM(); if (symlink(iommugrouppath, deviommupath) < 0) ABORT("Unable to link device to iommu group"); @@ -408,7 +408,7 @@ pci_device_new_from_stub(const struct pciDevice *data) VIR_FREE(deviommupath); if (virAsprintfQuiet(&iommugroupdevs, "%s/devices/%s", iommugrouppath, dev->id) < 0) - ABORT("@iommugroupdevs overflow"); + ABORT_OOM(); if (symlink(devpath, iommugroupdevs) < 0) ABORT("Unable to link iommu group devices to current device"); @@ -484,6 +484,8 @@ pci_iommu_new(int num) if (VIR_APPEND_ELEMENT_QUIET(pciIommuGroups, npciIommuGroups, iommuGroup) < 0) ABORT_OOM(); + + VIR_FREE(iommupath); } static int @@ -499,10 +501,9 @@ pci_vfio_release_iommu(struct pciDevice *device) } if (i != npciIommuGroups) { - if (pciIommuGroups[i]->nDevicesBoundToVFIO == 0) { - ret = 0; - goto cleanup; - } + if (pciIommuGroups[i]->nDevicesBoundToVFIO == 0) + ABORT("nDevicesBoundToVFIO has unexpected value"); + pciIommuGroups[i]->nDevicesBoundToVFIO--; if (!pciIommuGroups[i]->nDevicesBoundToVFIO) { if (virAsprintfQuiet(&vfiopath, "%s/dev/vfio/%d", @@ -553,6 +554,7 @@ pci_vfio_lock_iommu(struct pciDevice *device) VIR_FREE(vfiopath); return ret; } + /* * PCI Driver functions */ Plus what's needed to have at least one or two device not attached to any driver. I'm okay if you send it as reply that will be squashed in before pushing. Otherwise looking good. weak ACK. Michal

Thanks for the comments Michal.. I'll fix them all in my next version. Regards, Shivaprasad On Mon, Nov 23, 2015 at 10:34 PM, Michal Privoznik <mprivozn@redhat.com> wrote:
On 14.11.2015 09:38, Shivaprasad G Bhat wrote:
The iommu group info can be used to check if the devices are bound/unbound from vfio at the group level granualrity. Add the info to mock as to help add test cases later.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- tests/virpcimock.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 164 insertions(+), 16 deletions(-)
diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 0724a36..837976a 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -127,9 +127,15 @@ struct pciDevice { int vendor; int device; int class; + int iommu; struct pciDriver *driver; /* Driver attached. NULL if attached to no driver */ };
+struct pciIommuGroup { + int iommu; + size_t nDevicesBoundToVFIO; /* Indicates the devices in the group */ +}; + struct fdCallback { int fd; char *path; @@ -141,6 +147,9 @@ size_t nPCIDevices = 0; struct pciDriver **pciDrivers = NULL; size_t nPCIDrivers = 0;
+struct pciIommuGroup **pciIommuGroups = NULL; +size_t npciIommuGroups = 0; + struct fdCallback *callbacks = NULL; size_t nCallbacks = 0;
@@ -325,6 +334,7 @@ pci_device_new_from_stub(const struct pciDevice *data) char *configSrc; char tmp[32]; struct stat sb; + char *iommugrouppath, *deviommupath, *iommugroupdevs = NULL;
if (VIR_STRDUP_QUIET(id, data->id) < 0) ABORT_OOM(); @@ -387,6 +397,25 @@ pci_device_new_from_stub(const struct pciDevice *data) ABORT("@tmp overflow"); make_file(devpath, "class", tmp, -1);
+ if (virAsprintfQuiet(&deviommupath, "%s/iommu_group", devpath) < 0 || + virAsprintfQuiet(&iommugrouppath, "%s/iommu_groups/%d", + fakesysfsdir, dev->iommu) < 0) + ABORT("@deviommupath overflow");
Technically this is not overflow rather than OOM. I guess you've just copied a pattern just above these lines (not to be seen in this patch though), where we really are snprintf()-ing into 32 bytes wide string. Here we have all the memory, so this should be ABORT_OOM();
+ + if (symlink(iommugrouppath, deviommupath) < 0) + ABORT("Unable to link device to iommu group"); + + VIR_FREE(deviommupath); + if (virAsprintfQuiet(&iommugroupdevs, "%s/devices/%s", + iommugrouppath, dev->id) < 0) + ABORT("@iommugroupdevs overflow");
Same here.
+ + if (symlink(devpath, iommugroupdevs) < 0) + ABORT("Unable to link iommu group devices to current device"); + + VIR_FREE(iommugrouppath); + VIR_FREE(iommugroupdevs); + if (pci_device_autobind(dev) < 0) ABORT("Unable to bind: %s", data->id);
@@ -435,7 +464,95 @@ pci_device_autobind(struct pciDevice *dev) return pci_driver_bind(driver, dev); }
+static void +pci_iommu_new(int num) +{ + char *iommupath; + struct pciIommuGroup *iommuGroup; + + if (VIR_ALLOC_QUIET(iommuGroup) < 0) + ABORT_OOM(); + + iommuGroup->iommu = num; + iommuGroup->nDevicesBoundToVFIO = 0; /* No device bound to vfio by default */ + + if (virAsprintfQuiet(&iommupath, "%s/iommu_groups/%d/devices", fakesysfsdir, num) < 0) + ABORT_OOM(); + + if (virFileMakePath(iommupath) < 0) + ABORT("Unable to create: %s", iommupath); + + if (VIR_APPEND_ELEMENT_QUIET(pciIommuGroups, npciIommuGroups, iommuGroup) < 0) + ABORT_OOM();
@iommupath is no longer needed and leaked.
+} + +static int +pci_vfio_release_iommu(struct pciDevice *device) +{ + char *vfiopath = NULL; + int ret = -1; + size_t i = 0; + + for (i = 0; i < npciIommuGroups; i++) { + if (pciIommuGroups[i]->iommu == device->iommu) + break; + } + + if (i != npciIommuGroups) { + if (pciIommuGroups[i]->nDevicesBoundToVFIO == 0) { + ret = 0; + goto cleanup; + }
This is somewhat confusing to me. This can happen due to a bug in our code - in that case I'd expect an error to be thrown.
+ pciIommuGroups[i]->nDevicesBoundToVFIO--; + if (!pciIommuGroups[i]->nDevicesBoundToVFIO) { + if (virAsprintfQuiet(&vfiopath, "%s/dev/vfio/%d", + fakesysfsdir, device->iommu) < 0) { + errno = ENOMEM; + goto cleanup; + } + if (unlink(vfiopath) < 0) + goto cleanup; + } + } + + ret = 0; + cleanup: + VIR_FREE(vfiopath); + return ret; +} + +static int +pci_vfio_lock_iommu(struct pciDevice *device) +{ + char *vfiopath = NULL; + int ret = -1; + size_t i = 0; + int fd = -1; + + for (i = 0; i < npciIommuGroups; i++) { + if (pciIommuGroups[i]->iommu == device->iommu) + break; + }
+ if (i != npciIommuGroups) { + if (!pciIommuGroups[i]->nDevicesBoundToVFIO) { + if (virAsprintfQuiet(&vfiopath, "%s/dev/vfio/%d", + fakesysfsdir, device->iommu) < 0) { + errno = ENOMEM; + goto cleanup; + } + if ((fd = realopen(vfiopath, O_CREAT)) < 0) + goto cleanup; + } + pciIommuGroups[i]->nDevicesBoundToVFIO++; + } + + ret = 0; + cleanup: + realclose(fd); + VIR_FREE(vfiopath); + return ret; +}
Missing empty line.
/* * PCI Driver functions */ @@ -556,6 +673,10 @@ pci_driver_bind(struct pciDriver *driver, if (symlink(devpath, driverpath) < 0) goto cleanup;
+ if (STREQ(driver->name, "vfio-pci")) + if (pci_vfio_lock_iommu(dev) < 0) + goto cleanup; + dev->driver = driver; ret = 0; cleanup: @@ -590,6 +711,10 @@ pci_driver_unbind(struct pciDriver *driver, unlink(driverpath) < 0) goto cleanup;
+ if (STREQ(driver->name, "vfio-pci")) + if (pci_vfio_release_iommu(dev) < 0) + goto cleanup; + dev->driver = NULL; ret = 0; cleanup: @@ -801,6 +926,8 @@ init_syms(void) static void init_env(void) { + char *devvfio; + if (fakesysfsdir) return;
@@ -809,12 +936,33 @@ init_env(void)
make_file(fakesysfsdir, "drivers_probe", NULL, -1);
+ if (virAsprintfQuiet(&devvfio, "%s/dev/vfio", fakesysfsdir) < 0) + ABORT_OOM(); + + if (virFileMakePath(devvfio) < 0) + ABORT("Unable to create: %s", devvfio); + VIR_FREE(devvfio); + + pci_iommu_new(1); + pci_iommu_new(2); + pci_iommu_new(3); + pci_iommu_new(4); + pci_iommu_new(5); + pci_iommu_new(6); + pci_iommu_new(7); + pci_iommu_new(8); + pci_iommu_new(9); + pci_iommu_new(10); + pci_iommu_new(11); + + # define MAKE_PCI_DRIVER(name, ...) \ pci_driver_new(name, 0, __VA_ARGS__, -1, -1)
- MAKE_PCI_DRIVER("iwlwifi", 0x8086, 0x0044); - MAKE_PCI_DRIVER("i915", 0x8086, 0x0046, 0x8086, 0x0047); + MAKE_PCI_DRIVER("i915", 0x8086, 0x0046, 0x8086, 0x0047, 0x8086, 0x0048, 0x1033, 0x0035, 0x1033, 0x00e0); + MAKE_PCI_DRIVER("iwlwifi", 0x8086, 0x0044, 0x0086, 0x105e, 0x0086, 0x105d);
The whole idea was that we have some PCI devices not attached to any driver. I'd like to keep a device or two that way.
MAKE_PCI_DRIVER("pci-stub", -1, -1); + MAKE_PCI_DRIVER("vfio-pci", -1, -1); pci_driver_new("pciback", PCI_ACTION_BIND, -1, -1);
# define MAKE_PCI_DEVICE(Id, Vendor, Device, ...) \ @@ -824,20 +972,20 @@ init_env(void) pci_device_new_from_stub(&dev); \ } while (0)
- MAKE_PCI_DEVICE("0000:00:00.0", 0x8086, 0x0044); - MAKE_PCI_DEVICE("0000:00:01.0", 0x8086, 0x0044); - MAKE_PCI_DEVICE("0000:00:02.0", 0x8086, 0x0046); - MAKE_PCI_DEVICE("0000:00:03.0", 0x8086, 0x0048); - MAKE_PCI_DEVICE("0001:00:00.0", 0x1014, 0x03b9, .class = 0x060400); - MAKE_PCI_DEVICE("0001:01:00.0", 0x8086, 0x105e); - MAKE_PCI_DEVICE("0001:01:00.1", 0x8086, 0x105e); - MAKE_PCI_DEVICE("0005:80:00.0", 0x10b5, 0x8112, .class = 0x060400); - MAKE_PCI_DEVICE("0005:90:01.0", 0x1033, 0x0035); - MAKE_PCI_DEVICE("0005:90:01.1", 0x1033, 0x0035); - MAKE_PCI_DEVICE("0005:90:01.2", 0x1033, 0x00e0); - MAKE_PCI_DEVICE("0000:0a:01.0", 0x8086, 0x0047); - MAKE_PCI_DEVICE("0000:0a:02.0", 0x8286, 0x0048); - MAKE_PCI_DEVICE("0000:0a:03.0", 0x8386, 0x0048); + MAKE_PCI_DEVICE("0000:00:00.0", 0x8086, 0x0044, .iommu = 1); + MAKE_PCI_DEVICE("0000:00:01.0", 0x8086, 0x0044, .iommu = 2); + MAKE_PCI_DEVICE("0000:00:02.0", 0x8086, 0x0046, .iommu = 3); + MAKE_PCI_DEVICE("0000:00:03.0", 0x8086, 0x0048, .iommu = 4); + MAKE_PCI_DEVICE("0001:00:00.0", 0x1014, 0x03b9, .iommu = 5, .class = 0x060400); + MAKE_PCI_DEVICE("0001:01:00.0", 0x8086, 0x105e, .iommu = 6); + MAKE_PCI_DEVICE("0001:01:00.1", 0x8086, 0x105d, .iommu = 6);
Okay, you want two slightly different WiFi cards. Thank God for git show --word-diff.
+ MAKE_PCI_DEVICE("0005:80:00.0", 0x10b5, 0x8112, .iommu = 7, .class = 0x060400); + MAKE_PCI_DEVICE("0005:90:01.0", 0x1033, 0x0035, .iommu = 8); + MAKE_PCI_DEVICE("0005:90:01.1", 0x1033, 0x0035, .iommu = 8); + MAKE_PCI_DEVICE("0005:90:01.2", 0x1033, 0x00e0, .iommu = 8); + MAKE_PCI_DEVICE("0000:0a:01.0", 0x8086, 0x0047, .iommu = 9); + MAKE_PCI_DEVICE("0000:0a:02.0", 0x8286, 0x0048, .iommu = 10); + MAKE_PCI_DEVICE("0000:0a:03.0", 0x8386, 0x0048, .iommu = 11); }
This is the diff that I've came up with so far:
diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 837976a..5ef5eac 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -400,7 +400,7 @@ pci_device_new_from_stub(const struct pciDevice *data) if (virAsprintfQuiet(&deviommupath, "%s/iommu_group", devpath) < 0 || virAsprintfQuiet(&iommugrouppath, "%s/iommu_groups/%d", fakesysfsdir, dev->iommu) < 0) - ABORT("@deviommupath overflow"); + ABORT_OOM();
if (symlink(iommugrouppath, deviommupath) < 0) ABORT("Unable to link device to iommu group"); @@ -408,7 +408,7 @@ pci_device_new_from_stub(const struct pciDevice *data) VIR_FREE(deviommupath); if (virAsprintfQuiet(&iommugroupdevs, "%s/devices/%s", iommugrouppath, dev->id) < 0) - ABORT("@iommugroupdevs overflow"); + ABORT_OOM();
if (symlink(devpath, iommugroupdevs) < 0) ABORT("Unable to link iommu group devices to current device"); @@ -484,6 +484,8 @@ pci_iommu_new(int num)
if (VIR_APPEND_ELEMENT_QUIET(pciIommuGroups, npciIommuGroups, iommuGroup) < 0) ABORT_OOM(); + + VIR_FREE(iommupath); }
static int @@ -499,10 +501,9 @@ pci_vfio_release_iommu(struct pciDevice *device) }
if (i != npciIommuGroups) { - if (pciIommuGroups[i]->nDevicesBoundToVFIO == 0) { - ret = 0; - goto cleanup; - } + if (pciIommuGroups[i]->nDevicesBoundToVFIO == 0) + ABORT("nDevicesBoundToVFIO has unexpected value"); + pciIommuGroups[i]->nDevicesBoundToVFIO--; if (!pciIommuGroups[i]->nDevicesBoundToVFIO) { if (virAsprintfQuiet(&vfiopath, "%s/dev/vfio/%d", @@ -553,6 +554,7 @@ pci_vfio_lock_iommu(struct pciDevice *device) VIR_FREE(vfiopath); return ret; } + /* * PCI Driver functions */
Plus what's needed to have at least one or two device not attached to any driver. I'm okay if you send it as reply that will be squashed in before pushing.
Otherwise looking good. weak ACK.
Michal
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

The host will crash if a device is bound to host driver when the device belonging to same iommu group is in use by any of the guests. So, do the host driver probe only after all the devices in the iommu group have unbound from the vfio. The patch also adds the test case. The patch fixes https://bugzilla.redhat.com/show_bug.cgi?id=1272300 Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 156 +++++++++++++++++++++++++++++++++++++++++++++++----- tests/virpcitest.c | 115 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 257 insertions(+), 14 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index a8a22d1..5462cd2 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1129,6 +1129,31 @@ virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, } static int +virPCIDeviceBoundToVFIODriver(virPCIDeviceAddressPtr devAddr, void *opaque ATTRIBUTE_UNUSED) +{ + int ret = 0; + virPCIDevicePtr pci = NULL; + char *drvpath = NULL; + char *driver = NULL; + + if (!(pci = virPCIDeviceNew(devAddr->domain, devAddr->bus, + devAddr->slot, devAddr->function))) + goto cleanup; + + if (virPCIDeviceGetDriverPathAndName(pci, &drvpath, &driver) < 0) + goto cleanup; + + if (STREQ_NULLABLE(driver, "vfio-pci")) + ret = -1; + + cleanup: + VIR_FREE(drvpath); + VIR_FREE(driver); + virPCIDeviceFree(pci); + return ret; +} + +static int virPCIDeviceUnbindFromStub(virPCIDevicePtr dev, virPCIDeviceListPtr activeDevs ATTRIBUTE_UNUSED, virPCIDeviceListPtr inactiveDevs) @@ -1145,21 +1170,91 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev, goto cleanup; if (!driver) { - /* The device is not bound to any driver and we are almost done. */ + /* This device was probably unbound before and libvirt restared. + * Add to the inactive list if not already there so that we + * reprobe it if needed. + */ + if (inactiveDevs && !virPCIDeviceListFind(inactiveDevs, dev) && + virPCIDeviceListAddCopy(inactiveDevs, dev) < 0) goto reprobe; } - if (!dev->unbind_from_stub) + if (!dev->unbind_from_stub) { + /* Add to the inactive list if not already there so that we + * reprobe it if needed. + */ + if (inactiveDevs && !virPCIDeviceListFind(inactiveDevs, dev) && + virPCIDeviceListAddCopy(inactiveDevs, dev) < 0) goto remove_slot; + } /* If the device isn't bound to a known stub, skip the unbind. */ if (!virPCIIsKnownStub(driver)) goto remove_slot; - if (virPCIDeviceUnbind(dev, dev->reprobe) < 0) - goto cleanup; - dev->unbind_from_stub = false; + if (STREQ_NULLABLE(driver, "vfio-pci")) { + size_t i = 0; + bool inInactiveList = false; + virPCIDevicePtr temp; + while (activeDevs && (i < virPCIDeviceListCount(activeDevs))) { + virPCIDevicePtr pcidev = virPCIDeviceListGet(activeDevs, i); + if (dev->iommuGroup == pcidev->iommuGroup) { + /* If Part of activelist, deal with current device later */ + if (inactiveDevs && !virPCIDeviceListFind(inactiveDevs, dev) && + virPCIDeviceListAddCopy(inactiveDevs, dev) < 0) + goto cleanup; + result = 0; + goto revisit; + } + i++; + } + /* No guests are using any of the devices in the iommu. + * This is the first device after guest shutdown/unplug + * of all devices. So, rest of the devices are in inactiveDevs. + * OR This is the first device after libvirt start and nothing is in + * inactiveDevs. + * If the device is in inactiveList(User can issue reattach multiple times + * for the same device), mark it as already unbound. + */ + if (virPCIDeviceUnbind(dev, dev->reprobe) < 0) + goto cleanup; + dev->unbind_from_stub = false; + if (inactiveDevs && (temp = virPCIDeviceListFind(inactiveDevs, dev))) { + temp->unbind_from_stub = false; + inInactiveList = true; + } + /* Unbind rest of the devices in the same iommu group. + * After the fresh libvirt start, nothing is detached in the step + * as inactiveDevs is empty.*/ + i = 0; + while (inactiveDevs && (i < virPCIDeviceListCount(inactiveDevs))) { + virPCIDevicePtr pcidev = virPCIDeviceListGet(inactiveDevs, i); + if (dev->iommuGroup == pcidev->iommuGroup) { + if (pcidev->unbind_from_stub && + virPCIDeviceUnbind(pcidev, pcidev->reprobe) < 0) { + goto cleanup; + } + pcidev->unbind_from_stub = false; + } + i++; + } + /* Libvirt just restarted and inactive list is empty or yet to get into + * the list. But no devices used actively from same iommu group. + * Basically this could be the first one to unbind from vfio but + * could be the last function to be bound to vfio if this is after + * libvirt restart. + */ + if (!inInactiveList) { + if (inactiveDevs && virPCIDeviceListAddCopy(inactiveDevs, dev) < 0) + goto cleanup; + } + } else { + if (virPCIDeviceUnbind(dev, dev->reprobe) < 0) + goto cleanup; + dev->unbind_from_stub = false; + } + /* The remove_slot means nothing for VFIO. Dont manage it in active/inactiveList */ remove_slot: if (!dev->remove_slot) goto reprobe; @@ -1177,25 +1272,58 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev, dev->remove_slot = false; reprobe: - if (!dev->reprobe) { - result = 0; - goto cleanup; - } - if (virPCIDeviceReprobeHostDriver(dev, driver, drvdir) < 0) - goto cleanup; + /* For VFIO devices if there is a pending reprobe, the new reattach + * request would come with device->stubDriver set to null as the + * device is actually unbound. Dont bind to host driver if the + * iommu group is in use. */ + if (!driver || STREQ_NULLABLE(driver, "vfio-pci")) { + size_t i = 0; + virPCIDeviceAddress devAddr = { dev->domain, dev->bus, + dev->slot, dev->function }; + if (virPCIDeviceAddressIOMMUGroupIterate(&devAddr, virPCIDeviceBoundToVFIODriver, NULL) < 0) { + result = 0; + goto cleanup; + } + /* This device is the last to unbind from vfio. As we explicitly + * add a missing device in the list to inactiveList, we will just + * go through the list. */ + while (inactiveDevs && (i < virPCIDeviceListCount(inactiveDevs))) { + virPCIDevicePtr pcidev = virPCIDeviceListGet(inactiveDevs, i); + if (dev->iommuGroup == pcidev->iommuGroup) { + if (pcidev->reprobe && + virPCIDeviceReprobeHostDriver(pcidev, driver, drvdir) < 0) + goto cleanup; + /* Steal the dev from list inactiveDevs */ + virPCIDeviceListDel(inactiveDevs, pcidev); + continue; + } + i++; + } + /* If the list was null, we failed to add to the list before. + * Reprobe the current device explicitly. */ + if (!inactiveDevs) { + if (dev->reprobe && + virPCIDeviceReprobeHostDriver(dev, driver, drvdir) < 0) + goto cleanup; + } + } else { + if (virPCIDeviceReprobeHostDriver(dev, driver, drvdir) < 0) + goto cleanup; + + if (inactiveDevs) + virPCIDeviceListDel(inactiveDevs, dev); + } result = 0; cleanup: - if ((result == 0) && inactiveDevs) - virPCIDeviceListDel(inactiveDevs, dev); - /* do not do it again */ dev->unbind_from_stub = false; dev->remove_slot = false; dev->reprobe = false; + revisit: VIR_FREE(drvdir); VIR_FREE(path); VIR_FREE(driver); diff --git a/tests/virpcitest.c b/tests/virpcitest.c index 25591f9..070775c 100644 --- a/tests/virpcitest.c +++ b/tests/virpcitest.c @@ -211,6 +211,118 @@ testVirPCIDeviceReattach(const void *opaque ATTRIBUTE_UNUSED) return ret; } +static int +testVirPCIDeviceVFIOReattach(const void *opaque ATTRIBUTE_UNUSED) +{ + int ret = -1; + virPCIDevicePtr dev[] = {NULL, NULL, NULL, NULL, NULL}; + size_t i, nDev = ARRAY_CARDINALITY(dev); + virPCIDeviceListPtr activeDevs = NULL, inactiveDevs = NULL; + int count; + + if (!(activeDevs = virPCIDeviceListNew()) || + !(inactiveDevs = virPCIDeviceListNew())) + goto cleanup; + + /* Mark the device 0001:1:00.0 as active in a guest */ + for (i = 0; i < 2; i++) { + if (!(dev[i] = virPCIDeviceNew(1, 0x01, 0, i))) + goto cleanup; + if (virPCIDeviceSetStubDriver(dev[i], "vfio-pci") < 0) + goto cleanup; + if (virPCIDeviceDetach(dev[i], activeDevs, inactiveDevs) < 0) + goto cleanup; + if (virPCIDeviceListAddCopy(activeDevs, dev[i]) < 0) + goto cleanup; + virPCIDeviceListDel(inactiveDevs, dev[i]); + } + CHECK_LIST_COUNT(activeDevs, 2); + CHECK_LIST_COUNT(inactiveDevs, 0); + + for (i = 0; i < 3; i++) { + if (!(dev[i+2] = virPCIDeviceNew(5, 0x90, 1, i))) + goto cleanup; + /* + if (virPCIDeviceListAdd(inactiveDevs, dev[i]) < 0) { + virPCIDeviceFree(dev[i]); + goto cleanup; + } + */ + if (virPCIDeviceSetStubDriver(dev[i+2], "vfio-pci") < 0) + goto cleanup; + + + if (virPCIDeviceDetach(dev[i+2], activeDevs, inactiveDevs) < 0) + goto cleanup; + } + + CHECK_LIST_COUNT(activeDevs, 2); + CHECK_LIST_COUNT(inactiveDevs, 3); + /* Mark the device 0005:90:01.0 as active in a guest */ + if (virPCIDeviceListAddCopy(activeDevs, dev[2]) < 0) + goto cleanup; + + virPCIDeviceListDel(inactiveDevs, dev[2]); + CHECK_LIST_COUNT(activeDevs, 3); + CHECK_LIST_COUNT(inactiveDevs, 2); + + /* Check that the reattach silently queues when any domain is actively + * using any of the devices in the iommu. */ + + if (virPCIDeviceReattach(dev[3], activeDevs, inactiveDevs) < 0) + goto cleanup; + + if (virPCIDeviceReattach(dev[4], activeDevs, inactiveDevs) < 0) + goto cleanup; + + CHECK_LIST_COUNT(activeDevs, 3); + CHECK_LIST_COUNT(inactiveDevs, 2); + + virPCIDeviceListDel(activeDevs, dev[2]); + /* The devices should get queued in inactiveDevs */ + CHECK_LIST_COUNT(activeDevs, 2); + CHECK_LIST_COUNT(inactiveDevs, 2); + + if (virPCIDeviceReattach(dev[2], activeDevs, inactiveDevs) < 0) + goto cleanup; + + /* The queued devices should all be Reattached */ + CHECK_LIST_COUNT(activeDevs, 2); + CHECK_LIST_COUNT(inactiveDevs, 0); + + virPCIDeviceListDel(activeDevs, dev[0]); + virPCIDeviceListDel(activeDevs, dev[1]); + + CHECK_LIST_COUNT(activeDevs, 0); + CHECK_LIST_COUNT(inactiveDevs, 0); + /* Already reattached. No need to queue */ + + if (virPCIDeviceReattach(dev[0], activeDevs, inactiveDevs) < 0) + goto cleanup; + + CHECK_LIST_COUNT(inactiveDevs, 1); + + if (virPCIDeviceReattach(dev[1], activeDevs, inactiveDevs) < 0) + goto cleanup; + + CHECK_LIST_COUNT(inactiveDevs, 0); + + for (i = 2; i < nDev; i++) { + if (virPCIDeviceReattach(dev[i], activeDevs, inactiveDevs) < 0) + goto cleanup; + } + + /* Already reattached. No need to queue */ + CHECK_LIST_COUNT(activeDevs, 0); + CHECK_LIST_COUNT(inactiveDevs, 0); + + ret = 0; + cleanup: + virObjectUnref(activeDevs); + virObjectUnref(inactiveDevs); + return ret; +} + struct testPCIDevData { unsigned int domain; unsigned int bus; @@ -444,6 +556,9 @@ mymain(void) DO_TEST_PCI(testVirPCIDeviceReattachSingle, 0, 0x0a, 3, 0); DO_TEST_PCI_DRIVER(0, 0x0a, 3, 0, NULL); + /* Reattach an unknown unbound device */ + DO_TEST(testVirPCIDeviceVFIOReattach); + if (getenv("LIBVIRT_SKIP_CLEANUP") == NULL) virFileDeleteTree(fakesysfsdir);

There could be a delay of 1 or 2 seconds before the vfio-pci driver is unbound and the device file /dev/vfio/<iommu> is actually removed. If the file exists, the host driver probing the device can lead to crash. So, wait and avoid the crash. This cant be tested with the mock as the delay cant be simulated. The mock changes here are to just let the test cases pass for access() call to check if the /dev/vfio/<iommu> exists. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 41 +++++++++++++++++++++++++++++++++++++++++ tests/virpcimock.c | 11 ++++++++++- 2 files changed, 51 insertions(+), 1 deletion(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index 5462cd2..4765302 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1098,6 +1098,43 @@ virPCIIsKnownStub(char *driver) return ret; } +#define VFIO_UNBIND_TIMEOUT 100 + +/* It is not safe to initiate host driver probe if the vfio driver has not + * completely unbound the device. Usual wait time is 1 to 2 seconds. + * So, return if the unbind didn't complete in 10 seconds. + */ +static int +virPCIWaitForVFIOUnbindCompletion(virPCIDevicePtr dev) +{ + int retry = 0; + int ret = -1; + char *path = NULL; + + if (!(path = virPCIDeviceGetIOMMUGroupDev(dev))) + goto cleanup; + + while (retry++ < VFIO_UNBIND_TIMEOUT) { + if (!virFileExists(path)) + break; + usleep(100000); /* Sleep for 1/10th of a second */ + } + + if (virFileExists(path)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("VFIO unbind not completed even after %d seconds" + " for device %s"), retry, dev->name); + goto cleanup; + } + + ret = 0; + cleanup: + VIR_FREE(path); + return ret; + +} + + static int virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, const char *driver, @@ -1288,6 +1325,10 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev, /* This device is the last to unbind from vfio. As we explicitly * add a missing device in the list to inactiveList, we will just * go through the list. */ + + if (virPCIWaitForVFIOUnbindCompletion(dev) < 0) + goto cleanup; + while (inactiveDevs && (i < virPCIDeviceListCount(inactiveDevs))) { virPCIDevicePtr pcidev = virPCIDeviceListGet(inactiveDevs, i); if (dev->iommuGroup == pcidev->iommuGroup) { diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 837976a..cf92c58 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -52,6 +52,7 @@ static DIR * (*realopendir)(const char *name); char *fakesysfsdir; # define PCI_SYSFS_PREFIX "/sys/bus/pci/" +# define ROOT_DEV_PREFIX "/dev/" # define STDERR(...) \ fprintf(stderr, "%s %zu: ", __FUNCTION__, (size_t) __LINE__); \ @@ -248,6 +249,13 @@ getrealpath(char **newpath, errno = ENOMEM; return -1; } + } else if (STRPREFIX(path, ROOT_DEV_PREFIX)) { + if (virAsprintfQuiet(newpath, "%s/%s", + fakesysfsdir, + path) < 0) { + errno = ENOMEM; + return -1; + } } else { if (VIR_STRDUP_QUIET(*newpath, path) < 0) return -1; @@ -1002,7 +1010,8 @@ access(const char *path, int mode) init_syms(); - if (STRPREFIX(path, PCI_SYSFS_PREFIX)) { + if (STRPREFIX(path, PCI_SYSFS_PREFIX) || + STRPREFIX(path, ROOT_DEV_PREFIX)) { char *newpath; if (getrealpath(&newpath, path) < 0) return -1;

On 11/14/2015 03:39 AM, Shivaprasad G Bhat wrote:
There could be a delay of 1 or 2 seconds before the vfio-pci driver is unbound and the device file /dev/vfio/<iommu> is actually removed. If the file exists, the host driver probing the device can lead to crash. So, wait and avoid the crash.
If this is true, it is a kernel bug and must be fixed, not glossed over with a clunky timed loop. In a discussion on IRC, Alex has told me that by the time you return from unbinding the last device from vfio-pci (by writing the ASCII representation of the device's PCI address to its driver/unbind in sysfs), it should be safe to reprobe, and the reprobe should be successful. In other words, this wait should be unnecessary. If libvirt added fixups like this for every transient kernel bug, it would end up being a fragile unmaintainable mess understood by nobody. Instead, we should push for the kernel to have its bugs fixed.
This cant be tested with the mock as the delay cant be simulated. The mock changes here are to just let the test cases pass for access() call to check if the /dev/vfio/<iommu> exists.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 41 +++++++++++++++++++++++++++++++++++++++++ tests/virpcimock.c | 11 ++++++++++- 2 files changed, 51 insertions(+), 1 deletion(-)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 5462cd2..4765302 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1098,6 +1098,43 @@ virPCIIsKnownStub(char *driver) return ret; }
+#define VFIO_UNBIND_TIMEOUT 100 + +/* It is not safe to initiate host driver probe if the vfio driver has not + * completely unbound the device. Usual wait time is 1 to 2 seconds. + * So, return if the unbind didn't complete in 10 seconds. + */ +static int +virPCIWaitForVFIOUnbindCompletion(virPCIDevicePtr dev) +{ + int retry = 0; + int ret = -1; + char *path = NULL; + + if (!(path = virPCIDeviceGetIOMMUGroupDev(dev))) + goto cleanup; + + while (retry++ < VFIO_UNBIND_TIMEOUT) { + if (!virFileExists(path)) + break; + usleep(100000); /* Sleep for 1/10th of a second */ + } + + if (virFileExists(path)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("VFIO unbind not completed even after %d seconds" + " for device %s"), retry, dev->name); + goto cleanup; + } + + ret = 0; + cleanup: + VIR_FREE(path); + return ret; + +} + + static int virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, const char *driver, @@ -1288,6 +1325,10 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev, /* This device is the last to unbind from vfio. As we explicitly * add a missing device in the list to inactiveList, we will just * go through the list. */ + + if (virPCIWaitForVFIOUnbindCompletion(dev) < 0) + goto cleanup; + while (inactiveDevs && (i < virPCIDeviceListCount(inactiveDevs))) { virPCIDevicePtr pcidev = virPCIDeviceListGet(inactiveDevs, i); if (dev->iommuGroup == pcidev->iommuGroup) { diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 837976a..cf92c58 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -52,6 +52,7 @@ static DIR * (*realopendir)(const char *name); char *fakesysfsdir;
# define PCI_SYSFS_PREFIX "/sys/bus/pci/" +# define ROOT_DEV_PREFIX "/dev/"
# define STDERR(...) \ fprintf(stderr, "%s %zu: ", __FUNCTION__, (size_t) __LINE__); \ @@ -248,6 +249,13 @@ getrealpath(char **newpath, errno = ENOMEM; return -1; } + } else if (STRPREFIX(path, ROOT_DEV_PREFIX)) { + if (virAsprintfQuiet(newpath, "%s/%s", + fakesysfsdir, + path) < 0) { + errno = ENOMEM; + return -1; + } } else { if (VIR_STRDUP_QUIET(*newpath, path) < 0) return -1; @@ -1002,7 +1010,8 @@ access(const char *path, int mode)
init_syms();
- if (STRPREFIX(path, PCI_SYSFS_PREFIX)) { + if (STRPREFIX(path, PCI_SYSFS_PREFIX) || + STRPREFIX(path, ROOT_DEV_PREFIX)) { char *newpath; if (getrealpath(&newpath, path) < 0) return -1;
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Fri, Nov 20, 2015 at 9:07 PM, Laine Stump <laine@laine.org> wrote:
On 11/14/2015 03:39 AM, Shivaprasad G Bhat wrote:
There could be a delay of 1 or 2 seconds before the vfio-pci driver is unbound and the device file /dev/vfio/<iommu> is actually removed. If the file exists, the host driver probing the device can lead to crash. So, wait and avoid the crash.
If this is true, it is a kernel bug and must be fixed, not glossed over with a clunky timed loop.
In a discussion on IRC, Alex has told me that by the time you return from unbinding the last device from vfio-pci (by writing the ASCII representation of the device's PCI address to its driver/unbind in sysfs), it should be safe to reprobe, and the reprobe should be successful. In other words, this wait should be unnecessary.
If libvirt added fixups like this for every transient kernel bug, it would end up being a fragile unmaintainable mess understood by nobody. Instead, we should push for the kernel to have its bugs fixed.
Hi Laine, I got it confirmed from Alexey(CC'ed) that this delay is actually not needed. So, I feel this patch and Patch 9 which unbinds from vfio in a way to help this wait to take place, both can be dropped. Andrea's approach of unbind can be taken which is rather clean. My patch http://www.redhat.com/archives/libvir-list/2015-November/msg00007.html still fixes a genuine issue which I'll rework after Andrea's series gets merged. Andrea, I think we can use the mock changes and the test case(with some change) from this series for your series. Please let me know if you can pull them along or I'll rework and send them after you series is merged. Thanks and Regards, Shivaprasad
This cant be tested with the mock as the delay cant be simulated. The mock changes here are to just let the test cases pass for access() call to check if the /dev/vfio/<iommu> exists.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virpci.c | 41 +++++++++++++++++++++++++++++++++++++++++ tests/virpcimock.c | 11 ++++++++++- 2 files changed, 51 insertions(+), 1 deletion(-)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 5462cd2..4765302 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1098,6 +1098,43 @@ virPCIIsKnownStub(char *driver) return ret; } +#define VFIO_UNBIND_TIMEOUT 100 + +/* It is not safe to initiate host driver probe if the vfio driver has not + * completely unbound the device. Usual wait time is 1 to 2 seconds. + * So, return if the unbind didn't complete in 10 seconds. + */ +static int +virPCIWaitForVFIOUnbindCompletion(virPCIDevicePtr dev) +{ + int retry = 0; + int ret = -1; + char *path = NULL; + + if (!(path = virPCIDeviceGetIOMMUGroupDev(dev))) + goto cleanup; + + while (retry++ < VFIO_UNBIND_TIMEOUT) { + if (!virFileExists(path)) + break; + usleep(100000); /* Sleep for 1/10th of a second */ + } + + if (virFileExists(path)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("VFIO unbind not completed even after %d seconds" + " for device %s"), retry, dev->name); + goto cleanup; + } + + ret = 0; + cleanup: + VIR_FREE(path); + return ret; + +} + + static int virPCIDeviceReprobeHostDriver(virPCIDevicePtr dev, const char *driver, @@ -1288,6 +1325,10 @@ virPCIDeviceUnbindFromStub(virPCIDevicePtr dev, /* This device is the last to unbind from vfio. As we explicitly * add a missing device in the list to inactiveList, we will just * go through the list. */ + + if (virPCIWaitForVFIOUnbindCompletion(dev) < 0) + goto cleanup; + while (inactiveDevs && (i < virPCIDeviceListCount(inactiveDevs))) { virPCIDevicePtr pcidev = virPCIDeviceListGet(inactiveDevs, i); if (dev->iommuGroup == pcidev->iommuGroup) { diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 837976a..cf92c58 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -52,6 +52,7 @@ static DIR * (*realopendir)(const char *name); char *fakesysfsdir; # define PCI_SYSFS_PREFIX "/sys/bus/pci/" +# define ROOT_DEV_PREFIX "/dev/" # define STDERR(...) \ fprintf(stderr, "%s %zu: ", __FUNCTION__, (size_t) __LINE__); \ @@ -248,6 +249,13 @@ getrealpath(char **newpath, errno = ENOMEM; return -1; } + } else if (STRPREFIX(path, ROOT_DEV_PREFIX)) { + if (virAsprintfQuiet(newpath, "%s/%s", + fakesysfsdir, + path) < 0) { + errno = ENOMEM; + return -1; + } } else { if (VIR_STRDUP_QUIET(*newpath, path) < 0) return -1; @@ -1002,7 +1010,8 @@ access(const char *path, int mode) init_syms(); - if (STRPREFIX(path, PCI_SYSFS_PREFIX)) { + if (STRPREFIX(path, PCI_SYSFS_PREFIX) || + STRPREFIX(path, ROOT_DEV_PREFIX)) { char *newpath; if (getrealpath(&newpath, path) < 0) return -1;
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Sat, 2015-11-14 at 14:03 +0530, Shivaprasad G Bhat wrote:
The series fixes few VFIO related host crash issues. The patches 3, 4, 9 and 10 are actual fixes. Patch 7 and 8 are test changes to allow testing patch 9. Rest of the patches are mostly code movements except for patch 2.
--- Changes from v3 -> v4 - Andrea's suggestion not to overload the usage of stubDriver to set it in virPCINew is addressed. Not setting it any more. Fetch it fresh from sysfs. - The test case added in old P7 is improvised - Old P7 is split into 3 patches where I have moved the virpcimock changes to support iommu into a new patch.
Shivaprasad G Bhat (10): Implement virPCIIsKnownStub function Add iommu group number info to virPCIDevice Refuse to reattach from vfio if the iommu group is in use by any domain Wait for vfio-pci device cleanups before reassinging the device to host driver Split reprobe action from the virPCIUnbindFromStub into a new function Pass activeDevs and inactiveDevs to virPCIDeviceUnbindFromStub and virPCIDeviceBindToStub Change the negative test case to try pciback instead of vfio-pci Add iommu info for pci on mocked sysfs Postpone reprobing till all the devices in iommu group are unbound from vfio Wait for iommmu device to go away before reprobing the host driver
src/util/virhostdev.c | 28 ++++- src/util/virpci.c | 297 +++++++++++++++++++++++++++++++++++++++++-------- tests/virpcimock.c | 193 +++++++++++++++++++++++++++++--- tests/virpcitest.c | 117 +++++++++++++++++++ 4 files changed, 566 insertions(+), 69 deletions(-)
I haven't gone through the patches, but I've tried detaching one of the four functions of an Ethernet adapter while the guest was running and I haven't experienced the host crash doing so usually leads to. I will look at the code tomorrow to report issues and weak-ACK what looks okay; a more authoritative second opinion is definitely needed before any of this can go in. Cheers. -- Andrea Bolognani Software Engineer - Virtualization Team
participants (6)
-
Alex Williamson
-
Andrea Bolognani
-
Laine Stump
-
Michal Privoznik
-
Shivaprasad bhat
-
Shivaprasad G Bhat