[libvirt] [PATCHv2 00/12] remains of 'support VFIO groups'

This is what remains of yesterday's VFIO groups patchset that is considered still useful, but wasn't yet ACKed (I've pushed the 5 that were ACKed). In addition, I found a few more bugs in the virPCIDeviceList handling and have included patches for those. There are a couple of these that would be really nice to get into 1.0.7, one in particular which is new functionality, but has 4 prerequisites: The most important patch here from a functionality point of view is PATCH 05/12, which adds the <iommuGroup> element to the output of "virsh nodedev-dumpxml". This is essential for users/management applications to have adequate information to know which devices need to be detached from host drivers. (Unfortunately, all of the patches prior to 05/12 are required for 05/12 to cleanly apply.) The other new patch that I've been told is important is 08/12, which supresses reset of PCI devices that were/will be assigned using VFIO - apparently the vfio driver already resets the devices when appropriate, so having libvirt do it is just redundant. (In order for 08/12 to operate reliably, 07/12 is required, and in order for 07/12 to apply, 06/12 must also be applied). If someone ACKs any of these patches prior to DV cutting the first rc and I don't respond on IRC (not sure what my schedule is in the next 12 hours :-), please feel free to push any of them you like, as long as you push them in order with none missing from the middle. Laine Stump (12): pci: eliminate unused driver arg from virPCIDeviceDetach pci: rename virPCIDeviceGetVFIOGroupDev to virPCIDeviceGetIOMMUGroupDev pci: make virPCIParseDeviceAddress public pci: new iommu_group functions nodedev: add iommuGroup to node device object pci: eliminate repetitive path constructions in virPCIDeviceBindToStub pci: update stubDriver name in virPCIDeviceBindToStub qemu: don't reset PCI devices being assigned with VFIO pci: virPCIDeviceListAddCopy API pci: eliminate leak in OOM condition pci: fix dangling pointer in qemuDomainReAttachHostdevDevices qemu: fix infinite loop in OOM error path docs/formatnode.html.in | 63 +++- docs/schemas/nodedev.rng | 11 + src/conf/node_device_conf.c | 86 +++++- src/conf/node_device_conf.h | 5 +- src/libvirt_private.syms | 8 +- src/node_device/node_device_udev.c | 21 +- src/qemu/qemu_cgroup.c | 4 +- src/qemu/qemu_driver.c | 2 +- src/qemu/qemu_hostdev.c | 52 ++-- src/qemu/qemu_hotplug.c | 5 +- src/security/security_apparmor.c | 2 +- src/security/security_dac.c | 4 +- src/security/security_selinux.c | 4 +- src/util/virpci.c | 344 +++++++++++++++++---- src/util/virpci.h | 21 +- src/xen/xen_driver.c | 2 +- tests/nodedevschemadata/pci_8086_10c9_sriov_pf.xml | 16 + tests/nodedevxml2xmltest.c | 1 + 18 files changed, 553 insertions(+), 98 deletions(-) create mode 100644 tests/nodedevschemadata/pci_8086_10c9_sriov_pf.xml -- 1.7.11.7

The driver arg to virPCIDeviceDetach is no longer used (the name of the stub driver is now set in the virPCIDevice object, and virPCIDeviceDetach retrieves it from there). Remove it. --- src/qemu/qemu_driver.c | 2 +- src/qemu/qemu_hostdev.c | 2 +- src/util/virpci.c | 13 +++++-------- src/util/virpci.h | 3 +-- src/xen/xen_driver.c | 2 +- 5 files changed, 9 insertions(+), 13 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index efee62f..83770e5 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -10617,7 +10617,7 @@ qemuNodeDeviceDetachFlags(virNodeDevicePtr dev, virObjectLock(driver->inactivePciHostdevs); if (virPCIDeviceDetach(pci, driver->activePciHostdevs, - driver->inactivePciHostdevs, NULL) < 0) { + driver->inactivePciHostdevs) < 0) { goto out; } diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index 7a9e6eb..dfe39c6 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -540,7 +540,7 @@ int qemuPrepareHostdevPCIDevices(virQEMUDriverPtr driver, for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); if (virPCIDeviceGetManaged(dev) && - virPCIDeviceDetach(dev, driver->activePciHostdevs, NULL, NULL) < 0) + virPCIDeviceDetach(dev, driver->activePciHostdevs, NULL) < 0) goto reattachdevs; } diff --git a/src/util/virpci.c b/src/util/virpci.c index 2f4032f..2980e22 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1238,15 +1238,12 @@ cleanup: int virPCIDeviceDetach(virPCIDevicePtr dev, virPCIDeviceList *activeDevs, - virPCIDeviceList *inactiveDevs, - const char *driver) + virPCIDeviceList *inactiveDevs) { - if (!driver && dev->stubDriver) - driver = dev->stubDriver; - - if (virPCIProbeStubDriver(driver) < 0) { + if (virPCIProbeStubDriver(dev->stubDriver) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, - _("Failed to load PCI stub module %s"), driver); + _("Failed to load PCI stub module %s"), + dev->stubDriver); return -1; } @@ -1256,7 +1253,7 @@ virPCIDeviceDetach(virPCIDevicePtr dev, return -1; } - if (virPCIDeviceBindToStub(dev, driver) < 0) + if (virPCIDeviceBindToStub(dev, dev->stubDriver) < 0) return -1; /* Add *a copy of* the dev into list inactiveDevs, if diff --git a/src/util/virpci.h b/src/util/virpci.h index d069adb..944aa09 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -51,8 +51,7 @@ const char *virPCIDeviceGetName(virPCIDevicePtr dev); int virPCIDeviceDetach(virPCIDevicePtr dev, virPCIDeviceListPtr activeDevs, - virPCIDeviceListPtr inactiveDevs, - const char *driver); + virPCIDeviceListPtr inactiveDevs); int virPCIDeviceReattach(virPCIDevicePtr dev, virPCIDeviceListPtr activeDevs, virPCIDeviceListPtr inactiveDevs); diff --git a/src/xen/xen_driver.c b/src/xen/xen_driver.c index ffc6e41..998fc69 100644 --- a/src/xen/xen_driver.c +++ b/src/xen/xen_driver.c @@ -2440,7 +2440,7 @@ xenUnifiedNodeDeviceDetachFlags(virNodeDevicePtr dev, goto out; } - if (virPCIDeviceDetach(pci, NULL, NULL, NULL) < 0) + if (virPCIDeviceDetach(pci, NULL, NULL) < 0) goto out; ret = 0; -- 1.7.11.7

On Mon, Jun 24, 2013 at 11:05:27PM -0400, Laine Stump wrote:
The driver arg to virPCIDeviceDetach is no longer used (the name of the stub driver is now set in the virPCIDevice object, and virPCIDeviceDetach retrieves it from there). Remove it.
What happens when libvirtd is restarted ? Are we correctly preserving the original value across restart, or are we capable of auto-detecting the correct value again ? Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 06/25/2013 05:50 AM, Daniel P. Berrange wrote:
The driver arg to virPCIDeviceDetach is no longer used (the name of the stub driver is now set in the virPCIDevice object, and virPCIDeviceDetach retrieves it from there). Remove it. What happens when libvirtd is restarted ? Are we correctly preserving
On Mon, Jun 24, 2013 at 11:05:27PM -0400, Laine Stump wrote: the original value across restart, or are we capable of auto-detecting the correct value again ?
That's a good question, for two reasons: 1) I didn't know the answer for sure when you asked it, and 2) I went back over the code, learned the answer, and it was a good answer :-) Originally the stubDriverName in the virPCIDevice object was only used for a short period - during qemu's "device Preparation" it will create a virPCIDeviceList from the virDomainHostdevDef list, then later iterate through the virPCIDeviceList to detach all the devices from their host drivers; during this detach loop, the original viDomainHostdevDef isn't available, so an indicator of which stub to bind to must be stashed in the virPCIDevice object when it is created. Later it became useful to be able to rely on the stubDriverName always being correct (at least in the activePciHostdevs list; it doesn't matter for inactivePciHostdevs). Fortunately that list is recreated whenever libvirt restarts by calling qemuUpdateActivePciHostdevs() for every active domain. That function also updates stubDriverName for each active hostdev based on the config (which must be the actual driver in use, otherwise we would have encountered an error when the device was originally being attached). So, the answer is yes, we auto-set the correct value in the virPCIDevice object when libvirtd restarts. However, this entire topic brings up a question that I've had for awhile - what happens to the "inactivePciHostdevs" list when libvirt restarts? This is the list of devices that have been detached from the host by libvirt's virNodeDevice API, but are not currently in use by any domain. This list isn't stored in any status file, and is not reconstructed when libvirtd restarts (the information to do that doesn't exist). I *think* the answer is that it doesn't matter, because libvirt will just use the device anyway, then put it on the inactive list when it's done with it. But that suggests the question "Then why have the inactive list?" I don't know the answer to that one, but it does seem like either we should be reconstructing inactivePciHostdevs when libvirtd restarts, or we should just give up and not use (or maintain) it.

On Tue, Jun 25, 2013 at 12:35:03PM -0400, Laine Stump wrote:
On 06/25/2013 05:50 AM, Daniel P. Berrange wrote:
The driver arg to virPCIDeviceDetach is no longer used (the name of the stub driver is now set in the virPCIDevice object, and virPCIDeviceDetach retrieves it from there). Remove it. What happens when libvirtd is restarted ? Are we correctly preserving
On Mon, Jun 24, 2013 at 11:05:27PM -0400, Laine Stump wrote: the original value across restart, or are we capable of auto-detecting the correct value again ?
That's a good question, for two reasons: 1) I didn't know the answer for sure when you asked it, and 2) I went back over the code, learned the answer, and it was a good answer :-)
Originally the stubDriverName in the virPCIDevice object was only used for a short period - during qemu's "device Preparation" it will create a virPCIDeviceList from the virDomainHostdevDef list, then later iterate through the virPCIDeviceList to detach all the devices from their host drivers; during this detach loop, the original viDomainHostdevDef isn't available, so an indicator of which stub to bind to must be stashed in the virPCIDevice object when it is created.
Later it became useful to be able to rely on the stubDriverName always being correct (at least in the activePciHostdevs list; it doesn't matter for inactivePciHostdevs). Fortunately that list is recreated whenever libvirt restarts by calling qemuUpdateActivePciHostdevs() for every active domain. That function also updates stubDriverName for each active hostdev based on the config (which must be the actual driver in use, otherwise we would have encountered an error when the device was originally being attached).
So, the answer is yes, we auto-set the correct value in the virPCIDevice object when libvirtd restarts.
ACK, to the patch then.
However, this entire topic brings up a question that I've had for awhile - what happens to the "inactivePciHostdevs" list when libvirt restarts? This is the list of devices that have been detached from the host by libvirt's virNodeDevice API, but are not currently in use by any domain. This list isn't stored in any status file, and is not reconstructed when libvirtd restarts (the information to do that doesn't exist). I *think* the answer is that it doesn't matter, because libvirt will just use the device anyway, then put it on the inactive list when it's done with it. But that suggests the question "Then why have the inactive list?" I don't know the answer to that one, but it does seem like either we should be reconstructing inactivePciHostdevs when libvirtd restarts, or we should just give up and not use (or maintain) it.
Yes, that is odd. Would have to look back to see why we introduced this list in the first place. If we can kill it so much the better Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

I realized after the fact that it's probably better in the long run to give this function a name that matches the name of the link used in sysfs to hold the group (iommu_group). I'm changing it now because I'm about to add several more functions that deal with iommu groups. --- src/libvirt_private.syms | 2 +- src/qemu/qemu_cgroup.c | 4 ++-- src/security/security_apparmor.c | 2 +- src/security/security_dac.c | 4 ++-- src/security/security_selinux.c | 4 ++-- src/util/virpci.c | 6 +++--- src/util/virpci.h | 2 +- 7 files changed, 12 insertions(+), 12 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 7c38c58..410e932 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1692,6 +1692,7 @@ virPCIDeviceCopy; virPCIDeviceDetach; virPCIDeviceFileIterate; virPCIDeviceFree; +virPCIDeviceGetIOMMUGroupDev; virPCIDeviceGetManaged; virPCIDeviceGetName; virPCIDeviceGetRemoveSlot; @@ -1699,7 +1700,6 @@ virPCIDeviceGetReprobe; virPCIDeviceGetStubDriver; virPCIDeviceGetUnbindFromStub; virPCIDeviceGetUsedBy; -virPCIDeviceGetVFIOGroupDev; virPCIDeviceIsAssignable; virPCIDeviceListAdd; virPCIDeviceListCount; diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index cf46993..5f54ca6 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -277,7 +277,7 @@ qemuSetupHostdevCGroup(virDomainObjPtr vm, if (!pci) goto cleanup; - if (!(path = virPCIDeviceGetVFIOGroupDev(pci))) + if (!(path = virPCIDeviceGetIOMMUGroupDev(pci))) goto cleanup; VIR_DEBUG("Cgroup allow %s for PCI device assignment", path); @@ -376,7 +376,7 @@ qemuTeardownHostdevCgroup(virDomainObjPtr vm, if (!pci) goto cleanup; - if (!(path = virPCIDeviceGetVFIOGroupDev(pci))) + if (!(path = virPCIDeviceGetIOMMUGroupDev(pci))) goto cleanup; VIR_DEBUG("Cgroup deny %s for PCI device assignment", path); diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c index 87c2777..50b0e74 100644 --- a/src/security/security_apparmor.c +++ b/src/security/security_apparmor.c @@ -823,7 +823,7 @@ AppArmorSetSecurityHostdevLabel(virSecurityManagerPtr mgr, if (dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) { - char *vfioGroupDev = virPCIDeviceGetVFIOGroupDev(pci); + char *vfioGroupDev = virPCIDeviceGetIOMMUGroupDev(pci); if (!vfioGroupDev) { virPCIDeviceFree(pci); diff --git a/src/security/security_dac.c b/src/security/security_dac.c index b8d1a92..0d6defc 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -529,7 +529,7 @@ virSecurityDACSetSecurityHostdevLabel(virSecurityManagerPtr mgr, if (dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) { - char *vfioGroupDev = virPCIDeviceGetVFIOGroupDev(pci); + char *vfioGroupDev = virPCIDeviceGetIOMMUGroupDev(pci); if (!vfioGroupDev) { virPCIDeviceFree(pci); @@ -648,7 +648,7 @@ virSecurityDACRestoreSecurityHostdevLabel(virSecurityManagerPtr mgr, if (dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) { - char *vfioGroupDev = virPCIDeviceGetVFIOGroupDev(pci); + char *vfioGroupDev = virPCIDeviceGetIOMMUGroupDev(pci); if (!vfioGroupDev) { virPCIDeviceFree(pci); diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 6fe063e..7802dda 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1333,7 +1333,7 @@ virSecuritySELinuxSetSecurityHostdevSubsysLabel(virDomainDefPtr def, if (dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) { - char *vfioGroupDev = virPCIDeviceGetVFIOGroupDev(pci); + char *vfioGroupDev = virPCIDeviceGetIOMMUGroupDev(pci); if (!vfioGroupDev) { virPCIDeviceFree(pci); @@ -1528,7 +1528,7 @@ virSecuritySELinuxRestoreSecurityHostdevSubsysLabel(virSecurityManagerPtr mgr, if (dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) { - char *vfioGroupDev = virPCIDeviceGetVFIOGroupDev(pci); + char *vfioGroupDev = virPCIDeviceGetIOMMUGroupDev(pci); if (!vfioGroupDev) { virPCIDeviceFree(pci); diff --git a/src/util/virpci.c b/src/util/virpci.c index 2980e22..51fad78 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1852,11 +1852,11 @@ cleanup: return ret; } -/* virPCIDeviceGetVFIOGroupDev - return the name of the device used to - * control this PCI device's group (e.g. "/dev/vfio/15") +/* virPCIDeviceGetIOMMUGroupDev - return the name of the device used + * to control this PCI device's group (e.g. "/dev/vfio/15") */ char * -virPCIDeviceGetVFIOGroupDev(virPCIDevicePtr dev) +virPCIDeviceGetIOMMUGroupDev(virPCIDevicePtr dev) { char *devPath = NULL; char *groupPath = NULL; diff --git a/src/util/virpci.h b/src/util/virpci.h index 944aa09..89717b8 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -117,7 +117,7 @@ int virPCIDeviceFileIterate(virPCIDevicePtr dev, virPCIDeviceFileActor actor, void *opaque); char * -virPCIDeviceGetVFIOGroupDev(virPCIDevicePtr dev); +virPCIDeviceGetIOMMUGroupDev(virPCIDevicePtr dev); int virPCIDeviceIsAssignable(virPCIDevicePtr dev, int strict_acs_check); -- 1.7.11.7

On Mon, Jun 24, 2013 at 11:05:28PM -0400, Laine Stump wrote:
I realized after the fact that it's probably better in the long run to give this function a name that matches the name of the link used in sysfs to hold the group (iommu_group).
I'm changing it now because I'm about to add several more functions that deal with iommu groups. --- src/libvirt_private.syms | 2 +- src/qemu/qemu_cgroup.c | 4 ++-- src/security/security_apparmor.c | 2 +- src/security/security_dac.c | 4 ++-- src/security/security_selinux.c | 4 ++-- src/util/virpci.c | 6 +++--- src/util/virpci.h | 2 +- 7 files changed, 12 insertions(+), 12 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

This function has utility outside of virpci.c, so make it public. --- src/libvirt_private.syms | 1 + src/util/virpci.c | 2 +- src/util/virpci.h | 2 ++ 3 files changed, 4 insertions(+), 1 deletion(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 410e932..643f311 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1728,6 +1728,7 @@ virPCIGetVirtualFunctionIndex; virPCIGetVirtualFunctionInfo; virPCIGetVirtualFunctions; virPCIIsVirtualFunction; +virPCIParseDeviceAddress; # util/virpidfile.h diff --git a/src/util/virpci.c b/src/util/virpci.c index 51fad78..fc04cac 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -2046,7 +2046,7 @@ logStrToLong_ui(char const *s, return ret; } -static int +int virPCIParseDeviceAddress(char *address, virPCIDeviceAddressPtr bdf) { diff --git a/src/util/virpci.h b/src/util/virpci.h index 89717b8..bcf1b81 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -152,6 +152,8 @@ int virPCIGetAddrString(unsigned int domain, char **pciConfigAddr) ATTRIBUTE_NONNULL(5) ATTRIBUTE_RETURN_CHECK; +int virPCIParseDeviceAddress(char *address, virPCIDeviceAddressPtr bdf); + int virPCIGetVirtualFunctionInfo(const char *vf_sysfs_device_path, char **pfname, int *vf_index); -- 1.7.11.7

On Mon, Jun 24, 2013 at 11:05:29PM -0400, Laine Stump wrote:
This function has utility outside of virpci.c, so make it public. --- src/libvirt_private.syms | 1 + src/util/virpci.c | 2 +- src/util/virpci.h | 2 ++ 3 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 410e932..643f311 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1728,6 +1728,7 @@ virPCIGetVirtualFunctionIndex; virPCIGetVirtualFunctionInfo; virPCIGetVirtualFunctions; virPCIIsVirtualFunction; +virPCIParseDeviceAddress;
# util/virpidfile.h diff --git a/src/util/virpci.c b/src/util/virpci.c index 51fad78..fc04cac 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -2046,7 +2046,7 @@ logStrToLong_ui(char const *s, return ret; }
-static int +int virPCIParseDeviceAddress(char *address, virPCIDeviceAddressPtr bdf)
Given that the struct is 'virPCIDeviceAddressPtr' the method name is really better as virPCIDeviceAddressParse'
{ diff --git a/src/util/virpci.h b/src/util/virpci.h index 89717b8..bcf1b81 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -152,6 +152,8 @@ int virPCIGetAddrString(unsigned int domain, char **pciConfigAddr) ATTRIBUTE_NONNULL(5) ATTRIBUTE_RETURN_CHECK;
+int virPCIParseDeviceAddress(char *address, virPCIDeviceAddressPtr bdf); + int virPCIGetVirtualFunctionInfo(const char *vf_sysfs_device_path, char **pfname, int *vf_index);
ACK with the suggested rename Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 06/25/2013 06:30 AM, Daniel P. Berrange wrote:
On Mon, Jun 24, 2013 at 11:05:29PM -0400, Laine Stump wrote:
This function has utility outside of virpci.c, so make it public. --- src/libvirt_private.syms | 1 + src/util/virpci.c | 2 +- src/util/virpci.h | 2 ++ 3 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 410e932..643f311 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1728,6 +1728,7 @@ virPCIGetVirtualFunctionIndex; virPCIGetVirtualFunctionInfo; virPCIGetVirtualFunctions; virPCIIsVirtualFunction; +virPCIParseDeviceAddress;
# util/virpidfile.h diff --git a/src/util/virpci.c b/src/util/virpci.c index 51fad78..fc04cac 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -2046,7 +2046,7 @@ logStrToLong_ui(char const *s, return ret; }
-static int +int virPCIParseDeviceAddress(char *address, virPCIDeviceAddressPtr bdf) Given that the struct is 'virPCIDeviceAddressPtr' the method name is really better as virPCIDeviceAddressParse'
Yeah, I noticed that way after the fact (it was an existing function that I decided to use).
{ diff --git a/src/util/virpci.h b/src/util/virpci.h index 89717b8..bcf1b81 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -152,6 +152,8 @@ int virPCIGetAddrString(unsigned int domain, char **pciConfigAddr) ATTRIBUTE_NONNULL(5) ATTRIBUTE_RETURN_CHECK;
+int virPCIParseDeviceAddress(char *address, virPCIDeviceAddressPtr bdf); + int virPCIGetVirtualFunctionInfo(const char *vf_sysfs_device_path, char **pfname, int *vf_index); ACK with the suggested rename
Thanks. I'll make that change and push it.

Any device which belongs to an "IOMMU group" (used by vfio) will have links to all devices of its group listed in /sys/bus/pci/$device/iommu_group/devices; /sys/bus/pci/$device/iommu_group is actually a link to /sys/kernel/iommu_groups/$n, where $n is the group number (there will be a corresponding device node at /dev/vfio/$n once the devices are bound to the vfio-pci driver) The following functions are added: virPCIDeviceGetIOMMUGroupList Gets a virPCIDeviceList with one virPCIDeviceList for each device in the same IOMMU group as the provided virPCIDevice (a copy of the original device object is included in the list. virPCIIOMMUGroupIterate Calls the function @actor once for each device in the group that contains the given virPCIDeviceAddress. virPCIGetIOMMUGroupAddresses Fills in a virPCIDeviceAddressPtr * with an array of virPCIDeviceAddress, one for each device in the iommu group of the provided virPCIDeviceAddress (including a copy of the original). virPCIGetIOMMUGroupNum Returns the group number as an int (a valid group number will always be 0 or greater). If there is no iommu_group link in the device's directory (usually indicating that vfio isn't loaded), -2 will be returned. On any real error, -1 will be returned. --- src/libvirt_private.syms | 4 + src/util/virpci.c | 217 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virpci.h | 15 +++- 3 files changed, 233 insertions(+), 3 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 643f311..e906742 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1693,6 +1693,7 @@ virPCIDeviceDetach; virPCIDeviceFileIterate; virPCIDeviceFree; virPCIDeviceGetIOMMUGroupDev; +virPCIDeviceGetIOMMUGroupList; virPCIDeviceGetManaged; virPCIDeviceGetName; virPCIDeviceGetRemoveSlot; @@ -1722,11 +1723,14 @@ virPCIDeviceSetStubDriver; virPCIDeviceSetUnbindFromStub; virPCIDeviceSetUsedBy; virPCIDeviceWaitForCleanup; +virPCIGetIOMMUGroupAddresses; +virPCIGetIOMMUGroupNum; virPCIGetNetName; virPCIGetPhysicalFunction; virPCIGetVirtualFunctionIndex; virPCIGetVirtualFunctionInfo; virPCIGetVirtualFunctions; +virPCIIOMMUGroupIterate; virPCIIsVirtualFunction; virPCIParseDeviceAddress; diff --git a/src/util/virpci.c b/src/util/virpci.c index fc04cac..5209372 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1852,6 +1852,223 @@ cleanup: return ret; } + +/* virPCIIOMMUGroupIterate: + * Call @actor for all devices in the same iommu_group as orig + * (including orig itself) Even if there is no iommu_group for the + * device, call @actor once for orig. + */ +int +virPCIIOMMUGroupIterate(virPCIDeviceAddressPtr orig, + virPCIDeviceAddressActor actor, + void *opaque) +{ + char *groupPath = NULL; + DIR *groupDir = NULL; + int ret = -1; + struct dirent *ent; + + if (virAsprintf(&groupPath, + PCI_SYSFS "devices/%04x:%02x:%02x.%x/iommu_group/devices", + orig->domain, orig->bus, orig->slot, orig->function) < 0) { + virReportOOMError(); + goto cleanup; + } + + if (!(groupDir = opendir(groupPath))) { + /* just process the original device, nothing more */ + ret = (actor)(orig, opaque); + goto cleanup; + } + + while ((ent = readdir(groupDir)) != NULL) { + virPCIDeviceAddress newDev; + + if (ent->d_name[0] == '.') + continue; + + if (virPCIParseDeviceAddress(ent->d_name, &newDev) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Found invalid device link '%s' in '%s'"), + ent->d_name, groupPath); + goto cleanup; + } + + if ((actor)(&newDev, opaque) < 0) + goto cleanup; + } + + ret = 0; + +cleanup: + VIR_FREE(groupPath); + if (groupDir) + closedir(groupDir); + return ret; +} + + +static int +virPCIDeviceGetIOMMUGroupAddOne(virPCIDeviceAddressPtr newDevAddr, void *opaque) +{ + int ret = -1; + virPCIDeviceListPtr groupList = opaque; + virPCIDevicePtr newDev; + + if (!(newDev = virPCIDeviceNew(newDevAddr->domain, newDevAddr->bus, + newDevAddr->slot, newDevAddr->function))) { + goto cleanup; + } + + if (virPCIDeviceListAdd(groupList, newDev) < 0) { + goto cleanup; + } + newDev = NULL; /* it's now on the list */ + ret = 0; +cleanup: + virPCIDeviceFree(newDev); + return ret; +} + + +/* + * virPCIDeviceGetIOMMUGroupList - return a virPCIDeviceList containing + * all of the devices in the same iommu_group as @dev. + * + * Return the new list, or NULL on failure + */ +virPCIDeviceListPtr +virPCIDeviceGetIOMMUGroupList(virPCIDevicePtr dev) +{ + virPCIDeviceListPtr groupList = virPCIDeviceListNew(); + virPCIDeviceAddress devAddr = { dev->domain, dev->bus, + dev->slot, dev->function }; + + if (!groupList) + goto error; + + if (virPCIIOMMUGroupIterate(&devAddr, virPCIDeviceGetIOMMUGroupAddOne, + groupList) < 0) { + goto error; + } + return groupList; + +error: + virObjectUnref(groupList); + return NULL; +} + + +typedef struct { + virPCIDeviceAddressPtr **iommuGroupDevices; + size_t *nIommuGroupDevices; +} virPCIDeviceAddressList; +typedef virPCIDeviceAddressList *virPCIDeviceAddressListPtr; + +static int +virPCIGetIOMMUGroupAddressesAddOne(virPCIDeviceAddressPtr newDevAddr, void *opaque) +{ + int ret = -1; + virPCIDeviceAddressListPtr addrList = opaque; + virPCIDeviceAddressPtr copyAddr; + + /* make a copy to insert onto the list */ + if (VIR_ALLOC(copyAddr) < 0) + goto cleanup; + + *copyAddr = *newDevAddr; + + if (VIR_APPEND_ELEMENT(*addrList->iommuGroupDevices, + *addrList->nIommuGroupDevices, copyAddr) < 0) { + goto cleanup; + } + + ret = 0; +cleanup: + VIR_FREE(copyAddr); + return ret; +} + + +/* + * virPCIGetIOMMUGroupAddresses - return a virPCIDeviceList containing + * all of the devices in the same iommu_group as @dev. + * + * Return the new list, or NULL on failure + */ +int +virPCIGetIOMMUGroupAddresses(virPCIDeviceAddressPtr devAddr, + virPCIDeviceAddressPtr **iommuGroupDevices, + size_t *nIommuGroupDevices) +{ + int ret = -1; + virPCIDeviceAddressList addrList = { iommuGroupDevices, + nIommuGroupDevices }; + + if (virPCIIOMMUGroupIterate(devAddr, + virPCIGetIOMMUGroupAddressesAddOne, + &addrList) < 0) { + goto cleanup; + } + + ret = 0; +cleanup: + return ret; +} + + +/* virPCIGetIOMMUGroupNum - return the group number of this PCI + * device's iommu_group, or -2 if there is no iommu_group for the + * device (or -1 if there was any other error) + */ +int +virPCIGetIOMMUGroupNum(virPCIDeviceAddressPtr addr) +{ + char *devName = NULL; + char *devPath = NULL; + char *groupPath = NULL; + const char *groupNumStr; + unsigned int groupNum; + int ret = -1; + + if (virAsprintf(&devName, "%.4x:%.2x:%.2x.%.1x", addr->domain, + addr->bus, addr->slot, addr->function) < 0) { + virReportOOMError(); + goto cleanup; + } + + if (virPCIFile(&devPath, devName, "iommu_group") < 0) + goto cleanup; + if (virFileIsLink(devPath) != 1) { + ret = -2; + goto cleanup; + } + if (virFileResolveLink(devPath, &groupPath) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to resolve device %s iommu_group symlink %s"), + devName, devPath); + goto cleanup; + } + + groupNumStr = last_component(groupPath); + if (virStrToLong_ui(groupNumStr, NULL, 10, &groupNum) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("device %s iommu_group symlink %s has " + "invalid group number %s"), + devName, groupPath, groupNumStr); + ret = -1; + goto cleanup; + } + + ret = groupNum; +cleanup: + VIR_FREE(devName); + VIR_FREE(devPath); + VIR_FREE(groupPath); + return ret; +} + + /* virPCIDeviceGetIOMMUGroupDev - return the name of the device used * to control this PCI device's group (e.g. "/dev/vfio/15") */ diff --git a/src/util/virpci.h b/src/util/virpci.h index bcf1b81..972f86b 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -112,12 +112,21 @@ int virPCIDeviceListFindIndex(virPCIDeviceListPtr list, */ typedef int (*virPCIDeviceFileActor)(virPCIDevicePtr dev, const char *path, void *opaque); - int virPCIDeviceFileIterate(virPCIDevicePtr dev, virPCIDeviceFileActor actor, void *opaque); -char * -virPCIDeviceGetIOMMUGroupDev(virPCIDevicePtr dev); + +typedef int (*virPCIDeviceAddressActor)(virPCIDeviceAddress *addr, + void *opaque); +int virPCIIOMMUGroupIterate(virPCIDeviceAddressPtr orig, + virPCIDeviceAddressActor actor, + void *opaque); +virPCIDeviceListPtr virPCIDeviceGetIOMMUGroupList(virPCIDevicePtr dev); +int virPCIGetIOMMUGroupAddresses(virPCIDeviceAddressPtr devAddr, + virPCIDeviceAddressPtr **iommuGroupDevices, + size_t *nIommuGroupDevices); +int virPCIGetIOMMUGroupNum(virPCIDeviceAddressPtr dev); +char *virPCIDeviceGetIOMMUGroupDev(virPCIDevicePtr dev); int virPCIDeviceIsAssignable(virPCIDevicePtr dev, int strict_acs_check); -- 1.7.11.7

On Mon, Jun 24, 2013 at 11:05:30PM -0400, Laine Stump wrote:
Any device which belongs to an "IOMMU group" (used by vfio) will have links to all devices of its group listed in /sys/bus/pci/$device/iommu_group/devices; /sys/bus/pci/$device/iommu_group is actually a link to /sys/kernel/iommu_groups/$n, where $n is the group number (there will be a corresponding device node at /dev/vfio/$n once the devices are bound to the vfio-pci driver)
The following functions are added:
Naming convention is normally <object><action>, which I would interpret here as 'virPCIDeviceIOMMUGroup' as the object, since you're passing a 'virPCIDevice' or 'virPCIDeviceAddress' type as parameter.
virPCIDeviceGetIOMMUGroupList
Gets a virPCIDeviceList with one virPCIDeviceList for each device in the same IOMMU group as the provided virPCIDevice (a copy of the original device object is included in the list.
ack to this name
virPCIIOMMUGroupIterate
Calls the function @actor once for each device in the group that contains the given virPCIDeviceAddress.
virPCIDeviceIOMMUGroupIterate
virPCIGetIOMMUGroupAddresses
Fills in a virPCIDeviceAddressPtr * with an array of virPCIDeviceAddress, one for each device in the iommu group of the provided virPCIDeviceAddress (including a copy of the original).
virPCIDeviceGetIOMMUGroupAddresses
virPCIGetIOMMUGroupNum
Returns the group number as an int (a valid group number will always be 0 or greater). If there is no iommu_group link in the device's directory (usually indicating that vfio isn't loaded), -2 will be returned. On any real error, -1 will be returned.
virPCIDeviceGetIOMMUGroupNum Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 06/25/2013 06:34 AM, Daniel P. Berrange wrote:
On Mon, Jun 24, 2013 at 11:05:30PM -0400, Laine Stump wrote:
Any device which belongs to an "IOMMU group" (used by vfio) will have links to all devices of its group listed in /sys/bus/pci/$device/iommu_group/devices; /sys/bus/pci/$device/iommu_group is actually a link to /sys/kernel/iommu_groups/$n, where $n is the group number (there will be a corresponding device node at /dev/vfio/$n once the devices are bound to the vfio-pci driver)
The following functions are added: Naming convention is normally <object><action>, which I would interpret here as 'virPCIDeviceIOMMUGroup' as the object, since you're passing a 'virPCIDevice' or 'virPCIDeviceAddress' type as parameter.
virPCIDeviceGetIOMMUGroupList
Gets a virPCIDeviceList with one virPCIDeviceList for each device in the same IOMMU group as the provided virPCIDevice (a copy of the original device object is included in the list. ack to this name
virPCIIOMMUGroupIterate
Calls the function @actor once for each device in the group that contains the given virPCIDeviceAddress. virPCIDeviceIOMMUGroupIterate
Well, actually it would be virPCIDeviceAddressIOMMUGroupIterate, which is really quite a mouthful (and leads to very long lines). Due to the extreme length, and based on seeing some other function that had just a "virPCI" prefix, I chose the shorter name (although I secretly wondered if you would raise an objection). I definitely see the advantage of consistent naming, even if it's not (yet) done consistently across all functions, so I'll change these names.
virPCIGetIOMMUGroupAddresses
Fills in a virPCIDeviceAddressPtr * with an array of virPCIDeviceAddress, one for each device in the iommu group of the provided virPCIDeviceAddress (including a copy of the original). virPCIDeviceGetIOMMUGroupAddresses
virPCIDeviceAddressGetIOMMUGroupAddresses
virPCIGetIOMMUGroupNum
Returns the group number as an int (a valid group number will always be 0 or greater). If there is no iommu_group link in the device's directory (usually indicating that vfio isn't loaded), -2 will be returned. On any real error, -1 will be returned. virPCIDeviceGetIOMMUGroupNum
virPCIDeviceAddressGetIOMMUGroupNum Going through all this, I've noticed that we *still* have multiple "pci address" objects. For example virDevicePCIAddress (defined in /conf, includes a "multifunction" attribute) and virPCIDeviceAddress (defined in /util, no "multifunction"), and that many larger objects that include a PCI address just define the individual attributes instead of using one of the predefined objects. This leads to code that constructs one flavor of PCI address/device from another just to call a function that requires the other flavor. That *really* needs cleaning up...

On Mon, Jun 24, 2013 at 11:05:30PM -0400, Laine Stump wrote:
Any device which belongs to an "IOMMU group" (used by vfio) will have links to all devices of its group listed in /sys/bus/pci/$device/iommu_group/devices; /sys/bus/pci/$device/iommu_group is actually a link to /sys/kernel/iommu_groups/$n, where $n is the group number (there will be a corresponding device node at /dev/vfio/$n once the devices are bound to the vfio-pci driver)
diff --git a/src/util/virpci.c b/src/util/virpci.c index fc04cac..5209372 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1852,6 +1852,223 @@ cleanup: return ret; }
+ +/* virPCIIOMMUGroupIterate: + * Call @actor for all devices in the same iommu_group as orig + * (including orig itself) Even if there is no iommu_group for the + * device, call @actor once for orig. + */ +int +virPCIIOMMUGroupIterate(virPCIDeviceAddressPtr orig, + virPCIDeviceAddressActor actor, + void *opaque) +{ + char *groupPath = NULL; + DIR *groupDir = NULL; + int ret = -1; + struct dirent *ent; + + if (virAsprintf(&groupPath, + PCI_SYSFS "devices/%04x:%02x:%02x.%x/iommu_group/devices", + orig->domain, orig->bus, orig->slot, orig->function) < 0) { + virReportOOMError(); + goto cleanup; + } + + if (!(groupDir = opendir(groupPath))) { + /* just process the original device, nothing more */ + ret = (actor)(orig, opaque); + goto cleanup; + } +
Since Eric isn't around to nit-pick, I'll do it :-) Set 'errno = 0' here before the readdir() call
+ while ((ent = readdir(groupDir)) != NULL) { + virPCIDeviceAddress newDev; + + if (ent->d_name[0] == '.') + continue; + + if (virPCIParseDeviceAddress(ent->d_name, &newDev) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Found invalid device link '%s' in '%s'"), + ent->d_name, groupPath); + goto cleanup; + } + + if ((actor)(&newDev, opaque) < 0) + goto cleanup;
And here errno = 0;
+ }
Then do if (errno != 0) { virReportSystemError(errno, _("Failed to read directory entry for %s"), grouppath) goto cleanup } that way you distinguish EOF from error when readdir() returns NULL. /me makes a note that we should write a wrapper around readdir(), since almost all our code has this wrong.
+
+ ret = 0; + +cleanup: + VIR_FREE(groupPath); + if (groupDir) + closedir(groupDir); + return ret; +} + + +static int +virPCIDeviceGetIOMMUGroupAddOne(virPCIDeviceAddressPtr newDevAddr, void *opaque) +{ + int ret = -1; + virPCIDeviceListPtr groupList = opaque; + virPCIDevicePtr newDev; + + if (!(newDev = virPCIDeviceNew(newDevAddr->domain, newDevAddr->bus, + newDevAddr->slot, newDevAddr->function))) { + goto cleanup; + } + + if (virPCIDeviceListAdd(groupList, newDev) < 0) { + goto cleanup; + } + newDev = NULL; /* it's now on the list */ + ret = 0; +cleanup: + virPCIDeviceFree(newDev); + return ret; +} + + +/* + * virPCIDeviceGetIOMMUGroupList - return a virPCIDeviceList containing + * all of the devices in the same iommu_group as @dev. + * + * Return the new list, or NULL on failure + */ +virPCIDeviceListPtr +virPCIDeviceGetIOMMUGroupList(virPCIDevicePtr dev) +{ + virPCIDeviceListPtr groupList = virPCIDeviceListNew(); + virPCIDeviceAddress devAddr = { dev->domain, dev->bus, + dev->slot, dev->function }; + + if (!groupList) + goto error; + + if (virPCIIOMMUGroupIterate(&devAddr, virPCIDeviceGetIOMMUGroupAddOne, + groupList) < 0) { + goto error; + }
Overkill with {} here
+static int +virPCIGetIOMMUGroupAddressesAddOne(virPCIDeviceAddressPtr newDevAddr, void *opaque) +{ + int ret = -1; + virPCIDeviceAddressListPtr addrList = opaque; + virPCIDeviceAddressPtr copyAddr; + + /* make a copy to insert onto the list */ + if (VIR_ALLOC(copyAddr) < 0) + goto cleanup; + + *copyAddr = *newDevAddr; + + if (VIR_APPEND_ELEMENT(*addrList->iommuGroupDevices, + *addrList->nIommuGroupDevices, copyAddr) < 0) { + goto cleanup; + }
Overkill with {} Missing virReportOOMError() ?
+ + ret = 0; +cleanup: + VIR_FREE(copyAddr); + return ret; +} + + +/* + * virPCIGetIOMMUGroupAddresses - return a virPCIDeviceList containing + * all of the devices in the same iommu_group as @dev. + * + * Return the new list, or NULL on failure + */ +int +virPCIGetIOMMUGroupAddresses(virPCIDeviceAddressPtr devAddr, + virPCIDeviceAddressPtr **iommuGroupDevices, + size_t *nIommuGroupDevices) +{ + int ret = -1; + virPCIDeviceAddressList addrList = { iommuGroupDevices, + nIommuGroupDevices }; + + if (virPCIIOMMUGroupIterate(devAddr, + virPCIGetIOMMUGroupAddressesAddOne, + &addrList) < 0) { + goto cleanup; + }
Overkill with {}
+ + ret = 0; +cleanup: + return ret; +} +
Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 06/25/2013 06:40 AM, Daniel P. Berrange wrote:
On Mon, Jun 24, 2013 at 11:05:30PM -0400, Laine Stump wrote:
Any device which belongs to an "IOMMU group" (used by vfio) will have links to all devices of its group listed in /sys/bus/pci/$device/iommu_group/devices; /sys/bus/pci/$device/iommu_group is actually a link to /sys/kernel/iommu_groups/$n, where $n is the group number (there will be a corresponding device node at /dev/vfio/$n once the devices are bound to the vfio-pci driver) diff --git a/src/util/virpci.c b/src/util/virpci.c index fc04cac..5209372 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1852,6 +1852,223 @@ cleanup: return ret; }
+ +/* virPCIIOMMUGroupIterate: + * Call @actor for all devices in the same iommu_group as orig + * (including orig itself) Even if there is no iommu_group for the + * device, call @actor once for orig. + */ +int +virPCIIOMMUGroupIterate(virPCIDeviceAddressPtr orig, + virPCIDeviceAddressActor actor, + void *opaque) +{ + char *groupPath = NULL; + DIR *groupDir = NULL; + int ret = -1; + struct dirent *ent; + + if (virAsprintf(&groupPath, + PCI_SYSFS "devices/%04x:%02x:%02x.%x/iommu_group/devices", + orig->domain, orig->bus, orig->slot, orig->function) < 0) { + virReportOOMError(); + goto cleanup; + } + + if (!(groupDir = opendir(groupPath))) { + /* just process the original device, nothing more */ + ret = (actor)(orig, opaque); + goto cleanup; + } + Since Eric isn't around to nit-pick, I'll do it :-)
Set 'errno = 0' here before the readdir() call
+ while ((ent = readdir(groupDir)) != NULL) { + virPCIDeviceAddress newDev; + + if (ent->d_name[0] == '.') + continue; + + if (virPCIParseDeviceAddress(ent->d_name, &newDev) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Found invalid device link '%s' in '%s'"), + ent->d_name, groupPath); + goto cleanup; + } + + if ((actor)(&newDev, opaque) < 0) + goto cleanup; And here
errno = 0;
+ } Then do
if (errno != 0) { virReportSystemError(errno, _("Failed to read directory entry for %s"), grouppath) goto cleanup }
that way you distinguish EOF from error when readdir() returns NULL.
/me makes a note that we should write a wrapper around readdir(), since almost all our code has this wrong.
Definitely. If I ever did know about this idiosyncracy of readdir() in the past, I had certainly forgotten it.
+ + ret = 0; + +cleanup: + VIR_FREE(groupPath); + if (groupDir) + closedir(groupDir); + return ret; +} + + +static int +virPCIDeviceGetIOMMUGroupAddOne(virPCIDeviceAddressPtr newDevAddr, void *opaque) +{ + int ret = -1; + virPCIDeviceListPtr groupList = opaque; + virPCIDevicePtr newDev; + + if (!(newDev = virPCIDeviceNew(newDevAddr->domain, newDevAddr->bus, + newDevAddr->slot, newDevAddr->function))) { + goto cleanup; + } + + if (virPCIDeviceListAdd(groupList, newDev) < 0) { + goto cleanup; + } + newDev = NULL; /* it's now on the list */ + ret = 0; +cleanup: + virPCIDeviceFree(newDev); + return ret; +} + + +/* + * virPCIDeviceGetIOMMUGroupList - return a virPCIDeviceList containing + * all of the devices in the same iommu_group as @dev. + * + * Return the new list, or NULL on failure + */ +virPCIDeviceListPtr +virPCIDeviceGetIOMMUGroupList(virPCIDevicePtr dev) +{ + virPCIDeviceListPtr groupList = virPCIDeviceListNew(); + virPCIDeviceAddress devAddr = { dev->domain, dev->bus, + dev->slot, dev->function }; + + if (!groupList) + goto error; + + if (virPCIIOMMUGroupIterate(&devAddr, virPCIDeviceGetIOMMUGroupAddOne, + groupList) < 0) { + goto error; + } Overkill with {} here
In addition to the necessary braces when the body is > 1 line, I also like to have {} when the conditional itself is > 1 line (and I've been doing that ever since I can remember). If you dislike it I can cease and desist, though :-)
+static int +virPCIGetIOMMUGroupAddressesAddOne(virPCIDeviceAddressPtr newDevAddr, void *opaque) +{ + int ret = -1; + virPCIDeviceAddressListPtr addrList = opaque; + virPCIDeviceAddressPtr copyAddr; + + /* make a copy to insert onto the list */ + if (VIR_ALLOC(copyAddr) < 0) + goto cleanup; + + *copyAddr = *newDevAddr; + + if (VIR_APPEND_ELEMENT(*addrList->iommuGroupDevices, + *addrList->nIommuGroupDevices, copyAddr) < 0) { + goto cleanup; + } Overkill with {}
Missing virReportOOMError() ?
Yes. I caught myself omitting one of those the other night, and at the time thought to myself "I think I left one out somewhere else too...". Maybe we should send *all* VIR_* macros down the same road as VIR_STRDUP, and integrate OOM error logging into them.
+ + ret = 0; +cleanup: + VIR_FREE(copyAddr); + return ret; +} + + +/* + * virPCIGetIOMMUGroupAddresses - return a virPCIDeviceList containing + * all of the devices in the same iommu_group as @dev. + * + * Return the new list, or NULL on failure + */ +int +virPCIGetIOMMUGroupAddresses(virPCIDeviceAddressPtr devAddr, + virPCIDeviceAddressPtr **iommuGroupDevices, + size_t *nIommuGroupDevices) +{ + int ret = -1; + virPCIDeviceAddressList addrList = { iommuGroupDevices, + nIommuGroupDevices }; + + if (virPCIIOMMUGroupIterate(devAddr, + virPCIGetIOMMUGroupAddressesAddOne, + &addrList) < 0) { + goto cleanup; + } Overkill with {}
Same deal - the condition is 3 lines long and this gives me a visual crush. It could just be because I'm accustomed to it though (I also used to hate the K&R/libvirt style of putting the opening brace at the end of the line, but I've gotten used to that too)

This includes adding it to the nodedev parser and formatter, docs, and test. --- docs/formatnode.html.in | 63 +++++++++++++++- docs/schemas/nodedev.rng | 11 +++ src/conf/node_device_conf.c | 86 +++++++++++++++++++++- src/conf/node_device_conf.h | 5 +- src/node_device/node_device_udev.c | 21 +++++- tests/nodedevschemadata/pci_8086_10c9_sriov_pf.xml | 16 ++++ tests/nodedevxml2xmltest.c | 1 + 7 files changed, 199 insertions(+), 4 deletions(-) create mode 100644 tests/nodedevschemadata/pci_8086_10c9_sriov_pf.xml diff --git a/docs/formatnode.html.in b/docs/formatnode.html.in index 11654e5..ad05a70 100644 --- a/docs/formatnode.html.in +++ b/docs/formatnode.html.in @@ -80,6 +80,36 @@ <dd>Vendor details from the device ROM, including an attribute <code>id</code> with the hexadecimal vendor id, and an optional text name of that vendor.</dd> + <dt><code>iommuGroup</code></dt> + <dd> + This optional element describes the "IOMMU group" this + device belongs to. If the element exists, it has a + mandatory <code>number</code> attribute which tells + the group number used for management of the group (all + devices in group "n" will be found in + "/sys/kernel/iommu_groups/n"). It will also have a + list of <code>address</code> subelements, each + containing the PCI address of a device in the same + group. The toplevel device will itself be included in + this list. + </dd> + <dt><code>capability</code></dt> + <dd> + This optional element can occur multiple times. If it + exists, it has a mandatory <code>type</code> attribute + which will be set to + either <code>physical_function</code> + or <code>virtual_functions</code>. If the type + is <code>physical_function</code>, there will be a + single <code>address</code> subelement which contains + the PCI address of the SRIOV Physical Function (PF) + that is the parent of this device (and this device is, + by implication, an SRIOV Virtual Function (VF)). If + the type is <code>virtual_functions</code>, then this + device is an SRIOV PF, and the capability element will + have a list of <code>address</code> subelements, one + for each VF on this PF. + </dd> </dl> </dd> <dt><code>usb_device</code></dt> @@ -232,7 +262,38 @@ <address>00:27:13:6a:fe:00</address> <capability type='80203'/> </capability> -</device></pre> +</device> + +<device> + <name>pci_0000_02_00_0</name> + <path>/sys/devices/pci0000:00/0000:00:04.0/0000:02:00.0</path> + <parent>pci_0000_00_04_0</parent> + <driver> + <name>igb</name> + </driver> + <capability type='pci'> + <domain>0</domain> + <bus>2</bus> + <slot>0</slot> + <function>0</function> + <product id='0x10c9'>82576 Gigabit Network Connection</product> + <vendor id='0x8086'>Intel Corporation</vendor> + <capability type='virt_functions'> + <address domain='0x0000' bus='0x02' slot='0x10' function='0x0'/> + <address domain='0x0000' bus='0x02' slot='0x10' function='0x2'/> + <address domain='0x0000' bus='0x02' slot='0x10' function='0x4'/> + <address domain='0x0000' bus='0x02' slot='0x10' function='0x6'/> + <address domain='0x0000' bus='0x02' slot='0x11' function='0x0'/> + <address domain='0x0000' bus='0x02' slot='0x11' function='0x2'/> + <address domain='0x0000' bus='0x02' slot='0x11' function='0x4'/> + </capability> + <iommuGroup number='12'> + <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/> + <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/> + </iommuGroup> + </capability> +</device> + </pre> </body> </html> diff --git a/docs/schemas/nodedev.rng b/docs/schemas/nodedev.rng index 1f5dcb9..d2bebff 100644 --- a/docs/schemas/nodedev.rng +++ b/docs/schemas/nodedev.rng @@ -144,6 +144,17 @@ </element> </optional> + <optional> + <element name='iommuGroup'> + <attribute name='number'> + <ref name='unsignedInt'/> + </attribute> + <oneOrMore> + <ref name='address'/> + </oneOrMore> + </element> + </optional> + </define> <define name='capusbdev'> diff --git a/src/conf/node_device_conf.c b/src/conf/node_device_conf.c index 548cd8f..96742ef 100644 --- a/src/conf/node_device_conf.c +++ b/src/conf/node_device_conf.c @@ -1,6 +1,7 @@ /* * node_device_conf.c: config handling for node devices * + * Copyright (C) 2009-2013 Red Hat, Inc. * Copyright (C) 2008 Virtual Iron Software, Inc. * Copyright (C) 2008 David F. Lively * @@ -31,6 +32,7 @@ #include "viralloc.h" #include "virstring.h" #include "node_device_conf.h" +#include "device_conf.h" #include "virxml.h" #include "virbuffer.h" #include "viruuid.h" @@ -330,6 +332,20 @@ char *virNodeDeviceDefFormat(const virNodeDeviceDefPtr def) } virBufferAddLit(&buf, " </capability>\n"); } + if (data->pci_dev.nIommuGroupDevices) { + virBufferAsprintf(&buf, " <iommuGroup number='%d'>\n", + data->pci_dev.iommuGroupNumber); + for (i = 0; i < data->pci_dev.nIommuGroupDevices; i++) { + virBufferAsprintf(&buf, + " <address domain='0x%.4x' bus='0x%.2x' " + "slot='0x%.2x' function='0x%.1x'/>\n", + data->pci_dev.iommuGroupDevices[i]->domain, + data->pci_dev.iommuGroupDevices[i]->bus, + data->pci_dev.iommuGroupDevices[i]->slot, + data->pci_dev.iommuGroupDevices[i]->function); + } + virBufferAddLit(&buf, " </iommuGroup>\n"); + } break; case VIR_NODE_DEV_CAP_USB_DEV: virBufferAsprintf(&buf, " <bus>%d</bus>\n", data->usb_dev.bus); @@ -966,12 +982,71 @@ out: } static int +virNodeDevCapPciDevIommuGroupParseXML(xmlXPathContextPtr ctxt, + xmlNodePtr iommuGroupNode, + union _virNodeDevCapData *data) +{ + xmlNodePtr origNode = ctxt->node; + xmlNodePtr *addrNodes = NULL; + char *numberStr = NULL; + int nAddrNodes, ii, ret = -1; + virPCIDeviceAddressPtr pciAddr = NULL; + + ctxt->node = iommuGroupNode; + + numberStr = virXMLPropString(iommuGroupNode, "number"); + if (!numberStr) { + virReportError(VIR_ERR_XML_ERROR, + "%s", _("missing iommuGroup number attribute")); + goto cleanup; + } + if (virStrToLong_ui(numberStr, NULL, 10, + &data->pci_dev.iommuGroupNumber) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid iommuGroup number attribute '%s'"), + numberStr); + goto cleanup; + } + + if ((nAddrNodes = virXPathNodeSet("./address", ctxt, &addrNodes)) < 0) + goto cleanup; + + for (ii = 0; ii < nAddrNodes; ii++) { + virDevicePCIAddress addr = { 0, 0, 0, 0, 0 }; + if (virDevicePCIAddressParseXML(addrNodes[ii], &addr) < 0) + goto cleanup; + if (VIR_ALLOC(pciAddr) < 0) { + virReportOOMError(); + goto cleanup; + } + pciAddr->domain = addr.domain; + pciAddr->bus = addr.bus; + pciAddr->slot = addr.slot; + pciAddr->function = addr.function; + if (VIR_APPEND_ELEMENT(data->pci_dev.iommuGroupDevices, + data->pci_dev.nIommuGroupDevices, + pciAddr) < 0) { + virReportOOMError(); + goto cleanup; + } + } + + ret = 0; +cleanup: + ctxt->node = origNode; + VIR_FREE(addrNodes); + VIR_FREE(pciAddr); + return ret; +} + + +static int virNodeDevCapPciDevParseXML(xmlXPathContextPtr ctxt, virNodeDeviceDefPtr def, xmlNodePtr node, union _virNodeDevCapData *data) { - xmlNodePtr orignode; + xmlNodePtr orignode, iommuGroupNode; int ret = -1; orignode = ctxt->node; @@ -1016,6 +1091,12 @@ virNodeDevCapPciDevParseXML(xmlXPathContextPtr ctxt, data->pci_dev.vendor_name = virXPathString("string(./vendor[1])", ctxt); data->pci_dev.product_name = virXPathString("string(./product[1])", ctxt); + if ((iommuGroupNode = virXPathNode("./iommuGroup[1]", ctxt))) { + if (virNodeDevCapPciDevIommuGroupParseXML(ctxt, iommuGroupNode, + data) < 0) { + goto out; + } + } ret = 0; out: ctxt->node = orignode; @@ -1385,6 +1466,9 @@ void virNodeDevCapsDefFree(virNodeDevCapsDefPtr caps) for (i = 0; i < data->pci_dev.num_virtual_functions; i++) { VIR_FREE(data->pci_dev.virtual_functions[i]); } + for (i = 0; i < data->pci_dev.nIommuGroupDevices; i++) { + VIR_FREE(data->pci_dev.iommuGroupDevices[i]); + } break; case VIR_NODE_DEV_CAP_USB_DEV: VIR_FREE(data->usb_dev.product_name); diff --git a/src/conf/node_device_conf.h b/src/conf/node_device_conf.h index 17240be..ec35da2 100644 --- a/src/conf/node_device_conf.h +++ b/src/conf/node_device_conf.h @@ -1,7 +1,7 @@ /* * node_device_conf.h: config handling for node devices * - * Copyright (C) 2010-2011 Red Hat, Inc. + * Copyright (C) 2009-2013 Red Hat, Inc. * Copyright (C) 2008 Virtual Iron Software, Inc. * Copyright (C) 2008 David F. Lively * @@ -112,6 +112,9 @@ struct _virNodeDevCapsDef { virPCIDeviceAddressPtr *virtual_functions; unsigned int num_virtual_functions; unsigned int flags; + virPCIDeviceAddressPtr *iommuGroupDevices; + size_t nIommuGroupDevices; + unsigned int iommuGroupNumber; } pci_dev; struct { unsigned int bus; diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index b94be80..9bb4a17 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -421,7 +421,8 @@ static int udevProcessPCI(struct udev_device *device, { const char *syspath = NULL; union _virNodeDevCapData *data = &def->caps->data; - int ret = -1; + virPCIDeviceAddress addr; + int tmpGroup, ret = -1; char *p; int rc; @@ -501,6 +502,24 @@ static int udevProcessPCI(struct udev_device *device, else if (!rc && (data->pci_dev.num_virtual_functions > 0)) data->pci_dev.flags |= VIR_NODE_DEV_CAP_FLAG_PCI_VIRTUAL_FUNCTION; + /* iommu group */ + addr.domain = data->pci_dev.domain; + addr.bus = data->pci_dev.bus; + addr.slot = data->pci_dev.slot; + addr.function = data->pci_dev.function; + tmpGroup = virPCIGetIOMMUGroupNum(&addr); + if (tmpGroup == -1) { + /* error was already reported */ + goto out; + /* -2 return means there is no iommu_group data */ + } else if (tmpGroup >= 0) { + if (virPCIGetIOMMUGroupAddresses(&addr, &data->pci_dev.iommuGroupDevices, + &data->pci_dev.nIommuGroupDevices) < 0) { + goto out; + } + data->pci_dev.iommuGroupNumber = tmpGroup; + } + ret = 0; out: diff --git a/tests/nodedevschemadata/pci_8086_10c9_sriov_pf.xml b/tests/nodedevschemadata/pci_8086_10c9_sriov_pf.xml new file mode 100644 index 0000000..eff8932 --- /dev/null +++ b/tests/nodedevschemadata/pci_8086_10c9_sriov_pf.xml @@ -0,0 +1,16 @@ +<device> + <name>pci_0000_02_00_0</name> + <parent>pci_0000_00_04_0</parent> + <capability type='pci'> + <domain>0</domain> + <bus>2</bus> + <slot>0</slot> + <function>0</function> + <product id='0x10c9'>82576 Gigabit Network Connection</product> + <vendor id='0x8086'>Intel Corporation</vendor> + <iommuGroup number='12'> + <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/> + <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/> + </iommuGroup> + </capability> +</device> diff --git a/tests/nodedevxml2xmltest.c b/tests/nodedevxml2xmltest.c index e81c617..ed49857 100644 --- a/tests/nodedevxml2xmltest.c +++ b/tests/nodedevxml2xmltest.c @@ -78,6 +78,7 @@ mymain(void) DO_TEST("net_00_13_02_b9_f9_d3"); DO_TEST("net_00_15_58_2f_e9_55"); DO_TEST("pci_1002_71c4"); + DO_TEST("pci_8086_10c9_sriov_pf"); DO_TEST("pci_8086_27c5_scsi_host_0"); DO_TEST("pci_8086_27c5_scsi_host_scsi_device_lun0"); DO_TEST("pci_8086_27c5_scsi_host_scsi_host"); -- 1.7.11.7

On Mon, Jun 24, 2013 at 11:05:31PM -0400, Laine Stump wrote:
This includes adding it to the nodedev parser and formatter, docs, and test. --- docs/formatnode.html.in | 63 +++++++++++++++- docs/schemas/nodedev.rng | 11 +++ src/conf/node_device_conf.c | 86 +++++++++++++++++++++- src/conf/node_device_conf.h | 5 +- src/node_device/node_device_udev.c | 21 +++++- tests/nodedevschemadata/pci_8086_10c9_sriov_pf.xml | 16 ++++ tests/nodedevxml2xmltest.c | 1 + 7 files changed, 199 insertions(+), 4 deletions(-) create mode 100644 tests/nodedevschemadata/pci_8086_10c9_sriov_pf.xml
ACK, but could you illustrate the XML syntax in the commit message. I know its shown in the docs, but it makes life easier when browsing 'git log' Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

I just realized that I had only implemented this for the udev nodeDevice driver, but not the HAL driver. I can easily add the same code into the HAL driver, but don't have any system to test building it on. Should I put that code in untested, or leave the HAL driver without this functionality? (Does anyone still use HAL?) On 06/24/2013 11:05 PM, Laine Stump wrote:
This includes adding it to the nodedev parser and formatter, docs, and test. --- docs/formatnode.html.in | 63 +++++++++++++++- docs/schemas/nodedev.rng | 11 +++ src/conf/node_device_conf.c | 86 +++++++++++++++++++++- src/conf/node_device_conf.h | 5 +- src/node_device/node_device_udev.c | 21 +++++- tests/nodedevschemadata/pci_8086_10c9_sriov_pf.xml | 16 ++++ tests/nodedevxml2xmltest.c | 1 + 7 files changed, 199 insertions(+), 4 deletions(-) create mode 100644 tests/nodedevschemadata/pci_8086_10c9_sriov_pf.xml
diff --git a/docs/formatnode.html.in b/docs/formatnode.html.in index 11654e5..ad05a70 100644 --- a/docs/formatnode.html.in +++ b/docs/formatnode.html.in @@ -80,6 +80,36 @@ <dd>Vendor details from the device ROM, including an attribute <code>id</code> with the hexadecimal vendor id, and an optional text name of that vendor.</dd> + <dt><code>iommuGroup</code></dt> + <dd> + This optional element describes the "IOMMU group" this + device belongs to. If the element exists, it has a + mandatory <code>number</code> attribute which tells + the group number used for management of the group (all + devices in group "n" will be found in + "/sys/kernel/iommu_groups/n"). It will also have a + list of <code>address</code> subelements, each + containing the PCI address of a device in the same + group. The toplevel device will itself be included in + this list. + </dd> + <dt><code>capability</code></dt> + <dd> + This optional element can occur multiple times. If it + exists, it has a mandatory <code>type</code> attribute + which will be set to + either <code>physical_function</code> + or <code>virtual_functions</code>. If the type + is <code>physical_function</code>, there will be a + single <code>address</code> subelement which contains + the PCI address of the SRIOV Physical Function (PF) + that is the parent of this device (and this device is, + by implication, an SRIOV Virtual Function (VF)). If + the type is <code>virtual_functions</code>, then this + device is an SRIOV PF, and the capability element will + have a list of <code>address</code> subelements, one + for each VF on this PF. + </dd> </dl> </dd> <dt><code>usb_device</code></dt> @@ -232,7 +262,38 @@ <address>00:27:13:6a:fe:00</address> <capability type='80203'/> </capability> -</device></pre> +</device> + +<device> + <name>pci_0000_02_00_0</name> + <path>/sys/devices/pci0000:00/0000:00:04.0/0000:02:00.0</path> + <parent>pci_0000_00_04_0</parent> + <driver> + <name>igb</name> + </driver> + <capability type='pci'> + <domain>0</domain> + <bus>2</bus> + <slot>0</slot> + <function>0</function> + <product id='0x10c9'>82576 Gigabit Network Connection</product> + <vendor id='0x8086'>Intel Corporation</vendor> + <capability type='virt_functions'> + <address domain='0x0000' bus='0x02' slot='0x10' function='0x0'/> + <address domain='0x0000' bus='0x02' slot='0x10' function='0x2'/> + <address domain='0x0000' bus='0x02' slot='0x10' function='0x4'/> + <address domain='0x0000' bus='0x02' slot='0x10' function='0x6'/> + <address domain='0x0000' bus='0x02' slot='0x11' function='0x0'/> + <address domain='0x0000' bus='0x02' slot='0x11' function='0x2'/> + <address domain='0x0000' bus='0x02' slot='0x11' function='0x4'/> + </capability> + <iommuGroup number='12'> + <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/> + <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/> + </iommuGroup> + </capability> +</device> + </pre>
</body> </html> diff --git a/docs/schemas/nodedev.rng b/docs/schemas/nodedev.rng index 1f5dcb9..d2bebff 100644 --- a/docs/schemas/nodedev.rng +++ b/docs/schemas/nodedev.rng @@ -144,6 +144,17 @@ </element> </optional>
+ <optional> + <element name='iommuGroup'> + <attribute name='number'> + <ref name='unsignedInt'/> + </attribute> + <oneOrMore> + <ref name='address'/> + </oneOrMore> + </element> + </optional> + </define>
<define name='capusbdev'> diff --git a/src/conf/node_device_conf.c b/src/conf/node_device_conf.c index 548cd8f..96742ef 100644 --- a/src/conf/node_device_conf.c +++ b/src/conf/node_device_conf.c @@ -1,6 +1,7 @@ /* * node_device_conf.c: config handling for node devices * + * Copyright (C) 2009-2013 Red Hat, Inc. * Copyright (C) 2008 Virtual Iron Software, Inc. * Copyright (C) 2008 David F. Lively * @@ -31,6 +32,7 @@ #include "viralloc.h" #include "virstring.h" #include "node_device_conf.h" +#include "device_conf.h" #include "virxml.h" #include "virbuffer.h" #include "viruuid.h" @@ -330,6 +332,20 @@ char *virNodeDeviceDefFormat(const virNodeDeviceDefPtr def) } virBufferAddLit(&buf, " </capability>\n"); } + if (data->pci_dev.nIommuGroupDevices) { + virBufferAsprintf(&buf, " <iommuGroup number='%d'>\n", + data->pci_dev.iommuGroupNumber); + for (i = 0; i < data->pci_dev.nIommuGroupDevices; i++) { + virBufferAsprintf(&buf, + " <address domain='0x%.4x' bus='0x%.2x' " + "slot='0x%.2x' function='0x%.1x'/>\n", + data->pci_dev.iommuGroupDevices[i]->domain, + data->pci_dev.iommuGroupDevices[i]->bus, + data->pci_dev.iommuGroupDevices[i]->slot, + data->pci_dev.iommuGroupDevices[i]->function); + } + virBufferAddLit(&buf, " </iommuGroup>\n"); + } break; case VIR_NODE_DEV_CAP_USB_DEV: virBufferAsprintf(&buf, " <bus>%d</bus>\n", data->usb_dev.bus); @@ -966,12 +982,71 @@ out: }
static int +virNodeDevCapPciDevIommuGroupParseXML(xmlXPathContextPtr ctxt, + xmlNodePtr iommuGroupNode, + union _virNodeDevCapData *data) +{ + xmlNodePtr origNode = ctxt->node; + xmlNodePtr *addrNodes = NULL; + char *numberStr = NULL; + int nAddrNodes, ii, ret = -1; + virPCIDeviceAddressPtr pciAddr = NULL; + + ctxt->node = iommuGroupNode; + + numberStr = virXMLPropString(iommuGroupNode, "number"); + if (!numberStr) { + virReportError(VIR_ERR_XML_ERROR, + "%s", _("missing iommuGroup number attribute")); + goto cleanup; + } + if (virStrToLong_ui(numberStr, NULL, 10, + &data->pci_dev.iommuGroupNumber) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid iommuGroup number attribute '%s'"), + numberStr); + goto cleanup; + } + + if ((nAddrNodes = virXPathNodeSet("./address", ctxt, &addrNodes)) < 0) + goto cleanup; + + for (ii = 0; ii < nAddrNodes; ii++) { + virDevicePCIAddress addr = { 0, 0, 0, 0, 0 }; + if (virDevicePCIAddressParseXML(addrNodes[ii], &addr) < 0) + goto cleanup; + if (VIR_ALLOC(pciAddr) < 0) { + virReportOOMError(); + goto cleanup; + } + pciAddr->domain = addr.domain; + pciAddr->bus = addr.bus; + pciAddr->slot = addr.slot; + pciAddr->function = addr.function; + if (VIR_APPEND_ELEMENT(data->pci_dev.iommuGroupDevices, + data->pci_dev.nIommuGroupDevices, + pciAddr) < 0) { + virReportOOMError(); + goto cleanup; + } + } + + ret = 0; +cleanup: + ctxt->node = origNode; + VIR_FREE(addrNodes); + VIR_FREE(pciAddr); + return ret; +} + + +static int virNodeDevCapPciDevParseXML(xmlXPathContextPtr ctxt, virNodeDeviceDefPtr def, xmlNodePtr node, union _virNodeDevCapData *data) { - xmlNodePtr orignode; + xmlNodePtr orignode, iommuGroupNode; int ret = -1;
orignode = ctxt->node; @@ -1016,6 +1091,12 @@ virNodeDevCapPciDevParseXML(xmlXPathContextPtr ctxt, data->pci_dev.vendor_name = virXPathString("string(./vendor[1])", ctxt); data->pci_dev.product_name = virXPathString("string(./product[1])", ctxt);
+ if ((iommuGroupNode = virXPathNode("./iommuGroup[1]", ctxt))) { + if (virNodeDevCapPciDevIommuGroupParseXML(ctxt, iommuGroupNode, + data) < 0) { + goto out; + } + } ret = 0; out: ctxt->node = orignode; @@ -1385,6 +1466,9 @@ void virNodeDevCapsDefFree(virNodeDevCapsDefPtr caps) for (i = 0; i < data->pci_dev.num_virtual_functions; i++) { VIR_FREE(data->pci_dev.virtual_functions[i]); } + for (i = 0; i < data->pci_dev.nIommuGroupDevices; i++) { + VIR_FREE(data->pci_dev.iommuGroupDevices[i]); + } break; case VIR_NODE_DEV_CAP_USB_DEV: VIR_FREE(data->usb_dev.product_name); diff --git a/src/conf/node_device_conf.h b/src/conf/node_device_conf.h index 17240be..ec35da2 100644 --- a/src/conf/node_device_conf.h +++ b/src/conf/node_device_conf.h @@ -1,7 +1,7 @@ /* * node_device_conf.h: config handling for node devices * - * Copyright (C) 2010-2011 Red Hat, Inc. + * Copyright (C) 2009-2013 Red Hat, Inc. * Copyright (C) 2008 Virtual Iron Software, Inc. * Copyright (C) 2008 David F. Lively * @@ -112,6 +112,9 @@ struct _virNodeDevCapsDef { virPCIDeviceAddressPtr *virtual_functions; unsigned int num_virtual_functions; unsigned int flags; + virPCIDeviceAddressPtr *iommuGroupDevices; + size_t nIommuGroupDevices; + unsigned int iommuGroupNumber; } pci_dev; struct { unsigned int bus; diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index b94be80..9bb4a17 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -421,7 +421,8 @@ static int udevProcessPCI(struct udev_device *device, { const char *syspath = NULL; union _virNodeDevCapData *data = &def->caps->data; - int ret = -1; + virPCIDeviceAddress addr; + int tmpGroup, ret = -1; char *p; int rc;
@@ -501,6 +502,24 @@ static int udevProcessPCI(struct udev_device *device, else if (!rc && (data->pci_dev.num_virtual_functions > 0)) data->pci_dev.flags |= VIR_NODE_DEV_CAP_FLAG_PCI_VIRTUAL_FUNCTION;
+ /* iommu group */ + addr.domain = data->pci_dev.domain; + addr.bus = data->pci_dev.bus; + addr.slot = data->pci_dev.slot; + addr.function = data->pci_dev.function; + tmpGroup = virPCIGetIOMMUGroupNum(&addr); + if (tmpGroup == -1) { + /* error was already reported */ + goto out; + /* -2 return means there is no iommu_group data */ + } else if (tmpGroup >= 0) { + if (virPCIGetIOMMUGroupAddresses(&addr, &data->pci_dev.iommuGroupDevices, + &data->pci_dev.nIommuGroupDevices) < 0) { + goto out; + } + data->pci_dev.iommuGroupNumber = tmpGroup; + } + ret = 0;
out: diff --git a/tests/nodedevschemadata/pci_8086_10c9_sriov_pf.xml b/tests/nodedevschemadata/pci_8086_10c9_sriov_pf.xml new file mode 100644 index 0000000..eff8932 --- /dev/null +++ b/tests/nodedevschemadata/pci_8086_10c9_sriov_pf.xml @@ -0,0 +1,16 @@ +<device> + <name>pci_0000_02_00_0</name> + <parent>pci_0000_00_04_0</parent> + <capability type='pci'> + <domain>0</domain> + <bus>2</bus> + <slot>0</slot> + <function>0</function> + <product id='0x10c9'>82576 Gigabit Network Connection</product> + <vendor id='0x8086'>Intel Corporation</vendor> + <iommuGroup number='12'> + <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/> + <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/> + </iommuGroup> + </capability> +</device> diff --git a/tests/nodedevxml2xmltest.c b/tests/nodedevxml2xmltest.c index e81c617..ed49857 100644 --- a/tests/nodedevxml2xmltest.c +++ b/tests/nodedevxml2xmltest.c @@ -78,6 +78,7 @@ mymain(void) DO_TEST("net_00_13_02_b9_f9_d3"); DO_TEST("net_00_15_58_2f_e9_55"); DO_TEST("pci_1002_71c4"); + DO_TEST("pci_8086_10c9_sriov_pf"); DO_TEST("pci_8086_27c5_scsi_host_0"); DO_TEST("pci_8086_27c5_scsi_host_scsi_device_lun0"); DO_TEST("pci_8086_27c5_scsi_host_scsi_host");

On Wed, Jun 26, 2013 at 12:53:47AM -0400, Laine Stump wrote:
I just realized that I had only implemented this for the udev nodeDevice driver, but not the HAL driver. I can easily add the same code into the HAL driver, but don't have any system to test building it on.
Should I put that code in untested, or leave the HAL driver without this functionality? (Does anyone still use HAL?)
HAL is dead, so no system which is new enough to support vfio will be using HAL. As such you can ignore it for anything that is related to vfio IMHO Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

The same strings were being re-created multiple times just to save declaring a new variable. In the meantime, the use of the generic variable names led to confusion when trying to follow the code. This patch creates strings for: stubDriverName (was called "driver" in original args) stubDriverPath ("/sys/bus/pci/drivers/${stubDriverName}") driverLink ("${device}/driver") oldDriverName (the final component of path linked to by "${device}/driver") oldDriverPath ("/sys/bus/pci/drivers/${oldDriverName}") then re-uses them as necessary. --- src/util/virpci.c | 73 ++++++++++++++++++++++++++++--------------------------- 1 file changed, 37 insertions(+), 36 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index 5209372..128f721 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1068,22 +1068,30 @@ cleanup: static int -virPCIDeviceBindToStub(virPCIDevicePtr dev, const char *driver) +virPCIDeviceBindToStub(virPCIDevicePtr dev, + const char *stubDriverName) { int result = -1; - char *drvdir = NULL; - char *path = NULL; int reprobe = false; - /* check whether the device is already bound to a driver */ - if (virPCIDriverDir(&drvdir, driver) < 0 || - virPCIFile(&path, dev->name, "driver") < 0) { + char *stubDriverPath = NULL; + char *driverLink = NULL; + char *oldDriverPath = NULL; + char *oldDriverName = NULL; + char *path = NULL; /* reused for different purposes */ + + if (virPCIDriverDir(&stubDriverPath, stubDriverName) < 0 || + virPCIFile(&driverLink, dev->name, "driver") < 0 || + virPCIDeviceGetDriverPathAndName(dev, &oldDriverPath, + &oldDriverName) < 0) { goto cleanup; } - if (virFileExists(path)) { - if (virFileLinkPointsTo(path, drvdir)) { - /* The device is already bound to pci-stub */ + if (virFileExists(driverLink)) { + if (virFileLinkPointsTo(driverLink, stubDriverPath)) { + /* The device is already bound to the correct driver */ + VIR_DEBUG("Device %s is already bound to %s", + dev->name, stubDriverName); result = 0; goto cleanup; } @@ -1098,26 +1106,21 @@ virPCIDeviceBindToStub(virPCIDevicePtr dev, const char *driver) * is triggered for such a device, it will also be immediately * bound by the stub. */ - if (virPCIDriverFile(&path, driver, "new_id") < 0) { + if (virPCIDriverFile(&path, stubDriverName, "new_id") < 0) { goto cleanup; } if (virFileWriteStr(path, dev->id, 0) < 0) { virReportSystemError(errno, _("Failed to add PCI device ID '%s' to %s"), - dev->id, driver); + dev->id, stubDriverName); goto cleanup; } /* check whether the device is bound to pci-stub when we write dev->id to - * new_id. + * ${stubDriver}/new_id. */ - if (virPCIDriverDir(&drvdir, driver) < 0 || - virPCIFile(&path, dev->name, "driver") < 0) { - goto remove_id; - } - - if (virFileLinkPointsTo(path, drvdir)) { + if (virFileLinkPointsTo(driverLink, stubDriverPath)) { dev->unbind_from_stub = true; dev->remove_slot = true; goto remove_id; @@ -1144,33 +1147,29 @@ virPCIDeviceBindToStub(virPCIDevicePtr dev, const char *driver) /* If the device isn't already bound to pci-stub, try binding it now. */ - if (virPCIDriverDir(&drvdir, driver) < 0 || - virPCIFile(&path, dev->name, "driver") < 0) { - goto remove_id; - } - - if (!virFileLinkPointsTo(path, drvdir)) { + if (!virFileLinkPointsTo(driverLink, stubDriverPath)) { /* Xen's pciback.ko wants you to use new_slot first */ - if (virPCIDriverFile(&path, driver, "new_slot") < 0) { + if (virPCIDriverFile(&path, stubDriverName, "new_slot") < 0) { goto remove_id; } if (virFileExists(path) && virFileWriteStr(path, dev->name, 0) < 0) { virReportSystemError(errno, - _("Failed to add slot for PCI device '%s' to %s"), - dev->name, driver); + _("Failed to add slot for " + "PCI device '%s' to %s"), + dev->name, stubDriverName); goto remove_id; } dev->remove_slot = true; - if (virPCIDriverFile(&path, driver, "bind") < 0) { + if (virPCIDriverFile(&path, stubDriverName, "bind") < 0) { goto remove_id; } if (virFileWriteStr(path, dev->name, 0) < 0) { virReportSystemError(errno, _("Failed to bind PCI device '%s' to %s"), - dev->name, driver); + dev->name, stubDriverName); goto remove_id; } dev->unbind_from_stub = true; @@ -1180,11 +1179,11 @@ remove_id: /* If 'remove_id' exists, remove the device id from pci-stub's dynamic * ID table so that 'drivers_probe' works below. */ - if (virPCIDriverFile(&path, driver, "remove_id") < 0) { + if (virPCIDriverFile(&path, stubDriverName, "remove_id") < 0) { /* We do not remove PCI ID from pci-stub, and we cannot reprobe it */ if (dev->reprobe) { VIR_WARN("Could not remove PCI ID '%s' from %s, and the device " - "cannot be probed again.", dev->id, driver); + "cannot be probed again.", dev->id, stubDriverName); } dev->reprobe = false; goto cleanup; @@ -1193,12 +1192,12 @@ remove_id: if (virFileExists(path) && virFileWriteStr(path, dev->id, 0) < 0) { virReportSystemError(errno, _("Failed to remove PCI ID '%s' from %s"), - dev->id, driver); + dev->id, stubDriverName); /* remove PCI ID from pci-stub failed, and we cannot reprobe it */ if (dev->reprobe) { VIR_WARN("Failed to remove PCI ID '%s' from %s, and the device " - "cannot be probed again.", dev->id, driver); + "cannot be probed again.", dev->id, stubDriverName); } dev->reprobe = false; goto cleanup; @@ -1207,12 +1206,14 @@ remove_id: result = 0; cleanup: - VIR_FREE(drvdir); + VIR_FREE(stubDriverPath); + VIR_FREE(driverLink); + VIR_FREE(oldDriverPath); + VIR_FREE(oldDriverName); VIR_FREE(path); - if (result < 0) { + if (result < 0) virPCIDeviceUnbindFromStub(dev); - } return result; } -- 1.7.11.7

On Mon, Jun 24, 2013 at 11:05:32PM -0400, Laine Stump wrote:
The same strings were being re-created multiple times just to save declaring a new variable. In the meantime, the use of the generic variable names led to confusion when trying to follow the code. This patch creates strings for:
stubDriverName (was called "driver" in original args) stubDriverPath ("/sys/bus/pci/drivers/${stubDriverName}") driverLink ("${device}/driver") oldDriverName (the final component of path linked to by "${device}/driver") oldDriverPath ("/sys/bus/pci/drivers/${oldDriverName}")
then re-uses them as necessary. --- src/util/virpci.c | 73 ++++++++++++++++++++++++++++--------------------------- 1 file changed, 37 insertions(+), 36 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

If the device is bound to a stub driver different from what is saved in the virPCIDevice's stubDriver attribute, update it. --- src/util/virpci.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/src/util/virpci.c b/src/util/virpci.c index 128f721..cbade1b 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1079,6 +1079,7 @@ virPCIDeviceBindToStub(virPCIDevicePtr dev, char *oldDriverPath = NULL; char *oldDriverName = NULL; char *path = NULL; /* reused for different purposes */ + const char *newDriverName = NULL; if (virPCIDriverDir(&stubDriverPath, stubDriverName) < 0 || virPCIFile(&driverLink, dev->name, "driver") < 0 || @@ -1092,6 +1093,7 @@ virPCIDeviceBindToStub(virPCIDevicePtr dev, /* The device is already bound to the correct driver */ VIR_DEBUG("Device %s is already bound to %s", dev->name, stubDriverName); + newDriverName = stubDriverName; result = 0; goto cleanup; } @@ -1203,6 +1205,7 @@ remove_id: goto cleanup; } + newDriverName = stubDriverName; result = 0; cleanup: @@ -1212,6 +1215,11 @@ cleanup: VIR_FREE(oldDriverName); VIR_FREE(path); + if (newDriverName && + STRNEQ_NULLABLE(dev->stubDriver, newDriverName)) { + VIR_FREE(dev->stubDriver); + result = VIR_STRDUP(dev->stubDriver, newDriverName); + } if (result < 0) virPCIDeviceUnbindFromStub(dev); -- 1.7.11.7

On Mon, Jun 24, 2013 at 11:05:33PM -0400, Laine Stump wrote:
If the device is bound to a stub driver different from what is saved in the virPCIDevice's stubDriver attribute, update it. --- src/util/virpci.c | 8 ++++++++ 1 file changed, 8 insertions(+)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

I just learned that VFIO resets PCI devices when they are assigned to guests / returned to the host, so it is redundant for libvirt to reset the devices. This patch inhibits calling virPCIDeviceReset to devices that will be/were assigned using VFIO. --- src/qemu/qemu_hostdev.c | 4 ++++ src/qemu/qemu_hotplug.c | 5 +++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index dfe39c6..d7d54d7 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -548,6 +548,8 @@ int qemuPrepareHostdevPCIDevices(virQEMUDriverPtr driver, * can safely reset them */ for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); + if (STREQ_NULLABLE(virPCIDeviceGetStubDriver(dev), "vfio-pci")) + continue; if (virPCIDeviceReset(dev, driver->activePciHostdevs, driver->inactivePciHostdevs) < 0) goto reattachdevs; @@ -1114,6 +1116,8 @@ void qemuDomainReAttachHostdevDevices(virQEMUDriverPtr driver, for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); + if (STREQ_NULLABLE(virPCIDeviceGetStubDriver(dev), "vfio-pci")) + continue; if (virPCIDeviceReset(dev, driver->activePciHostdevs, driver->inactivePciHostdevs) < 0) { virErrorPtr err = virGetLastError(); diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 18f5fa5..46875ad 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -2528,8 +2528,9 @@ qemuDomainDetachHostPciDevice(virQEMUDriverPtr driver, if (pci) { activePci = virPCIDeviceListSteal(driver->activePciHostdevs, pci); if (activePci && - virPCIDeviceReset(activePci, driver->activePciHostdevs, - driver->inactivePciHostdevs) == 0) { + (subsys->u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO || + virPCIDeviceReset(activePci, driver->activePciHostdevs, + driver->inactivePciHostdevs) == 0)) { qemuReattachPciDevice(activePci, driver); ret = 0; } else { -- 1.7.11.7

I didn't notice that Martin had already ACKed this patch earlier today. I just pushed it, so no further review is needed (although it would still be good to get 07/12 reviewed and pushed to be sure this patch has proper information) On 06/24/2013 11:05 PM, Laine Stump wrote:
I just learned that VFIO resets PCI devices when they are assigned to guests / returned to the host, so it is redundant for libvirt to reset the devices. This patch inhibits calling virPCIDeviceReset to devices that will be/were assigned using VFIO. --- src/qemu/qemu_hostdev.c | 4 ++++ src/qemu/qemu_hotplug.c | 5 +++-- 2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index dfe39c6..d7d54d7 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -548,6 +548,8 @@ int qemuPrepareHostdevPCIDevices(virQEMUDriverPtr driver, * can safely reset them */ for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); + if (STREQ_NULLABLE(virPCIDeviceGetStubDriver(dev), "vfio-pci")) + continue; if (virPCIDeviceReset(dev, driver->activePciHostdevs, driver->inactivePciHostdevs) < 0) goto reattachdevs; @@ -1114,6 +1116,8 @@ void qemuDomainReAttachHostdevDevices(virQEMUDriverPtr driver,
for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); + if (STREQ_NULLABLE(virPCIDeviceGetStubDriver(dev), "vfio-pci")) + continue; if (virPCIDeviceReset(dev, driver->activePciHostdevs, driver->inactivePciHostdevs) < 0) { virErrorPtr err = virGetLastError(); diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 18f5fa5..46875ad 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -2528,8 +2528,9 @@ qemuDomainDetachHostPciDevice(virQEMUDriverPtr driver, if (pci) { activePci = virPCIDeviceListSteal(driver->activePciHostdevs, pci); if (activePci && - virPCIDeviceReset(activePci, driver->activePciHostdevs, - driver->inactivePciHostdevs) == 0) { + (subsys->u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO || + virPCIDeviceReset(activePci, driver->activePciHostdevs, + driver->inactivePciHostdevs) == 0)) { qemuReattachPciDevice(activePci, driver); ret = 0; } else {

On Mon, Jun 24, 2013 at 11:05:34PM -0400, Laine Stump wrote:
I just learned that VFIO resets PCI devices when they are assigned to guests / returned to the host, so it is redundant for libvirt to reset the devices. This patch inhibits calling virPCIDeviceReset to devices that will be/were assigned using VFIO. --- src/qemu/qemu_hostdev.c | 4 ++++ src/qemu/qemu_hotplug.c | 5 +++-- 2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index dfe39c6..d7d54d7 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -548,6 +548,8 @@ int qemuPrepareHostdevPCIDevices(virQEMUDriverPtr driver, * can safely reset them */ for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); + if (STREQ_NULLABLE(virPCIDeviceGetStubDriver(dev), "vfio-pci")) + continue; if (virPCIDeviceReset(dev, driver->activePciHostdevs, driver->inactivePciHostdevs) < 0) goto reattachdevs; @@ -1114,6 +1116,8 @@ void qemuDomainReAttachHostdevDevices(virQEMUDriverPtr driver,
for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); + if (STREQ_NULLABLE(virPCIDeviceGetStubDriver(dev), "vfio-pci")) + continue; if (virPCIDeviceReset(dev, driver->activePciHostdevs, driver->inactivePciHostdevs) < 0) { virErrorPtr err = virGetLastError(); diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 18f5fa5..46875ad 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -2528,8 +2528,9 @@ qemuDomainDetachHostPciDevice(virQEMUDriverPtr driver, if (pci) { activePci = virPCIDeviceListSteal(driver->activePciHostdevs, pci); if (activePci && - virPCIDeviceReset(activePci, driver->activePciHostdevs, - driver->inactivePciHostdevs) == 0) { + (subsys->u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO || + virPCIDeviceReset(activePci, driver->activePciHostdevs, + driver->inactivePciHostdevs) == 0)) {
I'm guessing that virPCIDeviceGetStubDriver() isn't returning 'pci-stub' here, right ? Otherwise you'd would have used the same pattern as earlier
qemuReattachPciDevice(activePci, driver); ret = 0; } else {
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 06/25/2013 06:44 AM, Daniel P. Berrange wrote:
On Mon, Jun 24, 2013 at 11:05:34PM -0400, Laine Stump wrote:
I just learned that VFIO resets PCI devices when they are assigned to guests / returned to the host, so it is redundant for libvirt to reset the devices. This patch inhibits calling virPCIDeviceReset to devices that will be/were assigned using VFIO. --- src/qemu/qemu_hostdev.c | 4 ++++ src/qemu/qemu_hotplug.c | 5 +++-- 2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index dfe39c6..d7d54d7 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -548,6 +548,8 @@ int qemuPrepareHostdevPCIDevices(virQEMUDriverPtr driver, * can safely reset them */ for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); + if (STREQ_NULLABLE(virPCIDeviceGetStubDriver(dev), "vfio-pci")) + continue; if (virPCIDeviceReset(dev, driver->activePciHostdevs, driver->inactivePciHostdevs) < 0) goto reattachdevs; @@ -1114,6 +1116,8 @@ void qemuDomainReAttachHostdevDevices(virQEMUDriverPtr driver,
for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); + if (STREQ_NULLABLE(virPCIDeviceGetStubDriver(dev), "vfio-pci")) + continue; if (virPCIDeviceReset(dev, driver->activePciHostdevs, driver->inactivePciHostdevs) < 0) { virErrorPtr err = virGetLastError(); diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 18f5fa5..46875ad 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -2528,8 +2528,9 @@ qemuDomainDetachHostPciDevice(virQEMUDriverPtr driver, if (pci) { activePci = virPCIDeviceListSteal(driver->activePciHostdevs, pci); if (activePci && - virPCIDeviceReset(activePci, driver->activePciHostdevs, - driver->inactivePciHostdevs) == 0) { + (subsys->u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO || + virPCIDeviceReset(activePci, driver->activePciHostdevs, + driver->inactivePciHostdevs) == 0)) { I'm guessing that virPCIDeviceGetStubDriver() isn't returning 'pci-stub' here, right ? Otherwise you'd would have used the same pattern as earlier
No, the stub driver should be set correctly at this time[*]. It was just a case of having multiple options for learning the answer in this case, and choosing the one that's more efficient. In the other cases where I need to decide whether or not to call virPCIDeviceReset, I don't have access to the virDomainHostdevDef, only to a virPCIDevice that was created from the virDomainHostdevDef, so I use the less efficient method of a strcmp of the stubDriverName. [*] When the device is initially attached to the guest, It's set from the hostdev right at the time the virPCIDevice is created based on config data in virDomainHostdevDef. That virPCIDevice is placed in the activePciHostdevs list and is what is found in "activePci" in the above code chunk. In the case that libvirtd is restarted, again the virPCIDevice that is in the activePciHostdevs list is created based on the virDomainHostdevDef in the domain's config, so it will have the correct stubDriverName.

On Tue, Jun 25, 2013 at 01:06:55PM -0400, Laine Stump wrote:
On 06/25/2013 06:44 AM, Daniel P. Berrange wrote:
On Mon, Jun 24, 2013 at 11:05:34PM -0400, Laine Stump wrote:
I just learned that VFIO resets PCI devices when they are assigned to guests / returned to the host, so it is redundant for libvirt to reset the devices. This patch inhibits calling virPCIDeviceReset to devices that will be/were assigned using VFIO. --- src/qemu/qemu_hostdev.c | 4 ++++ src/qemu/qemu_hotplug.c | 5 +++-- 2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index dfe39c6..d7d54d7 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -548,6 +548,8 @@ int qemuPrepareHostdevPCIDevices(virQEMUDriverPtr driver, * can safely reset them */ for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); + if (STREQ_NULLABLE(virPCIDeviceGetStubDriver(dev), "vfio-pci")) + continue; if (virPCIDeviceReset(dev, driver->activePciHostdevs, driver->inactivePciHostdevs) < 0) goto reattachdevs; @@ -1114,6 +1116,8 @@ void qemuDomainReAttachHostdevDevices(virQEMUDriverPtr driver,
for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); + if (STREQ_NULLABLE(virPCIDeviceGetStubDriver(dev), "vfio-pci")) + continue; if (virPCIDeviceReset(dev, driver->activePciHostdevs, driver->inactivePciHostdevs) < 0) { virErrorPtr err = virGetLastError(); diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 18f5fa5..46875ad 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -2528,8 +2528,9 @@ qemuDomainDetachHostPciDevice(virQEMUDriverPtr driver, if (pci) { activePci = virPCIDeviceListSteal(driver->activePciHostdevs, pci); if (activePci && - virPCIDeviceReset(activePci, driver->activePciHostdevs, - driver->inactivePciHostdevs) == 0) { + (subsys->u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO || + virPCIDeviceReset(activePci, driver->activePciHostdevs, + driver->inactivePciHostdevs) == 0)) { I'm guessing that virPCIDeviceGetStubDriver() isn't returning 'pci-stub' here, right ? Otherwise you'd would have used the same pattern as earlier
No, the stub driver should be set correctly at this time[*]. It was just a case of having multiple options for learning the answer in this case, and choosing the one that's more efficient.
Ok, so where I was going here, was that if the the virPCIDevicePtr object has enough info, then could we make virPCIDeviceReset() look for stub == vfio-pci, and make itself a nop. This avoids the risk of one caller forgetting to do the check itself. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 06/25/2013 01:09 PM, Daniel P. Berrange wrote:
On Tue, Jun 25, 2013 at 01:06:55PM -0400, Laine Stump wrote:
On 06/25/2013 06:44 AM, Daniel P. Berrange wrote:
On Mon, Jun 24, 2013 at 11:05:34PM -0400, Laine Stump wrote:
I just learned that VFIO resets PCI devices when they are assigned to guests / returned to the host, so it is redundant for libvirt to reset the devices. This patch inhibits calling virPCIDeviceReset to devices that will be/were assigned using VFIO. --- src/qemu/qemu_hostdev.c | 4 ++++ src/qemu/qemu_hotplug.c | 5 +++-- 2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index dfe39c6..d7d54d7 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -548,6 +548,8 @@ int qemuPrepareHostdevPCIDevices(virQEMUDriverPtr driver, * can safely reset them */ for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); + if (STREQ_NULLABLE(virPCIDeviceGetStubDriver(dev), "vfio-pci")) + continue; if (virPCIDeviceReset(dev, driver->activePciHostdevs, driver->inactivePciHostdevs) < 0) goto reattachdevs; @@ -1114,6 +1116,8 @@ void qemuDomainReAttachHostdevDevices(virQEMUDriverPtr driver,
for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); + if (STREQ_NULLABLE(virPCIDeviceGetStubDriver(dev), "vfio-pci")) + continue; if (virPCIDeviceReset(dev, driver->activePciHostdevs, driver->inactivePciHostdevs) < 0) { virErrorPtr err = virGetLastError(); diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 18f5fa5..46875ad 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -2528,8 +2528,9 @@ qemuDomainDetachHostPciDevice(virQEMUDriverPtr driver, if (pci) { activePci = virPCIDeviceListSteal(driver->activePciHostdevs, pci); if (activePci && - virPCIDeviceReset(activePci, driver->activePciHostdevs, - driver->inactivePciHostdevs) == 0) { + (subsys->u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO || + virPCIDeviceReset(activePci, driver->activePciHostdevs, + driver->inactivePciHostdevs) == 0)) { I'm guessing that virPCIDeviceGetStubDriver() isn't returning 'pci-stub' here, right ? Otherwise you'd would have used the same pattern as earlier No, the stub driver should be set correctly at this time[*]. It was just a case of having multiple options for learning the answer in this case, and choosing the one that's more efficient. Ok, so where I was going here, was that if the the virPCIDevicePtr object has enough info, then could we make virPCIDeviceReset() look for stub == vfio-pci, and make itself a nop. This avoids the risk of one caller forgetting to do the check itself.
Interesting idea, and it will work as long as: 1) we only reset the device while it's bound to the stub driver (not, for example, just after it's been unbound from vfio-pci and bound to the host driver). and 2) we really never want to reset a device that is bound to vfio-pci. I just double-checked, and as far as I can see, both of these are true, so we can do it. I'll make a patch to do that as soon as I finish cleaning up the existing patches.

Make a copy of the device and add the copy to the list. (virPCIDeviceListAdd() adds the original object to the list instead). --- src/libvirt_private.syms | 1 + src/util/virpci.c | 17 +++++++++++++++++ src/util/virpci.h | 1 + 3 files changed, 19 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index e906742..bf9bd12 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1703,6 +1703,7 @@ virPCIDeviceGetUnbindFromStub; virPCIDeviceGetUsedBy; virPCIDeviceIsAssignable; virPCIDeviceListAdd; +virPCIDeviceListAddCopy; virPCIDeviceListCount; virPCIDeviceListDel; virPCIDeviceListFind; diff --git a/src/util/virpci.c b/src/util/virpci.c index cbade1b..378b4f3 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1705,6 +1705,23 @@ virPCIDeviceListAdd(virPCIDeviceListPtr list, return 0; } + +/* virPCIDeviceListAddCopy - add a *copy* of the device to this list */ +int +virPCIDeviceListAddCopy(virPCIDeviceListPtr list, virPCIDevicePtr dev) +{ + virPCIDevicePtr copy = virPCIDeviceCopy(dev); + + if (!copy) + return -1; + if (virPCIDeviceListAdd(list, copy) < 0) { + virPCIDeviceFree(copy); + return -1; + } + return 0; +} + + virPCIDevicePtr virPCIDeviceListGet(virPCIDeviceListPtr list, int idx) diff --git a/src/util/virpci.h b/src/util/virpci.h index 972f86b..5f80de3 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -83,6 +83,7 @@ void virPCIDeviceReattachInit(virPCIDevice *dev); virPCIDeviceListPtr virPCIDeviceListNew(void); int virPCIDeviceListAdd(virPCIDeviceListPtr list, virPCIDevicePtr dev); +int virPCIDeviceListAddCopy(virPCIDeviceListPtr list, virPCIDevicePtr dev); virPCIDevicePtr virPCIDeviceListGet(virPCIDeviceListPtr list, int idx); int virPCIDeviceListCount(virPCIDeviceListPtr list); -- 1.7.11.7

On Mon, Jun 24, 2013 at 11:05:35PM -0400, Laine Stump wrote:
Make a copy of the device and add the copy to the list. (virPCIDeviceListAdd() adds the original object to the list instead). --- src/libvirt_private.syms | 1 + src/util/virpci.c | 17 +++++++++++++++++ src/util/virpci.h | 1 + 3 files changed, 19 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index e906742..bf9bd12 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1703,6 +1703,7 @@ virPCIDeviceGetUnbindFromStub; virPCIDeviceGetUsedBy; virPCIDeviceIsAssignable; virPCIDeviceListAdd; +virPCIDeviceListAddCopy; virPCIDeviceListCount; virPCIDeviceListDel; virPCIDeviceListFind; diff --git a/src/util/virpci.c b/src/util/virpci.c index cbade1b..378b4f3 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1705,6 +1705,23 @@ virPCIDeviceListAdd(virPCIDeviceListPtr list, return 0; }
+ +/* virPCIDeviceListAddCopy - add a *copy* of the device to this list */ +int +virPCIDeviceListAddCopy(virPCIDeviceListPtr list, virPCIDevicePtr dev) +{ + virPCIDevicePtr copy = virPCIDeviceCopy(dev); + + if (!copy) + return -1; + if (virPCIDeviceListAdd(list, copy) < 0) { + virPCIDeviceFree(copy); + return -1; + } + return 0; +} + + virPCIDevicePtr virPCIDeviceListGet(virPCIDeviceListPtr list, int idx) diff --git a/src/util/virpci.h b/src/util/virpci.h index 972f86b..5f80de3 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -83,6 +83,7 @@ void virPCIDeviceReattachInit(virPCIDevice *dev); virPCIDeviceListPtr virPCIDeviceListNew(void); int virPCIDeviceListAdd(virPCIDeviceListPtr list, virPCIDevicePtr dev); +int virPCIDeviceListAddCopy(virPCIDeviceListPtr list, virPCIDevicePtr dev); virPCIDevicePtr virPCIDeviceListGet(virPCIDeviceListPtr list, int idx); int virPCIDeviceListCount(virPCIDeviceListPtr list);
ACK, though as a followup, it'd be nice to actually eliminate the virPCIDeviceListAdd variant, since APIs stealing ownership of passed in parameters have been a good source of bugs historically. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 06/25/2013 06:45 AM, Daniel P. Berrange wrote:
On Mon, Jun 24, 2013 at 11:05:35PM -0400, Laine Stump wrote:
Make a copy of the device and add the copy to the list. (virPCIDeviceListAdd() adds the original object to the list instead). --- src/libvirt_private.syms | 1 + src/util/virpci.c | 17 +++++++++++++++++ src/util/virpci.h | 1 + 3 files changed, 19 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index e906742..bf9bd12 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1703,6 +1703,7 @@ virPCIDeviceGetUnbindFromStub; virPCIDeviceGetUsedBy; virPCIDeviceIsAssignable; virPCIDeviceListAdd; +virPCIDeviceListAddCopy; virPCIDeviceListCount; virPCIDeviceListDel; virPCIDeviceListFind; diff --git a/src/util/virpci.c b/src/util/virpci.c index cbade1b..378b4f3 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1705,6 +1705,23 @@ virPCIDeviceListAdd(virPCIDeviceListPtr list, return 0; }
+ +/* virPCIDeviceListAddCopy - add a *copy* of the device to this list */ +int +virPCIDeviceListAddCopy(virPCIDeviceListPtr list, virPCIDevicePtr dev) +{ + virPCIDevicePtr copy = virPCIDeviceCopy(dev); + + if (!copy) + return -1; + if (virPCIDeviceListAdd(list, copy) < 0) { + virPCIDeviceFree(copy); + return -1; + } + return 0; +} + + virPCIDevicePtr virPCIDeviceListGet(virPCIDeviceListPtr list, int idx) diff --git a/src/util/virpci.h b/src/util/virpci.h index 972f86b..5f80de3 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -83,6 +83,7 @@ void virPCIDeviceReattachInit(virPCIDevice *dev); virPCIDeviceListPtr virPCIDeviceListNew(void); int virPCIDeviceListAdd(virPCIDeviceListPtr list, virPCIDevicePtr dev); +int virPCIDeviceListAddCopy(virPCIDeviceListPtr list, virPCIDevicePtr dev); virPCIDevicePtr virPCIDeviceListGet(virPCIDeviceListPtr list, int idx); int virPCIDeviceListCount(virPCIDeviceListPtr list); ACK, though as a followup, it'd be nice to actually eliminate the virPCIDeviceListAdd variant, since APIs stealing ownership of passed in parameters have been a good source of bugs historically.
Yes, and a source of confusion (which I guess is a good part of the cause of the bugs :-) The *really* ugly part of all the virPCIDeviceList code is all the places that an object isn't even stolen, but is placed on multiple lists simultaneously.

The "fix" I pushed a few commits ago would still leak a virPCIDevice in case of an OOM error. Although it's inconsequential in practice, this patch satisfies my OCD. --- src/util/virpci.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index 378b4f3..515feea 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1268,11 +1268,9 @@ virPCIDeviceDetach(virPCIDevicePtr dev, /* Add *a copy of* the dev into list inactiveDevs, if * it's not already there. */ - if (inactiveDevs && !virPCIDeviceListFind(inactiveDevs, dev)) { - virPCIDevicePtr copy = virPCIDeviceCopy(dev); - - if ((!copy) || virPCIDeviceListAdd(inactiveDevs, copy) < 0) - return -1; + if (inactiveDevs && !virPCIDeviceListFind(inactiveDevs, dev) && + virPCIDeviceListAddCopy(inactiveDevs, dev) < 0) { + return -1; } return 0; -- 1.7.11.7

On Mon, Jun 24, 2013 at 11:05:36PM -0400, Laine Stump wrote:
The "fix" I pushed a few commits ago would still leak a virPCIDevice in case of an OOM error. Although it's inconsequential in practice, this patch satisfies my OCD. --- src/util/virpci.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 378b4f3..515feea 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1268,11 +1268,9 @@ virPCIDeviceDetach(virPCIDevicePtr dev, /* Add *a copy of* the dev into list inactiveDevs, if * it's not already there. */ - if (inactiveDevs && !virPCIDeviceListFind(inactiveDevs, dev)) { - virPCIDevicePtr copy = virPCIDeviceCopy(dev); - - if ((!copy) || virPCIDeviceListAdd(inactiveDevs, copy) < 0) - return -1; + if (inactiveDevs && !virPCIDeviceListFind(inactiveDevs, dev) && + virPCIDeviceListAddCopy(inactiveDevs, dev) < 0) { + return -1; }
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

(This isn't as bad as it sounds - it's only a problem in case of an OOM error.) qemuGetActivePciHostDeviceList() had been creating a list that contained pointers to objects that were also on the activePciHostdevs list. In case of an OOM error, this newly created list would be virObjectUnref'ed, which would cause everything on the list to be freed. But all of those objects would still be on the activePciHostdevs list, which could have very bad consequences if that list was ever again accessed. The solution used here is to populate the new list with *copies* of the objects from the original list. It turns out that on return from qemuGetActivePciHostDeviceList(), the caller would almost immediately go through all the device objects and "steal" them (i.e. remove the pointer from the list but not delete it) all from either one list or the other; we now instead just *delete* (remove from the list and free) each device from one list or the other, so in the end we have the same state. --- src/qemu/qemu_hostdev.c | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index d7d54d7..09ac6ad 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -86,6 +86,12 @@ qemuGetPciHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs) } /* + * qemuGetActivePciHostDeviceList - make a new list with a *copy* of + * every virPCIDevice object that is found on the activePciHostdevs + * list *and* is in the hostdev list for this domain. + * + * Return the new list, or NULL if there was a failure. + * * Pre-condition: driver->activePciHostdevs is locked */ static virPCIDeviceListPtr @@ -101,7 +107,7 @@ qemuGetActivePciHostDeviceList(virQEMUDriverPtr driver, for (i = 0; i < nhostdevs; i++) { virDomainHostdevDefPtr hostdev = hostdevs[i]; - virPCIDevicePtr dev; + virDevicePCIAddressPtr addr; virPCIDevicePtr activeDev; if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) @@ -109,24 +115,14 @@ qemuGetActivePciHostDeviceList(virQEMUDriverPtr driver, if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) continue; - dev = virPCIDeviceNew(hostdev->source.subsys.u.pci.addr.domain, - hostdev->source.subsys.u.pci.addr.bus, - hostdev->source.subsys.u.pci.addr.slot, - hostdev->source.subsys.u.pci.addr.function); - if (!dev) { + addr = &hostdev->source.subsys.u.pci.addr; + activeDev = virPCIDeviceListFindByIDs(driver->activePciHostdevs, + addr->domain, addr->bus, + addr->slot, addr->function); + if (activeDev && virPCIDeviceListAddCopy(list, activeDev) < 0) { virObjectUnref(list); return NULL; } - - if ((activeDev = virPCIDeviceListFind(driver->activePciHostdevs, dev))) { - if (virPCIDeviceListAdd(list, activeDev) < 0) { - virPCIDeviceFree(dev); - virObjectUnref(list); - return NULL; - } - } - - virPCIDeviceFree(dev); } return list; @@ -1084,20 +1080,24 @@ void qemuDomainReAttachHostdevDevices(virQEMUDriverPtr driver, virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); virPCIDevicePtr activeDev = NULL; - /* Never delete the dev from list driver->activePciHostdevs - * if it's used by other domain. + /* delete the copy of the dev from pcidevs if it's used by + * other domain. Or delete it from activePciHostDevs if it had + * been used by this domain. */ activeDev = virPCIDeviceListFind(driver->activePciHostdevs, dev); if (activeDev && STRNEQ_NULLABLE(name, virPCIDeviceGetUsedBy(activeDev))) { - virPCIDeviceListSteal(pcidevs, dev); + virPCIDeviceListDel(pcidevs, dev); continue; } - /* virObjectUnref() will take care of freeing the dev. */ - virPCIDeviceListSteal(driver->activePciHostdevs, dev); + virPCIDeviceListDel(driver->activePciHostdevs, dev); } + /* At this point, any device that had been used by the guest is in + * pcidevs, but has been removed from activePciHostdevs. + */ + /* * For SRIOV net host devices, unset mac and port profile before * reset and reattach device -- 1.7.11.7

A loop in qemuPrepareHostdevPCIDevices() intended to cycle through all the objects on the list pcidevs was doing "while (listcount > 0)", but nothing in the body of the loop was reducing the size of the list - it was instead removing items from a *different* list. It has now been safely changed to a for() loop. --- src/qemu/qemu_hostdev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index 09ac6ad..404939e 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -638,8 +638,8 @@ inactivedevs: /* Only steal all the devices from driver->activePciHostdevs. We will * free them in virObjectUnref(). */ - while (virPCIDeviceListCount(pcidevs) > 0) { - virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, 0); + for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { + virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); virPCIDeviceListSteal(driver->activePciHostdevs, dev); } -- 1.7.11.7

On Mon, Jun 24, 2013 at 11:05:38PM -0400, Laine Stump wrote:
A loop in qemuPrepareHostdevPCIDevices() intended to cycle through all the objects on the list pcidevs was doing "while (listcount > 0)", but nothing in the body of the loop was reducing the size of the list - it was instead removing items from a *different* list. It has now been safely changed to a for() loop. --- src/qemu/qemu_hostdev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index 09ac6ad..404939e 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -638,8 +638,8 @@ inactivedevs: /* Only steal all the devices from driver->activePciHostdevs. We will * free them in virObjectUnref(). */ - while (virPCIDeviceListCount(pcidevs) > 0) { - virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, 0); + for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { + virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i); virPCIDeviceListSteal(driver->activePciHostdevs, dev); }
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 06/24/2013 11:05 PM, Laine Stump wrote:
This is what remains of yesterday's VFIO groups patchset that is considered still useful, but wasn't yet ACKed (I've pushed the 5 that were ACKed). In addition, I found a few more bugs in the virPCIDeviceList handling and have included patches for those. There are a couple of these that would be really nice to get into 1.0.7, one in particular which is new functionality, but has 4 prerequisites:
The most important patch here from a functionality point of view is PATCH 05/12, which adds the <iommuGroup> element to the output of "virsh nodedev-dumpxml". This is essential for users/management applications to have adequate information to know which devices need to be detached from host drivers. (Unfortunately, all of the patches prior to 05/12 are required for 05/12 to cleanly apply.)
Okay, I've pushed all except 04/12 (which needed changes) and 05/12 (which requires 04/12 to be pushed first). I made the changes to 04/12 and reposted. https://www.redhat.com/archives/libvir-list/2013-June/msg01090.html
participants (2)
-
Daniel P. Berrange
-
Laine Stump