[libvirt] [PATCH] Read PCI class from sysfs class file instead of config space.

When determining if a device is behind a PCI bridge, the PCI device class is checked by reading the config space. However, there are some devices which have the wrong class on the config space, but the class is initialized by Linux correctly as a PCI BRIDGE. This class can be read by the sysfs file '/sys/bus/pci/devices/xxxx:xx:xx.x/class'. One example of such bridge is IBM PCI Bridge 1014:03b9, which is identified as a Host Bridge when reading the config space. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> --- src/util/virpci.c | 38 +++++++++++++++++++++++++++++++++++--- 1 files changed, 35 insertions(+), 3 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index 8ec642f..8353411 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -344,6 +344,34 @@ virPCIDeviceRead32(virPCIDevicePtr dev, int cfgfd, unsigned int pos) } static int +virPCIDeviceReadClass(virPCIDevicePtr dev, uint16_t *device_class) +{ + char *path = NULL; + char *id_str; + int ret = 0; + unsigned int value; + + if (virPCIFile(&path, dev->name, "class") < 0) + return -1; + + /* class string is '0xNNNNNN\n' ... i.e. 9 bytes */ + if (virFileReadAll(path, 9, &id_str) < 0) { + VIR_FREE(path); + return -1; + } + + VIR_FREE(path); + + id_str[8] = '\0'; + ret = virStrToLong_ui(id_str, NULL, 16, &value); + if (ret == 0) + *device_class = (value >> 8) & 0xFFFF; + + VIR_FREE(id_str); + return ret; +} + +static int virPCIDeviceWrite(virPCIDevicePtr dev, int cfgfd, unsigned int pos, @@ -645,8 +673,8 @@ virPCIDeviceIsParent(virPCIDevicePtr dev, virPCIDevicePtr check, void *data) return 0; /* Is it a bridge? */ - device_class = virPCIDeviceRead16(check, fd, PCI_CLASS_DEVICE); - if (device_class != PCI_CLASS_BRIDGE_PCI) + ret = virPCIDeviceReadClass(check, &device_class); + if (ret == -1 || device_class != PCI_CLASS_BRIDGE_PCI) goto cleanup; /* Is it a plane? */ @@ -2110,6 +2138,7 @@ virPCIDeviceDownstreamLacksACS(virPCIDevicePtr dev) unsigned int pos; int fd; int ret = 0; + uint16_t device_class; if ((fd = virPCIDeviceConfigOpen(dev, true)) < 0) return -1; @@ -2119,8 +2148,11 @@ virPCIDeviceDownstreamLacksACS(virPCIDevicePtr dev) goto cleanup; } + if (virPCIDeviceReadClass(dev, &device_class)) + goto cleanup; + pos = dev->pcie_cap_pos; - if (!pos || virPCIDeviceRead16(dev, fd, PCI_CLASS_DEVICE) != PCI_CLASS_BRIDGE_PCI) + if (!pos || device_class != PCI_CLASS_BRIDGE_PCI) goto cleanup; flags = virPCIDeviceRead16(dev, fd, pos + PCI_EXP_FLAGS); -- 1.7.1

On Tue, Dec 10, 2013 at 07:57:01AM -0200, Thadeu Lima de Souza Cascardo wrote:
When determining if a device is behind a PCI bridge, the PCI device class is checked by reading the config space. However, there are some devices which have the wrong class on the config space, but the class is initialized by Linux correctly as a PCI BRIDGE. This class can be read by the sysfs file '/sys/bus/pci/devices/xxxx:xx:xx.x/class'.
Do you have any idea how long this file has been present in sysfs ? Hopefully it is a fairly ancient feature, so we can rely on it existing ? Otherwise we'll need to fallback to reading the config space when the sysfs file does not exist.
One example of such bridge is IBM PCI Bridge 1014:03b9, which is identified as a Host Bridge when reading the config space.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> --- src/util/virpci.c | 38 +++++++++++++++++++++++++++++++++++--- 1 files changed, 35 insertions(+), 3 deletions(-)
Looks reasonable to me, though prefer Laine to ACK it since he knows the PCI code best of anyone Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Wed, Dec 11, 2013 at 12:23:59PM +0000, Daniel P. Berrange wrote:
On Tue, Dec 10, 2013 at 07:57:01AM -0200, Thadeu Lima de Souza Cascardo wrote:
When determining if a device is behind a PCI bridge, the PCI device class is checked by reading the config space. However, there are some devices which have the wrong class on the config space, but the class is initialized by Linux correctly as a PCI BRIDGE. This class can be read by the sysfs file '/sys/bus/pci/devices/xxxx:xx:xx.x/class'.
Do you have any idea how long this file has been present in sysfs ? Hopefully it is a fairly ancient feature, so we can rely on it existing ? Otherwise we'll need to fallback to reading the config space when the sysfs file does not exist.
At least 10 years, and before the config file was available (also more than 10 years). :-)
One example of such bridge is IBM PCI Bridge 1014:03b9, which is identified as a Host Bridge when reading the config space.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> --- src/util/virpci.c | 38 +++++++++++++++++++++++++++++++++++--- 1 files changed, 35 insertions(+), 3 deletions(-)
Looks reasonable to me, though prefer Laine to ACK it since he knows the PCI code best of anyone
Thanks for your review. Cascardo.
Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 12/11/2013 02:23 PM, Daniel P. Berrange wrote:
On Tue, Dec 10, 2013 at 07:57:01AM -0200, Thadeu Lima de Souza Cascardo wrote:
When determining if a device is behind a PCI bridge, the PCI device class is checked by reading the config space. However, there are some devices which have the wrong class on the config space, but the class is initialized by Linux correctly as a PCI BRIDGE. This class can be read by the sysfs file '/sys/bus/pci/devices/xxxx:xx:xx.x/class'. Do you have any idea how long this file has been present in sysfs ? Hopefully it is a fairly ancient feature, so we can rely on it existing ? Otherwise we'll need to fallback to reading the config space when the sysfs file does not exist.
One example of such bridge is IBM PCI Bridge 1014:03b9, which is identified as a Host Bridge when reading the config space.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> --- src/util/virpci.c | 38 +++++++++++++++++++++++++++++++++++--- 1 files changed, 35 insertions(+), 3 deletions(-) Looks reasonable to me, though prefer Laine to ACK it since he knows the PCI code best of anyone
??? It's possible I'm the most recent person to poke it in a major way, but I still consider myself fairly unschooled on PCI.

On 10.12.2013 10:57, Thadeu Lima de Souza Cascardo wrote:
When determining if a device is behind a PCI bridge, the PCI device class is checked by reading the config space. However, there are some devices which have the wrong class on the config space, but the class is initialized by Linux correctly as a PCI BRIDGE. This class can be read by the sysfs file '/sys/bus/pci/devices/xxxx:xx:xx.x/class'.
One example of such bridge is IBM PCI Bridge 1014:03b9, which is identified as a Host Bridge when reading the config space.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> --- src/util/virpci.c | 38 +++++++++++++++++++++++++++++++++++--- 1 files changed, 35 insertions(+), 3 deletions(-)
Adding a regression test should be easy (esp. when virpcitest is already written). Michal

On 12/10/2013 11:57 AM, Thadeu Lima de Souza Cascardo wrote:
When determining if a device is behind a PCI bridge, the PCI device class is checked by reading the config space. However, there are some devices which have the wrong class on the config space, but the class is initialized by Linux correctly as a PCI BRIDGE. This class can be read by the sysfs file '/sys/bus/pci/devices/xxxx:xx:xx.x/class'.
Since I'm not an expert on the PCI spec, I'll just have to take your word for this :-)
One example of such bridge is IBM PCI Bridge 1014:03b9, which is identified as a Host Bridge when reading the config space.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> --- src/util/virpci.c | 38 +++++++++++++++++++++++++++++++++++--- 1 files changed, 35 insertions(+), 3 deletions(-)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 8ec642f..8353411 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -344,6 +344,34 @@ virPCIDeviceRead32(virPCIDevicePtr dev, int cfgfd, unsigned int pos) }
static int +virPCIDeviceReadClass(virPCIDevicePtr dev, uint16_t *device_class) +{ + char *path = NULL; + char *id_str; + int ret = 0; + unsigned int value; + + if (virPCIFile(&path, dev->name, "class") < 0) + return -1; + + /* class string is '0xNNNNNN\n' ... i.e. 9 bytes */ + if (virFileReadAll(path, 9, &id_str) < 0) { + VIR_FREE(path); + return -1; + } + + VIR_FREE(path); + + id_str[8] = '\0'; + ret = virStrToLong_ui(id_str, NULL, 16, &value); + if (ret == 0) + *device_class = (value >> 8) & 0xFFFF;
If the class file for some reason doesn't contain a hexadecimal number, this will return an error (-1) without having logged an error. This results in one of those "An error occurred, but the cause is unknown" messages, which is always very frustrating to debug. Even though it's *highly* unlikely that such an error would occur, we should still have the code to log it just in case.
+ + VIR_FREE(id_str); + return ret; +} + +static int virPCIDeviceWrite(virPCIDevicePtr dev, int cfgfd, unsigned int pos, @@ -645,8 +673,8 @@ virPCIDeviceIsParent(virPCIDevicePtr dev, virPCIDevicePtr check, void *data) return 0;
/* Is it a bridge? */ - device_class = virPCIDeviceRead16(check, fd, PCI_CLASS_DEVICE); - if (device_class != PCI_CLASS_BRIDGE_PCI) + ret = virPCIDeviceReadClass(check, &device_class); + if (ret == -1 || device_class != PCI_CLASS_BRIDGE_PCI) goto cleanup;
/* Is it a plane? */ @@ -2110,6 +2138,7 @@ virPCIDeviceDownstreamLacksACS(virPCIDevicePtr dev) unsigned int pos; int fd; int ret = 0; + uint16_t device_class;
if ((fd = virPCIDeviceConfigOpen(dev, true)) < 0) return -1; @@ -2119,8 +2148,11 @@ virPCIDeviceDownstreamLacksACS(virPCIDevicePtr dev) goto cleanup; }
+ if (virPCIDeviceReadClass(dev, &device_class)) + goto cleanup; + pos = dev->pcie_cap_pos; - if (!pos || virPCIDeviceRead16(dev, fd, PCI_CLASS_DEVICE) != PCI_CLASS_BRIDGE_PCI) + if (!pos || device_class != PCI_CLASS_BRIDGE_PCI) goto cleanup;
flags = virPCIDeviceRead16(dev, fd, pos + PCI_EXP_FLAGS);
Aside from the above lack of error logging, as Michal suggested a test case would be very useful.

On Wed, Dec 11, 2013 at 04:30:48PM +0200, Laine Stump wrote:
On 12/10/2013 11:57 AM, Thadeu Lima de Souza Cascardo wrote:
When determining if a device is behind a PCI bridge, the PCI device class is checked by reading the config space. However, there are some devices which have the wrong class on the config space, but the class is initialized by Linux correctly as a PCI BRIDGE. This class can be read by the sysfs file '/sys/bus/pci/devices/xxxx:xx:xx.x/class'.
Since I'm not an expert on the PCI spec, I'll just have to take your word for this :-)
Well, I'm not an expert either, but in this case, it's just a matter of dealing with bad hardware. After this patch, I managed to get a guest up with the PCI device and it worked. But if it makes everybody more comfortable (or less comfortable, realizing there is broken hardware out there :-), here are two snippets from kernel code: arch/powerpc/platforms/powernv/pci.c: /* Fixup wrong class code in p7ioc and p8 root complex */ static void pnv_p7ioc_rc_quirk(struct pci_dev *dev) { dev->class = PCI_CLASS_BRIDGE_PCI << 8; } DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM, 0x3b9, pnv_p7ioc_rc_quirk); vers/pci/host/pci-tegra.c: /* Tegra PCIE root complex wrongly reports device class */ static void tegra_pcie_fixup_class(struct pci_dev *dev) { dev->class = PCI_CLASS_BRIDGE_PCI << 8; } DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, 0x0bf0, tegra_pcie_fixup_class); DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, 0x0bf1, tegra_pcie_fixup_class); DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, 0x0e1c, tegra_pcie_fixup_class); DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, 0x0e1d, tegra_pcie_fixup_class); I just believe that, instead of introducing those same quirks into libvirt code, it's so much better to leave this at the kernel, and read the fixed value of class.
One example of such bridge is IBM PCI Bridge 1014:03b9, which is identified as a Host Bridge when reading the config space.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> --- src/util/virpci.c | 38 +++++++++++++++++++++++++++++++++++--- 1 files changed, 35 insertions(+), 3 deletions(-)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 8ec642f..8353411 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -344,6 +344,34 @@ virPCIDeviceRead32(virPCIDevicePtr dev, int cfgfd, unsigned int pos) }
static int +virPCIDeviceReadClass(virPCIDevicePtr dev, uint16_t *device_class) +{ + char *path = NULL; + char *id_str; + int ret = 0; + unsigned int value; + + if (virPCIFile(&path, dev->name, "class") < 0) + return -1; + + /* class string is '0xNNNNNN\n' ... i.e. 9 bytes */ + if (virFileReadAll(path, 9, &id_str) < 0) { + VIR_FREE(path); + return -1; + } + + VIR_FREE(path); + + id_str[8] = '\0'; + ret = virStrToLong_ui(id_str, NULL, 16, &value); + if (ret == 0) + *device_class = (value >> 8) & 0xFFFF;
If the class file for some reason doesn't contain a hexadecimal number, this will return an error (-1) without having logged an error. This results in one of those "An error occurred, but the cause is unknown" messages, which is always very frustrating to debug. Even though it's *highly* unlikely that such an error would occur, we should still have the code to log it just in case.
I will work it out and repost.
+ + VIR_FREE(id_str); + return ret; +} + +static int virPCIDeviceWrite(virPCIDevicePtr dev, int cfgfd, unsigned int pos, @@ -645,8 +673,8 @@ virPCIDeviceIsParent(virPCIDevicePtr dev, virPCIDevicePtr check, void *data) return 0;
/* Is it a bridge? */ - device_class = virPCIDeviceRead16(check, fd, PCI_CLASS_DEVICE); - if (device_class != PCI_CLASS_BRIDGE_PCI) + ret = virPCIDeviceReadClass(check, &device_class); + if (ret == -1 || device_class != PCI_CLASS_BRIDGE_PCI) goto cleanup;
/* Is it a plane? */ @@ -2110,6 +2138,7 @@ virPCIDeviceDownstreamLacksACS(virPCIDevicePtr dev) unsigned int pos; int fd; int ret = 0; + uint16_t device_class;
if ((fd = virPCIDeviceConfigOpen(dev, true)) < 0) return -1; @@ -2119,8 +2148,11 @@ virPCIDeviceDownstreamLacksACS(virPCIDevicePtr dev) goto cleanup; }
+ if (virPCIDeviceReadClass(dev, &device_class)) + goto cleanup; + pos = dev->pcie_cap_pos; - if (!pos || virPCIDeviceRead16(dev, fd, PCI_CLASS_DEVICE) != PCI_CLASS_BRIDGE_PCI) + if (!pos || device_class != PCI_CLASS_BRIDGE_PCI) goto cleanup;
flags = virPCIDeviceRead16(dev, fd, pos + PCI_EXP_FLAGS);
Aside from the above lack of error logging, as Michal suggested a test case would be very useful.
I will work on that as well. I guess this should be a different commit, and could even be applied before this one. I'll test virPCIDeviceIsAssignable, which is the public function here, and provide two test cases, one that works without this patch, and one that doesn't. Does that work? Any suggestions? Thanks for the review. Cascardo.

On 12/11/2013 04:46 PM, Thadeu Lima de Souza Cascardo wrote:
On Wed, Dec 11, 2013 at 04:30:48PM +0200, Laine Stump wrote:
Aside from the above lack of error logging, as Michal suggested a test case would be very useful. I will work on that as well. I guess this should be a different commit, and could even be applied before this one. I'll test virPCIDeviceIsAssignable, which is the public function here, and provide two test cases, one that works without this patch, and one that doesn't.
Does that work? Any suggestions?
Whenever possible, a test case for new/different functionality (or to prove that the code changes didn't break some other desired existing behavior) should be in the same commit as the new functionality (certainly not in a commit *before* the new functionality, as that would give us a version in git that failed make check). If you wanted you could make the test that "works without this patch" a separate commit, but I don't think that's necessary.
participants (4)
-
Daniel P. Berrange
-
Laine Stump
-
Michal Privoznik
-
Thadeu Lima de Souza Cascardo