On Wed, Dec 11, 2013 at 04:30:48PM +0200, Laine Stump wrote:
On 12/10/2013 11:57 AM, Thadeu Lima de Souza Cascardo wrote:
> When determining if a device is behind a PCI bridge, the PCI device
> class is checked by reading the config space. However, there are some
> devices which have the wrong class on the config space, but the class is
> initialized by Linux correctly as a PCI BRIDGE. This class can be read
> by the sysfs file '/sys/bus/pci/devices/xxxx:xx:xx.x/class'.
Since I'm not an expert on the PCI spec, I'll just have to take your
word for this :-)
Well, I'm not an expert either, but in this case, it's just a matter of
dealing with bad hardware. After this patch, I managed to get a guest up
with the PCI device and it worked. But if it makes everybody more
comfortable (or less comfortable, realizing there is broken hardware out
there :-), here are two snippets from kernel code:
arch/powerpc/platforms/powernv/pci.c:
/* Fixup wrong class code in p7ioc and p8 root complex */
static void pnv_p7ioc_rc_quirk(struct pci_dev *dev)
{
dev->class = PCI_CLASS_BRIDGE_PCI << 8;
}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM, 0x3b9, pnv_p7ioc_rc_quirk);
vers/pci/host/pci-tegra.c:
/* Tegra PCIE root complex wrongly reports device class */
static void tegra_pcie_fixup_class(struct pci_dev *dev)
{
dev->class = PCI_CLASS_BRIDGE_PCI << 8;
}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, 0x0bf0, tegra_pcie_fixup_class);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, 0x0bf1, tegra_pcie_fixup_class);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, 0x0e1c, tegra_pcie_fixup_class);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, 0x0e1d, tegra_pcie_fixup_class);
I just believe that, instead of introducing those same quirks into
libvirt code, it's so much better to leave this at the kernel, and read
the fixed value of class.
>
> One example of such bridge is IBM PCI Bridge 1014:03b9, which is
> identified as a Host Bridge when reading the config space.
>
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)linux.vnet.ibm.com>
> ---
> src/util/virpci.c | 38 +++++++++++++++++++++++++++++++++++---
> 1 files changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/src/util/virpci.c b/src/util/virpci.c
> index 8ec642f..8353411 100644
> --- a/src/util/virpci.c
> +++ b/src/util/virpci.c
> @@ -344,6 +344,34 @@ virPCIDeviceRead32(virPCIDevicePtr dev, int cfgfd, unsigned int
pos)
> }
>
> static int
> +virPCIDeviceReadClass(virPCIDevicePtr dev, uint16_t *device_class)
> +{
> + char *path = NULL;
> + char *id_str;
> + int ret = 0;
> + unsigned int value;
> +
> + if (virPCIFile(&path, dev->name, "class") < 0)
> + return -1;
> +
> + /* class string is '0xNNNNNN\n' ... i.e. 9 bytes */
> + if (virFileReadAll(path, 9, &id_str) < 0) {
> + VIR_FREE(path);
> + return -1;
> + }
> +
> + VIR_FREE(path);
> +
> + id_str[8] = '\0';
> + ret = virStrToLong_ui(id_str, NULL, 16, &value);
> + if (ret == 0)
> + *device_class = (value >> 8) & 0xFFFF;
If the class file for some reason doesn't contain a hexadecimal number,
this will return an error (-1) without having logged an error. This
results in one of those "An error occurred, but the cause is unknown"
messages, which is always very frustrating to debug. Even though it's
*highly* unlikely that such an error would occur, we should still have
the code to log it just in case.
I will work it out and repost.
> +
> + VIR_FREE(id_str);
> + return ret;
> +}
> +
> +static int
> virPCIDeviceWrite(virPCIDevicePtr dev,
> int cfgfd,
> unsigned int pos,
> @@ -645,8 +673,8 @@ virPCIDeviceIsParent(virPCIDevicePtr dev, virPCIDevicePtr check,
void *data)
> return 0;
>
> /* Is it a bridge? */
> - device_class = virPCIDeviceRead16(check, fd, PCI_CLASS_DEVICE);
> - if (device_class != PCI_CLASS_BRIDGE_PCI)
> + ret = virPCIDeviceReadClass(check, &device_class);
> + if (ret == -1 || device_class != PCI_CLASS_BRIDGE_PCI)
> goto cleanup;
>
> /* Is it a plane? */
> @@ -2110,6 +2138,7 @@ virPCIDeviceDownstreamLacksACS(virPCIDevicePtr dev)
> unsigned int pos;
> int fd;
> int ret = 0;
> + uint16_t device_class;
>
> if ((fd = virPCIDeviceConfigOpen(dev, true)) < 0)
> return -1;
> @@ -2119,8 +2148,11 @@ virPCIDeviceDownstreamLacksACS(virPCIDevicePtr dev)
> goto cleanup;
> }
>
> + if (virPCIDeviceReadClass(dev, &device_class))
> + goto cleanup;
> +
> pos = dev->pcie_cap_pos;
> - if (!pos || virPCIDeviceRead16(dev, fd, PCI_CLASS_DEVICE) !=
PCI_CLASS_BRIDGE_PCI)
> + if (!pos || device_class != PCI_CLASS_BRIDGE_PCI)
> goto cleanup;
>
> flags = virPCIDeviceRead16(dev, fd, pos + PCI_EXP_FLAGS);
Aside from the above lack of error logging, as Michal suggested a test
case would be very useful.
I will work on that as well. I guess this should be a different commit,
and could even be applied before this one. I'll test
virPCIDeviceIsAssignable, which is the public function here, and provide
two test cases, one that works without this patch, and one that doesn't.
Does that work? Any suggestions?
Thanks for the review.
Cascardo.