[libvirt] [PATCH v3 0/2] nodedev: Expose PCI header type information

v3: changed 'header' to 'capability', only reported for non-endpoints v2: Removed <multifunction/> reporting. https://www.redhat.com/archives/libvir-list/2016-March/msg00774.html v1: https://www.redhat.com/archives/libvir-list/2016-March/msg00645.html Martin Kletzander (2): nodedev: Indent PCI express for future fix nodedev: Expose PCI header type docs/schemas/nodedev.rng | 11 +++++++ src/conf/node_device_conf.c | 20 ++++++++++++ src/conf/node_device_conf.h | 1 + src/libvirt_private.syms | 3 ++ src/node_device/node_device_udev.c | 37 ++++++++++++---------- src/util/virpci.c | 33 +++++++++++++++++++ src/util/virpci.h | 12 +++++++ .../pci_0000_00_02_0_header_type.xml | 15 +++++++++ .../pci_0000_00_1c_0_header_type.xml | 20 ++++++++++++ tests/nodedevxml2xmltest.c | 2 ++ 10 files changed, 138 insertions(+), 16 deletions(-) create mode 100644 tests/nodedevschemadata/pci_0000_00_02_0_header_type.xml create mode 100644 tests/nodedevschemadata/pci_0000_00_1c_0_header_type.xml -- 2.7.3

Best viewed with '-w' as this is just an adjustment for future patch to be readable without that. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/node_device/node_device_udev.c | 38 ++++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index 100b44d20186..aaee0e503d7b 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -505,28 +505,30 @@ static int udevProcessPCI(struct udev_device *device, goto out; /* We need to be root to read PCI device configs */ - if (priv->privileged && virPCIDeviceIsPCIExpress(pciDev) > 0) { - if (VIR_ALLOC(pci_express) < 0) - goto out; - - if (virPCIDeviceHasPCIExpressLink(pciDev) > 0) { - if (VIR_ALLOC(pci_express->link_cap) < 0 || - VIR_ALLOC(pci_express->link_sta) < 0) + if (priv->privileged) { + if (virPCIDeviceIsPCIExpress(pciDev) > 0) { + if (VIR_ALLOC(pci_express) < 0) goto out; - if (virPCIDeviceGetLinkCapSta(pciDev, - &pci_express->link_cap->port, - &pci_express->link_cap->speed, - &pci_express->link_cap->width, - &pci_express->link_sta->speed, - &pci_express->link_sta->width) < 0) - goto out; + if (virPCIDeviceHasPCIExpressLink(pciDev) > 0) { + if (VIR_ALLOC(pci_express->link_cap) < 0 || + VIR_ALLOC(pci_express->link_sta) < 0) + goto out; - pci_express->link_sta->port = -1; /* PCIe can't negotiate port. Yet :) */ + if (virPCIDeviceGetLinkCapSta(pciDev, + &pci_express->link_cap->port, + &pci_express->link_cap->speed, + &pci_express->link_cap->width, + &pci_express->link_sta->speed, + &pci_express->link_sta->width) < 0) + goto out; + + pci_express->link_sta->port = -1; /* PCIe can't negotiate port. Yet :) */ + } + data->pci_dev.flags |= VIR_NODE_DEV_CAP_FLAG_PCIE; + data->pci_dev.pci_express = pci_express; + pci_express = NULL; } - data->pci_dev.flags |= VIR_NODE_DEV_CAP_FLAG_PCIE; - data->pci_dev.pci_express = pci_express; - pci_express = NULL; } ret = 0; -- 2.7.3

On 03/17/2016 12:18 PM, Martin Kletzander wrote:
Best viewed with '-w' as this is just an adjustment for future patch to be readable without that.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/node_device/node_device_udev.c | 38 ++++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 18 deletions(-)
ACK John

If we expose this information, which is one byte in every PCI config file, we let all mgmt apps know whether the device itself is an endpoint or not so it's easier for them to decide whether such device can be passed through into a VM (endpoint) or not (*-bridge). Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1317531 Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- docs/schemas/nodedev.rng | 11 ++++++++ src/conf/node_device_conf.c | 20 +++++++++++++ src/conf/node_device_conf.h | 1 + src/libvirt_private.syms | 3 ++ src/node_device/node_device_udev.c | 3 ++ src/util/virpci.c | 33 ++++++++++++++++++++++ src/util/virpci.h | 12 ++++++++ .../pci_0000_00_02_0_header_type.xml | 15 ++++++++++ .../pci_0000_00_1c_0_header_type.xml | 20 +++++++++++++ tests/nodedevxml2xmltest.c | 2 ++ 10 files changed, 120 insertions(+) create mode 100644 tests/nodedevschemadata/pci_0000_00_02_0_header_type.xml create mode 100644 tests/nodedevschemadata/pci_0000_00_1c_0_header_type.xml diff --git a/docs/schemas/nodedev.rng b/docs/schemas/nodedev.rng index 744dccdf5fa9..949811cacb01 100644 --- a/docs/schemas/nodedev.rng +++ b/docs/schemas/nodedev.rng @@ -169,6 +169,17 @@ </optional> <optional> + <element name='capability'> + <attribute name='type'> + <choice> + <value>pci-bridge</value> + <value>cardbus-bridge</value> + </choice> + </attribute> + </element> + </optional> + + <optional> <element name='pci-express'> <zeroOrMore> <element name='link'> diff --git a/src/conf/node_device_conf.c b/src/conf/node_device_conf.c index 0c9c348e3793..611045c679ab 100644 --- a/src/conf/node_device_conf.c +++ b/src/conf/node_device_conf.c @@ -402,6 +402,12 @@ char *virNodeDeviceDefFormat(const virNodeDeviceDef *def) if (data->pci_dev.numa_node >= 0) virBufferAsprintf(&buf, "<numa node='%d'/>\n", data->pci_dev.numa_node); + + if (data->pci_dev.hdrType) { + virBufferAsprintf(&buf, "<capability type='%s'/>\n", + virPCIHeaderTypeToString(data->pci_dev.hdrType)); + } + if (data->pci_dev.flags & VIR_NODE_DEV_CAP_FLAG_PCIE) virPCIEDeviceInfoFormat(&buf, data->pci_dev.pci_express); break; @@ -1272,6 +1278,7 @@ virNodeDevCapPCIDevParseXML(xmlXPathContextPtr ctxt, xmlNodePtr orignode, iommuGroupNode, pciExpress; int ret = -1; virPCIEDeviceInfoPtr pci_express = NULL; + char *tmp = NULL; orignode = ctxt->node; ctxt->node = node; @@ -1329,6 +1336,18 @@ virNodeDevCapPCIDevParseXML(xmlXPathContextPtr ctxt, _("invalid NUMA node ID supplied for '%s'")) < 0) goto out; + if ((tmp = virXPathString("string(./capability[1]/@type)", ctxt))) { + int hdrType = virPCIHeaderTypeFromString(tmp); + + if (hdrType <= 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unknown PCI header type '%s'"), tmp); + goto out; + } + + data->pci_dev.hdrType = hdrType; + } + if ((pciExpress = virXPathNode("./pci-express[1]", ctxt))) { if (VIR_ALLOC(pci_express) < 0) goto out; @@ -1343,6 +1362,7 @@ virNodeDevCapPCIDevParseXML(xmlXPathContextPtr ctxt, ret = 0; out: + VIR_FREE(tmp); virPCIEDeviceInfoFree(pci_express); ctxt->node = orignode; return ret; diff --git a/src/conf/node_device_conf.h b/src/conf/node_device_conf.h index d0071867ccb7..be6dd5eb4ec1 100644 --- a/src/conf/node_device_conf.h +++ b/src/conf/node_device_conf.h @@ -119,6 +119,7 @@ typedef struct _virNodeDevCapData { unsigned int iommuGroupNumber; int numa_node; virPCIEDeviceInfoPtr pci_express; + int hdrType; /* enum virPCIHeaderType or -1 */ } pci_dev; struct { unsigned int bus; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 55c3047d3db6..d2fcb868356f 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2004,11 +2004,14 @@ virPCIDeviceSetUsedBy; virPCIDeviceUnbind; virPCIDeviceWaitForCleanup; virPCIEDeviceInfoFree; +virPCIGetHeaderType; virPCIGetNetName; virPCIGetPhysicalFunction; virPCIGetVirtualFunctionIndex; virPCIGetVirtualFunctionInfo; virPCIGetVirtualFunctions; +virPCIHeaderTypeFromString; +virPCIHeaderTypeToString; virPCIIsVirtualFunction; virPCIStubDriverTypeFromString; virPCIStubDriverTypeToString; diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index aaee0e503d7b..6bff5bac7366 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -506,6 +506,9 @@ static int udevProcessPCI(struct udev_device *device, /* We need to be root to read PCI device configs */ if (priv->privileged) { + if (virPCIGetHeaderType(pciDev, &data->pci_dev.hdrType) < 0) + goto out; + if (virPCIDeviceIsPCIExpress(pciDev) > 0) { if (VIR_ALLOC(pci_express) < 0) goto out; diff --git a/src/util/virpci.c b/src/util/virpci.c index 1854318dd114..f7921f86d6de 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -62,6 +62,12 @@ VIR_ENUM_IMPL(virPCIStubDriver, VIR_PCI_STUB_DRIVER_LAST, "vfio-pci", /* VFIO */ ); +VIR_ENUM_IMPL(virPCIHeader, VIR_PCI_HEADER_LAST, + "endpoint", + "pci-bridge", + "cardbus-bridge", +); + struct _virPCIDevice { virPCIDeviceAddress address; @@ -2883,6 +2889,33 @@ virPCIDeviceGetLinkCapSta(virPCIDevicePtr dev, } +int virPCIGetHeaderType(virPCIDevicePtr dev, int *hdrType) +{ + int fd; + uint8_t type; + + *hdrType = -1; + + if ((fd = virPCIDeviceConfigOpen(dev, true)) < 0) + return -1; + + type = virPCIDeviceRead8(dev, fd, PCI_HEADER_TYPE); + + virPCIDeviceConfigClose(dev, fd); + + type &= PCI_HEADER_TYPE_MASK; + if (type >= VIR_PCI_HEADER_LAST) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unknown PCI header type '%d'"), type); + return -1; + } + + *hdrType = type; + + return 0; +} + + void virPCIEDeviceInfoFree(virPCIEDeviceInfoPtr dev) { diff --git a/src/util/virpci.h b/src/util/virpci.h index 55329c8a03d3..82f45ec4175f 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -62,6 +62,16 @@ typedef enum { VIR_ENUM_DECL(virPCIELinkSpeed) +typedef enum { + VIR_PCI_HEADER_ENDPOINT = 0, + VIR_PCI_HEADER_PCI_BRIDGE, + VIR_PCI_HEADER_CARDBUS_BRIDGE, + + VIR_PCI_HEADER_LAST +} virPCIHeaderType; + +VIR_ENUM_DECL(virPCIHeader) + typedef struct _virPCIELink virPCIELink; typedef virPCIELink *virPCIELinkPtr; struct _virPCIELink { @@ -223,6 +233,8 @@ int virPCIDeviceGetLinkCapSta(virPCIDevicePtr dev, unsigned int *sta_speed, unsigned int *sta_width); +int virPCIGetHeaderType(virPCIDevicePtr dev, int *hdrType); + void virPCIEDeviceInfoFree(virPCIEDeviceInfoPtr dev); #endif /* __VIR_PCI_H__ */ diff --git a/tests/nodedevschemadata/pci_0000_00_02_0_header_type.xml b/tests/nodedevschemadata/pci_0000_00_02_0_header_type.xml new file mode 100644 index 000000000000..5150fd1e8b61 --- /dev/null +++ b/tests/nodedevschemadata/pci_0000_00_02_0_header_type.xml @@ -0,0 +1,15 @@ +<device> + <name>pci_0000_00_02_0</name> + <parent>computer</parent> + <capability type='pci'> + <domain>0</domain> + <bus>0</bus> + <slot>2</slot> + <function>0</function> + <product id='0x0416'>4th Gen Core Processor Integrated Graphics Controller</product> + <vendor id='0x8086'>Intel Corporation</vendor> + <iommuGroup number='1'> + <address domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </iommuGroup> + </capability> +</device> diff --git a/tests/nodedevschemadata/pci_0000_00_1c_0_header_type.xml b/tests/nodedevschemadata/pci_0000_00_1c_0_header_type.xml new file mode 100644 index 000000000000..dea5f05237ff --- /dev/null +++ b/tests/nodedevschemadata/pci_0000_00_1c_0_header_type.xml @@ -0,0 +1,20 @@ +<device> + <name>pci_0000_00_1c_0</name> + <parent>computer</parent> + <capability type='pci'> + <domain>0</domain> + <bus>0</bus> + <slot>28</slot> + <function>0</function> + <product id='0x8c10'>8 Series/C220 Series Chipset Family PCI Express Root Port #1</product> + <vendor id='0x8086'>Intel Corporation</vendor> + <iommuGroup number='8'> + <address domain='0x0000' bus='0x00' slot='0x1c' function='0x0'/> + </iommuGroup> + <capability type='pci-bridge'/> + <pci-express> + <link validity='cap' port='1' speed='5' width='1'/> + <link validity='sta' speed='2.5' width='1'/> + </pci-express> + </capability> +</device> diff --git a/tests/nodedevxml2xmltest.c b/tests/nodedevxml2xmltest.c index 0089b5dadffc..f519bce6345f 100644 --- a/tests/nodedevxml2xmltest.c +++ b/tests/nodedevxml2xmltest.c @@ -91,6 +91,8 @@ mymain(void) DO_TEST("usb_device_1d6b_1_0000_00_1d_0"); DO_TEST("pci_8086_4238_pcie_wireless"); DO_TEST("pci_8086_0c0c_snd_hda_intel"); + DO_TEST("pci_0000_00_02_0_header_type"); + DO_TEST("pci_0000_00_1c_0_header_type"); return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE; } -- 2.7.3

On 03/17/2016 12:18 PM, Martin Kletzander wrote:
If we expose this information, which is one byte in every PCI config file, we let all mgmt apps know whether the device itself is an endpoint or not so it's easier for them to decide whether such device can be passed through into a VM (endpoint) or not (*-bridge).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1317531
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- docs/schemas/nodedev.rng | 11 ++++++++ src/conf/node_device_conf.c | 20 +++++++++++++ src/conf/node_device_conf.h | 1 + src/libvirt_private.syms | 3 ++ src/node_device/node_device_udev.c | 3 ++ src/util/virpci.c | 33 ++++++++++++++++++++++ src/util/virpci.h | 12 ++++++++ .../pci_0000_00_02_0_header_type.xml | 15 ++++++++++ .../pci_0000_00_1c_0_header_type.xml | 20 +++++++++++++ tests/nodedevxml2xmltest.c | 2 ++ 10 files changed, 120 insertions(+) create mode 100644 tests/nodedevschemadata/pci_0000_00_02_0_header_type.xml create mode 100644 tests/nodedevschemadata/pci_0000_00_1c_0_header_type.xml
Will there be an adjustment to formatnode.html.in too? To explain how to use... John

On Thu, Mar 17, 2016 at 12:30:48PM -0400, John Ferlan wrote:
On 03/17/2016 12:18 PM, Martin Kletzander wrote:
If we expose this information, which is one byte in every PCI config file, we let all mgmt apps know whether the device itself is an endpoint or not so it's easier for them to decide whether such device can be passed through into a VM (endpoint) or not (*-bridge).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1317531
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- docs/schemas/nodedev.rng | 11 ++++++++ src/conf/node_device_conf.c | 20 +++++++++++++ src/conf/node_device_conf.h | 1 + src/libvirt_private.syms | 3 ++ src/node_device/node_device_udev.c | 3 ++ src/util/virpci.c | 33 ++++++++++++++++++++++ src/util/virpci.h | 12 ++++++++ .../pci_0000_00_02_0_header_type.xml | 15 ++++++++++ .../pci_0000_00_1c_0_header_type.xml | 20 +++++++++++++ tests/nodedevxml2xmltest.c | 2 ++ 10 files changed, 120 insertions(+) create mode 100644 tests/nodedevschemadata/pci_0000_00_02_0_header_type.xml create mode 100644 tests/nodedevschemadata/pci_0000_00_1c_0_header_type.xml
Will there be an adjustment to formatnode.html.in too? To explain how to use...
Oh, thanks for reminding me of that. Would squashing something similar to this suffice or should I rather repost this as v4? diff --git i/docs/formatnode.html.in w/docs/formatnode.html.in index 6c12227b8952..e7b11400cbeb 100644 --- i/docs/formatnode.html.in +++ w/docs/formatnode.html.in @@ -97,27 +97,38 @@ <dd> This optional element can occur multiple times. If it exists, it has a mandatory <code>type</code> attribute - which will be set to - either <code>physical_function</code> - or <code>virtual_functions</code>. If the type - is <code>physical_function</code>, there will be a - single <code>address</code> subelement which contains - the PCI address of the SRIOV Physical Function (PF) - that is the parent of this device (and this device is, - by implication, an SRIOV Virtual Function (VF)). If - the type is <code>virtual_functions</code>, then this - device is an SRIOV PF, and the capability element will - have a list of <code>address</code> subelements, one - for each VF on this PF. If the host system supports - reporting it (via the "sriov_maxvfs" file in the - device's sysfs directory) the capability element will - also have an attribute named <code>maxCount</code> - which is the maximum number of SRIOV VFs supported by - this device, which could be higher than the number of - VFs that are curently active <span class="since">since - 1.3.0</span>; in this case, even if there are - currently no active VFs the virtual_functions - capabililty will still be shown. + which will be set to: + <dl> + <dt><code>physical_function</code></dt> + <dd> + That means there will be a single <code>address</code> + subelement which contains the PCI address of the SRIOV + Physical Function (PF) that is the parent of this device + (and this device is, by implication, an SRIOV Virtual + Function (VF)). + </dd> + <dt><code>virtual_function</code></dt> + <dd> + In this case this device is an SRIOV PF, and the capability + element will have a list of <code>address</code> + subelements, one for each VF on this PF. If the host system + supports reporting it (via the "sriov_maxvfs" file in the + device's sysfs directory) the capability element will also + have an attribute named <code>maxCount</code> which is the + maximum number of SRIOV VFs supported by this device, which + could be higher than the number of VFs that are curently + active <span class="since">since 1.3.0</span>; in this case, + even if there are currently no active VFs the + virtual_functions capabililty will still be shown. + </dd> + <dt><code>pci-bridge</code> or <code>cardbus-bridge</code></dt> + <dd> + This shows merely that the lower 7 bits of PCI header type + have either value of 1 or 2 respectively. Usually this + means such device cannot be used for PCI passthrough. + <span class="since">Since 1.3.3</span> + </dd> + </dl> </dd> <dt><code>numa</code></dt> <dd> -- Martin

On 03/18/2016 03:28 AM, Martin Kletzander wrote:
On Thu, Mar 17, 2016 at 12:30:48PM -0400, John Ferlan wrote:
On 03/17/2016 12:18 PM, Martin Kletzander wrote:
If we expose this information, which is one byte in every PCI config file, we let all mgmt apps know whether the device itself is an endpoint or not so it's easier for them to decide whether such device can be passed through into a VM (endpoint) or not (*-bridge).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1317531
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- docs/schemas/nodedev.rng | 11 ++++++++ src/conf/node_device_conf.c | 20 +++++++++++++ src/conf/node_device_conf.h | 1 + src/libvirt_private.syms | 3 ++ src/node_device/node_device_udev.c | 3 ++ src/util/virpci.c | 33 ++++++++++++++++++++++ src/util/virpci.h | 12 ++++++++ .../pci_0000_00_02_0_header_type.xml | 15 ++++++++++ .../pci_0000_00_1c_0_header_type.xml | 20 +++++++++++++ tests/nodedevxml2xmltest.c | 2 ++ 10 files changed, 120 insertions(+) create mode 100644 tests/nodedevschemadata/pci_0000_00_02_0_header_type.xml create mode 100644 tests/nodedevschemadata/pci_0000_00_1c_0_header_type.xml
Will there be an adjustment to formatnode.html.in too? To explain how to use...
Oh, thanks for reminding me of that. Would squashing something similar to this suffice or should I rather repost this as v4?
Squashing is fine - just didn't want it forgotten... I like the new formatting too. ACK as a squash - John

On Thu, Mar 17, 2016 at 05:18:50PM +0100, Martin Kletzander wrote:
If we expose this information, which is one byte in every PCI config file, we let all mgmt apps know whether the device itself is an endpoint or not so it's easier for them to decide whether such device can be passed through into a VM (endpoint) or not (*-bridge).
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1317531
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- docs/schemas/nodedev.rng | 11 ++++++++ src/conf/node_device_conf.c | 20 +++++++++++++ src/conf/node_device_conf.h | 1 + src/libvirt_private.syms | 3 ++ src/node_device/node_device_udev.c | 3 ++ src/util/virpci.c | 33 ++++++++++++++++++++++ src/util/virpci.h | 12 ++++++++ .../pci_0000_00_02_0_header_type.xml | 15 ++++++++++ .../pci_0000_00_1c_0_header_type.xml | 20 +++++++++++++ tests/nodedevxml2xmltest.c | 2 ++ 10 files changed, 120 insertions(+) create mode 100644 tests/nodedevschemadata/pci_0000_00_02_0_header_type.xml create mode 100644 tests/nodedevschemadata/pci_0000_00_1c_0_header_type.xml
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (3)
-
Daniel P. Berrange
-
John Ferlan
-
Martin Kletzander