[PATCH 0/7] fix PCI vs PCIe detection when running libvirtd as root, but unprivileged

I think there's an adequate description in patch 6 (and maybe in the bugzilla record: https://bugzilla.redhat.com/1901685). I already typed a long description once, and git send-email failed due to network problems and threw away everything I typed. I don't feel like typing it again. Laine Stump (7): qemu: use g_autoptr for a virPCIDevice util: simplify calling of virPCIDeviceDetectFunctionLevelReset() util: simplify call to virPCIDeviceDetectPowerManagementReset() util: make read error of PCI config file more detailed util: change call sequence for virPCIDeviceFindCapabilityOffset() util: make virPCIDeviceIsPCIExpress() more intelligent qemu: remove now-redundant check for file length when determining PCIe vs. PCI src/qemu/qemu_domain_address.c | 43 ++----------- src/util/virpci.c | 107 ++++++++++++++++++++++++--------- 2 files changed, 82 insertions(+), 68 deletions(-) -- 2.28.0

The one instance of a virPCIDevice in qemuDomainDeviceCalculatePCIConnectFlags() needs to be converted to use g_autoptr as a prerequisite for a bugfix. It's in this patch by itself (rather than in a patch converting all virPCIDevice usages to g_autoptr) to simplify any backport of said bugfix. Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_domain_address.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index d872f75b38..b07672e2f4 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -803,7 +803,7 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, case VIR_DOMAIN_DEVICE_HOSTDEV: { virDomainHostdevDefPtr hostdev = dev->data.hostdev; bool isExpress = false; - virPCIDevicePtr pciDev; + g_autoptr(virPCIDevice) pciDev = NULL; virPCIDeviceAddressPtr hostAddr = &hostdev->source.subsys.u.pci.addr; if (!virHostdevIsMdevDevice(hostdev) && @@ -891,8 +891,6 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, off_t configLen = virFileLength(virPCIDeviceGetConfigPath(pciDev), -1); - virPCIDeviceFree(pciDev); - if (configLen == 256) return pciFlags; @@ -904,7 +902,6 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, * a definitive answer. */ isExpress = virPCIDeviceIsPCIExpress(pciDev); - virPCIDeviceFree(pciDev); if (isExpress) return pcieFlags; -- 2.28.0

This function returned an int, and that int was being checked for < 0 in its solitary caller, but within the function it would only ever return 0 or 1. Change the function itself to return a bool, and the caller to just directly set the flag in the virPCIDevice. Signed-off-by: Laine Stump <laine@redhat.com> --- src/util/virpci.c | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index b63abac8cc..403ef1ec8e 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -549,7 +549,7 @@ virPCIDeviceFindExtendedCapabilityOffset(virPCIDevicePtr dev, /* detects whether this device has FLR. Returns 0 if the device does * not have FLR, 1 if it does, and -1 on error */ -static int +static bool virPCIDeviceDetectFunctionLevelReset(virPCIDevicePtr dev, int cfgfd) { uint32_t caps; @@ -567,7 +567,7 @@ virPCIDeviceDetectFunctionLevelReset(virPCIDevicePtr dev, int cfgfd) caps = virPCIDeviceRead32(dev, cfgfd, dev->pcie_cap_pos + PCI_EXP_DEVCAP); if (caps & PCI_EXP_DEVCAP_FLR) { VIR_DEBUG("%s %s: detected PCIe FLR capability", dev->id, dev->name); - return 1; + return true; } } @@ -580,7 +580,7 @@ virPCIDeviceDetectFunctionLevelReset(virPCIDevicePtr dev, int cfgfd) caps = virPCIDeviceRead16(dev, cfgfd, pos + PCI_AF_CAP); if (caps & PCI_AF_CAP_FLR) { VIR_DEBUG("%s %s: detected PCI FLR capability", dev->id, dev->name); - return 1; + return true; } } @@ -596,12 +596,12 @@ virPCIDeviceDetectFunctionLevelReset(virPCIDevicePtr dev, int cfgfd) if (found) { VIR_DEBUG("%s %s: buggy device didn't advertise FLR, but is a VF; forcing flr on", dev->id, dev->name); - return 1; + return true; } VIR_DEBUG("%s %s: no FLR capability found", dev->id, dev->name); - return 0; + return false; } /* Require the device has the PCI Power Management capability @@ -885,15 +885,10 @@ virPCIDeviceTryPowerManagementReset(virPCIDevicePtr dev, int cfgfd) static int virPCIDeviceInit(virPCIDevicePtr dev, int cfgfd) { - int flr; - dev->pcie_cap_pos = virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_EXP); dev->pci_pm_cap_pos = virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_PM); - flr = virPCIDeviceDetectFunctionLevelReset(dev, cfgfd); - if (flr < 0) - return flr; - dev->has_flr = !!flr; - dev->has_pm_reset = !!virPCIDeviceDetectPowerManagementReset(dev, cfgfd); + dev->has_flr = virPCIDeviceDetectFunctionLevelReset(dev, cfgfd); + dev->has_pm_reset = !!virPCIDeviceDetectPowerManagementReset(dev, cfgfd); return 0; } -- 2.28.0

This function returned an int, but would only return 0 or 1, and the one place it was called would just use !! to convert that value to a bool. Change the function to directly return bool instead. Signed-off-by: Laine Stump <laine@redhat.com> --- src/util/virpci.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index 403ef1ec8e..31622cddfa 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -608,7 +608,7 @@ virPCIDeviceDetectFunctionLevelReset(virPCIDevicePtr dev, int cfgfd) * and that a D3hot->D0 transition will results in a full * internal reset, not just a soft reset. */ -static unsigned int +static bool virPCIDeviceDetectPowerManagementReset(virPCIDevicePtr dev, int cfgfd) { if (dev->pci_pm_cap_pos) { @@ -618,13 +618,13 @@ virPCIDeviceDetectPowerManagementReset(virPCIDevicePtr dev, int cfgfd) ctl = virPCIDeviceRead32(dev, cfgfd, dev->pci_pm_cap_pos + PCI_PM_CTRL); if (!(ctl & PCI_PM_CTRL_NO_SOFT_RESET)) { VIR_DEBUG("%s %s: detected PM reset capability", dev->id, dev->name); - return 1; + return true; } } VIR_DEBUG("%s %s: no PM reset capability found", dev->id, dev->name); - return 0; + return false; } /* Any active devices on the same domain/bus ? */ @@ -888,7 +888,7 @@ virPCIDeviceInit(virPCIDevicePtr dev, int cfgfd) dev->pcie_cap_pos = virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_EXP); dev->pci_pm_cap_pos = virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_PM); dev->has_flr = virPCIDeviceDetectFunctionLevelReset(dev, cfgfd); - dev->has_pm_reset = !!virPCIDeviceDetectPowerManagementReset(dev, cfgfd); + dev->has_pm_reset = virPCIDeviceDetectPowerManagementReset(dev, cfgfd); return 0; } -- 2.28.0

The new message is more verbose/useful, but only logged at debug level instead of as a warning (since it could easily happen in a non-error situation). Signed-off-by: Laine Stump <laine@redhat.com> --- src/util/virpci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index 31622cddfa..9109fb4f3a 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -332,8 +332,8 @@ virPCIDeviceRead(virPCIDevicePtr dev, if (lseek(cfgfd, pos, SEEK_SET) != pos || saferead(cfgfd, buf, buflen) != buflen) { - VIR_WARN("Failed to read from '%s' : %s", dev->path, - g_strerror(errno)); + VIR_DEBUG("Failed to read %u bytes at %u from '%s' : %s", + buflen, pos, dev->path, g_strerror(errno)); return -1; } return 0; -- 2.28.0

Previously there was no way to differentiate between this function 1) encountering an error while reading the pci config, and 2) determining that the device in question is a conventional PCI device, and so has no Express Capabilities. The difference between these two conditions is important, because an unprivileged libvirtd will be unable to read all of the pci config (it can only read the first 64 bytes, and will get ENOENT when it tries to seek past that limit) even though the device is in fact a PCIe device. This patch just changes virPCIDeviceFindCapabilityOffset() to put the determined offset into an argument of the function (rather than sending it back as the return value), and to return the standard "0 on success, -1 on failure". Failure is determined by checking the value of errno after each attemptd read of the config file (which can only work reliably if errno is reset to 0 before each read, and after virPCIDeviceFindCapabilityOffset() has finished examining it). (NB: if the config file is read successfully, but no Express Capabilities are found, then the function returns success, but the returned offset will be 0 (which is an impossible offset for Express Capabilities, and so easily recognizeable). An upcoming patch will take advantage of the change made here. Signed-off-by: Laine Stump <laine@redhat.com> --- src/util/virpci.c | 42 +++++++++++++++++++++++++++++++----------- 1 file changed, 31 insertions(+), 11 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index 9109fb4f3a..2f99bb93bd 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -329,6 +329,7 @@ virPCIDeviceRead(virPCIDevicePtr dev, unsigned int buflen) { memset(buf, 0, buflen); + errno = 0; if (lseek(cfgfd, pos, SEEK_SET) != pos || saferead(cfgfd, buf, buflen) != buflen) { @@ -483,19 +484,24 @@ virPCIDeviceIterDevices(virPCIDeviceIterPredicate predicate, return ret; } -static uint8_t +static int virPCIDeviceFindCapabilityOffset(virPCIDevicePtr dev, int cfgfd, - unsigned int capability) + unsigned int capability, + unsigned int *offset) { uint16_t status; uint8_t pos; + *offset = 0; /* assume failure (*nothing* can be at offset 0) */ + status = virPCIDeviceRead16(dev, cfgfd, PCI_STATUS); - if (!(status & PCI_STATUS_CAP_LIST)) - return 0; + if (errno != 0 || !(status & PCI_STATUS_CAP_LIST)) + goto error; pos = virPCIDeviceRead8(dev, cfgfd, PCI_CAPABILITY_LIST); + if (errno != 0) + goto error; /* Zero indicates last capability, capabilities can't * be in the config space header and 0xff is returned @@ -506,18 +512,31 @@ virPCIDeviceFindCapabilityOffset(virPCIDevicePtr dev, */ while (pos >= PCI_CONF_HEADER_LEN && pos != 0xff) { uint8_t capid = virPCIDeviceRead8(dev, cfgfd, pos); + if (errno != 0) + goto error; + if (capid == capability) { VIR_DEBUG("%s %s: found cap 0x%.2x at 0x%.2x", dev->id, dev->name, capability, pos); - return pos; + *offset = pos; + return 0; } pos = virPCIDeviceRead8(dev, cfgfd, pos + 1); + if (errno != 0) + goto error; } - VIR_DEBUG("%s %s: failed to find cap 0x%.2x", dev->id, dev->name, capability); + error: + VIR_DEBUG("%s %s: failed to find cap 0x%.2x (%s)", + dev->id, dev->name, capability, g_strerror(errno)); - return 0; + /* reset errno in case the failure was due to insufficient + * privileges to read the entire PCI config file + */ + errno = 0; + + return -1; } static unsigned int @@ -553,7 +572,7 @@ static bool virPCIDeviceDetectFunctionLevelReset(virPCIDevicePtr dev, int cfgfd) { uint32_t caps; - uint8_t pos; + unsigned int pos; g_autofree char *path = NULL; int found; @@ -575,7 +594,8 @@ virPCIDeviceDetectFunctionLevelReset(virPCIDevicePtr dev, int cfgfd) * the same thing, except for conventional PCI * devices. This is not common yet. */ - pos = virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_AF); + virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_AF, &pos); + if (pos) { caps = virPCIDeviceRead16(dev, cfgfd, pos + PCI_AF_CAP); if (caps & PCI_AF_CAP_FLR) { @@ -885,8 +905,8 @@ virPCIDeviceTryPowerManagementReset(virPCIDevicePtr dev, int cfgfd) static int virPCIDeviceInit(virPCIDevicePtr dev, int cfgfd) { - dev->pcie_cap_pos = virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_EXP); - dev->pci_pm_cap_pos = virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_PM); + virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_EXP, &dev->pcie_cap_pos); + virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_PM, &dev->pci_pm_cap_pos); dev->has_flr = virPCIDeviceDetectFunctionLevelReset(dev, cfgfd); dev->has_pm_reset = virPCIDeviceDetectPowerManagementReset(dev, cfgfd); -- 2.28.0

On Tue, Dec 08, 2020 at 08:37:43PM -0500, Laine Stump wrote:
Previously there was no way to differentiate between this function 1) encountering an error while reading the pci config, and 2) determining that the device in question is a conventional PCI device, and so has no Express Capabilities.
The difference between these two conditions is important, because an unprivileged libvirtd will be unable to read all of the pci config (it can only read the first 64 bytes, and will get ENOENT when it tries to seek past that limit) even though the device is in fact a PCIe device.
This patch just changes virPCIDeviceFindCapabilityOffset() to put the determined offset into an argument of the function (rather than sending it back as the return value), and to return the standard "0 on success, -1 on failure". Failure is determined by checking the value of errno after each attemptd read of the config file (which can only work reliably if errno is reset to 0 before each read, and after virPCIDeviceFindCapabilityOffset() has finished examining it).
(NB: if the config file is read successfully, but no Express Capabilities are found, then the function returns success, but the returned offset will be 0 (which is an impossible offset for Express Capabilities, and so easily recognizeable).
An upcoming patch will take advantage of the change made here.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/util/virpci.c | 42 +++++++++++++++++++++++++++++++----------- 1 file changed, 31 insertions(+), 11 deletions(-)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 9109fb4f3a..2f99bb93bd 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -329,6 +329,7 @@ virPCIDeviceRead(virPCIDevicePtr dev, unsigned int buflen) { memset(buf, 0, buflen); + errno = 0;
if (lseek(cfgfd, pos, SEEK_SET) != pos || saferead(cfgfd, buf, buflen) != buflen) { @@ -483,19 +484,24 @@ virPCIDeviceIterDevices(virPCIDeviceIterPredicate predicate, return ret; }
I think there should be some API docs added here to describe the semantics this method is trying to achieve wrt unprivileged usage and pci vs pcie.
-static uint8_t +static int virPCIDeviceFindCapabilityOffset(virPCIDevicePtr dev, int cfgfd, - unsigned int capability) + unsigned int capability, + unsigned int *offset) { uint16_t status; uint8_t pos;
+ *offset = 0; /* assume failure (*nothing* can be at offset 0) */ + status = virPCIDeviceRead16(dev, cfgfd, PCI_STATUS); - if (!(status & PCI_STATUS_CAP_LIST)) - return 0; + if (errno != 0 || !(status & PCI_STATUS_CAP_LIST)) + goto error;
pos = virPCIDeviceRead8(dev, cfgfd, PCI_CAPABILITY_LIST); + if (errno != 0) + goto error;
/* Zero indicates last capability, capabilities can't * be in the config space header and 0xff is returned @@ -506,18 +512,31 @@ virPCIDeviceFindCapabilityOffset(virPCIDevicePtr dev, */ while (pos >= PCI_CONF_HEADER_LEN && pos != 0xff) { uint8_t capid = virPCIDeviceRead8(dev, cfgfd, pos); + if (errno != 0) + goto error; + if (capid == capability) { VIR_DEBUG("%s %s: found cap 0x%.2x at 0x%.2x", dev->id, dev->name, capability, pos); - return pos; + *offset = pos; + return 0; }
pos = virPCIDeviceRead8(dev, cfgfd, pos + 1); + if (errno != 0) + goto error; }
- VIR_DEBUG("%s %s: failed to find cap 0x%.2x", dev->id, dev->name, capability); + error: + VIR_DEBUG("%s %s: failed to find cap 0x%.2x (%s)", + dev->id, dev->name, capability, g_strerror(errno));
- return 0; + /* reset errno in case the failure was due to insufficient + * privileges to read the entire PCI config file + */ + errno = 0; + + return -1; }
static unsigned int @@ -553,7 +572,7 @@ static bool virPCIDeviceDetectFunctionLevelReset(virPCIDevicePtr dev, int cfgfd) { uint32_t caps; - uint8_t pos; + unsigned int pos; g_autofree char *path = NULL; int found;
@@ -575,7 +594,8 @@ virPCIDeviceDetectFunctionLevelReset(virPCIDevicePtr dev, int cfgfd) * the same thing, except for conventional PCI * devices. This is not common yet. */ - pos = virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_AF); + virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_AF, &pos); + if (pos) { caps = virPCIDeviceRead16(dev, cfgfd, pos + PCI_AF_CAP); if (caps & PCI_AF_CAP_FLR) { @@ -885,8 +905,8 @@ virPCIDeviceTryPowerManagementReset(virPCIDevicePtr dev, int cfgfd) static int virPCIDeviceInit(virPCIDevicePtr dev, int cfgfd) { - dev->pcie_cap_pos = virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_EXP); - dev->pci_pm_cap_pos = virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_PM); + virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_EXP, &dev->pcie_cap_pos); + virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_PM, &dev->pci_pm_cap_pos); dev->has_flr = virPCIDeviceDetectFunctionLevelReset(dev, cfgfd); dev->has_pm_reset = virPCIDeviceDetectPowerManagementReset(dev, cfgfd);
-- 2.28.0
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 12/9/20 4:18 AM, Daniel P. Berrangé wrote:
On Tue, Dec 08, 2020 at 08:37:43PM -0500, Laine Stump wrote:
Previously there was no way to differentiate between this function 1) encountering an error while reading the pci config, and 2) determining that the device in question is a conventional PCI device, and so has no Express Capabilities.
The difference between these two conditions is important, because an unprivileged libvirtd will be unable to read all of the pci config (it can only read the first 64 bytes, and will get ENOENT when it tries to seek past that limit) even though the device is in fact a PCIe device.
This patch just changes virPCIDeviceFindCapabilityOffset() to put the determined offset into an argument of the function (rather than sending it back as the return value), and to return the standard "0 on success, -1 on failure". Failure is determined by checking the value of errno after each attemptd read of the config file (which can only work reliably if errno is reset to 0 before each read, and after virPCIDeviceFindCapabilityOffset() has finished examining it).
(NB: if the config file is read successfully, but no Express Capabilities are found, then the function returns success, but the returned offset will be 0 (which is an impossible offset for Express Capabilities, and so easily recognizeable).
An upcoming patch will take advantage of the change made here.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/util/virpci.c | 42 +++++++++++++++++++++++++++++++----------- 1 file changed, 31 insertions(+), 11 deletions(-)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 9109fb4f3a..2f99bb93bd 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -329,6 +329,7 @@ virPCIDeviceRead(virPCIDevicePtr dev, unsigned int buflen) { memset(buf, 0, buflen); + errno = 0;
if (lseek(cfgfd, pos, SEEK_SET) != pos || saferead(cfgfd, buf, buflen) != buflen) { @@ -483,19 +484,24 @@ virPCIDeviceIterDevices(virPCIDeviceIterPredicate predicate, return ret; }
I think there should be some API docs added here to describe the semantics this method is trying to achieve wrt unprivileged usage and pci vs pcie.
There is something about that in patch 6/7, but I think all of these functions could benefit from documenting their API (e.g. the fact that errno may be set on return from virPCIDeviceReadNN(), since the functions themselves directly return the value that is read, and so are unable indicate an error in the return value (I considered doing that, but wanted a less invasive change, and there are 20-30 callers of those functions, most of which wouldn't otherwise need to be changed). I'll add in some API docublurbs and resend the entire series just for simplicity...
-static uint8_t +static int virPCIDeviceFindCapabilityOffset(virPCIDevicePtr dev, int cfgfd, - unsigned int capability) + unsigned int capability, + unsigned int *offset) { uint16_t status; uint8_t pos;
+ *offset = 0; /* assume failure (*nothing* can be at offset 0) */ + status = virPCIDeviceRead16(dev, cfgfd, PCI_STATUS); - if (!(status & PCI_STATUS_CAP_LIST)) - return 0; + if (errno != 0 || !(status & PCI_STATUS_CAP_LIST)) + goto error;
pos = virPCIDeviceRead8(dev, cfgfd, PCI_CAPABILITY_LIST); + if (errno != 0) + goto error;
/* Zero indicates last capability, capabilities can't * be in the config space header and 0xff is returned @@ -506,18 +512,31 @@ virPCIDeviceFindCapabilityOffset(virPCIDevicePtr dev, */ while (pos >= PCI_CONF_HEADER_LEN && pos != 0xff) { uint8_t capid = virPCIDeviceRead8(dev, cfgfd, pos); + if (errno != 0) + goto error; + if (capid == capability) { VIR_DEBUG("%s %s: found cap 0x%.2x at 0x%.2x", dev->id, dev->name, capability, pos); - return pos; + *offset = pos; + return 0; }
pos = virPCIDeviceRead8(dev, cfgfd, pos + 1); + if (errno != 0) + goto error; }
- VIR_DEBUG("%s %s: failed to find cap 0x%.2x", dev->id, dev->name, capability); + error: + VIR_DEBUG("%s %s: failed to find cap 0x%.2x (%s)", + dev->id, dev->name, capability, g_strerror(errno));
- return 0; + /* reset errno in case the failure was due to insufficient + * privileges to read the entire PCI config file + */ + errno = 0; + + return -1; }
static unsigned int @@ -553,7 +572,7 @@ static bool virPCIDeviceDetectFunctionLevelReset(virPCIDevicePtr dev, int cfgfd) { uint32_t caps; - uint8_t pos; + unsigned int pos; g_autofree char *path = NULL; int found;
@@ -575,7 +594,8 @@ virPCIDeviceDetectFunctionLevelReset(virPCIDevicePtr dev, int cfgfd) * the same thing, except for conventional PCI * devices. This is not common yet. */ - pos = virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_AF); + virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_AF, &pos); + if (pos) { caps = virPCIDeviceRead16(dev, cfgfd, pos + PCI_AF_CAP); if (caps & PCI_AF_CAP_FLR) { @@ -885,8 +905,8 @@ virPCIDeviceTryPowerManagementReset(virPCIDevicePtr dev, int cfgfd) static int virPCIDeviceInit(virPCIDevicePtr dev, int cfgfd) { - dev->pcie_cap_pos = virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_EXP); - dev->pci_pm_cap_pos = virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_PM); + virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_EXP, &dev->pcie_cap_pos); + virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_PM, &dev->pci_pm_cap_pos); dev->has_flr = virPCIDeviceDetectFunctionLevelReset(dev, cfgfd); dev->has_pm_reset = virPCIDeviceDetectPowerManagementReset(dev, cfgfd);
-- 2.28.0
Regards, Daniel

Until now there has been an extra bit of code in qemuDomainDeviceCalculatePCIConnectFlag() (one of the two callers of virPCIDeviceIsPCIExpress()) that tries to determine if a device is PCIe by looking at the *length* of its sysfs config file; it only does this when libvirt is running as a non-root process. This patch takes advantage of our newfound ability to tell the difference between "I read a 0 from the device PCI config file" and "I couldn't read the PCI Express Capabilities because I don't have sufficient permission" to move the file length check down into virPCIDeviceIsPCIExpress(), and do that check any time we fail while reading the config file (not only when the process is non-root). Fixes: https://bugzilla.redhat.com/1901685 Signed-off-by: Laine Stump <laine@redhat.com> --- src/util/virpci.c | 42 ++++++++++++++++++++++++++++++++++++++---- 1 file changed, 38 insertions(+), 4 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index 2f99bb93bd..01a156dfcf 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -73,10 +73,18 @@ struct _virPCIDevice { char *used_by_drvname; char *used_by_domname; + /* The following 5 items are only valid after virPCIDeviceInit() + * has been called for the virPCIDevice object. This is *not* done + * in most cases (because it creates extra overhead, and parts of + * it can fail if libvirtd is running unprivileged) + */ unsigned int pcie_cap_pos; unsigned int pci_pm_cap_pos; bool has_flr; bool has_pm_reset; + bool is_pcie; + /**/ + bool managed; virPCIStubDriver stubDriver; @@ -594,7 +602,8 @@ virPCIDeviceDetectFunctionLevelReset(virPCIDevicePtr dev, int cfgfd) * the same thing, except for conventional PCI * devices. This is not common yet. */ - virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_AF, &pos); + if (virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_AF, &pos) < 0) + goto error; if (pos) { caps = virPCIDeviceRead16(dev, cfgfd, pos + PCI_AF_CAP); @@ -619,8 +628,8 @@ virPCIDeviceDetectFunctionLevelReset(virPCIDevicePtr dev, int cfgfd) return true; } + error: VIR_DEBUG("%s %s: no FLR capability found", dev->id, dev->name); - return false; } @@ -905,7 +914,32 @@ virPCIDeviceTryPowerManagementReset(virPCIDevicePtr dev, int cfgfd) static int virPCIDeviceInit(virPCIDevicePtr dev, int cfgfd) { - virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_EXP, &dev->pcie_cap_pos); + dev->is_pcie = false; + if (virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_EXP, &dev->pcie_cap_pos) < 0) { + /* an unprivileged process is unable to read *all* of a + * device's PCI config (it can only read the first 64 + * bytes, which isn't enough for see the Express + * Capabilities data). If virPCIDeviceFindCapabilityOffset + * returns failure (and not just a pcie_cap_pos == 0, + * which is *success* at determining the device is *not* + * PCIe) we make an educated guess based on the length of + * the device's config file - if it is 256 bytes, then it + * is definitely a legacy PCI device. If it's larger than + * that, then it is *probably PCIe (although it could be + * PCI-x, but those are extremely rare). If the config + * file can't be found (in which case the "length" will be + * -1), then we blindly assume the most likely outcome - + * PCIe. + */ + off_t configLen = virFileLength(virPCIDeviceGetConfigPath(dev), -1); + + if (configLen != 256) + dev->is_pcie = true; + + } else { + dev->is_pcie = (dev->pcie_cap_pos != 0); + } + virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_PM, &dev->pci_pm_cap_pos); dev->has_flr = virPCIDeviceDetectFunctionLevelReset(dev, cfgfd); dev->has_pm_reset = virPCIDeviceDetectPowerManagementReset(dev, cfgfd); @@ -2609,7 +2643,7 @@ virPCIDeviceIsPCIExpress(virPCIDevicePtr dev) if (virPCIDeviceInit(dev, fd) < 0) goto cleanup; - ret = dev->pcie_cap_pos != 0; + ret = dev->is_pcie; cleanup: virPCIDeviceConfigClose(dev, fd); -- 2.28.0

Now that virPCIDeviceIsPCIExpress() checks the length of the file when the process lacks sufficient privilege to read the entire PCI config file in sysfs, we can remove the open-coding for that case from its consumer. Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_domain_address.c | 38 +++------------------------------- 1 file changed, 3 insertions(+), 35 deletions(-) diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index b07672e2f4..f0ba318cc8 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -579,7 +579,6 @@ qemuDomainDeviceCalculatePCIAddressExtensionFlags(virQEMUCapsPtr qemuCaps, */ static virDomainPCIConnectFlags qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, - virQEMUDriverPtr driver, virDomainPCIConnectFlags pcieFlags, virDomainPCIConnectFlags virtioFlags) { @@ -802,7 +801,6 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, case VIR_DOMAIN_DEVICE_HOSTDEV: { virDomainHostdevDefPtr hostdev = dev->data.hostdev; - bool isExpress = false; g_autoptr(virPCIDevice) pciDev = NULL; virPCIDeviceAddressPtr hostAddr = &hostdev->source.subsys.u.pci.addr; @@ -873,37 +871,7 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, return pcieFlags; } - if (!driver->privileged) { - /* unprivileged libvirtd is unable to read *all* of a - * device's PCI config (it can only read the first 64 - * bytes, which isn't enough for the check that's done - * in virPCIDeviceIsPCIExpress()), so instead of - * trying and failing, we make an educated guess based - * on the length of the device's config file - if it - * is 256 bytes, then it is definitely a legacy PCI - * device. If it's larger than that, then it is - * *probably PCIe (although it could be PCI-x, but - * those are extremely rare). If the config file can't - * be found (in which case the "length" will be -1), - * then we blindly assume the most likely outcome - - * PCIe. - */ - off_t configLen - = virFileLength(virPCIDeviceGetConfigPath(pciDev), -1); - - if (configLen == 256) - return pciFlags; - - return pcieFlags; - } - - /* If we are running with privileges, we can examine the - * PCI config contents with virPCIDeviceIsPCIExpress() for - * a definitive answer. - */ - isExpress = virPCIDeviceIsPCIExpress(pciDev); - - if (isExpress) + if (virPCIDeviceIsPCIExpress(pciDev)) return pcieFlags; return pciFlags; @@ -1124,7 +1092,7 @@ qemuDomainFillDevicePCIConnectFlagsIter(virDomainDefPtr def G_GNUC_UNUSED, qemuDomainFillDevicePCIConnectFlagsIterData *data = opaque; info->pciConnectFlags - = qemuDomainDeviceCalculatePCIConnectFlags(dev, data->driver, + = qemuDomainDeviceCalculatePCIConnectFlags(dev, data->pcieFlags, data->virtioFlags); return 0; @@ -1468,7 +1436,7 @@ qemuDomainFillDevicePCIConnectFlags(virDomainDefPtr def, qemuDomainFillDevicePCIConnectFlagsIterInit(def, qemuCaps, driver, &data); info->pciConnectFlags - = qemuDomainDeviceCalculatePCIConnectFlags(dev, data.driver, + = qemuDomainDeviceCalculatePCIConnectFlags(dev, data.pcieFlags, data.virtioFlags); } -- 2.28.0

On 12/9/20 2:37 AM, Laine Stump wrote:
I think there's an adequate description in patch 6 (and maybe in the bugzilla record: https://bugzilla.redhat.com/1901685). I already typed a long description once, and git send-email failed due to network problems and threw away everything I typed. I don't feel like typing it again.
Laine Stump (7): qemu: use g_autoptr for a virPCIDevice util: simplify calling of virPCIDeviceDetectFunctionLevelReset() util: simplify call to virPCIDeviceDetectPowerManagementReset() util: make read error of PCI config file more detailed util: change call sequence for virPCIDeviceFindCapabilityOffset() util: make virPCIDeviceIsPCIExpress() more intelligent qemu: remove now-redundant check for file length when determining PCIe vs. PCI
src/qemu/qemu_domain_address.c | 43 ++----------- src/util/virpci.c | 107 ++++++++++++++++++++++++--------- 2 files changed, 82 insertions(+), 68 deletions(-)
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Michal
participants (3)
-
Daniel P. Berrangé
-
Laine Stump
-
Michal Privoznik