[libvirt] [PATCH] Fix a potential race in pciInitDevice.

If detecting the FLR flag of a pci device fails, then we could run into the situation of trying to close a file descriptor twice, once in pciInitDevice() and once in pciFreeDevice(). Fix that by removing the pciCloseConfig() in pciInitDevice() and just letting pciFreeDevice() handle it. Thanks to Chris Wright for pointing out this problem. While we are at it, fix an error check. While it would actually work as-is (since success returns 0), it's still more clear to check for < 0 (as the rest of the code does). Signed-off-by: Chris Lalancette <clalance@redhat.com> --- src/qemu/qemu_driver.c | 2 +- src/util/pci.c | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index b4c468a..9436e2a 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -8010,7 +8010,7 @@ static int qemudDomainAttachHostPciDevice(struct qemud_driver *driver, return -1; } - if (qemuPrepareHostdevPCIDevices(driver, &hostdev, 1)) + if (qemuPrepareHostdevPCIDevices(driver, &hostdev, 1) < 0) return -1; if (qemuCmdFlags & QEMUD_CMD_FLAG_DEVICE) { diff --git a/src/util/pci.c b/src/util/pci.c index 14ef058..26d55b8 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -189,8 +189,10 @@ pciCloseConfig(pciDevice *dev) if (!dev) return; - if (dev->fd >= 0) + if (dev->fd >= 0) { close(dev->fd); + dev->fd = -1; + } } static int @@ -673,10 +675,8 @@ pciInitDevice(pciDevice *dev) dev->pcie_cap_pos = pciFindCapabilityOffset(dev, PCI_CAP_ID_EXP); dev->pci_pm_cap_pos = pciFindCapabilityOffset(dev, PCI_CAP_ID_PM); flr = pciDetectFunctionLevelReset(dev); - if (flr < 0) { - pciCloseConfig(dev); + if (flr < 0) return flr; - } dev->has_flr = flr; dev->has_pm_reset = pciDetectPowerManagementReset(dev); dev->initted = 1; -- 1.7.1.1

The code in libvirt that does ACS checking has a pretty serious bug that was essentially rendering the check useless. When trying to assign a device, we have to check that all bridges upstream of this device support ACS. That means that we have to find the parent of the current device, check for ACS, then find the parent of that device, check for ACS, etc. However, the code to find the parent of the device had a much too relaxed check. It would iterate through all PCI devices on the system, looking for a device that had a range of busses that included the current device's bus. That's not correct, though. Depending on how we iterated through the list of PCI devices, we could first find the *topmost* bridge in the system; since it necessarily had a range of busses including the current device's bus, we would only ever check the topmost bridge, and not check any of the intermediate bridges. This patch is simple in that it looks for the PCI device whose secondary device *exactly* equals the bus of the device we are looking for. That means that one, and only one bridge will be found, and it will be the correct device. Note that this also caused a fairly serious bug in the secondary bus reset code, where we could erroneously reset the topmost bus instead of the inner bus. This patch was tested by me on a 4-port NIC with a bridge without ACS (where assignment failed), a 4-port NIC with a bridge with ACS (where assignment succeeded), and a 2-port NIC with no bridges (where assignment succeeded). Signed-off-by: Chris Lalancette <clalance@redhat.com> --- src/util/pci.c | 5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) diff --git a/src/util/pci.c b/src/util/pci.c index 26d55b8..df30e04 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -513,7 +513,7 @@ static int pciIsParent(pciDevice *dev, pciDevice *check, void *data ATTRIBUTE_UNUSED) { uint16_t device_class; - uint8_t header_type, secondary, subordinate; + uint8_t header_type, secondary; if (dev->domain != check->domain) return 0; @@ -529,12 +529,11 @@ pciIsParent(pciDevice *dev, pciDevice *check, void *data ATTRIBUTE_UNUSED) return 0; secondary = pciRead8(check, PCI_SECONDARY_BUS); - subordinate = pciRead8(check, PCI_SUBORDINATE_BUS); VIR_DEBUG("%s %s: found parent device %s", dev->id, dev->name, check->name); /* No, it's superman! */ - return (dev->bus >= secondary && dev->bus <= subordinate); + return (dev->bus == secondary); } static pciDevice * -- 1.7.1.1

On 07/28/2010 03:07 PM, Chris Lalancette wrote:
However, the code to find the parent of the device had a much too relaxed check. It would iterate through all PCI devices on the system, looking for a device that had a range of busses that included the current device's bus.
Stupid English. I'm more used seeing buses than busses, and was about to call you on it, but http://www.googlefight.com/index.php?lang=en_GB&word1=buses&word2=busses shows both forms as roughly neck-and-neck and dictionary.com lists both spellings as acceptable plurals.
This patch is simple in that it looks for the PCI device whose secondary device *exactly* equals the bus of the device we are looking for. That means that one, and only one bridge will be found, and it will be the correct device.
Makes sense.
Note that this also caused a fairly serious bug in the
s/caused/solved/ - the bug was caused by the condition before this commit, but the commit message is documenting what this particular commit does for the code base.
/* No, it's superman! */ - return (dev->bus >= secondary && dev->bus <= subordinate); + return (dev->bus == secondary);
Do we want a comment here in the code to summarize your commit message? But that's a minor nit, so: ACK. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

Chris Lalancette wrote:
The code in libvirt that does ACS checking has a pretty serious bug that was essentially rendering the check useless. When trying to assign a device, we have to check that all bridges upstream of this device support ACS. That means that we have to find the parent of the current device, check for ACS, then find the parent of that device, check for ACS, etc.
However, the code to find the parent of the device had a much too relaxed check. It would iterate through all PCI devices on the system, looking for a device that had a range of busses that included the current device's bus.
That's not correct, though. Depending on how we iterated through the list of PCI devices, we could first find the *topmost* bridge in the system; since it necessarily had a range of busses including the current device's bus, we would only ever check the topmost bridge, and not check any of the intermediate bridges.
This patch is simple in that it looks for the PCI device whose secondary device *exactly* equals the bus of the device we are looking for. That means that one, and only one bridge will be found, and it will be the correct device.
Note that this also caused a fairly serious bug in the secondary bus reset code, where we could erroneously reset the topmost bus instead of the inner bus.
This patch was tested by me on a 4-port NIC with a bridge without ACS (where assignment failed), a 4-port NIC with a bridge with ACS (where assignment succeeded), and a 2-port NIC with no bridges (where assignment succeeded).
Signed-off-by: Chris Lalancette <clalance@redhat.com> --- src/util/pci.c | 5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/src/util/pci.c b/src/util/pci.c index 26d55b8..df30e04 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -513,7 +513,7 @@ static int pciIsParent(pciDevice *dev, pciDevice *check, void *data ATTRIBUTE_UNUSED) { uint16_t device_class; - uint8_t header_type, secondary, subordinate; + uint8_t header_type, secondary;
if (dev->domain != check->domain) return 0; @@ -529,12 +529,11 @@ pciIsParent(pciDevice *dev, pciDevice *check, void *data ATTRIBUTE_UNUSED) return 0;
secondary = pciRead8(check, PCI_SECONDARY_BUS); - subordinate = pciRead8(check, PCI_SUBORDINATE_BUS);
VIR_DEBUG("%s %s: found parent device %s", dev->id, dev->name, check->name);
/* No, it's superman! */ - return (dev->bus >= secondary && dev->bus <= subordinate); + return (dev->bus == secondary); }
static pciDevice *
Acked-by: Donald Dutile <ddutile@redhat.com>

* Chris Lalancette (clalance@redhat.com) wrote:
The code in libvirt that does ACS checking has a pretty serious bug that was essentially rendering the check useless. When trying to assign a device, we have to check that all bridges upstream of this device support ACS. That means that we have to find the parent of the current device, check for ACS, then find the parent of that device, check for ACS, etc.
However, the code to find the parent of the device had a much too relaxed check. It would iterate through all PCI devices on the system, looking for a device that had a range of busses that included the current device's bus.
That's not correct, though. Depending on how we iterated through the list of PCI devices, we could first find the *topmost* bridge in the system; since it necessarily had a range of busses including the current device's bus, we would only ever check the topmost bridge, and not check any of the intermediate bridges.
This patch is simple in that it looks for the PCI device whose secondary device *exactly* equals the bus of the device we are looking for. That means that one, and only one bridge will be found, and it will be the correct device.
Note that this also caused a fairly serious bug in the secondary bus reset code, where we could erroneously reset the topmost bus instead of the inner bus.
This patch was tested by me on a 4-port NIC with a bridge without ACS (where assignment failed), a 4-port NIC with a bridge with ACS (where assignment succeeded), and a 2-port NIC with no bridges (where assignment succeeded).
Hmm, I'm not sure this will work with all SR-IOV devices. VFs can exist on a bus number that is >= than the PF bus number. When it's > PF, the upstream switch effectively addresses it via the subordinate value (not secondary). Can this be loosened to look for the "leaf" parent? I.e. one where the secondary bus matches, or the secondary < bus <= subordinate, and no other bridge has the subordinate value as the secondary value. thanks, -chris

* Chris Lalancette (clalance@redhat.com) wrote:
If detecting the FLR flag of a pci device fails, then we could run into the situation of trying to close a file descriptor twice, once in pciInitDevice() and once in pciFreeDevice(). Fix that by removing the pciCloseConfig() in pciInitDevice() and just letting pciFreeDevice() handle it.
Thanks to Chris Wright for pointing out this problem.
While we are at it, fix an error check. While it would actually work as-is (since success returns 0), it's still more clear to check for < 0 (as the rest of the code does).
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Acked-by: Chris Wright <chrisw@redhat.com>

On 07/28/10 - 02:39:38PM, Chris Wright wrote:
* Chris Lalancette (clalance@redhat.com) wrote:
If detecting the FLR flag of a pci device fails, then we could run into the situation of trying to close a file descriptor twice, once in pciInitDevice() and once in pciFreeDevice(). Fix that by removing the pciCloseConfig() in pciInitDevice() and just letting pciFreeDevice() handle it.
Thanks to Chris Wright for pointing out this problem.
While we are at it, fix an error check. While it would actually work as-is (since success returns 0), it's still more clear to check for < 0 (as the rest of the code does).
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Thanks, I've pushed this patch. -- Chris Lalancette

Chris Lalancette wrote:
If detecting the FLR flag of a pci device fails, then we could run into the situation of trying to close a file descriptor twice, once in pciInitDevice() and once in pciFreeDevice(). Fix that by removing the pciCloseConfig() in pciInitDevice() and just letting pciFreeDevice() handle it.
Thanks to Chris Wright for pointing out this problem.
While we are at it, fix an error check. While it would actually work as-is (since success returns 0), it's still more clear to check for < 0 (as the rest of the code does).
Signed-off-by: Chris Lalancette <clalance@redhat.com> --- src/qemu/qemu_driver.c | 2 +- src/util/pci.c | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index b4c468a..9436e2a 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -8010,7 +8010,7 @@ static int qemudDomainAttachHostPciDevice(struct qemud_driver *driver, return -1; }
- if (qemuPrepareHostdevPCIDevices(driver, &hostdev, 1)) + if (qemuPrepareHostdevPCIDevices(driver, &hostdev, 1) < 0) return -1;
if (qemuCmdFlags & QEMUD_CMD_FLAG_DEVICE) { diff --git a/src/util/pci.c b/src/util/pci.c index 14ef058..26d55b8 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -189,8 +189,10 @@ pciCloseConfig(pciDevice *dev) if (!dev) return;
- if (dev->fd >= 0) + if (dev->fd >= 0) { close(dev->fd); + dev->fd = -1; + } }
static int @@ -673,10 +675,8 @@ pciInitDevice(pciDevice *dev) dev->pcie_cap_pos = pciFindCapabilityOffset(dev, PCI_CAP_ID_EXP); dev->pci_pm_cap_pos = pciFindCapabilityOffset(dev, PCI_CAP_ID_PM); flr = pciDetectFunctionLevelReset(dev); - if (flr < 0) { - pciCloseConfig(dev); + if (flr < 0) return flr; - } dev->has_flr = flr; dev->has_pm_reset = pciDetectPowerManagementReset(dev); dev->initted = 1;
seems simple & logical enough... Acked-by: Donald Dutile <ddutile@redhat.com>
participants (4)
-
Chris Lalancette
-
Chris Wright
-
Don Dutile
-
Eric Blake