[libvirt] [PATCH 0/2] Set PCI_PHYSICAL_FUNCTION flag more carefully

If you're running libvirt master on a remote host, and connect to it using virt-manager, the remote daemon will crash. The same will happen if you run 'virsh nodedev-dumpxml' for a PCI device that happens *not* to be a virtual function. The first patch fixes the crash. The second one fixes something that could potentially cause trouble down the line, and also makes the code a tiny bit nicer. Andrea Bolognani (2): nodedev: sysfs: Set PCI_PHYSICAL_FUNCTION flag more carefully conf: nodedev: Set PCI_PHYSICAL_FUNCTION flag more carefully src/conf/node_device_conf.c | 4 ++-- src/node_device/node_device_linux_sysfs.c | 10 ++++++++-- 2 files changed, 10 insertions(+), 4 deletions(-) -- 2.5.5

The flag was set if virPCIGetPhysicalFunction() did not return an error; unfortunately, the 'physfn' sysfs file not being present is not considered an error, which means that the flag was set pretty much unconditionally. This, in turn, caused libvirtd to crash in virNodeDeviceDefFormat(), where the presence of the flag is considered a reliable indicator of the fact that pci_dev.physical_function has been filled in. Change the code to check pci_dev.physical_function before setting the flag. --- src/node_device/node_device_linux_sysfs.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/src/node_device/node_device_linux_sysfs.c b/src/node_device/node_device_linux_sysfs.c index 24a6a2e..b337d2d 100644 --- a/src/node_device/node_device_linux_sysfs.c +++ b/src/node_device/node_device_linux_sysfs.c @@ -154,19 +154,25 @@ nodeDeviceSysfsGetPCISRIOVCaps(const char *sysfsPath, data->pci_dev.flags &= ~VIR_NODE_DEV_CAP_FLAG_PCI_VIRTUAL_FUNCTION; data->pci_dev.flags &= ~VIR_NODE_DEV_CAP_FLAG_PCI_PHYSICAL_FUNCTION; - if (!virPCIGetPhysicalFunction(sysfsPath, &data->pci_dev.physical_function)) + ret = virPCIGetPhysicalFunction(sysfsPath, + &data->pci_dev.physical_function); + if (ret < 0) + goto out; + + if (data->pci_dev.physical_function) data->pci_dev.flags |= VIR_NODE_DEV_CAP_FLAG_PCI_PHYSICAL_FUNCTION; ret = virPCIGetVirtualFunctions(sysfsPath, &data->pci_dev.virtual_functions, &data->pci_dev.num_virtual_functions, &data->pci_dev.max_virtual_functions); if (ret < 0) - return ret; + goto out; if (data->pci_dev.num_virtual_functions > 0 || data->pci_dev.max_virtual_functions > 0) data->pci_dev.flags |= VIR_NODE_DEV_CAP_FLAG_PCI_VIRTUAL_FUNCTION; + out: return ret; } -- 2.5.5

On 05/23/2016 07:53 AM, Andrea Bolognani wrote:
The flag was set if virPCIGetPhysicalFunction() did not return an error; unfortunately, the 'physfn' sysfs file not being present is not considered an error, which means that the flag was set pretty much unconditionally.
This behavior is the result of commit c8b1a83605: https://www.redhat.com/archives/libvir-list/2016-May/msg01348.html Prior to this, virPCIGetPhysicalFunction had returned the return value from virPCIGetDeviceAddressFromSysfsLink(), which would be -1 if the physfn file didn't exist. That patch discards the return value and hardcodes a "return 0". I think that virPCIGetPhysicalFunction should be restored to its original glory (No, I didn't write it). Either that, or the other caller to virPCIGetPhysicalFunction (virPCIGetVirtualFunctionInfo) should also be fixed so that it doesn't crash due to the changed semantics of virPCIGetPhysicalFunction. (A more correct fix may be to change the semantics of virPCIGetPhysicalFunction() to return -1 only in the case of a bonafide error (OOM, failure to parse the contents of an existing physfn file), but that would also require modifying virPCIGetDeviceAddressFromSysfsLink() to return 0/-1 rather than a virPCIDeviceAddresPtr so that we can tell the difference between "no physfn" and "there was a physfn but I encountered some other error". In other words - 1) revert commit c1faf309 (part of the same "coverity fixes" series as x8b1a83605), then 2) adjust the function to set *bdf = 0 and return 0 when the physfn file doesn't exist.)
This, in turn, caused libvirtd to crash in virNodeDeviceDefFormat(), where the presence of the flag is considered a reliable indicator of the fact that pci_dev.physical_function has been filled in.
Change the code to check pci_dev.physical_function before setting the flag. --- src/node_device/node_device_linux_sysfs.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/node_device/node_device_linux_sysfs.c b/src/node_device/node_device_linux_sysfs.c index 24a6a2e..b337d2d 100644 --- a/src/node_device/node_device_linux_sysfs.c +++ b/src/node_device/node_device_linux_sysfs.c @@ -154,19 +154,25 @@ nodeDeviceSysfsGetPCISRIOVCaps(const char *sysfsPath, data->pci_dev.flags &= ~VIR_NODE_DEV_CAP_FLAG_PCI_VIRTUAL_FUNCTION; data->pci_dev.flags &= ~VIR_NODE_DEV_CAP_FLAG_PCI_PHYSICAL_FUNCTION;
- if (!virPCIGetPhysicalFunction(sysfsPath, &data->pci_dev.physical_function)) + ret = virPCIGetPhysicalFunction(sysfsPath, + &data->pci_dev.physical_function); + if (ret < 0) + goto out; + + if (data->pci_dev.physical_function) data->pci_dev.flags |= VIR_NODE_DEV_CAP_FLAG_PCI_PHYSICAL_FUNCTION;
ret = virPCIGetVirtualFunctions(sysfsPath, &data->pci_dev.virtual_functions, &data->pci_dev.num_virtual_functions, &data->pci_dev.max_virtual_functions); if (ret < 0) - return ret; + goto out;
if (data->pci_dev.num_virtual_functions > 0 || data->pci_dev.max_virtual_functions > 0) data->pci_dev.flags |= VIR_NODE_DEV_CAP_FLAG_PCI_VIRTUAL_FUNCTION;
+ out: return ret; }

On Mon, 2016-05-23 at 15:21 -0400, Laine Stump wrote:
The flag was set if virPCIGetPhysicalFunction() did not return an error; unfortunately, the 'physfn' sysfs file not being present is not considered an error, which means that the flag was set pretty much unconditionally. This behavior is the result of commit c8b1a83605: https://www.redhat.com/archives/libvir-list/2016-May/msg01348.html Prior to this, virPCIGetPhysicalFunction had returned the return value from virPCIGetDeviceAddressFromSysfsLink(), which would be -1 if the
On 05/23/2016 07:53 AM, Andrea Bolognani wrote: physfn file didn't exist. That patch discards the return value and hardcodes a "return 0". I think that virPCIGetPhysicalFunction should be restored to its original glory (No, I didn't write it). Either that, or the other caller to virPCIGetPhysicalFunction (virPCIGetVirtualFunctionInfo) should also be fixed so that it doesn't crash due to the changed semantics of virPCIGetPhysicalFunction. (A more correct fix may be to change the semantics of virPCIGetPhysicalFunction() to return -1 only in the case of a bonafide error (OOM, failure to parse the contents of an existing physfn file), but that would also require modifying virPCIGetDeviceAddressFromSysfsLink() to return 0/-1 rather than a virPCIDeviceAddresPtr so that we can tell the difference between "no physfn" and "there was a physfn but I encountered some other error". In other words - 1) revert commit c1faf309 (part of the same "coverity fixes" series as x8b1a83605), then 2) adjust the function to set *bdf = 0 and return 0 when the physfn file doesn't exist.)
Thanks for the analysis. Fixing the remaining caller looked like the simpler fix, so I've gone that route. Patch 2/2 is unchanged. https://www.redhat.com/archives/libvir-list/2016-May/msg01783.html -- Andrea Bolognani Software Engineer - Virtualization Team

Instead of setting the flag before parsing the PCI address, set it afterwards. This ensure we can never end up in a situation where the flag has been set but pci_dev.physical_function has not been filled in. --- src/conf/node_device_conf.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/conf/node_device_conf.c b/src/conf/node_device_conf.c index aed95d5..a23d8ef 100644 --- a/src/conf/node_device_conf.c +++ b/src/conf/node_device_conf.c @@ -1293,8 +1293,6 @@ virNodeDevPCICapabilityParseXML(xmlXPathContextPtr ctxt, if (VIR_ALLOC(data->pci_dev.physical_function) < 0) goto out; - data->pci_dev.flags |= VIR_NODE_DEV_CAP_FLAG_PCI_PHYSICAL_FUNCTION; - if (!address) { virReportError(VIR_ERR_XML_ERROR, "%s", _("Missing address in 'phys_function' capability")); @@ -1304,6 +1302,8 @@ virNodeDevPCICapabilityParseXML(xmlXPathContextPtr ctxt, if (virPCIDeviceAddressParseXML(address, data->pci_dev.physical_function) < 0) goto out; + + data->pci_dev.flags |= VIR_NODE_DEV_CAP_FLAG_PCI_PHYSICAL_FUNCTION; } else if (STREQ(type, "virt_functions")) { int naddresses; -- 2.5.5
participants (2)
-
Andrea Bolognani
-
Laine Stump