[libvirt] [PATCH v2 0/4] qemu: assign vfio devices to PCIe addresses when appropriate

Patches 1 and 4 were originally added to the end of the "more PCIe less legacy PCI" patchset in its final incarnation, but all the other patches were ACKed and pushed, while this needed a bit more work, resulting in this "faux V2" - although it's the 2nd time I've posted these patches, their "V1" was really inside the "V11666" of a larger series. In order to implement Alex's suggestion of checking the length of a PCI device's config file to determine if it's an Express device, I first made a utility function to return the length of a file, and then another to return the name of the config file of a virPCIDevice - those are the new patches 2 and 3. I'm open to suggestions about additional checks to put in virFileLength(), as long as they don't involve opening the file. For now I've kept it as simple as possible. (Just keep in mind that I'll be away from my keyboard from Wednesday through Sunday this week). Laine Stump (4): util: new function virFileLength() util: new function virPCIDeviceGetConfigPath() qemu: propagate virQEMUDriver object to qemuDomainDeviceCalculatePCIConnectFlags qemu: assign vfio devices to PCIe addresses when appropriate src/libvirt_private.syms | 2 + src/qemu/qemu_domain.c | 2 +- src/qemu/qemu_domain_address.c | 117 ++++++++++++++++++++++++++++++++++++----- src/qemu/qemu_domain_address.h | 7 ++- src/qemu/qemu_hotplug.c | 19 +++---- src/qemu/qemu_process.c | 13 +++-- src/util/virfile.c | 13 +++++ src/util/virfile.h | 2 + src/util/virpci.c | 6 +++ src/util/virpci.h | 1 + tests/qemuhotplugtest.c | 4 +- 11 files changed, 157 insertions(+), 29 deletions(-) -- 2.9.3

This new function just calls stat() and returns st_size (or -1 if there is an error). We may decide we want this function to be more complex, and handle things like block devices - this is a placeholder (that works) for any more complicated funtion. NB: virFileLength() takes a path rather than an fd because it needs to be called for files that can't be opened (due to permissions). --- New in "V2" src/libvirt_private.syms | 1 + src/util/virfile.c | 13 +++++++++++++ src/util/virfile.h | 2 ++ 3 files changed, 16 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index ac6a1e1..1c0b912 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1578,6 +1578,7 @@ virFileIsLink; virFileIsMountPoint; virFileIsSharedFS; virFileIsSharedFSType; +virFileLength; virFileLinkPointsTo; virFileLock; virFileLoopDeviceAssociate; diff --git a/src/util/virfile.c b/src/util/virfile.c index a45279a..11b6027 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -1735,6 +1735,19 @@ virFileActivateDirOverride(const char *argv0) } } + +off_t +virFileLength(const char *file) +{ + struct stat s; + + if (stat(file, &s) < 0) + return -1; + + return s.st_size; +} + + bool virFileIsDir(const char *path) { diff --git a/src/util/virfile.h b/src/util/virfile.h index b4ae6ea..a0c646d 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -179,6 +179,8 @@ char *virFileFindResourceFull(const char *filename, void virFileActivateDirOverride(const char *argv0) ATTRIBUTE_NONNULL(1); +off_t virFileLength(const char *file) ATTRIBUTE_NONNULL(1); + bool virFileIsDir (const char *file) ATTRIBUTE_NONNULL(1); bool virFileExists(const char *file) ATTRIBUTE_NONNULL(1); bool virFileIsExecutable(const char *file) ATTRIBUTE_NONNULL(1); -- 2.9.3

On Mon, 2016-11-21 at 00:01 -0500, Laine Stump wrote:
This new function just calls stat() and returns st_size (or -1 if there is an error). We may decide we want this function to be more complex, and handle things like block devices - this is a placeholder (that works) for any more complicated funtion.
s/funtion/function/
NB: virFileLength() takes a path rather than an fd because it needs to be called for files that can't be opened (due to permissions). --- New in "V2" src/libvirt_private.syms | 1 + src/util/virfile.c | 13 +++++++++++++ src/util/virfile.h | 2 ++ 3 files changed, 16 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index ac6a1e1..1c0b912 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1578,6 +1578,7 @@ virFileIsLink; virFileIsMountPoint; virFileIsSharedFS; virFileIsSharedFSType; +virFileLength; virFileLinkPointsTo; virFileLock; virFileLoopDeviceAssociate; diff --git a/src/util/virfile.c b/src/util/virfile.c index a45279a..11b6027 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -1735,6 +1735,19 @@ virFileActivateDirOverride(const char *argv0) } } + +off_t +virFileLength(const char *file)
I find the lack of documentation disturbing.
+{ + struct stat s; + + if (stat(file, &s) < 0) + return -1; + + return s.st_size; +} + + bool virFileIsDir(const char *path) { diff --git a/src/util/virfile.h b/src/util/virfile.h index b4ae6ea..a0c646d 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -179,6 +179,8 @@ char *virFileFindResourceFull(const char *filename, void virFileActivateDirOverride(const char *argv0) ATTRIBUTE_NONNULL(1); +off_t virFileLength(const char *file) ATTRIBUTE_NONNULL(1); + bool virFileIsDir (const char *file) ATTRIBUTE_NONNULL(1);
How did a space end up here? Weird :)
bool virFileExists(const char *file) ATTRIBUTE_NONNULL(1); bool virFileIsExecutable(const char *file) ATTRIBUTE_NONNULL(1);
As you mention in the commit message, this simple implementation might not handle all cases; on the other hand, its addition to virFile might encourage others to use it. So I guess my question would be: is adding this function, in its current form, worth it? Would it be better to just call stat() in qemuDomainDeviceCalculatePCIConnectFlags(), and replace that later with a call to a more fleshed-out virFileLength() that can be used not just in that specific spot, but hopefully in a bunch other places? -- Andrea Bolognani / Red Hat / Virtualization

On 11/24/2016 09:36 AM, Andrea Bolognani wrote:
On Mon, 2016-11-21 at 00:01 -0500, Laine Stump wrote:
This new function just calls stat() and returns st_size (or -1 if there is an error). We may decide we want this function to be more complex, and handle things like block devices - this is a placeholder (that works) for any more complicated funtion. s/funtion/function/
NB: virFileLength() takes a path rather than an fd because it needs to be called for files that can't be opened (due to permissions). --- New in "V2"
src/libvirt_private.syms | 1 + src/util/virfile.c | 13 +++++++++++++ src/util/virfile.h | 2 ++ 3 files changed, 16 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index ac6a1e1..1c0b912 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1578,6 +1578,7 @@ virFileIsLink; virFileIsMountPoint; virFileIsSharedFS; virFileIsSharedFSType; +virFileLength; virFileLinkPointsTo; virFileLock; virFileLoopDeviceAssociate; diff --git a/src/util/virfile.c b/src/util/virfile.c index a45279a..11b6027 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -1735,6 +1735,19 @@ virFileActivateDirOverride(const char *argv0) } }
+ +off_t +virFileLength(const char *file) I find the lack of documentation disturbing.
+{ + struct stat s; + + if (stat(file, &s) < 0) + return -1; + + return s.st_size; +} + + bool virFileIsDir(const char *path) { diff --git a/src/util/virfile.h b/src/util/virfile.h index b4ae6ea..a0c646d 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -179,6 +179,8 @@ char *virFileFindResourceFull(const char *filename, void virFileActivateDirOverride(const char *argv0) ATTRIBUTE_NONNULL(1);
+off_t virFileLength(const char *file) ATTRIBUTE_NONNULL(1); + bool virFileIsDir (const char *file) ATTRIBUTE_NONNULL(1); How did a space end up here? Weird :)
bool virFileExists(const char *file) ATTRIBUTE_NONNULL(1); bool virFileIsExecutable(const char *file) ATTRIBUTE_NONNULL(1); As you mention in the commit message, this simple implementation might not handle all cases; on the other hand, its addition to virFile might encourage others to use it.
So I guess my question would be: is adding this function, in its current form, worth it? Would it be better to just call stat() in qemuDomainDeviceCalculatePCIConnectFlags(), and replace that later with a call to a more fleshed-out virFileLength() that can be used not just in that specific spot, but hopefully in a bunch other places?
I was going in the other direction - add a simple implementation of the function that will encourage people to enhance it (if necessary) for their own uses rather than continuing to add calls to stat() "until someone implements a full featured virFileLength()" (which may never happen).

On 11/20/2016 11:01 PM, Laine Stump wrote:
This new function just calls stat() and returns st_size (or -1 if there is an error). We may decide we want this function to be more complex, and handle things like block devices - this is a placeholder (that works) for any more complicated funtion.
NB: virFileLength() takes a path rather than an fd because it needs to be called for files that can't be opened (due to permissions).
I almost wonder if it would be better to take both a path AND an fd, with semantics of: if fd == -1, use path alone - stat() if fd >= 0, path should be non-NULL and describes the fd (for error messages) - fstat() For block devices, [f]stat() probably won't do what we want, where lseek(SEEK_END) and/or ioctl() may be better. But like you say, those can be added later. I definitely like the idea of having a nice wrapper function that we can enhance later, to abstract out the logic so that it is not repeated in the callers.
--- New in "V2"
src/libvirt_private.syms | 1 + src/util/virfile.c | 13 +++++++++++++ src/util/virfile.h | 2 ++ 3 files changed, 16 insertions(+)
+ +off_t +virFileLength(const char *file) +{ + struct stat s; + + if (stat(file, &s) < 0) + return -1; + + return s.st_size;
Since we KNOW it won't work right on block devices, and we have the stat() results, shouldn't this also check for S_ISREG(s.st_mode) and return -1 if not? -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

This new function just calls fstat() (if provided with a valid fd) or stat() (if fd is -1) and returns st_size (or -1 if there is an error). We may decide we want this function to be more complex, and handle things like block devices - this is a placeholder (that works) for any more complicated function. --- Changes from V2: * add fd arg and use it to optionally call fstat() instead of stat. * return -1 unless it is a "regular file" (i.e. not a block device). Is this what you were thinking of Eric? src/libvirt_private.syms | 1 + src/util/virfile.c | 34 ++++++++++++++++++++++++++++++++++ src/util/virfile.h | 1 + 3 files changed, 36 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 7cc7bf8..ee59cae 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1578,6 +1578,7 @@ virFileIsLink; virFileIsMountPoint; virFileIsSharedFS; virFileIsSharedFSType; +virFileLength; virFileLinkPointsTo; virFileLock; virFileLoopDeviceAssociate; diff --git a/src/util/virfile.c b/src/util/virfile.c index 1fb89ce..fcd0d92 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -1735,6 +1735,39 @@ virFileActivateDirOverride(const char *argv0) } } + +/** + * virFileLength: + * @path: full path of the file + * @fd: open file descriptor for file (or -1 to use @path) + * + * If fd >= 0, return the length of the open file indicated by @fd. + * If fd < 0 (i.e. -1) return the length of the file indicated by + * @path. + * + * Returns the length, or -1 if the file doesn't + * exist or its info was inaccessible. No error is logged. + */ +off_t +virFileLength(const char *path, int fd) +{ + struct stat s; + + if (fd >= 0) { + if (fstat(fd, &s) < 0) + return -1; + } else { + if (stat(path, &s) < 0) + return -1; + } + + if (!S_ISREG(s.st_mode)) + return -1; + + return s.st_size; +} + + bool virFileIsDir(const char *path) { diff --git a/src/util/virfile.h b/src/util/virfile.h index b4ae6ea..836cc60 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -179,6 +179,7 @@ char *virFileFindResourceFull(const char *filename, void virFileActivateDirOverride(const char *argv0) ATTRIBUTE_NONNULL(1); +off_t virFileLength(const char *path, int fd) ATTRIBUTE_NONNULL(1); bool virFileIsDir (const char *file) ATTRIBUTE_NONNULL(1); bool virFileExists(const char *file) ATTRIBUTE_NONNULL(1); bool virFileIsExecutable(const char *file) ATTRIBUTE_NONNULL(1); -- 2.7.4

On 11/29/2016 01:23 PM, Laine Stump wrote:
This new function just calls fstat() (if provided with a valid fd) or stat() (if fd is -1) and returns st_size (or -1 if there is an error). We may decide we want this function to be more complex, and handle things like block devices - this is a placeholder (that works) for any more complicated function. --- Changes from V2:
* add fd arg and use it to optionally call fstat() instead of stat. * return -1 unless it is a "regular file" (i.e. not a block device).
Is this what you were thinking of Eric?
Yep! ACK -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 11/29/2016 02:47 PM, Eric Blake wrote:
On 11/29/2016 01:23 PM, Laine Stump wrote:
This new function just calls fstat() (if provided with a valid fd) or stat() (if fd is -1) and returns st_size (or -1 if there is an error). We may decide we want this function to be more complex, and handle things like block devices - this is a placeholder (that works) for any more complicated function. --- Changes from V2:
* add fd arg and use it to optionally call fstat() instead of stat. * return -1 unless it is a "regular file" (i.e. not a block device).
Is this what you were thinking of Eric? Yep!
ACK
Thanks! With that ACK, the series is ACKed. This is the last piece of the "Use PCIe instead of legacy PCI on machinetypes with a PCIe root bus" that hasn't been pushed. DV already cut RV1 for the next release, but allowed that it seems innocuous enough to push these patches before rc2 (thus putting the entire feature into a single release, rather than part of it): https://www.redhat.com/archives/libvir-list/2016-November/msg01420.html so I'm going to push the seriestoday (RC2 will be cut tomorrow)

The path to the config file for a PCI device is conventiently stored in a virPCIDevice object, but that object's contents aren't directly visible outside of virpci.c, so we need to have an accessor function for it if anyone needs to look at it. --- New in "V2" src/libvirt_private.syms | 1 + src/util/virpci.c | 6 ++++++ src/util/virpci.h | 1 + 3 files changed, 8 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 1c0b912..77e4b63 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2135,6 +2135,7 @@ virPCIDeviceCopy; virPCIDeviceDetach; virPCIDeviceFileIterate; virPCIDeviceFree; +virPCIDeviceGetConfigPath; virPCIDeviceGetDriverPathAndName; virPCIDeviceGetIOMMUGroupDev; virPCIDeviceGetIOMMUGroupList; diff --git a/src/util/virpci.c b/src/util/virpci.c index 6c8174a..0c06249 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1848,6 +1848,12 @@ virPCIDeviceGetName(virPCIDevicePtr dev) return dev->name; } +const char * +virPCIDeviceGetConfigPath(virPCIDevicePtr dev) +{ + return dev->path; +} + void virPCIDeviceSetManaged(virPCIDevicePtr dev, bool managed) { dev->managed = managed; diff --git a/src/util/virpci.h b/src/util/virpci.h index 5c63eab..a5e8d00 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -97,6 +97,7 @@ virPCIDevicePtr virPCIDeviceNew(unsigned int domain, virPCIDevicePtr virPCIDeviceCopy(virPCIDevicePtr dev); void virPCIDeviceFree(virPCIDevicePtr dev); const char *virPCIDeviceGetName(virPCIDevicePtr dev); +const char *virPCIDeviceGetConfigPath(virPCIDevicePtr dev); int virPCIDeviceDetach(virPCIDevicePtr dev, virPCIDeviceListPtr activeDevs, -- 2.9.3

On Mon, 2016-11-21 at 00:01 -0500, Laine Stump wrote:
The path to the config file for a PCI device is conventiently stored in a virPCIDevice object, but that object's contents aren't directly visible outside of virpci.c, so we need to have an accessor function for it if anyone needs to look at it. --- New in "V2" src/libvirt_private.syms | 1 + src/util/virpci.c | 6 ++++++ src/util/virpci.h | 1 + 3 files changed, 8 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 1c0b912..77e4b63 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2135,6 +2135,7 @@ virPCIDeviceCopy; virPCIDeviceDetach; virPCIDeviceFileIterate; virPCIDeviceFree; +virPCIDeviceGetConfigPath; virPCIDeviceGetDriverPathAndName; virPCIDeviceGetIOMMUGroupDev; virPCIDeviceGetIOMMUGroupList; diff --git a/src/util/virpci.c b/src/util/virpci.c index 6c8174a..0c06249 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1848,6 +1848,12 @@ virPCIDeviceGetName(virPCIDevicePtr dev) return dev->name; } +const char * +virPCIDeviceGetConfigPath(virPCIDevicePtr dev)
Documentation maybe? Then again, it's a trivial accessor.
+{ + return dev->path; +} + void virPCIDeviceSetManaged(virPCIDevicePtr dev, bool managed) { dev->managed = managed; diff --git a/src/util/virpci.h b/src/util/virpci.h index 5c63eab..a5e8d00 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -97,6 +97,7 @@ virPCIDevicePtr virPCIDeviceNew(unsigned int domain, virPCIDevicePtr virPCIDeviceCopy(virPCIDevicePtr dev); void virPCIDeviceFree(virPCIDevicePtr dev); const char *virPCIDeviceGetName(virPCIDevicePtr dev); +const char *virPCIDeviceGetConfigPath(virPCIDevicePtr dev); int virPCIDeviceDetach(virPCIDevicePtr dev, virPCIDeviceListPtr activeDevs,
ACK whether or not you decide to add documentation. -- Andrea Bolognani / Red Hat / Virtualization

If libvirtd is running unprivileged, it can open a device's PCI config data in sysfs, but can only read the first 64 bytes. But as part of determining whether a device is Express or legacy PCI, qemuDomainDeviceCalculatePCIConnectFlags() will be updated in a future patch to call virPCIDeviceIsPCIExpress(), which tries to read beyond the first 64 bytes of the PCI config data and fails with an error log if the read is unsuccessful. In order to avoid creating a parallel "quiet" version of virPCIDeviceIsPCIExpress(), this patch passes a virQEMUDriverPtr down through all the call chains that initialize the qemuDomainFillDevicePCIConnectFlagsIterData, and saves the driver pointer with the rest of the iterdata so that it can be used by qemuDomainDeviceCalculatePCIConnectFlags(). This pointer isn't used yet, but will be used in an upcoming patch (that detects Express vs legacy PCI for VFIO assigned devices) to examine driver->privileged. --- No change from "V1" src/qemu/qemu_domain.c | 2 +- src/qemu/qemu_domain_address.c | 33 +++++++++++++++++++++++---------- src/qemu/qemu_domain_address.h | 7 +++++-- src/qemu/qemu_hotplug.c | 19 ++++++++++--------- src/qemu/qemu_process.c | 13 +++++++++---- tests/qemuhotplugtest.c | 4 +++- 6 files changed, 51 insertions(+), 27 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 0f113a7..5642134 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -3123,7 +3123,7 @@ qemuDomainDefAssignAddresses(virDomainDef *def, goto cleanup; } - if (qemuDomainAssignAddresses(def, qemuCaps, NULL, newDomain) < 0) + if (qemuDomainAssignAddresses(def, qemuCaps, driver, NULL, newDomain) < 0) goto cleanup; ret = 0; diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 30ecb4e..8abae65 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -428,6 +428,7 @@ qemuDomainAssignARMVirtioMMIOAddresses(virDomainDefPtr def, */ static virDomainPCIConnectFlags qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, + virQEMUDriverPtr driver ATTRIBUTE_UNUSED, virDomainPCIConnectFlags pcieFlags, virDomainPCIConnectFlags virtioFlags) { @@ -666,6 +667,7 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, typedef struct { virDomainPCIConnectFlags virtioFlags; virDomainPCIConnectFlags pcieFlags; + virQEMUDriverPtr driver; } qemuDomainFillDevicePCIConnectFlagsIterData; @@ -678,8 +680,12 @@ typedef struct { static void qemuDomainFillDevicePCIConnectFlagsIterInit(virDomainDefPtr def, virQEMUCapsPtr qemuCaps, + virQEMUDriverPtr driver, qemuDomainFillDevicePCIConnectFlagsIterData *data) { + + data->driver = driver; + if (qemuDomainMachineHasPCIeRoot(def)) { data->pcieFlags = (VIR_PCI_CONNECT_TYPE_PCIE_DEVICE | VIR_PCI_CONNECT_HOTPLUGGABLE); @@ -719,7 +725,8 @@ qemuDomainFillDevicePCIConnectFlagsIter(virDomainDefPtr def ATTRIBUTE_UNUSED, qemuDomainFillDevicePCIConnectFlagsIterData *data = opaque; info->pciConnectFlags - = qemuDomainDeviceCalculatePCIConnectFlags(dev, data->pcieFlags, + = qemuDomainDeviceCalculatePCIConnectFlags(dev, data->driver, + data->pcieFlags, data->virtioFlags); return 0; } @@ -739,11 +746,12 @@ qemuDomainFillDevicePCIConnectFlagsIter(virDomainDefPtr def ATTRIBUTE_UNUSED, */ static int qemuDomainFillAllPCIConnectFlags(virDomainDefPtr def, - virQEMUCapsPtr qemuCaps) + virQEMUCapsPtr qemuCaps, + virQEMUDriverPtr driver) { qemuDomainFillDevicePCIConnectFlagsIterData data; - qemuDomainFillDevicePCIConnectFlagsIterInit(def, qemuCaps, &data); + qemuDomainFillDevicePCIConnectFlagsIterInit(def, qemuCaps, driver, &data); return virDomainDeviceInfoIterate(def, qemuDomainFillDevicePCIConnectFlagsIter, @@ -765,7 +773,8 @@ qemuDomainFillAllPCIConnectFlags(virDomainDefPtr def, static void qemuDomainFillDevicePCIConnectFlags(virDomainDefPtr def, virDomainDeviceDefPtr dev, - virQEMUCapsPtr qemuCaps) + virQEMUCapsPtr qemuCaps, + virQEMUDriverPtr driver) { virDomainDeviceInfoPtr info = virDomainDeviceGetInfo(dev); @@ -781,10 +790,11 @@ qemuDomainFillDevicePCIConnectFlags(virDomainDefPtr def, */ qemuDomainFillDevicePCIConnectFlagsIterData data; - qemuDomainFillDevicePCIConnectFlagsIterInit(def, qemuCaps, &data); + qemuDomainFillDevicePCIConnectFlagsIterInit(def, qemuCaps, driver, &data); info->pciConnectFlags - = qemuDomainDeviceCalculatePCIConnectFlags(dev, data.pcieFlags, + = qemuDomainDeviceCalculatePCIConnectFlags(dev, data.driver, + data.pcieFlags, data.virtioFlags); } } @@ -1819,6 +1829,7 @@ qemuDomainAddressFindNewBusNr(virDomainDefPtr def) static int qemuDomainAssignPCIAddresses(virDomainDefPtr def, virQEMUCapsPtr qemuCaps, + virQEMUDriverPtr driver, virDomainObjPtr obj) { int ret = -1; @@ -1844,7 +1855,7 @@ qemuDomainAssignPCIAddresses(virDomainDefPtr def, * of all devices. This will be used to pick an appropriate * bus when assigning addresses. */ - if (qemuDomainFillAllPCIConnectFlags(def, qemuCaps) < 0) + if (qemuDomainFillAllPCIConnectFlags(def, qemuCaps, driver) < 0) goto cleanup; if (nbuses > 0 && @@ -1963,7 +1974,7 @@ qemuDomainAssignPCIAddresses(virDomainDefPtr def, dev.type = VIR_DOMAIN_DEVICE_CONTROLLER; dev.data.controller = def->controllers[contIndex]; /* set connect flags so it will be properly addressed */ - qemuDomainFillDevicePCIConnectFlags(def, &dev, qemuCaps); + qemuDomainFillDevicePCIConnectFlags(def, &dev, qemuCaps, driver); if (qemuDomainPCIAddressReserveNextSlot(addrs, &dev.data.controller->info) < 0) goto cleanup; @@ -2333,6 +2344,7 @@ qemuDomainAssignUSBAddresses(virDomainDefPtr def, int qemuDomainAssignAddresses(virDomainDefPtr def, virQEMUCapsPtr qemuCaps, + virQEMUDriverPtr driver, virDomainObjPtr obj, bool newDomain) { @@ -2347,7 +2359,7 @@ qemuDomainAssignAddresses(virDomainDefPtr def, qemuDomainAssignARMVirtioMMIOAddresses(def, qemuCaps); - if (qemuDomainAssignPCIAddresses(def, qemuCaps, obj) < 0) + if (qemuDomainAssignPCIAddresses(def, qemuCaps, driver, obj) < 0) return -1; if (qemuDomainAssignUSBAddresses(def, obj, newDomain) < 0) @@ -2376,6 +2388,7 @@ qemuDomainAssignAddresses(virDomainDefPtr def, */ int qemuDomainEnsurePCIAddress(virDomainObjPtr obj, + virQEMUDriverPtr driver, virDomainDeviceDefPtr dev) { qemuDomainObjPrivatePtr priv = obj->privateData; @@ -2384,7 +2397,7 @@ qemuDomainEnsurePCIAddress(virDomainObjPtr obj, if (!info) return 0; - qemuDomainFillDevicePCIConnectFlags(obj->def, dev, priv->qemuCaps); + qemuDomainFillDevicePCIConnectFlags(obj->def, dev, priv->qemuCaps, driver); return virDomainPCIAddressEnsureAddr(priv->pciaddrs, info, info->pciConnectFlags); diff --git a/src/qemu/qemu_domain_address.h b/src/qemu/qemu_domain_address.h index e532bd8..bbceb1b 100644 --- a/src/qemu/qemu_domain_address.h +++ b/src/qemu/qemu_domain_address.h @@ -25,6 +25,7 @@ # include "domain_addr.h" # include "domain_conf.h" +# include "qemu_conf.h" # include "qemu_capabilities.h" int qemuDomainSetSCSIControllerModel(const virDomainDef *def, @@ -33,13 +34,15 @@ int qemuDomainSetSCSIControllerModel(const virDomainDef *def, int qemuDomainAssignAddresses(virDomainDefPtr def, virQEMUCapsPtr qemuCaps, + virQEMUDriverPtr driver, virDomainObjPtr obj, bool newDomain) - ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuDomainEnsurePCIAddress(virDomainObjPtr obj, + virQEMUDriverPtr driver, virDomainDeviceDefPtr dev) - ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); void qemuDomainReleaseDeviceAddress(virDomainObjPtr vm, virDomainDeviceInfoPtr info, diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 6b3a068..ad06720 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -345,7 +345,7 @@ qemuDomainAttachVirtioDiskDevice(virConnectPtr conn, goto error; } else if (!disk->info.type || disk->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI) { - if (qemuDomainEnsurePCIAddress(vm, &dev) < 0) + if (qemuDomainEnsurePCIAddress(vm, driver, &dev) < 0) goto error; } releaseaddr = true; @@ -504,7 +504,7 @@ int qemuDomainAttachControllerDevice(virQEMUDriverPtr driver, if (controller->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE || controller->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI) { - if (qemuDomainEnsurePCIAddress(vm, &dev) < 0) + if (qemuDomainEnsurePCIAddress(vm, driver, &dev) < 0) goto cleanup; } else if (controller->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_CCW) { if (!(ccwaddrs = qemuDomainCCWAddrSetCreateFromDomain(vm->def))) @@ -1137,7 +1137,7 @@ qemuDomainAttachNetDevice(virQEMUDriverPtr driver, virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("virtio-s390 net device cannot be hotplugged.")); goto cleanup; - } else if (qemuDomainEnsurePCIAddress(vm, &dev) < 0) { + } else if (qemuDomainEnsurePCIAddress(vm, driver, &dev) < 0) { goto cleanup; } @@ -1441,7 +1441,7 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, if (qemuAssignDeviceHostdevAlias(vm->def, &hostdev->info->alias, -1) < 0) goto error; - if (qemuDomainEnsurePCIAddress(vm, &dev) < 0) + if (qemuDomainEnsurePCIAddress(vm, driver, &dev) < 0) goto error; releaseaddr = true; if (backend != VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO && @@ -1772,7 +1772,8 @@ qemuDomainChrRemove(virDomainDefPtr vmdef, * 0 otherwise */ static int -qemuDomainAttachChrDeviceAssignAddr(virDomainObjPtr vm, +qemuDomainAttachChrDeviceAssignAddr(virQEMUDriverPtr driver, + virDomainObjPtr vm, virDomainChrDefPtr chr) { virDomainDefPtr def = vm->def; @@ -1787,7 +1788,7 @@ qemuDomainAttachChrDeviceAssignAddr(virDomainObjPtr vm, } else if (chr->deviceType == VIR_DOMAIN_CHR_DEVICE_TYPE_SERIAL && chr->targetType == VIR_DOMAIN_CHR_SERIAL_TARGET_TYPE_PCI) { - if (qemuDomainEnsurePCIAddress(vm, &dev) < 0) + if (qemuDomainEnsurePCIAddress(vm, driver, &dev) < 0) return -1; return 1; @@ -1846,7 +1847,7 @@ int qemuDomainAttachChrDevice(virConnectPtr conn, if (qemuAssignDeviceChrAlias(vmdef, chr, -1) < 0) goto cleanup; - if ((rc = qemuDomainAttachChrDeviceAssignAddr(vm, chr)) < 0) + if ((rc = qemuDomainAttachChrDeviceAssignAddr(driver, vm, chr)) < 0) goto cleanup; if (rc == 1) need_release = true; @@ -1986,7 +1987,7 @@ qemuDomainAttachRNGDevice(virConnectPtr conn, if (rng->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE || rng->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI) { - if (qemuDomainEnsurePCIAddress(vm, &dev) < 0) + if (qemuDomainEnsurePCIAddress(vm, driver, &dev) < 0) goto cleanup; } else if (rng->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_CCW) { if (!(ccwaddrs = qemuDomainCCWAddrSetCreateFromDomain(vm->def))) @@ -2531,7 +2532,7 @@ qemuDomainAttachShmemDevice(virQEMUDriverPtr driver, if ((shmem->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE || shmem->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI) && - (qemuDomainEnsurePCIAddress(vm, &dev) < 0)) + (qemuDomainEnsurePCIAddress(vm, driver, &dev) < 0)) return -1; if (!(shmstr = qemuBuildShmemDevStr(vm->def, shmem, priv->qemuCaps))) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 09b2a72..ac23cb2 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3373,8 +3373,10 @@ qemuProcessReconnect(void *opaque) goto cleanup; } - if ((qemuDomainAssignAddresses(obj->def, priv->qemuCaps, obj, false)) < 0) + if ((qemuDomainAssignAddresses(obj->def, priv->qemuCaps, + driver, obj, false)) < 0) { goto error; + } /* if domain requests security driver we haven't loaded, report error, but * do not kill the domain @@ -5161,9 +5163,10 @@ qemuProcessPrepareDomain(virConnectPtr conn, * use in hotplug */ VIR_DEBUG("Assigning domain PCI addresses"); - if ((qemuDomainAssignAddresses(vm->def, priv->qemuCaps, vm, - !!(flags & VIR_QEMU_PROCESS_START_NEW))) < 0) + if ((qemuDomainAssignAddresses(vm->def, priv->qemuCaps, driver, vm, + !!(flags & VIR_QEMU_PROCESS_START_NEW))) < 0) { goto cleanup; + } if (qemuAssignDeviceAliases(vm->def, priv->qemuCaps) < 0) goto cleanup; @@ -6369,8 +6372,10 @@ int qemuProcessAttach(virConnectPtr conn ATTRIBUTE_UNUSED, * use in hotplug */ VIR_DEBUG("Assigning domain PCI addresses"); - if ((qemuDomainAssignAddresses(vm->def, priv->qemuCaps, vm, false)) < 0) + if ((qemuDomainAssignAddresses(vm->def, priv->qemuCaps, + driver, vm, false)) < 0) { goto error; + } if ((timestamp = virTimeStringNow()) == NULL) goto error; diff --git a/tests/qemuhotplugtest.c b/tests/qemuhotplugtest.c index 89e870c..f0a8453 100644 --- a/tests/qemuhotplugtest.c +++ b/tests/qemuhotplugtest.c @@ -90,8 +90,10 @@ qemuHotplugCreateObjects(virDomainXMLOptionPtr xmlopt, VIR_DOMAIN_DEF_PARSE_INACTIVE))) goto cleanup; - if (qemuDomainAssignAddresses((*vm)->def, priv->qemuCaps, *vm, true) < 0) + if (qemuDomainAssignAddresses((*vm)->def, priv->qemuCaps, + &driver, *vm, true) < 0) { goto cleanup; + } if (qemuAssignDeviceAliases((*vm)->def, priv->qemuCaps) < 0) goto cleanup; -- 2.9.3

On Mon, 2016-11-21 at 00:01 -0500, Laine Stump wrote: [...]
@@ -2376,6 +2388,7 @@ qemuDomainAssignAddresses(virDomainDefPtr def, */ int qemuDomainEnsurePCIAddress(virDomainObjPtr obj, + virQEMUDriverPtr driver, virDomainDeviceDefPtr dev)
I'm pretty sure you want the virQEMUDriverPtr to be *after* the virDomainDeviceDefPtr, not before it.
@@ -1772,7 +1772,8 @@ qemuDomainChrRemove(virDomainDefPtr vmdef, * 0 otherwise */ static int -qemuDomainAttachChrDeviceAssignAddr(virDomainObjPtr vm, +qemuDomainAttachChrDeviceAssignAddr(virQEMUDriverPtr driver, + virDomainObjPtr vm, virDomainChrDefPtr chr)
Here the virQEMUDriverPtr should be last.
@@ -3373,8 +3373,10 @@ qemuProcessReconnect(void *opaque) goto cleanup; } - if ((qemuDomainAssignAddresses(obj->def, priv->qemuCaps, obj, false)) < 0) + if ((qemuDomainAssignAddresses(obj->def, priv->qemuCaps, + driver, obj, false)) < 0) { goto error; + }
According to our style guidelines, the curly braces are optional here, as the second line of the condition is indented in a way that makes it obvious it's still part of the condition. I point this out just as an aside, I actually like it better with the curly braces :) ACK once you shuffle the arguments around. -- Andrea Bolognani / Red Hat / Virtualization

On 11/24/2016 10:31 AM, Andrea Bolognani wrote:
@@ -2376,6 +2388,7 @@ qemuDomainAssignAddresses(virDomainDefPtr def, */ int qemuDomainEnsurePCIAddress(virDomainObjPtr obj, + virQEMUDriverPtr driver, virDomainDeviceDefPtr dev) I'm pretty sure you want the virQEMUDriverPtr to be *after*
On Mon, 2016-11-21 at 00:01 -0500, Laine Stump wrote: [...] the virDomainDeviceDefPtr, not before it.
@@ -1772,7 +1772,8 @@ qemuDomainChrRemove(virDomainDefPtr vmdef, * 0 otherwise */ static int -qemuDomainAttachChrDeviceAssignAddr(virDomainObjPtr vm, +qemuDomainAttachChrDeviceAssignAddr(virQEMUDriverPtr driver, + virDomainObjPtr vm, virDomainChrDefPtr chr) Here the virQEMUDriverPtr should be last.
@@ -3373,8 +3373,10 @@ qemuProcessReconnect(void *opaque) goto cleanup; }
- if ((qemuDomainAssignAddresses(obj->def, priv->qemuCaps, obj, false)) < 0) + if ((qemuDomainAssignAddresses(obj->def, priv->qemuCaps, + driver, obj, false)) < 0) { goto error; + } According to our style guidelines, the curly braces are optional here, as the second line of the condition is indented in a way that makes it obvious it's still part of the condition.
I point this out just as an aside, I actually like it better with the curly braces :)
I agree. I never got the logic behind that exception, so I'm glad it's optional to omit the braces.
ACK once you shuffle the arguments around.
-- Andrea Bolognani / Red Hat / Virtualization

Although nearly all host devices that are assigned to guests using vfio ("<hostdev>" devices in libvirt) are physically PCI Express devices, until now libvirt's PCI address assignment has always assigned them addresses on legacy PCI controllers in the guest, even if the guest's machinetype has a PCIe root bus (e.g. q35 and aarch64/virt). This patch tries to assign them to an address on a PCIe controller instead, when appropriate. First we do some preliminary checks that might allow setting the flags without doing any extra work, and if those conditions aren't met (and if libvirt is running privileged so that it has proper permissions), we perform the (relatively) time consuming task of reading the device's PCI config to see if it is an Express device. If this is successful, the connect flags are set based on the result, but if we aren't able to read the PCI config (most likely due to the device not being present on the system at the time of the check) we assume it is (or will be) an Express device, since that is almost always the case anyway. --- Change since V1: implemented ALex's suggestion of checking the length of the PCI config file when running unprivileged. src/qemu/qemu_domain_address.c | 86 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 83 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 8abae65..47b2c7f 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -428,7 +428,7 @@ qemuDomainAssignARMVirtioMMIOAddresses(virDomainDefPtr def, */ static virDomainPCIConnectFlags qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, - virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virQEMUDriverPtr driver, virDomainPCIConnectFlags pcieFlags, virDomainPCIConnectFlags virtioFlags) { @@ -558,8 +558,88 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, return 0; } - case VIR_DOMAIN_DEVICE_HOSTDEV: - return pciFlags; + case VIR_DOMAIN_DEVICE_HOSTDEV: { + virDomainHostdevDefPtr hostdev = dev->data.hostdev; + + if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && + hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) { + bool isExpress = false; + virPCIDevicePtr pciDev; + virPCIDeviceAddressPtr hostAddr = &hostdev->source.subsys.u.pci.addr; + + if (pciFlags == pcieFlags) { + /* This arch/qemu only supports legacy PCI, so there + * is no point in checking if the device is an Express + * device. + */ + return pciFlags; + } + + if (virDeviceInfoPCIAddressPresent(hostdev->info)) { + /* A guest-side address has already been assigned, so + * we can avoid reading the PCI config, and just use + * pcieFlags, since the pciConnectFlags checking is + * more relaxed when an address is already assigned + * than it is when we're looking for a new address (so + * validation will pass regardless of whether we set + * the flags to PCI or PCIE). + */ + return pcieFlags; + } + + if (!(pciDev = virPCIDeviceNew(hostAddr->domain, + hostAddr->bus, + hostAddr->slot, + hostAddr->function))) { + /* libvirt should be able to perform all the + * operations in virPCIDeviceNew() even if it's + * running unprivileged, so if this fails, the device + * apparently doesn't currently exist on the host. + * Since the overwhelming majority of assignable host + * devices are PCIe, assume this one is too. + */ + return pcieFlags; + } + + if (!driver->privileged) { + /* unprivileged libvirtd is unable to read *all* of a + * device's PCI config (it can only read the first 64 + * bytes, which isn't enough for the check that's done + * in virPCIDeviceIsPCIExpress()), so instead of + * trying and failing, we make an educated guess based + * on the length of the device's config file - if it + * is 256 bytes, then it is definitely a legacy PCI + * device. If it's larger than that, then it is + * *probably PCIe (although it could be PCI-x, but + * those are extremely rare). If the config file can't + * be found (in which case the "length" will be -1), + * then we blindly assume the most likely outcome - + * PCIe. + */ + off_t configLen = virFileLength(virPCIDeviceGetConfigPath(pciDev)); + + virPCIDeviceFree(pciDev); + + if (configLen == 256) + return pciFlags; + + return pcieFlags; + } + + /* If we are running with privileges, we can examine the + * PCI config contents with virPCIDeviceIsPCIExpress() for + * a definitive answer. + */ + isExpress = virPCIDeviceIsPCIExpress(pciDev); + virPCIDeviceFree(pciDev); + + if (isExpress) + return pcieFlags; + + return pciFlags; + } + return 0; + } case VIR_DOMAIN_DEVICE_MEMBALLOON: switch ((virDomainMemballoonModel) dev->data.memballoon->model) { -- 2.9.3

On Mon, 2016-11-21 at 00:01 -0500, Laine Stump wrote:
Although nearly all host devices that are assigned to guests using vfio ("<hostdev>" devices in libvirt) are physically PCI Express
s/vfio/VFIO/ Same for the subject. [...]
@@ -558,8 +558,88 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, return 0; } - case VIR_DOMAIN_DEVICE_HOSTDEV: - return pciFlags; + case VIR_DOMAIN_DEVICE_HOSTDEV: { + virDomainHostdevDefPtr hostdev = dev->data.hostdev; + + if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && + hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
How about inverting this condition and returning 0 immediately unless if the wrong mode or type is used? You could get rid of one indentation level that way, and the function already has a gazillion return statements anyway. [...]
+ if (virDeviceInfoPCIAddressPresent(hostdev->info)) { + /* A guest-side address has already been assigned, so + * we can avoid reading the PCI config, and just use + * pcieFlags, since the pciConnectFlags checking is + * more relaxed when an address is already assigned + * than it is when we're looking for a new address (so + * validation will pass regardless of whether we set + * the flags to PCI or PCIE).
s/PCIE/PCIe/ [...]
+ /* If we are running with privileges, we can examine the + * PCI config contents with virPCIDeviceIsPCIExpress() for + * a definitive answer. + */ + isExpress = virPCIDeviceIsPCIExpress(pciDev); + virPCIDeviceFree(pciDev); + + if (isExpress) + return pcieFlags; + + return pciFlags; + }
Empty line here, please.
+ return 0; + } case VIR_DOMAIN_DEVICE_MEMBALLOON: switch ((virDomainMemballoonModel) dev->data.memballoon->model) {
ACK (with the nits fixed) regardless of whether you decide to go with my suggestion of reducing the indentation level. -- Andrea Bolognani / Red Hat / Virtualization

On 11/24/2016 10:43 AM, Andrea Bolognani wrote:
On Mon, 2016-11-21 at 00:01 -0500, Laine Stump wrote:
Although nearly all host devices that are assigned to guests using vfio ("<hostdev>" devices in libvirt) are physically PCI Express s/vfio/VFIO/
Same for the subject.
[...]
@@ -558,8 +558,88 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, return 0; }
- case VIR_DOMAIN_DEVICE_HOSTDEV: - return pciFlags; + case VIR_DOMAIN_DEVICE_HOSTDEV: { + virDomainHostdevDefPtr hostdev = dev->data.hostdev; + + if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && + hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) { How about inverting this condition and returning 0 immediately unless if the wrong mode or type is used? You could get rid of one indentation level that way, and the function already has a gazillion return statements anyway.
I've been trying to return early only for specific cases, and leave the "catchall" for each case at the end of the case. But since this function is returning the *PCI* connect flags, I guess "not a PCI device" is a pretty specific case :-) (in other words "okay, I'll invert the condition, return early with a 0, and reduce the indent level (which I also like)).

On Mon, 2016-11-21 at 00:01 -0500, Laine Stump wrote:
Patches 1 and 4 were originally added to the end of the "more PCIe less legacy PCI" patchset in its final incarnation, but all the other patches were ACKed and pushed, while this needed a bit more work, resulting in this "faux V2" - although it's the 2nd time I've posted these patches, their "V1" was really inside the "V11666" of a larger series.
Forgot to mention yesterday: the NEWS file will need to contain an entry for this. You can add a single entry that covers both this series and the previous PCIe-related series, or just document the previous one if this one somehow doesn't make it into the release. -- Andrea Bolognani / Red Hat / Virtualization
participants (3)
-
Andrea Bolognani
-
Eric Blake
-
Laine Stump