[libvirt] [PATCH 0/3] Properly deal with multiple netdevs per PCI device when converting VF <-> PF

The commit log of Patch 1 explains the majority of the reason for these patches. In short, they disambiguate the multiple netdevs per PCI device on the SRIOV PF and VFs of a Mellanox dual port NIC *when converting from a VF netdev to a PF netdev and vice versa*. This permits us to set the vlan tag and MAC address for the correct VF netdev (and detect the online status of the correct PF netdev for that VF) when using a VF in macvtap passthrough mode. (in other words, with these patches in place, it is possible to use *all* the VF netdevs on both ports of a dual port mellanox NIC for macvtap passthrough.) These patches *do not* solve the problem of saving/setting the mac address/vlan tag for both ports when using a dual port VF for PCI device assignment with VFIO (see my RFC email from yesterday). That is a much more complex problem. (These patches are a prerequisite to anything that might come out of that RFC though). Laine Stump (3): util: new function virNetDevGetPhysPortID() util: support matching a phys_port_id in virPCIGetNetName() util: match phys_port_id when converting PF-netdev to/from VF-netdev src/libvirt_private.syms | 1 + src/util/virhostdev.c | 2 +- src/util/virnetdev.c | 84 +++++++++++++++++++++++++++++++++++++++++++++--- src/util/virnetdev.h | 5 +++ src/util/virpci.c | 49 ++++++++++++++++++++++------ src/util/virpci.h | 4 ++- 6 files changed, 129 insertions(+), 16 deletions(-) -- 2.13.3

On Linux each network device *can* (but not necessarily *does*) have an attribute called phys_port_id which can be read from the file of that name in the netdev's sysfs directory. The examples I've seen have been a many-digit hexadecimal number (as an ASCII string). This value can be useful when a single PCI device is associated with multiple netdevs (e.g a dual port Mellanox SR-IOV NIC - this card has a single PCI Physical Function (PF), and that PF has two netdevs associated with it (the "net" subdirectory of the PF in sysfs has two links rather than the usual single link to a netdev directory). Each of the PF netdevs has a different phys_port_id. The Virtual Functions (VF) are similar - the PF (a PCI device) has "n" VFs (also each of these is a PCI device), each VF has two netdevs, and each of the VF netdevs points back to the VF PCI device (with the "device" entry in its sysfs directory) as well as having a phys_port_id matching the PF netdev it is associated with. virNetDevGetPhysPortID() simply attempts to read the phys_port_id for the given netdev and return it to the caller. If this particular netdev driver doesn't support phys_port_id (most don't), it returns NULL (*not* a NULL-terminated string, but a NULL pointer) but still counts it as a success. --- src/libvirt_private.syms | 1 + src/util/virnetdev.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 5 +++++ 3 files changed, 57 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 32ac0835e..28d089396 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2113,6 +2113,7 @@ virNetDevGetMTU; virNetDevGetName; virNetDevGetOnline; virNetDevGetPhysicalFunction; +virNetDevGetPhysPortID; virNetDevGetPromiscuous; virNetDevGetRcvAllMulti; virNetDevGetRcvMulti; diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index 90b7bee34..a2664de78 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -1170,6 +1170,46 @@ virNetDevGetPCIDevice(const char *devName) /** + * virNetDevGetPhysPortID: + * + * @ifname: name of a netdev + * + * @physPortID: pointer to char* that will receive @ifname's + * phys_port_id from sysfs (null terminated + * string). Could be NULL if @ifname's net driver doesn't + * support phys_port_id (most netdev drivers + * don't). Caller is responsible for freeing the string + * when finished. + * + * Returns 0 on success or -1 on failure. + */ +int +virNetDevGetPhysPortID(const char *ifname, + char **physPortID) +{ + int ret = -1; + char *physPortIDFile = NULL; + + *physPortID = NULL; + + if (virNetDevSysfsFile(&physPortIDFile, ifname, "phys_port_id") < 0) + goto cleanup; + + /* a failure to read just means the driver doesn't support + * phys_port_id, so set success now and ignore the return from + * virFileReadAllQuiet(). + */ + ret = 0; + + ignore_value(virFileReadAllQuiet(physPortIDFile, 1024, physPortID)); + + cleanup: + VIR_FREE(physPortIDFile); + return ret; +} + + +/** * virNetDevGetVirtualFunctions: * * @pfname : name of the physical function interface name @@ -1433,6 +1473,17 @@ virNetDevGetVirtualFunctionInfo(const char *vfname, char **pfname, #else /* !__linux__ */ int +virNetDevGetPhysPortID(const char *ifname ATTRIBUTE_UNUSED, + char **physPortID ATTRIBUTE_UNUSED) +{ + /* this actually should never be called, and is just here to + * satisfy the linker. + */ + *physPortID = NULL; + return 0; +} + +int virNetDevGetVirtualFunctions(const char *pfname ATTRIBUTE_UNUSED, char ***vfname ATTRIBUTE_UNUSED, virPCIDeviceAddressPtr **virt_fns ATTRIBUTE_UNUSED, diff --git a/src/util/virnetdev.h b/src/util/virnetdev.h index 51fcae544..9205c0e86 100644 --- a/src/util/virnetdev.h +++ b/src/util/virnetdev.h @@ -226,6 +226,11 @@ int virNetDevGetPhysicalFunction(const char *ifname, char **pfname) int virNetDevPFGetVF(const char *pfname, int vf, char **vfname) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; +int virNetDevGetPhysPortID(const char *ifname, + char **physPortID) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) + ATTRIBUTE_RETURN_CHECK; + int virNetDevGetVirtualFunctions(const char *pfname, char ***vfname, virPCIDeviceAddressPtr **virt_fns, -- 2.13.3

A single PCI device may have multiple netdevs associated with it. Each of those netdevs will have a different phys_port_id entry in sysfs. This patch modifies virPCIGetNetName() to allow matching the netdev for a PCI device that has the same phys_port_id that the caller wants. --- src/util/virhostdev.c | 2 +- src/util/virnetdev.c | 6 +++--- src/util/virpci.c | 49 ++++++++++++++++++++++++++++++++++++++++--------- src/util/virpci.h | 4 +++- 4 files changed, 47 insertions(+), 14 deletions(-) diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index 579563c3f..fcefebd07 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -326,7 +326,7 @@ virHostdevNetDevice(virDomainHostdevDefPtr hostdev, char **linkdev, * type='hostdev'>, and it is only those devices that should * end up calling this function. */ - if (virPCIGetNetName(sysfs_path, linkdev) < 0) + if (virPCIGetNetName(sysfs_path, NULL, linkdev) < 0) goto cleanup; if (!linkdev) { diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index a2664de78..1c150b7d7 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -1262,7 +1262,7 @@ virNetDevGetVirtualFunctions(const char *pfname, goto cleanup; } - if (virPCIGetNetName(pci_sysfs_device_link, &((*vfname)[i])) < 0) + if (virPCIGetNetName(pci_sysfs_device_link, NULL, &((*vfname)[i])) < 0) goto cleanup; if (!(*vfname)[i]) @@ -1362,7 +1362,7 @@ virNetDevGetPhysicalFunction(const char *ifname, char **pfname) if (virNetDevSysfsDeviceFile(&physfn_sysfs_path, ifname, "physfn") < 0) return ret; - if (virPCIGetNetName(physfn_sysfs_path, pfname) < 0) + if (virPCIGetNetName(physfn_sysfs_path, NULL, pfname) < 0) goto cleanup; if (!*pfname) { @@ -1422,7 +1422,7 @@ virNetDevPFGetVF(const char *pfname, int vf, char **vfname) * isn't bound to a netdev driver, it won't have a netdev name, * and vfname will be NULL). */ - ret = virPCIGetNetName(virtfnSysfsPath, vfname); + ret = virPCIGetNetName(virtfnSysfsPath, NULL, vfname); cleanup: VIR_FREE(virtfnName); diff --git a/src/util/virpci.c b/src/util/virpci.c index 2c1b75855..5d811ada6 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -24,6 +24,7 @@ #include <config.h> #include "virpci.h" +#include "virnetdev.h" #include <dirent.h> #include <fcntl.h> @@ -2857,12 +2858,15 @@ virPCIDeviceAddressGetSysfsFile(virPCIDeviceAddressPtr addr, * Returns the network device name of a pci device */ int -virPCIGetNetName(char *device_link_sysfs_path, char **netname) +virPCIGetNetName(char *device_link_sysfs_path, + char *physPortID, + char **netname) { char *pcidev_sysfs_net_path = NULL; int ret = -1; DIR *dir = NULL; struct dirent *entry = NULL; + char *thisPhysPortID = NULL; if (virBuildPath(&pcidev_sysfs_net_path, device_link_sysfs_path, "net") == -1) { @@ -2873,21 +2877,47 @@ virPCIGetNetName(char *device_link_sysfs_path, char **netname) if (virDirOpenQuiet(&dir, pcidev_sysfs_net_path) < 0) { /* this *isn't* an error - caller needs to check for netname == NULL */ ret = 0; - goto out; + goto cleanup; } while (virDirRead(dir, &entry, pcidev_sysfs_net_path) > 0) { - /* Assume a single directory entry */ - if (VIR_STRDUP(*netname, entry->d_name) > 0) - ret = 0; + /* if the caller sent a physPortID, compare it to the + * physportID of this netdev. If not, accept the first netdev + */ + if (physPortID) { + if (virNetDevGetPhysPortID(entry->d_name, &thisPhysPortID) < 0) + goto cleanup; + + /* if this one doesn't match, keep looking */ + if (STRNEQ_NULLABLE(physPortID, thisPhysPortID)) { + VIR_FREE(thisPhysPortID); + continue; + } + } + if (VIR_STRDUP(*netname, entry->d_name) < 0) + goto cleanup; + + ret = 0; break; } + if (ret < 0) { + if (physPortID) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Could not find network device with " + "phys_port_id '%s' under PCI device at %s"), + physPortID, device_link_sysfs_path); + } else { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("PCI device at %s had a net directory, " + "but it was empty"), + device_link_sysfs_path); + } + } + cleanup: VIR_DIR_CLOSE(dir); - - out: VIR_FREE(pcidev_sysfs_net_path); - + VIR_FREE(thisPhysPortID); return ret; } @@ -2915,7 +2945,7 @@ virPCIGetVirtualFunctionInfo(const char *vf_sysfs_device_path, goto cleanup; } - if (virPCIGetNetName(pf_sysfs_device_path, pfname) < 0) + if (virPCIGetNetName(pf_sysfs_device_path, NULL, pfname) < 0) goto cleanup; if (!*pfname) { @@ -2992,6 +3022,7 @@ virPCIDeviceAddressGetSysfsFile(virPCIDeviceAddressPtr dev ATTRIBUTE_UNUSED, int virPCIGetNetName(char *device_link_sysfs_path ATTRIBUTE_UNUSED, + char *physPortID ATTRIBUTE_UNUSED, char **netname ATTRIBUTE_UNUSED) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _(unsupported)); diff --git a/src/util/virpci.h b/src/util/virpci.h index 570684e75..c0e54d785 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -207,7 +207,9 @@ int virPCIGetVirtualFunctionIndex(const char *pf_sysfs_device_link, int virPCIDeviceAddressGetSysfsFile(virPCIDeviceAddressPtr addr, char **pci_sysfs_device_link); -int virPCIGetNetName(char *device_link_sysfs_path, char **netname); +int virPCIGetNetName(char *device_link_sysfs_path, + char *physPortID, + char **netname); int virPCIGetSysfsFile(char *virPCIDeviceName, char **pci_sysfs_device_link) -- 2.13.3

This patch updates functions in netdev.c to pay attention to phys_port_id. It uses the new function virNetDevGetPhysPortID() to learn the phys_port_id of a VF or PF, then sends that info to virPCIGetNetName(), which has newly been modified to take an optional phys_port_id. --- src/util/virnetdev.c | 33 ++++++++++++++++++++++++++++----- 1 file changed, 28 insertions(+), 5 deletions(-) diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index 1c150b7d7..83b255219 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -1231,13 +1231,17 @@ virNetDevGetVirtualFunctions(const char *pfname, char *pf_sysfs_device_link = NULL; char *pci_sysfs_device_link = NULL; char *pciConfigAddr = NULL; + char *pfPhysPortID = NULL; *virt_fns = NULL; *n_vfname = 0; *max_vfs = 0; + if (virNetDevGetPhysPortID(pfname, &pfPhysPortID) < 0) + goto cleanup; + if (virNetDevSysfsFile(&pf_sysfs_device_link, pfname, "device") < 0) - return ret; + goto cleanup; if (virPCIGetVirtualFunctions(pf_sysfs_device_link, virt_fns, n_vfname, max_vfs) < 0) @@ -1262,8 +1266,10 @@ virNetDevGetVirtualFunctions(const char *pfname, goto cleanup; } - if (virPCIGetNetName(pci_sysfs_device_link, NULL, &((*vfname)[i])) < 0) + if (virPCIGetNetName(pci_sysfs_device_link, + pfPhysPortID, &((*vfname)[i])) < 0) { goto cleanup; + } if (!(*vfname)[i]) VIR_INFO("VF does not have an interface name"); @@ -1276,6 +1282,7 @@ virNetDevGetVirtualFunctions(const char *pfname, VIR_FREE(*vfname); VIR_FREE(*virt_fns); } + VIR_FREE(pfPhysPortID); VIR_FREE(pf_sysfs_device_link); VIR_FREE(pci_sysfs_device_link); VIR_FREE(pciConfigAddr); @@ -1357,13 +1364,19 @@ int virNetDevGetPhysicalFunction(const char *ifname, char **pfname) { char *physfn_sysfs_path = NULL; + char *vfPhysPortID = NULL; int ret = -1; + if (virNetDevGetPhysPortID(ifname, &vfPhysPortID) < 0) + goto cleanup; + if (virNetDevSysfsDeviceFile(&physfn_sysfs_path, ifname, "physfn") < 0) - return ret; + goto cleanup; - if (virPCIGetNetName(physfn_sysfs_path, NULL, pfname) < 0) + if (virPCIGetNetName(physfn_sysfs_path, + vfPhysPortID, pfname) < 0) { goto cleanup; + } if (!*pfname) { /* this shouldn't be possible. A VF can't exist unless its @@ -1377,6 +1390,7 @@ virNetDevGetPhysicalFunction(const char *ifname, char **pfname) ret = 0; cleanup: + VIR_FREE(vfPhysPortID); VIR_FREE(physfn_sysfs_path); return ret; } @@ -1404,8 +1418,16 @@ virNetDevPFGetVF(const char *pfname, int vf, char **vfname) { char *virtfnName = NULL; char *virtfnSysfsPath = NULL; + char *pfPhysPortID = NULL; int ret = -1; + /* a VF may have multiple "ports", each one having its own netdev, + * and each netdev having a different phys_port_id. Be sure we get + * the VF netdev with a phys_port_id matchine that of pfname + */ + if (virNetDevGetPhysPortID(pfname, &pfPhysPortID) < 0) + goto cleanup; + if (virAsprintf(&virtfnName, "virtfn%d", vf) < 0) goto cleanup; @@ -1422,11 +1444,12 @@ virNetDevPFGetVF(const char *pfname, int vf, char **vfname) * isn't bound to a netdev driver, it won't have a netdev name, * and vfname will be NULL). */ - ret = virPCIGetNetName(virtfnSysfsPath, NULL, vfname); + ret = virPCIGetNetName(virtfnSysfsPath, pfPhysPortID, vfname); cleanup: VIR_FREE(virtfnName); VIR_FREE(virtfnSysfsPath); + VIR_FREE(pfPhysPortID); return ret; } -- 2.13.3

On 08/04/2017 05:36 AM, Laine Stump wrote:
The commit log of Patch 1 explains the majority of the reason for these patches. In short, they disambiguate the multiple netdevs per PCI device on the SRIOV PF and VFs of a Mellanox dual port NIC *when converting from a VF netdev to a PF netdev and vice versa*. This permits us to set the vlan tag and MAC address for the correct VF netdev (and detect the online status of the correct PF netdev for that VF) when using a VF in macvtap passthrough mode. (in other words, with these patches in place, it is possible to use *all* the VF netdevs on both ports of a dual port mellanox NIC for macvtap passthrough.)
These patches *do not* solve the problem of saving/setting the mac address/vlan tag for both ports when using a dual port VF for PCI device assignment with VFIO (see my RFC email from yesterday). That is a much more complex problem. (These patches are a prerequisite to anything that might come out of that RFC though).
Laine Stump (3): util: new function virNetDevGetPhysPortID() util: support matching a phys_port_id in virPCIGetNetName() util: match phys_port_id when converting PF-netdev to/from VF-netdev
src/libvirt_private.syms | 1 + src/util/virhostdev.c | 2 +- src/util/virnetdev.c | 84 +++++++++++++++++++++++++++++++++++++++++++++--- src/util/virnetdev.h | 5 +++ src/util/virpci.c | 49 ++++++++++++++++++++++------ src/util/virpci.h | 4 ++- 6 files changed, 129 insertions(+), 16 deletions(-)
ACK Michal
participants (2)
-
Laine Stump
-
Michal Privoznik