[libvirt] [PATCH 0/5] Interface pools and passthrough mode

Interface Pools and Passthrough mode: Current Method: The passthrough mode uses a macvtap a direct connection to connect each guest to the network. The physical interface to be used is picked from among those listed in <interface> sub elements of the <forward> element. The current specification for <forward> extends to allow 0 or more <interface> sub-elements: Example: <forward mode='passthrough' dev='eth10'/> <interface dev='eth10'/> <interface dev='eth12'/> <interface dev='eth18'/> <interface dev='eth20'/> </forward> However with an ethernet card with 64 VF's or more, the above method gets tedious on the system. On the other hand, just parameterizing a string (eth%d) is inadequate, eg, when there are multiple non-contiguous ranges. Proposed Method: The 5 patches provided along with this introductory e-mail i) Introduce a structure to store the state of all the virtual functions attached to each physical function ii) Find a free virtual function given the physical function. The forward specification will hence only mention the physical function as the interface sub element: Example: <forward mode='passthrough' dev='eth2'/> <interface dev='eth2'/> </forward> Shradha Shah (5): Moving the declaration of _pciDevice and _pciDeviceList to pci.h Added function pciSysfsFile to enable access to the PCI SYSFS files. Adding functions virNetDevGetVirtualFunctions and virNetDevGetNetName virNetDevGetVirtualFunctions: Gets the BDF of all the Virtual Functions given a physical function virNetDevGetNetName: Gets the interface name of the PCI Device. Addition of a new device structure to store the state of a Virtual Function Modifications to the Physical Device Structure to store state of its Virtual Functions If a system has 64 or more VF's, it is quite tedious to mention each VF in the interface pool. The following modification find a Free VF given a Physical Function when mode=passthrough. We also save the state of each VF. Hence modifications to networkAllocateActualDevice, networkNotifyActualDevice and networkReleaseActualDevice were required. The following patch does just that. src/conf/network_conf.c | 19 ++++- src/conf/network_conf.h | 10 ++ src/network/bridge_driver.c | 208 +++++++++++++++++++++++++++++++++++++------ src/util/pci.c | 52 ++++------- src/util/pci.h | 36 ++++++++ src/util/virnetdev.c | 89 ++++++++++++++++++ src/util/virnetdev.h | 10 ++ 7 files changed, 359 insertions(+), 65 deletions(-) -- 1.7.4.4

--- src/util/pci.c | 35 ----------------------------------- src/util/pci.h | 34 ++++++++++++++++++++++++++++++++++ 2 files changed, 34 insertions(+), 35 deletions(-) diff --git a/src/util/pci.c b/src/util/pci.c index cd82b43..857078d 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -45,45 +45,10 @@ # define MODPROBE "modprobe" #endif -#define PCI_SYSFS "/sys/bus/pci/" -#define PCI_ID_LEN 10 /* "XXXX XXXX" */ -#define PCI_ADDR_LEN 13 /* "XXXX:XX:XX.X" */ - #define SRIOV_FOUND 0 #define SRIOV_NOT_FOUND 1 #define SRIOV_ERROR -1 -struct _pciDevice { - unsigned domain; - unsigned bus; - unsigned slot; - unsigned function; - - char name[PCI_ADDR_LEN]; /* domain:bus:slot.function */ - char id[PCI_ID_LEN]; /* product vendor */ - char *path; - const char *used_by; /* The domain which uses the device */ - int fd; - - unsigned initted; - unsigned pcie_cap_pos; - unsigned pci_pm_cap_pos; - unsigned has_flr : 1; - unsigned has_pm_reset : 1; - unsigned managed : 1; - - /* used by reattach function */ - unsigned unbind_from_stub : 1; - unsigned remove_slot : 1; - unsigned reprobe : 1; -}; - -struct _pciDeviceList { - unsigned count; - pciDevice **devs; -}; - - /* For virReportOOMError() and virReportSystemError() */ #define VIR_FROM_THIS VIR_FROM_NONE diff --git a/src/util/pci.h b/src/util/pci.h index 76e37e3..8e47fc2 100644 --- a/src/util/pci.h +++ b/src/util/pci.h @@ -24,9 +24,43 @@ # include "internal.h" +#define PCI_SYSFS "/sys/bus/pci/" +#define PCI_ID_LEN 10 /* "XXXX XXXX" */ +#define PCI_ADDR_LEN 13 /* "XXXX:XX:XX.X" */ + typedef struct _pciDevice pciDevice; typedef struct _pciDeviceList pciDeviceList; +struct _pciDevice { + unsigned domain; + unsigned bus; + unsigned slot; + unsigned function; + + char name[PCI_ADDR_LEN]; /* domain:bus:slot.function */ + char id[PCI_ID_LEN]; /* product vendor */ + char *path; + const char *used_by; /* The domain which uses the device */ + int fd; + + unsigned initted; + unsigned pcie_cap_pos; + unsigned pci_pm_cap_pos; + unsigned has_flr : 1; + unsigned has_pm_reset : 1; + unsigned managed : 1; + + /* used by reattach function */ + unsigned unbind_from_stub : 1; + unsigned remove_slot : 1; + unsigned reprobe : 1; +}; + +struct _pciDeviceList { + unsigned count; + pciDevice **devs; +}; + struct pci_config_address { unsigned int domain; unsigned int bus; -- 1.7.4.4

On Tue, Nov 29, 2011 at 03:48:29PM +0000, Shradha Shah wrote:
--- src/util/pci.c | 35 ----------------------------------- src/util/pci.h | 34 ++++++++++++++++++++++++++++++++++ 2 files changed, 34 insertions(+), 35 deletions(-)
diff --git a/src/util/pci.c b/src/util/pci.c index cd82b43..857078d 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -45,45 +45,10 @@ # define MODPROBE "modprobe" #endif
-#define PCI_SYSFS "/sys/bus/pci/" -#define PCI_ID_LEN 10 /* "XXXX XXXX" */ -#define PCI_ADDR_LEN 13 /* "XXXX:XX:XX.X" */ - #define SRIOV_FOUND 0 #define SRIOV_NOT_FOUND 1 #define SRIOV_ERROR -1
-struct _pciDevice { - unsigned domain; - unsigned bus; - unsigned slot; - unsigned function; - - char name[PCI_ADDR_LEN]; /* domain:bus:slot.function */ - char id[PCI_ID_LEN]; /* product vendor */ - char *path; - const char *used_by; /* The domain which uses the device */ - int fd; - - unsigned initted; - unsigned pcie_cap_pos; - unsigned pci_pm_cap_pos; - unsigned has_flr : 1; - unsigned has_pm_reset : 1; - unsigned managed : 1; - - /* used by reattach function */ - unsigned unbind_from_stub : 1; - unsigned remove_slot : 1; - unsigned reprobe : 1; -}; - -struct _pciDeviceList { - unsigned count; - pciDevice **devs; -}; - - /* For virReportOOMError() and virReportSystemError() */ #define VIR_FROM_THIS VIR_FROM_NONE
diff --git a/src/util/pci.h b/src/util/pci.h index 76e37e3..8e47fc2 100644 --- a/src/util/pci.h +++ b/src/util/pci.h @@ -24,9 +24,43 @@
# include "internal.h"
+#define PCI_SYSFS "/sys/bus/pci/" +#define PCI_ID_LEN 10 /* "XXXX XXXX" */ +#define PCI_ADDR_LEN 13 /* "XXXX:XX:XX.X" */ + typedef struct _pciDevice pciDevice; typedef struct _pciDeviceList pciDeviceList;
+struct _pciDevice { + unsigned domain; + unsigned bus; + unsigned slot; + unsigned function; + + char name[PCI_ADDR_LEN]; /* domain:bus:slot.function */ + char id[PCI_ID_LEN]; /* product vendor */ + char *path; + const char *used_by; /* The domain which uses the device */ + int fd; + + unsigned initted; + unsigned pcie_cap_pos; + unsigned pci_pm_cap_pos; + unsigned has_flr : 1; + unsigned has_pm_reset : 1; + unsigned managed : 1; + + /* used by reattach function */ + unsigned unbind_from_stub : 1; + unsigned remove_slot : 1; + unsigned reprobe : 1; +}; + +struct _pciDeviceList { + unsigned count; + pciDevice **devs; +};
I could have misread something, but looking at the following 4 patches, I don't see any code which requires direct access to the internals of the 'struct pciDevice' type. If there is such a need I've missed, then it would be desirable to just add more setter/getter APIs in the pci.h, and keep 'struct pciDevice' private to pci.c Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

--- src/util/pci.c | 17 +++++++++++++++++ src/util/pci.h | 2 ++ 2 files changed, 19 insertions(+), 0 deletions(-) diff --git a/src/util/pci.c b/src/util/pci.c index 857078d..89a9d12 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -39,6 +39,7 @@ #include "command.h" #include "virterror_internal.h" #include "virfile.h" +#include "virnetdev.h" /* avoid compilation breakage on some systems */ #ifndef MODPROBE @@ -1991,6 +1992,22 @@ out: return ret; } + /* + * Returns a path to the PCI sysfs file given the BDF of the PCI function + */ + +int +pciSysfsFile(char **pci_sysfs_device_link, char *pciDeviceName) +{ + if (virAsprintf(pci_sysfs_device_link, PCI_SYSFS "devices/%s", + pciDeviceName) < 0) { + virReportOOMError(); + return -1; + } + + return 0; +} + /* * Returns the network device name of a pci device */ diff --git a/src/util/pci.h b/src/util/pci.h index 8e47fc2..3800d98 100644 --- a/src/util/pci.h +++ b/src/util/pci.h @@ -143,4 +143,6 @@ int pciGetVirtualFunctionIndex(const char *pf_sysfs_device_link, int pciDeviceNetName(char *device_link_sysfs_path, char **netname); +int pciSysfsFile(char **pci_sysfs_device_link, char *pciDeviceName); + #endif /* __VIR_PCI_H__ */ -- 1.7.4.4

On Tue, Nov 29, 2011 at 03:48:52PM +0000, Shradha Shah wrote:
--- src/util/pci.c | 17 +++++++++++++++++ src/util/pci.h | 2 ++ 2 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/src/util/pci.c b/src/util/pci.c index 857078d..89a9d12 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -39,6 +39,7 @@ #include "command.h" #include "virterror_internal.h" #include "virfile.h" +#include "virnetdev.h"
I think this can be removed.
@@ -1991,6 +1992,22 @@ out: return ret; }
+ /* + * Returns a path to the PCI sysfs file given the BDF of the PCI function + */ + +int +pciSysfsFile(char **pci_sysfs_device_link, char *pciDeviceName) +{ + if (virAsprintf(pci_sysfs_device_link, PCI_SYSFS "devices/%s", + pciDeviceName) < 0) { + virReportOOMError(); + return -1; + } + + return 0; +} + /* * Returns the network device name of a pci device */ diff --git a/src/util/pci.h b/src/util/pci.h index 8e47fc2..3800d98 100644 --- a/src/util/pci.h +++ b/src/util/pci.h @@ -143,4 +143,6 @@ int pciGetVirtualFunctionIndex(const char *pf_sysfs_device_link,
int pciDeviceNetName(char *device_link_sysfs_path, char **netname);
+int pciSysfsFile(char **pci_sysfs_device_link, char *pciDeviceName);
My preference would be to swap the order of these two args, so the out parameter is last. Functionally though, the patch is fine. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

--- src/util/pci.c | 3 +-- src/util/pci.h | 2 +- src/util/virnetdev.c | 2 +- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/src/util/pci.c b/src/util/pci.c index 89a9d12..311a847 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -39,7 +39,6 @@ #include "command.h" #include "virterror_internal.h" #include "virfile.h" -#include "virnetdev.h" /* avoid compilation breakage on some systems */ #ifndef MODPROBE @@ -1997,7 +1996,7 @@ out: */ int -pciSysfsFile(char **pci_sysfs_device_link, char *pciDeviceName) +pciSysfsFile(char *pciDeviceName, char **pci_sysfs_device_link) { if (virAsprintf(pci_sysfs_device_link, PCI_SYSFS "devices/%s", pciDeviceName) < 0) { diff --git a/src/util/pci.h b/src/util/pci.h index 3800d98..30ff6ff 100644 --- a/src/util/pci.h +++ b/src/util/pci.h @@ -143,6 +143,6 @@ int pciGetVirtualFunctionIndex(const char *pf_sysfs_device_link, int pciDeviceNetName(char *device_link_sysfs_path, char **netname); -int pciSysfsFile(char **pci_sysfs_device_link, char *pciDeviceName); +int pciSysfsFile(char *pciDeviceName, char **pci_sysfs_device_link); #endif /* __VIR_PCI_H__ */ diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index d07d8fa..0b3f91a 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -1017,7 +1017,7 @@ virNetDevGetNetName(struct pci_config_address *vf, char **vfname) dev = pciGetDevice(vf->domain, vf->bus, vf->slot, vf->function); if (dev != NULL) { - if (pciSysfsFile(&pci_sysfs_device_link, dev->name) < 0) { + if (pciSysfsFile(dev->name, &pci_sysfs_device_link) < 0) { virReportSystemError(ENOSYS, "%s", _("Failed to get PCI SYSFS file")); goto out; -- 1.7.4.4

--- src/util/virnetdev.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 10 ++++++ 2 files changed, 99 insertions(+), 0 deletions(-) diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index 86196a1..d07d8fa 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -968,6 +968,76 @@ virNetDevSysfsDeviceFile(char **pf_sysfs_device_link, const char *ifname, } /** + * virNetDevGetVirtualFunctions: + * + * @pfname : name of the physical function interface name + * @virt_fns: array that will hold the pointers to the virtual_functions + * @num_virt_fns: number of virtual functions + * + * Returns 0 on success and -1 on failure + */ + +int +virNetDevGetVirtualFunctions(const char *pfname, + struct pci_config_address ***virt_fns, + unsigned int *num_virt_fns) +{ + int ret = -1; + char *pf_sysfs_device_link = NULL; + + + if (virNetDevSysfsFile(&pf_sysfs_device_link, pfname, "device") < 0) + return ret; + + if (pciGetVirtualFunctions(pf_sysfs_device_link, virt_fns, + num_virt_fns) < 0) + goto out; + ret = 0; + +out: + VIR_FREE(pf_sysfs_device_link); + return ret; +} + +/** + * virNetDevGetNetName: + * + * @vf : pci address (BDF) of the Virtual Function + * @vfname: interface name of the virtual function returned by this function + * + * Returns 0 on success and -1 on failure + */ + +int +virNetDevGetNetName(struct pci_config_address *vf, char **vfname) +{ + int ret = -1; + pciDevice *dev = NULL; + char *pci_sysfs_device_link = NULL; + + dev = pciGetDevice(vf->domain, vf->bus, vf->slot, vf->function); + if (dev != NULL) { + if (pciSysfsFile(&pci_sysfs_device_link, dev->name) < 0) { + virReportSystemError(ENOSYS, "%s", + _("Failed to get PCI SYSFS file")); + goto out; + } + + if (pciDeviceNetName(pci_sysfs_device_link, vfname) < 0) { + virReportSystemError(ENOSYS, "%s", + _("Failed to get interface name of the VF")); + goto out; + } + } + ret = 0; + +out: + VIR_FREE(pci_sysfs_device_link); + pciFreeDevice(dev); + return ret; +} + +/** * virNetDevIsVirtualFunction: * @ifname : name of the interface * @@ -1055,6 +1125,25 @@ virNetDevGetPhysicalFunction(const char *ifname, char **pfname) } #else /* !__linux__ */ int +virNetDevGetVirtualFunctions(const char *pfname ATTRIBUTE_UNUSED, + struct pci_config_address ***virt_fns ATTRIBUTE_UNUSED, + unsigned int *num_virt_fns ATTRIBUTE_UNUSED) +{ + virReportSystemError(ENOSYS, "%s", + _("Unable to get virtual functions on this platfornm")); + return -1; +} + +int +virNetDevGetNetName(struct pci_config_address *vf ATTRIBUTE_UNUSED, + char **vfname ATTRIBUTE_UNUSED) +{ + virReportSystemError(ENOSYS, "%s", + _("Unable to get Device name on this platfornm")); + return -1; +} + +int virNetDevIsVirtualFunction(const char *ifname ATTRIBUTE_UNUSED) { virReportSystemError(ENOSYS, "%s", diff --git a/src/util/virnetdev.h b/src/util/virnetdev.h index 13ba1da..4d1573a 100644 --- a/src/util/virnetdev.h +++ b/src/util/virnetdev.h @@ -24,6 +24,7 @@ # define __VIR_NETDEV_H__ # include "virsocketaddr.h" +# include "pci.h" int virNetDevExists(const char *brname) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; @@ -99,4 +100,13 @@ int virNetDevGetVirtualFunctionIndex(const char *pfname, const char *vfname, int virNetDevGetPhysicalFunction(const char *ifname, char **pfname) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; +int virNetDevGetVirtualFunctions(const char *pfname, + struct pci_config_address ***virt_fns, + unsigned int *num_virt_fns) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_RETURN_CHECK; + +int virNetDevGetNetName(struct pci_config_address *vf, char **vfname) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; + #endif /* __VIR_NETDEV_H__ */ -- 1.7.4.4

On Tue, Nov 29, 2011 at 03:49:05PM +0000, Shradha Shah wrote:
--- src/util/virnetdev.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 10 ++++++ 2 files changed, 99 insertions(+), 0 deletions(-)
diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index 86196a1..d07d8fa 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -968,6 +968,76 @@ virNetDevSysfsDeviceFile(char **pf_sysfs_device_link, const char *ifname, }
/** + * virNetDevGetVirtualFunctions: + * + * @pfname : name of the physical function interface name + * @virt_fns: array that will hold the pointers to the virtual_functions + * @num_virt_fns: number of virtual functions + * + * Returns 0 on success and -1 on failure + */ + +int +virNetDevGetVirtualFunctions(const char *pfname, + struct pci_config_address ***virt_fns, + unsigned int *num_virt_fns) +{ + int ret = -1; + char *pf_sysfs_device_link = NULL; + + + if (virNetDevSysfsFile(&pf_sysfs_device_link, pfname, "device") < 0) + return ret; + + if (pciGetVirtualFunctions(pf_sysfs_device_link, virt_fns, + num_virt_fns) < 0) + goto out; + ret = 0; + +out: + VIR_FREE(pf_sysfs_device_link); + return ret; +} + +/** + * virNetDevGetNetName: + * + * @vf : pci address (BDF) of the Virtual Function + * @vfname: interface name of the virtual function returned by this function + * + * Returns 0 on success and -1 on failure + */ + +int +virNetDevGetNetName(struct pci_config_address *vf, char **vfname) +{ + int ret = -1; + pciDevice *dev = NULL; + char *pci_sysfs_device_link = NULL; + + dev = pciGetDevice(vf->domain, vf->bus, vf->slot, vf->function); + if (dev != NULL) { + if (pciSysfsFile(&pci_sysfs_device_link, dev->name) < 0) { + virReportSystemError(ENOSYS, "%s", + _("Failed to get PCI SYSFS file")); + goto out; + } + + if (pciDeviceNetName(pci_sysfs_device_link, vfname) < 0) { + virReportSystemError(ENOSYS, "%s", + _("Failed to get interface name of the VF")); + goto out; + } + } + ret = 0; + +out: + VIR_FREE(pci_sysfs_device_link); + pciFreeDevice(dev); + return ret; +} + +/** * virNetDevIsVirtualFunction: * @ifname : name of the interface * @@ -1055,6 +1125,25 @@ virNetDevGetPhysicalFunction(const char *ifname, char **pfname) } #else /* !__linux__ */ int +virNetDevGetVirtualFunctions(const char *pfname ATTRIBUTE_UNUSED, + struct pci_config_address ***virt_fns ATTRIBUTE_UNUSED, + unsigned int *num_virt_fns ATTRIBUTE_UNUSED) +{ + virReportSystemError(ENOSYS, "%s", + _("Unable to get virtual functions on this platfornm")); + return -1; +} + +int +virNetDevGetNetName(struct pci_config_address *vf ATTRIBUTE_UNUSED, + char **vfname ATTRIBUTE_UNUSED) +{ + virReportSystemError(ENOSYS, "%s", + _("Unable to get Device name on this platfornm")); + return -1; +} + +int virNetDevIsVirtualFunction(const char *ifname ATTRIBUTE_UNUSED) { virReportSystemError(ENOSYS, "%s", diff --git a/src/util/virnetdev.h b/src/util/virnetdev.h index 13ba1da..4d1573a 100644 --- a/src/util/virnetdev.h +++ b/src/util/virnetdev.h @@ -24,6 +24,7 @@ # define __VIR_NETDEV_H__
# include "virsocketaddr.h" +# include "pci.h"
int virNetDevExists(const char *brname) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; @@ -99,4 +100,13 @@ int virNetDevGetVirtualFunctionIndex(const char *pfname, const char *vfname, int virNetDevGetPhysicalFunction(const char *ifname, char **pfname) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK;
+int virNetDevGetVirtualFunctions(const char *pfname, + struct pci_config_address ***virt_fns, + unsigned int *num_virt_fns) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_RETURN_CHECK; + +int virNetDevGetNetName(struct pci_config_address *vf, char **vfname) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; +
Rather than expose the 'struct pci_config_address' in the virnetdev.h header file, I think it would be desirable to merge the virNetDevGetNameName code into virNetDevGetVirtualFunctions, so that it just directly returns the list of VF devices. eg int virNetDevGetVirtualFunctions(const char *pfname, char **vfnames, unsigned int *nvfnames) The callers all seem to call virNetDevGetNetName on the results of virNetDevGetVirtualFunctions, so this merge would simplify code in the callers somewhat. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

--- src/conf/network_conf.c | 19 ++++++++++++++++++- src/conf/network_conf.h | 10 ++++++++++ 2 files changed, 28 insertions(+), 1 deletions(-) diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c index 10afcde..d30d757 100644 --- a/src/conf/network_conf.c +++ b/src/conf/network_conf.c @@ -43,6 +43,7 @@ #include "buf.h" #include "c-ctype.h" #include "virfile.h" +#include "logging.h" #define MAX_BRIDGE_ID 256 #define VIR_FROM_THIS VIR_FROM_NETWORK @@ -103,6 +104,12 @@ virNetworkForwardIfDefClear(virNetworkForwardIfDefPtr def) VIR_FREE(def->dev); } +static void +virNetworkForwardVfDefClear(virNetworkForwardVfDefPtr def) +{ + VIR_FREE(def->dev); +} + static void virNetworkIpDefClear(virNetworkIpDefPtr def) { int ii; @@ -144,7 +151,7 @@ static void virNetworkDNSDefFree(virNetworkDNSDefPtr def) void virNetworkDefFree(virNetworkDefPtr def) { - int ii; + int ii, jj; if (!def) return; @@ -154,6 +161,10 @@ void virNetworkDefFree(virNetworkDefPtr def) VIR_FREE(def->domain); for (ii = 0 ; ii < def->nForwardIfs && def->forwardIfs ; ii++) { + for (jj = 0 ; jj < (def->forwardIfs[ii]).nForwardVfs ; jj++) { + virNetworkForwardVfDefClear(&(def->forwardIfs[ii].forwardVfs[jj])); + } + VIR_FREE(def->forwardIfs[ii].forwardVfs); virNetworkForwardIfDefClear(&def->forwardIfs[ii]); } VIR_FREE(def->forwardIfs); @@ -979,6 +990,9 @@ virNetworkDefParseXML(xmlXPathContextPtr ctxt) if (forwardDev) { def->forwardIfs[0].usageCount = 0; + def->forwardIfs[0].vfs_in_use_count = 0; + def->forwardIfs[0].nForwardVfs = 0; + def->forwardIfs[0].forwardVfs = NULL; def->forwardIfs[0].dev = forwardDev; forwardDev = NULL; def->nForwardIfs++; @@ -1011,6 +1025,9 @@ virNetworkDefParseXML(xmlXPathContextPtr ctxt) def->forwardIfs[ii].dev = forwardDev; forwardDev = NULL; def->forwardIfs[ii].usageCount = 0; + def->forwardIfs[ii].vfs_in_use_count = 0; + def->forwardIfs[ii].nForwardVfs = 0; + def->forwardIfs[ii].forwardVfs = NULL; def->nForwardIfs++; } } diff --git a/src/conf/network_conf.h b/src/conf/network_conf.h index 1be20f8..d1dba94 100644 --- a/src/conf/network_conf.h +++ b/src/conf/network_conf.h @@ -112,11 +112,21 @@ struct _virNetworkIpDef { virSocketAddr bootserver; }; +typedef struct _virNetworkForwardVfDef virNetworkForwardVfDef; +typedef virNetworkForwardVfDef *virNetworkForwardVfDefPtr; +struct _virNetworkForwardVfDef { + char *dev; + int usageCount; +}; + typedef struct _virNetworkForwardIfDef virNetworkForwardIfDef; typedef virNetworkForwardIfDef *virNetworkForwardIfDefPtr; struct _virNetworkForwardIfDef { char *dev; /* name of device */ int usageCount; /* how many guest interfaces are bound to this device? */ + int vfs_in_use_count; /*if SRIOV, Are any of the Vf's on the dev in use? */ + size_t nForwardVfs; + virNetworkForwardVfDefPtr forwardVfs; }; typedef struct _virPortGroupDef virPortGroupDef; -- 1.7.4.4

--- src/network/bridge_driver.c | 208 +++++++++++++++++++++++++++++++++++++------ 1 files changed, 179 insertions(+), 29 deletions(-) diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index c49c25b..a2e3119 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -55,6 +55,7 @@ #include "memory.h" #include "uuid.h" #include "iptables.h" +#include "pci.h" #include "logging.h" #include "dnsmasq.h" #include "configmake.h" @@ -2823,8 +2824,11 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) goto cleanup; } else { int ii; - virNetworkForwardIfDefPtr dev = NULL; - + virNetworkForwardVfDefPtr dev = NULL; + unsigned int vf_found = 0; + unsigned int num_virt_fns = 0; + struct pci_config_address **virt_fns = NULL; + /* pick an interface from the pool */ /* PASSTHROUGH mode, and PRIVATE Mode + 802.1Qbh both require @@ -2832,32 +2836,105 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) * 0. Other modes can share, so just search for the one with * the lowest usageCount. */ - if ((netdef->forwardType == VIR_NETWORK_FORWARD_PASSTHROUGH) || - ((netdef->forwardType == VIR_NETWORK_FORWARD_PRIVATE) && - iface->data.network.actual->data.direct.virtPortProfile && - (iface->data.network.actual->data.direct.virtPortProfile->virtPortType - == VIR_NETDEV_VPORT_PROFILE_8021QBH))) { - /* pick first dev with 0 usageCount */ + if (netdef->forwardType == VIR_NETWORK_FORWARD_PASSTHROUGH) { + if (netdef->nForwardIfs == 1) { + if (netdef->forwardIfs[0].vfs_in_use_count == 0) { + if ((virNetDevGetVirtualFunctions(netdef->forwardIfs[0].dev, + &virt_fns, &num_virt_fns)) < 0){ + networkReportError(VIR_ERR_INTERNAL_ERROR, + _("Could not get Virtual functions on %s"), + netdef->forwardIfs[0].dev); + goto out; + } + + netdef->forwardIfs[0].nForwardVfs = num_virt_fns; + + if ((VIR_ALLOC_N(netdef->forwardIfs[0].forwardVfs, + netdef->forwardIfs[0].nForwardVfs)) < 0) { + virReportOOMError(); + goto out; + } + + for (ii = 0; ii < netdef->forwardIfs[0].nForwardVfs; ii++) { + if ((virNetDevGetNetName(virt_fns[ii], + &netdef->forwardIfs[0].forwardVfs[ii].dev)) < 0) { + networkReportError(VIR_ERR_INTERNAL_ERROR, + _("Could not get Interface name")); + goto out; + } + netdef->forwardIfs[0].forwardVfs[ii].usageCount = 0; + } + } + + for (ii = 0; ii < netdef->forwardIfs[0].nForwardVfs; ii++) { + if (netdef->forwardIfs[0].forwardVfs[ii].usageCount == 0) { + netdef->forwardIfs[0].vfs_in_use_count++; + dev = &netdef->forwardIfs[0].forwardVfs[ii]; + vf_found = 1; + break; + } + } + /* If No Vf's are present or if Vf's are in use + * check whether the PF is free and assign PF + * to dev + */ + if(vf_found == 0) { + networkReportError(VIR_ERR_INTERNAL_ERROR, + _("No Free Vf's on %s, checking if PF is free"), + netdef->forwardIfs[0].dev); + if (netdef->forwardIfs[0].usageCount == 0) { + dev = (virNetworkForwardVfDef *)&netdef->forwardIfs[0]; + } + } + vf_found = 0; + } + /* If more than 1 ForwardIfs are present check usagecount of each + * find the one that is free + */ + else { + for (ii = 0; ii < netdef->nForwardIfs; ii++) { + if (netdef->forwardIfs[ii].usageCount == 0) { + dev = (virNetworkForwardVfDef *)&netdef->forwardIfs[ii]; + break; + } + } + } + } + else if ((netdef->forwardType == VIR_NETWORK_FORWARD_PRIVATE) && + iface->data.network.actual->data.direct.virtPortProfile && + (iface->data.network.actual->data.direct.virtPortProfile->virtPortType + == VIR_NETDEV_VPORT_PROFILE_8021QBH)) { + + + /* pick first dev with 0 usageCount */ + for (ii = 0; ii < netdef->nForwardIfs; ii++) { if (netdef->forwardIfs[ii].usageCount == 0) { - dev = &netdef->forwardIfs[ii]; + dev = (virNetworkForwardVfDef *)&netdef->forwardIfs[ii]; break; } } - } else { + } + else { /* pick least used dev */ - dev = &netdef->forwardIfs[0]; + dev = (virNetworkForwardVfDef *)&netdef->forwardIfs[0]; for (ii = 1; ii < netdef->nForwardIfs; ii++) { if (netdef->forwardIfs[ii].usageCount < dev->usageCount) - dev = &netdef->forwardIfs[ii]; + dev = (virNetworkForwardVfDef *)&netdef->forwardIfs[ii]; } } + + out: + for (ii = 0; ii < num_virt_fns; ii++) + VIR_FREE(virt_fns[ii]); + VIR_FREE(virt_fns); + /* dev points at the physical device we want to use */ if (!dev) { networkReportError(VIR_ERR_INTERNAL_ERROR, _("network '%s' requires exclusive access to interfaces, but none are available"), - netdef->name); + netdef->name); goto cleanup; } iface->data.network.actual->data.direct.linkdev = strdup(dev->dev); @@ -2871,7 +2948,7 @@ networkAllocateActualDevice(virDomainNetDefPtr iface) dev->dev, dev->usageCount); } } - + ret = 0; cleanup: if (network) @@ -2935,17 +3012,55 @@ networkNotifyActualDevice(virDomainNetDefPtr iface) netdef->name); goto cleanup; } else { - int ii; - virNetworkForwardIfDefPtr dev = NULL; - + int ii, jj, isVf; + virNetworkForwardVfDefPtr dev = NULL; + char *pfDeviceName; + /* find the matching interface in the pool and increment its usageCount */ + + isVf = virNetDevIsVirtualFunction(actualDev); - for (ii = 0; ii < netdef->nForwardIfs; ii++) { - if (STREQ(actualDev, netdef->forwardIfs[ii].dev)) { - dev = &netdef->forwardIfs[ii]; - break; + if (isVf == 1) { + /*If Vf get PF : pciGetPhysicalFunction */ + if ((virNetDevGetPhysicalFunction(actualDev, &pfDeviceName)) < 0) { + networkReportError(VIR_ERR_INTERNAL_ERROR, + _("Could not get Physical functions for %s"), + actualDev); + goto out; + } + + /* Find the matching PF from the list of netdef->forwardIfs + * Search through the ForwardVfs list of the PF to find the VF + */ + for (ii = 0; ii < netdef->nForwardIfs; ii++) { + if (STREQ(pfDeviceName, netdef->forwardIfs[ii].dev)) { + for (jj = 0; jj < netdef->forwardIfs[ii].nForwardVfs; jj++) { + if (STREQ(actualDev, netdef->forwardIfs[ii].forwardVfs[jj].dev)) { + dev = &netdef->forwardIfs[ii].forwardVfs[jj]; + break; + } + } + } } } + else if (isVf == 0) { + /*If the actual dev is a PF do the following code*/ + for (ii = 0; ii < netdef->nForwardIfs; ii++) { + if (STREQ(actualDev, netdef->forwardIfs[ii].dev)) { + dev = (virNetworkForwardVfDef *)&netdef->forwardIfs[ii]; + break; + } + } + } + else { + networkReportError(VIR_ERR_INTERNAL_ERROR, + _("Could not find whether %s is PF or VF"), + actualDev); + goto out; + } + + out: + /* dev points at the physical device we want to use */ if (!dev) { networkReportError(VIR_ERR_INTERNAL_ERROR, @@ -2953,7 +3068,7 @@ networkNotifyActualDevice(virDomainNetDefPtr iface) netdef->name, actualDev); goto cleanup; } - + /* PASSTHROUGH mode, and PRIVATE Mode + 802.1Qbh both require * exclusive access to a device, so current usageCount must be * 0 in those cases. @@ -3036,15 +3151,50 @@ networkReleaseActualDevice(virDomainNetDefPtr iface) netdef->name); goto cleanup; } else { - int ii; - virNetworkForwardIfDefPtr dev = NULL; - - for (ii = 0; ii < netdef->nForwardIfs; ii++) { - if (STREQ(actualDev, netdef->forwardIfs[ii].dev)) { - dev = &netdef->forwardIfs[ii]; - break; + int ii, jj, isVf; + virNetworkForwardVfDefPtr dev = NULL; + char *pfDeviceName; + + isVf = virNetDevIsVirtualFunction(actualDev); + + if (isVf == 1) { + /*If Vf get PF */ + if ((virNetDevGetPhysicalFunction(actualDev, &pfDeviceName)) < 0) { + networkReportError(VIR_ERR_INTERNAL_ERROR, + _("Could not get Physical functions for %s"), + actualDev); + goto out; } + + /* Find the matching PF from the list of netdef->forwardIfs + * search through the ForwardVfs list of the PF to find the VF + */ + for (ii = 0; ii < netdef->nForwardIfs; ii++) { + if (STREQ(pfDeviceName, netdef->forwardIfs[ii].dev)) { + for (jj = 0; jj < netdef->forwardIfs[ii].nForwardVfs; jj++) { + if (STREQ(actualDev, netdef->forwardIfs[ii].forwardVfs[jj].dev)) { + dev = &netdef->forwardIfs[ii].forwardVfs[jj]; + break; + } + } + } + } + } + else if (isVf == 0) { + for (ii = 0; ii < netdef->nForwardIfs; ii++) { + if (STREQ(actualDev, netdef->forwardIfs[ii].dev)) { + dev = (virNetworkForwardVfDef *)&netdef->forwardIfs[ii]; + break; + } + } + } + else { + networkReportError(VIR_ERR_INTERNAL_ERROR, + _("Could not find whether %s is PF or VF"), + actualDev); + goto out; } + out: /* dev points at the physical device we've been using */ if (!dev) { networkReportError(VIR_ERR_INTERNAL_ERROR, -- 1.7.4.4

On Tue, Nov 29, 2011 at 03:46:13PM +0000, Shradha Shah wrote:
Interface Pools and Passthrough mode:
Current Method: The passthrough mode uses a macvtap a direct connection to connect each guest to the network. The physical interface to be used is picked from among those listed in <interface> sub elements of the <forward> element.
The current specification for <forward> extends to allow 0 or more <interface> sub-elements: Example: <forward mode='passthrough' dev='eth10'/> <interface dev='eth10'/> <interface dev='eth12'/> <interface dev='eth18'/> <interface dev='eth20'/> </forward>
However with an ethernet card with 64 VF's or more, the above method gets tedious on the system.
On the other hand, just parameterizing a string (eth%d) is inadequate, eg, when there are multiple non-contiguous ranges.
Proposed Method: The 5 patches provided along with this introductory e-mail
i) Introduce a structure to store the state of all the virtual functions attached to each physical function ii) Find a free virtual function given the physical function.
The forward specification will hence only mention the physical function as the interface sub element: Example: <forward mode='passthrough' dev='eth2'/> <interface dev='eth2'/> </forward>
I can see what you mean about it being tedious to construct the config with all 64 (or more) VF's listed, but this proposed change has a couple of issues First of all, the change you describe would be a semantic change in the network XML, which would break compatibility with previous releases of libvirt. Since we consider XML to be poart of our long term ABI stability guarantee I don't think we can do the change. Ignoring the ABI issue, I'm concerned that as we get PFs with an increasingly large number of VFs, we may well *not* want to associate all VFs with a single virtual network definition. eg, we might wna to put 32 VFs in one network and 32 VFs in another network. Or if we have 2 PFs, we might want to interleave VFs from several PFs across virtual networks. If all we can do is list the PF in the XML, we loose significant flexibility in how VFs are assigned. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 11/29/2011 02:53 PM, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 03:46:13PM +0000, Shradha Shah wrote:
Interface Pools and Passthrough mode:
Current Method: The passthrough mode uses a macvtap a direct connection to connect each guest to the network. The physical interface to be used is picked from among those listed in<interface> sub elements of the<forward> element.
The current specification for<forward> extends to allow 0 or more<interface> sub-elements: Example: <forward mode='passthrough' dev='eth10'/> <interface dev='eth10'/> <interface dev='eth12'/> <interface dev='eth18'/> <interface dev='eth20'/> </forward>
However with an ethernet card with 64 VF's or more, the above method gets tedious on the system.
On the other hand, just parameterizing a string (eth%d) is inadequate, eg, when there are multiple non-contiguous ranges.
Heh. You've gone through some of the same thought process I went through when I wrote the original code :-)
Proposed Method: The 5 patches provided along with this introductory e-mail
i) Introduce a structure to store the state of all the virtual functions attached to each physical function ii) Find a free virtual function given the physical function.
The forward specification will hence only mention the physical function as the interface sub element: Example: <forward mode='passthrough' dev='eth2'/> <interface dev='eth2'/> </forward>
I can see what you mean about it being tedious to construct the config with all 64 (or more) VF's listed, but this proposed change has a couple of issues
First of all, the change you describe would be a semantic change in the network XML, which would break compatibility with previous releases of libvirt. Since we consider XML to be poart of our long term ABI stability guarantee I don't think we can do the change.
Ignoring the ABI issue, I'm concerned that as we get PFs with an increasingly large number of VFs, we may well *not* want to associate all VFs with a single virtual network definition. eg, we might wna to put 32 VFs in one network and 32 VFs in another network. Or if we have 2 PFs, we might want to interleave VFs from several PFs across virtual networks. If all we can do is list the PF in the XML, we loose significant flexibility in how VFs are assigned.
My first concern too when I saw the patch was the semantic change (but also the loss of flexibility), which is obviously a no-go. It's a convenient capability to have though, so it would be nice to get it in somehow. What if we allowed including all the VFs associated with a PF by adding an extra attribute? e.g.: <interface dev='eth10' type='sriov'/> (or whatever is more appropriate in place of "sriov"). Or possibly a different element type could be used: <pf dev='eth10'/> (didn't want to spend time thinking of a better name than "pf"...). At the time the network is created, this would cause libvirt to get the list of all VFs for the given PF and put them into the pool. This could be used instead of, or in combination with, the existing <interface dev='eth1'/> form. Thus the existing semantics would be preserved, the flexibility of specifying individual devices would be retained, and the desired convenience of adding all VFs of a PF with a single line would be added.

On 11/30/2011 01:29 AM, Laine Stump wrote:
On 11/29/2011 02:53 PM, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 03:46:13PM +0000, Shradha Shah wrote:
Interface Pools and Passthrough mode:
Current Method: The passthrough mode uses a macvtap a direct connection to connect each guest to the network. The physical interface to be used is picked from among those listed in<interface> sub elements of the<forward> element.
The current specification for<forward> extends to allow 0 or more<interface> sub-elements: Example: <forward mode='passthrough' dev='eth10'/> <interface dev='eth10'/> <interface dev='eth12'/> <interface dev='eth18'/> <interface dev='eth20'/> </forward>
However with an ethernet card with 64 VF's or more, the above method gets tedious on the system.
On the other hand, just parameterizing a string (eth%d) is inadequate, eg, when there are multiple non-contiguous ranges.
Heh. You've gone through some of the same thought process I went through when I wrote the original code :-)
Proposed Method: The 5 patches provided along with this introductory e-mail
i) Introduce a structure to store the state of all the virtual functions attached to each physical function ii) Find a free virtual function given the physical function.
The forward specification will hence only mention the physical function as the interface sub element: Example: <forward mode='passthrough' dev='eth2'/> <interface dev='eth2'/> </forward>
I can see what you mean about it being tedious to construct the config with all 64 (or more) VF's listed, but this proposed change has a couple of issues
First of all, the change you describe would be a semantic change in the network XML, which would break compatibility with previous releases of libvirt. Since we consider XML to be poart of our long term ABI stability guarantee I don't think we can do the change.
Ignoring the ABI issue, I'm concerned that as we get PFs with an increasingly large number of VFs, we may well *not* want to associate all VFs with a single virtual network definition. eg, we might wna to put 32 VFs in one network and 32 VFs in another network. Or if we have 2 PFs, we might want to interleave VFs from several PFs across virtual networks. If all we can do is list the PF in the XML, we loose significant flexibility in how VFs are assigned.
My first concern too when I saw the patch was the semantic change (but also the loss of flexibility), which is obviously a no-go. It's a convenient capability to have though, so it would be nice to get it in somehow. What if we allowed including all the VFs associated with a PF by adding an extra attribute? e.g.:
<interface dev='eth10' type='sriov'/>
(or whatever is more appropriate in place of "sriov"). Or possibly a different element type could be used:
<pf dev='eth10'/>
(didn't want to spend time thinking of a better name than "pf"...).
At the time the network is created, this would cause libvirt to get the list of all VFs for the given PF and put them into the pool. This could be used instead of, or in combination with, the existing <interface dev='eth1'/> form. Thus the existing semantics would be preserved, the flexibility of specifying individual devices would be retained, and the desired convenience of adding all VFs of a PF with a single line would be added.
I completely agree with this method, I can work on this cause next week. May I ask which method you would suggest I go forward with, 1) <interface dev='eth10' type='sriov'/> 2) Or possibly a different element type could be used: <pf dev='eth10'/>
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Laine and Daniel, Many thanks for reviewing my patch series. I do understand the point of loss of flexibility when we want to interleave Vfs from several PFs across virtual networks. Regarding the changes to the network semantics, I am a little bit confused, 1) We do not require a change to the XML schema in order to support the patch series 2) Is the change required to support the modifications to the structures 'virNetworkForwardVfDef' and 'virNetworkForwardIfDef'? OR 3) Would the change be required because of the compatibility issue with interface pools? If the concern is the compatibility issue, the patch series I have submitted takes this into consideration and will still work as before if interface pools are mentioned instead of just the PF. I completely agree with Laine's suggestion in the previous thread, I can work on this cause next week. May I ask for suggestions as to which method I should go forward with, 1) <interface dev='eth10' type='sriov'/> 2) Or possibly a different element type could be used: <pf dev='eth10'/> Many Thanks, Regards, Shradha Shah

On 12/02/2011 11:30 AM, Shradha Shah wrote:
On 11/30/2011 01:29 AM, Laine Stump wrote:
On 11/29/2011 02:53 PM, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 03:46:13PM +0000, Shradha Shah wrote:
Interface Pools and Passthrough mode:
Current Method: The passthrough mode uses a macvtap a direct connection to connect each guest to the network. The physical interface to be used is picked from among those listed in<interface> sub elements of the<forward> element.
The current specification for<forward> extends to allow 0 or more<interface> sub-elements: Example: <forward mode='passthrough' dev='eth10'/> <interface dev='eth10'/> <interface dev='eth12'/> <interface dev='eth18'/> <interface dev='eth20'/> </forward>
However with an ethernet card with 64 VF's or more, the above method gets tedious on the system.
On the other hand, just parameterizing a string (eth%d) is inadequate, eg, when there are multiple non-contiguous ranges.
Heh. You've gone through some of the same thought process I went through when I wrote the original code :-)
Proposed Method: The 5 patches provided along with this introductory e-mail
i) Introduce a structure to store the state of all the virtual functions attached to each physical function ii) Find a free virtual function given the physical function.
The forward specification will hence only mention the physical function as the interface sub element: Example: <forward mode='passthrough' dev='eth2'/> <interface dev='eth2'/> </forward> I can see what you mean about it being tedious to construct the config with all 64 (or more) VF's listed, but this proposed change has a couple of issues
First of all, the change you describe would be a semantic change in the network XML, which would break compatibility with previous releases of libvirt. Since we consider XML to be poart of our long term ABI stability guarantee I don't think we can do the change.
Ignoring the ABI issue, I'm concerned that as we get PFs with an increasingly large number of VFs, we may well *not* want to associate all VFs with a single virtual network definition. eg, we might wna to put 32 VFs in one network and 32 VFs in another network. Or if we have 2 PFs, we might want to interleave VFs from several PFs across virtual networks. If all we can do is list the PF in the XML, we loose significant flexibility in how VFs are assigned. My first concern too when I saw the patch was the semantic change (but also the loss of flexibility), which is obviously a no-go. It's a convenient capability to have though, so it would be nice to get it in somehow. What if we allowed including all the VFs associated with a PF by adding an extra attribute? e.g.:
<interface dev='eth10' type='sriov'/>
(or whatever is more appropriate in place of "sriov"). Or possibly a different element type could be used:
<pf dev='eth10'/>
(didn't want to spend time thinking of a better name than "pf"...).
At the time the network is created, this would cause libvirt to get the list of all VFs for the given PF and put them into the pool. This could be used instead of, or in combination with, the existing<interface dev='eth1'/> form. Thus the existing semantics would be preserved, the flexibility of specifying individual devices would be retained, and the desired convenience of adding all VFs of a PF with a single line would be added.
I completely agree with this method, I can work on this cause next week. May I ask which method you would suggest I go forward with,
1)<interface dev='eth10' type='sriov'/> 2) Or possibly a different element type could be used:<pf dev='eth10'/>
I don't have a preference. I was hoping that Dan (or somebody else) would :-) (Also, don't infer from my mail that this suggestion is the way to go - my own ideas have been shot down on libvir-list before as well.)
Laine and Daniel,
Many thanks for reviewing my patch series.
I do understand the point of loss of flexibility when we want to interleave Vfs from several PFs across virtual networks.
Regarding the changes to the network semantics, I am a little bit confused,
1) We do not require a change to the XML schema in order to support the patch series
Additions to the schema are fine.
2) Is the change required to support the modifications to the structures 'virNetworkForwardVfDef' and 'virNetworkForwardIfDef'? OR 3) Would the change be required because of the compatibility issue with interface pools?
The problem we're concerned about is that the meaning of: <forward mode='passthrough'> <interface dev='eth0'/> </forward> changes. With existing code, that means "a pool with a single ethernet interface named eth0"; with your patches, it means "a pool with several ethernet interfaces ethX-ethY (which are the VFs associated with the PF named eth0). This isn't acceptable because libvirt guarantees 100% API stability across releases. It would be acceptable, though (in my opinion) to have the new behavior occur based on the presence of an extra attribute, or a different element.
If the concern is the compatibility issue, the patch series I have submitted takes this into consideration and will still work as before if interface pools are mentioned instead of just the PF.
Not having an SRIOV-capable NIC to toy with (yet! I have one on order), I was unable to directly try out your code, and didn't go through it line by line, but based on your description it seems that an interface definition that is currently valid (my example above) would have a different interpretation after your patches are applied. Is that not the case?
I completely agree with Laine's suggestion in the previous thread, I can work on this cause next week. May I ask for suggestions as to which method I should go forward with,
1)<interface dev='eth10' type='sriov'/> 2) Or possibly a different element type could be used:<pf dev='eth10'/>
You may want to wait until Monday to see if Dan weighs in on this one way or the other (or with a better idea) (he's traveling at the moment).

On Tue, Nov 29, 2011 at 08:29:35PM -0500, Laine Stump wrote:
On 11/29/2011 02:53 PM, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 03:46:13PM +0000, Shradha Shah wrote:
Interface Pools and Passthrough mode:
Current Method: The passthrough mode uses a macvtap a direct connection to connect each guest to the network. The physical interface to be used is picked from among those listed in<interface> sub elements of the<forward> element.
The current specification for<forward> extends to allow 0 or more<interface> sub-elements: Example: <forward mode='passthrough' dev='eth10'/> <interface dev='eth10'/> <interface dev='eth12'/> <interface dev='eth18'/> <interface dev='eth20'/> </forward>
However with an ethernet card with 64 VF's or more, the above method gets tedious on the system.
Ignoring the ABI issue, I'm concerned that as we get PFs with an increasingly large number of VFs, we may well *not* want to associate all VFs with a single virtual network definition. eg, we might wna to put 32 VFs in one network and 32 VFs in another network. Or if we have 2 PFs, we might want to interleave VFs from several PFs across virtual networks. If all we can do is list the PF in the XML, we loose significant flexibility in how VFs are assigned.
My first concern too when I saw the patch was the semantic change (but also the loss of flexibility), which is obviously a no-go. It's a convenient capability to have though, so it would be nice to get it in somehow. What if we allowed including all the VFs associated with a PF by adding an extra attribute? e.g.:
<interface dev='eth10' type='sriov'/>
This feels a little bit wrong to me.
(or whatever is more appropriate in place of "sriov"). Or possibly a different element type could be used:
<pf dev='eth10'/>
I like this idea, because it is providing additional useful info, rather than changing existing elements, so it is maximally compatible.
(didn't want to spend time thinking of a better name than "pf"...).
At the time the network is created, this would cause libvirt to get the list of all VFs for the given PF and put them into the pool. This could be used instead of, or in combination with, the existing <interface dev='eth1'/> form. Thus the existing semantics would be preserved, the flexibility of specifying individual devices would be retained, and the desired convenience of adding all VFs of a PF with a single line would be added.
IIUC, what you're suggesting is the following behaviour: * Explicit interface list. App inputs: <forward mode='passthrough'> <interface dev='eth10'/> <interface dev='eth11'/> <interface dev='eth12'/> <interface dev='eth13'/> </forward> libvirt does not change XML * Automatically interface list from PF. App inputs: <forward mode='passthrough'> <pf dev='eth0'/> </forward> libvirt expands XML to be <forward mode='passthrough'> <pf dev='eth0'/> <interface dev='eth10'/> <interface dev='eth11'/> <interface dev='eth12'/> <interface dev='eth13'/> </forward> This is good because all previous info is still intact Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 12/05/2011 11:37 AM, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 08:29:35PM -0500, Laine Stump wrote:
On 11/29/2011 02:53 PM, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 03:46:13PM +0000, Shradha Shah wrote:
Interface Pools and Passthrough mode:
Current Method: The passthrough mode uses a macvtap a direct connection to connect each guest to the network. The physical interface to be used is picked from among those listed in<interface> sub elements of the<forward> element.
The current specification for<forward> extends to allow 0 or more<interface> sub-elements: Example: <forward mode='passthrough' dev='eth10'/> <interface dev='eth10'/> <interface dev='eth12'/> <interface dev='eth18'/> <interface dev='eth20'/> </forward>
However with an ethernet card with 64 VF's or more, the above method gets tedious on the system.
Ignoring the ABI issue, I'm concerned that as we get PFs with an increasingly large number of VFs, we may well *not* want to associate all VFs with a single virtual network definition. eg, we might wna to put 32 VFs in one network and 32 VFs in another network. Or if we have 2 PFs, we might want to interleave VFs from several PFs across virtual networks. If all we can do is list the PF in the XML, we loose significant flexibility in how VFs are assigned.
My first concern too when I saw the patch was the semantic change (but also the loss of flexibility), which is obviously a no-go. It's a convenient capability to have though, so it would be nice to get it in somehow. What if we allowed including all the VFs associated with a PF by adding an extra attribute? e.g.:
<interface dev='eth10' type='sriov'/>
This feels a little bit wrong to me.
(or whatever is more appropriate in place of "sriov"). Or possibly a different element type could be used:
<pf dev='eth10'/>
I like this idea, because it is providing additional useful info, rather than changing existing elements, so it is maximally compatible.
(didn't want to spend time thinking of a better name than "pf"...).
At the time the network is created, this would cause libvirt to get the list of all VFs for the given PF and put them into the pool. This could be used instead of, or in combination with, the existing <interface dev='eth1'/> form. Thus the existing semantics would be preserved, the flexibility of specifying individual devices would be retained, and the desired convenience of adding all VFs of a PF with a single line would be added.
IIUC, what you're suggesting is the following behaviour:
* Explicit interface list. App inputs:
<forward mode='passthrough'> <interface dev='eth10'/> <interface dev='eth11'/> <interface dev='eth12'/> <interface dev='eth13'/> </forward>
libvirt does not change XML
* Automatically interface list from PF. App inputs:
<forward mode='passthrough'> <pf dev='eth0'/> </forward>
libvirt expands XML to be
<forward mode='passthrough'> <pf dev='eth0'/> <interface dev='eth10'/> <interface dev='eth11'/> <interface dev='eth12'/> <interface dev='eth13'/> </forward>
This is good because all previous info is still intact
Regards, Daniel
Daniel and Laine, I agree with the above mentioned suggestion and I will re-submit a set of patches using your suggestions. Many thanks for your input. Regards, Shradha

On 12/05/2011 06:37 AM, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 08:29:35PM -0500, Laine Stump wrote:
On 11/29/2011 02:53 PM, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 03:46:13PM +0000, Shradha Shah wrote:
Interface Pools and Passthrough mode:
Current Method: The passthrough mode uses a macvtap a direct connection to connect each guest to the network. The physical interface to be used is picked from among those listed in<interface> sub elements of the<forward> element.
The current specification for<forward> extends to allow 0 or more<interface> sub-elements: Example: <forward mode='passthrough' dev='eth10'/> <interface dev='eth10'/> <interface dev='eth12'/> <interface dev='eth18'/> <interface dev='eth20'/> </forward>
However with an ethernet card with 64 VF's or more, the above method gets tedious on the system. Ignoring the ABI issue, I'm concerned that as we get PFs with an increasingly large number of VFs, we may well *not* want to associate all VFs with a single virtual network definition. eg, we might wna to put 32 VFs in one network and 32 VFs in another network. Or if we have 2 PFs, we might want to interleave VFs from several PFs across virtual networks. If all we can do is list the PF in the XML, we loose significant flexibility in how VFs are assigned. My first concern too when I saw the patch was the semantic change (but also the loss of flexibility), which is obviously a no-go. It's a convenient capability to have though, so it would be nice to get it in somehow. What if we allowed including all the VFs associated with a PF by adding an extra attribute? e.g.:
<interface dev='eth10' type='sriov'/> This feels a little bit wrong to me.
(or whatever is more appropriate in place of "sriov"). Or possibly a different element type could be used:
<pf dev='eth10'/> I like this idea, because it is providing additional useful info, rather than changing existing elements, so it is maximally compatible.
(didn't want to spend time thinking of a better name than "pf"...).
At the time the network is created, this would cause libvirt to get the list of all VFs for the given PF and put them into the pool. This could be used instead of, or in combination with, the existing <interface dev='eth1'/> form. Thus the existing semantics would be preserved, the flexibility of specifying individual devices would be retained, and the desired convenience of adding all VFs of a PF with a single line would be added. IIUC, what you're suggesting is the following behaviour:
* Explicit interface list. App inputs:
<forward mode='passthrough'> <interface dev='eth10'/> <interface dev='eth11'/> <interface dev='eth12'/> <interface dev='eth13'/> </forward>
libvirt does not change XML
* Automatically interface list from PF. App inputs:
<forward mode='passthrough'> <pf dev='eth0'/> </forward>
libvirt expands XML to be
<forward mode='passthrough'> <pf dev='eth0'/> <interface dev='eth10'/> <interface dev='eth11'/> <interface dev='eth12'/> <interface dev='eth13'/> </forward>
This is good because all previous info is still intact
I actually hadn't thought of modifying the XML and displaying it in net-dumpxml or (netdumpxml --inactive), which is what I think you may be implying here. This would have the advantage of making a management application's job easier when displaying status (available interfaces, etc), but could lead to confusion when a host's hardware was changed (since there would be no detectable difference between dev elements that were entered by hand, and those that were automatically derived from a pf element). Also, it would end up cluttering up the config file again, which is part of what this is trying to avoid (although eliminating the need to type in all N vf names is the primary concern). Unless we come up with a way of differentiating between auto-generated <interface> elements (including keeping track of the parent <pf>) and those entered by hand, I think the XML itself shouldn't be changed, but only the contents of the interface pool in memory. (I'm not necessarily suggesting it, but one way of differentiating auto-generated entries would be with an additional "parent='eth0'" (or "pf='eth0'") attribute).

On Mon, Dec 05, 2011 at 02:00:52PM -0500, Laine Stump wrote:
On 12/05/2011 06:37 AM, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 08:29:35PM -0500, Laine Stump wrote:
On 11/29/2011 02:53 PM, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 03:46:13PM +0000, Shradha Shah wrote:
Interface Pools and Passthrough mode:
Current Method: The passthrough mode uses a macvtap a direct connection to connect each guest to the network. The physical interface to be used is picked from among those listed in<interface> sub elements of the<forward> element.
The current specification for<forward> extends to allow 0 or more<interface> sub-elements: Example: <forward mode='passthrough' dev='eth10'/> <interface dev='eth10'/> <interface dev='eth12'/> <interface dev='eth18'/> <interface dev='eth20'/> </forward>
However with an ethernet card with 64 VF's or more, the above method gets tedious on the system. Ignoring the ABI issue, I'm concerned that as we get PFs with an increasingly large number of VFs, we may well *not* want to associate all VFs with a single virtual network definition. eg, we might wna to put 32 VFs in one network and 32 VFs in another network. Or if we have 2 PFs, we might want to interleave VFs from several PFs across virtual networks. If all we can do is list the PF in the XML, we loose significant flexibility in how VFs are assigned. My first concern too when I saw the patch was the semantic change (but also the loss of flexibility), which is obviously a no-go. It's a convenient capability to have though, so it would be nice to get it in somehow. What if we allowed including all the VFs associated with a PF by adding an extra attribute? e.g.:
<interface dev='eth10' type='sriov'/> This feels a little bit wrong to me.
(or whatever is more appropriate in place of "sriov"). Or possibly a different element type could be used:
<pf dev='eth10'/> I like this idea, because it is providing additional useful info, rather than changing existing elements, so it is maximally compatible.
(didn't want to spend time thinking of a better name than "pf"...).
At the time the network is created, this would cause libvirt to get the list of all VFs for the given PF and put them into the pool. This could be used instead of, or in combination with, the existing <interface dev='eth1'/> form. Thus the existing semantics would be preserved, the flexibility of specifying individual devices would be retained, and the desired convenience of adding all VFs of a PF with a single line would be added. IIUC, what you're suggesting is the following behaviour:
* Explicit interface list. App inputs:
<forward mode='passthrough'> <interface dev='eth10'/> <interface dev='eth11'/> <interface dev='eth12'/> <interface dev='eth13'/> </forward>
libvirt does not change XML
* Automatically interface list from PF. App inputs:
<forward mode='passthrough'> <pf dev='eth0'/> </forward>
libvirt expands XML to be
<forward mode='passthrough'> <pf dev='eth0'/> <interface dev='eth10'/> <interface dev='eth11'/> <interface dev='eth12'/> <interface dev='eth13'/> </forward>
This is good because all previous info is still intact
I actually hadn't thought of modifying the XML and displaying it in net-dumpxml or (netdumpxml --inactive), which is what I think you may be implying here. This would have the advantage of making a management application's job easier when displaying status (available interfaces, etc), but could lead to confusion when a host's hardware was changed (since there would be no detectable difference between dev elements that were entered by hand, and those that were automatically derived from a pf element). Also, it would end up cluttering up the config file again, which is part of what this is trying to avoid (although eliminating the need to type in all N vf names is the primary concern).
Unless we come up with a way of differentiating between auto-generated <interface> elements (including keeping track of the parent <pf>) and those entered by hand, I think the XML itself shouldn't be changed, but only the contents of the interface pool in memory.
As with domains, every network has both an active and inactive XML config. When the network is not running, we should only be showing the user provided <interface> elements. Only once you start the network, do we automatically fill in <interface> elements based on <pf>. So if we add a flag for virNetworkGetXMLDesc() like VIR_NETWORK_XML_INACTIVE, then you can distinguish by comparing the live XML to the inactive XML. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 12/06/2011 10:16 AM, Daniel P. Berrange wrote:
On Mon, Dec 05, 2011 at 02:00:52PM -0500, Laine Stump wrote:
On 12/05/2011 06:37 AM, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 08:29:35PM -0500, Laine Stump wrote:
On 11/29/2011 02:53 PM, Daniel P. Berrange wrote:
On Tue, Nov 29, 2011 at 03:46:13PM +0000, Shradha Shah wrote:
Interface Pools and Passthrough mode:
Current Method: The passthrough mode uses a macvtap a direct connection to connect each guest to the network. The physical interface to be used is picked from among those listed in<interface> sub elements of the<forward> element.
The current specification for<forward> extends to allow 0 or more<interface> sub-elements: Example: <forward mode='passthrough' dev='eth10'/> <interface dev='eth10'/> <interface dev='eth12'/> <interface dev='eth18'/> <interface dev='eth20'/> </forward>
However with an ethernet card with 64 VF's or more, the above method gets tedious on the system. Ignoring the ABI issue, I'm concerned that as we get PFs with an increasingly large number of VFs, we may well *not* want to associate all VFs with a single virtual network definition. eg, we might wna to put 32 VFs in one network and 32 VFs in another network. Or if we have 2 PFs, we might want to interleave VFs from several PFs across virtual networks. If all we can do is list the PF in the XML, we loose significant flexibility in how VFs are assigned. My first concern too when I saw the patch was the semantic change (but also the loss of flexibility), which is obviously a no-go. It's a convenient capability to have though, so it would be nice to get it in somehow. What if we allowed including all the VFs associated with a PF by adding an extra attribute? e.g.:
<interface dev='eth10' type='sriov'/> This feels a little bit wrong to me.
(or whatever is more appropriate in place of "sriov"). Or possibly a different element type could be used:
<pf dev='eth10'/> I like this idea, because it is providing additional useful info, rather than changing existing elements, so it is maximally compatible.
(didn't want to spend time thinking of a better name than "pf"...).
At the time the network is created, this would cause libvirt to get the list of all VFs for the given PF and put them into the pool. This could be used instead of, or in combination with, the existing <interface dev='eth1'/> form. Thus the existing semantics would be preserved, the flexibility of specifying individual devices would be retained, and the desired convenience of adding all VFs of a PF with a single line would be added. IIUC, what you're suggesting is the following behaviour:
* Explicit interface list. App inputs:
<forward mode='passthrough'> <interface dev='eth10'/> <interface dev='eth11'/> <interface dev='eth12'/> <interface dev='eth13'/> </forward>
libvirt does not change XML
* Automatically interface list from PF. App inputs:
<forward mode='passthrough'> <pf dev='eth0'/> </forward>
libvirt expands XML to be
<forward mode='passthrough'> <pf dev='eth0'/> <interface dev='eth10'/> <interface dev='eth11'/> <interface dev='eth12'/> <interface dev='eth13'/> </forward>
This is good because all previous info is still intact
I actually hadn't thought of modifying the XML and displaying it in net-dumpxml or (netdumpxml --inactive), which is what I think you may be implying here. This would have the advantage of making a management application's job easier when displaying status (available interfaces, etc), but could lead to confusion when a host's hardware was changed (since there would be no detectable difference between dev elements that were entered by hand, and those that were automatically derived from a pf element). Also, it would end up cluttering up the config file again, which is part of what this is trying to avoid (although eliminating the need to type in all N vf names is the primary concern).
Unless we come up with a way of differentiating between auto-generated <interface> elements (including keeping track of the parent <pf>) and those entered by hand, I think the XML itself shouldn't be changed, but only the contents of the interface pool in memory.
As with domains, every network has both an active and inactive XML config. When the network is not running, we should only be showing the user provided <interface> elements. Only once you start the network, do we automatically fill in <interface> elements based on <pf>. So if we add a flag for virNetworkGetXMLDesc() like VIR_NETWORK_XML_INACTIVE, then you can distinguish by comparing the live XML to the inactive XML.
I agree that an inactive XML should show only the user provided data. But in the case of active XML, would it be wise to display the entire VF pool in the XML? I think this would clutter the config file when a NIC supports 127 VF's per port like in the case of Solarflare. Also, the free Vf's are discovered only after virNetworkDefParseXML and networkAllocateActualDevice (in mode Passthrough). Is there a way of modifying the active XML at this point? Shradha
Daniel

On Tue, Nov 29, 2011 at 03:46:13PM +0000, Shradha Shah wrote:
Shradha Shah (5): Moving the declaration of _pciDevice and _pciDeviceList to pci.h Added function pciSysfsFile to enable access to the PCI SYSFS files. Adding functions virNetDevGetVirtualFunctions and virNetDevGetNetName virNetDevGetVirtualFunctions: Gets the BDF of all the Virtual Functions given a physical function virNetDevGetNetName: Gets the interface name of the PCI Device. Addition of a new device structure to store the state of a Virtual Function Modifications to the Physical Device Structure to store state of its Virtual Functions If a system has 64 or more VF's, it is quite tedious to mention each VF in the interface pool. The following modification find a Free VF given a Physical Function when mode=passthrough. We also save the state of each VF. Hence modifications to networkAllocateActualDevice, networkNotifyActualDevice and networkReleaseActualDevice were required. The following patch does just that.
BTW, to make 'git send-email' happy when generating Subject: lines and these patch summaries, make sure the commit message consists of a short single line summary, then a blank line, then the longer multiple line description Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (3)
-
Daniel P. Berrange
-
Laine Stump
-
Shradha Shah