[libvirt] [PATCH 0/4] Support mac and port profile for <interface type='hostdev'>

This patch series is based on laines patches to support <interface type='hostdev'>. https://www.redhat.com/archives/libvir-list/2012-February/msg01126.html It support to set mac and port profile on an interface of type hostdev. * If virtualport is specified, the existing virtual port functions are called to set mac, vlan and port profile. * If virtualport is not specified and device is a sriov virtual function, - mac is set using IFLA_VF_MAC * If virtualport is not specified and device is a non-sriov virtual function, - mac is set using existing SIOCGIFHWADDR (This requires that the netdev be present on the host before starting the VM) This series implements the below : 01/4 pci: Add two new pci util pciDeviceGetVirtualFunctionInfo and pciConfigAddressToSysfsFile 02/4 virtnetdev: Add support functions for mac and portprofile associations on a hostdev 03/4 virnetdevvportprofile: Changes to support portprofiles for hostdevs 04/4 qemu_hostdev: Add support to install port profile and mac address on hostdev Stefan Berger is CC'ed for 802.1Qbg changes in patch 03/4. Current code for 802.1Qbg uses macvtap ifname. And for network interfaces with type=hostdev a macvtap ifname does not exist. This patch just adds a null check for ifname in 802.1Qbg port profile handling code.

From: Roopa Prabhu <roprabhu@cisco.com> pciDeviceGetVirtualFunctionInfo returns pf netdevice name and virtual function index for a given vf. This is just a wrapper around existing functions to return vf's pf and vf_index with one api call pciConfigAddressToSysfsfile returns the sysfile pci device link from a 'struct pci_config_address' Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> --- src/util/pci.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ src/util/pci.h | 7 +++++++ 2 files changed, 62 insertions(+), 0 deletions(-) diff --git a/src/util/pci.c b/src/util/pci.c index c660e8d..d8c136e 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -2081,6 +2081,20 @@ pciSysfsFile(char *pciDeviceName, char **pci_sysfs_device_link) return 0; } +int +pciConfigAddressToSysfsFile(struct pci_config_address *dev, + char **pci_sysfs_device_link) +{ + if (virAsprintf(pci_sysfs_device_link, + PCI_SYSFS "devices/%04x:%02x:%02x.%x", dev->domain, + dev->bus, dev->slot, dev->function) < 0) { + virReportOOMError(); + return -1; + } + + return 0; +} + /* * Returns the network device name of a pci device */ @@ -2123,6 +2137,38 @@ out: return ret; } + +int +pciDeviceGetVirtualFunctionInfo(const char *vf_sysfs_device_path, + char **pfname, int *vf_index) +{ + struct pci_config_address *pf_config_address = NULL; + char *pf_sysfs_device_path = NULL; + int ret = -1; + + if (pciGetPhysicalFunction(vf_sysfs_device_path, &pf_config_address) < 0) + return ret; + + if (pciConfigAddressToSysfsFile(pf_config_address, + &pf_sysfs_device_path) < 0) { + + VIR_FREE(pf_config_address); + return ret; + } + + if (pciGetVirtualFunctionIndex(pf_sysfs_device_path, vf_sysfs_device_path, + vf_index) < 0) + goto cleanup; + + ret = pciDeviceNetName(pf_sysfs_device_path, pfname); + +cleanup: + VIR_FREE(pf_config_address); + VIR_FREE(pf_sysfs_device_path); + + return ret; +} + #else int pciGetPhysicalFunction(const char *vf_sysfs_path ATTRIBUTE_UNUSED, @@ -2170,4 +2216,13 @@ pciDeviceNetName(char *device_link_sysfs_path ATTRIBUTE_UNUSED, "supported on non-linux platforms")); return -1; } + +int +pciDeviceGetVirtualFunctionInfo(const char *vf_sysfs_device_path, + char **pfname, int *vf_index) +{ + pciReportError(VIR_ERR_INTERNAL_ERROR, _("pciDeviceGetVirtualFunctionInfo " + "is not supported on non-linux platforms")); + return -1; +} #endif /* __linux__ */ diff --git a/src/util/pci.h b/src/util/pci.h index 25b5b66..b71bb12 100644 --- a/src/util/pci.h +++ b/src/util/pci.h @@ -111,6 +111,9 @@ int pciGetVirtualFunctionIndex(const char *pf_sysfs_device_link, const char *vf_sysfs_device_link, int *vf_index); +int pciConfigAddressToSysfsFile(struct pci_config_address *dev, + char **pci_sysfs_device_link); + int pciDeviceNetName(char *device_link_sysfs_path, char **netname); int pciSysfsFile(char *pciDeviceName, char **pci_sysfs_device_link) @@ -122,4 +125,8 @@ int pciGetDeviceAddrString(unsigned domain, unsigned function, char **pciConfigAddr) ATTRIBUTE_NONNULL(5) ATTRIBUTE_RETURN_CHECK; + +int pciDeviceGetVirtualFunctionInfo(const char *vf_sysfs_device_path, + char **pfname, int *vf_index); + #endif /* __VIR_PCI_H__ */

From: Roopa Prabhu <roprabhu@cisco.com> This patch adds the following: - functions to set and get vf configs - Functions to replace and store vf configs (Only mac address is handled today. But the functions can be easily extended for vlans and other vf configs) - function to dump link dev info (This is moved from virnetdevvportprofile.c) Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> --- src/util/virnetdev.c | 531 ++++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 19 ++ 2 files changed, 549 insertions(+), 1 deletions(-) diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index 9d76d47..25f2155 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -1127,8 +1127,497 @@ virNetDevGetPhysicalFunction(const char *ifname, char **pfname) return ret; } -#else /* !__linux__ */ +static struct nla_policy ifla_vf_policy[IFLA_VF_MAX+1] = { + [IFLA_VF_MAC] = { .type = NLA_UNSPEC, + .maxlen = sizeof(struct ifla_vf_mac) }, + [IFLA_VF_VLAN] = { .type = NLA_UNSPEC, + .maxlen = sizeof(struct ifla_vf_vlan) }, +}; + +/** + * virNetDevLinkDump: + * + * @ifname: The name of the interface; only use if ifindex < 0 + * @ifindex: The interface index; may be < 0 if ifname is given + * @nltarget_kernel: whether to send the message to the kernel or another + * process + * @nlattr: pointer to a pointer of netlink attributes that will contain + * the results + * @recvbuf: Pointer to the buffer holding the returned netlink response + * message; free it, once not needed anymore + * @getPidFunc: Pointer to a function that will be invoked if the kernel + * is not the target of the netlink message but it is to be + * sent to another process. + * + * Get information about an interface given its name or index. + * + * Returns 0 on success, -1 on fatal error. + */ +int +virNetDevLinkDump(const char *ifname, int ifindex, + bool nltarget_kernel, struct nlattr **tb, + unsigned char **recvbuf, + uint32_t (*getPidFunc)(void)) +{ + int rc = 0; + struct nlmsghdr *resp; + struct nlmsgerr *err; + struct ifinfomsg ifinfo = { + .ifi_family = AF_UNSPEC, + .ifi_index = ifindex + }; + unsigned int recvbuflen; + uint32_t pid = 0; + struct nl_msg *nl_msg; + + *recvbuf = NULL; + + if (ifindex <= 0 && virNetDevGetIndex(ifname, &ifindex) < 0) + return -1; + + ifinfo.ifi_index = ifindex; + + nl_msg = nlmsg_alloc_simple(RTM_GETLINK, NLM_F_REQUEST); + if (!nl_msg) { + virReportOOMError(); + return -1; + } + + if (nlmsg_append(nl_msg, &ifinfo, sizeof(ifinfo), NLMSG_ALIGNTO) < 0) + goto buffer_too_small; + + if (ifindex < 0 && ifname) { + if (nla_put(nl_msg, IFLA_IFNAME, strlen(ifname)+1, ifname) < 0) + goto buffer_too_small; + } + + if (!nltarget_kernel) { + pid = getPidFunc(); + if (pid == 0) { + rc = -1; + goto cleanup; + } + } + + if (virNetlinkCommand(nl_msg, recvbuf, &recvbuflen, pid) < 0) { + rc = -1; + goto cleanup; + } + + if (recvbuflen < NLMSG_LENGTH(0) || *recvbuf == NULL) + goto malformed_resp; + + resp = (struct nlmsghdr *)*recvbuf; + + switch (resp->nlmsg_type) { + case NLMSG_ERROR: + err = (struct nlmsgerr *)NLMSG_DATA(resp); + if (resp->nlmsg_len < NLMSG_LENGTH(sizeof(*err))) + goto malformed_resp; + + if (err->error) { + virReportSystemError(-err->error, + _("error dumping %s (%d) interface"), + ifname, ifindex); + rc = -1; + } + break; + + case GENL_ID_CTRL: + case NLMSG_DONE: + rc = nlmsg_parse(resp, sizeof(struct ifinfomsg), + tb, IFLA_MAX, NULL); + if (rc < 0) + goto malformed_resp; + break; + + default: + goto malformed_resp; + } + + if (rc != 0) + VIR_FREE(*recvbuf); + +cleanup: + nlmsg_free(nl_msg); + + return rc; + +malformed_resp: + nlmsg_free(nl_msg); + + virNetDevError(VIR_ERR_INTERNAL_ERROR, "%s", + _("malformed netlink response message")); + VIR_FREE(*recvbuf); + return -1; + +buffer_too_small: + nlmsg_free(nl_msg); + + virNetDevError(VIR_ERR_INTERNAL_ERROR, "%s", + _("allocated netlink buffer is too small")); + return -1; +} + +static int +virNetDevSetVfConfig(const char *ifname, int ifindex, int vf, + bool nltarget_kernel, const unsigned char *macaddr, + int vlanid, uint32_t (*getPidFunc)(void)) +{ + int rc = -1; + struct nlmsghdr *resp; + struct nlmsgerr *err; + unsigned char *recvbuf = NULL; + unsigned int recvbuflen = 0; + uint32_t pid = 0; + struct nl_msg *nl_msg; + struct ifinfomsg ifinfo = { + .ifi_family = AF_UNSPEC, + .ifi_index = ifindex + }; + + nl_msg = nlmsg_alloc_simple(RTM_SETLINK, NLM_F_REQUEST); + if (!nl_msg) { + virReportOOMError(); + return rc; + } + + if (nlmsg_append(nl_msg, &ifinfo, sizeof(ifinfo), NLMSG_ALIGNTO) < 0) + goto buffer_too_small; + + if (ifname && + nla_put(nl_msg, IFLA_IFNAME, strlen(ifname)+1, ifname) < 0) + goto buffer_too_small; + + if (macaddr || vlanid >= 0) { + struct nlattr *vfinfolist, *vfinfo; + + if (!(vfinfolist = nla_nest_start(nl_msg, IFLA_VFINFO_LIST))) + goto buffer_too_small; + + if (!(vfinfo = nla_nest_start(nl_msg, IFLA_VF_INFO))) + goto buffer_too_small; + + if (macaddr) { + struct ifla_vf_mac ifla_vf_mac = { + .vf = vf, + .mac = { 0, }, + }; + + memcpy(ifla_vf_mac.mac, macaddr, 6); + + if (nla_put(nl_msg, IFLA_VF_MAC, sizeof(ifla_vf_mac), + &ifla_vf_mac) < 0) + goto buffer_too_small; + } + + if (vlanid >= 0) { + struct ifla_vf_vlan ifla_vf_vlan = { + .vf = vf, + .vlan = vlanid, + .qos = 0, + }; + + if (nla_put(nl_msg, IFLA_VF_VLAN, sizeof(ifla_vf_vlan), + &ifla_vf_vlan) < 0) + goto buffer_too_small; + } + + nla_nest_end(nl_msg, vfinfo); + nla_nest_end(nl_msg, vfinfolist); + } + + if (!nltarget_kernel) { + pid = getPidFunc(); + if (pid == 0) { + rc = -1; + goto err_exit; + } + } + + if (virNetlinkCommand(nl_msg, &recvbuf, &recvbuflen, pid) < 0) + goto err_exit; + + if (recvbuflen < NLMSG_LENGTH(0) || recvbuf == NULL) + goto malformed_resp; + + resp = (struct nlmsghdr *)recvbuf; + + switch (resp->nlmsg_type) { + case NLMSG_ERROR: + err = (struct nlmsgerr *)NLMSG_DATA(resp); + if (resp->nlmsg_len < NLMSG_LENGTH(sizeof(*err))) + goto malformed_resp; + + if (err->error) { + virReportSystemError(-err->error, + _("error during set vf mac of ifindex %d"), + ifindex); + goto err_exit; + } + break; + + case NLMSG_DONE: + break; + + default: + goto malformed_resp; + } + + rc = 0; + +err_exit: + nlmsg_free(nl_msg); + + VIR_FREE(recvbuf); + + return rc; + +malformed_resp: + nlmsg_free(nl_msg); + + virNetDevError(VIR_ERR_INTERNAL_ERROR, "%s", + _("malformed netlink response message")); + VIR_FREE(recvbuf); + return rc; + +buffer_too_small: + nlmsg_free(nl_msg); + + virNetDevError(VIR_ERR_INTERNAL_ERROR, "%s", + _("allocated netlink buffer is too small")); + return rc; +} + +static int +virNetDevParseVfConfig(struct nlattr **tb, int32_t vf, unsigned char *mac, + int *vlanid) +{ + const char *msg = NULL; + int rc = -1; + + if (tb[IFLA_VFINFO_LIST]) { + struct ifla_vf_mac *vf_mac; + struct ifla_vf_vlan *vf_vlan; + struct nlattr *tb_vf_info = {NULL, }; + struct nlattr *tb_vf[IFLA_VF_MAX+1]; + int found = 0; + int rem; + + nla_for_each_nested(tb_vf_info, tb[IFLA_VFINFO_LIST], rem) { + if (nla_type(tb_vf_info) != IFLA_VF_INFO) + continue; + + if (nla_parse_nested(tb_vf, IFLA_VF_MAX, tb_vf_info, + ifla_vf_policy)) { + msg = _("error parsing IFLA_VF_INFO"); + goto err_exit; + } + + if (tb[IFLA_VF_MAC]) { + vf_mac = RTA_DATA(tb_vf[IFLA_VF_MAC]); + if (vf_mac && vf_mac->vf == vf) { + memcpy(mac, vf_mac->mac, VIR_MAC_BUFLEN); + found = 1; + } + } + + if (tb[IFLA_VF_VLAN]) { + vf_vlan = RTA_DATA(tb_vf[IFLA_VF_VLAN]); + if (vf_vlan && vf_vlan->vf == vf) { + *vlanid = vf_vlan->vlan; + found = 1; + } + } + if (found) { + rc = 0; + break; + } + } + } + +err_exit: + if (msg) + virNetDevError(VIR_ERR_INTERNAL_ERROR, "%s", msg); + + return rc; +} + +static int +virNetDevGetVfConfig(const char *ifname, int vf, unsigned char *mac, + int *vlanid) +{ + int rc = -1; + unsigned char *recvbuf = NULL; + struct nlattr *tb[IFLA_MAX + 1] = {NULL, }; + int ifindex = -1; + + rc = virNetDevLinkDump(ifname, ifindex, true, tb, &recvbuf, NULL); + if (rc < 0) + return rc; + + rc = virNetDevParseVfConfig(tb, vf, mac, vlanid); + + VIR_FREE(recvbuf); + + return rc; +} + +static int +virNetDevReplaceVfConfig(const char *pflinkdev, int vf, + const unsigned char *macaddress, + int vlanid, + const char *stateDir) +{ + unsigned char oldmac[6]; + int oldvlanid = -1; + char *path = NULL; + char macstr[VIR_MAC_STRING_BUFLEN]; + int ifindex = -1; + + if (virNetDevGetVfConfig(pflinkdev, vf, oldmac, &oldvlanid) < 0) + return -1; + + if (virAsprintf(&path, "%s/%s_vf%d", + stateDir, pflinkdev, vf) < 0) { + virReportOOMError(); + return -1; + } + + virMacAddrFormat(oldmac, macstr); + if (virFileWriteStr(path, macstr, O_CREAT|O_TRUNC|O_WRONLY) < 0) { + virReportSystemError(errno, _("Unable to preserve mac for pf = %s, vf = %d"), pflinkdev, vf); + VIR_FREE(path); + return -1; + } + + VIR_FREE(path); + + return virNetDevSetVfConfig(pflinkdev, ifindex, vf, true, + macaddress, vlanid, NULL); +} + +static int +virNetDevRestoreVfConfig(const char *pflinkdev, int vf, + const char *stateDir) +{ + int rc = -1; + char *macstr = NULL; + char *path = NULL; + unsigned char oldmac[6]; + int vlanid = -1; + int ifindex = -1; + + if (virAsprintf(&path, "%s/%s_vf%d", + stateDir, pflinkdev, vf) < 0) { + virReportOOMError(); + return rc; + } + + if (virFileReadAll(path, VIR_MAC_STRING_BUFLEN, &macstr) < 0) { + VIR_FREE(path); + return rc; + } + + if (virMacAddrParse(macstr, &oldmac[0]) != 0) { + virNetDevError(VIR_ERR_INTERNAL_ERROR, + _("Cannot parse MAC address from '%s'"), + macstr); + goto cleanup; + } + + /*reset mac and remove file-ignore results*/ + rc = virNetDevSetVfConfig(pflinkdev, ifindex, vf, true, + oldmac, vlanid, NULL); + ignore_value(unlink(path)); + +cleanup: + VIR_FREE(path); + VIR_FREE(macstr); + + return rc; +} + +/** + * virNetDevReplaceNetConfig: + * @linkdev: name of the interface + * @vf: vf index if linkdev is a pf + * @macaddress: new MAC address for interface + * @vlanid: new vlanid + * @stateDir: directory to store old net config + * + * Returns 0 on success, -1 on failure + * + */ +int +virNetDevReplaceNetConfig(char *linkdev, int vf, + const unsigned char *macaddress, int vlanid, + char *stateDir) +{ + if (vf == -1) + return virNetDevReplaceMacAddress(linkdev, macaddress, stateDir); + else + return virNetDevReplaceVfConfig(linkdev, vf, macaddress, vlanid, + stateDir); +} + +/** + * virNetDevRestoreNetConfig: + * @linkdev: name of the interface + * @vf: vf index if linkdev is a pf + * @stateDir: directory containing old net config + * + * Returns 0 on success, -errno on failure. + * + */ +int +virNetDevRestoreNetConfig(char *linkdev, int vf, char *stateDir) +{ + if (vf == -1) + return virNetDevRestoreMacAddress(linkdev, stateDir); + else + return virNetDevRestoreVfConfig(linkdev, vf, stateDir); +} + +/** + * virNetDevGetVirtualFunctionInfo: + * @vfname: name of the virtual function interface + * @pfname: name of the physical function + * @vf: vf index + * + * Returns 0 on success, -errno on failure. + * + */ +int +virNetDevGetVirtualFunctionInfo(const char *vfname, char **pfname, + int *vf) +{ + char *pf_sysfs_path = NULL, *vf_sysfs_path = NULL; + int ret = -1; + + if (virNetDevGetPhysicalFunction(vfname, pfname) < 0) + return ret; + + if (virNetDevSysfsDeviceFile(&pf_sysfs_path, *pfname, "device") < 0) { + VIR_FREE(*pfname); + return ret; + } + + if (virNetDevSysfsDeviceFile(&vf_sysfs_path, vfname, "device") < 0) { + VIR_FREE(*pfname); + VIR_FREE(pf_sysfs_path); + return ret; + } + + ret = pciGetVirtualFunctionIndex(pf_sysfs_path, vf_sysfs_path, vf); + + VIR_FREE(vf_sysfs_path); + VIR_FREE(pf_sysfs_path); + + return ret; +} +#else /* !__linux__ */ int virNetDevGetVirtualFunctions(const char *pfname ATTRIBUTE_UNUSED, char ***vfname ATTRIBUTE_UNUSED, @@ -1165,4 +1654,44 @@ virNetDevGetPhysicalFunction(const char *ifname ATTRIBUTE_UNUSED, _("Unable to get physical function status on this platform")); return -1; } + +int +virNetDevLinkDump(const char *ifname, int ifindex, + bool nltarget_kernel, struct nlattr **tb, + unsigned char **recvbuf, + uint32_t (*getPidFunc)(void)) +{ + virReportSystemError(ENOSYS, "%s", + _("Unable to dump link info on this platform")); + return -1; +} + +int +virNetDevReplaceNetConfig(char *linkdev, int vf, + const unsigned char *macaddress, int vlanid, + char *stateDir) +{ + virReportSystemError(ENOSYS, "%s", + _("Unable to replace net config on this platform")); + return -1; + +} + +int +virNetDevRestoreNetConfig(char *linkdev, int vf, char *stateDir) +{ + virReportSystemError(ENOSYS, "%s", + _("Unable to restore net config on this platform")); + return -1; +} + +int +virNetDevGetVirtualFunctionInfo(const char *vfname, char **pfname, + int *vf) +{ + virReportSystemError(ENOSYS, "%s", + _("Unable to get virtual function info on this platform")); + return -1; +} + #endif /* !__linux__ */ diff --git a/src/util/virnetdev.h b/src/util/virnetdev.h index 7845e25..d6b78c3 100644 --- a/src/util/virnetdev.h +++ b/src/util/virnetdev.h @@ -24,6 +24,7 @@ # define __VIR_NETDEV_H__ # include "virsocketaddr.h" +# include "virnetlink.h" int virNetDevExists(const char *brname) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; @@ -105,4 +106,22 @@ int virNetDevGetVirtualFunctions(const char *pfname, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) ATTRIBUTE_RETURN_CHECK; +int virNetDevLinkDump(const char *ifname, int ifindex, + bool nltarget_kernel, struct nlattr **tb, + unsigned char **recvbuf, + uint32_t (*getPidFunc)(void)) + ATTRIBUTE_RETURN_CHECK; + +int virNetDevReplaceNetConfig(char *linkdev, int vf, + const unsigned char *macaddress, int vlanid, + char *stateDir) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(5); + +int virNetDevRestoreNetConfig(char *linkdev, int vf, char *stateDir) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3); + +int virNetDevGetVirtualFunctionInfo(const char *vfname, char **pfname, + int *vf) + ATTRIBUTE_NONNULL(1); + #endif /* __VIR_NETDEV_H__ */

From: Roopa Prabhu <roprabhu@cisco.com> This patch includes the following changes - removes some netlink functions which are now available in virnetdev.c - Adds a vf argument to all port profile functions For 802.1Qbh devices, the port profile calls can use a vf argument if passed by the caller. If the vf argument is -1 it will try to derive the vf if the device passed is a virtual function. For 802.1Qbg devices, This patch introduces a null check for the device argument because during port profile assignment on a hostdev, this argument can be null. Stefan CC'ed for comments Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> --- src/qemu/qemu_migration.c | 2 src/util/virnetdevmacvlan.c | 6 + src/util/virnetdevvportprofile.c | 203 +++++--------------------------------- src/util/virnetdevvportprofile.h | 8 + 4 files changed, 42 insertions(+), 177 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 7df2d4f..b80ed69 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2605,6 +2605,7 @@ qemuMigrationVPAssociatePortProfiles(virDomainDefPtr def) { virDomainNetGetActualVirtPortProfile(net), net->mac, virDomainNetGetActualDirectDev(net), + -1, def->uuid, VIR_NETDEV_VPORT_PROFILE_OP_MIGRATE_IN_FINISH) < 0) goto err_exit; @@ -2622,6 +2623,7 @@ err_exit: virDomainNetGetActualVirtPortProfile(net), net->mac, virDomainNetGetActualDirectDev(net), + -1, VIR_NETDEV_VPORT_PROFILE_OP_MIGRATE_IN_FINISH)); } } diff --git a/src/util/virnetdevmacvlan.c b/src/util/virnetdevmacvlan.c index 25d0846..498a2a0 100644 --- a/src/util/virnetdevmacvlan.c +++ b/src/util/virnetdevmacvlan.c @@ -486,6 +486,7 @@ int virNetDevMacVLanCreateWithVPortProfile(const char *tgifname, uint32_t macvtapMode; const char *cr_ifname; int ret; + int vf = -1; macvtapMode = modeMap[mode]; @@ -547,6 +548,7 @@ create_name: virtPortProfile, macaddress, linkdev, + vf, vmuuid, vmOp) < 0) { rc = -1; goto link_del_exit; @@ -597,6 +599,7 @@ disassociate_exit: virtPortProfile, macaddress, linkdev, + vf, vmOp)); link_del_exit: @@ -624,6 +627,8 @@ int virNetDevMacVLanDeleteWithVPortProfile(const char *ifname, char *stateDir) { int ret = 0; + int vf = -1; + if (mode == VIR_NETDEV_MACVLAN_MODE_PASSTHRU) { ignore_value(virNetDevRestoreMacAddress(linkdev, stateDir)); } @@ -633,6 +638,7 @@ int virNetDevMacVLanDeleteWithVPortProfile(const char *ifname, virtPortProfile, macaddr, linkdev, + vf, VIR_NETDEV_VPORT_PROFILE_OP_DESTROY) < 0) ret = -1; if (virNetDevMacVLanDelete(ifname) < 0) diff --git a/src/util/virnetdevvportprofile.c b/src/util/virnetdevvportprofile.c index 67fd7bc..03c85dd 100644 --- a/src/util/virnetdevvportprofile.c +++ b/src/util/virnetdevvportprofile.c @@ -126,11 +126,6 @@ static struct nla_policy ifla_port_policy[IFLA_PORT_MAX + 1] = { [IFLA_PORT_RESPONSE] = { .type = NLA_U16 }, }; -static struct nla_policy ifla_policy[IFLA_MAX + 1] = -{ - [IFLA_VF_PORTS] = { .type = NLA_NESTED }, -}; - static uint32_t virNetDevVPortProfileGetLldpadPid(void) { @@ -164,126 +159,6 @@ virNetDevVPortProfileGetLldpadPid(void) { return pid; } - -/** - * virNetDevVPortProfileLinkDump: - * - * @ifname: The name of the interface; only use if ifindex < 0 - * @ifindex: The interface index; may be < 0 if ifname is given - * @nltarget_kernel: whether to send the message to the kernel or another - * process - * @nlattr: pointer to a pointer of netlink attributes that will contain - * the results - * @recvbuf: Pointer to the buffer holding the returned netlink response - * message; free it, once not needed anymore - * @getPidFunc: Pointer to a function that will be invoked if the kernel - * is not the target of the netlink message but it is to be - * sent to another process. - * - * Get information about an interface given its name or index. - * - * Returns 0 on success, -1 on fatal error. - */ -static int -virNetDevVPortProfileLinkDump(const char *ifname, int ifindex, bool nltarget_kernel, - struct nlattr **tb, unsigned char **recvbuf, - uint32_t (*getPidFunc)(void)) -{ - int rc = 0; - struct nlmsghdr *resp; - struct nlmsgerr *err; - struct ifinfomsg ifinfo = { - .ifi_family = AF_UNSPEC, - .ifi_index = ifindex - }; - unsigned int recvbuflen; - uint32_t pid = 0; - struct nl_msg *nl_msg; - - *recvbuf = NULL; - - nl_msg = nlmsg_alloc_simple(RTM_GETLINK, NLM_F_REQUEST); - if (!nl_msg) { - virReportOOMError(); - return -1; - } - - if (nlmsg_append(nl_msg, &ifinfo, sizeof(ifinfo), NLMSG_ALIGNTO) < 0) - goto buffer_too_small; - - if (ifindex < 0 && ifname) { - if (nla_put(nl_msg, IFLA_IFNAME, strlen(ifname)+1, ifname) < 0) - goto buffer_too_small; - } - - if (!nltarget_kernel) { - pid = getPidFunc(); - if (pid == 0) { - rc = -1; - goto cleanup; - } - } - - if (virNetlinkCommand(nl_msg, recvbuf, &recvbuflen, pid) < 0) { - rc = -1; - goto cleanup; - } - - if (recvbuflen < NLMSG_LENGTH(0) || *recvbuf == NULL) - goto malformed_resp; - - resp = (struct nlmsghdr *)*recvbuf; - - switch (resp->nlmsg_type) { - case NLMSG_ERROR: - err = (struct nlmsgerr *)NLMSG_DATA(resp); - if (resp->nlmsg_len < NLMSG_LENGTH(sizeof(*err))) - goto malformed_resp; - - if (err->error) { - virReportSystemError(-err->error, - _("error dumping %s (%d) interface"), - ifname, ifindex); - rc = -1; - } - break; - - case GENL_ID_CTRL: - case NLMSG_DONE: - if (nlmsg_parse(resp, sizeof(struct ifinfomsg), - tb, IFLA_MAX, ifla_policy)) { - goto malformed_resp; - } - break; - - default: - goto malformed_resp; - } - - if (rc != 0) - VIR_FREE(*recvbuf); - -cleanup: - nlmsg_free(nl_msg); - - return rc; - -malformed_resp: - nlmsg_free(nl_msg); - - virNetDevError(VIR_ERR_INTERNAL_ERROR, "%s", - _("malformed netlink response message")); - VIR_FREE(*recvbuf); - return -1; - -buffer_too_small: - nlmsg_free(nl_msg); - - virNetDevError(VIR_ERR_INTERNAL_ERROR, "%s", - _("allocated netlink buffer is too small")); - return -1; -} - /** * virNetDevVPortProfileGetStatus: * @@ -607,7 +482,7 @@ virNetDevVPortProfileGetNthParent(const char *ifname, int ifindex, unsigned int return -1; while (!end && i <= nthParent) { - rc = virNetDevVPortProfileLinkDump(ifname, ifindex, true, tb, &recvbuf, NULL); + rc = virNetDevLinkDump(ifname, ifindex, true, tb, &recvbuf, NULL); if (rc < 0) break; @@ -676,8 +551,8 @@ virNetDevVPortProfileOpCommon(const char *ifname, int ifindex, } while (--repeats >= 0) { - rc = virNetDevVPortProfileLinkDump(NULL, ifindex, nltarget_kernel, tb, - &recvbuf, virNetDevVPortProfileGetLldpadPid); + rc = virNetDevLinkDump(NULL, ifindex, nltarget_kernel, tb, + &recvbuf, virNetDevVPortProfileGetLldpadPid); if (rc < 0) goto err_exit; @@ -750,6 +625,7 @@ virNetDevVPortProfileGetPhysdevAndVlan(const char *ifname, int *root_ifindex, ch static int virNetDevVPortProfileOp8021Qbg(const char *ifname, const unsigned char *macaddr, + int vf, const virNetDevVPortProfilePtr virtPort, enum virNetDevVPortProfileLinkOp virtPortOp) { @@ -763,7 +639,11 @@ virNetDevVPortProfileOp8021Qbg(const char *ifname, int vlanid; int physdev_ifindex = 0; char physdev_ifname[IFNAMSIZ] = { 0, }; - int vf = PORT_SELF_VF; + + if (!ifname) + return -EINVAL; + + vf = PORT_SELF_VF; if (virNetDevVPortProfileGetPhysdevAndVlan(ifname, &physdev_ifindex, physdev_ifname, &vlanid) < 0) { @@ -811,46 +691,11 @@ err_exit: return rc; } - -static int -virNetDevVPortProfileGetPhysfnDev(const char *linkdev, - int32_t *vf, - char **physfndev) -{ - int rc = -1; - - if (virNetDevIsVirtualFunction(linkdev) == 1) { - /* if linkdev is SR-IOV VF, then set vf = VF index */ - /* and set linkdev = PF device */ - - rc = virNetDevGetPhysicalFunction(linkdev, physfndev); - if (!rc) - rc = virNetDevGetVirtualFunctionIndex(*physfndev, linkdev, vf); - } else { - - /* Not SR-IOV VF: physfndev is linkdev and VF index - * refers to linkdev self - */ - - *vf = PORT_SELF_VF; - *physfndev = strdup(linkdev); - if (!*physfndev) { - virReportOOMError(); - goto err_exit; - } - rc = 0; - } - -err_exit: - - return rc; -} - - /* Returns 0 on success, -1 on general failure, and -2 on timeout */ static int virNetDevVPortProfileOp8021Qbh(const char *ifname, const unsigned char *macaddr, + int32_t vf, const virNetDevVPortProfilePtr virtPort, const unsigned char *vm_uuid, enum virNetDevVPortProfileLinkOp virtPortOp) @@ -858,14 +703,20 @@ virNetDevVPortProfileOp8021Qbh(const char *ifname, int rc = 0; char *physfndev = NULL; unsigned char hostuuid[VIR_UUID_BUFLEN]; - int32_t vf; bool nltarget_kernel = true; int ifindex; int vlanid = -1; - rc = virNetDevVPortProfileGetPhysfnDev(ifname, &vf, &physfndev); - if (rc < 0) - goto err_exit; + if (vf == -1 && virNetDevIsVirtualFunction(ifname) == 1) { + if (virNetDevGetVirtualFunctionInfo(ifname, &physfndev, &vf) < 0) + goto err_exit; + } else { + physfndev = strdup(ifname); + if (!physfndev) { + virReportOOMError(); + goto err_exit; + } + } rc = virNetDevGetIndex(physfndev, &ifindex); if (rc < 0) @@ -953,13 +804,14 @@ virNetDevVPortProfileAssociate(const char *macvtap_ifname, const virNetDevVPortProfilePtr virtPort, const unsigned char *macvtap_macaddr, const char *linkdev, + int vf, const unsigned char *vmuuid, enum virNetDevVPortProfileOp vmOp) { int rc = 0; VIR_DEBUG("Associating port profile '%p' on link device '%s'", - virtPort, macvtap_ifname); + virtPort, (macvtap_ifname ? macvtap_ifname : linkdev)); VIR_DEBUG("%s: VM OPERATION: %s", __FUNCTION__, virNetDevVPortProfileOpTypeToString(vmOp)); @@ -974,14 +826,14 @@ virNetDevVPortProfileAssociate(const char *macvtap_ifname, case VIR_NETDEV_VPORT_PROFILE_8021QBG: rc = virNetDevVPortProfileOp8021Qbg(macvtap_ifname, macvtap_macaddr, - virtPort, + vf, virtPort, (vmOp == VIR_NETDEV_VPORT_PROFILE_OP_MIGRATE_IN_START) ? VIR_NETDEV_VPORT_PROFILE_LINK_OP_PREASSOCIATE : VIR_NETDEV_VPORT_PROFILE_LINK_OP_ASSOCIATE); break; case VIR_NETDEV_VPORT_PROFILE_8021QBH: - rc = virNetDevVPortProfileOp8021Qbh(linkdev, macvtap_macaddr, + rc = virNetDevVPortProfileOp8021Qbh(linkdev, macvtap_macaddr, vf, virtPort, vmuuid, (vmOp == VIR_NETDEV_VPORT_PROFILE_OP_MIGRATE_IN_START) ? VIR_NETDEV_VPORT_PROFILE_LINK_OP_PREASSOCIATE_RR @@ -1014,6 +866,7 @@ virNetDevVPortProfileDisassociate(const char *macvtap_ifname, const virNetDevVPortProfilePtr virtPort, const unsigned char *macvtap_macaddr, const char *linkdev, + int vf, enum virNetDevVPortProfileOp vmOp) { int rc = 0; @@ -1033,7 +886,7 @@ virNetDevVPortProfileDisassociate(const char *macvtap_ifname, break; case VIR_NETDEV_VPORT_PROFILE_8021QBG: - rc = virNetDevVPortProfileOp8021Qbg(macvtap_ifname, macvtap_macaddr, + rc = virNetDevVPortProfileOp8021Qbg(macvtap_ifname, macvtap_macaddr, vf, virtPort, VIR_NETDEV_VPORT_PROFILE_LINK_OP_DISASSOCIATE); break; @@ -1043,7 +896,7 @@ virNetDevVPortProfileDisassociate(const char *macvtap_ifname, if (vmOp == VIR_NETDEV_VPORT_PROFILE_OP_MIGRATE_IN_FINISH) break; ignore_value(virNetDevSetOnline(linkdev, false)); - rc = virNetDevVPortProfileOp8021Qbh(linkdev, macvtap_macaddr, + rc = virNetDevVPortProfileOp8021Qbh(linkdev, macvtap_macaddr, vf, virtPort, NULL, VIR_NETDEV_VPORT_PROFILE_LINK_OP_DISASSOCIATE); break; @@ -1057,6 +910,7 @@ int virNetDevVPortProfileAssociate(const char *macvtap_ifname ATTRIBUTE_UNUSED, const virNetDevVPortProfilePtr virtPort ATTRIBUTE_UNUSED, const unsigned char *macvtap_macaddr ATTRIBUTE_UNUSED, const char *linkdev ATTRIBUTE_UNUSED, + int vf, const unsigned char *vmuuid ATTRIBUTE_UNUSED, enum virNetDevVPortProfileOp vmOp ATTRIBUTE_UNUSED) { @@ -1069,6 +923,7 @@ int virNetDevVPortProfileDisassociate(const char *macvtap_ifname ATTRIBUTE_UNUSE const virNetDevVPortProfilePtr virtPort ATTRIBUTE_UNUSED, const unsigned char *macvtap_macaddr ATTRIBUTE_UNUSED, const char *linkdev ATTRIBUTE_UNUSED, + int vf, enum virNetDevVPortProfileOp vmOp ATTRIBUTE_UNUSED) { virReportSystemError(ENOSYS, "%s", diff --git a/src/util/virnetdevvportprofile.h b/src/util/virnetdevvportprofile.h index ec2d06d..5c5dcf1 100644 --- a/src/util/virnetdevvportprofile.h +++ b/src/util/virnetdevvportprofile.h @@ -85,17 +85,19 @@ int virNetDevVPortProfileAssociate(const char *ifname, const virNetDevVPortProfilePtr virtPort, const unsigned char *macaddr, const char *linkdev, + int vf, const unsigned char *vmuuid, enum virNetDevVPortProfileOp vmOp) - ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4) - ATTRIBUTE_NONNULL(5) ATTRIBUTE_RETURN_CHECK; + ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(6) + ATTRIBUTE_RETURN_CHECK; int virNetDevVPortProfileDisassociate(const char *ifname, const virNetDevVPortProfilePtr virtPort, const unsigned char *macaddr, const char *linkdev, + int vf, enum virNetDevVPortProfileOp vmOp) - ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4) + ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4) ATTRIBUTE_RETURN_CHECK;

On 03/01/2012 04:02 AM, Roopa Prabhu wrote:
From: Roopa Prabhu<roprabhu@cisco.com>
This patch includes the following changes - removes some netlink functions which are now available in virnetdev.c - Adds a vf argument to all port profile functions
For 802.1Qbh devices, the port profile calls can use a vf argument if passed by the caller. If the vf argument is -1 it will try to derive the vf if the device passed is a virtual function.
For 802.1Qbg devices, This patch introduces a null check for the device argument because during port profile assignment on a hostdev, this argument can be null. Stefan CC'ed for comments
I agree to the change per se since entering the function with a NULL parameter would be fatal, though I am wondering under what condition this can occur. I don't see this happening in the Associate call path for example. Stefan

On 3/1/12 4:39 AM, "Stefan Berger" <stefanb@linux.vnet.ibm.com> wrote:
On 03/01/2012 04:02 AM, Roopa Prabhu wrote:
From: Roopa Prabhu<roprabhu@cisco.com>
This patch includes the following changes - removes some netlink functions which are now available in virnetdev.c - Adds a vf argument to all port profile functions
For 802.1Qbh devices, the port profile calls can use a vf argument if passed by the caller. If the vf argument is -1 it will try to derive the vf if the device passed is a virtual function.
For 802.1Qbg devices, This patch introduces a null check for the device argument because during port profile assignment on a hostdev, this argument can be null. Stefan CC'ed for comments
I agree to the change per se since entering the function with a NULL parameter would be fatal, though I am wondering under what condition this can occur. I don't see this happening in the Associate call path for example.
It happens in patch 4/4 where we call associate for a hostdev and there is no macvtap available there.

On 03/01/2012 10:32 AM, Roopa Prabhu wrote:
On 3/1/12 4:39 AM, "Stefan Berger"<stefanb@linux.vnet.ibm.com> wrote:
From: Roopa Prabhu<roprabhu@cisco.com>
This patch includes the following changes - removes some netlink functions which are now available in virnetdev.c - Adds a vf argument to all port profile functions
For 802.1Qbh devices, the port profile calls can use a vf argument if passed by the caller. If the vf argument is -1 it will try to derive the vf if the device passed is a virtual function.
For 802.1Qbg devices, This patch introduces a null check for the device argument because during port profile assignment on a hostdev, this argument can be null. Stefan CC'ed for comments I agree to the change per se since entering the function with a NULL
On 03/01/2012 04:02 AM, Roopa Prabhu wrote: parameter would be fatal, though I am wondering under what condition this can occur. I don't see this happening in the Associate call path for example. It happens in patch 4/4 where we call associate for a hostdev and there is no macvtap available there.
So this hostdev related code can only be used by 802.1Qbh because the type of device does not exist for 802.1Qbg? I think that at least there should be an error message logging the fact that the function was called without an interface name. Also the return error code should probably be -1 and not be overloaded with -E*. If possible (unless already done) the combination of hostdev and 802.1Qbg should probably even be prevented on the XML level. Stefan

On 3/1/12 7:52 AM, "Stefan Berger" <stefanb@linux.vnet.ibm.com> wrote:
On 03/01/2012 10:32 AM, Roopa Prabhu wrote:
On 3/1/12 4:39 AM, "Stefan Berger"<stefanb@linux.vnet.ibm.com> wrote:
From: Roopa Prabhu<roprabhu@cisco.com>
This patch includes the following changes - removes some netlink functions which are now available in virnetdev.c - Adds a vf argument to all port profile functions
For 802.1Qbh devices, the port profile calls can use a vf argument if passed by the caller. If the vf argument is -1 it will try to derive the vf if the device passed is a virtual function.
For 802.1Qbg devices, This patch introduces a null check for the device argument because during port profile assignment on a hostdev, this argument can be null. Stefan CC'ed for comments I agree to the change per se since entering the function with a NULL
On 03/01/2012 04:02 AM, Roopa Prabhu wrote: parameter would be fatal, though I am wondering under what condition this can occur. I don't see this happening in the Associate call path for example. It happens in patch 4/4 where we call associate for a hostdev and there is no macvtap available there.
So this hostdev related code can only be used by 802.1Qbh because the type of device does not exist for 802.1Qbg?
Not really. The hostdev device is just a pci device. But looking at 802.1Qbg port profile related code..i was not sure how it can be done when the device is a hostdev.
I think that at least there should be an error message logging the fact that the function was called without an interface name. Also the return error code should probably be -1 and not be overloaded with -E*. Ok This can be done.
If possible (unless already done) the combination of hostdev and 802.1Qbg should probably even be prevented on the XML level. This is not done yet. I think I will do it. And then if someone knows how to do it for 802.1Qbg they can open this up.
Thanks stefan.

On 03/01/2012 11:32 AM, Roopa Prabhu wrote:
On 3/1/12 7:52 AM, "Stefan Berger"<stefanb@linux.vnet.ibm.com> wrote:
On 03/01/2012 10:32 AM, Roopa Prabhu wrote:
On 3/1/12 4:39 AM, "Stefan Berger"<stefanb@linux.vnet.ibm.com> wrote:
From: Roopa Prabhu<roprabhu@cisco.com>
This patch includes the following changes - removes some netlink functions which are now available in virnetdev.c - Adds a vf argument to all port profile functions
For 802.1Qbh devices, the port profile calls can use a vf argument if passed by the caller. If the vf argument is -1 it will try to derive the vf if the device passed is a virtual function.
For 802.1Qbg devices, This patch introduces a null check for the device argument because during port profile assignment on a hostdev, this argument can be null. Stefan CC'ed for comments I agree to the change per se since entering the function with a NULL
On 03/01/2012 04:02 AM, Roopa Prabhu wrote: parameter would be fatal, though I am wondering under what condition this can occur. I don't see this happening in the Associate call path for example. It happens in patch 4/4 where we call associate for a hostdev and there is no macvtap available there.
So this hostdev related code can only be used by 802.1Qbh because the type of device does not exist for 802.1Qbg? Not really. The hostdev device is just a pci device. But looking at 802.1Qbg port profile related code..i was not sure how it can be done when the device is a hostdev.
My guess is that in the 802.1Qbg case we would have to build a macvtap device on top of the linkdev to have the macvtap_ifname parameter available. Well, maybe the point of intercept would be to check the type of profile in 4/4 and make sure it's a 802.1Qbh type and only then invoke the function -- like a switch() statement where 802.1Qbh type of profile is supported and 802.1Qbg is not (yet).
I think that at least there should be an error message logging the fact that the function was called without an interface name. Also the return error code should probably be -1 and not be overloaded with -E*. Ok This can be done.
If possible (unless already done) the combination of hostdev and 802.1Qbg should probably even be prevented on the XML level. This is not done yet. I think I will do it. And then if someone knows how to do it for 802.1Qbg they can open this up.
Stefan

On 03/01/2012 11:55 AM, Stefan Berger wrote:
On 03/01/2012 11:32 AM, Roopa Prabhu wrote:
On 3/1/12 7:52 AM, "Stefan Berger"<stefanb@linux.vnet.ibm.com> wrote:
On 03/01/2012 10:32 AM, Roopa Prabhu wrote:
On 3/1/12 4:39 AM, "Stefan Berger"<stefanb@linux.vnet.ibm.com> wrote:
From: Roopa Prabhu<roprabhu@cisco.com>
This patch includes the following changes - removes some netlink functions which are now available in virnetdev.c - Adds a vf argument to all port profile functions
For 802.1Qbh devices, the port profile calls can use a vf argument if passed by the caller. If the vf argument is -1 it will try to derive the vf if the device passed is a virtual function.
For 802.1Qbg devices, This patch introduces a null check for the device argument because during port profile assignment on a hostdev, this argument can be null. Stefan CC'ed for comments I agree to the change per se since entering the function with a NULL
On 03/01/2012 04:02 AM, Roopa Prabhu wrote: parameter would be fatal, though I am wondering under what condition this can occur. I don't see this happening in the Associate call path for example. It happens in patch 4/4 where we call associate for a hostdev and there is no macvtap available there.
So this hostdev related code can only be used by 802.1Qbh because the type of device does not exist for 802.1Qbg? Not really. The hostdev device is just a pci device. But looking at 802.1Qbg port profile related code..i was not sure how it can be done when the device is a hostdev.
My guess is that in the 802.1Qbg case we would have to build a macvtap device on top of the linkdev to have the macvtap_ifname parameter available. Well, maybe the point of intercept would be to check the type of profile in 4/4 and make sure it's a 802.1Qbh type and only then invoke the function -- like a switch() statement where 802.1Qbh type of profile is supported and 802.1Qbg is not (yet).
In the case of hostdev though, there is not necessarily any netdev driver at all in the host (and thus no "linkdev" to attach a macvtap to), certainly not after it's attached to the guest - control of the PCI device is given over to the guest. So is the problem here that 802.1QbX stuff can only work if there's an associated macvtap device? Although it might be possible to temporarily create a macvtap device and attach it to the PCI device's netdev driver prior to passing it through, that would only work if a netdev driver was bound to the PCI device (which isn't always the case, especially for SRIOV VFs), yet that netdev driver would then immediately need to be unbound prior to assigning the device to the guest, and most likely that would kill the macvtap device; even if the setup done using that macvtap device wasn't undone in the process, would it be possible to undo it later when the guest terminates (or the device is detached from the guest)?
I think that at least there should be an error message logging the fact that the function was called without an interface name. Also the return error code should probably be -1 and not be overloaded with -E*. Ok This can be done.
If possible (unless already done) the combination of hostdev and 802.1Qbg should probably even be prevented on the XML level. This is not done yet. I think I will do it. And then if someone knows how to do it for 802.1Qbg they can open this up.
Stefan

On Thu, 2012-03-01 at 13:02 -0500, Laine Stump wrote:
In the case of hostdev though, there is not necessarily any netdev driver at all in the host (and thus no "linkdev" to attach a macvtap to), certainly not after it's attached to the guest - control of the PCI device is given over to the guest.
So is the problem here that 802.1QbX stuff can only work if there's an associated macvtap device? Although it might be possible to temporarily create a macvtap device and attach it to the PCI device's netdev driver prior to passing it through, that would only work if a netdev driver was bound to the PCI device (which isn't always the case, especially for SRIOV VFs), yet that netdev driver would then immediately need to be unbound prior to assigning the device to the guest, and most likely that would kill the macvtap device; even if the setup done using that macvtap device wasn't undone in the process, would it be possible to undo it later when the guest terminates (or the device is detached from the guest)?
I wondered how the complete XML fragment for Qbh would look like and came up with the following: <interface type='hostdev'> <source dev='eth0' mode='private'/> <mac address='52:54:00:8b:c9:51'> <virtualport type='802.1Qbh'> <parameters profileid='finance'/> </virtualport> </interface> Can someone confirm? For Qbg, we would need then something like this: <interface type='hostdev'> <source dev='eth0' mode='vepa'/> <mac address='52:54:00:8b:c9:51'> <virtualport type="802.1Qbg"> <parameters managerid="11" typeid="1193047" typeidversion="2" instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f" vlanid="2"/> </virtualport> </interface> to be of any use. Note the additional vlanid attribute. The semantic would be that the host establishes a Qbg association for (managerid, typeid, typeidversion, instanceid, vlanid) and that the VM would need to add the correct VLAN tag in order to be able to communicate. Does that make sense? Best regards, Gerhard Stenzel, Hybrid Technologies, LTC ----------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294

On 03/02/2012 09:12 AM, Gerhard Stenzel wrote:
On Thu, 2012-03-01 at 13:02 -0500, Laine Stump wrote:
In the case of hostdev though, there is not necessarily any netdev driver at all in the host (and thus no "linkdev" to attach a macvtap to), certainly not after it's attached to the guest - control of the PCI device is given over to the guest.
So is the problem here that 802.1QbX stuff can only work if there's an associated macvtap device? Although it might be possible to temporarily create a macvtap device and attach it to the PCI device's netdev driver prior to passing it through, that would only work if a netdev driver was bound to the PCI device (which isn't always the case, especially for SRIOV VFs), yet that netdev driver would then immediately need to be unbound prior to assigning the device to the guest, and most likely that would kill the macvtap device; even if the setup done using that macvtap device wasn't undone in the process, would it be possible to undo it later when the guest terminates (or the device is detached from the guest)? I wondered how the complete XML fragment for Qbh would look like and came up with the following:
<interface type='hostdev'> <source dev='eth0' mode='private'/>
1) Currently it requires a PCI address (although I plan to add the ability to accept a netdev name and automatically convert it to a PCI address): <source> <address type='pci' domain='0' bus='0' slot='6' function='0'/> </source> 2) I guess I have been misunderestimating the level of symbiosis between macvtap and 802.1QbX. I had thought that the private vs. vepa thing was related to whether or not macvtap could (or couldn't) share the physical device and (when sharing was allowed) whether or not it allowed multiple macvtap devices connected to the same physical to see traffic from each other. This assumption led me to believe that in the case of a PCI passthrough device, where there is obviously no sharing (or macvtap device), these different modes were irrelevant, and all that was needed was the information in <virtualport>. What I *think* I'm understanding from this discussion is that 1) in order for a virtual port association to happen, a macvtap interface must exist, and the association is done wrt that macvtap device *not* the physical device, or even the VF, and 2) knowing the information in <virtualport> (along with knowing that the physical device is not being shared) is not enough information to properly perform an associate operation. Is this correct? If that's the case, then there are some basic assumptions made here that are incorrect, and we will need to either change the lower level code to somehow accomplish a port associate without a macvtap interface, or we will need to pull some kind of trickery, possibly adding a macvtap interface to the PF to be used as a proxy to do the ASSOCIATE for the VF (will that even work? In particular, will it work if multiple VFs need to operate in one of the "exclusive" modes where no sharing of physical device is permitted?) Or maybe I'm still not understanding...
<mac address='52:54:00:8b:c9:51'> <virtualport type='802.1Qbh'> <parameters profileid='finance'/> </virtualport> </interface>
Can someone confirm?
For Qbg, we would need then something like this:
<interface type='hostdev'> <source dev='eth0' mode='vepa'/> <mac address='52:54:00:8b:c9:51'> <virtualport type="802.1Qbg"> <parameters managerid="11" typeid="1193047" typeidversion="2" instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f" vlanid="2"/> </virtualport> </interface>
to be of any use.
Again, my knowledge is insufficient to understand - why was a vlanid not necessary before when we were dealing with a hostside macvtap tied to a guest-side emulated netdev, and why is it necessary now that we want to just passthrough the PCI device to the guest?
Note the additional vlanid attribute. The semantic would be that the host establishes a Qbg association for (managerid, typeid, typeidversion, instanceid, vlanid) and that the VM would need to add the correct VLAN tag in order to be able to communicate.
So adding the VLAN tag has in the past been done by the macvtap interface? Where did it learn the vlanid from? Definitely if the packets need to leave the host with a VLAN tag, in PCI passthrough mode that will need to be done by the guest OS, since the host will be unable to get its hands on the packets. Once that's the case, does it maybe make more sense to just leave *everything* up to the guest OS - do a PCI passthrough of the device (maybe setting the MAC address) and let the guest do the port associate etc. too? (Another way of saying this - at this point, shouldn't we just admit that transparent hostside support of VEPA (or any other protocol that requires data packets to be modified) using PCI passthrough by definition is not possible, and therefore isn't supported?) Or am I just misunderstanding again?
Does that make sense?
Not yet, but I'm slowly learning more and more :-)

On 03/02/2012 10:52 AM, Laine Stump wrote: > On 03/02/2012 09:12 AM, Gerhard Stenzel wrote: >> On Thu, 2012-03-01 at 13:02 -0500, Laine Stump wrote: >>> In the case of hostdev though, there is not necessarily any netdev >>> driver at all in the host (and thus no "linkdev" to attach a macvtap >>> to), certainly not after it's attached to the guest - control of the >>> PCI >>> device is given over to the guest. >>> >>> So is the problem here that 802.1QbX stuff can only work if there's an >>> associated macvtap device? Although it might be possible to >>> temporarily >>> create a macvtap device and attach it to the PCI device's netdev >>> driver >>> prior to passing it through, that would only work if a netdev driver >>> was >>> bound to the PCI device (which isn't always the case, especially for >>> SRIOV VFs), yet that netdev driver would then immediately need to be >>> unbound prior to assigning the device to the guest, and most likely >>> that >>> would kill the macvtap device; even if the setup done using that >>> macvtap >>> device wasn't undone in the process, would it be possible to undo it >>> later when the guest terminates (or the device is detached from the >>> guest)? >> I wondered how the complete XML fragment for Qbh would look like and >> came up with the following: >> >> <interface type='hostdev'> >> <source dev='eth0' mode='private'/> > 1) Currently it requires a PCI address (although I plan to add the > ability to accept a netdev name and automatically convert it to a PCI > address): > > <source> > <address type='pci' domain='0' bus='0' slot='6' function='0'/> > </source> > > 2) I guess I have been misunderestimating the level of symbiosis between > macvtap and 802.1QbX. I had thought that the private vs. vepa thing was > related to whether or not macvtap could (or couldn't) share the physical > device and (when sharing was allowed) whether or not it allowed multiple > macvtap devices connected to the same physical to see traffic from each > other. This assumption led me to believe that in the case of a PCI > passthrough device, where there is obviously no sharing (or macvtap > device), these different modes were irrelevant, and all that was needed > was the information in<virtualport>. > > What I *think* I'm understanding from this discussion is that 1) in > order for a virtual port association to happen, a macvtap interface must > exist, and the association is done wrt that macvtap device *not* the > physical device, or even the VF, and 2) knowing the information in Well, another aspect of 802.1Qbg versus Qbh is that 802.1Qbg needs an external daemon, lldpad, to setup the switch. 802.1Qbh presumably does 'all it needs' by talking to the firmware on the ethernet card... The protocol between libvirt and lldpad currently requires the transfer of an interface. So this protocol would have to be extended to 'somehow' support a raw PIC bus/device/function. Stefan

On Fri, 2012-03-02 at 10:52 -0500, Laine Stump wrote:
Again, my knowledge is insufficient to understand - why was a vlanid not necessary before when we were dealing with a hostside macvtap tied to a guest-side emulated netdev, and why is it necessary now that we want to just passthrough the PCI device to the guest?
Note the additional vlanid attribute. The semantic would be that the host establishes a Qbg association for (managerid, typeid, typeidversion, instanceid, vlanid) and that the VM would need to add the correct VLAN tag in order to be able to communicate.
So adding the VLAN tag has in the past been done by the macvtap interface? Where did it learn the vlanid from?
(Many questions for which I will need some time ..) Let me answer the simple ones first: If you look here http://libvirt.org/formatdomain.html: <devices> <interface type='direct'/> ... <interface type='direct'> <source dev='eth0.2' mode='vepa'/> <virtualport type="802.1Qbg"> <parameters managerid="11" typeid="1193047" typeidversion="2" instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f"/> </virtualport> </interface> </devices> ... In this example, the macvtap interface will be created on top of the VLAN interface 2 on top of eth0. The Qbg switch needs this information: (managerid, typeid, typeidversion, instanceid, vlanid) macvtap/VEPA does not need the the VLAN to work, but Qbg does. So for PCI passthrough, if the host does the association, it has to know which VLANID to associate, but the guest has to add the VLAN tags.
Definitely if the packets need to leave the host with a VLAN tag, in PCI passthrough mode that will need to be done by the guest OS, since the host will be unable to get its hands on the packets. Once that's the case, does it maybe make more sense to just leave *everything* up to the guest OS - do a PCI passthrough of the device (maybe setting the MAC address) and let the guest do the port associate etc. too? (Another way of saying this - at this point, shouldn't we just admit that transparent hostside support of VEPA (or any other protocol that requires data packets to be modified) using PCI passthrough by definition is not possible, and therefore isn't supported?)
Letting the guest do the association is an option, which should work already (even if noone probably tested it yet), but the question is really how much control should the host have vs the guest. There are definitely scenarios thinkable where the host should do the association. Best regards, Gerhard Stenzel, Hybrid Technologies, LTC ----------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294

On 03/02/2012 11:37 AM, Gerhard Stenzel wrote:
Letting the guest do the association is an option, which should work already (even if noone probably tested it yet), but the question is really how much control should the host have vs the guest. There are definitely scenarios thinkable where the host should do the association.
I think an applicable scenario is where the infrastructure provider doesn't really know the guest user and needs to have 'mandatory access control' over the configuration of the infrastructure and yet wants to use the pass-through mode for better throughput. Stefan

On 03/02/2012 11:58 AM, Stefan Berger wrote:
On 03/02/2012 11:37 AM, Gerhard Stenzel wrote:
Letting the guest do the association is an option, which should work already (even if noone probably tested it yet), but the question is really how much control should the host have vs the guest. There are definitely scenarios thinkable where the host should do the association.
I think an applicable scenario is where the infrastructure provider doesn't really know the guest user and needs to have 'mandatory access control' over the configuration of the infrastructure and yet wants to use the pass-through mode for better throughput.
And/or when the guest OS doesn't have the necessary smarts to do the association (or maybe even vlan tagging) itself. So, at the end of this, is there *any* 802.1QbX mode that can work using PCI passthrough without at least one of the following things: 1) a macvtap interface on the host. (what about my idea of attaching a macvtap interface to the PF? does that have any hint of practicality?) 2) extending the protocol for talking with lldpad to support using a raw PCI device rather than a macvtap device. 3) the guest doing vlan tagging 4) the guest doing the full 802.1QbX associate/de-associate protocol exchange itself? Nobody has said it explicitly yet (I think), but I have the impression that this problem unfortunately can't be solved by libvirt alone. If that's the case, we should state that as soon as possible so that we can table the <virtualport> part of these patches for the short term, and separate the mac address part to get it pushed upstream (along with the new low-level PCI utility functions), as that is very useful on its own.

On Fri, 2012-03-02 at 14:27 -0500, Laine Stump wrote:
So, at the end of this, is there *any* 802.1QbX mode that can work using PCI passthrough without at least one of the following things:
1) a macvtap interface on the host. (what about my idea of attaching a macvtap interface to the PF? does that have any hint of practicality?)
2) extending the protocol for talking with lldpad to support using a raw PCI device rather than a macvtap device.
3) the guest doing vlan tagging
4) the guest doing the full 802.1QbX associate/de-associate protocol exchange itself?
Nobody has said it explicitly yet (I think), but I have the impression that this problem unfortunately can't be solved by libvirt alone. If that's the case, we should state that as soon as possible so that we can table the <virtualport> part of these patches for the short term, and separate the mac address part to get it pushed upstream (along with the new low-level PCI utility functions), as that is very useful on its own.
I am not sure I can follow the conclusion that this can not be solved in libvirt alone. Qbg: For the macvtap case, the macvtap device is "attached" to the underlying physical interface and this is where the association request is sent to, via lldpad. For the PCI passthrough case, the same must be possible, assuming the physical interface can be concluded from the PCI device and the VLAN information is provided. Or do I miss something? Best regards, Gerhard Stenzel, Hybrid Technologies, LTC ----------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294

On 3/2/12 9:21 AM, "Gerhard Stenzel" <gstenzel@linux.vnet.ibm.com> wrote:
On Fri, 2012-03-02 at 14:27 -0500, Laine Stump wrote:
So, at the end of this, is there *any* 802.1QbX mode that can work using PCI passthrough without at least one of the following things:
1) a macvtap interface on the host. (what about my idea of attaching a macvtap interface to the PF? does that have any hint of practicality?)
2) extending the protocol for talking with lldpad to support using a raw PCI device rather than a macvtap device.
3) the guest doing vlan tagging
4) the guest doing the full 802.1QbX associate/de-associate protocol exchange itself?
Nobody has said it explicitly yet (I think), but I have the impression that this problem unfortunately can't be solved by libvirt alone. If that's the case, we should state that as soon as possible so that we can table the <virtualport> part of these patches for the short term, and separate the mac address part to get it pushed upstream (along with the new low-level PCI utility functions), as that is very useful on its own.
I am not sure I can follow the conclusion that this can not be solved in libvirt alone. Qbg: For the macvtap case, the macvtap device is "attached" to the underlying physical interface and this is where the association request is sent to, via lldpad. For the PCI passthrough case, the same must be possible, assuming the physical interface can be concluded from the PCI device and the VLAN information is provided.
Or do I miss something?
Wondering if we simplify this to only support sriov devices and for 802.1Qbg all config can be done via the PF netdevice. Am assuming the vlan info has to only reach lldpad daemon and you don't really need a vlan device on host. In which case, your new xml syntax with vlanid should work I think. - we send the mac, vlan and port profile info to lldpad via the PF with vf index using the current IFLA_VF_MAC and IFLA_VF_PORT. No ? Thanks, Roopa

On 3/2/12 11:27 AM, "Laine Stump" <laine@laine.org> wrote:
On 03/02/2012 11:58 AM, Stefan Berger wrote:
On 03/02/2012 11:37 AM, Gerhard Stenzel wrote:
Letting the guest do the association is an option, which should work already (even if noone probably tested it yet), but the question is really how much control should the host have vs the guest. There are definitely scenarios thinkable where the host should do the association.
I think an applicable scenario is where the infrastructure provider doesn't really know the guest user and needs to have 'mandatory access control' over the configuration of the infrastructure and yet wants to use the pass-through mode for better throughput.
And/or when the guest OS doesn't have the necessary smarts to do the association (or maybe even vlan tagging) itself.
So, at the end of this, is there *any* 802.1QbX mode that can work using PCI passthrough without at least one of the following things:
1) a macvtap interface on the host. (what about my idea of attaching a macvtap interface to the PF? does that have any hint of practicality?)
2) extending the protocol for talking with lldpad to support using a raw PCI device rather than a macvtap device.
3) the guest doing vlan tagging
4) the guest doing the full 802.1QbX associate/de-associate protocol exchange itself?
Nobody has said it explicitly yet (I think), but I have the impression that this problem unfortunately can't be solved by libvirt alone. If that's the case, we should state that as soon as possible so that we can table the <virtualport> part of these patches for the short term, and separate the mac address part to get it pushed upstream (along with the new low-level PCI utility functions), as that is very useful on its own.
Laine, The patches I submitted for virtualport 802.1Qbh work just fine. I submitted these patches mainly to get the virtualport working for 802.1Qbh. Because we pass the port profile via the PF to fw and then to the switch. The driver in the guest just comes up on the VF and uses the already associated VF. We do the port profile association from the host because of security reasons. Instead of giving control to the guest OS. And as you can see in the patches, for 802.1Qbh port profile association on the hostdev is not much different from port profile association in the macvtap case. Thanks, Roopa

On 03/02/2012 03:16 PM, Roopa Prabhu wrote:
On 3/2/12 11:27 AM, "Laine Stump" <laine@laine.org> wrote:
On 03/02/2012 11:58 AM, Stefan Berger wrote:
On 03/02/2012 11:37 AM, Gerhard Stenzel wrote:
Letting the guest do the association is an option, which should work already (even if noone probably tested it yet), but the question is really how much control should the host have vs the guest. There are definitely scenarios thinkable where the host should do the association. I think an applicable scenario is where the infrastructure provider doesn't really know the guest user and needs to have 'mandatory access control' over the configuration of the infrastructure and yet wants to use the pass-through mode for better throughput. And/or when the guest OS doesn't have the necessary smarts to do the association (or maybe even vlan tagging) itself.
So, at the end of this, is there *any* 802.1QbX mode that can work using PCI passthrough without at least one of the following things:
1) a macvtap interface on the host. (what about my idea of attaching a macvtap interface to the PF? does that have any hint of practicality?)
2) extending the protocol for talking with lldpad to support using a raw PCI device rather than a macvtap device.
3) the guest doing vlan tagging
4) the guest doing the full 802.1QbX associate/de-associate protocol exchange itself?
Nobody has said it explicitly yet (I think), but I have the impression that this problem unfortunately can't be solved by libvirt alone. If that's the case, we should state that as soon as possible so that we can table the <virtualport> part of these patches for the short term, and separate the mac address part to get it pushed upstream (along with the new low-level PCI utility functions), as that is very useful on its own.
Laine, The patches I submitted for virtualport 802.1Qbh work just fine.
Yeah, I'm not sure how I lost sight of the fact that you said your testing had gone okay - I guess too little sleep. So I read too much into the discussion, and it's just Qbg that has these restrictions. Good! Pay no attention to my alarmism :-)
I submitted these patches mainly to get the virtualport working for 802.1Qbh. Because we pass the port profile via the PF to fw and then to the switch. The driver in the guest just comes up on the VF and uses the already associated VF.
Right. That's pretty much how I've always been assuming it worked for all of these modes. I guess I should spend more time at lower levels.
We do the port profile association from the host because of security reasons. Instead of giving control to the guest OS.
And as you can see in the patches, for 802.1Qbh port profile association on the hostdev is not much different from port profile association in the macvtap case.
Okay, then in the end these patches will support 802.1Qbh <virtualport> setting, as well as setting the MAC address (but only for SRIOV-capable devices). And any future support for 802.1Qbg would require both some extra support outside libvirt, as well as specifying the vlanid in the config, and requiring the guest to setup VLAN tagging. Did I get it right now?

On 3/2/12 12:45 PM, "Laine Stump" <laine@laine.org> wrote:
On 03/02/2012 03:16 PM, Roopa Prabhu wrote:
On 3/2/12 11:27 AM, "Laine Stump" <laine@laine.org> wrote:
On 03/02/2012 11:58 AM, Stefan Berger wrote:
On 03/02/2012 11:37 AM, Gerhard Stenzel wrote:
Letting the guest do the association is an option, which should work already (even if noone probably tested it yet), but the question is really how much control should the host have vs the guest. There are definitely scenarios thinkable where the host should do the association. I think an applicable scenario is where the infrastructure provider doesn't really know the guest user and needs to have 'mandatory access control' over the configuration of the infrastructure and yet wants to use the pass-through mode for better throughput. And/or when the guest OS doesn't have the necessary smarts to do the association (or maybe even vlan tagging) itself.
So, at the end of this, is there *any* 802.1QbX mode that can work using PCI passthrough without at least one of the following things:
1) a macvtap interface on the host. (what about my idea of attaching a macvtap interface to the PF? does that have any hint of practicality?)
2) extending the protocol for talking with lldpad to support using a raw PCI device rather than a macvtap device.
3) the guest doing vlan tagging
4) the guest doing the full 802.1QbX associate/de-associate protocol exchange itself?
Nobody has said it explicitly yet (I think), but I have the impression that this problem unfortunately can't be solved by libvirt alone. If that's the case, we should state that as soon as possible so that we can table the <virtualport> part of these patches for the short term, and separate the mac address part to get it pushed upstream (along with the new low-level PCI utility functions), as that is very useful on its own.
Laine, The patches I submitted for virtualport 802.1Qbh work just fine.
Yeah, I'm not sure how I lost sight of the fact that you said your testing had gone okay - I guess too little sleep. So I read too much into the discussion, and it's just Qbg that has these restrictions. Good! Pay no attention to my alarmism :-)
No problem :)
I submitted these patches mainly to get the virtualport working for 802.1Qbh. Because we pass the port profile via the PF to fw and then to the switch. The driver in the guest just comes up on the VF and uses the already associated VF.
Right. That's pretty much how I've always been assuming it worked for all of these modes. I guess I should spend more time at lower levels.
We do the port profile association from the host because of security reasons. Instead of giving control to the guest OS.
And as you can see in the patches, for 802.1Qbh port profile association on the hostdev is not much different from port profile association in the macvtap case.
Okay, then in the end these patches will support 802.1Qbh <virtualport> setting, as well as setting the MAC address (but only for SRIOV-capable devices). And any future support for 802.1Qbg would require both some extra support outside libvirt, as well as specifying the vlanid in the config, and requiring the guest to setup VLAN tagging. Did I get it right now?
Yes that seems right. I think we don't need to call out the setup in the guest. Its common for all modes. Host provisions the vlans (ie configures the switch port with that vlan etc). This is the step that libvirt does. And guest does his own setup for vlan devices if needed. Thanks, Roopa

On Fri, 2012-03-02 at 15:45 -0500, Laine Stump wrote:
Okay, then in the end these patches will support 802.1Qbh <virtualport> setting, as well as setting the MAC address (but only for SRIOV-capable devices). And any future support for 802.1Qbg would require both some extra support outside libvirt, as well as specifying the vlanid in the config, and requiring the guest to setup VLAN tagging. Did I get it right now?
Not sure, we need anything else for Qbg in addition to some changes in libvirt and vlan tagging in the guest. But, I think we are converging that the Qbh part looks okay and the Qbg part can be added later, if necessary. Best regards, Gerhard Stenzel, Hybrid Technologies, LTC ----------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294

On Fri, 2012-03-02 at 10:52 -0500, Laine Stump wrote:
1) Currently it requires a PCI address (although I plan to add the ability to accept a netdev name and automatically convert it to a PCI address):
<source> <address type='pci' domain='0' bus='0' slot='6' function='0'/> </source>
This means the XML fragment look more like this for Qbh: <interface type='hostdev'> <source> <address type='pci' domain='0' bus='0' slot='6' function='0'/> </source> <mac address='52:54:00:8b:c9:51'> <virtualport type='802.1Qbh'> <parameters profileid='finance'/> </virtualport> </interface> and for Qbg: <interface type='hostdev'> <source> <address type='pci' domain='0' bus='0' slot='6' function='0'/> </source> <mac address='52:54:00:8b:c9:51'> <virtualport type="802.1Qbg"> <parameters managerid="11" typeid="1193047" typeidversion="2" instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f" vlanid="2"/> </virtualport> </interface>
2) I guess I have been misunderestimating the level of symbiosis between macvtap and 802.1QbX. I had thought that the private vs. vepa thing was related to whether or not macvtap could (or couldn't) share the physical device and (when sharing was allowed) whether or not it allowed multiple macvtap devices connected to the same physical to see traffic from each other. This assumption led me to believe that in the case of a PCI passthrough device, where there is obviously no sharing (or macvtap device), these different modes were irrelevant, and all that was needed was the information in <virtualport>.
What I *think* I'm understanding from this discussion is that 1) in order for a virtual port association to happen, a macvtap interface must exist, and the association is done wrt that macvtap device *not* the physical device, or even the VF, and 2) knowing the information in <virtualport> (along with knowing that the physical device is not being shared) is not enough information to properly perform an associate operation.
Is this correct?
If I understand above correctly, your first assumption seems correct and my XML examples have been misleading you.
If that's the case, then there are some basic assumptions made here that are incorrect, and we will need to either change the lower level code to somehow accomplish a port associate without a macvtap interface, or we will need to pull some kind of trickery, possibly adding a macvtap interface to the PF to be used as a proxy to do the ASSOCIATE for the VF (will that even work? In particular, will it work if multiple VFs need to operate in one of the "exclusive" modes where no sharing of physical device is permitted?)
I do not know for Qbh, but for Qbg: The switch knows nothing about macvtap devices or virtual functions, what matters is the combination of (managerid, typeid, typeidversion, instanceid, vlanid) to make an association. Best regards, Gerhard Stenzel, Hybrid Technologies, LTC ----------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294

On 03/02/2012 12:05 PM, Gerhard Stenzel wrote:
On Fri, 2012-03-02 at 10:52 -0500, Laine Stump wrote:
1) Currently it requires a PCI address (although I plan to add the ability to accept a netdev name and automatically convert it to a PCI address):
<source> <address type='pci' domain='0' bus='0' slot='6' function='0'/> </source> This means the XML fragment look more like this for Qbh:
<interface type='hostdev'> <source> <address type='pci' domain='0' bus='0' slot='6' function='0'/> </source> <mac address='52:54:00:8b:c9:51'> <virtualport type='802.1Qbh'> <parameters profileid='finance'/> </virtualport> </interface>
and for Qbg:
<interface type='hostdev'> <source> <address type='pci' domain='0' bus='0' slot='6' function='0'/> </source> <mac address='52:54:00:8b:c9:51'> <virtualport type="802.1Qbg"> <parameters managerid="11" typeid="1193047" typeidversion="2" instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f" vlanid="2"/> </virtualport> </interface>
So vlanid is the new part? In the case we have no macvtap device we would have to find the vlanid in the XML and could convey that to lldpad directly (rather than determining it by walking the stack of interfaces as we do now) along with an ifindex of '0' or '-1' (?). If lldpad never really looked at the ifindex or ifname we sent it via the netlink message then this shouldn't be a problem to support, apart from having to configure the guest to create a vlan interface of course. So I guess the other set of parameters were not applied to the VM's traffic to allow VM A using vlanid 2 to be distinguishable from VM B using vlanid 2 as well on the same host? If this is all true, then at least the code path creating the association should be easy to adapt... Stefan

From: Roopa Prabhu <roprabhu@cisco.com> These changes are applied only if the hostdev has a parent net device. If the parent netdevice has virtual port information, the original virtualport associate functions are called (these set and restore both mac and port profile on an interface). Else, only mac address is set on the device using other methods depending on if its a sriov device or not. Changes also include hotplug pci devices Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> --- src/qemu/qemu_hostdev.c | 229 +++++++++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_hostdev.h | 8 +- src/qemu/qemu_hotplug.c | 19 ++++ 3 files changed, 242 insertions(+), 14 deletions(-) diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index b3cad8e..43d9dd7 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -29,6 +29,7 @@ #include "memory.h" #include "pci.h" #include "hostusb.h" +#include "virnetdev.h" static pciDeviceList * qemuGetPciHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs) @@ -156,12 +157,131 @@ int qemuUpdateActivePciHostdevs(struct qemud_driver *driver, return 0; } +static int +qemuDomainHostdevPciSysfsPath(virDomainHostdevDefPtr hostdev, char **sysfs_path) +{ + struct pci_config_address config_address; + + config_address.domain = hostdev->source.subsys.u.pci.domain; + config_address.bus = hostdev->source.subsys.u.pci.bus; + config_address.slot = hostdev->source.subsys.u.pci.slot; + config_address.function = hostdev->source.subsys.u.pci.function; + + return pciConfigAddressToSysfsFile(&config_address, sysfs_path); +} + +int +qemuDomainHostdevIsVirtualFunction(virDomainHostdevDefPtr hostdev) +{ + char *sysfs_path = NULL; + int ret = -1; + + if (qemuDomainHostdevPciSysfsPath(hostdev, &sysfs_path) < 0) + return ret; + + ret = pciDeviceIsVirtualFunction(sysfs_path); + + VIR_FREE(sysfs_path); + + return ret; +} + +static int +qemuDomainHostdevNetDevice(virDomainHostdevDefPtr hostdev, char **linkdev, + int *vf) +{ + int ret = -1; + char *sysfs_path = NULL; + + if (qemuDomainHostdevPciSysfsPath(hostdev, &sysfs_path) < 0) + return ret; + + if (pciDeviceIsVirtualFunction(sysfs_path) == 1) { + if (pciDeviceGetVirtualFunctionInfo(sysfs_path, linkdev, + vf) < 0) + goto cleanup; + } + else { + if (pciDeviceNetName(sysfs_path, linkdev) < 0) + goto cleanup; + *vf = -1; + } + + ret = 0; + +cleanup: + VIR_FREE(sysfs_path); + + return ret; +} + +int +qemuDomainHostdevNetConfigReplace(virDomainHostdevDefPtr hostdev, + const unsigned char *uuid, + char *stateDir) +{ + char *linkdev = NULL; + virNetDevVPortProfilePtr virtPortProfile; + int ret = -1; + int vf = -1; + int vlanid = -1; + + if (qemuDomainHostdevNetDevice(hostdev, &linkdev, &vf) < 0) + return ret; + + virtPortProfile = virDomainNetGetActualVirtPortProfile( + hostdev->parent.data.net); + if (virtPortProfile) + ret = virNetDevVPortProfileAssociate(NULL, virtPortProfile, + hostdev->parent.data.net->mac, + linkdev, vf, uuid, + VIR_NETDEV_VPORT_PROFILE_OP_CREATE); + else + /* Set only mac */ + ret = virNetDevReplaceNetConfig(linkdev, vf, + hostdev->parent.data.net->mac, vlanid, + stateDir); + + VIR_FREE(linkdev); + + return ret; +} + +int +qemuDomainHostdevNetConfigRestore(virDomainHostdevDefPtr hostdev, + char *stateDir) +{ + char *linkdev = NULL; + virNetDevVPortProfilePtr virtPortProfile; + int ret = -1; + int vf = -1; + + if (qemuDomainHostdevNetDevice(hostdev, &linkdev, &vf) < 0) + return ret; + + virtPortProfile = virDomainNetGetActualVirtPortProfile( + hostdev->parent.data.net); + if (virtPortProfile) + ret = virNetDevVPortProfileDisassociate(NULL, virtPortProfile, + hostdev->parent.data.net->mac, + linkdev, vf, + VIR_NETDEV_VPORT_PROFILE_OP_DESTROY); + else + ret = virNetDevRestoreNetConfig(linkdev, vf, stateDir); + + VIR_FREE(linkdev); + + return ret; +} + int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, const char *name, + const unsigned char *uuid, virDomainHostdevDefPtr *hostdevs, int nhostdevs) { pciDeviceList *pcidevs; + int last_processed_hostdev = -1, last_processed_hostdev_vf = -1; int i; int ret = -1; @@ -208,7 +328,25 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, } } - /* Loop 2: detach managed devices */ + /* Loop 2: For Non SRIOV network devices, set the network + * config before we detach the device */ + for (i = 0; i < nhostdevs; i++) { + virDomainHostdevDefPtr hostdev = hostdevs[i]; + + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) + continue; + if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) + continue; + + if (hostdev->parent.data.net && + qemuDomainHostdevIsVirtualFunction(hostdev) == 0) + if (qemuDomainHostdevNetConfigReplace(hostdev, uuid, + driver->stateDir) < 0) + goto resetnetconfig; + last_processed_hostdev = i; + } + + /* Loop 3: detach managed devices */ for (i = 0; i < pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); if (pciDeviceGetManaged(dev) && @@ -216,7 +354,7 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, goto reattachdevs; } - /* Loop 3: Now that all the PCI hostdevs have been detached, we + /* Loop 4: Now that all the PCI hostdevs have been detached, we * can safely reset them */ for (i = 0; i < pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); @@ -225,7 +363,23 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, goto reattachdevs; } - /* Loop 4: Now mark all the devices as active */ + /* Loop 5: For SRIOV network devices, Now that we have detached the + * the network device, set the netdev config */ + for (i = 0; i < nhostdevs; i++) { + virDomainHostdevDefPtr hostdev = hostdevs[i]; + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) + continue; + if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) + continue; + if (hostdev->parent.data.net && + qemuDomainHostdevIsVirtualFunction(hostdev) == 1) + if (qemuDomainHostdevNetConfigReplace(hostdev, uuid, + driver->stateDir) < 0) + goto resetvfnetconfig; + last_processed_hostdev_vf = i; + } + + /* Loop 6: Now mark all the devices as active */ for (i = 0; i < pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); if (pciDeviceListAdd(driver->activePciHostdevs, dev) < 0) { @@ -234,13 +388,13 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, } } - /* Loop 5: Now remove the devices from inactive list. */ + /* Loop 7: Now remove the devices from inactive list. */ for (i = 0; i < pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); pciDeviceListDel(driver->inactivePciHostdevs, dev); } - /* Loop 6: Now set the used_by_domain of the device in + /* Loop 8: Now set the used_by_domain of the device in * driver->activePciHostdevs as domain name. */ for (i = 0; i < pciDeviceListCount(pcidevs); i++) { @@ -252,7 +406,7 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, pciDeviceSetUsedBy(activeDev, name); } - /* Loop 7: Now set the original states for hostdev def */ + /* Loop 9: Now set the original states for hostdev def */ for (i = 0; i < nhostdevs; i++) { pciDevice *dev; pciDevice *pcidev; @@ -270,7 +424,7 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, /* original states "unbind_from_stub", "remove_slot", * "reprobe" were already set by pciDettachDevice in - * loop 2. + * loop 3. */ if ((pcidev = pciDeviceListFind(pcidevs, dev))) { hostdev->origstates.states.pci.unbind_from_stub = @@ -302,12 +456,28 @@ inactivedevs: pciDeviceListSteal(driver->activePciHostdevs, dev); } +resetvfnetconfig: + for (i = 0; i < last_processed_hostdev_vf; i++) { + virDomainHostdevDefPtr hostdev = hostdevs[i]; + if (hostdev->parent.data.net && + qemuDomainHostdevIsVirtualFunction(hostdev) == 1) + qemuDomainHostdevNetConfigRestore(hostdev, driver->stateDir); + } + reattachdevs: for (i = 0; i < pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); pciReAttachDevice(dev, driver->activePciHostdevs, NULL); } +resetnetconfig: + for (i = 0; i < last_processed_hostdev; i++) { + virDomainHostdevDefPtr hostdev = hostdevs[i]; + if (hostdev->parent.data.net && + qemuDomainHostdevIsVirtualFunction(hostdev) == 0) + qemuDomainHostdevNetConfigRestore(hostdev, driver->stateDir); + } + cleanup: pciDeviceListFree(pcidevs); return ret; @@ -317,10 +487,10 @@ static int qemuPrepareHostPCIDevices(struct qemud_driver *driver, virDomainDefPtr def) { - return qemuPrepareHostdevPCIDevices(driver, def->name, def->hostdevs, def->nhostdevs); + return qemuPrepareHostdevPCIDevices(driver, def->name, def->uuid, + def->hostdevs, def->nhostdevs); } - int qemuPrepareHostdevUSBDevices(struct qemud_driver *driver, const char *name, @@ -494,8 +664,10 @@ void qemuDomainReAttachHostdevDevices(struct qemud_driver *driver, return; } - /* Again 3 loops; mark all devices as inactive before reset - * them and reset all the devices before re-attach */ + /* Again 5 loops; mark all devices as inactive before reset + * them and reset all the devices before re-attach. + * Attach mac and port profile parameters to devices + */ for (i = 0; i < pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); pciDevice *activeDev = NULL; @@ -514,6 +686,21 @@ void qemuDomainReAttachHostdevDevices(struct qemud_driver *driver, pciDeviceListSteal(driver->activePciHostdevs, dev); } + /* + * For SRIOV net host devices, unset mac and port profile before + * reset and reattach device + */ + for (i = 0; i < nhostdevs; i++) { + virDomainHostdevDefPtr hostdev = hostdevs[i]; + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) + continue; + if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) + continue; + if (hostdev->parent.data.net && + qemuDomainHostdevIsVirtualFunction(hostdev) == 1) + qemuDomainHostdevNetConfigRestore(hostdev, driver->stateDir); + } + for (i = 0; i < pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); if (pciResetDevice(dev, driver->activePciHostdevs, @@ -530,6 +717,23 @@ void qemuDomainReAttachHostdevDevices(struct qemud_driver *driver, qemuReattachPciDevice(dev, driver); } + /* + * For non-SRIOV net hostdevices, unset mac and port profile after + * reset and reattach device + */ + for (i = 0; i < nhostdevs; i++) { + virDomainHostdevDefPtr hostdev = hostdevs[i]; + + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) + continue; + if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) + continue; + + if (hostdev->parent.data.net && + qemuDomainHostdevIsVirtualFunction(hostdev) == 0) + qemuDomainHostdevNetConfigRestore(hostdev, driver->stateDir); + } + pciDeviceListFree(pcidevs); } @@ -540,5 +744,6 @@ void qemuDomainReAttachHostDevices(struct qemud_driver *driver, if (!def->nhostdevs) return; - qemuDomainReAttachHostdevDevices(driver, def->name, def->hostdevs, def->nhostdevs); + qemuDomainReAttachHostdevDevices(driver, def->name, def->hostdevs, + def->nhostdevs); } diff --git a/src/qemu/qemu_hostdev.h b/src/qemu/qemu_hostdev.h index d852f5b..173e4f4 100644 --- a/src/qemu/qemu_hostdev.h +++ b/src/qemu/qemu_hostdev.h @@ -31,6 +31,7 @@ int qemuUpdateActivePciHostdevs(struct qemud_driver *driver, virDomainDefPtr def); int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, const char *name, + const unsigned char *uuid, virDomainHostdevDefPtr *hostdevs, int nhostdevs); int qemuPrepareHostdevUSBDevices(struct qemud_driver *driver, @@ -46,6 +47,11 @@ void qemuDomainReAttachHostdevDevices(struct qemud_driver *driver, int nhostdevs); void qemuDomainReAttachHostDevices(struct qemud_driver *driver, virDomainDefPtr def); - +int qemuDomainHostdevIsVirtualFunction(virDomainHostdevDefPtr hostdev); +int qemuDomainHostdevNetConfigReplace(virDomainHostdevDefPtr hostdev, + const unsigned char *uuid, + char *stateDir); +int qemuDomainHostdevNetConfigRestore(virDomainHostdevDefPtr hostdev, + char *stateDir); #endif /* __QEMU_HOSTDEV_H__ */ diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 50563c5..d9e9b70 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -927,7 +927,8 @@ int qemuDomainAttachHostPciDevice(struct qemud_driver *driver, return -1; } - if (qemuPrepareHostdevPCIDevices(driver, vm->def->name, &hostdev, 1) < 0) + if (qemuPrepareHostdevPCIDevices(driver, vm->def->name, vm->def->uuid, + &hostdev, 1) < 0) return -1; if (qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE)) { @@ -1886,6 +1887,14 @@ qemuDomainDetachHostPciDevice(struct qemud_driver *driver, if (ret < 0) return -1; + /* + * For SRIOV net host devices, unset mac and port profile before + * reset and reattach device + */ + if (detach->parent.data.net && + qemuDomainHostdevIsVirtualFunction(detach) == 1) + qemuDomainHostdevNetConfigRestore(detach, driver->stateDir); + pci = pciGetDevice(subsys->u.pci.domain, subsys->u.pci.bus, subsys->u.pci.slot, subsys->u.pci.function); if (pci) { @@ -1901,6 +1910,14 @@ qemuDomainDetachHostPciDevice(struct qemud_driver *driver, ret = -1; } + /* + * For non-SRIOV net host devices, restore mac and port profile before + * reset and reattach device + */ + if (detach->parent.data.net && + qemuDomainHostdevIsVirtualFunction(detach) == 0) + qemuDomainHostdevNetConfigRestore(detach, driver->stateDir); + if (qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE) && qemuDomainPCIAddressReleaseSlot(priv->pciaddrs, detach->info->addr.pci.slot) < 0)

On 03/01/2012 04:02 AM, Roopa Prabhu wrote:
This patch series is based on laines patches to support <interface type='hostdev'>. https://www.redhat.com/archives/libvir-list/2012-February/msg01126.html
It support to set mac and port profile on an interface of type hostdev. * If virtualport is specified, the existing virtual port functions are called to set mac, vlan and port profile.
I'm unable to test that part, as I don't have any 802.1QbX capable switches (and it sounds like the design is problematic anyway.)
* If virtualport is not specified and device is a sriov virtual function, - mac is set using IFLA_VF_MAC
Success!! I tried this for VFs that have a netdev driver attached, and VFs that don't, and it behaved properly in both cases - when the guest is started, the MAC address is set properly for the guest to use, and when the guest is stopped, the MAC address of that VF is restored to its original value (implying that your code to save the old MAC address works properly).
* If virtualport is not specified and device is a non-sriov virtual function, - mac is set using existing SIOCGIFHWADDR (This requires that the netdev be present on the host before starting the VM)
This one has a problem, at least with my non-sriov hardware (which happens to be the onboard NetXtreme device of a Thinkstation, using the tg3 driver) it appears the MAC address gets reset to its original setting at some point after libvirt changes it. To help understand what happens - assume the device's original MAC address is o:o:o:o:o:o, and my xml looks like this: <interface type='hostdev' managed='yes'> <mac address='n:n:n:n:n:n'/> ... </interface> When the guest boots up, ifconfig shows there is an interface with mac address o:o:o:o:o:o. Additionally, if I manually change the mac address to p:p:p:p:p:p on the host before starting the guest, when the guest boots, ifconfig shows the mac address as... o:o:o:o:o:o. So, whether or not libvirt is successfully setting the mac address, it's getting reset (probably by the card's firmware). So perhaps this is another case of wanting to do something that just isn't possible, and the way out is to simply generate an error on domain startup if the netdev being passed through isn't a VF?
This series implements the below : 01/4 pci: Add two new pci util pciDeviceGetVirtualFunctionInfo and pciConfigAddressToSysfsFile 02/4 virtnetdev: Add support functions for mac and portprofile associations on a hostdev 03/4 virnetdevvportprofile: Changes to support portprofiles for hostdevs 04/4 qemu_hostdev: Add support to install port profile and mac address on hostdev
Stefan Berger is CC'ed for 802.1Qbg changes in patch 03/4. Current code for 802.1Qbg uses macvtap ifname. And for network interfaces with type=hostdev a macvtap ifname does not exist. This patch just adds a null check for ifname in 802.1Qbg port profile handling code.

On 3/2/12 11:04 AM, "Laine Stump" <laine@laine.org> wrote:
On 03/01/2012 04:02 AM, Roopa Prabhu wrote:
This patch series is based on laines patches to support <interface type='hostdev'>. https://www.redhat.com/archives/libvir-list/2012-February/msg01126.html
It support to set mac and port profile on an interface of type hostdev. * If virtualport is specified, the existing virtual port functions are called to set mac, vlan and port profile.
I'm unable to test that part, as I don't have any 802.1QbX capable switches (and it sounds like the design is problematic anyway.)
The design is still fine for 802.1Qbh. I have tested it. 802.1Qbh does not need a macvtap device. For 802.1Qbg in v2 (was planning on pushing it the next hr), I have put a check for 802.1Qbg and hostdevs and fail as suggested by stefan.
* If virtualport is not specified and device is a sriov virtual function, - mac is set using IFLA_VF_MAC
Success!! I tried this for VFs that have a netdev driver attached, and VFs that don't, and it behaved properly in both cases - when the guest is started, the MAC address is set properly for the guest to use, and when the guest is stopped, the MAC address of that VF is restored to its original value (implying that your code to save the old MAC address works properly).
Nice. Thanks for trying it out.
* If virtualport is not specified and device is a non-sriov virtual function, - mac is set using existing SIOCGIFHWADDR (This requires that the netdev be present on the host before starting the VM)
This one has a problem, at least with my non-sriov hardware (which happens to be the onboard NetXtreme device of a Thinkstation, using the tg3 driver) it appears the MAC address gets reset to its original setting at some point after libvirt changes it. To help understand what happens - assume the device's original MAC address is o:o:o:o:o:o, and my xml looks like this:
<interface type='hostdev' managed='yes'> <mac address='n:n:n:n:n:n'/> ... </interface>
When the guest boots up, ifconfig shows there is an interface with mac address o:o:o:o:o:o.
Additionally, if I manually change the mac address to p:p:p:p:p:p on the host before starting the guest, when the guest boots, ifconfig shows the mac address as... o:o:o:o:o:o. So, whether or not libvirt is successfully setting the mac address, it's getting reset (probably by the card's firmware).
Yes this I kind of expected. It depends on the driver. I thought there are some drivers that remember the mac address set by SIOCGIFHWADDR. But my assumption was totally based on the fact that we wanted to add support for all devices. Having the code there just means we try to set the device mac. It takes effect only if the hw supports it.
So perhaps this is another case of wanting to do something that just isn't possible, and the way out is to simply generate an error on domain startup if the netdev being passed through isn't a VF?
We can do this too. Only support it for sriov vf's. Thanks, Roopa
This series implements the below : 01/4 pci: Add two new pci util pciDeviceGetVirtualFunctionInfo and pciConfigAddressToSysfsFile 02/4 virtnetdev: Add support functions for mac and portprofile associations on a hostdev 03/4 virnetdevvportprofile: Changes to support portprofiles for hostdevs 04/4 qemu_hostdev: Add support to install port profile and mac address on hostdev
Stefan Berger is CC'ed for 802.1Qbg changes in patch 03/4. Current code for 802.1Qbg uses macvtap ifname. And for network interfaces with type=hostdev a macvtap ifname does not exist. This patch just adds a null check for ifname in 802.1Qbg port profile handling code.

On 03/02/2012 03:03 PM, Roopa Prabhu wrote:
On 3/2/12 11:04 AM, "Laine Stump" <laine@laine.org> wrote:
On 03/01/2012 04:02 AM, Roopa Prabhu wrote:
This patch series is based on laines patches to support <interface type='hostdev'>. https://www.redhat.com/archives/libvir-list/2012-February/msg01126.html
It support to set mac and port profile on an interface of type hostdev. * If virtualport is specified, the existing virtual port functions are called to set mac, vlan and port profile. I'm unable to test that part, as I don't have any 802.1QbX capable switches (and it sounds like the design is problematic anyway.)
The design is still fine for 802.1Qbh.
Yes, my apologies for misinterpreting all the exchanges.
I have tested it. 802.1Qbh does not need a macvtap device. For 802.1Qbg in v2 (was planning on pushing it the next hr),
I'll try to review it as quickly as possible.
I have put a check for 802.1Qbg and hostdevs and fail as suggested by stefan.
I'm quickly learning that I understood much less about 802.1QbX (and in particular, how it's been implemented) than I thought! (Fortunately, as a result of all this, I think I now understand it a bit better).
* If virtualport is not specified and device is a sriov virtual function, - mac is set using IFLA_VF_MAC Success!! I tried this for VFs that have a netdev driver attached, and VFs that don't, and it behaved properly in both cases - when the guest is started, the MAC address is set properly for the guest to use, and when the guest is stopped, the MAC address of that VF is restored to its original value (implying that your code to save the old MAC address works properly).
Nice. Thanks for trying it out.
* If virtualport is not specified and device is a non-sriov virtual function, - mac is set using existing SIOCGIFHWADDR (This requires that the netdev be present on the host before starting the VM) This one has a problem, at least with my non-sriov hardware (which happens to be the onboard NetXtreme device of a Thinkstation, using the tg3 driver) it appears the MAC address gets reset to its original setting at some point after libvirt changes it. To help understand what happens - assume the device's original MAC address is o:o:o:o:o:o, and my xml looks like this:
<interface type='hostdev' managed='yes'> <mac address='n:n:n:n:n:n'/> ... </interface>
When the guest boots up, ifconfig shows there is an interface with mac address o:o:o:o:o:o.
Additionally, if I manually change the mac address to p:p:p:p:p:p on the host before starting the guest, when the guest boots, ifconfig shows the mac address as... o:o:o:o:o:o. So, whether or not libvirt is successfully setting the mac address, it's getting reset (probably by the card's firmware). Yes this I kind of expected. It depends on the driver. I thought there are some drivers that remember the mac address set by SIOCGIFHWADDR. But my assumption was totally based on the fact that we wanted to add support for all devices. Having the code there just means we try to set the device mac. It takes effect only if the hw supports it.
If there was just a way to determine that at runtime, but it seems that by the time the MAC address has been reset, we are no longer able to call the ioctl to check the address :-(
So perhaps this is another case of wanting to do something that just isn't possible, and the way out is to simply generate an error on domain startup if the netdev being passed through isn't a VF?
We can do this too. Only support it for sriov vf's.
Yes, if it's going to silently do the wrong thing, maybe we should leave the SIOCGIFHWADDR code in there for reference, but also add a failure condition if the card isn't SRIOV. (I guess this means my effort to make sure USB ethernets were also supported was kind of pointless, since they're sure to have the same problems :-P Mostly I only included that support to promote code sharing and consistency, though, so I'm not really disappointed.)

On 3/2/12 12:54 PM, "Laine Stump" <laine@laine.org> wrote:
On 03/02/2012 03:03 PM, Roopa Prabhu wrote:
On 3/2/12 11:04 AM, "Laine Stump" <laine@laine.org> wrote:
On 03/01/2012 04:02 AM, Roopa Prabhu wrote:
This patch series is based on laines patches to support <interface type='hostdev'>. https://www.redhat.com/archives/libvir-list/2012-February/msg01126.html
It support to set mac and port profile on an interface of type hostdev. * If virtualport is specified, the existing virtual port functions are called to set mac, vlan and port profile. I'm unable to test that part, as I don't have any 802.1QbX capable switches (and it sounds like the design is problematic anyway.)
The design is still fine for 802.1Qbh.
Yes, my apologies for misinterpreting all the exchanges.
No problem atall.
I have tested it. 802.1Qbh does not need a macvtap device. For 802.1Qbg in v2 (was planning on pushing it the next hr),
I'll try to review it as quickly as possible.
I have put a check for 802.1Qbg and hostdevs and fail as suggested by stefan.
I'm quickly learning that I understood much less about 802.1QbX (and in particular, how it's been implemented) than I thought! (Fortunately, as a result of all this, I think I now understand it a bit better).
And I am understanding 802.1Qbg a bit better now :)
* If virtualport is not specified and device is a sriov virtual function, - mac is set using IFLA_VF_MAC Success!! I tried this for VFs that have a netdev driver attached, and VFs that don't, and it behaved properly in both cases - when the guest is started, the MAC address is set properly for the guest to use, and when the guest is stopped, the MAC address of that VF is restored to its original value (implying that your code to save the old MAC address works properly).
Nice. Thanks for trying it out.
* If virtualport is not specified and device is a non-sriov virtual function, - mac is set using existing SIOCGIFHWADDR (This requires that the netdev be present on the host before starting the VM) This one has a problem, at least with my non-sriov hardware (which happens to be the onboard NetXtreme device of a Thinkstation, using the tg3 driver) it appears the MAC address gets reset to its original setting at some point after libvirt changes it. To help understand what happens - assume the device's original MAC address is o:o:o:o:o:o, and my xml looks like this:
<interface type='hostdev' managed='yes'> <mac address='n:n:n:n:n:n'/> ... </interface>
When the guest boots up, ifconfig shows there is an interface with mac address o:o:o:o:o:o.
Additionally, if I manually change the mac address to p:p:p:p:p:p on the host before starting the guest, when the guest boots, ifconfig shows the mac address as... o:o:o:o:o:o. So, whether or not libvirt is successfully setting the mac address, it's getting reset (probably by the card's firmware). Yes this I kind of expected. It depends on the driver. I thought there are some drivers that remember the mac address set by SIOCGIFHWADDR. But my assumption was totally based on the fact that we wanted to add support for all devices. Having the code there just means we try to set the device mac. It takes effect only if the hw supports it.
If there was just a way to determine that at runtime, but it seems that by the time the MAC address has been reset, we are no longer able to call the ioctl to check the address :-(
So perhaps this is another case of wanting to do something that just isn't possible, and the way out is to simply generate an error on domain startup if the netdev being passed through isn't a VF?
We can do this too. Only support it for sriov vf's.
Yes, if it's going to silently do the wrong thing, maybe we should leave the SIOCGIFHWADDR code in there for reference, but also add a failure condition if the card isn't SRIOV.
Ok. Heres what I will do (If I understand you correctly): - I will call the mac/portprofile set functions for sriov devices only. - Throw an error for non-sriov devices - Keep the mac set code for non-sriov devices around for reference
(I guess this means my effort to make sure USB ethernets were also supported was kind of pointless, since they're sure to have the same problems :-P Mostly I only included that support to promote code sharing and consistency, though, so I'm not really disappointed.)
:) Thanks Laine.
participants (4)
-
Gerhard Stenzel
-
Laine Stump
-
Roopa Prabhu
-
Stefan Berger