[libvirt] [RFC PATCH 00/28] Enable multifunction pci hotplug

Hi All, I have revisited/rewritten my previously posted patches. Here is the RFC. Since this patchset is a complete rewrite, I am starting with v1 here. The semantics is as discussed before https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html As I went on to refactor the code to support multifunction virtio devices, I realised the abort/cleanup path would be a nightmare there, in case of failures. So, dropped that attempt. The current RFC limits to the real practical use cases of Multifunction PCI hostdevices. All new test code to support multifunction PCI hostdevices and test cases are added to prove the functionality. So, to summarise ============= Patch 1 - is a bug fix Patch 2-6 - Adds all PCI/VFIO/Multifunction/multiple devices per IOMMU group support to our mock test environment. Patches till here, are kind of basic and independent but necessary for the remaining patches. ============= Patch 7-14 - Detect and auto-address PCI multifunction devices. ============= Patch 15-25 - Refactor/Prepare for hotplug/unplug Patch 26-28 - Finally implement Hotplug/Unplug Thanks, Shivaprasad --- Shivaprasad G Bhat (28): Fix the iommu group path in mock pci util: move the hostdev passthrough support functions to utility tests: pci: Mock the iommu groups and vfio virpcitest: Change the stub driver to vfio from pci-stub virpcimock: Mock the SRIOV Virtual functions tests: qemu: Add test case for pci-hostdev hotplug tests: Add a baseline test for multifunction pci device use case util: virpci: detect if the device is a multifunction device from sysfs tests: qemu: mock pci environment for qemuargv2xmltests virhostdev: Introduce virHostdevPCIDevicesBelongToSameSlot qemu: address: Separate the slots into multiple aggregates qemu: address: Enable auto addressing multifunction cards util: make virHostdevIsVirtualFunction() public conf: qemu: validate multifunction hostdevice domain configs conf: Add helper to get active functions of a slot of domain qemu: hostdev: Move the hostdev preparation to a separate function qemu: hotplug: Move the detach of PCI device to the beginnging of live hotplug qemu: hotplug: move assignment outside qemuDomainAttachHostPCIDevice Introduce virDomainDeviceDefParseXMLMany Introduce qemuDomainDeviceParseXMLMany qemu: refactor qemuDomain[Attach|Detach]DeviceConfig qemu: refactor qemuDomain[Attach|Detach]DeviceLive qemu: hotplug: Queue and wait for multiple devices domain: addr: Introduce virDomainPCIAddressEnsureMultifunctionAddress qemu: hotplug: Implement multifunction device hotplug qemu: hotplug : Prevent updates to mulitfunction device qemu: hotplug: Move out the Single function check qemu: hotplug: Implement multifunction device unplug src/conf/device_conf.h | 7 src/conf/domain_addr.c | 127 ++++++- src/conf/domain_addr.h | 41 +- src/conf/domain_conf.c | 194 +++++++++- src/conf/domain_conf.h | 39 ++ src/libvirt_private.syms | 14 + src/node_device/node_device_udev.c | 2 src/qemu/qemu_capabilities.c | 5 src/qemu/qemu_domain.c | 72 ++++ src/qemu/qemu_domain.h | 19 + src/qemu/qemu_domain_address.c | 375 ++++++++++++++++++- src/qemu/qemu_domain_address.h | 15 + src/qemu/qemu_driver.c | 197 +++++++--- src/qemu/qemu_hostdev.c | 70 ---- src/qemu/qemu_hostdev.h | 3 src/qemu/qemu_hotplug.c | 389 ++++++++++++++++---- src/qemu/qemu_hotplug.h | 14 + src/util/virhostdev.c | 96 +++++ src/util/virhostdev.h | 11 + src/util/virpci.c | 22 + src/util/virpci.h | 8 src/util/virprocess.h | 2 tests/Makefile.am | 7 tests/qemuargv2xmldata/hostdev-pci-address.args | 2 tests/qemuargv2xmldata/hostdev-pci-address.xml | 2 tests/qemuargv2xmltest.c | 18 + tests/qemuhotplugtest.c | 98 ++++- .../qemuhotplug-hostdev-pci.xml | 6 .../qemuhotplug-multifunction-hostdev-pci-2.xml | 14 + .../qemuhotplug-multifunction-hostdev-pci.xml | 20 + .../qemuhotplug-base-live+hostdev-pci.xml | 60 +++ ...hotplug-base-live+multifunction-hostdev-pci.xml | 76 ++++ .../qemuhotplug-pseries-base-live+hostdev-pci.xml | 53 +++ ...eries-base-live+multifunction-hostdev-pci-2.xml | 61 +++ ...pseries-base-live+multifunction-hostdev-pci.xml | 69 ++++ .../qemuhotplug-pseries-base-live.xml | 45 ++ .../hostdev-pci-address-device.args | 2 .../hostdev-pci-address-device.xml | 2 tests/qemuxml2argvdata/hostdev-pci-address.args | 2 tests/qemuxml2argvdata/hostdev-pci-address.xml | 2 .../hostdev-pci-multifunction.args | 31 ++ .../qemuxml2argvdata/hostdev-pci-multifunction.xml | 59 +++ .../hostdev-pci-no-primary-function.xml | 23 + tests/qemuxml2argvdata/hostdev-pci-validate.args | 25 + tests/qemuxml2argvdata/hostdev-pci-validate.xml | 29 + .../qemuxml2argvdata/hostdev-vfio-multidomain.args | 2 .../qemuxml2argvdata/hostdev-vfio-multidomain.xml | 2 tests/qemuxml2argvdata/hostdev-vfio.args | 2 tests/qemuxml2argvdata/hostdev-vfio.xml | 2 tests/qemuxml2argvdata/net-hostdev-fail.xml | 2 .../qemuxml2argvdata/net-hostdev-multidomain.args | 2 tests/qemuxml2argvdata/net-hostdev-multidomain.xml | 2 tests/qemuxml2argvdata/net-hostdev-vfio.args | 2 tests/qemuxml2argvdata/net-hostdev-vfio.xml | 2 tests/qemuxml2argvdata/net-hostdev.args | 2 tests/qemuxml2argvdata/net-hostdev.xml | 2 tests/qemuxml2argvdata/pci-rom.args | 4 tests/qemuxml2argvdata/pci-rom.xml | 4 tests/qemuxml2argvdata/pseries-hostdevs-1.args | 5 tests/qemuxml2argvdata/pseries-hostdevs-3.args | 5 tests/qemuxml2argvtest.c | 17 + tests/qemuxml2xmloutdata/hostdev-pci-address.xml | 2 .../hostdev-pci-multifunction.xml | 79 ++++ tests/qemuxml2xmloutdata/hostdev-vfio.xml | 2 tests/qemuxml2xmloutdata/net-hostdev-vfio.xml | 2 tests/qemuxml2xmloutdata/net-hostdev.xml | 2 tests/qemuxml2xmloutdata/pci-rom.xml | 4 tests/qemuxml2xmloutdata/pseries-hostdevs-1.xml | 4 tests/qemuxml2xmloutdata/pseries-hostdevs-3.xml | 4 tests/qemuxml2xmltest.c | 1 tests/virhostdevtest.c | 39 -- tests/virpcimock.c | 199 +++++++++- tests/virpcitest.c | 12 - tests/virpcitestdata/0000-06-12.0.config | Bin tests/virpcitestdata/0000-06-12.1.config | Bin tests/virpcitestdata/0000-06-12.2.config | Bin tests/virpcitestdata/0005-90-01.1.config | Bin tests/virpcitestdata/0005-90-01.2.config | Bin tests/virpcitestdata/0005-90-01.3.config | Bin tests/virprocessmock.c | 28 + 80 files changed, 2463 insertions(+), 400 deletions(-) create mode 100644 tests/qemuhotplugtestdevices/qemuhotplug-hostdev-pci.xml create mode 100644 tests/qemuhotplugtestdevices/qemuhotplug-multifunction-hostdev-pci-2.xml create mode 100644 tests/qemuhotplugtestdevices/qemuhotplug-multifunction-hostdev-pci.xml create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-base-live+hostdev-pci.xml create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-base-live+multifunction-hostdev-pci.xml create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+hostdev-pci.xml create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+multifunction-hostdev-pci-2.xml create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+multifunction-hostdev-pci.xml create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live.xml create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction.args create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction.xml create mode 100644 tests/qemuxml2argvdata/hostdev-pci-no-primary-function.xml create mode 100644 tests/qemuxml2argvdata/hostdev-pci-validate.args create mode 100644 tests/qemuxml2argvdata/hostdev-pci-validate.xml create mode 100644 tests/qemuxml2xmloutdata/hostdev-pci-multifunction.xml create mode 100644 tests/virpcitestdata/0000-06-12.0.config create mode 100644 tests/virpcitestdata/0000-06-12.1.config create mode 100644 tests/virpcitestdata/0000-06-12.2.config create mode 100644 tests/virpcitestdata/0005-90-01.3.config create mode 100644 tests/virprocessmock.c -- Signature

The mocked path falls into /sys/bus/kernel/iommu_groups instead of /sys/kernel/iommu_groups. Needed for adding some new test cases. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- tests/virpcimock.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 176c64d654..2a7e9216b2 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -404,7 +404,7 @@ pci_device_new_from_stub(const struct pciDevice *data) make_file(devpath, "class", tmp, -1); if (snprintf(tmp, sizeof(tmp), - "%s/../../../kernel/iommu_groups/%d", + "%s/../../../../kernel/iommu_groups/%d", devpath, dev->iommuGroup) < 0) { ABORT("@tmp overflow"); } @@ -412,7 +412,7 @@ pci_device_new_from_stub(const struct pciDevice *data) ABORT("Unable to create %s", tmp); if (snprintf(tmp, sizeof(tmp), - "../../../kernel/iommu_groups/%d", dev->iommuGroup) < 0) { + "../../../../kernel/iommu_groups/%d", dev->iommuGroup) < 0) { ABORT("@tmp overflow"); } make_symlink(devpath, "iommu_group", tmp);

There is some duplicity of code here. Move them to common place. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/libvirt_private.syms | 2 + src/qemu/qemu_capabilities.c | 5 +-- src/qemu/qemu_driver.c | 5 +-- src/qemu/qemu_hostdev.c | 70 ++---------------------------------------- src/qemu/qemu_hostdev.h | 3 -- src/util/virhostdev.c | 63 ++++++++++++++++++++++++++++++++++++++ src/util/virhostdev.h | 3 ++ tests/virhostdevtest.c | 31 ------------------- 8 files changed, 75 insertions(+), 107 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index c67bce7389..e1bdaa127e 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1915,6 +1915,8 @@ virHostCPUStatsAssign; # util/virhostdev.h virHostdevFindUSBDevice; +virHostdevHostSupportsPassthroughKVM; +virHostdevHostSupportsPassthroughVFIO; virHostdevIsSCSIDevice; virHostdevManagerGetDefault; virHostdevPCINodeDeviceDetach; diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 3eb5ed6d1a..943e92a3d3 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -42,7 +42,6 @@ #include "virhostcpu.h" #include "qemu_monitor.h" #include "virstring.h" -#include "qemu_hostdev.h" #include "qemu_domain.h" #define __QEMU_CAPSPRIV_H_ALLOW__ #include "qemu_capspriv.h" @@ -5725,8 +5724,8 @@ static int virQEMUCapsFillDomainDeviceHostdevCaps(virQEMUCapsPtr qemuCaps, virDomainCapsDeviceHostdevPtr hostdev) { - bool supportsPassthroughKVM = qemuHostdevHostSupportsPassthroughLegacy(); - bool supportsPassthroughVFIO = qemuHostdevHostSupportsPassthroughVFIO(); + bool supportsPassthroughKVM = virHostdevHostSupportsPassthroughKVM(); + bool supportsPassthroughVFIO = virHostdevHostSupportsPassthroughVFIO(); hostdev->supported = true; /* VIR_DOMAIN_HOSTDEV_MODE_CAPABILITIES is for containers only */ diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 8c872c1f08..0ade86d6a9 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -52,7 +52,6 @@ #include "qemu_command.h" #include "qemu_parse_command.h" #include "qemu_cgroup.h" -#include "qemu_hostdev.h" #include "qemu_hotplug.h" #include "qemu_monitor.h" #include "qemu_process.h" @@ -12963,8 +12962,8 @@ qemuNodeDeviceDetachFlags(virNodeDevicePtr dev, int ret = -1; virNodeDeviceDefPtr def = NULL; char *xml = NULL; - bool legacy = qemuHostdevHostSupportsPassthroughLegacy(); - bool vfio = qemuHostdevHostSupportsPassthroughVFIO(); + bool legacy = virHostdevHostSupportsPassthroughKVM(); + bool vfio = virHostdevHostSupportsPassthroughVFIO(); virHostdevManagerPtr hostdev_mgr = driver->hostdevMgr; virCheckFlags(0, -1); diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index 73d26f4c63..c110cf7816 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -121,78 +121,14 @@ qemuHostdevUpdateActiveDomainDevices(virQEMUDriverPtr driver, return 0; } -bool -qemuHostdevHostSupportsPassthroughVFIO(void) -{ - DIR *iommuDir = NULL; - struct dirent *iommuGroup = NULL; - bool ret = false; - int direrr; - - /* condition 1 - /sys/kernel/iommu_groups/ contains entries */ - if (virDirOpenQuiet(&iommuDir, "/sys/kernel/iommu_groups/") < 0) - goto cleanup; - - while ((direrr = virDirRead(iommuDir, &iommuGroup, NULL)) > 0) { - /* assume we found a group */ - break; - } - - if (direrr < 0 || !iommuGroup) - goto cleanup; - /* okay, iommu is on and recognizes groups */ - - /* condition 2 - /dev/vfio/vfio exists */ - if (!virFileExists("/dev/vfio/vfio")) - goto cleanup; - - ret = true; - - cleanup: - VIR_DIR_CLOSE(iommuDir); - return ret; -} - - -#if HAVE_LINUX_KVM_H -# include <linux/kvm.h> -bool -qemuHostdevHostSupportsPassthroughLegacy(void) -{ - int kvmfd = -1; - bool ret = false; - - if ((kvmfd = open("/dev/kvm", O_RDONLY)) < 0) - goto cleanup; - -# ifdef KVM_CAP_IOMMU - if ((ioctl(kvmfd, KVM_CHECK_EXTENSION, KVM_CAP_IOMMU)) <= 0) - goto cleanup; - - ret = true; -# endif - - cleanup: - VIR_FORCE_CLOSE(kvmfd); - - return ret; -} -#else -bool -qemuHostdevHostSupportsPassthroughLegacy(void) -{ - return false; -} -#endif - static bool qemuHostdevPreparePCIDevicesCheckSupport(virDomainHostdevDefPtr *hostdevs, size_t nhostdevs, virQEMUCapsPtr qemuCaps) { - bool supportsPassthroughKVM = qemuHostdevHostSupportsPassthroughLegacy(); - bool supportsPassthroughVFIO = qemuHostdevHostSupportsPassthroughVFIO(); + bool supportsPassthroughKVM = virHostdevHostSupportsPassthroughKVM(); + bool supportsPassthroughVFIO = virHostdevHostSupportsPassthroughVFIO(); size_t i; /* assign defaults for hostdev passthrough */ @@ -330,7 +266,7 @@ qemuHostdevPrepareMediatedDevices(virQEMUDriverPtr driver, int nhostdevs) { virHostdevManagerPtr hostdev_mgr = driver->hostdevMgr; - bool supportsVFIO = qemuHostdevHostSupportsPassthroughVFIO(); + bool supportsVFIO = virHostdevHostSupportsPassthroughVFIO(); size_t i; for (i = 0; i < nhostdevs; i++) { diff --git a/src/qemu/qemu_hostdev.h b/src/qemu/qemu_hostdev.h index 9a7c7f143c..bdc51f9d9e 100644 --- a/src/qemu/qemu_hostdev.h +++ b/src/qemu/qemu_hostdev.h @@ -27,9 +27,6 @@ # include "qemu_conf.h" # include "domain_conf.h" -bool qemuHostdevHostSupportsPassthroughLegacy(void); -bool qemuHostdevHostSupportsPassthroughVFIO(void); - int qemuHostdevUpdateActiveMediatedDevices(virQEMUDriverPtr driver, virDomainDefPtr def); int qemuHostdevUpdateActivePCIDevices(virQEMUDriverPtr driver, diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index a12224c58f..e0133cdeec 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -2299,3 +2299,66 @@ virHostdevUpdateActiveDomainDevices(virHostdevManagerPtr mgr, return 0; } + +bool +virHostdevHostSupportsPassthroughVFIO(void) +{ + DIR *iommuDir = NULL; + struct dirent *iommuGroup = NULL; + bool ret = false; + int direrr; + + /* condition 1 - /sys/kernel/iommu_groups/ contains entries */ + if (virDirOpenQuiet(&iommuDir, "/sys/kernel/iommu_groups/") < 0) + goto cleanup; + + while ((direrr = virDirRead(iommuDir, &iommuGroup, NULL)) > 0) { + /* assume we found a group */ + break; + } + + if (direrr < 0 || !iommuGroup) + goto cleanup; + /* okay, iommu is on and recognizes groups */ + + /* condition 2 - /dev/vfio/vfio exists */ + if (!virFileExists("/dev/vfio/vfio")) + goto cleanup; + + ret = true; + + cleanup: + VIR_DIR_CLOSE(iommuDir); + return ret; +} + +#if HAVE_LINUX_KVM_H +# include <linux/kvm.h> +bool +virHostdevHostSupportsPassthroughKVM(void) +{ + int kvmfd = -1; + bool ret = false; + + if ((kvmfd = open("/dev/kvm", O_RDONLY)) < 0) + goto cleanup; + +# ifdef KVM_CAP_IOMMU + if ((ioctl(kvmfd, KVM_CHECK_EXTENSION, KVM_CAP_IOMMU)) <= 0) + goto cleanup; + + ret = true; +# endif + + cleanup: + VIR_FORCE_CLOSE(kvmfd); + + return ret; +} +#else +bool +virHostdevHostSupportsPassthroughKVM(void) +{ + return false; +} +#endif diff --git a/src/util/virhostdev.h b/src/util/virhostdev.h index 54e1c66be3..d5efffbac2 100644 --- a/src/util/virhostdev.h +++ b/src/util/virhostdev.h @@ -203,4 +203,7 @@ int virHostdevPCINodeDeviceReset(virHostdevManagerPtr mgr, virPCIDevicePtr pci) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +bool virHostdevHostSupportsPassthroughKVM(void); +bool virHostdevHostSupportsPassthroughVFIO(void); + #endif /* __VIR_HOSTDEV_H__ */ diff --git a/tests/virhostdevtest.c b/tests/virhostdevtest.c index 5b03cb6aee..e4fee567a2 100644 --- a/tests/virhostdevtest.c +++ b/tests/virhostdevtest.c @@ -130,37 +130,6 @@ myInit(void) return -1; } -# if HAVE_LINUX_KVM_H -# include <linux/kvm.h> -static bool -virHostdevHostSupportsPassthroughKVM(void) -{ - int kvmfd = -1; - bool ret = false; - - if ((kvmfd = open("/dev/kvm", O_RDONLY)) < 0) - goto cleanup; - -# ifdef KVM_CAP_IOMMU - if ((ioctl(kvmfd, KVM_CHECK_EXTENSION, KVM_CAP_IOMMU)) <= 0) - goto cleanup; - - ret = true; -# endif - - cleanup: - VIR_FORCE_CLOSE(kvmfd); - - return ret; -} -# else -static bool -virHostdevHostSupportsPassthroughKVM(void) -{ - return false; -} -# endif - static int testVirHostdevPreparePCIHostdevs_unmanaged(void) {

The iommu group, /dev/vfio/<iommuGroupNumber> behaviours of the host are mocked. This patch implments support for multifunction/multiple devices per iommu groups and emulates the /dev/vfio/<iommuGroupNumber> file correctly. This code helps adding necessary testcases for pci-hotplug code. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- tests/virpcimock.c | 178 +++++++++++++++++++++++++++--- tests/virpcitestdata/0005-90-01.1.config | Bin tests/virpcitestdata/0005-90-01.2.config | Bin tests/virpcitestdata/0005-90-01.3.config | Bin 4 files changed, 162 insertions(+), 16 deletions(-) create mode 100644 tests/virpcitestdata/0005-90-01.3.config diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 2a7e9216b2..22adc740b8 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -51,7 +51,13 @@ static DIR * (*real_opendir)(const char *name); char *fakerootdir; char *fakesysfspcidir; +struct pciIommuGroup { + int iommu; + size_t nDevicesBoundToVFIO; /* Indicates the devices in the group */ +}; + # define SYSFS_PCI_PREFIX "/sys/bus/pci/" +# define SYSFS_KERNEL_PREFIX "/sys/kernel/" # define STDERR(...) \ fprintf(stderr, "%s %zu: ", __FUNCTION__, (size_t) __LINE__); \ @@ -142,6 +148,9 @@ size_t nPCIDevices = 0; struct pciDriver **pciDrivers = NULL; size_t nPCIDrivers = 0; +struct pciIommuGroup **pciIommuGroups = NULL; +size_t npciIommuGroups = 0; + struct fdCallback *callbacks = NULL; size_t nCallbacks = 0; @@ -191,7 +200,7 @@ make_file(const char *path, VIR_FREE(filepath); } -static void +static void ATTRIBUTE_UNUSED make_symlink(const char *path, const char *name, const char *target) @@ -255,6 +264,13 @@ getrealpath(char **newpath, errno = ENOMEM; return -1; } + } else if (STRPREFIX(path, SYSFS_KERNEL_PREFIX)) { + if (virAsprintfQuiet(newpath, "%s/%s", + fakerootdir, + path) < 0) { + errno = ENOMEM; + return -1; + } } else { if (VIR_STRDUP_QUIET(*newpath, path) < 0) return -1; @@ -465,6 +481,101 @@ pci_device_autobind(struct pciDevice *dev) return pci_driver_bind(driver, dev); } +static void +pci_iommu_new(int num) +{ + char *iommupath, *kerneldir; + struct pciIommuGroup *iommuGroup; + + if (VIR_ALLOC_QUIET(iommuGroup) < 0) + ABORT_OOM(); + + iommuGroup->iommu = num; + iommuGroup->nDevicesBoundToVFIO = 0; /* No device bound to vfio by default */ + + if (virAsprintfQuiet(&kerneldir, "%s%s", + fakerootdir, SYSFS_KERNEL_PREFIX) < 0) + ABORT_OOM(); + + if (virAsprintfQuiet(&iommupath, "%s/iommu_groups/%d/devices", kerneldir, num) < 0) + ABORT_OOM(); + VIR_FREE(kerneldir); + + if (virFileMakePath(iommupath) < 0) + ABORT("Unable to create: %s", iommupath); + VIR_FREE(iommupath); + + if (VIR_APPEND_ELEMENT_QUIET(pciIommuGroups, npciIommuGroups, iommuGroup) < 0) + ABORT_OOM(); +} + +static int +pci_vfio_release_iommu(struct pciDevice *device) +{ + char *vfiopath = NULL; + int ret = -1; + size_t i = 0; + + for (i = 0; i < npciIommuGroups; i++) { + if (pciIommuGroups[i]->iommu == device->iommuGroup) + break; + } + + if (i != npciIommuGroups) { + if (pciIommuGroups[i]->nDevicesBoundToVFIO == 0) { + ret = 0; + goto cleanup; + } + pciIommuGroups[i]->nDevicesBoundToVFIO--; + if (!pciIommuGroups[i]->nDevicesBoundToVFIO) { + if (virAsprintfQuiet(&vfiopath, "%s/dev/vfio/%d", + fakesysfspcidir, device->iommuGroup) < 0) { + errno = ENOMEM; + goto cleanup; + } + if (unlink(vfiopath) < 0) + goto cleanup; + } + } + + ret = 0; + cleanup: + VIR_FREE(vfiopath); + return ret; +} + +static int +pci_vfio_lock_iommu(struct pciDevice *device) +{ + char *vfiopath = NULL; + int ret = -1; + size_t i = 0; + int fd = -1; + + for (i = 0; i < npciIommuGroups; i++) { + if (pciIommuGroups[i]->iommu == device->iommuGroup) + break; + } + + if (i != npciIommuGroups) { + if (!pciIommuGroups[i]->nDevicesBoundToVFIO) { + if (virAsprintfQuiet(&vfiopath, "%s/dev/vfio/%d", + fakesysfspcidir, device->iommuGroup) < 0) { + errno = ENOMEM; + goto cleanup; + } + if ((fd = real_open(vfiopath, O_CREAT)) < 0) + goto cleanup; + } + pciIommuGroups[i]->nDevicesBoundToVFIO++; + } + + ret = 0; + cleanup: + real_close(fd); + VIR_FREE(vfiopath); + return ret; +} /* * PCI Driver functions @@ -588,6 +699,10 @@ pci_driver_bind(struct pciDriver *driver, if (symlink(devpath, driverpath) < 0) goto cleanup; + if (STREQ(driver->name, "vfio-pci")) + if (pci_vfio_lock_iommu(dev) < 0) + goto cleanup; + dev->driver = driver; ret = 0; cleanup: @@ -622,6 +737,10 @@ pci_driver_unbind(struct pciDriver *driver, unlink(driverpath) < 0) goto cleanup; + if (STREQ(driver->name, "vfio-pci")) + if (pci_vfio_release_iommu(dev) < 0) + goto cleanup; + dev->driver = NULL; ret = 0; cleanup: @@ -819,6 +938,8 @@ init_syms(void) static void init_env(void) { + char *devvfio; + if (fakerootdir && fakesysfspcidir) return; @@ -833,6 +954,28 @@ init_env(void) ABORT("Unable to create: %s", fakesysfspcidir); make_file(fakesysfspcidir, "drivers_probe", NULL, -1); + if (virAsprintfQuiet(&devvfio, "%s/dev/vfio", fakesysfspcidir) < 0) + ABORT_OOM(); + + if (virFileMakePath(devvfio) < 0) + ABORT("Unable to create: %s", devvfio); + + /* Create /dev/vfio/vfio file */ + make_file(devvfio, "vfio", NULL, -1); + VIR_FREE(devvfio); + + pci_iommu_new(0); + pci_iommu_new(1); + pci_iommu_new(2); + pci_iommu_new(3); + pci_iommu_new(4); + pci_iommu_new(5); + pci_iommu_new(6); + pci_iommu_new(7); + pci_iommu_new(8); + pci_iommu_new(9); + pci_iommu_new(10); + pci_iommu_new(11); # define MAKE_PCI_DRIVER(name, ...) \ pci_driver_new(name, 0, __VA_ARGS__, -1, -1) @@ -842,6 +985,7 @@ init_env(void) MAKE_PCI_DRIVER("pci-stub", -1, -1); pci_driver_new("vfio-pci", PCI_ACTION_BIND, -1, -1); + # define MAKE_PCI_DEVICE(Id, Vendor, Device, ...) \ do { \ struct pciDevice dev = {.id = (char *)Id, .vendor = Vendor, \ @@ -849,20 +993,21 @@ init_env(void) pci_device_new_from_stub(&dev); \ } while (0) - MAKE_PCI_DEVICE("0000:00:00.0", 0x8086, 0x0044); - MAKE_PCI_DEVICE("0000:00:01.0", 0x8086, 0x0044); - MAKE_PCI_DEVICE("0000:00:02.0", 0x8086, 0x0046); - MAKE_PCI_DEVICE("0000:00:03.0", 0x8086, 0x0048); - MAKE_PCI_DEVICE("0001:00:00.0", 0x1014, 0x03b9, .class = 0x060400); - MAKE_PCI_DEVICE("0001:01:00.0", 0x8086, 0x105e, .iommuGroup = 0); - MAKE_PCI_DEVICE("0001:01:00.1", 0x8086, 0x105e, .iommuGroup = 0); - MAKE_PCI_DEVICE("0005:80:00.0", 0x10b5, 0x8112, .class = 0x060400); - MAKE_PCI_DEVICE("0005:90:01.0", 0x1033, 0x0035, .iommuGroup = 1); - MAKE_PCI_DEVICE("0005:90:01.1", 0x1033, 0x0035, .iommuGroup = 1); - MAKE_PCI_DEVICE("0005:90:01.2", 0x1033, 0x00e0, .iommuGroup = 1); - MAKE_PCI_DEVICE("0000:0a:01.0", 0x8086, 0x0047); - MAKE_PCI_DEVICE("0000:0a:02.0", 0x8286, 0x0048); - MAKE_PCI_DEVICE("0000:0a:03.0", 0x8386, 0x0048); + MAKE_PCI_DEVICE("0000:00:00.0", 0x8086, 0x0044, .iommuGroup = 0); + MAKE_PCI_DEVICE("0000:00:01.0", 0x8086, 0x0044, .iommuGroup = 1); + MAKE_PCI_DEVICE("0000:00:02.0", 0x8086, 0x0046, .iommuGroup = 2); + MAKE_PCI_DEVICE("0000:00:03.0", 0x8086, 0x0048, .iommuGroup = 3); + MAKE_PCI_DEVICE("0001:00:00.0", 0x1014, 0x03b9, .class = 0x060400, .iommuGroup = 4); + MAKE_PCI_DEVICE("0001:01:00.0", 0x8086, 0x105e, .iommuGroup = 5); + MAKE_PCI_DEVICE("0001:01:00.1", 0x8086, 0x105e, .iommuGroup = 5); + MAKE_PCI_DEVICE("0005:80:00.0", 0x10b5, 0x8112, .class = 0x060400, .iommuGroup = 6); + MAKE_PCI_DEVICE("0005:90:01.0", 0x1033, 0x0035, .iommuGroup = 7); + MAKE_PCI_DEVICE("0005:90:01.1", 0x1033, 0x0035, .iommuGroup = 7); + MAKE_PCI_DEVICE("0005:90:01.2", 0x1033, 0x00e0, .iommuGroup = 7); + MAKE_PCI_DEVICE("0005:90:01.3", 0x1033, 0x00e0, .iommuGroup = 7); + MAKE_PCI_DEVICE("0000:0a:01.0", 0x8086, 0x0047, .iommuGroup = 8); + MAKE_PCI_DEVICE("0000:0a:02.0", 0x8286, 0x0048, .iommuGroup = 8); + MAKE_PCI_DEVICE("0000:0a:03.0", 0x8386, 0x0048, .iommuGroup = 8); } @@ -1029,7 +1174,8 @@ opendir(const char *path) init_syms(); - if (STRPREFIX(path, SYSFS_PCI_PREFIX) && + if ((STRPREFIX(path, SYSFS_PCI_PREFIX) || + STRPREFIX(path, SYSFS_KERNEL_PREFIX)) && getrealpath(&newpath, path) < 0) return NULL; diff --git a/tests/virpcitestdata/0005-90-01.1.config b/tests/virpcitestdata/0005-90-01.1.config index beee76534041a7020c08ae9ac03d9a349c6ea12e..a60599bd342d3ebcdc7b8367ca36ad337f602fde 100644 GIT binary patch delta 44 ycmZo*YG4vE7BFRCV-R3+7GUOK;Amg~f~JXq5)*X<85t+qEn;Cc-jFjfPzC^<7YL>R delta 39 ucmZo*YG4vE7BFRCV-R3+7GUOK;9y{25MXGU7$`AON05<eqTQm22?_viD+cla diff --git a/tests/virpcitestdata/0005-90-01.2.config b/tests/virpcitestdata/0005-90-01.2.config index cfd72e4d7165bff2751ecbdc570ce7f1e0646226..a60599bd342d3ebcdc7b8367ca36ad337f602fde 100644 GIT binary patch literal 256 zcmXpOFlAt45MXi^VCG@qXkY+>CJ=!Q7z5RUfCHEW5{!&mj0{Y5Fz#TaS&cX3;ByxM DCDjDB literal 256 zcmXpOc)-BMAi%_;z|6zo!oa|wz|aIFu>xbDS`csmlR$!5K#7rosSd`)Mk^@TV-u#E T7_0Gy9FS#<5E~CbC<F-rZiWW} diff --git a/tests/virpcitestdata/0005-90-01.3.config b/tests/virpcitestdata/0005-90-01.3.config new file mode 100644 index 0000000000000000000000000000000000000000..a60599bd342d3ebcdc7b8367ca36ad337f602fde GIT binary patch literal 256 zcmXpOFlAt45MXi^VCG@qXkY+>CJ=!Q7z5RUfCHEW5{!&mj0{Y5Fz#TaS&cX3;ByxM DCDjDB literal 0 HcmV?d00001

The pci-stub is obsolete for a while now. Upcoming test cases try to test the VFIO hotplug/unplug cases. Change the default test driver to vfio-pci instead of pci-stub, and fail bind for pci-stub instead. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- tests/virhostdevtest.c | 8 ++++---- tests/virpcimock.c | 5 ++--- tests/virpcitest.c | 12 ++++++------ 3 files changed, 12 insertions(+), 13 deletions(-) diff --git a/tests/virhostdevtest.c b/tests/virhostdevtest.c index e4fee567a2..41062ebf3c 100644 --- a/tests/virhostdevtest.c +++ b/tests/virhostdevtest.c @@ -97,7 +97,7 @@ myInit(void) subsys.u.pci.addr.bus = 0; subsys.u.pci.addr.slot = i + 1; subsys.u.pci.addr.function = 0; - subsys.u.pci.backend = VIR_DOMAIN_HOSTDEV_PCI_BACKEND_KVM; + subsys.u.pci.backend = VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO; hostdevs[i]->source.subsys = subsys; } @@ -105,7 +105,7 @@ myInit(void) if (!(dev[i] = virPCIDeviceNew(0, 0, i + 1, 0))) goto cleanup; - virPCIDeviceSetStubDriver(dev[i], VIR_PCI_STUB_DRIVER_KVM); + virPCIDeviceSetStubDriver(dev[i], VIR_PCI_STUB_DRIVER_VFIO); } if (VIR_ALLOC(mgr) < 0) @@ -485,7 +485,7 @@ testVirHostdevRoundtripManaged(const void *opaque ATTRIBUTE_UNUSED) { int ret = -1; - if (virHostdevHostSupportsPassthroughKVM()) { + if (virHostdevHostSupportsPassthroughVFIO()) { if (testVirHostdevPreparePCIHostdevs_managed(false) < 0) goto out; if (testVirHostdevReAttachPCIHostdevs_managed(false) < 0) @@ -517,7 +517,7 @@ testVirHostdevRoundtripMixed(const void *opaque ATTRIBUTE_UNUSED) if (testVirHostdevDetachPCINodeDevice() < 0) goto out; - if (virHostdevHostSupportsPassthroughKVM()) { + if (virHostdevHostSupportsPassthroughVFIO()) { if (testVirHostdevPreparePCIHostdevs_managed(true) < 0) goto out; if (testVirHostdevReAttachPCIHostdevs_managed(true) < 0) diff --git a/tests/virpcimock.c b/tests/virpcimock.c index 22adc740b8..c9b7434c9a 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -982,9 +982,8 @@ init_env(void) MAKE_PCI_DRIVER("iwlwifi", 0x8086, 0x0044); MAKE_PCI_DRIVER("i915", 0x8086, 0x0046, 0x8086, 0x0047); - MAKE_PCI_DRIVER("pci-stub", -1, -1); - pci_driver_new("vfio-pci", PCI_ACTION_BIND, -1, -1); - + pci_driver_new("pci-stub", PCI_ACTION_BIND, -1, -1); + MAKE_PCI_DRIVER("vfio-pci", -1, -1); # define MAKE_PCI_DEVICE(Id, Vendor, Device, ...) \ do { \ diff --git a/tests/virpcitest.c b/tests/virpcitest.c index 80e994ae52..fa7acb85a2 100644 --- a/tests/virpcitest.c +++ b/tests/virpcitest.c @@ -110,12 +110,12 @@ testVirPCIDeviceDetach(const void *opaque ATTRIBUTE_UNUSED) if (!(dev[i] = virPCIDeviceNew(0, 0, i + 1, 0))) goto cleanup; - virPCIDeviceSetStubDriver(dev[i], VIR_PCI_STUB_DRIVER_KVM); + virPCIDeviceSetStubDriver(dev[i], VIR_PCI_STUB_DRIVER_VFIO); if (virPCIDeviceDetach(dev[i], activeDevs, inactiveDevs) < 0) goto cleanup; - if (testVirPCIDeviceCheckDriver(dev[i], "pci-stub") < 0) + if (testVirPCIDeviceCheckDriver(dev[i], "vfio-pci") < 0) goto cleanup; CHECK_LIST_COUNT(activeDevs, 0); @@ -249,7 +249,7 @@ testVirPCIDeviceDetachSingle(const void *opaque) if (!dev) goto cleanup; - virPCIDeviceSetStubDriver(dev, VIR_PCI_STUB_DRIVER_KVM); + virPCIDeviceSetStubDriver(dev, VIR_PCI_STUB_DRIVER_VFIO); if (virPCIDeviceDetach(dev, NULL, NULL) < 0) goto cleanup; @@ -271,7 +271,7 @@ testVirPCIDeviceDetachFail(const void *opaque) if (!dev) goto cleanup; - virPCIDeviceSetStubDriver(dev, VIR_PCI_STUB_DRIVER_VFIO); + virPCIDeviceSetStubDriver(dev, VIR_PCI_STUB_DRIVER_KVM); if (virPCIDeviceDetach(dev, NULL, NULL) < 0) { if (virTestGetVerbose() || virTestGetDebug()) @@ -282,7 +282,7 @@ testVirPCIDeviceDetachFail(const void *opaque) virReportError(VIR_ERR_INTERNAL_ERROR, "Attaching device %s to %s should have failed", virPCIDeviceGetName(dev), - virPCIStubDriverTypeToString(VIR_PCI_STUB_DRIVER_VFIO)); + virPCIStubDriverTypeToString(VIR_PCI_STUB_DRIVER_KVM)); } cleanup: @@ -441,7 +441,7 @@ mymain(void) /* Detach an unbound device */ DO_TEST_PCI_DRIVER(0, 0x0a, 2, 0, NULL); DO_TEST_PCI(testVirPCIDeviceDetachSingle, 0, 0x0a, 2, 0); - DO_TEST_PCI_DRIVER(0, 0x0a, 2, 0, "pci-stub"); + DO_TEST_PCI_DRIVER(0, 0x0a, 2, 0, "vfio-pci"); /* Reattach an unknown unbound device */ DO_TEST_PCI_DRIVER(0, 0x0a, 3, 0, NULL);

The softlink to physfn is the way to know if the device is VF or not. So, the patch softlinks 'physfn' to the parent function. The multifunction PCI devices dont have 'physfn' softlinks. The patch adds few Virtual functions to the mock environment and changes the existing test xmls using the VFs to use the newly added VFs for their use case. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- .../hostdev-pci-address-device.args | 2 +- .../hostdev-pci-address-device.xml | 2 +- tests/qemuxml2argvdata/hostdev-pci-address.args | 2 +- tests/qemuxml2argvdata/hostdev-pci-address.xml | 2 +- .../qemuxml2argvdata/hostdev-vfio-multidomain.args | 2 +- .../qemuxml2argvdata/hostdev-vfio-multidomain.xml | 2 +- tests/qemuxml2argvdata/hostdev-vfio.args | 2 +- tests/qemuxml2argvdata/hostdev-vfio.xml | 2 +- tests/qemuxml2argvdata/net-hostdev-fail.xml | 2 +- .../qemuxml2argvdata/net-hostdev-multidomain.args | 2 +- tests/qemuxml2argvdata/net-hostdev-multidomain.xml | 2 +- tests/qemuxml2argvdata/net-hostdev-vfio.args | 2 +- tests/qemuxml2argvdata/net-hostdev-vfio.xml | 2 +- tests/qemuxml2argvdata/net-hostdev.args | 2 +- tests/qemuxml2argvdata/net-hostdev.xml | 2 +- tests/qemuxml2argvdata/pci-rom.args | 4 ++-- tests/qemuxml2argvdata/pci-rom.xml | 4 ++-- tests/qemuxml2xmloutdata/hostdev-pci-address.xml | 2 +- tests/qemuxml2xmloutdata/hostdev-vfio.xml | 2 +- tests/qemuxml2xmloutdata/net-hostdev-vfio.xml | 2 +- tests/qemuxml2xmloutdata/net-hostdev.xml | 2 +- tests/qemuxml2xmloutdata/pci-rom.xml | 4 ++-- tests/virpcimock.c | 14 ++++++++++++++ tests/virpcitestdata/0000-06-12.0.config | Bin tests/virpcitestdata/0000-06-12.1.config | Bin tests/virpcitestdata/0000-06-12.2.config | Bin 26 files changed, 39 insertions(+), 25 deletions(-) create mode 100644 tests/virpcitestdata/0000-06-12.0.config create mode 100644 tests/virpcitestdata/0000-06-12.1.config create mode 100644 tests/virpcitestdata/0000-06-12.2.config diff --git a/tests/qemuxml2argvdata/hostdev-pci-address-device.args b/tests/qemuxml2argvdata/hostdev-pci-address-device.args index a250082e1e..93221b6e9d 100644 --- a/tests/qemuxml2argvdata/hostdev-pci-address-device.args +++ b/tests/qemuxml2argvdata/hostdev-pci-address-device.args @@ -22,5 +22,5 @@ server,nowait \ -usb \ -drive file=/dev/HostVG/QEMUGuest2,format=raw,if=none,id=drive-ide0-0-0 \ -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ --device pci-assign,host=06:12.5,id=hostdev0,bus=pci.0,addr=0x3 \ +-device pci-assign,host=06:12.1,id=hostdev0,bus=pci.0,addr=0x3 \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 diff --git a/tests/qemuxml2argvdata/hostdev-pci-address-device.xml b/tests/qemuxml2argvdata/hostdev-pci-address-device.xml index 814fdcc271..492031d5a9 100644 --- a/tests/qemuxml2argvdata/hostdev-pci-address-device.xml +++ b/tests/qemuxml2argvdata/hostdev-pci-address-device.xml @@ -20,7 +20,7 @@ </disk> <hostdev mode='subsystem' type='pci' managed='yes'> <source> - <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> </source> </hostdev> <memballoon model='virtio'/> diff --git a/tests/qemuxml2argvdata/hostdev-pci-address.args b/tests/qemuxml2argvdata/hostdev-pci-address.args index 1f3a2443bf..728712a1c1 100644 --- a/tests/qemuxml2argvdata/hostdev-pci-address.args +++ b/tests/qemuxml2argvdata/hostdev-pci-address.args @@ -21,4 +21,4 @@ server,nowait \ -usb \ -drive file=/dev/HostVG/QEMUGuest2,format=raw,if=none,id=drive-ide0-0-0 \ -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ --device pci-assign,host=06:12.5,id=hostdev0,bus=pci.0,addr=0x3 +-device pci-assign,host=06:12.1,id=hostdev0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvdata/hostdev-pci-address.xml b/tests/qemuxml2argvdata/hostdev-pci-address.xml index 17968350ff..af7e300c57 100644 --- a/tests/qemuxml2argvdata/hostdev-pci-address.xml +++ b/tests/qemuxml2argvdata/hostdev-pci-address.xml @@ -27,7 +27,7 @@ <input type='keyboard' bus='ps2'/> <hostdev mode='subsystem' type='pci' managed='yes'> <source> - <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> </source> </hostdev> <memballoon model='none'/> diff --git a/tests/qemuxml2argvdata/hostdev-vfio-multidomain.args b/tests/qemuxml2argvdata/hostdev-vfio-multidomain.args index 492e9b35e0..87ea9e7e09 100644 --- a/tests/qemuxml2argvdata/hostdev-vfio-multidomain.args +++ b/tests/qemuxml2argvdata/hostdev-vfio-multidomain.args @@ -22,5 +22,5 @@ server,nowait \ -usb \ -drive file=/dev/HostVG/QEMUGuest2,format=raw,if=none,id=drive-ide0-0-0 \ -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ --device vfio-pci,host=55aa:20:0f.3,id=hostdev0,bus=pci.0,addr=0x3 \ +-device vfio-pci,host=0021:de:1f.1,id=hostdev0,bus=pci.0,addr=0x3 \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 diff --git a/tests/qemuxml2argvdata/hostdev-vfio-multidomain.xml b/tests/qemuxml2argvdata/hostdev-vfio-multidomain.xml index 832458125b..7c34b65c7f 100644 --- a/tests/qemuxml2argvdata/hostdev-vfio-multidomain.xml +++ b/tests/qemuxml2argvdata/hostdev-vfio-multidomain.xml @@ -25,7 +25,7 @@ <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> - <address domain='0x55aa' bus='32' slot='15' function='3'/> + <address domain='0x0021' bus='222' slot='31' function='1'/> </source> </hostdev> <memballoon model='virtio'/> diff --git a/tests/qemuxml2argvdata/hostdev-vfio.args b/tests/qemuxml2argvdata/hostdev-vfio.args index ba54d03e8e..d7cc2b7246 100644 --- a/tests/qemuxml2argvdata/hostdev-vfio.args +++ b/tests/qemuxml2argvdata/hostdev-vfio.args @@ -22,5 +22,5 @@ server,nowait \ -usb \ -drive file=/dev/HostVG/QEMUGuest2,format=raw,if=none,id=drive-ide0-0-0 \ -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ --device vfio-pci,host=06:12.5,id=hostdev0,bus=pci.0,addr=0x3 \ +-device vfio-pci,host=06:12.1,id=hostdev0,bus=pci.0,addr=0x3 \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 diff --git a/tests/qemuxml2argvdata/hostdev-vfio.xml b/tests/qemuxml2argvdata/hostdev-vfio.xml index 4d96b9f732..b2e6c62e7e 100644 --- a/tests/qemuxml2argvdata/hostdev-vfio.xml +++ b/tests/qemuxml2argvdata/hostdev-vfio.xml @@ -27,7 +27,7 @@ <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> - <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> </source> </hostdev> <memballoon model='virtio'/> diff --git a/tests/qemuxml2argvdata/net-hostdev-fail.xml b/tests/qemuxml2argvdata/net-hostdev-fail.xml index c815d68bd9..50b102c658 100644 --- a/tests/qemuxml2argvdata/net-hostdev-fail.xml +++ b/tests/qemuxml2argvdata/net-hostdev-fail.xml @@ -25,7 +25,7 @@ <interface type='hostdev' managed='yes'> <mac address='00:11:22:33:44:55'/> <source> - <address type='pci' domain='0x0000' bus='0x03' slot='0x07' function='0x1'/> + <address type='pci' domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> </source> <model type='virtio'/> <filterref filter='myfilter'/> diff --git a/tests/qemuxml2argvdata/net-hostdev-multidomain.args b/tests/qemuxml2argvdata/net-hostdev-multidomain.args index 7d27964715..58696a4a9c 100644 --- a/tests/qemuxml2argvdata/net-hostdev-multidomain.args +++ b/tests/qemuxml2argvdata/net-hostdev-multidomain.args @@ -22,5 +22,5 @@ server,nowait \ -usb \ -drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \ -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ --device pci-assign,host=2424:21:1c.6,id=hostdev0,bus=pci.0,addr=0x3 \ +-device pci-assign,host=0021:de:1f.1,id=hostdev0,bus=pci.0,addr=0x3 \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 diff --git a/tests/qemuxml2argvdata/net-hostdev-multidomain.xml b/tests/qemuxml2argvdata/net-hostdev-multidomain.xml index 27e5f0f724..71d053a3da 100644 --- a/tests/qemuxml2argvdata/net-hostdev-multidomain.xml +++ b/tests/qemuxml2argvdata/net-hostdev-multidomain.xml @@ -25,7 +25,7 @@ <interface type='hostdev' managed='yes'> <mac address='00:11:22:33:44:55'/> <source> - <address type='pci' domain='0x2424' bus='0x21' slot='0x1c' function='0x6'/> + <address type='pci' domain='0x0021' bus='0xde' slot='0x1f' function='0x1'/> </source> <vlan> <tag id='42'/> diff --git a/tests/qemuxml2argvdata/net-hostdev-vfio.args b/tests/qemuxml2argvdata/net-hostdev-vfio.args index 7247c6c1f2..6dfbe61689 100644 --- a/tests/qemuxml2argvdata/net-hostdev-vfio.args +++ b/tests/qemuxml2argvdata/net-hostdev-vfio.args @@ -22,5 +22,5 @@ server,nowait \ -usb \ -drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \ -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ --device vfio-pci,host=03:07.1,id=hostdev0,bus=pci.0,addr=0x3 \ +-device vfio-pci,host=06:12.1,id=hostdev0,bus=pci.0,addr=0x3 \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 diff --git a/tests/qemuxml2argvdata/net-hostdev-vfio.xml b/tests/qemuxml2argvdata/net-hostdev-vfio.xml index 24034cad8b..aff681c0fb 100644 --- a/tests/qemuxml2argvdata/net-hostdev-vfio.xml +++ b/tests/qemuxml2argvdata/net-hostdev-vfio.xml @@ -26,7 +26,7 @@ <mac address='00:11:22:33:44:55'/> <driver name='vfio'/> <source> - <address type='pci' domain='0x0000' bus='0x03' slot='0x07' function='0x1'/> + <address type='pci' domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> </source> <vlan> <tag id='42'/> diff --git a/tests/qemuxml2argvdata/net-hostdev.args b/tests/qemuxml2argvdata/net-hostdev.args index e137a0a5b1..7663821e53 100644 --- a/tests/qemuxml2argvdata/net-hostdev.args +++ b/tests/qemuxml2argvdata/net-hostdev.args @@ -22,5 +22,5 @@ server,nowait \ -usb \ -drive file=/dev/HostVG/QEMUGuest1,format=raw,if=none,id=drive-ide0-0-0 \ -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \ --device pci-assign,host=03:07.1,id=hostdev0,bus=pci.0,addr=0x3 \ +-device pci-assign,host=06:12.1,id=hostdev0,bus=pci.0,addr=0x3 \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 diff --git a/tests/qemuxml2argvdata/net-hostdev.xml b/tests/qemuxml2argvdata/net-hostdev.xml index 829d8b850b..9949e6ef30 100644 --- a/tests/qemuxml2argvdata/net-hostdev.xml +++ b/tests/qemuxml2argvdata/net-hostdev.xml @@ -25,7 +25,7 @@ <interface type='hostdev' managed='yes'> <mac address='00:11:22:33:44:55'/> <source> - <address type='pci' domain='0x0000' bus='0x03' slot='0x07' function='0x1'/> + <address type='pci' domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> </source> <vlan> <tag id='42'/> diff --git a/tests/qemuxml2argvdata/pci-rom.args b/tests/qemuxml2argvdata/pci-rom.args index b50581283e..9451d7613a 100644 --- a/tests/qemuxml2argvdata/pci-rom.args +++ b/tests/qemuxml2argvdata/pci-rom.args @@ -28,7 +28,7 @@ rombar=1 \ -device virtio-net-pci,vlan=1,id=net1,mac=52:54:00:24:a5:9e,bus=pci.0,addr=0x4,\ romfile=/etc/fake/bootrom.bin \ -net user,vlan=1,name=hostnet1 \ --device pci-assign,host=06:12.5,id=hostdev0,bus=pci.0,addr=0x5,rombar=0 \ --device pci-assign,host=06:12.6,id=hostdev1,bus=pci.0,addr=0x6,rombar=1,\ +-device pci-assign,host=06:12.1,id=hostdev0,bus=pci.0,addr=0x5,rombar=0 \ +-device pci-assign,host=06:12.2,id=hostdev1,bus=pci.0,addr=0x6,rombar=1,\ romfile=/etc/fake/bootrom.bin \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 diff --git a/tests/qemuxml2argvdata/pci-rom.xml b/tests/qemuxml2argvdata/pci-rom.xml index 09cafb35a9..048bcd3b59 100644 --- a/tests/qemuxml2argvdata/pci-rom.xml +++ b/tests/qemuxml2argvdata/pci-rom.xml @@ -36,13 +36,13 @@ <input type='keyboard' bus='ps2'/> <hostdev mode='subsystem' type='pci' managed='yes'> <source> - <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> </source> <rom bar='off'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <source> - <address domain='0x0000' bus='0x06' slot='0x12' function='0x6'/> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/> </source> <rom bar='on' file='/etc/fake/bootrom.bin'/> </hostdev> diff --git a/tests/qemuxml2xmloutdata/hostdev-pci-address.xml b/tests/qemuxml2xmloutdata/hostdev-pci-address.xml index 59cb86e1d5..1e80b9b387 100644 --- a/tests/qemuxml2xmloutdata/hostdev-pci-address.xml +++ b/tests/qemuxml2xmloutdata/hostdev-pci-address.xml @@ -31,7 +31,7 @@ <input type='keyboard' bus='ps2'/> <hostdev mode='subsystem' type='pci' managed='yes'> <source> - <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> </source> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </hostdev> diff --git a/tests/qemuxml2xmloutdata/hostdev-vfio.xml b/tests/qemuxml2xmloutdata/hostdev-vfio.xml index 77bd62a129..15a845f4d1 100644 --- a/tests/qemuxml2xmloutdata/hostdev-vfio.xml +++ b/tests/qemuxml2xmloutdata/hostdev-vfio.xml @@ -32,7 +32,7 @@ <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> - <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> </source> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </hostdev> diff --git a/tests/qemuxml2xmloutdata/net-hostdev-vfio.xml b/tests/qemuxml2xmloutdata/net-hostdev-vfio.xml index 0523cd8d3b..3f057a8249 100644 --- a/tests/qemuxml2xmloutdata/net-hostdev-vfio.xml +++ b/tests/qemuxml2xmloutdata/net-hostdev-vfio.xml @@ -31,7 +31,7 @@ <mac address='00:11:22:33:44:55'/> <driver name='vfio'/> <source> - <address type='pci' domain='0x0000' bus='0x03' slot='0x07' function='0x1'/> + <address type='pci' domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> </source> <vlan> <tag id='42'/> diff --git a/tests/qemuxml2xmloutdata/net-hostdev.xml b/tests/qemuxml2xmloutdata/net-hostdev.xml index bede4b034f..0141a0c5a3 100644 --- a/tests/qemuxml2xmloutdata/net-hostdev.xml +++ b/tests/qemuxml2xmloutdata/net-hostdev.xml @@ -30,7 +30,7 @@ <interface type='hostdev' managed='yes'> <mac address='00:11:22:33:44:55'/> <source> - <address type='pci' domain='0x0000' bus='0x03' slot='0x07' function='0x1'/> + <address type='pci' domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> </source> <vlan> <tag id='42'/> diff --git a/tests/qemuxml2xmloutdata/pci-rom.xml b/tests/qemuxml2xmloutdata/pci-rom.xml index 982231fefe..5b5a8b217b 100644 --- a/tests/qemuxml2xmloutdata/pci-rom.xml +++ b/tests/qemuxml2xmloutdata/pci-rom.xml @@ -43,14 +43,14 @@ <input type='keyboard' bus='ps2'/> <hostdev mode='subsystem' type='pci' managed='yes'> <source> - <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> </source> <rom bar='off'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <source> - <address domain='0x0000' bus='0x06' slot='0x12' function='0x6'/> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/> </source> <rom bar='on' file='/etc/fake/bootrom.bin'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> diff --git a/tests/virpcimock.c b/tests/virpcimock.c index c9b7434c9a..abe08a30ea 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -134,6 +134,7 @@ struct pciDevice { int device; int class; int iommuGroup; + const char *physfn; struct pciDriver *driver; /* Driver attached. NULL if attached to no driver */ }; @@ -433,6 +434,14 @@ pci_device_new_from_stub(const struct pciDevice *data) } make_symlink(devpath, "iommu_group", tmp); + if (dev->physfn) { + if (snprintf(tmp, sizeof(tmp), + "%s/devices/%s", fakesysfspcidir, dev->physfn) < 0) { + ABORT("@tmp overflow"); + } + make_symlink(devpath, "physfn", tmp); + } + if (pci_device_autobind(dev) < 0) ABORT("Unable to bind: %s", data->id); @@ -1007,6 +1016,11 @@ init_env(void) MAKE_PCI_DEVICE("0000:0a:01.0", 0x8086, 0x0047, .iommuGroup = 8); MAKE_PCI_DEVICE("0000:0a:02.0", 0x8286, 0x0048, .iommuGroup = 8); MAKE_PCI_DEVICE("0000:0a:03.0", 0x8386, 0x0048, .iommuGroup = 8); + MAKE_PCI_DEVICE("0000:06:12.0", 0x8086, 0x0047, .iommuGroup = 9); + MAKE_PCI_DEVICE("0000:06:12.1", 0x8086, 0x0047, .iommuGroup = 10, .physfn = "0000:06:12.0"); /* Virtual Function */ + MAKE_PCI_DEVICE("0000:06:12.2", 0x8086, 0x0047, .iommuGroup = 11, .physfn = "0000:06:12.0"); /* Virtual Function */ + MAKE_PCI_DEVICE("0021:de:1f.0", 0x8086, 0x0047, .iommuGroup = 12); + MAKE_PCI_DEVICE("0021:de:1f.1", 0x8086, 0x0047, .iommuGroup = 13, .physfn = "0021:de:1f.0"); /* Virtual Function */ } diff --git a/tests/virpcitestdata/0000-06-12.0.config b/tests/virpcitestdata/0000-06-12.0.config new file mode 100644 index 0000000000000000000000000000000000000000..beee76534041a7020c08ae9ac03d9a349c6ea12e GIT binary patch literal 256 ycmXpOFlAt45MXi^VCG@qU|?VnU}yr8Sb;H6EeJS(Ng%<*sKv;@R0rb@MH&E<&;s)S literal 0 HcmV?d00001 diff --git a/tests/virpcitestdata/0000-06-12.1.config b/tests/virpcitestdata/0000-06-12.1.config new file mode 100644 index 0000000000000000000000000000000000000000..beee76534041a7020c08ae9ac03d9a349c6ea12e GIT binary patch literal 256 ycmXpOFlAt45MXi^VCG@qU|?VnU}yr8Sb;H6EeJS(Ng%<*sKv;@R0rb@MH&E<&;s)S literal 0 HcmV?d00001 diff --git a/tests/virpcitestdata/0000-06-12.2.config b/tests/virpcitestdata/0000-06-12.2.config new file mode 100644 index 0000000000000000000000000000000000000000..beee76534041a7020c08ae9ac03d9a349c6ea12e GIT binary patch literal 256 ycmXpOFlAt45MXi^VCG@qU|?VnU}yr8Sb;H6EeJS(Ng%<*sKv;@R0rb@MH&E<&;s)S literal 0 HcmV?d00001

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/util/virprocess.h | 2 - tests/Makefile.am | 7 ++ tests/qemuhotplugtest.c | 42 ++++++++++++++ .../qemuhotplug-hostdev-pci.xml | 6 ++ .../qemuhotplug-base-live+hostdev-pci.xml | 60 ++++++++++++++++++++ .../qemuhotplug-pseries-base-live+hostdev-pci.xml | 53 ++++++++++++++++++ .../qemuhotplug-pseries-base-live.xml | 45 +++++++++++++++ tests/virprocessmock.c | 28 +++++++++ 8 files changed, 241 insertions(+), 2 deletions(-) create mode 100644 tests/qemuhotplugtestdevices/qemuhotplug-hostdev-pci.xml create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-base-live+hostdev-pci.xml create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+hostdev-pci.xml create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live.xml create mode 100644 tests/virprocessmock.c diff --git a/src/util/virprocess.h b/src/util/virprocess.h index 3c5a882772..5e68b47744 100644 --- a/src/util/virprocess.h +++ b/src/util/virprocess.h @@ -72,7 +72,7 @@ int virProcessGetNamespaces(pid_t pid, int virProcessSetNamespaces(size_t nfdlist, int *fdlist); -int virProcessSetMaxMemLock(pid_t pid, unsigned long long bytes); +int virProcessSetMaxMemLock(pid_t pid, unsigned long long bytes) ATTRIBUTE_NOINLINE; int virProcessSetMaxProcesses(pid_t pid, unsigned int procs); int virProcessSetMaxFiles(pid_t pid, unsigned int files); int virProcessSetMaxCoreSize(pid_t pid, unsigned long long bytes); diff --git a/tests/Makefile.am b/tests/Makefile.am index 15c8cc8158..9ff78ad382 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -213,6 +213,7 @@ test_libraries = libshunload.la \ virpcimock.la \ virnetdevmock.la \ virrandommock.la \ + virprocessmock.la \ virhostcpumock.la \ domaincapsmock.la \ virfilecachemock.la \ @@ -1149,6 +1150,12 @@ virrandommock_la_SOURCES = \ virrandommock_la_LDFLAGS = $(MOCKLIBS_LDFLAGS) virrandommock_la_LIBADD = $(MOCKLIBS_LIBS) +virprocessmock_la_SOURCES = \ + virprocessmock.c +virprocessmock_la_CFLAGS = $(AM_CFLAGS) +virprocessmock_la_LDFLAGS = $(MOCKLIBS_LDFLAGS) +virprocessmock_la_LIBADD = $(MOCKLIBS_LIBS) + virhostcpumock_la_SOURCES = \ virhostcpumock.c virhostcpumock_la_LDFLAGS = $(MOCKLIBS_LDFLAGS) diff --git a/tests/qemuhotplugtest.c b/tests/qemuhotplugtest.c index d42f8e12cb..31ce8d43b9 100644 --- a/tests/qemuhotplugtest.c +++ b/tests/qemuhotplugtest.c @@ -26,6 +26,7 @@ #include "qemumonitortestutils.h" #include "testutils.h" #include "testutilsqemu.h" +#include "virhostdev.h" #include "virerror.h" #include "virstring.h" #include "virthread.h" @@ -78,6 +79,8 @@ qemuHotplugCreateObjects(virDomainXMLOptionPtr xmlopt, virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_DEVICE_IVSHMEM_PLAIN); virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_DEVICE_IVSHMEM_DOORBELL); virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_SCSI_DISK_WWN); + virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_DEVICE_VFIO_PCI); + virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_DEVICE_SPAPR_PCI_HOST_BRIDGE); if (event) virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_DEVICE_DEL_EVENT); @@ -131,6 +134,9 @@ testQemuHotplugAttach(virDomainObjPtr vm, case VIR_DOMAIN_DEVICE_WATCHDOG: ret = qemuDomainAttachWatchdog(&driver, vm, dev->data.watchdog); break; + case VIR_DOMAIN_DEVICE_HOSTDEV: + ret = qemuDomainAttachHostDevice(&driver, vm, dev->data.hostdev); + break; default: VIR_TEST_VERBOSE("device type '%s' cannot be attached\n", virDomainDeviceTypeToString(dev->type)); @@ -159,6 +165,9 @@ testQemuHotplugDetach(virDomainObjPtr vm, case VIR_DOMAIN_DEVICE_WATCHDOG: ret = qemuDomainDetachWatchdog(&driver, vm, dev->data.watchdog); break; + case VIR_DOMAIN_DEVICE_HOSTDEV: + ret = qemuDomainDetachHostDevice(&driver, vm, dev); + break; default: VIR_TEST_VERBOSE("device type '%s' cannot be detached\n", virDomainDeviceTypeToString(dev->type)); @@ -579,6 +588,7 @@ testQemuHotplugCpuIndividual(const void *opaque) } +#define FAKEROOTDIRTEMPLATE abs_builddir "/fakerootdir-XXXXXX" static int mymain(void) @@ -586,6 +596,20 @@ mymain(void) int ret = 0; struct qemuHotplugTestData data = {0}; struct testQemuHotplugCpuParams cpudata; + char *fakerootdir; + + if (VIR_STRDUP_QUIET(fakerootdir, FAKEROOTDIRTEMPLATE) < 0) { + fprintf(stderr, "Out of memory\n"); + abort(); + } + + if (!mkdtemp(fakerootdir)) { + fprintf(stderr, "Cannot create fakerootdir"); + abort(); + } + + setenv("LIBVIRT_FAKE_ROOT_DIR", fakerootdir, 1); + unsetenv("LD_PRELOAD"); #if !WITH_YAJL fputs("libvirt not compiled with yajl, skipping this test\n", stderr); @@ -613,6 +637,8 @@ mymain(void) if (!driver.lockManager) return EXIT_FAILURE; + driver.hostdevMgr = virHostdevManagerGetDefault(); + /* wait only 100ms for DEVICE_DELETED event */ qemuDomainRemoveDeviceWaitTime = 100; @@ -821,6 +847,14 @@ mymain(void) "disk-scsi-duplicate-wwn", false, false, "human-monitor-command", HMP("OK\\r\\n"), "device_add", QMP_OK); + DO_TEST_ATTACH_EVENT("base-live", "hostdev-pci", false, true, + "device_add", QMP_OK); + DO_TEST_DETACH("base-live", "hostdev-pci", false, false, + "device_del", QMP_DEVICE_DELETED("hostdev0") QMP_OK); + DO_TEST_ATTACH_EVENT("pseries-base-live", "hostdev-pci", false, true, + "device_add", QMP_OK); + DO_TEST_DETACH("pseries-base-live", "hostdev-pci", false, false, + "device_del", QMP_DEVICE_DELETED("hostdev0") QMP_OK); DO_TEST_ATTACH("base-live", "watchdog", false, true, "watchdog-set-action", QMP_OK, @@ -873,9 +907,15 @@ mymain(void) DO_TEST_CPU_INDIVIDUAL("ppc64-modern-individual", "16-22", true, true, true); DO_TEST_CPU_INDIVIDUAL("ppc64-modern-individual", "17", true, true, true); + if (getenv("LIBVIRT_SKIP_CLEANUP") == NULL) + virFileDeleteTree(fakerootdir); + VIR_FREE(fakerootdir); + qemuTestDriverFree(&driver); virObjectUnref(data.vm); return (ret == 0) ? EXIT_SUCCESS : EXIT_FAILURE; } -VIR_TEST_MAIN(mymain) +VIR_TEST_MAIN_PRELOAD(mymain, + abs_builddir "/.libs/virpcimock.so", + abs_builddir "/.libs/virprocessmock.so"); diff --git a/tests/qemuhotplugtestdevices/qemuhotplug-hostdev-pci.xml b/tests/qemuhotplugtestdevices/qemuhotplug-hostdev-pci.xml new file mode 100644 index 0000000000..6f7c99c943 --- /dev/null +++ b/tests/qemuhotplugtestdevices/qemuhotplug-hostdev-pci.xml @@ -0,0 +1,6 @@ +<hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/> + </source> +</hostdev> diff --git a/tests/qemuhotplugtestdomains/qemuhotplug-base-live+hostdev-pci.xml b/tests/qemuhotplugtestdomains/qemuhotplug-base-live+hostdev-pci.xml new file mode 100644 index 0000000000..524fb59dc1 --- /dev/null +++ b/tests/qemuhotplugtestdomains/qemuhotplug-base-live+hostdev-pci.xml @@ -0,0 +1,60 @@ +<domain type='kvm' id='7'> + <name>hotplug</name> + <uuid>d091ea82-29e6-2e34-3005-f02617b36e87</uuid> + <memory unit='KiB'>4194304</memory> + <currentMemory unit='KiB'>4194304</currentMemory> + <vcpu placement='static'>4</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <features> + <acpi/> + <apic/> + <pae/> + </features> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>restart</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <controller type='usb' index='0'> + <alias name='usb'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='ide' index='0'> + <alias name='ide'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='scsi' index='0' model='virtio-scsi'> + <alias name='scsi0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </controller> + <controller type='pci' index='0' model='pci-root'> + <alias name='pci'/> + </controller> + <controller type='virtio-serial' index='0'> + <alias name='virtio-serial0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </controller> + <input type='mouse' bus='ps2'> + <alias name='input0'/> + </input> + <input type='keyboard' bus='ps2'> + <alias name='input1'/> + </input> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/> + </source> + <alias name='hostdev0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </hostdev> + <memballoon model='none'> + <alias name='balloon0'/> + </memballoon> + </devices> + <seclabel type='none' model='none'/> +</domain> diff --git a/tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+hostdev-pci.xml b/tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+hostdev-pci.xml new file mode 100644 index 0000000000..fb56aa38d0 --- /dev/null +++ b/tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+hostdev-pci.xml @@ -0,0 +1,53 @@ +<domain type='kvm' id='7'> + <name>hotplug</name> + <uuid>d091ea82-29e6-2e34-3005-f02617b36e87</uuid> + <memory unit='KiB'>4194304</memory> + <currentMemory unit='KiB'>4194304</currentMemory> + <vcpu placement='static'>4</vcpu> + <os> + <type arch='ppc64' machine='pseries'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-ppc64</emulator> + <controller type='pci' index='0' model='pci-root'> + <model name='spapr-pci-host-bridge'/> + <target index='0'/> + <alias name='pci.0'/> + </controller> + <controller type='pci' index='1' model='pci-root'> + <model name='spapr-pci-host-bridge'/> + <target index='1'/> + <alias name='pci.1'/> + </controller> + <controller type='usb' index='0'> + <alias name='usb'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> + </controller> + <input type='keyboard' bus='usb'> + <alias name='input0'/> + <address type='usb' bus='0' port='1'/> + </input> + <input type='mouse' bus='usb'> + <alias name='input1'/> + <address type='usb' bus='0' port='2'/> + </input> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/> + </source> + <alias name='hostdev0'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/> + </hostdev> + <memballoon model='none'> + <alias name='balloon0'/> + </memballoon> + <panic model='pseries'/> + </devices> + <seclabel type='none' model='none'/> +</domain> diff --git a/tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live.xml b/tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live.xml new file mode 100644 index 0000000000..896c8ae033 --- /dev/null +++ b/tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live.xml @@ -0,0 +1,45 @@ +<domain type='kvm' id='7'> + <name>hotplug</name> + <uuid>d091ea82-29e6-2e34-3005-f02617b36e87</uuid> + <memory unit='KiB'>4194304</memory> + <currentMemory unit='KiB'>4194304</currentMemory> + <vcpu placement='static'>4</vcpu> + <os> + <type arch='ppc64' machine='pseries'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-ppc64</emulator> + <controller type='pci' index='0' model='pci-root'> + <model name='spapr-pci-host-bridge'/> + <target index='0'/> + <alias name='pci.0'/> + </controller> + <controller type='pci' index='1' model='pci-root'> + <model name='spapr-pci-host-bridge'/> + <target index='1'/> + <alias name='pci.1'/> + </controller> + <controller type='usb' index='0'> + <alias name='usb'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> + </controller> + <input type='keyboard' bus='usb'> + <alias name='input0'/> + <address type='usb' bus='0' port='1'/> + </input> + <input type='mouse' bus='usb'> + <alias name='input1'/> + <address type='usb' bus='0' port='2'/> + </input> + <memballoon model='none'> + <alias name='balloon0'/> + </memballoon> + <panic model='pseries'/> + </devices> + <seclabel type='none' model='none'/> +</domain> diff --git a/tests/virprocessmock.c b/tests/virprocessmock.c new file mode 100644 index 0000000000..985562fee1 --- /dev/null +++ b/tests/virprocessmock.c @@ -0,0 +1,28 @@ +/* + * Copyright (C) 2018 Red Hat, Inc. + * Copyright (C) 2018 IBM Corp. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + */ + +#include <config.h> +#include "virprocess.h" + +int +virProcessSetMaxMemLock(pid_t pid ATTRIBUTE_UNUSED, unsigned long long bytes ATTRIBUTE_UNUSED) +{ + return 0; +}

There are already good number of test cases with hostdevices, few have multifunction devices but none having more than one than one multifunction cards. This patch adds a case where there are two multifunction cards and two Virtual functions part of the same XML. 0001:01:00.X & 0005:09:00.X - are Multifunction PCI cards. 0000:06:12.[5|6] - are SRIOV Virtual functions The next few commits improve on automatically detecting the multifunction cards and auto-assinging the addresses appropriately. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- .../hostdev-pci-multifunction.args | 29 +++++++ .../qemuxml2argvdata/hostdev-pci-multifunction.xml | 59 +++++++++++++++ tests/qemuxml2argvtest.c | 5 + .../hostdev-pci-multifunction.xml | 79 ++++++++++++++++++++ tests/qemuxml2xmltest.c | 1 5 files changed, 173 insertions(+) create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction.args create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction.xml create mode 100644 tests/qemuxml2xmloutdata/hostdev-pci-multifunction.xml diff --git a/tests/qemuxml2argvdata/hostdev-pci-multifunction.args b/tests/qemuxml2argvdata/hostdev-pci-multifunction.args new file mode 100644 index 0000000000..6b57f5713f --- /dev/null +++ b/tests/qemuxml2argvdata/hostdev-pci-multifunction.args @@ -0,0 +1,29 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-x86_64 \ +-name delete \ +-S \ +-M pc \ +-m 256 \ +-smp 4,sockets=4,cores=1,threads=1 \ +-uuid 583a8e8e-f0ce-4f53-89ab-092862148b25 \ +-nographic \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-delete/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=readline \ +-no-acpi \ +-boot c \ +-usb \ +-device vfio-pci,host=0005:90:01.0,id=hostdev0,bus=pci.0,addr=0x3 \ +-device vfio-pci,host=0001:01:00.1,id=hostdev1,bus=pci.0,addr=0x4 \ +-device vfio-pci,host=0001:01:00.0,id=hostdev2,bus=pci.0,addr=0x5 \ +-device vfio-pci,host=0005:90:01.2,id=hostdev3,bus=pci.0,addr=0x6 \ +-device vfio-pci,host=0005:90:01.3,id=hostdev4,bus=pci.0,addr=0x7 \ +-device vfio-pci,host=06:12.1,id=hostdev5,bus=pci.0,addr=0x8 \ +-device vfio-pci,host=06:12.2,id=hostdev6,bus=pci.0,addr=0x9 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa diff --git a/tests/qemuxml2argvdata/hostdev-pci-multifunction.xml b/tests/qemuxml2argvdata/hostdev-pci-multifunction.xml new file mode 100644 index 0000000000..06c889c64d --- /dev/null +++ b/tests/qemuxml2argvdata/hostdev-pci-multifunction.xml @@ -0,0 +1,59 @@ +<domain type='kvm'> + <name>delete</name> + <uuid>583a8e8e-f0ce-4f53-89ab-092862148b25</uuid> + <memory unit='KiB'>262144</memory> + <vcpu placement='static'>4</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + </os> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x0'/> + </source> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0001' bus='0x01' slot='0x00' function='0x1'/> + </source> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0001' bus='0x01' slot='0x00' function='0x0'/> + </source> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x2'/> + </source> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x3'/> + </source> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> + </source> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/> + </source> + </hostdev> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 731db9ed52..1e00eb167a 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1604,6 +1604,11 @@ mymain(void) DO_TEST("hostdev-vfio-multidomain", QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_DEVICE_VFIO_PCI, QEMU_CAPS_HOST_PCI_MULTIDOMAIN); + DO_TEST("hostdev-pci-multifunction", + QEMU_CAPS_KVM, + QEMU_CAPS_DEVICE_VFIO_PCI, + QEMU_CAPS_HOST_PCI_MULTIDOMAIN, + QEMU_CAPS_PCI_MULTIFUNCTION); DO_TEST("hostdev-mdev-precreated", QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_DEVICE_VFIO_PCI); diff --git a/tests/qemuxml2xmloutdata/hostdev-pci-multifunction.xml b/tests/qemuxml2xmloutdata/hostdev-pci-multifunction.xml new file mode 100644 index 0000000000..52ed86e305 --- /dev/null +++ b/tests/qemuxml2xmloutdata/hostdev-pci-multifunction.xml @@ -0,0 +1,79 @@ +<domain type='kvm'> + <name>delete</name> + <uuid>583a8e8e-f0ce-4f53-89ab-092862148b25</uuid> + <memory unit='KiB'>262144</memory> + <currentMemory unit='KiB'>262144</currentMemory> + <vcpu placement='static'>4</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <controller type='usb' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x0'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0001' bus='0x01' slot='0x00' function='0x1'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0001' bus='0x01' slot='0x00' function='0x0'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x2'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x3'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/> + </hostdev> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 28ba46efb2..91e3285109 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -473,6 +473,7 @@ mymain(void) DO_TEST("hostdev-usb-address", NONE); DO_TEST("hostdev-pci-address", NONE); + DO_TEST("hostdev-pci-multifunction", NONE); DO_TEST("hostdev-vfio", NONE); DO_TEST("hostdev-mdev-precreated", NONE); DO_TEST("pci-rom", NONE);

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/libvirt_private.syms | 1 + src/node_device/node_device_udev.c | 2 +- src/util/virhostdev.c | 2 +- src/util/virpci.c | 20 ++++++++++++++++++-- src/util/virpci.h | 3 ++- 5 files changed, 23 insertions(+), 5 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index e1bdaa127e..53e174a223 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2462,6 +2462,7 @@ virPCIDeviceGetUnbindFromStub; virPCIDeviceGetUsedBy; virPCIDeviceHasPCIExpressLink; virPCIDeviceIsAssignable; +virPCIDeviceIsMultifunction; virPCIDeviceIsPCIExpress; virPCIDeviceListAdd; virPCIDeviceListAddCopy; diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c index e87eb32a85..99b936bb5c 100644 --- a/src/node_device/node_device_udev.c +++ b/src/node_device/node_device_udev.c @@ -455,7 +455,7 @@ udevProcessPCI(struct udev_device *device, /* We need to be root to read PCI device configs */ if (privileged) { - if (virPCIGetHeaderType(pciDev, &pci_dev->hdrType) < 0) + if (virPCIGetHeaderType(pciDev, &pci_dev->hdrType, NULL) < 0) goto cleanup; if (virPCIDeviceIsPCIExpress(pciDev) > 0) { diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index e0133cdeec..807caf567a 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -693,7 +693,7 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr mgr, struct virHostdevIsPCINodeDeviceUsedData data = { mgr, dom_name, usesVFIO }; int hdrType = -1; - if (virPCIGetHeaderType(pci, &hdrType) < 0) + if (virPCIGetHeaderType(pci, &hdrType, NULL) < 0) goto cleanup; if (hdrType != VIR_PCI_HEADER_ENDPOINT) { diff --git a/src/util/virpci.c b/src/util/virpci.c index 55e4c3e492..a827b3bc0f 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -936,7 +936,7 @@ virPCIDeviceReset(virPCIDevicePtr dev, int fd = -1; int hdrType = -1; - if (virPCIGetHeaderType(dev, &hdrType) < 0) + if (virPCIGetHeaderType(dev, &hdrType, NULL) < 0) return -1; if (hdrType != VIR_PCI_HEADER_ENDPOINT) { @@ -3169,6 +3169,18 @@ virPCIGetMdevTypes(const char *sysfspath ATTRIBUTE_UNUSED, } #endif /* __linux__ */ +bool +virPCIDeviceIsMultifunction(virPCIDevicePtr dev) +{ + bool multifunction = false; + int hdrType = -1; + + if (virPCIGetHeaderType(dev, &hdrType, &multifunction) < 0) + return false; + + return multifunction; +} + int virPCIDeviceIsPCIExpress(virPCIDevicePtr dev) { @@ -3254,10 +3266,11 @@ virPCIDeviceGetLinkCapSta(virPCIDevicePtr dev, } -int virPCIGetHeaderType(virPCIDevicePtr dev, int *hdrType) +int virPCIGetHeaderType(virPCIDevicePtr dev, int *hdrType, bool *multiFunc) { int fd; uint8_t type; + bool multi = false; *hdrType = -1; @@ -3268,6 +3281,7 @@ int virPCIGetHeaderType(virPCIDevicePtr dev, int *hdrType) virPCIDeviceConfigClose(dev, fd); + multi = type & PCI_HEADER_TYPE_MULTI; type &= PCI_HEADER_TYPE_MASK; if (type >= VIR_PCI_HEADER_LAST) { virReportError(VIR_ERR_INTERNAL_ERROR, @@ -3276,6 +3290,8 @@ int virPCIGetHeaderType(virPCIDevicePtr dev, int *hdrType) } *hdrType = type; + if (multiFunc) + *multiFunc = multi; return 0; } diff --git a/src/util/virpci.h b/src/util/virpci.h index 794b7e59db..179249677a 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -199,6 +199,7 @@ int virPCIGetVirtualFunctions(const char *sysfs_path, size_t *num_virtual_functions, unsigned int *max_virtual_functions); +bool virPCIDeviceIsMultifunction(virPCIDevicePtr dev); int virPCIIsVirtualFunction(const char *vf_sysfs_device_link); int virPCIGetVirtualFunctionIndex(const char *pf_sysfs_device_link, @@ -246,7 +247,7 @@ int virPCIDeviceGetLinkCapSta(virPCIDevicePtr dev, unsigned int *sta_speed, unsigned int *sta_width); -int virPCIGetHeaderType(virPCIDevicePtr dev, int *hdrType); +int virPCIGetHeaderType(virPCIDevicePtr dev, int *hdrType, bool *multiFunc); void virPCIEDeviceInfoFree(virPCIEDeviceInfoPtr dev);

The upcoming patches verify the sysfs for pci hostdevs. So, preload the mock library and change the device to what exists on the mock environment to avoid a failure. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- tests/qemuargv2xmldata/hostdev-pci-address.args | 2 +- tests/qemuargv2xmldata/hostdev-pci-address.xml | 2 +- tests/qemuargv2xmltest.c | 18 ++++++++++++++++-- 3 files changed, 18 insertions(+), 4 deletions(-) diff --git a/tests/qemuargv2xmldata/hostdev-pci-address.args b/tests/qemuargv2xmldata/hostdev-pci-address.args index 4d24ce3b75..111dff773f 100644 --- a/tests/qemuargv2xmldata/hostdev-pci-address.args +++ b/tests/qemuargv2xmldata/hostdev-pci-address.args @@ -20,4 +20,4 @@ QEMU_AUDIO_DRV=none \ -net none \ -serial none \ -parallel none \ --pcidevice host=06:12.5 +-pcidevice host=06:12.1 diff --git a/tests/qemuargv2xmldata/hostdev-pci-address.xml b/tests/qemuargv2xmldata/hostdev-pci-address.xml index d6b9ce1d25..fd0bff62e6 100644 --- a/tests/qemuargv2xmldata/hostdev-pci-address.xml +++ b/tests/qemuargv2xmldata/hostdev-pci-address.xml @@ -31,7 +31,7 @@ <input type='keyboard' bus='ps2'/> <hostdev mode='subsystem' type='pci' managed='yes'> <source> - <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> </source> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </hostdev> diff --git a/tests/qemuargv2xmltest.c b/tests/qemuargv2xmltest.c index cb010268c4..a68c13fead 100644 --- a/tests/qemuargv2xmltest.c +++ b/tests/qemuargv2xmltest.c @@ -144,12 +144,25 @@ testCompareXMLToArgvHelper(const void *data) return result; } - +# define FAKEROOTDIRTEMPLATE abs_builddir "/fakerootdir-XXXXXX" static int mymain(void) { int ret = 0; + char *fakerootdir; + + if (VIR_STRDUP_QUIET(fakerootdir, FAKEROOTDIRTEMPLATE) < 0) { + fprintf(stderr, "Out of memory\n"); + abort(); + } + + if (!mkdtemp(fakerootdir)) { + fprintf(stderr, "Cannot create fakerootdir"); + abort(); + } + + setenv("LIBVIRT_FAKE_ROOT_DIR", fakerootdir, 1); if (qemuTestDriverInit(&driver) < 0) return EXIT_FAILURE; @@ -298,7 +311,8 @@ mymain(void) return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE; } -VIR_TEST_MAIN(mymain) +VIR_TEST_MAIN_PRELOAD(mymain, + abs_builddir "/.libs/virpcimock.so") #else

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/libvirt_private.syms | 2 ++ src/util/virhostdev.c | 29 +++++++++++++++++++++++++++++ src/util/virhostdev.h | 5 +++++ src/util/virpci.c | 2 +- src/util/virpci.h | 3 +++ 5 files changed, 40 insertions(+), 1 deletion(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 53e174a223..68b648ba31 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1919,6 +1919,7 @@ virHostdevHostSupportsPassthroughKVM; virHostdevHostSupportsPassthroughVFIO; virHostdevIsSCSIDevice; virHostdevManagerGetDefault; +virHostdevPCIDevicesBelongToSameSlot; virHostdevPCINodeDeviceDetach; virHostdevPCINodeDeviceReAttach; virHostdevPCINodeDeviceReset; @@ -2443,6 +2444,7 @@ virPCIDeviceAddressGetIOMMUGroupAddresses; virPCIDeviceAddressGetIOMMUGroupNum; virPCIDeviceAddressGetSysfsFile; virPCIDeviceAddressIOMMUGroupIterate; +virPCIDeviceAddressIsEqual; virPCIDeviceAddressParse; virPCIDeviceCopy; virPCIDeviceDetach; diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index 807caf567a..9508a29954 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -306,6 +306,35 @@ virHostdevIsVirtualFunction(virDomainHostdevDefPtr hostdev) } +bool +virHostdevPCIDevicesBelongToSameSlot(virDomainHostdevDefPtr dev1, + virDomainHostdevDefPtr dev2) +{ + virPCIDeviceAddressPtr devAddr1 = NULL, devAddr2 = NULL; + + if (!dev1 || !dev2) + return false; + + devAddr1 = &dev1->source.subsys.u.pci.addr; + devAddr2 = &dev2->source.subsys.u.pci.addr; + if ((devAddr1->domain != devAddr2->domain) || + (devAddr1->bus != devAddr2->bus) || + (devAddr1->slot != devAddr2->slot) || + (virPCIDeviceAddressIsEqual(devAddr1, devAddr2))) { + return false; + } + + /* The Virtual Functions have multifunction false even though they have same + * domain:bus:slot as the Physical function. They are to be treated + * like non-multifunction devices + */ + if (virHostdevIsVirtualFunction(dev1) || virHostdevIsVirtualFunction(dev2)) + return false; + + return true; +} + + static int virHostdevNetDevice(virDomainHostdevDefPtr hostdev, int pfNetDevIdx, diff --git a/src/util/virhostdev.h b/src/util/virhostdev.h index d5efffbac2..ded7620355 100644 --- a/src/util/virhostdev.h +++ b/src/util/virhostdev.h @@ -155,6 +155,11 @@ virHostdevUpdateActiveUSBDevices(virHostdevManagerPtr mgr, const char *drv_name, const char *dom_name) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5); + +bool +virHostdevPCIDevicesBelongToSameSlot(virDomainHostdevDefPtr dev1, + virDomainHostdevDefPtr dev2); + int virHostdevUpdateActiveSCSIDevices(virHostdevManagerPtr mgr, virDomainHostdevDefPtr *hostdevs, diff --git a/src/util/virpci.c b/src/util/virpci.c index a827b3bc0f..987db605cc 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -2603,7 +2603,7 @@ virPCIDeviceAddressParse(char *address, /* * returns true if equal */ -static bool +bool virPCIDeviceAddressIsEqual(virPCIDeviceAddressPtr bdf1, virPCIDeviceAddressPtr bdf2) { diff --git a/src/util/virpci.h b/src/util/virpci.h index 179249677a..5830fb4c12 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -188,6 +188,9 @@ int virPCIDeviceIsAssignable(virPCIDevicePtr dev, int strict_acs_check); int virPCIDeviceWaitForCleanup(virPCIDevicePtr dev, const char *matcher); +bool +virPCIDeviceAddressIsEqual(virPCIDeviceAddressPtr bdf1, + virPCIDeviceAddressPtr bdf2); virPCIDeviceAddressPtr virPCIGetDeviceAddressFromSysfsLink(const char *device_link);

Today's aggregate flag with the slot being true for pcie-root-ports is not enough as there will more number of aggregates depending on the number of Multifuntion PCI cards assigned to the domain. The aggregate is changed to unsigned int and Zero means Not Applicable, 1 is reserved for the pcie-root-ports and >= 2 for the the PCI multifunction cards(coming..). Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/conf/device_conf.h | 1 src/conf/domain_addr.c | 43 ++++++++++++++------- src/conf/domain_addr.h | 36 ++++++++--------- src/qemu/qemu_domain_address.c | 83 +++++++++++++++++++++++++++++++++------- src/qemu/qemu_domain_address.h | 8 ++++ 5 files changed, 123 insertions(+), 48 deletions(-) diff --git a/src/conf/device_conf.h b/src/conf/device_conf.h index f87d6f1fc6..cdb2040fb8 100644 --- a/src/conf/device_conf.h +++ b/src/conf/device_conf.h @@ -163,6 +163,7 @@ struct _virDomainDeviceInfo { * assignment, never saved and never reported. */ int pciConnectFlags; /* enum virDomainPCIConnectFlags */ + unsigned int aggregateSlotIdx; /* Used when the aggregate flag is set */ char *loadparm; /* PCI devices will only be automatically placed on a PCI bus diff --git a/src/conf/domain_addr.c b/src/conf/domain_addr.c index 0c914fe25c..c4a0b99628 100644 --- a/src/conf/domain_addr.c +++ b/src/conf/domain_addr.c @@ -63,7 +63,7 @@ virDomainPCIControllerModelToConnectType(virDomainControllerModelPCI model) return VIR_PCI_CONNECT_TYPE_DMI_TO_PCI_BRIDGE; case VIR_DOMAIN_CONTROLLER_MODEL_PCIE_ROOT_PORT: - return VIR_PCI_CONNECT_TYPE_PCIE_ROOT_PORT | VIR_PCI_CONNECT_AGGREGATE_SLOT; + return VIR_PCI_CONNECT_TYPE_PCIE_ROOT_PORT; case VIR_DOMAIN_CONTROLLER_MODEL_PCIE_SWITCH_UPSTREAM_PORT: return VIR_PCI_CONNECT_TYPE_PCIE_SWITCH_UPSTREAM_PORT; @@ -565,6 +565,7 @@ virDomainPCIAddressReserveAddrInternal(virDomainPCIAddressSetPtr addrs, virPCIDeviceAddressPtr addr, virDomainPCIConnectFlags flags, unsigned int isolationGroup, + unsigned int aggregateSlotIdx, bool fromConfig) { int ret = -1; @@ -598,8 +599,13 @@ virDomainPCIAddressReserveAddrInternal(virDomainPCIAddressSetPtr addrs, * slot, set the slot's aggregate flag. */ if (!bus->slot[addr->slot].functions && - flags & VIR_PCI_CONNECT_AGGREGATE_SLOT) { - bus->slot[addr->slot].aggregate = true; + aggregateSlotIdx > 0) { + bus->slot[addr->slot].aggregateSlotIdx = aggregateSlotIdx; + } else if (bus->slot[addr->slot].aggregateSlotIdx != aggregateSlotIdx && fromConfig) { + bus->slot[addr->slot].aggregateSlotIdx = 0; + VIR_DEBUG("PCI functions of %.4x:%.2x is aggregated to slot %u" + "because of user assigned address %s", + addr->domain, addr->bus, aggregateSlotIdx, addrStr); } if (virDomainPCIAddressBusIsEmpty(bus) && !bus->isolationGroupLocked) { @@ -624,8 +630,8 @@ virDomainPCIAddressReserveAddrInternal(virDomainPCIAddressSetPtr addrs, /* mark the requested function as reserved */ bus->slot[addr->slot].functions |= (1 << addr->function); - VIR_DEBUG("Reserving PCI address %s (aggregate='%s')", addrStr, - bus->slot[addr->slot].aggregate ? "true" : "false"); + VIR_DEBUG("Reserving PCI address %s (aggregateSlotIdx='%d')", addrStr, + bus->slot[addr->slot].aggregateSlotIdx); ret = 0; cleanup: @@ -638,10 +644,11 @@ int virDomainPCIAddressReserveAddr(virDomainPCIAddressSetPtr addrs, virPCIDeviceAddressPtr addr, virDomainPCIConnectFlags flags, - unsigned int isolationGroup) + unsigned int isolationGroup, + unsigned int aggregateSlotIdx) { return virDomainPCIAddressReserveAddrInternal(addrs, addr, flags, - isolationGroup, true); + isolationGroup, aggregateSlotIdx, true); } int @@ -678,6 +685,7 @@ virDomainPCIAddressEnsureAddr(virDomainPCIAddressSetPtr addrs, ret = virDomainPCIAddressReserveAddrInternal(addrs, &dev->addr.pci, flags, dev->isolationGroup, + dev->aggregateSlotIdx, true); } else { ret = virDomainPCIAddressReserveNextAddr(addrs, dev, flags, -1); @@ -730,6 +738,7 @@ virDomainPCIAddressSetFree(virDomainPCIAddressSetPtr addrs) static int virDomainPCIAddressFindUnusedFunctionOnBus(virDomainPCIAddressBusPtr bus, virPCIDeviceAddressPtr searchAddr, + unsigned int aggregateSlotIdx, int function, virDomainPCIConnectFlags flags, bool *found) @@ -753,8 +762,8 @@ virDomainPCIAddressFindUnusedFunctionOnBus(virDomainPCIAddressBusPtr bus, break; } - if (flags & VIR_PCI_CONNECT_AGGREGATE_SLOT && - bus->slot[searchAddr->slot].aggregate) { + if (bus->slot[searchAddr->slot].aggregateSlotIdx > 0 && + bus->slot[searchAddr->slot].aggregateSlotIdx == aggregateSlotIdx) { /* slot and device are okay with aggregating devices */ if ((bus->slot[searchAddr->slot].functions & (1 << searchAddr->function)) == 0) { @@ -799,6 +808,7 @@ virDomainPCIAddressGetNextAddr(virDomainPCIAddressSetPtr addrs, virPCIDeviceAddressPtr next_addr, virDomainPCIConnectFlags flags, unsigned int isolationGroup, + unsigned int aggregateSlotIdx, int function) { virPCIDeviceAddress a = { 0 }; @@ -827,7 +837,9 @@ virDomainPCIAddressGetNextAddr(virDomainPCIAddressSetPtr addrs, a.slot = bus->minSlot; - if (virDomainPCIAddressFindUnusedFunctionOnBus(bus, &a, function, + if (virDomainPCIAddressFindUnusedFunctionOnBus(bus, &a, + aggregateSlotIdx, + function, flags, &found) < 0) { goto error; } @@ -851,7 +863,9 @@ virDomainPCIAddressGetNextAddr(virDomainPCIAddressSetPtr addrs, a.slot = bus->minSlot; - if (virDomainPCIAddressFindUnusedFunctionOnBus(bus, &a, function, + if (virDomainPCIAddressFindUnusedFunctionOnBus(bus, &a, + aggregateSlotIdx, + function, flags, &found) < 0) { goto error; } @@ -910,12 +924,13 @@ virDomainPCIAddressReserveNextAddr(virDomainPCIAddressSetPtr addrs, { virPCIDeviceAddress addr; - if (virDomainPCIAddressGetNextAddr(addrs, &addr, flags, - dev->isolationGroup, function) < 0) + if (virDomainPCIAddressGetNextAddr(addrs, &addr, flags, dev->isolationGroup, + dev->aggregateSlotIdx, function) < 0) return -1; if (virDomainPCIAddressReserveAddrInternal(addrs, &addr, flags, - dev->isolationGroup, false) < 0) + dev->isolationGroup, + dev->aggregateSlotIdx, false) < 0) return -1; if (!addrs->dryRun) { diff --git a/src/conf/domain_addr.h b/src/conf/domain_addr.h index d3541bab09..fa98b67e5c 100644 --- a/src/conf/domain_addr.h +++ b/src/conf/domain_addr.h @@ -32,23 +32,18 @@ typedef enum { VIR_PCI_CONNECT_HOTPLUGGABLE = 1 << 0, /* is hotplug needed/supported */ - /* set for devices that can share a single slot in auto-assignment - * (by assigning one device to each of the 8 functions on the slot) - */ - VIR_PCI_CONNECT_AGGREGATE_SLOT = 1 << 1, - /* kinds of devices as a bitmap so they can be combined (some PCI * controllers permit connecting multiple types of devices) */ - VIR_PCI_CONNECT_TYPE_PCI_DEVICE = 1 << 2, - VIR_PCI_CONNECT_TYPE_PCIE_DEVICE = 1 << 3, - VIR_PCI_CONNECT_TYPE_PCIE_ROOT_PORT = 1 << 4, - VIR_PCI_CONNECT_TYPE_PCIE_SWITCH_UPSTREAM_PORT = 1 << 5, - VIR_PCI_CONNECT_TYPE_PCIE_SWITCH_DOWNSTREAM_PORT = 1 << 6, - VIR_PCI_CONNECT_TYPE_DMI_TO_PCI_BRIDGE = 1 << 7, - VIR_PCI_CONNECT_TYPE_PCI_EXPANDER_BUS = 1 << 8, - VIR_PCI_CONNECT_TYPE_PCIE_EXPANDER_BUS = 1 << 9, - VIR_PCI_CONNECT_TYPE_PCI_BRIDGE = 1 << 10, + VIR_PCI_CONNECT_TYPE_PCI_DEVICE = 1 << 1, + VIR_PCI_CONNECT_TYPE_PCIE_DEVICE = 1 << 2, + VIR_PCI_CONNECT_TYPE_PCIE_ROOT_PORT = 1 << 3, + VIR_PCI_CONNECT_TYPE_PCIE_SWITCH_UPSTREAM_PORT = 1 << 4, + VIR_PCI_CONNECT_TYPE_PCIE_SWITCH_DOWNSTREAM_PORT = 1 << 5, + VIR_PCI_CONNECT_TYPE_DMI_TO_PCI_BRIDGE = 1 << 6, + VIR_PCI_CONNECT_TYPE_PCI_EXPANDER_BUS = 1 << 7, + VIR_PCI_CONNECT_TYPE_PCIE_EXPANDER_BUS = 1 << 8, + VIR_PCI_CONNECT_TYPE_PCI_BRIDGE = 1 << 9, } virDomainPCIConnectFlags; /* a combination of all bits that describe the type of connections @@ -81,12 +76,12 @@ typedef struct { */ uint8_t functions; - /* aggregate is true if this slot has only devices with - * VIR_PCI_CONNECT_AGGREGATE assigned to its functions (meaning - * that other devices with the same flags could also be - * auto-assigned to the other functions) + /* aggregate is greater than zero if this slot has only devices with + * VIR_PCI_CONNECT_AGGREGATE assigned to its functions and + * that other devices with the same aggregateSlotIdx could also be + * auto-assigned to the other functions on this slot) */ - bool aggregate; + unsigned int aggregateSlotIdx; } virDomainPCIAddressSlot; typedef struct { @@ -152,7 +147,8 @@ bool virDomainPCIAddressSlotInUse(virDomainPCIAddressSetPtr addrs, int virDomainPCIAddressReserveAddr(virDomainPCIAddressSetPtr addrs, virPCIDeviceAddressPtr addr, virDomainPCIConnectFlags flags, - unsigned int isolationGroup) + unsigned int isolationGroup, + unsigned int aggregateSlotIdx) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); int virDomainPCIAddressReserveNextAddr(virDomainPCIAddressSetPtr addrs, diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 5f4e8edd2c..424b56dac9 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -26,6 +26,7 @@ #include "qemu_domain_address.h" #include "qemu_domain.h" #include "viralloc.h" +#include "virhostdev.h" #include "virerror.h" #include "virlog.h" @@ -1185,6 +1186,53 @@ qemuDomainSetupIsolationGroups(virDomainDefPtr def) } +void +qemuDomainSetDeviceSlotAggregateIdx(virDomainDefPtr def ATTRIBUTE_UNUSED, + virDomainDeviceDefPtr dev) +{ + virDomainDeviceInfoPtr info = virDomainDeviceGetInfo(dev); + + if (!info) + return; + + info->aggregateSlotIdx = 0; + + if (dev->type == VIR_DOMAIN_DEVICE_CONTROLLER) { + virDomainControllerDefPtr cont = dev->data.controller; + if (cont->type == VIR_DOMAIN_CONTROLLER_TYPE_PCI && + cont->model == VIR_DOMAIN_CONTROLLER_MODEL_PCIE_ROOT_PORT) { + info->aggregateSlotIdx = 1; + } + } + + return; +} + + +static int +qemuDomainFillDeviceSlotAggregationIter(virDomainDefPtr def, + virDomainDeviceDefPtr dev, + virDomainDeviceInfoPtr info ATTRIBUTE_UNUSED, + void *opaque ATTRIBUTE_UNUSED) +{ + qemuDomainSetDeviceSlotAggregateIdx(def, dev); + + return 0; +} + + +static int +qemuDomainSetupSlotAggregation(virDomainDefPtr def) +{ + if (virDomainDeviceInfoIterate(def, qemuDomainFillDeviceSlotAggregationIter, + NULL) < 0) { + return -1; + } + + return 0; +} + + /** * qemuDomainFillDevicePCIConnectFlags: * @@ -1319,7 +1367,8 @@ qemuDomainCollectPCIAddress(virDomainDefPtr def ATTRIBUTE_UNUSED, if (virDomainPCIAddressReserveAddr(addrs, addr, info->pciConnectFlags, - info->isolationGroup) < 0) { + info->isolationGroup, + info->aggregateSlotIdx) < 0) { goto cleanup; } @@ -1480,7 +1529,7 @@ qemuDomainValidateDevicePCISlotsPIIX3(virDomainDefPtr def, continue; } if (addrs->nbuses && - virDomainPCIAddressReserveAddr(addrs, &cont->info.addr.pci, flags, 0) < 0) + virDomainPCIAddressReserveAddr(addrs, &cont->info.addr.pci, flags, 0, 0) < 0) goto cleanup; } @@ -1489,11 +1538,11 @@ qemuDomainValidateDevicePCISlotsPIIX3(virDomainDefPtr def, memset(&tmp_addr, 0, sizeof(tmp_addr)); tmp_addr.slot = 1; /* ISA Bridge at 00:01.0 */ - if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0) < 0) + if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0, 0) < 0) goto cleanup; /* Bridge at 00:01.3 */ tmp_addr.function = 3; - if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0) < 0) + if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0, 0) < 0) goto cleanup; } @@ -1528,7 +1577,7 @@ qemuDomainValidateDevicePCISlotsPIIX3(virDomainDefPtr def, goto cleanup; } } else { - if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0) < 0) + if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0, 0) < 0) goto cleanup; primaryVideo->info.addr.pci = tmp_addr; primaryVideo->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI; @@ -1553,7 +1602,7 @@ qemuDomainValidateDevicePCISlotsPIIX3(virDomainDefPtr def, VIR_DEBUG("PCI address 0:0:2.0 in use, future addition of a video" " device will not be possible without manual" " intervention"); - } else if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0) < 0) { + } else if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0, 0) < 0) { goto cleanup; } } @@ -1629,7 +1678,7 @@ qemuDomainValidateDevicePCISlotsQ35(virDomainDefPtr def, assign = true; } if (assign) { - if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0) < 0) + if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0, 0) < 0) goto cleanup; cont->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI; @@ -1652,7 +1701,7 @@ qemuDomainValidateDevicePCISlotsQ35(virDomainDefPtr def, memset(&tmp_addr, 0, sizeof(tmp_addr)); tmp_addr.slot = 0x1E; if (!virDomainPCIAddressSlotInUse(addrs, &tmp_addr)) { - if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0) < 0) + if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0, 0) < 0) goto cleanup; cont->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI; @@ -1676,12 +1725,12 @@ qemuDomainValidateDevicePCISlotsQ35(virDomainDefPtr def, tmp_addr.slot = 0x1F; tmp_addr.function = 0; tmp_addr.multi = VIR_TRISTATE_SWITCH_ON; - if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0) < 0) + if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0, 0) < 0) goto cleanup; tmp_addr.function = 3; tmp_addr.multi = VIR_TRISTATE_SWITCH_ABSENT; - if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0) < 0) + if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0, 0) < 0) goto cleanup; } @@ -1715,7 +1764,7 @@ qemuDomainValidateDevicePCISlotsQ35(virDomainDefPtr def, goto cleanup; } } else { - if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0) < 0) + if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0, 0) < 0) goto cleanup; primaryVideo->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI; primaryVideo->info.addr.pci = tmp_addr; @@ -1741,7 +1790,7 @@ qemuDomainValidateDevicePCISlotsQ35(virDomainDefPtr def, " device will not be possible without manual" " intervention"); virResetLastError(); - } else if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0) < 0) { + } else if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0, 0) < 0) { goto cleanup; } } @@ -1762,7 +1811,7 @@ qemuDomainValidateDevicePCISlotsQ35(virDomainDefPtr def, !virDeviceInfoPCIAddressWanted(&sound->info)) { continue; } - if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0) < 0) + if (virDomainPCIAddressReserveAddr(addrs, &tmp_addr, flags, 0, 0) < 0) goto cleanup; sound->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI; @@ -1967,7 +2016,8 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, /* Reserve this function on the slot we found */ if (virDomainPCIAddressReserveAddr(addrs, &addr, cont->info.pciConnectFlags, - cont->info.isolationGroup) < 0) { + cont->info.isolationGroup, + cont->info.aggregateSlotIdx) < 0) { goto error; } @@ -2348,6 +2398,9 @@ qemuDomainAssignPCIAddresses(virDomainDefPtr def, if (qemuDomainSetupIsolationGroups(def) < 0) goto cleanup; + if (qemuDomainSetupSlotAggregation(def) < 0) + goto cleanup; + if (nbuses > 0) { /* 1st pass to figure out how many PCI bridges we need */ if (!(addrs = qemuDomainPCIAddressSetCreate(def, nbuses, true))) @@ -2463,6 +2516,7 @@ qemuDomainAssignPCIAddresses(virDomainDefPtr def, dev.data.controller = def->controllers[contIndex]; /* set connect flags so it will be properly addressed */ qemuDomainFillDevicePCIConnectFlags(def, &dev, qemuCaps, driver); + qemuDomainSetDeviceSlotAggregateIdx(def, &dev); /* Reserve an address for the controller. pci-root and pcie-root * controllers don't plug into any other PCI controller, hence @@ -2932,6 +2986,7 @@ qemuDomainEnsurePCIAddress(virDomainObjPtr obj, return 0; qemuDomainFillDevicePCIConnectFlags(obj->def, dev, priv->qemuCaps, driver); + qemuDomainSetDeviceSlotAggregateIdx(obj->def, dev); return virDomainPCIAddressEnsureAddr(priv->pciaddrs, info, info->pciConnectFlags); diff --git a/src/qemu/qemu_domain_address.h b/src/qemu/qemu_domain_address.h index 83f8e81cad..650f977cf7 100644 --- a/src/qemu/qemu_domain_address.h +++ b/src/qemu/qemu_domain_address.h @@ -55,6 +55,14 @@ int qemuDomainFillDeviceIsolationGroup(virDomainDefPtr def, virDomainDeviceDefPtr dev) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +void +qemuDomainSetDeviceSlotAggregateIdx(virDomainDefPtr def, + virDomainDeviceDefPtr dev); +int +qemuDomainDefDeviceFindSlotAggregateIdx(virDomainDefPtr def, + virDomainDeviceDefPtr dev); + + void qemuDomainReleaseDeviceAddress(virDomainObjPtr vm, virDomainDeviceInfoPtr info, const char *devstr);

For existing domains using the primary function alone of a multifunction card, the card is still treated as a multifunction card. This is done to prevent hotplug of other functions when the primary function is already hotplugged. If the secondary functions are part of the xml without the primary function being part of the xml, this has never been supported. So, the libvirt doesn't consider this either as a multifunction card. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/qemu/qemu_domain.h | 11 + src/qemu/qemu_domain_address.c | 169 +++++++++++++++++++- tests/qemuhotplugtest.c | 1 .../hostdev-pci-multifunction.args | 18 +- tests/qemuxml2argvdata/pseries-hostdevs-1.args | 5 - tests/qemuxml2argvdata/pseries-hostdevs-3.args | 5 - tests/qemuxml2argvtest.c | 5 - .../hostdev-pci-multifunction.xml | 16 +- tests/qemuxml2xmloutdata/pseries-hostdevs-1.xml | 4 tests/qemuxml2xmloutdata/pseries-hostdevs-3.xml | 4 10 files changed, 207 insertions(+), 31 deletions(-) diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 6d3e6eb5e3..fbfc994652 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -473,6 +473,17 @@ struct _qemuDomainSaveCookie { qemuDomainSaveCookiePtr qemuDomainSaveCookieNew(virDomainObjPtr vm); +typedef struct _qemuDomainPCIHostdevdata qemuDomainPCIHostdevdata; +typedef qemuDomainPCIHostdevdata *qemuDomainPCIHostdevDataPtr; +struct _qemuDomainPCIHostdevdata { + const virDomainDef *def; + virDomainPCIAddressSetPtr addrs; + virDomainHostdevDefPtr device; +}; + +typedef int (*virDomainPCIHostdevCallback)(qemuDomainPCIHostdevDataPtr data, + virDomainHostdevDefPtr hostdev); + const char *qemuDomainAsyncJobPhaseToString(qemuDomainAsyncJob job, int phase); int qemuDomainAsyncJobPhaseFromString(qemuDomainAsyncJob job, diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 424b56dac9..bc72b6e94c 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -1186,11 +1186,155 @@ qemuDomainSetupIsolationGroups(virDomainDefPtr def) } +#define PCI_MAX_BRIDGE_NUMBER 0xff +#define PCI_MAX_DEVICES 32 + + +/** + * qemuDomainPCIHostDevicesIter: + * @data - The data->device is the one which is called-back with for + * each hostdev + * cb() - callback to be called for each hostdev + * Return : + * If the callback for any of the hostdev fails, the Iter returns + * with the return value for that callback. + * Zero on success. + */ +static +int qemuDomainPCIHostDevicesIter(qemuDomainPCIHostdevDataPtr data, + virDomainPCIHostdevCallback cb) +{ + size_t i; + int ret = -1; + + /* Iterate through the PCI Hostdevices, the Mdev source is of type + * UUID, so skip that. */ + for (i = 0; i < data->def->nhostdevs; i++) { + virDomainHostdevSubsysPtr subsys = &data->def->hostdevs[i]->source.subsys; + virDomainHostdevDefPtr hostdev = data->def->hostdevs[i]; + if (data->device == hostdev) + continue; + if (data->def->hostdevs[i]->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) + continue; + if (subsys->type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && + subsys->type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI_HOST && + (subsys->type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_MDEV && + subsys->u.mdev.model == VIR_MDEV_MODEL_TYPE_VFIO_PCI)) { + continue; + } + + if ((ret = cb(data, hostdev)) != 0) + return ret; + } + + return 0; +} + + + +static int +qemuDomainFindNextAggregationSlotIdxIter(virDomainDefPtr def ATTRIBUTE_UNUSED, + virDomainDeviceDefPtr dev ATTRIBUTE_UNUSED, + virDomainDeviceInfoPtr info, + void *opaque) +{ + int *aggregationSlotIdx = opaque; + + if (info && info->aggregateSlotIdx == *aggregationSlotIdx) + return -1; + + return 0; +} + + +static unsigned int +qemuDomainFindNextSlotAggregationIdx(virDomainDefPtr def) +{ + int aggregateSlotIdx = 2; + + while (aggregateSlotIdx < PCI_MAX_BRIDGE_NUMBER * PCI_MAX_DEVICES && + virDomainDeviceInfoIterate(def, + qemuDomainFindNextAggregationSlotIdxIter, + &aggregateSlotIdx) < 0) { + aggregateSlotIdx++; + } + + return aggregateSlotIdx; +} + + +static int +qemuDomainDefHostdevGetSlotAggregateIdx(qemuDomainPCIHostdevDataPtr data, + virDomainHostdevDefPtr hostdev) +{ + if (data->device && + virHostdevPCIDevicesBelongToSameSlot(data->device, hostdev)) { + if (hostdev->info->aggregateSlotIdx > 0) + return hostdev->info->aggregateSlotIdx; + } + + return 0; +} + +/** + * qemuDomainDefDeviceFindSlotAggregateIdx: + * @def : domain def + * @dev : Find the slot aggregate for the device if other + * functions are already part of the def and have + * a slot aggreate idx assigned. + * Return: + * -1: if not assigned. + * 0: If the device is not a hostdev or not a + * multifunction device. + * >0: If assinged a value; + **/ +int +qemuDomainDefDeviceFindSlotAggregateIdx(virDomainDefPtr def, + virDomainDeviceDefPtr dev) +{ + int aggregateSlotIdx = 0; + virPCIDevicePtr pciDev; + virDomainHostdevDefPtr hostdev = dev->data.hostdev; + qemuDomainPCIHostdevdata temp = {def, NULL, hostdev}; + virPCIDeviceAddressPtr hostAddr = &hostdev->source.subsys.u.pci.addr; + + /* Only PCI host devices are subject to isolation */ + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS || + hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) { + return 0; + } + + if (!(pciDev = virPCIDeviceNew(hostAddr->domain, + hostAddr->bus, + hostAddr->slot, + hostAddr->function))) { + /* libvirt should be able to perform all the + * operations in virPCIDeviceNew() even if it's + * running unprivileged, so if this fails, the device + * apparently doesn't currently exist on the host. + * Since majority of host are non-multifunction, + * assume this one is too. + */ + return 0; + } + + if (!virPCIDeviceIsMultifunction(pciDev)) + return 0; + + aggregateSlotIdx = qemuDomainPCIHostDevicesIter(&temp, qemuDomainDefHostdevGetSlotAggregateIdx); + if (aggregateSlotIdx > 0) + return aggregateSlotIdx; + + return -1; +} + + void -qemuDomainSetDeviceSlotAggregateIdx(virDomainDefPtr def ATTRIBUTE_UNUSED, +qemuDomainSetDeviceSlotAggregateIdx(virDomainDefPtr def, virDomainDeviceDefPtr dev) { virDomainDeviceInfoPtr info = virDomainDeviceGetInfo(dev); + int aggregateSlotIdx = 0; if (!info) return; @@ -1203,6 +1347,12 @@ qemuDomainSetDeviceSlotAggregateIdx(virDomainDefPtr def ATTRIBUTE_UNUSED, cont->model == VIR_DOMAIN_CONTROLLER_MODEL_PCIE_ROOT_PORT) { info->aggregateSlotIdx = 1; } + } else if (dev->type == VIR_DOMAIN_DEVICE_HOSTDEV) { + aggregateSlotIdx = qemuDomainDefDeviceFindSlotAggregateIdx(def, dev); + if (aggregateSlotIdx > 0) + info->aggregateSlotIdx = aggregateSlotIdx; + else if (aggregateSlotIdx < 0) + info->aggregateSlotIdx = qemuDomainFindNextSlotAggregationIdx(def); } return; @@ -2073,10 +2223,12 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, /* Host PCI devices */ for (i = 0; i < def->nhostdevs; i++) { - virDomainHostdevSubsysPtr subsys = &def->hostdevs[i]->source.subsys; - if (!virDeviceInfoPCIAddressWanted(def->hostdevs[i]->info)) + int function = 0; + virDomainHostdevDefPtr hostdev = def->hostdevs[i]; + virDomainHostdevSubsysPtr subsys = &hostdev->source.subsys; + if (!virDeviceInfoPCIAddressWanted(hostdev->info)) continue; - if (def->hostdevs[i]->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) continue; if (subsys->type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && subsys->type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI_HOST && @@ -2085,9 +2237,14 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, continue; } - if (qemuDomainPCIAddressReserveNextAddr(addrs, - def->hostdevs[i]->info) < 0) + if (hostdev->info->aggregateSlotIdx > 1) + function = hostdev->source.subsys.u.pci.addr.function; + + if (virDomainPCIAddressReserveNextAddr(addrs, hostdev->info, + hostdev->info->pciConnectFlags, + function) < 0) { goto error; + } } /* VirtIO balloon */ diff --git a/tests/qemuhotplugtest.c b/tests/qemuhotplugtest.c index 31ce8d43b9..b5dca5e5c9 100644 --- a/tests/qemuhotplugtest.c +++ b/tests/qemuhotplugtest.c @@ -81,6 +81,7 @@ qemuHotplugCreateObjects(virDomainXMLOptionPtr xmlopt, virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_SCSI_DISK_WWN); virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_DEVICE_VFIO_PCI); virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_DEVICE_SPAPR_PCI_HOST_BRIDGE); + virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_VIRTIO_PCI_DISABLE_LEGACY); if (event) virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_DEVICE_DEL_EVENT); diff --git a/tests/qemuxml2argvdata/hostdev-pci-multifunction.args b/tests/qemuxml2argvdata/hostdev-pci-multifunction.args index 6b57f5713f..243ca8f2aa 100644 --- a/tests/qemuxml2argvdata/hostdev-pci-multifunction.args +++ b/tests/qemuxml2argvdata/hostdev-pci-multifunction.args @@ -19,11 +19,13 @@ server,nowait \ -no-acpi \ -boot c \ -usb \ --device vfio-pci,host=0005:90:01.0,id=hostdev0,bus=pci.0,addr=0x3 \ --device vfio-pci,host=0001:01:00.1,id=hostdev1,bus=pci.0,addr=0x4 \ --device vfio-pci,host=0001:01:00.0,id=hostdev2,bus=pci.0,addr=0x5 \ --device vfio-pci,host=0005:90:01.2,id=hostdev3,bus=pci.0,addr=0x6 \ --device vfio-pci,host=0005:90:01.3,id=hostdev4,bus=pci.0,addr=0x7 \ --device vfio-pci,host=06:12.1,id=hostdev5,bus=pci.0,addr=0x8 \ --device vfio-pci,host=06:12.2,id=hostdev6,bus=pci.0,addr=0x9 \ --device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa +-device vfio-pci,host=0005:90:01.0,id=hostdev0,bus=pci.0,multifunction=on,\ +addr=0x3 \ +-device vfio-pci,host=0001:01:00.1,id=hostdev1,bus=pci.0,addr=0x4.0x1 \ +-device vfio-pci,host=0001:01:00.0,id=hostdev2,bus=pci.0,multifunction=on,\ +addr=0x4 \ +-device vfio-pci,host=0005:90:01.2,id=hostdev3,bus=pci.0,addr=0x3.0x2 \ +-device vfio-pci,host=0005:90:01.3,id=hostdev4,bus=pci.0,addr=0x3.0x3 \ +-device vfio-pci,host=06:12.1,id=hostdev5,bus=pci.0,addr=0x5 \ +-device vfio-pci,host=06:12.2,id=hostdev6,bus=pci.0,addr=0x6 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 diff --git a/tests/qemuxml2argvdata/pseries-hostdevs-1.args b/tests/qemuxml2argvdata/pseries-hostdevs-1.args index 8a4a4c5a63..6195350291 100644 --- a/tests/qemuxml2argvdata/pseries-hostdevs-1.args +++ b/tests/qemuxml2argvdata/pseries-hostdevs-1.args @@ -21,5 +21,6 @@ server,nowait \ -device spapr-pci-host-bridge,index=1,id=pci.1 \ -device spapr-pci-host-bridge,index=2,id=pci.2 \ -device vfio-pci,host=0005:90:01.0,id=hostdev0,bus=pci.1.0,addr=0x1 \ --device vfio-pci,host=0001:01:00.0,id=hostdev1,bus=pci.2.0,addr=0x1 \ --device vfio-pci,host=0001:01:00.1,id=hostdev2,bus=pci.2.0,addr=0x2 +-device vfio-pci,host=0001:01:00.0,id=hostdev1,bus=pci.2.0,multifunction=on,\ +addr=0x1 \ +-device vfio-pci,host=0001:01:00.1,id=hostdev2,bus=pci.2.0,addr=0x1.0x1 diff --git a/tests/qemuxml2argvdata/pseries-hostdevs-3.args b/tests/qemuxml2argvdata/pseries-hostdevs-3.args index 66a31ba1a8..2e800574de 100644 --- a/tests/qemuxml2argvdata/pseries-hostdevs-3.args +++ b/tests/qemuxml2argvdata/pseries-hostdevs-3.args @@ -20,5 +20,6 @@ server,nowait \ -boot c \ -device spapr-pci-host-bridge,index=1,id=pci.1 \ -device spapr-pci-host-bridge,index=2,id=pci.2 \ --device vfio-pci,host=0001:01:00.0,id=hostdev0,bus=pci.2.0,addr=0x1 \ --device vfio-pci,host=0001:01:00.1,id=hostdev1,bus=pci.2.0,addr=0x2 +-device vfio-pci,host=0001:01:00.0,id=hostdev0,bus=pci.2.0,multifunction=on,\ +addr=0x1 \ +-device vfio-pci,host=0001:01:00.1,id=hostdev1,bus=pci.2.0,addr=0x1.0x1 diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 1e00eb167a..583598dfec 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1607,7 +1607,8 @@ mymain(void) DO_TEST("hostdev-pci-multifunction", QEMU_CAPS_KVM, QEMU_CAPS_DEVICE_VFIO_PCI, - QEMU_CAPS_HOST_PCI_MULTIDOMAIN, + QEMU_CAPS_VIRTIO_PCI_DISABLE_LEGACY, + QEMU_CAPS_HOST_PCI_MULTIDOMAIN, QEMU_CAPS_PCI_MULTIFUNCTION); DO_TEST("hostdev-mdev-precreated", QEMU_CAPS_NODEFCONFIG, @@ -1905,6 +1906,7 @@ mymain(void) QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_DEVICE_SPAPR_PCI_HOST_BRIDGE, QEMU_CAPS_HOST_PCI_MULTIDOMAIN, + QEMU_CAPS_PCI_MULTIFUNCTION, QEMU_CAPS_VIRTIO_SCSI, QEMU_CAPS_DEVICE_VFIO_PCI); DO_TEST("pseries-hostdevs-2", @@ -1918,6 +1920,7 @@ mymain(void) QEMU_CAPS_DEVICE_SPAPR_PCI_HOST_BRIDGE, QEMU_CAPS_HOST_PCI_MULTIDOMAIN, QEMU_CAPS_VIRTIO_SCSI, + QEMU_CAPS_PCI_MULTIFUNCTION, QEMU_CAPS_DEVICE_VFIO_PCI); DO_TEST("pseries-features-hpt", diff --git a/tests/qemuxml2xmloutdata/hostdev-pci-multifunction.xml b/tests/qemuxml2xmloutdata/hostdev-pci-multifunction.xml index 52ed86e305..3396a73cf8 100644 --- a/tests/qemuxml2xmloutdata/hostdev-pci-multifunction.xml +++ b/tests/qemuxml2xmloutdata/hostdev-pci-multifunction.xml @@ -28,52 +28,52 @@ <source> <address domain='0x0005' bus='0x90' slot='0x01' function='0x0'/> </source> - <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' multifunction='on'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0001' bus='0x01' slot='0x00' function='0x1'/> </source> - <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x1'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0001' bus='0x01' slot='0x00' function='0x0'/> </source> - <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0' multifunction='on'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0005' bus='0x90' slot='0x01' function='0x2'/> </source> - <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x2'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0005' bus='0x90' slot='0x01' function='0x3'/> </source> - <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x3'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> </source> - <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/> </source> - <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> </hostdev> <memballoon model='virtio'> - <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> </memballoon> </devices> </domain> diff --git a/tests/qemuxml2xmloutdata/pseries-hostdevs-1.xml b/tests/qemuxml2xmloutdata/pseries-hostdevs-1.xml index e77a060a38..c09588de9d 100644 --- a/tests/qemuxml2xmloutdata/pseries-hostdevs-1.xml +++ b/tests/qemuxml2xmloutdata/pseries-hostdevs-1.xml @@ -40,14 +40,14 @@ <source> <address domain='0x0001' bus='0x01' slot='0x00' function='0x0'/> </source> - <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0' multifunction='on'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0001' bus='0x01' slot='0x00' function='0x1'/> </source> - <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x1'/> </hostdev> <memballoon model='none'/> <panic model='pseries'/> diff --git a/tests/qemuxml2xmloutdata/pseries-hostdevs-3.xml b/tests/qemuxml2xmloutdata/pseries-hostdevs-3.xml index f91959b805..f01adf6d25 100644 --- a/tests/qemuxml2xmloutdata/pseries-hostdevs-3.xml +++ b/tests/qemuxml2xmloutdata/pseries-hostdevs-3.xml @@ -32,14 +32,14 @@ <source> <address domain='0x0001' bus='0x01' slot='0x00' function='0x0'/> </source> - <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0' multifunction='on'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0001' bus='0x01' slot='0x00' function='0x1'/> </source> - <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x1'/> </hostdev> <memballoon model='none'/> <panic model='pseries'/>

This function will be useful in qemu_domain_address.c, so promote it from static. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/libvirt_private.syms | 1 + src/util/virhostdev.c | 2 +- src/util/virhostdev.h | 3 +++ 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 68b648ba31..b092c240d8 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1918,6 +1918,7 @@ virHostdevFindUSBDevice; virHostdevHostSupportsPassthroughKVM; virHostdevHostSupportsPassthroughVFIO; virHostdevIsSCSIDevice; +virHostdevIsVirtualFunction; virHostdevManagerGetDefault; virHostdevPCIDevicesBelongToSameSlot; virHostdevPCINodeDeviceDetach; diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index 9508a29954..454ae3568c 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -289,7 +289,7 @@ virHostdevPCISysfsPath(virDomainHostdevDefPtr hostdev, } -static int +int virHostdevIsVirtualFunction(virDomainHostdevDefPtr hostdev) { char *sysfs_path = NULL; diff --git a/src/util/virhostdev.h b/src/util/virhostdev.h index ded7620355..d696b5fdcc 100644 --- a/src/util/virhostdev.h +++ b/src/util/virhostdev.h @@ -196,6 +196,9 @@ virHostdevReAttachDomainDevices(virHostdevManagerPtr mgr, bool virHostdevIsSCSIDevice(virDomainHostdevDefPtr hostdev) ATTRIBUTE_NONNULL(1); +int +virHostdevIsVirtualFunction(virDomainHostdevDefPtr hostdev) + ATTRIBUTE_NONNULL(1); /* functions used by NodeDevDetach/Reattach/Reset */ int virHostdevPCINodeDeviceDetach(virHostdevManagerPtr mgr,

It is invalid to have secondary functions without the primary functions part of the domain. Prevents new domain define, but existing ones would not vanish. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/qemu/qemu_domain.c | 3 + src/qemu/qemu_domain_address.c | 57 ++++++++++++++++++++ src/qemu/qemu_domain_address.h | 2 + .../hostdev-pci-no-primary-function.xml | 23 ++++++++ tests/qemuxml2argvdata/hostdev-pci-validate.args | 25 +++++++++ tests/qemuxml2argvdata/hostdev-pci-validate.xml | 29 ++++++++++ tests/qemuxml2argvtest.c | 9 +++ 7 files changed, 148 insertions(+) create mode 100644 tests/qemuxml2argvdata/hostdev-pci-no-primary-function.xml create mode 100644 tests/qemuxml2argvdata/hostdev-pci-validate.args create mode 100644 tests/qemuxml2argvdata/hostdev-pci-validate.xml diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 2d108bec1b..12ed68a89b 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -3565,6 +3565,9 @@ qemuDomainDefValidate(const virDomainDef *def, if (qemuDomainDefValidateFeatures(def) < 0) goto cleanup; + if (qemuDomainDefValidatePCIHostdevs(def) < 0) + goto cleanup; + ret = 0; cleanup: diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index bc72b6e94c..7bee4fb937 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -1996,6 +1996,63 @@ qemuDomainValidateDevicePCISlotsChipsets(virDomainDefPtr def, } +static int +qemuDomainDefPCIHostdevIsPrimaryFunction(qemuDomainPCIHostdevDataPtr data, + virDomainHostdevDefPtr hostdev) +{ + if (!data->device || !hostdev) + return 0; + + if ((hostdev->source.subsys.u.pci.addr.function == 0) && + (virHostdevPCIDevicesBelongToSameSlot(data->device, hostdev))) + return 1; + + return 0; +} + + +static int qemuDomainDefValidatePCIMultifunctionHostdev(qemuDomainPCIHostdevDataPtr data, + virDomainHostdevDefPtr hostdev) +{ + int ret = 0; + qemuDomainPCIHostdevdata hostdevIterData = {data->def, NULL, hostdev}; + + if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI || + hostdev->source.subsys.u.pci.addr.function == 0) + goto skip; + + if (virHostdevIsVirtualFunction(hostdev)) + goto skip; + + /* If the device is non-zero function but its Primary function is not + * part of the domain, then error out. + */ + if (!qemuDomainPCIHostDevicesIter(&hostdevIterData, + qemuDomainDefPCIHostdevIsPrimaryFunction)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Secondary functions of a PCI multifunction card " + "cannot be assigned to a domain without the " + "Primary function.")); + ret = -1; + } + + skip: + return ret; +} + +int qemuDomainDefValidatePCIHostdevs(const virDomainDef *def) +{ + qemuDomainPCIHostdevdata hostdevdata = {def, NULL, NULL}; + + if (qemuDomainPCIHostDevicesIter(&hostdevdata, + qemuDomainDefValidatePCIMultifunctionHostdev)) { + return -1; + } + + return 0; +} + + /* * This assigns static PCI slots to all configured devices. * The ordering here is chosen to match the ordering used diff --git a/src/qemu/qemu_domain_address.h b/src/qemu/qemu_domain_address.h index 650f977cf7..e1cc467714 100644 --- a/src/qemu/qemu_domain_address.h +++ b/src/qemu/qemu_domain_address.h @@ -62,6 +62,8 @@ int qemuDomainDefDeviceFindSlotAggregateIdx(virDomainDefPtr def, virDomainDeviceDefPtr dev); +int qemuDomainDefValidatePCIHostdevs(const virDomainDef *def); + void qemuDomainReleaseDeviceAddress(virDomainObjPtr vm, virDomainDeviceInfoPtr info, diff --git a/tests/qemuxml2argvdata/hostdev-pci-no-primary-function.xml b/tests/qemuxml2argvdata/hostdev-pci-no-primary-function.xml new file mode 100644 index 0000000000..7106ab73b1 --- /dev/null +++ b/tests/qemuxml2argvdata/hostdev-pci-no-primary-function.xml @@ -0,0 +1,23 @@ +<domain type='kvm'> + <name>delete</name> + <uuid>583a8e8e-f0ce-4f53-89ab-092862148b25</uuid> + <memory unit='KiB'>262144</memory> + <vcpu placement='static'>4</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + </os> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x09' slot='0x00' function='0x1'/> + </source> + </hostdev> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/hostdev-pci-validate.args b/tests/qemuxml2argvdata/hostdev-pci-validate.args new file mode 100644 index 0000000000..bda8cab6c9 --- /dev/null +++ b/tests/qemuxml2argvdata/hostdev-pci-validate.args @@ -0,0 +1,25 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-x86_64 \ +-name delete \ +-S \ +-M pc \ +-m 256 \ +-smp 4,sockets=4,cores=1,threads=1 \ +-uuid 583a8e8e-f0ce-4f53-89ab-092862148b25 \ +-nographic \ +-nodefconfig \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-delete/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=readline \ +-no-acpi \ +-boot c \ +-usb \ +-device vfio-pci,host=06:12.1,id=hostdev0,bus=pci.0,addr=0x3 \ +-device vfio-pci,host=06:12.2,id=hostdev1,bus=pci.0,addr=0x4 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 diff --git a/tests/qemuxml2argvdata/hostdev-pci-validate.xml b/tests/qemuxml2argvdata/hostdev-pci-validate.xml new file mode 100644 index 0000000000..54797c2dda --- /dev/null +++ b/tests/qemuxml2argvdata/hostdev-pci-validate.xml @@ -0,0 +1,29 @@ +<domain type='kvm'> + <name>delete</name> + <uuid>583a8e8e-f0ce-4f53-89ab-092862148b25</uuid> + <memory unit='KiB'>262144</memory> + <vcpu placement='static'>4</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + </os> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> + </source> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/> + </source> + </hostdev> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 583598dfec..659f20cb28 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1610,6 +1610,15 @@ mymain(void) QEMU_CAPS_VIRTIO_PCI_DISABLE_LEGACY, QEMU_CAPS_HOST_PCI_MULTIDOMAIN, QEMU_CAPS_PCI_MULTIFUNCTION); + DO_TEST("hostdev-pci-validate", + QEMU_CAPS_KVM, + QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_HOST_PCI_MULTIDOMAIN, + QEMU_CAPS_DEVICE_VFIO_PCI); + DO_TEST_PARSE_ERROR("hostdev-pci-no-primary-function", + QEMU_CAPS_KVM, + QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_HOST_PCI_MULTIDOMAIN, + QEMU_CAPS_DEVICE_VFIO_PCI); + DO_TEST("hostdev-mdev-precreated", QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_DEVICE_VFIO_PCI);

In some cases it may be better to have a bitmap representing state of individual functions rather than iterating the definition. The new helper creates a bitmap representing the state from the domain definition. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/conf/domain_conf.c | 25 +++++++++++++++++++++++++ src/conf/domain_conf.h | 3 +++ src/libvirt_private.syms | 1 + 3 files changed, 29 insertions(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 86fc275116..3deed8eb4d 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -16146,6 +16146,31 @@ virDomainHostdevFind(virDomainDefPtr def, return *found ? i : -1; } +#define PCI_MAX_SLOT_FUNCTIONS 8 +/** + * virDomainDefHostdevGetPCIOnlineFunctionMap: + * @def: domain definition + * @aggrSlotIdx: slot aggregation index + * Returns a bitmap representing state of individual functions of a slot. + */ +virBitmapPtr +virDomainDefHostdevGetPCIOnlineFunctionMap(virDomainDefPtr def, + int aggrSlotIdx) +{ + size_t i; + virBitmapPtr ret = NULL; + + if (!(ret = virBitmapNew(PCI_MAX_SLOT_FUNCTIONS))) + return NULL; + + for (i = 0; i < def->nhostdevs; i++) { + if (def->hostdevs[i]->info->aggregateSlotIdx == aggrSlotIdx) + ignore_value(virBitmapSetBit(ret, def->hostdevs[i]->source.subsys.u.pci.addr.function)); + } + + return ret; +} + static bool virDomainDiskControllerMatch(int controller_type, int disk_bus) { diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 61379e50fe..a3d686a5ca 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -3067,6 +3067,9 @@ virDomainHostdevDefPtr virDomainHostdevRemove(virDomainDefPtr def, size_t i); int virDomainHostdevFind(virDomainDefPtr def, virDomainHostdevDefPtr match, virDomainHostdevDefPtr *found); +virBitmapPtr virDomainDefHostdevGetPCIOnlineFunctionMap(virDomainDefPtr def, + int aggrSlotIdx); + virDomainGraphicsListenDefPtr virDomainGraphicsGetListen(virDomainGraphicsDefPtr def, size_t i); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b092c240d8..b0e8f2ca61 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -278,6 +278,7 @@ virDomainDefHasMemballoon; virDomainDefHasMemoryHotplug; virDomainDefHasUSB; virDomainDefHasVcpusOffline; +virDomainDefHostdevGetPCIOnlineFunctionMap; virDomainDefLifecycleActionAllowed; virDomainDefMaybeAddController; virDomainDefMaybeAddInput;

No functional change. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/qemu/qemu_hotplug.c | 90 +++++++++++++++++++++++++++++------------------ src/qemu/qemu_hotplug.h | 5 +++ 2 files changed, 60 insertions(+), 35 deletions(-) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index e0a5300f08..1bf87d963e 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -688,6 +688,57 @@ qemuDomainAttachUSBMassStorageDevice(virQEMUDriverPtr driver, return 0; } +int qemuDomainAttachPCIHostDevicePrepare(virQEMUDriverPtr driver, + virDomainDefPtr def, + virDomainHostdevDefPtr hostdev, + virQEMUCapsPtr qemuCaps) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + unsigned int flags = 0; + int ret = -1; + int backend; + + if (!cfg->relaxedACS) + flags |= VIR_HOSTDEV_STRICT_ACS_CHECK; + if (qemuHostdevPreparePCIDevices(driver, def->name, def->uuid, + &hostdev, 1, qemuCaps, flags) < 0) + goto exit; + + /* this could have been changed by qemuHostdevPreparePCIDevices */ + backend = hostdev->source.subsys.u.pci.backend; + + switch ((virDomainHostdevSubsysPCIBackendType) backend) { + case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VFIO_PCI)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("VFIO PCI device assignment is not " + "supported by this version of qemu")); + goto error; + } + break; + + case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_DEFAULT: + case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_KVM: + break; + + case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_XEN: + case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_LAST: + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("QEMU does not support device assignment mode '%s'"), + virDomainHostdevSubsysPCIBackendTypeToString(backend)); + goto error; + break; + } + + ret = 0; + exit: + virObjectUnref(cfg); + return ret; + error: + qemuHostdevReAttachPCIDevices(driver, def->name, &hostdev, 1); + goto exit; +} + int qemuDomainAttachDeviceDiskLive(virQEMUDriverPtr driver, @@ -1257,44 +1308,16 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, bool teardownlabel = false; bool teardowndevice = false; int backend; - virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); - unsigned int flags = 0; if (VIR_REALLOC_N(vm->def->hostdevs, vm->def->nhostdevs + 1) < 0) - goto cleanup; + return -1; - if (!cfg->relaxedACS) - flags |= VIR_HOSTDEV_STRICT_ACS_CHECK; - if (qemuHostdevPreparePCIDevices(driver, vm->def->name, vm->def->uuid, - &hostdev, 1, priv->qemuCaps, flags) < 0) - goto cleanup; + if (qemuDomainAttachPCIHostDevicePrepare(driver, vm->def, + hostdev, priv->qemuCaps) < 0) + return -1; - /* this could have been changed by qemuHostdevPreparePCIDevices */ backend = hostdev->source.subsys.u.pci.backend; - switch ((virDomainHostdevSubsysPCIBackendType) backend) { - case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO: - if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_VFIO_PCI)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("VFIO PCI device assignment is not " - "supported by this version of qemu")); - goto error; - } - break; - - case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_DEFAULT: - case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_KVM: - break; - - case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_XEN: - case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_LAST: - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("QEMU does not support device assignment mode '%s'"), - virDomainHostdevSubsysPCIBackendTypeToString(backend)); - goto error; - break; - } - /* Temporarily add the hostdev to the domain definition. This is needed * because qemuDomainAdjustMaxMemLock() requires the hostdev to be already * part of the domain definition, but other functions like @@ -1366,7 +1389,6 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, VIR_FREE(devstr); VIR_FREE(configfd_name); VIR_FORCE_CLOSE(configfd); - virObjectUnref(cfg); return 0; @@ -1389,8 +1411,6 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, VIR_FREE(configfd_name); VIR_FORCE_CLOSE(configfd); - cleanup: - virObjectUnref(cfg); return -1; } diff --git a/src/qemu/qemu_hotplug.h b/src/qemu/qemu_hotplug.h index b2f5fa688b..db0e1df79a 100644 --- a/src/qemu/qemu_hotplug.h +++ b/src/qemu/qemu_hotplug.h @@ -154,6 +154,11 @@ void qemuDomainRemoveVcpuAlias(virQEMUDriverPtr driver, virDomainObjPtr vm, const char *alias); +int qemuDomainAttachPCIHostDevicePrepare(virQEMUDriverPtr driver, + virDomainDefPtr def, + virDomainHostdevDefPtr dev, + virQEMUCapsPtr qemuCaps); + int qemuDomainChrInsert(virDomainDefPtr vmdef, virDomainChrDefPtr chr);

The hostdevices are the only devices which have dependencies outside of themselves such that, other functions of the PCI card should also have been detached from host driver before attempting the hotplug. This patch moves the detach to the beginning of the hotplug so that the following patch can detach all funtions first before attempting to hotplug any. We need not move the detach for net devices using SRIOV as all SRIOV devices are single function devices and can be independently detached as usual. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/qemu/qemu_hotplug.c | 27 ++++++++++++++++++--------- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 1bf87d963e..214e169980 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -888,6 +888,7 @@ qemuDomainAttachNetDevice(virQEMUDriverPtr driver, bool charDevPlugged = false; bool netdevPlugged = false; bool hostPlugged = false; + virDomainHostdevDefPtr hostdev = NULL; /* preallocate new slot for device */ if (VIR_REALLOC_N(vm->def->nets, vm->def->nnets + 1) < 0) @@ -998,9 +999,16 @@ qemuDomainAttachNetDevice(virQEMUDriverPtr driver, * as a hostdev (the hostdev code will reach over into the * netdev-specific code as appropriate), then also added to * the nets list (see cleanup:) if successful. + * + * qemuDomainAttachHostDevice uses a connection to resolve + * a SCSI hostdev secret, which is not this case, so pass NULL. */ - ret = qemuDomainAttachHostDevice(driver, vm, - virDomainNetGetActualHostdev(net)); + hostdev = virDomainNetGetActualHostdev(net); + if (qemuDomainAttachPCIHostDevicePrepare(driver, vm->def, + hostdev, priv->qemuCaps) < 0) + goto cleanup; + if ((ret = qemuDomainAttachHostDevice(driver, vm, hostdev)) < 0) + qemuHostdevReAttachPCIDevices(driver, vm->def->name, &hostdev, 1); goto cleanup; break; @@ -1312,10 +1320,6 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, if (VIR_REALLOC_N(vm->def->hostdevs, vm->def->nhostdevs + 1) < 0) return -1; - if (qemuDomainAttachPCIHostDevicePrepare(driver, vm->def, - hostdev, priv->qemuCaps) < 0) - return -1; - backend = hostdev->source.subsys.u.pci.backend; /* Temporarily add the hostdev to the domain definition. This is needed @@ -1405,8 +1409,6 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, if (releaseaddr) qemuDomainReleaseDeviceAddress(vm, info, NULL); - qemuHostdevReAttachPCIDevices(driver, vm->def->name, &hostdev, 1); - VIR_FREE(devstr); VIR_FREE(configfd_name); VIR_FORCE_CLOSE(configfd); @@ -2588,6 +2590,8 @@ qemuDomainAttachHostDevice(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainHostdevDefPtr hostdev) { + qemuDomainObjPrivatePtr priv = vm->privateData; + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("hotplug is not supported for hostdev mode '%s'"), @@ -2597,9 +2601,14 @@ qemuDomainAttachHostDevice(virQEMUDriverPtr driver, switch (hostdev->source.subsys.type) { case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: + if (qemuDomainAttachPCIHostDevicePrepare(driver, vm->def, + hostdev, priv->qemuCaps) < 0) + goto error; if (qemuDomainAttachHostPCIDevice(driver, vm, - hostdev) < 0) + hostdev) < 0) { + qemuHostdevReAttachPCIDevices(driver, vm->def->name, &hostdev, 1); goto error; + } break; case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB:

No functional change. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/qemu/qemu_hotplug.c | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 214e169980..007ecb0923 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1304,14 +1304,11 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, virDomainHostdevDefPtr hostdev) { qemuDomainObjPrivatePtr priv = vm->privateData; - virDomainDeviceDef dev = { VIR_DOMAIN_DEVICE_HOSTDEV, - { .hostdev = hostdev } }; virDomainDeviceInfoPtr info = hostdev->info; int ret; char *devstr = NULL; int configfd = -1; char *configfd_name = NULL; - bool releaseaddr = false; bool teardowncgroup = false; bool teardownlabel = false; bool teardowndevice = false; @@ -1350,15 +1347,6 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, if (qemuAssignDeviceHostdevAlias(vm->def, &info->alias, -1) < 0) goto error; - if (qemuDomainIsPSeries(vm->def)) { - /* Isolation groups are only relevant for pSeries guests */ - if (qemuDomainFillDeviceIsolationGroup(vm->def, &dev) < 0) - goto error; - } - - if (qemuDomainEnsurePCIAddress(vm, &dev, driver) < 0) - goto error; - releaseaddr = true; if (backend != VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO && virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_PCI_CONFIGFD)) { configfd = qemuOpenPCIConfig(hostdev); @@ -1406,9 +1394,6 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, qemuDomainNamespaceTeardownHostdev(vm, hostdev) < 0) VIR_WARN("Unable to remove host device from /dev"); - if (releaseaddr) - qemuDomainReleaseDeviceAddress(vm, info, NULL); - VIR_FREE(devstr); VIR_FREE(configfd_name); VIR_FORCE_CLOSE(configfd); @@ -2591,6 +2576,8 @@ qemuDomainAttachHostDevice(virQEMUDriverPtr driver, virDomainHostdevDefPtr hostdev) { qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainDeviceDef dev = { VIR_DOMAIN_DEVICE_HOSTDEV, + { .hostdev = hostdev } }; if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -2604,8 +2591,19 @@ qemuDomainAttachHostDevice(virQEMUDriverPtr driver, if (qemuDomainAttachPCIHostDevicePrepare(driver, vm->def, hostdev, priv->qemuCaps) < 0) goto error; + + if (qemuDomainIsPSeries(vm->def)) { + /* Isolation groups are only relevant for pSeries guests */ + if (qemuDomainFillDeviceIsolationGroup(vm->def, &dev) < 0) + goto error; + } + + if (qemuDomainEnsurePCIAddress(vm, &dev, driver) < 0) + goto error; + if (qemuDomainAttachHostPCIDevice(driver, vm, hostdev) < 0) { + qemuDomainReleaseDeviceAddress(vm, hostdev->info, NULL); qemuHostdevReAttachPCIDevices(driver, vm->def->name, &hostdev, 1); goto error; }

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/conf/domain_conf.c | 169 ++++++++++++++++++++++++++++++++++++++++------ src/conf/domain_conf.h | 36 ++++++++++ src/libvirt_private.syms | 6 ++ 3 files changed, 190 insertions(+), 21 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 3deed8eb4d..5f4a5127d4 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -963,6 +963,83 @@ virDomainXMLOptionClassDispose(void *obj) (xmlopt->config.privFree)(xmlopt->config.priv); } +/* virDomainDeviceDefListAddCopy - add a *copy* of the device to this list */ +int +virDomainDeviceDefListAddCopy(virDomainDeviceDefListPtr list, + virDomainDeviceDefPtr dev, + virDomainDeviceDefListDataPtr data) +{ + virDomainDeviceDefPtr copy = virDomainDeviceDefCopy(dev, data->def, data->caps, data->xmlopt); + + if (!copy) + return -1; + if (VIR_APPEND_ELEMENT(list->devs, list->count, copy) < 0) { + virDomainDeviceDefFree(copy); + return -1; + } + return 0; +} + +void virDomainDeviceDefListFree(virDomainDeviceDefListPtr list) +{ + size_t i; + + if (!list) + return; + for (i = 0; i < list->count; i++) + virDomainDeviceDefFree(list->devs[i]); + VIR_FREE(list->devs); +} + +void virDomainDeviceDefListFreeShallow(virDomainDeviceDefListPtr list) +{ + size_t i; + + if (!list) + return; + for (i = 0; i < list->count; i++) + VIR_FREE(list->devs[i]); +} + + +/* virDomainDeviceDefListIter - Iterate through the list with the callback*/ +int +virDomainDeviceDefListIterate(virDomainDeviceDefListPtr list, + virDomainDeviceDefListIterCallback cb, + void *data) +{ + size_t i; + + for (i = 0; i < list->count; i++) + if (cb(list->devs[i], data)) + return -1; + + return 0; +} + +virDomainDeviceDefListPtr +virDomainDeviceDefListCopy(virDomainDeviceDefListPtr list, + virDomainDeviceDefListDataPtr data) +{ + size_t i; + virDomainDeviceDefListPtr devlist = NULL; + + if (list && (VIR_ALLOC(devlist) < 0)) + goto cleanup; + + for (i = 0; i < list->count; i++) { + if (virDomainDeviceDefListAddCopy(devlist, list->devs[i], data) < 0) + goto cleanup; + } + + return devlist; + cleanup: + virDomainDeviceDefListFree(devlist); + return NULL; +} + + + /** * virDomainKeyWrapCipherDefParseXML: * @@ -15677,25 +15754,16 @@ virDomainIOMMUDefParseXML(xmlNodePtr node, return ret; } - -virDomainDeviceDefPtr -virDomainDeviceDefParse(const char *xmlStr, - const virDomainDef *def, - virCapsPtr caps, - virDomainXMLOptionPtr xmlopt, - unsigned int flags) +static +virDomainDeviceDefPtr virDomainDeviceDefParseXML(xmlNodePtr node, + const virDomainDef *def, + virCapsPtr caps, + virDomainXMLOptionPtr xmlopt, + xmlXPathContextPtr ctxt, + unsigned int flags) { - xmlDocPtr xml; - xmlNodePtr node; - xmlXPathContextPtr ctxt = NULL; virDomainDeviceDefPtr dev = NULL; char *netprefix; - - if (!(xml = virXMLParseStringCtxt(xmlStr, _("(device_definition)"), &ctxt))) - goto error; - - node = ctxt->node; - if (VIR_ALLOC(dev) < 0) goto error; @@ -15846,14 +15914,33 @@ virDomainDeviceDefParse(const char *xmlStr, if (virDomainDeviceDefValidate(dev, def, flags, xmlopt) < 0) goto error; - cleanup: + return dev; + error: + return NULL; +} + +virDomainDeviceDefPtr +virDomainDeviceDefParse(const char *xmlStr, + const virDomainDef *def, + virCapsPtr caps, + virDomainXMLOptionPtr xmlopt, + unsigned int flags) +{ + xmlDocPtr xml; + xmlNodePtr node; + xmlXPathContextPtr ctxt = NULL; + virDomainDeviceDefPtr dev = NULL; + + if (!(xml = virXMLParseStringCtxt(xmlStr, _("(device_definition)"), &ctxt))) + return NULL; + + node = ctxt->node; + + dev = virDomainDeviceDefParseXML(node, def, caps, xmlopt, ctxt, flags); + xmlFreeDoc(xml); xmlXPathFreeContext(ctxt); return dev; - - error: - VIR_FREE(dev); - goto cleanup; } @@ -29394,3 +29481,43 @@ virDomainDiskTranslateSourcePool(virDomainDiskDefPtr def) virStoragePoolDefFree(pooldef); return ret; } + +virDomainDeviceDefListPtr +virDomainDeviceDefParseXMLMany(const char *xml, + const virDomainDef *def, + virCapsPtr caps, + virDomainXMLOptionPtr xmlopt, + unsigned int flags) +{ + xmlXPathContextPtr ctxt = NULL; + xmlDocPtr xmlPtr; + xmlNodePtr node, root; + virDomainDeviceDefPtr dev = NULL; + virDomainDeviceDefListPtr devlist; + + if (!(xmlPtr = virXMLParseStringCtxt(xml, _("(device_definition)"), &ctxt))) + return NULL; + + if (VIR_ALLOC(devlist) < 0) + goto exit; + + root = xmlDocGetRootElement(xmlPtr); + node = root->children; + while (node) { + if (node->type == XML_ELEMENT_NODE) { + dev = virDomainDeviceDefParseXML(node, def, caps, xmlopt, ctxt, flags); + if (VIR_APPEND_ELEMENT(devlist->devs, devlist->count, dev) < 0) { + virDomainDeviceDefFree(dev); + virDomainDeviceDefListFree(devlist); + goto exit; + } + dev = NULL; + } + node = node->next; + } + + exit: + xmlFreeDoc(xmlPtr); + xmlXPathFreeContext(ctxt); + return devlist; +} diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index a3d686a5ca..828fe9f6d8 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2561,6 +2561,37 @@ typedef int (*virDomainDefPostParseBasicCallback)(virDomainDefPtr def, virCapsPtr caps, void *opaque); +typedef struct _virDomainDeviceDefListData virDomainDeviceDefListData; +typedef virDomainDeviceDefListData *virDomainDeviceDefListDataPtr; +struct _virDomainDeviceDefListData { + const virDomainDef *def; + virCapsPtr caps; + virDomainXMLOptionPtr xmlopt; +}; + +struct virDomainDeviceDefList { + virDomainDeviceDefPtr *devs; + size_t count; +}; +typedef struct virDomainDeviceDefList *virDomainDeviceDefListPtr; + +typedef int (*virDomainDeviceDefListIterCallback)(virDomainDeviceDefPtr dev, + void *opaque); +int virDomainDeviceDefListIterate(virDomainDeviceDefListPtr devlist, + virDomainDeviceDefListIterCallback cb, + void *data); +int +virDomainDeviceDefListAddCopy(virDomainDeviceDefListPtr list, + virDomainDeviceDefPtr dev, + virDomainDeviceDefListDataPtr data); +virDomainDeviceDefListPtr +virDomainDeviceDefListCopy(virDomainDeviceDefListPtr list, + virDomainDeviceDefListDataPtr data); + +void virDomainDeviceDefListFree(virDomainDeviceDefListPtr list); +void virDomainDeviceDefListFreeShallow(virDomainDeviceDefListPtr list); + + /* Called once after everything else has been parsed, for adjusting * overall domain defaults. * @parseOpaque is opaque data passed by virDomainDefParse* caller, @@ -2935,6 +2966,11 @@ virDomainDeviceDefPtr virDomainDeviceDefParse(const char *xmlStr, virCapsPtr caps, virDomainXMLOptionPtr xmlopt, unsigned int flags); +virDomainDeviceDefListPtr virDomainDeviceDefParseXMLMany(const char *xmlStr, + const virDomainDef *def, + virCapsPtr caps, + virDomainXMLOptionPtr xmlopt, + unsigned int flags); virStorageSourcePtr virDomainDiskDefSourceParse(const char *xmlStr, const virDomainDef *def, virDomainXMLOptionPtr xmlopt, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b0e8f2ca61..d1ff2f5f99 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -300,7 +300,13 @@ virDomainDeviceAddressTypeToString; virDomainDeviceAliasIsUserAlias; virDomainDeviceDefCopy; virDomainDeviceDefFree; +virDomainDeviceDefListAddCopy; +virDomainDeviceDefListCopy; +virDomainDeviceDefListFree; +virDomainDeviceDefListFreeShallow; +virDomainDeviceDefListIterate; virDomainDeviceDefParse; +virDomainDeviceDefParseXMLMany; virDomainDeviceFindSCSIController; virDomainDeviceGetInfo; virDomainDeviceInfoIterate;

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/qemu/qemu_domain.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 5 ++++ 2 files changed, 72 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 12ed68a89b..c0a0af525f 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11842,3 +11842,70 @@ qemuProcessEventFree(struct qemuProcessEvent *event) } VIR_FREE(event); } + + +static bool isPCIMultifunctionDeviceXML(const char *xml) +{ + xmlDocPtr xmlptr; + + if (!(xmlptr = virXMLParse(NULL, xml, _("(device_definition)")))) { + /* We report error anyway later */ + return false; + } + + return STREQ((const char *)(xmlDocGetRootElement(xmlptr))->name, "devices"); +} + +static +int qemuDomainValidateMultifunctionDeviceList(virDomainDeviceDefListPtr devlist) +{ + size_t i; + virDomainHostdevDefPtr hostdev = NULL; + virDomainDeviceInfoPtr info; + + for (i = 0; i < devlist->count; i++) { + info = virDomainDeviceGetInfo(devlist->devs[i]); + if (devlist->devs[i]->type == VIR_DOMAIN_DEVICE_HOSTDEV) + hostdev = devlist->devs[i]->data.hostdev; + + if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) + return -1; + + if (info->type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE && + info->type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI) { + return -1; + } + } + return 0; +} + + +virDomainDeviceDefListPtr +qemuDomainDeviceParseXMLMany(const char *xml, + virDomainDeviceDefListDataPtr data, + unsigned int parse_flags) +{ + virDomainDeviceDefListPtr devlist = NULL; + + if (isPCIMultifunctionDeviceXML(xml)) { + if (!(devlist = virDomainDeviceDefParseXMLMany(xml, data->def, + data->caps, data->xmlopt, + parse_flags))) + goto cleanup; + + if (qemuDomainValidateMultifunctionDeviceList(devlist) < 0) + goto cleanup; + } else { + virDomainDeviceDefPtr dev = virDomainDeviceDefParse(xml, data->def, + data->caps, data->xmlopt, + parse_flags); + if (!dev || VIR_ALLOC(devlist) < 0) + goto cleanup; + + if (VIR_APPEND_ELEMENT(devlist->devs, devlist->count, dev) < 0) + goto cleanup; + } + + cleanup: + return devlist; +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index fbfc994652..339d0ba82c 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -1011,4 +1011,9 @@ qemuDomainPrepareDiskSource(virDomainDiskDefPtr disk, qemuDomainObjPrivatePtr priv, virQEMUDriverConfigPtr cfg); +virDomainDeviceDefListPtr +qemuDomainDeviceParseXMLMany(const char *xml, + virDomainDeviceDefListDataPtr data, + unsigned int parse_flags); + #endif /* __QEMU_DOMAIN_H__ */

This helps calling the routines with a list of devices. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/qemu/qemu_driver.c | 45 ++++++++++++++++++++++++++++++++++----------- 1 file changed, 34 insertions(+), 11 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 0ade86d6a9..18e88f05bb 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7969,11 +7969,8 @@ qemuDomainUpdateDeviceLive(virDomainObjPtr vm, } static int -qemuDomainAttachDeviceConfig(virDomainDefPtr vmdef, - virDomainDeviceDefPtr dev, - virCapsPtr caps, - unsigned int parse_flags, - virDomainXMLOptionPtr xmlopt) +qemuDomainAttachDeviceConfigInternal(virDomainDefPtr vmdef, + virDomainDeviceDefPtr dev) { virDomainDiskDefPtr disk; virDomainNetDefPtr net; @@ -8151,20 +8148,34 @@ qemuDomainAttachDeviceConfig(virDomainDefPtr vmdef, return -1; } - if (virDomainDefPostParse(vmdef, caps, parse_flags, xmlopt, NULL) < 0) - return -1; - return 0; } static int -qemuDomainDetachDeviceConfig(virDomainDefPtr vmdef, +qemuDomainAttachDeviceConfig(virDomainDefPtr vmdef, virDomainDeviceDefPtr dev, virCapsPtr caps, unsigned int parse_flags, virDomainXMLOptionPtr xmlopt) { + if (virDomainDefCompatibleDevice(vmdef, dev, NULL) < 0) + return -1; + + if (qemuDomainAttachDeviceConfigInternal(vmdef, dev)) + return -1; + + if (virDomainDefPostParse(vmdef, caps, parse_flags, xmlopt, NULL) < 0) + return -1; + + return 0; +} + + +static int +qemuDomainDetachDeviceConfigInternal(virDomainDefPtr vmdef, + virDomainDeviceDefPtr dev) +{ virDomainDiskDefPtr disk, det_disk; virDomainNetDefPtr net; virDomainHostdevDefPtr hostdev, det_hostdev; @@ -8334,6 +8345,20 @@ qemuDomainDetachDeviceConfig(virDomainDefPtr vmdef, return -1; } + return 0; +} + + +static int +qemuDomainDetachDeviceConfig(virDomainDefPtr vmdef, + virDomainDeviceDefPtr dev, + virCapsPtr caps, + unsigned int parse_flags, + virDomainXMLOptionPtr xmlopt) +{ + if (qemuDomainDetachDeviceConfigInternal(vmdef, dev)) + return -1; + if (virDomainDefPostParse(vmdef, caps, parse_flags, xmlopt, NULL) < 0) return -1; @@ -8486,8 +8511,6 @@ qemuDomainAttachDeviceLiveAndConfig(virDomainObjPtr vm, if (!vmdef) goto cleanup; - if (virDomainDefCompatibleDevice(vmdef, dev, NULL) < 0) - goto cleanup; if ((ret = qemuDomainAttachDeviceConfig(vmdef, dev, caps, parse_flags, driver->xmlopt)) < 0)

Helps calling multiple time per device --- src/qemu/qemu_driver.c | 41 +++++++++++++++++++++++++++++++---------- 1 file changed, 31 insertions(+), 10 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 18e88f05bb..d2e10082ea 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7599,9 +7599,9 @@ qemuDomainUndefine(virDomainPtr dom) } static int -qemuDomainAttachDeviceLive(virDomainObjPtr vm, - virDomainDeviceDefPtr dev, - virQEMUDriverPtr driver) +qemuDomainAttachDeviceLiveInternal(virDomainObjPtr vm, + virDomainDeviceDefPtr dev, + virQEMUDriverPtr driver) { int ret = -1; const char *alias = NULL; @@ -7739,12 +7739,25 @@ qemuDomainAttachDeviceLive(virDomainObjPtr vm, qemuDomainEventQueue(driver, event); } + return ret; +} + +static int +qemuDomainAttachDeviceLive(virDomainObjPtr vm, + virDomainDeviceDefPtr dev, + virQEMUDriverPtr driver) +{ + int ret = -1; + + if (virDomainDefCompatibleDevice(vm->def, dev, NULL) < 0) + return -1; + + ret = qemuDomainAttachDeviceLiveInternal(vm, dev, driver); if (ret == 0) ret = qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE); return ret; } - static int qemuDomainDetachDeviceControllerLive(virQEMUDriverPtr driver, virDomainObjPtr vm, @@ -7766,9 +7779,9 @@ qemuDomainDetachDeviceControllerLive(virQEMUDriverPtr driver, } static int -qemuDomainDetachDeviceLive(virDomainObjPtr vm, - virDomainDeviceDefPtr dev, - virQEMUDriverPtr driver) +qemuDomainDetachDeviceLiveInternal(virDomainObjPtr vm, + virDomainDeviceDefPtr dev, + virQEMUDriverPtr driver) { int ret = -1; @@ -7829,6 +7842,17 @@ qemuDomainDetachDeviceLive(virDomainObjPtr vm, break; } + return ret; +} + +static int +qemuDomainDetachDeviceLive(virDomainObjPtr vm, + virDomainDeviceDefPtr dev, + virQEMUDriverPtr driver) +{ + int ret = -1; + + ret = qemuDomainDetachDeviceLiveInternal(vm, dev, driver); if (ret == 0) ret = qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE); @@ -8518,9 +8542,6 @@ qemuDomainAttachDeviceLiveAndConfig(virDomainObjPtr vm, } if (flags & VIR_DOMAIN_AFFECT_LIVE) { - if (virDomainDefCompatibleDevice(vm->def, dev_copy, NULL) < 0) - goto cleanup; - if ((ret = qemuDomainAttachDeviceLive(vm, dev_copy, driver)) < 0) goto cleanup; /*

With multifunction devices, multiple delete requests are sent to qemu and all the requests should be queued up. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/qemu/qemu_domain.h | 3 ++- src/qemu/qemu_hotplug.c | 38 ++++++++++++++++++++++++-------------- 2 files changed, 26 insertions(+), 15 deletions(-) diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 339d0ba82c..85cbc2b5e8 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -195,7 +195,8 @@ typedef enum { typedef struct _qemuDomainUnpluggingDevice qemuDomainUnpluggingDevice; typedef qemuDomainUnpluggingDevice *qemuDomainUnpluggingDevicePtr; struct _qemuDomainUnpluggingDevice { - const char *alias; + const char **aliases; + size_t naliases; qemuDomainUnpluggingDeviceStatus status; }; diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 007ecb0923..7dffaf9502 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -4542,16 +4542,21 @@ qemuDomainRemoveDevice(virQEMUDriverPtr driver, static void qemuDomainMarkDeviceAliasForRemoval(virDomainObjPtr vm, - const char *alias) + const char *alias, + bool fresh) { qemuDomainObjPrivatePtr priv = vm->privateData; - memset(&priv->unplug, 0, sizeof(priv->unplug)); + if (fresh) + memset(&priv->unplug, 0, sizeof(priv->unplug)); if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_DEL_EVENT)) return; - priv->unplug.alias = alias; + if (VIR_REALLOC_N(priv->unplug.aliases, priv->unplug.naliases + 1) < 0) + return; + + priv->unplug.aliases[priv->unplug.naliases++] = alias; } @@ -4560,7 +4565,7 @@ qemuDomainMarkDeviceForRemoval(virDomainObjPtr vm, virDomainDeviceInfoPtr info) { - qemuDomainMarkDeviceAliasForRemoval(vm, info->alias); + qemuDomainMarkDeviceAliasForRemoval(vm, info->alias, true); } @@ -4568,7 +4573,8 @@ static void qemuDomainResetDeviceRemoval(virDomainObjPtr vm) { qemuDomainObjPrivatePtr priv = vm->privateData; - priv->unplug.alias = NULL; + VIR_FREE(priv->unplug.aliases); + priv->unplug.naliases = 0; } /* Returns: @@ -4596,13 +4602,14 @@ qemuDomainWaitForDeviceRemoval(virDomainObjPtr vm) return 1; until += qemuDomainRemoveDeviceWaitTime; - while (priv->unplug.alias) { + /* All devices should get released around same time*/ + while (priv->unplug.naliases) { if ((rc = virDomainObjWaitUntil(vm, until)) == 1) return 0; if (rc < 0) { VIR_WARN("Failed to wait on unplug condition for domain '%s' " - "device '%s'", vm->def->name, priv->unplug.alias); + "device '%s'", vm->def->name, priv->unplug.aliases[0]); return 1; } } @@ -4627,14 +4634,17 @@ qemuDomainSignalDeviceRemoval(virDomainObjPtr vm, const char *devAlias, qemuDomainUnpluggingDeviceStatus status) { + size_t i; qemuDomainObjPrivatePtr priv = vm->privateData; - if (STREQ_NULLABLE(priv->unplug.alias, devAlias)) { - VIR_DEBUG("Removal of device '%s' continues in waiting thread", devAlias); - qemuDomainResetDeviceRemoval(vm); - priv->unplug.status = status; - virDomainObjBroadcast(vm); - return true; + for (i = 0; i < priv->unplug.naliases; i++) { + if (STREQ_NULLABLE(priv->unplug.aliases[i], devAlias)) { + VIR_DEBUG("Removal of device '%s' continues in waiting thread", devAlias); + VIR_DELETE_ELEMENT(priv->unplug.aliases, i, priv->unplug.naliases); + priv->unplug.status = status; + virDomainObjBroadcast(vm); + return true; + } } return false; } @@ -5675,7 +5685,7 @@ qemuDomainHotplugDelVcpu(virQEMUDriverPtr driver, return -1; } - qemuDomainMarkDeviceAliasForRemoval(vm, vcpupriv->alias); + qemuDomainMarkDeviceAliasForRemoval(vm, vcpupriv->alias, true); qemuDomainObjEnterMonitor(driver, vm);

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/conf/device_conf.h | 6 +++ src/conf/domain_addr.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++ src/conf/domain_addr.h | 5 +++ src/libvirt_private.syms | 1 + src/util/virpci.h | 2 + 5 files changed, 98 insertions(+) diff --git a/src/conf/device_conf.h b/src/conf/device_conf.h index cdb2040fb8..ae7a651ee0 100644 --- a/src/conf/device_conf.h +++ b/src/conf/device_conf.h @@ -177,6 +177,12 @@ struct _virDomainDeviceInfo { bool isolationGroupLocked; }; +typedef struct _virDomainPCIMultifunctionAddressInfo virDomainPCIMultifunctionAddressInfo; +typedef virDomainPCIMultifunctionAddressInfo *virDomainPCIMultifunctionAddressInfoPtr; +struct _virDomainPCIMultifunctionAddressInfo { + virDomainDeviceInfoPtr infos[VIR_PCI_MAX_FUNCTIONS]; +}; + int virDomainDeviceInfoCopy(virDomainDeviceInfoPtr dst, virDomainDeviceInfoPtr src); void virDomainDeviceInfoClear(virDomainDeviceInfoPtr info); diff --git a/src/conf/domain_addr.c b/src/conf/domain_addr.c index c4a0b99628..43227a4b25 100644 --- a/src/conf/domain_addr.c +++ b/src/conf/domain_addr.c @@ -27,6 +27,7 @@ #include "virlog.h" #include "virstring.h" #include "domain_addr.h" +#include "device_conf.h" #define VIR_FROM_THIS VIR_FROM_DOMAIN @@ -697,6 +698,89 @@ virDomainPCIAddressEnsureAddr(virDomainPCIAddressSetPtr addrs, } +/* + *virDomainPCIAddressEnsureMultifunctionAddress: + * + * + * + */ + +int +virDomainPCIAddressEnsureMultifunctionAddress(virDomainPCIAddressSetPtr addrs, + virDomainPCIMultifunctionAddressInfoPtr pcicard) +{ + size_t i; + int ret = 0; + virPCIDeviceAddressPtr addr1 = NULL, addr2 = NULL; + virDomainDeviceInfoPtr dev = NULL; + char *addrStr = NULL; + + if (!pcicard->infos[0]) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Function-Zero missing on the slot")); + return -1; + } + + /* If the address is given by the user, make sure they belong + * to same slot */ + for (i = 0; i < VIR_PCI_MAX_FUNCTIONS; i++) { + dev = pcicard->infos[i]; + if (dev && !dev->pciConnectFlags) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Not a PCI Multifunction device.")); + goto cleanup; + } + if (dev && virDeviceInfoPCIAddressPresent(dev)) { + /* Pick one and compare against rest of the user given */ + addr1 = addr1 ? addr1 : &dev->addr.pci; + addr2 = &dev->addr.pci; + if (!(addrStr = virDomainPCIAddressAsString(addr2))) + goto cleanup; + if (!virDomainPCIAddressValidate(addrs, addr2, + addrStr, dev->pciConnectFlags, true)) + goto cleanup; + if (!(addr1->domain == addr2->domain && addr1->bus == addr2->bus && + addr1->slot == addr2->slot)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Addresses belong to different PCI slots")); + goto cleanup; + } + VIR_FREE(addrStr); + } + } + + /* Reserve all the user given addresses. */ + for (i = 0; i < VIR_PCI_MAX_FUNCTIONS; i++) { + dev = pcicard->infos[i]; + if (dev && virDeviceInfoPCIAddressPresent(dev)) { + ret = virDomainPCIAddressReserveAddrInternal(addrs, &dev->addr.pci, + dev->pciConnectFlags, dev->isolationGroup, + dev->aggregateSlotIdx, + true); + if (ret < 0) + goto cleanup; + } + } + + /* If the user has not given addresses, start with function zero */ + for (i = 0; i < VIR_PCI_MAX_FUNCTIONS; i++) { + dev = pcicard->infos[i]; + if (dev && !virDeviceInfoPCIAddressPresent(dev)) { + ret = virDomainPCIAddressReserveNextAddr(addrs, dev, dev->pciConnectFlags, i); + if (ret < 0) + goto cleanup; + } + } + + /* Set multi on overriding what user has set. */ + pcicard->infos[0]->addr.pci.multi = VIR_TRISTATE_SWITCH_ON; + + cleanup: + VIR_FREE(addrStr); + return ret; +} + + void virDomainPCIAddressReleaseAddr(virDomainPCIAddressSetPtr addrs, virPCIDeviceAddressPtr addr) diff --git a/src/conf/domain_addr.h b/src/conf/domain_addr.h index fa98b67e5c..e80e1e9089 100644 --- a/src/conf/domain_addr.h +++ b/src/conf/domain_addr.h @@ -162,6 +162,11 @@ int virDomainPCIAddressEnsureAddr(virDomainPCIAddressSetPtr addrs, virDomainPCIConnectFlags flags) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int +virDomainPCIAddressEnsureMultifunctionAddress(virDomainPCIAddressSetPtr addrs, + virDomainPCIMultifunctionAddressInfoPtr pcicard) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); + void virDomainPCIAddressReleaseAddr(virDomainPCIAddressSetPtr addrs, virPCIDeviceAddressPtr addr) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index d1ff2f5f99..3e4d627004 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -114,6 +114,7 @@ virDomainPCIAddressAsString; virDomainPCIAddressBusIsFullyReserved; virDomainPCIAddressBusSetModel; virDomainPCIAddressEnsureAddr; +virDomainPCIAddressEnsureMultifunctionAddress; virDomainPCIAddressReleaseAddr; virDomainPCIAddressReserveAddr; virDomainPCIAddressReserveNextAddr; diff --git a/src/util/virpci.h b/src/util/virpci.h index 5830fb4c12..65e586ed15 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -29,6 +29,8 @@ # include "virobject.h" # include "virutil.h" +# define VIR_PCI_MAX_FUNCTIONS 8 + typedef struct _virPCIDevice virPCIDevice; typedef virPCIDevice *virPCIDevicePtr; typedef struct _virPCIDeviceAddress virPCIDeviceAddress;

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/qemu/qemu_domain.c | 2 src/qemu/qemu_domain_address.c | 68 +++++++++++++++++ src/qemu/qemu_domain_address.h | 5 + src/qemu/qemu_driver.c | 66 ++++++++++------ src/qemu/qemu_hotplug.c | 82 ++++++++++++++++++++ src/qemu/qemu_hotplug.h | 4 + tests/qemuhotplugtest.c | 39 ++++++++-- .../qemuhotplug-multifunction-hostdev-pci-2.xml | 14 +++ .../qemuhotplug-multifunction-hostdev-pci.xml | 20 +++++ ...hotplug-base-live+multifunction-hostdev-pci.xml | 76 +++++++++++++++++++ ...eries-base-live+multifunction-hostdev-pci-2.xml | 61 +++++++++++++++ ...pseries-base-live+multifunction-hostdev-pci.xml | 69 +++++++++++++++++ 12 files changed, 475 insertions(+), 31 deletions(-) create mode 100644 tests/qemuhotplugtestdevices/qemuhotplug-multifunction-hostdev-pci-2.xml create mode 100644 tests/qemuhotplugtestdevices/qemuhotplug-multifunction-hostdev-pci.xml create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-base-live+multifunction-hostdev-pci.xml create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+multifunction-hostdev-pci-2.xml create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+multifunction-hostdev-pci.xml diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index c0a0af525f..1cfaf01540 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -33,6 +33,8 @@ #include "qemu_capabilities.h" #include "qemu_migration.h" #include "qemu_security.h" +#include "qemu_hotplug.h" +#include "qemu_hostdev.h" #include "viralloc.h" #include "virlog.h" #include "virerror.h" diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 7bee4fb937..ee743321dd 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -25,6 +25,7 @@ #include "qemu_domain_address.h" #include "qemu_domain.h" +#include "domain_conf.h" #include "viralloc.h" #include "virhostdev.h" #include "virerror.h" @@ -3206,6 +3207,73 @@ qemuDomainEnsurePCIAddress(virDomainObjPtr obj, info->pciConnectFlags); } + +int +qemuDomainPCIMultifunctionHostdevEnsurePCIAddresses(virDomainObjPtr vm, + virDomainDeviceDefListPtr devlist, + virQEMUDriverPtr driver) +{ + int ret = -1, aggrslotidx = 0; + virBitmapPtr slotmap = NULL; + size_t i; + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainPCIMultifunctionAddressInfoPtr devinfos = NULL; + + if (devlist->count > VIR_PCI_MAX_FUNCTIONS) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("More devices per slot found")); + return -1; + } + + for (i = 0; i < devlist->count; i++) + qemuDomainFillDevicePCIConnectFlags(vm->def, devlist->devs[i], priv->qemuCaps, driver); + + if (VIR_REALLOC_N(vm->def->hostdevs, vm->def->nhostdevs + devlist->count) < 0) + return -1; + + /* Temporarily add the devices to the domain def to get the + * next aggregateIdx */ + for (i = 0; i < devlist->count; i++) + vm->def->hostdevs[vm->def->nhostdevs++] = devlist->devs[i]->data.hostdev; + + for (i = 0; i < devlist->count; i++) { + if (qemuDomainIsPSeries(vm->def)) { + /* Isolation groups are only relevant for pSeries guests */ + if (qemuDomainFillDeviceIsolationGroup(vm->def, devlist->devs[i]) < 0) + return -1; + } + qemuDomainSetDeviceSlotAggregateIdx(vm->def, devlist->devs[i]); + aggrslotidx = aggrslotidx ? aggrslotidx : devlist->devs[i]->data.hostdev->info->aggregateSlotIdx; + if (aggrslotidx != devlist->devs[i]->data.hostdev->info->aggregateSlotIdx) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Devices belong to different PCI slots")); + return -1; + } + } + + for (i = 0; i < devlist->count; i++) + vm->def->hostdevs[--(vm->def->nhostdevs)] = NULL; + + slotmap = virDomainDefHostdevGetPCIOnlineFunctionMap(vm->def, aggrslotidx); + if (!virBitmapIsAllClear(slotmap)) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("Device already assigned to guest")); + return -1; + } + + if (VIR_ALLOC(devinfos) < 0) + return -1; + + for (i = 0; i < devlist->count; i++) { + virPCIDeviceAddress addr = devlist->devs[i]->data.hostdev->source.subsys.u.pci.addr; + devinfos->infos[addr.function] = devlist->devs[i]->data.hostdev->info; + } + + ret = virDomainPCIAddressEnsureMultifunctionAddress(priv->pciaddrs, devinfos); + VIR_FREE(devinfos); + return ret; +} + void qemuDomainReleaseDeviceAddress(virDomainObjPtr vm, virDomainDeviceInfoPtr info, diff --git a/src/qemu/qemu_domain_address.h b/src/qemu/qemu_domain_address.h index e1cc467714..43d1de889e 100644 --- a/src/qemu/qemu_domain_address.h +++ b/src/qemu/qemu_domain_address.h @@ -51,6 +51,11 @@ int qemuDomainEnsurePCIAddress(virDomainObjPtr obj, virQEMUDriverPtr driver) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); +int +qemuDomainPCIMultifunctionHostdevEnsurePCIAddresses(virDomainObjPtr obj, + virDomainDeviceDefListPtr devlist, + virQEMUDriverPtr driver); + int qemuDomainFillDeviceIsolationGroup(virDomainDefPtr def, virDomainDeviceDefPtr dev) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index d2e10082ea..fb14475d8c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7744,15 +7744,26 @@ qemuDomainAttachDeviceLiveInternal(virDomainObjPtr vm, static int qemuDomainAttachDeviceLive(virDomainObjPtr vm, - virDomainDeviceDefPtr dev, + virDomainDeviceDefListPtr devlist, virQEMUDriverPtr driver) { int ret = -1; + size_t i; - if (virDomainDefCompatibleDevice(vm->def, dev, NULL) < 0) - return -1; + for (i = 0; i < devlist->count; i++) + if (virDomainDefCompatibleDevice(vm->def, devlist->devs[i], NULL) < 0) + return ret; + + if (devlist->count > 1) { + ret = qemuDomainAttachMultifunctionDevice(vm, devlist, driver); + if (ret == 0) { + for (i = 0; i < devlist->count; i++) + devlist->devs[i]->data.hostdev = NULL; + } + } else if (devlist->count == 1) { + ret = qemuDomainAttachDeviceLiveInternal(vm, devlist->devs[0], driver); + } - ret = qemuDomainAttachDeviceLiveInternal(vm, dev, driver); if (ret == 0) ret = qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE); @@ -8178,16 +8189,22 @@ qemuDomainAttachDeviceConfigInternal(virDomainDefPtr vmdef, static int qemuDomainAttachDeviceConfig(virDomainDefPtr vmdef, - virDomainDeviceDefPtr dev, + virDomainDeviceDefListPtr devlist, virCapsPtr caps, unsigned int parse_flags, virDomainXMLOptionPtr xmlopt) { - if (virDomainDefCompatibleDevice(vmdef, dev, NULL) < 0) - return -1; + size_t i; - if (qemuDomainAttachDeviceConfigInternal(vmdef, dev)) - return -1; + for (i = 0; i < devlist->count; i++) { + if (virDomainDefCompatibleDevice(vmdef, devlist->devs[i], NULL) < 0) + return -1; + } + + for (i = 0; i < devlist->count; i++) { + if (qemuDomainAttachDeviceConfigInternal(vmdef, devlist->devs[i])) + return -1; + } if (virDomainDefPostParse(vmdef, caps, parse_flags, xmlopt, NULL) < 0) return -1; @@ -8493,9 +8510,11 @@ qemuDomainAttachDeviceLiveAndConfig(virDomainObjPtr vm, const char *xml, unsigned int flags) { + size_t i; virDomainDefPtr vmdef = NULL; virQEMUDriverConfigPtr cfg = NULL; - virDomainDeviceDefPtr dev = NULL, dev_copy = NULL; + virDomainDeviceDefListPtr devlist = NULL, devcopylist = NULL; + virDomainDeviceDefListData data = {.def = vm->def, .xmlopt = driver->xmlopt, .caps = NULL}; int ret = -1; virCapsPtr caps = NULL; unsigned int parse_flags = VIR_DOMAIN_DEF_PARSE_INACTIVE | @@ -8508,15 +8527,16 @@ qemuDomainAttachDeviceLiveAndConfig(virDomainObjPtr vm, if (!(caps = virQEMUDriverGetCapabilities(driver, false))) goto cleanup; + data.caps = caps; - dev = dev_copy = virDomainDeviceDefParse(xml, vm->def, - caps, driver->xmlopt, - parse_flags); - if (dev == NULL) + devlist = qemuDomainDeviceParseXMLMany(xml, &data, parse_flags); + if (!devlist) goto cleanup; + devcopylist = devlist; - if (virDomainDeviceValidateAliasForHotplug(vm, dev, flags) < 0) - goto cleanup; + for (i = 0; i < devlist->count; i++) + if (virDomainDeviceValidateAliasForHotplug(vm, devlist->devs[i], flags) < 0) + goto cleanup; if (flags & VIR_DOMAIN_AFFECT_CONFIG && flags & VIR_DOMAIN_AFFECT_LIVE) { @@ -8524,8 +8544,8 @@ qemuDomainAttachDeviceLiveAndConfig(virDomainObjPtr vm, * create a deep copy of device as adding * to CONFIG takes one instance. */ - dev_copy = virDomainDeviceDefCopy(dev, vm->def, caps, driver->xmlopt); - if (!dev_copy) + devcopylist = virDomainDeviceDefListCopy(devlist, &data); + if (!devcopylist) goto cleanup; } @@ -8535,14 +8555,14 @@ qemuDomainAttachDeviceLiveAndConfig(virDomainObjPtr vm, if (!vmdef) goto cleanup; - if ((ret = qemuDomainAttachDeviceConfig(vmdef, dev, caps, + if ((ret = qemuDomainAttachDeviceConfig(vmdef, devlist, caps, parse_flags, driver->xmlopt)) < 0) goto cleanup; } if (flags & VIR_DOMAIN_AFFECT_LIVE) { - if ((ret = qemuDomainAttachDeviceLive(vm, dev_copy, driver)) < 0) + if ((ret = qemuDomainAttachDeviceLive(vm, devcopylist, driver)) < 0) goto cleanup; /* * update domain status forcibly because the domain status may be @@ -8566,9 +8586,9 @@ qemuDomainAttachDeviceLiveAndConfig(virDomainObjPtr vm, cleanup: virDomainDefFree(vmdef); - if (dev != dev_copy) - virDomainDeviceDefFree(dev_copy); - virDomainDeviceDefFree(dev); + if (devlist != devcopylist) + virDomainDeviceDefListFree(devcopylist); + virDomainDeviceDefListFree(devlist); virObjectUnref(cfg); virObjectUnref(caps); diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 7dffaf9502..a339e92bfa 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1402,6 +1402,88 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, } +static int +qemuiHostdevPCIMultifunctionDevicesListSort(const void *p1, + const void *p2) +{ + virDomainDeviceDefPtr a = *(virDomainDeviceDefPtr *) p1; + virDomainDeviceDefPtr b = *(virDomainDeviceDefPtr *) p2; + virPCIDeviceAddressPtr addr1 = &a->data.hostdev->source.subsys.u.pci.addr; + virPCIDeviceAddressPtr addr2 = &b->data.hostdev->source.subsys.u.pci.addr; + + return addr1->function - addr2->function; +} + + +int +qemuDomainAttachMultifunctionDevice(virDomainObjPtr vm, + virDomainDeviceDefListPtr devlist, + virQEMUDriverPtr driver) +{ + size_t i, d, h = devlist->count; + int ret = -1; + char *alias; + virObjectEventPtr event; + virQEMUCapsPtr qemuCaps = NULL; + virDomainHostdevDefPtr hostdev = NULL; + + if (!(qemuCaps = virQEMUCapsCacheLookup(driver->qemuCapsCache, + vm->def->emulator))) + return -1; + + qsort(devlist->devs, devlist->count, sizeof(*devlist->devs), qemuiHostdevPCIMultifunctionDevicesListSort); + + if (qemuDomainPCIMultifunctionHostdevEnsurePCIAddresses(vm, devlist, driver) < 0) + return -1; + + for (d = 0; d < devlist->count; d++) + if (qemuDomainAttachPCIHostDevicePrepare(driver, vm->def, devlist->devs[d]->data.hostdev, qemuCaps) < 0) + goto cleanup; + + /* Hotplug all functions, and Primary at last */ + for (h = devlist->count; h > 0; h--) { + /* The functions need not be contiguous, as a card may be sold with + * minimal functionality and then install the additional functions on + * purchase into any of the daughter-card connectors. + */ + hostdev = devlist->devs[h-1]->data.hostdev; + ret = qemuDomainAttachHostPCIDevice(driver, vm, hostdev); + if (ret) + goto release; + + alias = hostdev->info->alias; + hostdev = NULL; + + event = virDomainEventDeviceAddedNewFromObj(vm, alias); + qemuDomainEventQueue(driver, event); + } + + release: + /* Release addresses for the device which are not hotplugged. + */ + for (i = 0; i < h; i++) + qemuDomainReleaseDeviceAddress(vm, devlist->devs[i]->data.hostdev->info, NULL); + + cleanup: + /* If none are actually hotplugged and just detached from the + * host driver reattach the devices to host driver. + * + * If one of the hotplug failed, those which are already hotplugged cannot + * be unplugged as they are released by qemu only on guest reboot even + * if we issue device_del on them. + * So, dont attempt to reattach any of them. + * NB: Let them be in the guest as they are not used anyway without + * function-zero? + */ + if (d > 0 && h == devlist->count) { + for (i = 0; i < d; i++) + qemuHostdevReAttachPCIDevices(driver, vm->def->name, &devlist->devs[i]->data.hostdev, 1); + } + + return ret; +} + + void qemuDomainDelTLSObjects(virQEMUDriverPtr driver, virDomainObjPtr vm, diff --git a/src/qemu/qemu_hotplug.h b/src/qemu/qemu_hotplug.h index db0e1df79a..7a6e2dfb60 100644 --- a/src/qemu/qemu_hotplug.h +++ b/src/qemu/qemu_hotplug.h @@ -158,6 +158,10 @@ int qemuDomainAttachPCIHostDevicePrepare(virQEMUDriverPtr driver, virDomainDefPtr def, virDomainHostdevDefPtr dev, virQEMUCapsPtr qemuCaps); +int +qemuDomainAttachMultifunctionDevice(virDomainObjPtr vm, + virDomainDeviceDefListPtr devlist, + virQEMUDriverPtr driver); int qemuDomainChrInsert(virDomainDefPtr vmdef, diff --git a/tests/qemuhotplugtest.c b/tests/qemuhotplugtest.c index b5dca5e5c9..39f122e083 100644 --- a/tests/qemuhotplugtest.c +++ b/tests/qemuhotplugtest.c @@ -82,6 +82,8 @@ qemuHotplugCreateObjects(virDomainXMLOptionPtr xmlopt, virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_DEVICE_VFIO_PCI); virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_DEVICE_SPAPR_PCI_HOST_BRIDGE); virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_VIRTIO_PCI_DISABLE_LEGACY); + virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_HOST_PCI_MULTIDOMAIN); + virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_PCI_MULTIFUNCTION); if (event) virQEMUCapsSet(priv->qemuCaps, QEMU_CAPS_DEVICE_DEL_EVENT); @@ -115,9 +117,14 @@ qemuHotplugCreateObjects(virDomainXMLOptionPtr xmlopt, static int testQemuHotplugAttach(virDomainObjPtr vm, - virDomainDeviceDefPtr dev) + virDomainDeviceDefListPtr devlist) { int ret = -1; + virDomainDeviceDefPtr dev; + + if (devlist->count > 1) + return qemuDomainAttachMultifunctionDevice(vm, devlist, &driver); + dev = devlist->devs[0]; switch (dev->type) { case VIR_DOMAIN_DEVICE_DISK: @@ -249,7 +256,9 @@ testQemuHotplug(const void *data) bool keep = test->keep; unsigned int device_parse_flags = 0; virDomainObjPtr vm = NULL; - virDomainDeviceDefPtr dev = NULL; + virDomainDeviceDefPtr dev = NULL; /*temperory */ + virDomainDeviceDefListPtr devlist = NULL; + virDomainDeviceDefListData listdata; virCapsPtr caps = NULL; qemuMonitorTestPtr test_mon = NULL; qemuDomainObjPrivatePtr priv = NULL; @@ -286,10 +295,13 @@ testQemuHotplug(const void *data) if (test->action == ATTACH) device_parse_flags = VIR_DOMAIN_DEF_PARSE_INACTIVE; - if (!(dev = virDomainDeviceDefParse(device_xml, vm->def, - caps, driver.xmlopt, - device_parse_flags))) + listdata.def = vm->def; + listdata.xmlopt = driver.xmlopt; + listdata.caps = caps; + devlist = qemuDomainDeviceParseXMLMany(device_xml, &listdata, device_parse_flags); + if (!devlist) goto cleanup; + dev = devlist->devs[0]; /* temporary */ /* Now is the best time to feed the spoofed monitor with predefined * replies. */ @@ -319,11 +331,11 @@ testQemuHotplug(const void *data) switch (test->action) { case ATTACH: - ret = testQemuHotplugAttach(vm, dev); + ret = testQemuHotplugAttach(vm, devlist); if (ret == 0) { /* vm->def stolen dev->data.* so we just need to free the dev * envelope */ - VIR_FREE(dev); + virDomainDeviceDefListFreeShallow(devlist); } if (ret == 0 || fail) ret = testQemuHotplugCheckResult(vm, result_xml, @@ -357,7 +369,7 @@ testQemuHotplug(const void *data) virObjectUnref(vm); test->vm = NULL; } - virDomainDeviceDefFree(dev); + virDomainDeviceDefListFree(devlist); virObjectUnref(caps); qemuMonitorTestFree(test_mon); return ((ret < 0 && fail) || (!ret && !fail)) ? 0 : -1; @@ -856,6 +868,17 @@ mymain(void) "device_add", QMP_OK); DO_TEST_DETACH("pseries-base-live", "hostdev-pci", false, false, "device_del", QMP_DEVICE_DELETED("hostdev0") QMP_OK); + DO_TEST_ATTACH("base-live", "multifunction-hostdev-pci", false, false, + "device_add", QMP_OK, + "device_add", QMP_OK, + "device_add", QMP_OK); + + qemuTestSetHostArch(driver.caps, VIR_ARCH_PPC64); + DO_TEST_ATTACH("pseries-base-live", "multifunction-hostdev-pci-2", false, false, + "device_add", QMP_OK, + "device_add", QMP_OK, + "device_add", QMP_OK); + qemuTestSetHostArch(driver.caps, VIR_ARCH_X86_64); DO_TEST_ATTACH("base-live", "watchdog", false, true, "watchdog-set-action", QMP_OK, diff --git a/tests/qemuhotplugtestdevices/qemuhotplug-multifunction-hostdev-pci-2.xml b/tests/qemuhotplugtestdevices/qemuhotplug-multifunction-hostdev-pci-2.xml new file mode 100644 index 0000000000..02e2236e5d --- /dev/null +++ b/tests/qemuhotplugtestdevices/qemuhotplug-multifunction-hostdev-pci-2.xml @@ -0,0 +1,14 @@ +<devices> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0001' bus='0x01' slot='0x00' function='0x1'/> + </source> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0001' bus='0x01' slot='0x00' function='0x0'/> + </source> + </hostdev> +</devices> diff --git a/tests/qemuhotplugtestdevices/qemuhotplug-multifunction-hostdev-pci.xml b/tests/qemuhotplugtestdevices/qemuhotplug-multifunction-hostdev-pci.xml new file mode 100644 index 0000000000..54bb627e30 --- /dev/null +++ b/tests/qemuhotplugtestdevices/qemuhotplug-multifunction-hostdev-pci.xml @@ -0,0 +1,20 @@ +<devices> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x1'/> + </source> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x2'/> + </source> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x0'/> + </source> + </hostdev> +</devices> diff --git a/tests/qemuhotplugtestdomains/qemuhotplug-base-live+multifunction-hostdev-pci.xml b/tests/qemuhotplugtestdomains/qemuhotplug-base-live+multifunction-hostdev-pci.xml new file mode 100644 index 0000000000..f53fe7601b --- /dev/null +++ b/tests/qemuhotplugtestdomains/qemuhotplug-base-live+multifunction-hostdev-pci.xml @@ -0,0 +1,76 @@ +<domain type='kvm' id='7'> + <name>hotplug</name> + <uuid>d091ea82-29e6-2e34-3005-f02617b36e87</uuid> + <memory unit='KiB'>4194304</memory> + <currentMemory unit='KiB'>4194304</currentMemory> + <vcpu placement='static'>4</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <features> + <acpi/> + <apic/> + <pae/> + </features> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>restart</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <controller type='usb' index='0'> + <alias name='usb'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='ide' index='0'> + <alias name='ide'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='scsi' index='0' model='virtio-scsi'> + <alias name='scsi0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </controller> + <controller type='pci' index='0' model='pci-root'> + <alias name='pci'/> + </controller> + <controller type='virtio-serial' index='0'> + <alias name='virtio-serial0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </controller> + <input type='mouse' bus='ps2'> + <alias name='input0'/> + </input> + <input type='keyboard' bus='ps2'> + <alias name='input1'/> + </input> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x2'/> + </source> + <alias name='hostdev0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x1'/> + </source> + <alias name='hostdev1'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x0'/> + </source> + <alias name='hostdev2'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/> + </hostdev> + <memballoon model='none'> + <alias name='balloon0'/> + </memballoon> + </devices> + <seclabel type='none' model='none'/> +</domain> diff --git a/tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+multifunction-hostdev-pci-2.xml b/tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+multifunction-hostdev-pci-2.xml new file mode 100644 index 0000000000..85dc48fdb8 --- /dev/null +++ b/tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+multifunction-hostdev-pci-2.xml @@ -0,0 +1,61 @@ +<domain type='kvm' id='7'> + <name>hotplug</name> + <uuid>d091ea82-29e6-2e34-3005-f02617b36e87</uuid> + <memory unit='KiB'>4194304</memory> + <currentMemory unit='KiB'>4194304</currentMemory> + <vcpu placement='static'>4</vcpu> + <os> + <type arch='ppc64' machine='pseries'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-ppc64</emulator> + <controller type='pci' index='0' model='pci-root'> + <model name='spapr-pci-host-bridge'/> + <target index='0'/> + <alias name='pci.0'/> + </controller> + <controller type='pci' index='1' model='pci-root'> + <model name='spapr-pci-host-bridge'/> + <target index='1'/> + <alias name='pci.1'/> + </controller> + <controller type='usb' index='0'> + <alias name='usb'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> + </controller> + <input type='keyboard' bus='usb'> + <alias name='input0'/> + <address type='usb' bus='0' port='1'/> + </input> + <input type='mouse' bus='usb'> + <alias name='input1'/> + <address type='usb' bus='0' port='2'/> + </input> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0001' bus='0x01' slot='0x00' function='0x1'/> + </source> + <alias name='hostdev0'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x1'/> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0001' bus='0x01' slot='0x00' function='0x0'/> + </source> + <alias name='hostdev1'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0' multifunction='on'/> + </hostdev> + <memballoon model='none'> + <alias name='balloon0'/> + </memballoon> + <panic model='pseries'/> + </devices> + <seclabel type='none' model='none'/> +</domain> diff --git a/tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+multifunction-hostdev-pci.xml b/tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+multifunction-hostdev-pci.xml new file mode 100644 index 0000000000..9338d42e2e --- /dev/null +++ b/tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+multifunction-hostdev-pci.xml @@ -0,0 +1,69 @@ +<domain type='kvm' id='7'> + <name>hotplug</name> + <uuid>d091ea82-29e6-2e34-3005-f02617b36e87</uuid> + <memory unit='KiB'>4194304</memory> + <currentMemory unit='KiB'>4194304</currentMemory> + <vcpu placement='static'>4</vcpu> + <os> + <type arch='ppc64' machine='pseries'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-ppc64</emulator> + <controller type='pci' index='0' model='pci-root'> + <model name='spapr-pci-host-bridge'/> + <target index='0'/> + <alias name='pci.0'/> + </controller> + <controller type='pci' index='1' model='pci-root'> + <model name='spapr-pci-host-bridge'/> + <target index='1'/> + <alias name='pci.1'/> + </controller> + <controller type='usb' index='0'> + <alias name='usb'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> + </controller> + <input type='keyboard' bus='usb'> + <alias name='input0'/> + <address type='usb' bus='0' port='1'/> + </input> + <input type='mouse' bus='usb'> + <alias name='input1'/> + <address type='usb' bus='0' port='2'/> + </input> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x2'/> + </source> + <alias name='hostdev0'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x2'/> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x1'/> + </source> + <alias name='hostdev1'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x1'/> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver name='vfio'/> + <source> + <address domain='0x0005' bus='0x90' slot='0x01' function='0x0'/> + </source> + <alias name='hostdev2'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0' multifunction='on'/> + </hostdev> + <memballoon model='none'> + <alias name='balloon0'/> + </memballoon> + <panic model='pseries'/> + </devices> + <seclabel type='none' model='none'/> +</domain>

The PCI hostdevs once part of the domain, cant be changed. So, prevent attempts. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/qemu/qemu_driver.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index fb14475d8c..94f76979e5 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -8646,6 +8646,8 @@ static int qemuDomainUpdateDeviceFlags(virDomainPtr dom, virQEMUDriverPtr driver = dom->conn->privateData; virDomainObjPtr vm = NULL; virDomainDefPtr vmdef = NULL; + virDomainDeviceDefListPtr devlist; + virDomainDeviceDefListData data = {.xmlopt = driver->xmlopt, .caps = NULL}; virDomainDeviceDefPtr dev = NULL, dev_copy = NULL; bool force = (flags & VIR_DOMAIN_DEVICE_MODIFY_FORCE) != 0; int ret = -1; @@ -8663,9 +8665,11 @@ static int qemuDomainUpdateDeviceFlags(virDomainPtr dom, if (!(caps = virQEMUDriverGetCapabilities(driver, false))) goto cleanup; + data.caps = caps; if (!(vm = qemuDomObjFromDomain(dom))) goto cleanup; + data.def = vm->def; if (virDomainUpdateDeviceFlagsEnsureACL(dom->conn, vm->def, flags) < 0) goto cleanup; @@ -8673,12 +8677,18 @@ static int qemuDomainUpdateDeviceFlags(virDomainPtr dom, if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; - dev = dev_copy = virDomainDeviceDefParse(xml, vm->def, - caps, driver->xmlopt, - parse_flags); - if (dev == NULL) + devlist = qemuDomainDeviceParseXMLMany(xml, &data, parse_flags); + if (!devlist) goto endjob; + if (devlist->count > 1) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("update of device multifunction devices is not supported")); + goto endjob; + } + + dev = dev_copy = devlist->devs[0]; + if (virDomainObjUpdateModificationImpact(vm, &flags) < 0) goto endjob;

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/qemu/qemu_hotplug.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index a339e92bfa..5f6302eaf9 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -4993,17 +4993,8 @@ qemuDomainDetachHostPCIDevice(virQEMUDriverPtr driver, virDomainHostdevDefPtr detach) { qemuDomainObjPrivatePtr priv = vm->privateData; - virDomainHostdevSubsysPCIPtr pcisrc = &detach->source.subsys.u.pci; int ret; - if (qemuIsMultiFunctionDevice(vm->def, detach->info)) { - virReportError(VIR_ERR_OPERATION_FAILED, - _("cannot hot unplug multifunction PCI device: %.4x:%.2x:%.2x.%.1x"), - pcisrc->addr.domain, pcisrc->addr.bus, - pcisrc->addr.slot, pcisrc->addr.function); - return -1; - } - qemuDomainMarkDeviceForRemoval(vm, detach->info); qemuDomainObjEnterMonitor(driver, vm); @@ -5094,12 +5085,22 @@ qemuDomainDetachThisHostDevice(virQEMUDriverPtr driver, virDomainHostdevDefPtr detach) { int ret = -1; + virDomainHostdevSubsysPtr subsys = &detach->source.subsys; + virDomainHostdevSubsysPCIPtr pcisrc = &subsys->u.pci; if (qemuAssignDeviceHostdevAlias(vm->def, &detach->info->alias, -1) < 0) return -1; switch (detach->source.subsys.type) { case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: + if (qemuIsMultiFunctionDevice(vm->def, detach->info)) { + virReportError(VIR_ERR_OPERATION_FAILED, + _("cannot hot unplug multifunction PCI device: %.4x:%.2x:%.2x.%.1x"), + pcisrc->addr.domain, pcisrc->addr.bus, + pcisrc->addr.slot, pcisrc->addr.function); + return -1; + } + ret = qemuDomainDetachHostPCIDevice(driver, vm, detach); break; case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB:

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> --- src/qemu/qemu_driver.c | 48 +++++++++++++------- src/qemu/qemu_hotplug.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_hotplug.h | 5 ++ tests/qemuhotplugtest.c | 26 ++++++++--- 4 files changed, 165 insertions(+), 25 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 94f76979e5..0640395b00 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7858,12 +7858,18 @@ qemuDomainDetachDeviceLiveInternal(virDomainObjPtr vm, static int qemuDomainDetachDeviceLive(virDomainObjPtr vm, - virDomainDeviceDefPtr dev, + virDomainDeviceDefListPtr devlist, virQEMUDriverPtr driver) { int ret = -1; + virDomainDeviceDefPtr dev = devlist->devs[0]; + + if (devlist->count > 1) { + ret = qemuDomainDetachMultifunctionDevice(vm, devlist, driver); + } else { + ret = qemuDomainDetachDeviceLiveInternal(vm, dev, driver); + } - ret = qemuDomainDetachDeviceLiveInternal(vm, dev, driver); if (ret == 0) ret = qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE); @@ -8392,17 +8398,24 @@ qemuDomainDetachDeviceConfigInternal(virDomainDefPtr vmdef, static int qemuDomainDetachDeviceConfig(virDomainDefPtr vmdef, - virDomainDeviceDefPtr dev, + virDomainDeviceDefListPtr devlist, virCapsPtr caps, unsigned int parse_flags, virDomainXMLOptionPtr xmlopt) { - if (qemuDomainDetachDeviceConfigInternal(vmdef, dev)) - return -1; + size_t i; + for (i = 0; i < devlist->count; i++) + if (qemuDomainDetachDeviceConfigInternal(vmdef, devlist->devs[i])) + return -1; if (virDomainDefPostParse(vmdef, caps, parse_flags, xmlopt, NULL) < 0) return -1; + /* Dont allow removing the primary function alone for a multifunction + * device leading to guest start failure later. */ + if (devlist->count > 1 && qemuDomainDefValidatePCIHostdevs(vmdef) < 0) + return -1; + return 0; } @@ -8765,8 +8778,9 @@ qemuDomainDetachDeviceLiveAndConfig(virQEMUDriverPtr driver, { virCapsPtr caps = NULL; virQEMUDriverConfigPtr cfg = NULL; - virDomainDeviceDefPtr dev = NULL, dev_copy = NULL; unsigned int parse_flags = VIR_DOMAIN_DEF_PARSE_SKIP_VALIDATE; + virDomainDeviceDefListPtr devlist = NULL, devcopylist = NULL; + virDomainDeviceDefListData data = {.def = vm->def, .xmlopt = driver->xmlopt, .caps = NULL}; virDomainDefPtr vmdef = NULL; int ret = -1; @@ -8775,6 +8789,7 @@ qemuDomainDetachDeviceLiveAndConfig(virQEMUDriverPtr driver, if (!(caps = virQEMUDriverGetCapabilities(driver, false))) goto cleanup; + data.caps = caps; cfg = virQEMUDriverGetConfig(driver); @@ -8782,11 +8797,10 @@ qemuDomainDetachDeviceLiveAndConfig(virQEMUDriverPtr driver, !(flags & VIR_DOMAIN_AFFECT_LIVE)) parse_flags |= VIR_DOMAIN_DEF_PARSE_INACTIVE; - dev = dev_copy = virDomainDeviceDefParse(xml, vm->def, - caps, driver->xmlopt, - parse_flags); - if (dev == NULL) + devlist = qemuDomainDeviceParseXMLMany(xml, &data, parse_flags); + if (!devlist) goto cleanup; + devcopylist = devlist; if (flags & VIR_DOMAIN_AFFECT_CONFIG && flags & VIR_DOMAIN_AFFECT_LIVE) { @@ -8794,8 +8808,8 @@ qemuDomainDetachDeviceLiveAndConfig(virQEMUDriverPtr driver, * create a deep copy of device as adding * to CONFIG takes one instance. */ - dev_copy = virDomainDeviceDefCopy(dev, vm->def, caps, driver->xmlopt); - if (!dev_copy) + devcopylist = virDomainDeviceDefListCopy(devlist, &data); + if (!devcopylist) goto cleanup; } @@ -8805,14 +8819,14 @@ qemuDomainDetachDeviceLiveAndConfig(virQEMUDriverPtr driver, if (!vmdef) goto cleanup; - if ((ret = qemuDomainDetachDeviceConfig(vmdef, dev, caps, + if ((ret = qemuDomainDetachDeviceConfig(vmdef, devlist, caps, parse_flags, driver->xmlopt)) < 0) goto cleanup; } if (flags & VIR_DOMAIN_AFFECT_LIVE) { - if ((ret = qemuDomainDetachDeviceLive(vm, dev_copy, driver)) < 0) + if ((ret = qemuDomainDetachDeviceLive(vm, devcopylist, driver)) < 0) goto cleanup; /* * update domain status forcibly because the domain status may be @@ -8837,9 +8851,9 @@ qemuDomainDetachDeviceLiveAndConfig(virQEMUDriverPtr driver, cleanup: virObjectUnref(caps); virObjectUnref(cfg); - if (dev != dev_copy) - virDomainDeviceDefFree(dev_copy); - virDomainDeviceDefFree(dev); + if (devlist != devcopylist) + virDomainDeviceDefListFree(devcopylist); + virDomainDeviceDefListFree(devlist); virDomainDefFree(vmdef); return ret; } diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 5f6302eaf9..87acd5cf69 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -5210,6 +5210,117 @@ int qemuDomainDetachHostDevice(virQEMUDriverPtr driver, int +qemuDomainDetachMultifunctionDevice(virDomainObjPtr vm, + virDomainDeviceDefListPtr devlist, + virQEMUDriverPtr driver) +{ + size_t i; + int ret = -1; + virBitmapPtr tbdmap = NULL, onlinemap = NULL; + int *functions = NULL; + virDomainHostdevDefPtr hostdev, detach = NULL; + virDomainHostdevSubsysPtr subsys = NULL; + int slotaggridx = 0; + virDomainHostdevSubsysPCIPtr pcisrc = NULL; + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_DEL_EVENT)) + return -1; + +#define FOR_EACH_DEV_IN_DEVLIST() \ + for (i = 0; i < devlist->count; i++) { \ + hostdev = devlist->devs[i]->data.hostdev; \ + subsys = &hostdev->source.subsys; \ + pcisrc = &subsys->u.pci; \ + virDomainHostdevFind(vm->def, hostdev, &detach); + + FOR_EACH_DEV_IN_DEVLIST() + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("hot unplug is not supported for hostdev mode '%s'"), + virDomainHostdevModeTypeToString(hostdev->mode)); + goto cleanup; + } + } + + FOR_EACH_DEV_IN_DEVLIST() + if (!detach) { + virReportError(VIR_ERR_DEVICE_MISSING, + _("host pci device %.4x:%.2x:%.2x.%.1x not found"), + pcisrc->addr.domain, pcisrc->addr.bus, + pcisrc->addr.slot, pcisrc->addr.function); + goto cleanup; + } + } + + /* Check if the devices belong to same guest slot.*/ + FOR_EACH_DEV_IN_DEVLIST() + /* Pick one aggregateSlotIdx and compare against rest of them */ + slotaggridx = slotaggridx ? slotaggridx : detach->info->aggregateSlotIdx; + if (slotaggridx != detach->info->aggregateSlotIdx) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("host pci device %.4x:%.2x:%.2x.%.1x doesnt not belong to" + " same slot"), pcisrc->addr.domain, pcisrc->addr.bus, + pcisrc->addr.slot, pcisrc->addr.function); + goto cleanup; + } + } + + /* Check if the whole slot is being removed or not */ + onlinemap = virDomainDefHostdevGetPCIOnlineFunctionMap(vm->def, slotaggridx); + FOR_EACH_DEV_IN_DEVLIST() + ignore_value(virBitmapClearBit(onlinemap, pcisrc->addr.function)); + } + + if (!virBitmapIsAllClear(onlinemap)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("hot unplug of partial PCI slot not allowed")); + goto cleanup; + } + + /* Mark all aliases for removal */ + memset(&priv->unplug, 0, sizeof(priv->unplug)); + FOR_EACH_DEV_IN_DEVLIST() + if (qemuAssignDeviceHostdevAlias(vm->def, &detach->info->alias, -1) < 0) + goto reset; + + qemuDomainMarkDeviceAliasForRemoval(vm, detach->info->alias, false); + } + + qemuDomainObjEnterMonitor(driver, vm); + FOR_EACH_DEV_IN_DEVLIST() + if (qemuMonitorDelDevice(priv->mon, detach->info->alias) < 0) { + ignore_value(qemuDomainObjExitMonitor(driver, vm)); + if (virDomainObjIsActive(vm)) + virDomainAuditHostdev(vm, detach, "detach", false); + goto reset; + } + if (ARCH_IS_X86(vm->def->os.arch)) + break; /* deleting any one is enough! */ + } + + if (qemuDomainObjExitMonitor(driver, vm) < 0) + ret = -1; + + if ((ret = qemuDomainWaitForDeviceRemoval(vm)) == 1) { + FOR_EACH_DEV_IN_DEVLIST() + ret = qemuDomainRemoveHostDevice(driver, vm, detach); + } + } + reset: + qemuDomainResetDeviceRemoval(vm); + cleanup: + VIR_FREE(functions); + virBitmapFree(onlinemap); + virBitmapFree(tbdmap); + +#undef FOR_EACH_DEV_IN_DEVLIST + + return ret; +} + + +int qemuDomainDetachShmemDevice(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainShmemDefPtr dev) diff --git a/src/qemu/qemu_hotplug.h b/src/qemu/qemu_hotplug.h index 7a6e2dfb60..bc850ecc44 100644 --- a/src/qemu/qemu_hotplug.h +++ b/src/qemu/qemu_hotplug.h @@ -164,6 +164,11 @@ qemuDomainAttachMultifunctionDevice(virDomainObjPtr vm, virQEMUDriverPtr driver); int +qemuDomainDetachMultifunctionDevice(virDomainObjPtr vm, + virDomainDeviceDefListPtr devlist, + virQEMUDriverPtr driver); + +int qemuDomainChrInsert(virDomainDefPtr vmdef, virDomainChrDefPtr chr); virDomainChrDefPtr diff --git a/tests/qemuhotplugtest.c b/tests/qemuhotplugtest.c index 39f122e083..61557ce562 100644 --- a/tests/qemuhotplugtest.c +++ b/tests/qemuhotplugtest.c @@ -156,9 +156,14 @@ testQemuHotplugAttach(virDomainObjPtr vm, static int testQemuHotplugDetach(virDomainObjPtr vm, - virDomainDeviceDefPtr dev) + virDomainDeviceDefListPtr devlist) { int ret = -1; + virDomainDeviceDefPtr dev; + + if (devlist->count > 1) + return qemuDomainDetachMultifunctionDevice(vm, devlist, &driver); + dev = devlist->devs[0]; switch (dev->type) { case VIR_DOMAIN_DEVICE_DISK: @@ -256,7 +261,6 @@ testQemuHotplug(const void *data) bool keep = test->keep; unsigned int device_parse_flags = 0; virDomainObjPtr vm = NULL; - virDomainDeviceDefPtr dev = NULL; /*temperory */ virDomainDeviceDefListPtr devlist = NULL; virDomainDeviceDefListData listdata; virCapsPtr caps = NULL; @@ -301,7 +305,6 @@ testQemuHotplug(const void *data) devlist = qemuDomainDeviceParseXMLMany(device_xml, &listdata, device_parse_flags); if (!devlist) goto cleanup; - dev = devlist->devs[0]; /* temporary */ /* Now is the best time to feed the spoofed monitor with predefined * replies. */ @@ -343,14 +346,14 @@ testQemuHotplug(const void *data) break; case DETACH: - ret = testQemuHotplugDetach(vm, dev); + ret = testQemuHotplugDetach(vm, devlist); if (ret == 0 || fail) ret = testQemuHotplugCheckResult(vm, domain_xml, domain_filename, fail); break; case UPDATE: - ret = testQemuHotplugUpdate(vm, dev); + ret = testQemuHotplugUpdate(vm, devlist->devs[0]); } cleanup: @@ -868,16 +871,23 @@ mymain(void) "device_add", QMP_OK); DO_TEST_DETACH("pseries-base-live", "hostdev-pci", false, false, "device_del", QMP_DEVICE_DELETED("hostdev0") QMP_OK); - DO_TEST_ATTACH("base-live", "multifunction-hostdev-pci", false, false, + DO_TEST_ATTACH_EVENT("base-live", "multifunction-hostdev-pci", false, true, "device_add", QMP_OK, "device_add", QMP_OK, "device_add", QMP_OK); + DO_TEST_DETACH("base-live", "multifunction-hostdev-pci", false, false, + "device_del", QMP_DEVICE_DELETED("hostdev2") + QMP_DEVICE_DELETED("hostdev1") + QMP_DEVICE_DELETED("hostdev0") QMP_OK); qemuTestSetHostArch(driver.caps, VIR_ARCH_PPC64); - DO_TEST_ATTACH("pseries-base-live", "multifunction-hostdev-pci-2", false, false, - "device_add", QMP_OK, + DO_TEST_ATTACH_EVENT("pseries-base-live", "multifunction-hostdev-pci-2", false, true, "device_add", QMP_OK, "device_add", QMP_OK); + DO_TEST_DETACH("pseries-base-live", "multifunction-hostdev-pci-2", false, false, + "device_del", QMP_OK, + "device_del", QMP_DEVICE_DELETED("hostdev1") + QMP_DEVICE_DELETED("hostdev0") QMP_OK); qemuTestSetHostArch(driver.caps, VIR_ARCH_X86_64); DO_TEST_ATTACH("base-live", "watchdog", false, true,

On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:
Hi All,
I have revisited/rewritten my previously posted patches. Here is the RFC. Since this patchset is a complete rewrite, I am starting with v1 here.
The semantics is as discussed before https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
As I went on to refactor the code to support multifunction virtio devices, I realised the abort/cleanup path would be a nightmare there, in case of failures. So, dropped that attempt. The current RFC limits to the real practical use cases of Multifunction PCI hostdevices. All new test code to support multifunction PCI hostdevices and test cases are added to prove the functionality.
I guess I'm not really understanding the use case here. With SRIOV devices, you can already choose between assigning either the physical function (which gives the guest access to all virtual functions), or to assign an arbitrary set of individiual functions to various guests. Why do we need to be able to list many <hostdev> at the same time when hotplugging to assign multiple functions. Basically can you provide a full description of the problem you are trying to solve and why existing functionality isn't sufficient. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 03/15/2018 03:31 PM, Daniel P. Berrangé wrote:
On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:
Hi All,
I have revisited/rewritten my previously posted patches. Here is the RFC. Since this patchset is a complete rewrite, I am starting with v1 here.
The semantics is as discussed before https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
As I went on to refactor the code to support multifunction virtio devices, I realised the abort/cleanup path would be a nightmare there, in case of failures. So, dropped that attempt. The current RFC limits to the real practical use cases of Multifunction PCI hostdevices. All new test code to support multifunction PCI hostdevices and test cases are added to prove the functionality. I guess I'm not really understanding the use case here. With SRIOV devices, you can already choose between assigning either the physical function (which gives the guest access to all virtual functions), or to assign an arbitrary set of individiual functions to various guests. Why do we need to be able to list many <hostdev> at the same time when hotplugging to assign multiple functions.
Basically can you provide a full description of the problem you are trying to solve and why existing functionality isn't sufficient.
Hi Daniel, This is for cards which may not necessarily be networking cards. Or may be a mix of networking and storage. Suppose, user has below card 0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10) 0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10) If user wants to hotplug this card to guest, He has to detach all the functions from host driver, then hotplug 0005:01:00.0, 0005:01:00.1, so on individually. But, today with each hotplug of the function, each <hostdev> goes to different guest slot. Whereas, PCI requires all of them to be on the same slot. This is not supported on libvirt today. The multifunction cards cant be hotplugged to guest today with the individual <hostdev>, as the operation is queued by qemu till the function zero of guest slot is hotplugged. On function zero hotplug, the qemu sends out the event to guest for device probing where all the previously hotplugged functions from the same slot are discovered. So, grouping the <hostdev>s within the <devices> would become necessary to make the whole thing a single operation. The patches try to fix this aspect of the use case. Thanks, Shivaprasad
Regards, Daniel

On Thu, Mar 15, 2018 at 07:54:47PM +0530, Shivaprasad G Bhat wrote:
On 03/15/2018 03:31 PM, Daniel P. Berrangé wrote:
On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:
Hi All,
I have revisited/rewritten my previously posted patches. Here is the RFC. Since this patchset is a complete rewrite, I am starting with v1 here.
The semantics is as discussed before https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
As I went on to refactor the code to support multifunction virtio devices, I realised the abort/cleanup path would be a nightmare there, in case of failures. So, dropped that attempt. The current RFC limits to the real practical use cases of Multifunction PCI hostdevices. All new test code to support multifunction PCI hostdevices and test cases are added to prove the functionality. I guess I'm not really understanding the use case here. With SRIOV devices, you can already choose between assigning either the physical function (which gives the guest access to all virtual functions), or to assign an arbitrary set of individiual functions to various guests. Why do we need to be able to list many <hostdev> at the same time when hotplugging to assign multiple functions.
Basically can you provide a full description of the problem you are trying to solve and why existing functionality isn't sufficient.
Hi Daniel,
This is for cards which may not necessarily be networking cards. Or may be a mix of networking and storage.
Suppose, user has below card 0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10) 0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
Ok, so this is a device with many functions, but which isn't SRIOV based, and the goal is to assign the physical device to the guest, such that guest has all functions available.
If user wants to hotplug this card to guest, He has to detach all the functions from host driver, then hotplug 0005:01:00.0, 0005:01:00.1, so on individually. But, today with each hotplug of the function, each <hostdev> goes to different guest slot. Whereas, PCI requires all of them to be on the same slot. This is not supported on libvirt today.
The multifunction cards cant be hotplugged to guest today with the individual <hostdev>, as the operation is queued by qemu till the function zero of guest slot is hotplugged. On function zero hotplug, the qemu sends out the event to guest for device probing where all the previously hotplugged functions from the same slot are discovered. So, grouping the <hostdev>s within the <devices> would become necessary to make the whole thing a single operation.
So IIUC, from the patches, if the user wants to assign the physical device to the guest, they would need to provide XML that looked like this to the virDomainAttachDevice() method: <devices> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x1'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x2'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x3'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x4'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x5'/> </source> </hostdev> </devices> Where as if the device were SRIOV based, they would only have to provide <device> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/> </source> </hostdev> </device> for the guest to get access to all functions. I find this difference in behaviour and approach really unpleasant. I think that they user should only need to provide the the address of the physical device, in both cases. At most perhaps we need a new attribute multifunction="on" on the source address to tell libvirt that it should attach all the functions, not just the first <device> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0' mutlifunction="on"/> </source> </hostdev> </device> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, 15 Mar 2018 14:33:55 +0000 Daniel P. Berrangé <berrange@redhat.com> wrote:
On Thu, Mar 15, 2018 at 07:54:47PM +0530, Shivaprasad G Bhat wrote:
On 03/15/2018 03:31 PM, Daniel P. Berrangé wrote:
On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:
Hi All,
I have revisited/rewritten my previously posted patches. Here is the RFC. Since this patchset is a complete rewrite, I am starting with v1 here.
The semantics is as discussed before https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
As I went on to refactor the code to support multifunction virtio devices, I realised the abort/cleanup path would be a nightmare there, in case of failures. So, dropped that attempt. The current RFC limits to the real practical use cases of Multifunction PCI hostdevices. All new test code to support multifunction PCI hostdevices and test cases are added to prove the functionality. I guess I'm not really understanding the use case here. With SRIOV devices, you can already choose between assigning either the physical function (which gives the guest access to all virtual functions), or
Say what? If a guest is assigned a PF, they get the PF, they don't get to enable SR-IOV to also get the VFs. But SR-IOV and multifunction are far from synonymous nor is SR-IOV ubiquitous to all use cases, so I don't know why we're bringing SR-IOV into this discussion.
to assign an arbitrary set of individiual functions to various guests. Why do we need to be able to list many <hostdev> at the same time when hotplugging to assign multiple functions.
Basically can you provide a full description of the problem you are trying to solve and why existing functionality isn't sufficient.
Hi Daniel,
This is for cards which may not necessarily be networking cards. Or may be a mix of networking and storage.
Suppose, user has below card 0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10) 0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
Ok, so this is a device with many functions, but which isn't SRIOV based, and the goal is to assign the physical device to the guest, such that guest has all functions available.
If user wants to hotplug this card to guest, He has to detach all the functions from host driver, then hotplug 0005:01:00.0, 0005:01:00.1, so on individually. But, today with each hotplug of the function, each <hostdev> goes to different guest slot. Whereas, PCI requires all of them to be on the same slot. This is not supported on libvirt today.
The multifunction cards cant be hotplugged to guest today with the individual <hostdev>, as the operation is queued by qemu till the function zero of guest slot is hotplugged. On function zero hotplug, the qemu sends out the event to guest for device probing where all the previously hotplugged functions from the same slot are discovered. So, grouping the <hostdev>s within the <devices> would become necessary to make the whole thing a single operation.
So IIUC, from the patches, if the user wants to assign the physical device to the guest, they would need to provide XML that looked like this to the virDomainAttachDevice() method:
<devices> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x1'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x2'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x3'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x4'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x5'/> </source> </hostdev> </devices>
Where as if the device were SRIOV based, they would only have to provide
<device> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/> </source> </hostdev> </device>
for the guest to get access to all functions.
Since when has this been the case? (nit, the example is domain=0x5, bus=0x1,...)
I find this difference in behaviour and approach really unpleasant.
I think that they user should only need to provide the the address of the physical device, in both cases. At most perhaps we need a new attribute multifunction="on" on the source address to tell libvirt that it should attach all the functions, not just the first
<device> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0' mutlifunction="on"/> </source> </hostdev> </device>
Neither really bothers me, but I'm confused by the claimed existing handling of SR-IOV. Either you're assigning a PF and SR-IOV is irrelevant and unavailable to the guest or you're assigning a VF and, well, SR-IOV is still mostly irrelevant to libvirt unless someone decides to assign the PF hosting the VF or libvirt needs to do VF configuration via the PF. Thanks, Alex

On Thu, Mar 15, 2018 at 08:59:41AM -0600, Alex Williamson wrote:
On Thu, 15 Mar 2018 14:33:55 +0000 Daniel P. Berrangé <berrange@redhat.com> wrote:
On Thu, Mar 15, 2018 at 07:54:47PM +0530, Shivaprasad G Bhat wrote:
On 03/15/2018 03:31 PM, Daniel P. Berrangé wrote:
On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:
Hi All,
I have revisited/rewritten my previously posted patches. Here is the RFC. Since this patchset is a complete rewrite, I am starting with v1 here.
The semantics is as discussed before https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
As I went on to refactor the code to support multifunction virtio devices, I realised the abort/cleanup path would be a nightmare there, in case of failures. So, dropped that attempt. The current RFC limits to the real practical use cases of Multifunction PCI hostdevices. All new test code to support multifunction PCI hostdevices and test cases are added to prove the functionality. I guess I'm not really understanding the use case here. With SRIOV devices, you can already choose between assigning either the physical function (which gives the guest access to all virtual functions), or
Say what? If a guest is assigned a PF, they get the PF, they don't get to enable SR-IOV to also get the VFs. But SR-IOV and multifunction are far from synonymous nor is SR-IOV ubiquitous to all use cases, so I don't know why we're bringing SR-IOV into this discussion.
to assign an arbitrary set of individiual functions to various guests. Why do we need to be able to list many <hostdev> at the same time when hotplugging to assign multiple functions.
Basically can you provide a full description of the problem you are trying to solve and why existing functionality isn't sufficient.
Hi Daniel,
This is for cards which may not necessarily be networking cards. Or may be a mix of networking and storage.
Suppose, user has below card 0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10) 0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
Ok, so this is a device with many functions, but which isn't SRIOV based, and the goal is to assign the physical device to the guest, such that guest has all functions available.
If user wants to hotplug this card to guest, He has to detach all the functions from host driver, then hotplug 0005:01:00.0, 0005:01:00.1, so on individually. But, today with each hotplug of the function, each <hostdev> goes to different guest slot. Whereas, PCI requires all of them to be on the same slot. This is not supported on libvirt today.
The multifunction cards cant be hotplugged to guest today with the individual <hostdev>, as the operation is queued by qemu till the function zero of guest slot is hotplugged. On function zero hotplug, the qemu sends out the event to guest for device probing where all the previously hotplugged functions from the same slot are discovered. So, grouping the <hostdev>s within the <devices> would become necessary to make the whole thing a single operation.
So IIUC, from the patches, if the user wants to assign the physical device to the guest, they would need to provide XML that looked like this to the virDomainAttachDevice() method:
<devices> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x1'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x2'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x3'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x4'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x5'/> </source> </hostdev> </devices>
Where as if the device were SRIOV based, they would only have to provide
<device> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/> </source> </hostdev> </device>
for the guest to get access to all functions.
Since when has this been the case? (nit, the example is domain=0x5, bus=0x1,...)
I find this difference in behaviour and approach really unpleasant.
I think that they user should only need to provide the the address of the physical device, in both cases. At most perhaps we need a new attribute multifunction="on" on the source address to tell libvirt that it should attach all the functions, not just the first
<device> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0' mutlifunction="on"/> </source> </hostdev> </device>
Neither really bothers me, but I'm confused by the claimed existing handling of SR-IOV. Either you're assigning a PF and SR-IOV is irrelevant and unavailable to the guest or you're assigning a VF and, well, SR-IOV is still mostly irrelevant to libvirt unless someone decides to assign the PF hosting the VF or libvirt needs to do VF configuration via the PF. Thanks,
Hmm, could be a mis-understanding then. I was under the belief that when you assign the PF or a SRIOV device to the guest, all the VFs obviously disappear from the host due to driver being unloaded. The guest now has the PF, and would have the option to enable VFs too if it so desired, just as the host had option for when it owned the PF. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, 15 Mar 2018 15:06:38 +0000 Daniel P. Berrangé <berrange@redhat.com> wrote:
On Thu, Mar 15, 2018 at 08:59:41AM -0600, Alex Williamson wrote:
Neither really bothers me, but I'm confused by the claimed existing handling of SR-IOV. Either you're assigning a PF and SR-IOV is irrelevant and unavailable to the guest or you're assigning a VF and, well, SR-IOV is still mostly irrelevant to libvirt unless someone decides to assign the PF hosting the VF or libvirt needs to do VF configuration via the PF. Thanks,
Hmm, could be a mis-understanding then. I was under the belief that when you assign the PF or a SRIOV device to the guest, all the VFs obviously disappear from the host due to driver being unloaded. The guest now has the PF, and would have the option to enable VFs too if it so desired, just as the host had option for when it owned the PF.
Yeah, that's not how it currently works. Some people would like it if this were the case, but we've not gotten past the security issues. If the user is allowed to enable SR-IOV, those VFs don't just appear for the VM, they appear on the host. The host needs to probe for them, assign resources, and attach drivers. What should the host do with VFs that are managed by an untrusted userspace driver? The isolation between VFs and PFs depends on the vendor's SR-IOV implementation. Minimally, the PF driver manages the PCI bus master config bit and can trivially introduce a denial of service for the VFs. Allowing a VM to enable SR-IOV only for the purpose of assigning those VFs back to the VM owning the PF doesn't seem to be a particularly compelling feature on its own. Thanks, Alex

On Thu, Mar 15, 2018 at 09:54:46AM -0600, Alex Williamson wrote:
On Thu, 15 Mar 2018 15:06:38 +0000 Daniel P. Berrangé <berrange@redhat.com> wrote:
On Thu, Mar 15, 2018 at 08:59:41AM -0600, Alex Williamson wrote:
Neither really bothers me, but I'm confused by the claimed existing handling of SR-IOV. Either you're assigning a PF and SR-IOV is irrelevant and unavailable to the guest or you're assigning a VF and, well, SR-IOV is still mostly irrelevant to libvirt unless someone decides to assign the PF hosting the VF or libvirt needs to do VF configuration via the PF. Thanks,
Hmm, could be a mis-understanding then. I was under the belief that when you assign the PF or a SRIOV device to the guest, all the VFs obviously disappear from the host due to driver being unloaded. The guest now has the PF, and would have the option to enable VFs too if it so desired, just as the host had option for when it owned the PF.
Yeah, that's not how it currently works. Some people would like it if this were the case, but we've not gotten past the security issues. If the user is allowed to enable SR-IOV, those VFs don't just appear for the VM, they appear on the host. The host needs to probe for them, assign resources, and attach drivers. What should the host do with VFs that are managed by an untrusted userspace driver? The isolation between VFs and PFs depends on the vendor's SR-IOV implementation. Minimally, the PF driver manages the PCI bus master config bit and can trivially introduce a denial of service for the VFs. Allowing a VM to enable SR-IOV only for the purpose of assigning those VFs back to the VM owning the PF doesn't seem to be a particularly compelling feature on its own. Thanks,
So it sounds like if, in the future, we did want to allow the guest to have the PF, *and* all the VFs at the same time, we would probably need to arrange that all from the host side, vaguely like is being proposed in this series for non-SRIOV, and not let the guest have control over VF create/delete itself. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 03/15/2018 08:03 PM, Daniel P. Berrangé wrote:
On Thu, Mar 15, 2018 at 07:54:47PM +0530, Shivaprasad G Bhat wrote:
On 03/15/2018 03:31 PM, Daniel P. Berrangé wrote:
On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:
Hi All,
I have revisited/rewritten my previously posted patches. Here is the RFC. Since this patchset is a complete rewrite, I am starting with v1 here.
The semantics is as discussed before https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
As I went on to refactor the code to support multifunction virtio devices, I realised the abort/cleanup path would be a nightmare there, in case of failures. So, dropped that attempt. The current RFC limits to the real practical use cases of Multifunction PCI hostdevices. All new test code to support multifunction PCI hostdevices and test cases are added to prove the functionality. I guess I'm not really understanding the use case here. With SRIOV devices, you can already choose between assigning either the physical function (which gives the guest access to all virtual functions), or to assign an arbitrary set of individiual functions to various guests. Why do we need to be able to list many <hostdev> at the same time when hotplugging to assign multiple functions.
Basically can you provide a full description of the problem you are trying to solve and why existing functionality isn't sufficient. Hi Daniel,
This is for cards which may not necessarily be networking cards. Or may be a mix of networking and storage.
Suppose, user has below card 0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10) 0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
Ok, so this is a device with many functions, but which isn't SRIOV based, and the goal is to assign the physical device to the guest, such that guest has all functions available.
If user wants to hotplug this card to guest, He has to detach all the functions from host driver, then hotplug 0005:01:00.0, 0005:01:00.1, so on individually. But, today with each hotplug of the function, each <hostdev> goes to different guest slot. Whereas, PCI requires all of them to be on the same slot. This is not supported on libvirt today.
The multifunction cards cant be hotplugged to guest today with the individual <hostdev>, as the operation is queued by qemu till the function zero of guest slot is hotplugged. On function zero hotplug, the qemu sends out the event to guest for device probing where all the previously hotplugged functions from the same slot are discovered. So, grouping the <hostdev>s within the <devices> would become necessary to make the whole thing a single operation. So IIUC, from the patches, if the user wants to assign the physical device to the guest, they would need to provide XML that looked like this to the virDomainAttachDevice() method:
<devices> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x1'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x2'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x3'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x4'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x5'/> </source> </hostdev> </devices>
Where as if the device were SRIOV based, they would only have to provide
<device> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/> </source> </hostdev> </device>
for the guest to get access to all functions.
I find this difference in behaviour and approach really unpleasant.
I think that they user should only need to provide the the address of the physical device, in both cases. At most perhaps we need a new attribute multifunction="on" on the source address to tell libvirt that it should attach all the functions, not just the first
<device> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0' mutlifunction="on"/> </source> </hostdev> </device>
But with this approach the user can not prevent few functions from being assigned to guest if he wants to. It will be all or none. The PCI requires only function zero to be present and so, partial assignment is expected to work. So user should have that control?
Regards, Daniel

On Thu, Mar 15, 2018 at 08:47:32PM +0530, Shivaprasad G Bhat wrote:
On 03/15/2018 08:03 PM, Daniel P. Berrangé wrote:
On Thu, Mar 15, 2018 at 07:54:47PM +0530, Shivaprasad G Bhat wrote:
On 03/15/2018 03:31 PM, Daniel P. Berrangé wrote:
On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:
Hi All,
I have revisited/rewritten my previously posted patches. Here is the RFC. Since this patchset is a complete rewrite, I am starting with v1 here.
The semantics is as discussed before https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
As I went on to refactor the code to support multifunction virtio devices, I realised the abort/cleanup path would be a nightmare there, in case of failures. So, dropped that attempt. The current RFC limits to the real practical use cases of Multifunction PCI hostdevices. All new test code to support multifunction PCI hostdevices and test cases are added to prove the functionality. I guess I'm not really understanding the use case here. With SRIOV devices, you can already choose between assigning either the physical function (which gives the guest access to all virtual functions), or to assign an arbitrary set of individiual functions to various guests. Why do we need to be able to list many <hostdev> at the same time when hotplugging to assign multiple functions.
Basically can you provide a full description of the problem you are trying to solve and why existing functionality isn't sufficient. Hi Daniel,
This is for cards which may not necessarily be networking cards. Or may be a mix of networking and storage.
Suppose, user has below card 0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10) 0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
Ok, so this is a device with many functions, but which isn't SRIOV based, and the goal is to assign the physical device to the guest, such that guest has all functions available.
If user wants to hotplug this card to guest, He has to detach all the functions from host driver, then hotplug 0005:01:00.0, 0005:01:00.1, so on individually. But, today with each hotplug of the function, each <hostdev> goes to different guest slot. Whereas, PCI requires all of them to be on the same slot. This is not supported on libvirt today.
The multifunction cards cant be hotplugged to guest today with the individual <hostdev>, as the operation is queued by qemu till the function zero of guest slot is hotplugged. On function zero hotplug, the qemu sends out the event to guest for device probing where all the previously hotplugged functions from the same slot are discovered. So, grouping the <hostdev>s within the <devices> would become necessary to make the whole thing a single operation. So IIUC, from the patches, if the user wants to assign the physical device to the guest, they would need to provide XML that looked like this to the virDomainAttachDevice() method:
<devices> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x1'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x2'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x3'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x4'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x5'/> </source> </hostdev> </devices>
Where as if the device were SRIOV based, they would only have to provide
<device> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/> </source> </hostdev> </device>
for the guest to get access to all functions.
I find this difference in behaviour and approach really unpleasant.
I think that they user should only need to provide the the address of the physical device, in both cases. At most perhaps we need a new attribute multifunction="on" on the source address to tell libvirt that it should attach all the functions, not just the first
<device> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0' mutlifunction="on"/> </source> </hostdev> </device>
But with this approach the user can not prevent few functions from being assigned to guest if he wants to. It will be all or none. The PCI requires only function zero to be present and so, partial assignment is expected to work.
IIUC, once you've assigned the device with function 0 to a guest, no other guest or the host, can safely used the other functions of the device, right ? So I just assumed that if you're going to give some functions to the guest you might as well give them all, as nothing else can use them.
So user should have that control?
Is there a use case for only giving some of the them ? Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 03/15/2018 11:35 AM, Daniel P. Berrangé wrote:
On Thu, Mar 15, 2018 at 08:47:32PM +0530, Shivaprasad G Bhat wrote:
On 03/15/2018 03:31 PM, Daniel P. Berrangé wrote:
On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:
Hi All,
I have revisited/rewritten my previously posted patches. Here is the RFC. Since this patchset is a complete rewrite, I am starting with v1 here.
The semantics is as discussed before https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
As I went on to refactor the code to support multifunction virtio devices, I realised the abort/cleanup path would be a nightmare there, in case of failures. So, dropped that attempt. The current RFC limits to the real practical use cases of Multifunction PCI hostdevices. All new test code to support multifunction PCI hostdevices and test cases are added to prove the functionality. I guess I'm not really understanding the use case here. With SRIOV devices, you can already choose between assigning either the physical function (which gives the guest access to all virtual functions), or to assign an arbitrary set of individiual functions to various guests. Why do we need to be able to list many <hostdev> at the same time when hotplugging to assign multiple functions.
Basically can you provide a full description of the problem you are trying to solve and why existing functionality isn't sufficient. Hi Daniel,
This is for cards which may not necessarily be networking cards. Or may be a mix of networking and storage.
Suppose, user has below card 0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10) 0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10) Ok, so this is a device with many functions, but which isn't SRIOV
On Thu, Mar 15, 2018 at 07:54:47PM +0530, Shivaprasad G Bhat wrote: based, and the goal is to assign the physical device to the guest, such that guest has all functions available.
If user wants to hotplug this card to guest, He has to detach all the functions from host driver, then hotplug 0005:01:00.0, 0005:01:00.1, so on individually. But, today with each hotplug of the function, each <hostdev> goes to different guest slot. Whereas, PCI requires all of them to be on the same slot. This is not supported on libvirt today.
The multifunction cards cant be hotplugged to guest today with the individual <hostdev>, as the operation is queued by qemu till the function zero of guest slot is hotplugged. On function zero hotplug, the qemu sends out the event to guest for device probing where all the previously hotplugged functions from the same slot are discovered. So, grouping the <hostdev>s within the <devices> would become necessary to make the whole thing a single operation. So IIUC, from the patches, if the user wants to assign the physical device to the guest, they would need to provide XML that looked like this to the virDomainAttachDevice() method:
<devices> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x1'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x2'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x3'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x4'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x5'/> </source> </hostdev> </devices>
Where as if the device were SRIOV based, they would only have to provide
<device> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/> </source> </hostdev> </device>
for the guest to get access to all functions.
I find this difference in behaviour and approach really unpleasant.
I think that they user should only need to provide the the address of the physical device, in both cases. At most perhaps we need a new attribute multifunction="on" on the source address to tell libvirt that it should attach all the functions, not just the first
<device> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0' mutlifunction="on"/> </source> </hostdev> </device> But with this approach the user can not prevent few functions from being assigned to guest if he wants to. It will be all or none. The PCI requires only function zero to be
On 03/15/2018 08:03 PM, Daniel P. Berrangé wrote: present and so, partial assignment is expected to work.
IIUC, once you've assigned the device with function 0 to a guest, no other guest or the host, can safely used the other functions of the device, right ? So I just assumed that if you're going to give some functions to the guest you might as well give them all, as nothing else can use them.
You bring up a good point that the unassigned functions probably can't be safely used for anything else (since they're almost surely in the same IOMMU group), but the host may not want to give all of them to the guest. This points out that if we intend for managed='yes' to operate properly, those other functions will need to be bound to vfio-pci (or nothing). (*But* since in the past we've said that we don't want to implicitly re-bind devices to vfio-pci during assignment if they're not explicitly called out in the config, we're either going to need some method of specifying "manage (i.e. auto-bind to vfio-pci) this other device in the iommu group, but don't assign it", or just inform people that it's going to fail if they don't use managed='no' and bind to vfio-pci themselves.
So user should have that control? Is there a use case for only giving some of the them ?
Maybe one of the devices is dangerous to hand over to a guest? Aside from that, there are a few other scenarios where explicitly spelling out the devices to be assigned (and where to assign them) makes sense: 1) possibly (e.g. for testing or compatibility purposes) you want the addressing of the devices on the guest to be different from what they are on the host. You may want to put them at different functions, or you may even want/need devices that are on different functions of the same slot on the host to be on different slots in the guest. 2) Someone might want to assign multiple *emulated* devices to multiple functions on the same slot in the guest (either to mimic the device layout of real hardware, or to overcome slot limitations). Since the emulated devices have no "physical" topology to mimic on the guest, it must necessarily be spelled out in the config. 3) There may be a piece of hardware that changes what functions are populated with devices based on config (an admittedly un-useful example is the multiple VFs of an SRIOV network card). If you just use "multifunction='on'" to mean "assign *ALL THE DEVICES*!!!" (insert spear-wielding meme here), then the hardware given to the guest will change based on how the hardware has been configured prior to assignment. Also, in the future there may various knobs that need to be adjusted (in the config) for the individual devices, and if the assignment of the device to the guest is just implied by "multifunction='on'" then there will be no place for that config to live. I think it's safest to explicitly spell out which device will be assigned, and where they will be assigned.

On Thu, 15 Mar 2018 17:53:02 -0400 Laine Stump <laine@laine.org> wrote:
On 03/15/2018 11:35 AM, Daniel P. Berrangé wrote:
On Thu, Mar 15, 2018 at 08:47:32PM +0530, Shivaprasad G Bhat wrote:
On 03/15/2018 03:31 PM, Daniel P. Berrangé wrote:
On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote: > Hi All, > > I have revisited/rewritten my previously posted patches. Here is > the RFC. Since this patchset is a complete rewrite, I am starting > with v1 here. > > The semantics is as discussed before > https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html > > As I went on to refactor the code to support multifunction virtio devices, > I realised the abort/cleanup path would be a nightmare there, in case of > failures. So, dropped that attempt. The current RFC limits to the real > practical use cases of Multifunction PCI hostdevices. All new test code > to support multifunction PCI hostdevices and test cases are added to > prove the functionality. I guess I'm not really understanding the use case here. With SRIOV devices, you can already choose between assigning either the physical function (which gives the guest access to all virtual functions), or to assign an arbitrary set of individiual functions to various guests. Why do we need to be able to list many <hostdev> at the same time when hotplugging to assign multiple functions.
Basically can you provide a full description of the problem you are trying to solve and why existing functionality isn't sufficient. Hi Daniel,
This is for cards which may not necessarily be networking cards. Or may be a mix of networking and storage.
Suppose, user has below card 0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10) 0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10) 0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10) Ok, so this is a device with many functions, but which isn't SRIOV
On Thu, Mar 15, 2018 at 07:54:47PM +0530, Shivaprasad G Bhat wrote: based, and the goal is to assign the physical device to the guest, such that guest has all functions available.
If user wants to hotplug this card to guest, He has to detach all the functions from host driver, then hotplug 0005:01:00.0, 0005:01:00.1, so on individually. But, today with each hotplug of the function, each <hostdev> goes to different guest slot. Whereas, PCI requires all of them to be on the same slot. This is not supported on libvirt today.
The multifunction cards cant be hotplugged to guest today with the individual <hostdev>, as the operation is queued by qemu till the function zero of guest slot is hotplugged. On function zero hotplug, the qemu sends out the event to guest for device probing where all the previously hotplugged functions from the same slot are discovered. So, grouping the <hostdev>s within the <devices> would become necessary to make the whole thing a single operation. So IIUC, from the patches, if the user wants to assign the physical device to the guest, they would need to provide XML that looked like this to the virDomainAttachDevice() method:
<devices> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x1'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x2'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x3'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x4'/> </source> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x5'/> </source> </hostdev> </devices>
Where as if the device were SRIOV based, they would only have to provide
<device> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/> </source> </hostdev> </device>
for the guest to get access to all functions.
I find this difference in behaviour and approach really unpleasant.
I think that they user should only need to provide the the address of the physical device, in both cases. At most perhaps we need a new attribute multifunction="on" on the source address to tell libvirt that it should attach all the functions, not just the first
<device> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x05' slot='0x1' function='0x0' mutlifunction="on"/> </source> </hostdev> </device> But with this approach the user can not prevent few functions from being assigned to guest if he wants to. It will be all or none. The PCI requires only function zero to be
On 03/15/2018 08:03 PM, Daniel P. Berrangé wrote: present and so, partial assignment is expected to work.
IIUC, once you've assigned the device with function 0 to a guest, no other guest or the host, can safely used the other functions of the device, right ? So I just assumed that if you're going to give some functions to the guest you might as well give them all, as nothing else can use them.
No! We have IOMMU groups to define what devices are not isolated, please do not start making further assumptions based simply on functions being in the same slot. Multifunction device can support ACS and define that the functions are DMA isolated and we also have quite a few quirks in the kernel for devices where the vendor has vouched for the functions being isolated. Even if the functions are grouped together, there's no guarantee that the user actually wants all the functions attached. If it's a special mode where defining multifunction on the source identity maps all the functions and allows hotplug, fine, but we have many existing users where functions are assigned to separate VMs.
You bring up a good point that the unassigned functions probably can't be safely used for anything else (since they're almost surely in the same IOMMU group),
No, they're not almost surely in the same group.
but the host may not want to give all of them to the guest.
If they are in the same group, then the host must give them all to the user, but that doesn't imply the user wants them all assigned. For instance, RHEL doesn't by default support assignment of a Quadro card's audio function, even though they're in the same group (interrupts are broken).
This points out that if we intend for managed='yes' to operate properly, those other functions will need to be bound to vfio-pci (or nothing). (*But* since in the past we've said that we don't want to implicitly re-bind devices to vfio-pci during assignment if they're not explicitly called out in the config, we're either going to need some method of specifying "manage (i.e. auto-bind to vfio-pci) this other device in the iommu group, but don't assign it", or just inform people that it's going to fail if they don't use managed='no' and bind to vfio-pci themselves.
So user should have that control? Is there a use case for only giving some of the them ?
Yes!
Maybe one of the devices is dangerous to hand over to a guest?
Aside from that, there are a few other scenarios where explicitly spelling out the devices to be assigned (and where to assign them) makes sense:
1) possibly (e.g. for testing or compatibility purposes) you want the addressing of the devices on the guest to be different from what they are on the host. You may want to put them at different functions, or you may even want/need devices that are on different functions of the same slot on the host to be on different slots in the guest.
2) Someone might want to assign multiple *emulated* devices to multiple functions on the same slot in the guest (either to mimic the device layout of real hardware, or to overcome slot limitations). Since the emulated devices have no "physical" topology to mimic on the guest, it must necessarily be spelled out in the config.
3) There may be a piece of hardware that changes what functions are populated with devices based on config (an admittedly un-useful example is the multiple VFs of an SRIOV network card). If you just use "multifunction='on'" to mean "assign *ALL THE DEVICES*!!!" (insert spear-wielding meme here), then the hardware given to the guest will change based on how the hardware has been configured prior to assignment.
Also, in the future there may various knobs that need to be adjusted (in the config) for the individual devices, and if the assignment of the device to the guest is just implied by "multifunction='on'" then there will be no place for that config to live.
I think it's safest to explicitly spell out which device will be assigned, and where they will be assigned.
+1 Making huge generalizations about how functions can be split and used is definitely the wrong path. Thanks, Alex
participants (4)
-
Alex Williamson
-
Daniel P. Berrangé
-
Laine Stump
-
Shivaprasad G Bhat