[libvirt] [PATCH v2 00/39] Introduce NVMe support

v2 of: https://www.redhat.com/archives/libvir-list/2019-July/msg00675.html As usual, you can find my patches on my github: https://github.com/zippy2/libvirt/tree/nvme_v3 https://travis-ci.org/zippy2/libvirt/builds/590033775 (Yeah, my branch is really called _v3 because reasons) diff to v1: - A lot. Hopefully all Peter's comments are worked in Michal Prívozník (39): virhostdev: Fix const correctness of virHostdevIs{PCINet,SCSI,Mdev}Device() virhostdev: Introduce and use virHostdevIsVFIODevice conf: Introduce virDomainDefHasMdevHostdev qemu_hostdev: Introduce qemuHostdevNeedsVFIO() qemu: Introduce qemuDomainNeedsVFIO qemu_cgroup: Teardown Cgroup for more host device types qemu: Explicitly add/remove /dev/vfio/vfio to/from NS/CGroups qemu_domain: Drop few useless checks in qemuDomainGetHostdevPath qemuDomainGetHostdevPath: Drop @freeTmpPath qemuDomainGetHostdevPath: Use more VIR_AUTOFREE/VIR_AUTOPTR qemuDomainGetHostdevPath: Don't include /dev/vfio/vfio in returned paths qemu: Drop some 'cleanup' labels virpci: Introduce and use virPCIDeviceAddressGetIOMMUGroupDev virHostdevPreparePCIDevices: Separate out function body virHostdevReAttachPCIDevices: Separate out function body virpci: Introduce virPCIDeviceAddressCopy qemuMigrationSrcIsSafe: Rework slightly schemas: Introduce disk type NVMe conf: Format and parse NVMe type disk virstoragefile: Introduce virStorageSourceChainHasNVMe domain_conf: Introduce virDomainDefHasNVMeDisk util: Introduce virNVMeDevice module virhostdev: Include virNVMeDevice module virpcimock: Introduce NVMe driver and devices virhostdevtest: Test virNVMeDevice assignment qemu: prepare NVMe devices too qemu: Take NVMe disks into account when calculating memlock limit qemu: Create NVMe disk in domain namespace qemu: Mark NVMe disks as 'need VFIO' qemu: Allow NVMe disk in CGroups security_selinux: Simplify virSecuritySELinuxSetImageLabelInternal virSecuritySELinuxRestoreImageLabelInt: Don't skip non-local storage qemu_capabilities: Introduce QEMU_CAPS_DRIVE_NVME qemu: Generate command line of NVMe disks qemu_monitor_text: Catch IOMMU/VFIO related errors in qemuMonitorTextAddDrive qemu: Don't leak storage perms on failure in qemuDomainAttachDiskGeneric qemu: Allow forcing VFIO when computing memlock limit qemu_hotplug: Prepare NVMe disks on hotplug virsh: Introduce nvme disk to domblklist docs/formatdomain.html.in | 57 ++- docs/schemas/domaincommon.rng | 32 ++ src/conf/domain_conf.c | 129 ++++- src/conf/domain_conf.h | 6 + src/libvirt_private.syms | 30 ++ src/libxl/xen_xl.c | 1 + src/qemu/qemu_block.c | 25 + src/qemu/qemu_capabilities.c | 4 + src/qemu/qemu_capabilities.h | 3 + src/qemu/qemu_cgroup.c | 216 ++++++--- src/qemu/qemu_command.c | 6 +- src/qemu/qemu_domain.c | 409 +++++++++------- src/qemu/qemu_domain.h | 17 +- src/qemu/qemu_driver.c | 4 + src/qemu/qemu_hostdev.c | 80 ++- src/qemu/qemu_hostdev.h | 18 + src/qemu/qemu_hotplug.c | 39 +- src/qemu/qemu_migration.c | 30 +- src/qemu/qemu_monitor_text.c | 7 + src/qemu/qemu_process.c | 7 + src/security/security_apparmor.c | 33 +- src/security/security_dac.c | 30 ++ src/security/security_selinux.c | 82 ++-- src/util/Makefile.inc.am | 2 + src/util/virhostdev.c | 455 ++++++++++++++++-- src/util/virhostdev.h | 44 +- src/util/virnvme.c | 454 +++++++++++++++++ src/util/virnvme.h | 95 ++++ src/util/virpci.c | 29 ++ src/util/virpci.h | 5 + src/util/virstoragefile.c | 73 +++ src/util/virstoragefile.h | 19 + .../caps_2.12.0.aarch64.xml | 1 + .../caps_2.12.0.ppc64.xml | 1 + .../caps_2.12.0.s390x.xml | 1 + .../caps_2.12.0.x86_64.xml | 1 + .../qemucapabilitiesdata/caps_3.0.0.ppc64.xml | 1 + .../caps_3.0.0.riscv32.xml | 1 + .../caps_3.0.0.riscv64.xml | 1 + .../qemucapabilitiesdata/caps_3.0.0.s390x.xml | 1 + .../caps_3.0.0.x86_64.xml | 1 + .../qemucapabilitiesdata/caps_3.1.0.ppc64.xml | 1 + .../caps_3.1.0.x86_64.xml | 1 + .../caps_4.0.0.aarch64.xml | 1 + .../qemucapabilitiesdata/caps_4.0.0.ppc64.xml | 1 + .../caps_4.0.0.riscv32.xml | 1 + .../caps_4.0.0.riscv64.xml | 1 + .../qemucapabilitiesdata/caps_4.0.0.s390x.xml | 1 + .../caps_4.0.0.x86_64.xml | 1 + .../caps_4.1.0.x86_64.xml | 1 + tests/qemumemlocktest.c | 2 +- .../disk-nvme.x86_64-latest.args | 53 ++ tests/qemuxml2argvdata/disk-nvme.xml | 63 +++ tests/qemuxml2argvtest.c | 1 + tests/qemuxml2xmloutdata/disk-nvme.xml | 1 + tests/qemuxml2xmltest.c | 1 + tests/virhostdevtest.c | 97 ++++ tests/virpcimock.c | 3 + tests/virpcitestdata/0000-01-00.0.config | Bin 0 -> 4096 bytes tests/virpcitestdata/0000-02-00.0.config | Bin 0 -> 4096 bytes tools/virsh-domain-monitor.c | 31 +- 61 files changed, 2330 insertions(+), 381 deletions(-) create mode 100644 src/util/virnvme.c create mode 100644 src/util/virnvme.h create mode 100644 tests/qemuxml2argvdata/disk-nvme.x86_64-latest.args create mode 100644 tests/qemuxml2argvdata/disk-nvme.xml create mode 120000 tests/qemuxml2xmloutdata/disk-nvme.xml create mode 100644 tests/virpcitestdata/0000-01-00.0.config create mode 100644 tests/virpcitestdata/0000-02-00.0.config -- 2.21.0

These functions do not change any of the passed hostdevs. They just read them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virhostdev.c | 6 +++--- src/util/virhostdev.h | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index 41fcab7222..90967b7c7a 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -354,7 +354,7 @@ virHostdevNetDevice(virDomainHostdevDefPtr hostdev, static bool -virHostdevIsPCINetDevice(virDomainHostdevDefPtr hostdev) +virHostdevIsPCINetDevice(const virDomainHostdevDef *hostdev) { return hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && @@ -369,7 +369,7 @@ virHostdevIsPCINetDevice(virDomainHostdevDefPtr hostdev) * Returns true if @hostdev is a SCSI device, false otherwise. */ bool -virHostdevIsSCSIDevice(virDomainHostdevDefPtr hostdev) +virHostdevIsSCSIDevice(const virDomainHostdevDef *hostdev) { return hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI; @@ -383,7 +383,7 @@ virHostdevIsSCSIDevice(virDomainHostdevDefPtr hostdev) * Returns true if @hostdev is a Mediated device, false otherwise. */ bool -virHostdevIsMdevDevice(virDomainHostdevDefPtr hostdev) +virHostdevIsMdevDevice(const virDomainHostdevDef *hostdev) { return hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_MDEV; diff --git a/src/util/virhostdev.h b/src/util/virhostdev.h index 88501e2743..8d7b8c3284 100644 --- a/src/util/virhostdev.h +++ b/src/util/virhostdev.h @@ -185,10 +185,10 @@ virHostdevReAttachDomainDevices(virHostdevManagerPtr mgr, const char *oldStateDir) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); bool -virHostdevIsSCSIDevice(virDomainHostdevDefPtr hostdev) +virHostdevIsSCSIDevice(const virDomainHostdevDef *hostdev) ATTRIBUTE_NONNULL(1); bool -virHostdevIsMdevDevice(virDomainHostdevDefPtr hostdev) +virHostdevIsMdevDevice(const virDomainHostdevDef *hostdev) ATTRIBUTE_NONNULL(1); /* functions used by NodeDevDetach/Reattach/Reset */ -- 2.21.0

On 9/26/19 12:11 PM, Michal Privoznik wrote:
These functions do not change any of the passed hostdevs. They just read them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virhostdev.c | 6 +++--- src/util/virhostdev.h | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-)
Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

In some places we need to check if a hostdev has VFIO backend. Because of how complicated virDomainHostdevDef structure is, the check consists of three lines. Move them to a function and replace all checks with the function call. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 5 +---- src/libvirt_private.syms | 1 + src/qemu/qemu_cgroup.c | 4 +--- src/qemu/qemu_domain.c | 12 +++--------- src/qemu/qemu_hotplug.c | 8 +------- src/util/virhostdev.c | 15 +++++++++++++++ src/util/virhostdev.h | 3 +++ 7 files changed, 25 insertions(+), 23 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index c290baf953..adf8455579 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -31550,10 +31550,7 @@ virDomainDefHasVFIOHostdev(const virDomainDef *def) size_t i; for (i = 0; i < def->nhostdevs; i++) { - const virDomainHostdevDef *tmp = def->hostdevs[i]; - if (tmp->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && - tmp->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && - tmp->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) + if (virHostdevIsVFIODevice(def->hostdevs[i])) return true; } diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 287e63bffa..ac37aea626 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2104,6 +2104,7 @@ virHostCPUStatsAssign; virHostdevFindUSBDevice; virHostdevIsMdevDevice; virHostdevIsSCSIDevice; +virHostdevIsVFIODevice; virHostdevManagerGetDefault; virHostdevPCINodeDeviceDetach; virHostdevPCINodeDeviceReAttach; diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 740a1b33dc..318157dab0 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -413,9 +413,7 @@ qemuTeardownHostdevCgroup(virDomainObjPtr vm, if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) return 0; - if (dev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && - dev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && - dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO && + if (virHostdevIsVFIODevice(dev) && qemuDomainGetHostdevPath(vm->def, dev, true, &npaths, &path, NULL) < 0) goto cleanup; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 2aa2164953..824bca89f4 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11871,9 +11871,7 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def) for (i = 0; i < def->nhostdevs; i++) { virDomainHostdevDefPtr dev = def->hostdevs[i]; - if (dev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && - dev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && - dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) { + if (virHostdevIsVFIODevice(dev)) { usesVFIO = true; pciAddr = &dev->source.subsys.u.pci.addr; @@ -12025,12 +12023,8 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def) * Note that this may not be valid for all platforms. */ for (i = 0; i < def->nhostdevs; i++) { - virDomainHostdevSubsysPtr subsys = &def->hostdevs[i]->source.subsys; - - if (def->hostdevs[i]->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && - (subsys->type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_MDEV || - (subsys->type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && - subsys->u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO))) { + if (virHostdevIsVFIODevice(def->hostdevs[i]) || + virHostdevIsMdevDevice(def->hostdevs[i])) { memKB = virDomainDefGetMemoryTotal(def) + 1024 * 1024; goto done; } diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index e9285f0964..5f92c61aa9 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -4440,16 +4440,10 @@ qemuDomainRemoveHostDevice(virQEMUDriverPtr driver, qemuDomainObjPrivatePtr priv = vm->privateData; VIR_AUTOFREE(char *) drivealias = NULL; VIR_AUTOFREE(char *) objAlias = NULL; - bool is_vfio = false; VIR_DEBUG("Removing host device %s from domain %p %s", hostdev->info->alias, vm, vm->def->name); - if (hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) { - int backend = hostdev->source.subsys.u.pci.backend; - is_vfio = backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO; - } - if (hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI) { virDomainHostdevSubsysSCSIPtr scsisrc = &hostdev->source.subsys.u.scsi; virDomainHostdevSubsysSCSIiSCSIPtr iscsisrc = &scsisrc->u.iscsi; @@ -4497,7 +4491,7 @@ qemuDomainRemoveHostDevice(virQEMUDriverPtr driver, virDomainAuditHostdev(vm, hostdev, "detach", true); - if (!is_vfio && + if (!virHostdevIsVFIODevice(hostdev) && qemuSecurityRestoreHostdevLabel(driver, vm, hostdev) < 0) VIR_WARN("Failed to restore host device labelling"); diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index 90967b7c7a..85812423b5 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -390,6 +390,21 @@ virHostdevIsMdevDevice(const virDomainHostdevDef *hostdev) } +/** + * virHostdevIsVFIODevice: + * @hostdev: host device to check + * + * Returns true if @hostdev is a PCI device with VFIO backend, false otherwise. + */ +bool +virHostdevIsVFIODevice(const virDomainHostdevDef *hostdev) +{ + return hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && + hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && + hostdev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO; +} + + static int virHostdevNetConfigVirtPortProfile(const char *linkdev, int vf, virNetDevVPortProfilePtr virtPort, diff --git a/src/util/virhostdev.h b/src/util/virhostdev.h index 8d7b8c3284..c7ef2055c1 100644 --- a/src/util/virhostdev.h +++ b/src/util/virhostdev.h @@ -190,6 +190,9 @@ virHostdevIsSCSIDevice(const virDomainHostdevDef *hostdev) bool virHostdevIsMdevDevice(const virDomainHostdevDef *hostdev) ATTRIBUTE_NONNULL(1); +bool +virHostdevIsVFIODevice(const virDomainHostdevDef *hostdev) + ATTRIBUTE_NONNULL(1); /* functions used by NodeDevDetach/Reattach/Reset */ int virHostdevPCINodeDeviceDetach(virHostdevManagerPtr mgr, -- 2.21.0

On 9/26/19 12:11 PM, Michal Privoznik wrote:
In some places we need to check if a hostdev has VFIO backend. Because of how complicated virDomainHostdevDef structure is, the check consists of three lines. Move them to a function and replace all checks with the function call.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 5 +---- src/libvirt_private.syms | 1 + src/qemu/qemu_cgroup.c | 4 +--- src/qemu/qemu_domain.c | 12 +++--------- src/qemu/qemu_hotplug.c | 8 +------- src/util/virhostdev.c | 15 +++++++++++++++ src/util/virhostdev.h | 3 +++ 7 files changed, 25 insertions(+), 23 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index c290baf953..adf8455579 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -31550,10 +31550,7 @@ virDomainDefHasVFIOHostdev(const virDomainDef *def) size_t i;
for (i = 0; i < def->nhostdevs; i++) { - const virDomainHostdevDef *tmp = def->hostdevs[i]; - if (tmp->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && - tmp->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && - tmp->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) + if (virHostdevIsVFIODevice(def->hostdevs[i])) return true; }
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 287e63bffa..ac37aea626 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2104,6 +2104,7 @@ virHostCPUStatsAssign; virHostdevFindUSBDevice; virHostdevIsMdevDevice; virHostdevIsSCSIDevice; +virHostdevIsVFIODevice; virHostdevManagerGetDefault; virHostdevPCINodeDeviceDetach; virHostdevPCINodeDeviceReAttach; diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 740a1b33dc..318157dab0 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -413,9 +413,7 @@ qemuTeardownHostdevCgroup(virDomainObjPtr vm, if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) return 0;
- if (dev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && - dev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && - dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO && + if (virHostdevIsVFIODevice(dev) && qemuDomainGetHostdevPath(vm->def, dev, true, &npaths, &path, NULL) < 0) goto cleanup; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 2aa2164953..824bca89f4 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11871,9 +11871,7 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def) for (i = 0; i < def->nhostdevs; i++) { virDomainHostdevDefPtr dev = def->hostdevs[i];
- if (dev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && - dev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && - dev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) { + if (virHostdevIsVFIODevice(dev)) { usesVFIO = true;
pciAddr = &dev->source.subsys.u.pci.addr; @@ -12025,12 +12023,8 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def) * Note that this may not be valid for all platforms. */ for (i = 0; i < def->nhostdevs; i++) { - virDomainHostdevSubsysPtr subsys = &def->hostdevs[i]->source.subsys; - - if (def->hostdevs[i]->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && - (subsys->type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_MDEV || - (subsys->type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && - subsys->u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO))) { + if (virHostdevIsVFIODevice(def->hostdevs[i]) || + virHostdevIsMdevDevice(def->hostdevs[i])) { memKB = virDomainDefGetMemoryTotal(def) + 1024 * 1024; goto done; }
I see a sneaky IsMdev conversion in there, but I don't mind ;) Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 14 ++++++++++++++ src/conf/domain_conf.h | 3 +++ src/libvirt_private.syms | 1 + 3 files changed, 18 insertions(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index adf8455579..c1769f743b 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -31558,6 +31558,20 @@ virDomainDefHasVFIOHostdev(const virDomainDef *def) } +bool +virDomainDefHasMdevHostdev(const virDomainDef *def) +{ + size_t i; + + for (i = 0; i < def->nhostdevs; i++) { + if (virHostdevIsMdevDevice(def->hostdevs[i])) + return true; + } + + return false; +} + + /** * virDomainGraphicsDefHasOpenGL: * @def: domain definition diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 53bdee22fb..bf5d1a154b 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -3669,6 +3669,9 @@ virDomainDefHasManagedPR(const virDomainDef *def); bool virDomainDefHasVFIOHostdev(const virDomainDef *def); +bool +virDomainDefHasMdevHostdev(const virDomainDef *def); + bool virDomainGraphicsDefHasOpenGL(const virDomainDef *def); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index ac37aea626..de124eb37b 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -294,6 +294,7 @@ virDomainDefGetVcpusMax; virDomainDefGetVcpusTopology; virDomainDefHasDeviceAddress; virDomainDefHasManagedPR; +virDomainDefHasMdevHostdev; virDomainDefHasMemballoon; virDomainDefHasMemoryHotplug; virDomainDefHasUSB; -- 2.21.0

On 9/26/19 12:11 PM, Michal Privoznik wrote:
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 14 ++++++++++++++ src/conf/domain_conf.h | 3 +++ src/libvirt_private.syms | 1 + 3 files changed, 18 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index adf8455579..c1769f743b 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -31558,6 +31558,20 @@ virDomainDefHasVFIOHostdev(const virDomainDef *def) }
Reminds me I'd really like a domain_conf_util.c or similar just to split up this massive file. Alas... Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

There are two types of host devices that require /dev/vfio/vfio access: 1) PCI devices with VFIO backend 2) Mediated devices Introduce a simple helper that returns true if passed @hostdev falls in either of the categories. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hostdev.c | 9 +++++++++ src/qemu/qemu_hostdev.h | 2 ++ 2 files changed, 11 insertions(+) diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index af41c32679..ebbca817b8 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -118,6 +118,15 @@ qemuHostdevUpdateActiveDomainDevices(virQEMUDriverPtr driver, return 0; } + +bool +qemuHostdevNeedsVFIO(const virDomainHostdevDef *hostdev) +{ + return virHostdevIsVFIODevice(hostdev) || + virHostdevIsMdevDevice(hostdev); +} + + bool qemuHostdevHostSupportsPassthroughVFIO(void) { diff --git a/src/qemu/qemu_hostdev.h b/src/qemu/qemu_hostdev.h index e99c204961..536069fe8a 100644 --- a/src/qemu/qemu_hostdev.h +++ b/src/qemu/qemu_hostdev.h @@ -24,6 +24,8 @@ #include "qemu_conf.h" #include "domain_conf.h" +bool qemuHostdevNeedsVFIO(const virDomainHostdevDef *hostdev); + bool qemuHostdevHostSupportsPassthroughVFIO(void); int qemuHostdevUpdateActiveMediatedDevices(virQEMUDriverPtr driver, -- 2.21.0

On 9/26/19 12:12 PM, Michal Privoznik wrote:
There are two types of host devices that require /dev/vfio/vfio access:
1) PCI devices with VFIO backend 2) Mediated devices
Introduce a simple helper that returns true if passed @hostdev falls in either of the categories.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hostdev.c | 9 +++++++++ src/qemu/qemu_hostdev.h | 2 ++ 2 files changed, 11 insertions(+)
diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index af41c32679..ebbca817b8 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -118,6 +118,15 @@ qemuHostdevUpdateActiveDomainDevices(virQEMUDriverPtr driver, return 0; }
+ +bool +qemuHostdevNeedsVFIO(const virDomainHostdevDef *hostdev) +{ + return virHostdevIsVFIODevice(hostdev) || + virHostdevIsMdevDevice(hostdev); +} + + bool qemuHostdevHostSupportsPassthroughVFIO(void) {
Converts from single to double spacing between functions, but I see this file is inconsistent anyways so no objection from me. Would be nice to standardize the codebase on double spacing IMO. I don't know if we have a guidelines for that one way or the other though? Anyways... Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 9 +++++++++ src/qemu/qemu_domain.h | 2 ++ 2 files changed, 11 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 824bca89f4..6502c6191c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -29,6 +29,7 @@ #include "qemu_dbus.h" #include "qemu_process.h" #include "qemu_capabilities.h" +#include "qemu_hostdev.h" #include "qemu_migration.h" #include "qemu_migration_params.h" #include "qemu_security.h" @@ -12825,6 +12826,14 @@ qemuDomainSupportsVideoVga(virDomainVideoDefPtr video, } +bool +qemuDomainNeedsVFIO(const virDomainDef *def) +{ + return virDomainDefHasVFIOHostdev(def) || + virDomainDefHasMdevHostdev(def); +} + + /** * qemuDomainGetHostdevPath: * @def: domain definition diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 6c490bd9d8..8e3917c205 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -1084,6 +1084,8 @@ int qemuDomainCheckMonitor(virQEMUDriverPtr driver, bool qemuDomainSupportsVideoVga(virDomainVideoDefPtr video, virQEMUCapsPtr qemuCaps); +bool qemuDomainNeedsVFIO(const virDomainDef *def); + int qemuDomainGetHostdevPath(virDomainDefPtr def, virDomainHostdevDefPtr dev, bool teardown, -- 2.21.0

On 9/26/19 12:12 PM, Michal Privoznik wrote:
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 9 +++++++++ src/qemu/qemu_domain.h | 2 ++ 2 files changed, 11 insertions(+)
Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

Since its introduction in v1.0.5-rc1-19-g6e13860cb4 the qemuTeardownHostdevCgroup() does nothing unless the passed hostdev is a PCI device with VFIO backend. This seems unnecessary. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_cgroup.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 318157dab0..4d6f0c33cd 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -405,16 +405,10 @@ qemuTeardownHostdevCgroup(virDomainObjPtr vm, size_t i, npaths = 0; int rv, ret = -1; - /* currently this only does something for PCI devices using vfio - * for device assignment, but it is called for *all* hostdev - * devices. - */ - if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) return 0; - if (virHostdevIsVFIODevice(dev) && - qemuDomainGetHostdevPath(vm->def, dev, true, + if (qemuDomainGetHostdevPath(vm->def, dev, true, &npaths, &path, NULL) < 0) goto cleanup; -- 2.21.0

On 9/26/19 12:12 PM, Michal Privoznik wrote:
Since its introduction in v1.0.5-rc1-19-g6e13860cb4 the qemuTeardownHostdevCgroup() does nothing unless the passed hostdev is a PCI device with VFIO backend. This seems unnecessary.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_cgroup.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 318157dab0..4d6f0c33cd 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -405,16 +405,10 @@ qemuTeardownHostdevCgroup(virDomainObjPtr vm, size_t i, npaths = 0; int rv, ret = -1;
- /* currently this only does something for PCI devices using vfio - * for device assignment, but it is called for *all* hostdev - * devices. - */ - if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) return 0;
- if (virHostdevIsVFIODevice(dev) && - qemuDomainGetHostdevPath(vm->def, dev, true, + if (qemuDomainGetHostdevPath(vm->def, dev, true, &npaths, &path, NULL) < 0) goto cleanup;
Agreed, seems sensible to reflect qemuSetupHostdevCgroup Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

In near future, the decision what to do with /dev/vfio/vfio with respect to domain namespace and CGroup is going to be moved out of qemuDomainGetHostdevPath() because there will be some other types of devices than hostdevs that need access to VFIO. All functions that I'm changing assume that hostdev we are adding/removing to VM is not in the definition yet (because of how qemuDomainNeedsVFIO() is written). Fortunately, this assumption is true. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_cgroup.c | 48 +++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_domain.c | 36 +++++++++++++++++++++++++++++++ 2 files changed, 83 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 4d6f0c33cd..f110b49d16 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -25,6 +25,7 @@ #include "qemu_domain.h" #include "qemu_process.h" #include "qemu_extdevice.h" +#include "qemu_hostdev.h" #include "vircgroup.h" #include "virlog.h" #include "viralloc.h" @@ -360,6 +361,17 @@ qemuTeardownInputCgroup(virDomainObjPtr vm, } +/** + * qemuSetupHostdevCgroup: + * vm: domain object + * @dev: device to allow + * + * For given host device @dev allow access to in Cgroups. + * Note, @dev must not be in @vm's definition. + * + * Returns: 0 on success, + * -1 otherwise. + */ int qemuSetupHostdevCgroup(virDomainObjPtr vm, virDomainHostdevDefPtr dev) @@ -386,6 +398,17 @@ qemuSetupHostdevCgroup(virDomainObjPtr vm, goto cleanup; } + if (qemuHostdevNeedsVFIO(dev) && + !qemuDomainNeedsVFIO(vm->def)) { + VIR_DEBUG("Cgroup allow %s perms=%d", QEMU_DEV_VFIO, VIR_CGROUP_DEVICE_RW); + rv = virCgroupAllowDevicePath(priv->cgroup, QEMU_DEV_VFIO, + VIR_CGROUP_DEVICE_RW, false); + virDomainAuditCgroupPath(vm, priv->cgroup, "allow", + QEMU_DEV_VFIO, "rw", rv); + if (rv < 0) + goto cleanup; + } + ret = 0; cleanup: @@ -396,9 +419,21 @@ qemuSetupHostdevCgroup(virDomainObjPtr vm, return ret; } + +/** + * qemuTeardownHostdevCgroup: + * @vm: doamin object + * @dev: device to tear down + * + * For given host device @dev deny access to it in CGroups. + * Note, @dev must not be in @vm's definition. + * + * Returns: 0 on success, + * -1 otherwise. + */ int qemuTeardownHostdevCgroup(virDomainObjPtr vm, - virDomainHostdevDefPtr dev) + virDomainHostdevDefPtr dev) { qemuDomainObjPrivatePtr priv = vm->privateData; char **path = NULL; @@ -422,6 +457,17 @@ qemuTeardownHostdevCgroup(virDomainObjPtr vm, goto cleanup; } + if (qemuHostdevNeedsVFIO(dev) && + !qemuDomainNeedsVFIO(vm->def)) { + VIR_DEBUG("Cgroup deny " QEMU_DEV_VFIO); + rv = virCgroupDenyDevicePath(priv->cgroup, QEMU_DEV_VFIO, + VIR_CGROUP_DEVICE_RWM, false); + virDomainAuditCgroupPath(vm, priv->cgroup, "deny", + QEMU_DEV_VFIO, "rwm", rv); + if (rv < 0) + goto cleanup; + } + ret = 0; cleanup: for (i = 0; i < npaths; i++) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 6502c6191c..02b6e590cd 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -13540,6 +13540,10 @@ qemuDomainSetupHostdev(virQEMUDriverConfigPtr cfg ATTRIBUTE_UNUSED, goto cleanup; } + if (qemuHostdevNeedsVFIO(dev) && + qemuDomainCreateDevice(QEMU_DEV_VFIO, data, false) < 0) + goto cleanup; + ret = 0; cleanup: for (i = 0; i < npaths; i++) @@ -14576,6 +14580,17 @@ qemuDomainNamespaceTeardownDisk(virDomainObjPtr vm ATTRIBUTE_UNUSED, } +/** + * qemuDomainNamespaceSetupHostdev: + * @vm: domain object + * @hostdev: hostdev to create in @vm's namespace + * + * For given @hostdev, create its devfs representation (if it has one) in + * domain namespace. Note, @hostdev must not be in @vm's definition. + * + * Returns: 0 on success, + * -1 otherwise. + */ int qemuDomainNamespaceSetupHostdev(virDomainObjPtr vm, virDomainHostdevDefPtr hostdev) @@ -14590,6 +14605,11 @@ qemuDomainNamespaceSetupHostdev(virDomainObjPtr vm, if (qemuDomainNamespaceMknodPaths(vm, (const char **)paths, npaths) < 0) goto cleanup; + if (qemuHostdevNeedsVFIO(hostdev) && + !qemuDomainNeedsVFIO(vm->def) && + qemuDomainNamespaceMknodPath(vm, QEMU_DEV_VFIO) < 0) + goto cleanup; + ret = 0; cleanup: for (i = 0; i < npaths; i++) @@ -14599,6 +14619,17 @@ qemuDomainNamespaceSetupHostdev(virDomainObjPtr vm, } +/** + * qemuDomainNamespaceTeardownHostdev: + * @vm: domain object + * @hostdev: hostdev to remove in @vm's namespace + * + * For given @hostdev, remove its devfs representation (if it has one) in + * domain namespace. Note, @hostdev must not be in @vm's definition. + * + * Returns: 0 on success, + * -1 otherwise. + */ int qemuDomainNamespaceTeardownHostdev(virDomainObjPtr vm, virDomainHostdevDefPtr hostdev) @@ -14614,6 +14645,11 @@ qemuDomainNamespaceTeardownHostdev(virDomainObjPtr vm, if (qemuDomainNamespaceUnlinkPaths(vm, (const char **)paths, npaths) < 0) goto cleanup; + if (qemuHostdevNeedsVFIO(hostdev) && + !qemuDomainNeedsVFIO(vm->def) && + qemuDomainNamespaceUnlinkPath(vm, QEMU_DEV_VFIO) < 0) + goto cleanup; + ret = 0; cleanup: for (i = 0; i < npaths; i++) -- 2.21.0

On 9/26/19 12:12 PM, Michal Privoznik wrote:
In near future, the decision what to do with /dev/vfio/vfio with respect to domain namespace and CGroup is going to be moved out of qemuDomainGetHostdevPath() because there will be some other types of devices than hostdevs that need access to VFIO.
All functions that I'm changing assume that hostdev we are adding/removing to VM is not in the definition yet (because of how qemuDomainNeedsVFIO() is written). Fortunately, this assumption is true.
qemuProcessLaunch -> qemuSetupCgroup -> qemuSetupDevicesCgroup -> has for (i = 0; i < vm->def->nhostdevs; i++) { if (qemuSetupHostdevCgroup(vm, vm->def->hostdevs[i]) < 0) goto cleanup; } So that above paragraph doesn't seem correct. If I apply up to patch #17, this breaks VM startup with a PCI passthrough device, but caveat only with cgroupv1. Apparently cgroupv2 doesn't have any notion of allowDevice ? or at least there's no impl there.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_cgroup.c | 48 +++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_domain.c | 36 +++++++++++++++++++++++++++++++ 2 files changed, 83 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 4d6f0c33cd..f110b49d16 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -25,6 +25,7 @@ #include "qemu_domain.h" #include "qemu_process.h" #include "qemu_extdevice.h" +#include "qemu_hostdev.h" #include "vircgroup.h" #include "virlog.h" #include "viralloc.h" @@ -360,6 +361,17 @@ qemuTeardownInputCgroup(virDomainObjPtr vm, }
+/** + * qemuSetupHostdevCgroup: + * vm: domain object + * @dev: device to allow + * + * For given host device @dev allow access to in Cgroups. + * Note, @dev must not be in @vm's definition. + * + * Returns: 0 on success, + * -1 otherwise. + */ int qemuSetupHostdevCgroup(virDomainObjPtr vm, virDomainHostdevDefPtr dev) @@ -386,6 +398,17 @@ qemuSetupHostdevCgroup(virDomainObjPtr vm, goto cleanup; }
+ if (qemuHostdevNeedsVFIO(dev) && + !qemuDomainNeedsVFIO(vm->def)) { + VIR_DEBUG("Cgroup allow %s perms=%d", QEMU_DEV_VFIO, VIR_CGROUP_DEVICE_RW); + rv = virCgroupAllowDevicePath(priv->cgroup, QEMU_DEV_VFIO, + VIR_CGROUP_DEVICE_RW, false); + virDomainAuditCgroupPath(vm, priv->cgroup, "allow", + QEMU_DEV_VFIO, "rw", rv); + if (rv < 0) + goto cleanup; + } + ret = 0;
So on VM startup this code path isn't hit, because dev is already in vm->def, so the if() condition will never be true. However this patch itself doesn't break things, because qemuDomainGetHostdevPath will also return /dev/vfio/vfio if the device needs it. I guess later patches undo that somehow but I didn't look into yet why that is. Is the !qemuDomainNeedsVFIO even necessary? The existing code will already call virCgroupAllowDevicePath(/dev/vfio/vfio) multiple times if the device has multiple VFIO devices attached so apparently that's not problematic. - Cole

On 10/18/19 12:10 AM, Cole Robinson wrote:
On 9/26/19 12:12 PM, Michal Privoznik wrote:
In near future, the decision what to do with /dev/vfio/vfio with respect to domain namespace and CGroup is going to be moved out of qemuDomainGetHostdevPath() because there will be some other types of devices than hostdevs that need access to VFIO.
All functions that I'm changing assume that hostdev we are adding/removing to VM is not in the definition yet (because of how qemuDomainNeedsVFIO() is written). Fortunately, this assumption is true.
qemuProcessLaunch -> qemuSetupCgroup -> qemuSetupDevicesCgroup ->
has
for (i = 0; i < vm->def->nhostdevs; i++) {
if (qemuSetupHostdevCgroup(vm, vm->def->hostdevs[i]) < 0)
goto cleanup;
}
So that above paragraph doesn't seem correct. If I apply up to patch #17, this breaks VM startup with a PCI passthrough device, but caveat only with cgroupv1. Apparently cgroupv2 doesn't have any notion of allowDevice ? or at least there's no impl there.
Yeah, cgroupv2 doesn't implement devices controller just yet.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_cgroup.c | 48 +++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_domain.c | 36 +++++++++++++++++++++++++++++++ 2 files changed, 83 insertions(+), 1 deletion(-)
@@ -386,6 +398,17 @@ qemuSetupHostdevCgroup(virDomainObjPtr vm, goto cleanup; }
+ if (qemuHostdevNeedsVFIO(dev) && + !qemuDomainNeedsVFIO(vm->def)) { + VIR_DEBUG("Cgroup allow %s perms=%d", QEMU_DEV_VFIO, VIR_CGROUP_DEVICE_RW); + rv = virCgroupAllowDevicePath(priv->cgroup, QEMU_DEV_VFIO, + VIR_CGROUP_DEVICE_RW, false); + virDomainAuditCgroupPath(vm, priv->cgroup, "allow", + QEMU_DEV_VFIO, "rw", rv); + if (rv < 0) + goto cleanup; + } + ret = 0;
So on VM startup this code path isn't hit, because dev is already in vm->def, so the if() condition will never be true.
However this patch itself doesn't break things, because qemuDomainGetHostdevPath will also return /dev/vfio/vfio if the device needs it. I guess later patches undo that somehow but I didn't look into yet why that is.
Is the !qemuDomainNeedsVFIO even necessary? The existing code will already call virCgroupAllowDevicePath(/dev/vfio/vfio) multiple times if the device has multiple VFIO devices attached so apparently that's not problematic.
Ah, good catch. It's not necessary. Will fix and repost. Michal

There are three cases where vir*DeviceGetPath() returns a const string. In these cases, the string is initialized in corresponding vir*DeviceNew() calls which fail if string couldn't be allocated. There's no point in checking the second time if the string is NULL. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 02b6e590cd..ca6de24e68 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -12915,8 +12915,7 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, if (!usb) goto cleanup; - if (!(tmpPath = (char *)virUSBDeviceGetPath(usb))) - goto cleanup; + tmpPath = (char *)virUSBDeviceGetPath(usb); perm = VIR_CGROUP_DEVICE_RW; break; @@ -12937,8 +12936,7 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, if (!scsi) goto cleanup; - if (!(tmpPath = (char *)virSCSIDeviceGetPath(scsi))) - goto cleanup; + tmpPath = (char *)virSCSIDeviceGetPath(scsi); perm = virSCSIDeviceGetReadonly(scsi) ? VIR_CGROUP_DEVICE_READ : VIR_CGROUP_DEVICE_RW; } @@ -12950,8 +12948,7 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, if (!(host = virSCSIVHostDeviceNew(hostsrc->wwpn))) goto cleanup; - if (!(tmpPath = (char *)virSCSIVHostDeviceGetPath(host))) - goto cleanup; + tmpPath = (char *)virSCSIVHostDeviceGetPath(host); perm = VIR_CGROUP_DEVICE_RW; } break; -- 2.21.0

On 9/26/19 12:12 PM, Michal Privoznik wrote:
There are three cases where vir*DeviceGetPath() returns a const string. In these cases, the string is initialized in corresponding vir*DeviceNew() calls which fail if string couldn't be allocated. There's no point in checking the second time if the string is NULL.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

The @freeTmpPath boolean is used to determine if @tmpPath holds an allocated memory or is a pointer to a constant string and therefore if it needs to be freed or not when returning from the function. Well, we can unify the way we set @tmpPath so that it always holds an allocated memory and thus always must be freed. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index ca6de24e68..c903750fa0 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -12871,7 +12871,6 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, virSCSIDevicePtr scsi = NULL; virSCSIVHostDevicePtr host = NULL; char *tmpPath = NULL; - bool freeTmpPath = false; bool includeVFIO = false; char **tmpPaths = NULL; int *tmpPerms = NULL; @@ -12894,7 +12893,6 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, if (!(tmpPath = virPCIDeviceGetIOMMUGroupDev(pci))) goto cleanup; - freeTmpPath = true; perm = VIR_CGROUP_DEVICE_RW; if (teardown) { @@ -12915,7 +12913,8 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, if (!usb) goto cleanup; - tmpPath = (char *)virUSBDeviceGetPath(usb); + if (VIR_STRDUP(tmpPath, virUSBDeviceGetPath(usb)) < 0) + goto cleanup; perm = VIR_CGROUP_DEVICE_RW; break; @@ -12936,7 +12935,8 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, if (!scsi) goto cleanup; - tmpPath = (char *)virSCSIDeviceGetPath(scsi); + if (VIR_STRDUP(tmpPath, virSCSIDeviceGetPath(scsi)) < 0) + goto cleanup; perm = virSCSIDeviceGetReadonly(scsi) ? VIR_CGROUP_DEVICE_READ : VIR_CGROUP_DEVICE_RW; } @@ -12948,7 +12948,8 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, if (!(host = virSCSIVHostDeviceNew(hostsrc->wwpn))) goto cleanup; - tmpPath = (char *)virSCSIVHostDeviceGetPath(host); + if (VIR_STRDUP(tmpPath, virSCSIVHostDeviceGetPath(host)) < 0) + goto cleanup; perm = VIR_CGROUP_DEVICE_RW; } break; @@ -12958,7 +12959,6 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, if (!(tmpPath = virMediatedDeviceGetIOMMUGroupDev(mdevsrc->uuidstr))) goto cleanup; - freeTmpPath = true; includeVFIO = true; perm = VIR_CGROUP_DEVICE_RW; break; @@ -13009,8 +13009,7 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, virUSBDeviceFree(usb); virSCSIDeviceFree(scsi); virSCSIVHostDeviceFree(host); - if (freeTmpPath) - VIR_FREE(tmpPath); + VIR_FREE(tmpPath); return ret; } -- 2.21.0

On 9/26/19 12:12 PM, Michal Privoznik wrote:
The @freeTmpPath boolean is used to determine if @tmpPath holds an allocated memory or is a pointer to a constant string and therefore if it needs to be freed or not when returning from the function. Well, we can unify the way we set @tmpPath so that it always holds an allocated memory and thus always must be freed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

There are several variables which could be automatically freed upon return from the function. I'm not changing @tmpPaths (which is a string list) because it is going to be removed in next commit. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index c903750fa0..d8d2f72bcc 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -12866,14 +12866,14 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, virDomainHostdevSubsysSCSIPtr scsisrc = &dev->source.subsys.u.scsi; virDomainHostdevSubsysSCSIVHostPtr hostsrc = &dev->source.subsys.u.scsi_host; virDomainHostdevSubsysMediatedDevPtr mdevsrc = &dev->source.subsys.u.mdev; - virPCIDevicePtr pci = NULL; - virUSBDevicePtr usb = NULL; - virSCSIDevicePtr scsi = NULL; - virSCSIVHostDevicePtr host = NULL; - char *tmpPath = NULL; + VIR_AUTOPTR(virPCIDevice) pci = NULL; + VIR_AUTOPTR(virUSBDevice) usb = NULL; + VIR_AUTOPTR(virSCSIDevice) scsi = NULL; + VIR_AUTOPTR(virSCSIVHostDevice) host = NULL; + VIR_AUTOFREE(char *) tmpPath = NULL; bool includeVFIO = false; char **tmpPaths = NULL; - int *tmpPerms = NULL; + VIR_AUTOFREE(int *) tmpPerms = NULL; size_t tmpNpaths = 0; int perm = 0; @@ -13004,12 +13004,6 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, ret = 0; cleanup: virStringListFreeCount(tmpPaths, tmpNpaths); - VIR_FREE(tmpPerms); - virPCIDeviceFree(pci); - virUSBDeviceFree(usb); - virSCSIDeviceFree(scsi); - virSCSIVHostDeviceFree(host); - VIR_FREE(tmpPath); return ret; } -- 2.21.0

On 9/26/19 12:12 PM, Michal Privoznik wrote:
There are several variables which could be automatically freed upon return from the function. I'm not changing @tmpPaths (which is a string list) because it is going to be removed in next commit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-)
With the appropriate GLib adjustments: Reviewed-by: Cole Robinson <crobinso@redhat.com> I see you've already made adjustments in your branch. If a v3 is coming then I endorse pushing some of these cleanup pieces out of band, just to reduce the patch count. - Cole
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index c903750fa0..d8d2f72bcc 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -12866,14 +12866,14 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, virDomainHostdevSubsysSCSIPtr scsisrc = &dev->source.subsys.u.scsi; virDomainHostdevSubsysSCSIVHostPtr hostsrc = &dev->source.subsys.u.scsi_host; virDomainHostdevSubsysMediatedDevPtr mdevsrc = &dev->source.subsys.u.mdev; - virPCIDevicePtr pci = NULL; - virUSBDevicePtr usb = NULL; - virSCSIDevicePtr scsi = NULL; - virSCSIVHostDevicePtr host = NULL; - char *tmpPath = NULL; + VIR_AUTOPTR(virPCIDevice) pci = NULL; + VIR_AUTOPTR(virUSBDevice) usb = NULL; + VIR_AUTOPTR(virSCSIDevice) scsi = NULL; + VIR_AUTOPTR(virSCSIVHostDevice) host = NULL; + VIR_AUTOFREE(char *) tmpPath = NULL; bool includeVFIO = false; char **tmpPaths = NULL; - int *tmpPerms = NULL; + VIR_AUTOFREE(int *) tmpPerms = NULL; size_t tmpNpaths = 0; int perm = 0;
@@ -13004,12 +13004,6 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, ret = 0; cleanup: virStringListFreeCount(tmpPaths, tmpNpaths); - VIR_FREE(tmpPerms); - virPCIDeviceFree(pci); - virUSBDeviceFree(usb); - virSCSIDeviceFree(scsi); - virSCSIVHostDeviceFree(host); - VIR_FREE(tmpPath); return ret; }
- Cole

Now that all callers of qemuDomainGetHostdevPath() handle /dev/vfio/vfio on their own, we can safely drop handling in this function. In near future the decision whether domain needs VFIO file is going to include more device types than just virDomainHostdev. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_cgroup.c | 52 ++++++++------------- src/qemu/qemu_domain.c | 100 +++++++++-------------------------------- src/qemu/qemu_domain.h | 9 ++-- 3 files changed, 42 insertions(+), 119 deletions(-) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index f110b49d16..e3ea1e30ab 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -377,26 +377,23 @@ qemuSetupHostdevCgroup(virDomainObjPtr vm, virDomainHostdevDefPtr dev) { qemuDomainObjPrivatePtr priv = vm->privateData; - char **path = NULL; - int *perms = NULL; - size_t i, npaths = 0; + VIR_AUTOFREE(char *) path = NULL; + int perms; int rv, ret = -1; if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) return 0; - if (qemuDomainGetHostdevPath(NULL, dev, false, &npaths, &path, &perms) < 0) + if (qemuDomainGetHostdevPath(dev, &path, &perms) < 0) goto cleanup; - for (i = 0; i < npaths; i++) { - VIR_DEBUG("Cgroup allow %s perms=%d", path[i], perms[i]); - rv = virCgroupAllowDevicePath(priv->cgroup, path[i], perms[i], false); - virDomainAuditCgroupPath(vm, priv->cgroup, "allow", path[i], - virCgroupGetDevicePermsString(perms[i]), - rv); - if (rv < 0) - goto cleanup; - } + VIR_DEBUG("Cgroup allow %s perms=%d", path, perms); + rv = virCgroupAllowDevicePath(priv->cgroup, path, perms, false); + virDomainAuditCgroupPath(vm, priv->cgroup, "allow", path, + virCgroupGetDevicePermsString(perms), + rv); + if (rv < 0) + goto cleanup; if (qemuHostdevNeedsVFIO(dev) && !qemuDomainNeedsVFIO(vm->def)) { @@ -412,10 +409,6 @@ qemuSetupHostdevCgroup(virDomainObjPtr vm, ret = 0; cleanup: - for (i = 0; i < npaths; i++) - VIR_FREE(path[i]); - VIR_FREE(path); - VIR_FREE(perms); return ret; } @@ -436,26 +429,22 @@ qemuTeardownHostdevCgroup(virDomainObjPtr vm, virDomainHostdevDefPtr dev) { qemuDomainObjPrivatePtr priv = vm->privateData; - char **path = NULL; - size_t i, npaths = 0; + VIR_AUTOFREE(char *) path = NULL; int rv, ret = -1; if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) return 0; - if (qemuDomainGetHostdevPath(vm->def, dev, true, - &npaths, &path, NULL) < 0) + if (qemuDomainGetHostdevPath(dev, &path, NULL) < 0) goto cleanup; - for (i = 0; i < npaths; i++) { - VIR_DEBUG("Cgroup deny %s", path[i]); - rv = virCgroupDenyDevicePath(priv->cgroup, path[i], - VIR_CGROUP_DEVICE_RWM, false); - virDomainAuditCgroupPath(vm, priv->cgroup, - "deny", path[i], "rwm", rv); - if (rv < 0) - goto cleanup; - } + VIR_DEBUG("Cgroup deny %s", path); + rv = virCgroupDenyDevicePath(priv->cgroup, path, + VIR_CGROUP_DEVICE_RWM, false); + virDomainAuditCgroupPath(vm, priv->cgroup, + "deny", path, "rwm", rv); + if (rv < 0) + goto cleanup; if (qemuHostdevNeedsVFIO(dev) && !qemuDomainNeedsVFIO(vm->def)) { @@ -470,9 +459,6 @@ qemuTeardownHostdevCgroup(virDomainObjPtr vm, ret = 0; cleanup: - for (i = 0; i < npaths; i++) - VIR_FREE(path[i]); - VIR_FREE(path); return ret; } diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index d8d2f72bcc..bfe7838220 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -12836,29 +12836,23 @@ qemuDomainNeedsVFIO(const virDomainDef *def) /** * qemuDomainGetHostdevPath: - * @def: domain definition * @dev: host device definition - * @teardown: true if device will be removed - * @npaths: number of items in @path and @perms arrays * @path: resulting path to @dev * @perms: Optional pointer to VIR_CGROUP_DEVICE_* perms * * For given device @dev fetch its host path and store it at - * @path. If a device requires other paths to be present/allowed - * they are stored in the @path array after the actual path. - * Optionally, caller can get @perms on the path (e.g. rw/ro). + * @path. Optionally, caller can get @perms on the path (e.g. + * rw/ro). * - * The caller is responsible for freeing the memory. + * The caller is responsible for freeing the @path when no longer + * needed. * * Returns 0 on success, -1 otherwise. */ int -qemuDomainGetHostdevPath(virDomainDefPtr def, - virDomainHostdevDefPtr dev, - bool teardown, - size_t *npaths, - char ***path, - int **perms) +qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, + char **path, + int *perms) { int ret = -1; virDomainHostdevSubsysUSBPtr usbsrc = &dev->source.subsys.u.usb; @@ -12871,14 +12865,8 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, VIR_AUTOPTR(virSCSIDevice) scsi = NULL; VIR_AUTOPTR(virSCSIVHostDevice) host = NULL; VIR_AUTOFREE(char *) tmpPath = NULL; - bool includeVFIO = false; - char **tmpPaths = NULL; - VIR_AUTOFREE(int *) tmpPerms = NULL; - size_t tmpNpaths = 0; int perm = 0; - *npaths = 0; - switch ((virDomainHostdevMode) dev->mode) { case VIR_DOMAIN_HOSTDEV_MODE_SUBSYS: switch ((virDomainHostdevSubsysType)dev->source.subsys.type) { @@ -12895,12 +12883,6 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, goto cleanup; perm = VIR_CGROUP_DEVICE_RW; - if (teardown) { - if (!virDomainDefHasVFIOHostdev(def)) - includeVFIO = true; - } else { - includeVFIO = true; - } } break; @@ -12959,7 +12941,6 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, if (!(tmpPath = virMediatedDeviceGetIOMMUGroupDev(mdevsrc->uuidstr))) goto cleanup; - includeVFIO = true; perm = VIR_CGROUP_DEVICE_RW; break; case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_LAST: @@ -12973,37 +12954,11 @@ qemuDomainGetHostdevPath(virDomainDefPtr def, break; } - if (tmpPath) { - size_t toAlloc = 1; - - if (includeVFIO) - toAlloc = 2; - - if (VIR_ALLOC_N(tmpPaths, toAlloc) < 0 || - VIR_ALLOC_N(tmpPerms, toAlloc) < 0 || - VIR_STRDUP(tmpPaths[0], tmpPath) < 0) - goto cleanup; - tmpNpaths = toAlloc; - tmpPerms[0] = perm; - - if (includeVFIO) { - if (VIR_STRDUP(tmpPaths[1], QEMU_DEV_VFIO) < 0) - goto cleanup; - tmpPerms[1] = VIR_CGROUP_DEVICE_RW; - } - } - - *npaths = tmpNpaths; - tmpNpaths = 0; - *path = tmpPaths; - tmpPaths = NULL; - if (perms) { - *perms = tmpPerms; - tmpPerms = NULL; - } + VIR_STEAL_PTR(*path, tmpPath); + if (perms) + *perms = perm; ret = 0; cleanup: - virStringListFreeCount(tmpPaths, tmpNpaths); return ret; } @@ -13519,16 +13474,13 @@ qemuDomainSetupHostdev(virQEMUDriverConfigPtr cfg ATTRIBUTE_UNUSED, const struct qemuDomainCreateDeviceData *data) { int ret = -1; - char **path = NULL; - size_t i, npaths = 0; + VIR_AUTOFREE(char *) path = NULL; - if (qemuDomainGetHostdevPath(NULL, dev, false, &npaths, &path, NULL) < 0) + if (qemuDomainGetHostdevPath(dev, &path, NULL) < 0) goto cleanup; - for (i = 0; i < npaths; i++) { - if (qemuDomainCreateDevice(path[i], data, false) < 0) - goto cleanup; - } + if (qemuDomainCreateDevice(path, data, false) < 0) + goto cleanup; if (qemuHostdevNeedsVFIO(dev) && qemuDomainCreateDevice(QEMU_DEV_VFIO, data, false) < 0) @@ -13536,9 +13488,6 @@ qemuDomainSetupHostdev(virQEMUDriverConfigPtr cfg ATTRIBUTE_UNUSED, ret = 0; cleanup: - for (i = 0; i < npaths; i++) - VIR_FREE(path[i]); - VIR_FREE(path); return ret; } @@ -14586,13 +14535,12 @@ qemuDomainNamespaceSetupHostdev(virDomainObjPtr vm, virDomainHostdevDefPtr hostdev) { int ret = -1; - char **paths = NULL; - size_t i, npaths = 0; + VIR_AUTOFREE(char *) path = NULL; - if (qemuDomainGetHostdevPath(NULL, hostdev, false, &npaths, &paths, NULL) < 0) + if (qemuDomainGetHostdevPath(hostdev, &path, NULL) < 0) goto cleanup; - if (qemuDomainNamespaceMknodPaths(vm, (const char **)paths, npaths) < 0) + if (qemuDomainNamespaceMknodPath(vm, path) < 0) goto cleanup; if (qemuHostdevNeedsVFIO(hostdev) && @@ -14602,9 +14550,6 @@ qemuDomainNamespaceSetupHostdev(virDomainObjPtr vm, ret = 0; cleanup: - for (i = 0; i < npaths; i++) - VIR_FREE(paths[i]); - VIR_FREE(paths); return ret; } @@ -14625,14 +14570,12 @@ qemuDomainNamespaceTeardownHostdev(virDomainObjPtr vm, virDomainHostdevDefPtr hostdev) { int ret = -1; - char **paths = NULL; - size_t i, npaths = 0; + VIR_AUTOFREE(char *) path = NULL; - if (qemuDomainGetHostdevPath(vm->def, hostdev, true, - &npaths, &paths, NULL) < 0) + if (qemuDomainGetHostdevPath(hostdev, &path, NULL) < 0) goto cleanup; - if (qemuDomainNamespaceUnlinkPaths(vm, (const char **)paths, npaths) < 0) + if (qemuDomainNamespaceUnlinkPath(vm, path) < 0) goto cleanup; if (qemuHostdevNeedsVFIO(hostdev) && @@ -14642,9 +14585,6 @@ qemuDomainNamespaceTeardownHostdev(virDomainObjPtr vm, ret = 0; cleanup: - for (i = 0; i < npaths; i++) - VIR_FREE(paths[i]); - VIR_FREE(paths); return ret; } diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 8e3917c205..de6479f8e0 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -1086,12 +1086,9 @@ bool qemuDomainSupportsVideoVga(virDomainVideoDefPtr video, bool qemuDomainNeedsVFIO(const virDomainDef *def); -int qemuDomainGetHostdevPath(virDomainDefPtr def, - virDomainHostdevDefPtr dev, - bool teardown, - size_t *npaths, - char ***path, - int **perms); +int qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, + char **path, + int *perms); int qemuDomainBuildNamespace(virQEMUDriverConfigPtr cfg, virSecurityManagerPtr mgr, -- 2.21.0

Previous patches rendered some of 'cleanup' labels needless. Drop them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_cgroup.c | 25 ++++++++----------- src/qemu/qemu_domain.c | 56 +++++++++++++++++------------------------- 2 files changed, 32 insertions(+), 49 deletions(-) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index e3ea1e30ab..9684bf3662 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -379,13 +379,13 @@ qemuSetupHostdevCgroup(virDomainObjPtr vm, qemuDomainObjPrivatePtr priv = vm->privateData; VIR_AUTOFREE(char *) path = NULL; int perms; - int rv, ret = -1; + int rv; if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) return 0; if (qemuDomainGetHostdevPath(dev, &path, &perms) < 0) - goto cleanup; + return -1; VIR_DEBUG("Cgroup allow %s perms=%d", path, perms); rv = virCgroupAllowDevicePath(priv->cgroup, path, perms, false); @@ -393,7 +393,7 @@ qemuSetupHostdevCgroup(virDomainObjPtr vm, virCgroupGetDevicePermsString(perms), rv); if (rv < 0) - goto cleanup; + return -1; if (qemuHostdevNeedsVFIO(dev) && !qemuDomainNeedsVFIO(vm->def)) { @@ -403,13 +403,10 @@ qemuSetupHostdevCgroup(virDomainObjPtr vm, virDomainAuditCgroupPath(vm, priv->cgroup, "allow", QEMU_DEV_VFIO, "rw", rv); if (rv < 0) - goto cleanup; + return -1; } - ret = 0; - - cleanup: - return ret; + return 0; } @@ -430,13 +427,13 @@ qemuTeardownHostdevCgroup(virDomainObjPtr vm, { qemuDomainObjPrivatePtr priv = vm->privateData; VIR_AUTOFREE(char *) path = NULL; - int rv, ret = -1; + int rv; if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) return 0; if (qemuDomainGetHostdevPath(dev, &path, NULL) < 0) - goto cleanup; + return -1; VIR_DEBUG("Cgroup deny %s", path); rv = virCgroupDenyDevicePath(priv->cgroup, path, @@ -444,7 +441,7 @@ qemuTeardownHostdevCgroup(virDomainObjPtr vm, virDomainAuditCgroupPath(vm, priv->cgroup, "deny", path, "rwm", rv); if (rv < 0) - goto cleanup; + return -1; if (qemuHostdevNeedsVFIO(dev) && !qemuDomainNeedsVFIO(vm->def)) { @@ -454,12 +451,10 @@ qemuTeardownHostdevCgroup(virDomainObjPtr vm, virDomainAuditCgroupPath(vm, priv->cgroup, "deny", QEMU_DEV_VFIO, "rwm", rv); if (rv < 0) - goto cleanup; + return -1; } - ret = 0; - cleanup: - return ret; + return 0; } diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index bfe7838220..e92c2053c1 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -12854,7 +12854,6 @@ qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, char **path, int *perms) { - int ret = -1; virDomainHostdevSubsysUSBPtr usbsrc = &dev->source.subsys.u.usb; virDomainHostdevSubsysPCIPtr pcisrc = &dev->source.subsys.u.pci; virDomainHostdevSubsysSCSIPtr scsisrc = &dev->source.subsys.u.scsi; @@ -12877,10 +12876,10 @@ qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, pcisrc->addr.slot, pcisrc->addr.function); if (!pci) - goto cleanup; + return -1; if (!(tmpPath = virPCIDeviceGetIOMMUGroupDev(pci))) - goto cleanup; + return -1; perm = VIR_CGROUP_DEVICE_RW; } @@ -12893,10 +12892,10 @@ qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, usbsrc->device, NULL); if (!usb) - goto cleanup; + return -1; if (VIR_STRDUP(tmpPath, virUSBDeviceGetPath(usb)) < 0) - goto cleanup; + return -1; perm = VIR_CGROUP_DEVICE_RW; break; @@ -12915,10 +12914,10 @@ qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, dev->shareable); if (!scsi) - goto cleanup; + return -1; if (VIR_STRDUP(tmpPath, virSCSIDeviceGetPath(scsi)) < 0) - goto cleanup; + return -1; perm = virSCSIDeviceGetReadonly(scsi) ? VIR_CGROUP_DEVICE_READ : VIR_CGROUP_DEVICE_RW; } @@ -12928,10 +12927,10 @@ qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, if (hostsrc->protocol == VIR_DOMAIN_HOSTDEV_SUBSYS_SCSI_HOST_PROTOCOL_TYPE_VHOST) { if (!(host = virSCSIVHostDeviceNew(hostsrc->wwpn))) - goto cleanup; + return -1; if (VIR_STRDUP(tmpPath, virSCSIVHostDeviceGetPath(host)) < 0) - goto cleanup; + return -1; perm = VIR_CGROUP_DEVICE_RW; } break; @@ -12939,7 +12938,7 @@ qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_MDEV: if (!(tmpPath = virMediatedDeviceGetIOMMUGroupDev(mdevsrc->uuidstr))) - goto cleanup; + return -1; perm = VIR_CGROUP_DEVICE_RW; break; @@ -12957,9 +12956,7 @@ qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, VIR_STEAL_PTR(*path, tmpPath); if (perms) *perms = perm; - ret = 0; - cleanup: - return ret; + return 0; } @@ -13473,22 +13470,19 @@ qemuDomainSetupHostdev(virQEMUDriverConfigPtr cfg ATTRIBUTE_UNUSED, virDomainHostdevDefPtr dev, const struct qemuDomainCreateDeviceData *data) { - int ret = -1; VIR_AUTOFREE(char *) path = NULL; if (qemuDomainGetHostdevPath(dev, &path, NULL) < 0) - goto cleanup; + return -1; if (qemuDomainCreateDevice(path, data, false) < 0) - goto cleanup; + return -1; if (qemuHostdevNeedsVFIO(dev) && qemuDomainCreateDevice(QEMU_DEV_VFIO, data, false) < 0) - goto cleanup; + return -1; - ret = 0; - cleanup: - return ret; + return 0; } @@ -14534,23 +14528,20 @@ int qemuDomainNamespaceSetupHostdev(virDomainObjPtr vm, virDomainHostdevDefPtr hostdev) { - int ret = -1; VIR_AUTOFREE(char *) path = NULL; if (qemuDomainGetHostdevPath(hostdev, &path, NULL) < 0) - goto cleanup; + return -1; if (qemuDomainNamespaceMknodPath(vm, path) < 0) - goto cleanup; + return -1; if (qemuHostdevNeedsVFIO(hostdev) && !qemuDomainNeedsVFIO(vm->def) && qemuDomainNamespaceMknodPath(vm, QEMU_DEV_VFIO) < 0) - goto cleanup; + return -1; - ret = 0; - cleanup: - return ret; + return 0; } @@ -14569,23 +14560,20 @@ int qemuDomainNamespaceTeardownHostdev(virDomainObjPtr vm, virDomainHostdevDefPtr hostdev) { - int ret = -1; VIR_AUTOFREE(char *) path = NULL; if (qemuDomainGetHostdevPath(hostdev, &path, NULL) < 0) - goto cleanup; + return -1; if (qemuDomainNamespaceUnlinkPath(vm, path) < 0) - goto cleanup; + return -1; if (qemuHostdevNeedsVFIO(hostdev) && !qemuDomainNeedsVFIO(vm->def) && qemuDomainNamespaceUnlinkPath(vm, QEMU_DEV_VFIO) < 0) - goto cleanup; + return -1; - ret = 0; - cleanup: - return ret; + return 0; } -- 2.21.0

Sometimes, we have a PCI address and not fully allocated virPCIDevice and yet we still want to know its /dev/vfio/N path. Introduce virPCIDeviceAddressGetIOMMUGroupDev() function exactly for that. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/qemu/qemu_domain.c | 10 +--------- src/util/virpci.c | 15 +++++++++++++++ src/util/virpci.h | 1 + 4 files changed, 18 insertions(+), 9 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index de124eb37b..4b726b3139 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2658,6 +2658,7 @@ virPCIDeviceAddressAsString; virPCIDeviceAddressEqual; virPCIDeviceAddressFree; virPCIDeviceAddressGetIOMMUGroupAddresses; +virPCIDeviceAddressGetIOMMUGroupDev; virPCIDeviceAddressGetIOMMUGroupNum; virPCIDeviceAddressGetSysfsFile; virPCIDeviceAddressIOMMUGroupIterate; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index e92c2053c1..f8fe430a7f 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -12859,7 +12859,6 @@ qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, virDomainHostdevSubsysSCSIPtr scsisrc = &dev->source.subsys.u.scsi; virDomainHostdevSubsysSCSIVHostPtr hostsrc = &dev->source.subsys.u.scsi_host; virDomainHostdevSubsysMediatedDevPtr mdevsrc = &dev->source.subsys.u.mdev; - VIR_AUTOPTR(virPCIDevice) pci = NULL; VIR_AUTOPTR(virUSBDevice) usb = NULL; VIR_AUTOPTR(virSCSIDevice) scsi = NULL; VIR_AUTOPTR(virSCSIVHostDevice) host = NULL; @@ -12871,14 +12870,7 @@ qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, switch ((virDomainHostdevSubsysType)dev->source.subsys.type) { case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: if (pcisrc->backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) { - pci = virPCIDeviceNew(pcisrc->addr.domain, - pcisrc->addr.bus, - pcisrc->addr.slot, - pcisrc->addr.function); - if (!pci) - return -1; - - if (!(tmpPath = virPCIDeviceGetIOMMUGroupDev(pci))) + if (!(tmpPath = virPCIDeviceAddressGetIOMMUGroupDev(&pcisrc->addr))) return -1; perm = VIR_CGROUP_DEVICE_RW; diff --git a/src/util/virpci.c b/src/util/virpci.c index ee78151e74..fc620d49ce 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1974,6 +1974,21 @@ virPCIDeviceAddressGetIOMMUGroupNum(virPCIDeviceAddressPtr addr) } +char * +virPCIDeviceAddressGetIOMMUGroupDev(const virPCIDeviceAddress *devAddr) +{ + VIR_AUTOPTR(virPCIDevice) pci = NULL; + + if (!(pci = virPCIDeviceNew(devAddr->domain, + devAddr->bus, + devAddr->slot, + devAddr->function))) + return NULL; + + return virPCIDeviceGetIOMMUGroupDev(pci); +} + + /* virPCIDeviceGetIOMMUGroupDev - return the name of the device used * to control this PCI device's group (e.g. "/dev/vfio/15") */ diff --git a/src/util/virpci.h b/src/util/virpci.h index dc20f91710..293d10b3ab 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -193,6 +193,7 @@ int virPCIDeviceAddressGetIOMMUGroupAddresses(virPCIDeviceAddressPtr devAddr, virPCIDeviceAddressPtr **iommuGroupDevices, size_t *nIommuGroupDevices); int virPCIDeviceAddressGetIOMMUGroupNum(virPCIDeviceAddressPtr addr); +char *virPCIDeviceAddressGetIOMMUGroupDev(const virPCIDeviceAddress *devAddr); char *virPCIDeviceGetIOMMUGroupDev(virPCIDevicePtr dev); int virPCIDeviceIsAssignable(virPCIDevicePtr dev, -- 2.21.0

In near future we will have a list of PCI devices we want to detach (held in virPCIDeviceListPtr) but we don't have virDomainHostdevDefPtr. That's okay because virHostdevPreparePCIDevices() works with virPCIDeviceListPtr mostly anyway. And in very few places where it needs virDomainHostdevDefPtr are not interesting for our case. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ACKed-by: Peter Krempa <pkrempa@redhat.com> --- src/util/virhostdev.c | 48 +++++++++++++++++++++++++++++-------------- 1 file changed, 33 insertions(+), 15 deletions(-) diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index 85812423b5..d2c881bddf 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -719,27 +719,22 @@ virHostdevReattachAllPCIDevices(virHostdevManagerPtr mgr, } } -int -virHostdevPreparePCIDevices(virHostdevManagerPtr mgr, - const char *drv_name, - const char *dom_name, - const unsigned char *uuid, - virDomainHostdevDefPtr *hostdevs, - int nhostdevs, - unsigned int flags) + +static int +virHostdevPreparePCIDevicesImpl(virHostdevManagerPtr mgr, + const char *drv_name, + const char *dom_name, + const unsigned char *uuid, + virPCIDeviceListPtr pcidevs, + virDomainHostdevDefPtr *hostdevs, + int nhostdevs, + unsigned int flags) { - VIR_AUTOUNREF(virPCIDeviceListPtr) pcidevs = NULL; int last_processed_hostdev_vf = -1; size_t i; int ret = -1; virPCIDeviceAddressPtr devAddr = NULL; - if (!nhostdevs) - return 0; - - if (!(pcidevs = virHostdevGetPCIHostDeviceList(hostdevs, nhostdevs))) - return -1; - virObjectLock(mgr->activePCIHostdevs); virObjectLock(mgr->inactivePCIHostdevs); @@ -989,6 +984,29 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr mgr, return ret; } + +int +virHostdevPreparePCIDevices(virHostdevManagerPtr mgr, + const char *drv_name, + const char *dom_name, + const unsigned char *uuid, + virDomainHostdevDefPtr *hostdevs, + int nhostdevs, + unsigned int flags) +{ + VIR_AUTOUNREF(virPCIDeviceListPtr) pcidevs = NULL; + + if (!nhostdevs) + return 0; + + if (!(pcidevs = virHostdevGetPCIHostDeviceList(hostdevs, nhostdevs))) + return -1; + + return virHostdevPreparePCIDevicesImpl(mgr, drv_name, dom_name, uuid, + pcidevs, hostdevs, nhostdevs, flags); +} + + /* @oldStateDir: * For upgrade purpose: see virHostdevRestoreNetConfig */ -- 2.21.0

In near future we will have a list of PCI devices we want to re-attach to the host (held in virPCIDeviceListPtr) but we don't have virDomainHostdevDefPtr. That's okay because virHostdevReAttachPCIDevices() works with virPCIDeviceListPtr mostly anyway. And in very few places where it needs virDomainHostdevDefPtr are not interesting for our case. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ACKed-by: Peter Krempa <pkrempa@redhat.com> --- src/util/virhostdev.c | 58 +++++++++++++++++++++++++++---------------- 1 file changed, 37 insertions(+), 21 deletions(-) diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index d2c881bddf..f0c97ca887 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -1007,30 +1007,17 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr mgr, } -/* @oldStateDir: - * For upgrade purpose: see virHostdevRestoreNetConfig - */ -void -virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr, - const char *drv_name, - const char *dom_name, - virDomainHostdevDefPtr *hostdevs, - int nhostdevs, - const char *oldStateDir) +static void +virHostdevReAttachPCIDevicesImpl(virHostdevManagerPtr mgr, + const char *drv_name, + const char *dom_name, + virPCIDeviceListPtr pcidevs, + virDomainHostdevDefPtr *hostdevs, + int nhostdevs, + const char *oldStateDir) { - VIR_AUTOUNREF(virPCIDeviceListPtr) pcidevs = NULL; size_t i; - if (!nhostdevs) - return; - - if (!(pcidevs = virHostdevGetPCIHostDeviceList(hostdevs, nhostdevs))) { - VIR_ERROR(_("Failed to allocate PCI device list: %s"), - virGetLastErrorMessage()); - virResetLastError(); - return; - } - virObjectLock(mgr->activePCIHostdevs); virObjectLock(mgr->inactivePCIHostdevs); @@ -1126,6 +1113,35 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr, virObjectUnlock(mgr->inactivePCIHostdevs); } + +/* @oldStateDir: + * For upgrade purpose: see virHostdevRestoreNetConfig + */ +void +virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr, + const char *drv_name, + const char *dom_name, + virDomainHostdevDefPtr *hostdevs, + int nhostdevs, + const char *oldStateDir) +{ + VIR_AUTOUNREF(virPCIDeviceListPtr) pcidevs = NULL; + + if (!nhostdevs) + return; + + if (!(pcidevs = virHostdevGetPCIHostDeviceList(hostdevs, nhostdevs))) { + VIR_ERROR(_("Failed to allocate PCI device list: %s"), + virGetLastErrorMessage()); + virResetLastError(); + return; + } + + virHostdevReAttachPCIDevicesImpl(mgr, drv_name, dom_name, pcidevs, + hostdevs, nhostdevs, oldStateDir); +} + + int virHostdevUpdateActivePCIDevices(virHostdevManagerPtr mgr, virDomainHostdevDefPtr *hostdevs, -- 2.21.0

This helper is cleaner than plain memcpy() because one doesn't have to look into virPCIDeviceAddress struct to see if it contains any strings / pointers. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ACKed-by: Peter Krempa <pkrempa@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virpci.c | 14 ++++++++++++++ src/util/virpci.h | 4 ++++ 3 files changed, 19 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 4b726b3139..b3c95495c3 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2655,6 +2655,7 @@ virObjectUnref; # util/virpci.h virPCIDeviceAddressAsString; +virPCIDeviceAddressCopy; virPCIDeviceAddressEqual; virPCIDeviceAddressFree; virPCIDeviceAddressGetIOMMUGroupAddresses; diff --git a/src/util/virpci.c b/src/util/virpci.c index fc620d49ce..9bf081540a 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1350,6 +1350,20 @@ virPCIDeviceAddressEqual(const virPCIDeviceAddress *addr1, return false; } +/** + * virPCIDeviceAddressCopy: + * @dst: where to store address + * @src: source address to copy + * + * Creates a deep copy of given @src address and stores it into + * @dst which has to be pre-allocated by caller. + */ +void virPCIDeviceAddressCopy(virPCIDeviceAddressPtr dst, + const virPCIDeviceAddress *src) +{ + memcpy(dst, src, sizeof(*src)); +} + char * virPCIDeviceAddressAsString(const virPCIDeviceAddress *addr) { diff --git a/src/util/virpci.h b/src/util/virpci.h index 293d10b3ab..0a89bc9dcf 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -42,6 +42,7 @@ typedef virZPCIDeviceAddress *virZPCIDeviceAddressPtr; struct _virZPCIDeviceAddress { unsigned int uid; /* exempt from syntax-check */ unsigned int fid; + /* Don't forget to update virPCIDeviceAddressCopy if needed. */ }; #define VIR_PCI_DEVICE_ADDRESS_FMT "%04x:%02x:%02x.%d" @@ -54,6 +55,7 @@ struct _virPCIDeviceAddress { int multi; /* virTristateSwitch */ int extFlags; /* enum virPCIDeviceAddressExtensionFlags */ virZPCIDeviceAddress zpci; + /* Don't forget to update virPCIDeviceAddressCopy if needed. */ }; typedef enum { @@ -234,6 +236,8 @@ bool virPCIDeviceAddressIsEmpty(const virPCIDeviceAddress *addr); bool virPCIDeviceAddressEqual(const virPCIDeviceAddress *addr1, const virPCIDeviceAddress *addr2); +void virPCIDeviceAddressCopy(virPCIDeviceAddressPtr dst, + const virPCIDeviceAddress *src); char *virPCIDeviceAddressAsString(const virPCIDeviceAddress *addr) ATTRIBUTE_NONNULL(1); -- 2.21.0

There are going to be more disk types that are considered unsafe with respect to migration. Therefore, move the error reporting call outside of if() body and rework if-else combo to switch(). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_migration.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index a98ec2d55a..693240ed3d 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1239,6 +1239,8 @@ qemuMigrationSrcIsSafe(virDomainDefPtr def, for (i = 0; i < def->ndisks; i++) { virDomainDiskDefPtr disk = def->disks[i]; const char *src = virDomainDiskGetSource(disk); + int actualType = virStorageSourceGetActualType(disk->src); + bool unsafe = false; /* Disks without any source (i.e. floppies and CD-ROMs) * OR readonly are safe. */ @@ -1252,21 +1254,34 @@ qemuMigrationSrcIsSafe(virDomainDefPtr def, continue; /* However, disks on local FS (e.g. ext4) are not safe. */ - if (virStorageSourceGetActualType(disk->src) == VIR_STORAGE_TYPE_FILE) { + switch ((virStorageType) actualType) { + case VIR_STORAGE_TYPE_FILE: if ((rc = virFileIsSharedFS(src)) < 0) { return false; } else if (rc == 0) { - virReportError(VIR_ERR_MIGRATE_UNSAFE, "%s", - _("Migration without shared storage is unsafe")); - return false; + unsafe = true; } if ((rc = virStorageFileIsClusterFS(src)) < 0) return false; else if (rc == 1) continue; - } else if (virStorageSourceGetActualType(disk->src) == VIR_STORAGE_TYPE_NETWORK) { + break; + case VIR_STORAGE_TYPE_NETWORK: /* But network disks are safe again. */ continue; + + case VIR_STORAGE_TYPE_NONE: + case VIR_STORAGE_TYPE_BLOCK: + case VIR_STORAGE_TYPE_DIR: + case VIR_STORAGE_TYPE_VOLUME: + case VIR_STORAGE_TYPE_LAST: + break; + } + + if (unsafe) { + virReportError(VIR_ERR_MIGRATE_UNSAFE, "%s", + _("Migration without shared storage is unsafe")); + return false; } /* Our code elsewhere guarantees shared disks are either readonly (in -- 2.21.0

There is this class of PCI devices that act like disks: NVMe. Therefore, they are both PCI devices and disks. While we already have <hostdev/> (and can assign a NVMe device to a domain successfully) we don't have disk representation. There are three problems with PCI assignment in case of a NVMe device: 1) domains with <hostdev/> can't be migrated 2) NVMe device is assigned whole, there's no way to assign only a namespace 3) Because hypervisors see <hostdev/> they don't put block layer on top of it - users don't get all the fancy features like snapshots NVMe namespaces are way of splitting one continuous NVDIMM memory into smaller ones, effectively creating smaller NVMe-s (which can then be partitioned, LVMed, etc.) Because of all of this the following XML was chosen to model a NVMe device: <disk type='nvme' device='disk'> <driver name='qemu' type='raw'/> <source type='pci' managed='yes' namespace='1'> <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> </source> <target dev='vda' bus='virtio'/> </disk> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 57 +++++++++++++++++++++++-- docs/schemas/domaincommon.rng | 32 ++++++++++++++ tests/qemuxml2argvdata/disk-nvme.xml | 63 ++++++++++++++++++++++++++++ 3 files changed, 149 insertions(+), 3 deletions(-) create mode 100644 tests/qemuxml2argvdata/disk-nvme.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 93991aa3d1..ad116ff509 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -2934,6 +2934,13 @@ </backingStore> <target dev='vdd' bus='virtio'/> </disk> + <disk type='nvme' device='disk'> + <driver name='qemu' type='raw'/> + <source type='pci' managed='yes' namespace='1'> + <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> + </source> + <target dev='vde' bus='virtio'/> + </disk> </devices> ...</pre> @@ -2947,7 +2954,8 @@ Valid values are "file", "block", "dir" (<span class="since">since 0.7.5</span>), "network" (<span class="since">since 0.8.7</span>), or - "volume" (<span class="since">since 1.0.5</span>) + "volume" (<span class="since">since 1.0.5</span>), or + "nvme" (<span class="since">since 5.6.0</span>) and refer to the underlying source for the disk. <span class="since">Since 0.0.3</span> </dd> @@ -3130,6 +3138,43 @@ <span class="since">Since 1.0.5</span> </p> </dd> + <dt><code>nvme</code></dt> + <dd> + To specify disk source for NVMe disk the <code>source</code> + element has the following attributes: + <dl> + <dt><code>type</code></dt> + <dd>The type of address specified in <code>address</code> + sub-element. Currently, only <code>pci</code> value is + accepted. + </dd> + + <dt><code>managed</code></dt> + <dd>This attribute instructs libvirt to detach NVMe + controller automatically on domain startup (<code>yes</code>) + or expect the controller to be detached by system + administrator (<code>no</code>). + </dd> + + <dt><code>namespace</code></dt> + <dd>The namespace ID which should be assigned to the domain. + According to NVMe standard, namespace numbers start from 1, + including. + </dd> + </dl> + + The difference between <code><disk type='nvme'></code> + and <code><hostdev/></code> is that the latter is plain + host device assignment with all its limitations (e.g. no live + migration), while the former makes hypervisor to run the NVMe + disk through hypervisor's block layer thus enabling all + features provided by the layer (e.g. snapshots, domain + migration, etc.). Moreover, since the NVMe disk is unbinded + from its PCI driver, the host kernel storage stack is not + involved (compared to passing say <code>/dev/nvme0n1</code> via + <code><disk type='block'></code> and therefore lower + latencies can be achieved. + </dd> </dl> With "file", "block", and "volume", one or more optional sub-elements <code>seclabel</code>, <a href="#seclabel">described @@ -3292,11 +3337,17 @@ initiator IQN needed to access the source via mandatory attribute <code>name</code>. </dd> + <dt><code>address</code></dt> + <dd>For disk of type <code>nvme</code> this element + specifies the PCI address of the host NVMe + controller. + <span class="since">Since 5.6.0</span> + </dd> </dl> <p> - For a "file" or "volume" disk type which represents a cdrom or floppy - (the <code>device</code> attribute), it is possible to define + For a "file" or "volume" disk type which represents a cdrom or + floppy (the <code>device</code> attribute), it is possible to define policy what to do with the disk if the source file is not accessible. (NB, <code>startupPolicy</code> is not valid for "volume" disk unless the specified storage volume is of "file" type). This is done by the diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 40eb4a2d75..4871a4c4d4 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -1603,6 +1603,7 @@ <ref name="diskSourceDir"/> <ref name="diskSourceNetwork"/> <ref name="diskSourceVolume"/> + <ref name="diskSourceNvme"/> </choice> </define> @@ -1918,6 +1919,37 @@ </optional> </define> + <define name="diskSourceNvme"> + <attribute name="type"> + <value>nvme</value> + </attribute> + <optional> + <element name="source"> + <attribute name="type"> + <value>pci</value> + </attribute> + <attribute name="namespace"> + <ref name="uint32"/> + </attribute> + <optional> + <attribute name="managed"> + <ref name="virYesNo"/> + </attribute> + </optional> + <element name="address"> + <ref name="pciaddress"/> + </element> + <ref name="diskSourceCommon"/> + <optional> + <ref name="storageStartupPolicy"/> + </optional> + <optional> + <ref name="encryption"/> + </optional> + </element> + </optional> + </define> + <define name="diskTarget"> <data type="string"> <param name="pattern">(ioemu:)?(fd|hd|sd|vd|xvd|ubd)[a-zA-Z0-9_]+</param> diff --git a/tests/qemuxml2argvdata/disk-nvme.xml b/tests/qemuxml2argvdata/disk-nvme.xml new file mode 100644 index 0000000000..66b2d8450e --- /dev/null +++ b/tests/qemuxml2argvdata/disk-nvme.xml @@ -0,0 +1,63 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <disk type='nvme' device='disk'> + <driver name='qemu' type='raw'/> + <source type='pci' managed='yes' namespace='1'> + <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> + </source> + <target dev='vda' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </disk> + <disk type='nvme' device='disk'> + <driver name='qemu' type='raw'/> + <source type='pci' managed='yes' namespace='2'> + <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> + </source> + <target dev='vdb' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </disk> + <disk type='nvme' device='disk'> + <driver name='qemu' type='raw'/> + <source type='pci' managed='no' namespace='1'> + <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/> + </source> + <target dev='vdc' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> + </disk> + <disk type='nvme' device='disk'> + <driver name='qemu' type='qcow2' cache='none'/> + <source type='pci' managed='no' namespace='2'> + <address domain='0x0001' bus='0x02' slot='0x00' function='0x0'/> + <encryption format='luks'> + <secret type='passphrase' uuid='0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f'/> + </encryption> + </source> + <target dev='vdd' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> + </disk> + <controller type='usb' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <controller type='scsi' index='0' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </controller> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='none'/> + </devices> +</domain> -- 2.21.0

To simplify implementation, some restrictions are added. For instance, an NVMe disk can't go to any bus but virtio and has to be type of 'disk' and can't have startupPolicy set. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 96 ++++++++++++++++++++++++++ src/libvirt_private.syms | 1 + src/libxl/xen_xl.c | 1 + src/qemu/qemu_block.c | 2 + src/qemu/qemu_command.c | 1 + src/qemu/qemu_driver.c | 4 ++ src/qemu/qemu_migration.c | 5 ++ src/util/virstoragefile.c | 59 ++++++++++++++++ src/util/virstoragefile.h | 17 +++++ tests/qemuxml2xmloutdata/disk-nvme.xml | 1 + tests/qemuxml2xmltest.c | 1 + 11 files changed, 188 insertions(+) create mode 120000 tests/qemuxml2xmloutdata/disk-nvme.xml diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index c1769f743b..fe540c195e 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -5118,6 +5118,11 @@ virDomainDiskDefPostParse(virDomainDiskDefPtr disk, return -1; } + if (disk->src->type == VIR_STORAGE_TYPE_NVME) { + if (disk->src->nvme->managed == VIR_TRISTATE_BOOL_ABSENT) + disk->src->nvme->managed = VIR_TRISTATE_BOOL_YES; + } + if (disk->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE && virDomainDiskDefAssignAddress(xmlopt, disk, def) < 0) { return -1; @@ -9251,6 +9256,76 @@ virDomainDiskSourceNetworkParse(xmlNodePtr node, } +static int +virDomainDiskSourceNVMeParse(xmlNodePtr node, + xmlXPathContextPtr ctxt, + virStorageSourcePtr src) +{ + VIR_AUTOPTR(virStorageSourceNVMeDef) nvme = NULL; + VIR_AUTOFREE(char *) type = NULL; + VIR_AUTOFREE(char *) namespace = NULL; + VIR_AUTOFREE(char *) managed = NULL; + xmlNodePtr address; + + if (VIR_ALLOC(nvme) < 0) + return -1; + + if (!(type = virXMLPropString(node, "type"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("missing 'type' attribute to disk source")); + return -1; + } + + if (STRNEQ(type, "pci")) { + virReportError(VIR_ERR_XML_ERROR, + _("unsupported source type '%s'"), + type); + return -1; + } + + if (!(namespace = virXMLPropString(node, "namespace"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("missing 'namespace' attribute to disk source")); + return -1; + } + + if (virStrToLong_ull(namespace, NULL, 10, &nvme->namespace) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("malformed namespace '%s'"), + namespace); + return -1; + } + + /* NVMe namespaces start from 1 */ + if (nvme->namespace == 0) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("NVMe namespace can't be zero")); + return -1; + } + + if ((managed = virXMLPropString(node, "managed"))) { + if ((nvme->managed = virTristateBoolTypeFromString(managed)) <= 0) { + virReportError(VIR_ERR_XML_ERROR, + _("malformed managed value '%s'"), + managed); + return -1; + } + } + + if (!(address = virXPathNode("./address", ctxt))) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("NVMe disk source is missing address")); + return -1; + } + + if (virPCIDeviceAddressParseXML(address, &nvme->pciAddr) < 0) + return -1; + + VIR_STEAL_PTR(src->nvme, nvme); + return 0; +} + + static int virDomainDiskSourcePRParse(xmlNodePtr node, xmlXPathContextPtr ctxt, @@ -9351,6 +9426,10 @@ virDomainStorageSourceParse(xmlNodePtr node, if (virDomainDiskSourcePoolDefParse(node, &src->srcpool) < 0) return -1; break; + case VIR_STORAGE_TYPE_NVME: + if (virDomainDiskSourceNVMeParse(node, ctxt, src) < 0) + return -1; + break; case VIR_STORAGE_TYPE_NONE: case VIR_STORAGE_TYPE_LAST: virReportError(VIR_ERR_INTERNAL_ERROR, @@ -24088,6 +24167,19 @@ virDomainDiskSourceFormatNetwork(virBufferPtr attrBuf, } +static void +virDomainDiskSourceNVMeFormat(virBufferPtr attrBuf, + virBufferPtr childBuf, + const virStorageSourceNVMeDef *nvme) +{ + virBufferAddLit(attrBuf, " type='pci'"); + virBufferAsprintf(attrBuf, " managed='%s'", + virTristateBoolTypeToString(nvme->managed)); + virBufferAsprintf(attrBuf, " namespace='%llu'", nvme->namespace); + virPCIDeviceAddressFormat(childBuf, nvme->pciAddr, false); +} + + static int virDomainDiskSourceFormatPrivateData(virBufferPtr buf, virStorageSourcePtr src, @@ -24174,6 +24266,10 @@ virDomainDiskSourceFormat(virBufferPtr buf, break; + case VIR_STORAGE_TYPE_NVME: + virDomainDiskSourceNVMeFormat(&attrBuf, &childBuf, src->nvme); + break; + case VIR_STORAGE_TYPE_NONE: case VIR_STORAGE_TYPE_LAST: virReportError(VIR_ERR_INTERNAL_ERROR, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b3c95495c3..91b03afb17 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -3045,6 +3045,7 @@ virStorageSourceNetworkAssignDefaultPorts; virStorageSourceNew; virStorageSourceNewFromBacking; virStorageSourceNewFromBackingAbsolute; +virStorageSourceNVMeDefFree; virStorageSourceParseRBDColonString; virStorageSourcePoolDefFree; virStorageSourcePoolModeTypeFromString; diff --git a/src/libxl/xen_xl.c b/src/libxl/xen_xl.c index 3a41a4ad00..5b69812f40 100644 --- a/src/libxl/xen_xl.c +++ b/src/libxl/xen_xl.c @@ -1662,6 +1662,7 @@ xenFormatXLDiskSrc(virStorageSourcePtr src, char **srcstr) break; case VIR_STORAGE_TYPE_VOLUME: + case VIR_STORAGE_TYPE_NVME: case VIR_STORAGE_TYPE_NONE: case VIR_STORAGE_TYPE_LAST: break; diff --git a/src/qemu/qemu_block.c b/src/qemu/qemu_block.c index 4b5dd30e17..99b2a4efaa 100644 --- a/src/qemu/qemu_block.c +++ b/src/qemu/qemu_block.c @@ -1077,6 +1077,7 @@ qemuBlockStorageSourceGetBackendProps(virStorageSourcePtr src, break; case VIR_STORAGE_TYPE_VOLUME: + case VIR_STORAGE_TYPE_NVME: case VIR_STORAGE_TYPE_NONE: case VIR_STORAGE_TYPE_LAST: return NULL; @@ -2367,6 +2368,7 @@ qemuBlockStorageSourceCreateGetStorageProps(virStorageSourcePtr src, case VIR_STORAGE_TYPE_BLOCK: case VIR_STORAGE_TYPE_DIR: case VIR_STORAGE_TYPE_VOLUME: + case VIR_STORAGE_TYPE_NVME: return 0; case VIR_STORAGE_TYPE_NONE: diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 77470a6037..b2519465ba 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -1144,6 +1144,7 @@ qemuGetDriveSourceString(virStorageSourcePtr src, break; case VIR_STORAGE_TYPE_VOLUME: + case VIR_STORAGE_TYPE_NVME: case VIR_STORAGE_TYPE_NONE: case VIR_STORAGE_TYPE_LAST: break; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 0c65414a1a..ecfccae92e 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -14906,6 +14906,7 @@ qemuDomainSnapshotPrepareDiskExternalInactive(virDomainSnapshotDiskDefPtr snapdi case VIR_STORAGE_TYPE_DIR: case VIR_STORAGE_TYPE_VOLUME: + case VIR_STORAGE_TYPE_NVME: case VIR_STORAGE_TYPE_NONE: case VIR_STORAGE_TYPE_LAST: virReportError(VIR_ERR_INTERNAL_ERROR, @@ -14922,6 +14923,7 @@ qemuDomainSnapshotPrepareDiskExternalInactive(virDomainSnapshotDiskDefPtr snapdi case VIR_STORAGE_TYPE_NETWORK: case VIR_STORAGE_TYPE_DIR: case VIR_STORAGE_TYPE_VOLUME: + case VIR_STORAGE_TYPE_NVME: case VIR_STORAGE_TYPE_NONE: case VIR_STORAGE_TYPE_LAST: virReportError(VIR_ERR_INTERNAL_ERROR, @@ -14989,6 +14991,7 @@ qemuDomainSnapshotPrepareDiskExternalActive(virDomainSnapshotDiskDefPtr snapdisk case VIR_STORAGE_TYPE_DIR: case VIR_STORAGE_TYPE_VOLUME: + case VIR_STORAGE_TYPE_NVME: case VIR_STORAGE_TYPE_NONE: case VIR_STORAGE_TYPE_LAST: virReportError(VIR_ERR_INTERNAL_ERROR, @@ -15116,6 +15119,7 @@ qemuDomainSnapshotPrepareDiskInternal(virDomainDiskDefPtr disk, case VIR_STORAGE_TYPE_DIR: case VIR_STORAGE_TYPE_VOLUME: + case VIR_STORAGE_TYPE_NVME: case VIR_STORAGE_TYPE_NONE: case VIR_STORAGE_TYPE_LAST: virReportError(VIR_ERR_INTERNAL_ERROR, diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 693240ed3d..4fa989ff79 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -228,6 +228,7 @@ qemuMigrationDstPrecreateDisk(virConnectPtr conn, case VIR_STORAGE_TYPE_BLOCK: case VIR_STORAGE_TYPE_DIR: + case VIR_STORAGE_TYPE_NVME: case VIR_STORAGE_TYPE_NONE: case VIR_STORAGE_TYPE_LAST: virReportError(VIR_ERR_INTERNAL_ERROR, @@ -1270,6 +1271,10 @@ qemuMigrationSrcIsSafe(virDomainDefPtr def, /* But network disks are safe again. */ continue; + case VIR_STORAGE_TYPE_NVME: + unsafe = true; + break; + case VIR_STORAGE_TYPE_NONE: case VIR_STORAGE_TYPE_BLOCK: case VIR_STORAGE_TYPE_DIR: diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index 3201f57e62..725d68a248 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -57,6 +57,7 @@ VIR_ENUM_IMPL(virStorage, "dir", "network", "volume", + "nvme", ); VIR_ENUM_IMPL(virStorageFileFormat, @@ -2116,6 +2117,50 @@ virStoragePRDefCopy(virStoragePRDefPtr src) } +static virStorageSourceNVMeDefPtr +virStorageSourceNVMeDefCopy(const virStorageSourceNVMeDef *src) +{ + VIR_AUTOPTR(virStorageSourceNVMeDef) ret = NULL; + + if (VIR_ALLOC(ret) < 0) + return NULL; + + ret->namespace = src->namespace; + ret->managed = src->managed; + virPCIDeviceAddressCopy(&ret->pciAddr, &src->pciAddr); + VIR_RETURN_PTR(ret); +} + + +static bool +virStorageSourceNVMeDefIsEqual(const virStorageSourceNVMeDef *a, + const virStorageSourceNVMeDef *b) +{ + if (!a && !b) + return true; + + if (!a || !b) + return false; + + if (a->namespace != b->namespace || + a->managed != b->managed || + !virPCIDeviceAddressEqual(&a->pciAddr, &b->pciAddr)) + return false; + + return true; +} + + +void +virStorageSourceNVMeDefFree(virStorageSourceNVMeDefPtr def) +{ + if (!def) + return; + + VIR_FREE(def); +} + + virSecurityDeviceLabelDefPtr virStorageSourceGetSecurityLabelDef(virStorageSourcePtr src, const char *model) @@ -2325,6 +2370,10 @@ virStorageSourceCopy(const virStorageSource *src, !(def->pr = virStoragePRDefCopy(src->pr))) return NULL; + if (src->nvme && + !(def->nvme = virStorageSourceNVMeDefCopy(src->nvme))) + return NULL; + if (virStorageSourceInitiatorCopy(&def->initiator, &src->initiator)) return NULL; @@ -2378,6 +2427,10 @@ virStorageSourceIsSameLocation(virStorageSourcePtr a, } } + if (a->type == VIR_STORAGE_TYPE_NVME && + !virStorageSourceNVMeDefIsEqual(a->nvme, b->nvme)) + return false; + return true; } @@ -2465,6 +2518,9 @@ virStorageSourceIsLocalStorage(const virStorageSource *src) case VIR_STORAGE_TYPE_NETWORK: case VIR_STORAGE_TYPE_VOLUME: + /* While NVMe disks are local, they are not accessible via src->path. + * Therefore, we have to return false here. */ + case VIR_STORAGE_TYPE_NVME: case VIR_STORAGE_TYPE_LAST: case VIR_STORAGE_TYPE_NONE: return false; @@ -2550,6 +2606,7 @@ virStorageSourceClear(virStorageSourcePtr def) VIR_FREE(def->compat); virStorageEncryptionFree(def->encryption); virStoragePRDefFree(def->pr); + virStorageSourceNVMeDefFree(def->nvme); virStorageSourceSeclabelsClear(def); virStoragePermsFree(def->perms); VIR_FREE(def->timestamps); @@ -3807,6 +3864,7 @@ virStorageSourceUpdatePhysicalSize(virStorageSourcePtr src, /* We shouldn't get VOLUME, but the switch requires all cases */ case VIR_STORAGE_TYPE_VOLUME: + case VIR_STORAGE_TYPE_NVME: case VIR_STORAGE_TYPE_NONE: case VIR_STORAGE_TYPE_LAST: return -1; @@ -4268,6 +4326,7 @@ virStorageSourceIsRelative(virStorageSourcePtr src) case VIR_STORAGE_TYPE_NETWORK: case VIR_STORAGE_TYPE_VOLUME: + case VIR_STORAGE_TYPE_NVME: case VIR_STORAGE_TYPE_NONE: case VIR_STORAGE_TYPE_LAST: return false; diff --git a/src/util/virstoragefile.h b/src/util/virstoragefile.h index 81b83a53ef..9e79c7c6ef 100644 --- a/src/util/virstoragefile.h +++ b/src/util/virstoragefile.h @@ -31,6 +31,7 @@ #include "virsecret.h" #include "virautoclean.h" #include "virenum.h" +#include "virpci.h" /* Minimum header size required to probe all known formats with * virStorageFileProbeFormat, or obtain metadata from a known format. @@ -52,6 +53,7 @@ typedef enum { VIR_STORAGE_TYPE_DIR, VIR_STORAGE_TYPE_NETWORK, VIR_STORAGE_TYPE_VOLUME, + VIR_STORAGE_TYPE_NVME, VIR_STORAGE_TYPE_LAST } virStorageType; @@ -231,6 +233,16 @@ struct _virStorageSourceInitiatorDef { char *iqn; /* Initiator IQN */ }; +typedef struct _virStorageSourceNVMeDef virStorageSourceNVMeDef; +typedef virStorageSourceNVMeDef *virStorageSourceNVMeDefPtr; +struct _virStorageSourceNVMeDef { + unsigned long long namespace; + int managed; /* enum virTristateBool */ + virPCIDeviceAddress pciAddr; + + /* Don't forget to update virStorageSourceNVMeDefCopy */ +}; + typedef struct _virStorageDriverData virStorageDriverData; typedef virStorageDriverData *virStorageDriverDataPtr; @@ -262,6 +274,8 @@ struct _virStorageSource { bool encryptionInherited; virStoragePRDefPtr pr; + virStorageSourceNVMeDefPtr nvme; /* type == VIR_STORAGE_TYPE_NVME */ + virStorageSourceInitiatorDef initiator; virObjectPtr privateData; @@ -416,6 +430,9 @@ bool virStoragePRDefIsManaged(virStoragePRDefPtr prd); bool virStorageSourceChainHasManagedPR(virStorageSourcePtr src); +void virStorageSourceNVMeDefFree(virStorageSourceNVMeDefPtr def); +VIR_DEFINE_AUTOPTR_FUNC(virStorageSourceNVMeDef, virStorageSourceNVMeDefFree); + virSecurityDeviceLabelDefPtr virStorageSourceGetSecurityLabelDef(virStorageSourcePtr src, const char *model); diff --git a/tests/qemuxml2xmloutdata/disk-nvme.xml b/tests/qemuxml2xmloutdata/disk-nvme.xml new file mode 120000 index 0000000000..ea9eb267ac --- /dev/null +++ b/tests/qemuxml2xmloutdata/disk-nvme.xml @@ -0,0 +1 @@ +../qemuxml2argvdata/disk-nvme.xml \ No newline at end of file diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index d5c66d8791..6202eed439 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -348,6 +348,7 @@ mymain(void) DO_TEST("disk-network-sheepdog", NONE); DO_TEST("disk-network-vxhs", NONE); DO_TEST("disk-network-tlsx509", NONE); + DO_TEST("disk-nvme", QEMU_CAPS_VIRTIO_SCSI, QEMU_CAPS_QCOW2_LUKS); DO_TEST("disk-scsi", QEMU_CAPS_SCSI_LSI, QEMU_CAPS_SCSI_MEGASAS, QEMU_CAPS_SCSI_MPTSAS1068, QEMU_CAPS_SCSI_DISK_WWN); DO_TEST("disk-virtio-scsi-reservations", -- 2.21.0

This function will return true if there's a storage source of type VIR_STORAGE_TYPE_NVME, or false otherwise. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ACKed-by: Peter Krempa <pkrempa@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virstoragefile.c | 14 ++++++++++++++ src/util/virstoragefile.h | 2 ++ 3 files changed, 17 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 91b03afb17..4e2a74a31b 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -3024,6 +3024,7 @@ virStoragePRDefIsManaged; virStoragePRDefParseXML; virStorageSourceBackingStoreClear; virStorageSourceChainHasManagedPR; +virStorageSourceChainHasNVMe; virStorageSourceClear; virStorageSourceCopy; virStorageSourceFindByNodeName; diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index 725d68a248..093d2403de 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -2161,6 +2161,20 @@ virStorageSourceNVMeDefFree(virStorageSourceNVMeDefPtr def) } +bool +virStorageSourceChainHasNVMe(const virStorageSource *src) +{ + const virStorageSource *n; + + for (n = src; virStorageSourceIsBacking(n); n = n->backingStore) { + if (n->type == VIR_STORAGE_TYPE_NVME) + return true; + } + + return false; +} + + virSecurityDeviceLabelDefPtr virStorageSourceGetSecurityLabelDef(virStorageSourcePtr src, const char *model) diff --git a/src/util/virstoragefile.h b/src/util/virstoragefile.h index 9e79c7c6ef..306f84b383 100644 --- a/src/util/virstoragefile.h +++ b/src/util/virstoragefile.h @@ -433,6 +433,8 @@ virStorageSourceChainHasManagedPR(virStorageSourcePtr src); void virStorageSourceNVMeDefFree(virStorageSourceNVMeDefPtr def); VIR_DEFINE_AUTOPTR_FUNC(virStorageSourceNVMeDef, virStorageSourceNVMeDefFree); +bool virStorageSourceChainHasNVMe(const virStorageSource *src); + virSecurityDeviceLabelDefPtr virStorageSourceGetSecurityLabelDef(virStorageSourcePtr src, const char *model); -- 2.21.0

This function will return true if any of disks (or their backing chain) for given domain contains an NVMe disk. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ACKed-by: Peter Krempa <pkrempa@redhat.com> --- src/conf/domain_conf.c | 14 ++++++++++++++ src/conf/domain_conf.h | 3 +++ src/libvirt_private.syms | 1 + 3 files changed, 18 insertions(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fe540c195e..98ee1f4058 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -31640,6 +31640,20 @@ virDomainDefHasManagedPR(const virDomainDef *def) } +bool +virDomainDefHasNVMeDisk(const virDomainDef *def) +{ + size_t i; + + for (i = 0; i < def->ndisks; i++) { + if (virStorageSourceChainHasNVMe(def->disks[i]->src)) + return true; + } + + return false; +} + + bool virDomainDefHasVFIOHostdev(const virDomainDef *def) { diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index bf5d1a154b..ada2c70a8b 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -3666,6 +3666,9 @@ virDomainDiskGetDetectZeroesMode(virDomainDiskDiscard discard, bool virDomainDefHasManagedPR(const virDomainDef *def); +bool +virDomainDefHasNVMeDisk(const virDomainDef *def); + bool virDomainDefHasVFIOHostdev(const virDomainDef *def); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 4e2a74a31b..d383dbe929 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -297,6 +297,7 @@ virDomainDefHasManagedPR; virDomainDefHasMdevHostdev; virDomainDefHasMemballoon; virDomainDefHasMemoryHotplug; +virDomainDefHasNVMeDisk; virDomainDefHasUSB; virDomainDefHasVcpusOffline; virDomainDefHasVFIOHostdev; -- 2.21.0

This module will be used by virHostdevManager and it's inspired by virPCIDevice module. They are very similar except instead of what makes a NVMe device: PCI address AND namespace ID. This means that a NVMe device can appear in a domain multiple times, each time with a different namespace. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 18 ++ src/util/Makefile.inc.am | 2 + src/util/virnvme.c | 454 +++++++++++++++++++++++++++++++++++++++ src/util/virnvme.h | 95 ++++++++ 4 files changed, 569 insertions(+) create mode 100644 src/util/virnvme.c create mode 100644 src/util/virnvme.h diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index d383dbe929..0c84b347db 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2629,6 +2629,24 @@ virNumaSetPagePoolSize; virNumaSetupMemoryPolicy; +# util/virnvme.h +virNVMeDeviceAddressGet; +virNVMeDeviceCopy; +virNVMeDeviceFree; +virNVMeDeviceListAdd; +virNVMeDeviceListCount; +virNVMeDeviceListCreateDetachList; +virNVMeDeviceListDel; +virNVMeDeviceListGet; +virNVMeDeviceListLookup; +virNVMeDeviceListLookupIndex; +virNVMeDeviceListNew; +virNVMeDeviceNew; +virNVMeDeviceUsedByClear; +virNVMeDeviceUsedByGet; +virNVMeDeviceUsedBySet; + + # util/virobject.h virClassForObject; virClassForObjectLockable; diff --git a/src/util/Makefile.inc.am b/src/util/Makefile.inc.am index 482b657a90..7e677b891c 100644 --- a/src/util/Makefile.inc.am +++ b/src/util/Makefile.inc.am @@ -145,6 +145,8 @@ UTIL_SOURCES = \ util/virnetlink.h \ util/virnodesuspend.c \ util/virnodesuspend.h \ + util/virnvme.c \ + util/virnvme.h \ util/virkmod.c \ util/virkmod.h \ util/virnuma.c \ diff --git a/src/util/virnvme.c b/src/util/virnvme.c new file mode 100644 index 0000000000..f52955c615 --- /dev/null +++ b/src/util/virnvme.c @@ -0,0 +1,454 @@ +/* + * virnvme.c: helper APIs for managing NVMe devices + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + */ + +#include <config.h> + +#include "virnvme.h" +#include "virobject.h" +#include "virpci.h" +#include "viralloc.h" +#include "virlog.h" +#include "virstring.h" + +VIR_LOG_INIT("util.nvme"); +#define VIR_FROM_THIS VIR_FROM_NONE + +struct _virNVMeDevice { + virPCIDeviceAddress address; /* PCI address of controller */ + unsigned int namespace; /* Namespace ID */ + bool managed; + + char *drvname; + char *domname; +}; + + +struct _virNVMeDeviceList { + virObjectLockable parent; + + size_t count; + virNVMeDevicePtr *devs; +}; + + +static virClassPtr virNVMeDeviceListClass; + +static void virNVMeDeviceListDispose(void *obj); + +static int +virNVMeOnceInit(void) +{ + if (!VIR_CLASS_NEW(virNVMeDeviceList, virClassForObjectLockable())) + return -1; + + return 0; +} + +VIR_ONCE_GLOBAL_INIT(virNVMe); + + +virNVMeDevicePtr +virNVMeDeviceNew(const virPCIDeviceAddress *address, + unsigned long namespace, + bool managed) +{ + VIR_AUTOPTR(virNVMeDevice) dev = NULL; + + if (VIR_ALLOC(dev) < 0) + return NULL; + + virPCIDeviceAddressCopy(&dev->address, address); + dev->namespace = namespace; + dev->managed = managed; + + VIR_RETURN_PTR(dev); +} + + +void +virNVMeDeviceFree(virNVMeDevicePtr dev) +{ + if (!dev) + return; + + virNVMeDeviceUsedByClear(dev); + VIR_FREE(dev); +} + + +virNVMeDevicePtr +virNVMeDeviceCopy(const virNVMeDevice *dev) +{ + VIR_AUTOPTR(virNVMeDevice) copy = NULL; + + if (VIR_ALLOC(copy) < 0 || + VIR_STRDUP(copy->drvname, dev->drvname) < 0 || + VIR_STRDUP(copy->domname, dev->domname) < 0) + return NULL; + + virPCIDeviceAddressCopy(©->address, &dev->address); + copy->namespace = dev->namespace; + copy->managed = dev->managed; + + VIR_RETURN_PTR(copy); +} + + +const virPCIDeviceAddress * +virNVMeDeviceAddressGet(const virNVMeDevice *dev) +{ + return &dev->address; +} + + +void +virNVMeDeviceUsedByClear(virNVMeDevicePtr dev) +{ + VIR_FREE(dev->drvname); + VIR_FREE(dev->domname); +} + + +void +virNVMeDeviceUsedByGet(const virNVMeDevice *dev, + const char **drv, + const char **dom) +{ + *drv = dev->drvname; + *dom = dev->domname; +} + + +int +virNVMeDeviceUsedBySet(virNVMeDevicePtr dev, + const char *drv, + const char *dom) +{ + if (VIR_STRDUP(dev->drvname, drv) < 0 || + VIR_STRDUP(dev->domname, dom) < 0) { + virNVMeDeviceUsedByClear(dev); + return -1; + } + + return 0; +} + + +virNVMeDeviceListPtr +virNVMeDeviceListNew(void) +{ + virNVMeDeviceListPtr list; + + if (virNVMeInitialize() < 0) + return NULL; + + if (!(list = virObjectLockableNew(virNVMeDeviceListClass))) + return NULL; + + return list; +} + + +static void +virNVMeDeviceListDispose(void *obj) +{ + virNVMeDeviceListPtr list = obj; + size_t i; + + for (i = 0; i < list->count; i++) + virNVMeDeviceFree(list->devs[i]); + + VIR_FREE(list->devs); +} + + +size_t +virNVMeDeviceListCount(const virNVMeDeviceList *list) +{ + return list->count; +} + + +int +virNVMeDeviceListAdd(virNVMeDeviceListPtr list, + const virNVMeDevice *dev) +{ + virNVMeDevicePtr tmp; + + if ((tmp = virNVMeDeviceListLookup(list, dev))) { + VIR_AUTOFREE(char *) addrStr = virPCIDeviceAddressAsString(&tmp->address); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("NVMe device %s namespace %u is already on the list"), + NULLSTR(addrStr), tmp->namespace); + return -1; + } + + if (!(tmp = virNVMeDeviceCopy(dev)) || + VIR_APPEND_ELEMENT(list->devs, list->count, tmp) < 0) { + virNVMeDeviceFree(tmp); + return -1; + } + + return 0; +} + + +int +virNVMeDeviceListDel(virNVMeDeviceListPtr list, + const virNVMeDevice *dev) +{ + ssize_t idx; + virNVMeDevicePtr tmp = NULL; + + if ((idx = virNVMeDeviceListLookupIndex(list, dev)) < 0) { + VIR_AUTOFREE(char *) addrStr = virPCIDeviceAddressAsString(&dev->address); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("NVMe device %s namespace %u not found"), + NULLSTR(addrStr), dev->namespace); + return -1; + } + + tmp = list->devs[idx]; + VIR_DELETE_ELEMENT(list->devs, idx, list->count); + virNVMeDeviceFree(tmp); + return 0; +} + + +virNVMeDevicePtr +virNVMeDeviceListGet(virNVMeDeviceListPtr list, + size_t i) +{ + return i < list->count ? list->devs[i] : NULL; +} + + +virNVMeDevicePtr +virNVMeDeviceListLookup(virNVMeDeviceListPtr list, + const virNVMeDevice *dev) +{ + ssize_t idx; + + if ((idx = virNVMeDeviceListLookupIndex(list, dev)) < 0) + return NULL; + + return list->devs[idx]; +} + + +ssize_t +virNVMeDeviceListLookupIndex(virNVMeDeviceListPtr list, + const virNVMeDevice *dev) +{ + size_t i; + + if (!list) + return -1; + + for (i = 0; i < list->count; i++) { + virNVMeDevicePtr other = list->devs[i]; + + if (virPCIDeviceAddressEqual(&dev->address, &other->address) && + dev->namespace == other->namespace) + return i; + } + + return -1; +} + + +static virNVMeDevicePtr +virNVMeDeviceListLookupByPCIAddress(virNVMeDeviceListPtr list, + const virPCIDeviceAddress *address) +{ + size_t i; + + if (!list) + return NULL; + + for (i = 0; i < list->count; i++) { + virNVMeDevicePtr other = list->devs[i]; + + if (virPCIDeviceAddressEqual(address, &other->address)) + return other; + } + + return NULL; +} + + +static virPCIDevicePtr +virNVMeDeviceCreatePCIDevice(const virNVMeDevice *nvme) +{ + VIR_AUTOPTR(virPCIDevice) pci = NULL; + + if (!(pci = virPCIDeviceNew(nvme->address.domain, + nvme->address.bus, + nvme->address.slot, + nvme->address.function))) + return NULL; + + /* NVMe devices must be bound to vfio */ + virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_VFIO); + virPCIDeviceSetManaged(pci, nvme->managed); + + VIR_RETURN_PTR(pci); +} + + +/** + * virNVMeDeviceListCreateDetachList: + * @activeList: list of active NVMe devices + * @toDetachList: list of NVMe devices to detach from the host + * + * This function creates a list of PCI devices which can then be + * reused by PCI device detach functions (e.g. + * virHostdevPreparePCIDevicesImpl()) as each PCI device from the + * returned list is initialized properly for detach. + * + * Basically, this just blindly collects unique PCI addresses + * from @toDetachList that don't appear on @activeList. + * + * Returns: a list on success, + * NULL otherwise. + */ +virPCIDeviceListPtr +virNVMeDeviceListCreateDetachList(virNVMeDeviceListPtr activeList, + virNVMeDeviceListPtr toDetachList) +{ + VIR_AUTOUNREF(virPCIDeviceListPtr) pciDevices = NULL; + size_t i; + + if (!(pciDevices = virPCIDeviceListNew())) + return NULL; + + for (i = 0; i < toDetachList->count; i++) { + const virNVMeDevice *d = toDetachList->devs[i]; + VIR_AUTOPTR(virPCIDevice) pci = NULL; + + /* If there is a NVMe device with the same PCI address on + * the activeList, the device is already detached. */ + if (virNVMeDeviceListLookupByPCIAddress(activeList, &d->address)) + continue; + + /* It may happen that we want to detach two namespaces + * from the same NVMe device. This will be represented as + * two different instances of virNVMeDevice, but + * obviously we want to put the PCI device on the detach + * list only once. */ + if (virPCIDeviceListFindByIDs(pciDevices, + d->address.domain, + d->address.bus, + d->address.slot, + d->address.function)) + continue; + + if (!(pci = virNVMeDeviceCreatePCIDevice(d))) + return NULL; + + if (virPCIDeviceListAdd(pciDevices, pci) < 0) + return NULL; + + /* avoid freeing the device */ + pci = NULL; + } + + VIR_RETURN_PTR(pciDevices); +} + + +/** + * virNVMeDeviceListCreateReAttachList: + * @activeList: list of active NVMe devices + * @toReAttachList: list of devices to reattach to the host + * + * This is a counterpart to virNVMeDeviceListCreateDetachList. + * + * This function creates a list of PCI devices which can then be + * reused by PCI device reattach functions (e.g. + * virHostdevReAttachPCIDevicesImpl()) as each PCI device from + * the returned list is initialized properly for reattach. + * + * Basically, this just collects unique PCI addresses + * of devices that appear on @toReAttachList and are used + * exactly once (i.e. no other namespaces are used from the same + * NVMe device). For that purpose, this function needs to know + * list of active NVMe devices (@activeList). + * + * Returns: a list on success, + * NULL otherwise. + */ +virPCIDeviceListPtr +virNVMeDeviceListCreateReAttachList(virNVMeDeviceListPtr activeList, + virNVMeDeviceListPtr toReAttachList) +{ + VIR_AUTOUNREF(virPCIDeviceListPtr) pciDevices = NULL; + size_t i; + + if (!(pciDevices = virPCIDeviceListNew())) + return NULL; + + for (i = 0; i < toReAttachList->count; i++) { + const virNVMeDevice *d = toReAttachList->devs[i]; + VIR_AUTOPTR(virPCIDevice) pci = NULL; + size_t nused = 0; + + /* Check if there is any other NVMe device with the same PCI address as + * @d. To simplify this, let's just count how many NVMe devices with + * the same PCI address there are on the @activeList. */ + for (i = 0; i < activeList->count; i++) { + virNVMeDevicePtr other = activeList->devs[i]; + + if (!virPCIDeviceAddressEqual(&d->address, &other->address)) + continue; + + nused++; + } + + /* Now, the following cases can happen: + * nused > 1 -> there are other NVMe device active, do NOT detach it + * nused == 1 -> we've found only @d on the @activeList, detach it + * nused == 0 -> huh, wait, what? @d is NOT on the @active list, how can + * we reattach it? + */ + + if (nused == 0) { + /* Shouldn't happen (TM) */ + VIR_AUTOFREE(char *) addrStr = virPCIDeviceAddressAsString(&d->address); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("NVMe device %s namespace %u not found"), + NULLSTR(addrStr), d->namespace); + return NULL; + } else if (nused > 1) { + /* NVMe device is still in use */ + continue; + } + + /* nused == 1 -> detach the device */ + if (!(pci = virNVMeDeviceCreatePCIDevice(d))) + return NULL; + + if (virPCIDeviceListAdd(pciDevices, pci) < 0) + return NULL; + + /* avoid freeing the device */ + pci = NULL; + } + + VIR_RETURN_PTR(pciDevices); +} diff --git a/src/util/virnvme.h b/src/util/virnvme.h new file mode 100644 index 0000000000..e582f2b572 --- /dev/null +++ b/src/util/virnvme.h @@ -0,0 +1,95 @@ +/* + * virnvme.h: helper APIs for managing NVMe devices + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + */ + +#pragma once + +#include "virpci.h" + +typedef struct _virNVMeDevice virNVMeDevice; +typedef virNVMeDevice *virNVMeDevicePtr; + +/* Note that this list is lockable, and in fact, it is caller's + * responsibility to acquire the lock and release it. The reason + * is that in a lot of cases the list must be locked between two + * API calls and therefore only caller knows when it is safe to + * finally release the lock. */ +typedef struct _virNVMeDeviceList virNVMeDeviceList; +typedef virNVMeDeviceList *virNVMeDeviceListPtr; + +virNVMeDevicePtr +virNVMeDeviceNew(const virPCIDeviceAddress *address, + unsigned long namespace, + bool managed); + +void +virNVMeDeviceFree(virNVMeDevicePtr dev); + +VIR_DEFINE_AUTOPTR_FUNC(virNVMeDevice, virNVMeDeviceFree); + +virNVMeDevicePtr +virNVMeDeviceCopy(const virNVMeDevice *dev); + +const virPCIDeviceAddress * +virNVMeDeviceAddressGet(const virNVMeDevice *dev); + +void +virNVMeDeviceUsedByClear(virNVMeDevicePtr dev); + +void +virNVMeDeviceUsedByGet(const virNVMeDevice *dev, + const char **drv, + const char **dom); + +int +virNVMeDeviceUsedBySet(virNVMeDevicePtr dev, + const char *drv, + const char *dom); + +virNVMeDeviceListPtr +virNVMeDeviceListNew(void); + +size_t +virNVMeDeviceListCount(const virNVMeDeviceList *list); + +int +virNVMeDeviceListAdd(virNVMeDeviceListPtr list, + const virNVMeDevice *dev); + +int +virNVMeDeviceListDel(virNVMeDeviceListPtr list, + const virNVMeDevice *dev); + +virNVMeDevicePtr +virNVMeDeviceListGet(virNVMeDeviceListPtr list, + size_t i); + +virNVMeDevicePtr +virNVMeDeviceListLookup(virNVMeDeviceListPtr list, + const virNVMeDevice *dev); + +ssize_t +virNVMeDeviceListLookupIndex(virNVMeDeviceListPtr list, + const virNVMeDevice *dev); + +virPCIDeviceListPtr +virNVMeDeviceListCreateDetachList(virNVMeDeviceListPtr activeList, + virNVMeDeviceListPtr toDetachList); + +virPCIDeviceListPtr +virNVMeDeviceListCreateReAttachList(virNVMeDeviceListPtr activeList, + virNVMeDeviceListPtr toReAttachList); -- 2.21.0

Now that we have virNVMeDevice module (introduced in previous commit), let's use it int virHostdev to track which NVMe devices are free to be used by a domain and which are taken. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 5 + src/util/virhostdev.c | 332 +++++++++++++++++++++++++++++++++++++++ src/util/virhostdev.h | 37 +++++ 3 files changed, 374 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 0c84b347db..ae8b41ce30 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2113,18 +2113,23 @@ virHostdevPCINodeDeviceReAttach; virHostdevPCINodeDeviceReset; virHostdevPrepareDomainDevices; virHostdevPrepareMediatedDevices; +virHostdevPrepareNVMeDevices; +virHostdevPrepareOneNVMeDevice; virHostdevPreparePCIDevices; virHostdevPrepareSCSIDevices; virHostdevPrepareSCSIVHostDevices; virHostdevPrepareUSBDevices; virHostdevReAttachDomainDevices; virHostdevReAttachMediatedDevices; +virHostdevReAttachNVMeDevices; +virHostdevReAttachOneNVMeDevice; virHostdevReAttachPCIDevices; virHostdevReAttachSCSIDevices; virHostdevReAttachSCSIVHostDevices; virHostdevReAttachUSBDevices; virHostdevUpdateActiveDomainDevices; virHostdevUpdateActiveMediatedDevices; +virHostdevUpdateActiveNVMeDevices; virHostdevUpdateActivePCIDevices; virHostdevUpdateActiveSCSIDevices; virHostdevUpdateActiveUSBDevices; diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index f0c97ca887..9ccf79453c 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -138,6 +138,7 @@ virHostdevManagerDispose(void *obj) virObjectUnref(hostdevMgr->activeSCSIHostdevs); virObjectUnref(hostdevMgr->activeSCSIVHostHostdevs); virObjectUnref(hostdevMgr->activeMediatedHostdevs); + virObjectUnref(hostdevMgr->activeNVMeHostdevs); VIR_FREE(hostdevMgr->stateDir); } @@ -168,6 +169,9 @@ virHostdevManagerNew(void) if (!(hostdevMgr->activeMediatedHostdevs = virMediatedDeviceListNew())) return NULL; + if (!(hostdevMgr->activeNVMeHostdevs = virNVMeDeviceListNew())) + return NULL; + if (privileged) { if (VIR_STRDUP(hostdevMgr->stateDir, HOSTDEV_STATE_DIR) < 0) return NULL; @@ -2240,3 +2244,331 @@ virHostdevUpdateActiveDomainDevices(virHostdevManagerPtr mgr, return 0; } + + +static int +virHostdevGetNVMeDeviceList(virNVMeDeviceListPtr nvmeDevices, + virStorageSourcePtr src, + const char *drv_name, + const char *dom_name) +{ + virStorageSourcePtr n; + + for (n = src; virStorageSourceIsBacking(n); n = n->backingStore) { + VIR_AUTOPTR(virNVMeDevice) dev = NULL; + const virStorageSourceNVMeDef *srcNVMe = n->nvme; + + if (n->type != VIR_STORAGE_TYPE_NVME) + continue; + + if (!(dev = virNVMeDeviceNew(&srcNVMe->pciAddr, + srcNVMe->namespace, + srcNVMe->managed))) + return -1; + + if (virNVMeDeviceUsedBySet(dev, drv_name, dom_name) < 0) + return -1; + + if (virNVMeDeviceListAdd(nvmeDevices, dev) < 0) + return -1; + } + + return 0; +} + + +int +virHostdevPrepareOneNVMeDevice(virHostdevManagerPtr hostdev_mgr, + const char *drv_name, + const char *dom_name, + virStorageSourcePtr src) +{ + VIR_AUTOUNREF(virNVMeDeviceListPtr) nvmeDevices = NULL; + VIR_AUTOUNREF(virPCIDeviceListPtr) pciDevices = NULL; + const unsigned int pciFlags = 0; + virNVMeDevicePtr temp = NULL; + size_t i; + ssize_t lastGoodNVMeIdx = -1; + int ret = -1; + + if (!(nvmeDevices = virNVMeDeviceListNew())) + return -1; + + if (virHostdevGetNVMeDeviceList(nvmeDevices, src, drv_name, dom_name) < 0) + return -1; + + if (virNVMeDeviceListCount(nvmeDevices) == 0) + return 0; + + virObjectLock(hostdev_mgr->activeNVMeHostdevs); + + /* Firstly, let's check if all devices are free */ + for (i = 0; i < virNVMeDeviceListCount(nvmeDevices); i++) { + const virNVMeDevice *dev = virNVMeDeviceListGet(nvmeDevices, i); + const virPCIDeviceAddress *addr = NULL; + VIR_AUTOFREE(char *) addrStr = NULL; + const char *actual_drvname = NULL; + const char *actual_domname = NULL; + + temp = virNVMeDeviceListLookup(hostdev_mgr->activeNVMeHostdevs, dev); + + /* Not on the list means not used */ + if (!temp) + continue; + + virNVMeDeviceUsedByGet(temp, &actual_drvname, &actual_domname); + addr = virNVMeDeviceAddressGet(dev); + addrStr = virPCIDeviceAddressAsString(addr); + + virReportError(VIR_ERR_OPERATION_INVALID, + _("NVMe device %s already in use by driver %s domain %s"), + NULLSTR(addrStr), actual_drvname, actual_domname); + goto cleanup; + } + + if (!(pciDevices = virNVMeDeviceListCreateDetachList(hostdev_mgr->activeNVMeHostdevs, + nvmeDevices))) + goto cleanup; + + /* Let's check if all PCI devices are NVMe disks. */ + for (i = 0; i < virPCIDeviceListCount(pciDevices); i++) { + virPCIDevicePtr pci = virPCIDeviceListGet(pciDevices, i); + VIR_AUTOFREE(char *) drvPath = NULL; + VIR_AUTOFREE(char *) drvName = NULL; + int stub = VIR_PCI_STUB_DRIVER_NONE; + + if (virPCIDeviceGetDriverPathAndName(pci, &drvPath, &drvName) < 0) + goto cleanup; + + if (drvName) + stub = virPCIStubDriverTypeFromString(drvName); + + if (stub == VIR_PCI_STUB_DRIVER_VFIO || + STREQ_NULLABLE(drvName, "nvme")) + continue; + + VIR_WARN("Suspicious NVMe disk assignment. PCI device " + "%s is not an NVMe disk, it has %s driver", + virPCIDeviceGetName(pci), NULLSTR(drvName)); + } + + /* This looks like a good opportunity to merge inactive NVMe devices onto + * the active list. This, however, means that if something goes wrong we + * have to perform a rollback before returning. */ + for (i = 0; i < virNVMeDeviceListCount(nvmeDevices); i++) { + temp = virNVMeDeviceListGet(nvmeDevices, i); + + if (virNVMeDeviceListAdd(hostdev_mgr->activeNVMeHostdevs, temp) < 0) + goto rollback; + + lastGoodNVMeIdx = i; + } + + if (virHostdevPreparePCIDevicesImpl(hostdev_mgr, + drv_name, dom_name, NULL, + pciDevices, NULL, 0, pciFlags) < 0) + goto rollback; + + ret = 0; + cleanup: + virObjectUnlock(hostdev_mgr->activeNVMeHostdevs); + return ret; + + rollback: + while (lastGoodNVMeIdx >= 0) { + temp = virNVMeDeviceListGet(nvmeDevices, lastGoodNVMeIdx); + + virNVMeDeviceListDel(hostdev_mgr->activeNVMeHostdevs, temp); + + lastGoodNVMeIdx--; + } + goto cleanup; +} + + +int +virHostdevPrepareNVMeDevices(virHostdevManagerPtr hostdev_mgr, + const char *drv_name, + const char *dom_name, + virDomainDiskDefPtr *disks, + size_t ndisks) +{ + size_t i; + ssize_t lastGoodDiskIdx = -1; + + for (i = 0; i < ndisks; i++) { + if (virHostdevPrepareOneNVMeDevice(hostdev_mgr, drv_name, + dom_name, disks[i]->src) < 0) + goto rollback; + + lastGoodDiskIdx = i; + } + + return 0; + + rollback: + while (lastGoodDiskIdx >= 0) { + if (virHostdevReAttachOneNVMeDevice(hostdev_mgr, drv_name, dom_name, + disks[lastGoodDiskIdx]->src) < 0) { + VIR_ERROR(_("Failed to reattach NVMe for disk target: %s"), + disks[lastGoodDiskIdx]->dst); + } + + lastGoodDiskIdx--; + } + + return -1; +} + + +int +virHostdevReAttachOneNVMeDevice(virHostdevManagerPtr hostdev_mgr, + const char *drv_name, + const char *dom_name, + virStorageSourcePtr src) +{ + VIR_AUTOUNREF(virNVMeDeviceListPtr) nvmeDevices = NULL; + VIR_AUTOUNREF(virPCIDeviceListPtr) pciDevices = NULL; + size_t i; + int ret = -1; + + if (!(nvmeDevices = virNVMeDeviceListNew())) + return -1; + + if (virHostdevGetNVMeDeviceList(nvmeDevices, src, drv_name, dom_name) < 0) + return -1; + + if (virNVMeDeviceListCount(nvmeDevices) == 0) + return 0; + + virObjectLock(hostdev_mgr->activeNVMeHostdevs); + + if (!(pciDevices = virNVMeDeviceListCreateReAttachList(hostdev_mgr->activeNVMeHostdevs, + nvmeDevices))) + goto cleanup; + + virHostdevReAttachPCIDevicesImpl(hostdev_mgr, + drv_name, dom_name, pciDevices, + NULL, 0, NULL); + + for (i = 0; i < virNVMeDeviceListCount(nvmeDevices); i++) { + virNVMeDevicePtr temp = virNVMeDeviceListGet(nvmeDevices, i); + + if (virNVMeDeviceListDel(hostdev_mgr->activeNVMeHostdevs, temp) < 0) + goto cleanup; + } + + ret = 0; + cleanup: + virObjectUnlock(hostdev_mgr->activeNVMeHostdevs); + return ret; +} + + +int +virHostdevReAttachNVMeDevices(virHostdevManagerPtr hostdev_mgr, + const char *drv_name, + const char *dom_name, + virDomainDiskDefPtr *disks, + size_t ndisks) +{ + size_t i; + int ret = 0; + + /* Contrary to virHostdevPrepareNVMeDevices, this is a best + * effort approach. Just iterate over all disks and try to + * reattach them. Don't stop at the first failure. */ + for (i = 0; i < ndisks; i++) { + if (virHostdevReAttachOneNVMeDevice(hostdev_mgr, drv_name, + dom_name, disks[i]->src) < 0) { + VIR_ERROR(_("Failed to reattach NVMe for disk target: %s"), + disks[i]->dst); + ret = -1; + } + } + + return ret; +} + + +int +virHostdevUpdateActiveNVMeDevices(virHostdevManagerPtr hostdev_mgr, + const char *drv_name, + const char *dom_name, + virDomainDiskDefPtr *disks, + size_t ndisks) +{ + VIR_AUTOUNREF(virNVMeDeviceListPtr) nvmeDevices = NULL; + VIR_AUTOUNREF(virPCIDeviceListPtr) pciDevices = NULL; + virNVMeDevicePtr temp = NULL; + size_t i; + ssize_t lastGoodNVMeIdx = -1; + ssize_t lastGoodPCIIdx = -1; + int ret = -1; + + if (!(nvmeDevices = virNVMeDeviceListNew())) + return -1; + + for (i = 0; i < ndisks; i++) { + if (virHostdevGetNVMeDeviceList(nvmeDevices, disks[i]->src, drv_name, dom_name) < 0) + return -1; + } + + if (virNVMeDeviceListCount(nvmeDevices) == 0) + return 0; + + virObjectLock(hostdev_mgr->activeNVMeHostdevs); + virObjectLock(hostdev_mgr->activePCIHostdevs); + virObjectLock(hostdev_mgr->inactivePCIHostdevs); + + if (!(pciDevices = virNVMeDeviceListCreateDetachList(hostdev_mgr->activeNVMeHostdevs, + nvmeDevices))) + goto cleanup; + + for (i = 0; i < virNVMeDeviceListCount(nvmeDevices); i++) { + temp = virNVMeDeviceListGet(nvmeDevices, i); + + if (virNVMeDeviceListAdd(hostdev_mgr->activeNVMeHostdevs, temp) < 0) + goto rollback; + + lastGoodNVMeIdx = i; + } + + for (i = 0; i < virPCIDeviceListCount(pciDevices); i++) { + virPCIDevicePtr actual = virPCIDeviceListGet(pciDevices, i); + + /* We must restore some attributes that were lost on daemon restart. */ + virPCIDeviceSetUnbindFromStub(actual, true); + if (virPCIDeviceSetUsedBy(actual, drv_name, dom_name) < 0) + goto rollback; + + if (virPCIDeviceListAddCopy(hostdev_mgr->activePCIHostdevs, actual) < 0) + goto rollback; + + lastGoodPCIIdx = i; + } + + ret = 0; + cleanup: + virObjectUnlock(hostdev_mgr->inactivePCIHostdevs); + virObjectUnlock(hostdev_mgr->activePCIHostdevs); + virObjectUnlock(hostdev_mgr->activeNVMeHostdevs); + return ret; + + rollback: + while (lastGoodNVMeIdx >= 0) { + temp = virNVMeDeviceListGet(nvmeDevices, lastGoodNVMeIdx); + + virNVMeDeviceListDel(hostdev_mgr->activeNVMeHostdevs, temp); + + lastGoodNVMeIdx--; + } + while (lastGoodPCIIdx >= 0) { + virPCIDevicePtr actual = virPCIDeviceListGet(pciDevices, i); + + virPCIDeviceListDel(hostdev_mgr->activePCIHostdevs, actual); + + lastGoodPCIIdx--; + } + goto cleanup; +} diff --git a/src/util/virhostdev.h b/src/util/virhostdev.h index c7ef2055c1..372eca2751 100644 --- a/src/util/virhostdev.h +++ b/src/util/virhostdev.h @@ -29,6 +29,7 @@ #include "virscsivhost.h" #include "conf/domain_conf.h" #include "virmdev.h" +#include "virnvme.h" typedef enum { VIR_HOSTDEV_STRICT_ACS_CHECK = (1 << 0), /* strict acs check */ @@ -53,6 +54,9 @@ struct _virHostdevManager { virSCSIDeviceListPtr activeSCSIHostdevs; virSCSIVHostDeviceListPtr activeSCSIVHostHostdevs; virMediatedDeviceListPtr activeMediatedHostdevs; + /* NVMe devices are PCI devices really, but one NVMe disk can + * have multiple namespaces. */ + virNVMeDeviceListPtr activeNVMeHostdevs; }; virHostdevManagerPtr virHostdevManagerGetDefault(void); @@ -204,3 +208,36 @@ int virHostdevPCINodeDeviceReAttach(virHostdevManagerPtr mgr, int virHostdevPCINodeDeviceReset(virHostdevManagerPtr mgr, virPCIDevicePtr pci) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); + +int +virHostdevPrepareOneNVMeDevice(virHostdevManagerPtr hostdev_mgr, + const char *drv_name, + const char *dom_name, + virStorageSourcePtr src); + +int +virHostdevPrepareNVMeDevices(virHostdevManagerPtr hostdev_mgr, + const char *drv_name, + const char *dom_name, + virDomainDiskDefPtr *disks, + size_t ndisks); + +int +virHostdevReAttachOneNVMeDevice(virHostdevManagerPtr hostdev_mgr, + const char *drv_name, + const char *dom_name, + virStorageSourcePtr src); + +int +virHostdevReAttachNVMeDevices(virHostdevManagerPtr hostdev_mgr, + const char *drv_name, + const char *dom_name, + virDomainDiskDefPtr *disks, + size_t ndisks); + +int +virHostdevUpdateActiveNVMeDevices(virHostdevManagerPtr hostdev_mgr, + const char *drv_name, + const char *dom_name, + virDomainDiskDefPtr *disks, + size_t ndisks); -- 2.21.0

The device configs (which are actually the same one config) come from a NVMe disk of mine. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ACKed-by: Peter Krempa <pkrempa@redhat.com> --- tests/virpcimock.c | 3 +++ tests/virpcitestdata/0000-01-00.0.config | Bin 0 -> 4096 bytes tests/virpcitestdata/0000-02-00.0.config | Bin 0 -> 4096 bytes 3 files changed, 3 insertions(+) create mode 100644 tests/virpcitestdata/0000-01-00.0.config create mode 100644 tests/virpcitestdata/0000-02-00.0.config diff --git a/tests/virpcimock.c b/tests/virpcimock.c index edb558b6ec..c36d3e1371 100644 --- a/tests/virpcimock.c +++ b/tests/virpcimock.c @@ -1017,6 +1017,7 @@ init_env(void) MAKE_PCI_DRIVER("iwlwifi", 0x8086, 0x0044); MAKE_PCI_DRIVER("i915", 0x8086, 0x0046, 0x8086, 0x0047); MAKE_PCI_DRIVER("vfio-pci", -1, -1); + MAKE_PCI_DRIVER("nvme", 0x1cc1, 0x8201); # define MAKE_PCI_DEVICE(Id, Vendor, Device, IommuGroup, ...) \ do { \ @@ -1052,6 +1053,8 @@ init_env(void) MAKE_PCI_DEVICE("0021:de:1f.1", 0x8086, 0x0047, 13, .physfn = "0021:de:1f.0"); /* Virtual Function */ + MAKE_PCI_DEVICE("0000:01:00.0", 0x1cc1, 0x8201, 14, .klass = 0x010802); + MAKE_PCI_DEVICE("0000:02:00.0", 0x1cc1, 0x8201, 15, .klass = 0x010802); } diff --git a/tests/virpcitestdata/0000-01-00.0.config b/tests/virpcitestdata/0000-01-00.0.config new file mode 100644 index 0000000000000000000000000000000000000000..f92455e2ac5701ce60a51ae19828658b80744399 GIT binary patch literal 4096 zcmeHIF;Buk6#lMP3WC_80!0h0vM?|ZHYb~)#L>jXKcF*<t0@ew#(&`84`6g`<KpZu zaP%LTxZu0C6dH+PAO@3rm)v{zz3*Nx-?guS#YUQHfGar$G8Om~evt*l6}THG3$%ls zbL8T+aGAkfSZ5AOg~nJxaQ{&~b`11hPvNqjks{E-76s`bTjV$zsdNdt2Zx}86r3!0 z5-k@njLH$yMaSt!;XCjcMN7{rhH)Lzgm%?1tR|ap5RC)?OfZuh+-MNn&bPwMnC1$Y zWtY532v8x%Zq%*)y_#9Aly`TwONPEx+$`iba#<~-a)litu!!pkegQ#S0so<=$gQVo zbjxR?KpRD$uKLK=emkc^$-Zgf<*MZ@;uWY8!!UdWrXoze;QR3q_ajXzAQg}b)NzRk z87VP9U%z%(HWJ0B|6JTCgnRh9Wz<V@F5}c=IPK9xUNln!?lk>Mr%9L>;{S2-*!p%x w=b5pKIZ+ec2|JnLT|5EZ*;+<Yfz>MDr9bc*RexLU6J#~1fK)@Fxm<1DBgMR{#J2 literal 0 HcmV?d00001 diff --git a/tests/virpcitestdata/0000-02-00.0.config b/tests/virpcitestdata/0000-02-00.0.config new file mode 100644 index 0000000000000000000000000000000000000000..ebb44d8f69c91809b82e8d2669026dfff42c3100 GIT binary patch literal 4096 zcmeHIJ5Iwu5Pj>-$HXLdfc(IT4QW!Oh|*DEDG*U2(QpB%)6hF9Xc0G{-~cHpZ9zfJ z322bGLCP>|J5DSjlp;bw+F5C5_RZVz>a9KYO*YD;3~)tdAWH!g;g^|DT!A}LQllO0 zf<ukg!legyL7fFC5gKC!{{2_w#5T}-JA=b|MuI>KOBAGo6v%Nj66qpz7dAnM2{>Nx zI9e@W7?nb%gO1$~!w=vwj8>jg7)EtS6WUe7uo7>+ML1#rsDf3w!Hov7tz0X}jA<@| znO4!A1^^YZtw!BE*soP9<<j2nPSMZ{`E4z?rDikf6j#_0e3Q7Y;A`;P3iuB_MQ$@K zL$`cR3bc`bbvTZ_%x~vZDA})?c)4!b%Xk`9Vi*Rmz)Xah7kn=o;(nw_1*8H}fjX`* zB_kyU=<7E&%Z8$O^q-3wg>Vm(Pe#2&br`1}!)cEm@WPoIaHr{&J59pe0RNAZ%Qm+& w+Ruz#E{GcIPT1)j@8SvQ&et-M3anQFH~E3rsQUYQpCGGA1*8H}fj?2;8=mzw{{R30 literal 0 HcmV?d00001 -- 2.21.0

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ACKed-by: Peter Krempa <pkrempa@redhat.com> --- tests/virhostdevtest.c | 97 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 97 insertions(+) diff --git a/tests/virhostdevtest.c b/tests/virhostdevtest.c index 46627355c3..fde7f92bc4 100644 --- a/tests/virhostdevtest.c +++ b/tests/virhostdevtest.c @@ -48,6 +48,9 @@ VIR_LOG_INIT("tests.hostdevtest"); # define CHECK_PCI_LIST_COUNT(list, cnt) \ CHECK_LIST_COUNT(list, cnt, virPCIDeviceListCount) +# define CHECK_NVME_LIST_COUNT(list, cnt) \ + CHECK_LIST_COUNT(list, cnt, virNVMeDeviceListCount) + # define TEST_STATE_DIR abs_builddir "/hostdevmgr" static const char *drv_name = "test_driver"; static const char *dom_name = "test_domain"; @@ -57,6 +60,36 @@ static int nhostdevs = 3; static virDomainHostdevDefPtr hostdevs[] = {NULL, NULL, NULL}; static virPCIDevicePtr dev[] = {NULL, NULL, NULL}; static virHostdevManagerPtr mgr; +static const size_t ndisks = 3; +static virDomainDiskDefPtr disks[] = {NULL, NULL, NULL}; +static const char *diskXML[] = { + "<disk type='nvme' device='disk'>" + " <driver name='qemu' type='raw'/>" + " <source type='pci' managed='yes' namespace='1'>" + " <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>" + " </source>" + " <target dev='vda' bus='virtio'/>" + " <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>" + "</disk>", + + "<disk type='nvme' device='disk'>" + " <driver name='qemu' type='raw'/>" + " <source type='pci' managed='yes' namespace='2'>" + " <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>" + " </source>" + " <target dev='vdb' bus='virtio'/>" + " <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>" + "</disk>", + + "<disk type='nvme' device='disk'>" + " <driver name='qemu' type='raw'/>" + " <source type='pci' managed='no' namespace='1'>" + " <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>" + " </source>" + " <target dev='vdc' bus='virtio'/>" + " <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>" + "</disk>" +}; static void myCleanup(void) @@ -67,6 +100,9 @@ myCleanup(void) virDomainHostdevDefFree(hostdevs[i]); } + for (i = 0; i < ndisks; i++) + virDomainDiskDefFree(disks[i]); + if (mgr) { if (!getenv("LIBVIRT_SKIP_CLEANUP")) virFileDeleteTree(mgr->stateDir); @@ -75,6 +111,7 @@ myCleanup(void) virObjectUnref(mgr->activeUSBHostdevs); virObjectUnref(mgr->inactivePCIHostdevs); virObjectUnref(mgr->activeSCSIHostdevs); + virObjectUnref(mgr->activeNVMeHostdevs); VIR_FREE(mgr->stateDir); VIR_FREE(mgr); } @@ -107,6 +144,11 @@ myInit(void) virPCIDeviceSetStubDriver(dev[i], VIR_PCI_STUB_DRIVER_VFIO); } + for (i = 0; i < ndisks; i++) { + if (!(disks[i] = virDomainDiskDefParse(diskXML[i], NULL, NULL, 0))) + goto cleanup; + } + if (VIR_ALLOC(mgr) < 0) goto cleanup; if ((mgr->activePCIHostdevs = virPCIDeviceListNew()) == NULL) @@ -117,6 +159,8 @@ myInit(void) goto cleanup; if ((mgr->activeSCSIHostdevs = virSCSIDeviceListNew()) == NULL) goto cleanup; + if ((mgr->activeNVMeHostdevs = virNVMeDeviceListNew()) == NULL) + goto cleanup; if (VIR_STRDUP(mgr->stateDir, TEST_STATE_DIR) < 0) goto cleanup; if (virFileMakePath(mgr->stateDir) < 0) @@ -493,6 +537,58 @@ testVirHostdevOther(const void *opaque ATTRIBUTE_UNUSED) return 0; } +static int +testNVMeDiskRoundtrip(const void *opaque ATTRIBUTE_UNUSED) +{ + int ret = -1; + + /* Don't rely on a state that previous test cases might have + * left the manager in. Start with a clean slate. */ + virHostdevReAttachPCIDevices(mgr, drv_name, dom_name, + hostdevs, nhostdevs, NULL); + + CHECK_NVME_LIST_COUNT(mgr->activeNVMeHostdevs, 0); + CHECK_PCI_LIST_COUNT(mgr->activePCIHostdevs, 0); + CHECK_PCI_LIST_COUNT(mgr->inactivePCIHostdevs, 0); + + /* Firstly, attach all NVMe disks */ + if (virHostdevPrepareNVMeDevices(mgr, drv_name, dom_name, disks, ndisks) < 0) + goto cleanup; + + CHECK_NVME_LIST_COUNT(mgr->activeNVMeHostdevs, 3); + CHECK_PCI_LIST_COUNT(mgr->activePCIHostdevs, 2); + CHECK_PCI_LIST_COUNT(mgr->inactivePCIHostdevs, 0); + + /* Now, try to detach the first one. */ + if (virHostdevReAttachNVMeDevices(mgr, drv_name, dom_name, disks, 1) < 0) + goto cleanup; + + CHECK_NVME_LIST_COUNT(mgr->activeNVMeHostdevs, 2); + CHECK_PCI_LIST_COUNT(mgr->activePCIHostdevs, 2); + CHECK_PCI_LIST_COUNT(mgr->inactivePCIHostdevs, 0); + + /* And the last one */ + if (virHostdevReAttachNVMeDevices(mgr, drv_name, dom_name, &disks[2], 1) < 0) + goto cleanup; + + CHECK_NVME_LIST_COUNT(mgr->activeNVMeHostdevs, 1); + CHECK_PCI_LIST_COUNT(mgr->activePCIHostdevs, 1); + CHECK_PCI_LIST_COUNT(mgr->inactivePCIHostdevs, 0); + + /* Finally, detach the middle one */ + if (virHostdevReAttachNVMeDevices(mgr, drv_name, dom_name, &disks[1], 1) < 0) + goto cleanup; + + CHECK_NVME_LIST_COUNT(mgr->activeNVMeHostdevs, 0); + CHECK_PCI_LIST_COUNT(mgr->activePCIHostdevs, 0); + CHECK_PCI_LIST_COUNT(mgr->inactivePCIHostdevs, 0); + + ret = 0; + cleanup: + return ret; +} + + # define FAKEROOTDIRTEMPLATE abs_builddir "/fakerootdir-XXXXXX" static int @@ -530,6 +626,7 @@ mymain(void) DO_TEST(testVirHostdevRoundtripManaged); DO_TEST(testVirHostdevRoundtripMixed); DO_TEST(testVirHostdevOther); + DO_TEST(testNVMeDiskRoundtrip); myCleanup(); -- 2.21.0

The qemu driver has its own wrappers around virHostdev module (so that some arguments are filled in automatically). Extend these to include NVMe devices too. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ACKed-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_hostdev.c | 49 ++++++++++++++++++++++++++++++++++++++--- src/qemu/qemu_hostdev.h | 10 +++++++++ 2 files changed, 56 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index ebbca817b8..5ab0217858 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -96,13 +96,28 @@ qemuHostdevUpdateActiveMediatedDevices(virQEMUDriverPtr driver, } +int +qemuHostdevUpdateActiveNVMeDisks(virQEMUDriverPtr driver, + virDomainDefPtr def) +{ + return virHostdevUpdateActiveNVMeDevices(driver->hostdevMgr, + QEMU_DRIVER_NAME, + def->name, + def->disks, + def->ndisks); +} + + int qemuHostdevUpdateActiveDomainDevices(virQEMUDriverPtr driver, virDomainDefPtr def) { - if (!def->nhostdevs) + if (!def->nhostdevs && !def->ndisks) return 0; + if (qemuHostdevUpdateActiveNVMeDisks(driver, def) < 0) + return -1; + if (qemuHostdevUpdateActivePCIDevices(driver, def) < 0) return -1; @@ -197,6 +212,17 @@ qemuHostdevPreparePCIDevicesCheckSupport(virDomainHostdevDefPtr *hostdevs, return true; } +int +qemuHostdevPrepareNVMeDisks(virQEMUDriverPtr driver, + const char *name, + virDomainDiskDefPtr *disks, + size_t ndisks) +{ + return virHostdevPrepareNVMeDevices(driver->hostdevMgr, + QEMU_DRIVER_NAME, + name, disks, ndisks); +} + int qemuHostdevPreparePCIDevices(virQEMUDriverPtr driver, const char *name, @@ -313,9 +339,12 @@ qemuHostdevPrepareDomainDevices(virQEMUDriverPtr driver, virQEMUCapsPtr qemuCaps, unsigned int flags) { - if (!def->nhostdevs) + if (!def->nhostdevs && !def->ndisks) return 0; + if (qemuHostdevPrepareNVMeDisks(driver, def->name, def->disks, def->ndisks) < 0) + return -1; + if (qemuHostdevPreparePCIDevices(driver, def->name, def->uuid, def->hostdevs, def->nhostdevs, qemuCaps, flags) < 0) @@ -340,6 +369,17 @@ qemuHostdevPrepareDomainDevices(virQEMUDriverPtr driver, return 0; } +void +qemuHostdevReAttachNVMeDisks(virQEMUDriverPtr driver, + const char *name, + virDomainDiskDefPtr *disks, + size_t ndisks) +{ + virHostdevReAttachNVMeDevices(driver->hostdevMgr, + QEMU_DRIVER_NAME, + name, disks, ndisks); +} + void qemuHostdevReAttachPCIDevices(virQEMUDriverPtr driver, const char *name, @@ -419,9 +459,12 @@ void qemuHostdevReAttachDomainDevices(virQEMUDriverPtr driver, virDomainDefPtr def) { - if (!def->nhostdevs) + if (!def->nhostdevs && !def->ndisks) return; + qemuHostdevReAttachNVMeDisks(driver, def->name, def->disks, + def->ndisks); + qemuHostdevReAttachPCIDevices(driver, def->name, def->hostdevs, def->nhostdevs); diff --git a/src/qemu/qemu_hostdev.h b/src/qemu/qemu_hostdev.h index 536069fe8a..735414b6aa 100644 --- a/src/qemu/qemu_hostdev.h +++ b/src/qemu/qemu_hostdev.h @@ -28,6 +28,8 @@ bool qemuHostdevNeedsVFIO(const virDomainHostdevDef *hostdev); bool qemuHostdevHostSupportsPassthroughVFIO(void); +int qemuHostdevUpdateActiveNVMeDisks(virQEMUDriverPtr driver, + virDomainDefPtr def); int qemuHostdevUpdateActiveMediatedDevices(virQEMUDriverPtr driver, virDomainDefPtr def); int qemuHostdevUpdateActivePCIDevices(virQEMUDriverPtr driver, @@ -39,6 +41,10 @@ int qemuHostdevUpdateActiveSCSIDevices(virQEMUDriverPtr driver, int qemuHostdevUpdateActiveDomainDevices(virQEMUDriverPtr driver, virDomainDefPtr def); +int qemuHostdevPrepareNVMeDisks(virQEMUDriverPtr driver, + const char *name, + virDomainDiskDefPtr *disks, + size_t ndisks); int qemuHostdevPreparePCIDevices(virQEMUDriverPtr driver, const char *name, const unsigned char *uuid, @@ -68,6 +74,10 @@ int qemuHostdevPrepareDomainDevices(virQEMUDriverPtr driver, virQEMUCapsPtr qemuCaps, unsigned int flags); +void qemuHostdevReAttachNVMeDisks(virQEMUDriverPtr driver, + const char *name, + virDomainDiskDefPtr *disks, + size_t ndisks); void qemuHostdevReAttachPCIDevices(virQEMUDriverPtr driver, const char *name, virDomainHostdevDefPtr *hostdevs, -- 2.21.0

We have this beautiful function that does crystal ball divination. The function is named qemuDomainGetMemLockLimitBytes() and it calculates the upper limit of how much locked memory is given guest going to need. The function bases its guess on devices defined for a domain. For instance, if there is a VFIO hostdev defined then it adds 1GiB to the guessed maximum. Since NVMe disks are pretty much VFIO hostdevs (but not quite), we have to do the same sorcery. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ACKed-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_domain.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index f8fe430a7f..33929ce3a8 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11888,6 +11888,9 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def) } } + if (virDomainDefHasNVMeDisk(def)) + usesVFIO = true; + memory = virDomainDefGetMemoryTotal(def); if (def->mem.max_memory) @@ -11986,6 +11989,7 @@ unsigned long long qemuDomainGetMemLockLimitBytes(virDomainDefPtr def) { unsigned long long memKB = 0; + bool usesVFIO = false; size_t i; /* prefer the hard limit */ @@ -12026,11 +12030,17 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def) for (i = 0; i < def->nhostdevs; i++) { if (virHostdevIsVFIODevice(def->hostdevs[i]) || virHostdevIsMdevDevice(def->hostdevs[i])) { - memKB = virDomainDefGetMemoryTotal(def) + 1024 * 1024; - goto done; + usesVFIO = true; + break; } } + if (virDomainDefHasNVMeDisk(def)) + usesVFIO = true; + + if (usesVFIO) + memKB = virDomainDefGetMemoryTotal(def) + 1024 * 1024; + done: return memKB << 10; } -- 2.21.0

If a domain has an NVMe disk configured, then we need to create /dev/vfio/* paths in domain's namespace so that qemu can open them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 69 +++++++++++++++++++++++++++++++----------- 1 file changed, 52 insertions(+), 17 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 33929ce3a8..9409d8de8d 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -13423,16 +13423,29 @@ qemuDomainSetupDisk(virQEMUDriverConfigPtr cfg ATTRIBUTE_UNUSED, { virStorageSourcePtr next; char *dst = NULL; + bool hasNVMe = false; int ret = -1; for (next = disk->src; virStorageSourceIsBacking(next); next = next->backingStore) { - if (!next->path || !virStorageSourceIsLocalStorage(next)) { - /* Not creating device. Just continue. */ - continue; + if (next->type == VIR_STORAGE_TYPE_NVME) { + VIR_AUTOFREE(char *) nvmePath = NULL; + + hasNVMe = true; + + if (!(nvmePath = virPCIDeviceAddressGetIOMMUGroupDev(&next->nvme->pciAddr))) + goto cleanup; + + if (qemuDomainCreateDevice(nvmePath, data, false) < 0) + goto cleanup; + } else { + if (!next->path || !virStorageSourceIsLocalStorage(next)) { + /* Not creating device. Just continue. */ + continue; + } + + if (qemuDomainCreateDevice(next->path, data, false) < 0) + goto cleanup; } - - if (qemuDomainCreateDevice(next->path, data, false) < 0) - goto cleanup; } /* qemu-pr-helper might require access to /dev/mapper/control. */ @@ -13440,6 +13453,10 @@ qemuDomainSetupDisk(virQEMUDriverConfigPtr cfg ATTRIBUTE_UNUSED, qemuDomainCreateDevice(QEMU_DEVICE_MAPPER_CONTROL_PATH, data, true) < 0) goto cleanup; + if (hasNVMe && + qemuDomainCreateDevice(QEMU_DEV_VFIO, data, false) < 0) + goto cleanup; + ret = 0; cleanup: VIR_FREE(dst); @@ -14468,35 +14485,53 @@ qemuDomainNamespaceSetupDisk(virDomainObjPtr vm, virStorageSourcePtr src) { virStorageSourcePtr next; - const char **paths = NULL; + char **paths = NULL; size_t npaths = 0; - char *dmPath = NULL; + bool hasNVMe = false; + VIR_AUTOFREE(char *) dmPath = NULL; + VIR_AUTOFREE(char *) vfioPath = NULL; int ret = -1; for (next = src; virStorageSourceIsBacking(next); next = next->backingStore) { - if (virStorageSourceIsEmpty(next) || - !virStorageSourceIsLocalStorage(next)) { - /* Not creating device. Just continue. */ - continue; + VIR_AUTOFREE(char *) tmpPath = NULL; + + if (next->type == VIR_STORAGE_TYPE_NVME) { + hasNVMe = true; + + if (!(tmpPath = virPCIDeviceAddressGetIOMMUGroupDev(&next->nvme->pciAddr))) + goto cleanup; + } else { + if (virStorageSourceIsEmpty(next) || + !virStorageSourceIsLocalStorage(next)) { + /* Not creating device. Just continue. */ + continue; + } + + if (VIR_STRDUP(tmpPath, next->path) < 0) + goto cleanup; } - if (VIR_APPEND_ELEMENT_COPY(paths, npaths, next->path) < 0) + if (VIR_APPEND_ELEMENT(paths, npaths, tmpPath) < 0) goto cleanup; } /* qemu-pr-helper might require access to /dev/mapper/control. */ if (src->pr && (VIR_STRDUP(dmPath, QEMU_DEVICE_MAPPER_CONTROL_PATH) < 0 || - VIR_APPEND_ELEMENT_COPY(paths, npaths, dmPath) < 0)) + VIR_APPEND_ELEMENT(paths, npaths, dmPath) < 0)) goto cleanup; - if (qemuDomainNamespaceMknodPaths(vm, paths, npaths) < 0) + if (hasNVMe && + (VIR_STRDUP(vfioPath, QEMU_DEV_VFIO) < 0 || + VIR_APPEND_ELEMENT(paths, npaths, vfioPath) < 0)) + goto cleanup; + + if (qemuDomainNamespaceMknodPaths(vm, (const char **) paths, npaths) < 0) goto cleanup; ret = 0; cleanup: - VIR_FREE(dmPath); - VIR_FREE(paths); + virStringListFreeCount(paths, npaths); return ret; } -- 2.21.0

There are couple of places where a domain with a VFIO device gets special treatment: in CGroups when enabling/disabling access to /dev/vfio/vfio, and when creating/removing nodes in domain mount namespace. Well, a NVMe disk is a VFIO device too. Fortunately, we have this qemuDomainNeedsVFIO() function which is the only place that needs adjustment. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 9409d8de8d..e4e9d4e361 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -12840,7 +12840,8 @@ bool qemuDomainNeedsVFIO(const virDomainDef *def) { return virDomainDefHasVFIOHostdev(def) || - virDomainDefHasMdevHostdev(def); + virDomainDefHasMdevHostdev(def) || + virDomainDefHasNVMeDisk(def); } -- 2.21.0

If a domain has an NVMe disk configured, then we need to allow it on devices CGroup so that qemu can access it. There is one caveat though - if an NVMe disk is read only we need CGroup to allow write too. This is because when opening the device, qemu does couple of ioctl()-s which are considered as write. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_cgroup.c | 101 +++++++++++++++++++++++++++++------------ 1 file changed, 72 insertions(+), 29 deletions(-) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 9684bf3662..b9fb0ebca2 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -119,10 +119,30 @@ qemuSetupImageCgroupInternal(virDomainObjPtr vm, virStorageSourcePtr src, bool forceReadonly) { - if (!src->path || !virStorageSourceIsLocalStorage(src)) { - VIR_DEBUG("Not updating cgroups for disk path '%s', type: %s", - NULLSTR(src->path), virStorageTypeToString(src->type)); - return 0; + VIR_AUTOFREE(char *) path = NULL; + bool readonly = src->readonly || forceReadonly; + + if (src->type == VIR_STORAGE_TYPE_NVME) { + /* Even though disk is R/O we can't make it so in + * CGroups. QEMU will try to do some ioctl()-s over the + * device and such operations are considered R/W by the + * kernel */ + readonly = false; + + if (!(path = virPCIDeviceAddressGetIOMMUGroupDev(&src->nvme->pciAddr))) + return -1; + + if (qemuSetupImagePathCgroup(vm, QEMU_DEV_VFIO, false) < 0) + return -1; + } else { + if (!src->path || !virStorageSourceIsLocalStorage(src)) { + VIR_DEBUG("Not updating cgroups for disk path '%s', type: %s", + NULLSTR(src->path), virStorageTypeToString(src->type)); + return 0; + } + + if (VIR_STRDUP(path, src->path) < 0) + return -1; } if (virStoragePRDefIsManaged(src->pr) && @@ -130,7 +150,7 @@ qemuSetupImageCgroupInternal(virDomainObjPtr vm, qemuSetupImagePathCgroup(vm, QEMU_DEVICE_MAPPER_CONTROL_PATH, false) < 0) return -1; - return qemuSetupImagePathCgroup(vm, src->path, src->readonly || forceReadonly); + return qemuSetupImagePathCgroup(vm, path, readonly); } @@ -147,7 +167,10 @@ qemuTeardownImageCgroup(virDomainObjPtr vm, virStorageSourcePtr src) { qemuDomainObjPrivatePtr priv = vm->privateData; + VIR_AUTOFREE(char *path) = NULL; int perms = VIR_CGROUP_DEVICE_RWM; + bool hasPR = false; + bool hasNVMe = false; size_t i; int ret; @@ -155,41 +178,61 @@ qemuTeardownImageCgroup(virDomainObjPtr vm, VIR_CGROUP_CONTROLLER_DEVICES)) return 0; - if (!src->path || !virStorageSourceIsLocalStorage(src)) { - VIR_DEBUG("Not updating cgroups for disk path '%s', type: %s", - NULLSTR(src->path), virStorageTypeToString(src->type)); - return 0; + for (i = 0; i < vm->def->ndisks; i++) { + virStorageSourcePtr diskSrc = vm->def->disks[i]->src; + + if (src == diskSrc) + continue; + + if (virStoragePRDefIsManaged(diskSrc->pr)) + hasPR = true; + + if (virStorageSourceChainHasNVMe(diskSrc)) + hasNVMe = true; } - if (virFileExists(QEMU_DEVICE_MAPPER_CONTROL_PATH)) { - for (i = 0; i < vm->def->ndisks; i++) { - virStorageSourcePtr diskSrc = vm->def->disks[i]->src; + if (src->type == VIR_STORAGE_TYPE_NVME) { + if (!(path = virPCIDeviceAddressGetIOMMUGroupDev(&src->nvme->pciAddr))) + return -1; - if (src == diskSrc) - continue; - - if (virStoragePRDefIsManaged(diskSrc->pr)) - break; - } - - if (i == vm->def->ndisks) { - VIR_DEBUG("Disabling device mapper control"); - ret = virCgroupDenyDevicePath(priv->cgroup, - QEMU_DEVICE_MAPPER_CONTROL_PATH, - perms, true); + if (!hasNVMe && + !qemuDomainNeedsVFIO(vm->def)) { + ret = virCgroupDenyDevicePath(priv->cgroup, QEMU_DEV_VFIO, perms, true); virDomainAuditCgroupPath(vm, priv->cgroup, "deny", - QEMU_DEVICE_MAPPER_CONTROL_PATH, + QEMU_DEV_VFIO, virCgroupGetDevicePermsString(perms), ret); if (ret < 0) - return ret; + return -1; } + } else { + if (!src->path || !virStorageSourceIsLocalStorage(src)) { + VIR_DEBUG("Not updating cgroups for disk path '%s', type: %s", + NULLSTR(src->path), virStorageTypeToString(src->type)); + return 0; + } + + if (VIR_STRDUP(path, src->path) < 0) + return -1; + } + + if (!hasPR && + virFileExists(QEMU_DEVICE_MAPPER_CONTROL_PATH)) { + VIR_DEBUG("Disabling device mapper control"); + ret = virCgroupDenyDevicePath(priv->cgroup, + QEMU_DEVICE_MAPPER_CONTROL_PATH, + perms, true); + virDomainAuditCgroupPath(vm, priv->cgroup, "deny", + QEMU_DEVICE_MAPPER_CONTROL_PATH, + virCgroupGetDevicePermsString(perms), ret); + if (ret < 0) + return ret; } - VIR_DEBUG("Deny path %s", src->path); + VIR_DEBUG("Deny path %s", path); - ret = virCgroupDenyDevicePath(priv->cgroup, src->path, perms, true); + ret = virCgroupDenyDevicePath(priv->cgroup, path, perms, true); - virDomainAuditCgroupPath(vm, priv->cgroup, "deny", src->path, + virDomainAuditCgroupPath(vm, priv->cgroup, "deny", path, virCgroupGetDevicePermsString(perms), ret); /* If you're looking for a counter part to -- 2.21.0

This function calls virSecuritySELinuxSetFilecon() or virSecuritySELinuxSetFileconOptional() from a lot of places. It works, because in all places we're passing src->path which is what we wanted. But not anymore. We will want to be able to pass a different path and thus the function must be reworked a bit. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ACKed-by: Peter Krempa <pkrempa@redhat.com> --- src/security/security_selinux.c | 34 ++++++++++++--------------------- 1 file changed, 12 insertions(+), 22 deletions(-) diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index e879fa39ab..3a00666d26 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1823,7 +1823,9 @@ virSecuritySELinuxSetImageLabelInternal(virSecurityManagerPtr mgr, virSecurityDeviceLabelDefPtr disk_seclabel; virSecurityDeviceLabelDefPtr parent_seclabel = NULL; bool remember; - int ret; + const char *path = src->path; + const char *tcon = NULL; + int ret = -1; if (!src->path || !virStorageSourceIsLocalStorage(src)) return 0; @@ -1856,40 +1858,28 @@ virSecuritySELinuxSetImageLabelInternal(virSecurityManagerPtr mgr, if (!disk_seclabel->relabel) return 0; - ret = virSecuritySELinuxSetFilecon(mgr, src->path, - disk_seclabel->label, remember); + tcon = disk_seclabel->label; } else if (parent_seclabel && (!parent_seclabel->relabel || parent_seclabel->label)) { if (!parent_seclabel->relabel) return 0; - ret = virSecuritySELinuxSetFilecon(mgr, src->path, - parent_seclabel->label, remember); + tcon = parent_seclabel->label; } else if (!parent || parent == src) { if (src->shared) { - ret = virSecuritySELinuxSetFilecon(mgr, - src->path, - data->file_context, - remember); + tcon = data->file_context; } else if (src->readonly) { - ret = virSecuritySELinuxSetFilecon(mgr, - src->path, - data->content_context, - remember); + tcon = data->content_context; } else if (secdef->imagelabel) { - ret = virSecuritySELinuxSetFilecon(mgr, - src->path, - secdef->imagelabel, - remember); + tcon = secdef->imagelabel; } else { - ret = 0; + return 0; } } else { - ret = virSecuritySELinuxSetFilecon(mgr, - src->path, - data->content_context, - remember); + tcon = data->content_context; } + ret = virSecuritySELinuxSetFilecon(mgr, path, tcon, remember); + if (ret == 1 && !disk_seclabel) { /* If we failed to set a label, but virt_use_nfs let us * proceed anyway, then we don't need to relabel later. */ -- 2.21.0

This function is currently not called for any type of storage source that is not considered 'local' (as defined by virStorageSourceIsLocalStorage()). Well, NVMe disks are not 'local' from that point of view and therefore we will need to call this function more frequently. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/security/security_apparmor.c | 33 +++++++++++++++------- src/security/security_dac.c | 30 ++++++++++++++++++++ src/security/security_selinux.c | 48 +++++++++++++++++++++++++++----- 3 files changed, 94 insertions(+), 17 deletions(-) diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c index 77eee9410c..580cd72223 100644 --- a/src/security/security_apparmor.c +++ b/src/security/security_apparmor.c @@ -803,22 +803,35 @@ AppArmorSetSecurityImageLabel(virSecurityManagerPtr mgr, { int rc = -1; char *profile_name = NULL; + VIR_AUTOFREE(char *) vfioGroupDev = NULL; + const char *path = NULL; virSecurityLabelDefPtr secdef; - if (!src->path || !virStorageSourceIsLocalStorage(src)) - return 0; - secdef = virDomainDefGetSecurityLabelDef(def, SECURITY_APPARMOR_NAME); if (!secdef || !secdef->relabel) return 0; if (secdef->imagelabel) { - /* if the device doesn't exist, error out */ - if (!virFileExists(src->path)) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("\'%s\' does not exist"), - src->path); - return -1; + if (src->type == VIR_STORAGE_TYPE_NVME) { + const virStorageSourceNVMeDef *nvme = src->nvme; + + if (!(vfioGroupDev = virPCIDeviceAddressGetIOMMUGroupDev(&nvme->pciAddr))) + return -1; + + path = vfioGroupDev; + } else { + if (!src->path || !virStorageSourceIsLocalStorage(src)) + return 0; + + /* if the device doesn't exist, error out */ + if (!virFileExists(src->path)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("\'%s\' does not exist"), + src->path); + return -1; + } + + path = src->path; } if ((profile_name = get_profile_name(def)) == NULL) @@ -827,7 +840,7 @@ AppArmorSetSecurityImageLabel(virSecurityManagerPtr mgr, /* update the profile only if it is loaded */ if (profile_loaded(secdef->imagelabel) >= 0) { if (load_profile(mgr, secdef->imagelabel, def, - src->path, false) < 0) { + path, false) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot update AppArmor profile " "\'%s\'"), diff --git a/src/security/security_dac.c b/src/security/security_dac.c index 4b4afef18a..42d585500d 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -915,6 +915,19 @@ virSecurityDACSetImageLabelInternal(virSecurityManagerPtr mgr, return -1; } + /* This is not very clean. But so far we don't have NVMe + * storage pool backend so that its chownCallback would be + * called. And this place looks least offensive. */ + if (src->type == VIR_STORAGE_TYPE_NVME) { + const virStorageSourceNVMeDef *nvme = src->nvme; + VIR_AUTOFREE(char *) vfioGroupDev = NULL; + + if (!(vfioGroupDev = virPCIDeviceAddressGetIOMMUGroupDev(&nvme->pciAddr))) + return -1; + + return virSecurityDACSetOwnership(mgr, NULL, vfioGroupDev, user, group, false); + } + /* We can't do restore on shared resources safely. Not even * with refcounting implemented in XATTRs because if there * was a domain running with the feature turned off the @@ -1004,6 +1017,23 @@ virSecurityDACRestoreImageLabelInt(virSecurityManagerPtr mgr, } } + /* This is not very clean. But so far we don't have NVMe + * storage pool backend so that its chownCallback would be + * called. And this place looks least offensive. */ + if (src->type == VIR_STORAGE_TYPE_NVME) { + const virStorageSourceNVMeDef *nvme = src->nvme; + VIR_AUTOFREE(char *) vfioGroupDev = NULL; + + if (!(vfioGroupDev = virPCIDeviceAddressGetIOMMUGroupDev(&nvme->pciAddr))) + return -1; + + /* Ideally, we would check if there is not another PCI + * device within domain def that is in the same IOMMU + * group. But we're not doing that for hostdevs yet. */ + + return virSecurityDACRestoreFileLabelInternal(mgr, NULL, vfioGroupDev, false); + } + return virSecurityDACRestoreFileLabelInternal(mgr, src, NULL, true); } diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 3a00666d26..f439ecec46 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1754,9 +1754,8 @@ virSecuritySELinuxRestoreImageLabelInt(virSecurityManagerPtr mgr, { virSecurityLabelDefPtr seclabel; virSecurityDeviceLabelDefPtr disk_seclabel; - - if (!src->path || !virStorageSourceIsLocalStorage(src)) - return 0; + VIR_AUTOFREE(char *) vfioGroupDev = NULL; + const char *path = src->path; seclabel = virDomainDefGetSecurityLabelDef(def, SECURITY_SELINUX_NAME); if (seclabel == NULL) @@ -1788,9 +1787,16 @@ virSecuritySELinuxRestoreImageLabelInt(virSecurityManagerPtr mgr, * ownership, because that kills access on the destination host which is * sub-optimal for the guest VM's I/O attempts :-) */ if (migrated) { - int rc = virFileIsSharedFS(src->path); - if (rc < 0) - return -1; + int rc = 1; + + if (virStorageSourceIsLocalStorage(src)) { + if (!src->path) + return 0; + + if ((rc = virFileIsSharedFS(src->path)) < 0) + return -1; + } + if (rc == 1) { VIR_DEBUG("Skipping image label restore on %s because FS is shared", src->path); @@ -1798,7 +1804,22 @@ virSecuritySELinuxRestoreImageLabelInt(virSecurityManagerPtr mgr, } } - return virSecuritySELinuxRestoreFileLabel(mgr, src->path, true); + /* This is not very clean. But so far we don't have NVMe + * storage pool backend so that its chownCallback would be + * called. And this place looks least offensive. */ + if (src->type == VIR_STORAGE_TYPE_NVME) { + const virStorageSourceNVMeDef *nvme = src->nvme; + + if (!(vfioGroupDev = virPCIDeviceAddressGetIOMMUGroupDev(&nvme->pciAddr))) + return -1; + + /* Ideally, we would check if there is not another PCI + * device within domain def that is in the same IOMMU + * group. But we're not doing that for hostdevs yet. */ + path = vfioGroupDev; + } + + return virSecuritySELinuxRestoreFileLabel(mgr, path, true); } @@ -1823,6 +1844,7 @@ virSecuritySELinuxSetImageLabelInternal(virSecurityManagerPtr mgr, virSecurityDeviceLabelDefPtr disk_seclabel; virSecurityDeviceLabelDefPtr parent_seclabel = NULL; bool remember; + VIR_AUTOFREE(char *) vfioGroupDev = NULL; const char *path = src->path; const char *tcon = NULL; int ret = -1; @@ -1878,6 +1900,18 @@ virSecuritySELinuxSetImageLabelInternal(virSecurityManagerPtr mgr, tcon = data->content_context; } + /* This is not very clean. But so far we don't have NVMe + * storage pool backend so that its chownCallback would be + * called. And this place looks least offensive. */ + if (src->type == VIR_STORAGE_TYPE_NVME) { + const virStorageSourceNVMeDef *nvme = src->nvme; + + if (!(vfioGroupDev = virPCIDeviceAddressGetIOMMUGroupDev(&nvme->pciAddr))) + return -1; + + path = vfioGroupDev; + } + ret = virSecuritySELinuxSetFilecon(mgr, path, tcon, remember); if (ret == 1 && !disk_seclabel) { -- 2.21.0

This capability tracks if qemu is capable of: -drive file.driver=nvme The feature was added in QEMU's commit of v2.12.0-rc0~104^2~2. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 4 ++++ src/qemu/qemu_capabilities.h | 3 +++ tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_3.1.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_3.1.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_4.0.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_4.0.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_4.0.0.riscv32.xml | 1 + tests/qemucapabilitiesdata/caps_4.0.0.riscv64.xml | 1 + tests/qemucapabilitiesdata/caps_4.0.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_4.0.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_4.1.0.x86_64.xml | 1 + 20 files changed, 25 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 72e070e8ab..17ee5e48c8 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -540,6 +540,9 @@ VIR_ENUM_IMPL(virQEMUCaps, "dbus-vmstate", "vhost-user-gpu", "vhost-user-vga", + + /* 340 */ + "drive-nvme", ); @@ -1287,6 +1290,7 @@ static struct virQEMUCapsStringFlags virQEMUCapsQMPSchemaQueries[] = { { "query-display-options/ret-type/+egl-headless/rendernode", QEMU_CAPS_EGL_HEADLESS_RENDERNODE }, { "nbd-server-add/arg-type/bitmap", QEMU_CAPS_NBD_BITMAP }, { "blockdev-add/arg-type/+file/drop-cache", QEMU_CAPS_MIGRATION_FILE_DROP_CACHE }, + { "blockdev-add/arg-type/+nvme", QEMU_CAPS_DRIVE_NVME }, }; typedef struct _virQEMUCapsObjectTypeProps virQEMUCapsObjectTypeProps; diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 137b7161a5..5b6e7174d1 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -522,6 +522,9 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ QEMU_CAPS_DEVICE_VHOST_USER_GPU, /* -device vhost-user-gpu */ QEMU_CAPS_DEVICE_VHOST_USER_VGA, /* -device vhost-user-vga */ + /* 340 */ + QEMU_CAPS_DRIVE_NVME, /* -drive file.driver=nvme */ + QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml index 614fd14fb1..ec6479de2a 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml @@ -153,6 +153,7 @@ <flag name='memory-backend-memfd.hugetlb'/> <flag name='iothread.poll-max-ns'/> <flag name='memory-backend-file.align'/> + <flag name='drive-nvme'/> <version>2012000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700807</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml index fd9ae0bcb8..d4ed7cea49 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml @@ -151,6 +151,7 @@ <flag name='memory-backend-memfd.hugetlb'/> <flag name='iothread.poll-max-ns'/> <flag name='memory-backend-file.align'/> + <flag name='drive-nvme'/> <version>2011090</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900807</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml index 2930381068..91aee9f838 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml @@ -121,6 +121,7 @@ <flag name='memory-backend-memfd.hugetlb'/> <flag name='iothread.poll-max-ns'/> <flag name='memory-backend-file.align'/> + <flag name='drive-nvme'/> <version>2012000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>39100807</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml index 61b3602c48..96fa7c43c9 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml @@ -195,6 +195,7 @@ <flag name='iothread.poll-max-ns'/> <flag name='memory-backend-file.align'/> <flag name='x86-max-cpu'/> + <flag name='drive-nvme'/> <version>2011090</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100807</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml index 61be1df782..7b7d101ee9 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml @@ -152,6 +152,7 @@ <flag name='iothread.poll-max-ns'/> <flag name='memory-backend-file.align'/> <flag name='bochs-display'/> + <flag name='drive-nvme'/> <version>2012050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900757</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml index 865becc179..d50789711a 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml @@ -92,6 +92,7 @@ <flag name='memory-backend-memfd.hugetlb'/> <flag name='iothread.poll-max-ns'/> <flag name='memory-backend-file.align'/> + <flag name='drive-nvme'/> <version>3000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml index eb54aeaff3..06a2e98b90 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml @@ -92,6 +92,7 @@ <flag name='memory-backend-memfd.hugetlb'/> <flag name='iothread.poll-max-ns'/> <flag name='memory-backend-file.align'/> + <flag name='drive-nvme'/> <version>3000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.s390x.xml b/tests/qemucapabilitiesdata/caps_3.0.0.s390x.xml index d511377262..a8fb99c9fd 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.s390x.xml @@ -123,6 +123,7 @@ <flag name='memory-backend-memfd.hugetlb'/> <flag name='iothread.poll-max-ns'/> <flag name='memory-backend-file.align'/> + <flag name='drive-nvme'/> <version>3000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>39100757</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml index 7a322030bd..6f926f2eaa 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml @@ -199,6 +199,7 @@ <flag name='nvdimm.unarmed'/> <flag name='x86-max-cpu'/> <flag name='bochs-display'/> + <flag name='drive-nvme'/> <version>3000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100757</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.1.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_3.1.0.ppc64.xml index 400dc45be4..4992d6c7e3 100644 --- a/tests/qemucapabilitiesdata/caps_3.1.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_3.1.0.ppc64.xml @@ -157,6 +157,7 @@ <flag name='memory-backend-file.pmem'/> <flag name='overcommit'/> <flag name='bochs-display'/> + <flag name='drive-nvme'/> <version>3000091</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900758</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_3.1.0.x86_64.xml index 434c644ad4..c5bd0a9b9e 100644 --- a/tests/qemucapabilitiesdata/caps_3.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_3.1.0.x86_64.xml @@ -202,6 +202,7 @@ <flag name='overcommit'/> <flag name='x86-max-cpu'/> <flag name='bochs-display'/> + <flag name='drive-nvme'/> <version>3000092</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100758</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.0.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_4.0.0.aarch64.xml index 20f119665b..eca166d5f9 100644 --- a/tests/qemucapabilitiesdata/caps_4.0.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_4.0.0.aarch64.xml @@ -165,6 +165,7 @@ <flag name='nbd-bitmap'/> <flag name='bochs-display'/> <flag name='migration-file-drop-cache'/> + <flag name='drive-nvme'/> <version>4000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700758</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.0.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_4.0.0.ppc64.xml index 9ea6f4d046..8a0dcbf705 100644 --- a/tests/qemucapabilitiesdata/caps_4.0.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_4.0.0.ppc64.xml @@ -170,6 +170,7 @@ <flag name='nbd-bitmap'/> <flag name='bochs-display'/> <flag name='migration-file-drop-cache'/> + <flag name='drive-nvme'/> <version>4000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900758</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.0.0.riscv32.xml b/tests/qemucapabilitiesdata/caps_4.0.0.riscv32.xml index 7503c2dbcd..d71ce16bcb 100644 --- a/tests/qemucapabilitiesdata/caps_4.0.0.riscv32.xml +++ b/tests/qemucapabilitiesdata/caps_4.0.0.riscv32.xml @@ -168,6 +168,7 @@ <flag name='nbd-bitmap'/> <flag name='bochs-display'/> <flag name='migration-file-drop-cache'/> + <flag name='drive-nvme'/> <version>4000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.0.0.riscv64.xml b/tests/qemucapabilitiesdata/caps_4.0.0.riscv64.xml index 4a94179ee7..d4a33642d6 100644 --- a/tests/qemucapabilitiesdata/caps_4.0.0.riscv64.xml +++ b/tests/qemucapabilitiesdata/caps_4.0.0.riscv64.xml @@ -168,6 +168,7 @@ <flag name='nbd-bitmap'/> <flag name='bochs-display'/> <flag name='migration-file-drop-cache'/> + <flag name='drive-nvme'/> <version>4000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.0.0.s390x.xml b/tests/qemucapabilitiesdata/caps_4.0.0.s390x.xml index ef802f3d1f..ee2ee0c1b0 100644 --- a/tests/qemucapabilitiesdata/caps_4.0.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_4.0.0.s390x.xml @@ -132,6 +132,7 @@ <flag name='bitmap-merge'/> <flag name='nbd-bitmap'/> <flag name='migration-file-drop-cache'/> + <flag name='drive-nvme'/> <version>4000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>39100758</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_4.0.0.x86_64.xml index 87c95f4d18..ef4b6aecdd 100644 --- a/tests/qemucapabilitiesdata/caps_4.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_4.0.0.x86_64.xml @@ -207,6 +207,7 @@ <flag name='x86-max-cpu'/> <flag name='bochs-display'/> <flag name='migration-file-drop-cache'/> + <flag name='drive-nvme'/> <version>4000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100758</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_4.1.0.x86_64.xml index 5d4540b3f7..f7a020d905 100644 --- a/tests/qemucapabilitiesdata/caps_4.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_4.1.0.x86_64.xml @@ -211,6 +211,7 @@ <flag name='migration-file-drop-cache'/> <flag name='vhost-user-gpu'/> <flag name='vhost-user-vga'/> + <flag name='drive-nvme'/> <version>4000050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100759</microcodeVersion> -- 2.21.0

Now, that we have everything prepared, we can generate command line for NVMe disks. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_block.c | 25 ++++++++- src/qemu/qemu_command.c | 3 ++ src/qemu/qemu_process.c | 7 +++ .../disk-nvme.x86_64-latest.args | 53 +++++++++++++++++++ tests/qemuxml2argvtest.c | 1 + 5 files changed, 88 insertions(+), 1 deletion(-) create mode 100644 tests/qemuxml2argvdata/disk-nvme.x86_64-latest.args diff --git a/src/qemu/qemu_block.c b/src/qemu/qemu_block.c index 99b2a4efaa..311f42e52f 100644 --- a/src/qemu/qemu_block.c +++ b/src/qemu/qemu_block.c @@ -994,6 +994,25 @@ qemuBlockStorageSourceGetVvfatProps(virStorageSourcePtr src, } +static virJSONValuePtr +qemuBlockStorageSourceGetNVMeProps(virStorageSourcePtr src) +{ + const virStorageSourceNVMeDef *nvme = src->nvme; + VIR_AUTOFREE(char *) pciAddr = NULL; + virJSONValuePtr ret = NULL; + + if (!(pciAddr = virPCIDeviceAddressAsString(&nvme->pciAddr))) + return NULL; + + ignore_value(virJSONValueObjectCreate(&ret, + "s:driver", "nvme", + "s:device", pciAddr, + "U:namespace", nvme->namespace, + NULL)); + return ret; +} + + static int qemuBlockStorageSourceGetBlockdevGetCacheProps(virStorageSourcePtr src, virJSONValuePtr props) @@ -1076,8 +1095,12 @@ qemuBlockStorageSourceGetBackendProps(virStorageSourcePtr src, return NULL; break; - case VIR_STORAGE_TYPE_VOLUME: case VIR_STORAGE_TYPE_NVME: + if (!(fileprops = qemuBlockStorageSourceGetNVMeProps(src))) + return NULL; + break; + + case VIR_STORAGE_TYPE_VOLUME: case VIR_STORAGE_TYPE_NONE: case VIR_STORAGE_TYPE_LAST: return NULL; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index b2519465ba..ad6c1e7e8a 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -1548,6 +1548,9 @@ qemuDiskSourceNeedsProps(virStorageSourcePtr src, src->haveTLS == VIR_TRISTATE_BOOL_YES) return true; + if (actualType == VIR_STORAGE_TYPE_NVME) + return true; + return false; } diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index a50cd54393..f20561c07b 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -5390,6 +5390,13 @@ qemuProcessStartValidateDisks(virDomainObjPtr vm, _("PowerPC pseries machines do not support floppy device")); return -1; } + + if (src->type == VIR_STORAGE_TYPE_NVME && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_DRIVE_NVME)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("NVMe disks are not supported with this QEMU binary")); + return -1; + } } return 0; diff --git a/tests/qemuxml2argvdata/disk-nvme.x86_64-latest.args b/tests/qemuxml2argvdata/disk-nvme.x86_64-latest.args new file mode 100644 index 0000000000..b0f640b751 --- /dev/null +++ b/tests/qemuxml2argvdata/disk-nvme.x86_64-latest.args @@ -0,0 +1,53 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-i686 \ +-name guest=QEMUGuest1,debug-threads=on \ +-S \ +-object secret,id=masterKey0,format=raw,\ +file=/tmp/lib/domain--1-QEMUGuest1/master-key.aes \ +-machine pc,accel=tcg,usb=off,dump-guest-core=off \ +-m 214 \ +-overcommit mem-lock=off \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-boot strict=on \ +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ +-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 \ +-drive file.driver=nvme,file.device=0000:01:00.0,file.namespace=1,format=raw,\ +if=none,id=drive-virtio-disk0 \ +-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,\ +id=virtio-disk0,bootindex=1 \ +-drive file.driver=nvme,file.device=0000:01:00.0,file.namespace=2,format=raw,\ +if=none,id=drive-virtio-disk1 \ +-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,\ +id=virtio-disk1 \ +-drive file.driver=nvme,file.device=0000:02:00.0,file.namespace=1,format=raw,\ +if=none,id=drive-virtio-disk2 \ +-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk2,\ +id=virtio-disk2 \ +-object secret,id=virtio-disk3-luks-secret0,\ +data=9eao5F8qtkGt+seB1HYivWIxbtwUu6MQtg1zpj/oDtUsPr1q8wBYM91uEHCn6j/1,\ +keyid=masterKey0,iv=AAECAwQFBgcICQoLDA0ODw==,format=base64 \ +-drive file.driver=nvme,file.device=0001:02:00.0,file.namespace=2,\ +encrypt.format=luks,encrypt.key-secret=virtio-disk3-luks-secret0,format=qcow2,\ +if=none,id=drive-virtio-disk3,cache=none \ +-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk3,\ +id=virtio-disk3,write-cache=on \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,\ +resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 5bbac1c8b8..896c05b267 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1095,6 +1095,7 @@ mymain(void) driver.config->vxhsTLS = 0; VIR_FREE(driver.config->vxhsTLSx509certdir); DO_TEST("disk-no-boot", NONE); + DO_TEST_CAPS_LATEST("disk-nvme"); DO_TEST_PARSE_ERROR("disk-device-lun-type-invalid", QEMU_CAPS_VIRTIO_SCSI); DO_TEST_FAILURE("disk-usb-nosupport", NONE); -- 2.21.0

Because this is a HMP we're dealing with, there is nothing like class of reply message, so we have to do some string comparison to guess if the command fails. Well, with NVMe disks whole new class of errors comes to play because qemu needs to initialize IOMMU and VFIO for them. You can see all the messages it may produce in qemu_vfio_init_pci(). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_monitor_text.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/src/qemu/qemu_monitor_text.c b/src/qemu/qemu_monitor_text.c index b1abcacdd0..39445247a0 100644 --- a/src/qemu/qemu_monitor_text.c +++ b/src/qemu/qemu_monitor_text.c @@ -76,6 +76,13 @@ int qemuMonitorTextAddDrive(qemuMonitorPtr mon, goto cleanup; } + if (strstr(reply, "IOMMU") || + strstr(reply, "VFIO")) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + reply); + goto cleanup; + } + ret = 0; cleanup: -- 2.21.0

At the very beginning of the attach function the qemuDomainStorageSourceChainAccessAllow() is called which modifies CGroups, locks and seclabels for new disk and its backing chain. This must be followed by a counterpart which reverts back all the changes if something goes wrong. This boils down to calling qemuDomainStorageSourceChainAccessRevoke() which is done under 'error' label. But not all failure branches jump there. They just jump onto 'cleanup' label where no revoke is done. Such mistake is easy to do because 'cleanup' label does exist. Therefore, dissolve 'error' block in 'cleanup' and have everything jump onto 'cleanup' label. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hotplug.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 5f92c61aa9..75ffc099e1 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -682,13 +682,13 @@ qemuDomainAttachDiskGeneric(virQEMUDriverPtr driver, bool blockdev = virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKDEV); if (qemuDomainStorageSourceChainAccessAllow(driver, vm, disk->src) < 0) - goto cleanup; + return -1; if (qemuAssignDeviceDiskAlias(vm->def, disk, priv->qemuCaps) < 0) - goto error; + goto cleanup; if (qemuDomainPrepareDiskSource(disk, priv, cfg) < 0) - goto error; + goto cleanup; if (blockdev) { if (disk->copy_on_read == VIR_TRISTATE_SWITCH_ON && @@ -705,13 +705,13 @@ qemuDomainAttachDiskGeneric(virQEMUDriverPtr driver, } if (!(devstr = qemuBuildDiskDeviceStr(vm->def, disk, 0, priv->qemuCaps))) - goto error; + goto cleanup; if (VIR_REALLOC_N(vm->def->disks, vm->def->ndisks + 1) < 0) - goto error; + goto cleanup; if (qemuHotplugAttachManagedPR(driver, vm, disk->src, QEMU_ASYNC_JOB_NONE) < 0) - goto error; + goto cleanup; qemuDomainObjEnterMonitor(driver, vm); @@ -747,7 +747,7 @@ qemuDomainAttachDiskGeneric(virQEMUDriverPtr driver, if (qemuDomainObjExitMonitor(driver, vm) < 0) { ret = -2; - goto error; + goto cleanup; } virDomainAuditDisk(vm, NULL, disk->src, "attach", true); @@ -756,6 +756,8 @@ qemuDomainAttachDiskGeneric(virQEMUDriverPtr driver, ret = 0; cleanup: + if (ret < 0) + ignore_value(qemuDomainStorageSourceChainAccessRevoke(driver, vm, disk->src)); qemuDomainSecretDiskDestroy(disk); return ret; @@ -772,9 +774,6 @@ qemuDomainAttachDiskGeneric(virQEMUDriverPtr driver, ret = -2; virDomainAuditDisk(vm, NULL, disk->src, "attach", false); - - error: - ignore_value(qemuDomainStorageSourceChainAccessRevoke(driver, vm, disk->src)); goto cleanup; } -- 2.21.0

With NVMe disks, one can start a blockjob with a NVMe disk that is not visible in domain XML (at least right away). Usually, it's fairly easy to override this limitation of qemuDomainGetMemLockLimitBytes() - for instance for hostdevs we temporarily add the device to domain def, let the function calculate the limit and then remove the device. But it's not so easy with virStorageSourcePtr - in some cases they don't necessarily are attached to a disk. And even if they are it's done later in the process and frankly, I find it too complicated to be able to use the simple trick we use with hostdevs. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_command.c | 2 +- src/qemu/qemu_domain.c | 46 ++++++++++++++++++++++++++--------------- src/qemu/qemu_domain.h | 6 ++++-- src/qemu/qemu_hotplug.c | 12 +++++------ tests/qemumemlocktest.c | 2 +- 5 files changed, 41 insertions(+), 27 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index ad6c1e7e8a..fa2394378a 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -10511,7 +10511,7 @@ qemuBuildCommandLine(virQEMUDriverPtr driver, /* In some situations, eg. VFIO passthrough, QEMU might need to lock a * significant amount of memory, so we need to set the limit accordingly */ - virCommandSetMaxMemLock(cmd, qemuDomainGetMemLockLimitBytes(def)); + virCommandSetMaxMemLock(cmd, qemuDomainGetMemLockLimitBytes(def, false)); if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_MSG_TIMESTAMP) && cfg->logTimestamp) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index e4e9d4e361..3fe0004cab 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11843,12 +11843,14 @@ ppc64VFIODeviceIsNV2Bridge(const char *device) /** * getPPC64MemLockLimitBytes: * @def: domain definition + * @forceVFIO: force VFIO usage * * A PPC64 helper that calculates the memory locking limit in order for * the guest to operate properly. */ static unsigned long long -getPPC64MemLockLimitBytes(virDomainDefPtr def) +getPPC64MemLockLimitBytes(virDomainDefPtr def, + bool forceVFIO) { unsigned long long memKB = 0; unsigned long long baseLimit = 0; @@ -11939,7 +11941,7 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def) passthroughLimit = maxMemory + 128 * (1ULL<<30) / 512 * nPCIHostBridges + 8192; - } else if (usesVFIO) { + } else if (usesVFIO || forceVFIO) { /* For regular (non-NVLink2 present) VFIO passthrough, the value * of passthroughLimit is: * @@ -11977,16 +11979,20 @@ getPPC64MemLockLimitBytes(virDomainDefPtr def) /** * qemuDomainGetMemLockLimitBytes: * @def: domain definition + * @forceVFIO: force VFIO calculation * * Calculate the memory locking limit that needs to be set in order for * the guest to operate properly. The limit depends on a number of factors, * including certain configuration options and less immediately apparent ones * such as the guest architecture or the use of certain devices. + * The @forceVFIO argument can be used to tell this function will use VFIO even + * though @def doesn't indicates so right now. * * Returns: the memory locking limit, or 0 if setting the limit is not needed */ unsigned long long -qemuDomainGetMemLockLimitBytes(virDomainDefPtr def) +qemuDomainGetMemLockLimitBytes(virDomainDefPtr def, + bool forceVFIO) { unsigned long long memKB = 0; bool usesVFIO = false; @@ -12007,7 +12013,7 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def) return VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; if (ARCH_IS_PPC64(def->os.arch) && def->virtType == VIR_DOMAIN_VIRT_KVM) - return getPPC64MemLockLimitBytes(def); + return getPPC64MemLockLimitBytes(def, forceVFIO); /* For device passthrough using VFIO the guest memory and MMIO memory * regions need to be locked persistent in order to allow DMA. @@ -12027,18 +12033,20 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def) * * Note that this may not be valid for all platforms. */ - for (i = 0; i < def->nhostdevs; i++) { - if (virHostdevIsVFIODevice(def->hostdevs[i]) || - virHostdevIsMdevDevice(def->hostdevs[i])) { - usesVFIO = true; - break; + if (!forceVFIO) { + for (i = 0; i < def->nhostdevs; i++) { + if (virHostdevIsVFIODevice(def->hostdevs[i]) || + virHostdevIsMdevDevice(def->hostdevs[i])) { + usesVFIO = true; + break; + } } + + if (virDomainDefHasNVMeDisk(def)) + usesVFIO = true; } - if (virDomainDefHasNVMeDisk(def)) - usesVFIO = true; - - if (usesVFIO) + if (usesVFIO || forceVFIO) memKB = virDomainDefGetMemoryTotal(def) + 1024 * 1024; done: @@ -12049,9 +12057,12 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def) /** * qemuDomainAdjustMaxMemLock: * @vm: domain + * @forceVFIO: apply VFIO requirements even if vm's def doesn't require it * * Adjust the memory locking limit for the QEMU process associated to @vm, in - * order to comply with VFIO or architecture requirements. + * order to comply with VFIO or architecture requirements. If @forceVFIO is + * true then the limit is changed even if nothing in @vm's definition indicates + * so. * * The limit will not be changed unless doing so is needed; the first time * the limit is changed, the original (default) limit is stored in @vm and @@ -12061,12 +12072,13 @@ qemuDomainGetMemLockLimitBytes(virDomainDefPtr def) * Returns: 0 on success, <0 on failure */ int -qemuDomainAdjustMaxMemLock(virDomainObjPtr vm) +qemuDomainAdjustMaxMemLock(virDomainObjPtr vm, + bool forceVFIO) { unsigned long long bytes = 0; int ret = -1; - bytes = qemuDomainGetMemLockLimitBytes(vm->def); + bytes = qemuDomainGetMemLockLimitBytes(vm->def, forceVFIO); if (bytes) { /* If this is the first time adjusting the limit, save the current @@ -12115,7 +12127,7 @@ qemuDomainAdjustMaxMemLockHostdev(virDomainObjPtr vm, int ret = 0; vm->def->hostdevs[vm->def->nhostdevs++] = hostdev; - if (qemuDomainAdjustMaxMemLock(vm) < 0) + if (qemuDomainAdjustMaxMemLock(vm, false) < 0) ret = -1; vm->def->hostdevs[--(vm->def->nhostdevs)] = NULL; diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index de6479f8e0..7802a8b98b 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -962,8 +962,10 @@ bool qemuDomainSupportsPCI(virDomainDefPtr def, void qemuDomainUpdateCurrentMemorySize(virDomainObjPtr vm); -unsigned long long qemuDomainGetMemLockLimitBytes(virDomainDefPtr def); -int qemuDomainAdjustMaxMemLock(virDomainObjPtr vm); +unsigned long long qemuDomainGetMemLockLimitBytes(virDomainDefPtr def, + bool forceVFIO); +int qemuDomainAdjustMaxMemLock(virDomainObjPtr vm, + bool forceVFIO); int qemuDomainAdjustMaxMemLockHostdev(virDomainObjPtr vm, virDomainHostdevDefPtr hostdev); diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 75ffc099e1..f5a672a195 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1648,7 +1648,7 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, if (teardowndevice && qemuDomainNamespaceTeardownHostdev(vm, hostdev) < 0) VIR_WARN("Unable to remove host device from /dev"); - if (teardownmemlock && qemuDomainAdjustMaxMemLock(vm) < 0) + if (teardownmemlock && qemuDomainAdjustMaxMemLock(vm, false) < 0) VIR_WARN("Unable to reset maximum locked memory on hotplug fail"); if (releaseaddr) @@ -2387,7 +2387,7 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver, if (virDomainMemoryInsert(vm->def, mem) < 0) goto cleanup; - if (qemuDomainAdjustMaxMemLock(vm) < 0) + if (qemuDomainAdjustMaxMemLock(vm, false) < 0) goto removedef; qemuDomainObjEnterMonitor(driver, vm); @@ -2458,7 +2458,7 @@ qemuDomainAttachMemory(virQEMUDriverPtr driver, /* reset the mlock limit */ virErrorPreserveLast(&orig_err); - ignore_value(qemuDomainAdjustMaxMemLock(vm)); + ignore_value(qemuDomainAdjustMaxMemLock(vm, false)); virErrorRestore(&orig_err); goto audit; @@ -2856,7 +2856,7 @@ qemuDomainAttachMediatedDevice(virQEMUDriverPtr driver, ret = 0; cleanup: if (ret < 0) { - if (teardownmemlock && qemuDomainAdjustMaxMemLock(vm) < 0) + if (teardownmemlock && qemuDomainAdjustMaxMemLock(vm, false) < 0) VIR_WARN("Unable to reset maximum locked memory on hotplug fail"); if (teardowncgroup && qemuTeardownHostdevCgroup(vm, hostdev) < 0) VIR_WARN("Unable to remove host device cgroup ACL on hotplug fail"); @@ -4378,7 +4378,7 @@ qemuDomainRemoveMemoryDevice(virQEMUDriverPtr driver, ignore_value(qemuProcessRefreshBalloonState(driver, vm, QEMU_ASYNC_JOB_NONE)); /* decrease the mlock limit after memory unplug if necessary */ - ignore_value(qemuDomainAdjustMaxMemLock(vm)); + ignore_value(qemuDomainAdjustMaxMemLock(vm, false)); return 0; } @@ -4505,7 +4505,7 @@ qemuDomainRemoveHostDevice(virQEMUDriverPtr driver, qemuDomainRemovePCIHostDevice(driver, vm, hostdev); /* QEMU might no longer need to lock as much memory, eg. we just * detached the last VFIO device, so adjust the limit here */ - if (qemuDomainAdjustMaxMemLock(vm) < 0) + if (qemuDomainAdjustMaxMemLock(vm, false) < 0) VIR_WARN("Failed to adjust locked memory limit"); break; case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB: diff --git a/tests/qemumemlocktest.c b/tests/qemumemlocktest.c index c9484ac9cb..1d52498e25 100644 --- a/tests/qemumemlocktest.c +++ b/tests/qemumemlocktest.c @@ -43,7 +43,7 @@ testCompareMemLock(const void *data) goto cleanup; } - ret = virTestCompareToULL(info->memlock, qemuDomainGetMemLockLimitBytes(def)); + ret = virTestCompareToULL(info->memlock, qemuDomainGetMemLockLimitBytes(def, false)); cleanup: virDomainDefFree(def); -- 2.21.0

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 58 +++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_hostdev.c | 22 ++++++++++++++++ src/qemu/qemu_hostdev.h | 6 +++++ 3 files changed, 86 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 3fe0004cab..cb94840d5f 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -10537,6 +10537,54 @@ typedef enum { } qemuDomainStorageSourceAccessFlags; +static int +qemuDomainStorageSourceAccessModifyNVMe(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virStorageSourcePtr src, + bool revoke) +{ + bool revoke_maxmemlock = false; + bool revoke_hostdev = false; + int ret = -1; + + if (!virStorageSourceChainHasNVMe(src)) + return 0; + + VIR_DEBUG("Modifying access for a NVMe disk src=%p revoke=%d", + src, revoke); + + if (revoke) { + revoke_maxmemlock = true; + revoke_hostdev = true; + ret = 0; + goto revoke; + } + + if (qemuDomainAdjustMaxMemLock(vm, true) < 0) + goto revoke; + + revoke_maxmemlock = true; + + if (qemuHostdevPrepareOneNVMeDisk(driver, vm->def->name, src) < 0) + goto revoke; + + revoke_hostdev = true; + + return 0; + + revoke: + if (revoke_maxmemlock) { + if (qemuDomainAdjustMaxMemLock(vm, false) < 0) + VIR_WARN("Unable to change max memlock limit"); + } + + if (revoke_hostdev) + qemuHostdevReAttachOneNVMeDisk(driver, vm->def->name, src); + + return ret; +} + + /** * qemuDomainStorageSourceAccessModify: * @driver: qemu driver struct @@ -10568,6 +10616,7 @@ qemuDomainStorageSourceAccessModify(virQEMUDriverPtr driver, bool revoke_cgroup = false; bool revoke_label = false; bool revoke_namespace = false; + bool revoke_nvme = false; bool revoke_lockspace = false; VIR_DEBUG("src='%s' readonly=%d force_ro=%d force_rw=%d revoke=%d chain=%d", @@ -10585,6 +10634,7 @@ qemuDomainStorageSourceAccessModify(virQEMUDriverPtr driver, revoke_cgroup = true; revoke_label = true; revoke_namespace = true; + revoke_nvme = true; revoke_lockspace = true; ret = 0; goto revoke; @@ -10595,6 +10645,11 @@ qemuDomainStorageSourceAccessModify(virQEMUDriverPtr driver, revoke_lockspace = true; + if (qemuDomainStorageSourceAccessModifyNVMe(driver, vm, src, false) < 0) + goto revoke; + + revoke_nvme = true; + /* When modifying access of existing @src namespace does not need update */ if (!(flags & QEMU_DOMAIN_STORAGE_SOURCE_ACCESS_MODIFY_ACCESS)) { if (qemuDomainNamespaceSetupDisk(vm, src) < 0) @@ -10645,6 +10700,9 @@ qemuDomainStorageSourceAccessModify(virQEMUDriverPtr driver, VIR_WARN("Unable to remove /dev entry for %s", srcstr); } + if (revoke_nvme) + qemuDomainStorageSourceAccessModifyNVMe(driver, vm, src, true); + if (revoke_lockspace) { if (virDomainLockImageDetach(driver->lockManager, vm, src) < 0) VIR_WARN("Unable to release lock on %s", srcstr); diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index 5ab0217858..6ab052724a 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -212,6 +212,17 @@ qemuHostdevPreparePCIDevicesCheckSupport(virDomainHostdevDefPtr *hostdevs, return true; } +int +qemuHostdevPrepareOneNVMeDisk(virQEMUDriverPtr driver, + const char *name, + virStorageSourcePtr src) +{ + return virHostdevPrepareOneNVMeDevice(driver->hostdevMgr, + QEMU_DRIVER_NAME, + name, + src); +} + int qemuHostdevPrepareNVMeDisks(virQEMUDriverPtr driver, const char *name, @@ -369,6 +380,17 @@ qemuHostdevPrepareDomainDevices(virQEMUDriverPtr driver, return 0; } +void +qemuHostdevReAttachOneNVMeDisk(virQEMUDriverPtr driver, + const char *name, + virStorageSourcePtr src) +{ + virHostdevReAttachOneNVMeDevice(driver->hostdevMgr, + QEMU_DRIVER_NAME, + name, + src); +} + void qemuHostdevReAttachNVMeDisks(virQEMUDriverPtr driver, const char *name, diff --git a/src/qemu/qemu_hostdev.h b/src/qemu/qemu_hostdev.h index 735414b6aa..23df925529 100644 --- a/src/qemu/qemu_hostdev.h +++ b/src/qemu/qemu_hostdev.h @@ -41,6 +41,9 @@ int qemuHostdevUpdateActiveSCSIDevices(virQEMUDriverPtr driver, int qemuHostdevUpdateActiveDomainDevices(virQEMUDriverPtr driver, virDomainDefPtr def); +int qemuHostdevPrepareOneNVMeDisk(virQEMUDriverPtr driver, + const char *name, + virStorageSourcePtr src); int qemuHostdevPrepareNVMeDisks(virQEMUDriverPtr driver, const char *name, virDomainDiskDefPtr *disks, @@ -74,6 +77,9 @@ int qemuHostdevPrepareDomainDevices(virQEMUDriverPtr driver, virQEMUCapsPtr qemuCaps, unsigned int flags); +void qemuHostdevReAttachOneNVMeDisk(virQEMUDriverPtr driver, + const char *name, + virStorageSourcePtr src); void qemuHostdevReAttachNVMeDisks(virQEMUDriverPtr driver, const char *name, virDomainDiskDefPtr *disks, -- 2.21.0

This is slightly more complicated because NVMe disk source is not a simple attribute to <source/> element. The format in which the PCI address and namespace ID are printed is the same as QEMU accepts them: nvme://XXXX:XX:XX.X/X Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- tools/virsh-domain-monitor.c | 31 +++++++++++++++++++++++++------ 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/tools/virsh-domain-monitor.c b/tools/virsh-domain-monitor.c index 0e2c4191d7..f663a87f8f 100644 --- a/tools/virsh-domain-monitor.c +++ b/tools/virsh-domain-monitor.c @@ -627,8 +627,8 @@ cmdDomblklist(vshControl *ctl, const vshCmd *cmd) for (i = 0; i < ndisks; i++) { ctxt->node = disks[i]; + type = virXPathString("string(./@type)", ctxt); if (details) { - type = virXPathString("string(./@type)", ctxt); device = virXPathString("string(./@device)", ctxt); if (!type || !device) { vshPrint(ctl, "unable to query block list details"); @@ -641,11 +641,30 @@ cmdDomblklist(vshControl *ctl, const vshCmd *cmd) vshError(ctl, "unable to query block list"); goto cleanup; } - source = virXPathString("string(./source/@file" - "|./source/@dev" - "|./source/@dir" - "|./source/@name" - "|./source/@volume)", ctxt); + + if (STREQ_NULLABLE(type, "nvme")) { + VIR_AUTOFREE(char *) namespace = NULL; + virPCIDeviceAddress addr = { 0 }; + xmlNodePtr addrNode = NULL; + + if (!(namespace = virXPathString("string(./source/@namespace)", ctxt)) || + !(addrNode = virXPathNode("./source/address", ctxt)) || + virPCIDeviceAddressParseXML(addrNode, &addr) < 0) { + vshError(ctl, "Unable to query NVMe disk address"); + goto cleanup; + } + + if (virAsprintf(&source, "nvme://%04x:%02x:%02x.%d/%s", + addr.domain, addr.bus, addr.slot, addr.function, namespace) < 0) + goto cleanup; + } else { + source = virXPathString("string(./source/@file" + "|./source/@dev" + "|./source/@dir" + "|./source/@name" + "|./source/@volume)", ctxt); + } + if (details) { if (vshTableRowAppend(table, type, device, target, NULLSTR_MINUS(source), NULL) < 0) -- 2.21.0
participants (2)
-
Cole Robinson
-
Michal Privoznik