[RFC PATCH 0/7] Implement support for iommufd and multiple vSMMUs

Hi, This is a follow up to the third RFC patchset [0] for supporting multiple vSMMU instances and using iommufd to propagate DMA mappings to kernel for VM-assigned host devices in a qemu VM. It builds on the recent libvirt patch series [1] without support for iommufd and the 'accel' iommu attribute. This patchset implements support for specifying multiple <iommu> devices within the VM definition when smmuv3Dev IOMMU model is specified, and is tested with Shameer's latest qemu RFC for HW-accelerated vSMMU devices [2] Moreover, it adds a new 'iommufd' attribute for hostdev devices to be associated with the iommufd object. For instance, specifying the iommufd object and associated hostdev in a VM definition with multiple IOMMUs, configured to be routed to pcie-expander-bus controllers in a way where VFIO device to SMMUv3 associations are matched with the host: <devices> ... <controller type='pci' index='1' model='pcie-expander-bus'> <model name='pxb-pcie'/> <target busNr='252'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> </controller> <controller type='pci' index='2' model='pcie-expander-bus'> <model name='pxb-pcie'/> <target busNr='248'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </controller> ... <controller type='pci' index='21' model='pcie-root-port'> <model name='pcie-root-port'/> <target chassis='21' port='0x0'/> <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> </controller> <controller type='pci' index='22' model='pcie-root-port'> <model name='pcie-root-port'/> <target chassis='22' port='0xa8'/> <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/> </controller> ... <hostdev mode='subsystem' type='pci' managed='no'> <driver iommufd='yes'/> <source> <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x15' slot='0x00' function='0x0'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='no'> <driver iommufd='yes'/> <source> <address domain='0x0019' bus='0x01' slot='0x00' function='0x0'/> </source> <address type='pci' domain='0x0000' bus='0x16' slot='0x00' function='0x0'/> </hostdev> <iommu model='smmuv3Dev' parentIdx='1' accel='on'/> <iommu model='smmuv3Dev' parentIdx='2' accel='on'/> </devices> This would get translated to a qemu command line with the arguments below. Note that libvirt will open the /dev/iommu and VFIO cdev, passing the associated fd number to qemu: -device '{"driver":"pxb-pcie","bus_nr":252,"id":"pci.1","bus":"pcie.0","addr":"0x1"}' \ -device '{"driver":"pxb-pcie","bus_nr":248,"id":"pci.2","bus":"pcie.0","addr":"0x2"}' \ -device '{"driver":"pcie-root-port","port":0,"chassis":21,"id":"pci.21","bus":"pci.1","addr":"0x0"}' \ -device '{"driver":"pcie-root-port","port":168,"chassis":22,"id":"pci.22","bus":"pci.2","addr":"0x0"}' \ -object '{"qom-type":"iommufd","id":"iommufd0","fd":"24"}' \ -device '{"driver":"arm-smmuv3","primary-bus":"pci.1","id":"smmuv3.0","accel":true}' \ -device '{"driver":"arm-smmuv3","primary-bus":"pci.2","id":"smmuv3.1","accel":true}' \ -device '{"driver":"vfio-pci","host":"0009:01:00.0","id":"hostdev0","iommufd":"iommufd0","fd":"22","bus":"pci.21","addr":"0x0"}' \ -device '{"driver":"vfio-pci","host":"0019:01:00.0","id":"hostdev1","iommufd":"iommufd0","fd":"25","bus":"pci.22","addr":"0x0"}' \ Changes from RFCv3: - Change iommufd attribute value type to boolean - Move iommufd attribute under hostdev PCI driver struct - Use virDomainDeviceInfoAddressIsEqual() for comparing definitions in virDomainIOMMUDefEquals() - Added qemuxmlconfdata test cases - Include seclabel logic for /dev/iommu This series is on Github: https://github.com/NathanChenNVIDIA/libvirt/tree/smmuv3Dev-iommufd-09-15-25 Thanks, Nathan [0] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/XXEO2... [1] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/3PYS3... [2] https://lore.kernel.org/qemu-devel/20250714155941.22176-1-shameerali.kolothu... Nathan Chen (8): qemu: add IOMMU model smmuv3Dev conf: Support multiple smmuv3Dev IOMMU devices tests: qemuxmlconfdata: provide smmuv3Dev sample XML and CLI args qemu: Implement support for associating iommufd to hostdev qemu: open iommufd FDs from libvirt backend qemu: Update Cgroup, namespace, and seclabel for qemu to access iommufd paths tests: qemuxmlconfdata: provide iommufd sample XML and CLI args cover letter: qemu: Implement support for iommufd and multiple vSMMUs docs/formatdomain.rst | 21 +- src/conf/device_conf.c | 9 + src/conf/device_conf.h | 1 + src/conf/domain_conf.c | 120 ++++++-- src/conf/domain_conf.h | 12 +- src/conf/domain_validate.c | 58 +++- src/conf/schemas/basictypes.rng | 5 + src/conf/schemas/domaincommon.rng | 15 +- src/libvirt_private.syms | 2 + src/qemu/qemu_alias.c | 15 +- src/qemu/qemu_cgroup.c | 61 ++++ src/qemu/qemu_cgroup.h | 1 + src/qemu/qemu_command.c | 265 +++++++++++++----- src/qemu/qemu_command.h | 3 +- src/qemu/qemu_domain.c | 8 + src/qemu/qemu_domain.h | 7 + src/qemu/qemu_domain_address.c | 33 ++- src/qemu/qemu_driver.c | 8 +- src/qemu/qemu_hotplug.c | 2 +- src/qemu/qemu_namespace.c | 44 +++ src/qemu/qemu_postparse.c | 11 +- src/qemu/qemu_process.c | 232 +++++++++++++++ src/qemu/qemu_validate.c | 18 +- src/security/security_apparmor.c | 15 + src/security/security_dac.c | 34 +++ src/security/security_selinux.c | 30 ++ src/util/virpci.c | 68 +++++ src/util/virpci.h | 1 + .../iommu-smmuv3Dev.aarch64-latest.args | 39 +++ .../iommu-smmuv3Dev.aarch64-latest.xml | 62 ++++ tests/qemuxmlconfdata/iommu-smmuv3Dev.xml | 49 ++++ .../iommufd-q35.x86_64-latest.args | 41 +++ .../iommufd-q35.x86_64-latest.xml | 60 ++++ tests/qemuxmlconfdata/iommufd-q35.xml | 38 +++ .../iommufd-virt.aarch64-latest.args | 33 +++ .../iommufd-virt.aarch64-latest.xml | 34 +++ tests/qemuxmlconfdata/iommufd-virt.xml | 22 ++ .../iommufd.x86_64-latest.args | 35 +++ .../qemuxmlconfdata/iommufd.x86_64-latest.xml | 38 +++ tests/qemuxmlconfdata/iommufd.xml | 30 ++ tests/qemuxmlconftest.c | 5 + 41 files changed, 1447 insertions(+), 138 deletions(-) create mode 100644 tests/qemuxmlconfdata/iommu-smmuv3Dev.aarch64-latest.args create mode 100644 tests/qemuxmlconfdata/iommu-smmuv3Dev.aarch64-latest.xml create mode 100644 tests/qemuxmlconfdata/iommu-smmuv3Dev.xml create mode 100644 tests/qemuxmlconfdata/iommufd-q35.x86_64-latest.args create mode 100644 tests/qemuxmlconfdata/iommufd-q35.x86_64-latest.xml create mode 100644 tests/qemuxmlconfdata/iommufd-q35.xml create mode 100644 tests/qemuxmlconfdata/iommufd-virt.aarch64-latest.args create mode 100644 tests/qemuxmlconfdata/iommufd-virt.aarch64-latest.xml create mode 100644 tests/qemuxmlconfdata/iommufd-virt.xml create mode 100644 tests/qemuxmlconfdata/iommufd.x86_64-latest.args create mode 100644 tests/qemuxmlconfdata/iommufd.x86_64-latest.xml create mode 100644 tests/qemuxmlconfdata/iommufd.xml -- 2.43.0

Introduce support for "smmuv3Dev" IOMMU model and "parentIdx" and "accel" IOMMU device attributes. The former indicates the index of the controller that a smmuv3Dev IOMMU device is attached to, while the latter indicates whether hardware accelerated SMMUv3 support is to be enabled. Signed-off-by: Nathan Chen <nathanc@nvidia.com> --- docs/formatdomain.rst | 13 +++++- src/conf/domain_conf.c | 35 +++++++++++++++ src/conf/domain_conf.h | 3 ++ src/conf/domain_validate.c | 26 +++++++++-- src/conf/schemas/domaincommon.rng | 11 +++++ src/qemu/qemu_command.c | 73 +++++++++++++++++++++++++++++-- src/qemu/qemu_domain_address.c | 2 + src/qemu/qemu_validate.c | 16 +++++++ 8 files changed, 170 insertions(+), 9 deletions(-) diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index f50dce477f..2f1e06ba0e 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -9161,8 +9161,17 @@ Example: ``model`` Supported values are ``intel`` (for Q35 guests) ``smmuv3`` (:since:`since 5.5.0`, for ARM virt guests), ``virtio`` - (:since:`since 8.3.0`, for Q35 and ARM virt guests) and - ``amd`` (:since:`since 11.5.0`). + (:since:`since 8.3.0`, for Q35 and ARM virt guests), + ``amd`` (:since:`since 11.5.0`), and ``smmuv3Dev`` (for + ARM virt guests). + +``parentIdx`` + The ``parentIdx`` attribute notes the index of the controller that a + smmuv3Dev IOMMU device is attached to. + +``accel`` + The ``accel`` attribute with possible values ``on`` and ``off`` can be used + to enable hardware acceleration support for smmuv3Dev IOMMU devices. ``driver`` The ``driver`` subelement can be used to configure additional options, some diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 281846dfbe..42b2a5c74f 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -1353,6 +1353,7 @@ VIR_ENUM_IMPL(virDomainIOMMUModel, "smmuv3", "virtio", "amd", + "smmuv3Dev", ); VIR_ENUM_IMPL(virDomainVsockModel, @@ -2813,6 +2814,8 @@ virDomainIOMMUDefNew(void) iommu = g_new0(virDomainIOMMUDef, 1); + iommu->parent_idx = -1; + return g_steal_pointer(&iommu); } @@ -14407,6 +14410,14 @@ virDomainIOMMUDefParseXML(virDomainXMLOption *xmlopt, VIR_XML_PROP_REQUIRED, &iommu->model) < 0) return NULL; + if (virXMLPropInt(node, "parentIdx", 10, VIR_XML_PROP_NONE, + &iommu->parent_idx, -1) < 0) + return NULL; + + if (virXMLPropTristateSwitch(node, "accel", VIR_XML_PROP_NONE, + &iommu->accel) < 0) + return NULL; + if ((driver = virXPathNode("./driver", ctxt))) { if (virXMLPropTristateSwitch(driver, "intremap", VIR_XML_PROP_NONE, &iommu->intremap) < 0) @@ -22092,6 +22103,18 @@ virDomainIOMMUDefCheckABIStability(virDomainIOMMUDef *src, dst->aw_bits, src->aw_bits); return false; } + if (src->parent_idx != dst->parent_idx) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Target domain IOMMU device parent_idx value '%1$d' does not match source '%2$d'"), + dst->parent_idx, src->parent_idx); + return false; + } + if (src->accel != dst->accel) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Target domain IOMMU device accel value '%1$d' does not match source '%2$d'"), + dst->accel, src->accel); + return false; + } if (src->dma_translation != dst->dma_translation) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("Target domain IOMMU device dma translation '%1$s' does not match source '%2$s'"), @@ -28417,6 +28440,18 @@ virDomainIOMMUDefFormat(virBuffer *buf, virBufferAsprintf(&attrBuf, " model='%s'", virDomainIOMMUModelTypeToString(iommu->model)); + if (iommu->parent_idx >= 0 && iommu->model == VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV) { + virBufferAsprintf(&attrBuf, " parentIdx='%d'", + iommu->parent_idx); + } + + if (iommu->model == VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV) { + if (iommu->accel != VIR_TRISTATE_SWITCH_ABSENT) { + virBufferAsprintf(&attrBuf, " accel='%s'", + virTristateSwitchTypeToString(iommu->accel)); + } + } + virXMLFormatElement(buf, "iommu", &attrBuf, &childBuf); } diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 39807b5fe3..c4541589d3 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -3039,6 +3039,7 @@ typedef enum { VIR_DOMAIN_IOMMU_MODEL_SMMUV3, VIR_DOMAIN_IOMMU_MODEL_VIRTIO, VIR_DOMAIN_IOMMU_MODEL_AMD, + VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV, VIR_DOMAIN_IOMMU_MODEL_LAST } virDomainIOMMUModel; @@ -3050,10 +3051,12 @@ struct _virDomainIOMMUDef { virTristateSwitch eim; virTristateSwitch iotlb; unsigned int aw_bits; + int parent_idx; virDomainDeviceInfo info; virTristateSwitch dma_translation; virTristateSwitch xtsup; virTristateSwitch pt; + virTristateSwitch accel; }; typedef enum { diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c index 93a2bc9b01..ba12ea706d 100644 --- a/src/conf/domain_validate.c +++ b/src/conf/domain_validate.c @@ -3108,7 +3108,8 @@ virDomainIOMMUDefValidate(const virDomainIOMMUDef *iommu) iommu->eim != VIR_TRISTATE_SWITCH_ABSENT || iommu->iotlb != VIR_TRISTATE_SWITCH_ABSENT || iommu->aw_bits != 0 || - iommu->dma_translation != VIR_TRISTATE_SWITCH_ABSENT) { + iommu->dma_translation != VIR_TRISTATE_SWITCH_ABSENT || + iommu->accel != VIR_TRISTATE_SWITCH_ABSENT) { virReportError(VIR_ERR_XML_ERROR, _("iommu model '%1$s' doesn't support additional attributes"), virDomainIOMMUModelTypeToString(iommu->model)); @@ -3120,7 +3121,8 @@ virDomainIOMMUDefValidate(const virDomainIOMMUDef *iommu) if (iommu->caching_mode != VIR_TRISTATE_SWITCH_ABSENT || iommu->eim != VIR_TRISTATE_SWITCH_ABSENT || iommu->aw_bits != 0 || - iommu->dma_translation != VIR_TRISTATE_SWITCH_ABSENT) { + iommu->dma_translation != VIR_TRISTATE_SWITCH_ABSENT || + iommu->accel != VIR_TRISTATE_SWITCH_ABSENT) { virReportError(VIR_ERR_XML_ERROR, _("iommu model '%1$s' doesn't support some additional attributes"), virDomainIOMMUModelTypeToString(iommu->model)); @@ -3130,7 +3132,24 @@ virDomainIOMMUDefValidate(const virDomainIOMMUDef *iommu) case VIR_DOMAIN_IOMMU_MODEL_INTEL: if (iommu->pt != VIR_TRISTATE_SWITCH_ABSENT || - iommu->xtsup != VIR_TRISTATE_SWITCH_ABSENT) { + iommu->xtsup != VIR_TRISTATE_SWITCH_ABSENT || + iommu->accel != VIR_TRISTATE_SWITCH_ABSENT) { + virReportError(VIR_ERR_XML_ERROR, + _("iommu model '%1$s' doesn't support some additional attributes"), + virDomainIOMMUModelTypeToString(iommu->model)); + return -1; + } + break; + + case VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV: + if (iommu->intremap != VIR_TRISTATE_SWITCH_ABSENT || + iommu->caching_mode != VIR_TRISTATE_SWITCH_ABSENT || + iommu->eim != VIR_TRISTATE_SWITCH_ABSENT || + iommu->iotlb != VIR_TRISTATE_SWITCH_ABSENT || + iommu->aw_bits != 0 || + iommu->dma_translation != VIR_TRISTATE_SWITCH_ABSENT || + iommu->xtsup != VIR_TRISTATE_SWITCH_ABSENT || + iommu->pt != VIR_TRISTATE_SWITCH_ABSENT) { virReportError(VIR_ERR_XML_ERROR, _("iommu model '%1$s' doesn't support some additional attributes"), virDomainIOMMUModelTypeToString(iommu->model)); @@ -3155,6 +3174,7 @@ virDomainIOMMUDefValidate(const virDomainIOMMUDef *iommu) case VIR_DOMAIN_IOMMU_MODEL_VIRTIO: case VIR_DOMAIN_IOMMU_MODEL_AMD: + case VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV: case VIR_DOMAIN_IOMMU_MODEL_LAST: break; } diff --git a/src/conf/schemas/domaincommon.rng b/src/conf/schemas/domaincommon.rng index b9230a35b4..1aaa1a3180 100644 --- a/src/conf/schemas/domaincommon.rng +++ b/src/conf/schemas/domaincommon.rng @@ -6266,8 +6266,19 @@ <value>smmuv3</value> <value>virtio</value> <value>amd</value> + <value>smmuv3Dev</value> </choice> </attribute> + <optional> + <attribute name="parentIdx"> + <data type="unsignedInt"/> + </attribute> + </optional> + <optional> + <attribute name="accel"> + <ref name="virOnOff"/> + </attribute> + </optional> <interleave> <optional> <element name="driver"> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 031f09b7a5..53088f8d8f 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -6294,6 +6294,63 @@ qemuBuildBootCommandLine(virCommand *cmd, } +static virJSONValue * +qemuBuildPCISmmuv3DevDevProps(const virDomainDef *def, + const virDomainIOMMUDef *iommu) +{ + g_autoptr(virJSONValue) props = NULL; + g_autofree char *bus = NULL; + size_t i; + bool contIsPHB = false; + + for (i = 0; i < def->ncontrollers; i++) { + virDomainControllerDef *cont = def->controllers[i]; + if (cont->idx == iommu->parent_idx) { + if (cont->type == VIR_DOMAIN_CONTROLLER_TYPE_PCI) { + const char *alias = cont->info.alias; + contIsPHB = virDomainControllerIsPSeriesPHB(cont); + + if (!alias) + return NULL; + + if (virDomainDeviceAliasIsUserAlias(alias)) { + if (cont->model == VIR_DOMAIN_CONTROLLER_MODEL_PCI_ROOT && + iommu->parent_idx <= 0) { + if (qemuDomainSupportsPCIMultibus(def)) + bus = g_strdup("pci.0"); + else + bus = g_strdup("pci"); + } else if (cont->model == VIR_DOMAIN_CONTROLLER_MODEL_PCIE_ROOT) { + bus = g_strdup("pcie.0"); + } + } else { + bus = g_strdup(alias); + } + break; + } + } + } + + if (!bus) + return NULL; + + if (contIsPHB && iommu->parent_idx > 0) { + char *temp_bus = g_strdup_printf("%s.0", bus); + g_free(bus); + bus = temp_bus; + } + + if (virJSONValueObjectAdd(&props, + "s:driver", "arm-smmuv3", + "s:primary-bus", bus, + "b:accel", (iommu->accel == VIR_TRISTATE_SWITCH_ON), + NULL) < 0) + return NULL; + + return g_steal_pointer(&props); +} + + static int qemuBuildIOMMUCommandLine(virCommand *cmd, const virDomainDef *def, @@ -6342,7 +6399,6 @@ qemuBuildIOMMUCommandLine(virCommand *cmd, return 0; case VIR_DOMAIN_IOMMU_MODEL_SMMUV3: - /* There is no -device for SMMUv3, so nothing to be done here */ return 0; case VIR_DOMAIN_IOMMU_MODEL_AMD: @@ -6373,6 +6429,14 @@ qemuBuildIOMMUCommandLine(virCommand *cmd, return 0; + case VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV: + if (!(props = qemuBuildPCISmmuv3DevDevProps(def, iommu))) + return -1; + if (qemuBuildDeviceCommandlineFromJSON(cmd, props, def, qemuCaps) < 0) + return -1; + + return 0; + case VIR_DOMAIN_IOMMU_MODEL_LAST: default: virReportEnumRangeError(virDomainIOMMUModel, iommu->model); @@ -7206,6 +7270,7 @@ qemuBuildMachineCommandLine(virCommand *cmd, case VIR_DOMAIN_IOMMU_MODEL_INTEL: case VIR_DOMAIN_IOMMU_MODEL_VIRTIO: case VIR_DOMAIN_IOMMU_MODEL_AMD: + case VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV: /* These IOMMUs are formatted in qemuBuildIOMMUCommandLine */ break; @@ -10860,15 +10925,15 @@ qemuBuildCommandLine(virDomainObj *vm, if (qemuBuildBootCommandLine(cmd, def) < 0) return NULL; - if (qemuBuildIOMMUCommandLine(cmd, def, qemuCaps) < 0) - return NULL; - if (qemuBuildGlobalControllerCommandLine(cmd, def) < 0) return NULL; if (qemuBuildControllersCommandLine(cmd, def, qemuCaps) < 0) return NULL; + if (qemuBuildIOMMUCommandLine(cmd, def, qemuCaps) < 0) + return NULL; + if (qemuBuildMemoryDeviceCommandLine(cmd, cfg, def, priv) < 0) return NULL; diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 96a9ca9b14..06bf4fab32 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -952,6 +952,7 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDef *dev, case VIR_DOMAIN_IOMMU_MODEL_INTEL: case VIR_DOMAIN_IOMMU_MODEL_SMMUV3: + case VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV: case VIR_DOMAIN_IOMMU_MODEL_LAST: /* These are not PCI devices */ return 0; @@ -2378,6 +2379,7 @@ qemuDomainAssignDevicePCISlots(virDomainDef *def, case VIR_DOMAIN_IOMMU_MODEL_INTEL: case VIR_DOMAIN_IOMMU_MODEL_SMMUV3: + case VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV: case VIR_DOMAIN_IOMMU_MODEL_LAST: /* These are not PCI devices */ break; diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index c7ecb467a3..aac004c544 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -5414,6 +5414,22 @@ qemuValidateDomainDeviceDefIOMMU(const virDomainIOMMUDef *iommu, } break; + case VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV: + if (!qemuDomainIsARMVirt(def)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("IOMMU device: '%1$s' is only supported with ARM Virt machines"), + virDomainIOMMUModelTypeToString(iommu->model)); + return -1; + } + // TODO: Check for pluggable device SMMUv3 qemu capability + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_MACHINE_VIRT_IOMMU)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("IOMMU device: '%1$s' is not supported with this QEMU binary"), + virDomainIOMMUModelTypeToString(iommu->model)); + return -1; + } + break; + case VIR_DOMAIN_IOMMU_MODEL_LAST: default: virReportEnumRangeError(virDomainIOMMUModel, iommu->model); -- 2.43.0

Add support for parsing multiple IOMMU devices from the VM definition when "smmuv3Dev" is the IOMMU model. Signed-off-by: Nathan Chen <nathanc@nvidia.com> --- src/conf/domain_conf.c | 85 +++++++++++++---- src/conf/domain_conf.h | 9 +- src/conf/domain_validate.c | 32 ++++--- src/conf/schemas/domaincommon.rng | 4 +- src/libvirt_private.syms | 2 + src/qemu/qemu_alias.c | 15 ++- src/qemu/qemu_command.c | 146 ++++++++++++++++-------------- src/qemu/qemu_domain_address.c | 35 +++---- src/qemu/qemu_driver.c | 8 +- src/qemu/qemu_postparse.c | 11 ++- src/qemu/qemu_validate.c | 2 +- 11 files changed, 216 insertions(+), 133 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 42b2a5c74f..ffb75c0503 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -4134,7 +4134,8 @@ void virDomainDefFree(virDomainDef *def) virDomainCryptoDefFree(def->cryptos[i]); g_free(def->cryptos); - virDomainIOMMUDefFree(def->iommu); + for (i = 0; i < def->niommus; i++) + virDomainIOMMUDefFree(def->iommu[i]); virDomainPstoreDefFree(def->pstore); @@ -5006,9 +5007,9 @@ virDomainDeviceInfoIterateFlags(virDomainDef *def, } device.type = VIR_DOMAIN_DEVICE_IOMMU; - if (def->iommu) { - device.data.iommu = def->iommu; - if ((rc = cb(def, &device, &def->iommu->info, opaque)) != 0) + for (i = 0; i < def->niommus; i++) { + device.data.iommu = def->iommu[i]; + if ((rc = cb(def, &device, &def->iommu[i]->info, opaque)) != 0) return rc; } @@ -16491,6 +16492,44 @@ virDomainInputDefFind(const virDomainDef *def, } +bool +virDomainIOMMUDefEquals(const virDomainIOMMUDef *a, + const virDomainIOMMUDef *b) +{ + if (a->model != b->model || + a->intremap != b->intremap || + a->caching_mode != b->caching_mode || + a->eim != b->eim || + a->iotlb != b->iotlb || + a->aw_bits != b->aw_bits || + a->parent_idx != b->parent_idx || + a->accel != b->accel || + a->dma_translation != b->dma_translation) + return false; + + if (a->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE && + !virDomainDeviceInfoAddressIsEqual(&a->info, &b->info)) + return false; + + return true; +} + + +ssize_t +virDomainIOMMUDefFind(const virDomainDef *def, + const virDomainIOMMUDef *iommu) +{ + size_t i; + + for (i = 0; i < def->niommus; i++) { + if (virDomainIOMMUDefEquals(iommu, def->iommu[i])) + return i; + } + + return -1; +} + + bool virDomainVsockDefEquals(const virDomainVsockDef *a, const virDomainVsockDef *b) @@ -20158,19 +20197,28 @@ virDomainDefParseXML(xmlXPathContextPtr ctxt, } VIR_FREE(nodes); + /* analysis of iommu devices */ if ((n = virXPathNodeSet("./devices/iommu", ctxt, &nodes)) < 0) return NULL; - if (n > 1) { + if (n > 1 && !virXPathBoolean("./devices/iommu/@model = 'smmuv3Dev'", ctxt)) { virReportError(VIR_ERR_XML_ERROR, "%s", - _("only a single IOMMU device is supported")); + _("multiple IOMMU devices are only supported with model smmuv3Dev")); return NULL; } - if (n > 0) { - if (!(def->iommu = virDomainIOMMUDefParseXML(xmlopt, nodes[0], - ctxt, flags))) + if (n > 0) + def->iommu = g_new0(virDomainIOMMUDef *, n); + + for (i = 0; i < n; i++) { + virDomainIOMMUDef *iommu; + + iommu = virDomainIOMMUDefParseXML(xmlopt, nodes[i], ctxt, flags); + + if (!iommu) return NULL; + + def->iommu[def->niommus++] = iommu; } VIR_FREE(nodes); @@ -22629,15 +22677,17 @@ virDomainDefCheckABIStabilityFlags(virDomainDef *src, goto error; } - if (!!src->iommu != !!dst->iommu) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("Target domain IOMMU device count does not match source")); + if (src->niommus != dst->niommus) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Target domain IOMMU device count %1$zu does not match source %2$zu"), + dst->niommus, src->niommus); goto error; } - if (src->iommu && - !virDomainIOMMUDefCheckABIStability(src->iommu, dst->iommu)) - goto error; + for (i = 0; i < src->niommus; i++) { + if (!virDomainIOMMUDefCheckABIStability(src->iommu[i], dst->iommu[i])) + goto error; + } if (!!src->vsock != !!dst->vsock) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", @@ -29482,8 +29532,9 @@ virDomainDefFormatInternalSetRootName(virDomainDef *def, for (n = 0; n < def->ncryptos; n++) { virDomainCryptoDefFormat(buf, def->cryptos[n], flags); } - if (def->iommu) - virDomainIOMMUDefFormat(buf, def->iommu); + + for (n = 0; n < def->niommus; n++) + virDomainIOMMUDefFormat(buf, def->iommu[n]); if (def->vsock) virDomainVsockDefFormat(buf, def->vsock); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index c4541589d3..dc8c5f35e2 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -3298,6 +3298,9 @@ struct _virDomainDef { size_t nwatchdogs; virDomainWatchdogDef **watchdogs; + size_t niommus; + virDomainIOMMUDef **iommu; + /* At maximum 2 TPMs on the domain if a TPM Proxy is present. */ size_t ntpms; virDomainTPMDef **tpms; @@ -3307,7 +3310,6 @@ struct _virDomainDef { virDomainNVRAMDef *nvram; virCPUDef *cpu; virDomainRedirFilterDef *redirfilter; - virDomainIOMMUDef *iommu; virDomainVsockDef *vsock; virDomainPstoreDef *pstore; @@ -4312,6 +4314,11 @@ virDomainShmemDef *virDomainShmemDefRemove(virDomainDef *def, size_t idx) ssize_t virDomainInputDefFind(const virDomainDef *def, const virDomainInputDef *input) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; +bool virDomainIOMMUDefEquals(const virDomainIOMMUDef *a, + const virDomainIOMMUDef *b); +ssize_t virDomainIOMMUDefFind(const virDomainDef *def, + const virDomainIOMMUDef *iommu) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; bool virDomainVsockDefEquals(const virDomainVsockDef *a, const virDomainVsockDef *b) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c index ba12ea706d..3cc8413a27 100644 --- a/src/conf/domain_validate.c +++ b/src/conf/domain_validate.c @@ -1851,21 +1851,31 @@ virDomainDefCputuneValidate(const virDomainDef *def) static int virDomainDefIOMMUValidate(const virDomainDef *def) { + size_t i; + if (!def->iommu) return 0; - if (def->iommu->intremap == VIR_TRISTATE_SWITCH_ON && - def->features[VIR_DOMAIN_FEATURE_IOAPIC] != VIR_DOMAIN_IOAPIC_QEMU) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("IOMMU interrupt remapping requires split I/O APIC (ioapic driver='qemu')")); - return -1; - } + for (i = 0; i < def->niommus; i++) { + virDomainIOMMUDef *iommu = def->iommu[i]; + if (def->niommus > 1 && iommu->model != VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("IOMMU model smmuv3Dev must be specified for multiple IOMMU definitions")); + } - if (def->iommu->eim == VIR_TRISTATE_SWITCH_ON && - def->iommu->intremap != VIR_TRISTATE_SWITCH_ON) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("IOMMU eim requires interrupt remapping to be enabled")); - return -1; + if (iommu->intremap == VIR_TRISTATE_SWITCH_ON && + def->features[VIR_DOMAIN_FEATURE_IOAPIC] != VIR_DOMAIN_IOAPIC_QEMU) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("IOMMU interrupt remapping requires split I/O APIC (ioapic driver='qemu')")); + return -1; + } + + if (iommu->eim == VIR_TRISTATE_SWITCH_ON && + iommu->intremap != VIR_TRISTATE_SWITCH_ON) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("IOMMU eim requires interrupt remapping to be enabled")); + return -1; + } } return 0; diff --git a/src/conf/schemas/domaincommon.rng b/src/conf/schemas/domaincommon.rng index 1aaa1a3180..4a1ef0c060 100644 --- a/src/conf/schemas/domaincommon.rng +++ b/src/conf/schemas/domaincommon.rng @@ -6964,9 +6964,9 @@ <zeroOrMore> <ref name="panic"/> </zeroOrMore> - <optional> + <zeroOrMore> <ref name="iommu"/> - </optional> + </zeroOrMore> <optional> <ref name="vsock"/> </optional> diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index fe72402527..34d9c263d4 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -491,6 +491,8 @@ virDomainInputSourceGrabToggleTypeToString; virDomainInputSourceGrabTypeFromString; virDomainInputSourceGrabTypeToString; virDomainInputTypeToString; +virDomainIOMMUDefEquals; +virDomainIOMMUDefFind; virDomainIOMMUDefFree; virDomainIOMMUDefNew; virDomainIOMMUModelTypeFromString; diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index a27c688d79..5f2b11b9a6 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -647,10 +647,14 @@ qemuAssignDeviceVsockAlias(virDomainVsockDef *vsock) static void -qemuAssignDeviceIOMMUAlias(virDomainIOMMUDef *iommu) +qemuAssignDeviceIOMMUAlias(virDomainDef *def, + virDomainIOMMUDef **iommu) { - if (!iommu->info.alias) - iommu->info.alias = g_strdup("iommu0"); + size_t i; + for (i = 0; i < def->niommus; i++) { + if (!iommu[i]->info.alias) + iommu[i]->info.alias = g_strdup_printf("iommu%zu", i); + } } @@ -766,8 +770,9 @@ qemuAssignDeviceAliases(virDomainDef *def) if (def->vsock) { qemuAssignDeviceVsockAlias(def->vsock); } - if (def->iommu) - qemuAssignDeviceIOMMUAlias(def->iommu); + if (def->iommu && def->niommus > 0) { + qemuAssignDeviceIOMMUAlias(def, def->iommu); + } for (i = 0; i < def->ncryptos; i++) { qemuAssignDeviceCryptoAlias(def, def->cryptos[i]); } diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 53088f8d8f..fc6d4ae706 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -6296,10 +6296,12 @@ qemuBuildBootCommandLine(virCommand *cmd, static virJSONValue * qemuBuildPCISmmuv3DevDevProps(const virDomainDef *def, - const virDomainIOMMUDef *iommu) + const virDomainIOMMUDef *iommu, + size_t id) { g_autoptr(virJSONValue) props = NULL; g_autofree char *bus = NULL; + g_autofree char *smmuv3_id = NULL; size_t i; bool contIsPHB = false; @@ -6340,9 +6342,12 @@ qemuBuildPCISmmuv3DevDevProps(const virDomainDef *def, bus = temp_bus; } + smmuv3_id = g_strdup_printf("smmuv3.%zu", id); + if (virJSONValueObjectAdd(&props, "s:driver", "arm-smmuv3", "s:primary-bus", bus, + "s:id", smmuv3_id, "b:accel", (iommu->accel == VIR_TRISTATE_SWITCH_ON), NULL) < 0) return NULL; @@ -6356,91 +6361,92 @@ qemuBuildIOMMUCommandLine(virCommand *cmd, const virDomainDef *def, virQEMUCaps *qemuCaps) { + size_t i; g_autoptr(virJSONValue) props = NULL; g_autoptr(virJSONValue) wrapperProps = NULL; - const virDomainIOMMUDef *iommu = def->iommu; - - if (!iommu) + if (!def->iommu || def->niommus <= 0) return 0; - switch (iommu->model) { - case VIR_DOMAIN_IOMMU_MODEL_INTEL: - if (virJSONValueObjectAdd(&props, - "s:driver", "intel-iommu", - "s:id", iommu->info.alias, - "S:intremap", qemuOnOffAuto(iommu->intremap), - "T:caching-mode", iommu->caching_mode, - "S:eim", qemuOnOffAuto(iommu->eim), - "T:device-iotlb", iommu->iotlb, - "z:aw-bits", iommu->aw_bits, - "T:dma-translation", iommu->dma_translation, - NULL) < 0) - return -1; + for (i = 0; i < def->niommus; i++) { + virDomainIOMMUDef *iommu = def->iommu[i]; + switch (iommu->model) { + case VIR_DOMAIN_IOMMU_MODEL_INTEL: + if (virJSONValueObjectAdd(&props, + "s:driver", "intel-iommu", + "s:id", iommu->info.alias, + "S:intremap", qemuOnOffAuto(iommu->intremap), + "T:caching-mode", iommu->caching_mode, + "S:eim", qemuOnOffAuto(iommu->eim), + "T:device-iotlb", iommu->iotlb, + "z:aw-bits", iommu->aw_bits, + "T:dma-translation", iommu->dma_translation, + NULL) < 0) + return -1; - if (qemuBuildDeviceCommandlineFromJSON(cmd, props, def, qemuCaps) < 0) - return -1; + if (qemuBuildDeviceCommandlineFromJSON(cmd, props, def, qemuCaps) < 0) + return -1; - return 0; + return 0; + case VIR_DOMAIN_IOMMU_MODEL_VIRTIO: + if (virJSONValueObjectAdd(&props, + "s:driver", "virtio-iommu", + "s:id", iommu->info.alias, + NULL) < 0) { + return -1; + } - case VIR_DOMAIN_IOMMU_MODEL_VIRTIO: - if (virJSONValueObjectAdd(&props, - "s:driver", "virtio-iommu", - "s:id", iommu->info.alias, - NULL) < 0) { - return -1; - } + if (qemuBuildDeviceAddressProps(props, def, &iommu->info) < 0) + return -1; - if (qemuBuildDeviceAddressProps(props, def, &iommu->info) < 0) - return -1; + if (qemuBuildDeviceCommandlineFromJSON(cmd, props, def, qemuCaps) < 0) + return -1; - if (qemuBuildDeviceCommandlineFromJSON(cmd, props, def, qemuCaps) < 0) - return -1; + return 0; + case VIR_DOMAIN_IOMMU_MODEL_SMMUV3: + /* There is no -device for SMMUv3, so nothing to be done here */ + return 0; - return 0; + case VIR_DOMAIN_IOMMU_MODEL_AMD: + if (virJSONValueObjectAdd(&wrapperProps, + "s:driver", "AMDVI-PCI", + "s:id", iommu->info.alias, + NULL) < 0) + return -1; - case VIR_DOMAIN_IOMMU_MODEL_SMMUV3: - return 0; + if (qemuBuildDeviceAddressProps(wrapperProps, def, &iommu->info) < 0) + return -1; - case VIR_DOMAIN_IOMMU_MODEL_AMD: - if (virJSONValueObjectAdd(&wrapperProps, - "s:driver", "AMDVI-PCI", - "s:id", iommu->info.alias, - NULL) < 0) - return -1; + if (qemuBuildDeviceCommandlineFromJSON(cmd, wrapperProps, def, qemuCaps) < 0) + return -1; - if (qemuBuildDeviceAddressProps(wrapperProps, def, &iommu->info) < 0) - return -1; + if (virJSONValueObjectAdd(&props, + "s:driver", "amd-iommu", + "s:pci-id", iommu->info.alias, + "S:intremap", qemuOnOffAuto(iommu->intremap), + "T:pt", iommu->pt, + "T:xtsup", iommu->xtsup, + "T:device-iotlb", iommu->iotlb, + NULL) < 0) + return -1; - if (qemuBuildDeviceCommandlineFromJSON(cmd, wrapperProps, def, qemuCaps) < 0) - return -1; + if (qemuBuildDeviceCommandlineFromJSON(cmd, props, def, qemuCaps) < 0) + return -1; - if (virJSONValueObjectAdd(&props, - "s:driver", "amd-iommu", - "s:pci-id", iommu->info.alias, - "S:intremap", qemuOnOffAuto(iommu->intremap), - "T:pt", iommu->pt, - "T:xtsup", iommu->xtsup, - "T:device-iotlb", iommu->iotlb, - NULL) < 0) - return -1; + return 0; - if (qemuBuildDeviceCommandlineFromJSON(cmd, props, def, qemuCaps) < 0) - return -1; + case VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV: + if (!(props = qemuBuildPCISmmuv3DevDevProps(def, iommu, i))) + return -1; + if (qemuBuildDeviceCommandlineFromJSON(cmd, props, def, qemuCaps) < 0) + return -1; + break; - return 0; - case VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV: - if (!(props = qemuBuildPCISmmuv3DevDevProps(def, iommu))) - return -1; - if (qemuBuildDeviceCommandlineFromJSON(cmd, props, def, qemuCaps) < 0) + case VIR_DOMAIN_IOMMU_MODEL_LAST: + default: + virReportEnumRangeError(virDomainIOMMUModel, iommu->model); return -1; - - return 0; - - case VIR_DOMAIN_IOMMU_MODEL_LAST: - default: - virReportEnumRangeError(virDomainIOMMUModel, iommu->model); - return -1; + } } return 0; @@ -7261,8 +7267,8 @@ qemuBuildMachineCommandLine(virCommand *cmd, if (qemuAppendDomainFeaturesMachineParam(&buf, def, qemuCaps) < 0) return -1; - if (def->iommu) { - switch (def->iommu->model) { + if (def->iommu && def->niommus == 1) { + switch (def->iommu[0]->model) { case VIR_DOMAIN_IOMMU_MODEL_SMMUV3: virBufferAddLit(&buf, ",iommu=smmuv3"); break; @@ -7276,7 +7282,7 @@ qemuBuildMachineCommandLine(virCommand *cmd, case VIR_DOMAIN_IOMMU_MODEL_LAST: default: - virReportEnumRangeError(virDomainIOMMUModel, def->iommu->model); + virReportEnumRangeError(virDomainIOMMUModel, def->iommu[0]->model); return -1; } } diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 06bf4fab32..2ddc629304 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -2365,24 +2365,25 @@ qemuDomainAssignDevicePCISlots(virDomainDef *def, /* Nada - none are PCI based (yet) */ } - if (def->iommu) { - virDomainIOMMUDef *iommu = def->iommu; - - switch (iommu->model) { - case VIR_DOMAIN_IOMMU_MODEL_VIRTIO: - case VIR_DOMAIN_IOMMU_MODEL_AMD: - if (virDeviceInfoPCIAddressIsWanted(&iommu->info) && - qemuDomainPCIAddressReserveNextAddr(addrs, &iommu->info) < 0) { - return -1; - } - break; + if (def->iommu && def->niommus > 0) { + for (i = 0; i < def->niommus; i++) { + virDomainIOMMUDef *iommu = def->iommu[i]; + switch (iommu->model) { + case VIR_DOMAIN_IOMMU_MODEL_VIRTIO: + case VIR_DOMAIN_IOMMU_MODEL_AMD: + if (virDeviceInfoPCIAddressIsWanted(&iommu->info) && + qemuDomainPCIAddressReserveNextAddr(addrs, &iommu->info) < 0) { + return -1; + } + break; - case VIR_DOMAIN_IOMMU_MODEL_INTEL: - case VIR_DOMAIN_IOMMU_MODEL_SMMUV3: - case VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV: - case VIR_DOMAIN_IOMMU_MODEL_LAST: - /* These are not PCI devices */ - break; + case VIR_DOMAIN_IOMMU_MODEL_INTEL: + case VIR_DOMAIN_IOMMU_MODEL_SMMUV3: + case VIR_DOMAIN_IOMMU_MODEL_SMMUV3_DEV: + case VIR_DOMAIN_IOMMU_MODEL_LAST: + /* These are not PCI devices */ + break; + } } } diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index ac72ea5cb0..3d65f78c9e 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -6894,12 +6894,12 @@ qemuDomainAttachDeviceConfig(virDomainDef *vmdef, break; case VIR_DOMAIN_DEVICE_IOMMU: - if (vmdef->iommu) { + if (vmdef->iommu && vmdef->niommus > 0) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", _("domain already has an iommu device")); return -1; } - vmdef->iommu = g_steal_pointer(&dev->data.iommu); + VIR_APPEND_ELEMENT(vmdef->iommu, vmdef->niommus, dev->data.iommu); break; case VIR_DOMAIN_DEVICE_VIDEO: @@ -7113,12 +7113,12 @@ qemuDomainDetachDeviceConfig(virDomainDef *vmdef, break; case VIR_DOMAIN_DEVICE_IOMMU: - if (!vmdef->iommu) { + if ((idx = virDomainIOMMUDefFind(vmdef, dev->data.iommu)) < 0) { virReportError(VIR_ERR_OPERATION_FAILED, "%s", _("matching iommu device not found")); return -1; } - g_clear_pointer(&vmdef->iommu, virDomainIOMMUDefFree); + VIR_DELETE_ELEMENT(vmdef->iommu, idx, vmdef->niommus); break; case VIR_DOMAIN_DEVICE_VIDEO: diff --git a/src/qemu/qemu_postparse.c b/src/qemu/qemu_postparse.c index 9c2427970d..e2744a4a61 100644 --- a/src/qemu/qemu_postparse.c +++ b/src/qemu/qemu_postparse.c @@ -1503,7 +1503,7 @@ qemuDomainDefAddDefaultDevices(virQEMUDriver *driver, } } - if (addIOMMU && !def->iommu && + if (addIOMMU && !def->iommu && def->niommus == 0 && virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_INTEL_IOMMU) && virQEMUCapsGet(qemuCaps, QEMU_CAPS_INTEL_IOMMU_INTREMAP) && virQEMUCapsGet(qemuCaps, QEMU_CAPS_INTEL_IOMMU_EIM)) { @@ -1515,7 +1515,8 @@ qemuDomainDefAddDefaultDevices(virQEMUDriver *driver, iommu->intremap = VIR_TRISTATE_SWITCH_ON; iommu->eim = VIR_TRISTATE_SWITCH_ON; - def->iommu = g_steal_pointer(&iommu); + def->iommu = g_new0(virDomainIOMMUDef *, 1); + def->iommu[def->niommus++] = g_steal_pointer(&iommu); } if (qemuDomainDefAddDefaultAudioBackend(driver, def) < 0) @@ -1591,9 +1592,9 @@ qemuDomainDefEnableDefaultFeatures(virDomainDef *def, * domain already has IOMMU without inremap. This will be fixed in * qemuDomainIOMMUDefPostParse() but there domain definition can't be * modified so change it now. */ - if (def->iommu && - (def->iommu->intremap == VIR_TRISTATE_SWITCH_ON || - qemuDomainNeedsIOMMUWithEIM(def)) && + if (def->iommu && def->niommus == 1 && + (def->iommu[0]->intremap == VIR_TRISTATE_SWITCH_ON || + qemuDomainNeedsIOMMUWithEIM(def)) && def->features[VIR_DOMAIN_FEATURE_IOAPIC] == VIR_DOMAIN_IOAPIC_NONE) { def->features[VIR_DOMAIN_FEATURE_IOAPIC] = VIR_DOMAIN_IOAPIC_QEMU; } diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index aac004c544..bdaeefe976 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -851,7 +851,7 @@ qemuValidateDomainVCpuTopology(const virDomainDef *def, virQEMUCaps *qemuCaps) QEMU_MAX_VCPUS_WITHOUT_EIM); return -1; } - if (!def->iommu || def->iommu->eim != VIR_TRISTATE_SWITCH_ON) { + if (!def->iommu || def->iommu[0]->eim != VIR_TRISTATE_SWITCH_ON) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("more than %1$d vCPUs require extended interrupt mode enabled on the iommu device"), QEMU_MAX_VCPUS_WITHOUT_EIM); -- 2.43.0

Provide sample XML and CLI args for the smmuv3Dev XML schema for pc, q35, and virt machine types. Signed-off-by: Nathan Chen <nathanc@nvidia.com> --- .../iommu-smmuv3Dev.aarch64-latest.args | 39 ++++++++++++ .../iommu-smmuv3Dev.aarch64-latest.xml | 62 +++++++++++++++++++ tests/qemuxmlconfdata/iommu-smmuv3Dev.xml | 49 +++++++++++++++ tests/qemuxmlconftest.c | 1 + 4 files changed, 151 insertions(+) create mode 100644 tests/qemuxmlconfdata/iommu-smmuv3Dev.aarch64-latest.args create mode 100644 tests/qemuxmlconfdata/iommu-smmuv3Dev.aarch64-latest.xml create mode 100644 tests/qemuxmlconfdata/iommu-smmuv3Dev.xml diff --git a/tests/qemuxmlconfdata/iommu-smmuv3Dev.aarch64-latest.args b/tests/qemuxmlconfdata/iommu-smmuv3Dev.aarch64-latest.args new file mode 100644 index 0000000000..d0b5b754ac --- /dev/null +++ b/tests/qemuxmlconfdata/iommu-smmuv3Dev.aarch64-latest.args @@ -0,0 +1,39 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/var/lib/libvirt/qemu/domain--1-guest \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/var/lib/libvirt/qemu/domain--1-guest/.local/share \ +XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain--1-guest/.cache \ +XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain--1-guest/.config \ +/usr/bin/qemu-system-aarch64 \ +-name guest=guest,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain--1-guest/master-key.aes"}' \ +-machine virt,usb=off,gic-version=2,dump-guest-core=off,memory-backend=mach-virt.ram,acpi=off \ +-accel tcg \ +-cpu cortex-a15 \ +-m size=1048576k \ +-object '{"qom-type":"memory-backend-ram","id":"mach-virt.ram","size":1073741824}' \ +-overcommit mem-lock=off \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid 1ccfd97d-5eb4-478a-bbe6-88d254c16db7 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot strict=on \ +-device '{"driver":"pxb-pcie","bus_nr":252,"id":"pci.1","bus":"pcie.0","addr":"0x1"}' \ +-device '{"driver":"pxb-pcie","bus_nr":248,"id":"pci.2","bus":"pcie.0","addr":"0x2"}' \ +-device '{"driver":"pcie-root-port","port":0,"chassis":21,"id":"pci.3","bus":"pci.1","addr":"0x0"}' \ +-device '{"driver":"pcie-root-port","port":168,"chassis":22,"id":"pci.4","bus":"pci.2","addr":"0x0"}' \ +-device '{"driver":"arm-smmuv3","primary-bus":"pci.1","id":"smmuv3.0","accel":true}' \ +-device '{"driver":"arm-smmuv3","primary-bus":"pci.2","id":"smmuv3.1","accel":true}' \ +-audiodev '{"id":"audio1","driver":"none"}' \ +-device '{"driver":"vfio-pci","host":"0000:06:12.5","id":"hostdev0","bus":"pci.3","addr":"0x0"}' \ +-device '{"driver":"vfio-pci","host":"0000:06:12.6","id":"hostdev1","bus":"pci.4","addr":"0x0"}' \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxmlconfdata/iommu-smmuv3Dev.aarch64-latest.xml b/tests/qemuxmlconfdata/iommu-smmuv3Dev.aarch64-latest.xml new file mode 100644 index 0000000000..c2c9552586 --- /dev/null +++ b/tests/qemuxmlconfdata/iommu-smmuv3Dev.aarch64-latest.xml @@ -0,0 +1,62 @@ +<domain type='qemu'> + <name>guest</name> + <uuid>1ccfd97d-5eb4-478a-bbe6-88d254c16db7</uuid> + <memory unit='KiB'>1048576</memory> + <currentMemory unit='KiB'>1048576</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='aarch64' machine='virt'>hvm</type> + <boot dev='hd'/> + </os> + <features> + <gic version='2'/> + </features> + <cpu mode='custom' match='exact' check='none'> + <model fallback='forbid'>cortex-a15</model> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-aarch64</emulator> + <controller type='usb' index='0' model='none'/> + <controller type='pci' index='0' model='pcie-root'/> + <controller type='pci' index='1' model='pcie-expander-bus'> + <model name='pxb-pcie'/> + <target busNr='252'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> + </controller> + <controller type='pci' index='2' model='pcie-expander-bus'> + <model name='pxb-pcie'/> + <target busNr='248'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </controller> + <controller type='pci' index='3' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='21' port='0x0'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> + </controller> + <controller type='pci' index='4' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='22' port='0xa8'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/> + </controller> + <audio id='1' type='none'/> + <hostdev mode='subsystem' type='pci' managed='no'> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + </source> + <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='no'> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x6'/> + </source> + <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/> + </hostdev> + <memballoon model='none'/> + <iommu model='smmuv3Dev' parentIdx='1' accel='on'/> + <iommu model='smmuv3Dev' parentIdx='2' accel='on'/> + </devices> +</domain> diff --git a/tests/qemuxmlconfdata/iommu-smmuv3Dev.xml b/tests/qemuxmlconfdata/iommu-smmuv3Dev.xml new file mode 100644 index 0000000000..59ff2ace5b --- /dev/null +++ b/tests/qemuxmlconfdata/iommu-smmuv3Dev.xml @@ -0,0 +1,49 @@ +<domain type='qemu'> + <name>guest</name> + <uuid>1ccfd97d-5eb4-478a-bbe6-88d254c16db7</uuid> + <memory unit='KiB'>1048576</memory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='aarch64' machine='virt'>hvm</type> + </os> + <devices> + <emulator>/usr/bin/qemu-system-aarch64</emulator> + <controller type='usb' model='none'/> + <memballoon model='none'/> + <controller type='pci' index='0' model='pcie-root'/> + <controller type='pci' index='1' model='pcie-expander-bus'> + <model name='pxb-pcie'/> + <target busNr='252'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> + </controller> + <controller type='pci' index='2' model='pcie-expander-bus'> + <model name='pxb-pcie'/> + <target busNr='248'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </controller> + <controller type='pci' index='3' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='21' port='0x0'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> + </controller> + <controller type='pci' index='4' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='22' port='0xa8'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/> + </controller> + <hostdev mode='subsystem' type='pci' managed='no'> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + </source> + <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/> + </hostdev> + <hostdev mode='subsystem' type='pci' managed='no'> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x6'/> + </source> + <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/> + </hostdev> + <iommu model='smmuv3Dev' parentIdx='1' accel='on'/> + <iommu model='smmuv3Dev' parentIdx='2' accel='on'/> + </devices> +</domain> diff --git a/tests/qemuxmlconftest.c b/tests/qemuxmlconftest.c index 171a6f1c78..045e56658e 100644 --- a/tests/qemuxmlconftest.c +++ b/tests/qemuxmlconftest.c @@ -2836,6 +2836,7 @@ mymain(void) DO_TEST_CAPS_LATEST_ABI_UPDATE("intel-iommu-eim-autoadd"); DO_TEST_CAPS_LATEST_ABI_UPDATE("intel-iommu-eim-autoadd-v2"); DO_TEST_CAPS_ARCH_LATEST("iommu-smmuv3", "aarch64"); + DO_TEST_CAPS_ARCH_LATEST("iommu-smmuv3Dev", "aarch64"); DO_TEST_CAPS_LATEST("virtio-iommu-x86_64"); DO_TEST_CAPS_ARCH_LATEST("virtio-iommu-aarch64", "aarch64"); DO_TEST_CAPS_LATEST_PARSE_ERROR("virtio-iommu-wrong-machine"); -- 2.43.0

Implement a new iommufd attribute under hostdevs' PCI subsystem driver that can be used to specify associated iommufd object when launching a qemu VM. Signed-off-by: Nathan Chen <nathanc@nvidia.com> --- docs/formatdomain.rst | 8 ++++++++ src/conf/device_conf.c | 9 +++++++++ src/conf/device_conf.h | 1 + src/conf/schemas/basictypes.rng | 5 +++++ src/qemu/qemu_command.c | 19 +++++++++++++++++++ 5 files changed, 42 insertions(+) diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 2f1e06ba0e..61b71fdf1f 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -4836,6 +4836,7 @@ or: device; if PCI ROM loading is disabled through this attribute, attempts to tweak the loading process further using the ``bar`` or ``file`` attributes will be rejected. :since:`Since 4.3.0 (QEMU and KVM only)`. + ``address`` The ``address`` element for USB devices has a ``bus`` and ``device`` attribute to specify the USB bus and device number the device appears at on @@ -4876,6 +4877,13 @@ or: found is "problematic" in some way, the generic vfio-pci driver similarly be forced. + The ``<driver>`` element's ``iommufd`` attribute is used to specify + using the iommufd interface to propagate DMA mappings to the kernel, + instead of legacy VFIO. When the attribute is present, an iommufd + object will be created by the resulting qemu command. Libvirt will + open the /dev/iommu and VFIO device cdev, passing the associated + file descriptor numbers to the qemu command. + (Note: :since:`Since 1.0.5`, the ``name`` attribute has been described to be used to select the type of PCI device assignment ("vfio", "kvm", or "xen"), but those values have been mostly diff --git a/src/conf/device_conf.c b/src/conf/device_conf.c index c278b81652..88979ecc39 100644 --- a/src/conf/device_conf.c +++ b/src/conf/device_conf.c @@ -60,6 +60,8 @@ int virDeviceHostdevPCIDriverInfoParseXML(xmlNodePtr node, virDeviceHostdevPCIDriverInfo *driver) { + virTristateBool iommufd; + driver->iommufd = false; if (virXMLPropEnum(node, "name", virDeviceHostdevPCIDriverNameTypeFromString, VIR_XML_PROP_NONZERO, @@ -67,6 +69,10 @@ virDeviceHostdevPCIDriverInfoParseXML(xmlNodePtr node, return -1; } + if (virXMLPropTristateBool(node, "iommufd", VIR_XML_PROP_NONE, &iommufd) < 0) + return -1; + virTristateBoolToBool(iommufd, &driver->iommufd); + driver->model = virXMLPropString(node, "model"); return 0; } @@ -93,6 +99,9 @@ virDeviceHostdevPCIDriverInfoFormat(virBuffer *buf, virBufferEscapeString(&driverAttrBuf, " model='%s'", driver->model); + if (driver->iommufd) + virBufferAddLit(&driverAttrBuf, " iommufd='yes'"); + virXMLFormatElement(buf, "driver", &driverAttrBuf, NULL); return 0; } diff --git a/src/conf/device_conf.h b/src/conf/device_conf.h index e570f51824..7bdbd80b0a 100644 --- a/src/conf/device_conf.h +++ b/src/conf/device_conf.h @@ -47,6 +47,7 @@ VIR_ENUM_DECL(virDeviceHostdevPCIDriverName); struct _virDeviceHostdevPCIDriverInfo { virDeviceHostdevPCIDriverName name; char *model; + bool iommufd; }; typedef enum { diff --git a/src/conf/schemas/basictypes.rng b/src/conf/schemas/basictypes.rng index 2931e316b7..089fc0f1c2 100644 --- a/src/conf/schemas/basictypes.rng +++ b/src/conf/schemas/basictypes.rng @@ -673,6 +673,11 @@ <ref name="genericName"/> </attribute> </optional> + <optional> + <attribute name="iommufd"> + <ref name="virYesNo"/> + </attribute> + </optional> <empty/> </element> </define> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index fc6d4ae706..c9d165dfd9 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -4808,6 +4808,7 @@ qemuBuildPCIHostdevDevProps(const virDomainDef *def, g_autofree char *host = virPCIDeviceAddressAsString(&pcisrc->addr); const char *failover_pair_id = NULL; const char *driver = NULL; + const char *iommufdId = NULL; /* 'ramfb' property must be omitted unless it's to be enabled */ bool ramfb = pcisrc->ramfb == VIR_TRISTATE_SWITCH_ON; @@ -4841,6 +4842,9 @@ qemuBuildPCIHostdevDevProps(const virDomainDef *def, teaming->persistent) failover_pair_id = teaming->persistent; + if (pcisrc->driver.iommufd) + iommufdId = "iommufd0"; + if (virJSONValueObjectAdd(&props, "s:driver", driver, "s:host", host, @@ -4849,6 +4853,7 @@ qemuBuildPCIHostdevDevProps(const virDomainDef *def, "S:failover_pair_id", failover_pair_id, "S:display", qemuOnOffAuto(pcisrc->display), "B:ramfb", ramfb, + "S:iommufd", iommufdId, NULL) < 0) return NULL; @@ -5265,6 +5270,9 @@ qemuBuildHostdevCommandLine(virCommand *cmd, virQEMUCaps *qemuCaps) { size_t i; + g_autoptr(virJSONValue) props = NULL; + int iommufd = 0; + const char * iommufdId = "iommufd0"; for (i = 0; i < def->nhostdevs; i++) { virDomainHostdevDef *hostdev = def->hostdevs[i]; @@ -5293,6 +5301,17 @@ qemuBuildHostdevCommandLine(virCommand *cmd, if (hostdev->info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_UNASSIGNED) continue; + if (subsys->u.pci.driver.iommufd && iommufd == 0) { + iommufd = 1; + if (qemuMonitorCreateObjectProps(&props, "iommufd", + iommufdId, + NULL) < 0) + return -1; + + if (qemuBuildObjectCommandlineFromJSON(cmd, props) < 0) + return -1; + } + if (qemuCommandAddExtDevice(cmd, hostdev->info, def, qemuCaps) < 0) return -1; -- 2.43.0

Open iommufd FDs from libvirt backend without exposing these FDs to XML users, i.e. one per domain for /dev/iommu and one per iommufd hostdev for /dev/vfio/devices/vfioX, and pass the FD to qemu command line. Signed-off-by: Nathan Chen <nathanc@nvidia.com> --- src/qemu/qemu_command.c | 43 +++++++- src/qemu/qemu_command.h | 3 +- src/qemu/qemu_domain.c | 8 ++ src/qemu/qemu_domain.h | 7 ++ src/qemu/qemu_hotplug.c | 2 +- src/qemu/qemu_process.c | 232 ++++++++++++++++++++++++++++++++++++++++ 6 files changed, 289 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index c9d165dfd9..418ea601d6 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -4800,7 +4800,8 @@ qemuBuildVideoCommandLine(virCommand *cmd, virJSONValue * qemuBuildPCIHostdevDevProps(const virDomainDef *def, - virDomainHostdevDef *dev) + virDomainHostdevDef *dev, + virDomainObj *vm) { g_autoptr(virJSONValue) props = NULL; virDomainHostdevSubsysPCI *pcisrc = &dev->source.subsys.u.pci; @@ -4811,6 +4812,13 @@ qemuBuildPCIHostdevDevProps(const virDomainDef *def, const char *iommufdId = NULL; /* 'ramfb' property must be omitted unless it's to be enabled */ bool ramfb = pcisrc->ramfb == VIR_TRISTATE_SWITCH_ON; + bool useIommufd = false; + qemuDomainObjPrivate *priv = vm ? vm->privateData : NULL; + + if (pcisrc->driver.name == VIR_DEVICE_HOSTDEV_PCI_DRIVER_NAME_VFIO && + pcisrc->driver.iommufd) { + useIommufd = true; + } /* caller has to assign proper passthrough driver name */ switch (pcisrc->driver.name) { @@ -4857,6 +4865,18 @@ qemuBuildPCIHostdevDevProps(const virDomainDef *def, NULL) < 0) return NULL; + if (useIommufd && priv) { + g_autofree char *vfioFdName = g_strdup_printf("vfio-%04x:%02x:%02x.%d", + pcisrc->addr.domain, pcisrc->addr.bus, + pcisrc->addr.slot, pcisrc->addr.function); + + int vfiofd = GPOINTER_TO_INT(g_hash_table_lookup(priv->vfioDeviceFds, vfioFdName)); + if (virJSONValueObjectAdd(&props, + "S:fd", g_strdup_printf("%d", vfiofd), + NULL) < 0) + return NULL; + } + if (qemuBuildDeviceAddressProps(props, def, dev->info) < 0) return NULL; @@ -5267,12 +5287,14 @@ qemuBuildAcpiNodesetProps(virCommand *cmd, static int qemuBuildHostdevCommandLine(virCommand *cmd, const virDomainDef *def, - virQEMUCaps *qemuCaps) + virQEMUCaps *qemuCaps, + virDomainObj *vm) { size_t i; g_autoptr(virJSONValue) props = NULL; int iommufd = 0; const char * iommufdId = "iommufd0"; + qemuDomainObjPrivate *priv = vm->privateData; for (i = 0; i < def->nhostdevs; i++) { virDomainHostdevDef *hostdev = def->hostdevs[i]; @@ -5303,8 +5325,10 @@ qemuBuildHostdevCommandLine(virCommand *cmd, if (subsys->u.pci.driver.iommufd && iommufd == 0) { iommufd = 1; + virCommandPassFD(cmd, priv->iommufd, VIR_COMMAND_PASS_FD_CLOSE_PARENT); if (qemuMonitorCreateObjectProps(&props, "iommufd", iommufdId, + "S:fd", g_strdup_printf("%d", priv->iommufd), NULL) < 0) return -1; @@ -5315,7 +5339,18 @@ qemuBuildHostdevCommandLine(virCommand *cmd, if (qemuCommandAddExtDevice(cmd, hostdev->info, def, qemuCaps) < 0) return -1; - if (!(devprops = qemuBuildPCIHostdevDevProps(def, hostdev))) + if (subsys->u.pci.driver.iommufd) { + virDomainHostdevSubsysPCI *pcisrc = &hostdev->source.subsys.u.pci; + g_autofree char *vfioFdName = g_strdup_printf("vfio-%04x:%02x:%02x.%d", + pcisrc->addr.domain, pcisrc->addr.bus, + pcisrc->addr.slot, pcisrc->addr.function); + + int vfiofd = GPOINTER_TO_INT(g_hash_table_lookup(priv->vfioDeviceFds, vfioFdName)); + + virCommandPassFD(cmd, vfiofd, VIR_COMMAND_PASS_FD_CLOSE_PARENT); + } + + if (!(devprops = qemuBuildPCIHostdevDevProps(def, hostdev, vm))) return -1; if (qemuBuildDeviceCommandlineFromJSON(cmd, devprops, def, qemuCaps) < 0) @@ -11018,7 +11053,7 @@ qemuBuildCommandLine(virDomainObj *vm, if (qemuBuildRedirdevCommandLine(cmd, def, qemuCaps) < 0) return NULL; - if (qemuBuildHostdevCommandLine(cmd, def, qemuCaps) < 0) + if (qemuBuildHostdevCommandLine(cmd, def, qemuCaps, vm) < 0) return NULL; if (migrateURI) diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h index ad068f1f16..380aac261f 100644 --- a/src/qemu/qemu_command.h +++ b/src/qemu/qemu_command.h @@ -180,7 +180,8 @@ qemuBuildThreadContextProps(virJSONValue **tcProps, /* Current, best practice */ virJSONValue * qemuBuildPCIHostdevDevProps(const virDomainDef *def, - virDomainHostdevDef *dev); + virDomainHostdevDef *dev, + virDomainObj *vm); virJSONValue * qemuBuildRNGDevProps(const virDomainDef *def, diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index e45757ccd5..2f1c3de85d 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1954,6 +1954,11 @@ qemuDomainObjPrivateFree(void *data) virChrdevFree(priv->devs); + if (priv->iommufd >= 0) { + virEventRemoveHandle(priv->iommufd); + priv->iommufd = -1; + } + if (priv->pidMonitored >= 0) { virEventRemoveHandle(priv->pidMonitored); priv->pidMonitored = -1; @@ -1975,6 +1980,7 @@ qemuDomainObjPrivateFree(void *data) g_clear_pointer(&priv->blockjobs, g_hash_table_unref); g_clear_pointer(&priv->fds, g_hash_table_unref); + g_clear_pointer(&priv->vfioDeviceFds, g_hash_table_unref); /* This should never be non-NULL if we get here, but just in case... */ if (priv->eventThread) { @@ -2003,7 +2009,9 @@ qemuDomainObjPrivateAlloc(void *opaque) priv->blockjobs = virHashNew(virObjectUnref); priv->fds = virHashNew(g_object_unref); + priv->vfioDeviceFds = g_hash_table_new(g_str_hash, g_str_equal); + priv->iommufd = -1; priv->pidMonitored = -1; /* agent commands block by default, user can choose different behavior */ diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index ffe5bee1bf..bed5326bba 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -266,6 +266,10 @@ struct _qemuDomainObjPrivate { /* named file descriptor groups associated with the VM */ GHashTable *fds; + int iommufd; + + GHashTable *vfioDeviceFds; + char *memoryBackingDir; }; @@ -1171,3 +1175,6 @@ qemuDomainCheckCPU(virArch arch, bool qemuDomainMachineSupportsFloppy(const char *machine, virQEMUCaps *qemuCaps); + +int qemuProcessOpenVfioFds(virDomainObj *vm); +void qemuProcessCloseVfioFds(virDomainObj *vm); diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index fb426deb1a..661e9008f7 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1630,7 +1630,7 @@ qemuDomainAttachHostPCIDevice(virQEMUDriver *driver, goto error; } - if (!(devprops = qemuBuildPCIHostdevDevProps(vm->def, hostdev))) + if (!(devprops = qemuBuildPCIHostdevDevProps(vm->def, hostdev, vm))) goto error; qemuDomainObjEnterMonitor(vm); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index ead5bf3e48..5acaf12cfc 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -25,6 +25,7 @@ #include <unistd.h> #include <signal.h> #include <sys/stat.h> +#include <dirent.h> #if WITH_SYS_SYSCALL_H # include <sys/syscall.h> #endif @@ -8019,6 +8020,9 @@ qemuProcessLaunch(virConnectPtr conn, if (qemuExtDevicesStart(driver, vm, incomingMigrationExtDevices) < 0) goto cleanup; + if (qemuProcessOpenVfioFds(vm) < 0) + goto cleanup; + if (!(cmd = qemuBuildCommandLine(vm, incoming ? "defer" : NULL, vmop, @@ -10200,3 +10204,231 @@ qemuProcessHandleNbdkitExit(qemuNbdkitProcess *nbdkit, qemuProcessEventSubmit(vm, QEMU_PROCESS_EVENT_NBDKIT_EXITED, 0, 0, nbdkit); virObjectUnlock(vm); } + +/** + * qemuProcessOpenIommuFd: + * @vm: domain object + * @iommuFd: returned file descriptor + * + * Opens /dev/iommu file descriptor for the VM. + * + * Returns: 0 on success, -1 on failure + */ +static int +qemuProcessOpenIommuFd(virDomainObj *vm, int *iommuFd) +{ + int fd = -1; + + VIR_DEBUG("Opening IOMMU FD for domain %s", vm->def->name); + + if ((fd = open("/dev/iommu", O_RDWR | O_CLOEXEC)) < 0) { + if (errno == ENOENT) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("IOMMU FD support requires /dev/iommu device")); + } else { + virReportSystemError(errno, "%s", + _("cannot open /dev/iommu")); + } + return -1; + } + + *iommuFd = fd; + VIR_DEBUG("Opened IOMMU FD %d for domain %s", fd, vm->def->name); + return 0; +} + +/** + * qemuProcessGetVfioDevicePath: + * @hostdev: host device definition + * @vfioPath: returned VFIO device path + * + * Constructs the VFIO device path for a PCI hostdev. + * + * Returns: 0 on success, -1 on failure + */ +static int +qemuProcessGetVfioDevicePath(virDomainHostdevDef *hostdev, + char **vfioPath) +{ + virPCIDeviceAddress *addr; + g_autofree char *sysfsPath = NULL; + DIR *dir = NULL; + struct dirent *entry = NULL; + int ret = -1; + + if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS || + hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("VFIO FD only supported for PCI hostdevs")); + return -1; + } + + addr = &hostdev->source.subsys.u.pci.addr; + + /* Build sysfs path: /sys/bus/pci/devices/DDDD:BB:DD.F/vfio-dev/ */ + sysfsPath = g_strdup_printf("/sys/bus/pci/devices/" + "%04x:%02x:%02x.%d/vfio-dev/", + addr->domain, addr->bus, + addr->slot, addr->function); + + if (virDirOpen(&dir, sysfsPath) < 0) { + virReportSystemError(errno, + _("cannot open VFIO sysfs directory %1$s"), + sysfsPath); + return -1; + } + + /* Find the vfio device name in the directory */ + while (virDirRead(dir, &entry, sysfsPath) > 0) { + if (STRPREFIX(entry->d_name, "vfio")) { + *vfioPath = g_strdup_printf("/dev/vfio/devices/%s", entry->d_name); + ret = 0; + break; + } + } + + if (ret < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot find VFIO device for PCI device %1$04x:%2$02x:%3$02x.%4$d"), + addr->domain, addr->bus, addr->slot, addr->function); + } + + virDirClose(dir); + return ret; +} + +/** + * qemuProcessOpenVfioDeviceFd: + * @hostdev: host device definition + * @vfioFd: returned file descriptor + * + * Opens the VFIO device file descriptor for a hostdev. + * + * Returns: 0 on success, -1 on failure + */ +static int +qemuProcessOpenVfioDeviceFd(virDomainHostdevDef *hostdev, + int *vfioFd) +{ + g_autofree char *vfioPath = NULL; + int fd = -1; + + if (qemuProcessGetVfioDevicePath(hostdev, &vfioPath) < 0) + return -1; + + VIR_DEBUG("Opening VFIO device %s", vfioPath); + + if ((fd = open(vfioPath, O_RDWR | O_CLOEXEC)) < 0) { + if (errno == ENOENT) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("VFIO device %1$s not found - ensure device is bound to vfio-pci driver"), + vfioPath); + } else { + virReportSystemError(errno, + _("cannot open VFIO device %1$s"), vfioPath); + } + return -1; + } + + *vfioFd = fd; + VIR_DEBUG("Opened VFIO device FD %d for %s", *vfioFd, vfioPath); + return 0; +} + +/** + * qemuProcessOpenVfioFds: + * @vm: domain object + * + * Opens all necessary VFIO file descriptors for the domain. + * + * Returns: 0 on success, -1 on failure + */ +int +qemuProcessOpenVfioFds(virDomainObj *vm) +{ + qemuDomainObjPrivate *priv = vm->privateData; + bool needsIommuFd = false; + size_t i; + + /* Check if we have any hostdevs that need VFIO FDs */ + for (i = 0; i < vm->def->nhostdevs; i++) { + virDomainHostdevDef *hostdev = vm->def->hostdevs[i]; + int vfioFd = -1; + g_autofree char *fdname = NULL; + + if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && + hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) { + + /* Check if this hostdev uses VFIO with IOMMU FD */ + if (hostdev->source.subsys.u.pci.driver.name == VIR_DEVICE_HOSTDEV_PCI_DRIVER_NAME_VFIO && + hostdev->source.subsys.u.pci.driver.iommufd) { + + needsIommuFd = true; + + /* Open VFIO device FD */ + if (qemuProcessOpenVfioDeviceFd(hostdev, &vfioFd) < 0) + goto error; + + /* Store the FD */ + fdname = g_strdup_printf("vfio-%04x:%02x:%02x.%d", + hostdev->source.subsys.u.pci.addr.domain, + hostdev->source.subsys.u.pci.addr.bus, + hostdev->source.subsys.u.pci.addr.slot, + hostdev->source.subsys.u.pci.addr.function); + + g_hash_table_insert(priv->vfioDeviceFds, g_steal_pointer(&fdname), GINT_TO_POINTER(vfioFd)); + + VIR_DEBUG("Stored VFIO FD for device %s", fdname); + } + } + } + + /* Open IOMMU FD if needed */ + if (needsIommuFd) { + int iommuFd = -1; + + if (qemuProcessOpenIommuFd(vm, &iommuFd) < 0) + goto error; + + priv->iommufd = iommuFd; + + VIR_DEBUG("Stored IOMMU FD"); + } + + return 0; + + error: + qemuProcessCloseVfioFds(vm); + return -1; +} + +/** + * qemuProcessCloseVfioFds: + * @vm: domain object + * + * Closes all VFIO file descriptors for the domain. + */ +void +qemuProcessCloseVfioFds(virDomainObj *vm) +{ + qemuDomainObjPrivate *priv = vm->privateData; + GHashTableIter iter; + gpointer key, value; + + /* Close all VFIO device FDs */ + if (priv->vfioDeviceFds) { + g_hash_table_iter_init(&iter, priv->vfioDeviceFds); + while (g_hash_table_iter_next(&iter, &key, &value)) { + int fd = GPOINTER_TO_INT(value); + VIR_DEBUG("Closing VFIO device FD %d for %s", fd, (char*)key); + VIR_FORCE_CLOSE(fd); + } + g_hash_table_remove_all(priv->vfioDeviceFds); + } + + /* Close IOMMU FD */ + if (priv->iommufd >= 0) { + VIR_DEBUG("Closing IOMMU FD %d", priv->iommufd); + VIR_FORCE_CLOSE(priv->iommufd); + } +} -- 2.43.0

Allow access to /dev/iommu and /dev/vfio/devices/vfio* when launching a qemu VM with iommufd feature enabled. Signed-off-by: Nathan Chen <nathanc@nvidia.com> --- src/qemu/qemu_cgroup.c | 61 ++++++++++++++++++++++++++++ src/qemu/qemu_cgroup.h | 1 + src/qemu/qemu_namespace.c | 44 +++++++++++++++++++++ src/security/security_apparmor.c | 15 +++++++ src/security/security_dac.c | 34 ++++++++++++++++ src/security/security_selinux.c | 30 ++++++++++++++ src/util/virpci.c | 68 ++++++++++++++++++++++++++++++++ src/util/virpci.h | 1 + 8 files changed, 254 insertions(+) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index f10976c2b0..5f4315051a 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -462,6 +462,54 @@ qemuTeardownInputCgroup(virDomainObj *vm, } +int +qemuSetupIommufdCgroup(virDomainObj *vm) +{ + qemuDomainObjPrivate *priv = vm->privateData; + g_autoptr(DIR) dir = NULL; + struct dirent *dent; + g_autofree char *path = NULL; + int iommufd = 0; + size_t i; + + for (i = 0; i < vm->def->nhostdevs; i++) { + if (vm->def->hostdevs[i]->source.subsys.u.pci.driver.iommufd) { + iommufd = 1; + break; + } + } + + if (iommufd == 1) { + if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) + return 0; + if (virDirOpen(&dir, "/dev/vfio/devices") < 0) { + if (errno == ENOENT) + return 0; + return -1; + } + while (virDirRead(dir, &dent, "/dev/vfio/devices") > 0) { + if (STRPREFIX(dent->d_name, "vfio")) { + path = g_strdup_printf("/dev/vfio/devices/%s", dent->d_name); + } + if (path && + qemuCgroupAllowDevicePath(vm, path, + VIR_CGROUP_DEVICE_RW, false) < 0) { + return -1; + } + path = NULL; + } + if (virFileExists("/dev/iommu")) + path = g_strdup("/dev/iommu"); + if (path && + qemuCgroupAllowDevicePath(vm, path, + VIR_CGROUP_DEVICE_RW, false) < 0) { + return -1; + } + } + return 0; +} + + /** * qemuSetupHostdevCgroup: * vm: domain object @@ -760,6 +808,7 @@ qemuSetupDevicesCgroup(virDomainObj *vm) g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(priv->driver); const char *const *deviceACL = (const char *const *) cfg->cgroupDeviceACL; int rv = -1; + int iommufd = 0; size_t i; if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) @@ -830,6 +879,18 @@ qemuSetupDevicesCgroup(virDomainObj *vm) return -1; } + for (i = 0; i < vm->def->nhostdevs; i++) { + if (vm->def->hostdevs[i]->source.subsys.u.pci.driver.iommufd) { + iommufd = 1; + break; + } + } + + if (iommufd == 1) { + if (qemuSetupIommufdCgroup(vm) < 0) + return -1; + } + for (i = 0; i < vm->def->nmems; i++) { if (qemuSetupMemoryDevicesCgroup(vm, vm->def->mems[i]) < 0) return -1; diff --git a/src/qemu/qemu_cgroup.h b/src/qemu/qemu_cgroup.h index 3668034cde..bea677ba3c 100644 --- a/src/qemu/qemu_cgroup.h +++ b/src/qemu/qemu_cgroup.h @@ -42,6 +42,7 @@ int qemuSetupHostdevCgroup(virDomainObj *vm, int qemuTeardownHostdevCgroup(virDomainObj *vm, virDomainHostdevDef *dev) G_GNUC_WARN_UNUSED_RESULT; +int qemuSetupIommufdCgroup(virDomainObj *vm); int qemuSetupMemoryDevicesCgroup(virDomainObj *vm, virDomainMemoryDef *mem); int qemuTeardownMemoryDevicesCgroup(virDomainObj *vm, diff --git a/src/qemu/qemu_namespace.c b/src/qemu/qemu_namespace.c index f72da83929..1928a84e26 100644 --- a/src/qemu/qemu_namespace.c +++ b/src/qemu/qemu_namespace.c @@ -677,6 +677,47 @@ qemuDomainSetupLaunchSecurity(virDomainObj *vm, } +static int +qemuDomainSetupIommufd(virDomainObj *vm, + GSList **paths) +{ + g_autoptr(DIR) dir = NULL; + struct dirent *dent; + g_autofree char *path = NULL; + int iommufd = 0; + size_t i; + + for (i = 0; i < vm->def->nhostdevs; i++) { + if (vm->def->hostdevs[i]->source.subsys.u.pci.driver.iommufd) { + iommufd = 1; + break; + } + } + + /* Check if iommufd is enabled */ + if (iommufd == 1) { + if (virDirOpen(&dir, "/dev/vfio/devices") < 0) { + if (errno == ENOENT) + return 0; + return -1; + } + while (virDirRead(dir, &dent, "/dev/vfio/devices") > 0) { + if (STRPREFIX(dent->d_name, "vfio")) { + path = g_strdup_printf("/dev/vfio/devices/%s", dent->d_name); + *paths = g_slist_prepend(*paths, g_steal_pointer(&path)); + } + } + path = NULL; + if (virFileExists("/dev/iommu")) + path = g_strdup("/dev/iommu"); + if (path) + *paths = g_slist_prepend(*paths, g_steal_pointer(&path)); + } + + return 0; +} + + static int qemuNamespaceMknodPaths(virDomainObj *vm, GSList *paths, @@ -700,6 +741,9 @@ qemuDomainBuildNamespace(virQEMUDriverConfig *cfg, if (qemuDomainSetupAllDisks(vm, &paths) < 0) return -1; + if (qemuDomainSetupIommufd(vm, &paths) < 0) + return -1; + if (qemuDomainSetupAllHostdevs(vm, &paths) < 0) return -1; diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c index 68ac39611f..0a878fd205 100644 --- a/src/security/security_apparmor.c +++ b/src/security/security_apparmor.c @@ -856,6 +856,21 @@ AppArmorSetSecurityHostdevLabel(virSecurityManager *mgr, } ret = AppArmorSetSecurityPCILabel(pci, vfioGroupDev, ptr); VIR_FREE(vfioGroupDev); + + if (dev->source.subsys.u.pci.driver.iommufd) { + g_autofree char *vfiofdDev = virPCIDeviceGetIOMMUFDDev(pci); + const char *iommufdDir = "/dev/iommu"; + if (vfiofdDev) { + int ret2 = AppArmorSetSecurityPCILabel(pci, vfiofdDev, ptr); + if (ret2 < 0) + ret = ret2; + ret2 = AppArmorSetSecurityPCILabel(pci, iommufdDir, ptr); + if (ret2 < 0) + ret = ret2; + } else { + return -1; + } + } } else { ret = virPCIDeviceFileIterate(pci, AppArmorSetSecurityPCILabel, ptr); } diff --git a/src/security/security_dac.c b/src/security/security_dac.c index 2f788b872a..361106222d 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -1290,6 +1290,24 @@ virSecurityDACSetHostdevLabel(virSecurityManager *mgr, ret = virSecurityDACSetHostdevLabelHelper(vfioGroupDev, false, &cbdata); + if (dev->source.subsys.u.pci.driver.iommufd) { + g_autofree char *vfiofdDev = virPCIDeviceGetIOMMUFDDev(pci); + const char *iommufdDir = "/dev/iommu"; + if (vfiofdDev) { + int ret2 = virSecurityDACSetHostdevLabelHelper(vfiofdDev, + false, + &cbdata); + if (ret2 < 0) + ret = ret2; + ret2 = virSecurityDACSetHostdevLabelHelper(iommufdDir, + false, + &cbdata); + if (ret2 < 0) + ret = ret2; + } else { + return -1; + } + } } else { ret = virPCIDeviceFileIterate(pci, virSecurityDACSetPCILabel, @@ -1450,6 +1468,22 @@ virSecurityDACRestoreHostdevLabel(virSecurityManager *mgr, ret = virSecurityDACRestoreFileLabelInternal(mgr, NULL, vfioGroupDev, false); + if (dev->source.subsys.u.pci.driver.iommufd) { + g_autofree char *vfiofdDev = virPCIDeviceGetIOMMUFDDev(pci); + const char *iommufdDir = "/dev/iommu"; + if (vfiofdDev) { + int ret2 = virSecurityDACRestoreFileLabelInternal(mgr, NULL, + vfiofdDev, false); + if (ret2 < 0) + ret = ret2; + ret2 = virSecurityDACRestoreFileLabelInternal(mgr, NULL, + iommufdDir, false); + if (ret2 < 0) + ret = ret2; + } else { + return -1; + } + } } else { ret = virPCIDeviceFileIterate(pci, virSecurityDACRestorePCILabel, mgr); } diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index fa5d1568eb..8f6b23846b 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -2248,6 +2248,25 @@ virSecuritySELinuxSetHostdevSubsysLabel(virSecurityManager *mgr, ret = virSecuritySELinuxSetHostdevLabelHelper(vfioGroupDev, false, &data); + if (dev->source.subsys.u.pci.driver.iommufd) { + g_autofree char *vfiofdDev = virPCIDeviceGetIOMMUFDDev(pci); + const char *iommufdDir = "/dev/iommu"; + if (vfiofdDev) { + int ret2 = virSecuritySELinuxSetHostdevLabelHelper(vfiofdDev, + false, + &data); + if (ret2 < 0) + ret = ret2; + ret2 = virSecuritySELinuxSetHostdevLabelHelper(iommufdDir, + false, + &data); + if (ret2 < 0) + ret = ret2; + } else { + return -1; + } + } + } else { ret = virPCIDeviceFileIterate(pci, virSecuritySELinuxSetPCILabel, &data); } @@ -2481,6 +2500,17 @@ virSecuritySELinuxRestoreHostdevSubsysLabel(virSecurityManager *mgr, return -1; ret = virSecuritySELinuxRestoreFileLabel(mgr, vfioGroupDev, false); + + if (dev->source.subsys.u.pci.driver.iommufd) { + g_autofree char *vfiofdDev = virPCIDeviceGetIOMMUFDDev(pci); + if (vfiofdDev) { + int ret2 = virSecuritySELinuxRestoreFileLabel(mgr, vfiofdDev, false); + if (ret2 < 0) + ret = ret2; + } else { + return -1; + } + } } else { ret = virPCIDeviceFileIterate(pci, virSecuritySELinuxRestorePCILabel, mgr); } diff --git a/src/util/virpci.c b/src/util/virpci.c index 90617e69c6..6e6e5e47c0 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -2478,6 +2478,74 @@ virPCIDeviceGetIOMMUGroupDev(virPCIDevice *dev) return g_strdup_printf("/dev/vfio/%s", groupFile); } +/* virPCIDeviceGetIOMMUFDDev - return the name of the device used + * to control this PCI device's group (e.g. "/dev/vfio/devices/vfio15") + */ +char * +virPCIDeviceGetIOMMUFDDev(virPCIDevice *dev) +{ + g_autofree char *path = NULL; + const char *pci_addr = NULL; + g_autoptr(DIR) dir = NULL; + struct dirent *entry; + char *vfiodev = NULL; + + /* Get PCI device address */ + pci_addr = virPCIDeviceGetName(dev); + if (!pci_addr) + return NULL; + + /* First try: look in PCI device's vfio-dev subdirectory */ + path = g_strdup_printf("/sys/bus/pci/devices/%s/vfio-dev", pci_addr); + + if (virDirOpen(&dir, path) == 1) { + while (virDirRead(dir, &entry, path) > 0) { + if (!g_str_has_prefix(entry->d_name, "vfio")) + continue; + + vfiodev = g_strdup_printf("/dev/vfio/devices/%s", entry->d_name); + break; + } + /* g_autoptr will automatically close dir when it goes out of scope */ + dir = NULL; + } + + /* Second try: scan /sys/class/vfio-dev for matching device */ + if (!vfiodev) { + g_free(path); + path = g_strdup("/sys/class/vfio-dev"); + + if (virDirOpen(&dir, path) == 1) { + while (virDirRead(dir, &entry, path) > 0) { + g_autofree char *dev_link = NULL; + g_autofree char *target = NULL; + + if (!g_str_has_prefix(entry->d_name, "vfio")) + continue; + + dev_link = g_strdup_printf("/sys/class/vfio-dev/%s/device", entry->d_name); + + if (virFileResolveLink(dev_link, &target) < 0) + continue; + + if (strstr(target, pci_addr)) { + vfiodev = g_strdup_printf("/dev/vfio/devices/%s", entry->d_name); + break; + } + } + /* g_autoptr will automatically close dir */ + } + } + + /* Verify the device path exists and is accessible */ + if (vfiodev && !virFileExists(vfiodev)) { + VIR_FREE(vfiodev); + return NULL; + } + + return vfiodev; +} + static int virPCIDeviceDownstreamLacksACS(virPCIDevice *dev) { diff --git a/src/util/virpci.h b/src/util/virpci.h index fc538566e1..996ffab2f9 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -203,6 +203,7 @@ int virPCIDeviceAddressGetIOMMUGroupNum(virPCIDeviceAddress *addr); char *virPCIDeviceAddressGetIOMMUGroupDev(const virPCIDeviceAddress *devAddr); bool virPCIDeviceExists(const virPCIDeviceAddress *addr); char *virPCIDeviceGetIOMMUGroupDev(virPCIDevice *dev); +char *virPCIDeviceGetIOMMUFDDev(virPCIDevice *dev); int virPCIDeviceIsAssignable(virPCIDevice *dev, int strict_acs_check); -- 2.43.0

Provide sample XML and CLI args for the iommufd XML schema for pc, q35, and virt machine types. Signed-off-by: Nathan Chen <nathanc@nvidia.com> --- .../iommufd-q35.x86_64-latest.args | 41 +++++++++++++ .../iommufd-q35.x86_64-latest.xml | 60 +++++++++++++++++++ tests/qemuxmlconfdata/iommufd-q35.xml | 38 ++++++++++++ .../iommufd-virt.aarch64-latest.args | 33 ++++++++++ .../iommufd-virt.aarch64-latest.xml | 34 +++++++++++ tests/qemuxmlconfdata/iommufd-virt.xml | 22 +++++++ .../iommufd.x86_64-latest.args | 35 +++++++++++ .../qemuxmlconfdata/iommufd.x86_64-latest.xml | 38 ++++++++++++ tests/qemuxmlconfdata/iommufd.xml | 30 ++++++++++ tests/qemuxmlconftest.c | 4 ++ 10 files changed, 335 insertions(+) create mode 100644 tests/qemuxmlconfdata/iommufd-q35.x86_64-latest.args create mode 100644 tests/qemuxmlconfdata/iommufd-q35.x86_64-latest.xml create mode 100644 tests/qemuxmlconfdata/iommufd-q35.xml create mode 100644 tests/qemuxmlconfdata/iommufd-virt.aarch64-latest.args create mode 100644 tests/qemuxmlconfdata/iommufd-virt.aarch64-latest.xml create mode 100644 tests/qemuxmlconfdata/iommufd-virt.xml create mode 100644 tests/qemuxmlconfdata/iommufd.x86_64-latest.args create mode 100644 tests/qemuxmlconfdata/iommufd.x86_64-latest.xml create mode 100644 tests/qemuxmlconfdata/iommufd.xml diff --git a/tests/qemuxmlconfdata/iommufd-q35.x86_64-latest.args b/tests/qemuxmlconfdata/iommufd-q35.x86_64-latest.args new file mode 100644 index 0000000000..7d819e141b --- /dev/null +++ b/tests/qemuxmlconfdata/iommufd-q35.x86_64-latest.args @@ -0,0 +1,41 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/var/lib/libvirt/qemu/domain--1-q35-test \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/var/lib/libvirt/qemu/domain--1-q35-test/.local/share \ +XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain--1-q35-test/.cache \ +XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain--1-q35-test/.config \ +/usr/bin/qemu-system-x86_64 \ +-name guest=q35-test,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain--1-q35-test/master-key.aes"}' \ +-machine q35,usb=off,dump-guest-core=off,memory-backend=pc.ram,acpi=off \ +-accel tcg \ +-cpu qemu64 \ +-m size=2097152k \ +-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":2147483648}' \ +-overcommit mem-lock=off \ +-smp 2,sockets=2,cores=1,threads=1 \ +-uuid 11dbdcdd-4c3b-482b-8903-9bdb8c0a2774 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot strict=on \ +-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \ +-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \ +-device '{"driver":"qemu-xhci","id":"usb","bus":"pci.1","addr":"0x0"}' \ +-blockdev '{"driver":"host_device","filename":"/dev/HostVG/QEMUGuest1","node-name":"libvirt-1-storage","read-only":false}' \ +-device '{"driver":"ide-hd","bus":"ide.0","drive":"libvirt-1-storage","id":"sata0-0-0","bootindex":1}' \ +-audiodev '{"id":"audio1","driver":"none"}' \ +-device '{"driver":"qxl-vga","id":"video0","max_outputs":1,"ram_size":67108864,"vram_size":33554432,"vram64_size_mb":0,"vgamem_mb":8,"bus":"pcie.0","addr":"0x1"}' \ +-global ICH9-LPC.noreboot=off \ +-watchdog-action reset \ +-object '{"qom-type":"iommufd","id":"iommufd0","fd":"-1"}' \ +-device '{"driver":"vfio-pci","host":"0000:06:12.5","id":"hostdev0","iommufd":"iommufd0","fd":"0","bus":"pcie.0","addr":"0x3"}' \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxmlconfdata/iommufd-q35.x86_64-latest.xml b/tests/qemuxmlconfdata/iommufd-q35.x86_64-latest.xml new file mode 100644 index 0000000000..bb76252b61 --- /dev/null +++ b/tests/qemuxmlconfdata/iommufd-q35.x86_64-latest.xml @@ -0,0 +1,60 @@ +<domain type='qemu'> + <name>q35-test</name> + <uuid>11dbdcdd-4c3b-482b-8903-9bdb8c0a2774</uuid> + <memory unit='KiB'>2097152</memory> + <currentMemory unit='KiB'>2097152</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='q35'>hvm</type> + <boot dev='hd'/> + </os> + <cpu mode='custom' match='exact' check='none'> + <model fallback='forbid'>qemu64</model> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <disk type='block' device='disk'> + <driver name='qemu' type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='sda' bus='sata'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='pci' index='0' model='pcie-root'/> + <controller type='pci' index='1' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='1' port='0x10'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/> + </controller> + <controller type='pci' index='2' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='2' port='0x11'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/> + </controller> + <controller type='sata' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/> + </controller> + <controller type='usb' index='0' model='qemu-xhci'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> + </controller> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <audio id='1' type='none'/> + <video> + <model type='qxl' ram='65536' vram='32768' vgamem='8192' heads='1' primary='yes'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> + </video> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver iommufd='yes'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </hostdev> + <watchdog model='itco' action='reset'/> + <memballoon model='none'/> + </devices> +</domain> diff --git a/tests/qemuxmlconfdata/iommufd-q35.xml b/tests/qemuxmlconfdata/iommufd-q35.xml new file mode 100644 index 0000000000..f3c2269fb1 --- /dev/null +++ b/tests/qemuxmlconfdata/iommufd-q35.xml @@ -0,0 +1,38 @@ +<domain type='qemu'> + <name>q35-test</name> + <uuid>11dbdcdd-4c3b-482b-8903-9bdb8c0a2774</uuid> + <memory unit='KiB'>2097152</memory> + <currentMemory unit='KiB'>2097152</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='q35'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='sda' bus='sata'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='pci' index='0' model='pcie-root'/> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver iommufd='yes'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </hostdev> + <controller type='sata' index='0'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <video> + <model type='qxl' ram='65536' vram='32768' vgamem='8192' heads='1'/> + </video> + <memballoon model='none'/> + </devices> +</domain> diff --git a/tests/qemuxmlconfdata/iommufd-virt.aarch64-latest.args b/tests/qemuxmlconfdata/iommufd-virt.aarch64-latest.args new file mode 100644 index 0000000000..dbfd395168 --- /dev/null +++ b/tests/qemuxmlconfdata/iommufd-virt.aarch64-latest.args @@ -0,0 +1,33 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/var/lib/libvirt/qemu/domain--1-foo \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/var/lib/libvirt/qemu/domain--1-foo/.local/share \ +XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain--1-foo/.cache \ +XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain--1-foo/.config \ +/usr/bin/qemu-system-aarch64 \ +-name guest=foo,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain--1-foo/master-key.aes"}' \ +-machine virt,usb=off,gic-version=2,dump-guest-core=off,memory-backend=mach-virt.ram,acpi=off \ +-accel tcg \ +-cpu cortex-a15 \ +-m size=1048576k \ +-object '{"qom-type":"memory-backend-ram","id":"mach-virt.ram","size":1073741824}' \ +-overcommit mem-lock=off \ +-smp 1,sockets=1,cores=1,threads=1 \ +-uuid 6ba7b810-9dad-11d1-80b4-00c04fd430c8 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot strict=on \ +-audiodev '{"id":"audio1","driver":"none"}' \ +-object '{"qom-type":"iommufd","id":"iommufd0","fd":"-1"}' \ +-device '{"driver":"vfio-pci","host":"0000:06:12.5","id":"hostdev0","iommufd":"iommufd0","fd":"0","bus":"pcie.0","addr":"0x1"}' \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxmlconfdata/iommufd-virt.aarch64-latest.xml b/tests/qemuxmlconfdata/iommufd-virt.aarch64-latest.xml new file mode 100644 index 0000000000..97b6e1e1c7 --- /dev/null +++ b/tests/qemuxmlconfdata/iommufd-virt.aarch64-latest.xml @@ -0,0 +1,34 @@ +<domain type='qemu'> + <name>foo</name> + <uuid>6ba7b810-9dad-11d1-80b4-00c04fd430c8</uuid> + <memory unit='KiB'>1048576</memory> + <currentMemory unit='KiB'>1048576</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='aarch64' machine='virt'>hvm</type> + <boot dev='hd'/> + </os> + <features> + <gic version='2'/> + </features> + <cpu mode='custom' match='exact' check='none'> + <model fallback='forbid'>cortex-a15</model> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-aarch64</emulator> + <controller type='pci' index='0' model='pcie-root'/> + <audio id='1' type='none'/> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver iommufd='yes'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> + </hostdev> + <memballoon model='none'/> + </devices> +</domain> diff --git a/tests/qemuxmlconfdata/iommufd-virt.xml b/tests/qemuxmlconfdata/iommufd-virt.xml new file mode 100644 index 0000000000..c0b9d643b4 --- /dev/null +++ b/tests/qemuxmlconfdata/iommufd-virt.xml @@ -0,0 +1,22 @@ +<domain type='qemu'> + <name>foo</name> + <uuid>6ba7b810-9dad-11d1-80b4-00c04fd430c8</uuid> + <memory unit='KiB'>1048576</memory> + <currentMemory unit='KiB'>1048576</currentMemory> + <vcpu placement='static'>1</vcpu> + <os> + <type arch='aarch64' machine='virt'>hvm</type> + </os> + <devices> + <emulator>/usr/bin/qemu-system-aarch64</emulator> + <controller type='pci' index='0' model='pcie-root'/> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver iommufd='yes'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> + </hostdev> + <memballoon model='none'/> + </devices> +</domain> diff --git a/tests/qemuxmlconfdata/iommufd.x86_64-latest.args b/tests/qemuxmlconfdata/iommufd.x86_64-latest.args new file mode 100644 index 0000000000..3130ba2e3a --- /dev/null +++ b/tests/qemuxmlconfdata/iommufd.x86_64-latest.args @@ -0,0 +1,35 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/var/lib/libvirt/qemu/domain--1-foo \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/var/lib/libvirt/qemu/domain--1-foo/.local/share \ +XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain--1-foo/.cache \ +XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain--1-foo/.config \ +/usr/bin/qemu-system-x86_64 \ +-name guest=foo,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain--1-foo/master-key.aes"}' \ +-machine pc,usb=off,dump-guest-core=off,memory-backend=pc.ram,acpi=off \ +-accel tcg \ +-cpu qemu64 \ +-m size=2097152k \ +-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":2147483648}' \ +-overcommit mem-lock=off \ +-smp 2,sockets=2,cores=1,threads=1 \ +-uuid 3c7c30b5-7866-4b05-8a29-efebccba52a0 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot strict=on \ +-device '{"driver":"piix3-usb-uhci","id":"usb","bus":"pci.0","addr":"0x1.0x2"}' \ +-audiodev '{"id":"audio1","driver":"none"}' \ +-object '{"qom-type":"iommufd","id":"iommufd0","fd":"-1"}' \ +-device '{"driver":"vfio-pci","host":"0000:06:12.5","id":"hostdev0","iommufd":"iommufd0","fd":"0","bus":"pci.0","addr":"0x3"}' \ +-device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.0","addr":"0x2"}' \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxmlconfdata/iommufd.x86_64-latest.xml b/tests/qemuxmlconfdata/iommufd.x86_64-latest.xml new file mode 100644 index 0000000000..2e8951aaf6 --- /dev/null +++ b/tests/qemuxmlconfdata/iommufd.x86_64-latest.xml @@ -0,0 +1,38 @@ +<domain type='qemu'> + <name>foo</name> + <uuid>3c7c30b5-7866-4b05-8a29-efebccba52a0</uuid> + <memory unit='KiB'>2097152</memory> + <currentMemory unit='KiB'>2097152</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <cpu mode='custom' match='exact' check='none'> + <model fallback='forbid'>qemu64</model> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <controller type='pci' index='0' model='pci-root'/> + <controller type='usb' index='0' model='piix3-uhci'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <audio id='1' type='none'/> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver iommufd='yes'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </hostdev> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxmlconfdata/iommufd.xml b/tests/qemuxmlconfdata/iommufd.xml new file mode 100644 index 0000000000..eb278414d2 --- /dev/null +++ b/tests/qemuxmlconfdata/iommufd.xml @@ -0,0 +1,30 @@ +<domain type='qemu'> + <name>foo</name> + <uuid>3c7c30b5-7866-4b05-8a29-efebccba52a0</uuid> + <memory unit='KiB'>2097152</memory> + <currentMemory unit='KiB'>2097152</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <controller type='pci' index='0' model='pci-root'/> + <hostdev mode='subsystem' type='pci' managed='yes'> + <driver iommufd='yes'/> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x5'/> + </source> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </hostdev> + <controller type='usb' index='0'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxmlconftest.c b/tests/qemuxmlconftest.c index 045e56658e..cf9428183d 100644 --- a/tests/qemuxmlconftest.c +++ b/tests/qemuxmlconftest.c @@ -2846,6 +2846,10 @@ mymain(void) DO_TEST_CAPS_LATEST_PARSE_ERROR("virtio-iommu-dma-translation"); DO_TEST_CAPS_LATEST("acpi-generic-initiator"); + DO_TEST_CAPS_LATEST("iommufd"); + DO_TEST_CAPS_LATEST("iommufd-q35"); + DO_TEST_CAPS_ARCH_LATEST("iommufd-virt", "aarch64"); + DO_TEST_CAPS_LATEST("cpu-hotplug-startup"); DO_TEST_CAPS_ARCH_LATEST_PARSE_ERROR("cpu-hotplug-granularity", "ppc64"); -- 2.43.0
participants (1)
-
Nathan Chen