[PATCH v6 0/8] qemu: acpi-generic-initiator support

= Overview =

This patch set introduces support for acpi-generic-initiator devices,
supported by QEMU [1].

The acpi-generic-initiator object is required to support Multi-Instance
GPU (MIG) configurations on NVIDIA GPUs [2]. MIG enables partitioning of
GPU resources into multiple isolated instances, each requiring a
dedicated virtual NUMA node definition.

= Implementation =

This patch set implements the libvirt counterpart to the new QEMU
feature, enabling users to configure acpi-generic-initiator objects
within libvirt domain XML in an abstracted way, used to configure the
GPU partitioning.

This includes:

- adding a "nodeset" attribute to the <acpi> object to assign the list
  of virtual NUMA nodes to a PCI device,
- resolving the nodeset definitions into the proper QEMU command-line
  arguments,
- ensuring compatibility with existing PCI and NUMA configuration.

= Example =

- Domain XML:

```
  ...
  <cpu mode='host-passthrough' check='none'>
    <numa>
      <cell id='0' cpus='0-15' memory='8388608' unit='KiB'/>
      <cell id='1' memory='0' unit='KiB'/>
      <cell id='2' memory='0' unit='KiB'/>
      <cell id='3' memory='0' unit='KiB'/>
      <cell id='4' memory='0' unit='KiB'/>
      <cell id='5' memory='0' unit='KiB'/>
      <cell id='6' memory='0' unit='KiB'/>
      <cell id='7' memory='0' unit='KiB'/>
      <cell id='8' memory='0' unit='KiB'/>
    </numa>
  </cpu>
  ...
  <devices>
    ...
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      <acpi nodeset='1-8'/>
    </hostdev>
  </devices>
```

- Generated QEMU command line options:

```
  ... /usr/bin/qemu-system-aarch64 \
  ...
  -object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":8589934592}' \
  -numa node,nodeid=0,cpus=0-15,memdev=ram-node0 \
  -numa node,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
  -numa node,nodeid=4 \
  -numa node,nodeid=5 \
  -numa node,nodeid=6 \
  -numa node,nodeid=7 \
  -numa node,nodeid=8 \
  ...
  -device '{"driver":"vfio-pci","host":"0009:01:00.0","id":"hostdev0","bus":"pci.3","addr":"0x0"}'
  -object acpi-generic-initiator,id=gi0,pci-dev=hostdev0,node=1 \
  -object acpi-generic-initiator,id=gi1,pci-dev=hostdev0,node=2 \
  -object acpi-generic-initiator,id=gi2,pci-dev=hostdev0,node=3 \
  -object acpi-generic-initiator,id=gi3,pci-dev=hostdev0,node=4 \
  -object acpi-generic-initiator,id=gi4,pci-dev=hostdev0,node=5 \
  -object acpi-generic-initiator,id=gi5,pci-dev=hostdev0,node=6 \
  -object acpi-generic-initiator,id=gi6,pci-dev=hostdev0,node=7 \
  -object acpi-generic-initiator,id=gi7,pci-dev=hostdev0,node=8
```

= References =

[1] https://lore.kernel.org/all/20231225045603.7654-2-ankita@nvidia.com/
[2] https://www.nvidia.com/en-in/technologies/multi-instance-gpu/

ChangeLog v5 -> v6:
 - Add a "nodeset" attribute to <acpi>, instead of introducing a new
   <acpi-generic-initiator> object (suggested by Daniel)
 - Rebase to v11.7.0

ChangeLog v4 -> v5:
 - Integrate suggestions and changes from Michal's review
 - Update qemu capabilities
 - Rebase to v11.6.0

ChangeLog v3 -> v4:
 - add acpi-generic-initiator documentation
 - refactor virDomainAcpiInitiatorDef to use info->alias and drop the
   name attribute
 - auto-generate alias for the acpi-generic-initiator devices via
   qemuAssignDeviceAliases()
 - use g_autoptr() when possible
 - add a new entry to NEWS.rst

ChangeLog v2 -> v3:
 - replaced <text/> with proper types in the XML schema
 - avoid mixing g_free() and VIR_FREE()
 - use virXMLPropString() instead of looping all XML nodes
 - report proper errors with virReportError()
 - use virBufferEscapeString() to process strings passed by the user
 - fix broken formatting of function headers
 - misc coding style fixes

ChangeLog v1 -> v2:
 - split parser and driver changes in separate patches
 - introduce a new qemu capability flag
 - introduce test in qemuxmlconftest

Andrea Righi (8):
  qemu: capabilies: Introduce QEMU_CAPS_ACPI_GENERIC_INITIATOR
  qemu: Allow to define NUMA nodes without memory or CPUs assigned
  conf: Add nodeset attribute to the <acpi> element
  qemu: Validate acpi nodeset
  qemu: Generate acpi-generic-initiator command from acpi nodeset
  qemu: Add acpi-generic-initiator unit test
  docs: Document acpi nodeset in hostdev
  NEWS: Mention new acpi-generic-initiator support

 NEWS.rst                                           | 10 ++++
 docs/formatdomain.rst                              | 49 +++++++++++++++++
 src/conf/device_conf.h                             |  3 +
 src/conf/domain_conf.c                             | 30 +++++++++-
 src/conf/numa_conf.c                               |  3 +
 src/conf/schemas/domaincommon.rng                  |  5 ++
 src/qemu/qemu_capabilities.c                       |  2 +
 src/qemu/qemu_capabilities.h                       |  1 +
 src/qemu/qemu_command.c                            | 64 +++++++++++++++++++---
 src/qemu/qemu_validate.c                           |  8 +++
 tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml |  1 +
 .../caps_10.0.0_x86_64+amdsev.xml                  |  1 +
 tests/qemucapabilitiesdata/caps_10.0.0_x86_64.xml  |  1 +
 .../caps_10.1.0_x86_64+inteltdx.xml                |  1 +
 tests/qemucapabilitiesdata/caps_10.1.0_x86_64.xml  |  1 +
 tests/qemucapabilitiesdata/caps_10.2.0_x86_64.xml  |  1 +
 tests/qemucapabilitiesdata/caps_9.0.0_x86_64.xml   |  1 +
 tests/qemucapabilitiesdata/caps_9.1.0_riscv64.xml  |  1 +
 tests/qemucapabilitiesdata/caps_9.1.0_x86_64.xml   |  1 +
 .../caps_9.2.0_aarch64+hvf.xml                     |  1 +
 .../caps_9.2.0_x86_64+amdsev.xml                   |  1 +
 tests/qemucapabilitiesdata/caps_9.2.0_x86_64.xml   |  1 +
 .../acpi-generic-initiator.x86_64-latest.args      | 55 +++++++++++++++++++
 .../acpi-generic-initiator.x86_64-latest.xml       | 63 +++++++++++++++++++++
 tests/qemuxmlconfdata/acpi-generic-initiator.xml   | 63 +++++++++++++++++++++
 tests/qemuxmlconftest.c                            |  1 +
 26 files changed, 360 insertions(+), 9 deletions(-)
 create mode 100644 tests/qemuxmlconfdata/acpi-generic-initiator.x86_64-latest.args
 create mode 100644 tests/qemuxmlconfdata/acpi-generic-initiator.x86_64-latest.xml
 create mode 100644 tests/qemuxmlconfdata/acpi-generic-initiator.xml
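To illustrate how a nodeset such as '1-8' expands into one
acpi-generic-initiator object per node, here is a minimal standalone
sketch. It is not the libvirt implementation: the series relies on
virBitmapParse() and JSON properties, while parse_nodeset() and
format_initiator() below are hypothetical helpers written only for this
illustration, and the comma-separated object syntax mirrors the example
command line above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a libvirt API): expand a nodeset string such
 * as "1-8" or "1,3,5-6" into individual node ids.
 * Returns the number of nodes stored, or -1 on malformed input. */
static int
parse_nodeset(const char *str, int *nodes, size_t max_nodes)
{
    const char *p = str;
    size_t count = 0;

    assert(str != NULL && nodes != NULL);

    /* An empty nodeset is treated as invalid in this sketch. */
    if (strlen(str) == 0)
        return -1;

    for (;;) {
        char *end = NULL;
        long first;
        long last;

        if (*p < '0' || *p > '9')
            return -1;
        first = strtol(p, &end, 10);
        last = first;
        p = end;

        /* Optional "-N" suffix turns a single id into a range. */
        if (*p == '-') {
            p++;
            if (*p < '0' || *p > '9')
                return -1;
            last = strtol(p, &end, 10);
            if (last < first)
                return -1;
            p = end;
        }

        for (long n = first; n <= last; n++) {
            if (count >= max_nodes)
                return -1;
            nodes[count++] = (int)n;
        }

        if (*p == '\0')
            break;
        if (*p != ',')
            return -1;
        p++;
    }

    return (int)count;
}

/* Hypothetical helper: format the value of one "-object" argument. */
static int
format_initiator(char *buf, size_t len, unsigned int index,
                 const char *pci_dev, int node)
{
    return snprintf(buf, len,
                    "acpi-generic-initiator,id=gi%u,pci-dev=%s,node=%d",
                    index, pci_dev, node);
}
```

Calling parse_nodeset("1-8", ...) and then format_initiator() once per
returned node, with an increasing index, yields the eight objects
gi0..gi7 listed in the example.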

This capability tracks whether QEMU supports the acpi-generic-initiator
object type.

This object has been introduced in QEMU with the commit: b64b7ed8bb
("qom: new object to associate device to NUMA node").

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 src/qemu/qemu_capabilities.c                               | 2 ++
 src/qemu/qemu_capabilities.h                               | 1 +
 tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml         | 1 +
 tests/qemucapabilitiesdata/caps_10.0.0_x86_64+amdsev.xml   | 1 +
 tests/qemucapabilitiesdata/caps_10.0.0_x86_64.xml          | 1 +
 tests/qemucapabilitiesdata/caps_10.1.0_x86_64+inteltdx.xml | 1 +
 tests/qemucapabilitiesdata/caps_10.1.0_x86_64.xml          | 1 +
 tests/qemucapabilitiesdata/caps_10.2.0_x86_64.xml          | 1 +
 tests/qemucapabilitiesdata/caps_9.0.0_x86_64.xml           | 1 +
 tests/qemucapabilitiesdata/caps_9.1.0_riscv64.xml          | 1 +
 tests/qemucapabilitiesdata/caps_9.1.0_x86_64.xml           | 1 +
 tests/qemucapabilitiesdata/caps_9.2.0_aarch64+hvf.xml      | 1 +
 tests/qemucapabilitiesdata/caps_9.2.0_x86_64+amdsev.xml    | 1 +
 tests/qemucapabilitiesdata/caps_9.2.0_x86_64.xml           | 1 +
 14 files changed, 15 insertions(+)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index 688d100b01..d06e4e12db 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -742,6 +742,7 @@ VIR_ENUM_IMPL(virQEMUCaps,
               "amd-iommu.pci-id", /* QEMU_CAPS_AMD_IOMMU_PCI_ID */
               "usb-bot", /* QEMU_CAPS_DEVICE_USB_BOT */
               "tdx-guest", /* QEMU_CAPS_TDX_GUEST */
+              "acpi-generic-initiator", /* QEMU_CAPS_ACPI_GENERIC_INITIATOR */
     );
 
 
@@ -1434,6 +1435,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = {
     { "tpm-spapr", QEMU_CAPS_DEVICE_TPM_SPAPR },
     { "tpm-emulator", QEMU_CAPS_DEVICE_TPM_EMULATOR },
     { "tpm-passthrough", QEMU_CAPS_DEVICE_TPM_PASSTHROUGH },
+    { "acpi-generic-initiator", QEMU_CAPS_ACPI_GENERIC_INITIATOR },
 };
 
 
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index 8916973364..e8cda1e058 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -723,6 +723,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */
     QEMU_CAPS_AMD_IOMMU_PCI_ID, /* amd-iommu.pci-id */
     QEMU_CAPS_DEVICE_USB_BOT, /* -device usb-bot */
     QEMU_CAPS_TDX_GUEST, /* -object tdx-guest,... */
+    QEMU_CAPS_ACPI_GENERIC_INITIATOR, /* -object acpi-generic-initiator */
 
     QEMU_CAPS_LAST /* this must always be the last item */
 } virQEMUCapsFlags;
diff --git a/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml b/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml
index 2b071735a9..43d8488a12 100644
--- a/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml
+++ b/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml
@@ -163,6 +163,7 @@
   <flag name='nvme'/>
   <flag name='nvme-ns'/>
   <flag name='usb-bot'/>
+  <flag name='acpi-generic-initiator'/>
   <version>10000000</version>
   <microcodeVersion>61700285</microcodeVersion>
   <package>v10.0.0</package>
diff --git a/tests/qemucapabilitiesdata/caps_10.0.0_x86_64+amdsev.xml b/tests/qemucapabilitiesdata/caps_10.0.0_x86_64+amdsev.xml
index 4f15e424e7..b83de7cc4d 100644
--- a/tests/qemucapabilitiesdata/caps_10.0.0_x86_64+amdsev.xml
+++ b/tests/qemucapabilitiesdata/caps_10.0.0_x86_64+amdsev.xml
@@ -209,6 +209,7 @@
   <flag name='nvme-ns'/>
   <flag name='amd-iommu'/>
   <flag name='usb-bot'/>
+  <flag name='acpi-generic-initiator'/>
   <version>10000000</version>
   <microcodeVersion>43100285</microcodeVersion>
   <package>v10.0.0</package>
diff --git a/tests/qemucapabilitiesdata/caps_10.0.0_x86_64.xml b/tests/qemucapabilitiesdata/caps_10.0.0_x86_64.xml
index 9946ed7d3b..4545de53ef 100644
--- a/tests/qemucapabilitiesdata/caps_10.0.0_x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_10.0.0_x86_64.xml
@@ -209,6 +209,7 @@
   <flag name='amd-iommu'/>
   <flag name='amd-iommu.pci-id'/>
   <flag name='usb-bot'/>
+  <flag name='acpi-generic-initiator'/>
   <version>10000000</version>
   <microcodeVersion>43100285</microcodeVersion>
   <package>v10.0.0</package>
diff --git a/tests/qemucapabilitiesdata/caps_10.1.0_x86_64+inteltdx.xml b/tests/qemucapabilitiesdata/caps_10.1.0_x86_64+inteltdx.xml
index e79a4f3e81..3381f0bafa 100644
--- a/tests/qemucapabilitiesdata/caps_10.1.0_x86_64+inteltdx.xml
+++ b/tests/qemucapabilitiesdata/caps_10.1.0_x86_64+inteltdx.xml
@@ -191,6 +191,7 @@
   <flag name='amd-iommu.pci-id'/>
   <flag name='usb-bot'/>
   <flag name='tdx-guest'/>
+  <flag name='acpi-generic-initiator'/>
   <version>10000050</version>
   <microcodeVersion>43100286</microcodeVersion>
   <package>v10.0.0-1724-gf9a3def17b</package>
diff --git a/tests/qemucapabilitiesdata/caps_10.1.0_x86_64.xml b/tests/qemucapabilitiesdata/caps_10.1.0_x86_64.xml
index dc3088ba2c..2ae3305ba9 100644
--- a/tests/qemucapabilitiesdata/caps_10.1.0_x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_10.1.0_x86_64.xml
@@ -213,6 +213,7 @@
   <flag name='amd-iommu.pci-id'/>
   <flag name='usb-bot'/>
   <flag name='tdx-guest'/>
+  <flag name='acpi-generic-initiator'/>
   <version>10001000</version>
   <microcodeVersion>43100286</microcodeVersion>
   <package>v10.1.0</package>
diff --git a/tests/qemucapabilitiesdata/caps_10.2.0_x86_64.xml b/tests/qemucapabilitiesdata/caps_10.2.0_x86_64.xml
index 07826b1a6e..174053183c 100644
--- a/tests/qemucapabilitiesdata/caps_10.2.0_x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_10.2.0_x86_64.xml
@@ -213,6 +213,7 @@
   <flag name='amd-iommu.pci-id'/>
   <flag name='usb-bot'/>
   <flag name='tdx-guest'/>
+  <flag name='acpi-generic-initiator'/>
   <version>10001050</version>
   <microcodeVersion>43100287</microcodeVersion>
   <package>v10.1.0-1-ge771ba98de</package>
diff --git a/tests/qemucapabilitiesdata/caps_9.0.0_x86_64.xml b/tests/qemucapabilitiesdata/caps_9.0.0_x86_64.xml
index 4d3066bb11..4953de2247 100644
--- a/tests/qemucapabilitiesdata/caps_9.0.0_x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_9.0.0_x86_64.xml
@@ -204,6 +204,7 @@
   <flag name='nvme-ns'/>
   <flag name='amd-iommu'/>
   <flag name='usb-bot'/>
+  <flag name='acpi-generic-initiator'/>
   <version>9000000</version>
   <microcodeVersion>43100245</microcodeVersion>
   <package>v9.0.0</package>
diff --git a/tests/qemucapabilitiesdata/caps_9.1.0_riscv64.xml b/tests/qemucapabilitiesdata/caps_9.1.0_riscv64.xml
index a0bae85971..9ceefed89f 100644
--- a/tests/qemucapabilitiesdata/caps_9.1.0_riscv64.xml
+++ b/tests/qemucapabilitiesdata/caps_9.1.0_riscv64.xml
@@ -162,6 +162,7 @@
   <flag name='nvme'/>
   <flag name='nvme-ns'/>
   <flag name='usb-bot'/>
+  <flag name='acpi-generic-initiator'/>
   <version>9001000</version>
   <microcodeVersion>0</microcodeVersion>
   <package>v9.1.0</package>
diff --git a/tests/qemucapabilitiesdata/caps_9.1.0_x86_64.xml b/tests/qemucapabilitiesdata/caps_9.1.0_x86_64.xml
index e203286df1..df062944e2 100644
--- a/tests/qemucapabilitiesdata/caps_9.1.0_x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_9.1.0_x86_64.xml
@@ -203,6 +203,7 @@
   <flag name='nvme-ns'/>
   <flag name='amd-iommu'/>
   <flag name='usb-bot'/>
+  <flag name='acpi-generic-initiator'/>
   <version>9001000</version>
   <microcodeVersion>43100246</microcodeVersion>
   <package>v9.1.0</package>
diff --git a/tests/qemucapabilitiesdata/caps_9.2.0_aarch64+hvf.xml b/tests/qemucapabilitiesdata/caps_9.2.0_aarch64+hvf.xml
index 50d78138f5..ede8e9fca0 100644
--- a/tests/qemucapabilitiesdata/caps_9.2.0_aarch64+hvf.xml
+++ b/tests/qemucapabilitiesdata/caps_9.2.0_aarch64+hvf.xml
@@ -135,6 +135,7 @@
   <flag name='nvme'/>
   <flag name='nvme-ns'/>
   <flag name='usb-bot'/>
+  <flag name='acpi-generic-initiator'/>
   <version>9002002</version>
   <microcodeVersion>61700247</microcodeVersion>
   <package></package>
diff --git a/tests/qemucapabilitiesdata/caps_9.2.0_x86_64+amdsev.xml b/tests/qemucapabilitiesdata/caps_9.2.0_x86_64+amdsev.xml
index e94093a201..048d1b1462 100644
--- a/tests/qemucapabilitiesdata/caps_9.2.0_x86_64+amdsev.xml
+++ b/tests/qemucapabilitiesdata/caps_9.2.0_x86_64+amdsev.xml
@@ -207,6 +207,7 @@
   <flag name='nvme-ns'/>
   <flag name='amd-iommu'/>
   <flag name='usb-bot'/>
+  <flag name='acpi-generic-initiator'/>
   <version>9002000</version>
   <microcodeVersion>43100247</microcodeVersion>
   <package>v9.2.0</package>
diff --git a/tests/qemucapabilitiesdata/caps_9.2.0_x86_64.xml b/tests/qemucapabilitiesdata/caps_9.2.0_x86_64.xml
index 889576d1f7..dd2d876cad 100644
--- a/tests/qemucapabilitiesdata/caps_9.2.0_x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_9.2.0_x86_64.xml
@@ -205,6 +205,7 @@
   <flag name='nvme-ns'/>
   <flag name='amd-iommu'/>
   <flag name='usb-bot'/>
+  <flag name='acpi-generic-initiator'/>
   <version>9002000</version>
   <microcodeVersion>43100247</microcodeVersion>
   <package>v9.2.0</package>
-- 
2.51.0

On Sat, Sep 06, 2025 at 03:08:56PM +0200, Andrea Righi wrote:
This capability tracks whether QEMU supports the acpi-generic-initiator object type.
This object has been introduced in QEMU with the commit: b64b7ed8bb ("qom: new object to associate device to NUMA node").
Signed-off-by: Andrea Righi <arighi@nvidia.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_10.0.0_x86_64+amdsev.xml | 1 + tests/qemucapabilitiesdata/caps_10.0.0_x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_10.1.0_x86_64+inteltdx.xml | 1 + tests/qemucapabilitiesdata/caps_10.1.0_x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_10.2.0_x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_9.0.0_x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_9.1.0_riscv64.xml | 1 + tests/qemucapabilitiesdata/caps_9.1.0_x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_9.2.0_aarch64+hvf.xml | 1 + tests/qemucapabilitiesdata/caps_9.2.0_x86_64+amdsev.xml | 1 + tests/qemucapabilitiesdata/caps_9.2.0_x86_64.xml | 1 + 14 files changed, 15 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Allow to define NUMA nodes without memory or CPUs assigned to properly
support the new acpi-generic-initiator device.

This is required because the NUMA nodes passed to the
acpi-generic-initiator object must be independent and not be shared
with other resources, such as CPU or memory.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 src/conf/numa_conf.c    |  3 +++
 src/qemu/qemu_command.c | 19 ++++++++++++-------
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index 00f0c605ee..5b50f3e3f5 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c
@@ -1492,6 +1492,9 @@ virDomainNumaFillCPUsInNode(virDomainNuma *numa,
     if (node >= virDomainNumaGetNodeCount(numa))
         return -1;
 
+    if (virDomainNumaGetNodeMemorySize(numa, node) == 0)
+        return 0;
+
     virBitmapSetAll(maxCPUsBitmap);
 
     for (i = 0; i < numa->nmem_nodes; i++) {
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index e8de386f30..3f9b583985 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -7820,7 +7820,9 @@ qemuBuildNumaCommandLine(virQEMUDriverConfig *cfg,
         }
     }
 
-    if (masterInitiator < 0) {
+    /* HMAT requires a master initiator, so when it's enabled, ensure that
+     * at least one NUMA node has CPUs assigned. */
+    if (hmat && masterInitiator < 0) {
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                        _("At least one NUMA node has to have CPUs"));
         goto cleanup;
@@ -7828,8 +7830,9 @@ qemuBuildNumaCommandLine(virQEMUDriverConfig *cfg,
 
     for (i = 0; i < ncells; i++) {
         ssize_t initiator = virDomainNumaGetNodeInitiator(def->numa, i);
+        unsigned long long memSize = virDomainNumaGetNodeMemorySize(def->numa, i);
 
-        if (needBackend) {
+        if (needBackend && memSize > 0) {
             g_autoptr(virJSONValue) tcProps = NULL;
 
             if (qemuBuildThreadContextProps(&tcProps, &nodeBackends[i],
@@ -7857,11 +7860,13 @@ qemuBuildNumaCommandLine(virQEMUDriverConfig *cfg,
             virBufferAsprintf(&buf, ",initiator=%zd", initiator);
         }
 
-        if (needBackend)
-            virBufferAsprintf(&buf, ",memdev=ram-node%zu", i);
-        else
-            virBufferAsprintf(&buf, ",mem=%llu",
-                              virDomainNumaGetNodeMemorySize(def->numa, i) / 1024);
+        if (memSize > 0) {
+            if (needBackend) {
+                virBufferAsprintf(&buf, ",memdev=ram-node%zu", i);
+            } else {
+                virBufferAsprintf(&buf, ",mem=%llu", memSize / 1024);
+            }
+        }
 
         virCommandAddArgBuffer(cmd, &buf);
     }
-- 
2.51.0
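The effect of the memSize check on the generated "-numa" arguments can
be sketched in isolation as follows. This is a simplified standalone
model, not libvirt code: struct numa_cell, append() and
format_numa_node() are hypothetical names invented for this example, and
only the cpus/memdev/mem suffixes from the hunk above are modelled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one NUMA <cell>. */
struct numa_cell {
    const char *cpus;            /* CPU list such as "0-15", or NULL */
    unsigned long long mem_kib;  /* memory size in KiB, 0 if none */
};

/* Append formatted text at *off; returns 0 on success, -1 on truncation. */
static int
append(char *buf, size_t len, size_t *off, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *off, len - *off, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= len - *off)
        return -1;

    *off += (size_t)n;
    return 0;
}

/* Build the value of one "-numa" argument. The memory suffix is emitted
 * only when the node has memory, mirroring the memSize > 0 check. */
static int
format_numa_node(char *buf, size_t len, size_t id,
                 const struct numa_cell *cell, bool need_backend)
{
    size_t off = 0;

    assert(buf != NULL && cell != NULL && len > 0);

    if (append(buf, len, &off, "node,nodeid=%zu", id) < 0)
        return -1;

    if (cell->cpus && append(buf, len, &off, ",cpus=%s", cell->cpus) < 0)
        return -1;

    if (cell->mem_kib > 0) {
        if (need_backend) {
            if (append(buf, len, &off, ",memdev=ram-node%zu", id) < 0)
                return -1;
        } else {
            /* KiB to MiB, as in the original ",mem=%llu" branch */
            if (append(buf, len, &off, ",mem=%llu", cell->mem_kib / 1024) < 0)
                return -1;
        }
    }

    return 0;
}
```

With a cell of 8388608 KiB and CPUs 0-15 this produces
"node,nodeid=0,cpus=0-15,memdev=ram-node0", while a cell with no CPUs
and no memory produces just "node,nodeid=1", matching the command line
in the cover letter.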

On Sat, Sep 06, 2025 at 03:08:57PM +0200, Andrea Righi wrote:
Allow to define NUMA nodes without memory or CPUs assigned to properly support the new acpi-generic-initiator device.
This is required because the NUMA nodes passed to the acpi-generic-initiator object must be independent and not be shared with other resources, such as CPU or memory.
Signed-off-by: Andrea Righi <arighi@nvidia.com> --- src/conf/numa_conf.c | 3 +++ src/qemu/qemu_command.c | 19 ++++++++++++------- 2 files changed, 15 insertions(+), 7 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

With regards,
Daniel

This enables partitioning of PCI devices into multiple isolated
instances, each requiring a dedicated virtual NUMA node definition.

Link: https://mail.gnu.org/archive/html/qemu-arm/2024-03/msg00358.html
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 src/conf/device_conf.h            |  3 +++
 src/conf/domain_conf.c            | 30 ++++++++++++++++++++++++++++--
 src/conf/schemas/domaincommon.rng |  5 +++++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/src/conf/device_conf.h b/src/conf/device_conf.h
index 2d97410f6e..e570f51824 100644
--- a/src/conf/device_conf.h
+++ b/src/conf/device_conf.h
@@ -185,6 +185,9 @@ struct _virDomainDeviceInfo {
      * cases we might want to prevent that from happening by
      * locking the isolation group */
     bool isolationGroupLocked;
+
+    /* NUMA nodeset affinity for this device */
+    virBitmap *acpiNodeset;
 };
 
 int virDeviceHostdevPCIDriverInfoParseXML(xmlNodePtr node,
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 7766e302ec..8c0bf63925 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -5558,8 +5558,20 @@ virDomainDeviceInfoFormat(virBuffer *buf,
         virBufferAddLit(buf, "/>\n");
     }
 
-    if (info->acpiIndex != 0)
-        virBufferAsprintf(buf, "<acpi index='%u'/>\n", info->acpiIndex);
+    if (info->acpiIndex != 0 || info->acpiNodeset) {
+        virBufferAddLit(buf, "<acpi");
+
+        if (info->acpiIndex != 0)
+            virBufferAsprintf(buf, " index='%u'", info->acpiIndex);
+
+        if (info->acpiNodeset) {
+            g_autofree char *nodeset = virBitmapFormat(info->acpiNodeset);
+            if (nodeset)
+                virBufferAsprintf(buf, " nodeset='%s'", nodeset);
+        }
+
+        virBufferAddLit(buf, "/>\n");
+    }
 
     if (info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE ||
         info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_VIRTIO_S390)
@@ -5884,9 +5896,23 @@ virDomainDeviceInfoParseXML(virDomainXMLOption *xmlopt,
     }
 
     if ((acpi = virXPathNode("./acpi", ctxt))) {
+        g_autofree char *nodeset = NULL;
+
         if (virXMLPropUInt(acpi, "index", 10, VIR_XML_PROP_NONZERO,
                            &info->acpiIndex) < 0)
             goto cleanup;
+
+        if ((nodeset = virXMLPropString(acpi, "nodeset"))) {
+            if (virBitmapParse(nodeset, &info->acpiNodeset,
+                               VIR_DOMAIN_CPUMASK_LEN) < 0)
+                goto cleanup;
+
+            if (virBitmapIsAllClear(info->acpiNodeset)) {
+                virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                               _("Invalid value of 'nodeset': %1$s"), nodeset);
+                goto cleanup;
+            }
+        }
     }
 
     if ((address = virXPathNode("./address", ctxt)) &&
diff --git a/src/conf/schemas/domaincommon.rng b/src/conf/schemas/domaincommon.rng
index e369fb6e81..298afe0b7c 100644
--- a/src/conf/schemas/domaincommon.rng
+++ b/src/conf/schemas/domaincommon.rng
@@ -7454,6 +7454,11 @@
           <ref name="unsignedInt"/>
         </attribute>
       </optional>
+      <optional>
+        <attribute name="nodeset">
+          <ref name="cpuset"/>
+        </attribute>
+      </optional>
     </element>
   </define>
 
-- 
2.51.0
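The formatter change above emits a single <acpi> element carrying
whichever of the two attributes is set. A minimal standalone sketch of
that decision logic, assuming a plain character buffer in place of
virBuffer and an already formatted nodeset string in place of
virBitmapFormat() (format_acpi_element() is a hypothetical name, not a
libvirt function):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the <acpi> element into buf. index == 0 means "no index" and
 * nodeset == NULL means "no nodeset", as in the patch.
 * Returns the number of characters written, 0 when the element is
 * omitted entirely, or -1 if buf is too small. */
static int
format_acpi_element(char *buf, size_t len,
                    unsigned int index, const char *nodeset)
{
    int n;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    /* Neither attribute set: no <acpi> element at all. */
    if (index == 0 && nodeset == NULL)
        return 0;

    if (index != 0 && nodeset != NULL)
        n = snprintf(buf, len, "<acpi index='%u' nodeset='%s'/>\n",
                     index, nodeset);
    else if (index != 0)
        n = snprintf(buf, len, "<acpi index='%u'/>\n", index);
    else
        n = snprintf(buf, len, "<acpi nodeset='%s'/>\n", nodeset);

    if (n < 0 || (size_t)n >= len)
        return -1;

    return n;
}
```

The case with only an index produces the same output as before the
patch, so existing domain XML round-trips unchanged.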

On Sat, Sep 06, 2025 at 03:08:58PM +0200, Andrea Righi wrote:
This enables partitioning of PCI devices into multiple isolated instances, each requiring a dedicated virtual NUMA node definition.
Link: https://mail.gnu.org/archive/html/qemu-arm/2024-03/msg00358.html Signed-off-by: Andrea Righi <arighi@nvidia.com> --- src/conf/device_conf.h | 3 +++ src/conf/domain_conf.c | 30 ++++++++++++++++++++++++++++-- src/conf/schemas/domaincommon.rng | 5 +++++ 3 files changed, 36 insertions(+), 2 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

With regards,
Daniel

On Sat, Sep 06, 2025 at 03:08:58PM +0200, Andrea Righi wrote:
This enables partitioning of PCI devices into multiple isolated instances, each requiring a dedicated virtual NUMA node definition.
Link: https://mail.gnu.org/archive/html/qemu-arm/2024-03/msg00358.html Signed-off-by: Andrea Righi <arighi@nvidia.com> --- src/conf/device_conf.h | 3 +++ src/conf/domain_conf.c | 30 ++++++++++++++++++++++++++++-- src/conf/schemas/domaincommon.rng | 5 +++++ 3 files changed, 36 insertions(+), 2 deletions(-)
diff --git a/src/conf/device_conf.h b/src/conf/device_conf.h index 2d97410f6e..e570f51824 100644 --- a/src/conf/device_conf.h +++ b/src/conf/device_conf.h @@ -185,6 +185,9 @@ struct _virDomainDeviceInfo { * cases we might want to prevent that from happening by * locking the isolation group */ bool isolationGroupLocked; + + /* NUMA nodeset affinity for this device */ + virBitmap *acpiNodeset; };
This needed a virBitmapFree added, so I've fixed that too.

With regards,
Daniel

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 src/qemu/qemu_validate.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
index adba3e4a89..c7ecb467a3 100644
--- a/src/qemu/qemu_validate.c
+++ b/src/qemu/qemu_validate.c
@@ -1717,6 +1717,14 @@ qemuValidateDomainDeviceInfo(const virDomainDeviceDef *dev,
         }
     }
 
+    if (info->acpiNodeset) {
+        if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_ACPI_GENERIC_INITIATOR)) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("ACPI nodeset is not supported with this QEMU"));
+            return -1;
+        }
+    }
+
     if (info->romenabled || info->rombar || info->romfile) {
         if (info->type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI &&
             info->type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE &&
-- 
2.51.0

On Sat, Sep 06, 2025 at 03:08:59PM +0200, Andrea Righi wrote:
Signed-off-by: Andrea Righi <arighi@nvidia.com> --- src/qemu/qemu_validate.c | 8 ++++++++ 1 file changed, 8 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

With regards,
Daniel

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 src/qemu/qemu_command.c | 45 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 3f9b583985..9ca0847789 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -5222,6 +5222,47 @@ qemuBuildHostdevSCSICommandLine(virCommand *cmd,
 }
 
 
+static int
+qemuBuildAcpiNodesetProps(virCommand *cmd,
+                          virDomainDeviceInfo *info,
+                          virQEMUCaps *qemuCaps)
+{
+    static unsigned int giIndex;
+    int node = -1;
+
+    if (!info->acpiNodeset)
+        return 0;
+
+    if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_ACPI_GENERIC_INITIATOR))
+        return -1;
+
+    while ((node = virBitmapNextSetBit(info->acpiNodeset, node)) > -1) {
+        g_autoptr(virJSONValue) props = NULL;
+        g_autofree char *id = g_strdup_printf("gi%u", giIndex++);
+
+        if (virJSONValueObjectAdd(&props,
+                                  "s:qom-type", "acpi-generic-initiator",
+                                  "s:id", id,
+                                  "s:pci-dev", info->alias,
+                                  "i:node", node,
+                                  NULL) < 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("Failed to build acpi-generic-initiator properties"));
+
+            return -1;
+        }
+
+        if (qemuBuildObjectCommandlineFromJSON(cmd, props) < 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("Failed to build QEMU command line for acpi-generic-initiator"));
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+
 static int
 qemuBuildHostdevCommandLine(virCommand *cmd,
                             const virDomainDef *def,
@@ -5264,6 +5305,10 @@ qemuBuildHostdevCommandLine(virCommand *cmd,
 
             if (qemuBuildDeviceCommandlineFromJSON(cmd, devprops, def, qemuCaps) < 0)
                 return -1;
+
+            if (qemuBuildAcpiNodesetProps(cmd, hostdev->info, qemuCaps) < 0)
+                return -1;
+
             break;
 
         /* SCSI */
-- 
2.51.0
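The loop in qemuBuildAcpiNodesetProps() walks the nodeset with the
"next set bit" idiom: start at -1 and keep asking for the next set bit
until none is left. A minimal standalone sketch of that idiom, assuming
a 64-bit mask in place of virBitmap (next_set_bit() and collect_nodes()
are hypothetical helpers for illustration, not libvirt functions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the position of the first set bit strictly after 'pos'
 * (pass -1 to start from bit 0), or -1 if there is none. */
static int
next_set_bit(uint64_t mask, int pos)
{
    assert(pos >= -1);

    for (int bit = pos + 1; bit < 64; bit++) {
        if (mask & (UINT64_C(1) << bit))
            return bit;
    }

    return -1;
}

/* Collect every set bit of 'mask' as a node id, in ascending order.
 * Returns the number of ids stored in 'nodes'. */
static size_t
collect_nodes(uint64_t mask, int *nodes, size_t max_nodes)
{
    size_t count = 0;
    int node = -1;

    assert(nodes != NULL);

    while (count < max_nodes && (node = next_set_bit(mask, node)) > -1)
        nodes[count++] = node;

    return count;
}
```

For nodeset '1-8' the mask is 0x1FE (bits 1 through 8), so the loop
visits nodes 1, 2, ..., 8 in order, one acpi-generic-initiator object
per iteration in the real function.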

On Sat, Sep 06, 2025 at 03:09:00PM +0200, Andrea Righi wrote:
Signed-off-by: Andrea Righi <arighi@nvidia.com> --- src/qemu/qemu_command.c | 45 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 3f9b583985..9ca0847789 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -5222,6 +5222,47 @@ qemuBuildHostdevSCSICommandLine(virCommand *cmd, }
+static int +qemuBuildAcpiNodesetProps(virCommand *cmd, + virDomainDeviceInfo *info, + virQEMUCaps *qemuCaps) +{ + static unsigned int giIndex; + int node = -1; + + if (!info->acpiNodeset) + return 0; + + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_ACPI_GENERIC_INITIATOR)) + return -1;
We can assume the validate function already ran, so we don't need this check here, which is good as this would return an error status without setting an error message.
+ + while ((node = virBitmapNextSetBit(info->acpiNodeset, node)) > -1) { + g_autoptr(virJSONValue) props = NULL; + g_autofree char *id = g_strdup_printf("gi%u", giIndex++); + + if (virJSONValueObjectAdd(&props, + "s:qom-type", "acpi-generic-initiator", + "s:id", id, + "s:pci-dev", info->alias, + "i:node", node, + NULL) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to build acpi-generic-initiator properties")); + + return -1; + } + + if (qemuBuildObjectCommandlineFromJSON(cmd, props) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to build QEMU command line for acpi-generic-initiator")); + return -1; + } + } + + return 0; +} + + static int qemuBuildHostdevCommandLine(virCommand *cmd, const virDomainDef *def, @@ -5264,6 +5305,10 @@ qemuBuildHostdevCommandLine(virCommand *cmd,
if (qemuBuildDeviceCommandlineFromJSON(cmd, devprops, def, qemuCaps) < 0) return -1; + + if (qemuBuildAcpiNodesetProps(cmd, hostdev->info, qemuCaps) < 0) + return -1; + break;
/* SCSI */ -- 2.51.0
With regards,
Daniel

Hi Daniel, On Mon, Sep 08, 2025 at 06:02:20PM +0100, Daniel P. Berrangé wrote:
On Sat, Sep 06, 2025 at 03:09:00PM +0200, Andrea Righi wrote:
Signed-off-by: Andrea Righi <arighi@nvidia.com> --- src/qemu/qemu_command.c | 45 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 3f9b583985..9ca0847789 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -5222,6 +5222,47 @@ qemuBuildHostdevSCSICommandLine(virCommand *cmd, }
+static int +qemuBuildAcpiNodesetProps(virCommand *cmd, + virDomainDeviceInfo *info, + virQEMUCaps *qemuCaps) +{ + static unsigned int giIndex; + int node = -1; + + if (!info->acpiNodeset) + return 0; + + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_ACPI_GENERIC_INITIATOR)) + return -1;
We can assume the validate function already ran, so we don't need this check here, which is good as this would return an error status without setting an error message.
Ah yes, this check is redundant, we can definitely drop it. Should I send an update patch just with this change? Thanks, -Andrea

On Mon, Sep 08, 2025 at 07:15:18PM +0200, Andrea Righi wrote:
Hi Daniel,
On Mon, Sep 08, 2025 at 06:02:20PM +0100, Daniel P. Berrangé wrote:
On Sat, Sep 06, 2025 at 03:09:00PM +0200, Andrea Righi wrote:
Signed-off-by: Andrea Righi <arighi@nvidia.com> --- src/qemu/qemu_command.c | 45 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 3f9b583985..9ca0847789 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -5222,6 +5222,47 @@ qemuBuildHostdevSCSICommandLine(virCommand *cmd, }
+static int +qemuBuildAcpiNodesetProps(virCommand *cmd, + virDomainDeviceInfo *info, + virQEMUCaps *qemuCaps) +{ + static unsigned int giIndex; + int node = -1; + + if (!info->acpiNodeset) + return 0; + + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_ACPI_GENERIC_INITIATOR)) + return -1;
We can assume the validate function already ran, so we don't need this check here, which is good as this would return an error status without setting an error message.
Ah yes, this check is redundant, we can definitely drop it. Should I send an update patch just with this change?
Don't bother. The rest of the series is fine, so I'll make the obvious change and push this once I've validated it in CI.

With regards,
Daniel

On Mon, Sep 08, 2025 at 06:19:38PM +0100, Daniel P. Berrangé wrote:
On Mon, Sep 08, 2025 at 07:15:18PM +0200, Andrea Righi wrote:
Hi Daniel,
On Mon, Sep 08, 2025 at 06:02:20PM +0100, Daniel P. Berrangé wrote:
On Sat, Sep 06, 2025 at 03:09:00PM +0200, Andrea Righi wrote:
Signed-off-by: Andrea Righi <arighi@nvidia.com> --- src/qemu/qemu_command.c | 45 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 3f9b583985..9ca0847789 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -5222,6 +5222,47 @@ qemuBuildHostdevSCSICommandLine(virCommand *cmd, }
+static int +qemuBuildAcpiNodesetProps(virCommand *cmd, + virDomainDeviceInfo *info, + virQEMUCaps *qemuCaps) +{ + static unsigned int giIndex; + int node = -1; + + if (!info->acpiNodeset) + return 0; + + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_ACPI_GENERIC_INITIATOR)) + return -1;
We can assume the validate function already ran, so we don't need this check here, which is good as this would return an error status without setting an error message.
Ah yes, this check is redundant; we can definitely drop it. Should I send an updated patch with just this change?
Don't bother. The rest of the series is fine, so I'll make the obvious change and push this once I've validated it in CI.
Ok, thanks! -Andrea
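The review point above, that a bare `return -1` leaves the caller with a failure but no message, can be illustrated with a minimal, self-contained C sketch. Everything here is hypothetical: `last_error`, `report_error()`, `reset_error()` and the two `build_props_*()` functions are stand-ins invented for illustration, not libvirt's error API or the code from this patch.

```c
#include <stdio.h>

/* Hypothetical stand-in for a "last error" slot; not libvirt's API,
 * only an illustration of the convention discussed in the review. */
static char last_error[128];

static void report_error(const char *msg)
{
    snprintf(last_error, sizeof(last_error), "%s", msg);
}

static void reset_error(void)
{
    last_error[0] = '\0';
}

/* Returns -1 without recording why: the caller sees a failure but has
 * no message to show. This is the shape of the check being dropped. */
static int build_props_silent(int has_capability)
{
    if (!has_capability)
        return -1;
    return 0;
}

/* Returns -1 only after recording a message, so the failure can be
 * explained to the user further up the call chain. */
static int build_props_reported(int has_capability)
{
    if (!has_capability) {
        report_error("acpi-generic-initiator is not supported (example message)");
        return -1;
    }
    return 0;
}
```

With the capability already checked during validation, the command-line builder needs neither variant, which is why dropping the check is the simpler fix.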

Signed-off-by: Andrea Righi <arighi@nvidia.com> --- .../acpi-generic-initiator.x86_64-latest.args | 55 ++++++++++++++++ .../acpi-generic-initiator.x86_64-latest.xml | 63 +++++++++++++++++++ .../acpi-generic-initiator.xml | 63 +++++++++++++++++++ tests/qemuxmlconftest.c | 1 + 4 files changed, 182 insertions(+) create mode 100644 tests/qemuxmlconfdata/acpi-generic-initiator.x86_64-latest.args create mode 100644 tests/qemuxmlconfdata/acpi-generic-initiator.x86_64-latest.xml create mode 100644 tests/qemuxmlconfdata/acpi-generic-initiator.xml diff --git a/tests/qemuxmlconfdata/acpi-generic-initiator.x86_64-latest.args b/tests/qemuxmlconfdata/acpi-generic-initiator.x86_64-latest.args new file mode 100644 index 0000000000..87de002afd --- /dev/null +++ b/tests/qemuxmlconfdata/acpi-generic-initiator.x86_64-latest.args @@ -0,0 +1,55 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest2 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest2/.local/share \ +XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest2/.cache \ +XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain--1-QEMUGuest2/.config \ +/usr/bin/qemu-system-x86_64 \ +-name guest=QEMUGuest2,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain--1-QEMUGuest2/master-key.aes"}' \ +-machine q35,usb=off,dump-guest-core=off,acpi=off \ +-accel tcg \ +-cpu qemu64 \ +-m size=8388608k \ +-overcommit mem-lock=off \ +-smp 16,sockets=16,cores=1,threads=1 \ +-object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":8589934592}' \ +-numa node,nodeid=0,cpus=0-15,memdev=ram-node0 \ +-numa node,nodeid=1 \ +-numa node,nodeid=2 \ +-numa node,nodeid=3 \ +-numa node,nodeid=4 \ +-numa node,nodeid=5 \ +-numa node,nodeid=6 \ +-numa node,nodeid=7 \ +-numa node,nodeid=8 \ +-uuid c7a5fdbd-edaf-9466-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev 
socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-boot strict=on \ +-device '{"driver":"pcie-root-port","port":8,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x1"}' \ +-device '{"driver":"pcie-root-port","port":9,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x1.0x1"}' \ +-device '{"driver":"qemu-xhci","id":"usb","bus":"pci.1","addr":"0x0"}' \ +-audiodev '{"id":"audio1","driver":"none"}' \ +-global ICH9-LPC.noreboot=off \ +-watchdog-action reset \ +-device '{"driver":"vfio-pci","host":"0000:06:12.1","id":"hostdev0","bus":"pcie.0","addr":"0x2"}' \ +-object '{"qom-type":"acpi-generic-initiator","id":"gi0","pci-dev":"hostdev0","node":1}' \ +-object '{"qom-type":"acpi-generic-initiator","id":"gi1","pci-dev":"hostdev0","node":2}' \ +-object '{"qom-type":"acpi-generic-initiator","id":"gi2","pci-dev":"hostdev0","node":3}' \ +-object '{"qom-type":"acpi-generic-initiator","id":"gi3","pci-dev":"hostdev0","node":4}' \ +-object '{"qom-type":"acpi-generic-initiator","id":"gi4","pci-dev":"hostdev0","node":5}' \ +-object '{"qom-type":"acpi-generic-initiator","id":"gi5","pci-dev":"hostdev0","node":6}' \ +-object '{"qom-type":"acpi-generic-initiator","id":"gi6","pci-dev":"hostdev0","node":7}' \ +-object '{"qom-type":"acpi-generic-initiator","id":"gi7","pci-dev":"hostdev0","node":8}' \ +-device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pcie.0","addr":"0x6"}' \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxmlconfdata/acpi-generic-initiator.x86_64-latest.xml b/tests/qemuxmlconfdata/acpi-generic-initiator.x86_64-latest.xml new file mode 100644 index 0000000000..26194ed11d --- /dev/null +++ b/tests/qemuxmlconfdata/acpi-generic-initiator.x86_64-latest.xml @@ -0,0 +1,63 @@ +<domain type='qemu'> + <name>QEMUGuest2</name> + <uuid>c7a5fdbd-edaf-9466-926a-d65c16db1809</uuid> + <memory 
unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>16</vcpu> + <os> + <type arch='x86_64' machine='q35'>hvm</type> + <boot dev='hd'/> + </os> + <cpu mode='custom' match='exact' check='none'> + <model fallback='forbid'>qemu64</model> + <numa> + <cell id='0' cpus='0-15' memory='8388608' unit='KiB'/> + <cell id='1' memory='0' unit='KiB'/> + <cell id='2' memory='0' unit='KiB'/> + <cell id='3' memory='0' unit='KiB'/> + <cell id='4' memory='0' unit='KiB'/> + <cell id='5' memory='0' unit='KiB'/> + <cell id='6' memory='0' unit='KiB'/> + <cell id='7' memory='0' unit='KiB'/> + <cell id='8' memory='0' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <controller type='pci' index='0' model='pcie-root'/> + <controller type='pci' index='1' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='1' port='0x8'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/> + </controller> + <controller type='pci' index='2' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='2' port='0x9'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='usb' index='0' model='qemu-xhci'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> + </controller> + <controller type='sata' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/> + </controller> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <audio id='1' type='none'/> + <hostdev mode='subsystem' type='pci' managed='yes'> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> + </source> + <acpi nodeset='1-8'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' 
function='0x0'/> + </hostdev> + <watchdog model='itco' action='reset'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxmlconfdata/acpi-generic-initiator.xml b/tests/qemuxmlconfdata/acpi-generic-initiator.xml new file mode 100644 index 0000000000..26194ed11d --- /dev/null +++ b/tests/qemuxmlconfdata/acpi-generic-initiator.xml @@ -0,0 +1,63 @@ +<domain type='qemu'> + <name>QEMUGuest2</name> + <uuid>c7a5fdbd-edaf-9466-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>16</vcpu> + <os> + <type arch='x86_64' machine='q35'>hvm</type> + <boot dev='hd'/> + </os> + <cpu mode='custom' match='exact' check='none'> + <model fallback='forbid'>qemu64</model> + <numa> + <cell id='0' cpus='0-15' memory='8388608' unit='KiB'/> + <cell id='1' memory='0' unit='KiB'/> + <cell id='2' memory='0' unit='KiB'/> + <cell id='3' memory='0' unit='KiB'/> + <cell id='4' memory='0' unit='KiB'/> + <cell id='5' memory='0' unit='KiB'/> + <cell id='6' memory='0' unit='KiB'/> + <cell id='7' memory='0' unit='KiB'/> + <cell id='8' memory='0' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <controller type='pci' index='0' model='pcie-root'/> + <controller type='pci' index='1' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='1' port='0x8'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/> + </controller> + <controller type='pci' index='2' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='2' port='0x9'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='usb' 
index='0' model='qemu-xhci'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> + </controller> + <controller type='sata' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/> + </controller> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <audio id='1' type='none'/> + <hostdev mode='subsystem' type='pci' managed='yes'> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> + </source> + <acpi nodeset='1-8'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </hostdev> + <watchdog model='itco' action='reset'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxmlconftest.c b/tests/qemuxmlconftest.c index d4ce969b5a..171a6f1c78 100644 --- a/tests/qemuxmlconftest.c +++ b/tests/qemuxmlconftest.c @@ -2843,6 +2843,7 @@ mymain(void) DO_TEST_CAPS_LATEST_PARSE_ERROR("virtio-iommu-invalid-address-type"); DO_TEST_CAPS_LATEST_PARSE_ERROR("virtio-iommu-invalid-address"); DO_TEST_CAPS_LATEST_PARSE_ERROR("virtio-iommu-dma-translation"); + DO_TEST_CAPS_LATEST("acpi-generic-initiator"); DO_TEST_CAPS_LATEST("cpu-hotplug-startup"); DO_TEST_CAPS_ARCH_LATEST_PARSE_ERROR("cpu-hotplug-granularity", "ppc64"); -- 2.51.0

On Sat, Sep 06, 2025 at 03:09:01PM +0200, Andrea Righi wrote:
Signed-off-by: Andrea Righi <arighi@nvidia.com> --- .../acpi-generic-initiator.x86_64-latest.args | 55 ++++++++++++++++ .../acpi-generic-initiator.x86_64-latest.xml | 63 +++++++++++++++++++ .../acpi-generic-initiator.xml | 63 +++++++++++++++++++ tests/qemuxmlconftest.c | 1 + 4 files changed, 182 insertions(+) create mode 100644 tests/qemuxmlconfdata/acpi-generic-initiator.x86_64-latest.args create mode 100644 tests/qemuxmlconfdata/acpi-generic-initiator.x86_64-latest.xml create mode 100644 tests/qemuxmlconfdata/acpi-generic-initiator.xml
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel

Add documentation for the new <acpi nodeset="..."> element in hostdev, which allows associating devices with ACPI Generic Initiator objects in QEMU. A typical use case is NVIDIA Multi-Instance GPU (MIG), where a physical GPU is partitioned into multiple isolated instances, each tied to one or more virtual NUMA nodes. The documentation includes an example showing how to configure <numa> cells together with a MIG device. Signed-off-by: Andrea Righi <arighi@nvidia.com> --- docs/formatdomain.rst | 49 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 9f7311b6d5..24f7cdd018 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -4894,6 +4894,55 @@ or: host device. :since:`Since 1.0.6`, but only works as expected :since:`since 1.2.2`. +ACPI Generic Initiators +^^^^^^^^^^^^^^^^^^^^^^^^ + +A host device may include an ``<acpi>`` element to create ACPI Generic +Initiator objects for the device in QEMU. + +This can be used for **NVIDIA Multi-Instance GPU (MIG)** configurations, +where a physical GPU is partitioned into multiple isolated instances, each +associated with one or more virtual NUMA nodes. + +By attaching an ``<acpi nodeset=.../>`` element to the MIG device in the +domain XML, the guest will configure the correct partitioning for that +instance. + +.. code-block:: xml + + <numa> + <cell id='0' cpus='0-15' memory='8388608' unit='KiB'/> + <cell id='1' memory='0' unit='KiB'/> + <cell id='2' memory='0' unit='KiB'/> + <cell id='3' memory='0' unit='KiB'/> + <cell id='4' memory='0' unit='KiB'/> + <cell id='5' memory='0' unit='KiB'/> + <cell id='6' memory='0' unit='KiB'/> + <cell id='7' memory='0' unit='KiB'/> + <cell id='8' memory='0' unit='KiB'/> + </numa> + ... 
+ <hostdev mode='subsystem' type='pci' managed='yes'> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> + </source> + <acpi nodeset='1-8'/> + <address type='pci' domain='0x0000' bus='0x00' + slot='0x02' function='0x0'/> + </hostdev> + +Attributes of ``<acpi>``: + +``nodeset`` + A list of NUMA node IDs that will be associated with the device. + Each node in the set causes libvirt to create an + ``acpi-generic-initiator`` object in QEMU, tied to this device. + + The value uses the standard libvirt *nodeset* syntax (e.g. ``0-3,5``). + +If the ``<acpi>`` element is omitted, no acpi-generic-initiator objects are +created for the device. + Block / character devices ^^^^^^^^^^^^^^^^^^^^^^^^^ -- 2.51.0
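The documented rule, one `acpi-generic-initiator` object per node in the nodeset, can be sketched in standalone C. This is a simplified illustration under stated assumptions, not the code from the series: `parse_nodeset()` and `format_initiator()` are hypothetical helpers, the parser handles only comma-separated numbers and ranges (no exclusions, no deduplication), and the JSON layout is copied from the expected `.args` output in the test patch.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_NODES 64

/* Parse a nodeset such as "1-8" or "0-3,5" into nodes[].
 * Returns the number of nodes, or -1 on malformed input.
 * Simplified for illustration: ranges and single values only. */
static int parse_nodeset(const char *str, int *nodes, int max)
{
    int count = 0;
    const char *p = str;

    while (*p) {
        char *end;
        long lo = strtol(p, &end, 10);
        long hi = lo;

        if (end == p || lo < 0)
            return -1;
        p = end;

        if (*p == '-') {            /* range "lo-hi" */
            p++;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo)
                return -1;
            p = end;
        }

        for (long n = lo; n <= hi; n++) {
            if (count >= max)
                return -1;
            nodes[count++] = (int)n;
        }

        if (*p == ',') {
            p++;
            if (*p == '\0')         /* trailing comma */
                return -1;
        } else if (*p != '\0') {
            return -1;
        }
    }
    return count;
}

/* Format one -object argument for a given initiator index, device alias
 * and NUMA node, in the shape shown by the expected .args file. */
static int format_initiator(char *buf, size_t len, unsigned int index,
                            const char *pci_dev, int node)
{
    return snprintf(buf, len,
                    "{\"qom-type\":\"acpi-generic-initiator\","
                    "\"id\":\"gi%u\",\"pci-dev\":\"%s\",\"node\":%d}",
                    index, pci_dev, node);
}
```

Parsing `1-8` and calling `format_initiator()` for each node with an incrementing index and the alias `hostdev0` yields eight objects, `gi0` through `gi7`, for nodes 1 through 8, the same set that appears in the expected command line of the test case.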

On Sat, Sep 06, 2025 at 03:09:02PM +0200, Andrea Righi wrote:
Add documentation for the new <acpi nodeset="..."> element in hostdev, which allows associating devices with ACPI Generic Initiator objects in QEMU.
A typical use case is NVIDIA Multi-Instance GPU (MIG), where a physical GPU is partitioned into multiple isolated instances, each tied to one or more virtual NUMA nodes. The documentation includes an example showing how to configure <numa> cells together with a MIG device.
Signed-off-by: Andrea Righi <arighi@nvidia.com> --- docs/formatdomain.rst | 49 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel

On Sat, Sep 06, 2025 at 03:09:02PM +0200, Andrea Righi wrote:
Add documentation for the new <acpi nodeset="..."> element in hostdev, which allows associating devices with ACPI Generic Initiator objects in QEMU.
A typical use case is NVIDIA Multi-Instance GPU (MIG), where a physical GPU is partitioned into multiple isolated instances, each tied to one or more virtual NUMA nodes. The documentation includes an example showing how to configure <numa> cells together with a MIG device.
Signed-off-by: Andrea Righi <arighi@nvidia.com> --- docs/formatdomain.rst | 49 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+)
diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 9f7311b6d5..24f7cdd018 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -4894,6 +4894,55 @@ or: host device. :since:`Since 1.0.6`, but only works as expected :since:`since 1.2.2`.
+ACPI Generic Initiators +^^^^^^^^^^^^^^^^^^^^^^^^ + +A host device may include an ``<acpi>`` element to create ACPI Generic +Initiator objects for the device in QEMU. + +This can be used for **NVIDIA Multi-Instance GPU (MIG)** configurations, +where a physical GPU is partitioned into multiple isolated instances, each +associated with one or more virtual NUMA nodes. + +By attaching an ``<acpi nodeset=.../>`` element to the MIG device in the +domain XML, the guest will configure the correct partitioning for that +instance. + +.. code-block:: xml
We can't use code formatting in CI, so I've changed this to a plain pre-formatted text block, like the rest of the doc.
+ + <numa> + <cell id='0' cpus='0-15' memory='8388608' unit='KiB'/> + <cell id='1' memory='0' unit='KiB'/> + <cell id='2' memory='0' unit='KiB'/> + <cell id='3' memory='0' unit='KiB'/> + <cell id='4' memory='0' unit='KiB'/> + <cell id='5' memory='0' unit='KiB'/> + <cell id='6' memory='0' unit='KiB'/> + <cell id='7' memory='0' unit='KiB'/> + <cell id='8' memory='0' unit='KiB'/> + </numa> + ... + <hostdev mode='subsystem' type='pci' managed='yes'> + <source> + <address domain='0x0000' bus='0x06' slot='0x12' function='0x1'/> + </source> + <acpi nodeset='1-8'/> + <address type='pci' domain='0x0000' bus='0x00' + slot='0x02' function='0x0'/> + </hostdev> + +Attributes of ``<acpi>``: + +``nodeset`` + A list of NUMA node IDs that will be associated with the device. + Each node in the set causes libvirt to create an + ``acpi-generic-initiator`` object in QEMU, tied to this device. + + The value uses the standard libvirt *nodeset* syntax (e.g. ``0-3,5``). + +If the ``<acpi>`` element is omitted, no acpi-generic-initiator objects are +created for the device. + Block / character devices ^^^^^^^^^^^^^^^^^^^^^^^^^
-- 2.51.0
With regards, Daniel

Signed-off-by: Andrea Righi <arighi@nvidia.com> --- NEWS.rst | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/NEWS.rst b/NEWS.rst index 9577be0213..bc894bd996 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -17,6 +17,16 @@ v11.8.0 (unreleased) * **New features** + * qemu: Add support for NUMA affinity of PCI devices + + To support NVIDIA Multi-Instance GPU (MIG) configurations, libvirt now + handles QEMU's acpi-generic-initiator device internally. MIG enables + partitioning a physical GPU into multiple isolated instances, each + associated with one or more virtual NUMA nodes. + + On the XML side, the existing <acpi> element has been extended with a + "nodeset" attribute to specify the NUMA node affinity of a PCI device. + * **Improvements** * **Bug fixes** -- 2.51.0

On Sat, Sep 06, 2025 at 03:09:03PM +0200, Andrea Righi wrote:
Signed-off-by: Andrea Righi <arighi@nvidia.com> --- NEWS.rst | 10 ++++++++++ 1 file changed, 10 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel

On Sat, Sep 06, 2025 at 03:08:55PM +0200, Andrea Righi wrote:
= Overview =
This patch set introduces support for acpi-generic-initiator devices, supported by QEMU [1].
The acpi-generic-initiator object is required to support Multi-Instance GPU (MIG) configurations on NVIDIA GPUs [2]. MIG enables partitioning of GPU resources into multiple isolated instances, each requiring a dedicated virtual NUMA node definition.
This is now pushed with the mentioned fixes. With regards, Daniel