[PATCH RESEND 0/6] Add support for configuring PCI high memory MMIO size

Resending: Series has been re-based over latest upstream.

This patch series adds support for configuring the PCI high memory MMIO window size for aarch64 virt machine types. This feature has been merged into the QEMU upstream master branch [1] and will be available in QEMU 10.0. It allows users to configure the size of the high memory MMIO window above 4GB, which is particularly useful for systems with large PCI memory requirements.

The feature is exposed through the domain XML as a new PCI feature:

  <features>
    <pci>
      <highmem-mmio-size unit='G'>512</highmem-mmio-size>
    </pci>
  </features>

When enabled, this configures the size of the PCI high memory MMIO window via QEMU's highmem-mmio-size machine property. The feature is only available for aarch64 virt machine types and requires QEMU support.

This series depends on [2] and should be applied on top of those patches.

For your convenience, this series is also available on GitHub [3].

[1] https://github.com/qemu/qemu/commit/f10104aeae3a17f181d5bb37b7fd7dad7fe86cba
[2] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/Z4NQ3...
[3] git fetch https://github.com/nvmochs/libvirt.git pci_highmem_mmio_size

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>

Matthew R. Ochs (6):
  domain: Add PCI configuration feature infrastructure
  schema: Add PCI configuration feature schema
  conf: Add PCI configuration XML parsing and formatting
  qemu: Add capability for PCI high memory MMIO size
  qemu: Add command line support for PCI high memory MMIO size
  tests: Add tests for machine PCI features

 src/conf/domain_conf.c                        | 103 ++++++++++++++++++
 src/conf/domain_conf.h                        |   6 +
 src/conf/schemas/domaincommon.rng             |   9 ++
 src/qemu/qemu_capabilities.c                  |   2 +
 src/qemu/qemu_capabilities.h                  |   1 +
 src/qemu/qemu_command.c                       |   6 +
 src/qemu/qemu_validate.c                      |  15 +++
 .../caps_10.0.0_aarch64.replies               |  10 ++
 .../caps_10.0.0_aarch64.xml                   |   1 +
 ...rch64-virt-machine-pci.aarch64-latest.args |  31 ++++++
 ...arch64-virt-machine-pci.aarch64-latest.xml |  30 +++++
 .../aarch64-virt-machine-pci.xml              |  20 ++++
 tests/qemuxmlconftest.c                       |   2 +
 13 files changed, 236 insertions(+)
 create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.args
 create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.xml
 create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.xml

--
2.46.0

[PATCH RESEND 1/6] domain: Add PCI configuration feature infrastructure

Add basic infrastructure for PCI configuration feature including:
- New PCI configuration structure in domain_conf.h
- Add VIR_DOMAIN_FEATURE_PCI enum and string conversion
- Add pci field to virDomainDef to store PCI configuration

This will be used to support QEMU's highmem-mmio-size machine property for the aarch64 virt machine type.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
---
 src/conf/domain_conf.c | 2 ++
 src/conf/domain_conf.h | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index c72463818095..05d1d1bfdc83 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -187,6 +187,7 @@ VIR_ENUM_IMPL(virDomainFeature,
               "ras",
               "ps2",
               "aia",
+              "pci",
 );
 
 VIR_ENUM_IMPL(virDomainCapabilitiesPolicy,
@@ -4132,6 +4133,7 @@ void virDomainDefFree(virDomainDef *def)
     g_free(def->kvm_features);
     g_free(def->hyperv_vendor_id);
     g_free(def->tcg_features);
+    g_free(def->pci);
 
     virBlkioDeviceArrayClear(def->blkio.devices,
                              def->blkio.ndevices);
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 8dfadbb98df6..464d3225d4b1 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -2226,6 +2226,7 @@ typedef enum {
     VIR_DOMAIN_FEATURE_RAS,
     VIR_DOMAIN_FEATURE_PS2,
     VIR_DOMAIN_FEATURE_AIA,
+    VIR_DOMAIN_FEATURE_PCI,
 
     VIR_DOMAIN_FEATURE_LAST
 } virDomainFeature;
@@ -3088,6 +3089,10 @@ struct _virDomainPstoreDef {
     virDomainDeviceInfo info;
 };
 
+typedef struct _virDomainPCIDef {
+    unsigned long long highmemMMIOSize; /* in bytes */
+} virDomainPCIDef;
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(virDomainPCIDef, virObjectUnref);
 
 #define SCSI_SUPER_WIDE_BUS_MAX_CONT_UNIT 64
 #define SCSI_WIDE_BUS_MAX_CONT_UNIT 16
@@ -3152,6 +3157,7 @@ struct _virDomainDef {
 
     virDomainPerfDef perf;
     virDomainOSDef os;
+    virDomainPCIDef *pci;
     char *emulator;
 
     /* Most {caps_,hyperv_,kvm_,}feature options utilize a virTristateSwitch
      * to handle support. A few assign specific data values to the option.
--
2.46.0

[PATCH RESEND 2/6] schema: Add PCI configuration feature schema

Add schema definition for PCI configuration feature including:
- Add <pci> element under <features>
- Add <highmem-mmio-size> element with scaledInteger type
- Support unit attribute for size specification

This allows XML configuration of PCI high memory MMIO size for aarch64 virt machines.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
---
 src/conf/schemas/domaincommon.rng | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/conf/schemas/domaincommon.rng b/src/conf/schemas/domaincommon.rng
index 5597d5a66baf..3c65dd661014 100644
--- a/src/conf/schemas/domaincommon.rng
+++ b/src/conf/schemas/domaincommon.rng
@@ -7076,6 +7076,15 @@
           <optional>
             <ref name="aia"/>
           </optional>
+          <optional>
+            <element name='pci'>
+              <optional>
+                <element name='highmem-mmio-size'>
+                  <ref name='scaledInteger'/>
+                </element>
+              </optional>
+            </element>
+          </optional>
         </interleave>
       </element>
     </optional>
--
2.46.0

[PATCH RESEND 3/6] conf: Add PCI configuration XML parsing and formatting

Add XML parsing and formatting support for PCI configuration:
- Add virDomainFeaturesPCIDefParseXML function
- Add virDomainFeaturesPCIDefFormat function
- Wire up parsing in virDomainFeaturesDefParse
- Wire up formatting in virDomainDefFormatFeatures
- Use g_steal_pointer for memory management

The highmem-mmio-size property can now be specified in domain XML and will be properly parsed and formatted as a domain feature.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
---
 src/conf/domain_conf.c | 101 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 05d1d1bfdc83..d10128b851fd 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -17232,6 +17232,48 @@ virDomainFeaturesTCGDefParse(virDomainDef *def,
 }
 
 
+static virDomainPCIDef *
+virDomainPCIDefNew(void)
+{
+    virDomainPCIDef *pci;
+
+    pci = g_new0(virDomainPCIDef, 1);
+    return pci;
+}
+
+
+/**
+ * virDomainFeaturesPCIDefParseXML:
+ * @ctxt: XML context
+ *
+ * Parses PCI configuration from XML under features/pci. Currently only
+ * supports the highmem-mmio-size property for aarch64 virt machine types.
+ *
+ * Returns: parsed pci definition or NULL on error
+ */
+static virDomainPCIDef *
+virDomainFeaturesPCIDefParseXML(xmlXPathContextPtr ctxt)
+{
+    g_autoptr(virDomainPCIDef) pci = virDomainPCIDefNew();
+    unsigned long long size;
+    int rc;
+
+    if ((rc = virParseScaledValue("./features/pci/highmem-mmio-size",
+                                  NULL,
+                                  ctxt,
+                                  &size,
+                                  1024 * 1024 * 1024, /* Default scale: GiB */
+                                  ULLONG_MAX,
+                                  false)) < 0)
+        return NULL;
+
+    if (rc == 1)
+        pci->highmemMMIOSize = size;
+
+    return g_steal_pointer(&pci);
+}
+
+
 static int
 virDomainFeaturesDefParse(virDomainDef *def,
                           xmlXPathContextPtr ctxt)
@@ -17478,6 +17520,13 @@ virDomainFeaturesDefParse(virDomainDef *def,
             break;
         }
 
+        case VIR_DOMAIN_FEATURE_PCI:
+            if ((def->pci = virDomainFeaturesPCIDefParseXML(ctxt)) == NULL)
+                return -1;
+
+            def->features[val] = VIR_TRISTATE_SWITCH_ON;
+            break;
+
         case VIR_DOMAIN_FEATURE_LAST:
             break;
         }
@@ -21468,6 +21517,25 @@ virDomainDefFeaturesCheckABIStability(virDomainDef *src,
             }
             break;
 
+        case VIR_DOMAIN_FEATURE_PCI:
+            if (src->features[i] != dst->features[i]) {
+                virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                               _("target domain pci feature %1$s does not match source %2$s"),
+                               virTristateSwitchTypeToString(dst->features[i]),
+                               virTristateSwitchTypeToString(src->features[i]));
+                return false;
+            }
+            if (src->features[i] == VIR_TRISTATE_SWITCH_ON) {
+                if (src->pci->highmemMMIOSize != dst->pci->highmemMMIOSize) {
+                    virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                                   _("target domain pci highmem-mmio-size %1$llu does not match source %2$llu"),
+                                   dst->pci->highmemMMIOSize,
+                                   src->pci->highmemMMIOSize);
+                    return false;
+                }
+            }
+            break;
+
         case VIR_DOMAIN_FEATURE_MSRS:
         case VIR_DOMAIN_FEATURE_TCG:
         case VIR_DOMAIN_FEATURE_ASYNC_TEARDOWN:
@@ -28352,6 +28420,34 @@ virDomainFeatureTCGFormat(virBuffer *buf,
 }
 
 
+/**
+ * virDomainFeaturesPCIDefFormat:
+ * @buf: buffer to write XML data to
+ * @pci: PCI feature data to format
+ *
+ * Format the PCI feature configuration as XML under features/pci.
+ */
+static void
+virDomainFeaturesPCIDefFormat(virBuffer *buf,
+                              const virDomainPCIDef *pci)
+{
+    g_auto(virBuffer) attrBuf = VIR_BUFFER_INITIALIZER;
+    g_auto(virBuffer) childBuf = VIR_BUFFER_INIT_CHILD(buf);
+
+    if (!pci)
+        return;
+
+    if (pci->highmemMMIOSize > 0) {
+        virBufferAddLit(&attrBuf, " unit='G'");
+        virBufferAsprintf(&childBuf, "<highmem-mmio-size%s>%llu</highmem-mmio-size>\n",
+                          virBufferCurrentContent(&attrBuf),
+                          pci->highmemMMIOSize / (1024 * 1024 * 1024));
+    }
+
+    virXMLFormatElement(buf, "pci", NULL, &childBuf);
+}
+
+
 static int
 virDomainDefFormatFeatures(virBuffer *buf,
                            virDomainDef *def)
@@ -28710,6 +28806,11 @@ virDomainDefFormatFeatures(virBuffer *buf,
                               virDomainAIATypeToString(def->features[i]));
             break;
 
+        case VIR_DOMAIN_FEATURE_PCI:
+            if (def->features[i] == VIR_TRISTATE_SWITCH_ON)
+                virDomainFeaturesPCIDefFormat(&childBuf, def->pci);
+            break;
+
         case VIR_DOMAIN_FEATURE_LAST:
             break;
         }
--
2.46.0

[PATCH RESEND 4/6] qemu: Add capability for PCI high memory MMIO size

Add QEMU capability for PCI high memory MMIO size configuration:
- Add QEMU_CAPS_MACHINE_VIRT_HIGHMEM_MMIO_SIZE capability
- Add capability to virt machine properties
- Add highmem-mmio-size virt machine property to aarch64 qemu 10.0.0 capabilities

This allows detecting support for the highmem-mmio-size virt machine property in QEMU.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
---
 src/qemu/qemu_capabilities.c                           |  2 ++
 src/qemu/qemu_capabilities.h                           |  1 +
 tests/qemucapabilitiesdata/caps_10.0.0_aarch64.replies | 10 ++++++++++
 tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml     |  1 +
 4 files changed, 14 insertions(+)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index a804335c85b8..80eeb39d2088 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -732,6 +732,7 @@ VIR_ENUM_IMPL(virQEMUCaps,
 
               /* 475 */
               "virtio-scsi.iothread-mapping", /* QEMU_CAPS_VIRTIO_SCSI_IOTHREAD_MAPPING */
+              "machine.virt.highmem-mmio-size", /* QEMU_CAPS_MACHINE_VIRT_HIGHMEM_MMIO_SIZE */
 );
 
@@ -1767,6 +1768,7 @@ static struct virQEMUCapsStringFlags virQEMUCapsMachinePropsVirt[] = {
     { "iommu", QEMU_CAPS_MACHINE_VIRT_IOMMU },
     { "ras", QEMU_CAPS_MACHINE_VIRT_RAS },
     { "aia", QEMU_CAPS_MACHINE_VIRT_AIA },
+    { "highmem-mmio-size", QEMU_CAPS_MACHINE_VIRT_HIGHMEM_MMIO_SIZE },
 };
 
 static struct virQEMUCapsStringFlags virQEMUCapsMachinePropsGeneric[] = {
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index ea7c14daa9a6..6560bc3cad48 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -713,6 +713,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */
 
     /* 475 */
     QEMU_CAPS_VIRTIO_SCSI_IOTHREAD_MAPPING, /* virtio-scsi supports per-virtqueue iothread mapping */
+    QEMU_CAPS_MACHINE_VIRT_HIGHMEM_MMIO_SIZE, /* -machine virt,highmem-mmio-size=<size> */
 
     QEMU_CAPS_LAST /* this must always be the last item */
 } virQEMUCapsFlags;
diff --git a/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.replies b/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.replies
index 5ef02f7ae41d..65b77e8baafb 100644
--- a/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.replies
+++ b/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.replies
@@ -33843,6 +33843,11 @@
       "description": "Set on/off to enable/disable high memory region for PCI ECAM",
       "type": "bool"
     },
+    {
+      "name": "highmem-mmio-size",
+      "description": "Set the high memory region size for PCI MMIO",
+      "type": "size"
+    },
     {
       "name": "highmem",
       "description": "Set on/off to enable/disable using physical address space above 32 bits",
@@ -34469,6 +34474,11 @@
       "help": "Set on/off to enable/disable high memory region for PCI ECAM",
       "type": "boolean"
     },
+    {
+      "name": "highmem-mmio-size",
+      "help": "Set the high memory region size for PCI MMIO",
+      "type": "size"
+    },
     {
       "name": "highmem",
       "help": "Set on/off to enable/disable using physical address space above 32 bits",
diff --git a/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml b/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml
index 3f46ab55d84f..ea7862c459ef 100644
--- a/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml
+++ b/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml
@@ -172,6 +172,7 @@
   <flag name='netdev-stream-reconnect-miliseconds'/>
   <flag name='migrate-incoming.exit-on-error'/>
   <flag name='blockdev-set-active'/>
+  <flag name='machine.virt.highmem-mmio-size'/>
   <version>9002050</version>
   <microcodeVersion>61700285</microcodeVersion>
   <package>v9.2.0-1967-gb69801dd6b</package>
--
2.46.0

[PATCH RESEND 5/6] qemu: Add command line support for PCI high memory MMIO size

Add support for generating QEMU command line with PCI high memory MMIO size:
- Add highmem-mmio-size to machine command line generation
- Add validation for aarch64/virt machine type requirement
- Add capability check for QEMU support
- Add feature validation in qemu_validate.c

This enables configuring the PCI high memory MMIO window size for aarch64 virt machine types.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
---
 src/qemu/qemu_command.c  |  6 ++++++
 src/qemu/qemu_validate.c | 15 +++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index e6d308534f87..400cb8deee31 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -6859,6 +6859,12 @@ qemuAppendDomainFeaturesMachineParam(virBuffer *buf,
         virBufferAsprintf(buf, ",aia=%s", str);
     }
 
+    if (def->features[VIR_DOMAIN_FEATURE_PCI] == VIR_TRISTATE_SWITCH_ON) {
+        if (def->pci && def->pci->highmemMMIOSize > 0) {
+            virBufferAsprintf(buf, ",highmem-mmio-size=%lluG",
+                              def->pci->highmemMMIOSize / (1024 * 1024 * 1024));
+        }
+    }
 
     return 0;
 }
diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
index b2c3c9e2f631..aaf28ec92e0e 100644
--- a/src/qemu/qemu_validate.c
+++ b/src/qemu/qemu_validate.c
@@ -279,6 +279,21 @@ qemuValidateDomainDefFeatures(const virDomainDef *def,
             }
             break;
 
+        case VIR_DOMAIN_FEATURE_PCI:
+            if (def->features[i] != VIR_TRISTATE_SWITCH_ABSENT &&
+                !qemuDomainIsARMVirt(def)) {
+                virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                               _("features/pci is only supported with aarch64 virt machines"));
+                return -1;
+            }
+            if (def->features[i] != VIR_TRISTATE_SWITCH_ABSENT &&
+                !virQEMUCapsGet(qemuCaps, QEMU_CAPS_MACHINE_VIRT_HIGHMEM_MMIO_SIZE)) {
+                virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                               _("features/pci is not supported with this QEMU binary"));
+                return -1;
+            }
+            break;
+
         case VIR_DOMAIN_FEATURE_SMM:
         case VIR_DOMAIN_FEATURE_KVM:
         case VIR_DOMAIN_FEATURE_XEN:
--
2.46.0

[PATCH RESEND 6/6] tests: Add tests for machine PCI features

Add test coverage for machine-specific PCI features:
- Add XML tests for aarch64 virt highmem-mmio-size
- Add command line generation tests

This ensures proper handling of machine-specific PCI features like the high memory MMIO window size configuration.

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
---
 ...rch64-virt-machine-pci.aarch64-latest.args | 31 +++++++++++++++++++
 ...arch64-virt-machine-pci.aarch64-latest.xml | 30 ++++++++++++++++++
 .../aarch64-virt-machine-pci.xml              | 20 ++++++++++++
 tests/qemuxmlconftest.c                       |  2 ++
 4 files changed, 83 insertions(+)
 create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.args
 create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.xml
 create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.xml

diff --git a/tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.args b/tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.args
new file mode 100644
index 000000000000..7ab4e8bd624f
--- /dev/null
+++ b/tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.args
@@ -0,0 +1,31 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/var/lib/libvirt/qemu/domain--1-aarch64-virt-machine \
+USER=test \
+LOGNAME=test \
+XDG_DATA_HOME=/var/lib/libvirt/qemu/domain--1-aarch64-virt-machine/.local/share \
+XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain--1-aarch64-virt-machine/.cache \
+XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain--1-aarch64-virt-machine/.config \
+/usr/bin/qemu-system-aarch64 \
+-name guest=aarch64-virt-machine-pci,debug-threads=on \
+-S \
+-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain--1-aarch64-virt-machine/master-key.aes"}' \
+-machine virt,usb=off,gic-version=2,highmem-mmio-size=512G,dump-guest-core=off,memory-backend=mach-virt.ram,acpi=off \
+-accel tcg \
+-cpu cortex-a15 \
+-m size=1048576k \
+-object '{"qom-type":"memory-backend-ram","id":"mach-virt.ram","size":1073741824}' \
+-overcommit mem-lock=off \
+-smp 1,sockets=1,cores=1,threads=1 \
+-uuid 6ba7b810-9dad-11d1-80b4-00c04fd430c8 \
+-display none \
+-no-user-config \
+-nodefaults \
+-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \
+-mon chardev=charmonitor,id=monitor,mode=control \
+-rtc base=utc \
+-no-shutdown \
+-boot strict=on \
+-audiodev '{"id":"audio1","driver":"none"}' \
+-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
+-msg timestamp=on
diff --git a/tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.xml b/tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.xml
new file mode 100644
index 000000000000..d19a23b17b70
--- /dev/null
+++ b/tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.xml
@@ -0,0 +1,30 @@
+<domain type='qemu'>
+  <name>aarch64-virt-machine-pci</name>
+  <uuid>6ba7b810-9dad-11d1-80b4-00c04fd430c8</uuid>
+  <memory unit='KiB'>1048576</memory>
+  <currentMemory unit='KiB'>1048576</currentMemory>
+  <vcpu placement='static'>1</vcpu>
+  <os>
+    <type arch='aarch64' machine='virt'>hvm</type>
+    <boot dev='hd'/>
+  </os>
+  <features>
+    <gic version='2'/>
+    <pci>
+      <highmem-mmio-size unit='G'>512</highmem-mmio-size>
+    </pci>
+  </features>
+  <cpu mode='custom' match='exact' check='none'>
+    <model fallback='forbid'>cortex-a15</model>
+  </cpu>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-aarch64</emulator>
+    <controller type='pci' index='0' model='pcie-root'/>
+    <audio id='1' type='none'/>
+    <memballoon model='none'/>
+  </devices>
+</domain>
diff --git a/tests/qemuxmlconfdata/aarch64-virt-machine-pci.xml b/tests/qemuxmlconfdata/aarch64-virt-machine-pci.xml
new file mode 100644
index 000000000000..42ebb4b304b5
--- /dev/null
+++ b/tests/qemuxmlconfdata/aarch64-virt-machine-pci.xml
@@ -0,0 +1,20 @@
+<domain type='qemu'>
+  <name>aarch64-virt-machine-pci</name>
+  <uuid>6ba7b810-9dad-11d1-80b4-00c04fd430c8</uuid>
+  <memory unit='KiB'>1048576</memory>
+  <currentMemory unit='KiB'>1048576</currentMemory>
+  <vcpu placement='static'>1</vcpu>
+  <os>
+    <type arch='aarch64' machine='virt'>hvm</type>
+  </os>
+  <features>
+    <pci>
+      <highmem-mmio-size unit='G'>512</highmem-mmio-size>
+    </pci>
+  </features>
+  <devices>
+    <emulator>/usr/bin/qemu-system-aarch64</emulator>
+    <controller type='pci' index='0' model='pcie-root'/>
+    <memballoon model='none'/>
+  </devices>
+</domain>
diff --git a/tests/qemuxmlconftest.c b/tests/qemuxmlconftest.c
index 1f31ec810c7a..a5b326cd9364 100644
--- a/tests/qemuxmlconftest.c
+++ b/tests/qemuxmlconftest.c
@@ -2644,6 +2644,8 @@ mymain(void)
 
     DO_TEST_CAPS_ARCH_LATEST("clock-timer-armvtimer", "aarch64");
 
+    DO_TEST_CAPS_ARCH_LATEST("aarch64-virt-machine-pci", "aarch64");
+
     qemuTestSetHostArch(&driver, VIR_ARCH_NONE);
 
     DO_TEST_CAPS_LATEST("kvm-pit-delay");
--
2.46.0

On Fri, Apr 11, 2025 at 08:40:54AM -0700, Matthew R. Ochs via Devel wrote:
Resending: Series has been re-based over latest upstream.
This patch series adds support for configuring the PCI high memory MMIO window size for aarch64 virt machine types. This feature has been merged into the QEMU upstream master branch [1] and will be available in QEMU 10.0. It allows users to configure the size of the high memory MMIO window above 4GB, which is particularly useful for systems with large amounts of PCI memory requirements.
The feature is exposed through the domain XML as a new PCI feature: <features> <pci> <highmem-mmio-size unit='G'>512</highmem-mmio-size> </pci> </features>
When enabled, this configures the size of the PCI high memory MMIO window via QEMU's highmem-mmio-size machine property. The feature is only available for aarch64 virt machine types and requires QEMU support.
This isn't my area of expertise, but could you give any more background on why we need to /manually/ set such a property on Arm only? Is there something that prevents us making QEMU "do the right thing"?

As a general rule these kinds of obscure tunables are not very user friendly. Since they are obscure, most mgmt app developers are not going to be aware of them, so may well not provide any way to set them, and even if they can be set, it still requires someone or something to remember to actually set it... which usually ends up only happening /after/ the end user has complained their setup is broken. Overall this leads to a poor user experience IME.

IOW, if there is any plausible way we can make QEMU work suitably out of the box, that'd be preferable to requiring a manually set obscure tunable like this.
This series depends on [2] and should be applied on top of those patches.
For your convenience, this series is also available on Github [3].
[1] https://github.com/qemu/qemu/commit/f10104aeae3a17f181d5bb37b7fd7dad7fe86cba [2] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/Z4NQ3... [3] git fetch https://github.com/nvmochs/libvirt.git pci_highmem_mmio_size
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Matthew R. Ochs (6): domain: Add PCI configuration feature infrastructure schema: Add PCI configuration feature schema conf: Add PCI configuration XML parsing and formatting qemu: Add capability for PCI high memory MMIO size qemu: Add command line support for PCI high memory MMIO size tests: Add tests for machine PCI features
src/conf/domain_conf.c | 103 ++++++++++++++++++ src/conf/domain_conf.h | 6 + src/conf/schemas/domaincommon.rng | 9 ++ src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 6 + src/qemu/qemu_validate.c | 15 +++ .../caps_10.0.0_aarch64.replies | 10 ++ .../caps_10.0.0_aarch64.xml | 1 + ...rch64-virt-machine-pci.aarch64-latest.args | 31 ++++++ ...arch64-virt-machine-pci.aarch64-latest.xml | 30 +++++ .../aarch64-virt-machine-pci.xml | 20 ++++ tests/qemuxmlconftest.c | 2 + 13 files changed, 236 insertions(+) create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.args create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.aarch64-latest.xml create mode 100644 tests/qemuxmlconfdata/aarch64-virt-machine-pci.xml
-- 2.46.0
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Hi Daniel,

Thanks for your feedback!
On May 7, 2025, at 11:51 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:

On Fri, Apr 11, 2025 at 08:40:54AM -0700, Matthew R. Ochs via Devel wrote:
Resending: Series has been re-based over latest upstream.
This patch series adds support for configuring the PCI high memory MMIO window size for aarch64 virt machine types. This feature has been merged into the QEMU upstream master branch [1] and will be available in QEMU 10.0. It allows users to configure the size of the high memory MMIO window above 4GB, which is particularly useful for systems with large amounts of PCI memory requirements.
The feature is exposed through the domain XML as a new PCI feature: <features> <pci> <highmem-mmio-size unit='G'>512</highmem-mmio-size> </pci> </features>
When enabled, this configures the size of the PCI high memory MMIO window via QEMU's highmem-mmio-size machine property. The feature is only available for aarch64 virt machine types and requires QEMU support.
This isn't my area of expertize, but could you give any more background on why we need to /manually/ set such a property on Arm only ? Is there something that prevents us making QEMU "do the right thing" ?
The highmem-mmio-size property is only available for the arm64 “virt” machine. It is only needed when a VM configuration will exceed the 512G default for the PCI highmem region. There are some GPU devices that exist today that have very large BARs and require more than 512G when multiple devices are passed through to a VM.

Regarding making QEMU “do the right thing”, we could add logic to libvirt to detect when these known devices are present in the VM configuration and automatically set an appropriate size for the parameter. However, I was under the impression that that type of solution was preferred to be handled at the mgmt app layer.

-matt
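A rough sizing sketch (hypothetical numbers, not tied to any specific device): with four passthrough GPUs that each expose a 128G 64-bit BAR, the devices alone fill the 512G default, so a guest definition using this series would need something like:

  <features>
    <pci>
      <!-- hypothetical sizing: 4 x 128G BARs plus headroom for other devices -->
      <highmem-mmio-size unit='G'>1024</highmem-mmio-size>
    </pci>
  </features>

which patch 5 turns into -machine virt,...,highmem-mmio-size=1024G on the QEMU command line.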

On Wed, May 07, 2025 at 08:44:05PM +0000, Matt Ochs wrote:
Hi Daniel,
Thanks for your feedback!
On May 7, 2025, at 11:51 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:

On Fri, Apr 11, 2025 at 08:40:54AM -0700, Matthew R. Ochs via Devel wrote:
Resending: Series has been re-based over latest upstream.
This patch series adds support for configuring the PCI high memory MMIO window size for aarch64 virt machine types. This feature has been merged into the QEMU upstream master branch [1] and will be available in QEMU 10.0. It allows users to configure the size of the high memory MMIO window above 4GB, which is particularly useful for systems with large amounts of PCI memory requirements.
The feature is exposed through the domain XML as a new PCI feature: <features> <pci> <highmem-mmio-size unit='G'>512</highmem-mmio-size> </pci> </features>
When enabled, this configures the size of the PCI high memory MMIO window via QEMU's highmem-mmio-size machine property. The feature is only available for aarch64 virt machine types and requires QEMU support.
This isn't my area of expertize, but could you give any more background on why we need to /manually/ set such a property on Arm only ? Is there something that prevents us making QEMU "do the right thing" ?
The highmem-mmio-size property is only available for the arm64 “virt” machine. It is only needed when a VM configuration will exceed the 512G default for PCI highmem region. There are some GPU devices that exist today that have very large BARs and require more than 512G when multiple devices are passed through to a VM.
Regarding making QEMU “do the right thing”, we could add logic to libvirt to detect when these known devices are present in the VM configuration and automatically set an appropriate size for the parameter. However I was under the impression that type of solution was preferred to be handled at the mgmt app layer.
I wasn't suggesting to put logic in libvirt actually. I'm querying why QEMU's memory map is set up such that this PCI assignment can't work by default with a standard QEMU configuration. Can you confirm this works correctly on x86 QEMU with the q35 machine type by default? If so, what prevents the QEMU 'virt' machine for aarch64 being changed to also work?

Libvirt can't detect when the devices are present in the VM config because this mmio setting is a cold boot option, while PCI devices are often hot-plugged to an existing VM.

With regards,
Daniel

On Fri, Apr 11, 2025 at 08:40:54AM -0700, Matthew R. Ochs via Devel wrote:
Resending: Series has been re-based over latest upstream.
This patch series adds support for configuring the PCI high memory MMIO window size for aarch64 virt machine types. This feature has been merged into the QEMU upstream master branch [1] and will be available in QEMU 10.0. It allows users to configure the size of the high memory MMIO window above 4GB, which is particularly useful for systems with large amounts of PCI memory requirements.
The feature is exposed through the domain XML as a new PCI feature: <features> <pci> <highmem-mmio-size unit='G'>512</highmem-mmio-size> </pci> </features>
As a schema design comment: IIUC, the MMIO size we're configuring is conceptually a characteristic associated with the PCI(e) host and the memory layout it defines for PCI(e) devices to use.

Checking through our schema I find we already have support for

  <controller type='pci' index='0' model='pci-root'>
    <pcihole64 unit='KiB'>1048576</pcihole64>
  </controller>

this makes me think that we should model this new attribute in a similar way, e.g. so we can support:

  <controller type='pci' index='0' model='pci-root'>
    <pcihole64 unit='KiB'>1048576</pcihole64>
    <pcimmio64 unit='TiB'>2</pcimmio64>
  </controller>

(pci-root or pcie-root are interchangeable).

This 'pcimmio64' value can then be mapped to whatever hypervisor or architecture specific setting is appropriate, avoiding exposing the QEMU arm 'highmem-mmio-size' naming convention.

With regards,
Daniel

On May 9, 2025, at 9:59 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
On Fri, Apr 11, 2025 at 08:40:54AM -0700, Matthew R. Ochs via Devel wrote:
Resending: Series has been re-based over latest upstream.
This patch series adds support for configuring the PCI high memory MMIO window size for aarch64 virt machine types. This feature has been merged into the QEMU upstream master branch [1] and will be available in QEMU 10.0. It allows users to configure the size of the high memory MMIO window above 4GB, which is particularly useful for systems with large amounts of PCI memory requirements.
The feature is exposed through the domain XML as a new PCI feature: <features> <pci> <highmem-mmio-size unit='G'>512</highmem-mmio-size> </pci> </features>
As a schema design comment. IIUC, the MMIO size we're configuring is conceptually a characteristic associated with the PCI(e) host and the memory layout it defines for PCI(e) devices to use.
Correct.
Checking through our schema I find we already have support for
<controller type='pci' index='0' model='pci-root'> <pcihole64 unit='KiB'>1048576</pcihole64> </controller>
this makes me think that we should model this new attribute in a similar way, eg so we can support:
<controller type='pci' index='0' model='pci-root'> <pcihole64 unit='KiB'>1048576</pcihole64> <pcimmio64 unit='TiB'>2</pcimmio64> </controller>
(pci-root or pcie-root are interchangable).
This 'pcimmio64' value can then be mapped to whatever hypervisor or architecture specific setting is appropriate, avoiding exposing the QEMU arm 'highmem-mmio-size' naming convention.
Thanks for the feedback, this sounds like a better approach. Would it make sense to just use the existing pcihole64 since [I think] it more or less represents the same concept (setting 64bit MMIO window)? Or perhaps that would be too messy or x86-centric and it’s better to go with what you proposed (pcimmio64)?

On Fri, May 09, 2025 at 07:29:04PM +0000, Matt Ochs wrote:
On May 9, 2025, at 9:59 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
On Fri, Apr 11, 2025 at 08:40:54AM -0700, Matthew R. Ochs via Devel wrote:
Resending: Series has been re-based over latest upstream.
This patch series adds support for configuring the PCI high memory MMIO window size for aarch64 virt machine types. This feature has been merged into the QEMU upstream master branch [1] and will be available in QEMU 10.0. It allows users to configure the size of the high memory MMIO window above 4GB, which is particularly useful for systems with large amounts of PCI memory requirements.
The feature is exposed through the domain XML as a new PCI feature: <features> <pci> <highmem-mmio-size unit='G'>512</highmem-mmio-size> </pci> </features>
As a schema design comment. IIUC, the MMIO size we're configuring is conceptually a characteristic associated with the PCI(e) host and the memory layout it defines for PCI(e) devices to use.
Correct.
Checking through our schema I find we already have support for
<controller type='pci' index='0' model='pci-root'> <pcihole64 unit='KiB'>1048576</pcihole64> </controller>
this makes me think that we should model this new attribute in a similar way, eg so we can support:
<controller type='pci' index='0' model='pci-root'> <pcihole64 unit='KiB'>1048576</pcihole64> <pcimmio64 unit='TiB'>2</pcimmio64> </controller>
(pci-root or pcie-root are interchangable).
This 'pcimmio64' value can then be mapped to whatever hypervisor or architecture specific setting is appropriate, avoiding exposing the QEMU arm 'highmem-mmio-size' naming convention.
Thanks for the feedback, this sounds like a better approach.
Would it make sense to just use the existing pcihole64 since [I think] it more or less represents the same concept (setting 64bit MMIO window)?
I'm not sure. I've been struggling to reproduce an effect with setting the existing -global q35-pcihost.pci-hole64-size=1048576K setting on x86, and also wondering how it interacts with the previously mentioned -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=262144

Possibly the former only works with SeaBIOS, and the latter only works with EDK2, but I've not figured out how to prove this.

I'm curious if there's a good way to identify the guest memory map impact, as I'm not finding a clear marker in 'dmesg' that correlates?
Or perhaps that would be too messy or x86-centric and it’s better to go with what you proposed (pcimmio64)?
If the 'pcihole64' setting really is setting the MMIO64 window, then it would be preferable to re-use the existing setting field.

With regards,
Daniel

On May 12, 2025, at 5:19 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:

On Fri, May 09, 2025 at 07:29:04PM +0000, Matt Ochs wrote:
Would it make sense to just use the existing pcihole64 since [I think] it more or less represents the same concept (setting 64bit MMIO window)?
I'm not sure. I've been struggling to reproduce an effect wit hseting the existing -global q35-pcihost.pci-hole64-size=1048576K settings on x86, and also wondering how it interacts with the previously mentioned -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=262144
Possibly the former only works with SeaBIOS, and the latter only works with EDK2, but I've not figured out how to prove this.
The qemu docs mention opt/ovmf is specifically for OVMF firmware: https://github.com/qemu/qemu/blob/7be29f2f1a3f5b037d27eedbd5df9f441e8c8c16/d...

The pcihole64 setting can be used with OVMF (see below) and with SeaBIOS: https://github.com/libvirt/libvirt/blob/master/docs/formatdomain.rst (see pcihole64)

The X-PciMmio64Mb parameter isn't directly supported in libvirt IIUC. The libvirt XML would need to directly pass qemu command line arguments to use it.
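For illustration, passing that firmware knob through today would look roughly like the following, using libvirt's QEMU command-line passthrough namespace (the 262144 value is the one from the earlier example):

  <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
    ...
    <qemu:commandline>
      <!-- illustrative only; forwards the OVMF fw_cfg override verbatim -->
      <qemu:arg value='-fw_cfg'/>
      <qemu:arg value='name=opt/ovmf/X-PciMmio64Mb,string=262144'/>
    </qemu:commandline>
  </domain>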
I'm curious if there's a good way to identify the guest memory map impact, as I'm not finding a clear marker in 'dmesg' that correlates ?
We were able to test this by using OVMF without the dynamic mmio window size patch (i.e. a version older than edk2-stable202211) and guest kernel parameters that are not set to allow re-calculating the MMIO window size by deferring guest resource allocations to the guest kernel (i.e. pci=realloc and pci=nocrs aren't set). With this we could reproduce a 4 GPU VM launch with guest BARs not mapped properly due to running out of space/resources. The BAR mapping failures will be clear in dmesg, with no BAR region mappings in /proc/iomem or in the output of lspci for the GPUs.

From there we added the pcihole64 attribute to the VM's libvirt definition, setting a 2 TB hole size, and the VM booted with guest GPU BARs mapped properly in dmesg, plus GPU BAR mappings visible in /proc/iomem and lspci output.

Lastly, we observed the same behavior by removing the pcihole64 attribute and setting the X-PciMmio64Mb configuration to 2TB.
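For reference, the 2 TB pcihole64 setting used in that test corresponds to a controller entry along these lines (a sketch, with 2 TiB expressed in KiB to match the unit used in the pcihole64 documentation example):

  <controller type='pci' index='0' model='pcie-root'>
    <!-- 2 TiB = 2147483648 KiB, as used in the test described above -->
    <pcihole64 unit='KiB'>2147483648</pcihole64>
  </controller>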
Or perhaps that would be too messy or x86-centric and it’s better to go with what you proposed (pcimmio64)?
If the 'pcihole64' setting really is setting the MMIO64 window, then it would be preferrable to re-use the existing setting field.
Per the tests above, pcihole64 is setting the MMIO64 window. The only concern I have with using it is that, to date, it has been an x86-centric attribute tied closely to the qemu -global parameter. I don't think this is a show-stopper, but it will require some code changes to allow it to work with the virt machine and connect it up to a different qemu parameter for that machine.

-matt

On Mon, May 12, 2025 at 07:33:37PM +0000, Matt Ochs wrote:
On May 12, 2025, at 5:19 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:

On Fri, May 09, 2025 at 07:29:04PM +0000, Matt Ochs wrote:
Would it make sense to just use the existing pcihole64 since [I think] it more or less represents the same concept (setting 64bit MMIO window)?
I'm not sure. I've been struggling to reproduce an effect wit hseting the existing -global q35-pcihost.pci-hole64-size=1048576K settings on x86, and also wondering how it interacts with the previously mentioned -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=262144
Possibly the former only works with SeaBIOS, and the latter only works with EDK2, but I've not figured out how to prove this.
The qemu docs mention opt/ovmf is specifically for OVMF firmware: https://github.com/qemu/qemu/blob/7be29f2f1a3f5b037d27eedbd5df9f441e8c8c16/d...
The pcihole64 setting can be used with OVMF (see below) and with SEABIOS: https://github.com/libvirt/libvirt/blob/master/docs/formatdomain.rst (see pcihole64)
The X-PciMmio64Mb parameter isn't directly supported in libvirt IIUC. The libvirt XML would need to directly pass qemu command line arguments to use it.
I'm wondering what the semantic difference is between setting the pcihole64 property and the X-PciMmio64Mb fwcfg, in the context of OVMF.

The fact that both exist suggests that there is a meaningful difference, which in turn would mean libvirt might need separate XML attributes for each, which in turn influences how we might choose to design the aarch64 solution.
I'm curious if there's a good way to identify the guest memory map impact, as I'm not finding a clear marker in 'dmesg' that correlates ?
We were able to test this by using OVMF without the dynamic mmio window size patch (i.e. a version older than edk2-stable202211) and guest kernel parameters that are not set to allow re-calculating the MMIO window size by deferring guest resource allocations to the guest kernel (i.e. pci=realloc and pci=nocrs aren't set). With this we could reproduce a 4 GPU VM launch with guest BARs not mapped properly due to running out of space/resources. The BAR mapping failures will be clear in dmesg, with no BAR region mappings in /proc/iomem or output of lspci for the GPUs.
From there we added the pcihole64 attribute to the VM's libvirt definition, setting a 2 TB hole size, and the VM booted with guest GPU BARs mapped properly in dmesg + GPU BAR mappings visible in /proc/iomem and lspci output.
Lastly, observed the same behavior by removing the pcihole64 attribute and setting the X-PciMmio64Mb configuration to 2TB.
Or perhaps that would be too messy or x86-centric and it’s better to go with what you proposed (pcimmio64)?
If the 'pcihole64' setting really is setting the MMIO64 window, then it would be preferrable to re-use the existing setting field.
Per the tests above, pcihole64 is setting the MMIO64 window. The only concern I have with using it is that to date, it has been an x86-centric attribute and tied closely with the qemu -global parameter. I don’t think this is a show-stopper, but will require some code changes to allow it to work with the virt machine and connect it up to a different qemu parameter for that machine.
-matt
With regards,
Daniel

On May 13, 2025, at 3:10 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
On Mon, May 12, 2025 at 07:33:37PM +0000, Matt Ochs wrote:
On May 12, 2025, at 5:19 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:

On Fri, May 09, 2025 at 07:29:04PM +0000, Matt Ochs wrote:
Would it make sense to just use the existing pcihole64 since [I think] it more or less represents the same concept (setting 64bit MMIO window)?
I'm not sure. I've been struggling to reproduce an effect wit hseting the existing -global q35-pcihost.pci-hole64-size=1048576K settings on x86, and also wondering how it interacts with the previously mentioned -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=262144
Possibly the former only works with SeaBIOS, and the latter only works with EDK2, but I've not figured out how to prove this.
The qemu docs mention opt/ovmf is specifically for OVMF firmware: https://github.com/qemu/qemu/blob/7be29f2f1a3f5b037d27eedbd5df9f441e8c8c16/d...
The pcihole64 setting can be used with OVMF (see below) and with SEABIOS: https://github.com/libvirt/libvirt/blob/master/docs/formatdomain.rst (see pcihole64)
The X-PciMmio64Mb parameter isn't directly supported in libvirt IIUC. The libvirt XML would need to directly pass qemu command line arguments to use it.
I'm wondering what the semantic difference is between setting the pcihole64 property and the X-PciMmio64Mb fwcfg, in the context of OVMF.
The fact that both exist, suggests that there is a meaningful difference, which in turn would mean libvirt might need separate XML attributes for each, which in turn influences how we might choose to design the aarch64 solution.
AFAICT, I think these are the key points between the two:

- pcihole64 is a QEMU property. It tells QEMU how much address space to reserve for 64-bit PCI MMIO. It is about the host's reservation and what is exposed to the guest.

- X-PciMmio64Mb is an OVMF/firmware override. It tells OVMF to use a specific size for the MMIO64 window, regardless of what QEMU might have reserved or exposed by default. Moreover, as indicated by the X- prefix, this is an "experimental" option that isn't widely documented and is used as a workaround for situations where the default window sizing logic in OVMF is insufficient.

Since highmem-mmio-size is also a QEMU property that deals with host-side reservation for the MMIO64 window, it seems more in line with pcihole64.
Participants (3): Daniel P. Berrangé, Matt Ochs, Matthew R. Ochs