[RFC PATCH 0/8] Add support for Grace ACPI Extended GPU Memory (EGM) devices

The Grace SoC introduces Extended GPU Memory (EGM) [3], a feature that enables GPUs to efficiently access system memory within and across nodes. This patch series adds support for virtualized EGM (vEGM) in libvirt, allowing VMs to use dedicated EGM memory regions exposed through ACPI.

RFC Status
==========

This patch series is submitted as an RFC to gather feedback from the libvirt community on the overall approach and implementation details. Kernel EGM driver support and QEMU acpi-egm-memory device support are not yet upstream, but reference implementations are available [1][2] for testing and validating the libvirt integration. Any community feedback is appreciated.

Background and Use Cases
========================

EGM allows host memory to be partitioned into two regions:

1. Standard memory for Host OS usage
2. An EGM region assigned to VMs as their system memory

This enables various high-performance computing scenarios [3]:

- Large memory pools for AI/ML workloads
- High-performance computing applications
- Memory extension for systems with limited main memory
- GPU-accelerated workloads requiring large addressable memory

Implementation Overview
=======================

This 8-patch series adds a new device type, 'acpi-egm-memory', structured as follows:

1. Schema definition - Add the XML schema definition for the new ACPI EGM memory device
2. XML parsing - Implement XML parsing and internal data structures
3. Capability detection - Add QEMU capability detection for EGM support
4. Validation - Add validation logic for EGM device configuration
5. Command generation - Implement QEMU command line generation
6. Resource management - Set up the required cgroup and namespace configuration
7. Documentation - Add comprehensive documentation
8. Testing - Add qemuxmlconftest for the ACPI EGM memory device

XML Configuration
=================

Example usage in domain XML:

  <devices>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <alias name="ua-hostdev0"/>
      <source>
        <address domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
      </source>
    </hostdev>
    <acpiEgmMemory>
      <alias name="egm0"/>
      <pciDev>ua-hostdev0</pciDev>
      <numaNode>0</numaNode>
    </acpiEgmMemory>
  </devices>

This configuration results in the following QEMU command line options:

  -object memory-backend-file,id=m0,mem-path=/dev/egm0,size=32G,share=on,prealloc=on
  -object acpi-egm-memory,id=egm0,pci-dev=ua-hostdev0,node=0

Implementation Notes
====================

Device validation checks that the referenced PCI device exists, that the NUMA node is valid, and that the EGM device path is accessible with the proper permissions. Memory backing is configured automatically to use the EGM device path, and cgroups/namespaces are set up to allow safe access.

Testing
=======

I've tested XML parsing, validation, and QEMU command line generation. The qemuxmlconftest passes for XML handling. The command generation test currently fails because QEMU doesn't have acpi-egm-memory support yet, but the generated args look correct for when it does.

Requirements
============

This feature requires:

- NVIDIA ARM64 Grace platform with EGM support
- Host kernel with EGM driver support [1]
- QEMU with ACPI EGM device support [2]

[1] https://github.com/ianm-nv/NV-Kernels/tree/6.8_ghvirt_egm_may2025
[2] https://github.com/ianm-nv/qemu/tree/6.8_ghvirt_egm_may2025
[3] https://developer.nvidia.com/blog/nvidia-grace-hopper-superchip-architecture...
Ian May (8):
  conf: Add schema definition for ACPI EGM memory device
  conf: Add definitions and XML parsing for ACPI EGM memory device
  qemu: Add capability detection for ACPI EGM memory device
  qemu: Add validation for ACPI EGM memory device configuration
  qemu: Add command line generation for ACPI EGM memory device
  qemu: Add cgroup and namespace setup for ACPI EGM memory device
  docs: Document ACPI EGM memory device
  tests: Add qemuxmlconftest for ACPI EGM memory device

 docs/formatdomain.rst                     |  80 ++++++++++++++
 src/ch/ch_domain.c                        |   1 +
 src/conf/domain_conf.c                    | 102 ++++++++++++++++++
 src/conf/domain_conf.h                    |  11 ++
 src/conf/domain_postparse.c               |   8 ++
 src/conf/domain_validate.c                |  23 ++++
 src/conf/schemas/domaincommon.rng         |  19 ++++
 src/conf/virconftypes.h                   |   2 +
 src/libxl/libxl_driver.c                  |   6 ++
 src/lxc/lxc_driver.c                      |   6 ++
 src/qemu/qemu_capabilities.c              |   2 +
 src/qemu/qemu_capabilities.h              |   1 +
 src/qemu/qemu_cgroup.c                    |  21 ++++
 src/qemu/qemu_command.c                   |  37 +++++++
 src/qemu/qemu_domain.c                    |   2 +
 src/qemu/qemu_domain_address.c            |   2 +
 src/qemu/qemu_driver.c                    |   3 +
 src/qemu/qemu_hotplug.c                   |   5 +
 src/qemu/qemu_namespace.c                 |  21 ++++
 src/qemu/qemu_postparse.c                 |   1 +
 src/qemu/qemu_validate.c                  |  99 +++++++++++++++++
 src/test/test_driver.c                    |   4 +
 tests/meson.build                         |   1 +
 .../caps_10.0.0_aarch64.xml               |   1 +
 tests/qemuegmmock.c                       |  67 ++++++++++++
 .../acpi-egm-memory.aarch64-latest.args   |   1 +
 .../acpi-egm-memory.aarch64-latest.xml    |  56 ++++++++++
 tests/qemuxmlconfdata/acpi-egm-memory.xml |  27 +++++
 tests/qemuxmlconftest.c                   |   5 +-
 29 files changed, 613 insertions(+), 1 deletion(-)
 create mode 100644 tests/qemuegmmock.c
 create mode 100644 tests/qemuxmlconfdata/acpi-egm-memory.aarch64-latest.args
 create mode 100644 tests/qemuxmlconfdata/acpi-egm-memory.aarch64-latest.xml
 create mode 100644 tests/qemuxmlconfdata/acpi-egm-memory.xml

-- 
2.43.0

Add RelaxNG schema definition for the ACPI EGM memory device configuration. This schema defines the XML structure for configuring extended memory access through ACPI tables. The schema includes:

- Device alias for unique identification
- PCI device reference for memory source
- NUMA node assignment for memory placement

Signed-off-by: Ian May <ianm@nvidia.com>
---
 src/conf/schemas/domaincommon.rng | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/src/conf/schemas/domaincommon.rng b/src/conf/schemas/domaincommon.rng
index 2d6e15f144..8501419403 100644
--- a/src/conf/schemas/domaincommon.rng
+++ b/src/conf/schemas/domaincommon.rng
@@ -6429,6 +6429,22 @@
     </element>
   </define>
 
+  <define name="acpiEgmMemory">
+    <element name="acpiEgmMemory">
+      <element name="alias">
+        <attribute name="name">
+          <data type="string"/>
+        </attribute>
+      </element>
+      <element name="pciDev">
+        <data type="string"/>
+      </element>
+      <element name="numaNode">
+        <data type="nonNegativeInteger"/>
+      </element>
+    </element>
+  </define>
+
   <define name="hostdev">
     <element name="hostdev">
       <interleave>
@@ -6901,6 +6917,9 @@
         <optional>
           <ref name="pstore"/>
         </optional>
+        <optional>
+          <ref name="acpiEgmMemory"/>
+        </optional>
       </interleave>
     </element>
   </define>
-- 
2.43.0

Implement the core data structures and XML parsing for ACPI EGM memory device support. This includes:

- New device type VIR_DOMAIN_DEVICE_EGM
- Data structure virDomainAcpiEgmDef for device configuration
- XML parsing and formatting functions
- Integration with existing device handling infrastructure

Signed-off-by: Ian May <ianm@nvidia.com>
---
 src/ch/ch_domain.c             |   1 +
 src/conf/domain_conf.c         | 102 +++++++++++++++++++++++++++++++++
 src/conf/domain_conf.h         |  11 ++++
 src/conf/domain_postparse.c    |   8 +++
 src/conf/domain_validate.c     |   1 +
 src/conf/virconftypes.h        |   2 +
 src/libxl/libxl_driver.c       |   6 ++
 src/lxc/lxc_driver.c           |   6 ++
 src/qemu/qemu_domain.c         |   2 +
 src/qemu/qemu_domain_address.c |   2 +
 src/qemu/qemu_driver.c         |   3 +
 src/qemu/qemu_hotplug.c        |   5 ++
 src/qemu/qemu_postparse.c      |   1 +
 src/test/test_driver.c         |   4 ++
 14 files changed, 154 insertions(+)

diff --git a/src/ch/ch_domain.c b/src/ch/ch_domain.c
index 7231fdc49f..3c0ad0c513 100644
--- a/src/ch/ch_domain.c
+++ b/src/ch/ch_domain.c
@@ -185,6 +185,7 @@ chValidateDomainDeviceDef(const virDomainDeviceDef *dev,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
                        _("Cloud-Hypervisor doesn't support '%1$s' device"),
                        virDomainDeviceTypeToString(dev->type));
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index ba0d4a7b12..5f1854d89a 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -342,6 +342,7 @@ VIR_ENUM_IMPL(virDomainDevice,
               "audio",
               "crypto",
               "pstore",
+              "egm",
 );
 
 VIR_ENUM_IMPL(virDomainDiskDevice,
@@ -3623,6 +3624,17 @@ void virDomainPstoreDefFree(virDomainPstoreDef *def)
     g_free(def);
 }
 
+void virDomainAcpiEgmDefFree(virDomainAcpiEgmDef *def)
+{
+    if (!def)
+        return;
+
+    g_free(def->alias);
+    g_free(def->pciDev);
+    virDomainDeviceInfoClear(&def->info);
+    g_free(def);
+}
+
 void virDomainDeviceDefFree(virDomainDeviceDef *def)
 {
     if (!def)
@@ -3710,6 +3722,9 @@ void virDomainDeviceDefFree(virDomainDeviceDef *def)
     case VIR_DOMAIN_DEVICE_PSTORE:
         virDomainPstoreDefFree(def->data.pstore);
         break;
+    case VIR_DOMAIN_DEVICE_EGM:
+        virDomainAcpiEgmDefFree(def->data.egm);
+        break;
     case VIR_DOMAIN_DEVICE_LAST:
     case VIR_DOMAIN_DEVICE_NONE:
         break;
@@ -4688,6 +4703,8 @@ virDomainDeviceGetInfo(const virDomainDeviceDef *device)
         return &device->data.crypto->info;
     case VIR_DOMAIN_DEVICE_PSTORE:
         return &device->data.pstore->info;
+    case VIR_DOMAIN_DEVICE_EGM:
+        return &device->data.egm->info;
 
     /* The following devices do not contain virDomainDeviceInfo */
     case VIR_DOMAIN_DEVICE_LEASE:
@@ -4796,6 +4813,9 @@ virDomainDeviceSetData(virDomainDeviceDef *device,
     case VIR_DOMAIN_DEVICE_PSTORE:
         device->data.pstore = devicedata;
         break;
+    case VIR_DOMAIN_DEVICE_EGM:
+        device->data.egm = devicedata;
+        break;
     case VIR_DOMAIN_DEVICE_NONE:
     case VIR_DOMAIN_DEVICE_LAST:
         break;
@@ -5021,6 +5041,13 @@ virDomainDeviceInfoIterateFlags(virDomainDef *def,
             return rc;
     }
 
+    device.type = VIR_DOMAIN_DEVICE_EGM;
+    if (def->egm) {
+        device.data.egm = def->egm;
+        if ((rc = cb(def, &device, &def->egm->info, opaque)) != 0)
+            return rc;
+    }
+
     /* If the flag below is set, make sure @cb can handle @info being NULL */
     if (iteratorFlags & DOMAIN_DEVICE_ITERATE_MISSING_INFO) {
         device.type = VIR_DOMAIN_DEVICE_GRAPHICS;
@@ -5081,6 +5108,7 @@ virDomainDeviceInfoIterateFlags(virDomainDef *def,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         break;
     }
 #endif
@@ -14506,6 +14534,40 @@ virDomainPstoreDefParseXML(virDomainXMLOption *xmlopt,
 }
 
 
+static virDomainAcpiEgmDef *
+virDomainAcpiEgmDefParseXML(virDomainXMLOption *xmlopt,
+                            xmlNodePtr node,
+                            xmlXPathContextPtr ctxt,
+                            unsigned int flags)
+{
+    g_autoptr(virDomainAcpiEgmDef) def = NULL;
+    VIR_XPATH_NODE_AUTORESTORE(ctxt)
+    int rc;
+    xmlNodePtr alias = NULL;
+
+    def = g_new0(virDomainAcpiEgmDef, 1);
+
+    ctxt->node = node;
+
+    alias = virXPathNode("./alias", ctxt);
if (!alias) + return NULL; + def->alias = virXMLPropString(alias, "name"); + def->pciDev = virXPathString("string(./pciDev)", ctxt); + rc = virXPathInt("string(./numaNode)", ctxt, &def->numaNode); + if (rc < 0 || def->numaNode < 0) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("invalid NUMA node in target")); + return NULL; + } + + if (virDomainDeviceInfoParseXML(xmlopt, node, ctxt, &def->info, flags) < 0) + return NULL; + + return g_steal_pointer(&def); +} + + static int virDomainDeviceDefParseType(const char *typestr, virDomainDeviceType *type) @@ -14691,6 +14753,12 @@ virDomainDeviceDefParse(const char *xmlStr, return NULL; } break; + case VIR_DOMAIN_DEVICE_EGM: + if (!(dev->data.egm = virDomainAcpiEgmDefParseXML(xmlopt, node, + ctxt, flags))) { + return NULL; + } + break; case VIR_DOMAIN_DEVICE_NONE: case VIR_DOMAIN_DEVICE_LAST: break; @@ -20104,6 +20172,22 @@ virDomainDefParseXML(xmlXPathContextPtr ctxt, } VIR_FREE(nodes); + if ((n = virXPathNodeSet("./devices/acpiEgmMemory", ctxt, &nodes)) < 0) + return NULL; + + if (n > 1) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("only a single egm device is supported")); + return NULL; + } + + if (n > 0) { + if (!(def->egm = virDomainAcpiEgmDefParseXML(xmlopt, nodes[0], + ctxt, flags))) + return NULL; + } + VIR_FREE(nodes); + /* analysis of the user namespace mapping */ if ((n = virXPathNodeSet("./idmap/uid", ctxt, &nodes)) < 0) return NULL; @@ -22576,6 +22660,7 @@ virDomainDefCheckABIStabilityFlags(virDomainDef *src, case VIR_DOMAIN_DEVICE_AUDIO: case VIR_DOMAIN_DEVICE_CRYPTO: case VIR_DOMAIN_DEVICE_PSTORE: + case VIR_DOMAIN_DEVICE_EGM: break; } #endif @@ -28844,6 +28929,19 @@ virDomainPstoreDefFormat(virBuffer *buf, return 0; } +static int +virDomainAcpiEgmDefFormat(virBuffer *buf, + virDomainAcpiEgmDef *egm) +{ + g_auto(virBuffer) childBuf = VIR_BUFFER_INIT_CHILD(buf); + + virBufferAsprintf(&childBuf, "<alias name='%s'/>\n", egm->alias); + virBufferAsprintf(&childBuf, "<pciDev>%s</pciDev>\n", egm->pciDev); + 
+    virBufferAsprintf(&childBuf, "<numaNode>%d</numaNode>\n", egm->numaNode);
+
+    virXMLFormatElement(buf, "acpiEgmMemory", NULL, &childBuf);
+    return 0;
+}
 
 int
 virDomainDefFormatInternal(virDomainDef *def,
@@ -29328,6 +29426,9 @@ virDomainDefFormatInternalSetRootName(virDomainDef *def,
     if (def->pstore)
         virDomainPstoreDefFormat(buf, def->pstore, flags);
 
+    if (def->egm)
+        virDomainAcpiEgmDefFormat(buf, def->egm);
+
     virBufferAdjustIndent(buf, -2);
     virBufferAddLit(buf, "</devices>\n");
 
@@ -29488,6 +29589,7 @@ virDomainDeviceIsUSB(virDomainDeviceDef *dev,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         break;
     }
 
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 6008ec66d3..5132c6587b 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -88,6 +88,7 @@ typedef enum {
     VIR_DOMAIN_DEVICE_AUDIO,
     VIR_DOMAIN_DEVICE_CRYPTO,
     VIR_DOMAIN_DEVICE_PSTORE,
+    VIR_DOMAIN_DEVICE_EGM,
 
     VIR_DOMAIN_DEVICE_LAST
 } virDomainDeviceType;
@@ -122,6 +123,7 @@ struct _virDomainDeviceDef {
         virDomainAudioDef *audio;
         virDomainCryptoDef *crypto;
         virDomainPstoreDef *pstore;
+        virDomainAcpiEgmDef *egm;
     } data;
 };
 
@@ -3100,6 +3102,12 @@ struct _virDomainPstoreDef {
     virDomainDeviceInfo info;
 };
 
+struct _virDomainAcpiEgmDef {
+    char *alias;
+    char *pciDev;
+    int numaNode;
+    virDomainDeviceInfo info;
+};
 
 #define SCSI_SUPER_WIDE_BUS_MAX_CONT_UNIT 64
 #define SCSI_WIDE_BUS_MAX_CONT_UNIT 16
@@ -3282,6 +3290,7 @@ struct _virDomainDef {
     virDomainIOMMUDef *iommu;
     virDomainVsockDef *vsock;
     virDomainPstoreDef *pstore;
+    virDomainAcpiEgmDef *egm;
 
     void *namespaceData;
     virXMLNamespace ns;
@@ -3728,6 +3737,8 @@ void virDomainCryptoDefFree(virDomainCryptoDef *def);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(virDomainCryptoDef, virDomainCryptoDefFree);
 void virDomainPstoreDefFree(virDomainPstoreDef *def);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(virDomainPstoreDef, virDomainPstoreDefFree);
+void virDomainAcpiEgmDefFree(virDomainAcpiEgmDef *def);
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(virDomainAcpiEgmDef, virDomainAcpiEgmDefFree);
 void virDomainNetTeamingInfoFree(virDomainNetTeamingInfo *teaming);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(virDomainNetTeamingInfo, virDomainNetTeamingInfoFree);
 void virDomainNetPortForwardFree(virDomainNetPortForward *pf);
diff --git a/src/conf/domain_postparse.c b/src/conf/domain_postparse.c
index a07ec8d94e..4933259129 100644
--- a/src/conf/domain_postparse.c
+++ b/src/conf/domain_postparse.c
@@ -85,6 +85,13 @@ virDomainDefPostParseMemory(virDomainDef *def,
         return -1;
     }
 
+    /* if we have a Grace EGM device, setup memory backing */
+    if (def->egm) {
+        def->mem.source = VIR_DOMAIN_MEMORY_SOURCE_FILE;
+        def->mem.access = VIR_DOMAIN_MEMORY_ACCESS_SHARED;
+        def->mem.allocation = VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE;
+    }
+
     return 0;
 }
 
@@ -760,6 +767,7 @@ virDomainDeviceDefPostParseCommon(virDomainDeviceDef *dev,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         ret = 0;
         break;
 
diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c
index 8f7259a0e1..88e61fb878 100644
--- a/src/conf/domain_validate.c
+++ b/src/conf/domain_validate.c
@@ -3317,6 +3317,7 @@ virDomainDeviceDefValidateInternal(const virDomainDeviceDef *dev,
     case VIR_DOMAIN_DEVICE_PSTORE:
         return virDomainPstoreDefValidate(dev->data.pstore);
 
+    case VIR_DOMAIN_DEVICE_EGM:
     case VIR_DOMAIN_DEVICE_LEASE:
     case VIR_DOMAIN_DEVICE_WATCHDOG:
     case VIR_DOMAIN_DEVICE_HUB:
diff --git a/src/conf/virconftypes.h b/src/conf/virconftypes.h
index 8c6fcdbeaa..97ddc3de2a 100644
--- a/src/conf/virconftypes.h
+++ b/src/conf/virconftypes.h
@@ -268,6 +268,8 @@ typedef struct _virDomainCryptoDef virDomainCryptoDef;
 
 typedef struct _virDomainPstoreDef virDomainPstoreDef;
 
+typedef struct _virDomainAcpiEgmDef virDomainAcpiEgmDef;
+
 typedef struct _virDomainWatchdogDef virDomainWatchdogDef;
 
 typedef struct _virDomainXMLOption virDomainXMLOption;
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index 308c0372aa..5e56327439 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -3492,6 +3492,7 @@ libxlDomainAttachDeviceLive(libxlDriverPrivate *driver,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
                        _("device type '%1$s' cannot be attached"),
                        virDomainDeviceTypeToString(dev->type));
@@ -3596,6 +3597,7 @@ libxlDomainAttachDeviceConfig(virDomainDef *vmdef, virDomainDeviceDef *dev)
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                        _("persistent attach of device is not supported"));
         return -1;
@@ -3965,6 +3967,7 @@ libxlDomainDetachDeviceLive(libxlDriverPrivate *driver,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
                        _("device type '%1$s' cannot be detached"),
                        virDomainDeviceTypeToString(dev->type));
@@ -4056,6 +4059,7 @@ libxlDomainDetachDeviceConfig(virDomainDef *vmdef, virDomainDeviceDef *dev)
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                        _("persistent detach of device is not supported"));
         return -1;
@@ -4119,6 +4123,7 @@ libxlDomainUpdateDeviceLive(virDomainObj *vm, virDomainDeviceDef *dev)
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
                        _("device type '%1$s' cannot be updated"),
                        virDomainDeviceTypeToString(dev->type));
@@ -4182,6 +4187,7 @@ libxlDomainUpdateDeviceConfig(virDomainDef *vmdef, virDomainDeviceDef *dev)
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                        _("persistent update of device is not supported"));
         return -1;
diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c
index 80cf07d2e5..46f65ef630 100644
--- a/src/lxc/lxc_driver.c
+++ b/src/lxc/lxc_driver.c
@@ -3020,6 +3020,7 @@ lxcDomainAttachDeviceConfig(virDomainDef *vmdef,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                        _("persistent attach of device is not supported"));
         break;
@@ -3086,6 +3087,7 @@ lxcDomainUpdateDeviceConfig(virDomainDef *vmdef,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                        _("persistent update of device is not supported"));
         break;
@@ -3168,6 +3170,7 @@ lxcDomainDetachDeviceConfig(virDomainDef *vmdef,
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                        _("persistent detach of device is not supported"));
         break;
@@ -3270,6 +3273,7 @@ lxcDomainAttachDeviceMknodHelper(pid_t pid G_GNUC_UNUSED,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         virReportError(VIR_ERR_INTERNAL_ERROR,
                        _("Unexpected device type %1$d"),
                        data->def->type);
@@ -3946,6 +3950,7 @@ lxcDomainAttachDeviceLive(virLXCDriver *driver,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
                        _("device type '%1$s' cannot be attached"),
                        virDomainDeviceTypeToString(dev->type));
@@ -4364,6 +4369,7 @@ lxcDomainDetachDeviceLive(virLXCDriver *driver,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
                        _("device type '%1$s' cannot be detached"),
                        virDomainDeviceTypeToString(dev->type));
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 54eda9e12f..4414fd7289 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -8830,6 +8830,7 @@ qemuDomainPrepareChardevSourceOne(virDomainDeviceDef *dev,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         break;
     }
 
@@ -10720,6 +10721,7 @@ qemuDomainDeviceBackendChardevForeachOne(virDomainDeviceDef *dev,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         /* no chardev backend */
         break;
     }
diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c
index 96a9ca9b14..f204f595d4 100644
--- a/src/qemu/qemu_domain_address.c
+++ b/src/qemu/qemu_domain_address.c
@@ -471,6 +471,7 @@ qemuDomainDeviceSupportZPCI(virDomainDeviceDef *device)
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         break;
 
     case VIR_DOMAIN_DEVICE_NONE:
@@ -1013,6 +1014,7 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDef *dev,
         break;
 
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
         return pciFlags;
 
     /* These devices don't ever connect with PCI */
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index a0f770b053..a0375a28e0 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -6906,6 +6906,7 @@ qemuDomainAttachDeviceConfig(virDomainDef *vmdef,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
     case VIR_DOMAIN_DEVICE_LAST:
         virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
                        _("persistent attach of device '%1$s' is not supported"),
@@ -7125,6 +7126,7 @@ qemuDomainDetachDeviceConfig(virDomainDef *vmdef,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
     case VIR_DOMAIN_DEVICE_LAST:
         virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
                        _("persistent detach of device '%1$s' is not supported"),
@@ -7251,6 +7253,7 @@ qemuDomainUpdateDeviceConfig(virDomainDef *vmdef,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
     case VIR_DOMAIN_DEVICE_LAST:
         virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
                        _("persistent update of device '%1$s' is not supported"),
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index e9568af125..e0573d2eaf 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -3563,6 +3563,7 @@ qemuDomainAttachDeviceLive(virDomainObj *vm,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
     case VIR_DOMAIN_DEVICE_LAST:
         virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
                        _("live attach of device '%1$s' is not supported"),
@@ -5533,6 +5534,7 @@ qemuDomainRemoveAuditDevice(virDomainObj *vm,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
     case VIR_DOMAIN_DEVICE_LAST:
         /* libvirt doesn't yet support detaching these devices */
         break;
@@ -5638,6 +5640,7 @@ qemuDomainRemoveDevice(virQEMUDriver *driver,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
     case VIR_DOMAIN_DEVICE_LAST:
         virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
                        _("don't know how to remove a %1$s device"),
@@ -6540,6 +6543,7 @@ qemuDomainDetachDeviceLive(virDomainObj *vm,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
     case VIR_DOMAIN_DEVICE_LAST:
         virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
                        _("live detach of device '%1$s' is not supported"),
@@ -7531,6 +7535,7 @@ qemuDomainUpdateDeviceLive(virDomainObj *vm,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
     case VIR_DOMAIN_DEVICE_LAST:
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
                        _("live update of device '%1$s' is not supported"),
diff --git a/src/qemu/qemu_postparse.c b/src/qemu/qemu_postparse.c
index 9c2427970d..ae60ca02e8 100644
--- a/src/qemu/qemu_postparse.c
+++ b/src/qemu/qemu_postparse.c
@@ -959,6 +959,7 @@ qemuDomainDeviceDefPostParse(virDomainDeviceDef *dev,
     case VIR_DOMAIN_DEVICE_RNG:
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
+    case VIR_DOMAIN_DEVICE_EGM:
         ret = 0;
         break;
 
diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index 25335d9002..2e1048686c 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -10460,6 +10460,7 @@ testDomainAttachDeviceLive(virDomainObj *vm,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
     case VIR_DOMAIN_DEVICE_LAST:
         virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
                        _("live attach of device '%1$s' is not supported"),
@@ -10603,6 +10604,7 @@ testDomainUpdateDevice(virDomainDef *vmdef,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
     case VIR_DOMAIN_DEVICE_LAST:
         virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
                        _("persistent update of device '%1$s' is not supported"),
@@ -10975,6 +10977,7 @@ testDomainRemoveDevice(testDriver *driver,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
     case VIR_DOMAIN_DEVICE_LAST:
         virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
                        _("live detach of device '%1$s' is not supported"),
@@ -11046,6 +11049,7 @@ testDomainDetachDeviceLive(testDriver *driver,
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_CRYPTO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
     case VIR_DOMAIN_DEVICE_LAST:
         virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
                        _("live detach of device '%1$s' is not supported"),
-- 
2.43.0

Add QEMU capability detection for the ACPI EGM memory device feature. This allows libvirt to determine if the QEMU binary supports the required functionality before attempting to use it. The capability is exposed through the QEMU device type 'acpi-egm-memory' and is used to validate device configuration and command line generation.

Signed-off-by: Ian May <ianm@nvidia.com>
---
 src/qemu/qemu_capabilities.c | 2 ++
 src/qemu/qemu_capabilities.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index b02f8e7a01..15b4461831 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -741,6 +741,7 @@ VIR_ENUM_IMPL(virQEMUCaps,
               "amd-iommu", /* QEMU_CAPS_AMD_IOMMU */
               "amd-iommu.pci-id", /* QEMU_CAPS_AMD_IOMMU_PCI_ID */
               "usb-bot", /* QEMU_CAPS_DEVICE_USB_BOT */
+              "acpi-egm-memory", /* QEMU_CAPS_DEVICE_ACPI_EGM_MEMORY */
     );
 
 
@@ -1429,6 +1430,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = {
     { "nvme-ns", QEMU_CAPS_DEVICE_NVME_NS },
     { "amd-iommu", QEMU_CAPS_AMD_IOMMU },
     { "usb-bot", QEMU_CAPS_DEVICE_USB_BOT },
+    { "acpi-egm-memory", QEMU_CAPS_DEVICE_ACPI_EGM_MEMORY },
 };
 
 
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index 966e30fa11..36528d22f7 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -722,6 +722,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */
     QEMU_CAPS_AMD_IOMMU, /* -device amd-iommu */
     QEMU_CAPS_AMD_IOMMU_PCI_ID, /* amd-iommu.pci-id */
     QEMU_CAPS_DEVICE_USB_BOT, /* -device usb-bot */
+    QEMU_CAPS_DEVICE_ACPI_EGM_MEMORY, /* For using extended memory */
 
     QEMU_CAPS_LAST /* this must always be the last item */
 } virQEMUCapsFlags;
-- 
2.43.0

Implement validation logic for ACPI EGM memory device configuration:

- Validate PCI device reference exists and is properly configured
- Check NUMA node assignment is valid
- Verify device paths exist and are accessible
- Ensure proper permissions on device files

Signed-off-by: Ian May <ianm@nvidia.com>
---
 src/conf/domain_validate.c | 22 +++++++++
 src/qemu/qemu_validate.c   | 99 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 121 insertions(+)

diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c
index 88e61fb878..3cbfe867dc 100644
--- a/src/conf/domain_validate.c
+++ b/src/conf/domain_validate.c
@@ -3203,6 +3203,26 @@ virDomainPstoreDefValidate(const virDomainPstoreDef *pstore)
     return 0;
 }
 
+static int
+virDomainAcpiEgmDefValidate(const virDomainAcpiEgmDef *egm)
+{
+    if (egm->pciDev == NULL || egm->pciDev[0] == '\0') {
+        virReportError(VIR_ERR_XML_ERROR, "%s",
+                       _("missing pciDev for ACPI EGM device"));
+        return -1;
+    }
+
+    if (egm->numaNode < 0) {
+        virReportError(VIR_ERR_XML_ERROR, "%s",
+                       _("NUMA node must be specified for ACPI EGM device"));
+        return -1;
+    }
+
+    VIR_DEBUG("Validating EGM device: alias=%s pciDev=%s numaNode=%d",
+              egm->alias, egm->pciDev, egm->numaNode);
+
+    return 0;
+}
 
 static int
 virDomainDeviceInfoValidate(const virDomainDeviceDef *dev)
@@ -3318,6 +3338,8 @@ virDomainDeviceDefValidateInternal(const virDomainDeviceDef *dev,
         return virDomainPstoreDefValidate(dev->data.pstore);
 
     case VIR_DOMAIN_DEVICE_EGM:
+        return virDomainAcpiEgmDefValidate(dev->data.egm);
+
     case VIR_DOMAIN_DEVICE_LEASE:
     case VIR_DOMAIN_DEVICE_WATCHDOG:
     case VIR_DOMAIN_DEVICE_HUB:
diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
index 57dc4171fe..b7cb0c632b 100644
--- a/src/qemu/qemu_validate.c
+++ b/src/qemu/qemu_validate.c
@@ -4977,6 +4977,102 @@ qemuValidateDomainDeviceDefPstore(virDomainPstoreDef *pstore,
     return 0;
 }
 
+static int
+qemuValidateDomainDeviceDefAcpiEgm(virDomainAcpiEgmDef *egm,
+                                   const virDomainDef *def,
+                                   virQEMUCaps *qemuCaps)
+{
+    g_autofree char *egm_path = NULL;
+    g_autofree char *egm_pci_path = NULL;
+    g_autofree char *expected_pci = NULL;
+    g_autofree char *gpu_devices_content = NULL;
+    virDomainHostdevDef *hostdev = NULL;
+    size_t i;
+
+    if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ACPI_EGM_MEMORY)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("ACPI EGM memory device is not supported with this QEMU binary"));
+        return -1;
+    }
+
+    /* Find the referenced PCI hostdev */
+    for (i = 0; i < def->nhostdevs; i++) {
+        virDomainHostdevDef *dev = def->hostdevs[i];
+
+        if (dev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS ||
+            dev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
+            continue;
+
+        if (dev->info && dev->info->alias && STREQ(dev->info->alias, egm->pciDev)) {
+            hostdev = dev;
+            break;
+        }
+    }
+
+    if (!hostdev) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                       _("Cannot find PCI device '%1$s' referenced by EGM device"),
+                       egm->pciDev);
+        return -1;
+    }
+
+    /* Validate NUMA node; a guest without NUMA cells only has node 0 */
+    if (egm->numaNode >= MAX(virDomainNumaGetNodeCount(def->numa), 1)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                       _("NUMA node %1$d for EGM device does not exist"),
+                       egm->numaNode);
+        return -1;
+    }
+
+    /* Validate EGM device path exists and is accessible */
+    egm_path = g_strdup_printf("/dev/%s", egm->alias);
+    if (!virFileExists(egm_path)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                       _("EGM device path '%1$s' does not exist"),
+                       egm_path);
+        return -1;
+    }
+
+    /* Check if we have proper permissions */
+    if (access(egm_path, R_OK | W_OK) < 0) {
+        virReportSystemError(errno,
+                             _("Cannot access EGM device '%1$s'"),
+                             egm_path);
+        return -1;
+    }
+
+    /* Validate EGM pci device path */
+    egm_pci_path = g_strdup_printf("/sys/class/egm/%s/gpu_devices", egm->alias);
+    if (!virFileExists(egm_pci_path)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                       _("Cannot find GPU device information for EGM device '%1$s'"),
+                       egm->alias);
+        return -1;
+    }
+
+    /* Read and validate PCI address from gpu_devices file */
+    expected_pci = g_strdup_printf("%04x:%02x:%02x.%x",
+                                   hostdev->source.subsys.u.pci.addr.domain,
+                                   hostdev->source.subsys.u.pci.addr.bus,
+                                   hostdev->source.subsys.u.pci.addr.slot,
+                                   hostdev->source.subsys.u.pci.addr.function);
+
+    if (virFileReadAll(egm_pci_path, 1024, &gpu_devices_content) < 0) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                       _("Cannot read GPU device information for EGM device '%1$s'"),
+                       egm->alias);
+        return -1;
+    }
+
+    if (!strstr(gpu_devices_content, expected_pci)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                       _("PCI device '%2$s' is not associated with EGM device '%1$s'"),
+                       egm->alias, expected_pci);
+        return -1;
+    }
+
+    return 0;
+}
 
 static int
 qemuSoundCodecTypeToCaps(int type)
@@ -5748,6 +5844,9 @@ qemuValidateDomainDeviceDef(const virDomainDeviceDef *dev,
     case VIR_DOMAIN_DEVICE_PSTORE:
         return qemuValidateDomainDeviceDefPstore(dev->data.pstore, def, qemuCaps);
 
+    case VIR_DOMAIN_DEVICE_EGM:
+        return qemuValidateDomainDeviceDefAcpiEgm(dev->data.egm, def, qemuCaps);
+
     case VIR_DOMAIN_DEVICE_LEASE:
     case VIR_DOMAIN_DEVICE_PANIC:
     case VIR_DOMAIN_DEVICE_NONE:
-- 
2.43.0

Implement QEMU command line generation for the ACPI EGM memory device.
This includes:
- Adding the device to the QEMU command line
- Setting up memory backend properties
- Configuring device parameters (alias, PCI device, NUMA node)

Signed-off-by: Ian May <ianm@nvidia.com>
---
 src/qemu/qemu_command.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 4b1e36a4c1..280211fbf7 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -1011,6 +1011,7 @@ qemuBuildVirtioDevGetConfigDev(const virDomainDeviceDef *device,
     case VIR_DOMAIN_DEVICE_IOMMU:
     case VIR_DOMAIN_DEVICE_AUDIO:
     case VIR_DOMAIN_DEVICE_PSTORE:
+    case VIR_DOMAIN_DEVICE_EGM:
     case VIR_DOMAIN_DEVICE_LAST:
     default:
         break;
@@ -3451,6 +3452,8 @@ qemuBuildMemoryBackendProps(virJSONValue **backendProps,
     } else if (useHugepage) {
         if (qemuGetDomainHupageMemPath(priv->driver, def, pagesize, &memPath) < 0)
             return -1;
+    } else if (def->egm) {
+        memPath = g_strdup_printf("/dev/%s", def->egm->alias);
     } else {
         /* We can have both pagesize and mem source. If that's the case,
          * prefer hugepages as those are more specific. */
@@ -10533,6 +10536,36 @@ qemuBuildPstoreCommandLine(virCommand *cmd,
     return 0;
 }

+static int
+qemuBuildAcpiEgmCommandLine(virCommand *cmd,
+                            virDomainAcpiEgmDef *egm,
+                            virQEMUCaps *qemuCaps)
+{
+    g_autoptr(virJSONValue) egmProps = NULL;
+
+    if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_ACPI_EGM_MEMORY)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("ACPI EGM memory device is not supported with this QEMU binary"));
+        return -1;
+    }
+
+    VIR_DEBUG("Creating ACPI EGM device: alias=%s, pciDev=%s, numaNode=%d",
+              egm->alias, egm->pciDev, egm->numaNode);
+
+    if (qemuMonitorCreateObjectProps(&egmProps,
+                                     "acpi-egm-memory",
+                                     egm->alias,
+                                     "s:pci-dev", egm->pciDev,
+                                     "u:node", egm->numaNode,
+                                     NULL) < 0) {
+        return -1;
+    }
+
+    if (qemuBuildObjectCommandlineFromJSON(cmd, egmProps) < 0)
+        return -1;
+
+    return 0;
+}

 static int
 qemuBuildAsyncTeardownCommandLine(virCommand *cmd,
@@ -10887,6 +10920,10 @@ qemuBuildCommandLine(virDomainObj *vm,
         qemuBuildPstoreCommandLine(cmd, def, def->pstore, qemuCaps) < 0)
         return NULL;

+    if (def->egm &&
+        qemuBuildAcpiEgmCommandLine(cmd, def->egm, qemuCaps) < 0)
+        return NULL;
+
     if (qemuBuildAsyncTeardownCommandLine(cmd, def, qemuCaps) < 0)
         return NULL;

-- 
2.43.0

Implement proper isolation and access control for ACPI EGM memory devices:
- Add device to cgroup for access control
- Set up namespace mappings for device access
- Ensure proper permissions in containerized environments

Signed-off-by: Ian May <ianm@nvidia.com>
---
 src/qemu/qemu_cgroup.c    | 21 +++++++++++++++++++++
 src/qemu/qemu_namespace.c | 21 +++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 25e42ebfc6..3a33087778 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -753,6 +753,22 @@ qemuSetupSEVCgroup(virDomainObj *vm)
                                      VIR_CGROUP_DEVICE_RW, false);
 }

+static int
+qemuSetupAcpiEgmCgroup(virDomainObj *vm)
+{
+    g_autofree char *path = NULL;
+
+    path = g_strdup_printf("/dev/%s", vm->def->egm->alias);
+
+    if (path &&
+        qemuCgroupAllowDevicePath(vm, path,
+                                  VIR_CGROUP_DEVICE_RW, false) < 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
 static int
 qemuSetupDevicesCgroup(virDomainObj *vm)
 {
@@ -871,6 +887,11 @@ qemuSetupDevicesCgroup(virDomainObj *vm)
         }
     }

+    if (vm->def->egm) {
+        if (qemuSetupAcpiEgmCgroup(vm) < 0)
+            return -1;
+    }
+
     return 0;
 }

diff --git a/src/qemu/qemu_namespace.c b/src/qemu/qemu_namespace.c
index 59421ec9d1..60000c2636 100644
--- a/src/qemu/qemu_namespace.c
+++ b/src/qemu/qemu_namespace.c
@@ -676,6 +676,24 @@ qemuDomainSetupLaunchSecurity(virDomainObj *vm,
 }


+static int
+qemuDomainSetupAcpiEgm(virDomainObj *vm,
+                       GSList **paths)
+{
+    virDomainAcpiEgmDef *egm = vm->def->egm;
+    g_autofree char *path = NULL;
+
+    if (!egm)
+        return 0;
+
+    path = g_strdup_printf("/dev/%s", egm->alias);
+
+    *paths = g_slist_prepend(*paths, g_steal_pointer(&path));
+
+    return 0;
+}
+
+
 static int
 qemuNamespaceMknodPaths(virDomainObj *vm,
                         GSList *paths,
@@ -729,6 +747,9 @@ qemuDomainBuildNamespace(virQEMUDriverConfig *cfg,
     if (qemuDomainSetupLaunchSecurity(vm, &paths) < 0)
         return -1;

+    if (qemuDomainSetupAcpiEgm(vm, &paths) < 0)
+        return -1;
+
     if (qemuNamespaceMknodPaths(vm, paths, NULL) < 0)
         return -1;

-- 
2.43.0

Add documentation for the ACPI EGM memory device feature:
- Describe the purpose and use cases
- Document XML configuration options
- Provide example configurations
- Explain requirements and limitations

Signed-off-by: Ian May <ianm@nvidia.com>
---
 docs/formatdomain.rst | 80 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index 54a809eaf9..806af24fc2 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -1897,6 +1897,86 @@ For instance, ``target='0' cache='1'`` refers to the first level cache of NUMA
 node 0.


+ACPI EGM Memory Devices
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The ACPI EGM (Extended GPU Memory) device backs guest memory with a host EGM region,
+a partition of host memory on NVIDIA Grace platforms that GPUs can access efficiently,
+and describes that region to the guest through ACPI.
+
+::
+
+  <devices>
+    ...
+    <acpiEgmMemory>
+      <alias name='egm0'/>
+      <pciDev>ua-hostdev0</pciDev>
+      <numaNode>0</numaNode>
+    </acpiEgmMemory>
+    ...
+  </devices>
+
+The ``acpiEgmMemory`` element has the following sub-elements:
+
+``alias``
+   Specifies the name of the host EGM device, e.g. ``egm0`` for ``/dev/egm0``.
+
+``pciDev``
+   Specifies the alias of the passthrough PCI hostdev (the GPU) associated with the
+   EGM region. It must match the alias of a PCI hostdev in the domain configuration.
+
+``numaNode``
+   Specifies the guest NUMA node to which the EGM memory is assigned. This must reference
+   a valid NUMA node defined in the domain configuration.
+
+To use ACPI EGM, you typically need:
+
+1. A passthrough GPU (PCI hostdev) associated with a host EGM region
+2. A NUMA topology defined in the domain configuration
+3. A guest OS that can recognize and use the ACPI EGM tables
+4. A QEMU version that supports the ACPI EGM feature
+
+The guest OS discovers the EGM region through the ACPI tables. This is intended for
+workloads that need large GPU-accessible memory pools, such as AI/ML and
+high-performance computing applications.
+
+Example configuration:
+
+::
+
+  <domain type='kvm'>
+    <name>egm-example</name>
+    <memory unit='MiB'>8192</memory>
+    <vcpu>4</vcpu>
+    <cpu mode='host-passthrough'>
+      <topology sockets='1' cores='4' threads='1'/>
+      <numa>
+        <cell id='0' cpus='0-3' memory='8192' unit='MiB'/>
+      </numa>
+    </cpu>
+    <os>
+      <type arch='aarch64' machine='virt'>hvm</type>
+      <loader readonly='yes' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>
+    </os>
+    <devices>
+      <!-- Passthrough PCI device -->
+      <hostdev mode='subsystem' type='pci' managed='yes'>
+        <alias name='ua-hostdev0'/>
+        <source>
+          <address domain='0x0009' bus='0x01' slot='0x00' function='0x00'/>
+        </source>
+      </hostdev>
+
+      <!-- ACPI EGM device referencing the PCI device -->
+      <acpiEgmMemory>
+        <alias name='egm0'/>
+        <pciDev>ua-hostdev0</pciDev>
+        <numaNode>0</numaNode>
+      </acpiEgmMemory>
+    </devices>
+  </domain>
+
+
 Events configuration
 --------------------

-- 
2.43.0

Add test coverage for the ACPI EGM memory device feature:
- Add test case to qemuxmlconftest.c for aarch64 architecture
- Add acpi-egm-memory capability to QEMU 10.0.0 aarch64 capabilities
- Create test input XML with EGM device configuration
- Generate expected output XML and QEMU command line args
- Add qemuegmmock to mock the EGM device and sysfs checks during tests

The test validates XML parsing, formatting, device validation, and QEMU
command line generation for the EGM device. The filesystem checks are
satisfied by the preloaded qemuegmmock in the test environment, so the
production code keeps full validation.

Signed-off-by: Ian May <ianm@nvidia.com>
---
 tests/meson.build                         |  1 +
 .../caps_10.0.0_aarch64.xml               |  1 +
 tests/qemuegmmock.c                       | 67 +++++++++++++++++++
 .../acpi-egm-memory.aarch64-latest.args   |  1 +
 .../acpi-egm-memory.aarch64-latest.xml    | 56 ++++++++++++++++
 tests/qemuxmlconfdata/acpi-egm-memory.xml | 27 ++++++++
 tests/qemuxmlconftest.c                   |  5 +-
 7 files changed, 157 insertions(+), 1 deletion(-)
 create mode 100644 tests/qemuegmmock.c
 create mode 100644 tests/qemuxmlconfdata/acpi-egm-memory.aarch64-latest.args
 create mode 100644 tests/qemuxmlconfdata/acpi-egm-memory.aarch64-latest.xml
 create mode 100644 tests/qemuxmlconfdata/acpi-egm-memory.xml

diff --git a/tests/meson.build b/tests/meson.build
index 0d76d37959..312c05a5f1 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -174,6 +174,7 @@ if conf.has('WITH_QEMU')
     { 'name': 'qemucaps2xmlmock' },
     { 'name': 'qemucapsprobemock', 'link_with': [ test_qemu_driver_lib ] },
     { 'name': 'qemucpumock' },
+    { 'name': 'qemuegmmock' },
     { 'name': 'qemuhotplugmock', 'link_with': [ test_qemu_driver_lib, test_utils_qemu_lib, test_utils_lib ] },
     { 'name': 'qemuxml2argvmock' },
     { 'name': 'virhostidmock' },
diff --git a/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml b/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml
index 200873b3a2..30abf65675 100644
--- a/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml
+++ b/tests/qemucapabilitiesdata/caps_10.0.0_aarch64.xml
@@ -109,6 +109,7 @@
   <flag name='vnc-power-control'/>
   <flag name='rotation-rate'/>
   <flag name='acpi-index'/>
+  <flag name='acpi-egm-memory'/>
   <flag name='input-linux'/>
   <flag name='confidential-guest-support'/>
   <flag name='set-action'/>
diff --git a/tests/qemuegmmock.c b/tests/qemuegmmock.c
new file mode 100644
index 0000000000..c915212f45
--- /dev/null
+++ b/tests/qemuegmmock.c
@@ -0,0 +1,67 @@
+/*
+ * Copyright (C) 2024 Red Hat, Inc.
+ * SPDX-License-Identifier: LGPL-2.1-or-later
+ */
+
+#include <config.h>
+#include <unistd.h>
+
+#include "internal.h"
+#include "virfile.h"
+#include "virmock.h"
+
+static bool (*real_virFileExists)(const char *path);
+static int (*real_access)(const char *path, int mode);
+static int (*real_virFileReadAll)(const char *path, int maxlen, char **buf);
+
+static void
+init_syms(void)
+{
+    if (real_virFileExists && real_access && real_virFileReadAll)
+        return;
+
+    VIR_MOCK_REAL_INIT(virFileExists);
+    VIR_MOCK_REAL_INIT(access);
+    VIR_MOCK_REAL_INIT(virFileReadAll);
+}
+
+bool
+virFileExists(const char *path)
+{
+    init_syms();
+
+    /* Mock EGM device paths for testing */
+    if (g_str_has_prefix(path, "/dev/egm") ||
+        g_str_has_prefix(path, "/sys/class/egm/"))
+        return true;
+
+    return real_virFileExists(path);
+}
+
+int
+access(const char *path, int mode)
+{
+    init_syms();
+
+    /* Mock EGM device paths for testing */
+    if (g_str_has_prefix(path, "/dev/egm") ||
+        g_str_has_prefix(path, "/sys/class/egm/"))
+        return 0; /* success */
+
+    return real_access(path, mode);
+}
+
+int
+virFileReadAll(const char *path, int maxlen, char **buf)
+{
+    init_syms();
+
+    /* Mock EGM GPU device file for testing */
+    if (g_str_has_prefix(path, "/sys/class/egm/") &&
+        g_str_has_suffix(path, "/gpu_devices")) {
+        *buf = g_strdup("0000:01:00.0\n");
+        return strlen(*buf);
+    }
+
+    return real_virFileReadAll(path, maxlen, buf);
+}
diff --git a/tests/qemuxmlconfdata/acpi-egm-memory.aarch64-latest.args b/tests/qemuxmlconfdata/acpi-egm-memory.aarch64-latest.args
new file mode 100644
index 0000000000..773cc83946
--- /dev/null
+++ b/tests/qemuxmlconfdata/acpi-egm-memory.aarch64-latest.args
@@ -0,0 +1 @@
+-object acpi-egm-memory,id=egm0,pci-dev=ua-hostdev0,node=0
diff --git a/tests/qemuxmlconfdata/acpi-egm-memory.aarch64-latest.xml b/tests/qemuxmlconfdata/acpi-egm-memory.aarch64-latest.xml
new file mode 100644
index 0000000000..a62e6b1368
--- /dev/null
+++ b/tests/qemuxmlconfdata/acpi-egm-memory.aarch64-latest.xml
@@ -0,0 +1,56 @@
+<domain type='kvm'>
+  <name>egm</name>
+  <uuid>00010203-0405-4607-8809-0a0b0c0d0e0f</uuid>
+  <memory unit='KiB'>524288</memory>
+  <currentMemory unit='KiB'>524288</currentMemory>
+  <memoryBacking>
+    <source type='file'/>
+    <access mode='shared'/>
+    <allocation mode='immediate'/>
+  </memoryBacking>
+  <vcpu placement='static'>1</vcpu>
+  <os>
+    <type arch='aarch64' machine='virt'>hvm</type>
+    <boot dev='hd'/>
+  </os>
+  <features>
+    <gic version='3'/>
+  </features>
+  <cpu mode='host-passthrough' check='none'>
+    <topology sockets='1' dies='1' clusters='1' cores='1' threads='1'/>
+    <numa>
+      <cell id='0' cpus='0' memory='524288' unit='KiB'/>
+    </numa>
+  </cpu>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-aarch64</emulator>
+    <controller type='pci' index='0' model='pcie-root'/>
+    <controller type='pci' index='1' model='pcie-root-port'>
+      <model name='pcie-root-port'/>
+      <target chassis='1' port='0x8'/>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
+    </controller>
+    <controller type='pci' index='2' model='pcie-root-port'>
+      <model name='pcie-root-port'/>
+      <target chassis='2' port='0x9'/>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
+    </controller>
+    <audio id='1' type='none'/>
+    <hostdev mode='subsystem' type='pci' managed='yes'>
+      <source>
+        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
+      </source>
+      <alias name='ua-hostdev0'/>
+      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
+    </hostdev>
+    <acpiEgmMemory>
+      <alias name='egm0'/>
+      <pciDev>ua-hostdev0</pciDev>
+      <numaNode>0</numaNode>
+    </acpiEgmMemory>
+  </devices>
+</domain>
diff --git a/tests/qemuxmlconfdata/acpi-egm-memory.xml b/tests/qemuxmlconfdata/acpi-egm-memory.xml
new file mode 100644
index 0000000000..58dff4be8e
--- /dev/null
+++ b/tests/qemuxmlconfdata/acpi-egm-memory.xml
@@ -0,0 +1,27 @@
+<domain type="kvm">
+  <name>egm</name>
+  <memory unit="MiB">512</memory>
+  <vcpu>1</vcpu>
+  <os>
+    <type arch="aarch64" machine="virt">hvm</type>
+  </os>
+  <cpu mode="host-passthrough">
+    <topology sockets="1" cores="1" threads="1"/>
+    <numa>
+      <cell id="0" cpus="0" memory="512" unit="MiB"/>
+    </numa>
+  </cpu>
+  <devices>
+    <hostdev mode="subsystem" type="pci" managed="yes">
+      <alias name="ua-hostdev0"/>
+      <source>
+        <address domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
+      </source>
+    </hostdev>
+    <acpiEgmMemory>
+      <alias name="egm0"/>
+      <pciDev>ua-hostdev0</pciDev>
+      <numaNode>0</numaNode>
+    </acpiEgmMemory>
+  </devices>
+</domain>
diff --git a/tests/qemuxmlconftest.c b/tests/qemuxmlconftest.c
index aeca353437..cf19a1bf73 100644
--- a/tests/qemuxmlconftest.c
+++ b/tests/qemuxmlconftest.c
@@ -2952,6 +2952,8 @@ mymain(void)

     DO_TEST_CAPS_LATEST("devices-acpi-index");

+    DO_TEST_CAPS_ARCH_LATEST("acpi-egm-memory", "aarch64");
+
     DO_TEST_CAPS_ARCH_LATEST_FULL("hvf-x86_64-q35-headless", "x86_64", ARG_CAPS_VARIANT, "+hvf", ARG_END);
     DO_TEST_CAPS_ARCH_LATEST_FULL("hvf-aarch64-virt-headless", "aarch64", ARG_CAPS_VARIANT, "+hvf", ARG_END);
     /* HVF guests should not work on Linux with KVM */
@@ -3049,7 +3051,8 @@ VIR_TEST_MAIN_PRELOAD(mymain,
                       VIR_TEST_MOCK("virrandom"),
                       VIR_TEST_MOCK("qemucpu"),
                       VIR_TEST_MOCK("virpci"),
-                      VIR_TEST_MOCK("virnuma"))
+                      VIR_TEST_MOCK("virnuma"),
+                      VIR_TEST_MOCK("qemuegm"))

 #else

-- 
2.43.0