[PATCH 0/7] qemu: Make memory allocation NUMA aware

See 2/7 for explanation.

Michal Prívozník (7):
  qemu_capabilities: Introduce QEMU_CAPS_THREAD_CONTEXT
  qemu_command: Introduce qemuBuildThreadContextProps()
  qemu: Delete thread-context objects at domain startup
  qemu_command: Generate thread-context object for guest NUMA memory
  qemu: Generate thread-context object for memory devices
  qemu_command: Generate thread-context object for main guest memory
  qemu_hotplug: Generate thread-context object for memory device

 src/qemu/qemu_capabilities.c                  |  2 +
 src/qemu/qemu_capabilities.h                  |  1 +
 src/qemu/qemu_command.c                       | 83 +++++++++++++++++++
 src/qemu/qemu_command.h                       |  5 ++
 src/qemu/qemu_domain.c                        |  2 +
 src/qemu/qemu_domain.h                        |  2 +
 src/qemu/qemu_hotplug.c                       | 22 +++++
 src/qemu/qemu_process.c                       | 47 +++++++++++
 src/qemu/qemu_process.h                       |  2 +
 .../caps_7.2.0.x86_64.xml                     |  1 +
 .../hugepages-memaccess.x86_64-latest.args    | 15 ++--
 .../hugepages-memaccess2.x86_64-latest.args   | 15 ++--
 ...pages-numa-default-dimm.x86_64-latest.args |  3 +-
 .../hugepages-shared.x86_64-latest.args       | 12 ++-
 ...memory-default-hugepage.x86_64-latest.args |  3 +-
 .../memfd-memory-numa.x86_64-latest.args      |  6 +-
 ...emory-hotplug-dimm-addr.x86_64-latest.args |  3 +-
 ...mory-hotplug-virtio-mem.x86_64-latest.args |  3 +-
 .../numatune-memnode.x86_64-latest.args       |  9 +-
 .../numatune-system-memory.x86_64-latest.args |  3 +-
 .../pages-dimm-discard.x86_64-latest.args     |  3 +-
 21 files changed, 217 insertions(+), 25 deletions(-)

--
2.37.4

In its commit v7.1.0-1429-g7208429223, QEMU gained a new object,
thread-context, which allows running specialized tasks with affinity
set to a given subset of host CPUs/NUMA nodes. Even though memory
allocation is currently the only task that accepts this new object,
that is exactly what we aim to implement in libvirt. Therefore,
introduce a new capability that tracks whether QEMU supports this
object.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_capabilities.c                     | 2 ++
 src/qemu/qemu_capabilities.h                     | 1 +
 tests/qemucapabilitiesdata/caps_7.2.0.x86_64.xml | 1 +
 3 files changed, 4 insertions(+)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index d3ba8c85c9..965af45cb2 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -678,6 +678,7 @@ VIR_ENUM_IMPL(virQEMUCaps,
               "query-stats", /* QEMU_CAPS_QUERY_STATS */
               "query-stats-schemas", /* QEMU_CAPS_QUERY_STATS_SCHEMAS */
               "sgx-epc", /* QEMU_CAPS_SGX_EPC */
+              "thread-context", /* QEMU_CAPS_THREAD_CONTEXT */
 );
 
 
@@ -1384,6 +1385,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = {
     { "virtio-mem-pci", QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI },
     { "virtio-iommu-pci", QEMU_CAPS_DEVICE_VIRTIO_IOMMU_PCI },
     { "sgx-epc", QEMU_CAPS_SGX_EPC },
+    { "thread-context", QEMU_CAPS_THREAD_CONTEXT },
 };
 
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index dfa3d8cd9b..b70c02c05b 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -657,6 +657,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */
     QEMU_CAPS_QUERY_STATS, /* accepts query-stats */
     QEMU_CAPS_QUERY_STATS_SCHEMAS, /* accepts query-stats-schemas */
     QEMU_CAPS_SGX_EPC, /* -object sgx-epc,... */
+    QEMU_CAPS_THREAD_CONTEXT, /* -object thread-context */
 
     QEMU_CAPS_LAST /* this must always be the last item */
 } virQEMUCapsFlags;
 
diff --git a/tests/qemucapabilitiesdata/caps_7.2.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_7.2.0.x86_64.xml
index 6a861eea24..b478c53502 100644
--- a/tests/qemucapabilitiesdata/caps_7.2.0.x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_7.2.0.x86_64.xml
@@ -199,6 +199,7 @@
   <flag name='migration.blocked-reasons'/>
   <flag name='query-stats'/>
   <flag name='query-stats-schemas'/>
+  <flag name='thread-context'/>
   <version>7001050</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>43100245</microcodeVersion>
--
2.37.4
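[Editor's note: for readers unfamiliar with how these capability tables work, libvirt probes QEMU once via the qom-list-types QMP command and sets a flag for every known object type found in the reply. The Python sketch below only models that table lookup; the reply excerpt is a hypothetical illustration, not real probe output.]

```python
# Sketch of how a virQEMUCapsObjectTypes-style table maps QOM type names
# (as reported by qom-list-types) to capability flags. Flag names mirror
# the C enum; the reply below is illustrative.
OBJECT_TYPE_FLAGS = {
    "memory-backend-memfd": "QEMU_CAPS_OBJECT_MEMORY_MEMFD",
    "sgx-epc": "QEMU_CAPS_SGX_EPC",
    "thread-context": "QEMU_CAPS_THREAD_CONTEXT",
}

def probe_flags(qom_list_types_reply):
    """Return the set of capability flag names implied by the reply."""
    names = {entry["name"] for entry in qom_list_types_reply}
    return {flag for qom, flag in OBJECT_TYPE_FLAGS.items() if qom in names}

# Hypothetical excerpt of a qom-list-types reply from a QEMU >= 7.2:
reply = [{"name": "thread-context"}, {"name": "memory-backend-memfd"}]
flags = probe_flags(reply)
```

An older QEMU simply never reports "thread-context", so the flag stays unset and all of the later patches degrade to the previous behaviour.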

On Mon, Nov 14, 2022 at 04:35:29PM +0100, Michal Privoznik wrote:
> In its commit v7.1.0-1429-g7208429223, QEMU gained a new object,
> thread-context, which allows running specialized tasks with affinity
> set to a given subset of host CPUs/NUMA nodes. Even though memory
> allocation is currently the only task that accepts this new object,
> that is exactly what we aim to implement in libvirt. Therefore,
> introduce a new capability that tracks whether QEMU supports this
> object.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>

Reviewed-by: Martin Kletzander <mkletzan@redhat.com>

The aim of the thread-context object is to set affinity on the threads
that allocate memory for a memory-backend-* object. For instance:

  -object '{"qom-type":"thread-context","id":"tc-ram-node0","node-affinity":[3]}' \
  -object '{"qom-type":"memory-backend-memfd","id":"ram-node0","hugetlb":true,\
  "hugetlbsize":2097152,"share":true,"prealloc":true,"prealloc-threads":8,\
  "size":15032385536,"host-nodes":[3],"policy":"preferred",\
  "prealloc-context":"tc-ram-node0"}' \

allocates 14GiB worth of memory, backed by 2MiB hugepages from host
NUMA node 3, using 8 threads. If it weren't for thread-context, these
threads wouldn't have any affinity and thus could theoretically be
scheduled to run on CPUs of a different NUMA node (which is what I saw
occasionally).

Therefore, whenever we are pinning memory (IOW setting the host-nodes
attribute), we can generate a thread-context object with the same
affinity.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_command.c | 48 +++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_command.h |  5 +++++
 2 files changed, 53 insertions(+)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 41b9f7cb52..392b248628 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -3603,6 +3603,54 @@ qemuBuildMemoryDeviceProps(virQEMUDriverConfig *cfg,
 }
 
 
+int
+qemuBuildThreadContextProps(virJSONValue **tcProps,
+                            virJSONValue **memProps,
+                            qemuDomainObjPrivate *priv)
+{
+    g_autoptr(virJSONValue) props = NULL;
+    virJSONValue *nodemask = NULL;
+    g_autoptr(virJSONValue) nodemaskCopy = NULL;
+    g_autofree char *tcAlias = NULL;
+    const char *memalias = NULL;
+
+    if (tcProps)
+        *tcProps = NULL;
+
+    if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_THREAD_CONTEXT))
+        return 0;
+
+    nodemask = virJSONValueObjectGetArray(*memProps, "host-nodes");
+    if (!nodemask)
+        return 0;
+
+    memalias = virJSONValueObjectGetString(*memProps, "id");
+    if (!memalias) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("memory device alias is not assigned"));
+        return -1;
+    }
+
+    tcAlias = g_strdup_printf("tc-%s", memalias);
+    nodemaskCopy = virJSONValueCopy(nodemask);
+
+    if (virJSONValueObjectAdd(&props,
+                              "s:qom-type", "thread-context",
+                              "s:id", tcAlias,
+                              "a:node-affinity", &nodemaskCopy,
+                              NULL) < 0)
+        return -1;
+
+    if (virJSONValueObjectAdd(memProps,
+                              "s:prealloc-context", tcAlias,
+                              NULL) < 0)
+        return -1;
+
+    *tcProps = g_steal_pointer(&props);
+    return 0;
+}
+
+
 static char *
 qemuBuildLegacyNicStr(virDomainNetDef *net)
 {
diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h
index 2578367df9..761cc5d865 100644
--- a/src/qemu/qemu_command.h
+++ b/src/qemu/qemu_command.h
@@ -147,6 +147,11 @@ qemuBuildMemoryDeviceProps(virQEMUDriverConfig *cfg,
                            const virDomainDef *def,
                            const virDomainMemoryDef *mem);
 
+int
+qemuBuildThreadContextProps(virJSONValue **tcProps,
+                            virJSONValue **memProps,
+                            qemuDomainObjPrivate *priv);
+
 /* Current, best practice */
 virJSONValue *
 qemuBuildPCIHostdevDevProps(const virDomainDef *def,
--
2.37.4

On Mon, Nov 14, 2022 at 04:35:30PM +0100, Michal Privoznik wrote:
> The aim of the thread-context object is to set affinity on the threads
> that allocate memory for a memory-backend-* object. For instance:
>
>   -object '{"qom-type":"thread-context","id":"tc-ram-node0","node-affinity":[3]}' \
>   -object '{"qom-type":"memory-backend-memfd","id":"ram-node0","hugetlb":true,\
>   "hugetlbsize":2097152,"share":true,"prealloc":true,"prealloc-threads":8,\
>   "size":15032385536,"host-nodes":[3],"policy":"preferred",\
>   "prealloc-context":"tc-ram-node0"}' \
>
> allocates 14GiB worth of memory, backed by 2MiB hugepages from host
> NUMA node 3, using 8 threads. If it weren't for thread-context, these
> threads wouldn't have any affinity and thus could theoretically be
> scheduled to run on CPUs of a different NUMA node (which is what I saw
> occasionally).
>
> Therefore, whenever we are pinning memory (IOW setting the host-nodes
> attribute), we can generate a thread-context object with the same
> affinity.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
>  src/qemu/qemu_command.c | 48 +++++++++++++++++++++++++++++++++++++++++
>  src/qemu/qemu_command.h |  5 +++++
>  2 files changed, 53 insertions(+)
>
> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> index 41b9f7cb52..392b248628 100644
> --- a/src/qemu/qemu_command.c
> +++ b/src/qemu/qemu_command.c
> @@ -3603,6 +3603,54 @@ qemuBuildMemoryDeviceProps(virQEMUDriverConfig *cfg,
>  }
>
>
> +int
> +qemuBuildThreadContextProps(virJSONValue **tcProps,
> +                            virJSONValue **memProps,
> +                            qemuDomainObjPrivate *priv)
> +{
> +    g_autoptr(virJSONValue) props = NULL;
> +    virJSONValue *nodemask = NULL;
> +    g_autoptr(virJSONValue) nodemaskCopy = NULL;
> +    g_autofree char *tcAlias = NULL;
> +    const char *memalias = NULL;
> +
> +    if (tcProps)
> +        *tcProps = NULL;
> +

What's the reason for this function to support tcProps == NULL?

> +    if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_THREAD_CONTEXT))
> +        return 0;
> +
> +    nodemask = virJSONValueObjectGetArray(*memProps, "host-nodes");
> +    if (!nodemask)
> +        return 0;
> +
> +    memalias = virJSONValueObjectGetString(*memProps, "id");
> +    if (!memalias) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("memory device alias is not assigned"));
> +        return -1;
> +    }
> +
> +    tcAlias = g_strdup_printf("tc-%s", memalias);
> +    nodemaskCopy = virJSONValueCopy(nodemask);
> +
> +    if (virJSONValueObjectAdd(&props,
> +                              "s:qom-type", "thread-context",
> +                              "s:id", tcAlias,
> +                              "a:node-affinity", &nodemaskCopy,
> +                              NULL) < 0)
> +        return -1;
> +
> +    if (virJSONValueObjectAdd(memProps,
> +                              "s:prealloc-context", tcAlias,
> +                              NULL) < 0)
> +        return -1;
> +
> +    *tcProps = g_steal_pointer(&props);

It would crash here anyway if tcProps == NULL; otherwise

Reviewed-by: Martin Kletzander <mkletzan@redhat.com>

> +    return 0;
> +}
> +
> +
>  static char *
>  qemuBuildLegacyNicStr(virDomainNetDef *net)
>  {
> diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h
> index 2578367df9..761cc5d865 100644
> --- a/src/qemu/qemu_command.h
> +++ b/src/qemu/qemu_command.h
> @@ -147,6 +147,11 @@ qemuBuildMemoryDeviceProps(virQEMUDriverConfig *cfg,
>                             const virDomainDef *def,
>                             const virDomainMemoryDef *mem);
>
> +int
> +qemuBuildThreadContextProps(virJSONValue **tcProps,
> +                            virJSONValue **memProps,
> +                            qemuDomainObjPrivate *priv);
> +
>  /* Current, best practice */
>  virJSONValue *
>  qemuBuildPCIHostdevDevProps(const virDomainDef *def,
> --
> 2.37.4

While technically thread-context objects can be reused, we use them
(well, will use them) only to pin memory allocation threads. Therefore,
once we have connected to the QEMU monitor, all memory (with
prealloc=yes) has been allocated and these objects are no longer needed
and can be removed. For on-demand allocation the thread-context object
is left behind.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_command.c |  7 ++++++
 src/qemu/qemu_domain.c  |  2 ++
 src/qemu/qemu_domain.h  |  2 ++
 src/qemu/qemu_process.c | 47 +++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_process.h |  2 ++
 5 files changed, 60 insertions(+)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 392b248628..90bc537b33 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -3613,6 +3613,7 @@ qemuBuildThreadContextProps(virJSONValue **tcProps,
     g_autoptr(virJSONValue) nodemaskCopy = NULL;
     g_autofree char *tcAlias = NULL;
     const char *memalias = NULL;
+    bool prealloc = false;
 
     if (tcProps)
         *tcProps = NULL;
@@ -3646,6 +3647,12 @@ qemuBuildThreadContextProps(virJSONValue **tcProps,
                               NULL) < 0)
         return -1;
 
+    if (virJSONValueObjectGetBoolean(*memProps, "prealloc", &prealloc) >= 0 &&
+        prealloc) {
+        priv->threadContextAliases = g_slist_prepend(priv->threadContextAliases,
+                                                     g_steal_pointer(&tcAlias));
+    }
+
     *tcProps = g_steal_pointer(&props);
     return 0;
 }
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 0d579bfc9b..ef1a9c8c74 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1811,6 +1811,8 @@ qemuDomainObjPrivateDataClear(qemuDomainObjPrivate *priv)
     priv->preMigrationMemlock = 0;
 
     virHashRemoveAll(priv->statsSchema);
+
+    g_slist_free_full(g_steal_pointer(&priv->threadContextAliases), g_free);
 }
 
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index d5f4fbad12..a9af8502d2 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -251,6 +251,8 @@ struct _qemuDomainObjPrivate {
      * briefly when starting a guest. Don't save/parse into XML. */
     pid_t schedCoreChildPID;
     pid_t schedCoreChildFD;
+
+    GSList *threadContextAliases; /* List of IDs of thread-context objects */
 };
 
 #define QEMU_DOMAIN_PRIVATE(vm) \
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 4b26c384cf..0769f30d74 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -7500,6 +7500,50 @@ qemuProcessSetupLifecycleActions(virDomainObj *vm,
 }
 
 
+int
+qemuProcessDeleteThreadContext(virDomainObj *vm)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    GSList *next = priv->threadContextAliases;
+    int ret = -1;
+
+    if (!next)
+        return 0;
+
+    for (; next; next = next->next) {
+        if (qemuMonitorDelObject(priv->mon, next->data, true) < 0)
+            goto cleanup;
+    }
+
+    ret = 0;
+ cleanup:
+    g_slist_free_full(g_steal_pointer(&priv->threadContextAliases), g_free);
+    return ret;
+}
+
+
+static int
+qemuProcessDeleteThreadContextHelper(virDomainObj *vm,
+                                     virDomainAsyncJob asyncJob)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    int ret = -1;
+
+    if (!priv->threadContextAliases)
+        return 0;
+
+    VIR_DEBUG("Deleting thread context objects");
+    if (qemuDomainObjEnterMonitorAsync(vm, asyncJob) < 0)
+        return -1;
+
+    ret = qemuProcessDeleteThreadContext(vm);
+
+    qemuDomainObjExitMonitor(vm);
+
+    return ret;
+}
+
+
 /**
  * qemuProcessLaunch:
  *
@@ -7860,6 +7904,9 @@ qemuProcessLaunch(virConnectPtr conn,
     if (qemuProcessSetupLifecycleActions(vm, asyncJob) < 0)
         goto cleanup;
 
+    if (qemuProcessDeleteThreadContextHelper(vm, asyncJob) < 0)
+        goto cleanup;
+
     ret = 0;
 
  cleanup:
diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
index d1f58ee258..9a24745f15 100644
--- a/src/qemu/qemu_process.h
+++ b/src/qemu/qemu_process.h
@@ -123,6 +123,8 @@ int qemuProcessPrepareHost(virQEMUDriver *driver,
                            virDomainObj *vm,
                            unsigned int flags);
 
+int qemuProcessDeleteThreadContext(virDomainObj *vm);
+
 int qemuProcessLaunch(virConnectPtr conn,
                       virQEMUDriver *driver,
                       virDomainObj *vm,
--
2.37.4
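[Editor's note: the lifecycle logic of this patch, reduced to a Python sketch. The monitor call is stubbed out here; in libvirt it is qemuMonitorDelObject(). Only aliases of preallocated backends are recorded, and all recorded aliases are deleted and forgotten once the monitor is up.]

```python
def record_tc_alias(aliases, mem_props, tc_alias):
    # Mirror of the prealloc check added to qemuBuildThreadContextProps():
    # only preallocating backends finish their allocation before the
    # monitor comes up, so only their thread-context objects are
    # disposable at that point.
    if mem_props.get("prealloc"):
        aliases.append(tc_alias)

def delete_thread_contexts(aliases, mon_del_object):
    # Rough equivalent of qemuProcessDeleteThreadContext(): object-del
    # every recorded alias; the list is cleared even on failure, just
    # like the cleanup label in the C code frees it unconditionally.
    ret = 0
    for alias in aliases:
        if mon_del_object(alias) < 0:
            ret = -1
            break
    aliases.clear()
    return ret
```

A backend without prealloc keeps its thread-context object alive for the guest's lifetime, which matches the "left behind" case in the commit message.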

On Mon, Nov 14, 2022 at 04:35:31PM +0100, Michal Privoznik wrote:
> While technically thread-context objects can be reused, we use them
> (well, will use them) only to pin memory allocation threads. Therefore,
> once we have connected to the QEMU monitor, all memory (with
> prealloc=yes) has been allocated and these objects are no longer needed
> and can be removed. For on-demand allocation the thread-context object
> is left behind.

Reviewed-by: Martin Kletzander <mkletzan@redhat.com>

I also think it would make more sense if you squashed this into the
previous commit, but I'm fine with either.

When generating memory for guest NUMA, memory-backend-* might be used.
This means we may need to generate thread-context objects too.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_command.c                              | 10 ++++++++++
 .../hugepages-memaccess.x86_64-latest.args           | 12 ++++++++----
 .../hugepages-memaccess2.x86_64-latest.args          | 12 ++++++++----
 .../hugepages-shared.x86_64-latest.args              | 12 ++++++++----
 .../memfd-memory-default-hugepage.x86_64-latest.args |  3 ++-
 .../memfd-memory-numa.x86_64-latest.args             |  3 ++-
 .../numatune-memnode.x86_64-latest.args              |  9 ++++++---
 7 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 90bc537b33..a0f2644ba1 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -7325,6 +7325,16 @@ qemuBuildNumaCommandLine(virQEMUDriverConfig *cfg,
         ssize_t initiator = virDomainNumaGetNodeInitiator(def->numa, i);
 
         if (needBackend) {
+            g_autoptr(virJSONValue) tcProps = NULL;
+
+            if (qemuBuildThreadContextProps(&tcProps, &nodeBackends[i], priv) < 0)
+                goto cleanup;
+
+            if (tcProps &&
+                qemuBuildObjectCommandlineFromJSON(cmd, tcProps,
+                                                   priv->qemuCaps) < 0)
+                goto cleanup;
+
             if (qemuBuildObjectCommandlineFromJSON(cmd, nodeBackends[i],
                                                    priv->qemuCaps) < 0)
                 goto cleanup;
diff --git a/tests/qemuxml2argvdata/hugepages-memaccess.x86_64-latest.args b/tests/qemuxml2argvdata/hugepages-memaccess.x86_64-latest.args
index 55a8d899b7..9db085fd1d 100644
--- a/tests/qemuxml2argvdata/hugepages-memaccess.x86_64-latest.args
+++ b/tests/qemuxml2argvdata/hugepages-memaccess.x86_64-latest.args
@@ -16,13 +16,17 @@ XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
 -m size=4194304k,slots=16,maxmem=8388608k \
 -overcommit mem-lock=off \
 -smp 4,sockets=4,cores=1,threads=1 \
--object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-QEMUGuest1","share":false,"prealloc":true,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node0","node-affinity":[0,1,2,3]}' \
+-object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-QEMUGuest1","share":false,"prealloc":true,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind","prealloc-context":"tc-ram-node0"}' \
 -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
--object '{"qom-type":"memory-backend-file","id":"ram-node1","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","share":true,"prealloc":true,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node1","node-affinity":[0,1,2,3]}' \
+-object '{"qom-type":"memory-backend-file","id":"ram-node1","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","share":true,"prealloc":true,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind","prealloc-context":"tc-ram-node1"}' \
 -numa node,nodeid=1,cpus=1,memdev=ram-node1 \
--object '{"qom-type":"memory-backend-file","id":"ram-node2","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-QEMUGuest1","share":false,"prealloc":true,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node2","node-affinity":[0,1,2,3]}' \
+-object '{"qom-type":"memory-backend-file","id":"ram-node2","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-QEMUGuest1","share":false,"prealloc":true,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind","prealloc-context":"tc-ram-node2"}' \
 -numa node,nodeid=2,cpus=2,memdev=ram-node2 \
--object '{"qom-type":"memory-backend-file","id":"ram-node3","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-QEMUGuest1","share":false,"prealloc":true,"size":1073741824,"host-nodes":[3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node3","node-affinity":[3]}' \
+-object '{"qom-type":"memory-backend-file","id":"ram-node3","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-QEMUGuest1","share":false,"prealloc":true,"size":1073741824,"host-nodes":[3],"policy":"bind","prealloc-context":"tc-ram-node3"}' \
 -numa node,nodeid=3,cpus=3,memdev=ram-node3 \
 -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
 -display none \
diff --git a/tests/qemuxml2argvdata/hugepages-memaccess2.x86_64-latest.args b/tests/qemuxml2argvdata/hugepages-memaccess2.x86_64-latest.args
index 187c0ab214..37f6dfabe9 100644
--- a/tests/qemuxml2argvdata/hugepages-memaccess2.x86_64-latest.args
+++ b/tests/qemuxml2argvdata/hugepages-memaccess2.x86_64-latest.args
@@ -16,13 +16,17 @@ XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
 -m size=4194304k,slots=16,maxmem=8388608k \
 -overcommit mem-lock=off \
 -smp 4,sockets=4,cores=1,threads=1 \
--object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/var/lib/libvirt/qemu/ram/-1-QEMUGuest1/ram-node0","share":false,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node0","node-affinity":[0,1,2,3]}' \
+-object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/var/lib/libvirt/qemu/ram/-1-QEMUGuest1/ram-node0","share":false,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind","prealloc-context":"tc-ram-node0"}' \
 -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
--object '{"qom-type":"memory-backend-file","id":"ram-node1","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","share":true,"prealloc":true,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node1","node-affinity":[0,1,2,3]}' \
+-object '{"qom-type":"memory-backend-file","id":"ram-node1","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","share":true,"prealloc":true,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind","prealloc-context":"tc-ram-node1"}' \
 -numa node,nodeid=1,cpus=1,memdev=ram-node1 \
--object '{"qom-type":"memory-backend-file","id":"ram-node2","mem-path":"/var/lib/libvirt/qemu/ram/-1-QEMUGuest1/ram-node2","share":false,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node2","node-affinity":[0,1,2,3]}' \
+-object '{"qom-type":"memory-backend-file","id":"ram-node2","mem-path":"/var/lib/libvirt/qemu/ram/-1-QEMUGuest1/ram-node2","share":false,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind","prealloc-context":"tc-ram-node2"}' \
 -numa node,nodeid=2,cpus=2,memdev=ram-node2 \
--object '{"qom-type":"memory-backend-file","id":"ram-node3","mem-path":"/var/lib/libvirt/qemu/ram/-1-QEMUGuest1/ram-node3","share":false,"size":1073741824,"host-nodes":[3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node3","node-affinity":[3]}' \
+-object '{"qom-type":"memory-backend-file","id":"ram-node3","mem-path":"/var/lib/libvirt/qemu/ram/-1-QEMUGuest1/ram-node3","share":false,"size":1073741824,"host-nodes":[3],"policy":"bind","prealloc-context":"tc-ram-node3"}' \
 -numa node,nodeid=3,cpus=3,memdev=ram-node3 \
 -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
 -display none \
diff --git a/tests/qemuxml2argvdata/hugepages-shared.x86_64-latest.args b/tests/qemuxml2argvdata/hugepages-shared.x86_64-latest.args
index f4fea870fc..4e7ffb50a5 100644
--- a/tests/qemuxml2argvdata/hugepages-shared.x86_64-latest.args
+++ b/tests/qemuxml2argvdata/hugepages-shared.x86_64-latest.args
@@ -16,13 +16,17 @@ XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
 -m 4096 \
 -overcommit mem-lock=off \
 -smp 4,sockets=4,cores=1,threads=1 \
--object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-QEMUGuest1","prealloc":true,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node0","node-affinity":[0,1,2,3]}' \
+-object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-QEMUGuest1","prealloc":true,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind","prealloc-context":"tc-ram-node0"}' \
 -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
--object '{"qom-type":"memory-backend-file","id":"ram-node1","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","share":true,"prealloc":true,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node1","node-affinity":[0,1,2,3]}' \
+-object '{"qom-type":"memory-backend-file","id":"ram-node1","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","share":true,"prealloc":true,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind","prealloc-context":"tc-ram-node1"}' \
 -numa node,nodeid=1,cpus=1,memdev=ram-node1 \
--object '{"qom-type":"memory-backend-file","id":"ram-node2","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-QEMUGuest1","share":false,"prealloc":true,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node2","node-affinity":[0,1,2,3]}' \
+-object '{"qom-type":"memory-backend-file","id":"ram-node2","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-QEMUGuest1","share":false,"prealloc":true,"size":1073741824,"host-nodes":[0,1,2,3],"policy":"bind","prealloc-context":"tc-ram-node2"}' \
 -numa node,nodeid=2,cpus=2,memdev=ram-node2 \
--object '{"qom-type":"memory-backend-file","id":"ram-node3","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-QEMUGuest1","prealloc":true,"size":1073741824,"host-nodes":[3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node3","node-affinity":[3]}' \
+-object '{"qom-type":"memory-backend-file","id":"ram-node3","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-QEMUGuest1","prealloc":true,"size":1073741824,"host-nodes":[3],"policy":"bind","prealloc-context":"tc-ram-node3"}' \
 -numa node,nodeid=3,cpus=3,memdev=ram-node3 \
 -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
 -display none \
diff --git a/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args b/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args
index 383635c8cd..f516c02bea 100644
--- a/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args
+++ b/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args
@@ -16,7 +16,8 @@ XDG_CONFIG_HOME=/tmp/lib/domain--1-instance-00000092/.config \
 -m 14336 \
 -overcommit mem-lock=off \
 -smp 8,sockets=1,dies=1,cores=8,threads=1 \
--object '{"qom-type":"memory-backend-memfd","id":"ram-node0","hugetlb":true,"hugetlbsize":2097152,"share":true,"prealloc":true,"size":15032385536,"host-nodes":[3],"policy":"preferred"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node0","node-affinity":[3]}' \
+-object '{"qom-type":"memory-backend-memfd","id":"ram-node0","hugetlb":true,"hugetlbsize":2097152,"share":true,"prealloc":true,"size":15032385536,"host-nodes":[3],"policy":"preferred","prealloc-context":"tc-ram-node0"}' \
 -numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \
 -uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \
 -display none \
diff --git a/tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args b/tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
index 68bbd73551..c51bb6f828 100644
--- a/tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
+++ b/tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
@@ -16,7 +16,8 @@ XDG_CONFIG_HOME=/tmp/lib/domain--1-instance-00000092/.config \
 -m size=14680064k,slots=16,maxmem=1099511627776k \
 -overcommit mem-lock=off \
 -smp 8,sockets=1,dies=1,cores=8,threads=1 \
--object '{"qom-type":"memory-backend-memfd","id":"ram-node0","hugetlb":true,"hugetlbsize":2097152,"share":true,"prealloc":true,"prealloc-threads":8,"size":15032385536,"host-nodes":[3],"policy":"preferred"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node0","node-affinity":[3]}' \
+-object '{"qom-type":"memory-backend-memfd","id":"ram-node0","hugetlb":true,"hugetlbsize":2097152,"share":true,"prealloc":true,"prealloc-threads":8,"size":15032385536,"host-nodes":[3],"policy":"preferred","prealloc-context":"tc-ram-node0"}' \
 -numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \
 -uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \
 -display none \
diff --git a/tests/qemuxml2argvdata/numatune-memnode.x86_64-latest.args b/tests/qemuxml2argvdata/numatune-memnode.x86_64-latest.args
index 7cb7e659a4..f4ef91006f 100644
--- a/tests/qemuxml2argvdata/numatune-memnode.x86_64-latest.args
+++ b/tests/qemuxml2argvdata/numatune-memnode.x86_64-latest.args
@@ -16,11 +16,14 @@ XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest/.config \
 -m 24105 \
 -overcommit mem-lock=off \
 -smp 32,sockets=32,cores=1,threads=1 \
--object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":20971520,"host-nodes":[3],"policy":"preferred"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node0","node-affinity":[3]}' \
+-object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":20971520,"host-nodes":[3],"policy":"preferred","prealloc-context":"tc-ram-node0"}' \
 -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
--object '{"qom-type":"memory-backend-ram","id":"ram-node1","size":676331520,"host-nodes":[0,1,2,3,4,5,6,7],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node1","node-affinity":[0,1,2,3,4,5,6,7]}' \
+-object '{"qom-type":"memory-backend-ram","id":"ram-node1","size":676331520,"host-nodes":[0,1,2,3,4,5,6,7],"policy":"bind","prealloc-context":"tc-ram-node1"}' \
 -numa node,nodeid=1,cpus=1-27,cpus=29,memdev=ram-node1 \
--object '{"qom-type":"memory-backend-ram","id":"ram-node2","size":24578621440,"host-nodes":[1,2,5,7],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-ram-node2","node-affinity":[1,2,5,7]}' \
+-object '{"qom-type":"memory-backend-ram","id":"ram-node2","size":24578621440,"host-nodes":[1,2,5,7],"policy":"bind","prealloc-context":"tc-ram-node2"}' \
 -numa node,nodeid=2,cpus=28,cpus=30-31,memdev=ram-node2 \
 -uuid 9f4b6512-e73a-4a25-93e8-5307802821ce \
 -display none \
--
2.37.4
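[Editor's note: the command-line shape produced above can be sketched as follows. For every guest NUMA node backend, the loop interleaves a thread-context object directly before the backend it serves, which is exactly the ordering visible in the .args diffs. The helper name is illustrative, not libvirt API.]

```python
def build_numa_objects(node_backends):
    # Emit -object props in order: a thread-context object (only when the
    # backend pins host NUMA nodes) immediately followed by the backend
    # itself, so each backend's prealloc-context is defined before use.
    out = []
    for props in node_backends:
        props = dict(props)          # don't mutate the caller's dicts
        nodes = props.get("host-nodes")
        if nodes is not None:
            alias = "tc-" + props["id"]
            out.append({"qom-type": "thread-context", "id": alias,
                        "node-affinity": list(nodes)})
            props["prealloc-context"] = alias
        out.append(props)
    return out
```

QEMU resolves object references at creation time, so emitting the thread-context object first matters: the backend names it via prealloc-context.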

On Mon, Nov 14, 2022 at 04:35:32PM +0100, Michal Privoznik wrote:
> When generating memory for guest NUMA, memory-backend-* might be used.
> This means we may need to generate thread-context objects too.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>

Reviewed-by: Martin Kletzander <mkletzan@redhat.com>

When generating memory for memory devices, memory-backend-* might be
used. This means we may need to generate thread-context objects too.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_command.c                             | 9 +++++++++
 .../hugepages-memaccess.x86_64-latest.args          | 3 ++-
 .../hugepages-memaccess2.x86_64-latest.args         | 3 ++-
 .../hugepages-numa-default-dimm.x86_64-latest.args  | 3 ++-
 .../memfd-memory-numa.x86_64-latest.args            | 3 ++-
 .../memory-hotplug-dimm-addr.x86_64-latest.args     | 3 ++-
 .../memory-hotplug-virtio-mem.x86_64-latest.args    | 3 ++-
 .../pages-dimm-discard.x86_64-latest.args           | 3 ++-
 8 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index a0f2644ba1..49ecd91300 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -3503,6 +3503,7 @@ qemuBuildMemoryDimmBackendStr(virCommand *cmd,
                               qemuDomainObjPrivate *priv)
 {
     g_autoptr(virJSONValue) props = NULL;
+    g_autoptr(virJSONValue) tcProps = NULL;
     g_autofree char *alias = NULL;
 
     if (!mem->info.alias) {
@@ -3517,6 +3518,14 @@
                                    priv, def, mem, true, false) < 0)
         return -1;
 
+    if (qemuBuildThreadContextProps(&tcProps, &props, priv) < 0)
+        return -1;
+
+    if (tcProps &&
+        qemuBuildObjectCommandlineFromJSON(cmd, tcProps,
+                                           priv->qemuCaps) < 0)
+        return -1;
+
     if (qemuBuildObjectCommandlineFromJSON(cmd, props,
                                            priv->qemuCaps) < 0)
         return -1;
 
diff --git a/tests/qemuxml2argvdata/hugepages-memaccess.x86_64-latest.args b/tests/qemuxml2argvdata/hugepages-memaccess.x86_64-latest.args
index 9db085fd1d..8342a30c86 100644
--- a/tests/qemuxml2argvdata/hugepages-memaccess.x86_64-latest.args
+++ b/tests/qemuxml2argvdata/hugepages-memaccess.x86_64-latest.args
@@ -39,7 +39,8 @@ XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
 -no-acpi \
 -boot strict=on \
 -device '{"driver":"piix3-usb-uhci","id":"usb","bus":"pci.0","addr":"0x1.0x2"}' \
--object '{"qom-type":"memory-backend-file","id":"memdimm0","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","share":true,"prealloc":true,"size":536870912,"host-nodes":[0,1,2,3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-memdimm0","node-affinity":[0,1,2,3]}' \
+-object '{"qom-type":"memory-backend-file","id":"memdimm0","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","share":true,"prealloc":true,"size":536870912,"host-nodes":[0,1,2,3],"policy":"bind","prealloc-context":"tc-memdimm0"}' \
 -device '{"driver":"pc-dimm","node":1,"memdev":"memdimm0","id":"dimm0","slot":0,"addr":4294967296}' \
 -blockdev '{"driver":"host_device","filename":"/dev/HostVG/QEMUGuest1","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
 -blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \
diff --git a/tests/qemuxml2argvdata/hugepages-memaccess2.x86_64-latest.args b/tests/qemuxml2argvdata/hugepages-memaccess2.x86_64-latest.args
index 37f6dfabe9..f73da5865d 100644
--- a/tests/qemuxml2argvdata/hugepages-memaccess2.x86_64-latest.args
+++ b/tests/qemuxml2argvdata/hugepages-memaccess2.x86_64-latest.args
@@ -39,7 +39,8 @@ XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
 -no-acpi \
 -boot strict=on \
 -device '{"driver":"piix3-usb-uhci","id":"usb","bus":"pci.0","addr":"0x1.0x2"}' \
--object '{"qom-type":"memory-backend-file","id":"memdimm0","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","share":true,"prealloc":true,"size":536870912,"host-nodes":[0,1,2,3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-memdimm0","node-affinity":[0,1,2,3]}' \
+-object '{"qom-type":"memory-backend-file","id":"memdimm0","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","share":true,"prealloc":true,"size":536870912,"host-nodes":[0,1,2,3],"policy":"bind","prealloc-context":"tc-memdimm0"}' \
 -device '{"driver":"pc-dimm","node":1,"memdev":"memdimm0","id":"dimm0","slot":0,"addr":4294967296}' \
 -blockdev '{"driver":"host_device","filename":"/dev/HostVG/QEMUGuest1","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
 -blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \
diff --git a/tests/qemuxml2argvdata/hugepages-numa-default-dimm.x86_64-latest.args b/tests/qemuxml2argvdata/hugepages-numa-default-dimm.x86_64-latest.args
index 015464bded..3b8dfeea16 100644
--- a/tests/qemuxml2argvdata/hugepages-numa-default-dimm.x86_64-latest.args
+++ b/tests/qemuxml2argvdata/hugepages-numa-default-dimm.x86_64-latest.args
@@ -29,7 +29,8 @@ XDG_CONFIG_HOME=/tmp/lib/domain--1-fedora/.config \
 -no-acpi \
 -boot strict=on \
 -device '{"driver":"piix3-usb-uhci","id":"usb","bus":"pci.0","addr":"0x1.0x2"}' \
--object '{"qom-type":"memory-backend-file","id":"memdimm0","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-fedora","prealloc":true,"size":1073741824,"host-nodes":[1,2,3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-memdimm0","node-affinity":[1,2,3]}' \
+-object '{"qom-type":"memory-backend-file","id":"memdimm0","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-fedora","prealloc":true,"size":1073741824,"host-nodes":[1,2,3],"policy":"bind","prealloc-context":"tc-memdimm0"}' \
 -device '{"driver":"pc-dimm","node":0,"memdev":"memdimm0","id":"dimm0","slot":0}' \
 -audiodev '{"id":"audio1","driver":"none"}' \
 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
diff --git a/tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args b/tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
index c51bb6f828..4748f0fbb2 100644
--- a/tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
+++ b/tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
@@ -30,7 +30,8 @@ XDG_CONFIG_HOME=/tmp/lib/domain--1-instance-00000092/.config \
 -no-acpi \
 -boot strict=on \
 -device '{"driver":"piix3-usb-uhci","id":"usb","bus":"pci.0","addr":"0x1.0x2"}' \
--object '{"qom-type":"memory-backend-file","id":"memnvdimm0","mem-path":"/tmp/nvdimm","share":true,"prealloc":true,"prealloc-threads":8,"size":536870912,"host-nodes":[3],"policy":"preferred"}' \
+-object '{"qom-type":"thread-context","id":"tc-memnvdimm0","node-affinity":[3]}' \
+-object '{"qom-type":"memory-backend-file","id":"memnvdimm0","mem-path":"/tmp/nvdimm","share":true,"prealloc":true,"prealloc-threads":8,"size":536870912,"host-nodes":[3],"policy":"preferred","prealloc-context":"tc-memnvdimm0"}' \
 -device '{"driver":"nvdimm","node":0,"memdev":"memnvdimm0","id":"nvdimm0","slot":0}' \
 -audiodev '{"id":"audio1","driver":"none"}' \
 -device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.0","addr":"0x3"}' \
diff --git a/tests/qemuxml2argvdata/memory-hotplug-dimm-addr.x86_64-latest.args b/tests/qemuxml2argvdata/memory-hotplug-dimm-addr.x86_64-latest.args
index ac24c77a2b..4e9bbde448 100644
--- a/tests/qemuxml2argvdata/memory-hotplug-dimm-addr.x86_64-latest.args
+++ b/tests/qemuxml2argvdata/memory-hotplug-dimm-addr.x86_64-latest.args
@@ -29,7 +29,8 @@ XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
 -no-acpi \
 -boot strict=on \
 -device '{"driver":"piix3-usb-uhci","id":"usb","bus":"pci.0","addr":"0x1.0x2"}' \
--object '{"qom-type":"memory-backend-file","id":"memdimm0","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","prealloc":true,"size":536870912,"host-nodes":[1,2,3],"policy":"bind"}' \
+-object '{"qom-type":"thread-context","id":"tc-memdimm0","node-affinity":[1,2,3]}' \
+-object '{"qom-type":"memory-backend-file","id":"memdimm0","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","prealloc":true,"size":536870912,"host-nodes":[1,2,3],"policy":"bind","prealloc-context":"tc-memdimm0"}' \
 -device '{"driver":"pc-dimm","node":0,"memdev":"memdimm0","id":"dimm0","slot":0,"addr":4294967296}' \
 -object
'{"qom-type":"memory-backend-ram","id":"memdimm2","size":536870912}' \ -device '{"driver":"pc-dimm","node":0,"memdev":"memdimm2","id":"dimm2","slot":2}' \ diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args index 5aa8110aeb..04876fc044 100644 --- a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args +++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args @@ -32,7 +32,8 @@ XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ -device '{"driver":"piix3-usb-uhci","id":"usb","bus":"pci.0","addr":"0x1.0x2"}' \ -object '{"qom-type":"memory-backend-ram","id":"memvirtiomem0","reserve":false,"size":1073741824}' \ -device '{"driver":"virtio-mem-pci","node":0,"block-size":2097152,"requested-size":536870912,"memdev":"memvirtiomem0","id":"virtiomem0","bus":"pci.0","addr":"0x2"}' \ --object '{"qom-type":"memory-backend-file","id":"memvirtiomem1","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","reserve":false,"size":2147483648,"host-nodes":[1,2,3],"policy":"bind"}' \ +-object '{"qom-type":"thread-context","id":"tc-memvirtiomem1","node-affinity":[1,2,3]}' \ +-object '{"qom-type":"memory-backend-file","id":"memvirtiomem1","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","reserve":false,"size":2147483648,"host-nodes":[1,2,3],"policy":"bind","prealloc-context":"tc-memvirtiomem1"}' \ -device '{"driver":"virtio-mem-pci","node":0,"block-size":2097152,"requested-size":1073741824,"memdev":"memvirtiomem1","prealloc":true,"id":"virtiomem1","bus":"pci.1","addr":"0x1"}' \ -blockdev '{"driver":"host_device","filename":"/dev/HostVG/QEMUGuest1","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \ -blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \ diff --git a/tests/qemuxml2argvdata/pages-dimm-discard.x86_64-latest.args 
b/tests/qemuxml2argvdata/pages-dimm-discard.x86_64-latest.args index 19ed3928de..2418133aa0 100644 --- a/tests/qemuxml2argvdata/pages-dimm-discard.x86_64-latest.args +++ b/tests/qemuxml2argvdata/pages-dimm-discard.x86_64-latest.args @@ -29,7 +29,8 @@ XDG_CONFIG_HOME=/tmp/lib/domain--1-fedora/.config \ -no-acpi \ -boot strict=on \ -device '{"driver":"piix3-usb-uhci","id":"usb","bus":"pci.0","addr":"0x1.0x2"}' \ --object '{"qom-type":"memory-backend-file","id":"memdimm0","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-fedora","discard-data":false,"prealloc":true,"size":1073741824,"host-nodes":[1,2,3],"policy":"bind"}' \ +-object '{"qom-type":"thread-context","id":"tc-memdimm0","node-affinity":[1,2,3]}' \ +-object '{"qom-type":"memory-backend-file","id":"memdimm0","mem-path":"/dev/hugepages1G/libvirt/qemu/-1-fedora","discard-data":false,"prealloc":true,"size":1073741824,"host-nodes":[1,2,3],"policy":"bind","prealloc-context":"tc-memdimm0"}' \ -device '{"driver":"pc-dimm","node":0,"memdev":"memdimm0","id":"dimm0","slot":0}' \ -object '{"qom-type":"memory-backend-file","id":"memdimm1","mem-path":"/var/lib/libvirt/qemu/ram/-1-fedora/dimm1","discard-data":true,"share":false,"size":536870912}' \ -device '{"driver":"pc-dimm","node":0,"memdev":"memdimm1","id":"dimm1","slot":1}' \ -- 2.37.4

On Mon, Nov 14, 2022 at 04:35:33PM +0100, Michal Privoznik wrote:
When generating memory for memory devices, a memory-backend-* object might be used. This means we may need to generate thread-context objects too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>

When generating memory for main guest memory memory-backend-* might be used. This means, we may need to generate thread-context objects too. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_command.c | 9 +++++++++ .../numatune-system-memory.x86_64-latest.args | 3 ++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 49ecd91300..b8aafd39c8 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -7009,6 +7009,7 @@ qemuBuildMemCommandLineMemoryDefaultBackend(virCommand *cmd, { g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(priv->driver); g_autoptr(virJSONValue) props = NULL; + g_autoptr(virJSONValue) tcProps = NULL; virDomainMemoryDef mem = { 0 }; mem.size = virDomainDefGetMemoryInitial(def); @@ -7019,6 +7020,14 @@ qemuBuildMemCommandLineMemoryDefaultBackend(virCommand *cmd, priv, def, &mem, false, true) < 0) return -1; + if (qemuBuildThreadContextProps(&tcProps, &props, priv) < 0) + return -1; + + if (tcProps && + qemuBuildObjectCommandlineFromJSON(cmd, tcProps, + priv->qemuCaps) < 0) + return -1; + if (qemuBuildObjectCommandlineFromJSON(cmd, props, priv->qemuCaps) < 0) return -1; diff --git a/tests/qemuxml2argvdata/numatune-system-memory.x86_64-latest.args b/tests/qemuxml2argvdata/numatune-system-memory.x86_64-latest.args index fd93abe3eb..125dc43153 100644 --- a/tests/qemuxml2argvdata/numatune-system-memory.x86_64-latest.args +++ b/tests/qemuxml2argvdata/numatune-system-memory.x86_64-latest.args @@ -14,7 +14,8 @@ XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ -accel tcg \ -cpu qemu64 \ -m 214 \ --object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":224395264,"host-nodes":[0],"policy":"bind"}' \ +-object '{"qom-type":"thread-context","id":"tc-pc.ram","node-affinity":[0]}' \ +-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":224395264,"host-nodes":[0],"policy":"bind","prealloc-context":"tc-pc.ram"}' \ -overcommit 
mem-lock=off \ -smp 2,sockets=2,cores=1,threads=1 \ -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ -- 2.37.4

On Mon, Nov 14, 2022 at 04:35:34PM +0100, Michal Privoznik wrote:
When generating the main guest memory, a memory-backend-* object might be used. This means we may need to generate thread-context objects too.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>

This is similar to one of previous commits which generated thread-context object for memory devices at cmd line generation phase. This one does the same for hotplug case, except it's doing so iff QEMU sandboxing is turned off. The reason is that once sandboxing is turned on, the __NR_sched_setaffinity syscall is filtered by libseccomp and thus QEMU is unable to instantiate the thread-context object. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hotplug.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index da92ced2f4..5c49da87ba 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -2240,11 +2240,13 @@ qemuDomainAttachMemory(virQEMUDriver *driver, g_autoptr(virJSONValue) devprops = NULL; g_autofree char *objalias = NULL; bool objAdded = false; + bool tcObjAdded = false; bool releaseaddr = false; bool teardownlabel = false; bool teardowncgroup = false; bool teardowndevice = false; g_autoptr(virJSONValue) props = NULL; + g_autoptr(virJSONValue) tcProps = NULL; virObjectEvent *event; int id; int ret = -1; @@ -2273,6 +2275,11 @@ qemuDomainAttachMemory(virQEMUDriver *driver, priv, vm->def, mem, true, false) < 0) goto cleanup; + /* In case sandbox was turned on, thread-context won't work. 
*/ + if (cfg->seccompSandbox == 0 && + qemuBuildThreadContextProps(&tcProps, &props, priv) < 0) + goto cleanup; + if (qemuProcessBuildDestroyMemoryPaths(driver, vm, mem, true) < 0) goto cleanup; @@ -2294,6 +2301,12 @@ qemuDomainAttachMemory(virQEMUDriver *driver, goto removedef; qemuDomainObjEnterMonitor(vm); + if (tcProps) { + if (qemuMonitorAddObject(priv->mon, &tcProps, NULL) < 0) + goto exit_monitor; + tcObjAdded = true; + } + if (qemuMonitorAddObject(priv->mon, &props, NULL) < 0) goto exit_monitor; objAdded = true; @@ -2301,6 +2314,12 @@ qemuDomainAttachMemory(virQEMUDriver *driver, if (qemuMonitorAddDeviceProps(priv->mon, &devprops) < 0) goto exit_monitor; + if (tcObjAdded) { + if (qemuProcessDeleteThreadContext(vm) < 0) + goto exit_monitor; + tcObjAdded = false; + } + qemuDomainObjExitMonitor(vm); event = virDomainEventDeviceAddedNewFromObj(vm, objalias); @@ -2339,6 +2358,8 @@ qemuDomainAttachMemory(virQEMUDriver *driver, virErrorPreserveLast(&orig_err); if (objAdded) ignore_value(qemuMonitorDelObject(priv->mon, objalias, false)); + if (tcObjAdded) + ignore_value(qemuProcessDeleteThreadContext(vm)); qemuDomainObjExitMonitor(vm); if (objAdded && mem) @@ -4380,6 +4401,7 @@ qemuDomainRemoveMemoryDevice(virQEMUDriver *driver, qemuDomainObjEnterMonitor(vm); rc = qemuMonitorDelObject(priv->mon, backendAlias, true); + /* XXX remove TC object */ qemuDomainObjExitMonitor(vm); virDomainAuditMemory(vm, oldmem, newmem, "update", rc == 0); -- 2.37.4

On Mon, Nov 14, 2022 at 04:35:35PM +0100, Michal Privoznik wrote:
This is similar to one of the previous commits, which generated a thread-context object for memory devices at command-line generation time. This one does the same for the hotplug case, except it does so only when QEMU sandboxing is turned off. The reason is that once sandboxing is turned on, the __NR_sched_setaffinity syscall is filtered by libseccomp and thus QEMU is unable to instantiate the thread-context object.
I'm not sure this is enough checking for possible errors, and I am not sure we actually need this; it looks like it might cause more problems than benefits.

I can't think of a reason anyone would hotplug so much memory during runtime that it would benefit from a thread-context, although I admit I might very well be wrong on that. Anyway, it can easily fail if the emulator is pinned to anything but a superset of the host-nodes set for this particular memory device. There might be more restrictions from mgmt apps filtering some syscalls or whatnot. And there is an issue described below as well.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hotplug.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index da92ced2f4..5c49da87ba 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -2240,11 +2240,13 @@ qemuDomainAttachMemory(virQEMUDriver *driver, g_autoptr(virJSONValue) devprops = NULL; g_autofree char *objalias = NULL; bool objAdded = false; + bool tcObjAdded = false; bool releaseaddr = false; bool teardownlabel = false; bool teardowncgroup = false; bool teardowndevice = false; g_autoptr(virJSONValue) props = NULL; + g_autoptr(virJSONValue) tcProps = NULL; virObjectEvent *event; int id; int ret = -1; @@ -2273,6 +2275,11 @@ qemuDomainAttachMemory(virQEMUDriver *driver, priv, vm->def, mem, true, false) < 0) goto cleanup;
+ /* In case sandbox was turned on, thread-context won't work. */ + if (cfg->seccompSandbox == 0 && + qemuBuildThreadContextProps(&tcProps, &props, priv) < 0) + goto cleanup; +
This function adds the alias to the list of tc aliases...
if (qemuProcessBuildDestroyMemoryPaths(driver, vm, mem, true) < 0) goto cleanup;
@@ -2294,6 +2301,12 @@ qemuDomainAttachMemory(virQEMUDriver *driver, goto removedef;
qemuDomainObjEnterMonitor(vm); + if (tcProps) { + if (qemuMonitorAddObject(priv->mon, &tcProps, NULL) < 0) + goto exit_monitor; + tcObjAdded = true;
If this (or anything above) fails, tcObjAdded is false (because the object was not added), which kinda makes sense. The hotplug fails, but if the user tries to hotplug again and it does work, then...
+ } + if (qemuMonitorAddObject(priv->mon, &props, NULL) < 0) goto exit_monitor; objAdded = true; @@ -2301,6 +2314,12 @@ qemuDomainAttachMemory(virQEMUDriver *driver, if (qemuMonitorAddDeviceProps(priv->mon, &devprops) < 0) goto exit_monitor;
+ if (tcObjAdded) { + if (qemuProcessDeleteThreadContext(vm) < 0) + goto exit_monitor;
This will fail because there is one leftover tc alias from the first hotplug attempt, even though this second hotplug was completely successful. Either remove it from the list if anything is unsuccessful (i.e. tcProps && !tcObjAdded), add it to the list only when the object is added, or ignore the return value even in this call (or inside the qemuProcessDeleteThreadContext() function), so that we don't have corner cases of weird errors.
+ tcObjAdded = false; + } + qemuDomainObjExitMonitor(vm);
event = virDomainEventDeviceAddedNewFromObj(vm, objalias); @@ -2339,6 +2358,8 @@ qemuDomainAttachMemory(virQEMUDriver *driver, virErrorPreserveLast(&orig_err); if (objAdded) ignore_value(qemuMonitorDelObject(priv->mon, objalias, false)); + if (tcObjAdded) + ignore_value(qemuProcessDeleteThreadContext(vm)); qemuDomainObjExitMonitor(vm);
if (objAdded && mem) @@ -4380,6 +4401,7 @@ qemuDomainRemoveMemoryDevice(virQEMUDriver *driver,
qemuDomainObjEnterMonitor(vm); rc = qemuMonitorDelObject(priv->mon, backendAlias, true); + /* XXX remove TC object */
Leftover? Or did you mean to call DeleteThreadContext here too? It should be safe to remove all the leftover ones here, or if not, just removing "tc-{backendAlias}" here should do.
qemuDomainObjExitMonitor(vm);
virDomainAuditMemory(vm, oldmem, newmem, "update", rc == 0); -- 2.37.4

On 11/15/22 11:03, Martin Kletzander wrote:
On Mon, Nov 14, 2022 at 04:35:35PM +0100, Michal Privoznik wrote:
This is similar to one of the previous commits, which generated a thread-context object for memory devices at command-line generation time. This one does the same for the hotplug case, except it does so only when QEMU sandboxing is turned off. The reason is that once sandboxing is turned on, the __NR_sched_setaffinity syscall is filtered by libseccomp and thus QEMU is unable to instantiate the thread-context object.
I'm not sure this is enough checking for possible errors, and I am not sure we actually need this; it looks like it might cause more problems than benefits.

I can't think of a reason anyone would hotplug so much memory during runtime that it would benefit from a thread-context, although I admit I might very well be wrong on that. Anyway, it can easily fail if the emulator is pinned to anything but a superset of the host-nodes set for this particular memory device. There might be more restrictions from mgmt apps filtering some syscalls or whatnot. And there is an issue described below as well.
Fair enough. I'll drop this patch then. If anybody wants to use thread-context during hotplug I can invest my time into it then.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hotplug.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index da92ced2f4..5c49da87ba 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -2240,11 +2240,13 @@ qemuDomainAttachMemory(virQEMUDriver *driver, g_autoptr(virJSONValue) devprops = NULL; g_autofree char *objalias = NULL; bool objAdded = false; + bool tcObjAdded = false; bool releaseaddr = false; bool teardownlabel = false; bool teardowncgroup = false; bool teardowndevice = false; g_autoptr(virJSONValue) props = NULL; + g_autoptr(virJSONValue) tcProps = NULL; virObjectEvent *event; int id; int ret = -1; @@ -2273,6 +2275,11 @@ qemuDomainAttachMemory(virQEMUDriver *driver, priv, vm->def, mem, true, false) < 0) goto cleanup;
+ /* In case sandbox was turned on, thread-context won't work. */ + if (cfg->seccompSandbox == 0 && + qemuBuildThreadContextProps(&tcProps, &props, priv) < 0) + goto cleanup; +
This function adds the alias to the list of tc aliases...
if (qemuProcessBuildDestroyMemoryPaths(driver, vm, mem, true) < 0) goto cleanup;
@@ -2294,6 +2301,12 @@ qemuDomainAttachMemory(virQEMUDriver *driver, goto removedef;
qemuDomainObjEnterMonitor(vm); + if (tcProps) { + if (qemuMonitorAddObject(priv->mon, &tcProps, NULL) < 0) + goto exit_monitor; + tcObjAdded = true;
If this (or anything above) fails, tcObjAdded is false (because the object was not added), which kinda makes sense. The hotplug fails, but if the user tries to hotplug again and it does work, then...
+ } + if (qemuMonitorAddObject(priv->mon, &props, NULL) < 0) goto exit_monitor; objAdded = true; @@ -2301,6 +2314,12 @@ qemuDomainAttachMemory(virQEMUDriver *driver, if (qemuMonitorAddDeviceProps(priv->mon, &devprops) < 0) goto exit_monitor;
+ if (tcObjAdded) { + if (qemuProcessDeleteThreadContext(vm) < 0) + goto exit_monitor;
This will fail because there is one leftover tc alias from the first hotplug attempt, even though this second hotplug was completely successful.
Either remove it from the list if anything is unsuccessful (i.e. tcProps && !tcObjAdded), add it to the list only when the object is added, or ignore the return value even in this call (or inside the qemuProcessDeleteThreadContext() function), so that we don't have corner cases of weird errors.
Yeah, you're right.
+ tcObjAdded = false; + } + qemuDomainObjExitMonitor(vm);
event = virDomainEventDeviceAddedNewFromObj(vm, objalias); @@ -2339,6 +2358,8 @@ qemuDomainAttachMemory(virQEMUDriver *driver, virErrorPreserveLast(&orig_err); if (objAdded) ignore_value(qemuMonitorDelObject(priv->mon, objalias, false)); + if (tcObjAdded) + ignore_value(qemuProcessDeleteThreadContext(vm)); qemuDomainObjExitMonitor(vm);
if (objAdded && mem) @@ -4380,6 +4401,7 @@ qemuDomainRemoveMemoryDevice(virQEMUDriver *driver,
qemuDomainObjEnterMonitor(vm); rc = qemuMonitorDelObject(priv->mon, backendAlias, true); + /* XXX remove TC object */
Leftover? Or did you mean to call DeleteThreadContext here too? It should be safe to remove all the leftover ones here, or if not, just removing "tc-{backendAlias}" here should do.
Right. I could expose the thread-context alias building in its own function and then try to remove it here. But that might needlessly try to remove a non-existent thread-context object. What I had in mind was to store a list of <memory-object-* ID : thread-context ID> pairs and then look up the TC ID for a given @backendAlias. Then again, let's invest time into this only after we know there's a demand.

Thank you for the review!

Michal