[libvirt] [PATCH v3 0/6] numa: deprecate '-numa node, mem' and default memory distribution

Changes since v2: - taking in account previous review, implement a way for mgmt to intospect if '-numa node,mem' is supported by machine type as suggested by Daniel at https://www.mail-archive.com/qemu-devel@nongnu.org/msg601220.html * ammend "qom-list-properties" to show property values * add "numa-mem-supported" machine property to reflect if '-numa node,mem=SZ' is supported. It culd be used with '-machine none' or at runtime with --preconfig before numa memory mapping are configured * minor fixes to deprecation documentation mentioning "numa-mem-supported" property 1) "I'm considering to deprecating -mem-path/prealloc CLI options and replacing them with a single memdev Machine property to allow interested users to pick used backend for initial RAM (fixes mixed -mem-path+hostmem backends issues) and as a transition step to modeling initial RAM as a Device instead of (ab)using MemoryRegion APIs." (for more details see: https://www.mail-archive.com/qemu-devel@nongnu.org/msg596314.html) However there is a couple of roadblocks on the way (s390x and numa memory handling). I think I finally thought out a way to hack s390x in migration compatible manner, but I don't see any way to do it for -numa node,mem and default RAM assignement to nodes. Considering both numa usecases aren't meaningfully using NUMA (aside guest side testing) and could be replaced with explicitly used memdev parameter, I'd like to propose removing these fake NUMA friends on new machine types, hence this deprecation. And once the last machie type that supported the option is removed we would be able to remove option altogether. As result of removing deprecated options and replacing initial RAM allocation with 'memdev's (1), QEMU will allocate guest RAM in consistent way, fixing mixed use-case and allowing boards to move towards modelling initial RAM as Device(s). Which in its own turn should allow to cleanup NUMA/HMP/memory accounting code more by dropping ad-hoc node_mem tracking and reusing memory device enumeration instead. Reference to previous versions: * [PATCH 0/2] numa: deprecate -numa node, mem and default memory distribution https://www.mail-archive.com/qemu-devel@nongnu.org/msg600706.html * [PATCH] numa: warn if numa 'mem' option or default RAM splitting between nodes is used. https://www.mail-archive.com/qemu-devel@nongnu.org/msg602136.html * [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used. https://www.spinics.net/linux/fedora/libvir/msg180917.html CC: libvir-list@redhat.com CC: ehabkost@redhat.com CC: pbonzini@redhat.com CC: berrange@redhat.com CC: armbru@redhat.com Igor Mammedov (6): pc: fix possible NULL pointer dereference in pc_machine_get_device_memory_region_size() qmp: make "qom-list-properties" show initial property values qmp: qmp_qom_list_properties(): ignore empty string options numa: introduce "numa-mem-supported" machine property numa: deprecate 'mem' parameter of '-numa node' option numa: deprecate implict memory distribution between nodes include/hw/boards.h | 1 + hw/arm/virt.c | 1 + hw/core/machine.c | 12 ++++++++++++ hw/i386/pc.c | 7 ++++++- hw/ppc/spapr.c | 1 + numa.c | 5 +++++ qapi/misc.json | 5 ++++- qemu-deprecated.texi | 24 ++++++++++++++++++++++++ qmp.c | 15 +++++++++++++++ 9 files changed, 69 insertions(+), 2 deletions(-) -- 2.7.4

QEMU will crash when device-memory-region-size property is read if ms->device_memory wasn't initialized yet (ex: property being inspected during preconfig time). Instead of crashing return 0 if ms->device_memory hasn't been initialized. Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- hw/i386/pc.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index d98b737..de91e90 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2461,7 +2461,11 @@ pc_machine_get_device_memory_region_size(Object *obj, Visitor *v, Error **errp) { MachineState *ms = MACHINE(obj); - int64_t value = memory_region_size(&ms->device_memory->mr); + int64_t value = 0; + + if (ms->device_memory) { + memory_region_size(&ms->device_memory->mr); + } visit_type_int(v, name, &value, errp); } -- 2.7.4

Igor Mammedov <imammedo@redhat.com> writes:
QEMU will crash when device-memory-region-size property is read if ms->device_memory wasn't initialized yet (ex: property being inspected during preconfig time).
Reproduced: $ qemu-system-x86_64 -nodefaults -S -display none -preconfig -qmp stdio {"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 4}, "package": "v4.0.0-828-ga7b21f6762"}, "capabilities": ["oob"]}} {"execute": "qmp_capabilities"} {"return": {}} {"execute": "qom-get", "arguments": {"path": "/machine", "property": "device-memory-region-size"}} Segmentation fault (core dumped) First time I started looking at this series, I went "I'll need a reproducer to fully understand what's up, and I don't feel like finding one now; next series, please". Second time, I had to spend a few minutes on the reproducer. Wasn't hard, since you provided a clue. Still: make review easy, include a reproducer whenever you can.
Instead of crashing return 0 if ms->device_memory hasn't been initialized.
Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- hw/i386/pc.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c index d98b737..de91e90 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2461,7 +2461,11 @@ pc_machine_get_device_memory_region_size(Object *obj, Visitor *v, Error **errp) { MachineState *ms = MACHINE(obj); - int64_t value = memory_region_size(&ms->device_memory->mr); + int64_t value = 0; + + if (ms->device_memory) { + memory_region_size(&ms->device_memory->mr); + }
visit_type_int(v, name, &value, errp); }
This makes qom-get return 0 for the size of memory that doesn't exist, yet. A possible alternative would be setting an error. Opinions?

On Mon, 27 May 2019 18:36:25 +0200 Markus Armbruster <armbru@redhat.com> wrote:
Igor Mammedov <imammedo@redhat.com> writes:
QEMU will crash when device-memory-region-size property is read if ms->device_memory wasn't initialized yet (ex: property being inspected during preconfig time).
Reproduced:
$ qemu-system-x86_64 -nodefaults -S -display none -preconfig -qmp stdio {"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 4}, "package": "v4.0.0-828-ga7b21f6762"}, "capabilities": ["oob"]}} {"execute": "qmp_capabilities"} {"return": {}} {"execute": "qom-get", "arguments": {"path": "/machine", "property": "device-memory-region-size"}} Segmentation fault (core dumped)
First time I started looking at this series, I went "I'll need a reproducer to fully understand what's up, and I don't feel like finding one now; next series, please". Second time, I had to spend a few minutes on the reproducer. Wasn't hard, since you provided a clue. Still: make review easy, include a reproducer whenever you can.
sure
Instead of crashing return 0 if ms->device_memory hasn't been initialized.
Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- hw/i386/pc.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c index d98b737..de91e90 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2461,7 +2461,11 @@ pc_machine_get_device_memory_region_size(Object *obj, Visitor *v, Error **errp) { MachineState *ms = MACHINE(obj); - int64_t value = memory_region_size(&ms->device_memory->mr); + int64_t value = 0; + + if (ms->device_memory) { + memory_region_size(&ms->device_memory->mr); + }
visit_type_int(v, name, &value, errp); }
This makes qom-get return 0 for the size of memory that doesn't exist, yet.
A possible alternative would be setting an error.
Opinions?
We don't have a notion of property not set in QOM, so a code that will receive a text based error will have to parse it (horrible idea) to avoid generation of related ACPI parts. In case of not enabled memory hotplug, PC_MACHINE_DEVMEM_REGION_SIZE == 0 is valid value and it's what's expected by other code.

Add in the command output object's property values right after creation (i.e. state of the object returned by object_new() or equivalent). Follow up patch will add machine property 'numa-mem-supported', which would allow mgmt to introspect which machine types (versions) still support legacy "-numa mem=FOO" CLI option and which don't and require alternative '-numa memdev' option being used. Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- qapi/misc.json | 5 ++++- qmp.c | 5 +++++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/qapi/misc.json b/qapi/misc.json index 8b3ca4f..e333285 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -1365,10 +1365,13 @@ # # @description: if specified, the description of the property. # +# @default: initial property value. +# # Since: 1.2 ## { 'struct': 'ObjectPropertyInfo', - 'data': { 'name': 'str', 'type': 'str', '*description': 'str' } } + 'data': { 'name': 'str', 'type': 'str', '*description': 'str', + '*default': 'any' } } ## # @qom-list: diff --git a/qmp.c b/qmp.c index b92d62c..8415541 100644 --- a/qmp.c +++ b/qmp.c @@ -593,6 +593,11 @@ ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename, info->type = g_strdup(prop->type); info->has_description = !!prop->description; info->description = g_strdup(prop->description); + if (obj) { + info->q_default = + object_property_get_qobject(obj, info->name, NULL); + info->has_q_default = !!info->q_default; + } entry = g_malloc0(sizeof(*entry)); entry->value = info; -- 2.7.4

Igor Mammedov <imammedo@redhat.com> writes:
Add in the command output object's property values right after creation (i.e. state of the object returned by object_new() or equivalent).
Follow up patch will add machine property 'numa-mem-supported', which would allow mgmt to introspect which machine types (versions) still support legacy "-numa mem=FOO" CLI option and which don't and require alternative '-numa memdev' option being used.
I'll have to study that patch to figure out what exactly the use case is.
Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- qapi/misc.json | 5 ++++- qmp.c | 5 +++++ 2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/qapi/misc.json b/qapi/misc.json index 8b3ca4f..e333285 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -1365,10 +1365,13 @@ # # @description: if specified, the description of the property. # +# @default: initial property value. +# # Since: 1.2 ## { 'struct': 'ObjectPropertyInfo', - 'data': { 'name': 'str', 'type': 'str', '*description': 'str' } } + 'data': { 'name': 'str', 'type': 'str', '*description': 'str', + '*default': 'any' } }
ObjectPropertyInfo has three users: qom-list, device-list-properties, qom-list-properties.
## # @qom-list: diff --git a/qmp.c b/qmp.c index b92d62c..8415541 100644 --- a/qmp.c +++ b/qmp.c @@ -593,6 +593,11 @@ ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename, info->type = g_strdup(prop->type); info->has_description = !!prop->description; info->description = g_strdup(prop->description); + if (obj) { + info->q_default = + object_property_get_qobject(obj, info->name, NULL); + info->has_q_default = !!info->q_default; + }
entry = g_malloc0(sizeof(*entry)); entry->value = info;
You update only qom-list-properties. The other two therefore never return objects with a @default member. But query-qmp-schema can't tell. Awkward. The doc comments don't tell. Doc bug. You could have qom-list-properties return a new type { 'struct': 'ObjectPropertyInfoWithDefault', 'base': 'ObjectPropertyInfo', 'data': { '*default': any } } The default value shown by qom-list-properties is the value found in a fresh object created with object_new(). object_new() runs ->instance_init(), which can do anything. When you call object_new() again, you might find a different value. In other words, the @default returned by qom-list-properties is unreliable. Related to this qom-list-properties caveat: # Note: objects can create properties at runtime, for example to describe # links between different devices and/or objects. These properties # are not included in the output of this command. Please add a similar one for @default. This command fights QOM's basic design premises, and it shows.

Current QAPI semantics return empty "" string in case string property value hasn't been set (i.e. NULL). Do not show initial value in this case in "qom-list-properties" command output to reduce clutter. Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- qmp.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/qmp.c b/qmp.c index 8415541..463c7d4 100644 --- a/qmp.c +++ b/qmp.c @@ -41,6 +41,7 @@ #include "qom/object_interfaces.h" #include "hw/mem/memory-device.h" #include "hw/acpi/acpi_dev_interface.h" +#include "qapi/qmp/qstring.h" NameInfo *qmp_query_name(Error **errp) { @@ -596,7 +597,16 @@ ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename, if (obj) { info->q_default = object_property_get_qobject(obj, info->name, NULL); - info->has_q_default = !!info->q_default; + if (info->q_default) { + if (qobject_type(info->q_default) == QTYPE_QSTRING) { + QString *value = qobject_to(QString, info->q_default); + if (!strcmp(qstring_get_str(value), "")) { + qobject_unref(info->q_default); + info->q_default = NULL; + } + } + info->has_q_default = !!info->q_default; + } } entry = g_malloc0(sizeof(*entry)); -- 2.7.4

Igor Mammedov <imammedo@redhat.com> writes:
Current QAPI semantics return empty "" string in case string property value hasn't been set (i.e. NULL). Do not show initial value in this case in "qom-list-properties" command output to reduce clutter.
Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- qmp.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/qmp.c b/qmp.c index 8415541..463c7d4 100644 --- a/qmp.c +++ b/qmp.c @@ -41,6 +41,7 @@ #include "qom/object_interfaces.h" #include "hw/mem/memory-device.h" #include "hw/acpi/acpi_dev_interface.h" +#include "qapi/qmp/qstring.h"
NameInfo *qmp_query_name(Error **errp) { @@ -596,7 +597,16 @@ ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename, if (obj) { info->q_default = object_property_get_qobject(obj, info->name, NULL); - info->has_q_default = !!info->q_default; + if (info->q_default) { + if (qobject_type(info->q_default) == QTYPE_QSTRING) { + QString *value = qobject_to(QString, info->q_default); + if (!strcmp(qstring_get_str(value), "")) { + qobject_unref(info->q_default); + info->q_default = NULL; + } + } + info->has_q_default = !!info->q_default; + } }
entry = g_malloc0(sizeof(*entry));
The commit message suggests we omit @default when a string-valued property is null. We omit it when it's "", too. Makes me wonder whether we should omit other "default defaults", too, such as integer zero. After all, what's so special about strings? I'm not sure omitting any default defaults is a good idea. But then I'm not yet sure @default is a good idea.

'-numa mem' option has a number of issues and mgmt often defaults to it. Unfortunately it's no possible to replace it with an alternative '-numa memdev' without breaking migration compatibility. What's possible though is to deprecate it, keeping option working with old machine types. Once deprecation period expires, QEMU will disable '-numa mem' option, usage on new machine types and when the last machine type that supported it is removed we would be able to remove '-numa mem' with associated code. In order to help mgmt to find out if being deprecated CLI option '-numa mem=SZ' is still supported by particular machine type, expose this information via "numa-mem-supported" machine property. Users can use "qom-list-properties" QMP command to list machine type properties including initial proprety values (when probing for supported machine types with '-machine none') or at runtime at preconfig time before numa mapping is configured and decide if they should used legacy '-numa mem' or alternative '-numa memdev' option. Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- include/hw/boards.h | 1 + hw/arm/virt.c | 1 + hw/core/machine.c | 12 ++++++++++++ hw/i386/pc.c | 1 + hw/ppc/spapr.c | 1 + 5 files changed, 16 insertions(+) diff --git a/include/hw/boards.h b/include/hw/boards.h index 6f7916f..9e347cf 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -210,6 +210,7 @@ struct MachineClass { bool ignore_boot_device_suffixes; bool smbus_no_migration_support; bool nvdimm_supported; + bool numa_mem_supported; HotplugHandler *(*get_hotplug_handler)(MachineState *machine, DeviceState *dev); diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 5331ab7..2e86c78 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1943,6 +1943,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) assert(!mc->get_hotplug_handler); mc->get_hotplug_handler = virt_machine_get_hotplug_handler; hc->plug = virt_machine_device_plug_cb; + mc->numa_mem_supported = true; } static void virt_instance_init(Object *obj) diff --git a/hw/core/machine.c b/hw/core/machine.c index 5d046a4..8bc53ba 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -506,6 +506,13 @@ static char *machine_get_nvdimm_persistence(Object *obj, Error **errp) return g_strdup(ms->nvdimms_state->persistence_string); } +static bool machine_get_numa_mem_supported(Object *obj, Error **errp) +{ + MachineClass *mc = MACHINE_GET_CLASS(obj); + + return mc->numa_mem_supported; +} + static void machine_set_nvdimm_persistence(Object *obj, const char *value, Error **errp) { @@ -810,6 +817,11 @@ static void machine_class_init(ObjectClass *oc, void *data) &error_abort); object_class_property_set_description(oc, "memory-encryption", "Set memory encryption object to use", &error_abort); + + object_class_property_add_bool(oc, "numa-mem-supported", + machine_get_numa_mem_supported, NULL, &error_abort); + object_class_property_set_description(oc, "numa-mem-supported", + "Shows if legacy '-numa mem=SIZE option is supported", &error_abort); } static void machine_class_base_init(ObjectClass *oc, void *data) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index de91e90..bec0055 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2756,6 +2756,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) nc->nmi_monitor_handler = x86_nmi; mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE; mc->nvdimm_supported = true; + mc->numa_mem_supported = true; object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int", pc_machine_get_device_memory_region_size, NULL, diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 2ef3ce4..265ecfb 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4336,6 +4336,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) * in which LMBs are represented and hot-added */ mc->numa_mem_align_shift = 28; + mc->numa_mem_supported = true; smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; smc->default_caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_ON; -- 2.7.4

Igor Mammedov <imammedo@redhat.com> writes:
'-numa mem' option has a number of issues and mgmt often defaults to it. Unfortunately it's no possible to replace it with an alternative '-numa memdev' without breaking migration compatibility.
To be precise: -numa node,mem=... and -numa node,memdev=... Correct?
What's possible though is to deprecate it, keeping option working with old machine types. Once deprecation period expires, QEMU will disable '-numa mem' option, usage on new machine types and when the last machine type that supported it is removed we would be able to remove '-numa mem' with associated code.
In order to help mgmt to find out if being deprecated CLI option '-numa mem=SZ' is still supported by particular machine type, expose this information via "numa-mem-supported" machine property.
Users can use "qom-list-properties" QMP command to list machine type properties including initial proprety values (when probing for supported machine types with '-machine none') or at runtime at preconfig time before numa mapping is configured and decide if they should used legacy '-numa mem' or alternative '-numa memdev' option.
This sentence is impenetrable, I'm afraid :) If we only want to convey whether a machine type supports -numa node,mem=..., then adding a flag to query-machines suffices. Since I'm pretty sure you'd have figured that out yourself, I suspect I'm missing something. Can you give me some examples of intended usage?

On Mon, 27 May 2019 20:38:57 +0200 Markus Armbruster <armbru@redhat.com> wrote:
Igor Mammedov <imammedo@redhat.com> writes:
'-numa mem' option has a number of issues and mgmt often defaults to it. Unfortunately it's no possible to replace it with an alternative '-numa memdev' without breaking migration compatibility.
To be precise: -numa node,mem=... and -numa node,memdev=... Correct? yep, I'll try to use full syntax since so it would be clear to others.
What's possible though is to deprecate it, keeping option working with old machine types. Once deprecation period expires, QEMU will disable '-numa mem' option, usage on new machine types and when the last machine type that supported it is removed we would be able to remove '-numa mem' with associated code.
In order to help mgmt to find out if being deprecated CLI option '-numa mem=SZ' is still supported by particular machine type, expose this information via "numa-mem-supported" machine property.
Users can use "qom-list-properties" QMP command to list machine type properties including initial proprety values (when probing for supported machine types with '-machine none') or at runtime at preconfig time before numa mapping is configured and decide if they should used legacy '-numa mem' or alternative '-numa memdev' option.
This sentence is impenetrable, I'm afraid :)
If we only want to convey whether a machine type supports -numa node,mem=..., then adding a flag to query-machines suffices. Since I'm pretty sure you'd have figured that out yourself, I suspect I'm missing I didn't know about query-machines, hence implemented "qom-list-properties" approach as was discussed at https://www.mail-archive.com/qemu-devel@nongnu.org/msg601220.html
For the purpose of deprecating '-numa node,mem" query-machines is more than enough. I'll drop 1-3 patches and respin series using query-machines.
something. Can you give me some examples of intended usage? Perhaps there will in future use cases when introspecting 'defaults' of objects will be needed, then we could look back into qom-list-properties if there aren't a better alternative.

On Fri, May 17, 2019 at 09:45:17AM +0200, Igor Mammedov wrote:
'-numa mem' option has a number of issues and mgmt often defaults to it. Unfortunately it's no possible to replace it with an alternative '-numa memdev' without breaking migration compatibility. What's possible though is to deprecate it, keeping option working with old machine types. Once deprecation period expires, QEMU will disable '-numa mem' option, usage on new machine types and when the last machine type that supported it is removed we would be able to remove '-numa mem' with associated code.
In order to help mgmt to find out if being deprecated CLI option '-numa mem=SZ' is still supported by particular machine type, expose this information via "numa-mem-supported" machine property.
Users can use "qom-list-properties" QMP command to list machine type properties including initial proprety values (when probing for supported machine types with '-machine none') or at runtime at preconfig time before numa mapping is configured and decide if they should used legacy '-numa mem' or alternative '-numa memdev' option.
Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- include/hw/boards.h | 1 + hw/arm/virt.c | 1 + hw/core/machine.c | 12 ++++++++++++ hw/i386/pc.c | 1 + hw/ppc/spapr.c | 1 + 5 files changed, 16 insertions(+)
diff --git a/include/hw/boards.h b/include/hw/boards.h index 6f7916f..9e347cf 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -210,6 +210,7 @@ struct MachineClass { bool ignore_boot_device_suffixes; bool smbus_no_migration_support; bool nvdimm_supported; + bool numa_mem_supported;
HotplugHandler *(*get_hotplug_handler)(MachineState *machine, DeviceState *dev); diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 5331ab7..2e86c78 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1943,6 +1943,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) assert(!mc->get_hotplug_handler); mc->get_hotplug_handler = virt_machine_get_hotplug_handler; hc->plug = virt_machine_device_plug_cb; + mc->numa_mem_supported = true; }
static void virt_instance_init(Object *obj) diff --git a/hw/core/machine.c b/hw/core/machine.c index 5d046a4..8bc53ba 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -506,6 +506,13 @@ static char *machine_get_nvdimm_persistence(Object *obj, Error **errp) return g_strdup(ms->nvdimms_state->persistence_string); }
+static bool machine_get_numa_mem_supported(Object *obj, Error **errp) +{ + MachineClass *mc = MACHINE_GET_CLASS(obj); + + return mc->numa_mem_supported;
I don't remember other cases where a property value is constant for all instances of a given class, but still requires instantiating an object just to look at the value of the property. Is this really a pattern we want to start using in QEMU? Why hiding numa_mem_supported behind qom-list-properties is better than simply extending the QAPI schema to add a new field to MachineInfo? Extending MachineInfo is simple and obvious, and it makes the new machine class attribute be explicitly documented in the QAPI schema.
+} + static void machine_set_nvdimm_persistence(Object *obj, const char *value, Error **errp) { @@ -810,6 +817,11 @@ static void machine_class_init(ObjectClass *oc, void *data) &error_abort); object_class_property_set_description(oc, "memory-encryption", "Set memory encryption object to use", &error_abort); + + object_class_property_add_bool(oc, "numa-mem-supported", + machine_get_numa_mem_supported, NULL, &error_abort); + object_class_property_set_description(oc, "numa-mem-supported", + "Shows if legacy '-numa mem=SIZE option is supported", &error_abort); }
static void machine_class_base_init(ObjectClass *oc, void *data) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index de91e90..bec0055 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2756,6 +2756,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) nc->nmi_monitor_handler = x86_nmi; mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE; mc->nvdimm_supported = true; + mc->numa_mem_supported = true;
object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int", pc_machine_get_device_memory_region_size, NULL, diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 2ef3ce4..265ecfb 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4336,6 +4336,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) * in which LMBs are represented and hot-added */ mc->numa_mem_align_shift = 28; + mc->numa_mem_supported = true;
smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; smc->default_caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_ON; -- 2.7.4
-- Eduardo

The parameter allows to configure fake NUMA topology where guest VM simulates NUMA topology but not actually getting a performance benefits from it. The same or better results could be achieved using 'memdev' parameter. In light of that any VM that uses NUMA to get its benefits should use 'memdev'. To allow transition initial RAM to device based model, deprecate 'mem' parameter as its ad-hoc partitioning of initial RAM MemoryRegion can't be translated to memdev based backend transparently to users and in compatible manner (migration wise). That will also allow to clean up a bit our numa code, leaving only 'memdev' impl. in place and several boards that use node_mem to generate FDT/ACPI description from it. Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- v3: * mention "numa-mem-supported" machine property in deprecation documentation. --- numa.c | 2 ++ qemu-deprecated.texi | 16 ++++++++++++++++ 2 files changed, 18 insertions(+) diff --git a/numa.c b/numa.c index 3875e1e..2205773 100644 --- a/numa.c +++ b/numa.c @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, if (node->has_mem) { numa_info[nodenr].node_mem = node->mem; + warn_report("Parameter -numa node,mem is deprecated," + " use -numa node,memdev instead"); } if (node->has_memdev) { Object *o; diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 842e71b..995a96c 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -72,6 +72,22 @@ backend settings instead of environment variables. To ease migration to the new format, the ``-audiodev-help'' option can be used to convert the current values of the environment variables to ``-audiodev'' options. +@subsection -numa node,mem=@var{size} (since 4.1) + +The parameter @option{mem} of @option{-numa node} is used to assign a part of +guest RAM to a NUMA node. But when using it, it's impossible to manage specified +size on the host side (like bind it to a host node, setting bind policy, ...), +so guest end-ups with the fake NUMA configuration with suboptiomal performance. +However since 2014 there is an alternative way to assign RAM to a NUMA node +using parameter @option{memdev}, which does the same as @option{mem} and provides +means to actualy manage node RAM on the host side. Use parameter @option{memdev} +with @var{memory-backend-ram} backend as an replacement for parameter @option{mem} +to achieve the same fake NUMA effect or a properly configured +@var{memory-backend-file} backend to actually benefit from NUMA configuration. +In future new machine versions will not accept the option but it will keep +working with old machine types. User can inspect read-only machine property +'numa-mem-supported' to check if specific machine type (not) supports the option. + @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) -- 2.7.4

Implicit RAM distribution between nodes has exactly the same issues as: "numa: deprecate 'mem' parameter of '-numa node' option" only with QEMU being the user that's 'adding' 'mem' parameter. Deprecate it, to get it out of the way so that we could consolidate guest RAM allocation using memory backends making it consistent and possibly later on transition to using memory devices instead of adhoc memory mapping of initial RAM. --- v3: - update deprecation doc, s/4.0/4.1/ - mention that legacy 'mem' option could also be used to provide explicit memory distribution for old machine types Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- numa.c | 3 +++ qemu-deprecated.texi | 8 ++++++++ 2 files changed, 11 insertions(+) diff --git a/numa.c b/numa.c index 2205773..6d45a1f 100644 --- a/numa.c +++ b/numa.c @@ -409,6 +409,9 @@ void numa_complete_configuration(MachineState *ms) if (i == nb_numa_nodes) { assert(mc->numa_auto_assign_ram); mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size); + warn_report("Default splitting of RAM between nodes is deprecated," + " Use '-numa node,memdev' to explictly define RAM" + " allocation per node"); } numa_total = 0; diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 995a96c..546f722 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -88,6 +88,14 @@ In future new machine versions will not accept the option but it will keep working with old machine types. User can inspect read-only machine property 'numa-mem-supported' to check if specific machine type (not) supports the option. +@subsection -numa node (without memory specified) (since 4.1) + +Splitting RAM by default between NUMA nodes has the same issues as @option{mem} +parameter described above with the difference that the role of the user plays +QEMU using implicit generic or board specific splitting rule. +Use @option{memdev} with @var{memory-backend-ram} backend or @option{mem} (if +it's supported by used machine type) to define mapping explictly instead. + @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) -- 2.7.4
participants (3)
-
Eduardo Habkost
-
Igor Mammedov
-
Markus Armbruster