[libvirt] [PATCH] conf/qemu: enforce NUMA nodes only for x86 memory hotplug

libvirt enforces at least one NUMA node for memory hotplug support on all architectures. While it might be required for some x86 guest, PowerPC can hotplug memory on non-NUMA system. The generic checks are replaced with arch specific check and xml validation too does not enforce "node" for non-x86 arch. CC: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> --- src/conf/domain_conf.c | 9 ++++++--- src/qemu/qemu_command.c | 28 +++++++++++++++++----------- 2 files changed, 23 insertions(+), 14 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fd0450f..4cb2d4a 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -12430,6 +12430,7 @@ virDomainMemorySourceDefParseXML(xmlNodePtr node, static int virDomainMemoryTargetDefParseXML(xmlNodePtr node, + const virDomainDef *domDef, xmlXPathContextPtr ctxt, virDomainMemoryDefPtr def) { @@ -12437,7 +12438,7 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node, xmlNodePtr save = ctxt->node; ctxt->node = node; - if (virXPathUInt("string(./node)", ctxt, &def->targetNode) < 0) { + if (virXPathUInt("string(./node)", ctxt, &def->targetNode) < 0 && ARCH_IS_X86(domDef->os.arch)) { virReportError(VIR_ERR_XML_ERROR, "%s", _("invalid or missing value of memory device node")); goto cleanup; @@ -12457,6 +12458,7 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node, static virDomainMemoryDefPtr virDomainMemoryDefParseXML(xmlNodePtr memdevNode, + const virDomainDef *domDef, xmlXPathContextPtr ctxt, unsigned int flags) { @@ -12495,7 +12497,7 @@ virDomainMemoryDefParseXML(xmlNodePtr memdevNode, goto error; } - if (virDomainMemoryTargetDefParseXML(node, ctxt, def) < 0) + if (virDomainMemoryTargetDefParseXML(node, domDef, ctxt, def) < 0) goto error; if (virDomainDeviceInfoParseXML(memdevNode, NULL, &def->info, flags) < 0) @@ -12647,7 +12649,7 @@ virDomainDeviceDefParse(const char *xmlStr, goto error; break; case VIR_DOMAIN_DEVICE_MEMORY: - if (!(dev->data.memory = virDomainMemoryDefParseXML(node, ctxt, flags))) + if (!(dev->data.memory = virDomainMemoryDefParseXML(node, def, ctxt, flags))) goto error; break; case VIR_DOMAIN_DEVICE_NONE: @@ -16328,6 +16330,7 @@ virDomainDefParseXML(xmlDocPtr xml, for (i = 0; i < n; i++) { virDomainMemoryDefPtr mem = virDomainMemoryDefParseXML(nodes[i], + def, ctxt, flags); if (!mem) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index ae03618..51160e7 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -4979,8 +4979,12 @@ qemuBuildMemoryBackendStr(unsigned long long size, *backendProps = NULL; *backendType = NULL; - /* memory devices could provide a invalid guest node */ - if (guestNode >= virDomainNumaGetNodeCount(def->numa)) { + /* memory devices could provide a invalid guest node. Moreover, + * x86 guests needs at least one numa node to support memory + * hotplug + */ + if ((virDomainNumaGetNodeCount(def->numa) == 0 && ARCH_IS_X86(def->os.arch)) || + guestNode > virDomainNumaGetNodeCount(def->numa)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("can't add memory backend for guest node '%d' as " "the guest has only '%zu' NUMA nodes configured"), @@ -4991,10 +4995,12 @@ qemuBuildMemoryBackendStr(unsigned long long size, if (!(props = virJSONValueNewObject())) return -1; - memAccess = virDomainNumaGetNodeMemoryAccessMode(def->numa, guestNode); - if (virDomainNumatuneGetMode(def->numa, guestNode, &mode) < 0 && - virDomainNumatuneGetMode(def->numa, -1, &mode) < 0) - mode = VIR_DOMAIN_NUMATUNE_MEM_STRICT; + if (virDomainNumaGetNodeCount(def->numa)) { + memAccess = virDomainNumaGetNodeMemoryAccessMode(def->numa, guestNode); + if (virDomainNumatuneGetMode(def->numa, guestNode, &mode) < 0 && + virDomainNumatuneGetMode(def->numa, -1, &mode) < 0) + mode = VIR_DOMAIN_NUMATUNE_MEM_STRICT; + } if (pagesize == 0) { /* Find the huge page size we want to use */ @@ -9238,11 +9244,11 @@ qemuBuildCommandLine(virConnectPtr conn, goto error; } - /* due to guest support, qemu would silently enable NUMA with one node - * once the memory hotplug backend is enabled. To avoid possible - * confusion we will enforce user originated numa configuration along - * with memory hotplug. */ - if (virDomainNumaGetNodeCount(def->numa) == 0) { + /* x86 windows guest needs at least one numa node to be + * present. While its not possible to detect what guest os is + * running, enforce this limitation only to x86 architecture. + */ + if (ARCH_IS_X86(def->os.arch) && virDomainNumaGetNodeCount(def->numa) == 0) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("At least one numa node has to be configured when " "enabling memory hotplug")); -- 2.4.3

On Tue, Aug 18, 2015 at 03:35:11PM +0530, Nikunj A Dadhania wrote:
libvirt enforces at least one NUMA node for memory hotplug support on all architectures. While it might be required for some x86 guest, PowerPC can hotplug memory on non-NUMA system.
The generic checks are replaced with arch specific check and xml validation too does not enforce "node" for non-x86 arch.
CC: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
For future reference, can you CC Andrea Bolognani <abologna@redhat.com> on Power related libvirt patches? He's handling most of our Power / libvirt work here at Red Hat. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson

David Gibson <david@gibson.dropbear.id.au> writes:
On Tue, Aug 18, 2015 at 03:35:11PM +0530, Nikunj A Dadhania wrote:
libvirt enforces at least one NUMA node for memory hotplug support on all architectures. While it might be required for some x86 guest, PowerPC can hotplug memory on non-NUMA system.
The generic checks are replaced with arch specific check and xml validation too does not enforce "node" for non-x86 arch.
CC: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
For future reference, can you CC Andrea Bolognani <abologna@redhat.com> on Power related libvirt patches? He's handling most of our Power / libvirt work here at Red Hat.
Sure David Regards, Nikunj

Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> writes:
libvirt enforces at least one NUMA node for memory hotplug support on all architectures. While it might be required for some x86 guest, PowerPC can hotplug memory on non-NUMA system.
The generic checks are replaced with arch specific check and xml validation too does not enforce "node" for non-x86 arch.
CC: Peter Krempa <pkrempa@redhat.com>
Ping ?

On Tue, Aug 18, 2015 at 15:35:11 +0530, Nikunj A Dadhania wrote:
libvirt enforces at least one NUMA node for memory hotplug support on all architectures. While it might be required for some x86 guest, PowerPC can hotplug memory on non-NUMA system.
The generic checks are replaced with arch specific check and xml validation too does not enforce "node" for non-x86 arch.
CC: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> --- src/conf/domain_conf.c | 9 ++++++--- src/qemu/qemu_command.c | 28 +++++++++++++++++----------- 2 files changed, 23 insertions(+), 14 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fd0450f..4cb2d4a 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c
...
@@ -12437,7 +12438,7 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node, xmlNodePtr save = ctxt->node; ctxt->node = node;
- if (virXPathUInt("string(./node)", ctxt, &def->targetNode) < 0) { + if (virXPathUInt("string(./node)", ctxt, &def->targetNode) < 0 && ARCH_IS_X86(domDef->os.arch)) {
The parser code should not be made architecture dependant. In this case we will need to adjust the code in a way that it will set a known value in case the numa node was not provided in the device XML and the check itself will need to be moved into the post parse callback so that the decision can be made on a per-hypervisor basis.
virReportError(VIR_ERR_XML_ERROR, "%s", _("invalid or missing value of memory device node")); goto cleanup;
...
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index ae03618..51160e7 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -4979,8 +4979,12 @@ qemuBuildMemoryBackendStr(unsigned long long size, *backendProps = NULL; *backendType = NULL;
- /* memory devices could provide a invalid guest node */ - if (guestNode >= virDomainNumaGetNodeCount(def->numa)) { + /* memory devices could provide a invalid guest node. Moreover, + * x86 guests needs at least one numa node to support memory + * hotplug + */ + if ((virDomainNumaGetNodeCount(def->numa) == 0 && ARCH_IS_X86(def->os.arch)) || + guestNode > virDomainNumaGetNodeCount(def->numa)) {
If we make this ARCH dependent here it will be hard to adjust it again in the future. Also I think we should whitelist PPC rather than blacklisting x86, since other ARCHes and OSes might have the same problem here.
virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("can't add memory backend for guest node '%d' as " "the guest has only '%zu' NUMA nodes configured"), @@ -4991,10 +4995,12 @@ qemuBuildMemoryBackendStr(unsigned long long size, if (!(props = virJSONValueNewObject())) return -1;
- memAccess = virDomainNumaGetNodeMemoryAccessMode(def->numa, guestNode); - if (virDomainNumatuneGetMode(def->numa, guestNode, &mode) < 0 && - virDomainNumatuneGetMode(def->numa, -1, &mode) < 0) - mode = VIR_DOMAIN_NUMATUNE_MEM_STRICT; + if (virDomainNumaGetNodeCount(def->numa)) { + memAccess = virDomainNumaGetNodeMemoryAccessMode(def->numa, guestNode); + if (virDomainNumatuneGetMode(def->numa, guestNode, &mode) < 0 && + virDomainNumatuneGetMode(def->numa, -1, &mode) < 0) + mode = VIR_DOMAIN_NUMATUNE_MEM_STRICT; + }
if (pagesize == 0) { /* Find the huge page size we want to use */ @@ -9238,11 +9244,11 @@ qemuBuildCommandLine(virConnectPtr conn, goto error; }
- /* due to guest support, qemu would silently enable NUMA with one node - * once the memory hotplug backend is enabled. To avoid possible - * confusion we will enforce user originated numa configuration along - * with memory hotplug. */ - if (virDomainNumaGetNodeCount(def->numa) == 0) { + /* x86 windows guest needs at least one numa node to be + * present. While its not possible to detect what guest os is + * running, enforce this limitation only to x86 architecture.
Actually, qemu would add the numa node anyways, so the libvirt XML would not correspond to the configuration the guest sees and to avoid that we enforce the numa node.
+ */ + if (ARCH_IS_X86(def->os.arch) && virDomainNumaGetNodeCount(def->numa) == 0) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("At least one numa node has to be configured when " "enabling memory hotplug"));
Additionally, there's a bug in libvirt, where we'd use incorrect memory sizes when hotplug would be enabled without a numa node. I have a patch for this issue. Since this patch needs to be almost completely reworked I'll propose a patch that will lift this limitation without introducing arch specific code in multiple places. Peter

Peter Krempa <pkrempa@redhat.com> writes:
On Tue, Aug 18, 2015 at 15:35:11 +0530, Nikunj A Dadhania wrote:
libvirt enforces at least one NUMA node for memory hotplug support on all architectures. While it might be required for some x86 guest, PowerPC can hotplug memory on non-NUMA system.
The generic checks are replaced with arch specific check and xml validation too does not enforce "node" for non-x86 arch.
CC: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> --- src/conf/domain_conf.c | 9 ++++++--- src/qemu/qemu_command.c | 28 +++++++++++++++++----------- 2 files changed, 23 insertions(+), 14 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fd0450f..4cb2d4a 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c
...
@@ -12437,7 +12438,7 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node, xmlNodePtr save = ctxt->node; ctxt->node = node;
- if (virXPathUInt("string(./node)", ctxt, &def->targetNode) < 0) { + if (virXPathUInt("string(./node)", ctxt, &def->targetNode) < 0 && ARCH_IS_X86(domDef->os.arch)) {
The parser code should not be made architecture dependant. In this case we will need to adjust the code in a way that it will set a known value in case the numa node was not provided in the device XML and the check itself will need to be moved into the post parse callback so that the decision can be made on a per-hypervisor basis.
Sure, the only requirement is node should not be made mandatory in parser code.
virReportError(VIR_ERR_XML_ERROR, "%s", _("invalid or missing value of memory device node")); goto cleanup;
...
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index ae03618..51160e7 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -4979,8 +4979,12 @@ qemuBuildMemoryBackendStr(unsigned long long size, *backendProps = NULL; *backendType = NULL;
- /* memory devices could provide a invalid guest node */ - if (guestNode >= virDomainNumaGetNodeCount(def->numa)) { + /* memory devices could provide a invalid guest node. Moreover, + * x86 guests needs at least one numa node to support memory + * hotplug + */ + if ((virDomainNumaGetNodeCount(def->numa) == 0 && ARCH_IS_X86(def->os.arch)) || + guestNode > virDomainNumaGetNodeCount(def->numa)) {
If we make this ARCH dependent here it will be hard to adjust it again in the future. Also I think we should whitelist PPC rather than blacklisting x86, since other ARCHes and OSes might have the same problem here.
Sure.
virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("can't add memory backend for guest node '%d' as " "the guest has only '%zu' NUMA nodes configured"), @@ -4991,10 +4995,12 @@ qemuBuildMemoryBackendStr(unsigned long long size, if (!(props = virJSONValueNewObject())) return -1;
- memAccess = virDomainNumaGetNodeMemoryAccessMode(def->numa, guestNode); - if (virDomainNumatuneGetMode(def->numa, guestNode, &mode) < 0 && - virDomainNumatuneGetMode(def->numa, -1, &mode) < 0) - mode = VIR_DOMAIN_NUMATUNE_MEM_STRICT; + if (virDomainNumaGetNodeCount(def->numa)) { + memAccess = virDomainNumaGetNodeMemoryAccessMode(def->numa, guestNode); + if (virDomainNumatuneGetMode(def->numa, guestNode, &mode) < 0 && + virDomainNumatuneGetMode(def->numa, -1, &mode) < 0) + mode = VIR_DOMAIN_NUMATUNE_MEM_STRICT; + }
if (pagesize == 0) { /* Find the huge page size we want to use */ @@ -9238,11 +9244,11 @@ qemuBuildCommandLine(virConnectPtr conn, goto error; }
- /* due to guest support, qemu would silently enable NUMA with one node - * once the memory hotplug backend is enabled. To avoid possible - * confusion we will enforce user originated numa configuration along - * with memory hotplug. */ - if (virDomainNumaGetNodeCount(def->numa) == 0) { + /* x86 windows guest needs at least one numa node to be + * present. While its not possible to detect what guest os is + * running, enforce this limitation only to x86 architecture.
Actually, qemu would add the numa node anyways, so the libvirt XML would not correspond to the configuration the guest sees and to avoid that we enforce the numa node.
I did not see this in my testing for PPC64.
+ */ + if (ARCH_IS_X86(def->os.arch) && virDomainNumaGetNodeCount(def->numa) == 0) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("At least one numa node has to be configured when " "enabling memory hotplug"));
Additionally, there's a bug in libvirt, where we'd use incorrect memory sizes when hotplug would be enabled without a numa node. I have a patch for this issue. Since this patch needs to be almost completely reworked I'll propose a patch that will lift this limitation without introducing arch specific code in multiple places.
That will be great, thanks Regards Nikunj
participants (3)
-
David Gibson
-
Nikunj A Dadhania
-
Peter Krempa