[PATCH 0/5] NUMA CPUs 'auto-fill' for incomplete topologies

Hi,

This series implements a convenience feature for the case where the user defines an incomplete NUMA topology in the domain, i.e. when the sum of all CPUs in each NUMA cell is less than the maximum vCPU count of the domain. This is an implementation based on Peter Krempa's suggestion in [1].

The documentation patch (04), aside from documenting this new behavior, also lets the user know that using incomplete NUMA topologies is not advised.

This feature does not break the migration ABI. I tested it in a migration using an incomplete NUMA topology domain, migrating to another host with a patched Libvirt. A guest with this NUMA topology, declaring 9 vCPUs and thus 7 vCPUs short of the total of 16:

<vcpu placement='auto'>16</vcpu>
[...]
<cpu mode='custom' match='exact' check='none'>
  <model fallback='forbid'>POWER9</model>
  <numa>
    <cell id='0' cpus='0-1,6' memory='3906250' unit='KiB'/>
    <cell id='1' cpus='2-4' memory='3906250' unit='KiB'/>
    <cell id='2' cpus='10,12' memory='3906250' unit='KiB'/>
    <cell id='3' cpus='14' memory='3906250' unit='KiB'/>
  </numa>
</cpu>

successfully migrates to a patched Libvirt, ending up with this topology where the missing vCPUs were added to node 0:

<vcpu placement='auto'>16</vcpu>
[...]
<cpu mode='custom' match='exact' check='none'>
  <model fallback='forbid'>POWER9</model>
  <numa>
    <cell id='0' cpus='0-1,5-9,11,13,15' memory='3932160' unit='KiB'/>
    <cell id='1' cpus='2-4' memory='3932160' unit='KiB'/>
    <cell id='2' cpus='10,12' memory='3932160' unit='KiB'/>
    <cell id='3' cpus='14' memory='3932160' unit='KiB'/>
  </numa>
</cpu>

[1] https://www.redhat.com/archives/libvir-list/2019-June/msg00263.html

Daniel Henrique Barboza (5):
  numa_conf.c: add helper functions for cpumap operations
  qemu_domain.c: NUMA CPUs auto-fill for incomplete topologies
  qemuxml2xmltest.c: add NUMA vcpus auto fill tests
  formatdomain.html.in: document the NUMA cpus auto fill feature
  qemu_validate.c: revert NUMA CPU range user warning

 docs/formatdomain.html.in                     | 11 ++++-
 src/conf/numa_conf.c                          | 46 ++++++++++++++++++
 src/conf/numa_conf.h                          |  3 ++
 src/libvirt_private.syms                      |  1 +
 src/qemu/qemu_domain.c                        | 47 +++++++++++++++++++
 src/qemu/qemu_domain.h                        |  4 ++
 src/qemu/qemu_driver.c                        |  9 ++++
 src/qemu/qemu_validate.c                      | 19 ++------
 .../numavcpus-topology-mismatch.xml           | 37 +++++++++++++++
 ...avcpus-topology-mismatch.x86_64-latest.xml | 38 +++++++++++++++
 tests/qemuxml2xmltest.c                       |  1 +
 11 files changed, 201 insertions(+), 15 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/numavcpus-topology-mismatch.xml
 create mode 100644 tests/qemuxml2xmloutdata/numavcpus-topology-mismatch.x86_64-latest.xml

--
2.26.2
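For readers who want to see the fill rule in isolation: the sketch below is a standalone program, not libvirt code, and simply reproduces the arithmetic of the example above. Every vCPU in 0..maxvcpus-1 that no cell claims ends up in cell 0 (the cell masks are hard-coded from the XML above).

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        const unsigned int maxvcpus = 16;
        /* one bit per vCPU, taken from the incomplete topology above */
        uint32_t cells[] = {
            (1u << 0) | (1u << 1) | (1u << 6),  /* cell 0: 0-1,6 */
            (1u << 2) | (1u << 3) | (1u << 4),  /* cell 1: 2-4   */
            (1u << 10) | (1u << 12),            /* cell 2: 10,12 */
            (1u << 14),                         /* cell 3: 14    */
        };
        uint32_t assigned = 0;
        size_t i;

        for (i = 0; i < sizeof(cells) / sizeof(cells[0]); i++)
            assigned |= cells[i];

        /* the unassigned vCPUs all land in cell 0 */
        cells[0] |= ~assigned & ((1u << maxvcpus) - 1);

        printf("cell 0 cpus:");
        for (i = 0; i < maxvcpus; i++) {
            if (cells[0] & (1u << i))
                printf(" %zu", i);
        }
        printf("\n");  /* -> 0 1 5 6 7 8 9 11 13 15, i.e. cpus='0-1,5-9,11,13,15' */
        return 0;
    }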

These helpers will be used in an auto-fill feature for incomplete NUMA topologies in the next patch.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 src/conf/numa_conf.c     | 46 ++++++++++++++++++++++++++++++++++++++++
 src/conf/numa_conf.h     |  3 +++
 src/libvirt_private.syms |  1 +
 3 files changed, 50 insertions(+)

diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index 6f1257fd8e..3d766189ba 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c
@@ -1373,3 +1373,49 @@ virDomainNumaGetMemorySize(virDomainNumaPtr numa)

     return ret;
 }
+
+
+static int
+virDomainNumaRemoveCPUsFromMap(virBitmapPtr result, virBitmapPtr exclude)
+{
+    size_t i;
+
+    for (i = 0; i < virBitmapSize(exclude); i++) {
+        if (!virBitmapIsBitSet(exclude, i))
+            continue;
+
+        if (virBitmapClearBitExpand(result, i) < 0)
+            return -1;
+    }
+
+    return 0;
+}
+
+
+int
+virDomainNumaFillCPUsInNode(virDomainNumaPtr numa, size_t node,
+                            unsigned int maxCpus)
+{
+    g_autoptr(virBitmap) maxCPUsBitmap = virBitmapNew(maxCpus);
+    size_t i;
+
+    if (node >= virDomainNumaGetNodeCount(numa))
+        return -1;
+
+    virBitmapSetAll(maxCPUsBitmap);
+
+    for (i = 0; i < numa->nmem_nodes; i++) {
+        virBitmapPtr nodeCpus = virDomainNumaGetNodeCpumask(numa, i);
+
+        if (i == node)
+            continue;
+
+        if (virDomainNumaRemoveCPUsFromMap(maxCPUsBitmap, nodeCpus) < 0)
+            return -1;
+    }
+
+    virBitmapFree(numa->mem_nodes[node].cpumask);
+    numa->mem_nodes[node].cpumask = g_steal_pointer(&maxCPUsBitmap);
+
+    return 0;
+}
diff --git a/src/conf/numa_conf.h b/src/conf/numa_conf.h
index ae3599bb8b..cdf87a87e8 100644
--- a/src/conf/numa_conf.h
+++ b/src/conf/numa_conf.h
@@ -185,3 +185,6 @@ int virDomainNumaDefCPUParseXML(virDomainNumaPtr def, xmlXPathContextPtr ctxt);
 int virDomainNumaDefCPUFormatXML(virBufferPtr buf, virDomainNumaPtr def);

 unsigned int virDomainNumaGetCPUCountTotal(virDomainNumaPtr numa);
+
+int virDomainNumaFillCPUsInNode(virDomainNumaPtr numa, size_t node,
+                                unsigned int maxCpus);
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index a6af44fe1c..788e08045a 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -834,6 +834,7 @@ virDomainMemoryAccessTypeFromString;
 virDomainMemoryAccessTypeToString;
 virDomainNumaCheckABIStability;
 virDomainNumaEquals;
+virDomainNumaFillCPUsInNode;
 virDomainNumaFree;
 virDomainNumaGetCPUCountTotal;
 virDomainNumaGetMaxCPUID;
--
2.26.2
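As a usage note, here is a hedged sketch of how a caller is expected to combine the new helper with the existing accessors; the authoritative caller lands in the next patch, and "def" is assumed here to be a parsed virDomainDefPtr:

    /* fill node 0 whenever the cells cover fewer CPUs than the vCPU maximum */
    unsigned int vcpusMax = virDomainDefGetVcpusMax(def);

    if (virDomainNumaGetCPUCountTotal(def->numa) < vcpusMax &&
        virDomainNumaFillCPUsInNode(def->numa, 0, vcpusMax) < 0)
        return -1;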

Libvirt allows the user to define an incomplete NUMA topology, where the sum of all CPUs in each cell is less than the total of vCPUs. What ends up happening is that QEMU allocates the non-enumerated CPUs in the first NUMA node. This behavior has been flagged as 'to be deprecated' at least since QEMU commit ec78f8114bc4 ("numa: use possible_cpus for not mapped CPUs check").

In [1], Maxiwell suggested that we forbid the user from defining such topologies. In his review [2], Peter Krempa pointed out that we can't break existing guests, and suggested that Libvirt should emulate the QEMU behavior of putting the remaining vCPUs in the first NUMA node in these cases.

This patch implements Peter Krempa's suggestion. Since we're most likely going to end up with a disjointed NUMA configuration in node 0 after the auto-fill, we're making the auto-fill dependent on QEMU_CAPS_NUMA. A following patch will update the documentation not just to inform about the auto-fill mechanic with incomplete NUMA topologies, but also to discourage the user from creating such topologies in the future. This approach also makes Libvirt independent of whether QEMU changes its current behavior, since we're either auto-filling the CPUs in node 0 or the user (hopefully) is aware that incomplete topologies, although supported in Libvirt, are to be avoided.

[1] https://www.redhat.com/archives/libvir-list/2019-June/msg00224.html
[2] https://www.redhat.com/archives/libvir-list/2019-June/msg00263.html

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 src/qemu/qemu_domain.c | 47 ++++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_domain.h |  4 ++++
 src/qemu/qemu_driver.c |  9 +++++++++
 3 files changed, 60 insertions(+)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index d5e3d1a3cc..8034b6a219 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -4953,6 +4953,50 @@ qemuDomainDefTsegPostParse(virDomainDefPtr def,
 }


+/**
+ * qemuDomainDefNumaCPUsRectify:
+ * @def: domain definition
+ * @qemuCaps: QEMU capabilities
+ *
+ * This function emulates the (to be deprecated) QEMU behavior of
+ * filling node0 with the remaining CPUs, in case of an incomplete
+ * NUMA setup, up to the maximum vCPU count of the domain.
+ *
+ * Returns: 0 on success, -1 on error
+ */
+int
+qemuDomainDefNumaCPUsRectify(virDomainDefPtr def, virQEMUCapsPtr qemuCaps)
+{
+    unsigned int vcpusMax, numacpus;
+
+    /* QEMU_CAPS_NUMA tells us if QEMU is able to handle disjointed
+     * NUMA CPU ranges. The filling process will create a disjointed
+     * setup in node0 most of the time. Do not proceed if QEMU
+     * can't handle it. */
+    if (virDomainNumaGetNodeCount(def->numa) == 0 ||
+        !virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA))
+        return 0;
+
+    vcpusMax = virDomainDefGetVcpusMax(def);
+    numacpus = virDomainNumaGetCPUCountTotal(def->numa);
+
+    if (numacpus < vcpusMax) {
+        if (virDomainNumaFillCPUsInNode(def->numa, 0, vcpusMax) < 0)
+            return -1;
+    }
+
+    return 0;
+}
+
+
+static int
+qemuDomainDefNumaCPUsPostParse(virDomainDefPtr def,
+                               virQEMUCapsPtr qemuCaps)
+{
+    return qemuDomainDefNumaCPUsRectify(def, qemuCaps);
+}
+
+
 static int
 qemuDomainDefPostParseBasic(virDomainDefPtr def,
                             void *opaque G_GNUC_UNUSED)
@@ -5039,6 +5083,9 @@ qemuDomainDefPostParse(virDomainDefPtr def,
     if (qemuDomainDefTsegPostParse(def, qemuCaps) < 0)
         return -1;

+    if (qemuDomainDefNumaCPUsPostParse(def, qemuCaps) < 0)
+        return -1;
+
     return 0;
 }
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 41d3f1561d..e78a2b935d 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -1297,3 +1297,7 @@ qemuDomainInitializePflashStorageSource(virDomainObjPtr vm);
 bool
 qemuDomainDiskBlockJobIsSupported(virDomainObjPtr vm,
                                   virDomainDiskDefPtr disk);
+
+int
+qemuDomainDefNumaCPUsRectify(virDomainDefPtr def,
+                             virQEMUCapsPtr qemuCaps);
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index dd9ae30bb5..9f4f11f15c 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4999,6 +4999,7 @@ qemuDomainSetVcpusMax(virQEMUDriverPtr driver,
                       unsigned int nvcpus)
 {
     g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
+    g_autoptr(virQEMUCaps) qemuCaps = NULL;
     unsigned int topologycpus;

     if (def) {
@@ -5029,6 +5030,14 @@ qemuDomainSetVcpusMax(virQEMUDriverPtr driver,
     if (virDomainDefSetVcpusMax(persistentDef, nvcpus, driver->xmlopt) < 0)
         return -1;

+    /* re-adjust NUMA nodes if needed */
+    if (!(qemuCaps = virQEMUCapsCacheLookup(driver->qemuCapsCache,
+                                            persistentDef->emulator)))
+        return -1;
+
+    if (qemuDomainDefNumaCPUsRectify(persistentDef, qemuCaps) < 0)
+        return -1;
+
     if (virDomainDefSave(persistentDef, driver->xmlopt, cfg->configDir) < 0)
         return -1;
--
2.26.2
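To make the gating explicit, here is a minimal standalone sketch of the decision logic, with the libvirt types replaced by plain integers (illustrative only, not part of the patch):

    /* mirrors the guards in qemuDomainDefNumaCPUsRectify() above */
    static int rectify_needed(unsigned int nnodes, int has_qemu_caps_numa,
                              unsigned int vcpus_max, unsigned int numa_cpus)
    {
        if (nnodes == 0 || !has_qemu_caps_numa)
            return 0;                 /* no NUMA cells, or QEMU can't handle
                                       * the disjointed ranges: leave as-is */
        return numa_cpus < vcpus_max; /* fill only incomplete topologies */
    }

For instance, rectify_needed(2, 1, 12, 5) returns 1: a domain with two cells covering only 5 of its 12 maximum vCPUs (like the test case added in the next patch) gets its node 0 filled.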

Add a unit test to verify the NUMA vCPU auto-fill implemented in the previous patch.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 .../numavcpus-topology-mismatch.xml           | 37 ++++++++++++++++++
 ...avcpus-topology-mismatch.x86_64-latest.xml | 38 +++++++++++++++++++
 tests/qemuxml2xmltest.c                       |  1 +
 3 files changed, 76 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/numavcpus-topology-mismatch.xml
 create mode 100644 tests/qemuxml2xmloutdata/numavcpus-topology-mismatch.x86_64-latest.xml

diff --git a/tests/qemuxml2argvdata/numavcpus-topology-mismatch.xml b/tests/qemuxml2argvdata/numavcpus-topology-mismatch.xml
new file mode 100644
index 0000000000..d3b2c1297e
--- /dev/null
+++ b/tests/qemuxml2argvdata/numavcpus-topology-mismatch.xml
@@ -0,0 +1,37 @@
+<domain type='kvm'>
+  <name>dummy</name>
+  <uuid>4d92ec27-9ebf-400b-ae91-20c71c647c19</uuid>
+  <maxMemory slots='16' unit='KiB'>16777216</maxMemory>
+  <memory unit='KiB'>7864320</memory>
+  <currentMemory unit='KiB'>3906240</currentMemory>
+  <vcpu placement='auto'>12</vcpu>
+  <numatune>
+    <memory mode='strict' placement='auto'/>
+  </numatune>
+  <os>
+    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
+    <boot dev='hd'/>
+  </os>
+  <cpu>
+    <numa>
+      <cell id='0' cpus='0' memory='3932160' unit='KiB'/>
+      <cell id='1' cpus='2-4,6' memory='3932160' unit='KiB'/>
+    </numa>
+  </cpu>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-x86_64</emulator>
+    <controller type='usb' index='0' model='piix3-uhci'>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
+    </controller>
+    <controller type='pci' index='0' model='pci-root'/>
+    <input type='mouse' bus='ps2'/>
+    <input type='keyboard' bus='ps2'/>
+    <memballoon model='virtio'>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
+    </memballoon>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2xmloutdata/numavcpus-topology-mismatch.x86_64-latest.xml b/tests/qemuxml2xmloutdata/numavcpus-topology-mismatch.x86_64-latest.xml
new file mode 100644
index 0000000000..2d2245b10c
--- /dev/null
+++ b/tests/qemuxml2xmloutdata/numavcpus-topology-mismatch.x86_64-latest.xml
@@ -0,0 +1,38 @@
+<domain type='kvm'>
+  <name>dummy</name>
+  <uuid>4d92ec27-9ebf-400b-ae91-20c71c647c19</uuid>
+  <maxMemory slots='16' unit='KiB'>16777216</maxMemory>
+  <memory unit='KiB'>7864320</memory>
+  <currentMemory unit='KiB'>3906240</currentMemory>
+  <vcpu placement='auto'>12</vcpu>
+  <numatune>
+    <memory mode='strict' placement='auto'/>
+  </numatune>
+  <os>
+    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
+    <boot dev='hd'/>
+  </os>
+  <cpu mode='custom' match='exact' check='none'>
+    <model fallback='forbid'>qemu64</model>
+    <numa>
+      <cell id='0' cpus='0-1,5,7-11' memory='3932160' unit='KiB'/>
+      <cell id='1' cpus='2-4,6' memory='3932160' unit='KiB'/>
+    </numa>
+  </cpu>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-x86_64</emulator>
+    <controller type='usb' index='0' model='piix3-uhci'>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
+    </controller>
+    <controller type='pci' index='0' model='pci-root'/>
+    <input type='mouse' bus='ps2'/>
+    <input type='keyboard' bus='ps2'/>
+    <memballoon model='virtio'>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
+    </memballoon>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c
index dcc7b29ded..3e434a6cbe 100644
--- a/tests/qemuxml2xmltest.c
+++ b/tests/qemuxml2xmltest.c
@@ -553,6 +553,7 @@ mymain(void)
     DO_TEST("vcpu-placement-static",
             QEMU_CAPS_KVM,
             QEMU_CAPS_OBJECT_IOTHREAD);
+    DO_TEST_CAPS_LATEST("numavcpus-topology-mismatch");

     DO_TEST("smp", NONE);
     DO_TEST("iothreads", NONE);
--
2.26.2

We're deliberately not mentioning that we're replicating QEMU behavior. First, because QEMU may one day change this behavior and start refusing incomplete NUMA setups, which would leave our documentation outdated. Second, auto-filling the CPUs in the first cell will work regardless of future QEMU changes. The idea is to encourage the user to provide a complete NUMA CPU topology, rather than relying on the CPU auto-fill mechanic.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 docs/formatdomain.html.in | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 33cec1e6dd..0d1b34d7b2 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1787,7 +1787,16 @@
     <p>
       Each <code>cell</code> element specifies a NUMA cell or a NUMA node.
       <code>cpus</code> specifies the CPU or range of CPUs that are
-      part of the node. <code>memory</code> specifies the node memory
+      part of the node. <span class="since">Since 6.5.0</span> For the qemu
+      driver, if the emulator binary supports disjointed <code>cpus</code> ranges
+      in each <code>cell</code>, the sum of all CPUs declared in each <code>cell</code>
+      will be matched with the maximum number of virtual CPUs declared in the
+      <code>vcpu</code> element. This is done by filling any remaining CPUs
+      into the first NUMA <code>cell</code>. Users are encouraged to supply a
+      complete NUMA topology, where the sum of the NUMA CPUs matches the maximum
+      virtual CPUs number declared in <code>vcpus</code>, to make the domain
+      consistent across qemu and libvirt versions.
+      <code>memory</code> specifies the node memory
       in kibibytes (i.e. blocks of 1024 bytes).
       <span class="since">Since 1.2.11</span> one can use an additional
       <a href="#elementsMemoryAllocation"><code>unit</code></a> attribute to
--
2.26.2
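For reference, a complete topology of the kind this paragraph encourages looks like the following illustrative fragment (not taken from the patch): a domain with 8 maximum vCPUs whose cells together claim exactly CPUs 0-7, so the auto-fill never triggers and the guest sees the same topology on any qemu/libvirt combination.

    <vcpu placement='static'>8</vcpu>
    ...
    <cpu>
      <numa>
        <cell id='0' cpus='0-3' memory='1048576' unit='KiB'/>
        <cell id='1' cpus='4-7' memory='1048576' unit='KiB'/>
      </numa>
    </cpu>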

Now that we have the auto-fill code in place, and with proper documentation to let the user know that (1) we will auto-fill the NUMA CPUs up to the maximum vCPU count if QEMU supports it, and (2) the user is advised to always supply a complete NUMA topology, this warning is unneeded.

This reverts commit 38d2e033686b5cc274f8f55075ce1985b71e329a.

CC: Maxiwell S. Garcia <maxiwell@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 src/qemu/qemu_validate.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
index 584d1375b8..14d614934d 100644
--- a/src/qemu/qemu_validate.c
+++ b/src/qemu/qemu_validate.c
@@ -895,24 +895,15 @@ qemuValidateDomainDef(const virDomainDef *def,
     if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS)) {
         unsigned int topologycpus;
         unsigned int granularity;
-        unsigned int numacpus;

         /* Starting from QEMU 2.5, max vCPU count and overall vCPU topology
          * must agree. We only actually enforce this with QEMU 2.7+, due
          * to the capability check above */
-        if (virDomainDefGetVcpusTopology(def, &topologycpus) == 0) {
-            if (topologycpus != virDomainDefGetVcpusMax(def)) {
-                virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-                               _("CPU topology doesn't match maximum vcpu count"));
-                return -1;
-            }
-
-            numacpus = virDomainNumaGetCPUCountTotal(def->numa);
-            if ((numacpus != 0) && (topologycpus != numacpus)) {
-                VIR_WARN("CPU topology doesn't match numa CPU count; "
-                         "partial NUMA mapping is obsoleted and will "
-                         "be removed in future");
-            }
+        if (virDomainDefGetVcpusTopology(def, &topologycpus) == 0 &&
+            topologycpus != virDomainDefGetVcpusMax(def)) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("CPU topology doesn't match maximum vcpu count"));
+            return -1;
         }

         /* vCPU hotplug granularity must be respected */
--
2.26.2

On Mon, Jun 01, 2020 at 14:50:41 -0300, Daniel Henrique Barboza wrote:
Now that we have the auto-fill code in place, and with proper documentation to let the user know that (1) we will auto-fill the NUMA CPUs up to the maximum vCPU count if QEMU supports it, and (2) the user is advised to always supply a complete NUMA topology, this warning is unneeded.
This reverts commit 38d2e033686b5cc274f8f55075ce1985b71e329a.
Since we've already had the validation in place for some time now, I think we should just keep it. The auto-filling would be a useful hack for working around broken configs, but judged by itself it's of questionable benefit. Specifically, users might end up with a topology which they didn't expect. The reasoning is basically the same as with qemu: any default behaviour here is a policy decision, and it might not suit all uses.

On 6/1/20 4:40 PM, Peter Krempa wrote:
On Mon, Jun 01, 2020 at 14:50:41 -0300, Daniel Henrique Barboza wrote:
Now that we have the auto-fill code in place, and with proper documentation to let the user know that (1) we will auto-fill the NUMA CPUs up to the maximum vCPU count if QEMU supports it, and (2) the user is advised to always supply a complete NUMA topology, this warning is unneeded.
This reverts commit 38d2e033686b5cc274f8f55075ce1985b71e329a.
Since we've already had the validation in place for some time now, I think we should just keep it. The auto-filling would be a useful hack for working around broken configs, but judged by itself it's of questionable benefit.
That's a good point. I agree that removing the message after it has been in place for this long is more trouble than it's worth.
Specifically, users might end up with a topology which they didn't expect. The reasoning is basically the same as with qemu: any default behaviour here is a policy decision, and it might not suit all uses.
An ideal situation would be for QEMU to never have accepted incomplete NUMA topologies in the first place. Given that this wasn't the case, and that there might now be a plethora of guests running with goofy topologies all around, the already existing warning message + this auto-fill hack + documentation mentioning that users should avoid these topologies is a fine solution from Libvirt's side, in my estimation.

Thanks,

DHB

On Mon, 1 Jun 2020 17:14:20 -0300 Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
On 6/1/20 4:40 PM, Peter Krempa wrote:
On Mon, Jun 01, 2020 at 14:50:41 -0300, Daniel Henrique Barboza wrote:
Now that we have the auto-fill code in place, and with proper documentation to let the user know that (1) we will auto-fill the NUMA CPUs up to the maximum vCPU count if QEMU supports it, and (2) the user is advised to always supply a complete NUMA topology, this warning is unneeded.
This reverts commit 38d2e033686b5cc274f8f55075ce1985b71e329a.
Since we've already had the validation in place for some time now, I think we should just keep it. The auto-filling would be a useful hack for working around broken configs, but judged by itself it's of questionable benefit.
That's a good point. I agree that removing the message after it has been in place for this long is more trouble than it's worth.
Specifically, users might end up with a topology which they didn't expect. The reasoning is basically the same as with qemu: any default behaviour here is a policy decision, and it might not suit all uses.
An ideal situation would be for QEMU to never have accepted incomplete NUMA topologies in the first place.
At least with your series I can safely drop deprecated incomplete NUMA topologies on the QEMU side (which have been producing warnings for a while).
Given that this wasn't the case, and that there might now be a plethora of guests running with goofy topologies all around, the already existing warning message + this auto-fill hack + documentation mentioning that users should avoid these topologies is a fine solution from Libvirt's side, in my estimation.
Thanks,
DHB

On 6/2/20 5:53 AM, Igor Mammedov wrote:
On Mon, 1 Jun 2020 17:14:20 -0300 Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
[...]
An ideal situation would be for QEMU to never have accepted incomplete NUMA topologies in the first place.
At least with your series I can safely drop deprecated incomplete NUMA topologies on the QEMU side (which have been producing warnings for a while).
Keep in mind that I'm not enabling this auto-fill feature for all guests across the board: the code relies on the availability of QEMU_CAPS_NUMA. Granted, there are not many six-year-old guests running Libvirt 1.2.6/QEMU 2.2.0 around, and I'm not even sure if we care about supporting this scenario, but I feel obliged to mention it for the record :)

Thanks,

DHB

Ping

On 6/1/20 2:50 PM, Daniel Henrique Barboza wrote:
[...]