[libvirt] [RFC PATCH auto partition NUMA guest domains v1 0/2] auto partition guests providing the host NUMA topology

From: Wim ten Have <wim.ten.have@oracle.com>

This patch series extends guest domain administration, adding support to
automatically advertise the NUMA architecture obtained from the host
capabilities under a guest by creating a vNUMA copy of it.

The mechanism is enabled by setting the check='numa' attribute under the
CPU 'host-passthrough' topology:

  <cpu mode='host-passthrough' check='numa' .../>

When enabled, the mechanism renders the NUMA architecture provided by the
host capabilities, evenly balances the guest's configured vcpus and memory
across the composed vNUMA cells, and pins the vcpus allocated to each cell
to the physical cpusets of the matching host NUMA node.  This way the host
NUMA topology is still in effect under the partitioned guest domain.

The example below auto partitions the physical NUMA detail listed by
'lscpu' on the host into a guest domain vNUMA description.

[root@host ]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                240
On-line CPU(s) list:   0-239
Thread(s) per core:    2
Core(s) per socket:    15
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E7-8895 v2 @ 2.80GHz
Stepping:              7
CPU MHz:               3449.555
CPU max MHz:           3600.0000
CPU min MHz:           1200.0000
BogoMIPS:              5586.28
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              38400K
NUMA node0 CPU(s):     0-14,120-134
NUMA node1 CPU(s):     15-29,135-149
NUMA node2 CPU(s):     30-44,150-164
NUMA node3 CPU(s):     45-59,165-179
NUMA node4 CPU(s):     60-74,180-194
NUMA node5 CPU(s):     75-89,195-209
NUMA node6 CPU(s):     90-104,210-224
NUMA node7 CPU(s):     105-119,225-239
Flags:                 ...

The guest 'anuma', without the auto partition rendering enabled, reads
"<cpu mode='host-passthrough' check='none'/>":

<domain type='kvm'>
  <name>anuma</name>
  <uuid>3f439f5f-1156-4d48-9491-945a2c0abc6d</uuid>
  <memory unit='KiB'>67108864</memory>
  <currentMemory unit='KiB'>67108864</currentMemory>
  <vcpu placement='static'>16</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-2.11'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmport state='off'/>
  </features>
  <cpu mode='host-passthrough' check='none'/>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/anuma.qcow2'/>

With auto partitioning enabled, "<cpu mode='host-passthrough' check='numa'>",
the guest 'anuma' XML is rewritten as listed below:

<domain type='kvm'>
  <name>anuma</name>
  <uuid>3f439f5f-1156-4d48-9491-945a2c0abc6d</uuid>
  <memory unit='KiB'>67108864</memory>
  <currentMemory unit='KiB'>67108864</currentMemory>
  <vcpu placement='static'>16</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0-14,120-134'/>
    <vcpupin vcpu='1' cpuset='15-29,135-149'/>
    <vcpupin vcpu='2' cpuset='30-44,150-164'/>
    <vcpupin vcpu='3' cpuset='45-59,165-179'/>
    <vcpupin vcpu='4' cpuset='60-74,180-194'/>
    <vcpupin vcpu='5' cpuset='75-89,195-209'/>
    <vcpupin vcpu='6' cpuset='90-104,210-224'/>
    <vcpupin vcpu='7' cpuset='105-119,225-239'/>
    <vcpupin vcpu='8' cpuset='0-14,120-134'/>
    <vcpupin vcpu='9' cpuset='15-29,135-149'/>
    <vcpupin vcpu='10' cpuset='30-44,150-164'/>
    <vcpupin vcpu='11' cpuset='45-59,165-179'/>
    <vcpupin vcpu='12' cpuset='60-74,180-194'/>
    <vcpupin vcpu='13' cpuset='75-89,195-209'/>
    <vcpupin vcpu='14' cpuset='90-104,210-224'/>
    <vcpupin vcpu='15' cpuset='105-119,225-239'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-2.11'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmport state='off'/>
  </features>
  <cpu mode='host-passthrough' check='numa'>
    <topology sockets='8' cores='1' threads='2'/>
    <numa>
      <cell id='0' cpus='0,8' memory='8388608' unit='KiB'>
        <distances>
          <sibling id='0' value='10'/>
          <sibling id='1' value='21'/>
          <sibling id='2' value='31'/>
          <sibling id='3' value='21'/>
          <sibling id='4' value='21'/>
          <sibling id='5' value='31'/>
          <sibling id='6' value='31'/>
          <sibling id='7' value='31'/>
        </distances>
      </cell>
      <cell id='1' cpus='1,9' memory='8388608' unit='KiB'>
        <distances>
          <sibling id='0' value='21'/>
          <sibling id='1' value='10'/>
          <sibling id='2' value='21'/>
          <sibling id='3' value='31'/>
          <sibling id='4' value='31'/>
          <sibling id='5' value='21'/>
          <sibling id='6' value='31'/>
          <sibling id='7' value='31'/>
        </distances>
      </cell>
      <cell id='2' cpus='2,10' memory='8388608' unit='KiB'>
        <distances>
          <sibling id='0' value='31'/>
          <sibling id='1' value='21'/>
          <sibling id='2' value='10'/>
          <sibling id='3' value='21'/>
          <sibling id='4' value='31'/>
          <sibling id='5' value='31'/>
          <sibling id='6' value='21'/>
          <sibling id='7' value='31'/>
        </distances>
      </cell>
      <cell id='3' cpus='3,11' memory='8388608' unit='KiB'>
        <distances>
          <sibling id='0' value='21'/>
          <sibling id='1' value='31'/>
          <sibling id='2' value='21'/>
          <sibling id='3' value='10'/>
          <sibling id='4' value='31'/>
          <sibling id='5' value='31'/>
          <sibling id='6' value='31'/>
          <sibling id='7' value='21'/>
        </distances>
      </cell>
      <cell id='4' cpus='4,12' memory='8388608' unit='KiB'>
        <distances>
          <sibling id='0' value='21'/>
          <sibling id='1' value='31'/>
          <sibling id='2' value='31'/>
          <sibling id='3' value='31'/>
          <sibling id='4' value='10'/>
          <sibling id='5' value='21'/>
          <sibling id='6' value='21'/>
          <sibling id='7' value='31'/>
        </distances>
      </cell>
      <cell id='5' cpus='5,13' memory='8388608' unit='KiB'>
        <distances>
          <sibling id='0' value='31'/>
          <sibling id='1' value='21'/>
          <sibling id='2' value='31'/>
          <sibling id='3' value='31'/>
          <sibling id='4' value='21'/>
          <sibling id='5' value='10'/>
          <sibling id='6' value='31'/>
          <sibling id='7' value='21'/>
        </distances>
      </cell>
      <cell id='6' cpus='6,14' memory='8388608' unit='KiB'>
        <distances>
          <sibling id='0' value='31'/>
          <sibling id='1' value='31'/>
          <sibling id='2' value='21'/>
          <sibling id='3' value='31'/>
          <sibling id='4' value='21'/>
          <sibling id='5' value='31'/>
          <sibling id='6' value='10'/>
          <sibling id='7' value='21'/>
        </distances>
      </cell>
      <cell id='7' cpus='7,15' memory='8388608' unit='KiB'>
        <distances>
          <sibling id='0' value='31'/>
          <sibling id='1' value='31'/>
          <sibling id='2' value='31'/>
          <sibling id='3' value='21'/>
          <sibling id='4' value='31'/>
          <sibling id='5' value='21'/>
          <sibling id='6' value='21'/>
          <sibling id='7' value='10'/>
        </distances>
      </cell>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/anuma.qcow2'/>

Finally, the virtual vNUMA detail of the auto partitioned guest 'anuma',
as listed by 'lscpu' inside the guest:
[root@anuma ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E7-8895 v2 @ 2.80GHz
Stepping:              7
CPU MHz:               2793.268
BogoMIPS:              5586.53
Virtualization:        VT-x
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
L3 cache:              16384K
NUMA node0 CPU(s):     0,8
NUMA node1 CPU(s):     1,9
NUMA node2 CPU(s):     2,10
NUMA node3 CPU(s):     3,11
NUMA node4 CPU(s):     4,12
NUMA node5 CPU(s):     5,13
NUMA node6 CPU(s):     6,14
NUMA node7 CPU(s):     7,15
Flags:                 ...

Wim ten Have (2):
  domain: auto partition guests providing the host NUMA topology
  qemuxml2argv: add tests that exercise vNUMA auto partition topology

 docs/formatdomain.html.in                     |   7 +
 docs/schemas/cputypes.rng                     |   1 +
 src/conf/cpu_conf.c                           |   3 +-
 src/conf/cpu_conf.h                           |   1 +
 src/conf/domain_conf.c                        | 166 ++++++++++++++++++
 .../cpu-host-passthrough-nonuma.args          |  25 +++
 .../cpu-host-passthrough-nonuma.xml           |  18 ++
 .../cpu-host-passthrough-numa.args            |  29 +++
 .../cpu-host-passthrough-numa.xml             |  18 ++
 tests/qemuxml2argvtest.c                      |   2 +
 10 files changed, 269 insertions(+), 1 deletion(-)
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.xml

-- 
2.17.1
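As a quick cross-check of the example above: the generated cells follow two
simple rules, memory is split evenly over the host NUMA node count and
vcpu i lands in cell i modulo that count. The standalone C sketch below
(an illustration written for this summary, not code from the series)
reproduces the 16-vcpu / 64 GiB / 8-node layout:

  /*
   * Standalone sketch (not part of the patch series) of the distribution
   * rule described in the cover letter: vcpus are assigned to vNUMA cells
   * round-robin and the guest memory is split evenly per cell.  The numbers
   * reproduce the 16-vcpu / 64 GiB / 8-node example above.
   */
  #include <stdio.h>

  int main(void)
  {
      const unsigned int maxvcpus = 16;                 /* <vcpu>16</vcpu> */
      const unsigned long long total_memory = 67108864; /* KiB, <memory> */
      const unsigned int nnumaCell = 8;                 /* host NUMA nodes */
      const unsigned long long memsizeCell = total_memory / nnumaCell;

      for (unsigned int cell = 0; cell < nnumaCell; cell++) {
          printf("cell %u: memory=%llu KiB, cpus=", cell, memsizeCell);
          /* round-robin: vcpu i lands in cell (i % nnumaCell) */
          for (unsigned int i = cell; i < maxvcpus; i += nnumaCell)
              printf("%u%s", i, (i + nnumaCell < maxvcpus) ? "," : "\n");
      }
      return 0;
  }

Compiled with any C99 compiler it prints one line per cell, matching the
cpus= and memory= values of the <numa> cells in the rewritten guest XML.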

From: Wim ten Have <wim.ten.have@oracle.com>

Add a mechanism to auto partition the host NUMA topology under the guest
domain.

This patch adds a framework that automatically partitions the host into a
small vNUMA subset defined by the guest XML <vcpu> and <memory> description
whenever <cpu mode="host-passthrough" check="numa".../> is in effect and
the host capabilities indicate that a physical NUMA topology is present.

The mechanism renders the NUMA architecture provided by the host
capabilities, evenly balances the guest's configured vcpus and memory
across the composed vNUMA cells, and pins the vcpus allocated to each cell
to the physical cpusets of the matching host NUMA node.  This way the host
NUMA topology is still in effect under the partitioned guest vNUMA domain.

Signed-off-by: Wim ten Have <wim.ten.have@oracle.com>
---
 docs/formatdomain.html.in |   7 ++
 docs/schemas/cputypes.rng |   1 +
 src/conf/cpu_conf.c       |   3 +-
 src/conf/cpu_conf.h       |   1 +
 src/conf/domain_conf.c    | 166 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 1f12ab5b4214..ba073d952545 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1500,6 +1500,13 @@
       <dd>The virtual CPU created by the hypervisor will be checked
         against the CPU specification and the domain will not be started
         unless the two CPUs match.</dd>
+
+      <dt><code>numa</code></dt>
+      <dd>When the CPU mode='host-passthrough' check='numa' option
+        combination is set, libvirt auto partitions the guest domain
+        by rendering the host NUMA architecture. Here the virtual
+        CPUs and memory are evenly balanced across the defined NUMA
+        nodes. The vCPUs are also pinned to their physical CPUs.</dd>
     </dl>
 
     <span class="since">Since 0.9.10</span>, an optional <code>mode</code>
diff --git a/docs/schemas/cputypes.rng b/docs/schemas/cputypes.rng
index 1f1e0e36d59b..d384d161ee7e 100644
--- a/docs/schemas/cputypes.rng
+++ b/docs/schemas/cputypes.rng
@@ -29,6 +29,7 @@
         <value>none</value>
         <value>partial</value>
         <value>full</value>
+        <value>numa</value>
       </choice>
     </attribute>
   </define>
diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c
index 863413e75eaa..0d52f6aa4813 100644
--- a/src/conf/cpu_conf.c
+++ b/src/conf/cpu_conf.c
@@ -52,7 +52,8 @@ VIR_ENUM_IMPL(virCPUCheck, VIR_CPU_CHECK_LAST,
               "default",
               "none",
               "partial",
-              "full")
+              "full",
+              "numa")
 
 VIR_ENUM_IMPL(virCPUFallback, VIR_CPU_FALLBACK_LAST,
               "allow",
diff --git a/src/conf/cpu_conf.h b/src/conf/cpu_conf.h
index 9f2e7ee2649d..f2e2f0bef3ae 100644
--- a/src/conf/cpu_conf.h
+++ b/src/conf/cpu_conf.h
@@ -68,6 +68,7 @@ typedef enum {
     VIR_CPU_CHECK_NONE,
     VIR_CPU_CHECK_PARTIAL,
     VIR_CPU_CHECK_FULL,
+    VIR_CPU_CHECK_NUMA,
 
     VIR_CPU_CHECK_LAST
 } virCPUCheck;
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 9911d56130a9..c2f9398cfe85 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -1759,6 +1759,168 @@ virDomainDefGetVcpusTopology(const virDomainDef *def,
 }
 
 
+/**
+ * virDomainNumaAutoconfig: auto partition guest vNUMA XML definitions
+ * taking the machine NUMA topology creating a small guest copy instance.
+ * @def: domain definition
+ * @caps: host capabilities
+ *
+ * Auto partitioning vNUMA guests is requested under XML configuration
+ * <cpu mode="host-passthrough" check="numa">.  Here libvirt takes the
+ * host NUMA topology, including maxvcpus, online vcpus, memory and
+ * pinning node cpu's where it renders the guest domain vNUMA topology
+ * building an architectural copy of the host.
+ *
+ * Returns 0 on success and -1 on error.
+ */
+static int
+virDomainNumaAutoconfig(virDomainDefPtr def,
+                        virCapsPtr caps)
+{
+    int ret = -1;
+
+    if (caps && def->cpu &&
+        def->cpu->mode == VIR_CPU_MODE_HOST_PASSTHROUGH &&
+        def->cpu->check == VIR_CPU_CHECK_NUMA) {
+
+        size_t i, cell;
+        size_t nvcpus = 0;
+        size_t nnumaCell = 0;
+        unsigned long long memsizeCell = 0;
+        virBitmapPtr vnumask = NULL;
+        virCapsHostPtr host = &caps->host;
+
+        nnumaCell = host->nnumaCell;
+        if (!nnumaCell)
+            goto cleanup;
+
+        /* Compute the online vcpus */
+        for (i = 0; i < def->maxvcpus; i++)
+            if (def->vcpus[i]->online)
+                nvcpus++;
+
+        if (nvcpus < nnumaCell) {
+            VIR_WARN("vNUMA disabled: %ld vcpus is insufficient "
+                     "to arrange a vNUMA topology for %ld nodes.",
+                     nvcpus, nnumaCell);
+            goto cleanup;
+        }
+
+        /* Compute the memory size (memsizeCell) per arranged nnumaCell
+         */
+        if ((memsizeCell = def->mem.total_memory / nnumaCell) == 0)
+            goto cleanup;
+
+        /* Correct vNUMA can only be accomplished if the number of maxvcpus
+         * is a multiple of the number of physical nodes.  If this is not
+         * possible we set sockets, cores and threads to 0 so libvirt
+         * creates a default topology where all vcpus appear as sockets and
+         * cores and threads are set to 1.
+         */
+        if (def->maxvcpus % nnumaCell) {
+            VIR_WARN("vNUMA: configured %ld vcpus do not meet the host "
+                     "%ld NUMA nodes for an evenly balanced cpu topology.",
+                     def->maxvcpus, nnumaCell);
+            def->cpu->sockets = def->cpu->cores = def->cpu->threads = 0;
+        } else {
+            /* Below artificial cpu topology computed aims for best host
+             * matching cores/threads alignment fitting the configured vcpus.
+             */
+            unsigned int sockets = host->numaCell[nnumaCell-1]->cpus->socket_id + 1;
+            unsigned int threads = host->cpu->threads;
+
+            if (def->maxvcpus % (sockets * threads))
+                threads = 1;
+
+            def->cpu->cores = def->maxvcpus / (sockets * threads);
+            def->cpu->threads = threads;
+            def->cpu->sockets = sockets;
+        }
+
+        /* Build the vNUMA topology.  Our former universe might have
+         * changed entirely where it did grow beyond former dimensions
+         * so fully free current allocations and start from scratch
+         * building new vNUMA topology.
+         */
+        virDomainNumaFree(def->numa);
+        if (!(def->numa = virDomainNumaNew()))
+            goto error;
+
+        if (!virDomainNumaSetNodeCount(def->numa, nnumaCell))
+            goto error;
+
+        for (cell = 0; cell < nnumaCell; cell++) {
+            char *vcpus = NULL;
+            size_t ndistances;
+            virBitmapPtr cpumask = NULL;
+            virCapsHostNUMACell *numaCell = host->numaCell[cell];
+
+            /* per NUMA cell memory size */
+            virDomainNumaSetNodeMemorySize(def->numa, cell, memsizeCell);
+
+            /* per NUMA cell vcpu range to mask */
+            for (i = cell; i < def->maxvcpus; i += nnumaCell) {
+                char *tmp = NULL;
+
+                if ((virAsprintf(&tmp, "%ld%s", i,
+                        ((def->maxvcpus - i) > nnumaCell) ? "," : "") < 0) ||
+                    (virAsprintf(&vcpus, "%s%s",
+                        (vcpus ? vcpus : ""), tmp) < 0)) {
+                    VIR_FREE(tmp);
+                    VIR_FREE(vcpus);
+                    goto error;
+                }
+                VIR_FREE(tmp);
+            }
+
+            if ((virBitmapParse(vcpus, &cpumask, VIR_DOMAIN_CPUMASK_LEN) < 0) ||
+                (virDomainNumaSetNodeCpumask(def->numa, cell, cpumask) == NULL)) {
+                VIR_FREE(vcpus);
+                goto error;
+            }
+            VIR_FREE(vcpus);
+
+            /* per NUMA cpus sibling vNUMA pinning */
+            if (!(vnumask = virBitmapNew(nnumaCell * numaCell->ncpus)))
+                goto error;
+
+            for (i = 0; i < numaCell->ncpus; i++) {
+                unsigned int id = numaCell->cpus[i].id;
+
+                if (virBitmapSetBit(vnumask, id) < 0) {
+                    virBitmapFree(vnumask);
+                    goto error;
+                }
+            }
+
+            for (i = 0; i < def->maxvcpus; i++) {
+                if (virBitmapIsBitSet(cpumask, i))
+                    def->vcpus[i]->cpumask = virBitmapNewCopy(vnumask);
+            }
+            virBitmapFree(vnumask);
+
+            /* per NUMA cell sibling distances */
+            ndistances = numaCell->nsiblings;
+            if (ndistances &&
+                virDomainNumaSetNodeDistanceCount(def->numa, cell, ndistances) != nnumaCell)
+                goto error;
+
+            for (i = 0; i < ndistances; i++) {
+                unsigned int distance = numaCell->siblings[i].distance;
+
+                if (virDomainNumaSetNodeDistance(def->numa, cell, i, distance) != distance)
+                    goto error;
+            }
+        }
+    }
+ cleanup:
+    ret = 0;
+
+ error:
+    return ret;
+}
+
+
 virDomainDiskDefPtr
 virDomainDiskDefNew(virDomainXMLOptionPtr xmlopt)
 {
@@ -19749,6 +19911,10 @@ virDomainDefParseXML(xmlDocPtr xml,
     if (virDomainNumaDefCPUParseXML(def->numa, ctxt) < 0)
         goto error;
 
+    /* Check and apply auto partition vNUMA topology to the guest if requested */
+    if (virDomainNumaAutoconfig(def, caps))
+        goto error;
+
     if (virDomainNumaGetCPUCountTotal(def->numa) > virDomainDefGetVcpusMax(def)) {
         virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                        _("Number of CPUs in <numa> exceeds the"
-- 
2.17.1
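A note on the <topology> values computed in the branch above: the heuristic
takes the host socket count and threads per core, drops to one thread when
the vcpu count does not divide evenly, and derives cores from what remains.
The standalone sketch below is an illustration only, not libvirt code; the
8-socket / 2-threads-per-core host is the one from the cover letter:

  /*
   * Standalone illustration (not libvirt code) of the topology heuristic in
   * virDomainNumaAutoconfig(): take the host socket count and threads per
   * core, fall back to threads=1 when the vcpus do not divide evenly, and
   * derive cores from whatever remains.
   */
  #include <stdio.h>

  static void topology(unsigned int maxvcpus, unsigned int host_sockets,
                       unsigned int host_threads)
  {
      unsigned int sockets = host_sockets;
      unsigned int threads = host_threads;

      if (maxvcpus % (sockets * threads))
          threads = 1;                      /* same fallback as the patch */

      unsigned int cores = maxvcpus / (sockets * threads);
      printf("%2u vcpus on %u-socket/%u-thread host -> sockets=%u cores=%u threads=%u\n",
             maxvcpus, host_sockets, host_threads, sockets, cores, threads);
  }

  int main(void)
  {
      topology(16, 8, 2);   /* cover letter example: sockets=8 cores=1 threads=2 */
      topology(24, 8, 2);   /* 24 %% 16 != 0 -> threads drops to 1, cores=3 */
      return 0;
  }

The first call matches the <topology sockets='8' cores='1' threads='2'/>
element generated for the 16-vcpu guest; the second shows the threads=1
fallback.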

From: Wim ten Have <wim.ten.have@oracle.com>

Add tests to ensure that the virDomainNumaAutoconfig() vNUMA auto partition
routine generates the correct KVM/QEMU command line arguments under an
applicable <cpu mode="host-passthrough" check="numa"> setup.

Signed-off-by: Wim ten Have <wim.ten.have@oracle.com>
---
 .../cpu-host-passthrough-nonuma.args          | 25 ++++++++++++++++
 .../cpu-host-passthrough-nonuma.xml           | 18 ++++++++++++
 .../cpu-host-passthrough-numa.args            | 29 +++++++++++++++++++
 .../cpu-host-passthrough-numa.xml             | 18 ++++++++++++
 tests/qemuxml2argvtest.c                      |  2 ++
 5 files changed, 92 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.xml

diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.args b/tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.args
new file mode 100644
index 000000000000..4599cdfcc159
--- /dev/null
+++ b/tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.args
@@ -0,0 +1,25 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/home/test \
+USER=test \
+LOGNAME=test \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-x86_64 \
+-name QEMUGuest1 \
+-S \
+-machine pc,accel=kvm,usb=off,dump-guest-core=off \
+-cpu host \
+-m 214 \
+-smp 1,sockets=1,cores=1,threads=1 \
+-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
+-display none \
+-no-user-config \
+-nodefaults \
+-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\
+server,nowait \
+-mon chardev=charmonitor,id=monitor,mode=control \
+-rtc base=utc \
+-no-shutdown \
+-no-acpi \
+-usb \
+-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.xml b/tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.xml
new file mode 100644
index 000000000000..d8daa8c9a43a
--- /dev/null
+++ b/tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.xml
@@ -0,0 +1,18 @@
+<domain type='kvm'>
+  <name>QEMUGuest1</name>
+  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+  <memory unit='KiB'>219100</memory>
+  <currentMemory unit='KiB'>219100</currentMemory>
+  <vcpu placement='static'>1</vcpu>
+  <os>
+    <type arch='x86_64' machine='pc'>hvm</type>
+  </os>
+  <cpu mode='host-passthrough' check='numa'/>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-x86_64</emulator>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa.args b/tests/qemuxml2argvdata/cpu-host-passthrough-numa.args
new file mode 100644
index 000000000000..5b7b357cc0fe
--- /dev/null
+++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa.args
@@ -0,0 +1,29 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/home/test \
+USER=test \
+LOGNAME=test \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-x86_64 \
+-name QEMUGuest1 \
+-S \
+-machine pc,accel=kvm,usb=off,dump-guest-core=off \
+-cpu host \
+-m 216 \
+-smp 6,sockets=6,cores=1,threads=1 \
+-numa node,nodeid=0,cpus=0,cpus=4,mem=54 \
+-numa node,nodeid=1,cpus=1,cpus=5,mem=54 \
+-numa node,nodeid=2,cpus=2,mem=54 \
+-numa node,nodeid=3,cpus=3,mem=54 \
+-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
+-display none \
+-no-user-config \
+-nodefaults \
+-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\
+server,nowait \
+-mon chardev=charmonitor,id=monitor,mode=control \
+-rtc base=utc \
+-no-shutdown \
+-no-acpi \
+-usb \
+-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa.xml b/tests/qemuxml2argvdata/cpu-host-passthrough-numa.xml
new file mode 100644
index 000000000000..39488521b27d
--- /dev/null
+++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa.xml
@@ -0,0 +1,18 @@
+<domain type='kvm'>
+  <name>QEMUGuest1</name>
+  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+  <memory unit='KiB'>219100</memory>
+  <currentMemory unit='KiB'>219100</currentMemory>
+  <vcpu placement='static'>6</vcpu>
+  <os>
+    <type arch='x86_64' machine='pc'>hvm</type>
+  </os>
+  <cpu mode='host-passthrough' check='numa'/>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-x86_64</emulator>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index a7cde3ed7e74..4b6436b32ac7 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -1745,6 +1745,8 @@ mymain(void)
                     FLAG_SKIP_LEGACY_CPUS | FLAG_EXPECT_FAILURE,
                     0, GIC_NONE, NONE);
     DO_TEST("cpu-host-passthrough", QEMU_CAPS_KVM);
+    DO_TEST("cpu-host-passthrough-numa", QEMU_CAPS_NUMA);
+    DO_TEST("cpu-host-passthrough-nonuma", QEMU_CAPS_NUMA);
     DO_TEST_FAILURE("cpu-qemu-host-passthrough", QEMU_CAPS_KVM);
 
     qemuTestSetHostArch(driver.caps, VIR_ARCH_S390X);
-- 
2.17.1
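To see where the expected "-numa node,nodeid=0,cpus=0,cpus=4,mem=54" lines
and "-m 216" in cpu-host-passthrough-numa.args come from: the mocked host
capabilities used by qemuxml2argvtest evidently expose four NUMA nodes
(that is what the expected output implies), the guest defines 6 vcpus and
219100 KiB of memory, so the vcpus are laid out round-robin and the memory
is split four ways. The standalone sketch below is not part of the test
suite; the per-node figure assumes each node is rounded up to a 1 MiB
boundary, which is what the expected output suggests:

  /*
   * Standalone sketch: reproduce the expected -numa arguments of the
   * cpu-host-passthrough-numa test (6 vcpus, 219100 KiB, assumed 4 mocked
   * host NUMA nodes, per-node memory rounded up to a 1 MiB boundary).
   */
  #include <stdio.h>

  int main(void)
  {
      const unsigned int maxvcpus = 6;
      const unsigned int nnodes = 4;
      const unsigned long long total_kib = 219100;
      const unsigned long long per_node_mib = (total_kib / nnodes + 1023) / 1024;

      for (unsigned int node = 0; node < nnodes; node++) {
          printf("-numa node,nodeid=%u", node);
          /* round-robin vcpu placement: vcpu i -> node (i % nnodes) */
          for (unsigned int i = node; i < maxvcpus; i += nnodes)
              printf(",cpus=%u", i);
          printf(",mem=%llu\n", per_node_mib);
      }
      return 0;
  }

Running it prints the four -numa arguments with the same cpus= lists and
mem=54, and 4 x 54 lines up with the "-m 216" value.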

On Tue, Sep 25, 2018 at 12:02:40 +0200, Wim Ten Have wrote:
> From: Wim ten Have <wim.ten.have@oracle.com>
>
> This patch series extends guest domain administration, adding support to
> automatically advertise the NUMA architecture obtained from the host
> capabilities under a guest by creating a vNUMA copy of it.
I'm pretty sure someone would find this useful and such configuration is perfectly valid in libvirt. But I don't think there is a compelling reason to add some magic into the domain XML which would automatically expand to such configuration. It's basically a NUMA placement policy and libvirt generally tries to avoid including any kind of policies and rather just provide all the mechanisms and knobs which can be used by applications to implement any policy they like.
> The mechanism is enabled by setting the check='numa' attribute under the
> CPU 'host-passthrough' topology:
>
>   <cpu mode='host-passthrough' check='numa' .../>
Anyway, this is definitely not the right place for such an option. The 'check' attribute is described as

  "Since 3.2.0, an optional check attribute can be used to request a
  specific way of checking whether the virtual CPU matches the
  specification."

and the new 'numa' value does not fit in there in any way.

Moreover, the code does the automatic NUMA placement at the moment libvirt parses the domain XML, which is not the right place since it would break migration, snapshots, and save/restore features.

We have existing placement attributes for the vcpu and numatune/memory elements which would have been a much better place for implementing such a feature. And even the cpu/numa element could have been enhanced to support similar configuration.

Jirka
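(For reference: the placement attributes referred to above already let an
application request automatic placement, where libvirt consults numad for
an advisory nodeset. The snippet below is a hand-written illustration of
those existing knobs, not output produced by this patch series.)

  <domain type='kvm'>
    ...
    <vcpu placement='auto'>16</vcpu>
    <numatune>
      <memory mode='strict' placement='auto'/>
    </numatune>
    ...
  </domain>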

On Tue, 25 Sep 2018 14:37:15 +0200 Jiri Denemark <jdenemar@redhat.com> wrote:
> On Tue, Sep 25, 2018 at 12:02:40 +0200, Wim Ten Have wrote:
> > From: Wim ten Have <wim.ten.have@oracle.com>
> >
> > This patch series extends guest domain administration, adding support
> > to automatically advertise the NUMA architecture obtained from the
> > host capabilities under a guest by creating a vNUMA copy of it.
>
> I'm pretty sure someone would find this useful and such configuration is
> perfectly valid in libvirt. But I don't think there is a compelling
> reason to add some magic into the domain XML which would automatically
> expand to such configuration. It's basically a NUMA placement policy and
> libvirt generally tries to avoid including any kind of policies and
> rather just provide all the mechanisms and knobs which can be used by
> applications to implement any policy they like.
>
> > The mechanism is enabled by setting the check='numa' attribute under
> > the CPU 'host-passthrough' topology:
> >
> >   <cpu mode='host-passthrough' check='numa' .../>
>
> Anyway, this is definitely not the right place for such an option. The
> 'check' attribute is described as
>
>   "Since 3.2.0, an optional check attribute can be used to request a
>   specific way of checking whether the virtual CPU matches the
>   specification."
>
> and the new 'numa' value does not fit in there in any way.
>
> Moreover, the code does the automatic NUMA placement at the moment
> libvirt parses the domain XML, which is not the right place since it
> would break migration, snapshots, and save/restore features.
Howdy, thanks for your fast response. I was Out Of Office for a while,
unable to reply earlier.

The beef of this code does indeed not belong under the domain code and
should rather move into the NUMA specific code, where check='numa' is
simply a badly chosen knob. Also, whilst OOO it occurred to me that
besides auto partitioning the host into a vNUMA replica there is probably
another configuration target we may introduce: reserving a single NUMA
node out of the nodes reserved for a guest to configure. So my plan is to
come back asap with reworked code.

> We have existing placement attributes for the vcpu and numatune/memory
> elements which would have been a much better place for implementing
> such a feature. And even the cpu/numa element could have been enhanced
> to support similar configuration.

Going over the libvirt documentation, the vcpu area appeals to me most.
As said, let me rework this and return with a better approach/RFC asap.

Rgds,
- Wim10H.

> Jirka