Currently we are only able to bind the whole domain to some host nodes
using the /domain/numatune/memory element. Numerous requests were
made to support host<->guest numa node bindings, so this series tries
to pitch an idea on how to do that using /domain/numatune/memnode
elements.
So here are a few ideas I'd like to hear others' opinions on:

For some reason, qemu wants to know which host nodes it can use for
allocation of the memory. While adding support for that, qemu added
various memory objects (-object memory*) with different backends.
There's 'memory-file' which is used for hugepages and 'memory-ram'
which is used for standard allocation. The latest version of the qemu
proposal is here:
http://lists.gnu.org/archive/html/qemu-devel/2014-05/msg02706.html
Caveats:
- I'm not sure how cpu hotplug is done with guest numa nodes, but if
  there is a possibility to increase the number of numa nodes (which
  does not make sense to me from (a) the user's point of view and (b)
  our XMLs and APIs), we need to be able to hotplug the ram as well,
- virDomainGetNumaParameters() now reflects only the
  /domain/numatune/memory settings, not 'memnode' ones,
- virDomainSetNumaParameters() is not allowed when there is any
  /domain/numatune/memnode element, as we can query memdev info but
  not change it (if I understood the QEMU side correctly),
- when the domain is started, the cpuset.mems cgroup is not modified
  for each vcpu; this will be fixed, but the question is how to handle
  it for non-strict settings [*],
- automatic numad placement can now be used together with memnode
  settings, which IMHO doesn't make any sense, but I was hesitant to
  disable that in case somebody has constructive criticism in this
  area,
- this series alone is broken when used with
  /domain/memoryBacking/hugepages, because it will still use the
  memory-ram object, but that will be fixed with Michal's patches on
  top of this series.

One idea for solving some of these problems is to say that
/domain/numatune/memory is set for the whole domain regardless of what
anyone puts in /domain/numatune/memnode. virDomainGetNumaParameters()
could be extended to report the info for all guest numa nodes, although
it seems a new API would suit this kind of information better. But
is it really needed when we are not able to modify it live and the
information is available in the domain XML?
*) does (or should) this:
  ...
  <numatune>
    <memory mode='strict' placement='static' nodeset='0-7'/>
    <memnode nodeid='0' mode='preferred' nodeset='7'/>
  </numatune>
  ...
mean what it looks like it means, that is "in guest node 0, prefer
allocating from host node 7 but feel free to allocate from 0-6 as well
in case you can't use 7, but never try allocating from host nodes
8-15"?
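
To make the question concrete, here is a minimal sketch of the reading
described above. It is purely illustrative and is not libvirt or qemu
code: the helper names parse_nodeset() and interpret() are made up,
the host is assumed to have nodes 0-15, and only plain ranges and
commas are handled in the nodeset string (no '^' exclusions).

```python
def parse_nodeset(spec):
    """Expand a nodeset string such as '0-3,7' into a set of node numbers.

    Simplified on purpose: only comma-separated numbers and ranges.
    """
    nodes = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return nodes


def interpret(domain_nodeset, memnode_nodeset, host_nodes):
    """One possible reading (an assumption, not settled behaviour):
    the domain-wide 'strict' nodeset is a hard limit, and a memnode
    'preferred' nodeset only orders the allocation inside that limit.
    """
    allowed = parse_nodeset(domain_nodeset)
    preferred = parse_nodeset(memnode_nodeset) & allowed
    fallback = allowed - preferred          # used only if preferred fails
    forbidden = set(host_nodes) - allowed   # never used
    return preferred, fallback, forbidden


# The XML above: memory nodeset='0-7', memnode nodeset='7', host nodes 0-15
preferred, fallback, forbidden = interpret("0-7", "7", range(16))
```

Under this reading, guest node 0 would try host node 7 first, fall back
to host nodes 0-6, and never touch host nodes 8-15. Whether that is the
behaviour we want is exactly the open question.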

Martin Kletzander (5):
  conf, schema: add 'id' field for cells
  conf, schema: add support for numatune memnode element
  qemu: purely a code movement
  qemu: numa capability probing
  qemu: pass numa node binding preferences to qemu

docs/formatdomain.html.in | 29 +++-
docs/schemas/domaincommon.rng | 22 +++
src/conf/cpu_conf.c | 39 ++++-
src/conf/domain_conf.c | 181 +++++++++++++++++----
src/qemu/qemu_capabilities.c | 2 +
src/qemu/qemu_capabilities.h | 1 +
src/qemu/qemu_cgroup.c | 2 +
src/qemu/qemu_command.c | 160 ++++++++++++++++--
src/qemu/qemu_command.h | 3 +-
src/qemu/qemu_domain.c | 23 ++-
src/qemu/qemu_driver.c | 14 +-
src/qemu/qemu_process.c | 3 +-
src/util/virnuma.h | 14 +-
tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml | 6 +-
tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml | 6 +-
tests/qemuxml2argvdata/qemuxml2argv-cpu-numa3.xml | 25 +++
.../qemuxml2argv-numatune-auto-prefer.args | 6 +
.../qemuxml2argv-numatune-auto-prefer.xml | 29 ++++
.../qemuxml2argv-numatune-auto.args | 6 +
.../qemuxml2argv-numatune-auto.xml | 26 +++
.../qemuxml2argv-numatune-memnode-nocpu.xml | 25 +++
.../qemuxml2argv-numatune-memnodes-problematic.xml | 31 ++++
.../qemuxml2argv-numatune-memnodes.args | 8 +
.../qemuxml2argv-numatune-memnodes.xml | 31 ++++
.../qemuxml2argv-numatune-prefer.args | 6 +
.../qemuxml2argv-numatune-prefer.xml | 29 ++++
tests/qemuxml2argvtest.c | 51 ++++--
.../qemuxml2xmlout-cpu-numa1.xml | 28 ++++
.../qemuxml2xmlout-cpu-numa2.xml | 28 ++++
tests/qemuxml2xmltest.c | 4 +
tests/qemuxmlnstest.c | 2 +-
31 files changed, 747 insertions(+), 93 deletions(-)
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa3.xml
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-auto-prefer.args
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-auto-prefer.xml
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-auto.args
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-auto.xml
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnode-nocpu.xml
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes-problematic.xml
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.args
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.xml
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-prefer.args
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-prefer.xml
create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa1.xml
create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa2.xml
--
1.9.3