On Wed, May 28, 2014 at 11:48:31AM +0200, Martin Kletzander wrote:
> Caveats:
>  - I'm not sure how cpu hotplug is done with guest numa nodes, but if
>    there is a possibility to increase the number of numa nodes (which
>    does not make sense to me from (a) a user's point of view and (b) our
>    XMLs and APIs), we need to be able to hotplug the RAM as well,

AFAIK, you cannot change the NUMA topology once booted. You'd have to
add any new CPUs to existing NUMA nodes (assuming they have space).
Likewise I'd expect that any RAM would have to be added to existing
defined nodes. Of course ultimately nothing should stop the user
defining some "empty" NUMA nodes if they want space to add new CPUs
beyond the initial setup.
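
Just to make that concrete, a rough sketch of what such headroom could
look like in the domain XML (cell sizes and CPU counts invented for
illustration):

  <vcpu placement='static' current='4'>8</vcpu>
  <cpu>
    <numa>
      <cell cpus='0-3' memory='2097152'/> <!-- node 0: online from boot -->
      <cell cpus='4-7' memory='2097152'/> <!-- node 1: CPUs held back for hotplug -->
    </numa>
  </cpu>

Hotplugged vCPUs then land in a cell that already existed at boot time,
so the guest NUMA topology itself never has to change.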

>  - virDomainGetNumaParameters() now reflects only the
>    /domain/numatune/memory settings, not 'memnode' ones,

Yep, that's fine, though we'll likely want to have new APIs to deal
with the guest NUMA node settings.
>  - virDomainSetNumaParameters() is not allowed when there is any
>    /domain/numatune/memnode element, since we can query memdev info
>    but not change it (if I understood the QEMU side correctly),
>  - when a domain is started, the cpuset.mems cgroup is not modified for
>    each vcpu; this will be fixed, but the question is how to handle it
>    for non-strict settings [*],
>  - automatic numad placement can now be used together with memnode
>    settings, which IMHO doesn't make any sense, but I was hesitant to
>    disable that in case somebody has constructive criticism in this
>    area.

IMHO if you're doing fine grained configuration of guest <-> host
NUMA nodes, then you're not going to want numad. numad is really
serving the use cases where you're lazy and want to ignore NUMA
settings in the guest config. IOW, I think it is fine to forbid
numad.
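
i.e. a combination along these lines (node numbers invented for
illustration) could simply be rejected when the domain is defined or
started:

  <numatune>
    <memory mode='strict' placement='auto'/>
    <memnode nodeid='0' mode='strict' nodeset='1'/>
  </numatune>

since numad's automatic placement and explicit per-node pinning are
pulling in opposite directions.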

>  - This series alone is broken when used with
>    /domain/memoryBacking/hugepages, because it will still use the
>    memory-ram object, but that will be fixed with Michal's patches on
>    top of this series.
>
> One idea for how to solve some of the problems is to say that
> /domain/numatune/memory is set for the whole domain regardless of what
> anyone puts in /domain/numatune/memnode. virDomainGetNumaParameters()
> could be extended to report the info for all guest numa nodes, although
> it seems a new API would suit this kind of information better. But is
> it really needed when we are not able to modify it live and the
> information is available in the domain XML?
>
> *) does (or should) this:
>   ...
>   <numatune>
>     <memory mode='strict' placement='static' nodeset='0-7'/>
>     <memnode nodeid='0' mode='preferred' nodeset='7'/>
>   </numatune>
>   ...
> mean what it looks like it means, that is "in guest node 0, prefer
> allocating from host node 7 but feel free to allocate from 0-6 as well
> in case you can't use 7, but never try allocating from host nodes
> 8-15"?

What we have to remember is that there are two different sets of threads
and memory we're dealing with: the vCPU threads and the guest RAM
allocation are one set, and the misc emulator threads and other QEMU
memory allocations are the other. The <memnode> elements only apply to
guest vCPUs and guest RAM. The <memory> element will still apply policy
to the other QEMU emulator threads / RAM allocations.
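
To make the split concrete (host node numbers invented for
illustration):

  <numatune>
    <memory mode='strict' nodeset='0-1'/>           <!-- emulator threads / other QEMU allocations -->
    <memnode nodeid='0' mode='strict' nodeset='0'/> <!-- RAM backing guest node 0 -->
    <memnode nodeid='1' mode='strict' nodeset='1'/> <!-- RAM backing guest node 1 -->
  </numatune>

Each guest node's RAM gets its own host binding, while anything not
covered by a <memnode> falls back to the <memory> policy.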

WRT your question about virDomainSetNumaParameters above - I think the
answer to whether that makes sense to allow is dependent on whether it
would be able to affect the QEMU emulator threads/RAM without affecting
the vCPU threads/guest RAM. We'd likely want a new virDomainSetNumaNodeParameters
API to control settings for the <memnode> elements directly.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|