On Wed, May 28, 2014 at 11:18:30AM +0100, Daniel P. Berrange wrote:
On Wed, May 28, 2014 at 11:48:31AM +0200, Martin Kletzander wrote:
> Caveats:
>
> - I'm not sure how cpu hotplug is done with guest numa nodes, but if
> there is a possibility to increase the number of numa nodes (which
> does not make sense to me from (a) user point of view and (b) our
> XMLs and APIs), we need to be able to hotplug the ram as well,
AFAIK, you cannot change the NUMA topology once booted. You'd have to
add any new CPUs to existing NUMA nodes (assuming they have space).
Likewise I'd expect that any RAM would have to be added to existing
defined nodes. Of course ultimately nothing should stop the user
defining some "empty" NUMA nodes if they want space to add new CPUs
beyond the initial setup.
> - virDomainGetNumaParameters() now reflects only the
> /domain/numatune/memory settings, not 'memnode' ones,
Yep, that's fine, though we'll likely want to have new APIs to deal
with the guest NUMA node settings.
I was thinking how to extend the current one, but all the ideas seem
too "messy".
> - virDomainSetNumaParameters() is not allowed when there is some
> /domain/numatune/memnode parameter as we can query memdev info, but
> not change it (if I understood the QEMU side correctly),
>
> - when domain is started, cpuset.mems cgroup is not modified for
> each vcpu, this will be fixed, but the question is how to handle it
> for non-strict settings [*],
>
> - automatic numad placement can be now used together with memnode
> settings which IMHO doesn't make any sense, but I was hesitant to
> disable that in case somebody has a constructive criticism in this
> area.
IMHO if you're doing fine grained configuration of guest <-> host
NUMA nodes, then you're not going to want numad. numad is really
serving the use cases where you're lazy and want to ignore NUMA
settings in the guest config. IOW, I think it is fine to forbid
numad.
Good we're on the same page here, I just wanted to make sure before
jumping to conclusions. Most of the use cases where one depends on
numad are not very thought-through I guess.
> - This series alone is broken when used with
> /domain/memoryBacking/hugepages, because it will still use the
> memory-ram object, but that will be fixed with Michal's patches on
> top of this series.
>
> One idea how to solve some of the problems is to say that
> /domain/numatune/memory is set for the whole domain regardless of what
> anyone puts in /domain/numatune/memnode. virDomainGetNumaParameters()
> could be extended to tell the info for all guest numa nodes, although
> it seems new API would suit better for this kind of information. But
> is it really needed when we are not able to modify it live and the
> information is available in the domain XML?
>
>
> *) does (or should) this:
>
> ...
> <numatune>
> <memory mode='strict' placement='static' nodeset='0-7'/>
> <memnode nodeid='0' mode='preferred' nodeset='7'/>
> </numatune>
> ...
>
> mean what it looks like it means, that is "in guest node 0, prefer
> allocating from host node 7 but feel free to allocate from 0-6 as well
> in case you can't use 7, but never try allocating from host nodes
> 8-15"?
What we have to remember is that there are two different sets of threads
and memory we're dealing with. There are the vCPU threads and the guest
RAM allocation as one set, and there are misc emulator threads and
other QEMU memory allocations. The <memnode> elements only apply to
guest vCPUs and guest RAM. The <memory> element will still apply policy
to the other QEMU emulator threads / RAM allocations.
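To illustrate the split described above, here is a minimal sketch in Python
that reads the <numatune> snippet from the question and picks the policy for
either the emulator side or a given guest node. The helper names
(parse_numatune, policy_for) are hypothetical, and the fallback of a guest
node without its own <memnode> to the <memory> policy is an assumption, not
something settled in this thread.

```python
import xml.etree.ElementTree as ET

# The snippet from the question above, with the elided context dropped.
NUMATUNE_XML = """
<numatune>
  <memory mode='strict' placement='static' nodeset='0-7'/>
  <memnode nodeid='0' mode='preferred' nodeset='7'/>
</numatune>
"""


def parse_numatune(xml_text):
    """Return (domain_policy, {guest_node_id: policy}) for a <numatune> element."""
    root = ET.fromstring(xml_text)
    memory = root.find("memory")
    domain_policy = dict(memory.attrib) if memory is not None else {}
    per_node = {}
    for memnode in root.findall("memnode"):
        attrs = dict(memnode.attrib)
        per_node[int(attrs.pop("nodeid"))] = attrs
    return domain_policy, per_node


def policy_for(xml_text, guest_node=None):
    """Pick the policy for emulator allocations (guest_node=None) or a guest node.

    Assumption: a guest node with no <memnode> of its own falls back to the
    domain-wide <memory> policy.
    """
    domain_policy, per_node = parse_numatune(xml_text)
    if guest_node is None:
        return domain_policy
    return per_node.get(guest_node, domain_policy)
```

Here policy_for(NUMATUNE_XML) gives the <memory> settings that would govern
the emulator threads, while policy_for(NUMATUNE_XML, 0) gives the <memnode>
settings for guest node 0.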
Yes, my sense is that emulator should be set according to memory (as
it's done now IIUC). It would mean we need to restrict the
cpuset.cpus for that as well (in case there is no <emulatorpin>). We
also need to properly error out on configuration that will fail
(strict memory mode with nodes and cpus from different host numa
nodes). This should be done for both emulator threads and cpu threads
(it is not done now, btw).
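A rough sketch of what such a check could look like, in Python for brevity.
The function names and the host_topology input are hypothetical, and the rule
used here (under strict mode, every pinned CPU must belong to a host node in
the memory nodeset) is only one possible reading of "configuration that will
fail"; the actual condition would need to be agreed on.

```python
def parse_set(spec):
    """Expand a cpuset-style list such as '0-3,8' into a set of ints."""
    result = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            result.update(range(int(lo), int(hi) + 1))
        else:
            result.add(int(part))
    return result


def check_strict_placement(mode, nodeset, cpuset, host_topology):
    """Raise ValueError when strict memory nodes and pinned CPUs do not line up.

    host_topology maps a host NUMA node id to the cpuset string of that node
    (hypothetical input; real code would query the host capabilities).
    """
    if mode != "strict":
        return  # assumption: only strict mode is rejected up front
    mem_nodes = parse_set(nodeset)
    cpus = parse_set(cpuset)
    # Host nodes that contain at least one of the pinned CPUs.
    cpu_nodes = {
        node for node, node_cpus in host_topology.items()
        if cpus & parse_set(node_cpus)
    }
    stray = sorted(cpu_nodes - mem_nodes)
    if stray:
        raise ValueError(
            "strict nodeset %s does not cover host nodes %s used by cpus %s"
            % (nodeset, stray, cpuset)
        )
```

The same check could be run once for the emulator threads (against <memory>)
and once per vCPU (against the matching <memnode>, if any).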
WRT your question about virDomainSetNumaParameters above - I think the
answer to whether that makes sense to allow is dependent on whether it
would be able to affect the QEMU emulator threads/RAM, without affecting
the vCPU threads/guest RAM. We'd likely want a new virDomainSetNumaNodeParameters
API to control settings for the <memnode> elements directly.
Yes. It also depends on whether we'll be able to modify the guest node
settings in QEMU (if not then we have to somehow unify when we use
host-nodes and when not).
Another question that came to my mind right now is whether we want to
expose cpuset.memory_migrate as a parameter, too (or set it to some
default value) since it will affect the performance a lot when
cpuset.mems is changed, won't it?
Thank you for the responses,
Martin