On Tue, Oct 18, 2016 at 10:43:31PM +0200, Martin Kletzander wrote:
On Mon, Oct 17, 2016 at 03:45:09PM +1100, Sam Bobroff wrote:
>On Fri, Oct 14, 2016 at 10:19:42AM +0200, Martin Kletzander wrote:
>>On Fri, Oct 14, 2016 at 11:52:22AM +1100, Sam Bobroff wrote:
>>>I did look at the libnuma and cgroups approaches, but I was concerned they
>>>wouldn't work in this case, because of the way QEMU allocates memory when
>>>mem-prealloc is used: the memory is allocated in the main process, before the
>>>CPU threads are created. (This is based only on a bit of hacking and debugging
>>>in QEMU, but it does seem to explain the behaviour I've seen so far.)
>>>
>>
>>But we use numactl before QEMU is exec()'d.
>
>Sorry, I jumped ahead a bit. I'll try to explain what I mean:
>
>I think the problem with using this method would be that the NUMA policy is
>applied to all allocations by QEMU, not just ones related to the memory
>backing. I'm not sure if that would cause a serious problem but it seems untidy,
>and it doesn't happen in other situations (i.e. with separate memory backend
>objects, QEMU sets up the policy specifically for each one and other
>allocations aren't affected, AFAIK). Presumably, if memory were very
>restricted it could prevent the guest from starting.
>
Yes, it is, that's what <numatune><memory/> does if you don't have any
other (<memnode/>) specifics set.

>>>I think QEMU could be altered to move the preallocations into the VCPU
>>>threads but it didn't seem trivial and I suspected the QEMU community would
>>>point out that there was already a way to do it using backend objects. Another
>>>option would be to add a -host-nodes parameter to QEMU so that the policy can
>>>be given without adding a memory backend object. (That seems like a more
>>>reasonable change to QEMU.)
>>>
>>
>>I think upstream won't like that, mostly because there is already a
>>way. And that is using memory-backend object. I think we could just
>>use that and disable changing it live. But upstream will probably want
>>that to be configurable or something.
>
>Right, but isn't this already an issue in the cases where libvirt is already
>using memory backend objects and NUMA policy? (Or does libvirt already disable
>changing it live in those situations?)
>
It is. I'm not trying to say libvirt is perfect. There are bugs,
e.g. like this one. The problem is that we tried to do *everything*,
but it's not currently possible. I'm trying to explain how stuff works
now. It definitely needs some fixing, though.

OK :-)

Well, given our discussion, do you think it's worth a v2 of my original patch
or would it be better to drop it in favour of some broader change?

Cheers,
Sam.