On Tue, Oct 25, 2016 at 02:13:07PM +0200, Martin Kletzander wrote:
>On Tue, Oct 25, 2016 at 01:10:23PM +1100, Sam Bobroff wrote:
>>On Tue, Oct 18, 2016 at 10:43:31PM +0200, Martin Kletzander wrote:
>>>On Mon, Oct 17, 2016 at 03:45:09PM +1100, Sam Bobroff wrote:
>>>>On Fri, Oct 14, 2016 at 10:19:42AM +0200, Martin Kletzander wrote:
>>>>>On Fri, Oct 14, 2016 at 11:52:22AM +1100, Sam Bobroff wrote:
>>>>>>I did look at the libnuma and cgroups approaches, but I was
>>>>>>concerned they wouldn't work in this case, because of the way QEMU
>>>>>>allocates memory when mem-prealloc is used: the memory is
>>>>>>allocated in the main process, before the CPU threads are created.
>>>>>>(This is based only on a bit of hacking and debugging in QEMU, but
>>>>>>it does seem to explain the behaviour I've seen so far.)
>>>>>>
>>>>>
>>>>>But we use numactl before QEMU is exec()'d.
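(For concreteness, I take this to mean something with the effect of the
following, applied before exec() — the binary name and node number here
are invented for illustration:)

```shell
# Bind ALL of the child's allocations (not just guest RAM) to host
# node 0 before exec()ing QEMU; this is the "policy applies to
# everything" behaviour discussed below.
numactl --membind=0 qemu-system-ppc64 -m 4G -mem-prealloc ...
```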
>>>>
>>>>Sorry, I jumped ahead a bit. I'll try to explain what I mean:
>>>>
>>>>I think the problem with using this method would be that the NUMA
>>>>policy is applied to all allocations by QEMU, not just the ones
>>>>related to the memory backing. I'm not sure if that would cause a
>>>>serious problem, but it seems untidy, and it doesn't happen in other
>>>>situations (i.e. with separate memory backend objects, QEMU sets up
>>>>the policy specifically for each one and other allocations aren't
>>>>affected, AFAIK). Presumably, if memory were very restricted it
>>>>could prevent the guest from starting.
>>>>
>>>
>>>Yes, it is, that's what <numatune><memory/> does if you don't have
>>>any other (<memnode/>) specifics set.
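(To spell out the distinction for the archives — the nodesets here are
invented for illustration:)

```xml
<numatune>
  <!-- domain-wide: policy covers all of QEMU's allocations -->
  <memory mode='strict' nodeset='0-1'/>
  <!-- per-guest-node: policy covers only that node's backing memory -->
  <memnode cellid='0' mode='strict' nodeset='0'/>
</numatune>
```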
>>>
>>>>>>I think QEMU could be altered to move the preallocations into the
>>>>>>VCPU threads, but it didn't seem trivial and I suspected the QEMU
>>>>>>community would point out that there was already a way to do it
>>>>>>using backend objects. Another option would be to add a
>>>>>>-host-nodes parameter to QEMU so that the policy can be given
>>>>>>without adding a memory backend object. (That seems like a more
>>>>>>reasonable change to QEMU.)
>>>>>>
>>>>>
>>>>>I think upstream won't like that, mostly because there is already
>>>>>a way, and that is using a memory-backend object. I think we could
>>>>>just use that and disable changing it live. But upstream will
>>>>>probably want that to be configurable or something.
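(Concretely, I believe the existing mechanism is the one below; the
IDs, sizes and node numbers are invented for illustration:)

```shell
# One backend object per guest NUMA node; host-nodes= and policy= make
# QEMU bind (and, with prealloc=yes, touch) just this backend's memory.
qemu-system-ppc64 -m 4G \
  -object memory-backend-ram,id=mem0,size=4G,prealloc=yes,host-nodes=0,policy=bind \
  -numa node,nodeid=0,cpus=0-3,memdev=mem0
```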
>>>>
>>>>Right, but isn't this already an issue in the cases where libvirt
>>>>is using memory backend objects and NUMA policy? (Or does libvirt
>>>>already disable changing it live in those situations?)
>>>>
>>>
>>>It is. I'm not trying to say libvirt is perfect. There are bugs,
>>>like this one. The problem is that we tried to do *everything*, but
>>>it's not currently possible. I'm trying to explain how stuff works
>>>now. It definitely needs some fixing, though.
>>
>>OK :-)
>>
>>Well, given our discussion, do you think it's worth a v2 of my
>>original patch, or would it be better to drop it in favour of some
>>broader change?
>>
>
>Honestly, I've thought about the approaches so much that I'm no longer
>sure I'll make a good decision. An RFC could do. If I were to pick, I
>would go with a new setting that controls whether the binding should
>be changeable throughout the domain's lifetime, so that we can make
>better decisions (and not feel bad about the bad ones).

I feel the same way. OK, I'll try an RFC patch with a lot of
description.

I'm specifically trying to address the issue I originally raised, which
isn't quite the same thing as the changeability of the bindings, but
I'll keep that in mind. I think your point about changing the bindings
will apply in the same way whenever QEMU's memory-backend objects are
used with their "host-nodes" attribute (since they are what causes QEMU
to apply policy), so I don't think I'm suggesting any significant
change there.

If you want to add the new setting you mention above, I'd be happy to
base my patch on top of that work. ;-)
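For what it's worth, I had imagined the setting as something like this
in the domain XML (the attribute name and placement are invented, just
to sketch the idea — nothing like it exists yet):

```xml
<numatune>
  <!-- hypothetical 'live' attribute: whether the binding may be
       changed while the domain is running -->
  <memory mode='strict' nodeset='0-1' live='no'/>
</numatune>
```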
Cheers,
Sam.