On Tue, Oct 25, 2016 at 01:10:23PM +1100, Sam Bobroff wrote:
> On Tue, Oct 18, 2016 at 10:43:31PM +0200, Martin Kletzander wrote:
> > On Mon, Oct 17, 2016 at 03:45:09PM +1100, Sam Bobroff wrote:
> > >On Fri, Oct 14, 2016 at 10:19:42AM +0200, Martin Kletzander wrote:
> > >>On Fri, Oct 14, 2016 at 11:52:22AM +1100, Sam Bobroff wrote:
> > >>>I did look at the libnuma and cgroups approaches, but I was concerned
> > >>>they wouldn't work in this case, because of the way QEMU allocates
> > >>>memory when mem-prealloc is used: the memory is allocated in the main
> > >>>process, before the CPU threads are created. (This is based only on a
> > >>>bit of hacking and debugging in QEMU, but it does seem to explain the
> > >>>behaviour I've seen so far.)
> > >>>
> > >>
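
For reference, the path Sam describes is the legacy preallocation setup,
i.e. something like the following (the exact flags here are illustrative
only):

    # Guest RAM is allocated and touched by the main QEMU process at
    # startup, before any VCPU thread exists, so a NUMA policy applied
    # to the VCPU threads later cannot affect its placement:
    qemu-system-ppc64 -m 4096 -mem-path /dev/hugepages -mem-prealloc ...
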
> > >>But we use numactl before QEMU is exec()'d.
> > >
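
What libvirt does here is conceptually equivalent to launching QEMU under
the numactl binary (in reality it sets the policy through the
numactl/libnuma API before exec(), but the effect is the same):

    # Conceptual equivalent only: bind every allocation of the child,
    # including the preallocated guest RAM, to the given host nodes:
    numactl --membind=0-1 qemu-system-ppc64 -m 4096 -mem-prealloc ...

This is also why the policy covers all of QEMU's allocations, which is
exactly the concern raised below.
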
> > >Sorry, I jumped ahead a bit. I'll try to explain what I mean:
> > >
> > >I think the problem with using this method would be that the NUMA
> > >policy is applied to all allocations by QEMU, not just the ones related
> > >to the memory backing. I'm not sure if that would cause a serious
> > >problem but it seems untidy, and it doesn't happen in other situations
> > >(i.e. with separate memory backend objects, QEMU sets up the policy
> > >specifically for each one and other allocations aren't affected,
> > >AFAIK). Presumably, if memory were very restricted it could prevent the
> > >guest from starting.
> > >
> >
> > Yes, it is, that's what <numatune><memory/> does if you don't have any
> > other (<memnode/>) specifics set.
> >
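
For concreteness, the two forms being contrasted look like this in the
domain XML (the nodesets here are made up):

    <numatune>
      <memory mode='strict' nodeset='0-1'/>
      <memnode cellid='0' mode='strict' nodeset='0'/>
    </numatune>

<memory/> alone gives the process-wide policy; each <memnode/> narrows it
to one guest NUMA cell, which libvirt implements with a per-cell
memory-backend object.
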
> > >>>I think QEMU could be altered to move the preallocations into the
> > >>>VCPU threads, but it didn't seem trivial and I suspected the QEMU
> > >>>community would point out that there was already a way to do it using
> > >>>backend objects. Another option would be to add a -host-nodes
> > >>>parameter to QEMU so that the policy can be given without adding a
> > >>>memory backend object. (That seems like a more reasonable change to
> > >>>QEMU.)
> > >>>
> > >>
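
The memory-backend route mentioned above looks roughly like this on the
QEMU command line (ids and sizes made up):

    -object memory-backend-ram,id=mem0,size=4096M,host-nodes=0,policy=bind,prealloc=yes \
    -numa node,nodeid=0,memdev=mem0

so the binding and the preallocation are attached to the backend object
rather than to the whole process.
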
> > >>I think upstream won't like that, mostly because there is already a
> > >>way, and that is using a memory-backend object. I think we could just
> > >>use that and disable changing it live. But upstream will probably want
> > >>that to be configurable or something.
> > >
> > >Right, but isn't this already an issue in the cases where libvirt is
> > >using memory backend objects and NUMA policy? (Or does libvirt already
> > >disable changing it live in those situations?)
> > >
> >
> > It is. I'm not trying to say libvirt is perfect. There are bugs,
> > e.g. like this one. The problem is that we tried to do *everything*,
> > but it's not currently possible. I'm trying to explain how stuff works
> > now. It definitely needs some fixing, though.
> OK :-)
>
> Well, given our discussion, do you think it's worth a v2 of my original
> patch, or would it be better to drop it in favour of some broader change?

Honestly, I've thought about the approaches so much that I'm now not sure
I'll make a good decision. An RFC could do. If I were to pick, I would go
with a new setting that controls whether we want the binding to be
changeable throughout the domain's lifetime or not, so that we can make
better decisions (and not feel bad about the bad ones).
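
Purely as a sketch of that idea, and not existing libvirt syntax, such a
knob could be an extra attribute on the existing element:

    <numatune>
      <!-- 'pinning' is hypothetical: 'static' would allow stricter
           placement at startup at the cost of live changes -->
      <memory mode='strict' nodeset='0-1' pinning='static'/>
    </numatune>
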
> Cheers,
> Sam.