On Mon, Oct 17, 2016 at 03:45:09PM +1100, Sam Bobroff wrote:
On Fri, Oct 14, 2016 at 10:19:42AM +0200, Martin Kletzander wrote:
> On Fri, Oct 14, 2016 at 11:52:22AM +1100, Sam Bobroff wrote:
> >I did look at the libnuma and cgroups approaches, but I was concerned they
> >wouldn't work in this case, because of the way QEMU allocates memory when
> >mem-prealloc is used: the memory is allocated in the main process, before the
> >CPU threads are created. (This is based only on a bit of hacking and debugging
> >in QEMU, but it does seem to explain the behaviour I've seen so far.)
> >
>
> But we use numactl before QEMU is exec()'d.
Sorry, I jumped ahead a bit. I'll try to explain what I mean:
I think the problem with using this method would be that the NUMA policy is
applied to all allocations by QEMU, not just ones related to the memory
backing. I'm not sure if that would cause a serious problem but it seems untidy,
and it doesn't happen in other situations (i.e. with separate memory backend
objects, QEMU sets up the policy specifically for each one and other
allocations aren't affected, AFAIK). Presumably, if memory were very
restricted it could prevent the guest from starting.
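To make the difference between the two approaches concrete, here is a rough
sketch that only builds the two command lines and does not run anything. The
helper names, the backend id "ram-node0", the 1G size and host node 0 are all
made up for illustration; the numactl --membind option and the
memory-backend-ram properties (host-nodes, policy, prealloc) are the existing
ones.

```python
# Sketch only: builds argument lists, never executes QEMU.
# Helper names, node numbers, sizes and ids are illustrative assumptions.

def numactl_wrapped(qemu_argv, host_nodes):
    """Process-wide policy: every allocation QEMU makes is bound."""
    return ["numactl", "--membind=" + host_nodes] + list(qemu_argv)


def backend_bound(qemu_argv, host_nodes, size="1G", backend_id="ram-node0"):
    """Per-backend policy: only the guest RAM backend carries the binding."""
    obj = (
        "memory-backend-ram,id={0},size={1},prealloc=yes,"
        "host-nodes={2},policy=bind"
    ).format(backend_id, size, host_nodes)
    return list(qemu_argv) + ["-object", obj, "-numa", "node,memdev=" + backend_id]


base = ["qemu-system-ppc64", "-m", "1G"]
wrapped = numactl_wrapped(base, "0")
scoped = backend_bound(base, "0")

print(" ".join(wrapped))
print(" ".join(scoped))
```

In the first form the policy is inherited by the whole QEMU process; in the
second it is attached to the one backend object and nothing else.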
Yes, it is, that's what <numatune><memory/> does if you don't have any
other (<memnode/>) specifics set.
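For reference, the two shapes of <numatune> being contrasted can be sketched
like this. The helper name, the "strict" mode and the nodeset values are
example assumptions only; the element and attribute names (memory, memnode,
mode, nodeset, cellid) are the ones from the domain XML.

```python
import xml.etree.ElementTree as ET

# Sketch only: builds the XML fragments, does not talk to libvirt.
# The helper name and all nodeset values are illustrative assumptions.


def numatune(nodeset, per_cell=None):
    """Return a <numatune> element, optionally with per-guest-node <memnode/>s."""
    root = ET.Element("numatune")
    ET.SubElement(root, "memory", {"mode": "strict", "nodeset": nodeset})
    for cellid, cell_nodeset in (per_cell or {}).items():
        ET.SubElement(
            root,
            "memnode",
            {"cellid": str(cellid), "mode": "strict", "nodeset": cell_nodeset},
        )
    return root


# Only <memory/>: the case described above, with no <memnode/> specifics.
whole_process = numatune("0")
# With <memnode/> entries: one binding per guest NUMA cell.
per_node = numatune("0-1", {0: "0", 1: "1"})

print(ET.tostring(whole_process, encoding="unicode"))
print(ET.tostring(per_node, encoding="unicode"))
```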
> >I think QEMU could be altered to move the preallocations into the VCPU
> >threads but it didn't seem trivial and I suspected the QEMU community would
> >point out that there was already a way to do it using backend objects. Another
> >option would be to add a -host-nodes parameter to QEMU so that the policy can
> >be given without adding a memory backend object. (That seems like a more
> >reasonable change to QEMU.)
> >
>
> I think upstream won't like that, mostly because there is already a
> way. And that is using memory-backend object. I think we could just
> use that and disable changing it live. But upstream will probably want
> that to be configurable or something.
Right, but isn't this already an issue in the cases where libvirt is already
using memory backend objects and NUMA policy? (Or does libvirt already disable
changing it live in those situations?)
It is. I'm not trying to say libvirt is perfect. There are bugs,
like this one. The problem is that we tried to do *everything*,
but it's not currently possible. I'm trying to explain how stuff works
now. It definitely needs some fixing, though.