On 4/15/19 3:47 PM, Martin Kletzander wrote:
On Wed, Apr 10, 2019 at 06:10:44PM +0200, Michal Privoznik wrote:
> It's funny how this went unnoticed for such a long time. Long
> story short, if a domain is configured with
> VIR_DOMAIN_NUMATUNE_MEM_STRICT libvirt doesn't really honour
> that. This is because of 7e72ac787848 after which libvirt allowed
> qemu to allocate memory just anywhere and only after that it used
> some magic involving cpuset.memory_migrate and cpuset.mems to
> move the memory to desired NUMA nodes. This was done in order to
> work around some KVM bug where KVM would fail if there wasn't a
> DMA zone available on the NUMA node. Well, while the work around
> might stopped libvirt tickling the KVM bug it also caused a bug
> on libvirt side: if there is not enough memory on configured NUMA
> node(s) then any attempt to start a domain must fail. Because of
> the way we play with guest memory domains can start just happily.
>
> The solution is to move the child we've just forked into emulator
> cgroup, set up cpuset.mems and exec() qemu only after that.
>
So you are saying this was a bug in KVM? Is it fixed now? I am not
against
this patch, I hated that I had to do the workaround, but I just want to
be sure
we won't start hitting that again.
Yes, that's what I'm saying. Looks like the KVM bug is fixed now because
with a Fedora 29 on a NUMA machine I can start domains just fine.
> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
> ---
> src/qemu/qemu_process.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 47d8ca2ff1..076ec18e21 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -6653,6 +6653,14 @@ qemuProcessLaunch(virConnectPtr conn,
> if (qemuProcessInitCpuAffinity(vm) < 0)
> goto cleanup;
>
> + VIR_DEBUG("Setting emulator tuning/settings");
> + if (qemuProcessSetupEmulator(vm) < 0)
> + goto cleanup;
> +
> + VIR_DEBUG("Setting up post-init cgroup restrictions");
This is not post-init any more, but more importantly,
> + if (qemuSetupCpusetMems(vm) < 0)
This function does a subset of what qemuProcessSetupEmulator() called right
before, does, so I see no reason for it being called here, or to keep
existing
in the codebase for that matter.
Ah, good point. I'll send v3.
Michal