On Wed, Oct 12, 2016 at 09:48:07AM +0200, Peter Krempa wrote:
On Wed, Oct 12, 2016 at 15:04:53 +1100, Sam Bobroff wrote:
> At the moment, guests that are backed by hugepages in the host are
> only able to use policy to control the placement of those hugepages
> on a per-(guest-)CPU basis. Policy applied globally is ignored.
>
> Such guests would use <memoryBacking><hugepages/></memoryBacking>
and
> a <numatune> block with <memory mode=... nodeset=.../> but no
<memnode
> .../> elements.
>
> This patch corrects this by, in this specific case, changing the QEMU
> command line from "-mem-prealloc -mem-path=..." (which cannot
> specify NUMA policy) to "-object memory-backend-file ..." (which can).
>
> Note: This is not visible to the guest and does not appear to create
> a migration incompatibility.
>
> Signed-off-by: Sam Bobroff <sam.bobroff(a)au1.ibm.com>
> ---
> There was some discussion leading up to this patch, here:
>
>
https://www.redhat.com/archives/libvir-list/2016-October/msg00033.html
>
> During that discussion it seemed that fixing this issue would cause
> migration incompatibility but after some careful testing, it appears
> that it only does when a memory-backend object is attached to a guest
> NUMA node and that is not the case here. If only one is created, and
> used globally (not attached via mem=<id>), the migration data does not
> seem to be changed and so it seems reasonable to try something like
> this patch.
>
> This patch does work for my test cases but I don't claim a deep
> understanding of the libvirt code so this is at least partly a RFC.
> Comments welcome :-)
>
> Cheers,
> Sam.
>
> src/qemu/qemu_command.c | 55 ++++++++++++++++++++++++++++++++++++++++---------
> src/qemu/qemu_command.h | 2 +-
> 2 files changed, 46 insertions(+), 11 deletions(-)
>
> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> index 0804133..c28c8f2 100644
> --- a/src/qemu/qemu_command.c
> +++ b/src/qemu/qemu_command.c
> @@ -3143,7 +3143,7 @@ qemuBuildMemoryBackendStr(unsigned long long size,
> int guestNode,
> virBitmapPtr userNodeset,
> virBitmapPtr autoNodeset,
> - virDomainDefPtr def,
> + const virDomainDefPtr def,
This does not work as expected. Using 'const' with types that hide the
pointer makes the pointer const and not the structure that the pointer
points to.
Ah, of course you're right.
> virQEMUCapsPtr qemuCaps,
> virQEMUDriverConfigPtr cfg,
> const char **backendType,
> @@ -7129,12 +7129,18 @@ qemuBuildSmpCommandLine(virCommandPtr cmd,
>
> static int
> qemuBuildMemPathStr(virQEMUDriverConfigPtr cfg,
> - const virDomainDef *def,
> + const virDomainDefPtr def,
Same as above. This change is wrong.
OK.
> virQEMUCapsPtr qemuCaps,
> - virCommandPtr cmd)
> + virCommandPtr cmd,
> + virBitmapPtr auto_nodeset)
> {
> const long system_page_size = virGetSystemPageSizeKB();
> char *mem_path = NULL;
> + virBitmapPtr nodemask = NULL;
> + const char *backendType = NULL;
> + char *backendStr = NULL;
> + virJSONValuePtr props;
> + int rv = -1;
>
> /*
> * No-op if hugepages were not requested.
> @@ -7159,18 +7165,47 @@ qemuBuildMemPathStr(virQEMUDriverConfigPtr cfg,
> if (qemuGetHupageMemPath(cfg, def->mem.hugepages[0].size, &mem_path)
< 0)
> return -1;
>
> - virCommandAddArgList(cmd, "-mem-prealloc", "-mem-path",
mem_path, NULL);
> + if (virDomainNumatuneMaybeGetNodeset(def->numa, auto_nodeset,
> + &nodemask, -1) < 0)
> + return -1;
> + if (nodemask && virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE))
{
> + props = virJSONValueNewObject();
This is leaked, qemuBuildMemoryBackendStr allocates one itself.
Ah of course, sorry. I think I can just remove that line.
> + if
(qemuBuildMemoryBackendStr(virDomainDefGetMemoryInitial(def),
> + 0, -1, NULL, auto_nodeset,
> + def, qemuCaps, cfg, &backendType,
> + &props, false) < 0)
> + goto cleanup;
> + if (!(backendStr = virQEMUBuildObjectCommandlineFromJSON(backendType,
> + "mem",
> + props)))
> + goto cleanup;
So you create a memory object that does not get used ...
Yes, that is correct.
> + virCommandAddArgList(cmd, "-object",
backendStr, NULL);
So does qemu use memory objects that are not used magically as it's
memory? That seems odd. Could you please elaborate on this part and how
it's supposed to work? Without a proper elaboration how this is supposed
to work with qemu this patch is not acceptable.
Yes, it did seem odd to me, but I didn't know if it was odd to anyone else! ;-)
I'll elaborate properly in the commit message if I end up doing a version 2
(and if the fix ends up using that method). But briefly, QEMU seems to treat a
single memory backend object, that is not linked to any NUMA node, just as it
would "-mem-path" and "-mem-prealloc" except that NUMA policy can also
be set.
It seems to "know" about this useage because it is careful to avoid changing
the "memory region" name to the ID in this case and it's that quirk that
allows migrations to be unaffected.
> + rv = 0;
> +cleanup:
This won't pass syntax check. Labels have to be spaced out.
Thanks, I'll use syntax check as noted below!
> + VIR_FREE(backendStr);
> + VIR_FREE(props);
This is a JSON value object, you need to free it with the proper
function.
Ah, virJSONValueFree(). Thanks.
> + }
> + else {
This breaks coding style.
OK.
> + if (nodemask || 1)
Erm, a tautology?
Yes, an embarassing bit of debug from the last round of testing :-P
(Testing the warning.)
> + virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
"%s",
> + _("Memory file backend objects are "
> + "unsupported by QEMU binary. Global NUMA "
> + "hugepage policy will be ignored."));
Reporting an error but not terminating the startup is not acceptable.
Errors should not be reported unless something fails.
Right, if I do another version I'll change this to a warning.
> + virCommandAddArgList(cmd, "-mem-prealloc",
"-mem-path", mem_path, NULL);
> + rv = 0;
> + }
> VIR_FREE(mem_path);
>
> - return 0;
> + return rv;
> }
>
Missing a change to the test files. Please make sure that you run 'make
check' and 'make syntax-check' and send only patches that pass those.
Ah, thanks. I will always use them in future.
Peter
Thanks for your revew!
Cheers,
Sam.