[libvirt] [PATCH] qemu: Only use memory-backend-file with NUMA if needed

If this reminds you of a commit message from around a year ago, it's 41c2aa729f0af084ede95ee9a06219a2dd5fb5df and yes, we're dealing with "the same thing" again. Or f309db1f4d51009bad0d32e12efc75530b66836b and it's similar. There is a logic in place that if there is no real need for memory-backend-file, qemuBuildMemoryBackendStr() returns 0. However that wasn't the case with hugepage backing. The reason for that was that we abused the 'pagesize' variable for storing that information, but we should rather have a separate one that specifies whether we really need the new object for hugepage backing. And that variable should be set only if this particular NUMA cell needs special treatment WRT hugepages. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1372153 Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- Notes: This fixes migration from older libvirts. By "older", I mean pre-(circa-)1.2.7, also in some cases pre-1.2.11, in some other cases pre-v1.2.20. It's pretty messy. It could be back-ported as far as it's easy to do. src/qemu/qemu_command.c | 8 +++++--- tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args | 10 ++++------ 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 1ac0fa116310..316aced5124a 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3202,6 +3202,7 @@ qemuBuildMemoryBackendStr(unsigned long long size, int ret = -1; virJSONValuePtr props = NULL; bool nodeSpecified = virDomainNumatuneNodeSpecified(def->numa, guestNode); + bool needHugepage = !!pagesize; *backendProps = NULL; *backendType = NULL; @@ -3224,10 +3225,10 @@ qemuBuildMemoryBackendStr(unsigned long long size, mode = VIR_DOMAIN_NUMATUNE_MEM_STRICT; if (pagesize == 0) { + bool thisHugepage = false; + /* Find the huge page size we want to use */ for (i = 0; i < def->mem.nhugepages; i++) { - bool thisHugepage = false; - hugepage = &def->mem.hugepages[i]; if (!hugepage->nodemask) { @@ -3249,6 +3250,7 @@ qemuBuildMemoryBackendStr(unsigned long long size, if (thisHugepage) { /* Hooray, we've found the page size */ + needHugepage = true; break; } } @@ -3335,7 +3337,7 @@ qemuBuildMemoryBackendStr(unsigned long long size, } /* If none of the following is requested... */ - if (!pagesize && !userNodeset && !memAccess && !nodeSpecified && !force) { + if (!needHugepage && !userNodeset && !memAccess && !nodeSpecified && !force) { /* report back that using the new backend is not necessary * to achieve the desired configuration */ ret = 1; diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args index 447bb52d00b7..a37f03d0d2ed 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args @@ -10,12 +10,10 @@ QEMU_AUDIO_DRV=none \ -M pc \ -m 1024 \ -smp 2,sockets=2,cores=1,threads=1 \ --object memory-backend-file,id=ram-node0,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=268435456 \ --numa node,nodeid=0,cpus=0,memdev=ram-node0 \ --object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=805306368 \ --numa node,nodeid=1,cpus=1,memdev=ram-node1 \ +-mem-prealloc \ +-mem-path /dev/hugepages2M/libvirt/qemu \ +-numa node,nodeid=0,cpus=0,mem=256 \ +-numa node,nodeid=1,cpus=1,mem=768 \ -uuid ef1bdff4-27f3-4e85-a807-5fb4d58463cc \ -nographic \ -nodefaults \ -- 2.10.0

On 23.09.2016 15:20, Martin Kletzander wrote:
If this reminds you of a commit message from around a year ago, it's 41c2aa729f0af084ede95ee9a06219a2dd5fb5df and yes, we're dealing with "the same thing" again. Or f309db1f4d51009bad0d32e12efc75530b66836b and it's similar.
There is a logic in place that if there is no real need for memory-backend-file, qemuBuildMemoryBackendStr() returns 0. However that wasn't the case with hugepage backing. The reason for that was that we abused the 'pagesize' variable for storing that information, but we should rather have a separate one that specifies whether we really need the new object for hugepage backing. And that variable should be set only if this particular NUMA cell needs special treatment WRT hugepages.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1372153
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> ---
Notes: This fixes migration from older libvirts. By "older", I mean pre-(circa-)1.2.7, also in some cases pre-1.2.11, in some other cases pre-v1.2.20. It's pretty messy. It could be back-ported as far as it's easy to do.
src/qemu/qemu_command.c | 8 +++++--- tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args | 10 ++++------ 2 files changed, 9 insertions(+), 9 deletions(-)
ACK Michal

On Tue, Sep 27, 2016 at 04:29:07PM +0200, Michal Privoznik wrote:
On 23.09.2016 15:20, Martin Kletzander wrote:
If this reminds you of a commit message from around a year ago, it's 41c2aa729f0af084ede95ee9a06219a2dd5fb5df and yes, we're dealing with "the same thing" again. Or f309db1f4d51009bad0d32e12efc75530b66836b and it's similar.
There is a logic in place that if there is no real need for memory-backend-file, qemuBuildMemoryBackendStr() returns 0. However that wasn't the case with hugepage backing. The reason for that was that we abused the 'pagesize' variable for storing that information, but we should rather have a separate one that specifies whether we really need the new object for hugepage backing. And that variable should be set only if this particular NUMA cell needs special treatment WRT hugepages.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1372153
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> ---
Notes: This fixes migration from older libvirts. By "older", I mean pre-(circa-)1.2.7, also in some cases pre-1.2.11, in some other cases pre-v1.2.20. It's pretty messy. It could be back-ported as far as it's easy to do.
src/qemu/qemu_command.c | 8 +++++--- tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args | 10 ++++------ 2 files changed, 9 insertions(+), 9 deletions(-)
ACK
Since this was ACKed before the freeze, I'll pushed it in a while, mainly because I need to back-port it to -maint branches anyway.
Michal
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Fri, Sep 23, 2016 at 03:20:56PM +0200, Martin Kletzander wrote:
If this reminds you of a commit message from around a year ago, it's 41c2aa729f0af084ede95ee9a06219a2dd5fb5df and yes, we're dealing with "the same thing" again. Or f309db1f4d51009bad0d32e12efc75530b66836b and it's similar.
There is a logic in place that if there is no real need for memory-backend-file, qemuBuildMemoryBackendStr() returns 0. However that wasn't the case with hugepage backing. The reason for that was that we abused the 'pagesize' variable for storing that information, but we should rather have a separate one that specifies whether we really need the new object for hugepage backing. And that variable should be set only if this particular NUMA cell needs special treatment WRT hugepages.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1372153
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Hi Martin, I tested your patch and it works but it seems to make the snapshot/migration information incompatible with the current master code. e.g. after applying the patch I can't load a snapshot that was created before applying the patch (obviously only for a guest configuration that is affected by the patch). The error I get is this: $ virsh snapshot-revert tmp --snapshotname 1475026515 error: operation failed: Unknown ramblock "/objects/ram-node0", cannot accept migration error while loading state for instance 0x0 of device 'ram' Error -22 while loading VM state Presumably this is caused by the same change that fixes some other migrations, but is this case a problem? Cheers, Sam.
---
Notes: This fixes migration from older libvirts. By "older", I mean pre-(circa-)1.2.7, also in some cases pre-1.2.11, in some other cases pre-v1.2.20. It's pretty messy. It could be back-ported as far as it's easy to do.
src/qemu/qemu_command.c | 8 +++++--- tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args | 10 ++++------ 2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 1ac0fa116310..316aced5124a 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3202,6 +3202,7 @@ qemuBuildMemoryBackendStr(unsigned long long size, int ret = -1; virJSONValuePtr props = NULL; bool nodeSpecified = virDomainNumatuneNodeSpecified(def->numa, guestNode); + bool needHugepage = !!pagesize;
*backendProps = NULL; *backendType = NULL; @@ -3224,10 +3225,10 @@ qemuBuildMemoryBackendStr(unsigned long long size, mode = VIR_DOMAIN_NUMATUNE_MEM_STRICT;
if (pagesize == 0) { + bool thisHugepage = false; + /* Find the huge page size we want to use */ for (i = 0; i < def->mem.nhugepages; i++) { - bool thisHugepage = false; - hugepage = &def->mem.hugepages[i];
if (!hugepage->nodemask) { @@ -3249,6 +3250,7 @@ qemuBuildMemoryBackendStr(unsigned long long size,
if (thisHugepage) { /* Hooray, we've found the page size */ + needHugepage = true; break; } } @@ -3335,7 +3337,7 @@ qemuBuildMemoryBackendStr(unsigned long long size, }
/* If none of the following is requested... */ - if (!pagesize && !userNodeset && !memAccess && !nodeSpecified && !force) { + if (!needHugepage && !userNodeset && !memAccess && !nodeSpecified && !force) { /* report back that using the new backend is not necessary * to achieve the desired configuration */ ret = 1; diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args index 447bb52d00b7..a37f03d0d2ed 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args @@ -10,12 +10,10 @@ QEMU_AUDIO_DRV=none \ -M pc \ -m 1024 \ -smp 2,sockets=2,cores=1,threads=1 \ --object memory-backend-file,id=ram-node0,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=268435456 \ --numa node,nodeid=0,cpus=0,memdev=ram-node0 \ --object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=805306368 \ --numa node,nodeid=1,cpus=1,memdev=ram-node1 \ +-mem-prealloc \ +-mem-path /dev/hugepages2M/libvirt/qemu \ +-numa node,nodeid=0,cpus=0,mem=256 \ +-numa node,nodeid=1,cpus=1,mem=768 \ -uuid ef1bdff4-27f3-4e85-a807-5fb4d58463cc \ -nographic \ -nodefaults \ -- 2.10.0
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Wed, Sep 28, 2016 at 12:00:22PM +1000, Sam Bobroff wrote:
On Fri, Sep 23, 2016 at 03:20:56PM +0200, Martin Kletzander wrote:
If this reminds you of a commit message from around a year ago, it's 41c2aa729f0af084ede95ee9a06219a2dd5fb5df and yes, we're dealing with "the same thing" again. Or f309db1f4d51009bad0d32e12efc75530b66836b and it's similar.
There is a logic in place that if there is no real need for memory-backend-file, qemuBuildMemoryBackendStr() returns 0. However that wasn't the case with hugepage backing. The reason for that was that we abused the 'pagesize' variable for storing that information, but we should rather have a separate one that specifies whether we really need the new object for hugepage backing. And that variable should be set only if this particular NUMA cell needs special treatment WRT hugepages.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1372153
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Hi Martin,
I tested your patch and it works but it seems to make the snapshot/migration information incompatible with the current master code. e.g. after applying the patch I can't load a snapshot that was created before applying the patch (obviously only for a guest configuration that is affected by the patch). The error I get is this:
$ virsh snapshot-revert tmp --snapshotname 1475026515 error: operation failed: Unknown ramblock "/objects/ram-node0", cannot accept migration error while loading state for instance 0x0 of device 'ram' Error -22 while loading VM state
Presumably this is caused by the same change that fixes some other migrations, but is this case a problem?
Yes, that's the problem. But since we cannot know which was the previous notation used, we need to pick one. And previously we picked the one that works with most QEMUs, even the old ones, unless we need the new notation for some setting. That is mostly for the migration (or save/restore, which is basically the same thing) to work, even though it looks like we're breaking it due to that. That's why I will back-port this to older maintenance branches so that we have it working across all versions. Some saves will still not be working, but there is no way how to make everything work (due to the reason described in 2nd sentence), so I'm making it work for everything from now on (no matter which version), plus everything before the patch that "broke" it.
Cheers, Sam.
---
Notes: This fixes migration from older libvirts. By "older", I mean pre-(circa-)1.2.7, also in some cases pre-1.2.11, in some other cases pre-v1.2.20. It's pretty messy. It could be back-ported as far as it's easy to do.
Using just a few words I tried to describe that here ^^. Any of those explanations might be hard to follow due to -ENOCOFFEE, sorry if that's the case =)
participants (3)
-
Martin Kletzander
-
Michal Privoznik
-
Sam Bobroff