[libvirt] [PATCH 0/3] Reflect changes in kernel 3.12 in our cgroups code

To see what changed in the kernel, see explanation in PATCH 1/3. First two patches make proper use of VIR_DOMAIN_MEMORY_PARAM_UNLIMITED, the third one just fixes a "typo". Martin Kletzander (3): cgroups: Redefine what "unlimited" means wrt memory limits qemu: Report VIR_DOMAIN_MEMORY_PARAM_UNLIMITED properly qemu: Fix minor inconsistency in error message src/qemu/qemu_driver.c | 27 ++++++++++++------------ src/util/vircgroup.c | 57 ++++++++++++++++++++++++++++++++++---------------- 2 files changed, 53 insertions(+), 31 deletions(-) -- 1.8.4.4

Since kernel 3.12 (commit 34ff8dc08956098563989d8599840b130be81252 in linux-stable.git in particular) the value for 'unlimited' in cgroup memory limits changed from LLONG_MAX to ULLONG_MAX. Due to rather unfortunate choice of our VIR_DOMAIN_MEMORY_PARAM_UNLIMITED constant (which we transfer as an unsigned long long in Kibibytes), we ended up with the situation described below (applies to x86_64): - 2^64-1 (ULLONG_MAX) -- "unlimited" in kernel = 3.12 - 2^63-1 (LLONG_MAX) -- "unlimited" in kernel < 3.12 - 2^63-1024 -- our PARAM_UNLIMITED scaled to Bytes - 2^53-1 -- our PARAM_UNLIMITED unscaled (in Kibibytes) This means that when any number within (2^63-1, 2^64-1] is read from memory cgroup, we are transferring that number instead of "unlimited". Unfortunately, changing VIR_DOMAIN_MEMORY_PARAM_UNLIMITED would break ABI compatibility and thus we have to resort to a different solution. With this patch every value greater than PARAM_UNLIMITED means "unlimited". Even though this may seem misleading, we are already in such unclear situation when running 3.12 kernel with memory limits set to 2^63. One example showing most of the problems at once (with kernel 3.12.2): # virsh memtune asdf --hard-limit 9007199254740991 --swap-hard-limit -1 # echo 12345678901234567890 >\ /sys/fs/cgroup/memory/machine/asdf.libvirt-qemu/memory.soft_limit_in_bytes # virsh memtune asdf hard_limit : 18014398509481983 soft_limit : 12056327051986884 swap_hard_limit: 18014398509481983 Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/util/vircgroup.c | 57 +++++++++++++++++++++++++++++++++++----------------- 1 file changed, 39 insertions(+), 18 deletions(-) diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 9674328..43eb649 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -1957,12 +1957,19 @@ int virCgroupGetMemoryHardLimit(virCgroupPtr group, unsigned long long *kb) { long long unsigned int limit_in_bytes; - int ret; - ret = virCgroupGetValueU64(group, - VIR_CGROUP_CONTROLLER_MEMORY, - "memory.limit_in_bytes", &limit_in_bytes); - if (ret == 0) - *kb = limit_in_bytes >> 10; + int ret = -1; + + if (virCgroupGetValueU64(group, + VIR_CGROUP_CONTROLLER_MEMORY, + "memory.limit_in_bytes", &limit_in_bytes) < 0) + goto cleanup; + + *kb = limit_in_bytes >> 10; + if (*kb > VIR_DOMAIN_MEMORY_PARAM_UNLIMITED) + *kb = VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; + + ret = 0; + cleanup: return ret; } @@ -2012,12 +2019,19 @@ int virCgroupGetMemorySoftLimit(virCgroupPtr group, unsigned long long *kb) { long long unsigned int limit_in_bytes; - int ret; - ret = virCgroupGetValueU64(group, - VIR_CGROUP_CONTROLLER_MEMORY, - "memory.soft_limit_in_bytes", &limit_in_bytes); - if (ret == 0) - *kb = limit_in_bytes >> 10; + int ret = -1; + + if (virCgroupGetValueU64(group, + VIR_CGROUP_CONTROLLER_MEMORY, + "memory.soft_limit_in_bytes", &limit_in_bytes) < 0) + goto cleanup; + + *kb = limit_in_bytes >> 10; + if (*kb > VIR_DOMAIN_MEMORY_PARAM_UNLIMITED) + *kb = VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; + + ret = 0; + cleanup: return ret; } @@ -2067,12 +2081,19 @@ int virCgroupGetMemSwapHardLimit(virCgroupPtr group, unsigned long long *kb) { long long unsigned int limit_in_bytes; - int ret; - ret = virCgroupGetValueU64(group, - VIR_CGROUP_CONTROLLER_MEMORY, - "memory.memsw.limit_in_bytes", &limit_in_bytes); - if (ret == 0) - *kb = limit_in_bytes >> 10; + int ret = -1; + + if (virCgroupGetValueU64(group, + VIR_CGROUP_CONTROLLER_MEMORY, + "memory.memsw.limit_in_bytes", &limit_in_bytes) < 0) + goto cleanup; + + *kb = limit_in_bytes >> 10; + if (*kb > VIR_DOMAIN_MEMORY_PARAM_UNLIMITED) + *kb = VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; + + ret = 0; + cleanup: return ret; } -- 1.8.4.4

On Mon, Dec 09, 2013 at 09:22:51AM +0100, Martin Kletzander wrote:
Since kernel 3.12 (commit 34ff8dc08956098563989d8599840b130be81252 in linux-stable.git in particular) the value for 'unlimited' in cgroup memory limits changed from LLONG_MAX to ULLONG_MAX. Due to rather unfortunate choice of our VIR_DOMAIN_MEMORY_PARAM_UNLIMITED constant (which we transfer as an unsigned long long in Kibibytes), we ended up with the situation described below (applies to x86_64):
- 2^64-1 (ULLONG_MAX) -- "unlimited" in kernel = 3.12
- 2^63-1 (LLONG_MAX) -- "unlimited" in kernel < 3.12 - 2^63-1024 -- our PARAM_UNLIMITED scaled to Bytes
- 2^53-1 -- our PARAM_UNLIMITED unscaled (in Kibibytes)
This means that when any number within (2^63-1, 2^64-1] is read from memory cgroup, we are transferring that number instead of "unlimited". Unfortunately, changing VIR_DOMAIN_MEMORY_PARAM_UNLIMITED would break ABI compatibility and thus we have to resort to a different solution.
With this patch every value greater than PARAM_UNLIMITED means "unlimited". Even though this may seem misleading, we are already in such unclear situation when running 3.12 kernel with memory limits set to 2^63.
One example showing most of the problems at once (with kernel 3.12.2): # virsh memtune asdf --hard-limit 9007199254740991 --swap-hard-limit -1 # echo 12345678901234567890 >\ /sys/fs/cgroup/memory/machine/asdf.libvirt-qemu/memory.soft_limit_in_bytes # virsh memtune asdf hard_limit : 18014398509481983 soft_limit : 12056327051986884 swap_hard_limit: 18014398509481983
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/util/vircgroup.c | 57 +++++++++++++++++++++++++++++++++++----------------- 1 file changed, 39 insertions(+), 18 deletions(-)
diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 9674328..43eb649 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -1957,12 +1957,19 @@ int virCgroupGetMemoryHardLimit(virCgroupPtr group, unsigned long long *kb) { long long unsigned int limit_in_bytes; - int ret; - ret = virCgroupGetValueU64(group, - VIR_CGROUP_CONTROLLER_MEMORY, - "memory.limit_in_bytes", &limit_in_bytes); - if (ret == 0) - *kb = limit_in_bytes >> 10; + int ret = -1; + + if (virCgroupGetValueU64(group, + VIR_CGROUP_CONTROLLER_MEMORY, + "memory.limit_in_bytes", &limit_in_bytes) < 0) + goto cleanup; + + *kb = limit_in_bytes >> 10; + if (*kb > VIR_DOMAIN_MEMORY_PARAM_UNLIMITED) + *kb = VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; + + ret = 0; + cleanup: return ret; }
@@ -2012,12 +2019,19 @@ int virCgroupGetMemorySoftLimit(virCgroupPtr group, unsigned long long *kb) { long long unsigned int limit_in_bytes; - int ret; - ret = virCgroupGetValueU64(group, - VIR_CGROUP_CONTROLLER_MEMORY, - "memory.soft_limit_in_bytes", &limit_in_bytes); - if (ret == 0) - *kb = limit_in_bytes >> 10; + int ret = -1; + + if (virCgroupGetValueU64(group, + VIR_CGROUP_CONTROLLER_MEMORY, + "memory.soft_limit_in_bytes", &limit_in_bytes) < 0) + goto cleanup; + + *kb = limit_in_bytes >> 10; + if (*kb > VIR_DOMAIN_MEMORY_PARAM_UNLIMITED) + *kb = VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; + + ret = 0; + cleanup: return ret; }
@@ -2067,12 +2081,19 @@ int virCgroupGetMemSwapHardLimit(virCgroupPtr group, unsigned long long *kb) { long long unsigned int limit_in_bytes; - int ret; - ret = virCgroupGetValueU64(group, - VIR_CGROUP_CONTROLLER_MEMORY, - "memory.memsw.limit_in_bytes", &limit_in_bytes); - if (ret == 0) - *kb = limit_in_bytes >> 10; + int ret = -1; + + if (virCgroupGetValueU64(group, + VIR_CGROUP_CONTROLLER_MEMORY, + "memory.memsw.limit_in_bytes", &limit_in_bytes) < 0) + goto cleanup; + + *kb = limit_in_bytes >> 10; + if (*kb > VIR_DOMAIN_MEMORY_PARAM_UNLIMITED) + *kb = VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; + + ret = 0; + cleanup: return ret; }
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

For dead domains that have no memtune limits, we return 0 instead of "unlimited", this patch fixes it to return PARAM_UNLIMITED. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/qemu/qemu_driver.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 66f2a0e..4b93c02 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -8053,29 +8053,30 @@ qemuDomainGetMemoryParameters(virDomainPtr dom, if (flags & VIR_DOMAIN_AFFECT_CONFIG) { for (i = 0; i < *nparams && i < QEMU_NB_MEM_PARAM; i++) { virMemoryParameterPtr param = ¶ms[i]; + unsigned long long value; switch (i) { case 0: /* fill memory hard limit here */ - if (virTypedParameterAssign(param, - VIR_DOMAIN_MEMORY_HARD_LIMIT, - VIR_TYPED_PARAM_ULLONG, - persistentDef->mem.hard_limit) < 0) + value = persistentDef->mem.hard_limit; + value = value ? value : VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; + if (virTypedParameterAssign(param, VIR_DOMAIN_MEMORY_HARD_LIMIT, + VIR_TYPED_PARAM_ULLONG, value) < 0) goto cleanup; break; case 1: /* fill memory soft limit here */ - if (virTypedParameterAssign(param, - VIR_DOMAIN_MEMORY_SOFT_LIMIT, - VIR_TYPED_PARAM_ULLONG, - persistentDef->mem.soft_limit) < 0) + value = persistentDef->mem.soft_limit; + value = value ? value : VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; + if (virTypedParameterAssign(param, VIR_DOMAIN_MEMORY_SOFT_LIMIT, + VIR_TYPED_PARAM_ULLONG, value) < 0) goto cleanup; break; case 2: /* fill swap hard limit here */ - if (virTypedParameterAssign(param, - VIR_DOMAIN_MEMORY_SWAP_HARD_LIMIT, - VIR_TYPED_PARAM_ULLONG, - persistentDef->mem.swap_hard_limit) < 0) + value = persistentDef->mem.swap_hard_limit; + value = value ? value : VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; + if (virTypedParameterAssign(param, VIR_DOMAIN_MEMORY_SWAP_HARD_LIMIT, + VIR_TYPED_PARAM_ULLONG, value) < 0) goto cleanup; break; -- 1.8.4.4

On Mon, Dec 09, 2013 at 09:22:52AM +0100, Martin Kletzander wrote:
For dead domains that have no memtune limits, we return 0 instead of "unlimited", this patch fixes it to return PARAM_UNLIMITED.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/qemu/qemu_driver.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 66f2a0e..4b93c02 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -8053,29 +8053,30 @@ qemuDomainGetMemoryParameters(virDomainPtr dom, if (flags & VIR_DOMAIN_AFFECT_CONFIG) { for (i = 0; i < *nparams && i < QEMU_NB_MEM_PARAM; i++) { virMemoryParameterPtr param = ¶ms[i]; + unsigned long long value;
switch (i) { case 0: /* fill memory hard limit here */ - if (virTypedParameterAssign(param, - VIR_DOMAIN_MEMORY_HARD_LIMIT, - VIR_TYPED_PARAM_ULLONG, - persistentDef->mem.hard_limit) < 0) + value = persistentDef->mem.hard_limit; + value = value ? value : VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; + if (virTypedParameterAssign(param, VIR_DOMAIN_MEMORY_HARD_LIMIT, + VIR_TYPED_PARAM_ULLONG, value) < 0) goto cleanup; break;
case 1: /* fill memory soft limit here */ - if (virTypedParameterAssign(param, - VIR_DOMAIN_MEMORY_SOFT_LIMIT, - VIR_TYPED_PARAM_ULLONG, - persistentDef->mem.soft_limit) < 0) + value = persistentDef->mem.soft_limit; + value = value ? value : VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; + if (virTypedParameterAssign(param, VIR_DOMAIN_MEMORY_SOFT_LIMIT, + VIR_TYPED_PARAM_ULLONG, value) < 0) goto cleanup; break;
case 2: /* fill swap hard limit here */ - if (virTypedParameterAssign(param, - VIR_DOMAIN_MEMORY_SWAP_HARD_LIMIT, - VIR_TYPED_PARAM_ULLONG, - persistentDef->mem.swap_hard_limit) < 0) + value = persistentDef->mem.swap_hard_limit; + value = value ? value : VIR_DOMAIN_MEMORY_PARAM_UNLIMITED; + if (virTypedParameterAssign(param, VIR_DOMAIN_MEMORY_SWAP_HARD_LIMIT, + VIR_TYPED_PARAM_ULLONG, value) < 0) goto cleanup; break;
Ok, so when setting cgroups limits, we treat '0' as meaning leave the cgroup setting on its default, which is unlimited. ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/qemu/qemu_driver.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 4b93c02..a55c762 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7948,7 +7948,7 @@ qemuDomainSetMemoryParameters(virDomainPtr dom, if (virCompareLimitUlong(mem_limit, swap_limit) > 0) { virReportError(VIR_ERR_INVALID_ARG, "%s", _("memory hard_limit tunable value must be lower " - "than swap_hard_limit")); + "than or equal to swap_hard_limit")); goto cleanup; } } -- 1.8.4.4

On Mon, Dec 09, 2013 at 09:22:53AM +0100, Martin Kletzander wrote:
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/qemu/qemu_driver.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 4b93c02..a55c762 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7948,7 +7948,7 @@ qemuDomainSetMemoryParameters(virDomainPtr dom, if (virCompareLimitUlong(mem_limit, swap_limit) > 0) { virReportError(VIR_ERR_INVALID_ARG, "%s", _("memory hard_limit tunable value must be lower " - "than swap_hard_limit")); + "than or equal to swap_hard_limit")); goto cleanup; } }
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 09.12.2013 09:22, Martin Kletzander wrote:
To see what changed in the kernel, see explanation in PATCH 1/3.
First two patches make proper use of VIR_DOMAIN_MEMORY_PARAM_UNLIMITED, the third one just fixes a "typo".
Martin Kletzander (3): cgroups: Redefine what "unlimited" means wrt memory limits qemu: Report VIR_DOMAIN_MEMORY_PARAM_UNLIMITED properly qemu: Fix minor inconsistency in error message
src/qemu/qemu_driver.c | 27 ++++++++++++------------ src/util/vircgroup.c | 57 ++++++++++++++++++++++++++++++++++---------------- 2 files changed, 53 insertions(+), 31 deletions(-)
ACK series Michal

On Tue, Dec 10, 2013 at 08:59:50AM +0100, Michal Privoznik wrote:
On 09.12.2013 09:22, Martin Kletzander wrote:
To see what changed in the kernel, see explanation in PATCH 1/3.
First two patches make proper use of VIR_DOMAIN_MEMORY_PARAM_UNLIMITED, the third one just fixes a "typo".
Martin Kletzander (3): cgroups: Redefine what "unlimited" means wrt memory limits qemu: Report VIR_DOMAIN_MEMORY_PARAM_UNLIMITED properly qemu: Fix minor inconsistency in error message
src/qemu/qemu_driver.c | 27 ++++++++++++------------ src/util/vircgroup.c | 57 ++++++++++++++++++++++++++++++++++---------------- 2 files changed, 53 insertions(+), 31 deletions(-)
ACK series
Thanks, pushed. Martin
participants (3)
-
Daniel P. Berrange
-
Martin Kletzander
-
Michal Privoznik