[libvirt] [PATCH] qemu: Set reasonable RSS limit on domain startup

If there's a memory leak in qemu, or qemu is exploited, the host system will sooner or later start thrashing instead of killing the bad process. That hurts performance and the other guests as well. Therefore we should set a reasonable RSS limit even when the user hasn't set any. It's better to be secure by default.
---
 src/qemu/qemu_cgroup.c |   73 +++++++++++++++++++++++++++---------------------
 1 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index e39f5e1..2c158c1 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -339,42 +339,51 @@ int qemuSetupCgroup(struct qemud_driver *driver,
         }
     }
 
-    if (vm->def->mem.hard_limit != 0 ||
-        vm->def->mem.soft_limit != 0 ||
-        vm->def->mem.swap_hard_limit != 0) {
-        if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_MEMORY)) {
-            if (vm->def->mem.hard_limit != 0) {
-                rc = virCgroupSetMemoryHardLimit(cgroup, vm->def->mem.hard_limit);
-                if (rc != 0) {
-                    virReportSystemError(-rc,
-                                         _("Unable to set memory hard limit for domain %s"),
-                                         vm->def->name);
-                    goto cleanup;
-                }
-            }
-            if (vm->def->mem.soft_limit != 0) {
-                rc = virCgroupSetMemorySoftLimit(cgroup, vm->def->mem.soft_limit);
-                if (rc != 0) {
-                    virReportSystemError(-rc,
-                                         _("Unable to set memory soft limit for domain %s"),
-                                         vm->def->name);
-                    goto cleanup;
-                }
+    if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_MEMORY)) {
+        unsigned long long hard_limit = vm->def->mem.hard_limit;
+
+        if (!hard_limit) {
+            /* If there is no hard_limit set, set a reasonable
+             * one to avoid system thrashing caused by exploited qemu.
+             * As 'reasonable limit' has been chosen:
+             *     (1 + k) * (domain memory) + F
+             * where k = 0.02 and F = 200MB. */
+            hard_limit = vm->def->mem.max_balloon * 1.02 + 204800;
+        }
+
+        rc = virCgroupSetMemoryHardLimit(cgroup, hard_limit);
+        if (rc != 0) {
+            virReportSystemError(-rc,
+                                 _("Unable to set memory hard limit for domain %s"),
+                                 vm->def->name);
+            goto cleanup;
+        }
+        if (vm->def->mem.soft_limit != 0) {
+            rc = virCgroupSetMemorySoftLimit(cgroup, vm->def->mem.soft_limit);
+            if (rc != 0) {
+                virReportSystemError(-rc,
+                                     _("Unable to set memory soft limit for domain %s"),
+                                     vm->def->name);
+                goto cleanup;
             }
+        }
 
-            if (vm->def->mem.swap_hard_limit != 0) {
-                rc = virCgroupSetMemSwapHardLimit(cgroup, vm->def->mem.swap_hard_limit);
-                if (rc != 0) {
-                    virReportSystemError(-rc,
-                                         _("Unable to set swap hard limit for domain %s"),
-                                         vm->def->name);
-                    goto cleanup;
-                }
+        if (vm->def->mem.swap_hard_limit != 0) {
+            rc = virCgroupSetMemSwapHardLimit(cgroup, vm->def->mem.swap_hard_limit);
+            if (rc != 0) {
+                virReportSystemError(-rc,
+                                     _("Unable to set swap hard limit for domain %s"),
+                                     vm->def->name);
+                goto cleanup;
             }
-        } else {
-            qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED,
-                            _("Memory cgroup is not available on this host"));
         }
+    } else if (vm->def->mem.hard_limit != 0 ||
+               vm->def->mem.soft_limit != 0 ||
+               vm->def->mem.swap_hard_limit != 0) {
+        qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                        _("Memory cgroup is not available on this host"));
+    } else {
+        VIR_WARN("Could not autoset a RSS limit for domain %s", vm->def->name);
     }
 
     if (vm->def->cputune.shares != 0) {
-- 
1.7.8.6

On Tue, Jul 17, 2012 at 11:45 AM, Michal Privoznik <mprivozn@redhat.com> wrote:
Looks like a good change. My only question is whether it's better to look up the guest video RAM instead of assuming that QEMU overhead plus guest video RAM will amount to no more than about 200MB (I say "about" because of the 1.02 fudge factor). If it's not really possible to grab that, or it just makes sense to go with a set value like this, then ACK from me.

-- 
Doug Goldstein

On Tue, Jul 17, 2012 at 11:14:43PM -0500, Doug Goldstein wrote:
Looks like a good change. My only question is whether it's better to look up the guest video RAM instead of assuming that QEMU overhead plus guest video RAM will amount to no more than about 200MB (I say "about" because of the 1.02 fudge factor). If it's not really possible to grab that, or it just makes sense to go with a set value like this, then ACK from me.
Yeah, since we go to the trouble of keeping video RAM in the XML, I say we ought to use it, instead of the 200MB fudge factor.

Daniel
-- 
|: http://berrange.com      -o-   http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org             -o-             http://virt-manager.org :|
|: http://autobuild.org      -o-        http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org      -o-      http://live.gnome.org/gtk-vnc :|

On 18/07/2012 22:17, Daniel P. Berrange wrote:
Yeah, since we go to the trouble of keeping video RAM in the XML, I say we ought to use it, instead of the 200MB fudge factor.
Migration can probably consume more memory than this, with the new XBZRLE support that was committed upstream. Orit, how big is the cache?

Paolo

On 07/19/2012 08:10 AM, Paolo Bonzini wrote:
Migration can probably consume more memory than this, with the new XBZRLE support that was committed upstream. Orit, how big is the cache?
It's user-configurable (although libvirt would have to expose that configuration); it defaults to 64M plus overhead.

-- 
Eric Blake   eblake@redhat.com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On 07/19/2012 05:10 PM, Paolo Bonzini wrote:
Migration can probably consume more memory than this, with the new XBZRLE support that was committed upstream. Orit, how big is the cache?
The default cache size is 64M and is allocated on demand; the user can configure the desired cache size. My guess is that the cache size will be part of the migration configuration and can be used to calculate the extra RAM needed for migration with XBZRLE.

Regards,
Orit
participants (6)
- Daniel P. Berrange
- Doug Goldstein
- Eric Blake
- Michal Privoznik
- Orit Wasserman
- Paolo Bonzini