On 5/23/19 9:22 AM, Daniel P. Berrangé wrote:
> On Wed, May 22, 2019 at 05:16:38PM -0600, Jim Fehlig wrote:
>> Hi All,
>>
>> I recently received an internal bug report of a VM "crashing" due to
>> hitting thread limits. It seems there was an assert in pthread_create
>> within the VM when it hit the limit enforced by the pids controller on
>> the host
>>
>> Apr 28 07:45:46 lpcomp02007 kernel: cgroup: fork rejected by pids controller
>> in /machine.slice/machine-qemu\x2d90028\x2dinstance\x2d0000634b.scope
>>
>> The user has TasksMax set to infinity in machine.slice, but apparently
>> that is not inherited by child scopes and appears to be hardcoded to 16384
>>
>> https://github.com/systemd/systemd/blob/51aba17b88617515e037e8985d3a4ea87...
>>
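For reference, the effective limit on the scope should be inspectable, and
adjustable at runtime, with systemctl; something like the following (scope
name taken from the log above, first output assuming the slice really is
set to infinity):

# systemctl show -p TasksMax machine.slice
TasksMax=infinity
# systemctl set-property --runtime 'machine-qemu\x2d90028\x2dinstance\x2d0000634b.scope' TasksMax=infinity
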
>> The TasksMax property can be set when creating the machine, as is done
>> in the attached proof of concept patch. The question is whether this
>> should be a tunable. My initial thought when seeing the report was that
>> TasksMax could be calculated based on the number of vcpus, iothreads,
>> emulator threads, etc., but it appears that could be quite tricky. The
>> following mail thread describes the basic scenario encountered by my user
>>
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008174.html
>>
>> As you can see, many rbd images attached to a VM can result in an awful
>> lot of threads - 300 images could result in 720K threads! We could punt
>> and set the limit to infinity, but it exists for a reason - fork bomb
>> prevention. A potential compromise between a hardcoded value and a per-VM
>> tunable is a driver tunable in qemu.conf. If a per-VM tunable is
>> preferred, suggestions on where to place it and what to call it would be
>> much appreciated :-).
>
> Yeah, RBD is problematic as you can't predict how many threads it will
> use.
>
> We currently have a "max_processes" setting in qemu.conf for the ulimit
> base process limit. This applies to the user as a whole though, not the
> cgroup.
>
> On Fedora we don't seem to have any "tasks_max" cgroup setting or TasksMax
> systemd setting, at least when running with cgroups v1, so we can't set
> that unconditionally.
>
> I'd be inclined to have a new qemu.conf setting "max_tasks". If this is
> set to 0, then we should just set TasksMax to infinity, otherwise honour
> the setting.
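
Something along these lines in qemu.conf, then? The name and default below
are illustrative only, not a committed interface:

# Maximum number of tasks (threads) a VM's systemd scope may create.
# 0 means no limit, i.e. TasksMax=infinity.
#max_tasks = 32768
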
AFAICT, the TasksMax scope property maps to pids.max in the pids controller
hierarchy. E.g. with the hardcoded 32k value from the POC patch:

# cat /sys/fs/cgroup/pids/machine.slice/machine-qemu\\x2d2\\x2dsles15.scope/pids.max
32768
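
The same value should be visible from the systemd side as well, e.g.:

# systemctl show -p TasksMax 'machine-qemu\x2d2\x2dsles15.scope'
TasksMax=32768
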
Regards,
Jim

>> From 0583ee3b26b2ee43efe8d25226eceb8547400d97 Mon Sep 17 00:00:00 2001
>> From: Jim Fehlig <jfehlig@suse.com>
>> Date: Wed, 22 May 2019 17:12:14 -0600
>> Subject: [PATCH] systemd: set TasksMax when calling CreateMachine
>>
>> An example of how to set TasksMax when creating a scope for a machine.
>>
>> Signed-off-by: Jim Fehlig <jfehlig@suse.com>
>> ---
>>  src/util/virsystemd.c | 10 ++++++----
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/util/virsystemd.c b/src/util/virsystemd.c
>> index 3f03e3bd63..6177447bdb 100644
>> --- a/src/util/virsystemd.c
>> +++ b/src/util/virsystemd.c
>> @@ -341,10 +341,11 @@ int virSystemdCreateMachine(const char *name,
>>                                        (unsigned int)pidleader,
>>                                        NULLSTR_EMPTY(rootdir),
>>                                        nnicindexes, nicindexes,
>> -                                      3,
>> +                                      4,
>>                                        "Slice", "s", slicename,
>>                                        "After", "as", 1, "libvirtd.service",
>> -                                      "Before", "as", 1, "virt-guest-shutdown.target") < 0)
>> +                                      "Before", "as", 1, "virt-guest-shutdown.target",
>> +                                      "TasksMax", "t", UINT64_C(32768)) < 0)
>>              goto cleanup;
>>
>>      if (error.level == VIR_ERR_ERROR) {
>> @@ -382,10 +383,11 @@ int virSystemdCreateMachine(const char *name,
>>                                        iscontainer ? "container" : "vm",
>>                                        (unsigned int)pidleader,
>>                                        NULLSTR_EMPTY(rootdir),
>> -                                      3,
>> +                                      4,
>>                                        "Slice", "s", slicename,
>>                                        "After", "as", 1, "libvirtd.service",
>> -                                      "Before", "as", 1, "virt-guest-shutdown.target") < 0)
>> +                                      "Before", "as", 1, "virt-guest-shutdown.target",
>> +                                      "TasksMax", "t", UINT64_C(32768)) < 0)
>>              goto cleanup;
>>      }
>>
>> --
>> 2.21.0
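
If the value ends up coming from a qemu.conf tunable, the call site could
map the setting with a small helper along these lines. This is only a
sketch under the assumption of Daniel's proposed "max_tasks" semantics
(0 == no limit); the helper name is made up. systemd treats UINT64_MAX as
TasksMax=infinity:

#include <stdint.h>

/* Hypothetical helper: translate a "max_tasks" config value into the
 * value passed for the TasksMax property when creating the scope. */
static uint64_t
virSystemdTasksMaxFromConfig(uint64_t maxTasks)
{
    return maxTasks == 0 ? UINT64_MAX : maxTasks;
}

The hardcoded UINT64_C(32768) in the patch would then be replaced by the
helper's result.
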
> Regards,
> Daniel