Hi All,
I recently received an internal bug report of a VM "crashing" due to hitting
thread limits. It seems there was an assert in pthread_create within the VM
when it hit the limit enforced by the pids controller on the host:
Apr 28 07:45:46 lpcomp02007 kernel: cgroup: fork rejected by pids controller in
/machine.slice/machine-qemu\x2d90028\x2dinstance\x2d0000634b.scope
The user has TasksMax set to infinity in machine.slice, but apparently that is
not inherited by child scopes, where the limit appears to be hardcoded to 16384:
https://github.com/systemd/systemd/blob/51aba17b88617515e037e8985d3a4ea87...
The TasksMax property can be set when creating the machine, as is done in the
attached proof-of-concept patch. The question is whether this should be a
tunable.
My initial thought when seeing the report was that TasksMax could be calculated
from the number of vcpus, iothreads, emulator threads, etc., but that appears
to be quite tricky. The following mail thread describes the basic scenario
encountered by my user:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008174.html
As that thread shows, many rbd images attached to a VM can result in an awful
lot of threads: 300 images could result in 720K threads! We could punt and set
the limit to infinity, but the limit exists for a reason: fork bomb prevention.
A potential compromise between a hardcoded value and a per-VM tunable is a
driver tunable in qemu.conf. If a per-VM tunable is preferred, suggestions on
where to place it and what to call it would be much appreciated :-).
Regards,
Jim