[libvirt] RFC: Increasing number of processes allowed for qemu user

Hi all,

When libvirt is running on a system with a limited number of processes allowed to be created by the user under which qemu processes are run, we hit scalability issues.

My take on this issue is that it's in fact a host configuration issue and it should just be documented that users need to increase the limit if they're going to run a large number of domains or domains with a large number of VCPUs.

I don't think libvirt should be involved in changing the limit in any way, especially since it's not possible to change the limit at runtime, as doing so can only affect new processes.

However, I'd like to be sure this is also the thinking of the broader community, esp. Dan :-)

Jirka
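To illustrate why a runtime change cannot help QEMU processes that already exist, here is a minimal Linux sketch (illustrative only, not libvirt code). setrlimit(2) changes only the calling process's own limit, and a child gets a copy of that limit at fork time. The helper names spawn_reporter and nproc_inheritance_demo are hypothetical. The demo lowers the soft RLIMIT_NPROC by one, because an unprivileged process may always lower its own soft limit, and restores it afterwards.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that blocks until it can read one byte from go_fd, then
 * writes its own soft RLIMIT_NPROC to out_fd and exits. */
static pid_t spawn_reporter(int go_fd, int out_fd)
{
    pid_t pid = fork();
    if (pid == 0) {
        char c;
        struct rlimit rl;
        if (read(go_fd, &c, 1) != 1 || getrlimit(RLIMIT_NPROC, &rl) != 0)
            _exit(1);
        if (write(out_fd, &rl.rlim_cur, sizeof rl.rlim_cur) != (ssize_t)sizeof rl.rlim_cur)
            _exit(1);
        _exit(0);
    }
    return pid; /* child's pid, or -1 if fork failed */
}

/* Fork one child, lower our own soft RLIMIT_NPROC by one, fork a second
 * child, then collect the soft limit each child sees. Returns 0 on success. */
int nproc_inheritance_demo(rlim_t *orig_cur, rlim_t *early_cur, rlim_t *late_cur)
{
    struct rlimit orig, lowered;
    int go[2], out_early[2], out_late[2];
    int ok = 0;

    assert(orig_cur && early_cur && late_cur);
    if (getrlimit(RLIMIT_NPROC, &orig) != 0 || orig.rlim_cur == 0)
        return -1;
    if (pipe(go) != 0 || pipe(out_early) != 0 || pipe(out_late) != 0)
        return -1;

    /* Started before the change, like a QEMU process that is already running. */
    pid_t early = spawn_reporter(go[0], out_early[1]);
    if (early < 0)
        return -1;

    /* Lowering our own soft limit never requires privilege. */
    lowered = orig;
    lowered.rlim_cur = orig.rlim_cur - 1;
    int changed = setrlimit(RLIMIT_NPROC, &lowered) == 0;

    /* Started after the change, like a newly launched QEMU process. */
    pid_t late = changed ? spawn_reporter(go[0], out_late[1]) : -1;

    /* Drop the parent's write ends so reads cannot block once children exit. */
    close(out_early[1]);
    close(out_late[1]);

    /* Release the children: each one reads exactly one byte. */
    if (write(go[1], "xx", 2) == 2 && late > 0)
        ok = read(out_early[0], early_cur, sizeof *early_cur) == (ssize_t)sizeof *early_cur
          && read(out_late[0], late_cur, sizeof *late_cur) == (ssize_t)sizeof *late_cur;

    waitpid(early, NULL, 0);
    if (late > 0)
        waitpid(late, NULL, 0);
    if (changed)
        setrlimit(RLIMIT_NPROC, &orig); /* raise back to the old soft value (<= hard) */

    close(go[0]);
    close(go[1]);
    close(out_early[0]);
    close(out_late[0]);
    *orig_cur = orig.rlim_cur;
    return ok ? 0 : -1;
}
```

Called from a small main, the child forked before the change should report the original soft limit and the child forked after it the lowered one.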

On Wed, Mar 02, 2011 at 04:28:46PM +0100, Jiri Denemark wrote:
> Hi all,
> When libvirt is running on a system with a limited number of processes allowed to be created by the user under which qemu processes are run, we hit scalability issues.
> My take on this issue is that it's in fact a host configuration issue and it should just be documented that users need to increase the limit if they're going to run a large number of domains or domains with a large number of VCPUs.
> I don't think libvirt should be involved in changing the limit in any way, especially since it's not possible to change the limit at runtime, as doing so can only affect new processes.

If it were possible to change it on the fly, then I think it could be valid to suggest it is libvirt's responsibility. Since ulimits can't be changed for the existing QEMU processes though, the only option is to change it ahead of time at a host level. The default of 1024 is clearly faaar too low for modern systems so, IMHO, someone/thing just needs to place a file at /etc/limits.d/qemu.conf to raise it to, say, 10,000, which a modern Linux system can easily cope with.

The original rationale for the nproc ulimit is to protect against fork bombs, but this is really a terrible solution because it applies at the wrong level. To protect against this properly we would want a limit on the number of children any single QEMU process can spawn, not a limit on the number that the QEMU user can spawn. The obvious place for this is a cgroup tunable, which would mesh nicely with the fact that we put each QEMU in a dedicated group. Someone has written such a patch before, but it was never merged AFAICT:

http://lwn.net/Articles/285337/

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
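A rough back-of-the-envelope check of why 1024 is too low, written as a small C sketch. The per-domain task model (one main thread, one thread per VCPU, plus a guessed number of helper threads) is an assumption for illustration, not a measured QEMU figure. The firm part is that on Linux RLIMIT_NPROC counts threads as well as processes for the whole real user ID, so every domain running as the qemu user draws on the same budget. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Rough count of tasks (processes plus threads) needed by all domains that
 * run as one QEMU user. ASSUMED model: per domain one main thread, one
 * thread per VCPU and some extra helper threads (I/O workers and the like). */
unsigned long long nproc_estimate(unsigned long long domains,
                                  unsigned long long vcpus_per_domain,
                                  unsigned long long helpers_per_domain)
{
    unsigned long long per_domain = 1 + vcpus_per_domain + helpers_per_domain;
    assert(domains == 0 || per_domain <= ULLONG_MAX / domains); /* no overflow */
    return domains * per_domain;
}

/* For an unprivileged user, fork()/clone() fail once the user already owns
 * `limit` tasks, so the estimate fits if it does not exceed the limit. */
bool fits_nproc_limit(unsigned long long needed, unsigned long long limit)
{
    return needed <= limit;
}
```

With the assumed model, 100 domains with 8 VCPUs and 4 helper threads each need 100 * (1 + 8 + 4) = 1300 tasks, which already exceeds 1024 but fits under 10,000. Any other processes owned by the same user count against the same limit.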

On Thu, Mar 03, 2011 at 12:07:12 +0000, Daniel P. Berrange wrote:
> If it were possible to change it on the fly, then I think it could be valid to suggest it is libvirt's responsibility. Since ulimits can't be changed for the existing QEMU processes though, the only option is to change it ahead of time at a host level. The default of 1024 is clearly faaar too low for modern systems so, IMHO, someone/thing just needs to place a file at /etc/limits.d/qemu.conf to raise it to, say, 10,000, which a modern Linux system can easily cope with.

Right, although it's /etc/security/limits.d/..., but the thing is who or what should provide that file. Should it be libvirt, the host admin, or something else? On one hand, I think it shouldn't be libvirt, since it is possible to change the qemu user in /etc/libvirt/qemu.conf and setting such limits should be done by the host admin. On the other hand, for better out-of-the-box behavior, we could generate that file according to how libvirt was configured (i.e., what the default qemu user is) and install it. Which of the options do you prefer?
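If libvirt were to generate the limits file from its configured QEMU user, as suggested above, the content could be produced along these lines. This is a hypothetical sketch: the function name format_nproc_limits is made up, writing both a soft and a hard entry is only one possible choice, and 10000 is simply the value mentioned earlier in the thread. Each line follows the limits.conf "<domain> <type> <item> <value>" layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build limits.d content raising the soft and hard nproc limits for the
 * given QEMU user. Returns the length written, or -1 if the user name is
 * unusable or buf is too small. */
int format_nproc_limits(char *buf, size_t len, const char *user, unsigned long limit)
{
    assert(buf != NULL && user != NULL);
    /* limits.conf fields are whitespace-separated, so reject such names. */
    if (user[0] == '\0' || strpbrk(user, " \t\n") != NULL)
        return -1;
    int n = snprintf(buf, len,
                     "# generated nproc limits for the QEMU user\n"
                     "%s soft nproc %lu\n"
                     "%s hard nproc %lu\n",
                     user, limit, user, limit);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting text would then be installed as a file under /etc/security/limits.d/; the exact file name is left open here.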

> The original rationale for the nproc ulimit is to protect against fork bombs, but this is really a terrible solution because it applies at the wrong level. To protect against this properly we would want a limit on the number of children any single QEMU process can spawn, not a limit on the number that the QEMU user can spawn. The obvious place for this is a cgroup tunable, which would mesh nicely with the fact that we put each QEMU in a dedicated group. Someone has written such a patch before, but it was never merged AFAICT:
> http://lwn.net/Articles/285337/

Yeah, that would be ideal for limiting the number of processes/threads a single domain can use.

Jirka
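The LWN patch was not merged, but mainline Linux later gained a pids cgroup controller (Linux 4.3 and newer) whose pids.max file caps the number of tasks in a group, which is the kind of per-domain tunable discussed here. A minimal sketch, assuming the caller already knows the domain's cgroup directory and that the pids controller is enabled for it; set_pids_max and get_pids_max are hypothetical helper names, and the value "max" means no limit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write a task limit into <cgroup_dir>/pids.max; limit < 0 means "max"
 * (no limit). Returns 0 on success, -1 on error. */
int set_pids_max(const char *cgroup_dir, long limit)
{
    char path[4096];
    assert(cgroup_dir != NULL);
    if (snprintf(path, sizeof path, "%s/pids.max", cgroup_dir) >= (int)sizeof path)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int rc = limit < 0 ? fprintf(f, "max\n") : fprintf(f, "%ld\n", limit);
    /* stdio buffers the write, so a rejected value surfaces at fclose. */
    if (fclose(f) != 0 || rc < 0)
        return -1;
    return 0;
}

/* Read <cgroup_dir>/pids.max back; "max" is reported as -1. */
int get_pids_max(const char *cgroup_dir, long *limit)
{
    char path[4096], line[64], *end;
    assert(cgroup_dir != NULL && limit != NULL);
    if (snprintf(path, sizeof path, "%s/pids.max", cgroup_dir) >= (int)sizeof path)
        return -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *got = fgets(line, sizeof line, f);
    fclose(f);
    if (!got)
        return -1;
    if (strncmp(line, "max", 3) == 0) {
        *limit = -1;
        return 0;
    }
    long v = strtol(line, &end, 10);
    if (end == line || (*end != '\n' && *end != '\0'))
        return -1;
    *limit = v;
    return 0;
}
```

Pointing these helpers at a scratch directory created with mkdtemp(3) is an easy way to try the file handling without touching a real cgroup.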

participants (2)
- Daniel P. Berrange
- Jiri Denemark