[libvirt-users] Zombie processes being created when console buffer is full

We have been researching stuck zombie processes in our libvirt lxc containers. What we found was:
1) Each zombie’s parent was pid 1, init, which is a symlink to systemd.
2) In some cases the zombies were launched by systemd; in others the zombie was inherited.
3) While the child is in the zombie state, the parent process's (systemd's) /proc/1/status shows no pending signals.
4) Attaching gdb to systemd, there was 1 thread, and it was waiting in write(); the file being written was /dev/console.
This write() to the console never returns. We operated under the assumption that systemd's SIGCHLD handler sets a bit and a foreground thread (the only thread) would see that child processes needed reaping. While the single thread is stuck in write(), the reaping never takes place.
So why is write() blocking? The answer seems to be that nothing is draining the console, and eventually write() blocks when its buffers become full. When we attached to the container's console, the buffer was cleared, allowing systemd’s write() to return. The zombies were then reaped and everything went back to normal.
Our “solution” was more of a workaround: systemd was altered to log errors/warnings/etc. to /dev/null instead of /dev/console. This prevented the problem only in the sense that the console buffer was unlikely to fill up, since systemd is generally the only thing that writes to it. This is definitely a hack, though.
This may be a bug in the libvirt container library (you can't expect something to periodically connect to a container's console to empty it out). We suspect there may also be a configuration issue in our containers with regard to the console.
Has anyone else observed this problem?
Peter
_______________________________________________
libvirt-users mailing list
libvirt-users@redhat.com
https://www.redhat.com/mailman/listinfo/libvirt-users
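Findings 1) and 3) can also be checked without gdb by reading /proc. The following is a minimal sketch, not the tooling used in this thread: it lists processes in state Z together with their parent PID (from /proc/<pid>/stat) and reads the SigPnd/ShdPnd pending-signal masks from /proc/<pid>/status. It assumes a Linux /proc layout; the names parse_stat, zombies and pending_signals are illustrative only.

```python
import os
from typing import Dict, Iterator, NamedTuple


class ProcStat(NamedTuple):
    pid: int
    comm: str   # executable name, shown in parentheses in /proc/<pid>/stat
    state: str  # "Z" means zombie
    ppid: int


def parse_stat(line: str) -> ProcStat:
    """Parse the leading fields of a /proc/<pid>/stat line."""
    # comm may itself contain spaces or ')', so split on the *last* ')'.
    lpar = line.index("(")
    rpar = line.rindex(")")
    fields = line[rpar + 1:].split()
    return ProcStat(int(line[:lpar]), line[lpar + 1:rpar], fields[0], int(fields[1]))


def zombies(proc: str = "/proc") -> Iterator[ProcStat]:
    """Yield every process currently in state 'Z'."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                st = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if st.state == "Z":
            yield st


def pending_signals(pid: int = 1, proc: str = "/proc") -> Dict[str, int]:
    """Return the SigPnd (thread) and ShdPnd (process-wide) masks as ints."""
    masks: Dict[str, int] = {}
    with open(os.path.join(proc, str(pid), "status")) as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("SigPnd", "ShdPnd"):
                masks[key] = int(value.strip(), 16)
    return masks
```

Run inside the container, a zombie whose ppid is 1 combined with all-zero masks from pending_signals(1) matches the symptom described in the first message: SIGCHLD was already handled, but pid 1 never got back to reaping.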

On 01/29/2016 05:08 AM, Peter Steele wrote:
Has anyone else observed this problem?
As I mentioned here, I think this may have to do with incorrect container configuration with regard to the console. Much of the process, though, is automated by libvirt itself, so I'm not sure what I might be missing. When a container is created, the XML config has this entry defined:
<console type='pty'>
  <target type='lxc' port='0'/>
</console>
After starting the container, the console config in the XML changes, e.g.:
<console type='pty' tty='/dev/pts/2'>
  <source path='/dev/pts/2'/>
  <target type='lxc' port='0'/>
  <alias name='console0'/>
</console>
In addition to these changes, a new entry is created under /dev/pts:
# ll /dev/pts
crw--w---- 1 root tty 136, 0 Jan 29 08:27 0
crw--w---- 1 root tty 136, 1 Jan 29 08:26 1
crw--w---- 1 root tty 136, 2 Jan 29 09:19 2 <---
crw--w---- 1 root tty 136, 6 Jan 29 09:22 6
c--------- 1 root root 5, 2 Jan 29 07:52 ptmx
In the container spawned by the libvirt_lxc process, a link is created for /dev/console:
# ll /dev/console
lrwxrwxrwx 1 root root 10 Jan 29 09:53 console -> /dev/pts/0
and /dev/pts/0 is also created:
# ll /dev/pts/0
crw--w---- 1 root tty 136, 0 Jan 29 10:05 /dev/pts/0
I'm surprised that the major/minor number for this isn't the same as /dev/pts/2 in the host. I'm also surprised that no agetty process is launched for the container. I'd expect to see something like this running in the container:
# ps aux|grep agetty
root 25577 0.0 0.0 6424 792 pts/2 Ss+ 10:13 0:00 /sbin/agetty --noclear --keep-baud console 115200 38400 9600
I guess libvirt does some magic I'm not aware of to handle the consoles for the containers. The question is: why are we hitting this issue with zombie processes caused by the console buffer filling up?
Peter
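The host-side pty that appears in the running domain XML (here /dev/pts/2) is the device a host tool has to open to reach the container console. A small sketch, assuming Python's standard xml.etree.ElementTree and the two XML shapes shown above; console_pty is a hypothetical helper name, and the XML would typically come from `virsh -c lxc:/// dumpxml <name>`.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def console_pty(domain_xml: str) -> Optional[str]:
    """Return the host-side pty path of the first <console type='pty'>.

    Prefers <source path=.../> and falls back to the tty= attribute.
    Returns None for a defined-but-not-running domain, whose console
    element has no path assigned yet.
    """
    root = ET.fromstring(domain_xml)
    for console in root.iter("console"):
        if console.get("type") != "pty":
            continue
        source = console.find("source")
        if source is not None and source.get("path"):
            return source.get("path")
        return console.get("tty")
    return None
```

On the "after starting" XML above this yields /dev/pts/2; on the "when created" XML it yields None, because libvirt only allocates the pty when the container starts.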

On Fri, Jan 29, 2016 at 10:25:08AM -0800, Peter Steele wrote:
On 01/29/2016 05:08 AM, Peter Steele wrote:
Has anyone else observed this problem?
Unfortunately I did not. How would I go about reproducing it?
As I mentioned here, I think this may have to do with incorrect container configuration with regard to the console. Much of the process, though, is automated by libvirt itself, so I'm not sure what I might be missing. When a container is created, the XML config has this entry defined:
<console type='pty'>
  <target type='lxc' port='0'/>
</console>
After starting the container, the console config in the XML changes, e.g.:
<console type='pty' tty='/dev/pts/2'>
  <source path='/dev/pts/2'/>
  <target type='lxc' port='0'/>
  <alias name='console0'/>
</console>
This all looks fine to me.
In addition to these changes, a new entry is created under /dev/pts:
# ll /dev/pts
crw--w---- 1 root tty 136, 0 Jan 29 08:27 0
crw--w---- 1 root tty 136, 1 Jan 29 08:26 1
crw--w---- 1 root tty 136, 2 Jan 29 09:19 2 <---
crw--w---- 1 root tty 136, 6 Jan 29 09:22 6
c--------- 1 root root 5, 2 Jan 29 07:52 ptmx
In the container spawned by the libvirt_lxc process, a link is created for /dev/console:
# ll /dev/console
lrwxrwxrwx 1 root root 10 Jan 29 09:53 console -> /dev/pts/0
and /dev/pts/0 is also created:
# ll /dev/pts/0
crw--w---- 1 root tty 136, 0 Jan 29 10:05 /dev/pts/0
I'm surprised that the major/minor number for this isn't the same as /dev/pts/2 in the host. I'm also surprised that no agetty process is launched for the container. I'd expect to see something like this running in the container:
The minor number should not be the same, I believe. That's because of namespaces: it's in the container, so it has its own numbering.
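To compare what the host and the container each see, the device numbers can be read with os.stat() plus os.major()/os.minor(). A minimal sketch; dev_numbers is a hypothetical name. Running it on the host against /dev/pts/2 and inside the container against /dev/pts/0 (os.stat follows the /dev/console symlink) should show the same major but different minors, presumably because each devpts instance numbers its own ptys.

```python
import os
from typing import Tuple


def dev_numbers(path: str) -> Tuple[int, int]:
    """Return the (major, minor) numbers of the device node at path."""
    rdev = os.stat(path).st_rdev  # follows symlinks such as /dev/console
    return os.major(rdev), os.minor(rdev)
```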
# ps aux|grep agetty
root 25577 0.0 0.0 6424 792 pts/2 Ss+ 10:13 0:00 /sbin/agetty --noclear --keep-baud console 115200 38400 9600
This should depend on the configuration of your guest. If your guest uses systemd, it should have its configuration for each pty. If there is none, it will probably not start any.
I guess libvirt does some magic I'm not aware of to handle the consoles for the containers. The question is: why are we hitting this issue with zombie processes caused by the console buffer filling up?
I'm not able to reproduce your issue, but that might be because I'm not running systemd in either the container or the host. If we can reproduce it without systemd, though, that would be very helpful for finding out the cause of all this.
Peter

On 03/10/2016 12:34 AM, Martin Kletzander wrote:
I'm not able to reproduce your issue, but that might be because I'm not running systemd in either the container or the host. If we can reproduce it without systemd, though, that would be very helpful for finding out the cause of all this.
We're seeing this on CentOS 7.1, which is systemd based. We were able to determine that the cause of the problem is a container's console buffer being filled.
In a container (or VM) the console is of course not a real physical device; it's a pseudo tty. With a physical console, when some process writes something to /dev/console, it appears on the physical console, and if no one is there to see the text it eventually scrolls off the screen and is lost. There is no limit to how much text can be sent to the console.
In the case of a container and its pseudo console, there is a buffer associated with the console device, and this buffer has a size limit. If there is an active console session open for a container, any text sent to the container's console (e.g. by systemd) is consumed by that session. However, if there is no active console session, then as processes continue to write to the container's console device, the buffer associated with this pseudo console fills up. When this happens, any process that attempts to write to the container's console blocks, and stays blocked until a console session is started. These hung processes were the source of our zombie processes.
We solved the problem by writing a console monitor service that runs on the hypervisor hosting the containers. It continually monitors the console devices of all containers, and if there is an open console session for a given container, it does nothing. If, however, there is no active console session, it opens the console device for the container and drains it using the following Python code:
    import os
    import termios
    fd = os.open(console, os.O_RDWR | os.O_NOCTTY)  # console: host-side pty, e.g. /dev/pts/3
    termios.tcflush(fd, termios.TCIFLUSH)           # discard queued, unread console output
    os.close(fd)
For expediency, we do not save the text that's flushed. This is ultimately similar to text scrolling off the top of a physical console. So, although this monitor service has solved our issue with zombie processes, I'm not convinced this is really the right solution.
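The drain step above can be wrapped into one pass of the monitor service described here. This is a sketch under stated assumptions: the thread does not say how the service detects an open console session, so has_active_session is a hypothetical caller-supplied predicate, and drain_console/monitor_once are illustrative names. The flush itself is the same os.open plus termios.tcflush(TCIFLUSH) call used in the message.

```python
import os
import termios
from typing import Callable, Iterable, List


def drain_console(path: str) -> None:
    """Discard data queued on a host-side console pty that nobody has read."""
    # O_NOCTTY keeps the pty from becoming this process's controlling terminal.
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    try:
        # TCIFLUSH drops data received on this pty but not yet read,
        # i.e. container console output that no session consumed.
        termios.tcflush(fd, termios.TCIFLUSH)
    finally:
        os.close(fd)


def monitor_once(consoles: Iterable[str],
                 has_active_session: Callable[[str], bool]) -> List[str]:
    """One monitoring pass; returns the console paths that were drained.

    has_active_session is a hypothetical predicate supplied by the caller.
    """
    drained = []
    for path in consoles:
        if has_active_session(path):
            continue  # a live console session is already reading this pty
        drain_console(path)
        drained.append(path)
    return drained
```

A service would call monitor_once in a loop with time.sleep() between passes. As the message notes, anything flushed this way is discarded rather than logged.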
I'd like to think that if a container is set up correctly, its console device should not fill up and block processes that attempt to write to it. I would think this would be a big problem for anyone running containers under libvirt_lxc. The problem is easy to reproduce in our environment:
1) Open a console session to the container and run "cat" with no arguments.
2) Leave it running and disconnect the console session (control-]).
3) Determine the container's console device from its XML definition, e.g. /dev/pts/3, and then copy some large file to it, e.g.:
    # cp /var/log/messages /dev/pts/3
Assuming the file is larger than the console's backing buffer, this cp command should hang. If you then open a console session to this container from another window, you'll see the contents of /var/log/messages appear on the screen, and the cp command in the other window will exit.
If you are unable to reproduce it in your setup following this procedure, then either something is wrong with my container configuration or there is something more insidious going on. I'd appreciate it if you could run a test with this procedure and let me know the results.
Thanks.
Peter
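The same effect can be shown without a container: a pseudo-terminal pair from os.openpty() whose master side nobody reads behaves like a console with no session draining it. A minimal sketch; bytes_until_blocked is a hypothetical helper, and it switches the fd to non-blocking mode so the demo reports where a normal write() (like the cp above) would hang instead of actually hanging. The exact byte count depends on the kernel's tty buffering and is not meant to match any particular system.

```python
import os


def bytes_until_blocked(fd: int, chunk: int = 1024, cap: int = 16 << 20) -> int:
    """Write to fd without blocking until the kernel reports it is full.

    Returns the number of bytes accepted. `cap` is only a safety net:
    a result at or above cap means no buffer limit was hit.
    """
    os.set_blocking(fd, False)
    payload = b"x" * chunk  # no newlines, so no output translation applies
    total = 0
    while total < cap:
        try:
            n = os.write(fd, payload)
        except BlockingIOError:
            break  # buffer full: a blocking write() would hang right here
        if n == 0:
            break
        total += n
    return total


if __name__ == "__main__":
    master, slave = os.openpty()
    # Nothing reads `master`, like a container console with no session attached.
    filled = bytes_until_blocked(slave)
    print(f"pty accepted {filled} bytes before write() would block")
    os.close(slave)
    os.close(master)
```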
participants (2): Martin Kletzander, Peter Steele