On 03/10/2016 12:34 AM, Martin Kletzander wrote:
> I'm not able to reproduce your issue, but that might be because I'm not
> running systemd neither in the container nor in the host. If we can
> reproduce it without systemd, though, that would be very helpful for
> finding out the cause of all this.

We're seeing this on CentOS 7.1, which is systemd based. We determined
that the problem is caused by a container's console buffer filling up.
In a container (or VM) the console is of course not a real physical
device; it's a pseudo tty. With a physical console, when a process
writes something to /dev/console, it appears on the physical console,
and if no one is there to see the text it eventually scrolls off the
screen and is lost. There is no limit to how much text can be sent to
the console.

In the case of a container and its pseudo console, there is a buffer
associated with the console device, and this buffer has a size limit. If
there is an active console session open for a container, any text sent
to the container's console (e.g. by systemd) is consumed by that
session. However, if there is no active console session, the buffer
associated with this pseudo console fills up as processes continue to
write to the container's console device. When this happens, any process
that attempts to write to the container's console blocks and stays
blocked until a console session is started. These hung processes were
the source of our zombie processes.
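
The size limit described above can be observed on any host with a small test. This is a minimal sketch, not part of our setup: it assumes a Unix-like system with pty support, and bytes_until_pty_blocks is a hypothetical helper name. It writes to the slave side of a fresh pty whose master is never read, using non-blocking mode so the write fails where a normal writer would hang.

```python
import os

def bytes_until_pty_blocks(limit=16 * 1024 * 1024):
    """Write to a pty slave nobody drains; return bytes accepted before it would block."""
    master, slave = os.openpty()    # master stays open but is never read
    os.set_blocking(slave, False)   # fail with EAGAIN instead of hanging
    chunk = b"x" * 1024
    total = 0
    try:
        while total < limit:        # safety cap so the loop always ends
            try:
                written = os.write(slave, chunk)
            except BlockingIOError:
                break               # buffer full: a blocking writer would hang here
            if written == 0:
                break
            total += written
    finally:
        os.close(slave)
        os.close(master)
    return total

if __name__ == "__main__":
    print("pty accepted", bytes_until_pty_blocks(), "bytes before blocking")
```

The exact number depends on the kernel, so the sketch only reports it. The point is that it is finite and fairly small.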

We solved the problem by writing a console monitor service that runs on
the hypervisor hosting the containers. It continually monitors the
console devices of all containers. If there is an open console session
for a given container, it does nothing. If, however, there is no active
console session, it opens the console device for the container and
drains it using the following Python code:
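import os
import termios
# console is the container's pty path, e.g. "/dev/pts/3"
fd = os.open(console, os.O_RDWR | os.O_NOCTTY)
termios.tcflush(fd, termios.TCIFLUSH)  # discard buffered, unread console text
os.close(fd)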
For expediency, we do not save the text; tcflush simply discards it.
This is ultimately similar to text scrolling off the top of a physical
console.
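
If the text needed to be kept rather than discarded, the drain could read the pending data instead of flushing it. The following is only a sketch under assumptions, not what our monitor service does: drain_console, wait and max_bytes are hypothetical names, and it briefly switches the pty to raw mode, which may or may not be acceptable for a real console.

```python
import os
import select
import termios
import tty

def drain_console(path, wait=0.2, max_bytes=1 << 20):
    """Read and return pending bytes from a pty slave without blocking forever."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
    saved = termios.tcgetattr(fd)
    chunks = []
    total = 0
    try:
        # Raw mode so partial lines are readable; TCSANOW avoids the input
        # flush that tty.setraw() performs with its default TCSAFLUSH.
        tty.setraw(fd, termios.TCSANOW)
        while total < max_bytes:
            ready, _, _ = select.select([fd], [], [], wait)
            if not ready:
                break               # nothing pending within `wait` seconds
            try:
                data = os.read(fd, 4096)
            except BlockingIOError:
                break
            if not data:
                break
            chunks.append(data)
            total += len(data)
    finally:
        termios.tcsetattr(fd, termios.TCSANOW, saved)  # restore original settings
        os.close(fd)
    return b"".join(chunks)
```

The returned bytes could then be appended to a per-container log file. Errors such as the pty disappearing while it is being read are not handled here.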

So, although this monitor service has solved our issue with zombie
processes, I'm not convinced this is really the right solution. I'd like
to think that if a container is set up correctly, its console device
should not fill up and block processes that attempt to write to it. I
would think this would be a big problem for anyone running containers
under libvirt_lxc.

The problem is easy to reproduce in our environment. Open a console
session to the container and run "cat" with no arguments. Leave it
running and disconnect the console session (control-]). Determine the
container's console device from its XML definition, e.g. /dev/pts/3, and
then copy some large file to it, e.g.
# cp /var/log/messages /dev/pts/3
Assuming the file is larger than the console's backing buffer, this cp
command should hang. If you then open a console session to this
container from another window, you'll see the contents of
/var/log/messages appear on the screen and the cp command in the other
window will exit.
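
The step of looking up the console device from the XML definition can be scripted. A minimal sketch, assuming the running container's XML has been saved to a string: SAMPLE_XML is an illustrative excerpt I made up for this example, not output from our system, and console_paths is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a running container's XML; values are illustrative.
SAMPLE_XML = """
<domain type='lxc'>
  <name>demo</name>
  <devices>
    <console type='pty' tty='/dev/pts/3'>
      <source path='/dev/pts/3'/>
      <target type='lxc' port='0'/>
    </console>
  </devices>
</domain>
"""

def console_paths(domain_xml):
    """Return the host pty paths of all pty-backed consoles in a domain XML string."""
    root = ET.fromstring(domain_xml)
    paths = []
    for console in root.findall("./devices/console[@type='pty']"):
        source = console.find("source")
        # Prefer <source path=...>; fall back to the tty attribute if absent.
        path = source.get("path") if source is not None else console.get("tty")
        if path:
            paths.append(path)
    return paths

if __name__ == "__main__":
    print(console_paths(SAMPLE_XML))
```

The pty path is only assigned while the container is running, so the XML has to come from the live definition rather than the stored configuration.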

If you are unable to reproduce it in your setup following this
procedure, then either something is wrong with my container
configuration or there is something more insidious going on. I'd
appreciate it if you could run a test with this procedure and let me
know the results.
Thanks.
Peter