[Libvir] [PATCH] lxc: handle SIGCHLD from exiting container

This patch allows the lxc driver to handle SIGCHLD signals from exiting containers. The handling will perform some cleanup such as waiting for the container process and killing/waiting the tty process. This is also required as a first step towards providing some kind of client container exit notification. Additional support is needed for that but this SIGCHLD handling is what would trigger the notification. libvirtd was already catching SIGCHLD although it was just ignoring it. I implemented a mechanism to distribute the signal to any other drivers in the daemon that registered a function to handle them. This required some changes to the way libvirtd was catching signals (to get the pid of the sending process) as well as an addition to the state driver structure. The intent was to provide future drivers access to signals as well. -- Best Regards, Dave Leskovec IBM Linux Technology Center Open Virtualization

On Wed, Apr 30, 2008 at 11:38:01PM -0700, Dave Leskovec wrote:
This patch allows the lxc driver to handle SIGCHLD signals from exiting containers. The handling will perform some cleanup such as waiting for the container process and killing/waiting the tty process. This is also required as a first step towards providing some kind of client container exit notification. Additional support is needed for that but this SIGCHLD handling is what would trigger the notification.
libvirtd was already catching SIGCHLD although it was just ignoring it. I implemented a mechanism to distribute the signal to any other drivers in the daemon that registered a function to handle them. This required some changes to the way libvirtd was catching signals (to get the pid of the sending process) as well as an addition to the state driver structure. The intent was to provide future drivers access to signals as well.
The reason it was ignoring it was because the QEMU driver detects the shutdown of the VM without using the SIGCHLD directly. It instead detects EOF on the STDOUT/ERR of the VM child process & calls waitpid() then to cleanup. I notice that the LXC driver does not appear to setup any STDERR/OUT for its VMs so they're still inheriting the daemon's. If it isn't a huge problem it'd be desirable to try & have QEMU & LXC operate in the same general way wrt to their primary child procs for VMs. Regards, Daniel. -- |: Red Hat, Engineering, Boston -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Daniel P. Berrange wrote:
On Wed, Apr 30, 2008 at 11:38:01PM -0700, Dave Leskovec wrote:
This patch allows the lxc driver to handle SIGCHLD signals from exiting containers. The handling will perform some cleanup such as waiting for the container process and killing/waiting the tty process. This is also required as a first step towards providing some kind of client container exit notification. Additional support is needed for that but this SIGCHLD handling is what would trigger the notification.
libvirtd was already catching SIGCHLD although it was just ignoring it. I implemented a mechanism to distribute the signal to any other drivers in the daemon that registered a function to handle them. This required some changes to the way libvirtd was catching signals (to get the pid of the sending process) as well as an addition to the state driver structure. The intent was to provide future drivers access to signals as well.
The reason it was ignoring it was because the QEMU driver detects the shutdown of the VM without using the SIGCHLD directly. It instead detects EOF on the STDOUT/ERR of the VM child process & calls waitpid() then to cleanup. I notice that the LXC driver does not appear to setup any STDERR/OUT for its VMs so they're still inheriting the daemon's. If it isn't a huge problem it'd be desirable to try & have QEMU & LXC operate in the same general way wrt to their primary child procs for VMs.
Regards, Daniel.
stdout/err for the container is set to the tty. Containers can be used in a non-VM fashion as well. Think of a container running a daemon process or a container running a job as a part of a job scheduler/distribution system. Wouldn't it be valid in these cases for the container close stdout/err while continuing to run? -- Best Regards, Dave Leskovec IBM Linux Technology Center Open Virtualization

On Mon, May 05, 2008 at 02:33:09PM -0700, Dave Leskovec wrote:
Daniel P. Berrange wrote:
On Wed, Apr 30, 2008 at 11:38:01PM -0700, Dave Leskovec wrote:
This patch allows the lxc driver to handle SIGCHLD signals from exiting containers. The handling will perform some cleanup such as waiting for the container process and killing/waiting the tty process. This is also required as a first step towards providing some kind of client container exit notification. Additional support is needed for that but this SIGCHLD handling is what would trigger the notification.
libvirtd was already catching SIGCHLD although it was just ignoring it. I implemented a mechanism to distribute the signal to any other drivers in the daemon that registered a function to handle them. This required some changes to the way libvirtd was catching signals (to get the pid of the sending process) as well as an addition to the state driver structure. The intent was to provide future drivers access to signals as well.
The reason it was ignoring it was because the QEMU driver detects the shutdown of the VM without using the SIGCHLD directly. It instead detects EOF on the STDOUT/ERR of the VM child process & calls waitpid() then to cleanup. I notice that the LXC driver does not appear to setup any STDERR/OUT for its VMs so they're still inheriting the daemon's. If it isn't a huge problem it'd be desirable to try & have QEMU & LXC operate in the same general way wrt to their primary child procs for VMs.
Regards, Daniel.
stdout/err for the container is set to the tty. Containers can be used in a non-VM fashion as well. Think of a container running a daemon process or a container running a job as a part of a job scheduler/distribution system. Wouldn't it be valid in these cases for the container close stdout/err while continuing to run?
Hmm, yes, that could be a reasonable use case. I see the key difference here is the the immediate child of libvirt *is* the startup application in the container which can be anything. So yes, we can't rely on its use of stderr/out, as we do with QEMU where the immediate child has defined behaviour Dan. -- |: Red Hat, Engineering, Boston -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Dave Leskovec <dlesko@linux.vnet.ibm.com> wrote:
This patch allows the lxc driver to handle SIGCHLD signals from exiting ...
Hi Dave, At least superficially, this looks fine. Two questions:
Index: b/src/driver.h =================================================================== --- a/src/driver.h 2008-04-10 09:54:54.000000000 -0700 +++ b/src/driver.h 2008-04-30 15:36:47.000000000 -0700 @@ -11,6 +11,10 @@
#include <libxml/uri.h>
+#ifndef _SIGNAL_H +#include <signal.h> +#endif
In practice it's fine to include <signal.h> unconditionally, and even multiple times. Have you encountered a version of <signal.h> that may not be included twice? If so, it probably deserves a comment with the details. ...
Index: b/src/lxc_driver.c =================================================================== ... -static int lxcDomainDestroy(virDomainPtr dom) +static int lxcVMCleanup(lxc_driver_t *driver, lxc_vm_t * vm) { int rc = -1; ... - rc = WEXITSTATUS(childStatus); - DEBUG("container exited with rc: %d", rc); + if (WIFEXITED(childStatus)) { + rc = WEXITSTATUS(childStatus); + DEBUG("container exited with rc: %d", rc); + } + + rc = 0;
Didn't you mean to initialize rc=0 before that if block? If not, please add a comment saying why the child failure doesn't affect the function's return value.

Hi Jim, Thanks for the review. Answers below - Jim Meyering wrote:
Dave Leskovec <dlesko@linux.vnet.ibm.com> wrote:
This patch allows the lxc driver to handle SIGCHLD signals from exiting ...
Hi Dave, At least superficially, this looks fine. Two questions:
Index: b/src/driver.h =================================================================== --- a/src/driver.h 2008-04-10 09:54:54.000000000 -0700 +++ b/src/driver.h 2008-04-30 15:36:47.000000000 -0700 @@ -11,6 +11,10 @@
#include <libxml/uri.h>
+#ifndef _SIGNAL_H +#include <signal.h> +#endif
In practice it's fine to include <signal.h> unconditionally, and even multiple times. Have you encountered a version of <signal.h> that may not be included twice? If so, it probably deserves a comment with the details.
No, I don't have any special condition here. This is probably some past conditioning resurfacing briefly. If I remember correctly, it had more to do with compile efficiency rather than avoiding compile failures from multiple inclusions.
...
Index: b/src/lxc_driver.c =================================================================== ... -static int lxcDomainDestroy(virDomainPtr dom) +static int lxcVMCleanup(lxc_driver_t *driver, lxc_vm_t * vm) { int rc = -1; ... - rc = WEXITSTATUS(childStatus); - DEBUG("container exited with rc: %d", rc); + if (WIFEXITED(childStatus)) { + rc = WEXITSTATUS(childStatus); + DEBUG("container exited with rc: %d", rc); + } + + rc = 0;
Didn't you mean to initialize rc=0 before that if block? If not, please add a comment saying why the child failure doesn't affect the function's return value.
Nice. Yes that rc = 0 definitely shouldn't be there. -- Best Regards, Dave Leskovec IBM Linux Technology Center Open Virtualization

Dave Leskovec <dlesko@linux.vnet.ibm.com> wrote: ...
+#ifndef _SIGNAL_H +#include <signal.h> +#endif
In practice it's fine to include <signal.h> unconditionally, and even multiple times. Have you encountered a version of <signal.h> that may not be included twice? If so, it probably deserves a comment with the details.
No, I don't have any special condition here. This is probably some past conditioning resurfacing briefly. If I remember correctly, it had more to do with compile efficiency rather than avoiding compile failures from multiple inclusions.
Then don't bother. gcc performs a handy optimization whereby it doesn't even open the header file the second (and subsequent) time it's included, as long as it's entire contents is wrapped in the usual sort of guard: #ifndef SYM #define SYM ... #endif

Jim Meyering wrote:
Dave Leskovec <dlesko@linux.vnet.ibm.com> wrote: ...
+#ifndef _SIGNAL_H +#include <signal.h> +#endif In practice it's fine to include <signal.h> unconditionally, and even multiple times. Have you encountered a version of <signal.h> that may not be included twice? If so, it probably deserves a comment with the details. No, I don't have any special condition here. This is probably some past conditioning resurfacing briefly. If I remember correctly, it had more to do with compile efficiency rather than avoiding compile failures from multiple inclusions.
Then don't bother. gcc performs a handy optimization whereby it doesn't even open the header file the second (and subsequent) time it's included, as long as it's entire contents is wrapped in the usual sort of guard:
#ifndef SYM #define SYM ... #endif
Thanks Jim. I've attached an updated patch with those two changes. While making these changes, I noticed that I missed updating the storage drivers state driver table. I've fixed that as well. -- Best Regards, Dave Leskovec IBM Linux Technology Center Open Virtualization

Dave Leskovec <dlesko@linux.vnet.ibm.com> wrote:
Jim Meyering wrote:
Dave Leskovec <dlesko@linux.vnet.ibm.com> wrote: ...
+#ifndef _SIGNAL_H +#include <signal.h> +#endif In practice it's fine to include <signal.h> unconditionally, and even multiple times. Have you encountered a version of <signal.h> that may not be included twice? If so, it probably deserves a comment with the details. No, I don't have any special condition here. This is probably some past conditioning resurfacing briefly. If I remember correctly, it had more to do with compile efficiency rather than avoiding compile failures from multiple inclusions.
Then don't bother. gcc performs a handy optimization whereby it doesn't even open the header file the second (and subsequent) time it's included, as long as it's entire contents is wrapped in the usual sort of guard:
#ifndef SYM #define SYM ... #endif
Thanks Jim. I've attached an updated patch with those two changes. While making these changes, I noticed that I missed updating the storage drivers state driver table. I've fixed that as well.
-- ... Index: b/qemud/qemud.c =================================================================== ... +static void sig_handler(int sig, siginfo_t * siginfo, + void* context ATTRIBUTE_UNUSED) { ... - unsigned char sigc; + siginfo_t siginfo; int ret;
- if (read(server->sigread, &sigc, 1) != 1) { + if (read(server->sigread, &siginfo, sizeof(siginfo)) != sizeof(siginfo)) {
Looks good, but that should be saferead instead of "read". Now that it's reading more than one byte, EINTR can make a difference. Also, it'd make it a tiny bit easier for people who reply to you if you did not put code after your signature. Or at least not after the "-- " signature-introducer. Some mail clients (at least Gnus) ">"-quote only the part of your message that comes before the signature.

Jim Meyering wrote: ...
Index: b/qemud/qemud.c =================================================================== ... +static void sig_handler(int sig, siginfo_t * siginfo, + void* context ATTRIBUTE_UNUSED) { ... - unsigned char sigc; + siginfo_t siginfo; int ret;
- if (read(server->sigread, &sigc, 1) != 1) { + if (read(server->sigread, &siginfo, sizeof(siginfo)) != sizeof(siginfo)) {
Looks good, but that should be saferead instead of "read". Now that it's reading more than one byte, EINTR can make a difference.
Yes, I'll change this to saferead()
Also, it'd make it a tiny bit easier for people who reply to you if you did not put code after your signature. Or at least not after the "-- " signature-introducer. Some mail clients (at least Gnus) ">"-quote only the part of your message that comes before the signature.
Well, I've been sending patches as attachments since my early ones were getting mangled. I guess options would be not send them as attachments and fix the mangling, or remove the "-- " signature-introducer. -- Best Regards, Dave Leskovec IBM Linux Technology Center Open Virtualization

Jim Meyering wrote:
Dave Leskovec <dlesko@linux.vnet.ibm.com> wrote:
Jim Meyering wrote:
Dave Leskovec <dlesko@linux.vnet.ibm.com> wrote:
...
Index: b/qemud/qemud.c =================================================================== ... +static void sig_handler(int sig, siginfo_t * siginfo, + void* context ATTRIBUTE_UNUSED) { ... - unsigned char sigc; + siginfo_t siginfo; int ret;
- if (read(server->sigread, &sigc, 1) != 1) { + if (read(server->sigread, &siginfo, sizeof(siginfo)) != sizeof(siginfo)) {
Looks good, but that should be saferead instead of "read". Now that it's reading more than one byte, EINTR can make a difference.
This latest version of this patch changes the read() to a saferead() as Jim points out. -- Best Regards, Dave Leskovec IBM Linux Technology Center Open Virtualization
participants (3)
-
Daniel P. Berrange
-
Dave Leskovec
-
Jim Meyering