On 19.03.2014 12:10, Carlos Rodrigues wrote:
Hello Michal,
I am using libvirt 1.1.3 and perl-Sys-Virt 1.1.3 and perl-5.16 on Fedora
19 x86_64
The zombie processes appear after opening a libvirt connection with
qemu-tls; the Perl module is an XS binding for the libvirt library.
Here is my running example with zombie process:
$ perl test-chldhandle-bug-fixed.pl & sleep 15 && echo && ps axf |
grep perl && echo
[2] 12427
init... pid=12427
while...
fork 1
end... pid=12430
receive chld
fork 2
end... pid=12431
receive chld
2014-03-19 11:06:38.712+0000: 12427: info : libvirt version: 1.1.3.1, package: 2.fc19
(Unknown, 2014-03-17-15:02:00, cmar-laptop.lan)
2014-03-19 11:06:38.712+0000: 12427: warning : virNetTLSContextCheckCertificate:1140 :
Certificate check failed Certificate [session] owner does not match the hostname
10.10.4.249
connection open
fork 3
end... pid=12432
fork 4
end... pid=12440
12427 pts/2 S 0:00 | \_ perl test-chldhandle-bug-fixed.pl
12432 pts/2 Z 0:00 | | \_ [perl] <defunct>
12440 pts/2 Z 0:00 | | \_ [perl] <defunct>
12442 pts/2 S+ 0:00 | \_ grep --color=auto perl
Aha! It seems this only happens when using tls; I was unable to
reproduce it with tcp or unix sockets. And when using tcp I can see
SIGCHLD being delivered, while with tls it is not. That makes me wonder
whether either libvirt or gnutls silently changes the signal mask and
does not restore it. Because if I take a look at the signal mask, I can
clearly see SIGCHLD being blocked (from /proc/$pid/status):
SigPnd: 0000000000000000
ShdPnd: 0000000000010000
SigBlk: 0000000008011000
SigIgn: 0000000000001080
SigCgt: 0000000180010000
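For reference, a minimal sketch of how such a mask can be decoded,
assuming the Linux convention that bit N-1 of the hexadecimal value
stands for signal number N. The helper names mask_has_signal() and
mask_of_three() are made up for illustration and are not part of libvirt.

```c
#include <signal.h>

/* Bit N-1 of a /proc/<pid>/status mask stands for signal number N. */
static int mask_has_signal(unsigned long long mask, int signo)
{
    return (int)((mask >> (signo - 1)) & 1ULL);
}

/* Mask that results from blocking exactly SIGPIPE, SIGCHLD and SIGWINCH. */
static unsigned long long mask_of_three(void)
{
    return (1ULL << (SIGPIPE - 1))
         | (1ULL << (SIGCHLD - 1))
         | (1ULL << (SIGWINCH - 1));
}
```

With the signal numbers used on Linux x86_64 (SIGPIPE 13, SIGCHLD 17,
SIGWINCH 28), mask_of_three() equals the SigBlk value above, 0x8011000.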
What we can see here is that SigBlk (the bitmask of blocked signals)
contains 0x8011000, which is SIGPIPE, SIGCHLD and SIGWINCH. Right, why
would libvirt care about SIGWINCH anyway? Git grepping for it leads us to
virNetClientSetTLSSession(). I can clearly see that there we add just
those three signals to a mask, set this mask just prior to calling
poll(), and then restore the old one. Oh wait, we do not!
pthread_sigmask(SIG_BLOCK,...) only adds new signals to the mask, it
does not overwrite the old one, so the three signals stay blocked after
the "restore". So yes, this is clearly a libvirt bug.
If I use SIG_SETMASK there, I no longer get any zombies. I'll
post the patch shortly.
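
The difference between the two modes can be sketched in isolation. This
is not the libvirt code, only an illustration under the assumption that
the old mask is saved before blocking and then passed back either with
SIG_BLOCK or with SIG_SETMASK. The function names block_then_restore()
and sigchld_is_blocked() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Returns 1 if SIGCHLD is blocked in the calling thread, 0 if not. */
static int sigchld_is_blocked(void)
{
    sigset_t cur;

    /* A NULL new set only queries the current mask. */
    if (pthread_sigmask(SIG_BLOCK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, SIGCHLD);
}

/*
 * Block SIGPIPE, SIGCHLD and SIGWINCH, then hand the saved old mask back
 * using restore_how (SIG_BLOCK or SIG_SETMASK). Returns whether SIGCHLD
 * is still blocked after that "restore".
 */
static int block_then_restore(int restore_how)
{
    sigset_t three, entry, old;
    int still_blocked;

    sigemptyset(&three);
    sigaddset(&three, SIGPIPE);
    sigaddset(&three, SIGCHLD);
    sigaddset(&three, SIGWINCH);

    /* Start from a known state: the three signals unblocked. */
    pthread_sigmask(SIG_UNBLOCK, &three, &entry);

    pthread_sigmask(SIG_BLOCK, &three, &old);
    /* ... the poll() call would sit here ... */
    pthread_sigmask(restore_how, &old, NULL);

    still_blocked = sigchld_is_blocked();

    /* Put back whatever mask the caller had on entry. */
    pthread_sigmask(SIG_SETMASK, &entry, NULL);
    return still_blocked;
}
```

block_then_restore(SIG_BLOCK) reports SIGCHLD as still blocked, because
SIG_BLOCK forms the union of the current and the given set, while
block_then_restore(SIG_SETMASK) reports it as unblocked again. On older
glibc versions this needs to be linked with -pthread.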
Michal