[libvirt] [PATCH] Ignore SIGWINCH in remote client call to poll(2) (RHBZ#567931).

-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into Xen guests. http://et.redhat.com/~rjones/virt-p2v

On 02/24/2010 02:06 PM, Richard W.M. Jones wrote:
The correct solution is to mask out SIGWINCH for the duration of the poll(2) system call. The per-thread mask is changed and restored immediately after the call. Since we are using pthread_sigmask, this should not affect other threads, and since we restore the signal mask immediately afterwards it should not affect the current thread visibly either.
This is interesting. No signal handler will necessarily do things like longjmp-ing out of the remote driver, so every signal could give rise to a similar bug. In particular, SIGCHLD would be an obvious candidate for being handled the same way, since both SIGWINCH and SIGCHLD are default-ignored signals. On the other hand, while for SIGWINCH it would be mostly harmless(*), for SIGCHLD it would leave a zombie until the remote libvirtd answers. Any ideas? (*) Unless you have more than one thread using curses, and a thread other than the one calling libvirt has blocked SIGWINCH. Paolo

On Wed, Feb 24, 2010 at 03:39:20PM +0100, Paolo Bonzini wrote:
On 02/24/2010 02:06 PM, Richard W.M. Jones wrote:
The correct solution is to mask out SIGWINCH for the duration of the poll(2) system call. The per-thread mask is changed and restored immediately after the call. Since we are using pthread_sigmask, this should not affect other threads, and since we restore the signal mask immediately afterwards it should not affect the current thread visibly either.
This is interesting. No signal handler will necessarily do things like longjmp-ing out of the remote driver, so every signal could give rise to a similar bug.
Did you mean "signals handlers might longjmp out ..."? I've seen a few in my time that have done that. I notice the top Google hit for "signal handler" & "longjmp" is this warning about the practice from CERT: https://www.securecoding.cert.org/confluence/display/seccode/SIG32-C.+Do+not...
In particular, SIGCHLD would be an obvious candidate for being handled the same way, since both SIGWINCH and SIGCHLD are default-ignored signals. On the other hand, while for SIGWINCH it would be mostly harmless(*), for SIGCHLD it would leave a zombie until the remote libvirtd answers. Any ideas?
(*) Unless you have more than one thread using curses, and a thread other than the one calling libvirt has blocked SIGWINCH.
Certainly SIGCHLD could be added. I was keeping this patch minimal so it just fixes the problem I observed, to reduce the chance that a change to such a critical piece of code could break anything else. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://et.redhat.com/~rjones/libguestfs/ See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html

On 02/24/2010 04:09 PM, Richard W.M. Jones wrote:
No signal handler will necessarily do things like
longjmp-ing out of the remote driver, so every signal could give rise to a similar bug. Did you mean "signals handlers might longjmp out ..."?
No, I meant what I wrote, as longjmp-ing out is the only way to handle signals if you do not have control on the whole code and if you cannot audit the handling of EINTR in each non-restartable system call (notably select/poll). Setting a flag doesn't work in this case. Of course, longjmp would have another share of problems since it would likely leave libvirt's state inconsistent, as in the CERT page that you linked.
In particular, SIGCHLD would be an obvious candidate for being handled the same way, since both SIGWINCH and SIGCHLD are default-ignored signals. On the other hand, while for SIGWINCH it would be mostly harmless(*), for SIGCHLD it would leave a zombie until the remote libvirtd answers. Any ideas?
(*) Unless you have more than one thread using curses, and a thread other than the one calling libvirt has blocked SIGWINCH.
Certainly SIGCHLD could be added. I was keeping this patch minimal so it just fixes the problem I observed, to reduce the chance that a change to such a critical piece of code could break anything else.
Yep, I agree. Paolo

On Wed, Feb 24, 2010 at 03:09:54PM +0000, Richard W.M. Jones wrote:
On Wed, Feb 24, 2010 at 03:39:20PM +0100, Paolo Bonzini wrote:
On 02/24/2010 02:06 PM, Richard W.M. Jones wrote:
The correct solution is to mask out SIGWINCH for the duration of the poll(2) system call. The per-thread mask is changed and restored immediately after the call. Since we are using pthread_sigmask, this should not affect other threads, and since we restore the signal mask immediately afterwards it should not affect the current thread visibly either.
This is interesting. No signal handler will necessarily do things like longjmp-ing out of the remote driver, so every signal could give rise to a similar bug.
Did you mean "signals handlers might longjmp out ..."? I've seen a few in my time that have done that. I notice the top Google hit for "signal handler" & "longjmp" is this warning about the practice from CERT:
https://www.securecoding.cert.org/confluence/display/seccode/SIG32-C.+Do+not...
In particular, SIGCHLD would be an obvious candidate for being handled the same way, since both SIGWINCH and SIGCHLD are default-ignored signals. On the other hand, while for SIGWINCH it would be mostly harmless(*), for SIGCHLD it would leave a zombie until the remote libvirtd answers. Any ideas?
(*) Unless you have more than one thread using curses, and a thread other than the one calling libvirt has blocked SIGWINCH.
Certainly SIGCHLD could be added. I was keeping this patch minimal so it just fixes the problem I observed, to reduce the chance that a change to such a critical piece of code could break anything else.
I think SIGWINCH + SIGCHLD + SIGPIPE are a reasonably set to block since those are common "normal" signals a process may received during operation. Other signals are all typically related to fatal problems rather tha normal conditions Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Wed, Feb 24, 2010 at 03:51:39PM +0000, Daniel P. Berrange wrote:
I think SIGWINCH + SIGCHLD + SIGPIPE are a reasonably set to block since those are common "normal" signals a process may received during operation. Other signals are all typically related to fatal problems rather tha normal conditions
This updated patch includes masking of SIGCHLD and SIGPIPE as well. I tested this patch with virt-top. Another potential change would be to use ppoll(2) instead of poll(2). However I don't think this gives us any extra safety. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://et.redhat.com/~rjones/libguestfs/ See what it can do: http://et.redhat.com/~rjones/libguestfs/recipes.html

On 02/24/2010 05:09 PM, Richard W.M. Jones wrote:
In bug 567931 we found that virt-top would exit occasionally when the terminal window was resized. Tracking this down it turned out that SIGWINCH was being delivered to the process at exactly the point where the libvirt remote driver was calling poll(2) waiting for a reply from libvirtd.
This caused the poll(2) call to be interrupted (returning errno EINTR). However handling EINTR the same way as EAGAIN was not the solution to this problem since we found previously that this would break Ctrl-C handling (commit 47fec8eac2bb3).
The correct solution is to mask out SIGWINCH for the duration of the poll(2) system call. The per-thread mask is changed and restored immediately after the call. Since we are using pthread_sigmask, this should not affect other threads, and since we restore the signal mask immediately afterwards it should not affect the current thread visibly either. Other possibly problematic signals are SIGCHLD and SIGPIPE and these are masked too.
Note use of ignore_value: It's not fatal if we cannot mask out SIGWINCH, and in any case pthread_sigmask never fails on Linux as long as you supply the correct arguments.
ACK. Paolo
participants (3)
-
Daniel P. Berrange
-
Paolo Bonzini
-
Richard W.M. Jones