Re: [Libvir] PATCH: Prevent zombie ssh tunnels

17 Sep 2007


      On Fri, Sep 14, 2007 at 03:14:02AM +0100, Daniel P. Berrange wrote:
...
On Tue, Sep 11, 2007 at 04:56:13AM -0400, Daniel Veillard wrote:
...
On Tue, Sep 11, 2007 at 12:59:59AM +0100, Daniel P. Berrange wrote:
...
I noticed that when using the SSH tunnel for the remote driver I ended up
with alot of zombie SSH processes. We simply forgot to waitpid() on the
child when a connection attempt failed, or when shutting down an open remote
connection. Attached is a possible patch
Looks fine, like Rich maybe a bit of refactoring might be good. The
only worries I have is the following scenario:
   - the ssh process dies
   - libvirt based application takes some time to notice it
   - the OS span a new process with the same PID after a PID rollabck
     (not completely unlikely since the ssh may have been started a long
     time ago)
   - we end-up killing a random process in the system
I think this is mostly avoidable by resetting priv->pid to -1 or 0 on 
any child communication error, and before doing the kill in the patch.
Even better would be to be able to check that the process corresponding
to priv->pid is still a child of the current process, I wonder if this
can be achieved without blocking with an initial waitpid()
After some testing I believe we can safely do away with kill() completely.
If we simply close(priv->sock) which is our end of the socketpair used
to talk to SSH, then SSH detects the end-of-file condition and exits of
its on accord. So there's no need to kill() - just close the socket and
waitpid() seems to do the trickm, unless I'm missing something bad.
Looks fine and safe to me, I like it +1 !

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/