
14 Sep
2007
14 Sep
'07
9:22 a.m.
Daniel P. Berrange wrote: > On Tue, Sep 11, 2007 at 04:56:13AM -0400, Daniel Veillard wrote: >> On Tue, Sep 11, 2007 at 12:59:59AM +0100, Daniel P. Berrange wrote: >>> I noticed that when using the SSH tunnel for the remote driver I ended up >>> with alot of zombie SSH processes. We simply forgot to waitpid() on the >>> child when a connection attempt failed, or when shutting down an open remote >>> connection. Attached is a possible patch >> Looks fine, like Rich maybe a bit of refactoring might be good. The >> only worries I have is the following scenario: >> - the ssh process dies >> - libvirt based application takes some time to notice it >> - the OS span a new process with the same PID after a PID rollabck >> (not completely unlikely since the ssh may have been started a long >> time ago) >> - we end-up killing a random process in the system >> >> I think this is mostly avoidable by resetting priv->pid to -1 or 0 on >> any child communication error, and before doing the kill in the patch. >> Even better would be to be able to check that the process corresponding >> to priv->pid is still a child of the current process, I wonder if this >> can be achieved without blocking with an initial waitpid() > > After some testing I believe we can safely do away with kill() completely. > If we simply close(priv->sock) which is our end of the socketpair used > to talk to SSH, then SSH detects the end-of-file condition and exits of > its on accord. So there's no need to kill() - just close the socket and > waitpid() seems to do the trickm, unless I'm missing something bad. +1, but be nicer if that common code was in a function. Rich. -- Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/ Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 03798903