On Tue, Sep 11, 2007 at 12:59:59AM +0100, Daniel P. Berrange wrote:
I noticed that when using the SSH tunnel for the remote driver I
ended up
with alot of zombie SSH processes. We simply forgot to waitpid() on the
child when a connection attempt failed, or when shutting down an open remote
connection. Attached is a possible patch
Looks fine, like Rich maybe a bit of refactoring might be good. The
only worries I have is the following scenario:
- the ssh process dies
- libvirt based application takes some time to notice it
- the OS span a new process with the same PID after a PID rollabck
(not completely unlikely since the ssh may have been started a long
time ago)
- we end-up killing a random process in the system
I think this is mostly avoidable by resetting priv->pid to -1 or 0 on
any child communication error, and before doing the kill in the patch.
Even better would be to be able to check that the process corresponding
to priv->pid is still a child of the current process, I wonder if this
can be achieved without blocking with an initial waitpid()
Maybe I'm too cautious, I'm fine with the principle of the patch though
@@ -646,6 +648,19 @@ doRemoteOpen (virConnectPtr conn, struct
gnutls_bye (priv->session, GNUTLS_SHUT_RDWR);
close (priv->sock);
}
+ if (priv->pid > 0) {
+ pid_t reap;
+ int status, n = 0;
+ kill(priv->pid, SIGTERM);
+ do {
+ if (n)
+ usleep(n*1000);
+ if (n > 3)
+ kill(priv->pid, SIGKILL);
+ reap = waitpid(priv->pid, &status, WNOHANG);
+ n++;
+ } while (reap != -1 && reap != priv->pid);
+ }
/* Free up the URL and strings. */
xmlFreeURI (uri);
Daniel
--
Red Hat Virtualization group
http://redhat.com/virtualization/
Daniel Veillard | virtualization library
http://libvirt.org/
veillard(a)redhat.com | libxml GNOME XML XSLT toolkit
http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine
http://rpmfind.net/