
On Tue, Sep 11, 2007 at 12:59:59AM +0100, Daniel P. Berrange wrote:
> I noticed that when using the SSH tunnel for the remote driver I ended up
> with a lot of zombie SSH processes. We simply forgot to waitpid() on the
> child when a connection attempt failed, or when shutting down an open
> remote connection. Attached is a possible patch.
Looks fine; like Rich, I think a bit of refactoring might be good. The only
worry I have is the following scenario:
  - the ssh process dies
  - the libvirt-based application takes some time to notice it
  - the OS spawns a new process with the same PID after a PID wraparound
    (not completely unlikely, since the ssh may have been started a long
    time ago)
  - we end up killing a random process in the system

I think this is mostly avoidable by resetting priv->pid to -1 or 0 on any
child communication error, and before doing the kill in the patch. Even
better would be to check that the process corresponding to priv->pid is
still a child of the current process; I wonder if this can be achieved
without blocking, with an initial waitpid().

Maybe I'm too cautious; I'm fine with the principle of the patch though.
@@ -646,6 +648,19 @@ doRemoteOpen (virConnectPtr conn, struct
         gnutls_bye (priv->session, GNUTLS_SHUT_RDWR);
         close (priv->sock);
     }
+    if (priv->pid > 0) {
+        pid_t reap;
+        int status, n = 0;
+        kill(priv->pid, SIGTERM);
+        do {
+            if (n)
+                usleep(n*1000);
+            if (n > 3)
+                kill(priv->pid, SIGKILL);
+            reap = waitpid(priv->pid, &status, WNOHANG);
+            n++;
+        } while (reap != -1 && reap != priv->pid);
+    }
     /* Free up the URL and strings. */
     xmlFreeURI (uri);
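For discussion, the loop in the patch could be factored into a standalone function along these lines. This is only a sketch under assumptions: terminate_child() is a hypothetical name, the pid_t pointer stands in for priv->pid, and the 100 ms cap on the poll delay is an arbitrary choice. Unlike the patch, it calls waitpid() before each signal and forgets the PID up front, so it never signals a process that is not a live child.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical standalone variant of the loop in the patch.
 * Polls with WNOHANG, sends SIGTERM on the first pass, escalates to
 * SIGKILL after a few polls, and clears *pid so the same PID cannot be
 * signalled again later.
 * Returns 0 once the child has been reaped, -1 if there was nothing to
 * reap or waitpid() failed. */
static int terminate_child(pid_t *pid)
{
    pid_t target = *pid, reap;
    int status, n;

    if (target <= 0)
        return -1;
    *pid = -1;                      /* forget the PID before anything else */

    for (n = 0; ; n++) {
        do {
            reap = waitpid(target, &status, WNOHANG);
        } while (reap == -1 && errno == EINTR);

        if (reap == target)
            return 0;               /* child reaped, no zombie left */
        if (reap == -1)
            return -1;              /* e.g. ECHILD: not our child */

        /* reap == 0: still a running child of ours, safe to signal. */
        if (n == 0)
            kill(target, SIGTERM);
        else if (n > 3)
            kill(target, SIGKILL);

        /* Back off a little more each round, capped at 100 ms. */
        usleep((n < 100 ? n + 1 : 100) * 1000);
    }
}
```

Both the failed-connection path and the connection-close path could then share this one function, which is the kind of refactoring suggested above.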
Daniel

--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/