Re: [libvirt] [PATCH] Allow destroying QEMU VM even if a job is active

6 May 2011

On 05/06/2011 05:57 AM, Daniel P. Berrange wrote:
...
Introduce a virProcessKill function that can be safely called
even when the job mutex is held. This allows virDomainDestroy
to kill any VM even if it is asleep in a monitor job. The PID
will die and the thread asleep on the monitor will then wake
up releasing the job mutex.
Very nice.  This will help issues where if you kill a migration at the
source, then the destination is left with a stale VM with the job lock
held waiting for the incoming migration to complete.  With this, you now
have the means to forcibly kill the leftover VM on the destination,
rather than waiting for a network timeout.
...
+++ b/src/qemu/qemu_driver.c
@@ -1482,6 +1482,13 @@ static int qemudDomainDestroy(virDomainPtr dom) {
         goto cleanup;
     }
+    /* Although qemuProcessStop does this already, there may
+     * be an outstanding job active. We want to make sure we
+     * can kill the process even if a job is active. Killing
+     * it now, means the job will be released
s/now,/now/
...
+     */
+    qemuProcessKill(vm);
+
     if (qemuDomainObjBeginJobWithDriver(driver, vm) < 0)
         goto cleanup;

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index a6c0dc8..c60c51f 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2369,6 +2369,36 @@ cleanup:
 }
+void qemuProcessKill(virDomainObjPtr vm)
Do you want this to return -1 on failure to kill, such as if waitpid()
fails?  On the other hand, I guess that best effort is okay; you already
did VIR_DEBUG logging of failures, and the caller can't do much even if
they were to check for a failure.
...
+{
+    int i;
+    int rc;
+    VIR_DEBUG("vm=%s pid=%d", vm->def->name, vm->pid);
+
+    if (!virDomainObjIsActive(vm)) {
+        VIR_DEBUG("VM '%s' not active", vm->def->name);
+        return;
+    }
+
+    for (i = 0 ; i < 15 ; i++) {
+        int signum;
+        if (i == 0)
+            signum = SIGTERM;
+        else if (i == 8)
+            signum = SIGKILL;
+        else
+            signum = 0; /* Just check for existance */
s/existance/existence/

It took me three reads to figure out what this loop is doing; a comment
would be helpful:

/* Try SIGTERM, then wait 1.6 seconds, then try SIGKILL, all while
probing every 200ms to see if the process is gone */
...
+
+        rc = virKillProcess(vm->pid, signum);
+        VIR_DEBUG("Iteration %d rc=%d", i, rc);
+        if (rc < 0)
+            break;
When errno==ESRCH, this breaks the loop because the child is dead.  But
for any other errno, should you log (at least at VIR_DEBUG) the errno
that causes failure, since that means we didn't exit the loop normally?
...
+
+        usleep(200 * 1000);
+    }
+}
+
ACK with those nits addressed.

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org