[libvirt] [PATCH] Allow destroying QEMU VM even if a job is active

Introduce a virProcessKill function that can be safely called even when the job mutex is held. This allows virDomainDestroy to kill any VM even if it is asleep in a monitor job. The PID will die and the thread asleep on the monitor will then wake up releasing the job mutex. * src/qemu/qemu_driver.c: Kill process before using qemuProcessStop to ensure job is released * src/qemu/qemu_process.c: Add virProcessKill for killing off QEMU processes --- src/qemu/qemu_driver.c | 7 +++++++ src/qemu/qemu_process.c | 39 +++++++++++++++++++++++++++++++-------- src/qemu/qemu_process.h | 2 ++ 3 files changed, 40 insertions(+), 8 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index b3f9e00..6d6fb51 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1482,6 +1482,13 @@ static int qemudDomainDestroy(virDomainPtr dom) { goto cleanup; } + /* Although qemuProcessStop does this already, there may + * be an outstanding job active. We want to make sure we + * can kill the process even if a job is active. Killing + * it now, means the job will be released + */ + qemuProcessKill(vm); + if (qemuDomainObjBeginJobWithDriver(driver, vm) < 0) goto cleanup; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index a6c0dc8..c60c51f 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2369,6 +2369,36 @@ cleanup: } +void qemuProcessKill(virDomainObjPtr vm) +{ + int i; + int rc; + VIR_DEBUG("vm=%s pid=%d", vm->def->name, vm->pid); + + if (!virDomainObjIsActive(vm)) { + VIR_DEBUG("VM '%s' not active", vm->def->name); + return; + } + + for (i = 0 ; i < 15 ; i++) { + int signum; + if (i == 0) + signum = SIGTERM; + else if (i == 8) + signum = SIGKILL; + else + signum = 0; /* Just check for existance */ + + rc = virKillProcess(vm->pid, signum); + VIR_DEBUG("Iteration %d rc=%d", i, rc); + if (rc < 0) + break; + + usleep(200 * 1000); + } +} + + void qemuProcessStop(struct qemud_driver *driver, virDomainObjPtr vm, int migrated) @@ -2436,13 +2466,6 @@ void qemuProcessStop(struct qemud_driver *driver, } } - /* This will safely handle a non-running guest with pid=0 or pid=-1*/ - if (virKillProcess(vm->pid, 0) == 0 && - virKillProcess(vm->pid, SIGTERM) < 0) - virReportSystemError(errno, - _("Failed to send SIGTERM to %s (%d)"), - vm->def->name, vm->pid); - if (priv->mon) qemuMonitorClose(priv->mon); @@ -2454,7 +2477,7 @@ void qemuProcessStop(struct qemud_driver *driver, } /* shut it off for sure */ - virKillProcess(vm->pid, SIGKILL); + qemuProcessKill(vm); /* now that we know it's stopped call the hook if present */ if (virHookPresent(VIR_HOOK_DRIVER_QEMU)) { diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h index f1ab599..d8afab0 100644 --- a/src/qemu/qemu_process.h +++ b/src/qemu/qemu_process.h @@ -49,4 +49,6 @@ void qemuProcessStop(struct qemud_driver *driver, virDomainObjPtr vm, int migrated); +void qemuProcessKill(virDomainObjPtr vm); + #endif /* __QEMU_PROCESS_H__ */ -- 1.7.4.4

On 05/06/2011 05:57 AM, Daniel P. Berrange wrote:
Introduce a virProcessKill function that can be safely called even when the job mutex is held. This allows virDomainDestroy to kill any VM even if it is asleep in a monitor job. The PID will die and the thread asleep on the monitor will then wake up releasing the job mutex.
Very nice. This will help issues where if you kill a migration at the source, then the destination is left with a stale VM with the job lock held waiting for the incoming migration to complete. With this, you now have the means to forcibly kill the leftover VM on the destination, rather than waiting for a network timeout.
+++ b/src/qemu/qemu_driver.c @@ -1482,6 +1482,13 @@ static int qemudDomainDestroy(virDomainPtr dom) { goto cleanup; }
+ /* Although qemuProcessStop does this already, there may + * be an outstanding job active. We want to make sure we + * can kill the process even if a job is active. Killing + * it now, means the job will be released
s/now,/now/
+ */ + qemuProcessKill(vm); + if (qemuDomainObjBeginJobWithDriver(driver, vm) < 0) goto cleanup;
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index a6c0dc8..c60c51f 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2369,6 +2369,36 @@ cleanup: }
+void qemuProcessKill(virDomainObjPtr vm)
Do you want this to return -1 on failure to kill, such as if waitpid() fails? On the other hand, I guess that best effort is okay; you already did VIR_DEBUG logging of failures, and the caller can't do much even if they were to check for a failure.
+{ + int i; + int rc; + VIR_DEBUG("vm=%s pid=%d", vm->def->name, vm->pid); + + if (!virDomainObjIsActive(vm)) { + VIR_DEBUG("VM '%s' not active", vm->def->name); + return; + } + + for (i = 0 ; i < 15 ; i++) { + int signum; + if (i == 0) + signum = SIGTERM; + else if (i == 8) + signum = SIGKILL; + else + signum = 0; /* Just check for existance */
s/existance/existence/ It took me three reads to figure out what this loop is doing; a comment would be helpful: /* Try SIGTERM, then wait 1.6 seconds, then try SIGKILL, all while probing every 200ms to see if the process is gone */
+ + rc = virKillProcess(vm->pid, signum); + VIR_DEBUG("Iteration %d rc=%d", i, rc); + if (rc < 0) + break;
When errno==ESRCH, this breaks the loop because the child is dead. But for any other errno, should you log (at least at VIR_DEBUG) the errno that causes failure, since that means we didn't exit the loop normally?
+ + usleep(200 * 1000); + } +} +
ACK with those nits addressed. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Fri, May 06, 2011 at 08:34:43AM -0600, Eric Blake wrote:
On 05/06/2011 05:57 AM, Daniel P. Berrange wrote:
Introduce a virProcessKill function that can be safely called even when the job mutex is held. This allows virDomainDestroy to kill any VM even if it is asleep in a monitor job. The PID will die and the thread asleep on the monitor will then wake up releasing the job mutex.
Very nice. This will help issues where if you kill a migration at the source, then the destination is left with a stale VM with the job lock held waiting for the incoming migration to complete. With this, you now have the means to forcibly kill the leftover VM on the destination, rather than waiting for a network timeout.
Actually this is the one use case it won't help with. THe migration thing is a bit of a nasty hack / abuse. 'Prepare' obtains the job mutex, and 'Finish' releases it. Neither of them are actually monitoring the QEMU process, so it won't get released. I'll need some other special case code for that. This patch will help for any case where a API command is stuck waiting for a monitor command reply.
+++ b/src/qemu/qemu_driver.c @@ -1482,6 +1482,13 @@ static int qemudDomainDestroy(virDomainPtr dom) { goto cleanup; }
+ /* Although qemuProcessStop does this already, there may + * be an outstanding job active. We want to make sure we + * can kill the process even if a job is active. Killing + * it now, means the job will be released
s/now,/now/
+ */ + qemuProcessKill(vm); + if (qemuDomainObjBeginJobWithDriver(driver, vm) < 0) goto cleanup;
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index a6c0dc8..c60c51f 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2369,6 +2369,36 @@ cleanup: }
+void qemuProcessKill(virDomainObjPtr vm)
Do you want this to return -1 on failure to kill, such as if waitpid() fails? On the other hand, I guess that best effort is okay; you already did VIR_DEBUG logging of failures, and the caller can't do much even if they were to check for a failure.
I didn't bother to return any value, because all the callers are going to ignore it anyway.
+{ + int i; + int rc; + VIR_DEBUG("vm=%s pid=%d", vm->def->name, vm->pid); + + if (!virDomainObjIsActive(vm)) { + VIR_DEBUG("VM '%s' not active", vm->def->name); + return; + } + + for (i = 0 ; i < 15 ; i++) { + int signum; + if (i == 0) + signum = SIGTERM; + else if (i == 8) + signum = SIGKILL; + else + signum = 0; /* Just check for existance */
s/existance/existence/
It took me three reads to figure out what this loop is doing; a comment would be helpful:
/* Try SIGTERM, then wait 1.6 seconds, then try SIGKILL, all while probing every 200ms to see if the process is gone */
+ + rc = virKillProcess(vm->pid, signum); + VIR_DEBUG("Iteration %d rc=%d", i, rc); + if (rc < 0) + break;
When errno==ESRCH, this breaks the loop because the child is dead. But for any other errno, should you log (at least at VIR_DEBUG) the errno that causes failure, since that means we didn't exit the loop normally?
Yeah i guess so.
+ + usleep(200 * 1000); + } +} +
ACK with those nits addressed.
Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (2)
-
Daniel P. Berrange
-
Eric Blake