[libvirt] [PATCH] qemu: never send SIGKILL to qemu process unless specifically requested

When libvirt is shutting down the qemu process, it first sends SIGTERM, then waits for 1.6 seconds and, if it sees the process still there, sends a SIGKILL. There have been reports that this behavior can lead to data loss because the guest running in qemu doesn't have time to flush its disk cache buffers before it's unceremoniously whacked.

One suggestion on how to solve that problem was to remove SIGKILL from the normal virDomainDestroyFlags, but still provide the ability to kill qemu with SIGKILL by using a new flag to virDomainDestroyFlags. This patch is a quick attempt at that in order to start a conversation on the topic.

So what are your opinions? Is this the right way to solve the problem? Any better ideas?
---
 include/libvirt/libvirt.h.in |   12 ++++++------
 src/qemu/qemu_driver.c       |    7 +++++--
 src/qemu/qemu_process.c      |   40 ++++++++++++++++++++++------------------
 src/qemu/qemu_process.h      |    8 +++++++-
 4 files changed, 40 insertions(+), 27 deletions(-)

diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
index e99cd00..f9ac865 100644
--- a/include/libvirt/libvirt.h.in
+++ b/include/libvirt/libvirt.h.in
@@ -1182,12 +1182,6 @@ virConnectPtr virDomainGetConnect (virDomainPtr domain);
  * Domain creation and destruction
  */
-
-/*
- * typedef enum {
- * } virDomainDestroyFlagsValues;
- */
-
 virDomainPtr virDomainCreateXML (virConnectPtr conn,
                                  const char *xmlDesc,
                                  unsigned int flags);
@@ -1222,6 +1216,12 @@ int virDomainReset (virDomainPtr domain,
                                  unsigned int flags);
 int virDomainDestroy (virDomainPtr domain);
+
+typedef enum {
+    VIR_DOMAIN_DESTROY_NONE      = 0,      /* Default behavior */
+    VIR_DOMAIN_DESTROY_IMMEDIATE = 1 << 0, /* immediately kill, could lead to data loss!! */
+} virDomainDestroyFlagsValues;
+
 int virDomainDestroyFlags (virDomainPtr domain,
                            unsigned int flags);
 int virDomainRef (virDomainPtr domain);
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index ab69dca..0e80241 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -1754,7 +1754,7 @@ qemuDomainDestroyFlags(virDomainPtr dom,
     virDomainEventPtr event = NULL;
     qemuDomainObjPrivatePtr priv;

-    virCheckFlags(0, -1);
+    virCheckFlags(VIR_DOMAIN_DESTROY_IMMEDIATE, -1);

     qemuDriverLock(driver);
     vm = virDomainFindByUUID(&driver->domains, dom->uuid);
@@ -1775,7 +1775,10 @@ qemuDomainDestroyFlags(virDomainPtr dom,
      * can kill the process even if a job is active. Killing
      * it now means the job will be released
      */
-    qemuProcessKill(vm, false);
+    if (flags & VIR_DOMAIN_DESTROY_IMMEDIATE)
+        qemuProcessKill(vm, VIR_QEMU_PROCESS_KILL_IMMEDIATELY);
+    else
+        qemuProcessKill(vm, VIR_QEMU_PROCESS_KILL_FORCEFULLY);

     /* We need to prevent monitor EOF callback from doing our work (and sending
      * misleading events) while the vm is unlocked inside BeginJob API
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index d22020b..efbe46c 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -586,7 +586,7 @@ endjob:
 cleanup:
     if (vm) {
         if (ret == -1)
-            qemuProcessKill(vm, false);
+            qemuProcessKill(vm, VIR_QEMU_PROCESS_KILL_FORCEFULLY);
         if (virDomainObjUnref(vm) > 0)
             virDomainObjUnlock(vm);
     }
@@ -611,12 +611,12 @@ qemuProcessShutdownOrReboot(struct qemud_driver *driver,
                             qemuProcessFakeReboot,
                             vm) < 0) {
             VIR_ERROR(_("Failed to create reboot thread, killing domain"));
-            qemuProcessKill(vm, true);
+            qemuProcessKill(vm, VIR_QEMU_PROCESS_KILL_GRACEFULLY);
             /* Safe to ignore value since ref count was incremented above */
             ignore_value(virDomainObjUnref(vm));
         }
     } else {
-        qemuProcessKill(vm, true);
+        qemuProcessKill(vm, VIR_QEMU_PROCESS_KILL_GRACEFULLY);
     }
 }
@@ -3524,30 +3524,34 @@ cleanup:
 }

-void qemuProcessKill(virDomainObjPtr vm, bool gracefully)
+void
+qemuProcessKill(virDomainObjPtr vm, unsigned int killMode)
 {
     int i;
-    VIR_DEBUG("vm=%s pid=%d gracefully=%d",
-              vm->def->name, vm->pid, gracefully);
+    VIR_DEBUG("vm=%s pid=%d killMode=%d",
+              vm->def->name, vm->pid, killMode);

     if (!virDomainObjIsActive(vm)) {
         VIR_DEBUG("VM '%s' not active", vm->def->name);
         return;
     }

-    /* This loop sends SIGTERM, then waits a few iterations
-     * (1.6 seconds) to see if it dies. If still alive then
-     * it does SIGKILL, and waits a few more iterations (1.6
-     * seconds more) to confirm that it has really gone.
+    /* This loop sends SIGTERM (or SIGKILL if killMode is
+     * VIR_QEMU_PROCESS_KILL_IMMEDIATELY), then waits a few iterations
+     * (3.2 seconds) to see if it dies. Note that the IMMEDIATELY mode
+     * will very likely result in lost data in the guest, so it should
+     * only be used if the guest is hung and can't be destroyed in any
+     * other manner.
      */
-    for (i = 0 ; i < 15 ; i++) {
+    for (i = 0 ; i < 16 ; i++) {
         int signum;
-        if (i == 0)
-            signum = SIGTERM;
-        else if (i == 8)
-            signum = SIGKILL;
-        else
+        if (i == 0) {
+            if (killMode == VIR_QEMU_PROCESS_KILL_IMMEDIATELY)
+                signum = SIGKILL;
+            else
+                signum = SIGTERM;
+        } else {
             signum = 0; /* Just check for existence */
+        }

         if (virKillProcess(vm->pid, signum) < 0) {
             if (errno != ESRCH) {
@@ -3558,7 +3562,7 @@ void qemuProcessKill(virDomainObjPtr vm, bool gracefully)
             break;
         }

-        if (i == 0 && gracefully)
+        if (i == 0 && (killMode == VIR_QEMU_PROCESS_KILL_GRACEFULLY))
             break;

         usleep(200 * 1000);
@@ -3651,7 +3655,7 @@ void qemuProcessStop(struct qemud_driver *driver,
     }

     /* shut it off for sure */
-    qemuProcessKill(vm, false);
+    qemuProcessKill(vm, VIR_QEMU_PROCESS_KILL_FORCEFULLY);

     /* Stop autodestroy in case guest is restarted */
     qemuProcessAutoDestroyRemove(driver, vm);
diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
index 062d79e..755d4a6 100644
--- a/src/qemu/qemu_process.h
+++ b/src/qemu/qemu_process.h
@@ -68,7 +68,13 @@ int qemuProcessAttach(virConnectPtr conn,
                       virDomainChrSourceDefPtr monConfig,
                       bool monJSON);

-void qemuProcessKill(virDomainObjPtr vm, bool gracefully);
+typedef enum {
+    VIR_QEMU_PROCESS_KILL_GRACEFULLY  = 0,
+    VIR_QEMU_PROCESS_KILL_FORCEFULLY  = 1,
+    VIR_QEMU_PROCESS_KILL_IMMEDIATELY = 2,
+} virQemuProcessKillMode;
+
+void qemuProcessKill(virDomainObjPtr vm, unsigned int killMode);

 int qemuProcessAutoDestroyInit(struct qemud_driver *driver);
 void qemuProcessAutoDestroyRun(struct qemud_driver *driver,
-- 
1.7.7.5

I forgot to mention in the subject line that this is an RFC patch only. I haven't tested it or put in any documentation, and there are likely other things wrong with it too, even if the concept itself is agreeable.

On 01/27/2012 11:35 AM, Laine Stump wrote:
When libvirt is shutting down the qemu process, it first sends SIGTERM, then waits for 1.6 seconds and, if it sees the process still there, sends a SIGKILL.
There have been reports that this behavior can lead to data loss because the guest running in qemu doesn't have time to flush its disk cache buffers before it's unceremoniously whacked.
Yeah, SIGKILL can have that effect, and should always be left as a last-resort option.
One suggestion on how to solve that problem was to remove SIGKILL from the normal virDomainDestroyFlags, but still provide the ability to kill qemu with SIGKILL by using a new flag to virDomainDestroyFlags. This patch is a quick attempt at that in order to start a conversation on the topic.
So what are your opinions? Is this the right way to solve the problem? Any better ideas?
The only thing I worry about is backward incompatibility. Older clients are expecting the hard-core whack of the guest, but data loss is never nice, so I can live with calling this a bug fix and not a backwards change in behavior. But if I'm wrong, then we would have to make it so that the new flag has to be specified if you want to destroy nicely without a SIGKILL, then call destroy again with flags of 0 if things aren't responsive enough for you.

At any rate, I like the idea of using a flags value to give the user control of whether or not to use SIGKILL, and documenting that SIGKILL should be a last resort.

-- 
Eric Blake   eblake@redhat.com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Fri, Jan 27, 2012 at 01:35:35PM -0500, Laine Stump wrote:
When libvirt is shutting down the qemu process, it first sends SIGTERM, then waits for 1.6 seconds and, if it sees the process still there, sends a SIGKILL.
There have been reports that this behavior can lead to data loss because the guest running in qemu doesn't have time to flush its disk cache buffers before it's unceremoniously whacked.
One suggestion on how to solve that problem was to remove SIGKILL from the normal virDomainDestroyFlags, but still provide the ability to kill qemu with SIGKILL by using a new flag to virDomainDestroyFlags. This patch is a quick attempt at that in order to start a conversation on the topic.
So what are your opinions? Is this the right way to solve the problem?
No, we can't change the default semantics of virDomainDestroy in this case. Applications expect that we do absolutely everything possible to kill off the guest. This is particularly important for cluster fencing usage. If we only use SIGTERM, then we're introducing unacceptable risk to apps relying on this.

We could do the opposite though - have a flag to do a graceful destroy, leaving the default as un-graceful.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 01/30/2012 06:02 AM, Daniel P. Berrange wrote:
On Fri, Jan 27, 2012 at 01:35:35PM -0500, Laine Stump wrote:
When libvirt is shutting down the qemu process, it first sends SIGTERM, then waits for 1.6 seconds and, if it sees the process still there, sends a SIGKILL.
There have been reports that this behavior can lead to data loss because the guest running in qemu doesn't have time to flush its disk cache buffers before it's unceremoniously whacked.
One suggestion on how to solve that problem was to remove SIGKILL from the normal virDomainDestroyFlags, but still provide the ability to kill qemu with SIGKILL by using a new flag to virDomainDestroyFlags. This patch is a quick attempt at that in order to start a conversation on the topic.
So what are your opinions? Is this the right way to solve the problem?
No, we can't change the default semantics of virDomainDestroy in this case. Applications expect that we do absolutely everything possible to kill off the guest. This is particularly important for cluster fencing usage. If we only use SIGTERM, then we're introducing unacceptable risk to apps relying on this.
We could do the opposite though - have a flag to do a graceful destroy, leaving the default as un-graceful.
virDomainShutdown ends up calling qemuProcessKill() too. So, I guess we need to add a flag there too.

In the meantime, shouldn't we at least wait longer before resorting to SIGKILL? (especially since it appears the current timeout is quite often too short). (If we don't at least do that, what we're saying is "the behavior of virDomainShutdown / virDomainDestroy is to lose your data unless you're lucky. If you don't want this behavior, you need to use virDomainXXXFlags, and specify the VIR_DOMAIN_DONT_TRASH_MY_DATA flag" :-P).

On 30.01.2012 21:30, Laine Stump wrote:
On 01/30/2012 06:02 AM, Daniel P. Berrange wrote:
On Fri, Jan 27, 2012 at 01:35:35PM -0500, Laine Stump wrote:
When libvirt is shutting down the qemu process, it first sends SIGTERM, then waits for 1.6 seconds and, if it sees the process still there, sends a SIGKILL.
There have been reports that this behavior can lead to data loss because the guest running in qemu doesn't have time to flush its disk cache buffers before it's unceremoniously whacked.
One suggestion on how to solve that problem was to remove SIGKILL from the normal virDomainDestroyFlags, but still provide the ability to kill qemu with SIGKILL by using a new flag to virDomainDestroyFlags. This patch is a quick attempt at that in order to start a conversation on the topic.
So what are your opinions? Is this the right way to solve the problem?
No, we can't change the default semantics of virDomainDestroy in this case. Applications expect that we do absolutely everything possible to kill off the guest. This is particularly important for cluster fencing usage. If we only use SIGTERM, then we're introducing unacceptable risk to apps relying on this.
We could do the opposite though - have a flag to do a graceful destroy, leaving the default as un-graceful.
virDomainShutdown ends up calling qemuProcessKill() too. So, I guess we need to add a flag there too.
In the meantime, shouldn't we at least wait longer before resorting to SIGKILL? (especially since it appears the current timeout is quite often too short). (If we don't at least do that, what we're saying is "the behavior of virDomainShutdown / virDomainDestroy is to lose your data unless you're lucky. If you don't want this behavior, you need to use virDomainXXXFlags, and specify the VIR_DOMAIN_DONT_TRASH_MY_DATA flag" :-P).
I should probably hop into this as I've tried to solve this issue earlier but got sidetracked and then forgot about it.

Increasing the delay could be a temporary workaround, but we should keep in mind that if we change the delay to X (units of time), I bet in some cases it will take qemu X+1 units to flush caches. Therefore I lean toward the DONT_SEND_SIGKILL flag and leaving the default as it is now.

However, as a qemu developer pointed out (Luiz?), even with -no-shutdown qemu will terminate itself after receiving SIGINT and flushing its own caches. So this might be the right way to solve this.

Michal

On 01/31/2012 03:30 AM, Michal Privoznik wrote:
On 30.01.2012 21:30, Laine Stump wrote:
On 01/30/2012 06:02 AM, Daniel P. Berrange wrote:
On Fri, Jan 27, 2012 at 01:35:35PM -0500, Laine Stump wrote:
When libvirt is shutting down the qemu process, it first sends SIGTERM, then waits for 1.6 seconds and, if it sees the process still there, sends a SIGKILL.
There have been reports that this behavior can lead to data loss because the guest running in qemu doesn't have time to flush its disk cache buffers before it's unceremoniously whacked.
One suggestion on how to solve that problem was to remove SIGKILL from the normal virDomainDestroyFlags, but still provide the ability to kill qemu with SIGKILL by using a new flag to virDomainDestroyFlags. This patch is a quick attempt at that in order to start a conversation on the topic.
So what are your opinions? Is this the right way to solve the problem?
No, we can't change the default semantics of virDomainDestroy in this case. Applications expect that we do absolutely everything possible to kill off the guest.
But not that we get it all done within 1.6 seconds, right? :-)
This is particularly important for cluster fencing usage. If we only use SIGTERM, then we're introducing unacceptable risk to apps relying on this.
We could do the opposite though - have a flag to do a graceful destroy, leaving the default as un-graceful.
virDomainShutdown ends up calling qemuProcessKill() too. So, I guess we need to add a flag there too.
In the meantime, shouldn't we at least wait longer before resorting to SIGKILL? (especially since it appears the current timeout is quite often too short). (If we don't at least do that, what we're saying is "the behavior of virDomainShutdown / virDomainDestroy is to lose your data unless you're lucky. If you don't want this behavior, you need to use virDomainXXXFlags, and specify the VIR_DOMAIN_DONT_TRASH_MY_DATA flag" :-P).
I should probably hop into this as I've tried to solve this issue earlier but got sidetracked and then forgot about it.
Increasing the delay could be a temporary workaround, but we should keep in mind that if we change the delay to X (units of time), I bet in some cases it will take qemu X+1 units to flush caches.
Therefore I lean toward the DONT_SEND_SIGKILL flag and leaving the default as it is now.
Sure, even if you increase the timeout, there will still be cases where it fails. But we can't magically cause every piece of management software to switch to using the DONT_SEND_SIGKILL flag overnight, and in the meantime increasing the timeout will lead to fewer failures, and won't cause a longer delay than currently *unless it was going to cause a failure anyway* (since the loop exits as soon as it detects the process has exited). So, I don't see a downside to increasing the timeout (in addition to adding the flag, *not* instead of it). The current timeout is arbitrary, so why not make it a bit larger arbitrary amount?
However, as a qemu developer pointed out (Luiz?), even with -no-shutdown qemu will terminate itself after receiving SIGINT and flushing its own caches. So this might be the right way to solve this.
Please define "this" more specifically :-) (maybe in patch form?)

On Mon, Jan 30, 2012 at 03:30:18PM -0500, Laine Stump wrote:
On 01/30/2012 06:02 AM, Daniel P. Berrange wrote:
On Fri, Jan 27, 2012 at 01:35:35PM -0500, Laine Stump wrote:
When libvirt is shutting down the qemu process, it first sends SIGTERM, then waits for 1.6 seconds and, if it sees the process still there, sends a SIGKILL.
There have been reports that this behavior can lead to data loss because the guest running in qemu doesn't have time to flush its disk cache buffers before it's unceremoniously whacked.
One suggestion on how to solve that problem was to remove SIGKILL from the normal virDomainDestroyFlags, but still provide the ability to kill qemu with SIGKILL by using a new flag to virDomainDestroyFlags. This patch is a quick attempt at that in order to start a conversation on the topic.
So what are your opinions? Is this the right way to solve the problem?
No, we can't change the default semantics of virDomainDestroy in this case. Applications expect that we do absolutely everything possible to kill off the guest. This is particularly important for cluster fencing usage. If we only use SIGTERM, then we're introducing unacceptable risk to apps relying on this.
We could do the opposite though - have a flag to do a graceful destroy, leaving the default as un-graceful.
virDomainShutdown ends up calling qemuProcessKill() too. So, I guess we need to add a flag there too.
In the meantime, shouldn't we at least wait longer before resorting to SIGKILL? (especially since it appears the current timeout is quite often too short). (If we don't at least do that, what we're saying is "the behavior of virDomainShutdown / virDomainDestroy is to lose your data unless you're lucky. If you don't want this behavior, you need to use virDomainXXXFlags, and specify the VIR_DOMAIN_DONT_TRASH_MY_DATA flag" :-P).
If you add a flag to trigger a graceful kill (SIGINT) only, then we don't need to change the timeout. The application now has the ability to wait as long as it likes before issuing another virDomainDestroy().

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 01/31/2012 02:53 AM, Daniel P. Berrange wrote:
In the meantime, shouldn't we at least wait longer before resorting to SIGKILL? (especially since it appears the current timeout is quite often too short). (If we don't at least do that, what we're saying is "the behavior of virDomainShutdown / virDomainDestroy is to lose your data unless you're lucky. If you don't want this behavior, you need to use virDomainXXXFlags, and specify the VIR_DOMAIN_DONT_TRASH_MY_DATA flag" :-P).
If you add a flag to trigger a graceful kill(SIGINT) only, then we don't need to change the timeout. The application now has the ability to wait as long as they like, before issuing another virDomainDestroy()
The new flag only benefits new apps that are compiled to use the new flag. I see nothing wrong with lengthening the timeout when no flag is present, to also benefit the older apps that have not yet been recompiled to use the new flag, since as was pointed out, the loop gives up as soon as the process is gone. So in the common case it won't change behavior, and in the timeout case we are waiting longer and thus less likely to lose data.

In other words, I'm in favor of both approaches (new flag and longer default timeout when no flag is used).

-- 
Eric Blake   eblake@redhat.com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org
participants (4):
- Daniel P. Berrange
- Eric Blake
- Laine Stump
- Michal Privoznik