[libvirt] [PATCH] qemu: Prevent disk corruption on domain shutdown

Ever since we introduced fake reboot, we call qemuProcessKill as a reaction to SHUTDOWN event. Unfortunately, qemu doesn't guarantee it flushed all internal buffers before sending SHUTDOWN, in which case killing the process forcibly may result in (virtual) disk corruption. By sending just SIGTERM without SIGKILL we give qemu time to to flush all buffers and exit. Once qemu exits, we will see an EOF on monitor connection and tear down the domain. In case qemu ignores SIGTERM or just hangs there, the process stays running but that's not any different from a possible hang anytime during the shutdown process so I think it's just fine. Also qemu (since 0.14 until it's fixed) has a bug in SIGTERM processing which causes it not to exit but instead send new SHUTDOWN event and keep waiting. I think the best we can do is to ignore duplicate SHUTDOWN events to avoid a SHUTDOWN-SIGTERM loop and leave the domain in paused state. --- src/qemu/qemu_driver.c | 2 +- src/qemu/qemu_process.c | 25 ++++++++++++++++++------- src/qemu/qemu_process.h | 2 +- 3 files changed, 20 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index d2626ff..9ff800f 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1870,7 +1870,7 @@ qemuDomainDestroyFlags(virDomainPtr dom, * can kill the process even if a job is active. Killing * it now means the job will be released */ - qemuProcessKill(vm); + qemuProcessKill(vm, false); if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_DESTROY) < 0) goto cleanup; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 24d1dc7..dbd697d 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -419,7 +419,7 @@ endjob: cleanup: if (vm) { if (ret == -1) - qemuProcessKill(vm); + qemuProcessKill(vm, false); if (virDomainObjUnref(vm) > 0) virDomainObjUnlock(vm); } @@ -437,6 +437,12 @@ qemuProcessHandleShutdown(qemuMonitorPtr mon ATTRIBUTE_UNUSED, VIR_DEBUG("vm=%p", vm); virDomainObjLock(vm); + if (priv->gotShutdown) { + VIR_DEBUG("Ignoring repeated SHUTDOWN event from domain %s", + vm->def->name); + goto cleanup; + } + priv->gotShutdown = true; if (priv->fakeReboot) { virDomainObjRef(vm); @@ -446,16 +452,17 @@ qemuProcessHandleShutdown(qemuMonitorPtr mon ATTRIBUTE_UNUSED, qemuProcessFakeReboot, vm) < 0) { VIR_ERROR(_("Failed to create reboot thread, killing domain")); - qemuProcessKill(vm); + qemuProcessKill(vm, true); if (virDomainObjUnref(vm) == 0) vm = NULL; } } else { - qemuProcessKill(vm); + qemuProcessKill(vm, true); } + +cleanup: if (vm) virDomainObjUnlock(vm); - return 0; } @@ -3183,10 +3190,11 @@ cleanup: } -void qemuProcessKill(virDomainObjPtr vm) +void qemuProcessKill(virDomainObjPtr vm, bool gracefully) { int i; - VIR_DEBUG("vm=%s pid=%d", vm->def->name, vm->pid); + VIR_DEBUG("vm=%s pid=%d gracefully=%d", + vm->def->name, vm->pid, gracefully); if (!virDomainObjIsActive(vm)) { VIR_DEBUG("VM '%s' not active", vm->def->name); @@ -3216,6 +3224,9 @@ void qemuProcessKill(virDomainObjPtr vm) break; } + if (i == 0 && gracefully) + break; + usleep(200 * 1000); } } @@ -3300,7 +3311,7 @@ void qemuProcessStop(struct qemud_driver *driver, } /* shut it off for sure */ - qemuProcessKill(vm); + qemuProcessKill(vm, false); /* Stop autodestroy in case guest is restarted */ qemuProcessAutoDestroyRemove(driver, vm); diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h index 96ba3f3..ef422c4 100644 --- a/src/qemu/qemu_process.h +++ b/src/qemu/qemu_process.h @@ -68,7 +68,7 @@ int qemuProcessAttach(virConnectPtr conn, virDomainChrSourceDefPtr monConfig, bool monJSON); -void qemuProcessKill(virDomainObjPtr vm); +void qemuProcessKill(virDomainObjPtr vm, bool gracefully); int qemuProcessAutoDestroyInit(struct qemud_driver *driver); void qemuProcessAutoDestroyRun(struct qemud_driver *driver, -- 1.7.6.1

On Thu, Sep 15, 2011 at 12:07:27 +0200, Jiri Denemark wrote:
Ever since we introduced fake reboot, we call qemuProcessKill as a reaction to SHUTDOWN event. Unfortunately, qemu doesn't guarantee it flushed all internal buffers before sending SHUTDOWN, in which case killing the process forcibly may result in (virtual) disk corruption.
By sending just SIGTERM without SIGKILL we give qemu time to to flush all buffers and exit. Once qemu exits, we will see an EOF on monitor connection and tear down the domain. In case qemu ignores SIGTERM or just hangs there, the process stays running but that's not any different from a possible hang anytime during the shutdown process so I think it's just fine.
Also qemu (since 0.14 until it's fixed) has a bug in SIGTERM processing which causes it not to exit but instead send new SHUTDOWN event and keep waiting. I think the best we can do is to ignore duplicate SHUTDOWN events to avoid a SHUTDOWN-SIGTERM loop and leave the domain in paused state.
Oops, I git send-email -1 instead of the file with prepared patch so it's missing some additional comments. Mainly, this is a v2 of the patch and the difference from v1 is: - reuse qemuProcessKill to send just SIGTERM instead of introducing qemuProcessQuit that would send SIGQUIT (which is not handled by qemu anyway) - don't do anything if we get SHUTDOWN event after we have already seen one Jirka

On Thu, Sep 15, 2011 at 12:07:27PM +0200, Jiri Denemark wrote:
Ever since we introduced fake reboot, we call qemuProcessKill as a reaction to SHUTDOWN event. Unfortunately, qemu doesn't guarantee it flushed all internal buffers before sending SHUTDOWN, in which case killing the process forcibly may result in (virtual) disk corruption.
By sending just SIGTERM without SIGKILL we give qemu time to to flush all buffers and exit. Once qemu exits, we will see an EOF on monitor connection and tear down the domain. In case qemu ignores SIGTERM or just hangs there, the process stays running but that's not any different from a possible hang anytime during the shutdown process so I think it's just fine.
Also qemu (since 0.14 until it's fixed) has a bug in SIGTERM processing which causes it not to exit but instead send new SHUTDOWN event and keep waiting. I think the best we can do is to ignore duplicate SHUTDOWN events to avoid a SHUTDOWN-SIGTERM loop and leave the domain in paused state. --- src/qemu/qemu_driver.c | 2 +- src/qemu/qemu_process.c | 25 ++++++++++++++++++------- src/qemu/qemu_process.h | 2 +- 3 files changed, 20 insertions(+), 9 deletions(-)
ACK this looks reasonable. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Fri, Sep 16, 2011 at 12:06:50 +0100, Daniel P. Berrange wrote:
On Thu, Sep 15, 2011 at 12:07:27PM +0200, Jiri Denemark wrote:
Ever since we introduced fake reboot, we call qemuProcessKill as a reaction to SHUTDOWN event. Unfortunately, qemu doesn't guarantee it flushed all internal buffers before sending SHUTDOWN, in which case killing the process forcibly may result in (virtual) disk corruption.
By sending just SIGTERM without SIGKILL we give qemu time to to flush all buffers and exit. Once qemu exits, we will see an EOF on monitor connection and tear down the domain. In case qemu ignores SIGTERM or just hangs there, the process stays running but that's not any different from a possible hang anytime during the shutdown process so I think it's just fine.
Also qemu (since 0.14 until it's fixed) has a bug in SIGTERM processing which causes it not to exit but instead send new SHUTDOWN event and keep waiting. I think the best we can do is to ignore duplicate SHUTDOWN events to avoid a SHUTDOWN-SIGTERM loop and leave the domain in paused state. --- src/qemu/qemu_driver.c | 2 +- src/qemu/qemu_process.c | 25 ++++++++++++++++++------- src/qemu/qemu_process.h | 2 +- 3 files changed, 20 insertions(+), 9 deletions(-)
ACK this looks reasonable.
Thanks, pushed. Jirka

At 09/15/2011 06:07 PM, Jiri Denemark Write:
Ever since we introduced fake reboot, we call qemuProcessKill as a reaction to SHUTDOWN event. Unfortunately, qemu doesn't guarantee it flushed all internal buffers before sending SHUTDOWN, in which case killing the process forcibly may result in (virtual) disk corruption.
By sending just SIGTERM without SIGKILL we give qemu time to to flush all buffers and exit. Once qemu exits, we will see an EOF on monitor connection and tear down the domain. In case qemu ignores SIGTERM or just hangs there, the process stays running but that's not any different from a possible hang anytime during the shutdown process so I think it's just fine.
With this patch, the domain can not be shutdown, because we add '-no-shutdown' in the command line. We can send monitor command 'quit' to avoid this, but we can not get any reply and events from qemu after we send this monitor command(libvirtd will be blocked in qemuMonitorSend()). So it's not easy to send monitor command 'quit'. Thanks Wen Congyang
Also qemu (since 0.14 until it's fixed) has a bug in SIGTERM processing which causes it not to exit but instead send new SHUTDOWN event and keep waiting. I think the best we can do is to ignore duplicate SHUTDOWN events to avoid a SHUTDOWN-SIGTERM loop and leave the domain in paused state.

On Tue, Sep 20, 2011 at 11:31:06 +0800, Wen Congyang wrote:
At 09/15/2011 06:07 PM, Jiri Denemark Write:
Ever since we introduced fake reboot, we call qemuProcessKill as a reaction to SHUTDOWN event. Unfortunately, qemu doesn't guarantee it flushed all internal buffers before sending SHUTDOWN, in which case killing the process forcibly may result in (virtual) disk corruption.
By sending just SIGTERM without SIGKILL we give qemu time to to flush all buffers and exit. Once qemu exits, we will see an EOF on monitor connection and tear down the domain. In case qemu ignores SIGTERM or just hangs there, the process stays running but that's not any different from a possible hang anytime during the shutdown process so I think it's just fine.
With this patch, the domain can not be shutdown, because we add '-no-shutdown' in the command line.
That's a bug in qemu introduced just before 0.14.0 was released (IIRC) and fixed by http://lists.nongnu.org/archive/html/qemu-devel/2011-09/msg01757.html Jirka
participants (3)
-
Daniel P. Berrange
-
Jiri Denemark
-
Wen Congyang