[libvirt] [PATCH] qemu: Fix shutdown regression

The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption. --- src/qemu/qemu_capabilities.c | 7 +++++++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_process.c | 19 +++++++++++++------ 3 files changed, 21 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 36f47a9..823c500 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -136,6 +136,7 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST, "pci-ohci", "usb-redir", "usb-hub", + "no-shutdown-bug", ); struct qemu_feature_flags { @@ -1049,6 +1050,12 @@ qemuCapsComputeCmdFlags(const char *help, if (version >= 13000) qemuCapsSet(flags, QEMU_CAPS_PCI_MULTIFUNCTION); + + /* QEMU version 0.14.* and 0.15.* are known not to handle SIGTERM + * properly when started with -no-shutdown + */ + if (version >= 14000 && version <= 15999) + qemuCapsSet(flags, QEMU_CAPS_NO_SHUTDOWN_BUG); } /* We parse the output of 'qemu -help' to get the QEMU diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 96b7a3b..53d5ace 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -110,6 +110,7 @@ enum qemuCapsFlags { QEMU_CAPS_PCI_OHCI = 71, /* -device pci-ohci */ QEMU_CAPS_USB_REDIR = 72, /* -device usb-redir */ QEMU_CAPS_USB_HUB = 73, /* -device usb-hub */ + QEMU_CAPS_NO_SHUTDOWN_BUG = 74, /* -no-shutdown doesn't exit on SIGTERM */ QEMU_CAPS_LAST, /* this must always be the last item */ }; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 3baaa19..3311699 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -434,6 +434,7 @@ qemuProcessHandleShutdown(qemuMonitorPtr mon ATTRIBUTE_UNUSED, virDomainObjPtr vm) { qemuDomainObjPrivatePtr priv = vm->privateData; + bool gracefully = true; VIR_DEBUG("vm=%p", vm); virDomainObjLock(vm); @@ -443,6 +444,12 @@ qemuProcessHandleShutdown(qemuMonitorPtr mon ATTRIBUTE_UNUSED, goto cleanup; } + if (qemuCapsGet(priv->qemuCaps, QEMU_CAPS_NO_SHUTDOWN_BUG)) { + VIR_DEBUG("Emulator is likely affected by -no-shutdown bug;" + " we will not avoid sending SIGKILL to it"); + gracefully = false; + } + priv->gotShutdown = true; if (priv->fakeReboot) { virDomainObjRef(vm); @@ -452,12 +459,12 @@ qemuProcessHandleShutdown(qemuMonitorPtr mon ATTRIBUTE_UNUSED, qemuProcessFakeReboot, vm) < 0) { VIR_ERROR(_("Failed to create reboot thread, killing domain")); - qemuProcessKill(vm, true); + qemuProcessKill(vm, gracefully); if (virDomainObjUnref(vm) == 0) vm = NULL; } } else { - qemuProcessKill(vm, true); + qemuProcessKill(vm, gracefully); } cleanup: @@ -3227,15 +3234,15 @@ void qemuProcessKill(virDomainObjPtr vm, bool gracefully) } /* This loop sends SIGTERM, then waits a few iterations - * (1.6 seconds) to see if it dies. If still alive then + * (3 seconds) to see if it dies. If still alive then * it does SIGKILL, and waits a few more iterations (1.6 * seconds more) to confirm that it has really gone. */ - for (i = 0 ; i < 15 ; i++) { + for (i = 0 ; i < 23 ; i++) { int signum; if (i == 0) signum = SIGTERM; - else if (i == 8) + else if (i == 15) signum = SIGKILL; else signum = 0; /* Just check for existence */ @@ -3249,7 +3256,7 @@ void qemuProcessKill(virDomainObjPtr vm, bool gracefully) break; } - if (i == 0 && gracefully) + if (gracefully) break; usleep(200 * 1000); -- 1.7.6.1

On 09/20/2011 11:39 AM, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption. --- src/qemu/qemu_capabilities.c | 7 +++++++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_process.c | 19 +++++++++++++------ 3 files changed, 21 insertions(+), 6 deletions(-)
ACK. But it would be nice if upstream qemu could give us a more reliable indication of whether the qemu SIGTERM bug is fixed, so that we don't corrupt data on a patched 0.14 or 0.15 qemu. That is, as part of fixing the bug in qemu, we should also update -help text or something similar, so that libvirt can avoid making decisions solely on version numbers. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 09/20/2011 01:01 PM, Eric Blake wrote:
On 09/20/2011 11:39 AM, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption. --- src/qemu/qemu_capabilities.c | 7 +++++++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_process.c | 19 +++++++++++++------ 3 files changed, 21 insertions(+), 6 deletions(-)
ACK. But it would be nice if upstream qemu could give us a more reliable indication of whether the qemu SIGTERM bug is fixed, so that we don't corrupt data on a patched 0.14 or 0.15 qemu.
Can you be a lot more specific about what bug you mean?
That is, as part of fixing the bug in qemu, we should also update -help text or something similar, so that libvirt can avoid making decisions solely on version numbers.
The version number *is* the right way to make decisions. We've gone through this dozens of times. The fact that distros backport all sorts of stuff means that you need to maintain a matrix of versions with features. It's not our (upstream QEMU's) responsibility to tell you the differences that exist in forks of QEMU. Regards, Anthony Liguori

On 09/20/2011 12:52 PM, Anthony Liguori wrote:
On 09/20/2011 01:01 PM, Eric Blake wrote:
On 09/20/2011 11:39 AM, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption. --- src/qemu/qemu_capabilities.c | 7 +++++++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_process.c | 19 +++++++++++++------ 3 files changed, 21 insertions(+), 6 deletions(-)
ACK. But it would be nice if upstream qemu could give us a more reliable indication of whether the qemu SIGTERM bug is fixed, so that we don't corrupt data on a patched 0.14 or 0.15 qemu.
Can you be a lot more specific about what bug you mean?
https://bugzilla.redhat.com/show_bug.cgi?id=739895
That is, as part of fixing the bug in qemu, we should also update -help text or something similar, so that libvirt can avoid making decisions solely on version numbers.
The version number *is* the right way to make decisions. We've gone through this dozens of times.
The fact that distros backport all sorts of stuff means that you need to maintain a matrix of versions with features. It's not our (upstream QEMU's) responsibility to tell you the differences that exist in forks of QEMU.
Version numbers are lousy, precisely because they are not granular enough. That's why the autoconf philosophy frowns so heavily on version checks, and prefers feature checks instead. We want to know which features are present, not which versions introduced which features. In this case, we want to know about a particular feature (SIGTERM is not broken), which we know exists later than 0.15, but which might also exist as a backport in 0.14 or 0.15. If qemu tells us that information, then upstream libvirt can make the decision correctly regardless of how distros backport the patch. But if qemu does not expose the information, then upstream libvirt must be pessimistic, and you've now forced the distros to do double-duty - they must backport both the qemu fix, and write a distro-specific libvirt patch that alters the version matrix to play with the distro build of qemu. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 09/20/2011 02:03 PM, Eric Blake wrote:
On 09/20/2011 12:52 PM, Anthony Liguori wrote:
On 09/20/2011 01:01 PM, Eric Blake wrote:
On 09/20/2011 11:39 AM, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption. --- src/qemu/qemu_capabilities.c | 7 +++++++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_process.c | 19 +++++++++++++------ 3 files changed, 21 insertions(+), 6 deletions(-)
ACK. But it would be nice if upstream qemu could give us a more reliable indication of whether the qemu SIGTERM bug is fixed, so that we don't corrupt data on a patched 0.14 or 0.15 qemu.
Can you be a lot more specific about what bug you mean?
That just got applied, last week, so no, it's not in any release right now.
That is, as part of fixing the bug in qemu, we should also update -help text or something similar, so that libvirt can avoid making decisions solely on version numbers.
The version number *is* the right way to make decisions. We've gone through this dozens of times.
The fact that distros backport all sorts of stuff means that you need to maintain a matrix of versions with features. It's not our (upstream QEMU's) responsibility to tell you the differences that exist in forks of QEMU.
Version numbers are lousy, precisely because they are not granular enough. That's why the autoconf philosophy frowns so heavily on version checks, and prefers feature checks instead.
We want to know which features are present,
Features and bugs are different things. I'm all for providing ways to detect whether we support certain commands in QMP, command line options, etc. not which versions introduced which
features. In this case, we want to know about a particular feature (SIGTERM is not broken), which we know exists later than 0.15, but which might also exist as a backport in 0.14 or 0.15.
No, you want to know, does d9389b9664df561db796b18eb8309fffe58faf8b existing in this build of QEMU. But makes d9389b more important than d296363 or db118fe72? If you want to know whether a bug is fixed that is important to *you*, then you should check the git log correlating to that version and embed that info in libvirt. Then libvirt is entirely empowered to deem whatever bug fixes you think are important to the table that you maintain.
If qemu tells us that information, then upstream libvirt can make the decision correctly regardless of how distros backport the patch. But if qemu does not expose the information, then upstream libvirt must be pessimistic, and you've now forced the distros to do double-duty - they must backport both the qemu fix, and write a distro-specific libvirt patch that alters the version matrix to play with the distro build of qemu.
Or distros could use use the QEMU stable branch as their base and invest in backporting to QEMU stable instead of maintaining private backport trees. Regards, Anthony Liguori

On Tue, Sep 20, 2011 at 12:01:58PM -0600, Eric Blake wrote:
On 09/20/2011 11:39 AM, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption. --- src/qemu/qemu_capabilities.c | 7 +++++++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_process.c | 19 +++++++++++++------ 3 files changed, 21 insertions(+), 6 deletions(-)
ACK. But it would be nice if upstream qemu could give us a more reliable indication of whether the qemu SIGTERM bug is fixed, so that we don't corrupt data on a patched 0.14 or 0.15 qemu. That is, as part of fixing the bug in qemu, we should also update -help text or something similar, so that libvirt can avoid making decisions solely on version numbers.
-help is traditionally just a way to detecting features. I think it would be rather an abuse of -help to start adding information about every bugfix that affects mgmt apps too. So I think a version number check is sufficient in this case. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Am 20.09.2011 20:01, schrieb Eric Blake:
On 09/20/2011 11:39 AM, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption.
I really think libvirt should never SIGKILL qemu unless it's explicitly told so. Management tools should try to ask the user before doing so. Killing qemu with SIGKILL is never safe.
--- src/qemu/qemu_capabilities.c | 7 +++++++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_process.c | 19 +++++++++++++------ 3 files changed, 21 insertions(+), 6 deletions(-)
ACK. But it would be nice if upstream qemu could give us a more reliable indication of whether the qemu SIGTERM bug is fixed, so that we don't corrupt data on a patched 0.14 or 0.15 qemu.
0.14 shouldn't have this bug, but it looks like 0.15 has it. Justin, can you cherry-pick d9389b96 and 941f511a into stable-0.15 in order to fix -no-shutdown? Kevin

0.14 shouldn't have this bug, but it looks like 0.15 has it.
Justin, can you cherry-pick d9389b96 and 941f511a into stable-0.15 in order to fix -no-shutdown?
Kevin
I can confirm this, I`m using qemu-kvm 0.14.1 and do not run into this bug Regards Jason

On Tue, Sep 20, 2011 at 07:39:15PM +0200, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption.
IMO, SIGKILL should only be sent at the explicit direction of the user, saying in effect, I'm ok with possible data corruption, I want the VM killed unconditionally. I would rather leave VMs paused than risk corrupting data. Let's get as much input as we can from the qemu folks before we go down this path. Dave
--- src/qemu/qemu_capabilities.c | 7 +++++++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_process.c | 19 +++++++++++++------ 3 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 36f47a9..823c500 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -136,6 +136,7 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST, "pci-ohci", "usb-redir", "usb-hub", + "no-shutdown-bug", );
struct qemu_feature_flags { @@ -1049,6 +1050,12 @@ qemuCapsComputeCmdFlags(const char *help,
if (version >= 13000) qemuCapsSet(flags, QEMU_CAPS_PCI_MULTIFUNCTION); + + /* QEMU version 0.14.* and 0.15.* are known not to handle SIGTERM + * properly when started with -no-shutdown + */ + if (version >= 14000 && version <= 15999) + qemuCapsSet(flags, QEMU_CAPS_NO_SHUTDOWN_BUG); }
/* We parse the output of 'qemu -help' to get the QEMU diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 96b7a3b..53d5ace 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -110,6 +110,7 @@ enum qemuCapsFlags { QEMU_CAPS_PCI_OHCI = 71, /* -device pci-ohci */ QEMU_CAPS_USB_REDIR = 72, /* -device usb-redir */ QEMU_CAPS_USB_HUB = 73, /* -device usb-hub */ + QEMU_CAPS_NO_SHUTDOWN_BUG = 74, /* -no-shutdown doesn't exit on SIGTERM */
QEMU_CAPS_LAST, /* this must always be the last item */ }; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 3baaa19..3311699 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -434,6 +434,7 @@ qemuProcessHandleShutdown(qemuMonitorPtr mon ATTRIBUTE_UNUSED, virDomainObjPtr vm) { qemuDomainObjPrivatePtr priv = vm->privateData; + bool gracefully = true; VIR_DEBUG("vm=%p", vm);
virDomainObjLock(vm); @@ -443,6 +444,12 @@ qemuProcessHandleShutdown(qemuMonitorPtr mon ATTRIBUTE_UNUSED, goto cleanup; }
+ if (qemuCapsGet(priv->qemuCaps, QEMU_CAPS_NO_SHUTDOWN_BUG)) { + VIR_DEBUG("Emulator is likely affected by -no-shutdown bug;" + " we will not avoid sending SIGKILL to it"); + gracefully = false; + } + priv->gotShutdown = true; if (priv->fakeReboot) { virDomainObjRef(vm); @@ -452,12 +459,12 @@ qemuProcessHandleShutdown(qemuMonitorPtr mon ATTRIBUTE_UNUSED, qemuProcessFakeReboot, vm) < 0) { VIR_ERROR(_("Failed to create reboot thread, killing domain")); - qemuProcessKill(vm, true); + qemuProcessKill(vm, gracefully); if (virDomainObjUnref(vm) == 0) vm = NULL; } } else { - qemuProcessKill(vm, true); + qemuProcessKill(vm, gracefully); }
cleanup: @@ -3227,15 +3234,15 @@ void qemuProcessKill(virDomainObjPtr vm, bool gracefully) }
/* This loop sends SIGTERM, then waits a few iterations - * (1.6 seconds) to see if it dies. If still alive then + * (3 seconds) to see if it dies. If still alive then * it does SIGKILL, and waits a few more iterations (1.6 * seconds more) to confirm that it has really gone. */ - for (i = 0 ; i < 15 ; i++) { + for (i = 0 ; i < 23 ; i++) { int signum; if (i == 0) signum = SIGTERM; - else if (i == 8) + else if (i == 15) signum = SIGKILL; else signum = 0; /* Just check for existence */ @@ -3249,7 +3256,7 @@ void qemuProcessKill(virDomainObjPtr vm, bool gracefully) break; }
- if (i == 0 && gracefully) + if (gracefully) break;
usleep(200 * 1000); -- 1.7.6.1
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 09/20/2011 12:06 PM, Dave Allan wrote:
On Tue, Sep 20, 2011 at 07:39:15PM +0200, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption.
IMO, SIGKILL should only be sent at the explicit direction of the user, saying in effect, I'm ok with possible data corruption, I want the VM killed unconditionally. I would rather leave VMs paused than risk corrupting data. Let's get as much input as we can from the qemu folks before we go down this path.
That re-echos my sentiment that qemu needs to tell us whether the bug is fixed (we know that if version < 0.14, the bug is not present, and if version > 0.15, the bug is fixed, but it is the 0.1[45] window where we don't know if the vendor has back-ported the fix into the version of qemu that we are targetting, unless we get some help from qemu). I also wonder if we should make it so: virDomainDestroy(dom) fails with a reasonable message, rather than leaving the domain paused, if we think qemu has the bug, and require the user to do virDomainDestroyFlags(dom, VIR_DOMAIN_DESTROY_FORCE) as the means of the user explicitly requesting that they work around the qemu bug. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Tue, Sep 20, 2011 at 14:06:49 -0400, Dave Allan wrote:
On Tue, Sep 20, 2011 at 07:39:15PM +0200, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption.
IMO, SIGKILL should only be sent at the explicit direction of the user, saying in effect, I'm ok with possible data corruption, I want the VM killed unconditionally. I would rather leave VMs paused than risk corrupting data. Let's get as much input as we can from the qemu folks before we go down this path.
Yes, I agree with that. Better leave the domain paused (and virDomainGetState() or virsh domstate --reason report that it is paused because it shut down) and let users call explicit virDomainDestroy to get rid of it than silently keeping the possibility of disk corruption. Though some people may see this as such a big regression of virDomainShutdown that we should rather live with the possible corruption. After all, the disk corruption has so far been seen only with specific Windows version and in specific scenario and the probability of hitting it may be quite low. And this patch doesn't make it worse than 0.9.[34], it actually makes it a bit better by giving qemu more time before sending SIGKILL. Anyway, this patch provided a starting point for our discussion on whether/how we should address the shutdown regression. Jirka

On Tue, Sep 20, 2011 at 02:06:49PM -0400, Dave Allan wrote:
On Tue, Sep 20, 2011 at 07:39:15PM +0200, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption.
IMO, SIGKILL should only be sent at the explicit direction of the user, saying in effect, I'm ok with possible data corruption, I want the VM killed unconditionally. I would rather leave VMs paused than risk corrupting data. Let's get as much input as we can from the qemu folks before we go down this path.
If we want that, then we need to have a different way to deal with this shutdown problem. Leaving VMs in a paused state is not acceptable behaviour - our IRC channel has already been flooded with people complaining about this and we've only released this 2 days ago. This patch of Jiri's attempts to minimise the liklihood of hitting a problem. If people want an absolute guarantee then they have the option to update to a version of QEMU that has the fix, or distros can backport the fix to QEMU and disable this bit in libvirt. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Tue, Sep 20, 2011 at 19:39:15 +0200, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption.
An alternative solution would be to break reboot on affected QEMUs instead of trying to live with possible data corruption as if libvirt 0.9.[34] is used. That is, virDomainReboot would report unsupported operation because current emulator binary is not able to support it without causing possible data corruption during shutdown. Jirka

On Tue, Sep 20, 2011 at 09:10:46PM +0200, Jiri Denemark wrote:
On Tue, Sep 20, 2011 at 19:39:15 +0200, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption.
An alternative solution would be to break reboot on affected QEMUs instead of trying to live with possible data corruption as if libvirt 0.9.[34] is used. That is, virDomainReboot would report unsupported operation because current emulator binary is not able to support it without causing possible data corruption during shutdown.
That approach gets my vote. Dave
Jirka
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 09/20/2011 01:10 PM, Jiri Denemark wrote:
On Tue, Sep 20, 2011 at 19:39:15 +0200, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption.
An alternative solution would be to break reboot on affected QEMUs instead of trying to live with possible data corruption as if libvirt 0.9.[34] is used. That is, virDomainReboot would report unsupported operation because current emulator binary is not able to support it without causing possible data corruption during shutdown.
That makes sense - that is, we don't use -no-shutdown, so virDomainReboot reports failure, but then virDomainDestroy doesn't tickle the bug and can unconditionally work as it has always done. I like that idea a bit better, especially since it sounds like upstream qemu is reluctant to modify -help text to help us out. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Tue, Sep 20, 2011 at 13:22:12 -0600, Eric Blake wrote:
On 09/20/2011 01:10 PM, Jiri Denemark wrote:
On Tue, Sep 20, 2011 at 19:39:15 +0200, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shutdown properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we wait a bit more between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption.
An alternative solution would be to break reboot on affected QEMUs instead of trying to live with possible data corruption as if libvirt 0.9.[34] is used. That is, virDomainReboot would report unsupported operation because current emulator binary is not able to support it without causing possible data corruption during shutdown.
That makes sense - that is, we don't use -no-shutdown, so virDomainReboot reports failure, but then virDomainDestroy doesn't tickle the bug and can unconditionally work as it has always done.
BTW, this has nothing to do with virDomainDestroy, which works as it always worked. The issue is about virDomainShutdown or just a guest initiated shutdown. Jirka

The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. The affected versions of QEMU do not quit on SIGTERM if started with -no-shutdown, which we use to implement fake reboot. Since -no-shutdown tells QEMU not to quit automatically on guest shutdown, domains started using the affected QEMU cannot be shutdown properly and stay in a paused state. This patch disables fake reboot feature on such QEMU by not using -no-shutdown, which makes shutdown work as expected. However, virDomainReboot will not work in this case and it will report "Requested operation is not valid: Reboot is not supported with this QEMU binary". NB, the qemu capability is called QEMU_CAPS_NO_FAKE_REBOOT for backward compatibility with running domains started by libvirtd that didn't have this patch in. --- src/qemu/qemu_capabilities.c | 9 +++++++++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 2 +- src/qemu/qemu_driver.c | 6 ++++++ 4 files changed, 17 insertions(+), 1 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 36f47a9..5b9e4a9 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -136,6 +136,7 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST, "pci-ohci", "usb-redir", "usb-hub", + "no-fake-reboot", ); struct qemu_feature_flags { @@ -1008,6 +1009,14 @@ qemuCapsComputeCmdFlags(const char *help, qemuCapsSet(flags, QEMU_CAPS_VHOST_NET); } + /* Do not use -no-shutdown to implement fake reboot if qemu doesn't support + * it or SIGTERM handling is most likely buggy when used with -no-shutdown + * (which applies for qemu 0.14.* and 0.15.*) + */ + if (!strstr(help, "-no-shutdown") || + version / 1000 == 14 || version / 1000 == 15) + qemuCapsSet(flags, QEMU_CAPS_NO_FAKE_REBOOT); + /* * Handling of -incoming arg with varying features * -incoming tcp (kvm >= 79, qemu >= 0.10.0) diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 96b7a3b..b4f3ba4 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -110,6 +110,7 @@ enum qemuCapsFlags { QEMU_CAPS_PCI_OHCI = 71, /* -device pci-ohci */ QEMU_CAPS_USB_REDIR = 72, /* -device usb-redir */ QEMU_CAPS_USB_HUB = 73, /* -device usb-hub */ + QEMU_CAPS_NO_FAKE_REBOOT = 74, /* don't fake reboot using -no-shutdown */ QEMU_CAPS_LAST, /* this must always be the last item */ }; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index ee4b52b..aab4cd9 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3574,7 +3574,7 @@ qemuBuildCommandLine(virConnectPtr conn, * when QEMU stops. If we use no-shutdown, then we can * watch for this event and do a soft/warm reboot. */ - if (monitor_json) + if (monitor_json && !qemuCapsGet(qemuCaps, QEMU_CAPS_NO_FAKE_REBOOT)) virCommandAddArg(cmd, "-no-shutdown"); if (!(def->features & (1 << VIR_DOMAIN_FEATURE_ACPI))) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index f4ee4c3..d2eb881 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1557,6 +1557,12 @@ static int qemuDomainReboot(virDomainPtr dom, unsigned int flags) { priv = vm->privateData; if (qemuCapsGet(priv->qemuCaps, QEMU_CAPS_MONITOR_JSON)) { + if (qemuCapsGet(priv->qemuCaps, QEMU_CAPS_NO_FAKE_REBOOT)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("Reboot is not supported with this QEMU binary")); + goto cleanup; + } + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; -- 1.7.6.1

On Wed, Sep 21, 2011 at 10:43:32AM +0200, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. The affected versions of QEMU do not quit on SIGTERM if started with -no-shutdown, which we use to implement fake reboot. Since -no-shutdown tells QEMU not to quit automatically on guest shutdown, domains started using the affected QEMU cannot be shutdown properly and stay in a paused state.
This patch disables fake reboot feature on such QEMU by not using -no-shutdown, which makes shutdown work as expected. However, virDomainReboot will not work in this case and it will report "Requested operation is not valid: Reboot is not supported with this QEMU binary".
NB, the qemu capability is called QEMU_CAPS_NO_FAKE_REBOOT for backward compatibility with running domains started by libvirtd that didn't have this patch in. --- src/qemu/qemu_capabilities.c | 9 +++++++++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 2 +- src/qemu/qemu_driver.c | 6 ++++++ 4 files changed, 17 insertions(+), 1 deletions(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 36f47a9..5b9e4a9 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -136,6 +136,7 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST, "pci-ohci", "usb-redir", "usb-hub", + "no-fake-reboot", );
This should really be 'no-shutdown'. The reboot code is just one current potential user of that capability.
struct qemu_feature_flags { @@ -1008,6 +1009,14 @@ qemuCapsComputeCmdFlags(const char *help, qemuCapsSet(flags, QEMU_CAPS_VHOST_NET); }
+ /* Do not use -no-shutdown to implement fake reboot if qemu doesn't support + * it or SIGTERM handling is most likely buggy when used with -no-shutdown + * (which applies for qemu 0.14.* and 0.15.*) + */ + if (!strstr(help, "-no-shutdown") || + version / 1000 == 14 || version / 1000 == 15) + qemuCapsSet(flags, QEMU_CAPS_NO_FAKE_REBOOT); + /* * Handling of -incoming arg with varying features * -incoming tcp (kvm >= 79, qemu >= 0.10.0) diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 96b7a3b..b4f3ba4 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -110,6 +110,7 @@ enum qemuCapsFlags { QEMU_CAPS_PCI_OHCI = 71, /* -device pci-ohci */ QEMU_CAPS_USB_REDIR = 72, /* -device usb-redir */ QEMU_CAPS_USB_HUB = 73, /* -device usb-hub */ + QEMU_CAPS_NO_FAKE_REBOOT = 74, /* don't fake reboot using -no-shutdown */
s/FAKE_REBOOT/SHUTDOWN/ Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Wed, Sep 21, 2011 at 10:29:35 +0100, Daniel P. Berrange wrote:
On Wed, Sep 21, 2011 at 10:43:32AM +0200, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed only recently in QEMU git. The affected versions of QEMU do not quit on SIGTERM if started with -no-shutdown, which we use to implement fake reboot. Since -no-shutdown tells QEMU not to quit automatically on guest shutdown, domains started using the affected QEMU cannot be shutdown properly and stay in a paused state.
This patch disables fake reboot feature on such QEMU by not using -no-shutdown, which makes shutdown work as expected. However, virDomainReboot will not work in this case and it will report "Requested operation is not valid: Reboot is not supported with this QEMU binary".
NB, the qemu capability is called QEMU_CAPS_NO_FAKE_REBOOT for backward compatibility with running domains started by libvirtd that didn't have this patch in. --- src/qemu/qemu_capabilities.c | 9 +++++++++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 2 +- src/qemu/qemu_driver.c | 6 ++++++ 4 files changed, 17 insertions(+), 1 deletions(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 36f47a9..5b9e4a9 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -136,6 +136,7 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST, "pci-ohci", "usb-redir", "usb-hub", + "no-fake-reboot", );
This should really be 'no-shutdown'.
The reboot code is just one current potential user of that capability.
Yeah, that makes sense. I'm just worried about compatibility with domain started by older libvirtd since they were started with -no-shutdown but qemu capabilities in status xml doesn't mention that. So if the qemu is old or new enough to support -no-shutdown properly, the new libvirtd won't allow running domains to be rebooted using virDomainReboot until they are recreated and no-shutdown capability detected. Another issue is that domain started by older libvirtd with -no-shutdown will remain paused on shutdown. In other words, this patch fixes only domains that will be started in the future, not currently running ones. Jirka
participants (7)
-
Anthony Liguori
-
Daniel P. Berrange
-
Dave Allan
-
Eric Blake
-
Jason Krieg
-
Jiri Denemark
-
Kevin Wolf