Re: [libvirt] [PATCH] qemu: Fix shutdown regression

20 Sep 2011

On 09/20/2011 12:06 PM, Dave Allan wrote:
...
On Tue, Sep 20, 2011 at 07:39:15PM +0200, Jiri Denemark wrote:
...
The commit that prevents disk corruption on domain shutdown
(96fc4784177ecb70357518fa863442455e45ad0e) causes a regression with QEMU
0.14.* and 0.15.* because of a regression in QEMU that was fixed
only recently in QEMU git. With affected QEMU binaries, domains cannot
be shut down properly and stay in a paused state. This patch tries to
avoid this by sending SIGKILL to 0.1[45].* QEMU processes, although we
wait a bit longer between sending SIGTERM and SIGKILL to reduce the
possibility of virtual disk corruption.

IMO, SIGKILL should only be sent at the explicit direction of the
user, saying in effect, I'm ok with possible data corruption, I want
the VM killed unconditionally.  I would rather leave VMs paused than
risk corrupting data.  Let's get as much input as we can from the qemu
folks before we go down this path.

That echoes my sentiment that qemu needs to tell us whether the bug is
fixed. We know that if version < 0.14, the bug is not present, and if
version > 0.15, the bug is fixed; it is only in the 0.1[45] window that
we cannot tell, without some help from qemu, whether the vendor has
back-ported the fix into the version of qemu that we are targeting.
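
To make the version window concrete, here is a rough sketch of the
three-way answer a version check alone can give. The enum and function
names are made up for illustration and are not libvirt code; the point is
that the middle case cannot be narrowed down from the version number.

```c
/* Hypothetical three-way result; none of these names exist in libvirt. */
typedef enum {
    SHUTDOWN_BUG_ABSENT,   /* older than 0.14: regression not present */
    SHUTDOWN_BUG_UNKNOWN,  /* 0.14.x or 0.15.x: fix may or may not be back-ported */
    SHUTDOWN_BUG_FIXED     /* newer than 0.15: fix is included */
} shutdown_bug_state;

/* Classify a qemu release by major/minor version only. */
shutdown_bug_state
classify_qemu_version(int major, int minor)
{
    if (major == 0 && minor < 14)
        return SHUTDOWN_BUG_ABSENT;
    if (major == 0 && minor <= 15)
        return SHUTDOWN_BUG_UNKNOWN;
    return SHUTDOWN_BUG_FIXED;
}
```

Anything that lands in SHUTDOWN_BUG_UNKNOWN is exactly the case where we
would need qemu itself to advertise whether the fix is there.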

I also wonder if we should make it so:

virDomainDestroy(dom) fails with a reasonable message, rather than
leaving the domain paused, if we think qemu has the bug, and require the
user to call virDomainDestroyFlags(dom, VIR_DOMAIN_DESTROY_FORCE) as the
explicit way of requesting that we work around the qemu bug.
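
A minimal sketch of the control flow I have in mind, as a toy model
rather than a patch. DESTROY_FORCE stands in for the proposed
VIR_DOMAIN_DESTROY_FORCE flag, and destroy_domain() and its boolean
argument are hypothetical names, not existing libvirt functions:

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the proposed VIR_DOMAIN_DESTROY_FORCE flag (assumption). */
#define DESTROY_FORCE (1u << 0)

typedef enum {
    DESTROY_OK = 0,
    DESTROY_REFUSED = -1
} destroy_result;

/*
 * Toy model of the proposed behavior: refuse a plain destroy when qemu
 * may have the shutdown bug, unless the caller explicitly passed the
 * force flag and so accepted the risk of disk corruption.
 */
destroy_result
destroy_domain(bool qemu_may_have_bug, unsigned int flags)
{
    if (qemu_may_have_bug && !(flags & DESTROY_FORCE)) {
        fprintf(stderr, "qemu may not shut down cleanly; "
                        "retry with the force flag to kill it anyway\n");
        return DESTROY_REFUSED;
    }
    /* A real implementation would send SIGTERM here and fall back to
     * SIGKILL only on the forced path. */
    return DESTROY_OK;
}
```

With this shape, a plain destroy on a possibly affected qemu returns an
error instead of leaving the domain paused, and the SIGKILL path is only
reachable when the user asked for it.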

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org