On 09/20/2011 12:06 PM, Dave Allan wrote:
> On Tue, Sep 20, 2011 at 07:39:15PM +0200, Jiri Denemark wrote:
>> The commit that prevents disk corruption on domain shutdown
>> (96fc4784177ecb70357518fa863442455e45ad0e) causes a regression with
>> QEMU 0.14.* and 0.15.* because of a regression bug in QEMU that was
>> fixed only recently in QEMU git. With affected QEMU binaries, domains
>> cannot be shut down properly and stay in a paused state. This patch
>> tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes.
>> However, we wait a bit longer between sending SIGTERM and SIGKILL to
>> reduce the possibility of virtual disk corruption.
> IMO, SIGKILL should only be sent at the explicit direction of the
> user, saying, in effect, "I'm OK with possible data corruption; I
> want the VM killed unconditionally." I would rather leave VMs paused
> than risk corrupting data. Let's get as much input as we can from
> the qemu folks before we go down this path.
That echoes my sentiment that qemu needs to tell us whether the bug is
fixed. We know that if the version is older than 0.14, the bug is not
present, and if it is newer than 0.15, the bug is fixed; but in the
0.1[45] window we don't know whether the vendor has back-ported the fix
into the qemu build we are targeting, unless we get some help from qemu.
I also wonder whether we should make virDomainDestroy(dom) fail with a
reasonable error message, rather than leave the domain paused, if we
think qemu has the bug, and require the user to call
virDomainDestroyFlags(dom, VIR_DOMAIN_DESTROY_FORCE) as the means of
explicitly requesting that libvirt work around the qemu bug.
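A minimal sketch of the proposed destroy policy, assuming a simplified domain record. SKETCH_DESTROY_FORCE is a hypothetical stand-in for the VIR_DOMAIN_DESTROY_FORCE flag proposed here, and the struct and function names are likewise hypothetical, not libvirt's actual API. The process signalling itself is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed VIR_DOMAIN_DESTROY_FORCE flag. */
#define SKETCH_DESTROY_FORCE (1u << 0)

/* Hypothetical, simplified domain record for illustration only. */
typedef struct {
    int running;           /* 1 while the QEMU process is alive */
    int qemu_may_have_bug; /* 1 if QEMU falls in the 0.14/0.15 window */
} sketch_domain;

/* Returns 0 on success; returns -1 and sets *errmsg when it refuses. */
static int
sketch_domain_destroy_flags(sketch_domain *dom, unsigned int flags,
                            const char **errmsg)
{
    assert(dom != NULL && errmsg != NULL);
    *errmsg = NULL;

    if (dom->qemu_may_have_bug && !(flags & SKETCH_DESTROY_FORCE)) {
        /* Fail loudly instead of leaving the domain paused. */
        *errmsg = "QEMU may not exit on SIGTERM; retry with the force "
                  "flag to allow SIGKILL (possible data corruption)";
        return -1;
    }

    /* Real code would send SIGTERM here and, only when the caller
     * passed the force flag, fall back to SIGKILL. */
    dom->running = 0;
    return 0;
}

/* Plain destroy: the default, non-forcing behaviour. */
static int
sketch_domain_destroy(sketch_domain *dom, const char **errmsg)
{
    return sketch_domain_destroy_flags(dom, 0, errmsg);
}
```

The design makes the data-loss risk opt-in: on a suspect QEMU, a plain destroy fails with an explanation, and only an explicit force request can lead to SIGKILL.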
--
Eric Blake eblake(a)redhat.com +1-801-349-2682
Libvirt virtualization library
http://libvirt.org