On Tue, Sep 20, 2011 at 14:06:49 -0400, Dave Allan wrote:
> On Tue, Sep 20, 2011 at 07:39:15PM +0200, Jiri Denemark wrote:
> > The commit that prevents disk corruption on domain shutdown
> > (96fc4784177ecb70357518fa863442455e45ad0e) causes regression with QEMU
> > 0.14.* and 0.15.* because of a regression bug in QEMU that was fixed
> > only recently in QEMU git. With affected QEMU binaries, domains cannot
> > be shutdown properly and stay in a paused state. This patch tries to
> > avoid this by sending SIGKILL to 0.1[45].* QEMU processes. Though we
> > wait a bit more between sending SIGTERM and SIGKILL to reduce the
> > possibility of virtual disk corruption.
>
> IMO, SIGKILL should only be sent at the explicit direction of the
> user, saying in effect, I'm ok with possible data corruption, I want
> the VM killed unconditionally. I would rather leave VMs paused than
> risk corrupting data. Let's get as much input as we can from the qemu
> folks before we go down this path.

Yes, I agree with that. It is better to leave the domain paused (with
virDomainGetState() or virsh domstate --reason reporting that it is paused
because it shut down) and let users explicitly call virDomainDestroy() to get
rid of it than to silently keep the possibility of disk corruption. However,
some people may see this as such a big regression in virDomainShutdown() that
we should rather live with the possible corruption. After all, the disk
corruption has so far been seen only with a specific Windows version and in a
specific scenario, and the probability of hitting it may be quite low. And this
patch doesn't make things worse than 0.9.[34]; it actually makes them a bit
better by giving qemu more time before sending SIGKILL.
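
To make the two options concrete, here is a minimal Python sketch of the
SIGTERM-then-SIGKILL escalation being discussed. It is not libvirt code: the
function name stop_process and its parameters grace_seconds and allow_kill are
hypothetical, and it assumes a POSIX system and a plain child process rather
than a real QEMU instance. With allow_kill=False it behaves like the "leave it
alone and let the user decide" option; with allow_kill=True it behaves like the
patch, waiting for a grace period before resorting to SIGKILL.

```python
import signal
import subprocess


def stop_process(proc, grace_seconds, allow_kill=False):
    """Ask a child process to exit; escalate to SIGKILL only if allowed.

    proc is a subprocess.Popen object. Returns "exited" if the process
    went away within the grace period after SIGTERM, "killed" if SIGKILL
    had to be sent, or "still-running" if it ignored SIGTERM and the
    caller did not allow SIGKILL.
    """
    # Polite request first: the process may still flush its data.
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "exited"
    except subprocess.TimeoutExpired:
        pass

    if not allow_kill:
        # Leave the process alone; the caller decides what happens next.
        return "still-running"

    # Unconditional kill: the process gets no chance to clean up.
    proc.kill()
    proc.wait()
    return "killed"
```

A caller that shares Dave's concern would use the default allow_kill=False and
only retry with allow_kill=True once the user has explicitly asked for the
process to be destroyed. A longer grace_seconds narrows the window for
corruption but does not remove it.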
Anyway, this patch provided a starting point for our discussion on whether/how
we should address the shutdown regression.
Jirka