
On Tue, Sep 20, 2011 at 02:06:49PM -0400, Dave Allan wrote:
On Tue, Sep 20, 2011 at 07:39:15PM +0200, Jiri Denemark wrote:
The commit that prevents disk corruption on domain shutdown (96fc4784177ecb70357518fa863442455e45ad0e) causes a regression with QEMU 0.14.* and 0.15.* because of a bug in QEMU that was fixed only recently in QEMU git. With affected QEMU binaries, domains cannot be shut down properly and stay in a paused state. This patch tries to avoid this by sending SIGKILL to 0.1[45].* QEMU processes, though we wait a bit longer between sending SIGTERM and SIGKILL to reduce the possibility of virtual disk corruption.
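For readers unfamiliar with the SIGTERM-then-SIGKILL escalation described above, here is a minimal, generic sketch in Python. It is not the libvirt patch (libvirt is written in C), and the function name `stop_process`, the grace period, and the poll interval are all hypothetical choices made for illustration. It assumes a POSIX system.

```python
import signal
import subprocess
import time


def stop_process(proc, grace_seconds=3.0, poll_interval=0.05):
    """Ask a child process to exit with SIGTERM, then fall back to SIGKILL.

    `proc` is a subprocess.Popen object. Returns the name of the last
    signal that was sent. The default grace period is an arbitrary
    illustrative value, not one taken from libvirt.
    """
    proc.send_signal(signal.SIGTERM)

    # Give the process a chance to flush its state and exit on its own.
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return "SIGTERM"
        time.sleep(poll_interval)

    # One last check before escalating.
    if proc.poll() is not None:
        return "SIGTERM"

    # The process ignored or could not act on SIGTERM: kill it
    # unconditionally. On POSIX, Popen.kill() sends SIGKILL.
    proc.kill()
    proc.wait()
    return "SIGKILL"
```

A longer grace period lowers the chance that the process is killed while it still has unwritten data, at the cost of a slower shutdown; it cannot rule that out entirely, which is the concern raised in the reply below.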
IMO, SIGKILL should only be sent at the explicit direction of the user, saying, in effect, "I'm OK with possible data corruption; I want the VM killed unconditionally." I would rather leave VMs paused than risk corrupting data. Let's get as much input as we can from the QEMU folks before we go down this path.
If we want that, then we need a different way to deal with this shutdown problem. Leaving VMs in a paused state is not acceptable behaviour: our IRC channel has already been flooded with people complaining about this, and we only released this 2 days ago. This patch of Jiri's attempts to minimise the likelihood of hitting a problem. If people want an absolute guarantee, they have the option to update to a version of QEMU that has the fix, or distros can backport the fix to QEMU and disable this bit in libvirt.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|