Re: [libvirt-users] Behavior of disk caching with qcow2 disks

14 Aug 2014

      ----- Original Message -----
...
From: "Kashyap Chamarthy" <kchamart@redhat.com>
Looking at the `qemu-img` source[1], 'cache=writeback' seems to be the
default. That's also corroborated by this post[2] on the blog of Rich,
the libguestfs/virt-tools lead developer.
[1] http://git.qemu.org/?p=qemu.git;a=blob;f=qemu-img.c;h=d4518e724f848a6ff8ffaf...
[2] http://rwmj.wordpress.com/2013/09/02/new-in-libguestfs-allow-cache-mode-to-b...
...
Which is correct?
"cache=writeback"
...
How is the cache mode set by default (if cache= is not
specified)?
It's compiled into the binary.
...
My second question is can cache=none be used safely on a local ext4
filesystem with no BBU? Since ext4 uses barriers, would writing to
these qcow2 image files be safe? The kernel documentation about
barriers states that "Write barriers enforce proper on-disk ordering
of journal commits, making volatile disk write caches safe to use, at
some performance penalty". Does this apply to qcow2 VM images?
FWIW, in my test environments (where, I should admit, there's not a
whole lot of I/O activity), I use:
$ qemu-img create -f qcow2 -o preallocation=metadata test1.qcow2 8G
Followed by a `fallocate`:
$ fallocate -l 8589934592 test1.qcow2
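The byte count passed to `fallocate` above matches the 8G given to `qemu-img`, on the assumption that the "G" suffix is read as a power-of-1024 unit. A minimal sketch of that arithmetic (the variable name is only illustrative):

```shell
# 8 GiB expressed in bytes, assuming "8G" means 8 * 1024^3
size_bytes=$((8 * 1024 * 1024 * 1024))
echo "$size_bytes"   # prints 8589934592
```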
Then, I used to invoke QEMU with "cache=none" (setting it in libvirt's guest
XML), but lately I started using the default "cache=writeback" after
I learnt about the bug from Rich's blog above.
--
/kashyap
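For readers unfamiliar with where the cache mode lives in libvirt's guest XML, here is a rough sketch. It writes an illustrative <disk> fragment to a scratch file and prints the cache mode it requests. The file name, image path, and target device are assumptions made for the example, not values from this thread; in a real guest the <driver> line would be changed with `virsh edit`.

```shell
# Hypothetical scratch file holding an example <disk> element
fragment=disk-cache-fragment.xml

cat > "$fragment" <<'EOF'
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>
  <source file='/var/lib/libvirt/images/test1.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
EOF

# Extract the value of the cache attribute from the <driver> line
cache_mode=$(sed -n "s/.*cache='\([a-z]*\)'.*/\1/p" "$fragment")
echo "$cache_mode"
```

If the cache attribute is left out of the <driver> element, the hypervisor's built-in default applies, which is the case discussed above.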
Kashyap,

Thanks for the clarification. Rich's article seems to indicate that cache=writeback
is safe:
...
writeback is the new, safe default. Flush commands are obeyed so as long as you’re
using a journalled filesystem or issue guestfs_sync calls your data will be safe. 
However, I have several VMs running on a server with qemu-kvm 1.4.0 and libguestfs 1.14.8
(older because this is Ubuntu 12.04) using the default cache mode, cache=writeback.
Recently this server's UPS experienced a fault, so the host and all of the VMs lost power.
After booting back up, I discovered that the filesystems on 3 of the guests were 
corrupted, requiring an fsck with a lot of fixes. After fsck finished, data that had 
been written within the last 24-48 hours on the disks appears to have been corrupted. 
This makes me think that the data was never synced back to the disk, and would 
indicate that I can't trust the guest's journalled filesystem. This data was written 
several hours before the crash, so I would think that should be enough time for an 
fsync to be called.
How can I guarantee the safety of written data on guests whose images are stored on 
the following types of filesystems:
* local ext4 filesystem on a md RAID (no BBU)
* NFS mountpoint with the "sync" option

Thanks,

Andrew

Andrew Martin