Daniel Veillard wrote:
On Wed, Oct 08, 2008 at 10:51:16AM -0500, Anthony Liguori wrote:
> Daniel P. Berrange wrote:
>> - It is unsafe on host OS crash - all unflushed guest I/O will be
>> lost, and there are no ordering guarantees, so metadata updates could
>> be flushed to disk while the journal updates were not. Say goodbye
>> to your filesystem.
> This has nothing to do with cache=off. The IDE device defaults to
> write-back caching. As such, IDE makes no guarantee that when a data
> write completes, it's actually on disk. Such a guarantee only comes
> into play when write-back caching is disabled. I'm perfectly happy to
> accept a patch that adds explicit syncs when write-back is disabled.
>
> For SCSI, an unordered queue is advertised. Again, everything depends
> on whether or not write-back caching is enabled. Again, I'm perfectly
> happy to take patches here.
>
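
To make the write-back point concrete, the completion rule amounts to
roughly the sketch below (illustrative only, not the actual QEMU code
paths; the function and variable names are made up):

#include <unistd.h>

/* 'fd' is the host file descriptor backing the emulated drive;
 * 'writeback_enabled' reflects the cache setting the guest has
 * configured on that drive. */
static int emulated_write(int fd, const void *buf, size_t len, off_t off,
                          int writeback_enabled)
{
    ssize_t ret = pwrite(fd, buf, len, off);
    if (ret < 0 || (size_t)ret != len)
        return -1;

    /* Only when the guest has disabled write-back caching does the
     * emulated device promise that a completed write is on stable
     * storage, so only then pay for the explicit sync. */
    if (!writeback_enabled && fdatasync(fd) < 0)
        return -1;

    return 0;   /* now it is safe to signal completion to the guest */
}
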
> More importantly, the most common journaled filesystem, ext3, does not
> enable write barriers by default (even for journal updates). This is
> how it ships in Red Hat distros. So there is no greater risk of
> corrupting a journal in QEMU than there is on bare metal.
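
(For reference, barriers can be turned on for ext3 with the barrier=1
mount option, e.g. a line like the following in /etc/fstab -- the device
name and mount point here are just placeholders:

    /dev/VolGroup00/LogVol00  /  ext3  defaults,barrier=1  1 1

but, as noted above, that is not what ships enabled by default.)
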
Interesting discussion. I'm wondering about the non-local storage
effect, though: if the Node is caching writes, how can we ensure a
coherent view on remote storage, for example when migrating a domain?
Maybe migration is easy to fix because qemu is aware and can issue a
sync, but as we start adding cloning APIs to libvirt, we could face the
issue when issuing an LVM snapshot operation on the guest storage while
the Node still caches some of the data. The more layers of caching, the
harder it is to have predictable behaviour, no?

Any live migration infrastructure must guarantee the write ordering
between guest writes generated on the "old" node and guest writes
generated on the "new" node. This usually happens as the live migration
crosses the point of no return where the guest is allowed to execute
code on the "new" node. The "old" node must flush its writes and/or the
"new" node must delay any new writes until it is safe to do so. In the
case of LVM snapshots, only one node can safely access the snapshot at
a time, so an organized transfer of the active snapshot is necessary
during the live migration. For the case of CLVM, I would think the
"cluster-aware" bits would coordinate the transfer. Even in this case,
though, the data must be flushed out of the page cache on the "old"
node and onto the storage itself.
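
For illustration, the "flush on the old node" step boils down to
something like the sketch below. This is not the actual QEMU/libvirt
migration code, and 'disk_fd' is just a placeholder for the open
backing file or block device of the guest disk:

#include <unistd.h>

static int flush_before_handover(int disk_fd)
{
    /* New guest writes have already been stopped (elided here); now
     * push everything still sitting in the "old" node's page cache
     * down to the shared storage the "new" node will read from. */
    if (fsync(disk_fd) < 0)
        return -1;

    /* Only after this returns is it safe to cross the point of no
     * return and let the guest run on the "new" node. */
    return 0;
}
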
Steve