Re: [libvirt] PATCH: Disable QEMU drive caching

8 Oct 2008

      Daniel P. Berrange wrote:
...
On Wed, Oct 08, 2008 at 01:15:46PM +0200, Chris Lalancette wrote:
...
Daniel P. Berrange wrote:
...
QEMU defaults to allowing the host OS to cache all disk I/O. This has a
couple of problems:
 - It is a waste of memory because the guest already caches I/O ops
 - It is unsafe on host OS crash - all unflushed guest I/O will be
   lost, and there are no ordering guarantees, so metadata updates could
   be flushed to disk, while the journal updates were not. Say goodbye
   to your filesystem.
 - It makes benchmarking more or less impossible / worthless because
   what the benchmark thinks are disk writes just sit around in memory
   so guest disk performance appears to exceed host disk performance.
This patch disables caching on all QEMU guests. NB, Xen has long done this
for both PV & HVM guests - QEMU only gained this ability when -drive was
introduced, and sadly kept the unsafe cache=on setting as the default.
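
As a rough illustration of what this amounts to on the QEMU command line,
here is a minimal sketch that builds a -drive argument with host caching
turned off. The helper name build_drive_arg and the image path are
hypothetical; only -drive and the cache=on / cache=off setting come from
the discussion, and this is not the code from the patch itself.

```python
def build_drive_arg(image_path, cache_enabled=False):
    """Return a QEMU -drive argument pair for one disk image.

    Hypothetical helper for illustration only; it is not libvirt's code.
    Assumes image_path contains no commas, since a comma separates
    options inside the -drive value.
    """
    cache = "on" if cache_enabled else "off"
    return ["-drive", "file={},cache={}".format(image_path, cache)]


if __name__ == "__main__":
    # The image path below is a made-up example.
    print(" ".join(build_drive_arg("/var/lib/images/guest.img")))
```

Passing cache_enabled=True reproduces the old default, which can be handy
when comparing the two settings side by side.
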
I'm for this in general, but I'm a little worried about the "performance
regression" aspect of this.  People are going to upgrade to 0.4.7 (or whatever),
and suddenly find that their KVM guests perform much more slowly.  This is
better in the end for their data, but we might hear large complaints about it.
Yes & no. They will find their guests perform more consistently. With the
current system their guests will perform very erratically depending on 
memory & I/O pressure on the host. If the host I/O cache is empty & has 
no I/O load, current guests will be "fast",
They will perform marginally better than if cache=off.  This is because the
Linux host knows more about the underlying hardware than the guest and
is able to do smarter read-ahead.  When using cache=off, the host cannot
perform any sort of read-ahead.
...
but if host I/O cache is full
and they do something which requires more host memory (eg start up another
guest), then all existing guests get their I/O performance trashed as the
I/O cache has to be flushed out, and future I/O is unable to be cached.
This is not accurate.  Dirty pages in the host page cache are not 
reclaimable until they're written to disk.  If you're in a seriously low
memory situation, then the thing allocating memory is going to sleep
until the data is written to disk.  If an existing guest is trying to do
I/O, then what things will degenerate to is basically cache=off since
the guest must wait for other pending I/O to complete.
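
One way to watch this on a Linux host is to look at the Dirty and
Writeback fields in /proc/meminfo while the guests are doing I/O. The
sketch below is only an illustration; the function names are made up, and
it assumes the usual "Name:   value kB" layout of that file.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: numeric value}.

    Most fields are reported in kB; lines without a leading number
    after the colon are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dirty_and_writeback_kb(text):
    """Return (Dirty, Writeback) in kB; a missing field counts as 0."""
    info = parse_meminfo(text)
    return info.get("Dirty", 0), info.get("Writeback", 0)


def read_host_dirty_kb(path="/proc/meminfo"):
    """Read the live values from a Linux host (the path is Linux-specific)."""
    with open(path) as f:
        return dirty_and_writeback_kb(f.read())
```

Calling read_host_dirty_kb() repeatedly under memory pressure should show
Dirty draining into Writeback before those pages can be reclaimed.
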
...
Xen went through this same change and there were not any serious
complaints, particularly once it was explained that the previous system
had zero data integrity guarantees. The current system merely provides an
illusion of performance - any attempt to show that performance has 
decreased is impossible because any attempt to run benchmarks with
existing caching just results in meaningless garbage.
https://bugzilla.redhat.com/show_bug.cgi?id=444047
I can't see this bug, but a quick grep of ioemu in xen-unstable for 
O_DIRECT reveals that they are not in fact using O_DIRECT.

O_DIRECT, O_SYNC, and fsync are not the same mechanism.
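
To make the distinction concrete, here is a small sketch of the three
mechanisms as seen from a plain Linux process. It is not what QEMU or
ioemu does internally; the function names, the 4096-byte block size and
the zero padding are assumptions made for the example.

```python
import mmap
import os


def write_then_fsync(path, data):
    """Buffered write followed by an explicit fsync() on the file."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)  # may only reach the host page cache
        os.fsync(fd)        # blocks until this file's data is written out
    finally:
        os.close(fd)


def write_o_sync(path, data):
    """O_SYNC: each write() returns only once the data is written out.

    The data still goes through the host page cache.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def write_o_direct(path, data, block=4096):
    """O_DIRECT: ask the kernel to bypass the host page cache.

    Linux-only. Needs an aligned buffer and an aligned length, so data
    (assumed to be at most `block` bytes) is zero-padded to one block.
    Some filesystems reject O_DIRECT. On its own the flag does not give
    the flush guarantees of O_SYNC or fsync().
    """
    buf = mmap.mmap(-1, block)  # anonymous mapping is page-aligned
    buf.write(data.ljust(block, b"\0"))
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, buf)
    finally:
        os.close(fd)
        buf.close()
```

The first two control when data is flushed; only the third changes
whether the host page cache is involved at all.
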

Regards,

Anthony Liguori