Daniel P. Berrange wrote:
QEMU defaults to allowing the host OS to cache all disk I/O. This has
a couple of problems:
Oh, say it ain't so. This is precisely what I didn't want to see happen :-(
- It is a waste of memory because the guest already caches I/O ops
Page cache memory is easily reclaimable and has relatively low priority.
If a guest needs memory, the size of the page cache will be reduced.
- It is unsafe on host OS crash - all unflushed guest I/O will be
lost, and there are no ordering guarantees, so metadata updates could
be flushed to disk while the journal updates were not. Say goodbye
to your filesystem.
This has nothing to do with cache=off. The IDE device defaults to
write-back caching. As such, IDE makes no guarantee that when a data
write completes, it's actually completed on disk. This only comes into
play when write-back is disabled. I'm perfectly happy to accept a patch
that adds explicit syncs when write-back is disabled.
For SCSI, an unordered queue is advertised. Again, everything depends
on whether write-back caching is enabled. Again, I'm perfectly happy
to take patches here.
More importantly, the most common journaled filesystem, ext3, does not
enable write barriers by default (even for journal updates). This is
how it ships in Red Hat distros. So there is no greater risk of
corrupting a journal in QEMU than there is on bare metal.
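
For illustration only, here is a minimal sketch of what "explicit syncs when write-back is disabled" could look like for a file-backed disk. It is not QEMU code: `write_sector` and the `wce` (write cache enabled) parameter are hypothetical names, and the sketch assumes a POSIX host where `pwrite` and `fdatasync` are available.

```c
#define _POSIX_C_SOURCE 200809L /* for pwrite() and fdatasync() */
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not a QEMU function: write one buffer to the
 * image file. If the guest has disabled the device's write-back cache
 * (wce == false), force the data to stable storage before reporting
 * completion. With write-back enabled, completion only means the data
 * reached the host page cache.
 */
static int write_sector(int fd, const void *buf, size_t len, off_t off,
                        bool wce)
{
    ssize_t n = pwrite(fd, buf, len, off);

    if (n < 0 || (size_t)n != len)
        return -1;              /* error or short write */
    if (!wce && fdatasync(fd) < 0)
        return -1;              /* data did not reach the disk */
    return 0;
}
```

With `wce` true this behaves like a bare-metal disk with its write cache on, which is the point made above: the guarantee changes only when the guest turns write-back off.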
- It makes benchmarking more or less impossible / worthless because
what the benchmark thinks are disk writes just sit around in memory,
so guest disk performance appears to exceed host disk performance.
It just means you have to understand the extra level of caching.
A great many virtualization users are doing some form of homogeneous
consolidation. If they have a good set of management tools or
sophisticated storage, then their guests will be sharing base images or
something like that. Caching in the host will result in major
performance improvements because otherwise, the same data will be
fetched multiple times.
This patch disables caching on all QEMU guests. NB, Xen has long done
this for both PV & HVM guests
They don't for HVM actually. When using file: for PV disks, it also
goes through the host page cache. For HVM, Xen uses the write-back
disabled synchronization stuff I mentioned earlier.
This is a really bad thing to do by default. I don't even think it
should be an option for users because it's so terribly misunderstood.
Regards,
Anthony Liguori
- QEMU only gained this ability when -drive was
introduced, and sadly kept the unsafe cache=on setting as the default.
Daniel
diff -r 4a0ccc9dc530 src/qemu_conf.c
--- a/src/qemu_conf.c Wed Oct 08 11:53:45 2008 +0100
+++ b/src/qemu_conf.c Wed Oct 08 11:59:33 2008 +0100
@@ -460,6 +460,8 @@
flags |= QEMUD_CMD_FLAG_DRIVE;
if (strstr(help, "boot=on"))
flags |= QEMUD_CMD_FLAG_DRIVE_BOOT;
+ if (strstr(help, "cache=on"))
+ flags |= QEMUD_CMD_FLAG_DRIVE_CACHE;
if (version >= 9000)
flags |= QEMUD_CMD_FLAG_VNC_COLON;
@@ -959,13 +961,15 @@
break;
}
- snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s",
+ snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s%s",
disk->src ? disk->src : "", bus,
media ? media : "",
idx,
bootable &&
disk->device == VIR_DOMAIN_DISK_DEVICE_DISK
- ? ",boot=on" : "");
+ ? ",boot=on" : "",
+ qemuCmdFlags & QEMUD_CMD_FLAG_DRIVE_CACHE
+ ? ",cache=off" : "");
ADD_ARG_LIT("-drive");
ADD_ARG_LIT(opt);
diff -r 4a0ccc9dc530 src/qemu_conf.h
--- a/src/qemu_conf.h Wed Oct 08 11:53:45 2008 +0100
+++ b/src/qemu_conf.h Wed Oct 08 11:59:33 2008 +0100
@@ -44,7 +44,8 @@
QEMUD_CMD_FLAG_NO_REBOOT = (1 << 2),
QEMUD_CMD_FLAG_DRIVE = (1 << 3),
QEMUD_CMD_FLAG_DRIVE_BOOT = (1 << 4),
- QEMUD_CMD_FLAG_NAME = (1 << 5),
+ QEMUD_CMD_FLAG_DRIVE_CACHE = (1 << 5),
+ QEMUD_CMD_FLAG_NAME = (1 << 6),
};
/* Main driver state */
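
The logic of the patch above can be sketched as a small standalone program: probe the QEMU help text for `cache=on`, record a capability flag, and append `,cache=off` to the `-drive` argument only when that flag is set. The flag values are taken from the patched `src/qemu_conf.h`; `probe_flags` and `build_drive_opt` are hypothetical helper names, the `media` component is omitted for brevity, and the sketch gates `cache=off` on the cache capability flag, the one the probe sets for that purpose.

```c
#include <stdio.h>
#include <string.h>

/* Flag values as they appear in the patched src/qemu_conf.h. */
enum {
    QEMUD_CMD_FLAG_DRIVE_BOOT  = (1 << 4),
    QEMUD_CMD_FLAG_DRIVE_CACHE = (1 << 5),
};

/* Hypothetical helper: derive capability flags from the help text. */
static unsigned int probe_flags(const char *help)
{
    unsigned int flags = 0;

    if (strstr(help, "boot=on"))
        flags |= QEMUD_CMD_FLAG_DRIVE_BOOT;
    if (strstr(help, "cache=on"))
        flags |= QEMUD_CMD_FLAG_DRIVE_CACHE;
    return flags;
}

/* Hypothetical helper: format the argument that follows -drive. */
static int build_drive_opt(char *opt, size_t len, const char *src,
                           const char *bus, int idx, int bootable,
                           unsigned int flags)
{
    return snprintf(opt, len, "file=%s,if=%s,index=%d%s%s",
                    src ? src : "", bus, idx,
                    bootable ? ",boot=on" : "",
                    /* Only pass cache=off to binaries that understand it. */
                    (flags & QEMUD_CMD_FLAG_DRIVE_CACHE) ? ",cache=off" : "");
}
```

A QEMU binary whose help text lacks `cache=on` gets the same argument string as before the patch, so older versions are not handed an option they would reject.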