Hi,
Not greatly familiar with this subject, but trying to follow your
logic ...
On Wed, 2008-10-08 at 10:51 -0500, Anthony Liguori wrote:
> Daniel P. Berrange wrote:
> > - It is unsafe on host OS crash - all unflushed guest I/O will be
> > lost, and there are no ordering guarantees, so metadata updates
> > could be flushed to disk while the journal updates were not. Say
> > goodbye to your filesystem.
> This has nothing to do with cache=off. The IDE device defaults to
> write-back caching. As such, IDE makes no guarantee that when a data
> write completes, it's actually completed on disk. That guarantee only
> comes into play when write-back is disabled. I'm perfectly happy to
> accept a patch that adds explicit syncs when write-back is disabled.
i.e. with write-back caching enabled, the IDE protocol makes no
guarantees about when data is committed to disk.
So, from a protocol correctness POV, qemu is behaving correctly with
cache=on and write-back caching enabled on the disk.
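Out of curiosity, is the fix you'd take roughly along these lines?
A minimal sketch (plain POSIX, not actual qemu code - disk_state and
its fields are invented for illustration) of syncing before completing
a write when the guest has write-back disabled:

#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Invented per-disk state; in qemu this would live with the device. */
struct disk_state {
    int fd;            /* host image file */
    bool write_cache;  /* guest-visible write-back cache enabled? */
};

/* Complete a guest write request.  With write-back caching disabled,
 * "command complete" must mean "data on stable storage", so flush the
 * host page cache before reporting success. */
ssize_t disk_write(struct disk_state *d, const void *buf,
                   size_t len, off_t offset)
{
    ssize_t ret = pwrite(d->fd, buf, len, offset);
    if (ret < 0)
        return ret;

    if (!d->write_cache && fdatasync(d->fd) < 0)
        return -1;

    return ret;
}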
> For SCSI, an unordered queue is advertised. Again, everything depends
> on whether or not write-back caching is enabled. Again, perfectly
> happy to take patches here.
Queue ordering and write-back caching sound like very different things.
Are they two distinct SCSI options, or ...?
Surely an ordered queue doesn't do much to prevent fs corruption if
the host crashes, right? You would still need write-back caching
disabled?
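If I've got this right, they're independent knobs - something like
this toy model (the struct and field names are invented; WCE is the
write-cache-enable bit from the SCSI caching mode page):

#include <stdbool.h>

/* Toy model to check my understanding, not real SCSI emulation code:
 * ordering and caching are independent properties of a target. */
struct scsi_disk {
    bool wce;      /* caching mode page WCE bit: write-back cache on */
    bool ordered;  /* target preserves the order of queued commands */
};

/* Whether "command complete" implies "data on stable storage".
 * Only the cache setting matters here; an ordered queue with WCE=1
 * still loses whatever sits in the cache on power failure. */
bool write_durable_on_completion(const struct scsi_disk *d)
{
    return !d->wce;
}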
> More importantly, the most common journaled filesystem, ext3, does
> not enable write barriers by default (even for journal updates).
> This is how it ships in Red Hat distros.
i.e. implementing barriers for virtio won't help most ext3 deployments?
And again, if barriers are just about ordering, don't you need to
disable caching anyway?
> So there is no greater risk of corrupting a journal in QEMU than
> there is on bare metal.
This is the bit I really don't buy - we're equating qemu caching to IDE
write-back caching and saying the risk of corruption is the same in both
cases.
But doesn't qemu cache data for far, far longer than a typical IDE
disk with write-back caching would? (A drive's write cache is a few
megabytes that it typically destages quickly; the host page cache can
hold hundreds of megabytes of dirty data for 30 seconds or more by
default.) Doesn't that mean you're far, far more likely to see fs
corruption with qemu caching?
Or, to put it another way, if we fix it by implementing the disabling
of write-back caching ... users running a virtual machine will need to
run "hdparm -W 0 /dev/sda" where they would never have run it on bare
metal?
Cheers,
Mark.