Anthony Liguori wrote:
Daniel P. Berrange wrote:
> On Wed, Oct 08, 2008 at 11:06:27AM -0500, Anthony Liguori wrote:
> Sorry, it was mistakenly private - fixed now.
> Xen does use O_DIRECT for paravirt driver case - blktap is using the
> combo of AIO+O_DIRECT.
You have to use O_DIRECT with linux-aio. And blktap is well known to
have terrible performance. Most serious users use blkback/blkfront and
blkback does not avoid the host page cache. It maintains data integrity
by passing through barriers from the guest to the host. You can
approximate this in userspace by using fdatasync.
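(For concreteness, the AIO+O_DIRECT combination, with an fdatasync() standing
in for a guest-issued barrier, looks roughly like the sketch below. This is
illustrative only -- not blktap or QEMU code -- and the image name, block size
and error handling are placeholders. Build against libaio, e.g.
gcc -D_GNU_SOURCE aio_sketch.c -laio.)

    /* Open the image O_DIRECT, submit one aligned write via linux-aio,
     * then fdatasync() to approximate a guest-issued barrier. */
    #include <fcntl.h>
    #include <libaio.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("disk.img", O_RDWR | O_CREAT | O_DIRECT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* O_DIRECT needs sector-aligned buffers, lengths and offsets. */
        void *buf;
        if (posix_memalign(&buf, 512, 4096)) return 1;
        memset(buf, 0xab, 4096);

        io_context_t ctx = 0;
        if (io_setup(16, &ctx) < 0) {
            fprintf(stderr, "io_setup failed\n");
            return 1;
        }

        struct iocb cb, *cbs[1] = { &cb };
        io_prep_pwrite(&cb, fd, buf, 4096, 0);
        if (io_submit(ctx, 1, cbs) != 1) {
            fprintf(stderr, "io_submit failed\n");
            return 1;
        }

        struct io_event ev;
        io_getevents(ctx, 1, 1, &ev, NULL);  /* wait for the write */

        /* A guest barrier would arrive here: flush whatever the host
         * may still be caching before completing it to the guest. */
        if (fdatasync(fd) < 0) perror("fdatasync");

        io_destroy(ctx);
        free(buf);
        close(fd);
        return 0;
    }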
Your description of blkback is not accurate (at least for HVM guests using PV
drivers on Xen 3.2). blkback does indeed bypass the host page cache completely;
its I/O behavior is akin to O_DIRECT. I/O is DMA'd directly to/from guest pages
without involving any dom0 buffering.
blkback barrier support only enforces write ordering within the blkback I/O
stream(s). It does nothing to synchronize data in the host page cache. Data
written through blkback will modify the storage "underneath" any data in the
host page cache (without flushing the page cache), so subsequent access to the
page cache by qemu-dm will see stale data. In our own Xen product we must
explicitly flush the host page cache for the backing store at qemu-dm startup
to guarantee proper data access. It is not safe to access the same backing
object with both qemu-dm and blkback simultaneously.
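(The startup flush I'm referring to is essentially the following -- a minimal
sketch, not our product code. Whether BLKFLSBUF or
posix_fadvise(POSIX_FADV_DONTNEED) is the right tool depends on whether the
backing object is a block device or a file image:)

    /* Write back and invalidate host-cached pages for a backing object
     * before qemu-dm starts using it. */
    #include <fcntl.h>
    #include <linux/fs.h>      /* BLKFLSBUF */
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static int flush_backing_cache(const char *path)
    {
        int fd = open(path, O_RDWR);
        if (fd < 0) { perror(path); return -1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return -1; }

        fsync(fd);                           /* write back anything dirty */

        if (S_ISBLK(st.st_mode)) {
            if (ioctl(fd, BLKFLSBUF, 0) < 0) /* drop block device cache */
                perror("BLKFLSBUF");
        } else {
            /* Ask the kernel to drop cached pages of a file-backed image. */
            posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        }

        close(fd);
        return 0;
    }

    int main(int argc, char **argv)
    {
        return argc > 1 ? flush_backing_cache(argv[1]) : 0;
    }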
The issue the bug addresses, iozone performing better than native, can be
addressed in the following way:
1) For IDE, you have to disable write-caching in the guest. This should
force an fdatasync in the host.
2) For virtio-blk, we need to implement barrier support. This is what
blkfront/blkback do.
I don't think this is enough. Barrier semantics are local to a particular I/O stream.
There would be no reason for the barrier to affect the host page cache (unless the I/Os
are buffered by the cache).
3) For SCSI, we should support ordered queuing which would result in an
fdatasync when barriers are injected.
This would result in write performance being what was expected in the guest,
while still letting the host coalesce I/O requests and perform scheduling with
other guests (while respecting each guest's own ordering requirements).
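(For what it's worth, that barrier-to-fdatasync translation in a cached-I/O
backend might look something like the sketch below. The request structure and
constants are invented for illustration; this is not QEMU's actual request
handling:)

    /* Ordinary writes go through the host page cache, so the host can
     * coalesce and schedule them; only a guest barrier forces them to
     * stable storage via fdatasync(). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    enum req_type { REQ_WRITE, REQ_BARRIER };

    struct guest_req {            /* hypothetical guest request */
        enum req_type type;
        const void   *buf;
        size_t        len;
        off_t         offset;
    };

    static int handle_req(int fd, const struct guest_req *r)
    {
        switch (r->type) {
        case REQ_WRITE:
            /* Buffered write: lands in the host page cache. */
            return pwrite(fd, r->buf, r->len, r->offset)
                       == (ssize_t)r->len ? 0 : -1;
        case REQ_BARRIER:
            /* Everything written so far must be durable before the
             * barrier completes back to the guest. */
            return fdatasync(fd);
        }
        return -1;
    }

    int main(void)
    {
        int fd = open("disk.img", O_RDWR | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        char sector[512];
        memset(sector, 0x5a, sizeof(sector));

        struct guest_req w = { REQ_WRITE, sector, sizeof(sector), 0 };
        struct guest_req b = { REQ_BARRIER, NULL, 0, 0 };

        handle_req(fd, &w);
        handle_req(fd, &b);
        close(fd);
        return 0;
    }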
I generally agree with your suggestion that host page cache performance
benefits shouldn't be discarded just to make naive benchmark data collection
easier. Anyone suggesting that QEMU-emulated disk I/O could somehow outperform
the host I/O system should realize that something is wrong with their benchmark
setup. Unfortunately this discussion continues to reappear in the Xen
community, and I am sure that as QEMU/KVM/virtio mature, similar threads will
keep resurfacing.
Steve
Regards,
Anthony Liguori
> QEMU code is only used for the IDE emulation case which isn't
> interesting from a performance POV.
>
> Daniel
>