Steve Ofsthun wrote:
Anthony Liguori wrote:
> Daniel P. Berrange wrote:
>
>> On Wed, Oct 08, 2008 at 11:06:27AM -0500, Anthony Liguori wrote:
>> Sorry, it was mistakenly private - fixed now.
>> Xen does use O_DIRECT for paravirt driver case - blktap is using the
>> combo
>> of AIO+O_DIRECT.
>>
> You have to use O_DIRECT with linux-aio. And blktap is well known to
> have terrible performance. Most serious users use blkback/blkfront and
> blkback does not avoid the host page cache. It maintains data integrity
> by passing through barriers from the guest to the host. You can
> approximate this in userspace by using fdatasync.
>
This is not accurate (at least for HVM guests using PV drivers on Xen 3.2). blkback does
indeed bypass the host page cache completely. Its I/O behavior is akin to O_DIRECT.

I reread the code more closely and convinced myself that you are
correct. While it was obvious that the bios were being constructed
from granted pages, my initial impression was that the requests were
still going through the scheduler and could still be satisfied from the
host page cache. But that is not the case.

I/O is dma'd directly to/from guest pages without involving any
dom0 buffering. blkback barrier support only enforces write ordering of the blkback I/O
stream(s). It does nothing to synchronize data in the host page cache. Data written
through blkback will modify the storage "underneath" any data in the host page
cache (w/o flushing the page cache). Subsequent access to the page cache by qemu-dm will
access stale data. In our own Xen product we must explicitly flush the host page cache
backing store data at qemu-dm start up, to guarantee proper data access. It is not safe
to access the same backing object with both qemu-dm and blkback simultaneously.
> The issue the bug addresses, iozone performs better than native, can be
> addressed in the following way:
>
> 1) For IDE, you have to disable write-caching in the guest. This should
> force an fdatasync in the host.
> 2) For virtio-blk, we need to implement barrier support. This is what
> blkfront/blkback do.
>
I don't think this is enough. Barrier semantics are local to a particular I/O
stream. There would be no reason for the barrier to affect the host page cache (unless
the I/Os are buffered by the cache).
If we implement barriers in terms of fdatasync, it should be sufficient.
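
A minimal sketch of that idea, assuming the backend writes through the host page cache
with ordinary pwrite() calls. The names write_all and emulate_barrier are hypothetical
and are not existing QEMU functions. The point is only that fdatasync() returns after
all previously written data for the file has reached storage, so calling it when the
guest issues a barrier gives at least the ordering the barrier asks for.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer at the given offset, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/*
 * Hypothetical barrier emulation: everything written to fd before this call
 * is flushed to storage before the call returns, so no later write can be
 * reordered ahead of it. Returns 0 on success, -1 on error (errno is set).
 */
static int emulate_barrier(int fd)
{
    return fdatasync(fd);
}
```

The backend would call emulate_barrier() when it sees a barrier request and only then
submit the writes that follow it.
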
Regards,
Anthony Liguori