On (Tue) Feb 24 2009 [11:58:31], Daniel P. Berrange wrote:
On Tue, Feb 24, 2009 at 05:09:31PM +0530, Amit Shah wrote:
...
> The best case to get a non-fragmented VM image is to have it allocated
> completely at create-time with fallocate().
The main problem with this change is that it'll make it harder for
us to provide incremental feedback. As per the comment in the code,
it is our intention to make the volume creation API run as a background
job which provides feedback on progress of allocation, and the ability
to cancel the job. Since posix_fallocate() is an all-or-nothing kind of
API it wouldn't be very helpful.
What sort of performance boost does this give you? Would we perhaps
be able to get close to it by writing in bigger chunks than 4k, or
mmap'ing the file and then doing a memset across it?
I have a program up at [1] that gives me the following data.
[1] http://fedorapeople.org/gitweb?p=amitshah/public_git/alloc-perf.git;a=blo...
I compiled results for ext3, ext4, xfs and btrfs. I used the following
methods to allocate a file (1 GB in size) and zero it:
- posix_fallocate()
- mmap() and memset()
- write chunks, sized 4k and 8k.
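For readers who want a feel for what the three methods look like in code,
here is a minimal sketch. It is not the program at [1]; the function names
are made up for illustration, and each function assumes an fd opened
read-write on an empty file with the file offset at 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Method 1: ask the file system to reserve the whole range in one call.
 * posix_fallocate() returns 0 or an error number; it does not set errno. */
int alloc_posix_fallocate(int fd, off_t len)
{
    return posix_fallocate(fd, 0, len);
}

/* Method 2: extend the file, map it, and zero it through the mapping.
 * Note: if the disk fills up, the memset() can die with SIGBUS. */
int alloc_mmap_memset(int fd, off_t len)
{
    int rc = 0;
    void *p;

    if (ftruncate(fd, len) < 0)
        return errno;
    p = mmap(NULL, (size_t)len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return errno;
    memset(p, 0, (size_t)len);
    if (msync(p, (size_t)len, MS_SYNC) < 0)
        rc = errno;
    if (munmap(p, (size_t)len) < 0 && rc == 0)
        rc = errno;
    return rc;
}

/* Method 3: write zero-filled chunks (4k or 8k in the runs below). */
int alloc_write_chunks(int fd, off_t len, size_t chunk)
{
    off_t done = 0;
    int rc = 0;
    char *buf;

    assert(chunk > 0);
    buf = calloc(1, chunk);
    if (buf == NULL)
        return ENOMEM;
    while (done < len) {
        size_t want = (len - done) < (off_t)chunk ? (size_t)(len - done) : chunk;
        ssize_t n = write(fd, buf, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            rc = errno;
            break;
        }
        done += n;
    }
    free(buf);
    return rc;
}
```

Only the third variant has a natural place to report progress or check
for cancellation, namely between chunks.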
Results:
---
ext4:
posix-fallocate run time:   approx 0s
mmap run time:              approx 13s
4096-sized chunk run time:  approx 15s
8192-sized chunk run time:  approx 18s
$ sudo filefrag /mnt/ext4/*
/mnt/ext4/file-chunk4: 29 extents found
/mnt/ext4/file-chunk8: 20 extents found
/mnt/ext4/file-mmap: 38 extents found
/mnt/ext4/file-pf: 1 extent found
---
xfs:
posix-fallocate run time:   approx 0s
mmap run time:              approx 14s
4096-sized chunk run time:  approx 18s
8192-sized chunk run time:  approx 19s
$ sudo filefrag /mnt/xfs/*
/mnt/xfs/file-chunk4: 3 extents found
/mnt/xfs/file-chunk8: 4 extents found
/mnt/xfs/file-mmap: 2 extents found
/mnt/xfs/file-pf: 1 extent found
---
ext3:
posix-fallocate run time:   approx 18s
mmap run time:              approx 20s
4096-sized chunk run time:  approx 22s
8192-sized chunk run time:  approx 24s
$ sudo filefrag /mnt/ext3/*
/mnt/ext3/file-chunk4: 38 extents found, perfection would be 9 extents
/mnt/ext3/file-chunk8: 9 extents found
/mnt/ext3/file-mmap: 44 extents found, perfection would be 9 extents
/mnt/ext3/file-pf: 9 extents found
---
btrfs:
posix-fallocate run time:   approx 0s
mmap run time:              approx 18s
4096-sized chunk run time:  approx 17s
8192-sized chunk run time:  approx 19s
$ sudo filefrag /mnt/btrfs/*
FIBMAP: Invalid argument
---
I have detailed results up at
http://fedorapeople.org/gitweb?p=amitshah/public_git/alloc-perf.git;a=blo...
The link to the git tree is
http://fedorapeople.org/gitweb?p=amitshah/public_git/alloc-perf.git
Clearly, extent-based file systems provide a very fast fallocate()
implementation that allocates a new file and zeroes it. Since F11 is
going to have ext4 by default, I strongly suggest we switch to
posix_fallocate() for Linux hosts. The feedback should not matter on the
newer file systems, as the allocation is really fast, and in any case we
don't currently have an implementation for non-extent-based file
systems. It really won't be missed on newer hosts.
If, in spite of this, some feedback is needed for a non-extent-based file
system, a run-time probe of the underlying file system can be made and
we could default to chunk-based allocation in that case.
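One possible shape for such a probe, as a sketch only: rather than
inspecting the file system type, try the Linux-specific fallocate() call
first. Unlike glibc's posix_fallocate(), which falls back to writing the
blocks itself, fallocate() fails with EOPNOTSUPP when the file system
cannot preallocate, and that failure can serve as the probe. The names
alloc_volume, alloc_chunked and progress_cb are hypothetical, as are the
4k fallback chunk size and the callback convention.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* fallocate() is Linux-specific */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback: return non-zero to cancel the allocation. */
typedef int (*progress_cb)(off_t done, off_t total, void *opaque);

/* Fallback path: zero the file chunk by chunk, reporting progress and
 * checking for cancellation after every chunk. */
int alloc_chunked(int fd, off_t len, size_t chunk, progress_cb cb, void *opaque)
{
    off_t done = 0;
    int rc = 0;
    char *buf;

    assert(chunk > 0);
    buf = calloc(1, chunk);
    if (buf == NULL)
        return ENOMEM;
    while (done < len) {
        size_t want = (len - done) < (off_t)chunk ? (size_t)(len - done) : chunk;
        ssize_t n = pwrite(fd, buf, want, done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = errno;
            break;
        }
        done += n;
        if (cb != NULL && cb(done, len, opaque) != 0) {
            rc = ECANCELED;
            break;
        }
    }
    free(buf);
    return rc;
}

/* Probe by attempting the fast path; fall back only when the file
 * system (or kernel) says preallocation is not supported. */
int alloc_volume(int fd, off_t len, progress_cb cb, void *opaque)
{
    if (fallocate(fd, 0, 0, len) == 0) {
        if (cb != NULL)
            cb(len, len, opaque);   /* done in one step */
        return 0;
    }
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return errno;               /* real failure, e.g. ENOSPC */
    return alloc_chunked(fd, len, 4096, cb, opaque);
}
```

With this arrangement the progress callback only does real work on the
slow path, which is the only place it is needed.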
For systems that do not implement posix_fallocate(), some
configure-magic is needed.
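As a rough idea of what that could look like on the C side, assuming
configure.ac gains an AC_CHECK_FUNCS([posix_fallocate]) check so that
config.h defines HAVE_POSIX_FALLOCATE where the function exists (the
wrapper name preallocate is made up for this sketch):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* HAVE_POSIX_FALLOCATE is assumed to come from config.h, generated by
 * an AC_CHECK_FUNCS([posix_fallocate]) line in configure.ac. */
int preallocate(int fd, off_t len)
{
    assert(len >= 0);
#ifdef HAVE_POSIX_FALLOCATE
    /* Returns 0 or an error number; errno is not set. */
    return posix_fallocate(fd, 0, len);
#else
    /* Not available on this platform: tell the caller to fall back
     * to writing zero-filled chunks instead. */
    (void)fd;
    return ENOTSUP;
#endif
}
```

The caller would treat ENOTSUP as "use the chunk-based path", so the
rest of the code does not need its own #ifdef.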
Amit