[libvirt] Use posix_fallocate() to allocate disk space

These patches introduce safezero() to allocate chunks for a file and write zeroes to them. Not yet ready for merge; I don't understand the configure system yet and don't know how to test for posix_fallocate() availability. The second patch introduces a new variable, track_allocation_progress, which perhaps belongs somewhere else. Pointers for both these problems appreciated. Thanks, Amit

Using posix_fallocate() to allocate disk space and fill it with zeros is faster than writing the zeros block-by-block. Also, for backing file systems that support extents and the fallocate() syscall, this operation will give us a big speed boost. This also brings us the advantage of very less fragmentation for the chunk being allocated. For systems that don't support posix_fallocate(), fall back to safewrite(). Signed-off-by: Amit Shah <amit.shah@redhat.com> --- src/libvirt_private.syms | 1 + src/util.c | 20 ++++++++++++++++++++ src/util.h | 1 + 3 files changed, 22 insertions(+), 0 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index f0d8afa..a5f9f92 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -308,6 +308,7 @@ virStrToLong_ui; virFileLinkPointsTo; saferead; safewrite; +safezero; virMacAddrCompare; virEnumFromString; virEnumToString; diff --git a/src/util.c b/src/util.c index 66ad9a4..b69d33a 100644 --- a/src/util.c +++ b/src/util.c @@ -117,6 +117,26 @@ ssize_t safewrite(int fd, const void *buf, size_t count) return nwritten; } +#if 1 +int safezero(int fd, int flags, off_t offset, off_t len) +{ + return posix_fallocate(fd, offset, len); +} +#else +int safezero(int fd, int flags, off_t offset, off_t len) +{ + char *buf; + int r; + + buf = calloc(len, sizeof(char)); + if (buf == NULL) + return -ENOMEM; + + r = safewrite(fd, buf, len); + return r; +} +#endif + #ifndef PROXY int virFileStripSuffix(char *str, diff --git a/src/util.h b/src/util.h index 87cbf67..3fd5d25 100644 --- a/src/util.h +++ b/src/util.h @@ -31,6 +31,7 @@ int saferead(int fd, void *buf, size_t count); ssize_t safewrite(int fd, const void *buf, size_t count); +int safezero(int fd, int flags, off_t offset, off_t len); enum { VIR_EXEC_NONE = 0, -- 1.6.0.6

Make use of the safezero() function to allocate disk space instead of safewrite() to write zeros. The safezero() function will use posix_fallocate() on supported systems. If progress activity is not requested (current behaviour), the new safezero() function will allocate the file in one go. If progress activity is requested, allocate in 512MiB chunks. Signed-off-by: Amit Shah <amit.shah@redhat.com> --- src/storage_backend_fs.c | 44 +++++++++++++++++++++++++++++++++----------- 1 files changed, 33 insertions(+), 11 deletions(-) diff --git a/src/storage_backend_fs.c b/src/storage_backend_fs.c index c0b130e..f0cbcc3 100644 --- a/src/storage_backend_fs.c +++ b/src/storage_backend_fs.c @@ -62,6 +62,8 @@ static int qcowXGetBackingStore(virConnectPtr, char **, static int vmdk4GetBackingStore(virConnectPtr, char **, const unsigned char *, size_t); +static int track_allocation_progress = 0; + /* Either 'magic' or 'extension' *must* be provided */ struct FileTypeInfo { int type; /* One of the constants above */ @@ -1016,24 +1018,44 @@ virStorageBackendFileSystemVolCreate(virConnectPtr conn, } /* Pre-allocate any data if requested */ - /* XXX slooooooooooooooooow. - * Need to add in progress bars & bg thread somehow */ + /* XXX slooooooooooooooooow on non-extents-based file systems */ + /* FIXME: Add in progress bars & bg thread if progress bar requested */ if (vol->allocation) { - unsigned long long remain = vol->allocation; - static char const zeros[4096]; - while (remain) { - int bytes = sizeof(zeros); - if (bytes > remain) - bytes = remain; - if ((bytes = safewrite(fd, zeros, bytes)) < 0) { - virReportSystemError(conn, errno, + if (track_allocation_progress) { + unsigned long long remain = vol->allocation; + + while (remain) { + /* Allocate in chunks of 512MB: big-enough chunk + * size and takes approx. 9s on ext3. A progress + * update every 9s is a fair-enough trade-off + */ + unsigned long long bytes = 512 * 1024 * 1024; + int r; + + if (bytes > remain) + bytes = remain; + if ((r = safezero(fd, 0, vol->allocation - remain, + bytes)) != 0) { + virReportSystemError(conn, r, + _("cannot fill file '%s'"), + vol->target.path); + unlink(vol->target.path); + close(fd); + return -1; + } + remain -= bytes; + } + } else { /* No progress bars to be shown */ + int r; + + if ((r = safezero(fd, 0, 0, vol->allocation)) != 0) { + virReportSystemError(conn, r, _("cannot fill file '%s'"), vol->target.path); unlink(vol->target.path); close(fd); return -1; } - remain -= bytes; } } -- 1.6.0.6

On Thu, Mar 19, 2009 at 12:45:25PM +0530, Amit Shah wrote:
Make use of the safezero() function to allocate disk space instead of safewrite() to write zeros. The safezero() function will use posix_fallocate() on supported systems.
If progress activity is not requested (current behaviour), the new safezero() function will allocate the file in one go.
If progress activity is requested, allocate in 512MiB chunks.
ACK, this looks good, assuming we change the non-fallocate() safezero() impl to not allocate huge chunks of memory. The 'static int track_allocation_progress' is fine as you have it - it is a useful reminder that we need to expose this functionality via the API, and we can move it as needed at that time. Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Thu, Mar 19, 2009 at 12:45:24PM +0530, Amit Shah wrote:
diff --git a/src/util.c b/src/util.c index 66ad9a4..b69d33a 100644 --- a/src/util.c +++ b/src/util.c @@ -117,6 +117,26 @@ ssize_t safewrite(int fd, const void *buf, size_t count) return nwritten; }
+#if 1 +int safezero(int fd, int flags, off_t offset, off_t len) +{ + return posix_fallocate(fd, offset, len); +} +#else +int safezero(int fd, int flags, off_t offset, off_t len) +{ + char *buf; + int r; + + buf = calloc(len, sizeof(char)); + if (buf == NULL) + return -ENOMEM; + + r = safewrite(fd, buf, len); + return r; +} +#endif
For memory allocation you can use if (VIR_ALLOC_N(buf, len) < 0) return -1; Also, you'll need a VIR_FREE(buf) call in there after the safewrite(). I'm a little worried about scalability of this impl though. The later patch will call this with 500 MB chunks if progress is turned on, or even allocate the whole file in one chunk - we don't want to be allocating 20 GB of memory jus to write zero's to a file ! Is it perhaps easier just to do something like char * buf = mmap(NULL, len, MAP_SHARED, MAP_ANONYMOUS, fd, offset) memset(buf, 0, len); munmap(buf, len); Or, do your calloc() of a 1 MB chunk, and then call safewrite in a loop, just to avoid too large a memory allocation. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On (Thu) Mar 19 2009 [10:39:05], Daniel P. Berrange wrote:
On Thu, Mar 19, 2009 at 12:45:24PM +0530, Amit Shah wrote:
diff --git a/src/util.c b/src/util.c index 66ad9a4..b69d33a 100644 --- a/src/util.c +++ b/src/util.c @@ -117,6 +117,26 @@ ssize_t safewrite(int fd, const void *buf, size_t count) return nwritten; }
+#if 1 +int safezero(int fd, int flags, off_t offset, off_t len) +{ + return posix_fallocate(fd, offset, len); +} +#else +int safezero(int fd, int flags, off_t offset, off_t len) +{ + char *buf; + int r; + + buf = calloc(len, sizeof(char)); + if (buf == NULL) + return -ENOMEM; + + r = safewrite(fd, buf, len); + return r; +} +#endif
For memory allocation you can use
if (VIR_ALLOC_N(buf, len) < 0) return -1;
Also, you'll need a VIR_FREE(buf) call in there after the safewrite().
I'm a little worried about scalability of this impl though. The later patch will call this with 500 MB chunks if progress is turned on, or even allocate the whole file in one chunk - we don't want to be allocating 20 GB of memory jus to write zero's to a file ! Is it perhaps easier just to do something like
char * buf = mmap(NULL, len, MAP_SHARED, MAP_ANONYMOUS, fd, offset) memset(buf, 0, len); munmap(buf, len);
Or, do your calloc() of a 1 MB chunk, and then call safewrite in a loop, just to avoid too large a memory allocation.
Yeah; I forgot the free(). My approach was to never allocate more than 500MiB. However we can call safezero() itself with less than 500M; is 200M OK? Amit

On Thu, Mar 19, 2009 at 05:02:48PM +0530, Amit Shah wrote:
On (Thu) Mar 19 2009 [10:39:05], Daniel P. Berrange wrote:
char * buf = mmap(NULL, len, MAP_SHARED, MAP_ANONYMOUS, fd, offset) memset(buf, 0, len); munmap(buf, len);
Or, do your calloc() of a 1 MB chunk, and then call safewrite in a loop, just to avoid too large a memory allocation.
Yeah; I forgot the free().
My approach was to never allocate more than 500MiB. However we can call safezero() itself with less than 500M; is 200M OK?
No IMHO that's definitely too big. Even 1MB is too much if you can avoid it, please try to use mmap, if that dowsn't work then use 1MB chunk, thanks, Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

Using posix_fallocate() to allocate disk space and fill it with zeros is faster than writing the zeros block-by-block. Also, for backing file systems that support extents and the fallocate() syscall, this operation will give us a big speed boost. This also brings us the advantage of very less fragmentation for the chunk being allocated. For systems that don't support posix_fallocate(), fall back to safewrite(). Signed-off-by: Amit Shah <amit.shah@redhat.com> --- configure.in | 2 +- src/libvirt_private.syms | 1 + src/util.c | 20 ++++++++++++++++++++ src/util.h | 1 + 4 files changed, 23 insertions(+), 1 deletions(-) diff --git a/configure.in b/configure.in index 413d27c..edce040 100644 --- a/configure.in +++ b/configure.in @@ -72,7 +72,7 @@ dnl Use --disable-largefile if you don't want this. AC_SYS_LARGEFILE dnl Availability of various common functions (non-fatal if missing). -AC_CHECK_FUNCS([cfmakeraw regexec uname sched_getaffinity getuid getgid]) +AC_CHECK_FUNCS([cfmakeraw regexec uname sched_getaffinity getuid getgid posix_fallocate]) dnl Availability of various not common threadsafe functions AC_CHECK_FUNCS([strerror_r strtok_r getmntent_r getgrnam_r getpwuid_r]) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index f0d8afa..a5f9f92 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -308,6 +308,7 @@ virStrToLong_ui; virFileLinkPointsTo; saferead; safewrite; +safezero; virMacAddrCompare; virEnumFromString; virEnumToString; diff --git a/src/util.c b/src/util.c index 66ad9a4..de754bf 100644 --- a/src/util.c +++ b/src/util.c @@ -117,6 +117,26 @@ ssize_t safewrite(int fd, const void *buf, size_t count) return nwritten; } +#if HAVE_POSIX_FALLOCATE +int safezero(int fd, int flags, off_t offset, off_t len) +{ + return posix_fallocate(fd, offset, len); +} +#else +int safezero(int fd, int flags, off_t offset, off_t len) +{ + char *buf; + int r; + + buf = calloc(len, sizeof(char)); + if (buf == NULL) + return -ENOMEM; + + r = safewrite(fd, buf, len); + return r; +} +#endif + #ifndef PROXY int virFileStripSuffix(char *str, diff --git a/src/util.h b/src/util.h index 87cbf67..3fd5d25 100644 --- a/src/util.h +++ b/src/util.h @@ -31,6 +31,7 @@ int saferead(int fd, void *buf, size_t count); ssize_t safewrite(int fd, const void *buf, size_t count); +int safezero(int fd, int flags, off_t offset, off_t len); enum { VIR_EXEC_NONE = 0, -- 1.6.0.6

On (Thu) Mar 19 2009 [12:45:23], Amit Shah wrote:
These patches introduce safezero() to allocate chunks for a file and write zeroes to them.
Not yet ready for merge; I don't understand the configure system yet and don't know how to test for posix_fallocate() availability.
Figured out; and it's easy!

On Thu, Mar 19, 2009 at 12:45:23PM +0530, Amit Shah wrote:
These patches introduce safezero() to allocate chunks for a file and write zeroes to them.
Not yet ready for merge; I don't understand the configure system yet and don't know how to test for posix_fallocate() availability.
In this case it'll be pretty easy, add to configure.in: AC_CHECK_FUNCS([posix_fallocate()]) And then in the source code you can #ifdef HAVE_POSIX_FALLOCATE ... #else ... #endif Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Thu, Mar 19, 2009 at 10:34:05AM +0000, Daniel P. Berrange wrote:
On Thu, Mar 19, 2009 at 12:45:23PM +0530, Amit Shah wrote:
These patches introduce safezero() to allocate chunks for a file and write zeroes to them.
Not yet ready for merge; I don't understand the configure system yet and don't know how to test for posix_fallocate() availability.
In this case it'll be pretty easy, add to configure.in:
AC_CHECK_FUNCS([posix_fallocate()])
And then in the source code you can
#ifdef HAVE_POSIX_FALLOCATE ... #else ... #endif
Ahh, you've already done this - i missed the followup patch you sent:-) Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
participants (3)
-
Amit Shah
-
Daniel P. Berrange
-
Daniel Veillard