[libvirt] v3: Use posix_fallocate() on supported systems to allocate diskspace

This patchset makes use of the posix_fallocate() call, where available, to allocate chunks of files whenever needed. We fall back to using safewrite() if it's not available. mmap() could be used instead of safewrite() too; I have a patch in case someone is interested in seeing it.

Using posix_fallocate() to allocate disk space and fill it with zeros is faster than writing the zeros block-by-block.

Also, for backing file systems that support extents and the fallocate() syscall, this operation will give us a big speed boost.

This also brings us the advantage of much less fragmentation for the chunk being allocated.

For systems that don't support posix_fallocate(), fall back to safewrite().

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 configure.in             |    2 +-
 src/libvirt_private.syms |    1 +
 src/util.c               |   38 ++++++++++++++++++++++++++++++++++++++
 src/util.h               |    1 +
 4 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/configure.in b/configure.in
index 413d27c..edce040 100644
--- a/configure.in
+++ b/configure.in
@@ -72,7 +72,7 @@ dnl Use --disable-largefile if you don't want this.
 AC_SYS_LARGEFILE
 
 dnl Availability of various common functions (non-fatal if missing).
-AC_CHECK_FUNCS([cfmakeraw regexec uname sched_getaffinity getuid getgid])
+AC_CHECK_FUNCS([cfmakeraw regexec uname sched_getaffinity getuid getgid posix_fallocate])
 
 dnl Availability of various not common threadsafe functions
 AC_CHECK_FUNCS([strerror_r strtok_r getmntent_r getgrnam_r getpwuid_r])
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index f0d8afa..a5f9f92 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -308,6 +308,7 @@ virStrToLong_ui;
 virFileLinkPointsTo;
 saferead;
 safewrite;
+safezero;
 virMacAddrCompare;
 virEnumFromString;
 virEnumToString;
diff --git a/src/util.c b/src/util.c
index 66ad9a4..955c4e5 100644
--- a/src/util.c
+++ b/src/util.c
@@ -117,6 +117,44 @@ ssize_t safewrite(int fd, const void *buf, size_t count)
     return nwritten;
 }
 
+#ifdef HAVE_POSIX_FALLOCATE
+int safezero(int fd, int flags, off_t offset, off_t len)
+{
+    return posix_fallocate(fd, offset, len);
+}
+#else
+int safezero(int fd, int flags, off_t offset, off_t len)
+{
+    int r;
+    char *buf;
+    unsigned long long remain, bytes;
+
+    /* Split up the write in small chunks so as not to allocate lots of RAM */
+    remain = len;
+    bytes = 1024 * 1024;
+
+    r = VIR_ALLOC_N(buf, bytes);
+    if (r < 0)
+        return -ENOMEM;
+
+    while (remain) {
+        if (bytes > remain)
+            bytes = remain;
+
+        r = safewrite(fd, buf, bytes);
+        if (r < 0) {
+            VIR_FREE(buf);
+            return r;
+        }
+
+        /* safewrite() guarantees all data will be written */
+        remain -= bytes;
+    }
+    VIR_FREE(buf);
+    return 0;
+}
+#endif
+
 #ifndef PROXY
 
 int virFileStripSuffix(char *str,
diff --git a/src/util.h b/src/util.h
index 87cbf67..3fd5d25 100644
--- a/src/util.h
+++ b/src/util.h
@@ -31,6 +31,7 @@
 
 int saferead(int fd, void *buf, size_t count);
 ssize_t safewrite(int fd, const void *buf, size_t count);
+int safezero(int fd, int flags, off_t offset, off_t len);
 
 enum {
     VIR_EXEC_NONE = 0,
-- 
1.6.0.6
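
For readers who want to try the idea outside the libvirt tree, here is a minimal standalone sketch of the same approach. It is not the patch's code: the names zero_range() and zero_range_by_writing() are hypothetical, the fallback is chosen at run time rather than by a configure check, and it uses plain calloc()/pwrite() in place of VIR_ALLOC_N()/safewrite(). Unlike the fallback in the patch, it also honours the offset argument. Note that posix_fallocate() only guarantees the space is reserved; the range reads back as zeros when it lies in a hole or beyond the old end of file, which is the case for a freshly created volume.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose posix_fallocate() and pwrite() */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len zero bytes starting at offset, 1 MiB at a
 * time so that only a small buffer is needed.
 * Returns 0 on success or a positive errno value. */
static int zero_range_by_writing(int fd, off_t offset, off_t len)
{
    const size_t chunk = 1024 * 1024;
    char *buf = calloc(1, chunk);       /* calloc() hands back zeroed memory */

    if (buf == NULL)
        return ENOMEM;

    while (len > 0) {
        size_t want = len < (off_t)chunk ? (size_t)len : chunk;
        ssize_t done = pwrite(fd, buf, want, offset);

        if (done < 0 && errno == EINTR)
            continue;                   /* interrupted: retry the same chunk */
        if (done <= 0) {
            int saved = done < 0 ? errno : EIO;
            free(buf);
            return saved;
        }
        offset += done;                 /* a short write just advances less */
        len -= done;
    }
    free(buf);
    return 0;
}

/* Hypothetical wrapper: prefer posix_fallocate(), which returns 0 or an
 * errno value directly (it does not set errno), and fall back to writing
 * zeros when the file system cannot do it. */
static int zero_range(int fd, off_t offset, off_t len)
{
    int r;

    if (offset < 0 || len < 0)
        return EINVAL;

    r = posix_fallocate(fd, offset, len);
    if (r == EINVAL || r == EOPNOTSUPP)
        return zero_range_by_writing(fd, offset, len);
    return r;
}
```

Both functions return a positive errno value on failure, so a caller can pass the result straight to an error-reporting routine without consulting errno.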

Make use of the safezero() function to allocate disk space instead of safewrite() to write zeros. The safezero() function will use posix_fallocate() on supported systems.

If progress activity is not requested (current behaviour), the new safezero() function will allocate the file in one go.

If progress activity is requested, allocate in 512MiB chunks.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 src/storage_backend_fs.c |   44 +++++++++++++++++++++++++++++++++-----------
 1 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/src/storage_backend_fs.c b/src/storage_backend_fs.c
index c0b130e..5000f43 100644
--- a/src/storage_backend_fs.c
+++ b/src/storage_backend_fs.c
@@ -62,6 +62,8 @@ static int qcowXGetBackingStore(virConnectPtr, char **,
 static int vmdk4GetBackingStore(virConnectPtr, char **,
                                 const unsigned char *, size_t);
 
+static int track_allocation_progress = 0;
+
 /* Either 'magic' or 'extension' *must* be provided */
 struct FileTypeInfo {
     int type;           /* One of the constants above */
@@ -1016,24 +1018,44 @@ virStorageBackendFileSystemVolCreate(virConnectPtr conn,
     }
 
     /* Pre-allocate any data if requested */
-    /* XXX slooooooooooooooooow.
-     * Need to add in progress bars & bg thread somehow */
+    /* XXX slooooooooooooooooow on non-extents-based file systems */
+    /* FIXME: Add in progress bars & bg thread if progress bar requested */
     if (vol->allocation) {
-        unsigned long long remain = vol->allocation;
-        static char const zeros[4096];
-        while (remain) {
-            int bytes = sizeof(zeros);
-            if (bytes > remain)
-                bytes = remain;
-            if ((bytes = safewrite(fd, zeros, bytes)) < 0) {
-                virReportSystemError(conn, errno,
+        if (track_allocation_progress) {
+            unsigned long long remain = vol->allocation;
+
+            while (remain) {
+                /* Allocate in chunks of 512MiB: big-enough chunk
+                 * size and takes approx. 9s on ext3. A progress
+                 * update every 9s is a fair-enough trade-off
+                 */
+                unsigned long long bytes = 512 * 1024 * 1024;
+                int r;
+
+                if (bytes > remain)
+                    bytes = remain;
+                if ((r = safezero(fd, 0, vol->allocation - remain,
+                                  bytes)) != 0) {
+                    virReportSystemError(conn, r,
+                                         _("cannot fill file '%s'"),
+                                         vol->target.path);
+                    unlink(vol->target.path);
+                    close(fd);
+                    return -1;
+                }
+                remain -= bytes;
+            }
+        } else { /* No progress bars to be shown */
+            int r;
+
+            if ((r = safezero(fd, 0, 0, vol->allocation)) != 0) {
+                virReportSystemError(conn, r,
                                      _("cannot fill file '%s'"),
                                      vol->target.path);
                 unlink(vol->target.path);
                 close(fd);
                 return -1;
             }
-            remain -= bytes;
         }
     }
 
-- 
1.6.0.6

On Thu, Mar 19, 2009 at 08:17:54PM +0530, Amit Shah wrote:
Make use of the safezero() function to allocate disk space instead of safewrite() to write zeros. The safezero() function will use posix_fallocate() on supported systems.
If progress activity is not requested (current behaviour), the new safezero() function will allocate the file in one go.
If progress activity is requested, allocate in 512MiB chunks.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
ACK

Daniel
-- 
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Thu, Mar 19, 2009 at 08:17:53PM +0530, Amit Shah wrote:
Using posix_fallocate() to allocate disk space and fill it with zeros is faster than writing the zeros block-by-block.
Also, for backing file systems that support extents and the fallocate() syscall, this operation will give us a big speed boost.
This also brings us the advantage of much less fragmentation for the chunk being allocated.
For systems that don't support posix_fallocate(), fall back to safewrite().
ACK, looks good now.

Daniel

On (Thu) Mar 19 2009 [20:17:52], Amit Shah wrote:
This patchset makes use of the posix_fallocate() call to allocate chunks of files whenever needed if it's available.
We fall back to using safewrite() if it's not available.
mmap() could be used instead of safewrite too; I have a patch in case someone is interested in seeing it.
Something like this:
From d33b843b381ea6a25c6e8efb6b248965a40e5f84 Mon Sep 17 00:00:00 2001
From: Amit Shah <amit.shah@redhat.com>
Date: Thu, 19 Mar 2009 21:43:50 +0530
Subject: [PATCH] Use mmap() and memset() for safezero

If available, use mmap to allocate zeroed chunks for files. This should be faster than allocating small chunks using safewrite.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 configure.in |    2 +-
 src/util.c   |   32 +++++++++++++++++++++++++++++++-
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/configure.in b/configure.in
index edce040..6b2bb5e 100644
--- a/configure.in
+++ b/configure.in
@@ -72,7 +72,7 @@ dnl Use --disable-largefile if you don't want this.
 AC_SYS_LARGEFILE
 
 dnl Availability of various common functions (non-fatal if missing).
-AC_CHECK_FUNCS([cfmakeraw regexec uname sched_getaffinity getuid getgid posix_fallocate])
+AC_CHECK_FUNCS([cfmakeraw regexec uname sched_getaffinity getuid getgid posix_fallocate mmap])
 
 dnl Availability of various not common threadsafe functions
 AC_CHECK_FUNCS([strerror_r strtok_r getmntent_r getgrnam_r getpwuid_r])
diff --git a/src/util.c b/src/util.c
index 955c4e5..93d2937 100644
--- a/src/util.c
+++ b/src/util.c
@@ -39,6 +39,9 @@
 #if HAVE_SYS_WAIT_H
 #include <sys/wait.h>
 #endif
+#if HAVE_MMAP
+#include <sys/mman.h>
+#endif
 #include <string.h>
 #include <signal.h>
 #if HAVE_TERMIOS_H
@@ -123,6 +126,32 @@ int safezero(int fd, int flags, off_t offset, off_t len)
     return posix_fallocate(fd, offset, len);
 }
 #else
+
+#ifdef HAVE_MMAP
+int safezero(int fd, int flags, off_t offset, off_t len)
+{
+    int r;
+    char *buf;
+
+    /* memset wants the mmap'ed file to be present on disk so create a
+     * sparse file
+     */
+    r = ftruncate(fd, offset + len);
+    if (r < 0)
+        return -errno;
+
+    buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
+    if (buf == MAP_FAILED)
+        return -errno;
+
+    memset(buf, 0, len);
+    munmap(buf, len);
+
+    return 0;
+}
+
+#else /* HAVE_MMAP */
+
 int safezero(int fd, int flags, off_t offset, off_t len)
 {
     int r;
@@ -153,7 +182,8 @@ int safezero(int fd, int flags, off_t offset, off_t len)
     VIR_FREE(buf);
     return 0;
 }
-#endif
+#endif /* HAVE_MMAP */
+#endif /* HAVE_POSIX_FALLOCATE */
 
 #ifndef PROXY
 
-- 
1.6.0.6
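
One detail worth keeping in mind with the mmap() variant: the file offset passed to mmap() has to be a multiple of the page size, so an arbitrary offset needs to be rounded down and the difference skipped inside the mapping. The standalone sketch below shows one way to handle that. It is an illustration under stated assumptions, not the patch's code: zero_range_mmap() is a hypothetical name, it returns positive errno values, it only grows the file (never shrinks it), and the msync() call is an addition of mine. As far as I know, running out of disk space while touching a mapped sparse region shows up as SIGBUS rather than as an error return, so this route is less forgiving than write().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose ftruncate() and friends */
#endif
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: zero `len` bytes at `offset` through a shared mapping.
 * Returns 0 on success or a positive errno value. */
static int zero_range_mmap(int fd, off_t offset, off_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;
    int r = 0;

    if (offset < 0 || len <= 0 || page <= 0)
        return EINVAL;

    /* mmap() needs a page-aligned file offset: round down and remember
     * how many bytes of padding precede the range we actually want. */
    off_t map_start = offset - (offset % page);
    size_t pad = (size_t)(offset - map_start);
    size_t map_len = pad + (size_t)len;

    /* The mapped range must exist in the file; extend it sparsely if needed,
     * but never truncate a file that is already longer. */
    if (fstat(fd, &st) < 0)
        return errno;
    if (st.st_size < offset + len && ftruncate(fd, offset + len) < 0)
        return errno;

    char *map = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, map_start);
    if (map == MAP_FAILED)
        return errno;

    memset(map + pad, 0, (size_t)len);  /* touch only the requested range */

    if (msync(map, map_len, MS_SYNC) < 0)
        r = errno;
    if (munmap(map, map_len) < 0 && r == 0)
        r = errno;
    return r;
}
```

For very large ranges on 32-bit hosts a single mapping may not fit in the address space, so a real implementation would probably map and zero the range in pieces.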

On Thu, Mar 19, 2009 at 09:47:12PM +0530, Amit Shah wrote:
On (Thu) Mar 19 2009 [20:17:52], Amit Shah wrote:
This patchset makes use of the posix_fallocate() call to allocate chunks of files whenever needed if it's available.
We fall back to using safewrite() if it's not available.
mmap() could be used instead of safewrite too; I have a patch in case someone is interested in seeing it.
Something like this:
From d33b843b381ea6a25c6e8efb6b248965a40e5f84 Mon Sep 17 00:00:00 2001
From: Amit Shah <amit.shah@redhat.com>
Date: Thu, 19 Mar 2009 21:43:50 +0530
Subject: [PATCH] Use mmap() and memset() for safezero
If available, use mmap to allocate zeroed chunks for files. This should be faster than allocating small chunks using safewrite.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
ACK, looks like a good extension.

Daniel

On Thu, Mar 19, 2009 at 08:17:52PM +0530, Amit Shah wrote:
This patchset makes use of the posix_fallocate() call to allocate chunks of files whenever needed if it's available.
We fall back to using safewrite() if it's not available.
mmap() could be used instead of safewrite too; I have a patch in case someone is interested in seeing it.
1 + 2 + 3 looks fine to me, ACK, thanks a lot!

Daniel
-- 
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/

On Fri, Mar 20, 2009 at 12:06:25PM +0100, Daniel Veillard wrote:
On Thu, Mar 19, 2009 at 08:17:52PM +0530, Amit Shah wrote:
This patchset makes use of the posix_fallocate() call to allocate chunks of files whenever needed if it's available.
We fall back to using safewrite() if it's not available.
mmap() could be used instead of safewrite too; I have a patch in case someone is interested in seeing it.
1 + 2 + 3 looks fine to me, ACK,
Applied and committed :-) thanks again!

Daniel
participants (3)
- Amit Shah
- Daniel P. Berrange
- Daniel Veillard