[libvirt] [PATCH v2 0/3] Wrong allocation size displayed for NFS volumes

Changes over v1 - different tact as more research discovers that the posix_fallocate() does not perform the correct pre-allocation of space for an NFS backed file/disk, some details of the findings can be found: http://www.redhat.com/archives/libvir-list/2014-August/msg00367.html The first patch in this series will refactor the safezero to allow for more fallback than the current code. Initially implemented as a series of three 'safezero()' functions in commit id 'c29d0929' using build conditionals to determine which of the 3 is being used. The refactored code will have one function that will make a series of calls rather the just failing on whatever is built into the system. The first is a global virFileFdPosixFallocate() with the build conditional controlling only whether the function runs or will just return -1. It will also be re-used by the Volume Resize code in a future patch. If on creation the virFileFdPosixFallocate() fails, attempts will be made to then try the mmap() method (if it exists), and then fall back to the slow, old, safewrite of zero filled buffers. The mmap function will follow the format of the posix_fallocate insomuch as if HAVE_MMAP is defined, then that will be attempted or a -1 will be returned immediately. The major change of this patch over the prior code is the fallback. One other minor change is if the virFileFdPosixFallocate() tries to call posix_fallocate() and fails, then a VIR_WARN will be used to indicate something went wrong. Previously, the code would return errno and eventually that would filter back to a caller which would use the errno in a sys error message. This I think only is odd when it comes to the resize path. The second patch looks to resolve the first issue discovered by: https://bugzilla.redhat.com/show_bug.cgi?id=1077068 It does this by checking the returned size/len of the file in the posix_fallocate() call and if it does not match a VIR_WARN will be posted and a failure returned allowing either the mmap or write of zero filled buffers to the file. The third patch changes the virStorageFileResize() logic to follow the safezero() logic with respect to build conditionals. Also, rather than reinvent the wheel, it will reuse the virFileFdPosixFallocate(). The new static API resizeSysFallocate() will return -1 if the build conditionals are not met. A current "downside" is by calling virFileFdPosixFallocate() twice (once directly and the other indirectly through safezero), the VIR_WARN will be displayed twice, but that should only affect someone trying to debug things. NOTE: Investigation and testing while creating this series also showed it's possible to modify the NFS Pool Start code to add a "-o wsize=4096" to the MOUNT command in order to force smaller write's. While that did fix the symptoms, the fix just didn't "feel right". The block size of the "source" export was 4096 bytes, while the block size of the "target" file was 1048576 bytes (current NFSv4 wsize default value when not specified). Whether those play into what posix_fallocate() does I am not sure, but if it does and that's fixed, then the way these patches are coded up - it won't matter. Compared to perhaps needing to revisit or document heavily how to set some new field in the pool XML source - this just seems more right. John Ferlan (3): virfile: Refactor safezero, introduce virFileFdPosixFallocate virfile: Check returned size from virFileFdPosixFallocate virstoragefile: Refactor virStorageFileResize src/libvirt_private.syms | 1 + src/util/virfile.c | 58 ++++++++++++++++++++++++++++++++++++++--------- src/util/virfile.h | 2 ++ src/util/virstoragefile.c | 52 +++++++++++++++++++++++++++--------------- 4 files changed, 84 insertions(+), 29 deletions(-) -- 1.9.3

Currently the safezero() function uses build conditionals to choose either the posix_fallocate() or mmap() with a fallback to safewrite() in order to preallocate a file. This patch will modify the logic in order to allow fallbacks in the event that posix_fallocate() or the ftruncate()and mmap() doesn't work properly. The fallback will be to use the slow safewrite of zero filled buffers to the file. This patch also introduces virFileFdPosixFallocate() so that it can be used by the resize code rather than having two separate functions Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virfile.c | 37 +++++++++++++++++++++++++++---------- src/util/virfile.h | 2 ++ 3 files changed, 30 insertions(+), 10 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 08111d4..5556d31 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1278,6 +1278,7 @@ virFileDirectFdFlag; virFileExists; virFileFclose; virFileFdopen; +virFileFdPosixFallocate; virFileFindHugeTLBFS; virFileFindMountPoint; virFileFindResource; diff --git a/src/util/virfile.c b/src/util/virfile.c index f9efc65..7f03cbf 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -1034,26 +1034,26 @@ safewrite(int fd, const void *buf, size_t count) return nwritten; } -#ifdef HAVE_POSIX_FALLOCATE int -safezero(int fd, off_t offset, off_t len) +virFileFdPosixFallocate(int fd, off_t offset, off_t len) { +#ifdef HAVE_POSIX_FALLOCATE int ret = posix_fallocate(fd, offset, len); if (ret == 0) return 0; + VIR_WARN("Failed to pre-allocate '%lu' bytes with return value '%d'", + len, ret); errno = ret; +#endif /* HAVE_POSIX_FALLOCATE */ return -1; } -#else - -int -safezero(int fd, off_t offset, off_t len) +static int +safezero_mmap(int fd, off_t offset, off_t len) { +#ifdef HAVE_MMAP int r; char *buf; - unsigned long long remain, bytes; -# ifdef HAVE_MMAP static long pagemask = 0; off_t map_skip; @@ -1080,7 +1080,16 @@ safezero(int fd, off_t offset, off_t len) /* fall back to writing zeroes using safewrite if mmap fails (for * example because of virtual memory limits) */ -# endif /* HAVE_MMAP */ +#endif /* HAVE_MMAP */ + return -1; +} + +static int +safezero_slow(int fd, off_t offset, off_t len) +{ + int r; + char *buf; + unsigned long long remain, bytes; if (lseek(fd, offset, SEEK_SET) < 0) return -1; @@ -1111,8 +1120,16 @@ safezero(int fd, off_t offset, off_t len) VIR_FREE(buf); return 0; } -#endif /* HAVE_POSIX_FALLOCATE */ +int +safezero(int fd, off_t offset, off_t len) +{ + if (virFileFdPosixFallocate(fd, offset, len) == 0) + return 0; + if (safezero_mmap(fd, offset, len) == 0) + return 0; + return safezero_slow(fd, offset, len); +} #if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R /* search /proc/mounts for mount point of *type; return pointer to diff --git a/src/util/virfile.h b/src/util/virfile.h index 403d0ba..29a1776 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -41,6 +41,8 @@ typedef enum { ssize_t saferead(int fd, void *buf, size_t count) ATTRIBUTE_RETURN_CHECK; ssize_t safewrite(int fd, const void *buf, size_t count) ATTRIBUTE_RETURN_CHECK; +int virFileFdPosixFallocate(int fd, off_t offset, off_t len) + ATTRIBUTE_RETURN_CHECK; int safezero(int fd, off_t offset, off_t len) ATTRIBUTE_RETURN_CHECK; -- 1.9.3

On Mon, Aug 11, 2014 at 04:30:19PM -0400, John Ferlan wrote:
Currently the safezero() function uses build conditionals to choose either the posix_fallocate() or mmap() with a fallback to safewrite() in order to preallocate a file.
This patch will modify the logic in order to allow fallbacks in the event that posix_fallocate() or the ftruncate()and mmap() doesn't work properly. The fallback will be to use the slow safewrite of zero filled buffers to the file.
Have you actually encountered failing of posix_fallocate() in the real world ? It is supposed to automatically fallback to the equivalent of writing zeros if the filesystem / kernel does not support it, so we should not have todo runtime fallback ourselves. The existance of fallback is the main distinction between the posix_fallocate() and fallocate() system calls. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 08/22/2014 10:46 AM, Daniel P. Berrange wrote:
On Mon, Aug 11, 2014 at 04:30:19PM -0400, John Ferlan wrote:
Currently the safezero() function uses build conditionals to choose either the posix_fallocate() or mmap() with a fallback to safewrite() in order to preallocate a file.
This patch will modify the logic in order to allow fallbacks in the event that posix_fallocate() or the ftruncate()and mmap() doesn't work properly. The fallback will be to use the slow safewrite of zero filled buffers to the file.
Have you actually encountered failing of posix_fallocate() in the real world ? It is supposed to automatically fallback to the equivalent of writing zeros if the filesystem / kernel does not support it, so we should not have todo runtime fallback ourselves. The existance of fallback is the main distinction between the posix_fallocate() and fallocate() system calls.
It wasn't so much as a "failure" as "unexpected results" - the key being that the resulting created (or resized) file was not sized as expected. For an NFS target the results are not what was expected. I've left some history in the prior set of patches with the following probably having the most details: http://www.redhat.com/archives/libvir-list/2014-August/msg00367.html As for real world - this bug came as a result of a virt-test test which was going through all the various sizes/options and found this result to be incorrect - at least in the eyes of the tester. John

On Fri, Aug 22, 2014 at 10:56:47AM -0400, John Ferlan wrote:
On 08/22/2014 10:46 AM, Daniel P. Berrange wrote:
On Mon, Aug 11, 2014 at 04:30:19PM -0400, John Ferlan wrote:
Currently the safezero() function uses build conditionals to choose either the posix_fallocate() or mmap() with a fallback to safewrite() in order to preallocate a file.
This patch will modify the logic in order to allow fallbacks in the event that posix_fallocate() or the ftruncate()and mmap() doesn't work properly. The fallback will be to use the slow safewrite of zero filled buffers to the file.
Have you actually encountered failing of posix_fallocate() in the real world ? It is supposed to automatically fallback to the equivalent of writing zeros if the filesystem / kernel does not support it, so we should not have todo runtime fallback ourselves. The existance of fallback is the main distinction between the posix_fallocate() and fallocate() system calls.
It wasn't so much as a "failure" as "unexpected results" - the key being that the resulting created (or resized) file was not sized as expected.
For an NFS target the results are not what was expected. I've left some history in the prior set of patches with the following probably having the most details:
http://www.redhat.com/archives/libvir-list/2014-August/msg00367.html
So, IIUC, the bug happens when the rsize mount option to NFS is not 4k. strace'ing libvirtd on an NFS volume in this case shows: open("/var/lib/libvirt/images/lettuce/foo", O_RDWR|O_CREAT|O_EXCL, 0600) = 24 fstat(24, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 ftruncate(24, 1073741824) = 0 fallocate(24, 0, 0, 1073741824) = -1 EOPNOTSUPP (Operation not supported) fallocate(24, 0, 0, 1073741824) = -1 EOPNOTSUPP (Operation not supported) fstat(24, {st_mode=S_IFREG|0600, st_size=1073741824, ...}) = 0 fstatfs(24, {f_type="NFS_SUPER_MAGIC", f_bsize=1048576, f_blocks=118342, f_bfree=71002, f_bavail=65632, f_files=7678560, f_ffree=5495931, f_fsid={0, 0}, f_namelen=255, f_frsize=1048576}) = 0 pread(24, "\0", 1, 1048575) = 1 pwrite(24, "\0", 1, 1048575) = 1 pread(24, "\0", 1, 2097151) = 1 pwrite(24, "\0", 1, 2097151) = 1 pread(24, "\0", 1, 3145727) = 1 So we can see glibc here trying fallocate() and then falling back to writing zeros. Since the volume does not come out at the right size this seems to show a bug in glibc. So I think we really ought to report that bug to glibc to be fixed there rather than working around it in libvirt, as there are many more applications besides libvirt that will be impacted by this bug. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Fri, Aug 22, 2014 at 04:15:14PM +0100, Daniel P. Berrange wrote:
On Fri, Aug 22, 2014 at 10:56:47AM -0400, John Ferlan wrote:
On 08/22/2014 10:46 AM, Daniel P. Berrange wrote:
On Mon, Aug 11, 2014 at 04:30:19PM -0400, John Ferlan wrote:
Currently the safezero() function uses build conditionals to choose either the posix_fallocate() or mmap() with a fallback to safewrite() in order to preallocate a file.
This patch will modify the logic in order to allow fallbacks in the event that posix_fallocate() or the ftruncate()and mmap() doesn't work properly. The fallback will be to use the slow safewrite of zero filled buffers to the file.
Have you actually encountered failing of posix_fallocate() in the real world ? It is supposed to automatically fallback to the equivalent of writing zeros if the filesystem / kernel does not support it, so we should not have todo runtime fallback ourselves. The existance of fallback is the main distinction between the posix_fallocate() and fallocate() system calls.
It wasn't so much as a "failure" as "unexpected results" - the key being that the resulting created (or resized) file was not sized as expected.
For an NFS target the results are not what was expected. I've left some history in the prior set of patches with the following probably having the most details:
http://www.redhat.com/archives/libvir-list/2014-August/msg00367.html
So, IIUC, the bug happens when the rsize mount option to NFS is not 4k.
strace'ing libvirtd on an NFS volume in this case shows:
open("/var/lib/libvirt/images/lettuce/foo", O_RDWR|O_CREAT|O_EXCL, 0600) = 24 fstat(24, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 ftruncate(24, 1073741824) = 0 fallocate(24, 0, 0, 1073741824) = -1 EOPNOTSUPP (Operation not supported) fallocate(24, 0, 0, 1073741824) = -1 EOPNOTSUPP (Operation not supported) fstat(24, {st_mode=S_IFREG|0600, st_size=1073741824, ...}) = 0 fstatfs(24, {f_type="NFS_SUPER_MAGIC", f_bsize=1048576, f_blocks=118342, f_bfree=71002, f_bavail=65632, f_files=7678560, f_ffree=5495931, f_fsid={0, 0}, f_namelen=255, f_frsize=1048576}) = 0 pread(24, "\0", 1, 1048575) = 1 pwrite(24, "\0", 1, 1048575) = 1 pread(24, "\0", 1, 2097151) = 1 pwrite(24, "\0", 1, 2097151) = 1 pread(24, "\0", 1, 3145727) = 1
So we can see glibc here trying fallocate() and then falling back to writing zeros. Since the volume does not come out at the right size this seems to show a bug in glibc.
So I think we really ought to report that bug to glibc to be fixed there rather than working around it in libvirt, as there are many more applications besides libvirt that will be impacted by this bug.
Opps, meant to include the stack trace to show where the pread/writes are coming from: (gdb) bt #0 pread64 () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007f55a29f9c5e in internal_fallocate (fd=fd@entry=24, offset=1048575, len=1072693248) at ../sysdeps/posix/posix_fallocate.c:78 #2 0x00007f55a29f9cc7 in posix_fallocate (fd=fd@entry=24, offset=<optimized out>, len=<optimized out>) at ../sysdeps/unix/sysv/linux/wordsize-64/posix_fallocate.c:62 #3 0x00007f55a6071026 in safezero (fd=fd@entry=24, offset=<optimized out>, len=<optimized out>) at util/virfile.c:1031 #4 0x00007f55916258c2 in createRawFile (inputvol=0x0, vol=0x7f5570008280, fd=24) at storage/storage_backend.c:389 #5 virStorageBackendCreateRaw (conn=<optimized out>, pool=<optimized out>, vol=0x7f5570008280, inputvol=0x0, flags=<optimized out>) at storage/storage_backend.c:450 Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

https://bugzilla.redhat.com/show_bug.cgi?id=1077068 Add an fstat check to virFileFdPosixFallocate and fail if the allocated size doesn't match the expected value. Since the failure will fall back to using safewrite(), just leave a VIR_WARN message indicating the failure and some entrails for determining why the failure occurred. It seems for an NFS file at least the allocation may not work properly depending on the NFS configuration. Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/util/virfile.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/src/util/virfile.c b/src/util/virfile.c index 7f03cbf..3b87d9f 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -1038,9 +1038,28 @@ int virFileFdPosixFallocate(int fd, off_t offset, off_t len) { #ifdef HAVE_POSIX_FALLOCATE + struct stat sb; + off_t newlen; int ret = posix_fallocate(fd, offset, len); - if (ret == 0) + if (ret == 0) { + /* Let's make sure the new size matches our expectations */ + if (fstat(fd, &sb) < 0) + return -1; +# ifndef WIN32 + newlen = (unsigned long long)sb.st_blocks * + (unsigned long long)DEV_BSIZE; +# else + newlen = sb.st_size; +# endif + if (newlen != len) { + VIR_WARN("posix_fallocate only allocated '%lu' bytes, expected " + "'%lu' byte file (blocks=%lu bsize=%d).", + newlen, len, sb.st_blocks, DEV_BSIZE); + return -1; + } + return 0; + } VIR_WARN("Failed to pre-allocate '%lu' bytes with return value '%d'", len, ret); errno = ret; -- 1.9.3

Currently virStorageFileResize() function uses build conditionals to choose either the posix_fallocate() or mmap() with no fallback in order to preallocate the space in the newly resized file. This patch will modify the logic in order to allow fallbacks in the event that posix_fallocate() or the sys_fallocate syscall() doesn't work properly. The fallback will be to use the slow safewrite of zero filled buffers to the file. Use the virFileFdPosixFallocate() rather than a local function. Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/util/virstoragefile.c | 52 +++++++++++++++++++++++++++++++---------------- 1 file changed, 34 insertions(+), 18 deletions(-) diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index 5b6b2f5..4d37de1 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -1086,6 +1086,39 @@ virStorageFileChainGetBroken(virStorageSourcePtr chain, return 0; } +static int +resizeSysFallocate(const char *path, + int fd, + off_t offset, + off_t len) +{ + int rc = -1; +#if HAVE_SYS_SYSCALL_H && defined(SYS_fallocate) + if ((rc = syscall(SYS_fallocate, fd, 0, offset, len)) != 0) { + virReportSystemError(errno, + _("Failed to pre-allocate space for " + "file '%s'"), path); + } +#endif + return rc; +} + +static int +doResize(const char *path, + int fd, + off_t offset, + off_t len) +{ + if (virFileFdPosixFallocate(fd, offset, len) == 0 || + resizeSysFallocate(path, fd, offset, len) == 0 || + safezero(fd, offset, len) == 0) + return 0; + + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("preallocate is not supported on this platform")); + return -1; +} + /** * virStorageFileResize: @@ -1113,25 +1146,8 @@ virStorageFileResize(const char *path, } if (pre_allocate) { -#if HAVE_POSIX_FALLOCATE - if ((rc = posix_fallocate(fd, offset, len)) != 0) { - virReportSystemError(rc, - _("Failed to pre-allocate space for " - "file '%s'"), path); - goto cleanup; - } -#elif HAVE_SYS_SYSCALL_H && defined(SYS_fallocate) - if (syscall(SYS_fallocate, fd, 0, offset, len) != 0) { - virReportSystemError(errno, - _("Failed to pre-allocate space for " - "file '%s'"), path); + if (doResize(path, fd, offset, len) < 0) goto cleanup; - } -#else - virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", - _("preallocate is not supported on this platform")); - goto cleanup; -#endif } else { if (ftruncate(fd, capacity) < 0) { virReportSystemError(errno, -- 1.9.3

On 08/11/2014 10:30 PM, John Ferlan wrote:
Currently virStorageFileResize() function uses build conditionals to choose either the posix_fallocate() or mmap() with no fallback in order to preallocate the space in the newly resized file.
This patch will modify the logic in order to allow fallbacks in the event that posix_fallocate() or the sys_fallocate syscall() doesn't work properly. The fallback will be to use the slow safewrite of zero filled buffers to the file.
Use the virFileFdPosixFallocate() rather than a local function.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/util/virstoragefile.c | 52 +++++++++++++++++++++++++++++++---------------- 1 file changed, 34 insertions(+), 18 deletions(-)
diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index 5b6b2f5..4d37de1 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -1086,6 +1086,39 @@ virStorageFileChainGetBroken(virStorageSourcePtr chain, return 0; }
+static int +resizeSysFallocate(const char *path, + int fd, + off_t offset, + off_t len) +{ + int rc = -1; +#if HAVE_SYS_SYSCALL_H && defined(SYS_fallocate) + if ((rc = syscall(SYS_fallocate, fd, 0, offset, len)) != 0) { + virReportSystemError(errno, + _("Failed to pre-allocate space for " + "file '%s'"), path);
I think this should not log an error, since we have a fallback. VIR_DEBUG maybe?
+ } +#endif + return rc; +} + +static int +doResize(const char *path, + int fd, + off_t offset, + off_t len) +{ + if (virFileFdPosixFallocate(fd, offset, len) == 0 || + resizeSysFallocate(path, fd, offset, len) == 0 || + safezero(fd, offset, len) == 0) + return 0; + + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("preallocate is not supported on this platform")); + return -1;
safezero is always supported. And this function should do: return safezero(fd, offset, len);
+} +
/** * virStorageFileResize: @@ -1113,25 +1146,8 @@ virStorageFileResize(const char *path, }
if (pre_allocate) { -#if HAVE_POSIX_FALLOCATE - if ((rc = posix_fallocate(fd, offset, len)) != 0) { - virReportSystemError(rc, - _("Failed to pre-allocate space for " - "file '%s'"), path); - goto cleanup; - } -#elif HAVE_SYS_SYSCALL_H && defined(SYS_fallocate) - if (syscall(SYS_fallocate, fd, 0, offset, len) != 0) { - virReportSystemError(errno, - _("Failed to pre-allocate space for " - "file '%s'"), path); + if (doResize(path, fd, offset, len) < 0) goto cleanup; - } -#else - virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", - _("preallocate is not supported on this platform")); - goto cleanup; -#endif } else { if (ftruncate(fd, capacity) < 0) { virReportSystemError(errno,

On 08/21/2014 11:53 AM, Ján Tomko wrote:
On 08/11/2014 10:30 PM, John Ferlan wrote:
Currently virStorageFileResize() function uses build conditionals to choose either the posix_fallocate() or mmap() with no fallback in order to preallocate the space in the newly resized file.
This patch will modify the logic in order to allow fallbacks in the event that posix_fallocate() or the sys_fallocate syscall() doesn't work properly. The fallback will be to use the slow safewrite of zero filled buffers to the file.
Use the virFileFdPosixFallocate() rather than a local function.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/util/virstoragefile.c | 52 +++++++++++++++++++++++++++++++---------------- 1 file changed, 34 insertions(+), 18 deletions(-)
diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index 5b6b2f5..4d37de1 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -1086,6 +1086,39 @@ virStorageFileChainGetBroken(virStorageSourcePtr chain, return 0; }
+static int +resizeSysFallocate(const char *path, + int fd, + off_t offset, + off_t len) +{ + int rc = -1; +#if HAVE_SYS_SYSCALL_H && defined(SYS_fallocate) + if ((rc = syscall(SYS_fallocate, fd, 0, offset, len)) != 0) { + virReportSystemError(errno, + _("Failed to pre-allocate space for " + "file '%s'"), path);
I think this should not log an error, since we have a fallback. VIR_DEBUG maybe?
Yep - right. Although to match the other path, VIR_WARN...
+ } +#endif + return rc; +} + +static int +doResize(const char *path, + int fd, + off_t offset, + off_t len) +{ + if (virFileFdPosixFallocate(fd, offset, len) == 0 || + resizeSysFallocate(path, fd, offset, len) == 0 || + safezero(fd, offset, len) == 0) + return 0; + + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("preallocate is not supported on this platform")); + return -1;
safezero is always supported. And this function should do: return safezero(fd, offset, len);
Hmm.. Perhaps a better way to do this would be to modify safezero() to add a 4th boolean parameter "resize" and make the "safezero_mmap()" be the false side and the check/call to SYS_fallocate() be the true side. That way all the logic resides in virfile.c Thus doResize() goes away and resizeSysFallocate() moves to virfile.c. All virStorageFileResize() does for the "if (preAllocate)" condition "if (safezero(...,true) < 0)" See the attached patch file that "should" be able to be "git am"'d on top of the existing patches (although I did rebase to top to tree this morning). Does that seem better? John
+} +
/** * virStorageFileResize: @@ -1113,25 +1146,8 @@ virStorageFileResize(const char *path, }
if (pre_allocate) { -#if HAVE_POSIX_FALLOCATE - if ((rc = posix_fallocate(fd, offset, len)) != 0) { - virReportSystemError(rc, - _("Failed to pre-allocate space for " - "file '%s'"), path); - goto cleanup; - } -#elif HAVE_SYS_SYSCALL_H && defined(SYS_fallocate) - if (syscall(SYS_fallocate, fd, 0, offset, len) != 0) { - virReportSystemError(errno, - _("Failed to pre-allocate space for " - "file '%s'"), path); + if (doResize(path, fd, offset, len) < 0) goto cleanup; - } -#else - virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", - _("preallocate is not supported on this platform")); - goto cleanup; -#endif } else { if (ftruncate(fd, capacity) < 0) { virReportSystemError(errno,

On 08/21/2014 06:51 PM, John Ferlan wrote:
On 08/21/2014 11:53 AM, Ján Tomko wrote:
On 08/11/2014 10:30 PM, John Ferlan wrote:
Currently virStorageFileResize() function uses build conditionals to choose either the posix_fallocate() or mmap() with no fallback in order to preallocate the space in the newly resized file.
This patch will modify the logic in order to allow fallbacks in the event that posix_fallocate() or the sys_fallocate syscall() doesn't work properly. The fallback will be to use the slow safewrite of zero filled buffers to the file.
Use the virFileFdPosixFallocate() rather than a local function.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/util/virstoragefile.c | 52 +++++++++++++++++++++++++++++++---------------- 1 file changed, 34 insertions(+), 18 deletions(-)
diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index 5b6b2f5..4d37de1 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -1086,6 +1086,39 @@ virStorageFileChainGetBroken(virStorageSourcePtr chain, return 0; }
+static int +resizeSysFallocate(const char *path, + int fd, + off_t offset, + off_t len) +{ + int rc = -1; +#if HAVE_SYS_SYSCALL_H && defined(SYS_fallocate) + if ((rc = syscall(SYS_fallocate, fd, 0, offset, len)) != 0) { + virReportSystemError(errno, + _("Failed to pre-allocate space for " + "file '%s'"), path);
I think this should not log an error, since we have a fallback. VIR_DEBUG maybe?
Yep - right. Although to match the other path, VIR_WARN...
+ } +#endif + return rc; +} + +static int +doResize(const char *path, + int fd, + off_t offset, + off_t len) +{ + if (virFileFdPosixFallocate(fd, offset, len) == 0 || + resizeSysFallocate(path, fd, offset, len) == 0 || + safezero(fd, offset, len) == 0) + return 0; + + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("preallocate is not supported on this platform")); + return -1;
safezero is always supported. And this function should do: return safezero(fd, offset, len);
Hmm.. Perhaps a better way to do this would be to modify safezero() to add a 4th boolean parameter "resize" and make the "safezero_mmap()" be the false side and the check/call to SYS_fallocate() be the true side. That way all the logic resides in virfile.c
How about 'fallback_to_mmap' instead of resize? Or even simpler, we don't really need a different set of fallbacks for resize than the other uses. It seems the patch adding the preallocation to FileResize should have used safezero instead of reimplementing it. (I also wonder if there are systems without posix_fallocate, but having the syscall...) ACK to safezero implementation with all four methods: int safezero(int fd, off_t offset, off_t len) { if (virFileFdPosixFallocate(fd, offset, len) == 0) return 0; if (safezero_sys_fallocate(fd, offset, len) == 0) return 0; if (safezero_mmap(fd, offset, len) == 0) return 0; return safezero_slow(fd, offset, len); } Jan

On 08/22/2014 10:32 AM, Ján Tomko wrote:
On 08/21/2014 06:51 PM, John Ferlan wrote:
<...snip...>
Hmm.. Perhaps a better way to do this would be to modify safezero() to add a 4th boolean parameter "resize" and make the "safezero_mmap()" be the false side and the check/call to SYS_fallocate() be the true side. That way all the logic resides in virfile.c
How about 'fallback_to_mmap' instead of resize?
Or even simpler, we don't really need a different set of fallbacks for resize than the other uses. It seems the patch adding the preallocation to FileResize should have used safezero instead of reimplementing it.
(I also wonder if there are systems without posix_fallocate, but having the syscall...)
ACK to safezero implementation with all four methods:
int safezero(int fd, off_t offset, off_t len) { if (virFileFdPosixFallocate(fd, offset, len) == 0) return 0; if (safezero_sys_fallocate(fd, offset, len) == 0) return 0; if (safezero_mmap(fd, offset, len) == 0) return 0; return safezero_slow(fd, offset, len); }
OK - I'll do this (although since virFileFdPosixFallocate only needs to be local now, I'll rename it to safezero_posix_fallocate() and make it static to the module as well as removing it from libvirt_private.syms). John

On Fri, Aug 22, 2014 at 04:32:09PM +0200, Ján Tomko wrote:
On 08/21/2014 06:51 PM, John Ferlan wrote: ACK to safezero implementation with all four methods:
int safezero(int fd, off_t offset, off_t len) { if (virFileFdPosixFallocate(fd, offset, len) == 0) return 0; if (safezero_sys_fallocate(fd, offset, len) == 0) return 0; if (safezero_mmap(fd, offset, len) == 0) return 0; return safezero_slow(fd, offset, len); }
Huh, why would we want todo that ? safezero() will always use the posix_fallocate() function if it exists at build time, otherwise it will be built for mmap, or even write(). If posix_fallocate() is used, it will try the fast allocation strategy and fallback to manually filling with zeros if not available without libvirt needing todo the fallback itself. So we don't need to do any of this dynamically fallback at runtime - just use safezero() as it exists today. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

ping? Tks - John On 08/11/2014 04:30 PM, John Ferlan wrote:
Changes over v1 - different tact as more research discovers that the posix_fallocate() does not perform the correct pre-allocation of space for an NFS backed file/disk, some details of the findings can be found:
http://www.redhat.com/archives/libvir-list/2014-August/msg00367.html
The first patch in this series will refactor the safezero to allow for more fallback than the current code. Initially implemented as a series of three 'safezero()' functions in commit id 'c29d0929' using build conditionals to determine which of the 3 is being used.
The refactored code will have one function that will make a series of calls rather the just failing on whatever is built into the system. The first is a global virFileFdPosixFallocate() with the build conditional controlling only whether the function runs or will just return -1. It will also be re-used by the Volume Resize code in a future patch.
If on creation the virFileFdPosixFallocate() fails, attempts will be made to then try the mmap() method (if it exists), and then fall back to the slow, old, safewrite of zero filled buffers. The mmap function will follow the format of the posix_fallocate insomuch as if HAVE_MMAP is defined, then that will be attempted or a -1 will be returned immediately.
The major change of this patch over the prior code is the fallback. One other minor change is if the virFileFdPosixFallocate() tries to call posix_fallocate() and fails, then a VIR_WARN will be used to indicate something went wrong. Previously, the code would return errno and eventually that would filter back to a caller which would use the errno in a sys error message. This I think only is odd when it comes to the resize path.
The second patch looks to resolve the first issue discovered by:
https://bugzilla.redhat.com/show_bug.cgi?id=1077068
It does this by checking the returned size/len of the file in the posix_fallocate() call and if it does not match a VIR_WARN will be posted and a failure returned allowing either the mmap or write of zero filled buffers to the file.
The third patch changes the virStorageFileResize() logic to follow the safezero() logic with respect to build conditionals. Also, rather than reinvent the wheel, it will reuse the virFileFdPosixFallocate(). The new static API resizeSysFallocate() will return -1 if the build conditionals are not met.
A current "downside" is by calling virFileFdPosixFallocate() twice (once directly and the other indirectly through safezero), the VIR_WARN will be displayed twice, but that should only affect someone trying to debug things.
NOTE: Investigation and testing while creating this series also showed it's possible to modify the NFS Pool Start code to add a "-o wsize=4096" to the MOUNT command in order to force smaller write's. While that did fix the symptoms, the fix just didn't "feel right". The block size of the "source" export was 4096 bytes, while the block size of the "target" file was 1048576 bytes (current NFSv4 wsize default value when not specified). Whether those play into what posix_fallocate() does I am not sure, but if it does and that's fixed, then the way these patches are coded up - it won't matter. Compared to perhaps needing to revisit or document heavily how to set some new field in the pool XML source - this just seems more right.
John Ferlan (3): virfile: Refactor safezero, introduce virFileFdPosixFallocate virfile: Check returned size from virFileFdPosixFallocate virstoragefile: Refactor virStorageFileResize
src/libvirt_private.syms | 1 + src/util/virfile.c | 58 ++++++++++++++++++++++++++++++++++++++--------- src/util/virfile.h | 2 ++ src/util/virstoragefile.c | 52 +++++++++++++++++++++++++++--------------- 4 files changed, 84 insertions(+), 29 deletions(-)

On 08/11/2014 10:30 PM, John Ferlan wrote:
Changes over v1 - different tact as more research discovers that the posix_fallocate() does not perform the correct pre-allocation of space for an NFS backed file/disk, some details of the findings can be found:
http://www.redhat.com/archives/libvir-list/2014-August/msg00367.html
The first patch in this series will refactor the safezero to allow for more fallback than the current code. Initially implemented as a series of three 'safezero()' functions in commit id 'c29d0929' using build conditionals to determine which of the 3 is being used.
The refactored code will have one function that will make a series of calls rather the just failing on whatever is built into the system. The first is a global virFileFdPosixFallocate() with the build conditional controlling only whether the function runs or will just return -1. It will also be re-used by the Volume Resize code in a future patch.
If on creation the virFileFdPosixFallocate() fails, attempts will be made to then try the mmap() method (if it exists), and then fall back to the slow, old, safewrite of zero filled buffers. The mmap function will follow the format of the posix_fallocate insomuch as if HAVE_MMAP is defined, then that will be attempted or a -1 will be returned immediately.
The major change of this patch over the prior code is the fallback. One other minor change is if the virFileFdPosixFallocate() tries to call posix_fallocate() and fails, then a VIR_WARN will be used to indicate something went wrong. Previously, the code would return errno and eventually that would filter back to a caller which would use the errno in a sys error message. This I think only is odd when it comes to the resize path.
The second patch looks to resolve the first issue discovered by:
https://bugzilla.redhat.com/show_bug.cgi?id=1077068
It does this by checking the returned size/len of the file in the posix_fallocate() call and if it does not match a VIR_WARN will be posted and a failure returned allowing either the mmap or write of zero filled buffers to the file.
The third patch changes the virStorageFileResize() logic to follow the safezero() logic with respect to build conditionals. Also, rather than reinvent the wheel, it will reuse the virFileFdPosixFallocate(). The new static API resizeSysFallocate() will return -1 if the build conditionals are not met.
A current "downside" is by calling virFileFdPosixFallocate() twice (once directly and the other indirectly through safezero), the VIR_WARN will be displayed twice, but that should only affect someone trying to debug things.
NOTE: Investigation and testing while creating this series also showed it's possible to modify the NFS Pool Start code to add a "-o wsize=4096" to the MOUNT command in order to force smaller write's. While that did fix the symptoms, the fix just didn't "feel right". The block size of the "source" export was 4096 bytes, while the block size of the "target" file was 1048576 bytes (current NFSv4 wsize default value when not specified). Whether those play into what posix_fallocate() does I am not sure, but if it does and that's fixed, then the way these patches are coded up - it won't matter. Compared to perhaps needing to revisit or document heavily how to set some new field in the pool XML source - this just seems more right.
John Ferlan (3): virfile: Refactor safezero, introduce virFileFdPosixFallocate virfile: Check returned size from virFileFdPosixFallocate virstoragefile: Refactor virStorageFileResize
src/libvirt_private.syms | 1 + src/util/virfile.c | 58 ++++++++++++++++++++++++++++++++++++++--------- src/util/virfile.h | 2 ++ src/util/virstoragefile.c | 52 +++++++++++++++++++++++++++--------------- 4 files changed, 84 insertions(+), 29 deletions(-)
ACK series Jan
participants (3)
-
Daniel P. Berrange
-
John Ferlan
-
Ján Tomko