[PATCH v2 0/3] Wrong allocation size displayed for NFS volumes

Changes over v1 - different tact as more research discovers that the posix_fallocate() does not perform the correct pre-allocation of space for an NFS backed file/disk, some details of the findings can be found: http://www.redhat.com/archives/libvir-list/2014-August/msg00367.html The first patch in this series will refactor the safezero to allow for more fallback than the current code. Initially implemented as a series of three 'safezero()' functions in commit id 'c29d0929' using build conditionals to determine which of the 3 is being used. The refactored code will have one function that will make a series of calls rather the just failing on whatever is built into the system. The first is a global virFileFdPosixFallocate() with the build conditional controlling only whether the function runs or will just return -1. It will also be re-used by the Volume Resize code in a future patch. If on creation the virFileFdPosixFallocate() fails, attempts will be made to then try the mmap() method (if it exists), and then fall back to the slow, old, safewrite of zero filled buffers. The mmap function will follow the format of the posix_fallocate insomuch as if HAVE_MMAP is defined, then that will be attempted or a -1 will be returned immediately. The major change of this patch over the prior code is the fallback. One other minor change is if the virFileFdPosixFallocate() tries to call posix_fallocate() and fails, then a VIR_WARN will be used to indicate something went wrong. Previously, the code would return errno and eventually that would filter back to a caller which would use the errno in a sys error message. This I think only is odd when it comes to the resize path. The second patch looks to resolve the first issue discovered by: https://bugzilla.redhat.com/show_bug.cgi?id=1077068 It does this by checking the returned size/len of the file in the posix_fallocate() call and if it does not match a VIR_WARN will be posted and a failure returned allowing either the mmap or write of zero filled buffers to the file. The third patch changes the virStorageFileResize() logic to follow the safezero() logic with respect to build conditionals. Also, rather than reinvent the wheel, it will reuse the virFileFdPosixFallocate(). The new static API resizeSysFallocate() will return -1 if the build conditionals are not met. A current "downside" is by calling virFileFdPosixFallocate() twice (once directly and the other indirectly through safezero), the VIR_WARN will be displayed twice, but that should only affect someone trying to debug things. NOTE: Investigation and testing while creating this series also showed it's possible to modify the NFS Pool Start code to add a "-o wsize=4096" to the MOUNT command in order to force smaller write's. While that did fix the symptoms, the fix just didn't "feel right". The block size of the "source" export was 4096 bytes, while the block size of the "target" file was 1048576 bytes (current NFSv4 wsize default value when not specified). Whether those play into what posix_fallocate() does I am not sure, but if it does and that's fixed, then the way these patches are coded up - it won't matter. Compared to perhaps needing to revisit or document heavily how to set some new field in the pool XML source - this just seems more right. John Ferlan (3): virfile: Refactor safezero, introduce virFileFdPosixFallocate virfile: Check returned size from virFileFdPosixFallocate virstoragefile: Refactor virStorageFileResize src/libvirt_private.syms | 1 + src/util/virfile.c | 58 ++++++++++++++++++++++++++++++++++++++--------- src/util/virfile.h | 2 ++ src/util/virstoragefile.c | 52 +++++++++++++++++++++++++++--------------- 4 files changed, 84 insertions(+), 29 deletions(-) -- 1.9.3

Currently the safezero() function uses build conditionals to choose either the posix_fallocate() or mmap() with a fallback to safewrite() in order to preallocate a file. This patch will modify the logic in order to allow fallbacks in the event that posix_fallocate() or the ftruncate()and mmap() doesn't work properly. The fallback will be to use the slow safewrite of zero filled buffers to the file. This patch also introduces virFileFdPosixFallocate() so that it can be used by the resize code rather than having two separate functions Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virfile.c | 37 +++++++++++++++++++++++++++---------- src/util/virfile.h | 2 ++ 3 files changed, 30 insertions(+), 10 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 08111d4..5556d31 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1278,6 +1278,7 @@ virFileDirectFdFlag; virFileExists; virFileFclose; virFileFdopen; +virFileFdPosixFallocate; virFileFindHugeTLBFS; virFileFindMountPoint; virFileFindResource; diff --git a/src/util/virfile.c b/src/util/virfile.c index f9efc65..7f03cbf 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -1034,26 +1034,26 @@ safewrite(int fd, const void *buf, size_t count) return nwritten; } -#ifdef HAVE_POSIX_FALLOCATE int -safezero(int fd, off_t offset, off_t len) +virFileFdPosixFallocate(int fd, off_t offset, off_t len) { +#ifdef HAVE_POSIX_FALLOCATE int ret = posix_fallocate(fd, offset, len); if (ret == 0) return 0; + VIR_WARN("Failed to pre-allocate '%lu' bytes with return value '%d'", + len, ret); errno = ret; +#endif /* HAVE_POSIX_FALLOCATE */ return -1; } -#else - -int -safezero(int fd, off_t offset, off_t len) +static int +safezero_mmap(int fd, off_t offset, off_t len) { +#ifdef HAVE_MMAP int r; char *buf; - unsigned long long remain, bytes; -# ifdef HAVE_MMAP static long pagemask = 0; off_t map_skip; @@ -1080,7 +1080,16 @@ safezero(int fd, off_t offset, off_t len) /* fall back to writing zeroes using safewrite if mmap fails (for * example because of virtual memory limits) */ -# endif /* HAVE_MMAP */ +#endif /* HAVE_MMAP */ + return -1; +} + +static int +safezero_slow(int fd, off_t offset, off_t len) +{ + int r; + char *buf; + unsigned long long remain, bytes; if (lseek(fd, offset, SEEK_SET) < 0) return -1; @@ -1111,8 +1120,16 @@ safezero(int fd, off_t offset, off_t len) VIR_FREE(buf); return 0; } -#endif /* HAVE_POSIX_FALLOCATE */ +int +safezero(int fd, off_t offset, off_t len) +{ + if (virFileFdPosixFallocate(fd, offset, len) == 0) + return 0; + if (safezero_mmap(fd, offset, len) == 0) + return 0; + return safezero_slow(fd, offset, len); +} #if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R /* search /proc/mounts for mount point of *type; return pointer to diff --git a/src/util/virfile.h b/src/util/virfile.h index 403d0ba..29a1776 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -41,6 +41,8 @@ typedef enum { ssize_t saferead(int fd, void *buf, size_t count) ATTRIBUTE_RETURN_CHECK; ssize_t safewrite(int fd, const void *buf, size_t count) ATTRIBUTE_RETURN_CHECK; +int virFileFdPosixFallocate(int fd, off_t offset, off_t len) + ATTRIBUTE_RETURN_CHECK; int safezero(int fd, off_t offset, off_t len) ATTRIBUTE_RETURN_CHECK; -- 1.9.3

https://bugzilla.redhat.com/show_bug.cgi?id=1077068 Add an fstat check to virFileFdPosixFallocate and fail if the allocated size doesn't match the expected value. Since the failure will fall back to using safewrite(), just leave a VIR_WARN message indicating the failure and some entrails for determining why the failure occurred. It seems for an NFS file at least the allocation may not work properly depending on the NFS configuration. Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/util/virfile.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/src/util/virfile.c b/src/util/virfile.c index 7f03cbf..3b87d9f 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -1038,9 +1038,28 @@ int virFileFdPosixFallocate(int fd, off_t offset, off_t len) { #ifdef HAVE_POSIX_FALLOCATE + struct stat sb; + off_t newlen; int ret = posix_fallocate(fd, offset, len); - if (ret == 0) + if (ret == 0) { + /* Let's make sure the new size matches our expectations */ + if (fstat(fd, &sb) < 0) + return -1; +# ifndef WIN32 + newlen = (unsigned long long)sb.st_blocks * + (unsigned long long)DEV_BSIZE; +# else + newlen = sb.st_size; +# endif + if (newlen != len) { + VIR_WARN("posix_fallocate only allocated '%lu' bytes, expected " + "'%lu' byte file (blocks=%lu bsize=%d).", + newlen, len, sb.st_blocks, DEV_BSIZE); + return -1; + } + return 0; + } VIR_WARN("Failed to pre-allocate '%lu' bytes with return value '%d'", len, ret); errno = ret; -- 1.9.3

Currently virStorageFileResize() function uses build conditionals to choose either the posix_fallocate() or mmap() with no fallback in order to preallocate the space in the newly resized file. This patch will modify the logic in order to allow fallbacks in the event that posix_fallocate() or the sys_fallocate syscall() doesn't work properly. The fallback will be to use the slow safewrite of zero filled buffers to the file. Use the virFileFdPosixFallocate() rather than a local function. Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/util/virstoragefile.c | 52 +++++++++++++++++++++++++++++++---------------- 1 file changed, 34 insertions(+), 18 deletions(-) diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index 5b6b2f5..4d37de1 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -1086,6 +1086,39 @@ virStorageFileChainGetBroken(virStorageSourcePtr chain, return 0; } +static int +resizeSysFallocate(const char *path, + int fd, + off_t offset, + off_t len) +{ + int rc = -1; +#if HAVE_SYS_SYSCALL_H && defined(SYS_fallocate) + if ((rc = syscall(SYS_fallocate, fd, 0, offset, len)) != 0) { + virReportSystemError(errno, + _("Failed to pre-allocate space for " + "file '%s'"), path); + } +#endif + return rc; +} + +static int +doResize(const char *path, + int fd, + off_t offset, + off_t len) +{ + if (virFileFdPosixFallocate(fd, offset, len) == 0 || + resizeSysFallocate(path, fd, offset, len) == 0 || + safezero(fd, offset, len) == 0) + return 0; + + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("preallocate is not supported on this platform")); + return -1; +} + /** * virStorageFileResize: @@ -1113,25 +1146,8 @@ virStorageFileResize(const char *path, } if (pre_allocate) { -#if HAVE_POSIX_FALLOCATE - if ((rc = posix_fallocate(fd, offset, len)) != 0) { - virReportSystemError(rc, - _("Failed to pre-allocate space for " - "file '%s'"), path); - goto cleanup; - } -#elif HAVE_SYS_SYSCALL_H && defined(SYS_fallocate) - if (syscall(SYS_fallocate, fd, 0, offset, len) != 0) { - virReportSystemError(errno, - _("Failed to pre-allocate space for " - "file '%s'"), path); + if (doResize(path, fd, offset, len) < 0) goto cleanup; - } -#else - virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", - _("preallocate is not supported on this platform")); - goto cleanup; -#endif } else { if (ftruncate(fd, capacity) < 0) { virReportSystemError(errno, -- 1.9.3
participants (1)
-
John Ferlan