On Fri, Aug 22, 2014 at 04:15:14PM +0100, Daniel P. Berrange wrote:
On Fri, Aug 22, 2014 at 10:56:47AM -0400, John Ferlan wrote:
>
>
> On 08/22/2014 10:46 AM, Daniel P. Berrange wrote:
> > On Mon, Aug 11, 2014 at 04:30:19PM -0400, John Ferlan wrote:
> >> Currently the safezero() function uses build conditionals to choose either
> >> the posix_fallocate() or mmap() with a fallback to safewrite() in order to
> >> preallocate a file.
> >>
> >> This patch will modify the logic in order to allow fallbacks in the
> >> event that posix_fallocate() or the ftruncate() and mmap() doesn't work
> >> properly. The fallback will be to use the slow safewrite of zero filled
> >> buffers to the file.
> >
> > Have you actually encountered failing of posix_fallocate() in the
> > real world ? It is supposed to automatically fallback to the
> > equivalent of writing zeros if the filesystem / kernel does not
> > support it, so we should not have to do runtime fallback ourselves.
> > The existence of fallback is the main distinction between the
> > posix_fallocate() and fallocate() system calls.
> >
>
> It wasn't so much a "failure" as "unexpected results" - the key being
> that the resulting created (or resized) file was not sized as expected.
>
> For an NFS target the results are not what was expected. I've left some
> history in the prior set of patches with the following probably having
> the most details:
>
> http://www.redhat.com/archives/libvir-list/2014-August/msg00367.html

So, IIUC, the bug happens when the rsize mount option to NFS is not 4k.
strace'ing libvirtd on an NFS volume in this case shows:
open("/var/lib/libvirt/images/lettuce/foo", O_RDWR|O_CREAT|O_EXCL, 0600) = 24
fstat(24, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
ftruncate(24, 1073741824) = 0
fallocate(24, 0, 0, 1073741824) = -1 EOPNOTSUPP (Operation not supported)
fallocate(24, 0, 0, 1073741824) = -1 EOPNOTSUPP (Operation not supported)
fstat(24, {st_mode=S_IFREG|0600, st_size=1073741824, ...}) = 0
fstatfs(24, {f_type="NFS_SUPER_MAGIC", f_bsize=1048576, f_blocks=118342,
    f_bfree=71002, f_bavail=65632, f_files=7678560, f_ffree=5495931, f_fsid={0, 0},
    f_namelen=255, f_frsize=1048576}) = 0
pread(24, "\0", 1, 1048575) = 1
pwrite(24, "\0", 1, 1048575) = 1
pread(24, "\0", 1, 2097151) = 1
pwrite(24, "\0", 1, 2097151) = 1
pread(24, "\0", 1, 3145727) = 1
So we can see glibc here trying fallocate() and then falling back to
writing zeros. Since the volume does not come out at the right size
this seems to show a bug in glibc.
So I think we really ought to report that bug to glibc to be fixed
there rather than working around it in libvirt, as there are many
more applications besides libvirt that will be impacted by this
bug.
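
For the glibc report, a small standalone check may be handy. The following is only a sketch: check_fallocate_size() is a hypothetical helper name, not something in libvirt or glibc, and it assumes a Linux system where posix_fallocate() is declared in <fcntl.h>. It preallocates a file and compares the st_size reported by fstat() with the length that was requested.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of libvirt or glibc).
 * Preallocates `len` bytes from offset 0 on `fd`, then reads back the
 * file size. Returns 0 if the size matches `len`, 1 if it differs,
 * and -1 on error with errno set. */
int check_fallocate_size(int fd, off_t len, off_t *actual)
{
    struct stat sb;

    /* posix_fallocate() returns an error number and does not set errno. */
    int rc = posix_fallocate(fd, 0, len);
    if (rc != 0) {
        errno = rc;
        return -1;
    }

    if (fstat(fd, &sb) < 0)
        return -1;

    if (actual)
        *actual = sb.st_size;

    return sb.st_size == len ? 0 : 1;
}
```

Running this against a freshly created file on the affected NFS mount and on a local filesystem, with the same length, should show whether the size mismatch can be reproduced outside libvirtd.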
Oops, meant to include the stack trace to show where the pread/writes
are coming from:
(gdb) bt
#0 pread64 () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f55a29f9c5e in internal_fallocate (fd=fd@entry=24, offset=1048575, len=1072693248)
    at ../sysdeps/posix/posix_fallocate.c:78
#2 0x00007f55a29f9cc7 in posix_fallocate (fd=fd@entry=24, offset=<optimized out>, len=<optimized out>)
    at ../sysdeps/unix/sysv/linux/wordsize-64/posix_fallocate.c:62
#3 0x00007f55a6071026 in safezero (fd=fd@entry=24, offset=<optimized out>, len=<optimized out>)
    at util/virfile.c:1031
#4 0x00007f55916258c2 in createRawFile (inputvol=0x0, vol=0x7f5570008280, fd=24)
    at storage/storage_backend.c:389
#5 virStorageBackendCreateRaw (conn=<optimized out>, pool=<optimized out>, vol=0x7f5570008280, inputvol=0x0, flags=<optimized out>)
    at storage/storage_backend.c:450
Regards,
Daniel