On 12/22/2010 11:35 AM, Eric Blake wrote:
Right now, we create qemu snapshots to a file by using an exec:
monitor
command that passes 'compressor | { dd && dd; }' with stdout connected
to the target file. However, since dd is using a larger bs= than
PIPE_MAX, it is conceivable that under heavy machine load, that dd will
get a short read from the pipe, and unless we use the GNU extension of
iflag=fullblock, that short read will be padded out to the output block
size and result in data corruption in the destination file.
The possibility of corruption due to short reads when dd is fed input
through a pipe but produces output through a large block size has
occurred several times on the coreutils mailing list:
http://lists.gnu.org/archive/html/bug-coreutils/2010-11/msg00092.html
Coreutils currently obeys the (non-intuitive) POSIX requirements for
this short read behavior, and while it is being considered to make dd
when POSIXLY_CORRECT is unset be more sensible, you can't rely on that
being a default.
Should we refuse to make snapshots if we don't detect the GNU extension
of 'dd iflag=fullblock'? Probably safe to do on GNU/Linux machines, but
I'm wondering if it will have negative effects on BSD machines where GNU
coreutils is not installed.
I've gone back and read POSIX again. It states:
If the bs=expr operand is specified and no conversions other than sync,
noerror, or notrunc are requested, the data returned from each input
block shall be written as a separate output block; if the read returns
less than a full block and the sync conversion is not specified, the
resulting output block shall be the same size as the input block. If the
bs=expr operand is not specified, or a conversion other than sync,
noerror, or notrunc is requested, the input shall be processed and
collected into full-sized output blocks until the end of the input is
reached.
So I stand corrected - because we aren't using conv=sync, there is no
zero padding or data corruption; however, behavior can still be
improved. With bs=, a short read results in a short write, so we aren't
using the full 1M block size that we asked of dd, and end up with more
syscalls than necessary. Swapping to ibs= obs= instead of bs= allows dd
to do fewer syscalls on output. Patch coming up soon, but I'm grateful
that this is not a data corruption bug after all. And the GNU extension
of iflag=fullblock is only useful when mixing bs= and conv=sync, so
there's no need to use that extension.
--
Eric Blake eblake(a)redhat.com +1-801-349-2682
Libvirt virtualization library
http://libvirt.org