
On 12/22/2010 11:35 AM, Eric Blake wrote:
Right now, we create qemu snapshots to a file by using an exec: monitor command that passes 'compressor | { dd && dd; }' with stdout connected to the target file. However, since dd is using a larger bs= than PIPE_MAX, it is conceivable that under heavy machine load, that dd will get a short read from the pipe, and unless we use the GNU extension of iflag=fullblock, that short read will be padded out to the output block size and result in data corruption in the destination file.
The possibility of corruption due to short reads when dd is fed input through a pipe but produces output through a large block size has occurred several times on the coreutils mailing list: http://lists.gnu.org/archive/html/bug-coreutils/2010-11/msg00092.html Coreutils currently obeys the (non-intuitive) POSIX requirements for this short read behavior, and while it is being considered to make dd when POSIXLY_CORRECT is unset be more sensible, you can't rely on that being a default.
Should we refuse to make snapshots if we don't detect the GNU extension of 'dd iflag=fullblock'? Probably safe to do on GNU/Linux machines, but I'm wondering if it will have negative effects on BSD machines where GNU coreutils is not installed.
I've gone back and read POSIX again. It states: If the bs=expr operand is specified and no conversions other than sync, noerror, or notrunc are requested, the data returned from each input block shall be written as a separate output block; if the read returns less than a full block and the sync conversion is not specified, the resulting output block shall be the same size as the input block. If the bs=expr operand is not specified, or a conversion other than sync, noerror, or notrunc is requested, the input shall be processed and collected into full-sized output blocks until the end of the input is reached. So I stand corrected - because we aren't using conv=sync, there is no zero padding or data corruption; however, behavior can still be improved. With bs=, a short read results in a short write, so we aren't using the full 1M block size that we asked of dd, and end up with more syscalls than necessary. Swapping to ibs= obs= instead of bs= allows dd to do fewer syscalls on output. Patch coming up soon, but I'm grateful that this is not a data corruption bug after all. And the GNU extension of iflag=fullblock is only useful when mixing bs= and conv=sync, so there's no need to use that extension. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org