Re: [libvirt] [PATCH v2 11/38] Introduce virStreamSkip

12 May 2017

On 05/12/2017 12:56 PM, John Ferlan wrote:
...
On 05/12/2017 03:29 AM, Michal Privoznik wrote:
...
On 05/05/2017 04:48 PM, Daniel P. Berrange wrote:
...
On Fri, May 05, 2017 at 01:25:34PM +0200, Michal Privoznik wrote:
...
On 05/04/2017 11:29 PM, John Ferlan wrote:
...
On 04/20/2017 06:01 AM, Michal Privoznik wrote:
...
This API can be used to tell the other side of the stream to skip
s/can be/is  (unless it can be used for something else ;-))
...
some bytes in the stream. This can be used to create a sparse
file on the receiving side of a stream.
It takes just one argument @length, which says how big the hole
is. Since our streams are not rewindable like regular files, we
don't need @whence argument like seek(2) has.
lseek is an implementation detail...  However, it could be stated that
the skipping would be from the current point in the file forward by some
number of bytes.  It's expected to be used in conjunction with code that
is copying over the real (or non-zero) data and should be considered an
optimization over sending zere data segments.
...
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 include/libvirt/libvirt-stream.h |  3 +++
 src/driver-stream.h              |  5 ++++
 src/libvirt-stream.c             | 57 ++++++++++++++++++++++++++++++++++++++++
 src/libvirt_public.syms          |  1 +
 4 files changed, 66 insertions(+)
While it would be unused for now, should @flags be added.  Who knows
what use it could have, but avoids a new Flags API, but does cause a few
other wording changes here.
Ah sure. We should have @flags there. Good point.
...
Perhaps it's just me - but "Skip" and "HoleSize" just seem awkward.
Would "virStreamSetSkip" and "virStreamGetSkip" be more apropos? (or
s/Skip/HoleSize/ - ewww). Names would then follow our more recent
function naming guidelines. I think I dislike the HoleSize much more
than the Skip.
SetSkip and GetSkip sound wrong to me instead :D
...
...

diff --git a/include/libvirt/libvirt-stream.h b/include/libvirt/libvirt-stream.h
index bee2516..4e0a599 100644
--- a/include/libvirt/libvirt-stream.h
+++ b/include/libvirt/libvirt-stream.h
@@ -50,6 +50,9 @@ int virStreamRecvFlags(virStreamPtr st,
                        size_t nbytes,
                        unsigned int flags);
+int virStreamSkip(virStreamPtr st,
+                  unsigned long long length);
Was there consideration for using 'off_t' instead of ULL? I know it's an
implementation detail of virFDStreamData and lseek() usage, but it does
hide things... IDC either way.
The problem with off_t is that it is signed type, while ULL is unsigned.
There's not much point in sending a negative offset, is there?
Moreover, we use ULL for arguments like offset (not sure really why).
Frankly, I don't really know why. Perhaps some types don't exist everywhere?
If anything, we would use  size_t, for consistency with the Send/Recv
methods.
So I've given this some though and ran some experiments. On a 32bit arch
I've found this:
long long 8 signed
size_t 4 unsigned
ssize_t 4 signed
off_t 4 signed
So size_t is 4 bytes long and long long is 8 bytes. This got me
thinking, size_t type makes sense for those APIs where we need to
address individual bytes. But what would happen if I have the following
file on a 32 bit arch:
[2MB data] -> [5GB hole] -> [2M data]
Would a 5.4G file exist on a 32b arch since the OS couldn't address it
nor could it get anything beyond 4GB-1 as a size and then only if using
unsigned as the sizing...  It's been far too long since I worked on 32
bit arches (or less).  The 90's are over aren't they ;-)
Sure it can. Kernel might use long long internally. It definitely has to
use bigger type than size_t otherwise we couldn't have >4GB files. But
we can. BTW there are other ways to detect holes in files. For instance
'cp' uses ioctl(fd, FS_IOC_FIEMAP, fiemap) to get map of extents on the
disk. The fiemap struct then uses uint64_t type to represent addresses
and lengths for extents.
Therefore I'm gonna stick with my decision and have this unsigned long long.

The other argument for using that is: we want ULL to be used at RPC
level, right? However, this might clash when mixing 32 and 64 bit
client/server as decoding into size_t might fail - ULL does not fit into
size_t on 32 bits.

Here's an example of a big sparse file:

rpi ~ # truncate -s 5G bla.raw
rpi ~ # ls -lhs bla.raw
0 -rw-r--r-- 1 root root 5.0G May 12 14:57 bla.raw
rpi ~ # uname -a
Linux rpi 4.1.15-v7+ #830 SMP Tue Dec 15 17:02:45 GMT 2015 armv7l ARMv7
Processor rev 5 (v7l) BCM2709 GNU/Linux
rpi ~ # /home/zippy/tmp/sparse/sparse_map bla.raw
hole: 0         len: 5368709120
end: 5368709120

sparse_map is a program I've written for getting data/hole map of a
given file. You can find it here: https://pastebin.com/SSjAUi3q
BTW: try changing those (long long) typecasts to (size_t) and you'll
immediately see why we need long long instead of size_t.


Michal