
On 05/12/2017 12:56 PM, John Ferlan wrote:
On 05/12/2017 03:29 AM, Michal Privoznik wrote:
On 05/05/2017 04:48 PM, Daniel P. Berrange wrote:
On Fri, May 05, 2017 at 01:25:34PM +0200, Michal Privoznik wrote:
On 05/04/2017 11:29 PM, John Ferlan wrote:
On 04/20/2017 06:01 AM, Michal Privoznik wrote:
This API can be used to tell the other side of the stream to skip
s/can be/is (unless it can be used for something else ;-))
some bytes in the stream. This can be used to create a sparse file on the receiving side of a stream.
It takes just one argument @length, which says how big the hole is. Since our streams are not rewindable like regular files, we don't need @whence argument like seek(2) has.
lseek is an implementation detail... However, it could be stated that the skipping would be from the current point in the file forward by some number of bytes. It's expected to be used in conjunction with code that is copying over the real (or non-zero) data and should be considered an optimization over sending zere data segments.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- include/libvirt/libvirt-stream.h | 3 +++ src/driver-stream.h | 5 ++++ src/libvirt-stream.c | 57 ++++++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 1 + 4 files changed, 66 insertions(+)
While it would be unused for now, should @flags be added. Who knows what use it could have, but avoids a new Flags API, but does cause a few other wording changes here.
Ah sure. We should have @flags there. Good point.
Perhaps it's just me - but "Skip" and "HoleSize" just seem awkward. Would "virStreamSetSkip" and "virStreamGetSkip" be more apropos? (or s/Skip/HoleSize/ - ewww). Names would then follow our more recent function naming guidelines. I think I dislike the HoleSize much more than the Skip.
SetSkip and GetSkip sound wrong to me instead :D
diff --git a/include/libvirt/libvirt-stream.h b/include/libvirt/libvirt-stream.h index bee2516..4e0a599 100644 --- a/include/libvirt/libvirt-stream.h +++ b/include/libvirt/libvirt-stream.h @@ -50,6 +50,9 @@ int virStreamRecvFlags(virStreamPtr st, size_t nbytes, unsigned int flags);
+int virStreamSkip(virStreamPtr st, + unsigned long long length);
Was there consideration for using 'off_t' instead of ULL? I know it's an implementation detail of virFDStreamData and lseek() usage, but it does hide things... IDC either way.
The problem with off_t is that it is signed type, while ULL is unsigned. There's not much point in sending a negative offset, is there? Moreover, we use ULL for arguments like offset (not sure really why). Frankly, I don't really know why. Perhaps some types don't exist everywhere?
If anything, we would use size_t, for consistency with the Send/Recv methods.
So I've given this some though and ran some experiments. On a 32bit arch I've found this:
long long 8 signed size_t 4 unsigned ssize_t 4 signed off_t 4 signed
So size_t is 4 bytes long and long long is 8 bytes. This got me thinking, size_t type makes sense for those APIs where we need to address individual bytes. But what would happen if I have the following file on a 32 bit arch:
[2MB data] -> [5GB hole] -> [2M data]
Would a 5.4G file exist on a 32b arch since the OS couldn't address it nor could it get anything beyond 4GB-1 as a size and then only if using unsigned as the sizing... It's been far too long since I worked on 32 bit arches (or less). The 90's are over aren't they ;-)
Sure it can. Kernel might use long long internally. It definitely has to use bigger type than size_t otherwise we couldn't have >4GB files. But we can. BTW there are other ways to detect holes in files. For instance 'cp' uses ioctl(fd, FS_IOC_FIEMAP, fiemap) to get map of extents on the disk. The fiemap struct then uses uint64_t type to represent addresses and lengths for extents. Therefore I'm gonna stick with my decision and have this unsigned long long. The other argument for using that is: we want ULL to be used at RPC level, right? However, this might clash when mixing 32 and 64 bit client/server as decoding into size_t might fail - ULL does not fit into size_t on 32 bits. Here's an example of a big sparse file: rpi ~ # truncate -s 5G bla.raw rpi ~ # ls -lhs bla.raw 0 -rw-r--r-- 1 root root 5.0G May 12 14:57 bla.raw rpi ~ # uname -a Linux rpi 4.1.15-v7+ #830 SMP Tue Dec 15 17:02:45 GMT 2015 armv7l ARMv7 Processor rev 5 (v7l) BCM2709 GNU/Linux rpi ~ # /home/zippy/tmp/sparse/sparse_map bla.raw hole: 0 len: 5368709120 end: 5368709120 sparse_map is a program I've written for getting data/hole map of a given file. You can find it here: https://pastebin.com/SSjAUi3q BTW: try changing those (long long) typecasts to (size_t) and you'll immediately see why we need long long instead of size_t. Michal