On 05/12/2017 12:56 PM, John Ferlan wrote:
On 05/12/2017 03:29 AM, Michal Privoznik wrote:
> On 05/05/2017 04:48 PM, Daniel P. Berrange wrote:
>> On Fri, May 05, 2017 at 01:25:34PM +0200, Michal Privoznik wrote:
>>> On 05/04/2017 11:29 PM, John Ferlan wrote:
>>>>
>>>>
>>>> On 04/20/2017 06:01 AM, Michal Privoznik wrote:
>>>>> This API can be used to tell the other side of the stream to skip
>>>>
>>>> s/can be/is (unless it can be used for something else ;-))
>>>>
>>>>> some bytes in the stream. This can be used to create a sparse
>>>>> file on the receiving side of a stream.
>>>>>
>>>>> It takes just one argument @length, which says how big the hole
>>>>> is. Since our streams are not rewindable like regular files, we
>>>>> don't need @whence argument like seek(2) has.
>>>>
>>>> lseek is an implementation detail... However, it could be stated that
>>>> the skipping would be from the current point in the file forward by some
>>>> number of bytes. It's expected to be used in conjunction with code
that
>>>> is copying over the real (or non-zero) data and should be considered an
>>>> optimization over sending zere data segments.
>>>>
>>>>>
>>>>> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
>>>>> ---
>>>>> include/libvirt/libvirt-stream.h | 3 +++
>>>>> src/driver-stream.h | 5 ++++
>>>>> src/libvirt-stream.c | 57
++++++++++++++++++++++++++++++++++++++++
>>>>> src/libvirt_public.syms | 1 +
>>>>> 4 files changed, 66 insertions(+)
>>>>>
>>>>
>>>> While it would be unused for now, should @flags be added. Who knows
>>>> what use it could have, but avoids a new Flags API, but does cause a few
>>>> other wording changes here.
>>>
>>> Ah sure. We should have @flags there. Good point.
>>>
>>>>
>>>> Perhaps it's just me - but "Skip" and "HoleSize"
just seem awkward.
>>>> Would "virStreamSetSkip" and "virStreamGetSkip" be
more apropos? (or
>>>> s/Skip/HoleSize/ - ewww). Names would then follow our more recent
>>>> function naming guidelines. I think I dislike the HoleSize much more
>>>> than the Skip.
>>>
>>> SetSkip and GetSkip sound wrong to me instead :D
>>>
>>>>
>>>>> diff --git a/include/libvirt/libvirt-stream.h
b/include/libvirt/libvirt-stream.h
>>>>> index bee2516..4e0a599 100644
>>>>> --- a/include/libvirt/libvirt-stream.h
>>>>> +++ b/include/libvirt/libvirt-stream.h
>>>>> @@ -50,6 +50,9 @@ int virStreamRecvFlags(virStreamPtr st,
>>>>> size_t nbytes,
>>>>> unsigned int flags);
>>>>>
>>>>> +int virStreamSkip(virStreamPtr st,
>>>>> + unsigned long long length);
>>>>
>>>> Was there consideration for using 'off_t' instead of ULL? I know
it's an
>>>> implementation detail of virFDStreamData and lseek() usage, but it does
>>>> hide things... IDC either way.
>>>
>>> The problem with off_t is that it is signed type, while ULL is unsigned.
>>> There's not much point in sending a negative offset, is there?
>>> Moreover, we use ULL for arguments like offset (not sure really why).
>>> Frankly, I don't really know why. Perhaps some types don't exist
everywhere?
>>
>> If anything, we would use size_t, for consistency with the Send/Recv
>> methods.
>
> So I've given this some though and ran some experiments. On a 32bit arch
> I've found this:
>
> long long 8 signed
> size_t 4 unsigned
> ssize_t 4 signed
> off_t 4 signed
>
> So size_t is 4 bytes long and long long is 8 bytes. This got me
> thinking, size_t type makes sense for those APIs where we need to
> address individual bytes. But what would happen if I have the following
> file on a 32 bit arch:
>
> [2MB data] -> [5GB hole] -> [2M data]
Would a 5.4G file exist on a 32b arch since the OS couldn't address it
nor could it get anything beyond 4GB-1 as a size and then only if using
unsigned as the sizing... It's been far too long since I worked on 32
bit arches (or less). The 90's are over aren't they ;-)
Sure it can. Kernel might use long long internally. It definitely has to
use bigger type than size_t otherwise we couldn't have >4GB files. But
we can. BTW there are other ways to detect holes in files. For instance
'cp' uses ioctl(fd, FS_IOC_FIEMAP, fiemap) to get map of extents on the
disk. The fiemap struct then uses uint64_t type to represent addresses
and lengths for extents.
Therefore I'm gonna stick with my decision and have this unsigned long long.
The other argument for using that is: we want ULL to be used at RPC
level, right? However, this might clash when mixing 32 and 64 bit
client/server as decoding into size_t might fail - ULL does not fit into
size_t on 32 bits.
Here's an example of a big sparse file:
rpi ~ # truncate -s 5G bla.raw
rpi ~ # ls -lhs bla.raw
0 -rw-r--r-- 1 root root 5.0G May 12 14:57 bla.raw
rpi ~ # uname -a
Linux rpi 4.1.15-v7+ #830 SMP Tue Dec 15 17:02:45 GMT 2015 armv7l ARMv7
Processor rev 5 (v7l) BCM2709 GNU/Linux
rpi ~ # /home/zippy/tmp/sparse/sparse_map bla.raw
hole: 0 len: 5368709120
end: 5368709120
sparse_map is a program I've written for getting data/hole map of a
given file. You can find it here:
https://pastebin.com/SSjAUi3q
BTW: try changing those (long long) typecasts to (size_t) and you'll
immediately see why we need long long instead of size_t.
Michal