So, after couple of sleepless nights and headaches I'm proud to
announce that finally got this working.
What?
Our regular streams that are can be used to transfer disk images
for domains are unaware of any sparseness. Therefore they have
two limitations:
a) transferring big but sparse image can take ages as all the
holes (interpreted by kernel as '\0') have to go through our
event loop.
b) resulting volume is not sparse even if the source was.
How?
I went by verified approach that linux kernel has. One way to
look at our streams is just like read() and write() with a
different names: virStreamRecv() and virStreamSend(). They even
have the same prototype (if 'int fd' is substituted with
'virStreamPtr'). Now, holes in files are created and detected via
third API: lseek(). Therefore I'm introducing new virStreamSkip()
API that mimics the missing primitive. Now, because our streams
do not necessarily have to work over files (they are for generic
data transfer), I had to let users register a callback that is
called whenever the other side calls virStreamSkip().
So now that we have all three primitives, we can focus on making
life easier for our users. Nobody is actually using bare
virStreamSend() and virStreamRecv() rather than our wrappers:
virStreamSendAll() and virStreamRecvAll(). With my approach
described above just virStreamSendAll() needs to be adjusted so
that it's 'sparse file' aware. The virStreamRecvAll() will only
get the data to write (just like it is now) with skip callback
called automatically whenever needed. In order for
virStreamSendAll() to skip holes I'm introducing yet another
callback: virStreamInDataFunc(). This callback will help us to
create a map of a file: before each virStreamSend() it checks
whether we are in a data section or a hole and calls
virStreamSend() or virStreamSkip() respectively.
Do not worry - it will all become clear once you see the code.
Now question is - how will users enable this feature? I mean, we
have take into account that we might be talking to an older
daemon that does not know how to skip a hole. Or vice versa -
older client.
The solution I came up with is to introduce flags to APIs where
sparse streams make sense. I guess it makes sense for volume
upload and download, but almost certainly makes no sense for
virDomainOpenConsole().
Code?
>From users POV they just need to pass correct argument to
'vol-upload' or 'vol-download' virsh commands. One layer down, on
programming level they need to merely:
st = virStreamNew(conn, 0);
virStreamRegisterSkip(st, skipFunc, &fd);
virStorageVolDownload(st, ...);
virStreamRecvAll(st, sinkFunc, &fd);
where:
int skipFunc(virStreamPtr st,
unsigned long long offset, void *opaque)
{
int *fd = opaque;
return lseek(*fd, offset, SEEK_CUR) == (off_t) -1 ? -1 : 0;
}
And for uploading it's slightly more verbose - see patch 24.
Limitations?
While testing this on my machine with XFS, I've noticed that the
resulting map of a transferred file is not exactly the same as
the source's. Checksums are the same though. After digging deeper
I found this commit in the kernel:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id...
Thing is, as we transfer the file, we are practically just
seeking at EOF and thus creating holes. But if the hole size is
small enough, XFS will use some speculative file allocation
algorithm and eventually fully allocate the blocks even if we
intended to create a hole. This does not occur when punching a
hole into a file though. Well, I guess XFS devels have some
reasons to do that.
This behaviour has not been observed on EXT4.
Notes?
Oh, patches 01-03 have been ACKed already, but are not pushed yet
because of the freeze. But since this feature build on the top of
them, I'm sending them too.
Also the whole patch set is accessible at my github:
https://github.com/zippy2/libvirt/tree/sparse_streams4
Michal Privoznik (27):
Revert "rpc: Fix slow volume download (virsh vol-download)"
virnetclientstream: Process stream messages later
virStream{Recv,Send}All: Increase client buffer
Introduce virStreamSkip
Introduce virStreamRegisterSkip and virStreamSkipCallback
Introduce virStreamInData and virStreamRegisterInData
virNetClientStreamNew: Track origin stream
Track if stream is seekable
RPC: Introduce virNetStreamSkip
Introduce VIR_NET_STREAM_SKIP message type
Teach wireshark plugin about VIR_NET_STREAM_SKIP
daemon: Implement VIR_NET_STREAM_SKIP handling
daemon: Introduce virNetServerProgramSendStreamSkip
virnetclientstream: Introduce virNetClientStreamSendSkip
virnetclientstream: Introduce virNetClientStreamHandleSkip
remote_driver: Implement virStreamSkip
daemonStreamHandleRead: Wire up seekable stream
virNetClientStream: Wire up VIR_NET_STREAM_SKIP
virStreamSendAll: Wire up sparse streams
fdstream: Implement seek
gendispatch: Introduce @sparseflag for our calls
Introduce virStorageVol{Download,Upload}Flags
virsh: Implement sparse stream to vol-download
virsh: Implement sparse stream to vol-upload
fdstream: Suppress use of IO helper for sparse streams
daemon: Don't call virStreamInData so often
storage: Enable sparse streams for virStorageVol{Download,Upload}
daemon/remote.c | 2 +-
daemon/stream.c | 134 +++++++++++++--
daemon/stream.h | 3 +-
include/libvirt/libvirt-storage.h | 9 +
include/libvirt/libvirt-stream.h | 47 ++++++
src/datatypes.h | 8 +
src/driver-stream.h | 5 +
src/fdstream.c | 156 +++++++++++++++---
src/fdstream.h | 3 +-
src/libvirt-storage.c | 4 +-
src/libvirt-stream.c | 238 ++++++++++++++++++++++++++-
src/libvirt_internal.h | 7 +
src/libvirt_private.syms | 2 +
src/libvirt_public.syms | 7 +
src/libvirt_remote.syms | 2 +
src/remote/remote_driver.c | 41 ++++-
src/remote/remote_protocol.x | 2 +
src/rpc/gendispatch.pl | 21 ++-
src/rpc/virnetclient.c | 1 +
src/rpc/virnetclientstream.c | 308 ++++++++++++++++++++++++-----------
src/rpc/virnetclientstream.h | 10 +-
src/rpc/virnetprotocol.x | 16 +-
src/rpc/virnetserverprogram.c | 33 ++++
src/rpc/virnetserverprogram.h | 7 +
src/storage/storage_backend.c | 12 +-
src/storage/storage_driver.c | 4 +-
src/virnetprotocol-structs | 4 +
tools/virsh-volume.c | 40 ++++-
tools/virsh.c | 79 +++++++++
tools/virsh.h | 12 ++
tools/virsh.pod | 6 +-
tools/wireshark/src/packet-libvirt.c | 40 +++++
tools/wireshark/src/packet-libvirt.h | 2 +
33 files changed, 1104 insertions(+), 161 deletions(-)
Read the code mostly for educational purposes, but also ran through
Coverity...
First the coverity notes:
Patch 11: dissect_xdr_stream_skip() - "start" is not initialized
Patch 15: virNetClientStreamHandleSkip() - if (!msg || ...) check then
dereferences msg in the virReportError. So it seems !msg needs its own
check and message.
Couple of questions/thoughts about skipping in general.
Is it possible to skip past some sort of EOF? IOW: What if skip
'offset' is larger than perceived maximum size of the file? I didn't dig
line to line, but just a general thought along the lines of the either
not so bright or even worse malicious code that now has a means to fill
up a disk with a mostly empty file.
In Patch 23, your commit message indicates enabling skips for specific
backends - perhaps batter to note "local" vs "remote" type
operations?
For iSCSI, it's possible to configure a disk/lun such that the target
file is found in the /dev/disk/by-path (mode="host").
John