[PATCH v3 0/8] Allow sparse streams for block devices

Please note that the first 7 patches are merged already and I'm sending them only for completeness. The last one (news update) is not merged yet though. Michal Prívozník (8): libvirt-storage: Document volume upload/download stream format virsh: Pass virshStreamCallbackDataPtr to virshStreamSink() and virshStreamSkip() virsh: Track if vol-upload or vol-download work over a block device virshStreamSkip: Emulate skip for block devices virfdstream: Allow sparse stream vol-download virshStreamInData: Handle block devices virfdstream: Emulate skip for block devices news: Document sparse streams for block devices NEWS.rst | 7 ++++ src/libvirt-storage.c | 10 ++++-- src/util/virfdstream.c | 81 ++++++++++++++++++++++++++++++++---------- tools/virsh-util.c | 57 +++++++++++++++++++++++------ tools/virsh-util.h | 1 + tools/virsh-volume.c | 20 ++++++++++- 6 files changed, 144 insertions(+), 32 deletions(-) -- 2.26.2

For libvirt, the volume is just a binary blob and it doesn't interpret data on volume upload/download. But as it turns out, this unspoken assumption is not clear to our users. Document it explicitly. Suggested in: https://bugzilla.redhat.com/show_bug.cgi?id=1851023#c17 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> --- src/libvirt-storage.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/src/libvirt-storage.c b/src/libvirt-storage.c index a45c8b98c1..2a7cdca234 100644 --- a/src/libvirt-storage.c +++ b/src/libvirt-storage.c @@ -1590,7 +1590,10 @@ virStorageVolCreateXMLFrom(virStoragePoolPtr pool, * * Download the content of the volume as a stream. If @length * is zero, then the remaining contents of the volume after - * @offset will be downloaded. + * @offset will be downloaded. Please note that the stream + * transports the volume itself as is, so the downloaded data may + * not correspond to guest OS visible state in cases when a + * complex storage format such as qcow2 or vmdk is used. * * If VIR_STORAGE_VOL_DOWNLOAD_SPARSE_STREAM is set in @flags * effective transmission of holes is enabled. This assumes using @@ -1663,7 +1666,10 @@ virStorageVolDownload(virStorageVolPtr vol, * will fail if @offset + @length exceeds the size of the * volume. Otherwise, if @length is non-zero, an error * will be raised if an attempt is made to upload greater - * than @length bytes of data. + * than @length bytes of data. Please note that the stream + * transports the volume itself as is, so the downloaded data may + * not correspond to guest OS visible state in cases when a + * complex storage format such as qcow2 or vmdk is used. * * If VIR_STORAGE_VOL_UPLOAD_SPARSE_STREAM is set in @flags * effective transmission of holes is enabled. This assumes using -- 2.26.2

These callback will need to know more that the FD they are working on. Pass the structure that is passed to other stream callbacks (e.g. virshStreamSource() or virshStreamSourceSkip()) instead of inventing a new one. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> --- tools/virsh-util.c | 10 +++++----- tools/virsh-volume.c | 6 +++++- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/tools/virsh-util.c b/tools/virsh-util.c index 932d6d0849..89f15efd08 100644 --- a/tools/virsh-util.c +++ b/tools/virsh-util.c @@ -146,9 +146,9 @@ virshStreamSink(virStreamPtr st G_GNUC_UNUSED, size_t nbytes, void *opaque) { - int *fd = opaque; + virshStreamCallbackDataPtr cbData = opaque; - return safewrite(*fd, bytes, nbytes); + return safewrite(cbData->fd, bytes, nbytes); } @@ -186,13 +186,13 @@ virshStreamSkip(virStreamPtr st G_GNUC_UNUSED, long long offset, void *opaque) { - int *fd = opaque; + virshStreamCallbackDataPtr cbData = opaque; off_t cur; - if ((cur = lseek(*fd, offset, SEEK_CUR)) == (off_t) -1) + if ((cur = lseek(cbData->fd, offset, SEEK_CUR)) == (off_t) -1) return -1; - if (ftruncate(*fd, cur) < 0) + if (ftruncate(cbData->fd, cur) < 0) return -1; return 0; diff --git a/tools/virsh-volume.c b/tools/virsh-volume.c index e04e2db096..d29db6c38d 100644 --- a/tools/virsh-volume.c +++ b/tools/virsh-volume.c @@ -790,6 +790,7 @@ cmdVolDownload(vshControl *ctl, const vshCmd *cmd) unsigned long long offset = 0, length = 0; bool created = false; virshControlPtr priv = ctl->privData; + virshStreamCallbackData cbData; unsigned int flags = 0; if (vshCommandOptULongLong(ctl, cmd, "offset", &offset) < 0) @@ -817,6 +818,9 @@ cmdVolDownload(vshControl *ctl, const vshCmd *cmd) created = true; } + cbData.ctl = ctl; + cbData.fd = fd; + if (!(st = virStreamNew(priv->conn, 0))) { vshError(ctl, _("cannot create a new stream")); goto cleanup; @@ -827,7 +831,7 @@ cmdVolDownload(vshControl *ctl, const vshCmd *cmd) goto cleanup; } - if (virStreamSparseRecvAll(st, virshStreamSink, virshStreamSkip, &fd) < 0) { + if (virStreamSparseRecvAll(st, virshStreamSink, virshStreamSkip, &cbData) < 0) { vshError(ctl, _("cannot receive data from volume %s"), name); goto cleanup; } -- 2.26.2

We can't use virFileInData() with block devices, but we can emulate being in data section all the time (vol-upload case). Alternatively, we can't just lseek() beyond EOF with block devices to create a hole, we will have to write zeroes (vol-download case). But to decide we need to know if the FD we are reading data from / writing data to is a block device. Store this information in _virshStreamCallbackData. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> --- tools/virsh-util.h | 1 + tools/virsh-volume.c | 14 ++++++++++++++ 2 files changed, 15 insertions(+) diff --git a/tools/virsh-util.h b/tools/virsh-util.h index 72653d9735..9ef28cfe0a 100644 --- a/tools/virsh-util.h +++ b/tools/virsh-util.h @@ -72,6 +72,7 @@ typedef virshStreamCallbackData *virshStreamCallbackDataPtr; struct _virshStreamCallbackData { vshControl *ctl; int fd; + bool isBlock; }; int diff --git a/tools/virsh-volume.c b/tools/virsh-volume.c index d29db6c38d..374bde4318 100644 --- a/tools/virsh-volume.c +++ b/tools/virsh-volume.c @@ -676,6 +676,7 @@ cmdVolUpload(vshControl *ctl, const vshCmd *cmd) virshControlPtr priv = ctl->privData; unsigned int flags = 0; virshStreamCallbackData cbData; + struct stat sb; if (vshCommandOptULongLong(ctl, cmd, "offset", &offset) < 0) return false; @@ -694,8 +695,14 @@ cmdVolUpload(vshControl *ctl, const vshCmd *cmd) goto cleanup; } + if (fstat(fd, &sb) < 0) { + vshError(ctl, _("unable to stat %s"), file); + goto cleanup; + } + cbData.ctl = ctl; cbData.fd = fd; + cbData.isBlock = !!S_ISBLK(sb.st_mode); if (vshCommandOptBool(cmd, "sparse")) flags |= VIR_STORAGE_VOL_UPLOAD_SPARSE_STREAM; @@ -792,6 +799,7 @@ cmdVolDownload(vshControl *ctl, const vshCmd *cmd) virshControlPtr priv = ctl->privData; virshStreamCallbackData cbData; unsigned int flags = 0; + struct stat sb; if (vshCommandOptULongLong(ctl, cmd, "offset", &offset) < 0) return false; @@ -818,8 +826,14 @@ cmdVolDownload(vshControl *ctl, const vshCmd *cmd) created = true; } + if (fstat(fd, &sb) < 0) { + vshError(ctl, _("unable to stat %s"), file); + goto cleanup; + } + cbData.ctl = ctl; cbData.fd = fd; + cbData.isBlock = !!S_ISBLK(sb.st_mode); if (!(st = virStreamNew(priv->conn, 0))) { vshError(ctl, _("cannot create a new stream")); -- 2.26.2

This callback is called when the server sends us STREAM_HOLE meaning there is no real data, only zeroes. For regular files we would just seek() beyond EOF and ftruncate() to create the hole. But for block devices this won't work. Not only we can't seek() beyond EOF, and ftruncate() will fail, this approach won't fill the device with zeroes. We have to do it manually. Partially resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1852528 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> --- tools/virsh-util.c | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/tools/virsh-util.c b/tools/virsh-util.c index 89f15efd08..884261eb49 100644 --- a/tools/virsh-util.c +++ b/tools/virsh-util.c @@ -189,11 +189,33 @@ virshStreamSkip(virStreamPtr st G_GNUC_UNUSED, virshStreamCallbackDataPtr cbData = opaque; off_t cur; - if ((cur = lseek(cbData->fd, offset, SEEK_CUR)) == (off_t) -1) - return -1; + if (cbData->isBlock) { + g_autofree char * buf = NULL; + const size_t buflen = 1 * 1024 * 1024; /* 1MiB */ - if (ftruncate(cbData->fd, cur) < 0) - return -1; + /* While for files it's enough to lseek() and ftruncate() to create + * a hole which would emulate zeroes on read(), for block devices + * we have to write zeroes to read() zeroes. And we have to write + * @got bytes of zeroes. Do that in smaller chunks though.*/ + + buf = g_new0(char, buflen); + + while (offset) { + size_t count = MIN(offset, buflen); + ssize_t r; + + if ((r = safewrite(cbData->fd, buf, count)) < 0) + return -1; + + offset -= r; + } + } else { + if ((cur = lseek(cbData->fd, offset, SEEK_CUR)) == (off_t) -1) + return -1; + + if (ftruncate(cbData->fd, cur) < 0) + return -1; + } return 0; } -- 2.26.2

When handling sparse stream, a thread is executed. This thread runs a read() or write() loop (depending what API is called; in this case it's virStorageVolDownload() and this the thread run read() loop). The read() is handled in virFDStreamThreadDoRead() which is then data/hole section aware, meaning it uses virFileInData() to detect data and hole sections and sends TYPE_DATA or TYPE_HOLE virStream messages accordingly. However, virFileInData() does not work with block devices. Simply because block devices don't have data and hole sections. What we can do though, is to mimic being always in a DATA section. Partially resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1852528 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> --- src/util/virfdstream.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/src/util/virfdstream.c b/src/util/virfdstream.c index 44399b2dc0..39514ef555 100644 --- a/src/util/virfdstream.c +++ b/src/util/virfdstream.c @@ -398,6 +398,7 @@ struct _virFDStreamThreadData { size_t length; bool doRead; bool sparse; + bool isBlock; int fdin; char *fdinname; int fdout; @@ -421,6 +422,7 @@ virFDStreamThreadDataFree(virFDStreamThreadDataPtr data) static ssize_t virFDStreamThreadDoRead(virFDStreamDataPtr fdst, bool sparse, + bool isBlock, const int fdin, const int fdout, const char *fdinname, @@ -437,8 +439,20 @@ virFDStreamThreadDoRead(virFDStreamDataPtr fdst, ssize_t got; if (sparse && *dataLen == 0) { - if (virFileInData(fdin, &inData, §ionLen) < 0) - return -1; + if (isBlock) { + /* Block devices are always in data section by definition. The + * @sectionLen is slightly more tricky. While we could try and get + * how much bytes is there left until EOF, we can pretend there is + * always X bytes left and let the saferead() below hit EOF (which + * is then handled gracefully anyway). Worst case scenario, this + * branch is called more than once. + * X was chosen to be 1MiB but it has ho special meaning. */ + inData = 1; + sectionLen = 1 * 1024 * 1024; + } else { + if (virFileInData(fdin, &inData, §ionLen) < 0) + return -1; + } if (length && sectionLen > length - total) @@ -568,6 +582,7 @@ virFDStreamThread(void *opaque) virStreamPtr st = data->st; size_t length = data->length; bool sparse = data->sparse; + bool isBlock = data->isBlock; VIR_AUTOCLOSE fdin = data->fdin; char *fdinname = data->fdinname; VIR_AUTOCLOSE fdout = data->fdout; @@ -604,7 +619,7 @@ virFDStreamThread(void *opaque) } if (doRead) - got = virFDStreamThreadDoRead(fdst, sparse, + got = virFDStreamThreadDoRead(fdst, sparse, isBlock, fdin, fdout, fdinname, fdoutname, length, total, @@ -1271,6 +1286,7 @@ virFDStreamOpenFileInternal(virStreamPtr st, threadData->st = virObjectRef(st); threadData->length = length; threadData->sparse = sparse; + threadData->isBlock = !!S_ISBLK(sb.st_mode); if ((oflags & O_ACCMODE) == O_RDONLY) { threadData->fdin = fd; -- 2.26.2

This is very similar to previous commit. The virshStreamInData() callback is used by virStreamSparseSendAll() to detect whether the file the data is read from is in data or hole section. The SendAll() will then send corresponding type of virStream message to make server create a hole or write actual data. But the callback uses virFileInData() even for block devices, which results in an error. Just like in previous commit, emulate a DATA section for block devices. Partially resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1852528 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> --- tools/virsh-util.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/tools/virsh-util.c b/tools/virsh-util.c index 884261eb49..d78a196cc6 100644 --- a/tools/virsh-util.c +++ b/tools/virsh-util.c @@ -230,12 +230,25 @@ virshStreamInData(virStreamPtr st G_GNUC_UNUSED, virshStreamCallbackDataPtr cbData = opaque; vshControl *ctl = cbData->ctl; int fd = cbData->fd; - int ret; - if ((ret = virFileInData(fd, inData, offset)) < 0) - vshError(ctl, "%s", _("Unable to get current position in stream")); + if (cbData->isBlock) { + /* Block devices are always in data section by definition. The + * @sectionLen is slightly more tricky. While we could try and get + * how much bytes is there left until EOF, we can pretend there is + * always X bytes left and let the saferead() below hit EOF (which + * is then handled gracefully anyway). Worst case scenario, this + * branch is called more than once. + * X was chosen to be 1MiB but it has ho special meaning. */ + *inData = 1; + *offset = 1 * 1024 * 1024; + } else { + if (virFileInData(fd, inData, offset) < 0) { + vshError(ctl, "%s", _("Unable to get current position in stream")); + return -1; + } + } - return ret; + return 0; } -- 2.26.2

This is similar to one of previous patches. When receiving stream (on virStorageVolUpload() and subsequent virStreamSparseSendAll()) we may receive a hole. If the volume we are saving the incoming data into is a regular file we just lseek() and ftruncate() to create the hole. But this won't work if the file is a block device. If that is the case we must write zeroes so that any subsequent reader reads nothing just zeroes (just like they would from a hole in a regular file). Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1852528 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> --- src/util/virfdstream.c | 59 +++++++++++++++++++++++++++++++----------- 1 file changed, 44 insertions(+), 15 deletions(-) diff --git a/src/util/virfdstream.c b/src/util/virfdstream.c index 39514ef555..1a7b671179 100644 --- a/src/util/virfdstream.c +++ b/src/util/virfdstream.c @@ -512,6 +512,7 @@ virFDStreamThreadDoRead(virFDStreamDataPtr fdst, static ssize_t virFDStreamThreadDoWrite(virFDStreamDataPtr fdst, bool sparse, + bool isBlock, const int fdin, const int fdout, const char *fdinname, @@ -519,7 +520,6 @@ virFDStreamThreadDoWrite(virFDStreamDataPtr fdst, { ssize_t got = 0; virFDStreamMsgPtr msg = fdst->msg; - off_t off; bool pop = false; switch (msg->type) { @@ -547,19 +547,48 @@ virFDStreamThreadDoWrite(virFDStreamDataPtr fdst, } got = msg->stream.hole.len; - off = lseek(fdout, got, SEEK_CUR); - if (off == (off_t) -1) { - virReportSystemError(errno, - _("unable to seek in %s"), - fdoutname); - return -1; - } - - if (ftruncate(fdout, off) < 0) { - virReportSystemError(errno, - _("unable to truncate %s"), - fdoutname); - return -1; + if (isBlock) { + g_autofree char * buf = NULL; + const size_t buflen = 1 * 1024 * 1024; /* 1MiB */ + size_t toWrite = got; + + /* While for files it's enough to lseek() and ftruncate() to create + * a hole which would emulate zeroes on read(), for block devices + * we have to write zeroes to read() zeroes. And we have to write + * @got bytes of zeroes. Do that in smaller chunks though.*/ + + buf = g_new0(char, buflen); + + while (toWrite) { + size_t count = MIN(toWrite, buflen); + ssize_t r; + + if ((r = safewrite(fdout, buf, count)) < 0) { + virReportSystemError(errno, + _("Unable to write %s"), + fdoutname); + return -1; + } + + toWrite -= r; + } + } else { + off_t off; + + off = lseek(fdout, got, SEEK_CUR); + if (off == (off_t) -1) { + virReportSystemError(errno, + _("unable to seek in %s"), + fdoutname); + return -1; + } + + if (ftruncate(fdout, off) < 0) { + virReportSystemError(errno, + _("unable to truncate %s"), + fdoutname); + return -1; + } } pop = true; @@ -625,7 +654,7 @@ virFDStreamThread(void *opaque) length, total, &dataLen, buflen); else - got = virFDStreamThreadDoWrite(fdst, sparse, + got = virFDStreamThreadDoWrite(fdst, sparse, isBlock, fdin, fdout, fdinname, fdoutname); -- 2.26.2

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- NEWS.rst | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/NEWS.rst b/NEWS.rst index 0669051ee6..3cd09decf6 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -15,6 +15,13 @@ v6.7.0 (unreleased) * **Improvements** + * Allow sparse streams for block devices + + Sparse streams (e.g. ``virsh vol-download --sparse`` or ``virsh vol-upload + --sparse``) now handle if one of the stream ends is a block device. A zero + block detection is performed so that zero block are not transferred and + thus bandwidth is saved. + * **Bug fixes** * virdevmapper: Deal with kernels without DM support -- 2.26.2

On Mon, Aug 24, 2020 at 13:50:21 +0200, Michal Privoznik wrote:
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- NEWS.rst | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/NEWS.rst b/NEWS.rst index 0669051ee6..3cd09decf6 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -15,6 +15,13 @@ v6.7.0 (unreleased)
* **Improvements**
+ * Allow sparse streams for block devices + + Sparse streams (e.g. ``virsh vol-download --sparse`` or ``virsh vol-upload + --sparse``) now handle if one of the stream ends is a block device. A zero + block detection is performed so that zero block are not transferred and + thus bandwidth is saved.
IIRC this version doesn't do zero block detection.
+
Reviewed-by: Peter Krempa <pkrempa@redhat.com> once you make the entry accurate.
participants (2)
-
Michal Privoznik
-
Peter Krempa