[PATCH v11 00/13] Apply COR-filter to the block-stream permanently

The iotest case test_stream_parallel still does not pass after the COR-filter is inserted into the backing chain. As the test case may not be initialized, it does not make a sense and was removed again. v11: 04: Base node overlay is used instead of base. 05: Base node overlay is used instead of base. 06: New. 07: New. 08: New. 09: The new BDS-member 'supported_read_flags' is applied. 10: The 'base_metadata' variable renamed to 'base_unfiltered'. 11: New. 12: The backing-file argument is left in the QMP interface. Warning added. 13: The BDRV_REQ_COPY_ON_READ removed from the stream_populate(); The 'implicit' initialization moved back to COR-filter driver. Base node overlay is used instead of base. The v8 Message-Id: <1601383109-110988-1-git-send-email-andrey.shinkevich@virtuozzo.com> Andrey Shinkevich (13): copy-on-read: Support preadv/pwritev_part functions copy-on-read: add filter append/drop functions qapi: add filter-node-name to block-stream copy-on-read: pass overlay base node name to COR driver copy-on-read: limit COR operations to base in COR driver block: modify the comment for BDRV_REQ_PREFETCH flag block: include supported_read_flags into BDS structure copy-on-read: add support for BDRV_REQ_PREFETCH to COR-filter copy-on-read: skip non-guest reads if no copy needed stream: skip filters when writing backing file name to QCOW2 header stream: mark backing-file argument as deprecated stream: remove unused backing-file name parameter block: apply COR-filter to block-stream jobs block/copy-on-read.c | 171 ++++++++++++++++++++++++++++++++++++++--- block/copy-on-read.h | 35 +++++++++ block/io.c | 3 +- block/monitor/block-hmp-cmds.c | 4 +- block/stream.c | 112 ++++++++++++++++----------- blockdev.c | 25 +++--- docs/system/deprecated.rst | 6 ++ include/block/block.h | 7 +- include/block/block_int.h | 13 +++- qapi/block-core.json | 6 ++ tests/qemu-iotests/030 | 51 ++---------- tests/qemu-iotests/030.out | 4 +- tests/qemu-iotests/141.out | 2 +- tests/qemu-iotests/245 | 19 +++-- 14 files changed, 324 insertions(+), 134 deletions(-) create mode 100644 block/copy-on-read.h -- 1.8.3.1

Add support for the recently introduced functions bdrv_co_preadv_part() and bdrv_co_pwritev_part() to the COR-filter driver. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> --- block/copy-on-read.c | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/block/copy-on-read.c b/block/copy-on-read.c index 2816e61..cb03e0f 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -74,21 +74,25 @@ static int64_t cor_getlength(BlockDriverState *bs) } -static int coroutine_fn cor_co_preadv(BlockDriverState *bs, - uint64_t offset, uint64_t bytes, - QEMUIOVector *qiov, int flags) +static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, + uint64_t offset, uint64_t bytes, + QEMUIOVector *qiov, + size_t qiov_offset, + int flags) { - return bdrv_co_preadv(bs->file, offset, bytes, qiov, - flags | BDRV_REQ_COPY_ON_READ); + return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset, + flags | BDRV_REQ_COPY_ON_READ); } -static int coroutine_fn cor_co_pwritev(BlockDriverState *bs, - uint64_t offset, uint64_t bytes, - QEMUIOVector *qiov, int flags) +static int coroutine_fn cor_co_pwritev_part(BlockDriverState *bs, + uint64_t offset, + uint64_t bytes, + QEMUIOVector *qiov, + size_t qiov_offset, int flags) { - - return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags); + return bdrv_co_pwritev_part(bs->file, offset, bytes, qiov, qiov_offset, + flags); } @@ -137,8 +141,8 @@ static BlockDriver bdrv_copy_on_read = { .bdrv_getlength = cor_getlength, - .bdrv_co_preadv = cor_co_preadv, - .bdrv_co_pwritev = cor_co_pwritev, + .bdrv_co_preadv_part = cor_co_preadv_part, + .bdrv_co_pwritev_part = cor_co_pwritev_part, .bdrv_co_pwrite_zeroes = cor_co_pwrite_zeroes, .bdrv_co_pdiscard = cor_co_pdiscard, .bdrv_co_pwritev_compressed = cor_co_pwritev_compressed, -- 1.8.3.1

Provide API for the COR-filter insertion/removal. Also, drop the filter child permissions for an inactive state when the filter node is being removed. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> --- block/copy-on-read.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++ block/copy-on-read.h | 35 +++++++++++++++++++++ 2 files changed, 123 insertions(+) create mode 100644 block/copy-on-read.h diff --git a/block/copy-on-read.c b/block/copy-on-read.c index cb03e0f..bcccf0f 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -23,11 +23,21 @@ #include "qemu/osdep.h" #include "block/block_int.h" #include "qemu/module.h" +#include "qapi/error.h" +#include "qapi/qmp/qdict.h" +#include "block/copy-on-read.h" + + +typedef struct BDRVStateCOR { + bool active; +} BDRVStateCOR; static int cor_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { + BDRVStateCOR *state = bs->opaque; + bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds, BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, false, errp); @@ -42,6 +52,13 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags, ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) & bs->file->bs->supported_zero_flags); + state->active = true; + + /* + * We don't need to call bdrv_child_refresh_perms() now as the permissions + * will be updated later when the filter node gets its parent. + */ + return 0; } @@ -57,6 +74,17 @@ static void cor_child_perm(BlockDriverState *bs, BdrvChild *c, uint64_t perm, uint64_t shared, uint64_t *nperm, uint64_t *nshared) { + BDRVStateCOR *s = bs->opaque; + + if (!s->active) { + /* + * While the filter is being removed + */ + *nperm = 0; + *nshared = BLK_PERM_ALL; + return; + } + *nperm = perm & PERM_PASSTHROUGH; *nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED; @@ -135,6 +163,7 @@ static void cor_lock_medium(BlockDriverState *bs, bool locked) static BlockDriver bdrv_copy_on_read = { .format_name = "copy-on-read", + .instance_size = sizeof(BDRVStateCOR), .bdrv_open = cor_open, .bdrv_child_perm = cor_child_perm, @@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void) bdrv_register(&bdrv_copy_on_read); } + +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs, + QDict *node_options, + int flags, Error **errp) +{ + BlockDriverState *cor_filter_bs; + Error *local_err = NULL; + + cor_filter_bs = bdrv_open(NULL, NULL, node_options, flags, errp); + if (cor_filter_bs == NULL) { + error_prepend(errp, "Could not create COR-filter node: "); + return NULL; + } + + if (!qdict_get_try_str(node_options, "node-name")) { + cor_filter_bs->implicit = true; + } + + bdrv_drained_begin(bs); + bdrv_replace_node(bs, cor_filter_bs, &local_err); + bdrv_drained_end(bs); + + if (local_err) { + bdrv_unref(cor_filter_bs); + error_propagate(errp, local_err); + return NULL; + } + + return cor_filter_bs; +} + + +void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs) +{ + BdrvChild *child; + BlockDriverState *bs; + BDRVStateCOR *s = cor_filter_bs->opaque; + + child = bdrv_filter_child(cor_filter_bs); + if (!child) { + return; + } + bs = child->bs; + + /* Retain the BDS until we complete the graph change. */ + bdrv_ref(bs); + /* Hold a guest back from writing while permissions are being reset. */ + bdrv_drained_begin(bs); + /* Drop permissions before the graph change. */ + s->active = false; + bdrv_child_refresh_perms(cor_filter_bs, child, &error_abort); + bdrv_replace_node(cor_filter_bs, bs, &error_abort); + + bdrv_drained_end(bs); + bdrv_unref(bs); + bdrv_unref(cor_filter_bs); +} + + block_init(bdrv_copy_on_read_init); diff --git a/block/copy-on-read.h b/block/copy-on-read.h new file mode 100644 index 0000000..d6f2422 --- /dev/null +++ b/block/copy-on-read.h @@ -0,0 +1,35 @@ +/* + * Copy-on-read filter block driver + * + * The filter driver performs Copy-On-Read (COR) operations + * + * Copyright (c) 2018-2020 Virtuozzo International GmbH. + * + * Author: + * Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef BLOCK_COPY_ON_READ +#define BLOCK_COPY_ON_READ + +#include "block/block_int.h" + +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs, + QDict *node_options, + int flags, Error **errp); +void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs); + +#endif /* BLOCK_COPY_ON_READ */ -- 1.8.3.1

On 12.10.20 19:43, Andrey Shinkevich wrote:
Provide API for the COR-filter insertion/removal. Also, drop the filter child permissions for an inactive state when the filter node is being removed.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> --- block/copy-on-read.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++ block/copy-on-read.h | 35 +++++++++++++++++++++ 2 files changed, 123 insertions(+) create mode 100644 block/copy-on-read.h
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index cb03e0f..bcccf0f 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c
[...]
@@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void) bdrv_register(&bdrv_copy_on_read); }
+ +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs, + QDict *node_options, + int flags, Error **errp)
I had hoped you could make this a generic block layer function. :( (Because it really is rather generic) *shrug* Reviewed-by: Max Reitz <mreitz@redhat.com>
+{ + BlockDriverState *cor_filter_bs; + Error *local_err = NULL; + + cor_filter_bs = bdrv_open(NULL, NULL, node_options, flags, errp); + if (cor_filter_bs == NULL) { + error_prepend(errp, "Could not create COR-filter node: "); + return NULL; + } + + if (!qdict_get_try_str(node_options, "node-name")) { + cor_filter_bs->implicit = true; + } + + bdrv_drained_begin(bs); + bdrv_replace_node(bs, cor_filter_bs, &local_err); + bdrv_drained_end(bs); + + if (local_err) { + bdrv_unref(cor_filter_bs); + error_propagate(errp, local_err); + return NULL; + } + + return cor_filter_bs; +}

On 14.10.2020 13:44, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
Provide API for the COR-filter insertion/removal. Also, drop the filter child permissions for an inactive state when the filter node is being removed.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> --- block/copy-on-read.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++ block/copy-on-read.h | 35 +++++++++++++++++++++ 2 files changed, 123 insertions(+) create mode 100644 block/copy-on-read.h
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index cb03e0f..bcccf0f 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c
[...]
@@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void) bdrv_register(&bdrv_copy_on_read); }
+ +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs, + QDict *node_options, + int flags, Error **errp)
I had hoped you could make this a generic block layer function. :(
(Because it really is rather generic)
*shrug*
Actually, I did (and still can do) that for the 'append node' function only but not for the 'drop node' one so far... diff --git a/block.c b/block.c index 11ab55f..f41e876 100644 --- a/block.c +++ b/block.c @@ -4669,6 +4669,55 @@ static void bdrv_delete(BlockDriverState *bs) g_free(bs); } +BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *node_options, + int flags, Error **errp) +{ + BlockDriverState *new_node_bs; + Error *local_err = NULL; + + new_node_bs = bdrv_open(NULL, NULL, node_options, flags, errp); + if (new_node_bs == NULL) { + error_prepend(errp, "Could not create node: "); + return NULL; + } + + bdrv_drained_begin(bs); + bdrv_replace_node(bs, new_node_bs, &local_err); + bdrv_drained_end(bs); + + if (local_err) { + bdrv_unref(new_node_bs); + error_propagate(errp, local_err); + return NULL; + } + + return new_node_bs; +} + +void bdrv_remove_node(BlockDriverState *bs) +{ + BdrvChild *child; + BlockDriverState *inferior_bs; + + child = bdrv_filter_or_cow_child(bs); + if (!child) { + return; + } + inferior_bs = child->bs; + + /* Retain the BDS until we complete the graph change. */ + bdrv_ref(inferior_bs); + /* Hold a guest back from writing while permissions are being reset. */ + bdrv_drained_begin(inferior_bs); + /* Refresh permissions before the graph change. */ + bdrv_child_refresh_perms(bs, child, &error_abort); + bdrv_replace_node(bs, inferior_bs, &error_abort); + + bdrv_drained_end(inferior_bs); + bdrv_unref(inferior_bs); + bdrv_unref(bs); +} So, it is an intermediate solution in this patch of the series. I am going to make both functions generic once Vladimir overhauls the QEMU permission update system. Otherwise, the COR-filter node cannot be removed from the backing chain gracefully. Thank you for your r-b. If the next version comes, I can move the 'append node' function only to the generic layer. Andrey
Reviewed-by: Max Reitz <mreitz@redhat.com>
+{ + BlockDriverState *cor_filter_bs; + Error *local_err = NULL; + + cor_filter_bs = bdrv_open(NULL, NULL, node_options, flags, errp); + if (cor_filter_bs == NULL) { + error_prepend(errp, "Could not create COR-filter node: "); + return NULL; + } + + if (!qdict_get_try_str(node_options, "node-name")) { + cor_filter_bs->implicit = true; + } + + bdrv_drained_begin(bs); + bdrv_replace_node(bs, cor_filter_bs, &local_err); + bdrv_drained_end(bs); + + if (local_err) { + bdrv_unref(cor_filter_bs); + error_propagate(errp, local_err); + return NULL; + } + + return cor_filter_bs; +}

On 14.10.20 16:28, Andrey Shinkevich wrote:
On 14.10.2020 13:44, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
Provide API for the COR-filter insertion/removal. Also, drop the filter child permissions for an inactive state when the filter node is being removed.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> --- block/copy-on-read.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++ block/copy-on-read.h | 35 +++++++++++++++++++++ 2 files changed, 123 insertions(+) create mode 100644 block/copy-on-read.h
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index cb03e0f..bcccf0f 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c
[...]
@@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void) bdrv_register(&bdrv_copy_on_read); } + +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs, + QDict *node_options, + int flags, Error **errp)
I had hoped you could make this a generic block layer function. :(
(Because it really is rather generic)
*shrug*
Actually, I did (and still can do) that for the 'append node' function only but not for the 'drop node' one so far...
Ah, yeah.
diff --git a/block.c b/block.c index 11ab55f..f41e876 100644 --- a/block.c +++ b/block.c @@ -4669,6 +4669,55 @@ static void bdrv_delete(BlockDriverState *bs) g_free(bs); }
+BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *node_options, + int flags, Error **errp) +{ + BlockDriverState *new_node_bs; + Error *local_err = NULL; + + new_node_bs = bdrv_open(NULL, NULL, node_options, flags, errp); + if (new_node_bs == NULL) { + error_prepend(errp, "Could not create node: "); + return NULL; + } + + bdrv_drained_begin(bs); + bdrv_replace_node(bs, new_node_bs, &local_err); + bdrv_drained_end(bs); + + if (local_err) { + bdrv_unref(new_node_bs); + error_propagate(errp, local_err); + return NULL; + } + + return new_node_bs; +} + +void bdrv_remove_node(BlockDriverState *bs) +{ + BdrvChild *child; + BlockDriverState *inferior_bs; + + child = bdrv_filter_or_cow_child(bs); + if (!child) { + return; + } + inferior_bs = child->bs; + + /* Retain the BDS until we complete the graph change. */ + bdrv_ref(inferior_bs); + /* Hold a guest back from writing while permissions are being reset. */ + bdrv_drained_begin(inferior_bs); + /* Refresh permissions before the graph change. */ + bdrv_child_refresh_perms(bs, child, &error_abort); + bdrv_replace_node(bs, inferior_bs, &error_abort); + + bdrv_drained_end(inferior_bs); + bdrv_unref(inferior_bs); + bdrv_unref(bs); +}
So, it is an intermediate solution in this patch of the series. I am going to make both functions generic once Vladimir overhauls the QEMU permission update system. Otherwise, the COR-filter node cannot be removed from the backing chain gracefully.
True.
Thank you for your r-b. If the next version comes, I can move the 'append node' function only to the generic layer.
Thanks! Even if you decide against it, I’m happy as long as there are plans to perhaps make it generic at some point. (I’m just afraid this function might be useful in some other case, too, and we forget that it exists here already, and then end up reimplementing it.) Max

Provide the possibility to pass the 'filter-node-name' parameter to the block-stream job as it is done for the commit block job. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> --- block/monitor/block-hmp-cmds.c | 4 ++-- block/stream.c | 4 +++- blockdev.c | 4 +++- include/block/block_int.h | 7 ++++++- qapi/block-core.json | 6 ++++++ 5 files changed, 20 insertions(+), 5 deletions(-) diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c index 4d3db5e..4e66775 100644 --- a/block/monitor/block-hmp-cmds.c +++ b/block/monitor/block-hmp-cmds.c @@ -507,8 +507,8 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict) qmp_block_stream(true, device, device, base != NULL, base, false, NULL, false, NULL, qdict_haskey(qdict, "speed"), speed, true, - BLOCKDEV_ON_ERROR_REPORT, false, false, false, false, - &error); + BLOCKDEV_ON_ERROR_REPORT, false, NULL, false, false, false, + false, &error); hmp_handle_error(mon, error); } diff --git a/block/stream.c b/block/stream.c index 8ce6729..e0540ee 100644 --- a/block/stream.c +++ b/block/stream.c @@ -221,7 +221,9 @@ static const BlockJobDriver stream_job_driver = { void stream_start(const char *job_id, BlockDriverState *bs, BlockDriverState *base, const char *backing_file_str, int creation_flags, int64_t speed, - BlockdevOnError on_error, Error **errp) + BlockdevOnError on_error, + const char *filter_node_name, + Error **errp) { StreamBlockJob *s; BlockDriverState *iter; diff --git a/blockdev.c b/blockdev.c index bebd3ba..d719c47 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2489,6 +2489,7 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, bool has_backing_file, const char *backing_file, bool has_speed, int64_t speed, bool has_on_error, BlockdevOnError on_error, + bool has_filter_node_name, const char *filter_node_name, bool has_auto_finalize, bool auto_finalize, bool has_auto_dismiss, bool auto_dismiss, Error **errp) @@ -2571,7 +2572,8 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, } stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name, - job_flags, has_speed ? speed : 0, on_error, &local_err); + job_flags, has_speed ? speed : 0, on_error, + filter_node_name, &local_err); if (local_err) { error_propagate(errp, local_err); goto out; diff --git a/include/block/block_int.h b/include/block/block_int.h index 38cad9d..f782737 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -1134,6 +1134,9 @@ int is_windows_drive(const char *filename); * See @BlockJobCreateFlags * @speed: The maximum speed, in bytes per second, or 0 for unlimited. * @on_error: The action to take upon error. + * @filter_node_name: The node name that should be assigned to the filter + * driver that the commit job inserts into the graph above @bs. NULL means + * that a node name should be autogenerated. * @errp: Error object. * * Start a streaming operation on @bs. Clusters that are unallocated @@ -1146,7 +1149,9 @@ int is_windows_drive(const char *filename); void stream_start(const char *job_id, BlockDriverState *bs, BlockDriverState *base, const char *backing_file_str, int creation_flags, int64_t speed, - BlockdevOnError on_error, Error **errp); + BlockdevOnError on_error, + const char *filter_node_name, + Error **errp); /** * commit_start: diff --git a/qapi/block-core.json b/qapi/block-core.json index 3c16f1e..32fb097 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -2533,6 +2533,11 @@ # 'stop' and 'enospc' can only be used if the block device # supports io-status (see BlockInfo). Since 1.3. # +# @filter-node-name: the node name that should be assigned to the +# filter driver that the stream job inserts into the graph +# above @device. If this option is not given, a node name is +# autogenerated. (Since: 5.2) +# # @auto-finalize: When false, this job will wait in a PENDING state after it has # finished its work, waiting for @block-job-finalize before # making any block graph changes. @@ -2563,6 +2568,7 @@ 'data': { '*job-id': 'str', 'device': 'str', '*base': 'str', '*base-node': 'str', '*backing-file': 'str', '*speed': 'int', '*on-error': 'BlockdevOnError', + '*filter-node-name': 'str', '*auto-finalize': 'bool', '*auto-dismiss': 'bool' } } ## -- 1.8.3.1

We are going to use the COR-filter for a block-stream job. To limit COR operations by the base node in the backing chain during stream job, pass the name of overlay base node to the copy-on-read driver as base node itself may change due to possible concurrent jobs. The rest of the functionality will be implemented in the patch that follows. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/block/copy-on-read.c b/block/copy-on-read.c index bcccf0f..c578b1b 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -24,19 +24,24 @@ #include "block/block_int.h" #include "qemu/module.h" #include "qapi/error.h" +#include "qapi/qmp/qerror.h" #include "qapi/qmp/qdict.h" #include "block/copy-on-read.h" typedef struct BDRVStateCOR { bool active; + BlockDriverState *base_overlay; } BDRVStateCOR; static int cor_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { + BlockDriverState *base_overlay = NULL; BDRVStateCOR *state = bs->opaque; + /* We need the base overlay node rather than the base itself */ + const char *base_overlay_node = qdict_get_try_str(options, "base"); bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds, BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, @@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags, ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) & bs->file->bs->supported_zero_flags); + if (base_overlay_node) { + qdict_del(options, "base"); + base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp); + if (!base_overlay) { + error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node); + return -EINVAL; + } + } state->active = true; + state->base_overlay = base_overlay; /* * We don't need to call bdrv_child_refresh_perms() now as the permissions -- 1.8.3.1

On 12.10.20 19:43, Andrey Shinkevich wrote:
We are going to use the COR-filter for a block-stream job. To limit COR operations by the base node in the backing chain during stream job, pass the name of overlay base node to the copy-on-read driver as base node itself may change due to possible concurrent jobs. The rest of the functionality will be implemented in the patch that follows.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
Is there a reason why you didn’t add this option to QAPI (as part of a yet-to-be-created BlockdevOptionsCor)? Because I’d really like it there.
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index bcccf0f..c578b1b 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -24,19 +24,24 @@ #include "block/block_int.h" #include "qemu/module.h" #include "qapi/error.h" +#include "qapi/qmp/qerror.h" #include "qapi/qmp/qdict.h" #include "block/copy-on-read.h"
typedef struct BDRVStateCOR { bool active; + BlockDriverState *base_overlay; } BDRVStateCOR;
static int cor_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { + BlockDriverState *base_overlay = NULL; BDRVStateCOR *state = bs->opaque; + /* We need the base overlay node rather than the base itself */ + const char *base_overlay_node = qdict_get_try_str(options, "base");
Shouldn’t it be called base-overlay or above-base then?
bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds, BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, @@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags, ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) & bs->file->bs->supported_zero_flags);
+ if (base_overlay_node) { + qdict_del(options, "base"); + base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp);
I think this is a use-after-free. The storage @base_overlay_node points to belongs to a QString, which is referenced only by @options; so deleting that element of @options should free that string. Max
+ if (!base_overlay) { + error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node); + return -EINVAL; + } + } state->active = true; + state->base_overlay = base_overlay;
/* * We don't need to call bdrv_child_refresh_perms() now as the permissions

On 14.10.20 13:09, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
We are going to use the COR-filter for a block-stream job. To limit COR operations by the base node in the backing chain during stream job, pass the name of overlay base node to the copy-on-read driver as base node itself may change due to possible concurrent jobs. The rest of the functionality will be implemented in the patch that follows.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
Is there a reason why you didn’t add this option to QAPI (as part of a yet-to-be-created BlockdevOptionsCor)? Because I’d really like it there.
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index bcccf0f..c578b1b 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -24,19 +24,24 @@ #include "block/block_int.h" #include "qemu/module.h" #include "qapi/error.h" +#include "qapi/qmp/qerror.h" #include "qapi/qmp/qdict.h" #include "block/copy-on-read.h"
typedef struct BDRVStateCOR { bool active; + BlockDriverState *base_overlay; } BDRVStateCOR;
static int cor_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { + BlockDriverState *base_overlay = NULL; BDRVStateCOR *state = bs->opaque; + /* We need the base overlay node rather than the base itself */ + const char *base_overlay_node = qdict_get_try_str(options, "base");
Shouldn’t it be called base-overlay or above-base then?
While reviewing patch 5, I realized this sould indeed be base-overlay (i.e. the next allocation-bearing layer above the base, not just a filter on top of base), but that should be noted somewhere, of course. Best would be the QAPI documentation for this option. O:) Max

14.10.2020 14:57, Max Reitz wrote:
On 14.10.20 13:09, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
We are going to use the COR-filter for a block-stream job. To limit COR operations by the base node in the backing chain during stream job, pass the name of overlay base node to the copy-on-read driver as base node itself may change due to possible concurrent jobs. The rest of the functionality will be implemented in the patch that follows.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
Is there a reason why you didn’t add this option to QAPI (as part of a yet-to-be-created BlockdevOptionsCor)? Because I’d really like it there.
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index bcccf0f..c578b1b 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -24,19 +24,24 @@ #include "block/block_int.h" #include "qemu/module.h" #include "qapi/error.h" +#include "qapi/qmp/qerror.h" #include "qapi/qmp/qdict.h" #include "block/copy-on-read.h"
typedef struct BDRVStateCOR { bool active; + BlockDriverState *base_overlay; } BDRVStateCOR;
static int cor_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { + BlockDriverState *base_overlay = NULL; BDRVStateCOR *state = bs->opaque; + /* We need the base overlay node rather than the base itself */ + const char *base_overlay_node = qdict_get_try_str(options, "base");
Shouldn’t it be called base-overlay or above-base then?
While reviewing patch 5, I realized this sould indeed be base-overlay (i.e. the next allocation-bearing layer above the base, not just a filter on top of base), but that should be noted somewhere, of course. Best would be the QAPI documentation for this option. O:)
What about naming it just "bottom" or "bottom-node"? If we don't have base, it's strange to have something "relative to base". And we can document, that "bottom" must be a non-filter node in a backing chain of "top". -- Best regards, Vladimir

On 14.10.20 16:56, Vladimir Sementsov-Ogievskiy wrote:
14.10.2020 14:57, Max Reitz wrote:
On 14.10.20 13:09, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
We are going to use the COR-filter for a block-stream job. To limit COR operations by the base node in the backing chain during stream job, pass the name of overlay base node to the copy-on-read driver as base node itself may change due to possible concurrent jobs. The rest of the functionality will be implemented in the patch that follows.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
Is there a reason why you didn’t add this option to QAPI (as part of a yet-to-be-created BlockdevOptionsCor)? Because I’d really like it there.
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index bcccf0f..c578b1b 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -24,19 +24,24 @@ #include "block/block_int.h" #include "qemu/module.h" #include "qapi/error.h" +#include "qapi/qmp/qerror.h" #include "qapi/qmp/qdict.h" #include "block/copy-on-read.h" typedef struct BDRVStateCOR { bool active; + BlockDriverState *base_overlay; } BDRVStateCOR; static int cor_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { + BlockDriverState *base_overlay = NULL; BDRVStateCOR *state = bs->opaque; + /* We need the base overlay node rather than the base itself */ + const char *base_overlay_node = qdict_get_try_str(options, "base");
Shouldn’t it be called base-overlay or above-base then?
While reviewing patch 5, I realized this sould indeed be base-overlay
(Just realized that sounds different from how I meant it. I meant that “above-base” would be wrong, so from the two, if any, it should be “base-overlay”.)
(i.e. the next allocation-bearing layer above the base, not just a filter on top of base), but that should be noted somewhere, of course. Best would be the QAPI documentation for this option. O:)
What about naming it just "bottom" or "bottom-node"? If we don't have base, it's strange to have something "relative to base". And we can document, that "bottom" must be a non-filter node in a backing chain of "top".
Works for me, too. Max

On 14.10.2020 14:09, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
We are going to use the COR-filter for a block-stream job. To limit COR operations by the base node in the backing chain during stream job, pass the name of overlay base node to the copy-on-read driver as base node itself may change due to possible concurrent jobs. The rest of the functionality will be implemented in the patch that follows.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
Is there a reason why you didn’t add this option to QAPI (as part of a yet-to-be-created BlockdevOptionsCor)? Because I’d really like it there.
I agree that passing a base overlay under the base option looks clumsy. We could pass the base node name and find its overlay ourselves here in cor_open(). In that case, we can use the existing QAPI. The reason I used the existing QAPI is to make it easier for a user to operate with the traditional options and to keep things simple. So, the user shouldn't think what overlay or above-base node to pass. If we introduce the specific BlockdevOptionsCor, what other options may come with?
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index bcccf0f..c578b1b 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -24,19 +24,24 @@ #include "block/block_int.h" #include "qemu/module.h" #include "qapi/error.h" +#include "qapi/qmp/qerror.h" #include "qapi/qmp/qdict.h" #include "block/copy-on-read.h"
typedef struct BDRVStateCOR { bool active; + BlockDriverState *base_overlay; } BDRVStateCOR;
static int cor_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { + BlockDriverState *base_overlay = NULL; BDRVStateCOR *state = bs->opaque; + /* We need the base overlay node rather than the base itself */ + const char *base_overlay_node = qdict_get_try_str(options, "base");
Shouldn’t it be called base-overlay or above-base then?
The base_overlay identifier is used below as the pointer to BS. The base_overlay_node stands for the name of the node. I used that identifier to differ between the types. And the above_base has another meaning per block/stream.c - it can be a temporary filter with a JSON-name.
bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds, BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, @@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags, ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) & bs->file->bs->supported_zero_flags);
+ if (base_overlay_node) { + qdict_del(options, "base"); + base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp);
I think this is a use-after-free. The storage @base_overlay_node points to belongs to a QString, which is referenced only by @options; so deleting that element of @options should free that string.
Max
I will swap those two function calls (bdrv_lookup_bs(); qdict_del();). Thank you. Andrey
+ if (!base_overlay) { + error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node); + return -EINVAL; + } + } state->active = true; + state->base_overlay = base_overlay;
/* * We don't need to call bdrv_child_refresh_perms() now as the permissions

14.10.2020 19:08, Andrey Shinkevich wrote:
On 14.10.2020 14:09, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
We are going to use the COR-filter for a block-stream job. To limit COR operations by the base node in the backing chain during stream job, pass the name of overlay base node to the copy-on-read driver as base node itself may change due to possible concurrent jobs. The rest of the functionality will be implemented in the patch that follows.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
Is there a reason why you didn’t add this option to QAPI (as part of a yet-to-be-created BlockdevOptionsCor)? Because I’d really like it there.
I agree that passing a base overlay under the base option looks clumsy. We could pass the base node name and find its overlay ourselves here in cor_open(). In that case, we can use the existing QAPI.
Actually, there is no existing QAPI: if you don't modify qapi/*.json, user is not able to pass the option through QAPI. It's still possible to pass the option through command-line, or when create the filter internally (like we are going to do in block-stream), but not through QAPI. So, it's better to make a new QAPI parameter, to make the new option available for QMP interface.
The reason I used the existing QAPI is to make it easier for a user to operate with the traditional options and to keep things simple. So, the user shouldn't think what overlay or above-base node to pass. If we introduce the specific BlockdevOptionsCor, what other options may come with?
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index bcccf0f..c578b1b 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -24,19 +24,24 @@ #include "block/block_int.h" #include "qemu/module.h" #include "qapi/error.h" +#include "qapi/qmp/qerror.h" #include "qapi/qmp/qdict.h" #include "block/copy-on-read.h" typedef struct BDRVStateCOR { bool active; + BlockDriverState *base_overlay; } BDRVStateCOR; static int cor_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { + BlockDriverState *base_overlay = NULL; BDRVStateCOR *state = bs->opaque; + /* We need the base overlay node rather than the base itself */ + const char *base_overlay_node = qdict_get_try_str(options, "base");
Shouldn’t it be called base-overlay or above-base then?
The base_overlay identifier is used below as the pointer to BS. The base_overlay_node stands for the name of the node. I used that identifier to differ between the types. And the above_base has another meaning per block/stream.c - it can be a temporary filter with a JSON-name.
bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds, BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, @@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags, ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) & bs->file->bs->supported_zero_flags); + if (base_overlay_node) { + qdict_del(options, "base"); + base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp);
I think this is a use-after-free. The storage @base_overlay_node points to belongs to a QString, which is referenced only by @options; so deleting that element of @options should free that string.
Max
I will swap those two function calls (bdrv_lookup_bs(); qdict_del();). Thank you.
Andrey
+ if (!base_overlay) { + error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node); + return -EINVAL; + } + } state->active = true; + state->base_overlay = base_overlay; /* * We don't need to call bdrv_child_refresh_perms() now as the permissions
-- Best regards, Vladimir

On 14.10.20 18:08, Andrey Shinkevich wrote:
On 14.10.2020 14:09, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
We are going to use the COR-filter for a block-stream job. To limit COR operations by the base node in the backing chain during stream job, pass the name of overlay base node to the copy-on-read driver as base node itself may change due to possible concurrent jobs. The rest of the functionality will be implemented in the patch that follows.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
Is there a reason why you didn’t add this option to QAPI (as part of a yet-to-be-created BlockdevOptionsCor)? Because I’d really like it there.
I agree that passing a base overlay under the base option looks clumsy. We could pass the base node name and find its overlay ourselves here in cor_open(). In that case, we can use the existing QAPI.
Sounds more complicated than to just split off BlockdevOptionsCor (from BlockdevOptionsGenericFormat, like e.g. BlockdevOptionsRaw does it).
The reason I used the existing QAPI is to make it easier for a user to operate with the traditional options and to keep things simple. So, the user shouldn't think what overlay or above-base node to pass.
Well, they don’t have to pass anything if they don’t need a bottom node. So unless they want to use a bottom/base-overlay node, it won’t become more complicated.
If we introduce the specific BlockdevOptionsCor, what other options may come with?
Nothing yet, I think.
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index bcccf0f..c578b1b 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -24,19 +24,24 @@ #include "block/block_int.h" #include "qemu/module.h" #include "qapi/error.h" +#include "qapi/qmp/qerror.h" #include "qapi/qmp/qdict.h" #include "block/copy-on-read.h" typedef struct BDRVStateCOR { bool active; + BlockDriverState *base_overlay; } BDRVStateCOR; static int cor_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { + BlockDriverState *base_overlay = NULL; BDRVStateCOR *state = bs->opaque; + /* We need the base overlay node rather than the base itself */ + const char *base_overlay_node = qdict_get_try_str(options, "base");
Shouldn’t it be called base-overlay or above-base then?
The base_overlay identifier is used below as the pointer to BS. The base_overlay_node stands for the name of the node. I used that identifier to differ between the types. And the above_base has another meaning per block/stream.c - it can be a temporary filter with a JSON-name.
Yes, agreed; I’m not talking about the variable identifier though. I’m talking about the option name, which in this version is just “base”. (And whenever that option is used, once here, once later in block/stream.c, there is an explanatory comment why we don’t pass the base node through it, but the first overlay above the base. Which makes me believe perhaps it shouldn’t be called “base”.)
bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds, BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, @@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags, ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) & bs->file->bs->supported_zero_flags); + if (base_overlay_node) { + qdict_del(options, "base"); + base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp);
I think this is a use-after-free. The storage @base_overlay_node points to belongs to a QString, which is referenced only by @options; so deleting that element of @options should free that string.
Max
I will swap those two function calls (bdrv_lookup_bs(); qdict_del();). Thank you.
Don’t forget that the error_setg() below also makes use of it:
+ if (!base_overlay) { + error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node); + return -EINVAL; + } + }
Max

Limit COR operations by the base node in the backing chain when the overlay base node name is given. It will be useful for a block stream job when the COR-filter is applied. The overlay base node is passed as the base itself may change due to concurrent commit jobs on the same backing chain. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 37 insertions(+), 2 deletions(-) diff --git a/block/copy-on-read.c b/block/copy-on-read.c index c578b1b..dfbd6ad 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, size_t qiov_offset, int flags) { - return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset, - flags | BDRV_REQ_COPY_ON_READ); + int64_t n = 0; + int64_t size = offset + bytes; + int local_flags; + int ret; + BDRVStateCOR *state = bs->opaque; + + if (!state->base_overlay) { + return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset, + flags | BDRV_REQ_COPY_ON_READ); + } + + while (offset < size) { + local_flags = flags; + + /* In case of failure, try to copy-on-read anyway */ + ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n); + if (!ret) { + ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs), + state->base_overlay, true, offset, + n, &n); + if (ret) { + local_flags |= BDRV_REQ_COPY_ON_READ; + } + } + + ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, + local_flags); + if (ret < 0) { + return ret; + } + + offset += n; + qiov_offset += n; + bytes -= n; + } + + return 0; } -- 1.8.3.1

On 12.10.20 19:43, Andrey Shinkevich wrote:
Limit COR operations by the base node in the backing chain when the overlay base node name is given. It will be useful for a block stream job when the COR-filter is applied. The overlay base node is passed as the base itself may change due to concurrent commit jobs on the same backing chain.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 37 insertions(+), 2 deletions(-)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index c578b1b..dfbd6ad 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, size_t qiov_offset, int flags) { - return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset, - flags | BDRV_REQ_COPY_ON_READ); + int64_t n = 0; + int64_t size = offset + bytes; + int local_flags; + int ret; + BDRVStateCOR *state = bs->opaque; + + if (!state->base_overlay) { + return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset, + flags | BDRV_REQ_COPY_ON_READ); + } + + while (offset < size) { + local_flags = flags; + + /* In case of failure, try to copy-on-read anyway */ + ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n); + if (!ret) {
In case of failure, a negative value is going to be returned, we won’t go into this conditional block, and local_flags isn’t going to contain BDRV_REQ_COPY_ON_READ. So the idea of CORing in case of failure sounds sound to me, but it doesn’t look like that’s done.
+ ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
I think this should either be bdrv_backing_chain_next() or we must rule out the possibility of bs->file->bs being a filter somewhere. I think I’d prefer the former.
+ state->base_overlay, true, offset, + n, &n); + if (ret) {
“ret == 1 || ret < 0” would be more explicit (and in line with the “!ret || ret < 0” probably needed above), but correct either way. Max

On 14.10.2020 14:59, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
Limit COR operations by the base node in the backing chain when the overlay base node name is given. It will be useful for a block stream job when the COR-filter is applied. The overlay base node is passed as the base itself may change due to concurrent commit jobs on the same backing chain.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 37 insertions(+), 2 deletions(-)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index c578b1b..dfbd6ad 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, size_t qiov_offset, int flags) { - return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset, - flags | BDRV_REQ_COPY_ON_READ); + int64_t n = 0; + int64_t size = offset + bytes; + int local_flags; + int ret; + BDRVStateCOR *state = bs->opaque; + + if (!state->base_overlay) { + return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset, + flags | BDRV_REQ_COPY_ON_READ); + } + + while (offset < size) { + local_flags = flags; + + /* In case of failure, try to copy-on-read anyway */ + ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n); + if (!ret) {
In case of failure, a negative value is going to be returned, we won’t go into this conditional block, and local_flags isn’t going to contain BDRV_REQ_COPY_ON_READ.
So the idea of CORing in case of failure sounds sound to me, but it doesn’t look like that’s done.
Yes, it's obvious. That was just my fault to miss setting the additional condition for "ret < 0". Thank you for noticing that. Andrey
+ ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
I think this should either be bdrv_backing_chain_next() or we must rule out the possibility of bs->file->bs being a filter somewhere. I think I’d prefer the former.
+ state->base_overlay, true, offset, + n, &n); + if (ret) {
“ret == 1 || ret < 0” would be more explicit (and in line with the “!ret || ret < 0” probably needed above), but correct either way.
Max

On 12.10.20 19:43, Andrey Shinkevich wrote:
Limit COR operations by the base node in the backing chain when the overlay base node name is given. It will be useful for a block stream job when the COR-filter is applied. The overlay base node is passed as the base itself may change due to concurrent commit jobs on the same backing chain.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 37 insertions(+), 2 deletions(-)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index c578b1b..dfbd6ad 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, size_t qiov_offset, int flags) { - return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset, - flags | BDRV_REQ_COPY_ON_READ); + int64_t n = 0; + int64_t size = offset + bytes;
Just when I hit send I noticed that “end” would be a more fitting name for this variable.
+ int local_flags; + int ret; + BDRVStateCOR *state = bs->opaque; + + if (!state->base_overlay) { + return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset, + flags | BDRV_REQ_COPY_ON_READ); + } + + while (offset < size) {
(because I got a bit confused looking at this) (Though dropping @size (or @end) and just checking when @bytes becomes 0 should work, too.)
+ local_flags = flags; + + /* In case of failure, try to copy-on-read anyway */ + ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n); + if (!ret) { + ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs), + state->base_overlay, true, offset, + n, &n); + if (ret) { + local_flags |= BDRV_REQ_COPY_ON_READ; + } + }
Furthermore, I just noticed – can the is_allocated functions not return 0 in @n, when @offset is a the EOF? Is that something to look out for? (I’m not sure.) Max
+ + ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, + local_flags); + if (ret < 0) { + return ret; + } + + offset += n; + qiov_offset += n; + bytes -= n; + } + + return 0; }

On 14.10.2020 15:01, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
Limit COR operations by the base node in the backing chain when the overlay base node name is given. It will be useful for a block stream job when the COR-filter is applied. The overlay base node is passed as the base itself may change due to concurrent commit jobs on the same backing chain.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 37 insertions(+), 2 deletions(-)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index c578b1b..dfbd6ad 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, size_t qiov_offset, int flags) {
[...]
+ ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs), + state->base_overlay, true, offset, + n, &n); + if (ret) { + local_flags |= BDRV_REQ_COPY_ON_READ; + } + }
Furthermore, I just noticed – can the is_allocated functions not return 0 in @n, when @offset is a the EOF? Is that something to look out for? (I’m not sure.)
Max
The check for EOF is managed earlier in the stream_run() for a block-stream job. For other cases of using the COR-filter, the check for EOF can be added to the cor_co_preadv_part(). I would be more than happy if we can escape the duplicated checking for is_allocated in the block-stream. But how the stream_run() can stop calling the blk_co_preadv() when EOF is reached if is_allocated removed from it? May the cor_co_preadv_part() return EOF (or other error code) to be handled by a caller if (ret == 0 && n == 0 && (flags & BDRV_REQ_PREFETCH)? Andrey

On 14.10.20 20:57, Andrey Shinkevich wrote:
On 14.10.2020 15:01, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
Limit COR operations by the base node in the backing chain when the overlay base node name is given. It will be useful for a block stream job when the COR-filter is applied. The overlay base node is passed as the base itself may change due to concurrent commit jobs on the same backing chain.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 37 insertions(+), 2 deletions(-)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index c578b1b..dfbd6ad 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, size_t qiov_offset, int flags) {
[...]
+ ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs), + state->base_overlay, true, offset, + n, &n); + if (ret) { + local_flags |= BDRV_REQ_COPY_ON_READ; + } + }
Furthermore, I just noticed – can the is_allocated functions not return 0 in @n, when @offset is a the EOF? Is that something to look out for? (I’m not sure.)
Max
The check for EOF is managed earlier in the stream_run() for a block-stream job. For other cases of using the COR-filter, the check for EOF can be added to the cor_co_preadv_part(). I would be more than happy if we can escape the duplicated checking for is_allocated in the block-stream. But how the stream_run() can stop calling the blk_co_preadv() when EOF is reached if is_allocated removed from it?
True. Is it that bad to lose that optimization, though? (And I would expect the case of a short backing file to be rather rare, too.)
May the cor_co_preadv_part() return EOF (or other error code) to be handled by a caller if (ret == 0 && n == 0 && (flags & BDRV_REQ_PREFETCH)?
That sounds like a bad hack. I’d rather keep the double is_allocated(). But what would be the problem with losing the short backing file optimization? Just performance? Or would we end up writing actual zeroes into the overlay past the end of the backing file? Hm, probably not, if the COR filter would detect that case and handle it like stream does. So it seems only a question of performance to me, and I don’t think it would be that bad to in this rather rare case to have a bunch of useless is_allocated and is_allocated_above calls past the backing file’s EOF. (Maybe I’m wrong, though.) Max

On 15.10.2020 18:56, Max Reitz wrote:
On 14.10.20 20:57, Andrey Shinkevich wrote:
On 14.10.2020 15:01, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
Limit COR operations by the base node in the backing chain when the overlay base node name is given. It will be useful for a block stream job when the COR-filter is applied. The overlay base node is passed as the base itself may change due to concurrent commit jobs on the same backing chain.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 37 insertions(+), 2 deletions(-)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index c578b1b..dfbd6ad 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, size_t qiov_offset, int flags) {
[...]
+ ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs), + state->base_overlay, true, offset, + n, &n); + if (ret) { + local_flags |= BDRV_REQ_COPY_ON_READ; + } + }
Furthermore, I just noticed – can the is_allocated functions not return 0 in @n, when @offset is a the EOF? Is that something to look out for? (I’m not sure.)
Max
The check for EOF is managed earlier in the stream_run() for a block-stream job. For other cases of using the COR-filter, the check for EOF can be added to the cor_co_preadv_part(). I would be more than happy if we can escape the duplicated checking for is_allocated in the block-stream. But how the stream_run() can stop calling the blk_co_preadv() when EOF is reached if is_allocated removed from it?
True. Is it that bad to lose that optimization, though? (And I would expect the case of a short backing file to be rather rare, too.)
May the cor_co_preadv_part() return EOF (or other error code) to be handled by a caller if (ret == 0 && n == 0 && (flags & BDRV_REQ_PREFETCH)?
That sounds like a bad hack. I’d rather keep the double is_allocated().
But what would be the problem with losing the short backing file optimization? Just performance? Or would we end up writing actual zeroes into the overlay past the end of the backing file? Hm, probably not, if the COR filter would detect that case and handle it like stream does.
So it seems only a question of performance to me, and I don’t think it would be that bad to in this rather rare case to have a bunch of useless is_allocated and is_allocated_above calls past the backing file’s EOF. (Maybe I’m wrong, though.)
Max
Thank you, Max, for sharing your thoughts on this subject. The double check for the is_allocated in the stream_run() is a performance degradation also. And we will make a check for the EOF in the cor_co_preadv_part() in either case, won't we? Andrey

15.10.2020 20:37, Andrey Shinkevich wrote:
On 15.10.2020 18:56, Max Reitz wrote:
On 14.10.20 20:57, Andrey Shinkevich wrote:
On 14.10.2020 15:01, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
Limit COR operations by the base node in the backing chain when the overlay base node name is given. It will be useful for a block stream job when the COR-filter is applied. The overlay base node is passed as the base itself may change due to concurrent commit jobs on the same backing chain.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 37 insertions(+), 2 deletions(-)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index c578b1b..dfbd6ad 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, size_t qiov_offset, int flags) {
[...]
+ ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs), + state->base_overlay, true, offset, + n, &n); + if (ret) { + local_flags |= BDRV_REQ_COPY_ON_READ; + } + }
Furthermore, I just noticed – can the is_allocated functions not return 0 in @n, when @offset is a the EOF? Is that something to look out for? (I’m not sure.)
Max
The check for EOF is managed earlier in the stream_run() for a block-stream job. For other cases of using the COR-filter, the check for EOF can be added to the cor_co_preadv_part(). I would be more than happy if we can escape the duplicated checking for is_allocated in the block-stream. But how the stream_run() can stop calling the blk_co_preadv() when EOF is reached if is_allocated removed from it?
True. Is it that bad to lose that optimization, though? (And I would expect the case of a short backing file to be rather rare, too.)
May the cor_co_preadv_part() return EOF (or other error code) to be handled by a caller if (ret == 0 && n == 0 && (flags & BDRV_REQ_PREFETCH)?
That sounds like a bad hack. I’d rather keep the double is_allocated().
But what would be the problem with losing the short backing file optimization? Just performance? Or would we end up writing actual zeroes into the overlay past the end of the backing file? Hm, probably not, if the COR filter would detect that case and handle it like stream does.
So it seems only a question of performance to me, and I don’t think it would be that bad to in this rather rare case to have a bunch of useless is_allocated and is_allocated_above calls past the backing file’s EOF. (Maybe I’m wrong, though.)
Max
Thank you, Max, for sharing your thoughts on this subject. The double check for the is_allocated in the stream_run() is a performance degradation also. And we will make a check for the EOF in the cor_co_preadv_part() in either case, won't we?
Andrey
I'd keep is_allocated logic in block-stream as is for now. It's not good that we check block-status several times (in block-stream, than in cor filter, than in generic COR code), but it shouldn't be real problem, and we can postpone optimizations for the next step. Also, the resulting architecture is not final. I believe that in bright future block-stream will work through block-copy like backup does. And COR filter will call block_copy() by itself, and generic COR code will be dropped together with BDRV_REQ_COR flag. And stream will do just one background call of block_copy for the whole device, like backup in finish on my in-flight backup series. And all extra levels of block_status checking will leave. About EOF problem discussed here, let's look at more generic problem: we are going to skip _large_ area, but skipping chunk-by-chunk is inefficient. So, we just want to learn to skip large areas. The simplest way is just to call is_allocated/is_allocated_above from current offset to device end, if we decided to skip current chunk. Then we'll know how much to skip totally. But that kind of optimization is not directly related to these series and may be done in separate if needed. -- Best regards, Vladimir

Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to use it alone and pass it to the COR-filter driver for further processing. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- include/block/block.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index 981ab5b..2b7efd1 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -71,9 +71,10 @@ typedef enum { BDRV_REQ_NO_FALLBACK = 0x100, /* - * BDRV_REQ_PREFETCH may be used only together with BDRV_REQ_COPY_ON_READ - * on read request and means that caller doesn't really need data to be - * written to qiov parameter which may be NULL. + * BDRV_REQ_PREFETCH may be used together with the BDRV_REQ_COPY_ON_READ + * flag or when the COR-filter applied to read operations and means that + * caller doesn't really need data to be written to qiov parameter which + * may be NULL. */ BDRV_REQ_PREFETCH = 0x200, /* Mask of valid flags */ -- 1.8.3.1

On 12.10.20 19:43, Andrey Shinkevich wrote:
Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to use it alone and pass it to the COR-filter driver for further processing.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- include/block/block.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/block/block.h b/include/block/block.h index 981ab5b..2b7efd1 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -71,9 +71,10 @@ typedef enum { BDRV_REQ_NO_FALLBACK = 0x100,
/* - * BDRV_REQ_PREFETCH may be used only together with BDRV_REQ_COPY_ON_READ - * on read request and means that caller doesn't really need data to be - * written to qiov parameter which may be NULL. + * BDRV_REQ_PREFETCH may be used together with the BDRV_REQ_COPY_ON_READ + * flag or when the COR-filter applied to read operations and means that
There’s some word missing here, but I’m not sure what it is... At least an “is” before “applied”. Perhaps something like ”or when a COR filter is involved (in read operations)” would be better.
+ * caller doesn't really need data to be written to qiov parameter which
And this “written to” confused me for a second, because we’re reading into qiov. Technically, that means writing into the buffer, but, you know. Could we rewrite the whole thing, perhaps? Something like “BDRV_REQ_PREFETCH makes sense only in the context of copy-on-read (i.e., together with the BDRV_REQ_COPY_ON_READ flag or when there is a COR filter), in which case it signals that the COR operation need not read the data into memory (qiov), but only ensure it is copied to the top layer (i.e., that COR is done).” I don’t know. Max
+ * may be NULL. */ BDRV_REQ_PREFETCH = 0x200, /* Mask of valid flags */

14.10.2020 15:22, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to use it alone and pass it to the COR-filter driver for further processing.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- include/block/block.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/block/block.h b/include/block/block.h index 981ab5b..2b7efd1 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -71,9 +71,10 @@ typedef enum { BDRV_REQ_NO_FALLBACK = 0x100,
/* - * BDRV_REQ_PREFETCH may be used only together with BDRV_REQ_COPY_ON_READ - * on read request and means that caller doesn't really need data to be - * written to qiov parameter which may be NULL. + * BDRV_REQ_PREFETCH may be used together with the BDRV_REQ_COPY_ON_READ + * flag or when the COR-filter applied to read operations and means that
There’s some word missing here, but I’m not sure what it is... At least an “is” before “applied”. Perhaps something like ”or when a COR filter is involved (in read operations)” would be better.
+ * caller doesn't really need data to be written to qiov parameter which
And this “written to” confused me for a second, because we’re reading into qiov. Technically, that means writing into the buffer, but, you know.
Could we rewrite the whole thing, perhaps? Something like
“BDRV_REQ_PREFETCH makes sense only in the context of copy-on-read (i.e., together with the BDRV_REQ_COPY_ON_READ flag or when there is a COR filter), in which case it signals that the COR operation need not read the data into memory (qiov), but only ensure it is copied to the top layer (i.e., that COR is done).”
Sounds good
+ * may be NULL. */ BDRV_REQ_PREFETCH = 0x200, /* Mask of valid flags */
-- Best regards, Vladimir

On 14.10.2020 15:22, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to use it alone and pass it to the COR-filter driver for further processing.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- include/block/block.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/block/block.h b/include/block/block.h index 981ab5b..2b7efd1 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -71,9 +71,10 @@ typedef enum { BDRV_REQ_NO_FALLBACK = 0x100,
/* - * BDRV_REQ_PREFETCH may be used only together with BDRV_REQ_COPY_ON_READ - * on read request and means that caller doesn't really need data to be - * written to qiov parameter which may be NULL. + * BDRV_REQ_PREFETCH may be used together with the BDRV_REQ_COPY_ON_READ + * flag or when the COR-filter applied to read operations and means that
There’s some word missing here, but I’m not sure what it is... At least an “is” before “applied”. Perhaps something like ”or when a COR filter is involved (in read operations)” would be better.
+ * caller doesn't really need data to be written to qiov parameter which
And this “written to” confused me for a second, because we’re reading into qiov. Technically, that means writing into the buffer, but, you know.
Could we rewrite the whole thing, perhaps? Something like
“BDRV_REQ_PREFETCH makes sense only in the context of copy-on-read (i.e., together with the BDRV_REQ_COPY_ON_READ flag or when there is a COR filter), in which case it signals that the COR operation need not read the data into memory (qiov), but only ensure it is copied to the top layer (i.e., that COR is done).”
I don’t know.
Max
I would modify a little: “BDRV_REQ_PREFETCH makes sense only in the context of copy-on-read (i.e., together with the BDRV_REQ_COPY_ON_READ flag or when a COR filter is involved), in which case it signals that the COR operation need not read the data into memory (qiov) but only ensure they are copied to the top layer (i.e., that COR operation is done).”
+ * may be NULL. */ BDRV_REQ_PREFETCH = 0x200, /* Mask of valid flags */

Add the new member supported_read_flags to BlockDriverState structure. It will control the BDRV_REQ_PREFETCH flag set for copy-on-read operations. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- include/block/block_int.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/include/block/block_int.h b/include/block/block_int.h index f782737..a142867 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -873,6 +873,10 @@ struct BlockDriverState { /* I/O Limits */ BlockLimits bl; + /* + * Flags honored during pread (so far: BDRV_REQ_PREFETCH) + */ + unsigned int supported_read_flags; /* Flags honored during pwrite (so far: BDRV_REQ_FUA, * BDRV_REQ_WRITE_UNCHANGED). * If a driver does not support BDRV_REQ_WRITE_UNCHANGED, those -- 1.8.3.1

On 12.10.20 19:43, Andrey Shinkevich wrote:
Add the new member supported_read_flags to BlockDriverState structure. It will control the BDRV_REQ_PREFETCH flag set for copy-on-read operations.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- include/block/block_int.h | 4 ++++ 1 file changed, 4 insertions(+)
Not sure what the problem with passing BDRV_REQ_PREFETCH to drivers that aren’t the COR filter are would be, but: Reviewed-by: Max Reitz <mreitz@redhat.com> (I mean, outside of the context of COR the flag is undefined, so it can be anything, but intuitively I’d assume it means “no need to read the data to memory” then, too. So if a driver doesn’t support it, nothing bad should happen. Hm. Well, unless the driver passes the flag on (unsuspectingly) and a metadata-bearing child doesn’t return any data that the drviver needs. Yeah, better to explicitly disallow the flag for all unsupporting drivers then.)

Add support for the BDRV_REQ_PREFETCH flag to the supported_write_flags of the COR-filter. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/copy-on-read.c b/block/copy-on-read.c index dfbd6ad..b136895 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -50,6 +50,7 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags, return -EINVAL; } + bs->supported_read_flags = BDRV_REQ_PREFETCH; bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED | (BDRV_REQ_FUA & bs->file->bs->supported_write_flags); -- 1.8.3.1

On 12.10.20 19:43, Andrey Shinkevich wrote:
Add support for the BDRV_REQ_PREFETCH flag to the supported_write_flags
s/write/read/
of the COR-filter.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index dfbd6ad..b136895 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -50,6 +50,7 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags, return -EINVAL; }
+ bs->supported_read_flags = BDRV_REQ_PREFETCH; bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED | (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
Then we mustn’t let cor_co_preadv_part() pass the flag on to bdrv_co_preadv_part() unless BDRV_REQ_COPY_ON_READ is set, too. I suspect the following patch is going to do that, but in the meantime the code is wrong. Perhaps just swap both patches? And by the way, I’m also missing a patch that makes the block layer evaluate supported_read_flags and e.g. strip BDRV_REQ_PREFETCH if it isn’t supported, before it gets passed to such a block driver. Max

If the flag BDRV_REQ_PREFETCH was set, pass it further to the COR-driver to skip unneeded reading. It can be taken into account for the COR-algorithms optimization. That check is being made during the block stream job by the moment. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 13 +++++++++---- block/io.c | 3 ++- 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/block/copy-on-read.c b/block/copy-on-read.c index b136895..278a11a 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, } } - ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, - local_flags); - if (ret < 0) { - return ret; + if (!!(flags & BDRV_REQ_PREFETCH) & + !(local_flags & BDRV_REQ_COPY_ON_READ)) { + /* Skip non-guest reads if no copy needed */ + } else { + ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, + local_flags); + if (ret < 0) { + return ret; + } } offset += n; diff --git a/block/io.c b/block/io.c index 11df188..bff1808 100644 --- a/block/io.c +++ b/block/io.c @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child, max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align); if (bytes <= max_bytes && bytes <= max_transfer) { - ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0); + ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, + flags & bs->supported_read_flags); goto out; } -- 1.8.3.1

On 12.10.20 19:43, Andrey Shinkevich wrote:
If the flag BDRV_REQ_PREFETCH was set, pass it further to the COR-driver to skip unneeded reading. It can be taken into account for the COR-algorithms optimization. That check is being made during the block stream job by the moment.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 13 +++++++++---- block/io.c | 3 ++- 2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index b136895..278a11a 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, } }
- ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, - local_flags); - if (ret < 0) { - return ret; + if (!!(flags & BDRV_REQ_PREFETCH) &
How about dropping the double negation and using a logical && instead of the binary &?
+ !(local_flags & BDRV_REQ_COPY_ON_READ)) { + /* Skip non-guest reads if no copy needed */ + } else {
Hm. I would have just written the negated form (!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ)) and put the “skip” comment above that condition. (Since local_flags is initialized to flags, it can be written as a single comparison, but that’s a matter of taste and I’m not going to recommend either over the other: ((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) != BDRV_REQ_PREFETCH) )
+ ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, + local_flags); + if (ret < 0) { + return ret; + } }
offset += n; diff --git a/block/io.c b/block/io.c index 11df188..bff1808 100644 --- a/block/io.c +++ b/block/io.c @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align); if (bytes <= max_bytes && bytes <= max_transfer) { - ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0); + ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, + flags & bs->supported_read_flags);
Ah, OK. I see. I expected this to be a separate patch. I still wonder why it isn’t. Max
goto out; }

14.10.2020 15:51, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
If the flag BDRV_REQ_PREFETCH was set, pass it further to the COR-driver to skip unneeded reading. It can be taken into account for the COR-algorithms optimization. That check is being made during the block stream job by the moment.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 13 +++++++++---- block/io.c | 3 ++- 2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index b136895..278a11a 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, } }
- ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, - local_flags); - if (ret < 0) { - return ret; + if (!!(flags & BDRV_REQ_PREFETCH) &
How about dropping the double negation and using a logical && instead of the binary &?
+ !(local_flags & BDRV_REQ_COPY_ON_READ)) { + /* Skip non-guest reads if no copy needed */ + } else {
Hm. I would have just written the negated form
(!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
and put the “skip” comment above that condition.
(Since local_flags is initialized to flags, it can be written as a single comparison, but that’s a matter of taste and I’m not going to recommend either over the other:
((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) != BDRV_REQ_PREFETCH)
)
+ ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, + local_flags); + if (ret < 0) { + return ret; + } }
offset += n; diff --git a/block/io.c b/block/io.c index 11df188..bff1808 100644 --- a/block/io.c +++ b/block/io.c @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align); if (bytes <= max_bytes && bytes <= max_transfer) { - ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0); + ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, + flags & bs->supported_read_flags);
When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be) NULL. This means, that we can't just drop the flag when call the driver that doesn't support it. Actually, if driver doesn't support the PREFETCH flag we should do nothing.
Ah, OK. I see. I expected this to be a separate patch. I still wonder why it isn’t.
Could it be part of patch 07? I mean introduce new field supported_read_flags and handle it in generic code in one patch, prior to implementing support for it in COR driver. -- Best regards, Vladimir

On 14.10.20 17:22, Vladimir Sementsov-Ogievskiy wrote:
14.10.2020 15:51, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
If the flag BDRV_REQ_PREFETCH was set, pass it further to the COR-driver to skip unneeded reading. It can be taken into account for the COR-algorithms optimization. That check is being made during the block stream job by the moment.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 13 +++++++++---- block/io.c | 3 ++- 2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index b136895..278a11a 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, } } - ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, - local_flags); - if (ret < 0) { - return ret; + if (!!(flags & BDRV_REQ_PREFETCH) &
How about dropping the double negation and using a logical && instead of the binary &?
+ !(local_flags & BDRV_REQ_COPY_ON_READ)) { + /* Skip non-guest reads if no copy needed */ + } else {
Hm. I would have just written the negated form
(!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
and put the “skip” comment above that condition.
(Since local_flags is initialized to flags, it can be written as a single comparison, but that’s a matter of taste and I’m not going to recommend either over the other:
((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) != BDRV_REQ_PREFETCH)
)
+ ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, + local_flags); + if (ret < 0) { + return ret; + } } offset += n; diff --git a/block/io.c b/block/io.c index 11df188..bff1808 100644 --- a/block/io.c +++ b/block/io.c @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child, max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align); if (bytes <= max_bytes && bytes <= max_transfer) { - ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0); + ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, + flags & bs->supported_read_flags);
When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be) NULL. This means, that we can't just drop the flag when call the driver that doesn't support it.
True. :/
Actually, if driver doesn't support the PREFETCH flag we should do nothing.
Hm. But at least in the case of COR, PREFETCH is not something that can be optimized to be a no-op (unless the COR is a no-op); it still denotes a command that must be executed. So if we can’t pass it to the driver, I don’t think we should do nothing, but to return an error. Or maybe we could even assert that it isn’t set for drivers that don’t support it, because at least right now such a case would just be a bug.
Ah, OK. I see. I expected this to be a separate patch. I still wonder why it isn’t.
Could it be part of patch 07? I mean introduce new field supported_read_flags and handle it in generic code in one patch, prior to implementing support for it in COR driver.
Sure. Max

14.10.2020 19:30, Max Reitz wrote:
On 14.10.20 17:22, Vladimir Sementsov-Ogievskiy wrote:
14.10.2020 15:51, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
If the flag BDRV_REQ_PREFETCH was set, pass it further to the COR-driver to skip unneeded reading. It can be taken into account for the COR-algorithms optimization. That check is being made during the block stream job by the moment.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 13 +++++++++---- block/io.c | 3 ++- 2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index b136895..278a11a 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, } } - ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, - local_flags); - if (ret < 0) { - return ret; + if (!!(flags & BDRV_REQ_PREFETCH) &
How about dropping the double negation and using a logical && instead of the binary &?
+ !(local_flags & BDRV_REQ_COPY_ON_READ)) { + /* Skip non-guest reads if no copy needed */ + } else {
Hm. I would have just written the negated form
(!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
and put the “skip” comment above that condition.
(Since local_flags is initialized to flags, it can be written as a single comparison, but that’s a matter of taste and I’m not going to recommend either over the other:
((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) != BDRV_REQ_PREFETCH)
)
+ ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, + local_flags); + if (ret < 0) { + return ret; + } } offset += n; diff --git a/block/io.c b/block/io.c index 11df188..bff1808 100644 --- a/block/io.c +++ b/block/io.c @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child, max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align); if (bytes <= max_bytes && bytes <= max_transfer) { - ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0); + ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, + flags & bs->supported_read_flags);
When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be) NULL. This means, that we can't just drop the flag when call the driver that doesn't support it.
True. :/
Actually, if driver doesn't support the PREFETCH flag we should do nothing.
Hm. But at least in the case of COR, PREFETCH is not something that can be optimized to be a no-op (unless the COR is a no-op); it still denotes a command that must be executed.
So if we can’t pass it to the driver, I don’t think we should do nothing, but to return an error. Or maybe we could even assert that it isn’t set for drivers that don’t support it, because at least right now such a case would just be a bug.
Hmm. Reasonable.. So, let me summarize the cases: 1. bdrv_co_preadv(.. , PREFETCH | COR) In this case generic layer should handle both flags and pass flags=0 to driver 2. bdrv_co_preadv(.., PREFETCH) 2.1 driver supporst PREFETCH OK, pass PREFETCH to driver, no problems 2.2 driver doesn't support PREFETCH We can just abort() here, as the only source of PREFETCH without COR would be stream job driver, which must read from COR filter. More generic solution is to allocate temporary buffer (at least if qiov is NULL) and call underlying driver .preadv with flags=0 on that temporary buffer. But just abort() is simpler and should work for now.
Ah, OK. I see. I expected this to be a separate patch. I still wonder why it isn’t.
Could it be part of patch 07? I mean introduce new field supported_read_flags and handle it in generic code in one patch, prior to implementing support for it in COR driver.
Sure.
Max
-- Best regards, Vladimir

On 14.10.20 18:39, Vladimir Sementsov-Ogievskiy wrote:
14.10.2020 19:30, Max Reitz wrote:
On 14.10.20 17:22, Vladimir Sementsov-Ogievskiy wrote:
14.10.2020 15:51, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
If the flag BDRV_REQ_PREFETCH was set, pass it further to the COR-driver to skip unneeded reading. It can be taken into account for the COR-algorithms optimization. That check is being made during the block stream job by the moment.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 13 +++++++++---- block/io.c | 3 ++- 2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index b136895..278a11a 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, } } - ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, - local_flags); - if (ret < 0) { - return ret; + if (!!(flags & BDRV_REQ_PREFETCH) &
How about dropping the double negation and using a logical && instead of the binary &?
+ !(local_flags & BDRV_REQ_COPY_ON_READ)) { + /* Skip non-guest reads if no copy needed */ + } else {
Hm. I would have just written the negated form
(!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
and put the “skip” comment above that condition.
(Since local_flags is initialized to flags, it can be written as a single comparison, but that’s a matter of taste and I’m not going to recommend either over the other:
((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) != BDRV_REQ_PREFETCH)
)
+ ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, + local_flags); + if (ret < 0) { + return ret; + } } offset += n; diff --git a/block/io.c b/block/io.c index 11df188..bff1808 100644 --- a/block/io.c +++ b/block/io.c @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child, max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align); if (bytes <= max_bytes && bytes <= max_transfer) { - ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0); + ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, + flags & bs->supported_read_flags);
When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be) NULL. This means, that we can't just drop the flag when call the driver that doesn't support it.
True. :/
Actually, if driver doesn't support the PREFETCH flag we should do nothing.
Hm. But at least in the case of COR, PREFETCH is not something that can be optimized to be a no-op (unless the COR is a no-op); it still denotes a command that must be executed.
So if we can’t pass it to the driver, I don’t think we should do nothing, but to return an error. Or maybe we could even assert that it isn’t set for drivers that don’t support it, because at least right now such a case would just be a bug.
Hmm. Reasonable..
So, let me summarize the cases:
1. bdrv_co_preadv(.. , PREFETCH | COR)
In this case generic layer should handle both flags and pass flags=0 to driver
2. bdrv_co_preadv(.., PREFETCH)
2.1 driver supporst PREFETCH OK, pass PREFETCH to driver, no problems
2.2 driver doesn't support PREFETCH
We can just abort() here, as the only source of PREFETCH without COR would be stream job driver, which must read from COR filter.
More generic solution is to allocate temporary buffer (at least if qiov is NULL) and call underlying driver .preadv with flags=0 on that temporary buffer. But just abort() is simpler and should work for now.
Agreed. Max

On 14.10.2020 18:22, Vladimir Sementsov-Ogievskiy wrote:
14.10.2020 15:51, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
If the flag BDRV_REQ_PREFETCH was set, pass it further to the COR-driver to skip unneeded reading. It can be taken into account for the COR-algorithms optimization. That check is being made during the block stream job by the moment.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 13 +++++++++---- block/io.c | 3 ++- 2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index b136895..278a11a 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, } } - ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, - local_flags); - if (ret < 0) { - return ret; + if (!!(flags & BDRV_REQ_PREFETCH) &
How about dropping the double negation and using a logical && instead of the binary &?
+ !(local_flags & BDRV_REQ_COPY_ON_READ)) { + /* Skip non-guest reads if no copy needed */ + } else {
Hm. I would have just written the negated form
(!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
and put the “skip” comment above that condition.
(Since local_flags is initialized to flags, it can be written as a single comparison, but that’s a matter of taste and I’m not going to recommend either over the other:
((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) != BDRV_REQ_PREFETCH)
)
+ ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, + local_flags); + if (ret < 0) { + return ret; + } } offset += n; diff --git a/block/io.c b/block/io.c index 11df188..bff1808 100644 --- a/block/io.c +++ b/block/io.c @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child, max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align); if (bytes <= max_bytes && bytes <= max_transfer) { - ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0); + ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, + flags & bs->supported_read_flags);
When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be) NULL. This means, that we can't just drop the flag when call the driver that doesn't support it.
Actually, if driver doesn't support the PREFETCH flag we should do nothing.
Ah, OK. I see. I expected this to be a separate patch. I still wonder why it isn’t.
Could it be part of patch 07? I mean introduce new field supported_read_flags and handle it in generic code in one patch, prior to implementing support for it in COR driver.
We have to add the supported flags for the COR driver in the same patch. Or before handling the supported_read_flags at the generic layer (handling zero does not make a sence). Otherwise, the test #216 (where the COR-filter is applied) will not pass. Andrey

On 21.10.2020 23:43, Andrey Shinkevich wrote:
On 14.10.2020 18:22, Vladimir Sementsov-Ogievskiy wrote:
14.10.2020 15:51, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
If the flag BDRV_REQ_PREFETCH was set, pass it further to the COR-driver to skip unneeded reading. It can be taken into account for the COR-algorithms optimization. That check is being made during the block stream job by the moment.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> ---
[...]
diff --git a/block/io.c b/block/io.c index 11df188..bff1808 100644 --- a/block/io.c +++ b/block/io.c @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child, max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align); if (bytes <= max_bytes && bytes <= max_transfer) { - ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0); + ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, + flags & bs->supported_read_flags);
When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be) NULL. This means, that we can't just drop the flag when call the driver that doesn't support it.
Actually, if driver doesn't support the PREFETCH flag we should do nothing.
Ah, OK. I see. I expected this to be a separate patch. I still wonder why it isn’t.
Could it be part of patch 07? I mean introduce new field supported_read_flags and handle it in generic code in one patch, prior to implementing support for it in COR driver.
We have to add the supported flags for the COR driver in the same patch. Or before handling the supported_read_flags at the generic layer (handling zero does not make a sence). Otherwise, the test #216 (where the COR-filter is applied) will not pass.
Andrey
I have found a workaround and am going to send all the related patches as a separate series. Andrey

22.10.2020 10:50, Andrey Shinkevich wrote:
On 21.10.2020 23:43, Andrey Shinkevich wrote:
On 14.10.2020 18:22, Vladimir Sementsov-Ogievskiy wrote:
14.10.2020 15:51, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
If the flag BDRV_REQ_PREFETCH was set, pass it further to the COR-driver to skip unneeded reading. It can be taken into account for the COR-algorithms optimization. That check is being made during the block stream job by the moment.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> ---
[...]
diff --git a/block/io.c b/block/io.c index 11df188..bff1808 100644 --- a/block/io.c +++ b/block/io.c @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child, max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align); if (bytes <= max_bytes && bytes <= max_transfer) { - ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0); + ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, + flags & bs->supported_read_flags);
When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be) NULL. This means, that we can't just drop the flag when call the driver that doesn't support it.
Actually, if driver doesn't support the PREFETCH flag we should do nothing.
Ah, OK. I see. I expected this to be a separate patch. I still wonder why it isn’t.
Could it be part of patch 07? I mean introduce new field supported_read_flags and handle it in generic code in one patch, prior to implementing support for it in COR driver.
We have to add the supported flags for the COR driver in the same patch. Or before handling the supported_read_flags at the generic layer (handling zero does not make a sence). Otherwise, the test #216 (where the COR-filter is applied) will not pass.
Andrey
I have found a workaround and am going to send all the related patches as a separate series.
What is the problem? If in a separate patch prior to modifying COR driver, we add new field supported_read_flags and add a support for it in generic layer, when no driver support it yet, nothing should be changed and all tests should pass.. -- Best regards, Vladimir

On 14.10.2020 15:51, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
If the flag BDRV_REQ_PREFETCH was set, pass it further to the COR-driver to skip unneeded reading. It can be taken into account for the COR-algorithms optimization. That check is being made during the block stream job by the moment.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/copy-on-read.c | 13 +++++++++---- block/io.c | 3 ++- 2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/block/copy-on-read.c b/block/copy-on-read.c index b136895..278a11a 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, } }
- ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, - local_flags); - if (ret < 0) { - return ret; + if (!!(flags & BDRV_REQ_PREFETCH) &
How about dropping the double negation and using a logical && instead of the binary &?
Yes, that's correct.
+ !(local_flags & BDRV_REQ_COPY_ON_READ)) { + /* Skip non-guest reads if no copy needed */ + } else {
Hm. I would have just written the negated form
(!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
and put the “skip” comment above that condition.
(Since local_flags is initialized to flags, it can be written as a single comparison, but that’s a matter of taste and I’m not going to recommend either over the other:
I played with the flags to make the idea obvious for the eye of a beholder: "we neither read nor write". Comparing the BDRV_REQ_PREFETCH against the 'flags' means that the flag comes from outside of the function. And the empty section means we do nothing in that case. Eventually, I will pick up the brief expression below. Thanks, Andrey
((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) != BDRV_REQ_PREFETCH)
)
+ ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, + local_flags); + if (ret < 0) { + return ret; + } }
offset += n; diff --git a/block/io.c b/block/io.c index 11df188..bff1808 100644 --- a/block/io.c +++ b/block/io.c @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align); if (bytes <= max_bytes && bytes <= max_transfer) { - ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0); + ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, + flags & bs->supported_read_flags);
Ah, OK. I see. I expected this to be a separate patch. I still wonder why it isn’t.
Max
goto out; }

Avoid writing a filter JSON-name to QCOW2 image when the backing file is changed after the block stream job. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/stream.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/block/stream.c b/block/stream.c index e0540ee..51462bd 100644 --- a/block/stream.c +++ b/block/stream.c @@ -65,6 +65,7 @@ static int stream_prepare(Job *job) BlockDriverState *bs = blk_bs(bjob->blk); BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs); BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base); + BlockDriverState *base_unfiltered = bdrv_skip_filters(base); Error *local_err = NULL; int ret = 0; @@ -73,10 +74,10 @@ static int stream_prepare(Job *job) if (bdrv_cow_child(unfiltered_bs)) { const char *base_id = NULL, *base_fmt = NULL; - if (base) { - base_id = s->backing_file_str; - if (base->drv) { - base_fmt = base->drv->format_name; + if (base_unfiltered) { + base_id = base_unfiltered->filename; + if (base_unfiltered->drv) { + base_fmt = base_unfiltered->drv->format_name; } } bdrv_set_backing_hd(unfiltered_bs, base, &local_err); -- 1.8.3.1

On 12.10.20 19:43, Andrey Shinkevich wrote:
Avoid writing a filter JSON-name to QCOW2 image when the backing file is changed after the block stream job.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/stream.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/block/stream.c b/block/stream.c index e0540ee..51462bd 100644 --- a/block/stream.c +++ b/block/stream.c @@ -65,6 +65,7 @@ static int stream_prepare(Job *job) BlockDriverState *bs = blk_bs(bjob->blk); BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs); BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base); + BlockDriverState *base_unfiltered = bdrv_skip_filters(base); Error *local_err = NULL; int ret = 0;
@@ -73,10 +74,10 @@ static int stream_prepare(Job *job)
if (bdrv_cow_child(unfiltered_bs)) { const char *base_id = NULL, *base_fmt = NULL; - if (base) { - base_id = s->backing_file_str; - if (base->drv) { - base_fmt = base->drv->format_name; + if (base_unfiltered) { + base_id = base_unfiltered->filename;
I think you have to leave this querying s->backing_file_str, and instead change how qmp_block_stream() gets @base_name. block-stream has a backing-file parameter that can override the string that should be used here. (Or perhaps you can let qmp_block_stream() just set it to NULL if no backing-file parameter is passed and then fall back to base_unfiltered->filename only here. Probably better, actually, in case base_unfiltered is changed during the job run.) Max
+ if (base_unfiltered->drv) { + base_fmt = base_unfiltered->drv->format_name; } } bdrv_set_backing_hd(unfiltered_bs, base, &local_err);

14.10.2020 18:02, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
Avoid writing a filter JSON-name to QCOW2 image when the backing file is changed after the block stream job.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/stream.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/block/stream.c b/block/stream.c index e0540ee..51462bd 100644 --- a/block/stream.c +++ b/block/stream.c @@ -65,6 +65,7 @@ static int stream_prepare(Job *job) BlockDriverState *bs = blk_bs(bjob->blk); BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs); BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base); + BlockDriverState *base_unfiltered = bdrv_skip_filters(base); Error *local_err = NULL; int ret = 0;
@@ -73,10 +74,10 @@ static int stream_prepare(Job *job)
if (bdrv_cow_child(unfiltered_bs)) { const char *base_id = NULL, *base_fmt = NULL; - if (base) { - base_id = s->backing_file_str; - if (base->drv) { - base_fmt = base->drv->format_name; + if (base_unfiltered) { + base_id = base_unfiltered->filename;
I think you have to leave this querying s->backing_file_str, and instead change how qmp_block_stream() gets @base_name. block-stream has a backing-file parameter that can override the string that should be used here.
(Or perhaps you can let qmp_block_stream() just set it to NULL if no backing-file parameter is passed and then fall back to base_unfiltered->filename only here. Probably better, actually, in case base_unfiltered is changed during the job run.)
Agree with the way in brackets. If user set backing-file parameter we should handle it here. If backing-file is not set, we should use dynamically calculated unfiltered base in stream_prepare(). -- Best regards, Vladimir

Whereas the block-stream job starts using a backing file name of the base node overlay after the block-stream job completes, mark the QMP 'backing-file' argument as deprecated. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- docs/system/deprecated.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 8b3ab5b..7491fcf 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -285,6 +285,12 @@ details. The ``query-events`` command has been superseded by the more powerful and accurate ``query-qmp-schema`` command. +``block-stream`` argument ``backing-file`` (since 5.2) +''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' + +The argument ``backing-file`` is deprecated. QEMU uses a backing file +name of the base node overlay after the block-stream job completes. + chardev client socket with ``wait`` option (since 4.0) '''''''''''''''''''''''''''''''''''''''''''''''''''''' -- 1.8.3.1

On 12.10.20 19:43, Andrey Shinkevich wrote:
Whereas the block-stream job starts using a backing file name of the base node overlay after the block-stream job completes, mark the QMP 'backing-file' argument as deprecated.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- docs/system/deprecated.rst | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 8b3ab5b..7491fcf 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -285,6 +285,12 @@ details. The ``query-events`` command has been superseded by the more powerful and accurate ``query-qmp-schema`` command.
+``block-stream`` argument ``backing-file`` (since 5.2) +''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' + +The argument ``backing-file`` is deprecated. QEMU uses a backing file +name of the base node overlay after the block-stream job completes. +
Hm, why? I don’t see the problem with it. Max

14.10.2020 18:03, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
Whereas the block-stream job starts using a backing file name of the base node overlay after the block-stream job completes, mark the QMP 'backing-file' argument as deprecated.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- docs/system/deprecated.rst | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 8b3ab5b..7491fcf 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -285,6 +285,12 @@ details. The ``query-events`` command has been superseded by the more powerful and accurate ``query-qmp-schema`` command.
+``block-stream`` argument ``backing-file`` (since 5.2) +''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' + +The argument ``backing-file`` is deprecated. QEMU uses a backing file +name of the base node overlay after the block-stream job completes. +
Hm, why? I don’t see the problem with it.
My wrong idea, sorry. I believed that the argument is unused when I were reviewing v10. But it actually become unused during the series and it is wrong. -- Best regards, Vladimir

On 14.10.2020 18:43, Vladimir Sementsov-Ogievskiy wrote:
14.10.2020 18:03, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
Whereas the block-stream job starts using a backing file name of the base node overlay after the block-stream job completes, mark the QMP 'backing-file' argument as deprecated.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- docs/system/deprecated.rst | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 8b3ab5b..7491fcf 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -285,6 +285,12 @@ details. The ``query-events`` command has been superseded by the more powerful and accurate ``query-qmp-schema`` command. +``block-stream`` argument ``backing-file`` (since 5.2) +''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' + +The argument ``backing-file`` is deprecated. QEMU uses a backing file +name of the base node overlay after the block-stream job completes. +
Hm, why? I don’t see the problem with it.
My wrong idea, sorry. I believed that the argument is unused when I were reviewing v10. But it actually become unused during the series and it is wrong.
I missed searching for calls to the qmp_block_stream() in the QEMU dinamically generated code. Will roll back. Andrey

The 'backing-file' argument is not used by the block-stream job. It designates a backing file name to set in QCOW2 image header after the block-stream job finished. A backing file name of the node above base is used instead. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/stream.c | 6 +----- blockdev.c | 21 ++++++--------------- include/block/block_int.h | 2 +- 3 files changed, 8 insertions(+), 21 deletions(-) diff --git a/block/stream.c b/block/stream.c index 51462bd..d3e1812 100644 --- a/block/stream.c +++ b/block/stream.c @@ -34,7 +34,6 @@ typedef struct StreamBlockJob { BlockDriverState *base_overlay; /* COW overlay (stream from this) */ BlockDriverState *above_base; /* Node directly above the base */ BlockdevOnError on_error; - char *backing_file_str; bool bs_read_only; bool chain_frozen; } StreamBlockJob; @@ -103,8 +102,6 @@ static void stream_clean(Job *job) blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort); bdrv_reopen_set_read_only(bs, true, NULL); } - - g_free(s->backing_file_str); } static int coroutine_fn stream_run(Job *job, Error **errp) @@ -220,7 +217,7 @@ static const BlockJobDriver stream_job_driver = { }; void stream_start(const char *job_id, BlockDriverState *bs, - BlockDriverState *base, const char *backing_file_str, + BlockDriverState *base, int creation_flags, int64_t speed, BlockdevOnError on_error, const char *filter_node_name, @@ -295,7 +292,6 @@ void stream_start(const char *job_id, BlockDriverState *bs, s->base_overlay = base_overlay; s->above_base = above_base; - s->backing_file_str = g_strdup(backing_file_str); s->bs_read_only = bs_read_only; s->chain_frozen = true; diff --git a/blockdev.c b/blockdev.c index d719c47..019b6e0 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2498,7 +2498,6 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, BlockDriverState *base_bs = NULL; AioContext *aio_context; Error *local_err = NULL; - const char *base_name = NULL; int job_flags = JOB_DEFAULT; if (!has_on_error) { @@ -2526,7 +2525,6 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, goto out; } assert(bdrv_get_aio_context(base_bs) == aio_context); - base_name = base; } if (has_base_node) { @@ -2541,7 +2539,11 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, } assert(bdrv_get_aio_context(base_bs) == aio_context); bdrv_refresh_filename(base_bs); - base_name = base_bs->filename; + } + + if (has_backing_file) { + warn_report("Use of \"backing-file\" argument is deprecated; " + "a backing file of the node above base is used instead"); } /* Check for op blockers in the whole chain between bs and base */ @@ -2553,17 +2555,6 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, } } - /* if we are streaming the entire chain, the result will have no backing - * file, and specifying one is therefore an error */ - if (base_bs == NULL && has_backing_file) { - error_setg(errp, "backing file specified, but streaming the " - "entire chain"); - goto out; - } - - /* backing_file string overrides base bs filename */ - base_name = has_backing_file ? backing_file : base_name; - if (has_auto_finalize && !auto_finalize) { job_flags |= JOB_MANUAL_FINALIZE; } @@ -2571,7 +2562,7 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, job_flags |= JOB_MANUAL_DISMISS; } - stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name, + stream_start(has_job_id ? job_id : NULL, bs, base_bs, job_flags, has_speed ? speed : 0, on_error, filter_node_name, &local_err); if (local_err) { diff --git a/include/block/block_int.h b/include/block/block_int.h index a142867..4f523c3 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -1151,7 +1151,7 @@ int is_windows_drive(const char *filename); * BlockDriverState. */ void stream_start(const char *job_id, BlockDriverState *bs, - BlockDriverState *base, const char *backing_file_str, + BlockDriverState *base, int creation_flags, int64_t speed, BlockdevOnError on_error, const char *filter_node_name, -- 1.8.3.1

On 12.10.20 19:43, Andrey Shinkevich wrote:
The 'backing-file' argument is not used by the block-stream job. It designates a backing file name to set in QCOW2 image header after the block-stream job finished. A backing file name of the node above base is used instead.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/stream.c | 6 +----- blockdev.c | 21 ++++++--------------- include/block/block_int.h | 2 +- 3 files changed, 8 insertions(+), 21 deletions(-)
I don’t think you can deprecate a parameter and effectively remove support for it in the same release. Max

This patch completes the series with the COR-filter insertion for block-stream operations. Adding the filter makes it possible for copied regions to be discarded in backing files during the block-stream job, what will reduce the disk overuse. The COR-filter insertion incurs changes in the iotests case 245:test_block_stream_4 that reopens the backing chain during a block-stream job. There are changes in the iotests #030 as well. The iotests case 030:test_stream_parallel was deleted due to multiple conflicts between the concurrent job operations over the same backing chain. The base backing node for one job is the top node for another job. It may change due to the filter node inserted into the backing chain while both jobs are running. Another issue is that the parts of the backing chain are being frozen by the running job and may not be changed by the concurrent job when needed. The concept of the parallel jobs with common nodes is considered vital no more. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/stream.c | 93 +++++++++++++++++++++++++++++----------------- tests/qemu-iotests/030 | 51 +++---------------------- tests/qemu-iotests/030.out | 4 +- tests/qemu-iotests/141.out | 2 +- tests/qemu-iotests/245 | 19 +++++++--- 5 files changed, 81 insertions(+), 88 deletions(-) diff --git a/block/stream.c b/block/stream.c index d3e1812..93564db 100644 --- a/block/stream.c +++ b/block/stream.c @@ -17,8 +17,10 @@ #include "block/blockjob_int.h" #include "qapi/error.h" #include "qapi/qmp/qerror.h" +#include "qapi/qmp/qdict.h" #include "qemu/ratelimit.h" #include "sysemu/block-backend.h" +#include "block/copy-on-read.h" enum { /* @@ -33,6 +35,8 @@ typedef struct StreamBlockJob { BlockJob common; BlockDriverState *base_overlay; /* COW overlay (stream from this) */ BlockDriverState *above_base; /* Node directly above the base */ + BlockDriverState *cor_filter_bs; + BlockDriverState *target_bs; BlockdevOnError on_error; bool bs_read_only; bool chain_frozen; @@ -43,8 +47,7 @@ static int coroutine_fn stream_populate(BlockBackend *blk, { assert(bytes < SIZE_MAX); - return blk_co_preadv(blk, offset, bytes, NULL, - BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH); + return blk_co_preadv(blk, offset, bytes, NULL, BDRV_REQ_PREFETCH); } static void stream_abort(Job *job) @@ -52,23 +55,20 @@ static void stream_abort(Job *job) StreamBlockJob *s = container_of(job, StreamBlockJob, common.job); if (s->chain_frozen) { - BlockJob *bjob = &s->common; - bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base); + bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base); } } static int stream_prepare(Job *job) { StreamBlockJob *s = container_of(job, StreamBlockJob, common.job); - BlockJob *bjob = &s->common; - BlockDriverState *bs = blk_bs(bjob->blk); - BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs); + BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs); BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base); BlockDriverState *base_unfiltered = bdrv_skip_filters(base); Error *local_err = NULL; int ret = 0; - bdrv_unfreeze_backing_chain(bs, s->above_base); + bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base); s->chain_frozen = false; if (bdrv_cow_child(unfiltered_bs)) { @@ -94,13 +94,14 @@ static void stream_clean(Job *job) { StreamBlockJob *s = container_of(job, StreamBlockJob, common.job); BlockJob *bjob = &s->common; - BlockDriverState *bs = blk_bs(bjob->blk); + + bdrv_cor_filter_drop(s->cor_filter_bs); /* Reopen the image back in read-only mode if necessary */ if (s->bs_read_only) { /* Give up write permissions before making it read-only */ blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort); - bdrv_reopen_set_read_only(bs, true, NULL); + bdrv_reopen_set_read_only(s->target_bs, true, NULL); } } @@ -108,9 +109,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp) { StreamBlockJob *s = container_of(job, StreamBlockJob, common.job); BlockBackend *blk = s->common.blk; - BlockDriverState *bs = blk_bs(blk); - BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs); - bool enable_cor = !bdrv_cow_child(s->base_overlay); + BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs); int64_t len; int64_t offset = 0; uint64_t delay_ns = 0; @@ -122,21 +121,12 @@ static int coroutine_fn stream_run(Job *job, Error **errp) return 0; } - len = bdrv_getlength(bs); + len = bdrv_getlength(s->target_bs); if (len < 0) { return len; } job_progress_set_remaining(&s->common.job, len); - /* Turn on copy-on-read for the whole block device so that guest read - * requests help us make progress. Only do this when copying the entire - * backing chain since the copy-on-read operation does not take base into - * account. - */ - if (enable_cor) { - bdrv_enable_copy_on_read(bs); - } - for ( ; offset < len; offset += n) { bool copy; int ret; @@ -195,10 +185,6 @@ static int coroutine_fn stream_run(Job *job, Error **errp) } } - if (enable_cor) { - bdrv_disable_copy_on_read(bs); - } - /* Do not remove the backing file if an error was there but ignored. */ return error; } @@ -228,6 +214,7 @@ void stream_start(const char *job_id, BlockDriverState *bs, bool bs_read_only; int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED; BlockDriverState *base_overlay = bdrv_find_overlay(bs, base); + BlockDriverState *cor_filter_bs = NULL; BlockDriverState *above_base; if (!base_overlay) { @@ -262,17 +249,48 @@ void stream_start(const char *job_id, BlockDriverState *bs, } } - /* Prevent concurrent jobs trying to modify the graph structure here, we - * already have our own plans. Also don't allow resize as the image size is - * queried only at the job start and then cached. */ - s = block_job_create(job_id, &stream_job_driver, NULL, bs, - basic_flags | BLK_PERM_GRAPH_MOD, - basic_flags | BLK_PERM_WRITE, + QDict *opts = qdict_new(); + + qdict_put_str(opts, "driver", "copy-on-read"); + qdict_put_str(opts, "file", bdrv_get_node_name(bs)); + if (base_overlay) { + /* Pass the base_overlay rather than base */ + qdict_put_str(opts, "base", base_overlay->node_name); + } + if (filter_node_name) { + qdict_put_str(opts, "node-name", filter_node_name); + } + + cor_filter_bs = bdrv_cor_filter_append(bs, opts, BDRV_O_RDWR, errp); + if (cor_filter_bs == NULL) { + goto fail; + } + + if (bdrv_freeze_backing_chain(cor_filter_bs, bs, errp) < 0) { + bdrv_cor_filter_drop(cor_filter_bs); + cor_filter_bs = NULL; + goto fail; + } + + s = block_job_create(job_id, &stream_job_driver, NULL, cor_filter_bs, + BLK_PERM_CONSISTENT_READ, + basic_flags | BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD, speed, creation_flags, NULL, NULL, errp); if (!s) { goto fail; } + /* + * Prevent concurrent jobs trying to modify the graph structure here, we + * already have our own plans. Also don't allow resize as the image size is + * queried only at the job start and then cached. + */ + if (block_job_add_bdrv(&s->common, "active node", bs, + basic_flags | BLK_PERM_GRAPH_MOD, + basic_flags | BLK_PERM_WRITE, &error_abort)) { + goto fail; + } + /* Block all intermediate nodes between bs and base, because they will * disappear from the chain after this operation. The streaming job reads * every block only once, assuming that it doesn't change, so forbid writes @@ -292,6 +310,8 @@ void stream_start(const char *job_id, BlockDriverState *bs, s->base_overlay = base_overlay; s->above_base = above_base; + s->cor_filter_bs = cor_filter_bs; + s->target_bs = bs; s->bs_read_only = bs_read_only; s->chain_frozen = true; @@ -304,5 +324,10 @@ fail: if (bs_read_only) { bdrv_reopen_set_read_only(bs, true, NULL); } - bdrv_unfreeze_backing_chain(bs, above_base); + if (cor_filter_bs) { + bdrv_unfreeze_backing_chain(cor_filter_bs, above_base); + bdrv_cor_filter_drop(cor_filter_bs); + } else { + bdrv_unfreeze_backing_chain(bs, above_base); + } } diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030 index dcb4b5d..0064590 100755 --- a/tests/qemu-iotests/030 +++ b/tests/qemu-iotests/030 @@ -227,61 +227,20 @@ class TestParallelOps(iotests.QMPTestCase): for img in self.imgs: os.remove(img) - # Test that it's possible to run several block-stream operations - # in parallel in the same snapshot chain - @unittest.skipIf(os.environ.get('QEMU_CHECK_BLOCK_AUTO'), 'disabled in CI') - def test_stream_parallel(self): - self.assert_no_active_block_jobs() - - # Check that the maps don't match before the streaming operations - for i in range(2, self.num_imgs, 2): - self.assertNotEqual(qemu_io('-f', iotests.imgfmt, '-rU', '-c', 'map', self.imgs[i]), - qemu_io('-f', iotests.imgfmt, '-rU', '-c', 'map', self.imgs[i-1]), - 'image file map matches backing file before streaming') - - # Create all streaming jobs - pending_jobs = [] - for i in range(2, self.num_imgs, 2): - node_name = 'node%d' % i - job_id = 'stream-%s' % node_name - pending_jobs.append(job_id) - result = self.vm.qmp('block-stream', device=node_name, job_id=job_id, base=self.imgs[i-2], speed=1024) - self.assert_qmp(result, 'return', {}) - - for job in pending_jobs: - result = self.vm.qmp('block-job-set-speed', device=job, speed=0) - self.assert_qmp(result, 'return', {}) - - # Wait for all jobs to be finished. - while len(pending_jobs) > 0: - for event in self.vm.get_qmp_events(wait=True): - if event['event'] == 'BLOCK_JOB_COMPLETED': - job_id = self.dictpath(event, 'data/device') - self.assertTrue(job_id in pending_jobs) - self.assert_qmp_absent(event, 'data/error') - pending_jobs.remove(job_id) - - self.assert_no_active_block_jobs() - self.vm.shutdown() - - # Check that all maps match now - for i in range(2, self.num_imgs, 2): - self.assertEqual(qemu_io('-f', iotests.imgfmt, '-c', 'map', self.imgs[i]), - qemu_io('-f', iotests.imgfmt, '-c', 'map', self.imgs[i-1]), - 'image file map does not match backing file after streaming') - # Test that it's not possible to perform two block-stream # operations if there are nodes involved in both. def test_overlapping_1(self): self.assert_no_active_block_jobs() # Set a speed limit to make sure that this job blocks the rest - result = self.vm.qmp('block-stream', device='node4', job_id='stream-node4', base=self.imgs[1], speed=1024*1024) + result = self.vm.qmp('block-stream', device='node4', + job_id='stream-node4', base=self.imgs[1], + filter_node_name='stream-filter', speed=1024*1024) self.assert_qmp(result, 'return', {}) result = self.vm.qmp('block-stream', device='node5', job_id='stream-node5', base=self.imgs[2]) self.assert_qmp(result, 'error/desc', - "Node 'node4' is busy: block device is in use by block job: stream") + "Node 'stream-filter' is busy: block device is in use by block job: stream") result = self.vm.qmp('block-stream', device='node3', job_id='stream-node3', base=self.imgs[2]) self.assert_qmp(result, 'error/desc', @@ -294,7 +253,7 @@ class TestParallelOps(iotests.QMPTestCase): # block-commit should also fail if it touches nodes used by the stream job result = self.vm.qmp('block-commit', device='drive0', base=self.imgs[4], job_id='commit-node4') self.assert_qmp(result, 'error/desc', - "Node 'node4' is busy: block device is in use by block job: stream") + "Node 'stream-filter' is busy: block device is in use by block job: stream") result = self.vm.qmp('block-commit', device='drive0', base=self.imgs[1], top=self.imgs[3], job_id='commit-node1') self.assert_qmp(result, 'error/desc', diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out index 6d9bee1..5eb508d 100644 --- a/tests/qemu-iotests/030.out +++ b/tests/qemu-iotests/030.out @@ -1,5 +1,5 @@ -........................... +.......................... ---------------------------------------------------------------------- -Ran 27 tests +Ran 26 tests OK diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out index 08e0aec..028a16f 100644 --- a/tests/qemu-iotests/141.out +++ b/tests/qemu-iotests/141.out @@ -99,7 +99,7 @@ wrote 1048576/1048576 bytes at offset 0 {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}} {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}} {'execute': 'blockdev-del', 'arguments': {'node-name': 'drv0'}} -{"error": {"class": "GenericError", "desc": "Node drv0 is in use"}} +{"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: stream"}} {'execute': 'block-job-cancel', 'arguments': {'device': 'job0'}} {"return": {}} {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}} diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245 index e60c832..940e85a 100755 --- a/tests/qemu-iotests/245 +++ b/tests/qemu-iotests/245 @@ -899,17 +899,26 @@ class TestBlockdevReopen(iotests.QMPTestCase): # make hd1 read-only and block-stream requires it to be read-write # (Which error message appears depends on whether the stream job is # already done with copying at this point.) - self.reopen(opts, {}, + # As the COR-filter node is inserted into the backing chain with the + # 'block-stream' operation, we move the options to their proper nodes. + opts = hd_opts(1) + opts['backing'] = hd_opts(2) + opts['backing']['backing'] = None + self.reopen(opts, {'read-only': True}, ["Can't set node 'hd1' to r/o with copy-on-read enabled", "Cannot make block node read-only, there is a writer on it"]) # We can't remove hd2 while the stream job is ongoing - opts['backing']['backing'] = None - self.reopen(opts, {'backing.read-only': False}, "Cannot change 'backing' link from 'hd1' to 'hd2'") + opts['backing'] = None + self.reopen(opts, {'read-only': False}, + "Cannot change 'backing' link from 'hd1' to 'hd2'") - # We can detach hd1 from hd0 because it doesn't affect the stream job + # We can't detach hd1 from hd0 because there is the COR-filter implicit + # node in between. + opts = hd_opts(0) opts['backing'] = None - self.reopen(opts) + self.reopen(opts, {}, + "Cannot change backing link if 'hd0' has an implicit backing file") self.vm.run_job('stream0', auto_finalize = False, auto_dismiss = True) -- 1.8.3.1

On 12.10.20 19:43, Andrey Shinkevich wrote:
This patch completes the series with the COR-filter insertion for block-stream operations. Adding the filter makes it possible for copied regions to be discarded in backing files during the block-stream job, what will reduce the disk overuse. The COR-filter insertion incurs changes in the iotests case 245:test_block_stream_4 that reopens the backing chain during a block-stream job. There are changes in the iotests #030 as well. The iotests case 030:test_stream_parallel was deleted due to multiple conflicts between the concurrent job operations over the same backing chain. The base backing node for one job is the top node for another job. It may change due to the filter node inserted into the backing chain while both jobs are running. Another issue is that the parts of the backing chain are being frozen by the running job and may not be changed by the concurrent job when needed. The concept of the parallel jobs with common nodes is considered vital no more.
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> --- block/stream.c | 93 +++++++++++++++++++++++++++++----------------- tests/qemu-iotests/030 | 51 +++---------------------- tests/qemu-iotests/030.out | 4 +- tests/qemu-iotests/141.out | 2 +- tests/qemu-iotests/245 | 19 +++++++--- 5 files changed, 81 insertions(+), 88 deletions(-)
Looks like stream_run() could be a bit streamlined now (the allocation checking should be unnecessary, unconditionally calling stream_populate() should be sufficient), but not necessary now.
diff --git a/block/stream.c b/block/stream.c index d3e1812..93564db 100644 --- a/block/stream.c +++ b/block/stream.c
[...]
@@ -94,13 +94,14 @@ static void stream_clean(Job *job) { StreamBlockJob *s = container_of(job, StreamBlockJob, common.job); BlockJob *bjob = &s->common; - BlockDriverState *bs = blk_bs(bjob->blk); + + bdrv_cor_filter_drop(s->cor_filter_bs);
/* Reopen the image back in read-only mode if necessary */ if (s->bs_read_only) { /* Give up write permissions before making it read-only */ blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
Perhaps it would be good to do so even before the filter is dropped. I don’t know whether moving bjob->blk from cor_filter_bs to target_bs might cause problems otherwise.
- bdrv_reopen_set_read_only(bs, true, NULL); + bdrv_reopen_set_read_only(s->target_bs, true, NULL); } }
[...]
@@ -262,17 +249,48 @@ void stream_start(const char *job_id, BlockDriverState *bs, } }
- /* Prevent concurrent jobs trying to modify the graph structure here, we - * already have our own plans. Also don't allow resize as the image size is - * queried only at the job start and then cached. */ - s = block_job_create(job_id, &stream_job_driver, NULL, bs, - basic_flags | BLK_PERM_GRAPH_MOD, - basic_flags | BLK_PERM_WRITE, + QDict *opts = qdict_new();
Declaration should be done at the start of the block.
+ + qdict_put_str(opts, "driver", "copy-on-read"); + qdict_put_str(opts, "file", bdrv_get_node_name(bs)); + if (base_overlay) {
@base_overlay is always non-NULL, this condition should check @base, I think.
+ /* Pass the base_overlay rather than base */ + qdict_put_str(opts, "base", base_overlay->node_name); + } + if (filter_node_name) { + qdict_put_str(opts, "node-name", filter_node_name); + } + + cor_filter_bs = bdrv_cor_filter_append(bs, opts, BDRV_O_RDWR, errp); + if (cor_filter_bs == NULL) { + goto fail; + } + + if (bdrv_freeze_backing_chain(cor_filter_bs, bs, errp) < 0) {
Is there a reason why we can’t combine this with the bdrv_free_backing_chain() from bs down to above_base? I mean, the effect should be the same, just asking.
+ bdrv_cor_filter_drop(cor_filter_bs); + cor_filter_bs = NULL; + goto fail; + } + + s = block_job_create(job_id, &stream_job_driver, NULL, cor_filter_bs, + BLK_PERM_CONSISTENT_READ, + basic_flags | BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD,
Not that I’m an expert on the GRAPH_MOD permission, but why is this shared here but not below? Shouldn’t it be the same in both cases? (Same for taking it as a permission.)
speed, creation_flags, NULL, NULL, errp); if (!s) { goto fail; }
+ /* + * Prevent concurrent jobs trying to modify the graph structure here, we + * already have our own plans. Also don't allow resize as the image size is + * queried only at the job start and then cached. + */ + if (block_job_add_bdrv(&s->common, "active node", bs, + basic_flags | BLK_PERM_GRAPH_MOD, + basic_flags | BLK_PERM_WRITE, &error_abort)) { + goto fail; + } + /* Block all intermediate nodes between bs and base, because they will * disappear from the chain after this operation. The streaming job reads * every block only once, assuming that it doesn't change, so forbid writes
[...]
diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245 index e60c832..940e85a 100755 --- a/tests/qemu-iotests/245 +++ b/tests/qemu-iotests/245 @@ -899,17 +899,26 @@ class TestBlockdevReopen(iotests.QMPTestCase): # make hd1 read-only and block-stream requires it to be read-write # (Which error message appears depends on whether the stream job is # already done with copying at this point.)
Hm. Let’s look at the set of messages below... [1]
- self.reopen(opts, {}, + # As the COR-filter node is inserted into the backing chain with the + # 'block-stream' operation, we move the options to their proper nodes. + opts = hd_opts(1)
Oh, so this patch changes it so that only the subtree below hd1 is reopened, and we don’t have to deal with the filter options. Got it. (I think.)
+ opts['backing'] = hd_opts(2) + opts['backing']['backing'] = None + self.reopen(opts, {'read-only': True}, ["Can't set node 'hd1' to r/o with copy-on-read enabled",
[1] This isn’t done anymore as of this patch. So I don’t think this error message can still appear. Will some other message appear in its stead, or is it always going to be the second one?
"Cannot make block node read-only, there is a writer on it"])
# We can't remove hd2 while the stream job is ongoing - opts['backing']['backing'] = None - self.reopen(opts, {'backing.read-only': False}, "Cannot change 'backing' link from 'hd1' to 'hd2'") + opts['backing'] = None + self.reopen(opts, {'read-only': False}, + "Cannot change 'backing' link from 'hd1' to 'hd2'")
- # We can detach hd1 from hd0 because it doesn't affect the stream job + # We can't detach hd1 from hd0 because there is the COR-filter implicit + # node in between. + opts = hd_opts(0) opts['backing'] = None - self.reopen(opts) + self.reopen(opts, {}, + "Cannot change backing link if 'hd0' has an implicit backing file")
Does “has an implicit backing file” mean that hd0 has an implicit node (the COR filter) as its backing file? And then reopening isn’t allowed because the user supposedly doesn’t know about that implicit node? If so, makes sense. Max
self.vm.run_job('stream0', auto_finalize = False, auto_dismiss = True)

On 14.10.2020 19:24, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
[...]
--- block/stream.c | 93 +++++++++++++++++++++++++++++----------------- tests/qemu-iotests/030 | 51 +++---------------------- tests/qemu-iotests/030.out | 4 +- tests/qemu-iotests/141.out | 2 +- tests/qemu-iotests/245 | 19 +++++++--- 5 files changed, 81 insertions(+), 88 deletions(-)
Looks like stream_run() could be a bit streamlined now (the allocation checking should be unnecessary, unconditionally calling stream_populate() should be sufficient), but not necessary now.
That is what I had kept in my mind when I tackled this patch. But there is an underwater reef to streamline. Namely, how the block-stream job gets known about a long unallocated tail to exit the loop earlier in the stream_run(). Shall we return the '-EOF' or another error code from the cor_co_preadv_part() to be handled by the stream_run()? Any other suggestions, if any, will be appreciated.
diff --git a/block/stream.c b/block/stream.c index d3e1812..93564db 100644 --- a/block/stream.c +++ b/block/stream.c
[...]
+ + cor_filter_bs = bdrv_cor_filter_append(bs, opts, BDRV_O_RDWR, errp); + if (cor_filter_bs == NULL) { + goto fail; + } + + if (bdrv_freeze_backing_chain(cor_filter_bs, bs, errp) < 0) {
Is there a reason why we can’t combine this with the bdrv_free_backing_chain() from bs down to above_base? I mean, the effect should be the same, just asking.
The bdrv_freeze_backing_chain(bs, above_base, errp) is called before the bdrv_reopen_set_read_only() to keep the backing chain safe during the context switch. Then we will want to freeze the 'COR -> TOP BS' link as well. Freezing/unfreezing parts is simlier to manage than doing that with the whole chain. If we decide to invoke the bdrv_reopen_set_read_only() after freezing the backing chain together with the COR-filter, we will not be able to get the 'write' permission on the read-only node.
+ bdrv_cor_filter_drop(cor_filter_bs); + cor_filter_bs = NULL; + goto fail; + } + + s = block_job_create(job_id, &stream_job_driver, NULL, cor_filter_bs, + BLK_PERM_CONSISTENT_READ, + basic_flags | BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD,
Not that I’m an expert on the GRAPH_MOD permission, but why is this shared here but not below? Shouldn’t it be the same in both cases? (Same for taking it as a permission.)
When we invoke the block_job_add_bdrv(&s->common, "active node", bs,..) below (particularly, we need it to block the operations on the top node, bdrv_op_block_all()), we ask for the GRAPH_MOD permission for the top node. To allow that, the parent filter node should share that permission for the underlying node. Otherwise, we get assertion failed in the bdrv_check_update_perm() called from bdrv_replace_node() when we remove the filter.
speed, creation_flags, NULL, NULL, errp); if (!s) { goto fail; }
+ /* + * Prevent concurrent jobs trying to modify the graph structure here, we + * already have our own plans. Also don't allow resize as the image size is + * queried only at the job start and then cached. + */ + if (block_job_add_bdrv(&s->common, "active node", bs, + basic_flags | BLK_PERM_GRAPH_MOD, + basic_flags | BLK_PERM_WRITE, &error_abort)) { + goto fail; + } + /* Block all intermediate nodes between bs and base, because they will * disappear from the chain after this operation. The streaming job reads * every block only once, assuming that it doesn't change, so forbid writes
[...]
diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245 index e60c832..940e85a 100755 --- a/tests/qemu-iotests/245 +++ b/tests/qemu-iotests/245 @@ -899,17 +899,26 @@ class TestBlockdevReopen(iotests.QMPTestCase): # make hd1 read-only and block-stream requires it to be read-write # (Which error message appears depends on whether the stream job is # already done with copying at this point.)
Hm. Let’s look at the set of messages below... [1]
- self.reopen(opts, {}, + # As the COR-filter node is inserted into the backing chain with the + # 'block-stream' operation, we move the options to their proper nodes. + opts = hd_opts(1)
Oh, so this patch changes it so that only the subtree below hd1 is reopened, and we don’t have to deal with the filter options. Got it. (I think.)
Yes, that's right.
+ opts['backing'] = hd_opts(2) + opts['backing']['backing'] = None + self.reopen(opts, {'read-only': True}, ["Can't set node 'hd1' to r/o with copy-on-read enabled",
[1]
This isn’t done anymore as of this patch. So I don’t think this error message can still appear. Will some other message appear in its stead, or is it always going to be the second one?
The only second message appears in the test case when I run it on my node. So, I will remove the first one as the bdrv_enable_copy_on_read() is not called for the top BS on the frozen backing chain anymore. Also, I will delet the part of the comment: "(Which error message appears depends on whether the stream job is already done with copying at this point.)"
"Cannot make block node read-only, there is a writer on it"])
# We can't remove hd2 while the stream job is ongoing - opts['backing']['backing'] = None - self.reopen(opts, {'backing.read-only': False}, "Cannot change 'backing' link from 'hd1' to 'hd2'") + opts['backing'] = None + self.reopen(opts, {'read-only': False}, + "Cannot change 'backing' link from 'hd1' to 'hd2'")
- # We can detach hd1 from hd0 because it doesn't affect the stream job + # We can't detach hd1 from hd0 because there is the COR-filter implicit + # node in between. + opts = hd_opts(0) opts['backing'] = None - self.reopen(opts) + self.reopen(opts, {}, + "Cannot change backing link if 'hd0' has an implicit backing file")
Does “has an implicit backing file” mean that hd0 has an implicit node (the COR filter) as its backing file? And then reopening isn’t allowed because the user supposedly doesn’t know about that implicit node? If so, makes sense.
Yes, it is. Andrey
Max
self.vm.run_job('stream0', auto_finalize = False, auto_dismiss = True)

On 15.10.2020 20:16, Andrey Shinkevich wrote:
On 14.10.2020 19:24, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
[...]
--- block/stream.c | 93 +++++++++++++++++++++++++++++----------------- tests/qemu-iotests/030 | 51 +++---------------------- tests/qemu-iotests/030.out | 4 +- tests/qemu-iotests/141.out | 2 +- tests/qemu-iotests/245 | 19 +++++++--- 5 files changed, 81 insertions(+), 88 deletions(-)
Looks like stream_run() could be a bit streamlined now (the allocation checking should be unnecessary, unconditionally calling stream_populate() should be sufficient), but not necessary now.
That is what I had kept in my mind when I tackled this patch. But there is an underwater reef to streamline. Namely, how the block-stream job gets known about a long unallocated tail to exit the loop earlier in the stream_run(). Shall we return the '-EOF' or another error code from the cor_co_preadv_part() to be handled by the stream_run()? Any other suggestions, if any, will be appreciated.
diff --git a/block/stream.c b/block/stream.c index d3e1812..93564db 100644 --- a/block/stream.c +++ b/block/stream.c
[...]
+ + cor_filter_bs = bdrv_cor_filter_append(bs, opts, BDRV_O_RDWR, errp); + if (cor_filter_bs == NULL) { + goto fail; + } + + if (bdrv_freeze_backing_chain(cor_filter_bs, bs, errp) < 0) {
Is there a reason why we can’t combine this with the bdrv_free_backing_chain() from bs down to above_base? I mean, the effect should be the same, just asking.
The bdrv_freeze_backing_chain(bs, above_base, errp) is called before the bdrv_reopen_set_read_only() to keep the backing chain safe during the context switch. Then we will want to freeze the 'COR -> TOP BS' link as well. Freezing/unfreezing parts is simlier to manage than doing that with the whole chain. If we decide to invoke the bdrv_reopen_set_read_only() after freezing the backing chain together with the COR-filter, we will not be able to get the 'write' permission on the read-only node.
+ bdrv_cor_filter_drop(cor_filter_bs); + cor_filter_bs = NULL; + goto fail; + } + + s = block_job_create(job_id, &stream_job_driver, NULL, cor_filter_bs, + BLK_PERM_CONSISTENT_READ, + basic_flags | BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD,
Not that I’m an expert on the GRAPH_MOD permission, but why is this shared here but not below? Shouldn’t it be the same in both cases? (Same for taking it as a permission.)
When we invoke the block_job_add_bdrv(&s->common, "active node", bs,..) below (particularly, we need it to block the operations on the top node, bdrv_op_block_all()), we ask for the GRAPH_MOD permission for the top node. To allow that, the parent filter node should share that permission for the underlying node. Otherwise, we get assertion failed in the bdrv_check_update_perm() called from bdrv_replace_node() when we remove the filter.
I will add my comments above to the code. Andrey [...]

15.10.2020 20:16, Andrey Shinkevich wrote:
On 14.10.2020 19:24, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
[...]
--- block/stream.c | 93 +++++++++++++++++++++++++++++----------------- tests/qemu-iotests/030 | 51 +++---------------------- tests/qemu-iotests/030.out | 4 +- tests/qemu-iotests/141.out | 2 +- tests/qemu-iotests/245 | 19 +++++++--- 5 files changed, 81 insertions(+), 88 deletions(-)
Looks like stream_run() could be a bit streamlined now (the allocation checking should be unnecessary, unconditionally calling stream_populate() should be sufficient), but not necessary now.
That is what I had kept in my mind when I tackled this patch. But there is an underwater reef to streamline. Namely, how the block-stream job gets known about a long unallocated tail to exit the loop earlier in the stream_run(). Shall we return the '-EOF' or another error code from the cor_co_preadv_part() to be handled by the stream_run()? Any other suggestions, if any, will be appreciated.
Just calling read CHUNK by CHUNK may be less efficient than is_allocated()-driven loop: you may end up with splitting regions unaligned to CHUNK-granularity, which would not be splitted with is_allocated()-driven loop. Current loop allows chunks unaligned to CHUNK. So, I think, it's better to keep is_allocated() logic as is for now. -- Best regards, Vladimir

On 16.10.2020 18:45, Vladimir Sementsov-Ogievskiy wrote:
15.10.2020 20:16, Andrey Shinkevich wrote:
On 14.10.2020 19:24, Max Reitz wrote:
On 12.10.20 19:43, Andrey Shinkevich wrote:
[...]
--- block/stream.c | 93 +++++++++++++++++++++++++++++----------------- tests/qemu-iotests/030 | 51 +++---------------------- tests/qemu-iotests/030.out | 4 +- tests/qemu-iotests/141.out | 2 +- tests/qemu-iotests/245 | 19 +++++++--- 5 files changed, 81 insertions(+), 88 deletions(-)
Looks like stream_run() could be a bit streamlined now (the allocation checking should be unnecessary, unconditionally calling stream_populate() should be sufficient), but not necessary now.
That is what I had kept in my mind when I tackled this patch. But there is an underwater reef to streamline. Namely, how the block-stream job gets known about a long unallocated tail to exit the loop earlier in the stream_run(). Shall we return the '-EOF' or another error code from the cor_co_preadv_part() to be handled by the stream_run()? Any other suggestions, if any, will be appreciated.
Just calling read CHUNK by CHUNK may be less efficient than is_allocated()-driven loop: you may end up with splitting regions unaligned to CHUNK-granularity, which would not be splitted with is_allocated()-driven loop. Current loop allows chunks unaligned to CHUNK.
The cor_co_preadv_part() will check for the end of a file in the next version. So, the unalignment is not going to be the issue. Andrey
So, I think, it's better to keep is_allocated() logic as is for now.
participants (3)
-
Andrey Shinkevich
-
Max Reitz
-
Vladimir Sementsov-Ogievskiy