[PATCH 0/4] qemu: fix block jobs when copy_on_read is enabled

QEMU requires us to pass the topmost node(-name) in the backing chain as the 'device' argument for the blockdev-mirror/block-commit commands. Otherwise QEMU complains: "Need a root block node" Since libvirt always puts the copy-on-read driver on top of the backing chain, blockjobs on disks which use "copy_on_read" fail. The series below fixes that by passing the proper node. This fixes it just fine for the mirror, but for block-commit we still get another error due to a bug in qemu: "'libvirt-4-format' is not in this backing file chain" Additionally one weird thing happens with block-stream (blockpull). If I pass the layer below the copy-on-read as 'device', everything works. This is kind of weird as the documentation for that is the same. If I pass the nodename of copy-on-read. Max, Kevin, could you please comment on the block-stream part and possibly also on the plans to fix the issue with block-commit? Cc: Max Reitz <mreitz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> https://bugzilla.redhat.com/show_bug.cgi?id=1792195 Peter Krempa (4): qemu: blockcopy: Actually unplug unused images when mirror job fails to start qemu: domain: Extract code to determine topmost nodename to qemuDomainDiskGetTopNodename qemu: Fix value of 'device' argument for blockdev-mirror qemu: Fix value of 'device' argument for block-commit src/qemu/qemu_domain.c | 38 ++++++++++++++++++++++++++------------ src/qemu/qemu_domain.h | 4 ++++ src/qemu/qemu_driver.c | 6 +++--- src/qemu/qemu_migration.c | 2 +- 4 files changed, 34 insertions(+), 16 deletions(-) -- 2.24.1

If a mirror job fails to start in -blockdev mode we'd not unplug the backing files we added first because the code on the error path checked the wrong value. 'rc' is used as status of the code which added the images, but the state of the 'block(dev)-mirror' call is stored in 'ret' at that point. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_driver.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 0a1dc414ae..3218dc0e23 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -18413,7 +18413,7 @@ qemuDomainBlockCopyCommon(virDomainObjPtr vm, qemuBlockJobStarted(job, vm); endjob: - if (rc < 0 && + if (ret < 0 && virDomainObjIsActive(vm) && (data || crdata)) { qemuDomainObjEnterMonitor(driver, vm); -- 2.24.1

On 1/23/20 9:57 AM, Peter Krempa wrote:
If a mirror job fails to start in -blockdev mode we'd not unplug the backing files we added first because the code on the error path checked the wrong value. 'rc' is used as status of the code which added the images, but the state of the 'block(dev)-mirror' call is stored in 'ret' at that point.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_driver.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
The patch itself lacks the context to show it is correct, but I looked at the file to confirm it makes sense. Reviewed-by: Eric Blake <eblake@redhat.com>
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 0a1dc414ae..3218dc0e23 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -18413,7 +18413,7 @@ qemuDomainBlockCopyCommon(virDomainObjPtr vm, qemuBlockJobStarted(job, vm);
endjob: - if (rc < 0 && + if (ret < 0 && virDomainObjIsActive(vm) && (data || crdata)) { qemuDomainObjEnterMonitor(driver, vm);
-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org

There are more places which require getting the topmost nodename to be passed to qemu. Separate it out into a new function. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_domain.c | 38 ++++++++++++++++++++++++++------------ src/qemu/qemu_domain.h | 4 ++++ 2 files changed, 30 insertions(+), 12 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 38addc7b61..4604b6c993 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11530,6 +11530,31 @@ qemuDomainDetermineDiskChain(virQEMUDriverPtr driver, } +/** + * qemuDomainDiskGetTopNodename: + * + * @disk: disk definition object + * + * Returns the pointer to the node-name of the topmost layer used by @disk as + * backend. Currently returns the nodename of the copy-on-read filter if enabled + * or the nodename of the top image's format driver. Empty disks return NULL. + * This must be used only when VIR_QEMU_CAPS_BLOCKDEV is enabled. + */ +const char * +qemuDomainDiskGetTopNodename(virDomainDiskDefPtr disk) +{ + qemuDomainDiskPrivatePtr priv = QEMU_DOMAIN_DISK_PRIVATE(disk); + + if (virStorageSourceIsEmpty(disk->src)) + return NULL; + + if (disk->copy_on_read == VIR_TRISTATE_SWITCH_ON) + return priv->nodeCopyOnRead; + + return disk->src->nodeformat; +} + + /** * qemuDomainDiskGetBackendAlias: * @disk: disk definition @@ -11549,8 +11574,6 @@ qemuDomainDiskGetBackendAlias(virDomainDiskDefPtr disk, virQEMUCapsPtr qemuCaps, char **backendAlias) { - qemuDomainDiskPrivatePtr priv = QEMU_DOMAIN_DISK_PRIVATE(disk); - const char *nodename = NULL; *backendAlias = NULL; if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_BLOCKDEV)) { @@ -11560,16 +11583,7 @@ qemuDomainDiskGetBackendAlias(virDomainDiskDefPtr disk, return 0; } - if (virStorageSourceIsEmpty(disk->src)) - return 0; - - if (disk->copy_on_read == VIR_TRISTATE_SWITCH_ON) - nodename = priv->nodeCopyOnRead; - else - nodename = disk->src->nodeformat; - - *backendAlias = g_strdup(nodename); - + *backendAlias = g_strdup(qemuDomainDiskGetTopNodename(disk)); return 0; } diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 21ece23177..c8bf621c61 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -874,6 +874,10 @@ int qemuDomainStorageFileInit(virQEMUDriverPtr driver, virStorageSourcePtr parent); char *qemuDomainStorageAlias(const char *device, int depth); +const char * +qemuDomainDiskGetTopNodename(virDomainDiskDefPtr disk) + ATTRIBUTE_NONNULL(1); + int qemuDomainDiskGetBackendAlias(virDomainDiskDefPtr disk, virQEMUCapsPtr qemuCaps, char **backendAlias) -- 2.24.1

On 1/23/20 9:57 AM, Peter Krempa wrote:
There are more places which require getting the topmost nodename to be passed to qemu. Separate it out into a new function.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_domain.c | 38 ++++++++++++++++++++++++++------------ src/qemu/qemu_domain.h | 4 ++++ 2 files changed, 30 insertions(+), 12 deletions(-)
Reviewed-by: Eric Blake <eblake@redhat.com> -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org

When using blockdev configurations the 'device' argument of 'blockdev-mirror' must correspond to the topmost node in the block node graph. Libvirt didn't do this properly in case when 'copy_on_read' option was enabled on the disk. Use qemuDomainDiskGetTopNodename to fix it for the blockdev-mirror calls in qemuDomainBlockCopy and the non-shared-storage migration. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_driver.c | 2 +- src/qemu/qemu_migration.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 3218dc0e23..9e2a94306c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -18386,7 +18386,7 @@ qemuDomainBlockCopyCommon(virDomainObjPtr vm, if (blockdev) { ret = qemuMonitorBlockdevMirror(priv->mon, job->name, true, - disk->src->nodeformat, + qemuDomainDiskGetTopNodename(disk), mirror->nodeformat, bandwidth, granularity, buf_size, mirror_shallow); } else { diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 29d228a8d9..d7814208a2 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -931,7 +931,7 @@ qemuMigrationSrcNBDStorageCopyOne(virQEMUDriverPtr driver, if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKDEV)) { jobname = diskAlias; - sourcename = disk->src->nodeformat; + sourcename = qemuDomainDiskGetTopNodename(disk); persistjob = true; } else { jobname = NULL; -- 2.24.1

On 1/23/20 9:57 AM, Peter Krempa wrote:
When using blockdev configurations the 'device' argument of 'blockdev-mirror' must correspond to the topmost node in the block node graph. Libvirt didn't do this properly in case when 'copy_on_read' option was enabled on the disk.
Use qemuDomainDiskGetTopNodename to fix it for the blockdev-mirror calls in qemuDomainBlockCopy and the non-shared-storage migration.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_driver.c | 2 +- src/qemu/qemu_migration.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
Reviewed-by: Eric Blake <eblake@redhat.com> -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org

When using blockdev configurations the 'device' argument of 'blockdev-commit' must correspond to the topmost node in the block node graph. Libvirt didn't do this properly in case when 'copy_on_read' option was enabled on the disk. Use qemuDomainDiskGetTopNodename to fix it when calling block-commit. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_driver.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 9e2a94306c..0b23c747ce 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -18813,7 +18813,7 @@ qemuDomainBlockCommit(virDomainPtr dom, jobname = job->name; nodetop = topSource->nodeformat; nodebase = baseSource->nodeformat; - device = disk->src->nodeformat; + device = qemuDomainDiskGetTopNodename(disk); if (!backingPath && top_parent && !(backingPath = qemuBlockGetBackingStoreString(baseSource))) goto endjob; -- 2.24.1

On 1/23/20 9:57 AM, Peter Krempa wrote:
When using blockdev configurations the 'device' argument of 'blockdev-commit' must correspond to the topmost node in the block node graph. Libvirt didn't do this properly in case when 'copy_on_read' option was enabled on the disk.
Use qemuDomainDiskGetTopNodename to fix it when calling block-commit.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_driver.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Eric Blake <eblake@redhat.com>
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 9e2a94306c..0b23c747ce 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -18813,7 +18813,7 @@ qemuDomainBlockCommit(virDomainPtr dom, jobname = job->name; nodetop = topSource->nodeformat; nodebase = baseSource->nodeformat; - device = disk->src->nodeformat; + device = qemuDomainDiskGetTopNodename(disk); if (!backingPath && top_parent && !(backingPath = qemuBlockGetBackingStoreString(baseSource))) goto endjob;
-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org

On 23.01.20 16:57, Peter Krempa wrote:
QEMU requires us to pass the topmost node(-name) in the backing chain as the 'device' argument for the blockdev-mirror/block-commit commands. Otherwise QEMU complains:
"Need a root block node"
Since libvirt always puts the copy-on-read driver on top of the backing chain, blockjobs on disks which use "copy_on_read" fail.
The series below fixes that by passing the proper node. This fixes it just fine for the mirror, but for block-commit we still get another error due to a bug in qemu:
"'libvirt-4-format' is not in this backing file chain"
Additionally one weird thing happens with block-stream (blockpull). If I pass the layer below the copy-on-read as 'device', everything works. This is kind of weird as the documentation for that is the same. If I pass the nodename of copy-on-read.
It isn’t. block-stream says:
# @device: the device or node name of the top image block-commit says: # @device: the device name or node-name of a root node blockdev-mirror says: # @device: The device name or node-name of a root node whose writes should be # mirrored.
Note the subtle difference: “top” != “root”. “top” is meant just as part of the job: There’s a base image, and a top image. The top image is the target (for stream, that is), the base is the first image in the chain that will be kept. “root node” means actually a root node in the graph. Hence, stream doesn’t require @device to point to a root node. (Before qemu commit 554b6147650 it did, but that commit relaxed block-stream.)
Max, Kevin, could you please comment on the block-stream part and possibly also on the plans to fix the issue with block-commit?
The plans are to fix it. The latest version of the series is here: https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00350.html It’s from August because it’s an extreme churn to work on new versions. It touches basically everything in the block layer, so rebasing it alone takes time. And then there are facts like reviews leading to new prerequesite series, namely: https://lists.nongnu.org/archive/html/qemu-block/2019-11/msg00370.html https://lists.nongnu.org/archive/html/qemu-block/2019-11/msg00908.html So in total, there are 96 patches. The churn for each version is hard. Finding willing reviewers is even harder. (I actually have basically no reviews for one of those prerequisite series (“block: Introduce real BdrvChildRole”) yet, so there’s little point in working on v7 of the main series.) Max
Cc: Max Reitz <mreitz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com>
https://bugzilla.redhat.com/show_bug.cgi?id=1792195
Peter Krempa (4): qemu: blockcopy: Actually unplug unused images when mirror job fails to start qemu: domain: Extract code to determine topmost nodename to qemuDomainDiskGetTopNodename qemu: Fix value of 'device' argument for blockdev-mirror qemu: Fix value of 'device' argument for block-commit
src/qemu/qemu_domain.c | 38 ++++++++++++++++++++++++++------------ src/qemu/qemu_domain.h | 4 ++++ src/qemu/qemu_driver.c | 6 +++--- src/qemu/qemu_migration.c | 2 +- 4 files changed, 34 insertions(+), 16 deletions(-)

On Fri, Jan 24, 2020 at 11:23:46 +0100, Max Reitz wrote:
On 23.01.20 16:57, Peter Krempa wrote:
QEMU requires us to pass the topmost node(-name) in the backing chain as the 'device' argument for the blockdev-mirror/block-commit commands. Otherwise QEMU complains:
"Need a root block node"
Since libvirt always puts the copy-on-read driver on top of the backing chain, blockjobs on disks which use "copy_on_read" fail.
The series below fixes that by passing the proper node. This fixes it just fine for the mirror, but for block-commit we still get another error due to a bug in qemu:
"'libvirt-4-format' is not in this backing file chain"
Additionally one weird thing happens with block-stream (blockpull). If I pass the layer below the copy-on-read as 'device', everything works. This is kind of weird as the documentation for that is the same. If I pass the nodename of copy-on-read.
It isn’t. block-stream says:
# @device: the device or node name of the top image block-commit says: # @device: the device name or node-name of a root node blockdev-mirror says: # @device: The device name or node-name of a root node whose writes should be # mirrored.
Note the subtle difference: “top” != “root”. “top” is meant just as part of the job: There’s a base image, and a top image. The top image is the target (for stream, that is), the base is the first image in the chain that will be kept.
Ah, right. The distinction was too subtle for me to detect :)
“root node” means actually a root node in the graph.
Hence, stream doesn’t require @device to point to a root node. (Before qemu commit 554b6147650 it did, but that commit relaxed block-stream.)
Cool, thanks for the version info. Since we use this only starting from qemu-4.2 we are safe to always use it.
Max, Kevin, could you please comment on the block-stream part and possibly also on the plans to fix the issue with block-commit?
The plans are to fix it. The latest version of the series is here:
https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00350.html
It’s from August because it’s an extreme churn to work on new versions. It touches basically everything in the block layer, so rebasing it alone takes time. And then there are facts like reviews leading to new prerequesite series, namely:
https://lists.nongnu.org/archive/html/qemu-block/2019-11/msg00370.html https://lists.nongnu.org/archive/html/qemu-block/2019-11/msg00908.html
So in total, there are 96 patches. The churn for each version is hard. Finding willing reviewers is even harder.
(I actually have basically no reviews for one of those prerequisite series (“block: Introduce real BdrvChildRole”) yet, so there’s little point in working on v7 of the main series.)
Thanks for the info. Unfortunately I'm not too familiar with qemu to warant a review for this :( Thank you for the information!
participants (3)
-
Eric Blake
-
Max Reitz
-
Peter Krempa