[libvirt] [PATCH 0/3] qemu: Fix handling of block job data in event handlers (blockdev-add saga fixes)

Some of the refactors which allowed carrying more data with blockjobs created a race condition when a too-quick job would actually delete the data. This is a problem for legacy block jobs only. Peter Krempa (3): qemu: process: Don't use qemuBlockJobStartupFinalize in qemuProcessHandleBlockJob qemu: driver: Don't use qemuBlockJobStartupFinalize in processBlockJobEvent qemu: driver: Add debug message when we conjure block job data object src/qemu/qemu_driver.c | 4 ++-- src/qemu/qemu_process.c | 3 +-- 2 files changed, 3 insertions(+), 4 deletions(-) -- 2.21.0

The block job event handler qemuProcessHandleBlockJob looks at the block job data to see whether the job requires synchronous handling. Since the block job event may arrive before we continue the job handling (if the job has no data to copy) we could hit the state when the job is still set as QEMU_BLOCKJOB_STATE_NEW (as we move it to the QEMU_BLOCKJOB_STATE_RUNNING state only after returning from monitor). If the event handler uses qemuBlockJobStartupFinalize it would unregister and free the job. Thankfully this is not a big problem for legacy blockjobs as we don't need much data for them but since we'd re-instantiate the job data structure we'd report wrong job type for active commit as qemu reports it as a regular commit job. Fix it by not using qemuBlockJobStartupFinalize function in qemuProcessHandleBlockJob as it is not starting the job anyways. https://bugzilla.redhat.com/show_bug.cgi?id=1721375 Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_process.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index f3ffd76259..75205bc121 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -936,7 +936,7 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, virQEMUDriverPtr driver = opaque; struct qemuProcessEvent *processEvent = NULL; virDomainDiskDefPtr disk; - qemuBlockJobDataPtr job = NULL; + VIR_AUTOUNREF(qemuBlockJobDataPtr) job = NULL; char *data = NULL; virObjectLock(vm); @@ -983,7 +983,6 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, } cleanup: - qemuBlockJobStartupFinalize(vm, job); qemuProcessEventFree(processEvent); virObjectUnlock(vm); return 0; -- 2.21.0

While this function does start a block job in case when we'd not be able to get our internal data for it, the handler sets the job state to QEMU_BLOCKJOB_STATE_RUNNING anyways, thus qemuBlockJobStartupFinalize would just unref the job. Since the other usage of qemuBlockJobStartupFinalize in the other part of the event handler was a bug replace this one anyways even if it would not cause problems. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_driver.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 329c166255..0438dd2878 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4686,7 +4686,7 @@ processBlockJobEvent(virQEMUDriverPtr driver, int status) { virDomainDiskDefPtr disk; - qemuBlockJobDataPtr job = NULL; + VIR_AUTOUNREF(qemuBlockJobDataPtr) job = NULL; if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) return; @@ -4712,7 +4712,6 @@ processBlockJobEvent(virQEMUDriverPtr driver, qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE); endjob: - qemuBlockJobStartupFinalize(vm, job); qemuDomainObjEndJob(driver, vm); } -- 2.21.0

Report in logs when we don't find existing block job data and create it just to handle the job. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_driver.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 0438dd2878..482f915b67 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4702,6 +4702,7 @@ processBlockJobEvent(virQEMUDriverPtr driver, } if (!(job = qemuBlockJobDiskGetJob(disk))) { + VIR_DEBUG("creating new block job object for '%s'", diskAlias); if (!(job = qemuBlockJobDiskNew(vm, disk, type, diskAlias))) goto endjob; job->state = QEMU_BLOCKJOB_STATE_RUNNING; -- 2.21.0

On Thu, Jul 18, 2019 at 06:22:05PM +0200, Peter Krempa wrote:
Some of the refactors which allowed carrying more data with blockjobs created a race condition when a too-quick job would actually delete the data. This is a problem for legacy block jobs only.
Peter Krempa (3): qemu: process: Don't use qemuBlockJobStartupFinalize in qemuProcessHandleBlockJob qemu: driver: Don't use qemuBlockJobStartupFinalize in processBlockJobEvent qemu: driver: Add debug message when we conjure block job data object
src/qemu/qemu_driver.c | 4 ++-- src/qemu/qemu_process.c | 3 +-- 2 files changed, 3 insertions(+), 4 deletions(-)
Reviewed-by: Ján Tomko <jtomko@redhat.com> Jano
participants (2)
-
Ján Tomko
-
Peter Krempa