[PATCH] qemu: blockjob: Transition into 'ready' state only from expected states

In certain rare occasions qemu can transition a block job which was already 'ready' into 'standby' and then back. If this happens in the following order libvirt will get confused about the actual job state: 1) the block copy job is 'ready' (job->state == QEMU_BLOCKJOB_STATE_READY) 2) user calls qemuDomainBlockJobAbort with VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT flag but without VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC 3) the block job is switched to synchronous event handling 4) the block job blips to 'standby' and back to 'ready', the event is not processed since the blockjob is in sync mode for now 5) qemuDomainBlockJobPivot is called: 5.1) 'job-complete' QMP command is issued 5.2) job->state is set to QEMU_BLOCKJOB_STATE_PIVOTING 6) code for synchronous-wait for the job completion in qemuDomainBlockJobAbort is invoked 7) the waiting loop calls qemuBlockJobUpdate: 7.1) job->newstate is QEMU_BLOCKJOB_STATE_READY due to 4) 7.2) qemuBlockJobEventProcess is called 7.3) the handler for QEMU_BLOCKJOB_STATE_READY overwrites job->state from QEMU_BLOCKJOB_STATE_PIVOTING to QEMU_BLOCKJOB_STATE_READY 8) qemuDomainBlockJobAbort is looking for a finished job, so waits again 9) qemu finishes the blockjob and transitions it into 'concluded' state 10) qemuBlockJobUpdate is triggered again, this time finalizing the job. 10.1) job->newstate is = QEMU_BLOCKJOB_STATE_CONCLUDED job->state is = QEMU_BLOCKJOB_STATE_READY 10.2) qemuBlockJobEventProcessConcluded is called, the function checks whether there was an error with the blockjob. Since there was no error job->newstate becomes QEMU_BLOCKJOB_STATE_COMPLETED. 10.3) qemuBlockJobEventProcessConcludedTransition selects the action for the appropriate block job type where we have: case QEMU_BLOCKJOB_TYPE_COPY: if (job->state == QEMU_BLOCKJOB_STATE_PIVOTING && success) qemuBlockJobProcessEventConcludedCopyPivot(driver, vm, job, asyncJob); else qemuBlockJobProcessEventConcludedCopyAbort(driver, vm, job, asyncJob); break; Since job->state is QEMU_BLOCKJOB_STATE_READY, qemuBlockJobProcessEventConcludedCopyAbort is called. This patch forbids transitions to QEMU_BLOCKJOB_STATE_READY if the previous job state isn't QEMU_BLOCKJOB_STATE_RUNNING or QEMU_BLOCKJOB_STATE_NEW. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1951507 Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index 21fcc29ddb..9ae4500f4d 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -1697,14 +1697,21 @@ qemuBlockJobEventProcess(virQEMUDriver *driver, break; case QEMU_BLOCKJOB_STATE_READY: - /* mirror may be NULL for copy job corresponding to migration */ - if (job->disk) { - job->disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY; - qemuBlockJobEmitEvents(driver, vm, job->disk, job->type, job->newstate); + /* in certain cases qemu can blip out and back into 'ready' state for + * a blockjob. In cases when we already are past RUNNING the job such + * as when pivoting/aborting this could reset the internally set job + * state, thus we ignore it if the job isn't in expected state */ + if (job->state == QEMU_BLOCKJOB_STATE_NEW || + job->state == QEMU_BLOCKJOB_STATE_RUNNING) { + /* mirror may be NULL for copy job corresponding to migration */ + if (job->disk) { + job->disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY; + qemuBlockJobEmitEvents(driver, vm, job->disk, job->type, job->newstate); + } + job->state = job->newstate; + qemuDomainSaveStatus(vm); } - job->state = job->newstate; job->newstate = -1; - qemuDomainSaveStatus(vm); break; case QEMU_BLOCKJOB_STATE_NEW: -- 2.30.2

On Tue, Apr 20, 2021 at 2:45 PM Peter Krempa <pkrempa@redhat.com> wrote:
In certain rare occasions qemu can transition a block job which was already 'ready' into 'standby' and then back. If this happens in the following order libvirt will get confused about the actual job state:
1) the block copy job is 'ready' (job->state == QEMU_BLOCKJOB_STATE_READY)
2) user calls qemuDomainBlockJobAbort with VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT flag but without VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC
3) the block job is switched to synchronous event handling
4) the block job blips to 'standby' and back to 'ready', the event is not processed since the blockjob is in sync mode for now
5) qemuDomainBlockJobPivot is called: 5.1) 'job-complete' QMP command is issued 5.2) job->state is set to QEMU_BLOCKJOB_STATE_PIVOTING
6) code for synchronous-wait for the job completion in qemuDomainBlockJobAbort is invoked
7) the waiting loop calls qemuBlockJobUpdate:
7.1) job->newstate is QEMU_BLOCKJOB_STATE_READY due to 4) 7.2) qemuBlockJobEventProcess is called 7.3) the handler for QEMU_BLOCKJOB_STATE_READY overwrites job->state from QEMU_BLOCKJOB_STATE_PIVOTING to QEMU_BLOCKJOB_STATE_READY
8) qemuDomainBlockJobAbort is looking for a finished job, so waits again
9) qemu finishes the blockjob and transitions it into 'concluded' state
10) qemuBlockJobUpdate is triggered again, this time finalizing the job. 10.1) job->newstate is = QEMU_BLOCKJOB_STATE_CONCLUDED job->state is = QEMU_BLOCKJOB_STATE_READY 10.2) qemuBlockJobEventProcessConcluded is called, the function checks whether there was an error with the blockjob. Since there was no error job->newstate becomes QEMU_BLOCKJOB_STATE_COMPLETED. 10.3) qemuBlockJobEventProcessConcludedTransition selects the action for the appropriate block job type where we have:
case QEMU_BLOCKJOB_TYPE_COPY: if (job->state == QEMU_BLOCKJOB_STATE_PIVOTING && success) qemuBlockJobProcessEventConcludedCopyPivot(driver, vm, job, asyncJob); else qemuBlockJobProcessEventConcludedCopyAbort(driver, vm, job, asyncJob); break;
The log does not make it clear what libvirt is doing and why.
Since job->state is QEMU_BLOCKJOB_STATE_READY, qemuBlockJobProcessEventConcludedCopyAbort is called.
This patch forbids transitions to QEMU_BLOCKJOB_STATE_READY if the previous job state isn't QEMU_BLOCKJOB_STATE_RUNNING or QEMU_BLOCKJOB_STATE_NEW.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1951507 Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index 21fcc29ddb..9ae4500f4d 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -1697,14 +1697,21 @@ qemuBlockJobEventProcess(virQEMUDriver *driver, break;
case QEMU_BLOCKJOB_STATE_READY: - /* mirror may be NULL for copy job corresponding to migration */ - if (job->disk) { - job->disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY; - qemuBlockJobEmitEvents(driver, vm, job->disk, job->type, job->newstate); + /* in certain cases qemu can blip out and back into 'ready' state for + * a blockjob. In cases when we already are past RUNNING the job such + * as when pivoting/aborting this could reset the internally set job + * state, thus we ignore it if the job isn't in expected state */ + if (job->state == QEMU_BLOCKJOB_STATE_NEW || + job->state == QEMU_BLOCKJOB_STATE_RUNNING) { + /* mirror may be NULL for copy job corresponding to migration */ + if (job->disk) { + job->disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY; + qemuBlockJobEmitEvents(driver, vm, job->disk, job->type, job->newstate); + } + job->state = job->newstate; + qemuDomainSaveStatus(vm); } - job->state = job->newstate; job->newstate = -1; - qemuDomainSaveStatus(vm); break;
case QEMU_BLOCKJOB_STATE_NEW: -- 2.30.2
The fix looks good to me. Should we add a debug log about ignoring the event when the state is not running or new? Seems that flipping state between ready and standby is a repeating issue. Does it affect other block jobs? Finally, there is no change in tests, so I guess we are missing a test verifying this flow? Nir

On Tue, Apr 20, 2021 at 15:07:03 +0300, Nir Soffer wrote:
On Tue, Apr 20, 2021 at 2:45 PM Peter Krempa <pkrempa@redhat.com> wrote:
In certain rare occasions qemu can transition a block job which was already 'ready' into 'standby' and then back. If this happens in the following order libvirt will get confused about the actual job state:
1) the block copy job is 'ready' (job->state == QEMU_BLOCKJOB_STATE_READY)
2) user calls qemuDomainBlockJobAbort with VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT flag but without VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC
3) the block job is switched to synchronous event handling
4) the block job blips to 'standby' and back to 'ready', the event is not processed since the blockjob is in sync mode for now
5) qemuDomainBlockJobPivot is called: 5.1) 'job-complete' QMP command is issued 5.2) job->state is set to QEMU_BLOCKJOB_STATE_PIVOTING
6) code for synchronous-wait for the job completion in qemuDomainBlockJobAbort is invoked
7) the waiting loop calls qemuBlockJobUpdate:
7.1) job->newstate is QEMU_BLOCKJOB_STATE_READY due to 4) 7.2) qemuBlockJobEventProcess is called 7.3) the handler for QEMU_BLOCKJOB_STATE_READY overwrites job->state from QEMU_BLOCKJOB_STATE_PIVOTING to QEMU_BLOCKJOB_STATE_READY
8) qemuDomainBlockJobAbort is looking for a finished job, so waits again
9) qemu finishes the blockjob and transitions it into 'concluded' state
10) qemuBlockJobUpdate is triggered again, this time finalizing the job. 10.1) job->newstate is = QEMU_BLOCKJOB_STATE_CONCLUDED job->state is = QEMU_BLOCKJOB_STATE_READY 10.2) qemuBlockJobEventProcessConcluded is called, the function checks whether there was an error with the blockjob. Since there was no error job->newstate becomes QEMU_BLOCKJOB_STATE_COMPLETED. 10.3) qemuBlockJobEventProcessConcludedTransition selects the action for the appropriate block job type where we have:
case QEMU_BLOCKJOB_TYPE_COPY: if (job->state == QEMU_BLOCKJOB_STATE_PIVOTING && success) qemuBlockJobProcessEventConcludedCopyPivot(driver, vm, job, asyncJob); else qemuBlockJobProcessEventConcludedCopyAbort(driver, vm, job, asyncJob); break;
The log does not make it clear what libvirt is doing and why.
Both qemuBlockJobProcessEventConcludedCopyPivot and qemuBlockJobProcessEventConcludedCopyAbort do log that they are called and the arguments, the value of 'success' can be inferred from the previous QMP call to 'query-jobs' where no error is reported. I've used those points to infer what's happening. Also the log has enough info to see that the 'ready' event was re-handled synchronously: 2021-04-18 13:41:17.293+0000: 19074: debug : qemuProcessHandleJobStatusChange:1010 : job 'copy-vdb-libvirt-5-format'(domain: 0x7f0138011550,vm0) state changed to 'ready'(4) 2021-04-18 13:41:17.293+0000: 19074: debug : qemuProcessHandleJobStatusChange:1028 : job 'copy-vdb-libvirt-5-format' handled synchronously
Since job->state is QEMU_BLOCKJOB_STATE_READY, qemuBlockJobProcessEventConcludedCopyAbort is called.
This patch forbids transitions to QEMU_BLOCKJOB_STATE_READY if the previous job state isn't QEMU_BLOCKJOB_STATE_RUNNING or QEMU_BLOCKJOB_STATE_NEW.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1951507 Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index 21fcc29ddb..9ae4500f4d 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -1697,14 +1697,21 @@ qemuBlockJobEventProcess(virQEMUDriver *driver, break;
case QEMU_BLOCKJOB_STATE_READY: - /* mirror may be NULL for copy job corresponding to migration */ - if (job->disk) { - job->disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY; - qemuBlockJobEmitEvents(driver, vm, job->disk, job->type, job->newstate); + /* in certain cases qemu can blip out and back into 'ready' state for + * a blockjob. In cases when we already are past RUNNING the job such + * as when pivoting/aborting this could reset the internally set job + * state, thus we ignore it if the job isn't in expected state */ + if (job->state == QEMU_BLOCKJOB_STATE_NEW || + job->state == QEMU_BLOCKJOB_STATE_RUNNING) { + /* mirror may be NULL for copy job corresponding to migration */ + if (job->disk) { + job->disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY; + qemuBlockJobEmitEvents(driver, vm, job->disk, job->type, job->newstate); + } + job->state = job->newstate; + qemuDomainSaveStatus(vm); } - job->state = job->newstate; job->newstate = -1; - qemuDomainSaveStatus(vm); break;
case QEMU_BLOCKJOB_STATE_NEW: -- 2.30.2
The fix looks good to me.
Should we add a debug log about ignoring the event when the state is not running or new?
I can add one, but it didn't feel that much useful since even with the log entry it wouldn't be obvious without knowing what the code is doing.
Seems that flipping state between ready and standby is a repeating issue. Does it affect other block jobs?
The only other block job which uses 'ready' state is active-layer block commit, but that uses the same code to handle the completion, so this patch fixes both if it's ever to be observed with block-commit.
Finally, there is no change in tests, so I guess we are missing a test verifying this flow?
We unfortunately don't have unit tests for block jobs I was planning to add some once the conversion to blockdev was complete but unfortunately got pre-empted by more important work. I still have that on my to-do list, but the priority isn't that high :(

On 4/20/21 1:44 PM, Peter Krempa wrote:
In certain rare occasions qemu can transition a block job which was already 'ready' into 'standby' and then back. If this happens in the following order libvirt will get confused about the actual job state:
1) the block copy job is 'ready' (job->state == QEMU_BLOCKJOB_STATE_READY)
2) user calls qemuDomainBlockJobAbort with VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT flag but without VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC
3) the block job is switched to synchronous event handling
4) the block job blips to 'standby' and back to 'ready', the event is not processed since the blockjob is in sync mode for now
5) qemuDomainBlockJobPivot is called: 5.1) 'job-complete' QMP command is issued 5.2) job->state is set to QEMU_BLOCKJOB_STATE_PIVOTING
6) code for synchronous-wait for the job completion in qemuDomainBlockJobAbort is invoked
7) the waiting loop calls qemuBlockJobUpdate:
7.1) job->newstate is QEMU_BLOCKJOB_STATE_READY due to 4) 7.2) qemuBlockJobEventProcess is called 7.3) the handler for QEMU_BLOCKJOB_STATE_READY overwrites job->state from QEMU_BLOCKJOB_STATE_PIVOTING to QEMU_BLOCKJOB_STATE_READY
8) qemuDomainBlockJobAbort is looking for a finished job, so waits again
9) qemu finishes the blockjob and transitions it into 'concluded' state
10) qemuBlockJobUpdate is triggered again, this time finalizing the job. 10.1) job->newstate is = QEMU_BLOCKJOB_STATE_CONCLUDED job->state is = QEMU_BLOCKJOB_STATE_READY 10.2) qemuBlockJobEventProcessConcluded is called, the function checks whether there was an error with the blockjob. Since there was no error job->newstate becomes QEMU_BLOCKJOB_STATE_COMPLETED. 10.3) qemuBlockJobEventProcessConcludedTransition selects the action for the appropriate block job type where we have:
case QEMU_BLOCKJOB_TYPE_COPY: if (job->state == QEMU_BLOCKJOB_STATE_PIVOTING && success) qemuBlockJobProcessEventConcludedCopyPivot(driver, vm, job, asyncJob); else qemuBlockJobProcessEventConcludedCopyAbort(driver, vm, job, asyncJob); break;
Since job->state is QEMU_BLOCKJOB_STATE_READY, qemuBlockJobProcessEventConcludedCopyAbort is called.
This patch forbids transitions to QEMU_BLOCKJOB_STATE_READY if the previous job state isn't QEMU_BLOCKJOB_STATE_RUNNING or QEMU_BLOCKJOB_STATE_NEW.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1951507 Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Michal
participants (3)
-
Michal Privoznik
-
Nir Soffer
-
Peter Krempa