[libvirt] [PATCH 00/10] qemu: Fixes of the blockdev job handling (a blockdev-add series)

While playing with the incremental backup stuff I came across few corner cases in the blockdev job handling. Peter Krempa (10): qemu: monitor: Finish implementation of infrastructure for 'query-jobs' qemu: blockjob: Properly propagate cancellation of blockjobs qemu: process: Move block job refresh after async job recovery qemu: blockjob: Fix deadlock when terminating job with invalid data qemu: blockjob: Log blockjobs which are dropped when untracked by qemu qemu: blockjob: Mark job with broken data but tracked by qemu as reconnected qemu: blockjob: Don't stop processing the finished job early qemu: blockjob: Separate clearing of per-job data qemu: blockjob: Introduce "broken" block job type qemu: blockjob: Finish handling job with broken data src/qemu/qemu_blockjob.c | 70 ++++++++++++------- src/qemu/qemu_blockjob.h | 3 + src/qemu/qemu_domain.c | 13 +++- src/qemu/qemu_driver.c | 1 + src/qemu/qemu_monitor.h | 4 +- src/qemu/qemu_monitor_json.c | 6 ++ src/qemu/qemu_process.c | 6 +- .../blockjob-blockdev-in.xml | 1 + 8 files changed, 73 insertions(+), 31 deletions(-) -- 2.23.0

Commit ed56851f1bc6f5 didn't wire up fetching of the statistics for the job which are reported by 'query-jobs'. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_monitor.h | 4 ++-- src/qemu/qemu_monitor_json.c | 6 ++++++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index a17d7200c2..e2bfc420bb 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -154,8 +154,8 @@ struct _qemuMonitorJobInfo { qemuMonitorJobType type; qemuMonitorJobStatus status; char *error; - long long progressCurrent; - long long progressTotal; + unsigned long long progressCurrent; + unsigned long long progressTotal; }; diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 4d38030179..9f3783ab70 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -9221,6 +9221,12 @@ qemuMonitorJSONGetJobInfoOne(virJSONValuePtr data) job->id = g_strdup(id); job->error = g_strdup(errmsg); + /* failure to fetch progress stats is not fatal */ + ignore_value(virJSONValueObjectGetNumberUlong(data, "current-progress", + &job->progressCurrent)); + ignore_value(virJSONValueObjectGetNumberUlong(data, "total-progress", + &job->progressTotal)); + return g_steal_pointer(&job); } -- 2.23.0

On 11/26/19 10:17 AM, Peter Krempa wrote:
Commit ed56851f1bc6f5 didn't wire up fetching of the statistics for the job which are reported by 'query-jobs'.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_monitor.h | 4 ++-- src/qemu/qemu_monitor_json.c | 6 ++++++ 2 files changed, 8 insertions(+), 2 deletions(-)
Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

qemu returns an error message in the job statistics even if the job was cancelled to emphasize it was not successful. Libvirt didn't properly transform it into QEMU_BLOCKJOB_STATE_CANCELLED though. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index 92e4d391c9..2283d49c61 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -1313,7 +1313,8 @@ qemuBlockJobEventProcessConcluded(qemuBlockJobDataPtr job, if (qemuDomainObjExitMonitor(driver, vm) < 0 || rc < 0) goto cleanup; - if (job->newstate == QEMU_BLOCKJOB_STATE_COMPLETED && + if ((job->newstate == QEMU_BLOCKJOB_STATE_COMPLETED || + job->newstate == QEMU_BLOCKJOB_STATE_FAILED) && job->state == QEMU_BLOCKJOB_STATE_ABORTING) job->newstate = QEMU_BLOCKJOB_STATE_CANCELLED; -- 2.23.0

On 11/26/19 10:17 AM, Peter Krempa wrote:
qemu returns an error message in the job statistics even if the job was cancelled to emphasize it was not successful. Libvirt didn't properly transform it into QEMU_BLOCKJOB_STATE_CANCELLED though.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index 92e4d391c9..2283d49c61 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -1313,7 +1313,8 @@ qemuBlockJobEventProcessConcluded(qemuBlockJobDataPtr job, if (qemuDomainObjExitMonitor(driver, vm) < 0 || rc < 0) goto cleanup;
- if (job->newstate == QEMU_BLOCKJOB_STATE_COMPLETED && + if ((job->newstate == QEMU_BLOCKJOB_STATE_COMPLETED || + job->newstate == QEMU_BLOCKJOB_STATE_FAILED) && job->state == QEMU_BLOCKJOB_STATE_ABORTING) job->newstate = QEMU_BLOCKJOB_STATE_CANCELLED;
Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

Block jobs may be members of async jobs so it makes more sense to refresh block job state after we do steps for async job recovery. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_process.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index cb11da2401..a588ee25f8 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -8143,9 +8143,6 @@ qemuProcessReconnect(void *opaque) qemuBlockNodeNamesDetect(driver, obj, QEMU_ASYNC_JOB_NONE) < 0) goto error; - if (qemuProcessRefreshBlockjobs(driver, obj) < 0) - goto error; - if (qemuRefreshVirtioChannelState(driver, obj, QEMU_ASYNC_JOB_NONE) < 0) goto error; @@ -8158,6 +8155,9 @@ qemuProcessReconnect(void *opaque) if (qemuProcessRecoverJob(driver, obj, &oldjob, &stopFlags) < 0) goto error; + if (qemuProcessRefreshBlockjobs(driver, obj) < 0) + goto error; + if (qemuProcessUpdateDevices(driver, obj) < 0) goto error; -- 2.23.0

On 11/26/19 10:17 AM, Peter Krempa wrote:
Block jobs may be members of async jobs so it makes more sense to refresh block job state after we do steps for async job recovery.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_process.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

We must exit the monitor prior to refusing other work, otherwise the VM object will become unusable. This bug was introduced in commit v5.5.0-244-gc412383796 but thankfully the code path was not excercised without QEMU_CAPS_BLOCKDEV. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index 2283d49c61..818e36435c 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -1305,14 +1305,14 @@ qemuBlockJobEventProcessConcluded(qemuBlockJobDataPtr job, dismissed = true; } + if (qemuDomainObjExitMonitor(driver, vm) < 0 || rc < 0) + goto cleanup; + if (job->invalidData) { VIR_WARN("terminating job '%s' with invalid data", job->name); goto cleanup; } - if (qemuDomainObjExitMonitor(driver, vm) < 0 || rc < 0) - goto cleanup; - if ((job->newstate == QEMU_BLOCKJOB_STATE_COMPLETED || job->newstate == QEMU_BLOCKJOB_STATE_FAILED) && job->state == QEMU_BLOCKJOB_STATE_ABORTING) -- 2.23.0

On 11/26/19 10:17 AM, Peter Krempa wrote:
We must exit the monitor prior to refusing other work, otherwise the VM object will become unusable.
This bug was introduced in commit v5.5.0-244-gc412383796 but thankfully the code path was not excercised without QEMU_CAPS_BLOCKDEV.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

Since we don't know what happened to the job we can't do much about it but we can at least log that this happened. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index 818e36435c..b83d681f06 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -501,8 +501,10 @@ qemuBlockJobRefreshJobs(virQEMUDriverPtr driver, /* remove data for job which qemu didn't report (the algorithm is * inefficient, but the possibility of such jobs is very low */ - while ((job = virHashSearch(priv->blockjobs, qemuBlockJobRefreshJobsFindInactive, NULL, NULL))) + while ((job = virHashSearch(priv->blockjobs, qemuBlockJobRefreshJobsFindInactive, NULL, NULL))) { + VIR_WARN("dropping blockjob '%s' untracked by qemu", job->name); qemuBlockJobUnregister(job, vm); + } ret = 0; -- 2.23.0

On 11/26/19 10:17 AM, Peter Krempa wrote:
Since we don't know what happened to the job we can't do much about it but we can at least log that this happened.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

Otherwise it would get dropped later on as untracked despite us knowing about it. Additionally since we cancelled it we must wait to dismiss it which would not be possible if we unregister it. This also opened a window for a race condition since the job state change event of the just-cancelled job might be delivered prior to us unregistering the job in which case everything would work properly. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index b83d681f06..4d14c3d27c 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -465,6 +465,8 @@ qemuBlockJobRefreshJobs(virQEMUDriverPtr driver, if (rc < 0) qemuBlockJobUnregister(job, vm); + else + job->reconnected = true; continue; } -- 2.23.0

On 11/26/19 10:17 AM, Peter Krempa wrote:
Otherwise it would get dropped later on as untracked despite us knowing about it. Additionally since we cancelled it we must wait to dismiss it which would not be possible if we unregister it. This also opened a window for a race condition since the job state change event of the just-cancelled job might be delivered prior to us unregistering the job in which case everything would work properly.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index b83d681f06..4d14c3d27c 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -465,6 +465,8 @@ qemuBlockJobRefreshJobs(virQEMUDriverPtr driver,
if (rc < 0) qemuBlockJobUnregister(job, vm); + else + job->reconnected = true; continue; }
Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

Both failure to refresh and to dismiss the job are very unlikely but if they happen there's not much we can do about the blockjob. The concluded job handlers treat it as if the job failed if we don't update the state to 'QEMU_BLOCKJOB_STATE_COMPLETED' which is probably the safest thing to do here. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 21 ++++++--------------- 1 file changed, 6 insertions(+), 15 deletions(-) diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index 4d14c3d27c..5a6c5542a6 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -1270,8 +1270,6 @@ qemuBlockJobEventProcessConcluded(qemuBlockJobDataPtr job, qemuMonitorJobInfoPtr *jobinfo = NULL; size_t njobinfo = 0; size_t i; - int rc = 0; - bool dismissed = false; bool refreshed = false; if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) < 0) @@ -1279,7 +1277,7 @@ qemuBlockJobEventProcessConcluded(qemuBlockJobDataPtr job, /* we need to fetch the error state as the event does not propagate it */ if (job->newstate == QEMU_BLOCKJOB_STATE_CONCLUDED && - (rc = qemuMonitorGetJobInfo(qemuDomainGetMonitor(vm), &jobinfo, &njobinfo)) == 0) { + qemuMonitorGetJobInfo(qemuDomainGetMonitor(vm), &jobinfo, &njobinfo) == 0) { for (i = 0; i < njobinfo; i++) { if (STRNEQ_NULLABLE(job->name, jobinfo[i]->id)) @@ -1297,19 +1295,14 @@ qemuBlockJobEventProcessConcluded(qemuBlockJobDataPtr job, break; } - if (i == njobinfo) { + if (i == njobinfo) VIR_WARN("failed to refresh job '%s'", job->name); - rc = -1; - } } /* dismiss job in qemu */ - if (rc >= 0) { - if ((rc = qemuMonitorJobDismiss(qemuDomainGetMonitor(vm), job->name)) >= 0) - dismissed = true; - } + ignore_value(qemuMonitorJobDismiss(qemuDomainGetMonitor(vm), job->name)); - if (qemuDomainObjExitMonitor(driver, vm) < 0 || rc < 0) + if (qemuDomainObjExitMonitor(driver, vm) < 0) goto cleanup; if (job->invalidData) { @@ -1340,10 +1333,8 @@ qemuBlockJobEventProcessConcluded(qemuBlockJobDataPtr job, } cleanup: - if (dismissed) { - qemuBlockJobUnregister(job, vm); - qemuDomainSaveConfig(vm); - } + qemuBlockJobUnregister(job, vm); + qemuDomainSaveConfig(vm); for (i = 0; i < njobinfo; i++) qemuMonitorJobInfoFree(jobinfo[i]); -- 2.23.0

On 11/26/19 10:17 AM, Peter Krempa wrote:
Both failure to refresh and to dismiss the job are very unlikely but if they happen there's not much we can do about the blockjob.
The concluded job handlers treat it as if the job failed if we don't update the state to 'QEMU_BLOCKJOB_STATE_COMPLETED' which is probably the safest thing to do here.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 21 ++++++--------------- 1 file changed, 6 insertions(+), 15 deletions(-)
Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

We will need to clear per-job type data when we will be marking a blockjob as broken in the new way. Extract the code for future reuse. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index 5a6c5542a6..6db3e0ca84 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -70,6 +70,13 @@ VIR_ENUM_IMPL(qemuBlockjob, static virClassPtr qemuBlockJobDataClass; +static void +qemuBlockJobDataDisposeJobdata(qemuBlockJobDataPtr job) +{ + if (job->type == QEMU_BLOCKJOB_TYPE_CREATE) + virObjectUnref(job->data.create.src); +} + static void qemuBlockJobDataDispose(void *obj) @@ -79,8 +86,7 @@ qemuBlockJobDataDispose(void *obj) virObjectUnref(job->chain); virObjectUnref(job->mirrorChain); - if (job->type == QEMU_BLOCKJOB_TYPE_CREATE) - virObjectUnref(job->data.create.src); + qemuBlockJobDataDisposeJobdata(job); g_free(job->name); g_free(job->errmsg); -- 2.23.0

On 11/26/19 10:17 AM, Peter Krempa wrote:
We will need to clear per-job type data when we will be marking a blockjob as broken in the new way. Extract the code for future reuse.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index 5a6c5542a6..6db3e0ca84 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -70,6 +70,13 @@ VIR_ENUM_IMPL(qemuBlockjob,
static virClassPtr qemuBlockJobDataClass;
Need an extra newline here to preserve previous spacing Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

To better track jobs we couldn't parse let's introduce a new job type which will clarify semantics internally in few places. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 25 ++++++++++++++++++- src/qemu/qemu_blockjob.h | 3 +++ src/qemu/qemu_domain.c | 13 +++++++++- src/qemu/qemu_driver.c | 1 + .../blockjob-blockdev-in.xml | 1 + 5 files changed, 41 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index 6db3e0ca84..d957e3175e 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -66,7 +66,8 @@ VIR_ENUM_IMPL(qemuBlockjob, "commit", "active-commit", "", - "create"); + "create", + "broken"); static virClassPtr qemuBlockJobDataClass; @@ -127,6 +128,23 @@ qemuBlockJobDataNew(qemuBlockJobType type, } +/** + * qemuBlockJobMarkBroken: + * @job: job to mark as broken + * + * In case when we are unable to parse the block job data from the XML + * successfully we'll need to mark the job as broken and then attempt to abort + * it. This function marks the job as broken. + */ +static void +qemuBlockJobMarkBroken(qemuBlockJobDataPtr job) +{ + qemuBlockJobDataDisposeJobdata(job); + job->brokentype = job->type; + job->type = QEMU_BLOCKJOB_TYPE_BROKEN; +} + + /** * qemuBlockJobRegister: * @job: job to register @@ -460,6 +478,9 @@ qemuBlockJobRefreshJobs(virQEMUDriverPtr driver, * in qemu and just forget about it in libvirt because there's not much * we coud do besides killing the VM */ if (job->invalidData) { + + qemuBlockJobMarkBroken(job); + qemuDomainObjEnterMonitor(driver, vm); rc = qemuMonitorJobCancel(priv->mon, job->name, true); @@ -1254,6 +1275,8 @@ qemuBlockJobEventProcessConcludedTransition(qemuBlockJobDataPtr job, qemuBlockJobProcessEventConcludedCopyAbort(driver, vm, job, asyncJob); break; + + case QEMU_BLOCKJOB_TYPE_BROKEN: case QEMU_BLOCKJOB_TYPE_NONE: case QEMU_BLOCKJOB_TYPE_INTERNAL: case QEMU_BLOCKJOB_TYPE_LAST: diff --git a/src/qemu/qemu_blockjob.h b/src/qemu/qemu_blockjob.h index d8da918f2f..fdfe2c57ec 100644 --- a/src/qemu/qemu_blockjob.h +++ b/src/qemu/qemu_blockjob.h @@ -63,6 +63,7 @@ typedef enum { /* Additional enum values local to qemu */ QEMU_BLOCKJOB_TYPE_INTERNAL, QEMU_BLOCKJOB_TYPE_CREATE, + QEMU_BLOCKJOB_TYPE_BROKEN, QEMU_BLOCKJOB_TYPE_LAST } qemuBlockJobType; verify((int)QEMU_BLOCKJOB_TYPE_INTERNAL == VIR_DOMAIN_BLOCK_JOB_TYPE_LAST); @@ -131,6 +132,8 @@ struct _qemuBlockJobData { int newstate; /* qemuBlockjobState, subset of events emitted by qemu */ + int brokentype; /* the previous type of a broken blockjob qemuBlockJobType */ + bool invalidData; /* the job data (except name) is not valid */ bool reconnected; /* internal field for tracking whether job is live after reconnect to qemu */ }; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index c233a4ba96..d1596a28ca 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -2475,6 +2475,8 @@ qemuDomainObjPrivateXMLFormatBlockjobIterator(void *payload, virBufferEscapeString(&attrBuf, " type='%s'", qemuBlockjobTypeToString(job->type)); virBufferEscapeString(&attrBuf, " state='%s'", state); virBufferEscapeString(&attrBuf, " newstate='%s'", newstate); + if (job->brokentype != QEMU_BLOCKJOB_TYPE_NONE) + virBufferEscapeString(&attrBuf, " brokentype='%s'", qemuBlockjobTypeToString(job->brokentype)); virBufferEscapeString(&childBuf, "<errmsg>%s</errmsg>", job->errmsg); if (job->disk) { @@ -2536,6 +2538,8 @@ qemuDomainObjPrivateXMLFormatBlockjobIterator(void *payload, virBufferAddLit(&attrBuf, " shallownew='yes'"); break; + + case QEMU_BLOCKJOB_TYPE_BROKEN: case QEMU_BLOCKJOB_TYPE_NONE: case QEMU_BLOCKJOB_TYPE_INTERNAL: case QEMU_BLOCKJOB_TYPE_LAST: @@ -3100,6 +3104,8 @@ qemuDomainObjPrivateXMLParseBlockjobDataSpecific(qemuBlockJobDataPtr job, } break; + + case QEMU_BLOCKJOB_TYPE_BROKEN: case QEMU_BLOCKJOB_TYPE_NONE: case QEMU_BLOCKJOB_TYPE_INTERNAL: case QEMU_BLOCKJOB_TYPE_LAST: @@ -3125,6 +3131,7 @@ qemuDomainObjPrivateXMLParseBlockjobData(virDomainObjPtr vm, g_autoptr(qemuBlockJobData) job = NULL; g_autofree char *name = NULL; g_autofree char *typestr = NULL; + g_autofree char *brokentypestr = NULL; int type; g_autofree char *statestr = NULL; int state = QEMU_BLOCKJOB_STATE_FAILED; @@ -3146,13 +3153,17 @@ qemuDomainObjPrivateXMLParseBlockjobData(virDomainObjPtr vm, * clean it up */ if (!(typestr = virXPathString("string(./@type)", ctxt)) || (type = qemuBlockjobTypeFromString(typestr)) < 0) { - type = QEMU_BLOCKJOB_TYPE_NONE; + type = QEMU_BLOCKJOB_TYPE_BROKEN; invalidData = true; } if (!(job = qemuBlockJobDataNew(type, name))) return -1; + if ((brokentypestr = virXPathString("string(./@brokentype)", ctxt)) && + (job->brokentype = qemuBlockjobTypeFromString(brokentypestr)) < 0) + job->brokentype = QEMU_BLOCKJOB_TYPE_NONE; + if (!(statestr = virXPathString("string(./@state)", ctxt)) || (state = qemuBlockjobStateTypeFromString(statestr)) < 0) invalidData = true; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 669c12d6ca..fa8a948568 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -17418,6 +17418,7 @@ qemuDomainBlockPivot(virQEMUDriverPtr driver, case QEMU_BLOCKJOB_TYPE_COMMIT: case QEMU_BLOCKJOB_TYPE_INTERNAL: case QEMU_BLOCKJOB_TYPE_CREATE: + case QEMU_BLOCKJOB_TYPE_BROKEN: virReportError(VIR_ERR_OPERATION_INVALID, _("job type '%s' does not support pivot"), qemuBlockjobTypeToString(job->type)); diff --git a/tests/qemustatusxml2xmldata/blockjob-blockdev-in.xml b/tests/qemustatusxml2xmldata/blockjob-blockdev-in.xml index 4f6930001e..67ab099bd9 100644 --- a/tests/qemustatusxml2xmldata/blockjob-blockdev-in.xml +++ b/tests/qemustatusxml2xmldata/blockjob-blockdev-in.xml @@ -300,6 +300,7 @@ </source> </src> </blockjob> + <blockjob name='broken-test' type='broken' state='ready' brokentype='commit'/> <blockjob name='test-orphan-job0' type='copy' state='ready'> <chains> <disk type='file' format='qcow2'> -- 2.23.0

On 11/26/19 10:17 AM, Peter Krempa wrote:
To better track jobs we couldn't parse let's introduce a new job type which will clarify semantics internally in few places.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 25 ++++++++++++++++++- src/qemu/qemu_blockjob.h | 3 +++ src/qemu/qemu_domain.c | 13 +++++++++- src/qemu/qemu_driver.c | 1 + .../blockjob-blockdev-in.xml | 1 + 5 files changed, 41 insertions(+), 2 deletions(-)
Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole

Now that we have a separate job type which will not trigger normal code paths for terminating job we can remove the ad-hoc handling. This possibly fixes the issue of a broken job inheriting the disk and then finishing in which case we'd not detach the backing chain. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index d957e3175e..2c51367c67 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -1334,11 +1334,6 @@ qemuBlockJobEventProcessConcluded(qemuBlockJobDataPtr job, if (qemuDomainObjExitMonitor(driver, vm) < 0) goto cleanup; - if (job->invalidData) { - VIR_WARN("terminating job '%s' with invalid data", job->name); - goto cleanup; - } - if ((job->newstate == QEMU_BLOCKJOB_STATE_COMPLETED || job->newstate == QEMU_BLOCKJOB_STATE_FAILED) && job->state == QEMU_BLOCKJOB_STATE_ABORTING) -- 2.23.0

On 11/26/19 10:17 AM, Peter Krempa wrote:
Now that we have a separate job type which will not trigger normal code paths for terminating job we can remove the ad-hoc handling.
This possibly fixes the issue of a broken job inheriting the disk and then finishing in which case we'd not detach the backing chain.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_blockjob.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index d957e3175e..2c51367c67 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -1334,11 +1334,6 @@ qemuBlockJobEventProcessConcluded(qemuBlockJobDataPtr job, if (qemuDomainObjExitMonitor(driver, vm) < 0) goto cleanup;
- if (job->invalidData) { - VIR_WARN("terminating job '%s' with invalid data", job->name); - goto cleanup; - } - if ((job->newstate == QEMU_BLOCKJOB_STATE_COMPLETED || job->newstate == QEMU_BLOCKJOB_STATE_FAILED) && job->state == QEMU_BLOCKJOB_STATE_ABORTING)
Reviewed-by: Cole Robinson <crobinso@redhat.com> - Cole
participants (2)
-
Cole Robinson
-
Peter Krempa