[PATCH v2 00/10] qemu: add option to process offloaded legacy blockjob event earlier

This is a successor to [1], but I changed the subject because in the review the patch 'qemu: sync backing chain update in virDomainGetBlockJobInfo' was not considered a good one from a design POV. However, I think the basic patch is helpful for addressing similar issues. Look at [*] for example: there it allows syncing backing chains while querying block stats.

In the discussion of [1] I stated that the first patch would also allow getting rid of stale block job events on reconnection to qemu. But this actually requires additional work, so with this patch series stale events are still present. In that discussion Peter also mentioned that stale events are not harmful for the legacy blockjob code. I also removed the previous attempts to eliminate some of the stale events.

I still keep the patch for virDomainGetBlockJobInfo. Maybe in comparison to [*] the approach that patch takes will not look so bad :)

[1] First version of the patch series
https://www.redhat.com/archives/libvir-list/2020-October/msg01133.html

Nikolay Shirokovskiy (10):
  qemu: add option to process offloaded legacy blockjob event earlier
  qemu: reconnect: precreate legacy blockjobs
  qemu: remove extra block job finalize on reconnect
  qemu: remove stale cleanup in qemuProcessRefreshLegacyBlockjob
  qemu: add note for outdated legacy block job events
  qemu: use autoptr in qemuProcessRefreshLegacyBlockjobs
  qemu: refresh backing chain after block job reconnection
  qemu: move code that depends on backing chain appropriately
  qemu: fix race on legacy block completion and querying stats
  [*] qemu: sync backing chain update in virDomainGetBlockJobInfo

 src/qemu/qemu_blockjob.c |   2 +-
 src/qemu/qemu_blockjob.h |   7 +++
 src/qemu/qemu_driver.c   |  50 +++++++++++-----
 src/qemu/qemu_process.c  | 149 ++++++++++++++++++++++++++++-------------------
 4 files changed, 132 insertions(+), 76 deletions(-)

--
1.8.3.1

Currently in qemuProcessHandleBlockJob we either offload blockjob event
processing to the worker thread or notify another thread that is waiting for
the blockjob event so that it processes the event. But sometimes, after an
event has been offloaded to the worker thread, we need to process it in a
different thread. To make that possible, let's always set newstate/errmsg and
then let whichever thread gets there first process it.

Libvirt always creates a blockjob object before starting a blockjob, but at
least in Virtuozzo we have been using virDomainQemuMonitorCommand for several
years to start push backups and rely on the behaviour that libvirt sends block
job events even in this case. Thus let's continue to send notifications if the
block job object is not found.

Note that we can have an issue on reconnect when block jobs are not yet
created, but this will be addressed in the following patches.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
---
 src/qemu/qemu_blockjob.c |  2 +-
 src/qemu/qemu_blockjob.h |  7 +++++++
 src/qemu/qemu_driver.c   | 17 ++++-------------
 src/qemu/qemu_process.c  | 19 +++++++++++--------
 4 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c
index 2a5a5e6..cf3939d 100644
--- a/src/qemu/qemu_blockjob.c
+++ b/src/qemu/qemu_blockjob.c
@@ -596,7 +596,7 @@ qemuBlockJobRefreshJobs(virQEMUDriverPtr driver,
  * Emits the VIR_DOMAIN_EVENT_ID_BLOCK_JOB and VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2
  * for a block job. The former event is emitted only for local disks.
  */
-static void
+void
 qemuBlockJobEmitEvents(virQEMUDriverPtr driver,
                        virDomainObjPtr vm,
                        virDomainDiskDefPtr disk,
diff --git a/src/qemu/qemu_blockjob.h b/src/qemu/qemu_blockjob.h
index 9f73a35..5edb459 100644
--- a/src/qemu/qemu_blockjob.h
+++ b/src/qemu/qemu_blockjob.h
@@ -245,3 +245,10 @@ qemuBlockJobGetByDisk(virDomainDiskDefPtr disk)
 
 qemuBlockjobState
 qemuBlockjobConvertMonitorStatus(int monitorstatus);
+
+void
+qemuBlockJobEmitEvents(virQEMUDriverPtr driver,
+                       virDomainObjPtr vm,
+                       virDomainDiskDefPtr disk,
+                       virDomainBlockJobType type,
+                       virConnectDomainEventBlockJobStatus status);
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 05f8eb2..25466f6 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4116,9 +4116,7 @@ processSerialChangedEvent(virQEMUDriverPtr driver,
 static void
 processBlockJobEvent(virQEMUDriverPtr driver,
                      virDomainObjPtr vm,
-                     const char *diskAlias,
-                     int type,
-                     int status)
+                     const char *diskAlias)
 {
     virDomainDiskDefPtr disk;
     g_autoptr(qemuBlockJobData) job = NULL;
@@ -4137,14 +4135,10 @@ processBlockJobEvent(virQEMUDriverPtr driver,
     }
 
     if (!(job = qemuBlockJobDiskGetJob(disk))) {
-        VIR_DEBUG("creating new block job object for '%s'", diskAlias);
-        if (!(job = qemuBlockJobDiskNew(vm, disk, type, diskAlias)))
-            goto endjob;
-        job->state = QEMU_BLOCKJOB_STATE_RUNNING;
+        VIR_DEBUG("disk %s job not found", diskAlias);
+        goto endjob;
     }
 
-    job->newstate = status;
-
     qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE);
 
  endjob:
@@ -4319,10 +4313,7 @@ static void qemuProcessEventHandler(void *data, void *opaque)
                                      processEvent->action);
         break;
     case QEMU_PROCESS_EVENT_BLOCK_JOB:
-        processBlockJobEvent(driver, vm,
-                             processEvent->data,
-                             processEvent->action,
-                             processEvent->status);
+        processBlockJobEvent(driver, vm, processEvent->data);
         break;
     case QEMU_PROCESS_EVENT_JOB_STATUS_CHANGE:
         processJobStatusChangeEvent(driver, vm, processEvent->data);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 1963de9..a810f38 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -953,13 +953,18 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon G_GNUC_UNUSED,
     if (!(disk = qemuProcessFindDomainDiskByAliasOrQOM(vm, diskAlias, NULL)))
         goto cleanup;
 
-    job = qemuBlockJobDiskGetJob(disk);
+    if (!(job = qemuBlockJobDiskGetJob(disk))) {
+        VIR_DEBUG("block job for '%s' not found", diskAlias);
+        qemuBlockJobEmitEvents(driver, vm, disk, type, status);
+        goto cleanup;
+    }
 
-    if (job && job->synchronous) {
-        /* We have a SYNC API waiting for this event, dispatch it back */
-        job->newstate = status;
-        VIR_FREE(job->errmsg);
-        job->errmsg = g_strdup(error);
+    job->newstate = status;
+    VIR_FREE(job->errmsg);
+    job->errmsg = g_strdup(error);
+
+    /* We have a SYNC API waiting for this event, dispatch it back */
+    if (job->synchronous) {
         virDomainObjBroadcast(vm);
     } else {
         /* there is no waiting SYNC API, dispatch the update to a thread */
@@ -969,8 +974,6 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon G_GNUC_UNUSED,
         data = g_strdup(diskAlias);
         processEvent->data = data;
        processEvent->vm = virObjectRef(vm);
-        processEvent->action = type;
-        processEvent->status = status;
 
         if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
             virObjectUnref(vm);
--
1.8.3.1
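For context, a minimal sketch (not part of the series) of the waiter side that the commit message alludes to: whichever thread calls qemuBlockJobUpdate first consumes the stored newstate. The helper name is hypothetical; qemuBlockJobUpdate, virDomainObjWait and the job fields are assumed from the surrounding diffs and libvirt internals, so treat this only as an outline of the synchronization pattern.

    /* Hypothetical helper sketching how a synchronous caller would wait for a
     * legacy block job event after this patch: the handler thread stores
     * job->newstate and either broadcasts or offloads, and whichever thread
     * runs qemuBlockJobUpdate() first consumes the pending state. */
    static int
    exampleWaitForLegacyBlockJob(virDomainObjPtr vm,
                                 qemuBlockJobDataPtr job)
    {
        /* consume any state the event handler may already have stored */
        qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE);

        while (job->state == QEMU_BLOCKJOB_STATE_RUNNING) {
            /* releases the domain lock and sleeps until
             * qemuProcessHandleBlockJob() calls virDomainObjBroadcast() */
            if (virDomainObjWait(vm) < 0)
                return -1;

            /* consume whatever newstate the event handler stored */
            qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE);
        }

        return 0;
    }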

Now we basically ignore a qemu block job event if there is no corresponding
block job object in libvirt. So we need to precreate block job objects before
we call qemu's query-block-jobs, because we can receive an event right after
receiving the query-block-jobs result but before we create the block job
object in libvirt, and thus miss the event.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
---
 src/qemu/qemu_process.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index a810f38..549c17a 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -8153,8 +8153,8 @@ qemuProcessRefreshLegacyBlockjob(void *payload,
         disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT)
         jobtype = disk->mirrorJob;
 
-    if (!(job = qemuBlockJobDiskNew(vm, disk, jobtype, jobname)))
-        return -1;
+    job = qemuBlockJobDiskGetJob(disk);
+    job->type = jobtype;
 
     if (disk->mirror) {
         if (info->ready == 1 ||
@@ -8199,6 +8199,16 @@ qemuProcessRefreshLegacyBlockjobs(virQEMUDriverPtr driver,
 {
     GHashTable *blockJobs = NULL;
     int ret = -1;
+    size_t i;
+
+    for (i = 0; i < vm->def->ndisks; i++) {
+        virDomainDiskDefPtr disk = vm->def->disks[i];
+        g_autoptr(qemuBlockJobData) job = NULL;
+        g_autofree char *jobname = NULL;
+
+        jobname = qemuAliasDiskDriveFromDisk(disk);
+        job = qemuBlockJobDiskNew(vm, disk, QEMU_BLOCKJOB_TYPE_NONE, jobname);
+    }
 
     qemuDomainObjEnterMonitor(driver, vm);
     blockJobs = qemuMonitorGetAllBlockJobInfo(qemuDomainGetMonitor(vm), true);
@@ -8208,6 +8218,13 @@
     if (virHashForEach(blockJobs, qemuProcessRefreshLegacyBlockjob, vm) < 0)
         goto cleanup;
 
+    for (i = 0; i < vm->def->ndisks; i++) {
+        virDomainDiskDefPtr disk = vm->def->disks[i];
+        qemuBlockJobDataPtr job = qemuBlockJobDiskGetJob(disk);
+
+        qemuBlockJobStartupFinalize(vm, job);
+    }
+
     ret = 0;
 
  cleanup:
--
1.8.3.1

We now call qemuBlockJobStartupFinalize in qemuProcessRefreshLegacyBlockjobs
for the precreated block job of every disk anyway, so the extra finalize on
this path is no longer needed.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
---
 src/qemu/qemu_process.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 549c17a..49ee8fe 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -8140,7 +8140,7 @@ qemuProcessRefreshLegacyBlockjob(void *payload,
     virDomainObjPtr vm = opaque;
     qemuMonitorBlockJobInfoPtr info = payload;
     virDomainDiskDefPtr disk;
-    qemuBlockJobDataPtr job;
+    g_autoptr(qemuBlockJobData) job = NULL;
     qemuBlockJobType jobtype = info->type;
     qemuDomainObjPrivatePtr priv = vm->privateData;
 
@@ -8187,7 +8187,6 @@ qemuProcessRefreshLegacyBlockjob(void *payload,
     qemuBlockJobStarted(job, vm);
 
  cleanup:
-    qemuBlockJobStartupFinalize(vm, job);
     return 0;
 }
--
1.8.3.1

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
---
 src/qemu/qemu_process.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 49ee8fe..9bbc9dc 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -8172,7 +8172,7 @@ qemuProcessRefreshLegacyBlockjob(void *payload,
         if (disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) {
             if (qemuDomainDetermineDiskChain(priv->driver, vm, disk,
                                              disk->mirror, true) < 0)
-                goto cleanup;
+                return 0;
 
             if (disk->mirror->format &&
                 disk->mirror->format != VIR_STORAGE_FILE_RAW &&
@@ -8180,14 +8180,11 @@ qemuProcessRefreshLegacyBlockjob(void *payload,
                 qemuSetupImageChainCgroup(vm, disk->mirror) < 0 ||
                 qemuSecuritySetImageLabel(priv->driver, vm, disk->mirror,
                                           true, true) < 0))
-                goto cleanup;
+                return 0;
         }
     }
 
     qemuBlockJobStarted(job, vm);
-
- cleanup:
-    return 0;
 }
--
1.8.3.1

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
---
 src/qemu/qemu_process.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 9bbc9dc..8706de3 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -8214,6 +8214,12 @@ qemuProcessRefreshLegacyBlockjobs(virQEMUDriverPtr driver,
     if (virHashForEach(blockJobs, qemuProcessRefreshLegacyBlockjob, vm) < 0)
         goto cleanup;
 
+    /*
+     * At this point we can have outdated pending events in job->newstate.
+     * These events can arise from qemu events received before we get
+     * reply to query-block-jobs command. So the code that handle legacy
+     * block job events should handle outdated events correctly.
+     */
     for (i = 0; i < vm->def->ndisks; i++) {
         virDomainDiskDefPtr disk = vm->def->disks[i];
         qemuBlockJobDataPtr job = qemuBlockJobDiskGetJob(disk);
--
1.8.3.1

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
---
 src/qemu/qemu_process.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 8706de3..46a39ac 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -8193,8 +8193,7 @@ static int
 qemuProcessRefreshLegacyBlockjobs(virQEMUDriverPtr driver,
                                   virDomainObjPtr vm)
 {
-    GHashTable *blockJobs = NULL;
-    int ret = -1;
+    g_autoptr(GHashTable) blockJobs = NULL;
     size_t i;
 
     for (i = 0; i < vm->def->ndisks; i++) {
@@ -8209,10 +8208,10 @@ qemuProcessRefreshLegacyBlockjobs(virQEMUDriverPtr driver,
     qemuDomainObjEnterMonitor(driver, vm);
     blockJobs = qemuMonitorGetAllBlockJobInfo(qemuDomainGetMonitor(vm), true);
     if (qemuDomainObjExitMonitor(driver, vm) < 0 || !blockJobs)
-        goto cleanup;
+        return -1;
 
     if (virHashForEach(blockJobs, qemuProcessRefreshLegacyBlockjob, vm) < 0)
-        goto cleanup;
+        return -1;
 
     /*
      * At this point we can have outdated pending events in job->newstate.
@@ -8227,11 +8226,7 @@ qemuProcessRefreshLegacyBlockjobs(virQEMUDriverPtr driver,
         qemuBlockJobStartupFinalize(vm, job);
     }
 
-    ret = 0;
-
- cleanup:
-    virHashFree(blockJobs);
-    return ret;
+    return 0;
 }
--
1.8.3.1

After [1] we basically ignore a qemu block job event if there is no
corresponding block job object in libvirt. Thus we need to refresh the backing
chain after we reconnect block jobs, or we can miss events that change the
backing chain.

There is also another reason for it: I'm not sure of the result if libvirt is
reading image metadata while refreshing the backing chain at the same time as
qemu is writing metadata on block job completion.

Reconnection code that depends on the backing chain will be moved appropriately
in a different patch.

[1] qemu: add option to process offloaded legacy blockjob event earlier

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
---
 src/qemu/qemu_process.c | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 46a39ac..0d50db5 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -8193,9 +8193,15 @@ static int
 qemuProcessRefreshLegacyBlockjobs(virQEMUDriverPtr driver,
                                   virDomainObjPtr vm)
 {
+    qemuDomainObjPrivatePtr priv = vm->privateData;
     g_autoptr(GHashTable) blockJobs = NULL;
     size_t i;
 
+    if (priv->reconnectBlockjobs == VIR_TRISTATE_BOOL_NO) {
+        VIR_DEBUG("skipping block job reconnection");
+        return 0;
+    }
+
     for (i = 0; i < vm->def->ndisks; i++) {
         virDomainDiskDefPtr disk = vm->def->disks[i];
         g_autoptr(qemuBlockJobData) job = NULL;
@@ -8221,9 +8227,25 @@
      */
     for (i = 0; i < vm->def->ndisks; i++) {
         virDomainDiskDefPtr disk = vm->def->disks[i];
-        qemuBlockJobDataPtr job = qemuBlockJobDiskGetJob(disk);
+        g_autoptr(qemuBlockJobData) job = qemuBlockJobDiskGetJob(disk);
 
         qemuBlockJobStartupFinalize(vm, job);
+
+        if ((job = qemuBlockJobDiskGetJob(disk)))
+            continue;
+
+        if (!disk->mirror) {
+            /* This should be the only place that calls
+             * qemuDomainDetermineDiskChain with @report_broken == false
+             * to guarantee best-effort domain reconnect */
+            virStorageSourceBackingStoreClear(disk->src);
+            if (qemuDomainDetermineDiskChain(driver, vm, disk, NULL, false) < 0)
+                return -1;
+        }
+        /*
+         * TODO we need code to properly reconnect for mirror blockjobs
+         * that completed while libvirtd was down.
+         */
     }
 
     return 0;
@@ -8360,19 +8382,6 @@ qemuProcessReconnect(void *opaque)
         if (virDomainDiskTranslateSourcePool(disk) < 0)
             goto error;
 
-        /* backing chains need to be refreshed only if they could change */
-        if (priv->reconnectBlockjobs != VIR_TRISTATE_BOOL_NO &&
-            !virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKDEV)) {
-            /* This should be the only place that calls
-             * qemuDomainDetermineDiskChain with @report_broken == false
-             * to guarantee best-effort domain reconnect */
-            virStorageSourceBackingStoreClear(disk->src);
-            if (qemuDomainDetermineDiskChain(driver, obj, disk, NULL, false) < 0)
-                goto error;
-        } else {
-            VIR_DEBUG("skipping backing chain detection for '%s'", disk->dst);
-        }
-
         dev.type = VIR_DOMAIN_DEVICE_DISK;
         dev.data.disk = disk;
         if (qemuAddSharedDevice(driver, &dev, obj->def->name) < 0)
             goto error;
--
1.8.3.1

This follows the previous patch, which moved backing chain detection later in
the code: move the code that depends on the backing chain after it as well.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
---
 src/qemu/qemu_process.c | 47 +++++++++++++++++++++++++----------------------
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 0d50db5..0dadc69 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -8372,22 +8372,6 @@ qemuProcessReconnect(void *opaque)
     if (qemuDomainInitializePflashStorageSource(obj) < 0)
         goto error;
 
-    /* XXX: Need to change as long as lock is introduced for
-     * qemu_driver->sharedDevices.
-     */
-    for (i = 0; i < obj->def->ndisks; i++) {
-        virDomainDiskDefPtr disk = obj->def->disks[i];
-        virDomainDeviceDef dev;
-
-        if (virDomainDiskTranslateSourcePool(disk) < 0)
-            goto error;
-
-        dev.type = VIR_DOMAIN_DEVICE_DISK;
-        dev.data.disk = disk;
-        if (qemuAddSharedDevice(driver, &dev, obj->def->name) < 0)
-            goto error;
-    }
-
     for (i = 0; i < obj->def->ngraphics; i++) {
         if (qemuProcessGraphicsReservePorts(obj->def->graphics[i], true) < 0)
             goto error;
@@ -8436,12 +8420,6 @@ qemuProcessReconnect(void *opaque)
             goto error;
     }
 
-    /* if domain requests security driver we haven't loaded, report error, but
-     * do not kill the domain
-     */
-    ignore_value(qemuSecurityCheckAllLabel(driver->securityManager,
-                                           obj->def));
-
     if (qemuProcessRefreshCPU(driver, obj) < 0)
         goto error;
@@ -8482,6 +8460,31 @@ qemuProcessReconnect(void *opaque)
     if (qemuProcessRefreshBlockjobs(driver, obj) < 0)
         goto error;
 
+    /* XXX: Need to change as long as lock is introduced for
+     * qemu_driver->sharedDevices.
+     *
+     * Handling shared devices depends on top source and thus should
+     * follow refreshing blockjobs which can update it.
+     */
+    for (i = 0; i < obj->def->ndisks; i++) {
+        virDomainDiskDefPtr disk = obj->def->disks[i];
+        virDomainDeviceDef dev;
+
+        if (virDomainDiskTranslateSourcePool(disk) < 0)
+            goto error;
+
+        dev.type = VIR_DOMAIN_DEVICE_DISK;
+        dev.data.disk = disk;
+        if (qemuAddSharedDevice(driver, &dev, obj->def->name) < 0)
+            goto error;
+    }
+
+    /* if domain requests security driver we haven't loaded, report error, but
+     * do not kill the domain
+     */
+    ignore_value(qemuSecurityCheckAllLabel(driver->securityManager,
+                                           obj->def));
+
     if (qemuProcessUpdateDevices(driver, obj) < 0)
         goto error;
--
1.8.3.1

At the time we query qemu for block stats, the backing chain in qemu and in
libvirt can be different, and this results in messy block stats. I guess this
can be noticeable under load, when the thread that processes events is busy,
so that it can take some time before block job events are processed.

Modern block jobs have the same issue I guess, but I don't know them well
enough to apply this fix there too. Also the approach for them can be
different, as there is an option to use auto-finalize = no.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
---
 src/qemu/qemu_driver.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 25466f6..05917eb 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -18458,8 +18458,9 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr driver,
     int count_index = -1;
     size_t visited = 0;
     bool visitBacking = !!(privflags & QEMU_DOMAIN_STATS_BACKING);
+    bool refresh = true;
 
-    if (HAVE_JOB(privflags) && virDomainObjIsActive(dom)) {
+    while (HAVE_JOB(privflags) && virDomainObjIsActive(dom) && refresh) {
         qemuDomainObjEnterMonitor(driver, dom);
         rc = qemuMonitorGetAllBlockStatsInfo(priv->mon, &stats,
                                              visitBacking);
@@ -18481,6 +18482,27 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr driver,
         /* failure to retrieve stats is fine at this point */
         if (rc < 0 || (fetchnodedata && !nodedata))
             virResetLastError();
+
+        refresh = false;
+        if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKDEV)) {
+            for (i = 0; i < dom->def->ndisks; i++) {
+                virDomainDiskDefPtr disk = dom->def->disks[i];
+                g_autoptr(qemuBlockJobData) job = qemuBlockJobDiskGetJob(disk);
+
+                if (!job)
+                    continue;
+
+                /*
+                 * If block job is completed then backing chain at time
+                 * of quering stats in qemu and current backing chain in
+                 * libvirt can be different. We need to sync them.
+                 */
+                if (job->newstate == QEMU_BLOCKJOB_STATE_COMPLETED) {
+                    qemuBlockJobUpdate(dom, job, QEMU_ASYNC_JOB_NONE);
+                    refresh = true;
+                }
+            }
+        }
     }
 
     if (nodedata &&
--
1.8.3.1

Some management applications still use polling for block job completion. After
the job completes, its failure/success is inferred by inspecting the domain
xml. With legacy block job processing this does not always work.

The issue deals with how libvirt processes events. If no other thread is
waiting for the blockjob event, then event processing is offloaded to the
worker thread. If the virDomainGetBlockJobInfo API is called at that point,
then, as the block job is already dismissed in the legacy scheme, the API
returns 0, but the backing chain is not yet updated because the processing is
yet to be done in the worker thread. Now the management application checks the
backing chain right after the API call returns and detects an error. This
happens quite often under load, I guess because we have only one worker thread
for all the domains.

Event delivery is synchronous in qemu and the block job completed event is
sent in the job finalize step, so if the block job is absent the event has
already been delivered and we just need to process it.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
---
 src/qemu/qemu_driver.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 05917eb..25f66df 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -14740,8 +14740,15 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom,
     ret = qemuMonitorGetBlockJobInfo(qemuDomainGetMonitor(vm), job->name, &rawInfo);
     if (qemuDomainObjExitMonitor(driver, vm) < 0)
         ret = -1;
-    if (ret <= 0)
+    if (ret < 0)
+        goto endjob;
+    if (ret == 0) {
+        qemuDomainObjPrivatePtr priv = vm->privateData;
+
+        if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKDEV))
+            qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE);
         goto endjob;
+    }
 
     if (qemuBlockJobInfoTranslate(&rawInfo, info, disk,
                                   flags & VIR_DOMAIN_BLOCK_JOB_INFO_BANDWIDTH_BYTES) < 0) {
--
1.8.3.1
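For illustration, a rough sketch of the polling pattern this patch is trying to keep working, using only public libvirt APIs; the function name and the 500 ms interval are made up for the example, and error handling is trimmed.

    /* Sketch of the polling pattern described above.  With pre-blockdev qemu
     * and legacy event handling, the window between virDomainGetBlockJobInfo()
     * returning 0 and the worker thread updating the backing chain is the race
     * this patch narrows. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <libvirt/libvirt.h>

    void
    pollBlockJob(virDomainPtr dom, const char *disk)
    {
        virDomainBlockJobInfo info;
        int rc;

        /* returns 1 while the job exists, 0 once it is gone, -1 on error */
        while ((rc = virDomainGetBlockJobInfo(dom, disk, &info, 0)) == 1)
            usleep(500 * 1000);

        if (rc == 0) {
            /* the management app now inspects the XML to infer success;
             * before this patch the backing chain may not be updated yet */
            char *xml = virDomainGetXMLDesc(dom, 0);
            if (xml) {
                puts(xml);
                free(xml);
            }
        }
    }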

On Fri, Nov 13, 2020 at 09:53:28 +0300, Nikolay Shirokovskiy wrote:
This is a successor to [1], but I changed the subject because in the review the patch 'qemu: sync backing chain update in virDomainGetBlockJobInfo' was not considered a good one from a design POV. However, I think the basic patch is helpful for addressing similar issues. Look at [*] for example: there it allows syncing backing chains while querying block stats.
In the discussion of [1] I stated that the first patch would also allow getting rid of stale block job events on reconnection to qemu. But this actually requires additional work, so with this patch series stale events are still present. In that discussion Peter also mentioned that stale events are not harmful for the legacy blockjob code. I also removed the previous attempts to eliminate some of the stale events.
I still keep the patch for virDomainGetBlockJobInfo. Maybe in comparison to [*] the approach that patch takes will not look so bad :)
[1] First version of the patch series https://www.redhat.com/archives/libvir-list/2020-October/msg01133.html
Please allow for a longer review time. I'm stuffed with other work, and also, as I've pointed out last time, I'm not very much in favor of modifying the old block job code since it's being phased out. I still suggest the client application be fixed to use the events and stop polling virDomainGetBlockJobInfo.
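For reference, a minimal sketch of the event-driven approach suggested here, built only on public libvirt APIs (error handling omitted); the callback and program structure are illustrative, not taken from any particular application.

    /* Register for block job events instead of polling virDomainGetBlockJobInfo. */
    #include <stdio.h>
    #include <libvirt/libvirt.h>

    static void
    blockJobCb(virConnectPtr conn, virDomainPtr dom, const char *disk,
               int type, int status, void *opaque)
    {
        if (status == VIR_DOMAIN_BLOCK_JOB_COMPLETED)
            printf("block job on %s completed\n", disk);
        else if (status == VIR_DOMAIN_BLOCK_JOB_FAILED)
            printf("block job on %s failed\n", disk);
    }

    int main(void)
    {
        virConnectPtr conn;

        virEventRegisterDefaultImpl();
        conn = virConnectOpen("qemu:///system");

        /* BLOCK_JOB_2 reports the disk by target name rather than by path */
        virConnectDomainEventRegisterAny(conn, NULL,
                                         VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2,
                                         VIR_DOMAIN_EVENT_CALLBACK(blockJobCb),
                                         NULL, NULL);

        /* dispatch events until an error occurs */
        while (virEventRunDefaultImpl() == 0)
            ;

        virConnectClose(conn);
        return 0;
    }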

On 13.11.2020 10:24, Peter Krempa wrote:
On Fri, Nov 13, 2020 at 09:53:28 +0300, Nikolay Shirokovskiy wrote:
This is a successor to [1], but I changed the subject because in the review the patch 'qemu: sync backing chain update in virDomainGetBlockJobInfo' was not considered a good one from a design POV. However, I think the basic patch is helpful for addressing similar issues. Look at [*] for example: there it allows syncing backing chains while querying block stats.
In the discussion of [1] I stated that the first patch would also allow getting rid of stale block job events on reconnection to qemu. But this actually requires additional work, so with this patch series stale events are still present. In that discussion Peter also mentioned that stale events are not harmful for the legacy blockjob code. I also removed the previous attempts to eliminate some of the stale events.
I still keep the patch for virDomainGetBlockJobInfo. Maybe in comparison to [*] the approach that patch takes will not look so bad :)
[1] First version of the patch series https://www.redhat.com/archives/libvir-list/2020-October/msg01133.html
Please allow for a longer review time. I'm stuffed with other work, and also, as I've pointed out last time, I'm not very much in favor of modifying the old block job code since it's being phased out.
I still suggest the client application be fixed to use the events and stop polling virDomainGetBlockJobInfo.
Of course, take your time, we all have our duties. Note that there is a new case in "[PATCH v2 09/10] qemu: fix race on legacy block completion and querying stats" that is similar to the case in virDomainGetBlockJobInfo. AFAIU modern blockjobs are also affected. Nikolay

On Fri, Nov 13, 2020 at 09:53:28 +0300, Nikolay Shirokovskiy wrote:
This is a successor to [1], but I changed the subject because in the review the patch 'qemu: sync backing chain update in virDomainGetBlockJobInfo' was not considered a good one from a design POV. However, I think the basic patch is helpful for addressing similar issues. Look at [*] for example: there it allows syncing backing chains while querying block stats.
In the discussion of [1] I stated that the first patch would also allow getting rid of stale block job events on reconnection to qemu. But this actually requires additional work, so with this patch series stale events are still present. In that discussion Peter also mentioned that stale events are not harmful for the legacy blockjob code. I also removed the previous attempts to eliminate some of the stale events.
I still keep the patch for virDomainGetBlockJobInfo. Maybe in comparison to [*] the approach that patch takes will not look so bad :)
[1] First version of the patch series https://www.redhat.com/archives/libvir-list/2020-October/msg01133.html
I had a look at this series and it's just doing too much just to "fix" apps which insist on polling especially with pre-blockdev qemu. I simply think it's not worth adding the complexity you are proposing.

NACK series.

Instead I propose we add a workaround which will report fake unfinished job:
https://www.redhat.com/archives/libvir-list/2020-December/msg00352.html

The scope of this is way more limited and it doesn't actually try to modify the job code handovers which is very dangerous. Additionally it adds documentation stating that apps should not poll virDomainGetBlockJobInfo.

On 04.12.2020 18:11, Peter Krempa wrote:
On Fri, Nov 13, 2020 at 09:53:28 +0300, Nikolay Shirokovskiy wrote:
This is a successor to [1], but I changed the subject because in the review the patch 'qemu: sync backing chain update in virDomainGetBlockJobInfo' was not considered a good one from a design POV. However, I think the basic patch is helpful for addressing similar issues. Look at [*] for example: there it allows syncing backing chains while querying block stats.
In the discussion of [1] I stated that the first patch would also allow getting rid of stale block job events on reconnection to qemu. But this actually requires additional work, so with this patch series stale events are still present. In that discussion Peter also mentioned that stale events are not harmful for the legacy blockjob code. I also removed the previous attempts to eliminate some of the stale events.
I still keep the patch for virDomainGetBlockJobInfo. Maybe in comparison to [*] the approach that patch takes will not look so bad :)
[1] First version of the patch series https://www.redhat.com/archives/libvir-list/2020-October/msg01133.html
I had a look at this series and it's just doing too much just to "fix" apps which insist on polling especially with pre-blockdev qemu. I simply think it's not worth adding the complexity you are proposing.
NACK series.
Ok. By the way, what about the issue described in "[PATCH v2 09/10] qemu: fix race on legacy block completion and querying stats"? Do modern blockjobs have a similar one? Nikolay
Instead I propose we add a workaround which will report fake unfinished job:
https://www.redhat.com/archives/libvir-list/2020-December/msg00352.html
The scope of this is way more limited and it doesn't actually try to modify the job code handovers which is very dangerous.
Additionally it adds documentation stating that apps should not poll virDomainGetBlockJobInfo.

On Fri, Dec 04, 2020 at 18:44:00 +0300, Nikolay Shirokovskiy wrote:
On 04.12.2020 18:11, Peter Krempa wrote:
On Fri, Nov 13, 2020 at 09:53:28 +0300, Nikolay Shirokovskiy wrote:
[...]
Ok. By the way, what about the issue described in
[PATCH v2 09/10] qemu: fix race on legacy block completion and querying stats
Do modern blockjobs have a similar one?
Modern blockjobs are started with '"auto-dismiss": false' so if you get to the monitor before the thread completing the blockjob finishes it, qemu still reports the status for the job. The completion is done inside a 'MODIFY' job lock, so another thread won't be able to get to the monitor in the window between the job being dismissed in qemu and the XML/definition being updated.

On 04.12.2020 18:52, Peter Krempa wrote:
On Fri, Dec 04, 2020 at 18:44:00 +0300, Nikolay Shirokovskiy wrote:
On 04.12.2020 18:11, Peter Krempa wrote:
On Fri, Nov 13, 2020 at 09:53:28 +0300, Nikolay Shirokovskiy wrote:
[...]
Ok. By the way, what about the issue described in
[PATCH v2 09/10] qemu: fix race on legacy block completion and querying stats
Do modern blockjobs have a similar one?
Modern blockjobs are started with '"auto-dismiss": false' so if you get to the monitor before the thread completing the blockjob finishes it, qemu still reports the status for the job.
But from the qemu docs I read that graph changes occur at the finalize step, and modern blockjobs use auto-finalize = true. So I thought that at the moment of querying stats the graph in libvirt and in qemu can be different. Nikolay
The completion is done inside a 'MODIFY' job lock, so another thread won't be able to get to the monitor in the window between the job being dismissed in qemu and the XML/definition being updated.

On Fri, Dec 04, 2020 at 19:00:57 +0300, Nikolay Shirokovskiy wrote:
On 04.12.2020 18:52, Peter Krempa wrote:
On Fri, Dec 04, 2020 at 18:44:00 +0300, Nikolay Shirokovskiy wrote:
On 04.12.2020 18:11, Peter Krempa wrote:
On Fri, Nov 13, 2020 at 09:53:28 +0300, Nikolay Shirokovskiy wrote:
[...]
Ok. By the way, what about the issue described in
[PATCH v2 09/10] qemu: fix race on legacy block completion and querying stats
Do modern blockjobs have a similar one?
Modern blockjobs are started with '"auto-dismiss": false' so if you get to the monitor before the thread completing the blockjob finishes it, qemu still reports the status for the job.
But from the qemu docs I read that graph changes occur at the finalize step, and modern blockjobs use auto-finalize = true. So I thought that at the moment of querying stats the graph in libvirt and in qemu can be different.
That doesn't matter for the purpose of the data returned from 'qemuDomainGetBlockJobInfo'. QEMU still reports the job, so you will still see it as running and the XML will look the same as while it was running. In fact the images are still in use by qemu at that point as they were not blockdev-del-ed. The graph will change though.

On 04.12.2020 19:24, Peter Krempa wrote:
On Fri, Dec 04, 2020 at 19:00:57 +0300, Nikolay Shirokovskiy wrote:
On 04.12.2020 18:52, Peter Krempa wrote:
On Fri, Dec 04, 2020 at 18:44:00 +0300, Nikolay Shirokovskiy wrote:
On 04.12.2020 18:11, Peter Krempa wrote:
On Fri, Nov 13, 2020 at 09:53:28 +0300, Nikolay Shirokovskiy wrote:
[...]
Ok. By the way, what about the issue described in
[PATCH v2 09/10] qemu: fix race on legacy block completion and querying stats
Do modern blockjobs have a similar one?
Modern blockjobs are started with '"auto-dismiss": false' so if you get to the monitor before the thread completing the blockjob finishes it, qemu still reports the status for the job.
But from the qemu docs I read that graph changes occur at the finalize step, and modern blockjobs use auto-finalize = true. So I thought that at the moment of querying stats the graph in libvirt and in qemu can be different.
That doesn't matter for the purpose of the data returned from 'qemuDomainGetBlockJobInfo'. QEMU still reports the job, so you will still see it as running and the XML will look the same as while it was running.
Yeah, but the 9th patch is not fixing qemuDomainGetBlockJobInfo (as the 10th does). It fixes getting stats. And different graphs in qemu and libvirt affect stats collection. Nikolay
In fact the images are still in use by qemu at that point as they were not blockdev-del-ed. The graph will change though.

On Fri, Dec 04, 2020 at 19:28:55 +0300, Nikolay Shirokovskiy wrote:
On 04.12.2020 19:24, Peter Krempa wrote:
On Fri, Dec 04, 2020 at 19:00:57 +0300, Nikolay Shirokovskiy wrote:
On 04.12.2020 18:52, Peter Krempa wrote:
On Fri, Dec 04, 2020 at 18:44:00 +0300, Nikolay Shirokovskiy wrote:
On 04.12.2020 18:11, Peter Krempa wrote:
On Fri, Nov 13, 2020 at 09:53:28 +0300, Nikolay Shirokovskiy wrote:
[...]
Ok. By the way, what about the issue described in
[PATCH v2 09/10] qemu: fix race on legacy block completion and querying stats
Do modern blockjobs have a similar one?
Modern blockjobs are started with '"auto-dismiss": false' so if you get to the monitor before the thread completing the blockjob finishes it, qemu still reports the status for the job.
But from the qemu docs I read that graph changes occur at the finalize step, and modern blockjobs use auto-finalize = true. So I thought that at the moment of querying stats the graph in libvirt and in qemu can be different.
That doesn't matter for the purpose of the data returned from 'qemuDomainGetBlockJobInfo'. QEMU still reports the job, so you will still see it as running and the XML will look the same as while it was running.
Yeah, but the 9th patch is not fixing qemuDomainGetBlockJobInfo (as the 10th does). It fixes getting stats. And different graphs in qemu and libvirt affect stats collection.
Stats are queried via 'node-name' when -blockdev is used and as said, removal of the nodes happens only explicitly after we finalize the job. Thus you will get stats per the state the backing chain looked like before the blockjob modified the graph, as we report the graph from our internal data. Some stats may be stale perhaps, but that's the worst case. In case of stats for pre-blockdev, you will potentially get stats from the reconfigured chain which we attempt to map into the old chain. This indeed may fail. In such a case we can plainly not report them; the stats API doesn't always guarantee results. Even for the case of the stats, the proposed fix is just too invasive for something which won't be used for long.
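For reference, a small sketch of how a client fetches the block stats being discussed, including backing-chain entries, via the public bulk-stats API; the helper is illustrative and error handling is trimmed.

    /* Fetch block stats for all domains, including backing chain elements. */
    #include <stdio.h>
    #include <libvirt/libvirt.h>

    static void
    dumpBlockStats(virConnectPtr conn)
    {
        virDomainStatsRecordPtr *records = NULL;
        int nrecords;
        int i, j;

        /* request block stats and ask for backing chain entries as well */
        nrecords = virConnectGetAllDomainStats(conn, VIR_DOMAIN_STATS_BLOCK,
                                               &records,
                                               VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING);
        if (nrecords < 0)
            return;

        for (i = 0; i < nrecords; i++) {
            printf("domain: %s\n", virDomainGetName(records[i]->dom));
            for (j = 0; j < records[i]->nparams; j++)
                printf("  %s\n", records[i]->params[j].field);
        }

        virDomainStatsRecordListFree(records);
    }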

On 04.12.2020 19:46, Peter Krempa wrote:
On Fri, Dec 04, 2020 at 19:28:55 +0300, Nikolay Shirokovskiy wrote:
On 04.12.2020 19:24, Peter Krempa wrote:
On Fri, Dec 04, 2020 at 19:00:57 +0300, Nikolay Shirokovskiy wrote:
On 04.12.2020 18:52, Peter Krempa wrote:
On Fri, Dec 04, 2020 at 18:44:00 +0300, Nikolay Shirokovskiy wrote:
On 04.12.2020 18:11, Peter Krempa wrote:
On Fri, Nov 13, 2020 at 09:53:28 +0300, Nikolay Shirokovskiy wrote:
[...]
Ok. By the way, what about the issue described in
[PATCH v2 09/10] qemu: fix race on legacy block completion and querying stats
Do modern blockjobs have a similar one?
Modern blockjobs are started with '"auto-dismiss": false' so if you get to the monitor before the thread completing the blockjob finishes it, qemu still reports the status for the job.
But from the qemu docs I read that graph changes occur at the finalize step, and modern blockjobs use auto-finalize = true. So I thought that at the moment of querying stats the graph in libvirt and in qemu can be different.
That doesn't matter for the purpose of the data returned from 'qemuDomainGetBlockJobInfo'. QEMU still reports the job, so you will still see it as running and the XML will look the same as while it was running.
Yeah, but the 9th patch is not fixing qemuDomainGetBlockJobInfo (as the 10th does). It fixes getting stats. And different graphs in qemu and libvirt affect stats collection.
Stats are queried via 'node-name' when -blockdev is used and as said, removal of the nodes happens only explicitly after we finalize the job.
Thus you will get stats per the state the backing chain looked like before the blockjob modified the graph, as we report the graph from our internal data. Some stats may be stale perhaps, but that's the worst case.
I see then: in the modern approach we won't get a graph mismatch because we have node-names. Thanks for making it clear!
In case of stats for pre-blockdev, you will potentially get stats from the reconfigured chain which we attempt to map into the old chain. This indeed may fail.
In such a case we can plainly not report them; the stats API doesn't always guarantee results.
Even for the case of the stats, the proposed fix is just too invasive for something which won't be used for long.
Ok. Nikolay