[libvirt] [PATCH 0/5] qemu: Change the way we generate VIR_DOMAIN_EVENT_RESUMED

https://bugzilla.redhat.com/show_bug.cgi?id=1612943 Jiri Denemark (5): qemu: Properly report VIR_DOMAIN_EVENT_RESUMED_FROM_SNAPSHOT qemu: Report more appropriate running reasons qemu: Pass running reason to RESUME event handler qemu: Map running reason to resume event detail qemu: Avoid duplicate resume events and state changes src/qemu/qemu_domain.c | 29 ++++++++++++++++++++++++ src/qemu/qemu_domain.h | 7 ++++++ src/qemu/qemu_driver.c | 13 ----------- src/qemu/qemu_migration.c | 36 +++++------------------------- src/qemu/qemu_process.c | 46 ++++++++++++++++++++++++--------------- 5 files changed, 71 insertions(+), 60 deletions(-) -- 2.19.0

VIR_DOMAIN_EVENT_RESUMED_FROM_SNAPSHOT was defined but not used anywhere in our event generation code. This fixes qemuDomainRevertToSnapshot to properly report why the domain was resumed. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_driver.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 2f8d6915e1..5e3f7297e4 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16268,7 +16268,7 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, detail); } else if (!was_running) { /* Transition 8 */ - detail = VIR_DOMAIN_EVENT_RESUMED; + detail = VIR_DOMAIN_EVENT_RESUMED_FROM_SNAPSHOT; event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_RESUMED, detail); -- 2.19.0

On 09/12/2018 08:55 AM, Jiri Denemark wrote:
VIR_DOMAIN_EVENT_RESUMED_FROM_SNAPSHOT was defined but not used anywhere in our event generation code. This fixes qemuDomainRevertToSnapshot to properly report why the domain was resumed.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_driver.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: John Ferlan <jferlan@redhat.com> John So the event detail was introduced by commit c1ff5dc63d, but never added to anything. Even when commit 88fe7a4b originally implemented the "was_stopped" as STARTED_FROM_SNAPSHOT. When I changed the logic in commit 5efcfb96 it doesn't seem I was considering the "detail", but rather just the event.

This patch replaces some rather generic VIR_DOMAIN_RUNNING_UNPAUSED reasons when changing domain state to running with more specific ones. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_process.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index eb9904b7ba..2f3f52e2a5 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3284,7 +3284,7 @@ qemuProcessRecoverMigrationIn(virQEMUDriverPtr driver, VIR_DEBUG("Incoming migration finished, resuming domain %s", vm->def->name); if (qemuProcessStartCPUs(driver, vm, - VIR_DOMAIN_RUNNING_UNPAUSED, + VIR_DOMAIN_RUNNING_MIGRATED, QEMU_ASYNC_JOB_NONE) < 0) { VIR_WARN("Could not resume domain %s", vm->def->name); } @@ -3391,7 +3391,7 @@ qemuProcessRecoverMigrationOut(virQEMUDriverPtr driver, (reason == VIR_DOMAIN_PAUSED_MIGRATION || reason == VIR_DOMAIN_PAUSED_UNKNOWN)) { if (qemuProcessStartCPUs(driver, vm, - VIR_DOMAIN_RUNNING_UNPAUSED, + VIR_DOMAIN_RUNNING_MIGRATION_CANCELED, QEMU_ASYNC_JOB_NONE) < 0) { VIR_WARN("Could not resume domain %s", vm->def->name); } @@ -3449,7 +3449,7 @@ qemuProcessRecoverJob(virQEMUDriverPtr driver, reason == VIR_DOMAIN_PAUSED_MIGRATION)) || reason == VIR_DOMAIN_PAUSED_UNKNOWN)) { if (qemuProcessStartCPUs(driver, vm, - VIR_DOMAIN_RUNNING_UNPAUSED, + VIR_DOMAIN_RUNNING_SAVE_CANCELED, QEMU_ASYNC_JOB_NONE) < 0) { VIR_WARN("Could not resume domain '%s' after migration to file", vm->def->name); -- 2.19.0

On 09/12/2018 08:55 AM, Jiri Denemark wrote:
This patch replaces some rather generic VIR_DOMAIN_RUNNING_UNPAUSED reasons when changing domain state to running with more specific ones.
These are done in the (ahem) event that libvirtd was restarted before the migration was completed and would have called the *StartCPUs...
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_process.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
You could augment the commit message, but it's not that important. Reviewed-by: John Ferlan <jferlan@redhat.com> John

Whenever we get the RESUME event from QEMU, we change the state of the affected domain to VIR_DOMAIN_RUNNING with VIR_DOMAIN_RUNNING_UNPAUSED reason. This is fine if the domain is resumed unexpectedly, but when we sent "cont" to QEMU we usually have a better reason for the state change. The better reason is used in qemuProcessStartCPUs which also sets the domain state to running if qemuMonitorStartCPUs reports success. Thus we may end up with two state updates in a row, but the final reason is correct. This patch is a preparation for dropping the state change done in qemuMonitorStartCPUs for which we need to pass the actual running reason to the RESUME event handler and use it there instead of VIR_DOMAIN_RUNNING_UNPAUSED. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_domain.h | 4 ++++ src/qemu/qemu_process.c | 23 +++++++++++++++++------ 2 files changed, 21 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 914c9a6a8d..3f3f7ccf18 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -366,6 +366,10 @@ struct _qemuDomainObjPrivate { /* counter for generating node names for qemu disks */ unsigned long long nodenameindex; + + /* qemuProcessStartCPUs stores the reason for starting vCPUs here for the + * RESUME event handler to use it */ + virDomainRunningReason runningReason; }; # define QEMU_DOMAIN_PRIVATE(vm) \ diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 2f3f52e2a5..40ffb23dbb 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -712,21 +712,28 @@ qemuProcessHandleResume(qemuMonitorPtr mon ATTRIBUTE_UNUSED, virQEMUDriverPtr driver = opaque; virObjectEventPtr event = NULL; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + qemuDomainObjPrivatePtr priv; + virDomainRunningReason reason = VIR_DOMAIN_RUNNING_UNPAUSED; virObjectLock(vm); - if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_PAUSED) { - qemuDomainObjPrivatePtr priv = vm->privateData; + priv = vm->privateData; + if (priv->runningReason != VIR_DOMAIN_RUNNING_UNKNOWN) { + reason = priv->runningReason; + priv->runningReason = VIR_DOMAIN_RUNNING_UNKNOWN; + } + + if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_PAUSED) { if (priv->gotShutdown) { VIR_DEBUG("Ignoring RESUME event after SHUTDOWN"); goto unlock; } - VIR_DEBUG("Transitioned guest %s out of paused into resumed state", - vm->def->name); + VIR_DEBUG("Transitioned guest %s out of paused into resumed state, " + "reason '%s'", + vm->def->name, virDomainRunningReasonTypeToString(reason)); - virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, - VIR_DOMAIN_RUNNING_UNPAUSED); + virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, reason); event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_RESUMED, VIR_DOMAIN_EVENT_RESUMED_UNPAUSED); @@ -3088,6 +3095,8 @@ qemuProcessStartCPUs(virQEMUDriverPtr driver, virDomainObjPtr vm, } VIR_FREE(priv->lockState); + priv->runningReason = reason; + if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) < 0) goto release; @@ -3105,6 +3114,7 @@ qemuProcessStartCPUs(virQEMUDriverPtr driver, virDomainObjPtr vm, return ret; release: + priv->runningReason = VIR_DOMAIN_RUNNING_UNKNOWN; if (virDomainLockProcessPause(driver->lockManager, vm, &priv->lockState) < 0) VIR_WARN("Unable to release lease on %s", vm->def->name); VIR_DEBUG("Preserving lock state '%s'", NULLSTR(priv->lockState)); @@ -5982,6 +5992,7 @@ qemuProcessPrepareDomain(virQEMUDriverPtr driver, priv->monError = false; priv->monStart = 0; priv->gotShutdown = false; + priv->runningReason = VIR_DOMAIN_RUNNING_UNKNOWN; VIR_DEBUG("Updating guest CPU definition"); if (qemuProcessUpdateGuestCPU(vm->def, priv->qemuCaps, caps, flags) < 0) -- 2.19.0

On 09/12/2018 08:55 AM, Jiri Denemark wrote:
Whenever we get the RESUME event from QEMU, we change the state of the affected domain to VIR_DOMAIN_RUNNING with VIR_DOMAIN_RUNNING_UNPAUSED reason. This is fine if the domain is resumed unexpectedly, but when we sent "cont" to QEMU we usually have a better reason for the state change. The better reason is used in qemuProcessStartCPUs which also sets the domain state to running if qemuMonitorStartCPUs reports success. Thus we may end up with two state updates in a row, but the final reason is correct.
This patch is a preparation for dropping the state change done in qemuMonitorStartCPUs for which we need to pass the actual running reason to the RESUME event handler and use it there instead of VIR_DOMAIN_RUNNING_UNPAUSED.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_domain.h | 4 ++++ src/qemu/qemu_process.c | 23 +++++++++++++++++------ 2 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 914c9a6a8d..3f3f7ccf18 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -366,6 +366,10 @@ struct _qemuDomainObjPrivate {
/* counter for generating node names for qemu disks */ unsigned long long nodenameindex; + + /* qemuProcessStartCPUs stores the reason for starting vCPUs here for the + * RESUME event handler to use it */ + virDomainRunningReason runningReason;
So what happens in the libvirtd restart case/condition? This isn't Format/Parse'd so it's lost or essentially set to RUNNING_UNKNOWN.
};
# define QEMU_DOMAIN_PRIVATE(vm) \
The rest seems fine, I think the qemuDomainObjPrivateXML{Parse|Format} code is "simple enough" to copy from other examples that you don't need to respin/repost - you can show a diff. Assuming a proper Parse/Format, Reviewed-by: John Ferlan <jferlan@redhat.com> John

On Wed, Sep 19, 2018 at 11:12:26 -0400, John Ferlan wrote:
On 09/12/2018 08:55 AM, Jiri Denemark wrote:
Whenever we get the RESUME event from QEMU, we change the state of the affected domain to VIR_DOMAIN_RUNNING with VIR_DOMAIN_RUNNING_UNPAUSED reason. This is fine if the domain is resumed unexpectedly, but when we sent "cont" to QEMU we usually have a better reason for the state change. The better reason is used in qemuProcessStartCPUs which also sets the domain state to running if qemuMonitorStartCPUs reports success. Thus we may end up with two state updates in a row, but the final reason is correct.
This patch is a preparation for dropping the state change done in qemuMonitorStartCPUs for which we need to pass the actual running reason to the RESUME event handler and use it there instead of VIR_DOMAIN_RUNNING_UNPAUSED.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_domain.h | 4 ++++ src/qemu/qemu_process.c | 23 +++++++++++++++++------ 2 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 914c9a6a8d..3f3f7ccf18 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -366,6 +366,10 @@ struct _qemuDomainObjPrivate {
/* counter for generating node names for qemu disks */ unsigned long long nodenameindex; + + /* qemuProcessStartCPUs stores the reason for starting vCPUs here for the + * RESUME event handler to use it */ + virDomainRunningReason runningReason;
So what happens in the libvirtd restart case/condition? This isn't Format/Parse'd so it's lost or essentially set to RUNNING_UNKNOWN.
Right, when libvirtd is restarted just after sending "cont", I don't think we can expect to see the "RESUME" event in the new process. Most likely the previous libvirtd process already got the event and just didn't have a chance to process it. Thus just updating qemuDomainObjPrivateXML{Parse|Format} to store this in the status XML wouldn't help. We could add a code which would look at the restored runningReason when seeing vCPUs are running, but status XML says the domain is paused. But it would be in a separate patch anyway. Jirka

On 9/21/18 8:12 AM, Jiri Denemark wrote:
On Wed, Sep 19, 2018 at 11:12:26 -0400, John Ferlan wrote:
On 09/12/2018 08:55 AM, Jiri Denemark wrote:
Whenever we get the RESUME event from QEMU, we change the state of the affected domain to VIR_DOMAIN_RUNNING with VIR_DOMAIN_RUNNING_UNPAUSED reason. This is fine if the domain is resumed unexpectedly, but when we sent "cont" to QEMU we usually have a better reason for the state change. The better reason is used in qemuProcessStartCPUs which also sets the domain state to running if qemuMonitorStartCPUs reports success. Thus we may end up with two state updates in a row, but the final reason is correct.
This patch is a preparation for dropping the state change done in qemuMonitorStartCPUs for which we need to pass the actual running reason to the RESUME event handler and use it there instead of VIR_DOMAIN_RUNNING_UNPAUSED.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_domain.h | 4 ++++ src/qemu/qemu_process.c | 23 +++++++++++++++++------ 2 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 914c9a6a8d..3f3f7ccf18 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -366,6 +366,10 @@ struct _qemuDomainObjPrivate {
/* counter for generating node names for qemu disks */ unsigned long long nodenameindex; + + /* qemuProcessStartCPUs stores the reason for starting vCPUs here for the + * RESUME event handler to use it */ + virDomainRunningReason runningReason;
So what happens in the libvirtd restart case/condition? This isn't Format/Parse'd so it's lost or essentially set to RUNNING_UNKNOWN.
Right, when libvirtd is restarted just after sending "cont", I don't think we can expect to see the "RESUME" event in the new process. Most likely the previous libvirtd process already got the event and just didn't have a chance to process it. Thus just updating qemuDomainObjPrivateXML{Parse|Format} to store this in the status XML wouldn't help. We could add a code which would look at the restored runningReason when seeing vCPUs are running, but status XML says the domain is paused. But it would be in a separate patch anyway.
OK - fair enough. I see something adding to ObjPrivate without the Format/Parse and I think we should add it there. However, in this case it's tricky because of the event driven mechanics of change - especially when libvirtd isn't running and the event occurs. Perhaps just note in qemu_domain.h why the Format/Parse of the value isn't being done. The R-By can still apply without the Format/Parse then. John

Thanks to the previous commit the RESUME event handler knows what reason should be used when changing the domain state to VIR_DOMAIN_RUNNING, but the emitted VIR_DOMAIN_EVENT_RESUMED event still uses a generic VIR_DOMAIN_EVENT_RESUMED_UNPAUSED detail. Luckily, the event detail can be easily deduced from the running reason, which saves us from having to pass one more value to the handler. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_domain.c | 29 +++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 3 +++ src/qemu/qemu_process.c | 11 +++++++---- 3 files changed, 39 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 15325aa4c1..66cf2a1040 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -13438,3 +13438,32 @@ qemuDomainStorageIdReset(qemuDomainObjPrivatePtr priv) { priv->nodenameindex = 0; } + + +virDomainEventResumedDetailType +qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason) +{ + switch (reason) { + case VIR_DOMAIN_RUNNING_RESTORED: + case VIR_DOMAIN_RUNNING_FROM_SNAPSHOT: + return VIR_DOMAIN_EVENT_RESUMED_FROM_SNAPSHOT; + + case VIR_DOMAIN_RUNNING_MIGRATED: + case VIR_DOMAIN_RUNNING_MIGRATION_CANCELED: + return VIR_DOMAIN_EVENT_RESUMED_MIGRATED; + + case VIR_DOMAIN_RUNNING_POSTCOPY: + return VIR_DOMAIN_EVENT_RESUMED_POSTCOPY; + + case VIR_DOMAIN_RUNNING_UNKNOWN: + case VIR_DOMAIN_RUNNING_SAVE_CANCELED: + case VIR_DOMAIN_RUNNING_BOOTED: + case VIR_DOMAIN_RUNNING_UNPAUSED: + case VIR_DOMAIN_RUNNING_WAKEUP: + case VIR_DOMAIN_RUNNING_CRASHED: + case VIR_DOMAIN_RUNNING_LAST: + break; + } + + return VIR_DOMAIN_EVENT_RESUMED_UNPAUSED; +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 3f3f7ccf18..d40998e293 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -1079,4 +1079,7 @@ char * qemuDomainGetManagedPRSocketPath(qemuDomainObjPrivatePtr priv); unsigned int qemuDomainStorageIdNew(qemuDomainObjPrivatePtr priv); void qemuDomainStorageIdReset(qemuDomainObjPrivatePtr priv); +virDomainEventResumedDetailType +qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason); + #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 40ffb23dbb..6be3d9e4da 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -714,6 +714,7 @@ qemuProcessHandleResume(qemuMonitorPtr mon ATTRIBUTE_UNUSED, virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); qemuDomainObjPrivatePtr priv; virDomainRunningReason reason = VIR_DOMAIN_RUNNING_UNPAUSED; + virDomainEventResumedDetailType eventDetail; virObjectLock(vm); @@ -729,14 +730,16 @@ qemuProcessHandleResume(qemuMonitorPtr mon ATTRIBUTE_UNUSED, goto unlock; } + eventDetail = qemuDomainRunningReasonToResumeEvent(reason); VIR_DEBUG("Transitioned guest %s out of paused into resumed state, " - "reason '%s'", - vm->def->name, virDomainRunningReasonTypeToString(reason)); + "reason '%s', event detail %d", + vm->def->name, virDomainRunningReasonTypeToString(reason), + eventDetail); virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, reason); event = virDomainEventLifecycleNewFromObj(vm, - VIR_DOMAIN_EVENT_RESUMED, - VIR_DOMAIN_EVENT_RESUMED_UNPAUSED); + VIR_DOMAIN_EVENT_RESUMED, + eventDetail); if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm, driver->caps) < 0) { VIR_WARN("Unable to save status on vm %s after state change", -- 2.19.0

On 09/12/2018 08:55 AM, Jiri Denemark wrote:
Thanks to the previous commit the RESUME event handler knows what reason should be used when changing the domain state to VIR_DOMAIN_RUNNING, but the emitted VIR_DOMAIN_EVENT_RESUMED event still uses a generic VIR_DOMAIN_EVENT_RESUMED_UNPAUSED detail. Luckily, the event detail can be easily deduced from the running reason, which saves us from having to pass one more value to the handler.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_domain.c | 29 +++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 3 +++ src/qemu/qemu_process.c | 11 +++++++---- 3 files changed, 39 insertions(+), 4 deletions(-)
Reviewed-by: John Ferlan <jferlan@redhat.com> John

The only place where VIR_DOMAIN_EVENT_RESUMED is generated is the RESUME event handler to make sure we don't generate duplicate events or state changes. In the worse case the duplicity can revert or cover changes done by other event handlers. For example, after QEMU sent RESUME, BLOCK_IO_ERROR, and STOP events we could happily mark the domain as running and report VIR_DOMAIN_EVENT_RESUMED to registered clients. https://bugzilla.redhat.com/show_bug.cgi?id=1612943 Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_driver.c | 13 ------------- src/qemu/qemu_migration.c | 36 ++++++------------------------------ src/qemu/qemu_process.c | 10 ++++------ 3 files changed, 10 insertions(+), 49 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 5e3f7297e4..49e9fd1233 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1863,7 +1863,6 @@ static int qemuDomainResume(virDomainPtr dom) virQEMUDriverPtr driver = dom->conn->privateData; virDomainObjPtr vm; int ret = -1; - virObjectEventPtr event = NULL; int state; int reason; virQEMUDriverConfigPtr cfg = NULL; @@ -1902,9 +1901,6 @@ static int qemuDomainResume(virDomainPtr dom) "%s", _("resume operation failed")); goto endjob; } - event = virDomainEventLifecycleNewFromObj(vm, - VIR_DOMAIN_EVENT_RESUMED, - VIR_DOMAIN_EVENT_RESUMED_UNPAUSED); } if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm, driver->caps) < 0) goto endjob; @@ -1915,7 +1911,6 @@ static int qemuDomainResume(virDomainPtr dom) cleanup: virDomainObjEndAPI(&vm); - virObjectEventStateQueue(driver->domainEventState, event); virObjectUnref(cfg); return ret; } @@ -15978,7 +15973,6 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, virDomainDefPtr config = NULL; virQEMUDriverConfigPtr cfg = NULL; virCapsPtr caps = NULL; - bool was_running = false; bool was_stopped = false; qemuDomainSaveCookiePtr cookie; virCPUDefPtr origCPU = NULL; @@ -16169,7 +16163,6 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, priv = vm->privateData; if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { /* Transitions 5, 6 */ - was_running = true; if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_FROM_SNAPSHOT, QEMU_ASYNC_JOB_START) < 0) @@ -16266,12 +16259,6 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_STARTED, detail); - } else if (!was_running) { - /* Transition 8 */ - detail = VIR_DOMAIN_EVENT_RESUMED_FROM_SNAPSHOT; - event = virDomainEventLifecycleNewFromObj(vm, - VIR_DOMAIN_EVENT_RESUMED, - detail); } } break; diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 825a9d399b..4771f26938 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2982,14 +2982,10 @@ qemuMigrationSrcConfirmPhase(virQEMUDriverPtr driver, virFreeError(orig_err); if (virDomainObjGetState(vm, &reason) == VIR_DOMAIN_PAUSED && - reason == VIR_DOMAIN_PAUSED_POSTCOPY) { + reason == VIR_DOMAIN_PAUSED_POSTCOPY) qemuMigrationAnyPostcopyFailed(driver, vm); - } else if (qemuMigrationSrcRestoreDomainState(driver, vm)) { - event = virDomainEventLifecycleNewFromObj(vm, - VIR_DOMAIN_EVENT_RESUMED, - VIR_DOMAIN_EVENT_RESUMED_MIGRATED); - virObjectEventStateQueue(driver->domainEventState, event); - } + else + qemuMigrationSrcRestoreDomainState(driver, vm); qemuMigrationParamsReset(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT, priv->job.migParams, priv->job.apiFlags); @@ -4624,11 +4620,7 @@ qemuMigrationSrcPerformJob(virQEMUDriverPtr driver, qemuMigrationParamsReset(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT, priv->job.migParams, priv->job.apiFlags); - if (qemuMigrationSrcRestoreDomainState(driver, vm)) { - event = virDomainEventLifecycleNewFromObj(vm, - VIR_DOMAIN_EVENT_RESUMED, - VIR_DOMAIN_EVENT_RESUMED_MIGRATED); - } + qemuMigrationSrcRestoreDomainState(driver, vm); qemuMigrationJobFinish(driver, vm); if (!virDomainObjIsActive(vm) && ret == 0) { @@ -4672,7 +4664,6 @@ qemuMigrationSrcPerformPhase(virQEMUDriverPtr driver, unsigned long resource) { qemuDomainObjPrivatePtr priv = vm->privateData; - virObjectEventPtr event = NULL; int ret = -1; /* If we didn't start the job in the begin phase, start it now. */ @@ -4694,11 +4685,7 @@ qemuMigrationSrcPerformPhase(virQEMUDriverPtr driver, nmigrate_disks, migrate_disks, migParams); if (ret < 0) { - if (qemuMigrationSrcRestoreDomainState(driver, vm)) { - event = virDomainEventLifecycleNewFromObj(vm, - VIR_DOMAIN_EVENT_RESUMED, - VIR_DOMAIN_EVENT_RESUMED_MIGRATED); - } + qemuMigrationSrcRestoreDomainState(driver, vm); goto endjob; } @@ -4722,7 +4709,6 @@ qemuMigrationSrcPerformPhase(virQEMUDriverPtr driver, cleanup: virDomainObjEndAPI(&vm); - virObjectEventStateQueue(driver->domainEventState, event); return ret; } @@ -5074,13 +5060,8 @@ qemuMigrationDstFinish(virQEMUDriverPtr driver, goto endjob; } - if (inPostCopy) { + if (inPostCopy) doKill = false; - event = virDomainEventLifecycleNewFromObj(vm, - VIR_DOMAIN_EVENT_RESUMED, - VIR_DOMAIN_EVENT_RESUMED_POSTCOPY); - virObjectEventStateQueue(driver->domainEventState, event); - } } if (mig->jobInfo) { @@ -5111,11 +5092,6 @@ qemuMigrationDstFinish(virQEMUDriverPtr driver, dom = virGetDomain(dconn, vm->def->name, vm->def->uuid, vm->def->id); - event = virDomainEventLifecycleNewFromObj(vm, - VIR_DOMAIN_EVENT_RESUMED, - VIR_DOMAIN_EVENT_RESUMED_MIGRATED); - virObjectEventStateQueue(driver->domainEventState, event); - if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_PAUSED) { virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_USER); event = virDomainEventLifecycleNewFromObj(vm, diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 6be3d9e4da..3a4bb17e83 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -456,7 +456,6 @@ qemuProcessFakeReboot(void *opaque) virDomainObjPtr vm = opaque; qemuDomainObjPrivatePtr priv = vm->privateData; virQEMUDriverPtr driver = priv->driver; - virObjectEventPtr event = NULL; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); virDomainRunningReason reason = VIR_DOMAIN_RUNNING_BOOTED; int ret = -1, rc; @@ -493,9 +492,6 @@ qemuProcessFakeReboot(void *opaque) goto endjob; } priv->gotShutdown = false; - event = virDomainEventLifecycleNewFromObj(vm, - VIR_DOMAIN_EVENT_RESUMED, - VIR_DOMAIN_EVENT_RESUMED_UNPAUSED); if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm, driver->caps) < 0) { VIR_WARN("Unable to save status on vm %s after state change", @@ -511,7 +507,6 @@ qemuProcessFakeReboot(void *opaque) if (ret == -1) ignore_value(qemuProcessKill(vm, VIR_QEMU_PROCESS_KILL_FORCE)); virDomainObjEndAPI(&vm); - virObjectEventStateQueue(driver->domainEventState, event); virObjectUnref(cfg); } @@ -3110,7 +3105,10 @@ qemuProcessStartCPUs(virQEMUDriverPtr driver, virDomainObjPtr vm, if (ret < 0) goto release; - virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, reason); + /* The RESUME event handler will change the domain state with the reason + * saved in priv->runningReason and it will also emit corresponding domain + * lifecycle event. + */ cleanup: virObjectUnref(cfg); -- 2.19.0

On 09/12/2018 08:55 AM, Jiri Denemark wrote:
The only place where VIR_DOMAIN_EVENT_RESUMED is generated is the RESUME
I assume you meant "should be generated"
event handler to make sure we don't generate duplicate events or state changes. In the worse case the duplicity can revert or cover changes done by other event handlers.
For example, after QEMU sent RESUME, BLOCK_IO_ERROR, and STOP events we could happily mark the domain as running and report VIR_DOMAIN_EVENT_RESUMED to registered clients.
This patch removes the non QEMU event handling resume processing such as 1. User initiated domain resume 2. Resume after revert from snapshot 3. Resume on migration source after failure detected 4. Resume on migration destination once finished 5. Resume after Shutdown/Reboot or Panic when guest is configured to process a fake reboot. deferring completely to the resume event QEMU will send as a result of the CPUs being successfully started. ... My caveats: 1. I think that list is correct - I may have miss-worded #5 - I forget exactly how that code works until I'm debugging it ;-) 2. The CPUs being successfully restarted is I think the "cont" command being successful.
https://bugzilla.redhat.com/show_bug.cgi?id=1612943
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_driver.c | 13 ------------- src/qemu/qemu_migration.c | 36 ++++++------------------------------ src/qemu/qemu_process.c | 10 ++++------ 3 files changed, 10 insertions(+), 49 deletions(-)
What I type above could be "massaged" into the commit message - so it's "easier" at a glance to understand what's going on. It's of course my wording and making sure I've properly understood what happening without being able to stop by your cube and ask directly ;-). [...]
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 825a9d399b..4771f26938 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c
[...]
@@ -5074,13 +5060,8 @@ qemuMigrationDstFinish(virQEMUDriverPtr driver, goto endjob; }
- if (inPostCopy) { + if (inPostCopy) doKill = false; - event = virDomainEventLifecycleNewFromObj(vm, - VIR_DOMAIN_EVENT_RESUMED, - VIR_DOMAIN_EVENT_RESUMED_POSTCOPY); - virObjectEventStateQueue(driver->domainEventState, event); - } }
if (mig->jobInfo) { @@ -5111,11 +5092,6 @@ qemuMigrationDstFinish(virQEMUDriverPtr driver,
dom = virGetDomain(dconn, vm->def->name, vm->def->uuid, vm->def->id);
- event = virDomainEventLifecycleNewFromObj(vm, - VIR_DOMAIN_EVENT_RESUMED, - VIR_DOMAIN_EVENT_RESUMED_MIGRATED); - virObjectEventStateQueue(driver->domainEventState, event); - if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_PAUSED) { virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_USER); event = virDomainEventLifecycleNewFromObj(vm,
For this path 2 events were generated if @inPostCopy == True - one right after the qemuProcessStartCPUs for RESUMED_POSTCOPY and then one for SUSPENDED_PAUSED once qemuMigrationDstWaitForCompletion occurred. IIUC, the changes here when @inPostCopy == true would only generate the POSTCOPY event, but not the MIGRATED event. Or is there something subtle I'm missing. Will the post copy processing generate two resume events? If so, we perhaps should document that. If not, then perhaps this second event that's being deleted needs to be moved inside that inPostCopy condition just above with the note that this one case is "abnormal" (or you could just do the *StartCPUs again with the RESUMED_MIGRATED and close your eyes ;-)). Everything else is fine - it's just this one case. John [...] Curious, for the future, would it be useful to change the FakeReboot path to generate a different detail reason? It's hard to distinguish between someone using virsh resume (or recovery from Save, Dump, Snapshot failure) from that fake reboot path - at least from just the event. I guess it seems VIR_DOMAIN_EVENT_RESUMED_UNPAUSED is just too generic since it's more than just a "normal resume due to admin unpause", but that I would think would be extra given the "scope" of this particular problem. Second curiosity... would it be useful/easy to generate an event on the failure scenario for StartCPUs? Knowing from whence we were called, we should be able to generate a suspended event rather than having the callers decide.

On Wed, Sep 19, 2018 at 16:04:13 -0400, John Ferlan wrote:
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 825a9d399b..4771f26938 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c
[...]
@@ -5074,13 +5060,8 @@ qemuMigrationDstFinish(virQEMUDriverPtr driver, goto endjob; }
- if (inPostCopy) { + if (inPostCopy) doKill = false; - event = virDomainEventLifecycleNewFromObj(vm, - VIR_DOMAIN_EVENT_RESUMED, - VIR_DOMAIN_EVENT_RESUMED_POSTCOPY); - virObjectEventStateQueue(driver->domainEventState, event); - } }
if (mig->jobInfo) { @@ -5111,11 +5092,6 @@ qemuMigrationDstFinish(virQEMUDriverPtr driver,
dom = virGetDomain(dconn, vm->def->name, vm->def->uuid, vm->def->id);
- event = virDomainEventLifecycleNewFromObj(vm, - VIR_DOMAIN_EVENT_RESUMED, - VIR_DOMAIN_EVENT_RESUMED_MIGRATED); - virObjectEventStateQueue(driver->domainEventState, event); - if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_PAUSED) { virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_USER); event = virDomainEventLifecycleNewFromObj(vm,
For this path 2 events were generated if @inPostCopy == True - one right after the qemuProcessStartCPUs for RESUMED_POSTCOPY and then one for SUSPENDED_PAUSED once qemuMigrationDstWaitForCompletion occurred.
I guess you meant s/SUSPENDED_PAUSED/RESUMED_MIGRATED/. Anyway, you made me think about this and you're right. Or docs even say that two RESUMED events are sent during post-copy migration. RESUMED_POSTCOPY when the domain switches from source to the destination host and RESUMED_MIGRATED once migration is complete... v2 is in progress.
IIUC, the changes here when @inPostCopy == true would only generate the POSTCOPY event, but not the MIGRATED event. Or is there something subtle I'm missing. Will the post copy processing generate two resume events? If so, we perhaps should document that. If not, then perhaps this second event that's being deleted needs to be moved inside that inPostCopy condition just above with the note that this one case is "abnormal"
Yes.
(or you could just do the *StartCPUs again with the RESUMED_MIGRATED and close your eyes ;-)).
This wouldn't work, the RESUME handler ignores the event if the domain is not paused.
Curious, for the future, would it be useful to change the FakeReboot path to generate a different detail reason? It's hard to distinguish between someone using virsh resume (or recovery from Save, Dump, Snapshot failure) from that fake reboot path - at least from just the event. I guess it seems VIR_DOMAIN_EVENT_RESUMED_UNPAUSED is just too generic since it's more than just a "normal resume due to admin unpause", but that I would think would be extra given the "scope" of this particular problem.
Yeah, a new resume reason would be nice in this case, but I don't think it's a big issue. The guest-agent driven reboot is much better and hopefully used more often than fake reboot.
Second curiosity... would it be useful/easy to generate an event on the failure scenario for StartCPUs? Knowing from whence we were called, we should be able to generate a suspended event rather than having the callers decide.
Not really, if StartCPUs fails the vCPUs just remain paused as if nothing happened. That is, there's no state change to signal using an event. And I think "cont" can't really fail anyway. Normally it would just succeed and another STOP event would be sent right away in case the reason for pausing the CPUs (such as I/O error) is still valid. Jirka

On Wed, Sep 12, 2018 at 14:55:53 +0200, Jiri Denemark wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1612943
Jiri Denemark (5): qemu: Properly report VIR_DOMAIN_EVENT_RESUMED_FROM_SNAPSHOT qemu: Report more appropriate running reasons qemu: Pass running reason to RESUME event handler qemu: Map running reason to resume event detail qemu: Avoid duplicate resume events and state changes
src/qemu/qemu_domain.c | 29 ++++++++++++++++++++++++ src/qemu/qemu_domain.h | 7 ++++++ src/qemu/qemu_driver.c | 13 ----------- src/qemu/qemu_migration.c | 36 +++++------------------------- src/qemu/qemu_process.c | 46 ++++++++++++++++++++++++--------------- 5 files changed, 71 insertions(+), 60 deletions(-)
ping
participants (2)
-
Jiri Denemark
-
John Ferlan