[libvirt] [PATCH 0/5] Fix a couple of force snapshot revert issues

Well, at least I hope so. There's a couple of related bz's in the last two patches with gobs more details, but the long and short of it is that when using the force flag, the domain is Stop'd and re-Start'd without applying any snapshot. For both issues I've seen, the "error: internal error: unexpected async job 6" occurs on the Start and thus the CPU's aren't started. For one bz, when running managedsave the call never returns. For the other bz, one cannot restart the paused domain. If there's a better way to do this - I'm sure someone will let me know. John Ferlan (5): qemu: Adjust async job failure message qemu: Promote start_flags in qemuDomainRevertToSnapshot qemu: Use start_flags for RUNNING and PAUSED transitions qemu: Unset the genid start change flag for revert force qemu: Don't use asyncJob after stop during snapshot revert src/qemu/qemu_domain.c | 3 ++- src/qemu/qemu_driver.c | 22 +++++++++++++++------- 2 files changed, 17 insertions(+), 8 deletions(-) -- 2.14.4

Make it clearer what asyncJob type was passed and what was expected. Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_domain.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 8a2d4750a5..939d64542c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6581,7 +6581,8 @@ qemuDomainObjBeginNestedJob(virQEMUDriverPtr driver, if (asyncJob != priv->job.asyncJob) { virReportError(VIR_ERR_INTERNAL_ERROR, - _("unexpected async job %d"), asyncJob); + _("unexpected async job %d type expected %d"), + asyncJob, priv->job.asyncJob); return -1; } -- 2.14.4

Promote the @start_flags to the top of the function, a subsequent patch needs to use it. Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_driver.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 3abbe41895..971f485226 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -15990,6 +15990,7 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, bool was_stopped = false; qemuDomainSaveCookiePtr cookie; virCPUDefPtr origCPU = NULL; + unsigned int start_flags = VIR_QEMU_PROCESS_START_GEN_VMID; virCheckFlags(VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING | VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED | @@ -16307,7 +16308,6 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED)) { /* Flush first event, now do transition 2 or 3 */ bool paused = (flags & VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED) != 0; - unsigned int start_flags = VIR_QEMU_PROCESS_START_GEN_VMID; start_flags |= paused ? VIR_QEMU_PROCESS_START_PAUSED : 0; -- 2.14.4

Use and set the @start_flags at the top of the RUNNING and PAUSED transitions to GEN_VMID | PAUSED. Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_driver.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 971f485226..01011906d1 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16102,6 +16102,9 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, switch ((virDomainState) snap->def->state) { case VIR_DOMAIN_RUNNING: case VIR_DOMAIN_PAUSED: + + start_flags |= VIR_QEMU_PROCESS_START_PAUSED; + /* Transitions 2, 3, 5, 6, 8, 9 */ /* When using the loadvm monitor command, qemu does not know * whether to pause or run the reverted domain, and just stays @@ -16221,8 +16224,7 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, cookie ? cookie->cpu : NULL, QEMU_ASYNC_JOB_START, NULL, -1, NULL, snap, VIR_NETDEV_VPORT_PROFILE_OP_CREATE, - VIR_QEMU_PROCESS_START_PAUSED | - VIR_QEMU_PROCESS_START_GEN_VMID); + start_flags); virDomainAuditStart(vm, "from-snapshot", rc >= 0); detail = VIR_DOMAIN_EVENT_STARTED_FROM_SNAPSHOT; event = virDomainEventLifecycleNewFromObj(vm, -- 2.14.4

If the the snapshot revert involves a forced revert option, then let's not cause startup to change the genid flag in order to signify that we're still running the same/previous guest and not some snapshot reversion. https://bugzilla.redhat.com/show_bug.cgi?id=1149445 Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_driver.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 01011906d1..f737f4d350 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16136,12 +16136,14 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, } /* If using VM GenID, there is no way currently to change - * the genid for the running guest, so set an error and - * mark as incompatible. */ + * the genid for the running guest, so set an error, + * mark as incompatible, and don't allow change of genid + * if the revert force flag would start the guest again. */ if (compatible && config->genidRequested) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("domain genid update requires restart")); compatible = false; + start_flags &= ~VIR_QEMU_PROCESS_START_GEN_VMID; } if (!compatible) { -- 2.14.4

On 06/20/2018 12:54 AM, John Ferlan wrote:
If the the snapshot revert involves a forced revert option, then let's not cause startup to change the genid flag in order to signify that we're still running the same/previous guest and not some snapshot reversion.
https://bugzilla.redhat.com/show_bug.cgi?id=1149445
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_driver.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 01011906d1..f737f4d350 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16136,12 +16136,14 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, }
/* If using VM GenID, there is no way currently to change - * the genid for the running guest, so set an error and - * mark as incompatible. */ + * the genid for the running guest, so set an error, + * mark as incompatible, and don't allow change of genid + * if the revert force flag would start the guest again. */ if (compatible && config->genidRequested) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("domain genid update requires restart")); compatible = false; + start_flags &= ~VIR_QEMU_PROCESS_START_GEN_VMID;
Pre-existing, but what is the point of this virReportError() if it does not cause domain starting? Michal

On 06/20/2018 09:20 AM, Michal Privoznik wrote:
On 06/20/2018 12:54 AM, John Ferlan wrote:
If the the snapshot revert involves a forced revert option, then let's not cause startup to change the genid flag in order to signify that we're still running the same/previous guest and not some snapshot reversion.
https://bugzilla.redhat.com/show_bug.cgi?id=1149445
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_driver.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 01011906d1..f737f4d350 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16136,12 +16136,14 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, }
/* If using VM GenID, there is no way currently to change - * the genid for the running guest, so set an error and - * mark as incompatible. */ + * the genid for the running guest, so set an error, + * mark as incompatible, and don't allow change of genid + * if the revert force flag would start the guest again. */ if (compatible && config->genidRequested) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("domain genid update requires restart")); compatible = false; + start_flags &= ~VIR_QEMU_PROCESS_START_GEN_VMID;
Pre-existing, but what is the point of this virReportError() if it does not cause domain starting?
Michal
See a few lines lower: if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE)) { /* Re-spawn error using correct category. */ if (err->code == VIR_ERR_CONFIG_UNSUPPORTED) virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, "%s", err->str2); goto endjob; } virResetError(err); It's thus "similar to" what virDomainDefCheckABIStabilityFlags would do (see a few lines above) when generating an error such as "Target domain disk count 1 does not match source 2" which is seen in bz1591628. A virsh failure would see : error: revert requires force: "%s" (message) which immediately for qe is followed by --force ;-) Tks for the review... John

Attempting to use the FORCE flag for snapshot-revert was resulting in failures because qemuProcessStart and qemuProcessStartCPUs were using QEMU_ASYNC_JOB_START after a qemuProcessStop resulting in an error when entering the monitor: error: internal error: unexpected async job 6 type expected 0 So create a local @jobType, initialize to QEMU_ASYNC_JOB_START, and change to QEMU_ASYNC_JOB_NONE if we end up in the --force path where the qemuProcessStop is run before a Start and StartCPUs. https://bugzilla.redhat.com/show_bug.cgi?id=1591628 Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/qemu/qemu_driver.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index f737f4d350..71d2bb357c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -15991,6 +15991,7 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, qemuDomainSaveCookiePtr cookie; virCPUDefPtr origCPU = NULL; unsigned int start_flags = VIR_QEMU_PROCESS_START_GEN_VMID; + qemuDomainAsyncJob jobType = QEMU_ASYNC_JOB_START; virCheckFlags(VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING | VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED | @@ -16166,6 +16167,9 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, VIR_DOMAIN_EVENT_STOPPED, detail); virObjectEventStateQueue(driver->domainEventState, event); + /* Start after stop won't be an async start job, so + * reset to none */ + jobType = QEMU_ASYNC_JOB_NONE; goto load; } } @@ -16224,7 +16228,7 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, rc = qemuProcessStart(snapshot->domain->conn, driver, vm, cookie ? cookie->cpu : NULL, - QEMU_ASYNC_JOB_START, NULL, -1, NULL, snap, + jobType, NULL, -1, NULL, snap, VIR_NETDEV_VPORT_PROFILE_OP_CREATE, start_flags); virDomainAuditStart(vm, "from-snapshot", rc >= 0); @@ -16259,7 +16263,7 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, } rc = qemuProcessStartCPUs(driver, vm, VIR_DOMAIN_RUNNING_FROM_SNAPSHOT, - QEMU_ASYNC_JOB_START); + jobType); if (rc < 0) goto endjob; virObjectUnref(event); -- 2.14.4

On 06/20/2018 12:54 AM, John Ferlan wrote:
Well, at least I hope so. There's a couple of related bz's in the last two patches with gobs more details, but the long and short of it is that when using the force flag, the domain is Stop'd and re-Start'd without applying any snapshot. For both issues I've seen, the "error: internal error: unexpected async job 6" occurs on the Start and thus the CPU's aren't started. For one bz, when running managedsave the call never returns. For the other bz, one cannot restart the paused domain.
If there's a better way to do this - I'm sure someone will let me know.
John Ferlan (5): qemu: Adjust async job failure message qemu: Promote start_flags in qemuDomainRevertToSnapshot qemu: Use start_flags for RUNNING and PAUSED transitions qemu: Unset the genid start change flag for revert force qemu: Don't use asyncJob after stop during snapshot revert
src/qemu/qemu_domain.c | 3 ++- src/qemu/qemu_driver.c | 22 +++++++++++++++------- 2 files changed, 17 insertions(+), 8 deletions(-)
ACK series. Michal
participants (2)
-
John Ferlan
-
Michal Privoznik