[libvirt] [PATCH 0/2] qemu: fix few image/resource locking problems

The problem from patch 2/2 made me to look at the locking infrastructure in libvirt. There are a few broken places. This fixes two mistakes but more patches will come. Peter Krempa (2): qemu: monitor: Don't resume lockspaces in resume event handler qemu: snapshot: Don't attempt to resume cpus if they were not paused src/qemu/qemu_driver.c | 4 ++-- src/qemu/qemu_process.c | 11 ----------- 2 files changed, 2 insertions(+), 13 deletions(-) -- 2.11.0

After qemu delivers the resume event it's already running and thus it's too late to enter lockspaces since it may already have modified the disk. The code only creates false log entries in the case when locking is enabled. The lockspace needs to be acquired prior to starting cpus. --- src/qemu/qemu_process.c | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 387103481..3f892fcee 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -773,17 +773,6 @@ qemuProcessHandleResume(qemuMonitorPtr mon ATTRIBUTE_UNUSED, VIR_DOMAIN_EVENT_RESUMED, VIR_DOMAIN_EVENT_RESUMED_UNPAUSED); - VIR_DEBUG("Using lock state '%s' on resume event", NULLSTR(priv->lockState)); - if (virDomainLockProcessResume(driver->lockManager, cfg->uri, - vm, priv->lockState) < 0) { - /* Don't free priv->lockState on error, because we need - * to make sure we have state still present if the user - * tries to resume again - */ - goto unlock; - } - VIR_FREE(priv->lockState); - if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm, driver->caps) < 0) { VIR_WARN("Unable to save status on vm %s after state change", vm->def->name); -- 2.11.0

External disk-only snapshots with recent enough qemu don't require libvirt to pause the VM. The logic determining when to resume cpus was slightly flawed and attempted to resume them even if they were not paused by the snapshot code. This normally was not a problem, but with locking enabled the code would attempt to acquire the lock twice. The fallout of this bug would be a error from the API, but the actual snapshot being created. The bug was introduced with when adding support for external snapshots with memory (checkpoints) in commit f569b87. Resolves problems described by: https://bugzilla.redhat.com/show_bug.cgi?id=1403691 --- src/qemu/qemu_driver.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 88778d491..d976b5c59 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -14471,8 +14471,6 @@ qemuDomainSnapshotCreateActiveExternal(virConnectPtr conn, if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_PMSUSPENDED) { pmsuspended = true; } else if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { - resume = true; - /* For external checkpoints (those with memory), the guest * must pause (either by libvirt up front, or by qemu after * _LIVE converges). For disk-only snapshots with multiple @@ -14495,6 +14493,8 @@ qemuDomainSnapshotCreateActiveExternal(virConnectPtr conn, _("guest unexpectedly quit")); goto cleanup; } + + resume = true; } } -- 2.11.0

On Wed, Dec 14, 2016 at 17:36:31 +0100, Peter Krempa wrote:
The problem from patch 2/2 made me to look at the locking infrastructure in libvirt. There are a few broken places. This fixes two mistakes but more patches will come.
Peter Krempa (2): qemu: monitor: Don't resume lockspaces in resume event handler qemu: snapshot: Don't attempt to resume cpus if they were not paused
src/qemu/qemu_driver.c | 4 ++-- src/qemu/qemu_process.c | 11 ----------- 2 files changed, 2 insertions(+), 13 deletions(-)
ACK series Jirka
participants (2)
-
Jiri Denemark
-
Peter Krempa