On 01/13/2015 03:43 PM, John Ferlan wrote:
On 01/07/2015 10:42 AM, Ján Tomko wrote:
> ---
> src/qemu/qemu_domain.c | 2 +-
> src/qemu/qemu_driver.c | 18 ++++++------------
> 2 files changed, 7 insertions(+), 13 deletions(-)
>
> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index 82d0c91..3d56eb8 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -2313,7 +2313,7 @@ qemuDomainSnapshotDiscard(virQEMUDriverPtr driver,
> qemuDomainObjEnterMonitor(driver, vm);
> /* we continue on even in the face of error */
> qemuMonitorDeleteSnapshot(priv->mon, snap->def->name);
> - qemuDomainObjExitMonitor(driver, vm);
> + ignore_value(qemuDomainObjExitMonitor(driver, vm));
> }
> }
>
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 55d6fb3..a3d8983 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -12896,7 +12896,8 @@ qemuDomainSnapshotCreateActiveInternal(virConnectPtr conn,
> }
>
> ret = qemuMonitorCreateSnapshot(priv->mon, snap->def->name);
> - qemuDomainObjExitMonitor(driver, vm);
> + if (qemuDomainObjExitMonitor(driver, vm) < 0)
> + ret = -1;
> if (ret < 0)
> goto cleanup;
>
> @@ -13417,12 +13418,8 @@ qemuDomainSnapshotCreateSingleDiskActive(virQEMUDriverPtr driver,
> ret = qemuMonitorDiskSnapshot(priv->mon, actions, device, source,
> formatStr, reuse);
> if (!actions) {
> - qemuDomainObjExitMonitor(driver, vm);
> - if (!virDomainObjIsActive(vm)) {
> - virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> - _("domain crashed while taking the snapshot"));
> + if (qemuDomainObjExitMonitor(driver, vm) < 0)
> ret = -1;
> - }
> }
In this case we do fall through to the Audit - the "strange" part of
the Audit, though, is that we've changed the value of 'ret' if the
ExitMonitor fails. Could a 'successful' DiskSnapshot ret value be
turned into a failure because the guest died for some reason? Could
anything else jump in? (I've totally lost track now :-))
If that's expected, fine, but I thought it worth pointing out. Although
I do "suspect" it could be reasonable to expect that the crash was
caused by taking the snapshot, but who knows for sure.
ACK in general...
Now that I'm done with the various callers, it's been hard to keep track
in my own mind of the consistency between each routine and what I may
have caught along the way. The following 4 points do spring to mind though:
1. Audit
I think the Audits should occur. Whether they "all" happen right
after the qemuMonitor* call while we still have the lock or whether
they're done afterwards doesn't matter to me.
As of now, the audits are broken because they can access freed data.
While the vm->def that gets replaced on a domain crash still contains
the basic data needed for the audit, the device pointers into vm->def will be
stale. The aim of this series is to remove the crash possibility by not using
those pointers.
While we are in the monitor, the domain object is unlocked, so even accessing
vm->def is not safe. After exiting the monitor, we'd need a copy of the
device that was not part of vm->def. We have that for attach, and for
things like setvcpus where the audited 'device' is just an int, but not for
detach.
Regardless of this series, the audit logging is inconsistent and should be
IMHO addressed separately.
2. EndJob
I started noticing some cases where EndJob was called and other times
where it was missed. Of course I didn't go back to earlier patches to
check all cases, but that might be something you want to do to ensure
we're not leaving somewhere without the EndJob.
3. Possibility of ret = qemuMonitor* == 0 and ExitMonitor == -1
While I agree it's highly likely that ret == -1 already prior to the
ExitMonitor calls, sometimes ret = -1 is set on ExitMonitor failure and
other times it's not. Still, if the combination is possible, then it needs
to be accounted for. It doesn't seem so, but having some call sequences do
things one way while others do it differently makes it difficult to keep track.
Ouch, that's probably a result of me spending less time per line on the last 6
patches. I'll take another look before pushing/resending. (So far I've got
patches 1, 2, 6-8 queued as I believe there were no audit issues there)
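Reduced to a skeleton (stub names here, not the real libvirt API), the combination in question looks like this: the monitor command itself can return 0, yet a failed ExitMonitor (the domain died while the object was unlocked) must still turn the overall result into a failure:

```c
/* Stubs for the real calls; exit_monitor fails iff the domain died
 * while the object was unlocked inside the monitor. */
static int monitor_command(void) { return 0; }   /* command itself succeeded */
static int exit_monitor(int domain_alive) { return domain_alive ? 0 : -1; }

static int do_command(int domain_alive)
{
    int ret = monitor_command();
    /* The command may have succeeded (ret == 0), but if the domain is
     * gone the caller must still see a failure. */
    if (exit_monitor(domain_alive) < 0)
        ret = -1;
    return ret;
}
```

Call sites that skip the `ret = -1` on ExitMonitor failure silently report success for a command whose domain no longer exists.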
4. Consistency
I realize the inconsistencies existed before these patches, but since
you're tackling all the callers in this series of patches - it may be
nice to make things consistent. That way the next person who copies what
some command does will then hopefully create consistent code. Another
option is to make a macro that handles the ExitMonitor and goto's...
I'm not sure I understand what should be consistent and can't imagine a
universal and readable macro for that.
For the audit, if qemuMonitor* succeeds but ExitMonitor fails, it would be
nice to log 'true', but that would fit better in a separate series addressing
all the audit calls.
As for ignore_value vs. goto cleanup, I try to use goto when possible,
because ignore_value is harder for me to read.
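For concreteness, the kind of macro John suggests might look like the sketch below (hypothetical names, not proposed libvirt code; as discussed, it only fits the goto-style callers, not the ignore_value ones):

```c
/* Hypothetical macro: exit the monitor and jump to 'label' on failure.
 * Requires an 'int ret' variable and the named label in the enclosing
 * function, which is part of why a universal version is hard to write. */
#define EXIT_MONITOR_OR_GOTO(alive, label)      \
    do {                                        \
        if (exit_monitor(alive) < 0) {          \
            ret = -1;                           \
            goto label;                         \
        }                                       \
    } while (0)

/* Stub: fails iff the domain died while unlocked in the monitor. */
static int exit_monitor(int domain_alive) { return domain_alive ? 0 : -1; }

static int revert_snapshot(int domain_alive)
{
    int ret = 0;
    /* ... enter monitor, issue the command ... */
    EXIT_MONITOR_OR_GOTO(domain_alive, endjob);
    /* further work, only reached if the domain survived */
 endjob:
    return ret;
}
```

The hidden dependency on `ret` and on a specific label name is exactly the readability problem with such macros; the explicit `if (qemuDomainObjExitMonitor(...) < 0)` spelling at each call site trades repetition for clarity.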
Jan
John
>
> virDomainAuditDisk(vm, disk->src, snap->src, "snapshot", ret >= 0);
> @@ -13560,12 +13557,8 @@ qemuDomainSnapshotCreateDiskActive(virQEMUDriverPtr driver,
> if (ret == 0) {
> if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) == 0) {
> ret = qemuMonitorTransaction(priv->mon, actions);
> - qemuDomainObjExitMonitor(driver, vm);
> - if (!virDomainObjIsActive(vm)) {
> - virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> - _("domain crashed while taking the snapshot"));
> + if (qemuDomainObjExitMonitor(driver, vm) < 0)
> ret = -1;
> - }
> } else {
> /* failed to enter monitor, clean stuff up and quit */
> ret = -1;
> @@ -14656,7 +14649,8 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
> }
> qemuDomainObjEnterMonitor(driver, vm);
> rc = qemuMonitorLoadSnapshot(priv->mon, snap->def->name);
> - qemuDomainObjExitMonitor(driver, vm);
> + if (qemuDomainObjExitMonitor(driver, vm) < 0)
> + goto endjob;
> if (rc < 0) {
> /* XXX resume domain if it was running before the
> * failed loadvm attempt? */
>
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list