On 08/07/2012 07:34 PM, Daniel P. Berrange wrote:
On Tue, Aug 07, 2012 at 03:18:38PM +0800, Alex Jia wrote:
* src/qemu/qemu_domain.c (qemuDomainObjExitAgentInternal): fix crashing
  libvirtd due to derefing a NULL pointer.

For details, please see bug:
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=845966

Signed-off-by: Alex Jia <ajia@redhat.com>
---
 src/qemu/qemu_domain.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 86f0265..8667b6c 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1136,12 +1136,14 @@ qemuDomainObjExitAgentInternal(struct qemud_driver *driver,
                                virDomainObjPtr obj)
 {
     qemuDomainObjPrivatePtr priv = obj->privateData;
-    int refs;
+    int refs = -1;
 
-    refs = qemuAgentUnref(priv->agent);
+    if (priv->agent) {
+        refs = qemuAgentUnref(priv->agent);
 
-    if (refs > 0)
-        qemuAgentUnlock(priv->agent);
+        if (refs > 0)
+            qemuAgentUnlock(priv->agent);
+    }
 
     if (driver_locked)
         qemuDriverLock(driver);
I'm not convinced this is the right fix. The whole point of the Enter/ExitAgent
methods is to hold an extra reference on priv->agent, so that it is *not*
deleted while an agent command is run.
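
To make the extra-reference idea concrete, here is a toy sketch in C. It is not libvirt code: DemoAgent, demoAgentNew(), demoAgentRef(), demoAgentUnref() and the demoFreed counter are all made-up names for illustration. It only assumes a plain integer reference count, where the owner holds one reference and the command path takes a second one for the duration of a command.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted agent; not libvirt's real type. */
typedef struct DemoAgent {
    int refs;
} DemoAgent;

/* Counts how many DemoAgent objects have been freed (demo only). */
static int demoFreed = 0;

static DemoAgent *demoAgentNew(void)
{
    DemoAgent *agent = calloc(1, sizeof(*agent));
    if (agent)
        agent->refs = 1;   /* the owner's reference */
    return agent;
}

/* Take an extra reference, as an "enter" step would before a command. */
static void demoAgentRef(DemoAgent *agent)
{
    assert(agent != NULL);
    agent->refs++;
}

/* Drop one reference; free the object when the last one goes away.
 * Returns the number of references remaining. */
static int demoAgentUnref(DemoAgent *agent)
{
    assert(agent != NULL && agent->refs > 0);
    if (--agent->refs == 0) {
        free(agent);
        demoFreed++;
        return 0;
    }
    return agent->refs;
}
```

With demoAgentRef() called before a command and demoAgentUnref() after it, an owner that drops its own reference in between leaves the object alive until the command path has released its reference too.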

What is setting priv->agent to NULL while the command is still active?

In fact, the 'guest-suspend-disk' command is freed by virJSONValueFree() in qemuAgentSuspend() after it has been sent successfully via qemuAgentCommand(). By the time qemuDomainObjExitAgent() runs, priv->agent is already NULL:

(gdb) s
qemuDomainPMSuspendForDuration (dom=<value optimized out>, target=1, duration=<value optimized out>, flags=<value optimized out>) at qemu/qemu_driver.c:13123
13123       qemuDomainObjExitAgent(driver, vm);
(gdb) p *vm
$68 = {object = {magic = 3405643788, refs = 4, klass = 0x7f4ce815a9b0}, lock = {lock = {__data = {__lock = 1, __count = 0, __owner = 20285, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0,
          __next = 0x0}}, __size = "\001\000\000\000\000\000\000\000=O\000\000\001", '\000' <repeats 26 times>, __align = 1}}, pid = 20379, state = {state = 4, reason = 0}, autostart = 0, persistent = 1,
  updated = 0, def = 0x7f4ce815e500, newDef = 0x7f4ce8069b80, snapshots = {objs = 0x7f4ce815e240, metaroot = {def = 0x0, parent = 0x0, sibling = 0x0, nchildren = 0, first_child = 0x0}},
  current_snapshot = 0x0, hasManagedSave = false, privateData = 0x7f4ce8154660, privateDataFreeFunc = 0x7f4cefada190 <qemuDomainObjPrivateFree>, taint = 4}
 (gdb) s
13122       ret = qemuAgentSuspend(priv->agent, target);

(gdb) p *priv
$70 = {job = {cond = {cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>,
        __align = 0}}, active = QEMU_JOB_MODIFY, owner = 20286, asyncCond = {cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0,
          __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, asyncJob = QEMU_ASYNC_JOB_NONE, asyncOwner = 0, phase = 0, mask = 0, start = 0, dump_memory_only = false, info = {
      type = 0, timeElapsed = 0, timeRemaining = 0, dataTotal = 0, dataProcessed = 0, dataRemaining = 0, memTotal = 0, memProcessed = 0, memRemaining = 0, fileTotal = 0, fileProcessed = 0,
      fileRemaining = 0}}, mon = 0x7f4ce80a3570, monConfig = 0x0, monJSON = 1, monError = false, monStart = 0, agent = 0x0, agentError = false, agentStart = 1344402957193, gotShutdown = true,
  beingDestroyed = false, pidfile = 0x7f4ce816ba90 "/var/run/libvirt/qemu/myRHEL6.pid", nvcpupids = 1, vcpupids = 0x7f4ce8146be0, pciaddrs = 0x7f4ce8171b30, persistentAddrs = 1, qemuCaps = 0x7f4ce80b4030,
  lockState = 0x0, fakeReboot = false, jobs_queued = 1, migMaxBandwidth = 32, origname = 0x0, cons = 0x7f4ce8165ce0, cleanupCallbacks = 0x0, ncleanupCallbacks = 0, ncleanupCallbacks_max = 0}
(gdb) p priv->agent
$71 = (qemuAgentPtr) 0x0
(gdb) s
1138        refs = qemuAgentUnref(priv->agent);
(gdb) s
qemuAgentUnref (mon=0x0) at qemu/qemu_agent.c:168
(gdb) s
170         VIR_DEBUG("%d", mon->refs);
(gdb) s
168     {
(gdb) s
169         mon->refs--;
(gdb) s

Program received signal SIGSEGV, Segmentation fault.
qemuAgentUnref (mon=0x0) at qemu/qemu_agent.c:169
169         mon->refs--;
(gdb) s
virNetServerFatalSignal (sig=11, siginfo=0x7f4cf748f630, context=0x7f4cf748f500) at rpc/virnetserver.c:296


In addition, the old qemuAgentUnref(mon) does not check whether its parameter is NULL, so it dereferences a NULL pointer. I could simply fix that in qemuAgentUnref() itself, for example by returning immediately if 'mon' is NULL.
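
A minimal sketch of such a guard, using a toy type rather than the real qemuAgent structure (ToyAgent and toyAgentUnref() are hypothetical names, and returning -1 for a NULL argument is an assumed convention, chosen to match the 'refs = -1' default in the patch so that a caller's 'if (refs > 0)' check skips the unlock):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for the agent object; not libvirt's real struct. */
typedef struct ToyAgent {
    int refs;
} ToyAgent;

/* Drop one reference and return the number remaining.
 * Returns -1 without touching memory when 'mon' is NULL
 * (assumed convention, see the note above). */
static int toyAgentUnref(ToyAgent *mon)
{
    if (!mon)
        return -1;

    assert(mon->refs > 0);
    if (--mon->refs == 0) {
        free(mon);          /* last reference gone */
        return 0;
    }
    return mon->refs;
}
```

The guard only avoids the crash inside the unref call; it does not explain why the pointer was NULL in the first place.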

Fortunately, your commit b57ee09 may already fix this issue by using virObjectUnref() instead of qemuAgentUnref(): if 'priv->agent' is NULL, virObjectUnref(priv->agent) returns false immediately:

bool virObjectUnref(void *anyobj)
{
    virObjectPtr obj = anyobj;

    if (!obj)
        return false;

    ......
}


Regards,
Alex
Daniel