On 08/07/2012 07:34 PM, Daniel P. Berrange wrote:
>On Tue, Aug 07, 2012 at 03:18:38PM +0800, Alex Jia wrote:
>>* src/qemu/qemu_domain.c (qemuDomainObjExitAgentInternal): fix crashing
>> libvirtd due to derefing a NULL pointer.
>>
>>For details, please see bug:
>>RHBZ:
https://bugzilla.redhat.com/show_bug.cgi?id=845966
>>
>>Signed-off-by: Alex Jia<ajia(a)redhat.com>
>>---
>> src/qemu/qemu_domain.c | 10 ++++++----
>> 1 files changed, 6 insertions(+), 4 deletions(-)
>>
>>diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
>>index 86f0265..8667b6c 100644
>>--- a/src/qemu/qemu_domain.c
>>+++ b/src/qemu/qemu_domain.c
>>@@ -1136,12 +1136,14 @@ qemuDomainObjExitAgentInternal(struct qemud_driver
*driver,
>> virDomainObjPtr obj)
>> {
>> qemuDomainObjPrivatePtr priv = obj->privateData;
>>- int refs;
>>+ int refs = -1;
>>
>>- refs = qemuAgentUnref(priv->agent);
>>+ if (priv->agent) {
>>+ refs = qemuAgentUnref(priv->agent);
>>
>>- if (refs> 0)
>>- qemuAgentUnlock(priv->agent);
>>+ if (refs> 0)
>>+ qemuAgentUnlock(priv->agent);
>>+ }
>>
>> if (driver_locked)
>> qemuDriverLock(driver);
>I'm not convinced this is the right fix. The whole point of the Enter/ExitAgent
>methods is to hold an extra reference on priv->agent, so that it is *not*
>deleted while a agent command is run.
>
>What is setting priv->agent to NULL while the command is still active ?
In fact, the command 'guest-suspend-disk' is freed by
virJSONValueFree() in qemuAgentSuspend() after the command is
successfully sent via 'qemuAgentCommand()':
(gdb) s
qemuDomainPMSuspendForDuration (dom=<value optimized out>, target=1,
duration=<value optimized out>, flags=<value optimized out>) at
qemu/qemu_driver.c:13123
13123 qemuDomainObjExitAgent(driver, vm);
(gdb) p *vm
$68 = {object = {magic = 3405643788, refs = 4, klass =
0x7f4ce815a9b0}, lock = {lock = {__data = {__lock = 1, __count = 0,
__owner = 20285, __nusers = 1, __kind = 0, __spins = 0, __list =
{__prev = 0x0,
__next = 0x0}}, __size =
"\001\000\000\000\000\000\000\000=O\000\000\001", '\000' <repeats
26
times>, __align = 1}}, pid = 20379, *state = {state = 4, reason =
0}*, autostart = 0, persistent = 1,
updated = 0, def = 0x7f4ce815e500, newDef = 0x7f4ce8069b80,
snapshots = {objs = 0x7f4ce815e240, metaroot = {def = 0x0, parent =
0x0, sibling = 0x0, nchildren = 0, first_child = 0x0}},
current_snapshot = 0x0, hasManagedSave = false, privateData =
0x7f4ce8154660, privateDataFreeFunc = 0x7f4cefada190
<qemuDomainObjPrivateFree>, taint = 4}
(gdb) s
13122 ret = qemuAgentSuspend(priv->agent, target);
(gdb) p *priv
$70 = {job = {cond = {cond = {__data = {__lock = 0, __futex = 0,
__total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0,
__nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47
times>,
__align = 0}}, active = QEMU_JOB_MODIFY, owner = 20286,
asyncCond = {cond = {__data = {__lock = 0, __futex = 0, __total_seq
= 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters =
0,
__broadcast_seq = 0}, __size = '\000' <repeats 47 times>,
__align = 0}}, asyncJob = QEMU_ASYNC_JOB_NONE, asyncOwner = 0, phase
= 0, mask = 0, start = 0, dump_memory_only = false, info = {
type = 0, timeElapsed = 0, timeRemaining = 0, dataTotal = 0,
dataProcessed = 0, dataRemaining = 0, memTotal = 0, memProcessed =
0, memRemaining = 0, fileTotal = 0, fileProcessed = 0,
fileRemaining = 0}}, mon = 0x7f4ce80a3570, monConfig = 0x0,
monJSON = 1, monError = false, monStart = 0, *agent = 0x0*,
agentError = false, agentStart = 1344402957193, gotShutdown = true,
beingDestroyed = false, pidfile = 0x7f4ce816ba90
"/var/run/libvirt/qemu/myRHEL6.pid", nvcpupids = 1, vcpupids =
0x7f4ce8146be0, pciaddrs = 0x7f4ce8171b30, persistentAddrs = 1,
qemuCaps = 0x7f4ce80b4030,
lockState = 0x0, fakeReboot = false, jobs_queued = 1,
migMaxBandwidth = 32, origname = 0x0, cons = 0x7f4ce8165ce0,
cleanupCallbacks = 0x0, ncleanupCallbacks = 0, ncleanupCallbacks_max
= 0}
(gdb) p priv->agent
$71 =*(qemuAgentPtr) 0x0*
(gdb) s
1138 refs = qemuAgentUnref(priv->agent);
(gdb) s
qemuAgentUnref (mon=0x0) at qemu/qemu_agent.c:168
(gdb) s
170 VIR_DEBUG("%d", mon->refs);
(gdb) s
168 {
(gdb) s
169 mon->refs--;
(gdb) s
Program received signal SIGSEGV, Segmentation fault.
qemuAgentUnref (*mon=0x0*) at qemu/qemu_agent.c:169
169 *mon->refs--*;
(gdb) s
virNetServerFatalSignal (sig=11, siginfo=0x7f4cf748f630,
context=0x7f4cf748f500) at rpc/virnetserver.c:296
In addition, the old qemuAgentUnref(mon) hasn't judge whether its
parameter is NULL then will deref a NULL pointer, I should simply
fix it in qemuAgentUnref(), for example, if 'mon' is NULL then
directly return.
Fortunately, your commit b57ee09 potentially fix this issue via
using virObjectUnref() instead of qemuAgentUnref(), if the parameter
'priv->agent' is NULL then the virObjectUnref(priv->agent) will
directly return false:
That is just lucky and we should not rely on that. There is still a
bug here. The 'priv->agent' field should *not* be set to NULL in the
first place, while a monitor command is active. The GDB trace above
does not show *what* is setting priv->agent to NULL, and that is
what we need to find out.
Daniel
--
|: