[libvirt] [PATCH] qemu: Avoid libvirtd crash in qemuDomainObjExitAgentInternal

* src/qemu/qemu_domain.c (qemuDomainObjExitAgentInternal): fix crashing libvirtd due to derefing a NULL pointer. For details, please see bug: RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=845966 Signed-off-by: Alex Jia <ajia@redhat.com> --- src/qemu/qemu_domain.c | 10 ++++++---- 1 files changed, 6 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 86f0265..8667b6c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1136,12 +1136,14 @@ qemuDomainObjExitAgentInternal(struct qemud_driver *driver, virDomainObjPtr obj) { qemuDomainObjPrivatePtr priv = obj->privateData; - int refs; + int refs = -1; - refs = qemuAgentUnref(priv->agent); + if (priv->agent) { + refs = qemuAgentUnref(priv->agent); - if (refs > 0) - qemuAgentUnlock(priv->agent); + if (refs > 0) + qemuAgentUnlock(priv->agent); + } if (driver_locked) qemuDriverLock(driver); -- 1.7.1

On Tue, Aug 07, 2012 at 03:18:38PM +0800, Alex Jia wrote:
* src/qemu/qemu_domain.c (qemuDomainObjExitAgentInternal): fix crashing libvirtd due to derefing a NULL pointer.
For details, please see bug: RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=845966
Signed-off-by: Alex Jia <ajia@redhat.com> --- src/qemu/qemu_domain.c | 10 ++++++---- 1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 86f0265..8667b6c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1136,12 +1136,14 @@ qemuDomainObjExitAgentInternal(struct qemud_driver *driver, virDomainObjPtr obj) { qemuDomainObjPrivatePtr priv = obj->privateData; - int refs; + int refs = -1;
- refs = qemuAgentUnref(priv->agent); + if (priv->agent) { + refs = qemuAgentUnref(priv->agent);
- if (refs > 0) - qemuAgentUnlock(priv->agent); + if (refs > 0) + qemuAgentUnlock(priv->agent); + }
if (driver_locked) qemuDriverLock(driver);
I'm not convinced this is the right fix. The whole point of the Enter/ExitAgent methods is to hold an extra reference on priv->agent, so that it is *not* deleted while a agent command is run. What is setting priv->agent to NULL while the command is still active ? Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 08/07/2012 07:34 PM, Daniel P. Berrange wrote:
On Tue, Aug 07, 2012 at 03:18:38PM +0800, Alex Jia wrote:
* src/qemu/qemu_domain.c (qemuDomainObjExitAgentInternal): fix crashing libvirtd due to derefing a NULL pointer.
For details, please see bug: RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=845966
Signed-off-by: Alex Jia<ajia@redhat.com> --- src/qemu/qemu_domain.c | 10 ++++++---- 1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 86f0265..8667b6c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1136,12 +1136,14 @@ qemuDomainObjExitAgentInternal(struct qemud_driver *driver, virDomainObjPtr obj) { qemuDomainObjPrivatePtr priv = obj->privateData; - int refs; + int refs = -1;
- refs = qemuAgentUnref(priv->agent); + if (priv->agent) { + refs = qemuAgentUnref(priv->agent);
- if (refs> 0) - qemuAgentUnlock(priv->agent); + if (refs> 0) + qemuAgentUnlock(priv->agent); + }
if (driver_locked) qemuDriverLock(driver); I'm not convinced this is the right fix. The whole point of the Enter/ExitAgent methods is to hold an extra reference on priv->agent, so that it is *not* deleted while a agent command is run.
What is setting priv->agent to NULL while the command is still active ?
In fact, the command 'guest-suspend-disk' is freed by virJSONValueFree() in qemuAgentSuspend() after the command is successfully sent via 'qemuAgentCommand()': (gdb) s qemuDomainPMSuspendForDuration (dom=<value optimized out>, target=1, duration=<value optimized out>, flags=<value optimized out>) at qemu/qemu_driver.c:13123 13123 qemuDomainObjExitAgent(driver, vm); (gdb) p *vm $68 = {object = {magic = 3405643788, refs = 4, klass = 0x7f4ce815a9b0}, lock = {lock = {__data = {__lock = 1, __count = 0, __owner = 20285, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001\000\000\000\000\000\000\000=O\000\000\001", '\000' <repeats 26 times>, __align = 1}}, pid = 20379, *state = {state = 4, reason = 0}*, autostart = 0, persistent = 1, updated = 0, def = 0x7f4ce815e500, newDef = 0x7f4ce8069b80, snapshots = {objs = 0x7f4ce815e240, metaroot = {def = 0x0, parent = 0x0, sibling = 0x0, nchildren = 0, first_child = 0x0}}, current_snapshot = 0x0, hasManagedSave = false, privateData = 0x7f4ce8154660, privateDataFreeFunc = 0x7f4cefada190 <qemuDomainObjPrivateFree>, taint = 4} (gdb) s 13122 ret = qemuAgentSuspend(priv->agent, target); (gdb) p *priv $70 = {job = {cond = {cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, active = QEMU_JOB_MODIFY, owner = 20286, asyncCond = {cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, asyncJob = QEMU_ASYNC_JOB_NONE, asyncOwner = 0, phase = 0, mask = 0, start = 0, dump_memory_only = false, info = { type = 0, timeElapsed = 0, timeRemaining = 0, dataTotal = 0, dataProcessed = 0, dataRemaining = 0, memTotal = 0, memProcessed = 0, memRemaining = 0, fileTotal = 0, fileProcessed = 0, fileRemaining = 0}}, mon = 0x7f4ce80a3570, monConfig = 0x0, monJSON = 1, monError = false, monStart = 0, *agent = 0x0*, agentError = false, agentStart = 1344402957193, gotShutdown = true, beingDestroyed = false, pidfile = 0x7f4ce816ba90 "/var/run/libvirt/qemu/myRHEL6.pid", nvcpupids = 1, vcpupids = 0x7f4ce8146be0, pciaddrs = 0x7f4ce8171b30, persistentAddrs = 1, qemuCaps = 0x7f4ce80b4030, lockState = 0x0, fakeReboot = false, jobs_queued = 1, migMaxBandwidth = 32, origname = 0x0, cons = 0x7f4ce8165ce0, cleanupCallbacks = 0x0, ncleanupCallbacks = 0, ncleanupCallbacks_max = 0} (gdb) p priv->agent $71 =*(qemuAgentPtr) 0x0* (gdb) s 1138 refs = qemuAgentUnref(priv->agent); (gdb) s qemuAgentUnref (mon=0x0) at qemu/qemu_agent.c:168 (gdb) s 170 VIR_DEBUG("%d", mon->refs); (gdb) s 168 { (gdb) s 169 mon->refs--; (gdb) s Program received signal SIGSEGV, Segmentation fault. qemuAgentUnref (*mon=0x0*) at qemu/qemu_agent.c:169 169 *mon->refs--*; (gdb) s virNetServerFatalSignal (sig=11, siginfo=0x7f4cf748f630, context=0x7f4cf748f500) at rpc/virnetserver.c:296 In addition, the old qemuAgentUnref(mon) hasn't judge whether its parameter is NULL then will deref a NULL pointer, I should simply fix it in qemuAgentUnref(), for example, if 'mon' is NULL then directly return. Fortunately, your commit b57ee09 potentially fix this issue via using virObjectUnref() instead of qemuAgentUnref(), if the parameter 'priv->agent' is NULL then the virObjectUnref(priv->agent) will directly return false: bool virObjectUnref(void *anyobj) { virObjectPtr obj = anyobj; if (!obj) return false; ...... } Regards, Alex
Daniel

On Wed, Aug 08, 2012 at 02:26:26PM +0800, Alex Jia wrote:
On 08/07/2012 07:34 PM, Daniel P. Berrange wrote:
On Tue, Aug 07, 2012 at 03:18:38PM +0800, Alex Jia wrote:
* src/qemu/qemu_domain.c (qemuDomainObjExitAgentInternal): fix crashing libvirtd due to derefing a NULL pointer.
For details, please see bug: RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=845966
Signed-off-by: Alex Jia<ajia@redhat.com> --- src/qemu/qemu_domain.c | 10 ++++++---- 1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 86f0265..8667b6c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1136,12 +1136,14 @@ qemuDomainObjExitAgentInternal(struct qemud_driver *driver, virDomainObjPtr obj) { qemuDomainObjPrivatePtr priv = obj->privateData; - int refs; + int refs = -1;
- refs = qemuAgentUnref(priv->agent); + if (priv->agent) { + refs = qemuAgentUnref(priv->agent);
- if (refs> 0) - qemuAgentUnlock(priv->agent); + if (refs> 0) + qemuAgentUnlock(priv->agent); + }
if (driver_locked) qemuDriverLock(driver); I'm not convinced this is the right fix. The whole point of the Enter/ExitAgent methods is to hold an extra reference on priv->agent, so that it is *not* deleted while a agent command is run.
What is setting priv->agent to NULL while the command is still active ?
In fact, the command 'guest-suspend-disk' is freed by virJSONValueFree() in qemuAgentSuspend() after the command is successfully sent via 'qemuAgentCommand()':
(gdb) s qemuDomainPMSuspendForDuration (dom=<value optimized out>, target=1, duration=<value optimized out>, flags=<value optimized out>) at qemu/qemu_driver.c:13123 13123 qemuDomainObjExitAgent(driver, vm); (gdb) p *vm $68 = {object = {magic = 3405643788, refs = 4, klass = 0x7f4ce815a9b0}, lock = {lock = {__data = {__lock = 1, __count = 0, __owner = 20285, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001\000\000\000\000\000\000\000=O\000\000\001", '\000' <repeats 26 times>, __align = 1}}, pid = 20379, *state = {state = 4, reason = 0}*, autostart = 0, persistent = 1, updated = 0, def = 0x7f4ce815e500, newDef = 0x7f4ce8069b80, snapshots = {objs = 0x7f4ce815e240, metaroot = {def = 0x0, parent = 0x0, sibling = 0x0, nchildren = 0, first_child = 0x0}}, current_snapshot = 0x0, hasManagedSave = false, privateData = 0x7f4ce8154660, privateDataFreeFunc = 0x7f4cefada190 <qemuDomainObjPrivateFree>, taint = 4} (gdb) s 13122 ret = qemuAgentSuspend(priv->agent, target);
(gdb) p *priv $70 = {job = {cond = {cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, active = QEMU_JOB_MODIFY, owner = 20286, asyncCond = {cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, asyncJob = QEMU_ASYNC_JOB_NONE, asyncOwner = 0, phase = 0, mask = 0, start = 0, dump_memory_only = false, info = { type = 0, timeElapsed = 0, timeRemaining = 0, dataTotal = 0, dataProcessed = 0, dataRemaining = 0, memTotal = 0, memProcessed = 0, memRemaining = 0, fileTotal = 0, fileProcessed = 0, fileRemaining = 0}}, mon = 0x7f4ce80a3570, monConfig = 0x0, monJSON = 1, monError = false, monStart = 0, *agent = 0x0*, agentError = false, agentStart = 1344402957193, gotShutdown = true, beingDestroyed = false, pidfile = 0x7f4ce816ba90 "/var/run/libvirt/qemu/myRHEL6.pid", nvcpupids = 1, vcpupids = 0x7f4ce8146be0, pciaddrs = 0x7f4ce8171b30, persistentAddrs = 1, qemuCaps = 0x7f4ce80b4030, lockState = 0x0, fakeReboot = false, jobs_queued = 1, migMaxBandwidth = 32, origname = 0x0, cons = 0x7f4ce8165ce0, cleanupCallbacks = 0x0, ncleanupCallbacks = 0, ncleanupCallbacks_max = 0} (gdb) p priv->agent $71 =*(qemuAgentPtr) 0x0* (gdb) s 1138 refs = qemuAgentUnref(priv->agent); (gdb) s qemuAgentUnref (mon=0x0) at qemu/qemu_agent.c:168 (gdb) s 170 VIR_DEBUG("%d", mon->refs); (gdb) s 168 { (gdb) s 169 mon->refs--; (gdb) s
Program received signal SIGSEGV, Segmentation fault. qemuAgentUnref (*mon=0x0*) at qemu/qemu_agent.c:169 169 *mon->refs--*; (gdb) s virNetServerFatalSignal (sig=11, siginfo=0x7f4cf748f630, context=0x7f4cf748f500) at rpc/virnetserver.c:296
In addition, the old qemuAgentUnref(mon) hasn't judge whether its parameter is NULL then will deref a NULL pointer, I should simply fix it in qemuAgentUnref(), for example, if 'mon' is NULL then directly return.
Fortunately, your commit b57ee09 potentially fix this issue via using virObjectUnref() instead of qemuAgentUnref(), if the parameter 'priv->agent' is NULL then the virObjectUnref(priv->agent) will directly return false:
That is just lucky and we should not rely on that. There is still a bug here. The 'priv->agent' field should *not* be set to NULL in the first place, while a monitor command is active. The GDB trace above does not show *what* is setting priv->agent to NULL, and that is what we need to find out. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (2)
-
Alex Jia
-
Daniel P. Berrange