[libvirt] [PATCH 0/2] Avoid issue when VM quits while attempting connection to the guest agent.

Peter Krempa (2):
  DO NOT APPLY UPSTREAM: Reproducer
  qemu: Avoid operations on NULL monitor if VM fails early

 src/qemu/qemu_process.c | 36 +++++++++++++++++++++++++++---------
 1 file changed, 27 insertions(+), 9 deletions(-)

-- 
1.8.5.2

Apply this patch and run a patched libvirt daemon. Then start a VM and kill
its process after a few seconds:

 # virsh start VM & sleep 3; killall -9 qemu-kvm

The upstream version doesn't crash, but the virsh command returns a cryptic
error message:

 error: invalid argument: monitor must not be NULL

0.10.2 and similar downstream versions segfault:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffed467700 (LWP 29256)]
__pthread_mutex_lock (mutex=0x10) at pthread_mutex_lock.c:50
50        unsigned int type = PTHREAD_MUTEX_TYPE (mutex);
Missing separate debuginfos, use: debuginfo-install VirtualBox-4.2-4.2.22_91556_el6-1.x86_64
(gdb) bt
#0  __pthread_mutex_lock (mutex=0x10) at pthread_mutex_lock.c:50
#1  0x00007fffe891585e in qemuDomainObjEnterMonitorInternal (driver=0x7fffe00858f0,
    driver_locked=true, obj=0x7fffe01689c0, asyncJob=<value optimized out>)
    at qemu/qemu_domain.c:1008
#2  0x00007fffe892662a in qemuProcessDetectVcpuPIDs (driver=0x7fffe00858f0,
    vm=0x7fffe01689c0) at qemu/qemu_process.c:1831
#3  0x00007fffe892c417 in qemuProcessStart (conn=0x7fffdc000ae0, driver=0x7fffe00858f0,
    vm=0x7fffe01689c0, migrateFrom=0x0, stdin_fd=-1, stdin_path=0x0, snapshot=0x0,
    vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=1) at qemu/qemu_process.c:4022
#4  0x00007fffe8973f3e in qemuDomainObjStart (conn=0x7fffdc000ae0, driver=0x7fffe00858f0,
    vm=0x7fffe01689c0, flags=<value optimized out>) at qemu/qemu_driver.c:6025
---
 src/qemu/qemu_process.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 9331744..a27eded 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -243,6 +243,8 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
                           config,
                           &agentCallbacks);
 
+    sleep(10);
+
     virObjectLock(vm);
     priv->agentStart = 0;
 
-- 
1.8.5.2
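To make the window the reproducer widens easier to see, here is a rough sketch of the surrounding flow in qemuConnectAgent; the unlock call and the exact ordering are assumptions inferred from the hunk context above, not lines taken verbatim from the tree:

  /* The domain lock is dropped around the blocking agent connection
   * attempt.  If the QEMU process is killed inside this window, its death
   * is processed while we are unlocked and the domain's monitor object is
   * freed before the lock below is re-acquired. */
  virObjectUnlock(vm);                       /* "the locks are down" */
  agent = qemuAgentOpen(vm, config, &agentCallbacks);
  sleep(10);                                 /* reproducer: widen the window */
  virObjectLock(vm);                         /* the VM may be gone by now */
  priv->agentStart = 0;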

https://bugzilla.redhat.com/show_bug.cgi?id=1047659

If a VM dies very early during an attempted connection to the guest agent,
while the locks are down, the domain monitor object will be freed. The object
is then accessed later, as any failure during guest agent startup isn't
considered fatal.

In the current upstream version this doesn't lead to a crash, as virObjectLock,
called when entering the monitor in qemuProcessDetectVcpuPIDs, checks the
pointer before attempting to dereference (lock) it. The NULL pointer is then
caught in the monitor helper code.

Before the introduction of virObjectLockable - observed on 0.10.2 - the
pointer is locked directly via virMutexLock, leading to a crash.

To avoid this problem we need to differentiate between the guest agent not
being present and the VM quitting while the locks were down. The fix
reorganizes the code in qemuConnectAgent to add the check and then adds
special handling to the callers.
---
 src/qemu/qemu_process.c | 34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index a27eded..cf23ff3 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -248,6 +248,17 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
     virObjectLock(vm);
     priv->agentStart = 0;
 
+    if (agent == NULL)
+        virObjectUnref(vm);
+
+    if (!virDomainObjIsActive(vm)) {
+        qemuAgentClose(agent);
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("guest crashed while connecting to the guest agent"));
+        ret = -2;
+        goto cleanup;
+    }
+
     if (virSecurityManagerClearSocketLabel(driver->securityManager,
                                            vm->def) < 0) {
         VIR_ERROR(_("Failed to clear security context for agent for %s"),
@@ -255,13 +266,7 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
         goto cleanup;
     }
 
-    if (agent == NULL)
-        virObjectUnref(vm);
-
-    if (!virDomainObjIsActive(vm)) {
-        qemuAgentClose(agent);
-        goto cleanup;
-    }
     priv->agent = agent;
 
     if (priv->agent == NULL) {
@@ -3120,6 +3125,7 @@ qemuProcessReconnect(void *opaque)
     int reason;
     virQEMUDriverConfigPtr cfg;
     size_t i;
+    int ret;
 
     memcpy(&oldjob, &data->oldjob, sizeof(oldjob));
 
@@ -3144,7 +3150,10 @@ qemuProcessReconnect(void *opaque)
         goto error;
 
     /* Failure to connect to agent shouldn't be fatal */
-    if (qemuConnectAgent(driver, obj) < 0) {
+    if ((ret = qemuConnectAgent(driver, obj)) < 0) {
+        if (ret == -2)
+            goto error;
+
         VIR_WARN("Cannot connect to QEMU guest agent for %s",
                  obj->def->name);
         virResetLastError();
@@ -4018,7 +4027,10 @@ int qemuProcessStart(virConnectPtr conn,
         goto cleanup;
 
     /* Failure to connect to agent shouldn't be fatal */
-    if (qemuConnectAgent(driver, vm) < 0) {
+    if ((ret = qemuConnectAgent(driver, vm)) < 0) {
+        if (ret == -2)
+            goto cleanup;
+
         VIR_WARN("Cannot connect to QEMU guest agent for %s",
                  vm->def->name);
         virResetLastError();
@@ -4478,6 +4490,7 @@ int qemuProcessAttach(virConnectPtr conn ATTRIBUTE_UNUSED,
     virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
     virCapsPtr caps = NULL;
     bool active = false;
+    int ret;
 
     VIR_DEBUG("Beginning VM attach process");
 
@@ -4592,7 +4605,10 @@ int qemuProcessAttach(virConnectPtr conn ATTRIBUTE_UNUSED,
         goto error;
 
     /* Failure to connect to agent shouldn't be fatal */
-    if (qemuConnectAgent(driver, vm) < 0) {
+    if ((ret = qemuConnectAgent(driver, vm)) < 0) {
+        if (ret == -2)
+            goto error;
+
         VIR_WARN("Cannot connect to QEMU guest agent for %s",
                  vm->def->name);
         virResetLastError();
-- 
1.8.5.2
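For readers skimming the hunks: the return-value contract this patch gives qemuConnectAgent, and the matching caller pattern, condensed here from the diff above into a single block with explanatory comments (this is the qemuProcessStart variant; the reconnect and attach callers jump to their error label instead):

  /* qemuConnectAgent() after this patch:
   *    0  success
   *   -1  agent setup failed, but the domain is still alive -> warn only
   *   -2  the domain died while the locks were down         -> fatal    */
  if ((ret = qemuConnectAgent(driver, vm)) < 0) {
      if (ret == -2)
          goto cleanup;                      /* abort the startup path */

      /* Failure to connect to agent shouldn't be fatal */
      VIR_WARN("Cannot connect to QEMU guest agent for %s", vm->def->name);
      virResetLastError();
  }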

On 14.01.2014 19:31, Peter Krempa wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1047659
If a VM dies very early during an attempted connection to the guest agent, while the locks are down, the domain monitor object will be freed. The object is then accessed later, as any failure during guest agent startup isn't considered fatal.
In the current upstream version this doesn't lead to a crash, as virObjectLock, called when entering the monitor in qemuProcessDetectVcpuPIDs, checks the pointer before attempting to dereference (lock) it. The NULL pointer is then caught in the monitor helper code.
Before the introduction of virObjectLockable - observed on 0.10.2 - the pointer is locked directly via virMutexLock, leading to a crash.
To avoid this problem we need to differentiate between the guest agent not being present and the VM quitting while the locks were down. The fix reorganizes the code in qemuConnectAgent to add the check and then adds special handling to the callers.
---
 src/qemu/qemu_process.c | 34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)
ACK and safe for 1.2.1.
Michal

On 01/15/14 17:45, Michal Privoznik wrote:
On 14.01.2014 19:31, Peter Krempa wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1047659
If a VM dies very early during an attempted connection to the guest agent, while the locks are down, the domain monitor object will be freed. The object is then accessed later, as any failure during guest agent startup isn't considered fatal.
In the current upstream version this doesn't lead to a crash, as virObjectLock, called when entering the monitor in qemuProcessDetectVcpuPIDs, checks the pointer before attempting to dereference (lock) it. The NULL pointer is then caught in the monitor helper code.
Before the introduction of virObjectLockable - observed on 0.10.2 - the pointer is locked directly via virMutexLock, leading to a crash.
To avoid this problem we need to differentiate between the guest agent not being present and the VM quitting while the locks were down. The fix reorganizes the code in qemuConnectAgent to add the check and then adds special handling to the callers.
---
 src/qemu/qemu_process.c | 34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)
ACK and safe for 1.2.1.
Michal
Pushed; Thanks.
Peter