On 09/26/2015 08:18 AM, Wang Yufei wrote:
We shutdown a VM A by qemu agent,meanwhile an agent EOF
of VM A happened, there's a chance that deadlock occurred:
qemuProcessHandleAgentEOF in main thread
A) priv->agent = NULL; //A happened before B
//deadlock when we get agent lock which's held by worker thread
qemuAgentClose(agent);
qemuDomainObjExitAgent called by qemuDomainShutdownFlags in worker thread
B) hasRefs = virObjectUnref(priv->agent); //priv->agent is NULL, return false
if (hasRefs)
virObjectUnlock(priv->agent); //agent lock will not be released here
So I close agent first, then set priv->agent NULL to fix the deadlock.
Signed-off-by: Wang Yufei <james.wangyufei(a)huawei.com>
Reviewed-by: Ren Guannan <renguannan(a)huawei.com>
---
src/qemu/qemu_process.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
Interesting - this is the exact opposite of commit id '1020a504' from
Michal over 3 years ago.
However, a bit of digging into the claim from the commit message drove
me to commit id '362d04779c' which removes the domain object lock that
was the basis of the initial patch.
While I'm not an expert in the vmagent code, I do note that the
HandleAgentEOF code checks for !priv->agent and priv->beingDestroyed,
while the ExitAgent code doesn't necessarily (or directly) check whether
the priv->agent is still valid (IOW: that nothing else has removed it
already like the EOF).
So, while I don't discount that the patch works - I'm wondering whether
the smarts/logic should be built into ExitAgent to check for
!priv->agent (and or something else) that would indicate we're already
on the path of shutdown.
John
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index f2586a1..8c9622e 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -150,11 +150,10 @@ qemuProcessHandleAgentEOF(qemuAgentPtr agent,
goto unlock;
}
+ qemuAgentClose(agent);
priv->agent = NULL;
virObjectUnlock(vm);
-
- qemuAgentClose(agent);
return;
unlock: