Re: [libvirt] [PATCH] qemu_agent: fix deadlock in qemuProcessHandleAgentEOF

Tuesday, 13 October 2015

On 2015/10/2 20:17, John Ferlan wrote:

...

 On 09/26/2015 08:18 AM, Wang Yufei wrote:
> We shut down VM A via the qemu agent; meanwhile an agent EOF
> on VM A happened, so there's a chance of the following deadlock:
>
> qemuProcessHandleAgentEOF in main thread
> A)  priv->agent = NULL; //A happened before B
>
>     //deadlock when we take the agent lock, which is held by the worker thread
>     qemuAgentClose(agent);
>
> qemuDomainObjExitAgent called by qemuDomainShutdownFlags in worker thread
> B)  hasRefs = virObjectUnref(priv->agent); //priv->agent is NULL, return false
>
>     if (hasRefs)
>         virObjectUnlock(priv->agent); //agent lock will not be released here
>
> So I close the agent first, then set priv->agent to NULL, to fix the deadlock.
>
> Signed-off-by: Wang Yufei <james.wangyufei(a)huawei.com>
> Reviewed-by: Ren Guannan <renguannan(a)huawei.com>
> ---
>  src/qemu/qemu_process.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>

 Interesting - this is the exact opposite of commit id '1020a504' from
 Michal over 3 years ago.

 However, a bit of digging into the claim from the commit message drove
 me to commit id '362d04779c' which removes the domain object lock that
 was the basis of the initial patch.

 While I'm not an expert in the vmagent code, I do note that the
 HandleAgentEOF code checks for !priv->agent and priv->beingDestroyed,
 while the ExitAgent code doesn't necessarily (or directly) check whether
 the priv->agent is still valid (IOW: that nothing else has removed it
 already like the EOF).

 So, while I don't discount that the patch works - I'm wondering whether
 the smarts/logic should be built into ExitAgent to check for
 !priv->agent (and/or something else) that would indicate we're already
 on the path of shutdown.

 John

> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index f2586a1..8c9622e 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -150,11 +150,10 @@ qemuProcessHandleAgentEOF(qemuAgentPtr agent,
>          goto unlock;
>      }
>  
> +    qemuAgentClose(agent);
>      priv->agent = NULL;
>  
>      virObjectUnlock(vm);
> -
> -    qemuAgentClose(agent);
>      return;
>  
>   unlock:
>


Thank you for your reply.

First, we should consider the correct logic. In my opinion, we should call
qemuAgentClose first and then set priv->agent to NULL, just like the logic in
qemuProcessStop:

    if (priv->agent) {
        qemuAgentClose(priv->agent);
        priv->agent = NULL;
        priv->agentError = false;
    }

Based on that logic, we then choose the correct locking, which I have shown in my patch.
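To make the ordering argument concrete, here is a single-threaded toy replay of the two interleavings. It is a sketch under stated assumptions, not libvirt code: `toy_agent`, `toy_priv`, `toy_unref`, `toy_enter_agent`, `toy_exit_agent` and `replay` are hypothetical names; `toy_unref` only imitates the return contract of virObjectUnref (false for NULL or for the last reference), the domain lock is not modelled, and `pthread_mutex_trylock` stands in for qemuAgentClose trying to take the agent lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for qemuAgent: a refcounted object with its own lock. */
typedef struct {
    pthread_mutex_t lock;
    int refs;
} toy_agent;

/* Hypothetical stand-in for the domain's private data (priv->agent). */
typedef struct {
    toy_agent *agent;
} toy_priv;

/* Imitates virObjectUnref()'s return value: false for NULL or when the
 * last reference is dropped. Disposal is omitted to keep the model small. */
static bool toy_unref(toy_agent *a)
{
    if (!a)
        return false;
    return --a->refs > 0;
}

/* Worker side, Enter step: lock the agent and take a reference. */
static void toy_enter_agent(toy_priv *priv)
{
    assert(priv->agent != NULL);
    pthread_mutex_lock(&priv->agent->lock);
    priv->agent->refs++;
}

/* Worker side, Exit step, shaped like the code quoted in the patch: it
 * re-reads priv->agent, so a NULL written meanwhile skips the unlock. */
static void toy_exit_agent(toy_priv *priv)
{
    bool hasRefs = toy_unref(priv->agent);

    if (hasRefs)
        pthread_mutex_unlock(&priv->agent->lock);
}

/* Replays one interleaving on a single thread. Returns true if the EOF
 * handler could then take the agent lock to close it, false if that
 * close would block forever. */
static bool replay(bool clear_before_exit)
{
    toy_agent agent = { .refs = 1 };
    toy_priv priv = { .agent = &agent };

    pthread_mutex_init(&agent.lock, NULL);

    toy_enter_agent(&priv);     /* worker is inside an agent command */
    if (clear_before_exit)
        priv.agent = NULL;      /* step A: EOF handler clears the field */
    toy_exit_agent(&priv);      /* step B: worker leaves the command */

    /* The close path needs the agent lock; trylock probes for it. */
    bool can_close = pthread_mutex_trylock(&agent.lock) == 0;

    /* Either way this thread now owns the lock (via trylock, or left
     * over from the Enter step), so release it before destroying it. */
    pthread_mutex_unlock(&agent.lock);
    pthread_mutex_destroy(&agent.lock);
    return can_close;
}
```

In this model, replay(true) (step A before step B, the original ordering) returns false: the Exit step saw a NULL priv->agent, skipped the unlock, and the close path would wait forever. replay(false) (priv->agent still set when the worker exits, as when qemuAgentClose runs before the field is cleared) returns true.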

As for whether the smarts/logic should be built into ExitAgent to check for
!priv->agent (and/or something else) that would indicate we're already on the
path of shutdown:

The answer is yes, we have to. ExitAgent uses the agent lock, which may be
released by another thread (e.g. by qemuAgentClose), so we have to check the
agent first to make sure it's safe to access.
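One possible shape for such a check, again as a hypothetical toy model rather than a proposed libvirt patch: the Exit step unlocks the object it actually locked in Enter (returned by `toy_enter_agent`) instead of re-reading priv->agent, and only afterwards looks at priv->agent to learn whether the EOF path has already detached the agent. Because the worker holds its own reference, the saved pointer stays valid even if priv->agent is cleared. All names are hypothetical, and in real code the final priv->agent read would happen under the domain lock, which is not modelled here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not libvirt types. */
typedef struct {
    pthread_mutex_t lock;
    int refs;
} toy_agent;

typedef struct {
    toy_agent *agent;
} toy_priv;

/* Imitates virObjectUnref()'s return value (disposal omitted). */
static bool toy_unref(toy_agent *a)
{
    if (!a)
        return false;
    return --a->refs > 0;
}

/* Enter returns the agent it locked, so Exit never re-reads priv->agent
 * to decide what to unlock. */
static toy_agent *toy_enter_agent(toy_priv *priv)
{
    toy_agent *agent = priv->agent;

    assert(agent != NULL);
    pthread_mutex_lock(&agent->lock);
    agent->refs++;
    return agent;
}

/* Releases the lock taken in Enter, then reports whether priv->agent
 * still points at that agent (false: EOF handling already detached it). */
static bool toy_exit_agent(toy_priv *priv, toy_agent *locked)
{
    if (toy_unref(locked))
        pthread_mutex_unlock(&locked->lock);
    return priv->agent == locked;
}

/* Replays step A (EOF clears priv->agent) before step B (worker exits).
 * Returns true if the close path could then take the agent lock. */
static bool replay_eof_first(bool *still_attached)
{
    toy_agent agent = { .refs = 1 };
    toy_priv priv = { .agent = &agent };

    pthread_mutex_init(&agent.lock, NULL);

    toy_agent *locked = toy_enter_agent(&priv);
    priv.agent = NULL;                               /* step A */
    *still_attached = toy_exit_agent(&priv, locked); /* step B */

    /* Either trylock succeeds or this thread still owns the lock;
     * both cases leave it owned here, so unlock before destroying. */
    bool can_close = pthread_mutex_trylock(&agent.lock) == 0;
    pthread_mutex_unlock(&agent.lock);
    pthread_mutex_destroy(&agent.lock);
    return can_close;
}
```

With step A replayed before step B, replay_eof_first() reports that the agent lock is free for the close path and that the agent was already detached, so the worker can skip any further use of priv->agent.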
