Re: [libvirt] [PATCH] qemu: Resolve agent deadlock

Tuesday, 27 October 2015

On Mon, Oct 26, 2015 at 17:28:57 -0400, John Ferlan wrote:
...

 On 10/26/2015 11:37 AM, Peter Krempa wrote:
 > On Mon, Oct 26, 2015 at 11:06:21 -0400, John Ferlan wrote:
 >> If we have a shutdown of a VM by a qemu agent while an agent EOF occurs
 >> at relatively the same time, it's possible that a deadlock occurs depending
 >> on what happens first.
 >>
 >> Assume qemuProcessHandleAgentEOF runs first, with the vm->lock it clears
 >> priv->agent, then clears the vm->lock.
 > 
 > Couldn't we make sure, that qemuProcessHandleAgentEOF clears priv->agent
 > if and only if it removed the last reference?

 reference to?  agent?  vm?  via? 
Since all of this is refering to the agent I was refering to it too.

...
 qemuConnectAgent/qemuAgentOpen takes out a reference to the agent.
That
 is un-referenced by qemuAgentClose. The EnterAgent takes out a reference
 which is un-referenced during ExitAgent. Adding a new reference just for
 EOF processing doesn't solve the problem.

 EOF doesn't do an agent-unlock, it does a vm-lock/unlock, and calls
 qemuAgentClose which will attempt an agent-lock. The agent may be locked
 by some other thread and it needs to wait until that reference is cleared.

 Adding some sort of unref-agent check logic in EOF similar to ExitAgent
 feels like it'll cause other issues. It seems "backwards" to remove the
 last reference and avoid everything else that qemuCloseAgent does.

 I think logically if some sort of unref-agent check logic was added,
 then qemuCloseAgent could not be called from either EOF or ExitAgent.
 >From EOF, if the current thread was the last one, then the agent is
 free'd so qemuCloseAgent shouldn't be called. If the current thread
 wasn't the last one, then we'd have to wait for the last reference check
 in ExitAgent, but once that is done, the agent is freed and thus
 qemuCloseAgent shouldn't be called. 
I wanted to point out that since we do have the 'EnterAgent' and
'ExitAgent' helpers, they should be used to do any kind of logic
required to do the locking and it should not be necessary at any point
to copy the pointer to priv->agent. Otherwise it creates a really ugly
usage pattern which requires a separate variable and basically defeats
the Enter/Exit pattern we use anywhere else.

Doing so probably will increase the complexity of either the helpers or
the closing function, but the complexity will not be exposed in a
repeated pattern through the code. Since we already have the helpers, we
should use them and not clutter the rest of the code.

One other option worth checking is moving the stuff happening in
qemuAgentClose into the destructor for the agent object
(qemuAgentDispose) which would then auto-call it in the case where
you remove the last reference. I didn't check thoroughly enough though
to see whether it's possible.

Peter

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [libvirt] [PATCH] qemu: Resolve agent deadlock