Re: [libvirt] [PATCH 02/10] qemu: agent: dont set error after agent cleanup

Thursday, 27 October 2016

On 27.10.2016 11:58, Nikolay Shirokovskiy wrote:
...

 On 26.10.2016 22:49, John Ferlan wrote:
>
>
> On 10/04/2016 02:56 AM, Nikolay Shirokovskiy wrote:
>> If there is an error event after eof event then after agent
>> is cleaned up on eof error flag will be set back and remains
>> set after next domain start up making agent unavailable.
>> Thus let's check before set this flag on error event.
>> ---
>>  src/qemu/qemu_process.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index a581f96..3637f4b 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -180,7 +180,8 @@ qemuProcessHandleAgentError(qemuAgentPtr agent
ATTRIBUTE_UNUSED,
>>  
>>      priv = vm->privateData;
>>  
>> -    priv->agentError = true;
>> +    if (priv->agent)
>> +        priv->agentError = true;
>
> A NULL priv->agent happens for the following reasons:
>
> 1. qemuDomainObjExitAgent - when the last reference is removed... I
> would hope we're not falling into this path for what you've seen...
>
> 2. processSerialChangedEvent - after a qemuAgentClose for VSERPORT
> callback and agentError is set to false... (so that seems to be
> happening properly)
>
> 3. qemuProcessHandleAgentEOF (discussed in patch 1) - where I don't
> believe you should be clearing agentError
>
> 4. qemuConnectAgent has a failure after qemuAgentOpen creates the
> 'agent', but before sets priv->agent = agent.
>
> 5. qemuProcessStop - stops a started agent *and* clears agentError flag
> (so that seems to be happening properly).
>
> w/r/t #4... there are windows in qemuConnectAgent where some error after
> qemuAgentOpen *succeeds* could trigger this function to be called before
> priv->agent is filled in.

 Can not agree.
 Places you mentioned and qemuProcessHandleAgentError are serialized
 thru domain lock. So if the agent start up is ok and error is 
 in io loop we do not break anything - when qemuProcessHandleAgentError
 get change to handle error priv->agent is not NULL. On the other
 hand this patch cures the situation that similar to described in patch 1.

 1. shutdown
 2. we get VSERPORT
 3. we get ERROR from agent and set error flag
 4. again we have problems on next domain start and agent usage, just
 as described in patch 1.

 Actually this is more like a protection. As on step 2 we close agent
 and eventually remove handle from io loop and this operation is
 synchronous (no more events will be delivered). 
Ohh I failed again with reasoning... Lock is dropped during qemuAgentOpen
in qemuConnectAgent.

So let's consider it again.

Yes. If we clear the error in qemuProcessHandleAgentError conditionally
we can miss agent error upon startup etc. But hopefully next time
we try to use agent we get timeout/new error.

On the other hand error can be delivered after priv->agent is closed
and set to NULL on shutdown. In this case if we set the error flag
unconditionally we get problems on next domain start and agent use.
Just as in case of patch 1.

Eventually patch 3 makes patch 1 and 2 not necessary. I just wanted
to keep correct value in the error flag just for the case. As to
what is the meaning of this flag and it's correct value - patch
5 explains it.

...

>
> Thus by only clearing it when priv->agent is set, we could miss some
> error that occurred.  While the "next" call could also fail and
> eventually set the error, I'm not convinced that only clearing when
> priv->agent is set is right.
>
> Long way of saying NAK on this one.
>
> John
>>  
>>      virObjectUnlock(vm);
>>  }
>>

 --
 libvir-list mailing list
 libvir-list(a)redhat.com
 https://www.redhat.com/mailman/listinfo/libvir-list

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [libvirt] [PATCH 02/10] qemu: agent: dont set error after agent cleanup