On 27.09.2018 14:37, Wang Yechao wrote:
> qemuAgentClose and qemuAgentIO have race condition,
> as follows:
>
> main thread: second thread:
> virEventPollRunOnce processSerialChangedEvent
> virEventPollDispatchHandles
> virMutexUnlock(&eventLoop.lock)
> qemuAgentClose
> virObjectLock(mon)
> virEventRemoveHandle
> VIR_FORCE_CLOSE(mon->fd)
> virObjectUnlock(mon)
> priv->agentError = false
> qemuAgentIO
> virObjectLock(mon)
> mon->fd != fd --> error = true
> qemuProcessHandleAgentError
> priv->agentError = true
> virObjectUnlock(mon)
> virMutexLock(&eventLoop.lock)
>
> qemuAgentClose set the mon->fd to '-1', and then qemuAgentIO
> check the mon->fd not equals to fd that registered before.
> qemuProcessHandleAgentError will be called to set
> priv->agentError to 'true', then the priv->agentError is
> always 'true' except restart libvirtd or restart
> qemu-guest-agent process in guest. We can't send any
> qemu-agent-command anymore even if qemuConnectAgent return
> success later.
>
> This is accidently occurs when hot-add-vcpu in windows2012.
> virsh setvcpus ...
> virsh qemu-agent-command $vm
'{"execute":"guest-get-vcpus"}'
>
> Reset the priv->agentError to 'false' when qemuConnectAgent sucess
> to fix this problem.
>
> Signed-off-by: Wang Yechao <wang.yechao255(a)zte.com.cn>
> ---
> src/qemu/qemu_process.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 29b0ba1..4fbb955 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -269,6 +269,7 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
> virResetLastError();
> }
>
> + priv->agentError = false;
> return 0;
> }
>
>
There was similar problem with qemu monitor:
commit 89563efc0209b854d2b2e554423423d7602acdbd
Author: Daniel P. Berrange <berrange(a)redhat.com>
Date: Fri Sep 28 15:27:39 2012 +0100
Avoid bogus I/O event errors when closing the QEMU monitor
After calling qemuMonitorClose(), it is still possible for
the QEMU monitor I/O event callback to get invoked. This
will trigger an error message because mon->fd has been set
to -1 at this point. Silently ignore the case where mon->fd
is -1, likewise for mon->watch being zero.
Signed-off-by: Daniel P. Berrange <berrange(a)redhat.com
This approach has advantage it does not set agentError at all. So clients
see agent state as disconnected (which is true) instead of error.
Nikolay
Thanks Nikolay, this solution is better. I'll give another patch.
---
Best wishes
Wang Yechao