On 10/04/2016 02:56 AM, Nikolay Shirokovskiy wrote:
There were a few errors in the code when this flag was not
cleared upon monitor cleanup. All of them could be fixed
just resetting this flag upon agent monitor initialization.
We should fix the places where the flag wasn't cleared properly then.
---
src/qemu/qemu_process.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 3637f4b..42f7f84 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -210,6 +210,8 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
qemuAgentPtr agent = NULL;
virDomainChrDefPtr config = qemuFindAgentConfig(vm->def);
+ priv->agentError = false;
+
This idea actually may have some merit and may actually be the cause of
the bug you saw, but not right here.
if (!config)
return 0;
I think perhaps a better place would be some time after we ascertain
that "priv->agent" is not already set since the only way to set it is as
a result of this function.
My first inclination was in that VSERPORT_CHANGE check and return 0
(e.g. if called from qemuMigrationFinish, qemuProcessReconnect,
qemuProcessLaunch, or qemuProcessAttach).
But then I wasn't sure if there would be a race between setting 'state'
to VIR_DOMAIN_CHR_DEVICE_STATE_CONNECTED in processSerialChangedEvent
and one of the 4 other paths.
So, I think perhaps the better place would be after the virObjectRef(vm)
prior to calling qemuAgentOpen and unlocking the vm. IOW: We have the
lock, alter priv->agentError. We're about to start anyway...
That way, we'd know we're about the unconditionally clear the agentError
just as if we were starting the first time, reconnecting to a running
domain, migrating, or attaching.
Thus ignoring patch 1 and 2 for a moment and considering this patch
alone, we'd know for sure the agentError was cleared before a "real"
startup and after a qemuProcessStop. Thus clearing before we start
seems to be right (although it makes me wonder how it could have been set).
Personally, I'd start with applying patch 4 and then trying to reproduce
the condition... Have some sort of "marker" here that checks if
agentError is true then complain. If we can figure out how we get into
this function with agentError being set, then I think that'll really
help me understand the order of events...
John