On 26.10.2016 22:53, John Ferlan wrote:
On 10/04/2016 02:56 AM, Nikolay Shirokovskiy wrote:
> There were a few errors in the code when this flag was not
> cleared upon monitor cleanup. All of them could be fixed
> just resetting this flag upon agent monitor initialization.
We should fix the places where the flag wasn't cleared properly then.
Well, the first 2 patches are exactly for this purpose.
> ---
> src/qemu/qemu_process.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 3637f4b..42f7f84 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -210,6 +210,8 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
> qemuAgentPtr agent = NULL;
> virDomainChrDefPtr config = qemuFindAgentConfig(vm->def);
>
> + priv->agentError = false;
> +
This idea actually may have some merit and may actually be the cause of
the bug you saw, but not right here.
> if (!config)
> return 0;
>
>
I think perhaps a better place would be some time after we ascertain
that "priv->agent" is not already set since the only way to set it is as
a result of this function.
My first inclination was in that VSERPORT_CHANGE check and return 0
(e.g. if called from qemuMigrationFinish, qemuProcessReconnect,
qemuProcessLaunch, or qemuProcessAttach).
But then I wasn't sure if there would be a race between setting 'state'
to VIR_DOMAIN_CHR_DEVICE_STATE_CONNECTED in processSerialChangedEvent
and one of the 4 other paths.
So, I think perhaps the better place would be after the virObjectRef(vm)
prior to calling qemuAgentOpen and unlocking the vm. IOW: We have the
lock, alter priv->agentError. We're about to start anyway...
If we fail to clean up the error flag correctly (patches 1 and 2 are dropped)
on shutdown then cleaning it up in this place will be too late. For example
in case we have QEMU_CAPS_VSERPORT_CHANGE then until serial port event
we will be in error state while it is unconnected really.
Resetting the error flag in this function is really a hack, a protection
from errors like in patch 1 or 2. After thinking a while I would
event put resetting before
if (!config)
return 0;
check. Consider next situation:
1. shutown and the error flag is set after shutdown as described earlier
2. config is changed so that agent channel is removed
3. domain start
4. now as the flag is set, on attempt to use agent we get the same
"QEMU guest agent is not available due to an error" message while
agent is just not configured.
That way, we'd know we're about the unconditionally clear the agentError
just as if we were starting the first time, reconnecting to a running
domain, migrating, or attaching.
Thus ignoring patch 1 and 2 for a moment and considering this patch
alone, we'd know for sure the agentError was cleared before a "real"
startup and after a qemuProcessStop. Thus clearing before we start
seems to be right (although it makes me wonder how it could have been set).
Personally, I'd start with applying patch 4 and then trying to reproduce
the condition... Have some sort of "marker" here that checks if
agentError is true then complain. If we can figure out how we get into
this function with agentError being set, then I think that'll really
help me understand the order of events...
John