
On 11.03.2020 12:38, Michal Privoznik wrote:
On 3/5/20 3:47 PM, Nikolay Shirokovskiy wrote:
needReply added in [1] looks redundant. Indeed it is set to false only when mon->await_event is set too (the only exception qemuAgentFSTrim which is mistaken).
However it fixes the issue when qemuAgentCommand exits on error path and mon->await_event is not reset. Let's instead reset mon->await_event properly.
Also remove "Woken up by event" debug message as it can be misleading. We can get it also if monitor is closed due to serial changed event currently. Anyway both qemuAgentClose and qemuAgentNotifyEvent log itself.
[1] qemu: make sure agent returns error when required data are missing
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_agent.c | 47 ++++++++++++++++++++----------------------- 1 file changed, 22 insertions(+), 25 deletions(-)
diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c index 2de53efb7a..605c9f563e 100644 --- a/src/qemu/qemu_agent.c +++ b/src/qemu/qemu_agent.c @@ -1112,7 +1112,6 @@ static int qemuAgentCommand(qemuAgentPtr mon, virJSONValuePtr cmd, virJSONValuePtr *reply, - bool needReply, int seconds) { int ret = -1; @@ -1121,17 +1120,16 @@ qemuAgentCommand(qemuAgentPtr mon, int await_event = mon->await_event; *reply = NULL; + memset(&msg, 0, sizeof(msg)); if (!mon->running) { virReportError(VIR_ERR_AGENT_UNRESPONSIVE, "%s", _("Guest agent disappeared while executing command")); - return -1; + goto cleanup; } if (qemuAgentGuestSync(mon) < 0) - return -1; - - memset(&msg, 0, sizeof(msg)); + goto cleanup; if (!(cmdstr = virJSONValueToString(cmd, false))) goto cleanup; @@ -1149,9 +1147,7 @@ qemuAgentCommand(qemuAgentPtr mon, /* If we haven't obtained any reply but we wait for an * event, then don't report this as error */ if (!msg.rxObject) { - if (await_event && !needReply) { - VIR_DEBUG("Woken up by event %d", await_event); - } else {
Please keep this debug printing. It doesn't hurt anything and has potential of helping us to debug.
This debug line is not correct. I reflected it in the commit message. I actually get this line during debug session when agent monitor is closed during VM shutdown on serial event and not shutdown event. Serial event just comes first.
+ if (!await_event) { if (mon->running) virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Missing monitor reply object")); @@ -1169,6 +1165,7 @@ qemuAgentCommand(qemuAgentPtr mon, cleanup: VIR_FREE(cmdstr); VIR_FREE(msg.txBuffer); + mon->await_event = QEMU_AGENT_EVENT_NONE;
This is the part I don't like about this patch. The rest looks fine. Why is this needed? Either the mon->await_event is not set by the caller, or if set the the event handler will clear it out by calling qemuAgentNotifyEvent(). Or am I missing something?
The whole point of the patch is in this line) Yeah, you are talking of success paths. But on error paths mon->await_event is not cleared currently so next command which need reply can be finish on await event without needReply which lead to SIGSGEV. This is what patch [1] fixes by introducing needReply which don't need to be cleared as it is on stack. I guess scenario is next given the bug [2] mentioned in [1]: - vm shutdown command is sent to GA - it is finished with timeout (60 s), mon->await_event is not cleared - guest ping is sent - guest shutdowned and ping command get awakend by this event as mon->await_event is set (bug) So needReply argument is helpful but it is cleaner just to reset mon->await_event properly. [1] qemu: make sure agent returns error when required data are missing [2] https://bugzilla.redhat.com/show_bug.cgi?id=1058149 Nikolay
return ret; } @@ -1269,7 +1266,7 @@ int qemuAgentShutdown(qemuAgentPtr mon, mon->await_event = QEMU_AGENT_EVENT_RESET; else mon->await_event = QEMU_AGENT_EVENT_SHUTDOWN; - ret = qemuAgentCommand(mon, cmd, &reply, false, + ret = qemuAgentCommand(mon, cmd, &reply, VIR_DOMAIN_QEMU_AGENT_COMMAND_SHUTDOWN); virJSONValueFree(cmd); @@ -1312,7 +1309,7 @@ int qemuAgentFSFreeze(qemuAgentPtr mon, const char **mountpoints, if (!cmd) goto cleanup; - if (qemuAgentCommand(mon, cmd, &reply, true, mon->timeout) < 0) + if (qemuAgentCommand(mon, cmd, &reply, mon->timeout) < 0) goto cleanup; if (virJSONValueObjectGetNumberInt(reply, "return", &ret) < 0) { @@ -1349,7 +1346,7 @@ int qemuAgentFSThaw(qemuAgentPtr mon) if (!cmd) return -1; - if (qemuAgentCommand(mon, cmd, &reply, true, mon->timeout) < 0) + if (qemuAgentCommand(mon, cmd, &reply, mon->timeout) < 0) goto cleanup; if (virJSONValueObjectGetNumberInt(reply, "return", &ret) < 0) { @@ -1386,7 +1383,7 @@ qemuAgentSuspend(qemuAgentPtr mon, return -1; mon->await_event = QEMU_AGENT_EVENT_SUSPEND; - ret = qemuAgentCommand(mon, cmd, &reply, false, mon->timeout); + ret = qemuAgentCommand(mon, cmd, &reply, mon->timeout); virJSONValueFree(cmd); virJSONValueFree(reply); @@ -1415,7 +1412,7 @@ qemuAgentArbitraryCommand(qemuAgentPtr mon, if (!(cmd = virJSONValueFromString(cmd_str))) goto cleanup; - if ((ret = qemuAgentCommand(mon, cmd, &reply, true, timeout)) < 0) + if ((ret = qemuAgentCommand(mon, cmd, &reply, timeout)) < 0) goto cleanup; if (!(*result = virJSONValueToString(reply, false))) @@ -1442,7 +1439,7 @@ qemuAgentFSTrim(qemuAgentPtr mon, if (!cmd) return ret; - ret = qemuAgentCommand(mon, cmd, &reply, false, mon->timeout); + ret = qemuAgentCommand(mon, cmd, &reply, mon->timeout);
Oh this is fun. This 'false' you are removing should have been true as the command does have a reply that we need to wait for.
virJSONValueFree(cmd); virJSONValueFree(reply);
Michal