On 07/07/2016 05:53 AM, Xiubo Li wrote:
These days, we experienced one qemu guest agent is not available
error. Then the guest agent couldn't be used forever before rebooting
the libvirtd daemon or reboot the qemu-guest-agent in the VM(you can
also reboot the VM) to clear the priv->agentError.
This is one bug for a long time in many verisons of libvirtd:
https://bugzilla.redhat.com/show_bug.cgi?id=1090551
And there is one python script to reproduce this bug from the bugzilla:
https://github.com/aspirer/study/blob/master/qemu-guest-agent/test2.py
Just set the timeout to 0 at Line26, you can reproduce it very quickly:
rsp = libvirt_qemu.qemuAgentCommand(dom, cmd, 1, 0) // 1 -> 0
The reason why this could happen is that:
In case we received something like {"return": {}} for example, it is
likely that somebody:
1) Started GA thread which sent one "guest-ping" command with seconds
equals to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0) makes this function
return immediately without waiting, but after updating the events, it
will sleep in virCondWait() or virCondWaitUntil().
2) While in AgentIO thread the "guest-ping" command is sent successfully,
with updating the events it will wait for reply.
3) Then the GA thread is woken up and the mon->msg is set to NULL and exit.
4) At last the AgentIO thread receives the reply of "guest-ping" command
with the mon->msg == NULL.
This case could be adapt to all the GA commands, including guest-sync, etc.
This patch will check if this is the case and don't report an error but
return silently.
Signed-off-by: Xiubo Li <lixiubo(a)cmss.chinamobile.com>
Signed-off-by: Zhuoyu Zhang <zhangzhuoyu(a)cmss.chinamobile.com>
Signed-off-by: Wei Tang <tangwei(a)cmss.chinamobile.com>
Signed-off-by: Yaowei Bai <baiyaowei(a)cmss.chinamobile.com>
Signed-off-by: Qide Chen <chenqide(a)cmss.chinamobile.com>
---
Changes in V2:
- check value type instead of object type.
src/qemu/qemu_agent.c | 37 ++++++++++++++++++++++++++++---------
src/util/virjson.h | 7 +++++++
2 files changed, 35 insertions(+), 9 deletions(-)
Since this has been sitting around for a long time with a response -
figured I'd point out something that was pushed today:
http://www.redhat.com/archives/libvir-list/2016-September/msg00570.html
It addresses/resolves the issue in essentially the same manner, although
it doesn't check the empty {} reply, rather it just makes the else
condition be a ignored delay reply.... There's also a few more patches
to the series.
John
diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
index eeede6b..f5408cc 100644
--- a/src/qemu/qemu_agent.c
+++ b/src/qemu/qemu_agent.c
@@ -308,7 +308,6 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
{
virJSONValuePtr obj = NULL;
int ret = -1;
- unsigned long long id;
VIR_DEBUG("Line [%s]", line);
@@ -333,16 +332,36 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
obj = NULL;
ret = 0;
} else {
- /* If we've received something like:
- * {"return": 1234}
- * it is likely that somebody started GA
- * which is now processing our previous
- * guest-sync commands. Check if this is
- * the case and don't report an error but
+ virJSONValuePtr val;
+
+ /* In case we received something like:
+ * {"return": {}} for example,
+ * it is likely that somebody:
+ *
+ * 1) Started GA thread which sent one "guest-ping" command
+ * with seconds equals to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0)
+ * makes this function return immediately without waiting, but
+ * after updating the events, it will sleep in virCondWait() or
+ * virCondWaitUntil().
+ *
+ * 2) While in AgentIO thread the "guest-ping" command is sent
+ * successfully, with updating the events it will wait for reply.
+ *
+ * 3) Then the GA thread is woken up and the mon->msg is set
+ * to NULL and exit.
+ *
+ * 4) At last the AgentIO thread receives the reply of
"guest-ping"
+ * command with the mon->msg == NULL.
+ *
+ * This case could be adapt to all the GA commands, including
+ * guest-sync, etc.
+ *
+ * Check if this is the case and don't report an error but
* return silently.
*/
- if (virJSONValueObjectGetNumberUlong(obj, "return", &id) == 0)
{
- VIR_DEBUG("Ignoring delayed reply to guest-sync: %llu", id);
+ val = virJSONValueObjectGet(obj, "return");
+ if (val && virJSONValueTypeIsValid(val)) {
+ VIR_DEBUG("Ignoring delayed command reply");
ret = 0;
goto cleanup;
}
diff --git a/src/util/virjson.h b/src/util/virjson.h
index 66ed48a..cb1d973 100644
--- a/src/util/virjson.h
+++ b/src/util/virjson.h
@@ -80,6 +80,13 @@ struct _virJSONValue {
} data;
};
+static inline bool
+virJSONValueTypeIsValid(virJSONValuePtr value)
+{
+ return ((value->type >= VIR_JSON_TYPE_OBJECT) &&
+ (value->type <= VIR_JSON_TYPE_BOOLEAN));
+}
+
void virJSONValueFree(virJSONValuePtr value);
int virJSONValueObjectCreate(virJSONValuePtr *obj, ...)