These days, we experienced one qemu guest agent is not available
error. Then the guest agent couldn't be used forever before rebooting
the libvirtd daemon or reboot the qemu-guest-agent in the VM(you can
also reboot the VM) to clear the priv->agentError.
This is one bug for a long time in many verisons of libvirtd:
https://bugzilla.redhat.com/show_bug.cgi?id=1090551
And there is one python script to reproduce this bug from the bugzilla:
https://github.com/aspirer/study/blob/master/qemu-guest-agent/test2.py
Just set the timeout to 0 at Line26, you can reproduce it very quickly:
rsp = libvirt_qemu.qemuAgentCommand(dom, cmd, 1, 0) // 1 -> 0
The reason why this could happen is that:
In case we received something like: {"return": {}} for example, it is
likely that somebody:
1) Started GA which sent one "guest-ping" command with seconds equals
to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0), which makes this function
return "immediately" without waiting, and sleeps in virCondWait().
2) While in AgentIO thread the "guest-ping" command is sent successfully,
and then waiting for reply.
3) Then the GA thread is woken up and the mon->msg is set to NULL and
exit.
4) Now the AgentIO has received the reply for "guest-ping" command with
the mon->msg == NULL.
Before we just check the guest-sync case, and should also check for all
the other GA commands.
I have test many of the GA commands, which all can reporduce this bug.
This patch will check if this is the case and don't report an error but
return silently.
Signed-off-by: Xiubo Li <lixiubo(a)cmss.chinamobile.com>
Signed-off-by: Zhuoyu Zhang <zhangzhuoyu(a)cmss.chinamobile.com>
Signed-off-by: Wei Tang <tangwei(a)cmss.chinamobile.com>
Signed-off-by: Yaowei Bai <baiyaowei(a)cmss.chinamobile.com>
Signed-off-by: Qide Chen <chenqide(a)cmss.chinamobile.com>
---
src/qemu/qemu_agent.c | 30 +++++++++++++++++++++---------
src/util/virjson.h | 7 +++++++
2 files changed, 28 insertions(+), 9 deletions(-)
diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
index eeede6b..5f08ba0 100644
--- a/src/qemu/qemu_agent.c
+++ b/src/qemu/qemu_agent.c
@@ -308,7 +308,6 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
{
virJSONValuePtr obj = NULL;
int ret = -1;
- unsigned long long id;
VIR_DEBUG("Line [%s]", line);
@@ -333,16 +332,29 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
obj = NULL;
ret = 0;
} else {
- /* If we've received something like:
- * {"return": 1234}
- * it is likely that somebody started GA
- * which is now processing our previous
- * guest-sync commands. Check if this is
- * the case and don't report an error but
+ /* In case we received something like:
+ * {"return": {}} for example,
+ * it is likely that somebody:
+ *
+ * 1) Started GA which sent one "guest-ping" command with
+ * seconds equals to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0)
+ * makes this function return immediately without waiting,
+ * and sleeps in virCondWait().
+ *
+ * 2) While in AgentIO thread the "guest-ping" command is sent
+ * successfully, and the waiting for reply.
+ *
+ * 3) Then the GA thread is woken up and the mon->msg is set
+ * to NULL and exit.
+ *
+ * 4) Now the AgentIO has received the reply for "guest-ping"
+ * command with the mon->msg == NULL.
+ *
+ * Check if this is the case and don't report an error but
* return silently.
*/
- if (virJSONValueObjectGetNumberUlong(obj, "return", &id) == 0)
{
- VIR_DEBUG("Ignoring delayed reply to guest-sync: %llu", id);
+ if (virIsValidJSONType(obj)) {
+ VIR_DEBUG("Ignoring delayed command reply");
ret = 0;
goto cleanup;
}
diff --git a/src/util/virjson.h b/src/util/virjson.h
index 66ed48a..d36ce8e 100644
--- a/src/util/virjson.h
+++ b/src/util/virjson.h
@@ -80,6 +80,13 @@ struct _virJSONValue {
} data;
};
+static inline bool
+virIsValidJSONType(virJSONValuePtr object)
+{
+ return ((object->type >= VIR_JSON_TYPE_OBJECT)
+ && (object->type <= VIR_JSON_TYPE_BOOLEAN));
+}
+
void virJSONValueFree(virJSONValuePtr value);
int virJSONValueObjectCreate(virJSONValuePtr *obj, ...)
--
1.8.3.1