[libvirt] [PATCH 0/5] enforce and simplify qga interactions

First 2 patches fix bugs of libvirt - qga communication. They deal with the channel pecularities described in http://wiki.qemu.org/Features/QAPI/GuestAgent. First patch address https://bugzilla.redhat.com/show_bug.cgi?id=1090551 bug. Next 3 patches hide details of channel pecularities from client, so there is no more need to reissue commands for the reasons other then simple timeout. Nikolay Shirokovskiy (5): qemu: agent: ignore delayed reply when unsynced qemu: agent: ignore garbage while waiting for sync reply qemu: agent: skip outdated sync replies qemu: agent: reissue sync on garbage sync reply qemu: agent: give better error messages whe agent monitor is down src/qemu/qemu_agent.c | 91 ++++++++++++++++++++++++++++++--------------------- 1 file changed, 54 insertions(+), 37 deletions(-) -- 1.8.3.1

Errors in qemuAgentIOProcessLine stop agent IO processing just like any regular IO error, however some of current errors that this functions spawns are false positives. Consider next case for example: 1. send sync (unsynced state) 2. receive sync reply (sync established) 3. command send, but timeout occured (unsynced state) 4. receive command reply Last IO triggers error because current code ignores only delayed syncs when unsynced We should not treat any delayed reply as error in unsynced state. Until client and qga are not in sync delayed reply to any command is possible. msg == NULL is the exact criterion that we are not in sync. --- src/qemu/qemu_agent.c | 21 +++------------------ 1 file changed, 3 insertions(+), 18 deletions(-) diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c index fdc4fed..88bcb4c 100644 --- a/src/qemu/qemu_agent.c +++ b/src/qemu/qemu_agent.c @@ -308,7 +308,6 @@ qemuAgentIOProcessLine(qemuAgentPtr mon, { virJSONValuePtr obj = NULL; int ret = -1; - unsigned long long id; VIR_DEBUG("Line [%s]", line); @@ -331,25 +330,11 @@ qemuAgentIOProcessLine(qemuAgentPtr mon, msg->rxObject = obj; msg->finished = 1; obj = NULL; - ret = 0; } else { - /* If we've received something like: - * {"return": 1234} - * it is likely that somebody started GA - * which is now processing our previous - * guest-sync commands. Check if this is - * the case and don't report an error but - * return silently. - */ - if (virJSONValueObjectGetNumberUlong(obj, "return", &id) == 0) { - VIR_DEBUG("Ignoring delayed reply to guest-sync: %llu", id); - ret = 0; - goto cleanup; - } - - virReportError(VIR_ERR_INTERNAL_ERROR, - _("Unexpected JSON reply '%s'"), line); + /* we are out of sync */ + VIR_DEBUG("Ignoring delayed reply"); } + ret = 0; } else { virReportError(VIR_ERR_INTERNAL_ERROR, _("Unknown JSON reply '%s'"), line); -- 1.8.3.1

After sync is sent we can receive garbare and this is not error. Consider next regular case: 1. libvirtd sends sync 2. qga sends partial sync reply and die 3. libvirtd sends sync 4. qga sends sync reply 5. libvirtd receives garbage (half of first reply and second reply together) We should handle this situation as it is recoverable. Next sync can succeed. Let's report reply is NULL, it will be converted to the VIR_ERR_AGENT_UNSYNCED which signals client to retry. --- src/qemu/qemu_agent.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c index 88bcb4c..8e447d6 100644 --- a/src/qemu/qemu_agent.c +++ b/src/qemu/qemu_agent.c @@ -81,6 +81,8 @@ struct _qemuAgentMessage { * fatal error occurred on the monitor channel */ bool finished; + /* true for sync command */ + bool sync; }; @@ -311,8 +313,16 @@ qemuAgentIOProcessLine(qemuAgentPtr mon, VIR_DEBUG("Line [%s]", line); - if (!(obj = virJSONValueFromString(line))) + if (!(obj = virJSONValueFromString(line))) { + /* receiving garbage on sync is regular situation */ + if (msg && msg->sync) { + VIR_DEBUG("Received garbage on sync"); + msg->finished = 1; + return 0; + } + goto cleanup; + } if (obj->type != VIR_JSON_TYPE_OBJECT) { virReportError(VIR_ERR_INTERNAL_ERROR, @@ -939,6 +949,7 @@ qemuAgentGuestSync(qemuAgentPtr mon) return -1; sync_msg.txLength = strlen(sync_msg.txBuffer); + sync_msg.sync = true; VIR_DEBUG("Sending guest-sync command with ID: %llu", id); -- 1.8.3.1

When we wait for sync reply we can receive delayed reply to syncs or commands that were sent erlier. We can safely skip them until we receive sync reply with correct id. There is no much sense report this situation to client. Actually with a bit of "luck" if we involve client into this the play can go on forever: send sync 0, receive sync reply -1, send sync 1, receive reply 0 ... --- src/qemu/qemu_agent.c | 37 ++++++++++++++++++++++--------------- 1 file changed, 22 insertions(+), 15 deletions(-) diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c index 8e447d6..30a1c77 100644 --- a/src/qemu/qemu_agent.c +++ b/src/qemu/qemu_agent.c @@ -83,6 +83,8 @@ struct _qemuAgentMessage { bool finished; /* true for sync command */ bool sync; + /* id of the issued sync comand */ + unsigned long long id; }; @@ -337,6 +339,24 @@ qemuAgentIOProcessLine(qemuAgentPtr mon, } else if (virJSONValueObjectHasKey(obj, "error") == 1 || virJSONValueObjectHasKey(obj, "return") == 1) { if (msg) { + if (msg->sync) { + unsigned long long id; + + if (virJSONValueObjectGetNumberUlong(obj, "return", &id) < 0) { + VIR_DEBUG("Ignoring delayed reply on sync"); + ret = 0; + goto cleanup; + } + + VIR_DEBUG("Guest returned ID: %llu", id); + + if (msg->id != id) { + VIR_DEBUG("Guest agent returned ID: %llu instead of %llu", + id, msg->id); + ret = 0; + goto cleanup; + } + } msg->rxObject = obj; msg->finished = 1; obj = NULL; @@ -935,7 +955,7 @@ qemuAgentGuestSync(qemuAgentPtr mon) { int ret = -1; int send_ret; - unsigned long long id, id_ret; + unsigned long long id; qemuAgentMessage sync_msg; memset(&sync_msg, 0, sizeof(sync_msg)); @@ -950,6 +970,7 @@ qemuAgentGuestSync(qemuAgentPtr mon) sync_msg.txLength = strlen(sync_msg.txBuffer); sync_msg.sync = true; + sync_msg.id = id; VIR_DEBUG("Sending guest-sync command with ID: %llu", id); @@ -967,20 +988,6 @@ qemuAgentGuestSync(qemuAgentPtr mon) goto cleanup; } - if (virJSONValueObjectGetNumberUlong(sync_msg.rxObject, - "return", &id_ret) < 0) { - virReportError(VIR_ERR_AGENT_UNSYNCED, "%s", - _("Malformed return value")); - goto cleanup; - } - - VIR_DEBUG("Guest returned ID: %llu", id_ret); - if (id_ret != id) { - virReportError(VIR_ERR_AGENT_UNSYNCED, - _("Guest agent returned ID: %llu instead of %llu"), - id_ret, id); - goto cleanup; - } ret = 0; cleanup: -- 1.8.3.1

We can easily handle receiving garbage on sync. We don't have to make client deal with this situation. We just need to resend sync command but this time garbage is not be possible. --- src/qemu/qemu_agent.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c index 30a1c77..8f47618 100644 --- a/src/qemu/qemu_agent.c +++ b/src/qemu/qemu_agent.c @@ -85,6 +85,7 @@ struct _qemuAgentMessage { bool sync; /* id of the issued sync comand */ unsigned long long id; + bool first; }; @@ -316,8 +317,8 @@ qemuAgentIOProcessLine(qemuAgentPtr mon, VIR_DEBUG("Line [%s]", line); if (!(obj = virJSONValueFromString(line))) { - /* receiving garbage on sync is regular situation */ - if (msg && msg->sync) { + /* receiving garbage on first sync is regular situation */ + if (msg && msg->sync && msg->first) { VIR_DEBUG("Received garbage on sync"); msg->finished = 1; return 0; @@ -959,7 +960,10 @@ qemuAgentGuestSync(qemuAgentPtr mon) qemuAgentMessage sync_msg; memset(&sync_msg, 0, sizeof(sync_msg)); + /* set only on first sync */ + sync_msg.first = true; + retry: if (virTimeMillisNow(&id) < 0) return -1; @@ -983,9 +987,15 @@ qemuAgentGuestSync(qemuAgentPtr mon) goto cleanup; if (!sync_msg.rxObject) { - virReportError(VIR_ERR_AGENT_UNSYNCED, "%s", - _("Missing monitor reply object")); - goto cleanup; + if (sync_msg.first) { + VIR_FREE(sync_msg.txBuffer); + memset(&sync_msg, 0, sizeof(sync_msg)); + goto retry; + } else { + virReportError(VIR_ERR_AGENT_UNSYNCED, "%s", + _("Missing monitor reply object")); + goto cleanup; + } } ret = 0; -- 1.8.3.1

We can receive NULL as sync reply in two situations. First is garbage sync reply and this situation is handled by resending sync message. Second is different cases of rebooting guest, destroing domain etc and we can give more meaningful error message. Actually we have this error message in qemuAgentCommand already which checks for the same sitatuion. AFAIK case with mon->running is just to be safe on adding some future(?) cases of returning NULL reply. --- src/qemu/qemu_agent.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c index 8f47618..8aea3c4 100644 --- a/src/qemu/qemu_agent.c +++ b/src/qemu/qemu_agent.c @@ -992,8 +992,12 @@ qemuAgentGuestSync(qemuAgentPtr mon) memset(&sync_msg, 0, sizeof(sync_msg)); goto retry; } else { - virReportError(VIR_ERR_AGENT_UNSYNCED, "%s", - _("Missing monitor reply object")); + if (mon->running) + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Missing monitor reply object")); + else + virReportError(VIR_ERR_AGENT_UNRESPONSIVE, "%s", + _("Guest agent disappeared while executing command")); goto cleanup; } } -- 1.8.3.1

On 16.09.2016 12:35, Nikolay Shirokovskiy wrote:
First 2 patches fix bugs of libvirt - qga communication. They deal with the channel pecularities described in http://wiki.qemu.org/Features/QAPI/GuestAgent. First patch address https://bugzilla.redhat.com/show_bug.cgi?id=1090551 bug.
Next 3 patches hide details of channel pecularities from client, so there is no more need to reissue commands for the reasons other then simple timeout.
Nikolay Shirokovskiy (5): qemu: agent: ignore delayed reply when unsynced qemu: agent: ignore garbage while waiting for sync reply qemu: agent: skip outdated sync replies qemu: agent: reissue sync on garbage sync reply qemu: agent: give better error messages whe agent monitor is down
src/qemu/qemu_agent.c | 91 ++++++++++++++++++++++++++++++--------------------- 1 file changed, 54 insertions(+), 37 deletions(-)
The patches look good to me, but they might be considered controversial, so I'll wait couple of hours before merging them. ACK meanwhile. Michal

On 19.09.2016 12:47, Michal Privoznik wrote:
On 16.09.2016 12:35, Nikolay Shirokovskiy wrote:
First 2 patches fix bugs of libvirt - qga communication. They deal with the channel pecularities described in http://wiki.qemu.org/Features/QAPI/GuestAgent. First patch address https://bugzilla.redhat.com/show_bug.cgi?id=1090551 bug.
Next 3 patches hide details of channel pecularities from client, so there is no more need to reissue commands for the reasons other then simple timeout.
Nikolay Shirokovskiy (5): qemu: agent: ignore delayed reply when unsynced qemu: agent: ignore garbage while waiting for sync reply qemu: agent: skip outdated sync replies qemu: agent: reissue sync on garbage sync reply qemu: agent: give better error messages whe agent monitor is down
src/qemu/qemu_agent.c | 91 ++++++++++++++++++++++++++++++--------------------- 1 file changed, 54 insertions(+), 37 deletions(-)
The patches look good to me, but they might be considered controversial, so I'll wait couple of hours before merging them.
ACK meanwhile.
I've pushed this now. Michal

On 21.09.2016 10:09, Michal Privoznik wrote:
On 19.09.2016 12:47, Michal Privoznik wrote:
On 16.09.2016 12:35, Nikolay Shirokovskiy wrote:
First 2 patches fix bugs of libvirt - qga communication. They deal with the channel pecularities described in http://wiki.qemu.org/Features/QAPI/GuestAgent. First patch address https://bugzilla.redhat.com/show_bug.cgi?id=1090551 bug.
Next 3 patches hide details of channel pecularities from client, so there is no more need to reissue commands for the reasons other then simple timeout.
Nikolay Shirokovskiy (5): qemu: agent: ignore delayed reply when unsynced qemu: agent: ignore garbage while waiting for sync reply qemu: agent: skip outdated sync replies qemu: agent: reissue sync on garbage sync reply qemu: agent: give better error messages whe agent monitor is down
src/qemu/qemu_agent.c | 91 ++++++++++++++++++++++++++++++--------------------- 1 file changed, 54 insertions(+), 37 deletions(-)
The patches look good to me, but they might be considered controversial, so I'll wait couple of hours before merging them.
ACK meanwhile.
I've pushed this now.
Michal
Thanks!
participants (2)
-
Michal Privoznik
-
Nikolay Shirokovskiy