[PATCH 0/8] qemu: Improve guest agent corner case errors

This series introduces two new error codes to help management
applications make better decisions when they hit corner cases of guest
agent interaction.

Peter Krempa (8):
  lib: error: Introduce 'VIR_ERR_AGENT_COMMAND_TIMEOUT'
  qemu: agent: Differentiate timeouts when syncing from command timeout
  qemuAgentCommandFull: Use VIR_ERR_AGENT_COMMAND_TIMEOUT when agent disappears
  docs: Point to VIR_ERR_AGENT_COMMAND_TIMEOUT when setting timeout
  lib: error: Introduce 'VIR_ERR_AGENT_COMMAND_FAILED'
  qemuAgentCheckError: Use 'VIR_ERR_AGENT_COMMAND_FAILED'
  qemuAgentCheckError: Reword error if neither return nor error is found
  NEWS: Mention guest agent error code improvements

 NEWS.rst                    | 10 ++++++++++
 docs/manpages/virsh.rst     |  3 +++
 include/libvirt/virterror.h |  4 ++++
 src/libvirt-domain.c        |  4 ++++
 src/qemu/qemu_agent.c       | 32 +++++++++++++++++++++-----------
 src/util/virerror.c         |  6 ++++++
 6 files changed, 48 insertions(+), 11 deletions(-)

-- 
2.48.1

[PATCH 1/8] lib: error: Introduce 'VIR_ERR_AGENT_COMMAND_TIMEOUT'

From: Peter Krempa <pkrempa@redhat.com>

Introduce a new special error code for guest agent commands. The error
code is reported only when an actual command (not a sync) was issued to
the guest agent and the timeout was reached.

This allows users and management applications to differentiate between
the case when the sync timed out, so there is no risk of the agent
having executed the command, and the case when the actual command was
sent.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 include/libvirt/virterror.h | 2 ++
 src/util/virerror.c         | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h
index 224eddc9e4..7a2cc2b4bd 100644
--- a/include/libvirt/virterror.h
+++ b/include/libvirt/virterror.h
@@ -349,6 +349,8 @@ typedef enum {
     VIR_ERR_CHECKPOINT_INCONSISTENT = 109, /* checkpoint can't be used (Since: 6.10.0) */
     VIR_ERR_MULTIPLE_DOMAINS = 110,     /* more than one matching domain found (Since: 7.1.0) */
     VIR_ERR_NO_NETWORK_METADATA = 111,  /* Network metadata is not present (Since: 9.7.0) */
+    VIR_ERR_AGENT_COMMAND_TIMEOUT = 112,/* guest agent didn't respond to a non-sync
+                                           command within timeout (Since: 11.2.0) */
 
 # ifdef VIR_ENUM_SENTINELS
     VIR_ERR_NUMBER_LAST                 /* (Since: 5.0.0) */
diff --git a/src/util/virerror.c b/src/util/virerror.c
index 227a182417..f89bfbc530 100644
--- a/src/util/virerror.c
+++ b/src/util/virerror.c
@@ -1290,6 +1290,9 @@ static const virErrorMsgTuple virErrorMsgStrings[] = {
     [VIR_ERR_NO_NETWORK_METADATA] = {
         N_("metadata not found"),
         N_("metadata not found: %1$s") },
+    [VIR_ERR_AGENT_COMMAND_TIMEOUT] = {
+        N_("guest agent command timed out"),
+        N_("guest agent command timed out: %1$s") },
 };
 
 G_STATIC_ASSERT(G_N_ELEMENTS(virErrorMsgStrings) == VIR_ERR_NUMBER_LAST);
-- 
2.48.1
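A management application could use the new code to decide whether the
guest state is still known after a failed call. The following is only a
minimal sketch of that decision and is not libvirt code: the enum and
ex_guest_state_uncertain() are hypothetical stand-ins, the value 112 is
taken from the patch above, and the value used for the pre-existing
"agent unresponsive" code is a placeholder. A real client would read the
code from libvirt's error reporting instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for libvirt error codes. Only 112 is taken
 * from the patch; EX_AGENT_UNRESPONSIVE uses a placeholder value. */
enum ex_error {
    EX_AGENT_UNRESPONSIVE = 1,      /* placeholder value */
    EX_AGENT_COMMAND_TIMEOUT = 112, /* value from the patch */
};

/* Returns true when the guest state may have changed even though the
 * API call failed, i.e. a command was sent before the timeout hit. */
static bool
ex_guest_state_uncertain(int code)
{
    switch (code) {
    case EX_AGENT_COMMAND_TIMEOUT:
        return true;   /* command was sent; outcome unknown */
    case EX_AGENT_UNRESPONSIVE:
        return false;  /* e.g. timeout while syncing; no command sent */
    default:
        return false;  /* other errors are outside this sketch */
    }
}
```

The point of the split is that only the first case calls for a follow-up
check of the guest; the sync timeout can be treated as a plain failure.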

[PATCH 2/8] qemu: agent: Differentiate timeouts when syncing from command timeout

From: Peter Krempa <pkrempa@redhat.com>

As the guest agent code uses timeouts, it is possible that we stop
waiting before the guest agent replies. If this happens while syncing,
everything is okay because we didn't send any state-changing command.
When the timeout happens after a real command was transmitted, it's
unknown whether the guest agent processed it or not.

Use the new special error code VIR_ERR_AGENT_COMMAND_TIMEOUT for cases
when we sent non-sync commands, so that management applications or
users have the possibility to react to this situation.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 src/qemu/qemu_agent.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
index 6f5aab5bf2..879c3a8f41 100644
--- a/src/qemu/qemu_agent.c
+++ b/src/qemu/qemu_agent.c
@@ -707,6 +707,7 @@ void qemuAgentClose(qemuAgent *agent)
  * @msg: Message
  * @seconds: number of seconds to wait for the result, it can be either
  *           -2, -1, 0 or positive.
+ * @report_sync: On timeout; report synchronization error instead of the normal error
  *
  * Send @msg to agent @agent. If @seconds is equal to
  * VIR_DOMAIN_QEMU_AGENT_COMMAND_BLOCK(-2), this function will block forever
@@ -720,9 +721,11 @@ void qemuAgentClose(qemuAgent *agent)
  *         -2 on timeout,
  *         -1 otherwise
  */
-static int qemuAgentSend(qemuAgent *agent,
-                         qemuAgentMessage *msg,
-                         int seconds)
+static int
+qemuAgentSend(qemuAgent *agent,
+              qemuAgentMessage *msg,
+              int seconds,
+              bool report_sync)
 {
     int ret = -1;
     unsigned long long then = 0;
@@ -751,8 +754,15 @@ static int qemuAgentSend(qemuAgent *agent,
         if ((then && virCondWaitUntil(&agent->notify, &agent->parent.lock, then) < 0) ||
             (!then && virCondWait(&agent->notify, &agent->parent.lock) < 0)) {
             if (errno == ETIMEDOUT) {
-                virReportError(VIR_ERR_AGENT_UNRESPONSIVE, "%s",
-                               _("Guest agent not available for now"));
+                if (report_sync) {
+                    virReportError(VIR_ERR_AGENT_UNRESPONSIVE,
+                                   _("guest agent didn't respond to synchronization within '%1$d' seconds"),
+                                   seconds);
+                } else {
+                    virReportError(VIR_ERR_AGENT_COMMAND_TIMEOUT,
+                                   _("guest agent didn't respond to command within '%1$d' seconds"),
+                                   seconds);
+                }
                 ret = -2;
             } else {
                 virReportSystemError(errno, "%s",
@@ -817,7 +827,7 @@ qemuAgentGuestSyncSend(qemuAgent *agent,
 
     VIR_DEBUG("Sending guest-sync command with ID: %llu", id);
 
-    rc = qemuAgentSend(agent, &sync_msg, timeout);
+    rc = qemuAgentSend(agent, &sync_msg, timeout, true);
     rxObj = g_steal_pointer(&sync_msg.rxObject);
 
     VIR_DEBUG("qemuAgentSend returned: %d", rc);
@@ -1040,7 +1050,7 @@ qemuAgentCommandFull(qemuAgent *agent,
 
     VIR_DEBUG("Send command '%s' for write, seconds = %d", cmdstr, seconds);
 
-    ret = qemuAgentSend(agent, &msg, seconds);
+    ret = qemuAgentSend(agent, &msg, seconds, false);
 
     VIR_DEBUG("Receive command reply ret=%d rxObject=%p",
               ret, msg.rxObject);
-- 
2.48.1
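The effect of the new report_sync argument can be shown in isolation.
This is a standalone sketch rather than the libvirt implementation:
struct ex_report and ex_timeout_report() are hypothetical, the value for
the "unresponsive" code is a placeholder, and plain snprintf() stands in
for virReportError() with its translated format strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; only 112 is taken from the series. */
enum ex_error {
    EX_AGENT_UNRESPONSIVE = 1,      /* placeholder value */
    EX_AGENT_COMMAND_TIMEOUT = 112,
};

struct ex_report {
    int code;
    char msg[128];
};

/* Build the error that would be reported once the wait timed out.
 * report_sync selects the "sync" flavour, mirroring the patch. */
static struct ex_report
ex_timeout_report(bool report_sync, int seconds)
{
    struct ex_report r;

    if (report_sync) {
        r.code = EX_AGENT_UNRESPONSIVE;
        snprintf(r.msg, sizeof(r.msg),
                 "guest agent didn't respond to synchronization within '%d' seconds",
                 seconds);
    } else {
        r.code = EX_AGENT_COMMAND_TIMEOUT;
        snprintf(r.msg, sizeof(r.msg),
                 "guest agent didn't respond to command within '%d' seconds",
                 seconds);
    }
    return r;
}
```

In the patch the sync path passes true and the command path passes
false, so the caller of the public API sees a different code for each.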

[PATCH 3/8] qemuAgentCommandFull: Use VIR_ERR_AGENT_COMMAND_TIMEOUT when agent disappears

From: Peter Krempa <pkrempa@redhat.com>

When the agent disappears after getting a proper command, we ought to
report the same error code as if we timed out, as it's uncertain whether
the guest agent did anything.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 src/qemu/qemu_agent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
index 879c3a8f41..b22c9d7e85 100644
--- a/src/qemu/qemu_agent.c
+++ b/src/qemu/qemu_agent.c
@@ -1066,7 +1066,7 @@ qemuAgentCommandFull(qemuAgent *agent,
                 virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                                _("Missing agent reply object"));
             } else {
-                virReportError(VIR_ERR_AGENT_UNRESPONSIVE, "%s",
+                virReportError(VIR_ERR_AGENT_COMMAND_TIMEOUT, "%s",
                                _("Guest agent disappeared while executing command"));
             }
             ret = -1;
-- 
2.48.1

[PATCH 4/8] docs: Point to VIR_ERR_AGENT_COMMAND_TIMEOUT when setting timeout

From: Peter Krempa <pkrempa@redhat.com>

In addition to the error constant itself, add docs hinting that this
new error code can be produced on timeouts. The most relevant place to
do so is the documentation for setting the timeout.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 docs/manpages/virsh.rst | 3 +++
 src/libvirt-domain.c    | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst
index baced15dec..6f31bd9ca3 100644
--- a/docs/manpages/virsh.rst
+++ b/docs/manpages/virsh.rst
@@ -2909,6 +2909,9 @@ values:
   libvirt daemon),
 * 0 - do not wait at all,
 
+In all guest-agent based APIs when a timeout happens if an actual command was
+send to the guest agent the returned error code will be
+VIR_ERR_AGENT_COMMAND_TIMEOUT.
 
 guestinfo
 ---------
diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index 4e78c687d5..09c29df462 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -13519,6 +13519,10 @@ int virDomainSetLaunchSecurityState(virDomainPtr domain,
  * VIR_DOMAIN_AGENT_RESPONSE_TIMEOUT_NOWAIT(0): does not wait.
  * positive value: wait for @timeout seconds
  *
+ * In all guest-agent based APIs when a timeout happens if an actual command
+ * was send to the guest agent the returned error code will be
+ * VIR_ERR_AGENT_COMMAND_TIMEOUT.
+ *
  * Returns 0 on success, -1 on failure
  *
  * Since: 5.10.0
-- 
2.48.1

[PATCH 5/8] lib: error: Introduce 'VIR_ERR_AGENT_COMMAND_FAILED'

From: Peter Krempa <pkrempa@redhat.com>

Add a special error code for when the guest agent returned a failure
message. This allows management applications to deterministically
detect failure of the guest agent command.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 include/libvirt/virterror.h | 2 ++
 src/util/virerror.c         | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h
index 7a2cc2b4bd..f02da046a3 100644
--- a/include/libvirt/virterror.h
+++ b/include/libvirt/virterror.h
@@ -351,6 +351,8 @@ typedef enum {
     VIR_ERR_NO_NETWORK_METADATA = 111,  /* Network metadata is not present (Since: 9.7.0) */
     VIR_ERR_AGENT_COMMAND_TIMEOUT = 112,/* guest agent didn't respond to a non-sync
                                            command within timeout (Since: 11.2.0) */
+    VIR_ERR_AGENT_COMMAND_FAILED = 113, /* guest agent responded with failure
+                                           to a command (Since: 11.2.0) */
 
 # ifdef VIR_ENUM_SENTINELS
     VIR_ERR_NUMBER_LAST                 /* (Since: 5.0.0) */
diff --git a/src/util/virerror.c b/src/util/virerror.c
index f89bfbc530..abb014b522 100644
--- a/src/util/virerror.c
+++ b/src/util/virerror.c
@@ -1293,6 +1293,9 @@ static const virErrorMsgTuple virErrorMsgStrings[] = {
     [VIR_ERR_AGENT_COMMAND_TIMEOUT] = {
         N_("guest agent command timed out"),
         N_("guest agent command timed out: %1$s") },
+    [VIR_ERR_AGENT_COMMAND_FAILED] = {
+        N_("guest agent command failed"),
+        N_("guest agent command failed: %1$s") },
 };
 
 G_STATIC_ASSERT(G_N_ELEMENTS(virErrorMsgStrings) == VIR_ERR_NUMBER_LAST);
-- 
2.48.1

[PATCH 6/8] qemuAgentCheckError: Use 'VIR_ERR_AGENT_COMMAND_FAILED'

From: Peter Krempa <pkrempa@redhat.com>

In the two cases when we know that the command returned failure, switch
to the new error code so that management applications can
programmatically detect failure of the guest agent command.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 src/qemu/qemu_agent.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
index b22c9d7e85..d4eb4897a4 100644
--- a/src/qemu/qemu_agent.c
+++ b/src/qemu/qemu_agent.c
@@ -985,7 +985,7 @@ qemuAgentCheckError(virJSONValue *cmd,
 
         /* Only send the user the command name + friendly error */
         if (!error) {
-            virReportError(VIR_ERR_INTERNAL_ERROR,
+            virReportError(VIR_ERR_AGENT_COMMAND_FAILED,
                            _("unable to execute QEMU agent command '%1$s'"),
                            qemuAgentCommandName(cmd));
             return -1;
@@ -999,7 +999,7 @@ qemuAgentCheckError(virJSONValue *cmd,
             return -2;
         }
 
-        virReportError(VIR_ERR_INTERNAL_ERROR,
+        virReportError(VIR_ERR_AGENT_COMMAND_FAILED,
                        _("unable to execute QEMU agent command '%1$s': %2$s"),
                        qemuAgentCommandName(cmd),
                        qemuAgentStringifyError(error));
-- 
2.48.1

[PATCH 7/8] qemuAgentCheckError: Reword error if neither return nor error is found

From: Peter Krempa <pkrempa@redhat.com>

Disambiguate the case from other types of error.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 src/qemu/qemu_agent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
index d4eb4897a4..ee0921eca6 100644
--- a/src/qemu/qemu_agent.c
+++ b/src/qemu/qemu_agent.c
@@ -1013,7 +1013,7 @@ qemuAgentCheckError(virJSONValue *cmd,
         VIR_DEBUG("Neither 'return' nor 'error' is set in the JSON reply %s: %s",
                   NULLSTR(cmdstr), NULLSTR(replystr));
         virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("unable to execute QEMU agent command '%1$s'"),
+                       _("QEMU agent command '%1$s' returned neither error nor success"),
                        qemuAgentCommandName(cmd));
         return -1;
     }
-- 
2.48.1
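Taken together, the two qemuAgentCheckError patches leave three outcomes
for a reply. The sketch below models them with a hypothetical struct
ex_reply in place of the parsed JSON object. It deliberately ignores the
"return -2" path visible in the hunks above, and the value used for the
internal error code is a placeholder.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; only 113 is taken from the series. */
enum ex_error {
    EX_OK = 0,
    EX_INTERNAL_ERROR = 1,          /* placeholder value */
    EX_AGENT_COMMAND_FAILED = 113,
};

/* Stand-in for the agent's JSON reply: which top-level keys it has. */
struct ex_reply {
    bool has_return;
    bool has_error;
};

/* Classify a reply the way the series describes the outcomes. */
static int
ex_check_reply(const struct ex_reply *reply)
{
    if (reply->has_error)
        return EX_AGENT_COMMAND_FAILED; /* agent reported a failure */
    if (reply->has_return)
        return EX_OK;                   /* command succeeded */
    return EX_INTERNAL_ERROR;           /* neither key: malformed reply */
}
```

Only the first outcome tells the application that the agent itself
rejected or failed the command; the last one stays an internal error
with a more specific message.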

[PATCH 8/8] NEWS: Mention guest agent error code improvements

From: Peter Krempa <pkrempa@redhat.com>

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 NEWS.rst | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/NEWS.rst b/NEWS.rst
index 98ca838642..b2f3415001 100644
--- a/NEWS.rst
+++ b/NEWS.rst
@@ -37,6 +37,16 @@ v11.2.0 (unreleased)
 
 * **Improvements**
 
+  * qemu: Improved guest agent corner case error reporting
+
+    The APIs using the guest agent now report two specific error codes aimed at
+    helping management applications and also users to differentiate between
+    the guest agent timing out while libvirt is attempting synchronisation, thus
+    no harm would be done and while being issued a command.
+
+    The new error codes are ``VIR_ERR_AGENT_COMMAND_TIMEOUT`` and
+    ``VIR_ERR_AGENT_COMMAND_FAILED``.
+
 * **Bug fixes**
 
-- 
2.48.1

On a Thursday in 2025, Peter Krempa via Devel wrote:
From: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- NEWS.rst | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/NEWS.rst b/NEWS.rst index 98ca838642..b2f3415001 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -37,6 +37,16 @@ v11.2.0 (unreleased)
* **Improvements**
+ * qemu: Improved guest agent corner case error reporting + + The APIs using the guest agent now report two specific error codes aimed at + helping management applications and also users to differentiate between + the guest agent timing out while libvirt is attempting synchronisation, thus + no harm would be done and while being issued a command. +
guest-agent considered harmful? :)

Also, there's an extra 'and'

How about?

the guest agent timing out while libvirt is attempting synchronisation.
These mean that the command was not executed so no change to the guest happened.

Or just replace harm with change and remove the extra and.
+ The new error codes are ``VIR_ERR_AGENT_COMMAND_TIMEOUT`` and + ``VIR_ERR_AGENT_COMMAND_FAILED``. + * **Bug fixes**
-- 2.48.1

On Thu, Mar 20, 2025 at 17:36:27 +0100, Ján Tomko wrote:
On a Thursday in 2025, Peter Krempa via Devel wrote:
From: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- NEWS.rst | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/NEWS.rst b/NEWS.rst index 98ca838642..b2f3415001 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -37,6 +37,16 @@ v11.2.0 (unreleased)
* **Improvements**
+ * qemu: Improved guest agent corner case error reporting + + The APIs using the guest agent now report two specific error codes aimed at + helping management applications and also users to differentiate between + the guest agent timing out while libvirt is attempting synchronisation, thus + no harm would be done and while being issued a command. +
guest-agent considered harmful? :)
Well, it can be sub-optimal to the VM if the filesystems are frozen while the management layer thinks they are not. But I agree that "harm" is not the correct word here.
Also, there's an extra 'and'
How about?
the guest agent timing out while libvirt is attempting synchronisation. These mean that the command was not executed so no change to the guest happened.
Or just replace harm with change and remove the extra and.
So I wanted to outline the two cases:

1) timeout while syncing
2) timeout when an actual command was sent but we've timed out

So how about:

The APIs using the guest agent now report two specific error codes aimed at
helping management applications/users to differentiate between timeout while
libvirt was synchronizing with the guest agent and timeout after a command
was already sent.

On Fri, Mar 21, 2025 at 12:15:19PM +0100, Peter Krempa via Devel wrote:
On Thu, Mar 20, 2025 at 17:36:27 +0100, Ján Tomko wrote:
guest-agent considered harmful? :)
Well, it can be sub-optimal to the VM if the filesystems are frozen while the management layer thinks they are not.
In that scenario, the guest agent will unconditionally fail all commands
that are not safe for use while frozen.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Fri, Mar 21, 2025 at 11:35:36 +0000, Daniel P. Berrangé wrote:
On Fri, Mar 21, 2025 at 12:15:19PM +0100, Peter Krempa via Devel wrote:
On Thu, Mar 20, 2025 at 17:36:27 +0100, Ján Tomko wrote:
guest-agent considered harmful? :)
Well, it can be sub-optimal to the VM if the filesystems are frozen while the management layer thinks they are not.
In that scenario, the guest agent will unconditionally fail all commands that are not safe for use while frozen.
The guest agent indeed will fail. But also many things in the VM will be
unhappy without the ability to write.

In one of the scenarios I was dealing with, the guest agent took a bit
longer to reply after getting the command to freeze. The user got a
generic failure from libvirt timing out, but the command was actually
executed by the GA.

Thus if you "forget" or "don't notice" that you froze filesystems, that
might be harmful.
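For the freeze scenario described here, a management application might
map the error code of a failed freeze request to a follow-up action
along these lines. This is only a sketch of one possible policy, with
hypothetical names throughout: the enum values 112 and 113 come from the
series, the third is a placeholder, and the suggested actions are
assumptions rather than libvirt behaviour.

```c
#include <assert.h>

/* Hypothetical stand-ins for libvirt error codes. */
enum ex_error {
    EX_AGENT_UNRESPONSIVE = 1,      /* placeholder value */
    EX_AGENT_COMMAND_TIMEOUT = 112,
    EX_AGENT_COMMAND_FAILED = 113,
};

enum ex_action {
    EX_ACTION_RETRY,           /* nothing reached the agent; retry later */
    EX_ACTION_VERIFY_AND_THAW, /* freeze may have happened; check, then thaw */
    EX_ACTION_REPORT,          /* report the failure to the user */
};

/* Decide what to do after a freeze request failed with 'code'. */
static enum ex_action
ex_after_failed_freeze(int code)
{
    switch (code) {
    case EX_AGENT_COMMAND_TIMEOUT:
        /* The command was sent, so filesystems may be frozen. */
        return EX_ACTION_VERIFY_AND_THAW;
    case EX_AGENT_UNRESPONSIVE:
        /* e.g. timed out while syncing; no command was sent. */
        return EX_ACTION_RETRY;
    default:
        /* Includes EX_AGENT_COMMAND_FAILED: the agent answered. */
        return EX_ACTION_REPORT;
    }
}
```

With the old generic failure, the second and first cases were
indistinguishable, which is how a frozen guest could go unnoticed.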

On Fri, Mar 21, 2025 at 12:40:39PM +0100, Peter Krempa wrote:
On Fri, Mar 21, 2025 at 11:35:36 +0000, Daniel P. Berrangé wrote:
On Fri, Mar 21, 2025 at 12:15:19PM +0100, Peter Krempa via Devel wrote:
On Thu, Mar 20, 2025 at 17:36:27 +0100, Ján Tomko wrote:
guest-agent considered harmful? :)
Well, it can be sub-optimal to the VM if the filesystems are frozen while the management layer thinks they are not.
In that scenario, the guest agent will unconditionally fail all commands that are not safe for use while frozen.
The guest agent indeed will fail. But also many things in the VM will be unhappy without the ability to write.
In one of the scenarios I was dealing with the guest agent took a bit longer to reply after getting the command to freeze. The user got a generic failure from libvirt timing out but the command was actually executed by the GA.
Thus if you "forget" or "don't notice" that you froze filesystems that might be harmful.
Hmmm, the guest agent is using QMP and QMP has a notion of events. We
should file an RFE to extend the guest agent so that it issues events
when freezing or unfreezing. That way even if the initial command times
out, libvirt can still get notified when the action has happened.

When re-connecting to a running VM at startup, we also ought to have a
way to query whether it is frozen or not, in case something changed
while we were stopped.

In theory we could query what commands are available, since while
frozen most commands get disabled, but there's no "query-qmp-schema"
command exposed. This is another feature gap compared to QEMU that
ought to be fixed.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Fri, Mar 21, 2025 at 11:53:50 +0000, Daniel P. Berrangé wrote:
On Fri, Mar 21, 2025 at 12:40:39PM +0100, Peter Krempa wrote:
On Fri, Mar 21, 2025 at 11:35:36 +0000, Daniel P. Berrangé wrote:
On Fri, Mar 21, 2025 at 12:15:19PM +0100, Peter Krempa via Devel wrote:
On Thu, Mar 20, 2025 at 17:36:27 +0100, Ján Tomko wrote:
guest-agent considered harmful? :)
Well, it can be sub-optimal to the VM if the filesystems are frozen while the management layer thinks they are not.
In that scenario, the guest agent will unconditionally fail all commands that are not safe for use while frozen.
The guest agent indeed will fail. But also many things in the VM will be unhappy without the ability to write.
In one of the scenarios I was dealing with the guest agent took a bit longer to reply after getting the command to freeze. The user got a generic failure from libvirt timing out but the command was actually executed by the GA.
Thus if you "forget" or "don't notice" that you froze filesystems that might be harmful.
Hmmm, the guest agent is using QMP and QMP has a notion of events. We should file an RFE to extend the guest agent so that it issues events when freezing or unfreezing. That way even if the initial command times out, libvirt can still get notified when the action has happened.
When re-connecting to a running VM at startup, we also ought to have a way to query whether it is frozen or not, in case something changed while we were stopped.
In theory we could query what commands are available, since while frozen most commands get disabled, but there's no "query-qmp-schema" command exposed. This is another feature gap compared to QEMU that ought to be fixed.
Well, in theory you can use 'guest-fsfreeze-status'. The problem is that
it only reports the internal state of the guest agent. If something goes
sideways (GA killed) or the change is not triggered by the GA, this
state will not get updated. This is specifically one of the corner cases
of 'guest-fsfreeze-freeze' on Windows, which gets auto-thawed after some
time.

In addition, the guest agent supports parametric freeze, so you can
freeze only specific filesystems, but then you can't freeze anything
else because the internal state locks it out, even when you don't freeze
the filesystem the agent resides on. The thaw command is always global,
which can also be fun if something else froze filesystems.
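The lock-out and global thaw described in this message can be pictured
with a toy model. It is not qemu-ga code: struct ex_agent and both
functions are hypothetical, and the single "frozen" flag is an
assumption standing in for the agent's internal state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one flag tracks whether the agent believes it froze
 * something, regardless of which filesystems were requested. */
struct ex_agent {
    bool frozen;
};

/* Freeze 'nmountpoints' filesystems. Returns the number frozen, or -1
 * when the internal state already says "frozen" and locks us out. */
static int
ex_freeze(struct ex_agent *agent, int nmountpoints)
{
    if (agent->frozen)
        return -1;
    agent->frozen = true;
    return nmountpoints;
}

/* Thaw is global: it clears the state for everything at once. */
static void
ex_thaw(struct ex_agent *agent)
{
    agent->frozen = false;
}
```

In this model, freezing one filesystem blocks a later request for a
second one until a thaw, and the flag says nothing about freezes or
thaws that happened outside the agent.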

On a Friday in 2025, Peter Krempa wrote:
On Thu, Mar 20, 2025 at 17:36:27 +0100, Ján Tomko wrote:
On a Thursday in 2025, Peter Krempa via Devel wrote:
From: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- NEWS.rst | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/NEWS.rst b/NEWS.rst index 98ca838642..b2f3415001 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -37,6 +37,16 @@ v11.2.0 (unreleased)
* **Improvements**
+ * qemu: Improved guest agent corner case error reporting + + The APIs using the guest agent now report two specific error codes aimed at + helping management applications and also users to differentiate between + the guest agent timing out while libvirt is attempting synchronisation, thus + no harm would be done and while being issued a command. +
guest-agent considered harmful? :)
Well, it can be sub-optimal to the VM if the filesystems are frozen while the management layer thinks they are not.
But I agree that "harm" is not the correct word here.
Also, there's an extra 'and'
How about?
the guest agent timing out while libvirt is attempting synchronisation. These mean that the command was not executed so no change to the guest happened.
Or just replace harm with change and remove the extra and.
So I wanted to outline the two cases: 1) timeout while syncing 2) timeout when an actual command was sent but we've timed out
So how about:
The APIs using the guest agent now report two specific error codes aimed at helping management applications/users to differentiate between timeout while libvirt was synchronizing with the guest agent and timeout after a command was already sent.
Reviewed-by: Ján Tomko <jtomko@redhat.com>

Jano

On a Thursday in 2025, Peter Krempa via Devel wrote:
This series introduces two new error codes to help management
applications make better decisions when they hit corner cases of guest
agent interaction.

Peter Krempa (8):
  lib: error: Introduce 'VIR_ERR_AGENT_COMMAND_TIMEOUT'
  qemu: agent: Differentiate timeouts when syncing from command timeout
  qemuAgentCommandFull: Use VIR_ERR_AGENT_COMMAND_TIMEOUT when agent disappears
  docs: Point to VIR_ERR_AGENT_COMMAND_TIMEOUT when setting timeout
  lib: error: Introduce 'VIR_ERR_AGENT_COMMAND_FAILED'
  qemuAgentCheckError: Use 'VIR_ERR_AGENT_COMMAND_FAILED'
  qemuAgentCheckError: Reword error if neither return nor error is found
  NEWS: Mention guest agent error code improvements

 NEWS.rst                    | 10 ++++++++++
 docs/manpages/virsh.rst     |  3 +++
 include/libvirt/virterror.h |  4 ++++
 src/libvirt-domain.c        |  4 ++++
 src/qemu/qemu_agent.c       | 32 +++++++++++++++++++++-----------
 src/util/virerror.c         |  6 ++++++
 6 files changed, 48 insertions(+), 11 deletions(-)
Reviewed-by: Ján Tomko <jtomko@redhat.com>

Jano
participants (3)
- Daniel P. Berrangé
- Ján Tomko
- Peter Krempa