On Fri, Mar 21, 2025 at 11:53:50 +0000, Daniel P. Berrangé wrote:
On Fri, Mar 21, 2025 at 12:40:39PM +0100, Peter Krempa wrote:
> On Fri, Mar 21, 2025 at 11:35:36 +0000, Daniel P. Berrangé wrote:
> > On Fri, Mar 21, 2025 at 12:15:19PM +0100, Peter Krempa via Devel wrote:
> > > On Thu, Mar 20, 2025 at 17:36:27 +0100, Ján Tomko wrote:
> > > > On a Thursday in 2025, Peter Krempa via Devel wrote:
> > > > > From: Peter Krempa <pkrempa(a)redhat.com>
> > > > >
> > > > > Signed-off-by: Peter Krempa <pkrempa(a)redhat.com>
> > > > > ---
> > > > > NEWS.rst | 10 ++++++++++
> > > > > 1 file changed, 10 insertions(+)
> > > > >
> > > > > diff --git a/NEWS.rst b/NEWS.rst
> > > > > index 98ca838642..b2f3415001 100644
> > > > > --- a/NEWS.rst
> > > > > +++ b/NEWS.rst
> > > > > @@ -37,6 +37,16 @@ v11.2.0 (unreleased)
> > > > >
> > > > > * **Improvements**
> > > > >
> > > > > + * qemu: Improved guest agent corner case error reporting
> > > > > +
> > > > > > + The APIs using the guest agent now report two specific error codes aimed at
> > > > > > + helping management applications and also users to differentiate between
> > > > > > + the guest agent timing out while libvirt is attempting synchronisation, thus
> > > > > > + no harm would be done and while being issued a command.
> > > > > +
> > > >
> > > > guest-agent considered harmful? :)
> > >
> > > Well, it can be sub-optimal to the VM if the filesystems are frozen
> > > while the management layer thinks they are not.
> >
> > In that scenario, the guest agent will unconditionally fail all commands
> > that are not safe for use while frozen.
>
> The guest agent indeed will fail. But also many things in the VM will be
> unhappy without the ability to write.
>
> In one of the scenarios I was dealing with the guest agent took a bit
> longer to reply after getting the command to freeze. The user got a
> generic failure from libvirt timing out but the command was actually
> executed by the GA.
>
> Thus if you "forget" or "don't notice" that you froze filesystems that
> might be harmful.

Hmmm, the guest agent is using QMP and QMP has a notion of events. We
should file an RFE to extend the guest agent so that it issues events
when freezing or unfreezing. That way even if the initial command times
out, libvirt can still get notified when the action has happened.

When re-connecting to a running VM at startup, we also ought to have a
way to query whether it is frozen or not, in case something changed
while we were stopped.

In theory we could query what commands are available, since while
frozen most commands get disabled, but there's no "query-qmp-schema"
command exposed. This is another feature gap compared to QEMU that
ought to be fixed.
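
To illustrate the "query what commands are available" idea: the agent's
'guest-info' reply lists each supported command together with an
'enabled' flag, so a client could inspect those flags. Below is a minimal
Python sketch, assuming the reply has already been fetched and decoded.
The helper names, the probe command and the sample reply are made up for
illustration, and a disabled command is only a hint (it may just as well
have been blocked by configuration), so this is a heuristic and not
something libvirt does today.

```python
def disabled_commands(guest_info_reply):
    """Return the sorted names of commands the agent reports as disabled."""
    info = guest_info_reply["return"]
    return sorted(
        cmd["name"] for cmd in info["supported_commands"] if not cmd["enabled"]
    )


def looks_frozen(guest_info_reply, probe="guest-file-open"):
    """Heuristic: treat the guest as possibly frozen if `probe` is disabled.

    `probe` is an assumption: pick a command you know is normally enabled
    in your guests and is not usable while filesystems are frozen.
    """
    return probe in disabled_commands(guest_info_reply)


# Hypothetical, heavily shortened 'guest-info' reply for demonstration.
sample_reply = {
    "return": {
        "version": "0.0.0",
        "supported_commands": [
            {"name": "guest-ping", "enabled": True, "success-response": True},
            {"name": "guest-file-open", "enabled": False, "success-response": True},
            {"name": "guest-fsfreeze-thaw", "enabled": True, "success-response": True},
        ],
    }
}
```

With the sample above, only 'guest-file-open' is reported as disabled, so
the heuristic would flag the guest as possibly frozen.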

Well, in theory you can use 'guest-fsfreeze-status'. The problem is that
it only reports the internal state of the guest agent.

If something goes sideways (GA killed) or a freeze or thaw is not
triggered by the GA, this state will not get updated.

This is specifically one of the corner cases of 'guest-fsfreeze-freeze'
on Windows, which gets auto-thawed after some time.
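
For reference, querying that status is a one-line agent command. A
minimal Python sketch follows, assuming 'send' is some callable that
takes the JSON request as a string and returns the agent's JSON reply as
a string (with libvirt-python that could be a small wrapper around
libvirt_qemu.qemuAgentCommand(); the wrapper and the fake transport below
are hypothetical). As said above, the answer is only the agent's own
bookkeeping, so it should not be treated as ground truth.

```python
import json


def fsfreeze_status(send):
    """Ask the guest agent for its freeze state via `send`.

    `send` is an assumed transport: str (JSON request) -> str (JSON reply).
    Returns the agent's answer, documented as "thawed" or "frozen".
    """
    request = json.dumps({"execute": "guest-fsfreeze-status"})
    reply = json.loads(send(request))
    if "error" in reply:
        # Agent-side failure; pass the description on if there is one.
        raise RuntimeError(reply["error"].get("desc", "guest agent error"))
    return reply["return"]


def fake_send(request):
    """Stand-in transport for demonstration; always claims 'frozen'."""
    assert json.loads(request) == {"execute": "guest-fsfreeze-status"}
    return json.dumps({"return": "frozen"})
```

Calling fsfreeze_status(fake_send) returns whatever the transport
reports, here "frozen".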

In addition, the guest agent supports parametric freeze, thus you can
freeze only specific filesystems, but then you can't freeze anything
else because the internal state locks it out even when you don't freeze
the filesystem the agent resides on.

The thaw command is always global, which can also be fun if something
else froze filesystems.
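
To make the asymmetry concrete, here is a small Python sketch that only
builds the agent requests involved; it does not talk to any agent. The
helper names are hypothetical. It assumes the parametric variant is the
'guest-fsfreeze-freeze-list' command with a 'mountpoints' argument, while
'guest-fsfreeze-thaw' takes no arguments at all.

```python
import json


def freeze_request(mountpoints=None):
    """Build a freeze request; with mountpoints, use the parametric variant."""
    if mountpoints:
        return json.dumps({
            "execute": "guest-fsfreeze-freeze-list",
            "arguments": {"mountpoints": list(mountpoints)},
        })
    return json.dumps({"execute": "guest-fsfreeze-freeze"})


def thaw_request():
    """Build a thaw request; there is no way to name specific filesystems."""
    return json.dumps({"execute": "guest-fsfreeze-thaw"})
```

So a freeze can be scoped to, say, ["/data"], but the matching thaw
cannot be scoped the same way.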