On 02/07/2012 08:49 AM, Michal Privoznik wrote:
> We could still timeout the 'fs-freeze' command after 30 seconds
> or so. Given that we issue the guest-resync command, we'll be
> able to automatically re-sync the JSON protocol by dropping the
> later arriving fs-freeze reply (if any).
I don't think this is a good idea. I've chosen 'fs-freeze' intentionally
:) It's something that actually might take ages - to sync disks (which
is what current implementation does). Therefore if we set any timeout
for regular commands we may get into inconsistent state:
1) issue fs-freeze
2) timeout and return error (everybody thinks fs is not frozen)
3) receive "okay, frozen" from GA
Question for the qemu-folks:
We've already documented that qemu-ga must be treated as an asynchronous
interface; callers cannot expect the client to reliably reply, and must
always have a timeout mechanism in place. Doesn't that mean that any
guest agent command that might potentially be long-running should
instead be broken up into multiple commands, one to start the process,
and another to query whether the process has been completed?
That is, since fs-freeze might be potentially long-running, should we
break it into multiple commands:
  fs-freeze-async: requests that a freeze be started, and returns an
  immediate ack if the process is started
  fs-freeze-query: returns the status of whether the system is thawed,
  frozen, or in the process of transitioning
libvirt would then issue a guest-sync with reasonable timeout (to ensure
the agent is currently responsive, if it fails, the agent is not
available), then an fs-freeze-async with reasonable timeout (if that
fails, the freeze is not possible), then periodic fs-freeze-query until
the freeze completes (if any of them fail, assume the agent restarted,
but that the system is frozen, and therefore, libvirt should send an
fs-thaw command prior to returning failure, just in case).
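To make the proposed sequence concrete, here is a rough Python sketch of the caller side. It is only an illustration: fs-freeze-async and fs-freeze-query are the commands proposed above and do not exist in any agent, the `agent` object with its `command(name, timeout)` method and the `AgentTimeout` exception are hypothetical stand-ins for whatever transport libvirt uses, and treating "ran out of polls" the same as a failed query is my assumption.

```python
import time


class AgentTimeout(Exception):
    """Hypothetical error: the agent did not reply within the timeout."""


def freeze_guest(agent, timeout=5.0, poll_interval=1.0, max_polls=60,
                 sleep=time.sleep):
    """Drive the proposed start/query freeze sequence.

    `agent.command(name, timeout)` is an assumed interface that returns the
    reply payload or raises AgentTimeout.
    """
    # Step 1: guest-sync checks that the agent is responsive at all.
    try:
        agent.command("guest-sync", timeout)
    except AgentTimeout:
        return "agent-unavailable"

    # Step 2: ask for the freeze to start; only an immediate ack is expected.
    try:
        agent.command("fs-freeze-async", timeout)
    except AgentTimeout:
        return "freeze-not-possible"

    # Step 3: poll until the agent reports that the freeze has completed.
    try:
        for _ in range(max_polls):
            if agent.command("fs-freeze-query", timeout) == "frozen":
                return "frozen"
            sleep(poll_interval)
    except AgentTimeout:
        pass  # the agent may have restarted; fall through to the thaw

    # A query failed (or we gave up, which is an assumption of this sketch):
    # assume the system is frozen and send fs-thaw before reporting failure.
    try:
        agent.command("fs-thaw", timeout)
    except AgentTimeout:
        pass
    return "failed-thaw-sent"
```

The point of the split is that every individual command is short, so a timeout on any one of them says something about the agent rather than about how long the disk sync takes.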
>
> According to the 'guest-sync' QMP spec, we need to send the magic byte
> '0xFF' immediately before the guest-sync command data is sent.
Yeah, and probably switch to new guest-sync-delimited command as soon as
it's upstream.
If I'm understanding the recent proposals correctly, guest-sync exists
in 1.0 guest agents, but not guest-sync-delimited; we can always send
0xff, but we can only expect to receive 0xff if we use
guest-sync-delimited, which means we need to probe to see whether the
guest agent understands guest-sync-delimited. Is it safe to send a 1.0 guest
a command it doesn't understand, like guest-sync-delimited, and expect
to get a reliable error message in reply?
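For reference, this is roughly what I have in mind on the wire, as a Python sketch. The helper names are made up for illustration, and the sketch assumes one JSON reply per line and the usual {"execute": ..., "arguments": {"id": ...}} request shape. It does not answer the question above: whether a 1.0 agent replies at all to an unknown command is exactly what still needs confirming.

```python
import json

SENTINEL = b"\xff"  # the magic byte discussed above


def build_sync_request(sync_id, delimited=False):
    """Return the bytes for a sync request, always prefixed with 0xff."""
    name = "guest-sync-delimited" if delimited else "guest-sync"
    body = json.dumps({"execute": name, "arguments": {"id": sync_id}})
    return SENTINEL + body.encode("ascii") + b"\n"


def parse_sync_reply(raw, sync_id, delimited=False):
    """Return True if `raw` contains the reply matching `sync_id`.

    With delimited=True, everything before the last 0xff is discarded, which
    drops any late reply to an earlier command (for example a timed-out
    fs-freeze). Without it, we can only skip replies whose id does not match.
    """
    if delimited:
        _stale, found, raw = raw.rpartition(SENTINEL)
        if not found:
            return False  # no sentinel seen: not a delimited reply
    for line in raw.splitlines():
        try:
            reply = json.loads(line.decode("ascii"))
        except ValueError:
            continue  # partial or garbage line left over from an old reply
        if isinstance(reply, dict) and reply.get("return") == sync_id:
            return True
    return False
```

A probe would send build_sync_request(id, delimited=True) and fall back to plain guest-sync if parse_sync_reply() comes back False, provided the old agent answers with an error rather than staying silent.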
--
Eric Blake eblake(a)redhat.com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org