On 07.02.2012 16:18, Daniel P. Berrange wrote:
On Wed, Feb 01, 2012 at 06:04:45PM +0100, Michal Privoznik wrote:
> If we issue guest command and GA is not running, the issuing thread
> will block endlessly. We can check for GA presence by issuing
> guest-sync with unique ID (timestamp). We don't want to issue real
> command as even if GA is not running, once it is started, it process
> all commands written to GA socket.
>
> If we receive reply, it might be from a previous sync command, so
> we need to compare received ID for equality with the given one.
> ---
>
> Some background on this:
> -GA socked is always writeable, even with no GA running (!)
>
> -writing something to dead socket (=without any GA listening) will thus
> succeed. However, when one will (after a period of time) start the GA,
> it will read and process all written data. This makes timeout
> implementation harder.
>
> -Therefore issuing a GA command without any GA running will block indefinitely.
>
> My solution to this is: Issue harmless guest-sync command which returns
> given identifier. To avoid blocking, do a timed wait (5 seconds).
> This is the only case where we do a timed wait, regular GA commands
> will of course wait without any timeout.
> If GA didn't reply within those 5 seconds, we store an ID so we can compare
> it later and report 'GA unavailable'; Note that guest won't get into
> a different state as libvirt thinks it is in.
> Then, if GA is ever started, it start to process our sync commands.
> And all we have to do is check if we have ever issued such ID.
> If not an error is reported.
>
> However, there is still race:
> 1) GA will reply to our guest-sync
> 2) GA gets terminated
> 3) we issue regular command (e.g. fs-freeze)
>
> I am not sure this is avoidable. But this patch makes current situation bearable.
We could still timeout the 'fs-freeze' command after 30 seconds
or so. Given that we issue the guest-resync command, we'll be
able to automatically re-sync the JSON protocol by dropping the
later arriving fs-freeze reply (if any).
I don't think this is a good idea. I've chosen 'fs-freeze' intentionally
:) It's something that actually might take ages - to sync disks (which
is what current implementation does). Therefore if we set any timeout
for regular commands we may get into inconsistent state:
1) issue fs-freeze
2) timeout and return error (everybody thinks fs is not frozen)
3) receive "okay, frozen" from GA
>
> diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
> index 9df5546..5ecf893 100644
> --- a/src/qemu/qemu_agent.c
> +++ b/src/qemu/qemu_agent.c
> @@ -39,6 +39,7 @@
> #include "virterror_internal.h"
> #include "json.h"
> #include "virfile.h"
> +#include "virtime.h"
>
> #define VIR_FROM_THIS VIR_FROM_QEMU
>
> @@ -106,6 +107,10 @@ struct _qemuAgent {
> /* If anything went wrong, this will be fed back
> * the next monitor msg */
> virError lastError;
> +
> + /* Array of IDs issued to GA via guest-sync */
> + unsigned long long *ids;
> + size_t ids_size;
> };
I'm not sure that we want / need to store all historic IDs.
At any time, aren't we only interested in the most recent ID
that we have sent.
We always serialize commands we sent to the agent, so consider
we sent
guest-sync: id=1
...times out...
guest-sync: id=2
fs-freeze:
After that first time out we see, we need to discard all
further data received from the agent until we see a guest-sync
command reply with id=2. If we see a guest-snc reply with
id=1, we don't care - we just drop it & carry on waiting
until we get one with id=2.
Makes sense.
> +/**
> + * qemuAgentGuestSync:
> + * @mon: Monitor
> + *
> + * Send guest-sync with unique ID and wait for
> + * reply. If we get one, check if received ID
> + * is equal to given.
> + *
> + * Returns: 0 on success,
> + * -1 otherwise
> + */
> +static int
> +qemuAgentGuestSync(qemuAgentPtr mon)
> +{
> + int ret = -1;
> + int send_ret;
> + unsigned long long id, id_ret;
> + qemuAgentMessage sync_msg;
> +
> + memset(&sync_msg, 0, sizeof sync_msg);
> +
> + if (virTimeMillisNow(&id) < 0)
> + return -1;
> +
> + if (virAsprintf(&sync_msg.txBuffer,
> + "{\"execute\":\"guest-sync\", "
> + "\"arguments\":{\"id\":%llu}}",
id) < 0) {
> + virReportOOMError();
> + return -1;
> + }
> + sync_msg.txLength = strlen(sync_msg.txBuffer);
> +
> + VIR_DEBUG("Sending guest-sync command with ID: %llu", id);
According to the 'guest-sync' QMP spec, we need to send the magic byte
'0xFF' immediately before the guest-sync command data is sent.
Yeah, and probably switch to new guest-sync-delimited command as soon as
it's upstream.
Regards,
Daniel