Laine Stump wrote:
On 01/13/2011 04:29 PM, Jim Fehlig wrote:
> Daniel P. Berrange wrote:
>> >
>> > Yep, it wasn't really intended as a fix. It was intended to make
>> > the error scenario clearly detectable, which has succeeded as per
>> > Jim's report. The fact that QMP returned an error in this way,
>> > means we can now reliably detect, usleep(1000), and then retry
>> > the 'cont' again at which point we should succeed.
>> >
> Something like the attached patch? I'm not quite sure about the retry
> bounds (currently 1 second). In my testing, it always succeeds on the
> second attempt - even with large memory vms.
In my tests, 250ms was more than enough, so I'm guessing it's okay,
although there's probably no hurt in making it a bit larger - it's not
like this is something that repeats all day every day :-)
Would it be possible for you to add the same thing into the text
monitor version of the code, so both fixes would travel together as
the patch gets cherry-picked around?
I can, but have not seen this issue with the text monitor. And the
error reporting is not so fine grained correct?
> 0001-qemu-Retry-JSON-monitor-cont-cmd-on-MigrationExpecte.patch
>
>
> > From ec9109b40a4b2c45035495e0e4f65824a92dcf3d Mon Sep 17 00:00:00 2001
> From: Jim Fehlig<jfehlig(a)novell.com>
> Date: Thu, 13 Jan 2011 12:52:23 -0700
> Subject: [PATCH] qemu: Retry JSON monitor cont cmd on
> MigrationExpected error
>
> When restoring a saved qemu instance via JSON monitor, the vm is
> left in a paused state. Turns out the 'cont' cmd was failing with
> "MigrationExpected" error class and "An incoming migration is
> expected before this command can be executed" error description
> due to migration (restore) not yet complete.
>
> Detect if 'cont' cmd fails with "MigrationExpecte" error class and
> retry 'cont' cmd.
> ---
> src/qemu/qemu_monitor_json.c | 20 +++++++++++++++++---
> 1 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
> index 7877731..63b736e 100644
> --- a/src/qemu/qemu_monitor_json.c
> +++ b/src/qemu/qemu_monitor_json.c
> @@ -702,13 +702,27 @@ qemuMonitorJSONStartCPUs(qemuMonitorPtr mon,
> int ret;
> virJSONValuePtr cmd = qemuMonitorJSONMakeCommand("cont", NULL);
> virJSONValuePtr reply = NULL;
> + int i = 0, timeout = 3;
> if (!cmd)
> return -1;
>
> - ret = qemuMonitorJSONCommand(mon, cmd,&reply);
> + do {
> + ret = qemuMonitorJSONCommand(mon, cmd,&reply);
>
> - if (ret == 0)
> - ret = qemuMonitorJSONCheckError(cmd, reply);
> + if (ret != 0)
> + break;
> +
> + /* If no error, we're done */
> + if ((ret = qemuMonitorJSONCheckError(cmd, reply)) == 0)
> + break;
> +
> + /* If error class is not MigrationExpected, we're done.
> + * Otherwise try 'cont' cmd again */
> + if (!qemuMonitorJSONHasError(reply, "MigrationExpected"))
> + break;
> +
> + virJSONValueFree(reply);
> + } while ((++i<= timeout)&& (usleep(250000)<=0));
>
> virJSONValueFree(cmd);
> virJSONValueFree(reply);
Doesn't this end up doing a double-free of reply if it times out?
virJSONValueFree doesn't update the pointer that's free'd like
VIR_FREE does (it can't, since it's a function call rather than a macro).
virJSONValueFree() calls VIR_FREE() on the value passed to it, so reply
should be set to NULL when virJSONValueFree() returns.
While looking at the patch again, not sure I'm fond of the
while ((++i<= timeout) && (usleep(250000)<=0));
Perhaps should move the usleep inside the loop and clean that up a bit.
Regards,
Jim