
Laine Stump wrote:
On 01/13/2011 04:29 PM, Jim Fehlig wrote:
Daniel P. Berrange wrote:
Yep, it wasn't really intended as a fix. It was intended to make the error scenario clearly detectable, which has succeeded as per Jim's report. The fact that QMP returned an error in this way, means we can now reliably detect, usleep(1000), and then retry the 'cont' again at which point we should succeed.
Something like the attached patch? I'm not quite sure about the retry bounds (currently 1 second). In my testing, it always succeeds on the second attempt - even with large memory vms.
In my tests, 250ms was more than enough, so I'm guessing it's okay, although there's probably no hurt in making it a bit larger - it's not like this is something that repeats all day every day :-)
Would it be possible for you to add the same thing into the text monitor version of the code, so both fixes would travel together as the patch gets cherry-picked around?
I can, but have not seen this issue with the text monitor. And the error reporting is not so fine grained correct?
0001-qemu-Retry-JSON-monitor-cont-cmd-on-MigrationExpecte.patch
From ec9109b40a4b2c45035495e0e4f65824a92dcf3d Mon Sep 17 00:00:00 2001 From: Jim Fehlig<jfehlig@novell.com> Date: Thu, 13 Jan 2011 12:52:23 -0700 Subject: [PATCH] qemu: Retry JSON monitor cont cmd on MigrationExpected error
When restoring a saved qemu instance via JSON monitor, the vm is left in a paused state. Turns out the 'cont' cmd was failing with "MigrationExpected" error class and "An incoming migration is expected before this command can be executed" error description due to migration (restore) not yet complete.
Detect if 'cont' cmd fails with "MigrationExpecte" error class and retry 'cont' cmd. --- src/qemu/qemu_monitor_json.c | 20 +++++++++++++++++--- 1 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 7877731..63b736e 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -702,13 +702,27 @@ qemuMonitorJSONStartCPUs(qemuMonitorPtr mon, int ret; virJSONValuePtr cmd = qemuMonitorJSONMakeCommand("cont", NULL); virJSONValuePtr reply = NULL; + int i = 0, timeout = 3; if (!cmd) return -1;
- ret = qemuMonitorJSONCommand(mon, cmd,&reply); + do { + ret = qemuMonitorJSONCommand(mon, cmd,&reply);
- if (ret == 0) - ret = qemuMonitorJSONCheckError(cmd, reply); + if (ret != 0) + break; + + /* If no error, we're done */ + if ((ret = qemuMonitorJSONCheckError(cmd, reply)) == 0) + break; + + /* If error class is not MigrationExpected, we're done. + * Otherwise try 'cont' cmd again */ + if (!qemuMonitorJSONHasError(reply, "MigrationExpected")) + break; + + virJSONValueFree(reply); + } while ((++i<= timeout)&& (usleep(250000)<=0));
virJSONValueFree(cmd); virJSONValueFree(reply);
Doesn't this end up doing a double-free of reply if it times out? virJSONValueFree doesn't update the pointer that's free'd like VIR_FREE does (it can't, since it's a function call rather than a macro).
virJSONValueFree() calls VIR_FREE() on the value passed to it, so reply should be set to NULL when virJSONValueFree() returns. While looking at the patch again, not sure I'm fond of the while ((++i<= timeout) && (usleep(250000)<=0)); Perhaps should move the usleep inside the loop and clean that up a bit. Regards, Jim