Re: [libvirt] [PATCH] Avoid a race when restoring a qemu domain.

7 Apr 2010

On 04/07/2010 02:39 PM, Laine Stump wrote:
...
This patch adds a 1 second sleep after telling qemu to start the
restore operation and before telling qemu to start up the
CPUs. Without this sleep, my hardware would end up with the CPUs
started before the restore was started, leading to random (but never
good) behavior. Apparently this is caused by slow hardware, as I
haven't heard of anyone else experiencing this problem.
A sleep is a very inelegant way to eliminate the problem, but it's
apparently the only way currently available to us.
Note that sleep durations as low as 250 msec were successful in
eliminating the bad behavior; I made it 1 second just for extra safety.
---
 src/qemu/qemu_driver.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 60fa95a..1270c84 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -5965,6 +5965,13 @@ static int qemudDomainRestore(virConnectPtr conn,
     /* If it was running before, resume it now. */
     if (header.was_running) {
         qemuDomainObjPrivatePtr priv = vm->privateData;
+
+        /* pause 1 second to allow qemu time to start the restore,
+         * otherwise it may start the CPUs before the restore, and end
+         * up in a "nondeterminate" state.
+         */
+        usleep(1000000);
+
         qemuDomainObjEnterMonitorWithDriver(driver, vm);
         if (qemuMonitorStartCPUs(priv->mon, conn) < 0) {
             if (virGetLastError() == NULL)

Hm, this really doesn't seem like the right way to fix this.  We
should investigate what is going on in qemu and determine whether it's
a bug in qemu itself (in which case we should fix qemu) or a bug in the
way we communicate with qemu (in which case we should fix that).  A
sleep just hides the problem, which means it can still show up on
machines that are slower, or busier, than yours!
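
The difference between "sleep and hope" and fixing the communication
can be shown with a small, self-contained sketch. Instead of sleeping
for a guessed interval, the caller waits for an explicit "restore
finished" notification, with a timeout so that a stuck restore is
reported as an error rather than silently racing. This assumes POSIX
threads, and everything in it is hypothetical: `restore_sync`,
`restore_sync_wait()` and the simulated `slow_restore_thread()` are
illustrative names, not libvirt or qemu APIs. How qemu would actually
report completion over the monitor is left open.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical handshake object (not a libvirt/qemu type): the side
 * doing the restore sets `restored` once the saved state is fully
 * loaded, and only then may the CPUs be started. */
struct restore_sync {
    pthread_mutex_t lock;
    pthread_cond_t done;
    bool restored;
};

static void restore_sync_init(struct restore_sync *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->done, NULL);
    s->restored = false;
}

/* Called by the restoring side when loading has finished. */
static void restore_sync_signal(struct restore_sync *s)
{
    pthread_mutex_lock(&s->lock);
    s->restored = true;
    pthread_cond_broadcast(&s->done);
    pthread_mutex_unlock(&s->lock);
}

/* Wait until the restore reports completion or timeout_ms elapses.
 * Returns true only if the restore really finished, so the caller can
 * refuse to start the CPUs instead of guessing with a fixed sleep. */
static bool restore_sync_wait(struct restore_sync *s, long timeout_ms)
{
    struct timespec deadline;

    assert(timeout_ms >= 0);
    /* pthread_cond_timedwait uses CLOCK_REALTIME by default. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&s->lock);
    while (!s->restored) {
        /* Non-zero means timeout (or error): stop waiting. A zero
         * return may be a spurious wakeup, so re-check the flag. */
        if (pthread_cond_timedwait(&s->done, &s->lock, &deadline) != 0)
            break;
    }
    bool ok = s->restored;
    pthread_mutex_unlock(&s->lock);
    return ok;
}

/* Simulated restore: the delay stands in for qemu loading state. */
struct slow_restore {
    struct restore_sync *sync;
    long delay_ms;
};

static void *slow_restore_thread(void *arg)
{
    struct slow_restore *r = arg;
    struct timespec d = { r->delay_ms / 1000,
                          (r->delay_ms % 1000) * 1000000L };

    nanosleep(&d, NULL);
    restore_sync_signal(r->sync);
    return NULL;
}
```

The timeout still exists, but it now bounds an error path rather than
deciding correctness: a slow or busy host simply waits longer for the
notification instead of starting the CPUs too early.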

-- 
Chris Lalancette