[libvirt] [PATCH] Avoid a race when restoring a qemu domain.

This patch adds a 1 second sleep after telling qemu to start the restore operation and before telling qemu to start up the CPUs. Without this sleep, my hardware would end up with the CPUs started before the restore was started, leading to random (but never good) behavior. Apparently this is caused by slow hardware, as I haven't heard of anyone else experiencing this problem.

A sleep is a very inelegant way to eliminate the problem, but it's apparently the only way currently available to us.

Note that sleep durations as low as 250msec were successful in eliminating the bad behavior; I made it 1 sec. just for extra safety.
---
 src/qemu/qemu_driver.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 60fa95a..1270c84 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -5965,6 +5965,13 @@ static int qemudDomainRestore(virConnectPtr conn,
     /* If it was running before, resume it now. */
     if (header.was_running) {
         qemuDomainObjPrivatePtr priv = vm->privateData;
+
+        /* pause 1 second to allow qemu time to start the restore,
+         * otherwise it may start the CPUs before the restore, and end
+         * up in a "nondeterminate" state.
+         */
+        usleep(1000000);
+
         qemuDomainObjEnterMonitorWithDriver(driver, vm);
         if (qemuMonitorStartCPUs(priv->mon, conn) < 0) {
             if (virGetLastError() == NULL)
--
1.6.6.1

On 04/07/2010 12:39 PM, Laine Stump wrote:
This patch adds a 1 second sleep after telling qemu to start the restore operation and before telling qemu to start up the CPUs. Without this sleep, my hardware would end up with the CPUs started before the restore was started, leading to random (but never good) behavior. Apparently this is caused by slow hardware, as I haven't heard of anyone else experiencing this problem.
Is there a BZ number or anything else we can list in the commit message showing that we've informed the qemu folks about their lack of a reliable notification that they are ready for the restore?

ACK. But can you squash this in first? Otherwise, the usleep(1000000) will silently become a no-op on mingw, rather than sleeping for a second.

diff --git i/bootstrap.conf w/bootstrap.conf
index ac2f8e6..d55dc71 100644
--- i/bootstrap.conf
+++ w/bootstrap.conf
@@ -56,6 +56,7 @@ strsep
 sys_stat
 time_r
 useless-if-before-free
+usleep
 vasprintf
 verify
 vc-list-files

--
Eric Blake   eblake@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org
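[Editorial aside: beyond the mingw point above, POSIX historically permits usleep() to reject arguments of 1,000,000 microseconds or more, so a full-second delay is on firmer ground when built on nanosleep(). A minimal sketch follows; the delay_usec() helper name is invented for illustration and is not part of libvirt or gnulib.]

#include <errno.h>
#include <time.h>

/* Sketch of a delay helper safe for durations of a second or more:
 * nanosleep() has no 1,000,000-usec ceiling, and is restarted here
 * if a signal interrupts it. The name delay_usec is hypothetical. */
static void delay_usec(unsigned long long usec)
{
    struct timespec ts;

    ts.tv_sec  = usec / 1000000ULL;
    ts.tv_nsec = (usec % 1000000ULL) * 1000ULL;

    /* nanosleep() writes the unslept remainder back into its second
     * argument, so looping on EINTR resumes where the sleep left off. */
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}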

On 04/07/2010 03:38 PM, Eric Blake wrote:
On 04/07/2010 12:39 PM, Laine Stump wrote:
This patch adds a 1 second sleep after telling qemu to start the restore operation and before telling qemu to start up the CPUs. Without this sleep, my hardware would end up with the CPUs started before the restore was started, leading to random (but never good) behavior. Apparently this is caused by slow hardware, as I haven't heard of anyone else experiencing this problem.
Is there a BZ number or anything else we can list in the commit message showing that we've informed the qemu folks about their lack of a reliable notification that they are ready for the restore?
No, good point. I need to understand the capabilities (and lack thereof) of what's there before I can file a coherent bug report, but it definitely should be filed. (I arrived at this "fix" by asking around for advice when I encountered the problem, rather than by analyzing the code myself.)
ACK. But can you squash this in first? Otherwise, the usleep(1000000) will silently become a no-op on mingw, rather than sleeping for a second.
As far as I know, qemu_driver.c would never be compiled with mingw, since it only goes into libvirtd, and mingw is only used to build client-side code. Of course, there are many other uses of usleep in the code, and some of them may actually be in client-side code (I haven't checked), but making them work properly should be a separate commit. At any rate, I don't think I'll be committing this just yet; I sent the patch to the list expecting there might be differences of opinion, but figured we had to start a discussion somewhere about what should be done.
diff --git i/bootstrap.conf w/bootstrap.conf
index ac2f8e6..d55dc71 100644
--- i/bootstrap.conf
+++ w/bootstrap.conf
@@ -56,6 +56,7 @@ strsep
 sys_stat
 time_r
 useless-if-before-free
+usleep
 vasprintf
 verify
 vc-list-files

On 04/07/2010 02:39 PM, Laine Stump wrote:
This patch adds a 1 second sleep after telling qemu to start the restore operation and before telling qemu to start up the CPUs. Without this sleep, my hardware would end up with the CPUs started before the restore was started, leading to random (but never good) behavior. Apparently this is caused by slow hardware, as I haven't heard of anyone else experiencing this problem.
A sleep is a very inelegant way to eliminate the problem, but it's apparently the only way currently available to us.
Note that sleep durations as low as 250msec were successful in eliminating the bad behavior; I made it 1 sec. just for extra safety.
---
 src/qemu/qemu_driver.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 60fa95a..1270c84 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -5965,6 +5965,13 @@ static int qemudDomainRestore(virConnectPtr conn,
     /* If it was running before, resume it now. */
     if (header.was_running) {
         qemuDomainObjPrivatePtr priv = vm->privateData;
+
+        /* pause 1 second to allow qemu time to start the restore,
+         * otherwise it may start the CPUs before the restore, and end
+         * up in a "nondeterminate" state.
+         */
+        usleep(1000000);
+
         qemuDomainObjEnterMonitorWithDriver(driver, vm);
         if (qemuMonitorStartCPUs(priv->mon, conn) < 0) {
             if (virGetLastError() == NULL)
Hm, this really doesn't seem like it's the way to fix this. We really should investigate what is going on in qemu, and see if it's a bug in qemu itself (in which case we should fix qemu), or if it's a bug in the way we communicate with qemu (in which case we should fix that). A sleep is just hiding the problem (which means it can still pop up on machines slower, or busier, than yours!).

--
Chris Lalancette

On 04/07/2010 03:43 PM, Chris Lalancette wrote:
Hm, this really doesn't seem like it's the way to fix this.
You are correct that it isn't what should be done in the long term. Short term, though, it definitely fixes bad behavior that I wouldn't want to see in an official release (on my hardware, restores will basically always fail unless the guest was paused prior to saving).
We really should investigate what is going on in qemu, and see if it's a bug in qemu itself (in which case we should fix qemu), or if it's a bug in the way we communicate with qemu (in which case we should fix that).
I'm operating on information I learned in an IRC chat. Perhaps Dan Berrange can pipe up here to repeat / expand on what he said, but basically it sounds like the problem is that qemu will happily start the CPUs for us before the restore operation has begun, and there's no way for us to verify whether or not it has begun; for that, qemu will need to make 'info migrate' work on the incoming side, and that's not likely to happen very quickly. (Of course it will take even longer if I don't whine about it; I just haven't gotten there yet ;-)
A sleep is just hiding the problem
Yes, I dislike this solution, and I'd love it if someone could tell me of an alternate way. If there is no way to fix it entirely within libvirt, though, I don't think we should just report the problem to qemu and let users suffer until it gets fixed there. If that fix requires a new interface in qemu that must then be supported by libvirt, the path to reliably working domain restores could be very long indeed, and in the meantime we'd be left with delivered code that may fail in a rather bad way for someone. That is especially true in the case of a managed save, where the image is deleted as soon as the domain is started: if it fails once, you've lost the image, so you can't even try again.
(which means it can still pop up on machines slower, or more busy, than yours!).
I'm doubtful that slower VT-capable machines exist (although I haven't checked; possibly this same problem exists when doing software emulation too). I hadn't considered whether this would pop up on faster hardware that was also busier; that's a very good point. (I did just do some more testing, and found that even 50msec is enough to make things work. 10msec isn't enough, though...)
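[Editorial aside: to make the 'info migrate' point earlier in this message concrete, here is roughly what the non-sleep fix could look like once qemu reports incoming-migration status. The qemuMonitorGetIncomingStatus() call and the INCOMING_* status values are hypothetical stand-ins, not an existing qemu or libvirt interface; the enter/exit monitor calls follow the pattern visible in the patch above.]

/* Hypothetical sketch only: poll qemu until the incoming restore has
 * finished, instead of sleeping a fixed interval before starting CPUs.
 * qemuMonitorGetIncomingStatus() and the INCOMING_* values do not
 * exist; they stand in for incoming-side 'info migrate' support. */
for (;;) {
    int status;

    qemuDomainObjEnterMonitorWithDriver(driver, vm);
    ret = qemuMonitorGetIncomingStatus(priv->mon, &status);
    qemuDomainObjExitMonitorWithDriver(driver, vm);

    if (ret < 0 || status == INCOMING_FAILED)
        goto cleanup;            /* restore failed; don't start CPUs */
    if (status == INCOMING_COMPLETED)
        break;                   /* restore done; safe to start CPUs */

    usleep(50 * 1000);           /* 50msec between polls */
}

[The 50msec poll interval echoes the measurement above that 50msec sufficed on Laine's hardware; a poll loop, unlike a fixed sleep, would also stay correct on slower or busier machines.]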