
On 7/23/20 7:14 AM, Nikolay Shirokovskiy wrote:
We are dropping the only reference here so that the event loop thread is going to be exited synchronously. In order to avoid deadlocks we need to unlock the VM so that any handler being called can finish execution and thus even loop thread be finished too.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_domain.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 5b22eb2..82b3d11 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1637,11 +1637,21 @@ void qemuDomainObjStopWorker(virDomainObjPtr dom) { qemuDomainObjPrivatePtr priv = dom->privateData; + virEventThread *eventThread;
- if (priv->eventThread) { - g_object_unref(priv->eventThread); - priv->eventThread = NULL; - } + if (!priv->eventThread) + return; + + /* + * We are dropping the only reference here so that the event loop thread + * is going to be exited synchronously. In order to avoid deadlocks we + * need to unlock the VM so that any handler being called can finish + * execution and thus even loop thread be finished too. + */ + eventThread = g_steal_pointer(&priv->eventThread); + virObjectUnlock(dom); + g_object_unref(eventThread); + virObjectLock(dom);
I understand the intention of the code thanks to the comment just before it, but this is not robust. This unlocking -> do stuff -> lock will only work if the caller of qemuDomainObjStopWorker() is holding the mutex. I see that this is the case for qemuDomainObjStopWorkerIter(), but qemuDomainObjStopWorker() is also called by qemuProcessStop(). qemuProcessStop() does not make any mutex lock/unlock, so you'll be assuming that all callers of qemuProcessStop() will hold the mutex before calling it. One of them is qemuProcessInit(), which calls qemuProcessStop() in the 'stop' jump, and there is no mutex lock beforehand as well. Now we're assuming that all callers of qemuProcessInit() are holding the mutex lock beforehand. In a quick glance in the code I saw at least 2 instances that this isn't the case, then we'll need to assume that the callers of those functions are holding the mutex lock. So on and so forth. Given that this code only makes sense when called from qemuDomainObjStopWorkerIter(), I'd suggest removing the lock/unlock of this function (turning it into just a call to qemuDomainObjStopWorker()) and move them inside the body of qemuDomainObjStopWorker(), locking and unlocking the mutex inside the scope of the same function. Thanks, DHB
}