On Thu, Jul 25, 2024 at 12:57:59 +0200, Michal Privoznik wrote:
Currently, qemuProcessStop() unlocks given domain object right in
the middle of cleanup process. This is dangerous because there
might be another thread which is executing virDomainObjListAdd().
And since the domain object is on the list of domain objects AND
by the time qemuProcessStop() unlocks it the object is also
marked as inactive, the other thread acquires the lock and
switches vm->def pointer.
The unlocking of domain object is needed though, to allow even
processing thread finish its queue. Well, the processing can be
done before any cleanup is attempted.
Therefore, use freshly introduced virEventThreadStop() to join
the event thread and drop lock/unlock from the middle of
qemuProcessStop().
Fixes: 3865410e7f67ca4ec66e9a905e75f452762a97f0
Resolves:
https://issues.redhat.com/browse/RHEL-49607
Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
src/qemu/qemu_process.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 6255ba92e7..0869307a90 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -8615,6 +8615,14 @@ void qemuProcessStop(virQEMUDriver *driver,
* reporting so we don't squash a legit error. */
virErrorPreserveLast(&orig_err);
+ /* By unlocking the domain object the events processing thread is
+ * allowed to finish its job. */
+ if (priv->eventThread) {
+ virObjectUnlock(vm);
+ virEventThreadStop(priv->eventThread);
+ virObjectLock(vm);
+ }
Moving this unlocking above the line where we reset vm->def->id below
directly contradicts ...
+
if (asyncJob != VIR_ASYNC_JOB_NONE) {
if (virDomainObjBeginNestedJob(vm, asyncJob) < 0)
goto cleanup;
@@ -8681,10 +8689,8 @@ void qemuProcessStop(virQEMUDriver *driver,
/* Wake up anything waiting on domain condition */
virDomainObjBroadcast(vm);
- /* IMPORTANT: qemuDomainObjStopWorker() unlocks @vm in order to prevent
- * deadlocks with the per-VM event loop thread. This MUST be done after
- * marking the VM as dead */
... this message here. Now due to other patches I've pushed, this is now
not a problem directly as we now keep the 'beingDestroyed' flag set
during the whole time thanks to my patch and Jirka's patch fixing few
cases where the flag would be leaked.
What I'm missing is a note why this is safe which I'd expect in the
commit message:
This patch moves the unlocking of the VM object once again above the
code which marks the VM as inactive by clearing 'vm->def->id'.
At this point it's now safe to do as other qemu driver code which may
be synchronized on the VM lock now also checks the
'priv->beingDestroyed' flag in addition to 'vm->def->id'.
I'd also most likely prefer if 'qemuProcessStop' explicitly sets
'priv->beingDestroyed' to true before unlocking. The flag will be later
cleared but I didn't really go through all places where it's called to
see if we assert the flag before.
- qemuDomainObjStopWorker(vm);
+ if (priv->eventThread)
+ g_object_unref(g_steal_pointer(&priv->eventThread));
if (!!g_atomic_int_dec_and_test(&driver->nactive) &&
driver->inhibitCallback)
driver->inhibitCallback(false, driver->inhibitOpaque);
--
2.44.2