[PATCH v2 0/2] qemuProcessStop: Don't unlock domain during cleanup

v2 of: https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/ELMSQ... diff to v1: - The stopping of per-domain event thread is moved after killing QEMU's PID, - New/enhanced comments to warn developers. Michal Prívozník (2): vireventthread: Introduce virEventThreadStop qemu: Use virEventThreadStop() in qemuProcessStop() src/libvirt_private.syms | 1 + src/qemu/qemu_process.c | 18 ++++++++++++++---- src/util/vireventthread.c | 22 ++++++++++++++++++++++ src/util/vireventthread.h | 2 ++ 4 files changed, 39 insertions(+), 4 deletions(-) -- 2.44.2

The aim is to move parts of vir_event_thread_finalize() that MAY block into a separate function, so that unrefing the a virEventThread no longer blocks (or require releasing and subsequent re-acquiring of a mutex). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/vireventthread.c | 22 ++++++++++++++++++++++ src/util/vireventthread.h | 2 ++ 3 files changed, 25 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index d15d6a6a9d..52a8d12d9a 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2286,6 +2286,7 @@ virEventGLibRunOnce; # util/vireventthread.h virEventThreadGetContext; virEventThreadNew; +virEventThreadStop; # util/virfcp.h diff --git a/src/util/vireventthread.c b/src/util/vireventthread.c index 20d52b9efb..131f53820f 100644 --- a/src/util/vireventthread.c +++ b/src/util/vireventthread.c @@ -188,6 +188,28 @@ virEventThreadNew(const char *name) } +/** + * virEventThreadStop: + * @evt: event thread + * + * May block until all events are processed. Typical use case is: + * + * virEventThread *evt = virEventThreadNew("name"); + * ... + * virEventThreadStop(evt); + * g_object_unref(evt); + */ +void +virEventThreadStop(virEventThread *evt) +{ + if (evt->thread) { + g_main_loop_quit(evt->loop); + g_thread_join(evt->thread); + evt->thread = NULL; + } +} + + GMainContext * virEventThreadGetContext(virEventThread *evt) { diff --git a/src/util/vireventthread.h b/src/util/vireventthread.h index 5826c25cf4..78d842b894 100644 --- a/src/util/vireventthread.h +++ b/src/util/vireventthread.h @@ -28,4 +28,6 @@ G_DECLARE_FINAL_TYPE(virEventThread, vir_event_thread, VIR, EVENT_THREAD, GObjec virEventThread *virEventThreadNew(const char *name); +void virEventThreadStop(virEventThread *evt); + GMainContext *virEventThreadGetContext(virEventThread *evt); -- 2.44.2

Currently, qemuProcessStop() unlocks given domain object right in the middle of cleanup process. This is dangerous because there might be another thread which is executing virDomainObjListAdd(). And since the domain object is on the list of domain objects AND by the time qemuProcessStop() unlocks it the object is also marked as inactive, the other thread acquires the lock and switches vm->def pointer. The unlocking of domain object is needed though, to allow even processing thread finish its queue. Well, the processing can be done before any cleanup is attempted. Therefore, use freshly introduced virEventThreadStop() to join the event thread and drop lock/unlock from the middle of qemuProcessStop(). Fixes: 3865410e7f67ca4ec66e9a905e75f452762a97f0 Resolves: https://issues.redhat.com/browse/RHEL-49607 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index cec739c984..a07ce50fbf 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -8692,6 +8692,16 @@ void qemuProcessStop(virQEMUDriver *driver, VIR_QEMU_PROCESS_KILL_FORCE| VIR_QEMU_PROCESS_KILL_NOCHECK)); + /* By unlocking the domain object the events processing thread is allowed + * to finish its job. Unlocking must happen before resetting vm->def->id as + * the global domain object list code depends on it (and it can't actually + * check 'priv->beingDestroyed as that's private). */ + if (priv->eventThread) { + virObjectUnlock(vm); + virEventThreadStop(priv->eventThread); + virObjectLock(vm); + } + if (priv->agent) { g_clear_pointer(&priv->agent, qemuAgentClose); } @@ -8723,13 +8733,13 @@ void qemuProcessStop(virQEMUDriver *driver, vm->def->id = -1; priv->beingDestroyed = false; + /* No unlocking of @vm after this point until whole cleanup is done. */ + /* Wake up anything waiting on domain condition */ virDomainObjBroadcast(vm); - /* IMPORTANT: qemuDomainObjStopWorker() unlocks @vm in order to prevent - * deadlocks with the per-VM event loop thread. This MUST be done after - * marking the VM as dead */ - qemuDomainObjStopWorker(vm); + if (priv->eventThread) + g_object_unref(g_steal_pointer(&priv->eventThread)); if (!!g_atomic_int_dec_and_test(&driver->nactive) && driver->inhibitCallback) driver->inhibitCallback(false, driver->inhibitOpaque); -- 2.44.2

On Wed, Aug 07, 2024 at 14:44:56 +0200, Michal Privoznik wrote:
Currently, qemuProcessStop() unlocks given domain object right in the middle of cleanup process. This is dangerous because there might be another thread which is executing virDomainObjListAdd(). And since the domain object is on the list of domain objects AND by the time qemuProcessStop() unlocks it the object is also marked as inactive, the other thread acquires the lock and switches vm->def pointer.
The unlocking of domain object is needed though, to allow even processing thread finish its queue. Well, the processing can be done before any cleanup is attempted.
Therefore, use freshly introduced virEventThreadStop() to join the event thread and drop lock/unlock from the middle of qemuProcessStop().
Fixes: 3865410e7f67ca4ec66e9a905e75f452762a97f0 Resolves: https://issues.redhat.com/browse/RHEL-49607 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
I've requested that the commit message carries an explanation why you can change the code to contradict what the coment, which is being deleted below states.
src/qemu/qemu_process.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index cec739c984..a07ce50fbf 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -8692,6 +8692,16 @@ void qemuProcessStop(virQEMUDriver *driver, VIR_QEMU_PROCESS_KILL_FORCE| VIR_QEMU_PROCESS_KILL_NOCHECK));
+ /* By unlocking the domain object the events processing thread is allowed + * to finish its job. Unlocking must happen before resetting vm->def->id as + * the global domain object list code depends on it (and it can't actually + * check 'priv->beingDestroyed as that's private). */ + if (priv->eventThread) {
I've also requested to explicitly set 'priv->beingDestroyed' before unlocking as qemuProcessStop may be called from code paths which don't do that. With both of the above addressed as I've requested in previous review: Reviewed-by: Peter Krempa <pkrempa@redhat.com>
participants (3)
-
Michal Privoznik
-
Michal Prívozník
-
Peter Krempa