[PATCH v2 00/13] resolve hangs/crashes on libvirtd shutdown

I keep the qemu VM event loop exiting synchronously but add code to avoid the deadlock this approach can cause. I think synchronous exiting of threads is worth having in this case to avoid crashes. Patches that have already been positively reviewed carry the appropriate 'Reviewed-by' lines.

Changes from v1:
- rename stateShutdown to stateShutdownPrepare
- introduce net daemon shutdown callbacks
- make some adjustments in terms of qemu per-VM event loop thread finishing
- factor out net server shutdown facilities into a distinct patch
- increase shutdown timeout from 15s to 30s

Nikolay Shirokovskiy (13):
  libvirt: add stateShutdownPrepare/stateShutdownWait to drivers
  util: always initialize priority condition
  util: add stop/drain functions to thread pool
  rpc: don't unref service ref on socket behalf twice
  rpc: add virNetDaemonSetShutdownCallbacks
  rpc: add shutdown facilities to netserver
  rpc: finish all threads before exiting main loop
  qemu: don't shutdown event thread in monitor EOF callback
  vireventthread: exit thread synchronously on finalize
  qemu: avoid deadlock in qemuDomainObjStopWorker
  qemu: implement driver's shutdown/shutdown wait methods
  rpc: cleanup virNetDaemonClose method
  util: remove unused virThreadPoolNew macro

 scripts/check-drivername.py | 2 +
 src/driver-state.h | 8 ++++
 src/libvirt.c | 42 ++++++++++++++++
 src/libvirt_internal.h | 2 +
 src/libvirt_private.syms | 4 ++
 src/libvirt_remote.syms | 2 +-
 src/qemu/qemu_domain.c | 18 +++++--
 src/qemu/qemu_driver.c | 32 +++++++++++++
 src/qemu/qemu_process.c | 3 --
 src/remote/remote_daemon.c | 6 +--
 src/rpc/virnetdaemon.c | 109 ++++++++++++++++++++++++++++++++++++------
 src/rpc/virnetdaemon.h | 8 +++-
 src/rpc/virnetserver.c | 8 ++++
 src/rpc/virnetserver.h | 1 +
 src/rpc/virnetserverservice.c | 1 -
 src/util/vireventthread.c | 1 +
 src/util/virthreadpool.c | 65 +++++++++++++++++--------
 src/util/virthreadpool.h | 6 +--
 18 files changed, 267 insertions(+), 51 deletions(-)

--
1.8.3.1

stateShutdownPrepare is supposed to inform driver that it will be closed soon so that the driver can prepare and finish all background threads quickly on stateShutdownWait call. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- scripts/check-drivername.py | 2 ++ src/driver-state.h | 8 ++++++++ src/libvirt.c | 42 ++++++++++++++++++++++++++++++++++++++++++ src/libvirt_internal.h | 2 ++ src/libvirt_private.syms | 2 ++ 5 files changed, 56 insertions(+) diff --git a/scripts/check-drivername.py b/scripts/check-drivername.py index 39eff83..cce8e7d 100644 --- a/scripts/check-drivername.py +++ b/scripts/check-drivername.py @@ -50,6 +50,8 @@ for drvfile in drvfiles: "virDrvStateCleanup", "virDrvStateReload", "virDrvStateStop", + "virDrvStateShutdownPrepare", + "virDrvStateShutdownWait", "virDrvConnectSupportsFeature", "virDrvConnectURIProbe", "virDrvDomainMigratePrepare", diff --git a/src/driver-state.h b/src/driver-state.h index 6b3f501..767d8e8 100644 --- a/src/driver-state.h +++ b/src/driver-state.h @@ -45,6 +45,12 @@ typedef int typedef int (*virDrvStateStop)(void); +typedef int +(*virDrvStateShutdownPrepare)(void); + +typedef int +(*virDrvStateShutdownWait)(void); + typedef struct _virStateDriver virStateDriver; typedef virStateDriver *virStateDriverPtr; @@ -55,4 +61,6 @@ struct _virStateDriver { virDrvStateCleanup stateCleanup; virDrvStateReload stateReload; virDrvStateStop stateStop; + virDrvStateShutdownPrepare stateShutdownPrepare; + virDrvStateShutdownWait stateShutdownWait; }; diff --git a/src/libvirt.c b/src/libvirt.c index b2d0ba3..c5089ea 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -673,6 +673,48 @@ virStateInitialize(bool privileged, /** + * virStateShutdownPrepare: + * + * Run each virtualization driver's shutdown prepare method. + * + * Returns 0 if all succeed, -1 upon any failure. + */ +int +virStateShutdownPrepare(void) +{ + size_t i; + + for (i = 0; i < virStateDriverTabCount; i++) { + if (virStateDriverTab[i]->stateShutdownPrepare && + virStateDriverTab[i]->stateShutdownPrepare() < 0) + return -1; + } + return 0; +} + + +/** + * virStateShutdownWait: + * + * Run each virtualization driver's shutdown wait method. + * + * Returns 0 if all succeed, -1 upon any failure. + */ +int +virStateShutdownWait(void) +{ + size_t i; + + for (i = 0; i < virStateDriverTabCount; i++) { + if (virStateDriverTab[i]->stateShutdownWait && + virStateDriverTab[i]->stateShutdownWait() < 0) + return -1; + } + return 0; +} + + +/** * virStateCleanup: * * Run each virtualization driver's cleanup method. diff --git a/src/libvirt_internal.h b/src/libvirt_internal.h index 72c6127..e27030b 100644 --- a/src/libvirt_internal.h +++ b/src/libvirt_internal.h @@ -34,6 +34,8 @@ int virStateInitialize(bool privileged, const char *root, virStateInhibitCallback inhibit, void *opaque); +int virStateShutdownPrepare(void); +int virStateShutdownWait(void); int virStateCleanup(void); int virStateReload(void); int virStateStop(void); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 73b72c9..cf3ccab 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1503,6 +1503,8 @@ virSetSharedStorageDriver; virStateCleanup; virStateInitialize; virStateReload; +virStateShutdownPrepare; +virStateShutdownWait; virStateStop; virStreamInData; -- 1.8.3.1
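As a rough illustration of the intended division of labour, here is a minimal sketch with a hypothetical "foo" driver (not code from this series; the real qemu implementation arrives in patch 11, and the thread-pool helpers it relies on are added in patch 3): stateShutdownPrepare should only signal background machinery to wind down, while stateShutdownWait blocks until it actually has.

#include "driver-state.h"
#include "virthreadpool.h"

/* Hypothetical driver-global worker pool; the name is illustrative only. */
static virThreadPoolPtr fooWorkerPool;

static int
fooStateShutdownPrepare(void)
{
    /* Non-blocking: ask workers to quit once their current job is done. */
    virThreadPoolStop(fooWorkerPool);
    return 0;
}

static int
fooStateShutdownWait(void)
{
    /* Blocking: return only when every worker thread has exited. */
    virThreadPoolDrain(fooWorkerPool);
    return 0;
}

static virStateDriver fooStateDriver = {
    .name = "FOO",
    /* ... stateInitialize, stateCleanup, etc. as usual ... */
    .stateShutdownPrepare = fooStateShutdownPrepare,
    .stateShutdownWait = fooStateShutdownWait,
};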

Even if we have no priority threads on pool creation we can add them thru virThreadPoolSetParameters later. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- src/util/virthreadpool.c | 22 +++++++--------------- 1 file changed, 7 insertions(+), 15 deletions(-) diff --git a/src/util/virthreadpool.c b/src/util/virthreadpool.c index 379d236..10a44de 100644 --- a/src/util/virthreadpool.c +++ b/src/util/virthreadpool.c @@ -245,6 +245,8 @@ virThreadPoolNewFull(size_t minWorkers, goto error; if (virCondInit(&pool->cond) < 0) goto error; + if (virCondInit(&pool->prioCond) < 0) + goto error; if (virCondInit(&pool->quit_cond) < 0) goto error; @@ -255,13 +257,8 @@ virThreadPoolNewFull(size_t minWorkers, if (virThreadPoolExpand(pool, minWorkers, false) < 0) goto error; - if (prioWorkers) { - if (virCondInit(&pool->prioCond) < 0) - goto error; - - if (virThreadPoolExpand(pool, prioWorkers, true) < 0) - goto error; - } + if (virThreadPoolExpand(pool, prioWorkers, true) < 0) + goto error; return pool; @@ -274,7 +271,6 @@ virThreadPoolNewFull(size_t minWorkers, void virThreadPoolFree(virThreadPoolPtr pool) { virThreadPoolJobPtr job; - bool priority = false; if (!pool) return; @@ -283,10 +279,8 @@ void virThreadPoolFree(virThreadPoolPtr pool) pool->quit = true; if (pool->nWorkers > 0) virCondBroadcast(&pool->cond); - if (pool->nPrioWorkers > 0) { - priority = true; + if (pool->nPrioWorkers > 0) virCondBroadcast(&pool->prioCond); - } while (pool->nWorkers > 0 || pool->nPrioWorkers > 0) ignore_value(virCondWait(&pool->quit_cond, &pool->mutex)); @@ -301,10 +295,8 @@ void virThreadPoolFree(virThreadPoolPtr pool) virMutexDestroy(&pool->mutex); virCondDestroy(&pool->quit_cond); virCondDestroy(&pool->cond); - if (priority) { - VIR_FREE(pool->prioWorkers); - virCondDestroy(&pool->prioCond); - } + VIR_FREE(pool->prioWorkers); + virCondDestroy(&pool->prioCond); VIR_FREE(pool); } -- 1.8.3.1

Stop just signals the threads to exit once they finish their current task. Drain waits until all threads have finished. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- src/libvirt_private.syms | 2 ++ src/util/virthreadpool.c | 43 ++++++++++++++++++++++++++++++++++++++----- src/util/virthreadpool.h | 3 +++ 3 files changed, 43 insertions(+), 5 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index cf3ccab..0a9afc5 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -3328,6 +3328,7 @@ virThreadJobSetWorker; # util/virthreadpool.h +virThreadPoolDrain; virThreadPoolFree; virThreadPoolGetCurrentWorkers; virThreadPoolGetFreeWorkers; @@ -3338,6 +3339,7 @@ virThreadPoolGetPriorityWorkers; virThreadPoolNewFull; virThreadPoolSendJob; virThreadPoolSetParameters; +virThreadPoolStop; # util/virtime.h diff --git a/src/util/virthreadpool.c b/src/util/virthreadpool.c index 10a44de..ca44f55 100644 --- a/src/util/virthreadpool.c +++ b/src/util/virthreadpool.c @@ -268,19 +268,27 @@ virThreadPoolNewFull(size_t minWorkers, } -void virThreadPoolFree(virThreadPoolPtr pool) -{ - virThreadPoolJobPtr job; - if (!pool) +static void +virThreadPoolStopLocked(virThreadPoolPtr pool) +{ + if (pool->quit) return; - virMutexLock(&pool->mutex); pool->quit = true; if (pool->nWorkers > 0) virCondBroadcast(&pool->cond); if (pool->nPrioWorkers > 0) virCondBroadcast(&pool->prioCond); +} + + +static void +virThreadPoolDrainLocked(virThreadPoolPtr pool) +{ + virThreadPoolJobPtr job; + + virThreadPoolStopLocked(pool); while (pool->nWorkers > 0 || pool->nPrioWorkers > 0) ignore_value(virCondWait(&pool->quit_cond, &pool->mutex)); @@ -289,6 +297,15 @@ void virThreadPoolFree(virThreadPoolPtr pool) pool->jobList.head = pool->jobList.head->next; VIR_FREE(job); } +} + +void virThreadPoolFree(virThreadPoolPtr pool) +{ + if (!pool) + return; + + virMutexLock(&pool->mutex); + virThreadPoolDrainLocked(pool); VIR_FREE(pool->workers); virMutexUnlock(&pool->mutex); @@ -475,3 +492,19 @@ virThreadPoolSetParameters(virThreadPoolPtr pool, virMutexUnlock(&pool->mutex); return -1; } + +void +virThreadPoolStop(virThreadPoolPtr pool) +{ + virMutexLock(&pool->mutex); + virThreadPoolStopLocked(pool); + virMutexUnlock(&pool->mutex); +} + +void +virThreadPoolDrain(virThreadPoolPtr pool) +{ + virMutexLock(&pool->mutex); + virThreadPoolDrainLocked(pool); + virMutexUnlock(&pool->mutex); +} diff --git a/src/util/virthreadpool.h b/src/util/virthreadpool.h index c97d9b3..dd1aaf3 100644 --- a/src/util/virthreadpool.h +++ b/src/util/virthreadpool.h @@ -56,3 +56,6 @@ int virThreadPoolSetParameters(virThreadPoolPtr pool, long long int minWorkers, long long int maxWorkers, long long int prioWorkers); + +void virThreadPoolStop(virThreadPoolPtr pool); +void virThreadPoolDrain(virThreadPoolPtr pool); -- 1.8.3.1
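To make the calling convention concrete, a minimal usage sketch (the pool variable and function names are illustrative, not from this series): virThreadPoolStop is the cheap, non-blocking half intended for the prepare phase of shutdown, virThreadPoolDrain the blocking half for the wait phase.

#include "virthreadpool.h"

static virThreadPoolPtr workerPool;  /* illustrative pool owned by some server */

static void
serverShutdownPrepare(void)
{
    /* Wake all workers and mark the pool as quitting; each worker exits as
     * soon as it finishes the job it is currently processing. */
    virThreadPoolStop(workerPool);
}

static void
serverShutdownWait(void)
{
    /* Block until nWorkers and nPrioWorkers both drop to zero, then free any
     * jobs still sitting in the queue. */
    virThreadPoolDrain(workerPool);
}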

The second unref was added in [1]. We don't actually need it: since we pass a free callback to virNetSocketAddIOCallback, the extra ref taken on behalf of the callback is dropped when we call virNetSocketRemoveIOCallback, without any extra effort. [1] 355d8f470f9: virNetServerServiceClose: Don't leak sockets Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- src/rpc/virnetserverservice.c | 1 - 1 file changed, 1 deletion(-) diff --git a/src/rpc/virnetserverservice.c b/src/rpc/virnetserverservice.c index 9d5df45..e4165ea 100644 --- a/src/rpc/virnetserverservice.c +++ b/src/rpc/virnetserverservice.c @@ -449,6 +449,5 @@ void virNetServerServiceClose(virNetServerServicePtr svc) for (i = 0; i < svc->nsocks; i++) { virNetSocketRemoveIOCallback(svc->socks[i]); virNetSocketClose(svc->socks[i]); - virObjectUnref(svc); } } -- 1.8.3.1

The function is used to set shutdown prepare and wait callbacks. Prepare callback is used to inform other threads of the daemon that the daemon will be closed soon so that they can start to shutdown. Wait callback is used to wait for other threads to actually finish. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/libvirt_remote.syms | 1 + src/rpc/virnetdaemon.c | 15 +++++++++++++++ src/rpc/virnetdaemon.h | 6 ++++++ 3 files changed, 22 insertions(+) diff --git a/src/libvirt_remote.syms b/src/libvirt_remote.syms index 0018a0c..335e431 100644 --- a/src/libvirt_remote.syms +++ b/src/libvirt_remote.syms @@ -87,6 +87,7 @@ virNetDaemonPreExecRestart; virNetDaemonQuit; virNetDaemonRemoveShutdownInhibition; virNetDaemonRun; +virNetDaemonSetShutdownCallbacks; virNetDaemonUpdateServices; diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index bb81a43..5a5643c 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -69,6 +69,8 @@ struct _virNetDaemon { virHashTablePtr servers; virJSONValuePtr srvObject; + virNetDaemonShutdownCallback shutdownPrepareCb; + virNetDaemonShutdownCallback shutdownWaitCb; bool quit; unsigned int autoShutdownTimeout; @@ -922,3 +924,16 @@ virNetDaemonHasClients(virNetDaemonPtr dmn) return ret; } + +void +virNetDaemonSetShutdownCallbacks(virNetDaemonPtr dmn, + virNetDaemonShutdownCallback prepareCb, + virNetDaemonShutdownCallback waitCb) +{ + virObjectLock(dmn); + + dmn->shutdownPrepareCb = prepareCb; + dmn->shutdownWaitCb = waitCb; + + virObjectUnlock(dmn); +} diff --git a/src/rpc/virnetdaemon.h b/src/rpc/virnetdaemon.h index c2c7767..9f7a282 100644 --- a/src/rpc/virnetdaemon.h +++ b/src/rpc/virnetdaemon.h @@ -82,3 +82,9 @@ virNetServerPtr virNetDaemonGetServer(virNetDaemonPtr dmn, ssize_t virNetDaemonGetServers(virNetDaemonPtr dmn, virNetServerPtr **servers); bool virNetDaemonHasServer(virNetDaemonPtr dmn, const char *serverName); + +typedef int (*virNetDaemonShutdownCallback)(void); + +void virNetDaemonSetShutdownCallbacks(virNetDaemonPtr dmn, + virNetDaemonShutdownCallback prepareCb, + virNetDaemonShutdownCallback waitCb); -- 1.8.3.1
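Usage-wise, the daemon is expected to register both callbacks before entering its main loop; patch 7 of this series wires it up in remote_daemon.c essentially like this (sketch of the relevant lines):

    virNetDaemonSetShutdownCallbacks(dmn,
                                     virStateShutdownPrepare,
                                     virStateShutdownWait);
    virNetDaemonRun(dmn);

The prepare callback is then invoked from the main loop once quit has been requested, and the wait callback runs in a dedicated thread so the event loop can keep servicing the threads that are being joined.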

On 7/23/20 7:14 AM, Nikolay Shirokovskiy wrote:
The function is used to set shutdown prepare and wait callbacks. Prepare callback is used to inform other threads of the daemon that the daemon will be closed soon so that they can start to shutdown. Wait callback is used to wait for other threads to actually finish.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> ---
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
src/libvirt_remote.syms | 1 + src/rpc/virnetdaemon.c | 15 +++++++++++++++ src/rpc/virnetdaemon.h | 6 ++++++ 3 files changed, 22 insertions(+)
diff --git a/src/libvirt_remote.syms b/src/libvirt_remote.syms index 0018a0c..335e431 100644 --- a/src/libvirt_remote.syms +++ b/src/libvirt_remote.syms @@ -87,6 +87,7 @@ virNetDaemonPreExecRestart; virNetDaemonQuit; virNetDaemonRemoveShutdownInhibition; virNetDaemonRun; +virNetDaemonSetShutdownCallbacks; virNetDaemonUpdateServices;
diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index bb81a43..5a5643c 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -69,6 +69,8 @@ struct _virNetDaemon { virHashTablePtr servers; virJSONValuePtr srvObject;
+ virNetDaemonShutdownCallback shutdownPrepareCb; + virNetDaemonShutdownCallback shutdownWaitCb; bool quit;
unsigned int autoShutdownTimeout; @@ -922,3 +924,16 @@ virNetDaemonHasClients(virNetDaemonPtr dmn)
return ret; } + +void +virNetDaemonSetShutdownCallbacks(virNetDaemonPtr dmn, + virNetDaemonShutdownCallback prepareCb, + virNetDaemonShutdownCallback waitCb) +{ + virObjectLock(dmn); + + dmn->shutdownPrepareCb = prepareCb; + dmn->shutdownWaitCb = waitCb; + + virObjectUnlock(dmn); +} diff --git a/src/rpc/virnetdaemon.h b/src/rpc/virnetdaemon.h index c2c7767..9f7a282 100644 --- a/src/rpc/virnetdaemon.h +++ b/src/rpc/virnetdaemon.h @@ -82,3 +82,9 @@ virNetServerPtr virNetDaemonGetServer(virNetDaemonPtr dmn, ssize_t virNetDaemonGetServers(virNetDaemonPtr dmn, virNetServerPtr **servers); bool virNetDaemonHasServer(virNetDaemonPtr dmn, const char *serverName); + +typedef int (*virNetDaemonShutdownCallback)(void); + +void virNetDaemonSetShutdownCallbacks(virNetDaemonPtr dmn, + virNetDaemonShutdownCallback prepareCb, + virNetDaemonShutdownCallback waitCb);

virNetServerClose and virNetServerShutdownWait are used during the net daemon shutdown procedure to start shutting down the net server's threads and to wait for those threads to actually finish, respectively. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- src/rpc/virnetserver.c | 8 ++++++++ src/rpc/virnetserver.h | 1 + 2 files changed, 9 insertions(+) diff --git a/src/rpc/virnetserver.c b/src/rpc/virnetserver.c index e0a2386..79ea9f6 100644 --- a/src/rpc/virnetserver.c +++ b/src/rpc/virnetserver.c @@ -942,9 +942,17 @@ void virNetServerClose(virNetServerPtr srv) for (i = 0; i < srv->nclients; i++) virNetServerClientClose(srv->clients[i]); + virThreadPoolStop(srv->workers); + virObjectUnlock(srv); } +void +virNetServerShutdownWait(virNetServerPtr srv) +{ + virThreadPoolDrain(srv->workers); +} + static inline size_t virNetServerTrackPendingAuthLocked(virNetServerPtr srv) { diff --git a/src/rpc/virnetserver.h b/src/rpc/virnetserver.h index 1c6a2ef..112a51d 100644 --- a/src/rpc/virnetserver.h +++ b/src/rpc/virnetserver.h @@ -56,6 +56,7 @@ virNetServerPtr virNetServerNewPostExecRestart(virJSONValuePtr object, ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5) ATTRIBUTE_NONNULL(6); void virNetServerClose(virNetServerPtr srv); +void virNetServerShutdownWait(virNetServerPtr srv); virJSONValuePtr virNetServerPreExecRestart(virNetServerPtr srv); -- 1.8.3.1

Currently we have issues like [1] on libvirtd shutdown, as we clean up while RPC and other threads are still running. Let's finish all threads other than the main one before cleanup. The approach to finishing threads is suggested in [2]. In order to finish RPC threads serving API calls we let the event loop run but stop accepting new API calls and block processing of any pending API calls. We also inform all drivers of shutdown so they can prepare for shutdown too. Then we wait for all RPC threads and the drivers' background threads to finish. If finishing takes more than 30s we just exit, as we can't safely clean up in time. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1828207 [2] https://www.redhat.com/archives/libvir-list/2020-April/msg01328.html Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- src/remote/remote_daemon.c | 6 ++-- src/rpc/virnetdaemon.c | 81 +++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 83 insertions(+), 4 deletions(-) diff --git a/src/remote/remote_daemon.c b/src/remote/remote_daemon.c index 1aa9bfc..2ac4f6c 100644 --- a/src/remote/remote_daemon.c +++ b/src/remote/remote_daemon.c @@ -1193,6 +1193,9 @@ int main(int argc, char **argv) { #endif /* Run event loop. */ + virNetDaemonSetShutdownCallbacks(dmn, + virStateShutdownPrepare, + virStateShutdownWait); virNetDaemonRun(dmn); ret = 0; @@ -1201,9 +1204,6 @@ int main(int argc, char **argv) { 0, "shutdown", NULL, NULL); cleanup: - /* Keep cleanup order in inverse order of startup */ - virNetDaemonClose(dmn); - virNetlinkEventServiceStopAll(); if (driversInitialized) { diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index 5a5643c..1d7f00d 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -71,7 +71,10 @@ struct _virNetDaemon { virNetDaemonShutdownCallback shutdownPrepareCb; virNetDaemonShutdownCallback shutdownWaitCb; + int finishTimer; bool quit; + bool finished; + bool graceful; unsigned int autoShutdownTimeout; size_t autoShutdownInhibitions; @@ -82,6 +85,11 @@ struct _virNetDaemon { static virClassPtr virNetDaemonClass; +static int +daemonServerClose(void *payload, + const void *key G_GNUC_UNUSED, + void *opaque G_GNUC_UNUSED); + static void virNetDaemonDispose(void *obj) { @@ -798,11 +806,53 @@ daemonServerProcessClients(void *payload, return 0; } +static int +daemonServerShutdownWait(void *payload, + const void *key G_GNUC_UNUSED, + void *opaque G_GNUC_UNUSED) +{ + virNetServerPtr srv = payload; + + virNetServerShutdownWait(srv); + return 0; +} + +static void +daemonShutdownWait(void *opaque) +{ + virNetDaemonPtr dmn = opaque; + bool graceful = false; + + virHashForEach(dmn->servers, daemonServerShutdownWait, NULL); + if (dmn->shutdownWaitCb && dmn->shutdownWaitCb() < 0) + goto finish; + + graceful = true; + + finish: + virObjectLock(dmn); + dmn->graceful = graceful; + virEventUpdateTimeout(dmn->finishTimer, 0); + virObjectUnlock(dmn); +} + +static void +virNetDaemonFinishTimer(int timerid G_GNUC_UNUSED, + void *opaque) +{ + virNetDaemonPtr dmn = opaque; + + virObjectLock(dmn); + dmn->finished = true; + virObjectUnlock(dmn); +} + void virNetDaemonRun(virNetDaemonPtr dmn) { int timerid = -1; bool timerActive = false; + virThread shutdownThread; virObjectLock(dmn); @@ -813,6 +863,9 @@ virNetDaemonRun(virNetDaemonPtr dmn) } dmn->quit = false; + dmn->finishTimer = -1; + dmn->finished = false; + dmn->graceful = false; if (dmn->autoShutdownTimeout && (timerid = virEventAddTimeout(-1, @@ -828,7 +881,7 @@
virNetDaemonRun(virNetDaemonPtr dmn) virSystemdNotifyStartup(); VIR_DEBUG("dmn=%p quit=%d", dmn, dmn->quit); - while (!dmn->quit) { + while (!dmn->finished) { /* A shutdown timeout is specified, so check * if any drivers have active state, if not * shutdown after timeout seconds @@ -859,6 +912,32 @@ virNetDaemonRun(virNetDaemonPtr dmn) virObjectLock(dmn); virHashForEach(dmn->servers, daemonServerProcessClients, NULL); + + if (dmn->quit && dmn->finishTimer == -1) { + virHashForEach(dmn->servers, daemonServerClose, NULL); + if (dmn->shutdownPrepareCb && dmn->shutdownPrepareCb() < 0) + break; + + if ((dmn->finishTimer = virEventAddTimeout(30 * 1000, + virNetDaemonFinishTimer, + dmn, NULL)) < 0) { + VIR_WARN("Failed to register finish timer."); + break; + } + + if (virThreadCreateFull(&shutdownThread, true, daemonShutdownWait, + "daemon-shutdown", false, dmn) < 0) { + VIR_WARN("Failed to register join thread."); + break; + } + } + } + + if (dmn->graceful) { + virThreadJoin(&shutdownThread); + } else { + VIR_WARN("Make forcefull daemon shutdown"); + exit(EXIT_FAILURE); } cleanup: -- 1.8.3.1

On Thu, Jul 23, 2020 at 01:14:07PM +0300, Nikolay Shirokovskiy wrote:
Currently we have issues like [1] on libvirtd shutdown as we cleanup while RPC and other threads are still running. Let's finish all threads other then main before cleanup.
The approach to finish threads is suggested in [2]. In order to finish RPC threads serving API calls we let the event loop run but stop accepting new API calls and block processing any pending API calls. We also inform all drivers of shutdown so they can prepare for shutdown too. Then we wait for all RPC threads and driver's background thread to finish. If finishing takes more then 15s we just exit as we can't safely cleanup in time.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1828207 [2] https://www.redhat.com/archives/libvir-list/2020-April/msg01328.html
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- src/remote/remote_daemon.c | 6 ++-- src/rpc/virnetdaemon.c | 81 +++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 83 insertions(+), 4 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

This hunk was introduced in [1] in order to avoid losing events from the monitor when stopping the qemu process. But as explained in [2], on destroy we get neither EOF nor any other events, as the monitor is simply closed. In case of crash/shutdown we won't get any more events either, and qemuDomainObjStopWorker will eventually be called by qemuProcessStop. Thus let's remove qemuDomainObjStopWorker from qemuProcessHandleMonitorEOF as it is not useful anymore. [1] e6afacb0f: qemu: start/stop an event loop thread for domains [2] d2954c072: qemu: ensure domain event thread is always stopped Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_process.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index e8b15ee..44098fe 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -322,9 +322,6 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon, qemuDomainDestroyNamespace(driver, vm); cleanup: - /* Now we got EOF we're not expecting more I/O, so we - * can finally kill the event thread */ - qemuDomainObjStopWorker(vm); virObjectUnlock(vm); } -- 1.8.3.1

On 7/23/20 7:14 AM, Nikolay Shirokovskiy wrote:
This hunk was introduced in [1] in order to avoid loosing events from monitor on stopping qemu process. But as explained in [2] on destroy we won't get neither EOF nor any other events as monitor is just closed. In case of crash/shutdown we won't get any more events as well and qemuDomainObjStopWorker will be called by qemuProcessStop eventually. Thus let's remove qemuDomainObjStopWorker from qemuProcessHandleMonitorEOF as it is not useful anymore.
[1] e6afacb0f: qemu: start/stop an event loop thread for domains [2] d2954c072: qemu: ensure domain event thread is always stopped
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> ---
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
src/qemu/qemu_process.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index e8b15ee..44098fe 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -322,9 +322,6 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon, qemuDomainDestroyNamespace(driver, vm);
cleanup: - /* Now we got EOF we're not expecting more I/O, so we - * can finally kill the event thread */ - qemuDomainObjStopWorker(vm); virObjectUnlock(vm); }

On Thu, Jul 23, 2020 at 01:14:08PM +0300, Nikolay Shirokovskiy wrote:
This hunk was introduced in [1] in order to avoid loosing events from monitor on stopping qemu process. But as explained in [2] on destroy we won't get neither EOF nor any other events as monitor is just closed. In case of crash/shutdown we won't get any more events as well and qemuDomainObjStopWorker will be called by qemuProcessStop eventually. Thus let's remove qemuDomainObjStopWorker from qemuProcessHandleMonitorEOF as it is not useful anymore.
[1] e6afacb0f: qemu: start/stop an event loop thread for domains [2] d2954c072: qemu: ensure domain event thread is always stopped
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_process.c | 3 --- 1 file changed, 3 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

It useful to be sure no thread is running after we drop all references to virEventThread. Otherwise in order to avoid crashes we need to synchronize some other way or we make extra references in event handler callbacks to all the object in use. And some of them are not prepared to be refcounted. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/util/vireventthread.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/util/vireventthread.c b/src/util/vireventthread.c index cf86592..136a550 100644 --- a/src/util/vireventthread.c +++ b/src/util/vireventthread.c @@ -43,6 +43,7 @@ vir_event_thread_finalize(GObject *object) if (evt->thread) { g_main_loop_quit(evt->loop); + g_thread_join(evt->thread); g_thread_unref(evt->thread); } -- 1.8.3.1

On 7/23/20 7:14 AM, Nikolay Shirokovskiy wrote:
It useful to be sure no thread is running after we drop all references to
Nit: "It is useful to ..." Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
virEventThread. Otherwise in order to avoid crashes we need to synchronize some other way or we make extra references in event handler callbacks to all the object in use. And some of them are not prepared to be refcounted.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/util/vireventthread.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/src/util/vireventthread.c b/src/util/vireventthread.c index cf86592..136a550 100644 --- a/src/util/vireventthread.c +++ b/src/util/vireventthread.c @@ -43,6 +43,7 @@ vir_event_thread_finalize(GObject *object)
if (evt->thread) { g_main_loop_quit(evt->loop); + g_thread_join(evt->thread); g_thread_unref(evt->thread); }

On Thu, Jul 23, 2020 at 01:14:09PM +0300, Nikolay Shirokovskiy wrote:
It useful to be sure no thread is running after we drop all references to virEventThread. Otherwise in order to avoid crashes we need to synchronize some other way or we make extra references in event handler callbacks to all the object in use. And some of them are not prepared to be refcounted.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/util/vireventthread.c | 1 + 1 file changed, 1 insertion(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

We are dropping the only reference here so that the event loop thread exits synchronously. In order to avoid deadlocks we need to unlock the VM so that any handler currently being called can finish execution and thus the event loop thread can finish too. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_domain.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 5b22eb2..82b3d11 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1637,11 +1637,21 @@ void qemuDomainObjStopWorker(virDomainObjPtr dom) { qemuDomainObjPrivatePtr priv = dom->privateData; + virEventThread *eventThread; - if (priv->eventThread) { - g_object_unref(priv->eventThread); - priv->eventThread = NULL; - } + if (!priv->eventThread) + return; + + /* + * We are dropping the only reference here so that the event loop thread + * is going to be exited synchronously. In order to avoid deadlocks we + * need to unlock the VM so that any handler being called can finish + * execution and thus even loop thread be finished too. + */ + eventThread = g_steal_pointer(&priv->eventThread); + virObjectUnlock(dom); + g_object_unref(eventThread); + virObjectLock(dom); } -- 1.8.3.1

On 7/23/20 7:14 AM, Nikolay Shirokovskiy wrote:
We are dropping the only reference here so that the event loop thread is going to be exited synchronously. In order to avoid deadlocks we need to unlock the VM so that any handler being called can finish execution and thus even loop thread be finished too.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_domain.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 5b22eb2..82b3d11 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1637,11 +1637,21 @@ void qemuDomainObjStopWorker(virDomainObjPtr dom) { qemuDomainObjPrivatePtr priv = dom->privateData; + virEventThread *eventThread;
- if (priv->eventThread) { - g_object_unref(priv->eventThread); - priv->eventThread = NULL; - } + if (!priv->eventThread) + return; + + /* + * We are dropping the only reference here so that the event loop thread + * is going to be exited synchronously. In order to avoid deadlocks we + * need to unlock the VM so that any handler being called can finish + * execution and thus even loop thread be finished too. + */ + eventThread = g_steal_pointer(&priv->eventThread); + virObjectUnlock(dom); + g_object_unref(eventThread); + virObjectLock(dom);
I understand the intention of the code thanks to the comment just before it, but this is not robust. This unlocking -> do stuff -> lock will only work if the caller of qemuDomainObjStopWorker() is holding the mutex. I see that this is the case for qemuDomainObjStopWorkerIter(), but qemuDomainObjStopWorker() is also called by qemuProcessStop(). qemuProcessStop() does not make any mutex lock/unlock, so you'll be assuming that all callers of qemuProcessStop() will hold the mutex before calling it. One of them is qemuProcessInit(), which calls qemuProcessStop() in the 'stop' jump, and there is no mutex lock beforehand as well. Now we're assuming that all callers of qemuProcessInit() are holding the mutex lock beforehand. In a quick glance in the code I saw at least 2 instances that this isn't the case, then we'll need to assume that the callers of those functions are holding the mutex lock. So on and so forth. Given that this code only makes sense when called from qemuDomainObjStopWorkerIter(), I'd suggest removing the lock/unlock of this function (turning it into just a call to qemuDomainObjStopWorker()) and move them inside the body of qemuDomainObjStopWorker(), locking and unlocking the mutex inside the scope of the same function. Thanks, DHB
}

On 10.08.2020 20:40, Daniel Henrique Barboza wrote:
On 7/23/20 7:14 AM, Nikolay Shirokovskiy wrote:
We are dropping the only reference here so that the event loop thread is going to be exited synchronously. In order to avoid deadlocks we need to unlock the VM so that any handler being called can finish execution and thus even loop thread be finished too.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_domain.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 5b22eb2..82b3d11 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1637,11 +1637,21 @@ void qemuDomainObjStopWorker(virDomainObjPtr dom) { qemuDomainObjPrivatePtr priv = dom->privateData; + virEventThread *eventThread; - if (priv->eventThread) { - g_object_unref(priv->eventThread); - priv->eventThread = NULL; - } + if (!priv->eventThread) + return; + + /* + * We are dropping the only reference here so that the event loop thread + * is going to be exited synchronously. In order to avoid deadlocks we + * need to unlock the VM so that any handler being called can finish + * execution and thus even loop thread be finished too. + */ + eventThread = g_steal_pointer(&priv->eventThread); + virObjectUnlock(dom); + g_object_unref(eventThread); + virObjectLock(dom);
I understand the intention of the code thanks to the comment just before it, but this is not robust. This unlocking -> do stuff -> lock will only work if the caller of qemuDomainObjStopWorker() is holding the mutex.
I see that this is the case for qemuDomainObjStopWorkerIter(), but qemuDomainObjStopWorker() is also called by qemuProcessStop(). qemuProcessStop() does not make any mutex lock/unlock, so you'll be assuming that all callers of qemuProcessStop() will hold the mutex before calling it. One of them is qemuProcessInit(), which calls qemuProcessStop() in the 'stop' jump, and there is no mutex lock beforehand as well.
Now we're assuming that all callers of qemuProcessInit() are holding the mutex lock beforehand. In a quick glance in the code I saw at least 2 instances that this isn't the case, then we'll need to assume that the callers of those functions are holding the mutex lock. So on and so forth.
Given that this code only makes sense when called from qemuDomainObjStopWorkerIter(), I'd suggest removing the lock/unlock of this function (turning it into just a call to qemuDomainObjStopWorker()) and move them inside the body of qemuDomainObjStopWorker(), locking and unlocking the mutex inside the scope of the same function.
Hi, Daniel. Actually all callers of qemuProcessStop hold the lock. Moreover they even hold the job condition. And calling qemuProcessStop without the lock/job condition would be a mistake AFAIU (qemuConnectDomainXMLToNative is the only exception I see that holds the lock but not the job condition, but this function creates a new vm object that is not shared with other threads). I understand your concern but there are precedents. Take a look, for example, at virDomainObjListRemove. The lock requirements are documented for virDomainObjListRemove and I can do it for qemuDomainObjStopWorker too, but it looks a bit like over-documenting to me. Nikolay

On 8/11/20 3:39 AM, Nikolay Shirokovskiy wrote:
On 10.08.2020 20:40, Daniel Henrique Barboza wrote:
On 7/23/20 7:14 AM, Nikolay Shirokovskiy wrote:
We are dropping the only reference here so that the event loop thread is going to be exited synchronously. In order to avoid deadlocks we need to unlock the VM so that any handler being called can finish execution and thus even loop thread be finished too.
Given that this code only makes sense when called from qemuDomainObjStopWorkerIter(), I'd suggest removing the lock/unlock of this function (turning it into just a call to qemuDomainObjStopWorker()) and move them inside the body of qemuDomainObjStopWorker(), locking and unlocking the mutex inside the scope of the same function.
Hi, Daniel.
Actually all callers of qemuProcessStop hold the lock. Moreover they even hold job condition. And calling qemuProcessStop without lock/job condition would be a mistake AFIU (qemuConnectDomainXMLToNative is the only exception I see that holds the lock but not the job condition. But this function creates new vm object that is not shared with other threads) I understand you concern but there are precedents. Take a look for example virDomainObjListRemove. The lock requirements are documented for virDomainObjListRemove and I can do it for qemuDomainObjStopWorker too but it looks a bit over documenting to me.
Got it. Thanks for explaining.
Nikolay

On 7/23/20 7:14 AM, Nikolay Shirokovskiy wrote:
We are dropping the only reference here so that the event loop thread is going to be exited synchronously. In order to avoid deadlocks we need to unlock the VM so that any handler being called can finish execution and thus even loop thread be finished too.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> ---
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
src/qemu/qemu_domain.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 5b22eb2..82b3d11 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1637,11 +1637,21 @@ void qemuDomainObjStopWorker(virDomainObjPtr dom) { qemuDomainObjPrivatePtr priv = dom->privateData; + virEventThread *eventThread;
- if (priv->eventThread) { - g_object_unref(priv->eventThread); - priv->eventThread = NULL; - } + if (!priv->eventThread) + return; + + /* + * We are dropping the only reference here so that the event loop thread + * is going to be exited synchronously. In order to avoid deadlocks we + * need to unlock the VM so that any handler being called can finish + * execution and thus even loop thread be finished too. + */ + eventThread = g_steal_pointer(&priv->eventThread); + virObjectUnlock(dom); + g_object_unref(eventThread); + virObjectLock(dom); }

On Thu, Jul 23, 2020 at 01:14:10PM +0300, Nikolay Shirokovskiy wrote:
We are dropping the only reference here so that the event loop thread is going to be exited synchronously. In order to avoid deadlocks we need to unlock the VM so that any handler being called can finish execution and thus even loop thread be finished too.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_domain.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 5b22eb2..82b3d11 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1637,11 +1637,21 @@ void qemuDomainObjStopWorker(virDomainObjPtr dom) { qemuDomainObjPrivatePtr priv = dom->privateData; + virEventThread *eventThread;
- if (priv->eventThread) { - g_object_unref(priv->eventThread); - priv->eventThread = NULL; - } + if (!priv->eventThread) + return; + + /* + * We are dropping the only reference here so that the event loop thread + * is going to be exited synchronously. In order to avoid deadlocks we + * need to unlock the VM so that any handler being called can finish + * execution and thus even loop thread be finished too. + */ + eventThread = g_steal_pointer(&priv->eventThread); + virObjectUnlock(dom); + g_object_unref(eventThread); + virObjectLock(dom);
Hmm, I'm really not at all comfortable with this. I don't have confidence that qemuProcessStop() is safe if we drop & reacquire the lock halfway through its cleanup process. The 'virDomainObj' is in a semi-clean state, 1/2 running 1/2 shutoff. This is the key reason I think I didn't do the synchronous thread join in the event loop. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Fri, Sep 04, 2020 at 05:46:18PM +0100, Daniel P. Berrangé wrote:
On Thu, Jul 23, 2020 at 01:14:10PM +0300, Nikolay Shirokovskiy wrote:
We are dropping the only reference here so that the event loop thread is going to be exited synchronously. In order to avoid deadlocks we need to unlock the VM so that any handler being called can finish execution and thus even loop thread be finished too.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_domain.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 5b22eb2..82b3d11 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1637,11 +1637,21 @@ void qemuDomainObjStopWorker(virDomainObjPtr dom) { qemuDomainObjPrivatePtr priv = dom->privateData; + virEventThread *eventThread;
- if (priv->eventThread) { - g_object_unref(priv->eventThread); - priv->eventThread = NULL; - } + if (!priv->eventThread) + return; + + /* + * We are dropping the only reference here so that the event loop thread + * is going to be exited synchronously. In order to avoid deadlocks we + * need to unlock the VM so that any handler being called can finish + * execution and thus even loop thread be finished too. + */ + eventThread = g_steal_pointer(&priv->eventThread); + virObjectUnlock(dom); + g_object_unref(eventThread); + virObjectLock(dom);
Hmm, I'm really not at all comfortable with this. I don't have confidence that qemuProcessStop() safe if we drpo & reacquire the lock half way through its cleanup process. The 'virDomainObj' is in a semi-clean state, 1/2 running 1/2 shutoff.
This is the key reason I think I didn't do the synchronous thread join in the event loop.
Hmm, I see your justification in the other reply now. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On shutdown we just stop accepting new jobs for the worker thread so that on shutdown wait we can exit the worker thread faster. Yes, we basically stop processing events for VMs, but we are going to do so anyway in case of daemon shutdown. At the same time, synchronous event processing that some API calls may require is still possible, as the per-VM event loop is still running and we don't need the worker thread for synchronous event processing. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- src/qemu/qemu_driver.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 8e81c30..78a1aaf 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1098,6 +1098,36 @@ qemuStateStop(void) return ret; } + +static int +qemuStateShutdownPrepare(void) +{ + virThreadPoolStop(qemu_driver->workerPool); + return 0; +} + + +static int +qemuDomainObjStopWorkerIter(virDomainObjPtr vm, + void *opaque G_GNUC_UNUSED) +{ + virObjectLock(vm); + qemuDomainObjStopWorker(vm); + virObjectUnlock(vm); + return 0; +} + + +static int +qemuStateShutdownWait(void) +{ + virDomainObjListForEach(qemu_driver->domains, false, + qemuDomainObjStopWorkerIter, NULL); + virThreadPoolDrain(qemu_driver->workerPool); + return 0; +} + + /** * qemuStateCleanup: * @@ -23402,6 +23432,8 @@ static virStateDriver qemuStateDriver = { .stateCleanup = qemuStateCleanup, .stateReload = qemuStateReload, .stateStop = qemuStateStop, + .stateShutdownPrepare = qemuStateShutdownPrepare, + .stateShutdownWait = qemuStateShutdownWait, }; int qemuRegister(void) -- 1.8.3.1

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- src/libvirt_remote.syms | 1 - src/rpc/virnetdaemon.c | 13 ------------- src/rpc/virnetdaemon.h | 2 -- 3 files changed, 16 deletions(-) diff --git a/src/libvirt_remote.syms b/src/libvirt_remote.syms index 335e431..c8f2e31 100644 --- a/src/libvirt_remote.syms +++ b/src/libvirt_remote.syms @@ -75,7 +75,6 @@ virNetDaemonAddServer; virNetDaemonAddShutdownInhibition; virNetDaemonAddSignalHandler; virNetDaemonAutoShutdown; -virNetDaemonClose; virNetDaemonGetServer; virNetDaemonGetServers; virNetDaemonHasClients; diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index 1d7f00d..e2f4d5c 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -967,19 +967,6 @@ daemonServerClose(void *payload, return 0; } -void -virNetDaemonClose(virNetDaemonPtr dmn) -{ - if (!dmn) - return; - - virObjectLock(dmn); - - virHashForEach(dmn->servers, daemonServerClose, NULL); - - virObjectUnlock(dmn); -} - static int daemonServerHasClients(void *payload, const void *key G_GNUC_UNUSED, diff --git a/src/rpc/virnetdaemon.h b/src/rpc/virnetdaemon.h index 9f7a282..6ae5305 100644 --- a/src/rpc/virnetdaemon.h +++ b/src/rpc/virnetdaemon.h @@ -73,8 +73,6 @@ void virNetDaemonRun(virNetDaemonPtr dmn); void virNetDaemonQuit(virNetDaemonPtr dmn); -void virNetDaemonClose(virNetDaemonPtr dmn); - bool virNetDaemonHasClients(virNetDaemonPtr dmn); virNetServerPtr virNetDaemonGetServer(virNetDaemonPtr dmn, -- 1.8.3.1

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- src/util/virthreadpool.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/src/util/virthreadpool.h b/src/util/virthreadpool.h index dd1aaf3..cb800dc 100644 --- a/src/util/virthreadpool.h +++ b/src/util/virthreadpool.h @@ -28,9 +28,6 @@ typedef virThreadPool *virThreadPoolPtr; typedef void (*virThreadPoolJobFunc)(void *jobdata, void *opaque); -#define virThreadPoolNew(min, max, prio, func, opaque) \ - virThreadPoolNewFull(min, max, prio, func, #func, opaque) - virThreadPoolPtr virThreadPoolNewFull(size_t minWorkers, size_t maxWorkers, size_t prioWorkers, -- 1.8.3.1

polite ping On 23.07.2020 13:14, Nikolay Shirokovskiy wrote:
I keep qemu VM event loop exiting synchronously but add code to avoid deadlock that can be caused by this approach. I guess it is worth having synchronous exiting of threads in this case to avoid crashes.
Patches that are already positively reviewed has appropriate 'Reviewed-by' lines.
Changes from v1: - rename stateShutdown to state stateShutdownPrepare - introduce net daemon shutdown callbacks - make some adjustments in terms of qemu per VM's event loop thread finishing - factor out net server shutdown facilities into distinct patch - increase shutdown timeout from 15s to 30s
Nikolay Shirokovskiy (13): libvirt: add stateShutdownPrepare/stateShutdownWait to drivers util: always initialize priority condition util: add stop/drain functions to thread pool rpc: don't unref service ref on socket behalf twice rpc: add virNetDaemonSetShutdownCallbacks rpc: add shutdown facilities to netserver rpc: finish all threads before exiting main loop qemu: don't shutdown event thread in monitor EOF callback vireventthread: exit thread synchronously on finalize qemu: avoid deadlock in qemuDomainObjStopWorker qemu: implement driver's shutdown/shutdown wait methods rpc: cleanup virNetDaemonClose method util: remove unused virThreadPoolNew macro
scripts/check-drivername.py | 2 + src/driver-state.h | 8 ++++ src/libvirt.c | 42 ++++++++++++++++ src/libvirt_internal.h | 2 + src/libvirt_private.syms | 4 ++ src/libvirt_remote.syms | 2 +- src/qemu/qemu_domain.c | 18 +++++-- src/qemu/qemu_driver.c | 32 +++++++++++++ src/qemu/qemu_process.c | 3 -- src/remote/remote_daemon.c | 6 +-- src/rpc/virnetdaemon.c | 109 ++++++++++++++++++++++++++++++++++++------ src/rpc/virnetdaemon.h | 8 +++- src/rpc/virnetserver.c | 8 ++++ src/rpc/virnetserver.h | 1 + src/rpc/virnetserverservice.c | 1 - src/util/vireventthread.c | 1 + src/util/virthreadpool.c | 65 +++++++++++++++++-------- src/util/virthreadpool.h | 6 +-- 18 files changed, 267 insertions(+), 51 deletions(-)

One more time. Sorry if this is on somebody's yet-to-be-reviewed patches stack. It is not clear in the current workflow whether track of this series has been lost or not. On 06.08.2020 17:10, Nikolay Shirokovskiy wrote:
polite ping
On 23.07.2020 13:14, Nikolay Shirokovskiy wrote:
I keep qemu VM event loop exiting synchronously but add code to avoid deadlock that can be caused by this approach. I guess it is worth having synchronous exiting of threads in this case to avoid crashes.
Patches that are already positively reviewed has appropriate 'Reviewed-by' lines.
Changes from v1: - rename stateShutdown to state stateShutdownPrepare - introduce net daemon shutdown callbacks - make some adjustments in terms of qemu per VM's event loop thread finishing - factor out net server shutdown facilities into distinct patch - increase shutdown timeout from 15s to 30s
Nikolay Shirokovskiy (13): libvirt: add stateShutdownPrepare/stateShutdownWait to drivers util: always initialize priority condition util: add stop/drain functions to thread pool rpc: don't unref service ref on socket behalf twice rpc: add virNetDaemonSetShutdownCallbacks rpc: add shutdown facilities to netserver rpc: finish all threads before exiting main loop qemu: don't shutdown event thread in monitor EOF callback vireventthread: exit thread synchronously on finalize qemu: avoid deadlock in qemuDomainObjStopWorker qemu: implement driver's shutdown/shutdown wait methods rpc: cleanup virNetDaemonClose method util: remove unused virThreadPoolNew macro
scripts/check-drivername.py | 2 + src/driver-state.h | 8 ++++ src/libvirt.c | 42 ++++++++++++++++ src/libvirt_internal.h | 2 + src/libvirt_private.syms | 4 ++ src/libvirt_remote.syms | 2 +- src/qemu/qemu_domain.c | 18 +++++-- src/qemu/qemu_driver.c | 32 +++++++++++++ src/qemu/qemu_process.c | 3 -- src/remote/remote_daemon.c | 6 +-- src/rpc/virnetdaemon.c | 109 ++++++++++++++++++++++++++++++++++++------ src/rpc/virnetdaemon.h | 8 +++- src/rpc/virnetserver.c | 8 ++++ src/rpc/virnetserver.h | 1 + src/rpc/virnetserverservice.c | 1 - src/util/vireventthread.c | 1 + src/util/virthreadpool.c | 65 +++++++++++++++++-------- src/util/virthreadpool.h | 6 +-- 18 files changed, 267 insertions(+), 51 deletions(-)

Pushed now. Thanks everyone for the review. Patch "[PATCH v2 05/13] rpc: add virNetDaemonSetShutdownCallbacks" does not have a maintainer review, but I guess it is ok as the patch is simple enough and its API is used in other patches of the series. I would also like to note that crashes are still possible because not all threads are joined on shutdown in the qemu driver and other drivers. For example, in the qemu driver we spawn a thread during fake reboot, and on daemon startup we spawn threads for every VM to reconnect. So I would like to continue to work on these issues. Is this considered worth the effort? I realize these are rare corner cases, like daemon shutdown immediately after daemon start, or daemon shutdown coinciding with some domain reboot, etc. Nikolay On 23.07.2020 13:14, Nikolay Shirokovskiy wrote:
I keep qemu VM event loop exiting synchronously but add code to avoid deadlock that can be caused by this approach. I guess it is worth having synchronous exiting of threads in this case to avoid crashes.
Patches that are already positively reviewed has appropriate 'Reviewed-by' lines.
Changes from v1: - rename stateShutdown to state stateShutdownPrepare - introduce net daemon shutdown callbacks - make some adjustments in terms of qemu per VM's event loop thread finishing - factor out net server shutdown facilities into distinct patch - increase shutdown timeout from 15s to 30s
Nikolay Shirokovskiy (13): libvirt: add stateShutdownPrepare/stateShutdownWait to drivers util: always initialize priority condition util: add stop/drain functions to thread pool rpc: don't unref service ref on socket behalf twice rpc: add virNetDaemonSetShutdownCallbacks rpc: add shutdown facilities to netserver rpc: finish all threads before exiting main loop qemu: don't shutdown event thread in monitor EOF callback vireventthread: exit thread synchronously on finalize qemu: avoid deadlock in qemuDomainObjStopWorker qemu: implement driver's shutdown/shutdown wait methods rpc: cleanup virNetDaemonClose method util: remove unused virThreadPoolNew macro
scripts/check-drivername.py | 2 + src/driver-state.h | 8 ++++ src/libvirt.c | 42 ++++++++++++++++ src/libvirt_internal.h | 2 + src/libvirt_private.syms | 4 ++ src/libvirt_remote.syms | 2 +- src/qemu/qemu_domain.c | 18 +++++-- src/qemu/qemu_driver.c | 32 +++++++++++++ src/qemu/qemu_process.c | 3 -- src/remote/remote_daemon.c | 6 +-- src/rpc/virnetdaemon.c | 109 ++++++++++++++++++++++++++++++++++++------ src/rpc/virnetdaemon.h | 8 +++- src/rpc/virnetserver.c | 8 ++++ src/rpc/virnetserver.h | 1 + src/rpc/virnetserverservice.c | 1 - src/util/vireventthread.c | 1 + src/util/virthreadpool.c | 65 +++++++++++++++++-------- src/util/virthreadpool.h | 6 +-- 18 files changed, 267 insertions(+), 51 deletions(-)

On Mon, Sep 07, 2020 at 10:12:08AM +0300, Nikolay Shirokovskiy wrote:
Pushed now. Thanx everyone for review.
Patch "[PATCH v2 05/13] rpc: add virNetDaemonSetShutdownCallbacks" does not have mantainer review but I guess it is ok as the patch is simple enough and it's API is used in other patches of series.
I would also want to note that crashes are still possible because not all threads are joined on shutdown in qemu driver and other drivers. For example in qemu driver we spawn a thread during fake reboot or on daemon startup we spawn threads for every VM to reconnect. So I would like to continue to work on this issues. Is this considered worth the effort? I realize these are rare corner cases, like daemon shutdown immediately after daemon start or daemon shutdown coincide with some domain reboot etc.
Personally I wouldn't spend time on such edge cases, as I feel there are probably worse problems in libvirt needing attention first, but if you want to send patches we would of course review them. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|