This series follows [1] but address the issue slightly differently.
Instead of polling for RPC thread pool termination it waits for
thread pool drain in distinct thread and then signal the main loop
to exit.
The series introduces new driver's methods stateShutdown/stateShutdownWait
to finish all driver's background threads. The methods however are only
implemented for qemu driver and only partially. There are other drivers
that have background threads and I don't check every one of them in
terms of how they manage their background threads.
For example node driver creates 2 threads. One of them is supposed to live
a for a short amount of time and thus not tracked. This thread can cause issues
on shutdown. The second thread is tracked and finished synchronously on driver
cleanup. So this thread can not cause crashes but can cause hangs theoretically
speaking so we may want to move the thread finishing to stateShutdownWait
method so that possible hang will be handled by shutdown timeout.
The qemu driver also has untracked threads and they can cause crashes on
shutdown. For example reconnect threads or reboot thread. These need to be
tracked.
I'm going to address these issues in qemu and other drivers once the overall
approach will be approved.
I added 2 new driver's methods so that thread finishing will be done in
parallel. If we have only one method then the shutdown is done one by one
effectively.
I've added clean shutdown timeout in event loop as suggested by Daniel in [2].
Now I think why we can't just go with systemd unit management? Systemd will
eventually send SIGKILL and we can tune the timeout using TimeoutStopUSec
parameter. This way we even don't need to introduce new driver's methods.
Driver's background threads can be finished in stateCleanup method. AFAIU as
drivers are cleaned up in reverse order it is safe in the sense that already
cleaned up driver can not be used by background threads of not yet cleaned up
driver. Of course this way the cleanup is not done in parallel. Well to
turn it into parallel we can introduce just stateShutdown which we don't need
to call in netdaemon code and thus don't introduce undesired dependency of
netdaemon on drivers concept.
[1] Resolve libvirtd hang on termination with connected long running client
https://www.redhat.com/archives/libvir-list/2018-July/msg00382.html
[2] Races / crashes in shutdown of libvirtd daemon
https://www.redhat.com/archives/libvir-list/2020-April/msg01328.html
Nikolay Shirokovskiy (10):
libvirt: add stateShutdown/stateShutdownWait to drivers
util: always initialize priority condition
util: add stop/drain functions to thread pool
rpc: don't unref service ref on socket behalf twice
rpc: finish all threads before exiting main loop
vireventthread: add virEventThreadClose
qemu: exit thread synchronously in qemuDomainObjStopWorker
qemu: implement driver's shutdown/shutdown wait methods
rpc: cleanup virNetDaemonClose method
util: remove unused virThreadPoolNew macro
scripts/check-drivername.py | 2 +
src/driver-state.h | 8 ++++
src/libvirt.c | 42 +++++++++++++++++++
src/libvirt_internal.h | 2 +
src/libvirt_private.syms | 3 ++
src/libvirt_remote.syms | 1 -
src/qemu/qemu_domain.c | 1 +
src/qemu/qemu_driver.c | 32 +++++++++++++++
src/remote/remote_daemon.c | 3 --
src/rpc/virnetdaemon.c | 95 ++++++++++++++++++++++++++++++++++++-------
src/rpc/virnetdaemon.h | 2 -
src/rpc/virnetserver.c | 8 ++++
src/rpc/virnetserver.h | 1 +
src/rpc/virnetserverservice.c | 1 -
src/util/vireventthread.c | 9 ++++
src/util/vireventthread.h | 1 +
src/util/virthreadpool.c | 65 ++++++++++++++++++++---------
src/util/virthreadpool.h | 6 +--
18 files changed, 238 insertions(+), 44 deletions(-)
--
1.8.3.1