
On Fri, Jan 22, 2021 at 14:09:52 +0300, Nikolay Shirokovskiy wrote:
On Fri, Jan 22, 2021 at 12:45 PM Michal Privoznik <mprivozn@redhat.com> wrote:
If libvirtd is sent SIGTERM while it is still initializing, it may crash. The following scenario was observed (using 'stress' to slow down CPU so much that the window where the problem exists is bigger):
1) The main thread is already executing virNetDaemonRun() and is in virEventRunDefaultImpl(). 2) The thread that's supposed to run daemonRunStateInit() is spawned already, but daemonRunStateInit() is in its very early stage (in the stack trace I see it's executing virIdentityGetSystem()).
If SIGTERM (or any other signal that we don't override handler for) arrives at this point, the main thread jumps out from virEventRunDefaultImpl() and enters virStateShutdownPrepare() (via shutdownPrepareCb which was set earlier). This iterates through stateShutdownPrepare() callbacks of state drivers and reaching qemuStateShutdownPrepare() eventually only to dereference qemu_driver. But since thread 2) has not been scheduled/not proceeded yet, qemu_driver was not allocated yet.
Solution is simple - just check if qemu_driver is not NULL. But doing so only in qemuStateShutdownPrepare() would push the problem down to virStateShutdownWait(), well qemuStateShutdownWait(). Therefore, duplicate the trick there too.
I guess this is a partial solution. Initialization may be in a state when qemu_driver is initialized but qemu_driver->workerPool is still NULL for example.
Maybe we'd better delay shutdown until initialization is finished?
The fix may not be ideal or complete, but it definitely fixes one non-race issue (see https://www.redhat.com/archives/libvir-list/2021-January/msg01134.html) Even if we come up with a flag indicating whether a driver was fully initialized or not, we still need to check whether it was initialized at all. Unless we want to introduce yet another static variable in the driver, but I think qemu_driver is already enough. So I guess if you reword the commit message to say this is fixing a case when driver initialization fails (rather than a race), you get my Reviewed-by: Jiri Denemark <jdenemar@redhat.com>