On Fri, Jan 22, 2021 at 14:09:52 +0300, Nikolay Shirokovskiy wrote:
On Fri, Jan 22, 2021 at 12:45 PM Michal Privoznik
<mprivozn(a)redhat.com>
wrote:
> If libvirtd is sent SIGTERM while it is still initializing, it
> may crash. The following scenario was observed (using 'stress' to
> slow down CPU so much that the window where the problem exists is
> bigger):
>
> 1) The main thread is already executing virNetDaemonRun() and is
> in virEventRunDefaultImpl().
> 2) The thread that's supposed to run daemonRunStateInit() is
> spawned already, but daemonRunStateInit() is in its very early
> stage (in the stack trace I see it's executing
> virIdentityGetSystem()).
>
> If SIGTERM (or any other signal that we don't override handler
> for) arrives at this point, the main thread jumps out from
> virEventRunDefaultImpl() and enters virStateShutdownPrepare()
> (via shutdownPrepareCb which was set earlier). This iterates
> through stateShutdownPrepare() callbacks of state drivers and
> reaching qemuStateShutdownPrepare() eventually only to
> dereference qemu_driver. But since thread 2) has not been
> scheduled/not proceeded yet, qemu_driver was not allocated yet.
>
> Solution is simple - just check if qemu_driver is not NULL. But
> doing so only in qemuStateShutdownPrepare() would push the
> problem down to virStateShutdownWait(), well
> qemuStateShutdownWait(). Therefore, duplicate the trick there
> too.
>
I guess this is a partial solution. Initialization may be in a state when
qemu_driver is initialized but qemu_driver->workerPool is still NULL
for example.
Maybe we'd better delay shutdown until initialization is finished?
The fix may not be ideal or complete, but it definitely fixes one
non-race issue (see
https://www.redhat.com/archives/libvir-list/2021-January/msg01134.html)
Even if we come up with a flag indicating whether a driver was fully
initialized or not, we still need to check whether it was initialized at
all. Unless we want to introduce yet another static variable in the
driver, but I think qemu_driver is already enough.
So I guess if you reword the commit message to say this is fixing a case
when driver initialization fails (rather than a race), you get my
Reviewed-by: Jiri Denemark <jdenemar(a)redhat.com>