On 27.10.2016 15:59, John Ferlan wrote:
On 10/04/2016 10:27 AM, Nikolay Shirokovskiy wrote:
> This fix SEGV with the backtrace [1]
>
> This happens due to race on simultaneous finishing of libvirtd and
> qemu process. We need to keep daemon object until all hypervisor
> drivers are cleaned up. The other option is to take reference to
> daemon in every hypervisor driver but this will take more work
> and we really don't need to. Drivers cleanup procedure is synchronous
> thus let's just remove last reference to daemon after drivers cleanup.
>
> [1] backtrace (shortened a bit)
>
> 1 0x00007fd3a791b2e6 in virCondWait (c=<optimized out>, m=<optimized
out>) at util/virthread.c:154
> 2 0x00007fd3a791bcb0 in virThreadPoolFree (pool=0x7fd38024ee00) at
util/virthreadpool.c:266
> 3 0x00007fd38edaa00e in qemuStateCleanup () at qemu/qemu_driver.c:1116
> 4 0x00007fd3a79abfeb in virStateCleanup () at libvirt.c:808
> 5 0x00007fd3a85f2c9e in main (argc=<optimized out>, argv=<optimized
out>) at libvirtd.c:1660
>
> Thread 1 (Thread 0x7fd38722d700 (LWP 32256)):
> 0 0x00007fd3a7900910 in virClassIsDerivedFrom (klass=0xdfd36058d4853,
parent=0x7fd3a8f394d0) at util/virobject.c:169
> 1 0x00007fd3a7900c4e in virObjectIsClass (anyobj=anyobj@entry=0x7fd3a8f2f850,
klass=<optimized out>) at util/virobject.c:365
> 2 0x00007fd3a7900c74 in virObjectLock (anyobj=0x7fd3a8f2f850) at
util/virobject.c:317
> 3 0x00007fd3a7a24d5d in virNetDaemonRemoveShutdownInhibition (dmn=0x7fd3a8f2f850) at
rpc/virnetdaemon.c:547
> 4 0x00007fd38ed722cf in qemuProcessStop (driver=driver@entry=0x7fd380103810,
vm=vm@entry=0x7fd38025b6d0, reason=reason@entry=VIR_DOMAIN_SHUTOFF_SHUTDOWN,
asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE,
> flags=flags@entry=0) at qemu/qemu_process.c:5786
> 5 0x00007fd38edd9428 in processMonitorEOFEvent (vm=0x7fd38025b6d0,
driver=0x7fd380103810) at qemu/qemu_driver.c:4588
> 6 qemuProcessEventHandler (data=<optimized out>, opaque=0x7fd380103810) at
qemu/qemu_driver.c:4632
> 7 0x00007fd3a791bb55 in virThreadPoolWorker (opaque=opaque@entry=0x7fd3a8f1e4c0) at
util/virthreadpool.c:145
> ---
> daemon/libvirtd.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
I'd like to adjust the commit message just a bit... In particular
removing the "The other option..." sentence.
ok
So the "problem" discovered is that virStateInitialize requires/uses the
"dmn" reference as the argument to the daemonInhibitCallback and by
dereferencing the object too early and then calling the virStateCleanup
which calls the daemonInhibitCallback using the dmn we end up with the SEGV.
yep
> diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c
> index 95c1b1c..eefdefc 100644
> --- a/daemon/libvirtd.c
> +++ b/daemon/libvirtd.c
> @@ -1628,7 +1628,6 @@ int main(int argc, char **argv) {
> virObjectUnref(qemuProgram);
> virObjectUnref(adminProgram);
> virNetDaemonClose(dmn);
> - virObjectUnref(dmn);
> virObjectUnref(srv);
> virObjectUnref(srvAdm);
> virNetlinkShutdown();
> @@ -1658,6 +1657,9 @@ int main(int argc, char **argv) {
> driversInitialized = false;
> virStateCleanup();
> }
Another option would have been to move the above hunk up... Since
setting the daemonStateInit is one of the last things done before
running the event loop, it stands to reason shutting down should be done
in the opposite order thus causing those InhibitCallbacks to be run
perhaps after virNetlinkEventServiceStopAll and before virNetDaemonClose.
By the above hunk you mean virStateCleanup() et all? If yes - then I don't
think so. The thing is that start is tricky.
1. We start the network part, but keep it in disabled state.
In this state it creates sockets and listens on them but not accept
connections.
2. Initialize all the hyper drivers and then enable the network part
to let it accept connections.
Thus effectively the order is reverse to what is seems -
first hyper drivers and then network servers.
Actually there is a direct reasoning why hyper drivers shutted down after
network part - otherwise requests can be delivered to hyper drivers in
invalid state.
Nikolay
Any thoughts to that?
This I assume works, but do we run into "other" issues because we're
running those callbacks after virNetDaemonClose and virNetlinkShutdown?
Of course the "fear" of moving is that it causes "other" timing
issues
with previous adjustments here, in particular:
commit id '5c756e580' and 'a602e90bc1' which introduced and adjusted
driversInitialized
I'm OK with leaving things as is, but perhaps someone else will see this
an comment as well.
John
> + /* unref daemon only here as hypervisor drivers can
> + call shutdown inhibition functions on cleanup */
> + virObjectUnref(dmn);
>
> return ret;
> }
>