Hi guys,
Could you help me to review it ?
thanks,
haitao
On 2019/6/12 15:18, Liu Haitao wrote:
When reboot the host, a core dump file would be generated.
The call traces are:
Note.In this case, the main thread is thread 5.
(gdb) thread 5
[Switching to thread 5 (LWP 4142)]
(gdb) bt
0 0x00007f00a6838273 in futex_wait_cancelable (private=<optimized out>,
expected=0, futex_word=0x7f004c0125c0)
at /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/futex-internal.h:88
1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7f004c012540,
cond=0x7f004c012598)
at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_cond_wait.c:502
2 __pthread_cond_wait (cond=0x7f004c012598, mutex=0x7f004c012540)
at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_cond_wait.c:655
3 0x00007f00aa467246 in virCondWait (c=<optimized out>, m=<optimized out>)
at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:154
4 0x00007f00aa467eb0 in virThreadPoolFree (pool=<optimized out>)
at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthreadpool.c:286
5 0x00007f0074349f9d in qemuStateCleanup ()
at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:1036
6 0x00007f00aa5e9486 in virStateCleanup ()
at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/libvirt.c:682
7 0x000055a687ab86a4 in main (argc=<optimized out>, argv=<optimized out>)
at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/remote/remote_daemon.c:1473
(gdb) thread 1
[Switching to thread 1 (LWP 4403)]
(gdb) bt
0 __GI___pthread_mutex_lock (mutex=mutex@entry=0x0)
at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_mutex_lock.c:67
1 0x00007f00aa467165 in virMutexLock (m=m@entry=0x0)
at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:89
2 0x00007f00aa43c0f9 in virNetlinkEventServerLock (driver=<optimized out>)
at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetlink.c:799
3 virNetlinkEventRemoveClient (watch=watch@entry=0,
macaddr=macaddr@entry=0x7f0088014944, protocol=protocol@entry=0)
at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetlink.c:1197
4 0x00007f00aa4341df in virNetDevMacVLanDeleteWithVPortProfile (
ifname=<optimized out>, macaddr=macaddr@entry=0x7f0088014944,
linkdev=0x7f0088014920 "eth1", mode=mode@entry=1,
virtPortProfile=virtPortProfile@entry=0x0,
stateDir=stateDir@entry=0x7f004c12fa90 "/var/run/libvirt/qemu")
at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetdevmacvlan.c:1112
5 0x00007f0074312251 in qemuProcessStop (driver=driver@entry=0x7f004c0ecef0,
vm=vm@entry=0x7f0088000b00,
reason=reason@entry=VIR_DOMAIN_SHUTOFF_SHUTDOWN,
asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE, flags=<optimized out>)
at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_process.c:7291
6 0x00007f007437a5ea in processMonitorEOFEvent (vm=0x7f0088000b00,
driver=0x7f004c0ecef0)
at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:4756
7 qemuProcessEventHandler (data=0x55a687d6df10, opaque=0x7f004c0ecef0)
at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:4859
8 0x00007f00aa467c5b in virThreadPoolWorker (
opaque=opaque@entry=0x55a687d6c110)
at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthreadpool.c:163
9 0x00007f00aa466fe8 in virThreadHelper (data=<optimized out>)
at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:206
10 0x00007f00a68323f4 in start_thread (arg=0x7f00699df700)
at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_create.c:456
11 0x00007f00a616e10f in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
1. The execution flow of main thread (Thread 5 LWP 4142):
main()
-->virNetDaemonRun()
-->virNetDaemonClose(dmn) //cleanup
-->virNetlinkEventServiceStopAll()
-->virStateCleanup()
-->qemuStateCleanup()
-->virThreadPoolFree()
-->__pthread_cond_wait()
virNetDaemonRun()
-->virEventRunDefaultImpl
-->virEventPollRunOnce
-->virEventPollDispatchHandles
-->qemuMonitorIO
-->qemuProcessHandleMonitorEOF
-->processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF
-->virThreadPoolSendJob()
After typing reboot command on the host, the main thread would send an event message to
another thread.
Here it would let thread 1 to handle the shutdown of qemu process. But it could
not be executed immediately.
virNetlinkEventServiceStopAll()
--> virNetlinkEventServiceStop()
--> server[protocol] = NULL; // set server to null
IN virNetlinkEventServiceStopAll(), some variables related to network are freed,
like (static virNetlinkEventSrvPrivatePtr server).
virStateCleanup()
-->qemuStateCleanup()
-->virThreadPoolFree()
-->__pthread_cond_wait()
In virThreadPoolFree() it will wait other thread to end up.
2. The execution flow of thread 5 (LWP 4403):
qemuProcessStop()
-->virNetDevMacVLanDeleteWithVPortProfile()
-->virNetlinkEventRemoveClient()
--> srv = server[protocol]
Although the main thread had sent the message to thread 1(4403), it could not be
run instantly. It means that the virNetlinkEventServiceStopAll() might be
executed earlier than virNetlinkEventRemoveClient(). We could get it from the log file.
""
2019-06-12 00:10:09.230+0000: 4142: info : virNetlinkEventServiceStopAll:941 : stopping
all netlink event services
2019-06-12 00:10:09.230+0000: 4142: info : virNetlinkEventServiceStop:904 : stopping
netlink event service
2019-06-12 00:10:21.165+0000: 4403: debug : virNetlinkEventRemoveClient:1190 : removing
client watch=0, mac=0x7f0088014944.
"
In virNetlinkEventRemoveClient() the variable server is used again, but now it
is null that is freed by virNetlinkEventServiceStopAll().So it would case a crash .
The virNetlinkEventServiceStopAll() should be executed behind virStateCleanup(),
Signed-off-by: Liu Haitao <haitao.liu(a)windriver.com>
---
src/remote/remote_daemon.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/remote/remote_daemon.c b/src/remote/remote_daemon.c
index c3782971f1..7da20a6644 100644
--- a/src/remote/remote_daemon.c
+++ b/src/remote/remote_daemon.c
@@ -1464,8 +1464,6 @@ int main(int argc, char **argv) {
/* Keep cleanup order in inverse order of startup */
virNetDaemonClose(dmn);
- virNetlinkEventServiceStopAll();
-
if (driversInitialized) {
/* NB: Possible issue with timing window between driversInitialized
* setting if virNetlinkEventServerStart fails */
@@ -1473,6 +1471,8 @@ int main(int argc, char **argv) {
virStateCleanup();
}
+ virNetlinkEventServiceStopAll();
+
virObjectUnref(adminProgram);
virObjectUnref(srvAdm);
virObjectUnref(qemuProgram);