Steps to reproduce this bug:
1. # virsh migrate vm1 --p2p qemu+tls://<remote host>/system
error: End of file while reading data: Input/output error
After this, libvirtd crashed.
This bug has only happened twice.
I used gdb to analyze the core file:
(gdb) info threads
6 Thread 3952 0x000000351fe0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 3951 0x000000351fe0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 3950 0x000000351fe0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
3 Thread 3949 0x000000351fe0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 3948 0x000000351fe0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 3947 0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000003f2a834150 in _gnutls_string_resize (dest=0x7f92740079d8, new_size=<value optimized out>) at gnutls_str.c:192
#2 0x0000003f2a81a614 in _gnutls_io_read_buffered (session=0x7f9274006d30, iptr=0x7fffb3476148, sizeOfPtr=5, recv_type=<value optimized out>) at gnutls_buffers.c:515
#3 0x0000003f2a816031 in _gnutls_recv_int (session=0x7f9274006d30, type=GNUTLS_APPLICATION_DATA, htype=4294967295, data=0x7f92740155e8 "", sizeofdata=4) at gnutls_record.c:904
#4 0x00007f9285bd2ec7 in virNetTLSSessionRead (sess=0x7f927400a5d0, buf=0x7f92740155e8 "", len=4) at rpc/virnettlscontext.c:812
#5 0x00007f9285bd50a4 in virNetSocketReadWire (sock=0x7f9274006ba0, buf=0x7f92740155e8 "", len=4) at rpc/virnetsocket.c:801
#6 0x00007f9285bd5815 in virNetSocketRead (sock=0x7f9274006ba0, buf=0x7f92740155e8 "", len=4) at rpc/virnetsocket.c:981
#7 0x00007f9285bce40f in virNetClientIOReadMessage (client=0x7f9274015590) at rpc/virnetclient.c:711
#8 0x00007f9285bce461 in virNetClientIOHandleInput (client=0x7f9274015590) at rpc/virnetclient.c:730
#9 0x00007f9285bcf0f4 in virNetClientIncomingEvent (sock=0x7f9274006ba0, events=1, opaque=0x7f9274015590) at rpc/virnetclient.c:1119
#10 0x00007f9285bd5b87 in virNetSocketEventHandle (fd=13, watch=20, events=1, opaque=0x7f9274006ba0) at rpc/virnetsocket.c:1052
#11 0x00007f9285b09325 in virEventPollDispatchHandles (nfds=10, fds=0xfe51d0) at util/event_poll.c:469
#12 0x00007f9285b09a7e in virEventPollRunOnce () at util/event_poll.c:610
#13 0x00007f9285b07ec5 in virEventRunDefaultImpl () at util/event.c:247
#14 0x0000000000449cc2 in virNetServerRun (srv=0xfc3490) at rpc/virnetserver.c:662
#15 0x000000000041e3c5 in main (argc=2, argv=0x7fffb3476b68) at libvirtd.c:1561
The debug log in /var/log/libvirt/libvirtd.log:
...
11:18:27.838: 1848: debug : virEventPollRemoveHandle:171 : Remove handle w=13
11:18:27.838: 1847: debug : virEventPollDispatchHandles:454 : i=3 w=4
11:18:27.838: 1847: debug : virEventPollDispatchHandles:454 : i=4 w=5
11:18:27.838: 1847: debug : virEventPollDispatchHandles:454 : i=5 w=6
11:18:27.838: 1847: debug : virEventPollDispatchHandles:454 : i=6 w=12
11:18:27.838: 1847: debug : virEventPollDispatchHandles:454 : i=7 w=13
11:18:27.838: 1847: debug : virEventPollDispatchHandles:467 : Dispatch n=7 f=20 w=13 e=1 0x7f00780aaa80
11:18:27.838: 1848: debug : virEventPollRemoveHandle:184 : mark delete 7 20
11:18:27.838: 1848: debug : virEventPollInterruptLocked:677 : Interrupting
11:18:27.838: 1848: debug : virNetSocketFree:627 : sock=0x7f00780aaa80 fd=20
11:18:27.838: 1848: debug : virEventPollRemoveTimeout:276 : Remove timer 10
11:18:27.838: 1848: debug : virEventPollInterruptLocked:677 : Interrupting
11:18:27.838: 1848: debug : virDomainObjUnref:1142 : obj=0x7f007800ce10 refs=2
11:18:27.838: 1848: debug : virDomainObjUnref:1142 : obj=0x7f007800ce10 refs=1
====== end of log =====
The cause: we dispatch the handle (fd=20, watch=13) and remove the same handle (watch=13)
at almost the same time, in that order (dispatch first, then remove).
We remove the handle when the remote connection is closed, and then call virNetSocketFree()
to free sock while the dispatching thread is still using it. This is a use-after-free and
very dangerous.
I think that removing a handle should wait until any in-progress dispatch of that same
handle has finished.
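To make the proposal concrete, here is a rough sketch in C of what "wait for the dispatch before removing" could look like. This is not libvirt code: demo_handle, demo_handle_init(), demo_handle_dispatch(), demo_handle_remove() and demo_count_cb() are made-up names for illustration, and a plain pthread mutex and condition variable stand in for whatever locking the real event loop uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical handle record; NOT libvirt's actual event-loop structure. */
typedef void (*demo_handle_cb)(int fd, void *opaque);

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t idle;   /* signalled whenever a dispatch finishes */
    bool dispatching;      /* true while the callback is running */
    bool deleted;          /* true once the handle has been removed */
    int fd;
    demo_handle_cb cb;
    void *opaque;          /* e.g. the socket object used by the callback */
} demo_handle;

void demo_handle_init(demo_handle *h, int fd, demo_handle_cb cb, void *opaque)
{
    pthread_mutex_init(&h->lock, NULL);
    pthread_cond_init(&h->idle, NULL);
    h->dispatching = false;
    h->deleted = false;
    h->fd = fd;
    h->cb = cb;
    h->opaque = opaque;
}

/* Event-loop side: run the callback unless the handle was already removed.
 * Returns true if the callback ran. */
bool demo_handle_dispatch(demo_handle *h)
{
    pthread_mutex_lock(&h->lock);
    if (h->deleted) {
        pthread_mutex_unlock(&h->lock);
        return false;
    }
    h->dispatching = true;
    pthread_mutex_unlock(&h->lock);

    /* The remover cannot return (and free opaque) while we are in here. */
    h->cb(h->fd, h->opaque);

    pthread_mutex_lock(&h->lock);
    h->dispatching = false;
    pthread_cond_broadcast(&h->idle);
    pthread_mutex_unlock(&h->lock);
    return true;
}

/* Remover side: mark the handle deleted, then block until no dispatch is
 * in flight. Only after this returns is it safe to free opaque. */
void demo_handle_remove(demo_handle *h)
{
    pthread_mutex_lock(&h->lock);
    h->deleted = true;
    while (h->dispatching)
        pthread_cond_wait(&h->idle, &h->lock);
    pthread_mutex_unlock(&h->lock);
}

/* Example callback for illustration: counts how often it was invoked. */
void demo_count_cb(int fd, void *opaque)
{
    (void)fd;
    ++*(int *)opaque;
}
```

With something like this, the thread that closes the connection would call demo_handle_remove() first and free the socket only afterwards. One caveat of this sketch: calling demo_handle_remove() from inside the callback itself would deadlock, so a real fix would have to handle that case, or hold a reference on the socket for the lifetime of the handle instead of waiting.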