On Fri, May 18, 2012 at 11:55:05AM +0800, Daniel Veillard wrote:
One of the outputs raised frequently is:
----------------------------------------------------------------------
==5838== WARNING: Possible data race during write of size 4 at 0x34723B3BD0: {{{
==5838== T1 (L{L17, L125}):
==5838== #0 ptmalloc_unlock_all2 /lib64/libc-2.14.90.so
==5838== #1 fork /lib64/libc-2.14.90.so
==5838== #2 virFork /usr/src/debug/libvirt-0.9.12/src/util/command.c:268
==5838== #3 virExecWithHook.constprop.5
/usr/src/debug/libvirt-0.9.12/src/util/command.c:489
==5838== #4 virCommandRunAsync
/usr/src/debug/libvirt-0.9.12/src/util/command.c:2177
==5838== #5 virCommandRun /usr/src/debug/libvirt-0.9.12/src/util/command.c:1970
==5838== #6 qemuCapsExtractVersionInfo
/usr/src/debug/libvirt-0.9.12/src/qemu/qemu_capabilities.c:1379
==5838== #7 qemuCapsExtractVersion
/usr/src/debug/libvirt-0.9.12/src/qemu/qemu_capabilities.c:1568
==5838== #8 qemudGetVersion
/usr/src/debug/libvirt-0.9.12/src/qemu/qemu_driver.c:1320
==5838== #9 virConnectGetVersion /usr/src/debug/libvirt-0.9.12/src/libvirt.c:1640
==5838== #10 remoteDispatchGetVersionHelper
/usr/src/debug/libvirt-0.9.12/daemon/remote_dispatch.h:6862
==5838== #11 virNetServerProgramDispatch
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
==5838== Concurrent read(s) happened at (OR AFTER) these points:
==5838== T11 (L{}):
==5838== #0 arena_thread_freeres /lib64/libc-2.14.90.so
==5838== #1 __libc_thread_freeres /lib64/libc-2.14.90.so
==5838== #2 start_thread /lib64/libpthread-2.14.90.so
==5838== Address 0x34723B3BD0 is 0 bytes inside data symbol "list_lock"
==5838== Locks involved in this report (reporting last lock sites): {L17, L125}
==5838== L17 (0x5339B00)
==5838== #0 pthread_mutex_lock
/mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:935
==5838== #1 virFork /usr/src/debug/libvirt-0.9.12/src/util/command.c:266
==5838== #2 virExecWithHook.constprop.5
/usr/src/debug/libvirt-0.9.12/src/util/command.c:489
==5838== #3 virCommandRunAsync
/usr/src/debug/libvirt-0.9.12/src/util/command.c:2177
==5838== #4 virCommandRun /usr/src/debug/libvirt-0.9.12/src/util/command.c:1970
==5838== #5 qemuCapsExtractVersionInfo
/usr/src/debug/libvirt-0.9.12/src/qemu/qemu_capabilities.c:1379
==5838== #6 qemuCapsExtractVersion
/usr/src/debug/libvirt-0.9.12/src/qemu/qemu_capabilities.c:1568
==5838== #7 qemudGetVersion
/usr/src/debug/libvirt-0.9.12/src/qemu/qemu_driver.c:1320
==5838== #8 virConnectGetVersion /usr/src/debug/libvirt-0.9.12/src/libvirt.c:1640
==5838== #9 remoteDispatchGetVersionHelper
/usr/src/debug/libvirt-0.9.12/daemon/remote_dispatch.h:6862
==5838== #10 virNetServerProgramDispatch
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
==5838== #11 virNetServerHandleJob
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
==5838== L125 (0x4C0A0A00)
==5838== #0 pthread_mutex_lock
/mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:935
==5838== #1 qemudGetVersion
/usr/src/debug/libvirt-0.9.12/src/qemu/qemu_driver.c:1319
==5838== #2 virConnectGetVersion /usr/src/debug/libvirt-0.9.12/src/libvirt.c:1640
==5838== #3 remoteDispatchGetVersionHelper
/usr/src/debug/libvirt-0.9.12/daemon/remote_dispatch.h:6862
==5838== #4 virNetServerProgramDispatch
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
==5838== #5 virNetServerHandleJob
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
==5838== #6 virThreadPoolWorker
/usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
==5838== #7 virThreadHelper
/usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
==5838== Race verifier data: 0x347207CF98,0x347215D6BB
==5838== }}}
I don't know what "Race verifier data" means, but it seems that we do
*pid = fork(); in the main thread while another thread is accessing
the data without synchronization. The program properly reports that 2
locks are held at that point, the virLogLock() and qemuDriverLock(), and
gives full stack traces of where those locks were taken. That's convenient,
but I could not really spot where the second thread did the access; I
suspect that's somehow related to libpthread, considering the minimal
stack trace of the second thread T11 ...
No, those first 2 stack traces are referring to the locks inside GLibC's
malloc() implementation, so not something in our code. This is bogus,
I believe.
I however found out a couple of issues not related to libpthread:
==5733== WARNING: Possible data race during write of size 1 at 0x34723B6B14: {{{
==5733== T5 (L{}):
==5733== #0 Replace_memcpy
/mnt/data/build/slave/full_linux_build/build/tsan/ts_replace.h:113
==5733== #1 __GI_memcpy
/mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:2359
==5733== #2 xdrmem_getbytes /lib64/libc-2.14.90.so
==5733== #3 xdr_string /lib64/libc-2.14.90.so
==5733== #4 xdr_remote_nonnull_string
/usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:31
==5733== #5 xdr_remote_nonnull_domain
/usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:58
==5733== #6 xdr_remote_domain_get_info_args
/usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:1390
==5733== #7 virNetMessageDecodePayload
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetmessage.c:382
==5733== #8 virNetServerProgramDispatch
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:403
==5733== #9 virNetServerHandleJob
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
==5733== #10 virThreadPoolWorker
/usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
==5733== #11 virThreadHelper
/usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
==5733== Concurrent write(s) happened at (OR AFTER) these points:
==5733== T2 (L{}):
==5733== #0 xdrmem_getbytes /lib64/libc-2.14.90.so
==5733== #1 xdr_string /lib64/libc-2.14.90.so
==5733== #2 xdr_remote_nonnull_string
/usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:31
==5733== #3 xdr_remote_nonnull_domain
/usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:58
==5733== #4 xdr_remote_domain_get_xml_desc_args
/usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:1532
==5733== #5 virNetMessageDecodePayload
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetmessage.c:382
==5733== #6 virNetServerProgramDispatch
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:403
==5733== #7 virNetServerHandleJob
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
==5733== #8 virThreadPoolWorker
/usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
==5733== #9 virThreadHelper
/usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
==5733== Address 0x34723B6B14 is 0 bytes inside data symbol "crud.6424"
==5733== Race verifier data: 0x4A091CE,0x347212579D
==5733== }}}
That one could potentially be nasty: it seems that 2 threads decoding
RPC payloads in parallel hit an unsynchronized write access in
the xdr routine for xdr_remote_nonnull_string(). I didn't have the
debuginfo for glibc installed when I ran it, so I don't have the
exact information at the top of the stacks.
This looks bogus to me too. These are two completely separate RPC messages
being dispatched here (virDomainGetInfo and virDomainGetXMLDesc). If they
truly pointed to the same data, we'd be going SEGV-tastic all over the
place.
The other one raised is actually something I had initially found
with the first run on libxml2 so it's relatively consistent:
Yay, libxml2 is not thread-safe!
"The function rand() is not reentrant or thread-safe, since
it uses hidden state that is modified on each call. This
might just be the seed value to be used by the next call, or
it might be something more elaborate. In order to get
reproducible behavior in a threaded application, this state
must be made explicit; this can be done using the reentrant
function rand_r()."
You should probably validate libxml2 against all the functions we
have blacklisted for libvirt in Makefile.nonreentrant
==5733== WARNING: Possible data race during read of size 4 at 0x34723B10B4: {{{
==5733== T1 (L{L173}):
==5733== #0 random_r /lib64/libc-2.14.90.so
==5733== #1 random /lib64/libc-2.14.90.so
==5733== #2 rand /lib64/libc-2.14.90.so
==5733== #3 xmlDictCreate /usr/src/debug/libxml2-2.8.0/dict.c:491
==5733== #4 xmlInitParserCtxt /usr/src/debug/libxml2-2.8.0/parserInternals.c:1620
==5733== #5 xmlNewParserCtxt /usr/src/debug/libxml2-2.8.0/parserInternals.c:1874
==5733== #6 xmlCreateURLParserCtxt /usr/src/debug/libxml2-2.8.0/parser.c:13546
==5733== #7 xmlReadFile /usr/src/debug/libxml2-2.8.0/parser.c:14713
==5733== #8 xmlRelaxNGParse /usr/src/debug/libxml2-2.8.0/relaxng.c:7480
==5733== #9 (no symbols) /usr/lib64/libnetcf.so.1.4.0
==5733== #10 ncf_init /usr/lib64/libnetcf.so.1.4.0
==5733== #11 interfaceOpenInterface
/usr/src/debug/libvirt-0.9.12/src/interface/netcf_driver.c:144
==5733== Concurrent write(s) happened at (OR AFTER) these points:
==5733== T10 (L{L156}):
==5733== #0 random_r /lib64/libc-2.14.90.so
==5733== #1 random /lib64/libc-2.14.90.so
==5733== #2 rand /lib64/libc-2.14.90.so
==5733== #3 xmlHashCreate /usr/src/debug/libxml2-2.8.0/hash.c:193
==5733== #4 xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:4125
==5733== #5 xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6380
==5733== #6 xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6523
==5733== #7 xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6319
==5733== #8 xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6520
==5733== #9 xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6523
==5733== Address 0x34723B10B4 is 20 bytes inside data symbol "randtbl"
==5733== Locks involved in this report (reporting last lock sites): {L156, L173}
==5733== L156 (0x402E790)
==5733== #0 pthread_mutex_lock
/mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:935
==5733== #1 remoteDispatchOpenHelper
/usr/src/debug/libvirt-0.9.12/daemon/remote.c:683
==5733== #2 virNetServerProgramDispatch
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
==5733== #3 virNetServerHandleJob
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
==5733== #4 virThreadPoolWorker
/usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
==5733== #5 virThreadHelper
/usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
==5733== L173 (0x402F420)
==5733== #0 pthread_mutex_lock
/mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:935
==5733== #1 remoteDispatchOpenHelper
/usr/src/debug/libvirt-0.9.12/daemon/remote.c:683
==5733== #2 virNetServerProgramDispatch
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
==5733== #3 virNetServerHandleJob
/usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
==5733== #4 virThreadPoolWorker
/usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
==5733== #5 virThreadHelper
/usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
==5733== Race verifier data: 0x347203A572,0x347203A565
==5733== }}}
I didn't know that netcf was doing Relax-NG validation on-the-fly :)
and hit another one in libxml2, through a libvirt/netcf/libxslt/libxml2
call:
==5733== WARNING: Possible data race during read of size 4 at 0x34723B10B4: {{{
==5733== T1 (L{L173}):
==5733== #0 random_r /lib64/libc-2.14.90.so
==5733== #1 random /lib64/libc-2.14.90.so
==5733== #2 rand /lib64/libc-2.14.90.so
==5733== #3 xmlDictCreate /usr/src/debug/libxml2-2.8.0/dict.c:491
==5733== #4 xmlInitParserCtxt /usr/src/debug/libxml2-2.8.0/parserInternals.c:1620
==5733== #5 xmlNewParserCtxt /usr/src/debug/libxml2-2.8.0/parserInternals.c:1874
==5733== #6 xmlCreateURLParserCtxt /usr/src/debug/libxml2-2.8.0/parser.c:13546
==5733== #7 xmlReadFile /usr/src/debug/libxml2-2.8.0/parser.c:14713
==5733== #8 xmlRelaxNGParse /usr/src/debug/libxml2-2.8.0/relaxng.c:7480
==5733== #9 (no symbols) /usr/lib64/libnetcf.so.1.4.0
==5733== #10 ncf_init /usr/lib64/libnetcf.so.1.4.0
==5733== #11 interfaceOpenInterface
/usr/src/debug/libvirt-0.9.12/src/interface/netcf_driver.c:144
==5733== Concurrent write(s) happened at (OR AFTER) these points:
==5733== T10 (L{L156}):
==5733== #0 random_r /lib64/libc-2.14.90.so
==5733== #1 random /lib64/libc-2.14.90.so
==5733== #2 rand /lib64/libc-2.14.90.so
==5733== #3 xmlHashCreate /usr/src/debug/libxml2-2.8.0/hash.c:193
==5733== #4 xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:4125
==5733== #5 xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6380
==5733== #6 xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6523
==5733== #7 xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6319
==5733== #8 xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6520
==5733== #9 xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6523
==5733== Address 0x34723B10B4 is 20 bytes inside data symbol "randtbl"
==5733== Locks involved in this report (reporting last lock sites): {L156, L173}
Looks like another libxml2 bug calling rand(); I should use rand_r()
instead if available.
Yep.
So, except for the parallel xdr_remote_nonnull_string() issue,
I don't think there is anything I can report at this point for libvirt
code itself.
I think that is bogus.
Regards,
Daniel
--
|: http://berrange.com      -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org       -o-             http://virt-manager.org :|
|: http://autobuild.org     -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o-       http://live.gnome.org/gtk-vnc :|