On Mon, Sep 08, 2014 at 04:17:44PM +0200, Martin Kletzander wrote:
Many moons ago, I wanted to make locking more debuggable from logs.
An idea appeared that this might be better off with SystemTap. I let
it rot for a while and some time back I had a deadlock that I wanted
to try it on. So I started scripting it and here's the work (in
progress).
Well, the idea is simple. For each mutex that is about to get locked,
append it's symname and pid that locked it into the list A, when it
get's locked move it to list B and when it's unlocked, remove it.
Then whenever the user presses ^C, the script prints all the locks
that were locked (with their particular backtraces) and locks that are
waiting to be locked. This could be enhanced that it would print both
pieces of information only for the locks that are waiting to be
acquired. But I don't really care what color the shed will have.
Here are some problems I'd love to get any help with:
- I cannot run it against built git version, two problems with that:
- When I'm not root, staprun needs setuid bit set and that is, of
course, incompatible with the needed LD_LIBRARY_PATH.
The example command line shows you using 'stap' to actually
launch libvirtd. This is a convenient approach since stap
automatically gets the PID of the process to trace. You do
not need todo it this way though. You can simply run the
stap script, and then start libvirtd manually (at least for
session mode libvirtd).
- Even when I am root, I'm unable to get any backtrace that
goes
beyond virLogMutex() no matter how I'm loading the debug symbols.
Hmm, I just tried your example script and seemed to get the
full trace of symbols
# ./run stap --ldd -c daemon/libvirtd -d daemon/libvirtd
examples/systemtap/lock-debug.stp src/.libs/libvirt.so
WARNING: Missing unwind data for module, rerun with 'stap -d
...ge/src/virt/libvirt/src/.libs/libvirt_driver_vbox_network.so'
WARNING: Missing unwind data for module, rerun with 'stap -d
...ge/src/virt/libvirt/src/.libs/libvirt_driver_vbox_storage.so'
WARNING: Missing unwind data for module, rerun with 'stap -d
...errange/src/virt/libvirt/src/.libs/libvirt_driver_nodedev.so'
WARNING: Missing unwind data for module, rerun with 'stap -d
...6/berrange/src/virt/libvirt/src/.libs/libvirt_driver_vbox.so'
WARNING: Missing unwind data for module, rerun with 'stap -d
...errange/src/virt/libvirt/src/.libs/libvirt_driver_network.so'
Started, press ^C when the proccess hangs
^C
=============
The following locks are locked:
0x7f5ed7499be8(22452):
virMutexLock+0x0 [libvirt.so.0.1002.9]
virThreadPoolWorker+0x51 [libvirt.so.0.1002.9]
virThreadHelper+0x2e [libvirt.so.0.1002.9]
start_thread+0xc5 [libpthread-2.18.so]
clone+0x6d [libc-2.18.so]
---
0x7f5ed7499be8(22454):
virMutexLock+0x0 [libvirt.so.0.1002.9]
virThreadPoolWorker+0x51 [libvirt.so.0.1002.9]
virThreadHelper+0x2e [libvirt.so.0.1002.9]
start_thread+0xc5 [libpthread-2.18.so]
clone+0x6d [libc-2.18.so]
---
0x7f5ed7499be8(22456):
virMutexLock+0x0 [libvirt.so.0.1002.9]
virThreadPoolWorker+0x51 [libvirt.so.0.1002.9]
virThreadHelper+0x2e [libvirt.so.0.1002.9]
start_thread+0xc5 [libpthread-2.18.so]
clone+0x6d [libc-2.18.so]
---
0x7f5ed7499be8(22453):
virMutexLock+0x0 [libvirt.so.0.1002.9]
virThreadPoolWorker+0x51 [libvirt.so.0.1002.9]
virThreadHelper+0x2e [libvirt.so.0.1002.9]
start_thread+0xc5 [libpthread-2.18.so]
clone+0x6d [libc-2.18.so]
---
0x7f5ed7499be8(22455):
virMutexLock+0x0 [libvirt.so.0.1002.9]
virThreadPoolWorker+0x51 [libvirt.so.0.1002.9]
virThreadHelper+0x2e [libvirt.so.0.1002.9]
start_thread+0xc5 [libpthread-2.18.so]
clone+0x6d [libc-2.18.so]
---
0x7f5ed7499be8(22461):
virMutexLock+0x0 [libvirt.so.0.1002.9]
virThreadPoolWorker+0x51 [libvirt.so.0.1002.9]
virThreadHelper+0x2e [libvirt.so.0.1002.9]
start_thread+0xc5 [libpthread-2.18.so]
clone+0x6d [libc-2.18.so]
---
0x7f5ed7499be8(22457):
virMutexLock+0x0 [libvirt.so.0.1002.9]
virThreadPoolWorker+0x51 [libvirt.so.0.1002.9]
virThreadHelper+0x2e [libvirt.so.0.1002.9]
start_thread+0xc5 [libpthread-2.18.so]
clone+0x6d [libc-2.18.so]
---
0x7f5ed7499be8(22460):
virMutexLock+0x0 [libvirt.so.0.1002.9]
virThreadPoolWorker+0x51 [libvirt.so.0.1002.9]
virThreadHelper+0x2e [libvirt.so.0.1002.9]
start_thread+0xc5 [libpthread-2.18.so]
clone+0x6d [libc-2.18.so]
---
0x7f5ed7499be8(22459):
virMutexLock+0x0 [libvirt.so.0.1002.9]
virThreadPoolWorker+0x51 [libvirt.so.0.1002.9]
virThreadHelper+0x2e [libvirt.so.0.1002.9]
start_thread+0xc5 [libpthread-2.18.so]
clone+0x6d [libc-2.18.so]
---
0x7f5ed7499be8(22458):
virMutexLock+0x0 [libvirt.so.0.1002.9]
virThreadPoolWorker+0x51 [libvirt.so.0.1002.9]
virThreadHelper+0x2e [libvirt.so.0.1002.9]
start_thread+0xc5 [libpthread-2.18.so]
clone+0x6d [libc-2.18.so]
---
0x7f5eb4000c00(22462):
virMutexLock+0x0 [libvirt.so.0.1002.9]
0x7f5ec1bb65bb [libvirt_driver_network.so+0xd5bb]
---
virLogMutex+0x0/0x28
[...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9](22449):
virMutexLock+0x0 [libvirt.so.0.1002.9]
main+0x1f33 [lt-libvirtd]
__libc_start_main+0xf5 [libc-2.18.so]
_start+0x29 [lt-libvirtd]
---
The following locks are waiting to be acquired:
virLogMutex+0x0/0x28
[...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9](22462):
virMutexLock+0x0 [libvirt.so.0.1002.9]
virLogVMessage+0x365 [libvirt.so.0.1002.9]
virLogMessage+0x97 [libvirt.so.0.1002.9]
virCommandRunAsync+0x44f [libvirt.so.0.1002.9]
virCommandRun+0x3a4 [libvirt.so.0.1002.9]
dnsmasqCapsRefreshInternal+0x106 [libvirt.so.0.1002.9]
dnsmasqCapsNewFromBinary+0x1b [libvirt.so.0.1002.9]
0x7f5ec1bb67e2 [libvirt_driver_network.so+0xd7e2]
---
eventLoop+0x0/0x70
[...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9](22449):
virMutexLock+0x0 [libvirt.so.0.1002.9]
virEventPollRunOnce+0x401 [libvirt.so.0.1002.9]
virEventRunDefaultImpl+0x42 [libvirt.so.0.1002.9]
virNetServerRun+0x13d [lt-libvirtd]
main+0x1f40 [lt-libvirtd]
__libc_start_main+0xf5 [libc-2.18.so]
_start+0x29 [lt-libvirtd]
---
- When running with installed libvirtd, I am not getting as nice
backtrace as I would in gdb (but that's probably the only way how
systemtap will get any backtrace so it's better than nothing). And
I'm unable to get line numbers even with addr2line.
Yeah, the systemtap backtrace is based solely on info available in
the loaded ELF library unwind data. GDB meanwhile will look at the
debuginfo packages to resolve to line numbers/files. The data your
script is printing out has only shown addresses relative to the
start of the function, and addr2line cannot convert that into line
numbers. AFAICT addr2line only works with addresses relative to
the load address of the ELF library.
You can make a tweak to the script to instead use sprint_usyms(ubacktrace())
which gets it to print out more data
---
0x7f224c000c00(23921):
0x7f226f0cd850 : virMutexLock+0x0/0x10
[...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9]
0x7f225a7f35bb : 0x7f225a7f35bb
[...errange/src/virt/libvirt/src/.libs/libvirt_driver_network.so+0xd5bb/0x223000]
---
virLogMutex+0x0/0x28
[...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9](23908):
0x7f226f0cd850 : virMutexLock+0x0/0x10
[...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9]
0x7f226fc1d163 : main+0x1f33/0x269c
[...1524c9576/berrange/src/virt/libvirt/daemon/.libs/lt-libvirtd]
0x7f226b94ed65 : __libc_start_main+0xf5/0x1c0 [/usr/lib64/libc-2.18.so]
0x7f226fc1d8f5 : _start+0x29/0x34
[...1524c9576/berrange/src/virt/libvirt/daemon/.libs/lt-libvirtd]
Unfortuntely these addresses appear to be absolute offsets in the process
memory, which is still useless unless you know the load address of each
ELF module.
I definitely think it ought to be possible to convert an address like
virCommandRunAsync+0x44f into a proper line number though. Perhaps we
would need to directly write code against elfutils, instead of relying
on addr2line.
Regards,
Daniel
--
|:
http://berrange.com -o-
http://www.flickr.com/photos/dberrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org :|
|:
http://autobuild.org -o-
http://search.cpan.org/~danberr/ :|
|:
http://entangle-photo.org -o-
http://live.gnome.org/gtk-vnc :|