[libvirt] [PATCH 0/2] [RFC] Use the power of SystemTap to get rid of all* deadlocks forever*

Many moons ago, I wanted to make locking more debuggable from logs. An idea appeared that this might be better off with SystemTap. I let it rot for a while and some time back I had a deadlock that I wanted to try it on. So I started scripting it and here's the work (in progress). Well, the idea is simple. For each mutex that is about to get locked, append it's symname and pid that locked it into the list A, when it get's locked move it to list B and when it's unlocked, remove it. Then whenever the user presses ^C, the script prints all the locks that were locked (with their particular backtraces) and locks that are waiting to be locked. This could be enhanced that it would print both pieces of information only for the locks that are waiting to be acquired. But I don't really care what color the shed will have. Here are some problems I'd love to get any help with: - I cannot run it against built git version, two problems with that: - When I'm not root, staprun needs setuid bit set and that is, of course, incompatible with the needed LD_LIBRARY_PATH. - Even when I am root, I'm unable to get any backtrace that goes beyond virLogMutex() no matter how I'm loading the debug symbols. - When running with installed libvirtd, I am not getting as nice backtrace as I would in gdb (but that's probably the only way how systemtap will get any backtrace so it's better than nothing). And I'm unable to get line numbers even with addr2line. I'd be interested in hearing how this works for others, or any hints on how to make the script so good that I could post it for the inclusion upstream. Any best practices about systemtap scripts are welcomed as well. _______________________________________________________ (*) Not quite, I just wanted you to notice this mail ;) Martin Kletzander (2): lock-debug.stp DO NOT APPLY: deadlock test daemon/libvirtd.c | 1 + examples/systemtap/lock-debug.stp | 115 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 116 insertions(+) create mode 100644 examples/systemtap/lock-debug.stp -- 2.1.0

Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- examples/systemtap/lock-debug.stp | 115 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 examples/systemtap/lock-debug.stp diff --git a/examples/systemtap/lock-debug.stp b/examples/systemtap/lock-debug.stp new file mode 100644 index 0000000..bf18bc6 --- /dev/null +++ b/examples/systemtap/lock-debug.stp @@ -0,0 +1,115 @@ +#!/usr/bin/stap --ldd -d /usr/sbin/libvirtd -c libvirtd +# +# Usage with installed libvirt daemon: +# stap --ldd -d /usr/sbin/libvirtd -c libvirtd \ +# lock-debug.stp /usr/lib/libvirt.so +# +# If made executable; simple './lock-debug.stp' should work too. +# +# TODO: Document usage with uninstalled daemon and libs. +# Assuming CWD is toplevel source git directory, it should be only +# slight modification to the following: +# +# ./run stap --ldd -c daemon/libvirtd -d daemon/libvirtd +# examples/systemtap/lock-debug.stp src/.libs/libvirt.so +# +# Author: Martin Kletzander <mkletzan@redhat.com> + +global mx_tolock +global mx_locked + + +function filter() +{ + if (pid() != target()) + return 1 + + return 0 +} + +probe library = process( %( $# > 0 %? @1 %: "/usr/lib/libvirt.so" %) ) +{ + if (filter()) next +} + +probe lock = library.function("virMutexLock") +{ + lockname = usymdata($m) +} + +probe unlock = library.function("virMutexUnlock") +{ + lockname = usymdata($m) +} + +probe begin +{ + %( $# > 1 %? println("error: Too many parameters"); exit(); + %: print("Started, press ^C when the proccess hangs\n"); %) +} + +probe lock.call +{ + mx_tolock[lockname, tid()] = sprint_ubacktrace() +} + +probe lock.return +{ + if ([lockname, tid()] in mx_tolock) { + mx_locked[lockname, tid()] = mx_tolock[lockname, tid()] + delete mx_tolock[lockname, tid()] + } else { + printf("internal error: lock acquired unwillingly?\n") + } +} + +probe unlock.return +{ + found = 0 + + foreach ([lock, tid] in mx_locked) { + if (lock != lockname) + continue + if (tid != tid()) { + printf("Warning: lock released on different thread that locked it.\n"); + printf("Lock trace:\n%s\n", mx_locked[lock, tid]) + printf("Unlock trace:\n%s\n", sprint_ubacktrace()) + } + + found = tid + break + } + + if (found) { + if ([lockname, found] in mx_locked) + delete mx_locked[lockname, found] + else + println("FDSA") + } else { + printf("Warning: lock released without being locked.\n") + printf("Unlock trace:\n%s\n", sprint_ubacktrace()) + } +} + +probe end +{ + printf("\n=============\n") + + tmp = 0; + foreach (bt = [lock, tid] in mx_locked) { + if (!tmp++) + printf("The following locks are locked:\n") + printf("%s(%d):\n%s\n---\n", lock, tid, bt) + } + if (!tmp) + printf("No locks are locked, apparently\n") + + tmp = 0; + foreach (bt = [lock, tid] in mx_tolock) { + if (!tmp++) + printf("The following locks are waiting to be acquired:\n") + printf("%s(%d):\n%s\n---\n", lock, tid, bt) + } + if (!tmp) + printf("No locks are being acquired, apparently\n") +} -- 2.1.0

Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- daemon/libvirtd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index 0503cd0..b803529 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -1536,6 +1536,7 @@ int main(int argc, char **argv) { } #endif + virLogLock(); /* Run event loop. */ virNetServerRun(srv); -- 2.1.0

On Mon, Sep 08, 2014 at 04:17:44PM +0200, Martin Kletzander wrote:
Many moons ago, I wanted to make locking more debuggable from logs. An idea appeared that this might be better off with SystemTap. I let it rot for a while and some time back I had a deadlock that I wanted to try it on. So I started scripting it and here's the work (in progress).
Well, the idea is simple. For each mutex that is about to get locked, append it's symname and pid that locked it into the list A, when it get's locked move it to list B and when it's unlocked, remove it. Then whenever the user presses ^C, the script prints all the locks that were locked (with their particular backtraces) and locks that are waiting to be locked. This could be enhanced that it would print both pieces of information only for the locks that are waiting to be acquired. But I don't really care what color the shed will have.
Here are some problems I'd love to get any help with:
- I cannot run it against built git version, two problems with that:
- When I'm not root, staprun needs setuid bit set and that is, of course, incompatible with the needed LD_LIBRARY_PATH.
The example command line shows you using 'stap' to actually launch libvirtd. This is a convenient approach since stap automatically gets the PID of the process to trace. You do not need todo it this way though. You can simply run the stap script, and then start libvirtd manually (at least for session mode libvirtd).
- Even when I am root, I'm unable to get any backtrace that goes beyond virLogMutex() no matter how I'm loading the debug symbols.
Hmm, I just tried your example script and seemed to get the full trace of symbols # ./run stap --ldd -c daemon/libvirtd -d daemon/libvirtd examples/systemtap/lock-debug.stp src/.libs/libvirt.so WARNING: Missing unwind data for module, rerun with 'stap -d ...ge/src/virt/libvirt/src/.libs/libvirt_driver_vbox_network.so' WARNING: Missing unwind data for module, rerun with 'stap -d ...ge/src/virt/libvirt/src/.libs/libvirt_driver_vbox_storage.so' WARNING: Missing unwind data for module, rerun with 'stap -d ...errange/src/virt/libvirt/src/.libs/libvirt_driver_nodedev.so' WARNING: Missing unwind data for module, rerun with 'stap -d ...6/berrange/src/virt/libvirt/src/.libs/libvirt_driver_vbox.so' WARNING: Missing unwind data for module, rerun with 'stap -d ...errange/src/virt/libvirt/src/.libs/libvirt_driver_network.so' Started, press ^C when the proccess hangs ^C ============= The following locks are locked: 0x7f5ed7499be8(22452): virMutexLock+0x0 [libvirt.so.0.1002.9] virThreadPoolWorker+0x51 [libvirt.so.0.1002.9] virThreadHelper+0x2e [libvirt.so.0.1002.9] start_thread+0xc5 [libpthread-2.18.so] clone+0x6d [libc-2.18.so] --- 0x7f5ed7499be8(22454): virMutexLock+0x0 [libvirt.so.0.1002.9] virThreadPoolWorker+0x51 [libvirt.so.0.1002.9] virThreadHelper+0x2e [libvirt.so.0.1002.9] start_thread+0xc5 [libpthread-2.18.so] clone+0x6d [libc-2.18.so] --- 0x7f5ed7499be8(22456): virMutexLock+0x0 [libvirt.so.0.1002.9] virThreadPoolWorker+0x51 [libvirt.so.0.1002.9] virThreadHelper+0x2e [libvirt.so.0.1002.9] start_thread+0xc5 [libpthread-2.18.so] clone+0x6d [libc-2.18.so] --- 0x7f5ed7499be8(22453): virMutexLock+0x0 [libvirt.so.0.1002.9] virThreadPoolWorker+0x51 [libvirt.so.0.1002.9] virThreadHelper+0x2e [libvirt.so.0.1002.9] start_thread+0xc5 [libpthread-2.18.so] clone+0x6d [libc-2.18.so] --- 0x7f5ed7499be8(22455): virMutexLock+0x0 [libvirt.so.0.1002.9] virThreadPoolWorker+0x51 [libvirt.so.0.1002.9] virThreadHelper+0x2e [libvirt.so.0.1002.9] start_thread+0xc5 [libpthread-2.18.so] clone+0x6d [libc-2.18.so] --- 0x7f5ed7499be8(22461): virMutexLock+0x0 [libvirt.so.0.1002.9] virThreadPoolWorker+0x51 [libvirt.so.0.1002.9] virThreadHelper+0x2e [libvirt.so.0.1002.9] start_thread+0xc5 [libpthread-2.18.so] clone+0x6d [libc-2.18.so] --- 0x7f5ed7499be8(22457): virMutexLock+0x0 [libvirt.so.0.1002.9] virThreadPoolWorker+0x51 [libvirt.so.0.1002.9] virThreadHelper+0x2e [libvirt.so.0.1002.9] start_thread+0xc5 [libpthread-2.18.so] clone+0x6d [libc-2.18.so] --- 0x7f5ed7499be8(22460): virMutexLock+0x0 [libvirt.so.0.1002.9] virThreadPoolWorker+0x51 [libvirt.so.0.1002.9] virThreadHelper+0x2e [libvirt.so.0.1002.9] start_thread+0xc5 [libpthread-2.18.so] clone+0x6d [libc-2.18.so] --- 0x7f5ed7499be8(22459): virMutexLock+0x0 [libvirt.so.0.1002.9] virThreadPoolWorker+0x51 [libvirt.so.0.1002.9] virThreadHelper+0x2e [libvirt.so.0.1002.9] start_thread+0xc5 [libpthread-2.18.so] clone+0x6d [libc-2.18.so] --- 0x7f5ed7499be8(22458): virMutexLock+0x0 [libvirt.so.0.1002.9] virThreadPoolWorker+0x51 [libvirt.so.0.1002.9] virThreadHelper+0x2e [libvirt.so.0.1002.9] start_thread+0xc5 [libpthread-2.18.so] clone+0x6d [libc-2.18.so] --- 0x7f5eb4000c00(22462): virMutexLock+0x0 [libvirt.so.0.1002.9] 0x7f5ec1bb65bb [libvirt_driver_network.so+0xd5bb] --- virLogMutex+0x0/0x28 [...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9](22449): virMutexLock+0x0 [libvirt.so.0.1002.9] main+0x1f33 [lt-libvirtd] __libc_start_main+0xf5 [libc-2.18.so] _start+0x29 [lt-libvirtd] --- The following locks are waiting to be acquired: virLogMutex+0x0/0x28 [...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9](22462): virMutexLock+0x0 [libvirt.so.0.1002.9] virLogVMessage+0x365 [libvirt.so.0.1002.9] virLogMessage+0x97 [libvirt.so.0.1002.9] virCommandRunAsync+0x44f [libvirt.so.0.1002.9] virCommandRun+0x3a4 [libvirt.so.0.1002.9] dnsmasqCapsRefreshInternal+0x106 [libvirt.so.0.1002.9] dnsmasqCapsNewFromBinary+0x1b [libvirt.so.0.1002.9] 0x7f5ec1bb67e2 [libvirt_driver_network.so+0xd7e2] --- eventLoop+0x0/0x70 [...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9](22449): virMutexLock+0x0 [libvirt.so.0.1002.9] virEventPollRunOnce+0x401 [libvirt.so.0.1002.9] virEventRunDefaultImpl+0x42 [libvirt.so.0.1002.9] virNetServerRun+0x13d [lt-libvirtd] main+0x1f40 [lt-libvirtd] __libc_start_main+0xf5 [libc-2.18.so] _start+0x29 [lt-libvirtd] ---
- When running with installed libvirtd, I am not getting as nice backtrace as I would in gdb (but that's probably the only way how systemtap will get any backtrace so it's better than nothing). And I'm unable to get line numbers even with addr2line.
Yeah, the systemtap backtrace is based solely on info available in the loaded ELF library unwind data. GDB meanwhile will look at the debuginfo packages to resolve to line numbers/files. The data your script is printing out has only shown addresses relative to the start of the function, and addr2line cannot convert that into line numbers. AFAICT addr2line only works with addresses relative to the load address of the ELF library. You can make a tweak to the script to instead use sprint_usyms(ubacktrace()) which gets it to print out more data --- 0x7f224c000c00(23921): 0x7f226f0cd850 : virMutexLock+0x0/0x10 [...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9] 0x7f225a7f35bb : 0x7f225a7f35bb [...errange/src/virt/libvirt/src/.libs/libvirt_driver_network.so+0xd5bb/0x223000] --- virLogMutex+0x0/0x28 [...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9](23908): 0x7f226f0cd850 : virMutexLock+0x0/0x10 [...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9] 0x7f226fc1d163 : main+0x1f33/0x269c [...1524c9576/berrange/src/virt/libvirt/daemon/.libs/lt-libvirtd] 0x7f226b94ed65 : __libc_start_main+0xf5/0x1c0 [/usr/lib64/libc-2.18.so] 0x7f226fc1d8f5 : _start+0x29/0x34 [...1524c9576/berrange/src/virt/libvirt/daemon/.libs/lt-libvirtd] Unfortuntely these addresses appear to be absolute offsets in the process memory, which is still useless unless you know the load address of each ELF module. I definitely think it ought to be possible to convert an address like virCommandRunAsync+0x44f into a proper line number though. Perhaps we would need to directly write code against elfutils, instead of relying on addr2line. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Mon, Sep 08, 2014 at 03:59:04PM +0100, Daniel P. Berrange wrote:
On Mon, Sep 08, 2014 at 04:17:44PM +0200, Martin Kletzander wrote:
Many moons ago, I wanted to make locking more debuggable from logs. An idea appeared that this might be better off with SystemTap. I let it rot for a while and some time back I had a deadlock that I wanted to try it on. So I started scripting it and here's the work (in progress).
Well, the idea is simple. For each mutex that is about to get locked, append it's symname and pid that locked it into the list A, when it get's locked move it to list B and when it's unlocked, remove it. Then whenever the user presses ^C, the script prints all the locks that were locked (with their particular backtraces) and locks that are waiting to be locked. This could be enhanced that it would print both pieces of information only for the locks that are waiting to be acquired. But I don't really care what color the shed will have.
Here are some problems I'd love to get any help with:
- I cannot run it against built git version, two problems with that:
- When I'm not root, staprun needs setuid bit set and that is, of course, incompatible with the needed LD_LIBRARY_PATH.
The example command line shows you using 'stap' to actually launch libvirtd. This is a convenient approach since stap automatically gets the PID of the process to trace. You do not need todo it this way though. You can simply run the stap script, and then start libvirtd manually (at least for session mode libvirtd).
Yes, but then I might miss some information on the locks. However it might still help sometimes.
- Even when I am root, I'm unable to get any backtrace that goes beyond virLogMutex() no matter how I'm loading the debug symbols.
Hmm, I just tried your example script and seemed to get the full trace of symbols
# ./run stap --ldd -c daemon/libvirtd -d daemon/libvirtd examples/systemtap/lock-debug.stp src/.libs/libvirt.so WARNING: Missing unwind data for module, rerun with 'stap -d ...ge/src/virt/libvirt/src/.libs/libvirt_driver_vbox_network.so' WARNING: Missing unwind data for module, rerun with 'stap -d ...ge/src/virt/libvirt/src/.libs/libvirt_driver_vbox_storage.so' WARNING: Missing unwind data for module, rerun with 'stap -d ...errange/src/virt/libvirt/src/.libs/libvirt_driver_nodedev.so' WARNING: Missing unwind data for module, rerun with 'stap -d ...6/berrange/src/virt/libvirt/src/.libs/libvirt_driver_vbox.so' WARNING: Missing unwind data for module, rerun with 'stap -d ...errange/src/virt/libvirt/src/.libs/libvirt_driver_network.so'
This is another thing that has to be done. I guess --ldd doesn't load symbols for these drivers because the daemon is not dynamically linked with them, it just loads them. [...]
0x7f5eb4000c00(22462): virMutexLock+0x0 [libvirt.so.0.1002.9] 0x7f5ec1bb65bb [libvirt_driver_network.so+0xd5bb] --- virLogMutex+0x0/0x28 [...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9](22449): virMutexLock+0x0 [libvirt.so.0.1002.9] main+0x1f33 [lt-libvirtd] __libc_start_main+0xf5 [libc-2.18.so] _start+0x29 [lt-libvirtd] --- The following locks are waiting to be acquired: virLogMutex+0x0/0x28 [...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9](22462): virMutexLock+0x0 [libvirt.so.0.1002.9] virLogVMessage+0x365 [libvirt.so.0.1002.9] virLogMessage+0x97 [libvirt.so.0.1002.9] virCommandRunAsync+0x44f [libvirt.so.0.1002.9] virCommandRun+0x3a4 [libvirt.so.0.1002.9] dnsmasqCapsRefreshInternal+0x106 [libvirt.so.0.1002.9] dnsmasqCapsNewFromBinary+0x1b [libvirt.so.0.1002.9] 0x7f5ec1bb67e2 [libvirt_driver_network.so+0xd7e2] --- eventLoop+0x0/0x70 [...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9](22449): virMutexLock+0x0 [libvirt.so.0.1002.9] virEventPollRunOnce+0x401 [libvirt.so.0.1002.9] virEventRunDefaultImpl+0x42 [libvirt.so.0.1002.9] virNetServerRun+0x13d [lt-libvirtd] main+0x1f40 [lt-libvirtd] __libc_start_main+0xf5 [libc-2.18.so] _start+0x29 [lt-libvirtd] ---
How did you manage to run it with the git built daemon? Or did you just connected to it? If you just ran it as written above:
# ./run stap --ldd -c daemon/libvirtd -d daemon/libvirtd examples/systemtap/lock-debug.stp src/.libs/libvirt.so
then I myself am impressed that it works, because for me it doesn't :)
- When running with installed libvirtd, I am not getting as nice backtrace as I would in gdb (but that's probably the only way how systemtap will get any backtrace so it's better than nothing). And I'm unable to get line numbers even with addr2line.
Yeah, the systemtap backtrace is based solely on info available in the loaded ELF library unwind data. GDB meanwhile will look at the debuginfo packages to resolve to line numbers/files. The data your script is printing out has only shown addresses relative to the start of the function, and addr2line cannot convert that into line numbers. AFAICT addr2line only works with addresses relative to the load address of the ELF library.
You can make a tweak to the script to instead use sprint_usyms(ubacktrace()) which gets it to print out more data
---
0x7f224c000c00(23921): 0x7f226f0cd850 : virMutexLock+0x0/0x10 [...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9] 0x7f225a7f35bb : 0x7f225a7f35bb [...errange/src/virt/libvirt/src/.libs/libvirt_driver_network.so+0xd5bb/0x223000]
--- virLogMutex+0x0/0x28 [...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9](23908): 0x7f226f0cd850 : virMutexLock+0x0/0x10 [...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9] 0x7f226fc1d163 : main+0x1f33/0x269c [...1524c9576/berrange/src/virt/libvirt/daemon/.libs/lt-libvirtd] 0x7f226b94ed65 : __libc_start_main+0xf5/0x1c0 [/usr/lib64/libc-2.18.so] 0x7f226fc1d8f5 : _start+0x29/0x34 [...1524c9576/berrange/src/virt/libvirt/daemon/.libs/lt-libvirtd]
Unfortuntely these addresses appear to be absolute offsets in the process memory, which is still useless unless you know the load address of each ELF module.
I don't know what's the difference between the offsets before and after the slash (e.g. in main+0x1f33/0x269c), maybe it's from-to?
I definitely think it ought to be possible to convert an address like virCommandRunAsync+0x44f into a proper line number though. Perhaps we would need to directly write code against elfutils, instead of relying on addr2line.
That's out of my league for now (even though Id love to get some more insights on ELF structure, etc.). But do you think it would be worth adding even without this info (for now)? If yes, then I'd be happy to polish it a bit and propose it. Thank you for the response, Martin
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Tue, Sep 09, 2014 at 07:51:00PM +0200, Martin Kletzander wrote:
On Mon, Sep 08, 2014 at 03:59:04PM +0100, Daniel P. Berrange wrote:
On Mon, Sep 08, 2014 at 04:17:44PM +0200, Martin Kletzander wrote:
Many moons ago, I wanted to make locking more debuggable from logs. An idea appeared that this might be better off with SystemTap. I let it rot for a while and some time back I had a deadlock that I wanted to try it on. So I started scripting it and here's the work (in progress).
Well, the idea is simple. For each mutex that is about to get locked, append it's symname and pid that locked it into the list A, when it get's locked move it to list B and when it's unlocked, remove it. Then whenever the user presses ^C, the script prints all the locks that were locked (with their particular backtraces) and locks that are waiting to be locked. This could be enhanced that it would print both pieces of information only for the locks that are waiting to be acquired. But I don't really care what color the shed will have.
Here are some problems I'd love to get any help with:
- I cannot run it against built git version, two problems with that:
- When I'm not root, staprun needs setuid bit set and that is, of course, incompatible with the needed LD_LIBRARY_PATH.
The example command line shows you using 'stap' to actually launch libvirtd. This is a convenient approach since stap automatically gets the PID of the process to trace. You do not need todo it this way though. You can simply run the stap script, and then start libvirtd manually (at least for session mode libvirtd).
Yes, but then I might miss some information on the locks. However it might still help sometimes.
If you launch the stap script before launching libvirtd then you should not miss any locks.
Hmm, I just tried your example script and seemed to get the full trace of symbols
# ./run stap --ldd -c daemon/libvirtd -d daemon/libvirtd examples/systemtap/lock-debug.stp src/.libs/libvirt.so WARNING: Missing unwind data for module, rerun with 'stap -d ...ge/src/virt/libvirt/src/.libs/libvirt_driver_vbox_network.so' WARNING: Missing unwind data for module, rerun with 'stap -d ...ge/src/virt/libvirt/src/.libs/libvirt_driver_vbox_storage.so' WARNING: Missing unwind data for module, rerun with 'stap -d ...errange/src/virt/libvirt/src/.libs/libvirt_driver_nodedev.so' WARNING: Missing unwind data for module, rerun with 'stap -d ...6/berrange/src/virt/libvirt/src/.libs/libvirt_driver_vbox.so' WARNING: Missing unwind data for module, rerun with 'stap -d ...errange/src/virt/libvirt/src/.libs/libvirt_driver_network.so'
This is another thing that has to be done. I guess --ldd doesn't load symbols for these drivers because the daemon is not dynamically linked with them, it just loads them.
Yes, you would have to manually list these, or just turn off loadable modules when debugging locks might be easier.
How did you manage to run it with the git built daemon? Or did you just connected to it? If you just ran it as written above:
# ./run stap --ldd -c daemon/libvirtd -d daemon/libvirtd examples/systemtap/lock-debug.stp src/.libs/libvirt.so
then I myself am impressed that it works, because for me it doesn't :)
Yes, I ran exactly that command, however, I can't remember if that was the first time I ran it after compilation. The libtool wrapper scripts do alot of one-time work the first time you run after build. So it cna be helpful to launch the daemon once & shut it down, and then launch it again for debugging.
virLogMutex+0x0/0x28 [...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9](23908): 0x7f226f0cd850 : virMutexLock+0x0/0x10 [...9576/berrange/src/virt/libvirt/src/.libs/libvirt.so.0.1002.9] 0x7f226fc1d163 : main+0x1f33/0x269c [...1524c9576/berrange/src/virt/libvirt/daemon/.libs/lt-libvirtd] 0x7f226b94ed65 : __libc_start_main+0xf5/0x1c0 [/usr/lib64/libc-2.18.so] 0x7f226fc1d8f5 : _start+0x29/0x34 [...1524c9576/berrange/src/virt/libvirt/daemon/.libs/lt-libvirtd]
Unfortuntely these addresses appear to be absolute offsets in the process memory, which is still useless unless you know the load address of each ELF module.
I don't know what's the difference between the offsets before and after the slash (e.g. in main+0x1f33/0x269c), maybe it's from-to?
Yeah, I'm unclear on that too - I'd have to look at the source
I definitely think it ought to be possible to convert an address like virCommandRunAsync+0x44f into a proper line number though. Perhaps we would need to directly write code against elfutils, instead of relying on addr2line.
That's out of my league for now (even though Id love to get some more insights on ELF structure, etc.). But do you think it would be worth adding even without this info (for now)? If yes, then I'd be happy to polish it a bit and propose it.
Yes, I think it is worth having the script in git. Even without line numbers, simply seeing the function name stacks could be enough to diagnose the problems, since most functions only have one lock call. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (2)
-
Daniel P. Berrange
-
Martin Kletzander