On Thu, Sep 16, 2010 at 05:04:42PM +0200, Daniel Veillard wrote:
On Tue, Sep 14, 2010 at 07:30:20PM +0100, Daniel P. Berrange wrote:
> This patchset provides the infrastructure for supporting dynamic
> probing of libvirtd, using static DTrace markers. This can be
> used by SystemTAP on Linux, or DTrace on Solaris/OS-X/BSD for
> low overhead tracing.
>
> The proof of concept provides a handful of markers wrt to network
> client connections, security & auth. Obviously it can be expanded
> to cover a huge area of our codebase for different tasks. The
> hard bit is deciding what should be exposed as a probe point.
> Ideally probes should not be changed/removed once added, since
> this would break any user tracing scripts. So a little care needs
> to be taken in placing probes to be robust against future code
> re-factoring.
Very interesting, I'm just a bit surprized by the patch set.
patch 1 and 2 are really unrelated, I think they should go in
independantly. Patch 1 is pure refactoring, I would rather apply it
early in the cycle for 0.8.5 (or whatever our next release might
be named), and glancing at it it looks finr to me ACK (will jsut need
a bit of rebase due to Justin patch)
Patch 2 looks simple enough, ACK too
For patch 3 I'm a bit surprized, I think I would have started by adding
probes for all the public API places first on entry and second on exit with
the return value provided. It can be helpful for debugging connection
problems but well I would use SystemTap more for debugging and tuning
of a running system (i.e. it runs and you don't want to disturb it or
minimally)
Adding probes for public API places would require me to add 266 probes!
This patch let me demo it adding < 10 probes. I'm not 100% sure whether
we need to add explicit probes for public API places, because I think
dtrace/systemtap may provide generic support for probes at function
enty and return points. If adding explicit probes lets it work without
needing -debuginfo installed then it could still be worthwhile to mark
public API places.
We should understand the problem of the restart being needed to
before
pushing patch3 I think, maybe we need to talk to one of the SystemTap
developpers so he can explain what is going on and how to fix this :-)
Yep, there's already someone looking at it. One of the problems was
caused by LXC containers - uprobes kernel module has some bad assumptions
that LXC invalidates. The other problem looks like a data caching / race
issue in systemtap/uprobes to me. So I don't think this needs to hold
us up.
Daniel
--
|: Red Hat, Engineering, London -o-
http://people.redhat.com/berrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org -o-
http://deltacloud.org :|
|:
http://autobuild.org -o-
http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|