2010/6/10 Emre Erenoglu <erenoglu(a)gmail.com>:
On Thu, Jun 10, 2010 at 10:35 PM, Matthias Bolte
<matthias.bolte(a)googlemail.com> wrote:
>
> 2010/6/10 Emre Erenoglu <erenoglu(a)gmail.com>:
> > On Thu, Jun 10, 2010 at 9:07 PM, Daniel P. Berrange
> > <berrange(a)redhat.com>
> > wrote:
> >>
> >> On Thu, Jun 10, 2010 at 08:57:15PM +0300, Emre Erenoglu wrote:
> >> > On Thu, Jun 10, 2010 at 5:02 PM, Matthias Bolte <
> >> > matthias.bolte(a)googlemail.com> wrote:
> >> >
> >> > > 2010/6/10 Emre Erenoglu <erenoglu(a)gmail.com>:
> >> > > The initscript explicitly starts the one in /usr/sbin. If you just
> >> > > start libvirtd manually without an absolute path then you'll start
> >> > > the one in /usr/local/sbin. This might explain why you cannot
> >> > > reproduce the segfault manually, but it doesn't explain why the
> >> > > segfault happens.
> >> > >
> >> >
> >> > There's no other installation of libvirt in the system. I can also
> >> > reproduce the same thing in all Pardus machines, so I believe it's
> >> > something in libvirt not doing well with something else in our
> >> > service init mechanisms.
> >>
> >> I guess I'd put money on some environment variable causing trouble.
> >> It could be a *missing* environment variable that we expect to always
> >> be set, or something like that
> >
> > Hi Daniel, thanks for your message. Yes, I wrote a small script as you
> > suggested and found that libvirtd was run with this environment:
> >
> > DBUS_STARTER_ADDRESS=unix:path=/var/run/dbus/system_bus_socket,guid=6c515f612162b05d554b59cd4c112d43
> > KRB5_KTNAME=/etc/libvirt/krb5.tab
> > PWD=/
> > DBUS_STARTER_BUS_TYPE=system
> > SHLVL=1
> > _=/usr/bin/env
> >
> > This looks very weak compared to the standard root environment that I
> > pasted in my earlier message.
>
> No PATH? I bet there is code in libvirt that assumes getenv("PATH")
> will be != NULL.
>
> Could you try to add PATH to the environment. It can be empty, doesn't
> matter. Just make sure it's there, so getenv("PATH") returns an empty
> string instead of NULL.
I had the same instinct and just did exactly what you said, i.e. added the
PATH environment variable, and that nailed it down! It works! Wow!
That confirms our assumption.
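The reason an empty PATH is enough comes down to how getenv() reports the two cases: an unset variable yields NULL, while a variable that is set but empty yields a valid pointer to "". A minimal sketch in C of that distinction follows; the names env_state, classify_env_value and path_state are made up for illustration and do not come from libvirt.

```c
#include <stdlib.h>

/* Hypothetical names, for illustration only (not libvirt code). */
enum env_state { ENV_UNSET, ENV_EMPTY, ENV_SET };

/* Classify a value exactly as getenv() would hand it back. */
enum env_state classify_env_value(const char *value)
{
    if (value == NULL)
        return ENV_UNSET;   /* variable missing: dereferencing this crashes */
    return value[0] == '\0' ? ENV_EMPTY : ENV_SET;
}

/* Report the state of PATH in the current process environment. */
enum env_state path_state(void)
{
    return classify_env_value(getenv("PATH"));
}
```

Under the environment pasted above, path_state() would presumably return ENV_UNSET; after adding an empty PATH it would return ENV_EMPTY, which string functions can handle safely.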
> >>
> >> > > >> Could you provide a GDB backtrace of the segfault? The syslog
> >> > > >> entry only says that it crashed in libc, that's not enough
> >> > > >> information to debug the segfault.
> >> > > >
> >> > > > Unfortunately, I can't find a related core file in the system. In
> >> > > > fact, no core file is generated. I'll also try to figure this out
> >> > > > and come back to the list.
> >> > > >
> >> > >
> >> > > Getting a backtrace would be simpler if you could reproduce the
> >> > > problem manually. In that case you could just start libvirtd in
> >> > > GDB.
> >> > > But getting a backtrace from a coredump will work too.
> >> > >
> >> > I can't reproduce the segfault when I run it manually. It only
> >> > happens when it's run from this python script. I will try to
> >> > initialize gdb inside the script and connect remotely to the gdb
> >> > session, but it's getting a bit over my debugging capabilities :)
> >> > For example, I don't know how to assign the symbols and source code
> >> > etc from the package build directory to gdb.
> >>
> >> Try creating a wrapper script, eg
> >>
> >> mv /usr/sbin/libvirtd /usr/sbin/libvirtd.real
> >> cat > /usr/sbin/libvirtd <<EOF
> >> #!/bin/sh
> >> cd /tmp
> >> ulimit -c unlimited
> >> exec /usr/sbin/libvirtd.real
> >> EOF
> >> chmod +x /usr/sbin/libvirtd
> >>
> >> That will hopefully give you a core dump in /tmp you can get a
> >> stack trace from
> >
> > Yes, I got the core file with the script. However, when I open the core
> > file with gdb, and use bt command to get the backtrace, the only thing
> > it tells me is this:
> >
> > Core was generated by `/usr/sbin/libvirtd --daemon'.
> > Program terminated with signal 11, Segmentation fault.
> > #0 0xb73ed8f3 in ?? ()
> > (gdb) bt
> > Cannot access memory at address 0x810b9db
> >
> > Maybe I don't know enough of debugging as I know I have to see the code
> > lines (somehow) at this segfault point. Could you guide me on that?
> >
> > Thanks,
> >
> > Br,
> > Emre
> >
>
> Strange backtrace. Maybe there is heap corruption going on so that GDB
> can't make sense out of it anymore.
>
> I'll do some research about the PATH usage in libvirt now.
OK. I guess it's used to find the dhcp daemon, iptables etc. Other service
scripts seem to work happily without this PATH, but I'll ask developers to
add it to the python service environment to make sure it works fine.
Thanks again Matthias, Daniel! I'm a happy guy now :)
Emre Erenoglu
Yes, libvirt tries to discover those binaries via the PATH.
The utility function virFindFileInPath used the result of
getenv("PATH") without checking it for NULL. I'll post a patch for
that in a bit.
Matthias
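
The fix described above amounts to treating a NULL result from getenv("PATH") as "nothing to search" instead of handing it to string functions. Below is a minimal sketch in C of a NULL-safe lookup. It is not the actual virFindFileInPath code or the posted patch; the helper names find_in_path_list and find_in_path are hypothetical, and skipping empty PATH entries is a simplification made for this sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not libvirt's code: search a colon-separated
 * directory list for an executable called `name`. Returns a malloc'd
 * path that the caller must free, or NULL if nothing is found.
 */
char *find_in_path_list(const char *name, const char *path_list)
{
    /* The guard that matters: an unset PATH arrives here as NULL. */
    if (name == NULL || path_list == NULL)
        return NULL;

    size_t name_len = strlen(name);
    const char *start = path_list;

    while (*start != '\0') {
        /* Each entry ends at the next ':' or at the end of the string. */
        size_t dir_len = strcspn(start, ":");

        if (dir_len > 0) {  /* simplification: empty entries are skipped */
            /* dir + '/' + name + terminating NUL */
            size_t full_len = dir_len + 1 + name_len + 1;
            char *candidate = malloc(full_len);
            if (candidate == NULL)
                return NULL;
            snprintf(candidate, full_len, "%.*s/%s",
                     (int)dir_len, start, name);
            if (access(candidate, X_OK) == 0)
                return candidate;
            free(candidate);
        }

        start += dir_len;
        if (*start == ':')
            start++;
    }
    return NULL;
}

/* Convenience wrapper that reads PATH from the environment. */
char *find_in_path(const char *name)
{
    return find_in_path_list(name, getenv("PATH"));
}
```

With this shape, a daemon started without PATH simply fails to locate helper binaries (the lookup returns NULL) rather than crashing, and the caller can report a normal error.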