[libvirt] Segfault in libvirtd when run as a service

Dear list,
I'm trying to package libvirt 0.8.1 for our distribution, Pardus 2009.2. libvirt installs perfectly normally, and libvirtd runs OK when I start it in a console using the root account.
However, when I start libvirtd as a service, with the same parameters, through the normal service startup functions, it segfaults.
The services in Pardus 2009.2 are started using a management backend written in Python, and the service start/stop scripts are Python-based. For libvirt, it's the following: http://svn.pardus.org.tr/pardus/playground/ozan/libvirt/comar/service.py
Whatever I did, I couldn't find out why libvirt is crashing. It works normally when I run it from a console with exactly the same parameters. Here's an earlier syslog section ending with the crash:
May 21 17:34:31 voyager kernel: [148326.330658] Bridge firewalling registered
May 21 17:34:31 voyager kernel: [148326.626654] virbr0: starting userspace STP failed, starting kernel STP
May 21 17:34:31 voyager avahi-daemon[559]: Joining mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
May 21 17:34:31 voyager avahi-daemon[559]: New relevant interface virbr0.IPv4 for mDNS.
May 21 17:34:31 voyager avahi-daemon[559]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
May 21 17:34:32 voyager kernel: [148327.158860] ip_tables: (C) 2000-2006 Netfilter Core Team
May 21 17:34:32 voyager kernel: [148327.359921] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
May 21 17:34:32 voyager kernel: [148327.360680] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
May 21 17:34:32 voyager kernel: [148327.360682] nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
May 21 17:34:32 voyager kernel: [148327.360683] sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
May 21 17:34:33 voyager dnsmasq[13918]: started, version 2.47 cachesize 150
May 21 17:34:33 voyager dnsmasq[13918]: compile time options: IPv6 GNU-getopt DBus no-I18N TFTP
May 21 17:34:33 voyager dnsmasq[13918]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
May 21 17:34:33 voyager dnsmasq[13918]: reading /etc/resolv.conf
May 21 17:34:33 voyager dnsmasq[13918]: using nameserver 193.140.100.215#53
May 21 17:34:33 voyager dnsmasq[13918]: using nameserver 193.140.100.210#53
May 21 17:34:33 voyager dnsmasq[13918]: using nameserver 192.168.1.1#53
May 21 17:34:33 voyager dnsmasq[13918]: read /etc/hosts - 7 addresses
May 21 17:34:33 voyager kernel: [148328.620605] libvirtd[13826]: segfault at 0 ip b74aa8f3 sp bfcd24fc error 4 in libc-2.9.so[b7431000+161000]
Thanks for any insight,
Best Regards,
--
Emre Erenoglu

There are some things to consider:
- Did you use the exact same commandline as the initscript when testing manually?
- Did you make sure to use the same environment variable configuration when starting libvirtd manually, compared to the initscript?
Could you provide a GDB backtrace of the segfault? The syslog entry only says that it crashed in libc, that's not enough information to debug the segfault.
Matthias

On Thu, Jun 10, 2010 at 2:05 PM, Matthias Bolte <matthias.bolte@googlemail.com> wrote:
There are some things to consider:
- Did you use the exact same commandline as the initscript when testing manually?
Yes. In fact, the only parameter passed is the --daemon parameter with current configuration.
- Did you make sure to use the same environment variable configuration when starting libvirtd manually, compared to the initscript?
Here's the environment of the root user; I will try to find out the environment of the service script:
MANPATH=/usr/local/share/man:/usr/share/man:/opt/sun-jre/man:/usr/kde/4/share/man
HOSTNAME=EMRE
SHELL=/bin/bash
TERM=linux
XDG_SESSION_COOKIE=3d6ade2bb28141896f3212d64bf41670-1276174999.886063-1263776093
HUSHLOGIN=FALSE
LC_ALL=en_US.UTF-8
USER=root
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.pisi=01;33:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.ogv=01\:35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.pdf=00;32:*.ps=00;32:*.txt=00;32:*.patch=00;32:*.diff=00;32:*.log=00;32:*.tex=00;32:*.doc=00;32:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:
GUILE_LOAD_PATH=/usr/share/guile/1.8
MC_ENV=/usr/share/mc/bin/mc.sh
PAGER=/usr/bin/less
CONFIG_PROTECT_MASK=/etc/texmf/web2c /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:/opt/sun-jre/bin:/usr/kde/4/sbin:/usr/kde/4/bin
PWD=/root
JAVA_HOME=/opt/sun-jre
EDITOR=/bin/nano
LESSCOLOR=yes
LANG=en_US.UTF-8
PYTHONSTARTUP=/etc/pythonstart
PS1=\[\033[1;31m\]\h \[\033[1;34m\]\W \$ \[\033[00m\]
SHLVL=1
HOME=/root
LD_BIND_DIRECT=1
LESS=-R -M --shift 5
LOGNAME=root
CVS_RSH=ssh
XDG_DATA_DIRS=/usr/kde/4/share:/usr/share
PKG_CONFIG_PATH=/usr/kde/4/lib/pkgconfig:/usr/qt/4/lib/pkgconfig
LESSOPEN=|lesspipe.sh %s
INFOPATH=/usr/share/info
LADSPA_PATH=/usr/lib/ladspa
SANE_CONFIG_DIR=/etc/sane.d
_=/usr/bin/env
Do you see any environment variable that may affect the behaviour of libvirtd?
Could you provide a GDB backtrace of the segfault? The syslog entry only says that it crashed in libc, that's not enough information to debug the segfault.
Unfortunately, I can't find a related core file on the system. In fact, no core file is generated. I'll try to figure this out and come back to the list.
Thanks a lot,
Emre

2010/6/10 Emre Erenoglu <erenoglu@gmail.com>:
On Thu, Jun 10, 2010 at 2:05 PM, Matthias Bolte <matthias.bolte@googlemail.com> wrote:
2010/6/10 Emre Erenoglu <erenoglu@gmail.com>:
Dear list,
I'm trying to package libvirt 0.8.1 for our distribution, Pardus 2009.2. libvirt is installed perfectly normal, and libvirtd runs OK when I start it in a console using root account.
However, when I start libvirtd as a service, with the same parameters, through the normal service startup functions, it segfaults.
The services in Pardus 2009.2 are started using a management backend which works with python and service start/stop scripts are python based.
For libvirt, it's the following: http://svn.pardus.org.tr/pardus/playground/ozan/libvirt/comar/service.py
Whatever I did, I couldn't find why libvirt is crashing. It works normal when I run it from console with exactly the same parameters. Here's an earlier syslog section ending with the crash:
There are some things to consider:
- Did you use the exact same commandline as the initscript when testing manually?
Yes. In fact, the only parameter passed is the --daemon parameter with current configuration.
With the absolute path, as in the initscript?
/usr/sbin/libvirtd --daemon --config /etc/libvirt/libvirtd.conf
Assuming LIBVIRTD_ARGS is empty in the initscript.
- Did you make sure to use the same environment variable configuration when starting libvirtd manually, compared to the initscript?
Here's the environment of the root user, I will try to find out the environment of the service script:
MANPATH=/usr/local/share/man:/usr/share/man:/opt/sun-jre/man:/usr/kde/4/share/man HOSTNAME=EMRE SHELL=/bin/bash TERM=linux XDG_SESSION_COOKIE=3d6ade2bb28141896f3212d64bf41670-1276174999.886063-1263776093 HUSHLOGIN=FALSE LC_ALL=en_US.UTF-8 USER=root LS_COLORS= ... GUILE_LOAD_PATH=/usr/share/guile/1.8 MC_ENV=/usr/share/mc/bin/mc.sh PAGER=/usr/bin/less CONFIG_PROTECT_MASK=/etc/texmf/web2c /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:/opt/sun-jre/bin:/usr/kde/4/sbin:/usr/kde/4/bin
I asked about the environment variables and the commandline because you have /usr/local/sbin before /usr/sbin in PATH. So you might have two libvirtds installed, one in /usr/local/sbin and one in /usr/sbin. The initscript explicitly starts the one in /usr/sbin. If you just start libvirtd manually without an absolute path, then you'll start the one in /usr/local/sbin. This might explain why you cannot reproduce the segfault manually, but it doesn't explain why the segfault happens.
Could you provide a GDB backtrace of the segfault? The syslog entry only says that it crashed in libc, that's not enough information to debug the segfault.
Unfortunately, I can't find a related core file on the system. In fact, no core file is generated. I'll try to figure this out and come back to the list.
Getting a backtrace would be simpler if you could reproduce the problem manually. In that case you could just start libvirtd in GDB. But getting a backtrace from a coredump will work too.
Matthias

On Thu, Jun 10, 2010 at 5:02 PM, Matthias Bolte <matthias.bolte@googlemail.com> wrote:
2010/6/10 Emre Erenoglu <erenoglu@gmail.com>:
On Thu, Jun 10, 2010 at 2:05 PM, Matthias Bolte <matthias.bolte@googlemail.com> wrote:
There are some things to consider:
- Did you use the exact same commandline as the initscript when testing manually?
Yes. In fact, the only parameter passed is the --daemon parameter with current configuration.
With the absolute path, as in the initscript?
/usr/sbin/libvirtd --daemon --config /etc/libvirt/libvirtd.conf
Assuming LIBVIRTD_ARGS is empty in the initscript.
Yes, if you check the script service.py, you'll see. We start libvirtd with the absolute path and exactly the above parameters. The conf file itself is the default one.
I asked about the environment variables and the commandline because you have /usr/local/sbin before /usr/sbin in PATH. So you might have two libvirtds installed, one in /usr/local/sbin and one in /usr/sbin.
The initscript explicitly starts the one in /usr/sbin. If you just start libvirtd manually without an absolute path then you'll start the one in /usr/local/sbin. This might explain why you cannot reproduce the segfault manually, but it doesn't explain why the segfault happens.
There's no other installation of libvirt in the system. I can also reproduce the same thing in all Pardus machines, so I believe it's something in libvirt not doing well with something else in our service init mechanisms.
Could you provide a GDB backtrace of the segfault? The syslog entry only says that it crashed in libc, that's not enough information to debug the segfault.
Unfortunately, I can't find a related core file on the system. In fact, no core file is generated. I'll try to figure this out and come back to the list.
Getting a backtrace would be simpler if you could reproduce the problem manually. In that case you could just start libvirtd in GDB. But getting a backtrace from a coredump will work too.
I can't reproduce the segfault when I run it manually. It only happens when it's run from this Python script. I will try to start gdb from the script and connect to the gdb session remotely, but it's getting a bit beyond my debugging capabilities :) For example, I don't know how to make gdb use the symbols and source code etc. from the package build directory.
Thanks a lot for your support, Matthias!
Emre

On Thu, Jun 10, 2010 at 08:57:15PM +0300, Emre Erenoglu wrote:
On Thu, Jun 10, 2010 at 5:02 PM, Matthias Bolte <matthias.bolte@googlemail.com> wrote:
2010/6/10 Emre Erenoglu <erenoglu@gmail.com>:
The initscript explicitly starts the one in /usr/sbin. If you just start libvirtd manually without an absolute path, then you'll start the one in /usr/local/sbin. This might explain why you cannot reproduce the segfault manually, but it doesn't explain why the segfault happens.
There's no other installation of libvirt in the system. I can also reproduce the same thing in all Pardus machines, so I believe it's something in libvirt not doing well with something else in our service init mechanisms.
I guess I'd put money on some environment variable causing trouble. It could be a *missing* environment variable that we expect to always be set, or something like that.
Getting a backtrace would be simpler if you could reproduce the problem manually. In that case you could just start libvirtd in GDB. But getting a backtrace from a coredump will work too.
I can't reproduce the segfault when I run it manually. It only happens when it's run from this Python script. I will try to start gdb from the script and connect to the gdb session remotely, but it's getting a bit beyond my debugging capabilities :) For example, I don't know how to make gdb use the symbols and source code etc. from the package build directory.
Try creating a wrapper script, e.g.
mv /usr/sbin/libvirtd /usr/sbin/libvirtd.real
cat > /usr/sbin/libvirtd <<'EOF'
#!/bin/sh
cd /tmp
ulimit -c unlimited
exec /usr/sbin/libvirtd.real "$@"
EOF
chmod +x /usr/sbin/libvirtd
That will hopefully give you a core dump in /tmp that you can get a stack trace from.
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
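The `ulimit -c unlimited` line in the wrapper raises the core-file size limit that the daemon inherits. A launcher written in C could do the equivalent with `getrlimit()`/`setrlimit()` right before exec'ing the daemon. The sketch below is a more conservative variant, assuming an unprivileged caller: it only raises the soft limit up to the existing hard limit, which needs no special privileges. The function name `enable_core_dumps` is hypothetical, and whether a core file actually appears also depends on the working directory being writable and on the system's `kernel.core_pattern` setting.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_CORE limit to the hard limit.
 * Raising a soft limit up to the hard limit needs no privileges; the new
 * limit is inherited across fork() and preserved across execve().
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;          /* soft limit := hard limit */
    return setrlimit(RLIMIT_CORE, &rl) == 0 ? 0 : -1;
}
```

Called just before `execve()` of `/usr/sbin/libvirtd.real`, this plays the role of the `ulimit` line; the `cd /tmp` line would correspond to a `chdir("/tmp")` call.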

On Thu, Jun 10, 2010 at 9:07 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
I guess I'd put money on some environment variable causing trouble. It could be a *missing* environment variable that we expect to always be set, or something like that.
Hi Daniel, thanks for your message. Yes, I wrote a small script as you suggested and found this environment while libvirtd was running:
DBUS_STARTER_ADDRESS=unix:path=/var/run/dbus/system_bus_socket,guid=6c515f612162b05d554b59cd4c112d43
KRB5_KTNAME=/etc/libvirt/krb5.tab
PWD=/
DBUS_STARTER_BUS_TYPE=system
SHLVL=1
_=/usr/bin/env
This looks very sparse compared to the standard root environment that I pasted in my earlier message.
Try creating a wrapper script, e.g.
mv /usr/sbin/libvirtd /usr/sbin/libvirtd.real
cat > /usr/sbin/libvirtd <<'EOF'
#!/bin/sh
cd /tmp
ulimit -c unlimited
exec /usr/sbin/libvirtd.real "$@"
EOF
chmod +x /usr/sbin/libvirtd
That will hopefully give you a core dump in /tmp that you can get a stack trace from.
Yes, I got the core file with the script. However, when I open the core file with gdb and use the bt command to get the backtrace, this is all it tells me:
Core was generated by `/usr/sbin/libvirtd --daemon'.
Program terminated with signal 11, Segmentation fault.
#0 0xb73ed8f3 in ?? ()
(gdb) bt
Cannot access memory at address 0x810b9db
Maybe I don't know enough about debugging; I know I have to see the code lines (somehow) at this segfault point. Could you guide me on that?
Thanks,
Br, Emre
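Since the crash only showed up under the service manager, one way to reproduce it by hand is to start the binary with exactly the environment captured above and nothing inherited from the login shell (from a shell, `env -i` does this). Below is a minimal C sketch of that idea using `fork()`, `execve()` and `waitpid()`; the helper name `run_with_env` and its return convention are hypothetical. A death by signal is reported as a negative number, so a SIGSEGV would show up as -11.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `prog` with exactly the environment in `envp`
 * (nothing is inherited from the caller) and wait for it. Returns:
 *   >= 0   the child's exit status (127 if execve itself failed)
 *   -1000  if fork() or waitpid() failed
 *   otherwise minus the signal number that killed it (e.g. -11 = SIGSEGV) */
int run_with_env(const char *prog, char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1000;
    if (pid == 0) {
        execve(prog, argv, envp);
        _exit(127);                      /* only reached if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1000;
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1000;
}

/* Possible usage with the service environment captured above (abbreviated).
 * libvirtd is run in the foreground, without --daemon, so the parent sees
 * the signal; that the crash also happens in the foreground is an
 * assumption. If it does not crash, the call blocks until libvirtd exits.
 *
 *   char *env[]  = { "KRB5_KTNAME=/etc/libvirt/krb5.tab", "PWD=/",
 *                    "DBUS_STARTER_BUS_TYPE=system", "SHLVL=1", NULL };
 *   char *args[] = { "/usr/sbin/libvirtd", NULL };
 *   int rc = run_with_env("/usr/sbin/libvirtd", args, env);
 */
```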

2010/6/10 Emre Erenoglu <erenoglu@gmail.com>:
Hi Daniel, thanks for your message. Yes, I wrote a small script as you suggested and found this environment while libvirtd was running:
DBUS_STARTER_ADDRESS=unix:path=/var/run/dbus/system_bus_socket,guid=6c515f612162b05d554b59cd4c112d43
KRB5_KTNAME=/etc/libvirt/krb5.tab
PWD=/
DBUS_STARTER_BUS_TYPE=system
SHLVL=1
_=/usr/bin/env
This looks very sparse compared to the standard root environment that I pasted in my earlier message.
No PATH? I bet there is code in libvirt that assumes getenv("PATH") will be != NULL. Could you try adding PATH to the environment? It can be empty, that doesn't matter. Just make sure it's there, so getenv("PATH") returns an empty string instead of NULL.
Yes, I got the core file with the script. However, when I open the core file with gdb and use the bt command to get the backtrace, this is all it tells me:
Core was generated by `/usr/sbin/libvirtd --daemon'.
Program terminated with signal 11, Segmentation fault.
#0 0xb73ed8f3 in ?? ()
(gdb) bt
Cannot access memory at address 0x810b9db
Maybe I don't know enough about debugging; I know I have to see the code lines (somehow) at this segfault point. Could you guide me on that?
Thanks,
Br, Emre
Strange backtrace. Maybe there is heap corruption going on, so GDB can't make sense of it anymore. I'll do some research about the PATH usage in libvirt now.
Matthias
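Matthias's suggestion works because `getenv()` distinguishes an unset variable from one set to the empty string: for an unset PATH it returns NULL, while `PATH=` yields a pointer to "". Handing that NULL to a libc string function such as `strlen()` reads address 0 inside libc, which is consistent with the "segfault at 0 ... error 4 in libc-2.9.so" syslog line (error 4 is a user-mode read of a non-present page). In the sketch below, `strlen()` is only an illustrative stand-in; it does not show what libvirt itself did with the pointer, and the name `path_length` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Unsafe pattern: strlen(getenv("PATH")) dereferences NULL when PATH is
 * unset (undefined behaviour; in practice a segfault inside libc). */

/* Hypothetical safe variant: treat an unset PATH like an empty one. */
size_t path_length(void)
{
    const char *path = getenv("PATH");

    if (path == NULL)       /* unset: getenv() returns NULL, not "" */
        return 0;
    return strlen(path);    /* set, possibly to the empty string */
}
```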

On Thu, Jun 10, 2010 at 10:35 PM, Matthias Bolte <matthias.bolte@googlemail.com> wrote:
No PATH? I bet there is code in libvirt that assumes getenv("PATH") will be != NULL.
Could you try adding PATH to the environment? It can be empty, that doesn't matter. Just make sure it's there, so getenv("PATH") returns an empty string instead of NULL.
I had the same instinct and did exactly what you said, i.e. added the PATH environment variable, and that nailed it! It works! Wow!
Strange backtrace. Maybe there is heap corruption going on so that GDB can't make sense out of it anymore.
I'll do some research about the PATH usage in libvirt now.
OK. I guess it's used to find the dhcp daemon, iptables etc. Other service scripts seem to work happily without this PATH, but I'll ask developers to add it to the python service environment to make sure it works fine.
Thanks again Matthias, Daniel! I'm a happy guy now :)
Emre Erenoglu

2010/6/10 Emre Erenoglu <erenoglu@gmail.com>:
I had the same instinct and did exactly what you said, i.e. added the PATH environment variable, and that nailed it! It works! Wow!
That confirms our assumption.
OK. I guess it's used to find the dhcp daemon, iptables etc. Other service scripts seem to work happily without this PATH, but I'll ask developers to add it to the python service environment to make sure it works fine.
Thanks again Matthias, Daniel! I'm a happy guy now :)
Emre Erenoglu
Yes, libvirt tries to discover those binaries via the PATH. The utility function virFindFileInPath used the result of getenv("PATH") without checking it for NULL. I'll post a patch for that in a bit.
Matthias
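A NULL-safe PATH lookup of the kind Matthias describes might look like the following sketch. It is not libvirt's actual `virFindFileInPath` or the patch he posted; the name `find_file_in_path` and its convention (a malloc'd path, or NULL when nothing is found) are assumptions for illustration. It also shows the lookup order discussed earlier in the thread: directories are tried left to right and the first executable match wins, so with /usr/local/sbin ahead of /usr/sbin in PATH, a copy in /usr/local/sbin would shadow the one in /usr/sbin.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical NULL-safe PATH search. Returns a malloc'd path to the first
 * executable named `file` found in $PATH (caller frees), or NULL.
 * Simplifications: empty PATH entries (which POSIX treats as the current
 * directory) are skipped, and directories are not excluded. */
char *find_file_in_path(const char *file)
{
    if (file == NULL || file[0] == '\0')
        return NULL;

    /* A name containing '/' is used as given, not searched for in PATH. */
    if (strchr(file, '/') != NULL)
        return access(file, X_OK) == 0 ? strdup(file) : NULL;

    const char *origpath = getenv("PATH");
    if (origpath == NULL)              /* the missing check: PATH unset */
        return NULL;

    char *path = strdup(origpath);     /* strtok_r() modifies its input */
    if (path == NULL)
        return NULL;

    char *result = NULL;
    char *saveptr = NULL;
    for (char *dir = strtok_r(path, ":", &saveptr); dir != NULL;
         dir = strtok_r(NULL, ":", &saveptr)) {
        size_t len = strlen(dir) + strlen(file) + 2;   /* '/' and '\0' */
        char *candidate = malloc(len);
        if (candidate == NULL)
            break;
        snprintf(candidate, len, "%s/%s", dir, file);
        if (access(candidate, X_OK) == 0) {
            result = candidate;        /* first match in PATH order wins */
            break;
        }
        free(candidate);
    }

    free(path);
    return result;
}
```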

2010/6/10 Matthias Bolte <matthias.bolte@googlemail.com>:
Yes, libvirt tries to discover those binaries via the PATH.
The utility function virFindFileInPath used the result of getenv("PATH") without checking it for NULL. I'll post a patch for that in a bit.
Matthias
As an additional note: you'll need to have PATH set even when this bug is fixed, because libvirtd discovers the QEMU binaries and the other relevant binaries via the PATH. So you should define PATH for libvirtd in a way that it can find those binaries.
Matthias
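On the service side, the fix amounts to making sure PATH is present, and useful, before libvirtd is started. The Pardus service scripts are Python; the C sketch below expresses the same idea for a hypothetical launcher that calls it before exec'ing libvirtd with `execv()` (which passes the current environment on). Both the name `ensure_path` and the default search path are assumptions; the default must be adjusted so it covers wherever the QEMU binaries, dnsmasq and iptables live on the target distribution.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical default search path for a service launcher. */
#define SERVICE_DEFAULT_PATH "/usr/sbin:/usr/bin:/sbin:/bin"

/* Make sure PATH is set and non-empty before exec'ing a daemon such as
 * libvirtd. A non-empty PATH inherited from the caller is kept unchanged;
 * an unset or empty one is replaced by `fallback`. Returns 0 on success,
 * -1 if setenv() fails. */
int ensure_path(const char *fallback)
{
    const char *cur = getenv("PATH");

    if (cur != NULL && strlen(cur) > 0)
        return 0;
    return setenv("PATH", fallback, 1);
}
```

Treating an empty PATH like a missing one is a deliberate choice here: an empty PATH avoids the NULL dereference but still leaves libvirtd unable to find any helper binaries.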

On Thu, Jun 10, 2010 at 11:01 PM, Matthias Bolte <matthias.bolte@googlemail.com> wrote:
As an additional note: you'll need to have PATH set even when this bug is fixed, because libvirtd discovers the QEMU binaries and the other relevant binaries via the PATH. So you should define PATH for libvirtd in a way that it can find those binaries.
OK, I'll make sure the PATH is there. I'm fixing the service script now.
--
Emre
participants (3)
- Daniel P. Berrange
- Emre Erenoglu
- Matthias Bolte