[libvirt] libvirtd crashing with network bridge configuration

Dear Developers, Users,

I'm trying to package libvirt for our distribution, but I'm having problems with releases after the 0.4.3 series. Everything compiles and installs fine.

With versions 0.5.1 and later, I'm getting the following errors in the dmesg output:

virbr0: Dropping NETIF_F_UFO since no NETIF_F_HW_CSUM feature.
virbr0: starting userspace STP failed, starting kernel STP
sysfs: duplicate filename '0' can not be created
------------[ cut here ]------------
WARNING: at fs/sysfs/dir.c:424 sysfs_add_one+0x34/0xa6()
Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables microcode firmware_class acpi_cpufreq cpufreq_powersave cpufreq_userspace ipv6 bridge fuse loop tun kvm_intel kvm coretemp w83627ehf hwmon_vid hwmon snd_hda_intel nvidia(P) snd_seq_dummy snd_seq_oss snd_seq_midi_event joydev snd_seq rtc_cmos rtc_core rtc_lib snd_seq_device i2c_i801 iTCO_wdt usbhid serio_raw snd_pcm_oss snd_mixer_oss hid iTCO_vendor_support snd_pcm ff_memless snd_timer snd_page_alloc thermal atl1 snd_hwdep processor intel_agp snd button mii agpgart i2c_core soundcore sg ext3 jbd mbcache raid456 async_xor async_memcpy async_tx xor raid10 raid1 raid0 dm_mod sd_mod sr_mod cdrom usb_storage pata_jmicron ehci_hcd uhci_hcd usbcore ohci1394 ieee1394 pata_acpi ahci ata_generic libata scsi_mod dock
Pid: 3605, comm: libvirtd Tainted: P 2.6.25.20-113 #1
[<c02d60e6>] ? printk+0xf/0x11
[<c0122e01>] warn_on_slowpath+0x41/0x67
[<c01dbae8>] ? vsnprintf+0x287/0x40f
[<c012352e>] ? release_console_sem+0x194/0x19c
[<c01d7e32>] ? ida_get_new_above+0xd0/0x171
[<c0185afb>] ? find_inode+0x1f/0x5b
[<c01afde8>] ? sysfs_ilookup_test+0x0/0x11
[<c0185bfb>] ? ifind+0x32/0x76
[<c01b00ca>] sysfs_add_one+0x34/0xa6
[<c01b05b2>] create_dir+0x43/0x72
[<c01b060e>] sysfs_create_dir+0x2d/0x41
[<c01d85de>] ? kobject_get+0x12/0x17
[<c01d86dc>] kobject_add_internal+0xa9/0x14a
[<c01d8811>] kobject_add_varg+0x35/0x41
[<c01d883d>] kobject_init_and_add+0x20/0x22
[<c012ae70>] uids_user_create+0x36/0x5b
[<c012b028>] alloc_uid+0xf3/0x1b6
[<c01486c6>] copy_user_ns+0x59/0xaf
[<c0136833>] create_new_namespaces+0xe9/0x196
[<c013697e>] copy_namespaces+0x45/0x73
[<c0122152>] copy_process+0xba9/0x11e5
[<c012283d>] do_fork+0xaf/0x1e4
[<c0103428>] sys_clone+0x1f/0x21
[<c0104a2e>] syscall_call+0x7/0xb
[<c02d0000>] ? pci_fixup_msi_k8t_onboard_sound+0x68/0x98
=======================
---[ end trace e8211acaf4c7288d ]---
kobject_add_internal failed for 0 with -EEXIST, don't try to register things with the same name in the same directory.
Pid: 3605, comm: libvirtd Tainted: P 2.6.25.20-113 #1
[<c01d876b>] kobject_add_internal+0x138/0x14a
[<c01d8811>] kobject_add_varg+0x35/0x41
[<c01d883d>] kobject_init_and_add+0x20/0x22
[<c012ae70>] uids_user_create+0x36/0x5b
[<c012b028>] alloc_uid+0xf3/0x1b6
[<c01486c6>] copy_user_ns+0x59/0xaf
[<c0136833>] create_new_namespaces+0xe9/0x196
[<c013697e>] copy_namespaces+0x45/0x73
[<c0122152>] copy_process+0xba9/0x11e5
[<c012283d>] do_fork+0xaf/0x1e4
[<c0103428>] sys_clone+0x1f/0x21
[<c0104a2e>] syscall_call+0x7/0xb
[<c02d0000>] ? pci_fixup_msi_k8t_onboard_sound+0x68/0x98
=======================
virbr0: no IPv6 routers present

ps ax shows the following output right after the service is started:

3605 ? S 0:00 /usr/sbin/libvirtd --daemon
3624 ? Z 0:00 [libvirtd] <defunct>

If I don't enable any network in /etc/libvirt/qemu/networks/autostart, then there's no error.

Any clues are appreciated; I'm unable to proceed.

Best Regards,
Emre

On Wed, Dec 17, 2008 at 02:05:33AM +0100, Emre Erenoglu wrote:
> Dear Developers, Users,
> I'm trying to package libvirt for our distribution, but I'm having problems with releases after the 0.4.3 series. Everything compiles and installs fine.
> With versions 0.5.1 and later, I'm getting the following errors in the dmesg output:

The kernel bug traces aren't really helpful for finding out what's wrong with libvirtd in userspace.

> ps ax shows the following output right after the service is started:
> 3605 ? S 0:00 /usr/sbin/libvirtd --daemon
> 3624 ? Z 0:00 [libvirtd] <defunct>
> If I don't enable any network in /etc/libvirt/qemu/networks/autostart, then there's no error.

This suggests libvirtd itself is *not* crashing. Some process that libvirtd runs is dying, though I'm not sure which. Please kill all libvirtd instances, make sure you have compiled with debugging info turned on (i.e., the '-g' compile flag), then run

# valgrind /usr/sbin/libvirtd

And also try

LIBVIRT_DEBUG=1 /usr/sbin/libvirtd

and send the output for both of these.

Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Hi Daniel,

On Thu, Dec 18, 2008 at 1:47 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
>> ps ax shows the following output right after the service is started:
>> 3605 ? S 0:00 /usr/sbin/libvirtd --daemon
>> 3624 ? Z 0:00 [libvirtd] <defunct>
>> If I don't enable any network in /etc/libvirt/qemu/networks/autostart, then there's no error.
> This suggests libvirtd itself is *not* crashing. Some process that libvirtd runs is dying, though I'm not sure which.
> Please kill all libvirtd instances, make sure you have compiled with debugging info turned on (i.e., the '-g' compile flag), then run
> # valgrind /usr/sbin/libvirtd
> And also try
> LIBVIRT_DEBUG=1 /usr/sbin/libvirtd
> and send the output for both of these.

The output of each run separately, and of both combined (3 files in total), is attached. Guessing that there might be a compiler flag problem, here is what our default build configuration uses when compiling packages:

cflags = -mtune=generic -march=i686 -O2 -pipe -fomit-frame-pointer -fstack-protector -D_FORTIFY_SOURCE=2
cxxflags = -mtune=generic -march=i686 -O2 -pipe -fomit-frame-pointer -fstack-protector -D_FORTIFY_SOURCE=2
host = i686-pc-linux-gnu
jobs = -j1
ldflags = -Wl,-O1 -Wl,-z,relro -Wl,--hash-style=gnu

A typical compile line for libvirt looks like the following; the only change from our defaults is the -g debug flag that we added:

gcc -DHAVE_CONFIG_H -I. -I.. -I../gnulib/lib -I../gnulib/lib -I../include -I../include -I../qemud -I/usr/include/libxml2 -DLIBDIR=\"/usr/lib\" -DBINDIR=\"/usr/libexec\" -DSBINDIR=\"/usr/sbin\" -DSYSCONF_DIR=\"/etc\" -DLOCALEBASEDIR=\"/usr/share/locale\" -DLOCAL_STATE_DIR=\"/var\" -DGETTEXT_PACKAGE=\"libvirt\" -Wall -Wformat -Wformat-security -Wmissing-prototypes -Wnested-externs -Wpointer-arith -Wextra -Wshadow -Wcast-align -Wwrite-strings -Waggregate-return -Wstrict-prototypes -Winline -Wredundant-decls -Wno-sign-compare -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fasynchronous-unwind-tables -mtune=generic -march=i686 -O2 -pipe -fomit-frame-pointer -fstack-protector -D_FORTIFY_SOURCE=2 -g -MT libvirt_driver_xen_la-xen_internal.lo -MD -MP -MF .deps/libvirt_driver_xen_la-xen_internal.Tpo -c xen_internal.c -fPIC -DPIC -o .libs/libvirt_driver_xen_la-xen_internal.o

Thanks again for any insight, and please don't hesitate to ask if you need further info from my side.

Best Regards,
Emre

On Fri, Dec 19, 2008 at 12:47:14AM +0100, Emre Erenoglu wrote:
> Hi Daniel,
> On Thu, Dec 18, 2008 at 1:47 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
>>> ps ax shows the following output right after the service is started:
>>> 3605 ? S 0:00 /usr/sbin/libvirtd --daemon
>>> 3624 ? Z 0:00 [libvirtd] <defunct>
>>> If I don't enable any network in /etc/libvirt/qemu/networks/autostart, then there's no error.
>> This suggests libvirtd itself is *not* crashing. Some process that libvirtd runs is dying, though I'm not sure which.
>> Please kill all libvirtd instances, make sure you have compiled with debugging info turned on (i.e., the '-g' compile flag), then run
>> # valgrind /usr/sbin/libvirtd
>> And also try
>> LIBVIRT_DEBUG=1 /usr/sbin/libvirtd
>> and send the output for both of these.
> The output of each run separately, and of both combined (3 files in total), is attached. Guessing that there might be a compiler flag problem, here is what our default build configuration uses when compiling packages:

The valgrind output was all fine; the warnings it issues are all harmless. The key is this message from the libvirt debug output:

DEBUG: util.c: virExec (dnsmasq --keep-in-foreground --strict-order --bind-interfaces --pid-file --conf-file --listen-address 192.168.122.1 --except-interface lo --dhcp-leasefile=/var/lib/libvirt/dhcp-default.leases --dhcp-range 192.168.122.2,192.168.122.254)
libvir: error : internal error cannot execute binary 'dnsmasq': No such file or directory

This missing 'dnsmasq' binary is what is causing the 'defunct' process you see. If you install dnsmasq, it should all work as expected.

Daniel
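
As an illustration of why a failed exec shows up as a 'defunct' entry: a child that exits stays in state Z in the ps listing until its parent collects its status with waitpid(). The sketch below is not libvirt's virExec; spawn_and_reap is a hypothetical helper written for this note, and returning 127 for a failed exec only follows the usual shell convention.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libvirt): run argv[0] via PATH lookup
 * and wait for it, so that no zombie is left behind.
 * Returns the child's exit status, 127 if the exec itself failed
 * (for example because the binary is missing), or -1 on fork/wait
 * errors or if the child was killed by a signal.
 */
int spawn_and_reap(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execvp(argv[0], argv);
        /* Only reached when execvp() failed, e.g. with ENOENT. */
        _exit(127);
    }

    int status;
    /* Reap the child; retry if a signal interrupts the wait. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_and_reap with an argv whose first element names a binary that is not on the PATH returns 127 and leaves no zombie, because the parent always waits for the child.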

Hi Daniel,

On Fri, Dec 19, 2008 at 11:19 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
> The key is this message from the libvirt debug output:
> DEBUG: util.c: virExec (dnsmasq --keep-in-foreground --strict-order --bind-interfaces --pid-file --conf-file --listen-address 192.168.122.1 --except-interface lo --dhcp-leasefile=/var/lib/libvirt/dhcp-default.leases --dhcp-range 192.168.122.2,192.168.122.254)
> libvir: error : internal error cannot execute binary 'dnsmasq': No such file or directory
> This missing 'dnsmasq' binary is what is causing the 'defunct' process you see. If you install dnsmasq, it should all work as expected.

I'm very surprised. dnsmasq is not listed in the dependencies of libvirt as far as I know. I assumed it was a "nice to have" package that enables DHCP address assignment for guests. Even if it isn't installed, I should still be able to assign static IP addresses to my guests manually. So if the system crashes because dnsmasq is missing, I would call that a bug. Nevertheless, I'll try to build this package for our distro and see what happens. I'll keep you posted; thanks again for your interest.

Emre

Hi Again,

On Sat, Dec 20, 2008 at 12:25 AM, Emre Erenoglu <erenoglu@gmail.com> wrote:
> Hi Daniel,
> I'm very surprised. dnsmasq is not listed in the dependencies of libvirt as far as I know. I assumed it was a "nice to have" package that enables DHCP address assignment for guests. Even if it isn't installed, I should still be able to assign static IP addresses to my guests manually.
> So if the system crashes because dnsmasq is missing, I would call that a bug. Nevertheless, I'll try to build this package for our distro and see what happens. I'll keep you posted; thanks again for your interest.

My suspicions were correct. dnsmasq does not have anything to do with the crash. I installed the dnsmasq package and it works fine:

2616 pts/1 S+ 0:00 /usr/sbin/libvirtd
2637 pts/1 S+ 0:00 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --pid-file --conf-file --listen-address 192.168.122.1 --except-interface lo --dhcp-leasefile=/var/lib/libvirt/dhcp-d

However, I still have the crash in the dmesg output, as before, with errors like:

sysfs: duplicate filename '0' can not be created
------------[ cut here ]------------
WARNING: at fs/sysfs/dir.c:424 sysfs_add_one+0x34/0xa6()
...
Pid: 2616, comm: libvirtd Tainted: P 2.6.25.20-113 #1
...
kobject_add_internal failed for 0 with -EEXIST, don't try to register things with the same name in the same directory.
...
Pid: 2616, comm: libvirtd Tainted: P 2.6.25.20-113 #1

The last lines of the DEBUG output show:

DEBUG: util.c: virExec (dnsmasq --keep-in-foreground --strict-order --bind-interfaces --pid-file --conf-file --listen-address 192.168.122.1 --except-interface lo --dhcp-leasefile=/var/lib/libvirt/dhcp-default.leases --dhcp-range 192.168.122.2,192.168.122.254)
DEBUG: lxc_container.c: lxcContainerAvailable (clone call returned Cannot allocate memory, container support is not enabled)

I still suspect the compiler flags. Any suggestions? What about this last DEBUG message regarding container support?

Thanks a lot & Best Regards,
Emre

Hi Again,

I've started to learn how to use gdb and came across these lines, where the "for" loop appears to be causing the crash:

/**
 * virStateInitialize:
 *
 * Initialize all virtualization drivers.
 *
 * Return 0 if all succeed, -1 upon any failure.
 */
int virStateInitialize(void) {
    int i, ret = 0;

    if (virInitialize() < 0)
        return -1;

    for (i = 0 ; i < virStateDriverTabCount ; i++) {
        if (virStateDriverTab[i]->initialize &&
            virStateDriverTab[i]->initialize() < 0)
            ret = -1;
    }
    return ret;
}

Somewhere around the 5th or 6th iteration of the "for" loop, I'm guessing that the expression virStateDriverTab[i]->initialize && virStateDriverTab[i]->initialize() is crashing. Any clue is appreciated.

Emre
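
One way to narrow down which iteration of such a loop misbehaves, without single-stepping in gdb, is to log and flush before each initialize call so that the last line written names the driver that was running. The sketch below is a standalone imitation of the loop, not libvirt code: state_driver, its name field, init_all_drivers and the demo_* entries are all assumptions made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a state driver entry; the name field is an
 * assumption added here so that log lines can identify the driver. */
typedef struct {
    const char *name;
    int (*initialize)(void);
} state_driver;

/*
 * Same shape as the loop quoted above: call every non-NULL initialize
 * hook, keep going after a failure, and return -1 if any hook failed.
 * Each call is announced and flushed first, so if a hook crashes the
 * last log line shows which driver was being initialized.
 */
int init_all_drivers(const state_driver *const *tab, size_t count, FILE *log)
{
    int ret = 0;

    assert(log != NULL);
    for (size_t i = 0; i < count; i++) {
        if (tab[i]->initialize == NULL)
            continue;
        fprintf(log, "driver %zu (%s): initializing\n", i, tab[i]->name);
        fflush(log);
        if (tab[i]->initialize() < 0) {
            fprintf(log, "driver %zu (%s): initialize failed\n", i, tab[i]->name);
            ret = -1;
        }
    }
    return ret;
}

/* Demo drivers, purely illustrative. */
static int demo_calls;
static int demo_ok(void)   { demo_calls++; return 0; }
static int demo_fail(void) { demo_calls++; return -1; }

static const state_driver demo_a = { "ok", demo_ok };
static const state_driver demo_b = { "no-init-hook", NULL };
static const state_driver demo_c = { "failing", demo_fail };
static const state_driver *const demo_tab[] = { &demo_a, &demo_b, &demo_c };
```

With the demo table, init_all_drivers(demo_tab, 3, stderr) reports the "failing" driver and returns -1, while the entry without an initialize hook is skipped silently.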

On Sat, Dec 20, 2008 at 03:31:58AM +0100, Emre Erenoglu wrote:
> Hi Again,
> On Sat, Dec 20, 2008 at 12:25 AM, Emre Erenoglu <erenoglu@gmail.com> wrote:
>> Hi Daniel,
>> I'm very surprised. dnsmasq is not listed in the dependencies of libvirt as far as I know. I assumed it was a "nice to have" package that enables DHCP address assignment for guests. Even if it isn't installed, I should still be able to assign static IP addresses to my guests manually.
>> So if the system crashes because dnsmasq is missing, I would call that a bug. Nevertheless, I'll try to build this package for our distro and see what happens. I'll keep you posted; thanks again for your interest.
> My suspicions were correct. dnsmasq does not have anything to do with the crash. I installed the dnsmasq package and it works fine:
> 2616 pts/1 S+ 0:00 /usr/sbin/libvirtd
> 2637 pts/1 S+ 0:00 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --pid-file --conf-file --listen-address 192.168.122.1 --except-interface lo --dhcp-leasefile=/var/lib/libvirt/dhcp-d
> However, I still have the crash in the dmesg output, as before, with errors like:
> sysfs: duplicate filename '0' can not be created
> ------------[ cut here ]------------
> WARNING: at fs/sysfs/dir.c:424 sysfs_add_one+0x34/0xa6()
> ...
> Pid: 2616, comm: libvirtd Tainted: P 2.6.25.20-113 #1
> ...
> kobject_add_internal failed for 0 with -EEXIST, don't try to register things with the same name in the same directory.

Any of these messages in the dmesg output are kernel problems, not libvirt problems. The process listing you show above indicates that libvirtd itself is running, and has not crashed.

Daniel

Hi Daniel,

On Sun, Dec 21, 2008 at 6:15 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
>> However, I still have the crash in the dmesg output, as before, with errors like:
>> sysfs: duplicate filename '0' can not be created
>> ------------[ cut here ]------------
>> WARNING: at fs/sysfs/dir.c:424 sysfs_add_one+0x34/0xa6()
>> ...
>> Pid: 2616, comm: libvirtd Tainted: P 2.6.25.20-113 #1
>> ...
>> kobject_add_internal failed for 0 with -EEXIST, don't try to register things with the same name in the same directory.
> Any of these messages in the dmesg output are kernel problems, not libvirt problems. The process listing you show above indicates that libvirtd itself is running, and has not crashed.
> Daniel

Using eclipse and gdb, I've traced the problem to this line in the source code. In lxc_container.c, line 654, this call:

cpid = clone(lxcContainerDummyChild, childStack, flags, NULL);

is crashing something inside my kernel, which results in the messages that I sent previously in this thread. Sometimes the crash occurs even though the "cpid" value is 0, and on the second pass (libvirtd continues to run despite the crash messages) cpid is -1 and the system prints the debug message from this function:

DEBUG("clone call returned %s, container support is not enabled", strerror(errno));

The full function causing the error, with the failing line marked by a comment, is:

int lxcContainerAvailable(int features)
{
    int flags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWUSER|
        CLONE_NEWIPC|SIGCHLD;
    int cpid;
    char *childStack;
    char *stack;
    int childStatus;

    if (features & LXC_CONTAINER_FEATURE_NET)
        flags |= CLONE_NEWNET;

    if (VIR_ALLOC_N(stack, getpagesize() * 4) < 0) {
        DEBUG0("Unable to allocate stack");
        return -1;
    }

    childStack = stack + (getpagesize() * 4);

    cpid = clone(lxcContainerDummyChild, childStack, flags, NULL);  /* <-- failing line */
    VIR_FREE(stack);
    if (cpid < 0) {
        DEBUG("clone call returned %s, container support is not enabled",
              strerror(errno));
        return -1;
    } else {
        waitpid(cpid, &childStatus, 0);
    }

    return 0;
}

I'd appreciate any clues about why this happens and what I should change in the host kernel to prevent it.

Thank you very much.
Emre
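
Because the probe above passes several namespace flags to clone() at once, it cannot show which one the kernel objects to. The backtrace in the first message runs through copy_user_ns and uids_user_create, which hints at (but does not prove) the CLONE_NEWUSER path. A minimal sketch for testing the flags one at a time follows; probe_clone_flag and report_namespace_support are hypothetical names invented for this note, not libvirt functions, and on an affected kernel running it may well reproduce the same warnings.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for clone() and the CLONE_NEW* constants */
#endif
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Child entry point: do nothing and exit successfully. */
static int dummy_child(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Hypothetical probe: try clone() with a single namespace flag.
 * Returns 0 if the kernel accepted it, otherwise the errno value
 * reported by clone() (EPERM is expected without privileges).
 */
int probe_clone_flag(int ns_flag)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return ENOMEM;

    /* Pass the top of the buffer: the stack grows downward on common
     * architectures, as in the libvirt code quoted above. */
    pid_t pid = clone(dummy_child, stack + stack_size, ns_flag | SIGCHLD, NULL);
    int saved_errno = errno;

    if (pid >= 0) {
        int status;
        /* Reap the child so that no zombie is left behind. */
        while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
            ;
    }
    free(stack);
    return pid < 0 ? saved_errno : 0;
}

/* Print one line per namespace flag; returns how many were rejected. */
int report_namespace_support(FILE *out)
{
    static const struct { const char *name; int flag; } flags[] = {
        { "CLONE_NEWNS",   CLONE_NEWNS },
        { "CLONE_NEWUTS",  CLONE_NEWUTS },
        { "CLONE_NEWIPC",  CLONE_NEWIPC },
        { "CLONE_NEWUSER", CLONE_NEWUSER },
        { "CLONE_NEWPID",  CLONE_NEWPID },
        { "CLONE_NEWNET",  CLONE_NEWNET },
    };
    int rejected = 0;

    assert(out != NULL);
    for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        int err = probe_clone_flag(flags[i].flag);
        fprintf(out, "%-14s %s\n", flags[i].name,
                err == 0 ? "ok" : strerror(err));
        if (err != 0)
            rejected++;
    }
    return rejected;
}
```

Calling report_namespace_support(stdout) as root and comparing its output with dmesg after each line should indicate whether a single flag is responsible; the per-flag results depend entirely on the kernel configuration and on privileges.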

On Wed, Dec 24, 2008 at 01:33:06AM +0100, Emre Erenoglu wrote:
> Hi Daniel,
> On Sun, Dec 21, 2008 at 6:15 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
>>> However, I still have the crash in the dmesg output, as before, with errors like:
>>> sysfs: duplicate filename '0' can not be created
>>> ------------[ cut here ]------------
>>> WARNING: at fs/sysfs/dir.c:424 sysfs_add_one+0x34/0xa6()
>>> ...
>>> Pid: 2616, comm: libvirtd Tainted: P 2.6.25.20-113 #1
>>> ...
>>> kobject_add_internal failed for 0 with -EEXIST, don't try to register things with the same name in the same directory.
>> Any of these messages in the dmesg output are kernel problems, not libvirt problems. The process listing you show above indicates that libvirtd itself is running, and has not crashed.
>> Daniel
> Using eclipse and gdb, I've traced the problem to this line in the source code. In lxc_container.c, line 654, this call:
> cpid = clone(lxcContainerDummyChild, childStack, flags, NULL);
> is crashing something inside my kernel, which results in the messages that I sent previously in this thread.

Then I recommend you report a bug against the kernel in your OS distribution's bug tracker. I've no idea why it's crashing your kernel, but the code works fine on vanilla Fedora kernels.

Daniel
Participants (2): Daniel P. Berrange, Emre Erenoglu