[Libvir] Core dump while executing virsh on RHEL5

Hi all,
I have installed libvirt-0.3.3-1 on my RHEL 5 64-bit machine (kernel 2.6.18-8.el5xen), but I get a core dump whenever I execute the virsh command. Could anyone please help me solve this?

# gdb /usr/bin/virsh core
Core was generated by `virsh'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000393686ca27 in malloc_consolidate () from /lib64/libc.so.6
(gdb) where
#0  0x000000393686ca27 in malloc_consolidate () from /lib64/libc.so.6
#1  0x000000393686eea2 in _int_malloc () from /lib64/libc.so.6
#2  0x00000039368706dd in malloc () from /lib64/libc.so.6
#3  0x000000393685eb4a in __fopen_internal () from /lib64/libc.so.6
#4  0x000000393682cb5f in read_alias_file () from /lib64/libc.so.6
#5  0x000000393682d09e in _nl_expand_alias () from /lib64/libc.so.6
#6  0x000000393682b93e in _nl_find_domain () from /lib64/libc.so.6
#7  0x000000393682b2ff in __dcigettext () from /lib64/libc.so.6
#8  0x000000393687530c in strerror_r () from /lib64/libc.so.6
#9  0x000000393687514e in strerror () from /lib64/libc.so.6
#10 0x00002aaaaab018f2 in __virConfReadFile () from /usr/lib64/libvirt.so.0
#11 0x00002aaaaab021ec in __virConfReadFile () from /usr/lib64/libvirt.so.0
#12 0x00002aaaaaadd44b in virInitialize () from /usr/lib64/libvirt.so.0
#13 0x000000000040a65e in ?? ()
#14 0x000000393681d8a4 in __libc_start_main () from /lib64/libc.so.6
#15 0x00000000004033b9 in ?? ()
#16 0x00007fffd2df95c8 in ?? ()
#17 0x0000000000000000 in ?? ()

I have installed the following RPMs from libvirt.org/libvirt:
# rpm -qa | grep virt
libvirt-devel-0.3.3-1
libvirt-0.3.3-1
libvirt-debuginfo-0.3.3-1

With Regards,
Veerendra C

On Tue, Oct 09, 2007 at 03:50:37PM +0530, Veerendra wrote:
Hi all,
I have installed libvirt-0.3.3-1 on my RHEL 5 64-bit machine (kernel 2.6.18-8.el5xen), but I get a core dump whenever I execute the virsh command.
Could anyone please help me solve this?
# gdb /usr/bin/virsh core
Core was generated by `virsh'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000393686ca27 in malloc_consolidate () from /lib64/libc.so.6
(gdb) where
#0  0x000000393686ca27 in malloc_consolidate () from /lib64/libc.so.6
#1  0x000000393686eea2 in _int_malloc () from /lib64/libc.so.6
#2  0x00000039368706dd in malloc () from /lib64/libc.so.6
#3  0x000000393685eb4a in __fopen_internal () from /lib64/libc.so.6
#4  0x000000393682cb5f in read_alias_file () from /lib64/libc.so.6
#5  0x000000393682d09e in _nl_expand_alias () from /lib64/libc.so.6
#6  0x000000393682b93e in _nl_find_domain () from /lib64/libc.so.6
#7  0x000000393682b2ff in __dcigettext () from /lib64/libc.so.6
#8  0x000000393687530c in strerror_r () from /lib64/libc.so.6
#9  0x000000393687514e in strerror () from /lib64/libc.so.6
#10 0x00002aaaaab018f2 in __virConfReadFile () from /usr/lib64/libvirt.so.0
#11 0x00002aaaaab021ec in __virConfReadFile () from /usr/lib64/libvirt.so.0
#12 0x00002aaaaaadd44b in virInitialize () from /usr/lib64/libvirt.so.0
#13 0x000000000040a65e in ?? ()
#14 0x000000393681d8a4 in __libc_start_main () from /lib64/libc.so.6
#15 0x00000000004033b9 in ?? ()
#16 0x00007fffd2df95c8 in ?? ()
#17 0x0000000000000000 in ?? ()
Rerun under gdb control, as I'm unable to find where this may occur just from the stack trace given. __virConfReadFile only calls virConfError, which does not call strerror. Put a breakpoint in __virConfReadFile and __virRaiseError and try to find out what is happening, please.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/

Daniel, thanks for the quick reply. It didn't stop at the breakpoints. Will this help?
Rerun under gdb control, as I'm unable to find where this may occur just from the stack trace given. __virConfReadFile only calls virConfError, which does not call strerror. Put a breakpoint in __virConfReadFile and __virRaiseError and try to find out what is happening, please.
Daniel
(gdb) where
#0  0x000000393686ca27 in malloc_consolidate () from /lib64/libc.so.6
#1  0x000000393686eea2 in _int_malloc () from /lib64/libc.so.6
#2  0x00000039368706dd in malloc () from /lib64/libc.so.6
#3  0x000000393685eb4a in __fopen_internal () from /lib64/libc.so.6
#4  0x000000393682cb5f in read_alias_file () from /lib64/libc.so.6
#5  0x000000393682d09e in _nl_expand_alias () from /lib64/libc.so.6
#6  0x000000393682b93e in _nl_find_domain () from /lib64/libc.so.6
#7  0x000000393682b2ff in __dcigettext () from /lib64/libc.so.6
#8  0x000000393687530c in strerror_r () from /lib64/libc.so.6
#9  0x000000393687514e in strerror () from /lib64/libc.so.6
#10 0x00002aaaaab020fc in doRemoteOpen (conn=0x7eccbe0, priv=0x7ecc960, uri_str=0x2aaaaab13762 "xen:///", flags=2) at remote_internal.c:553
#11 0x00002aaaaab02abc in remoteNetworkOpen (conn=0x7eccbe0, uri_str=0x2aaaaab13762 "xen:///", flags=2) at remote_internal.c:2392
#12 0x00002aaaaaadd5db in do_open (name=0x2aaaaab13762 "xen:///", flags=0) at libvirt.c:447
#13 0x000000000040a80e in main (argc=0, argv=0x7fff4c23c9e8) at virsh.c:4507
#14 0x000000393681d8a4 in __libc_start_main () from /lib64/libc.so.6
#15 0x0000000000403459 in _start ()
(gdb)
534     if (connect (priv->sock, (struct sockaddr *) &addr, sizeof addr) == -1) {
(gdb)
542         if (errno == ECONNREFUSED &&
(gdb)
553         error (NULL, VIR_ERR_SYSTEM_ERROR, strerror (errno));
(gdb)
(gdb) list
548                 trials++;
549                 usleep(5000 * trials * trials);
550                 goto autostart_retry;
551             }
552         }
553         error (NULL, VIR_ERR_SYSTEM_ERROR, strerror (errno));
554         goto failed;
555     }
556
557     break;
(gdb) p VIR_ERR_SYSTEM_ERROR
$7 = VIR_ERR_SYSTEM_ERROR
(gdb) p errno
$8 = 2
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
0x000000393686ca27 in malloc_consolidate () from /lib64/libc.so.6

Num Type       Disp Enb Address            What
1   breakpoint keep n   0x00002aaaaab00ed0 remote_internal.c:243
2   breakpoint keep n   0x00002aaaaaaf4480 virterror.c:326
3   breakpoint keep n   0x00002aaaaaaf74c0 conf.c:704

On Tue, Oct 09, 2007 at 04:45:05PM +0530, Veerendra wrote:
Daniel, thanks for the quick reply. It didn't stop at the breakpoints. Will this help?
I still don't understand how you could get there (remote code while using xen:///) or why this could crash, except through a previous memory corruption. Please run the same command under valgrind and report back.
Thanks,
Daniel

Daniel Veillard wrote:
I still don't understand how you could get there (remote code while using xen:///) and why this could crash, except for a previous memory corruption. Please run the same command under valgrind and report,
thanks,
Daniel

Now when I run virsh under valgrind, it lists fine! But when I run virsh on its own, it dumps core again. I have attached the valgrind.log file as well.
[root@mx3650b new]# valgrind --log-file=valgrind.log -v virsh list
libvir: Remote error : No such file or directory
libvir: warning : Failed to find the network: Is the daemon running ?
 Id Name                 State
----------------------------------
  0 Domain-0             running

Veerendra wrote:
Now when I run virsh under valgrind, it lists fine! But when I run virsh on its own, it dumps core again. I have attached the valgrind.log file as well.
Does anyone know why it behaves like this? Also, virsh start for my inactive domain domu1 throws the following error. Any solutions?
# valgrind virsh start domu1
libvir: Xen Daemon error : POST operation failed: (xend.err "Error creating domain: Boot loader didn't return any data!")
error: Failed to start domain domu1

On Wed, Oct 10, 2007 at 04:44:36PM +0530, Veerendra wrote:
Veerendra wrote:
Now when I run virsh under valgrind, it lists fine! But when I run virsh on its own, it dumps core again. I have attached the valgrind.log file as well.
Does anyone know why it behaves like this?
Also, virsh start for my inactive domain domu1 throws the following error. Any solutions?
# valgrind virsh start domu1
libvir: Xen Daemon error : POST operation failed: (xend.err "Error creating domain: Boot loader didn't return any data!")
That means that Xen ran 'pygrub' to extract the kernel & initrd from the primary disk, but was unable to find an MBR, a grub config, or a kernel. The most likely cause is that the previous OS install was not successful and thus didn't install grub/MBR in the guest.

Dan

-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

That means that Xen ran 'pygrub' to extract the kernel & initrd from the primary disk, but was unable to find an MBR, a grub config, or a kernel. The most likely cause is that the previous OS install was not successful and thus didn't install grub/MBR in the guest.
Dan
Thanks, Dan, for your suggestion. I have now built libvirt 0.3.3 manually and installed it on RHEL5, but I still get the error when trying to start the guest.

Did this problem get fixed while I was away?

What I'm seeing in Veerendra's valgrind log are the following suspicious messages, although the line numbers don't correspond to the earlier line numbers from gdb:

==17508== Invalid read of size 8
==17508==    at 0x4C5D0BB: doRemoteOpen (remote_internal.c:323)
==17508==    by 0x4C5EABB: remoteNetworkOpen (remote_internal.c:2392)
==17508==    by 0x4C395DA: do_open (libvirt.c:447)
==17508==    by 0x40A80D: main (virsh.c:4507)

    if (uri->user) {
        username = strdup (uri->user);    <--- line 323
        if (!username) goto out_of_memory;
    }

==17508== Invalid write of size 8
==17508==    at 0x4C5D455: doRemoteOpen (remote_internal.c:761)
==17508==    by 0x4C5EABB: remoteNetworkOpen (remote_internal.c:2392)
==17508==    by 0x4C395DA: do_open (libvirt.c:447)
==17508==    by 0x40A80D: main (virsh.c:4507)

    if (query_out) *query_out = NULL;    <-- line 761

As I understand the valgrind messages, they indicate that the memory being read/written is not valid (i.e. outside any allocated malloc block or static memory), although I don't understand how those lines could generate that error.

(FWIW this is not a core dump that I've ever seen from virsh, but I will have a go at seeing if I can reproduce it on my RHEL 5 box later.)

Rich.

-- 
Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/
Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom.
Registered in England and Wales under Company Registration No. 03798903

On Mon, Oct 15, 2007 at 01:10:45PM +0100, Richard W.M. Jones wrote:
Did this problem get fixed while I was away?
What I'm seeing in Veerendra's valgrind log are the following suspicious messages, although the line numbers don't correspond to the earlier line numbers from gdb:
Yeah, I looked at it but could not find the right mapping.
==17508== Invalid read of size 8
==17508==    at 0x4C5D0BB: doRemoteOpen (remote_internal.c:323)
==17508==    by 0x4C5EABB: remoteNetworkOpen (remote_internal.c:2392)
==17508==    by 0x4C395DA: do_open (libvirt.c:447)
==17508==    by 0x40A80D: main (virsh.c:4507)

    if (uri->user) {
        username = strdup (uri->user);    <--- line 323
        if (!username) goto out_of_memory;
    }

==17508== Invalid write of size 8
==17508==    at 0x4C5D455: doRemoteOpen (remote_internal.c:761)
==17508==    by 0x4C5EABB: remoteNetworkOpen (remote_internal.c:2392)
==17508==    by 0x4C395DA: do_open (libvirt.c:447)
==17508==    by 0x40A80D: main (virsh.c:4507)

    if (query_out) *query_out = NULL;    <-- line 761

As I understand the valgrind messages, they indicate that the memory being read/written is not valid (i.e. outside any allocated malloc block or static memory), although I don't understand how those lines could generate that error.
Yes, I could not understand it either. I was afraid that compiling with optimization would skew the line numbers, and that only running valgrind on a debug build, possibly recompiled locally, would be reliable enough to check this out. And then the issue fell off my radar :-\
Daniel
participants (4)
- Daniel P. Berrange
- Daniel Veillard
- Richard W.M. Jones
- Veerendra