2009/12/4 Shi Jin <jinzishuai(a)yahoo.com>:
For me, the error seems to vary. I am going to show two versions.
Here is the first one.
The gdb backtrace is:
#0 0x00007f7ce92986b4 in pthread_mutex_unlock () from /lib/libpthread.so.0
#1 0x000000000042f661 in ?? ()
#2 0x000000000043da36 in ?? ()
#3 0x000000000043ef4b in ?? ()
#4 0x00007f7ce94f90fb in virDomainCreateXML () from /usr/lib/libvirt.so.0
#5 0x000000000041f228 in ?? ()
#6 0x0000000000420e41 in ?? ()
#7 0x00000000004211f3 in ?? ()
#8 0x000000000041478c in ?? ()
#9 0x00007f7ce9294a04 in start_thread () from /lib/libpthread.so.0
#10 0x00007f7ce8ffe7bd in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()
Debug symbols are missing here, but the innermost frame is
pthread_mutex_unlock, as in the other bug report.
The corresponding debug log is:
17:09:46.140: debug : virEventRemoveHandleImpl:173 : Remove handle w=81
17:09:46.140: debug : virEventRemoveHandleImpl:186 : mark delete 38 68
17:09:46.140: debug : virEventInterruptLocked:658 : Skip interrupt, 1 -438118128
17:09:46.140: debug : qemuMonitorClose:532 : Mark monitor to be deleted 0x7f7cd80cf480
17:09:46.140: debug : qemuDomainSetFileOwnership:1971 : Setting ownership on /srv/cloud/one/var//6666/images/disk.0 to 0:0
17:09:46.140: debug : virEventUpdateHandleImpl:146 : Update handle w=81 e=12
17:09:46.140: debug : virEventInterruptLocked:662 : Interrupting
17:09:46.140: debug : qemuMonitorCommandWithHandler:271 : Receive command reply ret=-1 errno=104 0 bytes '(null)'
17:09:46.140: error : qemuMonitorCommandWithHandler:290 : cannot send monitor command 'info cpus': Connection reset by peer
17:09:46.140: error : qemuMonitorTextGetCPUInfo:436 : internal error cannot run monitor command to fetch CPU thread info
Here is another one:
(gdb) bt
#0 0x00007ff61ca026b4 in pthread_mutex_unlock () from /lib/libpthread.so.0
#1 0x000000000042f661 in qemuDomainObjExitMonitorWithDriver (driver=0x7ff61000c410, obj=0x1e8c310)
    at qemu/qemu_driver.c:318
#2 0x000000000043da36 in qemudStartVMDaemon (conn=<value optimized out>, driver=0x7ff61000c410, vm=0x1e8c310,
    migrateFrom=<value optimized out>, stdin_fd=<value optimized out>) at qemu/qemu_driver.c:2327
#3 0x000000000043ef4b in qemudDomainCreate (conn=0x7ff610009a00, xml=<value optimized out>, flags=<value optimized out>)
    at qemu/qemu_driver.c:2881
#4 0x00007ff61cc630fb in virDomainCreateXML (conn=0x7ff610009a00,
    xmlDesc=0x7ff610004d70 "<domain type='kvm'>\n\t<name>one-7238</name>\n\t<vcpu>1</vcpu>\n\t<memory>524288</memory>\n\t<os>\n\t\t<type>hvm</type>\n\t\t<boot dev='hd'/>\n\t</os>\n\t<devices>\n\t\t<emulator>/usr/bin/kvm</emulator>\n\t\t<disk type='file"..., flags=0)
    at libvirt.c:1745
#5 0x000000000041f228 in remoteDispatchDomainCreateXml (server=<value optimized out>, client=<value optimized out>,
    conn=0x7ff610009a00, hdr=0x6572687420555043, rerr=0x6f666e69206461, args=0x616d6d6f6320726f, ret=0x7ff60e7fbed0)
    at remote.c:873
#6 0x0000000000420e41 in remoteDispatchClientCall (server=<value optimized out>, client=0x7ff6080c4960, msg=0x7ff6080ca910)
    at dispatch.c:506
#7 0x00000000004211f3 in remoteDispatchClientRequest (server=0x1e62070, client=0x7ff6080c4960, msg=0x7ff6080ca910)
    at dispatch.c:388
#8 0x000000000041478c in qemudWorker (data=<value optimized out>) at libvirtd.c:1518
#9 0x00007ff61c9fea04 in start_thread () from /lib/libpthread.so.0
#10 0x00007ff61c7687bd in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()
Yep, it's the same bug that Nikola Ciprich reported, even though the
backtraces are not exactly identical.
The call to qemuDomainObjExitMonitorWithDriver tries to unlock the
monitor, but the monitor has already been deleted in the meantime
because an error occurred while communicating with QEMU.
So, my initial guess was correct and this is a known bug. As pointed
out in the referenced thread, a preliminary patch is already
available.
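For illustration, here is a minimal, hypothetical C sketch of the failure mode described above and of one common remedy, reference counting: the caller pins the object when it enters the monitor, so an error-triggered close cannot free it before the matching unlock. All names (monitor_t, enter_monitor, close_monitor_locked, exit_monitor, simulate_failed_command) are invented for this sketch; they are not libvirt's API, and this is not necessarily how the preliminary patch fixes the bug. The interleaving is replayed in order from a single thread for simplicity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for a QEMU monitor handle.
 * This is NOT libvirt's qemuMonitor structure. */
typedef struct {
    pthread_mutex_t lock;
    int refs;    /* owners keeping the object alive */
    bool closed; /* set once the connection to QEMU has been torn down */
} monitor_t;

static monitor_t *monitor_new(void)
{
    monitor_t *mon = calloc(1, sizeof(*mon));
    if (!mon)
        return NULL;
    pthread_mutex_init(&mon->lock, NULL);
    mon->refs = 1; /* reference owned by the domain object */
    return mon;
}

/* Lock the monitor and pin it with an extra reference, so that a close
 * happening while the command runs cannot free it under us. */
static void enter_monitor(monitor_t *mon)
{
    pthread_mutex_lock(&mon->lock);
    mon->refs++;
}

/* Error path: QEMU reset the connection, so the owner gives up its
 * reference. Caller holds mon->lock. */
static void close_monitor_locked(monitor_t *mon)
{
    assert(mon->refs > 1); /* the caller's pin must still be present */
    mon->closed = true;
    mon->refs--;
}

/* Drop the pin and unlock. The mutex is unlocked while the object is
 * guaranteed to still exist; it is freed only after the last reference
 * is gone. Returns the remaining count (0 means mon was freed). */
static int exit_monitor(monitor_t *mon)
{
    int refs = --mon->refs;
    pthread_mutex_unlock(&mon->lock);
    if (refs == 0) {
        pthread_mutex_destroy(&mon->lock);
        free(mon);
    }
    return refs;
}

/* Replays the sequence from the backtrace: enter, the 'info cpus'
 * command fails and the monitor is closed, then exit.
 * Returns the count left after exit, or -1 on allocation failure. */
int simulate_failed_command(void)
{
    monitor_t *mon = monitor_new();
    if (!mon)
        return -1;
    enter_monitor(mon);        /* refs 1 -> 2 */
    close_monitor_locked(mon); /* refs 2 -> 1 */
    return exit_monitor(mon);  /* refs 1 -> 0: unlock, then free */
}
```

Calling simulate_failed_command() returns 0: the object is freed exactly once, after the unlock. In an unpinned design where closing the monitor frees it immediately, the unlock in the exit step would operate on freed memory, which is consistent with the pthread_mutex_unlock frame at the top of both backtraces.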
And its corresponding debug log:
17:33:35.424: debug : virEventUpdateHandleImpl:146 : Update handle w=1389 e=12
17:33:35.424: debug : virEventInterruptLocked:662 : Interrupting
17:33:35.424: debug : qemuMonitorCommandWithHandler:271 : Receive command reply ret=-1 errno=104 0 bytes '(null)'
17:33:35.424: error : qemuMonitorCommandWithHandler:290 : cannot send monitor command 'info cpus': Connection reset by peer
17:33:35.424: error : qemuMonitorTextGetCPUInfo:436 : internal error cannot run monitor command to fetch CPU thread info
Thanks a lot.
Shi
--
Shi Jin, PhD
--- On Thu, 12/3/09, Matthias Bolte <matthias.bolte(a)googlemail.com> wrote:
> From: Matthias Bolte <matthias.bolte(a)googlemail.com>
> Subject: Re: [libvirt] libvirtd crashes
> To: "Shi Jin" <jinzishuai(a)yahoo.com>
> Cc: libvir-list(a)redhat.com, jinzishuai(a)gmail.com
> Date: Thursday, December 3, 2009, 4:20 PM
> 2009/12/3 Shi Jin <jinzishuai(a)yahoo.com>:
> > Hi there,
> >
> > My libvirtd built from the latest git code keeps on crashing on all machines.
> > I turned on debugging and this is the information I have in the log file before crashing:
> > 14:31:50.828: debug : virEventUpdateHandleImpl:146 : Update handle w=110 e=12
> > 14:31:50.828: debug : virEventInterruptLocked:662 : Interrupting
> > 14:31:50.828: debug : qemuMonitorCommandWithHandler:271 : Receive command reply ret=-1 errno=104 0 bytes '(null)'
> > 14:31:50.828: error : qemuMonitorCommandWithHandler:290 : cannot send monitor command 'info cpus': Connection reset by peer
> > 14:31:50.828: error : qemuMonitorTextGetCPUInfo:436 : internal error cannot run monitor command to fetch CPU thread info
> >
> > I am not sure if there is any other information needed to help identify the problem. My build options are:
> > ./autogen.sh --prefix=/usr --sysconfdir=/etc --localstatedir=/var --without-xen --with-qemu --with-qemu-user=oneadmin --with-qemu-group=oneadmin --without-uml --without-vbox --without-openvz --without-lxc
> >
> > Please help me here. I can accept the service failing queries from time to time, since I have error handling written so that they can be retried. But a crashing libvirtd takes the whole thing down.
> >
> > Thanks a lot.
> >
> > Shi
> > --
> > Shi Jin, PhD
> >
>
> A GDB backtrace would be helpful. Judging by the debug log alone it
> could be a known issue, see:
>
> https://www.redhat.com/archives/libvir-list/2009-December/msg00063.html
>
> Matthias
>
Matthias