[libvirt] libvirt-0.8.5 sometimes crashing

Hello,

I noticed that libvirtd sometimes crashes immediately after start.. here's the backtrace:

Core was generated by `libvirtd --daemon --listen'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007fee038c0ca0 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007fee038c0ca0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1 0x000000000042eb21 in qemuDomainObjEnterMonitor (obj=0xe417d0) at qemu/qemu_driver.c:478
#2 0x0000000000448e7e in qemudDomainGetInfo (dom=<value optimized out>, info=0x43392e10) at qemu/qemu_driver.c:5165
#3 0x00007fee054f29f2 in virDomainGetInfo (domain=0xe3f610, info=0x43392e10) at libvirt.c:3177
#4 0x000000000042554e in remoteDispatchDomainGetInfo (server=<value optimized out>, client=<value optimized out>, conn=0xdd8a20, hdr=<value optimized out>, rerr=0x43392eb0, args=<value optimized out>, ret=0x43392f90) at remote.c:1503
#5 0x0000000000427444 in remoteDispatchClientCall (server=0xdb84d0, client=0x7fedfc040bc0, msg=0x7fedfc0410a0) at dispatch.c:529
#6 remoteDispatchClientRequest (server=0xdb84d0, client=0x7fedfc040bc0, msg=0x7fedfc0410a0) at dispatch.c:407
#7 0x00000000004194c7 in qemudWorker (data=<value optimized out>) at libvirtd.c:1587
#8 0x00007fee038be73d in start_thread () from /lib64/libpthread.so.0
#9 0x00007fee03430f6d in clone () from /lib64/libc.so.6

Could somebody please look at it? If I could provide any additional information, please let me know.

Thanks in advance
BR
nik

--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------

libvir-list-bounces@redhat.com wrote on 11/23/2010 01:49:46 PM:
> Hello, I noticed that libvirtd sometimes crashes immediately after start.. here's the backtrace:
> [...]
> Could somebody please look at it? If I could provide any additional information, please let me know. Thanks in advance BR nik
I also saw crashes recently, though those were resolved with a recent patch from ~3 hours ago.

http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=149c49213799bc0fb1b99cc...

Hope this one also resolves your crashes.

Regards,
Stefan

Hello Stefan, thanks for the reply, but I'm afraid this is an unrelated problem; it seems there is something wrong with locking (as Eric pointed out). cheers! n.

On 11/23/2010 11:49 AM, Nikola Ciprich wrote:
> Hello, I noticed that libvirtd sometimes crashes immediately after start.. here's the backtrace:
> [...]
> Could somebody please look at it? If I could provide any additional information, please let me know. Thanks in advance
Hmm; this looks similar to the crash that was supposed to be fixed by commit 054d43f57 which made it into 0.8.5; maybe I missed a case where we aren't grabbing locks in the correct order?

--
Eric Blake eblake@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
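To illustrate what "grabbing locks in the correct order" means, here is a minimal sketch. It is not libvirt's code: `struct driver`, `struct domain`, `driver_find_locked` and `domain_get_info` are hypothetical names, and the two-level hierarchy (driver lock first, then domain lock) is an assumption made only for this example. Note that an inconsistent order typically shows up as a deadlock rather than a segfault, so this may not be the whole story for the backtrace above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, not libvirt's real types. */
struct domain {
    pthread_mutex_t lock;   /* protects id and info */
    int id;
    int info;
};

struct driver {
    pthread_mutex_t lock;   /* protects the domain "list" */
    struct domain *dom;     /* single entry, to keep the sketch short */
};

/* Look up a domain and return it locked, or NULL if it is not found.
 * The order is always: driver lock first, then domain lock. */
struct domain *driver_find_locked(struct driver *drv, int id)
{
    struct domain *found = NULL;

    pthread_mutex_lock(&drv->lock);
    if (drv->dom != NULL && drv->dom->id == id) {
        found = drv->dom;
        pthread_mutex_lock(&found->lock);  /* taken while driver lock is held */
    }
    pthread_mutex_unlock(&drv->lock);      /* the domain lock outlives it */
    return found;
}

/* Every caller goes through the same lookup, so the order cannot vary. */
int domain_get_info(struct driver *drv, int id, int *info)
{
    struct domain *dom;

    assert(info != NULL);
    dom = driver_find_locked(drv, id);
    if (dom == NULL)
        return -1;
    *info = dom->info;
    pthread_mutex_unlock(&dom->lock);
    return 0;
}
```

Because every path acquires the driver lock before any domain lock, two threads can never each hold one lock while waiting for the other.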

Hello Eric, might be. I see those crashes quite often on my testing boxes; it actually crashes on every third or fourth boot. Do you think you might have time to revise the patch? cheers nik

On 11/23/2010 11:49 AM, Nikola Ciprich wrote:
> Hello, I noticed that libvirtd sometimes crashes immediately after start.. here's the backtrace:
>
> Core was generated by `libvirtd --daemon --listen'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x00007fee038c0ca0 in pthread_mutex_lock () from /lib64/libpthread.so.0
> (gdb) bt
> #0 0x00007fee038c0ca0 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #1 0x000000000042eb21 in qemuDomainObjEnterMonitor (obj=0xe417d0) at qemu/qemu_driver.c:478
> #2 0x0000000000448e7e in qemudDomainGetInfo (dom=<value optimized out>, info=0x43392e10) at qemu/qemu_driver.c:5165
Unfortunately, I'm not seeing a local race:

    } else if (!priv->jobActive) {
        if (qemuDomainObjBeginJob(vm) < 0)
            goto cleanup;
        if (!virDomainObjIsActive(vm))
            err = 0;
        else {
            qemuDomainObjEnterMonitor(vm);
            err = qemuMonitorGetBalloonInfo(priv->mon, &balloon);
            qemuDomainObjExitMonitor(vm);
        }

That properly grabs the job lock, verifies that the vm is still active, and then uses the monitor. The only other thing I can think of is a non-local race that I'm not seeing in the immediate vicinity; perhaps one thread is able to see the vm object prior to another thread finishing the creation of the monitor lock, so that querying the domain info ends up calling pthread_mutex_lock on an invalid mutex? But that shouldn't be possible - object creation is also done under a lock, so nothing should be able to see a partially initialized object.

Is this something you can easily repeat? Can you reproduce it while a debugger is attached, or have you been limited to post-mortem debugging so far? I'm a bit stumped on what to look for next.

--
Eric Blake eblake@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
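The hypothesis above (a query reaching the monitor lock before the monitor exists) can be sketched as follows. This is only an illustration under stated assumptions: `struct monitor`, `struct domain`, `domain_enter_monitor`, `domain_exit_monitor` and `domain_query` are hypothetical stand-ins, not libvirt's real types or functions, and the assumption that the monitor pointer may still be NULL while the domain already counts as active is exactly the unconfirmed part.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, not libvirt's real types. */
struct monitor {
    pthread_mutex_t lock;
};

struct domain {
    pthread_mutex_t lock;   /* protects the fields below */
    struct monitor *mon;    /* assumed NULL until the monitor is connected */
    int active;
};

/* Returns 0 with mon->lock held, or -1 if there is no monitor to lock.
 * Without the NULL check, &mon->lock would be an invalid address and
 * pthread_mutex_lock() could fault, as in frame #0 of the backtrace. */
int domain_enter_monitor(struct domain *dom)
{
    struct monitor *mon = dom->mon;

    if (mon == NULL)
        return -1;
    if (pthread_mutex_lock(&mon->lock) != 0)
        return -1;
    return 0;
}

void domain_exit_monitor(struct domain *dom)
{
    assert(dom->mon != NULL);
    pthread_mutex_unlock(&dom->mon->lock);
}

/* Same shape as the quoted snippet: check the state, then use the monitor.
 * Returns 0 if inactive, 1 if the monitor was used, -1 on error. */
int domain_query(struct domain *dom)
{
    int ret = -1;

    pthread_mutex_lock(&dom->lock);
    if (!dom->active) {
        ret = 0;                        /* nothing to ask */
    } else if (domain_enter_monitor(dom) == 0) {
        ret = 1;                        /* a real caller would query here */
        domain_exit_monitor(dom);
    }
    pthread_mutex_unlock(&dom->lock);
    return ret;
}
```

A guard like this would turn the crash into an error return, which helps confirm the hypothesis but does not explain how the object became visible in that state.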

Hello Eric, the crashes always occur while libvirt is starting, and no machines are running at that time. I don't know whether this information helps. Maybe it would be possible to edit the initscript to start libvirt in gdb? n.
participants (3)
- Eric Blake
- Nikola Ciprich
- Stefan Berger