Re: [libvirt] Libvirt segfault in qemuMonitorSend() with multi-threaded API use

Friday, 5 March 2010

Daniel, thanks for the help.  I was able to fix the problem (see my post
in a new thread).

On Fri, 2010-03-05 at 09:32 +0000, Daniel P. Berrange wrote:
...
 On Thu, Mar 04, 2010 at 02:22:35PM -0600, Adam Litke wrote:
 > I have a multi-threaded Python program that shares a single libvirt
 > connection object among several threads (one thread per active domain on
 > the system plus a management thread).  On a heavily loaded host with 8
 > running domains I am getting a consistent libvirtd segfault in the qemu
 > monitor handling code.  This happens with libvirt-0.7.6 and git.
 > 
 > Mar  4 12:23:13 bc1cn7-mgmt kernel: [ 3947.836151] libvirtd[7716]:
 > segfault at 24 ip 000000000045de5c sp 00007fe5aa7d2b20 error 4 in
 > libvirtd[400000+b3000]
 > 
 > Using addr2line, this translates to: libvirt/src/qemu/qemu_monitor.c:698
 > 
 > Which is in qemuMonitorSend():
 > 
 > --> while (!mon->msg->finished) { 
 >         if (virCondWait(&mon->notify, &mon->lock) < 0)
 >             goto cleanup;
 >     }
 > 
 > It seems that mon->msg is being reset to NULL in the middle of this loop
 > execution.  I suspect that is because qemuMonitorSend() is not reentrant
 > and multiple threads in my program are racing here.  I would guess the
 > 'mon->msg = NULL;' on line 707 causes the NULL that trips up the other
 > racer.

 > I presume the Monitor interface has some locking protection around it to
 > ensure that only one thread can use it at a time?

 You are correct that qemuMonitorSend() is not re-entrant. qemuMonitorSend()
 is invoked by any of the qemuMonitorXXXX() APIs. For all these APIs, the
 QEMU driver code is required to first hold the lock by calling
 qemuDomainObjEnterMonitor() and to release it when done by calling
 qemuDomainObjExitMonitor().

 e.g.:

   qemuDomainObjEnterMonitor(obj);
   naddrs = qemuMonitorGetAllPCIAddresses(priv->mon,
                                          &addrs);
   qemuDomainObjExitMonitor(obj);

 > Is there an easy way to fix this?  I am not familiar with the measures
 > employed to make libvirt thread-safe.  Thanks!

 The first step is to try to identify which functions were running concurrently.

 Try running libvirtd with 

   LIBVIRT_LOG_FILTERS=1:qemu LIBVIRT_LOG_OUTPUTS=1:stderr

 You'll get quite a lot of data printed out for all monitor calls, which might
 let you see which ones overlap. You might want to add further log messages in
 the qemuMonitorSend() method itself to help with this.

 There is a small chance that using GDB 'thread apply all backtrace' when
 it crashes will show you useful info, but that's fairly unlikely.

 The other possibility is buffer corruption in the qemuMonitor struct, but
 that seems less likely.

 Regards,
 Daniel 

-- 
Thanks,
Adam
