I have a multi-threaded Python program that shares a single libvirt
connection object among several threads (one thread per active domain on
the system plus a management thread). On a heavily loaded host with 8
running domains I am getting a consistent libvirtd segfault in the qemu
monitor handling code. This happens with both libvirt-0.7.6 and a git
build.

  Mar 4 12:23:13 bc1cn7-mgmt kernel: [ 3947.836151] libvirtd[7716]:
  segfault at 24 ip 000000000045de5c sp 00007fe5aa7d2b20 error 4 in
  libvirtd[400000+b3000]

Using addr2line, this translates to libvirt/src/qemu/qemu_monitor.c:698,
which is in qemuMonitorSend():

-->     while (!mon->msg->finished) {
            if (virCondWait(&mon->notify, &mon->lock) < 0)
                goto cleanup;
        }

It seems that mon->msg is being reset to NULL in the middle of this loop
execution. I suspect that is because qemuMonitorSend() is not reentrant
and multiple threads in my program are racing here. I would guess the
'mon->msg = NULL;' on line 707 causes the NULL that trips up the other
racer.
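
For illustration only, here is a toy Python model of the pattern. It is
not libvirt code: Message, Monitor, serve() and run_demo() are
hypothetical names, and the responder thread merely stands in for
whatever marks a message as finished. The sketch shows one way a send
routine with a single shared message slot could tolerate concurrent
callers: each sender waits until the slot is free, then waits on its own
local message rather than re-reading the shared slot, so another sender
clearing the slot cannot leave it dereferencing nothing.

```python
import threading


class Message:
    """A request plus the fields the responder fills in."""

    def __init__(self, text):
        self.text = text
        self.finished = False
        self.reply = None


class Monitor:
    """Toy monitor with one in-flight message slot (hypothetical model)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.notify = threading.Condition(self.lock)
        self.msg = None  # shared slot, analogous to mon->msg

    def send(self, msg):
        with self.lock:
            # Wait until no other sender owns the slot.
            while self.msg is not None:
                self.notify.wait()
            self.msg = msg
            self.notify.notify_all()  # wake the responder
            # Wait on the local message, never on self.msg, because
            # wait() drops the lock and the slot may change meanwhile.
            while not msg.finished:
                self.notify.wait()
            self.msg = None
            self.notify.notify_all()  # wake senders waiting for the slot
            return msg.reply

    def serve(self, count):
        """Stand-in responder: answer `count` messages, then exit."""
        for _ in range(count):
            with self.lock:
                while self.msg is None or self.msg.finished:
                    self.notify.wait()
                self.msg.reply = self.msg.text.upper()
                self.msg.finished = True
                self.notify.notify_all()


def run_demo(n_senders=8):
    """Run n_senders concurrent send() calls and return the sorted replies."""
    mon = Monitor()
    replies = []
    replies_lock = threading.Lock()

    def sender(i):
        reply = mon.send(Message(f"cmd-{i}"))
        with replies_lock:
            replies.append(reply)

    responder = threading.Thread(target=mon.serve, args=(n_senders,))
    senders = [threading.Thread(target=sender, args=(i,))
               for i in range(n_senders)]
    responder.start()
    for t in senders:
        t.start()
    for t in senders:
        t.join()
    responder.join()
    return sorted(replies)
```

Whether the real qemuMonitorSend() can be restructured this way depends
on how its callers take the monitor lock, which this model does not
attempt to reproduce.
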
I presume the Monitor interface has some locking protection around it to
ensure that only one thread can use it at a time?
Is there an easy way to fix this? I am not familiar with the measures
employed to make libvirt thread-safe. Thanks!
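
In the meantime, a possible client-side workaround (a sketch, assuming
the crash really is triggered by concurrent calls from my threads) would
be to funnel every call on the shared object through a single lock.
SerializedProxy is a hypothetical helper, not part of the libvirt Python
bindings, and it does not fix the daemon itself. Objects returned by the
wrapped connection are not wrapped automatically, so they would need to
be wrapped with the same lock.

```python
import threading


class SerializedProxy:
    """Forward attribute access to `target`, running each method call
    under one lock so that only one thread is inside the target at a time."""

    def __init__(self, target, lock=None):
        self._target = target
        # An RLock lets a wrapped call re-enter the proxy on the same thread.
        self._lock = lock if lock is not None else threading.RLock()

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def locked_call(*args, **kwargs):
            with self._lock:
                return attr(*args, **kwargs)

        return locked_call
```

Usage would be along the lines of conn = SerializedProxy(real_conn),
with each thread using conn in place of the raw connection. This costs
all parallelism between domains, so it is only a stopgap.
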
--
Thanks,
Adam