From: "Daniel P. Berrange" <berrange(a)redhat.com>
The past 24 hours have seen a flurry of libvirtd crash reports from
Fedora users.
https://bugzilla.redhat.com/show_bug.cgi?id=1014933
In one thread we have the libvirtd daemon startup code running, and
it is in the middle of QEMU state initialization.
#9 0xb00882e4 in qemuStateInitialize (privileged=true, callback=0xb77a0420
<daemonInhibitCallback>, opaque=0xb8b1fc98) at qemu/qemu_driver.c:595
driverConf = 0xaf5afcd8 "/etc/libvirt/qemu.conf"
conn = 0x0
ebuf =
"\000\260\025\267\024\071P\257\214\000\000\000\360\316\341\257\335\242\023\267\214\000\000\000\210\177X\257\001\000\000\000l\000\000\000\360\316\341\257\000\260\025\267\264\316\341\257\210\177X\257$\316\341\257$\316\341\257l\000\000\000\304\316\341\257\201\321LRl\000\000\000\235R\022\267\000)\233\351\260\316\341\257\000\000\000\000\253G\022\267\000\260\025\267\340\316\341\257\a\000\000\000\v\260\023\267\000\260\025\267\001\000\000\000\254\325\334\266\000\260\025\267\214\261\023\267
:P\257\037:P\257\000\000\000\000/\261\023\267\000\260\025\267uc\334\266\000\260\025\267A\262\023\267\037:P\257\000\000\000\000\001\000\000\000\000\000\000\000\340\316\341\257\334\316\341\257\001\000\000\000\001\000\000\000\033c\024\267"...
membase = 0x0
mempath = 0x0
cfg = 0xaf509050
run_uid = 4294967295
run_gid = 4294967295
__func__ = "qemuStateInitialize"
__FUNCTION__ = "qemuStateInitialize"
#10 0xb74c5325 in virStateInitialize (privileged=true, callback=callback@entry=0xb77a0420
<daemonInhibitCallback>, opaque=opaque@entry=0xb8b1fc98) at libvirt.c:833
i = 6
__func__ = "virStateInitialize"
#11 0xb77a049e in daemonRunStateInit (opaque=opaque@entry=0xb8b1fc98) at libvirtd.c:876
srv = 0xb8b1fc98
__func__ = "daemonRunStateInit"
In another thread, we have a DBus event being handled by the nwfilter
driver, and the nwfilter driver calls into the QEMU driver... which
has not finished initializing itself yet!
Thread 1 (Thread 0xb6366ac0 (LWP 7041)):
#0 0xb0052861 in virQEMUCloseCallbacksGetForConn (closeCallbacks=0x0, conn=0xb8b2cc20) at
qemu/qemu_conf.c:861
list = 0xb8ac57e8
data = {conn = 0xb8b2cc20, list = 0xb8ac57e8, oom = false}
#1 virQEMUCloseCallbacksRun (closeCallbacks=0x0, conn=conn@entry=0xb8b2cc20,
driver=0xaf50b350) at qemu/qemu_conf.c:890
list = 0xb8b2cc20
i = <optimized out>
__func__ = "virQEMUCloseCallbacksRun"
#2 0xb009df3b in qemuConnectClose (conn=0xb8b2cc20) at qemu/qemu_driver.c:1057
driver = <optimized out>
#3 0xb74babc1 in virConnectDispose (obj=0xb8b2cc20) at datatypes.c:159
conn = 0xb8b2cc20
#4 0xb742f22c in virObjectUnref (anyobj=anyobj@entry=0xb8b2cc20) at util/virobject.c:264
klass = 0xb8b2cba0
obj = 0xb8b2cc20
lastRef = true
__func__ = "virObjectUnref"
#5 0xb74c5811 in virConnectClose (conn=conn@entry=0xb8b2cc20) at libvirt.c:1503
__func__ = "virConnectClose"
__FUNCTION__ = "virConnectClose"
#6 0xb023424e in nwfilterStateReload () at nwfilter/nwfilter_driver.c:301
conn = 0xb8b2cc20
#7 0xb02342fc in nwfilterFirewalldDBusFilter (connection=0xaf501038, message=0xaf503910,
user_data=0x0) at nwfilter/nwfilter_driver.c:90
__func__ = "nwfilterFirewalldDBusFilter"
#8 0xb711efb9 in dbus_connection_dispatch (connection=0xaf501038) at
dbus-connection.c:4631
filter = <optimized out>
next = 0x0
message = 0xaf503910
link = <optimized out>
filter_list_copy = 0xaf5009dc
message_link = 0xaf500a18
result = DBUS_HANDLER_RESULT_NOT_YET_HANDLED
pending = <optimized out>
reply_serial = <optimized out>
status = <optimized out>
found_object = 3071507249
__FUNCTION__ = "dbus_connection_dispatch"
#9 0xb740caeb in virDBusWatchCallback (fdatch=fdatch@entry=8, fd=15, events=1,
opaque=0xaf500ca8) at util/virdbus.c:144
watch = 0xaf500ca8
info = 0xaf500de0
dbus_flags = 1
This DBus event is triggered when the firewalld service is
reloaded or restarted.
I confirmed this analysis by adding a sleep(10) to the QEMU
driver startup code, and then triggering a firewalld restart.
Sure enough it crashed & burned with the same trace.
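To make the interleaving concrete, here is a minimal sketch of what the
two traces suggest. It is not libvirt code: DemoDriver, DemoCloseCallbacks,
demoDriverInit and demoConnectClose are made-up names, and the two threads
are modelled simply as call order rather than with real threads. The NULL
check is only there so the sketch reports the problem instead of crashing;
judging by closeCallbacks=0x0 in frame #0, the real code path dereferenced
the pointer.

```c
#include <stddef.h>

/* Hypothetical stand-in for the QEMU driver's close callback table. */
typedef struct {
    int ncallbacks;
} DemoCloseCallbacks;

/* Hypothetical stand-in for the driver state. closeCallbacks stays NULL
 * until initialization has got far enough to allocate it. */
typedef struct {
    DemoCloseCallbacks *closeCallbacks;
} DemoDriver;

static DemoCloseCallbacks demoCallbacks;

/* Models the startup thread: state only becomes usable once this ran. */
void demoDriverInit(DemoDriver *driver)
{
    demoCallbacks.ncallbacks = 0;
    driver->closeCallbacks = &demoCallbacks;
}

/* Models the event thread closing a connection. Returns -1 if it is
 * called before demoDriverInit(), which is the window that sleep(10)
 * widened; otherwise returns the number of registered callbacks. */
int demoConnectClose(const DemoDriver *driver)
{
    if (driver->closeCallbacks == NULL)
        return -1;   /* assumption: the crashing path had no such guard */
    return driver->closeCallbacks->ncallbacks;
}
```

Calling demoConnectClose() on a driver whose closeCallbacks is still NULL
hits the early return; calling it after demoDriverInit() does not.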
The reason why it has suddenly hit us is that we are unlucky
enough to have a firewalld update in Fedora repos at the same
time as a libvirt update, and lots of people are pulling both
updates down in one yum transaction!
After wasting time figuring out how to avoid the race condition
with mutexes and other synchronization ideas, I realized that
the nwfilter code was in fact bogus.
The only reason it gets a virConnectPtr is so that the code
for reloading filters can access its nwfilterPrivateData
field to get the virNWFilterDriverStatePtr object instance.
This is insanely convoluted, since the nwfilter driver can
trivially pass the driver state instance into the
virNWFilterConfLayerInit method at startup.
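Roughly, the shape of the change is as follows. This is a simplified
sketch under my own naming, not the code from the patches:
DemoDriverState, DemoReloadCallback, demoConfLayerInit, demoReloadFilters
and demoCountReload are hypothetical stand-ins for the driver state
object, the init method and the reload path. The point is only that the
state is registered as an opaque pointer at startup, so the reload path
never needs to open a connection to find it.

```c
#include <stddef.h>

/* Hypothetical stand-in for the nwfilter driver state object. */
typedef struct {
    int reloadCount;   /* number of times filters were reloaded */
} DemoDriverState;

/* Hypothetical callback type: receives the opaque pointer registered
 * at init time instead of a connection object. */
typedef int (*DemoReloadCallback)(void *opaque);

static DemoReloadCallback demoReloadCb;
static void *demoReloadOpaque;

/* Stand-in for the conf layer init: remember the callback together
 * with the driver state it should be given later. */
int demoConfLayerInit(DemoReloadCallback cb, void *opaque)
{
    if (cb == NULL)
        return -1;
    demoReloadCb = cb;
    demoReloadOpaque = opaque;
    return 0;
}

/* Reload path: uses the state registered at startup, so no other
 * driver has to be up and no connection is opened or closed. */
int demoReloadFilters(void)
{
    if (demoReloadCb == NULL)
        return -1;   /* init has not been called yet */
    return demoReloadCb(demoReloadOpaque);
}

/* Example callback: just counts reloads on the driver state. */
int demoCountReload(void *opaque)
{
    DemoDriverState *state = opaque;
    state->reloadCount++;
    return 0;
}
```

With this layout the driver would call demoConfLayerInit(demoCountReload,
&state) once at startup, and a firewalld event would only ever end up in
demoReloadFilters().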
These patches therefore rip out all use of virConnectPtr
from the nwfilter driver code, which avoids the race with
the QEMU driver initialization code.
This also fixes the nwfilter driver in cases where the QEMU
driver is disabled, but the LXC driver still wants to use nwfilter.
Daniel P. Berrange (3):
  Remove virConnectPtr arg from virNWFilterDefParse*
  Don't pass virConnectPtr in nwfilter 'struct domUpdateCBStruct'
  Remove use of virConnectPtr from all remaining nwfilter code
 src/conf/nwfilter_conf.c               | 78 ++++++++++++++++------------------
 src/conf/nwfilter_conf.h               | 24 ++++-------
 src/lxc/lxc_driver.c                   |  3 +-
 src/nwfilter/nwfilter_dhcpsnoop.c      | 12 +++---
 src/nwfilter/nwfilter_driver.c         | 49 +++++++++------------
 src/nwfilter/nwfilter_gentech_driver.c | 32 +++++++-------
 src/nwfilter/nwfilter_gentech_driver.h | 10 ++---
 src/nwfilter/nwfilter_learnipaddr.c    |  6 +--
 src/qemu/qemu_driver.c                 |  6 ++-
 src/uml/uml_driver.c                   |  3 +-
 tests/nwfilterxml2xmltest.c            |  2 +-
 11 files changed, 102 insertions(+), 123 deletions(-)
--
1.8.3.1