On 29.11.2013 08:18, Michal Privoznik wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=1033061
> Since our transformation into virObject is not complete and we must do
> ref and unref ourselves, there's a chance that we will get it wrong. That
> is, while one thread is doing the unref and subsequent dispose, another
> thread may come and do a ref & unref on the stale pointer. This results in
> dispose being called twice (and possibly simultaneously). This kind of
> error is hard to catch, so we should at least throw an error into the logs
> if such a situation occurs. In fact, I've seen a stack trace showing this
> error had happened (obj = 0x7f4968018260):
On second thought, I don't think this patch is that good. I mean,
libvirtd has only a very small window in which this patch would work. The
window opens in the destroy callback, where the memory allocated for the
object is free()d, and closes when glibc actually unmaps that memory.
After that point, accessing the stale pointer either:
a) results in an access to unmapped memory and thus SIGSEGV, or
b) results in an access to mapped but unrelated memory, where a random
value is incremented or decremented, and hence our check for the refcount
being smaller than or equal to one is bogus.
So I think I have to self-NAK this one. Sigh.
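Case b) can be illustrated with a minimal sketch. Everything in it is hypothetical: DemoObject and the demo_* helpers are stand-ins rather than libvirt code, and the check is only assumed to have the shape "refcount is no longer positive". The storage is never actually freed, so the demonstration stays well defined; the memset() merely plays the role of the allocator handing the same bytes to somebody else.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a refcounted object; not libvirt's virObject. */
typedef struct {
    int refs;
} DemoObject;

/* Assumed shape of the sanity check: a non-positive refcount is taken as
 * evidence that the object has already been disposed. */
static int demo_looks_disposed(const DemoObject *obj)
{
    assert(obj != NULL);
    return obj->refs <= 0;
}

/* Drop one reference; returns 1 if this call would trigger dispose. */
static int demo_unref(DemoObject *obj)
{
    assert(obj != NULL);
    return --obj->refs == 0;
}

/* Pretend the allocator reused the object's bytes for unrelated data.
 * The storage remains valid here, unlike after a real free(). */
static void demo_simulate_reuse(DemoObject *obj, unsigned char fill)
{
    assert(obj != NULL);
    memset(obj, fill, sizeof(*obj));
}
```

Right after the last unref the check fires as intended, but once the bytes have been overwritten the refcount field holds whatever the new owner stored there, and the check can pass silently.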
Anyway, just for the record, the original bug is as follows (MT =
MainThread, the thread running main(); IT = InitializeThread):
1) (MT) daemonStateInit spawns a new thread (IT) to initialize all the
drivers, which subsequently autostart domains, ...
2) (IT) creates a new driver, say the netcf driver in this case.
driver.refs = 1
3) (MT) For some reason, we exit the event loop early (e.g. SIGINT was
delivered), resulting in a call to virStateCleanup(), which iterates over
the table of drivers and calls the ->stateCleanup() method on each one. In
our specific case, the netcf driver calls virObjectUnref(driver),
driver.refs = 0 and hence the dispose cb is called.
4) (IT) Doesn't know anything about quitting, and tries to autostart
domains. Say LXC domains for now. So it opens a new dummy connection,
which causes virObjectRef(driver). But wait! The driver is already being
disposed.
5) (IT) Eventually calls virConnectClose(), which unrefs the driver,
again resulting in disposing the driver.
Therefore I think the correct way to solve this is to remove each driver
from the global driver table while iterating over its items in
virStateCleanup().
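A rough sketch of that idea follows. It is not the actual libvirt code: the table, the lock, and the fix_* names are all hypothetical, and dispose is again just a counter. The assumption is that lookups take their reference under the same lock that cleanup uses to unpublish the entry, so a late caller either gets a valid reference or gets NULL.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define FIX_MAX_DRIVERS 4

/* Hypothetical driver object; dispose is modelled as a call counter. */
typedef struct {
    const char *name;
    int refs;
    int dispose_calls;
} FixDriver;

static FixDriver *fix_table[FIX_MAX_DRIVERS];
static pthread_mutex_t fix_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold fix_lock. */
static void fix_unref_locked(FixDriver *d)
{
    assert(d != NULL && d->refs > 0);
    if (--d->refs == 0)
        d->dispose_calls++;
}

/* Look up a driver and take a reference; NULL once it was unpublished. */
static FixDriver *fix_driver_get(size_t slot)
{
    FixDriver *d;

    pthread_mutex_lock(&fix_lock);
    d = fix_table[slot];
    if (d)
        d->refs++;
    pthread_mutex_unlock(&fix_lock);
    return d;
}

/* Release a reference obtained from fix_driver_get(). */
static void fix_driver_put(FixDriver *d)
{
    pthread_mutex_lock(&fix_lock);
    fix_unref_locked(d);
    pthread_mutex_unlock(&fix_lock);
}

/* Cleanup: unpublish each entry first, then drop the table's reference. */
static void fix_state_cleanup(void)
{
    size_t i;

    for (i = 0; i < FIX_MAX_DRIVERS; i++) {
        pthread_mutex_lock(&fix_lock);
        if (fix_table[i]) {
            FixDriver *d = fix_table[i];
            fix_table[i] = NULL;
            fix_unref_locked(d);
        }
        pthread_mutex_unlock(&fix_lock);
    }
}
```

With this arrangement a thread that grabbed the driver before cleanup keeps it alive until its own put, and a thread that arrives after cleanup sees NULL instead of a pointer to an object that is being disposed.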
Michal