I received a report of a virsh coredump that happened on a heavily loaded
system. While debugging the corefile I saw that under certain circumstances
the mutex for a connection can be destroyed while another thread is
waiting for the lock on that connection.
Specifically this seems to happen when vshDeinit is called after the
completion of vshRunCommand and at the same time the event loop is
receiving a client disconnect.
vshDeinit calls virConnectClose, which unrefs the connection object,
destroying the lock mutex before releasing the memory and nulling
the callback pointers.
At the same time the event loop calls remoteClientCloseFunc,
which waits for the mutex while the connection is being unreffed.
It then obtains the mutex (now invalid, but not error-checking),
copies the closeCallback pointer (shortly before the connection object
is destroyed) and dies upon trying to call closeFreeCallback (NULL
by now).
First I would like to see whether someone finds an obvious flaw
in my reasoning. Then I would like to discuss how to get around it.
One thought would be to increase the refcount of the connection
object in remoteClientCloseFunc before calling the closeCallback.
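
To make the idea concrete, here is a minimal sketch of the pattern, not
libvirt code. All names in it (conn_t, conn_ref, conn_unref,
conn_handle_disconnect, demo_run) are hypothetical stand-ins for the
connection object and remoteClientCloseFunc. It assumes the disconnect
handler still has a valid object at the moment it takes its reference; if
the last reference can already be gone by then, the reference would have
to be taken earlier, for example when the callback is registered.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a connection object; not libvirt's layout. */
typedef struct conn conn_t;
typedef void (*close_cb)(conn_t *c, void *opaque);

struct conn {
    atomic_int refs;        /* reference count */
    pthread_mutex_t lock;   /* protects the callback fields */
    close_cb on_close;
    void *opaque;
};

static int disposed = 0;    /* number of objects freed so far */

conn_t *conn_new(close_cb cb, void *opaque)
{
    conn_t *c = calloc(1, sizeof(*c));
    if (!c)
        return NULL;
    atomic_init(&c->refs, 1);
    pthread_mutex_init(&c->lock, NULL);
    c->on_close = cb;
    c->opaque = opaque;
    return c;
}

void conn_ref(conn_t *c)
{
    atomic_fetch_add(&c->refs, 1);
}

void conn_unref(conn_t *c)
{
    int old = atomic_fetch_sub(&c->refs, 1);
    assert(old > 0);
    if (old == 1) {
        /* Last reference: only now destroy the mutex and free memory. */
        pthread_mutex_destroy(&c->lock);
        free(c);
        disposed++;
    }
}

/* Event-loop side: pin the object for the duration of the callback. */
void conn_handle_disconnect(conn_t *c)
{
    conn_ref(c);
    pthread_mutex_lock(&c->lock);
    close_cb cb = c->on_close;
    void *opaque = c->opaque;
    pthread_mutex_unlock(&c->lock);
    if (cb)
        cb(c, opaque);
    conn_unref(c);          /* may free the object if the owner is gone */
}

/* Callback that simulates the owner closing the connection meanwhile. */
static void drop_owner_ref(conn_t *c, void *opaque)
{
    int *survived = opaque;
    conn_unref(c);                  /* owner's reference goes away */
    *survived = (disposed == 0);    /* object must still be alive here */
}

/* Returns 1 if the object outlived the callback and was freed once. */
int demo_run(void)
{
    int survived = 0;
    disposed = 0;
    conn_t *c = conn_new(drop_owner_ref, &survived);
    if (!c)
        return -1;
    conn_handle_disconnect(c);
    return survived && disposed == 1;
}
```

In this sketch the owner's unref inside the callback no longer destroys
the mutex, because the handler's own reference keeps the count above zero
until the callback has returned.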
--
Kind Regards
Viktor Mihajlovski
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martina Köderitz
Management: Dirk Wittkopp
Registered office: Böblingen
Commercial register: Amtsgericht Stuttgart, HRB 243294