libvirt
/
libvirt
|
master
|
19 mins and 18 secs
|
Mark Asselstine
|
lxc_monitor: Avoid AB / BA lock race
A deadlock situation can occur when autostarting a LXC domain 'guest' due to two threads attempting to take opposing locks while holding opposing locks (AB BA problem). Thread A takes and holds the 'vm' lock while attempting to take the 'client' lock, meanwhile, thread B takes and holds the 'client' lock while attempting to take the 'vm' lock.
The potential for this can be seen as follows:
Thread A: virLXCProcessAutostartDomain (takes vm lock) --> virLXCProcessStart --> virLXCProcessConnectMonitor --> virLXCMonitorNew --> virNetClientSetCloseCallback (wants client lock)
Thread B: virNetClientIncomingEvent (takes client lock) --> virNetClientIOHandleInput --> virNetClientCallDispatch --> virNetClientCallDispatchMessage --> virNetClientProgramDispatch --> virLXCMonitorHandleEventInit --> virLXCProcessMonitorInitNotify (wants vm lock)
Since these threads are scheduled independently and are preemptible it is possible for the deadlock scenario to occur where each thread locks their first lock but both will fail to get their second lock and just spin forever. You get something like:
virLXCProcessAutostartDomain (takes vm lock) --> virLXCProcessStart --> virLXCProcessConnectMonitor --> virLXCMonitorNew <...> virNetClientIncomingEvent (takes client lock) --> virNetClientIOHandleInput --> virNetClientCallDispatch --> virNetClientCallDispatchMessage --> virNetClientProgramDispatch --> virLXCMonitorHandleEventInit --> virLXCProcessMonitorInitNotify (wants vm lock but spins) <...> --> virNetClientSetCloseCallback (wants client lock but spins)
Neither thread ever gets the lock it needs to be able to continue while holding the lock that the other thread needs.
The actual window for preemption which can cause this deadlock is rather small, between the calls to virNetClientProgramNew() and execution of virNetClientSetCloseCallback(), both in virLXCMonitorNew(). But it can be seen in real world use that this small window is enough.
By moving the call to virNetClientSetCloseCallback() ahead of virNetClientProgramNew() we can close any possible chance of the deadlock taking place. There should be no other implications to the move since the close callback (in the unlikely event was called) will spin on the vm lock. The remaining work that takes place between the old call location of virNetClientSetCloseCallback() and the new location is unaffected by the move.
Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
|
|