
Eric Blake wrote:
On 06/20/2012 03:07 PM, Jim Fehlig wrote:
I'm looking into a libvirtd deadlock on daemon shutdown. The deadlock occurs when shutting down virNetServer. If the handling of a job is in flight in virNetServerHandleJob(), the virNetServer lock is acquired when freeing job->prog (src/rpc/virnetserver.c:167). But the lock is already held in virNetServerFree(), which is blocked in virThreadPoolFree() waiting for all the workers to finish. No progress can be made.
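The shape of that deadlock can be probed with a small pthreads sketch. This is not libvirt code: srv_lock, job() and probe_lock_while_freeing() are hypothetical stand-ins for the virNetServer lock, a job in flight, and virNetServerFree(). The job uses pthread_mutex_trylock() so the probe reports the contention instead of hanging; with a blocking lock call the join below would never return.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the virNetServer lock. */
static pthread_mutex_t srv_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for a job in flight: it needs the server lock to finish.
 * trylock is used so the probe returns rather than blocking forever. */
static void *job(void *opaque)
{
    int *rc = opaque;

    *rc = pthread_mutex_trylock(&srv_lock);
    if (*rc == 0)
        pthread_mutex_unlock(&srv_lock);
    return NULL;
}

/* Stand-in for the free path: take the lock, then wait for the worker
 * while still holding it. Returns what the worker saw when it tried to
 * take the lock, or -1 if the thread could not be created. */
int probe_lock_while_freeing(void)
{
    pthread_t t;
    int rc = -1;

    pthread_mutex_lock(&srv_lock);
    if (pthread_create(&t, NULL, job, &rc) != 0) {
        pthread_mutex_unlock(&srv_lock);
        return -1;
    }
    pthread_join(t, NULL);      /* returns only because job() uses trylock */
    pthread_mutex_unlock(&srv_lock);
    return rc;
}
```

The worker sees EBUSY because the freeing thread holds the lock for the whole time it waits on the join.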
The attached hack fixes the problem, but I'm not convinced this is an appropriate fix. Is it necessary to hold the virNetServer lock when calling virNetServerProgramFree(job->prog)? I notice the lock is not held in the error path of virNetServerHandleJob().
+++ b/src/rpc/virnetserver.c
@@ -774,7 +774,9 @@ void virNetServerFree(virNetServerPtr srv)
     for (i = 0 ; i < srv->nservices ; i++)
         virNetServerServiceToggle(srv->services[i], false);
+    virNetServerUnlock(srv);
     virThreadPoolFree(srv->workers);
+    virNetServerLock(srv);
As written, this can't be right, because it reads a field of srv outside the locks. But maybe you meant:

    <type> tmp = srv->workers;
    srv->workers = NULL;
    virNetServerUnlock(srv);
    virThreadPoolFree(tmp);
    virNetServerLock(srv);

as a possible fix that doesn't violate the safety rules of reading fields from srv outside a lock.
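A minimal, self-contained sketch of that detach-then-free pattern using plain pthreads follows. It is not libvirt code: struct server, struct pool, pool_free() and run_demo() are hypothetical stand-ins for virNetServer, its thread pool, virThreadPoolFree() and a caller. It only illustrates that the workers can finish once the lock is dropped around the join.

```c
#include <pthread.h>
#include <stddef.h>

#define NWORKERS 4

/* Hypothetical stand-in for the thread pool. */
struct pool {
    pthread_t threads[NWORKERS];
};

/* Hypothetical stand-in for virNetServer. */
struct server {
    pthread_mutex_t lock;
    struct pool *workers;   /* protected by lock */
    int jobs_done;          /* protected by lock */
};

/* Worker: finishing a job needs the server lock. */
static void *worker(void *opaque)
{
    struct server *srv = opaque;

    pthread_mutex_lock(&srv->lock);
    srv->jobs_done++;
    pthread_mutex_unlock(&srv->lock);
    return NULL;
}

/* Wait for every worker; must be called WITHOUT srv->lock held. */
static void pool_free(struct pool *p)
{
    for (size_t i = 0; i < NWORKERS; i++)
        pthread_join(p->threads[i], NULL);
}

/* Detach the pool pointer under the lock, then join outside it. */
static void server_free(struct server *srv)
{
    struct pool *tmp;

    pthread_mutex_lock(&srv->lock);
    tmp = srv->workers;         /* field is read while locked */
    srv->workers = NULL;
    pthread_mutex_unlock(&srv->lock);

    pool_free(tmp);             /* workers can take the lock and finish */

    pthread_mutex_lock(&srv->lock);
    /* ... remaining teardown that needs the lock would go here ... */
    pthread_mutex_unlock(&srv->lock);
}

/* Start the workers, free the server, and return the number of jobs
 * that completed, or -1 if a thread could not be created. */
int run_demo(void)
{
    static struct pool p;
    struct server srv = { .workers = &p, .jobs_done = 0 };
    int done;

    pthread_mutex_init(&srv.lock, NULL);
    for (size_t i = 0; i < NWORKERS; i++) {
        if (pthread_create(&p.threads[i], NULL, worker, &srv) != 0)
            return -1;
    }
    server_free(&srv);
    done = srv.jobs_done;       /* all workers joined, no race here */
    pthread_mutex_destroy(&srv.lock);
    return done;
}
```

If pool_free() were instead called with the lock still held, the join would wait on workers that are themselves waiting for the lock.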
Perhaps, but it _currently_ appears srv->workers is only set in virNetServerNew(), which is called when libvirtd starts. I suppose that could change in the future, causing a bug as I wrote it.
I hope danpb chimes in on this one, as he is more of an expert when it comes to the locking rules in virnet*.
Agreed. I think there is a better fix here.

Thanks,
Jim