Eric Blake wrote:
On 06/20/2012 03:07 PM, Jim Fehlig wrote:
> I'm looking into a libvirtd deadlock on daemon shutdown. The deadlock
> occurs when shutting down virNetServer. If the handling of a job is in
> flight in virNetServerHandleJob(), the virNetServer lock is acquired
> when freeing job->prog (src/rpc/virnetserver.c:167). But the lock is
> already held in virNetServerFree(), which is blocked in
> virThreadPoolFree() waiting for all the workers to finish. No progress
> can be made.
>
> The attached hack fixes the problem, but I'm not convinced this is an
> appropriate fix. Is it necessary to hold the virNetServer lock when
> calling virNetServerProgramFree(job->prog)? I notice the lock is not
> held in the error path of virNetServerHandleJob().
>
>
> +++ b/src/rpc/virnetserver.c
> @@ -774,7 +774,9 @@ void virNetServerFree(virNetServerPtr srv)
> for (i = 0 ; i < srv->nservices ; i++)
> virNetServerServiceToggle(srv->services[i], false);
>
> + virNetServerUnlock(srv);
> virThreadPoolFree(srv->workers);
> + virNetServerLock(srv);
>
As written, this can't be right, because it reads a field of srv outside
the locks. But maybe you meant:
<type> tmp = srv->workers;
srv->workers = NULL;
virNetServerUnlock(srv);
virThreadPoolFree(tmp);
virNetServerLock(srv);
as a possible fix that doesn't violate the safety rules of reading
fields from srv outside a lock.
Perhaps, but it _currently_ appears srv->workers is only set in
virNetServerNew(), which is called when libvirtd starts. I suppose that
could change in the future, causing a bug as I wrote it.
I hope danpb chimes in on this one, as he is more of an expert when
it
comes to the locking rules in virnet*.
Agreed. I think there is a better fix here.
Thanks,
Jim