On Wed, Jun 20, 2012 at 03:31:02PM -0600, Eric Blake wrote:
On 06/20/2012 03:07 PM, Jim Fehlig wrote:
> I'm looking into a libvirtd deadlock on daemon shutdown. The deadlock
> occurs when shutting down virNetServer. If the handling of a job is in
> flight in virNetServerHandleJob(), the virNetServer lock is acquired
> when freeing job->prog (src/rpc/virnetserver.c:167). But the lock is
> already held in virNetServerFree(), which is blocked in
> virThreadPoolFree() waiting for all the workers to finish. No progress
> can be made.
>
> The attached hack fixes the problem, but I'm not convinced this is an
> appropriate fix. Is it necessary to hold the virNetServer lock when
> calling virNetServerProgramFree(job->prog)? I notice the lock is not
> held in the error path of virNetServerHandleJob().
>
> +++ b/src/rpc/virnetserver.c
> @@ -774,7 +774,9 @@ void virNetServerFree(virNetServerPtr srv)
> for (i = 0 ; i < srv->nservices ; i++)
> virNetServerServiceToggle(srv->services[i], false);
>
> + virNetServerUnlock(srv);
> virThreadPoolFree(srv->workers);
> + virNetServerLock(srv);
As written, this can't be right, because it reads a field of srv outside
the locks. But maybe you meant:
<type> tmp = srv->workers;
srv->workers = NULL;
virNetServerUnlock(srv);
virThreadPoolFree(tmp);
virNetServerLock(srv);
as a possible fix that doesn't violate the safety rules of reading
fields from srv outside a lock.
While you are technically correct in the most general sense, there
is not actually any harm in dereferencing srv->workers here, since
now the ref count is 0, only the current thread & the workers are
allowed to be accessing 'srv', an the workers don't touch the
'srv->workers' field at all.
Daniel
--
|:
http://berrange.com -o-
http://www.flickr.com/photos/dberrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org :|
|:
http://autobuild.org -o-
http://search.cpan.org/~danberr/ :|
|:
http://entangle-photo.org -o-
http://live.gnome.org/gtk-vnc :|