
On Wed, Jun 20, 2012 at 03:31:02PM -0600, Eric Blake wrote:
On 06/20/2012 03:07 PM, Jim Fehlig wrote:
I'm looking into a libvirtd deadlock on daemon shutdown. The deadlock occurs when shutting down virNetServer. If the handling of a job is in flight in virNetServerHandleJob(), the virNetServer lock is acquired when freeing job->prog (src/rpc/virnetserver.c:167). But the lock is already held in virNetServerFree(), which is blocked in virThreadPoolFree() waiting for all the workers to finish. No progress can be made.
The attached hack fixes the problem, but I'm not convinced this is an appropriate fix. Is it necessary to hold the virNetServer lock when calling virNetServerProgramFree(job->prog)? I notice the lock is not held in the error path of virNetServerHandleJob().
+++ b/src/rpc/virnetserver.c @@ -774,7 +774,9 @@ void virNetServerFree(virNetServerPtr srv) for (i = 0 ; i < srv->nservices ; i++) virNetServerServiceToggle(srv->services[i], false);
+ virNetServerUnlock(srv); virThreadPoolFree(srv->workers); + virNetServerLock(srv);
As written, this can't be right, because it reads a field of srv outside the locks. But maybe you meant:
<type> tmp = srv->workers; srv->workers = NULL; virNetServerUnlock(srv); virThreadPoolFree(tmp); virNetServerLock(srv);
as a possible fix that doesn't violate the safety rules of reading fields from srv outside a lock.
While you are technically correct in the most general sense, there is not actually any harm in dereferencing srv->workers here, since now the ref count is 0, only the current thread & the workers are allowed to be accessing 'srv', an the workers don't touch the 'srv->workers' field at all. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|