
On Thu, Feb 06, 2014 at 02:11:26PM +0200, Laine Stump wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
(see notes about conflict resolution at the end. I'm not sending the other 5 patches required, since they were all simple cherry-picks of:
4f2094346d98f4ed6a2de115d204c166cc563496 b77b16ce4166dcc87963ae5d279b77b162ddbb55 ebca369e3fe5ac999c261c2d44e60a1bac3cfe65 999d72fbd59ea712128ae294b69b6a54039d757b c065984b58000a44c90588198d222a314ac532fd )
The NWFilter code has as a deadlock race condition between the virNWFilter{Define,Undefine} APIs and starting of guest VMs due to mis-matched lock ordering.
In the virNWFilter{Define,Undefine} codepaths the lock ordering is
1. nwfilter driver lock 2. virt driver lock 3. nwfilter update lock 4. domain object lock
In the VM guest startup paths the lock ordering is
1. virt driver lock 2. domain object lock 3. nwfilter update lock
As can be seen the domain object and nwfilter update locks are not acquired in a consistent order.
The fix used is to push the nwfilter update lock upto the top level resulting in a lock ordering for virNWFilter{Define,Undefine} of
1. nwfilter driver lock 2. nwfilter update lock 3. virt driver lock 4. domain object lock
and VM start using
1. nwfilter update lock 2. virt driver lock 3. domain object lock
This has the effect of serializing VM startup once again, even if no nwfilters are applied to the guest. There is also the possibility of deadlock due to a call graph loop via virNWFilterInstantiate and virNWFilterInstantiateFilterLate.
These two problems mean the lock must be turned into a read/write lock instead of a plain mutex at the same time. The lock is used to serialize changes to the "driver->nwfilters" hash, so the write lock only needs to be held by the define/undefine methods. All other methods can rely on a read lock which allows good concurrency.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> (cherry picked from commit 6e5c79a1b5a8b3a23e7df7ffe58fb272aa17fbfb)
Conflicts: src/conf/nwfilter_conf.c - virReportOOMError() in context of one hunk. src/lxc/lxc_driver.c - functions renamed, and lxc object locking changed, creating a conflict in the context. --- src/conf/nwfilter_conf.c | 25 ++++++++++++------------- src/conf/nwfilter_conf.h | 3 ++- src/libvirt_private.syms | 3 ++- src/lxc/lxc_driver.c | 8 +++++++- src/nwfilter/nwfilter_driver.c | 10 ++++++---- src/nwfilter/nwfilter_gentech_driver.c | 6 +----- src/qemu/qemu_driver.c | 6 ++++++ src/uml/uml_driver.c | 4 ++++ 8 files changed, 40 insertions(+), 25 deletions(-)
ACK, you got the lock ordering correct when resolving the lxc conflict. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|