On 01/23/2014 07:37 AM, Daniel P. Berrange wrote:
The NWFilter code has as a deadlock race condition between
the virNWFilter{Define,Undefine} APIs and starting of guest
VMs due to mis-matched lock ordering.
In the virNWFilter{Define,Undefine} codepaths the lock ordering
is
1. nwfilter driver lock
2. virt driver lock
3. nwfilter update lock
4. domain object lock
In the VM guest startup paths the lock ordering is
1. virt driver lock
2. domain object lock
3. nwfilter update lock
As can be seen the domain object and nwfilter update locks are
not acquired in a consistent order.
The fix used is to push the nwfilter update lock upto the top
level resulting in a lock ordering for virNWFilter{Define,Undefine}
of
1. nwfilter driver lock
2. nwfilter update lock
3. virt driver lock
4. domain object lock
and VM start using
1. nwfilter update lock
2. virt driver lock
3. domain object lock
Makes sense... there are other paths as well:
- SIGHUP or restart of libvirt that recreates all filters
- late instantiation of filters following detection of VM's IP address
This has the effect of serializing VM startup once again, even if
no nwfilters are applied to the guest.
Signed-off-by: Daniel P. Berrange <berrange(a)redhat.com>
---
src/conf/nwfilter_conf.c | 6 ------
src/lxc/lxc_driver.c | 6 ++++++
src/nwfilter/nwfilter_driver.c | 8 ++++----
src/nwfilter/nwfilter_gentech_driver.c | 6 ------
src/qemu/qemu_driver.c | 6 ++++++
src/uml/uml_driver.c | 4 ++++
6 files changed, 20 insertions(+), 16 deletions(-)
diff --git a/src/conf/nwfilter_conf.c b/src/conf/nwfilter_conf.c
index 6db8ea9..e712ca5 100644
--- a/src/conf/nwfilter_conf.c
+++ b/src/conf/nwfilter_conf.c
@@ -2990,14 +2990,12 @@ virNWFilterObjAssignDef(virNWFilterObjListPtr nwfilters,
return NULL;
}
- virNWFilterLockFilterUpdates();
if ((nwfilter = virNWFilterObjFindByName(nwfilters, def->name))) {
if (virNWFilterDefEqual(def, nwfilter->def, false)) {
virNWFilterDefFree(nwfilter->def);
nwfilter->def = def;
- virNWFilterUnlockFilterUpdates();
return nwfilter;
}
I think you should add a comment to this function that it must be called
with this lock held.
With the removal of the locking from this function here you have to do
the locking in the callers. Callers of virNWFilterObjAssignDef are:
src/conf/nwfilter_conf.c::virNWFilterObjLoad : I don't see this function
grabbing the lock. I think this is missing.
src/conf/nwfilter_driver.c::nwfilterDefineXML : ok, found it below
diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c
index e319234..b1f8a89 100644
--- a/src/lxc/lxc_driver.c
+++ b/src/lxc/lxc_driver.c
@@ -1015,6 +1015,8 @@ static int lxcDomainCreateWithFiles(virDomainPtr dom,
virCheckFlags(VIR_DOMAIN_START_AUTODESTROY, -1);
+ virNWFilterLockFilterUpdates();
+
if (!(vm = lxcDomObjFromDomain(dom)))
goto cleanup;
@@ -1053,6 +1055,7 @@ cleanup:
if (event)
virObjectEventStateQueue(driver->domainEventState, event);
virObjectUnref(cfg);
+ virNWFilterUnlockFilterUpdates();
return ret;
}
@@ -1109,6 +1112,8 @@ lxcDomainCreateXMLWithFiles(virConnectPtr conn,
virCheckFlags(VIR_DOMAIN_START_AUTODESTROY, NULL);
+ virNWFilterLockFilterUpdates();
+
if (!(caps = virLXCDriverGetCapabilities(driver, false)))
goto cleanup;
@@ -1164,6 +1169,7 @@ cleanup:
virObjectEventStateQueue(driver->domainEventState, event);
virObjectUnref(caps);
virObjectUnref(cfg);
+ virNWFilterUnlockFilterUpdates();
return dom;
}
Probably these calls are correctly placed since all other creation
functions funnel into them.Same for UML and QEMU drivers.
diff --git a/src/nwfilter/nwfilter_gentech_driver.c
b/src/nwfilter/nwfilter_gentech_driver.c
index 89913cf8..f0e78ed 100644
--- a/src/nwfilter/nwfilter_gentech_driver.c
+++ b/src/nwfilter/nwfilter_gentech_driver.c
@@ -935,7 +935,6 @@ _virNWFilterInstantiateFilter(virNWFilterDriverStatePtr driver,
int ifindex;
int rc;
- virNWFilterLockFilterUpdates();
/* after grabbing the filter update lock check for the interface; if
it's not there anymore its filters will be or are being removed
@@ -964,7 +963,6 @@ _virNWFilterInstantiateFilter(virNWFilterDriverStatePtr driver,
foundNewFilter);
cleanup:
- virNWFilterUnlockFilterUpdates();
return rc;
}
This function is called by virNWFilterInstantiateFilter and
virNWFilterUpdateInstantiateFilter. So, in the former case this is
called when a VM is started, in the latter case if VMs' filters are
updated while the VM is running. I think you are covering the VM
creation case with the above calls for lxc and further below with the
changes to the qemu driver and the uml driver.
There is at least one other part that needs to be covered:
src/conf/nwfilter_conf.c::virNWFilterInstFiltersOnAllVMs :: kicked off
by nwfilterStateReload upon SIGHUP. We need a lock there now.
src/conf/nwfilter_conf.c::virNWFilterTriggerVMFilterRebuild::
- called by virNWFilterTestUnassignDef:
- called by src/nwfilter/nwfilter/nwfilterUndefine:: You seem
to just reorder some locks there; now we need a (writer) lock there
- called by virNWFilterObjAssignDef: which must be called with lock
called following above reasoning
@@ -984,8 +982,6 @@
virNWFilterInstantiateFilterLate(virNWFilterDriverStatePtr driver,
int rc;
bool foundNewFilter = false;
- virNWFilterLockFilterUpdates();
-
rc = __virNWFilterInstantiateFilter(driver,
vmuuid,
true,
@@ -1009,8 +1005,6 @@ virNWFilterInstantiateFilterLate(virNWFilterDriverStatePtr driver,
}
}
- virNWFilterUnlockFilterUpdates();
-
return rc;
}
This function here is called by
src/nwfilter/nwfilter_learnip.c::learnIPAddressThread
src/nwfilter/nwfilter_dhcpsnoop.c::virNWFilterSnoopIPLeaseInstallRule
They instantiate the filters once a VM's IP address has been detected.
So this is where the *Late() comes from.
If you remove the locking from here, you have to lock it there.
Considering what you do layer, I would keep the lock here and convert
this into a reader lock layer on.
The farther away we do this from the calls where the actual action is
happening, the more tricky this becomes. I would almost recommend to
plant an assert into those far way functions that tests whether the lock
has been grabbed.
Regards,
Stefan