On Thu, Apr 30, 2015 at 02:38:21PM +0200, Michal Privoznik wrote:
On 30.04.2015 14:04, Erik Skultety wrote:
> The lock is dropped always at the end of an API, but for example when
> attaching devices, there's no point in having the NW filter locked if the
> device being attached isn't a network interface. It's always a nice
> practice to drop unnecessary locks as soon as possible.
> ---
> src/qemu/qemu_driver.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
>
> BZ 1181074 reported the daemon to hang (not completely true btw) after trying
> to attach /dev/ttyX to a running domain. When going through the code I realized
> we acquire a read lock for NW filter, although the device itself is not a network
> interface and then releasing it at the end. First clue was that this lock which
> won't be unlocked as the thread is blocked in reading header of /dev/ttyX
causes
> the freeze. As it turned out, the problem resides in calling virsh list afterwards
> (Peter already does have some patches prepared), but I think the NW filter might
> be released earlier in this case with no harm done.
>
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 74dcb0a..04887e0 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -8764,6 +8764,7 @@ static int qemuDomainAttachDeviceFlags(virDomainPtr dom, const
char *xml,
> virDomainDeviceDefPtr dev = NULL, dev_copy = NULL;
> int ret = -1;
> unsigned int affect, parse_flags = VIR_DOMAIN_DEF_PARSE_INACTIVE;
> + bool nwfilter_locked = false;
> virQEMUCapsPtr qemuCaps = NULL;
> qemuDomainObjPrivatePtr priv;
> virQEMUDriverConfigPtr cfg = NULL;
> @@ -8773,6 +8774,7 @@ static int qemuDomainAttachDeviceFlags(virDomainPtr dom, const
char *xml,
> VIR_DOMAIN_AFFECT_CONFIG, -1);
>
> virNWFilterReadLockFilterUpdates();
> + nwfilter_locked = true;
Wait a while. Whenever a programmer needs to know if a lock is locked,
something is wrong with the design. That's why POSIX doesn't define such
function. Then, this is a Read lock, so it won't hold back other
readers, only writers. I don't think there's a real bottle neck unless
you are doing a testcase and attaching thousands of devices and defining
another thousands of NWFilters.
What I'm more curious is - why the heck does this lock needs to be on
this level? It's very high level. We lock the lock even when we don't
know yet if the XML is right, and what device type we will work on.
Can't the lock be moved to lower layers?
We have to be careful of the lock ordering rules - see this note
in the commit where I refactoring nwfilter locking:
commit 6e5c79a1b5a8b3a23e7df7ffe58fb272aa17fbfb
Author: Daniel P. Berrange <berrange(a)redhat.com>
Date: Wed Jan 22 17:28:29 2014 +0000
Push nwfilter update locking up to top level
The NWFilter code has as a deadlock race condition between
the virNWFilter{Define,Undefine} APIs and starting of guest
VMs due to mis-matched lock ordering.
In the virNWFilter{Define,Undefine} codepaths the lock ordering
is
1. nwfilter driver lock
2. virt driver lock
3. nwfilter update lock
4. domain object lock
In the VM guest startup paths the lock ordering is
1. virt driver lock
2. domain object lock
3. nwfilter update lock
As can be seen the domain object and nwfilter update locks are
not acquired in a consistent order.
The fix used is to push the nwfilter update lock upto the top
level resulting in a lock ordering for virNWFilter{Define,Undefine}
of
1. nwfilter driver lock
2. nwfilter update lock
3. virt driver lock
4. domain object lock
and VM start using
1. nwfilter update lock
2. virt driver lock
3. domain object lock
This has the effect of serializing VM startup once again, even if
no nwfilters are applied to the guest. There is also the possibility
of deadlock due to a call graph loop via virNWFilterInstantiate
and virNWFilterInstantiateFilterLate.
These two problems mean the lock must be turned into a read/write
lock instead of a plain mutex at the same time. The lock is used to
serialize changes to the "driver->nwfilters" hash, so the write lock
only needs to be held by the define/undefine methods. All other
methods can rely on a read lock which allows good concurrency.
Though some things have changed due to removal of the virt driver lock
in QEMU, we stil have to respect the ordering of the nwfilter vs domain
object lock
Regards,
Daniel
--
|:
http://berrange.com -o-
http://www.flickr.com/photos/dberrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org :|
|:
http://autobuild.org -o-
http://search.cpan.org/~danberr/ :|
|:
http://entangle-photo.org -o-
http://live.gnome.org/gtk-vnc :|