On Thu, Oct 29, 2009 at 06:04:29PM +0100, Daniel Veillard wrote:
> All other API calls making changes get safely queued up at
> step 1, but API calls which simply wish to query information
> can run without being blocked at all. This fixes the major
> concurrency problem with running monitor commands. The use
> of a condition variable at the start of step 1, also allows
> us to time out API calls, if some other thread gets stuck in
> the monitor for too long. I think this also makes the use of
> a RWLock on the QEMU driver unnecessary, since no code will
> ever be holding a mutex in any place that sleeps/waits. Only
> the condition variable will be held during sleeps/waits.
>
> Since we'll now effectively have 3 locks, and 1 condition
> variable this is getting kind of complex. So the rest of this
> mail is a file I propose to put in src/qemu/THREADS.txt
> describing what is going on, and showing the recommended
> design patterns to use.
I have just one remark: this separation between APIs might
be done one level up, i.e. at the library entry point level
we should know what may induce a state change and those could
be flagged more formally. This may help other drivers where
libvirt needs to keep the state instead of asking the hypervisor.
It can't be done at the library entry level, since the locking needs
to be done against objects that are private to the driver.
> Basic locking primitives
> ------------------------
>
> There are a number of locks on various objects
>
> * struct qemud_driver: RWLock
Oops, this should have said 'Mutex' rather than RWLock.
>
> This is the top level lock on the entire driver. Every API call in
> the QEMU driver is blocked while this is held, though some internal
> callbacks may still run asynchronously. This lock must never be held
> for anything which sleeps/waits (ie monitor commands)
>
> When obtaining the driver lock, under *NO* circumstances must
> any lock be held on a virDomainObjPtr. This *WILL* result in
> deadlock.
Any chance to enforce that at the code level? Since we have
primitives for both, we could, once the RW lock is taken, set a flag in
the driver, and the DomainObj locking/unlocking routine could raise an
error if this happens.
That is not possible to do safely. If you add a flag in the driver to
indicate whether it is locked or not, then you need to add another
mutex to protect reads/writes to that flag, otherwise you've got a
clear race condition in checking it.
> * qemuMonitorPtr: Mutex
>
> Lock to be used when invoking any monitor command to ensure safety
> wrt any asynchronous events that may be dispatched from the monitor.
> It should be acquired before running a command.
>
> The job condition *MUST* be held before acquiring the monitor lock
>
> The virDomainObjPtr lock *MUST* be held before acquiring the monitor
> lock.
>
> The virDomainObjPtr lock *MUST* then be released when invoking the
> monitor command.
>
> The driver lock *MUST* be released when invoking the monitor commands.
>
> This ensures that the virDomainObjPtr & driver are both unlocked while
> sleeping/waiting for the monitor response.
I had to read this twice and I'm not sure I managed to fully map
mentally the full set of constraints.
Essentially there's a hierarchy of objects
Driver -> virDomainObjPtr -> qemuMonitorPtr
You have to acquire the locks in that order, and once you've acquired
the final qemuMonitorPtr lock, you must release the other locks
before running the actual monitor command.
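Roughly, in code, the ordering looks like the sketch below. It uses plain
pthread mutexes; the struct names, run_monitor_cmd() and probe_cmd() are
made up for illustration and are not the real libvirt types or helpers.
The enter/exit steps follow the order given in the helper descriptions
further down.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the three levels of the hierarchy. */
struct driver  { pthread_mutex_t lock; };
struct monitor { pthread_mutex_t lock; };
struct dom_obj { pthread_mutex_t lock; struct monitor mon; };

typedef int (*monitor_cmd)(struct driver *drv, struct dom_obj *obj);

/* Caller holds obj->lock. Take the monitor lock, then drop the domain. */
static void enter_monitor(struct dom_obj *obj)
{
    pthread_mutex_lock(&obj->mon.lock);
    pthread_mutex_unlock(&obj->lock);
}

/* Reverse step: retake the domain lock, then drop the monitor lock. */
static void exit_monitor(struct dom_obj *obj)
{
    pthread_mutex_lock(&obj->lock);
    pthread_mutex_unlock(&obj->mon.lock);
}

/* Caller holds drv->lock and obj->lock; both are held again on return. */
static int run_monitor_cmd(struct driver *drv, struct dom_obj *obj,
                           monitor_cmd cmd)
{
    int rc;

    assert(cmd != NULL);
    enter_monitor(obj);
    pthread_mutex_unlock(&drv->lock);
    rc = cmd(drv, obj);              /* may sleep: only the monitor lock is held */
    pthread_mutex_lock(&drv->lock);  /* driver first, then the domain */
    exit_monitor(obj);
    return rc;
}

/* Example command: returns 0 only if driver and domain are both unlocked. */
static int probe_cmd(struct driver *drv, struct dom_obj *obj)
{
    if (pthread_mutex_trylock(&drv->lock) != 0)
        return -1;
    pthread_mutex_unlock(&drv->lock);
    if (pthread_mutex_trylock(&obj->lock) != 0)
        return -1;
    pthread_mutex_unlock(&obj->lock);
    return 0;
}
```

Calling run_monitor_cmd(&drv, &obj, probe_cmd) with both locks held lets
the probe confirm that neither the driver nor the domain is locked while
the command runs.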
> To acquire the job condition variable (int jobActive)
>
> qemuDomainObjBeginJob() (if driver is unlocked)
> - Increments ref count on virDomainObjPtr
> - Waits on the qemuDomainObjPrivate condition while 'jobActive != 0',
>   using the virDomainObjPtr mutex
> - Sets jobActive to 1
>
> qemuDomainObjBeginJobWithDriver() (if driver needs to be locked)
> - Unlocks driver
> - Increments ref count on virDomainObjPtr
> - Waits on the qemuDomainObjPrivate condition while 'jobActive != 0',
>   using the virDomainObjPtr mutex
> - Sets jobActive to 1
> - Unlocks virDomainObjPtr
> - Locks driver
> - Locks virDomainObjPtr
>
> NB: this variant is required in order to comply with lock ordering rules
> for virDomainObjPtr vs driver
>
>
> qemuDomainObjEndJob()
> - Set jobActive to 0
> - Signal on qemuDomainObjPrivate condition
> - Decrements ref count on virDomainObjPtr
>
>
>
> To acquire the QEMU monitor lock
>
> qemuDomainObjEnterMonitor()
> - Acquires the qemuMonitorPtr lock
> - Releases the virDomainObjPtr lock
>
> qemuDomainObjExitMonitor()
> - Acquires the virDomainObjPtr lock
> - Releases the qemuMonitorPtr lock
>
> NB: caller must take care to drop the driver lock if necessary
>
It would be good if as many as possible of the constraints listed above
could also be checked at runtime. Sure, we could try to make new
checking rules like we did for previous locking checks, but it's hard
for someone doing a patch to really run those. And I doubt the extra
burden of checking a few conditions in locking routines would really
impact performance. The only problem might be availability of
pointers at the locking routines (or wrappers) to get the
information.
As before, it is not possible to check those constraints safely at
runtime without adding yet more locks. The idea of adding these methods
qemuDomainObjBeginJob, qemuDomainObjEndJob, qemuDomainObjEnterMonitor
and qemuDomainObjExitMonitor is that they take the complexity out of
the code. By defining the common code patterns, and making everything
use these helpers instead of the locks themselves, we ensure that all
code is compliant with the rules. It has taken that complex set of
ordering rules and simplified it to one of the patterns shown below.
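To make the job helpers a bit more concrete, here is a minimal sketch of
the begin/end job steps using a pthread condition variable with a timeout.
The struct layout, the function names and the timeout_ms parameter are
assumptions for illustration only, not the actual libvirt code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for qemud_driver and virDomainObjPtr. */
struct driver { pthread_mutex_t lock; };

struct dom_obj {
    pthread_mutex_t lock;     /* plays the role of the virDomainObjPtr mutex */
    pthread_cond_t  job_cond; /* signalled whenever a job ends */
    int job_active;           /* 1 while some caller owns the job */
    int refs;
};

/* Caller holds obj->lock. Waits until no job is active or time runs out. */
static int wait_for_job(struct dom_obj *obj, long timeout_ms)
{
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }
    /* The wait drops obj->lock while sleeping and retakes it on wakeup. */
    while (obj->job_active) {
        int rc = pthread_cond_timedwait(&obj->job_cond, &obj->lock, &deadline);
        if (rc != 0 && obj->job_active)
            return -1;        /* another job is still running: give up */
    }
    obj->job_active = 1;
    return 0;
}

/* Driver unlocked, obj->lock held. Returns 0 on success, -1 on timeout. */
static int begin_job(struct dom_obj *obj, long timeout_ms)
{
    obj->refs++;
    if (wait_for_job(obj, timeout_ms) < 0) {
        obj->refs--;
        return -1;
    }
    return 0;
}

/* Driver and obj both locked on entry and again on return. */
static int begin_job_with_driver(struct driver *drv, struct dom_obj *obj,
                                 long timeout_ms)
{
    int rc;

    pthread_mutex_unlock(&drv->lock);  /* never sleep with the driver locked */
    obj->refs++;                       /* keeps obj alive while it is unlocked */
    rc = wait_for_job(obj, timeout_ms);
    pthread_mutex_unlock(&obj->lock);  /* so the driver can be locked first */
    pthread_mutex_lock(&drv->lock);
    pthread_mutex_lock(&obj->lock);
    if (rc < 0)
        obj->refs--;
    return rc;
}

/* Caller holds obj->lock and owns the job. */
static void end_job(struct dom_obj *obj)
{
    assert(obj->job_active);
    obj->job_active = 0;
    pthread_cond_signal(&obj->job_cond);
    obj->refs--;
}
```

A second begin_job() on the same object while a job is active returns -1
once the timeout expires instead of blocking forever, which is the
behaviour the condition variable is meant to give us.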
> Design patterns
> ---------------
>
> All driver methods must follow one of these design patterns to
> ensure thread safety and lock correctness.
>
>
> * Accessing or updating something with just the driver
>
> qemuDriverLock(driver);
>
> ...do work...
>
> qemuDriverUnlock(driver);
>
>
>
> * Accessing something directly to do with a virDomainObjPtr
>
> virDomainObjPtr obj;
>
> qemuDriverLock(driver);
> obj = virDomainFindByUUID(driver->domains, dom->uuid);
> qemuDriverUnlock(driver);
>
> ...do work...
>
> virDomainObjUnlock(obj);
>
>
>
> * Accessing something directly to do with a virDomainObjPtr and driver
>
> virDomainObjPtr obj;
>
> qemuDriverLock(driver);
> obj = virDomainFindByUUID(driver->domains, dom->uuid);
>
> ...do work...
>
> virDomainObjUnlock(obj);
> qemuDriverUnlock(driver);
>
>
>
> * Updating something directly to do with a virDomainObjPtr
>
> virDomainObjPtr obj;
>
> qemuDriverLock(driver);
> obj = virDomainFindByUUID(driver->domains, dom->uuid);
> qemuDriverUnlock(driver);
>
> qemuDomainObjBeginJob(obj);
>
> ...do work...
>
> qemuDomainObjEndJob(obj);
>
> virDomainObjUnlock(obj);
>
>
>
>
> * Invoking a monitor command on a virDomainObjPtr
>
>
> virDomainObjPtr obj;
> qemuDomainObjPrivatePtr priv;
>
> qemuDriverLock(driver);
> obj = virDomainFindByUUID(driver->domains, dom->uuid);
> qemuDriverUnlock(driver);
>
> qemuDomainObjBeginJob(obj);
>
> ...do prep work...
>
> qemuDomainObjEnterMonitor(obj);
> qemuMonitorXXXX(priv->mon);
> qemuDomainObjExitMonitor(obj);
>
> ...do final work...
>
> qemuDomainObjEndJob(obj);
> virDomainObjUnlock(obj);
>
>
>
>
> * Invoking a monitor command on a virDomainObjPtr with driver locked too
>
>
> virDomainObjPtr obj;
> qemuDomainObjPrivatePtr priv;
>
> qemuDriverLock(driver);
> obj = virDomainFindByUUID(driver->domains, dom->uuid);
>
> qemuDomainObjBeginJobWithDriver(obj);
>
> ...do prep work...
>
> qemuDomainObjEnterMonitor(obj);
> qemuDriverUnlock(driver);
> qemuMonitorXXXX(priv->mon);
> qemuDriverLock(driver);
> qemuDomainObjExitMonitor(obj);
>
> ...do final work...
>
> qemuDomainObjEndJob(obj);
> virDomainObjUnlock(obj);
> qemuDriverUnlock(driver);
>
>
>
> Summary
> -------
>
> * Respect lock ordering rules: never lock driver if anything else is
> already locked
>
> * Don't hold locks in code which sleeps: unlock driver & virDomainObjPtr
> when using monitor
It's good to have all those described. I'm still worried by the
complexity level, especially for someone contributing small changes,
and by the QEMU-specific nature of the guidelines. How much of this
is generic, for example for other drivers doing read-only operations
with a domain, etc.?
The other drivers don't really have the equivalent of the QEMU monitor (well,
the UML driver does have a very simple version, but we've not hooked that
up). Their methods are all fairly fast to complete, so they don't suffer as
badly from the concurrency bottlenecks that hit the QEMU driver.
Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|