[libvirt] QEMU driver thread safety rules

The current QEMU driver makes use of 2 locks

 - The driver lock
 - The virDomainObjPtr lock

The idea is that the driver lock is not held for long periods of time. Unfortunately we don't always deal with this very well - some code needs to do quite a lot with the driver - particularly starting and stopping of guests.

The bigger problem is that the virDomainObjPtr lock is often held for long periods, specifically whenever we invoke a monitor command. Some of these commands can take a very long time (even infinite if someone has sent SIGSTOP to QEMU). This very quickly blocks the whole driver.

I've realized that even with the series of monitor patches I sent out, changing the driver mutex to a RWLock, and adding a separate lock on the qemuMonitorPtr object itself, there's still a major concurrency problem: the virDomainObjPtr lock is held for too long. I propose to drop the RWLock patch, and do something totally different instead....

We fundamentally need to drop the virDomainObjPtr lock whenever we invoke a monitor command. Unfortunately, merely dropping the virDomainObjPtr lock and acquiring the qemuMonitorPtr lock is not safe.

An API call which changes the VM state typically has 3 phases

 1. Check what state/config the VM is in
 2. Invoke the monitor command
 3. Update the state/config of the VM

If we release the virDomainObjPtr lock and acquire the qemuMonitorPtr lock at step 2, then other API calls will be able to complete their own step 1 checks, and get blocked at step 2. This is not safe, because when the original call moves on to step 3 and changes the state, this will have invalidated the checks the other sleeping API calls made in step 1.

We need to prevent any API call starting step 1 for as long as there is a monitor command being run, even if the lock on virDomainObjPtr is not held.

The only way I see to do this is to introduce a condition variable indicating that a state change is to be made. Any API call which intends to make a state change must acquire this condition prior to step 1. It can thus safely do its checks and move on to step 2, releasing the virDomainObjPtr lock while the monitor command is running, and reacquiring it after.

All other API calls making changes get safely queued up at step 1, but API calls which simply wish to query information can run without being blocked at all. This fixes the major concurrency problem with running monitor commands. The use of a condition variable at the start of step 1 also allows us to time out API calls if some other thread gets stuck in the monitor for too long.

I think this also makes the use of a RWLock on the QEMU driver unnecessary, since no code will ever be holding a mutex in any place that sleeps/waits. Only the condition variable will be held during sleeps/waits.

Since we'll now effectively have 3 locks and 1 condition variable, this is getting kind of complex. So the rest of this mail is a file I propose to put in src/qemu/THREADS.txt describing what is going on, and showing the recommended design patterns to use.

Daniel


      QEMU Driver Threading: The Rules
      =================================

This document describes how thread safety is ensured throughout the QEMU driver.
The criteria for this model are:

 - Objects must never be exclusively locked for any prolonged time
 - Code which sleeps must be able to time out after a suitable period
 - Must be safe against dispatch of asynchronous events from the monitor


Basic locking primitives
------------------------

There are a number of locks on various objects

  * struct qemud_driver: RWLock

    This is the top level lock on the entire driver. Every API call in the
    QEMU driver is blocked while this is held, though some internal
    callbacks may still run asynchronously. This lock must never be held
    for anything which sleeps/waits (i.e. monitor commands).

    When obtaining the driver lock, under *NO* circumstances must any lock
    be held on a virDomainObjPtr. This *WILL* result in deadlock.

  * virDomainObjPtr: Mutex

    Will be locked after calling any of the virDomainFindBy{ID,Name,UUID}
    methods.

    Lock must be held when changing/reading any variable in the
    virDomainObjPtr.

    Once the lock is held, you must *NOT* try to lock the driver. You must
    release all virDomainObjPtr locks before locking the driver, or
    deadlock *WILL* occur.

    If the lock needs to be dropped & then re-acquired for a short period
    of time, the reference count must be incremented first using
    virDomainObjRef(). If the reference count is incremented in this way,
    it is not necessary to have the driver locked when re-acquiring the
    dropped lock, since the reference count prevents it being freed by
    another thread.

    This lock must not be held for anything which sleeps/waits (i.e.
    monitor commands).

  * qemuMonitorPrivatePtr: Job condition

    Since the virDomainObjPtr lock must not be held during sleeps, the job
    condition provides additional protection for code making updates.

    Immediately after acquiring the virDomainObjPtr lock, any method which
    intends to update state must acquire the job condition. The
    virDomainObjPtr lock is released while blocking on this condition
    variable. Once the job condition is acquired, a method can safely
    release the virDomainObjPtr lock whenever it hits a piece of code which
    may sleep/wait, and re-acquire it after the sleep/wait.

  * qemuMonitorPtr: Mutex

    Lock to be used when invoking any monitor command to ensure safety
    wrt any asynchronous events that may be dispatched from the monitor.
    It should be acquired before running a command.

    The job condition *MUST* be held before acquiring the monitor lock.

    The virDomainObjPtr lock *MUST* be held before acquiring the monitor
    lock.

    The virDomainObjPtr lock *MUST* then be released when invoking the
    monitor command.

    The driver lock *MUST* be released when invoking the monitor commands.

    This ensures that the virDomainObjPtr & driver are both unlocked while
    sleeping/waiting for the monitor response.

Helper methods
--------------

To lock the driver

  qemuDriverLock()
    - Acquires the driver lock

  qemuDriverUnlock()
    - Releases the driver lock

To lock the virDomainObjPtr

  virDomainObjLock()
    - Acquires the virDomainObjPtr lock

  virDomainObjUnlock()
    - Releases the virDomainObjPtr lock

To acquire the job condition variable (int jobActive)

  qemuDomainObjBeginJob() (if driver is unlocked)
    - Increments ref count on virDomainObjPtr
    - Waits on the qemuDomainObjPrivate condition, using the
      virDomainObjPtr mutex, while 'jobActive != 0'
    - Sets jobActive to 1

  qemuDomainObjBeginJobWithDriver() (if driver needs to be locked)
    - Unlocks driver
    - Increments ref count on virDomainObjPtr
    - Waits on the qemuDomainObjPrivate condition, using the
      virDomainObjPtr mutex, while 'jobActive != 0'
    - Sets jobActive to 1
    - Unlocks virDomainObjPtr
    - Locks driver
    - Locks virDomainObjPtr

    NB: this variant is required in order to comply with lock ordering
    rules for virDomainObjPtr vs driver

  qemuDomainObjEndJob()
    - Sets jobActive to 0
    - Signals on qemuDomainObjPrivate condition
    - Decrements ref count on virDomainObjPtr

To acquire the QEMU monitor lock

  qemuDomainObjEnterMonitor()
    - Acquires the qemuMonitorPtr lock
    - Releases the virDomainObjPtr lock

  qemuDomainObjExitMonitor()
    - Acquires the virDomainObjPtr lock
    - Releases the qemuMonitorPtr lock

  NB: caller must take care to drop the driver lock if necessary


Design patterns
---------------

All driver methods must follow one of these design patterns to ensure thread safety and lock correctness.

* Accessing or updating something with just the driver

    qemuDriverLock(driver);

    ...do work...

    qemuDriverUnlock(driver);

* Accessing something directly to do with a virDomainObjPtr

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    ...do work...

    virDomainObjUnlock(obj);

* Accessing something directly to do with a virDomainObjPtr and driver

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);

    ...do work...

    virDomainObjUnlock(obj);
    qemuDriverUnlock(driver);

* Updating something directly to do with a virDomainObjPtr

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    qemuDomainObjBeginJob(obj);

    ...do work...

    qemuDomainObjEndJob(obj);

    virDomainObjUnlock(obj);

* Invoking a monitor command on a virDomainObjPtr

    virDomainObjPtr obj;
    qemuDomainObjPrivatePtr priv;

    qemuDriverLockRO(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    qemuDomainObjBeginJob(obj);

    ...do prep work...

    qemuDomainObjEnterMonitor(obj);
    qemuMonitorXXXX(priv->mon);
    qemuDomainObjExitMonitor(obj);

    ...do final work...

    qemuDomainObjEndJob(obj);
    virDomainObjUnlock(obj);

* Invoking a monitor command on a virDomainObjPtr with driver locked too

    virDomainObjPtr obj;
    qemuDomainObjPrivatePtr priv;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);

    qemuDomainObjBeginJobWithDriver(obj);

    ...do prep work...

    qemuDomainObjEnterMonitor(obj);
    qemuDriverUnlock(driver);
    qemuMonitorXXXX(priv->mon);
    qemuDriverLock(driver);
    qemuDomainObjExitMonitor(obj);

    ...do final work...
    qemuDomainObjEndJob(obj);
    virDomainObjUnlock(obj);
    qemuDriverUnlock(driver);


Summary
-------

  * Respect lock ordering rules: never lock driver if anything else is already locked

  * Don't hold locks in code which sleeps: unlock driver & virDomainObjPtr when using monitor
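To make the patterns above concrete, here is a sketch of a complete driver method following the "Invoking a monitor command on a virDomainObjPtr" pattern. It is purely illustrative and not part of the proposal: qemudDomainDoSomething() and qemuMonitorXXXX() are placeholder names, error reporting is reduced to comments, and plain qemuDriverLock() is used for the lookup rather than the read-only variant mentioned above.

    static int qemudDomainDoSomething(virDomainPtr dom)
    {
        struct qemud_driver *driver = dom->conn->privateData;
        virDomainObjPtr vm;
        qemuDomainObjPrivatePtr priv;
        int ret = -1;

        /* Look up the domain with the driver locked, then drop the
         * driver lock as quickly as possible */
        qemuDriverLock(driver);
        vm = virDomainFindByUUID(driver->domains, dom->uuid);
        qemuDriverUnlock(driver);

        if (!vm) {
            /* ...report "no domain with matching uuid" here... */
            return -1;
        }
        priv = vm->privateData;

        /* Step 1: serialize against other state-changing calls */
        if (qemuDomainObjBeginJob(vm) < 0)
            goto cleanup;

        /* ...check the VM is in a suitable state here... */

        /* Step 2: talk to the monitor with the vm lock dropped */
        qemuDomainObjEnterMonitor(vm);
        ret = qemuMonitorXXXX(priv->mon);   /* placeholder monitor call */
        qemuDomainObjExitMonitor(vm);

        /* Step 3: update state/config, vm lock held again */

        qemuDomainObjEndJob(vm);

    cleanup:
        virDomainObjUnlock(vm);
        return ret;
    }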

On Wed, Oct 28, 2009 at 05:49:15PM +0000, Daniel P. Berrange wrote:
Helper methods
--------------
To lock the driver
qemuDriverLock() - Acquires the driver lock
qemuDriverUnlock() - Releases the driver lock
To lock the virDomainObjPtr
virDomainObjLock() - Acquires the virDomainObjPtr lock
virDomainObjUnlock() - Releases the virDomainObjPtr lock
To acquire the job condition variable (int jobActive)
qemuDomainObjBeginJob() (if driver is unlocked)
  - Increments ref count on virDomainObjPtr
  - Waits on the qemuDomainObjPrivate condition, using the virDomainObjPtr mutex, while 'jobActive != 0'
  - Sets jobActive to 1

qemuDomainObjBeginJobWithDriver() (if driver needs to be locked)
  - Unlocks driver
  - Increments ref count on virDomainObjPtr
  - Waits on the qemuDomainObjPrivate condition, using the virDomainObjPtr mutex, while 'jobActive != 0'
  - Sets jobActive to 1
  - Unlocks virDomainObjPtr
  - Locks driver
  - Locks virDomainObjPtr

  NB: this variant is required in order to comply with lock ordering rules for virDomainObjPtr vs driver

qemuDomainObjEndJob()
  - Sets jobActive to 0
  - Signals on qemuDomainObjPrivate condition
  - Decrements ref count on virDomainObjPtr

To acquire the QEMU monitor lock

qemuDomainObjEnterMonitor()
  - Acquires the qemuMonitorPtr lock
  - Releases the virDomainObjPtr lock

qemuDomainObjExitMonitor()
  - Acquires the virDomainObjPtr lock
  - Releases the qemuMonitorPtr lock

  NB: caller must take care to drop the driver lock if necessary

The actual implementations of these methods I'm proposing are:

    typedef struct _qemuDomainObjPrivate qemuDomainObjPrivate;
    typedef qemuDomainObjPrivate *qemuDomainObjPrivatePtr;

    struct _qemuDomainObjPrivate {
        virCond jobCond;   /* Use in conjunction with main virDomainObjPtr lock */
        int jobActive;     /* Non-zero if a job is active. Only 1 job is allowed at any time.
                            * A job includes *all* monitor commands, even those just querying
                            * information, not merely actions */

        qemuMonitorPtr mon;
    };

    static void qemuDriverLock(struct qemud_driver *driver)
    {
        virMutexLock(&driver->lock);
    }
    static void qemuDriverUnlock(struct qemud_driver *driver)
    {
        virMutexUnlock(&driver->lock);
    }

    /*
     * obj must be locked before calling, qemud_driver must NOT be locked
     *
     * This must be called by anything that will change the VM state
     * in any way, or anything that will use the QEMU monitor.
     *
     * Upon successful return, the object will have its ref count increased,
     * successful calls must be followed by EndJob eventually
     */
    static int qemuDomainObjBeginJob(virDomainObjPtr obj)
    {
        qemuDomainObjPrivatePtr priv = obj->privateData;

        virDomainObjRef(obj);

        while (priv->jobActive) {
            if (virCondWait(&priv->jobCond, &obj->lock) < 0) {
                virDomainObjUnref(obj);
                return -1;
            }
        }
        priv->jobActive = 1;

        return 0;
    }

    /*
     * obj must be locked before calling, qemud_driver must be locked
     *
     * This must be called by anything that will change the VM state
     * in any way, or anything that will use the QEMU monitor.
     */
    static int qemuDomainObjBeginJobWithDriver(struct qemud_driver *driver,
                                               virDomainObjPtr obj)
    {
        qemuDomainObjPrivatePtr priv = obj->privateData;

        virDomainObjRef(obj);
        qemuDriverUnlock(driver);

        while (priv->jobActive) {
            if (virCondWait(&priv->jobCond, &obj->lock) < 0) {
                virDomainObjUnref(obj);   /* drop the reference taken above */
                return -1;
            }
        }
        priv->jobActive = 1;

        virDomainObjUnlock(obj);
        qemuDriverLock(driver);
        virDomainObjLock(obj);

        return 0;
    }

    /*
     * obj must be locked before calling, qemud_driver does not matter
     *
     * To be called after completing the work associated with the
     * earlier qemuDomainObjBeginJob() call
     */
    static void qemuDomainObjEndJob(virDomainObjPtr obj)
    {
        qemuDomainObjPrivatePtr priv = obj->privateData;

        priv->jobActive = 0;
        virCondSignal(&priv->jobCond);

        virDomainObjUnref(obj);
    }

    /*
     * obj must be locked before calling, qemud_driver does not matter
     *
     * To be called immediately before any QEMU monitor API call.
     * Must have already called qemuDomainObjBeginJob().
     *
     * To be followed with qemuDomainObjExitMonitor() once complete
     */
    static void qemuDomainObjEnterMonitor(virDomainObjPtr obj)
    {
        qemuDomainObjPrivatePtr priv = obj->privateData;

        qemuMonitorLock(priv->mon);
        virDomainObjUnlock(obj);
    }

    /*
     * obj must NOT be locked before calling, qemud_driver does not matter,
     * but if qemud_driver is to be locked, it should be locked before
     * this call, not after. This avoids deadlock.
     *
     * Should be paired with an earlier qemuDomainObjEnterMonitor() call
     */
    static void qemuDomainObjExitMonitor(virDomainObjPtr obj)
    {
        qemuDomainObjPrivatePtr priv = obj->privateData;

        qemuMonitorUnlock(priv->mon);
        virDomainObjLock(obj);
    }

Daniel
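One of the criteria in the first mail is that code which sleeps must be able to time out. The helpers above block on jobCond indefinitely; purely as an illustration of how a timeout would slot into the same pattern (this is not proposed code), a timed variant could look like the sketch below. It assumes a virCondWaitUntil() primitive taking an absolute deadline in milliseconds; qemuNowMs() and QEMU_JOB_WAIT_MS are invented placeholders.

    #define QEMU_JOB_WAIT_MS (30 * 1000)    /* arbitrary example deadline */

    static int qemuDomainObjBeginJobTimed(virDomainObjPtr obj)
    {
        qemuDomainObjPrivatePtr priv = obj->privateData;
        unsigned long long deadline = qemuNowMs() + QEMU_JOB_WAIT_MS;

        virDomainObjRef(obj);

        while (priv->jobActive) {
            /* Wait on the job condition as before, but give up once the
             * deadline passes instead of blocking this API call forever */
            if (virCondWaitUntil(&priv->jobCond, &obj->lock, deadline) < 0) {
                virDomainObjUnref(obj);
                return -1;    /* timed out or error; caller reports failure */
            }
        }
        priv->jobActive = 1;

        return 0;
    }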

On Wed, Oct 28, 2009 at 05:49:15PM +0000, Daniel P. Berrange wrote:
The current QEMU driver makes use of 2 locks
- The driver lock
- The virDomainObjPtr lock

The idea is that the driver lock is not held for long periods of time. Unfortunately we don't always deal with this very well - some code needs to do quite a lot with the driver - particularly starting and stopping of guests.

The bigger problem is that the virDomainObjPtr lock is often held for long periods, specifically whenever we invoke a monitor command. Some of these commands can take a very long time (even infinite if someone has sent SIGSTOP to QEMU). This very quickly blocks the whole driver.

I've realized that even with the series of monitor patches I sent out, changing the driver mutex to a RWLock, and adding a separate lock on the qemuMonitorPtr object itself, there's still a major concurrency problem: the virDomainObjPtr lock is held for too long. I propose to drop the RWLock patch, and do something totally different instead....
okay,
We fundamentally need to drop the virDomainObjPtr lock whenever we invoke a monitor command. Unfortunately, merely dropping the virDomainObjPtr and acquiring the qemuMonitorPtr is not safe.
An API call which changes the VM state typically has 3 phases
1. Check what state/config the VM is in
2. Invoke the monitor command
3. Update the state/config of the VM
If we release the virDomainObjPtr lock and acquire the qemuMonitorPtr lock at step 2, then other API calls will be able to complete their own step 1 checks, and get blocked at step 2. This is not safe, because when the original call moves on to step 3 and changes the state, this will have invalidated the checks the other sleeping API calls made in step 1.
We need to prevent any API call starting step 1, for as long as there is a monitor command being run, even if the lock on virDomainObjPtr is not held.
The only way I see to do this is to introduce a condition variable indicating that a state change is to be made. Any API call which intends to make a state change must acquire this condition prior to step 1. It can thus safely do its checks and move on to step 2, releasing the virDomainObjPtr lock while the monitor command is running, and reacquiring it after.
Sounds good: monitoring won't be blocked by state change operations, and since monitoring commands are the background load, especially in a monitored deployment, the scheme sounds fine.
All other API calls making changes get safely queued up at step 1, but API calls which simply wish to query information can run without being blocked at all. This fixes the major concurrency problem with running monitor commands. The use of a condition variable at the start of step 1 also allows us to time out API calls if some other thread gets stuck in the monitor for too long. I think this also makes the use of a RWLock on the QEMU driver unnecessary, since no code will ever be holding a mutex in any place that sleeps/waits. Only the condition variable will be held during sleeps/waits.
Since we'll now effectively have 3 locks and 1 condition variable, this is getting kind of complex. So the rest of this mail is a file I propose to put in src/qemu/THREADS.txt describing what is going on, and showing the recommended design patterns to use.
I have just one remark: this separation between APIs might be done one level up, i.e. at the library entry point level we should know what may induce a state change, and those calls could be flagged more formally. This may help other drivers where libvirt needs to keep the state instead of asking the hypervisor.
Daniel
QEMU Driver Threading: The Rules
=================================
This document describes how thread safety is ensured throughout the QEMU driver. The criteria for this model are:
- Objects must never be exclusively locked for any prolonged time
- Code which sleeps must be able to time out after a suitable period
- Must be safe against dispatch of asynchronous events from the monitor
Basic locking primitives
------------------------
There are a number of locks on various objects
* struct qemud_driver: RWLock
This is the top level lock on the entire driver. Every API call in the QEMU driver is blocked while this is held, though some internal callbacks may still run asynchronously. This lock must never be held for anything which sleeps/waits (ie monitor commands)
When obtaining the driver lock, under *NO* circumstances must any lock be held on a virDomainObjPtr. This *WILL* result in deadlock.
Any chance to enforce that at the code level? Since we have primitives for both, we could set a flag in the driver once the RW lock is taken, and the DomainObj locking/unlocking routine could raise an error if this happens.
* virDomainObjPtr: Mutex
Will be locked after calling any of the virDomainFindBy{ID,Name,UUID} methods.
Lock must be held when changing/reading any variable in the virDomainObjPtr
Once the lock is held, you must *NOT* try to lock the driver. You must release all virDomainObjPtr locks before locking the driver, or deadlock *WILL* occur.
If the lock needs to be dropped & then re-acquired for a short period of time, the reference count must be incremented first using virDomainObjRef(). If the reference count is incremented in this way, it is not necessary to have the driver locked when re-acquiring the dropped lock, since the reference count prevents it being freed by another thread.
This lock must not be held for anything which sleeps/waits (ie monitor commands).
* qemuMonitorPrivatePtr: Job condition
Since virDomainObjPtr lock must not be held during sleeps, the job condition provides additional protection for code making updates.
Immediately after acquiring the virDomainObjPtr lock, any method which intends to update state must acquire the job condition. The virDomainObjPtr lock is released while blocking on this condition variable. Once the job condition is acquired, a method can safely release the virDomainObjPtr lock whenever it hits a piece of code which may sleep/wait, and re-acquire it after the sleep/wait.
* qemuMonitorPtr: Mutex
Lock to be used when invoking any monitor command to ensure safety wrt any asynchronous events that may be dispatched from the monitor. It should be acquired before running a command.
The job condition *MUST* be held before acquiring the monitor lock
The virDomainObjPtr lock *MUST* be held before acquiring the monitor lock.
The virDomainObjPtr lock *MUST* then be released when invoking the monitor command.
The driver lock *MUST* be released when invoking the monitor commands.
This ensures that the virDomainObjPtr & driver are both unlocked while sleeping/waiting for the monitor response.
I had to read this twice and I'm not sure I managed to fully map mentally the full set of constraints.
Helper methods
--------------
To lock the driver
qemuDriverLock() - Acquires the driver lock
qemuDriverUnlock() - Releases the driver lock
To lock the virDomainObjPtr
virDomainObjLock() - Acquires the virDomainObjPtr lock
virDomainObjUnlock() - Releases the virDomainObjPtr lock
To acquire the job condition variable (int jobActive)
qemuDomainObjBeginJob() (if driver is unlocked)
  - Increments ref count on virDomainObjPtr
  - Waits on the qemuDomainObjPrivate condition, using the virDomainObjPtr mutex, while 'jobActive != 0'
  - Sets jobActive to 1

qemuDomainObjBeginJobWithDriver() (if driver needs to be locked)
  - Unlocks driver
  - Increments ref count on virDomainObjPtr
  - Waits on the qemuDomainObjPrivate condition, using the virDomainObjPtr mutex, while 'jobActive != 0'
  - Sets jobActive to 1
  - Unlocks virDomainObjPtr
  - Locks driver
  - Locks virDomainObjPtr

  NB: this variant is required in order to comply with lock ordering rules for virDomainObjPtr vs driver

qemuDomainObjEndJob()
  - Sets jobActive to 0
  - Signals on qemuDomainObjPrivate condition
  - Decrements ref count on virDomainObjPtr

To acquire the QEMU monitor lock

qemuDomainObjEnterMonitor()
  - Acquires the qemuMonitorPtr lock
  - Releases the virDomainObjPtr lock

qemuDomainObjExitMonitor()
  - Acquires the virDomainObjPtr lock
  - Releases the qemuMonitorPtr lock

  NB: caller must take care to drop the driver lock if necessary
It would be good if a maximum number of the constraints listed above could also be checked at runtime. Sure, we could try to make new checking rules like we did for previous locking checks, but it's hard for someone doing a patch to really run those. And I doubt the extra burden of checking a few conditions in locking routines would really impact performance. The only problem might be the availability of pointers at the locking routines (or wrappers) to get the information.
Design patterns
---------------
All driver methods must follow one of these design patterns to ensure thread safety and lock correctness.
* Accessing or updating something with just the driver

    qemuDriverLock(driver);

    ...do work...

    qemuDriverUnlock(driver);

* Accessing something directly to do with a virDomainObjPtr

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    ...do work...

    virDomainObjUnlock(obj);

* Accessing something directly to do with a virDomainObjPtr and driver

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);

    ...do work...

    virDomainObjUnlock(obj);
    qemuDriverUnlock(driver);

* Updating something directly to do with a virDomainObjPtr

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    qemuDomainObjBeginJob(obj);

    ...do work...

    qemuDomainObjEndJob(obj);

    virDomainObjUnlock(obj);

* Invoking a monitor command on a virDomainObjPtr

    virDomainObjPtr obj;
    qemuDomainObjPrivatePtr priv;

    qemuDriverLockRO(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    qemuDomainObjBeginJob(obj);

    ...do prep work...

    qemuDomainObjEnterMonitor(obj);
    qemuMonitorXXXX(priv->mon);
    qemuDomainObjExitMonitor(obj);

    ...do final work...

    qemuDomainObjEndJob(obj);
    virDomainObjUnlock(obj);

* Invoking a monitor command on a virDomainObjPtr with driver locked too

    virDomainObjPtr obj;
    qemuDomainObjPrivatePtr priv;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);

    qemuDomainObjBeginJobWithDriver(obj);

    ...do prep work...

    qemuDomainObjEnterMonitor(obj);
    qemuDriverUnlock(driver);
    qemuMonitorXXXX(priv->mon);
    qemuDriverLock(driver);
    qemuDomainObjExitMonitor(obj);

    ...do final work...

    qemuDomainObjEndJob(obj);
    virDomainObjUnlock(obj);
    qemuDriverUnlock(driver);
Summary
-------
* Respect lock ordering rules: never lock driver if anything else is already locked
* Don't hold locks in code which sleeps: unlock driver & virDomainObjPtr when using monitor
It's good to have all of this described. I'm still worried by the complexity level, especially for someone contributing small changes, and by the QEMU-specific nature of the guidelines. How much of this is generic, for example for other drivers doing read-only operations with a domain, etc.?

Daniel

On Thu, Oct 29, 2009 at 06:04:29PM +0100, Daniel Veillard wrote:
All other API calls making changes get safely queued up at step 1, but API calls which simply wish to query information can run without being blocked at all. This fixes the major concurrency problem with running monitor commands. The use of a condition variable at the start of step 1 also allows us to time out API calls if some other thread gets stuck in the monitor for too long. I think this also makes the use of a RWLock on the QEMU driver unnecessary, since no code will ever be holding a mutex in any place that sleeps/waits. Only the condition variable will be held during sleeps/waits.
Since we'll now effectively have 3 locks and 1 condition variable, this is getting kind of complex. So the rest of this mail is a file I propose to put in src/qemu/THREADS.txt describing what is going on, and showing the recommended design patterns to use.
I have just one remark: this separation between APIs might be done one level up, i.e. at the library entry point level we should know what may induce a state change, and those calls could be flagged more formally. This may help other drivers where libvirt needs to keep the state instead of asking the hypervisor.
It can't be done at the library entry level, since the locking needs to be done against objects that are private to the driver.
Basic locking primitives
------------------------
There are a number of locks on various objects
* struct qemud_driver: RWLock
Oops, this should have said 'Mutex' rather than RWLock.
This is the top level lock on the entire driver. Every API call in the QEMU driver is blocked while this is held, though some internal callbacks may still run asynchronously. This lock must never be held for anything which sleeps/waits (ie monitor commands)
When obtaining the driver lock, under *NO* circumstances must any lock be held on a virDomainObjPtr. This *WILL* result in deadlock.
Any chance to enforce that at the code level? Since we have primitives for both, we could set a flag in the driver once the RW lock is taken, and the DomainObj locking/unlocking routine could raise an error if this happens.
That is not possible to do safely. If you add a flag in the driver to indicate whether it is locked or not, then you need to add another mutex to protect reads/writes of that flag, otherwise you've got a clear race condition in checking it.
* qemuMonitorPtr: Mutex
Lock to be used when invoking any monitor command to ensure safety wrt any asynchronous events that may be dispatched from the monitor. It should be acquired before running a command.
The job condition *MUST* be held before acquiring the monitor lock
The virDomainObjPtr lock *MUST* be held before acquiring the monitor lock.
The virDomainObjPtr lock *MUST* then be released when invoking the monitor command.
The driver lock *MUST* be released when invoking the monitor commands.
This ensures that the virDomainObjPtr & driver are both unlocked while sleeping/waiting for the monitor response.
I had to read this twice and I'm not sure I managed to fully map mentally the full set of constraints.
Essentially there's a hierarchy of objects

  Driver -> virDomainObjPtr -> qemuMonitorPtr

You have to acquire the locks in that order, and once you've acquired the final qemuMonitorPtr lock, you must release the other locks before running the actual monitor command.
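Compressed into a single illustrative sequence (real code goes through the Begin/End job and Enter/Exit monitor helpers rather than touching the primitives directly), that ordering is:

    qemuDriverLock(driver);            /* 1. driver                          */
    virDomainObjLock(vm);              /* 2. virDomainObjPtr                 */
    ...acquire the job condition...    /*    (vm lock dropped while waiting)  */
    qemuMonitorLock(priv->mon);        /* 3. qemuMonitorPtr                  */
    virDomainObjUnlock(vm);            /* release the outer locks before     */
    qemuDriverUnlock(driver);          /* ... the slow monitor round trip    */
    qemuMonitorXXXX(priv->mon);        /* may sleep/wait on QEMU             */
    qemuDriverLock(driver);            /* reacquire top-down afterwards      */
    virDomainObjLock(vm);
    qemuMonitorUnlock(priv->mon);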
To acquire the job condition variable (int jobActive)
qemuDomainObjBeginJob() (if driver is unlocked)
  - Increments ref count on virDomainObjPtr
  - Waits on the qemuDomainObjPrivate condition, using the virDomainObjPtr mutex, while 'jobActive != 0'
  - Sets jobActive to 1

qemuDomainObjBeginJobWithDriver() (if driver needs to be locked)
  - Unlocks driver
  - Increments ref count on virDomainObjPtr
  - Waits on the qemuDomainObjPrivate condition, using the virDomainObjPtr mutex, while 'jobActive != 0'
  - Sets jobActive to 1
  - Unlocks virDomainObjPtr
  - Locks driver
  - Locks virDomainObjPtr

  NB: this variant is required in order to comply with lock ordering rules for virDomainObjPtr vs driver

qemuDomainObjEndJob()
  - Sets jobActive to 0
  - Signals on qemuDomainObjPrivate condition
  - Decrements ref count on virDomainObjPtr

To acquire the QEMU monitor lock

qemuDomainObjEnterMonitor()
  - Acquires the qemuMonitorPtr lock
  - Releases the virDomainObjPtr lock

qemuDomainObjExitMonitor()
  - Acquires the virDomainObjPtr lock
  - Releases the qemuMonitorPtr lock

  NB: caller must take care to drop the driver lock if necessary
It would be good if a maximum number of the constraints listed above could also be checked at runtime. Sure, we could try to make new checking rules like we did for previous locking checks, but it's hard for someone doing a patch to really run those. And I doubt the extra burden of checking a few conditions in locking routines would really impact performance. The only problem might be the availability of pointers at the locking routines (or wrappers) to get the information.
As before it is not possible to check those constraints safely at runtime without adding yet more locks. The idea of adding these methods qemuDomainObjBeginJob, qemuDomainObjEndJob, qemuDomainObjEnterMonitor and qemuDomainObjExitMonitor, is that they take the complexity out of the code. By defining the common code patterns, and making everything use these helpers instead of the locks themselves, we ensure that all code is compliant with the rules. It has taken that complex set of ordering rules and simplified it to one of the patterns shown below
Design patterns
---------------
All driver methods must follow one of these design patterns to ensure thread safety and lock correctness.
* Accessing or updating something with just the driver

    qemuDriverLock(driver);

    ...do work...

    qemuDriverUnlock(driver);

* Accessing something directly to do with a virDomainObjPtr

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    ...do work...

    virDomainObjUnlock(obj);

* Accessing something directly to do with a virDomainObjPtr and driver

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);

    ...do work...

    virDomainObjUnlock(obj);
    qemuDriverUnlock(driver);

* Updating something directly to do with a virDomainObjPtr

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    qemuDomainObjBeginJob(obj);

    ...do work...

    qemuDomainObjEndJob(obj);

    virDomainObjUnlock(obj);

* Invoking a monitor command on a virDomainObjPtr

    virDomainObjPtr obj;
    qemuDomainObjPrivatePtr priv;

    qemuDriverLockRO(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    qemuDomainObjBeginJob(obj);

    ...do prep work...

    qemuDomainObjEnterMonitor(obj);
    qemuMonitorXXXX(priv->mon);
    qemuDomainObjExitMonitor(obj);

    ...do final work...

    qemuDomainObjEndJob(obj);
    virDomainObjUnlock(obj);

* Invoking a monitor command on a virDomainObjPtr with driver locked too

    virDomainObjPtr obj;
    qemuDomainObjPrivatePtr priv;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);

    qemuDomainObjBeginJobWithDriver(obj);

    ...do prep work...

    qemuDomainObjEnterMonitor(obj);
    qemuDriverUnlock(driver);
    qemuMonitorXXXX(priv->mon);
    qemuDriverLock(driver);
    qemuDomainObjExitMonitor(obj);

    ...do final work...

    qemuDomainObjEndJob(obj);
    virDomainObjUnlock(obj);
    qemuDriverUnlock(driver);
Summary
-------
* Respect lock ordering rules: never lock driver if anything else is already locked
* Don't hold locks in code which sleeps: unlock driver & virDomainObjPtr when using monitor
It's good to have all of this described. I'm still worried by the complexity level, especially for someone contributing small changes, and by the QEMU-specific nature of the guidelines. How much of this is generic, for example for other drivers doing read-only operations with a domain, etc.?
The other drivers don't really have the equivalent of the QEMU monitor (well, the UML driver does have a very simple version, but we've not hooked that up). Their methods are all fairly fast to complete, so they don't suffer as badly from the concurrency bottlenecks that hit the QEMU driver.

Daniel

On Fri, Oct 30, 2009 at 11:05:03AM +0000, Daniel P. Berrange wrote:
On Thu, Oct 29, 2009 at 06:04:29PM +0100, Daniel Veillard wrote:
All other API calls making changes get safely queued up at step 1, but API calls which simply wish to query information can run without being blocked at all. This fixes the major concurrency problem with running monitor commands. The use of a condition variable at the start of step 1 also allows us to time out API calls if some other thread gets stuck in the monitor for too long. I think this also makes the use of a RWLock on the QEMU driver unnecessary, since no code will ever be holding a mutex in any place that sleeps/waits. Only the condition variable will be held during sleeps/waits.
Since we'll now effectively have 3 locks and 1 condition variable, this is getting kind of complex. So the rest of this mail is a file I propose to put in src/qemu/THREADS.txt describing what is going on, and showing the recommended design patterns to use.
I have just one remark: this separation between APIs might be done one level up, i.e. at the library entry point level we should know what may induce a state change, and those calls could be flagged more formally. This may help other drivers where libvirt needs to keep the state instead of asking the hypervisor.
It can't be done at the library entry level, since the locking needs to be done against objects that are private to the driver.
Hum, I'm not suggesting doing the locking one level up, but flagging in some way the entry points which may induce a state change.
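Purely to illustrate that idea (nothing like this exists today - the names and types below are invented), flagging entry points could be as simple as a table recording which public calls may change domain state, which state-caching drivers could then consult:

    enum virAPIStateEffect {
        VIR_API_READ_ONLY     = 0,   /* only queries state      */
        VIR_API_CHANGES_STATE = 1,   /* may modify domain state */
    };

    struct virAPIEntryPoint {
        const char *name;                /* public entry point name   */
        enum virAPIStateEffect effect;   /* how it affects the domain */
    };

    static const struct virAPIEntryPoint virAPIEntryPoints[] = {
        { "virDomainGetInfo", VIR_API_READ_ONLY },
        { "virDomainSuspend", VIR_API_CHANGES_STATE },
        /* ... */
    };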
Basic locking primitives
------------------------
There are a number of locks on various objects
* struct qemud_driver: RWLock
Oops, this should have said 'Mutex' rather than RWLock.
Ah right, since you said you dropped the idea I was a bit surprised...
This is the top level lock on the entire driver. Every API call in the QEMU driver is blocked while this is held, though some internal callbacks may still run asynchronously. This lock must never be held for anything which sleeps/waits (ie monitor commands)
When obtaining the driver lock, under *NO* circumstances must any lock be held on a virDomainObjPtr. This *WILL* result in deadlock.
Any chance to enforce that at the code level? Since we have primitives for both, we could set a flag in the driver once the RW lock is taken, and the DomainObj locking/unlocking routine could raise an error if this happens.
That is not possible to do safely. If you add a flag in the driver to indicate whether it is locked or not, then you need to add another mutex to protect reads/writes of that flag, otherwise you've got a clear race condition in checking it.
Well, you can't protect at 100%, but that state flag could be modified just after taking the lock and just before releasing it. It's not a protection against reentrancy, it's about detecting a problem at runtime. You may not be able to detect it for a few nanoseconds after having taken the lock or before releasing it, but that will allow better runtime error reporting in general than just a hung driver, which was the outcome when we introduced locking and chased locking problems. The goal is to be able to log (in general, but not guaranteed 100%) when we are doing something which may lead to a deadlock, report this in the log files and allow users to send them.
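A best-effort check along those lines could even avoid extra locks by keeping a thread-local count of virDomainObjPtr locks held and warning when the driver lock is requested while that count is non-zero. The sketch below is hypothetical debug-only instrumentation, not a proposal: the __thread counter, the wrapper names and the plain fprintf() logging are all invented for illustration.

    #ifdef ENABLE_DEBUG_LOCK_CHECKS
    /* Count how many virDomainObjPtr locks the current thread holds, and
     * complain if the driver lock is requested while any are held, which
     * would violate the documented lock ordering rules */
    static __thread int qemuDebugDomainLocksHeld;

    static void qemuDebugDomainObjLock(virDomainObjPtr obj)
    {
        virDomainObjLock(obj);
        qemuDebugDomainLocksHeld++;
    }

    static void qemuDebugDomainObjUnlock(virDomainObjPtr obj)
    {
        qemuDebugDomainLocksHeld--;
        virDomainObjUnlock(obj);
    }

    static void qemuDebugDriverLock(struct qemud_driver *driver)
    {
        if (qemuDebugDomainLocksHeld > 0)
            fprintf(stderr, "possible lock ordering violation: driver lock "
                    "requested with %d domain lock(s) held\n",
                    qemuDebugDomainLocksHeld);
        qemuDriverLock(driver);
    }
    #endif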
* qemuMonitorPtr: Mutex
Lock to be used when invoking any monitor command to ensure safety wrt any asynchronous events that may be dispatched from the monitor. It should be acquired before running a command.
The job condition *MUST* be held before acquiring the monitor lock
The virDomainObjPtr lock *MUST* be held before acquiring the monitor lock.
The virDomainObjPtr lock *MUST* then be released when invoking the monitor command.
The driver lock *MUST* be released when invoking the monitor commands.
This ensures that the virDomainObjPtr & driver are both unlocked while sleeping/waiting for the monitor response.
I had to read this twice and I'm not sure I managed to fully map mentally the full set of constraints.
Essentially there's a hierarchy of objects
Driver -> virDomainObjPtr -> qemuMonitorPtr
You have to acquire the locks in that order, and once you've acquired the final qemuMonitorPtr lock, you must release the other locks before running the actual monitor command.
okay
It would be good if a maximum number of the constraints listed above could also be checked at runtime. Sure, we could try to make new checking rules like we did for previous locking checks, but it's hard for someone doing a patch to really run those. And I doubt the extra burden of checking a few conditions in locking routines would really impact performance. The only problem might be the availability of pointers at the locking routines (or wrappers) to get the information.
As before it is not possible to check those constraints safely at runtime without adding yet more locks. The idea of adding these methods qemuDomainObjBeginJob, qemuDomainObjEndJob, qemuDomainObjEnterMonitor and qemuDomainObjExitMonitor, is that they take the complexity out of the code. By defining the common code patterns, and making everything use these helpers instead of the locks themselves, we ensure that all code is compliant with the rules. It has taken that complex set of ordering rules and simplified it to one of the patterns shown below
A good point, sure! But a small extra step would allow debugging this more easily, unless you're sure we really won't hit problems after the refactoring.

Daniel