[libvirt] QEMU driver thread safety rules

The current QEMU driver makes use of 2 locks

 - The driver lock
 - The virDomainObjPtr lock

The idea is that the driver lock is not held for long periods of time. Unfortunately we don't always deal with this very well - some code needs to do quite a lot with the driver - particularly starting and stopping of guests.

The bigger problem is that the virDomainObjPtr lock is often held for long periods, specifically whenever we invoke a monitor command. Some of these commands can take a very long time (even infinite if someone has sent SIGSTOP to QEMU). This very quickly blocks the whole driver.

I've realized that even with the series of monitor patches I sent out, changing the driver mutex to a RWLock, and adding a separate lock on the qemuMonitorPtr object itself, there's still a major concurrency problem: the virDomainObjPtr lock is held for too long. I propose to drop the RWLock patch, and do something totally different instead....

We fundamentally need to drop the virDomainObjPtr lock whenever we invoke a monitor command. Unfortunately, merely dropping the virDomainObjPtr lock and acquiring the qemuMonitorPtr lock is not safe.

An API call which changes the VM state typically has 3 phases

 1. Check what state/config the VM is in
 2. Invoke the monitor command
 3. Update the state/config of the VM

If we release the virDomainObjPtr lock and acquire the qemuMonitorPtr lock at step 2, then other API calls will be able to complete their own step 1 checks, and get blocked at step 2. This is not safe, because when the original call moves on to step 3 and changes the state, this will have invalidated the checks the other sleeping API calls made in step 1.

We need to prevent any API call starting step 1 for as long as there is a monitor command being run, even if the lock on virDomainObjPtr is not held.

The only way I see to do this is to introduce a condition variable indicating that a state change is to be made. Any API call which intends to make a state change must acquire this condition prior to step 1. It can thus safely do its checks and move on to step 2, releasing the virDomainObjPtr lock while the monitor command is running, and reacquiring it after.

All other API calls making changes get safely queued up at step 1, but API calls which simply wish to query information can run without being blocked at all. This fixes the major concurrency problem with running monitor commands. The use of a condition variable at the start of step 1 also allows us to time out API calls if some other thread gets stuck in the monitor for too long.

I think this also makes the use of a RWLock on the QEMU driver unnecessary, since no code will ever be holding a mutex in any place that sleeps/waits. Only the condition variable will be held during sleeps/waits.

Since we'll now effectively have 3 locks and 1 condition variable, this is getting kind of complex. So the rest of this mail is a file I propose to put in src/qemu/THREADS.txt describing what is going on, and showing the recommended design patterns to use.

Daniel


      QEMU Driver Threading: The Rules
      =================================

This document describes how thread safety is ensured throughout the QEMU driver.
The criteria for this model are:

 - Objects must never be exclusively locked for any prolonged time
 - Code which sleeps must be able to time out after a suitable period
 - Must be safe against dispatch of asynchronous events from the monitor


Basic locking primitives
------------------------

There are a number of locks on various objects

  * struct qemud_driver: RWLock

    This is the top level lock on the entire driver. Every API call in the
    QEMU driver is blocked while this is held, though some internal
    callbacks may still run asynchronously. This lock must never be held
    for anything which sleeps/waits (i.e. monitor commands).

    When obtaining the driver lock, under *NO* circumstances must any lock
    be held on a virDomainObjPtr. This *WILL* result in deadlock.

  * virDomainObjPtr: Mutex

    Will be locked after calling any of the virDomainFindBy{ID,Name,UUID}
    methods.

    Lock must be held when changing/reading any variable in the
    virDomainObjPtr.

    Once the lock is held, you must *NOT* try to lock the driver. You must
    release all virDomainObjPtr locks before locking the driver, or
    deadlock *WILL* occur.

    If the lock needs to be dropped & then re-acquired for a short period
    of time, the reference count must be incremented first using
    virDomainObjRef(). If the reference count is incremented in this way,
    it is not necessary to have the driver locked when re-acquiring the
    dropped lock, since the reference count prevents it being freed by
    another thread.

    This lock must not be held for anything which sleeps/waits (i.e.
    monitor commands).

  * qemuMonitorPrivatePtr: Job condition

    Since the virDomainObjPtr lock must not be held during sleeps, the job
    condition provides additional protection for code making updates.

    Immediately after acquiring the virDomainObjPtr lock, any method which
    intends to update state must acquire the job condition. The
    virDomainObjPtr lock is released while blocking on this condition
    variable. Once the job condition is acquired, a method can safely
    release the virDomainObjPtr lock whenever it hits a piece of code which
    may sleep/wait, and re-acquire it after the sleep/wait.

  * qemuMonitorPtr: Mutex

    Lock to be used when invoking any monitor command to ensure safety
    wrt any asynchronous events that may be dispatched from the monitor.
    It should be acquired before running a command.

    The job condition *MUST* be held before acquiring the monitor lock.

    The virDomainObjPtr lock *MUST* be held before acquiring the monitor
    lock.

    The virDomainObjPtr lock *MUST* then be released when invoking the
    monitor command.

    The driver lock *MUST* be released when invoking the monitor commands.

    This ensures that the virDomainObjPtr & driver are both unlocked while
    sleeping/waiting for the monitor response.

Helper methods
--------------

To lock the driver

  qemuDriverLock()
    - Acquires the driver lock

  qemuDriverUnlock()
    - Releases the driver lock

To lock the virDomainObjPtr

  virDomainObjLock()
    - Acquires the virDomainObjPtr lock

  virDomainObjUnlock()
    - Releases the virDomainObjPtr lock

To acquire the job condition variable (int jobActive)

  qemuDomainObjBeginJob() (if driver is unlocked)
    - Increments ref count on virDomainObjPtr
    - Waits on the qemuDomainObjPrivate condition, using the
      virDomainObjPtr mutex, while 'jobActive != 0'
    - Sets jobActive to 1

  qemuDomainObjBeginJobWithDriver() (if driver needs to be locked)
    - Unlocks driver
    - Increments ref count on virDomainObjPtr
    - Waits on the qemuDomainObjPrivate condition, using the
      virDomainObjPtr mutex, while 'jobActive != 0'
    - Sets jobActive to 1
    - Unlocks virDomainObjPtr
    - Locks driver
    - Locks virDomainObjPtr

    NB: this variant is required in order to comply with lock ordering
    rules for virDomainObjPtr vs driver

  qemuDomainObjEndJob()
    - Sets jobActive to 0
    - Signals on qemuDomainObjPrivate condition
    - Decrements ref count on virDomainObjPtr

To acquire the QEMU monitor lock

  qemuDomainObjEnterMonitor()
    - Acquires the qemuMonitorPtr lock
    - Releases the virDomainObjPtr lock

  qemuDomainObjExitMonitor()
    - Acquires the virDomainObjPtr lock
    - Releases the qemuMonitorPtr lock

  NB: caller must take care to drop the driver lock if necessary


Design patterns
---------------

All driver methods must follow one of these design patterns to ensure thread safety and lock correctness.

* Accessing or updating something with just the driver

    qemuDriverLock(driver);

    ...do work...

    qemuDriverUnlock(driver);

* Accessing something directly to do with a virDomainObjPtr

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    ...do work...

    virDomainObjUnlock(obj);

* Accessing something directly to do with a virDomainObjPtr and driver

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);

    ...do work...

    virDomainObjUnlock(obj);
    qemuDriverUnlock(driver);

* Updating something directly to do with a virDomainObjPtr

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    qemuDomainObjBeginJob(obj);

    ...do work...

    qemuDomainObjEndJob(obj);

    virDomainObjUnlock(obj);

* Invoking a monitor command on a virDomainObjPtr

    virDomainObjPtr obj;
    qemuDomainObjPrivatePtr priv;

    qemuDriverLockRO(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    qemuDomainObjBeginJob(obj);

    ...do prep work...

    qemuDomainObjEnterMonitor(obj);
    qemuMonitorXXXX(priv->mon);
    qemuDomainObjExitMonitor(obj);

    ...do final work...

    qemuDomainObjEndJob(obj);
    virDomainObjUnlock(obj);

* Invoking a monitor command on a virDomainObjPtr with driver locked too

    virDomainObjPtr obj;
    qemuDomainObjPrivatePtr priv;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);

    qemuDomainObjBeginJobWithDriver(obj);

    ...do prep work...

    qemuDomainObjEnterMonitor(obj);
    qemuDriverUnlock(driver);
    qemuMonitorXXXX(priv->mon);
    qemuDriverLock(driver);
    qemuDomainObjExitMonitor(obj);

    ...do final work...
    qemuDomainObjEndJob(obj);
    virDomainObjUnlock(obj);
    qemuDriverUnlock(driver);


Summary
-------

  * Respect lock ordering rules: never lock driver if anything else is already locked

  * Don't hold locks in code which sleeps: unlock driver & virDomainObjPtr when using monitor
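To make the patterns above concrete, here is a sketch of a complete driver method following the "Invoking a monitor command on a virDomainObjPtr" pattern. It is purely illustrative and not part of the proposal: qemudDomainDoSomething() and qemuMonitorXXXX() are placeholder names, error reporting is reduced to comments, and plain qemuDriverLock() is used for the lookup rather than the read-only variant mentioned above.

    static int qemudDomainDoSomething(virDomainPtr dom)
    {
        struct qemud_driver *driver = dom->conn->privateData;
        virDomainObjPtr vm;
        qemuDomainObjPrivatePtr priv;
        int ret = -1;

        /* Look up the domain with the driver locked, then drop the
         * driver lock as quickly as possible */
        qemuDriverLock(driver);
        vm = virDomainFindByUUID(driver->domains, dom->uuid);
        qemuDriverUnlock(driver);

        if (!vm) {
            /* ...report "no domain with matching uuid" here... */
            return -1;
        }
        priv = vm->privateData;

        /* Step 1: serialize against other state-changing calls */
        if (qemuDomainObjBeginJob(vm) < 0)
            goto cleanup;

        /* ...check the VM is in a suitable state here... */

        /* Step 2: talk to the monitor with the vm lock dropped */
        qemuDomainObjEnterMonitor(vm);
        ret = qemuMonitorXXXX(priv->mon);   /* placeholder monitor call */
        qemuDomainObjExitMonitor(vm);

        /* Step 3: update state/config, vm lock held again */

        qemuDomainObjEndJob(vm);

    cleanup:
        virDomainObjUnlock(vm);
        return ret;
    }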

On Wed, Oct 28, 2009 at 05:49:15PM +0000, Daniel P. Berrange wrote:
Helper methods
--------------
To lock the driver
qemuDriverLock() - Acquires the driver lock
qemuDriverUnlock() - Releases the driver lock
To lock the virDomainObjPtr
virDomainObjLock() - Acquires the virDomainObjPtr lock
virDomainObjUnlock() - Releases the virDomainObjPtr lock
To acquire the job condition variable (int jobActive)
qemuDomainObjBeginJob() (if driver is unlocked)
  - Increments ref count on virDomainObjPtr
  - Waits on the qemuDomainObjPrivate condition, using the virDomainObjPtr mutex, while 'jobActive != 0'
  - Sets jobActive to 1

qemuDomainObjBeginJobWithDriver() (if driver needs to be locked)
  - Unlocks driver
  - Increments ref count on virDomainObjPtr
  - Waits on the qemuDomainObjPrivate condition, using the virDomainObjPtr mutex, while 'jobActive != 0'
  - Sets jobActive to 1
  - Unlocks virDomainObjPtr
  - Locks driver
  - Locks virDomainObjPtr

  NB: this variant is required in order to comply with lock ordering rules for virDomainObjPtr vs driver

qemuDomainObjEndJob()
  - Sets jobActive to 0
  - Signals on qemuDomainObjPrivate condition
  - Decrements ref count on virDomainObjPtr

To acquire the QEMU monitor lock

qemuDomainObjEnterMonitor()
  - Acquires the qemuMonitorPtr lock
  - Releases the virDomainObjPtr lock

qemuDomainObjExitMonitor()
  - Acquires the virDomainObjPtr lock
  - Releases the qemuMonitorPtr lock

  NB: caller must take care to drop the driver lock if necessary

The actual implementations of these methods I'm proposing are:

    typedef struct _qemuDomainObjPrivate qemuDomainObjPrivate;
    typedef qemuDomainObjPrivate *qemuDomainObjPrivatePtr;

    struct _qemuDomainObjPrivate {
        virCond jobCond;   /* Use in conjunction with main virDomainObjPtr lock */
        int jobActive;     /* Non-zero if a job is active. Only 1 job is allowed at any time.
                            * A job includes *all* monitor commands, even those just querying
                            * information, not merely actions */

        qemuMonitorPtr mon;
    };

    static void qemuDriverLock(struct qemud_driver *driver)
    {
        virMutexLock(&driver->lock);
    }
    static void qemuDriverUnlock(struct qemud_driver *driver)
    {
        virMutexUnlock(&driver->lock);
    }

    /*
     * obj must be locked before calling, qemud_driver must NOT be locked
     *
     * This must be called by anything that will change the VM state
     * in any way, or anything that will use the QEMU monitor.
     *
     * Upon successful return, the object will have its ref count increased,
     * successful calls must be followed by EndJob eventually
     */
    static int qemuDomainObjBeginJob(virDomainObjPtr obj)
    {
        qemuDomainObjPrivatePtr priv = obj->privateData;

        virDomainObjRef(obj);

        while (priv->jobActive) {
            if (virCondWait(&priv->jobCond, &obj->lock) < 0) {
                virDomainObjUnref(obj);
                return -1;
            }
        }
        priv->jobActive = 1;

        return 0;
    }

    /*
     * obj must be locked before calling, qemud_driver must be locked
     *
     * This must be called by anything that will change the VM state
     * in any way, or anything that will use the QEMU monitor.
     */
    static int qemuDomainObjBeginJobWithDriver(struct qemud_driver *driver,
                                               virDomainObjPtr obj)
    {
        qemuDomainObjPrivatePtr priv = obj->privateData;

        virDomainObjRef(obj);
        qemuDriverUnlock(driver);

        while (priv->jobActive) {
            if (virCondWait(&priv->jobCond, &obj->lock) < 0) {
                virDomainObjUnref(obj);   /* drop the reference taken above */
                return -1;
            }
        }
        priv->jobActive = 1;

        virDomainObjUnlock(obj);
        qemuDriverLock(driver);
        virDomainObjLock(obj);

        return 0;
    }

    /*
     * obj must be locked before calling, qemud_driver does not matter
     *
     * To be called after completing the work associated with the
     * earlier qemuDomainObjBeginJob() call
     */
    static void qemuDomainObjEndJob(virDomainObjPtr obj)
    {
        qemuDomainObjPrivatePtr priv = obj->privateData;

        priv->jobActive = 0;
        virCondSignal(&priv->jobCond);

        virDomainObjUnref(obj);
    }

    /*
     * obj must be locked before calling, qemud_driver does not matter
     *
     * To be called immediately before any QEMU monitor API call.
     * Must have already called qemuDomainObjBeginJob().
     *
     * To be followed with qemuDomainObjExitMonitor() once complete
     */
    static void qemuDomainObjEnterMonitor(virDomainObjPtr obj)
    {
        qemuDomainObjPrivatePtr priv = obj->privateData;

        qemuMonitorLock(priv->mon);
        virDomainObjUnlock(obj);
    }

    /*
     * obj must NOT be locked before calling, qemud_driver does not matter,
     * but if qemud_driver is to be locked, it should be locked before
     * this call, not after. This avoids deadlock.
     *
     * Should be paired with an earlier qemuDomainObjEnterMonitor() call
     */
    static void qemuDomainObjExitMonitor(virDomainObjPtr obj)
    {
        qemuDomainObjPrivatePtr priv = obj->privateData;

        qemuMonitorUnlock(priv->mon);
        virDomainObjLock(obj);
    }

Daniel
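One of the criteria in the first mail is that code which sleeps must be able to time out. The helpers above block on jobCond indefinitely; purely as an illustration of how a timeout would slot into the same pattern (this is not proposed code), a timed variant could look like the sketch below. It assumes a virCondWaitUntil() primitive taking an absolute deadline in milliseconds; qemuNowMs() and QEMU_JOB_WAIT_MS are invented placeholders.

    #define QEMU_JOB_WAIT_MS (30 * 1000)    /* arbitrary example deadline */

    static int qemuDomainObjBeginJobTimed(virDomainObjPtr obj)
    {
        qemuDomainObjPrivatePtr priv = obj->privateData;
        unsigned long long deadline = qemuNowMs() + QEMU_JOB_WAIT_MS;

        virDomainObjRef(obj);

        while (priv->jobActive) {
            /* Wait on the job condition as before, but give up once the
             * deadline passes instead of blocking this API call forever */
            if (virCondWaitUntil(&priv->jobCond, &obj->lock, deadline) < 0) {
                virDomainObjUnref(obj);
                return -1;    /* timed out or error; caller reports failure */
            }
        }
        priv->jobActive = 1;

        return 0;
    }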

On Wed, Oct 28, 2009 at 05:49:15PM +0000, Daniel P. Berrange wrote:
The current QEMU driver makes use of 2 locks
- The driver lock
- The virDomainObjPtr lock

The idea is that the driver lock is not held for long periods of time. Unfortunately we don't always deal with this very well - some code needs to do quite a lot with the driver - particularly starting and stopping of guests.

The bigger problem is that the virDomainObjPtr lock is often held for long periods, specifically whenever we invoke a monitor command. Some of these commands can take a very long time (even infinite if someone has sent SIGSTOP to QEMU). This very quickly blocks the whole driver.

I've realized that even with the series of monitor patches I sent out, changing the driver mutex to a RWLock, and adding a separate lock on the qemuMonitorPtr object itself, there's still a major concurrency problem: the virDomainObjPtr lock is held for too long. I propose to drop the RWLock patch, and do something totally different instead....
okay,
We fundamentally need to drop the virDomainObjPtr lock whenever we invoke a monitor command. Unfortunately, merely dropping the virDomainObjPtr and acquiring the qemuMonitorPtr is not safe.
An API call which changes the VM state typically has 3 phases
1. Check what state/config the VM is in
2. Invoke the monitor command
3. Update the state/config of the VM
If we release the virDomainObjPtr lock and acquire the qemuMonitorPtr lock at step 2, then other API calls will be able to complete their own step 1 checks, and get blocked at step 2. This is not safe, because when the original call moves on to step 3 and changes the state, this will have invalidated the checks the other sleeping API calls made in step 1.
We need to prevent any API call starting step 1, for as long as there is a monitor command being run, even if the lock on virDomainObjPtr is not held.
The only way I see to do this is to introduce a condition variable indicating that a state change is to be made. Any API call which intends to make a state change must acquire this condition prior to step 1. It can thus safely do its checks and move on to step 2, releasing the virDomainObjPtr lock while the monitor command is running, and reacquiring it after.
Sounds good: monitoring won't be blocked by state change operations, and since monitoring commands are the background load, especially in a monitored deployment, the scheme sounds fine.
All other API calls making changes get safely queued up at step 1, but API calls which simply wish to query information can run without being blocked at all. This fixes the major concurrency problem with running monitor commands. The use of a condition variable at the start of step 1 also allows us to time out API calls if some other thread gets stuck in the monitor for too long. I think this also makes the use of a RWLock on the QEMU driver unnecessary, since no code will ever be holding a mutex in any place that sleeps/waits. Only the condition variable will be held during sleeps/waits.
Since we'll now effectively have 3 locks and 1 condition variable, this is getting kind of complex. So the rest of this mail is a file I propose to put in src/qemu/THREADS.txt describing what is going on, and showing the recommended design patterns to use.
I have just one remark: this separation between APIs might be done one level up, i.e. at the library entry point level we should know what may induce a state change, and those calls could be flagged more formally. This may help other drivers where libvirt needs to keep the state instead of asking the hypervisor.
Daniel
QEMU Driver Threading: The Rules
=================================
This document describes how thread safety is ensured throughout the QEMU driver. The criteria for this model are:
- Objects must never be exclusively locked for any prolonged time
- Code which sleeps must be able to time out after a suitable period
- Must be safe against dispatch of asynchronous events from the monitor
Basic locking primitives
------------------------
There are a number of locks on various objects
* struct qemud_driver: RWLock
This is the top level lock on the entire driver. Every API call in the QEMU driver is blocked while this is held, though some internal callbacks may still run asynchronously. This lock must never be held for anything which sleeps/waits (ie monitor commands)
When obtaining the driver lock, under *NO* circumstances must any lock be held on a virDomainObjPtr. This *WILL* result in deadlock.
Any chance to enforce that at the code level? Since we have primitives for both, we could set a flag in the driver once the RW lock is taken, and the DomainObj locking/unlocking routine could raise an error if this happens.
* virDomainObjPtr: Mutex
Will be locked after calling any of the virDomainFindBy{ID,Name,UUID} methods.
Lock must be held when changing/reading any variable in the virDomainObjPtr
Once the lock is held, you must *NOT* try to lock the driver. You must release all virDomainObjPtr locks before locking the driver, or deadlock *WILL* occur.
If the lock needs to be dropped & then re-acquired for a short period of time, the reference count must be incremented first using virDomainObjRef(). If the reference count is incremented in this way, it is not necessary to have the driver locked when re-acquiring the dropped lock, since the reference count prevents it being freed by another thread.
This lock must not be held for anything which sleeps/waits (ie monitor commands).
* qemuMonitorPrivatePtr: Job condition
Since virDomainObjPtr lock must not be held during sleeps, the job condition provides additional protection for code making updates.
Immediately after acquiring the virDomainObjPtr lock, any method which intends to update state must acquire the job condition. The virDomainObjPtr lock is released while blocking on this condition variable. Once the job condition is acquired, a method can safely release the virDomainObjPtr lock whenever it hits a piece of code which may sleep/wait, and re-acquire it after the sleep/wait.
* qemuMonitorPtr: Mutex
Lock to be used when invoking any monitor command to ensure safety wrt any asynchronous events that may be dispatched from the monitor. It should be acquired before running a command.
The job condition *MUST* be held before acquiring the monitor lock
The virDomainObjPtr lock *MUST* be held before acquiring the monitor lock.
The virDomainObjPtr lock *MUST* then be released when invoking the monitor command.
The driver lock *MUST* be released when invoking the monitor commands.
This ensures that the virDomainObjPtr & driver are both unlocked while sleeping/waiting for the monitor response.
I had to read this twice and I'm not sure I managed to fully map mentally the full set of constraints.
Helper methods
--------------
To lock the driver
qemuDriverLock() - Acquires the driver lock
qemuDriverUnlock() - Releases the driver lock
To lock the virDomainObjPtr
virDomainObjLock() - Acquires the virDomainObjPtr lock
virDomainObjUnlock() - Releases the virDomainObjPtr lock
To acquire the job condition variable (int jobActive)
qemuDomainObjBeginJob() (if driver is unlocked)
  - Increments ref count on virDomainObjPtr
  - Waits on the qemuDomainObjPrivate condition, using the virDomainObjPtr mutex, while 'jobActive != 0'
  - Sets jobActive to 1

qemuDomainObjBeginJobWithDriver() (if driver needs to be locked)
  - Unlocks driver
  - Increments ref count on virDomainObjPtr
  - Waits on the qemuDomainObjPrivate condition, using the virDomainObjPtr mutex, while 'jobActive != 0'
  - Sets jobActive to 1
  - Unlocks virDomainObjPtr
  - Locks driver
  - Locks virDomainObjPtr

  NB: this variant is required in order to comply with lock ordering rules for virDomainObjPtr vs driver

qemuDomainObjEndJob()
  - Sets jobActive to 0
  - Signals on qemuDomainObjPrivate condition
  - Decrements ref count on virDomainObjPtr

To acquire the QEMU monitor lock

qemuDomainObjEnterMonitor()
  - Acquires the qemuMonitorPtr lock
  - Releases the virDomainObjPtr lock

qemuDomainObjExitMonitor()
  - Acquires the virDomainObjPtr lock
  - Releases the qemuMonitorPtr lock

  NB: caller must take care to drop the driver lock if necessary
It would be good if a maximum number of the constraints listed above could also be checked at runtime. Sure, we could try to make new checking rules like we did for previous locking checks, but it's hard for someone doing a patch to really run those. And I doubt the extra burden of checking a few conditions in locking routines would really impact performance. The only problem might be the availability of pointers at the locking routines (or wrappers) to get the information.
Design patterns
---------------
All driver methods must follow one of these design patterns to ensure thread safety and lock correctness.
* Accessing or updating something with just the driver

    qemuDriverLock(driver);

    ...do work...

    qemuDriverUnlock(driver);

* Accessing something directly to do with a virDomainObjPtr

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    ...do work...

    virDomainObjUnlock(obj);

* Accessing something directly to do with a virDomainObjPtr and driver

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);

    ...do work...

    virDomainObjUnlock(obj);
    qemuDriverUnlock(driver);

* Updating something directly to do with a virDomainObjPtr

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    qemuDomainObjBeginJob(obj);

    ...do work...

    qemuDomainObjEndJob(obj);

    virDomainObjUnlock(obj);

* Invoking a monitor command on a virDomainObjPtr

    virDomainObjPtr obj;
    qemuDomainObjPrivatePtr priv;

    qemuDriverLockRO(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    qemuDomainObjBeginJob(obj);

    ...do prep work...

    qemuDomainObjEnterMonitor(obj);
    qemuMonitorXXXX(priv->mon);
    qemuDomainObjExitMonitor(obj);

    ...do final work...

    qemuDomainObjEndJob(obj);
    virDomainObjUnlock(obj);

* Invoking a monitor command on a virDomainObjPtr with driver locked too

    virDomainObjPtr obj;
    qemuDomainObjPrivatePtr priv;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);

    qemuDomainObjBeginJobWithDriver(obj);

    ...do prep work...

    qemuDomainObjEnterMonitor(obj);
    qemuDriverUnlock(driver);
    qemuMonitorXXXX(priv->mon);
    qemuDriverLock(driver);
    qemuDomainObjExitMonitor(obj);

    ...do final work...

    qemuDomainObjEndJob(obj);
    virDomainObjUnlock(obj);
    qemuDriverUnlock(driver);
Summary
-------
* Respect lock ordering rules: never lock driver if anything else is already locked
* Don't hold locks in code which sleeps: unlock driver & virDomainObjPtr when using monitor
It's good to have all of this described. I'm still worried by the complexity level, especially for someone contributing small changes, and by the QEMU-specific nature of the guidelines. How much of this is generic, for example for other drivers doing read-only operations with a domain, etc.?

Daniel

On Thu, Oct 29, 2009 at 06:04:29PM +0100, Daniel Veillard wrote:
All other API calls making changes get safely queued up at step 1, but API calls which simply wish to query information can run without being blocked at all. This fixes the major concurrency problem with running monitor commands. The use of a condition variable at the start of step 1 also allows us to time out API calls if some other thread gets stuck in the monitor for too long. I think this also makes the use of a RWLock on the QEMU driver unnecessary, since no code will ever be holding a mutex in any place that sleeps/waits. Only the condition variable will be held during sleeps/waits.
Since we'll now effectively have 3 locks and 1 condition variable, this is getting kind of complex. So the rest of this mail is a file I propose to put in src/qemu/THREADS.txt describing what is going on, and showing the recommended design patterns to use.
I have just one remark: this separation between APIs might be done one level up, i.e. at the library entry point level we should know what may induce a state change, and those calls could be flagged more formally. This may help other drivers where libvirt needs to keep the state instead of asking the hypervisor.
It can't be done at the library entry level, since the locking needs to be done against objects that are private to the driver.
Basic locking primitives
------------------------
There are a number of locks on various objects
* struct qemud_driver: RWLock
Oops, this should have said 'Mutex' rather than RWLock.
This is the top level lock on the entire driver. Every API call in the QEMU driver is blocked while this is held, though some internal callbacks may still run asynchronously. This lock must never be held for anything which sleeps/waits (ie monitor commands)
When obtaining the driver lock, under *NO* circumstances must any lock be held on a virDomainObjPtr. This *WILL* result in deadlock.
Any chance to enforce that at the code level? Since we have primitives for both, we could set a flag in the driver once the RW lock is taken, and the DomainObj locking/unlocking routine could raise an error if this happens.
That is not possible to do safely. If you add a flag in the driver to indicate whether it is locked or not, then you need to add another mutex to protect reads/writes of that flag, otherwise you've got a clear race condition in checking it.
* qemuMonitorPtr: Mutex
Lock to be used when invoking any monitor command to ensure safety wrt any asynchronous events that may be dispatched from the monitor. It should be acquired before running a command.
The job condition *MUST* be held before acquiring the monitor lock
The virDomainObjPtr lock *MUST* be held before acquiring the monitor lock.
The virDomainObjPtr lock *MUST* then be released when invoking the monitor command.
The driver lock *MUST* be released when invoking the monitor commands.
This ensures that the virDomainObjPtr & driver are both unlocked while sleeping/waiting for the monitor response.
I had to read this twice and I'm not sure I managed to fully map mentally the full set of constraints.
Essentially there's a hierarchy of objects

  Driver -> virDomainObjPtr -> qemuMonitorPtr

You have to acquire the locks in that order, and once you've acquired the final qemuMonitorPtr lock, you must release the other locks before running the actual monitor command.
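Compressed into a single illustrative sequence (real code goes through the Begin/End job and Enter/Exit monitor helpers rather than touching the primitives directly), that ordering is:

    qemuDriverLock(driver);            /* 1. driver                          */
    virDomainObjLock(vm);              /* 2. virDomainObjPtr                 */
    ...acquire the job condition...    /*    (vm lock dropped while waiting)  */
    qemuMonitorLock(priv->mon);        /* 3. qemuMonitorPtr                  */
    virDomainObjUnlock(vm);            /* release the outer locks before     */
    qemuDriverUnlock(driver);          /* ... the slow monitor round trip    */
    qemuMonitorXXXX(priv->mon);        /* may sleep/wait on QEMU             */
    qemuDriverLock(driver);            /* reacquire top-down afterwards      */
    virDomainObjLock(vm);
    qemuMonitorUnlock(priv->mon);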
To acquire the job condition variable (int jobActive)
qemuDomainObjBeginJob() (if driver is unlocked)
  - Increments ref count on virDomainObjPtr
  - Waits on the qemuDomainObjPrivate condition, using the virDomainObjPtr mutex, while 'jobActive != 0'
  - Sets jobActive to 1

qemuDomainObjBeginJobWithDriver() (if driver needs to be locked)
  - Unlocks driver
  - Increments ref count on virDomainObjPtr
  - Waits on the qemuDomainObjPrivate condition, using the virDomainObjPtr mutex, while 'jobActive != 0'
  - Sets jobActive to 1
  - Unlocks virDomainObjPtr
  - Locks driver
  - Locks virDomainObjPtr

  NB: this variant is required in order to comply with lock ordering rules for virDomainObjPtr vs driver

qemuDomainObjEndJob()
  - Sets jobActive to 0
  - Signals on qemuDomainObjPrivate condition
  - Decrements ref count on virDomainObjPtr

To acquire the QEMU monitor lock

qemuDomainObjEnterMonitor()
  - Acquires the qemuMonitorPtr lock
  - Releases the virDomainObjPtr lock

qemuDomainObjExitMonitor()
  - Acquires the virDomainObjPtr lock
  - Releases the qemuMonitorPtr lock

  NB: caller must take care to drop the driver lock if necessary
It would be good if a maximum number of the constraints listed above could also be checked at runtime. Sure, we could try to make new checking rules like we did for previous locking checks, but it's hard for someone doing a patch to really run those. And I doubt the extra burden of checking a few conditions in locking routines would really impact performance. The only problem might be the availability of pointers at the locking routines (or wrappers) to get the information.
As before it is not possible to check those constraints safely at runtime without adding yet more locks. The idea of adding these methods qemuDomainObjBeginJob, qemuDomainObjEndJob, qemuDomainObjEnterMonitor and qemuDomainObjExitMonitor, is that they take the complexity out of the code. By defining the common code patterns, and making everything use these helpers instead of the locks themselves, we ensure that all code is compliant with the rules. It has taken that complex set of ordering rules and simplified it to one of the patterns shown below
Design patterns
---------------
All driver methods must follow one of these design patterns to ensure thread safety and lock correctness.
* Accessing or updating something with just the driver

    qemuDriverLock(driver);

    ...do work...

    qemuDriverUnlock(driver);

* Accessing something directly to do with a virDomainObjPtr

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    ...do work...

    virDomainObjUnlock(obj);

* Accessing something directly to do with a virDomainObjPtr and driver

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);

    ...do work...

    virDomainObjUnlock(obj);
    qemuDriverUnlock(driver);

* Updating something directly to do with a virDomainObjPtr

    virDomainObjPtr obj;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    qemuDomainObjBeginJob(obj);

    ...do work...

    qemuDomainObjEndJob(obj);

    virDomainObjUnlock(obj);

* Invoking a monitor command on a virDomainObjPtr

    virDomainObjPtr obj;
    qemuDomainObjPrivatePtr priv;

    qemuDriverLockRO(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);
    qemuDriverUnlock(driver);

    qemuDomainObjBeginJob(obj);

    ...do prep work...

    qemuDomainObjEnterMonitor(obj);
    qemuMonitorXXXX(priv->mon);
    qemuDomainObjExitMonitor(obj);

    ...do final work...

    qemuDomainObjEndJob(obj);
    virDomainObjUnlock(obj);

* Invoking a monitor command on a virDomainObjPtr with driver locked too

    virDomainObjPtr obj;
    qemuDomainObjPrivatePtr priv;

    qemuDriverLock(driver);
    obj = virDomainFindByUUID(driver->domains, dom->uuid);

    qemuDomainObjBeginJobWithDriver(obj);

    ...do prep work...

    qemuDomainObjEnterMonitor(obj);
    qemuDriverUnlock(driver);
    qemuMonitorXXXX(priv->mon);
    qemuDriverLock(driver);
    qemuDomainObjExitMonitor(obj);

    ...do final work...

    qemuDomainObjEndJob(obj);
    virDomainObjUnlock(obj);
    qemuDriverUnlock(driver);
Summary
-------
* Respect lock ordering rules: never lock driver if anything else is already locked
* Don't hold locks in code which sleeps: unlock driver & virDomainObjPtr when using monitor
It's good to have all of this described. I'm still worried by the complexity level, especially for someone contributing small changes, and by the QEMU-specific nature of the guidelines. How much of this is generic, for example for other drivers doing read-only operations with a domain, etc.?
The other drivers don't really have the equivalent of the QEMU monitor (well, the UML driver does have a very simple version, but we've not hooked that up). Their methods are all fairly fast to complete, so they don't suffer as badly from the concurrency bottlenecks that hit the QEMU driver.

Daniel

On Fri, Oct 30, 2009 at 11:05:03AM +0000, Daniel P. Berrange wrote:
On Thu, Oct 29, 2009 at 06:04:29PM +0100, Daniel Veillard wrote:
All other API calls making changes get safely queued up at step 1, but API calls which simply wish to query information can run without being blocked at all. This fixes the major concurrency problem with running monitor commands. The use of a condition variable at the start of step 1 also allows us to time out API calls if some other thread gets stuck in the monitor for too long. I think this also makes the use of a RWLock on the QEMU driver unnecessary, since no code will ever be holding a mutex in any place that sleeps/waits. Only the condition variable will be held during sleeps/waits.
Since we'll now effectively have 3 locks and 1 condition variable, this is getting kind of complex. So the rest of this mail is a file I propose to put in src/qemu/THREADS.txt describing what is going on, and showing the recommended design patterns to use.
I have just one remark: this separation between APIs might be done one level up, i.e. at the library entry point level we should know what may induce a state change, and those calls could be flagged more formally. This may help other drivers where libvirt needs to keep the state instead of asking the hypervisor.
It can't be done at the library entry level, since the locking needs to be done against objects that are private to the driver.
Hum, I'm not suggesting doing the locking one level up, but flagging in some way the entry points which may induce a state change.
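Purely to illustrate that idea (nothing like this exists today - the names and types below are invented), flagging entry points could be as simple as a table recording which public calls may change domain state, which state-caching drivers could then consult:

    enum virAPIStateEffect {
        VIR_API_READ_ONLY     = 0,   /* only queries state      */
        VIR_API_CHANGES_STATE = 1,   /* may modify domain state */
    };

    struct virAPIEntryPoint {
        const char *name;                /* public entry point name   */
        enum virAPIStateEffect effect;   /* how it affects the domain */
    };

    static const struct virAPIEntryPoint virAPIEntryPoints[] = {
        { "virDomainGetInfo", VIR_API_READ_ONLY },
        { "virDomainSuspend", VIR_API_CHANGES_STATE },
        /* ... */
    };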
Basic locking primitives
------------------------
There are a number of locks on various objects
* struct qemud_driver: RWLock
Oops, this should have said 'Mutex' rather than RWLock.
Ah right, since you said you dropped the idea I was a bit surprised...
This is the top level lock on the entire driver. Every API call in the QEMU driver is blocked while this is held, though some internal callbacks may still run asynchronously. This lock must never be held for anything which sleeps/waits (ie monitor commands)
When obtaining the driver lock, under *NO* circumstances must any lock be held on a virDomainObjPtr. This *WILL* result in deadlock.
Any chance to enforce that at the code level? Since we have primitives for both, we could set a flag in the driver once the RW lock is taken, and the DomainObj locking/unlocking routine could raise an error if this happens.
That is not possible to do safely. If you add a flag in the driver to indicate whether it is locked or not, then you need to add another mutex to protect reads/writes of that flag, otherwise you've got a clear race condition in checking it.
Well, you can't protect at 100%, but that state flag could be modified just after taking the lock and just before releasing it. It's not a protection against reentrancy, it's about detecting a problem at runtime. You may not be able to detect it for a few nanoseconds after having taken the lock or before releasing it, but that will allow better runtime error reporting in general than just a hung driver, which was the outcome when we introduced locking and chased locking problems. The goal is to be able to log (in general, but not guaranteed 100%) when we are doing something which may lead to a deadlock, report this in the log files and allow users to send them.
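A best-effort check along those lines could even avoid extra locks by keeping a thread-local count of virDomainObjPtr locks held and warning when the driver lock is requested while that count is non-zero. The sketch below is hypothetical debug-only instrumentation, not a proposal: the __thread counter, the wrapper names and the plain fprintf() logging are all invented for illustration.

    #ifdef ENABLE_DEBUG_LOCK_CHECKS
    /* Count how many virDomainObjPtr locks the current thread holds, and
     * complain if the driver lock is requested while any are held, which
     * would violate the documented lock ordering rules */
    static __thread int qemuDebugDomainLocksHeld;

    static void qemuDebugDomainObjLock(virDomainObjPtr obj)
    {
        virDomainObjLock(obj);
        qemuDebugDomainLocksHeld++;
    }

    static void qemuDebugDomainObjUnlock(virDomainObjPtr obj)
    {
        qemuDebugDomainLocksHeld--;
        virDomainObjUnlock(obj);
    }

    static void qemuDebugDriverLock(struct qemud_driver *driver)
    {
        if (qemuDebugDomainLocksHeld > 0)
            fprintf(stderr, "possible lock ordering violation: driver lock "
                    "requested with %d domain lock(s) held\n",
                    qemuDebugDomainLocksHeld);
        qemuDriverLock(driver);
    }
    #endif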
* qemuMonitorPtr: Mutex
Lock to be used when invoking any monitor command to ensure safety wrt any asynchronous events that may be dispatched from the monitor. It should be acquired before running a command.
The job condition *MUST* be held before acquiring the monitor lock
The virDomainObjPtr lock *MUST* be held before acquiring the monitor lock.
The virDomainObjPtr lock *MUST* then be released when invoking the monitor command.
The driver lock *MUST* be released when invoking the monitor commands.
This ensures that the virDomainObjPtr & driver are both unlocked while sleeping/waiting for the monitor response.
I had to read this twice and I'm not sure I managed to fully map mentally the full set of constraints.
Essentially there's a hierarchy of objects
Driver -> virDomainObjPtr -> qemuMonitorPtr
You have to acquire the locks in that order, and once you've acquired the final qemuMonitorPtr lock, you must release the other locks before running the actual monitor command.
okay
It would be good if a maximum number of the constraints listed above could also be checked at runtime. Sure, we could try to make new checking rules like we did for previous locking checks, but it's hard for someone doing a patch to really run those. And I doubt the extra burden of checking a few conditions in locking routines would really impact performance. The only problem might be the availability of pointers at the locking routines (or wrappers) to get the information.
As before it is not possible to check those constraints safely at runtime without adding yet more locks. The idea of adding these methods qemuDomainObjBeginJob, qemuDomainObjEndJob, qemuDomainObjEnterMonitor and qemuDomainObjExitMonitor, is that they take the complexity out of the code. By defining the common code patterns, and making everything use these helpers instead of the locks themselves, we ensure that all code is compliant with the rules. It has taken that complex set of ordering rules and simplified it to one of the patterns shown below
A good point, sure! But a small extra step would allow debugging this more easily, unless you're sure we really won't hit problems after the refactoring.

Daniel