Re: [libvirt] QEMU driver thread safety rules

Thursday, 29 October 2009

On Wed, Oct 28, 2009 at 05:49:15PM +0000, Daniel P. Berrange wrote:
...
 The current QEMU driver makes use of 2 locks

  - The driver lock
  - The virDomainObjPtr lock

 The idea is the driver lock is not held for long periods of time.
 Unfortunately we don't always deal with this very well - some
 code needs to do quite a lot with the driver - particularly starting
 and stopping of guests.

 The bigger problem is that the virDomainObjPtr lock is often held
 for long periods, specifically whenever we invoke a monitor command.
 Some of these commands can take a very long time (even infinite if
 someone has sent SIGSTOP to QEMU). This very quickly blocks the
 whole driver.

 I've realized that even with the series of monitor patches I sent
 out, changing the driver mutex to a RWLock, and adding a separate
 lock on the qemuMonitorPtr object itself, there's still a major
 concurrency problem: the virDomainObjPtr lock is held for too
 long.  I propose to drop the RWLock patch, and do something totally
 different instead.... 
  okay,

...

 We fundamentally need to drop the virDomainObjPtr lock whenever 
 we invoke a monitor command. Unfortunately, merely dropping the
 virDomainObjPtr and acquiring the qemuMonitorPtr is not safe.

 An API call which changes the VM state typically has 3 phases

   1. Check what state/config the VM is in
   2. Invoke the monitor command
   3. Update the state/config of the VM

 If we release the virDomainObjPtr, and acquire qemuMonitorPtr
 at step 2, then other API calls will be able to complete
 their own step 1 checks, and get blocked at step 2. This is
 not safe, because when the original call moves onto step 3
 and changes the state, this will have invalidated the checks
 the other sleeping API calls made in step 1.

 We need to prevent any API call starting step 1, for as long
 as there is a monitor command being run, even if the lock on
 virDomainObjPtr is not held.

 The only way I see to do this is to introduce a condition
 variable indicating that a state change is to be made. Any
 API call which intends to make a state change must acquire 
 this condition prior to step 1. They can thus safely do their
 checks, and move onto step 2, releasing the virDomainObj lock
 while the monitor command is running, and reacquiring it after.
  Sounds good: monitoring won't be blocked by state change operations,
and since monitoring calls are the background load of commands, especially
in a monitored situation, the scheme sounds fine.

...
 All other API calls making changes get safely queued up at
 step 1, but  API calls which simply wish to query information
 can run without being blocked at all. This fixes the major
 concurrency problem with running monitor commands. The use
 of a condition variable at the start of step 1, also allows
 us to time out API calls, if some other thread gets stuck in
 the monitor for too long.  I think this also makes the use of
 a RWLock on the QEMU driver unnecessary, since no code will
 ever be holding a mutex in any place that sleeps/waits. Only
 the condition variable will be held during sleeps/waits.

 Since we'll now effectively have 3 locks, and 1 condition
 variable this is getting kind of complex. So the rest of this
 mail is a file I propose to put in src/qemu/THREADS.txt
 describing what is going on, and showing the recommended 
 design patterns to use. 
  I have just one remark, this separation between APIs might
be done one level up, i.e. at the library entry point level
we should know what may induce a state change and those could
be flagged more formally. This may help other drivers where
libvirt needs to keep the state instead of asking the hypervisor.

...
 Daniel

    QEMU Driver  Threading: The Rules
    =================================

 This document describes how thread safety is ensured throughout
 the QEMU driver. The criteria for this model are:

  - Objects must never be exclusively locked for any prolonged time
  - Code which sleeps must be able to time out after a suitable period
  - Must be safe against dispatch of asynchronous events from the monitor

 Basic locking primitives
 ------------------------

 There are a number of locks on various objects

   * struct qemud_driver: RWLock

     This is the top level lock on the entire driver. Every API call in
     the QEMU driver is blocked while this is held, though some internal
     callbacks may still run asynchronously. This lock must never be held
     for anything which sleeps/waits (ie monitor commands)

     When obtaining the driver lock, under *NO* circumstances must
     any lock be held on a virDomainObjPtr. This *WILL* result in
     deadlock. 
  Any chance to enforce that at the code level? Since we have
primitives for both, we could set a flag in the driver once the RW lock
is taken, and the DomainObj locking/unlocking routines could raise an
error if this happens.
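
A minimal sketch of such a runtime check, assuming a per-thread counter of held domain locks rather than a flag in the driver. Since the rule quoted above forbids taking the driver lock while any virDomainObjPtr lock is held, the check sits in the driver lock routine. All demo_* names are hypothetical stand-ins, not libvirt API.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the real driver and domain objects. */
typedef struct { pthread_mutex_t lock; } demo_driver;
typedef struct { pthread_mutex_t lock; } demo_domain;

/* Number of domain object locks held by the calling thread (C11 TLS). */
static _Thread_local int domain_locks_held;

/* Returns 0 on success, -1 if the lock ordering rule would be violated. */
static int demo_driver_lock(demo_driver *drv)
{
    if (domain_locks_held > 0)
        return -1;              /* a domain lock is held: refuse, do not block */
    pthread_mutex_lock(&drv->lock);
    return 0;
}

static void demo_driver_unlock(demo_driver *drv)
{
    pthread_mutex_unlock(&drv->lock);
}

static void demo_domain_lock(demo_domain *dom)
{
    pthread_mutex_lock(&dom->lock);
    domain_locks_held++;        /* record the lock for later ordering checks */
}

static void demo_domain_unlock(demo_domain *dom)
{
    assert(domain_locks_held > 0);
    domain_locks_held--;
    pthread_mutex_unlock(&dom->lock);
}
```

Returning an error instead of blocking turns a potential deadlock into a failure that a patch author would notice on the first test run. The cost is one thread-local integer update per domain lock operation.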

...

   * virDomainObjPtr:  Mutex

     Will be locked after calling any of the virDomainFindBy{ID,Name,UUID}
     methods.

     Lock must be held when changing/reading any variable in the virDomainObjPtr

     Once the lock is held, you must *NOT* try to lock the driver. You must
     release all virDomainObjPtr locks before locking the driver, or deadlock
     *WILL* occur.

     If the lock needs to be dropped & then re-acquired for a short period of
     time, the reference count must be incremented first using virDomainObjRef().
     If the reference count is incremented in this way, it is not necessary
     to have the driver locked when re-acquiring the dropped lock, since the
     reference count prevents it being freed by another thread.

     This lock must not be held for anything which sleeps/waits (ie monitor
     commands).
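
The drop-and-reacquire rule above can be sketched as follows. This is only an illustration under assumptions: demo_domain, demo_domain_ref, demo_domain_unref and demo_slow_op are hypothetical names, and the real virDomainObjRef() may differ in detail.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a domain object. */
typedef struct {
    pthread_mutex_t lock;
    int refs;                   /* protected by lock */
} demo_domain;

/* Caller holds dom->lock. */
static void demo_domain_ref(demo_domain *dom)
{
    dom->refs++;
}

/* Caller holds dom->lock. Returns the remaining reference count;
 * real code would free the object when this reaches zero. */
static int demo_domain_unref(demo_domain *dom)
{
    return --dom->refs;
}

/* Placeholder for an operation that may sleep or wait. */
static int demo_slow_op(void)
{
    return 42;
}

/* Caller holds dom->lock on entry and on return. The extra reference
 * keeps the object alive while the lock is dropped, so the driver lock
 * is not needed to re-acquire it. */
static int demo_with_lock_dropped(demo_domain *dom, int (*slow)(void))
{
    int ret;

    demo_domain_ref(dom);
    pthread_mutex_unlock(&dom->lock);
    ret = slow();               /* no locks held while this runs */
    pthread_mutex_lock(&dom->lock);
    (void)demo_domain_unref(dom);
    return ret;
}
```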

   * qemuMonitorPrivatePtr: Job condition

     Since virDomainObjPtr lock must not be held during sleeps, the job condition
     provides additional protection for code making updates.

     Immediately after acquiring the virDomainObjPtr lock, any method which intends
     to update state, must acquire the job condition. The virDomainObjPtr lock
     is released while blocking on this condition variable. Once the job condition
     is acquired a method can safely release the virDomainObjPtr lock whenever it
     hits a piece of code which may sleep/wait, and re-acquire it after the sleep/
     wait.

   * qemuMonitorPtr:  Mutex

     Lock to be used when invoking any monitor command to ensure safety
     wrt any asynchronous events that may be dispatched from the monitor.
     It should be acquired before running a command.

     The job condition *MUST* be held before acquiring the monitor lock

     The virDomainObjPtr lock *MUST* be held before acquiring the monitor
     lock.

     The virDomainObjPtr lock *MUST* then be released when invoking the
     monitor command.

     The driver lock *MUST* be released when invoking the monitor commands.

     This ensures that the virDomainObjPtr & driver are both unlocked while
     sleeping/waiting for the monitor response. 

  I had to read this twice and I'm not sure I managed to fully map
mentally the full set of constraints. 

...

 Helper methods
 --------------

 To lock the driver

   qemuDriverLock()
     - Acquires the driver lock

   qemuDriverUnlock()
     - Releases the driver lock

 To lock the virDomainObjPtr

   virDomainObjLock()
     - Acquires the virDomainObjPtr lock

   virDomainObjUnlock()
     - Releases the virDomainObjPtr lock

 To acquire the job condition variable (int jobActive)

   qemuDomainObjBeginJob()           (if driver is unlocked)
     - Increments ref count on virDomainObjPtr
     - Wait qemuDomainObjPrivate condition 'jobActive != 0' using virDomainObjPtr mutex
     - Sets jobActive to 1

   qemuDomainObjBeginJobWithDriver() (if driver needs to be locked)
     - Unlocks driver
     - Increments ref count on virDomainObjPtr
     - Wait qemuDomainObjPrivate condition 'jobActive != 0' using virDomainObjPtr mutex
     - Sets jobActive to 1
     - Unlocks virDomainObjPtr
     - Locks driver
     - Locks virDomainObjPtr

    NB: this variant is required in order to comply with lock ordering rules
    for virDomainObjPtr vs driver

   qemuDomainObjEndJob()
     - Set jobActive to 0
     - Signal on qemuDomainObjPrivate condition
     - Decrements ref count on virDomainObjPtr
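
A rough sketch of the begin/end job steps listed above, using POSIX condition variables with a timeout so that a caller can give up if another thread is stuck in the monitor. The demo_* names, the struct layout and the timeout parameter are assumptions for illustration, not the proposed implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for a domain object plus its private job state. */
typedef struct {
    pthread_mutex_t lock;       /* plays the role of the virDomainObjPtr mutex */
    pthread_cond_t job_cond;    /* signalled when a job ends */
    int job_active;             /* 1 while a state-changing job is running */
    int refs;
} demo_domain;

/* Caller holds dom->lock. Returns 0 once the job is acquired,
 * -1 if the wait timed out or failed. */
static int demo_begin_job(demo_domain *dom, int timeout_sec)
{
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    dom->refs++;
    while (dom->job_active) {
        /* Releases dom->lock while blocked, re-acquires it before returning. */
        int rc = pthread_cond_timedwait(&dom->job_cond, &dom->lock, &deadline);
        if (rc != 0 && dom->job_active) {
            dom->refs--;        /* undo the reference taken above */
            return -1;
        }
    }
    dom->job_active = 1;
    return 0;
}

/* Caller holds dom->lock and owns the job. */
static void demo_end_job(demo_domain *dom)
{
    dom->job_active = 0;
    pthread_cond_signal(&dom->job_cond);
    dom->refs--;
}
```

The while loop re-checks job_active after every wakeup, which guards against spurious wakeups and against another waiter grabbing the job first.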

 To acquire the QEMU monitor lock

   qemuDomainObjEnterMonitor()
     - Acquires the qemuMonitorObjPtr lock
     - Releases the virDomainObjPtr lock

   qemuDomainObjExitMonitor()
     - Acquires the virDomainObjPtr lock
     - Releases the qemuMonitorObjPtr lock

   NB: caller must take care to drop the driver lock if necessary
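
The enter/exit monitor hand-over described above might look roughly like this. It follows the acquire and release order exactly as listed; the demo_* types and the fake monitor command are hypothetical, and the driver lock is left out.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the monitor and domain objects. */
typedef struct {
    pthread_mutex_t lock;
    int commands_run;
} demo_monitor;

typedef struct {
    pthread_mutex_t lock;
    int job_active;             /* set by the begin-job step */
    demo_monitor *mon;
} demo_domain;

/* Caller holds dom->lock and the job. On return only mon->lock is held. */
static void demo_enter_monitor(demo_domain *dom)
{
    assert(dom->job_active);    /* the job must be held before the monitor lock */
    pthread_mutex_lock(&dom->mon->lock);
    pthread_mutex_unlock(&dom->lock);
}

/* Caller holds mon->lock. On return only dom->lock is held. */
static void demo_exit_monitor(demo_domain *dom)
{
    pthread_mutex_lock(&dom->lock);
    pthread_mutex_unlock(&dom->mon->lock);
}

/* Placeholder for a monitor command that may block for a long time. */
static int demo_monitor_command(demo_monitor *mon)
{
    return ++mon->commands_run;
}
```

While the command runs, the domain lock is free, so query-only API calls on the same domain are not blocked; other state-changing calls still queue on the job condition.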

  It would be good if as many as possible of the constraints listed above
  could also be checked at runtime. Sure we could try to make new
  checking rules like we did for previous locking checks but it's hard
  for someone doing a patch to really run those. And I doubt the extra
  burden of checking a few conditions in locking routines would really
  impact performance. The only problem might be availability of
  pointers at the locking routines (or wrappers) to get the
  information.

...
 Design patterns
 ---------------

 All driver methods must follow one of these design patterns to
 ensure thread safety and lock correctness.

  * Accessing or updating something with just the driver

      qemuDriverLock(driver);

      ...do work...

      qemuDriverUnlock(driver);

  * Accessing something directly to do with a virDomainObjPtr

      virDomainObjPtr obj;

      qemuDriverLock(driver);
      obj = virDomainFindByUUID(driver->domains, dom->uuid);
      qemuDriverUnlock(driver);

      ...do work...

      virDomainObjUnlock(obj);

  * Accessing something directly to do with a virDomainObjPtr and driver

      virDomainObjPtr obj;

      qemuDriverLock(driver);
      obj = virDomainFindByUUID(driver->domains, dom->uuid);

      ...do work...

      virDomainObjUnlock(obj);
      qemuDriverUnlock(driver);

  * Updating something directly to do with a virDomainObjPtr

      virDomainObjPtr obj;

      qemuDriverLock(driver);
      obj = virDomainFindByUUID(driver->domains, dom->uuid);
      qemuDriverUnlock(driver);

      qemuDomainObjBeginJob(obj);

      ...do work...

      qemuDomainObjEndJob(obj);

      virDomainObjUnlock(obj);

  * Invoking a monitor command on a virDomainObjPtr

      virDomainObjPtr obj;
      qemuDomainObjPrivatePtr priv;

      qemuDriverLock(driver);
      obj = virDomainFindByUUID(driver->domains, dom->uuid);
      qemuDriverUnlock(driver);

      qemuDomainObjBeginJob(obj);

      ...do prep work...

      qemuDomainObjEnterMonitor(obj);
      qemuMonitorXXXX(priv->mon);
      qemuDomainObjExitMonitor(obj);

      ...do final work...

      qemuDomainObjEndJob(obj);
      virDomainObjUnlock(obj);

  * Invoking a monitor command on a virDomainObjPtr with driver locked too

      virDomainObjPtr obj;
      qemuDomainObjPrivatePtr priv;

      qemuDriverLock(driver);
      obj = virDomainFindByUUID(driver->domains, dom->uuid);

      qemuDomainObjBeginJobWithDriver(obj);

      ...do prep work...

      qemuDomainObjEnterMonitor(obj);
      qemuDriverUnlock(driver);
      qemuMonitorXXXX(priv->mon);
      qemuDriverLock(driver);
      qemuDomainObjExitMonitor(obj);

      ...do final work...

      qemuDomainObjEndJob(obj);
      virDomainObjUnlock(obj);
      qemuDriverUnlock(driver);

 Summary
 -------

   * Respect lock ordering rules: never lock driver if anything else is
     already locked

   * Don't hold locks in code which sleeps: unlock driver & virDomainObjPtr
     when using monitor 
  It's good to have all those described. I'm still worried by the
complexity level, especially for someone contributing small changes,
and by the QEMU-specific nature of the guidelines. How much of this
is generic, for example for other drivers doing read-only operations
with a domain, etc.?

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel(a)veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
