---
docs/internals-locking.html.in | 301 ++++++++++++++++++++++++++++++++++++++++
1 files changed, 301 insertions(+), 0 deletions(-)
create mode 100644 docs/internals-locking.html.in
diff --git a/docs/internals-locking.html.in b/docs/internals-locking.html.in
new file mode 100644
index 0000000..90054f0
--- /dev/null
+++ b/docs/internals-locking.html.in
@@ -0,0 +1,301 @@
+<html>
+ <body>
+ <h1>Resource Lock Manager</h1>
+
+ <ul id="toc"></ul>
+
+ <p>
+ This page describes the design of the resource lock manager
+ that is used for locking disk images with the QEMU driver.
+ </p>
+
+ <h2><a name="goals">Goals</a></h2>
+
+ <p>
+ The high level goal is to prevent the same disk image being
+ used by more than one QEMU instance at a time (unless the
+      disk is marked as shareable or readonly). The scenarios
+ to be prevented are thus:
+ </p>
+
+ <ol>
+ <li>
+        Two different running guests configured to point at the
+        same disk image.
+ </li>
+ <li>
+ One guest being started more than once on two different
+        machines due to an admin mistake.
+ </li>
+ <li>
+        One guest being started more than once on a single machine
+        due to a libvirt driver bug.
+ </li>
+ </ol>
+
+ <h2><a name="requirement">Requirements</a></h2>
+
+ <p>
+ The high level goal leads to a set of requirements
+      for the lock manager design:
+ </p>
+
+ <ol>
+ <li>
+ A lock must be held on a disk whenever a QEMU process
+        has the disk open.
+ </li>
+ <li>
+ The lock scheme must allow QEMU to be configured with
+        readonly, shared write, or exclusive write disks.
+ </li>
+ <li>
+ A lock must be held on a disk whenever libvirtd makes
+ changes to user/group ownership and SELinux labelling.
+ </li>
+ <li>
+        At least one lock manager implementation must allow use of
+        libvirtd on a single host without any administrator
+        configuration tasks.
+ </li>
+ <li>
+ A lock handover must be performed during the migration
+        process, where two QEMU processes will have the same disk
+ open concurrently.
+ </li>
+ <li>
+ The lock manager must be able to identify and kill the
+ process accessing the resource if the lock is revoked.
+ </li>
+ </ol>
+
+ <h2><a name="design">Design</a></h2>
+
+ <p>
+ The requirements call for a design with two distinct lockspaces:
+ </p>
+
+ <ol>
+ <li>
+ The <strong>primary lockspace</strong> is used to protect the content of
+ disk images. This will honour the disk sharing modes to
+        allow readonly/shared disks to be assigned to multiple
+ guests concurrently.
+ </li>
+ <li>
+ The <strong>secondary lockspace</strong> is used to protect the metadata
+ of disk images. This lock will be held whenever file
+ permissions / ownership / attributes are changed, and
+ is always exclusive, regardless of sharing mode. The
+ primary lock will be held prior to obtaining the secondary
+ lock.
+ </li>
+ </ol>
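+
+    <p>
+      For example, when libvirtd needs to change the ownership or
+      SELinux labelling of a disk image, the expected ordering is
+      roughly as follows (an illustrative sketch only, not a
+      prescribed sequence of API calls):
+    </p>
+
+    <pre>
+      acquire content (primary) lock on $path     ...held while the disk is in use
+      acquire metadata (secondary) lock on $path  ...always exclusive
+      ...change ownership / SELinux label...
+      release metadata (secondary) lock on $path
+    </pre>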
+
+ <p>
+ Within each lockspace the following operations will need to be
+      supported:
+ </p>
+
+ <ul>
+ <li>
+ <strong>Acquire object lock</strong>
+ Acquire locks on all resources initially
+ registered against an object
+ </li>
+ <li>
+ <strong>Release object lock</strong>
+ Release locks on all resources currently
+ registered against an object
+ </li>
+ <li>
+ <strong>Associate object lock</strong>
+ Associate the current process with an existing
+ set of locks for an object
+ </li>
+ <li>
+ <strong>Deassociate object lock</strong>
+        Deassociate the current process from an
+        existing set of locks for an object.
+ </li>
+ <li>
+ <strong>Register resource</strong>
+ Register an initial resource against an object
+ </li>
+ <li>
+ <strong>Get object lock state</strong>
+        Obtain a representation of the current object
+ lock state.
+ </li>
+ <li>
+ <strong>Acquire a resource lock</strong>
+ Register and acquire a lock for a resource
+ to be added to a locked object.
+ </li>
+ <li>
+ <strong>Release a resource lock</strong>
+        Deregister and release a lock for a resource
+        to be removed from a locked object.
+ </li>
+ </ul>
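+
+    <p>
+      As an illustrative mapping only, the pseudo code later in this
+      document exercises most of these operations through entry points
+      along the following lines ("get object lock state" has no
+      example below):
+    </p>
+
+    <pre>
+      Acquire object lock      -> virLockManagerAcquireObject(mgr)
+      Release object lock      -> virLockManagerReleaseObject(mgr)
+      Associate object lock    -> virLockManagerAttachObject(mgr, $pid)
+      Deassociate object lock  -> virLockManagerDetachObject(mgr, $pid)
+      Register resource        -> virLockManagerAddResource(mgr, $type, $path, $flags)
+      Acquire a resource lock  -> virLockManagerAcquireResource(mgr, $type, $path, $flags)
+      Release a resource lock  -> virLockManagerReleaseResource(mgr, $type, $path, $flags)
+    </pre>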
+
+ <h2><a name="impl">Plugin Implementations</a></h2>
+
+ <p>
+ Lock manager implementations are provided as LGPLv2+
+ licensed, dlopen()able library modules. A different
+ lock manager implementation may be used
+ for the primary and secondary lockspaces. With the
+ QEMU driver, these can be configured via the
+ <code>/etc/libvirt/qemu.conf</code> configuration
+ file by specifying the lock manager name.
+ </p>
+
+ <pre>
+ contentLockManager="fcntl"
+ metadataLockManager="fcntl"
+ </pre>
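+
+    <p>
+      The two lockspaces may also be given different implementations,
+      for example (the "mylock" plugin name here is purely
+      hypothetical):
+    </p>
+
+    <pre>
+      # "mylock" is a hypothetical plugin name, for illustration only
+      contentLockManager="mylock"
+      metadataLockManager="fcntl"
+    </pre>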
+
+ <p>
+      Lock manager implementations are free to support
+      both content and metadata locks; however, if the
+ plugin author is only able to handle one lockspace,
+ the other can be delegated to the standard fcntl
+ lock manager. The QEMU driver will load the lock
+      manager plugin binaries from the following location:
+ </p>
+
+ <pre>
+/usr/{lib,lib64}/libvirt/lock_manager/$NAME.so
+</pre>
+
+ <p>
+ The lock manager plugin must export a single ELF
+ symbol named <code>virLockDriverImpl</code>, which is
+ a static instance of the <code>virLockDriver</code>
+ struct. The struct is defined in the header file
+ </p>
+
+ <pre>
+      #include &lt;libvirt/plugins/lock_manager.h&gt;
+ </pre>
+
+ <p>
+ All callbacks in the struct must be initialized
+ to non-NULL pointers. The semantics of each
+ callback are defined in the API docs embedded
+ in the previously mentioned header file
+ </p>
+
+    <h2><a name="usagePatterns">Lock usage patterns</a></h2>
+
+ <p>
+      The following pseudo code illustrates the common
+ patterns of operations invoked on the lock
+ manager plugin callbacks.
+ </p>
+
+    <h3><a name="usageLockAcquire">Lock acquisition</a></h3>
+
+ <p>
+ Lock acquisition will always be performed from the
+ process that is to own the lock. This is typically
+ the QEMU child process, in between the fork+exec
+      pairing, but it may occasionally be held directly
+ by libvirtd.
+ </p>
+
+ <pre>
+ mgr = virLockManagerNew(lockPlugin,
+ VIR_LOCK_MANAGER_MODE_CONTENT,
+ VIR_LOCK_MANAGER_TYPE_DOMAIN);
+ virLockManagerSetParameter(mgr, "uuid", $uuid);
+ virLockManagerSetParameter(mgr, "name", $name);
+
+ foreach (initial disks)
+ virLockManagerAddResource(mgr,
+ VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK,
+ $path, $flags);
+
+ if (virLockManagerAcquireObject(mgr) < 0)
+ ...abort...
+ </pre>
+
+ <p>
+ The lock is implicitly released when the process
+      that acquired it exits; however, a process may
+      voluntarily give up the lock by running:
+ </p>
+
+ <pre>
+ virLockManagerReleaseObject(mgr);
+ </pre>
+
+    <h3><a name="usageLockAttach">Lock attachment</a></h3>
+
+ <p>
+      Any time a process needs to do work on behalf of
+ another process that holds a lock, it will associate
+ itself with the existing lock. This sequence is
+ identical to the previous one, except for the
+ last step.
+ </p>
+
+ <pre>
+ mgr = virLockManagerNew(contentLock,
+ VIR_LOCK_MANAGER_MODE_CONTENT,
+ VIR_LOCK_MANAGER_TYPE_DOMAIN);
+ virLockManagerSetParameter(mgr, "uuid", $uuid);
+ virLockManagerSetParameter(mgr, "name", $name);
+
+ foreach (current disks)
+ virLockManagerAddResource(mgr,
+ VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK,
+ $path, $flags);
+
+ if (virLockManagerAttachObject(mgr, $pid) < 0)
+ ...abort...
+ </pre>
+
+ <p>
+ A lock association will always be explicitly broken
+      by running:
+ </p>
+
+ <pre>
+ virLockManagerDetachObject(mgr, $pid);
+ </pre>
+
+    <h3><a name="usageLiveResourceChange">Live resource changes</a></h3>
+
+ <p>
+      When adding a resource to an existing locked object (e.g. to
+ hotplug a disk into a VM), the lock manager will first
+ attach to the locked object, acquire a lock on the
+ new resource, then detach from the locked object.
+ </p>
+
+ <pre>
+ ... initial glue ...
+ if (virLockManagerAttachObject(mgr, $pid) < 0)
+ ...abort...
+
+ if (virLockManagerAcquireResource(mgr,
+ VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK,
+ $path, $flags) < 0)
+ ...abort...
+
+      ...assign resource to object...
+
+ virLockManagerDetachObject(mgr, $pid)
+ </pre>
+
+ <p>
+ Removing a resource from an existing object is an identical
+ process, but with <code>virLockManagerReleaseResource</code>
+      invoked instead.
+ </p>
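+
+    <p>
+      For illustration, the removal sequence mirrors the pseudo code
+      above:
+    </p>
+
+    <pre>
+      ... initial glue ...
+      if (virLockManagerAttachObject(mgr, $pid) < 0)
+        ...abort...
+
+      if (virLockManagerReleaseResource(mgr,
+                                        VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK,
+                                        $path, $flags) < 0)
+        ...abort...
+
+      ...remove resource from object...
+
+      virLockManagerDetachObject(mgr, $pid)
+    </pre>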
+
+ </body>
+</html>
--
1.7.2.3