[libvirt] [PATCH 00/11] Integration of lock managers in QEMU driver (v2)

This is a second iteration of the patches from

  https://www.redhat.com/archives/libvir-list/2010-November/msg00975.html

Since the last posting quite a lot of dev & testing has been done. The main changes are:

 - There is now a <lease> element in the XML to allow locking against resources not otherwise in the XML
 - The 'sanlock' plugin implementation is now in-tree
 - The 'lock_driver.h' header is no longer publicly installed. All lock manager impls must be in-tree. This prevents closed source plugins, and more importantly allows full integration of error reporting
 - Improved virCommand handshake to allow error messages to be passed back to libvirtd

This version is now actually tested with sanlock and works for initial VM startup at least, successfully preventing QEMU being started more than once for a given config.

Still todo:

 - Merge this series with the huge RPC series
 - Wire up a fcntl based plugin
 - Integration with migration workflow

Some functionality run in virExec hooks may do I/O which can trigger SIGPIPE. Re-enable SIGPIPE blocking around the hook function

* src/util/util.c: Block SIGPIPE around hooks
---
 src/util/util.c |   24 +++++++++++++++++++++++-
 1 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/src/util/util.c b/src/util/util.c
index f412a83..78ac168 100644
--- a/src/util/util.c
+++ b/src/util/util.c
@@ -642,12 +642,34 @@ __virExec(const char *const*argv,
         }
     }
 
-    if (hook)
+    if (hook) {
+        /* virFork reset all signal handlers to the defaults.
+         * This is good for the child process, but our hook
+         * risks running something that generates SIGPIPE,
+         * so we need to temporarily block that again
+         */
+        struct sigaction waxon, waxoff;
+        waxoff.sa_handler = SIG_IGN;
+        waxoff.sa_flags = 0;
+        memset(&waxon, 0, sizeof(waxon));
+        if (sigaction(SIGPIPE, &waxoff, &waxon) < 0) {
+            virReportSystemError(errno, "%s",
+                                 _("Could not disable SIGPIPE"));
+            goto fork_error;
+        }
+
         if ((hook)(data) != 0) {
             VIR_DEBUG0("Hook function failed.");
             goto fork_error;
         }
 
+        if (sigaction(SIGPIPE, &waxon, NULL) < 0) {
+            virReportSystemError(errno, "%s",
+                                 _("Could not re-enable SIGPIPE"));
+            goto fork_error;
+        }
+    }
+
     /* The steps above may need todo something privileged, so
      * we delay clearing capabilities until the last minute */
     if ((flags & VIR_EXEC_CLEAR_CAPS) &&
-- 
1.7.3.4
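For illustration, the ignore/run/restore pattern the patch applies around the hook can be sketched standalone as below. This is only a sketch under stated assumptions: run_hook_ignoring_sigpipe() and write_to_broken_pipe() are hypothetical names, not libvirt functions, and unlike the patch the sketch zeroes the whole "ignore" struct and empties sa_mask before use.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical hook type with the same shape as a (void *) -> int callback. */
typedef int (*hook_fn)(void *opaque);

/* Run the hook with SIGPIPE ignored, then restore the previous disposition.
 * Returns the hook's result, or -1 if sigaction() fails. */
int run_hook_ignoring_sigpipe(hook_fn hook, void *opaque)
{
    struct sigaction ignore, saved;
    int res;

    memset(&ignore, 0, sizeof(ignore));
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);

    if (sigaction(SIGPIPE, &ignore, &saved) < 0)
        return -1;

    res = hook(opaque);

    if (sigaction(SIGPIPE, &saved, NULL) < 0)
        return -1;

    return res;
}

/* Hypothetical example hook: writes to a pipe whose read end is closed.
 * With SIGPIPE ignored the write() fails with EPIPE instead of killing
 * the process. Returns 0 if that is what happened. */
int write_to_broken_pipe(void *opaque)
{
    int fds[2];
    ssize_t n;
    int saved_errno;

    (void)opaque;
    if (pipe(fds) < 0)
        return -1;
    close(fds[0]);
    n = write(fds[1], "x", 1);
    saved_errno = errno;
    close(fds[1]);
    return (n < 0 && saved_errno == EPIPE) ? 0 : -1;
}
```

Calling run_hook_ignoring_sigpipe(write_to_broken_pipe, NULL) exercises the case the commit message describes: hook I/O that would otherwise raise SIGPIPE.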

On 01/24/2011 08:13 AM, Daniel P. Berrange wrote:
Some functionality run in virExec hooks may do I/O which can trigger SIGPIPE. Re-enable SIGPIPE blocking around the hook function
* src/util/util.c: Block SIGPIPE around hooks - if (hook) + if (hook) { + /* virFork reset all signal handlers to the defaults. + * This is good for the child process, but our hook + * risks running something that generates SIGPIPE, + * so we need to temporarily block that again + */ + struct sigaction waxon, waxoff;
Cute.
+ waxoff.sa_handler = SIG_IGN; + waxoff.sa_flags = 0; + memset(&waxon, 0, sizeof(waxon)); + if (sigaction(SIGPIPE, &waxoff, &waxon) < 0) { + virReportSystemError(errno, "%s", + _("Could not disable SIGPIPE"));
Yikes. We have a potential deadlock problem. See this bug report against GNU sort:

http://lists.gnu.org/archive/html/coreutils/2011-01/msg00085.html

In short, any program that mixes pthread_create with fork (and libvirt falls into that category) must obey this section of POSIX:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html

"If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called."

malloc() (which is called by virReportSystemError(), as well as by _()) is NOT async-signal-safe; therefore, it is quite possible that we fork() in one thread while another thread is in the middle of holding the malloc() mutex, and the child process will deadlock because it no longer has a secondary thread available to release the malloc() mutex.

Ultimately, we need to refactor and audit the code so that only async-signal-safe functions are allowed between fork() and exec(); which means that virExec needs to be taught how to hand all errors back to the parent over a secondary pipe for the parent to issue (rather than the child attempting to issue any errors on its own).

However, that problem is pre-existing; so your patch, while adding another instance of a violation, is not adding a regression, so:

Reluctant ACK.

-- 
Eric Blake eblake@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org

On Fri, Jan 28, 2011 at 09:32:42AM -0700, Eric Blake wrote:
On 01/24/2011 08:13 AM, Daniel P. Berrange wrote:
Some functionality run in virExec hooks may do I/O which can trigger SIGPIPE. Re-enable SIGPIPE blocking around the hook function
* src/util/util.c: Block SIGPIPE around hooks - if (hook) + if (hook) { + /* virFork reset all signal handlers to the defaults. + * This is good for the child process, but our hook + * risks running something that generates SIGPIPE, + * so we need to temporarily block that again + */ + struct sigaction waxon, waxoff;
Cute.
+ waxoff.sa_handler = SIG_IGN; + waxoff.sa_flags = 0; + memset(&waxon, 0, sizeof(waxon)); + if (sigaction(SIGPIPE, &waxoff, &waxon) < 0) { + virReportSystemError(errno, "%s", + _("Could not disable SIGPIPE"));
Yikes. We have a potential deadlock problem. See this bug report against GNU sort:
http://lists.gnu.org/archive/html/coreutils/2011-01/msg00085.html
In short, any program that mixes pthread_create with fork (and libvirt falls into that category) must obey this section of POSIX:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html
"If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called."
malloc() (which is called by virReportSystemError(), as well as by _()) is NOT async-signal-safe; therefore, it is quite possible that we fork() in one thread while another thread is in the middle of holding the malloc() mutex, and the child process will deadlock because it no longer has a secondary thread available to release the malloc() mutex.
Ultimately, we need to refactor and audit the code so that only async-signal-safe functions are allowed between fork() and exec(); which means that virExec needs to be taught how to hand all errors back to the parent over a secondary pipe for the parent to issue (rather than the child attempting to issue any errors on its own).
However, that problem is pre-existing; so your patch, while adding another instance of a violation, is not adding a regression, so:
Hmm, we have had this style of problem before, but with libvirt internal APIs, e.g. this is why we do virLogLock() and Unlock across the fork() call. It didn't occur to me that we could get hit at the POSIX level with this.

This is a colossal PITA because we do quite a lot of work in between fork+exec() in QEMU, including calling out to library APIs we don't control, to the extent that I doubt we can practically audit it, or easily address it :-(

Daniel
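To make the "hand errors back to the parent over a secondary pipe" idea concrete, here is a minimal standalone sketch. It is not virExec and not a proposed patch: spawn_report_errno() is a hypothetical name, and the sketch assumes the only thing worth reporting is the errno from a failed exec. Between fork() and exec the child calls only close(), fcntl(), execv(), write() and _exit(), all of which POSIX lists as async-signal-safe; the parent does the actual error reporting.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] and wait for it to finish.
 * Returns 0 if the exec succeeded, otherwise the errno value that
 * made the spawn fail (reported by the child over a pipe). */
int spawn_report_errno(char *const argv[])
{
    int errpipe[2];
    int child_errno = 0;
    int status;
    ssize_t n;
    pid_t pid;

    if (pipe(errpipe) < 0)
        return errno;

    pid = fork();
    if (pid < 0) {
        int saved = errno;
        close(errpipe[0]);
        close(errpipe[1]);
        return saved;
    }

    if (pid == 0) {
        /* Child: async-signal-safe calls only, no malloc, no logging. */
        int saved;
        close(errpipe[0]);
        /* Close-on-exec makes a successful exec look like EOF to the parent. */
        if (fcntl(errpipe[1], F_SETFD, FD_CLOEXEC) == 0)
            execv(argv[0], argv);
        saved = errno;
        if (write(errpipe[1], &saved, sizeof(saved)) != (ssize_t)sizeof(saved))
            _exit(126);
        _exit(127);
    }

    /* Parent: EOF means the exec happened, data means it failed. */
    close(errpipe[1]);
    do {
        n = read(errpipe[0], &child_errno, sizeof(child_errno));
    } while (n < 0 && errno == EINTR);
    close(errpipe[0]);

    while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
        ;

    if (n == 0)
        return 0;
    if (n == (ssize_t)sizeof(child_errno))
        return child_errno;
    return EIO;
}
```

The parent is then free to format and log the failure with whatever non-async-signal-safe machinery it likes, since it still has all of its threads.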

To ensure child processes will log all error messages, reset the logging filter function when forking * src/util/util.c: Reset log filter in fork --- src/util/util.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/src/util/util.c b/src/util/util.c index 78ac168..db2b04d 100644 --- a/src/util/util.c +++ b/src/util/util.c @@ -384,6 +384,7 @@ int virFork(pid_t *pid) { get sent to stderr where they stand a fighting chance of being seen / logged */ virSetErrorFunc(NULL, NULL); + virSetErrorLogPriorityFunc(NULL); /* Make sure any hook logging is sent to stderr, since child * process may close the logfile FDs */ -- 1.7.3.4

On 01/24/2011 08:13 AM, Daniel P. Berrange wrote:
To ensure child processes will log all error messages, reset the logging filter function when forking
* src/util/util.c: Reset log filter in fork --- src/util/util.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/src/util/util.c b/src/util/util.c index 78ac168..db2b04d 100644 --- a/src/util/util.c +++ b/src/util/util.c @@ -384,6 +384,7 @@ int virFork(pid_t *pid) { get sent to stderr where they stand a fighting chance of being seen / logged */ virSetErrorFunc(NULL, NULL); + virSetErrorLogPriorityFunc(NULL);
ACK. (modulo the bigger issue that we can't safely use malloc to log anything in the child in the first place, but that's pre-existing...) -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

When built as modules, the connection drivers live in $LIBDIR/libvirt/drivers. Now that we are adding lock manager drivers, we need to distinguish between the two kinds of module. So move the existing modules to 'connection-driver' * src/Makefile.am: Move module install dir * src/driver.c: Move module search dir --- src/Makefile.am | 2 +- src/driver.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/Makefile.am b/src/Makefile.am index f8b8434..2f94efd 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -29,7 +29,7 @@ endif lib_LTLIBRARIES = libvirt.la libvirt-qemu.la -moddir = $(libdir)/libvirt/drivers +moddir = $(libdir)/libvirt/connection-driver mod_LTLIBRARIES = confdir = $(sysconfdir)/libvirt diff --git a/src/driver.c b/src/driver.c index d83b1fd..6e4aa9f 100644 --- a/src/driver.c +++ b/src/driver.c @@ -30,7 +30,7 @@ #include "util.h" #include "configmake.h" -#define DEFAULT_DRIVER_DIR LIBDIR "/libvirt/drivers" +#define DEFAULT_DRIVER_DIR LIBDIR "/libvirt/connection-driver" /* Make sure ... INTERNAL_CALL can not be set by the caller */ verify((VIR_SECRET_GET_VALUE_INTERNAL_CALL & -- 1.7.3.4

On 01/24/2011 08:13 AM, Daniel P. Berrange wrote:
When built as modules, the connection drivers live in $LIBDIR/libvirt/drivers. Now that we are adding lock manager drivers, we need to distinguish between the two kinds of module. So move the existing modules to 'connection-driver'
* src/Makefile.am: Move module install dir * src/driver.c: Move module search dir --- src/Makefile.am | 2 +- src/driver.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
ACK. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

A lock manager may operate in various modes. The direct mode of operation is to obtain locks based on the resources associated with devices in the XML. The indirect mode is where the app creating the domain provides explicit leases for each resource that needs to be locked. This XML extension allows for listing resources in the XML <leases> <lease> <key>thequickbrownfoxjumpsoverthelazydog</key> <target path='/some/lease/path' offset='23432' length='256'/> </lease> </leases> * docs/schemas/domain.rng: Add lease schema * src/conf/domain_conf.c, src/conf/domain_conf.h: parsing and formatting for leases * tests/qemuxml2argvdata/qemuxml2argv-lease.args, tests/qemuxml2argvdata/qemuxml2argv-lease.xml, tests/qemuxml2xmltest.c: Test XML handling for leases --- docs/schemas/domain.rng | 32 ++++++ src/conf/domain_conf.c | 133 ++++++++++++++++++++++++ src/conf/domain_conf.h | 12 ++ tests/qemuxml2argvdata/qemuxml2argv-lease.args | 1 + tests/qemuxml2argvdata/qemuxml2argv-lease.xml | 37 +++++++ tests/qemuxml2xmltest.c | 1 + 6 files changed, 216 insertions(+), 0 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-lease.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-lease.xml diff --git a/docs/schemas/domain.rng b/docs/schemas/domain.rng index d4756e6..fda7a3f 100644 --- a/docs/schemas/domain.rng +++ b/docs/schemas/domain.rng @@ -39,6 +39,9 @@ <ref name="features"/> <ref name="termination"/> <optional> + <ref name="leases"/> + </optional> + <optional> <ref name="devices"/> </optional> <optional> @@ -550,6 +553,35 @@ <ref name="address"/> </optional> </define> + + <define name="leases"> + <element name="leases"> + <zeroOrMore> + <ref name="lease"/> + </zeroOrMore> + </element> + </define> + + <define name="lease"> + <element name="lease"> + <element name="key"> + <text/> + </element> + <element name="target"> + <attribute name="path"> + <text/> + </attribute> + <optional> + <attribute name="offset"> + <ref name="unsignedInt"/> + </attribute> + <attribute 
name="length"> + <ref name="unsignedInt"/> + </attribute> + </optional> + </element> + </element> + </define> <!-- A disk description can be either of type file or block The name of the attribute on the source element depends on the type diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 699fee7..c10bb8f 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -514,6 +514,17 @@ void virDomainInputDefFree(virDomainInputDefPtr def) VIR_FREE(def); } +static void virDomainLeaseDefFree(virDomainLeaseDefPtr def) +{ + if (!def) + return; + + VIR_FREE(def->key); + VIR_FREE(def->path); + + VIR_FREE(def); +} + void virDomainDiskDefFree(virDomainDiskDefPtr def) { unsigned int i; @@ -804,6 +815,10 @@ void virDomainDefFree(virDomainDefPtr def) if (!def) return; + for (i = 0 ; i < def->nleases ; i++) + virDomainLeaseDefFree(def->leases[i]); + VIR_FREE(def->leases); + for (i = 0 ; i < def->ngraphics ; i++) virDomainGraphicsDefFree(def->graphics[i]); VIR_FREE(def->graphics); @@ -1659,6 +1674,82 @@ virDomainDiskDefAssignAddress(virCapsPtr caps, virDomainDiskDefPtr def) return 0; } +/* Parse the XML definition for a lease + */ +static virDomainLeaseDefPtr +virDomainLeaseDefParseXML(xmlNodePtr node) +{ + virDomainLeaseDefPtr def; + xmlNodePtr cur; + char *key = NULL; + char *path = NULL; + char *offset = NULL; + char *length = NULL; + + if (VIR_ALLOC(def) < 0) { + virReportOOMError(); + return NULL; + } + + cur = node->children; + while (cur != NULL) { + if (cur->type == XML_ELEMENT_NODE) { + if ((key == NULL) && + (xmlStrEqual(cur->name, BAD_CAST "key"))) { + key = (char *)xmlNodeGetContent(cur); + } else if ((path == NULL) && + (xmlStrEqual(cur->name, BAD_CAST "target"))) { + path = virXMLPropString(cur, "path"); + offset = virXMLPropString(cur, "offset"); + length = virXMLPropString(cur, "length"); + } + } + cur = cur->next; + } + + if (!key) { + virDomainReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing 'key' element for lease")); + goto error; + } + if 
(!path) { + virDomainReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing 'target' element for lease")); + goto error; + } + + if (offset && + virStrToLong_ull(offset, NULL, 10, &def->offset) < 0) { + virDomainReportError(VIR_ERR_XML_ERROR, + _("Malformed lease target offset %s"), offset); + goto error; + } + if (length && + virStrToLong_ull(length, NULL, 10, &def->length) < 0) { + virDomainReportError(VIR_ERR_XML_ERROR, + _("Malformed lease target length %s"), length); + goto error; + } + + def->key = key; + def->path = path; + path = key = NULL; + +cleanup: + VIR_FREE(key); + VIR_FREE(path); + VIR_FREE(offset); + VIR_FREE(length); + + return def; + + error: + virDomainLeaseDefFree(def); + def = NULL; + goto cleanup; +} + + /* Parse the XML definition for a disk * @param node XML nodeset to parse for disk definition */ @@ -5154,6 +5245,23 @@ static virDomainDefPtr virDomainDefParseXML(virCapsPtr caps, goto error; } + /* analysis of the resource leases */ + if ((n = virXPathNodeSet("./leases/lease", ctxt, &nodes)) < 0) { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("cannot extract lease devices")); + goto error; + } + if (n && VIR_ALLOC_N(def->leases, n) < 0) + goto no_memory; + for (i = 0 ; i < n ; i++) { + virDomainLeaseDefPtr lease = virDomainLeaseDefParseXML(nodes[i]); + if (!lease) + goto error; + + def->leases[def->nleases++] = lease; + } + VIR_FREE(nodes); + /* analysis of the disk devices */ if ((n = virXPathNodeSet("./devices/disk", ctxt, &nodes)) < 0) { virDomainReportError(VIR_ERR_INTERNAL_ERROR, @@ -6155,6 +6263,23 @@ virDomainLifecycleDefFormat(virBufferPtr buf, static int +virDomainLeaseDefFormat(virBufferPtr buf, + virDomainLeaseDefPtr def) +{ + virBufferAddLit(buf, " <lease>\n"); + virBufferEscapeString(buf, " <key>%s</key>\n", def->key); + virBufferEscapeString(buf, " <target path='%s'", def->path); + if (def->offset) + virBufferVSprintf(buf, " offset='%llu'", def->offset); + if (def->length) + virBufferVSprintf(buf, " length='%llu'", 
def->length); + virBufferAddLit(buf, "/>\n"); + virBufferAddLit(buf, " </lease>\n"); + + return 0; +} + +static int virDomainDiskDefFormat(virBufferPtr buf, virDomainDiskDefPtr def, int flags) @@ -7485,6 +7610,14 @@ char *virDomainDefFormat(virDomainDefPtr def, virDomainLifecycleCrashTypeToString) < 0) goto cleanup; + if (def->nleases) { + virBufferAddLit(&buf, " <leases>\n"); + for (n = 0 ; n < def->nleases ; n++) + if (virDomainLeaseDefFormat(&buf, def->leases[n]) < 0) + goto cleanup; + virBufferAddLit(&buf, " </leases>\n"); + } + virBufferAddLit(&buf, " <devices>\n"); if (def->emulator) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 3b00ba0..4f9c044 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -114,6 +114,15 @@ struct _virDomainDeviceInfo { } addr; }; +typedef struct _virDomainLeaseDef virDomainLeaseDef; +typedef virDomainLeaseDef *virDomainLeaseDefPtr; +struct _virDomainLeaseDef { + char *key; + char *path; + unsigned long long offset; + unsigned long long length; +}; + /* Two types of disk backends */ enum virDomainDiskType { @@ -992,6 +1001,9 @@ struct _virDomainDef { char *emulator; int features; + int nleases; + virDomainLeaseDefPtr *leases; + virDomainClockDef clock; int ngraphics; diff --git a/tests/qemuxml2argvdata/qemuxml2argv-lease.args b/tests/qemuxml2argvdata/qemuxml2argv-lease.args new file mode 100644 index 0000000..4a347ad --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-lease.args @@ -0,0 +1 @@ +LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test /usr/bin/qemu -S -M pc -m 214 -smp 1 -nographic -monitor unix:/tmp/test-monitor,server,nowait -no-acpi -boot c -hda /dev/HostVG/QEMUGuest1 -cdrom /root/boot.iso -net none -serial none -parallel none -usb diff --git a/tests/qemuxml2argvdata/qemuxml2argv-lease.xml b/tests/qemuxml2argvdata/qemuxml2argv-lease.xml new file mode 100644 index 0000000..5304f89 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-lease.xml @@ -0,0 +1,37 @@ +<domain 
type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory>219200</memory> + <currentMemory>219200</currentMemory> + <vcpu>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <leases> + <lease> + <key>thequickbrownfoxjumpedoverthelazydog</key> + <target path='/some/lease/path' offset='1024' length='512'/> + </lease> + </leases> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' unit='0'/> + </disk> + <disk type='file' device='cdrom'> + <source file='/root/boot.iso'/> + <target dev='hdc' bus='ide'/> + <readonly/> + <address type='drive' controller='0' bus='1' unit='0'/> + </disk> + <controller type='ide' index='0'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index ab82d36..7d9466b 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -183,6 +183,7 @@ mymain(int argc, char **argv) DO_TEST("memtune"); DO_TEST("smp"); + DO_TEST("lease"); /* These tests generate different XML */ DO_TEST_DIFFERENT("balloon-device-auto"); -- 1.7.3.4
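The formatting rule used for leases (the offset and length attributes are emitted only when non-zero) can be shown in isolation with the sketch below. It is an illustration only: struct lease, buf_appendf() and lease_format() are hypothetical stand-ins for virDomainLeaseDef and the virBuffer API, the output is written on one line, and no XML escaping is performed.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for virDomainLeaseDef. */
struct lease {
    const char *key;
    const char *path;
    unsigned long long offset;
    unsigned long long length;
};

/* Append formatted text at buf[*used]; returns -1 if it does not fit. */
static int buf_appendf(char *buf, size_t buflen, size_t *used,
                       const char *fmt, ...)
{
    va_list ap;
    int n;

    if (*used >= buflen)
        return -1;
    va_start(ap, fmt);
    n = vsnprintf(buf + *used, buflen - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= buflen - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/* Format one <lease> element; zero offset/length are left out.
 * NOTE: key and path are NOT escaped here, unlike the real code. */
int lease_format(char *buf, size_t buflen, const struct lease *l)
{
    size_t used = 0;

    if (buf_appendf(buf, buflen, &used,
                    "<lease><key>%s</key><target path='%s'",
                    l->key, l->path) < 0)
        return -1;
    if (l->offset &&
        buf_appendf(buf, buflen, &used, " offset='%llu'", l->offset) < 0)
        return -1;
    if (l->length &&
        buf_appendf(buf, buflen, &used, " length='%llu'", l->length) < 0)
        return -1;
    return buf_appendf(buf, buflen, &used, "/></lease>");
}
```

One consequence of this rule is that an explicit offset='0' in the input XML does not survive a parse and format round trip.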

On 01/24/2011 08:13 AM, Daniel P. Berrange wrote:
A lock manager may operate in various modes. The direct mode of operation is to obtain locks based on the resources associated with devices in the XML. The indirect mode is where the app creating the domain provides explicit leases for each resource that needs to be locked. This XML extension allows for listing resources in the XML
<leases> <lease> <key>thequickbrownfoxjumpsoverthelazydog</key> <target path='/some/lease/path' offset='23432' length='256'/> </lease> </leases>
* docs/schemas/domain.rng: Add lease schema * src/conf/domain_conf.c, src/conf/domain_conf.h: parsing and formatting for leases * tests/qemuxml2argvdata/qemuxml2argv-lease.args, tests/qemuxml2argvdata/qemuxml2argv-lease.xml, tests/qemuxml2xmltest.c: Test XML handling for leases --- docs/schemas/domain.rng | 32 ++++++
docs/formatdomain.html.in You can't escape writing docs, no matter how hard you try :)
+static void virDomainLeaseDefFree(virDomainLeaseDefPtr def) +{ + if (!def) + return;
Add this to the free-like functions in cfg.mk.
static int +virDomainLeaseDefFormat(virBufferPtr buf, + virDomainLeaseDefPtr def) +{ + virBufferAddLit(buf, " <lease>\n"); + virBufferEscapeString(buf, " <key>%s</key>\n", def->key); + virBufferEscapeString(buf, " <target path='%s'", def->path); + if (def->offset) + virBufferVSprintf(buf, " offset='%llu'", def->offset); + if (def->length) + virBufferVSprintf(buf, " length='%llu'", def->length); + virBufferAddLit(buf, "/>\n"); + virBufferAddLit(buf, " </lease>\n");
The last two lines could be merged, but I'm not picky.
+typedef virDomainLeaseDef *virDomainLeaseDefPtr; +struct _virDomainLeaseDef { + char *key; + char *path; + unsigned long long offset; + unsigned long long length;
Do we want to use off_t instead of unsigned long long? Then again, I don't think it matters that much in practice (for all practical porting targets, ull is 64-bits, and gnulib pretty much guarantees that off_t is 64-bits). I'd like to ACK this with the nits fixed, but should I really do that without documentation?... -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

Allow the parent process to perform a bi-directional handshake with the child process during fork/exec. The child process will fork and do its initial setup. Immediately prior to the exec(), it will stop & wait for a handshake from the parent process. The parent process will spawn the child and wait until the child reaches the handshake point. It will do whatever extra setup work is required, before signalling the child to continue.

The implementation of this is done using two pairs of blocking pipes. The first pair is used to block the parent, until the child writes a single byte. Then the second pair is used to block the child, until the parent confirms with another single byte.

* src/util/command.c, src/util/command.h, src/libvirt_private.syms: Add APIs to perform a handshake
--- src/libvirt_private.syms | 3 + src/util/command.c | 141 +++++++++++++++++++++++++++++++++++++++++++++- src/util/command.h | 5 ++ 3 files changed, 147 insertions(+), 2 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index cb0214e..99df0f7 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -98,11 +98,14 @@ virCommandAddEnvString; virCommandClearCaps; virCommandDaemonize; virCommandFree; +virCommandHandshakeNotify; +virCommandHandshakeWait; virCommandNew; virCommandNewArgList; virCommandNewArgs; virCommandNonblockingFDs; virCommandPreserveFD; +virCommandRequireHandshake; virCommandRun; virCommandRunAsync; virCommandSetErrorBuffer; diff --git a/src/util/command.c b/src/util/command.c index abd2dc4..7a5d333 100644 --- a/src/util/command.c +++ b/src/util/command.c @@ -36,6 +36,11 @@ #include "files.h" #include "buf.h" +#include <stdlib.h> +#include <stdbool.h> +#include <poll.h> +#include <sys/wait.h> + #define VIR_FROM_THIS VIR_FROM_NONE #define virCommandError(code, ...)
\ @@ -72,6 +77,10 @@ struct _virCommand { int *outfdptr; int *errfdptr; + bool handshake; + int handshakeWait[2]; + int handshakeNotify[2]; + virExecHook hook; void *opaque; @@ -102,6 +111,11 @@ virCommandNewArgs(const char *const*args) if (VIR_ALLOC(cmd) < 0) return NULL; + cmd->handshakeWait[0] = -1; + cmd->handshakeWait[1] = -1; + cmd->handshakeNotify[0] = -1; + cmd->handshakeNotify[1] = -1; + FD_ZERO(&cmd->preserve); FD_ZERO(&cmd->transfer); cmd->infd = cmd->outfd = cmd->errfd = -1; @@ -1078,7 +1092,6 @@ virCommandRun(virCommandPtr cmd, int *exitstatus) return ret; } - /* * Perform all virCommand-specific actions, along with the user hook. */ @@ -1088,12 +1101,61 @@ virCommandHook(void *data) virCommandPtr cmd = data; int res = 0; - if (cmd->hook) + if (cmd->hook) { + VIR_DEBUG("Run hook %p %p", cmd->hook, cmd->opaque); res = cmd->hook(cmd->opaque); + VIR_DEBUG("Done hook %d", res); + } if (res == 0 && cmd->pwd) { VIR_DEBUG("Running child in %s", cmd->pwd); res = chdir(cmd->pwd); + if (res < 0) { + virReportSystemError(errno, + _("Unable to change to %s"), cmd->pwd); + } + } + if (cmd->handshake) { + char c = res < 0 ? '0' : '1'; + int rv; + VIR_DEBUG("Notifying parent for handshake start on %d", cmd->handshakeWait[1]); + if (safewrite(cmd->handshakeWait[1], &c, sizeof(c)) != sizeof(c)) { + virReportSystemError(errno, "%s", _("Unable to notify parent process")); + return -1; + } + + /* On failure we pass the error message back to parent, + * so they don't have to dig through stderr logs + */ + if (res < 0) { + virErrorPtr err = virGetLastError(); + const char *msg = err ? 
err->message : + _("Unknown failure during hook execution"); + size_t len = strlen(msg) + 1; + if (safewrite(cmd->handshakeWait[1], msg, len) != len) { + virReportSystemError(errno, "%s", _("Unable to send error to parent process")); + return -1; + } + return -1; + } + + VIR_DEBUG("Waiting on parent for handshake complete on %d", cmd->handshakeNotify[0]); + if ((rv = saferead(cmd->handshakeNotify[0], &c, sizeof(c))) != sizeof(c)) { + if (rv < 0) + virReportSystemError(errno, "%s", _("Unable to wait on parent process")); + else + virReportSystemError(EIO, "%s", _("libvirtd quit during handshake")); + return -1; + } + if (c != '1') { + virReportSystemError(EINVAL, _("Unexpected confirm code '%c' from parent process"), c); + return -1; + } + VIR_FORCE_CLOSE(cmd->handshakeWait[1]); + VIR_FORCE_CLOSE(cmd->handshakeNotify[0]); } + + VIR_DEBUG("Hook is done %d", res); + return res; } @@ -1170,6 +1232,10 @@ virCommandRunAsync(virCommandPtr cmd, pid_t *pid) FD_CLR(i, &cmd->transfer); } } + if (cmd->handshake) { + VIR_FORCE_CLOSE(cmd->handshakeWait[1]); + VIR_FORCE_CLOSE(cmd->handshakeNotify[0]); + } if (ret == 0 && pid) *pid = cmd->pid; @@ -1234,6 +1300,70 @@ virCommandWait(virCommandPtr cmd, int *exitstatus) } +void virCommandRequireHandshake(virCommandPtr cmd) +{ + if (pipe(cmd->handshakeWait) < 0) { + cmd->has_error = errno; + return; + } + if (pipe(cmd->handshakeNotify) < 0) { + VIR_FORCE_CLOSE(cmd->handshakeWait[0]); + VIR_FORCE_CLOSE(cmd->handshakeWait[1]); + cmd->has_error = errno; + return; + } + + VIR_DEBUG("Transfer handshake wait=%d notify=%d", + cmd->handshakeWait[1], cmd->handshakeNotify[0]); + virCommandPreserveFD(cmd, cmd->handshakeWait[1]); + virCommandPreserveFD(cmd, cmd->handshakeNotify[0]); + cmd->handshake = true; +} + +int virCommandHandshakeWait(virCommandPtr cmd) +{ + char c; + int rv; + VIR_DEBUG("Wait for handshake on %d", cmd->handshakeWait[0]); + if ((rv = saferead(cmd->handshakeWait[0], &c, sizeof(c))) != sizeof(c)) { + if (rv < 0) + 
virReportSystemError(errno, "%s", _("Unable to wait for child process")); + else + virReportSystemError(EIO, "%s", _("Child process quit during startup handshake")); + return -1; + } + if (c != '1') { + char *msg; + ssize_t len; + if (VIR_ALLOC_N(msg, 1024) < 0) { + virReportOOMError(); + return -1; + } + if ((len = saferead(cmd->handshakeWait[0], msg, 1024)) < 0) { + VIR_FREE(msg); + virReportSystemError(errno, "%s", _("No error message from child failure")); + return -1; + } + msg[len-1] = '\0'; + virCommandError(VIR_ERR_INTERNAL_ERROR, "%s", msg); + VIR_FREE(msg); + return -1; + } + return 0; +} + +int virCommandHandshakeNotify(virCommandPtr cmd) +{ + char c = '1'; + VIR_DEBUG("Notify handshake on %d", cmd->handshakeWait[0]); + if (safewrite(cmd->handshakeNotify[1], &c, sizeof(c)) != sizeof(c)) { + virReportSystemError(errno, "%s", _("Unable to notify child process")); + return -1; + } + return 0; +} + + /* * Release all resources */ @@ -1265,6 +1395,13 @@ virCommandFree(virCommandPtr cmd) VIR_FREE(cmd->pwd); + if (cmd->handshake) { + VIR_FORCE_CLOSE(cmd->handshakeWait[0]); + VIR_FORCE_CLOSE(cmd->handshakeWait[1]); + VIR_FORCE_CLOSE(cmd->handshakeNotify[0]); + VIR_FORCE_CLOSE(cmd->handshakeNotify[1]); + } + VIR_FREE(cmd->pidfile); VIR_FREE(cmd); diff --git a/src/util/command.h b/src/util/command.h index 59d0ee3..c5b0a64 100644 --- a/src/util/command.h +++ b/src/util/command.h @@ -268,6 +268,11 @@ int virCommandRunAsync(virCommandPtr cmd, int virCommandWait(virCommandPtr cmd, int *exitstatus) ATTRIBUTE_RETURN_CHECK; +void virCommandRequireHandshake(virCommandPtr cmd); + +int virCommandHandshakeWait(virCommandPtr cmd); +int virCommandHandshakeNotify(virCommandPtr cmd); + /* * Release all resources */ -- 1.7.3.4
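The two-pipe handshake described in the commit message can be reduced to the standalone sketch below. It is an illustration, not the virCommand implementation: handshake_demo() is a hypothetical name, the child simply exits where the real code would go on to exec(), plain read()/write() are used without the EINTR retry that saferead()/safewrite() provide, and no error message is passed back on failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a child that stops at a handshake point
 * until the parent releases it. Returns the child's exit status
 * (0 when the handshake completed), or -1 on error. */
int handshake_demo(void)
{
    int wait_pipe[2];    /* child -> parent: "I reached the handshake" */
    int notify_pipe[2];  /* parent -> child: "you may continue" */
    char c;
    int ok;
    int status;
    pid_t pid;

    if (pipe(wait_pipe) < 0)
        return -1;
    if (pipe(notify_pipe) < 0) {
        close(wait_pipe[0]);
        close(wait_pipe[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(wait_pipe[0]);
        close(wait_pipe[1]);
        close(notify_pipe[0]);
        close(notify_pipe[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: announce readiness, then block until released. */
        close(wait_pipe[0]);
        close(notify_pipe[1]);
        c = '1';
        if (write(wait_pipe[1], &c, 1) != 1)
            _exit(2);
        if (read(notify_pipe[0], &c, 1) != 1 || c != '1')
            _exit(3);
        _exit(0);   /* the real code would exec() here */
    }

    /* Parent: block until the child reaches the handshake point. */
    close(wait_pipe[1]);
    close(notify_pipe[0]);
    ok = (read(wait_pipe[0], &c, 1) == 1 && c == '1');

    /* ... any extra setup that must precede exec() would go here ... */

    if (ok) {
        c = '1';
        ok = (write(notify_pipe[1], &c, 1) == 1);
    }
    close(wait_pipe[0]);
    close(notify_pipe[1]);

    while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
        ;

    if (!ok || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

If the parent dies or closes its end before confirming, the child's read() returns 0 and it exits non-zero, which corresponds to the "libvirtd quit during handshake" case in the patch.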

Define the basic framework for lock manager plugins. The basic plugin API for 3rd parties to implement is defined in src/locking/lock_driver.h. This allows dlopen()able modules for alternative locking schemes; however, we do not install the header. This requires lock plugins to be in-tree, which allows the lock manager plugin API to be changed in future.

The libvirt code for loading & calling into plugins is in src/locking/lock_manager.{c,h}

* include/libvirt/virterror.h, src/util/virterror.c: Add VIR_FROM_LOCKING
* src/locking/lock_driver.h: API for lock driver plugins to implement
* src/locking/lock_manager.c, src/locking/lock_manager.h: Internal API for managing locking
* src/Makefile.am: Add locking code
--- include/libvirt/virterror.h | 1 + po/POTFILES.in | 1 + src/Makefile.am | 3 +- src/libvirt_private.syms | 18 ++ src/locking/README | 158 +++++++++++++++++ src/locking/lock_driver.h | 395 +++++++++++++++++++++++++++++++++++++++++++ src/locking/lock_manager.c | 394 ++++++++++++++++++++++++++++++++++++++++++ src/locking/lock_manager.h | 79 +++++++++ src/util/virterror.c | 3 + 9 files changed, 1051 insertions(+), 1 deletions(-) create mode 100644 src/locking/README create mode 100644 src/locking/lock_driver.h create mode 100644 src/locking/lock_manager.c create mode 100644 src/locking/lock_manager.h diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h index 5962dbf..33118c6 100644 --- a/include/libvirt/virterror.h +++ b/include/libvirt/virterror.h @@ -79,6 +79,7 @@ typedef enum { VIR_FROM_SYSINFO = 37, /* Error from sysinfo/SMBIOS */ VIR_FROM_STREAMS = 38, /* Error from I/O streams */ VIR_FROM_VMWARE = 39, /* Error from VMware driver */ + VIR_FROM_LOCKING = 40, /* Error from lock manager */ } virErrorDomain; diff --git a/po/POTFILES.in b/po/POTFILES.in index 5f2ed75..47f2f20 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -29,6 +29,7 @@ src/fdstream.c src/interface/netcf_driver.c src/internal.h src/libvirt.c +src/locking/lock_manager.c
src/lxc/lxc_container.c src/lxc/lxc_conf.c src/lxc/lxc_controller.c diff --git a/src/Makefile.am b/src/Makefile.am index 2f94efd..f001daf 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -91,7 +91,8 @@ DRIVER_SOURCES = \ datatypes.c datatypes.h \ fdstream.c fdstream.h \ $(NODE_INFO_SOURCES) \ - libvirt.c libvirt_internal.h + libvirt.c libvirt_internal.h \ + locking/lock_manager.c locking/lock_manager.h # XML configuration format handling sources diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 99df0f7..b45f501 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -525,6 +525,24 @@ virRegisterSecretDriver; virRegisterStorageDriver; +# locking.h +virLockManagerAcquireObject; +virLockManagerAcquireResource; +virLockManagerAddResource; +virLockManagerAttachObject; +virLockManagerDetachObject; +virLockManagerFree; +virLockManagerGetState; +virLockManagerNew; +virLockManagerPluginNew; +virLockManagerPluginRef; +virLockManagerPluginUnref; +virLockManagerReleaseObject; +virLockManagerReleaseResource; +virLockManagerSetParameter; +virLockManagerStartup; + + # logging.h virLogDefineFilter; virLogDefineOutput; diff --git a/src/locking/README b/src/locking/README new file mode 100644 index 0000000..4fa4f89 --- /dev/null +++ b/src/locking/README @@ -0,0 +1,158 @@ + +At libvirtd startup: + + plugin = virLockManagerPluginLoad("sync-manager"); + + +At libvirtd shutdown: + + virLockManagerPluginUnload(plugin) + + +At guest startup: + + manager = virLockManagerNew(plugin, + VIR_LOCK_MANAGER_OBJECT_DOMAIN, + 0); + + virLockManagerSetParameter(manager, "id", id); + virLockManagerSetParameter(manager, "uuid", uuid); + virLockManagerSetParameter(manager, "name", name); + + foreach disk + virLockManagerRegisterResource(manager, + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + disk.path, + ..flags...); + + if (!virLockManagerAcquireObject(manager)) + abort.. 
+ + run QEMU + + +At guest shutdown: + + ...send QEMU 'quit' monitor command, and/or kill(qemupid)... + + if (!virLockManagerShutdown(manager)) + kill(supervisorpid); /* XXX or leave it running ??? */ + + virLockManagerFree(manager); + + + +At libvirtd restart with running guests: + + foreach still running guest + manager = virLockManagerNew(driver, + VIR_LOCK_MANAGER_START_DOMAIN, + VIR_LOCK_MANAGER_NEW_ATTACH); + virLockManagerSetParameter(manager, "id", id); + virLockManagerSetParameter(manager, "uuid", uuid); + virLockManagerSetParameter(manager, "name", name); + + if (!virLockManagerGetChild(manager, &qemupid)) + kill(supervisorpid); /* XXX or leave it running ??? */ + + + +With disk hotplug: + + if (virLockManagerAcquireResource(manager, + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + disk.path + ..flags..)) + ...abort hotplug attempt ... + + ...hotplug the device... + + + +With disk unhotplug: + + ...hotunplug the device... + + if (virLockManagerReleaseResource(manager, + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + disk.path + ..flags..)) + ...log warning ... + + + +During migration: + + 1. On source host + + if (!virLockManagerPrepareMigrate(manager, hosturi)) + ..don't start migration.. + + 2. On dest host + + manager = virLockManagerNew(driver, + VIR_LOCK_MANAGER_START_DOMAIN, + VIR_LOCK_MANAGER_NEW_MIGRATE); + virLockManagerSetParameter(manager, "id", id); + virLockManagerSetParameter(manager, "uuid", uuid); + virLockManagerSetParameter(manager, "name", name); + + foreach disk + virLockManagerRegisterResource(manager, + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + disk.path, + ..flags...); + + char **supervisorargv; + int supervisorargc; + + supervisor = virLockManagerGetSupervisorPath(manager); + virLockManagerGetSupervisorArgs(&argv, &argc); + + cmd = qemuBuildCommandLine(supervisor, supervisorargv, supervisorargv); + + supervisorpid = virCommandExec(cmd); + + if (!virLockManagerGetChild(manager, &qemupid)) + kill(supervisorpid); /* XXX or leave it running ??? */ + + 3. 
Initiate migration in QEMU on source and wait for completion + + 4a. On failure + + 4a1 On target + + virLockManagerCompleteMigrateIn(manager, + VIR_LOCK_MANAGER_MIGRATE_CANCEL); + virLockManagerShutdown(manager); + virLockManagerFree(manager); + + 4a2 On source + + virLockManagerCompleteMigrateIn(manager, + VIR_LOCK_MANAGER_MIGRATE_CANCEL); + + 4b. On success + + + 4b1 On target + + virLockManagerCompleteMigrateIn(manager, 0); + + 4b2 On source + + virLockManagerCompleteMigrateIn(manager, 0); + virLockManagerShutdown(manager); + virLockManagerFree(manager); + + +Notes: + + - If a lock manager impl does just VM level leases, it can + ignore all the resource paths at startup. + + - If a lock manager impl does not support migrate + it can return an error from all migrate calls + + - If a lock manager impl does not support hotplug + it can return an error from all resource acquire/release calls diff --git a/src/locking/lock_driver.h b/src/locking/lock_driver.h new file mode 100644 index 0000000..a8b337d --- /dev/null +++ b/src/locking/lock_driver.h @@ -0,0 +1,395 @@ +/* + * lock_driver.h: Defines the lock driver plugin API + * + * Copyright (C) 2010-2011 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#ifndef __VIR_PLUGINS_LOCK_DRIVER_H__ +# define __VIR_PLUGINS_LOCK_DRIVER_H__ + +# include "internal.h" + +typedef struct _virLockManager virLockManager; +typedef virLockManager *virLockManagerPtr; + +typedef struct _virLockDriver virLockDriver; +typedef virLockDriver *virLockDriverPtr; + +typedef struct _virLockManagerParam virLockManagerParam; +typedef virLockManagerParam *virLockManagerParamPtr; + +enum { + /* The managed object is a virtual guest domain */ + VIR_LOCK_MANAGER_OBJECT_TYPE_DOMAIN = 0, +} virLockManagerObjectType; + +/* + * Flags to pass to 'load_drv' and also 'new_drv' method + * Plugins must support at least one of the modes. If a + * mode is unsupported, it must return an error + */ +enum { + VIR_LOCK_MANAGER_MODE_CONTENT = (1 << 0), + VIR_LOCK_MANAGER_MODE_METADATA = (1 << 1), +} virLockManagerFlags; + +enum { + /* The resource to be locked is a virtual disk */ + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK = 0, + /* A lease against an arbitrary resource */ + VIR_LOCK_MANAGER_RESOURCE_TYPE_LEASE = 1, +} virLockManagerResourceType; + +typedef enum { + /* The resource is assigned in readonly mode */ + VIR_LOCK_MANAGER_RESOURCE_READONLY = (1 << 0), + /* The resource is assigned in shared, writable mode */ + VIR_LOCK_MANAGER_RESOURCE_SHARED = (1 << 1), +} virLockManagerResourceFlags; + +enum { + VIR_LOCK_MANAGER_PARAM_TYPE_STRING, + VIR_LOCK_MANAGER_PARAM_TYPE_INT, + VIR_LOCK_MANAGER_PARAM_TYPE_LONG, + VIR_LOCK_MANAGER_PARAM_TYPE_UINT, + VIR_LOCK_MANAGER_PARAM_TYPE_ULONG, + VIR_LOCK_MANAGER_PARAM_TYPE_DOUBLE, + VIR_LOCK_MANAGER_PARAM_TYPE_UUID, +}; + +struct _virLockManagerParam { + int type; + const char *key; + union { + int i; + long long l; + unsigned int ui; + unsigned long long ul; + double d; + char *str; + unsigned char 
uuid[16]; + } value; +}; + + +/* + * Changes in major version denote incompatible ABI changes + * Changes in minor version denote new compatible API entry points + * Changes in micro version denote new compatible flags + */ +# define VIR_LOCK_MANAGER_VERSION_MAJOR 1 +# define VIR_LOCK_MANAGER_VERSION_MINOR 0 +# define VIR_LOCK_MANAGER_VERSION_MICRO 0 + +# define VIR_LOCK_MANAGER_VERSION \ + ((VIR_LOCK_MANAGER_VERSION_MAJOR * 1000 * 1000) + \ + (VIR_LOCK_MANAGER_VERSION_MINOR * 1000) + \ + (VIR_LOCK_MANAGER_VERSION_MICRO)) + + + +/** + * virLockDriverInit: + * @version: the libvirt requested plugin ABI version + * @flags: the libvirt requested plugin optional extras + * + * Allow the plugin to validate the libvirt requested + * plugin version / flags. This allows the plugin impl + * to block its use in versions of libvirtd which are + * too old to support key features. + * + * NB: A plugin may be loaded multiple times, for different + * libvirt drivers (eg QEMU, LXC, UML) + * + * Returns -1 if the requested version/flags were inadequate + */ +typedef int (*virLockDriverInit)(unsigned int version, + unsigned int flags); + +/** + * virLockDriverDeinit: + * + * Called to release any resources prior to the plugin + * being unloaded from memory. Returns -1 to prevent + * plugin from being unloaded from memory. + */ +typedef int (*virLockDriverDeinit)(void); + +/** + * virLockManagerNew: + * @man: the lock manager context + * @type: the type of process to be supervised + * @nparams: number of metadata parameters + * @params: extra metadata parameters + * @flags: optional flags, currently unused + * + * Initialize a new context to supervise a process, usually + * a virtual machine. The lock driver implementation can use + * the <code>privateData</code> field of <code>man</code> + * to store a pointer to any driver specific state. 
+ * + * A process of VIR_LOCK_MANAGER_START_DOMAIN will be + * given the following parameters + * + * - id: the domain unique id (unsigned int) + * - uuid: the domain uuid (uuid) + * - name: the domain name (string) + * - pid: process ID owning the lock (unsigned int) + * + * Returns 0 if a new context was successfully initialized, -1 on error + */ +typedef int (*virLockDriverNew)(virLockManagerPtr man, + unsigned int type, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags); + +/** + * virLockDriverFree: + * @manager: the lock manager context + * + * Release any resources associated with the lock manager + * context private data + */ +typedef void (*virLockDriverFree)(virLockManagerPtr man); + +/** + * virLockDriverAddResource: + * @manager: the lock manager context + * @type: the resource type virLockManagerResourceType + * @name: the resource name + * @nparams: number of metadata parameters + * @params: extra metadata parameters + * @flags: the resource access flags + * + * Assign a resource to a managed object. This will + * only be called before the object is locked, + * while it is inactive. eg, to set the initial boot + * time disk assignments on a VM + * The format of @name varies according to + * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK + * or VIR_LOCK_MANAGER_RESOURCE_TYPE_LEASE will have the + * fully qualified file path. + * + * A resource of type VIR_LOCK_MANAGER_RESOURCE_TYPE_LEASE + * will receive at least the following extra parameters + * + * - 'uuid': globally unique identifier of the lease (uuid) + * - 'offset': byte offset within the lease (unsigned long long) + * - 'length': number of bytes in lease region (unsigned long long) + * + * If no flags are given, the resource is assumed to be + * used in exclusive, read-write mode. Access can be + * relaxed to readonly, or shared read-write. 
+ * + * Returns 0 on success, or -1 on failure + */ +typedef int (*virLockDriverAddResource)(virLockManagerPtr man, + unsigned int type, + const char *name, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags); + +/** + * virLockDriverAcquireObject: + * @manager: the lock manager context + * @state: the current lock state + * @flags: optional flags, currently unused + * + * Start managing resources for the object. If the + * object is being transferred from another location + * the current lock state may be passed in. This + * must be called from the PID that represents the + * object to be managed. If the lock is lost at any + * time, the PID will be killed off by the lock manager. + * + * Returns 0 on success, or -1 on failure + */ +typedef int (*virLockDriverAcquireObject)(virLockManagerPtr man, + const char *state, + unsigned int flags); + +/** + * virLockDriverAttachObject: + * @manager: the lock manager context + * @flags: optional flags, currently unused + * + * Re-attach to an existing lock manager instance managing + * PID @pid. + * + * Returns 0 on success, or -1 on failure + */ +typedef int (*virLockDriverAttachObject)(virLockManagerPtr man, + unsigned int flags); + +/** + * virLockDriverDetachObject: + * @manager: the lock manager context + * @flags: optional flags, currently unused + * + * Detach from an existing lock manager instance managing + * PID @pid. + * + * Returns 0 on success, or -1 on failure + */ +typedef int (*virLockDriverDetachObject)(virLockManagerPtr man, + unsigned int flags); + +/** + * virLockDriverReleaseObject: + * @manager: the lock manager context + * @flags: optional flags + * + * Inform the lock manager that the supervised process has + * been, or can be stopped. This must be called from + * the same context as the previous virLockDriverAttachObject + * or virLockDriverAcquireObject call. 
+ * + * Returns 0 on success, or -1 on failure + */ +typedef int (*virLockDriverReleaseObject)(virLockManagerPtr man, + unsigned int flags); + +/** + * virLockDriverGetState: + * @manager: the lock manager context + * @state: pointer to be filled with lock state + * @flags: optional flags, currently unused + * + * Retrieve the current lock state. The returned + * lock state may be NULL if none is required. The + * caller is responsible for freeing the lock + * state string when it is no longer required + * + * Returns 0 on success, or -1 on failure. + */ +typedef int (*virLockDriverGetState)(virLockManagerPtr man, + char **state, + unsigned int flags); + +/** + * virLockDriverAcquireResource: + * @manager: the lock manager context + * @type: the resource type virLockDriverResourceType + * @name: the resource name + * @nparams: number of metadata parameters + * @params: extra metadata parameters + * @flags: the resource access flags + * + * Assign a resource to a managed object. This will + * only be called when the object is already locked + * and active. eg, to hotplug a disk into a VM. + * The format of @name varies according to + * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK + * or VIR_LOCK_MANAGER_RESOURCE_TYPE_LEASE will have the + * fully qualified file path. + * + * A resource of type VIR_LOCK_MANAGER_RESOURCE_TYPE_LEASE + * will receive at least the following extra parameters + * + * - 'uuid': globally unique identifier of the lease (uuid) + * - 'offset': byte offset within the lease (unsigned long long) + * - 'length': number of bytes in lease region (unsigned long long) + * + * If no flags are given, the resource is assumed to be + * used in exclusive, read-write mode. Access can be + * relaxed to readonly, or shared read-write. 
+ * + * Returns 0 on success, or -1 on failure + */ +typedef int (*virLockDriverAcquireResource)(virLockManagerPtr man, + unsigned int type, + const char *name, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags); + +/** + * virLockDriverReleaseResource: + * @manager: the lock manager context + * @type: the resource type virLockDriverResourceType + * @name: the resource name + * @nparams: number of metadata parameters + * @params: extra metadata parameters + * @flags: the resource access flags + * + * Dynamically release a resource for a running process. + * This may only be called after the process has been + * started. The format of @name varies according to + * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK + * will have a fully qualified file path. + * + * A resource of type VIR_LOCK_MANAGER_RESOURCE_TYPE_LEASE + * will receive at least the following extra parameters + * + * - 'uuid': globally unique identifier of the lease (uuid) + * - 'offset': byte offset within the lease (unsigned long long) + * - 'length': number of bytes in lease region (unsigned long long) + * + * If no flags are given, the resource is assumed to be + * used in exclusive, read-write mode. Access can be + * relaxed to readonly, or shared read-write. 
+ * + * Returns 0 on success, or -1 on failure + */ +typedef int (*virLockDriverReleaseResource)(virLockManagerPtr man, + unsigned int type, + const char *name, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags); + +struct _virLockManager { + virLockDriverPtr driver; + void *privateData; +}; + +/** + * The plugin must export a static instance of this + * driver table, with the name 'virLockDriverImpl' + */ +struct _virLockDriver { + /** + * @version: the newest implemented plugin ABI version + * @flags: optional flags, currently unused + */ + unsigned int version; + unsigned int flags; + + virLockDriverInit drvInit; + virLockDriverDeinit drvDeinit; + + virLockDriverNew drvNew; + virLockDriverFree drvFree; + + virLockDriverAddResource drvAddResource; + + virLockDriverAcquireObject drvAcquireObject; + virLockDriverAttachObject drvAttachObject; + virLockDriverDetachObject drvDetachObject; + virLockDriverReleaseObject drvReleaseObject; + + virLockDriverGetState drvGetState; + + virLockDriverAcquireResource drvAcquireResource; + virLockDriverReleaseResource drvReleaseResource; +}; + + +#endif /* __VIR_PLUGINS_LOCK_DRIVER_H__ */ diff --git a/src/locking/lock_manager.c b/src/locking/lock_manager.c new file mode 100644 index 0000000..9c98555 --- /dev/null +++ b/src/locking/lock_manager.c @@ -0,0 +1,394 @@ +/* + * lock_manager.c: Implements the internal lock manager API + * + * Copyright (C) 2010-2011 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#include <config.h> + +#include "lock_manager.h" +#include "virterror_internal.h" +#include "logging.h" +#include "util.h" +#include "memory.h" +#include "uuid.h" + +#include <dlfcn.h> +#include <stdlib.h> +#include <unistd.h> + +#include "configmake.h" + +#define VIR_FROM_THIS VIR_FROM_LOCKING + +#define virLockError(code, ...) \ + virReportErrorHelper(NULL, VIR_FROM_THIS, code, __FILE__, \ + __FUNCTION__, __LINE__, __VA_ARGS__) + +#define CHECK_PLUGIN(field, errret) \ + if (!plugin->driver->field) { \ + virLockError(VIR_ERR_INTERNAL_ERROR, \ + _("Missing '%s' field in lock manager driver"), \ + #field); \ + return errret; \ + } + +#define CHECK_MANAGER(field, errret) \ + if (!manager->driver->field) { \ + virLockError(VIR_ERR_INTERNAL_ERROR, \ + _("Missing '%s' field in lock manager driver"), \ + #field); \ + return errret; \ + } + +struct _virLockManagerPlugin { + virLockDriverPtr driver; + void *handle; + int refs; +}; + +#define DEFAULT_LOCK_MANAGER_PLUGIN_DIR LIBDIR "/libvirt/lock-driver" + +static void virLockManagerLogParams(size_t nparams, + virLockManagerParamPtr params) +{ + int i; + char uuidstr[VIR_UUID_STRING_BUFLEN]; + for (i = 0 ; i < nparams ; i++) { + switch (params[i].type) { + case VIR_LOCK_MANAGER_PARAM_TYPE_INT: + VIR_DEBUG(" key=%s type=int value=%d", params[i].key, params[i].value.i); + break; + case VIR_LOCK_MANAGER_PARAM_TYPE_UINT: + VIR_DEBUG(" key=%s type=uint value=%u", params[i].key, params[i].value.ui); + break; + case VIR_LOCK_MANAGER_PARAM_TYPE_LONG: + VIR_DEBUG(" key=%s type=long value=%lld", params[i].key, params[i].value.l); + break; + case VIR_LOCK_MANAGER_PARAM_TYPE_ULONG: + VIR_DEBUG(" key=%s type=ulong value=%llu", params[i].key, params[i].value.ul); + break; + case 
VIR_LOCK_MANAGER_PARAM_TYPE_DOUBLE: + VIR_DEBUG(" key=%s type=double value=%lf", params[i].key, params[i].value.d); + break; + case VIR_LOCK_MANAGER_PARAM_TYPE_STRING: + VIR_DEBUG(" key=%s type=string value=%s", params[i].key, params[i].value.str); + break; + case VIR_LOCK_MANAGER_PARAM_TYPE_UUID: + virUUIDFormat(params[i].value.uuid, uuidstr); + VIR_DEBUG(" key=%s type=uuid value=%s", params[i].key, uuidstr); + break; + } + } +} + + +/** + * virLockManagerPluginNew: + * @name: the name of the plugin + * @flag: optional plugin flags + * + * Attempt to load the plugin $(libdir)/libvirt/lock-driver/@name.so + * The plugin driver entry point will be resolved & invoked to obtain + * the lock manager driver. + * + * Even if the loading of the plugin succeeded, this may still + * return NULL if the plugin impl decided that we (libvirtd) + * are too old to support a feature it requires + * + * Returns a plugin object, or NULL if loading failed. + */ +virLockManagerPluginPtr virLockManagerPluginNew(const char *name, + unsigned int flags) +{ + void *handle = NULL; + virLockDriverPtr driver; + virLockManagerPluginPtr plugin; + const char *moddir = getenv("LIBVIRT_LOCK_MANAGER_PLUGIN_DIR"); + char *modfile = NULL; + + if (moddir == NULL) + moddir = DEFAULT_LOCK_MANAGER_PLUGIN_DIR; + + VIR_DEBUG("Module load %s from %s", name, moddir); + + if (virAsprintf(&modfile, "%s/%s.so", moddir, name) < 0) { + virReportOOMError(); + return NULL; + } + + if (access(modfile, R_OK) < 0) { + virReportSystemError(errno, + _("Plugin %s not accessible"), + modfile); + goto cleanup; + } + + handle = dlopen(modfile, RTLD_NOW | RTLD_LOCAL); + if (!handle) { + virLockError(VIR_ERR_SYSTEM_ERROR, + _("Failed to load plugin %s: %s"), + modfile, dlerror()); + goto cleanup; + } + + if (!(driver = dlsym(handle, "virLockDriverImpl"))) { + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Missing plugin initialization symbol 'virLockDriverImpl'")); + goto cleanup; + } + + if 
(driver->drvInit(VIR_LOCK_MANAGER_VERSION, flags) < 0) { + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("plugin ABI is not compatible")); + goto cleanup; + } + + if (VIR_ALLOC(plugin) < 0) { + virReportOOMError(); + goto cleanup; + } + + plugin->driver = driver; + plugin->handle = handle; + plugin->refs = 1; + + VIR_FREE(modfile); + return plugin; + +cleanup: + VIR_FREE(modfile); + if (handle) + dlclose(handle); + return NULL; +} + + +/** + * virLockManagerPluginRef: + * @plugin: the plugin implementation to ref + * + * Acquires an additional reference on the plugin. + */ +void virLockManagerPluginRef(virLockManagerPluginPtr plugin) +{ + plugin->refs++; +} + + +/** + * virLockManagerPluginUnref: + * @plugin: the plugin implementation to unref + * + * Releases a reference on the plugin. When the last reference + * is released, it will attempt to unload the plugin from memory. + * The plugin may refuse to allow unloading if this would + * result in an unsafe scenario. + * + */ +void virLockManagerPluginUnref(virLockManagerPluginPtr plugin) +{ + if (!plugin) + return; + + plugin->refs--; + + if (plugin->refs > 0) + return; + + if (plugin->driver->drvDeinit() >= 0) { + if (plugin->handle) + dlclose(plugin->handle); + } else { + VIR_WARN0("Unable to unload lock manager plugin from memory"); + return; + } + + VIR_FREE(plugin); +} + + +/** + * virLockManagerNew: + * @plugin: the plugin implementation to use + * @type: the type of process to be supervised + * @flags: optional flags, currently unused + * + * Create a new context to supervise a process, usually + * a virtual machine. 
+ * + * Returns a new lock manager context + */ +virLockManagerPtr virLockManagerNew(virLockManagerPluginPtr plugin, + unsigned int type, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags) +{ + virLockManagerPtr manager; + VIR_DEBUG("plugin=%p type=%u nparams=%zu params=%p flags=%u", + plugin, type, nparams, params, flags); + virLockManagerLogParams(nparams, params); + + CHECK_PLUGIN(drvNew, NULL); + + if (VIR_ALLOC(manager) < 0) { + virReportOOMError(); + return NULL; + } + + manager->driver = plugin->driver; + + if (plugin->driver->drvNew(manager, type, nparams, params, flags) < 0) { + VIR_FREE(manager); + return NULL; + } + + return manager; +} + + +int virLockManagerAddResource(virLockManagerPtr manager, + unsigned int type, + const char *name, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags) +{ + VIR_DEBUG("manager=%p type=%u name=%s nparams=%zu params=%p flags=%u", + manager, type, name, nparams, params, flags); + virLockManagerLogParams(nparams, params); + + CHECK_MANAGER(drvAddResource, -1); + + return manager->driver->drvAddResource(manager, + type, name, + nparams, params, + flags); +} + +int virLockManagerAcquireObject(virLockManagerPtr manager, + const char *state, + unsigned int flags) +{ + VIR_DEBUG("manager=%p state=%s flags=%u", manager, state, flags); + + CHECK_MANAGER(drvAcquireObject, -1); + + return manager->driver->drvAcquireObject(manager, state, flags); +} + + +int virLockManagerAttachObject(virLockManagerPtr manager, + unsigned int flags) +{ + VIR_DEBUG("manager=%p flags=%u", manager, flags); + + CHECK_MANAGER(drvAttachObject, -1); + + return manager->driver->drvAttachObject(manager, flags); +} + + +int virLockManagerDetachObject(virLockManagerPtr manager, + unsigned int flags) +{ + VIR_DEBUG("manager=%p flags=%u", manager, flags); + + CHECK_MANAGER(drvDetachObject, -1); + + return manager->driver->drvDetachObject(manager, flags); +} + + +int virLockManagerReleaseObject(virLockManagerPtr manager, 
+ unsigned int flags) +{ + VIR_DEBUG("manager=%p flags=%u", manager, flags); + + CHECK_MANAGER(drvReleaseObject, -1); + + return manager->driver->drvReleaseObject(manager, flags); +} + + +int virLockManagerGetState(virLockManagerPtr manager, + char **state, + unsigned int flags) +{ + VIR_DEBUG("manager=%p state=%p flags=%u", manager, state, flags); + + CHECK_MANAGER(drvGetState, -1); + + return manager->driver->drvGetState(manager, state, flags); +} + + +int virLockManagerAcquireResource(virLockManagerPtr manager, + unsigned int type, + const char *name, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags) +{ + VIR_DEBUG("manager=%p type=%u name=%s nparams=%zu params=%p flags=%u", + manager, type, name, nparams, params, flags); + virLockManagerLogParams(nparams, params); + + CHECK_MANAGER(drvAcquireResource, -1); + + return manager->driver->drvAcquireResource(manager, + type, name, + nparams, params, + flags); +} + + +int virLockManagerReleaseResource(virLockManagerPtr manager, + unsigned int type, + const char *name, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags) +{ + VIR_DEBUG("manager=%p type=%u name=%s nparams=%zu params=%p flags=%u", + manager, type, name, nparams, params, flags); + virLockManagerLogParams(nparams, params); + + CHECK_MANAGER(drvReleaseResource, -1); + + return manager->driver->drvReleaseResource(manager, + type, name, + nparams, params, + flags); +} + + +int virLockManagerFree(virLockManagerPtr manager) +{ + VIR_DEBUG("manager=%p", manager); + + if (!manager) + return 0; + + CHECK_MANAGER(drvFree, -1); + + manager->driver->drvFree(manager); + + return 0; +} diff --git a/src/locking/lock_manager.h b/src/locking/lock_manager.h new file mode 100644 index 0000000..a33cb68 --- /dev/null +++ b/src/locking/lock_manager.h @@ -0,0 +1,79 @@ +/* + * lock_manager.h: Defines the internal lock manager API + * + * Copyright (C) 2010-2011 Red Hat, Inc. 
+ * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#ifndef __VIR_LOCK_MANAGER_H__ +# define __VIR_LOCK_MANAGER_H__ + +# include "internal.h" +# include "lock_driver.h" + +typedef struct _virLockManagerPlugin virLockManagerPlugin; +typedef virLockManagerPlugin *virLockManagerPluginPtr; + +virLockManagerPluginPtr virLockManagerPluginNew(const char *name, + unsigned int flags); +void virLockManagerPluginRef(virLockManagerPluginPtr plugin); +void virLockManagerPluginUnref(virLockManagerPluginPtr plugin); + +virLockManagerPtr virLockManagerNew(virLockManagerPluginPtr plugin, + unsigned int type, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags); + +int virLockManagerAddResource(virLockManagerPtr manager, + unsigned int type, + const char *name, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags); + +int virLockManagerAcquireObject(virLockManagerPtr manager, + const char *state, + unsigned int flags); +int virLockManagerAttachObject(virLockManagerPtr manager, + unsigned int flags); +int virLockManagerDetachObject(virLockManagerPtr manager, + unsigned int flags); +int virLockManagerReleaseObject(virLockManagerPtr manager, + unsigned int flags); + +int virLockManagerGetState(virLockManagerPtr manager, + char **state, + unsigned int 
flags); + +int virLockManagerAcquireResource(virLockManagerPtr manager, + unsigned int type, + const char *name, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags); + +int virLockManagerReleaseResource(virLockManagerPtr manager, + unsigned int type, + const char *name, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags); + +int virLockManagerFree(virLockManagerPtr manager); + +#endif /* __VIR_LOCK_MANAGER_H__ */ diff --git a/src/util/virterror.c b/src/util/virterror.c index e45b582..c90fdfb 100644 --- a/src/util/virterror.c +++ b/src/util/virterror.c @@ -200,6 +200,9 @@ static const char *virErrorDomainName(virErrorDomain domain) { case VIR_FROM_STREAMS: dom = "Streams "; break; + case VIR_FROM_LOCKING: + dom = "Locking "; + break; } return(dom); } -- 1.7.3.4
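For anyone reviewing the plugin side of this API, the following is a rough standalone sketch (not part of the patch) of two things a lock driver would likely need: a drvInit-style check against the packed VIR_LOCK_MANAGER_VERSION value, and a lookup of a typed entry in the parameter array passed to drvNew. The version macros and the parameter struct layout are copied from lock_driver.h above so that the file builds outside the libvirt tree. Everything whose name starts with "example" is hypothetical, and the compatibility policy (same major version, and not older than the plugin's minimum) is only an assumption about how a plugin might choose to behave.

```c
/* Standalone sketch, not part of the patch. Names beginning with
 * "example" are hypothetical; the macros and struct layout mirror
 * lock_driver.h so this compiles without the libvirt tree. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VIR_LOCK_MANAGER_VERSION_MAJOR 1
#define VIR_LOCK_MANAGER_VERSION_MINOR 0
#define VIR_LOCK_MANAGER_VERSION_MICRO 0

/* Packed as major * 1,000,000 + minor * 1,000 + micro */
#define VIR_LOCK_MANAGER_VERSION \
    ((VIR_LOCK_MANAGER_VERSION_MAJOR * 1000 * 1000) + \
     (VIR_LOCK_MANAGER_VERSION_MINOR * 1000) + \
     (VIR_LOCK_MANAGER_VERSION_MICRO))

enum {
    VIR_LOCK_MANAGER_PARAM_TYPE_STRING,
    VIR_LOCK_MANAGER_PARAM_TYPE_INT,
    VIR_LOCK_MANAGER_PARAM_TYPE_LONG,
    VIR_LOCK_MANAGER_PARAM_TYPE_UINT,
    VIR_LOCK_MANAGER_PARAM_TYPE_ULONG,
    VIR_LOCK_MANAGER_PARAM_TYPE_DOUBLE,
    VIR_LOCK_MANAGER_PARAM_TYPE_UUID,
};

/* Same layout as struct _virLockManagerParam in the patch */
typedef struct {
    int type;
    const char *key;
    union {
        int i;
        long long l;
        unsigned int ui;
        unsigned long long ul;
        double d;
        char *str;
        unsigned char uuid[16];
    } value;
} exampleParam;

/* Oldest libvirt ABI version this hypothetical plugin accepts */
#define EXAMPLE_MIN_VERSION VIR_LOCK_MANAGER_VERSION

/* Hypothetical drvInit body: reject a different major version
 * (incompatible ABI) and anything older than our minimum. */
int exampleDriverInit(unsigned int version, unsigned int flags)
{
    (void)flags; /* unused in this sketch */

    if (version / (1000 * 1000) != EXAMPLE_MIN_VERSION / (1000 * 1000))
        return -1;
    if (version < EXAMPLE_MIN_VERSION)
        return -1;
    return 0;
}

/* Hypothetical helper: find the entry matching both key and type,
 * or return NULL if there is none. */
const exampleParam *exampleFindParam(const exampleParam *params,
                                     size_t nparams,
                                     const char *key,
                                     int type)
{
    size_t i;

    assert(params != NULL || nparams == 0);

    for (i = 0; i < nparams; i++) {
        if (params[i].type == type && strcmp(params[i].key, key) == 0)
            return &params[i];
    }
    return NULL;
}
```

Matching on the type as well as the key means a plugin never reads the wrong member of the union if libvirt passes, say, "id" with an unexpected type. A real driver would call something like this from its drvNew callback to pick out "id", "uuid", "name" and "pid".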

To allow hypervisor drivers to assume that a lock driver impl will be guaranteed to exist, provide a 'nop' impl that is compiled into the library * src/Makefile.am: Add nop driver * src/locking/lock_driver_nop.c, src/locking/lock_driver_nop.h: Nop lock driver implementation * src/locking/lock_manager.c: Enable direct access of 'nop' driver, instead of dlopen()ing it. --- src/Makefile.am | 4 +- src/locking/lock_driver_nop.c | 157 +++++++++++++++++++++++++++++++++++++++++ src/locking/lock_driver_nop.h | 30 ++++++++ src/locking/lock_manager.c | 53 ++++++++------ 4 files changed, 219 insertions(+), 25 deletions(-) create mode 100644 src/locking/lock_driver_nop.c create mode 100644 src/locking/lock_driver_nop.h diff --git a/src/Makefile.am b/src/Makefile.am index f001daf..9bd20e5 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -92,7 +92,9 @@ DRIVER_SOURCES = \ fdstream.c fdstream.h \ $(NODE_INFO_SOURCES) \ libvirt.c libvirt_internal.h \ - locking/lock_manager.c locking/lock_manager.h + locking/lock_manager.c locking/lock_manager.h \ + locking/lock_driver.h \ + locking/lock_driver_nop.h locking/lock_driver_nop.c # XML configuration format handling sources diff --git a/src/locking/lock_driver_nop.c b/src/locking/lock_driver_nop.c new file mode 100644 index 0000000..b79d0e5 --- /dev/null +++ b/src/locking/lock_driver_nop.c @@ -0,0 +1,157 @@ +/* + * lock_driver_nop.c: A lock driver which locks nothing + * + * Copyright (C) 2010-2011 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#include <config.h> + +#include "lock_driver_nop.h" +#include "memory.h" +#include "logging.h" +#include "uuid.h" + + +static int virLockManagerNopInit(unsigned int version, + unsigned int flags) +{ + VIR_DEBUG("version=%u flags=%u", version, flags); + + return 0; +} + +static int virLockManagerNopDeinit(void) +{ + VIR_DEBUG0(""); + + return 0; +} + + +static int virLockManagerNopNew(virLockManagerPtr lock ATTRIBUTE_UNUSED, + unsigned int type ATTRIBUTE_UNUSED, + size_t nparams ATTRIBUTE_UNUSED, + virLockManagerParamPtr params ATTRIBUTE_UNUSED, + unsigned int flags ATTRIBUTE_UNUSED) +{ + return 0; +} + +static int virLockManagerNopAddResource(virLockManagerPtr lock ATTRIBUTE_UNUSED, + unsigned int type ATTRIBUTE_UNUSED, + const char *name ATTRIBUTE_UNUSED, + size_t nparams ATTRIBUTE_UNUSED, + virLockManagerParamPtr params ATTRIBUTE_UNUSED, + unsigned int flags ATTRIBUTE_UNUSED) +{ + + return 0; +} + + +static int virLockManagerNopAcquireObject(virLockManagerPtr lock ATTRIBUTE_UNUSED, + const char *state ATTRIBUTE_UNUSED, + unsigned int flags ATTRIBUTE_UNUSED) +{ + + return 0; +} + +static int virLockManagerNopAttachObject(virLockManagerPtr lock ATTRIBUTE_UNUSED, + unsigned int flags ATTRIBUTE_UNUSED) +{ + + return 0; +} + +static int virLockManagerNopDetachObject(virLockManagerPtr lock ATTRIBUTE_UNUSED, + unsigned int flags ATTRIBUTE_UNUSED) +{ + + return 0; +} + + +static int virLockManagerNopReleaseObject(virLockManagerPtr lock ATTRIBUTE_UNUSED, + unsigned int flags ATTRIBUTE_UNUSED) +{ + + return 0; +} + +static int virLockManagerNopGetState(virLockManagerPtr lock ATTRIBUTE_UNUSED, + char **state ATTRIBUTE_UNUSED, + unsigned int flags ATTRIBUTE_UNUSED) +{ + + *state = NULL; + + return 0; +} + + +static int 
virLockManagerNopAcquireResource(virLockManagerPtr lock ATTRIBUTE_UNUSED, + unsigned int type ATTRIBUTE_UNUSED, + const char *name ATTRIBUTE_UNUSED, + size_t nparams ATTRIBUTE_UNUSED, + virLockManagerParamPtr params ATTRIBUTE_UNUSED, + unsigned int flags ATTRIBUTE_UNUSED) +{ + + return 0; +} + +static int virLockManagerNopReleaseResource(virLockManagerPtr lock ATTRIBUTE_UNUSED, + unsigned int type ATTRIBUTE_UNUSED, + const char *name ATTRIBUTE_UNUSED, + size_t nparams ATTRIBUTE_UNUSED, + virLockManagerParamPtr params ATTRIBUTE_UNUSED, + unsigned int flags ATTRIBUTE_UNUSED) +{ + + return 0; +} + +static void virLockManagerNopFree(virLockManagerPtr man) +{ + VIR_FREE(man); +} + +virLockDriver virLockDriverNop = +{ + .version = VIR_LOCK_MANAGER_VERSION, + .flags = VIR_LOCK_MANAGER_MODE_CONTENT | VIR_LOCK_MANAGER_MODE_METADATA, + + .drvInit = virLockManagerNopInit, + .drvDeinit = virLockManagerNopDeinit, + + .drvNew = virLockManagerNopNew, + .drvFree = virLockManagerNopFree, + + .drvAddResource = virLockManagerNopAddResource, + + .drvAcquireObject = virLockManagerNopAcquireObject, + .drvAttachObject = virLockManagerNopAttachObject, + .drvDetachObject = virLockManagerNopDetachObject, + .drvReleaseObject = virLockManagerNopReleaseObject, + + .drvGetState = virLockManagerNopGetState, + + .drvAcquireResource = virLockManagerNopAcquireResource, + .drvReleaseResource = virLockManagerNopReleaseResource, +}; diff --git a/src/locking/lock_driver_nop.h b/src/locking/lock_driver_nop.h new file mode 100644 index 0000000..4be5377 --- /dev/null +++ b/src/locking/lock_driver_nop.h @@ -0,0 +1,30 @@ +/* + * lock_driver_nop.h: A lock driver which locks nothing + * + * Copyright (C) 2010-2011 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. 
+ * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#ifndef __VIR_LOCK_DRIVER_NOP_H__ +# define __VIR_LOCK_DRIVER_NOP_H__ + +# include "lock_driver.h" + +extern virLockDriver virLockDriverNop; + + +#endif /* __VIR_LOCK_DRIVER_NOP_H__ */ diff --git a/src/locking/lock_manager.c b/src/locking/lock_manager.c index 9c98555..21862ee 100644 --- a/src/locking/lock_manager.c +++ b/src/locking/lock_manager.c @@ -22,6 +22,7 @@ #include <config.h> #include "lock_manager.h" +#include "lock_driver_nop.h" #include "virterror_internal.h" #include "logging.h" #include "util.h" @@ -122,35 +123,39 @@ virLockManagerPluginPtr virLockManagerPluginNew(const char *name, const char *moddir = getenv("LIBVIRT_LOCK_MANAGER_PLUGIN_DIR"); char *modfile = NULL; - if (moddir == NULL) - moddir = DEFAULT_LOCK_MANAGER_PLUGIN_DIR; + if (STREQ(name, "nop")) { + driver = &virLockDriverNop; + } else { + if (moddir == NULL) + moddir = DEFAULT_LOCK_MANAGER_PLUGIN_DIR; - VIR_DEBUG("Module load %s from %s", name, moddir); + VIR_DEBUG("Module load %s from %s", name, moddir); - if (virAsprintf(&modfile, "%s/%s.so", moddir, name) < 0) { - virReportOOMError(); - return NULL; - } + if (virAsprintf(&modfile, "%s/%s.so", moddir, name) < 0) { + virReportOOMError(); + return NULL; + } - if (access(modfile, R_OK) < 0) { - virReportSystemError(errno, - _("Plugin %s not accessible"), - modfile); - goto cleanup; - } + if (access(modfile, R_OK) < 0) { + virReportSystemError(errno, + _("Plugin %s not accessible"), + modfile); + goto cleanup; + } - handle = dlopen(modfile, RTLD_NOW | 
RTLD_LOCAL); - if (!handle) { - virLockError(VIR_ERR_SYSTEM_ERROR, - _("Failed to load plugin %s: %s"), - modfile, dlerror()); - goto cleanup; - } + handle = dlopen(modfile, RTLD_NOW | RTLD_LOCAL); + if (!handle) { + virLockError(VIR_ERR_SYSTEM_ERROR, + _("Failed to load plugin %s: %s"), + modfile, dlerror()); + goto cleanup; + } - if (!(driver = dlsym(handle, "virLockDriverImpl"))) { - virLockError(VIR_ERR_INTERNAL_ERROR, "%s", - _("Missing plugin initialization symbol 'virLockDriverImpl'")); - goto cleanup; + if (!(driver = dlsym(handle, "virLockDriverImpl"))) { + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Missing plugin initialization symbol 'virLockDriverImpl'")); + goto cleanup; + } } if (driver->drvInit(VIR_LOCK_MANAGER_VERSION, flags) < 0) { -- 1.7.3.4
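For readers following the lock_manager.c hunk above, here is a minimal, self-contained sketch of the dispatch it introduces: the name "nop" resolves to a statically linked callback table, while any other name would take the dlopen() path. The demo_* names and the reduced callback set are hypothetical stand-ins, not the real virLockDriver layout, and the plugin branch is stubbed out rather than actually calling dlopen().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for virLockDriver: a table of callbacks. */
typedef struct {
    const char *name;
    int (*init)(unsigned int version, unsigned int flags);
    int (*acquire)(const char *resource);
    int (*release)(const char *resource);
} demo_lock_driver;

/* The "nop" callbacks accept everything and lock nothing. */
static int demo_nop_init(unsigned int version, unsigned int flags)
{
    (void)version;
    (void)flags;
    return 0;
}

static int demo_nop_acquire(const char *resource)
{
    (void)resource;
    return 0;
}

static int demo_nop_release(const char *resource)
{
    (void)resource;
    return 0;
}

/* Compiled into the binary, so it is always available. */
static demo_lock_driver demo_driver_nop = {
    .name = "nop",
    .init = demo_nop_init,
    .acquire = demo_nop_acquire,
    .release = demo_nop_release,
};

/* Resolve a driver by name and initialize it. Returns NULL on failure. */
static demo_lock_driver *demo_driver_lookup(const char *name)
{
    demo_lock_driver *driver;

    assert(name != NULL);

    if (strcmp(name, "nop") == 0) {
        /* Built-in driver: no filesystem access, no dlopen() */
        driver = &demo_driver_nop;
    } else {
        /* The real code builds "<moddir>/<name>.so" and dlopen()s it here.
         * This sketch has no plugins, so it simply reports failure. */
        return NULL;
    }

    /* As in the patch, init runs for built-in and loaded drivers alike */
    if (driver->init(1, 0) < 0)
        return NULL;

    return driver;
}
```

With this shape a caller can rely on demo_driver_lookup("nop") succeeding, which is the guarantee the commit message asks for on behalf of hypervisor drivers.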

To facilitate use of the locking plugins from hypervisor drivers, introduce a higher level API for locking virDomainObjPtr instances. It includes APIs targeted at VM startup and hotplug/unplug * src/Makefile.am: Add domain lock API * src/locking/domain_lock.c, src/locking/domain_lock.h: High level API for domain locking --- src/Makefile.am | 3 +- src/libvirt_private.syms | 13 ++ src/locking/README | 7 + src/locking/domain_lock.c | 442 +++++++++++++++++++++++++++++++++++++++++++++ src/locking/domain_lock.h | 70 +++++++ 5 files changed, 534 insertions(+), 1 deletions(-) create mode 100644 src/locking/domain_lock.c create mode 100644 src/locking/domain_lock.h diff --git a/src/Makefile.am b/src/Makefile.am index 9bd20e5..b68a9b4 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -94,7 +94,8 @@ DRIVER_SOURCES = \ libvirt.c libvirt_internal.h \ locking/lock_manager.c locking/lock_manager.h \ locking/lock_driver.h \ - locking/lock_driver_nop.h locking/lock_driver_nop.c + locking/lock_driver_nop.h locking/lock_driver_nop.c \ + locking/domain_lock.h locking/domain_lock.c # XML configuration format handling sources diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b45f501..8005f20 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -357,6 +357,19 @@ virDomainEventWatchdogNewFromDom; virDomainEventWatchdogNewFromObj; +# domain_lock.h +virDomainLockFree; +virDomainLockForExec; +virDomainLockForStartup; +virDomainLockForShutdown; +virDomainLockForModify; +virDomainLockBeginDiskAttach; +virDomainLockBeginDiskDetach; +virDomainLockEndDiskAttach; +virDomainLockEndDiskDetach; +virDomainLockReleaseAndFree; + + # domain_nwfilter.h virDomainConfNWFilterInstantiate; virDomainConfNWFilterRegister; diff --git a/src/locking/README b/src/locking/README index 4fa4f89..da2a8f8 100644 --- a/src/locking/README +++ b/src/locking/README @@ -1,3 +1,10 @@ + Using the Lock Manager APIs + =========================== + +This file describes how to use the
lock manager APIs. +All the guest lifecycle sequences here have higher +level wrappers provided by the 'domain_lock.h' API, +which simplify the usage At libvirtd startup: diff --git a/src/locking/domain_lock.c b/src/locking/domain_lock.c new file mode 100644 index 0000000..60b7926 --- /dev/null +++ b/src/locking/domain_lock.c @@ -0,0 +1,442 @@ +/* + * domain_lock.c: Locking for domain lifecycle operations + * + * Copyright (C) 2010-2011 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#include <config.h> + +#include <intprops.h> + +#include "domain_lock.h" +#include "memory.h" +#include "uuid.h" +#include "virterror_internal.h" +#include "logging.h" + +#define VIR_FROM_THIS VIR_FROM_LOCKING + +struct _virDomainLock { + virLockManagerPtr contentLock; + virLockManagerPtr metadataLock; + + pid_t pid; + + bool releaseContentLock; + bool releaseMetadataLock; +}; + +static virLockManagerPtr virDomainLockNewManager(virLockManagerPluginPtr plugin, + virDomainObjPtr vm, + const char *state, + int mode, + bool acquireLock) +{ + virLockManagerPtr lock; + int i; + virLockManagerParam params[] = { + { .type = VIR_LOCK_MANAGER_PARAM_TYPE_UUID, + .key = "uuid", + }, + { .type = VIR_LOCK_MANAGER_PARAM_TYPE_STRING, + .key = "name", + .value = { .str = vm->def->name },
+ }, + { .type = VIR_LOCK_MANAGER_PARAM_TYPE_UINT, + .key = "id", + .value = { .i = vm->def->id }, + }, + { .type = VIR_LOCK_MANAGER_PARAM_TYPE_UINT, + .key = "pid", + .value = { .i = vm->pid }, + }, + }; + VIR_DEBUG("plugin=%p vm=%p state=%s mode=%d acquire=%d", + plugin, vm, state, mode, acquireLock); + + memcpy(params[0].value.uuid, vm->def->uuid, VIR_UUID_BUFLEN); + + if (!(lock = virLockManagerNew(plugin, + VIR_LOCK_MANAGER_OBJECT_TYPE_DOMAIN, + ARRAY_CARDINALITY(params), + params, + mode))) + return NULL; + + if (acquireLock) { + VIR_DEBUG0("Acquiring leases"); + for (i = 0 ; i < vm->def->nleases ; i++) { + virDomainLeaseDefPtr lease = vm->def->leases[i]; + unsigned int leaseFlags = 0; + virLockManagerParam lparams[] = { + { .type = VIR_LOCK_MANAGER_PARAM_TYPE_STRING, + .key = "path", + .value = { .str = lease->path }, + }, + { .type = VIR_LOCK_MANAGER_PARAM_TYPE_ULONG, + .key = "offset", + .value = { .ul = lease->offset }, + }, + { .type = VIR_LOCK_MANAGER_PARAM_TYPE_ULONG, + .key = "length", + .value = { .ul = lease->length }, + }, + }; + + VIR_DEBUG("Acquire lease %s", lease->path); + if (virLockManagerAddResource(lock, + VIR_LOCK_MANAGER_RESOURCE_TYPE_LEASE, + lease->key, + ARRAY_CARDINALITY(lparams), + lparams, + leaseFlags) < 0) { + VIR_DEBUG("Failed lease %s", lease->path); + virLockManagerFree(lock); + return NULL; + } + } + + VIR_DEBUG0("Acquiring disks"); + for (i = 0 ; i < vm->def->ndisks ; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + unsigned int diskFlags = 0; + if (!disk->src) + continue; + + if (!(disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK || + disk->type == VIR_DOMAIN_DISK_TYPE_FILE || + disk->type == VIR_DOMAIN_DISK_TYPE_DIR)) + continue; + + if (disk->readonly) + diskFlags |= VIR_LOCK_MANAGER_RESOURCE_READONLY; + if (disk->shared) + diskFlags |= VIR_LOCK_MANAGER_RESOURCE_SHARED; + + VIR_DEBUG("Acquire disk %s", disk->src); + if (virLockManagerAddResource(lock, + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + disk->src, + 0, + NULL, + 
diskFlags) < 0) { + VIR_DEBUG("Failed disk %s", disk->src); + virLockManagerFree(lock); + return NULL; + } + } + + if (virLockManagerAcquireObject(lock, state, 0) < 0) + goto error; + } else { + if (virLockManagerAttachObject(lock, 0) < 0) + goto error; + } + + return lock; + +error: + virLockManagerFree(lock); + return NULL; +} + + +static virDomainLockPtr virDomainLockNew(virLockManagerPluginPtr contentLockPlugin, + virLockManagerPluginPtr metadataLockPlugin, + virDomainObjPtr dom, + const char *contentState, + const char *metadataState, + bool acquireContentLock, + bool acquireMetadataLock) +{ + virDomainLockPtr lock; + + if (VIR_ALLOC(lock) < 0) { + virReportOOMError(); + goto error; + } + + if (contentLockPlugin && + !(lock->contentLock = virDomainLockNewManager(contentLockPlugin, + dom, + contentState, + VIR_LOCK_MANAGER_MODE_CONTENT, + acquireContentLock))) + goto error; + + lock->releaseContentLock = acquireContentLock; + + if (metadataLockPlugin && + !(lock->metadataLock = virDomainLockNewManager(metadataLockPlugin, + dom, + metadataState, + VIR_LOCK_MANAGER_MODE_METADATA, + acquireMetadataLock))) + goto error; + + lock->releaseMetadataLock = acquireMetadataLock; + + lock->pid = dom->pid; + + return lock; + +error: + virDomainLockFree(lock); + return NULL; +} + + +virDomainLockPtr virDomainLockForExec(virLockManagerPluginPtr contentLockPlugin, + const char *contentState, + virDomainObjPtr dom) +{ + VIR_DEBUG("contentLockPlugin=%p contentState=%s dom=%p", + contentLockPlugin, contentState, dom); + + return virDomainLockNew(contentLockPlugin, + NULL, + dom, + contentState, + NULL, + true, + false); +} + +virDomainLockPtr virDomainLockForStartup(virLockManagerPluginPtr contentLockPlugin, + virLockManagerPluginPtr metadataLockPlugin, + const char *metadataState, + virDomainObjPtr dom) +{ + VIR_DEBUG("contentLockPlugin=%p metadataLockPlugin=%p metadataState=%s dom=%p", + contentLockPlugin, metadataLockPlugin, metadataState, dom); + + return 
virDomainLockNew(contentLockPlugin, + metadataLockPlugin, + dom, + NULL, + metadataState, + false, + true); +} + +virDomainLockPtr virDomainLockForShutdown(virLockManagerPluginPtr contentLockPlugin, + virLockManagerPluginPtr metadataLockPlugin, + virDomainObjPtr dom) +{ + VIR_DEBUG("contentLockPlugin=%p metadataLockPlugin=%p dom=%p", + contentLockPlugin, metadataLockPlugin, dom); + + return virDomainLockNew(contentLockPlugin, + metadataLockPlugin, + dom, + NULL, + NULL, + true, + true); +} + +virDomainLockPtr virDomainLockForModify(virLockManagerPluginPtr contentLockPlugin, + virLockManagerPluginPtr metadataLockPlugin, + virDomainObjPtr dom) +{ + VIR_DEBUG("contentLockPlugin=%p metadataLockPlugin=%p dom=%p", + contentLockPlugin, metadataLockPlugin, dom); + + return virDomainLockNew(contentLockPlugin, + metadataLockPlugin, + dom, + NULL, + NULL, + false, + false); +} + + +static int virDomainLockDiskOperation(virDomainLockPtr lock, + virDomainDiskDefPtr disk, + bool isBegin, + bool isAttach) +{ + unsigned int diskFlags = 0; + if (!disk->src) + return 0; + + if (!(disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK || + disk->type == VIR_DOMAIN_DISK_TYPE_FILE || + disk->type == VIR_DOMAIN_DISK_TYPE_DIR)) + return 0; + + if (disk->readonly) + diskFlags |= VIR_LOCK_MANAGER_RESOURCE_READONLY; + if (disk->shared) + diskFlags |= VIR_LOCK_MANAGER_RESOURCE_SHARED; + + if (isAttach) { + if (isBegin) { + if (lock->contentLock && + virLockManagerAcquireResource(lock->contentLock, + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + disk->src, + 0, + NULL, + diskFlags) < 0) + return -1; + + if (lock->metadataLock && + virLockManagerAcquireResource(lock->metadataLock, + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + disk->src, + 0, + NULL, + diskFlags) < 0) { + virLockManagerReleaseResource(lock->contentLock, + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + disk->src, + 0, + NULL, + diskFlags); + return -1; + } + } else { + if (lock->metadataLock && + virLockManagerReleaseResource(lock->metadataLock, + 
VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + disk->src, + 0, + NULL, + diskFlags) < 0) + return -1; + } + } else { + if (isBegin) { + if (lock->metadataLock && + virLockManagerAcquireResource(lock->metadataLock, + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + disk->src, + 0, + NULL, + diskFlags) < 0) { + return -1; + } + } else { + if (lock->metadataLock && + virLockManagerReleaseResource(lock->metadataLock, + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + disk->src, + 0, + NULL, + diskFlags) < 0) + return -1; + + if (lock->contentLock && + virLockManagerReleaseResource(lock->contentLock, + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + disk->src, + 0, + NULL, + diskFlags) < 0) + return -1; + + } + } + + return 0; +} + +int virDomainLockBeginDiskAttach(virDomainLockPtr lock, + virDomainDiskDefPtr disk) +{ + return virDomainLockDiskOperation(lock, disk, true, true); +} + +int virDomainLockEndDiskAttach(virDomainLockPtr lock, + virDomainDiskDefPtr disk) +{ + return virDomainLockDiskOperation(lock, disk, false, true); +} + + +int virDomainLockBeginDiskDetach(virDomainLockPtr lock, + virDomainDiskDefPtr disk) +{ + return virDomainLockDiskOperation(lock, disk, true, false); +} + + +int virDomainLockEndDiskDetach(virDomainLockPtr lock, + virDomainDiskDefPtr disk) +{ + return virDomainLockDiskOperation(lock, disk, false, false); +} + + +void virDomainLockFree(virDomainLockPtr lock) +{ + if (!lock) + return; + + virLockManagerFree(lock->metadataLock); + virLockManagerFree(lock->contentLock); + + VIR_FREE(lock); +} + +void virDomainLockReleaseAndFree(virDomainLockPtr lock) +{ + if (!lock) + return; + + if (lock->metadataLock) { + if (lock->releaseMetadataLock) + virLockManagerReleaseObject(lock->metadataLock, 0); + else + virLockManagerDetachObject(lock->metadataLock, 0); + } + + if (lock->contentLock) { + if (lock->releaseContentLock) + virLockManagerReleaseObject(lock->contentLock, 0); + else + virLockManagerDetachObject(lock->contentLock, 0); + } + + virDomainLockFree(lock); +} + +int 
virDomainLockGetState(virDomainLockPtr lock, + char **contentState, + char **metadataState, + unsigned int flags) +{ + *contentState = NULL; + *metadataState = NULL; + + if (lock->contentLock) { + if (virLockManagerGetState(lock->contentLock, contentState, flags) < 0) + return -1; + } + + if (lock->metadataLock) { + if (virLockManagerGetState(lock->metadataLock, metadataState, flags) < 0) { + VIR_FREE(*contentState); + return -1; + } + } + + return 0; +} diff --git a/src/locking/domain_lock.h b/src/locking/domain_lock.h new file mode 100644 index 0000000..d49fb94 --- /dev/null +++ b/src/locking/domain_lock.h @@ -0,0 +1,70 @@ +/* + * domain_lock.c: Locking for domain lifecycle operations + * + * Copyright (C) 2010-2011 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#ifndef __VIR_DOMAIN_LOCK_H__ +# define __VIR_DOMAIN_LOCK_H__ + +# include "internal.h" +# include "domain_conf.h" +# include "lock_manager.h" + +typedef struct _virDomainLock virDomainLock; +typedef virDomainLock *virDomainLockPtr; + +virDomainLockPtr virDomainLockForExec(virLockManagerPluginPtr contentLockPlugin, + const char *state, + virDomainObjPtr dom); + +virDomainLockPtr virDomainLockForStartup(virLockManagerPluginPtr contentLockPlugin, + virLockManagerPluginPtr metadataLockPlugin, + const char *state, + virDomainObjPtr dom); + +virDomainLockPtr virDomainLockForShutdown(virLockManagerPluginPtr contentLockPlugin, + virLockManagerPluginPtr metadataLockPlugin, + virDomainObjPtr dom); + +virDomainLockPtr virDomainLockForModify(virLockManagerPluginPtr contentLockPlugin, + virLockManagerPluginPtr metadataLockPlugin, + virDomainObjPtr dom); + +int virDomainLockBeginDiskAttach(virDomainLockPtr dom, + virDomainDiskDefPtr disk); + +int virDomainLockEndDiskAttach(virDomainLockPtr dom, + virDomainDiskDefPtr disk); + +int virDomainLockBeginDiskDetach(virDomainLockPtr dom, + virDomainDiskDefPtr disk); + +int virDomainLockEndDiskDetach(virDomainLockPtr dom, + virDomainDiskDefPtr disk); + +void virDomainLockFree(virDomainLockPtr lock); +void virDomainLockReleaseAndFree(virDomainLockPtr lock); + +int virDomainLockGetState(virDomainLockPtr manager, + char **contentState, + char **metadataState, + unsigned int flags); + + +#endif /* __VIR_DOMAIN_LOCK_H__ */ -- 1.7.3.4
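The ordering rules in virDomainLockDiskOperation() are easy to lose in the four-way branch, so here is a small sketch that models them with two in-memory lockspaces. Everything named demo_* is hypothetical; each lockspace tracks a single disk as a boolean, and fail_next_acquire exists only so the rollback path can be exercised. One deliberate difference from the patch, which is an assumption about intent: the rollback after a failed metadata acquire is guarded against a missing content lockspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one lockspace holding at most one disk lock. */
typedef struct {
    bool held;
    bool fail_next_acquire;   /* test hook: force the next acquire to fail */
} demo_lockspace;

static int demo_acquire(demo_lockspace *ls)
{
    assert(ls != NULL);
    if (ls->fail_next_acquire) {
        ls->fail_next_acquire = false;
        return -1;
    }
    ls->held = true;
    return 0;
}

static int demo_release(demo_lockspace *ls)
{
    assert(ls != NULL);
    ls->held = false;
    return 0;
}

/*
 * Same branch structure as virDomainLockDiskOperation():
 *   begin attach: acquire content, then metadata (undo content on failure)
 *   end attach:   release metadata only; content stays with the guest
 *   begin detach: acquire metadata only; content is already held
 *   end detach:   release metadata, then content
 * Either lockspace may be NULL, meaning "not configured".
 */
static int demo_disk_operation(demo_lockspace *content,
                               demo_lockspace *metadata,
                               bool is_begin,
                               bool is_attach)
{
    if (is_attach) {
        if (is_begin) {
            if (content && demo_acquire(content) < 0)
                return -1;
            if (metadata && demo_acquire(metadata) < 0) {
                if (content)
                    demo_release(content);
                return -1;
            }
        } else {
            if (metadata && demo_release(metadata) < 0)
                return -1;
        }
    } else {
        if (is_begin) {
            if (metadata && demo_acquire(metadata) < 0)
                return -1;
        } else {
            if (metadata && demo_release(metadata) < 0)
                return -1;
            if (content && demo_release(content) < 0)
                return -1;
        }
    }

    return 0;
}
```

Walking a disk through attach and then detach leaves the content lock held for exactly the time the guest has the disk, while the metadata lock is only held inside each begin/end pair.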

--- docs/internals-locking.html.in | 301 ++++++++++++++++++++++++++++++++++++++++ 1 files changed, 301 insertions(+), 0 deletions(-) create mode 100644 docs/internals-locking.html.in diff --git a/docs/internals-locking.html.in b/docs/internals-locking.html.in new file mode 100644 index 0000000..90054f0 --- /dev/null +++ b/docs/internals-locking.html.in @@ -0,0 +1,301 @@ +<html> + <body> + <h1>Resource Lock Manager</h1> + + <ul id="toc"></ul> + + <p> + This page describes the design of the resource lock manager + that is used for locking disk images with the QEMU driver. + </p> + + <h2><a name="goals">Goals</a></h2> + + <p> + The high level goal is to prevent the same disk image being + used by more than one QEMU instance at a time (unless the + disk is marked as sharable, or readonly). The scenarios + to be prevented are thus: + </p> + + <ol> + <li> + Two different guests running configured to point at the + same disk image. + </li> + <li> + One guest being started more than once on two different + machines due to admin mistake + </li> + <li> + One guest being started more than once on a single machine + due to a libvirt driver bug. + </li> + </ol> + + <h2><a name="requirement">Requirements</a></h2> + + <p> + The high level goal leads to a set of requirements + for the lock manager design + </p> + + <ol> + <li> + A lock must be held on a disk whenever a QEMU process + has the disk open + </li> + <li> + The lock scheme must allow QEMU to be configured with + readonly, shared write, or exclusive writable disks + </li> + <li> + A lock must be held on a disk whenever libvirtd makes + changes to user/group ownership and SELinux labelling. + </li> + <li> + At least one locking impl must allow use of libvirtd on + a single host without any admin config tasks + </li> + <li> + A lock handover must be performed during the migration + process where 2 QEMU processes will have the same disk + open concurrently.
+ </li> + <li> + The lock manager must be able to identify and kill the + process accessing the resource if the lock is revoked. + </li> + </ol> + + <h2><a name="design">Design</a></h2> + + <p> + The requirements call for a design with two distinct lockspaces: + </p> + + <ol> + <li> + The <strong>primary lockspace</strong> is used to protect the content of + disk images. This will honour the disk sharing modes to + allow readonly/shared disks to be assigned to multiple + guests concurrently. + </li> + <li> + The <strong>secondary lockspace</strong> is used to protect the metadata + of disk images. This lock will be held whenever file + permissions / ownership / attributes are changed, and + is always exclusive, regardless of sharing mode. The + primary lock will be held prior to obtaining the secondary + lock. + </li> + </ol> + + <p> + Within each lockspace the following operations will need to be + supported + </p> + + <ul> + <li> + <strong>Acquire object lock</strong> + Acquire locks on all resources initially + registered against an object + </li> + <li> + <strong>Release object lock</strong> + Release locks on all resources currently + registered against an object + </li> + <li> + <strong>Associate object lock</strong> + Associate the current process with an existing + set of locks for an object + </li> + <li> + <strong>Disassociate object lock</strong> + Disassociate the current process from an + existing set of locks for an object. + </li> + <li> + <strong>Register resource</strong> + Register an initial resource against an object + </li> + <li> + <strong>Get object lock state</strong> + Obtain a representation of the current object + lock state. + </li> + <li> + <strong>Acquire a resource lock</strong> + Register and acquire a lock for a resource + to be added to a locked object.
+ </li> + <li> + <strong>Release a resource lock</strong> + Deregister and release a lock for a resource + to be removed from a locked object + </li> + </ul> + + <h2><a name="impl">Plugin Implementations</a></h2> + + <p> + Lock manager implementations are provided as LGPLv2+ + licensed, dlopen()able library modules. A different + lock manager implementation may be used + for the primary and secondary lockspaces. With the + QEMU driver, these can be configured via the + <code>/etc/libvirt/qemu.conf</code> configuration + file by specifying the lock manager name. + </p> + + <pre> + content_lock_manager = "fcntl" + metadata_lock_manager = "fcntl" + </pre> + + <p> + Lock manager implementations are free to support + both content and metadata locks, however, if the + plugin author is only able to handle one lockspace, + the other can be delegated to the standard fcntl + lock manager. The QEMU driver will load the lock + manager plugin binaries from the following location + </p> + + <pre> +/usr/{lib,lib64}/libvirt/lock_manager/$NAME.so +</pre> + + <p> + The lock manager plugin must export a single ELF + symbol named <code>virLockDriverImpl</code>, which is + a static instance of the <code>virLockDriver</code> + struct. The struct is defined in the header file + </p> + + <pre> + #include <libvirt/plugins/lock_manager.h> + </pre> + + <p> + All callbacks in the struct must be initialized + to non-NULL pointers. The semantics of each + callback are defined in the API docs embedded + in the previously mentioned header file + </p> + + <h2><a name="usagePatterns">Lock usage patterns</a></h2> + + <p> + The following pseudo code illustrates the common + patterns of operations invoked on the lock + manager plugin callbacks. + </p> + + <h3><a name="usageLockAcquire">Lock acquisition</a></h3> + + <p> + Lock acquisition will always be performed from the + process that is to own the lock.
This is typically + the QEMU child process, in between the fork+exec + pairing, but it may occasionally be held directly + by libvirtd. + </p> + + <pre> + mgr = virLockManagerNew(lockPlugin, + VIR_LOCK_MANAGER_MODE_CONTENT, + VIR_LOCK_MANAGER_TYPE_DOMAIN); + virLockManagerSetParameter(mgr, "uuid", $uuid); + virLockManagerSetParameter(mgr, "name", $name); + + foreach (initial disks) + virLockManagerAddResource(mgr, + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + $path, $flags); + + if (virLockManagerAcquireObject(mgr) < 0) + ...abort... + </pre> + + <p> + The lock is implicitly released when the process + that acquired it exits, however, a process may + voluntarily give up the lock by running + </p> + + <pre> + virLockManagerReleaseObject(mgr); + </pre> + + <h3><a name="usageLockAttach">Lock attachment</a></h3> + + <p> + Any time a process needs to do work on behalf of + another process that holds a lock, it will associate + itself with the existing lock. This sequence is + identical to the previous one, except for the + last step. + </p> + + + <pre> + mgr = virLockManagerNew(contentLock, + VIR_LOCK_MANAGER_MODE_CONTENT, + VIR_LOCK_MANAGER_TYPE_DOMAIN); + virLockManagerSetParameter(mgr, "uuid", $uuid); + virLockManagerSetParameter(mgr, "name", $name); + + foreach (current disks) + virLockManagerAddResource(mgr, + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + $path, $flags); + + if (virLockManagerAttachObject(mgr, $pid) < 0) + ...abort... + </pre> + + <p> + A lock association will always be explicitly broken + by running + </p> + + <pre> + virLockManagerDetachObject(mgr, $pid); + </pre> + + + <h3><a name="usageLiveResourceChange">Live resource changes</a></h3> + + <p> + When adding a resource to an existing locked object (eg to + hotplug a disk into a VM), the lock manager will first + attach to the locked object, acquire a lock on the + new resource, then detach from the locked object. + </p> + + <pre> + ... initial glue ...
+ if (virLockManagerAttachObject(mgr, $pid) < 0) + ...abort... + + if (virLockManagerAcquireResource(mgr, + VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, + $path, $flags) < 0) + ...abort... + + ...assign resource to object + + virLockManagerDetachObject(mgr, $pid) + </pre> + + <p> + Removing a resource from an existing object is an identical + process, but with <code>virLockManagerReleaseResource</code> + invoked instead + </p> + + </body> +</html> -- 1.7.3.4
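The "Lock acquisition" section of the page above makes two claims: the lock is taken by the process that is to own it, and it goes away implicitly when that process exits. The sketch below demonstrates both with plain POSIX fcntl() record locks. That mechanism is a stand-in chosen for illustration only; it is not the plugin API from this series (the cover letter lists the fcntl plugin as still to do), and the demo_* names and the temporary file are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to take an exclusive, whole-file fcntl() lock without blocking. */
static int demo_try_lock(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;   /* l_start = 0 and l_len = 0: the whole file */
    return fcntl(fd, F_SETLK, &fl);
}

/* Return the pid holding a conflicting lock, 0 if none, -1 on error. */
static pid_t demo_lock_holder(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fcntl(fd, F_GETLK, &fl) < 0)
        return -1;
    return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}

/*
 * Fork a child that takes the lock itself, in the spot where the QEMU
 * child would sit between fork and exec. Returns 1 if the parent saw the
 * child as lock owner while it ran and saw no owner after it exited,
 * 0 if not, and -1 if the setup failed.
 */
static int demo_child_owns_lock(void)
{
    char path[] = "/tmp/demo-lock-XXXXXX";
    int ready[2], done[2];
    pid_t child, during, after;
    char status = 'n';
    int fd;

    if ((fd = mkstemp(path)) < 0)
        return -1;
    if (pipe(ready) < 0 || pipe(done) < 0)
        return -1;
    if ((child = fork()) < 0)
        return -1;

    if (child == 0) {
        /* Child: acquire, report the result, wait for the parent, exit */
        close(ready[0]);
        close(done[1]);
        if (demo_try_lock(fd) == 0)
            status = 'y';
        if (write(ready[1], &status, 1) != 1)
            _exit(1);
        /* Returns at EOF, once the parent closes its write end */
        if (read(done[0], &status, 1) < 0)
            _exit(1);
        _exit(0);
    }

    close(ready[1]);
    close(done[0]);

    if (read(ready[0], &status, 1) != 1)
        status = 'n';
    during = demo_lock_holder(fd);   /* child is alive and holds the lock */

    close(done[1]);                  /* let the child exit */
    waitpid(child, NULL, 0);
    after = demo_lock_holder(fd);    /* exit dropped the lock implicitly */

    close(ready[0]);
    close(fd);
    unlink(path);

    return status == 'y' && during == child && after == 0;
}
```

The parent never takes the lock itself; it only observes who holds it, which is roughly the position libvirtd is in once the child owns the content lock.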

The QEMU driver integrates with the lock manager infrastructure in a number of key places * During startup, a lock is acquired in between the fork & exec * During startup, the libvirtd process acquires a lock before setting file labelling * During shutdown, the libvirtd process acquires a lock before restoring file labelling * During hotplug, unplug & media change the libvirtd process holds a lock while setting/restoring labels The main content lock is only ever held by the QEMU child process, or libvirtd during VM shutdown. The rest of the operations only require libvirtd to hold the metadata locks, relying on the active QEMU still holding the content lock. * src/qemu/qemu_conf.c, src/qemu/qemu_conf.h, src/qemu/libvirtd_qemu.aug, src/qemu/test_libvirtd_qemu.aug: Add config parameters for configuring lock managers * src/qemu/qemu_driver.c: Add calls to the lock manager --- src/qemu/libvirtd_qemu.aug | 2 + src/qemu/qemu.conf | 16 ++++ src/qemu/qemu_conf.c | 28 +++++++ src/qemu/qemu_conf.h | 5 + src/qemu/qemu_driver.c | 106 +++++++++++++++++++++--- src/qemu/qemu_hotplug.c | 170 +++++++++++++++++++++++++++++++++------ src/qemu/test_libvirtd_qemu.aug | 6 ++ 7 files changed, 293 insertions(+), 40 deletions(-) diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug index 2f37015..55eb22c 100644 --- a/src/qemu/libvirtd_qemu.aug +++ b/src/qemu/libvirtd_qemu.aug @@ -44,6 +44,8 @@ module Libvirtd_qemu = | bool_entry "clear_emulator_capabilities" | bool_entry "allow_disk_format_probing" | bool_entry "set_process_name" + | str_entry "content_lock_manager" + | str_entry "metadata_lock_manager" (* Each enty in the config is one of the following three ... *) let entry = vnc_entry diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf index 66310d4..5c53e4d 100644 --- a/src/qemu/qemu.conf +++ b/src/qemu/qemu.conf @@ -272,3 +272,19 @@ # its arguments) appear in process listings.
# # set_process_name = 1 + + +# To enable strict 'fcntl' based locking of the file +# content (to prevent two VMs writing to the same +# disk), start the 'virtlockd' service, and uncomment +# this +# +# content_lock_manager = "fcntl" + + +# To enable strict 'fcntl' based locking of the file +# metadata (to prevent two libvirtd daemons on different +# hosts doing conflicting metadata changes), start the +# 'virtlockd' service, and uncomment this +# +# metadata_lock_manager = "fcntl" diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index 9f9e99e..af952ba 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -115,6 +115,14 @@ int qemudLoadDriverConfig(struct qemud_driver *driver, } #endif + if (!(driver->contentLockManager = + virLockManagerPluginNew("nop", + VIR_LOCK_MANAGER_MODE_CONTENT))) + VIR_ERROR0(_("Failed to load lock manager nop")); + if (!(driver->metadataLockManager = + virLockManagerPluginNew("nop", + VIR_LOCK_MANAGER_MODE_METADATA))) + VIR_ERROR0(_("Failed to load lock manager nop")); /* Just check the file is readable before opening it, otherwise * libvirt emits an error. 
@@ -423,6 +431,26 @@ int qemudLoadDriverConfig(struct qemud_driver *driver, CHECK_TYPE ("set_process_name", VIR_CONF_LONG); if (p) driver->setProcessName = p->l; + p = virConfGetValue (conf, "content_lock_manager"); + CHECK_TYPE ("content_lock_manager", VIR_CONF_STRING); + if (p && p->str) { + virLockManagerPluginUnref(driver->contentLockManager); + if (!(driver->contentLockManager = + virLockManagerPluginNew(p->str, + VIR_LOCK_MANAGER_MODE_CONTENT))) + VIR_ERROR(_("Failed to load lock manager %s"), p->str); + } + + p = virConfGetValue (conf, "metadata_lock_manager"); + CHECK_TYPE ("metadata_lock_manager", VIR_CONF_STRING); + if (p && p->str) { + virLockManagerPluginUnref(driver->metadataLockManager); + if (!(driver->metadataLockManager = + virLockManagerPluginNew(p->str, + VIR_LOCK_MANAGER_MODE_METADATA))) + VIR_ERROR(_("Failed to load lock manager %s"), p->str); + } + virConfFree (conf); return 0; } diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h index af1be2e..1746e85 100644 --- a/src/qemu/qemu_conf.h +++ b/src/qemu/qemu_conf.h @@ -44,6 +44,7 @@ # include "macvtap.h" # include "command.h" # include "threadpool.h" +# include "locking/lock_manager.h" # define QEMUD_CPUMASK_LEN CPU_SETSIZE @@ -127,6 +128,10 @@ struct qemud_driver { virBitmapPtr reservedVNCPorts; virSysinfoDefPtr hostsysinfo; + + /* These two might point to the same instance */ + virLockManagerPluginPtr contentLockManager; + virLockManagerPluginPtr metadataLockManager; }; typedef struct _qemuDomainCmdlineDef qemuDomainCmdlineDef; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 34cc29f..096e771 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -85,6 +85,7 @@ #include "fdstream.h" #include "configmake.h" #include "threadpool.h" +#include "locking/domain_lock.h" #define VIR_FROM_THIS VIR_FROM_QEMU @@ -1329,6 +1330,15 @@ qemudStartup(int privileged) { } VIR_FREE(driverConf); + /* We should always at least have the 'nop' manager, so + * NULLs here are a fatal 
error + */ + if (!qemu_driver->contentLockManager || + !qemu_driver->metadataLockManager) { + VIR_ERROR0(_("Missing content/metadata lock managers")); + goto error; + } + if (qemuSecurityInit(qemu_driver) < 0) goto error; @@ -1573,6 +1583,9 @@ qemudShutdown(void) { virCgroupFree(&qemu_driver->cgroup); + virLockManagerPluginUnref(qemu_driver->contentLockManager); + virLockManagerPluginUnref(qemu_driver->metadataLockManager); + qemuDriverUnlock(qemu_driver); virMutexDestroy(&qemu_driver->lock); virThreadPoolFree(qemu_driver->workerPool); @@ -1761,7 +1774,7 @@ qemudFindCharDevicePTYs(virDomainObjPtr vm, int ret, i; /* The order in which QEMU prints out the PTY paths is - the order in which it procsses its serial and parallel + the order in which it processes its serial and parallel device args. This code must match that ordering.... */ /* first comes the serial devices */ @@ -2554,26 +2567,51 @@ struct qemudHookData { virConnectPtr conn; virDomainObjPtr vm; struct qemud_driver *driver; + char *lockState; }; static int qemudSecurityHook(void *data) { struct qemudHookData *h = data; + virDomainLockPtr lock; + int ret = -1; + + /* Some later calls want pid present */ + h->vm->pid = getpid(); + + VIR_DEBUG0("Obtaining domain lock"); + if (!(lock = virDomainLockForExec(h->driver->contentLockManager, + h->lockState, + h->vm))) + goto cleanup; /* This must take place before exec(), so that all QEMU * memory allocation is on the correct NUMA node */ + VIR_DEBUG0("Moving procss to cgroup"); if (qemuAddToCgroup(h->driver, h->vm->def) < 0) - return -1; + goto cleanup; /* This must be done after cgroup placement to avoid resetting CPU * affinity */ + VIR_DEBUG0("Setup CPU affinity"); if (qemudInitCpuAffinity(h->vm) < 0) - return -1; + goto cleanup; + VIR_DEBUG0("Setting up security labeling"); if (virSecurityManagerSetProcessLabel(h->driver->securityManager, h->vm) < 0) - return -1; + goto cleanup; - return 0; + ret = 0; + +cleanup: + /* Free, but don't release the lock. 
The object lock + * must remain for lifetime of QEMU process. + */ + VIR_DEBUG0("Releasing domain lock"); + virDomainLockFree(lock); + + VIR_DEBUG("Hook complete ret=%d", ret); + return ret; } static int @@ -2619,11 +2657,13 @@ static int qemudStartVMDaemon(virConnectPtr conn, char *timestamp; qemuDomainObjPrivatePtr priv = vm->privateData; virCommandPtr cmd = NULL; + virDomainLockPtr lock = NULL; struct qemudHookData hookData; hookData.conn = conn; hookData.vm = vm; hookData.driver = driver; + hookData.lockState = NULL; /* XXX */ DEBUG0("Beginning VM startup process"); @@ -2662,11 +2702,6 @@ static int qemudStartVMDaemon(virConnectPtr conn, } qemuDomainSecurityLabelAudit(vm, true); - DEBUG0("Generating setting domain security labels (if required)"); - if (virSecurityManagerSetAllLabel(driver->securityManager, - vm, stdin_path) < 0) - goto cleanup; - /* Ensure no historical cgroup for this VM is lying around bogus * settings */ DEBUG0("Ensuring no historical cgroup is lying around"); @@ -2847,10 +2882,14 @@ static int qemudStartVMDaemon(virConnectPtr conn, virCommandNonblockingFDs(cmd); virCommandSetPidFile(cmd, pidfile); virCommandDaemonize(cmd); + virCommandRequireHandshake(cmd); ret = virCommandRun(cmd, NULL); - VIR_WARN("Executing done %s", vm->def->emulator); VIR_FREE(pidfile); + /* QEMU isn't actually running properly yet. The child process + * exists, but is paused in the hook waiting for handshake to + * complete + */ /* wait for qemu process to to show up */ if (ret == 0) { @@ -2863,7 +2902,8 @@ static int qemudStartVMDaemon(virConnectPtr conn, } else if (ret == -2) { /* * XXX this is bogus. It isn't safe to set vm->pid = child - * because the child no longer exists. + * because the child no longer exists. Also the QEMU + * isn't really running properly */ /* The virExec process that launches the daemon failed. 
Pending on @@ -2878,6 +2918,31 @@ static int qemudStartVMDaemon(virConnectPtr conn, #endif } + VIR_DEBUG0("Waiting for handshake from child"); + if (virCommandHandshakeWait(cmd) < 0) { + ret = -1; + goto cleanup; + } + + VIR_DEBUG("Started handshake, doing locking %s", vm->def->emulator); + if (!(lock = virDomainLockForStartup(driver->contentLockManager, + driver->metadataLockManager, + NULL, /* XXX lock state */ + vm))) + goto cleanup; + + VIR_DEBUG0("Setting domain security labels"); + if (virSecurityManagerSetAllLabel(driver->securityManager, + vm, stdin_path) < 0) + goto cleanup; + + VIR_DEBUG0("Labelling done, completing handshake to child"); + if (virCommandHandshakeNotify(cmd) < 0) { + ret = -1; + goto cleanup; + } + VIR_DEBUG0("Handshake complete, child running"); + if (migrateFrom) start_paused = true; vm->state = start_paused ? VIR_DOMAIN_PAUSED : VIR_DOMAIN_RUNNING; @@ -2933,11 +2998,17 @@ static int qemudStartVMDaemon(virConnectPtr conn, goto cleanup; virCommandFree(cmd); + virDomainLockReleaseAndFree(lock); VIR_FORCE_CLOSE(logfile); return 0; cleanup: + /* Must release before we call into ShutdownVMDaemon() because + * that will re-aquire the lock in order to perform relabelling + */ + virDomainLockReleaseAndFree(lock); + /* We jump here if we failed to start the VM for any reason, or * if we failed to initialize the now running VM. 
kill it off and * pretend we never started it */ @@ -2960,6 +3031,7 @@ static void qemudShutdownVMDaemon(struct qemud_driver *driver, int logfile = -1; char *timestamp; char ebuf[1024]; + virDomainLockPtr lock = NULL; VIR_DEBUG("Shutting down VM '%s' pid=%d migrated=%d", vm->def->name, vm->pid, migrated); @@ -3040,8 +3112,14 @@ static void qemudShutdownVMDaemon(struct qemud_driver *driver, } /* Reset Security Labels */ - virSecurityManagerRestoreAllLabel(driver->securityManager, - vm, migrated); + if ((lock = virDomainLockForShutdown(driver->contentLockManager, + driver->metadataLockManager, + vm)) != NULL) { + virSecurityManagerRestoreAllLabel(driver->securityManager, + vm, migrated); + virDomainLockReleaseAndFree(lock); + lock = NULL; + } virSecurityManagerReleaseLabel(driver->securityManager, vm); /* Clear out dynamically assigned labels */ diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 8be993b..125d9b5 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -38,6 +38,7 @@ #include "pci.h" #include "files.h" #include "qemu_cgroup.h" +#include "locking/domain_lock.h" #define VIR_FROM_THIS VIR_FROM_QEMU @@ -51,6 +52,8 @@ int qemuDomainChangeEjectableMedia(struct qemud_driver *driver, int i; int ret; char *driveAlias = NULL; + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainLockPtr lock; origdisk = NULL; for (i = 0 ; i < vm->def->ndisks ; i++) { @@ -83,14 +86,28 @@ int qemuDomainChangeEjectableMedia(struct qemud_driver *driver, return -1; } + if (!(lock = virDomainLockForModify(driver->contentLockManager, + driver->metadataLockManager, + vm))) + return -1; + + if (virDomainLockBeginDiskAttach(lock, disk) < 0) { + virDomainLockReleaseAndFree(lock); + return -1; + } + if (virSecurityManagerSetImageLabel(driver->securityManager, - vm, disk) < 0) + vm, disk) < 0) { + virDomainLockReleaseAndFree(lock); return -1; + } + + if (virDomainLockEndDiskAttach(lock, disk) < 0) + VIR_WARN("Unable to release lock on disk %s", 
disk->src); if (!(driveAlias = qemuDeviceDriveHostAlias(origdisk, qemuCmdFlags))) goto error; - qemuDomainObjPrivatePtr priv = vm->privateData; qemuDomainObjEnterMonitorWithDriver(driver, vm); if (disk->src) { const char *format = NULL; @@ -113,9 +130,13 @@ int qemuDomainChangeEjectableMedia(struct qemud_driver *driver, if (ret < 0) goto error; - if (virSecurityManagerRestoreImageLabel(driver->securityManager, - vm, origdisk) < 0) - VIR_WARN("Unable to restore security label on ejected image %s", origdisk->src); + if (virDomainLockBeginDiskDetach(lock, origdisk) >= 0) { + if (virSecurityManagerRestoreImageLabel(driver->securityManager, + vm, origdisk) < 0) + VIR_WARN("Unable to restore security label on ejected image %s", origdisk->src); + if (virDomainLockEndDiskDetach(lock, origdisk) < 0) + VIR_WARN("Unable to release lock on disk %s", origdisk->src); + } VIR_FREE(origdisk->src); origdisk->src = disk->src; @@ -125,14 +146,22 @@ int qemuDomainChangeEjectableMedia(struct qemud_driver *driver, VIR_FREE(driveAlias); virDomainDiskDefFree(disk); + virDomainLockReleaseAndFree(lock); return ret; error: VIR_FREE(driveAlias); - if (virSecurityManagerRestoreImageLabel(driver->securityManager, - vm, disk) < 0) - VIR_WARN("Unable to restore security label on new media %s", disk->src); + + if (virDomainLockBeginDiskDetach(lock, disk) >= 0) { + if (virSecurityManagerRestoreImageLabel(driver->securityManager, + vm, disk) < 0) + VIR_WARN("Unable to restore security label on new media %s", disk->src); + if (virDomainLockEndDiskDetach(lock, disk) < 0) + VIR_WARN("Unable to release lock on disk %s", disk->src); + } + virDomainLockReleaseAndFree(lock); + return -1; } @@ -147,6 +176,7 @@ int qemuDomainAttachPciDiskDevice(struct qemud_driver *driver, qemuDomainObjPrivatePtr priv = vm->privateData; char *devstr = NULL; char *drivestr = NULL; + virDomainLockPtr lock = NULL; for (i = 0 ; i < vm->def->ndisks ; i++) { if (STREQ(vm->def->disks[i]->dst, disk->dst)) { @@ -156,9 +186,24 @@ int 
qemuDomainAttachPciDiskDevice(struct qemud_driver *driver, } } + if (!(lock = virDomainLockForModify(driver->contentLockManager, + driver->metadataLockManager, + vm))) + return -1; + + if (virDomainLockBeginDiskAttach(lock, disk) < 0) { + virDomainLockReleaseAndFree(lock); + return -1; + } + if (virSecurityManagerSetImageLabel(driver->securityManager, - vm, disk) < 0) + vm, disk) < 0) { + virDomainLockReleaseAndFree(lock); return -1; + } + + if (virDomainLockEndDiskAttach(lock, disk) < 0) + VIR_WARN("Unable to release lock on disk %s", disk->src); if (qemuCmdFlags & QEMUD_CMD_FLAG_DEVICE) { if (qemuDomainPCIAddressEnsureAddr(priv->pciaddrs, &disk->info) < 0) @@ -212,6 +257,7 @@ int qemuDomainAttachPciDiskDevice(struct qemud_driver *driver, VIR_FREE(devstr); VIR_FREE(drivestr); + virDomainLockReleaseAndFree(lock); return 0; @@ -224,9 +270,14 @@ error: qemuDomainPCIAddressReleaseAddr(priv->pciaddrs, &disk->info) < 0) VIR_WARN("Unable to release PCI address on %s", disk->src); - if (virSecurityManagerRestoreImageLabel(driver->securityManager, - vm, disk) < 0) - VIR_WARN("Unable to restore security label on %s", disk->src); + if (virDomainLockBeginDiskDetach(lock, disk) >= 0) { + if (virSecurityManagerRestoreImageLabel(driver->securityManager, + vm, disk) < 0) + VIR_WARN("Unable to restore security label on %s", disk->src); + if (virDomainLockEndDiskDetach(lock, disk) < 0) + VIR_WARN("Unable to release lock on disk %s", disk->src); + } + virDomainLockReleaseAndFree(lock); return -1; } @@ -355,6 +406,7 @@ int qemuDomainAttachSCSIDisk(struct qemud_driver *driver, char *drivestr = NULL; char *devstr = NULL; int ret = -1; + virDomainLockPtr lock = NULL; for (i = 0 ; i < vm->def->ndisks ; i++) { if (STREQ(vm->def->disks[i]->dst, disk->dst)) { @@ -364,10 +416,24 @@ int qemuDomainAttachSCSIDisk(struct qemud_driver *driver, } } + if (!(lock = virDomainLockForModify(driver->contentLockManager, + driver->metadataLockManager, + vm))) + return -1; + + if 
(virDomainLockBeginDiskAttach(lock, disk) < 0) { + virDomainLockReleaseAndFree(lock); + return -1; + } if (virSecurityManagerSetImageLabel(driver->securityManager, - vm, disk) < 0) + vm, disk) < 0) { + virDomainLockReleaseAndFree(lock); return -1; + } + + if (virDomainLockEndDiskAttach(lock, disk) < 0) + VIR_WARN("Unable to release lock on disk %s", disk->src); /* We should have an address already, so make sure */ if (disk->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DRIVE) { @@ -445,6 +511,7 @@ int qemuDomainAttachSCSIDisk(struct qemud_driver *driver, VIR_FREE(devstr); VIR_FREE(drivestr); + virDomainLockReleaseAndFree(lock); return 0; @@ -452,9 +519,14 @@ error: VIR_FREE(devstr); VIR_FREE(drivestr); - if (virSecurityManagerRestoreImageLabel(driver->securityManager, - vm, disk) < 0) - VIR_WARN("Unable to restore security label on %s", disk->src); + if (virDomainLockBeginDiskDetach(lock, disk) >= 0) { + if (virSecurityManagerRestoreImageLabel(driver->securityManager, + vm, disk) < 0) + VIR_WARN("Unable to restore security label on %s", disk->src); + if (virDomainLockEndDiskDetach(lock, disk) < 0) + VIR_WARN("Unable to release lock on disk %s", disk->src); + } + virDomainLockReleaseAndFree(lock); return -1; } @@ -469,6 +541,7 @@ int qemuDomainAttachUsbMassstorageDevice(struct qemud_driver *driver, int i, ret; char *drivestr = NULL; char *devstr = NULL; + virDomainLockPtr lock = NULL; for (i = 0 ; i < vm->def->ndisks ; i++) { if (STREQ(vm->def->disks[i]->dst, disk->dst)) { @@ -478,10 +551,26 @@ int qemuDomainAttachUsbMassstorageDevice(struct qemud_driver *driver, } } + if (!(lock = virDomainLockForModify(driver->contentLockManager, + driver->metadataLockManager, + vm))) + return -1; + + if (virDomainLockBeginDiskAttach(lock, disk) < 0) { + virDomainLockReleaseAndFree(lock); + return -1; + } + if (virSecurityManagerSetImageLabel(driver->securityManager, - vm, disk) < 0) + vm, disk) < 0) { + virDomainLockReleaseAndFree(lock); return -1; + } + + if 
(virDomainLockEndDiskAttach(lock, disk) < 0) + VIR_WARN("Unable to release lock on disk %s", disk->src); + /* XXX not correct once we allow attaching a USB CDROM */ if (!disk->src) { qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("disk source path is missing")); @@ -528,6 +617,7 @@ int qemuDomainAttachUsbMassstorageDevice(struct qemud_driver *driver, VIR_FREE(devstr); VIR_FREE(drivestr); + virDomainLockReleaseAndFree(lock); return 0; @@ -535,9 +625,14 @@ error: VIR_FREE(devstr); VIR_FREE(drivestr); - if (virSecurityManagerRestoreImageLabel(driver->securityManager, - vm, disk) < 0) - VIR_WARN("Unable to restore security label on %s", disk->src); + if (virDomainLockBeginDiskDetach(lock, disk) >= 0) { + if (virSecurityManagerRestoreImageLabel(driver->securityManager, + vm, disk) < 0) + VIR_WARN("Unable to restore security label on %s", disk->src); + if (virDomainLockEndDiskDetach(lock, disk) < 0) + VIR_WARN("Unable to release lock on disk %s", disk->src); + } + virDomainLockReleaseAndFree(lock); return -1; } @@ -1137,6 +1232,12 @@ int qemuDomainDetachPciDiskDevice(struct qemud_driver *driver, qemuDomainObjPrivatePtr priv = vm->privateData; virCgroupPtr cgroup = NULL; char *drivestr = NULL; + virDomainLockPtr lock = NULL; + + if (!(lock = virDomainLockForModify(driver->contentLockManager, + driver->metadataLockManager, + vm))) + return -1; i = qemuFindDisk(vm->def, dev->data.disk->dst); @@ -1201,9 +1302,13 @@ int qemuDomainDetachPciDiskDevice(struct qemud_driver *driver, virDomainDiskDefFree(detach); - if (virSecurityManagerRestoreImageLabel(driver->securityManager, - vm, dev->data.disk) < 0) - VIR_WARN("Unable to restore security label on %s", dev->data.disk->src); + if (virDomainLockBeginDiskDetach(lock, dev->data.disk) >= 0) { + if (virSecurityManagerRestoreImageLabel(driver->securityManager, + vm, dev->data.disk) < 0) + VIR_WARN("Unable to restore security label on %s", dev->data.disk->src); + if (virDomainLockEndDiskDetach(lock, dev->data.disk) < 0) + VIR_WARN("Unable 
to release lock on disk %s", dev->data.disk->src); + } if (cgroup != NULL) { if (qemuTeardownDiskCgroup(driver, cgroup, dev->data.disk) < 0) @@ -1215,6 +1320,8 @@ int qemuDomainDetachPciDiskDevice(struct qemud_driver *driver, cleanup: VIR_FREE(drivestr); + virCgroupFree(&cgroup); + virDomainLockReleaseAndFree(lock); return ret; } @@ -1228,6 +1335,12 @@ int qemuDomainDetachSCSIDiskDevice(struct qemud_driver *driver, qemuDomainObjPrivatePtr priv = vm->privateData; virCgroupPtr cgroup = NULL; char *drivestr = NULL; + virDomainLockPtr lock = NULL; + + if (!(lock = virDomainLockForModify(driver->contentLockManager, + driver->metadataLockManager, + vm))) + return -1; i = qemuFindDisk(vm->def, dev->data.disk->dst); @@ -1279,9 +1392,13 @@ int qemuDomainDetachSCSIDiskDevice(struct qemud_driver *driver, virDomainDiskDefFree(detach); - if (virSecurityManagerRestoreImageLabel(driver->securityManager, - vm, dev->data.disk) < 0) - VIR_WARN("Unable to restore security label on %s", dev->data.disk->src); + if (virDomainLockBeginDiskDetach(lock, dev->data.disk) >= 0) { + if (virSecurityManagerRestoreImageLabel(driver->securityManager, + vm, dev->data.disk) < 0) + VIR_WARN("Unable to restore security label on %s", dev->data.disk->src); + if (virDomainLockEndDiskDetach(lock, dev->data.disk) < 0) + VIR_WARN("Unable to release lock on disk %s", dev->data.disk->src); + } if (cgroup != NULL) { if (qemuTeardownDiskCgroup(driver, cgroup, dev->data.disk) < 0) @@ -1294,6 +1411,7 @@ int qemuDomainDetachSCSIDiskDevice(struct qemud_driver *driver, cleanup: VIR_FREE(drivestr); virCgroupFree(&cgroup); + virDomainLockReleaseAndFree(lock); return ret; } diff --git a/src/qemu/test_libvirtd_qemu.aug b/src/qemu/test_libvirtd_qemu.aug index b4d8833..a3a669c 100644 --- a/src/qemu/test_libvirtd_qemu.aug +++ b/src/qemu/test_libvirtd_qemu.aug @@ -109,6 +109,9 @@ vnc_allow_host_audio = 1 clear_emulator_capabilities = 0 allow_disk_format_probing = 1 + +content_lock_manager = \"fcntl\" +metadata_lock_manager = 
\"fcntl\" " test Libvirtd_qemu.lns get conf = @@ -228,3 +231,6 @@ allow_disk_format_probing = 1 { "clear_emulator_capabilities" = "0" } { "#empty" } { "allow_disk_format_probing" = "1" } +{ "#empty" } +{ "content_lock_manager" = "fcntl" } +{ "metadata_lock_manager" = "fcntl" } -- 1.7.3.4

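For readers following the startup ordering in the patch above (child runs the hook and takes the lock, parent waits, labels the disks, then lets the child continue), here is a minimal standalone sketch of a two-pipe handshake. It is not the virCommand implementation: handshake_read(), handshake_write() and handshake_demo() are hypothetical names invented for illustration, and the places where the lock acquisition and security labelling would happen are only marked with comments.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: block until one byte arrives. 0 on success, -1 on EOF/error. */
static int handshake_read(int fd)
{
    char c;
    ssize_t n;

    assert(fd >= 0);
    do {
        n = read(fd, &c, 1);
    } while (n < 0 && errno == EINTR);
    return n == 1 ? 0 : -1;
}

/* Hypothetical helper: send one byte to the peer. 0 on success, -1 on error. */
static int handshake_write(int fd)
{
    char c = '1';
    ssize_t n;

    assert(fd >= 0);
    do {
        n = write(fd, &c, 1);
    } while (n < 0 && errno == EINTR);
    return n == 1 ? 0 : -1;
}

/* Returns 0 if both halves of the handshake completed and the child exited cleanly. */
int handshake_demo(void)
{
    int to_parent[2], to_child[2];
    int status, ret = -1;
    pid_t pid;

    if (pipe(to_parent) < 0)
        return -1;
    if (pipe(to_child) < 0) {
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(to_parent[0]);
        close(to_parent[1]);
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: stands in for the pre-exec hook. */
        close(to_parent[0]);
        close(to_child[1]);
        /* ... the hook would acquire the domain lock here ... */
        if (handshake_write(to_parent[1]) < 0)   /* tell parent we are ready */
            _exit(1);
        if (handshake_read(to_child[0]) < 0)     /* pause until parent says go */
            _exit(2);
        /* ... a real child would now exec() the emulator ... */
        _exit(0);
    }

    /* Parent: stands in for the daemon side. */
    close(to_parent[1]);
    close(to_child[0]);
    if (handshake_read(to_parent[0]) == 0) {     /* the "wait" half */
        /* ... disk labelling would happen here ... */
        if (handshake_write(to_child[1]) == 0)   /* the "notify" half */
            ret = 0;
    }
    close(to_parent[0]);
    close(to_child[1]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return ret;
}
```

Calling handshake_demo() from a small main() should return 0 when the child reached its "ready" point before the parent did its work and only continued after being notified. Two pipes are used so that each direction has a single writer, which keeps the ordering unambiguous.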
Sanlock is a project that implements a disk-paxos locking algorithm. This is suitable for cluster deployments with shared storage. * src/Makefile.am: Add dlopen plugin for sanlock * src/locking/lock_driver_sanlock.c: Sanlock driver --- po/POTFILES.in | 1 + src/Makefile.am | 12 + src/libvirt_private.syms | 1 + src/locking/lock_driver_sanlock.c | 452 +++++++++++++++++++++++++++++++++++++ 4 files changed, 466 insertions(+), 0 deletions(-) create mode 100644 src/locking/lock_driver_sanlock.c diff --git a/po/POTFILES.in b/po/POTFILES.in index 47f2f20..302b9c0 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -30,6 +30,7 @@ src/interface/netcf_driver.c src/internal.h src/libvirt.c src/locking/lock_manager.c +src/locking/lock_driver_sanlock.c src/lxc/lxc_container.c src/lxc/lxc_conf.c src/lxc/lxc_controller.c diff --git a/src/Makefile.am b/src/Makefile.am index b68a9b4..f56ff17 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -97,6 +97,9 @@ DRIVER_SOURCES = \ locking/lock_driver_nop.h locking/lock_driver_nop.c \ locking/domain_lock.h locking/domain_lock.c +LOCK_DRIVER_SANLOCK_SOURCES = \ + locking/lock_driver_sanlock.c + # XML configuration format handling sources # Domain driver generic impl APIs @@ -1148,6 +1151,15 @@ libvirt_qemu_la_CFLAGS = $(AM_CFLAGS) libvirt_qemu_la_LIBADD = libvirt.la $(CYGWIN_EXTRA_LIBADD) EXTRA_DIST += $(LIBVIRT_QEMU_SYMBOL_FILE) + +lockdriverdir = $(libdir)/libvirt/lock-driver +lockdriver_LTLIBRARIES = sanlock.la + +sanlock_la_SOURCES = $(LOCK_DRIVER_SANLOCK_SOURCES) +sanlock_la_CFLAGS = $(AM_CFLAGS) +sanlock_la_LDFLAGS = -no-version -module +sanlock_la_LIBADD = -lsanlock + libexec_PROGRAMS = if WITH_STORAGE_DISK diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 8005f20..b2f11b8 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -585,6 +585,7 @@ virVMOperationTypeToString; # memory.h virAlloc; virAllocN; +virAllocVar; virExpandN; virFree; virReallocN; diff --git 
a/src/locking/lock_driver_sanlock.c b/src/locking/lock_driver_sanlock.c new file mode 100644 index 0000000..90afe18 --- /dev/null +++ b/src/locking/lock_driver_sanlock.c @@ -0,0 +1,452 @@ +/* + * lock_driver_sanlock.c: A lock driver for Sanlock + * + * Copyright (C) 2010-2011 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#include <config.h> + +#include <stdlib.h> +#include <stdint.h> +#include <unistd.h> +#include <string.h> +#include <stdio.h> +#include <errno.h> +#include <sys/types.h> + +#include <sanlock.h> +#include <sanlock_resource.h> + +#include "lock_driver.h" +#include "logging.h" +#include "virterror_internal.h" +#include "memory.h" +#include "util.h" +#include "files.h" + +#define VIR_FROM_THIS VIR_FROM_LOCKING + +#define virLockError(code, ...) 
\ + virReportErrorHelper(NULL, VIR_FROM_THIS, code, __FILE__, \ + __FUNCTION__, __LINE__, __VA_ARGS__) + +struct snlk_con { + char vm_name[SANLK_NAME_LEN]; + char vm_uuid[VIR_UUID_BUFLEN]; + unsigned int vm_id; + unsigned int vm_pid; + unsigned int flags; + int sock; + int res_count; + struct sanlk_resource *res_args[SANLK_MAX_RESOURCES]; +}; + +/* + * sanlock plugin for the libvirt virLockManager API + */ + +static int drv_snlk_init(unsigned int version ATTRIBUTE_UNUSED, + unsigned int flags) +{ + virCheckFlags(VIR_LOCK_MANAGER_MODE_CONTENT, -1); + return 0; +} + +static int drv_snlk_deinit(void) +{ + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unloading sanlock plugin is forbidden")); + return -1; +} + +static int drv_snlk_new(virLockManagerPtr man, + unsigned int type, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags) +{ + virLockManagerParamPtr param; + struct snlk_con *con; + int i; + + virCheckFlags(VIR_LOCK_MANAGER_MODE_CONTENT, -1); + + if (type != VIR_LOCK_MANAGER_OBJECT_TYPE_DOMAIN) { + virLockError(VIR_ERR_INTERNAL_ERROR, + _("Unsupported object type %d"), type); + return -1; + } + + if (VIR_ALLOC(con) < 0) { + virReportOOMError(); + return -1; + } + + con->flags = flags; + con->sock = -1; + + for (i = 0; i < nparams; i++) { + param = &params[i]; + + if (STREQ(param->key, "uuid")) { + memcpy(con->vm_uuid, param->value.uuid, 16); + } else if (STREQ(param->key, "name")) { + if (!virStrcpy(con->vm_name, param->value.str, SANLK_NAME_LEN)) { + virLockError(VIR_ERR_INTERNAL_ERROR, + _("Domain name '%s' exceeded %d characters"), + param->value.str, SANLK_NAME_LEN); + goto error; + } + } else if (STREQ(param->key, "pid")) { + con->vm_pid = param->value.ui; + } else if (STREQ(param->key, "id")) { + con->vm_id = param->value.ui; + } + } + + man->privateData = con; + return 0; + +error: + VIR_FREE(con); + return -1; +} + +static void drv_snlk_free(virLockManagerPtr man) +{ + struct snlk_con *con = man->privateData; + + DEBUG("man=%p sock=%d", 
man, con->sock); +#if 0 + /* We do *not* want to close the socket here. We need the + * socket to stay alive, otherwise sanlock will fence the + * process. The socket will be explicitly closed before + * free, in the release_object method, if necessary. + */ + VIR_FORCE_CLOSE(con->sock); +#endif + VIR_FREE(con); + man->privateData = NULL; +} + +static int add_con_resource(struct snlk_con *con, + const char *name, + size_t nparams, + virLockManagerParamPtr params) +{ + virLockManagerParamPtr param; + struct sanlk_resource *res; + int i; + + if (VIR_ALLOC_VAR(res, struct sanlk_disk, 1) < 0) { + virReportOOMError(); + return -1; + } + + res->num_disks = 1; + if (!virStrcpy(res->name, name, SANLK_NAME_LEN)) { + virLockError(VIR_ERR_INTERNAL_ERROR, + _("Resource name '%s' exceeds %d characters"), + name, SANLK_NAME_LEN); + goto error; + } + + for (i = 0; i < nparams; i++) { + param = &params[i]; + + if (STREQ(param->key, "path")) { + if (!virStrcpy(res->disks[0].path, param->value.str, SANLK_PATH_LEN)) { + virLockError(VIR_ERR_INTERNAL_ERROR, + _("Lease path '%s' exceeds %d characters"), + param->value.str, SANLK_PATH_LEN); + goto error; + } + } else if (STREQ(param->key, "offset")) { + res->disks[0].offset = param->value.ul; + } + } + + con->res_args[con->res_count] = res; + con->res_count++; + return 0; + +error: + VIR_FREE(res); + return -1; +} + +static int drv_snlk_add_resource(virLockManagerPtr man, + unsigned int type, + const char *name, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags ATTRIBUTE_UNUSED) +{ + struct snlk_con *con = man->privateData; + /* must be called before acquire_object */ + if (con->sock != -1) { + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot add resources to an existing lock")); + return -1; + } + + if (con->res_count == SANLK_MAX_RESOURCES) { + virLockError(VIR_ERR_INTERNAL_ERROR, + _("Too many resources %d for object"), + SANLK_MAX_RESOURCES); + return -1; + } + + if (type != VIR_LOCK_MANAGER_RESOURCE_TYPE_LEASE) 
+ return 0; + + if (add_con_resource(con, name, nparams, params) < 0) + return -1; + + return 0; +} + +static int drv_snlk_acquire_object(virLockManagerPtr man, + const char *state, + unsigned int flags ATTRIBUTE_UNUSED) +{ + struct snlk_con *con = man->privateData; + struct sanlk_options *opt = NULL; + int i, rv, sock = -1; + int pid = getpid(); + + /* acquire_object can be called only once */ + if (con->sock != -1) { + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Object lock is already held")); + return -1; + } + + if (con->vm_pid != pid) { + virLockError(VIR_ERR_INTERNAL_ERROR, + _("Object lock attempt from pid %d, expected %d"), + pid, con->vm_pid); + return -1; + } + + if (VIR_ALLOC_VAR(opt, char, state ? strlen(state) + 1 : 0) < 0) { + virReportOOMError(); + return -1; + } + + if (!virStrcpy(opt->owner_name, con->vm_name, SANLK_NAME_LEN)) { + virLockError(VIR_ERR_INTERNAL_ERROR, + _("Domain name '%s' exceeded %d characters"), + con->vm_name, SANLK_NAME_LEN); + goto error; + } + + if (state) { + opt->flags = SANLK_FLG_INCOMING; + opt->len = strlen(state); + strcpy(opt->str, state); + } + + VIR_DEBUG0("Register sanlock"); + sock = sanlock_register(); + if (sock < 0) { + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to open socket to sanlock daemon")); + goto error; + } + VIR_DEBUG("Acquiring object %u", con->res_count); + rv = sanlock_acquire(sock, -1, con->res_count, con->res_args, opt); + VIR_DEBUG("Acquire result %d", rv); + if (rv < 0) { + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to acquire lock")); + goto error; + } + VIR_FREE(opt); + + for (i = 0; i < con->res_count; i++) + VIR_FREE(con->res_args[i]); + + con->sock = sock; + + return 0; + +error: + VIR_FORCE_CLOSE(sock); + VIR_FREE(opt); + return -1; +} + +static int drv_snlk_attach_object(virLockManagerPtr man ATTRIBUTE_UNUSED, + unsigned int flags ATTRIBUTE_UNUSED) +{ + return 0; +} + +static int drv_snlk_detach_object(virLockManagerPtr man ATTRIBUTE_UNUSED, + unsigned int flags 
ATTRIBUTE_UNUSED) +{ + return 0; +} + +static int drv_snlk_release_object(virLockManagerPtr man ATTRIBUTE_UNUSED, + unsigned int flags ATTRIBUTE_UNUSED) +{ + struct snlk_con *con = man->privateData; + + if (con->sock == -1) { + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot release object that is not locked")); + return -1; + } + VIR_FORCE_CLOSE(con->sock); + return 0; +} + +static int drv_snlk_get_state(virLockManagerPtr man ATTRIBUTE_UNUSED, + char **state, + unsigned int flags ATTRIBUTE_UNUSED) +{ + *state = NULL; + + return 0; +} + + +static int drv_snlk_acquire_resource(virLockManagerPtr man, + unsigned int type, + const char *name, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags ATTRIBUTE_UNUSED) +{ + struct snlk_con *con = man->privateData; + struct sanlk_options opt; + int rv; + + if (con->sock != -1) { + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot acquire resource on unlocked object")); + return -1; + } + + if (!con->vm_pid) { + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot acquire resource on unlocked object")); + return -1; + } + + if (type != VIR_LOCK_MANAGER_RESOURCE_TYPE_LEASE) + return 0; + + if (add_con_resource(con, name, nparams, params) < 0) + return -1; + + /* Setting REACQUIRE tells sanlock that if con->vm_pid previously held + and released the resource, we need to ensure no other host has + acquired a lease on it in the mean time. If this is a new resource + that the pid hasn't held before, then REACQUIRE will have no effect + since sanlock will have no memory of a previous version. 
*/ + + memset(&opt, 0, sizeof(struct sanlk_options)); + if (!virStrcpy(opt.owner_name, con->vm_name, SANLK_NAME_LEN)) { + virLockError(VIR_ERR_INTERNAL_ERROR, + _("Domain name '%s' exceeds %d characters"), + con->vm_name, SANLK_NAME_LEN); + return -1; + } + opt.flags = SANLK_FLG_REACQUIRE; + opt.len = 0; + + rv = sanlock_acquire(-1, con->vm_pid, con->res_count, con->res_args, &opt); + + VIR_FREE(con->res_args[0]); + + if (rv < 0) { + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to acquire resource")); + return -1; + } + + return 0; +} + +static int drv_snlk_release_resource(virLockManagerPtr man, + unsigned int type, + const char *name, + size_t nparams, + virLockManagerParamPtr params, + unsigned int flags ATTRIBUTE_UNUSED) +{ + struct snlk_con *con = man->privateData; + int rv; + + if (con->sock != -1) { + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot acquire resource on unlocked object")); + return -1; + } + + if (!con->vm_pid) { + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot acquire resource on unlocked object")); + return -1; + } + + if (type != VIR_LOCK_MANAGER_RESOURCE_TYPE_LEASE) + return 0; + + if (add_con_resource(con, name, nparams, params) < 0) + return -1; + + rv = sanlock_release(-1, con->vm_pid, con->res_count, con->res_args); + + VIR_FREE(con->res_args[0]); + + if (rv < 0) { + virLockError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to release resource")); + return -1; + } + + return 0; +} + +virLockDriver virLockDriverImpl = +{ + .version = VIR_LOCK_MANAGER_VERSION, + .flags = VIR_LOCK_MANAGER_MODE_CONTENT, + + .drvInit = drv_snlk_init, + .drvDeinit = drv_snlk_deinit, + + .drvNew = drv_snlk_new, + .drvFree = drv_snlk_free, + + .drvAddResource = drv_snlk_add_resource, + + .drvAcquireObject = drv_snlk_acquire_object, + .drvAttachObject = drv_snlk_attach_object, + .drvDetachObject = drv_snlk_detach_object, + .drvReleaseObject = drv_snlk_release_object, + + .drvGetState = drv_snlk_get_state, + + .drvAcquireResource = 
drv_snlk_acquire_resource, + .drvReleaseResource = drv_snlk_release_resource, +}; -- 1.7.3.4
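The comment in drv_snlk_free() above explains that the sanlock socket must stay open for as long as the lock is meant to be held. As a loose analogy only, the sketch below shows the same "lock lives as long as the descriptor" behaviour using POSIX fcntl() record locks rather than sanlock. Nothing here uses the sanlock API, and try_lock(), contender_gets_lock() and lease_demo() are hypothetical names invented for this example; a real fcntl-based plugin (still on the todo list in the cover letter) may look quite different.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to take an exclusive whole-file lock without blocking. 0 if granted, -1 if not. */
static int try_lock(int fd)
{
    struct flock fl = {0};

    assert(fd >= 0);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "to end of file", i.e. the whole file */
    return fcntl(fd, F_SETLK, &fl) < 0 ? -1 : 0;
}

/* Fork a second process that opens the file itself and tries to lock it.
 * Returns 1 if the contender got the lock, 0 if it was refused, -1 on error. */
static int contender_gets_lock(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            _exit(2);
        _exit(try_lock(fd) == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    if (WEXITSTATUS(status) == 1)
        return 0;
    return -1;
}

/* Returns 0 if the lock excluded a contender while held and stopped doing so after close(). */
int lease_demo(void)
{
    char path[] = "/tmp/lease-demo-XXXXXX";
    int held, released;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (try_lock(fd) < 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    held = contender_gets_lock(path);       /* owner still holds the descriptor */
    close(fd);                              /* closing the descriptor drops the lock */
    released = contender_gets_lock(path);   /* a contender can now take it */

    unlink(path);
    return (held == 0 && released == 1) ? 0 : -1;
}
```

The point of the sketch is the lifetime rule: while the owning process keeps its descriptor open, a second process is refused; once the descriptor is closed, the protection is gone. That is why a free routine that closed the descriptor as a side effect would silently give up the lock.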
participants (2): Daniel P. Berrange, Eric Blake