[libvirt] [PATCH 0/5] Proof of concept for libvirt_qemu shim process

This patch series provides a proof of concept impl of the libvirt_qemu shim process I previously suggested here: https://www.redhat.com/archives/libvir-list/2017-November/msg00526.html The end goal is that we'll be able to fully isolate managemen to each QEMU process. ie all the virDomain* APIs would be executed inside the libvirt_qemu shim process. The QEMU driver in libvirtd would merely deal with aggregating the views / tracking central resource allocations. This series, however, does *not* do that. It is a very much smaller proof of concept, principally to: - Learn about pros/cons of different approaches for the long term goal - Provide a working PoC that can be used by the KubeVirt project such that they can spawn QEMU in a separate docker container than libvirtd is in, and inherit namespaces & cgroup placement, etc So, in this series, libvirtd functionality remains essentially unchanged. All I have done is provide a new binary 'libvirt_qemu' that accepts an XML file as input, launches the QEMU process directly, and then calls a new virDomainQemuReconnect() API to make libvirtd aware of its existance. At this point libvirtd can deal with it normally (some caveats listed in last patch). Usage is pretty simple - start libvirtd normally, then to launch a guest just use $ libvirt_qemu /path/to/xml/file It'll be associated with whatever libvirtd instance is running with the same user account. ie if you launch libvirt_qemu as root, it'll associate with qemu:///system. By default it will still place VMs in a dedicated cgroup. To inherit the cgroup of the caller, use <resource register="none"/> in the XML schema to turn off cgroup setup in libvirt_qemu. Having written this PoC, however, I'm less convinced that a bottom up, minimal impl which incrementally picks certain subsets of QEMU driver APIs to call is the right way to attack this problem. ie I was intending to have this minimal shim, then gradually move functionality into it from libvirtd. This feels like it is going to create alot of busy-work, delaying the end goal. I think instead a different approach might be better in the short term. Take the existing libvirtd code as a starting point, clone it to a libvirt_qemu and just start cutting out existing code to make a lighter weight binary that can only run a single guest, whose XML is passed in. We would still ultimately need to deal with much of the same issues, like getting VMs reported to the central libvirtd, but I think that might get to the end result, where all APIs run inside the shim, quicker. The key difference is that we could sooner focus on the harder problems of dealing with shared resource allocation tracking, instead of doing lots of code rewiring for API execution. Daniel P. Berrange (5): conf: allow different resource registration modes conf: expose APIs to let drivers load individual config / status files qemu: add a public API to trigger QEMU driver to connect to running guest qemu: implement the new virDomainQemuReconnect method qemu: implement the 'libvirt_qemu' shim for launching guests externally include/libvirt/libvirt-qemu.h | 4 + po/POTFILES.in | 1 + src/Makefile.am | 49 ++++ src/conf/domain_conf.c | 42 +++- src/conf/domain_conf.h | 12 + src/conf/virdomainobjlist.c | 98 +++++--- src/conf/virdomainobjlist.h | 17 ++ src/driver-hypervisor.h | 5 + src/libvirt-qemu.c | 48 ++++ src/libvirt_private.syms | 4 + src/libvirt_qemu.syms | 5 + src/lxc/lxc_cgroup.c | 34 +++ src/lxc/lxc_cgroup.h | 3 + src/lxc/lxc_process.c | 11 +- src/qemu/qemu_cgroup.c | 69 +++++- src/qemu/qemu_conf.h | 2 +- src/qemu/qemu_controller.c | 551 +++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.c | 2 +- src/qemu/qemu_driver.c | 59 ++++- src/qemu/qemu_process.c | 31 ++- src/qemu/qemu_process.h | 1 + src/remote/qemu_protocol.x | 18 +- src/remote/remote_driver.c | 1 + src/util/vircgroup.c | 55 ++-- src/util/vircgroup.h | 10 +- 25 files changed, 1046 insertions(+), 86 deletions(-) create mode 100644 src/qemu/qemu_controller.c -- 2.14.3

Currently the QEMU driver has three ways of setting up cgroups. It either skips them entirely (if non-root), or uses systemd-machined, or uses cgroups directly. It is further possible to register directly with systemd and bypass machined. We don't support this by systemd-nsspawn does and we ought to. This change adds ability to configure the mechanism for registering resources between all these options explicitly. via <resource register="none|direct|machined|systemd"/> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/conf/domain_conf.c | 42 ++++++++++++++++++++++------- src/conf/domain_conf.h | 12 +++++++++ src/libvirt_private.syms | 2 ++ src/lxc/lxc_cgroup.c | 34 ++++++++++++++++++++++++ src/lxc/lxc_cgroup.h | 3 +++ src/lxc/lxc_process.c | 11 ++++---- src/qemu/qemu_cgroup.c | 69 +++++++++++++++++++++++++++++++++++++++++------- src/util/vircgroup.c | 55 ++++++++++++++++++++++++-------------- src/util/vircgroup.h | 10 ++++++- 9 files changed, 194 insertions(+), 44 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 9a62bc472c..fb8e7a0ec7 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -910,6 +910,14 @@ VIR_ENUM_IMPL(virDomainHPTResizing, "required", ); +VIR_ENUM_IMPL(virDomainResourceRegister, + VIR_DOMAIN_RESOURCE_REGISTER_LAST, + "default", + "none", + "cgroup", + "machined", + "systemd"); + /* Internal mapping: subset of block job types that can be present in * <mirror> XML (remaining types are not two-phase). */ VIR_ENUM_DECL(virDomainBlockJob) @@ -17698,16 +17706,20 @@ virDomainResourceDefParse(xmlNodePtr node, { virDomainResourceDefPtr def = NULL; xmlNodePtr tmp = ctxt->node; + char *reg; ctxt->node = node; if (VIR_ALLOC(def) < 0) goto error; - /* Find out what type of virtualization to use */ - if (!(def->partition = virXPathString("string(./partition)", ctxt))) { - virReportError(VIR_ERR_INTERNAL_ERROR, - "%s", _("missing resource partition attribute")); + def->partition = virXPathString("string(./partition)", ctxt); + + reg = virXMLPropString(node, "register"); + if (reg != NULL && + (def->reg = virDomainResourceRegisterTypeFromString(reg)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + "%s", _("Invalid register attribute")); goto error; } @@ -25568,11 +25580,23 @@ static void virDomainResourceDefFormat(virBufferPtr buf, virDomainResourceDefPtr def) { - virBufferAddLit(buf, "<resource>\n"); - virBufferAdjustIndent(buf, 2); - virBufferEscapeString(buf, "<partition>%s</partition>\n", def->partition); - virBufferAdjustIndent(buf, -2); - virBufferAddLit(buf, "</resource>\n"); + if (def->reg == VIR_DOMAIN_RESOURCE_REGISTER_DEFAULT && + def->partition == NULL) + return; + + virBufferAddLit(buf, "<resource"); + if (def->reg != VIR_DOMAIN_RESOURCE_REGISTER_DEFAULT) + virBufferAsprintf(buf, " register='%s'", virDomainResourceRegisterTypeToString(def->reg)); + + if (def->partition) { + virBufferAddLit(buf, ">\n"); + virBufferAdjustIndent(buf, 2); + virBufferEscapeString(buf, "<partition>%s</partition>\n", def->partition); + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</resource>\n"); + } else { + virBufferAddLit(buf, "/>\n"); + } } diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 6f7f96b3dd..a7a6628a36 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2145,9 +2145,20 @@ struct _virDomainPanicDef { void virBlkioDeviceArrayClear(virBlkioDevicePtr deviceWeights, int ndevices); +typedef enum { + VIR_DOMAIN_RESOURCE_REGISTER_DEFAULT, + VIR_DOMAIN_RESOURCE_REGISTER_NONE, + VIR_DOMAIN_RESOURCE_REGISTER_CGROUP, + VIR_DOMAIN_RESOURCE_REGISTER_MACHINED, + VIR_DOMAIN_RESOURCE_REGISTER_SYSTEMD, + + VIR_DOMAIN_RESOURCE_REGISTER_LAST, +} virDomainResourceRegister; + typedef struct _virDomainResourceDef virDomainResourceDef; typedef virDomainResourceDef *virDomainResourceDefPtr; struct _virDomainResourceDef { + int reg; /* enum virDomainResourceRegister */ char *partition; }; @@ -3325,6 +3336,7 @@ VIR_ENUM_DECL(virDomainMemorySource) VIR_ENUM_DECL(virDomainMemoryAllocation) VIR_ENUM_DECL(virDomainIOMMUModel) VIR_ENUM_DECL(virDomainShmemModel) +VIR_ENUM_DECL(virDomainResourceRegister) /* from libvirt.h */ VIR_ENUM_DECL(virDomainState) VIR_ENUM_DECL(virDomainNostateReason) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index d5c3b9abb5..a0fde65dba 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -489,6 +489,8 @@ virDomainRedirdevBusTypeToString; virDomainRedirdevDefFind; virDomainRedirdevDefFree; virDomainRedirdevDefRemove; +virDomainResourceRegisterTypeFromString; +virDomainResourceRegisterTypeToString; virDomainRNGBackendTypeToString; virDomainRNGDefFree; virDomainRNGFind; diff --git a/src/lxc/lxc_cgroup.c b/src/lxc/lxc_cgroup.c index 3369801870..7bd479df1b 100644 --- a/src/lxc/lxc_cgroup.c +++ b/src/lxc/lxc_cgroup.c @@ -478,6 +478,35 @@ static int virLXCCgroupSetupDeviceACL(virDomainDefPtr def, return ret; } +int virLXCCgroupMode(virDomainResourceRegister reg, + virCgroupRegister *cgreg) +{ + switch (reg) { + case VIR_DOMAIN_RESOURCE_REGISTER_NONE: + goto unsupported; + case VIR_DOMAIN_RESOURCE_REGISTER_DEFAULT: + *cgreg = VIR_CGROUP_REGISTER_DEFAULT; + break; + case VIR_DOMAIN_RESOURCE_REGISTER_MACHINED: + *cgreg = VIR_CGROUP_REGISTER_MACHINED; + break; + case VIR_DOMAIN_RESOURCE_REGISTER_CGROUP: + *cgreg = VIR_CGROUP_REGISTER_DIRECT; + break; + case VIR_DOMAIN_RESOURCE_REGISTER_SYSTEMD: + default: + goto unsupported; + } + + return 0; + + unsupported: + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Resource register '%s' not available"), + virDomainResourceRegisterTypeToString(reg)); + return -1; +} + virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def, pid_t initpid, @@ -485,11 +514,15 @@ virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def, int *nicindexes) { virCgroupPtr cgroup = NULL; + virCgroupRegister reg; char *machineName = virLXCDomainGetMachineName(def, 0); if (!machineName) goto cleanup; + if (virLXCCgroupMode(def->resource->reg, ®) < 0) + goto cleanup; + if (def->resource->partition[0] != '/') { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("Resource partition '%s' must start with '/'"), @@ -504,6 +537,7 @@ virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def, initpid, true, nnicindexes, nicindexes, + ®, def->resource->partition, -1, &cgroup) < 0) diff --git a/src/lxc/lxc_cgroup.h b/src/lxc/lxc_cgroup.h index e85f21c47d..979d6a154b 100644 --- a/src/lxc/lxc_cgroup.h +++ b/src/lxc/lxc_cgroup.h @@ -27,6 +27,9 @@ # include "lxc_fuse.h" # include "virusb.h" +int virLXCCgroupMode(virDomainResourceRegister reg, + virCgroupRegister *cgreg); + virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def, pid_t initpid, size_t nnicindexes, diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index efd8a69000..24aa0cb0bf 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -1166,6 +1166,7 @@ virLXCProcessEnsureRootFS(virDomainObjPtr vm) return -1; } + /** * virLXCProcessStart: * @conn: pointer to connection @@ -1260,13 +1261,13 @@ int virLXCProcessStart(virConnectPtr conn, if (VIR_ALLOC(res) < 0) goto cleanup; - if (VIR_STRDUP(res->partition, "/machine") < 0) { - VIR_FREE(res); - goto cleanup; - } - vm->def->resource = res; } + if (vm->def->resource->reg != VIR_DOMAIN_RESOURCE_REGISTER_NONE && + !vm->def->resource->partition) { + if (VIR_STRDUP(vm->def->resource->partition, "/machine") < 0) + goto cleanup; + } if (virAsprintf(&logfile, "%s/%s.log", cfg->logDir, vm->def->name) < 0) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 19252ea239..5167d7fee1 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -834,6 +834,46 @@ qemuSetupCpuCgroup(virDomainObjPtr vm) } +static int qemuGetCgroupMode(virDomainObjPtr vm, + virDomainResourceRegister reg, + virCgroupRegister *cgreg) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + bool avail = virQEMUDriverIsPrivileged(priv->driver) && + virCgroupAvailable(); + + switch (reg) { + case VIR_DOMAIN_RESOURCE_REGISTER_NONE: + return 0; + case VIR_DOMAIN_RESOURCE_REGISTER_DEFAULT: + if (!avail) + return 0; + *cgreg = VIR_CGROUP_REGISTER_DEFAULT; + break; + case VIR_DOMAIN_RESOURCE_REGISTER_MACHINED: + if (!avail) + goto unsupported; + *cgreg = VIR_CGROUP_REGISTER_MACHINED; + break; + case VIR_DOMAIN_RESOURCE_REGISTER_CGROUP: + if (!avail) + goto unsupported; + *cgreg = VIR_CGROUP_REGISTER_DIRECT; + break; + case VIR_DOMAIN_RESOURCE_REGISTER_SYSTEMD: + default: + goto unsupported; + } + + return 1; + + unsupported: + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Resource register '%s' not available"), + virDomainResourceRegisterTypeToString(reg)); + return -1; +} + static int qemuInitCgroup(virDomainObjPtr vm, size_t nnicindexes, @@ -842,11 +882,17 @@ qemuInitCgroup(virDomainObjPtr vm, int ret = -1; qemuDomainObjPrivatePtr priv = vm->privateData; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(priv->driver); + virCgroupRegister reg; + int rv; - if (!virQEMUDriverIsPrivileged(priv->driver)) - goto done; - - if (!virCgroupAvailable()) + rv = qemuGetCgroupMode(vm, + vm->def->resource ? + vm->def->resource->reg : + VIR_DOMAIN_RESOURCE_REGISTER_DEFAULT, + ®); + if (rv < 0) + goto cleanup; + if (rv == 0) goto done; virCgroupFree(&priv->cgroup); @@ -857,13 +903,12 @@ qemuInitCgroup(virDomainObjPtr vm, if (VIR_ALLOC(res) < 0) goto cleanup; - if (VIR_STRDUP(res->partition, "/machine") < 0) { - VIR_FREE(res); - goto cleanup; - } - vm->def->resource = res; } + if (!vm->def->resource->partition) { + if (VIR_STRDUP(vm->def->resource->partition, "/machine") < 0) + goto cleanup; + } if (vm->def->resource->partition[0] != '/') { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -879,6 +924,7 @@ qemuInitCgroup(virDomainObjPtr vm, vm->pid, false, nnicindexes, nicindexes, + ®, vm->def->resource->partition, cfg->cgroupControllers, &priv->cgroup) < 0) { @@ -980,6 +1026,11 @@ qemuConnectCgroup(virDomainObjPtr vm) virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(priv->driver); int ret = -1; + if (vm->def->resource && + vm->def->resource->reg == VIR_DOMAIN_RESOURCE_REGISTER_NONE) { + goto done; + } + if (!virQEMUDriverIsPrivileged(priv->driver)) goto done; diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 0a31947b0d..07ffb78c78 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -1733,6 +1733,7 @@ virCgroupNewMachine(const char *name, bool isContainer, size_t nnicindexes, int *nicindexes, + virCgroupRegister *reg, const char *partition, int controllers, virCgroupPtr *group) @@ -1741,28 +1742,42 @@ virCgroupNewMachine(const char *name, *group = NULL; - if ((rv = virCgroupNewMachineSystemd(name, - drivername, - uuid, - rootdir, - pidleader, - isContainer, - nnicindexes, - nicindexes, - partition, - controllers, - group)) == 0) - return 0; + if (*reg == VIR_CGROUP_REGISTER_DEFAULT || + *reg == VIR_CGROUP_REGISTER_MACHINED) { + if ((rv = virCgroupNewMachineSystemd(name, + drivername, + uuid, + rootdir, + pidleader, + isContainer, + nnicindexes, + nicindexes, + partition, + controllers, + group)) == 0) { + *reg = VIR_CGROUP_REGISTER_MACHINED; + return 0; + } - if (rv == -1) - return -1; + if (rv == -1) + return -1; + + if (*reg == VIR_CGROUP_REGISTER_MACHINED) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + "%s", _("Systemd machined requested, but not available")); + return -1; + } + } - return virCgroupNewMachineManual(name, - drivername, - pidleader, - partition, - controllers, - group); + rv = virCgroupNewMachineManual(name, + drivername, + pidleader, + partition, + controllers, + group); + if (rv == 0) + *reg = VIR_CGROUP_REGISTER_DIRECT; + return rv; } diff --git a/src/util/vircgroup.h b/src/util/vircgroup.h index d833927678..63ee1aba5c 100644 --- a/src/util/vircgroup.h +++ b/src/util/vircgroup.h @@ -46,7 +46,8 @@ enum { VIR_CGROUP_CONTROLLER_LAST }; -VIR_ENUM_DECL(virCgroupController); +VIR_ENUM_DECL(virCgroupController) + /* Items of this enum are used later in virCgroupNew to create * bit array stored in int. Like this: * 1 << VIR_CGROUP_CONTROLLER_CPU @@ -103,6 +104,12 @@ virCgroupNewDetectMachine(const char *name, virCgroupPtr *group) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +typedef enum { + VIR_CGROUP_REGISTER_DEFAULT, + VIR_CGROUP_REGISTER_DIRECT, + VIR_CGROUP_REGISTER_MACHINED, +} virCgroupRegister; + int virCgroupNewMachine(const char *name, const char *drivername, const unsigned char *uuid, @@ -111,6 +118,7 @@ int virCgroupNewMachine(const char *name, bool isContainer, size_t nnicindexes, int *nicindexes, + virCgroupRegister *reg, const char *partition, int controllers, virCgroupPtr *group) -- 2.14.3

I know it's a POC type series, but figured I'd take a look primarily at the first 4 patches, learn a bit in the 5th one, and provide some review feedback in order to help move things along... There's perhaps a few adjustments in here that could be made into multiple patches. On 12/20/2017 11:47 AM, Daniel P. Berrange wrote:
Currently the QEMU driver has three ways of setting up cgroups. It either skips them entirely (if non-root), or uses systemd-machined, or uses cgroups directly.
It is further possible to register directly with systemd and bypass machined. We don't support this by systemd-nsspawn does and we ought
s/by/but/ (I think) Still a bit odd to add an option that isn't supported unless there was a plan to add the support when formulating the non POC patches...
to.
This change adds ability to configure the mechanism for registering resources between all these options explicitly. via
s/. via/ via:/ (e.g. remove the '.' and add ':')
<resource register="none|direct|machined|systemd"/>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/conf/domain_conf.c | 42 ++++++++++++++++++++++------- src/conf/domain_conf.h | 12 +++++++++ src/libvirt_private.syms | 2 ++ src/lxc/lxc_cgroup.c | 34 ++++++++++++++++++++++++ src/lxc/lxc_cgroup.h | 3 +++ src/lxc/lxc_process.c | 11 ++++---- src/qemu/qemu_cgroup.c | 69 +++++++++++++++++++++++++++++++++++++++++------- src/util/vircgroup.c | 55 ++++++++++++++++++++++++-------------- src/util/vircgroup.h | 10 ++++++- 9 files changed, 194 insertions(+), 44 deletions(-)
formatdomain.html.in would need to be augmented to describe the new attribute. domaincommon.rng modified to add the new attribute. A tests either added or existing one adjusted to exhibit the new attribute option(s).
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 9a62bc472c..fb8e7a0ec7 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -910,6 +910,14 @@ VIR_ENUM_IMPL(virDomainHPTResizing, "required", );
+VIR_ENUM_IMPL(virDomainResourceRegister, + VIR_DOMAIN_RESOURCE_REGISTER_LAST, + "default", + "none", + "cgroup", + "machined", + "systemd"); + /* Internal mapping: subset of block job types that can be present in * <mirror> XML (remaining types are not two-phase). */ VIR_ENUM_DECL(virDomainBlockJob) @@ -17698,16 +17706,20 @@ virDomainResourceDefParse(xmlNodePtr node, { virDomainResourceDefPtr def = NULL; xmlNodePtr tmp = ctxt->node; + char *reg;
= NULL;
ctxt->node = node;
if (VIR_ALLOC(def) < 0) goto error;
- /* Find out what type of virtualization to use */ - if (!(def->partition = virXPathString("string(./partition)", ctxt))) { - virReportError(VIR_ERR_INTERNAL_ERROR, - "%s", _("missing resource partition attribute")); + def->partition = virXPathString("string(./partition)", ctxt);
It seems partition is now optional, although formatdomain seems to have already indicated that... However, that's not a perfect match to domaincommon.rng where only "respartition" is optional. (This type of change could perhaps be it's own patch).
+ + reg = virXMLPropString(node, "register"); + if (reg != NULL && + (def->reg = virDomainResourceRegisterTypeFromString(reg)) <= 0) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + "%s", _("Invalid register attribute")); goto error; }
Need a VIR_FREE(reg); since we haven't GO-ified yet in both error and normal paths unless this is rewritten a bit...
@@ -25568,11 +25580,23 @@ static void virDomainResourceDefFormat(virBufferPtr buf, virDomainResourceDefPtr def) { - virBufferAddLit(buf, "<resource>\n"); - virBufferAdjustIndent(buf, 2); - virBufferEscapeString(buf, "<partition>%s</partition>\n", def->partition); - virBufferAdjustIndent(buf, -2); - virBufferAddLit(buf, "</resource>\n"); + if (def->reg == VIR_DOMAIN_RESOURCE_REGISTER_DEFAULT && + def->partition == NULL) + return; + + virBufferAddLit(buf, "<resource"); + if (def->reg != VIR_DOMAIN_RESOURCE_REGISTER_DEFAULT) + virBufferAsprintf(buf, " register='%s'", virDomainResourceRegisterTypeToString(def->reg)); + + if (def->partition) { + virBufferAddLit(buf, ">\n"); + virBufferAdjustIndent(buf, 2); + virBufferEscapeString(buf, "<partition>%s</partition>\n", def->partition); + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</resource>\n"); + } else { + virBufferAddLit(buf, "/>\n"); + } }
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 6f7f96b3dd..a7a6628a36 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2145,9 +2145,20 @@ struct _virDomainPanicDef { void virBlkioDeviceArrayClear(virBlkioDevicePtr deviceWeights, int ndevices);
+typedef enum { + VIR_DOMAIN_RESOURCE_REGISTER_DEFAULT, + VIR_DOMAIN_RESOURCE_REGISTER_NONE, + VIR_DOMAIN_RESOURCE_REGISTER_CGROUP, + VIR_DOMAIN_RESOURCE_REGISTER_MACHINED, + VIR_DOMAIN_RESOURCE_REGISTER_SYSTEMD, + + VIR_DOMAIN_RESOURCE_REGISTER_LAST, +} virDomainResourceRegister; + typedef struct _virDomainResourceDef virDomainResourceDef; typedef virDomainResourceDef *virDomainResourceDefPtr; struct _virDomainResourceDef { + int reg; /* enum virDomainResourceRegister */ char *partition; };
@@ -3325,6 +3336,7 @@ VIR_ENUM_DECL(virDomainMemorySource) VIR_ENUM_DECL(virDomainMemoryAllocation) VIR_ENUM_DECL(virDomainIOMMUModel) VIR_ENUM_DECL(virDomainShmemModel) +VIR_ENUM_DECL(virDomainResourceRegister) /* from libvirt.h */ VIR_ENUM_DECL(virDomainState) VIR_ENUM_DECL(virDomainNostateReason) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index d5c3b9abb5..a0fde65dba 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -489,6 +489,8 @@ virDomainRedirdevBusTypeToString; virDomainRedirdevDefFind; virDomainRedirdevDefFree; virDomainRedirdevDefRemove; +virDomainResourceRegisterTypeFromString; +virDomainResourceRegisterTypeToString; virDomainRNGBackendTypeToString; virDomainRNGDefFree; virDomainRNGFind; diff --git a/src/lxc/lxc_cgroup.c b/src/lxc/lxc_cgroup.c index 3369801870..7bd479df1b 100644 --- a/src/lxc/lxc_cgroup.c +++ b/src/lxc/lxc_cgroup.c @@ -478,6 +478,35 @@ static int virLXCCgroupSetupDeviceACL(virDomainDefPtr def, return ret; }
+int virLXCCgroupMode(virDomainResourceRegister reg, + virCgroupRegister *cgreg) +{ + switch (reg) { + case VIR_DOMAIN_RESOURCE_REGISTER_NONE: + goto unsupported; + case VIR_DOMAIN_RESOURCE_REGISTER_DEFAULT: + *cgreg = VIR_CGROUP_REGISTER_DEFAULT; + break; + case VIR_DOMAIN_RESOURCE_REGISTER_MACHINED: + *cgreg = VIR_CGROUP_REGISTER_MACHINED; + break; + case VIR_DOMAIN_RESOURCE_REGISTER_CGROUP: + *cgreg = VIR_CGROUP_REGISTER_DIRECT; + break; + case VIR_DOMAIN_RESOURCE_REGISTER_SYSTEMD: + default:
Use VIR_DOMAIN_RESOURCE_REGISTER_LAST instead of default
+ goto unsupported; + } + + return 0; + + unsupported: + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Resource register '%s' not available"), + virDomainResourceRegisterTypeToString(reg)); + return -1; +} +
virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def, pid_t initpid, @@ -485,11 +514,15 @@ virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def, int *nicindexes) { virCgroupPtr cgroup = NULL; + virCgroupRegister reg; char *machineName = virLXCDomainGetMachineName(def, 0);
if (!machineName) goto cleanup;
+ if (virLXCCgroupMode(def->resource->reg, ®) < 0) + goto cleanup; + if (def->resource->partition[0] != '/') { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("Resource partition '%s' must start with '/'"), @@ -504,6 +537,7 @@ virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def, initpid, true, nnicindexes, nicindexes, + ®, def->resource->partition, -1, &cgroup) < 0) diff --git a/src/lxc/lxc_cgroup.h b/src/lxc/lxc_cgroup.h index e85f21c47d..979d6a154b 100644 --- a/src/lxc/lxc_cgroup.h +++ b/src/lxc/lxc_cgroup.h @@ -27,6 +27,9 @@ # include "lxc_fuse.h" # include "virusb.h"
+int virLXCCgroupMode(virDomainResourceRegister reg, + virCgroupRegister *cgreg); + virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def, pid_t initpid, size_t nnicindexes, diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index efd8a69000..24aa0cb0bf 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -1166,6 +1166,7 @@ virLXCProcessEnsureRootFS(virDomainObjPtr vm) return -1; }
+ /** * virLXCProcessStart: * @conn: pointer to connection @@ -1260,13 +1261,13 @@ int virLXCProcessStart(virConnectPtr conn, if (VIR_ALLOC(res) < 0) goto cleanup;
- if (VIR_STRDUP(res->partition, "/machine") < 0) { - VIR_FREE(res); - goto cleanup; - } - vm->def->resource = res; }
So the above could be shortened to just: if (!vm->def->resource && VIR_ALLOC(vm->def->resource) < 0) goto cleanup;
+ if (vm->def->resource->reg != VIR_DOMAIN_RESOURCE_REGISTER_NONE && + !vm->def->resource->partition) { + if (VIR_STRDUP(vm->def->resource->partition, "/machine") < 0) + goto cleanup; + }
This could be similar too w/r/t single if condition...
if (virAsprintf(&logfile, "%s/%s.log", cfg->logDir, vm->def->name) < 0) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 19252ea239..5167d7fee1 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -834,6 +834,46 @@ qemuSetupCpuCgroup(virDomainObjPtr vm) }
There could be a helper here that would be callable by qemuConnectCgroup that would return the @avail value.
+static int qemuGetCgroupMode(virDomainObjPtr vm, + virDomainResourceRegister reg, + virCgroupRegister *cgreg) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + bool avail = virQEMUDriverIsPrivileged(priv->driver) && + virCgroupAvailable(); + + switch (reg) { + case VIR_DOMAIN_RESOURCE_REGISTER_NONE: + return 0; + case VIR_DOMAIN_RESOURCE_REGISTER_DEFAULT: + if (!avail) + return 0; + *cgreg = VIR_CGROUP_REGISTER_DEFAULT; + break; + case VIR_DOMAIN_RESOURCE_REGISTER_MACHINED: + if (!avail) + goto unsupported; + *cgreg = VIR_CGROUP_REGISTER_MACHINED; + break; + case VIR_DOMAIN_RESOURCE_REGISTER_CGROUP: + if (!avail) + goto unsupported; + *cgreg = VIR_CGROUP_REGISTER_DIRECT; + break; + case VIR_DOMAIN_RESOURCE_REGISTER_SYSTEMD: + default:
Use VIR_DOMAIN_RESOURCE_REGISTER_LAST instead of default
+ goto unsupported; + } + + return 1; + + unsupported: + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Resource register '%s' not available"), + virDomainResourceRegisterTypeToString(reg)); + return -1; +} + static int qemuInitCgroup(virDomainObjPtr vm, size_t nnicindexes, @@ -842,11 +882,17 @@ qemuInitCgroup(virDomainObjPtr vm, int ret = -1; qemuDomainObjPrivatePtr priv = vm->privateData; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(priv->driver); + virCgroupRegister reg; + int rv;
- if (!virQEMUDriverIsPrivileged(priv->driver)) - goto done; - - if (!virCgroupAvailable()) + rv = qemuGetCgroupMode(vm, + vm->def->resource ? + vm->def->resource->reg : + VIR_DOMAIN_RESOURCE_REGISTER_DEFAULT, + ®); + if (rv < 0) + goto cleanup; + if (rv == 0) goto done;
virCgroupFree(&priv->cgroup); @@ -857,13 +903,12 @@ qemuInitCgroup(virDomainObjPtr vm, if (VIR_ALLOC(res) < 0) goto cleanup;
- if (VIR_STRDUP(res->partition, "/machine") < 0) { - VIR_FREE(res); - goto cleanup; - } - vm->def->resource = res; }
Similar to above, we now have: if (!vm->def->resource && VIR_ALLOC(vm->def->resource) < 0) goto cleanup;
+ if (!vm->def->resource->partition) { + if (VIR_STRDUP(vm->def->resource->partition, "/machine") < 0) + goto cleanup; + }
and similar construct if desired.
if (vm->def->resource->partition[0] != '/') { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -879,6 +924,7 @@ qemuInitCgroup(virDomainObjPtr vm, vm->pid, false, nnicindexes, nicindexes, + ®, vm->def->resource->partition, cfg->cgroupControllers, &priv->cgroup) < 0) { @@ -980,6 +1026,11 @@ qemuConnectCgroup(virDomainObjPtr vm) virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(priv->driver); int ret = -1;
+ if (vm->def->resource && + vm->def->resource->reg == VIR_DOMAIN_RESOURCE_REGISTER_NONE) { + goto done; + } +
The subsequent checks could use the @avail helper mentioned above... John
if (!virQEMUDriverIsPrivileged(priv->driver)) goto done;
diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 0a31947b0d..07ffb78c78 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -1733,6 +1733,7 @@ virCgroupNewMachine(const char *name, bool isContainer, size_t nnicindexes, int *nicindexes, + virCgroupRegister *reg, const char *partition, int controllers, virCgroupPtr *group) @@ -1741,28 +1742,42 @@ virCgroupNewMachine(const char *name,
*group = NULL;
- if ((rv = virCgroupNewMachineSystemd(name, - drivername, - uuid, - rootdir, - pidleader, - isContainer, - nnicindexes, - nicindexes, - partition, - controllers, - group)) == 0) - return 0; + if (*reg == VIR_CGROUP_REGISTER_DEFAULT || + *reg == VIR_CGROUP_REGISTER_MACHINED) { + if ((rv = virCgroupNewMachineSystemd(name, + drivername, + uuid, + rootdir, + pidleader, + isContainer, + nnicindexes, + nicindexes, + partition, + controllers, + group)) == 0) { + *reg = VIR_CGROUP_REGISTER_MACHINED; + return 0; + }
- if (rv == -1) - return -1; + if (rv == -1) + return -1; + + if (*reg == VIR_CGROUP_REGISTER_MACHINED) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + "%s", _("Systemd machined requested, but not available")); + return -1; + } + }
- return virCgroupNewMachineManual(name, - drivername, - pidleader, - partition, - controllers, - group); + rv = virCgroupNewMachineManual(name, + drivername, + pidleader, + partition, + controllers, + group); + if (rv == 0) + *reg = VIR_CGROUP_REGISTER_DIRECT; + return rv; }
diff --git a/src/util/vircgroup.h b/src/util/vircgroup.h index d833927678..63ee1aba5c 100644 --- a/src/util/vircgroup.h +++ b/src/util/vircgroup.h @@ -46,7 +46,8 @@ enum { VIR_CGROUP_CONTROLLER_LAST };
-VIR_ENUM_DECL(virCgroupController); +VIR_ENUM_DECL(virCgroupController) + /* Items of this enum are used later in virCgroupNew to create * bit array stored in int. Like this: * 1 << VIR_CGROUP_CONTROLLER_CPU @@ -103,6 +104,12 @@ virCgroupNewDetectMachine(const char *name, virCgroupPtr *group) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2);
+typedef enum { + VIR_CGROUP_REGISTER_DEFAULT, + VIR_CGROUP_REGISTER_DIRECT, + VIR_CGROUP_REGISTER_MACHINED, +} virCgroupRegister; + int virCgroupNewMachine(const char *name, const char *drivername, const unsigned char *uuid, @@ -111,6 +118,7 @@ int virCgroupNewMachine(const char *name, bool isContainer, size_t nnicindexes, int *nicindexes, + virCgroupRegister *reg, const char *partition, int controllers, virCgroupPtr *group)

I'm so sorry for not getting to this earlier. I though I'll get to this over the holidays, but they were very busy with no free time for me. On Wed, Dec 20, 2017 at 04:47:46PM +0000, Daniel P. Berrange wrote:
Currently the QEMU driver has three ways of setting up cgroups. It either skips them entirely (if non-root), or uses systemd-machined, or uses cgroups directly.
So what we are trying to fix here is that all of the variations don't create the same structure. So it needs to be clear for the mgmt app to guess^Wknow correctly where the domain is in the cgroup hierarchy.
It is further possible to register directly with systemd and bypass machined. We don't support this by systemd-nsspawn does and we ought to.
But what's the benefit of that?
This change adds ability to configure the mechanism for registering resources between all these options explicitly. via
<resource register="none|direct|machined|systemd"/>
I understand what you are trying to fix, but I don't quite follow why we should expose that. Can't we guess some of them easily? Or are you making this part of the PoC, but then removing it later? For now I am OK with that, but I would rather see that as a part of qemu.conf (or libvirtd.conf), not really in the XML. OK for the PoC, though.

On Tue, Jan 09, 2018 at 01:43:39PM +0100, Martin Kletzander wrote:
I'm so sorry for not getting to this earlier. I though I'll get to this over the holidays, but they were very busy with no free time for me.
On Wed, Dec 20, 2017 at 04:47:46PM +0000, Daniel P. Berrange wrote:
Currently the QEMU driver has three ways of setting up cgroups. It either skips them entirely (if non-root), or uses systemd-machined, or uses cgroups directly.
So what we are trying to fix here is that all of the variations don't create the same structure. So it needs to be clear for the mgmt app to guess^Wknow correctly where the domain is in the cgroup hierarchy.
It is further possible to register directly with systemd and bypass machined. We don't support this by systemd-nsspawn does and we ought to.
But what's the benefit of that?
Systemd recommends that you only register with machined, if you are running a full OS install in the machine. IOW, if you are using LXC / QEMU for sandboxing individual applications, instead of full OS, then you should not register with machined. So this is something libvirt sandbox ought to be able to request, for exmaple.
This change adds ability to configure the mechanism for registering resources between all these options explicitly. via
<resource register="none|direct|machined|systemd"/>
I understand what you are trying to fix, but I don't quite follow why we should expose that. Can't we guess some of them easily? Or are you making this part of the PoC, but then removing it later?
No, I intend this bit to be fully supported long term. * none - use this to just inherit current placement of the caller. This is akin to turning off all use of cgroups in qemu.conf if launching from libvirtd, causing currently placement of libvirtd in cgroups to be inherited by VMs. This is what we'll also need with the shim for KubeVirt's needs * machined - register with machined - this is what we do today, if we detect that machined is available * direct - create cgroups directly - this is what we do if machined is not installed, or we have no dbus connection available. * systemd - register with systemd only, not with machined - see above rationale It is unlikely that apps would use 'direct' if running on a systemd based host. Likewise using machined/systemd on a non-systemd host is not possible. But that still leaves a choice of 2/3 options that are viable and cannot be automatically determined, and are reasonable to vary per-VM.
For now I am OK with that, but I would rather see that as a part of qemu.conf (or libvirtd.conf), not really in the XML. OK for the PoC, though.
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, Jan 11, 2018 at 12:14:21PM +0000, Daniel P. Berrange wrote:
On Tue, Jan 09, 2018 at 01:43:39PM +0100, Martin Kletzander wrote:
I'm so sorry for not getting to this earlier. I though I'll get to this over the holidays, but they were very busy with no free time for me.
On Wed, Dec 20, 2017 at 04:47:46PM +0000, Daniel P. Berrange wrote:
Currently the QEMU driver has three ways of setting up cgroups. It either skips them entirely (if non-root), or uses systemd-machined, or uses cgroups directly.
So what we are trying to fix here is that all of the variations don't create the same structure. So it needs to be clear for the mgmt app to guess^Wknow correctly where the domain is in the cgroup hierarchy.
It is further possible to register directly with systemd and bypass machined. We don't support this by systemd-nsspawn does and we ought to.
But what's the benefit of that?
Systemd recommends that you only register with machined, if you are running a full OS install in the machine. IOW, if you are using LXC / QEMU for sandboxing individual applications, instead of full OS, then you should not register with machined. So this is something libvirt sandbox ought to be able to request, for exmaple.
This change adds ability to configure the mechanism for registering resources between all these options explicitly. via
<resource register="none|direct|machined|systemd"/>
I understand what you are trying to fix, but I don't quite follow why we should expose that. Can't we guess some of them easily? Or are you making this part of the PoC, but then removing it later?
No, I intend this bit to be fully supported long term.
* none - use this to just inherit current placement of the caller. This is akin to turning off all use of cgroups in qemu.conf if launching from libvirtd, causing currently placement of libvirtd in cgroups to be inherited by VMs.
This is what we'll also need with the shim for KubeVirt's needs
* machined - register with machined - this is what we do today, if we detect that machined is available
* direct - create cgroups directly - this is what we do if machined is not installed, or we have no dbus connection available.
* systemd - register with systemd only, not with machined - see above rationale
It is unlikely that apps would use 'direct' if running on a systemd based host. Likewise using machined/systemd on a non-systemd host is not possible.
But that still leaves a choice of 2/3 options that are viable and cannot be automatically determined, and are reasonable to vary per-VM.
Oh, I also meant to say that I would expect us to update this in the live XML, if the user/app left it unspecified. This would allow the app to determine whether libvirt has actually activated cgroup support or not, and thus whether resource mgmt apis/config will work Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Currently drivers can only do a bulk load of config / status files for their guests. This exposes some helper methods to allow individual guests to be loaded. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/conf/virdomainobjlist.c | 98 ++++++++++++++++++++++++++++++++------------- src/conf/virdomainobjlist.h | 17 ++++++++ src/libvirt_private.syms | 2 + 3 files changed, 89 insertions(+), 28 deletions(-) diff --git a/src/conf/virdomainobjlist.c b/src/conf/virdomainobjlist.c index 87a742b1ea..37f795fe6d 100644 --- a/src/conf/virdomainobjlist.c +++ b/src/conf/virdomainobjlist.c @@ -443,16 +443,15 @@ void virDomainObjListRemoveLocked(virDomainObjListPtr doms, virObjectUnlock(dom); } - static virDomainObjPtr -virDomainObjListLoadConfig(virDomainObjListPtr doms, - virCapsPtr caps, - virDomainXMLOptionPtr xmlopt, - const char *configDir, - const char *autostartDir, - const char *name, - virDomainLoadConfigNotify notify, - void *opaque) +virDomainObjListLoadConfigLocked(virDomainObjListPtr doms, + virCapsPtr caps, + virDomainXMLOptionPtr xmlopt, + const char *configDir, + const char *autostartDir, + const char *name, + virDomainLoadConfigNotify notify, + void *opaque) { char *configFile = NULL, *autostartLink = NULL; virDomainDefPtr def = NULL; @@ -496,14 +495,36 @@ virDomainObjListLoadConfig(virDomainObjListPtr doms, } -static virDomainObjPtr -virDomainObjListLoadStatus(virDomainObjListPtr doms, - const char *statusDir, - const char *name, +virDomainObjPtr +virDomainObjListLoadConfig(virDomainObjListPtr doms, virCapsPtr caps, virDomainXMLOptionPtr xmlopt, + const char *configDir, + const char *autostartDir, + const char *name, virDomainLoadConfigNotify notify, void *opaque) +{ + virDomainObjPtr vm = NULL; + virObjectRWLockWrite(doms); + + vm = virDomainObjListLoadConfigLocked(doms, caps, xmlopt, configDir, + autostartDir, name, notify, opaque); + virObjectRef(vm); + + virObjectRWUnlock(doms); + return vm; +} + + +static virDomainObjPtr +virDomainObjListLoadStatusLocked(virDomainObjListPtr doms, + const char *statusDir, + const char *name, + virCapsPtr caps, + virDomainXMLOptionPtr xmlopt, + virDomainLoadConfigNotify notify, + void *opaque) { char *statusFile = NULL; virDomainObjPtr obj = NULL; @@ -555,6 +576,27 @@ virDomainObjListLoadStatus(virDomainObjListPtr doms, } +virDomainObjPtr +virDomainObjListLoadStatus(virDomainObjListPtr doms, + const char *statusDir, + const char *name, + virCapsPtr caps, + virDomainXMLOptionPtr xmlopt, + virDomainLoadConfigNotify notify, + void *opaque) +{ + virDomainObjPtr vm = NULL; + virObjectRWLockWrite(doms); + + vm = virDomainObjListLoadStatusLocked(doms, statusDir, name, caps, xmlopt, + notify, opaque); + virObjectRef(vm); + + virObjectRWUnlock(doms); + return vm; +} + + int virDomainObjListLoadAllConfigs(virDomainObjListPtr doms, const char *configDir, @@ -587,22 +629,22 @@ virDomainObjListLoadAllConfigs(virDomainObjListPtr doms, kill the whole process */ VIR_INFO("Loading config file '%s.xml'", entry->d_name); if (liveStatus) - dom = virDomainObjListLoadStatus(doms, - configDir, - entry->d_name, - caps, - xmlopt, - notify, - opaque); + dom = virDomainObjListLoadStatusLocked(doms, + configDir, + entry->d_name, + caps, + xmlopt, + notify, + opaque); else - dom = virDomainObjListLoadConfig(doms, - caps, - xmlopt, - configDir, - autostartDir, - entry->d_name, - notify, - opaque); + dom = virDomainObjListLoadConfigLocked(doms, + caps, + xmlopt, + configDir, + autostartDir, + entry->d_name, + notify, + opaque); if (dom) { if (!liveStatus) dom->persistent = 1; diff --git a/src/conf/virdomainobjlist.h b/src/conf/virdomainobjlist.h index bb186bde30..4aee8fae13 100644 --- a/src/conf/virdomainobjlist.h +++ b/src/conf/virdomainobjlist.h @@ -77,6 +77,23 @@ int virDomainObjListLoadAllConfigs(virDomainObjListPtr doms, virDomainXMLOptionPtr xmlopt, virDomainLoadConfigNotify notify, void *opaque); +virDomainObjPtr +virDomainObjListLoadConfig(virDomainObjListPtr doms, + virCapsPtr caps, + virDomainXMLOptionPtr xmlopt, + const char *configDir, + const char *autostartDir, + const char *name, + virDomainLoadConfigNotify notify, + void *opaque); +virDomainObjPtr +virDomainObjListLoadStatus(virDomainObjListPtr doms, + const char *statusDir, + const char *name, + virCapsPtr caps, + virDomainXMLOptionPtr xmlopt, + virDomainLoadConfigNotify notify, + void *opaque); int virDomainObjListNumOfDomains(virDomainObjListPtr doms, bool active, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index a0fde65dba..be631e3dd7 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -938,6 +938,8 @@ virDomainObjListForEach; virDomainObjListGetActiveIDs; virDomainObjListGetInactiveNames; virDomainObjListLoadAllConfigs; +virDomainObjListLoadConfig; +virDomainObjListLoadStatus; virDomainObjListNew; virDomainObjListNumOfDomains; virDomainObjListRemove; -- 2.14.3

On 12/20/2017 11:47 AM, Daniel P. Berrange wrote:
Currently drivers can only do a bulk load of config / status files for their guests. This exposes some helper methods to allow individual guests to be loaded.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/conf/virdomainobjlist.c | 98 ++++++++++++++++++++++++++++++++------------- src/conf/virdomainobjlist.h | 17 ++++++++ src/libvirt_private.syms | 2 + 3 files changed, 89 insertions(+), 28 deletions(-)
diff --git a/src/conf/virdomainobjlist.c b/src/conf/virdomainobjlist.c index 87a742b1ea..37f795fe6d 100644 --- a/src/conf/virdomainobjlist.c +++ b/src/conf/virdomainobjlist.c @@ -443,16 +443,15 @@ void virDomainObjListRemoveLocked(virDomainObjListPtr doms, virObjectUnlock(dom); }
- static virDomainObjPtr -virDomainObjListLoadConfig(virDomainObjListPtr doms, - virCapsPtr caps, - virDomainXMLOptionPtr xmlopt, - const char *configDir, - const char *autostartDir, - const char *name, - virDomainLoadConfigNotify notify, - void *opaque) +virDomainObjListLoadConfigLocked(virDomainObjListPtr doms, + virCapsPtr caps, + virDomainXMLOptionPtr xmlopt, + const char *configDir, + const char *autostartDir, + const char *name, + virDomainLoadConfigNotify notify, + void *opaque) { char *configFile = NULL, *autostartLink = NULL; virDomainDefPtr def = NULL; @@ -496,14 +495,36 @@ virDomainObjListLoadConfig(virDomainObjListPtr doms, }
-static virDomainObjPtr -virDomainObjListLoadStatus(virDomainObjListPtr doms, - const char *statusDir, - const char *name, +virDomainObjPtr +virDomainObjListLoadConfig(virDomainObjListPtr doms, virCapsPtr caps, virDomainXMLOptionPtr xmlopt, + const char *configDir, + const char *autostartDir, + const char *name, virDomainLoadConfigNotify notify, void *opaque) +{ + virDomainObjPtr vm = NULL; + virObjectRWLockWrite(doms); + + vm = virDomainObjListLoadConfigLocked(doms, caps, xmlopt, configDir, + autostartDir, name, notify, opaque); + virObjectRef(vm);
<sigh> IIRC this Ref is done because the virDomainObjListAddLocked doesn't add enough refs. One of those things that's been on my list to get to after finishing up the storage/nwfilter "common object" adjustments. IOW: AddLocked is broken because it only adds one ref even though the obj is in both the UUID and Name tables - so callers are forced to "adjust". When we leave AddLocked there should be 3 refs and 1 lock, but there are 2 refs and 1 lock. The other concerns here would be when virDomainObjListLoadAllConfigs returns a valid @dom will: 1. Set "dom->persistent" based upon the @liveStatus pool passed in 2. Call virObjectUnlock(dom) Neither happens in these new API's. So all new consumers would need to know/remember to unlock (or EndAPI) and the LoadConfig caller would need to set persistent (unless we did so in "this" function"). Probably something that needs to be documented in the two new functions. John
+ + virObjectRWUnlock(doms); + return vm; +} + + +static virDomainObjPtr +virDomainObjListLoadStatusLocked(virDomainObjListPtr doms, + const char *statusDir, + const char *name, + virCapsPtr caps, + virDomainXMLOptionPtr xmlopt, + virDomainLoadConfigNotify notify, + void *opaque) { char *statusFile = NULL; virDomainObjPtr obj = NULL; @@ -555,6 +576,27 @@ virDomainObjListLoadStatus(virDomainObjListPtr doms, }
+virDomainObjPtr +virDomainObjListLoadStatus(virDomainObjListPtr doms, + const char *statusDir, + const char *name, + virCapsPtr caps, + virDomainXMLOptionPtr xmlopt, + virDomainLoadConfigNotify notify, + void *opaque) +{ + virDomainObjPtr vm = NULL; + virObjectRWLockWrite(doms); + + vm = virDomainObjListLoadStatusLocked(doms, statusDir, name, caps, xmlopt, + notify, opaque); + virObjectRef(vm); + + virObjectRWUnlock(doms); + return vm; +} + + int virDomainObjListLoadAllConfigs(virDomainObjListPtr doms, const char *configDir, @@ -587,22 +629,22 @@ virDomainObjListLoadAllConfigs(virDomainObjListPtr doms, kill the whole process */ VIR_INFO("Loading config file '%s.xml'", entry->d_name); if (liveStatus) - dom = virDomainObjListLoadStatus(doms, - configDir, - entry->d_name, - caps, - xmlopt, - notify, - opaque); + dom = virDomainObjListLoadStatusLocked(doms, + configDir, + entry->d_name, + caps, + xmlopt, + notify, + opaque); else - dom = virDomainObjListLoadConfig(doms, - caps, - xmlopt, - configDir, - autostartDir, - entry->d_name, - notify, - opaque); + dom = virDomainObjListLoadConfigLocked(doms, + caps, + xmlopt, + configDir, + autostartDir, + entry->d_name, + notify, + opaque); if (dom) { if (!liveStatus) dom->persistent = 1; diff --git a/src/conf/virdomainobjlist.h b/src/conf/virdomainobjlist.h index bb186bde30..4aee8fae13 100644 --- a/src/conf/virdomainobjlist.h +++ b/src/conf/virdomainobjlist.h @@ -77,6 +77,23 @@ int virDomainObjListLoadAllConfigs(virDomainObjListPtr doms, virDomainXMLOptionPtr xmlopt, virDomainLoadConfigNotify notify, void *opaque); +virDomainObjPtr +virDomainObjListLoadConfig(virDomainObjListPtr doms, + virCapsPtr caps, + virDomainXMLOptionPtr xmlopt, + const char *configDir, + const char *autostartDir, + const char *name, + virDomainLoadConfigNotify notify, + void *opaque); +virDomainObjPtr +virDomainObjListLoadStatus(virDomainObjListPtr doms, + const char *statusDir, + const char *name, + virCapsPtr caps, + virDomainXMLOptionPtr xmlopt, + virDomainLoadConfigNotify notify, + void *opaque);
int virDomainObjListNumOfDomains(virDomainObjListPtr doms, bool active, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index a0fde65dba..be631e3dd7 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -938,6 +938,8 @@ virDomainObjListForEach; virDomainObjListGetActiveIDs; virDomainObjListGetInactiveNames; virDomainObjListLoadAllConfigs; +virDomainObjListLoadConfig; +virDomainObjListLoadStatus; virDomainObjListNew; virDomainObjListNumOfDomains; virDomainObjListRemove;

Currently the QEMU driver will reconnect to running guests during libvirtd startup. To support the ability to launch a fully supported guest externally to the main libvirtd daemon, this adds a QEMU specific public API virDomainQemuReconnect(conn, name, flags); This accepts a domain name, and simply runs the normal reconnect logic that the QEMU driver would do at startup. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- include/libvirt/libvirt-qemu.h | 4 ++++ src/driver-hypervisor.h | 5 +++++ src/libvirt-qemu.c | 48 ++++++++++++++++++++++++++++++++++++++++++ src/libvirt_qemu.syms | 5 +++++ src/remote/qemu_protocol.x | 18 +++++++++++++++- src/remote/remote_driver.c | 1 + 6 files changed, 80 insertions(+), 1 deletion(-) diff --git a/include/libvirt/libvirt-qemu.h b/include/libvirt/libvirt-qemu.h index 2bb8ee8685..0d38d2f9f8 100644 --- a/include/libvirt/libvirt-qemu.h +++ b/include/libvirt/libvirt-qemu.h @@ -44,6 +44,10 @@ virDomainPtr virDomainQemuAttach(virConnectPtr domain, unsigned int pid_value, unsigned int flags); +virDomainPtr virDomainQemuReconnect(virConnectPtr domain, + const char *name, + unsigned int flags); + typedef enum { VIR_DOMAIN_QEMU_AGENT_COMMAND_MIN = -2, VIR_DOMAIN_QEMU_AGENT_COMMAND_BLOCK = -2, diff --git a/src/driver-hypervisor.h b/src/driver-hypervisor.h index ce0e2b2525..2e923e730f 100644 --- a/src/driver-hypervisor.h +++ b/src/driver-hypervisor.h @@ -848,6 +848,10 @@ typedef virDomainPtr (*virDrvDomainQemuAttach)(virConnectPtr conn, unsigned int pid_value, unsigned int flags); +typedef virDomainPtr +(*virDrvDomainQemuReconnect)(virConnectPtr conn, + const char *name, + unsigned int flags); typedef int (*virDrvConnectDomainQemuMonitorEventRegister)(virConnectPtr conn, @@ -1463,6 +1467,7 @@ struct _virHypervisorDriver { virDrvDomainSnapshotDelete domainSnapshotDelete; virDrvDomainQemuMonitorCommand domainQemuMonitorCommand; virDrvDomainQemuAttach domainQemuAttach; + virDrvDomainQemuReconnect domainQemuReconnect; virDrvDomainQemuAgentCommand domainQemuAgentCommand; virDrvConnectDomainQemuMonitorEventRegister connectDomainQemuMonitorEventRegister; virDrvConnectDomainQemuMonitorEventDeregister connectDomainQemuMonitorEventDeregister; diff --git a/src/libvirt-qemu.c b/src/libvirt-qemu.c index 43f63839eb..c5859cf312 100644 --- a/src/libvirt-qemu.c +++ b/src/libvirt-qemu.c @@ -164,6 +164,54 @@ virDomainQemuAttach(virConnectPtr conn, return NULL; } +/** + * virDomainQemuReconnect: + * @conn: pointer to a hypervisor connection + * @name: the libvirt guest name + * @flags: optional flags, currently unused + * + * This API is QEMU specific, so it will only work with hypervisor + * connections to the QEMU driver. + * + * This API will attach to an QEMU process that a separate libvirt + * helper has launched. The external QEMU must fully follow the + * normal libvirt QEMU configuration. + * + * If successful, then the guest will appear in the list of running + * domains for this connection, and other APIs should operate + * normally (provided the above requirements were honored). + * + * Returns a new domain object on success, NULL otherwise + */ +virDomainPtr +virDomainQemuReconnect(virConnectPtr conn, + const char *name, + unsigned int flags) +{ + VIR_DEBUG("conn=%p, name=%s, flags=0x%x", conn, name, flags); + + virResetLastError(); + + virCheckConnectReturn(conn, NULL); + virCheckNonNullArgGoto(name, error); + + virCheckReadOnlyGoto(conn->flags, error); + + if (conn->driver->domainQemuReconnect) { + virDomainPtr ret; + ret = conn->driver->domainQemuReconnect(conn, name, flags); + if (!ret) + goto error; + return ret; + } + + virReportUnsupportedError(); + + error: + virDispatchError(conn); + return NULL; +} + /** * virDomainQemuAgentCommand: diff --git a/src/libvirt_qemu.syms b/src/libvirt_qemu.syms index 3a297e3a2b..a109fcc0d0 100644 --- a/src/libvirt_qemu.syms +++ b/src/libvirt_qemu.syms @@ -30,3 +30,8 @@ LIBVIRT_QEMU_1.2.3 { virConnectDomainQemuMonitorEventDeregister; virConnectDomainQemuMonitorEventRegister; } LIBVIRT_QEMU_0.10.0; + +LIBVIRT_QEMU_4.1.0 { + global: + virDomainQemuReconnect; +} LIBVIRT_QEMU_1.2.3; diff --git a/src/remote/qemu_protocol.x b/src/remote/qemu_protocol.x index f6b88a984c..b5307bf9dd 100644 --- a/src/remote/qemu_protocol.x +++ b/src/remote/qemu_protocol.x @@ -82,6 +82,15 @@ struct qemu_domain_monitor_event_msg { remote_string details; }; +struct qemu_domain_reconnect_args { + remote_nonnull_string name; + unsigned int flags; +}; + +struct qemu_domain_reconnect_ret { + remote_nonnull_domain dom; +}; + /* Define the program number, protocol version and procedure numbers here. */ const QEMU_PROGRAM = 0x20008087; const QEMU_PROTOCOL_VERSION = 1; @@ -154,5 +163,12 @@ enum qemu_procedure { * @generate: both * @acl: none */ - QEMU_PROC_DOMAIN_MONITOR_EVENT = 6 + QEMU_PROC_DOMAIN_MONITOR_EVENT = 6, + + /** + * @generate: both + * @acl: domain:start + * @acl: domain:write + */ + QEMU_PROC_DOMAIN_RECONNECT = 7 }; diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index f8fa64af99..b76242fe2e 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -8431,6 +8431,7 @@ static virHypervisorDriver hypervisor_driver = { .domainSnapshotDelete = remoteDomainSnapshotDelete, /* 0.8.0 */ .domainQemuMonitorCommand = remoteDomainQemuMonitorCommand, /* 0.8.3 */ .domainQemuAttach = remoteDomainQemuAttach, /* 0.9.4 */ + .domainQemuReconnect = remoteDomainQemuReconnect, /* 4.1.0 */ .domainQemuAgentCommand = remoteDomainQemuAgentCommand, /* 0.10.0 */ .connectDomainQemuMonitorEventRegister = remoteConnectDomainQemuMonitorEventRegister, /* 1.2.3 */ .connectDomainQemuMonitorEventDeregister = remoteConnectDomainQemuMonitorEventDeregister, /* 1.2.3 */ -- 2.14.3

On 12/20/2017 11:47 AM, Daniel P. Berrange wrote:
Currently the QEMU driver will reconnect to running guests during libvirtd startup. To support the ability to launch a fully supported guest externally to the main libvirtd daemon, this adds a QEMU specific public API
virDomainQemuReconnect(conn, name, flags);
This accepts a domain name, and simply runs the normal reconnect logic that the QEMU driver would do at startup.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- include/libvirt/libvirt-qemu.h | 4 ++++ src/driver-hypervisor.h | 5 +++++ src/libvirt-qemu.c | 48 ++++++++++++++++++++++++++++++++++++++++++ src/libvirt_qemu.syms | 5 +++++ src/remote/qemu_protocol.x | 18 +++++++++++++++- src/remote/remote_driver.c | 1 + 6 files changed, 80 insertions(+), 1 deletion(-)
Build for me failed on this patch during build of various src/*_protocol_struct files: --- qemu_protocol-structs 2016-01-13 07:49:19.459819838 -0500 +++ qemu_protocol-struct-t3 2017-12-21 13:42:37.836063667 -0500 @@ -47,6 +47,13 @@ u_int micros; remote_string details; }; +struct qemu_domain_reconnect_args { + remote_nonnull_string name; + u_int flags; +}; +struct qemu_domain_reconnect_ret { + remote_nonnull_domain dom; +}; enum qemu_procedure { QEMU_PROC_DOMAIN_MONITOR_COMMAND = 1, QEMU_PROC_DOMAIN_ATTACH = 2, @@ -54,4 +61,5 @@ QEMU_PROC_CONNECT_DOMAIN_MONITOR_EVENT_REGISTER = 4, QEMU_PROC_CONNECT_DOMAIN_MONITOR_EVENT_DEREGISTER = 5, QEMU_PROC_DOMAIN_MONITOR_EVENT = 6, + QEMU_PROC_DOMAIN_RECONNECT = 7, }; Otherwise, looks reasonable/right to me. John
diff --git a/include/libvirt/libvirt-qemu.h b/include/libvirt/libvirt-qemu.h index 2bb8ee8685..0d38d2f9f8 100644 --- a/include/libvirt/libvirt-qemu.h +++ b/include/libvirt/libvirt-qemu.h @@ -44,6 +44,10 @@ virDomainPtr virDomainQemuAttach(virConnectPtr domain, unsigned int pid_value, unsigned int flags);
+virDomainPtr virDomainQemuReconnect(virConnectPtr domain, + const char *name, + unsigned int flags); + typedef enum { VIR_DOMAIN_QEMU_AGENT_COMMAND_MIN = -2, VIR_DOMAIN_QEMU_AGENT_COMMAND_BLOCK = -2, diff --git a/src/driver-hypervisor.h b/src/driver-hypervisor.h index ce0e2b2525..2e923e730f 100644 --- a/src/driver-hypervisor.h +++ b/src/driver-hypervisor.h @@ -848,6 +848,10 @@ typedef virDomainPtr (*virDrvDomainQemuAttach)(virConnectPtr conn, unsigned int pid_value, unsigned int flags); +typedef virDomainPtr +(*virDrvDomainQemuReconnect)(virConnectPtr conn, + const char *name, + unsigned int flags);
typedef int (*virDrvConnectDomainQemuMonitorEventRegister)(virConnectPtr conn, @@ -1463,6 +1467,7 @@ struct _virHypervisorDriver { virDrvDomainSnapshotDelete domainSnapshotDelete; virDrvDomainQemuMonitorCommand domainQemuMonitorCommand; virDrvDomainQemuAttach domainQemuAttach; + virDrvDomainQemuReconnect domainQemuReconnect; virDrvDomainQemuAgentCommand domainQemuAgentCommand; virDrvConnectDomainQemuMonitorEventRegister connectDomainQemuMonitorEventRegister; virDrvConnectDomainQemuMonitorEventDeregister connectDomainQemuMonitorEventDeregister; diff --git a/src/libvirt-qemu.c b/src/libvirt-qemu.c index 43f63839eb..c5859cf312 100644 --- a/src/libvirt-qemu.c +++ b/src/libvirt-qemu.c @@ -164,6 +164,54 @@ virDomainQemuAttach(virConnectPtr conn, return NULL; }
+/** + * virDomainQemuReconnect: + * @conn: pointer to a hypervisor connection + * @name: the libvirt guest name + * @flags: optional flags, currently unused + * + * This API is QEMU specific, so it will only work with hypervisor + * connections to the QEMU driver. + * + * This API will attach to an QEMU process that a separate libvirt + * helper has launched. The external QEMU must fully follow the + * normal libvirt QEMU configuration. + * + * If successful, then the guest will appear in the list of running + * domains for this connection, and other APIs should operate + * normally (provided the above requirements were honored). + * + * Returns a new domain object on success, NULL otherwise + */ +virDomainPtr +virDomainQemuReconnect(virConnectPtr conn, + const char *name, + unsigned int flags) +{ + VIR_DEBUG("conn=%p, name=%s, flags=0x%x", conn, name, flags); + + virResetLastError(); + + virCheckConnectReturn(conn, NULL); + virCheckNonNullArgGoto(name, error); + + virCheckReadOnlyGoto(conn->flags, error); + + if (conn->driver->domainQemuReconnect) { + virDomainPtr ret; + ret = conn->driver->domainQemuReconnect(conn, name, flags); + if (!ret) + goto error; + return ret; + } + + virReportUnsupportedError(); + + error: + virDispatchError(conn); + return NULL; +} +
/** * virDomainQemuAgentCommand: diff --git a/src/libvirt_qemu.syms b/src/libvirt_qemu.syms index 3a297e3a2b..a109fcc0d0 100644 --- a/src/libvirt_qemu.syms +++ b/src/libvirt_qemu.syms @@ -30,3 +30,8 @@ LIBVIRT_QEMU_1.2.3 { virConnectDomainQemuMonitorEventDeregister; virConnectDomainQemuMonitorEventRegister; } LIBVIRT_QEMU_0.10.0; + +LIBVIRT_QEMU_4.1.0 { + global: + virDomainQemuReconnect; +} LIBVIRT_QEMU_1.2.3; diff --git a/src/remote/qemu_protocol.x b/src/remote/qemu_protocol.x index f6b88a984c..b5307bf9dd 100644 --- a/src/remote/qemu_protocol.x +++ b/src/remote/qemu_protocol.x @@ -82,6 +82,15 @@ struct qemu_domain_monitor_event_msg { remote_string details; };
+struct qemu_domain_reconnect_args { + remote_nonnull_string name; + unsigned int flags; +}; + +struct qemu_domain_reconnect_ret { + remote_nonnull_domain dom; +}; + /* Define the program number, protocol version and procedure numbers here. */ const QEMU_PROGRAM = 0x20008087; const QEMU_PROTOCOL_VERSION = 1; @@ -154,5 +163,12 @@ enum qemu_procedure { * @generate: both * @acl: none */ - QEMU_PROC_DOMAIN_MONITOR_EVENT = 6 + QEMU_PROC_DOMAIN_MONITOR_EVENT = 6, + + /** + * @generate: both + * @acl: domain:start + * @acl: domain:write + */ + QEMU_PROC_DOMAIN_RECONNECT = 7 }; diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index f8fa64af99..b76242fe2e 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -8431,6 +8431,7 @@ static virHypervisorDriver hypervisor_driver = { .domainSnapshotDelete = remoteDomainSnapshotDelete, /* 0.8.0 */ .domainQemuMonitorCommand = remoteDomainQemuMonitorCommand, /* 0.8.3 */ .domainQemuAttach = remoteDomainQemuAttach, /* 0.9.4 */ + .domainQemuReconnect = remoteDomainQemuReconnect, /* 4.1.0 */ .domainQemuAgentCommand = remoteDomainQemuAgentCommand, /* 0.10.0 */ .connectDomainQemuMonitorEventRegister = remoteConnectDomainQemuMonitorEventRegister, /* 1.2.3 */ .connectDomainQemuMonitorEventDeregister = remoteConnectDomainQemuMonitorEventDeregister, /* 1.2.3 */

Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/qemu/qemu_driver.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_process.c | 31 ++++++++++++++++++--------- src/qemu/qemu_process.h | 1 + 3 files changed, 79 insertions(+), 10 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 97b194b057..fea1f24250 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16300,6 +16300,62 @@ static virDomainPtr qemuDomainQemuAttach(virConnectPtr conn, } +static virDomainPtr qemuDomainQemuReconnect(virConnectPtr conn, + const char *name, + unsigned int flags) +{ + virQEMUDriverPtr driver = conn->privateData; + virDomainObjPtr vm = NULL; + virDomainPtr dom = NULL; + virCapsPtr caps = NULL; + virQEMUDriverConfigPtr cfg; + + virCheckFlags(0, NULL); + + cfg = virQEMUDriverGetConfig(driver); + + if (strchr(name, '/')) { + virReportError(VIR_ERR_XML_ERROR, + _("name %s cannot contain '/'"), name); + goto cleanup; + } + + vm = virDomainObjListFindByName(driver->domains, name); + if (vm) { + virReportError(VIR_ERR_OPERATION_INVALID, + _("Domain '%s' already exists"), name); + goto cleanup; + } + + if (!(caps = virQEMUDriverGetCapabilities(driver, false))) + goto cleanup; + + if (!(vm = virDomainObjListLoadStatus(driver->domains, + cfg->stateDir, + name, + caps, + driver->xmlopt, + NULL, NULL))) { + goto cleanup; + } + + if (virDomainQemuReconnectEnsureACL(conn, vm->def) < 0) { + qemuDomainRemoveInactive(driver, vm); + goto cleanup; + } + + dom = virGetDomain(conn, vm->def->name, vm->def->uuid, vm->def->id); + + qemuDomainObjEndJob(driver, vm); + + cleanup: + virDomainObjEndAPI(&vm); + virObjectUnref(caps); + virObjectUnref(cfg); + return dom; +} + + static int qemuDomainOpenConsole(virDomainPtr dom, const char *dev_name, @@ -21262,6 +21318,7 @@ static virHypervisorDriver qemuHypervisorDriver = { .domainSnapshotDelete = qemuDomainSnapshotDelete, /* 0.8.0 */ .domainQemuMonitorCommand = qemuDomainQemuMonitorCommand, /* 0.8.3 */ .domainQemuAttach = qemuDomainQemuAttach, /* 0.9.4 */ + .domainQemuReconnect = qemuDomainQemuReconnect, /* 4.1.0 */ .domainQemuAgentCommand = qemuDomainQemuAgentCommand, /* 0.10.0 */ .connectDomainQemuMonitorEventRegister = qemuConnectDomainQemuMonitorEventRegister, /* 1.2.3 */ .connectDomainQemuMonitorEventDeregister = qemuConnectDomainQemuMonitorEventDeregister, /* 1.2.3 */ diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index a0f430f89f..ea924cf9b6 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -7088,14 +7088,10 @@ struct qemuProcessReconnectData { * We can't do normal MonitorEnter & MonitorExit because these two lock the * monitor lock, which does not exists in this early phase. */ -static void -qemuProcessReconnect(void *opaque) +int +qemuProcessReconnect(virQEMUDriverPtr driver, virDomainObjPtr obj, virConnectPtr conn) { - struct qemuProcessReconnectData *data = opaque; - virQEMUDriverPtr driver = data->driver; - virDomainObjPtr obj = data->obj; qemuDomainObjPrivatePtr priv; - virConnectPtr conn = data->conn; struct qemuDomainJobObj oldjob; int state; int reason; @@ -7104,8 +7100,7 @@ qemuProcessReconnect(void *opaque) unsigned int stopFlags = 0; bool jobStarted = false; virCapsPtr caps = NULL; - - VIR_FREE(data); + int ret = -1; qemuDomainObjRestoreJob(obj, &oldjob); if (oldjob.asyncJob == QEMU_ASYNC_JOB_MIGRATION_IN) @@ -7297,6 +7292,7 @@ qemuProcessReconnect(void *opaque) if (virAtomicIntInc(&driver->nactive) == 1 && driver->inhibitCallback) driver->inhibitCallback(true, driver->inhibitOpaque); + ret = 0; cleanup: if (jobStarted) { if (!virDomainObjIsActive(obj)) @@ -7311,7 +7307,7 @@ qemuProcessReconnect(void *opaque) virObjectUnref(cfg); virObjectUnref(caps); virNWFilterUnlockFilterUpdates(); - return; + return ret; error: if (virDomainObjIsActive(obj)) { @@ -7338,6 +7334,21 @@ qemuProcessReconnect(void *opaque) goto cleanup; } + +static void +qemuProcessReconnectThread(void *opaque) +{ + struct qemuProcessReconnectData *data = opaque; + virQEMUDriverPtr driver = data->driver; + virDomainObjPtr obj = data->obj; + virConnectPtr conn = data->conn; + + qemuProcessReconnect(driver, obj, conn); + + VIR_FREE(data); +} + + static int qemuProcessReconnectHelper(virDomainObjPtr obj, void *opaque) @@ -7369,7 +7380,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj, */ virObjectRef(data->conn); - if (virThreadCreate(&thread, false, qemuProcessReconnect, data) < 0) { + if (virThreadCreate(&thread, false, qemuProcessReconnectThread, data) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Could not create thread. QEMU initialization " "might be incomplete")); diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h index cd9a720313..846577d341 100644 --- a/src/qemu/qemu_process.h +++ b/src/qemu/qemu_process.h @@ -45,6 +45,7 @@ int qemuProcessBuildDestroyMemoryPaths(virQEMUDriverPtr driver, void qemuProcessAutostartAll(virQEMUDriverPtr driver); void qemuProcessReconnectAll(virConnectPtr conn, virQEMUDriverPtr driver); +int qemuProcessReconnect(virQEMUDriverPtr driver, virDomainObjPtr obj, virConnectPtr conn); typedef struct _qemuProcessIncomingDef qemuProcessIncomingDef; typedef qemuProcessIncomingDef *qemuProcessIncomingDefPtr; -- 2.14.3

On 12/20/2017 11:47 AM, Daniel P. Berrange wrote:
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/qemu/qemu_driver.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_process.c | 31 ++++++++++++++++++--------- src/qemu/qemu_process.h | 1 + 3 files changed, 79 insertions(+), 10 deletions(-)
It feels like there's two separate patches here as the qemu_process stuff seems separable. Of course it dawns on me now... Should we go with a QemuConnect type naming scheme? Usage of Reconnect would seem to assume we lost connection at some point because libvirtd was stopped or crashed. Of course Attach would have been "optimal", but that's already used for something else. Perhaps the "proof" that things can work is to be able to have the process reconnect logic and this logic intersect. Theoretically they are doing the same thing in the long run at least with respect to being able to connect to an existing qemu process.
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 97b194b057..fea1f24250 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16300,6 +16300,62 @@ static virDomainPtr qemuDomainQemuAttach(virConnectPtr conn, }
+static virDomainPtr qemuDomainQemuReconnect(virConnectPtr conn, + const char *name, + unsigned int flags)
static virDomainPtr qemuDomainQemuReconnect(...)
+{ + virQEMUDriverPtr driver = conn->privateData; + virDomainObjPtr vm = NULL; + virDomainPtr dom = NULL; + virCapsPtr caps = NULL; + virQEMUDriverConfigPtr cfg; + + virCheckFlags(0, NULL); + + cfg = virQEMUDriverGetConfig(driver); + + if (strchr(name, '/')) { + virReportError(VIR_ERR_XML_ERROR, + _("name %s cannot contain '/'"), name); + goto cleanup; + } + + vm = virDomainObjListFindByName(driver->domains, name); + if (vm) { + virReportError(VIR_ERR_OPERATION_INVALID, + _("Domain '%s' already exists"), name); + goto cleanup; + } + + if (!(caps = virQEMUDriverGetCapabilities(driver, false))) + goto cleanup; + + if (!(vm = virDomainObjListLoadStatus(driver->domains, + cfg->stateDir, + name, + caps, + driver->xmlopt, + NULL, NULL))) { + goto cleanup; + }
This would assume that stateDir has a file @name.xml, right? IOW: shim processing creates it. I only mention it as a "one extra sanity" type check that we may want to make; otherwise, we could get some sort of less obvious error... I didn't do much digging to prove/disprove.
+ + if (virDomainQemuReconnectEnsureACL(conn, vm->def) < 0) { + qemuDomainRemoveInactive(driver, vm); + goto cleanup; + } + + dom = virGetDomain(conn, vm->def->name, vm->def->uuid, vm->def->id); + + qemuDomainObjEndJob(driver, vm);
When did we Begin the job? I assume that this is borrowing logic from qemuProcessReconnect and this may just be "leftover".
+ + cleanup: + virDomainObjEndAPI(&vm); + virObjectUnref(caps); + virObjectUnref(cfg); + return dom; +} + + static int qemuDomainOpenConsole(virDomainPtr dom, const char *dev_name, @@ -21262,6 +21318,7 @@ static virHypervisorDriver qemuHypervisorDriver = { .domainSnapshotDelete = qemuDomainSnapshotDelete, /* 0.8.0 */ .domainQemuMonitorCommand = qemuDomainQemuMonitorCommand, /* 0.8.3 */ .domainQemuAttach = qemuDomainQemuAttach, /* 0.9.4 */ + .domainQemuReconnect = qemuDomainQemuReconnect, /* 4.1.0 */ .domainQemuAgentCommand = qemuDomainQemuAgentCommand, /* 0.10.0 */ .connectDomainQemuMonitorEventRegister = qemuConnectDomainQemuMonitorEventRegister, /* 1.2.3 */ .connectDomainQemuMonitorEventDeregister = qemuConnectDomainQemuMonitorEventDeregister, /* 1.2.3 */ diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index a0f430f89f..ea924cf9b6 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -7088,14 +7088,10 @@ struct qemuProcessReconnectData { * We can't do normal MonitorEnter & MonitorExit because these two lock the * monitor lock, which does not exists in this early phase. */ -static void -qemuProcessReconnect(void *opaque) +int +qemuProcessReconnect(virQEMUDriverPtr driver, virDomainObjPtr obj, virConnectPtr conn)
One argument per line...
{ - struct qemuProcessReconnectData *data = opaque; - virQEMUDriverPtr driver = data->driver; - virDomainObjPtr obj = data->obj; qemuDomainObjPrivatePtr priv; - virConnectPtr conn = data->conn; struct qemuDomainJobObj oldjob; int state; int reason; @@ -7104,8 +7100,7 @@ qemuProcessReconnect(void *opaque) unsigned int stopFlags = 0; bool jobStarted = false; virCapsPtr caps = NULL; - - VIR_FREE(data); + int ret = -1;
qemuDomainObjRestoreJob(obj, &oldjob); if (oldjob.asyncJob == QEMU_ASYNC_JOB_MIGRATION_IN) @@ -7297,6 +7292,7 @@ qemuProcessReconnect(void *opaque) if (virAtomicIntInc(&driver->nactive) == 1 && driver->inhibitCallback) driver->inhibitCallback(true, driver->inhibitOpaque);
+ ret = 0; cleanup: if (jobStarted) { if (!virDomainObjIsActive(obj)) @@ -7311,7 +7307,7 @@ qemuProcessReconnect(void *opaque) virObjectUnref(cfg); virObjectUnref(caps); virNWFilterUnlockFilterUpdates(); - return; + return ret;
error: if (virDomainObjIsActive(obj)) { @@ -7338,6 +7334,21 @@ qemuProcessReconnect(void *opaque) goto cleanup; }
+ +static void +qemuProcessReconnectThread(void *opaque) +{ + struct qemuProcessReconnectData *data = opaque; + virQEMUDriverPtr driver = data->driver; + virDomainObjPtr obj = data->obj; + virConnectPtr conn = data->conn; + + qemuProcessReconnect(driver, obj, conn);
So we modify the above call to return an int, but do nothing with that status. I suspect there is a future reason for qemuProcessReconnect, but figured I'd at least note it... John
+ + VIR_FREE(data); +} + + static int qemuProcessReconnectHelper(virDomainObjPtr obj, void *opaque) @@ -7369,7 +7380,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj, */ virObjectRef(data->conn);
- if (virThreadCreate(&thread, false, qemuProcessReconnect, data) < 0) { + if (virThreadCreate(&thread, false, qemuProcessReconnectThread, data) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Could not create thread. QEMU initialization " "might be incomplete")); diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h index cd9a720313..846577d341 100644 --- a/src/qemu/qemu_process.h +++ b/src/qemu/qemu_process.h @@ -45,6 +45,7 @@ int qemuProcessBuildDestroyMemoryPaths(virQEMUDriverPtr driver,
void qemuProcessAutostartAll(virQEMUDriverPtr driver); void qemuProcessReconnectAll(virConnectPtr conn, virQEMUDriverPtr driver); +int qemuProcessReconnect(virQEMUDriverPtr driver, virDomainObjPtr obj, virConnectPtr conn);
typedef struct _qemuProcessIncomingDef qemuProcessIncomingDef; typedef qemuProcessIncomingDef *qemuProcessIncomingDefPtr;

On Wed, Dec 20, 2017 at 04:47:49PM +0000, Daniel P. Berrange wrote:
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/qemu/qemu_driver.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_process.c | 31 ++++++++++++++++++--------- src/qemu/qemu_process.h | 1 + 3 files changed, 79 insertions(+), 10 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 97b194b057..fea1f24250 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16300,6 +16300,62 @@ static virDomainPtr qemuDomainQemuAttach(virConnectPtr conn, }
+static virDomainPtr qemuDomainQemuReconnect(virConnectPtr conn, + const char *name, + unsigned int flags) +{ + virQEMUDriverPtr driver = conn->privateData; + virDomainObjPtr vm = NULL; + virDomainPtr dom = NULL; + virCapsPtr caps = NULL; + virQEMUDriverConfigPtr cfg; + + virCheckFlags(0, NULL); + + cfg = virQEMUDriverGetConfig(driver); + + if (strchr(name, '/')) { + virReportError(VIR_ERR_XML_ERROR, + _("name %s cannot contain '/'"), name); + goto cleanup; + } + + vm = virDomainObjListFindByName(driver->domains, name); + if (vm) { + virReportError(VIR_ERR_OPERATION_INVALID, + _("Domain '%s' already exists"), name); + goto cleanup; + } +
It should be possible to start existing inactive domain.
+ if (!(caps = virQEMUDriverGetCapabilities(driver, false))) + goto cleanup; +
You are leaking vm here. And we should start a job on it.
+ if (!(vm = virDomainObjListLoadStatus(driver->domains, + cfg->stateDir, + name, + caps, + driver->xmlopt, + NULL, NULL))) { + goto cleanup; + } + + if (virDomainQemuReconnectEnsureACL(conn, vm->def) < 0) { + qemuDomainRemoveInactive(driver, vm);
Of course adjust according to above.

This introduces a new binary 'libvirt_qemu' which can be used to launch guests externally from libvirtd. eg libvirt_qemu -c qemu:///system /path/to/xml/file This will launch a guest from the requested XML file and then connect to qemu:///system to register it with libvirtd. At this point all normal libvirtd operations should generally work. NB, this impl has many flaws: - We don't generate a unique 'id', so all guests will get id==1. Surprisingly this doesn't break the world. - Tracking of unique name + UUIDs is *not* race free. - No tracking/allocation of shared resource state (PCI devices, etc) Most other functionality works, but might have expected results. In terms of namespace support there's some restrictions - The mount namespace must share certain dirs /etc/libvirt /var/run/libvirt /var/lib/libvirt /var/cache/libvirt between the libvirtd and libvirt_qemu processes - The PID namespace must match libvirtd & libvirt_qemu though this will be addressed in another patch to come - The network namespace can be different - Hotplug/unplug will likely break if separate namespaces are used - The libvirt_qemu will exit if startup fails, however, it won't exit when QEMU later shuts down - you must listen for libvirtd events to detect that right now. We'll fix that - Killing the libvirt_qemu shim will not kill the QEMU process. You must kill via the libvirt API as normal. This won't change, because we need to be able to ultimately restart the libvirt_qemu shim in order to apply software updates to running instance. We might wire up a special signal though to let you kill libvirt_qemu & take out QEMU at same time eg SIGQUIT or something like that perhaps. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- po/POTFILES.in | 1 + src/Makefile.am | 49 ++++ src/qemu/qemu_conf.h | 2 +- src/qemu/qemu_controller.c | 551 +++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.c | 2 +- src/qemu/qemu_driver.c | 2 +- 6 files changed, 604 insertions(+), 3 deletions(-) create mode 100644 src/qemu/qemu_controller.c diff --git a/po/POTFILES.in b/po/POTFILES.in index c1fa23427e..d97175b6d5 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -133,6 +133,7 @@ src/qemu/qemu_block.c src/qemu/qemu_capabilities.c src/qemu/qemu_cgroup.c src/qemu/qemu_command.c +src/qemu/qemu_controller.c src/qemu/qemu_conf.c src/qemu/qemu_domain.c src/qemu/qemu_domain_address.c diff --git a/src/Makefile.am b/src/Makefile.am index 4c022d1e44..7ca5ad4d3c 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -922,6 +922,9 @@ QEMU_DRIVER_SOURCES = \ qemu/qemu_capspriv.h \ qemu/qemu_security.c qemu/qemu_security.h +QEMU_CONTROLLER_SOURCES = \ + qemu/qemu_controller.c + XENAPI_DRIVER_SOURCES = \ xenapi/xenapi_driver.c xenapi/xenapi_driver.h \ xenapi/xenapi_driver_private.h \ @@ -3115,6 +3118,52 @@ endif WITH_LIBVIRTD endif WITH_LXC EXTRA_DIST += $(LXC_CONTROLLER_SOURCES) +if WITH_QEMU +if WITH_LIBVIRTD +libexec_PROGRAMS += libvirt_qemu + +libvirt_qemu_SOURCES = \ + $(QEMU_CONTROLLER_SOURCES) \ + $(DATATYPES_SOURCES) +libvirt_qemu_LDFLAGS = \ + $(AM_LDFLAGS) \ + $(PIE_LDFLAGS) \ + $(NULL) +libvirt_qemu_LDADD = \ + libvirt.la \ + libvirt-qemu.la \ + libvirt_driver_qemu_impl.la +if WITH_NETWORK +libvirt_qemu_LDADD += libvirt_driver_network_impl.la +endif WITH_NETWORK +if WITH_STORAGE +libvirt_qemu_LDADD += libvirt_driver_storage_impl.la +endif WITH_STORAGE +libvirt_qemu_LDADD += ../gnulib/lib/libgnu.la $(LIBSOCKET) +if WITH_DTRACE_PROBES +libvirt_qemu_LDADD += libvirt_probes.lo libvirt_qemu_probes.lo +endif WITH_DTRACE_PROBES +libvirt_qemu_LDADD += $(SECDRIVER_LIBS) +libvirt_qemu_CFLAGS = \ + -I$(srcdir)/access \ + -I$(srcdir)/conf \ + -I$(srcdir)/secret \ + $(AM_CFLAGS) \ + $(PIE_CFLAGS) \ + $(LIBNL_CFLAGS) \ + $(FUSE_CFLAGS) \ + $(DBUS_CFLAGS) \ + $(XDR_CFLAGS) \ + $(NULL) +if WITH_BLKID +libvirt_qemu_CFLAGS += $(BLKID_CFLAGS) +libvirt_qemu_LDADD += $(BLKID_LIBS) +endif WITH_BLKID +libvirt_qemu_CFLAGS += $(SECDRIVER_CFLAGS) +endif WITH_LIBVIRTD +endif WITH_QEMU +EXTRA_DIST += $(QEMU_CONTROLLER_SOURCES) + if WITH_SECDRIVER_APPARMOR if WITH_LIBVIRTD libexec_PROGRAMS += virt-aa-helper diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h index a553e30e2e..a19b199147 100644 --- a/src/qemu/qemu_conf.h +++ b/src/qemu/qemu_conf.h @@ -296,7 +296,7 @@ struct _qemuDomainCmdlineDef { char **env_value; }; - +int qemuSecurityInit(virQEMUDriverPtr driver); void qemuDomainCmdlineDefFree(qemuDomainCmdlineDefPtr def); diff --git a/src/qemu/qemu_controller.c b/src/qemu/qemu_controller.c new file mode 100644 index 0000000000..320d9a99b6 --- /dev/null +++ b/src/qemu/qemu_controller.c @@ -0,0 +1,551 @@ +/* + * qemu_controller.c: QEMU process controller + * + * Copyright (C) 2006-2018 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + */ + +#include <config.h> + +#include <stdlib.h> +#include <getopt.h> +#include <stdio.h> +#include <unistd.h> + +#include "virgettext.h" +#include "virthread.h" +#include "virlog.h" +#include "virfile.h" +#include "viralloc.h" +#include "virstring.h" +#include "dirname.h" +#include "driver.h" + +#include "qemu/qemu_conf.h" +#include "qemu/qemu_process.h" +#include "qemu/qemu_driver.h" +#include "libvirt_internal.h" + +static const char *argv0; + +#define VIR_FROM_THIS VIR_FROM_QEMU + +VIR_LOG_INIT("qemu.qemu_controller"); + +typedef struct virQEMUController virQEMUController; +typedef virQEMUController *virQEMUControllerPtr; +struct virQEMUController { + const char *uri; + bool privileged; + const char *xml; + virQEMUDriverPtr driver; + virConnectPtr conn; + virDomainObjPtr vm; +}; + +static void +virQEMUControllerDriverFree(virQEMUDriverPtr driver) +{ + if (!driver) + return; + + virObjectUnref(driver->config); + virObjectUnref(driver->hostdevMgr); + virHashFree(driver->sharedDevices); + virObjectUnref(driver->caps); + virObjectUnref(driver->qemuCapsCache); + + virObjectUnref(driver->domains); + virObjectUnref(driver->remotePorts); + virObjectUnref(driver->webSocketPorts); + virObjectUnref(driver->migrationPorts); + virObjectUnref(driver->migrationErrors); + + virObjectUnref(driver->xmlopt); + + virSysinfoDefFree(driver->hostsysinfo); + + virObjectUnref(driver->closeCallbacks); + + VIR_FREE(driver->qemuImgBinary); + + virObjectUnref(driver->securityManager); + + ebtablesContextFree(driver->ebtables); + + virLockManagerPluginUnref(driver->lockManager); + + virMutexDestroy(&driver->lock); + VIR_FREE(driver); +} + +static virQEMUDriverPtr virQEMUControllerNewDriver(bool privileged) +{ + virQEMUDriverPtr driver = NULL; + char *driverConf = NULL; + virConnectPtr conn = NULL; + virQEMUDriverConfigPtr cfg; + uid_t run_uid = -1; + gid_t run_gid = -1; + char *hugepagePath = NULL; + char *memoryBackingPath = NULL; + size_t i; + + if (VIR_ALLOC(driver) < 0) + return NULL; + + if (virMutexInit(&driver->lock) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("cannot initialize mutex")); + VIR_FREE(driver); + return NULL; + } + + driver->privileged = privileged; + + /* read the host sysinfo */ + if (privileged) + driver->hostsysinfo = virSysinfoRead(); + + if (!(driver->config = cfg = virQEMUDriverConfigNew(privileged))) + goto error; + + if (virAsprintf(&driverConf, "%s/qemu.conf", cfg->configBaseDir) < 0) + goto error; + + if (virQEMUDriverConfigLoadFile(cfg, driverConf, privileged) < 0) + goto error; + VIR_FREE(driverConf); + + if (virQEMUDriverConfigValidate(cfg) < 0) + goto error; + + if (virFileMakePath(cfg->stateDir) < 0) { + virReportSystemError(errno, _("Failed to create state dir %s"), + cfg->stateDir); + goto error; + } + if (virFileMakePath(cfg->libDir) < 0) { + virReportSystemError(errno, _("Failed to create lib dir %s"), + cfg->libDir); + goto error; + } + if (virFileMakePath(cfg->cacheDir) < 0) { + virReportSystemError(errno, _("Failed to create cache dir %s"), + cfg->cacheDir); + goto error; + } + if (virFileMakePath(cfg->saveDir) < 0) { + virReportSystemError(errno, _("Failed to create save dir %s"), + cfg->saveDir); + goto error; + } + if (virFileMakePath(cfg->snapshotDir) < 0) { + virReportSystemError(errno, _("Failed to create save dir %s"), + cfg->snapshotDir); + goto error; + } + if (virFileMakePath(cfg->autoDumpPath) < 0) { + virReportSystemError(errno, _("Failed to create dump dir %s"), + cfg->autoDumpPath); + goto error; + } + if (virFileMakePath(cfg->channelTargetDir) < 0) { + virReportSystemError(errno, _("Failed to create channel target dir %s"), + cfg->channelTargetDir); + goto error; + } + if (virFileMakePath(cfg->nvramDir) < 0) { + virReportSystemError(errno, _("Failed to create nvram dir %s"), + cfg->nvramDir); + goto error; + } + if (virFileMakePath(cfg->memoryBackingDir) < 0) { + virReportSystemError(errno, _("Failed to create memory backing dir %s"), + cfg->memoryBackingDir); + goto error; + } + + driver->qemuImgBinary = virFindFileInPath("qemu-img"); + + if (!(driver->lockManager = + virLockManagerPluginNew(cfg->lockManagerName ? + cfg->lockManagerName : "nop", + "qemu", + cfg->configBaseDir, + 0))) + goto error; + + if (cfg->macFilter) { + if (!(driver->ebtables = ebtablesContextNew("qemu"))) { + virReportSystemError(errno, + _("failed to enable mac filter in '%s'"), + __FILE__); + goto error; + } + + if (ebtablesAddForwardPolicyReject(driver->ebtables) < 0) + goto error; + } + + /* Allocate bitmap for remote display port reservations. We cannot + * do this before the config is loaded properly, since the port + * numbers are configurable now */ + if ((driver->remotePorts = + virPortAllocatorNew(_("display"), + cfg->remotePortMin, + cfg->remotePortMax, + 0)) == NULL) + goto error; + + if ((driver->webSocketPorts = + virPortAllocatorNew(_("webSocket"), + cfg->webSocketPortMin, + cfg->webSocketPortMax, + 0)) == NULL) + goto error; + + if ((driver->migrationPorts = + virPortAllocatorNew(_("migration"), + cfg->migrationPortMin, + cfg->migrationPortMax, + 0)) == NULL) + goto error; + + if (qemuSecurityInit(driver) < 0) + goto error; + + if (!(driver->hostdevMgr = virHostdevManagerGetDefault())) + goto error; + + if (!(driver->sharedDevices = virHashCreate(30, qemuSharedDeviceEntryFree))) + goto error; + + if (privileged) { + char *channeldir; + + if (chown(cfg->libDir, cfg->user, cfg->group) < 0) { + virReportSystemError(errno, + _("unable to set ownership of '%s' to user %d:%d"), + cfg->libDir, (int) cfg->user, + (int) cfg->group); + goto error; + } + if (chown(cfg->cacheDir, cfg->user, cfg->group) < 0) { + virReportSystemError(errno, + _("unable to set ownership of '%s' to %d:%d"), + cfg->cacheDir, (int) cfg->user, + (int) cfg->group); + goto error; + } + if (chown(cfg->saveDir, cfg->user, cfg->group) < 0) { + virReportSystemError(errno, + _("unable to set ownership of '%s' to %d:%d"), + cfg->saveDir, (int) cfg->user, + (int) cfg->group); + goto error; + } + if (chown(cfg->snapshotDir, cfg->user, cfg->group) < 0) { + virReportSystemError(errno, + _("unable to set ownership of '%s' to %d:%d"), + cfg->snapshotDir, (int) cfg->user, + (int) cfg->group); + goto error; + } + if (chown(cfg->autoDumpPath, cfg->user, cfg->group) < 0) { + virReportSystemError(errno, + _("unable to set ownership of '%s' to %d:%d"), + cfg->autoDumpPath, (int) cfg->user, + (int) cfg->group); + goto error; + } + if (!(channeldir = mdir_name(cfg->channelTargetDir))) { + virReportOOMError(); + goto error; + } + if (chown(channeldir, cfg->user, cfg->group) < 0) { + virReportSystemError(errno, + _("unable to set ownership of '%s' to %d:%d"), + channeldir, (int) cfg->user, + (int) cfg->group); + VIR_FREE(channeldir); + goto error; + } + VIR_FREE(channeldir); + if (chown(cfg->channelTargetDir, cfg->user, cfg->group) < 0) { + virReportSystemError(errno, + _("unable to set ownership of '%s' to %d:%d"), + cfg->channelTargetDir, (int) cfg->user, + (int) cfg->group); + goto error; + } + if (chown(cfg->nvramDir, cfg->user, cfg->group) < 0) { + virReportSystemError(errno, + _("unable to set ownership of '%s' to %d:%d"), + cfg->nvramDir, (int) cfg->user, + (int) cfg->group); + goto error; + } + if (chown(cfg->memoryBackingDir, cfg->user, cfg->group) < 0) { + virReportSystemError(errno, + _("unable to set ownership of '%s' to %d:%d"), + cfg->memoryBackingDir, (int) cfg->user, + (int) cfg->group); + goto error; + } + + run_uid = cfg->user; + run_gid = cfg->group; + } + + driver->qemuCapsCache = virQEMUCapsCacheNew(cfg->libDir, + cfg->cacheDir, + run_uid, + run_gid); + if (!driver->qemuCapsCache) + goto error; + + if ((driver->caps = virQEMUDriverCreateCapabilities(driver)) == NULL) + goto error; + + if (!(driver->xmlopt = virQEMUDriverCreateXMLConf(driver))) + goto error; + + /* If hugetlbfs is present, then we need to create a sub-directory within + * it, since we can't assume the root mount point has permissions that + * will let our spawned QEMU instances use it. */ + for (i = 0; i < cfg->nhugetlbfs; i++) { + hugepagePath = qemuGetBaseHugepagePath(&cfg->hugetlbfs[i]); + + if (!hugepagePath) + goto error; + + if (virFileMakePath(hugepagePath) < 0) { + virReportSystemError(errno, + _("unable to create hugepage path %s"), + hugepagePath); + goto error; + } + if (privileged && + virFileUpdatePerm(cfg->hugetlbfs[i].mnt_dir, + 0, S_IXGRP | S_IXOTH) < 0) + goto error; + VIR_FREE(hugepagePath); + } + + if (qemuGetMemoryBackingBasePath(cfg, &memoryBackingPath) < 0) + goto error; + + if (virFileMakePath(memoryBackingPath) < 0) { + virReportSystemError(errno, + _("unable to create memory backing path %s"), + memoryBackingPath); + goto error; + } + + if (privileged && + virFileUpdatePerm(memoryBackingPath, + 0, S_IXGRP | S_IXOTH) < 0) + goto error; + VIR_FREE(memoryBackingPath); + + if (!(driver->closeCallbacks = virCloseCallbacksNew())) + goto error; + + return driver; + + error: + virObjectUnref(conn); + VIR_FREE(driverConf); + VIR_FREE(hugepagePath); + VIR_FREE(memoryBackingPath); + virQEMUControllerDriverFree(driver); + return NULL; +} + +static void show_help(FILE *io) +{ + fprintf(io, "\n"); + fprintf(io, "syntax: %s [OPTIONS] PATH-TO-XML\n", argv0); + fprintf(io, "\n"); + fprintf(io, "Options\n"); + fprintf(io, "\n"); + fprintf(io, " -c URI, --connect URI\n"); + fprintf(io, " -h, --help\n"); + fprintf(io, "\n"); +} + +static void +virQEMUControllerMain(void *opaque) +{ + int ret = -1; + virQEMUControllerPtr ctrl = opaque; + virQEMUDriverConfigPtr cfg; + virDomainChrSourceDef monitor_chr = { 0 }; + qemuDomainObjPrivatePtr priv; + virDomainPtr dom; + + if (!(ctrl->conn = virConnectOpen(ctrl->uri))) { + fprintf(stderr, "Unable to connect to %s: %s", + ctrl->uri, virGetLastErrorMessage()); + goto cleanup; + } + + if (!(ctrl->driver = virQEMUControllerNewDriver(ctrl->privileged))) { + fprintf(stderr, "Unable to initialize driver: %s", + virGetLastErrorMessage()); + goto cleanup; + } + + cfg = virObjectRef(ctrl->driver->config); + + if (qemuProcessPrepareMonitorChr(&monitor_chr, cfg->libDir) < 0) { + fprintf(stderr, "Unable to prepare QEMU monitor: %s\n", + virGetLastErrorMessage()); + goto cleanup; + } + + if (!(ctrl->vm = virDomainObjNew(ctrl->driver->xmlopt))) { + fprintf(stderr, "Unable to allocate domain object: %s\n", + virGetLastErrorMessage()); + goto cleanup; + } + + if (!(ctrl->vm->def = virDomainDefParseFile(ctrl->xml, + ctrl->driver->caps, ctrl->driver->xmlopt, + NULL, VIR_DOMAIN_DEF_PARSE_INACTIVE))) { + fprintf(stderr, "Unable to parse domain config %s\n", + virGetLastErrorMessage()); + goto cleanup; + } + + if (qemuProcessStart(NULL, ctrl->driver, ctrl->vm, NULL, 0, NULL, -1, NULL, NULL, 0, 0) < 0) { + fprintf(stderr, "Unable to start QEMU: %s\n", + virGetLastErrorMessage()); + goto cleanup; + } + + priv = ctrl->vm->privateData; + + /* Release the monitor & agent sockets, so main libvirtd can take over */ + qemuMonitorClose(priv->mon); + if (priv->agent) + qemuAgentClose(priv->agent); + + dom = virDomainQemuReconnect(ctrl->conn, ctrl->vm->def->name, 0); + if (!dom) { + qemuProcessStop(ctrl->driver, ctrl->vm, 0, 0, 0); + fprintf(stderr, "Unable to reconnect with libvirtd: %s\n", + virGetLastErrorMessage()); + goto cleanup; + } + + virObjectUnref(dom); + + fprintf(stderr, "QEMU running and connected\n"); + ret = 0; + cleanup: + virConnectClose(ctrl->conn); + if (ret < 0) + exit(EXIT_FAILURE); +} + +int main(int argc, char **argv) +{ + int rc = 1; + virThread thr; + virQEMUControllerPtr ctrl; + const struct option options[] = { + { "connect", 1, NULL, 'c' }, + { "help", 0, NULL, 'h' }, + { 0, 0, 0, 0 }, + }; + + argv0 = argv[0]; + + if (virGettextInitialize() < 0 || + virThreadInitialize() < 0 || + virErrorInitialize() < 0) { + fprintf(stderr, _("%s: initialization failed\n"), argv0); + exit(EXIT_FAILURE); + } + + /* Initialize logging */ + virLogSetFromEnv(); + + virUpdateSelfLastChanged(argv[0]); + + virFileActivateDirOverride(argv[0]); + + if (VIR_ALLOC(ctrl) < 0) + goto cleanup; + + ctrl->privileged = geteuid() == 0; + ctrl->uri = ctrl->privileged ? "qemu:///system" : "qemu:///session"; + + while (1) { + int c; + + c = getopt_long(argc, argv, "c:h", + options, NULL); + + if (c == -1) + break; + + switch (c) { + case 'c': + ctrl->uri = optarg; + break; + + case 'h': + case '?': + show_help(stdout); + rc = 0; + goto cleanup; + } + } + + ctrl->xml = argv[optind]; + + if (ctrl->xml == NULL) { + fprintf(stderr, "Missing XML file path\n"); + show_help(stderr); + goto cleanup; + } + + if (virEventRegisterDefaultImpl() < 0) { + fprintf(stderr, "Unable to initialize events: %s", + virGetLastErrorMessage()); + goto cleanup; + } + + if (virThreadCreate(&thr, false, virQEMUControllerMain, ctrl) < 0) + goto cleanup; + + for (;;) + virEventRunDefaultImpl(); + + rc = 0; + + cleanup: + virStateCleanup(); + if (ctrl->conn) + virConnectClose(ctrl->conn); + virObjectUnref(ctrl->vm); + VIR_FREE(ctrl); + return rc; +} diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 74b82450b4..8c8e9891c8 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -262,7 +262,7 @@ qemuDomainDisableNamespace(virDomainObjPtr vm, void qemuDomainEventQueue(virQEMUDriverPtr driver, virObjectEventPtr event) { - if (event) + if (event && driver->domainEventState) virObjectEventStateQueue(driver->domainEventState, event); } diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index fea1f24250..a7252ea913 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -371,7 +371,7 @@ qemuSecurityChownCallback(const virStorageSource *src, } -static int +int qemuSecurityInit(virQEMUDriverPtr driver) { char **names; -- 2.14.3

On Wed, Dec 20, 2017 at 04:47:50PM +0000, Daniel P. Berrange wrote:
This introduces a new binary 'libvirt_qemu' which can be used to launch guests externally from libvirtd. eg
libvirt_qemu -c qemu:///system /path/to/xml/file
This will launch a guest from the requested XML file and then connect to qemu:///system to register it with libvirtd. At this point all normal libvirtd operations should generally work.
NB, this impl has many flaws:
- We don't generate a unique 'id', so all guests will get id==1. Surprisingly this doesn't break the world.
- Tracking of unique name + UUIDs is *not* race free.
- No tracking/allocation of shared resource state (PCI devices, etc)
You will also not get the same events as you would when starting the domain as usual.
Most other functionality works, but might have expected results.
If I start a domain with a network, it segfaults. I'll see later if that's my fault or not. It probably (and hopefully) is.
In terms of namespace support there's some restrictions
- The mount namespace must share certain dirs
/etc/libvirt /var/run/libvirt /var/lib/libvirt /var/cache/libvirt
between the libvirtd and libvirt_qemu processes
- The PID namespace must match libvirtd & libvirt_qemu though this will be addressed in another patch to come
- The network namespace can be different
- Hotplug/unplug will likely break if separate namespaces are used
- The libvirt_qemu will exit if startup fails, however, it won't exit when QEMU later shuts down - you must listen for libvirtd events to detect that right now. We'll fix that
- Killing the libvirt_qemu shim will not kill the QEMU process. You must kill via the libvirt API as normal. This won't change, because we need to be able to ultimately restart the libvirt_qemu shim in order to apply software updates to running instance. We might wire up a special signal though to let you kill libvirt_qemu & take out QEMU at same time eg SIGQUIT or something like that perhaps.
IMHO we should use standard signals to do standard things. TERM should gracefully shutdown the domain. Not in PoC, but later. That way users can treat the shim process as other processes.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- po/POTFILES.in | 1 + src/Makefile.am | 49 ++++ src/qemu/qemu_conf.h | 2 +- src/qemu/qemu_controller.c | 551 +++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.c | 2 +- src/qemu/qemu_driver.c | 2 +- 6 files changed, 604 insertions(+), 3 deletions(-) create mode 100644 src/qemu/qemu_controller.c
diff --git a/po/POTFILES.in b/po/POTFILES.in index c1fa23427e..d97175b6d5 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -133,6 +133,7 @@ src/qemu/qemu_block.c src/qemu/qemu_capabilities.c src/qemu/qemu_cgroup.c src/qemu/qemu_command.c +src/qemu/qemu_controller.c src/qemu/qemu_conf.c src/qemu/qemu_domain.c src/qemu/qemu_domain_address.c diff --git a/src/Makefile.am b/src/Makefile.am index 4c022d1e44..7ca5ad4d3c 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -922,6 +922,9 @@ QEMU_DRIVER_SOURCES = \ qemu/qemu_capspriv.h \ qemu/qemu_security.c qemu/qemu_security.h
+QEMU_CONTROLLER_SOURCES = \ + qemu/qemu_controller.c + XENAPI_DRIVER_SOURCES = \ xenapi/xenapi_driver.c xenapi/xenapi_driver.h \ xenapi/xenapi_driver_private.h \ @@ -3115,6 +3118,52 @@ endif WITH_LIBVIRTD endif WITH_LXC EXTRA_DIST += $(LXC_CONTROLLER_SOURCES)
+if WITH_QEMU +if WITH_LIBVIRTD +libexec_PROGRAMS += libvirt_qemu +
Shouldn't I be able to launch this as a user without mgmt app? It should go to bin_PROGRAMS, I presume.
+libvirt_qemu_SOURCES = \ + $(QEMU_CONTROLLER_SOURCES) \ + $(DATATYPES_SOURCES) +libvirt_qemu_LDFLAGS = \ + $(AM_LDFLAGS) \ + $(PIE_LDFLAGS) \ + $(NULL) +libvirt_qemu_LDADD = \ + libvirt.la \ + libvirt-qemu.la \ + libvirt_driver_qemu_impl.la +if WITH_NETWORK +libvirt_qemu_LDADD += libvirt_driver_network_impl.la +endif WITH_NETWORK +if WITH_STORAGE +libvirt_qemu_LDADD += libvirt_driver_storage_impl.la +endif WITH_STORAGE
+libvirt_qemu_LDADD += ../gnulib/lib/libgnu.la $(LIBSOCKET)
This is not part of any condition, should be up there with libvirt.la just for clarity. Those are just nits, of course. [...]
+static void +virQEMUControllerDriverFree(virQEMUDriverPtr driver) +{ + if (!driver) + return; + + virObjectUnref(driver->config); + virObjectUnref(driver->hostdevMgr); + virHashFree(driver->sharedDevices); + virObjectUnref(driver->caps); + virObjectUnref(driver->qemuCapsCache); + + virObjectUnref(driver->domains); + virObjectUnref(driver->remotePorts); + virObjectUnref(driver->webSocketPorts); + virObjectUnref(driver->migrationPorts); + virObjectUnref(driver->migrationErrors); + + virObjectUnref(driver->xmlopt); + + virSysinfoDefFree(driver->hostsysinfo); + + virObjectUnref(driver->closeCallbacks); + + VIR_FREE(driver->qemuImgBinary); + + virObjectUnref(driver->securityManager); + + ebtablesContextFree(driver->ebtables); + + virLockManagerPluginUnref(driver->lockManager); + + virMutexDestroy(&driver->lock); + VIR_FREE(driver); +} + +static virQEMUDriverPtr virQEMUControllerNewDriver(bool privileged)
This is very similar to what I did, but I just exported this function privately si that there is less duplication of code and I added an enum that is used as a parameter instead of the bool @privileged. It is a tristate (user, system, shim) that special-cases the shim slightly. The reason behind that is that I kind of presumed we need to be able to launch system QEMUs with non-root user. Thinking about it I'm not sure that's the case for kubevirt, but it definitely is for others. The way it is done now is that you initialize the driver based on geteuid(), but you connect to any uri specified by the user. What I had in mind was that with this shim PoC users would be able to start their own VMs (if libvirtd permissions are set up the right way, even with some directory access changes done in the daemon) as system VMs as if they had seclabel set up for their user. I think it's feasible, but I maybe overestimated what's possible for the PoC.

On Tue, Jan 09, 2018 at 03:13:41PM +0100, Martin Kletzander wrote:
On Wed, Dec 20, 2017 at 04:47:50PM +0000, Daniel P. Berrange wrote:
This introduces a new binary 'libvirt_qemu' which can be used to launch guests externally from libvirtd. eg
libvirt_qemu -c qemu:///system /path/to/xml/file
This will launch a guest from the requested XML file and then connect to qemu:///system to register it with libvirtd. At this point all normal libvirtd operations should generally work.
NB, this impl has many flaws:
- We don't generate a unique 'id', so all guests will get id==1. Surprisingly this doesn't break the world.
- Tracking of unique name + UUIDs is *not* race free.
- No tracking/allocation of shared resource state (PCI devices, etc)
You will also not get the same events as you would when starting the domain as usual.
Hmm, yes, interesting. If someone is connected to libvirtd watching events they'll not get any startup events. That's something we'd need to fix.
Most other functionality works, but might have expected results.
If I start a domain with a network, it segfaults. I'll see later if that's my fault or not. It probably (and hopefully) is.
Not intentional if that's happening !
- The libvirt_qemu will exit if startup fails, however, it won't exit when QEMU later shuts down - you must listen for libvirtd events to detect that right now. We'll fix that
- Killing the libvirt_qemu shim will not kill the QEMU process. You must kill via the libvirt API as normal. This won't change, because we need to be able to ultimately restart the libvirt_qemu shim in order to apply software updates to running instance. We might wire up a special signal though to let you kill libvirt_qemu & take out QEMU at same time eg SIGQUIT or something like that perhaps.
IMHO we should use standard signals to do standard things. TERM should gracefully shutdown the domain. Not in PoC, but later. That way users can treat the shim process as other processes.
I guess we can use SIGUSR1 to trigger restart of the shim while leaving QEMU running.
+if WITH_QEMU +if WITH_LIBVIRTD +libexec_PROGRAMS += libvirt_qemu +
Shouldn't I be able to launch this as a user without mgmt app? It should go to bin_PROGRAMS, I presume.
Yes
+libvirt_qemu_SOURCES = \ + $(QEMU_CONTROLLER_SOURCES) \ + $(DATATYPES_SOURCES) +libvirt_qemu_LDFLAGS = \ + $(AM_LDFLAGS) \ + $(PIE_LDFLAGS) \ + $(NULL) +libvirt_qemu_LDADD = \ + libvirt.la \ + libvirt-qemu.la \ + libvirt_driver_qemu_impl.la +if WITH_NETWORK +libvirt_qemu_LDADD += libvirt_driver_network_impl.la +endif WITH_NETWORK +if WITH_STORAGE +libvirt_qemu_LDADD += libvirt_driver_storage_impl.la +endif WITH_STORAGE
+libvirt_qemu_LDADD += ../gnulib/lib/libgnu.la $(LIBSOCKET)
This is not part of any condition, should be up there with libvirt.la just for clarity.
IIRC, libgnu.la needs to be the last library listed
+static void +virQEMUControllerDriverFree(virQEMUDriverPtr driver) +{ + if (!driver) + return; + + virObjectUnref(driver->config); + virObjectUnref(driver->hostdevMgr); + virHashFree(driver->sharedDevices); + virObjectUnref(driver->caps); + virObjectUnref(driver->qemuCapsCache); + + virObjectUnref(driver->domains); + virObjectUnref(driver->remotePorts); + virObjectUnref(driver->webSocketPorts); + virObjectUnref(driver->migrationPorts); + virObjectUnref(driver->migrationErrors); + + virObjectUnref(driver->xmlopt); + + virSysinfoDefFree(driver->hostsysinfo); + + virObjectUnref(driver->closeCallbacks); + + VIR_FREE(driver->qemuImgBinary); + + virObjectUnref(driver->securityManager); + + ebtablesContextFree(driver->ebtables); + + virLockManagerPluginUnref(driver->lockManager); + + virMutexDestroy(&driver->lock); + VIR_FREE(driver); +} + +static virQEMUDriverPtr virQEMUControllerNewDriver(bool privileged)
This is very similar to what I did, but I just exported this function privately si that there is less duplication of code and I added an enum that is used as a parameter instead of the bool @privileged. It is a tristate (user, system, shim) that special-cases the shim slightly.
Yeah, there's definitely much more duplication here that I would like. I would expect to refactor this better.
The reason behind that is that I kind of presumed we need to be able to launch system QEMUs with non-root user. Thinking about it I'm not sure that's the case for kubevirt, but it definitely is for others. The way it is done now is that you initialize the driver based on geteuid(), but you connect to any uri specified by the user. What I had in mind was that with this shim PoC users would be able to start their own VMs (if libvirtd permissions are set up the right way, even with some directory access changes done in the daemon) as system VMs as if they had seclabel set up for their user. I think it's feasible, but I maybe overestimated what's possible for the PoC.
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
participants (3)
-
Daniel P. Berrange
-
John Ferlan
-
Martin Kletzander