Re: [RFC] Add basic driver for the Cloud-Hypervisor

15 Sep 2020

On Thu, Aug 27, 2020 at 11:24:32AM -0700, William Douglas wrote:
...
This patch adds support for the following initial VM actions using the
Cloud-Hypervsior API:
 * vm.create
 * vm.delete
 * vm.boot
 * vm.shutdown
 * vm.reboot
 * vm.pause
 * vm.resume
To use the Cloud-Hypervisor driver, the v0.9.0 (the as of now current)
release of Cloud-Hypervisor is required to be installed.
It would be useful to outline the general architecture / approach to
mgmt.

IIUC, cloud-hypervisor is similar to QEMU in that launching a VM just
involves spawning a process for that VM. There is no central daemon
or authority over all cloud-hypervisor VMs, thus libvirtd fills that
role by acting as the mgmt daemon.

In terms of communicating with the process, I'm seeing that basically
everyting is REST API calls with JSON encoded data.
...
Signed-off-by: William Douglas <william.douglas@intel.com>
---
 include/libvirt/virterror.h |   1 +
 libvirt.spec.in             |  32 ++
 meson.build                 |   5 +
 meson_options.txt           |   1 +
It would be good to have a drvch.html.in file, well a drvch.rst
file actually to give users an intro to the new driver. It should
probably mention it is an early PoC not really targetting production
usage at this time since there's many key features missing.
...

diff --git a/libvirt.spec.in b/libvirt.spec.in
index bb74443484..66edb1fa76 100644
--- a/libvirt.spec.in
+++ b/libvirt.spec.in
@@ -14,6 +14,7 @@
# The hypervisor drivers that run in libvirtd
 %define with_qemu          0%{!?_without_qemu:1}
+%define with_ch            0%{!?_without_ch:1}
 %define with_lxc           0%{!?_without_lxc:1}
 %define with_libxl         0%{!?_without_libxl:1}
 %define with_vbox          0%{!?_without_vbox:1}
@@ -232,6 +233,9 @@ Requires: libvirt-daemon-driver-lxc = %{version}-%{release}
 %if %{with_qemu}
 Requires: libvirt-daemon-driver-qemu = %{version}-%{release}
 %endif
+%if %{with_ch}
+Requires: libvirt-daemon-driver-ch = %{version}-%{release}
+%endif
 # We had UML driver, but we've removed it.
 Obsoletes: libvirt-daemon-driver-uml <= 5.0.0
 Obsoletes: libvirt-daemon-uml <= 5.0.0
@@ -744,6 +748,20 @@ QEMU
 %endif
+%if %{with_ch}
+%package daemon-driver-ch
+Summary: Cloud-Hypervisor driver plugin for the libvirtd daemon
+Requires: libvirt-daemon = %{version}-%{release}
+Requires: libvirt-libs = %{version}-%{release}
+Requires: /usr/bin/cloud-hypervisor
+
+%description daemon-driver-ch
+The Cloud-Hypervisor driver plugin for the libvirtd daemon,
+providing an implementation of the hypervisor driver APIs
+using Cloud-Hypervisor
+%endif
The spec file targets Fedora and RHEL, and IIUC, cloud-hypervisor
is not present in either of them. So for now the spec file should
always disable building of the new driver.
...
diff --git a/meson.build b/meson.build
index dabd4196e6..a6759cb051 100644
--- a/meson.build
+++ b/meson.build
@@ -1722,6 +1722,10 @@ elif get_option('driver_lxc').enabled()
   error('linux and remote_driver are required for LXC')
 endif
+if not get_option('driver_ch').disabled() and host_machine.system() == 'linux' and conf.has('WITH_LIBVIRTD')
+  conf.set('WITH_CH', 1)
+endif
The driver relies on Curl and YAJL, so you'll need to do more there to
automatically disable the driver if either of those is not present.
Take a look a the handling of 'drive_qemu' for a reasonable illustration
for YAJL, which you can adapt and include curl too.
...
diff --git a/src/ch/ch_conf.c b/src/ch/ch_conf.c
new file mode 100644
index 0000000000..8769b0f7e2
--- /dev/null
+++ b/src/ch/ch_conf.c
@@ -0,0 +1,239 @@
+/*
+ * Copyright Intel Corp. 2020
+ *
+ * ch_driver.h: header file for Cloud-Hypervisor driver functions
I'm surprised 'ninja test' didn't complain that the filename mentioned
here doesn't match the actual filename. Likewise for other files.
...
+/* Functions */
+virCapsPtr virCHDriverCapsInit(void)
+{
+    virCapsPtr caps;
+    virCapsGuestPtr guest;
+
+    if ((caps = virCapabilitiesNew(virArchFromHost(),
+                                   false, false)) == NULL)
+        goto cleanup;
+
+    if (!(caps->host.numa = virCapabilitiesHostNUMANewHost()))
+        goto cleanup;
+
+    if (virCapabilitiesInitCaches(caps) < 0)
+        goto cleanup;
+
+    if ((guest = virCapabilitiesAddGuest(caps,
+                                         VIR_DOMAIN_OSTYPE_HVM,
+                                         caps->host.arch,
+                                         NULL,
+                                         NULL,
+                                         0,
+                                         NULL)) == NULL)
+        goto cleanup;
+
+    if (virCapabilitiesAddGuestDomain(guest,
+                                      VIR_DOMAIN_VIRT_CH,
IIUC, cloud hypervisor is still using KVM as the hypervisor,
merely providing a different userspace emulation layer. So
in terms of the libvirt virt type, we should still be using
VIR_DOMAIN_VIRT_KVM. No need to invent a VIR_DOMAIN_VIRT_CH.
...
+                                      NULL,
+                                      NULL,
+                                      0,
+                                      NULL) == NULL)
+        goto cleanup;
This should also be testing of the required binary exists
on the host, and if it does not, then don't claim to support
guest domains.

On 64-bit hosts does CH provide a way to strictly emulate
just a 32-bit CPU, or will it always be 64-bit only ?

If the former, then you need to add the 32-bit equivalent
guest too.
...
+
+    return caps;
+
+ cleanup:
+    virObjectUnref(caps);
+    return NULL;
+}
+
...
+static void
+virCHDriverConfigDispose(void *obj)
+{
+    virCHDriverConfigPtr cfg = obj;
+
+    VIR_FREE(cfg->stateDir);
+    VIR_FREE(cfg->logDir);
Use g_free for these.
...
+}
+
+static int
+chExtractVersionInfo(int *retversion)
+{
+    int ret = -1;
+    unsigned long version;
+    char *help = NULL;
+    char *tmp;
+    virCommandPtr cmd = virCommandNewArgList(CH_CMD, "--version", NULL);
+
+    if (retversion)
+        *retversion = 0;
+
+    virCommandAddEnvString(cmd, "LC_ALL=C");
+    virCommandSetOutputBuffer(cmd, &help);
+
+    if (virCommandRun(cmd, NULL) < 0)
+        goto cleanup;
+
+    tmp = help;
+
+    /* expected format: cloud-hypervisor v<major>.<minor>.<micro> */
+    if ((tmp = STRSKIP(tmp, "cloud-hypervisor v")) == NULL)
+        goto cleanup;
+
+    if (virParseVersionString(tmp, &version, false) < 0)
+        goto cleanup;
+
+    // v0.9.0 is the minimum supported version
+    if ((unsigned int)(version / 1000000) < 1) {
+        if (((unsigned int)((unsigned long)(version % 1000000)) / 1000) < 9) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("Cloud-Hypervisor version is too old (v0.9.0 is the minimum supported version)"));
+            goto cleanup;
+        }
+    }
This is a bit more convulted than needed - just encode the
min version you want as an integer and do a single compare
eg

   #define MIN_VERSION ((0 * 1000000) + (9 * 1000) + (0))

   if (version < MIN_VERSION) {
       ..report error...
   }
...
+int chStrToInt(const char *str)
+{
+    int val;
+
+    if (virStrToLong_i(str, NULL, 10, &val) < 0)
+        return 0;
+
+    return val;
+}
This method is pretty pointless and its also not used anywhere
...
diff --git a/src/ch/ch_conf.h b/src/ch/ch_conf.h
new file mode 100644
index 0000000000..04334130f7
--- /dev/null
+++ b/src/ch/ch_conf.h
+#pragma once
+
+#include "virdomainobjlist.h"
+#include "virthread.h"
+
+#define CH_DRIVER_NAME "CH"
+#define CH_CMD "cloud-hypervisor"
+
+#define CH_STATE_DIR RUNSTATEDIR "/libvirt/ch"
+#define CH_LOG_DIR LOCALSTATEDIR "/log/libvirt/ch"
+
+typedef struct _virCHDriver virCHDriver;
+typedef virCHDriver *virCHDriverPtr;
+
+typedef struct _virCHDriverConfig virCHDriverConfig;
+typedef virCHDriverConfig *virCHDriverConfigPtr;
+
+struct _virCHDriverConfig {
+    virObject parent;
FWIW, I'd encourage you to use  GObject rather than virObject
for anything new, as our long term goal is to convert all
code to use GObject alone and eliminate virObject
...
diff --git a/src/ch/ch_domain.c b/src/ch/ch_domain.c
new file mode 100644
index 0000000000..a46641d50d
--- /dev/null
+++ b/src/ch/ch_domain.c
@@ -0,0 +1,219 @@
...
+static void *
+virCHDomainObjPrivateAlloc(void *opaque G_GNUC_UNUSED)
+{
+    virCHDomainObjPrivatePtr priv;
+
+    if (VIR_ALLOC(priv) < 0)
+        return NULL;
+
+    if (virCHDomainObjInitJob(priv) < 0) {
+        VIR_FREE(priv);
+        return NULL;
+    }
+
+    return priv;
+}
+
+static void
+virCHDomainObjPrivateFree(void *data)
+{
+    virCHDomainObjPrivatePtr priv = data;
+
+    virCHDomainObjFreeJob(priv);
+    VIR_FREE(priv);
+}
+
+static int
+virCHDomainObjPrivateXMLFormat(virBufferPtr buf,
+                               virDomainObjPtr vm)
+{
+    virCHDomainObjPrivatePtr priv = vm->privateData;
+    virBufferAsprintf(buf, "<init pid='%lld'/>\n",
+                      (long long)priv->initpid);
+
+    return 0;
+}
+
+static int
+virCHDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt,
+                              virDomainObjPtr vm,
+                              virDomainDefParserConfigPtr config G_GNUC_UNUSED)
+{
+    virCHDomainObjPrivatePtr priv = vm->privateData;
+    long long thepid;
+
+    if (virXPathLongLong("string(./init[1]/@pid)", ctxt, &thepid) < 0) {
+        VIR_WARN("Failed to load init pid from state %s",
+                 virGetLastErrorMessage());
+        priv->initpid = 0;
+    } else {
+        priv->initpid = thepid;
+    }
+
+    return 0;
+}
What is this  initpid needed for ?  The virDomainObjPtr struct
already lets us record a PID for the guest in the <domstatus>
element.
...
diff --git a/src/ch/ch_driver.c b/src/ch/ch_driver.c
new file mode 100644
index 0000000000..e5b027f71f
--- /dev/null
+++ b/src/ch/ch_driver.c
@@ -0,0 +1,937 @@
+/*
+ * Copyright Intel Corp. 2020
+ *
+ * ch_driver.h: header file for Cloud-Hypervisor driver functions
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library.  If not, see
+ * <http://www.gnu.org/licenses/>.
+ */
+
+#include <config.h>
+
+#include "ch_conf.h"
+#include "ch_domain.h"
+#include "ch_driver.h"
+#include "ch_monitor.h"
+#include "ch_process.h"
+#include "datatypes.h"
+#include "driver.h"
+#include "viraccessapicheck.h"
+#include "viralloc.h"
+#include "virbuffer.h"
+#include "vircommand.h"
+#include "virerror.h"
+#include "virfile.h"
+#include "virlog.h"
+#include "virnetdevtap.h"
+#include "virobject.h"
+#include "virstring.h"
+#include "virtypedparam.h"
+#include "viruri.h"
+#include "virutil.h"
+#include "viruuid.h"
+
+#define VIR_FROM_THIS VIR_FROM_CH
+
+VIR_LOG_INIT("ch.ch_driver");
...
+
+/* Functions */
+static int
+chConnectURIProbe(char **uri)
+{
+    if (ch_driver == NULL)
+        return 0;
+
+    *uri = g_strdup("ch:///system");
+    return 1;
+}
Are you not able to support the ch:///session mode for running
under the user's unprivileged session ?
...
+/**
+ * chDomainCreateXML:
+ * @conn: pointer to connection
+ * @xml: XML definition of domain
+ * @flags: bitwise-OR of supported virDomainCreateFlags
+ *
+ * Creates a domain based on xml and starts it
+ *
+ * Returns a new domain object or NULL in case of failure.
+ */
+static virDomainPtr
+chDomainCreateXML(virConnectPtr conn,
+                           const char *xml,
+                           unsigned int flags)
+{
+    virCHDriverPtr driver = conn->privateData;
+    virDomainDefPtr vmdef = NULL;
+    virDomainObjPtr vm = NULL;
+    virDomainPtr dom = NULL;
+    unsigned int parse_flags = VIR_DOMAIN_DEF_PARSE_INACTIVE;
+
+    virCheckFlags(VIR_DOMAIN_START_VALIDATE, NULL);
+
+    if (flags & VIR_DOMAIN_START_VALIDATE)
+        parse_flags |= VIR_DOMAIN_DEF_PARSE_VALIDATE_SCHEMA;
+
+
+    if ((vmdef = virDomainDefParseString(xml, driver->xmlopt,
+                                         NULL, parse_flags)) == NULL)
+        goto cleanup;
+
+    if (virDomainCreateXMLEnsureACL(conn, vmdef) < 0)
+        goto cleanup;
+
+    if (!(vm = virDomainObjListAdd(driver->domains,
+                                   vmdef,
+                                   driver->xmlopt,
+                                   VIR_DOMAIN_OBJ_LIST_ADD_LIVE |
+                                       VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE,
+                                   NULL)))
+        goto cleanup;
+
+    vmdef = NULL;
+    vm->persistent = 1;
This isn't right - the virDomainCreateXML method is for creating
a transient guest only.

Also from this point onwards, for any failure you need to remove
the dangling guest from the driver->domains list.
...
+
+    if (virCHDomainObjBeginJob(vm, CH_JOB_MODIFY) < 0)
+        goto cleanup;
+
+    if (virCHProcessStart(driver, vm, VIR_DOMAIN_RUNNING_BOOTED) < 0)
+        goto cleanup;
+
+    dom = virGetDomain(conn, vm->def->name, vm->def->uuid, vm->def->id);
+
+    virCHDomainObjEndJob(vm);
+
+ cleanup:
+    virDomainDefFree(vmdef);
+    virDomainObjEndAPI(&vm);
+    chDriverUnlock(driver);
+    return dom;
+}
...
+static int
+chDomainShutdownFlags(virDomainPtr dom,
+                      unsigned int flags)
+{
+    virCHDomainObjPrivatePtr priv;
+    virDomainObjPtr vm;
+    virDomainState state;
+    int ret = -1;
+
+    virCheckFlags(VIR_DOMAIN_SHUTDOWN_INITCTL |
+                  VIR_DOMAIN_SHUTDOWN_SIGNAL, -1);
These two shutdown methods would really only be for container
based virt. When you have a KVM guest, only an APIC based
trigger, or guest agent trigger make conceptual sense.
...
+
+    if (!(vm = chDomObjFromDomain(dom)))
+        goto cleanup;
+
+    priv = vm->privateData;
+
+    if (virDomainShutdownFlagsEnsureACL(dom->conn, vm->def, flags) < 0)
+        goto cleanup;
+
+    if (virCHDomainObjBeginJob(vm, CH_JOB_MODIFY) < 0)
+        goto cleanup;
+
+    if (virDomainObjCheckActive(vm) < 0)
+        goto endjob;
+
+    state = virDomainObjGetState(vm, NULL);
+    if (state != VIR_DOMAIN_RUNNING && state != VIR_DOMAIN_PAUSED) {
+        virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+                       _("only can shutdown running/paused domain"));
+        goto endjob;
+    } else {
+        if (virCHMonitorShutdownVM(priv->monitor) < 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                        _("failed to shutdown guest VM"));
+            goto endjob;
+        }
+    }
+
+    virDomainObjSetState(vm, VIR_DOMAIN_SHUTDOWN, VIR_DOMAIN_SHUTDOWN_USER);
+
+    ret = 0;
+
+ endjob:
+    virCHDomainObjEndJob(vm);
+
+ cleanup:
+    virDomainObjEndAPI(&vm);
+    return ret;
+}
...
+static int
+chDomainReboot(virDomainPtr dom, unsigned int flags)
+{
+    virCHDomainObjPrivatePtr priv;
+    virDomainObjPtr vm;
+    virDomainState state;
+    int ret = -1;
+
+    virCheckFlags(VIR_DOMAIN_REBOOT_INITCTL |
+                  VIR_DOMAIN_REBOOT_SIGNAL, -1);
Same note as above.
...
+
+    if (!(vm = chDomObjFromDomain(dom)))
+        goto cleanup;
+
+    priv = vm->privateData;
+
+    if (virDomainRebootEnsureACL(dom->conn, vm->def, flags) < 0)
+        goto cleanup;
+
+    if (virCHDomainObjBeginJob(vm, CH_JOB_MODIFY) < 0)
+        goto cleanup;
+
+    if (virDomainObjCheckActive(vm) < 0)
+        goto endjob;
+
+    state = virDomainObjGetState(vm, NULL);
+    if (state != VIR_DOMAIN_RUNNING && state != VIR_DOMAIN_PAUSED) {
+        virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
+                       _("only can reboot running/paused domain"));
+        goto endjob;
+    } else {
+        if (virCHMonitorRebootVM(priv->monitor) < 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("failed to reboot domain"));
+            goto endjob;
+        }
+    }
+
+    if (state == VIR_DOMAIN_RUNNING)
+        virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, VIR_DOMAIN_RUNNING_BOOTED);
+    else
+        virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, VIR_DOMAIN_RUNNING_UNPAUSED);
+
+    ret = 0;
+
+ endjob:
+    virCHDomainObjEndJob(vm);
+
+ cleanup:
+    virDomainObjEndAPI(&vm);
+    return ret;
+}
+
+static int
+chDomainSuspend(virDomainPtr dom)
+{
+    virCHDomainObjPrivatePtr priv;
+    virDomainObjPtr vm;
+    int ret = -1;
+
+    if (!(vm = chDomObjFromDomain(dom)))
+        goto cleanup;
+
+    priv = vm->privateData;
+
+    if (virDomainSuspendEnsureACL(dom->conn, vm->def) < 0)
+        goto cleanup;
+
+    if (virCHDomainObjBeginJob(vm, CH_JOB_MODIFY) < 0)
+        goto cleanup;
+
+    if (virDomainObjCheckActive(vm) < 0)
+        goto endjob;
+
+    if (virDomainObjGetState(vm, NULL) != VIR_DOMAIN_RUNNING) {
+        virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
+                       _("only can suspend running domain"));
+        goto endjob;
+    } else {
+        if (virCHMonitorSuspendVM(priv->monitor) < 0) {
Just to explicitly confirm,  "suspend" in libvirt terminology means
pause execution of the guest CPUs, nothing else.   Does that match
semantics from CH ?
...
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                        _("failed to suspend domain"));
+            goto endjob;
+        }
+    }
+
+    virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_USER);
+
+    ret = 0;
+
+ endjob:
+    virCHDomainObjEndJob(vm);
+
+ cleanup:
+    virDomainObjEndAPI(&vm);
+    return ret;
+}
...
+static int chStateInitialize(bool privileged,
+                             const char *root,
+                             virStateInhibitCallback callback G_GNUC_UNUSED,
+                             void *opaque G_GNUC_UNUSED)
+{
+    if (root != NULL) {
+        virReportError(VIR_ERR_INVALID_ARG, "%s",
+                       _("Driver does not support embedded mode"));
+        return -1;
+    }
+
+    /* Check that the user is root, silently disable if not */
+    if (!privileged) {
+        VIR_INFO("Not running privileged, disabling driver");
+        return VIR_DRV_STATE_INIT_SKIPPED;
+    }
Same question as earlier about why its forbidding non-root usage ?
...
+static virConnectDriver chConnectDriver = {
+    .localOnly = true,
+    .uriSchemes = (const char *[]){"CH", "Ch", "ch", "Cloud-Hypervisor", NULL},
Your driver URI is "ch:///system", so the only thing that should be in this
list is "ch".
...
+    .hypervisorDriver = &chHypervisorDriver,
+};
+
+static virStateDriver chStateDriver = {
+    .name = "CH",
I'd suggest "cloud-hypervisor" to be a bit clearer
...
+    .stateInitialize = chStateInitialize,
+    .stateCleanup = chStateCleanup,
+};
+
+int chRegister(void)
+{
+    if (virRegisterConnectDriver(&chConnectDriver, false) < 0)
+        return -1;
+    if (virRegisterStateDriver(&chStateDriver) < 0)
+        return -1;
+    return 0;
+}
...
diff --git a/src/ch/ch_monitor.c b/src/ch/ch_monitor.c
new file mode 100644
index 0000000000..ccef70f719
--- /dev/null
+++ b/src/ch/ch_monitor.c
...
+static int
+virCHMonitorBuildCPUJson(virJSONValuePtr content, virDomainDefPtr vmdef)
+{
+    virJSONValuePtr cpus;
+    unsigned int maxvcpus = 0;
+    unsigned int nvcpus = 0;
+    virDomainVcpuDefPtr vcpu;
+    size_t i;
+
+    /* count maximum allowed number vcpus and enabled vcpus when boot.*/
+    maxvcpus = virDomainDefGetVcpusMax(vmdef);
+    for (i = 0; i < maxvcpus; i++) {
+        vcpu = virDomainDefGetVcpu(vmdef, i);
+        if (vcpu->online)
+            nvcpus++;
+    }
+
+    if (maxvcpus != 0 || nvcpus != 0) {
+        cpus = virJSONValueNewObject();
+        if (virJSONValueObjectAppendNumberInt(cpus, "boot_vcpus", nvcpus) < 0)
+            goto cleanup;
+        if (virJSONValueObjectAppendNumberInt(cpus, "max_vcpus", vmdef->maxvcpus) < 0)
+            goto cleanup;
+        if (virJSONValueObjectAppend(content, "cpus", cpus) < 0)
+            goto cleanup;
+    }
+
+    return 0;
+
+ cleanup:
+    virJSONValueFree(cpus);
+    return -1;
+}
+
+static int
+virCHMonitorBuildKernelJson(virJSONValuePtr content, virDomainDefPtr vmdef)
+{
+    virJSONValuePtr kernel;
+
+    if (vmdef->os.kernel == NULL) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Kernel image path in this domain is not defined"));
+        return -1;
+    } else {
+        kernel = virJSONValueNewObject();
+        if (virJSONValueObjectAppendString(kernel, "path", vmdef->os.kernel) < 0)
+            goto cleanup;
+        if (virJSONValueObjectAppend(content, "kernel", kernel) < 0)
+            goto cleanup;
+    }
+
+    return 0;
+
+ cleanup:
+    virJSONValueFree(kernel);
+    return -1;
+}
+
+static int
+virCHMonitorBuildCmdlineJson(virJSONValuePtr content, virDomainDefPtr vmdef)
+{
+    virJSONValuePtr cmdline;
+
+    cmdline = virJSONValueNewObject();
+    if (vmdef->os.cmdline) {
+        if (virJSONValueObjectAppendString(cmdline, "args", vmdef->os.cmdline) < 0)
+            goto cleanup;
+        if (virJSONValueObjectAppend(content, "cmdline", cmdline) < 0)
+            goto cleanup;
+    }
+
+    return 0;
+
+ cleanup:
+    virJSONValueFree(cmdline);
+    return -1;
+}
I'm kinda inclined to say it is overkill to have separate methods for kernel,
initrd and cmdline. I'd handle all three at same time since they go hand in
hand with each other.
...
+static int
+virCHMonitorBuildDiskJson(virJSONValuePtr disks, virDomainDiskDefPtr diskdef)
+{
+    virJSONValuePtr disk;
+
+    if (diskdef->src != NULL && diskdef->src->path != NULL) {
You must validate diskdef->type before accesing any fields for the source
otherwise it is highly liable to crash on unexpected user input. Generally
we'd expect a switch(diskdef->type) and cases to each applicable type you
want to support, with errors raised for others.
...
+        disk = virJSONValueNewObject();
+        if (virJSONValueObjectAppendString(disk, "path", diskdef->src->path) < 0)
+            goto cleanup;
+        if (diskdef->src->readonly) {
+            if (virJSONValueObjectAppendBoolean(disk, "readonly", true) < 0)
+                goto cleanup;
+        }
+        if (virJSONValueArrayAppend(disks, disk) < 0)
+            goto cleanup;
+    }
+
+    return 0;
+
+ cleanup:
+    virJSONValueFree(disk);
+    return -1;
+}
...
+static int
+virCHMonitorBuildNetJson(virJSONValuePtr nets, virDomainNetDefPtr netdef)
+{
+    virDomainNetType netType = virDomainNetGetActualType(netdef);
+    char macaddr[VIR_MAC_STRING_BUFLEN];
+    virJSONValuePtr net;
+
+    // check net type at first
+    net = virJSONValueNewObject();
+
+    switch (netType) {
+    case VIR_DOMAIN_NET_TYPE_ETHERNET:
+        if (netdef->guestIP.nips == 1) {
+            const virNetDevIPAddr *ip = netdef->guestIP.ips[0];
+            g_autofree char *addr = NULL;
+            virSocketAddr netmask;
+            g_autofree char *netmaskStr = NULL;
+            if (!(addr = virSocketAddrFormat(&ip->address)))
+                goto cleanup;
+            if (virJSONValueObjectAppendString(net, "ip", addr) < 0)
+                goto cleanup;
+
+            if (virSocketAddrPrefixToNetmask(ip->prefix, &netmask, AF_INET) < 0) {
+                virReportError(VIR_ERR_INTERNAL_ERROR,
+                               _("Failed to translate net prefix %d to netmask"),
+                               ip->prefix);
+                goto cleanup;
+            }
+            if (!(netmaskStr = virSocketAddrFormat(&netmask)))
+                goto cleanup;
+            if (virJSONValueObjectAppendString(net, "mask", netmaskStr) < 0)
+                goto cleanup;
+        }
else {
....report VIR_ERR_CONFIG_UNSUPPORTED...
}

As a general rule, when building the config from the user's XML, report
as many errors as possible. It is preferrable for apps to see an error
report than to have their config silently ignored.
...
+        break;
+    case VIR_DOMAIN_NET_TYPE_VHOSTUSER:
+        if ((virDomainChrType)netdef->data.vhostuser->type != VIR_DOMAIN_CHR_TYPE_UNIX) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("vhost_user type support UNIX socket in this CH"));
+            goto cleanup;
+        } else {
+            if (virJSONValueObjectAppendString(net, "vhost_socket", netdef->data.vhostuser->data.nix.path) < 0)
+                goto cleanup;
+            if (virJSONValueObjectAppendBoolean(net, "vhost_user", true) < 0)
+                goto cleanup;
+        }
+        break;
+    case VIR_DOMAIN_NET_TYPE_BRIDGE:
+    case VIR_DOMAIN_NET_TYPE_NETWORK:
+    case VIR_DOMAIN_NET_TYPE_DIRECT:
+    case VIR_DOMAIN_NET_TYPE_USER:
+    case VIR_DOMAIN_NET_TYPE_SERVER:
+    case VIR_DOMAIN_NET_TYPE_CLIENT:
+    case VIR_DOMAIN_NET_TYPE_MCAST:
+    case VIR_DOMAIN_NET_TYPE_INTERNAL:
+    case VIR_DOMAIN_NET_TYPE_HOSTDEV:
+    case VIR_DOMAIN_NET_TYPE_UDP:
+    case VIR_DOMAIN_NET_TYPE_LAST:
+    default:
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("Only ethernet and vhost_user type network types are "
+                         "supported in this CH"));
+        goto cleanup;
+    }
For _LAST/defalt, use  virReportEnumRangeError
...
+static int
+virCHMonitorBuildVMJson(virDomainDefPtr vmdef, char **jsonstr)
+{
+    virJSONValuePtr content = virJSONValueNewObject();
+    int ret = -1;
+
+    if (vmdef == NULL) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("VM is not defined"));
+        goto cleanup;
+    }
+
+    if (virCHMonitorBuildCPUJson(content, vmdef) < 0)
+        goto cleanup;
+
+    if (virCHMonitorBuildMemoryJson(content, vmdef) < 0)
+        goto cleanup;
+
+    if (virCHMonitorBuildKernelJson(content, vmdef) < 0)
+        goto cleanup;
+
+    if (virCHMonitorBuildCmdlineJson(content, vmdef) < 0)
+        goto cleanup;
+
+    if (virCHMonitorBuildInitramfsJson(content, vmdef) < 0)
+        goto cleanup;
+
+    if (virCHMonitorBuildDisksJson(content, vmdef) < 0)
+        goto cleanup;
+
+    if (virCHMonitorBuildNetsJson(content, vmdef) < 0)
+        goto cleanup;
+
+    if (!(*jsonstr = virJSONValueToString(content, false)))
+        goto cleanup;
+
+    ret = 0;
+
+ cleanup:
+    virJSONValueFree(content);
+    return ret;
+}
Obviously you're doing the bare minimum here to get something running
which is fine for an initial merge. Ultimately though, it should be
reporting erors for any types of devices you don't wish to support.
...
+static virCommandPtr
+chMonitorBuildSocketCmd(virDomainObjPtr vm, const char *socket_path)
+{
+    virCommandPtr cmd;
+
+    if (vm->def == NULL) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("VM is not defined"));
+        return NULL;
+    }
+
+    if (vm->def->emulator != NULL)
+        cmd = virCommandNew(vm->def->emulator);
+    else
+        cmd = virCommandNew(CH_CMD);
If you populate the guest capabilities with the default emulator
path, it should get populated into the virDomainDef, and thus
be able to assume it is non-NULL.
...
+
+    virCommandAddArgList(cmd, "--api-socket", socket_path, NULL);
+
+    return cmd;
+}
+
+virCHMonitorPtr
+virCHMonitorNew(virDomainObjPtr vm, const char *socketdir)
+{
+    virCHMonitorPtr ret = NULL;
+    virCHMonitorPtr mon = NULL;
+    virCommandPtr cmd = NULL;
+    int pings = 0;
+
+    if (virCHMonitorInitialize() < 0)
+        return NULL;
+
+    if (!(mon = virObjectLockableNew(virCHMonitorClass)))
+        return NULL;
+
+    mon->socketpath = g_strdup_printf("%s/%s-socket", socketdir, vm->def->name);
+
+    /* prepare to launch Cloud-Hypervisor socket */
+    if (!(cmd = chMonitorBuildSocketCmd(vm, mon->socketpath)))
+        goto cleanup;
+
+    if (virFileMakePath(socketdir) < 0) {
+        virReportSystemError(errno,
+                             _("Cannot create socket directory '%s'"),
+                             socketdir);
+        goto cleanup;
+    }
+
+    /* launch Cloud-Hypervisor socket */
+    if (virCommandRunAsync(cmd, &mon->pid) < 0)
+        goto cleanup;
+
+    /* get a curl handle */
+    mon->handle = curl_easy_init();
+
+    /* try to ping VMM socket 5 times to make sure it is ready */
+    while (pings < 5) {
+        if (virCHMonitorPingVMM(mon) == 0)
+            break;
+        if (pings == 5)
+            goto cleanup;
+
+        g_usleep(100 * 1000);
+    }
This is highly undesirable. Is there no way to launch the CH process
such that the socket is guaranteed to be accepting requests by the
time it has forked into the background ? Or is there a way to pass
in a pre-opened FD for the monitor socket ?

This kind of wait with timeout will cause startup failures due to
timeout under load.
...
+
+    /* now has its own reference */
+    virObjectRef(mon);
+    mon->vm = virObjectRef(vm);
+
+    ret = mon;
+
+ cleanup:
+    virCommandFree(cmd);
+    return ret;
+}
...
+struct data {
+  char trace_ascii; /* 1 or 0 */
+};
+
+static void dump(const char *text,
+                 FILE *stream,
+                 unsigned char *ptr,
+                 size_t size,
+                 char nohex)
+{
+    size_t i;
+    size_t c;
+
+    unsigned int width = 0x10;
+
+    if (nohex)
+        /* without the hex output, we can fit more on screen */
+        width = 0x40;
+
+    fprintf(stream, "%s, %10.10lu bytes (0x%8.8lx)\n", text, (unsigned long)size,
+            (unsigned long)size);
+
+    for (i = 0; i < size; i += width) {
+
+        fprintf(stream, "%4.4lx: ", (unsigned long)i);
+
+        if (!nohex) {
+            /* hex not disabled, show it */
+            for (c = 0; c < width; c++) {
+                if (i + c < size)
+                    fprintf(stream, "%02x ", ptr[i + c]);
+                else
+                    fputs("   ", stream);
+            }
+        }
+
+        for (c = 0; (c < width) && (i + c < size); c++) {
+            /* check for 0D0A; if found, skip past and start a new line of output */
+            if (nohex && (i + c + 1 < size) && ptr[i + c] == 0x0D &&
+                ptr[i + c + 1] == 0x0A) {
+                i += (c + 2 - width);
+                break;
+            }
+            fprintf(stream, "%c",
+                    (ptr[i + c] >= 0x20) && (ptr[i + c] < 0x80) ? ptr[i + c] : '.');
+            /* check again for 0D0A, to avoid an extra \n if it's at width */
+            if (nohex && (i + c + 2 < size) && ptr[i + c + 1] == 0x0D &&
+                ptr[i + c + 2] == 0x0A) {
+                i += (c + 3 - width);
+                break;
+            }
+        }
+        fputc('\n', stream); /* newline */
+    }
+    fflush(stream);
+}
+
+static int my_trace(CURL *handle,
+                    curl_infotype type,
+                    char *data,
+                    size_t size,
+                    void *userp)
+{
+    struct data *config = (struct data *)userp;
+    const char *text = "";
+    (void)handle; /* prevent compiler warning */
+
+    switch (type) {
+    case CURLINFO_TEXT:
+        fprintf(stderr, "== Info: %s", data);
+        /* FALLTHROUGH */
+    case CURLINFO_END: /* in case a new one is introduced to shock us */
+        break;
+    case CURLINFO_HEADER_OUT:
+        text = "=> Send header";
+        break;
+    case CURLINFO_DATA_OUT:
+        text = "=> Send data";
+        break;
+    case CURLINFO_SSL_DATA_OUT:
+        text = "=> Send SSL data";
+        break;
+    case CURLINFO_HEADER_IN:
+        text = "<= Recv header";
+        break;
+    case CURLINFO_DATA_IN:
+        text = "<= Recv data";
+        break;
+    case CURLINFO_SSL_DATA_IN:
+        text = "<= Recv SSL data";
+        break;
+    }
+
+    dump(text, stderr, (unsigned char *)data, size, config->trace_ascii);
+    return 0;
+}
I presume this is left over from when you were debugging this. Printing
to stderr is not really something we would want in the code.
...
+static int
+virCHMonitorCurlPerform(CURL *handle)
+{
+    CURLcode errorCode;
+    long responseCode = 0;
+
+    struct data config;
+
+    config.trace_ascii = 1; /* enable ascii tracing */
+
+    curl_easy_setopt(handle, CURLOPT_DEBUGFUNCTION, my_trace);
+    curl_easy_setopt(handle, CURLOPT_DEBUGDATA, &config);
+
+    /* the DEBUGFUNCTION has no effect until we enable VERBOSE */
+    curl_easy_setopt(handle, CURLOPT_VERBOSE, 1L);
+
+    errorCode = curl_easy_perform(handle);
+
+    if (errorCode != CURLE_OK) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("curl_easy_perform() returned an error: %s (%d)"),
+                       curl_easy_strerror(errorCode), errorCode);
+        return -1;
+    }
+
+    errorCode = curl_easy_getinfo(handle, CURLINFO_RESPONSE_CODE,
+                                  &responseCode);
+
+    if (errorCode != CURLE_OK) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("curl_easy_getinfo(CURLINFO_RESPONSE_CODE) returned an "
+                         "error: %s (%d)"), curl_easy_strerror(errorCode),
+                       errorCode);
+        return -1;
+    }
+
+    if (responseCode < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("curl_easy_getinfo(CURLINFO_RESPONSE_CODE) returned a "
+                         "negative response code"));
+        return -1;
+    }
+
+    return responseCode;
+}
+
+int
+virCHMonitorPutNoContent(virCHMonitorPtr mon, const char *endpoint)
+{
+    char *url;
Change to...

 g_autofree  char *url = NULL;

and...
...
+    int responseCode = 0;
+    int ret = -1;
+
+    url = g_strdup_printf("%s/%s", URL_ROOT, endpoint);
+
+    virObjectLock(mon);
+
+    /* reset all options of a libcurl session handle at first */
+    curl_easy_reset(mon->handle);
+
+    curl_easy_setopt(mon->handle, CURLOPT_UNIX_SOCKET_PATH, mon->socketpath);
+    curl_easy_setopt(mon->handle, CURLOPT_URL, url);
+    curl_easy_setopt(mon->handle, CURLOPT_PUT, true);
+    curl_easy_setopt(mon->handle, CURLOPT_HTTPHEADER, NULL);
+
+    responseCode = virCHMonitorCurlPerform(mon->handle);
+
+    virObjectUnlock(mon);
+
+    if (responseCode == 200 || responseCode == 204)
+        ret = 0;
+
+    VIR_FREE(url);
...you can drop this.

Likewise for any where else you malloc a variable and free it
in the same scope.
...
+    return ret;
+}
+
+int
+virCHMonitorGet(virCHMonitorPtr mon, const char *endpoint)
+{
+    char *url;
+    int responseCode = 0;
+    int ret = -1;
+
+    url = g_strdup_printf("%s/%s", URL_ROOT, endpoint);
+
+    virObjectLock(mon);
+
+    /* reset all options of a libcurl session handle at first */
+    curl_easy_reset(mon->handle);
+
+    curl_easy_setopt(mon->handle, CURLOPT_UNIX_SOCKET_PATH, mon->socketpath);
+    curl_easy_setopt(mon->handle, CURLOPT_URL, url);
+
+    responseCode = virCHMonitorCurlPerform(mon->handle);
+
+    virObjectUnlock(mon);
+
+    if (responseCode == 200 || responseCode == 204)
+        ret = 0;
+
+    VIR_FREE(url);
+    return ret;
+}
...
+int
+virCHMonitorCreateVM(virCHMonitorPtr mon)
+{
+    g_autofree char *url = NULL;
+    int responseCode = 0;
+    int ret = -1;
+    g_autofree char *payload = NULL;
+    struct curl_slist *headers = NULL;
+
+    url = g_strdup_printf("%s/%s", URL_ROOT, URL_VM_CREATE);
+    headers = curl_slist_append(headers, "Accept: application/json");
+    headers = curl_slist_append(headers, "Content-Type: application/json");
+    headers = curl_slist_append(headers, "Expect:");
Is that empty "Expect:" header intentional ?
...
+
+    if (virCHMonitorBuildVMJson(mon->vm->def, &payload) != 0)
+        return -1;
+
+    virObjectLock(mon);
+
+    /* reset all options of a libcurl session handle at first */
+    curl_easy_reset(mon->handle);
+
+    curl_easy_setopt(mon->handle, CURLOPT_UNIX_SOCKET_PATH, mon->socketpath);
+    curl_easy_setopt(mon->handle, CURLOPT_URL, url);
+    curl_easy_setopt(mon->handle, CURLOPT_CUSTOMREQUEST, "PUT");
+    curl_easy_setopt(mon->handle, CURLOPT_HTTPHEADER, headers);
+    curl_easy_setopt(mon->handle, CURLOPT_POSTFIELDS, payload);
+
+    responseCode = virCHMonitorCurlPerform(mon->handle);
+
+    virObjectUnlock(mon);
+
+    if (responseCode == 200 || responseCode == 204)
+        ret = 0;
+
+    curl_slist_free_all(headers);
+    VIR_FREE(url);
+    VIR_FREE(payload);
+    return ret;
+}
...
+struct _virCHMonitor {
+    virObjectLockable parent;
+
+    CURL *handle;
Does this curl handle keep a persistent connection open to the CH
process ?  And is that able to be used to detect immedaitely when
the CH process shuts down or crashes unexpectedly ?
...
+
+    char *socketpath;
+
+    pid_t pid;
+
+    virDomainObjPtr vm;
+};
+
...
+int virCHMonitorResumeVM(virCHMonitorPtr mon);
diff --git a/src/ch/ch_process.c b/src/ch/ch_process.c
new file mode 100644
index 0000000000..15f4801549
--- /dev/null
+++ b/src/ch/ch_process.c
@@ -0,0 +1,125 @@
+/*
+ * Copyright Intel Corp. 2020
+ *
+ * ch_driver.h: header file for Cloud-Hypervisor driver functions
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library.  If not, see
+ * <http://www.gnu.org/licenses/>.
+ */
+
...
diff --git a/src/ch/meson.build b/src/ch/meson.build
new file mode 100644
index 0000000000..e41691bc05
--- /dev/null
+++ b/src/ch/meson.build
@@ -0,0 +1,44 @@
+ch_driver_sources = [
+  'ch_conf.c',
+  'ch_conf.h',
+  'ch_domain.c',
+  'ch_domain.h',
+  'ch_driver.c',
+  'ch_driver.h',
+  'ch_monitor.c',
+  'ch_monitor.h',
+  'ch_process.c',
+  'ch_process.h',
+]
+
+driver_source_files += files(ch_driver_sources)
+
+stateful_driver_source_files += files(ch_driver_sources)
+
+if conf.has('WITH_CH')
+  ch_driver_impl = static_library(
+    'virt_driver_ch_impl',
+    [
+      ch_driver_sources,
+    ],
+    dependencies: [
+      access_dep,
+      curl_dep,
+      log_dep,
+      src_dep,
+    ],
+    include_directories: [
+      conf_inc_dir,
+    ],
+  )
+
+  virt_modules += {
+    'name': 'virt_driver_ch',
+    'link_whole': [
+      ch_driver_impl,
+    ],
+    'link_args': [
+      libvirt_no_undefined,
+    ],
+  }
+endif
We're moving to a world where each stateful virt driver has its
own daemon. The rules here are enough for libvirtd, but you'll
need to add rules to create a "virtchd" daemon. The qemu/meson.build
will show how - look for "virt_daemons".
...
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 5d3ae8bb28..11b183ad2c 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -130,6 +130,7 @@ VIR_ENUM_IMPL(virDomainVirt,
               "parallels",
               "bhyve",
               "vz",
+              "ch",
 );
VIR_ENUM_IMPL(virDomainOS,
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 8a0f26f5c0..4dba588728 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -139,6 +139,7 @@ typedef enum {
     VIR_DOMAIN_VIRT_PARALLELS,
     VIR_DOMAIN_VIRT_BHYVE,
     VIR_DOMAIN_VIRT_VZ,
+    VIR_DOMAIN_VIRT_CH,
As mentioned earlier, not required since you're still using
KVM IIUC.



Overall, this looks like a reasonable start of the driver.

What is the security model for the processes ? The libvirt driver
here is running as root and spawning CH as root. We don't really
want the VM process running as root though. It really needs to
be unprivileged from a DAC POV. Obviously that's quite a bit more
work for you todo, but it should be opssible to share alot of the
security mgmt infrastructure that we already have for QEMU and
LXC drivers. It can manage DAC permissions and apply SELinux or
AppArmor MAC labelling/profiles.

The other big question is around device addressing. If using PCI,
then PCI address assignment logic is critical to ensure consistent
guest ABI. 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

    

Re: [RFC] Add basic driver for the Cloud-Hypervisor

Daniel P. Berrangé