[PATCH RFC 0/3] Add checkpoint/restore support to LXC using CRIU

This patch series adds checkpoint/restore support to the LXC driver using
CRIU. It follows the same approach as the other drivers that save and restore
process state: a single file that starts with a header carrying some
metadata. The only real difference is how the LXC driver combines the files
produced by CRIU. CRIU generates a lot of 'img' files, so they are packed
(and optionally compressed) with tar to fit into the libvirt state file.

Julio Faracco (3):
  meson: Add support to CRIU binary into meson
  lxc: Including CRIU functions and functions to support C/R.
  lxc: Adding support to LXC driver to restore a container

 meson.build              |  10 +
 meson_options.txt        |   1 +
 src/lxc/lxc_conf.c       |   3 +
 src/lxc/lxc_conf.h       |   2 +
 src/lxc/lxc_container.c  | 188 +++++++++++++++++-
 src/lxc/lxc_container.h  |   3 +-
 src/lxc/lxc_controller.c |  93 ++++++++-
 src/lxc/lxc_criu.c       | 405 +++++++++++++++++++++++++++++++++++++++
 src/lxc/lxc_criu.h       |  50 +++++
 src/lxc/lxc_driver.c     | 341 +++++++++++++++++++++++++++++++-
 src/lxc/lxc_process.c    |  26 ++-
 src/lxc/lxc_process.h    |   1 +
 src/lxc/meson.build      |   2 +
 13 files changed, 1106 insertions(+), 19 deletions(-)
 create mode 100644 src/lxc/lxc_criu.c
 create mode 100644 src/lxc/lxc_criu.h

--
2.27.0
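To make the state file layout concrete: the file starts with a fixed header, followed by the domain XML, followed by the tar archive of the CRIU images. The sketch below mirrors the fields of virLXCSaveHeader from patch 2/3 and the magic/version checks done on restore in patch 3/3. The struct name save_header and the three helper functions are hypothetical names used only for illustration; they are not part of the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LXC_SAVE_MAGIC "LXCCriuSaveMagic"
#define LXC_SAVE_VERSION 2

/* Same field layout as virLXCSaveHeader in lxc_criu.h. */
struct save_header {
    char magic[sizeof(LXC_SAVE_MAGIC) - 1]; /* 16 bytes, no trailing NUL */
    uint32_t version;
    uint32_t xmlLen;     /* length of the XML that follows, including NUL */
    uint32_t compressed; /* compression format of the tar payload */
    uint32_t unused[9];
};

/* Hypothetical helper: fill a header for an XML document of xml_len bytes. */
static void save_header_init(struct save_header *hdr,
                             uint32_t xml_len, uint32_t compressed)
{
    assert(hdr != NULL);
    memset(hdr, 0, sizeof(*hdr));
    memcpy(hdr->magic, LXC_SAVE_MAGIC, sizeof(hdr->magic));
    hdr->version = LXC_SAVE_VERSION;
    hdr->xmlLen = xml_len;
    hdr->compressed = compressed;
}

/* Hypothetical helper: 0 if the header is usable, -1 otherwise. */
static int save_header_check(const struct save_header *hdr)
{
    assert(hdr != NULL);
    if (memcmp(hdr->magic, LXC_SAVE_MAGIC, sizeof(hdr->magic)) != 0)
        return -1;
    if (hdr->version > LXC_SAVE_VERSION)
        return -1;
    return 0;
}

/* Header first, then the XML, then the tar archive. */
static size_t save_payload_offset(const struct save_header *hdr)
{
    assert(hdr != NULL);
    return sizeof(*hdr) + hdr->xmlLen;
}
```

Note that the sketch zeroes the header before setting any field, so a value such as the compression format is not lost to a later memset.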

This patch adds a check for the CRIU binary to the meson build files, so that
the LXC driver can support checkpoint/restore.

Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
---
 meson.build       | 10 ++++++++++
 meson_options.txt |  1 +
 2 files changed, 11 insertions(+)

diff --git a/meson.build b/meson.build
index 369548f127..115c903ab2 100644
--- a/meson.build
+++ b/meson.build
@@ -1639,6 +1639,15 @@ void main(void) {
     if cc.compiles(lxc_get_free_code)
       conf.set('WITH_DECL_LOOP_CTL_GET_FREE', 1)
     endif
+
+    if not get_option('criu').disabled()
+      criu_prog = find_program('criu')
+
+      if criu_prog.found()
+        conf.set('WITH_CRIU', 1)
+        conf.set_quoted('CRIU', criu_prog.path())
+      endif
+    endif
   elif get_option('driver_lxc').enabled()
     error('linux and remote_driver are required for LXC')
   endif
@@ -2411,6 +2420,7 @@ misc_summary = {
   'virt-login-shell': conf.has('WITH_LOGIN_SHELL'),
   'virt-host-validate': conf.has('WITH_HOST_VALIDATE'),
   'TLS priority': conf.get_unquoted('TLS_PRIORITY'),
+  'CRIU': conf.has('WITH_CRIU'),
 }
 summary(misc_summary, section: 'Miscellaneous', bool_yn: true, list_sep: ' ')
diff --git a/meson_options.txt b/meson_options.txt
index e5d79c2b6b..de977c8775 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -102,3 +102,4 @@ option('numad', type: 'feature', value: 'auto', description: 'use numad to manag
 option('pm_utils', type: 'feature', value: 'auto', description: 'use pm-utils for power management')
 option('sysctl_config', type: 'feature', value: 'auto', description: 'Whether to install sysctl configs')
 option('tls_priority', type: 'string', value: 'NORMAL', description: 'set the default TLS session priority string')
+option('criu', type: 'feature', value: 'auto', description: 'use CRIU to checkpoint/restore containers')
--
2.27.0
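As a rough illustration of how the two symbols set above are meant to be consumed from C: WITH_CRIU gates the feature at compile time and CRIU holds the absolute path of the binary found by meson. In a real build both come from the generated config.h; the fallback definitions and the path below are assumptions so that the sketch compiles on its own, and criu_get_binary() is a hypothetical helper, not a function from this series.

```c
#include <assert.h>
#include <stddef.h>

/* Normally provided by config.h; defined here only for the sketch.
 * The path is an assumed example value. */
#ifndef WITH_CRIU
# define WITH_CRIU 1
# define CRIU "/usr/sbin/criu"
#endif

/* Hypothetical helper: store the CRIU binary path and return 0 when
 * support was compiled in, otherwise store NULL and return -1. */
static int criu_get_binary(const char **path)
{
    assert(path != NULL);
#if WITH_CRIU
    *path = CRIU;
    return 0;
#else
    *path = NULL;
    return -1;
#endif
}
```

The series itself follows the same split in lxc_criu.c: real implementations under #if WITH_CRIU and stubs that report an unsupported operation otherwise.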

This patch adds helper functions in lxc_criu.{h,c} to support LXC
checkpoint/restore using the CRIU binary. To save container state, LXC follows
the same pattern as QEMU and libxl: a file that starts with a header carrying
metadata. Because CRIU writes multiple files, they need to be packed into a
single archive, for instance with tar.

Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
---
 src/lxc/lxc_criu.c  | 405 ++++++++++++++++++++++++++++++++++++++++++++
 src/lxc/lxc_criu.h  |  50 ++++++
 src/lxc/meson.build |   2 +
 3 files changed, 457 insertions(+)
 create mode 100644 src/lxc/lxc_criu.c
 create mode 100644 src/lxc/lxc_criu.h

diff --git a/src/lxc/lxc_criu.c b/src/lxc/lxc_criu.c
new file mode 100644
index 0000000000..a82bd5ffde
--- /dev/null
+++ b/src/lxc/lxc_criu.c
@@ -0,0 +1,405 @@
+/*
+ * lxc_criu.c: wrapper functions for CRIU C API to be used for lxc migration
+ *
+ * Copyright (c) 2021 Red Hat, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library. If not, see
+ * <http://www.gnu.org/licenses/>.
+ */
+
+#include <config.h>
+
+#include <fcntl.h>
+#include <sys/stat.h>
+#include <sys/mount.h>
+
+#include "virobject.h"
+#include "virerror.h"
+#include "virlog.h"
+#include "virfile.h"
+#include "vircommand.h"
+#include "virstring.h"
+#include "viralloc.h"
+#include "virutil.h"
+
+#include "lxc_domain.h"
+#include "lxc_driver.h"
+#include "lxc_criu.h"
+
+#define VIR_FROM_THIS VIR_FROM_LXC
+
+VIR_LOG_INIT("lxc.lxc_criu");
+
+#if WITH_CRIU
+typedef enum {
+    LXC_SAVE_FORMAT_RAW = 0,
+    LXC_SAVE_FORMAT_GZIP = 1,
+    LXC_SAVE_FORMAT_BZIP2 = 2,
+    LXC_SAVE_FORMAT_XZ = 3,
+    LXC_SAVE_FORMAT_LZOP = 4,
+
+    LXC_SAVE_FORMAT_LAST
+} virLXCSaveFormat;
+
+VIR_ENUM_DECL(lxcSaveCompression);
+VIR_ENUM_IMPL(lxcSaveCompression,
+              LXC_SAVE_FORMAT_LAST,
+              "raw",
+              "gzip",
+              "bzip2",
+              "xz",
+              "lzop",
+);
+
+
+/* lxcSaveImageGetCompressionProgram:
+ * @imageFormat: String representation from lxc.conf for the compression
+ *               image format being used (dump, save, or snapshot).
+ * @compressor: Pointer to a virCommandPtr that receives the tar command
+ *              built for the requested style.
+ * @styleFormat: String representing the style of format (dump, save, snapshot)
+ *
+ * Returns:
+ *    virLXCSaveFormat    - Integer representation of the compression
+ *                          program to be used for particular style
+ *                          (e.g. dump, save, or snapshot).
+ *    LXC_SAVE_FORMAT_RAW - If there is no lxc.conf imageFormat value,
+ *                          indicating no compression.
+ *    -1                  - On error (tar missing, unknown format or style).
+ */ +static int +lxcSaveImageGetCompressionProgram(const char *imageFormat, + virCommandPtr *compressor, + const char *styleFormat) +{ + const char *prog; + int ret; + + *compressor = NULL; + + /* Use tar to compress all .img files */ + if (!(prog = virFindFileInPath("tar"))) + return -1; + + *compressor = virCommandNew(prog); + + if (STREQ(styleFormat, "save")) { + /* Remove files after added into tar */ + virCommandAddArgList(*compressor, "--create", + "--remove-files", NULL); + } else if (STREQ(styleFormat, "dump")) { + virCommandAddArg(*compressor, "--extract"); + } else { + return -1; + } + + if (!imageFormat) + return 0; + + if ((ret = lxcSaveCompressionTypeFromString(imageFormat)) < 0) + return -1; + + switch (ret) { + case LXC_SAVE_FORMAT_GZIP: + virCommandAddArg(*compressor, "--gzip"); + break; + case LXC_SAVE_FORMAT_BZIP2: + virCommandAddArg(*compressor, "--bzip2"); + break; + case LXC_SAVE_FORMAT_XZ: + virCommandAddArg(*compressor, "--xz"); + break; + case LXC_SAVE_FORMAT_LZOP: + virCommandAddArg(*compressor, "--lzop"); + break; + case LXC_SAVE_FORMAT_RAW: + default: + break; + } + + return ret; +} + + +int lxcCriuCompress(const char *checkpointdir, + char *compressionType) +{ + virCommandPtr cmd; + g_autofree char *tarfile = NULL; + int ret = -1; + + if ((ret = lxcSaveImageGetCompressionProgram(compressionType, + &cmd, + "save")) < 0) + return -1; + + tarfile = g_strdup_printf("%s/criu.save", checkpointdir); + + virCommandAddArgFormat(cmd, "--file=%s", tarfile); + virCommandAddArgFormat(cmd, "--directory=%s/save/", checkpointdir); + virCommandAddArg(cmd, "."); + + if (virCommandRun(cmd, NULL) < 0) + return -1; + + return ret; +} + + +int lxcCriuDecompress(const char *checkpointdir, + char *compressionType) +{ + virCommandPtr cmd; + g_autofree char *tarfile = NULL; + g_autofree char *savedir = NULL; + int ret = -1; + + if ((ret = lxcSaveImageGetCompressionProgram(compressionType, + &cmd, + "dump")) < 0) + return -1; + + savedir = 
g_strdup_printf("%s/save/", checkpointdir); + if (virFileMakePath(savedir) < 0) { + virReportSystemError(errno, + _("Failed to mkdir %s"), savedir); + return -1; + } + + tarfile = g_strdup_printf("%s/criu.save", checkpointdir); + + virCommandAddArgFormat(cmd, "--file=%s", tarfile); + virCommandAddArgFormat(cmd, "--directory=%s", savedir); + + if (virCommandRun(cmd, NULL) < 0) + return -1; + + return ret; +} + + +int lxcCriuDump(virDomainObjPtr vm, + const char *checkpointdir) +{ + int ret = -1; + virLXCDomainObjPrivatePtr priv = vm->privateData; + virCommandPtr cmd; + struct stat sb; + g_autofree char *path = NULL; + g_autofree char *tty_info_path = NULL; + g_autofree char *ttyinfo = NULL; + g_autofree char *pidfile = NULL; + g_autofree char *pidbuf = NULL; + g_autofree char *savedir = NULL; + int pidlen; + int pidfd; + int status; + + savedir = g_strdup_printf("%s/save/", checkpointdir); + if (virFileMakePath(savedir) < 0) { + virReportSystemError(errno, + _("Failed to mkdir %s"), savedir); + return -1; + } + + pidfile = g_strdup_printf("%s/save/dump.pid", checkpointdir); + pidbuf = g_strdup_printf("%d", priv->initpid); + pidlen = strlen(pidbuf); + + pidfd = open(pidfile, O_WRONLY | O_CREAT | O_TRUNC, 0644); + if (safewrite(pidfd, pidbuf, pidlen) != pidlen) { + virReportSystemError(errno, "%s", _("criu pid file write failed")); + return -1; + } + + cmd = virCommandNew(CRIU); + virCommandAddArg(cmd, "dump"); + + virCommandAddArgList(cmd, "--images-dir", savedir, NULL); + + virCommandAddArgList(cmd, "--log-file", "dump.log", NULL); + + virCommandAddArgList(cmd, "-vvvv", NULL); + + virCommandAddArg(cmd, "--tree"); + virCommandAddArgFormat(cmd, "%d", priv->initpid); + + virCommandAddArgList(cmd, "--tcp-established", "--file-locks", + "--link-remap", "--force-irmap", NULL); + + virCommandAddArgList(cmd, "--manage-cgroup", NULL); + + virCommandAddArgList(cmd, "--enable-external-sharing", + "--enable-external-masters", NULL); + + virCommandAddArgList(cmd, "--enable-fs", 
"hugetlbfs", + "--enable-fs", "tracefs", NULL); + + /* Add support for FUSE */ + virCommandAddArgList(cmd, "--ext-mount-map", "/proc/meminfo:fuse", NULL); + virCommandAddArgList(cmd, "--ghost-limit", "10000000", NULL); + + virCommandAddArgList(cmd, "--ext-mount-map", "/dev/console:console", NULL); + virCommandAddArgList(cmd, "--ext-mount-map", "/dev/tty1:tty1", NULL); + virCommandAddArgList(cmd, "--ext-mount-map", "auto", NULL); + + /* The master pair of the /dev/pts device lives outside from what is dumped + * inside the libvirt-lxc process. Add the slave pair as an external tty + * otherwise criu will fail. + */ + path = g_strdup_printf("/proc/%d/root/dev/pts/0", priv->initpid); + + if (stat(path, &sb) < 0) { + virReportSystemError(errno, + _("Unable to stat %s"), path); + goto cleanup; + } + + tty_info_path = g_strdup_printf("%s/tty.info", savedir); + ttyinfo = g_strdup_printf("tty[%x:%x]", (unsigned int)sb.st_rdev, + (unsigned int)sb.st_dev); + + if (virFileWriteStr(tty_info_path, ttyinfo, 0666) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Failed to write tty info to %s"), tty_info_path); + goto cleanup; + } + + VIR_DEBUG("tty.info: tty[%x:%x]", + (unsigned int)sb.st_dev, (unsigned int)sb.st_rdev); + virCommandAddArg(cmd, "--external"); + virCommandAddArgFormat(cmd, "tty[%x:%x]", + (unsigned int)sb.st_rdev, (unsigned int)sb.st_dev); + + VIR_DEBUG("About to checkpoint domain %s (pid = %d)", + vm->def->name, priv->initpid); + virCommandRawStatus(cmd); + if (virCommandRun(cmd, &status) < 0) + goto cleanup; + + ret = 0; + + cleanup: + if (ret < 0) + return ret; + return status; +} + +int lxcCriuRestore(virDomainDefPtr def, + int restorefd, int ttyfd) +{ + virCommandPtr cmd; + g_autofree char *ttyinfo = NULL; + g_autofree char *inheritfd = NULL; + g_autofree char *tty_info_path = NULL; + g_autofree char *checkpointfd = NULL; + g_autofree char *checkpointdir = NULL; + g_autofree char *rootfs_mount = NULL; + g_autofree gid_t *groups = NULL; + int ret = -1; + 
int ngroups; + + cmd = virCommandNew(CRIU); + virCommandAddArg(cmd, "restore"); + + checkpointfd = g_strdup_printf("/proc/self/fd/%d", restorefd); + + if (virFileResolveLink(checkpointfd, &checkpointdir) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to readlink checkpoint dir path")); + return -1; + } + + /* CRIU needs the container's root bind mounted so that it is the root of + * some mount. + */ + rootfs_mount = g_strdup_printf("%s/save/%s", LXC_STATE_DIR, def->name); + + virCommandAddArgList(cmd, "--images-dir", checkpointdir, NULL); + + virCommandAddArgList(cmd, "--log-file", "restore.log", NULL); + + virCommandAddArgList(cmd, "--pidfile", "restore.pid", NULL); + + virCommandAddArgList(cmd, "-vvvv", NULL); + virCommandAddArgList(cmd, "--tcp-established", "--file-locks", + "--link-remap", "--force-irmap", NULL); + + virCommandAddArgList(cmd, "--enable-external-sharing", + "--enable-external-masters", NULL); + + virCommandAddArgList(cmd, "--ext-mount-map", "auto", NULL); + + virCommandAddArgList(cmd, "--enable-fs", "hugetlbfs", + "--enable-fs", "tracefs", NULL); + + virCommandAddArgList(cmd, "--ext-mount-map", "fuse:/proc/meminfo", NULL); + + virCommandAddArgList(cmd, "--ext-mount-map", "console:/dev/console", NULL); + virCommandAddArgList(cmd, "--ext-mount-map", "tty1:/dev/tty1", NULL); + + virCommandAddArgList(cmd, "--restore-detached", "--restore-sibling", NULL); + + /* Restore external tty that was saved in tty.info file + */ + tty_info_path = g_strdup_printf("%s/tty.info", checkpointdir); + + if (virFileReadAll(tty_info_path, 1024, &ttyinfo) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Failed to read tty info from %s"), tty_info_path); + return -1; + } + + inheritfd = g_strdup_printf("fd[%d]:%s", ttyfd, ttyinfo); + + virCommandAddArgList(cmd, "--inherit-fd", inheritfd, NULL); + + /* Change the root filesystem because we run in mount namespace. 
+ */ + virCommandAddArgList(cmd, "--root", rootfs_mount, NULL); + + if ((ngroups = virGetGroupList(virCommandGetUID(cmd), virCommandGetGID(cmd), + &groups)) < 0) + return -1; + + + VIR_DEBUG("Executing init binary"); + /* this function will only return if an error occurred */ + ret = virCommandExec(cmd, groups, ngroups); + + if (ret != 0) { + VIR_DEBUG("Tearing down container"); + fprintf(stderr, + _("Failure in libvirt_lxc startup: %s\n"), + virGetLastErrorMessage()); + } + + return ret; +} +#else +int lxcCriuDump(virDomainObjPtr vm ATTRIBUTE_UNUSED, + const char *checkpointdir ATTRIBUTE_UNUSED) +{ + virReportUnsupportedError(); + return -1; +} + +int lxcCriuRestore(virDomainDefPtr def ATTRIBUTE_UNUSED, + int fd ATTRIBUTE_UNUSED, + int ttyfd ATTRIBUTE_UNUSED) +{ + virReportUnsupportedError(); + return -1; +} +#endif diff --git a/src/lxc/lxc_criu.h b/src/lxc/lxc_criu.h new file mode 100644 index 0000000000..7dfd78aa24 --- /dev/null +++ b/src/lxc/lxc_criu.h @@ -0,0 +1,50 @@ +/* + * lxc_criu.h: CRIU C API methods wrapper + * + * Copyright (c) 2021 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. 
+ */ + +#ifndef LXC_CRIU_H +# define LXC_CRIU_H + +# include "virobject.h" + +#define LXC_SAVE_MAGIC "LXCCriuSaveMagic" +#define LXC_SAVE_VERSION 2 + +typedef struct _virLXCSaveHeader virLXCSaveHeader; +typedef virLXCSaveHeader *virLXCSaveHeaderPtr; +struct _virLXCSaveHeader { + char magic[sizeof(LXC_SAVE_MAGIC)-1]; + uint32_t version; + uint32_t xmlLen; + uint32_t compressed; + uint32_t unused[9]; +}; + +int lxcCriuCompress(const char *checkpointdir, + char *compressionType); + +int lxcCriuDecompress(const char *checkpointdir, + char *compressionType); + +int lxcCriuDump(virDomainObjPtr vm, + const char *checkpointdir); + +int lxcCriuRestore(virDomainDefPtr def, + int fd, int ttyfd); +#endif /* LXC_CRIU_H */ diff --git a/src/lxc/meson.build b/src/lxc/meson.build index ad5c659dba..1a8524aab3 100644 --- a/src/lxc/meson.build +++ b/src/lxc/meson.build @@ -9,6 +9,7 @@ lxc_driver_sources = [ 'lxc_monitor.c', 'lxc_native.c', 'lxc_process.c', + 'lxc_criu.c', ] lxc_monitor_protocol = files('lxc_monitor_protocol.x') @@ -61,6 +62,7 @@ lxc_controller_sources = files( 'lxc_domain.c', 'lxc_fuse.c', 'lxc_controller.c', + 'lxc_criu.c', ) lxc_controller_generated = custom_target( -- 2.27.0
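For readers skimming this patch, the way lxcSaveImageGetCompressionProgram() assembles the tar command line can be sketched as a standalone function: the "save" style packs with --create --remove-files, the "dump" style unpacks with --extract, and the configured format adds one compression option. This is an illustration under stated assumptions, not code from the series: tar_flag_for_format() and tar_build_args() are hypothetical names, plain argv arrays stand in for virCommand, and the --file and --directory options added by the callers are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map a save image format name to the matching tar
 * option. Returns "" for no compression (NULL or "raw") and NULL for an
 * unknown format. */
static const char *tar_flag_for_format(const char *format)
{
    static const struct { const char *name; const char *flag; } map[] = {
        { "raw", "" },
        { "gzip", "--gzip" },
        { "bzip2", "--bzip2" },
        { "xz", "--xz" },
        { "lzop", "--lzop" },
    };
    size_t i;

    if (!format)
        return "";
    for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if (strcmp(format, map[i].name) == 0)
            return map[i].flag;
    }
    return NULL;
}

/* Hypothetical helper: fill argv (NULL terminated) with the tar options
 * for the given style. Returns the number of entries, or -1 on error. */
static int tar_build_args(const char *style, const char *format,
                          const char **argv, size_t max)
{
    size_t n = 0;
    const char *flag = tar_flag_for_format(format);

    assert(style != NULL && argv != NULL);
    if (!flag || max < 5) /* at most 4 entries plus the terminator */
        return -1;

    argv[n++] = "tar";
    if (strcmp(style, "save") == 0) {
        argv[n++] = "--create";
        argv[n++] = "--remove-files"; /* drop the images once archived */
    } else if (strcmp(style, "dump") == 0) {
        argv[n++] = "--extract";
    } else {
        return -1;
    }
    if (*flag)
        argv[n++] = flag;
    argv[n] = NULL;
    return (int)n;
}
```

For example, the "save" style with format "gzip" yields tar --create --remove-files --gzip, to which the caller then appends the archive path and source directory.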

This patch introduces the ability to restore a saved container using CRIU.
It should be possible to start a container either the traditional way, with a
plain container start, or from a saved state.

Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
---
 src/lxc/lxc_conf.c       |   3 +
 src/lxc/lxc_conf.h       |   2 +
 src/lxc/lxc_container.c  | 188 ++++++++++++++++++++-
 src/lxc/lxc_container.h  |   3 +-
 src/lxc/lxc_controller.c |  93 ++++++++++-
 src/lxc/lxc_driver.c     | 341 ++++++++++++++++++++++++++++++++++++++-
 src/lxc/lxc_process.c    |  26 ++-
 src/lxc/lxc_process.h    |   1 +
 8 files changed, 638 insertions(+), 19 deletions(-)

diff --git a/src/lxc/lxc_conf.c b/src/lxc/lxc_conf.c
index e6ad91205e..690cef7d39 100644
--- a/src/lxc/lxc_conf.c
+++ b/src/lxc/lxc_conf.c
@@ -240,6 +240,8 @@ virLXCDriverConfigNew(void)
     cfg->logDir = g_strdup(LXC_LOG_DIR);
     cfg->autostartDir = g_strdup(LXC_AUTOSTART_DIR);
+    cfg->saveImageFormat = NULL;
+
     return cfg;
 }
@@ -291,4 +293,5 @@ virLXCDriverConfigDispose(void *obj)
     g_free(cfg->stateDir);
     g_free(cfg->logDir);
     g_free(cfg->securityDriverName);
+    g_free(cfg->saveImageFormat);
 }
diff --git a/src/lxc/lxc_conf.h b/src/lxc/lxc_conf.h
index 664bafc7b9..08bea11c02 100644
--- a/src/lxc/lxc_conf.h
+++ b/src/lxc/lxc_conf.h
@@ -61,6 +61,8 @@ struct _virLXCDriverConfig {
     char *securityDriverName;
     bool securityDefaultConfined;
     bool securityRequireConfined;
+
+    char *saveImageFormat;
 };
 struct _virLXCDriver {
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 2a5f8711c4..03c086d029 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -51,6 +51,7 @@
 #include "virerror.h"
 #include "virlog.h"
 #include "lxc_container.h"
+#include "lxc_criu.h"
 #include "viralloc.h"
 #include "virnetdevveth.h"
 #include "viruuid.h"
@@ -84,6 +85,7 @@ struct __lxc_child_argv {
     char **ttyPaths;
     int handshakefd;
     int *nsInheritFDs;
+    int restorefd;
 };
 static int lxcContainerMountFSBlock(virDomainFSDefPtr fs,
@@ -235,8 +237,8 @@ static virCommandPtr
lxcContainerBuildInitCmd(virDomainDefPtr vmDef, * * Returns 0 on success or -1 in case of error */ -static int lxcContainerSetupFDs(int *ttyfd, - size_t npassFDs, int *passFDs) +static int lxcContainerSetupFDs(int *ttyfd, size_t npassFDs, + int *passFDs, int restorefd) { int rc = -1; int open_max; @@ -335,6 +337,10 @@ static int lxcContainerSetupFDs(int *ttyfd, for (fd = last_fd + 1; fd < open_max; fd++) { int tmpfd = fd; + + if (tmpfd == restorefd) + continue; + VIR_MASS_CLOSE(tmpfd); } @@ -1017,6 +1023,30 @@ static int lxcContainerMountFSDevPTS(virDomainDefPtr def, return 0; } + +static int lxcContainerMountFSDevPTSRestore(virDomainDefPtr def, + const char *stateDir) +{ + int flags = MS_MOVE; + g_autofree char *path = NULL; + + VIR_DEBUG("Mount /dev/pts stateDir=%s", stateDir); + + path = g_strdup_printf("%s/%s.devpts", stateDir, def->name); + + VIR_DEBUG("Trying to move %s to /dev/pts", path); + + if (mount(path, "/dev/pts", NULL, flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev/pts"), + path); + return -1; + } + + return 0; +} + + static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths) { size_t i; @@ -1843,6 +1873,139 @@ static int lxcAttachNS(int *ns_fd) return 0; } + +/* + * lxcContainerChildRestore: + * @data: pointer to container arguments + */ +static int lxcContainerChildRestore(void *data) +{ + lxc_child_argv_t *argv = data; + virDomainDefPtr vmDef = argv->config; + int ttyfd = -1; + int ret = -1; + virDomainFSDefPtr root; + g_autofree char *ttyPath = NULL; + g_autofree char *sec_mount_options = NULL; + g_autofree char *stateDir = NULL; + g_autofree char *rootfs_mount = NULL; + + if (NULL == vmDef) { + virReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("lxcChild() passed invalid vm definition")); + goto cleanup; + } + + if (lxcContainerWaitForContinue(argv->monitor) < 0) { + virReportSystemError(errno, "%s", + _("Failed to read the container continue message")); + goto cleanup; + } + VIR_DEBUG("Received 
container continue message"); + + if (lxcContainerSetID(vmDef) < 0) + goto cleanup; + + root = virDomainGetFilesystemForTarget(vmDef, "/"); + + if (argv->nttyPaths) { + const char *tty = argv->ttyPaths[0]; + + if (STRPREFIX(tty, "/dev/pts/")) + tty += strlen("/dev/pts/"); + + ttyPath = g_strdup_printf("%s/%s.devpts/%s", + LXC_STATE_DIR, vmDef->name, tty); + } else { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("At least one tty is required")); + goto cleanup; + } + + VIR_DEBUG("Container TTY path: %s", ttyPath); + + ttyfd = open(ttyPath, O_RDWR); + if (ttyfd < 0) { + virReportSystemError(errno, + _("Failed to open tty %s"), + ttyPath); + goto cleanup; + } + VIR_DEBUG("Container TTY fd: %d", ttyfd); + + if (!(sec_mount_options = virSecurityManagerGetMountOptions( + argv->securityDriver, + vmDef))) + goto cleanup; + + if (lxcContainerPrepareRoot(vmDef, root, sec_mount_options) < 0) + goto cleanup; + + if (lxcContainerSendContinue(argv->handshakefd) < 0) { + virReportSystemError(errno, "%s", + _("Failed to send continue signal to controller")); + goto cleanup; + } + + VIR_DEBUG("Setting up container's std streams"); + + if (lxcContainerSetupFDs(&ttyfd, + argv->npassFDs, argv->passFDs, argv->restorefd) < 0) + goto cleanup; + + /* CRIU needs the container's root bind mounted so that it is the root of + * some mount. 
+ */ + rootfs_mount = g_strdup_printf("%s/save/%s", LXC_STATE_DIR, vmDef->name); + + if (virFileMakePath(rootfs_mount) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to mkdir rootfs mount path")); + goto cleanup; + } + + if (mount(root->src->path, rootfs_mount, NULL, MS_BIND, NULL) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to create rootfs mountpoint")); + goto cleanup; + } + + if (virFileResolveAllLinks(LXC_STATE_DIR, &stateDir) < 0) + goto cleanup; + + /* Mounts /dev/pts */ + if (lxcContainerMountFSDevPTSRestore(vmDef, stateDir) < 0) { + virReportSystemError(errno, "%s", + _("Failed to mount dev/pts")); + goto cleanup; + } + + if (setsid() < 0) { + virReportSystemError(errno, "%s", + _("Unable to become session leader")); + } + + ret = 0; + + cleanup: + VIR_FORCE_CLOSE(argv->monitor); + VIR_FORCE_CLOSE(argv->handshakefd); + VIR_FORCE_CLOSE(ttyfd); + + if (ret == 0) { + VIR_DEBUG("Executing container restore criu function"); + ret = lxcCriuRestore(vmDef, argv->restorefd, 0); + } else { + VIR_DEBUG("Tearing down container"); + fprintf(stderr, + _("Failure in libvirt_lxc startup: %s\n"), + virGetLastErrorMessage()); + } + + return ret; +} + + /** * lxcContainerSetUserGroup: * @cmd: command to update @@ -2049,7 +2212,7 @@ static int lxcContainerChild(void *data) VIR_FORCE_CLOSE(argv->handshakefd); VIR_FORCE_CLOSE(argv->monitor); if (lxcContainerSetupFDs(&ttyfd, - argv->npassFDs, argv->passFDs) < 0) + argv->npassFDs, argv->passFDs, -1) < 0) goto cleanup; /* Make init process of the container the leader of the new session. 
@@ -2143,7 +2306,8 @@ int lxcContainerStart(virDomainDefPtr def, int handshakefd, int *nsInheritFDs, size_t nttyPaths, - char **ttyPaths) + char **ttyPaths, + int restorefd) { pid_t pid; int cflags; @@ -2162,6 +2326,7 @@ int lxcContainerStart(virDomainDefPtr def, .ttyPaths = ttyPaths, .handshakefd = handshakefd, .nsInheritFDs = nsInheritFDs, + .restorefd = restorefd, }; /* allocate a stack for the container */ @@ -2207,9 +2372,18 @@ int lxcContainerStart(virDomainDefPtr def, VIR_DEBUG("Inheriting a UTS namespace"); } - VIR_DEBUG("Cloning container init process"); - pid = clone(lxcContainerChild, stacktop, cflags, &args); - VIR_DEBUG("clone() completed, new container PID is %d", pid); + if (restorefd == -1) + VIR_DEBUG("Cloning container init process"); + else + VIR_DEBUG("Cloning container process that will spawn criu restore"); + + if (restorefd != -1) + pid = clone(lxcContainerChildRestore, stacktop, SIGCHLD, &args); + else + pid = clone(lxcContainerChild, stacktop, cflags, &args); + + if (restorefd == -1) + VIR_DEBUG("clone() completed, new container PID is %d", pid); if (pid < 0) { virReportSystemError(errno, "%s", diff --git a/src/lxc/lxc_container.h b/src/lxc/lxc_container.h index 94a6c5309c..cf61a033fc 100644 --- a/src/lxc/lxc_container.h +++ b/src/lxc/lxc_container.h @@ -54,7 +54,8 @@ int lxcContainerStart(virDomainDefPtr def, int handshakefd, int *nsInheritFDs, size_t nttyPaths, - char **ttyPaths); + char **ttyPaths, + int restorefd); int lxcContainerSetupHostdevCapsMakePath(const char *dev); diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 8f166a436a..5bd0712ba9 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -139,6 +139,8 @@ struct _virLXCController { virCgroupPtr cgroup; virLXCFusePtr fuse; + + int restore; }; #include "lxc_controller_dispatch.h" @@ -1006,6 +1008,54 @@ static int lxcControllerClearCapabilities(void) return 0; } + +static int lxcControllerFindRestoredPid(int fd) +{ + int initpid = 0; + int 
ret = -1; + g_autofree char *checkpointdir = NULL; + g_autofree char *pidfile = NULL; + g_autofree char *checkpointfd = NULL; + int pidfilefd; + char c; + + if (fd < 0) + goto cleanup; + + checkpointfd = g_strdup_printf("/proc/self/fd/%d", fd); + + if (virFileResolveLink(checkpointfd, &checkpointdir) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to readlink checkpoint dir path")); + goto cleanup; + } + + pidfile = g_strdup_printf("%s/restore.pid", checkpointdir); + + if ((pidfilefd = virFileOpenAs(pidfile, O_RDONLY, 0, -1, -1, 0)) < 0) { + virReportSystemError(pidfilefd, + _("Failed to open domain's pidfile '%s'"), + pidfile); + goto cleanup; + } + + while ((saferead(pidfilefd, &c, 1) == 1) && c != EOF) + initpid = initpid*10 + c - '0'; + + ret = initpid; + + if (virFileRemove(pidfile, -1, -1) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to delete pidfile path")); + } + + cleanup: + VIR_FORCE_CLOSE(fd); + VIR_FORCE_CLOSE(pidfilefd); + return ret; +} + + static bool wantReboot; static virMutex lock = VIR_MUTEX_INITIALIZER; @@ -2310,6 +2360,7 @@ virLXCControllerRun(virLXCControllerPtr ctrl) int rc = -1; int control[2] = { -1, -1}; int containerhandshake[2] = { -1, -1 }; + bool restore_mode = (ctrl->restore != -1); char **containerTTYPaths = g_new0(char *, ctrl->nconsoles); size_t i; @@ -2368,7 +2419,8 @@ virLXCControllerRun(virLXCControllerPtr ctrl) containerhandshake[1], ctrl->nsFDs, ctrl->nconsoles, - containerTTYPaths)) < 0) + containerTTYPaths, + ctrl->restore)) < 0) goto cleanup; VIR_FORCE_CLOSE(control[1]); VIR_FORCE_CLOSE(containerhandshake[1]); @@ -2380,10 +2432,10 @@ virLXCControllerRun(virLXCControllerPtr ctrl) for (i = 0; i < VIR_LXC_DOMAIN_NAMESPACE_LAST; i++) VIR_FORCE_CLOSE(ctrl->nsFDs[i]); - if (virLXCControllerSetupCgroupLimits(ctrl) < 0) + if (!restore_mode && virLXCControllerSetupCgroupLimits(ctrl) < 0) goto cleanup; - if (virLXCControllerSetupUserns(ctrl) < 0) + if (!restore_mode && 
virLXCControllerSetupUserns(ctrl) < 0) goto cleanup; if (virLXCControllerMoveInterfaces(ctrl) < 0) @@ -2408,6 +2460,26 @@ virLXCControllerRun(virLXCControllerPtr ctrl) if (lxcControllerClearCapabilities() < 0) goto cleanup; + if (restore_mode) { + int status; + int ret = waitpid(-1, &status, 0); + int initpid; + + VIR_DEBUG("Got sig child %d", ret); + + /* We have two basic cases here. + * - CRIU died because of restore error and we do not have a running container + * - CRIU detached itself from the running container + */ + if ((initpid = lxcControllerFindRestoredPid(ctrl->restore)) < 0) { + virReportSystemError(errno, "%s", + _("Unable to get restored task pid")); + virNetDaemonQuit(ctrl->daemon); + goto cleanup; + } + ctrl->initpid = initpid; + } + for (i = 0; i < ctrl->nconsoles; i++) if (virLXCControllerConsoleSetNonblocking(&(ctrl->consoles[i])) < 0) goto cleanup; @@ -2450,6 +2522,7 @@ int main(int argc, char *argv[]) char **veths = NULL; int ns_fd[VIR_LXC_DOMAIN_NAMESPACE_LAST]; int handshakeFd = -1; + int restore = -1; bool bg = false; const struct option options[] = { { "background", 0, NULL, 'b' }, @@ -2462,6 +2535,7 @@ int main(int argc, char *argv[]) { "share-net", 1, NULL, 'N' }, { "share-ipc", 1, NULL, 'I' }, { "share-uts", 1, NULL, 'U' }, + { "restore", 1, NULL, 'r' }, { "help", 0, NULL, 'h' }, { 0, 0, 0, 0 }, }; @@ -2488,7 +2562,7 @@ int main(int argc, char *argv[]) while (1) { int c; - c = getopt_long(argc, argv, "dn:v:p:m:c:s:h:S:N:I:U:", + c = getopt_long(argc, argv, "dn:v:p:m:c:s:h:S:N:I:U:r:", options, NULL); if (c == -1) @@ -2560,6 +2634,14 @@ int main(int argc, char *argv[]) securityDriver = optarg; break; + case 'r': + if (virStrToLong_i(optarg, NULL, 10, &restore) < 0) { + fprintf(stderr, "malformed --restore argument '%s'", + optarg); + goto cleanup; + } + break; + case 'h': case '?': fprintf(stderr, "\n"); @@ -2576,6 +2658,7 @@ int main(int argc, char *argv[]) fprintf(stderr, " -N FD, --share-net FD\n"); fprintf(stderr, " -I FD, 
--share-ipc FD\n"); fprintf(stderr, " -U FD, --share-uts FD\n"); + fprintf(stderr, " -r FD, --restore FD\n"); fprintf(stderr, " -h, --help\n"); fprintf(stderr, "\n"); rc = 0; @@ -2628,6 +2711,8 @@ int main(int argc, char *argv[]) ctrl->passFDs = passFDs; ctrl->npassFDs = npassFDs; + ctrl->restore = restore; + for (i = 0; i < VIR_LXC_DOMAIN_NAMESPACE_LAST; i++) { if (ns_fd[i] != -1) { if (!ctrl->nsFDs) {/*allocate only once */ diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index 4416acf923..44306685f5 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -44,6 +44,7 @@ #include "lxc_driver.h" #include "lxc_native.h" #include "lxc_process.h" +#include "lxc_criu.h" #include "virnetdevbridge.h" #include "virnetdevveth.h" #include "virnetdevopenvswitch.h" @@ -1012,7 +1013,7 @@ static int lxcDomainCreateWithFiles(virDomainPtr dom, ret = virLXCProcessStart(dom->conn, driver, vm, nfiles, files, (flags & VIR_DOMAIN_START_AUTODESTROY), - VIR_DOMAIN_RUNNING_BOOTED); + -1, VIR_DOMAIN_RUNNING_BOOTED); if (ret == 0) { event = virDomainEventLifecycleNewFromObj(vm, @@ -1135,7 +1136,7 @@ lxcDomainCreateXMLWithFiles(virConnectPtr conn, if (virLXCProcessStart(conn, driver, vm, nfiles, files, (flags & VIR_DOMAIN_START_AUTODESTROY), - VIR_DOMAIN_RUNNING_BOOTED) < 0) { + -1, VIR_DOMAIN_RUNNING_BOOTED) < 0) { virDomainAuditStart(vm, "booted", false); virLXCDomainObjEndJob(driver, vm); if (!vm->persistent) @@ -2731,6 +2732,338 @@ static int lxcDomainResume(virDomainPtr dom) return ret; } + +static int +lxcDoDomainSave(virLXCDriverPtr driver, virDomainObjPtr vm, + const char *to) +{ + virCapsPtr caps = NULL; + virLXCSaveHeader hdr; + virLXCDriverConfigPtr cfg = virLXCDriverGetConfig(driver); + g_autofree char *checkpointdir = NULL; + g_autofree char *criufile = NULL; + g_autofree char *xml = NULL; + uint32_t xml_len; + int compressed = 0; + int fd = -1; + int criufd = -1; + int ret = -1; + ssize_t r; + + if (!(caps = virLXCDriverGetCapabilities(driver, false))) + return 
-1; + + checkpointdir = g_path_get_dirname(to); + + if (lxcCriuDump(vm, checkpointdir) != 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to checkpoint domain with CRIU")); + goto cleanup; + } + + if ((compressed = lxcCriuCompress(checkpointdir, + cfg->saveImageFormat)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to compress criu files")); + goto cleanup; + } + + hdr.compressed = compressed; + + if ((fd = virFileOpenAs(to, O_CREAT | O_TRUNC | O_WRONLY, + S_IRUSR | S_IWUSR, -1, -1, 0)) < 0) { + virReportSystemError(-fd, + _("Failed to create domain save file '%s'"), to); + goto cleanup; + } + + if ((xml = virDomainDefFormat(vm->def, driver->xmlopt, 0)) == NULL) + goto cleanup; + + xml_len = strlen(xml) + 1; + + memset(&hdr, 0, sizeof(hdr)); + memcpy(hdr.magic, LXC_SAVE_MAGIC, sizeof(hdr.magic)); + hdr.version = LXC_SAVE_VERSION; + hdr.xmlLen = xml_len; + + if (safewrite(fd, &hdr, sizeof(hdr)) != sizeof(hdr)) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("Failed to write save file header")); + goto cleanup; + } + + if (safewrite(fd, xml, xml_len) != xml_len) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("Failed to write xml description")); + goto cleanup; + } + + criufile = g_strdup_printf("%s/criu.save", checkpointdir); + if ((criufd = virFileOpenAs(criufile, O_RDONLY, + 0, -1, -1, 0)) < 0) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("Failed to read criu file")); + goto cleanup; + } + + do { + char buf[1024]; + + if ((r = saferead(criufd, buf, sizeof(buf))) < 0) { + virReportSystemError(errno, + _("Unable to read from file '%s'"), + criufile); + goto cleanup; + } + + if (safewrite(fd, buf, r) < 0) { + virReportSystemError(errno, + _("Unable to write to file '%s'"), + to); + goto cleanup; + } + } while (r); + + if (virFileRemove(criufile, -1, -1) < 0) + VIR_WARN("failed to remove scratch file '%s'", + criufile); + + virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, + VIR_DOMAIN_SHUTOFF_SAVED); + + ret = 0; 
+ cleanup: + VIR_FORCE_CLOSE(fd); + VIR_FORCE_CLOSE(criufd); + virObjectUnref(caps); + virObjectUnref(cfg); + return ret; +} + +static int +lxcDomainSaveFlags(virDomainPtr dom, const char *to, const char *dxml, + unsigned int flags) +{ + int ret = -1; + virLXCDriverPtr driver = dom->conn->privateData; + virDomainObjPtr vm; + bool remove_dom = false; + + virCheckFlags(0, -1); + if (dxml) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("xml modification unsupported")); + return -1; + } + + if (!(vm = lxcDomObjFromDomain(dom))) + goto cleanup; + + if (virDomainSaveFlagsEnsureACL(dom->conn, vm->def) < 0) + goto cleanup; + + if (virLXCDomainObjBeginJob(driver, vm, LXC_JOB_MODIFY) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("Domain is not running")); + goto endjob; + } + + if (lxcDoDomainSave(driver, vm, to) < 0) + goto endjob; + + if (!vm->persistent) + remove_dom = true; + + ret = 0; + + endjob: + virLXCDomainObjEndJob(driver, vm); + + cleanup: + if (remove_dom && vm) + virDomainObjListRemove(driver->domains, vm); + virDomainObjEndAPI(&vm); + return ret; +} + +static int +lxcDomainSave(virDomainPtr dom, const char *to) +{ + return lxcDomainSaveFlags(dom, to, NULL, 0); +} + +static int +lxcDomainRestoreFlags(virConnectPtr conn, const char *from, + const char* dxml, unsigned int flags) +{ + virLXCDriverPtr driver = conn->privateData; + virLXCDriverConfigPtr cfg = NULL; + virLXCSaveHeader hdr; + virDomainObjPtr vm = NULL; + virDomainDefPtr def = NULL; + virCapsPtr caps = NULL; + int ret = -1; + int restorefd; + g_autofree char *xml = NULL; + g_autofree char *xml_image_path = NULL; + g_autofree char *criufile = NULL; + g_autofree char *savedir = NULL; + g_autofree char *checkpointdir = NULL; + ssize_t r; + int criufd; + int fd; + + virCheckFlags(0, -1); + if (dxml) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("xml modification unsupported")); + goto out; + } + + if ((fd = 
virFileOpenAs(from, O_RDONLY, 0, -1, -1, 0)) < 0) { + virReportSystemError(-fd, + _("Failed to open domain image file '%s'"), + xml_image_path); + goto out; + } + + if (saferead(fd, &hdr, sizeof(hdr)) != sizeof(hdr)) { + virReportError(VIR_ERR_OPERATION_FAILED, + "%s", _("failed to read lxc header")); + return -1; + } + + if (memcmp(hdr.magic, LXC_SAVE_MAGIC, sizeof(hdr.magic)) != 0) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("image magic is incorrect")); + return -1; + } + + if (hdr.version > LXC_SAVE_VERSION) { + virReportError(VIR_ERR_OPERATION_FAILED, + _("image version is not supported (%d > %d)"), + hdr.version, LXC_SAVE_VERSION); + return -1; + } + + cfg = virLXCDriverGetConfig(driver); + + xml = g_new0(char, hdr.xmlLen); + + if (saferead(fd, xml, hdr.xmlLen) != hdr.xmlLen) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", _("failed to read XML")); + goto cleanup; + } + + checkpointdir = g_path_get_dirname(from); + + criufile = g_strdup_printf("%s/criu.save", checkpointdir); + if ((criufd = virFileOpenAs(criufile, O_CREAT | O_TRUNC | O_WRONLY, + S_IRUSR | S_IWUSR, -1, -1, 0)) < 0) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("Failed to read criu file")); + goto cleanup; + } + + do { + char buf[1024]; + + if ((r = saferead(fd, buf, sizeof(buf))) < 0) { + virReportSystemError(errno, + _("Unable to read from file '%s'"), + criufile); + goto cleanup; + } + + if (safewrite(criufd, buf, r) < 0) { + virReportSystemError(errno, + _("Unable to write to file '%s'"), + from); + goto cleanup; + } + } while (r); + + if (lxcCriuDecompress(checkpointdir, + cfg->saveImageFormat) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to decompress criu files")); + goto cleanup; + } + + if (!xml) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("no domain XML parsed")); + goto cleanup; + } + + if (!(caps = virLXCDriverGetCapabilities(driver, false))) + goto cleanup; + + if (!(def = virDomainDefParseString(xml, driver->xmlopt, caps, + 
VIR_DOMAIN_DEF_PARSE_INACTIVE))) + goto cleanup; + + if (virDomainRestoreFlagsEnsureACL(conn, def) < 0) + goto cleanup; + + if (!(vm = virDomainObjListAdd(driver->domains, def, + driver->xmlopt, + VIR_DOMAIN_OBJ_LIST_ADD_LIVE | + VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE, + NULL))) + goto cleanup; + def = NULL; + + if (virLXCDomainObjBeginJob(driver, vm, LXC_JOB_MODIFY) < 0) { + if (!vm->persistent) + virDomainObjListRemove(driver->domains, vm); + goto cleanup; + } + + savedir = g_strdup_printf("%s/save/", checkpointdir); + + restorefd = open(savedir, O_DIRECTORY); + if (restorefd < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("Can't open images dir")); + if (!vm->persistent) + virDomainObjListRemove(driver->domains, vm); + virLXCDomainObjEndJob(driver, vm); + goto cleanup; + } + + ret = virLXCProcessStart(conn, driver, vm, + 0, NULL, + 0, restorefd, + VIR_DOMAIN_RUNNING_RESTORED); + + VIR_FORCE_CLOSE(restorefd); + + if (ret < 0 && !vm->persistent) + virDomainObjListRemove(driver->domains, vm); + + virLXCDomainObjEndJob(driver, vm); + cleanup: + VIR_FORCE_CLOSE(fd); + VIR_FORCE_CLOSE(criufd); + out: + virDomainDefFree(def); + virDomainObjEndAPI(&vm); + virObjectUnref(cfg); + return ret; +} + +static int +lxcDomainRestore(virConnectPtr conn, const char *from) +{ + return lxcDomainRestoreFlags(conn, from, NULL, 0); +} + + static int lxcDomainOpenConsole(virDomainPtr dom, const char *dev_name, @@ -5088,6 +5421,10 @@ static virHypervisorDriver lxcHypervisorDriver = { .domainLookupByName = lxcDomainLookupByName, /* 0.4.2 */ .domainSuspend = lxcDomainSuspend, /* 0.7.2 */ .domainResume = lxcDomainResume, /* 0.7.2 */ + .domainSave = lxcDomainSave, /* x.x.x */ + .domainSaveFlags = lxcDomainSaveFlags, /* x.x.x */ + .domainRestore = lxcDomainRestore, /* x.x.x */ + .domainRestoreFlags = lxcDomainRestoreFlags, /* x.x.x */ .domainDestroy = lxcDomainDestroy, /* 0.4.4 */ .domainDestroyFlags = lxcDomainDestroyFlags, /* 0.9.4 */ .domainGetOSType = lxcDomainGetOSType, /* 0.4.2 */ 
diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index cbc04a3dcd..8dc8c42558 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -117,8 +117,8 @@ virLXCProcessReboot(virLXCDriverPtr driver, vm->newDef = NULL; virLXCProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_SHUTDOWN); vm->newDef = savedDef; - if (virLXCProcessStart(conn, driver, vm, - 0, NULL, autodestroy, reason) < 0) { + if (virLXCProcessStart(conn, driver, vm, 0, + NULL, autodestroy, -1, reason) < 0) { VIR_WARN("Unable to handle reboot of vm %s", vm->def->name); goto cleanup; @@ -942,7 +942,8 @@ virLXCProcessBuildControllerCmd(virLXCDriverPtr driver, size_t nfiles, int handshakefd, int * const logfd, - const char *pidfile) + const char *pidfile, + int restorefd) { size_t i; g_autofree char *filterstr = NULL; @@ -1016,6 +1017,12 @@ virLXCProcessBuildControllerCmd(virLXCDriverPtr driver, for (i = 0; veths && veths[i]; i++) virCommandAddArgList(cmd, "--veth", veths[i], NULL); + if (restorefd != -1) { + virCommandAddArg(cmd, "--restore"); + virCommandAddArgFormat(cmd, "%d", restorefd); + virCommandPassFD(cmd, restorefd, 0); + } + virCommandPassFD(cmd, handshakefd, 0); virCommandDaemonize(cmd); virCommandSetPidFile(cmd, pidfile); @@ -1186,6 +1193,8 @@ virLXCProcessEnsureRootFS(virDomainObjPtr vm) * @driver: pointer to driver structure * @vm: pointer to virtual machine structure * @autoDestroy: mark the domain for auto destruction + * @restorefd: file descriptor pointing to the restore directory (-1 if not + * restoring) * @reason: reason for switching vm to running state * * Starts a vm @@ -1197,6 +1206,7 @@ int virLXCProcessStart(virConnectPtr conn, virDomainObjPtr vm, unsigned int nfiles, int *files, bool autoDestroy, + int restorefd, virDomainRunningReason reason) { int rc = -1, r; @@ -1388,7 +1398,8 @@ int virLXCProcessStart(virConnectPtr conn, files, nfiles, handshakefds[1], &logfd, - pidfile))) + pidfile, + restorefd))) goto cleanup; /* now that we know it is about to start call the 
hook if present */ @@ -1491,6 +1502,9 @@ int virLXCProcessStart(virConnectPtr conn, if (!priv->machineName) goto cleanup; + if (restorefd != -1) + goto skip_cgroup_checks; + /* We know the cgroup must exist by this synchronization * point so lets detect that first, since it gives us a * more reliable way to kill everything off if something @@ -1507,6 +1521,8 @@ int virLXCProcessStart(virConnectPtr conn, goto cleanup; } + skip_cgroup_checks: + /* And we can get the first monitor connection now too */ if (!(priv->monitor = virLXCProcessConnectMonitor(driver, vm))) { /* Intentionally overwrite the real monitor error message, @@ -1587,7 +1603,7 @@ virLXCProcessAutostartDomain(virDomainObjPtr vm, if (vm->autostart && !virDomainObjIsActive(vm)) { ret = virLXCProcessStart(data->conn, data->driver, vm, - 0, NULL, false, + 0, NULL, false, -1, VIR_DOMAIN_RUNNING_BOOTED); virDomainAuditStart(vm, "booted", ret >= 0); if (ret < 0) { diff --git a/src/lxc/lxc_process.h b/src/lxc/lxc_process.h index 383f6f714d..dab300784f 100644 --- a/src/lxc/lxc_process.h +++ b/src/lxc/lxc_process.h @@ -28,6 +28,7 @@ int virLXCProcessStart(virConnectPtr conn, virDomainObjPtr vm, unsigned int nfiles, int *files, bool autoDestroy, + int restorefd, virDomainRunningReason reason); int virLXCProcessStop(virLXCDriverPtr driver, virDomainObjPtr vm, -- 2.27.0
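For anyone following the save path in this patch, the layout it describes (a fixed header, then the domain XML, then the CRIU archive) can be sketched roughly as below. This is only an illustration under assumptions: `demo_save_header`, `DEMO_SAVE_MAGIC`, `DEMO_SAVE_VERSION`, `demo_save_pack()` and `demo_save_unpack()` are hypothetical names, and the real `virLXCSaveHeader`, `LXC_SAVE_MAGIC` and `LXC_SAVE_VERSION` are defined elsewhere in the series and may differ. One detail worth noting: in the hunk as posted, `hdr.compressed` is assigned before `memset(&hdr, 0, sizeof(hdr))`, which appears to clear it again, so the sketch fills in every field only after zeroing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the series' own magic, version and header. */
#define DEMO_SAVE_MAGIC   "DemoLXCSaveMagic"   /* 16 bytes, NUL not stored */
#define DEMO_SAVE_VERSION 1u

struct demo_save_header {
    char     magic[16];
    uint32_t version;
    uint32_t xml_len;      /* length of the XML, including its NUL */
    uint32_t compressed;   /* format id of the archive that follows */
};

/* Build an image in `out`: header, then XML, then payload.
 * Returns the number of bytes written, or 0 if `cap` is too small. */
size_t demo_save_pack(unsigned char *out, size_t cap,
                      const char *xml, uint32_t compressed,
                      const unsigned char *payload, size_t payload_len)
{
    struct demo_save_header hdr;
    size_t xml_len = strlen(xml) + 1;
    size_t total = sizeof(hdr) + xml_len + payload_len;

    if (total > cap)
        return 0;

    /* Zero first, then set every field, so nothing set earlier is lost. */
    memset(&hdr, 0, sizeof(hdr));
    memcpy(hdr.magic, DEMO_SAVE_MAGIC, sizeof(hdr.magic));
    hdr.version = DEMO_SAVE_VERSION;
    hdr.xml_len = (uint32_t)xml_len;
    hdr.compressed = compressed;

    memcpy(out, &hdr, sizeof(hdr));
    memcpy(out + sizeof(hdr), xml, xml_len);
    if (payload_len > 0)
        memcpy(out + sizeof(hdr) + xml_len, payload, payload_len);
    return total;
}

/* Validate the header; on success point *xml and *payload into `in`.
 * Returns 0 on success, -1 on a truncated, foreign or too-new image. */
int demo_save_unpack(const unsigned char *in, size_t len,
                     const char **xml, uint32_t *compressed,
                     const unsigned char **payload, size_t *payload_len)
{
    struct demo_save_header hdr;

    if (len < sizeof(hdr))
        return -1;
    memcpy(&hdr, in, sizeof(hdr));

    if (memcmp(hdr.magic, DEMO_SAVE_MAGIC, sizeof(hdr.magic)) != 0)
        return -1;
    if (hdr.version > DEMO_SAVE_VERSION)
        return -1;
    if (hdr.xml_len == 0 || hdr.xml_len > len - sizeof(hdr))
        return -1;
    /* The XML must be NUL-terminated within its declared length. */
    if (in[sizeof(hdr) + hdr.xml_len - 1] != '\0')
        return -1;

    *xml = (const char *)(in + sizeof(hdr));
    *compressed = hdr.compressed;
    *payload = in + sizeof(hdr) + hdr.xml_len;
    *payload_len = len - sizeof(hdr) - hdr.xml_len;
    return 0;
}
```

Packing `"<domain/>"` with a four-byte payload and unpacking the result should return the same XML, format id and payload; flipping a byte of the magic should make `demo_save_unpack()` fail. Bounding `xml_len` against the image size before using it is the part a restore path probably wants regardless of how the header ends up being defined.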

Hi guys,
I marked this series as RFC to discuss some points. I'm interested in enhancing this specific part of LXC. So, here are some questions on which I would like feedback from the community:
1. I decided to use tar to pack all the CRIU img files into a single file. Any other suggestions?
2. If the answer to the question above is no, is there a consensus on preferring command line calls or libraries? I would like to use libtar, for instance. I personally think that this approach is ugly. Not sure if I'm able to do that. The same goes for CRIU.
3. Any other important opinions, obviously.
--
Julio Cesar Faracco

On Sat, Feb 27, 2021 at 01:14:29AM -0300, Julio Faracco wrote:
> Hi guys,
Hi and sorry for not replying earlier.
> I marked this series as RFC to discuss some points. I'm interested in enhancing this specific part of LXC. So, here are some questions on which I would like feedback from the community:
> 1. I decided to use tar to pack all the CRIU img files into a single file. Any other suggestions?
> 2. If the answer to the question above is no, is there a consensus on preferring command line calls or libraries? I would like to use libtar, for instance. I personally think that this approach is ugly. Not sure if I'm able to do that. The same goes for CRIU.
I remember that for CRIU, back when we were trying to do that, the issue was that the commands were not atomic, did not properly report error messages, and maybe something more along those lines. Either there was no library interface or it was not MT-safe; basically, there were a couple of issues like that which we were not able to deal with.
I do not really remember all the details. Maybe Michal does, as I think he suggested the idea back then. I Cc'd him. In the worst-case scenario we will need to figure this all out again ;)

On 4/1/21 12:01 AM, Martin Kletzander wrote:
> On Sat, Feb 27, 2021 at 01:14:29AM -0300, Julio Faracco wrote:
>> Hi guys,
> Hi and sorry for not replying earlier.
Yeah, sorry. I have this marked for review and yet still haven't done so.
>> I marked this series as RFC to discuss some points. I'm interested in enhancing this specific part of LXC. So, here are some questions on which I would like feedback from the community:
>> 1. I decided to use tar to pack all the CRIU img files into a single file. Any other suggestions?
>> 2. If the answer to the question above is no, is there a consensus on preferring command line calls or libraries? I would like to use libtar, for instance. I personally think that this approach is ugly. Not sure if I'm able to do that. The same goes for CRIU.
> I remember that for CRIU, back when we were trying to do that, the issue was that the commands were not atomic, did not properly report error messages, and maybe something more along those lines. Either there was no library interface or it was not MT-safe; basically, there were a couple of issues like that which we were not able to deal with.
> I do not really remember all the details. Maybe Michal does, as I think he suggested the idea back then. I Cc'd him. In the worst-case scenario we will need to figure this all out again ;)
IIRC the main problem was that we wanted CRIU to be able to send its data over a TCP connection. Back then, when a GSoC student was looking at this, CRIU was only able to store data into a file (or even multiple files in a directory?) and wasn't able to create a server/client connection. Maybe this has changed since then? If not, then we can use tar, sure. And to transfer the data we can use so-called tunnelled migration, where the migration stream is sent over the libvirt connection rather than directly to the other side (because then we would have to have nc or similar involved).
https://libvirt.org/migration.html#transporttunnel
Another issue was that it couldn't handle all namespaces (but I'm not certain - it was 5 years ago).
But let me find some time and review the patches.
Michal
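The point about sending the checkpoint over a connection instead of a file can be illustrated with a plain descriptor-to-descriptor copy: once the CRIU images are packed into one archive, the same loop works whether the destination is a save file, a pipe or a socket. This is a minimal sketch in plain POSIX, not libvirt code; `demo_write_all()` and `demo_stream_copy()` are hypothetical names, and inside libvirt the existing `saferead()`/`safewrite()` helpers used by the patch fill this role.

```c
#include <errno.h>
#include <unistd.h>

/* Write all `len` bytes, retrying on short writes and EINTR. */
static int demo_write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Copy everything from `in` to `out` until EOF on `in`.
 * Returns the number of bytes copied, or -1 on error. */
long long demo_stream_copy(int in, int out)
{
    char buf[4096];
    long long total = 0;

    for (;;) {
        ssize_t r = read(in, buf, sizeof(buf));
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            return total;          /* EOF: report how much was moved */
        if (demo_write_all(out, buf, (size_t)r) < 0)
            return -1;
        total += r;
    }
}
```

Feeding a few bytes through one pipe and reading them back from a second pipe is enough to exercise it. Whether the stream would actually travel through the tunnelled migration path is a design question for the series; the sketch only shows that the copy step itself does not care what kind of descriptor it is given.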

Hi Michal and Martin,
Thanks for your reply. Just an explanation: I'm not directly interested in developing this specific feature. If there is a GSoC student assigned to this, excellent. I'm interested in developing snapshot and container migration, which unfortunately require this feature. Unless you have another opinion.
--
Julio Faracco

On Thu, Apr 01, 2021 at 10:16:36AM -0300, Julio Faracco wrote:
> Hi Michal and Martin,
> Thanks for your reply. Just an explanation: I'm not directly interested in developing this specific feature. If there is a GSoC student assigned to this, excellent. I'm interested in developing snapshot and container migration, which unfortunately require this feature. Unless you have another opinion.
That student was working on it, but that was 5 years ago as Michal said. I'm afraid that it's either you or nobody who does that =)