[libvirt] [PATCH RFC 0/5] lxc: Add save/restore support to lxc driver

This patch series adds support for saving a running lxc domain's state to files with lxcDomainSave and restoring it afterwards from those files with lxcDomainRestore.

Usage: virsh save [domain-name] [domain-id or domain-uuid] [directory name]

I use the CRIU tool (https://criu.org/Main_Page), which offers checkpoint/restore functionality for containers in userspace. So far, I have successfully tried the C/R procedure for simple sh containers and OS containers.

I'll mention some notes/issues here:
*C/R works only on non-systemd hosts (on systemd hosts I was facing problems with CRIU).
*I have not done anything for container networking. That should be done with the --veth-pair IN=OUT option in CRIU.
*In new distros, where the efivars mountpoint exists, CRIU dump fails.
*The only tty restored is /dev/tty1. I'll fix this in another patch, to allow more ttys.
*Currently, for things to work, I have slightly modified the criu source. That is, on criu master I have the following diff:

diff --git a/criu/tty.c b/criu/tty.c
index 302dd54..2226484 100644
--- a/criu/tty.c
+++ b/criu/tty.c
@@ -1394,8 +1394,10 @@ static int verify_info(struct tty_info *info)
 	 */
 	if (term_opts_missing_any(info)) {
 		if (tty_is_master(info)) {
+			/*
 			pr_err("Corrupted master peer %x\n", info->tfe->id);
 			return -1;
+			*/
 		} else if (!term_opts_missing_all(info)) {
 			pr_err("Corrupted slave peer %x\n", info->tfe->id);
 			return -1;

Lastly, I have a patch ready that adds support for migration, but I am waiting for feedback on this series and will send the migration one later. Anyway, any comments here are more than welcome.

ps: This work is part of my GSoC 2016 project.

Katerina Koukiou (5):
  Include criu support in autotools
  lxc: make container's init process session leader
  lxc: adds checkpoint and restore helper functions
  lxc: adjusted libvirt-lxc process to add restore mode to it.
  lxc: adds save and restore support

 configure.ac             |   8 ++
 po/POTFILES.in           |   1 +
 src/Makefile.am          |   6 +-
 src/lxc/lxc_container.c  | 208 ++++++++++++++++++++++++++++++++++--
 src/lxc/lxc_container.h  |   3 +-
 src/lxc/lxc_controller.c | 109 +++++++++++++++++--
 src/lxc/lxc_criu.c       | 273 +++++++++++++++++++++++++++++++++++++++++++++++
 src/lxc/lxc_criu.h       |  34 ++++++
 src/lxc/lxc_driver.c     | 238 ++++++++++++++++++++++++++++++++++++++++-
 src/lxc/lxc_process.c    |  23 +++-
 src/lxc/lxc_process.h    |   1 +
 11 files changed, 883 insertions(+), 21 deletions(-)
 create mode 100644 src/lxc/lxc_criu.c
 create mode 100644 src/lxc/lxc_criu.h

--
2.7.3

Check for CRIU binary in autotools. This binary is needed for checkpointing/restoring linux containers.

Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com>
---
 configure.ac | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/configure.ac b/configure.ac
index 2c81c95..d061676 100644
--- a/configure.ac
+++ b/configure.ac
@@ -242,6 +242,7 @@ LIBVIRT_CHECK_AUDIT
 LIBVIRT_CHECK_AVAHI
 LIBVIRT_CHECK_BLKID
 LIBVIRT_CHECK_CAPNG
+LIBVIRT_CHECK_CRIU
 LIBVIRT_CHECK_CURL
 LIBVIRT_CHECK_DBUS
 LIBVIRT_CHECK_FUSE
@@ -425,6 +426,13 @@ AC_PATH_PROG([XSLTPROC], [xsltproc], [/usr/bin/xsltproc])
 AC_PATH_PROG([AUGPARSE], [augparse], [/usr/bin/augparse])
 AC_PROG_MKDIR_P
 AC_PROG_LN_S
+AC_PATH_PROG([CRIU], [criu], [no],
+             [$PATH:/sbin:/usr/sbin:/usr/local/sbin])
+AM_CONDITIONAL([WITH_CRIU], [test "x$ac_cv_path_CRIU" != "xno"])
+if test "x$ac_cv_path_CRIU" != "xno"; then
+  AC_DEFINE_UNQUOTED([CRIU], ["$CRIU"],
+        [Location of criu program])
+fi
 
 dnl External programs that we can use if they are available.
 dnl We will hard-code paths to these programs unless we cannot
--
2.7.3

On Thu, Jul 21, 2016 at 03:37:23PM +0000, Katerina Koukiou wrote:
Check for CRIU binary in autotools. This binary is needed for checkpointing/restoring linux containers.
Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com>
---
 configure.ac | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/configure.ac b/configure.ac
index 2c81c95..d061676 100644
--- a/configure.ac
+++ b/configure.ac
@@ -242,6 +242,7 @@ LIBVIRT_CHECK_AUDIT
 LIBVIRT_CHECK_AVAHI
 LIBVIRT_CHECK_BLKID
 LIBVIRT_CHECK_CAPNG
+LIBVIRT_CHECK_CRIU
Nothing in your patch defines this.
 LIBVIRT_CHECK_CURL
 LIBVIRT_CHECK_DBUS
 LIBVIRT_CHECK_FUSE
@@ -425,6 +426,13 @@ AC_PATH_PROG([XSLTPROC], [xsltproc], [/usr/bin/xsltproc])
 AC_PATH_PROG([AUGPARSE], [augparse], [/usr/bin/augparse])
 AC_PROG_MKDIR_P
 AC_PROG_LN_S
+AC_PATH_PROG([CRIU], [criu], [no],
+             [$PATH:/sbin:/usr/sbin:/usr/local/sbin])
+AM_CONDITIONAL([WITH_CRIU], [test "x$ac_cv_path_CRIU" != "xno"])
+if test "x$ac_cv_path_CRIU" != "xno"; then
+  AC_DEFINE_UNQUOTED([CRIU], ["$CRIU"],
+        [Location of criu program])
+fi
Please put this in an m4/virt-criu.m4 file and call it from the main configure.ac script.

Regards,
Daniel
--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

This patch forces the container's init process to become a session leader, that is, its session ID is made the same as its process ID. That might seem unnecessary in general, but if we want to checkpoint a container with CRIU, which is needed for container migration, we must ensure that the SID of each process inside the container points to a process that lives in the same PID namespace as the container. Therefore, we force init to be the session leader.

Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com>
---
 src/lxc/lxc_container.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 916a37b..b857431 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -2245,6 +2245,14 @@ static int lxcContainerChild(void *data)
                              argv->npassFDs, argv->passFDs) < 0)
         goto cleanup;
 
+    /* Make init process of the container the leader of the new session.
+     * That is needed when checkpointing container.
+     */
+    if (setsid() < 0) {
+        virReportSystemError(errno, "%s",
+                             _("Unable to become session leader"));
+    }
+
     ret = 0;
  cleanup:
     VIR_FREE(ttyPath);
--
2.7.3

On Thu, Jul 21, 2016 at 03:37:24PM +0000, Katerina Koukiou wrote:
This patch forces the container's init process to become a session leader, that is, its session ID is made the same as its process ID. That might seem unnecessary in general, but if we want to checkpoint a container with CRIU, which is needed for container migration, we must ensure that the SID of each process inside the container points to a process that lives in the same PID namespace as the container. Therefore, we force init to be the session leader.
Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com>
---
 src/lxc/lxc_container.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 916a37b..b857431 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -2245,6 +2245,14 @@ static int lxcContainerChild(void *data)
                              argv->npassFDs, argv->passFDs) < 0)
         goto cleanup;

+    /* Make init process of the container the leader of the new session.
+     * That is needed when checkpointing container.
+     */
+    if (setsid() < 0) {
+        virReportSystemError(errno, "%s",
+                             _("Unable to become session leader"));
+    }
Needs a goto cleanup, otherwise the error code gets set to 0
+
     ret = 0;
  cleanup:
     VIR_FREE(ttyPath);
This is a clear bugfix that we need even ignoring the CRIU requirements, so I've pushed this with the fix mentioned. For example, it fixes running a shell as pid 1, which previously reported:

  sh: cannot set terminal process group (-1): Inappropriate ioctl for device
  sh: no job control in this shell

Regards,
Daniel
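
The missing jump pointed out in the review can be shown in isolation. What follows is a minimal standalone sketch, not the libvirt code: become_session_leader() and run_in_child() are hypothetical names, and fprintf() stands in for virReportSystemError(). It keeps the same ret/cleanup layout as lxcContainerChild() so that a setsid() failure is propagated instead of being overwritten by "ret = 0".

```c
#define _XOPEN_SOURCE 700   /* for getsid() under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the tail of lxcContainerChild():
 * returns 0 on success and -1 on failure. */
static int become_session_leader(void)
{
    int ret = -1;

    if (setsid() < 0) {
        fprintf(stderr, "Unable to become session leader: %s\n",
                strerror(errno));
        goto cleanup;   /* without this jump, ret would become 0 below */
    }

    /* A session leader's session ID equals its own process ID. */
    assert(getsid(0) == getpid());

    ret = 0;
 cleanup:
    return ret;
}

/* Runs the helper in a forked child, because setsid() fails with EPERM
 * when the caller is already a process group leader.
 * Returns the child's exit status (0 on success, 1 on failure),
 * or -1 if fork()/waitpid() failed. */
static int run_in_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(become_session_leader() == 0 ? 0 : 1);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_in_child() from a normal shell-started program should return 0. Calling become_session_leader() directly from a process group leader is expected to fail, and with the goto in place that failure reaches the caller.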

This patch adds some helper functions for checkpointing/restoring linux containers. We use CRIU binary. Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com> --- po/POTFILES.in | 1 + src/Makefile.am | 3 +- src/lxc/lxc_criu.c | 273 +++++++++++++++++++++++++++++++++++++++++++++++++++++ src/lxc/lxc_criu.h | 34 +++++++ 4 files changed, 310 insertions(+), 1 deletion(-) create mode 100644 src/lxc/lxc_criu.c create mode 100644 src/lxc/lxc_criu.h diff --git a/po/POTFILES.in b/po/POTFILES.in index a6b6c9c..718b11d 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -95,6 +95,7 @@ src/lxc/lxc_cgroup.c src/lxc/lxc_conf.c src/lxc/lxc_container.c src/lxc/lxc_controller.c +src/lxc/lxc_criu.c src/lxc/lxc_domain.c src/lxc/lxc_driver.c src/lxc/lxc_fuse.c diff --git a/src/Makefile.am b/src/Makefile.am index 78c493c..64a7680 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -750,7 +750,8 @@ LXC_DRIVER_SOURCES = \ lxc/lxc_process.c lxc/lxc_process.h \ lxc/lxc_fuse.c lxc/lxc_fuse.h \ lxc/lxc_native.c lxc/lxc_native.h \ - lxc/lxc_driver.c lxc/lxc_driver.h + lxc/lxc_driver.c lxc/lxc_driver.h \ + lxc/lxc_criu.c lxc/lxc_criu.h LXC_CONTROLLER_SOURCES = \ $(LXC_MONITOR_PROTOCOL_GENERATED) \ diff --git a/src/lxc/lxc_criu.c b/src/lxc/lxc_criu.c new file mode 100644 index 0000000..6944223 --- /dev/null +++ b/src/lxc/lxc_criu.c @@ -0,0 +1,273 @@ +/* + * lxc_criu.c: wrapper functions for CRIU C API to be used for lxc migration + * + * Copyright (C) 2016 Katerina Koukiou + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Author: Katerina Koukiou <k.koukiou@gmail.com> + */ + +#include <config.h> + +#include <fcntl.h> +#include <sys/stat.h> +#include <sys/mount.h> + +#include "virobject.h" +#include "virerror.h" +#include "virlog.h" +#include "virfile.h" +#include "vircommand.h" +#include "virstring.h" +#include "viralloc.h" + +#include "lxc_domain.h" +#include "lxc_driver.h" +#include "lxc_criu.h" + +#define VIR_FROM_THIS VIR_FROM_LXC + +VIR_LOG_INIT("lxc.lxc_criu"); + +#ifdef CRIU +int lxcCriuDump(virLXCDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm, + const char *checkpointdir) +{ + int fd; + int ret = -1; + virLXCDomainObjPrivatePtr priv; + virCommandPtr cmd; + struct stat sb; + char *path = NULL; + char *tty_info_path = NULL; + char *ttyinfo = NULL; + int status; + + if (virFileMakePath(checkpointdir) < 0) { + virReportSystemError(errno, + _("Failed to mkdir %s"), checkpointdir); + return -1; + } + + fd = open(checkpointdir, O_DIRECTORY); + if (fd < 0) { + virReportSystemError(errno, + _("Failed to open directory %s"), checkpointdir); + return -1; + } + + cmd = virCommandNew(CRIU); + virCommandAddArg(cmd, "dump"); + + virCommandAddArgList(cmd, "--images-dir", checkpointdir, NULL); + + virCommandAddArgList(cmd, "--log-file", "dump.log", NULL); + + virCommandAddArgList(cmd, "-vvvv", NULL); + + priv = vm->privateData; + virCommandAddArg(cmd, "--tree"); + virCommandAddArgFormat(cmd, "%d", priv->initpid); + + virCommandAddArgList(cmd, "--tcp-established", "--file-locks", + "--link-remap", "--force-irmap", NULL); + + virCommandAddArgList(cmd, "--manage-cgroup", NULL); + + virCommandAddArgList(cmd, "--enable-external-sharing", + "--enable-external-masters", NULL); + + virCommandAddArgList(cmd, "--enable-fs", "hugetlbfs", + "--enable-fs", "tracefs", NULL); + + /* Add support for FUSE */ + virCommandAddArgList(cmd, 
"--ext-mount-map", "/proc/meminfo:fuse", NULL); + virCommandAddArgList(cmd, "--ghost-limit", "10000000", NULL); + + virCommandAddArgList(cmd, "--ext-mount-map", "/dev/console:console", NULL); + virCommandAddArgList(cmd, "--ext-mount-map", "/dev/tty1:tty1", NULL); + virCommandAddArgList(cmd, "--ext-mount-map", "auto", NULL); + + /* The master pair of the /dev/pts device lives outside from what is dumped + * inside the libvirt-lxc process. Add the slave pair as an external tty + * otherwise criu will fail. + */ + if (virAsprintf(&path, "/proc/%d/root/dev/pts/0", priv->initpid) < 0) + goto cleanup; + + if (stat(path, &sb) < 0) { + virReportSystemError(errno, + _("Unable to stat %s"), path); + goto cleanup; + } + + if (virAsprintf(&tty_info_path, "%s/tty.info", checkpointdir) < 0) + goto cleanup; + + if (virAsprintf(&ttyinfo, "tty[%x:%x]", + (unsigned int)sb.st_rdev, (unsigned int)sb.st_dev) < 0) + goto cleanup; + + if (virFileWriteStr(tty_info_path, ttyinfo, 0666) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Failed to write tty info to %s"), tty_info_path); + goto cleanup; + } + + VIR_DEBUG("tty.info: tty[%x:%x]", + (unsigned int)sb.st_dev, (unsigned int)sb.st_rdev); + virCommandAddArg(cmd, "--external"); + virCommandAddArgFormat(cmd, "tty[%x:%x]", + (unsigned int)sb.st_rdev, (unsigned int)sb.st_dev); + + VIR_DEBUG("About to checkpoint domain %s (pid = %d)", + vm->def->name, priv->initpid); + virCommandRawStatus(cmd); + if (virCommandRun(cmd, &status) < 0) + goto cleanup; + + ret = 0; + + cleanup: + VIR_FORCE_CLOSE(fd); + VIR_FREE(path); + VIR_FREE(tty_info_path); + VIR_FREE(ttyinfo); + + if (ret < 0) + return ret; + return status; +} + +int lxcCriuRestore(virDomainDefPtr def, int restorefd, + int ttyfd) +{ + int ret = -1; + virCommandPtr cmd; + char *ttyinfo = NULL; + char *inheritfd = NULL; + char *tty_info_path = NULL; + char *checkpointfd = NULL; + char *checkpointdir = NULL; + char *rootfs_mount = NULL; + + cmd = virCommandNew(CRIU); + 
virCommandAddArg(cmd, "restore"); + + if (virAsprintf(&checkpointfd, "/proc/self/fd/%d", restorefd) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to write checkpoint dir path")); + goto cleanup; + } + + if (virFileResolveLink(checkpointfd, &checkpointdir) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to readlink checkpoint dir path")); + goto cleanup; + } + + /* CRIU needs the container's root bind mounted so that it is the root of + * some mount. + */ + if (virAsprintf(&rootfs_mount, "/tmp/%s", def->name) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to write rootfs dir mount path")); + goto cleanup; + } + + virCommandAddArgList(cmd, "--images-dir", checkpointdir, NULL); + + virCommandAddArgList(cmd, "--log-file", "restore.log", NULL); + + virCommandAddArgList(cmd, "--pidfile", "pidfile", NULL); + + virCommandAddArgList(cmd, "-vvvv", NULL); + virCommandAddArgList(cmd, "--tcp-established", "--file-locks", + "--link-remap", "--force-irmap", NULL); + + virCommandAddArgList(cmd, "--enable-external-sharing", + "--enable-external-masters", NULL); + + virCommandAddArgList(cmd, "--ext-mount-map", "auto", NULL); + + virCommandAddArgList(cmd, "--enable-fs", "hugetlbfs", + "--enable-fs", "tracefs", NULL); + + virCommandAddArgList(cmd, "--ext-mount-map", "fuse:/proc/meminfo", NULL); + + virCommandAddArgList(cmd, "--ext-mount-map", "console:/dev/console", NULL); + virCommandAddArgList(cmd, "--ext-mount-map", "tty1:/dev/tty1", NULL); + + /* Restore cgroup properties if only cgroup has been created by criu, + * otherwise do not restore properies + */ + virCommandAddArgList(cmd, "--manage-cgroup", "soft", NULL); + + virCommandAddArgList(cmd, "--restore-detached", "--restore-sibling", NULL); + + /* Restore external tty that was saved in tty.info file + */ + if (virAsprintf(&tty_info_path, "%s/tty.info", checkpointdir) < 0) + goto cleanup; + + if (virFileReadAll(tty_info_path, 1024, &ttyinfo) < 0) { + 
virReportError(VIR_ERR_INTERNAL_ERROR, + _("Failed to read tty info from %s"), tty_info_path); + goto cleanup; + } + if (virAsprintf(&inheritfd, "fd[%d]:%s", ttyfd, ttyinfo) < 0) + goto cleanup; + + virCommandAddArgList(cmd, "--inherit-fd", inheritfd, NULL); + + /* Change the root filesystem because we run in mount namespace. + */ + virCommandAddArgList(cmd, "--root", rootfs_mount, NULL); + + /* If virCommandExec returns here we have an error */ + ignore_value(virCommandExec(cmd)); + + ret = -1; + + cleanup: + VIR_FREE(tty_info_path); + VIR_FREE(ttyinfo); + VIR_FREE(inheritfd); + VIR_FREE(checkpointdir); + VIR_FREE(checkpointfd); + VIR_FREE(rootfs_mount); + VIR_FREE(cmd); + + return ret; +} +#else +int lxcCriuDump(virLXCDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + const char *checkpointdir ATTRIBUTE_UNUSED) +{ + virReportUnsupportedError(); + return -1; +} + +int lxcCriuRestore(virDomainDefPtr def ATTRIBUTE_UNUSED, + int fd ATTRIBUTE_UNUSED, + int ttyfd ATTRIBUTE_UNUSED) +{ + virReportUnsupportedError(); + return -1; +} +#endif diff --git a/src/lxc/lxc_criu.h b/src/lxc/lxc_criu.h new file mode 100644 index 0000000..757580c --- /dev/null +++ b/src/lxc/lxc_criu.h @@ -0,0 +1,34 @@ +/* + * lxc_criu.h: CRIU C API methods wrapper + * + * Copyright (C) 2016 Katerina Koukiou + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. 
If not, see + * <http://www.gnu.org/licenses/>. + * + * Author: Katerina Koukiou <k.koukiou@gmail.com> + */ + +#ifndef LXC_CRIU_H +# define LXC_CRIU_H + +# include "virobject.h" + +int lxcCriuDump(virLXCDriverPtr driver, + virDomainObjPtr vm, + const char *checkpointdir); + +int lxcCriuRestore(virDomainDefPtr def, int fd, + int ttyfd); +#endif /* LXC_CRIU_H */ -- 2.7.3

On Thu, Jul 21, 2016 at 03:37:25PM +0000, Katerina Koukiou wrote:
This patch adds some helper functions for checkpointing/restoring linux containers. We use CRIU binary.
Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com> --- po/POTFILES.in | 1 + src/Makefile.am | 3 +- src/lxc/lxc_criu.c | 273 +++++++++++++++++++++++++++++++++++++++++++++++++++++ src/lxc/lxc_criu.h | 34 +++++++ 4 files changed, 310 insertions(+), 1 deletion(-) create mode 100644 src/lxc/lxc_criu.c create mode 100644 src/lxc/lxc_criu.h
diff --git a/po/POTFILES.in b/po/POTFILES.in index a6b6c9c..718b11d 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -95,6 +95,7 @@ src/lxc/lxc_cgroup.c src/lxc/lxc_conf.c src/lxc/lxc_container.c src/lxc/lxc_controller.c +src/lxc/lxc_criu.c src/lxc/lxc_domain.c src/lxc/lxc_driver.c src/lxc/lxc_fuse.c diff --git a/src/Makefile.am b/src/Makefile.am index 78c493c..64a7680 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -750,7 +750,8 @@ LXC_DRIVER_SOURCES = \ lxc/lxc_process.c lxc/lxc_process.h \ lxc/lxc_fuse.c lxc/lxc_fuse.h \ lxc/lxc_native.c lxc/lxc_native.h \ - lxc/lxc_driver.c lxc/lxc_driver.h + lxc/lxc_driver.c lxc/lxc_driver.h \ + lxc/lxc_criu.c lxc/lxc_criu.h
LXC_CONTROLLER_SOURCES = \ $(LXC_MONITOR_PROTOCOL_GENERATED) \ diff --git a/src/lxc/lxc_criu.c b/src/lxc/lxc_criu.c new file mode 100644 index 0000000..6944223 --- /dev/null +++ b/src/lxc/lxc_criu.c @@ -0,0 +1,273 @@ +/* + * lxc_criu.c: wrapper functions for CRIU C API to be used for lxc migration + * + * Copyright (C) 2016 Katerina Koukiou + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Author: Katerina Koukiou <k.koukiou@gmail.com> + */ + +#include <config.h> + +#include <fcntl.h> +#include <sys/stat.h> +#include <sys/mount.h> + +#include "virobject.h" +#include "virerror.h" +#include "virlog.h" +#include "virfile.h" +#include "vircommand.h" +#include "virstring.h" +#include "viralloc.h" + +#include "lxc_domain.h" +#include "lxc_driver.h" +#include "lxc_criu.h" + +#define VIR_FROM_THIS VIR_FROM_LXC + +VIR_LOG_INIT("lxc.lxc_criu"); + +#ifdef CRIU +int lxcCriuDump(virLXCDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm, + const char *checkpointdir)
For dumping a container we should be creating a single file containing all the data, not creating multiple files spread across a directory. Take a look at what we do with the QEMU driver, where we have a magic header, then the XML description, and then the actual dumped state from QEMU. We should do the same kind of thing with LXC.
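
To make the suggested layout concrete, here is a rough sketch of a single-file image: a fixed header with a magic string, followed by the domain XML, followed by opaque state data. Everything in it is an assumption for illustration only: struct save_header, SAVE_MAGIC, the field names and both functions are hypothetical and are not taken from the QEMU driver or from this series. Fields are written in host byte order for brevity; a real format would have to define the byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SAVE_MAGIC   "ExampleLxcSave"   /* hypothetical magic string */
#define SAVE_VERSION 1
#define SAVE_XML_MAX (1024 * 1024)      /* arbitrary sanity limit */

/* Hypothetical fixed-size header at the start of the save file. */
struct save_header {
    char magic[16];
    uint32_t version;
    uint32_t xml_len;    /* length of the domain XML, including NUL */
    uint64_t data_len;   /* length of the opaque state that follows */
};

/* Writes header, XML and state data to fp. Returns 0 or -1. */
static int save_image_write(FILE *fp, const char *xml,
                            const void *data, uint64_t data_len)
{
    struct save_header hdr;

    assert(fp != NULL && xml != NULL);

    memset(&hdr, 0, sizeof(hdr));
    memcpy(hdr.magic, SAVE_MAGIC, sizeof(SAVE_MAGIC));
    hdr.version = SAVE_VERSION;
    hdr.xml_len = (uint32_t)strlen(xml) + 1;
    hdr.data_len = data_len;

    if (fwrite(&hdr, sizeof(hdr), 1, fp) != 1)
        return -1;
    if (fwrite(xml, hdr.xml_len, 1, fp) != 1)
        return -1;
    if (data_len > 0 && fwrite(data, (size_t)data_len, 1, fp) != 1)
        return -1;
    return fflush(fp) == 0 ? 0 : -1;
}

/* Reads and validates the header, then reads the XML into a malloc'd
 * buffer (caller frees). On success fp is positioned at the state data. */
static int save_image_read_xml(FILE *fp, struct save_header *hdr, char **xml)
{
    char *buf;

    *xml = NULL;
    if (fread(hdr, sizeof(*hdr), 1, fp) != 1)
        return -1;
    if (memcmp(hdr->magic, SAVE_MAGIC, sizeof(SAVE_MAGIC)) != 0 ||
        hdr->version != SAVE_VERSION ||
        hdr->xml_len == 0 || hdr->xml_len > SAVE_XML_MAX)
        return -1;
    if (!(buf = malloc(hdr->xml_len)))
        return -1;
    if (fread(buf, hdr->xml_len, 1, fp) != 1 ||
        buf[hdr->xml_len - 1] != '\0') {
        free(buf);
        return -1;
    }
    *xml = buf;
    return 0;
}
```

Since CRIU writes a directory of image files, the state section would presumably have to hold some packed form of that directory; how to pack it is left open here.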
+{ + int fd; + int ret = -1; + virLXCDomainObjPrivatePtr priv; + virCommandPtr cmd; + struct stat sb; + char *path = NULL; + char *tty_info_path = NULL; + char *ttyinfo = NULL; + int status; + + if (virFileMakePath(checkpointdir) < 0) { + virReportSystemError(errno, + _("Failed to mkdir %s"), checkpointdir); + return -1; + } + + fd = open(checkpointdir, O_DIRECTORY); + if (fd < 0) { + virReportSystemError(errno, + _("Failed to open directory %s"), checkpointdir); + return -1; + } + + cmd = virCommandNew(CRIU); + virCommandAddArg(cmd, "dump"); + + virCommandAddArgList(cmd, "--images-dir", checkpointdir, NULL); + + virCommandAddArgList(cmd, "--log-file", "dump.log", NULL); + + virCommandAddArgList(cmd, "-vvvv", NULL); + + priv = vm->privateData; + virCommandAddArg(cmd, "--tree"); + virCommandAddArgFormat(cmd, "%d", priv->initpid); + + virCommandAddArgList(cmd, "--tcp-established", "--file-locks", + "--link-remap", "--force-irmap", NULL); + + virCommandAddArgList(cmd, "--manage-cgroup", NULL); + + virCommandAddArgList(cmd, "--enable-external-sharing", + "--enable-external-masters", NULL); + + virCommandAddArgList(cmd, "--enable-fs", "hugetlbfs", + "--enable-fs", "tracefs", NULL); + + /* Add support for FUSE */ + virCommandAddArgList(cmd, "--ext-mount-map", "/proc/meminfo:fuse", NULL); + virCommandAddArgList(cmd, "--ghost-limit", "10000000", NULL); + + virCommandAddArgList(cmd, "--ext-mount-map", "/dev/console:console", NULL); + virCommandAddArgList(cmd, "--ext-mount-map", "/dev/tty1:tty1", NULL); + virCommandAddArgList(cmd, "--ext-mount-map", "auto", NULL); + + /* The master pair of the /dev/pts device lives outside from what is dumped + * inside the libvirt-lxc process. Add the slave pair as an external tty + * otherwise criu will fail. 
+ */ + if (virAsprintf(&path, "/proc/%d/root/dev/pts/0", priv->initpid) < 0) + goto cleanup; + + if (stat(path, &sb) < 0) { + virReportSystemError(errno, + _("Unable to stat %s"), path); + goto cleanup; + } + + if (virAsprintf(&tty_info_path, "%s/tty.info", checkpointdir) < 0) + goto cleanup; + + if (virAsprintf(&ttyinfo, "tty[%x:%x]", + (unsigned int)sb.st_rdev, (unsigned int)sb.st_dev) < 0) + goto cleanup; + + if (virFileWriteStr(tty_info_path, ttyinfo, 0666) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Failed to write tty info to %s"), tty_info_path); + goto cleanup; + } + + VIR_DEBUG("tty.info: tty[%x:%x]", + (unsigned int)sb.st_dev, (unsigned int)sb.st_rdev); + virCommandAddArg(cmd, "--external"); + virCommandAddArgFormat(cmd, "tty[%x:%x]", + (unsigned int)sb.st_rdev, (unsigned int)sb.st_dev); + + VIR_DEBUG("About to checkpoint domain %s (pid = %d)", + vm->def->name, priv->initpid); + virCommandRawStatus(cmd); + if (virCommandRun(cmd, &status) < 0) + goto cleanup; + + ret = 0; + + cleanup: + VIR_FORCE_CLOSE(fd); + VIR_FREE(path); + VIR_FREE(tty_info_path); + VIR_FREE(ttyinfo); + + if (ret < 0) + return ret; + return status; +} + +int lxcCriuRestore(virDomainDefPtr def, int restorefd, + int ttyfd) +{ + int ret = -1; + virCommandPtr cmd; + char *ttyinfo = NULL; + char *inheritfd = NULL; + char *tty_info_path = NULL; + char *checkpointfd = NULL; + char *checkpointdir = NULL; + char *rootfs_mount = NULL; + + cmd = virCommandNew(CRIU); + virCommandAddArg(cmd, "restore"); + + if (virAsprintf(&checkpointfd, "/proc/self/fd/%d", restorefd) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to write checkpoint dir path")); + goto cleanup; + } + + if (virFileResolveLink(checkpointfd, &checkpointdir) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to readlink checkpoint dir path")); + goto cleanup; + } + + /* CRIU needs the container's root bind mounted so that it is the root of + * some mount. 
+ */ + if (virAsprintf(&rootfs_mount, "/tmp/%s", def->name) < 0) {
If this is a directory on the host filesystem, then this is a security flaw. You must never create predictable filenames in /tmp. Ideally files would be under one of the private directories libvirt already uses in /var/run/libvirt or /var/lib/libvirt

Regards,
Daniel
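
One way to avoid a predictable name is to let mkdtemp() pick a random suffix under a directory that only the daemon can write to. The sketch below assumes exactly that; make_private_mount_dir() and remove_private_mount_dir() are hypothetical helpers, not libvirt functions, and the caller is assumed to pass a private base directory such as one of those named above.

```c
#define _XOPEN_SOURCE 700   /* for mkdtemp() under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Creates a uniquely named directory "<base_dir>/<name>.XXXXXX" and
 * returns its path in a malloc'd string, or NULL on error.
 * base_dir and name are hypothetical parameters for this sketch. */
static char *make_private_mount_dir(const char *base_dir, const char *name)
{
    size_t len;
    char *path;

    assert(base_dir != NULL && name != NULL);

    /* '/' + '.' + six X's + terminating NUL */
    len = strlen(base_dir) + strlen(name) + sizeof("/.XXXXXX");
    if (!(path = malloc(len)))
        return NULL;
    snprintf(path, len, "%s/%s.XXXXXX", base_dir, name);

    /* mkdtemp() replaces the X's with a unique suffix and creates the
     * directory with mode 0700; it fails if it cannot create a new one. */
    if (!mkdtemp(path)) {
        free(path);
        return NULL;
    }
    return path;
}

/* Removes the (empty, unmounted) directory and frees the path string. */
static int remove_private_mount_dir(char *path)
{
    int ret = rmdir(path);

    free(path);
    return ret;
}
```

The returned path would then have to be handed to both the bind mount and the criu --root argument, instead of each side rebuilding "/tmp/<name>" on its own as the patch does.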

When doing lxc migration or simply restoring the container from a saved state, we need restore the container from CRIU img files that we have stored in disk. In this patch, we should extend lxcContainerStart into a more generic one, that either starts a container from scratch or restores it from a snapshot. Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com> --- src/Makefile.am | 3 +- src/lxc/lxc_container.c | 200 +++++++++++++++++++++++++++++++++++++++++++++-- src/lxc/lxc_container.h | 3 +- src/lxc/lxc_controller.c | 109 ++++++++++++++++++++++++-- src/lxc/lxc_driver.c | 4 +- src/lxc/lxc_process.c | 23 +++++- src/lxc/lxc_process.h | 1 + 7 files changed, 323 insertions(+), 20 deletions(-) diff --git a/src/Makefile.am b/src/Makefile.am index 64a7680..1542251 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -761,7 +761,8 @@ LXC_CONTROLLER_SOURCES = \ lxc/lxc_cgroup.c lxc/lxc_cgroup.h \ lxc/lxc_domain.c lxc/lxc_domain.h \ lxc/lxc_fuse.c lxc/lxc_fuse.h \ - lxc/lxc_controller.c + lxc/lxc_controller.c \ + lxc/lxc_criu.c lxc/lxc_criu.h SECURITY_DRIVER_APPARMOR_HELPER_SOURCES = \ $(DATATYPES_SOURCES) \ diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index b857431..7d307ee 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -70,6 +70,8 @@ #include "virprocess.h" #include "virstring.h" +#include "lxc_criu.h" + #define VIR_FROM_THIS VIR_FROM_LXC VIR_LOG_INIT("lxc.lxc_container"); @@ -112,6 +114,7 @@ struct __lxc_child_argv { char **ttyPaths; int handshakefd; int *nsInheritFDs; + int restorefd; }; static int lxcContainerMountFSBlock(virDomainFSDefPtr fs, @@ -266,7 +269,7 @@ static virCommandPtr lxcContainerBuildInitCmd(virDomainDefPtr vmDef, * Returns 0 on success or -1 in case of error */ static int lxcContainerSetupFDs(int *ttyfd, - size_t npassFDs, int *passFDs) + size_t npassFDs, int *passFDs, int restorefd) { int rc = -1; int open_max; @@ -362,6 +365,8 @@ static int lxcContainerSetupFDs(int *ttyfd, } for (fd = last_fd + 1; fd < 
open_max; fd++) { + if (fd == restorefd) + continue; int tmpfd = fd; VIR_MASS_CLOSE(tmpfd); } @@ -1077,6 +1082,36 @@ static int lxcContainerMountFSDev(virDomainDefPtr def, return ret; } + +static int lxcContainerMountFSDevPTSRestore(virDomainDefPtr def, + const char *stateDir) +{ + int ret = -1; + char *path = NULL; + int flags = MS_MOVE; + + VIR_DEBUG("Mount /dev/pts stateDir=%s", stateDir); + + if (virAsprintf(&path, "%s/%s.devpts", + stateDir, def->name) < 0) + return ret; + + VIR_DEBUG("Trying to move %s to /dev/pts", path); + + if (mount(path, "/dev/pts", NULL, flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev/pts"), + path); + goto cleanup; + } + + ret = 0; + cleanup: + VIR_FREE(path); + return ret; +} + + static int lxcContainerMountFSDevPTS(virDomainDefPtr def, const char *stateDir) { @@ -2120,6 +2155,148 @@ static int lxcAttachNS(int *ns_fd) } +/* + * lxcContainerChildRestore: + * @data: pointer to container arguments + */ +static int lxcContainerChildRestore(void *data) +{ + lxc_child_argv_t *argv = data; + virDomainDefPtr vmDef = argv->config; + int ttyfd = -1; + int ret = -1; + char *ttyPath = NULL; + virDomainFSDefPtr root; + char *sec_mount_options = NULL; + char *stateDir = NULL; + char *rootfs_mount = NULL; + + if (NULL == vmDef) { + virReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("lxcChild() passed invalid vm definition")); + goto cleanup; + } + + if (lxcContainerWaitForContinue(argv->monitor) < 0) { + virReportSystemError(errno, "%s", + _("Failed to read the container continue message")); + goto cleanup; + } + VIR_DEBUG("Received container continue message"); + + if (lxcContainerSetID(vmDef) < 0) + goto cleanup; + + root = virDomainGetFilesystemForTarget(vmDef, "/"); + + if (argv->nttyPaths) { + const char *tty = argv->ttyPaths[0]; + if (STRPREFIX(tty, "/dev/pts/")) + tty += strlen("/dev/pts/"); + if (virAsprintf(&ttyPath, "%s/%s.devpts/%s", + LXC_STATE_DIR, vmDef->name, tty) < 0) + goto cleanup; + } else { + 
virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("At least one tty is required")); + goto cleanup; + } + + VIR_DEBUG("Container TTY path: %s", ttyPath); + + ttyfd = open(ttyPath, O_RDWR); + if (ttyfd < 0) { + virReportSystemError(errno, + _("Failed to open tty %s"), + ttyPath); + goto cleanup; + } + VIR_DEBUG("Container TTY fd: %d", ttyfd); + + if (!(sec_mount_options = virSecurityManagerGetMountOptions( + argv->securityDriver, + vmDef))) + goto cleanup; + + if (lxcContainerPrepareRoot(vmDef, root, sec_mount_options) < 0) + goto cleanup; + + if (lxcContainerSendContinue(argv->handshakefd) < 0) { + virReportSystemError(errno, "%s", + _("Failed to send continue signal to controller")); + goto cleanup; + } + + VIR_DEBUG("Setting up container's std streams"); + + if (lxcContainerSetupFDs(&ttyfd, + argv->npassFDs, argv->passFDs, argv->restorefd) < 0) + goto cleanup; + + /* CRIU needs the container's root bind mounted so that it is the root of + * some mount. + */ + if (virAsprintf(&rootfs_mount, "/tmp/%s", vmDef->name) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to write rootfs dir mount path")); + goto cleanup; + } + + if (virFileMakePath(rootfs_mount) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to mkdir rootfs mount path")); + goto cleanup; + } + + if (mount(root->src, rootfs_mount, NULL, MS_BIND, NULL) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to create rootfs mountpoint")); + goto cleanup; + } + + if (virFileResolveAllLinks(LXC_STATE_DIR, &stateDir) < 0) + goto cleanup; + + /* Mounts /dev/pts */ + if (lxcContainerMountFSDevPTSRestore(vmDef, stateDir) < 0) { + virReportSystemError(errno, "%s", + _("Failed to mount dev/pts")); + goto cleanup; + } + + if (setsid() < 0) { + virReportSystemError(errno, "%s", + _("Unable to become session leader")); + } + + ret = 0; + + cleanup: + VIR_FORCE_CLOSE(argv->monitor); + VIR_FORCE_CLOSE(argv->handshakefd); + VIR_FORCE_CLOSE(ttyfd); + VIR_FREE(ttyPath); + 
VIR_FREE(rootfs_mount); + VIR_FREE(stateDir); + VIR_FREE(sec_mount_options); + + if (ret == 0) { + VIR_DEBUG("Executing container restore criu function"); + ret = lxcCriuRestore(vmDef, argv->restorefd, 0); + } + + if (ret != 0) { + VIR_DEBUG("Tearing down container"); + fprintf(stderr, + _("Failure in libvirt_lxc startup: %s\n"), + virGetLastErrorMessage()); + } + + return ret; +} + + + /** * lxcContainerChild: * @data: pointer to container arguments @@ -2242,7 +2419,7 @@ static int lxcContainerChild(void *data) VIR_FORCE_CLOSE(argv->handshakefd); VIR_FORCE_CLOSE(argv->monitor); if (lxcContainerSetupFDs(&ttyfd, - argv->npassFDs, argv->passFDs) < 0) + argv->npassFDs, argv->passFDs, -1) < 0) goto cleanup; /* Make init process of the container the leader of the new session. @@ -2332,7 +2509,8 @@ int lxcContainerStart(virDomainDefPtr def, int handshakefd, int *nsInheritFDs, size_t nttyPaths, - char **ttyPaths) + char **ttyPaths, + int restorefd) { pid_t pid; int cflags; @@ -2350,6 +2528,7 @@ int lxcContainerStart(virDomainDefPtr def, .ttyPaths = ttyPaths, .handshakefd = handshakefd, .nsInheritFDs = nsInheritFDs, + .restorefd = restorefd, }; /* allocate a stack for the container */ @@ -2399,10 +2578,19 @@ int lxcContainerStart(virDomainDefPtr def, VIR_DEBUG("Inheriting a UTS namespace"); } - VIR_DEBUG("Cloning container init process"); - pid = clone(lxcContainerChild, stacktop, cflags, &args); + if (restorefd == -1) + VIR_DEBUG("Cloning container init process"); + else + VIR_DEBUG("Cloning container process that will spawn criu restore"); + + if (restorefd != -1) + pid = clone(lxcContainerChildRestore, stacktop, SIGCHLD, &args); + else + pid = clone(lxcContainerChild, stacktop, cflags, &args); + VIR_FREE(stack); - VIR_DEBUG("clone() completed, new container PID is %d", pid); + if (restorefd == -1) + VIR_DEBUG("clone() completed, new container PID is %d", pid); if (pid < 0) { virReportSystemError(errno, "%s", diff --git a/src/lxc/lxc_container.h b/src/lxc/lxc_container.h 
index 33eaab4..5d47071 100644 --- a/src/lxc/lxc_container.h +++ b/src/lxc/lxc_container.h @@ -63,7 +63,8 @@ int lxcContainerStart(virDomainDefPtr def, int handshakefd, int *nsInheritFDs, size_t nttyPaths, - char **ttyPaths); + char **ttyPaths, + int restorefd); int lxcContainerAvailable(int features); diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index e58ff1b..e178195 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -146,6 +146,8 @@ struct _virLXCController { virCgroupPtr cgroup; virLXCFusePtr fuse; + + int restore; }; #include "lxc_controller_dispatch.h" @@ -1009,6 +1011,64 @@ static int lxcControllerClearCapabilities(void) return 0; } +static int +lxcControllerFindRestoredPid(int fd) +{ + int initpid = 0; + int ret = -1; + char *checkpointdir = NULL; + char *pidfile = NULL; + char *checkpointfd = NULL; + int pidfilefd; + char c; + + if (fd < 0) + goto cleanup; + + if (virAsprintf(&checkpointfd, "/proc/self/fd/%d", fd) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to write checkpoint dir path")); + goto cleanup; + } + + if (virFileResolveLink(checkpointfd, &checkpointdir) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to readlink checkpoint dir path")); + goto cleanup; + } + + if (virAsprintf(&pidfile, "%s/pidfile", checkpointdir) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to write pidfile path")); + goto cleanup; + } + + if ((pidfilefd = virFileOpenAs(pidfile, O_RDONLY, 0, -1, -1, 0)) < 0) { + virReportSystemError(pidfilefd, + _("Failed to open domain's pidfile '%s'"), + pidfile); + goto cleanup; + } + + while ((saferead(pidfilefd, &c, 1) == 1) && c != EOF) + initpid = initpid*10 + c - '0'; + + ret = initpid; + + if (virFileRemove(pidfile, -1, -1) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to delete pidfile path")); + } + + cleanup: + VIR_FORCE_CLOSE(fd); + VIR_FORCE_CLOSE(pidfilefd); + VIR_FREE(pidfile); + VIR_FREE(checkpointdir); + 
VIR_FREE(checkpointfd); + return ret; +} + static bool wantReboot; static virMutex lock = VIR_MUTEX_INITIALIZER; @@ -2348,6 +2408,7 @@ virLXCControllerRun(virLXCControllerPtr ctrl) int containerhandshake[2] = { -1, -1 }; char **containerTTYPaths = NULL; size_t i; + bool restore_mode = (ctrl->restore != -1); if (VIR_ALLOC_N(containerTTYPaths, ctrl->nconsoles) < 0) goto cleanup; @@ -2404,8 +2465,10 @@ virLXCControllerRun(virLXCControllerPtr ctrl) containerhandshake[1], ctrl->nsFDs, ctrl->nconsoles, - containerTTYPaths)) < 0) + containerTTYPaths, + ctrl->restore)) < 0) goto cleanup; + VIR_FORCE_CLOSE(control[1]); VIR_FORCE_CLOSE(containerhandshake[1]); @@ -2416,10 +2479,10 @@ virLXCControllerRun(virLXCControllerPtr ctrl) for (i = 0; i < VIR_LXC_DOMAIN_NAMESPACE_LAST; i++) VIR_FORCE_CLOSE(ctrl->nsFDs[i]); - if (virLXCControllerSetupCgroupLimits(ctrl) < 0) + if (!restore_mode && virLXCControllerSetupCgroupLimits(ctrl) < 0) goto cleanup; - if (virLXCControllerSetupUserns(ctrl) < 0) + if (!restore_mode && virLXCControllerSetupUserns(ctrl) < 0) goto cleanup; if (virLXCControllerMoveInterfaces(ctrl) < 0) @@ -2444,13 +2507,33 @@ virLXCControllerRun(virLXCControllerPtr ctrl) if (lxcControllerClearCapabilities() < 0) goto cleanup; - if (virLXCControllerDaemonHandshake(ctrl) < 0) - goto cleanup; + if (restore_mode) { + int status; + int ret = waitpid(-1, &status, 0); + VIR_DEBUG("Got sig child %d", ret); + + /* We have two basic cases here. 
+ * - CRIU died bacause of restore error and we do not have a running container + * - CRIU detached itself from the running container + */ + int initpid; + if ((initpid = lxcControllerFindRestoredPid(ctrl->restore)) < 0) { + virReportSystemError(errno, "%s", + _("Unable to get restored task pid")); + virNetDaemonQuit(ctrl->daemon); + goto cleanup; + } else { + ctrl->initpid = initpid; + } + } for (i = 0; i < ctrl->nconsoles; i++) if (virLXCControllerConsoleSetNonblocking(&(ctrl->consoles[i])) < 0) goto cleanup; + if (virLXCControllerDaemonHandshake(ctrl) < 0) + goto cleanup; + /* We must not hold open a dbus connection for life * of LXC instance, since dbus-daemon is limited to * only a few 100 connections by default @@ -2487,6 +2570,8 @@ int main(int argc, char *argv[]) int ns_fd[VIR_LXC_DOMAIN_NAMESPACE_LAST]; int handshakeFd = -1; bool bg = false; + int restore = -1; + const struct option options[] = { { "background", 0, NULL, 'b' }, { "name", 1, NULL, 'n' }, @@ -2498,6 +2583,7 @@ int main(int argc, char *argv[]) { "share-net", 1, NULL, 'N' }, { "share-ipc", 1, NULL, 'I' }, { "share-uts", 1, NULL, 'U' }, + { "restore", 1, NULL, 'r' }, { "help", 0, NULL, 'h' }, { 0, 0, 0, 0 }, }; @@ -2525,7 +2611,7 @@ int main(int argc, char *argv[]) while (1) { int c; - c = getopt_long(argc, argv, "dn:v:p:m:c:s:h:S:N:I:U:", + c = getopt_long(argc, argv, "dn:v:p:m:c:s:h:S:N:I:U:r:", options, NULL); if (c == -1) @@ -2601,6 +2687,14 @@ int main(int argc, char *argv[]) securityDriver = optarg; break; + case 'r': + if (virStrToLong_i(optarg, NULL, 10, &restore) < 0) { + fprintf(stderr, "malformed --restore argument '%s'", + optarg); + goto cleanup; + } + break; + case 'h': case '?': fprintf(stderr, "\n"); @@ -2617,6 +2711,7 @@ int main(int argc, char *argv[]) fprintf(stderr, " -N FD, --share-net FD\n"); fprintf(stderr, " -I FD, --share-ipc FD\n"); fprintf(stderr, " -U FD, --share-uts FD\n"); + fprintf(stderr, " -r FD, --restore FD\n"); fprintf(stderr, " -h, --help\n"); 
fprintf(stderr, "\n"); rc = 0; @@ -2669,6 +2764,8 @@ int main(int argc, char *argv[]) ctrl->passFDs = passFDs; ctrl->npassFDs = npassFDs; + ctrl->restore = restore; + for (i = 0; i < VIR_LXC_DOMAIN_NAMESPACE_LAST; i++) { if (ns_fd[i] != -1) { if (!ctrl->nsFDs) {/*allocate only once */ diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index 46af05d..bd47c91 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -1133,7 +1133,7 @@ static int lxcDomainCreateWithFiles(virDomainPtr dom, ret = virLXCProcessStart(dom->conn, driver, vm, nfiles, files, - (flags & VIR_DOMAIN_START_AUTODESTROY), + (flags & VIR_DOMAIN_START_AUTODESTROY), -1, VIR_DOMAIN_RUNNING_BOOTED); if (ret == 0) { @@ -1259,7 +1259,7 @@ lxcDomainCreateXMLWithFiles(virConnectPtr conn, if (virLXCProcessStart(conn, driver, vm, nfiles, files, - (flags & VIR_DOMAIN_START_AUTODESTROY), + (flags & VIR_DOMAIN_START_AUTODESTROY), -1, VIR_DOMAIN_RUNNING_BOOTED) < 0) { virDomainAuditStart(vm, "booted", false); if (!vm->persistent) { diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index 28313f0..b4f92e0 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -123,7 +123,7 @@ virLXCProcessReboot(virLXCDriverPtr driver, virLXCProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_SHUTDOWN); vm->newDef = savedDef; if (virLXCProcessStart(conn, driver, vm, - 0, NULL, autodestroy, reason) < 0) { + 0, NULL, autodestroy, -1, reason) < 0) { VIR_WARN("Unable to handle reboot of vm %s", vm->def->name); goto cleanup; @@ -929,7 +929,8 @@ virLXCProcessBuildControllerCmd(virLXCDriverPtr driver, size_t nfiles, int handshakefd, int * const logfd, - const char *pidfile) + const char *pidfile, + int restorefd) { size_t i; char *filterstr; @@ -1008,6 +1009,12 @@ virLXCProcessBuildControllerCmd(virLXCDriverPtr driver, for (i = 0; i < nveths; i++) virCommandAddArgList(cmd, "--veth", veths[i], NULL); + if (restorefd != -1) { + virCommandAddArg(cmd, "--restore"); + virCommandAddArgFormat(cmd, "%d", restorefd); + 
virCommandPassFD(cmd, restorefd, 0); + } + virCommandPassFD(cmd, handshakefd, 0); virCommandDaemonize(cmd); virCommandSetPidFile(cmd, pidfile); @@ -1181,6 +1188,8 @@ virLXCProcessEnsureRootFS(virDomainObjPtr vm) * @driver: pointer to driver structure * @vm: pointer to virtual machine structure * @autoDestroy: mark the domain for auto destruction + * @restorefd: file descriptor pointing to the restore directory (-1 if not + * restoring) * @reason: reason for switching vm to running state * * Starts a vm @@ -1192,6 +1201,7 @@ int virLXCProcessStart(virConnectPtr conn, virDomainObjPtr vm, unsigned int nfiles, int *files, bool autoDestroy, + int restorefd, virDomainRunningReason reason) { int rc = -1, r; @@ -1406,7 +1416,7 @@ int virLXCProcessStart(virConnectPtr conn, files, nfiles, handshakefds[1], &logfd, - pidfile))) + pidfile, restorefd))) goto cleanup; /* now that we know it is about to start call the hook if present */ @@ -1511,6 +1521,9 @@ int virLXCProcessStart(virConnectPtr conn, goto cleanup; } + if (restorefd != -1) + goto skip_cgroup_checks; + /* We know the cgroup must exist by this synchronization * point so lets detect that first, since it gives us a * more reliable way to kill everything off if something @@ -1527,6 +1540,8 @@ int virLXCProcessStart(virConnectPtr conn, goto cleanup; } + skip_cgroup_checks: + /* Get the machine name so we can properly delete it through * systemd later */ if (!(priv->machineName = virSystemdGetMachineNameByPID(vm->pid))) @@ -1634,7 +1649,7 @@ virLXCProcessAutostartDomain(virDomainObjPtr vm, if (vm->autostart && !virDomainObjIsActive(vm)) { ret = virLXCProcessStart(data->conn, data->driver, vm, - 0, NULL, false, + 0, NULL, false, -1, VIR_DOMAIN_RUNNING_BOOTED); virDomainAuditStart(vm, "booted", ret >= 0); if (ret < 0) { diff --git a/src/lxc/lxc_process.h b/src/lxc/lxc_process.h index d78cdde..c724f31 100644 --- a/src/lxc/lxc_process.h +++ b/src/lxc/lxc_process.h @@ -29,6 +29,7 @@ int virLXCProcessStart(virConnectPtr conn, 
virDomainObjPtr vm, unsigned int nfiles, int *files, bool autoDestroy, + int restorefd, virDomainRunningReason reason); int virLXCProcessStop(virLXCDriverPtr driver, virDomainObjPtr vm, -- 2.7.3
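A note on the pidfile parsing in lxcControllerFindRestoredPid above: the loop folds every byte it reads into the number, so a trailing newline or stray character would corrupt the value, and comparing a char against EOF does not detect end of file. A minimal sketch of a stricter parser follows. parse_pid_text is a hypothetical helper, not part of libvirt or of this patch, and it assumes the pidfile content has already been read into a NUL-terminated buffer holding one decimal PID.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper, not part of libvirt: parse a decimal PID from a
 * NUL-terminated buffer. Returns the PID, or -1 if the text is malformed
 * or the value is out of range. */
pid_t
parse_pid_text(const char *buf)
{
    char *end = NULL;
    long val;

    assert(buf != NULL);

    errno = 0;
    val = strtol(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -1;          /* overflow, or no digits consumed */
    if (*end == '\n')
        end++;              /* tolerate one trailing newline */
    if (*end != '\0')
        return -1;          /* anything else after the number is an error */
    if (val <= 0 || val > INT_MAX)
        return -1;          /* not a plausible PID */
    return (pid_t)val;
}
```

The caller would treat -1 as a failed restore instead of storing a bogus initpid. libvirt's existing file and pidfile helpers may already cover this case; whether they fit here is left open.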

On Thu, Jul 21, 2016 at 03:37:26PM +0000, Katerina Koukiou wrote:
When doing lxc migration or simply restoring the container from a saved state, we need to restore the container from the CRIU image files that we have stored on disk. This patch extends lxcContainerStart into a more generic function that either starts a container from scratch or restores it from a snapshot.
Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com>
---
 src/Makefile.am          |   3 +-
 src/lxc/lxc_container.c  | 200 +++++++++++++++++++++++++++++++++++++++++++++--
 src/lxc/lxc_container.h  |   3 +-
 src/lxc/lxc_controller.c | 109 ++++++++++++++++++++++++--
 src/lxc/lxc_driver.c     |   4 +-
 src/lxc/lxc_process.c    |  23 +++++-
 src/lxc/lxc_process.h    |   1 +
 7 files changed, 323 insertions(+), 20 deletions(-)
+    /* CRIU needs the container's root bind mounted so that it is the root of
+     * some mount.
+     */
+    if (virAsprintf(&rootfs_mount, "/tmp/%s", vmDef->name) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Failed to write rootfs dir mount path"));
+        goto cleanup;
+    }
Again, use of /tmp is a likely security flaw.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
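One way to avoid a predictable, attacker-guessable path such as /tmp/<name> is to create the mount target with mkdtemp(3) under a directory that only root can write to. The sketch below is an illustration under that assumption. make_private_mount_dir and the "rootfs.XXXXXX" template are hypothetical names and do not come from the patch or from libvirt.

```c
#define _POSIX_C_SOURCE 200809L   /* for mkdtemp() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a uniquely named directory (mode 0700)
 * under 'parent'. Returns a malloc'ed path that the caller must free(),
 * or NULL on failure. */
char *
make_private_mount_dir(const char *parent)
{
    static const char suffix[] = "/rootfs.XXXXXX";
    size_t len;
    char *path;

    assert(parent != NULL);

    len = strlen(parent) + sizeof(suffix);   /* sizeof counts the NUL */
    if (!(path = malloc(len)))
        return NULL;
    snprintf(path, len, "%s%s", parent, suffix);

    /* mkdtemp() fills in the XXXXXX part and creates the directory in one
     * step; it fails instead of reusing a name that already exists. */
    if (!mkdtemp(path)) {
        free(path);
        return NULL;
    }
    return path;
}
```

In the context of this patch the parent could be a per-driver state directory instead of /tmp. That choice is an assumption here, not something the review specifies.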

Add support for saving an lxc domain's state into files with lxcDomainSave and restore from file with lxcDomainRestore.

Usage: virsh save [domain-name] [domain-id or domain-uuid] [directory name]

Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com>
---
 src/lxc/lxc_driver.c | 234 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 234 insertions(+)

diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c
index bd47c91..cc05eef 100644
--- a/src/lxc/lxc_driver.c
+++ b/src/lxc/lxc_driver.c
@@ -84,6 +84,7 @@
 #include "virhostdev.h"
 #include "netdev_bandwidth_conf.h"

+#include "lxc_criu.h"
 #define VIR_FROM_THIS VIR_FROM_LXC

 VIR_LOG_INIT("lxc.lxc_driver");
@@ -3205,6 +3206,235 @@ static int lxcDomainResume(virDomainPtr dom)
 }

 static int
+lxcDoDomainSave(virLXCDriverPtr driver, virDomainObjPtr vm,
+                const char *to)
+{
+    int ret = -1;
+    virCapsPtr caps = NULL;
+    uint32_t xml_len = -1;
+    char *xml = NULL;
+    char xmlLen[33];
+    char *xml_image_path = NULL;
+    char *str = NULL;
+
+    if (!(caps = virLXCDriverGetCapabilities(driver, false)))
+        goto cleanup;
+
+    if ((xml = virDomainDefFormat(vm->def, caps, 0)) == NULL)
+        goto cleanup;
+
+    if ((ret = lxcCriuDump(driver, vm, to)) != 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Failed to checkpoint domain with CRIU"));
+        goto cleanup;
+    }
+
+    virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF,
+                         VIR_DOMAIN_SHUTOFF_SAVED);
+
+    if (virAsprintf(&xml_image_path, "%s/xml-image", to) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Failed to write image path"));
+        goto cleanup;
+    }
+
+    xml_len = strlen(xml) + 1;
+    snprintf(xmlLen, sizeof(xmlLen), "%d", xml_len);
+    VIR_DEBUG("xmlLen = %d %s", xml_len, xmlLen);
+
+    if (virAsprintf(&str, "%s\n%s", xmlLen, xml) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Failed to write xml info to string"));
+        goto cleanup;
+    }
+
+    if (virFileWriteStr(xml_image_path, str, 0666) < 0) {
+        virReportError(VIR_ERR_OPERATION_FAILED,
+                       _("Failed to write xml description to %s"),
+                       xml_image_path);
+        goto cleanup;
+    }
+
+    ret = 0;
+
+ cleanup:
+    VIR_FREE(xml);
+    VIR_FREE(xml_image_path);
+    VIR_FREE(str);
+    return ret != 0 ? -1 : ret;
+}
+
+static int
+lxcDomainSaveFlags(virDomainPtr dom, const char *to, const char *dxml,
+                   unsigned int flags)
+{
+    int ret = -1;
+    virLXCDriverPtr driver = dom->conn->privateData;
+    virDomainObjPtr vm;
+    bool remove_dom = false;
+
+    virCheckFlags(0, -1);
+    if (dxml) {
+        virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s",
+                       _("xml modification unsupported"));
+        return -1;
+    }
+
+    if (!(vm = lxcDomObjFromDomain(dom)))
+        goto cleanup;
+
+    if (virDomainSaveFlagsEnsureACL(dom->conn, vm->def) < 0)
+        goto cleanup;
+
+    if (virLXCDomainObjBeginJob(driver, vm, LXC_JOB_MODIFY) < 0)
+        goto cleanup;
+
+    if (!virDomainObjIsActive(vm)) {
+        virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+                       _("Domain is not running"));
+        goto endjob;
+    }
+
+    if (lxcDoDomainSave(driver, vm, to) < 0)
+        goto endjob;
+
+    if (!vm->persistent)
+        remove_dom = true;
+
+    ret = 0;
+
+ endjob:
+    virLXCDomainObjEndJob(driver, vm);
+
+ cleanup:
+    if (remove_dom && vm)
+        virDomainObjListRemove(driver->domains, vm);
+    virDomainObjEndAPI(&vm);
+    return ret;
+}
+
+static int
+lxcDomainSave(virDomainPtr dom, const char *to)
+{
+    return lxcDomainSaveFlags(dom, to, NULL, 0);
+}
+
+static int
+lxcDomainRestoreFlags(virConnectPtr conn, const char *from,
+                      const char* dxml, unsigned int flags)
+{
+    virLXCDriverPtr driver = conn->privateData;
+    virDomainObjPtr vm = NULL;
+    virDomainDefPtr def = NULL;
+    virCapsPtr caps = NULL;
+    int ret = -1;
+    int restorefd;
+    char *xml = NULL;
+    u_int32_t xmlLen = 0;
+    int fd;
+    char *xml_image_path = NULL;
+    char c;
+
+    virCheckFlags(0, -1);
+    if (dxml) {
+        virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s",
+                       _("xml modification unsupported"));
+        goto out;
+    }
+
+    if (virAsprintf(&xml_image_path, "%s/xml-image", from) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Failed to write xml image path"));
+        goto out;
+    }
+
+    if ((fd = virFileOpenAs(xml_image_path, O_RDONLY, 0, -1, -1, 0)) < 0) {
+        virReportSystemError(-fd,
+                             _("Failed to open domain image file '%s'"),
+                             xml_image_path);
+        goto out;
+    }
+
+    while ((saferead(fd, &c, 1) == 1) && c != '\n')
+        xmlLen = xmlLen*10 + c - '0';
+    xmlLen--;
+
+    if (VIR_ALLOC_N(xml, xmlLen) < 0)
+        goto cleanup;
+
+    if (saferead(fd, xml, xmlLen) != xmlLen) {
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s", _("failed to read XML"));
+        goto cleanup;
+    }
+
+    if (!xml) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("no domain XML parsed"));
+        goto cleanup;
+    }
+
+    if (!(caps = virLXCDriverGetCapabilities(driver, false)))
+        goto cleanup;
+
+    if (!(def = virDomainDefParseString(xml, caps, driver->xmlopt,
+                                        VIR_DOMAIN_DEF_PARSE_INACTIVE)))
+        goto cleanup;
+
+    if (virDomainRestoreFlagsEnsureACL(conn, def) < 0)
+        goto cleanup;
+
+    if (!(vm = virDomainObjListAdd(driver->domains, def,
+                                   driver->xmlopt,
+                                   VIR_DOMAIN_OBJ_LIST_ADD_LIVE |
+                                   VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE,
+                                   NULL)))
+        goto cleanup;
+    def = NULL;
+
+    if (virLXCDomainObjBeginJob(driver, vm, LXC_JOB_MODIFY) < 0) {
+        if (!vm->persistent)
+            virDomainObjListRemove(driver->domains, vm);
+        goto cleanup;
+    }
+
+    restorefd = open(from, O_DIRECTORY);
+    if (restorefd < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       "%s", _("Can't open images dir"));
+        if (!vm->persistent)
+            virDomainObjListRemove(driver->domains, vm);
+        virLXCDomainObjEndJob(driver, vm);
+        goto cleanup;
+    }
+
+    ret = virLXCProcessStart(conn, driver, vm,
+                             0, NULL,
+                             0, restorefd,
+                             VIR_DOMAIN_RUNNING_RESTORED);
+
+    VIR_FORCE_CLOSE(restorefd);
+
+    if (ret < 0 && !vm->persistent)
+        virDomainObjListRemove(driver->domains, vm);
+
+    virLXCDomainObjEndJob(driver, vm);
+ cleanup:
+    VIR_FORCE_CLOSE(fd);
+ out:
+    virDomainDefFree(def);
+    VIR_FREE(xml_image_path);
+    VIR_FREE(xml);
+    virDomainObjEndAPI(&vm);
+    return ret;
+}
+
+static int
+lxcDomainRestore(virConnectPtr conn, const char *from)
+{
+    return lxcDomainRestoreFlags(conn, from, NULL, 0);
+}
+
+static int
 lxcDomainOpenConsole(virDomainPtr dom,
                       const char *dev_name,
                       virStreamPtr st,
@@ -5526,6 +5756,10 @@ static virHypervisorDriver lxcHypervisorDriver = {
     .domainLookupByName = lxcDomainLookupByName, /* 0.4.2 */
     .domainSuspend = lxcDomainSuspend, /* 0.7.2 */
     .domainResume = lxcDomainResume, /* 0.7.2 */
+    .domainSave = lxcDomainSave, /* x.x.x */
+    .domainSaveFlags = lxcDomainSaveFlags, /* x.x.x */
+    .domainRestore = lxcDomainRestore, /* x.x.x */
+    .domainRestoreFlags = lxcDomainRestoreFlags, /* x.x.x */
     .domainDestroy = lxcDomainDestroy, /* 0.4.4 */
     .domainDestroyFlags = lxcDomainDestroyFlags, /* 0.9.4 */
     .domainGetOSType = lxcDomainGetOSType, /* 0.4.2 */
-- 
2.7.3

On Thu, Jul 21, 2016 at 03:37:27PM +0000, Katerina Koukiou wrote:
Add support for saving an lxc domain's state into files with lxcDomainSave and restore from file with lxcDomainRestore.

Usage: virsh save [domain-name] [domain-id or domain-uuid] [directory name]

Signed-off-by: Katerina Koukiou <k.koukiou@gmail.com>
---
 src/lxc/lxc_driver.c | 234 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 234 insertions(+)

diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c
index bd47c91..cc05eef 100644
--- a/src/lxc/lxc_driver.c
+++ b/src/lxc/lxc_driver.c
@@ -84,6 +84,7 @@
 #include "virhostdev.h"
 #include "netdev_bandwidth_conf.h"

+#include "lxc_criu.h"
 #define VIR_FROM_THIS VIR_FROM_LXC

 VIR_LOG_INIT("lxc.lxc_driver");
@@ -3205,6 +3206,235 @@ static int lxcDomainResume(virDomainPtr dom)
 }

 static int
+lxcDoDomainSave(virLXCDriverPtr driver, virDomainObjPtr vm,
+                const char *to)
+{
+    int ret = -1;
+    virCapsPtr caps = NULL;
+    uint32_t xml_len = -1;
+    char *xml = NULL;
+    char xmlLen[33];
+    char *xml_image_path = NULL;
+    char *str = NULL;
+
+    if (!(caps = virLXCDriverGetCapabilities(driver, false)))
+        goto cleanup;
+
+    if ((xml = virDomainDefFormat(vm->def, caps, 0)) == NULL)
+        goto cleanup;
+
+    if ((ret = lxcCriuDump(driver, vm, to)) != 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Failed to checkpoint domain with CRIU"));
+        goto cleanup;
+    }
+
+    virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF,
+                         VIR_DOMAIN_SHUTOFF_SAVED);
+
+    if (virAsprintf(&xml_image_path, "%s/xml-image", to) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Failed to write image path"));
+        goto cleanup;
+    }
The 'to' argument to virDomainSave is defined to be a filename, not a directory. So, as mentioned in the previous patch, you need to get all the dump state into that file, both the XML and the container state.

Regards,
Daniel
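One possible layout that would satisfy the single-file requirement is a fixed header followed by the XML and then the checkpoint payload. The sketch below is only an illustration of that idea. The magic string, the struct save_header fields, the size limit, and both helper names are hypothetical, and the sketch does not address how the CRIU image directory would be packed into the payload.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SAVE_MAGIC   "LXCSAVE1"            /* hypothetical 8-byte magic */
#define SAVE_XML_MAX (16u * 1024 * 1024)   /* hypothetical sanity limit */

/* Written in host byte order; a real format would pin the endianness. */
struct save_header {
    char magic[8];
    uint32_t xml_len;       /* bytes of XML, excluding the NUL */
    uint32_t reserved;
    uint64_t payload_len;   /* bytes of checkpoint data after the XML */
};

/* Write header, XML and payload to 'fp'. Returns 0 on success, -1 on error. */
int
save_image_write(FILE *fp, const char *xml,
                 const void *payload, uint64_t payload_len)
{
    struct save_header hdr;
    size_t xml_len;

    assert(fp != NULL && xml != NULL);

    xml_len = strlen(xml);
    if (xml_len > SAVE_XML_MAX)
        return -1;

    memset(&hdr, 0, sizeof(hdr));
    memcpy(hdr.magic, SAVE_MAGIC, sizeof(hdr.magic));
    hdr.xml_len = (uint32_t)xml_len;
    hdr.payload_len = payload_len;

    if (fwrite(&hdr, sizeof(hdr), 1, fp) != 1 ||
        fwrite(xml, 1, xml_len, fp) != xml_len)
        return -1;
    if (payload_len > 0 &&
        fwrite(payload, 1, (size_t)payload_len, fp) != (size_t)payload_len)
        return -1;
    return 0;
}

/* Read the header and XML back. On success returns a malloc'ed,
 * NUL-terminated XML string, stores the payload size in *payload_len and
 * leaves 'fp' positioned at the start of the payload. Returns NULL on error. */
char *
save_image_read_xml(FILE *fp, uint64_t *payload_len)
{
    struct save_header hdr;
    char *xml;

    assert(fp != NULL && payload_len != NULL);

    if (fread(&hdr, sizeof(hdr), 1, fp) != 1 ||
        memcmp(hdr.magic, SAVE_MAGIC, sizeof(hdr.magic)) != 0 ||
        hdr.xml_len > SAVE_XML_MAX)
        return NULL;
    if (!(xml = malloc((size_t)hdr.xml_len + 1)))
        return NULL;
    if (fread(xml, 1, hdr.xml_len, fp) != hdr.xml_len) {
        free(xml);
        return NULL;
    }
    xml[hdr.xml_len] = '\0';
    *payload_len = hdr.payload_len;
    return xml;
}
```

Storing the XML length in a binary header instead of a decimal text line removes the need for the digit-by-digit length parsing in lxcDomainRestoreFlags, and the magic gives restore an early way to reject files that are not save images.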

On Thu, Jul 21, 2016 at 03:37:22PM +0000, Katerina Koukiou wrote:
This patch series adds support for saving a running lxc domain's state into files with lxcDomainSave and restoring it afterwards from those files with lxcDomainRestore.

Usage: virsh save [domain-name] [domain-id or domain-uuid] [directory name]

I use the CRIU tool (https://criu.org/Main_Page), which offers checkpoint/restore functionality for containers in userspace. So far, I have successfully tried the C/R procedure for simple sh containers and OS containers. I'll mention some notes/issues here:
* I have working C/R only for non-systemd hosts (on a systemd host I was facing problems with CRIU).
* I have not done anything for container networking. That should be done with the --veth-pair IN=OUT option in CRIU.
* In new distros, where the efivars mountpoint exists, CRIU dump fails.
* The only tty restored is /dev/tty1. I'll fix this in another patch, to allow more ttys.
* Currently, for things to work, I have slightly modified the criu source. That is, in criu master I have the following diff:
diff --git a/criu/tty.c b/criu/tty.c
index 302dd54..2226484 100644
--- a/criu/tty.c
+++ b/criu/tty.c
@@ -1394,8 +1394,10 @@ static int verify_info(struct tty_info *info)
          */
         if (term_opts_missing_any(info)) {
                 if (tty_is_master(info)) {
+                        /*
                         pr_err("Corrupted master peer %x\n", info->tfe->id);
                         return -1;
+                        */
                 } else if (!term_opts_missing_all(info)) {
                         pr_err("Corrupted slave peer %x\n", info->tfe->id);
                         return -1;
Lastly, I have a patch ready that adds support for migration, but I will wait for feedback on this series and send the migration one later. Anyway, any comments here are more than welcome.
I'd recommend focusing on getting save/restore more broadly functional before considering anything to do with migration.

Regards,
Daniel
participants (2)
- Daniel P. Berrange
- Katerina Koukiou