[libvirt] [PATCH RFC 0/7] Run qemu under its own namespace

This is just an proof of concept of what has been agreed on here: https://www.redhat.com/archives/libvir-list/2016-November/msg00285.html There is still a lot of to be done: - set up seclabels - implement hot(un-)plug - implement other devices, not just disks I'm sending these in a hope that somebody will at least take a quick look. I'm not looking for a code cleanliness (but if you find some issues feel free to raise them), more than design confirmation. If I'm going in wrong direction I'd rather stop now before digging any deeper. Michal Privoznik (7): virprocess: Introduce virProcessSetupPrivateNS virfile: Introduce virFilePopulateDevices virfile: Introduce virFileSetupDev virfile: Introduce virFileSetupDevPTS virfile: Introduce virFileBindMountDevice qemu: Spawn qemu under mount namespace qemu: Prepare disks when starting a domain src/libvirt_private.syms | 5 + src/lxc/lxc_container.c | 20 +--- src/lxc/lxc_controller.c | 82 +++---------- src/qemu/qemu_domain.c | 306 +++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 12 ++ src/qemu/qemu_process.c | 13 ++ src/util/virfile.c | 116 ++++++++++++++++++ src/util/virfile.h | 21 ++++ src/util/virprocess.c | 24 ++++ src/util/virprocess.h | 2 + 10 files changed, 517 insertions(+), 84 deletions(-) -- 2.8.4

This part of code that LXC currently uses will be reused so move to a generic function. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/lxc/lxc_controller.c | 18 +----------------- src/util/virprocess.c | 24 ++++++++++++++++++++++++ src/util/virprocess.h | 2 ++ 4 files changed, 28 insertions(+), 17 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index ac6a1e1..42650d1 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2246,6 +2246,7 @@ virProcessSetMaxMemLock; virProcessSetMaxProcesses; virProcessSetNamespaces; virProcessSetScheduler; +virProcessSetupPrivateNS; virProcessTranslateStatus; virProcessWait; diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 508bc3e..1bb868a 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -2092,8 +2092,6 @@ lxcCreateTty(virLXCControllerPtr ctrl, int *ttymaster, static int virLXCControllerSetupPrivateNS(void) { - int ret = -1; - /* * If doing a chroot style setup, we need to prepare * a private /dev/pts for the child now, which they @@ -2115,21 +2113,7 @@ virLXCControllerSetupPrivateNS(void) * marked as shared */ - if (unshare(CLONE_NEWNS) < 0) { - virReportSystemError(errno, "%s", - _("Cannot unshare mount namespace")); - goto cleanup; - } - - if (mount("", "/", NULL, MS_SLAVE|MS_REC, NULL) < 0) { - virReportSystemError(errno, "%s", - _("Failed to switch root mount into slave mode")); - goto cleanup; - } - - ret = 0; - cleanup: - return ret; + return virProcessSetupPrivateNS(); } diff --git a/src/util/virprocess.c b/src/util/virprocess.c index 718c4a2..94eacbd 100644 --- a/src/util/virprocess.c +++ b/src/util/virprocess.c @@ -28,6 +28,7 @@ #include <stdlib.h> #include <sys/wait.h> #include <unistd.h> +#include <sys/mount.h> #if HAVE_SETRLIMIT # include <sys/time.h> # include <sys/resource.h> @@ -1146,6 +1147,29 @@ virProcessRunInMountNamespace(pid_t pid, } +int +virProcessSetupPrivateNS(void) +{ + int ret = -1; + + if (unshare(CLONE_NEWNS) < 0) { + virReportSystemError(errno, "%s", + _("Cannot unshare mount namespace")); + goto cleanup; + } + + if (mount("", "/", NULL, MS_SLAVE|MS_REC, NULL) < 0) { + virReportSystemError(errno, "%s", + _("Failed to switch root mount into slave mode")); + goto cleanup; + } + + ret = 0; + cleanup: + return ret; +} + + /** * virProcessExitWithStatus: * @status: raw status to be reproduced when this process dies diff --git a/src/util/virprocess.h b/src/util/virprocess.h index 04e9802..74656d9 100644 --- a/src/util/virprocess.h +++ b/src/util/virprocess.h @@ -90,6 +90,8 @@ int virProcessRunInMountNamespace(pid_t pid, virProcessNamespaceCallback cb, void *opaque); +int virProcessSetupPrivateNS(void); + int virProcessSetScheduler(pid_t pid, virProcessSchedPolicy policy, int priority); -- 2.8.4

On Mon, Nov 14, 2016 at 05:43:25PM +0100, Michal Privoznik wrote:
This part of code that LXC currently uses will be reused so move to a generic function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/lxc/lxc_controller.c | 18 +----------------- src/util/virprocess.c | 24 ++++++++++++++++++++++++ src/util/virprocess.h | 2 ++ 4 files changed, 28 insertions(+), 17 deletions(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index ac6a1e1..42650d1 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2246,6 +2246,7 @@ virProcessSetMaxMemLock; virProcessSetMaxProcesses; virProcessSetNamespaces; virProcessSetScheduler; +virProcessSetupPrivateNS; virProcessTranslateStatus; virProcessWait;
diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 508bc3e..1bb868a 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -2092,8 +2092,6 @@ lxcCreateTty(virLXCControllerPtr ctrl, int *ttymaster, static int virLXCControllerSetupPrivateNS(void) { - int ret = -1; - /* * If doing a chroot style setup, we need to prepare * a private /dev/pts for the child now, which they @@ -2115,21 +2113,7 @@ virLXCControllerSetupPrivateNS(void) * marked as shared */
- if (unshare(CLONE_NEWNS) < 0) { - virReportSystemError(errno, "%s", - _("Cannot unshare mount namespace")); - goto cleanup; - } - - if (mount("", "/", NULL, MS_SLAVE|MS_REC, NULL) < 0) { - virReportSystemError(errno, "%s", - _("Failed to switch root mount into slave mode")); - goto cleanup; - } - - ret = 0; - cleanup: - return ret; + return virProcessSetupPrivateNS(); }
diff --git a/src/util/virprocess.c b/src/util/virprocess.c index 718c4a2..94eacbd 100644 --- a/src/util/virprocess.c +++ b/src/util/virprocess.c @@ -28,6 +28,7 @@ #include <stdlib.h> #include <sys/wait.h> #include <unistd.h> +#include <sys/mount.h> #if HAVE_SETRLIMIT # include <sys/time.h> # include <sys/resource.h> @@ -1146,6 +1147,29 @@ virProcessRunInMountNamespace(pid_t pid, }
+int +virProcessSetupPrivateNS(void) +{ + int ret = -1; + + if (unshare(CLONE_NEWNS) < 0) { + virReportSystemError(errno, "%s", + _("Cannot unshare mount namespace")); + goto cleanup; + } + + if (mount("", "/", NULL, MS_SLAVE|MS_REC, NULL) < 0) { + virReportSystemError(errno, "%s", + _("Failed to switch root mount into slave mode")); + goto cleanup; + } + + ret = 0; + cleanup: + return ret; +} + + /** * virProcessExitWithStatus: * @status: raw status to be reproduced when this process dies diff --git a/src/util/virprocess.h b/src/util/virprocess.h index 04e9802..74656d9 100644 --- a/src/util/virprocess.h +++ b/src/util/virprocess.h @@ -90,6 +90,8 @@ int virProcessRunInMountNamespace(pid_t pid, virProcessNamespaceCallback cb, void *opaque);
+int virProcessSetupPrivateNS(void);
Nitpick s/NS/MountNS/ since there's lots of namespaces and this is only privatizing one of them. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/lxc/lxc_controller.c | 34 ++++++++++++---------------------- src/util/virfile.c | 31 +++++++++++++++++++++++++++++++ src/util/virfile.h | 12 ++++++++++++ 4 files changed, 56 insertions(+), 22 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 42650d1..868905e 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1588,6 +1588,7 @@ virFileMatchesNameSuffix; virFileNBDDeviceAssociate; virFileOpenAs; virFileOpenTty; +virFilePopulateDevices; virFilePrintf; virFileReadAll; virFileReadAllQuiet; diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 1bb868a..130b09f 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -1497,42 +1497,32 @@ static int virLXCControllerPopulateDevices(virLXCControllerPtr ctrl) size_t i; int ret = -1; char *path = NULL; - const struct { - int maj; - int min; - mode_t mode; - const char *path; - } devs[] = { + virFileDevices devs[] = { { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_NULL, 0666, "/null" }, { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_ZERO, 0666, "/zero" }, { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_FULL, 0666, "/full" }, { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_RANDOM, 0666, "/random" }, { LXC_DEV_MAJ_MEMORY, LXC_DEV_MIN_URANDOM, 0666, "/urandom" }, { LXC_DEV_MAJ_TTY, LXC_DEV_MIN_TTY, 0666, "/tty" }, + { 0, 0, 0, NULL}, }; if (virLXCControllerSetupDev(ctrl) < 0) goto cleanup; + if (virAsprintf(&path, "/%s/%s.dev", LXC_STATE_DIR, ctrl->def->name) < 0) + goto cleanup; + + if (virFilePopulateDevices(path, devs) < 0) + goto cleanup; + /* Populate /dev/ with a few important bits */ for (i = 0; i < ARRAY_CARDINALITY(devs); i++) { - if (virAsprintf(&path, "/%s/%s.dev/%s", - LXC_STATE_DIR, ctrl->def->name, devs[i].path) < 0) - goto cleanup; - - dev_t dev = makedev(devs[i].maj, devs[i].min); - if (mknod(path, S_IFCHR, dev) < 0 || - chmod(path, devs[i].mode)) { - virReportSystemError(errno, - _("Failed to make device %s"), - path); - goto cleanup; - } - - if (lxcContainerChown(ctrl->def, path) < 0) - goto cleanup; - VIR_FREE(path); + if (virAsprintf(&path, "/%s/%s.dev/%s", + LXC_STATE_DIR, ctrl->def->name, devs[i].path) < 0 || + lxcContainerChown(ctrl->def, path) < 0) + goto cleanup; } ret = 0; diff --git a/src/util/virfile.c b/src/util/virfile.c index a45279a..eae4db4 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -3501,3 +3501,34 @@ int virFileIsSharedFS(const char *path) VIR_FILE_SHFS_SMB | VIR_FILE_SHFS_CIFS); } + + +int +virFilePopulateDevices(const char *prefix, + const virFileDevices *const devs) +{ + size_t i; + int ret = -1; + char *path = NULL; + + for (i = 0; devs && devs[i].path; i++) { + if (virAsprintf(&path, "%s/%s", prefix, devs[i].path) < 0) + goto cleanup; + + dev_t dev = makedev(devs[i].maj, devs[i].min); + if (mknod(path, S_IFCHR, dev) < 0 || + chmod(path, devs[i].mode)) { + virReportSystemError(errno, + _("Failed to make device %s"), + path); + goto cleanup; + } + + VIR_FREE(path); + } + + ret = 0; + cleanup: + VIR_FREE(path); + return ret; +} diff --git a/src/util/virfile.h b/src/util/virfile.h index b4ae6ea..3b3dd13 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -307,4 +307,16 @@ int virFileGetHugepageSize(const char *path, unsigned long long *size); int virFileFindHugeTLBFS(virHugeTLBFSPtr *ret_fs, size_t *ret_nfs); + +typedef struct _virFileDevices virFileDevices; +typedef virFileDevices *virFileDevicesPtr; +struct _virFileDevices { + int maj; + int min; + mode_t mode; + const char *path; +}; + +int virFilePopulateDevices(const char *prefix, + const virFileDevices *const devs); #endif /* __VIR_FILE_H */ -- 2.8.4

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/lxc/lxc_controller.c | 14 +------------- src/util/virfile.c | 30 ++++++++++++++++++++++++++++++ src/util/virfile.h | 3 +++ 4 files changed, 35 insertions(+), 13 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 868905e..b984fa1 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1602,6 +1602,7 @@ virFileResolveAllLinks; virFileResolveLink; virFileRewrite; virFileSanitizePath; +virFileSetupDev; virFileSkipRoot; virFileStripSuffix; virFileTouch; diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 130b09f..3ab155f 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -1457,12 +1457,6 @@ static int virLXCControllerSetupDev(virLXCControllerPtr ctrl) LXC_STATE_DIR, ctrl->def->name) < 0) goto cleanup; - if (virFileMakePath(dev) < 0) { - virReportSystemError(errno, - _("Failed to make path %s"), dev); - goto cleanup; - } - /* * tmpfs is limited to 64kb, since we only have device nodes in there * and don't want to DOS the entire OS RAM usage @@ -1472,14 +1466,8 @@ static int virLXCControllerSetupDev(virLXCControllerPtr ctrl) "mode=755,size=65536%s", mount_options) < 0) goto cleanup; - VIR_DEBUG("Mount devfs on %s type=tmpfs flags=%x, opts=%s", - dev, MS_NOSUID, opts); - if (mount("devfs", dev, "tmpfs", MS_NOSUID, opts) < 0) { - virReportSystemError(errno, - _("Failed to mount devfs on %s type %s (%s)"), - dev, "tmpfs", opts); + if (virFileSetupDev(dev, opts) < 0) goto cleanup; - } if (lxcContainerChown(ctrl->def, dev) < 0) goto cleanup; diff --git a/src/util/virfile.c b/src/util/virfile.c index eae4db4..d86acbf 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -32,6 +32,7 @@ #include <sys/types.h> #include <sys/socket.h> #include <sys/wait.h> +#include <sys/mount.h> #include <unistd.h> #include <dirent.h> #include <dirname.h> @@ -3532,3 +3533,32 @@ virFilePopulateDevices(const char *prefix, VIR_FREE(path); return ret; } + + +int +virFileSetupDev(const char *path, + const char *mount_options) +{ + const unsigned long mount_flags = MS_NOSUID; + const char *mount_fs = "tmpfs"; + int ret = -1; + + if (virFileMakePath(path) < 0) { + virReportSystemError(errno, + _("Failed to make path %s"), path); + goto cleanup; + } + + VIR_DEBUG("Mount devfs on %s type=tmpfs flags=%lx, opts=%s", + path, mount_flags, mount_options); + if (mount("devfs", path, mount_fs, mount_flags, mount_options) < 0) { + virReportSystemError(errno, + _("Failed to mount devfs on %s type %s (%s)"), + path, mount_fs, mount_options); + goto cleanup; + } + + ret = 0; + cleanup: + return ret; +} diff --git a/src/util/virfile.h b/src/util/virfile.h index 3b3dd13..1b3830d 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -319,4 +319,7 @@ struct _virFileDevices { int virFilePopulateDevices(const char *prefix, const virFileDevices *const devs); + +int virFileSetupDev(const char *path, + const char *mount_options); #endif /* __VIR_FILE_H */ -- 2.8.4

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/lxc/lxc_controller.c | 16 +--------------- src/util/virfile.c | 43 +++++++++++++++++++++++++++++++++++++++++-- src/util/virfile.h | 4 ++++ 4 files changed, 47 insertions(+), 17 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b984fa1..9653247 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1603,6 +1603,7 @@ virFileResolveLink; virFileRewrite; virFileSanitizePath; virFileSetupDev; +virFileSetupDevPTS; virFileSkipRoot; virFileStripSuffix; virFileTouch; diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 3ab155f..d5e8359 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -2110,8 +2110,6 @@ virLXCControllerSetupDevPTS(virLXCControllerPtr ctrl) ctrl->def); if (virAsprintf(&devpts, "%s/%s.devpts", - LXC_STATE_DIR, ctrl->def->name) < 0 || - virAsprintf(&ctrl->devptmx, "%s/%s.devpts/ptmx", LXC_STATE_DIR, ctrl->def->name) < 0) goto cleanup; @@ -2133,20 +2131,8 @@ virLXCControllerSetupDevPTS(virLXCControllerPtr ctrl) ptsgid, (mount_options ? mount_options : "")) < 0) goto cleanup; - VIR_DEBUG("Mount devpts on %s type=tmpfs flags=%x, opts=%s", - devpts, MS_NOSUID, opts); - if (mount("devpts", devpts, "devpts", MS_NOSUID, opts) < 0) { - virReportSystemError(errno, - _("Failed to mount devpts on %s"), - devpts); + if (virFileSetupDevPTS(devpts, opts, &ctrl->devptmx) < 0) goto cleanup; - } - - if (access(ctrl->devptmx, R_OK) < 0) { - virReportSystemError(ENOSYS, "%s", - _("Kernel does not support private devpts")); - goto cleanup; - } if ((lxcContainerChown(ctrl->def, ctrl->devptmx) < 0) || (lxcContainerChown(ctrl->def, devpts) < 0)) diff --git a/src/util/virfile.c b/src/util/virfile.c index d86acbf..aa81ae3 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -3549,8 +3549,8 @@ virFileSetupDev(const char *path, goto cleanup; } - VIR_DEBUG("Mount devfs on %s type=tmpfs flags=%lx, opts=%s", - path, mount_flags, mount_options); + VIR_DEBUG("Mount devfs on %s type=%s flags=%lx, opts=%s", + path, mount_fs, mount_flags, mount_options); if (mount("devfs", path, mount_fs, mount_flags, mount_options) < 0) { virReportSystemError(errno, _("Failed to mount devfs on %s type %s (%s)"), @@ -3562,3 +3562,42 @@ virFileSetupDev(const char *path, cleanup: return ret; } + + +int +virFileSetupDevPTS(const char *path, + const char *mount_options, + char **ptmx_ret) +{ + const unsigned long mount_flags = MS_NOSUID; + const char *mount_fs = "devpts"; + char *devptmx = NULL; + int ret = -1; + + if (virAsprintf(&devptmx, "%s/ptmx", path) < 0) + goto cleanup; + + VIR_DEBUG("Mount devpts on %s type=%s flags=%lx, opts=%s", + path, mount_fs, mount_flags, mount_options); + if (mount("devpts", path, mount_fs, mount_flags, mount_options) < 0) { + virReportSystemError(errno, + _("Failed to mount devpts on %s type %s (%s)"), + path, mount_fs, mount_options); + goto cleanup; + } + + if (access(devptmx, R_OK) < 0) { + virReportSystemError(ENOSYS, "%s", + _("Kernel does not support private devpts")); + goto cleanup; + } + + if (ptmx_ret) { + *ptmx_ret = devptmx; + devptmx = NULL; + } + ret = 0; + cleanup: + VIR_FREE(devptmx); + return ret; +} diff --git a/src/util/virfile.h b/src/util/virfile.h index 1b3830d..2dab6e7 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -322,4 +322,8 @@ int virFilePopulateDevices(const char *prefix, int virFileSetupDev(const char *path, const char *mount_options); + +int virFileSetupDevPTS(const char *path, + const char *mount_options, + char **ptmx_ret); #endif /* __VIR_FILE_H */ -- 2.8.4

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/lxc/lxc_container.c | 20 +++----------------- src/util/virfile.c | 16 ++++++++++++++++ src/util/virfile.h | 2 ++ 4 files changed, 22 insertions(+), 17 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 9653247..c8dea1f 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1556,6 +1556,7 @@ virDirRead; virFileAbsPath; virFileAccessibleAs; virFileActivateDirOverride; +virFileBindMountDevice; virFileBuildPath; virFileClose; virFileDeleteTree; diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 5357df4..5d7da8f 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -1112,20 +1112,6 @@ static int lxcContainerMountFSDevPTS(virDomainDefPtr def, return ret; } -static int lxcContainerBindMountDevice(const char *src, const char *dst) -{ - if (virFileTouch(dst, 0666) < 0) - return -1; - - if (mount(src, dst, "none", MS_BIND, NULL) < 0) { - virReportSystemError(errno, _("Failed to bind %s on to %s"), src, - dst); - return -1; - } - - return 0; -} - static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths) { size_t i; @@ -1149,7 +1135,7 @@ static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths) } /* We have private devpts capability, so bind that */ - if (lxcContainerBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0) + if (virFileBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0) return -1; for (i = 0; i < nttyPaths; i++) { @@ -1157,7 +1143,7 @@ static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths) if (virAsprintf(&tty, "/dev/tty%zu", i+1) < 0) return -1; - if (lxcContainerBindMountDevice(ttyPaths[i], tty) < 0) { + if (virFileBindMountDevice(ttyPaths[i], tty) < 0) { return -1; VIR_FREE(tty); } @@ -1165,7 +1151,7 @@ static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths) VIR_FREE(tty); if (i == 0 && - lxcContainerBindMountDevice(ttyPaths[i], "/dev/console") < 0) + virFileBindMountDevice(ttyPaths[i], "/dev/console") < 0) return -1; } return 0; diff --git a/src/util/virfile.c b/src/util/virfile.c index aa81ae3..0ddca01 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -3601,3 +3601,19 @@ virFileSetupDevPTS(const char *path, VIR_FREE(devptmx); return ret; } + + +int +virFileBindMountDevice(const char *src, const char *dst) +{ + if (virFileTouch(dst, 0666) < 0) + return -1; + + if (mount(src, dst, "none", MS_BIND, NULL) < 0) { + virReportSystemError(errno, _("Failed to bind %s on to %s"), src, + dst); + return -1; + } + + return 0; +} diff --git a/src/util/virfile.h b/src/util/virfile.h index 2dab6e7..23a5afb 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -326,4 +326,6 @@ int virFileSetupDev(const char *path, int virFileSetupDevPTS(const char *path, const char *mount_options, char **ptmx_ret); + +int virFileBindMountDevice(const char *src, const char *dst); #endif /* __VIR_FILE_H */ -- 2.8.4

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 233 ++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 8 ++ src/qemu/qemu_process.c | 13 +++ 3 files changed, 254 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 8cba755..3a0170c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -55,6 +55,7 @@ #include <sys/time.h> #include <fcntl.h> +#include <sys/mount.h> #include <libxml/xpathInternals.h> @@ -86,6 +87,21 @@ VIR_ENUM_IMPL(qemuDomainAsyncJob, QEMU_ASYNC_JOB_LAST, "start", ); +#define QEMU_DEV_MAJ_MEMORY 1 +#define QEMU_DEV_MAJ_TTY 5 +#define QEMU_DEV_MAJ_KVM 10 +#define QEMU_DEV_MAJ_PTY 136 + +#define QEMU_DEV_MIN_CONSOLE 1 +#define QEMU_DEV_MIN_FULL 7 +#define QEMU_DEV_MIN_FUSE 229 +#define QEMU_DEV_MIN_KVM 232 +#define QEMU_DEV_MIN_NULL 3 +#define QEMU_DEV_MIN_PTMX 2 +#define QEMU_DEV_MIN_RANDOM 8 +#define QEMU_DEV_MIN_TTY 0 +#define QEMU_DEV_MIN_URANDOM 9 +#define QEMU_DEV_MIN_ZERO 5 struct _qemuDomainLogContext { int refs; @@ -6658,3 +6674,220 @@ qemuDomainSupportsVideoVga(virDomainVideoDefPtr video, return true; } + + +static int +qemuDomainPopulateDevices(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + const char *path) +{ + int ret = -1; + virFileDevices devs[] = { + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_NULL, 0666, "/null" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_ZERO, 0666, "/zero" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_FULL, 0666, "/full" }, + { QEMU_DEV_MAJ_KVM, QEMU_DEV_MIN_KVM, 0660, "/kvm"}, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_RANDOM, 0666, "/random" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_URANDOM, 0666, "/urandom" }, + { QEMU_DEV_MAJ_TTY, QEMU_DEV_MIN_TTY, 0666, "/tty" }, + { 0, 0, 0, NULL}, + }; + + if (virFilePopulateDevices(path, devs) < 0) + goto cleanup; + + ret = 0; + cleanup: + return ret; +} + + +static int +qemuDomainSetupDev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *path) +{ + char *mount_options = NULL; + char *opts = NULL; + int ret = -1; + + VIR_DEBUG("Setting up /dev/ for domain %s", vm->def->name); + + mount_options = virSecurityManagerGetMountOptions(driver->securityManager, + vm->def); + + if (!mount_options && + VIR_STRDUP(mount_options, "") < 0) + goto cleanup; + + /* + * tmpfs is limited to 64kb, since we only have device nodes in there + * and don't want to DOS the entire OS RAM usage + */ + if (virAsprintf(&opts, + "mode=755,size=65536%s", mount_options) < 0) + goto cleanup; + + if (virFileSetupDev(path, opts) < 0) + goto cleanup; + + if (qemuDomainPopulateDevices(driver, vm, path) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(opts); + VIR_FREE(mount_options); + return ret; +} + + +static int +qemuDomainSetupDevPTS(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *path) +{ + char *mount_options = NULL; + char *opts = NULL; + int ret = -1; + const gid_t ptsgid = 5; + + mount_options = virSecurityManagerGetMountOptions(driver->securityManager, + vm->def); + + if (!mount_options && + VIR_STRDUP(mount_options, "") < 0) + goto cleanup; + + if (virAsprintf(&opts, "newinstance,ptmxmode=0666,mode=0620,gid=%u%s", + ptsgid, (mount_options ? mount_options : "")) < 0) + goto cleanup; + + if (virFileSetupDevPTS(path, opts, NULL) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(opts); + VIR_FREE(mount_options); + return ret; +} + + +int +qemuDomainBuildNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + const unsigned long mount_flags = MS_MOVE; + char *devPath = NULL; + char *devptsPath = NULL; + int ret = -1; + + if (virAsprintf(&devPath, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0 || + virAsprintf(&devptsPath, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (qemuDomainSetupDev(driver, vm, devPath) < 0) + goto cleanup; + + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev"), + devPath); + goto cleanup; + } + + if (qemuDomainSetupDevPTS(driver, vm, devptsPath) < 0) + goto cleanup; + + if (virFileMakePath("/dev/pts") < 0) { + virReportSystemError(errno, "%s", + _("Cannot create /dev/pts")); + goto cleanup; + } + + if (mount(devptsPath, "/dev/pts", NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev/pts"), + devptsPath); + goto cleanup; + } + + if (virFileBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0) + goto cleanup; + + VIR_DEBUG("blaaah: %d", system("ls -l /dev > /tmp/blaaah")); + VIR_DEBUG("blaaah: %d", system("ls -l /dev/pts >> /tmp/blaaah")); + VIR_DEBUG("blaaah: %d", system("mount >> /tmp/blaaah")); + ret = 0; + cleanup: + VIR_FREE(devPath); + return ret; +} + + +int +qemuDomainCreateNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + int ret = -1; + char *path = NULL; + + if (virAsprintf(&path, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (virFileMakePath(path) < 0) { + virReportSystemError(errno, + _("Failed to create %s"), + path); + goto cleanup; + } + + VIR_FREE(path); + if (virAsprintf(&path, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (virFileMakePath(path) < 0) { + virReportSystemError(errno, + _("Failed to create %s"), + path); + goto cleanup; + } + + ret = 0; + cleanup: + VIR_FREE(path); + virObjectUnref(cfg); + return ret; +} + + +void +qemuDomainDeleteNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + char *path; + + if (virAsprintf(&path, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + virFileDeleteTree(path); + + VIR_FREE(path); + if (virAsprintf(&path, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + virFileDeleteTree(path); + cleanup: + virObjectUnref(cfg); + VIR_FREE(path); +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 0c1689b..8e6d199 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -784,4 +784,12 @@ int qemuDomainCheckMonitor(virQEMUDriverPtr driver, bool qemuDomainSupportsVideoVga(virDomainVideoDefPtr video, virQEMUCapsPtr qemuCaps); +int qemuDomainBuildNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); + +int qemuDomainCreateNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); + +void qemuDomainDeleteNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 09b2a72..522ac19 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2664,6 +2664,12 @@ static int qemuProcessHook(void *data) if (virSecurityManagerClearSocketLabel(h->driver->securityManager, h->vm->def) < 0) goto cleanup; + if (virProcessSetupPrivateNS() < 0) + goto cleanup; + + if (qemuDomainBuildNamespace(h->driver, h->vm) < 0) + goto cleanup; + if (virDomainNumatuneGetMode(h->vm->def->numa, -1, &mode) == 0) { if (mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT && h->cfg->cgroupControllers & (1 << VIR_CGROUP_CONTROLLER_CPUSET) && @@ -5431,6 +5437,11 @@ qemuProcessLaunch(virConnectPtr conn, qemuDomainLogContextMarkPosition(logCtxt); + VIR_DEBUG("Building mount namespace"); + + if (qemuDomainCreateNamespace(driver, vm) < 0) + goto cleanup; + VIR_DEBUG("Clear emulator capabilities: %d", cfg->clearEmulatorCapabilities); if (cfg->clearEmulatorCapabilities) @@ -6200,6 +6211,8 @@ void qemuProcessStop(virQEMUDriverPtr driver, } } + qemuDomainDeleteNamespace(driver, vm); + vm->taint = 0; vm->pid = -1; virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason); -- 2.8.4

On Mon, Nov 14, 2016 at 05:43:30PM +0100, Michal Privoznik wrote:
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 233 ++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 8 ++ src/qemu/qemu_process.c | 13 +++ 3 files changed, 254 insertions(+)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 8cba755..3a0170c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -55,6 +55,7 @@
#include <sys/time.h> #include <fcntl.h> +#include <sys/mount.h>
#include <libxml/xpathInternals.h>
@@ -86,6 +87,21 @@ VIR_ENUM_IMPL(qemuDomainAsyncJob, QEMU_ASYNC_JOB_LAST, "start", );
+#define QEMU_DEV_MAJ_MEMORY 1 +#define QEMU_DEV_MAJ_TTY 5 +#define QEMU_DEV_MAJ_KVM 10 +#define QEMU_DEV_MAJ_PTY 136 + +#define QEMU_DEV_MIN_CONSOLE 1 +#define QEMU_DEV_MIN_FULL 7 +#define QEMU_DEV_MIN_FUSE 229 +#define QEMU_DEV_MIN_KVM 232 +#define QEMU_DEV_MIN_NULL 3 +#define QEMU_DEV_MIN_PTMX 2 +#define QEMU_DEV_MIN_RANDOM 8 +#define QEMU_DEV_MIN_TTY 0 +#define QEMU_DEV_MIN_URANDOM 9 +#define QEMU_DEV_MIN_ZERO 5
struct _qemuDomainLogContext { int refs; @@ -6658,3 +6674,220 @@ qemuDomainSupportsVideoVga(virDomainVideoDefPtr video,
return true; } + + +static int +qemuDomainPopulateDevices(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + const char *path) +{ + int ret = -1; + virFileDevices devs[] = { + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_NULL, 0666, "/null" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_ZERO, 0666, "/zero" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_FULL, 0666, "/full" }, + { QEMU_DEV_MAJ_KVM, QEMU_DEV_MIN_KVM, 0660, "/kvm"}, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_RANDOM, 0666, "/random" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_URANDOM, 0666, "/urandom" }, + { QEMU_DEV_MAJ_TTY, QEMU_DEV_MIN_TTY, 0666, "/tty" }, + { 0, 0, 0, NULL}, + }; + + if (virFilePopulateDevices(path, devs) < 0) + goto cleanup; + + ret = 0; + cleanup: + return ret; +} + + +static int +qemuDomainSetupDev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *path) +{ + char *mount_options = NULL; + char *opts = NULL; + int ret = -1; + + VIR_DEBUG("Setting up /dev/ for domain %s", vm->def->name); + + mount_options = virSecurityManagerGetMountOptions(driver->securityManager, + vm->def); + + if (!mount_options && + VIR_STRDUP(mount_options, "") < 0) + goto cleanup; + + /* + * tmpfs is limited to 64kb, since we only have device nodes in there + * and don't want to DOS the entire OS RAM usage + */ + if (virAsprintf(&opts, + "mode=755,size=65536%s", mount_options) < 0) + goto cleanup; + + if (virFileSetupDev(path, opts) < 0) + goto cleanup; + + if (qemuDomainPopulateDevices(driver, vm, path) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(opts); + VIR_FREE(mount_options); + return ret; +} + + +static int +qemuDomainSetupDevPTS(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *path) +{ + char *mount_options = NULL; + char *opts = NULL; + int ret = -1; + const gid_t ptsgid = 5; + + mount_options = virSecurityManagerGetMountOptions(driver->securityManager, + vm->def); + + if (!mount_options && + VIR_STRDUP(mount_options, "") < 0) + goto cleanup; + + if (virAsprintf(&opts, "newinstance,ptmxmode=0666,mode=0620,gid=%u%s", + ptsgid, (mount_options ? mount_options : "")) < 0) + goto cleanup; + + if (virFileSetupDevPTS(path, opts, NULL) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(opts); + VIR_FREE(mount_options); + return ret; +}
This is going to cause problems - the /dev/pts/NNN nodes that QEMU has open (for its chardev backend) are exposed in the guest XML and are intended to be accessed by other processes on the host. By setting up a new /dev/pts mount with newinstance, the /dev/pts/NNN nodes for chardevs will be invisible to the host OS. So while you want to replace /dev completely, you need to keep the original /dev/pts mount unchanged. Practically speaking this means that after doing the unshare() you need to use mount() with MS_MOVE to move /dev/pts somewhere safe, then unmount the old /dev, mount our new /dev instance and then finally MS_MOVE the saved /dev/pts back in place. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 14.11.2016 17:55, Daniel P. Berrange wrote:
On Mon, Nov 14, 2016 at 05:43:30PM +0100, Michal Privoznik wrote:
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 233 ++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 8 ++ src/qemu/qemu_process.c | 13 +++ 3 files changed, 254 insertions(+)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 8cba755..3a0170c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -55,6 +55,7 @@
#include <sys/time.h> #include <fcntl.h> +#include <sys/mount.h>
#include <libxml/xpathInternals.h>
@@ -86,6 +87,21 @@ VIR_ENUM_IMPL(qemuDomainAsyncJob, QEMU_ASYNC_JOB_LAST, "start", );
+#define QEMU_DEV_MAJ_MEMORY 1 +#define QEMU_DEV_MAJ_TTY 5 +#define QEMU_DEV_MAJ_KVM 10 +#define QEMU_DEV_MAJ_PTY 136 + +#define QEMU_DEV_MIN_CONSOLE 1 +#define QEMU_DEV_MIN_FULL 7 +#define QEMU_DEV_MIN_FUSE 229 +#define QEMU_DEV_MIN_KVM 232 +#define QEMU_DEV_MIN_NULL 3 +#define QEMU_DEV_MIN_PTMX 2 +#define QEMU_DEV_MIN_RANDOM 8 +#define QEMU_DEV_MIN_TTY 0 +#define QEMU_DEV_MIN_URANDOM 9 +#define QEMU_DEV_MIN_ZERO 5
struct _qemuDomainLogContext { int refs; @@ -6658,3 +6674,220 @@ qemuDomainSupportsVideoVga(virDomainVideoDefPtr video,
return true; } + + +static int +qemuDomainPopulateDevices(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + const char *path) +{ + int ret = -1; + virFileDevices devs[] = { + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_NULL, 0666, "/null" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_ZERO, 0666, "/zero" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_FULL, 0666, "/full" }, + { QEMU_DEV_MAJ_KVM, QEMU_DEV_MIN_KVM, 0660, "/kvm"}, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_RANDOM, 0666, "/random" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_URANDOM, 0666, "/urandom" }, + { QEMU_DEV_MAJ_TTY, QEMU_DEV_MIN_TTY, 0666, "/tty" }, + { 0, 0, 0, NULL}, + }; + + if (virFilePopulateDevices(path, devs) < 0) + goto cleanup; + + ret = 0; + cleanup: + return ret; +} + + +static int +qemuDomainSetupDev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *path) +{ + char *mount_options = NULL; + char *opts = NULL; + int ret = -1; + + VIR_DEBUG("Setting up /dev/ for domain %s", vm->def->name); + + mount_options = virSecurityManagerGetMountOptions(driver->securityManager, + vm->def); + + if (!mount_options && + VIR_STRDUP(mount_options, "") < 0) + goto cleanup; + + /* + * tmpfs is limited to 64kb, since we only have device nodes in there + * and don't want to DOS the entire OS RAM usage + */ + if (virAsprintf(&opts, + "mode=755,size=65536%s", mount_options) < 0) + goto cleanup; + + if (virFileSetupDev(path, opts) < 0) + goto cleanup; + + if (qemuDomainPopulateDevices(driver, vm, path) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(opts); + VIR_FREE(mount_options); + return ret; +} + + +static int +qemuDomainSetupDevPTS(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *path) +{ + char *mount_options = NULL; + char *opts = NULL; + int ret = -1; + const gid_t ptsgid = 5; + + mount_options = virSecurityManagerGetMountOptions(driver->securityManager, + vm->def); + + if (!mount_options && + VIR_STRDUP(mount_options, "") < 0) + goto cleanup; + + if (virAsprintf(&opts, "newinstance,ptmxmode=0666,mode=0620,gid=%u%s", + ptsgid, (mount_options ? mount_options : "")) < 0) + goto cleanup; + + if (virFileSetupDevPTS(path, opts, NULL) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(opts); + VIR_FREE(mount_options); + return ret; +}
This is going to cause problems - the /dev/pts/NNN nodes that QEMU has open (for its chardev backend) are exposed in the guest XML and are intended to be accessed by other processes on the host. By setting up a new /dev/pts mount with newinstance, the /dev/pts/NNN nodes for chardevs will be invisible to the host OS.
So while you want to replace /dev completely, you need to keep the original /dev/pts mount unchanged. Practically speaking this means that after doing the unshare() you need to use mount() with MS_MOVE to move /dev/pts somewhere safe, then unmount the old /dev, mount our new /dev instance and then finally MS_MOVE the saved /dev/pts back in place.
Ah, good point. I will fix this. Thanks for such quick review. Michal

On Mon, Nov 14, 2016 at 05:43:30PM +0100, Michal Privoznik wrote:
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 233 ++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 8 ++ src/qemu/qemu_process.c | 13 +++ 3 files changed, 254 insertions(+)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 8cba755..3a0170c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -55,6 +55,7 @@
#include <sys/time.h> #include <fcntl.h> +#include <sys/mount.h>
#include <libxml/xpathInternals.h>
@@ -86,6 +87,21 @@ VIR_ENUM_IMPL(qemuDomainAsyncJob, QEMU_ASYNC_JOB_LAST, "start", );
+#define QEMU_DEV_MAJ_MEMORY 1 +#define QEMU_DEV_MAJ_TTY 5 +#define QEMU_DEV_MAJ_KVM 10 +#define QEMU_DEV_MAJ_PTY 136 + +#define QEMU_DEV_MIN_CONSOLE 1 +#define QEMU_DEV_MIN_FULL 7 +#define QEMU_DEV_MIN_FUSE 229 +#define QEMU_DEV_MIN_KVM 232 +#define QEMU_DEV_MIN_NULL 3 +#define QEMU_DEV_MIN_PTMX 2 +#define QEMU_DEV_MIN_RANDOM 8 +#define QEMU_DEV_MIN_TTY 0 +#define QEMU_DEV_MIN_URANDOM 9 +#define QEMU_DEV_MIN_ZERO 5
struct _qemuDomainLogContext { int refs; @@ -6658,3 +6674,220 @@ qemuDomainSupportsVideoVga(virDomainVideoDefPtr video,
return true; } + + +static int +qemuDomainPopulateDevices(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + const char *path) +{ + int ret = -1; + virFileDevices devs[] = { + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_NULL, 0666, "/null" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_ZERO, 0666, "/zero" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_FULL, 0666, "/full" }, + { QEMU_DEV_MAJ_KVM, QEMU_DEV_MIN_KVM, 0660, "/kvm"}, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_RANDOM, 0666, "/random" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_URANDOM, 0666, "/urandom" }, + { QEMU_DEV_MAJ_TTY, QEMU_DEV_MIN_TTY, 0666, "/tty" },
BTW, QEMU shouldn't need /dev/tty Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 14.11.2016 17:57, Daniel P. Berrange wrote:
On Mon, Nov 14, 2016 at 05:43:30PM +0100, Michal Privoznik wrote:
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 233 ++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 8 ++ src/qemu/qemu_process.c | 13 +++ 3 files changed, 254 insertions(+)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 8cba755..3a0170c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -55,6 +55,7 @@
#include <sys/time.h> #include <fcntl.h> +#include <sys/mount.h>
#include <libxml/xpathInternals.h>
@@ -86,6 +87,21 @@ VIR_ENUM_IMPL(qemuDomainAsyncJob, QEMU_ASYNC_JOB_LAST, "start", );
+#define QEMU_DEV_MAJ_MEMORY 1 +#define QEMU_DEV_MAJ_TTY 5 +#define QEMU_DEV_MAJ_KVM 10 +#define QEMU_DEV_MAJ_PTY 136 + +#define QEMU_DEV_MIN_CONSOLE 1 +#define QEMU_DEV_MIN_FULL 7 +#define QEMU_DEV_MIN_FUSE 229 +#define QEMU_DEV_MIN_KVM 232 +#define QEMU_DEV_MIN_NULL 3 +#define QEMU_DEV_MIN_PTMX 2 +#define QEMU_DEV_MIN_RANDOM 8 +#define QEMU_DEV_MIN_TTY 0 +#define QEMU_DEV_MIN_URANDOM 9 +#define QEMU_DEV_MIN_ZERO 5
struct _qemuDomainLogContext { int refs; @@ -6658,3 +6674,220 @@ qemuDomainSupportsVideoVga(virDomainVideoDefPtr video,
return true; } + + +static int +qemuDomainPopulateDevices(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + const char *path) +{ + int ret = -1; + virFileDevices devs[] = { + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_NULL, 0666, "/null" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_ZERO, 0666, "/zero" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_FULL, 0666, "/full" }, + { QEMU_DEV_MAJ_KVM, QEMU_DEV_MIN_KVM, 0660, "/kvm"}, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_RANDOM, 0666, "/random" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_URANDOM, 0666, "/urandom" }, + { QEMU_DEV_MAJ_TTY, QEMU_DEV_MIN_TTY, 0666, "/tty" },
BTW, QEMU shouldn't need /dev/tty
Yeah, I'm probably gonna replace this with cfg->cgroupDeviceACL (or with defaultDeviceACL[] from qemu_cgroup.c) anyway because some files are missing here. Michal

On Mon, Nov 14, 2016 at 06:07:43PM +0100, Michal Privoznik wrote:
On 14.11.2016 17:57, Daniel P. Berrange wrote:
On Mon, Nov 14, 2016 at 05:43:30PM +0100, Michal Privoznik wrote:
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 233 ++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 8 ++ src/qemu/qemu_process.c | 13 +++ 3 files changed, 254 insertions(+)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 8cba755..3a0170c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -55,6 +55,7 @@
#include <sys/time.h> #include <fcntl.h> +#include <sys/mount.h>
#include <libxml/xpathInternals.h>
@@ -86,6 +87,21 @@ VIR_ENUM_IMPL(qemuDomainAsyncJob, QEMU_ASYNC_JOB_LAST, "start", );
+#define QEMU_DEV_MAJ_MEMORY 1 +#define QEMU_DEV_MAJ_TTY 5 +#define QEMU_DEV_MAJ_KVM 10 +#define QEMU_DEV_MAJ_PTY 136 + +#define QEMU_DEV_MIN_CONSOLE 1 +#define QEMU_DEV_MIN_FULL 7 +#define QEMU_DEV_MIN_FUSE 229 +#define QEMU_DEV_MIN_KVM 232 +#define QEMU_DEV_MIN_NULL 3 +#define QEMU_DEV_MIN_PTMX 2 +#define QEMU_DEV_MIN_RANDOM 8 +#define QEMU_DEV_MIN_TTY 0 +#define QEMU_DEV_MIN_URANDOM 9 +#define QEMU_DEV_MIN_ZERO 5
struct _qemuDomainLogContext { int refs; @@ -6658,3 +6674,220 @@ qemuDomainSupportsVideoVga(virDomainVideoDefPtr video,
return true; } + + +static int +qemuDomainPopulateDevices(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + const char *path) +{ + int ret = -1; + virFileDevices devs[] = { + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_NULL, 0666, "/null" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_ZERO, 0666, "/zero" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_FULL, 0666, "/full" }, + { QEMU_DEV_MAJ_KVM, QEMU_DEV_MIN_KVM, 0660, "/kvm"}, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_RANDOM, 0666, "/random" }, + { QEMU_DEV_MAJ_MEMORY, QEMU_DEV_MIN_URANDOM, 0666, "/urandom" }, + { QEMU_DEV_MAJ_TTY, QEMU_DEV_MIN_TTY, 0666, "/tty" },
BTW, QEMU shouldn't need /dev/tty
Yeah, I'm probably gonna replace this with cfg->cgroupDeviceACL (or with defaultDeviceACL[] from qemu_cgroup.c) anyway because some files are missing here.
Arguably we should not really need to hardcode the MAJ/MIN numbers in here at all. We can just stat() the /dev/FOO file in the host to learn the correct major/minor number and copy that. We also don't really need to care about the permissions either - they can all be 0600 since we can either immediately give ownership to the 'qemu' user, or the DAC driver will do that for us. Either way we don't need to change perms per device. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 4 +++ 2 files changed, 77 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 3a0170c..572adf4 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6775,6 +6775,76 @@ qemuDomainSetupDevPTS(virQEMUDriverPtr driver, int +qemuDomainNamespaceSetupDisk(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainDiskDefPtr disk, + const char *devPath) +{ + virStorageSourcePtr next; + char *dst = NULL; + int ret = -1; + + for (next = disk->src; next; next = next->backingStore) { + struct stat sb; + mode_t mode; + + if (!next->path || !virStorageSourceIsLocalStorage(next) || + !STRPREFIX(next->path, "/dev")) { + /* Not creating device. Just continue. */ + continue; + } + + if (stat(next->path, &sb) < 0) { + virReportSystemError(errno, + _("Unable to access %s"), next->path); + goto cleanup; + } + + VIR_FREE(dst); + if (virAsprintf(&dst, "%s/%s", devPath, next->path + 4) < 0) + goto cleanup; + + mode = 0700; + if (S_ISCHR(sb.st_mode)) + mode |= S_IFCHR; + else + mode |= S_IFBLK; + + if (mknod(dst, mode, sb.st_rdev) < 0) { + virReportSystemError(errno, + _("Unable to create device %s"), + dst); + goto cleanup; + } + } + + ret = 0; + cleanup: + VIR_FREE(dst); + return ret; +} + + +static int +qemuDomainSetupAllDisks(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *devPath) +{ + size_t i; + VIR_DEBUG("Setting up disks"); + + for (i = 0; i < vm->def->ndisks; i++) { + if (qemuDomainNamespaceSetupDisk(driver, + vm->def->disks[i], + devPath) < 0) + return -1; + } + + VIR_DEBUG("Setup all disks"); + return 0; +} + + +int qemuDomainBuildNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm) { @@ -6793,6 +6863,9 @@ qemuDomainBuildNamespace(virQEMUDriverPtr driver, if (qemuDomainSetupDev(driver, vm, devPath) < 0) goto cleanup; + if (qemuDomainSetupAllDisks(driver, vm, devPath) < 0) + goto cleanup; + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { virReportSystemError(errno, _("Failed to mount %s on /dev"), diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 8e6d199..f4b834c 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -792,4 +792,8 @@ int qemuDomainCreateNamespace(virQEMUDriverPtr driver, void qemuDomainDeleteNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm); + +int qemuDomainNamespaceSetupDisk(virQEMUDriverPtr driver, + virDomainDiskDefPtr disk, + const char *devPath); #endif /* __QEMU_DOMAIN_H__ */ -- 2.8.4

On Mon, Nov 14, 2016 at 05:43:24PM +0100, Michal Privoznik wrote:
This is just an proof of concept of what has been agreed on here:
https://www.redhat.com/archives/libvir-list/2016-November/msg00285.html
There is still a lot of to be done: - set up seclabels - implement hot(un-)plug - implement other devices, not just disks
I'm sending these in a hope that somebody will at least take a quick look. I'm not looking for a code cleanliness (but if you find some issues feel free to raise them), more than design confirmation. If I'm going in wrong direction I'd rather stop now before digging any deeper.
I think you're doing broadly the right thing - the only significant problem is the /dev/pts issue I mention. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On Mon, Nov 14, 2016 at 04:58:19PM +0000, Daniel P. Berrange wrote:
On Mon, Nov 14, 2016 at 05:43:24PM +0100, Michal Privoznik wrote:
This is just an proof of concept of what has been agreed on here:
https://www.redhat.com/archives/libvir-list/2016-November/msg00285.html
There is still a lot of to be done: - set up seclabels - implement hot(un-)plug - implement other devices, not just disks
I'm sending these in a hope that somebody will at least take a quick look. I'm not looking for a code cleanliness (but if you find some issues feel free to raise them), more than design confirmation. If I'm going in wrong direction I'd rather stop now before digging any deeper.
I think you're doing broadly the right thing - the only significant problem is the /dev/pts issue I mention.
Oh and we'll need to make sure we skip all this when running qemu:///session since you can't spawn new namespaces as non-root. Not a big deal, since we don't have the udev race problem as non-root either :-) Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
participants (2)
-
Daniel P. Berrange
-
Michal Privoznik