[libvirt] [PATCH v2 00/21] Run qemu under its own namespace

v1 posted here: https://www.redhat.com/archives/libvir-list/2016-November/msg01208.html diff to v1: - I've dropped the patches for hugepages which are posted separately [1] - I've reworked some parts according to Dan's suggestions - Filled missing impl for virSCSIVHostDevice which was merged meanwhile Please note that patches 1-5, 7 were ACKed already. You can also find the patches on my github: https://github.com/zippy2/libvirt/tree/qemu_container_v3 (Yeah, _v2 branch is not good enough.) 1: https://www.redhat.com/archives/libvir-list/2016-November/msg01415.html Michal Privoznik (21): virprocess: Introduce virProcessSetupPrivateMountNS virfile: Introduce virFileSetupDev virfile: Introduce ACL helpers virusb: Introduce virUSBDeviceGetPath virscsi: Introduce virSCSIDeviceGetPath virscsivhost: Introduce virSCSIVHostDeviceGetPath qemu_cgroup: Expose defaultDeviceACL qemu: Spawn qemu under mount namespace qemu: Prepare disks when starting a domain qemu: Prepare hostdevs when starting a domain qemu: Prepare chardevs when starting a domain qemu: Prepare TPM when starting a domain qemu: Prepare inputs when starting a domain qemu: Prepare RNGs when starting a domain qemu: Enter the namespace on relabelling qemu: Manage /dev entry on disk hotplug qemu: Manage /dev entry on hostdev hotplug qemu: Manage /dev entry on chardev hotplug qemu: Manage /dev entry on RNG hotplug qemu: Let users opt-out from containerization qemu: Enable mount namespace config-post.h | 2 + configure.ac | 12 +- src/Makefile.am | 7 +- src/libvirt_private.syms | 10 + src/lxc/lxc_container.c | 20 +- src/lxc/lxc_controller.c | 32 +- src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 8 + src/qemu/qemu_cgroup.c | 2 +- src/qemu/qemu_cgroup.h | 1 + src/qemu/qemu_conf.c | 33 + src/qemu/qemu_conf.h | 2 + src/qemu/qemu_domain.c | 1244 +++++++++++++++++++++++++++++++++++- src/qemu/qemu_domain.h | 52 ++ src/qemu/qemu_driver.c | 5 +- src/qemu/qemu_hotplug.c | 90 ++- src/qemu/qemu_process.c | 28 +- src/qemu/qemu_security.c | 198 ++++++ src/qemu/qemu_security.h | 55 ++ src/qemu/test_libvirtd_qemu.aug.in | 3 + src/util/virfile.c | 153 +++++ src/util/virfile.h | 17 + src/util/virprocess.c | 38 ++ src/util/virprocess.h | 2 + src/util/virscsi.c | 6 + src/util/virscsi.h | 1 + src/util/virscsivhost.c | 7 + src/util/virscsivhost.h | 1 + src/util/virusb.c | 5 + src/util/virusb.h | 1 + 30 files changed, 1951 insertions(+), 85 deletions(-) create mode 100644 src/qemu/qemu_security.c create mode 100644 src/qemu/qemu_security.h -- 2.11.0

This part of code that LXC currently uses will be reused so move to a generic function. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- configure.ac | 2 +- src/libvirt_private.syms | 1 + src/lxc/lxc_controller.c | 18 +----------------- src/util/virprocess.c | 38 ++++++++++++++++++++++++++++++++++++++ src/util/virprocess.h | 2 ++ 5 files changed, 43 insertions(+), 18 deletions(-) diff --git a/configure.ac b/configure.ac index 3802e5910..da3388e7c 100644 --- a/configure.ac +++ b/configure.ac @@ -291,7 +291,7 @@ dnl and various less common threadsafe functions AC_CHECK_FUNCS_ONCE([cfmakeraw fallocate geteuid getgid getgrnam_r \ getmntent_r getpwuid_r getrlimit getuid kill mmap newlocale posix_fallocate \ posix_memalign prlimit regexec sched_getaffinity setgroups setns \ - setrlimit symlink sysctlbyname getifaddrs sched_setscheduler]) + setrlimit symlink sysctlbyname getifaddrs sched_setscheduler unshare]) dnl Availability of pthread functions. Because of $LIB_PTHREAD, we dnl cannot use AC_CHECK_FUNCS_ONCE. LIB_PTHREAD and LIBMULTITHREAD diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 6303dec8b..e6bf395f9 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2261,6 +2261,7 @@ virProcessSetMaxMemLock; virProcessSetMaxProcesses; virProcessSetNamespaces; virProcessSetScheduler; +virProcessSetupPrivateMountNS; virProcessTranslateStatus; virProcessWait; diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 508bc3e6c..29f1179c0 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -2092,8 +2092,6 @@ lxcCreateTty(virLXCControllerPtr ctrl, int *ttymaster, static int virLXCControllerSetupPrivateNS(void) { - int ret = -1; - /* * If doing a chroot style setup, we need to prepare * a private /dev/pts for the child now, which they @@ -2115,21 +2113,7 @@ virLXCControllerSetupPrivateNS(void) * marked as shared */ - if (unshare(CLONE_NEWNS) < 0) { - virReportSystemError(errno, "%s", - _("Cannot unshare mount namespace")); - goto cleanup; - } - - if (mount("", "/", NULL, MS_SLAVE|MS_REC, NULL) < 0) { - virReportSystemError(errno, "%s", - _("Failed to switch root mount into slave mode")); - goto cleanup; - } - - ret = 0; - cleanup: - return ret; + return virProcessSetupPrivateMountNS(); } diff --git a/src/util/virprocess.c b/src/util/virprocess.c index 7db97bc53..1ebe863fb 100644 --- a/src/util/virprocess.c +++ b/src/util/virprocess.c @@ -28,6 +28,9 @@ #include <stdlib.h> #include <sys/wait.h> #include <unistd.h> +#if HAVE_SYS_MOUNT_H +# include <sys/mount.h> +#endif #if HAVE_SETRLIMIT # include <sys/time.h> # include <sys/resource.h> @@ -1146,6 +1149,41 @@ virProcessRunInMountNamespace(pid_t pid, } +#if defined(HAVE_SYS_MOUNT_H) && defined(HAVE_UNSHARE) +int +virProcessSetupPrivateMountNS(void) +{ + int ret = -1; + + if (unshare(CLONE_NEWNS) < 0) { + virReportSystemError(errno, "%s", + _("Cannot unshare mount namespace")); + goto cleanup; + } + + if (mount("", "/", NULL, MS_SLAVE|MS_REC, NULL) < 0) { + virReportSystemError(errno, "%s", + _("Failed to switch root mount into slave mode")); + goto cleanup; + } + + ret = 0; + cleanup: + return ret; +} + +#else /* !defined(HAVE_SYS_MOUNT_H) || !defined(HAVE_UNSHARE) */ + +int +virProcessSetupPrivateMountNS(void) +{ + virReportSystemError(ENOSYS, "%s", + _("Namespaces are not supported on this platform.")); + return -1; +} +#endif /* !defined(HAVE_SYS_MOUNT_H) || !defined(HAVE_UNSHARE) */ + + /** * virProcessExitWithStatus: * @status: raw status to be reproduced when this process dies diff --git a/src/util/virprocess.h b/src/util/virprocess.h index 04e9802aa..c76a1fbc5 100644 --- a/src/util/virprocess.h +++ b/src/util/virprocess.h @@ -90,6 +90,8 @@ int virProcessRunInMountNamespace(pid_t pid, virProcessNamespaceCallback cb, void *opaque); +int virProcessSetupPrivateMountNS(void); + int virProcessSetScheduler(pid_t pid, virProcessSchedPolicy policy, int priority); -- 2.11.0

This part of code that LXC currently uses will be reused so move to a generic function. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 2 ++ src/lxc/lxc_container.c | 20 ++------------ src/lxc/lxc_controller.c | 14 +--------- src/util/virfile.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 6 ++++ 5 files changed, 84 insertions(+), 30 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index e6bf395f9..94eea50fd 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1556,6 +1556,7 @@ virDirRead; virFileAbsPath; virFileAccessibleAs; virFileActivateDirOverride; +virFileBindMountDevice; virFileBuildPath; virFileClose; virFileDeleteTree; @@ -1603,6 +1604,7 @@ virFileResolveLink; virFileRewrite; virFileRewriteStr; virFileSanitizePath; +virFileSetupDev; virFileSkipRoot; virFileStripSuffix; virFileTouch; diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index dd013dfce..32c0c3a4a 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -1112,20 +1112,6 @@ static int lxcContainerMountFSDevPTS(virDomainDefPtr def, return ret; } -static int lxcContainerBindMountDevice(const char *src, const char *dst) -{ - if (virFileTouch(dst, 0666) < 0) - return -1; - - if (mount(src, dst, "none", MS_BIND, NULL) < 0) { - virReportSystemError(errno, _("Failed to bind %s on to %s"), src, - dst); - return -1; - } - - return 0; -} - static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths) { size_t i; @@ -1149,7 +1135,7 @@ static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths) } /* We have private devpts capability, so bind that */ - if (lxcContainerBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0) + if (virFileBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0) return -1; for (i = 0; i < nttyPaths; i++) { @@ -1157,7 +1143,7 @@ static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths) if (virAsprintf(&tty, "/dev/tty%zu", i+1) < 0) return -1; - if (lxcContainerBindMountDevice(ttyPaths[i], tty) < 0) { + if (virFileBindMountDevice(ttyPaths[i], tty) < 0) { return -1; VIR_FREE(tty); } @@ -1165,7 +1151,7 @@ static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths) VIR_FREE(tty); if (i == 0 && - lxcContainerBindMountDevice(ttyPaths[i], "/dev/console") < 0) + virFileBindMountDevice(ttyPaths[i], "/dev/console") < 0) return -1; } return 0; diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 29f1179c0..2170b0ae2 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -1457,12 +1457,6 @@ static int virLXCControllerSetupDev(virLXCControllerPtr ctrl) LXC_STATE_DIR, ctrl->def->name) < 0) goto cleanup; - if (virFileMakePath(dev) < 0) { - virReportSystemError(errno, - _("Failed to make path %s"), dev); - goto cleanup; - } - /* * tmpfs is limited to 64kb, since we only have device nodes in there * and don't want to DOS the entire OS RAM usage @@ -1472,14 +1466,8 @@ static int virLXCControllerSetupDev(virLXCControllerPtr ctrl) "mode=755,size=65536%s", mount_options) < 0) goto cleanup; - VIR_DEBUG("Mount devfs on %s type=tmpfs flags=%x, opts=%s", - dev, MS_NOSUID, opts); - if (mount("devfs", dev, "tmpfs", MS_NOSUID, opts) < 0) { - virReportSystemError(errno, - _("Failed to mount devfs on %s type %s (%s)"), - dev, "tmpfs", opts); + if (virFileSetupDev(dev, opts) < 0) goto cleanup; - } if (lxcContainerChown(ctrl->def, dev) < 0) goto cleanup; diff --git a/src/util/virfile.c b/src/util/virfile.c index 1f0bfa906..cc585c1e1 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -32,6 +32,9 @@ #include <sys/types.h> #include <sys/socket.h> #include <sys/wait.h> +#if defined(HAVE_SYS_MOUNT_H) +# include <sys/mount.h> +#endif #include <unistd.h> #include <dirent.h> #include <dirname.h> @@ -3557,3 +3560,72 @@ int virFileIsSharedFS(const char *path) VIR_FILE_SHFS_SMB | VIR_FILE_SHFS_CIFS); } + + +#if defined(HAVE_SYS_MOUNT_H) +int +virFileSetupDev(const char *path, + const char *mount_options) +{ + const unsigned long mount_flags = MS_NOSUID; + const char *mount_fs = "tmpfs"; + int ret = -1; + + if (virFileMakePath(path) < 0) { + virReportSystemError(errno, + _("Failed to make path %s"), path); + goto cleanup; + } + + VIR_DEBUG("Mount devfs on %s type=tmpfs flags=%lx, opts=%s", + path, mount_flags, mount_options); + if (mount("devfs", path, mount_fs, mount_flags, mount_options) < 0) { + virReportSystemError(errno, + _("Failed to mount devfs on %s type %s (%s)"), + path, mount_fs, mount_options); + goto cleanup; + } + + ret = 0; + cleanup: + return ret; +} + + +int +virFileBindMountDevice(const char *src, + const char *dst) +{ + if (virFileTouch(dst, 0666) < 0) + return -1; + + if (mount(src, dst, "none", MS_BIND, NULL) < 0) { + virReportSystemError(errno, _("Failed to bind %s on to %s"), src, + dst); + return -1; + } + + return 0; +} + +#else /* !defined(HAVE_SYS_MOUNT_H) */ + +int +virFileSetupDev(const char *path ATTRIBUTE_UNUSED, + const char *mount_options ATTRIBUTE_UNUSED) +{ + virReportSystemError(ENOSYS, "%s", + _("mount is not supported on this platform.")); + return -1; +} + + +int +virFileBindMountDevice(const char *src ATTRIBUTE_UNUSED, + const char *dst ATTRIBUTE_UNUSED) +{ + virReportSystemError(ENOSYS, "%s", + _("mount is not supported on this platform.")); + return -1; +} +#endif /* !defined(HAVE_SYS_MOUNT_H) */ diff --git a/src/util/virfile.h b/src/util/virfile.h index 5b810956e..5e3bfc00c 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -311,4 +311,10 @@ int virFileGetHugepageSize(const char *path, unsigned long long *size); int virFileFindHugeTLBFS(virHugeTLBFSPtr *ret_fs, size_t *ret_nfs); + +int virFileSetupDev(const char *path, + const char *mount_options); + +int virFileBindMountDevice(const char *src, + const char *dst); #endif /* __VIR_FILE_H */ -- 2.11.0

Namely, virFileGetACLs, virFileSetACLs, virFileFreeACLs and virFileCopyACLs. These functions are going to be required when we are creating /dev for qemu. We have copy anything that's in host's /dev exactly as is. Including ACLs. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- config-post.h | 2 ++ configure.ac | 10 +++++- src/Makefile.am | 4 +-- src/libvirt_private.syms | 4 +++ src/util/virfile.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 11 +++++++ 6 files changed, 109 insertions(+), 3 deletions(-) diff --git a/config-post.h b/config-post.h index 090cc2831..6a91ff66b 100644 --- a/config-post.h +++ b/config-post.h @@ -30,6 +30,7 @@ # undef HAVE_LIBNL # undef HAVE_LIBNL3 # undef HAVE_LIBSASL2 +# undef HAVE_SYS_ACL_H # undef WITH_CAPNG # undef WITH_CURL # undef WITH_DBUS @@ -56,6 +57,7 @@ # undef HAVE_LIBNL # undef HAVE_LIBNL3 # undef HAVE_LIBSASL2 +# undef HAVE_SYS_ACL_H # undef WITH_CAPNG # undef WITH_CURL # undef WITH_DTRACE_PROBES diff --git a/configure.ac b/configure.ac index da3388e7c..4614b3687 100644 --- a/configure.ac +++ b/configure.ac @@ -332,11 +332,19 @@ dnl Availability of various common headers (non-fatal if missing). AC_CHECK_HEADERS([pwd.h regex.h sys/un.h \ sys/poll.h syslog.h mntent.h net/ethernet.h linux/magic.h \ sys/un.h sys/syscall.h sys/sysctl.h netinet/tcp.h ifaddrs.h \ - libtasn1.h sys/ucred.h sys/mount.h]) + libtasn1.h sys/ucred.h sys/mount.h sys/acl.h]) dnl Check whether endian provides handy macros. AC_CHECK_DECLS([htole64], [], [], [[#include <endian.h>]]) AC_CHECK_FUNCS([stat stat64 __xstat __xstat64 lstat lstat64 __lxstat __lxstat64]) +ACL_CFLAGS="" +ACL_LIBS="" +if test "x$ac_cv_header_sys_acl_h" = "xyes" ; then + ACL_LIBS="-lacl" +fi +AC_SUBST([ACL_CFLAGS]) +AC_SUBST([ACL_LIBS]) + dnl We need to decide at configure time if libvirt will use real atomic dnl operations ("lock free") or emulated ones with a mutex. diff --git a/src/Makefile.am b/src/Makefile.am index 8c620d5e0..07a28335a 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -1127,12 +1127,12 @@ libvirt_util_la_SOURCES = \ libvirt_util_la_CFLAGS = $(CAPNG_CFLAGS) $(YAJL_CFLAGS) $(LIBNL_CFLAGS) \ $(AM_CFLAGS) $(AUDIT_CFLAGS) $(DEVMAPPER_CFLAGS) \ $(DBUS_CFLAGS) $(LDEXP_LIBM) $(NUMACTL_CFLAGS) \ - $(POLKIT_CFLAGS) $(GNUTLS_CFLAGS) \ + $(POLKIT_CFLAGS) $(GNUTLS_CFLAGS) $(ACL_CFLAGS) \ -I$(srcdir)/conf libvirt_util_la_LIBADD = $(CAPNG_LIBS) $(YAJL_LIBS) $(LIBNL_LIBS) \ $(THREAD_LIBS) $(AUDIT_LIBS) $(DEVMAPPER_LIBS) \ $(LIB_CLOCK_GETTIME) $(DBUS_LIBS) $(MSCOM_LIBS) $(LIBXML_LIBS) \ - $(SECDRIVER_LIBS) $(NUMACTL_LIBS) \ + $(SECDRIVER_LIBS) $(NUMACTL_LIBS) $(ACL_LIBS) \ $(POLKIT_LIBS) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 94eea50fd..c1ed2c598 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1559,6 +1559,7 @@ virFileActivateDirOverride; virFileBindMountDevice; virFileBuildPath; virFileClose; +virFileCopyACLs; virFileDeleteTree; virFileDirectFdFlag; virFileExists; @@ -1568,6 +1569,8 @@ virFileFindHugeTLBFS; virFileFindMountPoint; virFileFindResource; virFileFindResourceFull; +virFileFreeACLs; +virFileGetACLs; virFileGetHugepageSize; virFileGetMountReverseSubtree; virFileGetMountSubtree; @@ -1604,6 +1607,7 @@ virFileResolveLink; virFileRewrite; virFileRewriteStr; virFileSanitizePath; +virFileSetACLs; virFileSetupDev; virFileSkipRoot; virFileStripSuffix; diff --git a/src/util/virfile.c b/src/util/virfile.c index cc585c1e1..7c1e4357d 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -48,6 +48,9 @@ #if HAVE_SYS_SYSCALL_H # include <sys/syscall.h> #endif +#if HAVE_SYS_ACL_H +# include <sys/acl.h> +#endif #ifdef __linux__ # if HAVE_LINUX_MAGIC_H @@ -3629,3 +3632,81 @@ virFileBindMountDevice(const char *src ATTRIBUTE_UNUSED, return -1; } #endif /* !defined(HAVE_SYS_MOUNT_H) */ + + +#if defined(HAVE_SYS_ACL_H) +int +virFileGetACLs(const char *file, + void **acl) +{ + if (!(*acl = acl_get_file(file, ACL_TYPE_ACCESS))) + return -1; + + return 0; +} + + +int +virFileSetACLs(const char *file, + void *acl) +{ + if (acl_set_file(file, ACL_TYPE_ACCESS, acl) < 0) + return -1; + + return 0; +} + + +void +virFileFreeACLs(void **acl) +{ + acl_free(*acl); + *acl = NULL; +} + +#else /* !defined(HAVE_SYS_ACL_H) */ + +int +virFileGetACLs(const char *file ATTRIBUTE_UNUSED, + void **acl ATTRIBUTE_UNUSED) +{ + errno = ENOTSUP; + return -1; +} + + +int +virFileSetACLs(const char *file ATTRIBUTE_UNUSED, + void *acl ATTRIBUTE_UNUSED) +{ + errno = ENOTSUP; + return -1; +} + + +void +virFileFreeACLs(void **acl) +{ + *acl = NULL; +} + +#endif /* !defined(HAVE_SYS_ACL_H) */ + +int +virFileCopyACLs(const char *src, + const char *dst) +{ + void *acl = NULL; + int ret = -1; + + if (virFileGetACLs(src, &acl) < 0) + return ret; + + if (virFileSetACLs(dst, acl) < 0) + goto cleanup; + + ret = 0; + cleanup: + virFileFreeACLs(&acl); + return ret; +} diff --git a/src/util/virfile.h b/src/util/virfile.h index 5e3bfc00c..571e5bdc8 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -317,4 +317,15 @@ int virFileSetupDev(const char *path, int virFileBindMountDevice(const char *src, const char *dst); + +int virFileGetACLs(const char *file, + void **acl); + +int virFileSetACLs(const char *file, + void *acl); + +void virFileFreeACLs(void **acl); + +int virFileCopyACLs(const char *src, + const char *dst); #endif /* __VIR_FILE_H */ -- 2.11.0

We will need this function in near future so that we know what /dev device corresponds to the USB device. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virusb.c | 5 +++++ src/util/virusb.h | 1 + 3 files changed, 7 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index c1ed2c598..c91f3798f 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2634,6 +2634,7 @@ virUSBDeviceFree; virUSBDeviceGetBus; virUSBDeviceGetDevno; virUSBDeviceGetName; +virUSBDeviceGetPath; virUSBDeviceGetUsedBy; virUSBDeviceListAdd; virUSBDeviceListCount; diff --git a/src/util/virusb.c b/src/util/virusb.c index 6a001a77b..8cd2f57f4 100644 --- a/src/util/virusb.c +++ b/src/util/virusb.c @@ -406,6 +406,11 @@ const char *virUSBDeviceGetName(virUSBDevicePtr dev) return dev->name; } +const char *virUSBDeviceGetPath(virUSBDevicePtr dev) +{ + return dev->path; +} + unsigned int virUSBDeviceGetBus(virUSBDevicePtr dev) { return dev->bus; diff --git a/src/util/virusb.h b/src/util/virusb.h index f98ea2168..716e8c691 100644 --- a/src/util/virusb.h +++ b/src/util/virusb.h @@ -67,6 +67,7 @@ void virUSBDeviceGetUsedBy(virUSBDevicePtr dev, const char **drv_name, const char **dom_name); const char *virUSBDeviceGetName(virUSBDevicePtr dev); +const char *virUSBDeviceGetPath(virUSBDevicePtr usb); unsigned int virUSBDeviceGetBus(virUSBDevicePtr dev); unsigned int virUSBDeviceGetDevno(virUSBDevicePtr dev); -- 2.11.0

We will need this function in near future so that we know what /dev device corresponds to the SCSI device. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virscsi.c | 6 ++++++ src/util/virscsi.h | 1 + 3 files changed, 8 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index c91f3798f..b89320713 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2310,6 +2310,7 @@ virSCSIDeviceGetAdapter; virSCSIDeviceGetBus; virSCSIDeviceGetDevName; virSCSIDeviceGetName; +virSCSIDeviceGetPath; virSCSIDeviceGetReadonly; virSCSIDeviceGetSgName; virSCSIDeviceGetShareable; diff --git a/src/util/virscsi.c b/src/util/virscsi.c index 4843367e0..4fd883875 100644 --- a/src/util/virscsi.c +++ b/src/util/virscsi.c @@ -315,6 +315,12 @@ virSCSIDeviceGetName(virSCSIDevicePtr dev) return dev->name; } +const char * +virSCSIDeviceGetPath(virSCSIDevicePtr dev) +{ + return dev->sg_path; +} + unsigned int virSCSIDeviceGetAdapter(virSCSIDevicePtr dev) { diff --git a/src/util/virscsi.h b/src/util/virscsi.h index df40d7f62..7d88d4e70 100644 --- a/src/util/virscsi.h +++ b/src/util/virscsi.h @@ -58,6 +58,7 @@ int virSCSIDeviceSetUsedBy(virSCSIDevicePtr dev, const char *domname); bool virSCSIDeviceIsAvailable(virSCSIDevicePtr dev); const char *virSCSIDeviceGetName(virSCSIDevicePtr dev); +const char *virSCSIDeviceGetPath(virSCSIDevicePtr dev); unsigned int virSCSIDeviceGetAdapter(virSCSIDevicePtr dev); unsigned int virSCSIDeviceGetBus(virSCSIDevicePtr dev); unsigned int virSCSIDeviceGetTarget(virSCSIDevicePtr dev); -- 2.11.0

We will need this function in near future so that we know what /dev device corresponds to the SCSI device. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virscsivhost.c | 7 +++++++ src/util/virscsivhost.h | 1 + 3 files changed, 9 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b89320713..0fd673acf 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2332,6 +2332,7 @@ virSCSIDeviceSetUsedBy; virSCSIVHostDeviceFileIterate; virSCSIVHostDeviceFree; virSCSIVHostDeviceGetName; +virSCSIVHostDeviceGetPath; virSCSIVHostDeviceListAdd; virSCSIVHostDeviceListCount; virSCSIVHostDeviceListDel; diff --git a/src/util/virscsivhost.c b/src/util/virscsivhost.c index f32d647eb..dc7df757a 100644 --- a/src/util/virscsivhost.c +++ b/src/util/virscsivhost.c @@ -243,6 +243,13 @@ virSCSIVHostDeviceGetName(virSCSIVHostDevicePtr dev) } +const char * +virSCSIVHostDeviceGetPath(virSCSIVHostDevicePtr dev) +{ + return dev->path; +} + + virSCSIVHostDevicePtr virSCSIVHostDeviceNew(const char *name) { diff --git a/src/util/virscsivhost.h b/src/util/virscsivhost.h index 1a52acf02..6018b835e 100644 --- a/src/util/virscsivhost.h +++ b/src/util/virscsivhost.h @@ -40,6 +40,7 @@ int virSCSIVHostDeviceFileIterate(virSCSIVHostDevicePtr dev, virSCSIVHostDeviceFileActor actor, void *opaque); const char *virSCSIVHostDeviceGetName(virSCSIVHostDevicePtr dev); +const char *virSCSIVHostDeviceGetPath(virSCSIVHostDevicePtr dev); virSCSIVHostDevicePtr virSCSIVHostDeviceListGet(virSCSIVHostDeviceListPtr list, int idx); size_t virSCSIVHostDeviceListCount(virSCSIVHostDeviceListPtr list); -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:13AM +0100, Michal Privoznik wrote:
We will need this function in near future so that we know what /dev device corresponds to the SCSI device.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virscsivhost.c | 7 +++++++ src/util/virscsivhost.h | 1 + 3 files changed, 9 insertions(+)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

This is a list of devices that qemu needs for its run (apart from what's configured for domain). The devices on the list are enabled in the CGroups by default so they will be good candidates for initial /dev for new qemu. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_cgroup.c | 2 +- src/qemu/qemu_cgroup.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 9a9e9a3fb..6c90d46d1 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -42,7 +42,7 @@ VIR_LOG_INIT("qemu.qemu_cgroup"); -static const char *const defaultDeviceACL[] = { +const char *const defaultDeviceACL[] = { "/dev/null", "/dev/full", "/dev/zero", "/dev/random", "/dev/urandom", "/dev/ptmx", "/dev/kvm", "/dev/kqemu", diff --git a/src/qemu/qemu_cgroup.h b/src/qemu/qemu_cgroup.h index 6e2c74262..8ae4a72ab 100644 --- a/src/qemu/qemu_cgroup.h +++ b/src/qemu/qemu_cgroup.h @@ -76,4 +76,5 @@ int qemuCgroupEmulatorAllNodesAllow(virCgroupPtr cgroup, qemuCgroupEmulatorAllNodesDataPtr *data); void qemuCgroupEmulatorAllNodesRestore(qemuCgroupEmulatorAllNodesDataPtr data); +extern const char *const defaultDeviceACL[]; #endif /* __QEMU_CGROUP_H__ */ -- 2.11.0

Prime time. When it comes to spawning qemu process and relabelling all the devices it's going to touch, there's inherent race with other applications in the system (e.g. udev). Instead of trying convincing udev to not touch libvirt managed devices, we can create a separate mount namespace for the qemu, and mount our own /dev there. Of course this puts more work onto us as we have to maintain /dev files on each domain start and device hot(un-)plug. On the other hand, this enhances security also.
From technical POV, on domain startup process the parent (libvirtd) creates:
/var/lib/libvirt/qemu/$domain.dev /var/lib/libvirt/qemu/$domain.devpts The child (which is going to be qemu eventually) calls unshare() to create new mount namespace. From now on anything that child does is invisible to the parent. Child then mounts tmpfs on $domain.dev (so that it still sees original /dev from the host) and creates some devices (as explained in one of the previous patches). The devices have to be created exactly as they are in the host (including perms, seclabels, ACLs, ...). After that it moves $domain.dev mount to /dev. What's the $domain.devpts mount there for then you ask? QEMU can create PTYs for some chardevs. And historically we exposed the host ends in our domain XML allowing users to connect to them. Therefore we must preserve devpts mount to be shared with the host's one. To make this patch as small as possible, creating of devices configured for domain in question is implemented in next patches. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 377 +++++++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_domain.h | 20 +++ src/qemu/qemu_process.c | 13 ++ 3 files changed, 408 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 4aae14d9d..50b401f79 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -55,6 +55,9 @@ #include <sys/time.h> #include <fcntl.h> +#if defined(HAVE_SYS_MOUNT_H) +# include <sys/mount.h> +#endif #include <libxml/xpathInternals.h> @@ -86,6 +89,10 @@ VIR_ENUM_IMPL(qemuDomainAsyncJob, QEMU_ASYNC_JOB_LAST, "start", ); +VIR_ENUM_IMPL(qemuDomainNamespace, QEMU_DOMAIN_NS_LAST, + "mount", +); + struct _qemuDomainLogContext { int refs; @@ -146,6 +153,31 @@ qemuDomainAsyncJobPhaseFromString(qemuDomainAsyncJob job, } +bool +qemuDomainNamespaceEnabled(virDomainObjPtr vm, + qemuDomainNamespace ns) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + return priv->namespaces && + virBitmapIsBitSet(priv->namespaces, ns); +} + + +static bool +qemuDomainEnableNamespace(virDomainObjPtr vm, + qemuDomainNamespace ns) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (!priv->namespaces && + !(priv->namespaces = virBitmapNew(QEMU_DOMAIN_NS_LAST))) + return -1; + + return virBitmapSetBit(priv->namespaces, ns); +} + + void qemuDomainEventQueue(virQEMUDriverPtr driver, virObjectEventPtr event) { @@ -1541,6 +1573,8 @@ qemuDomainObjPrivateFree(void *data) virObjectUnref(priv->qemuCaps); + virBitmapFree(priv->namespaces); + virCgroupFree(&priv->cgroup); virDomainPCIAddressSetFree(priv->pciaddrs); virDomainUSBAddressSetFree(priv->usbaddrs); @@ -1627,6 +1661,17 @@ qemuDomainObjPrivateXMLFormat(virBufferPtr buf, virDomainChrTypeToString(priv->monConfig->type)); } + if (priv->namespaces) { + ssize_t ns = -1; + + virBufferAddLit(buf, "<namespaces>\n"); + virBufferAdjustIndent(buf, 2); + while ((ns = virBitmapNextSetBit(priv->namespaces, ns)) >= 0) + virBufferAsprintf(buf, "<%s/>\n", qemuDomainNamespaceTypeToString(ns)); + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</namespaces>\n"); + } + qemuDomainObjPrivateXMLFormatVcpus(buf, vm->def); if (priv->qemuCaps) { @@ -1771,6 +1816,7 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, int n; size_t i; xmlNodePtr *nodes = NULL; + xmlNodePtr node = NULL; virQEMUCapsPtr qemuCaps = NULL; virCapsPtr caps = NULL; @@ -1809,6 +1855,32 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, goto error; } + if ((node = virXPathNode("./namespaces", ctxt))) { + xmlNodePtr next; + + for (next = node->children; next; next = next->next) { + int ns = qemuDomainNamespaceTypeFromString((const char *) next->name); + if (ns < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("malformed namespace name: %s"), + next->name); + goto error; + } + if (qemuDomainEnableNamespace(vm, ns) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to enable namespace: %d"), + ns); + goto error; + } + } + } + + if (priv->namespaces && + virBitmapIsAllClear(priv->namespaces)) { + virBitmapFree(priv->namespaces); + priv->namespaces = NULL; + } + if ((n = virXPathNodeSet("./vcpus/vcpu", ctxt, &nodes)) < 0) goto error; @@ -1959,10 +2031,12 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, return 0; error: + VIR_FREE(nodes); + VIR_FREE(tmp); + virBitmapFree(priv->namespaces); + priv->namespaces = NULL; virDomainChrSourceDefFree(priv->monConfig); priv->monConfig = NULL; - VIR_FREE(nodes); - VIR_FREE(tmp); virStringListFree(priv->qemuDevices); priv->qemuDevices = NULL; virObjectUnref(qemuCaps); @@ -6653,3 +6727,302 @@ qemuDomainSupportsVideoVga(virDomainVideoDefPtr video, return true; } + + +static int +qemuDomainCreateDevice(const char *device, + const char *path, + bool allow_noent) +{ + char *devicePath = NULL; + struct stat sb; + int ret = -1; + + if (!STRPREFIX(device, "/dev")) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("invalid device: %s"), + device); + goto cleanup; + } + + if (virAsprintf(&devicePath, "%s/%s", + path, device + 4) < 0) + goto cleanup; + + if (stat(device, &sb) < 0) { + if (errno == ENOENT && allow_noent) { + /* Ignore non-existent device. */ + ret = 0; + goto cleanup; + } + + virReportSystemError(errno, _("Unable to stat %s"), device); + goto cleanup; + } + + if (virFileMakeParentPath(devicePath) < 0) { + virReportSystemError(errno, + _("Unable to create %s"), + devicePath); + goto cleanup; + } + + if (mknod(devicePath, sb.st_mode, sb.st_rdev) < 0) { + virReportSystemError(errno, + _("Failed to make device %s"), + devicePath); + goto cleanup; + } + + if (chown(devicePath, sb.st_uid, sb.st_gid) < 0) { + virReportSystemError(errno, + _("Failed to chown device %s"), + devicePath); + goto cleanup; + } + + if (virFileCopyACLs(device, devicePath) < 0 && + errno != ENOTSUP) { + virReportSystemError(errno, + _("Failed to copy ACLs on device %s"), + devicePath); + goto cleanup; + } + + ret = 0; + cleanup: + VIR_FREE(devicePath); + return ret; +} + + + +static int +qemuDomainPopulateDevices(virQEMUDriverPtr driver, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + const char *path) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + const char *const *devices = (const char *const *) cfg->cgroupDeviceACL; + size_t i; + int ret = -1; + + if (!devices) + devices = defaultDeviceACL; + + for (i = 0; devices[i]; i++) { + if (qemuDomainCreateDevice(devices[i], path, true) < 0) + goto cleanup; + } + + ret = 0; + cleanup: + virObjectUnref(cfg); + return ret; +} + + +static int +qemuDomainSetupDev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *path) +{ + char *mount_options = NULL; + char *opts = NULL; + int ret = -1; + + VIR_DEBUG("Setting up /dev/ for domain %s", vm->def->name); + + mount_options = virSecurityManagerGetMountOptions(driver->securityManager, + vm->def); + + if (!mount_options && + VIR_STRDUP(mount_options, "") < 0) + goto cleanup; + + /* + * tmpfs is limited to 64kb, since we only have device nodes in there + * and don't want to DOS the entire OS RAM usage + */ + if (virAsprintf(&opts, + "mode=755,size=65536%s", mount_options) < 0) + goto cleanup; + + if (virFileSetupDev(path, opts) < 0) + goto cleanup; + + if (qemuDomainPopulateDevices(driver, vm, path) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(opts); + VIR_FREE(mount_options); + return ret; +} + + +int +qemuDomainBuildNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + const unsigned long mount_flags = MS_MOVE; + char *devPath = NULL; + char *devptsPath = NULL; + int ret = -1; + + if (!qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) { + ret = 0; + goto cleanup; + } + + if (virAsprintf(&devPath, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0 || + virAsprintf(&devptsPath, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (qemuDomainSetupDev(driver, vm, devPath) < 0) + goto cleanup; + + /* Save /dev/pts mount point because /dev/pts/NNN is exposed in our live + * XML and other applications are supposed to be able to use it. */ + if (mount("/dev/pts", devptsPath, NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, "%s", + _("Unable to move /dev/pts mount")); + goto cleanup; + } + + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev"), + devPath); + goto cleanup; + } + + if (virFileMakePath("/dev/pts") < 0) { + virReportSystemError(errno, "%s", + _("Cannot create /dev/pts")); + goto cleanup; + } + + if (mount(devptsPath, "/dev/pts", NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev/pts"), + devptsPath); + goto cleanup; + } + + if (virFileBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0) + goto cleanup; + + ret = 0; + cleanup: + virObjectUnref(cfg); + VIR_FREE(devptsPath); + VIR_FREE(devPath); + return ret; +} + + +#if defined(__linux__) +int +qemuDomainCreateNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + int ret = -1; + char *devPath = NULL; + char *devptsPath = NULL; + + if (!virQEMUDriverIsPrivileged(driver)) { + ret = 0; + goto cleanup; + } + + if (virAsprintf(&devPath, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0 || + virAsprintf(&devptsPath, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + + if (virFileMakePath(devPath) < 0) { + virReportSystemError(errno, + _("Failed to create %s"), + devPath); + goto cleanup; + } + + if (virFileMakePath(devptsPath) < 0) { + virReportSystemError(errno, + _("Failed to create %s"), + devptsPath); + goto cleanup; + } + + /* Enabling of the mount namespace goes here. */ + + ret = 0; + cleanup: + if (ret < 0) { + if (devptsPath) + unlink(devptsPath); + if (devPath) + unlink(devPath); + } + VIR_FREE(devptsPath); + VIR_FREE(devPath); + virObjectUnref(cfg); + return ret; +} + +#else /* !defined(__linux__) */ + +int +qemuDomainCreateNamespace(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm ATTRIBUTE_UNUSED) +{ + /* Namespaces are Linux specific. On other platforms just + * carry on with the old behaviour. */ + return 0; +} +#endif /* !defined(__linux__) */ + + +void +qemuDomainDeleteNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + char *devPath = NULL; + char *devptsPath = NULL; + + if (!qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) + return; + + if (virAsprintf(&devPath, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0 || + virAsprintf(&devptsPath, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (rmdir(devPath) < 0) { + virReportSystemError(errno, + _("Unable to remove %s"), + devPath); + /* Bet effort. Fall through. */ + } + + if (rmdir(devptsPath) < 0) { + virReportSystemError(errno, + _("Unable to remove %s"), + devptsPath); + /* Bet effort. Fall through. */ + } + cleanup: + virObjectUnref(cfg); + VIR_FREE(devptsPath); + VIR_FREE(devPath); +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 7650ff392..9dad5bc7c 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -165,11 +165,23 @@ struct _qemuDomainUnpluggingDevice { qemuDomainUnpluggingDeviceStatus status; }; + +typedef enum { + QEMU_DOMAIN_NS_MOUNT = 0, + QEMU_DOMAIN_NS_LAST +} qemuDomainNamespace; +VIR_ENUM_DECL(qemuDomainNamespace) + +bool qemuDomainNamespaceEnabled(virDomainObjPtr vm, + qemuDomainNamespace ns); + typedef struct _qemuDomainObjPrivate qemuDomainObjPrivate; typedef qemuDomainObjPrivate *qemuDomainObjPrivatePtr; struct _qemuDomainObjPrivate { struct qemuDomainJobObj job; + virBitmapPtr namespaces; + qemuMonitorPtr mon; virDomainChrSourceDefPtr monConfig; bool monJSON; @@ -785,4 +797,12 @@ int qemuDomainCheckMonitor(virQEMUDriverPtr driver, bool qemuDomainSupportsVideoVga(virDomainVideoDefPtr video, virQEMUCapsPtr qemuCaps); +int qemuDomainBuildNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); + +int qemuDomainCreateNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); + +void qemuDomainDeleteNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 7f19c691e..b87e5af56 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2657,6 +2657,12 @@ static int qemuProcessHook(void *data) if (virSecurityManagerClearSocketLabel(h->driver->securityManager, h->vm->def) < 0) goto cleanup; + if (virProcessSetupPrivateMountNS() < 0) + goto cleanup; + + if (qemuDomainBuildNamespace(h->driver, h->vm) < 0) + goto cleanup; + if (virDomainNumatuneGetMode(h->vm->def->numa, -1, &mode) == 0) { if (mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT && h->cfg->cgroupControllers & (1 << VIR_CGROUP_CONTROLLER_CPUSET) && @@ -5429,6 +5435,11 @@ qemuProcessLaunch(virConnectPtr conn, qemuDomainLogContextMarkPosition(logCtxt); + VIR_DEBUG("Building mount namespace"); + + if (qemuDomainCreateNamespace(driver, vm) < 0) + goto cleanup; + VIR_DEBUG("Clear emulator capabilities: %d", cfg->clearEmulatorCapabilities); if (cfg->clearEmulatorCapabilities) @@ -6190,6 +6201,8 @@ void qemuProcessStop(virQEMUDriverPtr driver, } } + qemuDomainDeleteNamespace(driver, vm); + vm->taint = 0; vm->pid = -1; virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason); -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:15AM +0100, Michal Privoznik wrote:
Prime time. When it comes to spawning qemu process and relabelling all the devices it's going to touch, there's inherent race with other applications in the system (e.g. udev). Instead of trying convincing udev to not touch libvirt managed devices, we can create a separate mount namespace for the qemu, and mount our own /dev there. Of course this puts more work onto us as we have to maintain /dev files on each domain start and device hot(un-)plug. On the other hand, this enhances security also.
From technical POV, on domain startup process the parent (libvirtd) creates:
/var/lib/libvirt/qemu/$domain.dev /var/lib/libvirt/qemu/$domain.devpts
The child (which is going to be qemu eventually) calls unshare() to create new mount namespace. From now on anything that child does is invisible to the parent. Child then mounts tmpfs on $domain.dev (so that it still sees original /dev from the host) and creates some devices (as explained in one of the previous patches). The devices have to be created exactly as they are in the host (including perms, seclabels, ACLs, ...). After that it moves $domain.dev mount to /dev.
What's the $domain.devpts mount there for then you ask? QEMU can create PTYs for some chardevs. And historically we exposed the host ends in our domain XML allowing users to connect to them. Therefore we must preserve devpts mount to be shared with the host's one.
To make this patch as small as possible, creating of devices configured for domain in question is implemented in next patches.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 377 +++++++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_domain.h | 20 +++ src/qemu/qemu_process.c | 13 ++ 3 files changed, 408 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 4aae14d9d..50b401f79 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -55,6 +55,9 @@
#include <sys/time.h> #include <fcntl.h> +#if defined(HAVE_SYS_MOUNT_H) +# include <sys/mount.h> +#endif
#include <libxml/xpathInternals.h>
@@ -86,6 +89,10 @@ VIR_ENUM_IMPL(qemuDomainAsyncJob, QEMU_ASYNC_JOB_LAST, "start", );
+VIR_ENUM_IMPL(qemuDomainNamespace, QEMU_DOMAIN_NS_LAST, + "mount", +); +
struct _qemuDomainLogContext { int refs; @@ -146,6 +153,31 @@ qemuDomainAsyncJobPhaseFromString(qemuDomainAsyncJob job, }
+bool +qemuDomainNamespaceEnabled(virDomainObjPtr vm, + qemuDomainNamespace ns) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + return priv->namespaces && + virBitmapIsBitSet(priv->namespaces, ns); +} + + +static bool +qemuDomainEnableNamespace(virDomainObjPtr vm, + qemuDomainNamespace ns) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (!priv->namespaces && + !(priv->namespaces = virBitmapNew(QEMU_DOMAIN_NS_LAST))) + return -1; + + return virBitmapSetBit(priv->namespaces, ns);
Returning -1 from a method returning 'bool' is not what you want :-) Also causes a compile warning in the caller about comparing bool with '< 0'. Looking at the callers, I think you probably wanted @@ -169,7 +169,7 @@ qemuDomainNamespaceEnabled(virDomainObjPtr vm, } -static bool +static int qemuDomainEnableNamespace(virDomainObjPtr vm, qemuDomainNamespace ns) { @@ -179,7 +179,9 @@ qemuDomainEnableNamespace(virDomainObjPtr vm, !(priv->namespaces = virBitmapNew(QEMU_DOMAIN_NS_LAST))) return -1; - return virBitmapSetBit(priv->namespaces, ns); + virBitmapSetBit(priv->namespaces, ns); + + return 0; } ACK with such a change made. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On Mon, Dec 12, 2016 at 10:55:55AM +0000, Daniel P. Berrange wrote:
On Wed, Dec 07, 2016 at 09:36:15AM +0100, Michal Privoznik wrote:
Prime time. When it comes to spawning qemu process and relabelling all the devices it's going to touch, there's inherent race with other applications in the system (e.g. udev). Instead of trying convincing udev to not touch libvirt managed devices, we can create a separate mount namespace for the qemu, and mount our own /dev there. Of course this puts more work onto us as we have to maintain /dev files on each domain start and device hot(un-)plug. On the other hand, this enhances security also.
From technical POV, on domain startup process the parent (libvirtd) creates:
/var/lib/libvirt/qemu/$domain.dev /var/lib/libvirt/qemu/$domain.devpts
The child (which is going to be qemu eventually) calls unshare() to create new mount namespace. From now on anything that child does is invisible to the parent. Child then mounts tmpfs on $domain.dev (so that it still sees original /dev from the host) and creates some devices (as explained in one of the previous patches). The devices have to be created exactly as they are in the host (including perms, seclabels, ACLs, ...). After that it moves $domain.dev mount to /dev.
What's the $domain.devpts mount there for then you ask? QEMU can create PTYs for some chardevs. And historically we exposed the host ends in our domain XML allowing users to connect to them. Therefore we must preserve devpts mount to be shared with the host's one.
To make this patch as small as possible, creating of devices configured for domain in question is implemented in next patches.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 377 +++++++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_domain.h | 20 +++ src/qemu/qemu_process.c | 13 ++ 3 files changed, 408 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 4aae14d9d..50b401f79 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -55,6 +55,9 @@
#include <sys/time.h> #include <fcntl.h> +#if defined(HAVE_SYS_MOUNT_H) +# include <sys/mount.h> +#endif
#include <libxml/xpathInternals.h>
@@ -86,6 +89,10 @@ VIR_ENUM_IMPL(qemuDomainAsyncJob, QEMU_ASYNC_JOB_LAST, "start", );
+VIR_ENUM_IMPL(qemuDomainNamespace, QEMU_DOMAIN_NS_LAST, + "mount", +); +
struct _qemuDomainLogContext { int refs; @@ -146,6 +153,31 @@ qemuDomainAsyncJobPhaseFromString(qemuDomainAsyncJob job, }
+bool +qemuDomainNamespaceEnabled(virDomainObjPtr vm, + qemuDomainNamespace ns) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + return priv->namespaces && + virBitmapIsBitSet(priv->namespaces, ns); +} + + +static bool +qemuDomainEnableNamespace(virDomainObjPtr vm, + qemuDomainNamespace ns) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (!priv->namespaces && + !(priv->namespaces = virBitmapNew(QEMU_DOMAIN_NS_LAST))) + return -1; + + return virBitmapSetBit(priv->namespaces, ns);
Returning -1 from a method returning 'bool' is not what you want :-) Also causes a compile warning in the caller about comparing bool with '< 0'. Looking at the callers, I think you probably wanted
@@ -169,7 +169,7 @@ qemuDomainNamespaceEnabled(virDomainObjPtr vm, }
-static bool +static int qemuDomainEnableNamespace(virDomainObjPtr vm, qemuDomainNamespace ns) { @@ -179,7 +179,9 @@ qemuDomainEnableNamespace(virDomainObjPtr vm, !(priv->namespaces = virBitmapNew(QEMU_DOMAIN_NS_LAST))) return -1;
- return virBitmapSetBit(priv->namespaces, ns); + virBitmapSetBit(priv->namespaces, ns); + + return 0; }
ACK with such a change made.
Except that checking return value of virBitmapSetBit is also required. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 12.12.2016 12:47, Daniel P. Berrange wrote:
On Mon, Dec 12, 2016 at 10:55:55AM +0000, Daniel P. Berrange wrote:
On Wed, Dec 07, 2016 at 09:36:15AM +0100, Michal Privoznik wrote:
Prime time. When it comes to spawning qemu process and relabelling all the devices it's going to touch, there's inherent race with other applications in the system (e.g. udev). Instead of trying convincing udev to not touch libvirt managed devices, we can create a separate mount namespace for the qemu, and mount our own /dev there. Of course this puts more work onto us as we have to maintain /dev files on each domain start and device hot(un-)plug. On the other hand, this enhances security also.
From technical POV, on domain startup process the parent (libvirtd) creates:
/var/lib/libvirt/qemu/$domain.dev /var/lib/libvirt/qemu/$domain.devpts
The child (which is going to be qemu eventually) calls unshare() to create new mount namespace. From now on anything that child does is invisible to the parent. Child then mounts tmpfs on $domain.dev (so that it still sees original /dev from the host) and creates some devices (as explained in one of the previous patches). The devices have to be created exactly as they are in the host (including perms, seclabels, ACLs, ...). After that it moves $domain.dev mount to /dev.
What's the $domain.devpts mount there for then you ask? QEMU can create PTYs for some chardevs. And historically we exposed the host ends in our domain XML allowing users to connect to them. Therefore we must preserve devpts mount to be shared with the host's one.
To make this patch as small as possible, creating of devices configured for domain in question is implemented in next patches.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 377 +++++++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_domain.h | 20 +++ src/qemu/qemu_process.c | 13 ++ 3 files changed, 408 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 4aae14d9d..50b401f79 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -55,6 +55,9 @@
#include <sys/time.h> #include <fcntl.h> +#if defined(HAVE_SYS_MOUNT_H) +# include <sys/mount.h> +#endif
#include <libxml/xpathInternals.h>
@@ -86,6 +89,10 @@ VIR_ENUM_IMPL(qemuDomainAsyncJob, QEMU_ASYNC_JOB_LAST, "start", );
+VIR_ENUM_IMPL(qemuDomainNamespace, QEMU_DOMAIN_NS_LAST, + "mount", +); +
struct _qemuDomainLogContext { int refs; @@ -146,6 +153,31 @@ qemuDomainAsyncJobPhaseFromString(qemuDomainAsyncJob job, }
+bool +qemuDomainNamespaceEnabled(virDomainObjPtr vm, + qemuDomainNamespace ns) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + return priv->namespaces && + virBitmapIsBitSet(priv->namespaces, ns); +} + + +static bool +qemuDomainEnableNamespace(virDomainObjPtr vm, + qemuDomainNamespace ns) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (!priv->namespaces && + !(priv->namespaces = virBitmapNew(QEMU_DOMAIN_NS_LAST))) + return -1; + + return virBitmapSetBit(priv->namespaces, ns);
Returning -1 from a method returning 'bool' is not what you want :-) Also causes a compile warning in the caller about comparing bool with '< 0'. Looking at the callers, I think you probably wanted
Btw: my compiler does not warn me. Even with -Wextra :-(
@@ -169,7 +169,7 @@ qemuDomainNamespaceEnabled(virDomainObjPtr vm, }
-static bool +static int qemuDomainEnableNamespace(virDomainObjPtr vm, qemuDomainNamespace ns) { @@ -179,7 +179,9 @@ qemuDomainEnableNamespace(virDomainObjPtr vm, !(priv->namespaces = virBitmapNew(QEMU_DOMAIN_NS_LAST))) return -1;
- return virBitmapSetBit(priv->namespaces, ns); + virBitmapSetBit(priv->namespaces, ns); + + return 0; }
ACK with such a change made.
Except that checking return value of virBitmapSetBit is also required.
Okay, I'm sticking with: - return virBitmapSetBit(priv->namespaces, ns); + if (virBitmapSetBit(priv->namespaces, ns) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to enable namespace: %s"), + qemuDomainNamespaceTypeToString(ns)); + return -1; + } + + return 0; Which makes sense from another POV: if virBitmapNew() fails, it also reports error, however virBitmapSetBit() does not. So upon failure qemuDomainEnableNamespace() might and might not reported error. Michal

On Wed, Dec 07, 2016 at 09:36:15AM +0100, Michal Privoznik wrote:
Prime time. When it comes to spawning qemu process and relabelling all the devices it's going to touch, there's inherent race with other applications in the system (e.g. udev). Instead of trying convincing udev to not touch libvirt managed devices, we can create a separate mount namespace for the qemu, and mount our own /dev there. Of course this puts more work onto us as we have to maintain /dev files on each domain start and device hot(un-)plug. On the other hand, this enhances security also.
From technical POV, on domain startup process the parent (libvirtd) creates:
/var/lib/libvirt/qemu/$domain.dev /var/lib/libvirt/qemu/$domain.devpts
The child (which is going to be qemu eventually) calls unshare() to create new mount namespace. From now on anything that child does is invisible to the parent. Child then mounts tmpfs on $domain.dev (so that it still sees original /dev from the host) and creates some devices (as explained in one of the previous patches). The devices have to be created exactly as they are in the host (including perms, seclabels, ACLs, ...). After that it moves $domain.dev mount to /dev.
What's the $domain.devpts mount there for then you ask? QEMU can create PTYs for some chardevs. And historically we exposed the host ends in our domain XML allowing users to connect to them. Therefore we must preserve devpts mount to be shared with the host's one.
To make this patch as small as possible, creating of devices configured for domain in question is implemented in next patches.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 377 +++++++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_domain.h | 20 +++ src/qemu/qemu_process.c | 13 ++ 3 files changed, 408 insertions(+), 2 deletions(-)
+int +qemuDomainBuildNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + const unsigned long mount_flags = MS_MOVE; + char *devPath = NULL; + char *devptsPath = NULL; + int ret = -1; + + if (!qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) { + ret = 0; + goto cleanup; + } + + if (virAsprintf(&devPath, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0 || + virAsprintf(&devptsPath, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (qemuDomainSetupDev(driver, vm, devPath) < 0) + goto cleanup; + + /* Save /dev/pts mount point because /dev/pts/NNN is exposed in our live + * XML and other applications are supposed to be able to use it. */ + if (mount("/dev/pts", devptsPath, NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, "%s", + _("Unable to move /dev/pts mount")); + goto cleanup; + } + + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev"), + devPath); + goto cleanup; + } + + if (virFileMakePath("/dev/pts") < 0) { + virReportSystemError(errno, "%s", + _("Cannot create /dev/pts")); + goto cleanup; + } + + if (mount(devptsPath, "/dev/pts", NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev/pts"), + devptsPath); + goto cleanup; + } + + if (virFileBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0) + goto cleanup;
Kill this line, since it breaks Fedora with a restrictive /dev/pts/ptmx. Unlike wth the LXC driver, we don't need this symlink, since we are using the original /dev/pts, not creating a new instance. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 50b401f79..2d9b2d647 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6862,6 +6862,53 @@ qemuDomainSetupDev(virQEMUDriverPtr driver, } +static int +qemuDomainSetupDisk(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainDiskDefPtr disk, + const char *devPath) +{ + virStorageSourcePtr next; + char *dst = NULL; + int ret = -1; + + for (next = disk->src; next; next = next->backingStore) { + if (!next->path || !virStorageSourceIsLocalStorage(next) || + !STRPREFIX(next->path, "/dev")) { + /* Not creating device. Just continue. */ + continue; + } + + if (qemuDomainCreateDevice(next->path, devPath, false) < 0) + goto cleanup; + } + + ret = 0; + cleanup: + VIR_FREE(dst); + return ret; +} + + +static int +qemuDomainSetupAllDisks(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *devPath) +{ + size_t i; + VIR_DEBUG("Setting up disks"); + + for (i = 0; i < vm->def->ndisks; i++) { + if (qemuDomainSetupDisk(driver, + vm->def->disks[i], + devPath) < 0) + return -1; + } + + VIR_DEBUG("Setup all disks"); + return 0; +} + + int qemuDomainBuildNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm) @@ -6894,6 +6941,9 @@ qemuDomainBuildNamespace(virQEMUDriverPtr driver, goto cleanup; } + if (qemuDomainSetupAllDisks(driver, vm, devPath) < 0) + goto cleanup; + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { virReportSystemError(errno, _("Failed to mount %s on /dev"), -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:16AM +0100, Michal Privoznik wrote:
When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 161 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 161 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 2d9b2d647..c6c3cb377 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6909,6 +6909,164 @@ qemuDomainSetupAllDisks(virQEMUDriverPtr driver, } +static int +qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, + char **path) +{ + int ret = -1; + virDomainHostdevSubsysUSBPtr usbsrc = &dev->source.subsys.u.usb; + virDomainHostdevSubsysPCIPtr pcisrc = &dev->source.subsys.u.pci; + virDomainHostdevSubsysSCSIPtr scsisrc = &dev->source.subsys.u.scsi; + virDomainHostdevSubsysSCSIVHostPtr hostsrc = &dev->source.subsys.u.scsi_host; + virPCIDevicePtr pci = NULL; + virUSBDevicePtr usb = NULL; + virSCSIDevicePtr scsi = NULL; + virSCSIVHostDevicePtr host = NULL; + char *tmpPath = NULL; + bool freeTmpPath = false; + + *path = NULL; + + switch ((virDomainHostdevMode) dev->mode) { + case VIR_DOMAIN_HOSTDEV_MODE_SUBSYS: + switch ((virDomainHostdevSubsysType) dev->source.subsys.type) { + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: + if (pcisrc->backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) { + pci = virPCIDeviceNew(pcisrc->addr.domain, + pcisrc->addr.bus, + pcisrc->addr.slot, + pcisrc->addr.function); + if (!pci) + goto cleanup; + + if (!(tmpPath = virPCIDeviceGetIOMMUGroupDev(pci))) + goto cleanup; + freeTmpPath = true; + } + break; + + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB: + if (dev->missing) + break; + usb = virUSBDeviceNew(usbsrc->bus, + usbsrc->device, + NULL); + if (!usb) + goto cleanup; + + if (!(tmpPath = (char *) virUSBDeviceGetPath(usb))) + goto cleanup; + break; + + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI: + if (scsisrc->protocol == VIR_DOMAIN_HOSTDEV_SCSI_PROTOCOL_TYPE_ISCSI) { + virDomainHostdevSubsysSCSIiSCSIPtr iscsisrc = &scsisrc->u.iscsi; + /* Follow qemuSetupDiskCgroup() and qemuSetImageCgroupInternal() + * which does nothing for non local storage + */ + VIR_DEBUG("Not updating /dev for hostdev iSCSI path '%s'", iscsisrc->path); + } else { + virDomainHostdevSubsysSCSIHostPtr scsihostsrc = &scsisrc->u.host; + scsi = virSCSIDeviceNew(NULL, + scsihostsrc->adapter, + scsihostsrc->bus, + scsihostsrc->target, + scsihostsrc->unit, + dev->readonly, + dev->shareable); + + if (!scsi) + goto cleanup; + + if (!(tmpPath = (char *) virSCSIDeviceGetPath(scsi))) + goto cleanup; + } + break; + + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI_HOST: { + if (hostsrc->protocol == + VIR_DOMAIN_HOSTDEV_SUBSYS_SCSI_HOST_PROTOCOL_TYPE_VHOST) { + if (!(host = virSCSIVHostDeviceNew(hostsrc->wwpn))) + goto cleanup; + + if (!(tmpPath = (char *) virSCSIVHostDeviceGetPath(host))) + goto cleanup; + } + break; + } + + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_LAST: + break; + } + break; + + case VIR_DOMAIN_HOSTDEV_MODE_CAPABILITIES: + case VIR_DOMAIN_HOSTDEV_MODE_LAST: + /* nada */ + break; + } + + if (VIR_STRDUP(*path, tmpPath) < 0) + goto cleanup; + + ret = 0; + cleanup: + virPCIDeviceFree(pci); + virUSBDeviceFree(usb); + virSCSIDeviceFree(scsi); + virSCSIVHostDeviceFree(host); + if (freeTmpPath) + VIR_FREE(tmpPath); + return ret; +} + + +static int +qemuDomainSetupHostdev(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainHostdevDefPtr dev, + const char *devPath) +{ + int ret = -1; + char *path = NULL; + + if (qemuDomainGetHostdevPath(dev, &path) < 0) + goto cleanup; + + if (!path) { + /* There's no /dev device that we need to create. Claim success. */ + ret = 0; + goto cleanup; + } + + if (qemuDomainCreateDevice(path, devPath, false) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(path); + return ret; +} + + +static int +qemuDomainSetupAllHostdevs(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *devPath) +{ + size_t i; + + VIR_DEBUG("Setting up hostdevs"); + for (i = 0; i < vm->def->nhostdevs; i++) { + if (qemuDomainSetupHostdev(driver, + vm->def->hostdevs[i], + devPath) < 0) + return -1; + } + VIR_DEBUG("Setup all hostdevs"); + return 0; +} + + int qemuDomainBuildNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm) @@ -6944,6 +7102,9 @@ qemuDomainBuildNamespace(virQEMUDriverPtr driver, if (qemuDomainSetupAllDisks(driver, vm, devPath) < 0) goto cleanup; + if (qemuDomainSetupAllHostdevs(driver, vm, devPath) < 0) + goto cleanup; + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { virReportSystemError(errno, _("Failed to mount %s on /dev"), -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:17AM +0100, Michal Privoznik wrote:
When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 161 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 161 insertions(+)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index c6c3cb377..6cff3c92e 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7067,6 +7067,38 @@ qemuDomainSetupAllHostdevs(virQEMUDriverPtr driver, } +static int +qemuDomainSetupChardev(virDomainDefPtr def ATTRIBUTE_UNUSED, + virDomainChrDefPtr dev, + void *opaque) +{ + const char *devPath = opaque; + + if (dev->source->type != VIR_DOMAIN_CHR_TYPE_DEV) + return 0; + + return qemuDomainCreateDevice(dev->source->data.file.path, devPath, false); +} + + +static int +qemuDomainSetupAllChardevs(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm, + const char *devPath) +{ + VIR_DEBUG("Setting up chardevs"); + + if (virDomainChrDefForeach(vm->def, + true, + qemuDomainSetupChardev, + (void *) devPath) < 0) + return -1; + + VIR_DEBUG("Setup all chardevs"); + return 0; +} + + int qemuDomainBuildNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm) @@ -7105,6 +7137,9 @@ qemuDomainBuildNamespace(virQEMUDriverPtr driver, if (qemuDomainSetupAllHostdevs(driver, vm, devPath) < 0) goto cleanup; + if (qemuDomainSetupAllChardevs(driver, vm, devPath) < 0) + goto cleanup; + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { virReportSystemError(errno, _("Failed to mount %s on /dev"), -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:18AM +0100, Michal Privoznik wrote:
When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 6cff3c92e..9beca4cd3 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7099,6 +7099,35 @@ qemuDomainSetupAllChardevs(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, } +static int +qemuDomainSetupTPM(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm, + const char *devPath) +{ + virDomainTPMDefPtr dev = vm->def->tpm; + + if (!dev) + return 0; + + VIR_DEBUG("Setting up TPM"); + + switch (dev->type) { + case VIR_DOMAIN_TPM_TYPE_PASSTHROUGH: + if (qemuDomainCreateDevice(dev->data.passthrough.source.data.file.path, + devPath, false) < 0) + return -1; + break; + + case VIR_DOMAIN_TPM_TYPE_LAST: + /* nada */ + break; + } + + VIR_DEBUG("Setup TPM"); + return 0; +} + + int qemuDomainBuildNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm) @@ -7140,6 +7169,9 @@ qemuDomainBuildNamespace(virQEMUDriverPtr driver, if (qemuDomainSetupAllChardevs(driver, vm, devPath) < 0) goto cleanup; + if (qemuDomainSetupTPM(driver, vm, devPath) < 0) + goto cleanup; + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { virReportSystemError(errno, _("Failed to mount %s on /dev"), -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:19AM +0100, Michal Privoznik wrote:
When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 9beca4cd3..6dd7c029d 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7128,6 +7128,52 @@ qemuDomainSetupTPM(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, } +static int +qemuDomainSetupInput(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainInputDefPtr input, + const char *devPath) +{ + int ret = -1; + + switch ((virDomainInputType) input->type) { + case VIR_DOMAIN_INPUT_TYPE_PASSTHROUGH: + if (qemuDomainCreateDevice(input->source.evdev, devPath, false) < 0) + goto cleanup; + break; + + case VIR_DOMAIN_INPUT_TYPE_MOUSE: + case VIR_DOMAIN_INPUT_TYPE_TABLET: + case VIR_DOMAIN_INPUT_TYPE_KBD: + case VIR_DOMAIN_INPUT_TYPE_LAST: + /* nada */ + break; + } + + ret = 0; + cleanup: + return ret; +} + + +static int +qemuDomainSetupAllInputs(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *devPath) +{ + size_t i; + + VIR_DEBUG("Setting up disks"); + for (i = 0; i < vm->def->ninputs; i++) { + if (qemuDomainSetupInput(driver, + vm->def->inputs[i], + devPath) < 0) + return -1; + } + VIR_DEBUG("Setup all disks"); + return 0; +} + + int qemuDomainBuildNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm) @@ -7172,6 +7218,9 @@ qemuDomainBuildNamespace(virQEMUDriverPtr driver, if (qemuDomainSetupTPM(driver, vm, devPath) < 0) goto cleanup; + if (qemuDomainSetupAllInputs(driver, vm, devPath) < 0) + goto cleanup; + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { virReportSystemError(errno, _("Failed to mount %s on /dev"), -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:20AM +0100, Michal Privoznik wrote:
When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 6dd7c029d..a7d26e58c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7174,6 +7174,46 @@ qemuDomainSetupAllInputs(virQEMUDriverPtr driver, } +static int +qemuDomainSetupRNG(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainRNGDefPtr rng, + const char *devPath) +{ + switch ((virDomainRNGBackend) rng->backend) { + case VIR_DOMAIN_RNG_BACKEND_RANDOM: + if (qemuDomainCreateDevice(rng->source.file, devPath, false) < 0) + return -1; + + case VIR_DOMAIN_RNG_BACKEND_EGD: + case VIR_DOMAIN_RNG_BACKEND_LAST: + /* nada */ + break; + } + + return 0; +} + + +static int +qemuDomainSetupAllRNGs(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *devPath) +{ + size_t i; + + VIR_DEBUG("Setting up RNGs"); + for (i = 0; i < vm->def->nrngs; i++) { + if (qemuDomainSetupRNG(driver, + vm->def->rngs[i], + devPath) < 0) + return -1; + } + + VIR_DEBUG("Setup all RNGs"); + return 0; +} + + int qemuDomainBuildNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm) @@ -7221,6 +7261,9 @@ qemuDomainBuildNamespace(virQEMUDriverPtr driver, if (qemuDomainSetupAllInputs(driver, vm, devPath) < 0) goto cleanup; + if (qemuDomainSetupAllRNGs(driver, vm, devPath) < 0) + goto cleanup; + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { virReportSystemError(errno, _("Failed to mount %s on /dev"), -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:21AM +0100, Michal Privoznik wrote:
When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

Instead of trying to fix our security drivers, we can use a simple trick to relabel paths in both namespace and the host. I mean, if we enter the namespace some paths are still shared with the host so any change done to them is visible from the host too. Therefore, we can just enter the namespace and call SetAllLabel()/RestoreAllLabel() from there. Yes, it has slight overhead because we have to fork in order to enter the namespace. But on the other hand, no complexity is added to our code. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/Makefile.am | 3 +- src/qemu/qemu_process.c | 15 +++--- src/qemu/qemu_security.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_security.h | 39 ++++++++++++++ 4 files changed, 181 insertions(+), 8 deletions(-) create mode 100644 src/qemu/qemu_security.c create mode 100644 src/qemu/qemu_security.h diff --git a/src/Makefile.am b/src/Makefile.am index 07a28335a..073d50723 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -838,7 +838,8 @@ QEMU_DRIVER_SOURCES = \ qemu/qemu_monitor_json.h \ qemu/qemu_driver.c qemu/qemu_driver.h \ qemu/qemu_interface.c qemu/qemu_interface.h \ - qemu/qemu_capspriv.h + qemu/qemu_capspriv.h \ + qemu/qemu_security.c qemu/qemu_security.h XENAPI_DRIVER_SOURCES = \ xenapi/xenapi_driver.c xenapi/xenapi_driver.h \ diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index b87e5af56..be569f3b3 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -45,6 +45,7 @@ #include "qemu_hotplug.h" #include "qemu_migration.h" #include "qemu_interface.h" +#include "qemu_security.h" #include "cpu/cpu.h" #include "datatypes.h" @@ -5522,10 +5523,10 @@ qemuProcessLaunch(virConnectPtr conn, goto cleanup; VIR_DEBUG("Setting domain security labels"); - if (virSecurityManagerSetAllLabel(driver->securityManager, - vm->def, - incoming ? incoming->path : NULL) < 0) - goto cleanup; + if (qemuSecuritySetAllLabel(driver, + vm, + incoming ? incoming->path : NULL) < 0) + goto cleanup; /* Security manager labeled all devices, therefore * if any operation from now on fails, we need to ask the caller to @@ -6060,9 +6061,9 @@ void qemuProcessStop(virQEMUDriverPtr driver, /* Reset Security Labels unless caller don't want us to */ if (!(flags & VIR_QEMU_PROCESS_STOP_NO_RELABEL)) - virSecurityManagerRestoreAllLabel(driver->securityManager, - vm->def, - !!(flags & VIR_QEMU_PROCESS_STOP_MIGRATED)); + qemuSecurityRestoreAllLabel(driver, vm, + !!(flags & VIR_QEMU_PROCESS_STOP_MIGRATED)); + virSecurityManagerReleaseLabel(driver->securityManager, vm->def); for (i = 0; i < vm->def->ndisks; i++) { diff --git a/src/qemu/qemu_security.c b/src/qemu/qemu_security.c new file mode 100644 index 000000000..bfebfe42c --- /dev/null +++ b/src/qemu/qemu_security.c @@ -0,0 +1,132 @@ +/* + * qemu_security.c: QEMU security management + * + * Copyright (C) 2016 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Michal Privoznik <mprivozn@redhat.com> + */ + +#include <config.h> + +#include "qemu_domain.h" +#include "qemu_security.h" +#include "virlog.h" + +#define VIR_FROM_THIS VIR_FROM_QEMU + +VIR_LOG_INIT("qemu.qemu_process"); + +struct qemuSecuritySetRestoreAllLabelData { + bool set; + virQEMUDriverPtr driver; + virDomainObjPtr vm; + const char *stdin_path; + bool migrated; +}; + + +static int +qemuSecuritySetRestoreAllLabelHelper(pid_t pid, + void *opaque) +{ + struct qemuSecuritySetRestoreAllLabelData *data = opaque; + + virSecurityManagerPostFork(data->driver->securityManager); + + if (data->set) { + VIR_DEBUG("Setting up security labels inside namespace pid=%lld", + (long long) pid); + if (virSecurityManagerSetAllLabel(data->driver->securityManager, + data->vm->def, + data->stdin_path) < 0) + return -1; + } else { + VIR_DEBUG("Restoring security labels inside namespace pid=%lld", + (long long) pid); + if (virSecurityManagerRestoreAllLabel(data->driver->securityManager, + data->vm->def, + data->migrated) < 0) + return -1; + } + + return 0; +} + + +int +qemuSecuritySetAllLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *stdin_path) +{ + struct qemuSecuritySetRestoreAllLabelData data; + + memset(&data, 0, sizeof(data)); + + data.set = true; + data.driver = driver; + data.vm = vm; + data.stdin_path = stdin_path; + + if (qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) { + if (virSecurityManagerPreFork(driver->securityManager) < 0) + return -1; + if (virProcessRunInMountNamespace(vm->pid, + qemuSecuritySetRestoreAllLabelHelper, + &data) < 0) { + virSecurityManagerPostFork(driver->securityManager); + return -1; + } + virSecurityManagerPostFork(driver->securityManager); + + } else { + if (virSecurityManagerSetAllLabel(driver->securityManager, + vm->def, + stdin_path) < 0) + return -1; + } + return 0; +} + + +void +qemuSecurityRestoreAllLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + bool migrated) +{ + struct qemuSecuritySetRestoreAllLabelData data; + + memset(&data, 0, sizeof(data)); + + data.driver = driver; + data.vm = vm; + data.migrated = migrated; + + if (qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) { + if (virSecurityManagerPreFork(driver->securityManager) < 0) + return; + + virProcessRunInMountNamespace(vm->pid, + qemuSecuritySetRestoreAllLabelHelper, + &data); + virSecurityManagerPostFork(driver->securityManager); + } else { + virSecurityManagerRestoreAllLabel(driver->securityManager, + vm->def, + migrated); + } +} diff --git a/src/qemu/qemu_security.h b/src/qemu/qemu_security.h new file mode 100644 index 000000000..88c53e765 --- /dev/null +++ b/src/qemu/qemu_security.h @@ -0,0 +1,39 @@ +/* + * qemu_security.h: QEMU security management + * + * Copyright (C) 2016 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Michal Privoznik <mprivozn@redhat.com> + */ + +#ifndef __QEMU_SECURITY_H__ +# define __QEMU_SECURITY_H__ + +# include <stdbool.h> + +# include "qemu_conf.h" +# include "domain_conf.h" + +int qemuSecuritySetAllLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *stdin_path); + +void qemuSecurityRestoreAllLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + bool migrated); +#endif /* __QEMU_SECURITY_H__ */ -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:22AM +0100, Michal Privoznik wrote:
Instead of trying to fix our security drivers, we can use a simple trick to relabel paths in both namespace and the host. I mean, if we enter the namespace some paths are still shared with the host so any change done to them is visible from the host too. Therefore, we can just enter the namespace and call SetAllLabel()/RestoreAllLabel() from there. Yes, it has slight overhead because we have to fork in order to enter the namespace. But on the other hand, no complexity is added to our code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/Makefile.am | 3 +- src/qemu/qemu_process.c | 15 +++--- src/qemu/qemu_security.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_security.h | 39 ++++++++++++++ 4 files changed, 181 insertions(+), 8 deletions(-) create mode 100644 src/qemu/qemu_security.c create mode 100644 src/qemu/qemu_security.h
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 214 +++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 8 ++ src/qemu/qemu_driver.c | 5 +- src/qemu/qemu_hotplug.c | 22 +++-- src/qemu/qemu_security.c | 32 +++++++ src/qemu/qemu_security.h | 8 ++ 6 files changed, 280 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index a7d26e58c..ff32bb5eb 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -53,6 +53,11 @@ #include "storage/storage_driver.h" +#ifdef MAJOR_IN_MKDEV +# include <sys/mkdev.h> +#elif MAJOR_IN_SYSMACROS +# include <sys/sysmacros.h> +#endif #include <sys/time.h> #include <fcntl.h> #if defined(HAVE_SYS_MOUNT_H) @@ -7396,3 +7401,212 @@ qemuDomainDeleteNamespace(virQEMUDriverPtr driver, VIR_FREE(devptsPath); VIR_FREE(devPath); } + + +struct qemuDomainAttachDeviceMknodData { + virQEMUDriverPtr driver; + virDomainObjPtr vm; + virDomainDeviceDefPtr devDef; + const char *file; + struct stat sb; + void *acl; +}; + + +static int +qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, + void *opaque) +{ + struct qemuDomainAttachDeviceMknodData *data = opaque; + int ret = -1; + + virSecurityManagerPostFork(data->driver->securityManager); + + if (virFileMakeParentPath(data->file) < 0) { + virReportSystemError(errno, + _("Unable to create %s"), data->file); + goto cleanup; + } + + VIR_DEBUG("Creating dev %s (%d,%d)", + data->file, major(data->sb.st_rdev), minor(data->sb.st_rdev)); + if (mknod(data->file, data->sb.st_mode, data->sb.st_rdev) < 0) { + /* Because we are not removing devices on hotunplug, or + * we might be creating part of backing chain that + * already exist due to a different disk plugged to + * domain, accept EEXIST. */ + if (errno != EEXIST) { + virReportSystemError(errno, + _("Unable to create device %s"), + data->file); + goto cleanup; + } + } + + if (virFileSetACLs(data->file, data->acl) < 0 && + errno != ENOTSUP) { + virReportSystemError(errno, + _("Unable to set ACLs on %s"), data->file); + goto cleanup; + } + + switch ((virDomainDeviceType) data->devDef->type) { + case VIR_DOMAIN_DEVICE_DISK: { + virDomainDiskDefPtr def = data->devDef->data.disk; + char *tmpsrc = def->src->path; + def->src->path = (char *) data->file; + if (virSecurityManagerSetDiskLabel(data->driver->securityManager, + data->vm->def, def) < 0) { + def->src->path = tmpsrc; + goto cleanup; + } + def->src->path = tmpsrc; + } break; + + case VIR_DOMAIN_DEVICE_NONE: + case VIR_DOMAIN_DEVICE_LEASE: + case VIR_DOMAIN_DEVICE_FS: + case VIR_DOMAIN_DEVICE_NET: + case VIR_DOMAIN_DEVICE_INPUT: + case VIR_DOMAIN_DEVICE_SOUND: + case VIR_DOMAIN_DEVICE_VIDEO: + case VIR_DOMAIN_DEVICE_HOSTDEV: + case VIR_DOMAIN_DEVICE_WATCHDOG: + case VIR_DOMAIN_DEVICE_CONTROLLER: + case VIR_DOMAIN_DEVICE_GRAPHICS: + case VIR_DOMAIN_DEVICE_HUB: + case VIR_DOMAIN_DEVICE_REDIRDEV: + case VIR_DOMAIN_DEVICE_SMARTCARD: + case VIR_DOMAIN_DEVICE_CHR: + case VIR_DOMAIN_DEVICE_MEMBALLOON: + case VIR_DOMAIN_DEVICE_NVRAM: + case VIR_DOMAIN_DEVICE_RNG: + case VIR_DOMAIN_DEVICE_SHMEM: + case VIR_DOMAIN_DEVICE_TPM: + case VIR_DOMAIN_DEVICE_PANIC: + case VIR_DOMAIN_DEVICE_MEMORY: + case VIR_DOMAIN_DEVICE_IOMMU: + case VIR_DOMAIN_DEVICE_LAST: + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unexpected device type %d"), + data->devDef->type); + goto cleanup; + } + + ret = 0; + cleanup: + if (ret < 0) + unlink(data->file); + virFileFreeACLs(&data->acl); + return ret; +} + + +static int +qemuDomainAttachDeviceMknod(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDeviceDefPtr devDef, + const char *file) +{ + struct qemuDomainAttachDeviceMknodData data; + int ret = -1; + + memset(&data, 0, sizeof(data)); + + data.driver = driver; + data.vm = vm; + data.devDef = devDef; + data.file = file; + + if (stat(file, &data.sb) < 0) { + virReportSystemError(errno, + _("Unable to access %s"), file); + return ret; + } + + if (virFileGetACLs(file, &data.acl) < 0 && + errno != ENOTSUP) { + virReportSystemError(errno, + _("Unable to get ACLs on %s"), file); + return ret; + } + + if (virSecurityManagerPreFork(driver->securityManager) < 0) + goto cleanup; + + if (virProcessRunInMountNamespace(vm->pid, + qemuDomainAttachDeviceMknodHelper, + &data) < 0) { + virSecurityManagerPostFork(driver->securityManager); + goto cleanup; + } + + virSecurityManagerPostFork(driver->securityManager); + + ret = 0; + cleanup: + virFileFreeACLs(&data.acl); + return 0; +} + + +int +qemuDomainNamespaceSetupDisk(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk) +{ + virDomainDeviceDef dev = {.type = VIR_DOMAIN_DEVICE_DISK, .data.disk = disk}; + virStorageSourcePtr next; + const char *src = NULL; + struct stat sb; + int ret = -1; + + if (!qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) + return 0; + + for (next = disk->src; next; next = next->backingStore) { + if (!next->path || !virStorageSourceIsBlockLocal(disk->src) || + !STRPREFIX(next->path, "/dev")) { + /* Not creating device. Just continue. */ + continue; + } + + if (stat(next->path, &sb) < 0) { + virReportSystemError(errno, + _("Unable to access %s"), src); + goto cleanup; + } + + if (!S_ISBLK(sb.st_mode)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Disk source %s must be a block device"), + src); + goto cleanup; + } + + if (qemuDomainAttachDeviceMknod(driver, + vm, + &dev, + next->path) < 0) + goto cleanup; + } + + ret = 0; + cleanup: + return ret; +} + + +int +qemuDomainNamespaceTeardownDisk(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + virDomainDiskDefPtr disk ATTRIBUTE_UNUSED) +{ + /* While in hotplug case we create the whole backing chain, + * here we must limit ourselves. The disk we want to remove + * might be a part of backing chain of another disk. + * If you are reading these lines and have some spare time + * you can come up with and algorithm that checks for that. + * I don't, therefore: */ + return 0; +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 9dad5bc7c..c63f7c954 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -805,4 +805,12 @@ int qemuDomainCreateNamespace(virQEMUDriverPtr driver, void qemuDomainDeleteNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm); + +int qemuDomainNamespaceSetupDisk(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk); + +int qemuDomainNamespaceTeardownDisk(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 4f624cbde..1256445f8 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -57,6 +57,7 @@ #include "qemu_process.h" #include "qemu_migration.h" #include "qemu_blockjob.h" +#include "qemu_security.h" #include "virerror.h" #include "virlog.h" @@ -16107,9 +16108,9 @@ qemuDomainBlockPivot(virQEMUDriverPtr driver, disk->mirror->format != VIR_STORAGE_FILE_RAW && (virDomainLockDiskAttach(driver->lockManager, cfg->uri, vm, disk) < 0 || + qemuDomainNamespaceSetupDisk(driver, vm, disk) < 0 || qemuSetupDiskCgroup(vm, disk) < 0 || - virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, - disk) < 0)) + qemuSecuritySetDiskLabel(driver, vm, disk) < 0)) goto cleanup; disk->src = oldsrc; diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 55af88850..bc953c0db 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -34,6 +34,7 @@ #include "qemu_hostdev.h" #include "qemu_interface.h" #include "qemu_process.h" +#include "qemu_security.h" #include "domain_audit.h" #include "netdev_bandwidth_conf.h" #include "domain_nwfilter.h" @@ -109,13 +110,15 @@ qemuDomainPrepareDisk(virQEMUDriverPtr driver, vm, disk) < 0) goto cleanup; - if (virSecurityManagerSetDiskLabel(driver->securityManager, - vm->def, disk) < 0) + if (qemuSecuritySetDiskLabel(driver, vm, disk) < 0) goto rollback_lock; - if (qemuSetupDiskCgroup(vm, disk) < 0) + if (qemuDomainNamespaceSetupDisk(driver, vm, disk) < 0) goto rollback_label; + if (qemuSetupDiskCgroup(vm, disk) < 0) + goto rollback_namespace; + ret = 0; goto cleanup; @@ -123,10 +126,13 @@ qemuDomainPrepareDisk(virQEMUDriverPtr driver, if (qemuTeardownDiskCgroup(vm, disk) < 0) VIR_WARN("Unable to tear down cgroup access on %s", virDomainDiskGetSource(disk)); + rollback_namespace: + if (qemuDomainNamespaceTeardownDisk(driver, vm, disk) < 0) + VIR_WARN("Unable to remove /dev entry for %s", + virDomainDiskGetSource(disk)); rollback_label: - if (virSecurityManagerRestoreDiskLabel(driver->securityManager, - vm->def, disk) < 0) + if (qemuSecurityRestoreDiskLabel(driver, vm, disk) < 0) VIR_WARN("Unable to restore security label on %s", virDomainDiskGetSource(disk)); @@ -3588,8 +3594,7 @@ qemuDomainRemoveDiskDevice(virQEMUDriverPtr driver, qemuDomainReleaseDeviceAddress(vm, &disk->info, src); - if (virSecurityManagerRestoreDiskLabel(driver->securityManager, - vm->def, disk) < 0) + if (qemuSecurityRestoreDiskLabel(driver, vm, disk) < 0) VIR_WARN("Unable to restore security label on %s", src); if (qemuTeardownDiskCgroup(vm, disk) < 0) @@ -3598,6 +3603,9 @@ qemuDomainRemoveDiskDevice(virQEMUDriverPtr driver, if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) VIR_WARN("Unable to release lock on %s", src); + if (qemuDomainNamespaceTeardownDisk(driver, vm, disk) < 0) + VIR_WARN("Unable to remove /dev entry for %s", src); + dev.type = VIR_DOMAIN_DEVICE_DISK; dev.data.disk = disk; ignore_value(qemuRemoveSharedDevice(driver, &dev, vm->def->name)); diff --git a/src/qemu/qemu_security.c b/src/qemu/qemu_security.c index bfebfe42c..ef0865b61 100644 --- a/src/qemu/qemu_security.c +++ b/src/qemu/qemu_security.c @@ -130,3 +130,35 @@ qemuSecurityRestoreAllLabel(virQEMUDriverPtr driver, migrated); } } + + +int +qemuSecuritySetDiskLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk) +{ + if (qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) { + /* Already handled by namespace code. */ + return 0; + } + + return virSecurityManagerSetDiskLabel(driver->securityManager, + vm->def, + disk); +} + + +int +qemuSecurityRestoreDiskLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk) +{ + if (qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) { + /* Already handled by namespace code. */ + return 0; + } + + return virSecurityManagerRestoreDiskLabel(driver->securityManager, + vm->def, + disk); +} diff --git a/src/qemu/qemu_security.h b/src/qemu/qemu_security.h index 88c53e765..e3324ca8c 100644 --- a/src/qemu/qemu_security.h +++ b/src/qemu/qemu_security.h @@ -36,4 +36,12 @@ int qemuSecuritySetAllLabel(virQEMUDriverPtr driver, void qemuSecurityRestoreAllLabel(virQEMUDriverPtr driver, virDomainObjPtr vm, bool migrated); + +int qemuSecuritySetDiskLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk); + +int qemuSecurityRestoreDiskLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk); #endif /* __QEMU_SECURITY_H__ */ -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:23AM +0100, Michal Privoznik wrote:
When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 214 +++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 8 ++ src/qemu/qemu_driver.c | 5 +- src/qemu/qemu_hotplug.c | 22 +++-- src/qemu/qemu_security.c | 32 +++++++ src/qemu/qemu_security.h | 8 ++ 6 files changed, 280 insertions(+), 9 deletions(-)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 157 ++++++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_domain.h | 8 +++ src/qemu/qemu_hotplug.c | 48 ++++++++++----- src/qemu/qemu_security.c | 34 ++++++++++ src/qemu/qemu_security.h | 8 +++ 5 files changed, 240 insertions(+), 15 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index ff32bb5eb..d0ba316d2 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7463,6 +7463,13 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, def->src->path = tmpsrc; } break; + case VIR_DOMAIN_DEVICE_HOSTDEV: { + virDomainHostdevDefPtr def = data->devDef->data.hostdev; + if (virSecurityManagerSetHostdevLabel(data->driver->securityManager, + data->vm->def, def, NULL) < 0) + goto cleanup; + } break; + case VIR_DOMAIN_DEVICE_NONE: case VIR_DOMAIN_DEVICE_LEASE: case VIR_DOMAIN_DEVICE_FS: @@ -7470,7 +7477,6 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, case VIR_DOMAIN_DEVICE_INPUT: case VIR_DOMAIN_DEVICE_SOUND: case VIR_DOMAIN_DEVICE_VIDEO: - case VIR_DOMAIN_DEVICE_HOSTDEV: case VIR_DOMAIN_DEVICE_WATCHDOG: case VIR_DOMAIN_DEVICE_CONTROLLER: case VIR_DOMAIN_DEVICE_GRAPHICS: @@ -7550,6 +7556,91 @@ qemuDomainAttachDeviceMknod(virQEMUDriverPtr driver, } +static int +qemuDomainDetachDeviceUnlinkHelper(pid_t pid ATTRIBUTE_UNUSED, + void *opaque) +{ + const char *path = opaque; + + VIR_DEBUG("Unlinking %s", path); + if (unlink(path) < 0 && errno != ENOENT) { + virReportSystemError(errno, + _("Unable to remove device %s"), path); + return -1; + } + + return 0; +} + + +static int +qemuDomainDetachDeviceUnlink(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDeviceDefPtr dev, + const char *file) +{ + /* Technically, this is not needed. Yet. But in the future + * security managers might do some reference counting over + * Set/Restore label and thus for every SetLabel() there + * should be corresponding RestoreLabel(). */ + switch ((virDomainDeviceType) dev->type) { + case VIR_DOMAIN_DEVICE_DISK: { + virDomainDiskDefPtr def = dev->data.disk; + char *tmpsrc = def->src->path; + def->src->path = (char *) file; + if (virSecurityManagerRestoreDiskLabel(driver->securityManager, + vm->def, def) < 0) { + def->src->path = tmpsrc; + return -1; + } + def->src->path = tmpsrc; + } break; + + case VIR_DOMAIN_DEVICE_HOSTDEV: { + virDomainHostdevDefPtr def = dev->data.hostdev; + if (virSecurityManagerRestoreHostdevLabel(driver->securityManager, + vm->def, def, NULL) < 0) + return -1; + } break; + + case VIR_DOMAIN_DEVICE_NONE: + case VIR_DOMAIN_DEVICE_LEASE: + case VIR_DOMAIN_DEVICE_FS: + case VIR_DOMAIN_DEVICE_NET: + case VIR_DOMAIN_DEVICE_INPUT: + case VIR_DOMAIN_DEVICE_SOUND: + case VIR_DOMAIN_DEVICE_VIDEO: + case VIR_DOMAIN_DEVICE_WATCHDOG: + case VIR_DOMAIN_DEVICE_CONTROLLER: + case VIR_DOMAIN_DEVICE_GRAPHICS: + case VIR_DOMAIN_DEVICE_HUB: + case VIR_DOMAIN_DEVICE_REDIRDEV: + case VIR_DOMAIN_DEVICE_SMARTCARD: + case VIR_DOMAIN_DEVICE_CHR: + case VIR_DOMAIN_DEVICE_MEMBALLOON: + case VIR_DOMAIN_DEVICE_NVRAM: + case VIR_DOMAIN_DEVICE_RNG: + case VIR_DOMAIN_DEVICE_SHMEM: + case VIR_DOMAIN_DEVICE_TPM: + case VIR_DOMAIN_DEVICE_PANIC: + case VIR_DOMAIN_DEVICE_MEMORY: + case VIR_DOMAIN_DEVICE_IOMMU: + case VIR_DOMAIN_DEVICE_LAST: + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unexpected device type %d"), + dev->type); + return -1; + } + + if (virProcessRunInMountNamespace(vm->pid, + qemuDomainDetachDeviceUnlinkHelper, + (void *)file) < 0) + return -1; + + return 0; +} + + int qemuDomainNamespaceSetupDisk(virQEMUDriverPtr driver, virDomainObjPtr vm, @@ -7610,3 +7701,67 @@ qemuDomainNamespaceTeardownDisk(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, * I don't, therefore: */ return 0; } + + +int +qemuDomainNamespaceSetupHostdev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev) +{ + virDomainDeviceDef dev = {.type = VIR_DOMAIN_DEVICE_HOSTDEV, .data.hostdev = hostdev}; + int ret = -1; + char *path = NULL; + + if (!qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) + return 0; + + if (qemuDomainGetHostdevPath(hostdev, &path) < 0) + goto cleanup; + + if (!path) { + /* There's no /dev device that we need to create. Claim success. */ + ret = 0; + goto cleanup; + } + + if (qemuDomainAttachDeviceMknod(driver, + vm, + &dev, + path) < 0) + goto cleanup; + ret = 0; + cleanup: + VIR_FREE(path); + return ret; +} + + +int +qemuDomainNamespaceTeardownHostdev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev) +{ + virDomainDeviceDef dev = {.type = VIR_DOMAIN_DEVICE_HOSTDEV, .data.hostdev = hostdev}; + int ret = -1; + char *path = NULL; + + if (!qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) + return 0; + + if (qemuDomainGetHostdevPath(hostdev, &path) < 0) + goto cleanup; + + if (!path) { + /* There's no /dev device that we need to create. Claim success. */ + ret = 0; + goto cleanup; + } + + if (qemuDomainDetachDeviceUnlink(driver, vm, &dev, path) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(path); + return ret; +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index c63f7c954..7a3f31c18 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -813,4 +813,12 @@ int qemuDomainNamespaceSetupDisk(virQEMUDriverPtr driver, int qemuDomainNamespaceTeardownDisk(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainDiskDefPtr disk); + +int qemuDomainNamespaceSetupHostdev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev); + +int qemuDomainNamespaceTeardownHostdev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index bc953c0db..09aca03e0 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1391,6 +1391,7 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, bool releaseaddr = false; bool teardowncgroup = false; bool teardownlabel = false; + bool teardowndevice = false; int backend; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); unsigned int flags = 0; @@ -1442,12 +1443,15 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, } vm->def->hostdevs[--(vm->def->nhostdevs)] = NULL; + if (qemuDomainNamespaceSetupHostdev(driver, vm, hostdev) < 0) + goto error; + teardowndevice = true; + if (qemuSetupHostdevCgroup(vm, hostdev) < 0) goto error; teardowncgroup = true; - if (virSecurityManagerSetHostdevLabel(driver->securityManager, - vm->def, hostdev, NULL) < 0) + if (qemuSecuritySetHostdevLabel(driver, vm, hostdev) < 0) goto error; if (backend != VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) teardownlabel = true; @@ -1500,9 +1504,11 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, if (teardowncgroup && qemuTeardownHostdevCgroup(vm, hostdev) < 0) VIR_WARN("Unable to remove host device cgroup ACL on hotplug fail"); if (teardownlabel && - virSecurityManagerRestoreHostdevLabel(driver->securityManager, - vm->def, hostdev, NULL) < 0) + qemuSecurityRestoreHostdevLabel(driver, vm, hostdev) < 0) VIR_WARN("Unable to restore host device labelling on hotplug fail"); + if (teardowndevice && + qemuDomainNamespaceTeardownHostdev(driver, vm, hostdev) < 0) + VIR_WARN("Unable to remove host device from /dev"); if (releaseaddr) qemuDomainReleaseDeviceAddress(vm, hostdev->info, NULL); @@ -2283,6 +2289,7 @@ qemuDomainAttachHostUSBDevice(virQEMUDriverPtr driver, bool added = false; bool teardowncgroup = false; bool teardownlabel = false; + bool teardowndevice = false; int ret = -1; if (priv->usbaddrs) { @@ -2296,12 +2303,15 @@ qemuDomainAttachHostUSBDevice(virQEMUDriverPtr driver, added = true; + if (qemuDomainNamespaceSetupHostdev(driver, vm, hostdev) < 0) + goto cleanup; + teardowndevice = true; + if (qemuSetupHostdevCgroup(vm, hostdev) < 0) goto cleanup; teardowncgroup = true; - if (virSecurityManagerSetHostdevLabel(driver->securityManager, - vm->def, hostdev, NULL) < 0) + if (qemuSecuritySetHostdevLabel(driver, vm, hostdev) < 0) goto cleanup; teardownlabel = true; @@ -2331,9 +2341,11 @@ qemuDomainAttachHostUSBDevice(virQEMUDriverPtr driver, if (teardowncgroup && qemuTeardownHostdevCgroup(vm, hostdev) < 0) VIR_WARN("Unable to remove host device cgroup ACL on hotplug fail"); if (teardownlabel && - virSecurityManagerRestoreHostdevLabel(driver->securityManager, - vm->def, hostdev, NULL) < 0) + qemuSecurityRestoreHostdevLabel(driver, vm, hostdev) < 0) VIR_WARN("Unable to restore host device labelling on hotplug fail"); + if (teardowndevice && + qemuDomainNamespaceTeardownHostdev(driver, vm, hostdev) < 0) + VIR_WARN("Unable to remove host device from /dev"); if (added) qemuHostdevReAttachUSBDevices(driver, vm->def->name, &hostdev, 1); if (releaseaddr) @@ -2359,6 +2371,7 @@ qemuDomainAttachHostSCSIDevice(virConnectPtr conn, char *drivealias = NULL; bool teardowncgroup = false; bool teardownlabel = false; + bool teardowndevice = false; bool driveAdded = false; if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_SCSI_GENERIC)) { @@ -2397,12 +2410,15 @@ qemuDomainAttachHostSCSIDevice(virConnectPtr conn, return -1; } + if (qemuDomainNamespaceSetupHostdev(driver, vm, hostdev) < 0) + goto cleanup; + teardowndevice = true; + if (qemuSetupHostdevCgroup(vm, hostdev) < 0) goto cleanup; teardowncgroup = true; - if (virSecurityManagerSetHostdevLabel(driver->securityManager, - vm->def, hostdev, NULL) < 0) + if (qemuSecuritySetHostdevLabel(driver, vm, hostdev) < 0) goto cleanup; teardownlabel = true; @@ -2449,9 +2465,11 @@ qemuDomainAttachHostSCSIDevice(virConnectPtr conn, if (teardowncgroup && qemuTeardownHostdevCgroup(vm, hostdev) < 0) VIR_WARN("Unable to remove host device cgroup ACL on hotplug fail"); if (teardownlabel && - virSecurityManagerRestoreHostdevLabel(driver->securityManager, - vm->def, hostdev, NULL) < 0) + qemuSecurityRestoreHostdevLabel(driver, vm, hostdev) < 0) VIR_WARN("Unable to restore host device labelling on hotplug fail"); + if (teardowndevice && + qemuDomainNamespaceTeardownHostdev(driver, vm, hostdev) < 0) + VIR_WARN("Unable to remove host device from /dev"); } VIR_FREE(drivealias); VIR_FREE(drvstr); @@ -3782,13 +3800,15 @@ qemuDomainRemoveHostDevice(virQEMUDriverPtr driver, virDomainAuditHostdev(vm, hostdev, "detach", true); if (!is_vfio && - virSecurityManagerRestoreHostdevLabel(driver->securityManager, - vm->def, hostdev, NULL) < 0) + qemuSecurityRestoreHostdevLabel(driver, vm, hostdev) < 0) VIR_WARN("Failed to restore host device labelling"); if (qemuTeardownHostdevCgroup(vm, hostdev) < 0) VIR_WARN("Failed to remove host device cgroup ACL"); + if (qemuDomainNamespaceTeardownHostdev(driver, vm, hostdev) < 0) + VIR_WARN("Unable to remove host device from /dev"); + switch ((virDomainHostdevSubsysType) hostdev->source.subsys.type) { case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: qemuDomainRemovePCIHostDevice(driver, vm, hostdev); diff --git a/src/qemu/qemu_security.c b/src/qemu/qemu_security.c index ef0865b61..9ab91e9f2 100644 --- a/src/qemu/qemu_security.c +++ b/src/qemu/qemu_security.c @@ -162,3 +162,37 @@ qemuSecurityRestoreDiskLabel(virQEMUDriverPtr driver, vm->def, disk); } + + +int +qemuSecuritySetHostdevLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev) +{ + if (qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) { + /* Already handled by namespace code. */ + return 0; + } + + return virSecurityManagerSetHostdevLabel(driver->securityManager, + vm->def, + hostdev, + NULL); +} + + +int +qemuSecurityRestoreHostdevLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev) +{ + if (qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) { + /* Already handled by namespace code. */ + return 0; + } + + return virSecurityManagerRestoreHostdevLabel(driver->securityManager, + vm->def, + hostdev, + NULL); +} diff --git a/src/qemu/qemu_security.h b/src/qemu/qemu_security.h index e3324ca8c..cc373b3e1 100644 --- a/src/qemu/qemu_security.h +++ b/src/qemu/qemu_security.h @@ -44,4 +44,12 @@ int qemuSecuritySetDiskLabel(virQEMUDriverPtr driver, int qemuSecurityRestoreDiskLabel(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainDiskDefPtr disk); + +int qemuSecuritySetHostdevLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev); + +int qemuSecurityRestoreHostdevLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev); #endif /* __QEMU_SECURITY_H__ */ -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:24AM +0100, Michal Privoznik wrote:
When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 157 ++++++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_domain.h | 8 +++ src/qemu/qemu_hotplug.c | 48 ++++++++++----- src/qemu/qemu_security.c | 34 ++++++++++ src/qemu/qemu_security.h | 8 +++ 5 files changed, 240 insertions(+), 15 deletions(-)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_domain.h | 8 +++++++ src/qemu/qemu_hotplug.c | 10 ++++++++ 3 files changed, 80 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index d0ba316d2..a8cc8ecfa 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7470,6 +7470,10 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, goto cleanup; } break; + case VIR_DOMAIN_DEVICE_CHR: + /* No labelling. */ + break; + case VIR_DOMAIN_DEVICE_NONE: case VIR_DOMAIN_DEVICE_LEASE: case VIR_DOMAIN_DEVICE_FS: @@ -7483,7 +7487,6 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, case VIR_DOMAIN_DEVICE_HUB: case VIR_DOMAIN_DEVICE_REDIRDEV: case VIR_DOMAIN_DEVICE_SMARTCARD: - case VIR_DOMAIN_DEVICE_CHR: case VIR_DOMAIN_DEVICE_MEMBALLOON: case VIR_DOMAIN_DEVICE_NVRAM: case VIR_DOMAIN_DEVICE_RNG: @@ -7603,6 +7606,10 @@ qemuDomainDetachDeviceUnlink(virQEMUDriverPtr driver, return -1; } break; + case VIR_DOMAIN_DEVICE_CHR: + /* No labelling. */ + break; + case VIR_DOMAIN_DEVICE_NONE: case VIR_DOMAIN_DEVICE_LEASE: case VIR_DOMAIN_DEVICE_FS: @@ -7616,7 +7623,6 @@ qemuDomainDetachDeviceUnlink(virQEMUDriverPtr driver, case VIR_DOMAIN_DEVICE_HUB: case VIR_DOMAIN_DEVICE_REDIRDEV: case VIR_DOMAIN_DEVICE_SMARTCARD: - case VIR_DOMAIN_DEVICE_CHR: case VIR_DOMAIN_DEVICE_MEMBALLOON: case VIR_DOMAIN_DEVICE_NVRAM: case VIR_DOMAIN_DEVICE_RNG: @@ -7765,3 +7771,57 @@ qemuDomainNamespaceTeardownHostdev(virQEMUDriverPtr driver, VIR_FREE(path); return ret; } + + +int +qemuDomainNamespaceSetupChardev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainChrDefPtr chr) +{ + virDomainDeviceDef dev = {.type = VIR_DOMAIN_DEVICE_CHR, .data.chr = chr}; + const char *path; + int ret = -1; + + if (!qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) + return 0; + + if (chr->source->type != VIR_DOMAIN_CHR_TYPE_DEV) + return 0; + + path = chr->source->data.file.path; + + if (qemuDomainAttachDeviceMknod(driver, + vm, + &dev, + path) < 0) + goto cleanup; + ret = 0; + cleanup: + return ret; +} + + +int +qemuDomainNamespaceTeardownChardev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainChrDefPtr chr) +{ + virDomainDeviceDef dev = {.type = VIR_DOMAIN_DEVICE_CHR, .data.chr = chr}; + int ret = -1; + const char *path = NULL; + + if (!qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) + return 0; + + if (chr->source->type != VIR_DOMAIN_CHR_TYPE_DEV) + return 0; + + path = chr->source->data.file.path; + + if (qemuDomainDetachDeviceUnlink(driver, vm, &dev, path) < 0) + goto cleanup; + + ret = 0; + cleanup: + return ret; +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 7a3f31c18..7a774a543 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -821,4 +821,12 @@ int qemuDomainNamespaceSetupHostdev(virQEMUDriverPtr driver, int qemuDomainNamespaceTeardownHostdev(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainHostdevDefPtr hostdev); + +int qemuDomainNamespaceSetupChardev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainChrDefPtr chr); + +int qemuDomainNamespaceTeardownChardev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainChrDefPtr chr); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 09aca03e0..8b41edc56 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1851,6 +1851,7 @@ int qemuDomainAttachChrDevice(virConnectPtr conn, bool chardevAttached = false; bool tlsobjAdded = false; bool teardowncgroup = false; + bool teardowndevice = false; bool secobjAdded = false; virJSONValuePtr tlsProps = NULL; char *tlsAlias = NULL; @@ -1872,6 +1873,10 @@ int qemuDomainAttachChrDevice(virConnectPtr conn, if (rc == 1) need_release = true; + if (qemuDomainNamespaceSetupChardev(driver, vm, chr) < 0) + goto cleanup; + teardowndevice = true; + if (qemuSetupChardevCgroup(vm, chr) < 0) goto cleanup; teardowncgroup = true; @@ -1935,6 +1940,8 @@ int qemuDomainAttachChrDevice(virConnectPtr conn, qemuDomainReleaseDeviceAddress(vm, &chr->info, NULL); if (teardowncgroup && qemuTeardownChardevCgroup(vm, chr) < 0) VIR_WARN("Unable to remove chr device cgroup ACL on hotplug fail"); + if (teardowndevice && qemuDomainNamespaceTeardownChardev(driver, vm, chr) < 0) + VIR_WARN("Unable to remove chr device from /dev"); } VIR_FREE(tlsAlias); virJSONValueFree(tlsProps); @@ -4021,6 +4028,9 @@ qemuDomainRemoveChrDevice(virQEMUDriverPtr driver, if (qemuTeardownChardevCgroup(vm, chr) < 0) VIR_WARN("Failed to remove chr device cgroup ACL"); + if (qemuDomainNamespaceTeardownChardev(driver, vm, chr) < 0) + VIR_WARN("Unable to remove chr device from /dev"); + event = virDomainEventDeviceRemovedNewFromObj(vm, chr->info.alias); qemuDomainEventQueue(driver, event); -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:25AM +0100, Michal Privoznik wrote:
When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_domain.h | 8 +++++++ src/qemu/qemu_hotplug.c | 10 ++++++++ 3 files changed, 80 insertions(+), 2 deletions(-)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_domain.h | 8 ++++++ src/qemu/qemu_hotplug.c | 10 +++++++ 3 files changed, 86 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index a8cc8ecfa..a5dc7bbd2 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7471,6 +7471,7 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, } break; case VIR_DOMAIN_DEVICE_CHR: + case VIR_DOMAIN_DEVICE_RNG: /* No labelling. */ break; @@ -7489,7 +7490,6 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, case VIR_DOMAIN_DEVICE_SMARTCARD: case VIR_DOMAIN_DEVICE_MEMBALLOON: case VIR_DOMAIN_DEVICE_NVRAM: - case VIR_DOMAIN_DEVICE_RNG: case VIR_DOMAIN_DEVICE_SHMEM: case VIR_DOMAIN_DEVICE_TPM: case VIR_DOMAIN_DEVICE_PANIC: @@ -7607,6 +7607,7 @@ qemuDomainDetachDeviceUnlink(virQEMUDriverPtr driver, } break; case VIR_DOMAIN_DEVICE_CHR: + case VIR_DOMAIN_DEVICE_RNG: /* No labelling. */ break; @@ -7625,7 +7626,6 @@ qemuDomainDetachDeviceUnlink(virQEMUDriverPtr driver, case VIR_DOMAIN_DEVICE_SMARTCARD: case VIR_DOMAIN_DEVICE_MEMBALLOON: case VIR_DOMAIN_DEVICE_NVRAM: - case VIR_DOMAIN_DEVICE_RNG: case VIR_DOMAIN_DEVICE_SHMEM: case VIR_DOMAIN_DEVICE_TPM: case VIR_DOMAIN_DEVICE_PANIC: @@ -7825,3 +7825,69 @@ qemuDomainNamespaceTeardownChardev(virQEMUDriverPtr driver, cleanup: return ret; } + + +int +qemuDomainNamespaceSetupRNG(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainRNGDefPtr rng) +{ + virDomainDeviceDef dev = {.type = VIR_DOMAIN_DEVICE_RNG, .data.rng = rng}; + const char *path = NULL; + int ret = -1; + + if (!qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) + return 0; + + switch ((virDomainRNGBackend) rng->backend) { + case VIR_DOMAIN_RNG_BACKEND_RANDOM: + path = rng->source.file; + break; + + case VIR_DOMAIN_RNG_BACKEND_EGD: + case VIR_DOMAIN_RNG_BACKEND_LAST: + ret = 0; + goto cleanup; + } + + if (qemuDomainAttachDeviceMknod(driver, + vm, + &dev, + path) < 0) + goto cleanup; + ret = 0; + cleanup: + return ret; +} + + +int +qemuDomainNamespaceTeardownRNG(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainRNGDefPtr rng) +{ + virDomainDeviceDef dev = {.type = VIR_DOMAIN_DEVICE_RNG, .data.rng = rng}; + int ret = -1; + const char *path = NULL; + + if (!qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) + return 0; + + switch ((virDomainRNGBackend) rng->backend) { + case VIR_DOMAIN_RNG_BACKEND_RANDOM: + path = rng->source.file; + break; + + case VIR_DOMAIN_RNG_BACKEND_EGD: + case VIR_DOMAIN_RNG_BACKEND_LAST: + ret = 0; + goto cleanup; + } + + if (qemuDomainDetachDeviceUnlink(driver, vm, &dev, path) < 0) + goto cleanup; + + ret = 0; + cleanup: + return ret; +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 7a774a543..b2db45e87 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -829,4 +829,12 @@ int qemuDomainNamespaceSetupChardev(virQEMUDriverPtr driver, int qemuDomainNamespaceTeardownChardev(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainChrDefPtr chr); + +int qemuDomainNamespaceSetupRNG(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainRNGDefPtr rng); + +int qemuDomainNamespaceTeardownRNG(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainRNGDefPtr rng); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 8b41edc56..92a2e7371 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1988,6 +1988,7 @@ qemuDomainAttachRNGDevice(virConnectPtr conn, char *secAlias = NULL; bool releaseaddr = false; bool teardowncgroup = false; + bool teardowndevice = false; bool chardevAdded = false; bool objAdded = false; bool tlsobjAdded = false; @@ -2033,6 +2034,10 @@ qemuDomainAttachRNGDevice(virConnectPtr conn, goto cleanup; } + if (qemuDomainNamespaceSetupRNG(driver, vm, rng) < 0) + goto cleanup; + teardowndevice = true; + if (qemuSetupRNGCgroup(vm, rng) < 0) goto cleanup; teardowncgroup = true; @@ -2119,6 +2124,8 @@ qemuDomainAttachRNGDevice(virConnectPtr conn, qemuDomainReleaseDeviceAddress(vm, &rng->info, NULL); if (teardowncgroup && qemuTeardownRNGCgroup(vm, rng) < 0) VIR_WARN("Unable to remove RNG device cgroup ACL on hotplug fail"); + if (teardowndevice && qemuDomainNamespaceTeardownRNG(driver, vm, rng) < 0) + VIR_WARN("Unable to remove chr device from /dev"); } VIR_FREE(tlsAlias); @@ -4109,6 +4116,9 @@ qemuDomainRemoveRNGDevice(virQEMUDriverPtr driver, if (qemuTeardownRNGCgroup(vm, rng) < 0) VIR_WARN("Failed to remove RNG device cgroup ACL"); + if (qemuDomainNamespaceTeardownRNG(driver, vm, rng) < 0) + VIR_WARN("Unable to remove RNG device from /dev"); + event = virDomainEventDeviceRemovedNewFromObj(vm, rng->info.alias); qemuDomainEventQueue(driver, event); -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:26AM +0100, Michal Privoznik wrote:
When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_domain.h | 8 ++++++ src/qemu/qemu_hotplug.c | 10 +++++++ 3 files changed, 86 insertions(+), 2 deletions(-)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

Given how intrusive previous patches are, it might happen that there's a bug or imperfection. Lets give users a way out: if they set 'namespaces' to an empty array in qemu.conf the feature is suppressed. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 8 ++++++++ src/qemu/qemu_conf.c | 33 +++++++++++++++++++++++++++++++++ src/qemu/qemu_conf.h | 2 ++ src/qemu/qemu_domain.c | 3 ++- src/qemu/test_libvirtd_qemu.aug.in | 3 +++ 6 files changed, 49 insertions(+), 1 deletion(-) diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug index f3cc9e684..de723b254 100644 --- a/src/qemu/libvirtd_qemu.aug +++ b/src/qemu/libvirtd_qemu.aug @@ -70,6 +70,7 @@ module Libvirtd_qemu = | str_array_entry "cgroup_controllers" | str_array_entry "cgroup_device_acl" | int_entry "seccomp_sandbox" + | str_array_entry "namespaces" let save_entry = str_entry "save_image_format" | str_entry "dump_image_format" diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf index 2b2bd6031..a8cd369cb 100644 --- a/src/qemu/qemu.conf +++ b/src/qemu/qemu.conf @@ -665,3 +665,11 @@ # Defaults to 4 # #gluster_debug_level = 9 + +# To enhance security, QEMU driver is capable of creating private namespaces +# for each domain started. Well, so far only "mount" namespace is supported. If +# enabled it means qemu process is unable to see all the devices on the system, +# only those configured for the domain in question. Libvirt then manages +# devices entries throughout the domain lifetime. This namespace is turned on +# by default. +#namespaces = [ "mount" ] diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index ccefbe890..d8b8ce8df 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -314,6 +314,12 @@ virQEMUDriverConfigPtr virQEMUDriverConfigNew(bool privileged) cfg->glusterDebugLevel = 4; cfg->stdioLogD = true; + if (!(cfg->namespaces = virBitmapNew(QEMU_DOMAIN_NS_LAST))) + goto error; + + if (virBitmapSetBit(cfg->namespaces, QEMU_DOMAIN_NS_MOUNT) < 0) + goto error; + #ifdef DEFAULT_LOADER_NVRAM if (virFirmwareParseList(DEFAULT_LOADER_NVRAM, &cfg->firmwares, @@ -349,6 +355,7 @@ static void virQEMUDriverConfigDispose(void *obj) { virQEMUDriverConfigPtr cfg = obj; + virBitmapFree(cfg->namespaces); virStringListFree(cfg->cgroupDeviceACL); @@ -433,6 +440,7 @@ int virQEMUDriverConfigLoadFile(virQEMUDriverConfigPtr cfg, char **hugetlbfs = NULL; char **nvram = NULL; char *corestr = NULL; + char **namespaces = NULL; /* Just check the file is readable before opening it, otherwise * libvirt emits an error. @@ -798,6 +806,31 @@ int virQEMUDriverConfigLoadFile(virQEMUDriverConfigPtr cfg, if (virConfGetValueUInt(conf, "gluster_debug_level", &cfg->glusterDebugLevel) < 0) goto cleanup; + if (virConfGetValueStringList(conf, "namespaces", false, &namespaces) < 0) + goto cleanup; + + if (namespaces) { + virBitmapClearAll(cfg->namespaces); + + for (i = 0; namespaces[i]; i++) { + int ns = qemuDomainNamespaceTypeFromString(namespaces[i]); + + if (ns < 0) { + virReportError(VIR_ERR_CONF_SYNTAX, + _("Unknown namespace: %s"), + namespaces[i]); + goto cleanup; + } + + if (virBitmapSetBit(cfg->namespaces, ns) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to enable namespace: %s"), + namespaces[i]); + goto cleanup; + } + } + } + ret = 0; cleanup: diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h index f6e325760..5ea5923dc 100644 --- a/src/qemu/qemu_conf.h +++ b/src/qemu/qemu_conf.h @@ -90,6 +90,8 @@ struct _virQEMUDriverConfig { gid_t group; bool dynamicOwnership; + virBitmapPtr namespaces; + int cgroupControllers; char **cgroupDeviceACL; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index a5dc7bbd2..0308170e7 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7311,7 +7311,8 @@ qemuDomainCreateNamespace(virQEMUDriverPtr driver, char *devPath = NULL; char *devptsPath = NULL; - if (!virQEMUDriverIsPrivileged(driver)) { + if (!virBitmapIsBitSet(cfg->namespaces, QEMU_DOMAIN_NS_MOUNT) || + !virQEMUDriverIsPrivileged(driver)) { ret = 0; goto cleanup; } diff --git a/src/qemu/test_libvirtd_qemu.aug.in b/src/qemu/test_libvirtd_qemu.aug.in index f586e956d..a749f0900 100644 --- a/src/qemu/test_libvirtd_qemu.aug.in +++ b/src/qemu/test_libvirtd_qemu.aug.in @@ -91,3 +91,6 @@ module Test_libvirtd_qemu = } { "stdio_handler" = "logd" } { "gluster_debug_level" = "9" } +{ "namespaces" + { "1" = "mount" } +} -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:27AM +0100, Michal Privoznik wrote:
Given how intrusive previous patches are, it might happen that there's a bug or imperfection. Lets give users a way out: if they set 'namespaces' to an empty array in qemu.conf the feature is suppressed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 8 ++++++++ src/qemu/qemu_conf.c | 33 +++++++++++++++++++++++++++++++++ src/qemu/qemu_conf.h | 2 ++ src/qemu/qemu_domain.c | 3 ++- src/qemu/test_libvirtd_qemu.aug.in | 3 +++ 6 files changed, 49 insertions(+), 1 deletion(-)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 0308170e7..9769c11a5 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7338,7 +7338,8 @@ qemuDomainCreateNamespace(virQEMUDriverPtr driver, goto cleanup; } - /* Enabling of the mount namespace goes here. */ + if (qemuDomainEnableNamespace(vm, QEMU_DOMAIN_NS_MOUNT) < 0) + goto cleanup; ret = 0; cleanup: -- 2.11.0

On Wed, Dec 07, 2016 at 09:36:28AM +0100, Michal Privoznik wrote:
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 0308170e7..9769c11a5 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7338,7 +7338,8 @@ qemuDomainCreateNamespace(virQEMUDriverPtr driver, goto cleanup; }
- /* Enabling of the mount namespace goes here. */ + if (qemuDomainEnableNamespace(vm, QEMU_DOMAIN_NS_MOUNT) < 0) + goto cleanup;
ret = 0; cleanup:
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 07.12.2016 09:36, Michal Privoznik wrote:
v1 posted here: https://www.redhat.com/archives/libvir-list/2016-November/msg01208.html
diff to v1: - I've dropped the patches for hugepages which are posted separately [1] - I've reworked some parts according to Dan's suggestions - Filled missing impl for virSCSIVHostDevice which was merged meanwhile
Ping. I know this is rather big series but the sooner it is merged the better. Michal

On Wed, Dec 07, 2016 at 09:36:07AM +0100, Michal Privoznik wrote:
v1 posted here: https://www.redhat.com/archives/libvir-list/2016-November/msg01208.html
diff to v1: - I've dropped the patches for hugepages which are posted separately [1] - I've reworked some parts according to Dan's suggestions - Filled missing impl for virSCSIVHostDevice which was merged meanwhile
Please note that patches 1-5, 7 were ACKed already.
You can also find the patches on my github:
I pulled this branch and aside from the compile error i mention I can't start guests error: internal error: process exited while connecting to monitor: 2016-12-12T11:47:53.740784Z qemu-system-x86_64: -chardev pty,id=charserial0: Failed to create PTY: No such file or directory Without having investigated, I wonder if it is trying to access the /dev/ptmx file and failing ? The /dev/ptmx file needs to be a symlink to /dev/pts/ptmx and I'm not seeing code which creates that yet. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On Mon, Dec 12, 2016 at 11:48:50AM +0000, Daniel P. Berrange wrote:
On Wed, Dec 07, 2016 at 09:36:07AM +0100, Michal Privoznik wrote:
v1 posted here: https://www.redhat.com/archives/libvir-list/2016-November/msg01208.html
diff to v1: - I've dropped the patches for hugepages which are posted separately [1] - I've reworked some parts according to Dan's suggestions - Filled missing impl for virSCSIVHostDevice which was merged meanwhile
Please note that patches 1-5, 7 were ACKed already.
You can also find the patches on my github:
I pulled this branch and aside from the compile error i mention I can't start guests
error: internal error: process exited while connecting to monitor: 2016-12-12T11:47:53.740784Z qemu-system-x86_64: -chardev pty,id=charserial0: Failed to create PTY: No such file or directory
Without having investigated, I wonder if it is trying to access the /dev/ptmx file and failing ? The /dev/ptmx file needs to be a symlink to /dev/pts/ptmx and I'm not seeing code which creates that yet.
Ok, in fact it is exactly the opposite problem :-) You have done a bind mount of /dev/pts/ptmx -> /dev/ptmx, which ought to be fine, except on Fedora 25 for some reason the /dev/pts/ptmx file is created c-------- so nothing has privileges to access it. Instead Fedora uses a real /dev/ptmx device node. I guess this way Fedora works is ok, because you only need the /dev/ptmx symlink / bind mount, if we created a *new* devpts mount instance - but we're just reusing the host instance. IOW, just modify qemuDomainBuildNamespace to remove the bind mount of /dev/ptmx entirely, and QEMU starts. After that, my next problem is lack of /dev/shm break SPICE graphics. We should treat /dev/shm the same way we treat /dev/pts - just preserve the existing host /dev/shm mount point (if it exists) Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 12.12.2016 13:05, Daniel P. Berrange wrote:
On Mon, Dec 12, 2016 at 11:48:50AM +0000, Daniel P. Berrange wrote:
On Wed, Dec 07, 2016 at 09:36:07AM +0100, Michal Privoznik wrote:
v1 posted here: https://www.redhat.com/archives/libvir-list/2016-November/msg01208.html
diff to v1: - I've dropped the patches for hugepages which are posted separately [1] - I've reworked some parts according to Dan's suggestions - Filled missing impl for virSCSIVHostDevice which was merged meanwhile
Please note that patches 1-5, 7 were ACKed already.
You can also find the patches on my github:
I pulled this branch and aside from the compile error i mention I can't start guests
error: internal error: process exited while connecting to monitor: 2016-12-12T11:47:53.740784Z qemu-system-x86_64: -chardev pty,id=charserial0: Failed to create PTY: No such file or directory
Without having investigated, I wonder if it is trying to access the /dev/ptmx file and failing ? The /dev/ptmx file needs to be a symlink to /dev/pts/ptmx and I'm not seeing code which creates that yet.
Ok, in fact it is exactly the opposite problem :-)
You have done a bind mount of /dev/pts/ptmx -> /dev/ptmx, which ought to be fine, except on Fedora 25 for some reason the /dev/pts/ptmx file is created c-------- so nothing has privileges to access it. Instead Fedora uses a real /dev/ptmx device node.
I guess this way Fedora works is ok, because you only need the /dev/ptmx symlink / bind mount, if we created a *new* devpts mount instance - but we're just reusing the host instance.
IOW, just modify qemuDomainBuildNamespace to remove the bind mount of /dev/ptmx entirely, and QEMU starts.
Ah, okay. Haven't seen that on my system. Wonder why is that.
After that, my next problem is lack of /dev/shm break SPICE graphics.
We should treat /dev/shm the same way we treat /dev/pts - just preserve the existing host /dev/shm mount point (if it exists)
I will post v3 for the patch that sets up /dev/* (even though it had been ACKed). However, on my system there is more: # findmnt -R /dev/ TARGET SOURCE FSTYPE OPTIONS /dev udev devtmpfs rw,nosuid,relatime,size=10240k,nr_inodes=2013332,mode=755 ├─/dev/pts devpts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 ├─/dev/mqueue mqueue mqueue rw,nosuid,nodev,noexec,relatime └─/dev/shm shm tmpfs rw,nosuid,nodev,noexec,relatime Should we worry about /dev/mqueue too? No idea what it is. Michal

On Mon, Dec 12, 2016 at 04:34:01PM +0100, Michal Privoznik wrote:
On 12.12.2016 13:05, Daniel P. Berrange wrote:
On Mon, Dec 12, 2016 at 11:48:50AM +0000, Daniel P. Berrange wrote:
On Wed, Dec 07, 2016 at 09:36:07AM +0100, Michal Privoznik wrote:
v1 posted here: https://www.redhat.com/archives/libvir-list/2016-November/msg01208.html
diff to v1: - I've dropped the patches for hugepages which are posted separately [1] - I've reworked some parts according to Dan's suggestions - Filled missing impl for virSCSIVHostDevice which was merged meanwhile
Please note that patches 1-5, 7 were ACKed already.
You can also find the patches on my github:
I pulled this branch and aside from the compile error i mention I can't start guests
error: internal error: process exited while connecting to monitor: 2016-12-12T11:47:53.740784Z qemu-system-x86_64: -chardev pty,id=charserial0: Failed to create PTY: No such file or directory
Without having investigated, I wonder if it is trying to access the /dev/ptmx file and failing ? The /dev/ptmx file needs to be a symlink to /dev/pts/ptmx and I'm not seeing code which creates that yet.
Ok, in fact it is exactly the opposite problem :-)
You have done a bind mount of /dev/pts/ptmx -> /dev/ptmx, which ought to be fine, except on Fedora 25 for some reason the /dev/pts/ptmx file is created c-------- so nothing has privileges to access it. Instead Fedora uses a real /dev/ptmx device node.
I guess this way Fedora works is ok, because you only need the /dev/ptmx symlink / bind mount, if we created a *new* devpts mount instance - but we're just reusing the host instance.
IOW, just modify qemuDomainBuildNamespace to remove the bind mount of /dev/ptmx entirely, and QEMU starts.
Ah, okay. Haven't seen that on my system. Wonder why is that.
After that, my next problem is lack of /dev/shm break SPICE graphics.
We should treat /dev/shm the same way we treat /dev/pts - just preserve the existing host /dev/shm mount point (if it exists)
I will post v3 for the patch that sets up /dev/* (even though it had been ACKed). However, on my system there is more:
# findmnt -R /dev/ TARGET SOURCE FSTYPE OPTIONS /dev udev devtmpfs rw,nosuid,relatime,size=10240k,nr_inodes=2013332,mode=755 ├─/dev/pts devpts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 ├─/dev/mqueue mqueue mqueue rw,nosuid,nodev,noexec,relatime └─/dev/shm shm tmpfs rw,nosuid,nodev,noexec,relatime
Should we worry about /dev/mqueue too? No idea what it is.
POSIX message queues - we should probably set that up too - things using shm to coordinate between processes will often also use message queues. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On Wed, Dec 07, 2016 at 09:36:07AM +0100, Michal Privoznik wrote:
v1 posted here: https://www.redhat.com/archives/libvir-list/2016-November/msg01208.html
diff to v1: - I've dropped the patches for hugepages which are posted separately [1] - I've reworked some parts according to Dan's suggestions - Filled missing impl for virSCSIVHostDevice which was merged meanwhile
Please note that patches 1-5, 7 were ACKed already.
Please just push 1-7 instead of reposting them in any v3 - they're fine as is. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

Prime time. When it comes to spawning qemu process and relabelling all the devices it's going to touch, there's inherent race with other applications in the system (e.g. udev). Instead of trying convincing udev to not touch libvirt managed devices, we can create a separate mount namespace for the qemu, and mount our own /dev there. Of course this puts more work onto us as we have to maintain /dev files on each domain start and device hot(un-)plug. On the other hand, this enhances security also.
From technical POV, on domain startup process the parent (libvirtd) creates:
/var/lib/libvirt/qemu/$domain.dev /var/lib/libvirt/qemu/$domain.devpts The child (which is going to be qemu eventually) calls unshare() to create new mount namespace. From now on anything that child does is invisible to the parent. Child then mounts tmpfs on $domain.dev (so that it still sees original /dev from the host) and creates some devices (as explained in one of the previous patches). The devices have to be created exactly as they are in the host (including perms, seclabels, ACLs, ...). After that it moves $domain.dev mount to /dev. What's the $domain.devpts mount there for then you ask? QEMU can create PTYs for some chardevs. And historically we exposed the host ends in our domain XML allowing users to connect to them. Therefore we must preserve devpts mount to be shared with the host's one. To make this patch as small as possible, creating of devices configured for domain in question is implemented in next patches. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- Diff to v2: - More /dev/* mounts are preserved - Fixed qemuDomainEnableNamespace return type and error reporting The whole patches can found here: https://github.com/zippy2/libvirt/tree/qemu_container_v4 src/qemu/qemu_domain.c | 451 +++++++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_domain.h | 20 +++ src/qemu/qemu_process.c | 13 ++ 3 files changed, 482 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index d3e505b17..038c0a057 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -55,6 +55,9 @@ #include <sys/time.h> #include <fcntl.h> +#if defined(HAVE_SYS_MOUNT_H) +# include <sys/mount.h> +#endif #include <libxml/xpathInternals.h> @@ -86,6 +89,20 @@ VIR_ENUM_IMPL(qemuDomainAsyncJob, QEMU_ASYNC_JOB_LAST, "start", ); +VIR_ENUM_IMPL(qemuDomainNamespace, QEMU_DOMAIN_NS_LAST, + "mount", +); + + +static struct { + const char *path; + const char *suffix; +} devPreserveMounts[] = { + {"/dev/pts", "devpts"}, + {"/dev/shm", "devshm"}, + {"/dev/mqueue", "mqueue"}, +}; + struct _qemuDomainLogContext { int refs; @@ -146,6 +163,74 @@ qemuDomainAsyncJobPhaseFromString(qemuDomainAsyncJob job, } +bool +qemuDomainNamespaceEnabled(virDomainObjPtr vm, + qemuDomainNamespace ns) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + return priv->namespaces && + virBitmapIsBitSet(priv->namespaces, ns); +} + + +static int +qemuDomainEnableNamespace(virDomainObjPtr vm, + qemuDomainNamespace ns) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (!priv->namespaces && + !(priv->namespaces = virBitmapNew(QEMU_DOMAIN_NS_LAST))) + return -1; + + if (virBitmapSetBit(priv->namespaces, ns) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to enable namespace: %s"), + qemuDomainNamespaceTypeToString(ns)); + return -1; + } + + return 0; +} + + +static int +qemuDomainGetPreservedMounts(virQEMUDriverPtr driver, + virDomainObjPtr vm, + char ***devMountsPath, + size_t *ndevMountsPath) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + char **paths; + size_t i; + + if (VIR_ALLOC_N(paths, ARRAY_CARDINALITY(devPreserveMounts)) < 0) + goto error; + + for (i = 0; i < ARRAY_CARDINALITY(devPreserveMounts); i++) { + if (virAsprintf(&paths[i], "%s/%s.%s", + cfg->stateDir, vm->def->name, + devPreserveMounts[i].suffix) < 0) + goto error; + } + + *devMountsPath = paths; + *ndevMountsPath = ARRAY_CARDINALITY(devPreserveMounts); + virObjectUnref(cfg); + return 0; + + error: + if (paths) { + for (i = 0; i < ARRAY_CARDINALITY(devPreserveMounts); i++) + VIR_FREE(paths[i]); + VIR_FREE(paths); + } + virObjectUnref(cfg); + return -1; +} + + void qemuDomainEventQueue(virQEMUDriverPtr driver, virObjectEventPtr event) { @@ -1541,6 +1626,8 @@ qemuDomainObjPrivateFree(void *data) virObjectUnref(priv->qemuCaps); + virBitmapFree(priv->namespaces); + virCgroupFree(&priv->cgroup); virDomainPCIAddressSetFree(priv->pciaddrs); virDomainUSBAddressSetFree(priv->usbaddrs); @@ -1627,6 +1714,17 @@ qemuDomainObjPrivateXMLFormat(virBufferPtr buf, virDomainChrTypeToString(priv->monConfig->type)); } + if (priv->namespaces) { + ssize_t ns = -1; + + virBufferAddLit(buf, "<namespaces>\n"); + virBufferAdjustIndent(buf, 2); + while ((ns = virBitmapNextSetBit(priv->namespaces, ns)) >= 0) + virBufferAsprintf(buf, "<%s/>\n", qemuDomainNamespaceTypeToString(ns)); + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</namespaces>\n"); + } + qemuDomainObjPrivateXMLFormatVcpus(buf, vm->def); if (priv->qemuCaps) { @@ -1771,6 +1869,7 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, int n; size_t i; xmlNodePtr *nodes = NULL; + xmlNodePtr node = NULL; virQEMUCapsPtr qemuCaps = NULL; virCapsPtr caps = NULL; @@ -1809,6 +1908,30 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, goto error; } + if ((node = virXPathNode("./namespaces", ctxt))) { + xmlNodePtr next; + + for (next = node->children; next; next = next->next) { + int ns = qemuDomainNamespaceTypeFromString((const char *) next->name); + + if (ns < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("malformed namespace name: %s"), + next->name); + goto error; + } + + if (qemuDomainEnableNamespace(vm, ns) < 0) + goto error; + } + } + + if (priv->namespaces && + virBitmapIsAllClear(priv->namespaces)) { + virBitmapFree(priv->namespaces); + priv->namespaces = NULL; + } + if ((n = virXPathNodeSet("./vcpus/vcpu", ctxt, &nodes)) < 0) goto error; @@ -1959,10 +2082,12 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, return 0; error: + VIR_FREE(nodes); + VIR_FREE(tmp); + virBitmapFree(priv->namespaces); + priv->namespaces = NULL; virDomainChrSourceDefFree(priv->monConfig); priv->monConfig = NULL; - VIR_FREE(nodes); - VIR_FREE(tmp); virStringListFree(priv->qemuDevices); priv->qemuDevices = NULL; virObjectUnref(qemuCaps); @@ -6653,3 +6778,325 @@ qemuDomainSupportsVideoVga(virDomainVideoDefPtr video, return true; } + + +static int +qemuDomainCreateDevice(const char *device, + const char *path, + bool allow_noent) +{ + char *devicePath = NULL; + struct stat sb; + int ret = -1; + + if (!STRPREFIX(device, "/dev")) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("invalid device: %s"), + device); + goto cleanup; + } + + if (virAsprintf(&devicePath, "%s/%s", + path, device + 4) < 0) + goto cleanup; + + if (stat(device, &sb) < 0) { + if (errno == ENOENT && allow_noent) { + /* Ignore non-existent device. */ + ret = 0; + goto cleanup; + } + + virReportSystemError(errno, _("Unable to stat %s"), device); + goto cleanup; + } + + if (virFileMakeParentPath(devicePath) < 0) { + virReportSystemError(errno, + _("Unable to create %s"), + devicePath); + goto cleanup; + } + + if (mknod(devicePath, sb.st_mode, sb.st_rdev) < 0) { + virReportSystemError(errno, + _("Failed to make device %s"), + devicePath); + goto cleanup; + } + + if (chown(devicePath, sb.st_uid, sb.st_gid) < 0) { + virReportSystemError(errno, + _("Failed to chown device %s"), + devicePath); + goto cleanup; + } + + if (virFileCopyACLs(device, devicePath) < 0 && + errno != ENOTSUP) { + virReportSystemError(errno, + _("Failed to copy ACLs on device %s"), + devicePath); + goto cleanup; + } + + ret = 0; + cleanup: + VIR_FREE(devicePath); + return ret; +} + + + +static int +qemuDomainPopulateDevices(virQEMUDriverPtr driver, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + const char *path) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + const char *const *devices = (const char *const *) cfg->cgroupDeviceACL; + size_t i; + int ret = -1; + + if (!devices) + devices = defaultDeviceACL; + + for (i = 0; devices[i]; i++) { + if (qemuDomainCreateDevice(devices[i], path, true) < 0) + goto cleanup; + } + + ret = 0; + cleanup: + virObjectUnref(cfg); + return ret; +} + + +static int +qemuDomainSetupDev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *path) +{ + char *mount_options = NULL; + char *opts = NULL; + int ret = -1; + + VIR_DEBUG("Setting up /dev/ for domain %s", vm->def->name); + + mount_options = virSecurityManagerGetMountOptions(driver->securityManager, + vm->def); + + if (!mount_options && + VIR_STRDUP(mount_options, "") < 0) + goto cleanup; + + /* + * tmpfs is limited to 64kb, since we only have device nodes in there + * and don't want to DOS the entire OS RAM usage + */ + if (virAsprintf(&opts, + "mode=755,size=65536%s", mount_options) < 0) + goto cleanup; + + if (virFileSetupDev(path, opts) < 0) + goto cleanup; + + if (qemuDomainPopulateDevices(driver, vm, path) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(opts); + VIR_FREE(mount_options); + return ret; +} + + +int +qemuDomainBuildNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + const unsigned long mount_flags = MS_MOVE; + char *devPath = NULL; + char **devMountsPath = NULL; + size_t ndevMountsPath = 0, i; + int ret = -1; + + if (!qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) { + ret = 0; + goto cleanup; + } + + if (virAsprintf(&devPath, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (qemuDomainGetPreservedMounts(driver, vm, + &devMountsPath, &ndevMountsPath) < 0) + goto cleanup; + + if (qemuDomainSetupDev(driver, vm, devPath) < 0) + goto cleanup; + + /* Save some mount points because we want to share them with the host */ + for (i = 0; i < ndevMountsPath; i++) { + if (mount(devPreserveMounts[i].path, devMountsPath[i], + NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Unable to move %s mount"), + devPreserveMounts[i].path); + goto cleanup; + } + } + + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev"), + devPath); + goto cleanup; + } + + for (i = 0; i < ndevMountsPath; i++) { + if (virFileMakePath(devPreserveMounts[i].path) < 0) { + virReportSystemError(errno, _("Cannot create %s"), + devPreserveMounts[i].path); + goto cleanup; + } + + if (mount(devMountsPath[i], devPreserveMounts[i].path, + NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on %s"), + devMountsPath[i], + devPreserveMounts[i].path); + goto cleanup; + } + } + + ret = 0; + cleanup: + virObjectUnref(cfg); + VIR_FREE(devPath); + for (i = 0; i < ndevMountsPath; i++) + VIR_FREE(devMountsPath[i]); + VIR_FREE(devMountsPath); + return ret; +} + + +#if defined(__linux__) +int +qemuDomainCreateNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + int ret = -1; + char *devPath = NULL; + char **devMountsPath = NULL; + size_t ndevMountsPath = 0, i; + + if (!virQEMUDriverIsPrivileged(driver)) { + ret = 0; + goto cleanup; + } + + if (virAsprintf(&devPath, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (qemuDomainGetPreservedMounts(driver, vm, + &devMountsPath, &ndevMountsPath) < 0) + goto cleanup; + + if (virFileMakePath(devPath) < 0) { + virReportSystemError(errno, + _("Failed to create %s"), + devPath); + goto cleanup; + } + + for (i = 0; i < ndevMountsPath; i++) { + if (virFileMakePath(devMountsPath[i]) < 0) { + virReportSystemError(errno, + _("Failed to create %s"), + devMountsPath[i]); + goto cleanup; + } + } + + /* Enabling of the mount namespace goes here. */ + + ret = 0; + cleanup: + if (ret < 0) { + if (devPath) + unlink(devPath); + for (i = 0; i < ndevMountsPath; i++) + unlink(devMountsPath[i]); + } + for (i = 0; i < ndevMountsPath; i++) + VIR_FREE(devMountsPath[i]); + VIR_FREE(devMountsPath); + VIR_FREE(devPath); + virObjectUnref(cfg); + return ret; +} + +#else /* !defined(__linux__) */ + +int +qemuDomainCreateNamespace(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm ATTRIBUTE_UNUSED) +{ + /* Namespaces are Linux specific. On other platforms just + * carry on with the old behaviour. */ + return 0; +} +#endif /* !defined(__linux__) */ + + +void +qemuDomainDeleteNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + char *devPath = NULL; + char **devMountsPath = NULL; + size_t ndevMountsPath = 0, i; + + + if (!qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) + return; + + if (virAsprintf(&devPath, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (qemuDomainGetPreservedMounts(driver, vm, + &devMountsPath, &ndevMountsPath) < 0) + goto cleanup; + + if (rmdir(devPath) < 0) { + virReportSystemError(errno, + _("Unable to remove %s"), + devPath); + /* Bet effort. Fall through. */ + } + + for (i = 0; i < ndevMountsPath; i++) { + if (rmdir(devMountsPath[i]) < 0) { + virReportSystemError(errno, + _("Unable to remove %s"), + devMountsPath[i]); + /* Bet effort. Fall through. */ + } + } + cleanup: + virObjectUnref(cfg); + for (i = 0; i < ndevMountsPath; i++) + VIR_FREE(devMountsPath[i]); + VIR_FREE(devMountsPath); + VIR_FREE(devPath); +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 7650ff392..9dad5bc7c 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -165,11 +165,23 @@ struct _qemuDomainUnpluggingDevice { qemuDomainUnpluggingDeviceStatus status; }; + +typedef enum { + QEMU_DOMAIN_NS_MOUNT = 0, + QEMU_DOMAIN_NS_LAST +} qemuDomainNamespace; +VIR_ENUM_DECL(qemuDomainNamespace) + +bool qemuDomainNamespaceEnabled(virDomainObjPtr vm, + qemuDomainNamespace ns); + typedef struct _qemuDomainObjPrivate qemuDomainObjPrivate; typedef qemuDomainObjPrivate *qemuDomainObjPrivatePtr; struct _qemuDomainObjPrivate { struct qemuDomainJobObj job; + virBitmapPtr namespaces; + qemuMonitorPtr mon; virDomainChrSourceDefPtr monConfig; bool monJSON; @@ -785,4 +797,12 @@ int qemuDomainCheckMonitor(virQEMUDriverPtr driver, bool qemuDomainSupportsVideoVga(virDomainVideoDefPtr video, virQEMUCapsPtr qemuCaps); +int qemuDomainBuildNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); + +int qemuDomainCreateNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); + +void qemuDomainDeleteNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index e5b77d637..6058328ce 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2657,6 +2657,12 @@ static int qemuProcessHook(void *data) if (virSecurityManagerClearSocketLabel(h->driver->securityManager, h->vm->def) < 0) goto cleanup; + if (virProcessSetupPrivateMountNS() < 0) + goto cleanup; + + if (qemuDomainBuildNamespace(h->driver, h->vm) < 0) + goto cleanup; + if (virDomainNumatuneGetMode(h->vm->def->numa, -1, &mode) == 0) { if (mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT && h->cfg->cgroupControllers & (1 << VIR_CGROUP_CONTROLLER_CPUSET) && @@ -5482,6 +5488,11 @@ qemuProcessLaunch(virConnectPtr conn, qemuDomainLogContextMarkPosition(logCtxt); + VIR_DEBUG("Building mount namespace"); + + if (qemuDomainCreateNamespace(driver, vm) < 0) + goto cleanup; + VIR_DEBUG("Clear emulator capabilities: %d", cfg->clearEmulatorCapabilities); if (cfg->clearEmulatorCapabilities) @@ -6253,6 +6264,8 @@ void qemuProcessStop(virQEMUDriverPtr driver, } } + qemuDomainDeleteNamespace(driver, vm); + vm->taint = 0; vm->pid = -1; virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason); -- 2.11.0

On Mon, Dec 12, 2016 at 05:52:54PM +0100, Michal Privoznik wrote:
Prime time. When it comes to spawning qemu process and relabelling all the devices it's going to touch, there's inherent race with other applications in the system (e.g. udev). Instead of trying convincing udev to not touch libvirt managed devices, we can create a separate mount namespace for the qemu, and mount our own /dev there. Of course this puts more work onto us as we have to maintain /dev files on each domain start and device hot(un-)plug. On the other hand, this enhances security also.
From technical POV, on domain startup process the parent (libvirtd) creates:
/var/lib/libvirt/qemu/$domain.dev /var/lib/libvirt/qemu/$domain.devpts
The child (which is going to be qemu eventually) calls unshare() to create new mount namespace. From now on anything that child does is invisible to the parent. Child then mounts tmpfs on $domain.dev (so that it still sees original /dev from the host) and creates some devices (as explained in one of the previous patches). The devices have to be created exactly as they are in the host (including perms, seclabels, ACLs, ...). After that it moves $domain.dev mount to /dev.
What's the $domain.devpts mount there for then you ask? QEMU can create PTYs for some chardevs. And historically we exposed the host ends in our domain XML allowing users to connect to them. Therefore we must preserve devpts mount to be shared with the host's one.
To make this patch as small as possible, creating of devices configured for domain in question is implemented in next patches.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
Diff to v2:
- More /dev/* mounts are preserved - Fixed qemuDomainEnableNamespace return type and error reporting
The whole patches can found here:
ACK to whole series from me. I've tested this and it works correctly now, including with host assigned USB devices. There's probably some fun edge cases hiding in there, but we need more people testing to see that. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 14.12.2016 18:02, Daniel P. Berrange wrote:
On Mon, Dec 12, 2016 at 05:52:54PM +0100, Michal Privoznik wrote:
Prime time. When it comes to spawning qemu process and relabelling all the devices it's going to touch, there's inherent race with other applications in the system (e.g. udev). Instead of trying convincing udev to not touch libvirt managed devices, we can create a separate mount namespace for the qemu, and mount our own /dev there. Of course this puts more work onto us as we have to maintain /dev files on each domain start and device hot(un-)plug. On the other hand, this enhances security also.
From technical POV, on domain startup process the parent (libvirtd) creates:
/var/lib/libvirt/qemu/$domain.dev /var/lib/libvirt/qemu/$domain.devpts
The child (which is going to be qemu eventually) calls unshare() to create new mount namespace. From now on anything that child does is invisible to the parent. Child then mounts tmpfs on $domain.dev (so that it still sees original /dev from the host) and creates some devices (as explained in one of the previous patches). The devices have to be created exactly as they are in the host (including perms, seclabels, ACLs, ...). After that it moves $domain.dev mount to /dev.
What's the $domain.devpts mount there for then you ask? QEMU can create PTYs for some chardevs. And historically we exposed the host ends in our domain XML allowing users to connect to them. Therefore we must preserve devpts mount to be shared with the host's one.
To make this patch as small as possible, creating of devices configured for domain in question is implemented in next patches.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
Diff to v2:
- More /dev/* mounts are preserved - Fixed qemuDomainEnableNamespace return type and error reporting
The whole patches can found here:
ACK to whole series from me. I've tested this and it works correctly now, including with host assigned USB devices. There's probably some fun edge cases hiding in there, but we need more people testing to see that.
Thank you. I've pushed these. Michal
participants (2)
-
Daniel P. Berrange
-
Michal Privoznik