[libvirt] [PATCH v1 00/21] Run qemu under its own namespace

Finally. This is full implementation of my RFC: https://www.redhat.com/archives/libvir-list/2016-November/msg00691.html The first two patches were posted separately, but since they lack review I'm sending them here too because they are important for the feature: https://www.redhat.com/archives/libvir-list/2016-November/msg01060.html All of these patches: a) can be found on my github: https://github.com/zippy2/libvirt/tree/qemu_container_v2 b) pass my basic testing: - run domain with device passthrough - device hot(un-)plug (disks, RNGs, chardevs, PCI/USB) c) seem to add negligible overhead to domain startup process Michal Privoznik (21): qemu: Create hugepage path on per domain basis security: Implement virSecurityManagerSetHugepages virprocess: Introduce virProcessSetupPrivateMountNS virfile: Introduce virFileSetupDev virfile: Introduce ACL helpers virusb: Introduce virUSBDeviceGetPath virscsi: Introduce virSCSIDeviceGetPath qemu_cgroup: Expose defaultDeviceACL qemu: Spawn qemu under mount namespace qemu: Prepare disks when starting a domain qemu: Prepare hostdevs when starting a domain qemu: Prepare chardevs when starting a domain qemu: Prepare TPM when starting a domain qemu: Prepare inputs when starting a domain qemu: Prepare RNGs when starting a domain qemu: Enter the namespace on relabelling qemu: Manage /dev entry on disk hotplug qemu: Manage /dev entry on hostdev hotplug qemu: Manage /dev entry on chardev hotplug qemu: Manage /dev entry on RNG hotplug qemu: Let users opt-out from containerization configure.ac | 12 +- src/Makefile.am | 7 +- src/libvirt_private.syms | 9 + src/lxc/lxc_container.c | 20 +- src/lxc/lxc_controller.c | 32 +- src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 8 + src/qemu/qemu_cgroup.c | 2 +- src/qemu/qemu_cgroup.h | 1 + src/qemu/qemu_command.c | 4 +- src/qemu/qemu_conf.c | 50 +- src/qemu/qemu_conf.h | 18 +- src/qemu/qemu_domain.c | 1147 ++++++++++++++++++++ src/qemu/qemu_domain.h | 42 + src/qemu/qemu_driver.c | 24 +- src/qemu/qemu_hotplug.c | 90 +- src/qemu/qemu_process.c | 53 +- src/qemu/qemu_security.c | 208 ++++ src/qemu/qemu_security.h | 55 + src/qemu/test_libvirtd_qemu.aug.in | 1 + src/security/security_dac.c | 11 + src/security/security_selinux.c | 10 + src/util/virfile.c | 153 +++ src/util/virfile.h | 17 + src/util/virprocess.c | 38 + src/util/virprocess.h | 2 + src/util/virscsi.c | 6 + src/util/virscsi.h | 1 + src/util/virusb.c | 5 + src/util/virusb.h | 1 + .../qemuxml2argv-hugepages-numa.args | 4 +- .../qemuxml2argv-hugepages-pages.args | 14 +- .../qemuxml2argv-hugepages-pages2.args | 2 +- .../qemuxml2argv-hugepages-pages3.args | 2 +- .../qemuxml2argv-hugepages-pages5.args | 2 +- .../qemuxml2argv-hugepages-shared.args | 12 +- tests/qemuxml2argvdata/qemuxml2argv-hugepages.args | 2 +- .../qemuxml2argv-memory-hotplug-dimm-addr.args | 4 +- .../qemuxml2argv-memory-hotplug-dimm.args | 4 +- 39 files changed, 1933 insertions(+), 141 deletions(-) create mode 100644 src/qemu/qemu_security.c create mode 100644 src/qemu/qemu_security.h -- 2.8.4

If you've ever tried running a huge page backed guest under different user than root, you probably failed. Problem is even though we have corresponding APIs in the security drivers, there's no implementation and thus we don't relabel the huge page path. But even if we did, so far all of the domains share the same path: /hugepageMount/libvirt/qemu Our only option there would be to set 0777 mode on the qemu dir which is totally unsafe. Therefore, we can create dir on per-domain basis, i.e.: /hugepageMount/libvirt/qemu/domainName and chown domainName dir to the user that domain is configured to run under. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_command.c | 4 +- src/qemu/qemu_conf.c | 45 ++++++++++++++++------ src/qemu/qemu_conf.h | 16 +++++--- src/qemu/qemu_driver.c | 19 +++------ src/qemu/qemu_process.c | 25 +++++++++++- .../qemuxml2argv-hugepages-numa.args | 4 +- .../qemuxml2argv-hugepages-pages.args | 14 +++---- .../qemuxml2argv-hugepages-pages2.args | 2 +- .../qemuxml2argv-hugepages-pages3.args | 2 +- .../qemuxml2argv-hugepages-pages5.args | 2 +- .../qemuxml2argv-hugepages-shared.args | 12 +++--- tests/qemuxml2argvdata/qemuxml2argv-hugepages.args | 2 +- .../qemuxml2argv-memory-hotplug-dimm-addr.args | 4 +- .../qemuxml2argv-memory-hotplug-dimm.args | 4 +- 14 files changed, 97 insertions(+), 58 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 51b5aaf..becdce4 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3303,7 +3303,7 @@ qemuBuildMemoryBackendStr(unsigned long long size, return -1; if (pagesize) { - if (qemuGetHupageMemPath(cfg, pagesize, &mem_path) < 0) + if (qemuGetDomainHupageMemPath(def, cfg, pagesize, &mem_path) < 0) goto cleanup; *backendType = "memory-backend-file"; @@ -7177,7 +7177,7 @@ qemuBuildMemPathStr(virQEMUDriverConfigPtr cfg, return -1; } - if (qemuGetHupageMemPath(cfg, def->mem.hugepages[0].size, &mem_path) < 0) + if (qemuGetDomainHupageMemPath(def, cfg, def->mem.hugepages[0].size, &mem_path) < 0) return -1; virCommandAddArgList(cmd, "-mem-prealloc", "-mem-path", mem_path, NULL); diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index fbac3ea..9be5b60 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -1456,7 +1456,7 @@ qemuTranslateSnapshotDiskSourcePool(virConnectPtr conn ATTRIBUTE_UNUSED, } char * -qemuGetHugepagePath(virHugeTLBFSPtr hugepage) +qemuGetBaseHugepagePath(virHugeTLBFSPtr hugepage) { char *ret; @@ -1467,8 +1467,26 @@ qemuGetHugepagePath(virHugeTLBFSPtr hugepage) } +char * +qemuGetDomainHugepagePath(const virDomainDef *def, + virHugeTLBFSPtr hugepage) +{ + char *base = qemuGetBaseHugepagePath(hugepage); + char *ret; + + if (!base || + virAsprintf(&ret, "%s/%s", base, def->name) < 0) { + VIR_FREE(base); + return NULL; + } + + return ret; +} + + /** - * qemuGetDefaultHugepath: + * qemuGetDomainDefaultHugepath: + * @def: domain definition * @hugetlbfs: array of configured hugepages * @nhugetlbfs: number of item in the array * @@ -1477,8 +1495,9 @@ qemuGetHugepagePath(virHugeTLBFSPtr hugepage) * Returns 0 on success, -1 otherwise. * */ char * -qemuGetDefaultHugepath(virHugeTLBFSPtr hugetlbfs, - size_t nhugetlbfs) +qemuGetDomainDefaultHugepath(const virDomainDef *def, + virHugeTLBFSPtr hugetlbfs, + size_t nhugetlbfs) { size_t i; @@ -1489,12 +1508,12 @@ qemuGetDefaultHugepath(virHugeTLBFSPtr hugetlbfs, if (i == nhugetlbfs) i = 0; - return qemuGetHugepagePath(&hugetlbfs[i]); + return qemuGetDomainHugepagePath(def, &hugetlbfs[i]); } /** - * qemuGetHupageMemPath: Construct HP enabled memory backend path + * qemuGetDomainHupageMemPath: Construct HP enabled memory backend path * * If no specific hugepage size is requested (@pagesize is zero) * the default hugepage size is used). @@ -1504,9 +1523,10 @@ qemuGetDefaultHugepath(virHugeTLBFSPtr hugetlbfs, * -1 otherwise. */ int -qemuGetHupageMemPath(virQEMUDriverConfigPtr cfg, - unsigned long long pagesize, - char **memPath) +qemuGetDomainHupageMemPath(const virDomainDef *def, + virQEMUDriverConfigPtr cfg, + unsigned long long pagesize, + char **memPath) { size_t i = 0; @@ -1518,8 +1538,9 @@ qemuGetHupageMemPath(virQEMUDriverConfigPtr cfg, } if (!pagesize) { - if (!(*memPath = qemuGetDefaultHugepath(cfg->hugetlbfs, - cfg->nhugetlbfs))) + if (!(*memPath = qemuGetDomainDefaultHugepath(def, + cfg->hugetlbfs, + cfg->nhugetlbfs))) return -1; } else { for (i = 0; i < cfg->nhugetlbfs; i++) { @@ -1535,7 +1556,7 @@ qemuGetHupageMemPath(virQEMUDriverConfigPtr cfg, return -1; } - if (!(*memPath = qemuGetHugepagePath(&cfg->hugetlbfs[i]))) + if (!(*memPath = qemuGetDomainHugepagePath(def, &cfg->hugetlbfs[i]))) return -1; } diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h index f6e3257..d191e10 100644 --- a/src/qemu/qemu_conf.h +++ b/src/qemu/qemu_conf.h @@ -335,11 +335,15 @@ virDomainXMLOptionPtr virQEMUDriverCreateXMLConf(virQEMUDriverPtr driver); int qemuTranslateSnapshotDiskSourcePool(virConnectPtr conn, virDomainSnapshotDiskDefPtr def); -char * qemuGetHugepagePath(virHugeTLBFSPtr hugepage); -char * qemuGetDefaultHugepath(virHugeTLBFSPtr hugetlbfs, - size_t nhugetlbfs); +char * qemuGetBaseHugepagePath(virHugeTLBFSPtr hugepage); +char * qemuGetDomainHugepagePath(const virDomainDef *def, + virHugeTLBFSPtr hugepage); +char * qemuGetDomainDefaultHugepath(const virDomainDef *def, + virHugeTLBFSPtr hugetlbfs, + size_t nhugetlbfs); -int qemuGetHupageMemPath(virQEMUDriverConfigPtr cfg, - unsigned long long pagesize, - char **memPath); +int qemuGetDomainHupageMemPath(const virDomainDef *def, + virQEMUDriverConfigPtr cfg, + unsigned long long pagesize, + char **memPath); #endif /* __QEMUD_CONF_H */ diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index fdfe912..8c8f969 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -852,7 +852,7 @@ qemuStateInitialize(bool privileged, * it, since we can't assume the root mount point has permissions that * will let our spawned QEMU instances use it. */ for (i = 0; i < cfg->nhugetlbfs; i++) { - hugepagePath = qemuGetHugepagePath(&cfg->hugetlbfs[i]); + hugepagePath = qemuGetBaseHugepagePath(&cfg->hugetlbfs[i]); if (!hugepagePath) goto error; @@ -863,19 +863,10 @@ qemuStateInitialize(bool privileged, hugepagePath); goto error; } - if (privileged) { - if (virFileUpdatePerm(cfg->hugetlbfs[i].mnt_dir, - 0, S_IXGRP | S_IXOTH) < 0) - goto error; - if (chown(hugepagePath, cfg->user, cfg->group) < 0) { - virReportSystemError(errno, - _("unable to set ownership on %s to %d:%d"), - hugepagePath, - (int) cfg->user, - (int) cfg->group); - goto error; - } - } + if (privileged && + virFileUpdatePerm(cfg->hugetlbfs[i].mnt_dir, + 0, S_IXGRP | S_IXOTH) < 0) + goto error; VIR_FREE(hugepagePath); } diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index f8f379a..93a38e1 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -5264,11 +5264,19 @@ qemuProcessPrepareHost(virQEMUDriverPtr driver, if (vm->def->mem.nhugepages) { for (i = 0; i < cfg->nhugetlbfs; i++) { - char *hugepagePath = qemuGetHugepagePath(&cfg->hugetlbfs[i]); + char *hugepagePath = qemuGetDomainHugepagePath(vm->def, &cfg->hugetlbfs[i]); if (!hugepagePath) goto cleanup; + if (virFileMakePathWithMode(hugepagePath, 0700) < 0) { + virReportSystemError(errno, + _("Unable to create %s"), + hugepagePath); + VIR_FREE(hugepagePath); + goto cleanup; + } + if (virSecurityManagerSetHugepages(driver->securityManager, vm->def, hugepagePath) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, @@ -6129,6 +6137,21 @@ void qemuProcessStop(virQEMUDriverPtr driver, networkReleaseActualDevice(vm->def, net); } + if (vm->def->mem.nhugepages) { + for (i = 0; i < cfg->nhugetlbfs; i++) { + char *hugepagePath = qemuGetDomainHugepagePath(vm->def, &cfg->hugetlbfs[i]); + + if (!hugepagePath) + continue; + + if (rmdir(hugepagePath) < 0) + VIR_WARN("Unable to remove hugepage path: %s (errno=%d)", + hugepagePath, errno); + + VIR_FREE(hugepagePath); + } + } + retry: if ((ret = qemuRemoveCgroup(vm)) < 0) { if (ret == -EBUSY && (retries++ < 5)) { diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args index 7b90784..bd562a8 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-numa.args @@ -11,10 +11,10 @@ QEMU_AUDIO_DRV=spice \ -m size=1048576k,slots=16,maxmem=1099511627776k \ -smp 2,sockets=2,cores=1,threads=1 \ -mem-prealloc \ --mem-path /dev/hugepages2M/libvirt/qemu \ +-mem-path /dev/hugepages2M/libvirt/qemu/fedora \ -numa node,nodeid=0,cpus=0-1,mem=1024 \ -object memory-backend-file,id=memdimm0,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=1-3,\ +mem-path=/dev/hugepages1G/libvirt/qemu/fedora,size=1073741824,host-nodes=1-3,\ policy=bind \ -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \ -uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args index 9f0e696..e0aa151 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages.args @@ -11,19 +11,19 @@ QEMU_AUDIO_DRV=none \ -m 4096 \ -smp 4,sockets=4,cores=1,threads=1 \ -object memory-backend-file,id=ram-node0,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind \ +mem-path=/dev/hugepages1G/libvirt/qemu/QEMUGuest1,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=0,cpus=0,memdev=ram-node0 \ -object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind \ +mem-path=/dev/hugepages2M/libvirt/qemu/QEMUGuest1,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=1,cpus=1,memdev=ram-node1 \ -object memory-backend-file,id=ram-node2,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind \ +mem-path=/dev/hugepages1G/libvirt/qemu/QEMUGuest1,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=2,cpus=2,memdev=ram-node2 \ -object memory-backend-file,id=ram-node3,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=3,\ +mem-path=/dev/hugepages1G/libvirt/qemu/QEMUGuest1,size=1073741824,host-nodes=3,\ policy=bind \ -numa node,nodeid=3,cpus=3,memdev=ram-node3 \ -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args index a37f03d..b4cd088 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args @@ -11,7 +11,7 @@ QEMU_AUDIO_DRV=none \ -m 1024 \ -smp 2,sockets=2,cores=1,threads=1 \ -mem-prealloc \ --mem-path /dev/hugepages2M/libvirt/qemu \ +-mem-path /dev/hugepages2M/libvirt/qemu/SomeDummyHugepagesGuest \ -numa node,nodeid=0,cpus=0,mem=256 \ -numa node,nodeid=1,cpus=1,mem=768 \ -uuid ef1bdff4-27f3-4e85-a807-5fb4d58463cc \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args index 57dd3fa..bf28d74 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages3.args @@ -13,7 +13,7 @@ QEMU_AUDIO_DRV=none \ -object memory-backend-ram,id=ram-node0,size=268435456 \ -numa node,nodeid=0,cpus=0,memdev=ram-node0 \ -object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=805306368 \ +mem-path=/dev/hugepages1G/libvirt/qemu/SomeDummyHugepagesGuest,size=805306368 \ -numa node,nodeid=1,cpus=1,memdev=ram-node1 \ -uuid ef1bdff4-27f3-4e85-a807-5fb4d58463cc \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages5.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages5.args index a826149..a0a6506 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages5.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages5.args @@ -10,7 +10,7 @@ QEMU_AUDIO_DRV=none \ -M pc \ -m 1024 \ -mem-prealloc \ --mem-path /dev/hugepages2M/libvirt/qemu \ +-mem-path /dev/hugepages2M/libvirt/qemu/SomeDummyHugepagesGuest \ -smp 2,sockets=2,cores=1,threads=1 \ -uuid ef1bdff4-27f3-4e85-a807-5fb4d58463cc \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args index f9fc218..03d736a 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-shared.args @@ -11,19 +11,19 @@ QEMU_AUDIO_DRV=none \ -m 4096 \ -smp 4,sockets=4,cores=1,threads=1 \ -object memory-backend-file,id=ram-node0,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=0-3,\ -policy=bind \ +mem-path=/dev/hugepages1G/libvirt/qemu/QEMUGuest1,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=0,cpus=0,memdev=ram-node0 \ -object memory-backend-file,id=ram-node1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,share=yes,size=1073741824,\ +mem-path=/dev/hugepages2M/libvirt/qemu/QEMUGuest1,share=yes,size=1073741824,\ host-nodes=0-3,policy=bind \ -numa node,nodeid=1,cpus=1,memdev=ram-node1 \ -object memory-backend-file,id=ram-node2,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,share=no,size=1073741824,host-nodes=0-3,\ -policy=bind \ +mem-path=/dev/hugepages1G/libvirt/qemu/QEMUGuest1,share=no,size=1073741824,\ +host-nodes=0-3,policy=bind \ -numa node,nodeid=2,cpus=2,memdev=ram-node2 \ -object memory-backend-file,id=ram-node3,prealloc=yes,\ -mem-path=/dev/hugepages1G/libvirt/qemu,size=1073741824,host-nodes=3,\ +mem-path=/dev/hugepages1G/libvirt/qemu/QEMUGuest1,size=1073741824,host-nodes=3,\ policy=bind \ -numa node,nodeid=3,cpus=3,memdev=ram-node3 \ -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages.args b/tests/qemuxml2argvdata/qemuxml2argv-hugepages.args index 361c8e5..2d437da 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages.args @@ -10,7 +10,7 @@ QEMU_AUDIO_DRV=none \ -M pc \ -m 214 \ -mem-prealloc \ --mem-path /dev/hugepages2M/libvirt/qemu \ +-mem-path /dev/hugepages2M/libvirt/qemu/QEMUGuest1 \ -smp 1,sockets=1,cores=1,threads=1 \ -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ -nographic \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args index fdbb4c3..b635327 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm-addr.args @@ -12,8 +12,8 @@ QEMU_AUDIO_DRV=none \ -smp 2,sockets=2,cores=1,threads=1 \ -numa node,nodeid=0,cpus=0-1,mem=214 \ -object memory-backend-file,id=memdimm0,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=536870912,host-nodes=1-3,\ -policy=bind \ +mem-path=/dev/hugepages2M/libvirt/qemu/QEMUGuest1,size=536870912,\ +host-nodes=1-3,policy=bind \ -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0,addr=4294967296 \ -object memory-backend-ram,id=memdimm2,size=536870912 \ -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args index 1587aba..10d8593 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args +++ b/tests/qemuxml2argvdata/qemuxml2argv-memory-hotplug-dimm.args @@ -14,8 +14,8 @@ QEMU_AUDIO_DRV=none \ -object memory-backend-ram,id=memdimm0,size=536870912 \ -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \ -object memory-backend-file,id=memdimm1,prealloc=yes,\ -mem-path=/dev/hugepages2M/libvirt/qemu,size=536870912,host-nodes=1-3,\ -policy=bind \ +mem-path=/dev/hugepages2M/libvirt/qemu/QEMUGuest1,size=536870912,\ +host-nodes=1-3,policy=bind \ -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \ -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ -nographic \ -- 2.8.4

Since its introduction in 2012 this internal API did nothing. Now it's finally the right time to put it into a good use. It's implementation is fairly simple and exactly the same as virSecurityDomainSetPathLabel. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/security/security_dac.c | 11 +++++++++++ src/security/security_selinux.c | 10 ++++++++++ 2 files changed, 21 insertions(+) diff --git a/src/security/security_dac.c b/src/security/security_dac.c index fd74e8b..3fa9df9 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -1564,6 +1564,15 @@ virSecurityDACDomainSetPathLabel(virSecurityManagerPtr mgr, return virSecurityDACSetOwnership(priv, NULL, path, user, group); } +static int +virSecurityDACDomainSetHugepages(virSecurityManagerPtr mgr, + virDomainDefPtr def, + const char *path) +{ + return virSecurityDACDomainSetPathLabel(mgr, def, path); +} + + virSecurityDriver virSecurityDriverDAC = { .privateDataLen = sizeof(virSecurityDACData), .name = SECURITY_DAC_NAME, @@ -1613,4 +1622,6 @@ virSecurityDriver virSecurityDriverDAC = { .getBaseLabel = virSecurityDACGetBaseLabel, .domainSetPathLabel = virSecurityDACDomainSetPathLabel, + + .domainSetSecurityHugepages = virSecurityDACDomainSetHugepages, }; diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 89c93dc..0497acc 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -2610,6 +2610,14 @@ virSecuritySELinuxDomainSetPathLabel(virSecurityManagerPtr mgr, return virSecuritySELinuxSetFilecon(mgr, path, seclabel->imagelabel); } +static int +virSecuritySELinuxDomainSetHugepages(virSecurityManagerPtr mgr, + virDomainDefPtr def, + const char *path) +{ + return virSecuritySELinuxDomainSetPathLabel(mgr, def, path); +} + virSecurityDriver virSecurityDriverSELinux = { .privateDataLen = sizeof(virSecuritySELinuxData), .name = SECURITY_SELINUX_NAME, @@ -2656,4 +2664,6 @@ virSecurityDriver virSecurityDriverSELinux = { .getBaseLabel = virSecuritySELinuxGetBaseLabel, .domainSetPathLabel = virSecuritySELinuxDomainSetPathLabel, + + .domainSetSecurityHugepages = virSecuritySELinuxDomainSetHugepages, }; -- 2.8.4

This part of code that LXC currently uses will be reused so move to a generic function. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- configure.ac | 2 +- src/libvirt_private.syms | 1 + src/lxc/lxc_controller.c | 18 +----------------- src/util/virprocess.c | 38 ++++++++++++++++++++++++++++++++++++++ src/util/virprocess.h | 2 ++ 5 files changed, 43 insertions(+), 18 deletions(-) diff --git a/configure.ac b/configure.ac index fd50ff8..5661752 100644 --- a/configure.ac +++ b/configure.ac @@ -291,7 +291,7 @@ dnl and various less common threadsafe functions AC_CHECK_FUNCS_ONCE([cfmakeraw fallocate geteuid getgid getgrnam_r \ getmntent_r getpwuid_r getrlimit getuid kill mmap newlocale posix_fallocate \ posix_memalign prlimit regexec sched_getaffinity setgroups setns \ - setrlimit symlink sysctlbyname getifaddrs sched_setscheduler]) + setrlimit symlink sysctlbyname getifaddrs sched_setscheduler unshare]) dnl Availability of pthread functions. Because of $LIB_PTHREAD, we dnl cannot use AC_CHECK_FUNCS_ONCE. LIB_PTHREAD and LIBMULTITHREAD diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b7d26fd..316d1e0 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2247,6 +2247,7 @@ virProcessSetMaxMemLock; virProcessSetMaxProcesses; virProcessSetNamespaces; virProcessSetScheduler; +virProcessSetupPrivateMountNS; virProcessTranslateStatus; virProcessWait; diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 508bc3e..29f1179 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -2092,8 +2092,6 @@ lxcCreateTty(virLXCControllerPtr ctrl, int *ttymaster, static int virLXCControllerSetupPrivateNS(void) { - int ret = -1; - /* * If doing a chroot style setup, we need to prepare * a private /dev/pts for the child now, which they @@ -2115,21 +2113,7 @@ virLXCControllerSetupPrivateNS(void) * marked as shared */ - if (unshare(CLONE_NEWNS) < 0) { - virReportSystemError(errno, "%s", - _("Cannot unshare mount namespace")); - goto cleanup; - } - - if (mount("", "/", NULL, MS_SLAVE|MS_REC, NULL) < 0) { - virReportSystemError(errno, "%s", - _("Failed to switch root mount into slave mode")); - goto cleanup; - } - - ret = 0; - cleanup: - return ret; + return virProcessSetupPrivateMountNS(); } diff --git a/src/util/virprocess.c b/src/util/virprocess.c index 3cacc89..d85407a 100644 --- a/src/util/virprocess.c +++ b/src/util/virprocess.c @@ -28,6 +28,9 @@ #include <stdlib.h> #include <sys/wait.h> #include <unistd.h> +#if HAVE_SYS_MOUNT_H +# include <sys/mount.h> +#endif #if HAVE_SETRLIMIT # include <sys/time.h> # include <sys/resource.h> @@ -1146,6 +1149,41 @@ virProcessRunInMountNamespace(pid_t pid, } +#if defined(HAVE_SYS_MOUNT_H) && defined(HAVE_UNSHARE) +int +virProcessSetupPrivateMountNS(void) +{ + int ret = -1; + + if (unshare(CLONE_NEWNS) < 0) { + virReportSystemError(errno, "%s", + _("Cannot unshare mount namespace")); + goto cleanup; + } + + if (mount("", "/", NULL, MS_SLAVE|MS_REC, NULL) < 0) { + virReportSystemError(errno, "%s", + _("Failed to switch root mount into slave mode")); + goto cleanup; + } + + ret = 0; + cleanup: + return ret; +} + +#else /* !defined(HAVE_SYS_MOUNT_H) || !defined(HAVE_UNSHARE */ + +int +virProcessSetupPrivateMountNS(void) +{ + virReportSystemError(ENOSYS, "%s", + _("Namespaces are not supported on this platform.")); + return -1; +} +#endif /* !defined(HAVE_SYS_MOUNT_H) || !defined(HAVE_UNSHARE */ + + /** * virProcessExitWithStatus: * @status: raw status to be reproduced when this process dies diff --git a/src/util/virprocess.h b/src/util/virprocess.h index 04e9802..c76a1fb 100644 --- a/src/util/virprocess.h +++ b/src/util/virprocess.h @@ -90,6 +90,8 @@ int virProcessRunInMountNamespace(pid_t pid, virProcessNamespaceCallback cb, void *opaque); +int virProcessSetupPrivateMountNS(void); + int virProcessSetScheduler(pid_t pid, virProcessSchedPolicy policy, int priority); -- 2.8.4

On Thu, Nov 24, 2016 at 03:47:52PM +0100, Michal Privoznik wrote:
This part of code that LXC currently uses will be reused so move to a generic function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- configure.ac | 2 +- src/libvirt_private.syms | 1 + src/lxc/lxc_controller.c | 18 +----------------- src/util/virprocess.c | 38 ++++++++++++++++++++++++++++++++++++++ src/util/virprocess.h | 2 ++ 5 files changed, 43 insertions(+), 18 deletions(-)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

This part of code that LXC currently uses will be reused so move to a generic function. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 2 ++ src/lxc/lxc_container.c | 20 ++------------ src/lxc/lxc_controller.c | 14 +--------- src/util/virfile.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 6 ++++ 5 files changed, 84 insertions(+), 30 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 316d1e0..ae563eb 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1556,6 +1556,7 @@ virDirRead; virFileAbsPath; virFileAccessibleAs; virFileActivateDirOverride; +virFileBindMountDevice; virFileBuildPath; virFileClose; virFileDeleteTree; @@ -1601,6 +1602,7 @@ virFileResolveAllLinks; virFileResolveLink; virFileRewrite; virFileSanitizePath; +virFileSetupDev; virFileSkipRoot; virFileStripSuffix; virFileTouch; diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 5357df4..5d7da8f 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -1112,20 +1112,6 @@ static int lxcContainerMountFSDevPTS(virDomainDefPtr def, return ret; } -static int lxcContainerBindMountDevice(const char *src, const char *dst) -{ - if (virFileTouch(dst, 0666) < 0) - return -1; - - if (mount(src, dst, "none", MS_BIND, NULL) < 0) { - virReportSystemError(errno, _("Failed to bind %s on to %s"), src, - dst); - return -1; - } - - return 0; -} - static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths) { size_t i; @@ -1149,7 +1135,7 @@ static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths) } /* We have private devpts capability, so bind that */ - if (lxcContainerBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0) + if (virFileBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0) return -1; for (i = 0; i < nttyPaths; i++) { @@ -1157,7 +1143,7 @@ static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths) if (virAsprintf(&tty, "/dev/tty%zu", i+1) < 0) return -1; - if (lxcContainerBindMountDevice(ttyPaths[i], tty) < 0) { + if (virFileBindMountDevice(ttyPaths[i], tty) < 0) { return -1; VIR_FREE(tty); } @@ -1165,7 +1151,7 @@ static int lxcContainerSetupDevices(char **ttyPaths, size_t nttyPaths) VIR_FREE(tty); if (i == 0 && - lxcContainerBindMountDevice(ttyPaths[i], "/dev/console") < 0) + virFileBindMountDevice(ttyPaths[i], "/dev/console") < 0) return -1; } return 0; diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c index 29f1179..2170b0a 100644 --- a/src/lxc/lxc_controller.c +++ b/src/lxc/lxc_controller.c @@ -1457,12 +1457,6 @@ static int virLXCControllerSetupDev(virLXCControllerPtr ctrl) LXC_STATE_DIR, ctrl->def->name) < 0) goto cleanup; - if (virFileMakePath(dev) < 0) { - virReportSystemError(errno, - _("Failed to make path %s"), dev); - goto cleanup; - } - /* * tmpfs is limited to 64kb, since we only have device nodes in there * and don't want to DOS the entire OS RAM usage @@ -1472,14 +1466,8 @@ static int virLXCControllerSetupDev(virLXCControllerPtr ctrl) "mode=755,size=65536%s", mount_options) < 0) goto cleanup; - VIR_DEBUG("Mount devfs on %s type=tmpfs flags=%x, opts=%s", - dev, MS_NOSUID, opts); - if (mount("devfs", dev, "tmpfs", MS_NOSUID, opts) < 0) { - virReportSystemError(errno, - _("Failed to mount devfs on %s type %s (%s)"), - dev, "tmpfs", opts); + if (virFileSetupDev(dev, opts) < 0) goto cleanup; - } if (lxcContainerChown(ctrl->def, dev) < 0) goto cleanup; diff --git a/src/util/virfile.c b/src/util/virfile.c index f006abf..6200004 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -32,6 +32,9 @@ #include <sys/types.h> #include <sys/socket.h> #include <sys/wait.h> +#if defined(HAVE_SYS_MOUNT_H) +# include <sys/mount.h> +#endif #include <unistd.h> #include <dirent.h> #include <dirname.h> @@ -3501,3 +3504,72 @@ int virFileIsSharedFS(const char *path) VIR_FILE_SHFS_SMB | VIR_FILE_SHFS_CIFS); } + + +#if defined(HAVE_SYS_MOUNT_H) +int +virFileSetupDev(const char *path, + const char *mount_options) +{ + const unsigned long mount_flags = MS_NOSUID; + const char *mount_fs = "tmpfs"; + int ret = -1; + + if (virFileMakePath(path) < 0) { + virReportSystemError(errno, + _("Failed to make path %s"), path); + goto cleanup; + } + + VIR_DEBUG("Mount devfs on %s type=tmpfs flags=%lx, opts=%s", + path, mount_flags, mount_options); + if (mount("devfs", path, mount_fs, mount_flags, mount_options) < 0) { + virReportSystemError(errno, + _("Failed to mount devfs on %s type %s (%s)"), + path, mount_fs, mount_options); + goto cleanup; + } + + ret = 0; + cleanup: + return ret; +} + + +int +virFileBindMountDevice(const char *src, + const char *dst) +{ + if (virFileTouch(dst, 0666) < 0) + return -1; + + if (mount(src, dst, "none", MS_BIND, NULL) < 0) { + virReportSystemError(errno, _("Failed to bind %s on to %s"), src, + dst); + return -1; + } + + return 0; +} + +#else /* !defined(HAVE_SYS_MOUNT_H) */ + +int +virFileSetupDev(const char *path ATTRIBUTE_UNUSED, + const char *mount_options ATTRIBUTE_UNUSED) +{ + virReportSystemError(ENOSYS, "%s", + _("mount is not supported on this platform.")); + return -1; +} + + +int +virFileBindMountDevice(const char *src ATTRIBUTE_UNUSED, + const char *dst ATTRIBUTE_UNUSED) +{ + virReportSystemError(ENOSYS, "%s", + _("mount is not supported on this platform.")); + return -1; +} +#endif /* !defined(HAVE_SYS_MOUNT_H) */ diff --git a/src/util/virfile.h b/src/util/virfile.h index b4ae6ea..957fd91 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -307,4 +307,10 @@ int virFileGetHugepageSize(const char *path, unsigned long long *size); int virFileFindHugeTLBFS(virHugeTLBFSPtr *ret_fs, size_t *ret_nfs); + +int virFileSetupDev(const char *path, + const char *mount_options); + +int virFileBindMountDevice(const char *src, + const char *dst); #endif /* __VIR_FILE_H */ -- 2.8.4

On Thu, Nov 24, 2016 at 03:47:53PM +0100, Michal Privoznik wrote:
This part of code that LXC currently uses will be reused so move to a generic function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 2 ++ src/lxc/lxc_container.c | 20 ++------------ src/lxc/lxc_controller.c | 14 +--------- src/util/virfile.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 6 ++++ 5 files changed, 84 insertions(+), 30 deletions(-)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

Namely, virFileGetACLs, virFileSetACLs, virFileFreeACLs and virFileCopyACLs. These functions are going to be required when we are creating /dev for qemu. We have copy anything that's in host's /dev exactly as is. Including ACLs. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- configure.ac | 10 +++++- src/Makefile.am | 4 +-- src/libvirt_private.syms | 4 +++ src/util/virfile.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 11 +++++++ 5 files changed, 107 insertions(+), 3 deletions(-) diff --git a/configure.ac b/configure.ac index 5661752..972b399 100644 --- a/configure.ac +++ b/configure.ac @@ -332,11 +332,19 @@ dnl Availability of various common headers (non-fatal if missing). AC_CHECK_HEADERS([pwd.h regex.h sys/un.h \ sys/poll.h syslog.h mntent.h net/ethernet.h linux/magic.h \ sys/un.h sys/syscall.h sys/sysctl.h netinet/tcp.h ifaddrs.h \ - libtasn1.h sys/ucred.h sys/mount.h]) + libtasn1.h sys/ucred.h sys/mount.h sys/acl.h]) dnl Check whether endian provides handy macros. AC_CHECK_DECLS([htole64], [], [], [[#include <endian.h>]]) AC_CHECK_FUNCS([stat stat64 __xstat __xstat64 lstat lstat64 __lxstat __lxstat64]) +ACL_CFLAGS="" +ACL_LIBS="" +if test "x$ac_cv_header_sys_acl_h" = "xyes" ; then + ACL_LIBS="-lacl" +fi +AC_SUBST([ACL_CFLAGS]) +AC_SUBST([ACL_LIBS]) + dnl We need to decide at configure time if libvirt will use real atomic dnl operations ("lock free") or emulated ones with a mutex. diff --git a/src/Makefile.am b/src/Makefile.am index b358e0e..c08a350 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -1125,12 +1125,12 @@ libvirt_util_la_SOURCES = \ libvirt_util_la_CFLAGS = $(CAPNG_CFLAGS) $(YAJL_CFLAGS) $(LIBNL_CFLAGS) \ $(AM_CFLAGS) $(AUDIT_CFLAGS) $(DEVMAPPER_CFLAGS) \ $(DBUS_CFLAGS) $(LDEXP_LIBM) $(NUMACTL_CFLAGS) \ - $(POLKIT_CFLAGS) $(GNUTLS_CFLAGS) \ + $(POLKIT_CFLAGS) $(GNUTLS_CFLAGS) $(ACL_CFLAGS) \ -I$(srcdir)/conf libvirt_util_la_LIBADD = $(CAPNG_LIBS) $(YAJL_LIBS) $(LIBNL_LIBS) \ $(THREAD_LIBS) $(AUDIT_LIBS) $(DEVMAPPER_LIBS) \ $(LIB_CLOCK_GETTIME) $(DBUS_LIBS) $(MSCOM_LIBS) $(LIBXML_LIBS) \ - $(SECDRIVER_LIBS) $(NUMACTL_LIBS) \ + $(SECDRIVER_LIBS) $(NUMACTL_LIBS) $(ACL_LIBS) \ $(POLKIT_LIBS) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index ae563eb..5f560bf 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1559,6 +1559,7 @@ virFileActivateDirOverride; virFileBindMountDevice; virFileBuildPath; virFileClose; +virFileCopyACLs; virFileDeleteTree; virFileDirectFdFlag; virFileExists; @@ -1568,6 +1569,8 @@ virFileFindHugeTLBFS; virFileFindMountPoint; virFileFindResource; virFileFindResourceFull; +virFileFreeACLs; +virFileGetACLs; virFileGetHugepageSize; virFileGetMountReverseSubtree; virFileGetMountSubtree; @@ -1602,6 +1605,7 @@ virFileResolveAllLinks; virFileResolveLink; virFileRewrite; virFileSanitizePath; +virFileSetACLs; virFileSetupDev; virFileSkipRoot; virFileStripSuffix; diff --git a/src/util/virfile.c b/src/util/virfile.c index 6200004..c777025 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -48,6 +48,9 @@ #if HAVE_SYS_SYSCALL_H # include <sys/syscall.h> #endif +#if HAVE_SYS_ACL_H +# include <sys/acl.h> +#endif #ifdef __linux__ # if HAVE_LINUX_MAGIC_H @@ -3573,3 +3576,81 @@ virFileBindMountDevice(const char *src ATTRIBUTE_UNUSED, return -1; } #endif /* !defined(HAVE_SYS_MOUNT_H) */ + + +#if defined(HAVE_SYS_ACL_H) && !defined(LIBVIRT_NSS) && !defined(LIBVIRT_SETUID_RPC_CLIENT) +int +virFileGetACLs(const char *file, + void **acl) +{ + if (!(*acl = acl_get_file(file, ACL_TYPE_ACCESS))) + return -1; + + return 0; +} + + +int +virFileSetACLs(const char *file, + void *acl) +{ + if (acl_set_file(file, ACL_TYPE_ACCESS, acl) < 0) + return -1; + + return 0; +} + + +void +virFileFreeACLs(void **acl) +{ + acl_free(*acl); + *acl = NULL; +} + +#else /* !defined(HAVE_SYS_ACL_H) || defined(LIBVIRT_NSS) || defined(LIBVIRT_SETUID_RPC_CLIENT) */ + +int +virFileGetACLs(const char *file ATTRIBUTE_UNUSED, + void **acl ATTRIBUTE_UNUSED) +{ + errno = ENOTSUP; + return -1; +} + + +int +virFileSetACLs(const char *file ATTRIBUTE_UNUSED, + void *acl ATTRIBUTE_UNUSED) +{ + errno = ENOTSUP; + return -1; +} + + +void +virFileFreeACLs(void **acl) +{ + *acl = NULL; +} + +#endif /* !defined(HAVE_SYS_ACL_H) || defined(LIBVIRT_NSS) || defined(LIBVIRT_SETUID_RPC_CLIENT) */ + +int +virFileCopyACLs(const char *src, + const char *dst) +{ + void *acl = NULL; + int ret = -1; + + if (virFileGetACLs(src, &acl) < 0) + return ret; + + if (virFileSetACLs(dst, acl) < 0) + goto cleanup; + + ret = 0; + cleanup: + virFileFreeACLs(&acl); + return ret; +} diff --git a/src/util/virfile.h b/src/util/virfile.h index 957fd91..28fed23 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -313,4 +313,15 @@ int virFileSetupDev(const char *path, int virFileBindMountDevice(const char *src, const char *dst); + +int virFileGetACLs(const char *file, + void **acl); + +int virFileSetACLs(const char *file, + void *acl); + +void virFileFreeACLs(void **acl); + +int virFileCopyACLs(const char *src, + const char *dst); #endif /* __VIR_FILE_H */ -- 2.8.4

On Thu, Nov 24, 2016 at 03:47:54PM +0100, Michal Privoznik wrote:
Namely, virFileGetACLs, virFileSetACLs, virFileFreeACLs and virFileCopyACLs. These functions are going to be required when we are creating /dev for qemu. We have copy anything that's in host's /dev exactly as is. Including ACLs.
Do we really ? IIUC, udev uses ACLs on /dev in order to grant end users in the desktop session permission on certain device nodes, without chowning the whole device. The device nodes in our private /dev only need to be accessible to the QEMU process we're about to run. So neither existing ownership, group, permissions, nor ACLs matter at all. Our security driver code will chown/grp the device to grant QEMU access and that's all that's needed AFAICT. What am I missing that requires us to preserve ACLs ? Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 05.12.2016 13:36, Daniel P. Berrange wrote:
On Thu, Nov 24, 2016 at 03:47:54PM +0100, Michal Privoznik wrote:
Namely, virFileGetACLs, virFileSetACLs, virFileFreeACLs and virFileCopyACLs. These functions are going to be required when we are creating /dev for qemu. We have copy anything that's in host's /dev exactly as is. Including ACLs.
Do we really ?
IIUC, udev uses ACLs on /dev in order to grant end users in the desktop session permission on certain device nodes, without chowning the whole device.
The device nodes in our private /dev only need to be accessible to the QEMU process we're about to run.
So neither existing ownership, group, permissions, nor ACLs matter at all. Our security driver code will chown/grp the device to grant QEMU access and that's all that's needed AFAICT.
What am I missing that requires us to preserve ACLs ?
Admins may set ACLs on say /dev/sdb to grant access to some users and then use relabel='no' in domain XMLs so that libvirt doesn't mess it up. If we want to honour no-relabel flag we must create the device exactly as is. Michal

On Mon, Dec 05, 2016 at 02:56:12PM +0100, Michal Privoznik wrote:
On 05.12.2016 13:36, Daniel P. Berrange wrote:
On Thu, Nov 24, 2016 at 03:47:54PM +0100, Michal Privoznik wrote:
Namely, virFileGetACLs, virFileSetACLs, virFileFreeACLs and virFileCopyACLs. These functions are going to be required when we are creating /dev for qemu. We have copy anything that's in host's /dev exactly as is. Including ACLs.
Do we really ?
IIUC, udev uses ACLs on /dev in order to grant end users in the desktop session permission on certain device nodes, without chowning the whole device.
The device nodes in our private /dev only need to be accessible to the QEMU process we're about to run.
So neither existing ownership, group, permissions, nor ACLs matter at all. Our security driver code will chown/grp the device to grant QEMU access and that's all that's needed AFAICT.
What am I missing that requires us to preserve ACLs ?
Admins may set ACLs on say /dev/sdb to grant access to some users and then use relabel='no' in domain XMLs so that libvirt doesn't mess it up. If we want to honour no-relabel flag we must create the device exactly as is.
Ah ha. I totally forgot about the no-relabel case. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On Thu, Nov 24, 2016 at 03:47:54PM +0100, Michal Privoznik wrote:
Namely, virFileGetACLs, virFileSetACLs, virFileFreeACLs and virFileCopyACLs. These functions are going to be required when we are creating /dev for qemu. We have copy anything that's in host's /dev exactly as is. Including ACLs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- configure.ac | 10 +++++- src/Makefile.am | 4 +-- src/libvirt_private.syms | 4 +++ src/util/virfile.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 11 +++++++ 5 files changed, 107 insertions(+), 3 deletions(-)
diff --git a/configure.ac b/configure.ac index 5661752..972b399 100644 --- a/configure.ac +++ b/configure.ac @@ -332,11 +332,19 @@ dnl Availability of various common headers (non-fatal if missing). AC_CHECK_HEADERS([pwd.h regex.h sys/un.h \ sys/poll.h syslog.h mntent.h net/ethernet.h linux/magic.h \ sys/un.h sys/syscall.h sys/sysctl.h netinet/tcp.h ifaddrs.h \ - libtasn1.h sys/ucred.h sys/mount.h]) + libtasn1.h sys/ucred.h sys/mount.h sys/acl.h]) dnl Check whether endian provides handy macros. AC_CHECK_DECLS([htole64], [], [], [[#include <endian.h>]]) AC_CHECK_FUNCS([stat stat64 __xstat __xstat64 lstat lstat64 __lxstat __lxstat64])
+ACL_CFLAGS="" +ACL_LIBS="" +if test "x$ac_cv_header_sys_acl_h" = "xyes" ; then + ACL_LIBS="-lacl" +fi +AC_SUBST([ACL_CFLAGS]) +AC_SUBST([ACL_LIBS]) + dnl We need to decide at configure time if libvirt will use real atomic dnl operations ("lock free") or emulated ones with a mutex.
diff --git a/src/Makefile.am b/src/Makefile.am index b358e0e..c08a350 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -1125,12 +1125,12 @@ libvirt_util_la_SOURCES = \ libvirt_util_la_CFLAGS = $(CAPNG_CFLAGS) $(YAJL_CFLAGS) $(LIBNL_CFLAGS) \ $(AM_CFLAGS) $(AUDIT_CFLAGS) $(DEVMAPPER_CFLAGS) \ $(DBUS_CFLAGS) $(LDEXP_LIBM) $(NUMACTL_CFLAGS) \ - $(POLKIT_CFLAGS) $(GNUTLS_CFLAGS) \ + $(POLKIT_CFLAGS) $(GNUTLS_CFLAGS) $(ACL_CFLAGS) \ -I$(srcdir)/conf libvirt_util_la_LIBADD = $(CAPNG_LIBS) $(YAJL_LIBS) $(LIBNL_LIBS) \ $(THREAD_LIBS) $(AUDIT_LIBS) $(DEVMAPPER_LIBS) \ $(LIB_CLOCK_GETTIME) $(DBUS_LIBS) $(MSCOM_LIBS) $(LIBXML_LIBS) \ - $(SECDRIVER_LIBS) $(NUMACTL_LIBS) \ + $(SECDRIVER_LIBS) $(NUMACTL_LIBS) $(ACL_LIBS) \ $(POLKIT_LIBS)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index ae563eb..5f560bf 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1559,6 +1559,7 @@ virFileActivateDirOverride; virFileBindMountDevice; virFileBuildPath; virFileClose; +virFileCopyACLs; virFileDeleteTree; virFileDirectFdFlag; virFileExists; @@ -1568,6 +1569,8 @@ virFileFindHugeTLBFS; virFileFindMountPoint; virFileFindResource; virFileFindResourceFull; +virFileFreeACLs; +virFileGetACLs; virFileGetHugepageSize; virFileGetMountReverseSubtree; virFileGetMountSubtree; @@ -1602,6 +1605,7 @@ virFileResolveAllLinks; virFileResolveLink; virFileRewrite; virFileSanitizePath; +virFileSetACLs; virFileSetupDev; virFileSkipRoot; virFileStripSuffix; diff --git a/src/util/virfile.c b/src/util/virfile.c index 6200004..c777025 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -48,6 +48,9 @@ #if HAVE_SYS_SYSCALL_H # include <sys/syscall.h> #endif +#if HAVE_SYS_ACL_H +# include <sys/acl.h> +#endif
#ifdef __linux__ # if HAVE_LINUX_MAGIC_H @@ -3573,3 +3576,81 @@ virFileBindMountDevice(const char *src ATTRIBUTE_UNUSED, return -1; } #endif /* !defined(HAVE_SYS_MOUNT_H) */ + + +#if defined(HAVE_SYS_ACL_H) && !defined(LIBVIRT_NSS) && !defined(LIBVIRT_SETUID_RPC_CLIENT)
You can simplify this by just adding #undef HAVE_SYS_ACL_H to config-post.h for both the NSS + SETUID blocks
+int +virFileGetACLs(const char *file, + void **acl) +{ + if (!(*acl = acl_get_file(file, ACL_TYPE_ACCESS))) + return -1; + + return 0; +} + + +int +virFileSetACLs(const char *file, + void *acl) +{ + if (acl_set_file(file, ACL_TYPE_ACCESS, acl) < 0) + return -1; + + return 0; +}
Should we consider doing virReportSystemError in these methods rather than letting the caller report the error, or is there a need for the caller to ignore errors in some cases ? Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 05.12.2016 15:00, Daniel P. Berrange wrote:
On Thu, Nov 24, 2016 at 03:47:54PM +0100, Michal Privoznik wrote:
Namely, virFileGetACLs, virFileSetACLs, virFileFreeACLs and virFileCopyACLs. These functions are going to be required when we are creating /dev for qemu. We have copy anything that's in host's /dev exactly as is. Including ACLs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- configure.ac | 10 +++++- src/Makefile.am | 4 +-- src/libvirt_private.syms | 4 +++ src/util/virfile.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 11 +++++++ 5 files changed, 107 insertions(+), 3 deletions(-)
diff --git a/configure.ac b/configure.ac index 5661752..972b399 100644 --- a/configure.ac +++ b/configure.ac @@ -332,11 +332,19 @@ dnl Availability of various common headers (non-fatal if missing). AC_CHECK_HEADERS([pwd.h regex.h sys/un.h \ sys/poll.h syslog.h mntent.h net/ethernet.h linux/magic.h \ sys/un.h sys/syscall.h sys/sysctl.h netinet/tcp.h ifaddrs.h \ - libtasn1.h sys/ucred.h sys/mount.h]) + libtasn1.h sys/ucred.h sys/mount.h sys/acl.h]) dnl Check whether endian provides handy macros. AC_CHECK_DECLS([htole64], [], [], [[#include <endian.h>]]) AC_CHECK_FUNCS([stat stat64 __xstat __xstat64 lstat lstat64 __lxstat __lxstat64])
+ACL_CFLAGS="" +ACL_LIBS="" +if test "x$ac_cv_header_sys_acl_h" = "xyes" ; then + ACL_LIBS="-lacl" +fi +AC_SUBST([ACL_CFLAGS]) +AC_SUBST([ACL_LIBS]) + dnl We need to decide at configure time if libvirt will use real atomic dnl operations ("lock free") or emulated ones with a mutex.
diff --git a/src/Makefile.am b/src/Makefile.am index b358e0e..c08a350 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -1125,12 +1125,12 @@ libvirt_util_la_SOURCES = \ libvirt_util_la_CFLAGS = $(CAPNG_CFLAGS) $(YAJL_CFLAGS) $(LIBNL_CFLAGS) \ $(AM_CFLAGS) $(AUDIT_CFLAGS) $(DEVMAPPER_CFLAGS) \ $(DBUS_CFLAGS) $(LDEXP_LIBM) $(NUMACTL_CFLAGS) \ - $(POLKIT_CFLAGS) $(GNUTLS_CFLAGS) \ + $(POLKIT_CFLAGS) $(GNUTLS_CFLAGS) $(ACL_CFLAGS) \ -I$(srcdir)/conf libvirt_util_la_LIBADD = $(CAPNG_LIBS) $(YAJL_LIBS) $(LIBNL_LIBS) \ $(THREAD_LIBS) $(AUDIT_LIBS) $(DEVMAPPER_LIBS) \ $(LIB_CLOCK_GETTIME) $(DBUS_LIBS) $(MSCOM_LIBS) $(LIBXML_LIBS) \ - $(SECDRIVER_LIBS) $(NUMACTL_LIBS) \ + $(SECDRIVER_LIBS) $(NUMACTL_LIBS) $(ACL_LIBS) \ $(POLKIT_LIBS)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index ae563eb..5f560bf 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1559,6 +1559,7 @@ virFileActivateDirOverride; virFileBindMountDevice; virFileBuildPath; virFileClose; +virFileCopyACLs; virFileDeleteTree; virFileDirectFdFlag; virFileExists; @@ -1568,6 +1569,8 @@ virFileFindHugeTLBFS; virFileFindMountPoint; virFileFindResource; virFileFindResourceFull; +virFileFreeACLs; +virFileGetACLs; virFileGetHugepageSize; virFileGetMountReverseSubtree; virFileGetMountSubtree; @@ -1602,6 +1605,7 @@ virFileResolveAllLinks; virFileResolveLink; virFileRewrite; virFileSanitizePath; +virFileSetACLs; virFileSetupDev; virFileSkipRoot; virFileStripSuffix; diff --git a/src/util/virfile.c b/src/util/virfile.c index 6200004..c777025 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -48,6 +48,9 @@ #if HAVE_SYS_SYSCALL_H # include <sys/syscall.h> #endif +#if HAVE_SYS_ACL_H +# include <sys/acl.h> +#endif
#ifdef __linux__ # if HAVE_LINUX_MAGIC_H @@ -3573,3 +3576,81 @@ virFileBindMountDevice(const char *src ATTRIBUTE_UNUSED, return -1; } #endif /* !defined(HAVE_SYS_MOUNT_H) */ + + +#if defined(HAVE_SYS_ACL_H) && !defined(LIBVIRT_NSS) && !defined(LIBVIRT_SETUID_RPC_CLIENT)
You can simplify this by just adding #undef HAVE_SYS_ACL_H to config-post.h for both the NSS + SETUID blocks
Ah, will do.
+int +virFileGetACLs(const char *file, + void **acl) +{ + if (!(*acl = acl_get_file(file, ACL_TYPE_ACCESS))) + return -1; + + return 0; +} + + +int +virFileSetACLs(const char *file, + void *acl) +{ + if (acl_set_file(file, ACL_TYPE_ACCESS, acl) < 0) + return -1; + + return 0; +}
Should we consider doing virReportSystemError in these methods rather than letting the caller report the error, or is there a need for the caller to ignore errors in some cases ?
I'd rather not. If a filesystem is mounted with ACLs disabled, we can avoid spurious errors in the logs ("Hey, I couldn't copy ACLs on /dev/blaah") even if the error is properly ignored later in the code. And then if GetACLs() doesn't report the error, no other *ACL() function should. Michal

On Mon, Dec 05, 2016 at 03:18:29PM +0100, Michal Privoznik wrote:
On 05.12.2016 15:00, Daniel P. Berrange wrote:
On Thu, Nov 24, 2016 at 03:47:54PM +0100, Michal Privoznik wrote:
Namely, virFileGetACLs, virFileSetACLs, virFileFreeACLs and virFileCopyACLs. These functions are going to be required when we are creating /dev for qemu. We have copy anything that's in host's /dev exactly as is. Including ACLs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- configure.ac | 10 +++++- src/Makefile.am | 4 +-- src/libvirt_private.syms | 4 +++ src/util/virfile.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 11 +++++++ 5 files changed, 107 insertions(+), 3 deletions(-)
diff --git a/configure.ac b/configure.ac index 5661752..972b399 100644 --- a/configure.ac +++ b/configure.ac @@ -332,11 +332,19 @@ dnl Availability of various common headers (non-fatal if missing). AC_CHECK_HEADERS([pwd.h regex.h sys/un.h \ sys/poll.h syslog.h mntent.h net/ethernet.h linux/magic.h \ sys/un.h sys/syscall.h sys/sysctl.h netinet/tcp.h ifaddrs.h \ - libtasn1.h sys/ucred.h sys/mount.h]) + libtasn1.h sys/ucred.h sys/mount.h sys/acl.h]) dnl Check whether endian provides handy macros. AC_CHECK_DECLS([htole64], [], [], [[#include <endian.h>]]) AC_CHECK_FUNCS([stat stat64 __xstat __xstat64 lstat lstat64 __lxstat __lxstat64])
+ACL_CFLAGS="" +ACL_LIBS="" +if test "x$ac_cv_header_sys_acl_h" = "xyes" ; then + ACL_LIBS="-lacl" +fi +AC_SUBST([ACL_CFLAGS]) +AC_SUBST([ACL_LIBS]) + dnl We need to decide at configure time if libvirt will use real atomic dnl operations ("lock free") or emulated ones with a mutex.
diff --git a/src/Makefile.am b/src/Makefile.am index b358e0e..c08a350 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -1125,12 +1125,12 @@ libvirt_util_la_SOURCES = \ libvirt_util_la_CFLAGS = $(CAPNG_CFLAGS) $(YAJL_CFLAGS) $(LIBNL_CFLAGS) \ $(AM_CFLAGS) $(AUDIT_CFLAGS) $(DEVMAPPER_CFLAGS) \ $(DBUS_CFLAGS) $(LDEXP_LIBM) $(NUMACTL_CFLAGS) \ - $(POLKIT_CFLAGS) $(GNUTLS_CFLAGS) \ + $(POLKIT_CFLAGS) $(GNUTLS_CFLAGS) $(ACL_CFLAGS) \ -I$(srcdir)/conf libvirt_util_la_LIBADD = $(CAPNG_LIBS) $(YAJL_LIBS) $(LIBNL_LIBS) \ $(THREAD_LIBS) $(AUDIT_LIBS) $(DEVMAPPER_LIBS) \ $(LIB_CLOCK_GETTIME) $(DBUS_LIBS) $(MSCOM_LIBS) $(LIBXML_LIBS) \ - $(SECDRIVER_LIBS) $(NUMACTL_LIBS) \ + $(SECDRIVER_LIBS) $(NUMACTL_LIBS) $(ACL_LIBS) \ $(POLKIT_LIBS)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index ae563eb..5f560bf 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1559,6 +1559,7 @@ virFileActivateDirOverride; virFileBindMountDevice; virFileBuildPath; virFileClose; +virFileCopyACLs; virFileDeleteTree; virFileDirectFdFlag; virFileExists; @@ -1568,6 +1569,8 @@ virFileFindHugeTLBFS; virFileFindMountPoint; virFileFindResource; virFileFindResourceFull; +virFileFreeACLs; +virFileGetACLs; virFileGetHugepageSize; virFileGetMountReverseSubtree; virFileGetMountSubtree; @@ -1602,6 +1605,7 @@ virFileResolveAllLinks; virFileResolveLink; virFileRewrite; virFileSanitizePath; +virFileSetACLs; virFileSetupDev; virFileSkipRoot; virFileStripSuffix; diff --git a/src/util/virfile.c b/src/util/virfile.c index 6200004..c777025 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -48,6 +48,9 @@ #if HAVE_SYS_SYSCALL_H # include <sys/syscall.h> #endif +#if HAVE_SYS_ACL_H +# include <sys/acl.h> +#endif
#ifdef __linux__ # if HAVE_LINUX_MAGIC_H @@ -3573,3 +3576,81 @@ virFileBindMountDevice(const char *src ATTRIBUTE_UNUSED, return -1; } #endif /* !defined(HAVE_SYS_MOUNT_H) */ + + +#if defined(HAVE_SYS_ACL_H) && !defined(LIBVIRT_NSS) && !defined(LIBVIRT_SETUID_RPC_CLIENT)
You can simplify this by just adding #undef HAVE_SYS_ACL_H to config-post.h for both the NSS + SETUID blocks
Ah, will do.
+int +virFileGetACLs(const char *file, + void **acl) +{ + if (!(*acl = acl_get_file(file, ACL_TYPE_ACCESS))) + return -1; + + return 0; +} + + +int +virFileSetACLs(const char *file, + void *acl) +{ + if (acl_set_file(file, ACL_TYPE_ACCESS, acl) < 0) + return -1; + + return 0; +}
Should we consider doing virReportSystemError in these methods rather than letting the caller report the error, or is there a need for the caller to ignore errors in some cases ?
I'd rather not. If a filesystem is mounted with ACLs disabled, we can avoid spurious errors in the logs ("Hey, I couldn't copy ACLs on /dev/blaah") even if the error is properly ignored later in the code. And then if GetACLs() doesn't report the error, no other *ACL() function should.
I was thinking that if you get ENOTSUPP with GetACLs() we'd just return success, and a NULL acl. The SetACLs method would obviously have to treat a NULL acl as a no-op. IOW, it could "just work" even when ACLs are disabled. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 05.12.2016 15:21, Daniel P. Berrange wrote:
On Mon, Dec 05, 2016 at 03:18:29PM +0100, Michal Privoznik wrote:
On 05.12.2016 15:00, Daniel P. Berrange wrote:
On Thu, Nov 24, 2016 at 03:47:54PM +0100, Michal Privoznik wrote:
Namely, virFileGetACLs, virFileSetACLs, virFileFreeACLs and virFileCopyACLs. These functions are going to be required when we are creating /dev for qemu. We have copy anything that's in host's /dev exactly as is. Including ACLs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- configure.ac | 10 +++++- src/Makefile.am | 4 +-- src/libvirt_private.syms | 4 +++ src/util/virfile.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 11 +++++++ 5 files changed, 107 insertions(+), 3 deletions(-)
diff --git a/configure.ac b/configure.ac index 5661752..972b399 100644 --- a/configure.ac +++ b/configure.ac @@ -332,11 +332,19 @@ dnl Availability of various common headers (non-fatal if missing). AC_CHECK_HEADERS([pwd.h regex.h sys/un.h \ sys/poll.h syslog.h mntent.h net/ethernet.h linux/magic.h \ sys/un.h sys/syscall.h sys/sysctl.h netinet/tcp.h ifaddrs.h \ - libtasn1.h sys/ucred.h sys/mount.h]) + libtasn1.h sys/ucred.h sys/mount.h sys/acl.h]) dnl Check whether endian provides handy macros. AC_CHECK_DECLS([htole64], [], [], [[#include <endian.h>]]) AC_CHECK_FUNCS([stat stat64 __xstat __xstat64 lstat lstat64 __lxstat __lxstat64])
+ACL_CFLAGS="" +ACL_LIBS="" +if test "x$ac_cv_header_sys_acl_h" = "xyes" ; then + ACL_LIBS="-lacl" +fi +AC_SUBST([ACL_CFLAGS]) +AC_SUBST([ACL_LIBS]) + dnl We need to decide at configure time if libvirt will use real atomic dnl operations ("lock free") or emulated ones with a mutex.
diff --git a/src/Makefile.am b/src/Makefile.am index b358e0e..c08a350 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -1125,12 +1125,12 @@ libvirt_util_la_SOURCES = \ libvirt_util_la_CFLAGS = $(CAPNG_CFLAGS) $(YAJL_CFLAGS) $(LIBNL_CFLAGS) \ $(AM_CFLAGS) $(AUDIT_CFLAGS) $(DEVMAPPER_CFLAGS) \ $(DBUS_CFLAGS) $(LDEXP_LIBM) $(NUMACTL_CFLAGS) \ - $(POLKIT_CFLAGS) $(GNUTLS_CFLAGS) \ + $(POLKIT_CFLAGS) $(GNUTLS_CFLAGS) $(ACL_CFLAGS) \ -I$(srcdir)/conf libvirt_util_la_LIBADD = $(CAPNG_LIBS) $(YAJL_LIBS) $(LIBNL_LIBS) \ $(THREAD_LIBS) $(AUDIT_LIBS) $(DEVMAPPER_LIBS) \ $(LIB_CLOCK_GETTIME) $(DBUS_LIBS) $(MSCOM_LIBS) $(LIBXML_LIBS) \ - $(SECDRIVER_LIBS) $(NUMACTL_LIBS) \ + $(SECDRIVER_LIBS) $(NUMACTL_LIBS) $(ACL_LIBS) \ $(POLKIT_LIBS)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index ae563eb..5f560bf 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1559,6 +1559,7 @@ virFileActivateDirOverride; virFileBindMountDevice; virFileBuildPath; virFileClose; +virFileCopyACLs; virFileDeleteTree; virFileDirectFdFlag; virFileExists; @@ -1568,6 +1569,8 @@ virFileFindHugeTLBFS; virFileFindMountPoint; virFileFindResource; virFileFindResourceFull; +virFileFreeACLs; +virFileGetACLs; virFileGetHugepageSize; virFileGetMountReverseSubtree; virFileGetMountSubtree; @@ -1602,6 +1605,7 @@ virFileResolveAllLinks; virFileResolveLink; virFileRewrite; virFileSanitizePath; +virFileSetACLs; virFileSetupDev; virFileSkipRoot; virFileStripSuffix; diff --git a/src/util/virfile.c b/src/util/virfile.c index 6200004..c777025 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -48,6 +48,9 @@ #if HAVE_SYS_SYSCALL_H # include <sys/syscall.h> #endif +#if HAVE_SYS_ACL_H +# include <sys/acl.h> +#endif
#ifdef __linux__ # if HAVE_LINUX_MAGIC_H @@ -3573,3 +3576,81 @@ virFileBindMountDevice(const char *src ATTRIBUTE_UNUSED, return -1; } #endif /* !defined(HAVE_SYS_MOUNT_H) */ + + +#if defined(HAVE_SYS_ACL_H) && !defined(LIBVIRT_NSS) && !defined(LIBVIRT_SETUID_RPC_CLIENT)
You can simplify this by just adding #undef HAVE_SYS_ACL_H to config-post.h for both the NSS + SETUID blocks
Ah, will do.
+int +virFileGetACLs(const char *file, + void **acl) +{ + if (!(*acl = acl_get_file(file, ACL_TYPE_ACCESS))) + return -1; + + return 0; +} + + +int +virFileSetACLs(const char *file, + void *acl) +{ + if (acl_set_file(file, ACL_TYPE_ACCESS, acl) < 0) + return -1; + + return 0; +}
Should we consider doing virReportSystemError in these methods rather than letting the caller report the error, or is there a need for the caller to ignore errors in some cases ?
I'd rather not. If a filesystem is mounted with ACLs disabled, we can avoid spurious errors in the logs ("Hey, I couldn't copy ACLs on /dev/blaah") even if the error is properly ignored later in the code. And then if GetACLs() doesn't report the error, no other *ACL() function should.
I was thinking that if you get ENOTSUPP with GetACLs() we'd just return success, and a NULL acl. The SetACLs method would obviously have to treat a NULL acl as a no-op. IOW, it could "just work" even when ACLs are disabled.
Then if a caller wants to error out on ENOTSUPP they would have to check for both retval and errno. Moreover, since HAVE_SYS_ACL_H variant of the function reports errors, the !HAVE_SYS_ACL_H stub has to do so as well - in which case users would see an error in the logs every time they start a machine if their libvirt is build without ACLs enabled. Which reminds me, not in this patch but I need to update the spec file so that we have BuildRequires: libacl-devel in it. Michal

We will need this function in near future so that we know what /dev device corresponds to the USB device. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virusb.c | 5 +++++ src/util/virusb.h | 1 + 3 files changed, 7 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 5f560bf..8a172c3 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2602,6 +2602,7 @@ virUSBDeviceFree; virUSBDeviceGetBus; virUSBDeviceGetDevno; virUSBDeviceGetName; +virUSBDeviceGetPath; virUSBDeviceGetUsedBy; virUSBDeviceListAdd; virUSBDeviceListCount; diff --git a/src/util/virusb.c b/src/util/virusb.c index 6a001a7..8cd2f57 100644 --- a/src/util/virusb.c +++ b/src/util/virusb.c @@ -406,6 +406,11 @@ const char *virUSBDeviceGetName(virUSBDevicePtr dev) return dev->name; } +const char *virUSBDeviceGetPath(virUSBDevicePtr dev) +{ + return dev->path; +} + unsigned int virUSBDeviceGetBus(virUSBDevicePtr dev) { return dev->bus; diff --git a/src/util/virusb.h b/src/util/virusb.h index f98ea21..716e8c6 100644 --- a/src/util/virusb.h +++ b/src/util/virusb.h @@ -67,6 +67,7 @@ void virUSBDeviceGetUsedBy(virUSBDevicePtr dev, const char **drv_name, const char **dom_name); const char *virUSBDeviceGetName(virUSBDevicePtr dev); +const char *virUSBDeviceGetPath(virUSBDevicePtr usb); unsigned int virUSBDeviceGetBus(virUSBDevicePtr dev); unsigned int virUSBDeviceGetDevno(virUSBDevicePtr dev); -- 2.8.4

On Thu, Nov 24, 2016 at 03:47:55PM +0100, Michal Privoznik wrote:
We will need this function in near future so that we know what /dev device corresponds to the USB device.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virusb.c | 5 +++++ src/util/virusb.h | 1 + 3 files changed, 7 insertions(+)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

We will need this function in near future so that we know what /dev device corresponds to the SCSI device. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virscsi.c | 6 ++++++ src/util/virscsi.h | 1 + 3 files changed, 8 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 8a172c3..a5eb852 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2296,6 +2296,7 @@ virSCSIDeviceGetAdapter; virSCSIDeviceGetBus; virSCSIDeviceGetDevName; virSCSIDeviceGetName; +virSCSIDeviceGetPath; virSCSIDeviceGetReadonly; virSCSIDeviceGetSgName; virSCSIDeviceGetShareable; diff --git a/src/util/virscsi.c b/src/util/virscsi.c index 4843367..4fd8838 100644 --- a/src/util/virscsi.c +++ b/src/util/virscsi.c @@ -315,6 +315,12 @@ virSCSIDeviceGetName(virSCSIDevicePtr dev) return dev->name; } +const char * +virSCSIDeviceGetPath(virSCSIDevicePtr dev) +{ + return dev->sg_path; +} + unsigned int virSCSIDeviceGetAdapter(virSCSIDevicePtr dev) { diff --git a/src/util/virscsi.h b/src/util/virscsi.h index df40d7f..7d88d4e 100644 --- a/src/util/virscsi.h +++ b/src/util/virscsi.h @@ -58,6 +58,7 @@ int virSCSIDeviceSetUsedBy(virSCSIDevicePtr dev, const char *domname); bool virSCSIDeviceIsAvailable(virSCSIDevicePtr dev); const char *virSCSIDeviceGetName(virSCSIDevicePtr dev); +const char *virSCSIDeviceGetPath(virSCSIDevicePtr dev); unsigned int virSCSIDeviceGetAdapter(virSCSIDevicePtr dev); unsigned int virSCSIDeviceGetBus(virSCSIDevicePtr dev); unsigned int virSCSIDeviceGetTarget(virSCSIDevicePtr dev); -- 2.8.4

On Thu, Nov 24, 2016 at 03:47:56PM +0100, Michal Privoznik wrote:
We will need this function in near future so that we know what /dev device corresponds to the SCSI device.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virscsi.c | 6 ++++++ src/util/virscsi.h | 1 + 3 files changed, 8 insertions(+)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

This is a list of devices that qemu needs for its run (apart from what's configured for domain). The devices on the list are enabled in the CGroups by default so they will be good candidates for initial /dev for new qemu. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_cgroup.c | 2 +- src/qemu/qemu_cgroup.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 211d0b5..68c019e 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -42,7 +42,7 @@ VIR_LOG_INIT("qemu.qemu_cgroup"); -static const char *const defaultDeviceACL[] = { +const char *const defaultDeviceACL[] = { "/dev/null", "/dev/full", "/dev/zero", "/dev/random", "/dev/urandom", "/dev/ptmx", "/dev/kvm", "/dev/kqemu", diff --git a/src/qemu/qemu_cgroup.h b/src/qemu/qemu_cgroup.h index 6e2c742..8ae4a72 100644 --- a/src/qemu/qemu_cgroup.h +++ b/src/qemu/qemu_cgroup.h @@ -76,4 +76,5 @@ int qemuCgroupEmulatorAllNodesAllow(virCgroupPtr cgroup, qemuCgroupEmulatorAllNodesDataPtr *data); void qemuCgroupEmulatorAllNodesRestore(qemuCgroupEmulatorAllNodesDataPtr data); +extern const char *const defaultDeviceACL[]; #endif /* __QEMU_CGROUP_H__ */ -- 2.8.4

On Thu, Nov 24, 2016 at 03:47:57PM +0100, Michal Privoznik wrote:
This is a list of devices that qemu needs for its run (apart from what's configured for domain). The devices on the list are enabled in the CGroups by default so they will be good candidates for initial /dev for new qemu.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_cgroup.c | 2 +- src/qemu/qemu_cgroup.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

Prime time. When it comes to spawning qemu process and relabelling all the devices it's going to touch, there's inherent race with other applications in the system (e.g. udev). Instead of trying convincing udev to not touch libvirt managed devices, we can create a separate mount namespace for the qemu, and mount our own /dev there. Of course this puts more work onto us as we have to maintain /dev files on each domain start and device hot(un-)plug. On the other hand, this enhances security also.
From technical POV, on domain startup process the parent (libvirtd) creates:
/var/lib/libvirt/qemu/$domain.dev /var/lib/libvirt/qemu/$domain.devpts The child (which is going to be qemu eventually) calls unshare() to create new mount namespace. From now on anything that child does is invisible to the parent. Child then mounts tmpfs on $domain.dev (so that it still sees original /dev from the host) and creates some devices (as explained in one of the previous patches). The devices have to be created exactly as they are in the host (including perms, seclabels, ACLs, ...). After that it moves $domain.dev mount to /dev. What's the $domain.devpts mount there for then you ask? QEMU can create PTYs for some chardevs. And historically we exposed the host ends in our domain XML allowing users to connect to them. Therefore we must preserve devpts mount to be shared with the host's one. To make this patch as small as possible, creating of devices configured for domain in question is implemented in next patches. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 289 ++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 10 ++ src/qemu/qemu_process.c | 13 +++ 3 files changed, 312 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 137d4d5..d6a1c29 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -55,6 +55,9 @@ #include <sys/time.h> #include <fcntl.h> +#if defined(HAVE_SYS_MOUNT_H) +# include <sys/mount.h> +#endif #include <libxml/xpathInternals.h> @@ -1627,6 +1630,9 @@ qemuDomainObjPrivateXMLFormat(virBufferPtr buf, virDomainChrTypeToString(priv->monConfig->type)); } + if (priv->containerized) + virBufferAddLit(buf, "<containerized/>\n"); + qemuDomainObjPrivateXMLFormatVcpus(buf, vm->def); if (priv->qemuCaps) { @@ -1809,6 +1815,8 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, goto error; } + priv->containerized = virXPathBoolean("count(./containerized) > 0", ctxt) > 0; + if ((n = virXPathNodeSet("./vcpus/vcpu", ctxt, &nodes)) < 0) goto error; @@ -6653,3 +6661,284 @@ qemuDomainSupportsVideoVga(virDomainVideoDefPtr video, return true; } + + +static int +qemuDomainCreateDevice(const char *device, + const char *path, + bool allow_noent) +{ + char *devicePath = NULL; + struct stat sb; + int ret = -1; + + if (!STRPREFIX(device, "/dev")) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("invalid device: %s"), + device); + goto cleanup; + } + + if (virAsprintf(&devicePath, "%s/%s", + path, device + 4) < 0) + goto cleanup; + + if (stat(device, &sb) < 0) { + if (errno == ENOENT && allow_noent) { + /* Ignore non-existent device. */ + ret = 0; + goto cleanup; + } + + virReportSystemError(errno, _("Unable to stat %s"), device); + goto cleanup; + } + + if (virFileMakeParentPath(devicePath) < 0) { + virReportSystemError(errno, + _("Unable to create %s"), + devicePath); + goto cleanup; + } + + if (mknod(devicePath, sb.st_mode, sb.st_rdev) < 0) { + virReportSystemError(errno, + _("Failed to make device %s"), + devicePath); + goto cleanup; + } + + if (chown(devicePath, sb.st_uid, sb.st_gid) < 0) { + virReportSystemError(errno, + _("Failed to chown device %s"), + devicePath); + goto cleanup; + } + + if (virFileCopyACLs(device, devicePath) < 0 && + errno != ENOTSUP) { + virReportSystemError(errno, + _("Failed to copy ACLs on device %s"), + devicePath); + goto cleanup; + } + + ret = 0; + cleanup: + VIR_FREE(devicePath); + return ret; +} + + + +static int +qemuDomainPopulateDevices(virQEMUDriverPtr driver, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + const char *path) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + const char *const *devices = (const char *const *) cfg->cgroupDeviceACL; + size_t i; + int ret = -1; + + if (!devices) + devices = defaultDeviceACL; + + for (i = 0; devices[i]; i++) { + if (qemuDomainCreateDevice(devices[i], path, true) < 0) + goto cleanup; + } + + ret = 0; + cleanup: + virObjectUnref(cfg); + return ret; +} + + +static int +qemuDomainSetupDev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *path) +{ + char *mount_options = NULL; + char *opts = NULL; + int ret = -1; + + VIR_DEBUG("Setting up /dev/ for domain %s", vm->def->name); + + mount_options = virSecurityManagerGetMountOptions(driver->securityManager, + vm->def); + + if (!mount_options && + VIR_STRDUP(mount_options, "") < 0) + goto cleanup; + + /* + * tmpfs is limited to 64kb, since we only have device nodes in there + * and don't want to DOS the entire OS RAM usage + */ + if (virAsprintf(&opts, + "mode=755,size=65536%s", mount_options) < 0) + goto cleanup; + + if (virFileSetupDev(path, opts) < 0) + goto cleanup; + + if (qemuDomainPopulateDevices(driver, vm, path) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(opts); + VIR_FREE(mount_options); + return ret; +} + + +int +qemuDomainBuildNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + qemuDomainObjPrivatePtr priv = vm->privateData; + const unsigned long mount_flags = MS_MOVE; + char *devPath = NULL; + char *devptsPath = NULL; + int ret = -1; + + if (!priv->containerized) { + ret = 0; + goto cleanup; + } + + if (virAsprintf(&devPath, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0 || + virAsprintf(&devptsPath, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (qemuDomainSetupDev(driver, vm, devPath) < 0) + goto cleanup; + + /* Save /dev/pts mount point because /dev/pts/NNN is exposed in our live + * XML and other applications are supposed to be able to use it. */ + if (mount("/dev/pts", devptsPath, NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, "%s", + _("Unable to move /dev/pts mount")); + goto cleanup; + } + + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev"), + devPath); + goto cleanup; + } + + if (virFileMakePath("/dev/pts") < 0) { + virReportSystemError(errno, "%s", + _("Cannot create /dev/pts")); + goto cleanup; + } + + if (mount(devptsPath, "/dev/pts", NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev/pts"), + devptsPath); + goto cleanup; + } + + if (virFileBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0) + goto cleanup; + + VIR_DEBUG("blaaah: %d", system("find /dev/ -ls > /tmp/blaaah")); + VIR_DEBUG("blaaah: %d", system("echo >> /tmp/blaaah")); + VIR_DEBUG("blaaah: %d", system("mount >> /tmp/blaaah")); + ret = 0; + cleanup: + virObjectUnref(cfg); + VIR_FREE(devPath); + return ret; +} + + +int +qemuDomainCreateNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret = -1; + char *path = NULL; + +#if !defined(__linux__) + /* Namespaces are Linux specific. On other platforms just + * carry on with the old behaviour. */ + return 0; +#endif + + if (!virQEMUDriverIsPrivileged(driver)) { + ret = 0; + goto cleanup; + } + + if (virAsprintf(&path, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (virFileMakePath(path) < 0) { + virReportSystemError(errno, + _("Failed to create %s"), + path); + goto cleanup; + } + + VIR_FREE(path); + if (virAsprintf(&path, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (virFileMakePath(path) < 0) { + virReportSystemError(errno, + _("Failed to create %s"), + path); + goto cleanup; + } + + priv->containerized = true; + ret = 0; + cleanup: + VIR_FREE(path); + virObjectUnref(cfg); + return ret; +} + + +void +qemuDomainDeleteNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + qemuDomainObjPrivatePtr priv = vm->privateData; + char *path; + + if (!priv->containerized) + return; + + if (virAsprintf(&path, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + virFileDeleteTree(path); + + VIR_FREE(path); + if (virAsprintf(&path, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + virFileDeleteTree(path); + cleanup: + virObjectUnref(cfg); + VIR_FREE(path); +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 7650ff3..9c34476 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -170,6 +170,8 @@ typedef qemuDomainObjPrivate *qemuDomainObjPrivatePtr; struct _qemuDomainObjPrivate { struct qemuDomainJobObj job; + bool containerized; + qemuMonitorPtr mon; virDomainChrSourceDefPtr monConfig; bool monJSON; @@ -785,4 +787,12 @@ int qemuDomainCheckMonitor(virQEMUDriverPtr driver, bool qemuDomainSupportsVideoVga(virDomainVideoDefPtr video, virQEMUCapsPtr qemuCaps); +int qemuDomainBuildNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); + +int qemuDomainCreateNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); + +void qemuDomainDeleteNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 93a38e1..4f64dd1 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2661,6 +2661,12 @@ static int qemuProcessHook(void *data) if (virSecurityManagerClearSocketLabel(h->driver->securityManager, h->vm->def) < 0) goto cleanup; + if (virProcessSetupPrivateMountNS() < 0) + goto cleanup; + + if (qemuDomainBuildNamespace(h->driver, h->vm) < 0) + goto cleanup; + if (virDomainNumatuneGetMode(h->vm->def->numa, -1, &mode) == 0) { if (mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT && h->cfg->cgroupControllers & (1 << VIR_CGROUP_CONTROLLER_CPUSET) && @@ -5434,6 +5440,11 @@ qemuProcessLaunch(virConnectPtr conn, qemuDomainLogContextMarkPosition(logCtxt); + VIR_DEBUG("Building mount namespace"); + + if (qemuDomainCreateNamespace(driver, vm) < 0) + goto cleanup; + VIR_DEBUG("Clear emulator capabilities: %d", cfg->clearEmulatorCapabilities); if (cfg->clearEmulatorCapabilities) @@ -6210,6 +6221,8 @@ void qemuProcessStop(virQEMUDriverPtr driver, } } + qemuDomainDeleteNamespace(driver, vm); + vm->taint = 0; vm->pid = -1; virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason); -- 2.8.4

On Thu, Nov 24, 2016 at 03:47:58PM +0100, Michal Privoznik wrote:
Prime time. When it comes to spawning qemu process and relabelling all the devices it's going to touch, there's inherent race with other applications in the system (e.g. udev). Instead of trying convincing udev to not touch libvirt managed devices, we can create a separate mount namespace for the qemu, and mount our own /dev there. Of course this puts more work onto us as we have to maintain /dev files on each domain start and device hot(un-)plug. On the other hand, this enhances security also.
From technical POV, on domain startup process the parent (libvirtd) creates:
/var/lib/libvirt/qemu/$domain.dev /var/lib/libvirt/qemu/$domain.devpts
The child (which is going to be qemu eventually) calls unshare() to create new mount namespace. From now on anything that child does is invisible to the parent. Child then mounts tmpfs on $domain.dev (so that it still sees original /dev from the host) and creates some devices (as explained in one of the previous patches). The devices have to be created exactly as they are in the host (including perms, seclabels, ACLs, ...). After that it moves $domain.dev mount to /dev.
What's the $domain.devpts mount there for then you ask? QEMU can create PTYs for some chardevs. And historically we exposed the host ends in our domain XML allowing users to connect to them. Therefore we must preserve devpts mount to be shared with the host's one.
To make this patch as small as possible, creating of devices configured for domain in question is implemented in next patches.
IIUC, this means that QEMU startup will be broken for any guests that need host devices for disk, usb/pci passthrough, etc ? It is nice to keep the work incremental though. I wonder if there is any value in doing everything *except* the MS_MOVE of /var/lib/libvirt/qemu/$domain.dev -> /dev/. ie only turn it on once all the code is in place ?
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 289 ++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 10 ++ src/qemu/qemu_process.c | 13 +++ 3 files changed, 312 insertions(+)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 137d4d5..d6a1c29 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -55,6 +55,9 @@
#include <sys/time.h> #include <fcntl.h> +#if defined(HAVE_SYS_MOUNT_H) +# include <sys/mount.h> +#endif
#include <libxml/xpathInternals.h>
@@ -1627,6 +1630,9 @@ qemuDomainObjPrivateXMLFormat(virBufferPtr buf, virDomainChrTypeToString(priv->monConfig->type)); }
+ if (priv->containerized) + virBufferAddLit(buf, "<containerized/>\n");
I wonder if it is worth explicitly listing the namespaces we enabled ? eg <namespaces> <mount/> </namespaces> so we're prepared to deal with us adding use of more private namespaces in future ?
+ qemuDomainObjPrivateXMLFormatVcpus(buf, vm->def);
if (priv->qemuCaps) { @@ -1809,6 +1815,8 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, goto error; }
+ priv->containerized = virXPathBoolean("count(./containerized) > 0", ctxt) > 0; + if ((n = virXPathNodeSet("./vcpus/vcpu", ctxt, &nodes)) < 0) goto error;
@@ -6653,3 +6661,284 @@ qemuDomainSupportsVideoVga(virDomainVideoDefPtr video,
return true; } + + +static int +qemuDomainCreateDevice(const char *device, + const char *path, + bool allow_noent) +{ + char *devicePath = NULL; + struct stat sb; + int ret = -1; + + if (!STRPREFIX(device, "/dev")) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("invalid device: %s"), + device); + goto cleanup; + } + + if (virAsprintf(&devicePath, "%s/%s", + path, device + 4) < 0) + goto cleanup; + + if (stat(device, &sb) < 0) { + if (errno == ENOENT && allow_noent) { + /* Ignore non-existent device. */ + ret = 0; + goto cleanup; + } + + virReportSystemError(errno, _("Unable to stat %s"), device); + goto cleanup; + } + + if (virFileMakeParentPath(devicePath) < 0) { + virReportSystemError(errno, + _("Unable to create %s"), + devicePath); + goto cleanup; + } + + if (mknod(devicePath, sb.st_mode, sb.st_rdev) < 0) { + virReportSystemError(errno, + _("Failed to make device %s"), + devicePath); + goto cleanup; + } + + if (chown(devicePath, sb.st_uid, sb.st_gid) < 0) { + virReportSystemError(errno, + _("Failed to chown device %s"), + devicePath); + goto cleanup; + } + + if (virFileCopyACLs(device, devicePath) < 0 && + errno != ENOTSUP) { + virReportSystemError(errno, + _("Failed to copy ACLs on device %s"), + devicePath); + goto cleanup; + }
Per my comments on earlier patches, I don't think we need to have the chown or ACL copy. Just have the dev owned by root:root and let our security drivers deal with the rest.
+ + ret = 0; + cleanup: + VIR_FREE(devicePath); + return ret; +} + + + +static int +qemuDomainPopulateDevices(virQEMUDriverPtr driver, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + const char *path) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + const char *const *devices = (const char *const *) cfg->cgroupDeviceACL; + size_t i; + int ret = -1; + + if (!devices) + devices = defaultDeviceACL; + + for (i = 0; devices[i]; i++) { + if (qemuDomainCreateDevice(devices[i], path, true) < 0) + goto cleanup; + } + + ret = 0; + cleanup: + virObjectUnref(cfg); + return ret; +} + + +static int +qemuDomainSetupDev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *path) +{ + char *mount_options = NULL; + char *opts = NULL; + int ret = -1; + + VIR_DEBUG("Setting up /dev/ for domain %s", vm->def->name); + + mount_options = virSecurityManagerGetMountOptions(driver->securityManager, + vm->def); + + if (!mount_options && + VIR_STRDUP(mount_options, "") < 0) + goto cleanup; + + /* + * tmpfs is limited to 64kb, since we only have device nodes in there + * and don't want to DOS the entire OS RAM usage + */ + if (virAsprintf(&opts, + "mode=755,size=65536%s", mount_options) < 0) + goto cleanup; + + if (virFileSetupDev(path, opts) < 0) + goto cleanup; + + if (qemuDomainPopulateDevices(driver, vm, path) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(opts); + VIR_FREE(mount_options); + return ret; +} + + +int +qemuDomainBuildNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + qemuDomainObjPrivatePtr priv = vm->privateData; + const unsigned long mount_flags = MS_MOVE; + char *devPath = NULL; + char *devptsPath = NULL; + int ret = -1; + + if (!priv->containerized) { + ret = 0; + goto cleanup; + } + + if (virAsprintf(&devPath, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0 || + virAsprintf(&devptsPath, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (qemuDomainSetupDev(driver, vm, devPath) < 0) + goto cleanup; + + /* Save /dev/pts mount point because /dev/pts/NNN is exposed in our live + * XML and other applications are supposed to be able to use it. */ + if (mount("/dev/pts", devptsPath, NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, "%s", + _("Unable to move /dev/pts mount")); + goto cleanup; + } + + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev"), + devPath); + goto cleanup; + } + + if (virFileMakePath("/dev/pts") < 0) { + virReportSystemError(errno, "%s", + _("Cannot create /dev/pts")); + goto cleanup; + } + + if (mount(devptsPath, "/dev/pts", NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev/pts"), + devptsPath); + goto cleanup; + } + + if (virFileBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0) + goto cleanup; + + VIR_DEBUG("blaaah: %d", system("find /dev/ -ls > /tmp/blaaah")); + VIR_DEBUG("blaaah: %d", system("echo >> /tmp/blaaah")); + VIR_DEBUG("blaaah: %d", system("mount >> /tmp/blaaah"));
Heh, :-)
+ ret = 0; + cleanup: + virObjectUnref(cfg); + VIR_FREE(devPath); + return ret; +} + + +int +qemuDomainCreateNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret = -1; + char *path = NULL; + +#if !defined(__linux__) + /* Namespaces are Linux specific. On other platforms just + * carry on with the old behaviour. */ + return 0; +#endif
This feels quite likely to create compiler warnings on non-linux about unreachable code. It feels like we should just stub out the entire method instead.
+ + if (!virQEMUDriverIsPrivileged(driver)) { + ret = 0; + goto cleanup; + } + + if (virAsprintf(&path, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (virFileMakePath(path) < 0) { + virReportSystemError(errno, + _("Failed to create %s"), + path); + goto cleanup; + } + + VIR_FREE(path); + if (virAsprintf(&path, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (virFileMakePath(path) < 0) { + virReportSystemError(errno, + _("Failed to create %s"), + path); + goto cleanup; + } + + priv->containerized = true; + ret = 0; + cleanup: + VIR_FREE(path); + virObjectUnref(cfg); + return ret; +} + + +void +qemuDomainDeleteNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + qemuDomainObjPrivatePtr priv = vm->privateData; + char *path; + + if (!priv->containerized) + return; + + if (virAsprintf(&path, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + virFileDeleteTree(path); + + VIR_FREE(path); + if (virAsprintf(&path, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + virFileDeleteTree(path); + cleanup: + virObjectUnref(cfg); + VIR_FREE(path); +}
This is running in the host namespace and custom /dev tmpfs was only ever mounted in the QEMU private namespace, so should be invisible to the host. As such, IIUC, these directories should both be 100% empty. IOW, it seems that we don't need virFileDeleteTree - plain rmdir should be sufficient and safer against accidents.
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 7650ff3..9c34476 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -170,6 +170,8 @@ typedef qemuDomainObjPrivate *qemuDomainObjPrivatePtr; struct _qemuDomainObjPrivate { struct qemuDomainJobObj job;
+ bool containerized; + qemuMonitorPtr mon; virDomainChrSourceDefPtr monConfig; bool monJSON; @@ -785,4 +787,12 @@ int qemuDomainCheckMonitor(virQEMUDriverPtr driver, bool qemuDomainSupportsVideoVga(virDomainVideoDefPtr video, virQEMUCapsPtr qemuCaps);
+int qemuDomainBuildNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); + +int qemuDomainCreateNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); + +void qemuDomainDeleteNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 93a38e1..4f64dd1 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2661,6 +2661,12 @@ static int qemuProcessHook(void *data) if (virSecurityManagerClearSocketLabel(h->driver->securityManager, h->vm->def) < 0) goto cleanup;
+ if (virProcessSetupPrivateMountNS() < 0) + goto cleanup; + + if (qemuDomainBuildNamespace(h->driver, h->vm) < 0) + goto cleanup; + if (virDomainNumatuneGetMode(h->vm->def->numa, -1, &mode) == 0) { if (mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT && h->cfg->cgroupControllers & (1 << VIR_CGROUP_CONTROLLER_CPUSET) && @@ -5434,6 +5440,11 @@ qemuProcessLaunch(virConnectPtr conn,
qemuDomainLogContextMarkPosition(logCtxt);
+ VIR_DEBUG("Building mount namespace"); + + if (qemuDomainCreateNamespace(driver, vm) < 0) + goto cleanup; + VIR_DEBUG("Clear emulator capabilities: %d", cfg->clearEmulatorCapabilities); if (cfg->clearEmulatorCapabilities) @@ -6210,6 +6221,8 @@ void qemuProcessStop(virQEMUDriverPtr driver, } }
+ qemuDomainDeleteNamespace(driver, vm); + vm->taint = 0; vm->pid = -1; virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason);
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 05.12.2016 14:26, Daniel P. Berrange wrote:
On Thu, Nov 24, 2016 at 03:47:58PM +0100, Michal Privoznik wrote:
Prime time. When it comes to spawning qemu process and relabelling all the devices it's going to touch, there's inherent race with other applications in the system (e.g. udev). Instead of trying convincing udev to not touch libvirt managed devices, we can create a separate mount namespace for the qemu, and mount our own /dev there. Of course this puts more work onto us as we have to maintain /dev files on each domain start and device hot(un-)plug. On the other hand, this enhances security also.
From technical POV, on domain startup process the parent (libvirtd) creates:
/var/lib/libvirt/qemu/$domain.dev /var/lib/libvirt/qemu/$domain.devpts
The child (which is going to be qemu eventually) calls unshare() to create new mount namespace. From now on anything that child does is invisible to the parent. Child then mounts tmpfs on $domain.dev (so that it still sees original /dev from the host) and creates some devices (as explained in one of the previous patches). The devices have to be created exactly as they are in the host (including perms, seclabels, ACLs, ...). After that it moves $domain.dev mount to /dev.
What's the $domain.devpts mount there for then you ask? QEMU can create PTYs for some chardevs. And historically we exposed the host ends in our domain XML allowing users to connect to them. Therefore we must preserve devpts mount to be shared with the host's one.
To make this patch as small as possible, creating of devices configured for domain in question is implemented in next patches.
IIUC, this means that QEMU startup will be broken for any guests that need host devices for disk, usb/pci passthrough, etc ?
Yep.
It is nice to keep the work incremental though. I wonder if there is any value in doing everything *except* the MS_MOVE of /var/lib/libvirt/qemu/$domain.dev -> /dev/. ie only turn it on once all the code is in place ?
Ah, okay.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 289 ++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 10 ++ src/qemu/qemu_process.c | 13 +++ 3 files changed, 312 insertions(+)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 137d4d5..d6a1c29 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -55,6 +55,9 @@
#include <sys/time.h> #include <fcntl.h> +#if defined(HAVE_SYS_MOUNT_H) +# include <sys/mount.h> +#endif
#include <libxml/xpathInternals.h>
@@ -1627,6 +1630,9 @@ qemuDomainObjPrivateXMLFormat(virBufferPtr buf, virDomainChrTypeToString(priv->monConfig->type)); }
+ if (priv->containerized) + virBufferAddLit(buf, "<containerized/>\n");
I wonder if it is worth explicitly listing the namespaces we enabled ? eg
<namespaces> <mount/> </namespaces>
so we're prepared to deal with us adding use of more private namespaces in future ?
Right. I was thinking about this but was unable to come with a reason for other namespaces. But this doesn't hurt and looks like to be more future-proof, so why not, right?
+ qemuDomainObjPrivateXMLFormatVcpus(buf, vm->def);
if (priv->qemuCaps) { @@ -1809,6 +1815,8 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, goto error; }
+ priv->containerized = virXPathBoolean("count(./containerized) > 0", ctxt) > 0; + if ((n = virXPathNodeSet("./vcpus/vcpu", ctxt, &nodes)) < 0) goto error;
@@ -6653,3 +6661,284 @@ qemuDomainSupportsVideoVga(virDomainVideoDefPtr video,
return true; } + + +static int +qemuDomainCreateDevice(const char *device, + const char *path, + bool allow_noent) +{ + char *devicePath = NULL; + struct stat sb; + int ret = -1; + + if (!STRPREFIX(device, "/dev")) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("invalid device: %s"), + device); + goto cleanup; + } + + if (virAsprintf(&devicePath, "%s/%s", + path, device + 4) < 0) + goto cleanup; + + if (stat(device, &sb) < 0) { + if (errno == ENOENT && allow_noent) { + /* Ignore non-existent device. */ + ret = 0; + goto cleanup; + } + + virReportSystemError(errno, _("Unable to stat %s"), device); + goto cleanup; + } + + if (virFileMakeParentPath(devicePath) < 0) { + virReportSystemError(errno, + _("Unable to create %s"), + devicePath); + goto cleanup; + } + + if (mknod(devicePath, sb.st_mode, sb.st_rdev) < 0) { + virReportSystemError(errno, + _("Failed to make device %s"), + devicePath); + goto cleanup; + } + + if (chown(devicePath, sb.st_uid, sb.st_gid) < 0) { + virReportSystemError(errno, + _("Failed to chown device %s"), + devicePath); + goto cleanup; + } + + if (virFileCopyACLs(device, devicePath) < 0 && + errno != ENOTSUP) { + virReportSystemError(errno, + _("Failed to copy ACLs on device %s"), + devicePath); + goto cleanup; + }
Per my comments on earlier patches, I don't think we need to have the chown or ACL copy. Just have the dev owned by root:root and let our security drivers deal with the rest.
Meanwhile, we discussed this in other thread and found out we really ned copy as-is.
+ + ret = 0; + cleanup: + VIR_FREE(devicePath); + return ret; +} + + + +static int +qemuDomainPopulateDevices(virQEMUDriverPtr driver, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + const char *path) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + const char *const *devices = (const char *const *) cfg->cgroupDeviceACL; + size_t i; + int ret = -1; + + if (!devices) + devices = defaultDeviceACL; + + for (i = 0; devices[i]; i++) { + if (qemuDomainCreateDevice(devices[i], path, true) < 0) + goto cleanup; + } + + ret = 0; + cleanup: + virObjectUnref(cfg); + return ret; +} + + +static int +qemuDomainSetupDev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *path) +{ + char *mount_options = NULL; + char *opts = NULL; + int ret = -1; + + VIR_DEBUG("Setting up /dev/ for domain %s", vm->def->name); + + mount_options = virSecurityManagerGetMountOptions(driver->securityManager, + vm->def); + + if (!mount_options && + VIR_STRDUP(mount_options, "") < 0) + goto cleanup; + + /* + * tmpfs is limited to 64kb, since we only have device nodes in there + * and don't want to DOS the entire OS RAM usage + */ + if (virAsprintf(&opts, + "mode=755,size=65536%s", mount_options) < 0) + goto cleanup; + + if (virFileSetupDev(path, opts) < 0) + goto cleanup; + + if (qemuDomainPopulateDevices(driver, vm, path) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(opts); + VIR_FREE(mount_options); + return ret; +} + + +int +qemuDomainBuildNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + qemuDomainObjPrivatePtr priv = vm->privateData; + const unsigned long mount_flags = MS_MOVE; + char *devPath = NULL; + char *devptsPath = NULL; + int ret = -1; + + if (!priv->containerized) { + ret = 0; + goto cleanup; + } + + if (virAsprintf(&devPath, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0 || + virAsprintf(&devptsPath, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (qemuDomainSetupDev(driver, vm, devPath) < 0) + goto cleanup; + + /* Save /dev/pts mount point because /dev/pts/NNN is exposed in our live + * XML and other applications are supposed to be able to use it. */ + if (mount("/dev/pts", devptsPath, NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, "%s", + _("Unable to move /dev/pts mount")); + goto cleanup; + } + + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev"), + devPath); + goto cleanup; + } + + if (virFileMakePath("/dev/pts") < 0) { + virReportSystemError(errno, "%s", + _("Cannot create /dev/pts")); + goto cleanup; + } + + if (mount(devptsPath, "/dev/pts", NULL, mount_flags, NULL) < 0) { + virReportSystemError(errno, + _("Failed to mount %s on /dev/pts"), + devptsPath); + goto cleanup; + } + + if (virFileBindMountDevice("/dev/pts/ptmx", "/dev/ptmx") < 0) + goto cleanup; + + VIR_DEBUG("blaaah: %d", system("find /dev/ -ls > /tmp/blaaah")); + VIR_DEBUG("blaaah: %d", system("echo >> /tmp/blaaah")); + VIR_DEBUG("blaaah: %d", system("mount >> /tmp/blaaah"));
Heh, :-)
D'oh! I remembered I needed to undo some debug lines somewhere...
+ ret = 0; + cleanup: + virObjectUnref(cfg); + VIR_FREE(devPath); + return ret; +} + + +int +qemuDomainCreateNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret = -1; + char *path = NULL; + +#if !defined(__linux__) + /* Namespaces are Linux specific. On other platforms just + * carry on with the old behaviour. */ + return 0; +#endif
This feels quite likely to create compiler warnings on non-linux about unreachable code. It feels like we should just stub out the entire method instead.
Ah, good point.
+ + if (!virQEMUDriverIsPrivileged(driver)) { + ret = 0; + goto cleanup; + } + + if (virAsprintf(&path, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (virFileMakePath(path) < 0) { + virReportSystemError(errno, + _("Failed to create %s"), + path); + goto cleanup; + } + + VIR_FREE(path); + if (virAsprintf(&path, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + if (virFileMakePath(path) < 0) { + virReportSystemError(errno, + _("Failed to create %s"), + path); + goto cleanup; + } + + priv->containerized = true; + ret = 0; + cleanup: + VIR_FREE(path); + virObjectUnref(cfg); + return ret; +} + + +void +qemuDomainDeleteNamespace(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + qemuDomainObjPrivatePtr priv = vm->privateData; + char *path; + + if (!priv->containerized) + return; + + if (virAsprintf(&path, "%s/%s.dev", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + virFileDeleteTree(path); + + VIR_FREE(path); + if (virAsprintf(&path, "%s/%s.devpts", + cfg->stateDir, vm->def->name) < 0) + goto cleanup; + + virFileDeleteTree(path); + cleanup: + virObjectUnref(cfg); + VIR_FREE(path); +}
This is running in the host namespace and custom /dev tmpfs was only ever mounted in the QEMU private namespace, so should be invisible to the host. As such, IIUC, these directories should both be 100% empty. IOW, it seems that we don't need virFileDeleteTree - plain rmdir should be sufficient and safer against accidents.
Again, very good point. I'll fix it in v2. Michal

When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index d6a1c29..2d1c02b 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6796,6 +6796,53 @@ qemuDomainSetupDev(virQEMUDriverPtr driver, } +static int +qemuDomainSetupDisk(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainDiskDefPtr disk, + const char *devPath) +{ + virStorageSourcePtr next; + char *dst = NULL; + int ret = -1; + + for (next = disk->src; next; next = next->backingStore) { + if (!next->path || !virStorageSourceIsLocalStorage(next) || + !STRPREFIX(next->path, "/dev")) { + /* Not creating device. Just continue. */ + continue; + } + + if (qemuDomainCreateDevice(next->path, devPath, false) < 0) + goto cleanup; + } + + ret = 0; + cleanup: + VIR_FREE(dst); + return ret; +} + + +static int +qemuDomainSetupAllDisks(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *devPath) +{ + size_t i; + VIR_DEBUG("Setting up disks"); + + for (i = 0; i < vm->def->ndisks; i++) { + if (qemuDomainSetupDisk(driver, + vm->def->disks[i], + devPath) < 0) + return -1; + } + + VIR_DEBUG("Setup all disks"); + return 0; +} + + int qemuDomainBuildNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm) @@ -6829,6 +6876,9 @@ qemuDomainBuildNamespace(virQEMUDriverPtr driver, goto cleanup; } + if (qemuDomainSetupAllDisks(driver, vm, devPath) < 0) + goto cleanup; + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { virReportSystemError(errno, _("Failed to mount %s on /dev"), -- 2.8.4

When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 146 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 146 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 2d1c02b..17fb3d0 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6843,6 +6843,149 @@ qemuDomainSetupAllDisks(virQEMUDriverPtr driver, } +static int +qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, + char **path) +{ + int ret = -1; + virDomainHostdevSubsysUSBPtr usbsrc = &dev->source.subsys.u.usb; + virDomainHostdevSubsysPCIPtr pcisrc = &dev->source.subsys.u.pci; + virDomainHostdevSubsysSCSIPtr scsisrc = &dev->source.subsys.u.scsi; + virPCIDevicePtr pci = NULL; + virUSBDevicePtr usb = NULL; + virSCSIDevicePtr scsi = NULL; + char *tmpPath = NULL; + bool freeTmpPath = false; + + *path = NULL; + + switch ((virDomainHostdevMode) dev->mode) { + case VIR_DOMAIN_HOSTDEV_MODE_SUBSYS: + switch ((virDomainHostdevSubsysType) dev->source.subsys.type) { + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: + if (pcisrc->backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) { + pci = virPCIDeviceNew(pcisrc->addr.domain, + pcisrc->addr.bus, + pcisrc->addr.slot, + pcisrc->addr.function); + if (!pci) + goto cleanup; + + if (!(tmpPath = virPCIDeviceGetIOMMUGroupDev(pci))) + goto cleanup; + freeTmpPath = true; + } + break; + + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_USB: + if (dev->missing) + break; + usb = virUSBDeviceNew(usbsrc->bus, + usbsrc->device, + NULL); + if (!usb) + goto cleanup; + + if (!(tmpPath = (char *) virUSBDeviceGetPath(usb))) + goto cleanup; + break; + + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_SCSI: + if (scsisrc->protocol == VIR_DOMAIN_HOSTDEV_SCSI_PROTOCOL_TYPE_ISCSI) { + virDomainHostdevSubsysSCSIiSCSIPtr iscsisrc = &scsisrc->u.iscsi; + /* Follow qemuSetupDiskCgroup() and qemuSetImageCgroupInternal() + * which does nothing for non local storage + */ + VIR_DEBUG("Not updating /dev for hostdev iSCSI path '%s'", iscsisrc->path); + } else { + virDomainHostdevSubsysSCSIHostPtr scsihostsrc = &scsisrc->u.host; + scsi = virSCSIDeviceNew(NULL, + scsihostsrc->adapter, + scsihostsrc->bus, + scsihostsrc->target, + scsihostsrc->unit, + dev->readonly, + dev->shareable); + + if (!scsi) + goto cleanup; + + if (!(tmpPath = (char *) virSCSIDeviceGetPath(scsi))) + goto cleanup; + } + break; + + case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_LAST: + break; + } + break; + + case VIR_DOMAIN_HOSTDEV_MODE_CAPABILITIES: + case VIR_DOMAIN_HOSTDEV_MODE_LAST: + /* nada */ + break; + } + + if (VIR_STRDUP(*path, tmpPath) < 0) + goto cleanup; + + ret = 0; + cleanup: + virPCIDeviceFree(pci); + virUSBDeviceFree(usb); + virSCSIDeviceFree(scsi); + if (freeTmpPath) + VIR_FREE(tmpPath); + return ret; +} + + +static int +qemuDomainSetupHostdev(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainHostdevDefPtr dev, + const char *devPath) +{ + int ret = -1; + char *path = NULL; + + if (qemuDomainGetHostdevPath(dev, &path) < 0) + goto cleanup; + + if (!path) { + /* There's no /dev device that we need to create. Claim success. */ + ret = 0; + goto cleanup; + } + + if (qemuDomainCreateDevice(path, devPath, false) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(path); + return ret; +} + + +static int +qemuDomainSetupAllHostdevs(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *devPath) +{ + size_t i; + + VIR_DEBUG("Setting up hostdevs"); + for (i = 0; i < vm->def->nhostdevs; i++) { + if (qemuDomainSetupHostdev(driver, + vm->def->hostdevs[i], + devPath) < 0) + return -1; + } + VIR_DEBUG("Setup all hostdevs"); + return 0; +} + + int qemuDomainBuildNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm) @@ -6879,6 +7022,9 @@ qemuDomainBuildNamespace(virQEMUDriverPtr driver, if (qemuDomainSetupAllDisks(driver, vm, devPath) < 0) goto cleanup; + if (qemuDomainSetupAllHostdevs(driver, vm, devPath) < 0) + goto cleanup; + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { virReportSystemError(errno, _("Failed to mount %s on /dev"), -- 2.8.4

When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 17fb3d0..9871f4d 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6986,6 +6986,38 @@ qemuDomainSetupAllHostdevs(virQEMUDriverPtr driver, } +static int +qemuDomainSetupChardev(virDomainDefPtr def ATTRIBUTE_UNUSED, + virDomainChrDefPtr dev, + void *opaque) +{ + const char *devPath = opaque; + + if (dev->source->type != VIR_DOMAIN_CHR_TYPE_DEV) + return 0; + + return qemuDomainCreateDevice(dev->source->data.file.path, devPath, false); +} + + +static int +qemuDomainSetupAllChardevs(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm, + const char *devPath) +{ + VIR_DEBUG("Setting up chardevs"); + + if (virDomainChrDefForeach(vm->def, + true, + qemuDomainSetupChardev, + (void *) devPath) < 0) + return -1; + + VIR_DEBUG("Setup all chardevs"); + return 0; +} + + int qemuDomainBuildNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm) @@ -7025,6 +7057,9 @@ qemuDomainBuildNamespace(virQEMUDriverPtr driver, if (qemuDomainSetupAllHostdevs(driver, vm, devPath) < 0) goto cleanup; + if (qemuDomainSetupAllChardevs(driver, vm, devPath) < 0) + goto cleanup; + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { virReportSystemError(errno, _("Failed to mount %s on /dev"), -- 2.8.4

When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 9871f4d..07fa900 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7018,6 +7018,35 @@ qemuDomainSetupAllChardevs(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, } +static int +qemuDomainSetupTPM(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm, + const char *devPath) +{ + virDomainTPMDefPtr dev = vm->def->tpm; + + if (!dev) + return 0; + + VIR_DEBUG("Setting up TPM"); + + switch (dev->type) { + case VIR_DOMAIN_TPM_TYPE_PASSTHROUGH: + if (qemuDomainCreateDevice(dev->data.passthrough.source.data.file.path, + devPath, false) < 0) + return -1; + break; + + case VIR_DOMAIN_TPM_TYPE_LAST: + /* nada */ + break; + } + + VIR_DEBUG("Setup TPM"); + return 0; +} + + int qemuDomainBuildNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm) @@ -7060,6 +7089,9 @@ qemuDomainBuildNamespace(virQEMUDriverPtr driver, if (qemuDomainSetupAllChardevs(driver, vm, devPath) < 0) goto cleanup; + if (qemuDomainSetupTPM(driver, vm, devPath) < 0) + goto cleanup; + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { virReportSystemError(errno, _("Failed to mount %s on /dev"), -- 2.8.4

When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 07fa900..4afa15e 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7047,6 +7047,52 @@ qemuDomainSetupTPM(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, } +static int +qemuDomainSetupInput(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainInputDefPtr input, + const char *devPath) +{ + int ret = -1; + + switch ((virDomainInputType) input->type) { + case VIR_DOMAIN_INPUT_TYPE_PASSTHROUGH: + if (qemuDomainCreateDevice(input->source.evdev, devPath, false) < 0) + goto cleanup; + break; + + case VIR_DOMAIN_INPUT_TYPE_MOUSE: + case VIR_DOMAIN_INPUT_TYPE_TABLET: + case VIR_DOMAIN_INPUT_TYPE_KBD: + case VIR_DOMAIN_INPUT_TYPE_LAST: + /* nada */ + break; + } + + ret = 0; + cleanup: + return ret; +} + + +static int +qemuDomainSetupAllInputs(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *devPath) +{ + size_t i; + + VIR_DEBUG("Setting up disks"); + for (i = 0; i < vm->def->ninputs; i++) { + if (qemuDomainSetupInput(driver, + vm->def->inputs[i], + devPath) < 0) + return -1; + } + VIR_DEBUG("Setup all disks"); + return 0; +} + + int qemuDomainBuildNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm) @@ -7092,6 +7138,9 @@ qemuDomainBuildNamespace(virQEMUDriverPtr driver, if (qemuDomainSetupTPM(driver, vm, devPath) < 0) goto cleanup; + if (qemuDomainSetupAllInputs(driver, vm, devPath) < 0) + goto cleanup; + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { virReportSystemError(errno, _("Failed to mount %s on /dev"), -- 2.8.4

When starting a domain and separate mount namespace is used, we have to create all the /dev entries that are configured for the domain. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 4afa15e..b9492a9 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7093,6 +7093,46 @@ qemuDomainSetupAllInputs(virQEMUDriverPtr driver, } +static int +qemuDomainSetupRNG(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainRNGDefPtr rng, + const char *devPath) +{ + switch ((virDomainRNGBackend) rng->backend) { + case VIR_DOMAIN_RNG_BACKEND_RANDOM: + if (qemuDomainCreateDevice(rng->source.file, devPath, false) < 0) + return -1; + + case VIR_DOMAIN_RNG_BACKEND_EGD: + case VIR_DOMAIN_RNG_BACKEND_LAST: + /* nada */ + break; + } + + return 0; +} + + +static int +qemuDomainSetupAllRNGs(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *devPath) +{ + size_t i; + + VIR_DEBUG("Setting up RNGs"); + for (i = 0; i < vm->def->nrngs; i++) { + if (qemuDomainSetupRNG(driver, + vm->def->rngs[i], + devPath) < 0) + return -1; + } + + VIR_DEBUG("Setup all RNGs"); + return 0; +} + + int qemuDomainBuildNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm) @@ -7141,6 +7181,9 @@ qemuDomainBuildNamespace(virQEMUDriverPtr driver, if (qemuDomainSetupAllInputs(driver, vm, devPath) < 0) goto cleanup; + if (qemuDomainSetupAllRNGs(driver, vm, devPath) < 0) + goto cleanup; + if (mount(devPath, "/dev", NULL, mount_flags, NULL) < 0) { virReportSystemError(errno, _("Failed to mount %s on /dev"), -- 2.8.4

Instead of trying to fix our security drivers, we can use a simple trick to relabel paths in both namespace and the host. I mean, if we enter the namespace some paths are still shared with the host so any change done to them is visible from the host too. Therefore, we can just enter the namespace and call SetAllLabel()/RestoreAllLabel() from there. Yes, it has slight overhead because we have to fork in order to enter the namespace. But on the other hand, no complexity is added to our code. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/Makefile.am | 3 +- src/qemu/qemu_process.c | 15 +++--- src/qemu/qemu_security.c | 134 +++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_security.h | 39 ++++++++++++++ 4 files changed, 183 insertions(+), 8 deletions(-) create mode 100644 src/qemu/qemu_security.c create mode 100644 src/qemu/qemu_security.h diff --git a/src/Makefile.am b/src/Makefile.am index c08a350..3244689 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -836,7 +836,8 @@ QEMU_DRIVER_SOURCES = \ qemu/qemu_monitor_json.h \ qemu/qemu_driver.c qemu/qemu_driver.h \ qemu/qemu_interface.c qemu/qemu_interface.h \ - qemu/qemu_capspriv.h + qemu/qemu_capspriv.h \ + qemu/qemu_security.c qemu/qemu_security.h XENAPI_DRIVER_SOURCES = \ xenapi/xenapi_driver.c xenapi/xenapi_driver.h \ diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 4f64dd1..5797eaa 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -45,6 +45,7 @@ #include "qemu_hotplug.h" #include "qemu_migration.h" #include "qemu_interface.h" +#include "qemu_security.h" #include "cpu/cpu.h" #include "datatypes.h" @@ -5527,10 +5528,10 @@ qemuProcessLaunch(virConnectPtr conn, goto cleanup; VIR_DEBUG("Setting domain security labels"); - if (virSecurityManagerSetAllLabel(driver->securityManager, - vm->def, - incoming ? incoming->path : NULL) < 0) - goto cleanup; + if (qemuSecuritySetAllLabel(driver, + vm, + incoming ? incoming->path : NULL) < 0) + goto cleanup; /* Security manager labeled all devices, therefore * if any operation from now on fails, we need to ask the caller to @@ -6065,9 +6066,9 @@ void qemuProcessStop(virQEMUDriverPtr driver, /* Reset Security Labels unless caller don't want us to */ if (!(flags & VIR_QEMU_PROCESS_STOP_NO_RELABEL)) - virSecurityManagerRestoreAllLabel(driver->securityManager, - vm->def, - !!(flags & VIR_QEMU_PROCESS_STOP_MIGRATED)); + qemuSecurityRestoreAllLabel(driver, vm, + !!(flags & VIR_QEMU_PROCESS_STOP_MIGRATED)); + virSecurityManagerReleaseLabel(driver->securityManager, vm->def); for (i = 0; i < vm->def->ndisks; i++) { diff --git a/src/qemu/qemu_security.c b/src/qemu/qemu_security.c new file mode 100644 index 0000000..cec7dfd --- /dev/null +++ b/src/qemu/qemu_security.c @@ -0,0 +1,134 @@ +/* + * qemu_security.c: QEMU security management + * + * Copyright (C) 2016 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Michal Privoznik <mprivozn@redhat.com> + */ + +#include <config.h> + +#include "qemu_domain.h" +#include "qemu_security.h" +#include "virlog.h" + +#define VIR_FROM_THIS VIR_FROM_QEMU + +VIR_LOG_INIT("qemu.qemu_process"); + +struct qemuSecuritySetRestoreAllLabelData { + bool set; + virQEMUDriverPtr driver; + virDomainObjPtr vm; + const char *stdin_path; + bool migrated; +}; + + +static int +qemuSecuritySetRestoreAllLabelHelper(pid_t pid, + void *opaque) +{ + struct qemuSecuritySetRestoreAllLabelData *data = opaque; + + virSecurityManagerPostFork(data->driver->securityManager); + + if (data->set) { + VIR_DEBUG("Setting up security labels inside namespace pid=%lld", + (long long) pid); + if (virSecurityManagerSetAllLabel(data->driver->securityManager, + data->vm->def, + data->stdin_path) < 0) + return -1; + } else { + VIR_DEBUG("Restoring security labels inside namespace pid=%lld", + (long long) pid); + if (virSecurityManagerRestoreAllLabel(data->driver->securityManager, + data->vm->def, + data->migrated) < 0) + return -1; + } + + return 0; +} + + +int +qemuSecuritySetAllLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *stdin_path) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + struct qemuSecuritySetRestoreAllLabelData data; + + memset(&data, 0, sizeof(data)); + + data.set = true; + data.driver = driver; + data.vm = vm; + data.stdin_path = stdin_path; + + if (priv->containerized) { + if (virSecurityManagerPreFork(driver->securityManager) < 0) + return -1; + if (virProcessRunInMountNamespace(vm->pid, + qemuSecuritySetRestoreAllLabelHelper, + &data) < 0) { + virSecurityManagerPostFork(driver->securityManager); + return -1; + } + virSecurityManagerPostFork(driver->securityManager); + + } else { + if (virSecurityManagerSetAllLabel(driver->securityManager, + vm->def, + stdin_path) < 0) + return -1; + } + return 0; +} + + +void +qemuSecurityRestoreAllLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + bool migrated) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + struct qemuSecuritySetRestoreAllLabelData data; + + memset(&data, 0, sizeof(data)); + + data.driver = driver; + data.vm = vm; + data.migrated = migrated; + + if (priv->containerized) { + if (virSecurityManagerPreFork(driver->securityManager) < 0) + return; + + virProcessRunInMountNamespace(vm->pid, + qemuSecuritySetRestoreAllLabelHelper, + &data); + virSecurityManagerPostFork(driver->securityManager); + } else { + virSecurityManagerRestoreAllLabel(driver->securityManager, + vm->def, + migrated); + } +} diff --git a/src/qemu/qemu_security.h b/src/qemu/qemu_security.h new file mode 100644 index 0000000..88c53e7 --- /dev/null +++ b/src/qemu/qemu_security.h @@ -0,0 +1,39 @@ +/* + * qemu_security.h: QEMU security management + * + * Copyright (C) 2016 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Michal Privoznik <mprivozn@redhat.com> + */ + +#ifndef __QEMU_SECURITY_H__ +# define __QEMU_SECURITY_H__ + +# include <stdbool.h> + +# include "qemu_conf.h" +# include "domain_conf.h" + +int qemuSecuritySetAllLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *stdin_path); + +void qemuSecurityRestoreAllLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + bool migrated); +#endif /* __QEMU_SECURITY_H__ */ -- 2.8.4

On Thu, Nov 24, 2016 at 03:48:05PM +0100, Michal Privoznik wrote:
Instead of trying to fix our security drivers, we can use a simple trick to relabel paths in both namespace and the host. I mean, if we enter the namespace some paths are still shared with the host so any change done to them is visible from the host too. Therefore, we can just enter the namespace and call SetAllLabel()/RestoreAllLabel() from there. Yes, it has slight overhead because we have to fork in order to enter the namespace. But on the other hand, no complexity is added to our code.
I'm a little concerned that this may be storing problems for us at a later date. If the security manager classes have any state they update such changes are invisible to the main libvirt process now. Also stuff running between fork+exec is restricted to async signal safe functions only. I think it is hard for us to be entirely confident about that safety when we're running the entire security driver labelling code in that region. Ultimately the only bit of the security drivers that needs to run in the namespace is the setfilecon_raw() or chown() system calls. So I wonder if we should make it possible to provide a namespace helper callback to the security drivers that they'd use only for the setfilecon_raw/chown calls. eg, we could do something like typedef (*virSecurityManagerNamespaceHelperFunc)(void *opaque) typedef (*virSecurityManagerNamespaceHelper)(virSecurityManagerNamespaceHelperFunc func, void *opaque) virSecurityManagerSetNamespacehelper(mgr, runInNamespaceHelper) ....do the labelling... virSecurityManagerSetNamespacehelper(mgr, NULL) the namespace helper would have to be stored in a thread local since we need to cope with some VMs not having separate namespaces The security manager code would need todo struct SELinuxNamespaceData { const char *path; security_context_t ctx; } Now instead of if (setfilecon_raw(path, tcon) < 0) ... it would do static int SELinuxNamespaceHelperFunc(void *opaque) { struct SELinuxNamespaceData *data = opaque return setfilecon_raw(data->path, data->tcon) } struct SELinuxNamespaceData data = { path, tcon} if (namespacehelper(SELinuxNamespaceHelperFunc, &data) < 0) ... Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 05.12.2016 14:40, Daniel P. Berrange wrote:
On Thu, Nov 24, 2016 at 03:48:05PM +0100, Michal Privoznik wrote:
Instead of trying to fix our security drivers, we can use a simple trick to relabel paths in both namespace and the host. I mean, if we enter the namespace some paths are still shared with the host so any change done to them is visible from the host too. Therefore, we can just enter the namespace and call SetAllLabel()/RestoreAllLabel() from there. Yes, it has slight overhead because we have to fork in order to enter the namespace. But on the other hand, no complexity is added to our code.
I'm a little concerned that this may be storing problems for us at a later date. If the security manager classes have any state they update such changes are invisible to the main libvirt process now. Also stuff running between fork+exec is restricted to async signal safe functions only. I think it is hard for us to be entirely confident about that safety when we're running the entire security driver labelling code in that region.
Ultimately the only bit of the security drivers that needs to run in the namespace is the setfilecon_raw() or chown() system calls.
So I wonder if we should make it possible to provide a namespace helper callback to the security drivers that they'd use only for the setfilecon_raw/chown calls.
eg, we could do something like
typedef (*virSecurityManagerNamespaceHelperFunc)(void *opaque) typedef (*virSecurityManagerNamespaceHelper)(virSecurityManagerNamespaceHelperFunc func, void *opaque)
virSecurityManagerSetNamespacehelper(mgr, runInNamespaceHelper)
....do the labelling...
virSecurityManagerSetNamespacehelper(mgr, NULL)
the namespace helper would have to be stored in a thread local since we need to cope with some VMs not having separate namespaces
The security manager code would need todo
struct SELinuxNamespaceData { const char *path; security_context_t ctx; }
Now instead of
if (setfilecon_raw(path, tcon) < 0) ...
it would do
static int SELinuxNamespaceHelperFunc(void *opaque) { struct SELinuxNamespaceData *data = opaque
return setfilecon_raw(data->path, data->tcon) }
struct SELinuxNamespaceData data = { path, tcon}
if (namespacehelper(SELinuxNamespaceHelperFunc, &data) < 0) ...
I was thinking about this too, but I see two problems with this approach: 1) fork() on every chown() and every sefilecon_raw(). That's too much IMO. 2) thread safeness. We can't have the sec manager wide callback. But you already covered that - and thread local variable might in fact work just fine! Michal

On Mon, Dec 05, 2016 at 03:14:50PM +0100, Michal Privoznik wrote:
On 05.12.2016 14:40, Daniel P. Berrange wrote:
On Thu, Nov 24, 2016 at 03:48:05PM +0100, Michal Privoznik wrote:
Instead of trying to fix our security drivers, we can use a simple trick to relabel paths in both namespace and the host. I mean, if we enter the namespace some paths are still shared with the host so any change done to them is visible from the host too. Therefore, we can just enter the namespace and call SetAllLabel()/RestoreAllLabel() from there. Yes, it has slight overhead because we have to fork in order to enter the namespace. But on the other hand, no complexity is added to our code.
I'm a little concerned that this may be storing problems for us at a later date. If the security manager classes have any state they update such changes are invisible to the main libvirt process now. Also stuff running between fork+exec is restricted to async signal safe functions only. I think it is hard for us to be entirely confident about that safety when we're running the entire security driver labelling code in that region.
Ultimately the only bit of the security drivers that needs to run in the namespace is the setfilecon_raw() or chown() system calls.
So I wonder if we should make it possible to provide a namespace helper callback to the security drivers that they'd use only for the setfilecon_raw/chown calls.
eg, we could do something like
typedef (*virSecurityManagerNamespaceHelperFunc)(void *opaque) typedef (*virSecurityManagerNamespaceHelper)(virSecurityManagerNamespaceHelperFunc func, void *opaque)
virSecurityManagerSetNamespacehelper(mgr, runInNamespaceHelper)
....do the labelling...
virSecurityManagerSetNamespacehelper(mgr, NULL)
the namespace helper would have to be stored in a thread local since we need to cope with some VMs not having separate namespaces
The security manager code would need todo
struct SELinuxNamespaceData { const char *path; security_context_t ctx; }
Now instead of
if (setfilecon_raw(path, tcon) < 0) ...
it would do
static int SELinuxNamespaceHelperFunc(void *opaque) { struct SELinuxNamespaceData *data = opaque
return setfilecon_raw(data->path, data->tcon) }
struct SELinuxNamespaceData data = { path, tcon}
if (namespacehelper(SELinuxNamespaceHelperFunc, &data) < 0) ...
I was thinking about this too, but I see two problems with this approach:
1) fork() on every chown() and every sefilecon_raw(). That's too much IMO.
Hmm, good point. fork() is fairly fast on Linux, but it is still a non-negligble overhead in our startup codepath. Avoiding the fork would mean having some persistent child and RPC to it which is also likely to add just as much overhead. I guess we'll just have to bear in mind the limitations for security managers in future and go with what you've done. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 05.12.2016 15:25, Daniel P. Berrange wrote:
On Mon, Dec 05, 2016 at 03:14:50PM +0100, Michal Privoznik wrote:
On 05.12.2016 14:40, Daniel P. Berrange wrote:
On Thu, Nov 24, 2016 at 03:48:05PM +0100, Michal Privoznik wrote:
Instead of trying to fix our security drivers, we can use a simple trick to relabel paths in both namespace and the host. I mean, if we enter the namespace some paths are still shared with the host so any change done to them is visible from the host too. Therefore, we can just enter the namespace and call SetAllLabel()/RestoreAllLabel() from there. Yes, it has slight overhead because we have to fork in order to enter the namespace. But on the other hand, no complexity is added to our code.
I'm a little concerned that this may be storing problems for us at a later date. If the security manager classes have any state they update such changes are invisible to the main libvirt process now. Also stuff running between fork+exec is restricted to async signal safe functions only. I think it is hard for us to be entirely confident about that safety when we're running the entire security driver labelling code in that region.
Ultimately the only bit of the security drivers that needs to run in the namespace is the setfilecon_raw() or chown() system calls.
So I wonder if we should make it possible to provide a namespace helper callback to the security drivers that they'd use only for the setfilecon_raw/chown calls.
eg, we could do something like
typedef (*virSecurityManagerNamespaceHelperFunc)(void *opaque) typedef (*virSecurityManagerNamespaceHelper)(virSecurityManagerNamespaceHelperFunc func, void *opaque)
virSecurityManagerSetNamespacehelper(mgr, runInNamespaceHelper)
....do the labelling...
virSecurityManagerSetNamespacehelper(mgr, NULL)
the namespace helper would have to be stored in a thread local since we need to cope with some VMs not having separate namespaces
The security manager code would need todo
struct SELinuxNamespaceData { const char *path; security_context_t ctx; }
Now instead of
if (setfilecon_raw(path, tcon) < 0) ...
it would do
static int SELinuxNamespaceHelperFunc(void *opaque) { struct SELinuxNamespaceData *data = opaque
return setfilecon_raw(data->path, data->tcon) }
struct SELinuxNamespaceData data = { path, tcon}
if (namespacehelper(SELinuxNamespaceHelperFunc, &data) < 0) ...
I was thinking about this too, but I see two problems with this approach:
1) fork() on every chown() and every sefilecon_raw(). That's too much IMO.
Hmm, good point. fork() is fairly fast on Linux, but it is still a non-negligble overhead in our startup codepath.
Avoiding the fork would mean having some persistent child and RPC to it which is also likely to add just as much overhead.
I guess we'll just have to bear in mind the limitations for security managers in future and go with what you've done.
The other option might be to: 1) dig out code that gets you the most concrete seclabel for a device 2) chown() & setfilecon_raw() devices at the time we are creating /dev entry (for this we need the function from 1.) 3) have a filter callback (that would be set & unset just like you propose above) that for any /dev prefixed path returns 'success' without any actual chown() or setfilecon_raw() performed. This way we would not need any additional fork() nor we would have to worry about modifying internal state of secdriver in a separate process. However: a) I'm a bit worried as this is going behind secdriver's back, b) It's a lot more code than this patch. Therefore I'd suggest to go with this patch for now and save the work for a follow up patch. Michal

On Mon, Dec 05, 2016 at 02:25:25PM +0000, Daniel P. Berrange wrote:
On Mon, Dec 05, 2016 at 03:14:50PM +0100, Michal Privoznik wrote:
On 05.12.2016 14:40, Daniel P. Berrange wrote:
On Thu, Nov 24, 2016 at 03:48:05PM +0100, Michal Privoznik wrote:
Instead of trying to fix our security drivers, we can use a simple trick to relabel paths in both namespace and the host. I mean, if we enter the namespace some paths are still shared with the host so any change done to them is visible from the host too. Therefore, we can just enter the namespace and call SetAllLabel()/RestoreAllLabel() from there. Yes, it has slight overhead because we have to fork in order to enter the namespace. But on the other hand, no complexity is added to our code.
I'm a little concerned that this may be storing problems for us at a later date. If the security manager classes have any state they update such changes are invisible to the main libvirt process now. Also stuff running between fork+exec is restricted to async signal safe functions only. I think it is hard for us to be entirely confident about that safety when we're running the entire security driver labelling code in that region.
Ultimately the only bit of the security drivers that needs to run in the namespace is the setfilecon_raw() or chown() system calls.
So I wonder if we should make it possible to provide a namespace helper callback to the security drivers that they'd use only for the setfilecon_raw/chown calls.
eg, we could do something like
typedef (*virSecurityManagerNamespaceHelperFunc)(void *opaque) typedef (*virSecurityManagerNamespaceHelper)(virSecurityManagerNamespaceHelperFunc func, void *opaque)
virSecurityManagerSetNamespacehelper(mgr, runInNamespaceHelper)
....do the labelling...
virSecurityManagerSetNamespacehelper(mgr, NULL)
the namespace helper would have to be stored in a thread local since we need to cope with some VMs not having separate namespaces
The security manager code would need todo
struct SELinuxNamespaceData { const char *path; security_context_t ctx; }
Now instead of
if (setfilecon_raw(path, tcon) < 0) ...
it would do
static int SELinuxNamespaceHelperFunc(void *opaque) { struct SELinuxNamespaceData *data = opaque
return setfilecon_raw(data->path, data->tcon) }
struct SELinuxNamespaceData data = { path, tcon}
if (namespacehelper(SELinuxNamespaceHelperFunc, &data) < 0) ...
I was thinking about this too, but I see two problems with this approach:
1) fork() on every chown() and every sefilecon_raw(). That's too much IMO.
Hmm, good point. fork() is fairly fast on Linux, but it is still a non-negligble overhead in our startup codepath.
I thought of a reasonable easy way we can deal with this. The key point is that it is only a problem during domain startup where we relabel lots of files - hotplug we're only labelling 1 file usually so overhead of 1 fork is not an issue. IOW, performance is only a problem when calling two methods virSecuritySELinuxSetAllLabel and virSecuritySELinuxRestoreAllLabel What we can do is to modify those methods so that all the labelling operations are batched up. eg have virSecuritySELinuxTransactionStart virSecuritySELinuxTransactionCommit and when a transaction is running, we modify virSecuritySELinuxSetFileconHelper so that it simply saves a record of the filepath + label. When we commit the transaction, then then call the real virSecuritySELinuxSetFileconHelper logic for the recorded files, from within our fork() helper. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 07.12.2016 09:51, Daniel P. Berrange wrote:
On Mon, Dec 05, 2016 at 02:25:25PM +0000, Daniel P. Berrange wrote:
On Mon, Dec 05, 2016 at 03:14:50PM +0100, Michal Privoznik wrote:
On 05.12.2016 14:40, Daniel P. Berrange wrote:
On Thu, Nov 24, 2016 at 03:48:05PM +0100, Michal Privoznik wrote:
Instead of trying to fix our security drivers, we can use a simple trick to relabel paths in both namespace and the host. I mean, if we enter the namespace some paths are still shared with the host so any change done to them is visible from the host too. Therefore, we can just enter the namespace and call SetAllLabel()/RestoreAllLabel() from there. Yes, it has slight overhead because we have to fork in order to enter the namespace. But on the other hand, no complexity is added to our code.
I'm a little concerned that this may be storing problems for us at a later date. If the security manager classes have any state they update such changes are invisible to the main libvirt process now. Also stuff running between fork+exec is restricted to async signal safe functions only. I think it is hard for us to be entirely confident about that safety when we're running the entire security driver labelling code in that region.
Ultimately the only bit of the security drivers that needs to run in the namespace is the setfilecon_raw() or chown() system calls.
So I wonder if we should make it possible to provide a namespace helper callback to the security drivers that they'd use only for the setfilecon_raw/chown calls.
eg, we could do something like
typedef (*virSecurityManagerNamespaceHelperFunc)(void *opaque) typedef (*virSecurityManagerNamespaceHelper)(virSecurityManagerNamespaceHelperFunc func, void *opaque)
virSecurityManagerSetNamespacehelper(mgr, runInNamespaceHelper)
....do the labelling...
virSecurityManagerSetNamespacehelper(mgr, NULL)
the namespace helper would have to be stored in a thread local since we need to cope with some VMs not having separate namespaces
The security manager code would need todo
struct SELinuxNamespaceData { const char *path; security_context_t ctx; }
Now instead of
if (setfilecon_raw(path, tcon) < 0) ...
it would do
static int SELinuxNamespaceHelperFunc(void *opaque) { struct SELinuxNamespaceData *data = opaque
return setfilecon_raw(data->path, data->tcon) }
struct SELinuxNamespaceData data = { path, tcon}
if (namespacehelper(SELinuxNamespaceHelperFunc, &data) < 0) ...
I was thinking about this too, but I see two problems with this approach:
1) fork() on every chown() and every sefilecon_raw(). That's too much IMO.
Hmm, good point. fork() is fairly fast on Linux, but it is still a non-negligble overhead in our startup codepath.
I thought of a reasonable easy way we can deal with this. The key point is that it is only a problem during domain startup where we relabel lots of files - hotplug we're only labelling 1 file usually so overhead of 1 fork is not an issue.
Unless a disk is plugged and we are relabelling the whole backing chain. That is - we might want to use the approach you're describing in other cases too, i.e. in every hotplug.
IOW, performance is only a problem when calling two methods virSecuritySELinuxSetAllLabel and virSecuritySELinuxRestoreAllLabel
But agreed that these two are the problem.
What we can do is to modify those methods so that all the labelling operations are batched up. eg have
virSecuritySELinuxTransactionStart virSecuritySELinuxTransactionCommit
and when a transaction is running, we modify virSecuritySELinuxSetFileconHelper so that it simply saves a record of the filepath + label. When we commit the transaction, then then call the real virSecuritySELinuxSetFileconHelper logic for the recorded files, from within our fork() helper.
Yup. This would work. But then again, for now I'd go with this approach (in fact that's what I've just sent), and save this for a follow up patch - which I can work on whilst waiting for review. Michal

On Wed, Dec 07, 2016 at 11:19:46AM +0100, Michal Privoznik wrote:
On 07.12.2016 09:51, Daniel P. Berrange wrote:
On Mon, Dec 05, 2016 at 02:25:25PM +0000, Daniel P. Berrange wrote:
On Mon, Dec 05, 2016 at 03:14:50PM +0100, Michal Privoznik wrote:
On 05.12.2016 14:40, Daniel P. Berrange wrote:
On Thu, Nov 24, 2016 at 03:48:05PM +0100, Michal Privoznik wrote:
Instead of trying to fix our security drivers, we can use a simple trick to relabel paths in both namespace and the host. I mean, if we enter the namespace some paths are still shared with the host so any change done to them is visible from the host too. Therefore, we can just enter the namespace and call SetAllLabel()/RestoreAllLabel() from there. Yes, it has slight overhead because we have to fork in order to enter the namespace. But on the other hand, no complexity is added to our code.
I'm a little concerned that this may be storing problems for us at a later date. If the security manager classes have any state they update such changes are invisible to the main libvirt process now. Also stuff running between fork+exec is restricted to async signal safe functions only. I think it is hard for us to be entirely confident about that safety when we're running the entire security driver labelling code in that region.
Ultimately the only bit of the security drivers that needs to run in the namespace is the setfilecon_raw() or chown() system calls.
So I wonder if we should make it possible to provide a namespace helper callback to the security drivers that they'd use only for the setfilecon_raw/chown calls.
eg, we could do something like
typedef (*virSecurityManagerNamespaceHelperFunc)(void *opaque) typedef (*virSecurityManagerNamespaceHelper)(virSecurityManagerNamespaceHelperFunc func, void *opaque)
virSecurityManagerSetNamespacehelper(mgr, runInNamespaceHelper)
....do the labelling...
virSecurityManagerSetNamespacehelper(mgr, NULL)
the namespace helper would have to be stored in a thread local since we need to cope with some VMs not having separate namespaces
The security manager code would need todo
struct SELinuxNamespaceData { const char *path; security_context_t ctx; }
Now instead of
if (setfilecon_raw(path, tcon) < 0) ...
it would do
static int SELinuxNamespaceHelperFunc(void *opaque) { struct SELinuxNamespaceData *data = opaque
return setfilecon_raw(data->path, data->tcon) }
struct SELinuxNamespaceData data = { path, tcon}
if (namespacehelper(SELinuxNamespaceHelperFunc, &data) < 0) ...
I was thinking about this too, but I see two problems with this approach:
1) fork() on every chown() and every sefilecon_raw(). That's too much IMO.
Hmm, good point. fork() is fairly fast on Linux, but it is still a non-negligble overhead in our startup codepath.
I thought of a reasonable easy way we can deal with this. The key point is that it is only a problem during domain startup where we relabel lots of files - hotplug we're only labelling 1 file usually so overhead of 1 fork is not an issue.
Unless a disk is plugged and we are relabelling the whole backing chain. That is - we might want to use the approach you're describing in other cases too, i.e. in every hotplug.
True - most apps use fairly shallow chains, so probably not too critical but worth optimizing if its not too hard.
IOW, performance is only a problem when calling two methods virSecuritySELinuxSetAllLabel and virSecuritySELinuxRestoreAllLabel
But agreed that these two are the problem.
What we can do is to modify those methods so that all the labelling operations are batched up. eg have
virSecuritySELinuxTransactionStart virSecuritySELinuxTransactionCommit
and when a transaction is running, we modify virSecuritySELinuxSetFileconHelper so that it simply saves a record of the filepath + label. When we commit the transaction, then then call the real virSecuritySELinuxSetFileconHelper logic for the recorded files, from within our fork() helper.
Yup. This would work. But then again, for now I'd go with this approach (in fact that's what I've just sent), and save this for a follow up patch - which I can work on whilst waiting for review.
Sounds fine with me Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 215 +++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 8 ++ src/qemu/qemu_driver.c | 5 +- src/qemu/qemu_hotplug.c | 22 +++-- src/qemu/qemu_security.c | 36 ++++++++ src/qemu/qemu_security.h | 8 ++ 6 files changed, 285 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index b9492a9..a1edde9 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -53,6 +53,11 @@ #include "storage/storage_driver.h" +#ifdef MAJOR_IN_MKDEV +# include <sys/mkdev.h> +#elif MAJOR_IN_SYSMACROS +# include <sys/sysmacros.h> +#endif #include <sys/time.h> #include <fcntl.h> #if defined(HAVE_SYS_MOUNT_H) @@ -7297,3 +7302,213 @@ qemuDomainDeleteNamespace(virQEMUDriverPtr driver, virObjectUnref(cfg); VIR_FREE(path); } + + +struct qemuDomainAttachDeviceMknodData { + virQEMUDriverPtr driver; + virDomainObjPtr vm; + virDomainDeviceDefPtr devDef; + const char *file; + struct stat sb; + void *acl; +}; + + +static int +qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, + void *opaque) +{ + struct qemuDomainAttachDeviceMknodData *data = opaque; + int ret = -1; + + virSecurityManagerPostFork(data->driver->securityManager); + + if (virFileMakeParentPath(data->file) < 0) { + virReportSystemError(errno, + _("Unable to create %s"), data->file); + goto cleanup; + } + + VIR_DEBUG("Creating dev %s (%d,%d)", + data->file, major(data->sb.st_rdev), minor(data->sb.st_rdev)); + if (mknod(data->file, data->sb.st_mode, data->sb.st_rdev) < 0) { + /* Because we are not removing devices on hotunplug, or + * we might be creating part of backing chain that + * already exist due to a different disk plugged to + * domain, accept EEXIST. */ + if (errno != EEXIST) { + virReportSystemError(errno, + _("Unable to create device %s"), + data->file); + goto cleanup; + } + } + + if (virFileSetACLs(data->file, data->acl) < 0 && + errno != ENOTSUP) { + virReportSystemError(errno, + _("Unable to set ACLs on %s"), data->file); + goto cleanup; + } + + switch ((virDomainDeviceType) data->devDef->type) { + case VIR_DOMAIN_DEVICE_DISK: { + virDomainDiskDefPtr def = data->devDef->data.disk; + char *tmpsrc = def->src->path; + def->src->path = (char *) data->file; + if (virSecurityManagerSetDiskLabel(data->driver->securityManager, + data->vm->def, def) < 0) { + def->src->path = tmpsrc; + goto cleanup; + } + def->src->path = tmpsrc; + } break; + + case VIR_DOMAIN_DEVICE_NONE: + case VIR_DOMAIN_DEVICE_LEASE: + case VIR_DOMAIN_DEVICE_FS: + case VIR_DOMAIN_DEVICE_NET: + case VIR_DOMAIN_DEVICE_INPUT: + case VIR_DOMAIN_DEVICE_SOUND: + case VIR_DOMAIN_DEVICE_VIDEO: + case VIR_DOMAIN_DEVICE_HOSTDEV: + case VIR_DOMAIN_DEVICE_WATCHDOG: + case VIR_DOMAIN_DEVICE_CONTROLLER: + case VIR_DOMAIN_DEVICE_GRAPHICS: + case VIR_DOMAIN_DEVICE_HUB: + case VIR_DOMAIN_DEVICE_REDIRDEV: + case VIR_DOMAIN_DEVICE_SMARTCARD: + case VIR_DOMAIN_DEVICE_CHR: + case VIR_DOMAIN_DEVICE_MEMBALLOON: + case VIR_DOMAIN_DEVICE_NVRAM: + case VIR_DOMAIN_DEVICE_RNG: + case VIR_DOMAIN_DEVICE_SHMEM: + case VIR_DOMAIN_DEVICE_TPM: + case VIR_DOMAIN_DEVICE_PANIC: + case VIR_DOMAIN_DEVICE_MEMORY: + case VIR_DOMAIN_DEVICE_IOMMU: + case VIR_DOMAIN_DEVICE_LAST: + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unexpected device type %d"), + data->devDef->type); + goto cleanup; + } + + ret = 0; + cleanup: + if (ret < 0) + unlink(data->file); + virFileFreeACLs(&data->acl); + return ret; +} + + +static int +qemuDomainAttachDeviceMknod(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDeviceDefPtr devDef, + const char *file) +{ + struct qemuDomainAttachDeviceMknodData data; + int ret = -1; + + memset(&data, 0, sizeof(data)); + + data.driver = driver; + data.vm = vm; + data.devDef = devDef; + data.file = file; + + if (stat(file, &data.sb) < 0) { + virReportSystemError(errno, + _("Unable to access %s"), file); + return ret; + } + + if (virFileGetACLs(file, &data.acl) < 0 && + errno != ENOTSUP) { + virReportSystemError(errno, + _("Unable to get ACLs on %s"), file); + return ret; + } + + if (virSecurityManagerPreFork(driver->securityManager) < 0) + goto cleanup; + + if (virProcessRunInMountNamespace(vm->pid, + qemuDomainAttachDeviceMknodHelper, + &data) < 0) { + virSecurityManagerPostFork(driver->securityManager); + goto cleanup; + } + + virSecurityManagerPostFork(driver->securityManager); + + ret = 0; + cleanup: + virFileFreeACLs(&data.acl); + return 0; +} + + +int +qemuDomainNamespaceSetupDisk(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainDeviceDef dev = {.type = VIR_DOMAIN_DEVICE_DISK, .data.disk = disk}; + virStorageSourcePtr next; + const char *src = NULL; + struct stat sb; + int ret = -1; + + if (!priv->containerized) + return 0; + + for (next = disk->src; next; next = next->backingStore) { + if (!next->path || !virStorageSourceIsBlockLocal(disk->src) || + !STRPREFIX(next->path, "/dev")) { + /* Not creating device. Just continue. */ + continue; + } + + if (stat(next->path, &sb) < 0) { + virReportSystemError(errno, + _("Unable to access %s"), src); + goto cleanup; + } + + if (!S_ISBLK(sb.st_mode)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Disk source %s must be a block device"), + src); + goto cleanup; + } + + if (qemuDomainAttachDeviceMknod(driver, + vm, + &dev, + next->path) < 0) + goto cleanup; + } + + ret = 0; + cleanup: + return ret; +} + + +int +qemuDomainNamespaceTeardownDisk(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + virDomainDiskDefPtr disk ATTRIBUTE_UNUSED) +{ + /* While in hotplug case we create the whole backing chain, + * here we must limit ourselves. The disk we want to remove + * might be a part of backing chain of another disk. + * If you are reading these lines and have some spare time + * you can come up with and algorithm that checks for that. + * I don't, therefore: */ + return 0; +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 9c34476..045d1ba 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -795,4 +795,12 @@ int qemuDomainCreateNamespace(virQEMUDriverPtr driver, void qemuDomainDeleteNamespace(virQEMUDriverPtr driver, virDomainObjPtr vm); + +int qemuDomainNamespaceSetupDisk(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk); + +int qemuDomainNamespaceTeardownDisk(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 8c8f969..fcf6dd9 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -57,6 +57,7 @@ #include "qemu_process.h" #include "qemu_migration.h" #include "qemu_blockjob.h" +#include "qemu_security.h" #include "virerror.h" #include "virlog.h" @@ -16093,9 +16094,9 @@ qemuDomainBlockPivot(virQEMUDriverPtr driver, disk->mirror->format != VIR_STORAGE_FILE_RAW && (virDomainLockDiskAttach(driver->lockManager, cfg->uri, vm, disk) < 0 || + qemuDomainNamespaceSetupDisk(driver, vm, disk) < 0 || qemuSetupDiskCgroup(vm, disk) < 0 || - virSecurityManagerSetDiskLabel(driver->securityManager, vm->def, - disk) < 0)) + qemuSecuritySetDiskLabel(driver, vm, disk) < 0)) goto cleanup; disk->src = oldsrc; diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 5038ce1..9b47b5f 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -34,6 +34,7 @@ #include "qemu_hostdev.h" #include "qemu_interface.h" #include "qemu_process.h" +#include "qemu_security.h" #include "domain_audit.h" #include "netdev_bandwidth_conf.h" #include "domain_nwfilter.h" @@ -109,13 +110,15 @@ qemuDomainPrepareDisk(virQEMUDriverPtr driver, vm, disk) < 0) goto cleanup; - if (virSecurityManagerSetDiskLabel(driver->securityManager, - vm->def, disk) < 0) + if (qemuSecuritySetDiskLabel(driver, vm, disk) < 0) goto rollback_lock; - if (qemuSetupDiskCgroup(vm, disk) < 0) + if (qemuDomainNamespaceSetupDisk(driver, vm, disk) < 0) goto rollback_label; + if (qemuSetupDiskCgroup(vm, disk) < 0) + goto rollback_namespace; + ret = 0; goto cleanup; @@ -123,10 +126,13 @@ qemuDomainPrepareDisk(virQEMUDriverPtr driver, if (qemuTeardownDiskCgroup(vm, disk) < 0) VIR_WARN("Unable to tear down cgroup access on %s", virDomainDiskGetSource(disk)); + rollback_namespace: + if (qemuDomainNamespaceTeardownDisk(driver, vm, disk) < 0) + VIR_WARN("Unable to remove /dev entry for %s", + virDomainDiskGetSource(disk)); rollback_label: - if (virSecurityManagerRestoreDiskLabel(driver->securityManager, - vm->def, disk) < 0) + if (qemuSecurityRestoreDiskLabel(driver, vm, disk) < 0) VIR_WARN("Unable to restore security label on %s", virDomainDiskGetSource(disk)); @@ -3465,8 +3471,7 @@ qemuDomainRemoveDiskDevice(virQEMUDriverPtr driver, qemuDomainReleaseDeviceAddress(vm, &disk->info, src); - if (virSecurityManagerRestoreDiskLabel(driver->securityManager, - vm->def, disk) < 0) + if (qemuSecurityRestoreDiskLabel(driver, vm, disk) < 0) VIR_WARN("Unable to restore security label on %s", src); if (qemuTeardownDiskCgroup(vm, disk) < 0) @@ -3475,6 +3480,9 @@ qemuDomainRemoveDiskDevice(virQEMUDriverPtr driver, if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) VIR_WARN("Unable to release lock on %s", src); + if (qemuDomainNamespaceTeardownDisk(driver, vm, disk) < 0) + VIR_WARN("Unable to remove /dev entry for %s", src); + dev.type = VIR_DOMAIN_DEVICE_DISK; dev.data.disk = disk; ignore_value(qemuRemoveSharedDevice(driver, &dev, vm->def->name)); diff --git a/src/qemu/qemu_security.c b/src/qemu/qemu_security.c index cec7dfd..f800d9b 100644 --- a/src/qemu/qemu_security.c +++ b/src/qemu/qemu_security.c @@ -132,3 +132,39 @@ qemuSecurityRestoreAllLabel(virQEMUDriverPtr driver, migrated); } } + + +int +qemuSecuritySetDiskLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (priv->containerized) { + /* Already handled by namespace code. */ + return 0; + } + + return virSecurityManagerSetDiskLabel(driver->securityManager, + vm->def, + disk); +} + + +int +qemuSecurityRestoreDiskLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (priv->containerized) { + /* Already handled by namespace code. */ + return 0; + } + + return virSecurityManagerRestoreDiskLabel(driver->securityManager, + vm->def, + disk); +} diff --git a/src/qemu/qemu_security.h b/src/qemu/qemu_security.h index 88c53e7..e3324ca 100644 --- a/src/qemu/qemu_security.h +++ b/src/qemu/qemu_security.h @@ -36,4 +36,12 @@ int qemuSecuritySetAllLabel(virQEMUDriverPtr driver, void qemuSecurityRestoreAllLabel(virQEMUDriverPtr driver, virDomainObjPtr vm, bool migrated); + +int qemuSecuritySetDiskLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk); + +int qemuSecurityRestoreDiskLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk); #endif /* __QEMU_SECURITY_H__ */ -- 2.8.4

When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 159 ++++++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_domain.h | 8 +++ src/qemu/qemu_hotplug.c | 48 +++++++++----- src/qemu/qemu_security.c | 38 +++++++++++ src/qemu/qemu_security.h | 8 +++ 5 files changed, 246 insertions(+), 15 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index a1edde9..ac09b2c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7364,6 +7364,13 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, def->src->path = tmpsrc; } break; + case VIR_DOMAIN_DEVICE_HOSTDEV: { + virDomainHostdevDefPtr def = data->devDef->data.hostdev; + if (virSecurityManagerSetHostdevLabel(data->driver->securityManager, + data->vm->def, def, NULL) < 0) + goto cleanup; + } break; + case VIR_DOMAIN_DEVICE_NONE: case VIR_DOMAIN_DEVICE_LEASE: case VIR_DOMAIN_DEVICE_FS: @@ -7371,7 +7378,6 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, case VIR_DOMAIN_DEVICE_INPUT: case VIR_DOMAIN_DEVICE_SOUND: case VIR_DOMAIN_DEVICE_VIDEO: - case VIR_DOMAIN_DEVICE_HOSTDEV: case VIR_DOMAIN_DEVICE_WATCHDOG: case VIR_DOMAIN_DEVICE_CONTROLLER: case VIR_DOMAIN_DEVICE_GRAPHICS: @@ -7451,6 +7457,91 @@ qemuDomainAttachDeviceMknod(virQEMUDriverPtr driver, } +static int +qemuDomainDetachDeviceUnlinkHelper(pid_t pid ATTRIBUTE_UNUSED, + void *opaque) +{ + const char *path = opaque; + + VIR_DEBUG("Unlinking %s", path); + if (unlink(path) < 0 && errno != ENOENT) { + virReportSystemError(errno, + _("Unable to remove device %s"), path); + return -1; + } + + return 0; +} + + +static int +qemuDomainDetachDeviceUnlink(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDeviceDefPtr dev, + const char *file) +{ + /* Technically, this is not needed. Yet. But in the future + * security managers might do some reference counting over + * Set/Restore label and thus for every SetLabel() there + * should be corresponding RestoreLabel(). */ + switch ((virDomainDeviceType) dev->type) { + case VIR_DOMAIN_DEVICE_DISK: { + virDomainDiskDefPtr def = dev->data.disk; + char *tmpsrc = def->src->path; + def->src->path = (char *) file; + if (virSecurityManagerRestoreDiskLabel(driver->securityManager, + vm->def, def) < 0) { + def->src->path = tmpsrc; + return -1; + } + def->src->path = tmpsrc; + } break; + + case VIR_DOMAIN_DEVICE_HOSTDEV: { + virDomainHostdevDefPtr def = dev->data.hostdev; + if (virSecurityManagerRestoreHostdevLabel(driver->securityManager, + vm->def, def, NULL) < 0) + return -1; + } break; + + case VIR_DOMAIN_DEVICE_NONE: + case VIR_DOMAIN_DEVICE_LEASE: + case VIR_DOMAIN_DEVICE_FS: + case VIR_DOMAIN_DEVICE_NET: + case VIR_DOMAIN_DEVICE_INPUT: + case VIR_DOMAIN_DEVICE_SOUND: + case VIR_DOMAIN_DEVICE_VIDEO: + case VIR_DOMAIN_DEVICE_WATCHDOG: + case VIR_DOMAIN_DEVICE_CONTROLLER: + case VIR_DOMAIN_DEVICE_GRAPHICS: + case VIR_DOMAIN_DEVICE_HUB: + case VIR_DOMAIN_DEVICE_REDIRDEV: + case VIR_DOMAIN_DEVICE_SMARTCARD: + case VIR_DOMAIN_DEVICE_CHR: + case VIR_DOMAIN_DEVICE_MEMBALLOON: + case VIR_DOMAIN_DEVICE_NVRAM: + case VIR_DOMAIN_DEVICE_RNG: + case VIR_DOMAIN_DEVICE_SHMEM: + case VIR_DOMAIN_DEVICE_TPM: + case VIR_DOMAIN_DEVICE_PANIC: + case VIR_DOMAIN_DEVICE_MEMORY: + case VIR_DOMAIN_DEVICE_IOMMU: + case VIR_DOMAIN_DEVICE_LAST: + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unexpected device type %d"), + dev->type); + return -1; + } + + if (virProcessRunInMountNamespace(vm->pid, + qemuDomainDetachDeviceUnlinkHelper, + (void *)file) < 0) + return -1; + + return 0; +} + + int qemuDomainNamespaceSetupDisk(virQEMUDriverPtr driver, virDomainObjPtr vm, @@ -7512,3 +7603,69 @@ qemuDomainNamespaceTeardownDisk(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, * I don't, therefore: */ return 0; } + + +int +qemuDomainNamespaceSetupHostdev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainDeviceDef dev = {.type = VIR_DOMAIN_DEVICE_HOSTDEV, .data.hostdev = hostdev}; + int ret = -1; + char *path = NULL; + + if (!priv->containerized) + return 0; + + if (qemuDomainGetHostdevPath(hostdev, &path) < 0) + goto cleanup; + + if (!path) { + /* There's no /dev device that we need to create. Claim success. */ + ret = 0; + goto cleanup; + } + + if (qemuDomainAttachDeviceMknod(driver, + vm, + &dev, + path) < 0) + goto cleanup; + ret = 0; + cleanup: + VIR_FREE(path); + return ret; +} + + +int +qemuDomainNamespaceTeardownHostdev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainDeviceDef dev = {.type = VIR_DOMAIN_DEVICE_HOSTDEV, .data.hostdev = hostdev}; + int ret = -1; + char *path = NULL; + + if (!priv->containerized) + return 0; + + if (qemuDomainGetHostdevPath(hostdev, &path) < 0) + goto cleanup; + + if (!path) { + /* There's no /dev device that we need to create. Claim success. */ + ret = 0; + goto cleanup; + } + + if (qemuDomainDetachDeviceUnlink(driver, vm, &dev, path) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(path); + return ret; +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 045d1ba..ef85ae4 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -803,4 +803,12 @@ int qemuDomainNamespaceSetupDisk(virQEMUDriverPtr driver, int qemuDomainNamespaceTeardownDisk(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainDiskDefPtr disk); + +int qemuDomainNamespaceSetupHostdev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev); + +int qemuDomainNamespaceTeardownHostdev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 9b47b5f..58b2439 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1384,6 +1384,7 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, bool releaseaddr = false; bool teardowncgroup = false; bool teardownlabel = false; + bool teardowndevice = false; int backend; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); unsigned int flags = 0; @@ -1435,12 +1436,15 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, } vm->def->hostdevs[--(vm->def->nhostdevs)] = NULL; + if (qemuDomainNamespaceSetupHostdev(driver, vm, hostdev) < 0) + goto error; + teardowndevice = true; + if (qemuSetupHostdevCgroup(vm, hostdev) < 0) goto error; teardowncgroup = true; - if (virSecurityManagerSetHostdevLabel(driver->securityManager, - vm->def, hostdev, NULL) < 0) + if (qemuSecuritySetHostdevLabel(driver, vm, hostdev) < 0) goto error; if (backend != VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) teardownlabel = true; @@ -1493,9 +1497,11 @@ qemuDomainAttachHostPCIDevice(virQEMUDriverPtr driver, if (teardowncgroup && qemuTeardownHostdevCgroup(vm, hostdev) < 0) VIR_WARN("Unable to remove host device cgroup ACL on hotplug fail"); if (teardownlabel && - virSecurityManagerRestoreHostdevLabel(driver->securityManager, - vm->def, hostdev, NULL) < 0) + qemuSecurityRestoreHostdevLabel(driver, vm, hostdev) < 0) VIR_WARN("Unable to restore host device labelling on hotplug fail"); + if (teardowndevice && + qemuDomainNamespaceTeardownHostdev(driver, vm, hostdev) < 0) + VIR_WARN("Unable to remove host device from /dev"); if (releaseaddr) qemuDomainReleaseDeviceAddress(vm, hostdev->info, NULL); @@ -2275,6 +2281,7 @@ qemuDomainAttachHostUSBDevice(virQEMUDriverPtr driver, bool added = false; bool teardowncgroup = false; bool teardownlabel = false; + bool teardowndevice = false; int ret = -1; if (priv->usbaddrs) { @@ -2288,12 +2295,15 @@ qemuDomainAttachHostUSBDevice(virQEMUDriverPtr driver, added = true; + if (qemuDomainNamespaceSetupHostdev(driver, vm, hostdev) < 0) + goto cleanup; + teardowndevice = true; + if (qemuSetupHostdevCgroup(vm, hostdev) < 0) goto cleanup; teardowncgroup = true; - if (virSecurityManagerSetHostdevLabel(driver->securityManager, - vm->def, hostdev, NULL) < 0) + if (qemuSecuritySetHostdevLabel(driver, vm, hostdev) < 0) goto cleanup; teardownlabel = true; @@ -2323,9 +2333,11 @@ qemuDomainAttachHostUSBDevice(virQEMUDriverPtr driver, if (teardowncgroup && qemuTeardownHostdevCgroup(vm, hostdev) < 0) VIR_WARN("Unable to remove host device cgroup ACL on hotplug fail"); if (teardownlabel && - virSecurityManagerRestoreHostdevLabel(driver->securityManager, - vm->def, hostdev, NULL) < 0) + qemuSecurityRestoreHostdevLabel(driver, vm, hostdev) < 0) VIR_WARN("Unable to restore host device labelling on hotplug fail"); + if (teardowndevice && + qemuDomainNamespaceTeardownHostdev(driver, vm, hostdev) < 0) + VIR_WARN("Unable to remove host device from /dev"); if (added) qemuHostdevReAttachUSBDevices(driver, vm->def->name, &hostdev, 1); if (releaseaddr) @@ -2351,6 +2363,7 @@ qemuDomainAttachHostSCSIDevice(virConnectPtr conn, char *drivealias = NULL; bool teardowncgroup = false; bool teardownlabel = false; + bool teardowndevice = false; bool driveAdded = false; if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_SCSI_GENERIC)) { @@ -2389,12 +2402,15 @@ qemuDomainAttachHostSCSIDevice(virConnectPtr conn, return -1; } + if (qemuDomainNamespaceSetupHostdev(driver, vm, hostdev) < 0) + goto cleanup; + teardowndevice = true; + if (qemuSetupHostdevCgroup(vm, hostdev) < 0) goto cleanup; teardowncgroup = true; - if (virSecurityManagerSetHostdevLabel(driver->securityManager, - vm->def, hostdev, NULL) < 0) + if (qemuSecuritySetHostdevLabel(driver, vm, hostdev) < 0) goto cleanup; teardownlabel = true; @@ -2441,9 +2457,11 @@ qemuDomainAttachHostSCSIDevice(virConnectPtr conn, if (teardowncgroup && qemuTeardownHostdevCgroup(vm, hostdev) < 0) VIR_WARN("Unable to remove host device cgroup ACL on hotplug fail"); if (teardownlabel && - virSecurityManagerRestoreHostdevLabel(driver->securityManager, - vm->def, hostdev, NULL) < 0) + qemuSecurityRestoreHostdevLabel(driver, vm, hostdev) < 0) VIR_WARN("Unable to restore host device labelling on hotplug fail"); + if (teardowndevice && + qemuDomainNamespaceTeardownHostdev(driver, vm, hostdev) < 0) + VIR_WARN("Unable to remove host device from /dev"); } VIR_FREE(drivealias); VIR_FREE(drvstr); @@ -3651,13 +3669,15 @@ qemuDomainRemoveHostDevice(virQEMUDriverPtr driver, virDomainAuditHostdev(vm, hostdev, "detach", true); if (!is_vfio && - virSecurityManagerRestoreHostdevLabel(driver->securityManager, - vm->def, hostdev, NULL) < 0) + qemuSecurityRestoreHostdevLabel(driver, vm, hostdev) < 0) VIR_WARN("Failed to restore host device labelling"); if (qemuTeardownHostdevCgroup(vm, hostdev) < 0) VIR_WARN("Failed to remove host device cgroup ACL"); + if (qemuDomainNamespaceTeardownHostdev(driver, vm, hostdev) < 0) + VIR_WARN("Unable to remove host device from /dev"); + switch ((virDomainHostdevSubsysType) hostdev->source.subsys.type) { case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: qemuDomainRemovePCIHostDevice(driver, vm, hostdev); diff --git a/src/qemu/qemu_security.c b/src/qemu/qemu_security.c index f800d9b..de616e8 100644 --- a/src/qemu/qemu_security.c +++ b/src/qemu/qemu_security.c @@ -168,3 +168,41 @@ qemuSecurityRestoreDiskLabel(virQEMUDriverPtr driver, vm->def, disk); } + + +int +qemuSecuritySetHostdevLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (priv->containerized) { + /* Already handled by namespace code. */ + return 0; + } + + return virSecurityManagerSetHostdevLabel(driver->securityManager, + vm->def, + hostdev, + NULL); +} + + +int +qemuSecurityRestoreHostdevLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (priv->containerized) { + /* Already handled by namespace code. */ + return 0; + } + + return virSecurityManagerRestoreHostdevLabel(driver->securityManager, + vm->def, + hostdev, + NULL); +} diff --git a/src/qemu/qemu_security.h b/src/qemu/qemu_security.h index e3324ca..cc373b3 100644 --- a/src/qemu/qemu_security.h +++ b/src/qemu/qemu_security.h @@ -44,4 +44,12 @@ int qemuSecuritySetDiskLabel(virQEMUDriverPtr driver, int qemuSecurityRestoreDiskLabel(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainDiskDefPtr disk); + +int qemuSecuritySetHostdevLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev); + +int qemuSecurityRestoreHostdevLabel(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainHostdevDefPtr hostdev); #endif /* __QEMU_SECURITY_H__ */ -- 2.8.4

When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_domain.h | 8 ++++++ src/qemu/qemu_hotplug.c | 10 ++++++++ 3 files changed, 82 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index ac09b2c..cc7ce2d 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7371,6 +7371,10 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, goto cleanup; } break; + case VIR_DOMAIN_DEVICE_CHR: + /* No labelling. */ + break; + case VIR_DOMAIN_DEVICE_NONE: case VIR_DOMAIN_DEVICE_LEASE: case VIR_DOMAIN_DEVICE_FS: @@ -7384,7 +7388,6 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, case VIR_DOMAIN_DEVICE_HUB: case VIR_DOMAIN_DEVICE_REDIRDEV: case VIR_DOMAIN_DEVICE_SMARTCARD: - case VIR_DOMAIN_DEVICE_CHR: case VIR_DOMAIN_DEVICE_MEMBALLOON: case VIR_DOMAIN_DEVICE_NVRAM: case VIR_DOMAIN_DEVICE_RNG: @@ -7504,6 +7507,10 @@ qemuDomainDetachDeviceUnlink(virQEMUDriverPtr driver, return -1; } break; + case VIR_DOMAIN_DEVICE_CHR: + /* No labelling. */ + break; + case VIR_DOMAIN_DEVICE_NONE: case VIR_DOMAIN_DEVICE_LEASE: case VIR_DOMAIN_DEVICE_FS: @@ -7517,7 +7524,6 @@ qemuDomainDetachDeviceUnlink(virQEMUDriverPtr driver, case VIR_DOMAIN_DEVICE_HUB: case VIR_DOMAIN_DEVICE_REDIRDEV: case VIR_DOMAIN_DEVICE_SMARTCARD: - case VIR_DOMAIN_DEVICE_CHR: case VIR_DOMAIN_DEVICE_MEMBALLOON: case VIR_DOMAIN_DEVICE_NVRAM: case VIR_DOMAIN_DEVICE_RNG: @@ -7669,3 +7675,59 @@ qemuDomainNamespaceTeardownHostdev(virQEMUDriverPtr driver, VIR_FREE(path); return ret; } + + +int +qemuDomainNamespaceSetupChardev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainChrDefPtr chr) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainDeviceDef dev = {.type = VIR_DOMAIN_DEVICE_CHR, .data.chr = chr}; + const char *path; + int ret = -1; + + if (!priv->containerized) + return 0; + + if (chr->source->type != VIR_DOMAIN_CHR_TYPE_DEV) + return 0; + + path = chr->source->data.file.path; + + if (qemuDomainAttachDeviceMknod(driver, + vm, + &dev, + path) < 0) + goto cleanup; + ret = 0; + cleanup: + return ret; +} + + +int +qemuDomainNamespaceTeardownChardev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainChrDefPtr chr) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainDeviceDef dev = {.type = VIR_DOMAIN_DEVICE_CHR, .data.chr = chr}; + int ret = -1; + const char *path = NULL; + + if (!priv->containerized) + return 0; + + if (chr->source->type != VIR_DOMAIN_CHR_TYPE_DEV) + return 0; + + path = chr->source->data.file.path; + + if (qemuDomainDetachDeviceUnlink(driver, vm, &dev, path) < 0) + goto cleanup; + + ret = 0; + cleanup: + return ret; +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index ef85ae4..37b1e41 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -811,4 +811,12 @@ int qemuDomainNamespaceSetupHostdev(virQEMUDriverPtr driver, int qemuDomainNamespaceTeardownHostdev(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainHostdevDefPtr hostdev); + +int qemuDomainNamespaceSetupChardev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainChrDefPtr chr); + +int qemuDomainNamespaceTeardownChardev(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainChrDefPtr chr); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 58b2439..4f012d0 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1843,6 +1843,7 @@ int qemuDomainAttachChrDevice(virConnectPtr conn, bool chardevAttached = false; bool tlsobjAdded = false; bool teardowncgroup = false; + bool teardowndevice = false; bool secobjAdded = false; virJSONValuePtr tlsProps = NULL; char *tlsAlias = NULL; @@ -1864,6 +1865,10 @@ int qemuDomainAttachChrDevice(virConnectPtr conn, if (rc == 1) need_release = true; + if (qemuDomainNamespaceSetupChardev(driver, vm, chr) < 0) + goto cleanup; + teardowndevice = true; + if (qemuSetupChardevCgroup(vm, chr) < 0) goto cleanup; teardowncgroup = true; @@ -1927,6 +1932,8 @@ int qemuDomainAttachChrDevice(virConnectPtr conn, qemuDomainReleaseDeviceAddress(vm, &chr->info, NULL); if (teardowncgroup && qemuTeardownChardevCgroup(vm, chr) < 0) VIR_WARN("Unable to remove chr device cgroup ACL on hotplug fail"); + if (teardowndevice && qemuDomainNamespaceTeardownChardev(driver, vm, chr) < 0) + VIR_WARN("Unable to remove chr device from /dev"); } VIR_FREE(tlsAlias); virJSONValueFree(tlsProps); @@ -3887,6 +3894,9 @@ qemuDomainRemoveChrDevice(virQEMUDriverPtr driver, if (qemuTeardownChardevCgroup(vm, chr) < 0) VIR_WARN("Failed to remove chr device cgroup ACL"); + if (qemuDomainNamespaceTeardownChardev(driver, vm, chr) < 0) + VIR_WARN("Unable to remove chr device from /dev"); + event = virDomainEventDeviceRemovedNewFromObj(vm, chr->info.alias); qemuDomainEventQueue(driver, event); -- 2.8.4

When attaching a device to a domain that's using separate mount namespace we must maintain /dev entries in order for qemu process to see them. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_domain.h | 8 ++++++ src/qemu/qemu_hotplug.c | 10 +++++++ 3 files changed, 88 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index cc7ce2d..0b46336 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7372,6 +7372,7 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, } break; case VIR_DOMAIN_DEVICE_CHR: + case VIR_DOMAIN_DEVICE_RNG: /* No labelling. */ break; @@ -7390,7 +7391,6 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, case VIR_DOMAIN_DEVICE_SMARTCARD: case VIR_DOMAIN_DEVICE_MEMBALLOON: case VIR_DOMAIN_DEVICE_NVRAM: - case VIR_DOMAIN_DEVICE_RNG: case VIR_DOMAIN_DEVICE_SHMEM: case VIR_DOMAIN_DEVICE_TPM: case VIR_DOMAIN_DEVICE_PANIC: @@ -7508,6 +7508,7 @@ qemuDomainDetachDeviceUnlink(virQEMUDriverPtr driver, } break; case VIR_DOMAIN_DEVICE_CHR: + case VIR_DOMAIN_DEVICE_RNG: /* No labelling. */ break; @@ -7526,7 +7527,6 @@ qemuDomainDetachDeviceUnlink(virQEMUDriverPtr driver, case VIR_DOMAIN_DEVICE_SMARTCARD: case VIR_DOMAIN_DEVICE_MEMBALLOON: case VIR_DOMAIN_DEVICE_NVRAM: - case VIR_DOMAIN_DEVICE_RNG: case VIR_DOMAIN_DEVICE_SHMEM: case VIR_DOMAIN_DEVICE_TPM: case VIR_DOMAIN_DEVICE_PANIC: @@ -7731,3 +7731,71 @@ qemuDomainNamespaceTeardownChardev(virQEMUDriverPtr driver, cleanup: return ret; } + + +int +qemuDomainNamespaceSetupRNG(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainRNGDefPtr rng) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainDeviceDef dev = {.type = VIR_DOMAIN_DEVICE_RNG, .data.rng = rng}; + const char *path = NULL; + int ret = -1; + + if (!priv->containerized) + return 0; + + switch ((virDomainRNGBackend) rng->backend) { + case VIR_DOMAIN_RNG_BACKEND_RANDOM: + path = rng->source.file; + break; + + case VIR_DOMAIN_RNG_BACKEND_EGD: + case VIR_DOMAIN_RNG_BACKEND_LAST: + ret = 0; + goto cleanup; + } + + if (qemuDomainAttachDeviceMknod(driver, + vm, + &dev, + path) < 0) + goto cleanup; + ret = 0; + cleanup: + return ret; +} + + +int +qemuDomainNamespaceTeardownRNG(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainRNGDefPtr rng) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainDeviceDef dev = {.type = VIR_DOMAIN_DEVICE_RNG, .data.rng = rng}; + int ret = -1; + const char *path = NULL; + + if (!priv->containerized) + return 0; + + switch ((virDomainRNGBackend) rng->backend) { + case VIR_DOMAIN_RNG_BACKEND_RANDOM: + path = rng->source.file; + break; + + case VIR_DOMAIN_RNG_BACKEND_EGD: + case VIR_DOMAIN_RNG_BACKEND_LAST: + ret = 0; + goto cleanup; + } + + if (qemuDomainDetachDeviceUnlink(driver, vm, &dev, path) < 0) + goto cleanup; + + ret = 0; + cleanup: + return ret; +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 37b1e41..78d00de 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -819,4 +819,12 @@ int qemuDomainNamespaceSetupChardev(virQEMUDriverPtr driver, int qemuDomainNamespaceTeardownChardev(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainChrDefPtr chr); + +int qemuDomainNamespaceSetupRNG(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainRNGDefPtr rng); + +int qemuDomainNamespaceTeardownRNG(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainRNGDefPtr rng); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 4f012d0..61f7c5a 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1980,6 +1980,7 @@ qemuDomainAttachRNGDevice(virConnectPtr conn, char *secAlias = NULL; bool releaseaddr = false; bool teardowncgroup = false; + bool teardowndevice = false; bool chardevAdded = false; bool objAdded = false; bool tlsobjAdded = false; @@ -2025,6 +2026,10 @@ qemuDomainAttachRNGDevice(virConnectPtr conn, goto cleanup; } + if (qemuDomainNamespaceSetupRNG(driver, vm, rng) < 0) + goto cleanup; + teardowndevice = true; + if (qemuSetupRNGCgroup(vm, rng) < 0) goto cleanup; teardowncgroup = true; @@ -2111,6 +2116,8 @@ qemuDomainAttachRNGDevice(virConnectPtr conn, qemuDomainReleaseDeviceAddress(vm, &rng->info, NULL); if (teardowncgroup && qemuTeardownRNGCgroup(vm, rng) < 0) VIR_WARN("Unable to remove RNG device cgroup ACL on hotplug fail"); + if (teardowndevice && qemuDomainNamespaceTeardownRNG(driver, vm, rng) < 0) + VIR_WARN("Unable to remove chr device from /dev"); } VIR_FREE(tlsAlias); @@ -3975,6 +3982,9 @@ qemuDomainRemoveRNGDevice(virQEMUDriverPtr driver, if (qemuTeardownRNGCgroup(vm, rng) < 0) VIR_WARN("Failed to remove RNG device cgroup ACL"); + if (qemuDomainNamespaceTeardownRNG(driver, vm, rng) < 0) + VIR_WARN("Unable to remove RNG device from /dev"); + event = virDomainEventDeviceRemovedNewFromObj(vm, rng->info.alias); qemuDomainEventQueue(driver, event); -- 2.8.4

Given how intrusive previous patches are, it might happen that there's a bug or imperfection. Lets give users a way out: if they set 'containerize' to false in qemu.conf the feature is suppressed. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 8 ++++++++ src/qemu/qemu_conf.c | 5 +++++ src/qemu/qemu_conf.h | 2 ++ src/qemu/qemu_domain.c | 3 ++- src/qemu/test_libvirtd_qemu.aug.in | 1 + 6 files changed, 19 insertions(+), 1 deletion(-) diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug index f3cc9e6..5bd7f2f 100644 --- a/src/qemu/libvirtd_qemu.aug +++ b/src/qemu/libvirtd_qemu.aug @@ -70,6 +70,7 @@ module Libvirtd_qemu = | str_array_entry "cgroup_controllers" | str_array_entry "cgroup_device_acl" | int_entry "seccomp_sandbox" + | bool_entry "containerize" let save_entry = str_entry "save_image_format" | str_entry "dump_image_format" diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf index 2b2bd60..26308a3 100644 --- a/src/qemu/qemu.conf +++ b/src/qemu/qemu.conf @@ -665,3 +665,11 @@ # Defaults to 4 # #gluster_debug_level = 9 + +# To enhance security, QEMU driver is capable of mounting private +# devtmpfs for each domain started. This means qemu process is +# unable to see all the devices on the system, just those +# configured for the domain in question. Libvirt manages device +# entries throughout the domain lifetime. This is turned on by +# default. +#containerize = 1 diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index 9be5b60..5578edd 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -314,6 +314,8 @@ virQEMUDriverConfigPtr virQEMUDriverConfigNew(bool privileged) cfg->glusterDebugLevel = 4; cfg->stdioLogD = true; + cfg->containerize = true; + #ifdef DEFAULT_LOADER_NVRAM if (virFirmwareParseList(DEFAULT_LOADER_NVRAM, &cfg->firmwares, @@ -798,6 +800,9 @@ int virQEMUDriverConfigLoadFile(virQEMUDriverConfigPtr cfg, if (virConfGetValueUInt(conf, "gluster_debug_level", &cfg->glusterDebugLevel) < 0) goto cleanup; + if (virConfGetValueBool(conf, "containerize", &cfg->containerize) < 0) + goto cleanup; + ret = 0; cleanup: diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h index d191e10..a21bba4 100644 --- a/src/qemu/qemu_conf.h +++ b/src/qemu/qemu_conf.h @@ -90,6 +90,8 @@ struct _virQEMUDriverConfig { gid_t group; bool dynamicOwnership; + bool containerize; + int cgroupControllers; char **cgroupDeviceACL; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 0b46336..0719bf9 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7238,7 +7238,8 @@ qemuDomainCreateNamespace(virQEMUDriverPtr driver, return 0; #endif - if (!virQEMUDriverIsPrivileged(driver)) { + if (!cfg->containerize || + !virQEMUDriverIsPrivileged(driver)) { ret = 0; goto cleanup; } diff --git a/src/qemu/test_libvirtd_qemu.aug.in b/src/qemu/test_libvirtd_qemu.aug.in index f586e95..7b8ebd2 100644 --- a/src/qemu/test_libvirtd_qemu.aug.in +++ b/src/qemu/test_libvirtd_qemu.aug.in @@ -91,3 +91,4 @@ module Test_libvirtd_qemu = } { "stdio_handler" = "logd" } { "gluster_debug_level" = "9" } +{ "containerize" = "1" } -- 2.8.4

On Thu, Nov 24, 2016 at 03:48:10PM +0100, Michal Privoznik wrote:
Given how intrusive previous patches are, it might happen that there's a bug or imperfection. Lets give users a way out: if they set 'containerize' to false in qemu.conf the feature is suppressed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 8 ++++++++ src/qemu/qemu_conf.c | 5 +++++ src/qemu/qemu_conf.h | 2 ++ src/qemu/qemu_domain.c | 3 ++- src/qemu/test_libvirtd_qemu.aug.in | 1 + 6 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug index f3cc9e6..5bd7f2f 100644 --- a/src/qemu/libvirtd_qemu.aug +++ b/src/qemu/libvirtd_qemu.aug @@ -70,6 +70,7 @@ module Libvirtd_qemu = | str_array_entry "cgroup_controllers" | str_array_entry "cgroup_device_acl" | int_entry "seccomp_sandbox" + | bool_entry "containerize"
let save_entry = str_entry "save_image_format" | str_entry "dump_image_format" diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf index 2b2bd60..26308a3 100644 --- a/src/qemu/qemu.conf +++ b/src/qemu/qemu.conf @@ -665,3 +665,11 @@ # Defaults to 4 # #gluster_debug_level = 9 + +# To enhance security, QEMU driver is capable of mounting private +# devtmpfs for each domain started. This means qemu process is +# unable to see all the devices on the system, just those +# configured for the domain in question. Libvirt manages device +# entries throughout the domain lifetime. This is turned on by +# default. +#containerize = 1
Similarly to my earlier question, I wonder if we're better off explicitly referring to the namespace we're actually using to make future enhancements simpler. eg allow either namespaces = [ "mount" ] or namespaces = [ ] so we can extend this to non-mount namespaces later if desired. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 24.11.2016 15:47, Michal Privoznik wrote:
Finally. This is full implementation of my RFC:
https://www.redhat.com/archives/libvir-list/2016-November/msg00691.html
The first two patches were posted separately, but since they lack review I'm sending them here too because they are important for the feature:
https://www.redhat.com/archives/libvir-list/2016-November/msg01060.html
There's a v2 for the first two patches: https://www.redhat.com/archives/libvir-list/2016-November/msg01415.html So you can drop the first two patches if you apply the rest on the top of them. Or disregard them completely (if you don't need to test hugepages enabled guest). It would be nice if we could get these in ASAP, as this development cycle is longer than others which gives us more time to test the patches. Michal
participants (2)
-
Daniel P. Berrange
-
Michal Privoznik