[libvirt] [PATCH 00/14] Avoid unsafe usage of /proc/$PID/root in LXC driver

This is a followup to Eric's original proposal https://www.redhat.com/archives/libvir-list/2013-December/msg01242.html The first 5 patches fix non-security bugs in the LXC hotplug code. Then there's a couple of helper patches. Finally the last 6 fix the actual security issue previously publically reported. Eric originally had a huge cleanup of virFork, but I'd prefer that be done afterwards, to minimize the backporting pain for stable branches. Daniel P. Berrange (13): Don't block use of USB with containers Fix path used for USB device attach with LXC Record hotplugged USB device in LXC live guest config Fix reset of cgroup when detaching USB device from LXC guests Disks are always block devices, never character devices Move check for cgroup devices ACL upfront in LXC hotplug Add virFileMakeParentPath helper function Add helper for running code in separate namespaces Avoid unsafe use of /proc/$PID/root in LXC disk hotplug Avoid unsafe use of /proc/$PID/root in LXC USB hotplug Avoid unsafe use of /proc/$PID/root in LXC block hostdev hotplug Avoid unsafe use of /proc/$PID/root in LXC chardev hostdev hotplug Avoid unsafe use of /proc/$PID/root in LXC hotunplug code Eric Blake (1): Avoid unsafe use of /proc/$PID/root in LXC shutdown/reboot code src/conf/domain_conf.c | 1 + src/libvirt_private.syms | 2 + src/lxc/lxc_driver.c | 552 +++++++++++++++++++++++------------------------ src/util/virfile.c | 27 +++ src/util/virfile.h | 1 + src/util/virinitctl.c | 26 +-- src/util/virinitctl.h | 5 +- src/util/virprocess.c | 114 ++++++++++ src/util/virprocess.h | 11 + 9 files changed, 442 insertions(+), 297 deletions(-) -- 1.8.5.3

virDomainDefCompatibleDevice blocks use of USB if no USB controller is present. This is not correct for containers since devices can be assigned directly regardless of any controllers. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/conf/domain_conf.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 512fe51..61e69e1 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -17582,6 +17582,7 @@ virDomainDefCompatibleDevice(virDomainDefPtr def, virDomainDeviceDefPtr dev) { if (!virDomainDefHasUSB(def) && + STRNEQ(def->os.type, "exe") && virDomainDeviceIsUSB(dev)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("Device configuration is not compatible: " -- 1.8.5.3

On 02/07/2014 08:32 AM, Daniel P. Berrange wrote:
virDomainDefCompatibleDevice blocks use of USB if no USB controller is present. This is not correct for containers since devices can be assigned directly regardless of any controllers.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/conf/domain_conf.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 512fe51..61e69e1 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -17582,6 +17582,7 @@ virDomainDefCompatibleDevice(virDomainDefPtr def, virDomainDeviceDefPtr dev) { if (!virDomainDefHasUSB(def) && + STRNEQ(def->os.type, "exe") &&
Elsewhere we alternate between STREQ_NULLABLE (virDomainDeviceInfoIterateInternal) and mandating os.type (virDomainDefPostParseInternal). Which is right? Can os.type validly be NULL? Depending on the answer, you may need STREQ_NULLABLE here, or a followup patches to the other places; but with that fixed, ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Fri, Feb 07, 2014 at 08:50:19AM -0700, Eric Blake wrote:
On 02/07/2014 08:32 AM, Daniel P. Berrange wrote:
virDomainDefCompatibleDevice blocks use of USB if no USB controller is present. This is not correct for containers since devices can be assigned directly regardless of any controllers.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/conf/domain_conf.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 512fe51..61e69e1 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -17582,6 +17582,7 @@ virDomainDefCompatibleDevice(virDomainDefPtr def, virDomainDeviceDefPtr dev) { if (!virDomainDefHasUSB(def) && + STRNEQ(def->os.type, "exe") &&
Elsewhere we alternate between STREQ_NULLABLE (virDomainDeviceInfoIterateInternal) and mandating os.type (virDomainDefPostParseInternal). Which is right? Can os.type validly be NULL?
Depending on the answer, you may need STREQ_NULLABLE here, or a followup patches to the other places; but with that fixed, ACK.
In virDomainDefPostParseInternal we have: if (!def->os.type) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("hypervisor type must be specified")); return -1; } So I believe we're fine here. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

The LXC code missed the 'usb' component out of the path /dev/bus/usb/$BUSNUM/$DEVNUM, so it failed to actually setup cgroups for the device. This was in fact lucky because the call to virLXCSetupHostUsbDeviceCgroup was also mistakenly passing '&priv->cgroup' instead of just 'priv->cgroup'. So once the path is fixed, libvirtd would then crash trying to access the bogus virCgroupPtr pointer. This would have been a security issue, were it not for the bogus path preventing the pointer reference being reached. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index 687046e..cd48be8 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -3903,7 +3903,7 @@ lxcDomainAttachDeviceHostdevSubsysUSBLive(virLXCDriverPtr driver, (unsigned long long)priv->initpid) < 0) goto cleanup; - if (virAsprintf(&dstdir, "%s/dev/bus/%03d", + if (virAsprintf(&dstdir, "%s/dev/bus/usb/%03d", vroot, def->source.subsys.u.usb.bus) < 0) goto cleanup; @@ -3968,7 +3968,7 @@ lxcDomainAttachDeviceHostdevSubsysUSBLive(virLXCDriverPtr driver, if (virUSBDeviceFileIterate(usb, virLXCSetupHostUsbDeviceCgroup, - &priv->cgroup) < 0) + priv->cgroup) < 0) goto cleanup; ret = 0; -- 1.8.5.3

On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
The LXC code missed the 'usb' component out of the path /dev/bus/usb/$BUSNUM/$DEVNUM, so it failed to actually setup cgroups for the device. This was in fact lucky because the call to virLXCSetupHostUsbDeviceCgroup was also mistakenly passing '&priv->cgroup' instead of just 'priv->cgroup'. So once the path is fixed, libvirtd would then crash trying to access the bogus virCgroupPtr pointer. This would have been a security issue, were it not for the bogus path preventing the pointer reference being reached.
One bug masking another. Eww.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

After hotplugging a USB device, the LXC driver forgot to add the device def to the virDomainDefPtr. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index cd48be8..1c7abb9 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -3943,6 +3943,9 @@ lxcDomainAttachDeviceHostdevSubsysUSBLive(virLXCDriverPtr driver, mode = 0700 | S_IFCHR; + if (VIR_REALLOC_N(vm->def->hostdevs, vm->def->nhostdevs+1) < 0) + goto cleanup; + if (virFileMakePath(dstdir) < 0) { virReportSystemError(errno, _("Unable to create %s"), dstdir); @@ -3971,6 +3974,8 @@ lxcDomainAttachDeviceHostdevSubsysUSBLive(virLXCDriverPtr driver, priv->cgroup) < 0) goto cleanup; + vm->def->hostdevs[vm->def->nhostdevs++] = def; + ret = 0; cleanup: -- 1.8.5.3

On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
After hotplugging a USB device, the LXC driver forgot to add the device def to the virDomainDefPtr.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 5 +++++ 1 file changed, 5 insertions(+)
ACK.
diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index cd48be8..1c7abb9 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -3943,6 +3943,9 @@ lxcDomainAttachDeviceHostdevSubsysUSBLive(virLXCDriverPtr driver,
mode = 0700 | S_IFCHR;
+ if (VIR_REALLOC_N(vm->def->hostdevs, vm->def->nhostdevs+1) < 0)
Space around + -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

When detaching a USB device from an LXC guest we must remove the device from the cgroup ACL. Unfortunately we were telling the cgroup code to use the guest /dev path, not the host /dev path, and the guest device node had already been unlinked. This was, however, fortunate since the code passed &priv->cgroup instead of priv->cgroup, so would have crash if the device node were accessible. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index 1c7abb9..b9da8ef 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -4477,7 +4477,7 @@ lxcDomainDetachDeviceHostdevUSBLive(virLXCDriverPtr driver, } if (!(usb = virUSBDeviceNew(def->source.subsys.u.usb.bus, - def->source.subsys.u.usb.device, vroot))) + def->source.subsys.u.usb.device, NULL))) goto cleanup; VIR_DEBUG("Unlinking %s", dst); @@ -4491,7 +4491,7 @@ lxcDomainDetachDeviceHostdevUSBLive(virLXCDriverPtr driver, if (virUSBDeviceFileIterate(usb, virLXCTeardownHostUsbDeviceCgroup, - &priv->cgroup) < 0) + priv->cgroup) < 0) VIR_WARN("cannot deny device %s for domain %s", dst, vm->def->name); -- 1.8.5.3

On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
When detaching a USB device from an LXC guest we must remove the device from the cgroup ACL. Unfortunately we were telling the cgroup code to use the guest /dev path, not the host /dev path, and the guest device node had already been unlinked. This was, however, fortunate since the code passed &priv->cgroup instead of priv->cgroup, so would have crash if the device node were accessible.
Another case of one bug masking another. Wow.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

The LXC disk hotplug code was allowing block or character devices to be given as disk. A disk is always a block device. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index b9da8ef..79a9e36 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -3668,9 +3668,9 @@ lxcDomainAttachDeviceDiskLive(virLXCDriverPtr driver, goto cleanup; } - if (!S_ISCHR(sb.st_mode) && !S_ISBLK(sb.st_mode)) { + if (!S_ISBLK(sb.st_mode)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("Disk source %s must be a character/block device"), + _("Disk source %s must be a block device"), def->src); goto cleanup; } @@ -3682,11 +3682,7 @@ lxcDomainAttachDeviceDiskLive(virLXCDriverPtr driver, if (VIR_REALLOC_N(vm->def->disks, vm->def->ndisks+1) < 0) goto cleanup; - mode = 0700; - if (S_ISCHR(sb.st_mode)) - mode |= S_IFCHR; - else - mode |= S_IFBLK; + mode = 0700 | S_IFBLK; /* Yes, the device name we're creating may not * actually correspond to the major:minor number -- 1.8.5.3

On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
The LXC disk hotplug code was allowing block or character devices to be given as disk. A disk is always a block device.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-)
ACK -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

The check for whether the cgroup devices ACL is available is done quite late during LXC hotplug - in fact after the device node is already created in the container in some cases. Better todo it upfront so we fail immediately. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 36 ++++++++++++------------------------ 1 file changed, 12 insertions(+), 24 deletions(-) diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index 79a9e36..51ca98c 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -3645,6 +3645,12 @@ lxcDomainAttachDeviceDiskLive(virLXCDriverPtr driver, goto cleanup; } + if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("devices cgroup isn't mounted")); + goto cleanup; + } + if (def->type != VIR_DOMAIN_DISK_TYPE_BLOCK) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("Can't setup disk for non-block device")); @@ -3712,12 +3718,6 @@ lxcDomainAttachDeviceDiskLive(virLXCDriverPtr driver, vm->def, def) < 0) goto cleanup; - if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) { - virReportError(VIR_ERR_OPERATION_INVALID, "%s", - _("devices cgroup isn't mounted")); - goto cleanup; - } - if (virCgroupAllowDevicePath(priv->cgroup, def->src, (def->readonly ? VIR_CGROUP_DEVICE_READ : @@ -3914,12 +3914,6 @@ lxcDomainAttachDeviceHostdevSubsysUSBLive(virLXCDriverPtr driver, def->source.subsys.u.usb.device) < 0) goto cleanup; - if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) { - virReportError(VIR_ERR_OPERATION_INVALID, "%s", - _("devices cgroup isn't mounted")); - goto cleanup; - } - if (!(usb = virUSBDeviceNew(def->source.subsys.u.usb.bus, def->source.subsys.u.usb.device, vroot))) goto cleanup; @@ -4067,12 +4061,6 @@ lxcDomainAttachDeviceHostdevStorageLive(virLXCDriverPtr driver, vm->def, def, vroot) < 0) goto cleanup; - if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) { - virReportError(VIR_ERR_OPERATION_INVALID, "%s", - _("devices cgroup isn't mounted")); - goto cleanup; - } - if (virCgroupAllowDevicePath(priv->cgroup, def->source.caps.u.storage.block, VIR_CGROUP_DEVICE_RW | VIR_CGROUP_DEVICE_MKNOD) != 0) { @@ -4175,12 +4163,6 @@ lxcDomainAttachDeviceHostdevMiscLive(virLXCDriverPtr driver, vm->def, def, vroot) < 0) goto cleanup; - if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) { - virReportError(VIR_ERR_OPERATION_INVALID, "%s", - _("devices cgroup isn't mounted")); - goto cleanup; - } - if (virCgroupAllowDevicePath(priv->cgroup, def->source.caps.u.misc.chardev, VIR_CGROUP_DEVICE_RW | VIR_CGROUP_DEVICE_MKNOD) != 0) { @@ -4256,6 +4238,12 @@ lxcDomainAttachDeviceHostdevLive(virLXCDriverPtr driver, return -1; } + if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("devices cgroup isn't mounted")); + return -1; + } + switch (dev->data.hostdev->mode) { case VIR_DOMAIN_HOSTDEV_MODE_SUBSYS: return lxcDomainAttachDeviceHostdevSubsysLive(driver, vm, dev); -- 1.8.5.3

On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
The check for whether the cgroup devices ACL is available is done quite late during LXC hotplug - in fact after the device node is already created in the container in some cases. Better todo it upfront so we fail immediately.
s/todo/to do/
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 36 ++++++++++++------------------------ 1 file changed, 12 insertions(+), 24 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Add a helper function which takes a file path and ensures that all directory components leading upto the file exist. IOW, it strips the filename part of the path and passes the result to virFileMakePath. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virfile.c | 27 +++++++++++++++++++++++++++ src/util/virfile.h | 1 + 3 files changed, 29 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 2c9536a..5554a30 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1225,6 +1225,7 @@ virFileIsMountPoint; virFileLinkPointsTo; virFileLock; virFileLoopDeviceAssociate; +virFileMakeParentPath; virFileMakePath; virFileMakePathWithMode; virFileMatchesNameSuffix; diff --git a/src/util/virfile.c b/src/util/virfile.c index 631cd06..92a8621 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -2340,6 +2340,33 @@ cleanup: return ret; } + +int +virFileMakeParentPath(const char *path) +{ + char *p; + char *tmp; + int ret; + + VIR_DEBUG("path=%s", path); + + if (VIR_STRDUP(tmp, path) < 0) { + errno = ENOMEM; + return -1; + } + + if ((p = strrchr(tmp, '/')) == NULL) { + errno = EINVAL; + return -1; + } + *p = '\0'; + + ret = virFileMakePathHelper(tmp, 0777); + VIR_FREE(tmp); + return ret; +} + + /* Build up a fully qualified path for a config file to be * associated with a persistent guest or network */ char * diff --git a/src/util/virfile.h b/src/util/virfile.h index 0d20cdb..20baf6f 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -198,6 +198,7 @@ int virDirCreate(const char *path, mode_t mode, uid_t uid, gid_t gid, int virFileMakePath(const char *path) ATTRIBUTE_RETURN_CHECK; int virFileMakePathWithMode(const char *path, mode_t mode) ATTRIBUTE_RETURN_CHECK; +int virFileMakeParentPath(const char *path) ATTRIBUTE_RETURN_CHECK; char *virFileBuildPath(const char *dir, const char *name, -- 1.8.5.3

On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
Add a helper function which takes a file path and ensures that all directory components leading upto the file exist.
s/upto/up to/
IOW, it strips the filename part of the path and passes the result to virFileMakePath.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virfile.c | 27 +++++++++++++++++++++++++++ src/util/virfile.h | 1 + 3 files changed, 29 insertions(+)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Implement virProcessRunInMountNamespace, which runs callback of type virProcessNamespaceCallback in a container namespace. This uses a child process to run the callback, since you can't change the mount namespace of a thread. This implies that callbacks have to be careful about what code they run due to async safety rules. Idea by Dan Berrange, based on an initial report by Reco <recoverym4n@gmail.com> at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394 Signed-off-by: Daniel Berrange <berrange@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virprocess.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virprocess.h | 11 +++++ 3 files changed, 126 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 5554a30..3066506 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1670,6 +1670,7 @@ virProcessGetNamespaces; virProcessGetStartTime; virProcessKill; virProcessKillPainfully; +virProcessRunInMountNamespace; virProcessSetAffinity; virProcessSetMaxFiles; virProcessSetMaxMemLock; diff --git a/src/util/virprocess.c b/src/util/virprocess.c index 83d0679..a0cbfbf 100644 --- a/src/util/virprocess.c +++ b/src/util/virprocess.c @@ -50,6 +50,8 @@ #include "virlog.h" #include "virutil.h" #include "virstring.h" +#include "virthread.h" +#include "vircommand.h" #define VIR_FROM_THIS VIR_FROM_NONE @@ -877,3 +879,115 @@ int virProcessGetStartTime(pid_t pid, return 0; } #endif + + +#ifdef HAVE_SETNS +static int virProcessNamespaceHelper(int errfd, + pid_t pid, + virProcessNamespaceCallback cb, + void *opaque) +{ + char *path; + int fd = -1; + int ret = -1; + + if (virAsprintf(&path, "/proc/%llu/ns/mnt", (unsigned long long)pid) < 0) + goto cleanup; + + if ((fd = open(path, O_RDONLY)) < 0) { + virReportSystemError(errno, "%s", + _("Kernel does not provide mount namespace")); + goto cleanup; + } + + if (setns(fd, 0) < 0) { + virReportSystemError(errno, "%s", + _("Unable to enter mount namespace")); + goto cleanup; + } + + ret = cb(pid, opaque); + + cleanup: + if (ret < 0) { + virErrorPtr err = virGetLastError(); + if (err) { + size_t len = strlen(err->message) + 1; + ignore_value(safewrite(errfd, err->message, len)); + } + } + VIR_FREE(path); + VIR_FORCE_CLOSE(fd); + return ret; +} + +/* Run cb(opaque) in the mount namespace of pid. Return -1 with error + * message raised if we fail to run the child, if the child dies from + * a signal, or if the child has status EXIT_CANCELED; otherwise + * return the exit status of the child. The callback will be run in a + * child process so must be careful to only use async signal safe + * functions. + */ +int +virProcessRunInMountNamespace(pid_t pid, + virProcessNamespaceCallback cb, + void *opaque) +{ + int ret = -1; + pid_t child = -1; + int errfd[2] = { -1, -1 }; + + if (pipe(errfd) < 0) { + virReportSystemError(errno, "%s", + _("Cannot create pipe for child")); + return -1; + } + + ret = virFork(&child); + + if (ret < 0 || child < 0) { + if (child == 0) + _exit(1); + else if (child > 0) + virProcessAbort(child); + goto cleanup; + } + + if (child == 0) { + VIR_FORCE_CLOSE(errfd[0]); + ret = virProcessNamespaceHelper(errfd[1], pid, + cb, opaque); + VIR_FORCE_CLOSE(errfd[1]); + _exit(ret < 0 ? 1 : 0); + } else { + int status; + char *buf = NULL; + VIR_FORCE_CLOSE(errfd[1]); + + ignore_value(virFileReadHeaderFD(errfd[0], 1024, &buf)); + if (virProcessWait(child, &status) < 0) + ret = -1; + else + ret = status == 0 ? 0 : -1; + if (ret < 0) + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + buf ? buf : _("Failed to run callback in mount namespace")); + VIR_FREE(buf); + } + +cleanup: + VIR_FORCE_CLOSE(errfd[0]); + VIR_FORCE_CLOSE(errfd[1]); + return ret; +} +#else /* !HAVE_SETNS */ +int +virProcessRunInMountNamespace(pid_t pid ATTRIBUTE_UNUSED, + virProcessNamespaceCallback cb ATTRIBUTE_UNUSED, + void *opaque ATTRIBUTE_UNUSED) +{ + virReportSystemError(ENOSYS, "%s", + _("Mount namespaces are not available on this platform")); + return -1; +} +#endif diff --git a/src/util/virprocess.h b/src/util/virprocess.h index 9f77bc5..75c7d1b 100644 --- a/src/util/virprocess.h +++ b/src/util/virprocess.h @@ -60,4 +60,15 @@ int virProcessSetNamespaces(size_t nfdlist, int virProcessSetMaxMemLock(pid_t pid, unsigned long long bytes); int virProcessSetMaxProcesses(pid_t pid, unsigned int procs); int virProcessSetMaxFiles(pid_t pid, unsigned int files); + +/* Callback to run code within the mount namespace tied to the given + * pid. This function must use only async-signal-safe functions, as + * it gets run after a fork of a multi-threaded process. The return + * value of this function is passed to _exit(), except that a + * negative value is treated as EXIT_CANCELED. */ +typedef int (*virProcessNamespaceCallback)(pid_t pid, void *opaque); + +int virProcessRunInMountNamespace(pid_t pid, + virProcessNamespaceCallback cb, + void *opaque); #endif /* __VIR_PROCESS_H__ */ -- 1.8.5.3

On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
Implement virProcessRunInMountNamespace, which runs callback of type virProcessNamespaceCallback in a container namespace. This uses a child process to run the callback, since you can't change the mount namespace of a thread. This implies that callbacks have to be careful about what code they run due to async safety rules.
Idea by Dan Berrange, based on an initial report by Reco <recoverym4n@gmail.com> at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394
Signed-off-by: Daniel Berrange <berrange@redhat.com>
Looks like you improved from my version, so it would be nice to add: Signed-off-by: Eric Blake <eblake@redhat.com>
+ +/* Run cb(opaque) in the mount namespace of pid. Return -1 with error + * message raised if we fail to run the child, if the child dies from + * a signal, or if the child has status EXIT_CANCELED; otherwise
Comment is wrong, since EXIT_CANCELED is not defined yet (it will be fixed when I rebase the virFork cleanups on top of this).
+ if (ret < 0 || child < 0) { + if (child == 0) + _exit(1);
Similarly, I prefer EXIT_FAILURE over the magic '1'. But I don't mind cleaning that up in my rebased virFork stuff.
+ else if (child > 0)
The else is redundant, since _exit() never returns.
+ if (child == 0) { + VIR_FORCE_CLOSE(errfd[0]); + ret = virProcessNamespaceHelper(errfd[1], pid, + cb, opaque); + VIR_FORCE_CLOSE(errfd[1]); + _exit(ret < 0 ? 1 : 0);
Again, more magic numbers I can clean up later.
+ + ignore_value(virFileReadHeaderFD(errfd[0], 1024, &buf)); + if (virProcessWait(child, &status) < 0) + ret = -1; + else + ret = status == 0 ? 0 : -1;
This can be simplified: ret = virProcessWait(child, NULL);
+ if (ret < 0) + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + buf ? buf : _("Failed to run callback in mount namespace"));
And this error message overwrites the error from virProcessWait.
+/* Callback to run code within the mount namespace tied to the given + * pid. This function must use only async-signal-safe functions, as + * it gets run after a fork of a multi-threaded process. The return + * value of this function is passed to _exit(), except that a + * negative value is treated as EXIT_CANCELED. */ +typedef int (*virProcessNamespaceCallback)(pid_t pid, void *opaque);
Again, the comment doesn't quite work with the rearranged patch order. My comments above are improvements, but I didn't spot any bugs in your implementation. ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Fri, Feb 07, 2014 at 10:31:03AM -0700, Eric Blake wrote:
On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
Implement virProcessRunInMountNamespace, which runs callback of type virProcessNamespaceCallback in a container namespace. This uses a child process to run the callback, since you can't change the mount namespace of a thread. This implies that callbacks have to be careful about what code they run due to async safety rules.
Idea by Dan Berrange, based on an initial report by Reco <recoverym4n@gmail.com> at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394
Signed-off-by: Daniel Berrange <berrange@redhat.com>
Looks like you improved from my version, so it would be nice to add:
Signed-off-by: Eric Blake <eblake@redhat.com>
Sure. What happened is that I started with your code, then I rewrote the method from scratch to use threads, then I found threads + mount namespace didn't work, so I rewrote it again to use fork again. Fun times :-) Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

From: Eric Blake <eblake@redhat.com> Use helper virProcessRunInMountNamespace in lxcDomainShutdownFlags and lxcDomainReboot. Otherwise, a malicious guest could use symlinks to force the host to manipulate the wrong file in the host's namespace. Idea by Dan Berrange, based on an initial report by Reco <recoverym4n@gmail.com> at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394 Signed-off-by: Eric Blake <eblake@redhat.com> --- src/lxc/lxc_driver.c | 38 ++++++++++++++++++++------------------ src/util/virinitctl.c | 26 ++++++++++---------------- src/util/virinitctl.h | 5 ++--- 3 files changed, 32 insertions(+), 37 deletions(-) diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index 51ca98c..96987cd 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -3307,12 +3307,20 @@ lxcConnectListAllDomains(virConnectPtr conn, static int +lxcDomainInitctlCallback(pid_t pid ATTRIBUTE_UNUSED, + void *opaque) +{ + int *command = opaque; + return virInitctlSetRunLevel(*command); +} + + +static int lxcDomainShutdownFlags(virDomainPtr dom, unsigned int flags) { virLXCDomainObjPrivatePtr priv; virDomainObjPtr vm; - char *vroot = NULL; int ret = -1; int rc; @@ -3339,16 +3347,14 @@ lxcDomainShutdownFlags(virDomainPtr dom, goto cleanup; } - if (virAsprintf(&vroot, "/proc/%llu/root", - (unsigned long long)priv->initpid) < 0) - goto cleanup; - if (flags == 0 || (flags & VIR_DOMAIN_SHUTDOWN_INITCTL)) { - if ((rc = virInitctlSetRunLevel(VIR_INITCTL_RUNLEVEL_POWEROFF, - vroot)) < 0) { + int command = VIR_INITCTL_RUNLEVEL_POWEROFF; + + if ((rc = virProcessRunInMountNamespace(priv->initpid, + lxcDomainInitctlCallback, + &command)) < 0) goto cleanup; - } if (rc == 0 && flags != 0 && ((flags & ~VIR_DOMAIN_SHUTDOWN_INITCTL) == 0)) { virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", @@ -3374,7 +3380,6 @@ lxcDomainShutdownFlags(virDomainPtr dom, ret = 0; cleanup: - VIR_FREE(vroot); if (vm) virObjectUnlock(vm); return ret; @@ -3386,13 +3391,13 @@ lxcDomainShutdown(virDomainPtr dom) return lxcDomainShutdownFlags(dom, 0); } + static int lxcDomainReboot(virDomainPtr dom, unsigned int flags) { virLXCDomainObjPrivatePtr priv; virDomainObjPtr vm; - char *vroot = NULL; int ret = -1; int rc; @@ -3419,16 +3424,14 @@ lxcDomainReboot(virDomainPtr dom, goto cleanup; } - if (virAsprintf(&vroot, "/proc/%llu/root", - (unsigned long long)priv->initpid) < 0) - goto cleanup; - if (flags == 0 || (flags & VIR_DOMAIN_REBOOT_INITCTL)) { - if ((rc = virInitctlSetRunLevel(VIR_INITCTL_RUNLEVEL_REBOOT, - vroot)) < 0) { + int command = VIR_INITCTL_RUNLEVEL_REBOOT; + + if ((rc = virProcessRunInMountNamespace(priv->initpid, + lxcDomainInitctlCallback, + &command)) < 0) goto cleanup; - } if (rc == 0 && flags != 0 && ((flags & ~VIR_DOMAIN_SHUTDOWN_INITCTL) == 0)) { virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", @@ -3454,7 +3457,6 @@ lxcDomainReboot(virDomainPtr dom, ret = 0; cleanup: - VIR_FREE(vroot); if (vm) virObjectUnlock(vm); return ret; diff --git a/src/util/virinitctl.c b/src/util/virinitctl.c index a85cc9c..579268f 100644 --- a/src/util/virinitctl.c +++ b/src/util/virinitctl.c @@ -111,16 +111,18 @@ struct virInitctlRequest { # endif /* - * Send a message to init to change the runlevel + * Send a message to init to change the runlevel. This function is + * asynchronous-signal-safe (thus safe to use after fork of a + * multithreaded parent) - which is good, because it should only be + * used after forking and entering correct namespace. * * Returns 1 on success, 0 if initctl does not exist, -1 on error */ -int virInitctlSetRunLevel(virInitctlRunLevel level, - const char *vroot) +int +virInitctlSetRunLevel(virInitctlRunLevel level) { struct virInitctlRequest req; int fd = -1; - char *path = NULL; int ret = -1; memset(&req, 0, sizeof(req)); @@ -131,36 +133,28 @@ int virInitctlSetRunLevel(virInitctlRunLevel level, /* Yes it is an 'int' field, but wants a numeric character. Go figure */ req.runlevel = '0' + level; - if (vroot) { - if (virAsprintf(&path, "%s/%s", vroot, VIR_INITCTL_FIFO) < 0) - return -1; - } else { - if (VIR_STRDUP(path, VIR_INITCTL_FIFO) < 0) - return -1; - } - - if ((fd = open(path, O_WRONLY|O_NONBLOCK|O_CLOEXEC|O_NOCTTY)) < 0) { + if ((fd = open(VIR_INITCTL_FIFO, + O_WRONLY|O_NONBLOCK|O_CLOEXEC|O_NOCTTY)) < 0) { if (errno == ENOENT) { ret = 0; goto cleanup; } virReportSystemError(errno, _("Cannot open init control %s"), - path); + VIR_INITCTL_FIFO); goto cleanup; } if (safewrite(fd, &req, sizeof(req)) != sizeof(req)) { virReportSystemError(errno, _("Failed to send request to init control %s"), - path); + VIR_INITCTL_FIFO); goto cleanup; } ret = 1; cleanup: - VIR_FREE(path); VIR_FORCE_CLOSE(fd); return ret; } diff --git a/src/util/virinitctl.h b/src/util/virinitctl.h index 862539f..18052c0 100644 --- a/src/util/virinitctl.h +++ b/src/util/virinitctl.h @@ -1,7 +1,7 @@ /* * virinitctl.h: API for talking to init systems via initctl * - * Copyright (C) 2012 Red Hat, Inc. + * Copyright (C) 2012-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -37,7 +37,6 @@ enum virInitctlRunLevel { VIR_INITCTL_RUNLEVEL_LAST }; -int virInitctlSetRunLevel(virInitctlRunLevel level, - const char *vroot); +int virInitctlSetRunLevel(virInitctlRunLevel level); #endif -- 1.8.5.3

On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
From: Eric Blake <eblake@redhat.com>
Use helper virProcessRunInMountNamespace in lxcDomainShutdownFlags and lxcDomainReboot. Otherwise, a malicious guest could use symlinks to force the host to manipulate the wrong file in the host's namespace.
Idea by Dan Berrange, based on an initial report by Reco <recoverym4n@gmail.com> at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394
Signed-off-by: Eric Blake <eblake@redhat.com> --- src/lxc/lxc_driver.c | 38 ++++++++++++++++++++------------------ src/util/virinitctl.c | 26 ++++++++++---------------- src/util/virinitctl.h | 5 ++--- 3 files changed, 32 insertions(+), 37 deletions(-)
I guess the fact that you reposted my patch serves as the ACK :)
+++ b/src/util/virinitctl.h @@ -1,7 +1,7 @@ /* * virinitctl.h: API for talking to init systems via initctl * - * Copyright (C) 2012 Red Hat, Inc. + * Copyright (C) 2012-2013 Red Hat, Inc.
Oh my - I wrote it that long ago. We can add 2014 here. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Rewrite lxcDomainAttachDeviceDiskLive function to use the virProcessRunInMountNamespace helper. This avoids risk of a malicious guest replacing /dev with a absolute symlink, tricking the driver into changing the host OS filesystem. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 183 ++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 139 insertions(+), 44 deletions(-) diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index 96987cd..a1d0e81 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -3627,6 +3627,115 @@ cleanup: } +struct lxcDomainAttachDeviceMknodData { + virLXCDriverPtr driver; + mode_t mode; + dev_t dev; + virDomainObjPtr vm; + virDomainDeviceDefPtr def; + char *file; +}; + +static int +lxcDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, + void *opaque) +{ + struct lxcDomainAttachDeviceMknodData *data = opaque; + int ret = -1; + + virSecurityManagerPostFork(data->driver->securityManager); + + if (virFileMakeParentPath(data->file) < 0) { + virReportSystemError(errno, + _("Unable to create %s"), data->file); + goto cleanup; + } + + + /* Yes, the device name we're creating may not + * actually correspond to the major:minor number + * we're using, but we've no other option at this + * time. Just have to hope that containerized apps + * don't get upset that the major:minor is different + * to that normally implied by the device name + */ + VIR_DEBUG("Creating dev %s (%d,%d)", + data->file, major(data->dev), minor(data->dev)); + if (mknod(data->file, data->mode, data->dev) < 0) { + virReportSystemError(errno, + _("Unable to create device %s"), + data->file); + goto cleanup; + } + + if (lxcContainerChown(data->vm->def, data->file) < 0) + goto cleanup; + + /* Labelling normally operates on src, but we need + * to actually label the dst here, so hack the config */ + switch (data->def->type) { + case VIR_DOMAIN_DEVICE_DISK: { + virDomainDiskDefPtr def = data->def->data.disk; + char *tmpsrc = def->src; + def->src = data->file; + if (virSecurityManagerSetImageLabel(data->driver->securityManager, + data->vm->def, def) < 0) { + def->src = tmpsrc; + goto cleanup; + } + def->src = tmpsrc; + } break; + + default: + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unexpected device type %d"), + data->def->type); + goto cleanup; + } + + ret = 0; + + cleanup: + if (ret < 0) + unlink(data->file); + return ret; +} + + +static int lxcDomainAttachDeviceMknod(virLXCDriverPtr driver, + mode_t mode, + dev_t dev, + virDomainObjPtr vm, + virDomainDeviceDefPtr def, + char *file) +{ + virLXCDomainObjPrivatePtr priv = vm->privateData; + struct lxcDomainAttachDeviceMknodData data; + + memset(&data, 0, sizeof(data)); + + data.driver = driver; + data.mode = mode; + data.dev = dev; + data.vm = vm; + data.def = def; + data.file = file; + + if (virSecurityManagerPreFork(driver->securityManager) < 0) + return -1; + + if (virProcessRunInMountNamespace(priv->initpid, + lxcDomainAttachDeviceMknodHelper, + &data) < 0) { + virSecurityManagerPostFork(driver->securityManager); + return -1; + } + + virSecurityManagerPostFork(driver->securityManager); + return 0; +} + + static int lxcDomainAttachDeviceDiskLive(virLXCDriverPtr driver, virDomainObjPtr vm, @@ -3635,11 +3744,8 @@ lxcDomainAttachDeviceDiskLive(virLXCDriverPtr driver, virLXCDomainObjPrivatePtr priv = vm->privateData; virDomainDiskDefPtr def = dev->data.disk; int ret = -1; - char *dst = NULL; struct stat sb; - bool created = false; - mode_t mode = 0; - char *tmpsrc = def->src; + char *file = NULL; if (!priv->initpid) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", @@ -3683,51 +3789,43 @@ lxcDomainAttachDeviceDiskLive(virLXCDriverPtr driver, goto cleanup; } - if (virAsprintf(&dst, "/proc/%llu/root/dev/%s", - (unsigned long long)priv->initpid, def->dst) < 0) - goto cleanup; - - if (VIR_REALLOC_N(vm->def->disks, vm->def->ndisks+1) < 0) + if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("devices cgroup isn't mounted")); goto cleanup; + } - mode = 0700 | S_IFBLK; - - /* Yes, the device name we're creating may not - * actually correspond to the major:minor number - * we're using, but we've no other option at this - * time. Just have to hope that containerized apps - * don't get upset that the major:minor is different - * to that normally implied by the device name - */ - VIR_DEBUG("Creating dev %s (%d,%d) from %s", - dst, major(sb.st_rdev), minor(sb.st_rdev), def->src); - if (mknod(dst, mode, sb.st_rdev) < 0) { - virReportSystemError(errno, - _("Unable to create device %s"), - dst); + if (virCgroupAllowDevice(priv->cgroup, + 'b', + major(sb.st_rdev), + minor(sb.st_rdev), + (def->readonly ? + VIR_CGROUP_DEVICE_READ : + VIR_CGROUP_DEVICE_RW) | + VIR_CGROUP_DEVICE_MKNOD) != 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot allow device %s for domain %s"), + def->src, vm->def->name); goto cleanup; } - if (lxcContainerChown(vm->def, dst) < 0) + if (VIR_REALLOC_N(vm->def->disks, vm->def->ndisks+1) < 0) goto cleanup; - created = true; - - /* Labelling normally operates on src, but we need - * to actually label the dst here, so hack the config */ - def->src = dst; - if (virSecurityManagerSetImageLabel(driver->securityManager, - vm->def, def) < 0) + if (virAsprintf(&file, + "/dev/%s", def->dst) < 0) goto cleanup; - if (virCgroupAllowDevicePath(priv->cgroup, def->src, - (def->readonly ? - VIR_CGROUP_DEVICE_READ : - VIR_CGROUP_DEVICE_RW) | - VIR_CGROUP_DEVICE_MKNOD) != 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot allow device %s for domain %s"), - def->src, vm->def->name); + if (lxcDomainAttachDeviceMknod(driver, + 0700 | S_IFBLK, + sb.st_rdev, + vm, + dev, + file) < 0) { + if (virCgroupDenyDevicePath(priv->cgroup, def->src, + VIR_CGROUP_DEVICE_RWM) < 0) + VIR_WARN("cannot deny device %s for domain %s", + def->src, vm->def->name); goto cleanup; } @@ -3736,11 +3834,8 @@ lxcDomainAttachDeviceDiskLive(virLXCDriverPtr driver, ret = 0; cleanup: - def->src = tmpsrc; virDomainAuditDisk(vm, NULL, def->src, "attach", ret == 0); - if (dst && created && ret < 0) - unlink(dst); - VIR_FREE(dst); + VIR_FREE(file); return ret; } -- 1.8.5.3

On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
Rewrite lxcDomainAttachDeviceDiskLive function to use the virProcessRunInMountNamespace helper. This avoids risk of a malicious guest replacing /dev with a absolute symlink, tricking the driver into changing the host OS filesystem.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 183 ++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 139 insertions(+), 44 deletions(-)
+ } + + + /* Yes, the device name we're creating may not
Why two blank lines?
+ + if (lxcContainerChown(data->vm->def, data->file) < 0) + goto cleanup;
Hmm. Calling _() can trigger malloc(), which is not async-safe in general. However, at least with glibc, I've never seen malloc deadlock because of use after multi-threaded fork, and we already have other spots in the code base that would also need cleaning if we truly want to avoid malloc between fork and exec.
+ + /* Labelling normally operates on src, but we need + * to actually label the dst here, so hack the config */ + switch (data->def->type) { + case VIR_DOMAIN_DEVICE_DISK: { + virDomainDiskDefPtr def = data->def->data.disk; + char *tmpsrc = def->src; + def->src = data->file; + if (virSecurityManagerSetImageLabel(data->driver->securityManager, + data->vm->def, def) < 0) {
Wow, that's quite the call stack that gets pulled in, when auditing for async safety. In particular, I have no idea if setfilecon_raw() and getfilecon_raw() are safe. We might be able to argue that because you grab virSecurityManagerPreFork, then no other libvirt threads will be using getfilecon_raw() at the same time; and since this code is only run in libvirtd, we don't have to worry about non-libvirt threads in the context of a larger application that linked in libvirt.so. So I _think_ we are safe.
+ def->src = tmpsrc; + goto cleanup; + } + def->src = tmpsrc; + } break; +
No real need to put def back to normal - we're in a forked child and about to exit. But doesn't hurt either, in case this gets copied and pasted to something in parent context where it does matter.
+static int lxcDomainAttachDeviceMknod(virLXCDriverPtr driver,
Style nit for breaking this to two lines, but not a showstopper.
- if (lxcContainerChown(vm->def, dst) < 0) + if (VIR_REALLOC_N(vm->def->disks, vm->def->ndisks+1) < 0)
spacing around +
+ if (lxcDomainAttachDeviceMknod(driver, + 0700 | S_IFBLK, + sb.st_rdev, + vm, + dev, + file) < 0) { + if (virCgroupDenyDevicePath(priv->cgroup, def->src, + VIR_CGROUP_DEVICE_RWM) < 0) + VIR_WARN("cannot deny device %s for domain %s", + def->src, vm->def->name);
This denies by path, while the allow was fixed to be by major:minor, which means you aren't denying what you thought. This needs to deny by major:minor. Do we need to audit this cleanup action?
goto cleanup; }
@@ -3736,11 +3834,8 @@ lxcDomainAttachDeviceDiskLive(virLXCDriverPtr driver, ret = 0;
cleanup: - def->src = tmpsrc; virDomainAuditDisk(vm, NULL, def->src, "attach", ret == 0);
As written, you only audit the grant, not the corresponding deny on failure. That's a pre-existing issue, not made worse by your patch, but something to consider whether it warrants a v2 or a followup patch to fix. My overall thoughts: If we had a way to do _just_ the mknod, then open the file, and pass the fd back to the parent, then do labeling on the fd from the parent context (rather than on the path in the child context), it would make for a smaller child action easier to audit. But I'm not sure that would get the labeling right - it looks like we have to label the actual path name in the child. Or even if selinux took a leaf from openat() and friends, and gave us the ability to do actions on a name relative to an fd, then all we'd need to do is fork, change namespace, open the fd of the container directory, pass that back, then do the remaining options in the parent, where life is much easier. But lacking either of those options, I think your approach is the best we can do. Please fix the cgroup deny, then I can give a reluctant: ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Fri, Feb 07, 2014 at 11:22:12AM -0700, Eric Blake wrote:
On 02/07/2014 08:33 AM, Daniel P. Berrange wrote: My overall thoughts:
If we had a way to do _just_ the mknod, then open the file, and pass the fd back to the parent, then do labeling on the fd from the parent context (rather than on the path in the child context), it would make for a smaller child action easier to audit. But I'm not sure that would get the labeling right - it looks like we have to label the actual path name in the child. Or even if selinux took a leaf from openat() and friends, and gave us the ability to do actions on a name relative to an fd, then all we'd need to do is fork, change namespace, open the fd of the container directory, pass that back, then do the remaining options in the parent, where life is much easier.
The FD passing idea is interesting. I think I will explore that idea further to see if it is viable before we finalize this. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Rewrite lxcDomainAttachDeviceHostdevSubsysUSBLive function to use the virProcessRunInMountNamespace helper. This avoids risk of a malicious guest replacing /dev with a absolute symlink, tricking the driver into changing the host OS filesystem. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 73 ++++++++++++++++------------------------------------ 1 file changed, 22 insertions(+), 51 deletions(-) diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index a1d0e81..589a45d 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -3686,6 +3686,13 @@ lxcDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, def->src = tmpsrc; } break; + case VIR_DOMAIN_DEVICE_HOSTDEV: { + virDomainHostdevDefPtr def = data->def->data.hostdev; + if (virSecurityManagerSetHostdevLabel(data->driver->securityManager, + data->vm->def, def, NULL) < 0) + goto cleanup; + } break; + default: virReportError(VIR_ERR_INTERNAL_ERROR, _("Unexpected device type %d"), @@ -3977,13 +3984,8 @@ lxcDomainAttachDeviceHostdevSubsysUSBLive(virLXCDriverPtr driver, virLXCDomainObjPrivatePtr priv = vm->privateData; virDomainHostdevDefPtr def = dev->data.hostdev; int ret = -1; - char *vroot = NULL; char *src = NULL; - char *dstdir = NULL; - char *dstfile = NULL; struct stat sb; - mode_t mode; - bool created = false; virUSBDevicePtr usb = NULL; if (virDomainHostdevFind(vm->def, def, NULL) >= 0) { @@ -3992,27 +3994,13 @@ lxcDomainAttachDeviceHostdevSubsysUSBLive(virLXCDriverPtr driver, return -1; } - if (virAsprintf(&vroot, "/proc/%llu/root", - (unsigned long long)priv->initpid) < 0) - goto cleanup; - - if (virAsprintf(&dstdir, "%s/dev/bus/usb/%03d", - vroot, - def->source.subsys.u.usb.bus) < 0) - goto cleanup; - - if (virAsprintf(&dstfile, "%s/%03d", - dstdir, - def->source.subsys.u.usb.device) < 0) - goto cleanup; - if (virAsprintf(&src, "/dev/bus/usb/%03d/%03d", def->source.subsys.u.usb.bus, def->source.subsys.u.usb.device) < 0) goto cleanup; if (!(usb = virUSBDeviceNew(def->source.subsys.u.usb.bus, - def->source.subsys.u.usb.device, vroot))) + def->source.subsys.u.usb.device, NULL))) goto cleanup; if (stat(src, &sb) < 0) { @@ -4028,53 +4016,36 @@ lxcDomainAttachDeviceHostdevSubsysUSBLive(virLXCDriverPtr driver, goto cleanup; } - mode = 0700 | S_IFCHR; - if (VIR_REALLOC_N(vm->def->hostdevs, vm->def->nhostdevs+1) < 0) goto cleanup; - if (virFileMakePath(dstdir) < 0) { - virReportSystemError(errno, - _("Unable to create %s"), dstdir); - goto cleanup; - } - - VIR_DEBUG("Creating dev %s (%d,%d)", - dstfile, major(sb.st_rdev), minor(sb.st_rdev)); - if (mknod(dstfile, mode, sb.st_rdev) < 0) { - virReportSystemError(errno, - _("Unable to create device %s"), - dstfile); - goto cleanup; - } - created = true; - - if (lxcContainerChown(vm->def, dstfile) < 0) - goto cleanup; - - if (virSecurityManagerSetHostdevLabel(driver->securityManager, - vm->def, def, vroot) < 0) - goto cleanup; - if (virUSBDeviceFileIterate(usb, virLXCSetupHostUsbDeviceCgroup, priv->cgroup) < 0) goto cleanup; + if (lxcDomainAttachDeviceMknod(driver, + 0700 | S_IFCHR, + sb.st_rdev, + vm, + dev, + src) < 0) { + if (virUSBDeviceFileIterate(usb, + virLXCTeardownHostUsbDeviceCgroup, + priv->cgroup) < 0) + VIR_WARN("cannot deny device %s for domain %s", + src, vm->def->name); + goto cleanup; + } + vm->def->hostdevs[vm->def->nhostdevs++] = def; ret = 0; cleanup: virDomainAuditHostdev(vm, def, "attach", ret == 0); - if (ret < 0 && created) - unlink(dstfile); - virUSBDeviceFree(usb); VIR_FREE(src); - VIR_FREE(dstfile); - VIR_FREE(dstdir); - VIR_FREE(vroot); return ret; } -- 1.8.5.3

On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
Rewrite lxcDomainAttachDeviceHostdevSubsysUSBLive function to use the virProcessRunInMountNamespace helper. This avoids risk of a malicious guest replacing /dev with a absolute symlink, tricking the driver into changing the host OS filesystem.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 73 ++++++++++++++++------------------------------------ 1 file changed, 22 insertions(+), 51 deletions(-)
ACK -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Rewrite lxcDomainAttachDeviceHostdevStorageLive function to use the virProcessRunInMountNamespace helper. This avoids risk of a malicious guest replacing /dev with a absolute symlink, tricking the driver into changing the host OS filesystem. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 64 ++++++++++++++-------------------------------------- 1 file changed, 17 insertions(+), 47 deletions(-) diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index 589a45d..5780e11 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -4058,11 +4058,7 @@ lxcDomainAttachDeviceHostdevStorageLive(virLXCDriverPtr driver, virLXCDomainObjPrivatePtr priv = vm->privateData; virDomainHostdevDefPtr def = dev->data.hostdev; int ret = -1; - char *dst = NULL; - char *vroot = NULL; struct stat sb; - bool created = false; - mode_t mode = 0; if (!def->source.caps.u.storage.block) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", @@ -4090,64 +4086,38 @@ lxcDomainAttachDeviceHostdevStorageLive(virLXCDriverPtr driver, goto cleanup; } - if (virAsprintf(&vroot, "/proc/%llu/root", - (unsigned long long)priv->initpid) < 0) - goto cleanup; - - if (virAsprintf(&dst, "%s/%s", - vroot, - def->source.caps.u.storage.block) < 0) - goto cleanup; - if (VIR_REALLOC_N(vm->def->hostdevs, vm->def->nhostdevs+1) < 0) goto cleanup; - if (lxcContainerSetupHostdevCapsMakePath(dst) < 0) { - virReportSystemError(errno, - _("Unable to create directory for device %s"), - dst); - goto cleanup; - } - - mode = 0700 | S_IFBLK; - - VIR_DEBUG("Creating dev %s (%d,%d)", - def->source.caps.u.storage.block, - major(sb.st_rdev), minor(sb.st_rdev)); - if (mknod(dst, mode, sb.st_rdev) < 0) { - virReportSystemError(errno, - _("Unable to create device %s"), - dst); - goto cleanup; - } - created = true; - - if (lxcContainerChown(vm->def, dst) < 0) - goto cleanup; - - if (virSecurityManagerSetHostdevLabel(driver->securityManager, - vm->def, def, vroot) < 0) - goto cleanup; - - if (virCgroupAllowDevicePath(priv->cgroup, def->source.caps.u.storage.block, - VIR_CGROUP_DEVICE_RW | - VIR_CGROUP_DEVICE_MKNOD) != 0) { + if (virCgroupAllowDevicePath(priv->cgroup, + def->source.caps.u.storage.block, + VIR_CGROUP_DEVICE_RWM) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot allow device %s for domain %s"), def->source.caps.u.storage.block, vm->def->name); goto cleanup; } + if (lxcDomainAttachDeviceMknod(driver, + 0700 | S_IFBLK, + sb.st_rdev, + vm, + dev, + def->source.caps.u.storage.block) < 0) { + if (virCgroupDenyDevicePath(priv->cgroup, + def->source.caps.u.storage.block, + VIR_CGROUP_DEVICE_RWM) != 0) + VIR_WARN("cannot deny device %s for domain %s", + def->source.caps.u.storage.block, vm->def->name); + goto cleanup; + } + vm->def->hostdevs[vm->def->nhostdevs++] = def; ret = 0; cleanup: virDomainAuditHostdev(vm, def, "attach", ret == 0); - if (dst && created && ret < 0) - unlink(dst); - VIR_FREE(dst); - VIR_FREE(vroot); return ret; } -- 1.8.5.3

On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
Rewrite lxcDomainAttachDeviceHostdevStorageLive function to use the virProcessRunInMountNamespace helper. This avoids risk of a malicious guest replacing /dev with a absolute symlink, tricking the driver into changing the host OS filesystem.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 64 ++++++++++++++-------------------------------------- 1 file changed, 17 insertions(+), 47 deletions(-)
+ if (virCgroupAllowDevicePath(priv->cgroup, + def->source.caps.u.storage.block, + VIR_CGROUP_DEVICE_RWM) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot allow device %s for domain %s"), def->source.caps.u.storage.block, vm->def->name);
Here, I think we're okay setting up cgroup by name - because it is the host-visible name, not the guest-visible name, and cgroup just stat()s the host name to use the major:minor anyways. But since we already have sb.st_rdev, why not avoid a second stat() and just tell cgroup to change by major:minor in the first place?
goto cleanup; }
+ if (lxcDomainAttachDeviceMknod(driver, + 0700 | S_IFBLK, + sb.st_rdev, + vm, + dev, + def->source.caps.u.storage.block) < 0) { + if (virCgroupDenyDevicePath(priv->cgroup, + def->source.caps.u.storage.block, + VIR_CGROUP_DEVICE_RWM) != 0) + VIR_WARN("cannot deny device %s for domain %s", + def->source.caps.u.storage.block, vm->def->name); + goto cleanup;
Same problem as in 10/14 about no audit of cleanup attempts. And again, something probably worth fixing in its own patch, so for this one: ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Rewrite lxcDomainAttachDeviceHostdevMisceLive function to use the virProcessRunInMountNamespace helper. This avoids risk of a malicious guest replacing /dev with a absolute symlink, tricking the driver into changing the host OS filesystem. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 66 ++++++++++++++-------------------------------------- 1 file changed, 18 insertions(+), 48 deletions(-) diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index 5780e11..763b22b 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -4130,11 +4130,7 @@ lxcDomainAttachDeviceHostdevMiscLive(virLXCDriverPtr driver, virLXCDomainObjPrivatePtr priv = vm->privateData; virDomainHostdevDefPtr def = dev->data.hostdev; int ret = -1; - char *dst = NULL; - char *vroot = NULL; struct stat sb; - bool created = false; - mode_t mode = 0; if (!def->source.caps.u.misc.chardev) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", @@ -4162,51 +4158,29 @@ lxcDomainAttachDeviceHostdevMiscLive(virLXCDriverPtr driver, goto cleanup; } - if (virAsprintf(&vroot, "/proc/%llu/root", - (unsigned long long)priv->initpid) < 0) - goto cleanup; - - if (virAsprintf(&dst, "%s/%s", - vroot, - def->source.caps.u.misc.chardev) < 0) - goto cleanup; - - if (VIR_REALLOC_N(vm->def->hostdevs, vm->def->nhostdevs+1) < 0) - goto cleanup; - - if (lxcContainerSetupHostdevCapsMakePath(dst) < 0) { - virReportSystemError(errno, - _("Unable to create directory for device %s"), - dst); - goto cleanup; - } - - mode = 0700 | S_IFCHR; - - VIR_DEBUG("Creating dev %s (%d,%d)", - def->source.caps.u.misc.chardev, - major(sb.st_rdev), minor(sb.st_rdev)); - if (mknod(dst, mode, sb.st_rdev) < 0) { - virReportSystemError(errno, - _("Unable to create device %s"), - dst); + if (virCgroupAllowDevicePath(priv->cgroup, + def->source.caps.u.misc.chardev, + VIR_CGROUP_DEVICE_RWM) != 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot allow device %s for domain %s"), + def->source.caps.u.misc.chardev, vm->def->name); goto cleanup; } - created = true; - - if (lxcContainerChown(vm->def, dst) < 0) - goto cleanup; - if (virSecurityManagerSetHostdevLabel(driver->securityManager, - vm->def, def, vroot) < 0) + if (VIR_REALLOC_N(vm->def->hostdevs, vm->def->nhostdevs+1) < 0) goto cleanup; - if (virCgroupAllowDevicePath(priv->cgroup, def->source.caps.u.misc.chardev, - VIR_CGROUP_DEVICE_RW | - VIR_CGROUP_DEVICE_MKNOD) != 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot allow device %s for domain %s"), - def->source.caps.u.misc.chardev, vm->def->name); + if (lxcDomainAttachDeviceMknod(driver, + 0700 | S_IFBLK, + sb.st_rdev, + vm, + dev, + def->source.caps.u.misc.chardev) < 0) { + if (virCgroupDenyDevicePath(priv->cgroup, + def->source.caps.u.misc.chardev, + VIR_CGROUP_DEVICE_RWM) != 0) + VIR_WARN("cannot deny device %s for domain %s", + def->source.caps.u.storage.block, vm->def->name); goto cleanup; } @@ -4216,10 +4190,6 @@ lxcDomainAttachDeviceHostdevMiscLive(virLXCDriverPtr driver, cleanup: virDomainAuditHostdev(vm, def, "attach", ret == 0); - if (dst && created && ret < 0) - unlink(dst); - VIR_FREE(dst); - VIR_FREE(vroot); return ret; } -- 1.8.5.3

On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
Rewrite lxcDomainAttachDeviceHostdevMisceLive function
s/Misce/Misc/
to use the virProcessRunInMountNamespace helper. This avoids risk of a malicious guest replacing /dev with a absolute symlink, tricking the driver into changing the host OS filesystem.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 66 ++++++++++++++-------------------------------------- 1 file changed, 18 insertions(+), 48 deletions(-)
+ if (virCgroupAllowDevicePath(priv->cgroup, + def->source.caps.u.misc.chardev, + VIR_CGROUP_DEVICE_RWM) != 0) {
Again, why make cgroup do another stat of the host-side name, when we've already done a stat and could tell cgroup to operate on major:minor?
+ def->source.caps.u.misc.chardev) < 0) { + if (virCgroupDenyDevicePath(priv->cgroup, + def->source.caps.u.misc.chardev, + VIR_CGROUP_DEVICE_RWM) != 0) + VIR_WARN("cannot deny device %s for domain %s", + def->source.caps.u.storage.block, vm->def->name); goto cleanup;
And same issue on no auditing of deny on the failure path. ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Rewrite multiple hotunplug functions to to use the virProcessRunInMountNamespace helper. This avoids risk of a malicious guest replacing /dev with a absolute symlink, tricking the driver into changing the host OS filesystem. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 79 ++++++++++++++++++++++++++-------------------------- 1 file changed, 39 insertions(+), 40 deletions(-) diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index 763b22b..5940343 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -3744,6 +3744,39 @@ static int lxcDomainAttachDeviceMknod(virLXCDriverPtr driver, static int +lxcDomainAttachDeviceUnlinkHelper(pid_t pid ATTRIBUTE_UNUSED, + void *opaque) +{ + const char *path = opaque; + + VIR_DEBUG("Unlinking %s", path); + if (unlink(path) < 0 && errno != ENOENT) { + virReportSystemError(errno, + _("Unable to remove device %s"), path); + return -1; + } + + return 0; +} + + +static int +lxcDomainAttachDeviceUnlink(virDomainObjPtr vm, + char *file) +{ + virLXCDomainObjPrivatePtr priv = vm->privateData; + + if (virProcessRunInMountNamespace(priv->initpid, + lxcDomainAttachDeviceUnlinkHelper, + file) < 0) { + return -1; + } + + return 0; +} + + +static int lxcDomainAttachDeviceDiskLive(virLXCDriverPtr driver, virDomainObjPtr vm, virDomainDeviceDefPtr dev) @@ -4332,8 +4365,7 @@ lxcDomainDetachDeviceDiskLive(virDomainObjPtr vm, def = vm->def->disks[idx]; - if (virAsprintf(&dst, "/proc/%llu/root/dev/%s", - (unsigned long long)priv->initpid, def->dst) < 0) + if (virAsprintf(&dst, "/dev/%s", def->dst) < 0) goto cleanup; if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) { @@ -4342,11 +4374,8 @@ lxcDomainDetachDeviceDiskLive(virDomainObjPtr vm, goto cleanup; } - VIR_DEBUG("Unlinking %s (backed by %s)", dst, def->src); - if (unlink(dst) < 0 && errno != ENOENT) { + if (lxcDomainAttachDeviceUnlink(vm, dst) < 0) { virDomainAuditDisk(vm, def->src, NULL, "detach", false); - virReportSystemError(errno, - _("Unable to remove device %s"), dst); goto cleanup; } virDomainAuditDisk(vm, def->src, NULL, "detach", true); @@ -4441,7 +4470,6 @@ lxcDomainDetachDeviceHostdevUSBLive(virLXCDriverPtr driver, virDomainHostdevDefPtr def = NULL; int idx, ret = -1; char *dst = NULL; - char *vroot = NULL; virUSBDevicePtr usb = NULL; if ((idx = virDomainHostdevFind(vm->def, @@ -4452,12 +4480,7 @@ lxcDomainDetachDeviceHostdevUSBLive(virLXCDriverPtr driver, goto cleanup; } - if (virAsprintf(&vroot, "/proc/%llu/root", - (unsigned long long)priv->initpid) < 0) - goto cleanup; - - if (virAsprintf(&dst, "%s/dev/bus/usb/%03d/%03d", - vroot, + if (virAsprintf(&dst, "/dev/bus/usb/%03d/%03d", def->source.subsys.u.usb.bus, def->source.subsys.u.usb.device) < 0) goto cleanup; @@ -4472,11 +4495,8 @@ lxcDomainDetachDeviceHostdevUSBLive(virLXCDriverPtr driver, def->source.subsys.u.usb.device, NULL))) goto cleanup; - VIR_DEBUG("Unlinking %s", dst); - if (unlink(dst) < 0 && errno != ENOENT) { + if (lxcDomainAttachDeviceUnlink(vm, dst) < 0) { virDomainAuditHostdev(vm, def, "detach", false); - virReportSystemError(errno, - _("Unable to remove device %s"), dst); goto cleanup; } virDomainAuditHostdev(vm, def, "detach", true); @@ -4499,7 +4519,6 @@ lxcDomainDetachDeviceHostdevUSBLive(virLXCDriverPtr driver, cleanup: virUSBDeviceFree(usb); VIR_FREE(dst); - VIR_FREE(vroot); return ret; } @@ -4511,7 +4530,6 @@ lxcDomainDetachDeviceHostdevStorageLive(virDomainObjPtr vm, virLXCDomainObjPrivatePtr priv = vm->privateData; virDomainHostdevDefPtr def = NULL; int idx, ret = -1; - char *dst = NULL; if (!priv->initpid) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", @@ -4528,22 +4546,14 @@ lxcDomainDetachDeviceHostdevStorageLive(virDomainObjPtr vm, goto cleanup; } - if (virAsprintf(&dst, "/proc/%llu/root/%s", - (unsigned long long)priv->initpid, - def->source.caps.u.storage.block) < 0) - goto cleanup; - if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", _("devices cgroup isn't mounted")); goto cleanup; } - VIR_DEBUG("Unlinking %s", dst); - if (unlink(dst) < 0 && errno != ENOENT) { + if (lxcDomainAttachDeviceUnlink(vm, def->source.caps.u.storage.block) < 0) { virDomainAuditHostdev(vm, def, "detach", false); - virReportSystemError(errno, - _("Unable to remove device %s"), dst); goto cleanup; } virDomainAuditHostdev(vm, def, "detach", true); @@ -4558,7 +4568,6 @@ lxcDomainDetachDeviceHostdevStorageLive(virDomainObjPtr vm, ret = 0; cleanup: - VIR_FREE(dst); return ret; } @@ -4570,7 +4579,6 @@ lxcDomainDetachDeviceHostdevMiscLive(virDomainObjPtr vm, virLXCDomainObjPrivatePtr priv = vm->privateData; virDomainHostdevDefPtr def = NULL; int idx, ret = -1; - char *dst = NULL; if (!priv->initpid) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", @@ -4587,22 +4595,14 @@ lxcDomainDetachDeviceHostdevMiscLive(virDomainObjPtr vm, goto cleanup; } - if (virAsprintf(&dst, "/proc/%llu/root/%s", - (unsigned long long)priv->initpid, - def->source.caps.u.misc.chardev) < 0) - goto cleanup; - if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", _("devices cgroup isn't mounted")); goto cleanup; } - VIR_DEBUG("Unlinking %s", dst); - if (unlink(dst) < 0 && errno != ENOENT) { + if (lxcDomainAttachDeviceUnlink(vm, def->source.caps.u.misc.chardev) < 0) { virDomainAuditHostdev(vm, def, "detach", false); - virReportSystemError(errno, - _("Unable to remove device %s"), dst); goto cleanup; } virDomainAuditHostdev(vm, def, "detach", true); @@ -4617,7 +4617,6 @@ lxcDomainDetachDeviceHostdevMiscLive(virDomainObjPtr vm, ret = 0; cleanup: - VIR_FREE(dst); return ret; } -- 1.8.5.3

On 02/07/2014 08:33 AM, Daniel P. Berrange wrote:
Rewrite multiple hotunplug functions to to use the virProcessRunInMountNamespace helper. This avoids risk of a malicious guest replacing /dev with a absolute symlink, tricking the driver into changing the host OS filesystem.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/lxc/lxc_driver.c | 79 ++++++++++++++++++++++++++-------------------------- 1 file changed, 39 insertions(+), 40 deletions(-)
static int +lxcDomainAttachDeviceUnlinkHelper(pid_t pid ATTRIBUTE_UNUSED, + void *opaque) +{ + const char *path = opaque; + + VIR_DEBUG("Unlinking %s", path); + if (unlink(path) < 0 && errno != ENOENT) { + virReportSystemError(errno, + _("Unable to remove device %s"), path);
Same generic concern about _() using malloc in an async-safe context, but not worth worrying about in this patch. ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (2)
-
Daniel P. Berrange
-
Eric Blake