[libvirt] [PATCH 0/3] qemu: Couple of namespace fixes

*** BLURB HERE *** Michal Privoznik (3): qemu: Handle EEXIST gracefully in qemuDomainCreateDevice qemuDomainAttachDeviceMknodHelper: unlink() not so often qemuDomainCreateDevice: Be more careful about device path src/qemu/qemu_domain.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) -- 2.11.0

https://bugzilla.redhat.com/show_bug.cgi?id=1406837 Imagine you have a domain configured in such way that you are assigning two PCI devices that fall into the same IOMMU group. With mount namespace enabled what happens is that for the first PCI device corresponding /dev/vfio/X entry is created and when the code tries to do the same for the second mknod() fails as /dev/vfio/X already exists: 2016-12-21 14:40:45.648+0000: 24681: error : qemuProcessReportLogError:1792 : internal error: Process exited prior to exec: libvirt: QEMU Driver error : Failed to make device /var/run/libvirt/qemu/windoze.dev//vfio/22: File exists Worse, by default there are some devices that are created in the namespace regardless of domain configuration (e.g. /dev/null, /dev/urandom, etc.). If one of them is set as backend for some guest device (e.g. rng, chardev, etc.) it's the same story as described above. Weirdly, in attach code this is already handled. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 25cb4ad590..d05ebcb416 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6957,10 +6957,12 @@ qemuDomainCreateDevice(const char *device, } if (mknod(devicePath, sb.st_mode, sb.st_rdev) < 0) { - virReportSystemError(errno, - _("Failed to make device %s"), - devicePath); - goto cleanup; + if (errno != EEXIST) { + virReportSystemError(errno, + _("Failed to make device %s"), + devicePath); + goto cleanup; + } } if (chown(devicePath, sb.st_uid, sb.st_gid) < 0) { -- 2.11.0

On Wed, Jan 04, 2017 at 03:13:55PM +0100, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1406837
Imagine you have a domain configured in such way that you are assigning two PCI devices that fall into the same IOMMU group. With mount namespace enabled what happens is that for the first PCI device corresponding /dev/vfio/X entry is created and when the code tries to do the same for the second mknod() fails as /dev/vfio/X already exists:
2016-12-21 14:40:45.648+0000: 24681: error : qemuProcessReportLogError:1792 : internal error: Process exited prior to exec: libvirt: QEMU Driver error : Failed to make device /var/run/libvirt/qemu/windoze.dev//vfio/22: File exists
Worse, by default there are some devices that are created in the namespace regardless of domain configuration (e.g. /dev/null, /dev/urandom, etc.). If one of them is set as backend for some guest device (e.g. rng, chardev, etc.) it's the same story as described above.
Weirdly, in attach code this is already handled.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 25cb4ad590..d05ebcb416 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6957,10 +6957,12 @@ qemuDomainCreateDevice(const char *device, }
if (mknod(devicePath, sb.st_mode, sb.st_rdev) < 0) { - virReportSystemError(errno, - _("Failed to make device %s"), - devicePath); - goto cleanup; + if (errno != EEXIST) { + virReportSystemError(errno, + _("Failed to make device %s"), + devicePath); + goto cleanup; + } }
You can also skip the chown here if you want. ACK either way.
if (chown(devicePath, sb.st_uid, sb.st_gid) < 0) { -- 2.11.0
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

Not that I'd encounter any bug here, but the code doesn't look 100% correct. Imagine, somebody is trying to attach a device to a domain, and the device's /dev entry already exists in the qemu namespace. This is handled gracefully and the control continues with setting up ACLs and calling security manager to set up labels. Now, if any of these steps fail, control jump on the 'cleanup' label and unlink() the file straight away. Even when it was not us who created the file in the first place. This can be possibly dangerous. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index d05ebcb416..40bed1b396 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7521,6 +7521,7 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, { struct qemuDomainAttachDeviceMknodData *data = opaque; int ret = -1; + bool delDevice = false; virSecurityManagerPostFork(data->driver->securityManager); @@ -7543,6 +7544,8 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, data->file); goto cleanup; } + } else { + delDevice = true; } if (virFileSetACLs(data->file, data->acl) < 0 && @@ -7606,7 +7609,7 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, ret = 0; cleanup: - if (ret < 0) + if (ret < 0 && delDevice) unlink(data->file); virFileFreeACLs(&data->acl); return ret; -- 2.11.0

On Wed, Jan 04, 2017 at 03:13:56PM +0100, Michal Privoznik wrote:
Not that I'd encounter any bug here, but the code doesn't look 100% correct. Imagine, somebody is trying to attach a device to a domain, and the device's /dev entry already exists in the qemu namespace. This is handled gracefully and the control continues with setting up ACLs and calling security manager to set up labels. Now, if any of these steps fail, control jump on the 'cleanup' label and unlink() the file straight away. Even when it was not us who created the file in the first place. This can be possibly dangerous.
"Don't unlink non-existing files" or something similar would be enough, I guess :)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index d05ebcb416..40bed1b396 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -7521,6 +7521,7 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, { struct qemuDomainAttachDeviceMknodData *data = opaque; int ret = -1; + bool delDevice = false;
virSecurityManagerPostFork(data->driver->securityManager);
@@ -7543,6 +7544,8 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED, data->file); goto cleanup; } + } else { + delDevice = true; }
if (virFileSetACLs(data->file, data->acl) < 0 && @@ -7606,7 +7609,7 @@ qemuDomainAttachDeviceMknodHelper(pid_t pid ATTRIBUTE_UNUSED,
ret = 0; cleanup: - if (ret < 0) + if (ret < 0 && delDevice) unlink(data->file); virFileFreeACLs(&data->acl); return ret; -- 2.11.0
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Wed, Jan 04, 2017 at 03:23:45PM +0100, Martin Kletzander wrote:
On Wed, Jan 04, 2017 at 03:13:56PM +0100, Michal Privoznik wrote:
Not that I'd encounter any bug here, but the code doesn't look 100% correct. Imagine, somebody is trying to attach a device to a domain, and the device's /dev entry already exists in the qemu namespace. This is handled gracefully and the control continues with setting up ACLs and calling security manager to set up labels. Now, if any of these steps fail, control jump on the 'cleanup' label and unlink() the file straight away. Even when it was not us who created the file in the first place. This can be possibly dangerous.
"Don't unlink non-existing files" or something similar would be enough, I guess :)
I forgot to add, ACK.

Again, not something that I'd hit, but there is a chance in theory that this might bite us. Currently the way we decide whether or not to create /dev entry for a device is by marching first four characters of path with "/dev". This might be not enough. Just imagine somebody has a disk image stored under "/devil/path/to/disk". We ought to be matching against "/dev/". Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 40bed1b396..3ecc30c7b5 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6917,6 +6917,8 @@ qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, } +#define DEVPREFIX "/dev/" + #if defined(__linux__) static int qemuDomainCreateDevice(const char *device, @@ -6927,7 +6929,7 @@ qemuDomainCreateDevice(const char *device, struct stat sb; int ret = -1; - if (!STRPREFIX(device, "/dev")) { + if (!STRPREFIX(device, DEVPREFIX)) { virReportError(VIR_ERR_INTERNAL_ERROR, _("invalid device: %s"), device); @@ -6935,7 +6937,7 @@ qemuDomainCreateDevice(const char *device, } if (virAsprintf(&devicePath, "%s/%s", - path, device + 4) < 0) + path, device + strlen(DEVPREFIX)) < 0) goto cleanup; if (stat(device, &sb) < 0) { @@ -7064,7 +7066,7 @@ qemuDomainSetupDisk(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, for (next = disk->src; next; next = next->backingStore) { if (!next->path || !virStorageSourceIsLocalStorage(next) || - !STRPREFIX(next->path, "/dev")) { + !STRPREFIX(next->path, DEVPREFIX)) { /* Not creating device. Just continue. */ continue; } @@ -7768,7 +7770,7 @@ qemuDomainNamespaceSetupDisk(virQEMUDriverPtr driver, for (next = disk->src; next; next = next->backingStore) { if (!next->path || !virStorageSourceIsBlockLocal(disk->src) || - !STRPREFIX(next->path, "/dev")) { + !STRPREFIX(next->path, DEVPREFIX)) { /* Not creating device. Just continue. */ continue; } -- 2.11.0

On Wed, Jan 04, 2017 at 03:13:57PM +0100, Michal Privoznik wrote:
Again, not something that I'd hit, but there is a chance in theory that this might bite us. Currently the way we decide whether or not to create /dev entry for a device is by marching first four characters of path with "/dev". This might be not enough. Just imagine somebody has a disk image stored under "/devil/path/to/disk". We ought to be matching against "/dev/".
I haven't checked it, but I believe this code only gets absolure canonicalized paths, otherwise you could theoretically still have /dev/../some/other/path.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 40bed1b396..3ecc30c7b5 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6917,6 +6917,8 @@ qemuDomainGetHostdevPath(virDomainHostdevDefPtr dev, }
+#define DEVPREFIX "/dev/" + #if defined(__linux__) static int qemuDomainCreateDevice(const char *device, @@ -6927,7 +6929,7 @@ qemuDomainCreateDevice(const char *device, struct stat sb; int ret = -1;
- if (!STRPREFIX(device, "/dev")) { + if (!STRPREFIX(device, DEVPREFIX)) { virReportError(VIR_ERR_INTERNAL_ERROR, _("invalid device: %s"), device); @@ -6935,7 +6937,7 @@ qemuDomainCreateDevice(const char *device, }
if (virAsprintf(&devicePath, "%s/%s", - path, device + 4) < 0) + path, device + strlen(DEVPREFIX)) < 0)
And we'll get rid of the double path separator as a bonus. ACK
participants (2)
-
Martin Kletzander
-
Michal Privoznik