[libvirt] [PATCH v2 0/3] Improve handling of multipath devices

v2 of: https://www.redhat.com/archives/libvir-list/2018-March/msg01541.html diff to v1: - Moved code to new file virdevmapper.c - Rename to virDevMapperGetTargets because this bug is not specific just to multipath rather than all device mapper targets - Reworked logging as suggested in review to v1 Michal Privoznik (3): util: Introduce virDevMapperGetTargets qemu: Handle device mapper targets properly news: Document device mapper fix docs/news.xml | 10 +++ libvirt.spec.in | 2 + src/libvirt_private.syms | 4 ++ src/qemu/qemu_cgroup.c | 42 ++++++++++++- src/qemu/qemu_domain.c | 64 +++++++++++++++++++ src/util/Makefile.inc.am | 2 + src/util/virdevmapper.c | 160 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virdevmapper.h | 32 ++++++++++ 8 files changed, 313 insertions(+), 3 deletions(-) create mode 100644 src/util/virdevmapper.c create mode 100644 src/util/virdevmapper.h -- 2.16.1

This helper fetches dependencies for given device mapper target. At the same time, we need to provide a dummy log function because by default libdevmapper prints out error messages to stderr which we need to suppress. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 4 ++ src/util/Makefile.inc.am | 2 + src/util/virdevmapper.c | 160 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virdevmapper.h | 32 ++++++++++ 4 files changed, 198 insertions(+) create mode 100644 src/util/virdevmapper.c create mode 100644 src/util/virdevmapper.h diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 03fe3b315f..3b7a19cff1 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1669,6 +1669,10 @@ virDBusMessageUnref; virDBusSetSharedBus; +# util/virdevmapper.h +virDevMapperGetTargets; + + # util/virdnsmasq.h dnsmasqAddDhcpHost; dnsmasqAddHost; diff --git a/src/util/Makefile.inc.am b/src/util/Makefile.inc.am index a3c3b711fd..9624fb687c 100644 --- a/src/util/Makefile.inc.am +++ b/src/util/Makefile.inc.am @@ -35,6 +35,8 @@ UTIL_SOURCES = \ util/virdbus.c \ util/virdbus.h \ util/virdbuspriv.h \ + util/virdevmapper.c \ + util/virdevmapper.h \ util/virdnsmasq.c \ util/virdnsmasq.h \ util/virebtables.c \ diff --git a/src/util/virdevmapper.c b/src/util/virdevmapper.c new file mode 100644 index 0000000000..9717482da8 --- /dev/null +++ b/src/util/virdevmapper.c @@ -0,0 +1,160 @@ +/* + * virdevmapper.c: Functions for handling devmapper + * + * Copyright (C) 2018 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Michal Privoznik <mprivozn@redhat.com> + */ + +#include <config.h> + +#ifdef MAJOR_IN_MKDEV +# include <sys/mkdev.h> +#elif MAJOR_IN_SYSMACROS +# include <sys/sysmacros.h> +#endif + +#ifdef WITH_DEVMAPPER +# include <libdevmapper.h> +#endif + +#include "virdevmapper.h" +#include "internal.h" +#include "virthread.h" +#include "viralloc.h" + +#ifdef WITH_DEVMAPPER +static void +virDevMapperDummyLogger(int level ATTRIBUTE_UNUSED, + const char *file ATTRIBUTE_UNUSED, + int line ATTRIBUTE_UNUSED, + int dm_errno ATTRIBUTE_UNUSED, + const char *fmt ATTRIBUTE_UNUSED, + ...) +{ + return; +} + +static int +virDevMapperOnceInit(void) +{ + /* Ideally, we would not need this. But libdevmapper prints + * error messages to stderr by default. Sad but true. */ + dm_log_with_errno_init(virDevMapperDummyLogger); + return 0; +} + + +VIR_ONCE_GLOBAL_INIT(virDevMapper) + +/** + * virDevMapperGetTargets: + * @path: multipath device + * @maj: returned array of MAJOR device numbers + * @min: returner array of MINOR device numbers + * @nmaj: number of items in @maj array + * + * For given @path figure out its targets, and store them in @maj + * and @min arrays. Both arrays have the same number of items + * upon return. + * + * If @path is not a multipath device, @ndevs is set to 0 and + * success is returned. + * + * If we don't have permissions to talk to kernel, -1 is returned + * and errno is set to EBADF. + * + * Returns 0 on success, + * -1 otherwise (with errno set, no libvirt error is + * reported) + */ +int +virDevMapperGetTargets(const char *path, + unsigned int **maj, + unsigned int **min, + size_t *nmaj) +{ + struct dm_task *dmt = NULL; + struct dm_deps *deps; + struct dm_info info; + size_t i; + int ret = -1; + + *nmaj = 0; + + if (virDevMapperInitialize() < 0) + goto cleanup; + + if (!(dmt = dm_task_create(DM_DEVICE_DEPS))) + goto cleanup; + + if (!dm_task_set_name(dmt, path)) { + if (errno == ENOENT) { + /* It's okay, @path is not managed by devmapper => + * not a multipath device. */ + ret = 0; + } + goto cleanup; + } + + dm_task_no_open_count(dmt); + + if (!dm_task_run(dmt)) + goto cleanup; + + if (!dm_task_get_info(dmt, &info)) + goto cleanup; + + if (!info.exists) { + ret = 0; + goto cleanup; + } + + if (!(deps = dm_task_get_deps(dmt))) + goto cleanup; + + if (VIR_ALLOC_N_QUIET(*maj, deps->count) < 0 || + VIR_ALLOC_N_QUIET(*min, deps->count) < 0) { + VIR_FREE(*maj); + goto cleanup; + } + *nmaj = deps->count; + + for (i = 0; i < deps->count; i++) { + (*maj)[i] = major(deps->device[i]); + (*min)[i] = minor(deps->device[i]); + } + + ret = 0; + cleanup: + dm_task_destroy(dmt); + return ret; +} + +#else /* ! WITH_DEVMAPPER */ + +int +virDevMapperGetTargets(const char *path ATTRIBUTE_UNUSED, + unsigned int **maj ATTRIBUTE_UNUSED, + unsigned int **min ATTRIBUTE_UNUSED, + size_t *nmaj ATTRIBUTE_UNUSED) +{ + errno = ENOSYS; + return -1; +} +#endif /* ! WITH_DEVMAPPER */ diff --git a/src/util/virdevmapper.h b/src/util/virdevmapper.h new file mode 100644 index 0000000000..e88ac0f8ce --- /dev/null +++ b/src/util/virdevmapper.h @@ -0,0 +1,32 @@ +/* + * virdevmapper.h: Functions for handling devmapper + * + * Copyright (C) 2018 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Michal Privoznik <mprivozn@redhat.com> + */ + +#ifndef __VIR_DEVMAPPER_H__ +# define __VIR_DEVMAPPER_H__ + +int virDevMapperGetTargets(const char *path, + unsigned int **maj, + unsigned int **min, + size_t *nmaj); + +#endif /* __VIR_DEVMAPPER_H__ */ -- 2.16.1

On Mon, Mar 26, 2018 at 16:43:01 +0200, Michal Privoznik wrote:
This helper fetches dependencies for given device mapper target.
At the same time, we need to provide a dummy log function because by default libdevmapper prints out error messages to stderr which we need to suppress.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 4 ++ src/util/Makefile.inc.am | 2 + src/util/virdevmapper.c | 160 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virdevmapper.h | 32 ++++++++++ 4 files changed, 198 insertions(+) create mode 100644 src/util/virdevmapper.c create mode 100644 src/util/virdevmapper.h
[...]
diff --git a/src/util/virdevmapper.c b/src/util/virdevmapper.c new file mode 100644 index 0000000000..9717482da8 --- /dev/null +++ b/src/util/virdevmapper.c @@ -0,0 +1,160 @@ +/* + * virdevmapper.c: Functions for handling devmapper + * + * Copyright (C) 2018 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Michal Privoznik <mprivozn@redhat.com> + */ + +#include <config.h> + +#ifdef MAJOR_IN_MKDEV +# include <sys/mkdev.h> +#elif MAJOR_IN_SYSMACROS +# include <sys/sysmacros.h> +#endif + +#ifdef WITH_DEVMAPPER +# include <libdevmapper.h> +#endif + +#include "virdevmapper.h" +#include "internal.h" +#include "virthread.h" +#include "viralloc.h" + +#ifdef WITH_DEVMAPPER +static void +virDevMapperDummyLogger(int level ATTRIBUTE_UNUSED, + const char *file ATTRIBUTE_UNUSED, + int line ATTRIBUTE_UNUSED, + int dm_errno ATTRIBUTE_UNUSED, + const char *fmt ATTRIBUTE_UNUSED, + ...) +{ + return; +} + +static int +virDevMapperOnceInit(void) +{ + /* Ideally, we would not need this. But libdevmapper prints + * error messages to stderr by default. Sad but true. */ + dm_log_with_errno_init(virDevMapperDummyLogger); + return 0; +} + + +VIR_ONCE_GLOBAL_INIT(virDevMapper) + +/** + * virDevMapperGetTargets: + * @path: multipath device + * @maj: returned array of MAJOR device numbers + * @min: returner array of MINOR device numbers + * @nmaj: number of items in @maj array + * + * For given @path figure out its targets, and store them in @maj + * and @min arrays. Both arrays have the same number of items + * upon return. + * + * If @path is not a multipath device, @ndevs is set to 0 and + * success is returned. + * + * If we don't have permissions to talk to kernel, -1 is returned + * and errno is set to EBADF. + * + * Returns 0 on success, + * -1 otherwise (with errno set, no libvirt error is + * reported) + */ +int +virDevMapperGetTargets(const char *path, + unsigned int **maj, + unsigned int **min, + size_t *nmaj) +{ + struct dm_task *dmt = NULL; + struct dm_deps *deps; + struct dm_info info; + size_t i; + int ret = -1; + + *nmaj = 0; + + if (virDevMapperInitialize() < 0) + goto cleanup;
[1]
+ + if (!(dmt = dm_task_create(DM_DEVICE_DEPS))) + goto cleanup;
[1]
+ + if (!dm_task_set_name(dmt, path)) { + if (errno == ENOENT) { + /* It's okay, @path is not managed by devmapper => + * not a multipath device. */ + ret = 0; + } + goto cleanup; + } + + dm_task_no_open_count(dmt); + + if (!dm_task_run(dmt)) + goto cleanup; + + if (!dm_task_get_info(dmt, &info)) + goto cleanup; + + if (!info.exists) { + ret = 0; + goto cleanup; + } + + if (!(deps = dm_task_get_deps(dmt))) + goto cleanup; + + if (VIR_ALLOC_N_QUIET(*maj, deps->count) < 0 || + VIR_ALLOC_N_QUIET(*min, deps->count) < 0) { + VIR_FREE(*maj); + goto cleanup; + } + *nmaj = deps->count; + + for (i = 0; i < deps->count; i++) { + (*maj)[i] = major(deps->device[i]); + (*min)[i] = minor(deps->device[i]); + } + + ret = 0; + cleanup: + dm_task_destroy(dmt);
[1] dm_task_destroy dereferences the argument without checking it for NULL first. Replace the two jumps above with direct returns.
+ return ret; +} + +#else /* ! WITH_DEVMAPPER */ + +int +virDevMapperGetTargets(const char *path ATTRIBUTE_UNUSED, + unsigned int **maj ATTRIBUTE_UNUSED, + unsigned int **min ATTRIBUTE_UNUSED, + size_t *nmaj ATTRIBUTE_UNUSED) +{ + errno = ENOSYS; + return -1; +} +#endif /* ! WITH_DEVMAPPER */ diff --git a/src/util/virdevmapper.h b/src/util/virdevmapper.h new file mode 100644 index 0000000000..e88ac0f8ce --- /dev/null +++ b/src/util/virdevmapper.h @@ -0,0 +1,32 @@ +/* + * virdevmapper.h: Functions for handling devmapper + * + * Copyright (C) 2018 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * Authors: + * Michal Privoznik <mprivozn@redhat.com> + */ + +#ifndef __VIR_DEVMAPPER_H__ +# define __VIR_DEVMAPPER_H__ + +int virDevMapperGetTargets(const char *path, + unsigned int **maj, + unsigned int **min, + size_t *nmaj);
Please adhere to the new coding style guidelines for new files. ACK with the two things above addressed.

https://bugzilla.redhat.com/show_bug.cgi?id=1557769 Problem with device mapper targets is that there can be several other devices 'hidden' behind them. For instance, /dev/dm-1 can consist of /dev/sda, /dev/sdb and /dev/sdc. Therefore, when setting up devices CGroup and namespaces we have to take this into account. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- libvirt.spec.in | 2 ++ src/qemu/qemu_cgroup.c | 42 ++++++++++++++++++++++++++++++--- src/qemu/qemu_domain.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 105 insertions(+), 3 deletions(-) diff --git a/libvirt.spec.in b/libvirt.spec.in index b55a947ec9..ebfac10866 100644 --- a/libvirt.spec.in +++ b/libvirt.spec.in @@ -796,6 +796,8 @@ Requires: gzip Requires: bzip2 Requires: lzop Requires: xz +# For mpath devices +Requires: device-mapper %if 0%{?fedora} || 0%{?rhel} > 7 Requires: systemd-container %endif diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index b604edb31c..e17b3d21b5 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -37,6 +37,7 @@ #include "virtypedparam.h" #include "virnuma.h" #include "virsystemd.h" +#include "virdevmapper.h" #define VIR_FROM_THIS VIR_FROM_QEMU @@ -60,7 +61,13 @@ qemuSetupImagePathCgroup(virDomainObjPtr vm, { qemuDomainObjPrivatePtr priv = vm->privateData; int perms = VIR_CGROUP_DEVICE_READ; - int ret; + unsigned int *maj = NULL; + unsigned int *min = NULL; + size_t nmaj = 0; + size_t i; + char *devPath = NULL; + int rv; + int ret = -1; if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) return 0; @@ -71,12 +78,41 @@ qemuSetupImagePathCgroup(virDomainObjPtr vm, VIR_DEBUG("Allow path %s, perms: %s", path, virCgroupGetDevicePermsString(perms)); - ret = virCgroupAllowDevicePath(priv->cgroup, path, perms, true); + rv = virCgroupAllowDevicePath(priv->cgroup, path, perms, true); virDomainAuditCgroupPath(vm, priv->cgroup, "allow", path, virCgroupGetDevicePermsString(perms), - ret); + rv); + if (rv < 0) + goto cleanup; + if (virDevMapperGetTargets(path, &maj, &min, &nmaj) < 0 && + errno != ENOSYS && errno != EBADF) { + virReportSystemError(errno, + _("Unable to get devmapper targets for %s"), + path); + goto cleanup; + } + + for (i = 0; i < nmaj; i++) { + if (virAsprintf(&devPath, "/dev/block/%u:%u", maj[i], min[i]) < 0) + goto cleanup; + + rv = virCgroupAllowDevicePath(priv->cgroup, devPath, perms, true); + + virDomainAuditCgroupPath(vm, priv->cgroup, "allow", devPath, + virCgroupGetDevicePermsString(perms), + rv); + if (rv < 0) + goto cleanup; + VIR_FREE(devPath); + } + + ret = 0; + cleanup: + VIR_FREE(devPath); + VIR_FREE(min); + VIR_FREE(maj); return ret; } diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 580e0f830d..5f56324502 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -54,6 +54,7 @@ #include "secret_util.h" #include "logging/log_manager.h" #include "locking/domain_lock.h" +#include "virdevmapper.h" #ifdef MAJOR_IN_MKDEV # include <sys/mkdev.h> @@ -10108,6 +10109,11 @@ qemuDomainSetupDisk(virQEMUDriverConfigPtr cfg ATTRIBUTE_UNUSED, { virStorageSourcePtr next; char *dst = NULL; + unsigned int *maj = NULL; + unsigned int *min = NULL; + size_t nmaj = 0; + char *devPath = NULL; + size_t i; int ret = -1; for (next = disk->src; virStorageSourceIsBacking(next); next = next->backingStore) { @@ -10118,10 +10124,31 @@ qemuDomainSetupDisk(virQEMUDriverConfigPtr cfg ATTRIBUTE_UNUSED, if (qemuDomainCreateDevice(next->path, data, false) < 0) goto cleanup; + + if (virDevMapperGetTargets(next->path, &maj, &min, &nmaj) < 0 && + errno != ENOSYS && errno != EBADF) { + virReportSystemError(errno, + _("Unable to get mpath targets for %s"), + next->path); + goto cleanup; + } + + for (i = 0; i < nmaj; i++) { + if (virAsprintf(&devPath, "/dev/block/%u:%u", maj[i], min[i]) < 0) + goto cleanup; + + if (qemuDomainCreateDevice(devPath, data, false) < 0) + goto cleanup; + + VIR_FREE(devPath); + } } ret = 0; cleanup: + VIR_FREE(devPath); + VIR_FREE(min); + VIR_FREE(maj); VIR_FREE(dst); return ret; } @@ -11131,6 +11158,12 @@ qemuDomainNamespaceSetupDisk(virDomainObjPtr vm, virStorageSourcePtr next; char **paths = NULL; size_t npaths = 0; + unsigned int *maj = NULL; + unsigned int *min = NULL; + size_t nmaj = 0; + char **devmapperPaths = NULL; + size_t ndevmapperPaths = 0; + size_t i; int ret = -1; if (!qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) @@ -11145,6 +11178,32 @@ qemuDomainNamespaceSetupDisk(virDomainObjPtr vm, if (VIR_APPEND_ELEMENT_COPY(paths, npaths, next->path) < 0) goto cleanup; + + if (virDevMapperGetTargets(next->path, &maj, &min, &nmaj) < 0 && + errno != ENOSYS && errno != EBADF) { + virReportSystemError(errno, + _("Unable to get mpath targets for %s"), + next->path); + goto cleanup; + } + + if (VIR_REALLOC_N(devmapperPaths, ndevmapperPaths + nmaj) < 0) + goto cleanup; + + for (i = 0; i < nmaj; i++) { + if (virAsprintf(&devmapperPaths[ndevmapperPaths], + "/sys/block/%u:%u", + maj[i], min[i]) < 0) + goto cleanup; + ndevmapperPaths++; + + if (VIR_APPEND_ELEMENT_COPY(paths, npaths, + devmapperPaths[ndevmapperPaths - 1]) < 0) + goto cleanup; + } + + VIR_FREE(min); + VIR_FREE(maj); } if (qemuDomainNamespaceMknodPaths(vm, (const char **)paths, npaths) < 0) @@ -11152,6 +11211,11 @@ qemuDomainNamespaceSetupDisk(virDomainObjPtr vm, ret = 0; cleanup: + for (i = 0; i < ndevmapperPaths; i++) + VIR_FREE(devmapperPaths[i]); + VIR_FREE(devmapperPaths); + VIR_FREE(min); + VIR_FREE(maj); VIR_FREE(paths); return ret; } -- 2.16.1

On Mon, Mar 26, 2018 at 16:43:02 +0200, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1557769
Problem with device mapper targets is that there can be several other devices 'hidden' behind them. For instance, /dev/dm-1 can consist of /dev/sda, /dev/sdb and /dev/sdc. Therefore, when setting up devices CGroup and namespaces we have to take this into account.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- libvirt.spec.in | 2 ++ src/qemu/qemu_cgroup.c | 42 ++++++++++++++++++++++++++++++--- src/qemu/qemu_domain.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 105 insertions(+), 3 deletions(-)
diff --git a/libvirt.spec.in b/libvirt.spec.in index b55a947ec9..ebfac10866 100644 --- a/libvirt.spec.in +++ b/libvirt.spec.in @@ -796,6 +796,8 @@ Requires: gzip Requires: bzip2 Requires: lzop Requires: xz +# For mpath devices +Requires: device-mapper %if 0%{?fedora} || 0%{?rhel} > 7 Requires: systemd-container %endif diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index b604edb31c..e17b3d21b5 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -37,6 +37,7 @@ #include "virtypedparam.h" #include "virnuma.h" #include "virsystemd.h" +#include "virdevmapper.h"
#define VIR_FROM_THIS VIR_FROM_QEMU
@@ -60,7 +61,13 @@ qemuSetupImagePathCgroup(virDomainObjPtr vm, { qemuDomainObjPrivatePtr priv = vm->privateData; int perms = VIR_CGROUP_DEVICE_READ; - int ret; + unsigned int *maj = NULL; + unsigned int *min = NULL; + size_t nmaj = 0; + size_t i; + char *devPath = NULL; + int rv; + int ret = -1;
if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) return 0; @@ -71,12 +78,41 @@ qemuSetupImagePathCgroup(virDomainObjPtr vm, VIR_DEBUG("Allow path %s, perms: %s", path, virCgroupGetDevicePermsString(perms));
- ret = virCgroupAllowDevicePath(priv->cgroup, path, perms, true); + rv = virCgroupAllowDevicePath(priv->cgroup, path, perms, true);
virDomainAuditCgroupPath(vm, priv->cgroup, "allow", path, virCgroupGetDevicePermsString(perms), - ret); + rv); + if (rv < 0) + goto cleanup;
+ if (virDevMapperGetTargets(path, &maj, &min, &nmaj) < 0 && + errno != ENOSYS && errno != EBADF) { + virReportSystemError(errno, + _("Unable to get devmapper targets for %s"), + path); + goto cleanup; + } + + for (i = 0; i < nmaj; i++) { + if (virAsprintf(&devPath, "/dev/block/%u:%u", maj[i], min[i]) < 0) + goto cleanup;
So since this path will not help that much in the audit logs ...
+ + rv = virCgroupAllowDevicePath(priv->cgroup, devPath, perms, true);
... you probably should use virCgroupAllowDevice ...
+ + virDomainAuditCgroupPath(vm, priv->cgroup, "allow", devPath, + virCgroupGetDevicePermsString(perms), + rv);
... and add an equivalent of virDomainAuditCgroupMajor that takes both major:minor similarly to virDomainAuditCgroupMajor.
+ if (rv < 0) + goto cleanup; + VIR_FREE(devPath); + } + + ret = 0; + cleanup: + VIR_FREE(devPath); + VIR_FREE(min); + VIR_FREE(maj); return ret; }
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 580e0f830d..5f56324502 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -54,6 +54,7 @@ #include "secret_util.h" #include "logging/log_manager.h" #include "locking/domain_lock.h" +#include "virdevmapper.h"
#ifdef MAJOR_IN_MKDEV # include <sys/mkdev.h> @@ -10108,6 +10109,11 @@ qemuDomainSetupDisk(virQEMUDriverConfigPtr cfg ATTRIBUTE_UNUSED, { virStorageSourcePtr next; char *dst = NULL; + unsigned int *maj = NULL; + unsigned int *min = NULL; + size_t nmaj = 0; + char *devPath = NULL; + size_t i; int ret = -1;
for (next = disk->src; virStorageSourceIsBacking(next); next = next->backingStore) { @@ -10118,10 +10124,31 @@ qemuDomainSetupDisk(virQEMUDriverConfigPtr cfg ATTRIBUTE_UNUSED,
if (qemuDomainCreateDevice(next->path, data, false) < 0) goto cleanup; + + if (virDevMapperGetTargets(next->path, &maj, &min, &nmaj) < 0 && + errno != ENOSYS && errno != EBADF) { + virReportSystemError(errno, + _("Unable to get mpath targets for %s"), + next->path); + goto cleanup; + } + + for (i = 0; i < nmaj; i++) { + if (virAsprintf(&devPath, "/dev/block/%u:%u", maj[i], min[i]) < 0) + goto cleanup; + + if (qemuDomainCreateDevice(devPath, data, false) < 0) + goto cleanup;
So now that I see this new version, this part starts looking suspicious to me. Since this did not care much that the path changed, is it really necessary to create the /dev/ entries in the container? Looks like even device mapper is returning them as the node specificator, so I'd presume it really does not matter if they are present. More specifically we can't really reverse engineer from the major:minor numbers which actual path the user used so it should not really be necessary for it to be present in the container.

On 03/26/2018 05:17 PM, Peter Krempa wrote:
On Mon, Mar 26, 2018 at 16:43:02 +0200, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1557769
Problem with device mapper targets is that there can be several other devices 'hidden' behind them. For instance, /dev/dm-1 can consist of /dev/sda, /dev/sdb and /dev/sdc. Therefore, when setting up devices CGroup and namespaces we have to take this into account.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- libvirt.spec.in | 2 ++ src/qemu/qemu_cgroup.c | 42 ++++++++++++++++++++++++++++++--- src/qemu/qemu_domain.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 105 insertions(+), 3 deletions(-)
diff --git a/libvirt.spec.in b/libvirt.spec.in index b55a947ec9..ebfac10866 100644 --- a/libvirt.spec.in +++ b/libvirt.spec.in @@ -796,6 +796,8 @@ Requires: gzip Requires: bzip2 Requires: lzop Requires: xz +# For mpath devices +Requires: device-mapper %if 0%{?fedora} || 0%{?rhel} > 7 Requires: systemd-container %endif diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index b604edb31c..e17b3d21b5 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -37,6 +37,7 @@ #include "virtypedparam.h" #include "virnuma.h" #include "virsystemd.h" +#include "virdevmapper.h"
#define VIR_FROM_THIS VIR_FROM_QEMU
@@ -60,7 +61,13 @@ qemuSetupImagePathCgroup(virDomainObjPtr vm, { qemuDomainObjPrivatePtr priv = vm->privateData; int perms = VIR_CGROUP_DEVICE_READ; - int ret; + unsigned int *maj = NULL; + unsigned int *min = NULL; + size_t nmaj = 0; + size_t i; + char *devPath = NULL; + int rv; + int ret = -1;
if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) return 0; @@ -71,12 +78,41 @@ qemuSetupImagePathCgroup(virDomainObjPtr vm, VIR_DEBUG("Allow path %s, perms: %s", path, virCgroupGetDevicePermsString(perms));
- ret = virCgroupAllowDevicePath(priv->cgroup, path, perms, true); + rv = virCgroupAllowDevicePath(priv->cgroup, path, perms, true);
virDomainAuditCgroupPath(vm, priv->cgroup, "allow", path, virCgroupGetDevicePermsString(perms), - ret); + rv); + if (rv < 0) + goto cleanup;
+ if (virDevMapperGetTargets(path, &maj, &min, &nmaj) < 0 && + errno != ENOSYS && errno != EBADF) { + virReportSystemError(errno, + _("Unable to get devmapper targets for %s"), + path); + goto cleanup; + } + + for (i = 0; i < nmaj; i++) { + if (virAsprintf(&devPath, "/dev/block/%u:%u", maj[i], min[i]) < 0) + goto cleanup;
So since this path will not help that much in the audit logs ...
+ + rv = virCgroupAllowDevicePath(priv->cgroup, devPath, perms, true);
... you probably should use virCgroupAllowDevice ...
+ + virDomainAuditCgroupPath(vm, priv->cgroup, "allow", devPath, + virCgroupGetDevicePermsString(perms), + rv);
... and add an equivalent of virDomainAuditCgroupMajor that takes both major:minor similarly to virDomainAuditCgroupMajor.
+ if (rv < 0) + goto cleanup; + VIR_FREE(devPath); + } + + ret = 0; + cleanup: + VIR_FREE(devPath); + VIR_FREE(min); + VIR_FREE(maj); return ret; }
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 580e0f830d..5f56324502 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -54,6 +54,7 @@ #include "secret_util.h" #include "logging/log_manager.h" #include "locking/domain_lock.h" +#include "virdevmapper.h"
#ifdef MAJOR_IN_MKDEV # include <sys/mkdev.h> @@ -10108,6 +10109,11 @@ qemuDomainSetupDisk(virQEMUDriverConfigPtr cfg ATTRIBUTE_UNUSED, { virStorageSourcePtr next; char *dst = NULL; + unsigned int *maj = NULL; + unsigned int *min = NULL; + size_t nmaj = 0; + char *devPath = NULL; + size_t i; int ret = -1;
for (next = disk->src; virStorageSourceIsBacking(next); next = next->backingStore) { @@ -10118,10 +10124,31 @@ qemuDomainSetupDisk(virQEMUDriverConfigPtr cfg ATTRIBUTE_UNUSED,
if (qemuDomainCreateDevice(next->path, data, false) < 0) goto cleanup; + + if (virDevMapperGetTargets(next->path, &maj, &min, &nmaj) < 0 && + errno != ENOSYS && errno != EBADF) { + virReportSystemError(errno, + _("Unable to get mpath targets for %s"), + next->path); + goto cleanup; + } + + for (i = 0; i < nmaj; i++) { + if (virAsprintf(&devPath, "/dev/block/%u:%u", maj[i], min[i]) < 0) + goto cleanup; + + if (qemuDomainCreateDevice(devPath, data, false) < 0) + goto cleanup;
So now that I see this new version, this part starts looking suspicious to me. Since this did not care much that the path changed, is it really necessary to create the /dev/ entries in the container?
Looks like even device mapper is returning them as the node specificator, so I'd presume it really does not matter if they are present.
More specifically we can't really reverse engineer from the major:minor numbers which actual path the user used so it should not really be necessary for it to be present in the container.
Yes, looks like I was too eager trying to fix this bug. I've rebuilt libvirt without qemu_domain.c change (so only CGroup code was modified) and the bug still did not reproduce. So I guess namespace changes are not necessary after all. I'll drop them. Michal

On Mon, Mar 26, 2018 at 17:22:01 +0200, Michal Privoznik wrote:
On 03/26/2018 05:17 PM, Peter Krempa wrote:
On Mon, Mar 26, 2018 at 16:43:02 +0200, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1557769
Problem with device mapper targets is that there can be several other devices 'hidden' behind them. For instance, /dev/dm-1 can consist of /dev/sda, /dev/sdb and /dev/sdc. Therefore, when setting up devices CGroup and namespaces we have to take this into account.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
[...]
+ for (i = 0; i < nmaj; i++) { + if (virAsprintf(&devPath, "/dev/block/%u:%u", maj[i], min[i]) < 0) + goto cleanup; + + if (qemuDomainCreateDevice(devPath, data, false) < 0) + goto cleanup;
So now that I see this new version, this part starts looking suspicious to me. Since this did not care much that the path changed, is it really necessary to create the /dev/ entries in the container?
Looks like even device mapper is returning them as the node specificator, so I'd presume it really does not matter if they are present.
More specifically we can't really reverse engineer from the major:minor numbers which actual path the user used so it should not really be necessary for it to be present in the container.
Yes, looks like I was too eager trying to fix this bug. I've rebuilt libvirt without qemu_domain.c change (so only CGroup code was modified) and the bug still did not reproduce. So I guess namespace changes are not necessary after all. I'll drop them.
Okay. Apart from that I thought about this for a while and also tested various configurations. Unfortunately I was not able to reproduce the issue. I realized that this code is allowing the backing devices for all device-mapper based storage. While I don't think that it is a very big problem we still might have security implications (e.g. give access to the whole device, while the device mapper makes the device accessible only partially). This means that I'd like to have an explanation why it is necessary in that case to allow the backend devices so that we can asses whether it's worth doing it always or we can limit the amount of devices allowed in cgroups. Also yet another implication is that in the hotunplug code for disks we'd remove the cgroup for the main device, but this implementation does not remove the slave devices.

On 03/27/2018 08:56 PM, Peter Krempa wrote:
On Mon, Mar 26, 2018 at 17:22:01 +0200, Michal Privoznik wrote:
On 03/26/2018 05:17 PM, Peter Krempa wrote:
On Mon, Mar 26, 2018 at 16:43:02 +0200, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1557769
Problem with device mapper targets is that there can be several other devices 'hidden' behind them. For instance, /dev/dm-1 can consist of /dev/sda, /dev/sdb and /dev/sdc. Therefore, when setting up devices CGroup and namespaces we have to take this into account.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
[...]
+ for (i = 0; i < nmaj; i++) { + if (virAsprintf(&devPath, "/dev/block/%u:%u", maj[i], min[i]) < 0) + goto cleanup; + + if (qemuDomainCreateDevice(devPath, data, false) < 0) + goto cleanup;
So now that I see this new version, this part starts looking suspicious to me. Since this did not care much that the path changed, is it really necessary to create the /dev/ entries in the container?
Looks like even device mapper is returning them as the node specificator, so I'd presume it really does not matter if they are present.
More specifically we can't really reverse engineer from the major:minor numbers which actual path the user used so it should not really be necessary for it to be present in the container.
Yes, looks like I was too eager trying to fix this bug. I've rebuilt libvirt without qemu_domain.c change (so only CGroup code was modified) and the bug still did not reproduce. So I guess namespace changes are not necessary after all. I'll drop them.
Okay. Apart from that I thought about this for a while and also tested various configurations. Unfortunately I was not able to reproduce the issue.
Looks like this is a kernel bug after all. I've also done some testing myself and fount that: a) on upstream kernel I'm unable to reproduce (4.15.11-gentoo), b) also 7.4 kenel works, c) it's only 7.5 where this bug presents itself. https://bugzilla.redhat.com/show_bug.cgi?id=1557769#c32 So I think we should ignore these patches for now until we have some input from kernel team. Michal

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/news.xml | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/docs/news.xml b/docs/news.xml index 1088895746..6f1ceb6389 100644 --- a/docs/news.xml +++ b/docs/news.xml @@ -94,6 +94,16 @@ </change> </section> <section title="Bug fixes"> + <change> + <summary> + Improve handling of device mapper targets + </summary> + <description> + When starting a domain with a disk hidden behind + devmapper libvirt needs to allow them both in devices + CGroup: the devmapper target and the disk. + </description> + </change> </section> </release> <release version="v4.1.0" date="2018-03-05"> -- 2.16.1

On Mon, Mar 26, 2018 at 16:43:03 +0200, Michal Privoznik wrote:
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/news.xml | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/docs/news.xml b/docs/news.xml index 1088895746..6f1ceb6389 100644 --- a/docs/news.xml +++ b/docs/news.xml @@ -94,6 +94,16 @@ </change> </section> <section title="Bug fixes"> + <change> + <summary> + Improve handling of device mapper targets + </summary> + <description> + When starting a domain with a disk hidden behind + devmapper libvirt needs to allow them both in devices + CGroup: the devmapper target and the disk.
This does not read very well how about: When starting a domain with a disk backed by a device-mapper volume libvirt also needs to allow the storage backing the device mapper in cgroups. But that acutally brings me to yet another question: You described the problem with multipath devices. But given that LVM or dm-crypt would basically suffer from the same problems it would be great if you also could figure out the root of the problem. Also it might be interresting to know if it's similarly happening with multiple layers of device-mapper volumes since 'dmsetup deps' (and the equivalent algorithm added in this series) does not resolve the device nodes recursively.
participants (2)
-
Michal Privoznik
-
Peter Krempa