[libvirt] [PATCH 0/5] Improve handling of multipath devices

Imagine you have a device mapper target which consists of 1..N disks (e.g. it's a multipath device, LUKS disk, whatever). Now, libvirt currently allows only the frontend (/dev/dm-1) in CGroups. However, this can make qemu fail [1] so we need to allow all the backend devices too. 1: https://bugzilla.redhat.com/show_bug.cgi?id=1557769#c22 Michal Privoznik (5): virlog: Don't log devmapper utils: Introduce virFileGetMPathTargets virfile: Introduce virFileMajMinToName qemu: Handle multipath devices properly news: Document multipath fix docs/news.xml | 10 ++++ libvirt.spec.in | 2 + src/libvirt_private.syms | 2 + src/qemu/qemu_cgroup.c | 43 +++++++++++++- src/qemu/qemu_domain.c | 58 +++++++++++++++++++ src/util/virfile.c | 143 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 7 +++ src/util/virlog.c | 23 ++++++++ 8 files changed, 285 insertions(+), 3 deletions(-) -- 2.16.1

Unless overridden, libdevmapper logs all messages to stderr. Therefore if something goes wrong in storage_backend_mpath.c or parthelper.c we don't honour user set logging targets. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virlog.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/src/util/virlog.c b/src/util/virlog.c index dd927f0ba7..51f2d341db 100644 --- a/src/util/virlog.c +++ b/src/util/virlog.c @@ -40,6 +40,9 @@ #if HAVE_SYS_UN_H # include <sys/un.h> #endif +#if WITH_DEVMAPPER +# include <libdevmapper.h> +#endif #include "virerror.h" #include "virlog.h" @@ -258,6 +261,20 @@ virLogPriorityString(virLogPriority lvl) } +#ifdef WITH_DEVMAPPER +static void +virLogDM(int level ATTRIBUTE_UNUSED, + const char *file ATTRIBUTE_UNUSED, + int line ATTRIBUTE_UNUSED, + int dm_errno ATTRIBUTE_UNUSED, + const char *fmt ATTRIBUTE_UNUSED, + ...) +{ + return; +} +#endif + + static int virLogOnceInit(void) { @@ -289,6 +306,12 @@ virLogOnceInit(void) NUL_TERMINATE(virLogHostname); } +#ifdef WITH_DEVMAPPER + /* Ideally, we would not need this. But libdevmapper prints + * error messages to stderr by default. Sad but true. */ + dm_log_with_errno_init(virLogDM); +#endif + virLogUnlock(); return 0; } -- 2.16.1

On Mon, Mar 26, 2018 at 07:16:42 +0200, Michal Privoznik wrote:
Unless overridden, libdevmapper logs all messages to stderr. Therefore if something goes wrong in storage_backend_mpath.c or parthelper.c we don't honour user set logging targets.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virlog.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+)
I think this should be done in the function which registers the mpath storage backend. For now it should work as is, but given that we are going to split the daemons this code will not really be necessary in other places. And it really does not have to do much with our logging code.

On 03/26/2018 09:49 AM, Peter Krempa wrote:
On Mon, Mar 26, 2018 at 07:16:42 +0200, Michal Privoznik wrote:
Unless overridden, libdevmapper logs all messages to stderr. Therefore if something goes wrong in storage_backend_mpath.c or parthelper.c we don't honour user set logging targets.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virlog.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+)
I think this should be done in the function which registers the mpath storage backend. For now it should work as is, but given that we are going to split the daemons this code will not really be necessary in other places.
And it really does not have to do much with our logging code.
Well, in the next patch I'm introducing virFileGetMPathTargets() which uses dm_* a lot. Now, since the function lives in src/util/ it can be used by many drivers. Do we expect each one of them to provide dummy callback on their own? I don't think so. I agree this is not a nice patch, but I don't really see any better place for it. Michal

On Mon, Mar 26, 2018 at 10:16:54 +0200, Michal Privoznik wrote:
On 03/26/2018 09:49 AM, Peter Krempa wrote:
On Mon, Mar 26, 2018 at 07:16:42 +0200, Michal Privoznik wrote:
Unless overridden, libdevmapper logs all messages to stderr. Therefore if something goes wrong in storage_backend_mpath.c or parthelper.c we don't honour user set logging targets.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virlog.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+)
I think this should be done in the function which registers the mpath storage backend. For now it should work as is, but given that we are going to split the daemons this code will not really be necessary in other places.
And it really does not have to do much with our logging code.
Well, in the next patch I'm introducing virFileGetMPathTargets() which uses dm_* a lot. Now, since the function lives in src/util/ it can be used by many drivers. Do we expect each one of them to provide dummy callback on their own? I don't think so. I agree this is not a nice patch, but I don't really see any better place for it.
I've written another response to my original reply which explains a bit more. Basically, the code in the next patch should be put in a new file along with a virOnce function to silence the logs if you don't want to move it into the storage driver directly.

On Mon, Mar 26, 2018 at 09:49:08 +0200, Peter Krempa wrote:
On Mon, Mar 26, 2018 at 07:16:42 +0200, Michal Privoznik wrote:
Unless overridden, libdevmapper logs all messages to stderr. Therefore if something goes wrong in storage_backend_mpath.c or parthelper.c we don't honour user set logging targets.
'parthelper.c' does not use our logging code and also does not use any libdevmapper.h functions. According to my experiment the header file include can be dropped.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virlog.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+)
I think this should be done in the function which registers the mpath storage backend. For now it should work as is, but given that we are going to split the daemons this code will not really be necessary in other places.
So looking at the new code you've added in this series, it still might be desirable to move all the libdevmapper use into the storage driver and keep it there, so users wanting to use multipath would need to install the corresponding storage driver. Given that moving it there might be impractical, I'll suggest that you move it to a separate util file and add virOnce infrastructure to set the error helpers.

On 03/26/2018 10:44 AM, Peter Krempa wrote:
On Mon, Mar 26, 2018 at 09:49:08 +0200, Peter Krempa wrote:
On Mon, Mar 26, 2018 at 07:16:42 +0200, Michal Privoznik wrote:
Unless overridden, libdevmapper logs all messages to stderr. Therefore if something goes wrong in storage_backend_mpath.c or parthelper.c we don't honour user set logging targets.
'parthelper.c' does not use our logging code and also does not use any libdevmapper.h functions. According to my experiment the header file include can be dropped.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/virlog.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+)
I think this should be done in the function which registers the mpath storage backend. For now it should work as is, but given that we are going to split the daemons this code will not really be necessary in other places.
So looking at the new code you've added in this series, it still might be desirable to move all the libdevmapper use into the storage driver and keep it there, so users wanting to use multipath would need to install the corresponding storage driver.
Given that moving it there might be impractical, I'll suggest that you move it to a separate util file and add virOnce infrastructure to set the error helpers.
Okay, fair enough. I'll wait for review on the rest of the patches before I post v2. Thanks, Michal

This helper fetches targets for multipath devices. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virfile.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 4 +++ 3 files changed, 95 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 03fe3b315f..1bbff72263 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1773,6 +1773,7 @@ virFileGetACLs; virFileGetHugepageSize; virFileGetMountReverseSubtree; virFileGetMountSubtree; +virFileGetMPathTargets; virFileHasSuffix; virFileInData; virFileIsAbsPath; diff --git a/src/util/virfile.c b/src/util/virfile.c index 5e9bd2007a..f89e4bd823 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -64,6 +64,10 @@ # include <sys/ioctl.h> #endif +#ifdef WITH_DEVMAPPER +# include <libdevmapper.h> +#endif + #include "configmake.h" #include "intprops.h" #include "viralloc.h" @@ -4227,3 +4231,89 @@ virFileWaitForExists(const char *path, return 0; } + + +#ifdef WITH_DEVMAPPER +/** + * virFileGetMPathTargets: + * @path: multipath device + * @devs: returned array of st_rdevs of @path targets + * @ndevs: size of @devs and @devNames + * + * For given @path figure out its targets, and store them in + * @devs. Items from @devs are meant to be used in + * minor() and major() to receive device MAJ:MIN pairs. + * + * If @path is not a multipath device, @ndevs is set to 0 and + * success is returned. + * + * If we don't have permissions to talk to kernel, -1 is returned + * and errno is set to EBADF. + * + * Returns 0 on success, -1 otherwise. + */ +int +virFileGetMPathTargets(const char *path, + unsigned long long **devs, + size_t *ndevs) +{ + struct dm_deps *deps; + struct dm_task *dmt; + struct dm_info info; + size_t i; + int ret = -1; + + *ndevs = 0; + + if (!(dmt = dm_task_create(DM_DEVICE_DEPS))) + goto cleanup; + + if (!dm_task_set_name(dmt, path)) { + if (errno == ENOENT) { + /* It's okay, @path is not managed by devmapper => + * not a multipath device. */ + ret = 0; + } + goto cleanup; + } + + dm_task_no_open_count(dmt); + + if (!dm_task_run(dmt)) + goto cleanup; + + if (!dm_task_get_info(dmt, &info)) + goto cleanup; + + if (!info.exists) { + ret = 0; + goto cleanup; + } + + if (!(deps = dm_task_get_deps(dmt))) + goto cleanup; + + if (VIR_ALLOC_N(*devs, deps->count) < 0) + goto cleanup; + *ndevs = deps->count; + + for (i = 0; i < deps->count; i++) + (*devs)[i] = deps->device[i]; + + ret = 0; + cleanup: + dm_task_destroy(dmt); + return ret; +} + +#else /* ! WITH_DEVMAPPER */ + +int +virFileGetMPathTargets(const char *path ATTRIBUTE_UNUSED, + unsigned long long **devs ATTRIBUTE_UNUSED, + size_t *ndevs ATTRIBUTE_UNUSED) +{ + errno = ENOSYS; + return -1; +} +#endif /* ! WITH_DEVMAPPER */ diff --git a/src/util/virfile.h b/src/util/virfile.h index cd2a3867c2..0db8bf0b99 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -358,4 +358,8 @@ int virFileInData(int fd, int *inData, long long *length); +int virFileGetMPathTargets(const char *path, + unsigned long long **devs, + size_t *ndevs); + #endif /* __VIR_FILE_H */ -- 2.16.1

On Mon, Mar 26, 2018 at 07:16:43 +0200, Michal Privoznik wrote:
This helper fetches targets for multipath devices.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virfile.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 4 +++ 3 files changed, 95 insertions(+)
As I've pointed in the conversation regarding patch 1/5, this should be put into a separate file, so that it's obvious what needs to be removed if this functionality will need to be put into the storage driver. Additionally it then will be justified to use a virOnce there to set the logging function for libdevmapper.
diff --git a/src/util/virfile.c b/src/util/virfile.c index 5e9bd2007a..f89e4bd823 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c
@@ -4227,3 +4231,89 @@ virFileWaitForExists(const char *path,
return 0; } + + +#ifdef WITH_DEVMAPPER +/** + * virFileGetMPathTargets: + * @path: multipath device + * @devs: returned array of st_rdevs of @path targets + * @ndevs: size of @devs and @devNames
There's no 'devNames' parameter.
+ * + * For given @path figure out its targets, and store them in + * @devs. Items from @devs are meant to be used in + * minor() and major() to receive device MAJ:MIN pairs. + * + * If @path is not a multipath device, @ndevs is set to 0 and + * success is returned. + * + * If we don't have permissions to talk to kernel, -1 is returned + * and errno is set to EBADF.
You need to document that this function does not use libvirt errors.
+ * + * Returns 0 on success, -1 otherwise. + */ +int +virFileGetMPathTargets(const char *path, + unsigned long long **devs,
struct 'dm_deps' uses uint64_t for this.
+ size_t *ndevs) +{ + struct dm_deps *deps; + struct dm_task *dmt; + struct dm_info info; + size_t i; + int ret = -1; + + *ndevs = 0; + + if (!(dmt = dm_task_create(DM_DEVICE_DEPS))) + goto cleanup; + + if (!dm_task_set_name(dmt, path)) { + if (errno == ENOENT) { + /* It's okay, @path is not managed by devmapper => + * not a multipath device. */ + ret = 0; + } + goto cleanup; + } + + dm_task_no_open_count(dmt); + + if (!dm_task_run(dmt)) + goto cleanup; + + if (!dm_task_get_info(dmt, &info)) + goto cleanup; + + if (!info.exists) { + ret = 0; + goto cleanup; + } + + if (!(deps = dm_task_get_deps(dmt))) + goto cleanup; + + if (VIR_ALLOC_N(*devs, deps->count) < 0)
VIR_ALLOC_N_QUIET, since we don't do libvirt errors.
+ goto cleanup; + *ndevs = deps->count; + + for (i = 0; i < deps->count; i++) + (*devs)[i] = deps->device[i]; + + ret = 0; + cleanup: + dm_task_destroy(dmt); + return ret; +}
Okay this code is stol^W inspired by the 'dmsetup deps' functionality. You might want to point that fact out, since it took some time to find it. The code looks good, but since it requires the virOnce stuff, v2 will be needed.

This function can be used to translate MAJ:MIN device pair into /dev/blah name. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virfile.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 3 +++ 3 files changed, 57 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 1bbff72263..a705bfd5ef 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1787,6 +1787,7 @@ virFileLength; virFileLinkPointsTo; virFileLock; virFileLoopDeviceAssociate; +virFileMajMinToName; virFileMakeParentPath; virFileMakePath; virFileMakePathWithMode; diff --git a/src/util/virfile.c b/src/util/virfile.c index f89e4bd823..62620862a1 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -68,6 +68,12 @@ # include <libdevmapper.h> #endif +#ifdef MAJOR_IN_MKDEV +# include <sys/mkdev.h> +#elif MAJOR_IN_SYSMACROS +# include <sys/sysmacros.h> +#endif + #include "configmake.h" #include "intprops.h" #include "viralloc.h" @@ -4317,3 +4323,50 @@ virFileGetMPathTargets(const char *path ATTRIBUTE_UNUSED, return -1; } #endif /* ! WITH_DEVMAPPER */ + + +/** + * virFileMajMinToName: + * @device: device (rdev) to translate + * @name: returned name + * + * For given MAJ:MIN pair (stored in one integer like in st_rdev) + * fetch device name, e.g. 8:0 is translated to "/dev/sda". + * Caller is responsible for freeing @name when no longer needed. + * + * Returns 0 on success, -1 otherwise. + */ +int +virFileMajMinToName(unsigned long long device, + char **name) +{ + struct stat sb; + char *sysfsPath = NULL; + char *link = NULL; + int ret = -1; + + *name = NULL; + if (virAsprintf(&sysfsPath, "/sys/dev/block/%u:%u", + major(device), minor(device)) < 0) + goto cleanup; + + if (lstat(sysfsPath, &sb) < 0) + goto cleanup; + + if (!S_ISLNK(sb.st_mode)) { + errno = ENXIO; + goto cleanup; + } + + if (virFileReadLink(sysfsPath, &link) < 0) + goto cleanup; + + if (virAsprintf(name, "/dev/%s", last_component(link)) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(link); + VIR_FREE(sysfsPath); + return ret; +} diff --git a/src/util/virfile.h b/src/util/virfile.h index 0db8bf0b99..390940dc98 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -362,4 +362,7 @@ int virFileGetMPathTargets(const char *path, unsigned long long **devs, size_t *ndevs); +int virFileMajMinToName(unsigned long long device, + char **name); + #endif /* __VIR_FILE_H */ -- 2.16.1

On Mon, Mar 26, 2018 at 07:16:44 +0200, Michal Privoznik wrote:
This function can be used to translate MAJ:MIN device pair into /dev/blah name.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virfile.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 3 +++ 3 files changed, 57 insertions(+)
diff --git a/src/util/virfile.c b/src/util/virfile.c index f89e4bd823..62620862a1 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -68,6 +68,12 @@ # include <libdevmapper.h> #endif
+#ifdef MAJOR_IN_MKDEV +# include <sys/mkdev.h> +#elif MAJOR_IN_SYSMACROS +# include <sys/sysmacros.h> +#endif + #include "configmake.h" #include "intprops.h" #include "viralloc.h" @@ -4317,3 +4323,50 @@ virFileGetMPathTargets(const char *path ATTRIBUTE_UNUSED, return -1; } #endif /* ! WITH_DEVMAPPER */ + + +/** + * virFileMajMinToName: + * @device: device (rdev) to translate + * @name: returned name
I'd prefer if you supply the device identifiers already separated. That means that the function in the previous patch should preferrably already split them so that we don't need to do it here.
+ * + * For given MAJ:MIN pair (stored in one integer like in st_rdev) + * fetch device name, e.g. 8:0 is translated to "/dev/sda". + * Caller is responsible for freeing @name when no longer needed.
There is no info on how to treat errors.
+ * + * Returns 0 on success, -1 otherwise.
Why isn't the path returned directly?
+ */ +int +virFileMajMinToName(unsigned long long device, + char **name) +{ + struct stat sb; + char *sysfsPath = NULL; + char *link = NULL; + int ret = -1; + + *name = NULL; + if (virAsprintf(&sysfsPath, "/sys/dev/block/%u:%u", + major(device), minor(device)) < 0)
So the libdevmapper code is using a similar path to figure out the sysfs path, but only for devicemapper devices since it accesses /sys/dev/block/%u:%u/dm/name. I've also found that '/dev/block/' has the information you might need. Is there any particular reason to use sysfs? I'm not really persuaded that the directory name in sysfs is good enough so if you can provide where you've inspired yourself it will be beneficial. Additionally virAsprintf reports libvirt errors but this function does not.
+ goto cleanup; + + if (lstat(sysfsPath, &sb) < 0) + goto cleanup; + + if (!S_ISLNK(sb.st_mode)) { + errno = ENXIO; + goto cleanup; + } + + if (virFileReadLink(sysfsPath, &link) < 0) + goto cleanup; + + if (virAsprintf(name, "/dev/%s", last_component(link)) < 0) + goto cleanup;
So this is the fishy path. I'd prefer /dev/block since the path already points to something in the /dev/ filesystem and isn't conjured up from something that looks the same.
+ + ret = 0; + cleanup: + VIR_FREE(link); + VIR_FREE(sysfsPath); + return ret; +} diff --git a/src/util/virfile.h b/src/util/virfile.h index 0db8bf0b99..390940dc98 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -362,4 +362,7 @@ int virFileGetMPathTargets(const char *path, unsigned long long **devs, size_t *ndevs);
+int virFileMajMinToName(unsigned long long device, + char **name); + #endif /* __VIR_FILE_H */ -- 2.16.1
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 03/26/2018 11:58 AM, Peter Krempa wrote:
On Mon, Mar 26, 2018 at 07:16:44 +0200, Michal Privoznik wrote:
This function can be used to translate MAJ:MIN device pair into /dev/blah name.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virfile.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 3 +++ 3 files changed, 57 insertions(+)
diff --git a/src/util/virfile.c b/src/util/virfile.c index f89e4bd823..62620862a1 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -68,6 +68,12 @@ # include <libdevmapper.h> #endif
+#ifdef MAJOR_IN_MKDEV +# include <sys/mkdev.h> +#elif MAJOR_IN_SYSMACROS +# include <sys/sysmacros.h> +#endif + #include "configmake.h" #include "intprops.h" #include "viralloc.h" @@ -4317,3 +4323,50 @@ virFileGetMPathTargets(const char *path ATTRIBUTE_UNUSED, return -1; } #endif /* ! WITH_DEVMAPPER */ + + +/** + * virFileMajMinToName: + * @device: device (rdev) to translate + * @name: returned name
I'd prefer if you supply the device identifiers already separated. That means that the function in the previous patch should preferrably already split them so that we don't need to do it here.
Okay.
+ * + * For given MAJ:MIN pair (stored in one integer like in st_rdev) + * fetch device name, e.g. 8:0 is translated to "/dev/sda". + * Caller is responsible for freeing @name when no longer needed.
There is no info on how to treat errors.
+ * + * Returns 0 on success, -1 otherwise.
Why isn't the path returned directly?
No reason. It's just that I'm used to functions returning an integer. I can change it.
+ */ +int +virFileMajMinToName(unsigned long long device, + char **name) +{ + struct stat sb; + char *sysfsPath = NULL; + char *link = NULL; + int ret = -1; + + *name = NULL; + if (virAsprintf(&sysfsPath, "/sys/dev/block/%u:%u", + major(device), minor(device)) < 0)
So the libdevmapper code is using a similar path to figure out the sysfs path, but only for devicemapper devices since it accesses /sys/dev/block/%u:%u/dm/name.
I've also found that '/dev/block/' has the information you might need.
Is there any particular reason to use sysfs? I'm not really persuaded that the directory name in sysfs is good enough so if you can provide where you've inspired yourself it will be beneficial.
Okay, I'll switch to /dev/block/... Looks like I took too much inspiration from other projects O:-)
Additionally virAsprintf reports libvirt errors but this function does not.
+ goto cleanup; + + if (lstat(sysfsPath, &sb) < 0) + goto cleanup; + + if (!S_ISLNK(sb.st_mode)) { + errno = ENXIO; + goto cleanup; + } + + if (virFileReadLink(sysfsPath, &link) < 0) + goto cleanup; + + if (virAsprintf(name, "/dev/%s", last_component(link)) < 0) + goto cleanup;
So this is the fishy path. I'd prefer /dev/block since the path already points to something in the /dev/ filesystem and isn't conjured up from something that looks the same.
Ah, good point. So if I do plain: virAsprintf(&path, "/dev/block/%u:%u", major(), minor()) I immediately get the correct symlink with both namespace and cgroup codes know how to deal with. Cool, so this patch might be dropped. Michal

https://bugzilla.redhat.com/show_bug.cgi?id=1557769 Problem with multipath devices is that there can be several other devices 'hidden' behind them. For instance, /dev/dm-1 can consist of /dev/sda, /dev/sdb and /dev/sdc. Therefore, when setting up devices CGroup and namespaces we have to take this into account. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- libvirt.spec.in | 2 ++ src/qemu/qemu_cgroup.c | 43 ++++++++++++++++++++++++++++++++++--- src/qemu/qemu_domain.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 100 insertions(+), 3 deletions(-) diff --git a/libvirt.spec.in b/libvirt.spec.in index b55a947ec9..ebfac10866 100644 --- a/libvirt.spec.in +++ b/libvirt.spec.in @@ -796,6 +796,8 @@ Requires: gzip Requires: bzip2 Requires: lzop Requires: xz +# For mpath devices +Requires: device-mapper %if 0%{?fedora} || 0%{?rhel} > 7 Requires: systemd-container %endif diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index b604edb31c..a2198c9789 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -60,7 +60,12 @@ qemuSetupImagePathCgroup(virDomainObjPtr vm, { qemuDomainObjPrivatePtr priv = vm->privateData; int perms = VIR_CGROUP_DEVICE_READ; - int ret; + unsigned long long *mpathdevs = NULL; + size_t nmpathdevs = 0; + size_t i; + char *devPath = NULL; + int rv; + int ret = -1; if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) return 0; @@ -71,12 +76,44 @@ qemuSetupImagePathCgroup(virDomainObjPtr vm, VIR_DEBUG("Allow path %s, perms: %s", path, virCgroupGetDevicePermsString(perms)); - ret = virCgroupAllowDevicePath(priv->cgroup, path, perms, true); + rv = virCgroupAllowDevicePath(priv->cgroup, path, perms, true); virDomainAuditCgroupPath(vm, priv->cgroup, "allow", path, virCgroupGetDevicePermsString(perms), - ret); + rv); + if (rv < 0) + goto cleanup; + if (virFileGetMPathTargets(path, &mpathdevs, &nmpathdevs) < 0 && + errno != ENOSYS && errno != EBADF) { + virReportSystemError(errno, + _("Unable to get mpath targets for %s"), + path); + goto cleanup; + } + + for (i = 0; i < nmpathdevs; i++) { + if (virFileMajMinToName(mpathdevs[i], &devPath) < 0) { + virReportSystemError(errno, + _("Unable to translate %llx to device path"), + mpathdevs[i]); + goto cleanup; + } + + rv = virCgroupAllowDevicePath(priv->cgroup, devPath, perms, true); + + virDomainAuditCgroupPath(vm, priv->cgroup, "allow", devPath, + virCgroupGetDevicePermsString(perms), + rv); + if (rv < 0) + goto cleanup; + VIR_FREE(devPath); + } + + ret = 0; + cleanup: + VIR_FREE(devPath); + VIR_FREE(mpathdevs); return ret; } diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 4aaf617dae..cd39eb5942 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -10105,6 +10105,10 @@ qemuDomainSetupDisk(virQEMUDriverConfigPtr cfg ATTRIBUTE_UNUSED, { virStorageSourcePtr next; char *dst = NULL; + unsigned long long *mpathdevs = NULL; + size_t nmpathdevs = 0; + char *mpathPath = NULL; + size_t i; int ret = -1; for (next = disk->src; virStorageSourceIsBacking(next); next = next->backingStore) { @@ -10115,10 +10119,34 @@ qemuDomainSetupDisk(virQEMUDriverConfigPtr cfg ATTRIBUTE_UNUSED, if (qemuDomainCreateDevice(next->path, data, false) < 0) goto cleanup; + + if (virFileGetMPathTargets(next->path, &mpathdevs, &nmpathdevs) < 0 && + errno != ENOSYS && errno != EBADF) { + virReportSystemError(errno, + _("Unable to get mpath targets for %s"), + next->path); + goto cleanup; + } + + for (i = 0; i < nmpathdevs; i++) { + if (virFileMajMinToName(mpathdevs[i], &mpathPath) < 0) { + virReportSystemError(errno, + _("Unable to translate %llx to device path"), + mpathdevs[i]); + goto cleanup; + } + + if (qemuDomainCreateDevice(mpathPath, data, false) < 0) + goto cleanup; + + VIR_FREE(mpathPath); + } } ret = 0; cleanup: + VIR_FREE(mpathPath); + VIR_FREE(mpathdevs); VIR_FREE(dst); return ret; } @@ -11128,6 +11156,10 @@ qemuDomainNamespaceSetupDisk(virDomainObjPtr vm, virStorageSourcePtr next; char **paths = NULL; size_t npaths = 0; + unsigned long long *mpathdevs = NULL; + char **mpathdevPaths = NULL; + size_t nmpathdevs = 0; + size_t i; int ret = -1; if (!qemuDomainNamespaceEnabled(vm, QEMU_DOMAIN_NS_MOUNT)) @@ -11142,6 +11174,29 @@ qemuDomainNamespaceSetupDisk(virDomainObjPtr vm, if (VIR_APPEND_ELEMENT_COPY(paths, npaths, next->path) < 0) goto cleanup; + + if (virFileGetMPathTargets(next->path, &mpathdevs, &nmpathdevs) < 0 && + errno != ENOSYS && errno != EBADF) { + virReportSystemError(errno, + _("Unable to get mpath targets for %s"), + next->path); + goto cleanup; + } + + if (VIR_ALLOC_N(mpathdevPaths, nmpathdevs) < 0) + goto cleanup; + + for (i = 0; i < nmpathdevs; i++) { + if (virFileMajMinToName(mpathdevs[i], &mpathdevPaths[i]) < 0) { + virReportSystemError(errno, + _("Unable to translate %llx to device path"), + mpathdevs[i]); + goto cleanup; + } + + if (VIR_APPEND_ELEMENT_COPY(paths, npaths, mpathdevPaths[i]) < 0) + goto cleanup; + } } if (qemuDomainNamespaceMknodPaths(vm, (const char **)paths, npaths) < 0) @@ -11149,6 +11204,9 @@ qemuDomainNamespaceSetupDisk(virDomainObjPtr vm, ret = 0; cleanup: + for (i = 0; i < nmpathdevs; i++) + VIR_FREE(mpathdevPaths[i]); + VIR_FREE(mpathdevPaths); VIR_FREE(paths); return ret; } -- 2.16.1

On Mon, Mar 26, 2018 at 07:16:45 +0200, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1557769
Problem with multipath devices is that there can be several other devices 'hidden' behind them. For instance, /dev/dm-1 can consist of /dev/sda, /dev/sdb and /dev/sdc. Therefore, when setting up devices CGroup and namespaces we have to take this into account.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- libvirt.spec.in | 2 ++ src/qemu/qemu_cgroup.c | 43 ++++++++++++++++++++++++++++++++++--- src/qemu/qemu_domain.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 100 insertions(+), 3 deletions(-)
diff --git a/libvirt.spec.in b/libvirt.spec.in index b55a947ec9..ebfac10866 100644 --- a/libvirt.spec.in +++ b/libvirt.spec.in @@ -796,6 +796,8 @@ Requires: gzip Requires: bzip2 Requires: lzop Requires: xz +# For mpath devices +Requires: device-mapper %if 0%{?fedora} || 0%{?rhel} > 7 Requires: systemd-container %endif diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index b604edb31c..a2198c9789 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -60,7 +60,12 @@ qemuSetupImagePathCgroup(virDomainObjPtr vm, { qemuDomainObjPrivatePtr priv = vm->privateData; int perms = VIR_CGROUP_DEVICE_READ; - int ret; + unsigned long long *mpathdevs = NULL; + size_t nmpathdevs = 0; + size_t i; + char *devPath = NULL; + int rv; + int ret = -1;
if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_DEVICES)) return 0; @@ -71,12 +76,44 @@ qemuSetupImagePathCgroup(virDomainObjPtr vm, VIR_DEBUG("Allow path %s, perms: %s", path, virCgroupGetDevicePermsString(perms));
- ret = virCgroupAllowDevicePath(priv->cgroup, path, perms, true); + rv = virCgroupAllowDevicePath(priv->cgroup, path, perms, true);
virDomainAuditCgroupPath(vm, priv->cgroup, "allow", path, virCgroupGetDevicePermsString(perms), - ret); + rv); + if (rv < 0) + goto cleanup;
+ if (virFileGetMPathTargets(path, &mpathdevs, &nmpathdevs) < 0 && + errno != ENOSYS && errno != EBADF) { + virReportSystemError(errno, + _("Unable to get mpath targets for %s"), + path); + goto cleanup; + } + + for (i = 0; i < nmpathdevs; i++) { + if (virFileMajMinToName(mpathdevs[i], &devPath) < 0) {
So for cgroups this really isn't necessary. You can use virCgroupAllowDevice which already takes the major/minor numbers rather than a path. Given that virCgroupAllowDevicePath would convert the path to major/minor numbers, this conversion is really wasteful.
+ virReportSystemError(errno, + _("Unable to translate %llx to device path"), + mpathdevs[i]);
This error message would be very useless, given that it's not coverted into major/minor.
+ goto cleanup; + } + + rv = virCgroupAllowDevicePath(priv->cgroup, devPath, perms, true); + + virDomainAuditCgroupPath(vm, priv->cgroup, "allow", devPath, + virCgroupGetDevicePermsString(perms), + rv);
Hmm, right. The message in the audit log when allowing the device would not contain the path to the device, which would make it slightly harder to use.
+ if (rv < 0) + goto cleanup; + VIR_FREE(devPath); + } + + ret = 0; + cleanup: + VIR_FREE(devPath); + VIR_FREE(mpathdevs); return ret;
The only thing is that I'm not really a fan of calling the detection code twice for every device you are going to add to the VM, once for cgroups and once for namespaces. I think that the disk image setup code should be extracted, so that the data needs to be obtained only once and afterwards can be used to setup all the required images. I'm not going to insist, but that also means that nobody will ever do it. Especially since we for some reason hide all cgroup code setup into qemu_cgroup. This also means that it has to be duplicated for namespaces. The code will require some changes if you implement my suggestions, but it looks reasonable.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/news.xml | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/docs/news.xml b/docs/news.xml index 1088895746..a7ac63eddb 100644 --- a/docs/news.xml +++ b/docs/news.xml @@ -94,6 +94,16 @@ </change> </section> <section title="Bug fixes"> + <change> + <summary> + Improve handling of multipath devices + </summary> + <description> + When starting a domain with a disk hidden behind + devmapper libvirt needs to allow them both in devices + CGroup: the devmapper target and the disk. + </description> + </change> </section> </release> <release version="v4.1.0" date="2018-03-05"> -- 2.16.1
participants (2)
-
Michal Privoznik
-
Peter Krempa