[libvirt] [PATCH 0/1] Multipath pool support

The following patch implements multipath pool support. It's very basic functionality, consisting of creating a pool that contains all the multipath devices on the host. That will cover the common case of users who just want to discover all the available multipath devices and assign them to guests.

It doesn't currently allow configuration of multipathing, so for now setting the multipath configuration will have to continue to be done as part of the host system build.

Example XML to create the pool is:

    <pool type="mpath">
      <name>mpath</name>
      <target>
        <path>/dev/mapper</path>
      </target>
    </pool>

The target element is ignored, as it is by the disk pool, but the config code rejects the XML if it does not exist. That behavior should obviously be cleaned up, but I think that should be done in a separate patch, as it's really a bug in the config code, not related to the addition of the new pool type.

Dave
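[Editor's note: for readers wanting to try the pool from a management application, here is a minimal sketch using the public libvirt C API. It only assumes the XML above; the NULL connection URI and the fixed 16-name buffer are illustrative choices, not anything mandated by the patch. virStoragePoolRefresh() is the call that ends up invoking the refreshPool callback the patch adds.]

    /* Minimal sketch: define and start the mpath pool, then list its volumes.
     * Error handling is abbreviated; the 16-name limit is illustrative. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <libvirt/libvirt.h>

    int main(void)
    {
        const char *xml =
            "<pool type='mpath'>"
            "  <name>mpath</name>"
            "  <target><path>/dev/mapper</path></target>"
            "</pool>";
        char *names[16];
        int i, n;

        virConnectPtr conn = virConnectOpen(NULL);    /* default URI */
        if (!conn)
            return 1;

        virStoragePoolPtr pool = virStoragePoolDefineXML(conn, xml, 0);
        if (pool &&
            virStoragePoolCreate(pool, 0) == 0 &&
            virStoragePoolRefresh(pool, 0) == 0) {
            n = virStoragePoolListVolumes(pool, names, 16);
            for (i = 0; i < n; i++) {
                printf("multipath volume: %s\n", names[i]);
                free(names[i]);    /* volume names are allocated by libvirt */
            }
        }

        if (pool)
            virStoragePoolFree(pool);
        virConnectClose(conn);
        return 0;
    }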

This pool type contains volumes for the multipath devices that are present on the host. It does not (yet) support any sort of multipath configuration, so that aspect of system administration must still be done at host build time. --- configure.in | 24 +++ po/POTFILES.in | 1 + src/Makefile.am | 9 + src/storage_backend.c | 82 ++++++++++ src/storage_backend.h | 5 +- src/storage_backend_mpath.c | 344 +++++++++++++++++++++++++++++++++++++++++++ src/storage_backend_mpath.h | 31 ++++ src/storage_conf.c | 7 +- src/storage_conf.h | 1 + 9 files changed, 502 insertions(+), 2 deletions(-) create mode 100644 src/storage_backend_mpath.c create mode 100644 src/storage_backend_mpath.h diff --git a/configure.in b/configure.in index 634e812..7a94373 100644 --- a/configure.in +++ b/configure.in @@ -906,6 +906,8 @@ AC_ARG_WITH([storage-iscsi], [ --with-storage-iscsi with iSCSI backend for the storage driver (on)],[],[with_storage_iscsi=check]) AC_ARG_WITH([storage-scsi], [ --with-storage-scsi with SCSI backend for the storage driver (on)],[],[with_storage_scsi=check]) +AC_ARG_WITH([storage-mpath], +[ --with-storage-mpath with mpath backend for the storage driver (on)],[],[with_storage_mpath=check]) AC_ARG_WITH([storage-disk], [ --with-storage-disk with GPartd Disk backend for the storage driver (on)],[],[with_storage_disk=check]) @@ -916,6 +918,7 @@ if test "$with_libvirtd" = "no"; then with_storage_lvm=no with_storage_iscsi=no with_storage_scsi=no + with_storage_mpath=no with_storage_disk=no fi if test "$with_storage_dir" = "yes" ; then @@ -1037,6 +1040,26 @@ if test "$with_storage_scsi" = "check"; then fi AM_CONDITIONAL([WITH_STORAGE_SCSI], [test "$with_storage_scsi" = "yes"]) +if test "$with_storage_mpath" = "check"; then + with_storage_mpath=yes + + AC_DEFINE_UNQUOTED([WITH_STORAGE_MPATH], 1, + [whether mpath backend for storage driver is enabled]) +fi +AM_CONDITIONAL([WITH_STORAGE_MPATH], [test "$with_storage_mpath" = "yes"]) + +if test "$with_storage_mpath" = "yes"; then + DEVMAPPER_REQUIRED=0.0 + DEVMAPPER_CFLAGS= + DEVMAPPER_LIBS= + PKG_CHECK_MODULES(DEVMAPPER, devmapper >= $DEVMAPPER_REQUIRED, + [], [ + AC_MSG_ERROR( + [You must install device-mapper-devel >= $DEVMAPPER_REQUIRED to compile libvirt]) + ]) +fi +AC_SUBST([DEVMAPPER_CFLAGS]) +AC_SUBST([DEVMAPPER_LIBS]) LIBPARTED_CFLAGS= LIBPARTED_LIBS= @@ -1506,6 +1529,7 @@ AC_MSG_NOTICE([ NetFS: $with_storage_fs]) AC_MSG_NOTICE([ LVM: $with_storage_lvm]) AC_MSG_NOTICE([ iSCSI: $with_storage_iscsi]) AC_MSG_NOTICE([ SCSI: $with_storage_scsi]) +AC_MSG_NOTICE([ mpath: $with_storage_mpath]) AC_MSG_NOTICE([ Disk: $with_storage_disk]) AC_MSG_NOTICE([]) AC_MSG_NOTICE([Security Drivers]) diff --git a/po/POTFILES.in b/po/POTFILES.in index 7ccf3ab..f70936b 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -36,6 +36,7 @@ src/storage_backend_fs.c src/storage_backend_iscsi.c src/storage_backend_logical.c src/storage_backend_scsi.c +src/storage_backend_mpath.c src/storage_conf.c src/storage_driver.c src/test.c diff --git a/src/Makefile.am b/src/Makefile.am index 79826b1..57bf9ba 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -184,6 +184,9 @@ STORAGE_DRIVER_ISCSI_SOURCES = \ STORAGE_DRIVER_SCSI_SOURCES = \ storage_backend_scsi.h storage_backend_scsi.c +STORAGE_DRIVER_MPATH_SOURCES = \ + storage_backend_mpath.h storage_backend_mpath.c + STORAGE_DRIVER_DISK_SOURCES = \ storage_backend_disk.h storage_backend_disk.c @@ -436,6 +439,10 @@ if WITH_STORAGE_SCSI libvirt_driver_storage_la_SOURCES += $(STORAGE_DRIVER_SCSI_SOURCES) endif +if WITH_STORAGE_MPATH 
+libvirt_driver_storage_la_SOURCES += $(STORAGE_DRIVER_MPATH_SOURCES) +endif + if WITH_STORAGE_DISK libvirt_driver_storage_la_SOURCES += $(STORAGE_DRIVER_DISK_SOURCES) endif @@ -495,6 +502,7 @@ EXTRA_DIST += \ $(STORAGE_DRIVER_LVM_SOURCES) \ $(STORAGE_DRIVER_ISCSI_SOURCES) \ $(STORAGE_DRIVER_SCSI_SOURCES) \ + $(STORAGE_DRIVER_MPATH_SOURCES) \ $(STORAGE_DRIVER_DISK_SOURCES) \ $(NODE_DEVICE_DRIVER_SOURCES) \ $(NODE_DEVICE_DRIVER_HAL_SOURCES) \ @@ -563,6 +571,7 @@ libvirt_la_LDFLAGS = $(VERSION_SCRIPT_FLAGS)libvirt.syms \ $(COVERAGE_CFLAGS:-f%=-Wc,-f%) \ $(LIBXML_LIBS) $(SELINUX_LIBS) \ $(XEN_LIBS) $(DRIVER_MODULE_LIBS) \ + $(DEVMAPPER_LIBS) \ @CYGWIN_EXTRA_LDFLAGS@ @MINGW_EXTRA_LDFLAGS@ libvirt_la_CFLAGS = $(COVERAGE_CFLAGS) -DIN_LIBVIRT libvirt_la_DEPENDENCIES = $(libvirt_la_LIBADD) libvirt.syms diff --git a/src/storage_backend.c b/src/storage_backend.c index 1664804..0994fa0 100644 --- a/src/storage_backend.c +++ b/src/storage_backend.c @@ -59,6 +59,9 @@ #if WITH_STORAGE_SCSI #include "storage_backend_scsi.h" #endif +#if WITH_STORAGE_MPATH +#include "storage_backend_mpath.h" +#endif #if WITH_STORAGE_DISK #include "storage_backend_disk.h" #endif @@ -89,6 +92,9 @@ static virStorageBackendPtr backends[] = { #if WITH_STORAGE_SCSI &virStorageBackendSCSI, #endif +#if WITH_STORAGE_MPATH + &virStorageBackendMpath, +#endif #if WITH_STORAGE_DISK &virStorageBackendDisk, #endif @@ -770,6 +776,82 @@ virStorageBackendUpdateVolTargetInfoFD(virConnectPtr conn, return 0; } + +struct diskType { + int part_table_type; + unsigned short offset; + unsigned short length; + unsigned long long magic; +}; + + +static struct diskType const disk_types[] = { + { VIR_STORAGE_POOL_DISK_LVM2, 0x218, 8, 0x31303020324D564CULL }, + { VIR_STORAGE_POOL_DISK_GPT, 0x200, 8, 0x5452415020494645ULL }, + { VIR_STORAGE_POOL_DISK_DVH, 0x0, 4, 0x41A9E50BULL }, + { VIR_STORAGE_POOL_DISK_MAC, 0x0, 2, 0x5245ULL }, + { VIR_STORAGE_POOL_DISK_BSD, 0x40, 4, 0x82564557ULL }, + { VIR_STORAGE_POOL_DISK_SUN, 0x1fc, 2, 0xBEDAULL }, + /* + * NOTE: pc98 is funky; the actual signature is 0x55AA (just like dos), so + * we can't use that. At the moment I'm relying on the "dummy" IPL + * bootloader data that comes from parted. Luckily, the chances of running + * into a pc98 machine running libvirt are approximately nil. + */ + /*{ 0x1fe, 2, 0xAA55UL },*/ + { VIR_STORAGE_POOL_DISK_PC98, 0x0, 8, 0x314C5049000000CBULL }, + /* + * NOTE: the order is important here; some other disk types (like GPT and + * and PC98) also have 0x55AA at this offset. For that reason, the DOS + * one must be the last one. 
+ */ + { VIR_STORAGE_POOL_DISK_DOS, 0x1fe, 2, 0xAA55ULL }, + { -1, 0x0, 0, 0x0ULL }, +}; + + +int +virStorageBackendUpdateVolTargetFormatFD(virConnectPtr conn, + virStorageVolTargetPtr target, + int fd) +{ + int i; + off_t start; + unsigned char buffer[1024]; + ssize_t bytes; + + /* make sure to set the target format "unknown" to begin with */ + target->format = VIR_STORAGE_POOL_DISK_UNKNOWN; + + start = lseek(fd, 0, SEEK_SET); + if (start < 0) { + virReportSystemError(conn, errno, + _("cannot seek to beginning of file '%s'"), + target->path); + return -1; + } + bytes = saferead(fd, buffer, sizeof(buffer)); + if (bytes < 0) { + virReportSystemError(conn, errno, + _("cannot read beginning of file '%s'"), + target->path); + return -1; + } + + for (i = 0; disk_types[i].part_table_type != -1; i++) { + if (disk_types[i].offset + disk_types[i].length > bytes) + continue; + if (memcmp(buffer+disk_types[i].offset, &disk_types[i].magic, + disk_types[i].length) == 0) { + target->format = disk_types[i].part_table_type; + break; + } + } + + return 0; +} + + void virStorageBackendWaitForDevices(virConnectPtr conn) { virWaitForDevices(conn); diff --git a/src/storage_backend.h b/src/storage_backend.h index e80b05d..eb5bf87 100644 --- a/src/storage_backend.h +++ b/src/storage_backend.h @@ -91,7 +91,10 @@ int virStorageBackendUpdateVolTargetInfoFD(virConnectPtr conn, int fd, unsigned long long *allocation, unsigned long long *capacity); - +int +virStorageBackendUpdateVolTargetFormatFD(virConnectPtr conn, + virStorageVolTargetPtr target, + int fd); void virStorageBackendWaitForDevices(virConnectPtr conn); char *virStorageBackendStablePath(virConnectPtr conn, diff --git a/src/storage_backend_mpath.c b/src/storage_backend_mpath.c new file mode 100644 index 0000000..8a2d90a --- /dev/null +++ b/src/storage_backend_mpath.c @@ -0,0 +1,344 @@ +/* + * storage_backend_mpath.c: storage backend for multipath handling + * + * Copyright (C) 2007-2008 Red Hat, Inc. + * Copyright (C) 2007-2008 Daniel P. Berrange + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Author: Daniel P. 
Berrange <berrange redhat com> + */ + +#include <config.h> + +#include <unistd.h> +#include <stdio.h> +#include <dirent.h> +#include <fcntl.h> + +#include <libdevmapper.h> + +#include "virterror_internal.h" +#include "storage_conf.h" +#include "storage_backend.h" +#include "memory.h" +#include "logging.h" + +#define VIR_FROM_THIS VIR_FROM_STORAGE + +static int +virStorageBackendMpathUpdateVolTargetInfo(virConnectPtr conn, + virStorageVolTargetPtr target, + unsigned long long *allocation, + unsigned long long *capacity) +{ + int fd; + + if ((fd = open(target->path, O_RDONLY)) < 0) { + virReportSystemError(conn, errno, + _("cannot open volume '%s'"), + target->path); + return -1; + } + + if (virStorageBackendUpdateVolTargetInfoFD(conn, + target, + fd, + allocation, + capacity) < 0) { + + VIR_ERROR(_("Failed to update volume target info for %s"), + target->path); + + return -1; + } + + if (virStorageBackendUpdateVolTargetFormatFD(conn, + target, + fd) < 0) { + VIR_ERROR(_("Failed to update volume target format for %s"), + target->path); + + return -1; + } + + return 0; +} + + +static int +virStorageBackendMpathNewVol(virConnectPtr conn, + virStoragePoolObjPtr pool, + const int devnum, + const char *dev) +{ + virStorageVolDefPtr vol; + int retval = 0; + + if (VIR_ALLOC(vol) < 0) { + virReportOOMError(conn); + retval = -1; + goto out; + } + + vol->type = VIR_STORAGE_VOL_BLOCK; + + if (virAsprintf(&(vol->name), "dm-%u", devnum) < 0) { + virReportOOMError(conn); + retval = -1; + goto free_vol; + } + + if (virAsprintf(&vol->target.path, "/dev/%s", dev) < 0) { + virReportOOMError(conn); + retval = -1; + goto free_vol; + } + + if (virStorageBackendMpathUpdateVolTargetInfo(conn, + &vol->target, + &vol->allocation, + &vol->capacity) < 0) { + + virStorageReportError(conn, VIR_ERR_INTERNAL_ERROR, + _("Failed to update volume for '%s'"), + vol->target.path); + retval = -1; + goto free_vol; + } + + /* XXX should use logical unit's UUID instead */ + vol->key = strdup(vol->target.path); + if (vol->key == NULL) { + virReportOOMError(conn); + retval = -1; + goto free_vol; + } + + pool->def->capacity += vol->capacity; + pool->def->allocation += vol->allocation; + + if (VIR_REALLOC_N(pool->volumes.objs, + pool->volumes.count + 1) < 0) { + virReportOOMError(conn); + retval = -1; + goto free_vol; + } + pool->volumes.objs[pool->volumes.count++] = vol; + + goto out; + +free_vol: + virStorageVolDefFree(vol); +out: + return retval; +} + + +static int +virStorageBackendIsMultipath(const char *devname) +{ + int ret = 0; + struct dm_task *dmt = NULL; + void *next = NULL; + uint64_t start, length; + char *target_type = NULL; + char *params = NULL; + + dmt = dm_task_create(DM_DEVICE_TABLE); + if (dmt == NULL) { + ret = -1; + goto out; + } + + if (dm_task_set_name(dmt, devname) == 0) { + ret = -1; + goto out; + } + + dm_task_no_open_count(dmt); + + if (!dm_task_run(dmt)) { + ret = -1; + goto out; + } + + next = dm_get_next_target(dmt, next, &start, &length, + &target_type, ¶ms); + + if (target_type == NULL) { + ret = -1; + goto out; + } + + if (!(strcmp(target_type, "multipath"))) { + ret = 1; + } + +out: + if (dmt != NULL) { + dm_task_destroy(dmt); + } + return ret; +} + + +static int +virStorageBackendGetMinorNumber(const char *devname, uint32_t *minor) +{ + int ret = -1; + struct dm_task *dmt; + struct dm_info info; + + if (!(dmt = dm_task_create(DM_DEVICE_INFO))) { + goto out; + } + + if (!dm_task_set_name(dmt, devname)) { + goto out; + } + + if (!dm_task_run(dmt)) { + goto out; + } + + if (!dm_task_get_info(dmt, 
&info)) { + goto out; + } + + *minor = info.minor; + ret = 0; + +out: + if (dmt != NULL) { + dm_task_destroy(dmt); + } + + return ret; +} + + +static int +virStorageBackendCreateVols(virConnectPtr conn, + virStoragePoolObjPtr pool, + struct dm_names *names) +{ + int retval = 0, is_mpath = 0; + char *map_device = NULL; + uint32_t minor = -1; + + do { + is_mpath = virStorageBackendIsMultipath(names->name); + + if (is_mpath < 0) { + retval = -1; + goto out; + } + + if (is_mpath == 1) { + + if (virAsprintf(&map_device, "mapper/%s", names->name) < 0) { + virReportOOMError(conn); + retval = -1; + goto out; + } + + if (virStorageBackendGetMinorNumber(names->name, &minor) < 0) { + retval = -1; + goto out; + } + + virStorageBackendMpathNewVol(conn, + pool, + minor, + map_device); + + VIR_FREE(map_device); + } + + /* Given the way libdevmapper returns its data, I don't see + * any way to avoid this series of casts. */ + names = (struct dm_names *)(((char *)names) + names->next); + + } while (names->next); + +out: + return retval; +} + + +static int +virStorageBackendGetMaps(virConnectPtr conn, + virStoragePoolObjPtr pool) +{ + int retval = 0; + struct dm_task *dmt = NULL; + struct dm_names *names = NULL; + + if (!(dmt = dm_task_create(DM_DEVICE_LIST))) { + retval = 1; + goto out; + } + + dm_task_no_open_count(dmt); + + if (!dm_task_run(dmt)) { + retval = 1; + goto out; + } + + if (!(names = dm_task_get_names(dmt))) { + retval = 1; + goto out; + } + + if (!names->dev) { + /* No devices found */ + goto out; + } + + virStorageBackendCreateVols(conn, pool, names); + +out: + if (dmt != NULL) { + dm_task_destroy (dmt); + } + return retval; +} + + +static int +virStorageBackendMpathRefreshPool(virConnectPtr conn, + virStoragePoolObjPtr pool) +{ + int retval = 0; + + VIR_ERROR(_("in %s"), __func__); + + pool->def->allocation = pool->def->capacity = pool->def->available = 0; + + virStorageBackendWaitForDevices(conn); + + virStorageBackendGetMaps(conn, pool); + + return retval; +} + + +virStorageBackend virStorageBackendMpath = { + .type = VIR_STORAGE_POOL_MPATH, + + .refreshPool = virStorageBackendMpathRefreshPool, +}; diff --git a/src/storage_backend_mpath.h b/src/storage_backend_mpath.h new file mode 100644 index 0000000..9de2b79 --- /dev/null +++ b/src/storage_backend_mpath.h @@ -0,0 +1,31 @@ +/* + * storage_backend_scsi.h: storage backend for SCSI handling + * + * Copyright (C) 2007-2008 Red Hat, Inc. + * Copyright (C) 2007-2008 Daniel P. Berrange + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Author: Daniel P. 
Berrange <berrange redhat com> + */ + +#ifndef __VIR_STORAGE_BACKEND_MPATH_H__ +#define __VIR_STORAGE_BACKEND_MPATH_H__ + +#include "storage_backend.h" + +extern virStorageBackend virStorageBackendMpath; + +#endif /* __VIR_STORAGE_BACKEND_MPATH_H__ */ diff --git a/src/storage_conf.c b/src/storage_conf.c index 075279c..d94c425 100644 --- a/src/storage_conf.c +++ b/src/storage_conf.c @@ -58,7 +58,7 @@ VIR_ENUM_IMPL(virStoragePool, VIR_STORAGE_POOL_LAST, "dir", "fs", "netfs", "logical", "disk", "iscsi", - "scsi") + "scsi", "mpath") VIR_ENUM_IMPL(virStoragePoolFormatFileSystem, VIR_STORAGE_POOL_FS_LAST, @@ -203,6 +203,11 @@ static virStoragePoolTypeInfo poolTypeInfo[] = { .formatToString = virStoragePoolFormatDiskTypeToString, } }, + { .poolType = VIR_STORAGE_POOL_MPATH, + .volOptions = { + .formatToString = virStoragePoolFormatDiskTypeToString, + } + }, { .poolType = VIR_STORAGE_POOL_DISK, .poolOptions = { .flags = (VIR_STORAGE_POOL_SOURCE_DEVICE), diff --git a/src/storage_conf.h b/src/storage_conf.h index a6c3650..59b8166 100644 --- a/src/storage_conf.h +++ b/src/storage_conf.h @@ -116,6 +116,7 @@ enum virStoragePoolType { VIR_STORAGE_POOL_DISK, /* Disk partitions */ VIR_STORAGE_POOL_ISCSI, /* iSCSI targets */ VIR_STORAGE_POOL_SCSI, /* SCSI HBA */ + VIR_STORAGE_POOL_MPATH, /* Multipath devices */ VIR_STORAGE_POOL_LAST, }; -- 1.6.0.6
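[Editor's note: to make the heart of the backend easier to follow outside the full diff, the refresh path boils down to a libdevmapper walk of all device-mapper maps, keeping only those whose table target type is "multipath". The sketch below restates that pattern as a standalone program; it is a simplification of the patch, not a copy of it, and omits the libvirt volume bookkeeping.]

    /* Sketch of the enumeration the mpath backend performs:
     * DM_DEVICE_LIST walks every dm device, DM_DEVICE_TABLE is then used
     * to check whether a device's first target is of type "multipath". */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    #include <libdevmapper.h>

    static int is_multipath(const char *name)
    {
        struct dm_task *dmt;
        uint64_t start, length;
        char *type = NULL, *params = NULL;
        int ret = 0;

        if (!(dmt = dm_task_create(DM_DEVICE_TABLE)))
            return -1;
        if (dm_task_set_name(dmt, name) && dm_task_run(dmt)) {
            dm_get_next_target(dmt, NULL, &start, &length, &type, &params);
            if (type && strcmp(type, "multipath") == 0)
                ret = 1;
        }
        dm_task_destroy(dmt);
        return ret;
    }

    int main(void)
    {
        struct dm_task *dmt;
        struct dm_names *names;

        if (!(dmt = dm_task_create(DM_DEVICE_LIST)))
            return 1;
        if (!dm_task_run(dmt) || !(names = dm_task_get_names(dmt))) {
            dm_task_destroy(dmt);
            return 1;
        }

        if (names->dev) {                       /* dev == 0 means no devices */
            for (;;) {
                if (is_multipath(names->name) == 1)
                    printf("/dev/mapper/%s\n", names->name);
                if (!names->next)               /* next is a byte offset; 0 ends the list */
                    break;
                names = (struct dm_names *)((char *)names + names->next);
            }
        }
        dm_task_destroy(dmt);
        return 0;
    }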

On Tue, Jul 21, 2009 at 05:25:37PM -0400, David Allan wrote:
This pool type contains volumes for the multipath devices that are present on the host. It does not (yet) support any sort of multipath configuration, so that aspect of system administration must still be done at host build time.
[...]
+if test "$with_storage_mpath" = "check"; then
+  with_storage_mpath=yes
+
+  AC_DEFINE_UNQUOTED([WITH_STORAGE_MPATH], 1,
+    [whether mpath backend for storage driver is enabled])
+fi
+AM_CONDITIONAL([WITH_STORAGE_MPATH], [test "$with_storage_mpath" = "yes"])
+
+if test "$with_storage_mpath" = "yes"; then
+  DEVMAPPER_REQUIRED=0.0
+  DEVMAPPER_CFLAGS=
+  DEVMAPPER_LIBS=
+  PKG_CHECK_MODULES(DEVMAPPER, devmapper >= $DEVMAPPER_REQUIRED,
+    [], [
+      AC_MSG_ERROR(
+        [You must install device-mapper-devel >= $DEVMAPPER_REQUIRED to compile libvirt])
+    ])
+fi
+AC_SUBST([DEVMAPPER_CFLAGS])
+AC_SUBST([DEVMAPPER_LIBS])
Hum, the way I read this is that if you give no specific option to configure, "check" turns into with_storage_mpath = "yes", and then if device-mapper-devel is not available configure fails, while I would expect "check" to not fail in that case and instead switch with_storage_mpath to "no". I'm afraid this can be confusing, or I'm the one confused :-) [...]
+++ b/src/storage_backend_mpath.c
@@ -0,0 +1,344 @@
+/*
+ * storage_backend_mpath.c: storage backend for multipath handling
+ *
+ * Copyright (C) 2007-2008 Red Hat, Inc.
+ * Copyright (C) 2007-2008 Daniel P. Berrange
[...]
+ * Author: Daniel P. Berrange <berrange redhat com>
Hum, really? Please update :-) or clarify! And extend the copyright to 2009. [...]
+++ b/src/storage_backend_mpath.h
@@ -0,0 +1,31 @@
+/*
+ * storage_backend_scsi.h: storage backend for SCSI handling
+ *
+ * Copyright (C) 2007-2008 Red Hat, Inc.
+ * Copyright (C) 2007-2008 Daniel P. Berrange
[...]
+ * Author: Daniel P. Berrange <berrange redhat com>
+ */
To be fixed too, including the file name and description.

Daniel

--
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine  http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

Daniel Veillard wrote:
On Tue, Jul 21, 2009 at 05:25:37PM -0400, David Allan wrote:
This pool type contains volumes for the multipath devices that are present on the host. It does not (yet) support any sort of multipath configuration, so that aspect of system administration must still be done at host build time.
[...]
+if test "$with_storage_mpath" = "check"; then
+  with_storage_mpath=yes
+
+  AC_DEFINE_UNQUOTED([WITH_STORAGE_MPATH], 1,
+    [whether mpath backend for storage driver is enabled])
+fi
+AM_CONDITIONAL([WITH_STORAGE_MPATH], [test "$with_storage_mpath" = "yes"])
+
+if test "$with_storage_mpath" = "yes"; then
+  DEVMAPPER_REQUIRED=0.0
+  DEVMAPPER_CFLAGS=
+  DEVMAPPER_LIBS=
+  PKG_CHECK_MODULES(DEVMAPPER, devmapper >= $DEVMAPPER_REQUIRED,
+    [], [
+      AC_MSG_ERROR(
+        [You must install device-mapper-devel >= $DEVMAPPER_REQUIRED to compile libvirt])
+    ])
+fi
+AC_SUBST([DEVMAPPER_CFLAGS])
+AC_SUBST([DEVMAPPER_LIBS])
Hum, the way I read this is that if you give no specific option to configure, "check" turns into with_storage_mpath = "yes", and then if device-mapper-devel is not available configure fails, while I would expect "check" to not fail in that case and instead switch with_storage_mpath to "no". I'm afraid this can be confusing, or I'm the one confused :-)
Hmm...yes, I was afraid I hadn't gotten the build stuff quite right. I'm getting familiar with how it all works, but your advice on how to do this would be very welcome. Basically all I want is for libdevmapper to be pulled into the compile/link if the user requests a build with multipath support.
[...]
+++ b/src/storage_backend_mpath.c
@@ -0,0 +1,344 @@
+/*
+ * storage_backend_mpath.c: storage backend for multipath handling
+ *
+ * Copyright (C) 2007-2008 Red Hat, Inc.
+ * Copyright (C) 2007-2008 Daniel P. Berrange
[...]
+ * Author: Daniel P. Berrange <berrange redhat com>
Hum, really ? please update :-) or clarify ! and extend to 2009
Damn C&P...thanks for catching that.
[...]
+++ b/src/storage_backend_mpath.h
@@ -0,0 +1,31 @@
+/*
+ * storage_backend_scsi.h: storage backend for SCSI handling
+ *
+ * Copyright (C) 2007-2008 Red Hat, Inc.
+ * Copyright (C) 2007-2008 Daniel P. Berrange
[...]
+ * Author: Daniel P. Berrange <berrange redhat com>
+ */
To be fixed too including name and description
Indeed. Thanks; I'll make those changes in the next revision. Dave

On Tue, Jul 21, 2009 at 05:25:36PM -0400, David Allan wrote:
The following patch implements multipath pool support. It's very basic functionality, consisting of creating a pool that contains all the multipath devices on the host. That will cover the common case of users who just want to discover all the available multipath devices and assign them to guests.
It doesn't currently allow configuration of multipathing, so for now setting the multipath configuration will have to continue to be done as part of the host system build.
Example XML to create the pool is:
    <pool type="mpath">
      <name>mpath</name>
      <target>
        <path>/dev/mapper</path>
      </target>
    </pool>
So this is in essence a 'singleton' pool, since there's only really one of them per host. There is also no quantity of storage associated with an mpath pool - it is simply dealing with volumes from other pools. This falls into the same conceptual bucket as things like DM-RAID, MD-RAID and even loopback device management. The question I've never been able to satisfactorily answer myself is whether these things (mpath, raid, loopback) should be living in the storage pool APIs, or in the host device APIs.

I also wonder how people determine the association between the volumes in the mpath pool and the volumes for each corresponding path. E.g., how do they determine that the /dev/mapper/dm-4 multipath device is associated with devices from the SCSI storage pool 'xyz'? The storage volume APIs & XML format don't really have a way to express this relationship.

The host device APIs have a much more limited set of operations (list, create, delete) but this may well be all that's needed for things like raid/mpath/loopback devices, and with its XML format being capability based we could add a multipath capability under which we list the constituent paths of each device.

Now, if my understanding is correct, then if multipath is active it should automatically create multipath devices for each unique LUN on a storage array. DM does SCSI queries to determine which block devices are paths to the same underlying LUN. Taking a simple iSCSI storage pool

    <pool type='iscsi'>
      <name>virtimages</name>
      <source>
        <host name="iscsi.example.com"/>
        <device path="demo-target"/>
      </source>
      <target>
        <path>/dev/disk/by-path</path>
      </target>
    </pool>

this example would show you each individual block device, generating paths under /dev/disk/by-path. Now, we decide we want to make use of multipath for this particular pool. We should be able to just change the target path to point to /dev/mpath,

    <pool type='iscsi'>
      <name>virtimages</name>
      <source>
        <host name="iscsi.example.com"/>
        <device path="demo-target"/>
      </source>
      <target>
        <path>/dev/mpath</path>
      </target>
    </pool>

and have it give us back the unique multipath-enabled LUNs, instead of each individual block device.
The target element is ignored, as it is by the disk pool, but the config code rejects the XML if it does not exist. That behavior should obviously be cleaned up, but I think that should be done in a separate patch, as it's really a bug in the config code, not related to the addition of the new pool type.
The target element is not ignored by the disk pool. It is used to form the stable device paths via virStorageBackendStablePath() for all block-device-based pools. Even for multipath, there are 3 possible directories under which you can see LUNs, with varying pluses & minuses for naming stability/uniqueness across hosts.

Daniel

--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Daniel P. Berrange wrote:
On Tue, Jul 21, 2009 at 05:25:36PM -0400, David Allan wrote:
The following patch implements multipath pool support. It's very basic functionality, consisting of creating a pool that contains all the multipath devices on the host. That will cover the common case of users who just want to discover all the available multipath devices and assign them to guests.
It doesn't currently allow configuration of multipathing, so for now setting the multipath configuration will have to continue to be done as part of the host system build.
Example XML to create the pool is:
    <pool type="mpath">
      <name>mpath</name>
      <target>
        <path>/dev/mapper</path>
      </target>
    </pool>
So this is in essence a 'singleton' pool, since there's only really one of them per host. There is also no quantity of storage associated with an mpath pool - it is simply dealing with volumes from other pools. This falls into the same conceptual bucket as things like DM-RAID, MD-RAID and even loopback device management.
It is a singleton pool, in that there is only one dm instance per host. With regard to capacity, the dm devices have capacity, and their constituent devices could be members of other pools. Can you elaborate on what you see as the implications of those points?
The question I've never been able to satisfactorily answer myself is whether these things(mpath,raid,loopback) should be living in the storage pool APIs, or in the host device APIs.
I also wonder how people determine the association between the volumes in the mpath pool and the volumes for each corresponding path. E.g., how do they determine that the /dev/mapper/dm-4 multipath device is associated with devices from the SCSI storage pool 'xyz'? The storage volume APIs & XML format don't really have a way to express this relationship.
It's not difficult to query to find out what devices are parents of a given device, but what is the use case for finding out the pools of the parent devices?
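[Editor's note: as a concrete illustration of the query Dave says is easy (not part of the patch), the kernel exposes the parent devices of any device-mapper node under /sys/block/<node>/slaves, so a rough sketch looks like this, with "dm-4" picked purely as an example name.]

    /* Rough sketch: list the underlying paths of a dm device by reading
     * /sys/block/<node>/slaves. "dm-4" is only an example device name. */
    #include <stdio.h>
    #include <dirent.h>

    int main(void)
    {
        const char *dev = "dm-4";
        char path[256];
        struct dirent *entry;
        DIR *dir;

        snprintf(path, sizeof(path), "/sys/block/%s/slaves", dev);
        if (!(dir = opendir(path)))
            return 1;

        while ((entry = readdir(dir)) != NULL) {
            if (entry->d_name[0] == '.')
                continue;                      /* skip "." and ".." */
            printf("%s is backed by %s\n", dev, entry->d_name);
        }
        closedir(dir);
        return 0;
    }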
The host device APIs have a much more limited set of operations (list, create, delete) but this may well be all that's needed for things like raid/mpath/loopback devices, and with its XML format being capability based we could add a multipath capability under which we list the constituent paths of each device.
If we decide to implement creation and destruction of multipath devices, I would think the node device APIs would be the place to do it.
Now, if my understanding is correct, then if multipath is active it should automatically create multipath devices for each unique LUN on a storage array. DM does SCSI queries to determine which block devices are paths to the same underlying LUN.
That's basically correct, and the administrator can configure which devices have multipath devices created.
Taking a simple iSCSI storage pool
    <pool type='iscsi'>
      <name>virtimages</name>
      <source>
        <host name="iscsi.example.com"/>
        <device path="demo-target"/>
      </source>
      <target>
        <path>/dev/disk/by-path</path>
      </target>
    </pool>
this example would show you each individual block device, generating paths under /dev/disk/by-path.
Now, we decide we want to make use of multipath for this particular pool. We should be able to just change the target path, to point to /dev/mpath,
    <pool type='iscsi'>
      <name>virtimages</name>
      <source>
        <host name="iscsi.example.com"/>
        <device path="demo-target"/>
      </source>
      <target>
        <path>/dev/mpath</path>
      </target>
    </pool>
and have it give us back the unique multipath enabled LUNs, instead of each individual block device.
The problem with this approach is that dm devices are not SCSI devices, so putting them in a SCSI pool seems wrong. iSCSI pools have always contained volumes which are iSCSI block devices, directory pools have always had volumes which are files. We shouldn't break that assumption unless we have a good reason. It's not impossible to do what you describe, but I don't understand why it's a benefit.
The target element is ignored, as it is by the disk pool, but the config code rejects the XML if it does not exist. That behavior should obviously be cleaned up, but I think that should be done in a separate patch, as it's really a bug in the config code, not related to the addition of the new pool type.
The target element is not ignored by the disk pool. This is used to form the stable device paths via virStorageBackendStablePath() for all block device based pools.
Hmm--on my system the path I specify shows up in the pool XML, but is unused as far as I can tell. I can hand it something totally bogus and it doesn't complain. I think your next point is very good, though, so I'll make the target element meaningful in the multipath case and we can investigate the disk behavior separately.
Even for multipath, there are 3 possible directories under which you can see LUNs, with varying plus & minuses for naming stability/uniqueness across hosts.
That's a good point. I'll make the target element meaningful. Dave

On Thu, Jul 23, 2009 at 02:53:48PM -0400, Dave Allan wrote:
Daniel P. Berrange wrote:
It doesn't currently allow configuration of multipathing, so for now setting the multipath configuration will have to continue to be done as part of the host system build.
Example XML to create the pool is:
    <pool type="mpath">
      <name>mpath</name>
      <target>
        <path>/dev/mapper</path>
      </target>
    </pool>
So this is in essence a 'singleton' pool, since there's only really one of them per host. There is also no quantity of storage associated with an mpath pool - it is simply dealing with volumes from other pools. This falls into the same conceptual bucket as things like DM-RAID, MD-RAID and even loopback device management.
It is a singleton pool, in that there is only one dm instance per host. With regard to capacity, the dm devices have capacity, and their constituent devices could be members of other pools. Can you elaborate on what you see as the implications of those points?
The storage pool vs storage volume concept was modelled around the idea that you have some storage source, and it is sub-divided into a number of volumes. With a multipath pool you have no storage source - the source is the SCSI/iSCSI pool which actually provides the underlying block devices which are the LUN paths. So by having an explicit storage pool for multipath, there's an implicit dependency between 2 pools. If you refresh a SCSI pool, you must then refresh a multipath pool too. Or if you add a SCSI/iSCSI pool you must also refresh the multipath pool. There's also the issue of tracking the association between multipath volumes and the pools, to ensure you don't remove a pool that's providing a multipath volume that's still in use.
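[Editor's note: to make the ordering concrete, with the design as proposed a management application that rescans the SCSI pool would also have to refresh the mpath pool itself, roughly as sketched below; the pool names "hba0" and "mpath" are made up for the example.]

    /* Sketch: refresh the SCSI pool that provides the paths, then the mpath
     * pool, so the multipath volume list stays in sync. Pool names are
     * hypothetical. */
    #include <libvirt/libvirt.h>

    static int refresh_scsi_then_mpath(virConnectPtr conn)
    {
        virStoragePoolPtr scsi  = virStoragePoolLookupByName(conn, "hba0");
        virStoragePoolPtr mpath = virStoragePoolLookupByName(conn, "mpath");
        int ret = -1;

        if (scsi && mpath &&
            virStoragePoolRefresh(scsi, 0) == 0 &&
            virStoragePoolRefresh(mpath, 0) == 0)
            ret = 0;

        if (scsi)
            virStoragePoolFree(scsi);
        if (mpath)
            virStoragePoolFree(mpath);
        return ret;
    }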
The question I've never been able to satisfactorily answer myself is whether these things(mpath,raid,loopback) should be living in the storage pool APIs, or in the host device APIs.
I also wonder how people determine the association between the volumes in the mpath pool and the volumes for each corresponding path. E.g., how do they determine that the /dev/mapper/dm-4 multipath device is associated with devices from the SCSI storage pool 'xyz'? The storage volume APIs & XML format don't really have a way to express this relationship.
It's not difficult to query to find out what devices are parents of a given device, but what is the use case for finding out the pools of the parent devices?
Say you have 3 SCSI NPIV pools configured, and a multipath pool. You want to remove one of the SCSI pools, and know that the multipath devices X, Y & Z are in use. You need to determine which of the SCSI pools contains the underlying block devices for these multipath devices before you can safely remove that SCSI pool.
The host device APIs have a much more limited set of operations (list, create, delete) but this may well be all that's needed for things like raid/mpath/loopback devices, and with its XML format being capability based we could add a multipath capability under which we list the constituent paths of each device.
If we decide to implement creation and destruction of multipath devices, I would think the node device APIs would be the place to do it.
If we intend to do creation/deletion of multipath devices in the node device APIs, then we essentially get listing of multipath devices in the node device APIs for free. So do we need a dedicated storage pool for multipath too? I have a feeling that the DeviceKit impl of the node device APIs (which is currently disabled by default) may already be reporting on all device-mapper block devices - the HAL impl does not.
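[Editor's note: for reference, enumeration through the node device APIs already looks roughly like the sketch below; it filters on the existing "storage" capability, since a dedicated multipath capability is only being speculated about in this thread.]

    /* Sketch: list node devices by capability. "storage" is an existing
     * capability name; a multipath-specific capability is hypothetical. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <libvirt/libvirt.h>

    static void list_node_devices(virConnectPtr conn, const char *cap)
    {
        int i, n = virNodeNumOfDevices(conn, cap, 0);
        if (n <= 0)
            return;

        char **names = calloc(n, sizeof(*names));
        if (!names)
            return;

        n = virNodeListDevices(conn, cap, names, n, 0);
        for (i = 0; i < n; i++) {
            printf("node device: %s\n", names[i]);
            free(names[i]);
        }
        free(names);
    }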
Now, if my understanding is correct, then if multipath is active it should automatically create multipath devices for each unique LUN on a storage array. DM does SCSI queries to determine which block devices are paths to the same underlying LUN.
That's basically correct, and the administrator can configure which devices have multipath devices created.
Taking a simple iSCSI storage pool
    <pool type='iscsi'>
      <name>virtimages</name>
      <source>
        <host name="iscsi.example.com"/>
        <device path="demo-target"/>
      </source>
      <target>
        <path>/dev/disk/by-path</path>
      </target>
    </pool>
this example would show you each individual block device, generating paths under /dev/disk/by-path.
Now, we decide we want to make use of multipath for this particular pool. We should be able to just change the target path, to point to /dev/mpath,
    <pool type='iscsi'>
      <name>virtimages</name>
      <source>
        <host name="iscsi.example.com"/>
        <device path="demo-target"/>
      </source>
      <target>
        <path>/dev/mpath</path>
      </target>
    </pool>
and have it give us back the unique multipath enabled LUNs, instead of each individual block device.
The problem with this approach is that dm devices are not SCSI devices, so putting them in a SCSI pool seems wrong. iSCSI pools have always contained volumes which are iSCSI block devices, directory pools have always had volumes which are files. We shouldn't break that assumption unless we have a good reason. It's not impossible to do what you describe, but I don't understand why it's a benefit.
What is a SCSI device though? Under Linux these days everything appears to be a SCSI device whether it is SCSI or not, e.g. PATA, SATA, USB. So there can be no assumption that a SCSI HBA pool gives you SCSI devices. If an application using a pool expects volumes to have particular SCSI capabilities (persistent reservations for example), then the only way is for it to query the device, or try the capability it wants and handle failure. The best libvirt can guarantee is that SCSI, disk, iSCSI & logical pools will give back block devices, while fs / netfs pools will give back plain files.

The one downside I realize with my suggestion here is that a single multipath device may have many paths, and each path may go via a separate HBA, which would mean a separate SCSI pool. So in fact I think we shouldn't expose multipath in normal SCSI pools after all :-) I'm still inclined to think we can do the 'list' operation in the node device APIs though.
The target element is ignored, as it is by the disk pool, but the config code rejects the XML if it does not exist. That behavior should obviously be cleaned up, but I think that should be done in a separate patch, as it's really a bug in the config code, not related to the addition of the new pool type.
The target element is not ignored by the disk pool. This is used to form the stable device paths via virStorageBackendStablePath() for all block device based pools.
Hmm--on my system the path I specify shows up in the pool XML, but is unused as far as I can tell. I can hand it something totally bogus and it doesn't complain. I think your next point is very good, though, so I'll make the target element meaningful in the multipath case and we can investigate the disk behavior separately.
Normally a disk pool will give you back volumes whose path name is /dev/sdXX. If you give the pool a target path of /dev/disk/by-uuid then the volumes will get paths like /dev/disk/by-uuid/b0509f5a-2824-4090-9da2-d0f0ff4ace0e. Since it is possible that some volumes may not have stable paths though, we fall back to /dev/sdXX if one can't be formed.

We should probably explicitly reject bogus target paths which don't even exist on disk though. Only allow targets under /dev, where the given target exists.

Daniel

--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Daniel P. Berrange wrote:
On Thu, Jul 23, 2009 at 02:53:48PM -0400, Dave Allan wrote:
Daniel P. Berrange wrote:
It doesn't currently allow configuration of multipathing, so for now setting the multipath configuration will have to continue to be done as part of the host system build.
Example XML to create the pool is:
    <pool type="mpath">
      <name>mpath</name>
      <target>
        <path>/dev/mapper</path>
      </target>
    </pool>

So this is in essence a 'singleton' pool, since there's only really one of them per host. There is also no quantity of storage associated with an mpath pool - it is simply dealing with volumes from other pools. This falls into the same conceptual bucket as things like DM-RAID, MD-RAID and even loopback device management.

It is a singleton pool, in that there is only one dm instance per host. With regard to capacity, the dm devices have capacity, and their constituent devices could be members of other pools. Can you elaborate on what you see as the implications of those points?
The storage pool vs storage volume concept was modelled around the idea that you have some storage source, and it is sub-divided into a number of volumes.
With a multipath pool you have no storage source - the source is the SCSI/iSCSI pool which actually provides the underlying block devices which are the LUN paths. So by having an explicit storage pool for multipath, there's an implicit dependency between 2 pools. If you refresh a SCSI pool, you must then refresh a multipath pool too. Or if you add a SCSI/iSCSI pool you must also refresh the multipath pool. There's also the issue of tracking the association between multipath volumes and the pools, to ensure you don't remove a pool that's providing a multipath volume that's still in use.
The problem of hierarchical relationships among pools can exist with the other pools as well, since one could create a logical pool on top of a block device that's part of an iSCSI or other pool. It's also possible that a hierarchical pool relationship might not exist with the multipath pool if a user didn't create pools for HBAs.
The question I've never been able to satisfactorily answer myself is whether these things(mpath,raid,loopback) should be living in the storage pool APIs, or in the host device APIs.
I also wonder how people determine the association between the volumes in the mpath pool and the volumes for each corresponding path. E.g., how do they determine that the /dev/mapper/dm-4 multipath device is associated with devices from the SCSI storage pool 'xyz'? The storage volume APIs & XML format don't really have a way to express this relationship. It's not difficult to query to find out what devices are parents of a given device, but what is the use case for finding out the pools of the parent devices?
Say you have 3 SCSI NPIV pools configured, and a multipath pool. You want to remove one of the SCSI pools, and know that the multipath devices X, Y & Z are in use. You need to determine which of the SCSI pools contains the underlying block devices for these multipath devices before you can safely remove that SCSI pool.
Ok, that makes sense, but this problem exists with any hierarchical pool so users are already dealing with it.
The host device APIs have a much more limited set of operations (list, create, delete) but this may well be all that's needed for things like raid/mpath/loopback devices, and with its XML format being capability based we could add a multipath capability under which we list the constituent paths of each device. If we decide to implement creation and destruction of multipath devices, I would think the node device APIs would be the place to do it.
If we intend to do creation/deletion of multipath devices in the node device APIs, then we essentially get listing of multipath devices in the node device APIs for free. So do we need a dedicated storage pool for multipath too?
Isn't the general idea that storage pools are how people should be managing storage? We shouldn't make people use a separate API to enumerate one type of storage.
I have a feeling that the DeviceKit impl of the node device APIs (which is currently disabled by default) may already be reporting on all device-mapper block devices - the HAL impl does not.
That may be--there's a fairly wide gap between the two sets of functionality.
Now, if my understanding is correct, then if multipath is active it should automatically create multipath devices for each unique LUN on a storage array. DM does SCSI queries to determine which block devices are paths to the same underlying LUN. That's basically correct, and the administrator can configure which devices have multipath devices created.
Taking a simple iSCSI storage pool
    <pool type='iscsi'>
      <name>virtimages</name>
      <source>
        <host name="iscsi.example.com"/>
        <device path="demo-target"/>
      </source>
      <target>
        <path>/dev/disk/by-path</path>
      </target>
    </pool>
this example would show you each individual block device, generating paths under /dev/disk/by-path.
Now, we decide we want to make use of multipath for this particular pool. We should be able to just change the target path, to point to /dev/mpath,
    <pool type='iscsi'>
      <name>virtimages</name>
      <source>
        <host name="iscsi.example.com"/>
        <device path="demo-target"/>
      </source>
      <target>
        <path>/dev/mpath</path>
      </target>
    </pool>
and have it give us back the unique multipath enabled LUNs, instead of each individual block device. The problem with this approach is that dm devices are not SCSI devices, so putting them in a SCSI pool seems wrong. iSCSI pools have always contained volumes which are iSCSI block devices, directory pools have always had volumes which are files. We shouldn't break that assumption unless we have a good reason. It's not impossible to do what you describe, but I don't understand why it's a benefit.
What is a SCSI device though? Under Linux these days everything appears to be a SCSI device whether it is SCSI or not, e.g. PATA, SATA, USB. So there can be no assumption that a SCSI HBA pool gives you SCSI devices. If an application using a pool expects volumes to have particular SCSI capabilities (persistent reservations for example), then the only way is for it to query the device, or try the capability it wants and handle failure. The best libvirt can guarantee is that SCSI, disk, iSCSI & logical pools will give back block devices, while fs / netfs pools will give back plain files. The one downside I realize with my suggestion here is that a single multipath device may have many paths, and each path may go via a separate HBA, which would mean a separate SCSI pool. So in fact I think we shouldn't expose multipath in normal SCSI pools after all :-)
Agreed, let's keep the existing pools the way they are.
I'm still inclined to think we can do the 'list' operation in the node device APIs though.
Again, I think using the node device APIs as the only support for multipath devices is contrary to how we're leading people to believe storage should be managed with libvirt.
The target element is ignored, as it is by the disk pool, but the config code rejects the XML if it does not exist. That behavior should obviously be cleaned up, but I think that should be done in a separate patch, as it's really a bug in the config code, not related to the addition of the new pool type. The target element is not ignored by the disk pool. This is used to form the stable device paths via virStorageBackendStablePath() for all block device based pools. Hmm--on my system the path I specify shows up in the pool XML, but is unused as far as I can tell. I can hand it something totally bogus and it doesn't complain. I think your next point is very good, though, so I'll make the target element meaningful in the multipath case and we can investigate the disk behavior separately.
Normally a disk pool will give you back volumes whose path name is /dev/sdXX. If you give the pool a target path of /dev/disk/by-uuid then the volumes will get paths like /dev/disk/by-uuid/b0509f5a-2824-4090-9da2-d0f0ff4ace0e. Since it is possible that some volumes may not have stable paths though, we fall back to /dev/sdXX if one can't be formed.
We should probably explicitly reject bogus target paths which don't even exist on disk though. Only allow targets under /dev, where the given target exists
That sounds good. Dave

Dave Allan wrote:
[...]
Dan, Ping, what are your thoughts on this stuff? Dave