Dave Allan wrote:
Daniel P. Berrange wrote:
> On Thu, Jul 23, 2009 at 02:53:48PM -0400, Dave Allan wrote:
>> Daniel P. Berrange wrote:
>>>> It doesn't currently allow configuration of multipathing, so for
>>>> now setting the multipath configuration will have to continue to be
>>>> done as part of the host system build.
>>>>
>>>> Example XML to create the pool is:
>>>>
>>>> <pool type="mpath">
>>>>   <name>mpath</name>
>>>>   <target>
>>>>     <path>/dev/mapper</path>
>>>>   </target>
>>>> </pool>
>>> So this is in essence a 'singleton' pool, since there's only really
>>> one of them per host. There is also no quantity of storage associated
>>> with an mpath pool - it is simply dealing with volumes from other
>>> with a mpath pool - it is simply dealing with volumes from other
>>> pools. This falls into the same conceptual bucket as things like
>>> DM-RAID, MD-RAID and even loopback device management.
>> It is a singleton pool, in that there is only one dm instance per host.
>> With regard to capacity, the dm devices have capacity, and their
>> constituent devices could be members of other pools. Can you elaborate
>> on what you see as the implications of those points?
>
> The storage pool vs storage volume concept was modelled around the idea
> that you have some storage source, and it is sub-divided into a number
> of volumes
>
> With a multipath pool you have no storage source - the source is the
> SCSI/iSCSI pool which actually provides the underlying block devices
> which are the LUN paths. So by having an explicit storage pool for
> multipath, there's an implicit dependency between 2 pools. If you
> refresh a SCSI pool, you must then refresh a multipath pool too.
> Or if you add a SCSI/iSCSI pool you must also refresh the multipath
> pool. There's also the issue of tracking the association between
> multipath volumes and the pools to ensure you don't remove a pool
> that's providing a multipath volume that's still in use.
The problem of hierarchical relationships among pools can exist with the
other pools as well, since one could create a logical pool on top of a
block device that's part of an iSCSI or other pool. It's also possible
that a hierarchical pool relationship might not exist with the multipath
pool if a user didn't create pools for HBAs.
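To illustrate the stacking Dave describes, a logical pool can be defined on top of a block device that another pool already exposes. This sketch is illustrative only: the pool name and the iSCSI-style device path are made up for the example.

```xml
<!-- Illustrative only: a logical (LVM) pool whose source device is a
     block device that could itself be a volume of an iSCSI pool.
     The device path and names here are hypothetical. -->
<pool type='logical'>
  <name>stacked</name>
  <source>
    <device path='/dev/disk/by-path/ip-iscsi.example.com:3260-iscsi-demo-target-lun-1'/>
  </source>
  <target>
    <path>/dev/stacked</path>
  </target>
</pool>
```

Refreshing or removing the underlying iSCSI pool would affect this pool in exactly the way being discussed for multipath.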
>>> The question I've never been able to satisfactorily answer myself is
>>> whether these things (mpath, raid, loopback) should be living in the
>>> storage pool APIs, or in the host device APIs.
>>>
>>> I also wonder how people determine the association between the volumes in
>>> the mpath pool, and the volumes for each corresponding path. eg, how
>>> do they determine that /dev/mapper/dm-4 multipath device is
>>> associated with devices from the SCSI storage pool 'xyz'. The storage
>>> volume APIs & XML format don't really have a way to express this
>>> relationship.
>> It's not difficult to query to find out what devices are parents of a
>> given device, but what is the use case for finding out the pools of the
>> parent devices?
>
> Say you have 3 SCSI NPIV pools configured, and a multipath pool.
> You want to remove one of the SCSI pools, and know that the
> multipath devices X, Y & Z are in use. You need to determine which
> of the SCSI pools contains the underlying block devices for these
> multipath devices before you can safely remove that SCSI pool.
Ok, that makes sense, but this problem exists with any hierarchical pool
so users are already dealing with it.
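For what it's worth, the kernel already exposes the parent relationship being discussed: each device-mapper node lists its component devices under /sys/block/&lt;dm&gt;/slaves. A minimal sketch of the query (the dm-4 name is hypothetical, and the sysfs_root parameter exists only so the function can be exercised against a fake tree; matching slaves back to pools would be the caller's job):

```python
import os

def dm_slaves(dm_name, sysfs_root="/sys/block"):
    """Return the underlying block devices of a device-mapper node,
    e.g. dm_slaves("dm-4") might return ["sdb", "sdc"].

    Reads the kernel's /sys/block/<dm>/slaves directory; an empty list
    means the node does not exist or has no slaves.
    """
    slaves_dir = os.path.join(sysfs_root, dm_name, "slaves")
    if not os.path.isdir(slaves_dir):
        return []
    return sorted(os.listdir(slaves_dir))
```

Given those slave names, a management app could walk its SCSI pools and check which one owns each path before tearing a pool down.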
>>> The host device APIs have a much more limited set of operations
>>> (list, create, delete) but this may well be all that's needed for
>>> things like raid/mpath/loopback devices, and with its XML format
>>> being capability based we could add a multipath capability under
>>> which we list the constituent paths of each device.
>> If we decide to implement creation and destruction of multipath devices,
>> I would think the node device APIs would be the place to do it.
>
> If we intend to do creation/deletion of multipath devices in the
> node device APIs, then we essentially get listing of multipath
> devices in the node device APIs for free. So do we need a dedicated
> storage pool for multipath too ?
Isn't the general idea that storage pools are how people should be
managing storage? We shouldn't make people use a separate API to
enumerate one type of storage.
> I have a feeling that the DeviceKit impl of the node device APIs (which
> is currently disabled by default) may already be reporting on all
> device mapper block devices - the HAL impl does not.
That may be--there's a fairly wide gap between the two sets of
functionality.
>>> Now, if my understanding is correct, then if multipath is active it
>>> should automatically create multipath devices for each unique LUN on
>>> a storage array. DM does SCSI queries to determine which block
>>> devices are paths to the same underlying LUN.
>> That's basically correct, and the administrator can configure which
>> devices have multipath devices created.
>>
>>> Taking a simple iSCSI storage pool
>>>
>>> <pool type='iscsi'>
>>>   <name>virtimages</name>
>>>   <source>
>>>     <host name="iscsi.example.com"/>
>>>     <device path="demo-target"/>
>>>   </source>
>>>   <target>
>>>     <path>/dev/disk/by-path</path>
>>>   </target>
>>> </pool>
>>>
>>> this example would show you each individual block device, generating
>>> paths under /dev/disk/by-path.
>>>
>>> Now, we decide we want to make use of multipath for this particular
>>> pool. We should be able to just change the target path, to point to
>>> /dev/mpath,
>>>
>>> <pool type='iscsi'>
>>>   <name>virtimages</name>
>>>   <source>
>>>     <host name="iscsi.example.com"/>
>>>     <device path="demo-target"/>
>>>   </source>
>>>   <target>
>>>     <path>/dev/mpath</path>
>>>   </target>
>>> </pool>
>>>
>>> and have it give us back the unique multipath enabled LUNs, instead
>>> of each individual block device.
>> The problem with this approach is that dm devices are not SCSI devices,
>> so putting them in a SCSI pool seems wrong. iSCSI pools have always
>> contained volumes which are iSCSI block devices, directory pools have
>> always had volumes which are files. We shouldn't break that assumption
>> unless we have a good reason. It's not impossible to do what you
>> describe, but I don't understand why it's a benefit.
>
> What is a SCSI device though ? Under Linux these days everything appears
> to be a SCSI device whether it is SCSI or not, eg PATA, SATA, USB. So
> there can be no assumption that a SCSI HBA pool gives you SCSI devices.
> If an application using a pool expects volumes to have particular
> SCSI capabilities (persistent reservations for example), then the only
> way is for it to query the device, or try the capability it wants and
> handle failure. The best libvirt can guarantee is that SCSI, disk,
> iSCSI & logical pools will give back block devices, while fs / netfs
> pools will give back plain files.
> The one downside I realize with my suggestion here, is that a single
> multipath device may have many paths, and each path may go via a
> separate HBA, which would mean separate SCSI pool. So in fact I think
> we shouldn't expose multipath in normal SCSI pools after all :-)
Agreed, let's keep the existing pools the way they are.
> I'm still inclined to think we can do the 'list' operation in the node
> device APIs though
Again, I think using the node device APIs as the only support for
multipath devices is contrary to how we're leading people to believe
storage should be managed with libvirt.
>>>> The target element is ignored, as it is by the disk pool, but the
>>>> config code rejects the XML if it does not exist. That behavior
>>>> should obviously be cleaned up, but I think that should be done in
>>>> a separate patch, as it's really a bug in the config code, not
>>>> related to the addition of the new pool type.
>>> The target element is not ignored by the disk pool. This is used to
>>> form the stable device paths via virStorageBackendStablePath() for
>>> all block device based pools.
>> Hmm--on my system the path I specify shows up in the pool XML, but is
>> unused as far as I can tell. I can hand it something totally bogus and
>> it doesn't complain. I think your next point is very good, though, so
>> I'll make the target element meaningful in the multipath case and we can
>> investigate the disk behavior separately.
>
> Normally a disk pool will give you back volumes whose path name
> is /dev/sdXX. If you give the pool a target path of /dev/disk/by-uuid
> then the volumes will get paths like
> /dev/disk/by-uuid/b0509f5a-2824-4090-9da2-d0f0ff4ace0e
> Since it is possible that some volumes may not have stable paths
> though, we fall back to /dev/sdXXX if one can't be formed.
>
> We should probably explicitly reject bogus target paths which don't
> even exist on disk though. Only allow targets under /dev, where the
> given target exists
That sounds good.
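A minimal sketch of the validation rule Dan proposes (reject target paths that aren't under /dev or don't exist). This is a hypothetical helper for illustration, not libvirt's actual implementation, which lives in its C storage backend:

```python
import os

def valid_target_path(path):
    """Accept a pool target path only if it lives under /dev and
    actually exists on disk, per the rule suggested in the thread.
    (Hypothetical helper; real checks belong in libvirt's C code.)
    """
    norm = os.path.normpath(path)
    if norm != "/dev" and not norm.startswith("/dev/"):
        return False
    return os.path.exists(norm)
```

With this rule, a bogus target like /tmp/whatever or a nonexistent /dev entry would be rejected at pool-define time instead of being silently accepted.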
Dave
Dan,
Ping, what are your thoughts on this stuff?
Dave