Dave Allan wrote:
Daniel P. Berrange wrote:
> On Thu, Jul 23, 2009 at 02:53:48PM -0400, Dave Allan wrote:
>> Daniel P. Berrange wrote:
>>>> It doesn't currently allow configuration of multipathing, so for
>>>> now setting the multipath configuration will have to continue to be
>>>> done as part of the host system build.
>>>>
>>>> Example XML to create the pool is:
>>>>
>>>> <pool type="mpath">
>>>>   <name>mpath</name>
>>>>   <target>
>>>>     <path>/dev/mapper</path>
>>>>   </target>
>>>> </pool>
>>> So this is in essence a 'singleton' pool, since there's only really
>>> one of them per host. There is also no quantity of storage associated
>>> with an mpath pool - it is simply dealing with volumes from other
>>> with a mpath pool - it is simply dealing with volumes from other
>>> pools. This falls into the same conceptual bucket as things like
>>> DM-RAID, MD-RAID and even loopback device management.
>> It is a singleton pool, in that there is only one dm instance per host.
>> With regard to capacity, the dm devices have capacity, and their
>> constituent devices could be members of other pools. Can you elaborate
>> on what you see as the implications of those points?
>
> The storage pool vs storage volume concept was modelled around the idea
> that you have some storage source, and it is sub-divided into a number
> of volumes
>
> With a multipath pool you have no storage source - the source is the
> SCSI/iSCSI pool which actually provides the underlying block devices
> which are the LUN paths. So by having an explicit storage pool for
> multipath, there's an implicit dependency between 2 pools. If you
> refresh a SCSI pool, you must then refresh a multipath pool too.
> Or if you add a SCSI/iSCSI pool you must also refresh the multipath
> pool. There's also the issue of tracking the association between
> multipath volumes and the pools to ensure you don't remove a pool
> that's providing a multipath volume that's still in use.
The problem of hierarchical relationships among pools can exist with the
other pools as well, since one could create a logical pool on top of a
block device that's part of an iSCSI or other pool. It's also possible
that a hierarchical pool relationship might not exist with the multipath
pool if a user didn't create pools for HBAs.
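To illustrate the stacking Dave describes, a logical pool can be defined on top of a block device that another pool already exposes. This sketch is illustrative only: the pool name and the iSCSI-style device path are made up for the example.

```xml
<!-- Illustrative only: a logical (LVM) pool whose source device is a
     block device that could itself be a volume of an iSCSI pool.
     The device path and names here are hypothetical. -->
<pool type='logical'>
  <name>stacked</name>
  <source>
    <device path='/dev/disk/by-path/ip-iscsi.example.com:3260-iscsi-demo-target-lun-1'/>
  </source>
  <target>
    <path>/dev/stacked</path>
  </target>
</pool>
```

Refreshing or removing the underlying iSCSI pool would affect this pool in exactly the way being discussed for multipath.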
>>> The question I've never been able to satisfactorily answer myself is
>>> whether these things (mpath, raid, loopback) should be living in the
>>> storage pool APIs, or in the host device APIs.
>>>
>>> I also wonder how people determine the association between the volumes in
>>> the mpath pool, and the volumes for each corresponding path. eg, how
>>> do they determine that /dev/mapper/dm-4 multipath device is
>>> associated with devices from the SCSI storage pool 'xyz'. The storage
>>> volume APIs & XML format don't really have a way to express this
>>> relationship.
>> It's not difficult to query to find out what devices are parents of a
>> given device, but what is the use case for finding out the pools of the
>> parent devices?
>
> Say you have 3 SCSI NPIV pools configured, and a multipath pool.
> You want to remove one of the SCSI pools, and know that the
> multipath devices X, Y & Z are in use. You need to determine which
> of the SCSI pools contains the underlying block devices for these
> multipath devices before you can safely remove that SCSI pool.
Ok, that makes sense, but this problem exists with any hierarchical pool
so users are already dealing with it.
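For what it's worth, the kernel already exposes the parent relationship being discussed: each device-mapper node lists its component devices under /sys/block/&lt;dm&gt;/slaves. A minimal sketch of the query (the dm-4 name is hypothetical, and the sysfs_root parameter exists only so the function can be exercised against a fake tree; matching slaves back to pools would be the caller's job):

```python
import os

def dm_slaves(dm_name, sysfs_root="/sys/block"):
    """Return the underlying block devices of a device-mapper node,
    e.g. dm_slaves("dm-4") might return ["sdb", "sdc"].

    Reads the kernel's /sys/block/<dm>/slaves directory; an empty list
    means the node does not exist or has no slaves.
    """
    slaves_dir = os.path.join(sysfs_root, dm_name, "slaves")
    if not os.path.isdir(slaves_dir):
        return []
    return sorted(os.listdir(slaves_dir))
```

Given those slave names, a management app could walk its SCSI pools and check which one owns each path before tearing a pool down.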
>>> The host device APIs have a much more limited set of operations
>>> (list, create, delete) but this may well be all that's needed for
>>> things like raid/mpath/loopback devices, and with its XML format
>>> being capability based we could add a multipath capability under
>>> which we list the constituent paths of each device.
>> If we decide to implement creation and destruction of multipath devices,
>> I would think the node device APIs would be the place to do it.
>
> If we intend to do creation/deletion of multipath devices in the
> node device APIs, then we essentially get listing of multipath
> devices in the node device APIs for free. So do we need a dedicated
> storage pool for multipath too ?
Isn't the general idea that storage pools are how people should be
managing storage? We shouldn't make people use a separate API to
enumerate one type of storage.
> I have a feeling that the DeviceKit impl of the node device APIs (which
> is currently disabled by default) may already be reporting on all
> device mapper block devices - the HAL impl does not.
That may be--there's a fairly wide gap between the two sets of
functionality.
>>> Now, if my understanding is correct, then if multipath is active it
>>> should automatically create multipath devices for each unique LUN on
>>> a storage array. DM does SCSI queries to determine which block
>>> devices are paths to the same underlying LUN.
>> That's basically correct, and the administrator can configure which
>> devices have multipath devices created.
>>
>>> Taking a simple iSCSI storage pool
>>>
>>> <pool type='iscsi'>
>>>   <name>virtimages</name>
>>>   <source>
>>>     <host name="iscsi.example.com"/>
>>>     <device path="demo-target"/>
>>>   </source>
>>>   <target>
>>>     <path>/dev/disk/by-path</path>
>>>   </target>
>>> </pool>
>>>
>>> this example would show you each individual block device, generating
>>> paths under /dev/disk/by-path.
>>>
>>> Now, we decide we want to make use of multipath for this particular
>>> pool. We should be able to just change the target path, to point to
>>> /dev/mpath,
>>>
>>> <pool type='iscsi'>
>>>   <name>virtimages</name>
>>>   <source>
>>>     <host name="iscsi.example.com"/>
>>>     <device path="demo-target"/>
>>>   </source>
>>>   <target>
>>>     <path>/dev/mpath</path>
>>>   </target>
>>> </pool>
>>>
>>> and have it give us back the unique multipath enabled LUNs, instead
>>> of each individual block device.
>> The problem with this approach is that dm devices are not SCSI devices,
>> so putting them in a SCSI pool seems wrong. iSCSI pools have always
>> contained volumes which are iSCSI block devices, directory pools have
>> always had volumes which are files. We shouldn't break that assumption
>> unless we have a good reason. It's not impossible to do what you
>> describe, but I don't understand why it's a benefit.
>
> What is a SCSI device though ? Under Linux these days everything appears
> to be a SCSI device whether it is SCSI or not, eg PATA, SATA, USB. So
> there can be no assumption that a SCSI HBA pool gives you SCSI devices.
> If an application using a pool expects volumes to have particular
> SCSI capabilities (persistent reservations for example), then the only
> way is for it to query the device, or try the capability it wants and
> handle failure. The best libvirt can guarantee is that SCSI, disk,
> iSCSI & logical pools will give back block devices, while fs / netfs
> pools will give back plain files.
> The one downside I realize with my suggestion here, is that a single
> multipath device may have many paths, and each path may go via a
> separate HBA, which would mean separate SCSI pool. So in fact I think
> we shouldn't expose multipath in normal SCSI pools after all :-)
Agreed, let's keep the existing pools the way they are.
> I'm still inclined to think we can do the 'list' operation in the node
> device APIs though
Again, I think using the node device APIs as the only support for
multipath devices is contrary to how we're leading people to believe
storage should be managed with libvirt.
>>>> The target element is ignored, as it is by the disk pool, but the
>>>> config code rejects the XML if it does not exist. That behavior
>>>> should obviously be cleaned up, but I think that should be done in
>>>> a separate patch, as it's really a bug in the config code, not
>>>> related to the addition of the new pool type.
>>> The target element is not ignored by the disk pool. This is used to
>>> form the stable device paths via virStorageBackendStablePath() for
>>> all block device based pools.
>> Hmm--on my system the path I specify shows up in the pool XML, but is
>> unused as far as I can tell. I can hand it something totally bogus and
>> it doesn't complain. I think your next point is very good, though, so
>> I'll make the target element meaningful in the multipath case and we can
>> investigate the disk behavior separately.
>
> Normally a disk pool will give you back volumes whose path name
> is /dev/sdXX. If you give the pool a target path of /dev/disk/by-uuid
> then the volumes will get paths like
> /dev/disk/by-uuid/b0509f5a-2824-4090-9da2-d0f0ff4ace0e
> Since it is possible that some volumes may not have stable paths
> though, we fall back to /dev/sdXXX if one can't be formed.
>
> We should probably explicitly reject bogus target paths which don't
> even exist on disk though. Only allow targets under /dev, where the
> given target exists
That sounds good.
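A minimal sketch of the validation rule Dan proposes (reject target paths that aren't under /dev or don't exist). This is a hypothetical helper for illustration, not libvirt's actual implementation, which lives in its C storage backend:

```python
import os

def valid_target_path(path):
    """Accept a pool target path only if it lives under /dev and
    actually exists on disk, per the rule suggested in the thread.
    (Hypothetical helper; real checks belong in libvirt's C code.)
    """
    norm = os.path.normpath(path)
    if norm != "/dev" and not norm.startswith("/dev/"):
        return False
    return os.path.exists(norm)
```

With this rule, a bogus target like /tmp/whatever or a nonexistent /dev entry would be rejected at pool-define time instead of being silently accepted.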
Dave
Dan,
Ping, what are your thoughts on this stuff?
Dave