On Thu, Jul 23, 2009 at 02:53:48PM -0400, Dave Allan wrote:
> Daniel P. Berrange wrote:
>>> It doesn't currently allow configuration of multipathing, so for
>>> now setting the multipath configuration will have to continue to be
>>> done as part of the host system build.
>>>
>>> Example XML to create the pool is:
>>>
>>> <pool type="mpath">
>>>   <name>mpath</name>
>>>   <target>
>>>     <path>/dev/mapper</path>
>>>   </target>
>>> </pool>
>>
>> So this is in essence a 'singleton' pool, since there's only really
>> one of them per host. There is also no quantity of storage associated
>> with a mpath pool - it is simply dealing with volumes from other
>> pools. This falls into the same conceptual bucket as things like
>> DM-RAID, MD-RAID and even loopback device management.
> It is a singleton pool, in that there is only one dm instance per host.
> With regard to capacity, the dm devices have capacity, and their
> constituent devices could be members of other pools. Can you elaborate
> on what you see as the implications of those points?
The storage pool vs storage volume concept was modelled around the idea
that you have some storage source, and it is sub-divided into a number
of volumes.

With the multipath pool you have no storage source - the source is the
SCSI/iSCSI pool which actually provides the underlying block devices
which are the LUN paths. So by having an explicit storage pool for
multipath, there's an implicit dependency between 2 pools. If you
refresh a SCSI pool, you must then refresh the multipath pool too.
Or if you add a SCSI/iSCSI pool you must also refresh the multipath
pool. There's also the issue of tracking the association between
multipath volumes and the pools to ensure you don't remove a pool
that's providing a multipath volume that's still in use.
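To make that dependency concrete: the mpath pool shown above has no
source of its own, but its volumes are assembled from the LUN paths
exposed by a pool along these lines (adapter name illustrative):

  <pool type='scsi'>
    <name>xyz</name>
    <source>
      <adapter name='host2'/>
    </source>
    <target>
      <path>/dev/disk/by-path</path>
    </target>
  </pool>

yet nothing in either pool's XML records that the volumes of 'mpath'
depend on 'xyz'.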
>> The question I've never been able to satisfactorily answer myself is
>> whether these things (mpath, raid, loopback) should be living in the
>> storage pool APIs, or in the host device APIs.
>>
>> I also wonder how people determine the association between the volumes
>> in the mpath pool, and the volumes for each corresponding path. eg, how
>> do they determine that the /dev/mapper/dm-4 multipath device is
>> associated with devices from the SCSI storage pool 'xyz'. The storage
>> volume APIs & XML format don't really have a way to express this
>> relationship.
> It's not difficult to query to find out what devices are parents of a
> given device, but what is the use case for finding out the pools of the
> parent devices?
Say you have 3 SCSI NPIV pools configured, and a multipath pool.
You want to remove one of the SCSI pools, and know that the
multipath devices X, Y & Z are in use. You need to determine which
of the SCSI pools contains the underlying block devices for these
multipath devices before you can safely remove that SCSI pool.
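One way to express that relationship would be for the mpath volume XML
to enumerate its constituent path devices, perhaps by reusing the
volume <source> device list - purely a sketch, nothing like this exists
today, and the device names are made up:

  <volume>
    <name>dm-4</name>
    <source>
      <device path='/dev/sdf'/>  <!-- path via an HBA in pool 'xyz' -->
      <device path='/dev/sdq'/>  <!-- path via a second HBA -->
    </source>
    <target>
      <path>/dev/mapper/dm-4</path>
    </target>
  </volume>

An application could then match those path devices against each pool's
volume list to find the pools it must not remove.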
>> The host device APIs have a much more limited set of operations
>> (list, create, delete) but this may well be all that's needed for
>> things like raid/mpath/loopback devices, and with its XML format
>> being capability based we could add a multipath capability under
>> which we list the constituent paths of each device.
> If we decide to implement creation and destruction of multipath devices,
> I would think the node device APIs would be the place to do it.
If we intend to do creation/deletion of multipath devices in the
node device APIs, then we essentially get listing of multipath
devices in the node device APIs for free. So do we need a dedicated
storage pool for multipath too?
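As a sketch of what such a listing might expose, the node device XML
for a mapper device could gain a multipath capability alongside its
existing ones - hypothetical only, no such capability type exists yet:

  <device>
    <name>dm-4</name>
    <capability type='multipath'>
      <path device='/dev/sdf'/>
      <path device='/dev/sdq'/>
    </capability>
  </device>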
I have a feeling that the DeviceKit impl of the node device APIs (which
is currently disabled by default) may already be reporting on all
device mapper block devices - the HAL impl does not.
>> Now, if my understanding is correct, then if multipath is active it
>> should automatically create multipath devices for each unique LUN on
>> a storage array. DM does SCSI queries to determine which block
>> devices are paths to the same underlying LUN.
> That's basically correct, and the administrator can configure which
> devices have multipath devices created.
>> Taking a simple iSCSI storage pool
>>
>> <pool type='iscsi'>
>>   <name>virtimages</name>
>>   <source>
>>     <host name="iscsi.example.com"/>
>>     <device path="demo-target"/>
>>   </source>
>>   <target>
>>     <path>/dev/disk/by-path</path>
>>   </target>
>> </pool>
>>
>> this example would show you each individual block device, generating
>> paths under /dev/disk/by-path.
>>
>> Now, we decide we want to make use of multipath for this particular
>> pool. We should be able to just change the target path, to point to
>> /dev/mpath,
>>
>> <pool type='iscsi'>
>>   <name>virtimages</name>
>>   <source>
>>     <host name="iscsi.example.com"/>
>>     <device path="demo-target"/>
>>   </source>
>>   <target>
>>     <path>/dev/mpath</path>
>>   </target>
>> </pool>
>>
>> and have it give us back the unique multipath enabled LUNs, instead
>> of each individual block device.
> The problem with this approach is that dm devices are not SCSI devices,
> so putting them in a SCSI pool seems wrong. iSCSI pools have always
> contained volumes which are iSCSI block devices, directory pools have
> always had volumes which are files. We shouldn't break that assumption
> unless we have a good reason. It's not impossible to do what you
> describe, but I don't understand why it's a benefit.
What is a SCSI device though? Under Linux these days everything appears
to be a SCSI device whether it is SCSI or not, eg PATA, SATA, USB. So
there can be no assumption that a SCSI HBA pool gives you SCSI devices.
If an application using a pool expects volumes to have particular
SCSI capabilities (persistent reservations for example), then the only
way is for it to query the device, or try the capability it wants and
handle failure. The best libvirt can guarantee is that SCSI, disk,
iSCSI & logical pools will give back block devices, while fs / netfs
pools will give back plain files.
The one downside I realize with my suggestion here, is that a single
multipath device may have many paths, and each path may go via a
separate HBA, which would mean a separate SCSI pool. So in fact I think
we shouldn't expose multipath in normal SCSI pools after all :-)

I'm still inclined to think we can do the 'list' operation in the node
device APIs though.
>>> The target element is ignored, as it is by the disk pool, but the
>>> config code rejects the XML if it does not exist. That behavior
>>> should obviously be cleaned up, but I think that should be done in
>>> a separate patch, as it's really a bug in the config code, not
>>> related to the addition of the new pool type.
>> The target element is not ignored by the disk pool. This is used to
>> form the stable device paths via virStorageBackendStablePath() for
>> all block device based pools.
> Hmm--on my system the path I specify shows up in the pool XML, but is
> unused as far as I can tell. I can hand it something totally bogus and
> it doesn't complain. I think your next point is very good, though, so
> I'll make the target element meaningful in the multipath case and we
> can investigate the disk behavior separately.
Normally a disk pool will give you back volumes whose path name
is /dev/sdXX. If you give the pool a target path of /dev/disk/by-uuid
then the volumes will get paths like

  /dev/disk/by-uuid/b0509f5a-2824-4090-9da2-d0f0ff4ace0e

Since it is possible that some volumes may not have stable paths
though, we fall back to /dev/sdXX if one can't be formed.
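For example, a disk pool defined along these lines (device name
illustrative) will hand back by-uuid volume paths wherever they can be
formed:

  <pool type='disk'>
    <name>sda-parts</name>
    <source>
      <device path='/dev/sda'/>
    </source>
    <target>
      <path>/dev/disk/by-uuid</path>
    </target>
  </pool>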
We should probably explicitly reject bogus target paths which don't
even exist on disk though, and only allow targets under /dev where the
given target exists.
Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org  -o-  http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|