On 09/03/2016 09:13 AM, Johan Kragsterman wrote:
Hi!
Report from my multipath tests today.
My test virtual machine, that runs from an NPIV pool, is not able to use multipath.
When I pulled the cable from one of the targets, it crashed.
But, strangely, it could boot up again on that other path, that it just crashed on.
That tells me it can use both paths and is not limited to only one of them. But because
the multipath layer isn't there, it cannot survive a path failure; it can only come up
again after a reboot.
The question is, WHY doesn't an NPIV pool support multipath? It is sort of the idea
behind FC to be redundant and to always have multiple paths to failover to. Why was the
NPIV pool designed like this?
Not having done the original design, I can surmise that the "goal" for
the design was to not force the user to provide something to the guest
that could change in subsequent reboots as well as having a way to
"migrate" a guest to a different host. Although perhaps proven not
perfectly reliable per this libvir-list posting:
http://www.redhat.com/archives/libvir-list/2016-July/msg00524.html
The "issue" (so to speak) is that there's no guarantee the vHBA
'scsi_hostM' remains constant between reboots, whereas a pool based
upon some vHBA that gets created from a vport capable HBA provides
the guest with a volume by a unit number rather than by some path that
can/may change.
Like you've noted, it's a complicated technology and there's more than
"one" design goal that needs to be considered. Your goal is seemingly to
be able to have the storage available on either path for the same host,
while someone else may want to be able to migrate between two hosts
using the same target where each host "sees" a different path to the target.
BTW: It's not clear to me from your description how you have added the
volume to the pool.
Is the volume in a SCSI pool?
XML for the volume in the guest:
<disk type='volume' device='lun'>
<driver name='qemu' type='raw'/>
<source pool='vhbapool_host3' volume='unit:0:4:0'/>
<target dev='sda' bus='scsi'/>
</disk>
# virsh vol-list vhbapool_host3
Name Path
------------------------------------------------------------------------------
unit:0:4:0 /dev/disk/by-path/pci-0000:10:00.0-fc-0x5006016844602198-lun-0
# virsh pool-dumpxml vhbapool_host3
<pool type='scsi'>
<name>vhbapool_host3</name>
...
<source>
<adapter type='fc_host' parent='scsi_host3'
wwnn='5001a4a4d2f10190'
wwpn='5001a4af287f9b40'/>
</source>
<target>
<path>/dev/disk/by-path</path>
...
or in a MPATH pool:
XML for the volume in the guest
<disk type='block' device='lun' sgio='unfiltered'>
<driver name='qemu' type='raw'/>
<source dev='/dev/mapper/3600a0b80005ad1d700002dde4fa32ca8'/>
<target dev='sda' bus='scsi'/>
<address type='drive' controller='0' bus='0'
target='0' unit='0'/>
</disk>
# virsh vol-list mpath
Name Path
-----------------------------------------
...
dm-5 /dev/mapper/3600a0b80005ad1d700002dde4fa32ca8
# virsh pool-dumpxml mpath
<pool type='mpath'>
<name>mpath</name>
...
<source>
</source>
<target>
<path>/dev/mapper</path>
...
Caveat: I have limited knowledge of mpath details. I know what the
technology is, but have made limited use of it.
Conceptually, I understand what you're trying to accomplish. The devil
of course is in the details and yes, we really need to do a better job
documenting the various usage models. Of course, I'd be remiss if I
didn't say "patches welcome"!
If we could use the underlying devices and pass them directly to the
guest, then we could implement multipath in the guest.
But I'm sort of leaning towards not using NPIV anymore, since it only seems to complicate
things. In VMware they can attach the NPIV directly to the guest, which means that the
NPIV, and with that the LUNs, are easily transferred across the SAN hosts. Here, with
libvirt/qemu/kvm, we cannot attach an NPIV to the guest, which sort of makes the whole
idea fall apart, especially if there is indeed no multipath support. Better then
to map the LUNs directly to the host, and use the multipath devices for the guests.
A terminology thing - do you mean passing the vHBA through to the guest,
such as:
XML to pass a 'scsi_host' to the guest:
...
<hostdev mode='subsystem' type='scsi' managed='no'>
<source>
<adapter name='scsi_host15'/>
<address bus='0' target='0' unit='0'/>
</source>
<address type='drive' controller='0' bus='0'
target='0' unit='0'/>
</hostdev>
...
where scsi_host15 is the vHBA created from the vport capable scsi_host3
# virsh nodedev-dumpxml scsi_host15
<device>
<name>scsi_host15</name>
<path>/sys/devices/pci0000:00/0000:00:04.0/0000:10:00.0/host3/vport-3:0-10/host15</path>
<parent>scsi_host3</parent>
...
<capability type='fc_host'>
<wwnn>5001a4a4d2f10190</wwnn>
<wwpn>5001a4af287f9b40</wwpn>
<fabric_wwn>2002000573de9a81</fabric_wwn>
...
the wwnn/wwpn are "fabricated" by libvirt (automagically created).
# virsh nodedev-dumpxml scsi_host3
<device>
<name>scsi_host3</name>
<path>/sys/devices/pci0000:00/0000:00:04.0/0000:10:00.0/host3</path>
<parent>pci_0000_10_00_0</parent>
<capability type='scsi_host'>
...
<capability type='fc_host'>
<wwnn>20000000c9848140</wwnn>
<wwpn>10000000c9848140</wwpn>
<fabric_wwn>2002000573de9a81</fabric_wwn>
</capability>
<capability type='vport_ops'>
<max_vports>127</max_vports>
<vports>2</vports>
</capability>
...
You'll note the fabric_wwn are the same for both.
I haven't used this method in my limited test environment. It's been a
"recent question" at KVM Forum that I have on my "todo" list to
investigate a bit more (competing with multiple other items).
John
If anyone else has opinions on this, or ideas that are better than
mine, I would very much like to hear them.
Regards Johan
-----libvirt-users-bounces(a)redhat.com wrote: -----
To: John Ferlan <jferlan(a)redhat.com>
From: Johan Kragsterman
Sent by: libvirt-users-bounces(a)redhat.com
Date: 2016-09-03 08:36
Cc: libvirt-users(a)redhat.com, Yuan Dan <dyuan(a)redhat.com>
Subject: [libvirt-users] Ang: Re: Ang: Ang: Re: Ang: Re: attaching storage pool error
Hi, John, and thank you!
This was a very thorough and welcome response, I was wondering where all the storage guys
were...
I will get back to you with more details later, specifically about multipath, since this
needs to be investigated thoroughly.
In the meantime I have, by trial and error, been able to attach the
NPIV pool LUN to a virtio-scsi controller, and it seems it already uses multipath when I
look at the volumes in the host.
This multipath pool procedure seems a little confusing to me, since an NPIV
vHBA is by nature always multipath. I will do a very simple test later today, the best
test there is: just pulling a cable, first from one of the FC targets, putting it back
again, and then doing the same with the other one. This will tell me whether it runs on
multipath or not.
The consideration I had was whether to implement multipath in the guest or on the
host, and I don't know which I would prefer. Simplicity is always preferable, so if it
is working fine on the host, I guess I'd prefer that.
Get back to you later...
/Johan
-----John Ferlan <jferlan(a)redhat.com> wrote: -----
To: Johan Kragsterman <johan.kragsterman(a)capvert.se>, Yuan Dan
<dyuan(a)redhat.com>
From: John Ferlan <jferlan(a)redhat.com>
Date: 2016-09-02 20:51
Cc: libvirt-users(a)redhat.com
Subject: Re: [libvirt-users] Ang: Ang: Re: Ang: Re: attaching storage pool error
On 08/24/2016 06:31 AM, Johan Kragsterman wrote:
>
> Hi again!
>
I saw this last week while I was at KVM Forum, but just haven't had the
time until now to start thinking about this stuff again ... as you point
out with your questions and replies - NPIV/vHBA is tricky and
complicated... I always have to try to "clear the decks" of anything else
before trying to page how this all works back into the frontal cortex.
Once done, I quickly do a page flush.
It was also a bit confusing with respect to how the responses have been
threaded - so I just took the most recent one and started there.
> -----libvirt-users-bounces(a)redhat.com wrote: -----
> To: Yuan Dan <dyuan(a)redhat.com>
> From: Johan Kragsterman
> Sent by: libvirt-users-bounces(a)redhat.com
> Date: 2016-08-24 07:52
> Cc: libvirt-users(a)redhat.com
> Subject: [libvirt-users] Ang: Re: Ang: Re: attaching storage pool error
>
> Hi and thanks for your important input, Dan!
>
>
>>>
>>>
>>> System centos7, system default libvirt version.
>>>
>>> I've succeeded in creating an npiv storage pool, which I could start without
>>> problems. However, I couldn't attach it to the vm; it threw errors when I
>>> tried. I want to boot from it, so I need it working from the start. I read one
>>> of Daniel Berrange's old (2010) blogs about attaching an iSCSI pool, and drew
>>> my conclusions from that. I haven't found other documentation. Can someone
>>> point me to more recent documentation on this?
>>>
>>> Are there other mailing lists in the libvirt/KVM communities that are more
>>> focused on storage? I'd like to know about these, if so, since I'm a storage
>>> guy, and fiddle around a lot with these things...
>>>
>>> There are quite a few things I'd like to know about that I doubt this list
>>> cares about, or has knowledge about, like multipath devices/pools,
>>> virtio-scsi in combination with npiv-storagepool, etc.
>>>
>>> So can anyone point me further....?
>>
>>
>> http://libvirt.org/formatstorage.html
>>
>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/...
>>
The Red Hat documentation is the most up-to-date - it was sourced (more or
less) from:
http://wiki.libvirt.org/page/NPIV_in_libvirt
There's some old stuff in there, and it probably needs a cleanse to cover
all the "supported" options.
>> Hope it can help you to get started with it.
>>
>>
>> Unfortunately I have already gone through these documents, several times as
>> well, but these are only about the creation of storage pools, not how you
>> attach them to the guest.
>
> If the pool is ready, here are some examples:
> http://libvirt.org/formatdomain.html#elementsDisks
>
> you can use it in guest like this:
> <disk type='volume' device='disk'>
> <driver name='qemu' type='raw'/>
> <source pool='iscsi-pool' volume='unit:0:0:1' mode='host'/>
> <auth username='myuser'>
> <secret type='iscsi' usage='libvirtiscsi'/>
> </auth>
> <target dev='vdb' bus='virtio'/>
> </disk>
>
This is an 'iscsi' pool format, but something similar can be crafted for
the 'scsi' pool used for fc_host devices.
>
>
> As I described above, I created an npiv pool for my FC backend. I'd also like to
> get scsi pass through, which seems to be possible only if I use "device=lun".
> Can I NOT use "device=lun", and then obviously NOT get "scsi pass through", if
> I use an npiv storage pool? Is the only way to get "scsi pass through" to NOT
> use a storage pool, but instead use the host LUNs?
>
So for the purposes of taking the right steps, I assume you used 'virsh
nodedev-list --cap vports' in order to find FC capable scsi_host#'s.
Then you created your vHBA based on the FC capable fc_host, using XML
such as:
<device>
<parent>scsi_hostN</parent>
<capability type='scsi_host'>
<capability type='fc_host'>
</capability>
</capability>
</device>
where scsi_hostN, and 'N' in particular, is the FC capable fc_host.
Then create the node device:
# virsh nodedev-create vhba.xml
Node device scsi_hostM created from vhba.xml
where 'M' is whatever the next available scsi_host# is on your host.
If you 'virsh nodedev-dumpxml scsi_hostM' you'll see the wwnn/wwpn details.
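As a rough illustration of reading those wwnn/wwpn details programmatically, here is a minimal Python sketch. It assumes the `virsh nodedev-dumpxml` output has already been captured as a string, and that the 'fc_host' capability is nested inside the 'scsi_host' capability. The sample XML is a trimmed stand-in built from values quoted elsewhere in this thread, and the function name `fc_host_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative nodedev XML using values quoted elsewhere in this
# thread; real `virsh nodedev-dumpxml` output carries more elements.
SAMPLE_NODEDEV_XML = """
<device>
  <name>scsi_host15</name>
  <parent>scsi_host3</parent>
  <capability type='scsi_host'>
    <capability type='fc_host'>
      <wwnn>5001a4a4d2f10190</wwnn>
      <wwpn>5001a4af287f9b40</wwpn>
      <fabric_wwn>2002000573de9a81</fabric_wwn>
    </capability>
  </capability>
</device>
"""


def fc_host_ids(nodedev_xml):
    """Return name, parent, wwnn and wwpn of a node device.

    Returns None when the device has no 'fc_host' capability.
    """
    root = ET.fromstring(nodedev_xml)
    fc = root.find(".//capability[@type='fc_host']")
    if fc is None:
        return None
    return {
        "name": root.findtext("name"),
        "parent": root.findtext("parent"),
        "wwnn": fc.findtext("wwnn"),
        "wwpn": fc.findtext("wwpn"),
    }


print(fc_host_ids(SAMPLE_NODEDEV_XML))
```

The wwnn/wwpn pair returned here is what a 'scsi' pool's fc_host adapter element refers to, so recording it is one way to recognise the vHBA again if its scsi_host number changes.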
You can then create a vHBA scsi pool from that in order to ensure the
persistence of the vHBA. Although it's not required - the vHBA scsi
pool just allows you to provide a source pool and volume by unit # for
your guest rather than having to edit guests between host reboots or
other such conditions which cause
>
> What do you think about this?:
>
> <disk type='volume' device='disk'>
> <driver name='qemu' type='raw'/>
> <source pool='vhbapool_host8' volume='unit:0:0:1'/>
> <target dev='hda' bus='ide'/>
> </disk>
>
>
> But I'd prefer to be able to use something like this instead:
>
>
>
> <disk type='volume' device='lun'>
> <driver name='qemu' type='raw'/>
> <source pool='vhbapool_host8' volume='unit:0:0:1'/>
> <target dev='vda' bus='scsi'/>
> </disk>
>
> But that might not be possible...?
>
The "volume" and "disk" or "volume" and "lun" syntax can be used
somewhat interchangeably. As you point out, the features for disk and
lun are slightly different. Usage of the 'lun' syntax allows addition
of the attribute "sgio='unfiltered'".
>
>
> A couple of additional questions here:
>
> * Since the target device is already defined in the pool, I don't see the reason
> for defining it here as well, like in your example with the iscsi pool?
Let's forget about iscsi
> * I'd like to use virtio-scsi combined with the pool, is that possible?
Works on my test guest (ok not the order from dumpxml):
...
<controller type='scsi' index='0' model='virtio-scsi'>
<alias name='scsi0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</controller>
...
<disk type='volume' device='lun'>
<driver name='qemu' type='raw'/>
<source pool='vhbapool_host3' volume='unit:0:4:0'/>
<backingStore/>
<target dev='sda' bus='scsi'/>
<shareable/>
<alias name='scsi0-0-0-0'/>
<address type='drive' controller='0' bus='0'
target='0' unit='0'/>
</disk>
...
> * If so, how do I define that? I understand I can define a controller separately, but
> how do I tell the guest to use that specific controller in combination with that pool?
See above... The controller has a "type", "index", and "model"... Then
when adding the disk, use an <address> with type='drive' controller='#',
where # is the index number from your virtio-scsi controller.
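To make the index/controller pairing concrete, here is a small Python sketch that checks which controller model each SCSI disk in a domain definition lands on. It assumes the domain XML is available as a string (for example from 'virsh dumpxml'); the sample is a trimmed version of the fragments above, and the function name `disk_controller_models` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed domain XML modelled on the controller and disk fragments above.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <controller type='scsi' index='0' model='virtio-scsi'/>
    <disk type='volume' device='lun'>
      <driver name='qemu' type='raw'/>
      <source pool='vhbapool_host3' volume='unit:0:4:0'/>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
  </devices>
</domain>
"""


def disk_controller_models(domain_xml):
    """Map each SCSI disk target (e.g. 'sda') to its controller's model."""
    root = ET.fromstring(domain_xml)
    # SCSI controllers keyed by their index attribute.
    models = {
        c.get("index"): c.get("model")
        for c in root.iter("controller")
        if c.get("type") == "scsi"
    }
    result = {}
    for disk in root.iter("disk"):
        target = disk.find("target")
        address = disk.find("address")
        if target is None or target.get("bus") != "scsi":
            continue
        if address is None or address.get("type") != "drive":
            continue
        # None means the referenced controller index is not defined here
        # (or the controller element carries no model attribute).
        result[target.get("dev")] = models.get(address.get("controller"))
    return result


print(disk_controller_models(SAMPLE_DOMAIN_XML))
```

A disk whose address points at controller='0' is served by whichever SCSI controller has index='0', so a result of 'virtio-scsi' for 'sda' confirms the pairing described above.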
> * Since the npiv pool obviously is a pool based on an fc initiator, the fc target
> can/will provision more LUNs to that initiator. How will that affect the pool and the
> guest's access to new LUNs? In this example the volume says 'unit:0:0:1',
> and I guess that will change if there are more LUNs in there? Or is that
> "volume unit" the "scsi target device", which can hold multiple LUNs?
>
You can use 'virsh pool-refresh $poolname' - it will find new luns...
Err, it *should* find new luns ;-) Existing 'unit:#:#:#' values
shouldn't change - they should be "tied to" the same wwnn. Use
"virsh vol-list $poolname" to see the Path. When new ones are added they are
given new unit numbers. Reboots should find the same order.
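One way to check what a refresh actually picked up is to compare the volume listing before and after it. The Python sketch below assumes the 'virsh vol-list' output has been captured as text in the two-column "Name Path" layout shown in this thread, with no spaces in paths. The first listing uses the volume quoted earlier; the extra 'unit:0:4:1' entry in the second listing is hypothetical, as are the function names.

```python
# Captured "before" listing, using the volume shown earlier in this thread.
BEFORE = """\
Name         Path
------------------------------------------------------------------------------
unit:0:4:0   /dev/disk/by-path/pci-0000:10:00.0-fc-0x5006016844602198-lun-0
"""

# "After" listing; the unit:0:4:1 row is a hypothetical newly provisioned LUN.
AFTER = BEFORE + (
    "unit:0:4:1   /dev/disk/by-path/pci-0000:10:00.0-fc-0x5006016844602198-lun-1\n"
)


def parse_vol_list(text):
    """Parse `virsh vol-list` table text into {volume name: path}."""
    volumes = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the dashed separator and the "Name Path" header.
        if len(fields) != 2 or fields[0] == "Name":
            continue
        volumes[fields[0]] = fields[1]
    return volumes


def new_volumes(before, after):
    """Volumes present after a refresh that were absent before it."""
    return {name: path for name, path in after.items() if name not in before}


before = parse_vol_list(BEFORE)
after = parse_vol_list(AFTER)
print(new_volumes(before, after))
```

The same comparison run in the other direction (names in the "before" listing missing from the "after" listing) would flag the case where an existing unit name did not survive a refresh or reboot.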
> ...more...
>
>
> I've found something here in the RHEL7 virt guide:
>
>
> <disk type='volume' device='lun' sgio='unfiltered'>
> <driver name='qemu' type='raw'/>
> <source pool='vhbapool_host3' volume='unit:0:0:0'/>
> <target dev='sda' bus='scsi'/>
> <shareable />
> </disk>
>
Fair warning, use of sgio='unfiltered' does require some specific
kernels... There were many "issues" with this - mostly related to kernel
support. If not supported by the kernel, you will see:
error: Requested operation is not valid: unpriv_sgio is not supported by this kernel
>
>
>
> The question that comes up here is multipath. Since this is fibre channel, it
> is of course multipath. The "target dev" says 'sda'. In a multipath dev
> list it should say "/dev/mapper/mpatha".
>
> How to handle that?
>
Uhh... multipath... Not my strong suit... I'm taking an example from a
bz that you won't be able to read because it's marked private.
Once you have your vHBA and scsi_hostM for that vHBA on the host you can
use 'lsscsi' (you may have to yum/dnf install it - it's a very useful
tool)...
# lsscsi
...
// assume scsi_host6 is the newly created vHBA; its LUNs follow
[6:0:0:0] disk IBM 1726-4xx FAStT 0617 -
[6:0:1:0] disk IBM 1726-4xx FAStT 0617 -
[6:0:2:0] disk IBM 1726-4xx FAStT 0617 /dev/sdf
[6:0:3:0] disk IBM 1726-4xx FAStT 0617 /dev/sdg
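To pick out the entries that belong to one SCSI host from such a listing, a short Python sketch along these lines could be used. It assumes the default 'lsscsi' output has been captured as text, with the [host:channel:target:lun] tuple first and the device node (or "-") last on each line; the function name `devices_on_host` is hypothetical.

```python
import re

# The lsscsi lines quoted above, for the assumed vHBA scsi_host6.
SAMPLE_LSSCSI = """\
[6:0:0:0] disk IBM 1726-4xx FAStT 0617 -
[6:0:1:0] disk IBM 1726-4xx FAStT 0617 -
[6:0:2:0] disk IBM 1726-4xx FAStT 0617 /dev/sdf
[6:0:3:0] disk IBM 1726-4xx FAStT 0617 /dev/sdg
"""

# Matches the leading [host:channel:target:lun] tuple of an lsscsi line.
HCTL = re.compile(r"^\[(\d+):(\d+):(\d+):(\d+)\]")


def devices_on_host(lsscsi_text, host):
    """Return {(channel, target, lun): device node} for one SCSI host number."""
    found = {}
    for line in lsscsi_text.splitlines():
        match = HCTL.match(line.strip())
        if not match:
            continue
        h, c, t, l = (int(g) for g in match.groups())
        device = line.split()[-1]
        # lsscsi prints "-" when no device node is attached to the entry.
        if h == host and device != "-":
            found[(c, t, l)] = device
    return found


print(devices_on_host(SAMPLE_LSSCSI, 6))
```

For the sample above this leaves only the two entries that have block device nodes, /dev/sdf and /dev/sdg, which are the candidates for the paths behind a multipath device.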
You'll need an mpath pool:
# virsh pool-dumpxml mpath
<pool type='mpath'>
<name>mpath</name>
<source>
</source>
<target>
<path>/dev/mapper</path>
<permissions>
<mode>0755</mode>
<owner>-1</owner>
<group>-1</group>
</permissions>
</target>
</pool>
# virsh pool-define mpath
# virsh pool-start mpath
# virsh vol-list mpath
Name Path
-----------------------------------------
dm-0 /dev/mapper/3600a0b80005adb0b0000ab2d4cae9254
dm-5 /dev/mapper/3600a0b80005ad1d700002dde4fa32ca8   <=== this one is from vhba scsi_host6
Then using something like:
<disk type='block' device='lun' sgio='unfiltered'>
<driver name='qemu' type='raw'/>
<source dev='/dev/mapper/3600a0b80005ad1d700002dde4fa32ca8'/>
<target dev='sda' bus='scsi'/>
<alias name='scsi0-0-0-0'/>
<address type='drive' controller='0' bus='0'
target='0' unit='0'/>
</disk>
HTH,
John
(FWIW: I'm not sure how the leap of faith was taken that dm-5 is the
vHBA for scsi_host6... Although I think it's from the wwnn for a volume
in the vHBA as seen when using a virsh vol-list from a pool created
using the vHBA within the bz).