Any comments on this observation?

On Sat, Jul 16, 2016 at 2:25 AM, Nitesh Konkar <niteshkonkar.libvirt@gmail.com> wrote:
Link:  http://wiki.libvirt.org/page/NPIV_in_libvirt
Topic: Virtual machine configuration change to use vHBA LUN

An NPIV storage pool is defined on two hosts. The pool contains a total of 8 volumes, allocated from a storage device.

Source:
# virsh vol-list poolvhba0
 Name                 Path                                    
------------------------------------------------------------------------------
 unit:0:0:0           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000366
 unit:0:0:1           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000367
 unit:0:0:2           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000368
 unit:0:0:3           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000369
 unit:0:0:4           /dev/disk/by-id/wwn-0x6005076802818bda300000000000036a
 unit:0:0:5           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000380
 unit:0:0:6           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000381
 unit:0:0:7           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000382
--------------------------------------------------------------------

Destination:
# virsh vol-list poolvhba0
 Name                 Path                                    
------------------------------------------------------------------------------
 unit:0:0:0           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000380
 unit:0:0:1           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000381
 unit:0:0:2           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000382
 unit:0:0:3           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000367
 unit:0:0:4           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000368
 unit:0:0:5           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000366
 unit:0:0:6           /dev/disk/by-id/wwn-0x6005076802818bda300000000000036a
 unit:0:0:7           /dev/disk/by-id/wwn-0x6005076802818bda3000000000000369
--------------------------------------------------------------------

As you can see in the above output, the same set of eight LUNs from the storage server
has been mapped on both hosts, but the order in which the LUNs are probed differs on each
host, resulting in different unit names on the two hosts.

If the guest XML references its storage by "unit" number, is it safe to migrate such
guests? The "unit" number is assigned by the driver according to the specific way it
probes the storage, so the same unit name can refer to a different LUN on the
destination host. In other words, the LUN numbers on the source host and the destination
host do not agree. For example, LUN 0 on source_host (unit:0:0:0, wwn ending in 366) is
LUN 5 on destination_host (unit:0:0:5).

The migrated guest is therefore mapped to the wrong LUNs and is given the wrong disks.
This manifests as fatal I/O errors, since the guest has no idea that its disks just
changed out from under it. The migration does not take into account that the unit
numbers do not match on the source and destination sides.
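
To make the mismatch concrete, here is a minimal Python sketch that compares the two
vol-list outputs above by unit name. It is only an illustration and not part of libvirt;
the helper names parse_vol_list and mismatched_units are made up for this example.

```python
SOURCE = """
 unit:0:0:0  /dev/disk/by-id/wwn-0x6005076802818bda3000000000000366
 unit:0:0:1  /dev/disk/by-id/wwn-0x6005076802818bda3000000000000367
 unit:0:0:2  /dev/disk/by-id/wwn-0x6005076802818bda3000000000000368
 unit:0:0:3  /dev/disk/by-id/wwn-0x6005076802818bda3000000000000369
 unit:0:0:4  /dev/disk/by-id/wwn-0x6005076802818bda300000000000036a
 unit:0:0:5  /dev/disk/by-id/wwn-0x6005076802818bda3000000000000380
 unit:0:0:6  /dev/disk/by-id/wwn-0x6005076802818bda3000000000000381
 unit:0:0:7  /dev/disk/by-id/wwn-0x6005076802818bda3000000000000382
"""

DESTINATION = """
 unit:0:0:0  /dev/disk/by-id/wwn-0x6005076802818bda3000000000000380
 unit:0:0:1  /dev/disk/by-id/wwn-0x6005076802818bda3000000000000381
 unit:0:0:2  /dev/disk/by-id/wwn-0x6005076802818bda3000000000000382
 unit:0:0:3  /dev/disk/by-id/wwn-0x6005076802818bda3000000000000367
 unit:0:0:4  /dev/disk/by-id/wwn-0x6005076802818bda3000000000000368
 unit:0:0:5  /dev/disk/by-id/wwn-0x6005076802818bda3000000000000366
 unit:0:0:6  /dev/disk/by-id/wwn-0x6005076802818bda300000000000036a
 unit:0:0:7  /dev/disk/by-id/wwn-0x6005076802818bda3000000000000369
"""


def parse_vol_list(text):
    """Map unit name -> by-id path from the body of `virsh vol-list` output."""
    volumes = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, separator, and blank lines; keep "unit:... path" rows.
        if len(parts) == 2 and parts[0].startswith("unit:"):
            volumes[parts[0]] = parts[1]
    return volumes


def mismatched_units(source, destination):
    """Return the unit names whose path differs between the two hosts."""
    return {
        unit: (path, destination.get(unit))
        for unit, path in source.items()
        if destination.get(unit) != path
    }


src = parse_vol_list(SOURCE)
dst = parse_vol_list(DESTINATION)
mismatches = mismatched_units(src, dst)

for unit, (src_path, dst_path) in sorted(mismatches.items()):
    print(f"{unit}: source={src_path} destination={dst_path}")

# Both hosts see the same WWNs; only the unit-name assignment differs.
same_luns = set(src.values()) == set(dst.values())
print("same set of LUNs on both hosts:", same_luns)
```

With the listings above, every one of the eight unit names resolves to a different WWN
on the destination, even though both hosts see the same set of eight WWNs.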

So, should libvirt make sure that the guest domains reference NPIV pool volumes by their
globally-unique wwn instead of by "unit" numbers?

The guest XML references its storage by "unit" number.

Eg:-
<disk type='volume' device='lun'>
  <driver name='qemu' type='raw' cache='none'/>
  <source pool='poolvhba0' volume='unit:0:0:0'/>
  <backingStore/>
  <target dev='vdb' bus='virtio'/>
  <alias name='virtio-disk1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>

I am planning to write a patch for it. Any comments on the above observation/approach would be appreciated. 
Thanks,
Nitesh.