[libvirt-users] libvirt does not recognize all devices in iscsi and mpath pools in a predictable manner

Hi,

I'm using libvirt 0.8.3 on Fedora 14 (as I wrote earlier, I'm having some trouble updating to the newest version), and I'm having problems getting iscsi and mpath storage pools to work in a usable and consistent manner.

I have two storage pools defined on the host machine, one for raw iSCSI devices and one for those same iSCSI devices device-mapped by multipath. They look like this:

  <pool type='iscsi'>
    <name>iscsi01</name>
    <source>
      <host name='10.3.1.15'/>
      <device path='iqn.1984-05.com.dell:powervault.md3200i.6782bcb0000859f3000000004d3eec7d'/>
    </source>
    <target>
      <path>/dev/disk/by-id</path>
    </target>
  </pool>

and:

  <pool type="mpath">
    <name>mpath01</name>
    <target>
      <path>/dev/mapper</path>
    </target>
  </pool>

I chose <path>/dev/disk/by-id</path> over /dev/disk/by-path for the iscsi pool because I need to be able to migrate running virtual machines to other hosts, so the actual device paths for the disks need to be the same on all hosts in the cluster.

I have two LUNs configured in the iSCSI array, and when I list the volumes in the iscsi pool, I get this:

  virsh # vol-list iscsi01
  Name                 Path
  -----------------------------------------
  23.0.0.1             /dev/disk/by-id/wwn-0x6782bcb0000859f3000007004e680190
  23.0.0.2             /dev/disk/by-id/scsi-36782bcb0000859f3000007294e6e73f2

Apparently, for some weird reason, libvirt chooses one naming scheme for one of the volumes and another for the other one. This is a problem for me for two reasons.

1 - Once I create a new LUN for a new VM, rescan the iSCSI bus and refresh libvirt's pool (see my mail from 24th of August on this list), I need to be able to automatically and reliably identify the corresponding new volumes in libvirt's volume list, so I can initialize them by copying an OS image onto them and then assign them to the new virtual machine. This is made a lot harder if libvirt randomly switches between naming schemes - though still feasible if I'm aware of the problem.

2 - I need to be able to migrate running VMs to a different host machine, in case the current one is overloaded or in need of maintenance. This is bound to be problematic if the storage volumes go by different path names on the different hosts.

Both LUNs do appear under both naming schemes under /dev/disk/by-id:

  ls /dev/disk/by-id/
  [...]
  scsi-36782bcb0000859f3000007004e680190
  [...]
  scsi-36782bcb0000859f3000007294e6e73f2
  scsi-36a4badb00b0f910012e0fccb07606fe6
  [...]
  wwn-0x6782bcb0000859f3000007004e680190
  [...]
  wwn-0x6782bcb0000859f3000007294e6e73f2
  wwn-0x6a4badb00b0f910012e0fccb07606fe6
  [...]

(snipped some irrelevant parts)

With the mpath pool, it's even worse: one of the two volumes is completely missing:

  virsh # vol-list mpath01
  Name                 Path
  -----------------------------------------
  dm-3                 /dev/mapper/36782bcb0000859f3000007004e680190

Both iSCSI volumes have in fact been picked up by multipathd and do appear in /dev/mapper/, too, but no matter how often I run pool-refresh mpath01, it will always only show the one volume.

Does anybody know of these problems or how to work around them? Are these problems solved in newer versions? (I could not find anything in the bugtracker...)

Guido
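As long as the pool may report either naming scheme, one way to spot a newly created LUN after a pool refresh is to compare volumes by the world-wide ID embedded in the path rather than by the full path. The Python sketch below is only an illustration and not part of libvirt: the helper names `wwn_of` and `new_volumes` are hypothetical, and it assumes the only prefixes in play are the ones visible in the listings above (`wwn-0x`, `scsi-3`, and the bare `3` in /dev/mapper) in front of a 32-digit hex ID.

```python
import re

# Assumption: names look like the ones in the listings above, i.e. one of
# the prefixes "wwn-0x", "scsi-3" or a bare "3", followed by 32 hex digits.
_PREFIX = re.compile(r"^(?:wwn-0x|scsi-3|3)(?=[0-9a-f]{32}$)")


def wwn_of(path):
    """Return the 32-digit hex ID embedded in a volume path, or None."""
    name = path.rsplit("/", 1)[-1]
    match = _PREFIX.match(name)
    return name[match.end():] if match else None


def new_volumes(before, after):
    """Return the paths in `after` whose ID does not occur in `before`.

    Paths whose ID cannot be extracted are always reported, so that an
    unexpected naming scheme is noticed rather than silently dropped.
    """
    known = {wwn_of(p) for p in before} - {None}
    fresh = []
    for path in after:
        wwn = wwn_of(path)
        if wwn is None or wwn not in known:
            fresh.append(path)
    return fresh
```

Feeding it the volume paths listed before and after the refresh yields only the volumes that are really new, even if libvirt reports an existing volume under the other naming scheme the second time.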

On Tue, Sep 13, 2011 at 05:17:54PM +0200, Guido Winkelmann wrote:
> Hi,
>
> I'm using libvirt 0.8.3 on Fedora 14 (as I wrote earlier, I'm having some trouble updating to the newest version), and I'm having problems getting iscsi and mpath storage pools to work in a usable and consistent manner.
>
> I have two storage pools defined on the host machine, one for raw iSCSI devices and one for those same iSCSI devices device-mapped by multipath. They look like this:
>
>   <pool type='iscsi'>
>     <name>iscsi01</name>
>     <source>
>       <host name='10.3.1.15'/>
>       <device path='iqn.1984-05.com.dell:powervault.md3200i.6782bcb0000859f3000000004d3eec7d'/>
>     </source>
>     <target>
>       <path>/dev/disk/by-id</path>
>     </target>
>   </pool>
>
> and:
>
>   <pool type="mpath">
>     <name>mpath01</name>
>     <target>
>       <path>/dev/mapper</path>
>     </target>
>   </pool>
>
> I chose <path>/dev/disk/by-id</path> over /dev/disk/by-path for the iscsi pool because I need to be able to migrate running virtual machines to other hosts, so the actual device paths for the disks need to be the same on all hosts in the cluster.

/dev/disk/by-path should be the same across all hosts too, at least for iSCSI, but perhaps not for Fibre Channel, depending on the udev naming scheme.
> I have two LUNs configured in the iSCSI array, and when I list the volumes in the iscsi pool, I get this:
>
>   virsh # vol-list iscsi01
>   Name                 Path
>   -----------------------------------------
>   23.0.0.1             /dev/disk/by-id/wwn-0x6782bcb0000859f3000007004e680190
>   23.0.0.2             /dev/disk/by-id/scsi-36782bcb0000859f3000007294e6e73f2
>
> Apparently, for some weird reason, libvirt chooses one naming scheme for one of the volumes and another for the other one. This is a problem for me for two reasons.
>
> 1 - Once I create a new LUN for a new VM, rescan the iSCSI bus and refresh libvirt's pool (see my mail from 24th of August on this list), I need to be able to automatically and reliably identify the corresponding new volumes in libvirt's volume list, so I can initialize them by copying an OS image onto them and then assign them to the new virtual machine. This is made a lot harder if libvirt randomly switches between naming schemes - though still feasible if I'm aware of the problem.
>
> 2 - I need to be able to migrate running VMs to a different host machine, in case the current one is overloaded or in need of maintenance. This is bound to be problematic if the storage volumes go by different path names on the different hosts.

Yeah, that's clearly not acceptable & we need to fix libvirt here.
> Both LUNs do appear under both naming schemes under /dev/disk/by-id:
>
>   ls /dev/disk/by-id/
>   [...]
>   scsi-36782bcb0000859f3000007004e680190
>   [...]
>   scsi-36782bcb0000859f3000007294e6e73f2
>   scsi-36a4badb00b0f910012e0fccb07606fe6
>   [...]
>   wwn-0x6782bcb0000859f3000007004e680190
>   [...]
>   wwn-0x6782bcb0000859f3000007294e6e73f2
>   wwn-0x6a4badb00b0f910012e0fccb07606fe6
>   [...]
>
> (snipped some irrelevant parts)

This is the problem. Our code is assuming that /dev/disk/by-id contains only 1 symlink per disk. For some reason your udev rules are creating multiple symlinks per disk, and hence breaking libvirt.

We iterate over the entries in /dev/disk/by-id, so we get them back in whatever order the filesystem feels like today. What we need to do is to read all the matches for that disk, sort the results, and then pick the first result.

Or, we might want to make it possible to specify a target path of

  /dev/disk/by-id/wwn-*

so that you can choose which naming scheme to use.
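The fix described above (collect every symlink that points at the disk, sort the names, take the first, optionally restricted to one naming scheme) can be sketched in Python as follows. This is only an illustration of the idea, not libvirt's actual code, which is written in C; the function name `stable_link` and its `pattern` parameter are hypothetical.

```python
import fnmatch
import os


def stable_link(device, directory="/dev/disk/by-id", pattern="*"):
    """Pick one symlink in `directory` that resolves to `device`.

    All matching names are collected and sorted first, so the result does
    not depend on the order in which the filesystem returns the entries.
    `pattern` is a shell-style glob (for example "wwn-*") that restricts
    the choice to one naming scheme.
    """
    target = os.path.realpath(device)
    matches = [
        name
        for name in os.listdir(directory)
        if fnmatch.fnmatch(name, pattern)
        and os.path.realpath(os.path.join(directory, name)) == target
    ]
    if not matches:
        return None
    return os.path.join(directory, sorted(matches)[0])
```

With the default pattern the `scsi-` name wins on every host because it sorts before `wwn-`; passing `pattern="wwn-*"` corresponds to the second suggestion of letting the user choose the scheme.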
> With the mpath pool, it's even worse: one of the two volumes is completely missing:
>
>   virsh # vol-list mpath01
>   Name                 Path
>   -----------------------------------------
>   dm-3                 /dev/mapper/36782bcb0000859f3000007004e680190
>
> Both iSCSI volumes have in fact been picked up by multipathd and do appear in /dev/mapper/, too, but no matter how often I run pool-refresh mpath01, it will always only show the one volume.

This is a little odd, I've no immediate explanation for it.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Tuesday, 13 September 2011, 17:39:16, you wrote:
> On Tue, Sep 13, 2011 at 05:17:54PM +0200, Guido Winkelmann wrote:
[...]
> > I chose <path>/dev/disk/by-id</path> over /dev/disk/by-path for the iscsi pool because I need to be able to migrate running virtual machines to other hosts, so the actual device paths for the disks need to be the same on all hosts in the cluster.
>
> /dev/disk/by-path should be the same across all hosts too, at least for iSCSI, but perhaps not FibreChannel - depending on the udev naming scheme.

Not in my case. The by-path device names contain the IP address of the iSCSI array, and in my case, two host machines access the array via different IP addresses...

Granted, that's only a test setup and the planned production setup won't have that problem, but still, there is no guarantee that the by-path names are always the same across all hosts. Also, if the iSCSI array in use can be accessed via multiple different IP addresses, each volume will be listed multiple times under by-path.

[...]
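To make the by-path objection concrete, the Python sketch below splits such a name into portal, target and LUN and compares names while ignoring the portal. It rests on an assumption not shown in this thread: that udev's by-path names for iSCSI have the form `ip-<portal>-iscsi-<iqn>-lun-<n>`. The function name is hypothetical.

```python
import re

# Assumed name layout: ip-<address>:<port>-iscsi-<target IQN>-lun-<number>
_BY_PATH = re.compile(r"^ip-(?P<portal>.+?)-iscsi-(?P<iqn>.+)-lun-(?P<lun>\d+)$")


def portal_independent_key(name):
    """Return (iqn, lun) for an iSCSI by-path name, or None if it does not parse.

    Two names that differ only in the portal address map to the same key,
    which is what a cross-host comparison would need.
    """
    match = _BY_PATH.match(name)
    if match is None:
        return None
    return match.group("iqn"), int(match.group("lun"))
```

Two hosts reaching the same LUN through different portal addresses would produce different by-path names but the same key, which is exactly why the raw by-path name cannot serve as a cluster-wide identifier.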
> > Both LUNs do appear under both naming schemes under /dev/disk/by-id:
> >
> >   ls /dev/disk/by-id/
> >   [...]
> >   scsi-36782bcb0000859f3000007004e680190
> >   [...]
> >   scsi-36782bcb0000859f3000007294e6e73f2
> >   scsi-36a4badb00b0f910012e0fccb07606fe6
> >   [...]
> >   wwn-0x6782bcb0000859f3000007004e680190
> >   [...]
> >   wwn-0x6782bcb0000859f3000007294e6e73f2
> >   wwn-0x6a4badb00b0f910012e0fccb07606fe6
> >   [...]
> >
> > (snipped some irrelevant parts)
>
> This is the problem. Our code is assuming that /dev/disk/by-id contains only 1 symlink per disk. For some reason your udev rules are creating multiple symlinks per disk, and hence breaking libvirt.

Well, I did not change anything in the udev rules. They're the defaults from Fedora 14...
> We iterate over the entries in /dev/disk/by-id, so we get them back in whatever order the filesystem feels like today. What we need to do is to read all the matches for that disk, sort the results, and then pick the first result.
>
> Or, we might want to make it possible to specify a target path of
>
>   /dev/disk/by-id/wwn-*
>
> so that you can choose which naming scheme to use.

The last one sounds like a good idea to me.

I'm still trying to figure out where the extra "3" comes from in front of the world-wide ID in the scsi- naming scheme or in that of multipathd...

One thing I've noticed in the meantime is that, if I know the device node exists on the host, I can just give it as the pathname when defining a new virtual machine, even if libvirt won't show that volume in vol-list, or won't show it under that name, and it'll just work.

While that solves my problem, it means I'm now working entirely around libvirt's storage subsystem. I might as well not even bother defining any pools any more... :-(

Regards,
Guido

PS: I'm sorry, this mail was supposed to go to the mailing list, not to Daniel personally...
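The workaround described above amounts to referencing the host's block device directly in the domain definition instead of going through a pool volume. A minimal Python sketch that builds such a `<disk>` element is shown below; the function name is hypothetical, and the guest target `vda`, the `virtio` bus and the raw `qemu` driver are assumptions for illustration, not values taken from this thread.

```python
import xml.etree.ElementTree as ET


def block_disk_xml(source_dev, target_dev="vda", bus="virtio"):
    """Build a libvirt <disk> element that points at a host block device.

    `source_dev` is the device node on the host; `target_dev` and `bus`
    are assumed defaults describing how the guest sees the disk.
    """
    disk = ET.Element("disk", type="block", device="disk")
    ET.SubElement(disk, "driver", name="qemu", type="raw")
    ET.SubElement(disk, "source", dev=source_dev)
    ET.SubElement(disk, "target", dev=target_dev, bus=bus)
    return ET.tostring(disk, encoding="unicode")


if __name__ == "__main__":
    print(block_disk_xml("/dev/mapper/36782bcb0000859f3000007004e680190"))
```

For live migration the same device path still has to exist on the destination host, so a name derived from the world-wide ID remains the safer choice for `source_dev`.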
Participants (2): Daniel P. Berrange, Guido Winkelmann