Hi,
For some time now (I think since the 3.10 or 3.11 kernels, but I'm not
sure) I have been seeing the following problem:
I have a machine with 6 SATA slots and two SAS controllers, one onboard
HBA and one RAID controller in a PCIe slot. The problem is that the
order of the SAS controllers changes randomly after a reboot, so the
SCSI addresses of the devices change. One time lsscsi prints (empty
lines and ellipsis added for clarity):
  $ lsscsi
  [0:0:0:0]    disk    ATA      INTEL SSDSC2BA20 5DV1  /dev/sda

  [6:0:0:0]    disk    SmrtStor TXA2D20200GA6001 KT00  /dev/sdb
  [...]
  [6:0:7:0]    tape    TANDBERG LTO-4 HH         U619  /dev/st0
  [6:0:8:0]    enclosu Intel    RES2SV240        0d00  -

  [7:0:0:0]    disk    LSI      9750-8i DISK     5.12  /dev/sdh
  [7:0:2:0]    disk    LSI      9750-8i DISK     5.12  /dev/sdi
After the next reboot it looks like this:
  $ lsscsi
  [0:0:0:0]    disk    ATA      INTEL SSDSC2BA20 5DV1  /dev/sda

  [6:0:0:0]    disk    LSI      9750-8i DISK     5.12  /dev/sdb
  [6:0:2:0]    disk    LSI      9750-8i DISK     5.12  /dev/sdc

  [8:0:0:0]    disk    SmrtStor TXA2D20200GA6001 KT00  /dev/sdd
  [...]
  [8:0:7:0]    tape    TANDBERG LTO-4 HH         U619  /dev/st0
  [8:0:8:0]    enclosu Intel    RES2SV240        0d00  -
That's really a problem, in two ways. The first, not so much a libvirt
problem, is that the existing command line tools for maintaining HBAs
and RAID controllers address the controller and its devices via the
host address.
The other problem is with libvirt. If you want to access an arbitrary
SAS device from a VM, you have to specify the device by its SCSI
address. Consider this snippet from a VM XML file, which attaches the
host's tape drive:
  <hostdev mode='subsystem' type='scsi' managed='no'
           sgio='unfiltered'>
    <source>
      <adapter name='scsi_host6'/>
      <address bus='0' target='7' unit='0'/>
    </source>
  </hostdev>
With the current behaviour of the kernel, this breaks in any setup
with multiple controllers. If the host adapter address changes
arbitrarily after a reboot, the VM is left with a broken link to the
SCSI device. In the worst case, it accesses a different device
connected to another controller, and data on that device may be
destroyed.
I asked elsewhere how to fix the order of the SAS controllers, but the
only information I got was that it's apparently a deliberate choice to
initialize HW in parallel, because the OS doesn't care about the order
of the controllers, and thus about their host addresses.
Sure, I can create a wrapper script and systemd foo to work around this
issue by creating the XML file content on the fly after a reboot, but,
honestly, that can't be a *solution* for this behaviour.
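Just to illustrate what such a wrapper would have to do, here is a
minimal Python sketch. It assumes that the PCI slot of a controller
stays the same across reboots and that /sys/class/scsi_host/hostN
resolves to a sysfs path containing the controller's PCI address. The
function names are made up for this example, and find_host() only
returns the first match, so a controller that registers several SCSI
hosts (one per SATA port, for instance) would need extra handling.

```python
import os
import re

SYSFS_SCSI_HOST = "/sys/class/scsi_host"

# One path component that looks like a PCI address, e.g. 0000:03:00.0
PCI_ADDR = re.compile(r"^[0-9a-f]{4,}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def pci_address_of(resolved_path):
    """Return the innermost PCI address in a resolved sysfs path, or None."""
    matches = [p for p in resolved_path.split("/") if PCI_ADDR.match(p)]
    return matches[-1] if matches else None


def find_host(pci_addr, sysfs=SYSFS_SCSI_HOST):
    """Return the current 'hostN' name of the controller at pci_addr.

    Only the first matching host is returned (assumption: one SCSI host
    per PCI function, which is not true for every controller).
    """
    for name in sorted(os.listdir(sysfs)):
        resolved = os.path.realpath(os.path.join(sysfs, name))
        if pci_address_of(resolved) == pci_addr:
            return name
    return None


def hostdev_xml(host, bus, target, unit):
    """Build the <hostdev> element for a host name such as 'host6'."""
    return (
        "<hostdev mode='subsystem' type='scsi' managed='no' sgio='unfiltered'>\n"
        "  <source>\n"
        f"    <adapter name='scsi_{host}'/>\n"
        f"    <address bus='{bus}' target='{target}' unit='{unit}'/>\n"
        "  </source>\n"
        "</hostdev>"
    )
```

With a (hypothetical) HBA at PCI address 0000:03:00.0, the wrapper
would call hostdev_xml(find_host("0000:03:00.0"), 0, 7, 0) after each
boot and feed the result to the VM definition before starting the VM.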
As far as I can see, it's practically impossible to rely on the SCSI
address for SCSI devices in the VM XML description. If a VM setup is
supposed to survive a reboot in a multi-HBA environment, it's necessary
to use unambiguous OS device names, like the udev entries under
/dev/disk and /dev/tape(*).
Is that already possible, or is there some work in progress? Or *is*
there some way to make sure that the SAS controllers always have the
same host bus address after reboot so libvirt doesn't get into trouble?
Thanks,
Corinna
(*) But then again, what if I want to import the enclosure at SCSI
address [X:0:8:0] into a VM? Use /dev/sg9? No, that isn't
unambiguous either. So... what?
--
Corinna Vinschen
Cygwin Maintainer
Red Hat