Before posting this to the wiki or somewhere else, I want to see whether
there are any suggestions on it, or whether I missed something.
============================================
How to use NPIV in libvirt
I planned to write a document about how to use NPIV in libvirt after
more features were supported, but it looks like I can't wait until then:
I got lots of questions from both bug reports and mails. So here we go.
The document tries to summarize what libvirt supports for NPIV so far,
plus the TODO list. Feedback and suggestions are welcome.
1) How to find out which HBA(s) support vHBA
For libvirt "1.0.4" or newer, you can find it out simply by:
# virsh nodedev-list --cap vports
"--cap vports" tells "nodedev-list" to output only the devices that
support the "vports" capability, i.e. support vHBA.
Also since version "1.0.4", the HBA's XML shows the maximum number of
vports the HBA supports and the current number of vports,
e.g.
# virsh nodedev-dumpxml scsi_host5
<device>
  <name>scsi_host5</name>
  <parent>pci_0000_04_00_1</parent>
  <capability type='scsi_host'>
    <host>5</host>
    <capability type='fc_host'>
      <wwnn>2001001b32a9da4e</wwnn>
      <wwpn>2101001b32a9da4e</wwpn>
      <fabric_wwn>2001000dec9877c1</fabric_wwn>
    </capability>
    <capability type='vport_ops'>
      <max_vports>164</max_vports>
      <vports>5</vports>
    </capability>
  </capability>
</device>
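If you want to read those two numbers from a script, something like the following could work. It is only a sketch: it uses Python's standard xml.etree.ElementTree on a copy of the XML above, and the function name vport_usage is made up for this example (it is not part of libvirt).

```python
import xml.etree.ElementTree as ET

# Sample input: a copy of the "virsh nodedev-dumpxml scsi_host5" output above.
SCSI_HOST5_XML = """
<device>
  <name>scsi_host5</name>
  <parent>pci_0000_04_00_1</parent>
  <capability type='scsi_host'>
    <host>5</host>
    <capability type='fc_host'>
      <wwnn>2001001b32a9da4e</wwnn>
      <wwpn>2101001b32a9da4e</wwpn>
      <fabric_wwn>2001000dec9877c1</fabric_wwn>
    </capability>
    <capability type='vport_ops'>
      <max_vports>164</max_vports>
      <vports>5</vports>
    </capability>
  </capability>
</device>
"""


def vport_usage(nodedev_xml):
    """Return (max_vports, vports), or None if the XML carries no counts.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(nodedev_xml)
    cap = root.find(".//capability[@type='vport_ops']")
    if cap is None:
        return None
    max_vports = cap.findtext("max_vports")
    vports = cap.findtext("vports")
    if max_vports is None or vports is None:
        # Older libvirt reports an empty vport_ops capability.
        return None
    return int(max_vports), int(vports)


max_vports, vports = vport_usage(SCSI_HOST5_XML)
print("free vports:", max_vports - vports)  # prints "free vports: 159"
```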
For libvirt older than "1.0.4", it's a bit more complicated than above:
First you need to find out all the HBAs, e.g.
# virsh nodedev-list --cap scsi_host
scsi_host0
scsi_host1
scsi_host2
scsi_host3
scsi_host4
scsi_host5
Then, to see whether an HBA supports vHBA, check whether its dumped
XML contains the "vport_ops" capability. E.g.
# virsh nodedev-dumpxml scsi_host3
<device>
  <name>scsi_host3</name>
  <parent>pci_0000_00_08_0</parent>
  <capability type='scsi_host'>
    <host>3</host>
  </capability>
</device>
That shows "scsi_host3" doesn't support vHBA.
# virsh nodedev-dumpxml scsi_host5
<device>
  <name>scsi_host5</name>
  <parent>pci_0000_04_00_1</parent>
  <capability type='scsi_host'>
    <host>5</host>
    <capability type='fc_host'>
      <wwnn>2001001b32a9da4e</wwnn>
      <wwpn>2101001b32a9da4e</wwpn>
      <fabric_wwn>2001000dec9877c1</fabric_wwn>
    </capability>
    <capability type='vport_ops' />
  </capability>
</device>
But "scsi_host5" supports it.
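The check above is easy to script when there are many hosts to go through. Below is a rough sketch in Python; the helper name supports_vhba is made up, and the two XML strings are trimmed-down copies of the dumps above. In a real script the XML would come from "virsh nodedev-dumpxml" for each name printed by "virsh nodedev-list --cap scsi_host".

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the two dumps above (only the parts the check needs).
HOST_XML = {
    "scsi_host3": (
        "<device><name>scsi_host3</name>"
        "<capability type='scsi_host'><host>3</host></capability>"
        "</device>"
    ),
    "scsi_host5": (
        "<device><name>scsi_host5</name>"
        "<capability type='scsi_host'><host>5</host>"
        "<capability type='vport_ops' /></capability>"
        "</device>"
    ),
}


def supports_vhba(nodedev_xml):
    """True if the dumped XML contains a 'vport_ops' capability.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(nodedev_xml)
    return root.find(".//capability[@type='vport_ops']") is not None


# Keep only the hosts that can act as the parent of a vHBA.
vhba_capable = [name for name, xml in HOST_XML.items() if supports_vhba(xml)]
print(vhba_capable)  # prints "['scsi_host5']"
```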
One might be confused by the difference between the node device naming
style in this document (e.g. scsi_host5) and the one in the RHEL6
Virtualization Guide [1] (pci_10df_fe00_scsi_host_0). That's because
libvirt has two backends for the node device driver: udev and HAL. We
prefer the udev backend over the HAL backend in the internal
implementation, and I think there is good enough reason to do so (HAL is
in maintenance mode now). I believe the udev backend is used more than the
HAL backend, but if your distribution's packager builds libvirt without
the udev backend, don't be surprised by node device names like the ones
in [1].
2) How to create a vHBA
Pick one HBA that supports vHBA, use its "node device name" as the
"parent" of the vHBA, and specify the "wwnn" and "wwpn" in the
vHBA's XML. E.g.
<device>
  <name>scsi_host6</name>
  <parent>scsi_host5</parent>
  <capability type='scsi_host'>
    <capability type='fc_host'>
      <wwnn>2001001b32a9da5e</wwnn>
      <wwpn>2101001b32a9da5e</wwpn>
    </capability>
  </capability>
</device>
Then create the vHBA with the virsh command "nodedev-create" (assuming the
above XML file is named "vhba.xml"):
# virsh nodedev-create vhba.xml
Node device scsi_host6 created from vhba.xml
Since "0.9.10", libvirt generates "wwnn" and "wwpn" automatically if
they are not specified. That means one can create the vHBA with simpler
XML like:
<device>
  <parent>scsi_host5</parent>
  <capability type='scsi_host'>
    <capability type='fc_host'>
    </capability>
  </capability>
</device>
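If you create vHBAs from a script, the XML for "nodedev-create" can be generated instead of hand-written. The following is a minimal sketch, not anything libvirt provides: build_vhba_xml is a made-up name, and requiring "wwnn" and "wwpn" to be given together is this sketch's own choice, not a documented libvirt rule.

```python
import xml.etree.ElementTree as ET


def build_vhba_xml(parent, wwnn=None, wwpn=None):
    """Build node device XML for a vHBA on the given parent HBA.

    Hypothetical helper. With no wwnn/wwpn the fc_host capability is left
    empty, which lets libvirt (0.9.10 or newer) generate the pair.
    """
    if (wwnn is None) != (wwpn is None):
        # Assumption of this sketch: pass both values or neither.
        raise ValueError("give both wwnn and wwpn, or neither")
    device = ET.Element("device")
    ET.SubElement(device, "parent").text = parent
    scsi_host = ET.SubElement(device, "capability", type="scsi_host")
    fc_host = ET.SubElement(scsi_host, "capability", type="fc_host")
    if wwnn is not None:
        ET.SubElement(fc_host, "wwnn").text = wwnn
        ET.SubElement(fc_host, "wwpn").text = wwpn
    return ET.tostring(device, encoding="unicode")


# Write the file that "virsh nodedev-create vhba.xml" would consume.
with open("vhba.xml", "w") as f:
    f.write(build_vhba_xml("scsi_host5", "2001001b32a9da5e", "2101001b32a9da5e"))
```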
3) How to destroy a vHBA
As usual, destroying something is always simpler than creating it:
# virsh nodedev-destroy scsi_host6
Destroyed node device 'scsi_host6'
You might already have realized that the vHBA is removed permanently.
Don't be surprised, that's life: the node device driver doesn't support
persistent config. I won't say it's a nightmare for users who scream when
they realize the vHBA disappeared after a system reboot, but it's not
good either (assume you got the wwnn:wwpn pair from the storage admin but
didn't record it). Fortunately, we support persistent vHBAs now; see the
next section for details.
4) How to create a persistent vHBA
Let's go back over the history a bit first.
Prior to libvirt "1.0.5", one could define a "scsi" type pool based on a
(v)HBA by its scsi host name (e.g. "host0" in the XML below). E.g.
<pool type='scsi'>
  <name>poolhba0</name>
  <uuid>e9392370-2917-565e-692b-d057f46512d6</uuid>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
    <adapter name='host0'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
    <permissions>
      <mode>0700</mode>
      <owner>0</owner>
      <group>0</group>
    </permissions>
  </target>
</pool>
Quite nice? Yeah, at least it looks so, but the problem is that the scsi
host number is *unstable* (it can change after a system reboot, a kernel
module reload, a vHBA re-creation, etc.), and thus the "scsi" type pool
based on a (v)HBA becomes unstable too. Obviously it doesn't help with the
"persistent vHBA" problem.
To solve these problems, libvirt "1.0.5" introduced a new XML schema to
indicate the (v)HBA. An example of the XML:
<pool type='scsi'>
  <name>poolvhba0</name>
  <uuid>e9392370-2917-565e-692b-d057f46512d6</uuid>
  <source>
    <adapter type='fc_host' parent='scsi_host5'
             wwnn='20000000c9831b4b' wwpn='10000000c9831b4b'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
    <permissions>
      <mode>0700</mode>
      <owner>0</owner>
      <group>0</group>
    </permissions>
  </target>
</pool>
It allows defining a "scsi" type pool based on either an HBA or a vHBA.
For an HBA, the "parent" attribute can be omitted. For a vHBA, if "parent"
is not specified, libvirt automatically picks the first HBA that supports
vHBA and has not yet reached the maximum number of vports it supports.
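To make the optional "parent" concrete, here is a small sketch that generates such pool XML. The function name build_vhba_pool_xml is made up, and the sketch leaves out the <uuid> and <permissions> elements of the example; whether your libvirt accepts the pool without them is an assumption, so add them back if needed.

```python
import xml.etree.ElementTree as ET


def build_vhba_pool_xml(name, wwnn, wwpn, parent=None,
                        target_path="/dev/disk/by-path"):
    """Build "scsi" pool XML using the fc_host adapter schema.

    Hypothetical helper. When parent is None the attribute is omitted, so
    libvirt picks a suitable HBA itself as described above.
    """
    pool = ET.Element("pool", type="scsi")
    ET.SubElement(pool, "name").text = name
    source = ET.SubElement(pool, "source")
    adapter_attrs = {"type": "fc_host", "wwnn": wwnn, "wwpn": wwpn}
    if parent is not None:
        adapter_attrs["parent"] = parent
    ET.SubElement(source, "adapter", adapter_attrs)
    target = ET.SubElement(pool, "target")
    ET.SubElement(target, "path").text = target_path
    return ET.tostring(pool, encoding="unicode")


# Same wwnn:wwpn pair as the example; once with and once without a parent.
print(build_vhba_pool_xml("poolvhba0", "20000000c9831b4b", "10000000c9831b4b",
                          parent="scsi_host5"))
print(build_vhba_pool_xml("poolvhba0", "20000000c9831b4b", "10000000c9831b4b"))
```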
For a pool based on a vHBA, when the pool is started, libvirt checks
whether the specified vHBA (wwnn:wwpn) exists on the host; if it doesn't
exist yet, libvirt creates it automatically. When the pool is stopped,
the vHBA is destroyed. But since the storage driver supports persistent
config, one can easily get the vHBA with the same "wwnn:wwpn" on the next
start (don't scream if your pool is transient).
That's not the end if you want the vHBA to be created automatically after
a system reboot: you will need to set the pool to "autostart":
# virsh pool-autostart poolvhba0
One might be curious why we don't support persistent config in the node
device driver and create persistent vHBAs there. One reason is that it
would duplicate what the storage pool does. Another reason (the important
one) is that we want to associate the libvirt storage pool/volume with the
domain (see the section on using the LUN for a guest below).
5) How to find out the LUN's path
If you have defined a "scsi" type pool based on the (v)HBA, it's simple
to look up which LUNs are attached to the (v)HBA with the virsh command
"vol-list", e.g.
# virsh vol-list poolvhba0 --details
Name        Path                                                            Type   Capacity   Allocation
---------------------------------------------------------------------------------------------------------
unit:0:2:0  /dev/disk/by-path/pci-0000:04:00.1-fc-0x203500a0b85ad1d7-lun-0  block  20.01 GiB  20.01 GiB
If you have not defined a "scsi" type pool based on the (v)HBA, you can
find the LUNs of the (v)HBA either with the virsh command
"nodedev-list --tree" or by iterating over sysfs manually.
To find the LUNs with the virsh command "nodedev-list" (irrelevant output
is omitted):
# virsh nodedev-list --tree
  +- pci_0000_00_0d_0
  |   |
  |   +- pci_0000_04_00_0
  |   |   |
  |   |   +- scsi_host4
  |   |
  |   +- pci_0000_04_00_1
  |       |
  |       +- scsi_host5
  |           |
  |           +- scsi_host7
  |           +- scsi_target5_0_0
  |           |   |
  |           |   +- scsi_5_0_0_0
  |           |
  |           +- scsi_target5_0_1
  |           |   |
  |           |   +- scsi_5_0_1_0
  |           |
  |           +- scsi_target5_0_2
  |           |   |
  |           |   +- scsi_5_0_2_0
  |           |       |
  |           |       +- block_sdb_3600a0b80005adb0b0000ab2d4cae9254
  |           |
  |           +- scsi_target5_0_3
  |               |
  |               +- scsi_5_0_3_0
"scsi_host5" is an HBA on my host, and it has a LUN named
"block_sdb_3600a0b80005adb0b0000ab2d4cae9254". Don't be confused by the
naming: it's the naming style libvirt uses, meaningful only to libvirt. It
indicates that the LUN has the short device path "/dev/sdb" and the ID
"3600a0b80005adb0b0000ab2d4cae9254":
# ls /dev/disk/by-id/ | grep 3600a0b80005adb0b0000ab2d4cae9254
scsi-3600a0b80005adb0b0000ab2d4cae9254
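For illustration, the name can be split like this. It is a sketch that assumes the "block_<device>_<ID>" pattern seen in the example above; since the name is meaningful only to libvirt, don't rely on it beyond quick inspection. The function name is made up.

```python
def split_block_nodedev_name(name):
    """Split a libvirt block node device name into (device path, ID).

    Hypothetical helper; assumes the "block_<device>_<ID>" pattern from
    the example above and raises ValueError for anything else.
    """
    parts = name.split("_", 2)
    if len(parts) != 3 or parts[0] != "block":
        raise ValueError("not a block_<device>_<ID> name: %r" % name)
    _, device, lun_id = parts
    return "/dev/" + device, lun_id


path, lun_id = split_block_nodedev_name(
    "block_sdb_3600a0b80005adb0b0000ab2d4cae9254")
print(path)    # prints "/dev/sdb"
print(lun_id)  # prints "3600a0b80005adb0b0000ab2d4cae9254"
```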
To manually find the LUNs of a (v)HBA:
First, iterate over all the directories under "/sys/bus/scsi/devices"
whose names begin with the SCSI host number of the (v)HBA. E.g. I will
look up the LUNs of the HBA with SCSI host number 5 on my host:
# ls /sys/bus/scsi/devices/5:* -d
/sys/bus/scsi/devices/5:0:0:0 /sys/bus/scsi/devices/5:0:1:0
/sys/bus/scsi/devices/5:0:2:0 /sys/bus/scsi/devices/5:0:3:0
# ls /sys/bus/scsi/devices/5\:0\:3\:0/block/sdc
It means scsi_host5 has a LUN attached at address "5:0:3:0" with device
name "sdc".
# ls /sys/bus/scsi/devices/5\:0\:1\:0/ | grep block
device_blocked
scsi_host5 doesn't have a LUN attached at address "5:0:1:0".
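The same sysfs walk can be sketched in Python as below. The function name luns_of_host is made up, and the only layout it relies on is the one shown above: an address directory has a "block" subdirectory when a LUN is attached.

```python
import os


def luns_of_host(host_number, sysfs_root="/sys/bus/scsi/devices"):
    """Map SCSI address -> block device names for one SCSI host.

    Hypothetical helper. Addresses without a "block" subdirectory (no LUN
    attached) are skipped.
    """
    result = {}
    prefix = "%d:" % host_number
    for entry in sorted(os.listdir(sysfs_root)):
        if not entry.startswith(prefix):
            continue  # another host, or a hostN/targetN entry
        block_dir = os.path.join(sysfs_root, entry, "block")
        if os.path.isdir(block_dir):
            result[entry] = sorted(os.listdir(block_dir))
    return result


if os.path.isdir("/sys/bus/scsi/devices"):
    # On a host like mine this would list address "5:0:3:0" with "sdc".
    print(luns_of_host(5))
```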
A device name like "sdc" is not stable. To find the stable path, find
the symbolic link that points to the device name. E.g.
# ls -l /dev/disk/by-path/
lrwxrwxrwx. 1 root root  9 Sep 10 22:28 pci-0000:00:07.0-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx. 1 root root 10 Sep 10 22:28 pci-0000:00:07.0-scsi-0:0:0:0-part1 -> ../../sda1
lrwxrwxrwx. 1 root root  9 Sep 10 22:28 pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0 -> ../../sdc
So "/dev/disk/by-path/pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0" is
the stable path of the LUN attached at address "5:0:3:0". Of course, you
can use a similar method to get the "by-id | by-uuid | by-label" stable
path.
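Looking up the symbolic link can be scripted too. A minimal sketch, with a made-up function name, that compares the resolved target of every link in a directory against the device node:

```python
import os


def stable_links(device_path, link_dir="/dev/disk/by-path"):
    """Return the links in link_dir that resolve to device_path.

    Hypothetical helper; pass "/dev/disk/by-id" etc. as link_dir to get
    the other kinds of stable path.
    """
    wanted = os.path.realpath(device_path)
    matches = []
    for name in sorted(os.listdir(link_dir)):
        link = os.path.join(link_dir, name)
        if os.path.islink(link) and os.path.realpath(link) == wanted:
            matches.append(link)
    return matches


if os.path.isdir("/dev/disk/by-path"):
    # "/dev/sdc" is the device name from the example above.
    print(stable_links("/dev/sdc"))
```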
6) Use the LUN for a guest
Since libvirt "1.0.5", a storage volume can be used as the disk source
through two new attributes ("pool" and "volume") of the disk "<source>"
element. E.g.
<disk type='volume' device='disk'>
  <driver name='qemu' type='raw'/>
  <source pool='poolvhba0' volume='unit:0:2:0'/>
  <target dev='hda' bus='ide'/>
</disk>
There are many advantages to doing so. Since the main purpose of this
document is "how to use", I will mention only two here to persuade you to
use it. First, you don't need to look up the LUN's path yourself. Second,
assume you want to migrate a domain that uses a LUN attached to a vHBA: do
you want to create the vHBA manually on the target host? With the pool,
you can simply define/start a pool with the same config on the target host.
So, if your libvirt is "1.0.5" or newer, we recommend defining a "scsi"
type pool based on the (v)HBA and using "pool/volume" names to use the LUN
as the disk source.
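As a sketch of how the "pool/volume" form could be produced from the names that "vol-list" prints, the following builds the same kind of <disk> element. The function name build_volume_disk_xml is made up, and the driver settings are simply copied from the example above.

```python
import xml.etree.ElementTree as ET


def build_volume_disk_xml(pool, volume, target_dev, bus, device="disk"):
    """Build a <disk type='volume'> element for a pool/volume pair.

    Hypothetical helper; the qemu/raw driver line mirrors the example.
    """
    disk = ET.Element("disk", type="volume", device=device)
    ET.SubElement(disk, "driver", name="qemu", type="raw")
    ET.SubElement(disk, "source", pool=pool, volume=volume)
    ET.SubElement(disk, "target", dev=target_dev, bus=bus)
    return ET.tostring(disk, encoding="unicode")


# Pool and volume names as printed by "virsh vol-list poolvhba0".
print(build_volume_disk_xml("poolvhba0", "unit:0:2:0", "hda", "ide"))
```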
You can either use the LUN as a qemu emulated disk, or pass it through to
the guest.
To use it as a qemu emulated disk, specify the "device" attribute as
"device='disk|cdrom|floppy'". E.g.
<disk type='volume' device='disk'>
  <driver name='qemu' type='raw'/>
  <source pool='blk-pool0' volume='blk-pool0-vol0'/>
  <target dev='hda' bus='ide'/>
</disk>
Or (using the LUN's path directly)
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/disk/by-path/pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0'/>
  <target dev='sda' bus='scsi'/>
</disk>
To pass the LUN through, specify the "device" attribute as
"device='lun'", e.g.
<disk type='block' device='lun'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/disk/by-path/pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0'/>
  <target dev='sda' bus='scsi'/>
</disk>
7) Future work
* NPIV based SCSI host passthrough
That's what users ask for: how to pass a (v)HBA through to a guest?
* Expose vendor information, LUN's path, state of (v)HBA in its XML
* Maybe a virsh command to simplify vHBA creation with options
[1]
http://www.linuxtopia.org/online_books/rhel6/rhel_6_virtualization/rhel_6...
Regards,
Osier