[libvirt] virtio-scsi support proposal, v2

Here is a revised version of the virtio-scsi proposal. There's actually not too much left intact from v1. :) The main simplification is in how SCSI hosts can be addressed in a stable manner.

SCSI controller models
======================

Existing controller models are "auto", "buslogic", "lsilogic", "lsias1068", and "vmpvscsi". A new controller model, "virtio-scsi", is added. The model "lsilogic" is mapped to the existing "lsi" device in QEMU.

When PPC64 support is added, another controller model, "spapr-vscsi", will be added.

Stable addressing for SCSI devices
==================================

The existing <address type='drive' ...> element will be extended as follows:

  <address type='drive' controller='...' bus='...' target='...' unit='...'/>

where controller selects the qdev parent device, while bus/target/unit are passed as qdev properties (the QEMU names are channel, scsi-id and lun, respectively).

Libvirt should check for the QEMU "scsi-disk.channel" property. If it is unavailable, QEMU only supports channel=lun=0 and 0<=target<=7.

LUN passthrough: block devices
==============================

A SCSI block device from the host can be attached to a domain in two ways: as an emulated LUN, with SCSI commands implemented within QEMU, or by passing SCSI commands down to the block device. The former is handled by the existing <disk type='file'>, <disk type='block'> and <disk type='network'> XML syntax. The latter is not yet supported.

On the QEMU side, LUN passthrough is implemented by either the scsi-generic or the scsi-block device. scsi-generic requires a /dev/sg device name and can be applied to any device. scsi-block is only available in QEMU 1.0 or newer, requires a block device, can be applied only to block devices (sd/sr), and has better performance.

To implement LUN passthrough for block devices, libvirt will add a new 'lun' value for the <disk device='...'> attribute. When device='lun' is passed, the device attribute is ignored.
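The stable addressing and the scsi-block/scsi-generic choice described above could be combined roughly as in the following Python sketch of the -device value for a passed-through LUN. Only the property names (channel, scsi-id, lun), the QEMU 1.0 cut-off and the fallback limits come from the proposal; the function and class names, the "scsiN.0" bus naming, the drive id and the choice to omit channel/lun on older QEMU are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DriveAddress:
    """Mirrors <address type='drive' controller='...' bus='...' target='...' unit='...'/>."""
    controller: int  # selects the qdev parent (the SCSI controller)
    bus: int         # QEMU property "channel"
    target: int      # QEMU property "scsi-id"
    unit: int        # QEMU property "lun"


def lun_device_arg(addr: DriveAddress, drive_id: str, is_block_dev: bool,
                   qemu_version: Tuple[int, int], has_channel: bool) -> str:
    """Return a hypothetical -device value for a passed-through LUN.

    has_channel: whether QEMU exposes the scsi-disk.channel property.
    """
    if not has_channel:
        # Without scsi-disk.channel only channel=lun=0 and 0 <= target <= 7 work.
        if addr.bus != 0 or addr.unit != 0 or not 0 <= addr.target <= 7:
            raise ValueError("address needs the scsi-disk.channel property")

    # scsi-block: QEMU >= 1.0 and a block device (sd/sr) only.
    # scsi-generic: any device, but its -drive must name a /dev/sg node.
    if is_block_dev and qemu_version >= (1, 0):
        model = "scsi-block"
    else:
        model = "scsi-generic"

    # Assumption: controller N has qdev id "scsiN", so its SCSI bus is "scsiN.0".
    props = [model, f"bus=scsi{addr.controller}.0", f"drive={drive_id}"]
    if has_channel:
        props += [f"channel={addr.bus}", f"scsi-id={addr.target}",
                  f"lun={addr.unit}"]
    else:
        # Assumption: channel and lun are left at their implicit value of 0.
        props.append(f"scsi-id={addr.target}")
    return ",".join(props)


addr = DriveAddress(controller=0, bus=0, target=1, unit=2)
print(lun_device_arg(addr, "drive-lun0", True, (1, 0), True))
# scsi-block,bus=scsi0.0,drive=drive-lun0,channel=0,scsi-id=1,lun=2
```

In practice the version and has_channel inputs would presumably come from probing QEMU's capabilities rather than being passed in by hand.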
An example <disk device='lun'> element:

  <disk type='block' device='lun'>
    <driver name='qemu' type='raw'/>
    <source dev='/dev/sda'/>
    <target dev='sda' bus='scsi'/>
    <address type='drive' controller='...' bus='...' target='...' unit='...'/>
  </disk>

Also, virtio-blk handling will be enhanced to disable SG_IO passthrough when <disk device='disk'> is used, and to enable it only when <disk device='lun'> is used.

(I am not sure whether the 'lun' value should be for the type or the device attribute. Laine has a patch that implements it for virtio disks using "type".)

This syntax makes it clear which device is passed through, and at the same time it makes it very easy to switch a disk between emulated and passthrough modes. Also, stable addressing for the source device is provided by /dev/disk/by-id and /dev/disk/by-path.

Stable SCSI host addressing
===========================

SCSI host numbers in Linux are not stable. An alternative, stable addressing scheme is required to pass a whole host or target to a guest.

One place in which this could be supported is the SCSI volume pool syntax:

  <pool type='scsi'>
    <name>virtimages</name>
    <source>
      <adapter name='host0'/>
    </source>
    <target>
      <path>/dev/disk/by-id</path>
    </target>
  </pool>

libvirt will deprecate the above form of the adapter element and provide the following forms:

  <adapter name='scsi_host0'/>
  <adapter parent='pci_0000_00_1f_2' unique_id='1'/>

The existing form changes from host0 to scsi_host0 for consistency with the naming used in nodedev. The new parent/unique_id addressing uses a parent PCI device and a unique ID that Linux provides in sysfs. To determine the SCSI host number, libvirt would scan all files matched by the glob pattern /sys/bus/pci/devices/0000:00:1f.2/*/scsi_host/*/unique_id, looking for the one that contains "1".

The unique_id can be omitted. In this case, the pool will refer to the host with the smallest unique_id under the given device.
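The parent/unique_id lookup just described might look like this as a minimal Python sketch. The glob pattern and the smallest-unique_id fallback are taken from the text above; the function names and the sysfs_root parameter (added so the scan can run against a fake tree) are hypothetical.

```python
import glob
import os
from typing import Optional


def pci_sysfs_dir(parent: str, sysfs_root: str = "/sys") -> str:
    """Map a nodedev name such as 'pci_0000_00_1f_2' to its sysfs directory."""
    _, domain, bus, slot, func = parent.split("_")
    return os.path.join(sysfs_root, "bus", "pci", "devices",
                        f"{domain}:{bus}:{slot}.{func}")


def find_scsi_host(parent: str, unique_id: Optional[int] = None,
                   sysfs_root: str = "/sys") -> str:
    """Resolve <adapter parent='...' unique_id='...'/> to a name like 'scsi_host3'."""
    pattern = os.path.join(pci_sysfs_dir(parent, sysfs_root),
                           "*", "scsi_host", "*", "unique_id")
    hosts = []
    for path in glob.glob(pattern):
        with open(path) as f:
            uid = int(f.read().strip())
        # .../scsi_host/host3/unique_id -> "host3"
        hosts.append((uid, os.path.basename(os.path.dirname(path))))
    if not hosts:
        raise LookupError(f"no SCSI host found under {parent}")
    if unique_id is None:
        # unique_id omitted: pick the host with the smallest unique_id.
        return "scsi_" + min(hosts)[1]
    for uid, host in hosts:
        if uid == unique_id:
            return "scsi_" + host
    raise LookupError(f"no SCSI host with unique_id={unique_id} under {parent}")
```

Because the pattern has a single * between the PCI device and scsi_host, this sketch only finds hosts that sit directly below the PCI device directory.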
Furthermore, a SCSI pool can be restricted to one target using an additional element:

  <source>
    <adapter name='scsi_host0'/>
    <address type='scsi' bus='0' target='0'/>
  </source>

(bus defaults to 0; target is mandatory).

Generic passthrough
===================

Generic device passthrough at the LUN, target or host level builds on the extensions to SCSI addressing from the previous section.

Passing a single LUN extends the <hostdev> tag as follows:

  <hostdev type='scsi'>
    <source>
      <adapter name='scsi_host0'/>
      <address type='scsi' bus='0' target='0' unit='0'/>
    </source>
    <target>
      <address type='scsi' controller='...' bus='...' target='...' unit='...'/>
    </target>
  </hostdev>

This will map to a -drive QEMU option referring to the SCSI generic (/dev/sg) device, and a "-device scsi-generic" option referring to the drive. libvirt can determine the /dev/sg file to use by reading the directory /sys/bus/scsi/devices/target*/*/scsi_generic. These devices might also be shown in the nodedev tree, similar to block devices.

Whenever a domain should receive all devices belonging to a SCSI host, a similar <source> item should be included within the <controller type='scsi'> element:

  <controller type='scsi' model='virtio-scsi'>
    <source>
      <adapter name='scsi_host0'/>
    </source>
  </controller>

In this case, libvirt should use scsi-block rather than scsi-generic for block devices.

NPIV-based SCSI host passthrough
================================

In NPIV, a virtual HBA is created using "virsh nodedev-create" and passed to the guest. Passing through a whole SCSI host is quite common when using NPIV. As a result, it is desirable to be able to address virtual HBAs easily, both in SCSI storage pools and in <controller type='scsi'> elements.

Here are two proposals for how to refer to NPIV adapters:

1) Add persistent nodedevs via the commands nodedev-define, nodedev-undefine and nodedev-start. Persistent nodedevs have a name, which can then be used simply as <adapter name='NAME'>.
2) Virtual adapters do have a stable address, namely their WWN. This can be used in a third <adapter> syntax:

  <source>
    <adapter type='fc_host' wwpn='...' wwnn='...'/>
  </source>

On 2011-12-23 16:36, Paolo Bonzini wrote:
> (I am not sure whether the 'lun' value should be for the type or device attribute. Laine has a patch to implement it for virtio disks which uses "type").
IMHO "device=lun" is the right way to go here, since we want the device to be exposed to the guest as a LUN rather than as a normal disk. But for Laine's patch, it also seems right to use "type=lun", as it tries to disable/enable SG_IO for a normal disk?

On 12/23/2011 12:57 PM, Osier Yang wrote:
> IMHO "device=lun" is the right way to go here, since we want the device to be exposed to the guest as a LUN rather than as a normal disk. But for Laine's patch, it also seems right to use "type=lun", as it tries to disable/enable SG_IO for a normal disk?
That's a guest-visible property... it's a bit confusing, but you're right that device="lun" is preferable here. SG_IO on virtio-blk is in general not the cleanest thing, so it's somewhat expected that the mapping to XML is not perfect.

Paolo

On Fri, Dec 23, 2011 at 09:36:23AM +0100, Paolo Bonzini wrote:
> Here is a revised version of the virtio-scsi proposal. There's actually not too much left intact from v1. :)
The devil is in the details, but I think I broadly agree with everything suggested in the proposal.

> (I am not sure whether the 'lun' value should be for the type or device attribute. Laine has a patch to implement it for virtio disks which uses "type").
Consider that today we have type=block, type=file and type=network. Any of these three can be fronted by a virtio-blk device in the guest, which allows SG_IO. With type='file', SG_IO is trivially blocked. We initially focused on type=block, presuming that type=network doesn't support SG_IO either. Thus we were free to suggest a new type=lun to replace type=block without introducing any ambiguity. In retrospect, I don't think this presumption is valid: it is conceivable that type=network could support SG_IO when pointed at QEMU's userspace iSCSI block driver. Thus I think we should rather use device=lun, as suggested in this proposal.
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (3)
- Daniel P. Berrange
- Osier Yang
- Paolo Bonzini