On Thu, Jul 11, 2019 at 17:53:57 +0200, Michal Privoznik wrote:
There is this class of PCI devices that act like disks: NVMe.
Therefore, they are both PCI devices and disks. While we already
have <hostdev/> (and can assign a NVMe device to a domain
successfully) we don't have disk representation. There are three
problems with PCI assignment in case of a NVMe device:
1) domains with <hostdev/> can't be migrated
2) NVMe device is assigned whole, there's no way to assign only a
namespace
3) Because hypervisors see <hostdev/> they don't put block layer
on top of it - users don't get all the fancy features like
snapshots
NVMe namespaces are way of splitting one continuous NVDIMM memory
s/NVDIMM memory/NVMe device/
into smaller ones, effectively creating smaller NVMe-s (which can
then be partitioned, LVMed, etc.)
Because of all of this the following XML was chosen to model a
NVMe device:
<disk type='nvme' device='disk'>
<driver name='qemu' type='raw'/>
<source type='pci' managed='yes' namespace='1'>
<address domain='0x0000' bus='0x01' slot='0x00'
function='0x0'/>
</source>
<target dev='vda' bus='virtio'/>
</disk>
Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
docs/formatdomain.html.in | 45 +++++++++++++++++++++--
docs/schemas/domaincommon.rng | 32 ++++++++++++++++
tests/qemuxml2argvdata/disk-nvme.xml | 55 ++++++++++++++++++++++++++++
3 files changed, 129 insertions(+), 3 deletions(-)
create mode 100644 tests/qemuxml2argvdata/disk-nvme.xml
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index a7a6ec32a5..545578076d 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -2922,6 +2922,13 @@
</backingStore>
<target dev='vdd' bus='virtio'/>
</disk>
+ <disk type='nvme' device='disk'>
+ <driver name='qemu' type='raw'/>
+ <source type='pci' managed='yes'
namespace='1'>
The 'type' filed may get confusing a bit as it is supposed to be stored
in virStorageSource->nvme->type, while virStorageSource has it's own
type.
Also I'm pondering whether managed='yes' belongs as a top-level
attribute under 'source' as it's possibly specific to the 'pci'
setting
of type.
+ <address domain='0x0000' bus='0x01'
slot='0x00' function='0x0'/>
+ </source>
+ <target dev='vde' bus='virtio'/>
+ </disk>
</devices>
...</pre>
[...]
@@ -3118,6 +3126,31 @@
<span class="since">Since 1.0.5</span>
</p>
</dd>
+ <dt><code>nvme</code></dt>
+ <dd>
+ To specify disk source for NVMe disk the <code>source</code>
+ element has the following attributes:
+ <dl>
+ <dt><code>type</code></dt>
+ <dd>The type of address specified in
<code>address</code>
+ sub-element. Currently, only <code>pci</code> value is
+ accepted.
+ </dd>
+
+ <dt><code>managed</code></dt>
+ <dd>This attribute instructs libvirt to detach NVMe
+ controller automatically on domain startup
(<code>yes</code>)
+ or expect the controller to be detached by system
+ administrator (<code>no</code>).
+ </dd>
+
+ <dt><code>namespace</code></dt>
+ <dd>The namespace ID which should be assigned to the domain.
+ According to NVMe standard, namespace numbers start from 1,
+ including.
+ </dd>
+ </dl>
+ </dd>
</dl>
With "file", "block", and "volume", one or more
optional
sub-elements <code>seclabel</code>, <a
href="#seclabel">described
@@ -3280,11 +3313,17 @@
initiator IQN needed to access the source via mandatory
attribute <code>name</code>.
</dd>
+ <dt><code>address</code></dt>
+ <dd>For disk of type <code>nvme</code> this element
+ specifies the PCI address of the host NVMe
+ controller.
+ <span class="since">Since 5.5.0</span>
+ </dd>
</dl>
<p>
- For a "file" or "volume" disk type which represents a cdrom
or floppy
- (the <code>device</code> attribute), it is possible to define
+ For a "file", "volume" or "nvme" disk type which
represents a cdrom or
+ floppy (the <code>device</code> attribute), it is possible to define
You specifically forbid startup policy in the next commit, so what's the
point of documenting it here?
policy what to do with the disk if the source file is not
accessible.
(NB, <code>startupPolicy</code> is not valid for "volume"
disk unless
the specified storage volume is of "file" type). This is done by the
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 31db599ab9..f367e8f6fd 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
[...]
@@ -1918,6 +1919,37 @@
</optional>
</define>
+ <define name="diskSourceNvme">
+ <attribute name="type">
+ <value>nvme</value>
+ </attribute>
+ <optional>
+ <element name="source">
+ <attribute name="type">
+ <value>pci</value>
+ </attribute>
+ <attribute name="namespace">
+ <ref name="uint32"/>
+ </attribute>
+ <optional>
+ <attribute name="managed">
+ <ref name="virYesNo"/>
+ </attribute>
+ </optional>
+ <element name="address">
+ <ref name="pciaddress"/>
+ </element>
+ <ref name="diskSourceCommon"/>
+ <optional>
+ <ref name="storageStartupPolicy"/>
+ </optional>
+ <optional>
+ <ref name="encryption"/>
+ </optional>
+ </element>
+ </optional>
+ </define>
+
<define name="diskTarget">
<data type="string">
<param
name="pattern">(ioemu:)?(fd|hd|sd|vd|xvd|ubd)[a-zA-Z0-9_]+</param>
diff --git a/tests/qemuxml2argvdata/disk-nvme.xml b/tests/qemuxml2argvdata/disk-nvme.xml
new file mode 100644
index 0000000000..0b3dbad4eb
--- /dev/null
+++ b/tests/qemuxml2argvdata/disk-nvme.xml
@@ -0,0 +1,55 @@
+<domain type='qemu'>
+ <name>QEMUGuest1</name>
+ <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+ <memory unit='KiB'>219136</memory>
+ <currentMemory unit='KiB'>219136</currentMemory>
+ <vcpu placement='static'>1</vcpu>
+ <os>
+ <type arch='i686' machine='pc'>hvm</type>
+ <boot dev='hd'/>
+ </os>
+ <clock offset='utc'/>
+ <on_poweroff>destroy</on_poweroff>
+ <on_reboot>restart</on_reboot>
+ <on_crash>destroy</on_crash>
+ <devices>
+ <emulator>/usr/bin/qemu-system-i686</emulator>
+ <disk type='nvme' device='disk'>
+ <driver name='qemu' type='raw'/>
+ <source type='pci' managed='yes' namespace='1'>
+ <address domain='0x0000' bus='0x01' slot='0x00'
function='0x0'/>
+ </source>
+ <target dev='vda' bus='virtio'/>
+ </disk>
+ <disk type='nvme' device='disk'>
+ <driver name='qemu' type='raw'/>
+ <source type='pci' managed='yes' namespace='2'>
+ <address domain='0x0000' bus='0x01' slot='0x00'
function='0x0'/>
Make at heast one of them use qcow2 as format.
+ </source>
+ <target dev='vdb' bus='virtio'/>
+ </disk>
+ <disk type='nvme' device='disk'>
+ <driver name='qemu' type='raw'/>
+ <source type='pci' managed='no' namespace='1'>
+ <address domain='0x0000' bus='0x02' slot='0x00'
function='0x0'/>
+ </source>
+ <target dev='vdc' bus='virtio'/>
+ </disk>
+ <disk type='nvme' device='disk'>
+ <driver name='qemu' type='raw'/>
+ <source type='pci' managed='no' namespace='2'>
+ <address domain='0x0001' bus='0x02' slot='0x00'
function='0x0'/>
+ <encryption format='luks'>
+ <secret type='passphrase'
uuid='0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f'/>
+ </encryption>
+ </source>
+ <target dev='vdd' bus='virtio'/>
+ </disk>
+ <controller type='usb' index='0'/>
+ <controller type='pci' index='0' model='pci-root'/>
+ <controller type='scsi' index='0'
model='virtio-scsi'/>
+ <input type='mouse' bus='ps2'/>
+ <input type='keyboard' bus='ps2'/>
+ <memballoon model='none'/>
+ </devices>
+</domain>
I'm also missing any form of documentation describing the caveats (e.g.
users should not pass in a NVMe disk the host is using) or any
advantages/reasons for using this.