On 12/2/19 9:26 AM, Michal Privoznik wrote:
There is a class of PCI devices that act like disks: NVMe.
Therefore, they are both PCI devices and disks. While we already
have <hostdev/> (and can assign an NVMe device to a domain
successfully), we don't have a disk representation. There are three
problems with PCI assignment in the case of an NVMe device:
1) domains with <hostdev/> can't be migrated
2) an NVMe device is assigned whole; there's no way to assign only a
namespace
3) because hypervisors see <hostdev/>, they don't put their block layer
on top of it, so users don't get all the fancy features like
snapshots
NVMe namespaces are a way of splitting the non-volatile memory of one
NVMe controller into smaller units, effectively creating smaller NVMe
devices (which can then be partitioned, LVMed, etc.)
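For comparison, this is roughly what today's whole-controller
assignment of the same device (PCI address 0000:01:00.0, as used in
the examples here) looks like with plain <hostdev/>. The guest sees
the raw PCI controller with all of its namespaces, and no block layer
sits in between:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <!-- the whole NVMe controller is handed to the guest -->
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
</hostdev>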
Because of all of this, the following XML was chosen to model an
NVMe device:
<disk type='nvme' device='disk'>
<driver name='qemu' type='raw'/>
<source type='pci' managed='yes' namespace='1'>
<address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</source>
<target dev='vda' bus='virtio'/>
</disk>
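As a sketch of the other value of the managed attribute: with
managed='no', libvirt would not detach the controller itself but would
expect the administrator to have done so before the domain starts (for
instance with "virsh nodedev-detach pci_0000_01_00_0"). Apart from that
attribute, the disk definition would stay the same:

<disk type='nvme' device='disk'>
  <driver name='qemu' type='raw'/>
  <!-- managed='no': controller must already be detached from the host driver -->
  <source type='pci' managed='no' namespace='1'>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>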
Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
docs/formatdomain.html.in | 57 +++++++++++++++++++++++--
docs/schemas/domaincommon.rng | 32 ++++++++++++++
tests/qemuxml2argvdata/disk-nvme.xml | 63 ++++++++++++++++++++++++++++
3 files changed, 149 insertions(+), 3 deletions(-)
create mode 100644 tests/qemuxml2argvdata/disk-nvme.xml
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 6df4a8b26e..fe871d933f 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -2944,6 +2944,13 @@
</backingStore>
<target dev='vdd' bus='virtio'/>
</disk>
+ <disk type='nvme' device='disk'>
+ <driver name='qemu' type='raw'/>
+ <source type='pci' managed='yes' namespace='1'>
+ <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
+ </source>
+ <target dev='vde' bus='virtio'/>
+ </disk>
</devices>
...</pre>
@@ -2957,7 +2964,8 @@
Valid values are "file", "block",
"dir" (<span class="since">since 0.7.5</span>),
"network" (<span class="since">since 0.8.7</span>), or
- "volume" (<span class="since">since 1.0.5</span>)
+ "volume" (<span class="since">since 1.0.5</span>), or
+ "nvme" (<span class="since">since 5.6.0</span>)
6.0.0 or whatever version this will land in
and refer to the underlying source for the disk.
<span class="since">Since 0.0.3</span>
</dd>
@@ -3140,6 +3148,43 @@
<span class="since">Since 1.0.5</span>
</p>
</dd>
+ <dt><code>nvme</code></dt>
+ <dd>
+ To specify the source of an NVMe disk, the <code>source</code>
+ element has the following attributes:
+ <dl>
+ <dt><code>type</code></dt>
+ <dd>The type of address specified in the <code>address</code>
+ sub-element. Currently, only the <code>pci</code> value is
+ accepted.
+ </dd>
+
+ <dt><code>managed</code></dt>
+ <dd>This attribute instructs libvirt to detach the NVMe
+ controller automatically on domain startup (<code>yes</code>)
+ or to expect the controller to be detached by the system
+ administrator (<code>no</code>).
+ </dd>
+
+ <dt><code>namespace</code></dt>
+ <dd>The namespace ID which should be assigned to the domain.
+ According to the NVMe standard, namespace numbers start at 1.
+ </dd>
+ </dl>
+
+ The difference between <code><disk type='nvme'></code>
+ and <code><hostdev/></code> is that the latter is plain
+ host device assignment with all its limitations (e.g. no live
+ migration), while the former makes the hypervisor run the NVMe
+ disk through its block layer, thus enabling all
+ features provided by that layer (e.g. snapshots, domain
+ migration, etc.). Moreover, since the NVMe disk is unbound
+ from its PCI driver, the host kernel storage stack is not
+ involved (compared to passing, say, <code>/dev/nvme0n1</code> via
+ <code><disk type='block'></code>), and therefore lower
+ latencies can be achieved.
+ </dd>
</dl>
With "file", "block", and "volume", one or more optional
sub-elements <code>seclabel</code>, <a href="#seclabel">described
@@ -3302,11 +3347,17 @@
initiator IQN needed to access the source via mandatory
attribute <code>name</code>.
</dd>
+ <dt><code>address</code></dt>
+ <dd>For a disk of type <code>nvme</code>, this element
+ specifies the PCI address of the host NVMe
+ controller.
+ <span class="since">Since 5.6.0</span>
Same
+ </dd>
</dl>
<p>
- For a "file" or "volume" disk type which represents a cdrom or floppy
- (the <code>device</code> attribute), it is possible to define
+ For a "file" or "volume" disk type which represents a cdrom or
+ floppy (the <code>device</code> attribute), it is possible to define
Stray change?
Also, in the test XML you need to "s/qemu-system-i686/qemu-system-i386/"
or you'll hit a weird error. And VIR_TEST_REGENERATE_OUTPUT is also
busted; see my patches elsewhere on this list.
Reviewed-by: Cole Robinson <crobinso(a)redhat.com>
- Cole