On 7/16/19 2:35 PM, Peter Krempa wrote:
On Thu, Jul 11, 2019 at 17:53:57 +0200, Michal Privoznik wrote:
> There is this class of PCI devices that act like disks: NVMe.
> Therefore, they are both PCI devices and disks. While we already
> have <hostdev/> (and can assign a NVMe device to a domain
> successfully) we don't have disk representation. There are three
> problems with PCI assignment in case of a NVMe device:
>
> 1) domains with <hostdev/> can't be migrated
>
> 2) NVMe device is assigned whole, there's no way to assign only a
> namespace
>
> 3) Because hypervisors see <hostdev/> they don't put block layer
> on top of it - users don't get all the fancy features like
> snapshots
>
> NVMe namespaces are way of splitting one continuous NVDIMM memory
s/NVDIMM memory/NVMe device/
> into smaller ones, effectively creating smaller NVMe-s (which can
> then be partitioned, LVMed, etc.)
>
> Because of all of this the following XML was chosen to model a
> NVMe device:
>
> <disk type='nvme' device='disk'>
> <driver name='qemu' type='raw'/>
> <source type='pci' managed='yes' namespace='1'>
> <address domain='0x0000' bus='0x01' slot='0x00'
function='0x0'/>
> </source>
> <target dev='vda' bus='virtio'/>
> </disk>
>
> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
> ---
> docs/formatdomain.html.in | 45 +++++++++++++++++++++--
> docs/schemas/domaincommon.rng | 32 ++++++++++++++++
> tests/qemuxml2argvdata/disk-nvme.xml | 55 ++++++++++++++++++++++++++++
> 3 files changed, 129 insertions(+), 3 deletions(-)
> create mode 100644 tests/qemuxml2argvdata/disk-nvme.xml
>
> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index a7a6ec32a5..545578076d 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -2922,6 +2922,13 @@
> </backingStore>
> <target dev='vdd' bus='virtio'/>
> </disk>
> + <disk type='nvme' device='disk'>
> + <driver name='qemu' type='raw'/>
> + <source type='pci' managed='yes'
namespace='1'>
The 'type' filed may get confusing a bit as it is supposed to be stored
in virStorageSource->nvme->type, while virStorageSource has it's own
type.
Well, I'm trygin to mix two things here: <disk/> and <hostdev/> (well,
PCI devices). There are other types of NVMe than PCIe attached ones.
They are called NVMe-oF (NVMe over Fabrics), so I've figured that we
want to specifically tell that NVMe we are dealing with here is the PCIe
one. But I guess it can also be added later, if we ever decide to
support NVMe-oF (disks without @type would have 'pci' autofilled in a
post parse callback).
Also I'm pondering whether managed='yes' belongs as a top-level
attribute under 'source' as it's possibly specific to the 'pci'
setting
of type.
Yeah, designing future proof XML is hard. I'm open to any suggestion.
> + <address domain='0x0000' bus='0x01' slot='0x00'
function='0x0'/>
> + </source>
> + <target dev='vde' bus='virtio'/>
> + </disk>
> </devices>
> ...</pre>
>
[...]
> @@ -3118,6 +3126,31 @@
> <span class="since">Since 1.0.5</span>
> </p>
> </dd>
> + <dt><code>nvme</code></dt>
> + <dd>
> + To specify disk source for NVMe disk the
<code>source</code>
> + element has the following attributes:
> + <dl>
> + <dt><code>type</code></dt>
> + <dd>The type of address specified in
<code>address</code>
> + sub-element. Currently, only <code>pci</code> value is
> + accepted.
> + </dd>
> +
> + <dt><code>managed</code></dt>
> + <dd>This attribute instructs libvirt to detach NVMe
> + controller automatically on domain startup
(<code>yes</code>)
> + or expect the controller to be detached by system
> + administrator (<code>no</code>).
> + </dd>
> +
> + <dt><code>namespace</code></dt>
> + <dd>The namespace ID which should be assigned to the domain.
> + According to NVMe standard, namespace numbers start from 1,
> + including.
> + </dd>
> + </dl>
> + </dd>
> </dl>
> With "file", "block", and "volume", one or
more optional
> sub-elements <code>seclabel</code>, <a
href="#seclabel">described
> @@ -3280,11 +3313,17 @@
> initiator IQN needed to access the source via mandatory
> attribute <code>name</code>.
> </dd>
> + <dt><code>address</code></dt>
> + <dd>For disk of type <code>nvme</code> this element
> + specifies the PCI address of the host NVMe
> + controller.
> + <span class="since">Since 5.5.0</span>
> + </dd>
> </dl>
>
> <p>
> - For a "file" or "volume" disk type which represents a
cdrom or floppy
> - (the <code>device</code> attribute), it is possible to define
> + For a "file", "volume" or "nvme" disk type
which represents a cdrom or
> + floppy (the <code>device</code> attribute), it is possible to
define
You specifically forbid startup policy in the next commit, so what's the
point of documenting it here?
Ah, good point. Back in the day I still wanted to make startupPolicy
work, but then decided not to.
> policy what to do with the disk if the source file is not accessible.
> (NB, <code>startupPolicy</code> is not valid for
"volume" disk unless
> the specified storage volume is of "file" type). This is done by
the
> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
> index 31db599ab9..f367e8f6fd 100644
> --- a/docs/schemas/domaincommon.rng
> +++ b/docs/schemas/domaincommon.rng
[...]
> @@ -1918,6 +1919,37 @@
> </optional>
> </define>
>
> + <define name="diskSourceNvme">
> + <attribute name="type">
> + <value>nvme</value>
> + </attribute>
> + <optional>
> + <element name="source">
> + <attribute name="type">
> + <value>pci</value>
> + </attribute>
> + <attribute name="namespace">
> + <ref name="uint32"/>
> + </attribute>
> + <optional>
> + <attribute name="managed">
> + <ref name="virYesNo"/>
> + </attribute>
> + </optional>
> + <element name="address">
> + <ref name="pciaddress"/>
> + </element>
> + <ref name="diskSourceCommon"/>
> + <optional>
> + <ref name="storageStartupPolicy"/>
> + </optional>
> + <optional>
> + <ref name="encryption"/>
> + </optional>
> + </element>
> + </optional>
> + </define>
> +
> <define name="diskTarget">
> <data type="string">
> <param
name="pattern">(ioemu:)?(fd|hd|sd|vd|xvd|ubd)[a-zA-Z0-9_]+</param>
> diff --git a/tests/qemuxml2argvdata/disk-nvme.xml
b/tests/qemuxml2argvdata/disk-nvme.xml
> new file mode 100644
> index 0000000000..0b3dbad4eb
> --- /dev/null
> +++ b/tests/qemuxml2argvdata/disk-nvme.xml
> @@ -0,0 +1,55 @@
> +<domain type='qemu'>
> + <name>QEMUGuest1</name>
> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
> + <memory unit='KiB'>219136</memory>
> + <currentMemory unit='KiB'>219136</currentMemory>
> + <vcpu placement='static'>1</vcpu>
> + <os>
> + <type arch='i686' machine='pc'>hvm</type>
> + <boot dev='hd'/>
> + </os>
> + <clock offset='utc'/>
> + <on_poweroff>destroy</on_poweroff>
> + <on_reboot>restart</on_reboot>
> + <on_crash>destroy</on_crash>
> + <devices>
> + <emulator>/usr/bin/qemu-system-i686</emulator>
> + <disk type='nvme' device='disk'>
> + <driver name='qemu' type='raw'/>
> + <source type='pci' managed='yes' namespace='1'>
> + <address domain='0x0000' bus='0x01' slot='0x00'
function='0x0'/>
> + </source>
> + <target dev='vda' bus='virtio'/>
> + </disk>
> + <disk type='nvme' device='disk'>
> + <driver name='qemu' type='raw'/>
> + <source type='pci' managed='yes' namespace='2'>
> + <address domain='0x0000' bus='0x01' slot='0x00'
function='0x0'/>
Make at heast one of them use qcow2 as format.
Okay.
> + </source>
> + <target dev='vdb' bus='virtio'/>
> + </disk>
> + <disk type='nvme' device='disk'>
> + <driver name='qemu' type='raw'/>
> + <source type='pci' managed='no' namespace='1'>
> + <address domain='0x0000' bus='0x02' slot='0x00'
function='0x0'/>
> + </source>
> + <target dev='vdc' bus='virtio'/>
> + </disk>
> + <disk type='nvme' device='disk'>
> + <driver name='qemu' type='raw'/>
> + <source type='pci' managed='no' namespace='2'>
> + <address domain='0x0001' bus='0x02' slot='0x00'
function='0x0'/>
> + <encryption format='luks'>
> + <secret type='passphrase'
uuid='0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f'/>
> + </encryption>
> + </source>
> + <target dev='vdd' bus='virtio'/>
> + </disk>
> + <controller type='usb' index='0'/>
> + <controller type='pci' index='0'
model='pci-root'/>
> + <controller type='scsi' index='0'
model='virtio-scsi'/>
> + <input type='mouse' bus='ps2'/>
> + <input type='keyboard' bus='ps2'/>
> + <memballoon model='none'/>
> + </devices>
> +</domain>
I'm also missing any form of documentation describing the caveats (e.g.
users should not pass in a NVMe disk the host is using)
Since I'm using virHostdevManager to track used NVMe-s, any attempt to
start a domain with a taken NVMe disk will fail. So users will see that
immediately :-) But okay, I will enhance the docs.
Michal