On Thu, Feb 18, 2021 at 14:31:08 +0100, Michal Privoznik wrote:
This commit adds new memorydevices.rst page which should serve
all models of memory devices. Yet, I'm documenting virtio-mem
quirks only.
Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
docs/kbase/index.rst | 4 +
docs/kbase/memorydevices.rst | 160 +++++++++++++++++++++++++++++++++++
docs/kbase/meson.build | 1 +
3 files changed, 165 insertions(+)
create mode 100644 docs/kbase/memorydevices.rst
diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst
index 532804fe05..45450bf33b 100644
--- a/docs/kbase/index.rst
+++ b/docs/kbase/index.rst
@@ -46,6 +46,10 @@ Usage
`PCI topology <../pci-addresses.html>`__
Addressing schemes for PCI devices
+`Memory devices <memorydevices.html>`__
+ Memory devices and their use
+
+
Internals / Debugging
---------------------
diff --git a/docs/kbase/memorydevices.rst b/docs/kbase/memorydevices.rst
new file mode 100644
index 0000000000..23adf54e16
--- /dev/null
+++ b/docs/kbase/memorydevices.rst
@@ -0,0 +1,160 @@
+==============
+Memory devices
+==============
+
+.. contents::
+
+Basics
+======
+
+Memory devices can be divided into two families: DIMMs and NVDIMMs. The former
+is typical RAM memory: it's volatile and thus its contents doesn't survive
+reboots nor guest shut downs and power ons. The latter retains its contents
+across reboots or power outages.
+
+In Libvirt, there are two models for DIMMs:
+
+* ``dimm`` model:
+
+ ::
+
+ <memory model='dimm'>
+ <target>
+ <size unit='KiB'>523264</size>
+ <node>0</node>
+ </target>
+ <address type='dimm' slot='0'/>
+ </memory>
+
+* ``virtio-mem`` model:
+
+ ::
+
+ <memory model='virtio-mem'>
+ <target>
+ <size unit='KiB'>1048576</size>
+ <node>0</node>
+ <block unit='KiB'>2048</block>
+ <requested unit='KiB'>524288</requested>
+ </target>
+ <address type='pci' domain='0x0000' bus='0x00'
slot='0x02' function='0x0'/>
+ </memory>
+
+Then there are two models for NVDIMMs:
+
+* ``nvidmm`` model:
+
+ ::
+
+ <memory model='nvdimm'>
+ <source>
+ <path>/tmp/nvdimm</path>
+ </source>
+ <target>
+ <size unit='KiB'>523264</size>
+ <node>0</node>
+ </target>
+ <address type='dimm' slot='0'/>
+ </memory>
+
+* ``virtio-pmem`` model:
+
+ ::
+
+ <memory model='virtio-pmem' access='shared'>
+ <source>
+ <path>/tmp/virtio_pmem</path>
+ </source>
+ <target>
+ <size unit='KiB'>524288</size>
+ </target>
+ <address type='pci' domain='0x0000' bus='0x00'
slot='0x05' function='0x0'/>
+ </memory>
+
+
+Please not that (maybe somewhat surprisingly) virtio models go onto PCI bus
s/not/note/
+instead of DIMM slots.
+
+Furthermore, DIMMs can have ``<source/>`` element which configures backend for
+devices. For NVDIMMs the element is mandatory and reflects where the contents
+is saved.
+
+See
https://libvirt.org/formatdomain.html#elementsMemory
Please use a relative link inside of the libvirt project. You might need
to enclose it.
+``virtio-mem`` model
+====================
+
+The ``virtio-mem`` model can be viewed as revised memory balloon. It offers
+memory hotplug and hotunplug solution (without the actual hotplug of the
I'd say 'adding and removing of memory without actual hotplug or unplug
of devices'.
+device). It solves problems that memory balloon can't solve on
its own and thus
+is more flexible than DIMM + balloon solution. ``virtio-mem`` is NUMA aware,
+and thus memory can be inflated/deflated only for a subset of guest NUMA nodes.
+Also, it works with chunks that are either exposed to guest or taken back from
+it.
+
+See
https://virtio-mem.gitlab.io/
+
+Under the hood, ``virtio-mem`` device is split into chunks of equal size which
+are then exposed to the guest. Either all of them or only a portion depending
+on user's request. Therefore there are three important sizes for
+``virtio-mem``. All are to be found under ``<target/>`` element:
+
+#. The maximum size the device can ever offer, exposed under ``<size/>``
+#. The size a single block, exposed under ``<block/>``
size of a
+#. The current size exposed to the guest, exposed under
``<requested/>``
+
+For instance, the following example the maximum size is 4GiB, the block size is
+2MiB and only 1GiB should be exposed to the guest:
+
+ ::
+
+ <memory model='virtio-mem'>
+ <target>
+ <size unit='KiB'>4194304</size>
+ <block unit='KiB'>2048</block>
+ <requested unit='KiB'>1048576</requested>
+ </target>
+ </memory>
+
+Please note that ``<requested/>`` must be an integer multiple of
``<block/>``
+size or zero (memory completely deflated) and has to be less or equal to
+``<size/>`` (memory completely inflated). Furthermore, QEMU recommends the
I'd avoid inflated/deflated since it has exactly the opposite meaning
with memballoon.
+``<block/>`` size to be as big as a Transparent Huge Page
(usually 2MiB).
+
+To change the size exposed to the guest, users should pass memory device XML
+with nothing but ``<requested/>`` changed into the
+``virDomainUpdateDeviceFlags()`` API. For user's convenience this can be done
+via virsh too:
+
+ ::
+
+ # virsh update-memory-device $dom --requested size 2GiB
+
+If there are two or more ``<memory/>`` devices then ``--alias`` shall be used
+to tell virsh which memory device should be updated.
+
+For running guests there is fourth size that can be found under ``<target/>``:
+
+ ::
+
+ <actual unit='KiB'>2097152</actual>
+
+The ``<actual/>`` reflects the actual size consumed by the guest. In general it
s/consumed/used/
+can differ from ``<requested/>``. Reasons include guest kernel
missing
+``virtio-mem`` module and thus being unable to take offered memory, or guest
+kernel being unable to free memory and allow deflation. Since ``<actual/>``
s/and allow deflation//
+only reports size to users, the element is never parsed. It is
formatted only
+into live XML.
+
+Since changing actual allocation requires cooperation with guest kernel,
+requests for change are not instant. Therefore, libvirt emits
+``VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE`` event whenever actual
+allocation changed.
+
+Please not that using ``virtio-mem`` with memory balloon is not possible,
+currently. The real reason is that libvirt's memory accounting isn't ready and
+mixing these two would be confusing to users. Libvirt exposes current value of
+memory balloon under ``<currentMemory/>`` but if it were to account for
+``<actual/>`` too then it would be impossible to learn true size of the
+balloon. Also it might result in mistakenly trying to deflate ``virtio-mem``
+via ``setmem`` command.