
On Fri, Jan 22, 2021 at 13:50:25 +0100, Michal Privoznik wrote:
The virtio-mem is paravirtualized mechanism of adding/removing memory to/from a VM. A virtio-mem-pci device is split into blocks of equal size which are then exposed (all or only a requested portion of them) to the guest kernel to use as regular memory. Therefore, the device has two important attributes:
1) block-size, which defines the size of a block 2) requested-size, which defines how much memory (in bytes) is the device requested to expose to the guest.
The 'block-size' is configured on command line and immutable throughout device's lifetime. The 'requested-size' can be set on the command line too, but also is adjustable via monitor. In fact, that is how management software places its requests to change the memory allocation. If it wants to give more memory to the guest it changes 'requested-size' to a bigger value, and if it wants to shrink guest memory it changes the 'requested-size' to a smaller value. Note, value of zero means that guest should release all memory offered by the device. Of course, guest has to cooperate. Therefore, there is a third attribute 'size' which is read only and reflects how much memory the guest still has. This can be different to 'requested-size', obviously. Because of name clash, I've named it 'actualsize' and it is dealt with in future commits (it is a runtime information anyway).
In the backend, memory for virtio-mem is backed by usual objects: memory-backend-{ram,file,memfd} and their size puts the cap on the amount of memory that a virtio-mem device can offer to a guest. But we are already able to express this info using <size/> under <target/>.
Therefore, we need only two more elements to cover 'block-size' and 'requested-size' attributes. This is the XML I've came up with:
<memory model='virtio-mem'> <source> <nodemask>1-3</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>2097152</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> </target> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </memory>
I hope by now it is obvious that:
1) 'requested-size' must be an integer multiple of 'block-size', and 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI address.
Then there is a limitation that the minimal 'block-size' is transparent huge page size (I'll leave this without explanation).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
[...]
diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index af540391db..2938758ec2 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst
[...]
@@ -7274,7 +7286,8 @@ Example: usage of the memory devices Provide ``dimm`` to add a virtual DIMM module to the guest. :since:`Since 1.2.14` Provide ``nvdimm`` model that adds a Non-Volatile DIMM module. :since:`Since 3.2.0` Provide ``virtio-pmem`` model to add a paravirtualized - persistent memory device. :since:`Since 7.1.0` + persistent memory device. :since:`Since 7.1.0` Provide ``virtio-mem`` model + to add paravirtualized memory device. :since: `Since 7.1.0`
s/: `/:`/
``access`` An optional attribute ``access`` ( :since:`since 3.2.0` ) that provides
[...]
@@ -7366,6 +7380,17 @@ Example: usage of the memory devices so other backend types should use the ``readonly`` element. :since:`Since 5.0.0`
+ ``block`` + For ``virtio-mem`` only. + The size of an individual block, granularity of division of memory module. + Must be power of two and at least equal to size of a transparent hugepage + (2MiB on x84_64). The default is hypervisor dependant. + + ``requested`` + For ``virtio-mem`` only. + The total size of blocks exposed to the guest. Must respect ``block`` + granularity.
This is a bit misleading. I'd avoid using 'blocks'. [...]
diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c index 649fc335ac..b5a0c09468 100644 --- a/src/conf/domain_validate.c +++ b/src/conf/domain_validate.c @@ -25,6 +25,7 @@ #include "virconftypes.h" #include "virlog.h" #include "virutil.h" +#include "virhostmem.h"
#define VIR_FROM_THIS VIR_FROM_DOMAIN
@@ -1389,6 +1390,8 @@ static int virDomainMemoryDefValidate(const virDomainMemoryDef *mem, const virDomainDef *def) { + unsigned long long thpSize; + switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: if (!mem->nvdimmPath) { @@ -1442,6 +1445,42 @@ virDomainMemoryDefValidate(const virDomainMemoryDef *mem, _("virtio-pmem does not support NUMA nodes")); return -1; } + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (mem->requestedsize > mem->size) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("requested size must be smaller than @size")); + return -1; + }
so mem->size is the maximum size of the virtio-mem provided memory? I don't think that it's obvious from the docs.
+ + if (!VIR_IS_POW2(mem->blocksize)) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("block size must be a power of two")); + return -1; + } +
[...]
diff --git a/tests/domaincapsmock.c b/tests/domaincapsmock.c index d81a898dc0..34a2f9ad81 100644 --- a/tests/domaincapsmock.c +++ b/tests/domaincapsmock.c @@ -17,6 +17,7 @@ #include <config.h>
#include "virhostcpu.h" +#include "virhostmem.h" #ifdef WITH_LIBXL # include "libxl/libxl_capabilities.h" #endif @@ -40,3 +41,11 @@ virHostCPUGetMicrocodeVersion(virArch hostArch G_GNUC_UNUSED) { return 0; } + +int +virHostMemGetTHPSize(unsigned long long *size) +{ + /* Pretend Transparent Huge Page size is 2MiB. */ + *size = 2048; + return 0; +}
Put the mocking along with the implementation or separately. It seems weird to implement this in a patch which adds a new device model.