On 22.01.21 19:16, Daniel Henrique Barboza wrote:
On 1/22/21 9:50 AM, Michal Privoznik wrote:
> The virtio-mem is paravirtualized mechanism of adding/removing
> memory to/from a VM. A virtio-mem-pci device is split into blocks
> of equal size which are then exposed (all or only a requested
> portion of them) to the guest kernel to use as regular memory.
> Therefore, the device has two important attributes:
>
> 1) block-size, which defines the size of a block
> 2) requested-size, which defines how much memory (in bytes)
> is the device requested to expose to the guest.
>
> The 'block-size' is configured on command line and immutable
> throughout device's lifetime. The 'requested-size' can be set on
> the command line too, but also is adjustable via monitor. In
> fact, that is how management software places its requests to
> change the memory allocation. If it wants to give more memory to
> the guest it changes 'requested-size' to a bigger value, and if it
> wants to shrink guest memory it changes the 'requested-size' to a
> smaller value. Note, value of zero means that guest should
> release all memory offered by the device. Of course, guest has to
> cooperate. Therefore, there is a third attribute 'size' which is
> read only and reflects how much memory the guest still has. This
> can be different to 'requested-size', obviously. Because of name
> clash, I've named it 'actualsize' and it is dealt with in future
> commits (it is a runtime information anyway).
>
> In the backend, memory for virtio-mem is backed by usual objects:
> memory-backend-{ram,file,memfd} and their size puts the cap on
> the amount of memory that a virtio-mem device can offer to a
> guest. But we are already able to express this info using <size/>
> under <target/>.
>
> Therefore, we need only two more elements to cover 'block-size'
> and 'requested-size' attributes. This is the XML I've came up
> with:
>
> <memory model='virtio-mem'>
> <source>
> <nodemask>1-3</nodemask>
> <pagesize unit='KiB'>2048</pagesize>
> </source>
> <target>
> <size unit='KiB'>2097152</size>
> <node>0</node>
> <block unit='KiB'>2048</block>
> <requested unit='KiB'>1048576</requested>
> </target>
> <address type='pci' domain='0x0000' bus='0x00'
slot='0x04' function='0x0'/>
> </memory>
>
> I hope by now it is obvious that:
>
> 1) 'requested-size' must be an integer multiple of
> 'block-size', and
> 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI
> address.
>
> Then there is a limitation that the minimal 'block-size' is
> transparent huge page size (I'll leave this without explanation).
>
> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
> ---
> docs/formatdomain.rst | 35 ++++++++--
> docs/schemas/domaincommon.rng | 11 ++++
> src/conf/domain_conf.c | 53 ++++++++++++++-
> src/conf/domain_conf.h | 3 +
> src/conf/domain_validate.c | 39 +++++++++++
> src/qemu/qemu_alias.c | 1 +
> src/qemu/qemu_command.c | 1 +
> src/qemu/qemu_domain.c | 10 +++
> src/qemu/qemu_domain_address.c | 37 ++++++++---
> src/qemu/qemu_validate.c | 8 +++
> src/security/security_apparmor.c | 1 +
> src/security/security_dac.c | 2 +
> src/security/security_selinux.c | 2 +
> tests/domaincapsmock.c | 9 +++
> .../memory-hotplug-virtio-mem.xml | 66 +++++++++++++++++++
> ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 +
> tests/qemuxml2xmltest.c | 1 +
> 17 files changed, 264 insertions(+), 16 deletions(-)
> create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml
> create mode 120000
tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml
>
> diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
> index af540391db..2938758ec2 100644
> --- a/docs/formatdomain.rst
> +++ b/docs/formatdomain.rst
> @@ -7267,6 +7267,18 @@ Example: usage of the memory devices
> <size unit='KiB'>524288</size>
> </target>
> </memory>
> + <memory model='virtio-mem'>
> + <source>
> + <nodemask>1-3</nodemask>
> + <pagesize unit='KiB'>2048</pagesize>
> + </source>
> + <target>
> + <size unit='KiB'>2097152</size>
> + <node>0</node>
> + <block unit='KiB'>2048</block>
> + <requested unit='KiB'>1048576</requested>
> + </target>
> + </memory>
> </devices>
> ...
>
> @@ -7274,7 +7286,8 @@ Example: usage of the memory devices
> Provide ``dimm`` to add a virtual DIMM module to the guest. :since:`Since
> 1.2.14` Provide ``nvdimm`` model that adds a Non-Volatile DIMM module.
> :since:`Since 3.2.0` Provide ``virtio-pmem`` model to add a paravirtualized
> - persistent memory device. :since:`Since 7.1.0`
> + persistent memory device. :since:`Since 7.1.0` Provide ``virtio-mem`` model
> + to add paravirtualized memory device. :since: `Since 7.1.0`
>
> ``access``
> An optional attribute ``access`` ( :since:`since 3.2.0` ) that provides
> @@ -7297,10 +7310,11 @@ Example: usage of the memory devices
> allowed only for ``model='nvdimm'`` for pSeries guests. :since:`Since
6.2.0`
>
> ``source``
> - For model ``dimm`` this element is optional and allows to fine tune the
> - source of the memory used for the given memory device. If the element is not
> - provided defaults configured via ``numatune`` are used. If ``dimm`` is
> - provided, then the following optional elements can be provided as well:
> + For model ``dimm`` and model ``virtio-mem`` this element is optional and
> + allows to fine tune the source of the memory used for the given memory
> + device. If the element is not provided defaults configured via ``numatune``
> + are used. If the element is provided, then the following optional elements
> + can be provided:
>
> ``pagesize``
> This element can be used to override the default host page size used for
> @@ -7366,6 +7380,17 @@ Example: usage of the memory devices
> so other backend types should use the ``readonly`` element. :since:`Since
> 5.0.0`
>
> + ``block``
> + For ``virtio-mem`` only.
> + The size of an individual block, granularity of division of memory module.
> + Must be power of two and at least equal to size of a transparent hugepage
> + (2MiB on x84_64). The default is hypervisor dependant.
I don't think that 'dependant' is wrong in this context but
'dependent' is more
common.
> +
> + ``requested``
> + For ``virtio-mem`` only.
> + The total size of blocks exposed to the guest. Must respect ``block``
> + granularity.
> +
> :anchor:`<a id="elementsIommu"/>`
>
> IOMMU devices
> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
> index a4bddcf132..5bc120073e 100644
> --- a/docs/schemas/domaincommon.rng
> +++ b/docs/schemas/domaincommon.rng
> @@ -6020,6 +6020,7 @@
> <value>dimm</value>
> <value>nvdimm</value>
> <value>virtio-pmem</value>
> + <value>virtio-mem</value>
> </choice>
> </attribute>
> <optional>
> @@ -6104,6 +6105,16 @@
> <ref name="unsignedInt"/>
> </element>
> </optional>
> + <optional>
> + <element name="block">
> + <ref name="scaledInteger"/>
> + </element>
> + </optional>
> + <optional>
> + <element name="requested">
> + <ref name="scaledInteger"/>
> + </element>
> + </optional>
> <optional>
> <element name="label">
> <element name="size">
> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> index dab4f10326..f8c5a40b24 100644
> --- a/src/conf/domain_conf.c
> +++ b/src/conf/domain_conf.c
> @@ -1310,6 +1310,7 @@ VIR_ENUM_IMPL(virDomainMemoryModel,
> "dimm",
> "nvdimm",
> "virtio-pmem",
> + "virtio-mem",
> );
>
> VIR_ENUM_IMPL(virDomainShmemModel,
> @@ -5359,6 +5360,7 @@ virDomainMemoryDefPostParse(virDomainMemoryDefPtr mem,
> }
> break;
>
> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
> case VIR_DOMAIN_MEMORY_MODEL_DIMM:
> case VIR_DOMAIN_MEMORY_MODEL_NONE:
> case VIR_DOMAIN_MEMORY_MODEL_LAST:
> @@ -15322,6 +15324,7 @@ virDomainMemorySourceDefParseXML(xmlNodePtr node,
>
> switch (def->model) {
> case VIR_DOMAIN_MEMORY_MODEL_DIMM:
> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
> if (virDomainParseMemory("./pagesize",
"./pagesize/@unit", ctxt,
> &def->pagesize, false, false) < 0)
> return -1;
> @@ -15388,7 +15391,8 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node,
> &def->size, true, false) < 0)
> return -1;
>
> - if (def->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) {
> + switch (def->model) {
> + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
> if (virDomainParseMemory("./label/size",
"./label/size/@unit", ctxt,
> &def->labelsize, false, false) < 0)
> return -1;
> @@ -15407,6 +15411,23 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node,
>
> if (virXPathBoolean("boolean(./readonly)", ctxt))
> def->readonly = true;
> + break;
> +
> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
> + if (virDomainParseMemory("./block", "./block/@unit",
ctxt,
> + &def->blocksize, false, false) < 0)
> + return -1;
> +
> + if (virDomainParseMemory("./requested",
"./requested/@unit", ctxt,
> + &def->requestedsize, false, false) < 0)
> + return -1;
> + break;
> +
> + case VIR_DOMAIN_MEMORY_MODEL_NONE:
> + case VIR_DOMAIN_MEMORY_MODEL_DIMM:
> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM:
> + case VIR_DOMAIN_MEMORY_MODEL_LAST:
> + break;
> }
>
> return 0;
> @@ -17214,11 +17235,14 @@ virDomainMemoryFindByDefInternal(virDomainDefPtr def,
> /* target info -> always present */
> if (tmp->model != mem->model ||
> tmp->targetNode != mem->targetNode ||
> - tmp->size != mem->size)
> + tmp->size != mem->size ||
> + tmp->blocksize != mem->blocksize ||
> + tmp->requestedsize != mem->requestedsize)
> continue;
>
> switch (mem->model) {
> case VIR_DOMAIN_MEMORY_MODEL_DIMM:
> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
> /* source stuff -> match with device */
> if (tmp->pagesize != mem->pagesize)
> continue;
> @@ -22784,6 +22808,22 @@ virDomainMemoryDefCheckABIStability(virDomainMemoryDefPtr
src,
> return false;
> }
>
> + if (src->blocksize != dst->blocksize) {
> + virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> + _("Target memory device block size '%llu'
doesn't match "
> + "source memory device block size
'%llu'"),
> + dst->blocksize, src->blocksize);
> + return false;
> + }
> +
> + if (src->requestedsize != dst->requestedsize) {
> + virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> + _("Target memory device requested size '%llu'
doesn't match "
> + "source memory device requested size
'%llu'"),
> + dst->requestedsize, src->requestedsize);
> + return false;
> + }
> +
> if (src->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) {
> if (src->labelsize != dst->labelsize) {
> virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> @@ -26507,6 +26547,7 @@ virDomainMemorySourceDefFormat(virBufferPtr buf,
>
> switch (def->model) {
> case VIR_DOMAIN_MEMORY_MODEL_DIMM:
> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
> if (def->sourceNodes) {
> if (!(bitmap = virBitmapFormat(def->sourceNodes)))
> return -1;
> @@ -26563,6 +26604,14 @@ virDomainMemoryTargetDefFormat(virBufferPtr buf,
> if (def->readonly)
> virBufferAddLit(&childBuf, "<readonly/>\n");
>
> + if (def->blocksize) {
> + virBufferAsprintf(&childBuf, "<block
unit='KiB'>%llu</block>\n",
> + def->blocksize);
> +
> + virBufferAsprintf(&childBuf, "<requested
unit='KiB'>%llu</requested>\n",
> + def->requestedsize);
> + }
> +
> virXMLFormatElement(buf, "target", NULL, &childBuf);
> }
>
> diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
> index 95ad052891..5d89ecfe9d 100644
> --- a/src/conf/domain_conf.h
> +++ b/src/conf/domain_conf.h
> @@ -2308,6 +2308,7 @@ typedef enum {
> VIR_DOMAIN_MEMORY_MODEL_DIMM, /* dimm hotpluggable memory device */
> VIR_DOMAIN_MEMORY_MODEL_NVDIMM, /* nvdimm memory device */
> VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM, /* virtio-pmem memory device */
> + VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM, /* virtio-mem memory device */
>
> VIR_DOMAIN_MEMORY_MODEL_LAST
> } virDomainMemoryModel;
> @@ -2328,6 +2329,8 @@ struct _virDomainMemoryDef {
> int targetNode;
> unsigned long long size; /* kibibytes */
> unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */
> + unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */
> + unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM */
> bool readonly; /* valid only for NVDIMM */
>
> /* required for QEMU NVDIMM ppc64 support */
> diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c
> index 649fc335ac..b5a0c09468 100644
> --- a/src/conf/domain_validate.c
> +++ b/src/conf/domain_validate.c
> @@ -25,6 +25,7 @@
> #include "virconftypes.h"
> #include "virlog.h"
> #include "virutil.h"
> +#include "virhostmem.h"
>
> #define VIR_FROM_THIS VIR_FROM_DOMAIN
>
> @@ -1389,6 +1390,8 @@ static int
> virDomainMemoryDefValidate(const virDomainMemoryDef *mem,
> const virDomainDef *def)
> {
> + unsigned long long thpSize;
> +
> switch (mem->model) {
> case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
> if (!mem->nvdimmPath) {
> @@ -1442,6 +1445,42 @@ virDomainMemoryDefValidate(const virDomainMemoryDef *mem,
> _("virtio-pmem does not support NUMA
nodes"));
> return -1;
> }
> + break;
> +
> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
> + if (mem->requestedsize > mem->size) {
> + virReportError(VIR_ERR_XML_DETAIL, "%s",
> + _("requested size must be smaller than
@size"));
> + return -1;
> + }
> +
> + if (!VIR_IS_POW2(mem->blocksize)) {
> + virReportError(VIR_ERR_XML_DETAIL, "%s",
> + _("block size must be a power of two"));
> + return -1;
> + }
> +
> + if (virHostMemGetTHPSize(&thpSize) < 0) {
> + /* We failed to get THP size, fall back to a sane default. On
> + * almost every architecture the size will be 2MiB, except for some
> + * funky arches like sparc and m68k. Use 2MiB and refine later if
> + * somebody complains. */
> + thpSize = 2048;
FWIW, a Power 9 server uses 2MiB too:
$ cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
2097152
I don't think you should worry about too much since x86 is the only arch that is
supporting virtio-mem (for now).
So, in QEMU we use the following logic right now:
get_block_size() {
block_size = backend_pagesize();
if (block_size == NATIVE_PAGE_SIZE)
return MAX(block_size, native_thp_size());
return MAX(block_size, 1 * MiB);
}
detect_thp_size() {
thp_size = read_from_file();
if (!thp_size || thp_size > 16 * MiB) {
if (s390x)
return 1 * MiB;
return 2 * MiB;
}
return thp_size;
}
Especially, we also cap big block sizes (esp. arm64 with currently 512
MiB THP), as we prefer flexibility at this point.
So yes, on a x86-4 *host* we'll usually end up 2 MiB in QEMU. On arm64
it can be quite different.
Reviewed-by: Daniel Henrique Barboza <danielhb413(a)gmail.com>
--
Thanks,
David / dhildenb