[PATCH v4 for v7.6.0 00/14] Introduce virtio-mem <memory/> model

v4 of: https://listman.redhat.com/archives/libvir-list/2021-April/msg01138.html diff to v3: - Rebased code on the top of master - Tried to work in all Peter's review suggestions - Fixed a bug where adjusting <requested/> was viewed as hotplug of new <memory/> by XML validator and thus if <maxMemory/> was close enough to <currentMemory/> the validator reported an error (this was reported by David). You can also find these patches on my branch: https://gitlab.com/MichalPrivoznik/libvirt/-/tree/virtio_mem_v4 Michal Prívozník (14): virhostmem: Introduce virHostMemGetTHPSize() qemu_capabilities: Introduce QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI conf: Introduce virtio-mem <memory/> model qemu: Build command line for virtio-mem qemu: Wire up <memory/> live update Introduce <actual/> property to virtio-mem conf: Introduce virDomainMemoryFindByDeviceAlias() qemu: Wire up MEMORY_DEVICE_SIZE_CHANGE event qemu: Refresh the actual size of virtio-mem on monitor reconnect qemu: Account for both memballoon and virtio-mem qemuDomainSetMemoryFlags: Take virtio-mem into consideration virsh: Introduce update-memory-device command news: document recent virtio memory addition kbase: Document virtio-mem NEWS.rst | 12 + docs/formatdomain.rst | 45 +++- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 150 +++++++++++ docs/kbase/meson.build | 1 + docs/manpages/virsh.rst | 30 +++ docs/schemas/domaincommon.rng | 16 ++ examples/c/misc/event-test.c | 17 ++ include/libvirt/libvirt-domain.h | 23 ++ src/conf/domain_conf.c | 118 ++++++++- src/conf/domain_conf.h | 15 ++ src/conf/domain_event.c | 84 ++++++ src/conf/domain_event.h | 10 + src/conf/domain_validate.c | 39 +++ src/libvirt_private.syms | 5 + src/qemu/qemu_alias.c | 10 +- src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 13 +- src/qemu/qemu_domain.c | 33 ++- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_domain_address.c | 38 ++- src/qemu/qemu_driver.c | 240 +++++++++++++++++- src/qemu/qemu_hotplug.c | 18 ++ src/qemu/qemu_hotplug.h | 5 + src/qemu/qemu_monitor.c | 37 +++ src/qemu/qemu_monitor.h | 28 ++ src/qemu/qemu_monitor_json.c | 97 +++++-- src/qemu/qemu_monitor_json.h | 5 + src/qemu/qemu_process.c | 72 ++++++ src/qemu/qemu_validate.c | 8 + src/remote/remote_daemon_dispatch.c | 30 +++ src/remote/remote_driver.c | 32 +++ src/remote/remote_protocol.x | 14 +- src/remote_protocol-structs | 7 + src/security/security_apparmor.c | 1 + src/security/security_dac.c | 2 + src/security/security_selinux.c | 2 + src/util/virhostmem.c | 54 ++++ src/util/virhostmem.h | 3 + tests/domaincapsmock.c | 9 + .../caps_5.1.0.x86_64.xml | 1 + .../caps_5.2.0.x86_64.xml | 1 + .../caps_6.0.0.x86_64.xml | 1 + .../caps_6.1.0.x86_64.xml | 1 + ...mory-hotplug-virtio-mem.x86_64-latest.args | 41 +++ .../memory-hotplug-virtio-mem.xml | 67 +++++ tests/qemuxml2argvtest.c | 1 + ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 + tests/qemuxml2xmltest.c | 1 + tools/virsh-domain.c | 169 ++++++++++++ 51 files changed, 1562 insertions(+), 53 deletions(-) create mode 100644 docs/kbase/memorydevices.rst create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml -- 2.31.1

New virHostMemGetTHPSize() is introduced which allows caller to obtain THP PMD (Page Middle Directory) size, which is equal to the minimal size that THP can use, taken from kernel doc (Documentation/admin-guide/mm/transhuge.rst): Some userspace (such as a test program, or an optimized memory allocation library) may want to know the size (in bytes) of a transparent hugepage:: cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size Since this size depends on the host architecture and the kernel it won't change whilst libvirtd is running. Therefore, we can use virOnce() and cache the value. Of course, we can be running under kernel that has THP disabled or has no notion of THP at all. In that case a negative value is returned to signal error. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virhostmem.c | 54 ++++++++++++++++++++++++++++++++++++++++ src/util/virhostmem.h | 3 +++ tests/domaincapsmock.c | 9 +++++++ 4 files changed, 67 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 68e4b6aab8..19f0d0ddc8 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2393,6 +2393,7 @@ virHostMemGetFreePages; virHostMemGetInfo; virHostMemGetParameters; virHostMemGetStats; +virHostMemGetTHPSize; virHostMemSetParameters; diff --git a/src/util/virhostmem.c b/src/util/virhostmem.c index 8aa675cb4f..7de87d92ae 100644 --- a/src/util/virhostmem.c +++ b/src/util/virhostmem.c @@ -45,11 +45,14 @@ #include "virstring.h" #include "virnuma.h" #include "virlog.h" +#include "virthread.h" #define VIR_FROM_THIS VIR_FROM_NONE VIR_LOG_INIT("util.hostmem"); +static unsigned long long virHostTHPPMDSize; /* in kibibytes */ +static virOnceControl virHostMemGetTHPSizeOnce = VIR_ONCE_CONTROL_INITIALIZER; #ifdef __FreeBSD__ # define BSD_MEMORY_STATS_ALL 4 @@ -918,3 +921,54 @@ virHostMemAllocPages(unsigned int npages, return ncounts; } + +#if defined(__linux__) +# define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" +static void +virHostMemGetTHPSizeSysfs(unsigned long long *size) +{ + if (virFileReadValueUllong(size, "%s", HPAGE_PMD_SIZE_PATH) < 0) { + VIR_WARN("unable to get THP PMD size: %s", g_strerror(errno)); + return; + } + + /* Size is now in bytes. Convert to KiB. */ + *size >>= 10; +} +#endif /* defined(__linux__) */ + + +static void +virHostMemGetTHPSizeOnceInit(void) +{ +#if defined(__linux__) + virHostMemGetTHPSizeSysfs(&virHostTHPPMDSize); +#else /* !defined(__linux__) */ + VIR_WARN("Getting THP size not ported yet"); +#endif /* !defined(__linux__) */ +} + + +/** + * virHostMemGetTHPSize: + * @size: returned size of THP in kibibytes + * + * Obtain Transparent Huge Page size in kibibytes. The size + * depends on host architecture and kernel. Because of virOnce(), + * do not rely on errno in case of failure. + * + * Returns: 0 on success, + * -1 on failure. + */ +int +virHostMemGetTHPSize(unsigned long long *size) +{ + if (virOnce(&virHostMemGetTHPSizeOnce, virHostMemGetTHPSizeOnceInit) < 0) + return -1; + + if (virHostTHPPMDSize == 0) + return -1; + + *size = virHostTHPPMDSize; + return 0; +} diff --git a/src/util/virhostmem.h b/src/util/virhostmem.h index 1369829807..bf15c40698 100644 --- a/src/util/virhostmem.h +++ b/src/util/virhostmem.h @@ -53,3 +53,6 @@ int virHostMemAllocPages(unsigned int npages, int startCell, unsigned int cellCount, bool add); + +int virHostMemGetTHPSize(unsigned long long *size) + G_GNUC_NO_INLINE; diff --git a/tests/domaincapsmock.c b/tests/domaincapsmock.c index b03f75199b..aa5eb68547 100644 --- a/tests/domaincapsmock.c +++ b/tests/domaincapsmock.c @@ -17,6 +17,7 @@ #include <config.h> #include "virhostcpu.h" +#include "virhostmem.h" int virHostCPUGetKVMMaxVCPUs(void) @@ -29,3 +30,11 @@ virHostCPUGetMicrocodeVersion(virArch hostArch G_GNUC_UNUSED) { return 0; } + +int +virHostMemGetTHPSize(unsigned long long *size) +{ + /* Pretend Transparent Huge Page size is 2MiB. */ + *size = 2048; + return 0; +} -- 2.31.1

This commit introduces a new capability that reflects virtio-mem-pci device support in QEMU: QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */ The virtio-mem-pci device was introduced in QEMU 5.1. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml | 1 + 6 files changed, 7 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index d1cd8f11ac..a203aa7557 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -636,6 +636,7 @@ VIR_ENUM_IMPL(virQEMUCaps, /* 405 */ "confidential-guest-support", "query-display-options", + "virtio-mem-pci", ); @@ -1354,6 +1355,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { { "input-linux", QEMU_CAPS_INPUT_LINUX }, { "virtio-gpu-gl-pci", QEMU_CAPS_VIRTIO_GPU_GL_PCI }, { "virtio-vga-gl", QEMU_CAPS_VIRTIO_VGA_GL }, + { "virtio-mem-pci", QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI }, }; diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 7944b9170a..b2ece76e38 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -616,6 +616,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ /* 405 */ QEMU_CAPS_MACHINE_CONFIDENTAL_GUEST_SUPPORT, /* -machine confidential-guest-support */ QEMU_CAPS_QUERY_DISPLAY_OPTIONS, /* 'query-display-options' qmp command present */ + QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml index 674d984432..268a7dcd68 100644 --- a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml @@ -252,6 +252,7 @@ <flag name='rotation-rate'/> <flag name='input-linux'/> <flag name='query-display-options'/> + <flag name='virtio-mem-pci'/> <version>5001000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml index ec3384cab8..f924a88488 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml @@ -253,6 +253,7 @@ <flag name='rotation-rate'/> <flag name='input-linux'/> <flag name='query-display-options'/> + <flag name='virtio-mem-pci'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml index d6198c2479..92468f20c9 100644 --- a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml @@ -260,6 +260,7 @@ <flag name='input-linux'/> <flag name='confidential-guest-support'/> <flag name='query-display-options'/> + <flag name='virtio-mem-pci'/> <version>6000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml index 1937b88a4d..dfda512790 100644 --- a/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml @@ -260,6 +260,7 @@ <flag name='virtio-vga-gl'/> <flag name='confidential-guest-support'/> <flag name='query-display-options'/> + <flag name='virtio-mem-pci'/> <version>6000050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100243</microcodeVersion> -- 2.31.1

The virtio-mem is paravirtualized mechanism of adding/removing memory to/from a VM. A virtio-mem-pci device is split into blocks of equal size which are then exposed (all or only a requested portion of them) to the guest kernel to use as regular memory. Therefore, the device has two important attributes: 1) block-size, which defines the size of a block 2) requested-size, which defines how much memory (in bytes) is the device requested to expose to the guest. The 'block-size' is configured on command line and immutable throughout device's lifetime. The 'requested-size' can be set on the command line too, but also is adjustable via monitor. In fact, that is how management software places its requests to change the memory allocation. If it wants to give more memory to the guest it changes 'requested-size' to a bigger value, and if it wants to shrink guest memory it changes the 'requested-size' to a smaller value. Note, value of zero means that guest should release all memory offered by the device. Of course, guest has to cooperate. Therefore, there is a third attribute 'size' which is read only and reflects how much memory the guest still has. This can be different to 'requested-size', obviously. Because of name clash, I've named it 'actualsize' and it is dealt with in future commits (it is a runtime information anyway). In the backend, memory for virtio-mem is backed by usual objects: memory-backend-{ram,file,memfd} and their size puts the cap on the amount of memory that a virtio-mem device can offer to a guest. But we are already able to express this info using <size/> under <target/>. Therefore, we need only two more elements to cover 'block-size' and 'requested-size' attributes. This is the XML I've came up with: <memory model='virtio-mem'> <source> <nodemask>1-3</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>2097152</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> </target> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </memory> I hope by now it is obvious that: 1) 'requested-size' must be an integer multiple of 'block-size', and 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI address. Then there is a limitation that the minimal 'block-size' is transparent huge page size (I'll leave this without explanation). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.rst | 38 +++++++++-- docs/schemas/domaincommon.rng | 11 +++ src/conf/domain_conf.c | 53 ++++++++++++++- src/conf/domain_conf.h | 3 + src/conf/domain_validate.c | 39 +++++++++++ src/qemu/qemu_alias.c | 1 + src/qemu/qemu_command.c | 1 + src/qemu/qemu_domain.c | 10 +++ src/qemu/qemu_domain_address.c | 38 ++++++++--- src/qemu/qemu_process.c | 2 + src/qemu/qemu_validate.c | 8 +++ src/security/security_apparmor.c | 1 + src/security/security_dac.c | 2 + src/security/security_selinux.c | 2 + .../memory-hotplug-virtio-mem.xml | 67 +++++++++++++++++++ ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 + tests/qemuxml2xmltest.c | 1 + 17 files changed, 261 insertions(+), 17 deletions(-) create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index c6dede053f..967d5bbb91 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -7763,6 +7763,18 @@ Example: usage of the memory devices <size unit='KiB'>524288</size> </target> </memory> + <memory model='virtio-mem'> + <source> + <nodemask>1-3</nodemask> + <pagesize unit='KiB'>2048</pagesize> + </source> + <target> + <size unit='KiB'>2097152</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + </memory> </devices> ... @@ -7770,7 +7782,8 @@ Example: usage of the memory devices Provide ``dimm`` to add a virtual DIMM module to the guest. :since:`Since 1.2.14` Provide ``nvdimm`` model that adds a Non-Volatile DIMM module. :since:`Since 3.2.0` Provide ``virtio-pmem`` model to add a paravirtualized - persistent memory device. :since:`Since 7.1.0` + persistent memory device. :since:`Since 7.1.0` Provide ``virtio-mem`` model + to add paravirtualized memory device. :since:`Since 7.4.0` ``access`` An optional attribute ``access`` ( :since:`since 3.2.0` ) that provides @@ -7793,10 +7806,11 @@ Example: usage of the memory devices allowed only for ``model='nvdimm'`` for pSeries guests. :since:`Since 6.2.0` ``source`` - For model ``dimm`` this element is optional and allows to fine tune the - source of the memory used for the given memory device. If the element is not - provided defaults configured via ``numatune`` are used. If ``dimm`` is - provided, then the following optional elements can be provided as well: + For model ``dimm`` and model ``virtio-mem`` this element is optional and + allows to fine tune the source of the memory used for the given memory + device. If the element is not provided defaults configured via ``numatune`` + are used. If the element is provided, then the following optional elements + can be provided: ``pagesize`` This element can be used to override the default host page size used for @@ -7835,7 +7849,8 @@ Example: usage of the memory devices added memory from the perspective of the guest. The mandatory ``size`` subelement configures the size of the added memory as - a scaled integer. + a scaled integer. For ``virtio-mem`` this represents the maximum possible + size exposed to the guest. The ``node`` subelement configures the guest NUMA node to attach the memory to. The element shall be used only if the guest has NUMA nodes configured. @@ -7862,6 +7877,17 @@ Example: usage of the memory devices so other backend types should use the ``readonly`` element. :since:`Since 5.0.0` + ``block`` + For ``virtio-mem`` only. + The size of an individual block, granularity of division of memory block. + Must be power of two and at least equal to size of a transparent hugepage + (2MiB on x84_64). The default is hypervisor dependent. + + ``requested`` + For ``virtio-mem`` only. + The total size exposed to the guest. Must respect ``block`` granularity + and be smaller than or equal to ``size``. + :anchor:`<a id="elementsIommu"/>` IOMMU devices diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 5ea14b6dbf..439dd893ff 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -6551,6 +6551,7 @@ <value>dimm</value> <value>nvdimm</value> <value>virtio-pmem</value> + <value>virtio-mem</value> </choice> </attribute> <optional> @@ -6638,6 +6639,16 @@ <ref name="unsignedInt"/> </element> </optional> + <optional> + <element name="block"> + <ref name="scaledInteger"/> + </element> + </optional> + <optional> + <element name="requested"> + <ref name="scaledInteger"/> + </element> + </optional> <optional> <element name="label"> <element name="size"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index f65509d8ec..cac484c9a2 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -1381,6 +1381,7 @@ VIR_ENUM_IMPL(virDomainMemoryModel, "dimm", "nvdimm", "virtio-pmem", + "virtio-mem", ); VIR_ENUM_IMPL(virDomainShmemModel, @@ -5469,6 +5470,7 @@ virDomainMemoryDefPostParse(virDomainMemoryDef *mem, } break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_DIMM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: @@ -14623,6 +14625,7 @@ virDomainMemorySourceDefParseXML(xmlNodePtr node, switch (def->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt, &def->pagesize, false, false) < 0) return -1; @@ -14689,7 +14692,8 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node, &def->size, true, false) < 0) return -1; - if (def->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { + switch (def->model) { + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: if (virDomainParseMemory("./label/size", "./label/size/@unit", ctxt, &def->labelsize, false, false) < 0) return -1; @@ -14708,6 +14712,23 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node, if (virXPathBoolean("boolean(./readonly)", ctxt)) def->readonly = true; + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (virDomainParseMemory("./block", "./block/@unit", ctxt, + &def->blocksize, false, false) < 0) + return -1; + + if (virDomainParseMemory("./requested", "./requested/@unit", ctxt, + &def->requestedsize, false, false) < 0) + return -1; + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; } return 0; @@ -16466,11 +16487,14 @@ virDomainMemoryFindByDefInternal(virDomainDef *def, /* target info -> always present */ if (tmp->model != mem->model || tmp->targetNode != mem->targetNode || - tmp->size != mem->size) + tmp->size != mem->size || + tmp->blocksize != mem->blocksize || + tmp->requestedsize != mem->requestedsize) continue; switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: /* source stuff -> match with device */ if (tmp->pagesize != mem->pagesize) continue; @@ -21841,6 +21865,22 @@ virDomainMemoryDefCheckABIStability(virDomainMemoryDef *src, return false; } + if (src->blocksize != dst->blocksize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Target memory device block size '%llu' doesn't match " + "source memory device block size '%llu'"), + dst->blocksize, src->blocksize); + return false; + } + + if (src->requestedsize != dst->requestedsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Target memory device requested size '%llu' doesn't match " + "source memory device requested size '%llu'"), + dst->requestedsize, src->requestedsize); + return false; + } + if (src->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { if (src->labelsize != dst->labelsize) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -25753,6 +25793,7 @@ virDomainMemorySourceDefFormat(virBuffer *buf, switch (def->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (def->sourceNodes) { if (!(bitmap = virBitmapFormat(def->sourceNodes))) return -1; @@ -25809,6 +25850,14 @@ virDomainMemoryTargetDefFormat(virBuffer *buf, if (def->readonly) virBufferAddLit(&childBuf, "<readonly/>\n"); + if (def->blocksize) { + virBufferAsprintf(&childBuf, "<block unit='KiB'>%llu</block>\n", + def->blocksize); + + virBufferAsprintf(&childBuf, "<requested unit='KiB'>%llu</requested>\n", + def->requestedsize); + } + virXMLFormatElement(buf, "target", NULL, &childBuf); } diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index f706c498ff..bafa379118 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2459,6 +2459,7 @@ typedef enum { VIR_DOMAIN_MEMORY_MODEL_DIMM, /* dimm hotpluggable memory device */ VIR_DOMAIN_MEMORY_MODEL_NVDIMM, /* nvdimm memory device */ VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM, /* virtio-pmem memory device */ + VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM, /* virtio-mem memory device */ VIR_DOMAIN_MEMORY_MODEL_LAST } virDomainMemoryModel; @@ -2479,6 +2480,8 @@ struct _virDomainMemoryDef { int targetNode; unsigned long long size; /* kibibytes */ unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */ + unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */ + unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM */ bool readonly; /* valid only for NVDIMM */ /* required for QEMU NVDIMM ppc64 support */ diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c index 2124d25d16..c256fb1133 100644 --- a/src/conf/domain_validate.c +++ b/src/conf/domain_validate.c @@ -28,6 +28,7 @@ #include "virlog.h" #include "virutil.h" #include "virstring.h" +#include "virhostmem.h" #define VIR_FROM_THIS VIR_FROM_DOMAIN @@ -1873,6 +1874,8 @@ static int virDomainMemoryDefValidate(const virDomainMemoryDef *mem, const virDomainDef *def) { + unsigned long long thpSize; + switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: if (!mem->nvdimmPath) { @@ -1926,6 +1929,42 @@ virDomainMemoryDefValidate(const virDomainMemoryDef *mem, _("virtio-pmem does not support NUMA nodes")); return -1; } + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (mem->requestedsize > mem->size) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("requested size must be smaller than or equal to @size")); + return -1; + } + + if (!VIR_IS_POW2(mem->blocksize)) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("block size must be a power of two")); + return -1; + } + + if (virHostMemGetTHPSize(&thpSize) < 0) { + /* We failed to get THP size, fall back to a sane default. On + * almost every architecture the size will be 2MiB, except for some + * funky arches like sparc and m68k. Use 2MiB and refine later if + * somebody complains. */ + thpSize = 2048; + } + + if (mem->blocksize < thpSize) { + virReportError(VIR_ERR_XML_DETAIL, + _("block size too small, must be at least %lluKiB"), + thpSize); + return -1; + } + + if (mem->requestedsize % mem->blocksize != 0) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("requested size must be an integer multiple of block size")); + return -1; + } + break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: break; diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index ed47fa335a..79e8953b2f 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -522,6 +522,7 @@ qemuAssignDeviceMemoryAlias(virDomainDef *def, case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: prefix = "virtiopmem"; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index ea513693f7..6c5a601811 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3316,6 +3316,7 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, device = "virtio-pmem-pci"; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index fc60e15eea..d4ad29fac4 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -8962,6 +8962,16 @@ qemuDomainDefValidateMemoryHotplugDevice(const virDomainMemoryDef *mem, needsNuma = false; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI && + mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("only 'pci' addresses are supported for the %s device"), + virDomainMemoryModelTypeToString(mem->model)); + return -1; + } + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: return -1; diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 6cd0cb8c84..d148982ac8 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -380,9 +380,18 @@ qemuDomainPrimeVirtioDeviceAddresses(virDomainDef *def, } for (i = 0; i < def->nmems; i++) { - if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM && - def->mems[i]->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) - def->mems[i]->info.type = type; + switch (def->mems[i]->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (def->mems[i]->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) + def->mems[i]->info.type = type; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } } if (type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_CCW) { @@ -1010,6 +1019,7 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDef *dev, case VIR_DOMAIN_DEVICE_MEMORY: switch (dev->data.memory->model) { case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return virtioFlags; case VIR_DOMAIN_MEMORY_MODEL_NONE: @@ -2370,12 +2380,19 @@ qemuDomainAssignDevicePCISlots(virDomainDef *def, for (i = 0; i < def->nmems; i++) { virDomainMemoryDef *mem = def->mems[i]; - if (mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM || - !virDeviceInfoPCIAddressIsWanted(&mem->info)) - continue; - - if (qemuDomainPCIAddressReserveNextAddr(addrs, &mem->info) < 0) - return -1; + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (virDeviceInfoPCIAddressIsWanted(&mem->info) && + qemuDomainPCIAddressReserveNextAddr(addrs, &mem->info) < 0) + return -1; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } } return 0; @@ -3039,6 +3056,7 @@ qemuDomainAssignMemoryDeviceSlot(virQEMUDriver *driver, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return qemuDomainEnsurePCIAddress(vm, &dev, driver); break; @@ -3064,6 +3082,7 @@ qemuDomainReleaseMemoryDeviceSlot(virDomainObj *vm, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: qemuDomainReleaseDeviceAddress(vm, &mem->info); break; @@ -3098,6 +3117,7 @@ qemuDomainAssignMemorySlots(virDomainDef *def) break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: /* handled in qemuDomainAssignPCIAddresses() */ break; case VIR_DOMAIN_MEMORY_MODEL_NONE: diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 2b03b0ab98..62208048b0 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3877,6 +3877,7 @@ qemuProcessDomainMemoryDefNeedHugepagesPath(const virDomainMemoryDef *mem, { switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return mem->pagesize && mem->pagesize != system_pagesize; case VIR_DOMAIN_MEMORY_MODEL_NONE: @@ -3945,6 +3946,7 @@ qemuProcessNeedMemoryBackingPath(virDomainDef *def, if (mem) { switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (mem->access != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT) { /* No need to check for access mode on the target node, * it was checked for in the previous loop. */ diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 382473d03b..18d35e6b3e 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -4933,6 +4933,14 @@ qemuValidateDomainDeviceDefMemory(virDomainMemoryDef *mem, } break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-mem isn't supported by this QEMU binary")); + return -1; + } + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c index 84363015dc..4901461f31 100644 --- a/src/security/security_apparmor.c +++ b/src/security/security_apparmor.c @@ -693,6 +693,7 @@ AppArmorSetMemoryLabel(virSecurityManager *mgr, return reload_profile(mgr, def, mem->nvdimmPath, true); case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; } diff --git a/src/security/security_dac.c b/src/security/security_dac.c index 4909107fcc..fb827eef43 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -1850,6 +1850,7 @@ virSecurityDACRestoreMemoryLabel(virSecurityManager *mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: case VIR_DOMAIN_MEMORY_MODEL_NONE: ret = 0; @@ -2033,6 +2034,7 @@ virSecurityDACSetMemoryLabel(virSecurityManager *mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: case VIR_DOMAIN_MEMORY_MODEL_NONE: ret = 0; diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index b50f4463cc..baeb8f9b34 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1580,6 +1580,7 @@ virSecuritySELinuxSetMemoryLabel(virSecurityManager *mgr, case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; } @@ -1607,6 +1608,7 @@ virSecuritySELinuxRestoreMemoryLabel(virSecurityManager *mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: ret = 0; diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml new file mode 100644 index 0000000000..c10528aad8 --- /dev/null +++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml @@ -0,0 +1,67 @@ +<domain type='kvm'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <maxMemory slots='16' unit='KiB'>1099511627776</maxMemory> + <memory unit='KiB'>8388608</memory> + <currentMemory unit='KiB'>8388608</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <cpu mode='custom' match='exact' check='none'> + <model fallback='forbid'>qemu64</model> + <topology sockets='2' dies='1' cores='1' threads='1'/> + <numa> + <cell id='0' cpus='0-1' memory='2095104' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i386</emulator> + <disk type='block' device='disk'> + <driver name='qemu' type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='usb' index='0' model='piix3-uhci'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <audio id='1' type='none'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </memballoon> + <memory model='virtio-mem'> + <target> + <size unit='KiB'>1048576</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>524288</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </memory> + <memory model='virtio-mem'> + <source> + <nodemask>1-3</nodemask> + <pagesize unit='KiB'>2048</pagesize> + </source> + <target> + <size unit='KiB'>2097152</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </memory> + </devices> +</domain> diff --git a/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml b/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml new file mode 120000 index 0000000000..a9d298129c --- /dev/null +++ b/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml @@ -0,0 +1 @@ +../qemuxml2argvdata/memory-hotplug-virtio-mem.xml \ No newline at end of file diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 40e027aaa4..0ddbdae6b6 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -1259,6 +1259,7 @@ mymain(void) QEMU_CAPS_DEVICE_NVDIMM, QEMU_CAPS_LAST); DO_TEST_CAPS_LATEST("memory-hotplug-virtio-pmem"); + DO_TEST_CAPS_LATEST("memory-hotplug-virtio-mem"); DO_TEST("net-udp", NONE); -- 2.31.1

Nothing special is happening here. All important changes were done when for 'virtio-pmem' (adjusting the code to put virtio memory on PCI bus, generating alias using qemuDomainDeviceAliasIndex(). The only bit that might look suspicious is no prealloc for virtio-mem. But if you think about it, the whole purpose of this device is to change amount of memory exposed to guest on the fly. There is no point in locking the whole backend in memory. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_alias.c | 9 +++- src/qemu/qemu_command.c | 12 +++++- ...mory-hotplug-virtio-mem.x86_64-latest.args | 41 +++++++++++++++++++ tests/qemuxml2argvtest.c | 1 + 4 files changed, 60 insertions(+), 3 deletions(-) create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index 79e8953b2f..81a1e7eeed 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -475,8 +475,11 @@ qemuDeviceMemoryGetAliasID(virDomainDef *def, size_t i; int maxidx = 0; - /* virtio-pmem goes onto PCI bus and thus DIMM address is not valid */ - if (!oldAlias && mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM) + /* virtio-pmem and virtio-mem go onto PCI bus and thus DIMM address is not + * valid */ + if (!oldAlias && + mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM && + mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) return mem->info.addr.dimm.slot; for (i = 0; i < def->nmems; i++) { @@ -523,6 +526,8 @@ qemuAssignDeviceMemoryAlias(virDomainDef *def, prefix = "virtiopmem"; break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + prefix = "virtiomem"; + break; case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 6c5a601811..e57303fc4e 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3083,7 +3083,9 @@ qemuBuildMemoryBackendProps(virJSONValue **backendProps, } else if (useHugepage) { if (qemuGetDomainHupageMemPath(priv->driver, def, pagesize, &memPath) < 0) return -1; - prealloc = true; + /* For virtio-mem backed by hugepages we don't need prealloc. */ + if (mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + prealloc = true; } else { /* We can have both pagesize and mem source. If that's the case, * prefer hugepages as those are more specific. */ @@ -3317,6 +3319,9 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + device = "virtio-mem-pci"; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: @@ -3333,6 +3338,11 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, if (mem->labelsize) virBufferAsprintf(&buf, "label-size=%llu,", mem->labelsize * 1024); + if (mem->blocksize) { + virBufferAsprintf(&buf, "block-size=%llu,", mem->blocksize * 1024); + virBufferAsprintf(&buf, "requested-size=%llu,", mem->requestedsize * 1024); + } + if (mem->uuid) { char uuidstr[VIR_UUID_STRING_BUFLEN]; diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args new file mode 100644 index 0000000000..16f10c14ad --- /dev/null +++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args @@ -0,0 +1,41 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +/usr/bin/qemu-system-i386 \ +-name guest=QEMUGuest1,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/tmp/lib/domain--1-QEMUGuest1/master-key.aes"}' \ +-machine pc,accel=kvm,usb=off,dump-guest-core=off \ +-cpu qemu64 \ +-m size=2095104k,slots=16,maxmem=1099511627776k \ +-overcommit mem-lock=off \ +-smp 2,sockets=2,dies=1,cores=1,threads=1 \ +-object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":2145386496}' \ +-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \ +-object '{"qom-type":"memory-backend-ram","id":"memvirtiomem0","size":1073741824}' \ +-device virtio-mem-pci,node=0,block-size=2097152,requested-size=536870912,memdev=memvirtiomem0,id=virtiomem0,bus=pci.0,addr=0x2 \ +-object '{"qom-type":"memory-backend-file","id":"memvirtiomem1","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","size":2147483648,"host-nodes":[1,2,3],"policy":"bind"}' \ +-device virtio-mem-pci,node=0,block-size=2097152,requested-size=1073741824,memdev=memvirtiomem1,id=virtiomem1,bus=pci.0,addr=0x3 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-boot strict=on \ +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ +-blockdev '{"driver":"host_device","filename":"/dev/HostVG/QEMUGuest1","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \ +-device ide-hd,bus=ide.0,unit=0,drive=libvirt-1-format,id=ide0-0-0,bootindex=1 \ +-audiodev id=audio1,driver=none \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 9df28658b9..6587cf6a92 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -3123,6 +3123,7 @@ mymain(void) QEMU_CAPS_LAST); DO_TEST_CAPS_VER("memory-hotplug-virtio-pmem", "5.2.0"); DO_TEST_CAPS_LATEST("memory-hotplug-virtio-pmem"); + DO_TEST_CAPS_LATEST("memory-hotplug-virtio-mem"); DO_TEST("machine-aeskeywrap-on-caps", QEMU_CAPS_AES_KEY_WRAP, -- 2.31.1

As advertised in one of previous commits, we want to be able to change 'requested-size' attribute of virtio-mem on the fly. This commit does exactly that. Changing anything else is checked for and forbidden. Once guest has changed the allocation, QEMU emits an event which we will use to track the allocation. In the next commit. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> --- src/conf/domain_conf.c | 39 +++++++- src/conf/domain_conf.h | 4 + src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 174 ++++++++++++++++++++++++++++++++++- src/qemu/qemu_hotplug.c | 18 ++++ src/qemu/qemu_hotplug.h | 5 + src/qemu/qemu_monitor.c | 13 +++ src/qemu/qemu_monitor.h | 5 + src/qemu/qemu_monitor_json.c | 15 +++ src/qemu/qemu_monitor_json.h | 5 + 10 files changed, 277 insertions(+), 2 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index cac484c9a2..76344b592d 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -16549,6 +16549,42 @@ virDomainMemoryFindInactiveByDef(virDomainDef *def, } +/** + * virDomainMemoryFindByDeviceInfo: + * @def: domain defintion + * @info: device info to match + * + * For given domain definition @def find <memory/> device with + * matching address and matching device alias (if set in @info, + * otherwise ignored). + * + * Returns: device if found, + * NULL otherwise. + */ +virDomainMemoryDef * +virDomainMemoryFindByDeviceInfo(virDomainDef *def, + virDomainDeviceInfo *info) +{ + size_t i; + + for (i = 0; i < def->nmems; i++) { + virDomainMemoryDef *tmp = def->mems[i]; + + if (!virDomainDeviceInfoAddressIsEqual(&tmp->info, info)) + continue; + + /* alias, if present */ + if (info->alias && + STRNEQ_NULLABLE(tmp->info.alias, info->alias)) + continue; + + return tmp; + } + + return NULL; +} + + /** * virDomainMemoryInsert: * @@ -28515,7 +28551,8 @@ virDomainDefCompatibleDevice(virDomainDef *def, return -1; } - if ((virDomainDefGetMemoryTotal(def) + sz) > def->mem.max_memory) { + if (action == VIR_DOMAIN_DEVICE_ACTION_ATTACH && + (virDomainDefGetMemoryTotal(def) + sz) > def->mem.max_memory) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("Attaching memory device with size '%llu' would " "exceed domain's maxMemory config size '%llu'"), diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index bafa379118..0c43a6ae64 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -3771,6 +3771,10 @@ int virDomainMemoryFindByDef(virDomainDef *def, virDomainMemoryDef *mem) int virDomainMemoryFindInactiveByDef(virDomainDef *def, virDomainMemoryDef *mem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; +virDomainMemoryDef * +virDomainMemoryFindByDeviceInfo(virDomainDef *dev, + virDomainDeviceInfo *info) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; int virDomainShmemDefInsert(virDomainDef *def, virDomainShmemDef *shmem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 19f0d0ddc8..57290b6573 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -503,6 +503,7 @@ virDomainMemballoonModelTypeFromString; virDomainMemballoonModelTypeToString; virDomainMemoryDefFree; virDomainMemoryFindByDef; +virDomainMemoryFindByDeviceInfo; virDomainMemoryFindInactiveByDef; virDomainMemoryInsert; virDomainMemoryModelTypeToString; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 235f575901..a638c67f1e 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7113,6 +7113,175 @@ qemuDomainChangeDiskLive(virDomainObj *vm, return 0; } + +static bool +qemuDomainChangeMemoryLiveValidateChange(const virDomainMemoryDef *oldDef, + const virDomainMemoryDef *newDef) +{ + /* The only thing that is allowed to change is 'requestedsize' for + * virtio-mem model. Check if user isn't trying to sneak in change for + * something else. */ + + switch (oldDef->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory of model '%s'"), + virDomainMemoryModelTypeToString(oldDef->model)); + return false; + break; + } + + if (oldDef->model != newDef->model) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory model from '%s' to '%s'"), + virDomainMemoryModelTypeToString(oldDef->model), + virDomainMemoryModelTypeToString(newDef->model)); + return false; + } + + if (oldDef->access != newDef->access) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory access from '%s' to '%s'"), + virDomainMemoryAccessTypeToString(oldDef->access), + virDomainMemoryAccessTypeToString(newDef->access)); + return false; + } + + if (oldDef->discard != newDef->discard) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory discard from '%s' to '%s'"), + virTristateBoolTypeToString(oldDef->discard), + virTristateBoolTypeToString(newDef->discard)); + return false; + } + + if (!virBitmapEqual(oldDef->sourceNodes, + newDef->sourceNodes)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory source nodes")); + return false; + } + + if (oldDef->pagesize != newDef->pagesize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory pagesize from '%llu' to '%llu'"), + oldDef->pagesize, + newDef->pagesize); + return false; + } + + if (STRNEQ_NULLABLE(oldDef->nvdimmPath, newDef->nvdimmPath)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory path from '%s' to '%s'"), + NULLSTR(oldDef->nvdimmPath), + NULLSTR(newDef->nvdimmPath)); + return false; + } + + if (oldDef->alignsize != newDef->alignsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory align size from '%llu' to '%llu'"), + oldDef->alignsize, newDef->alignsize); + return false; + } + + if (oldDef->nvdimmPmem != newDef->nvdimmPmem) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory pmem from '%d' to '%d'"), + oldDef->nvdimmPmem, newDef->nvdimmPmem); + return false; + } + + if (oldDef->targetNode != newDef->targetNode) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory targetNode from '%d' to '%d'"), + oldDef->targetNode, newDef->targetNode); + return false; + } + + if (oldDef->size != newDef->size) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory size from '%llu' to '%llu'"), + oldDef->size, newDef->size); + return false; + } + + if (oldDef->labelsize != newDef->labelsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory label size from '%llu' to '%llu'"), + oldDef->labelsize, newDef->labelsize); + return false; + } + if (oldDef->blocksize != newDef->blocksize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory block size from '%llu' to '%llu'"), + oldDef->blocksize, newDef->blocksize); + return false; + } + + /* requestedsize can change */ + + if (oldDef->readonly != newDef->readonly) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory pmem flag")); + return false; + } + + if ((oldDef->uuid || newDef->uuid) && + !(oldDef->uuid && newDef->uuid && + memcmp(oldDef->uuid, newDef->uuid, VIR_UUID_BUFLEN) == 0)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory UUID")); + return false; + } + + return true; +} + + +static int +qemuDomainChangeMemoryLive(virQEMUDriver *driver G_GNUC_UNUSED, + virDomainObj *vm, + virDomainDeviceDef *dev) +{ + virDomainDeviceDef oldDev = { .type = VIR_DOMAIN_DEVICE_MEMORY }; + virDomainMemoryDef *newDef = dev->data.memory; + virDomainMemoryDef *oldDef = NULL; + + oldDef = virDomainMemoryFindByDeviceInfo(vm->def, &dev->data.memory->info); + if (!oldDef) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("memory '%s' not found"), dev->data.memory->info.alias); + return -1; + } + + oldDev.data.memory = oldDef; + + if (virDomainDefCompatibleDevice(vm->def, dev, &oldDev, + VIR_DOMAIN_DEVICE_ACTION_UPDATE, + true) < 0) + return -1; + + if (!qemuDomainChangeMemoryLiveValidateChange(oldDef, newDef)) + return -1; + + if (qemuDomainChangeMemoryRequestedSize(driver, vm, + newDef->info.alias, + newDef->requestedsize) < 0) + return -1; + + oldDef->requestedsize = newDef->requestedsize; + return 0; +} + + static int qemuDomainUpdateDeviceLive(virDomainObj *vm, virDomainDeviceDef *dev, @@ -7154,6 +7323,10 @@ qemuDomainUpdateDeviceLive(virDomainObj *vm, ret = qemuDomainChangeNet(driver, vm, dev); break; + case VIR_DOMAIN_DEVICE_MEMORY: + ret = qemuDomainChangeMemoryLive(driver, vm, dev); + break; + case VIR_DOMAIN_DEVICE_FS: case VIR_DOMAIN_DEVICE_INPUT: case VIR_DOMAIN_DEVICE_SOUND: @@ -7169,7 +7342,6 @@ qemuDomainUpdateDeviceLive(virDomainObj *vm, case VIR_DOMAIN_DEVICE_HOSTDEV: case VIR_DOMAIN_DEVICE_CONTROLLER: case VIR_DOMAIN_DEVICE_REDIRDEV: - case VIR_DOMAIN_DEVICE_MEMORY: case VIR_DOMAIN_DEVICE_CHR: case VIR_DOMAIN_DEVICE_NONE: case VIR_DOMAIN_DEVICE_TPM: diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index d2a354d026..d9ca4016ff 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -6743,3 +6743,21 @@ qemuDomainSetVcpuInternal(virQEMUDriver *driver, virBitmapFree(livevcpus); return ret; } + + +int +qemuDomainChangeMemoryRequestedSize(virQEMUDriver *driver, + virDomainObj *vm, + const char *alias, + unsigned long long requestedsize) +{ + qemuDomainObjPrivate *priv = vm->privateData; + int rc; + + qemuDomainObjEnterMonitor(driver, vm); + rc = qemuMonitorChangeMemoryRequestedSize(priv->mon, alias, requestedsize); + if (qemuDomainObjExitMonitor(driver, vm) < 0) + return -1; + + return rc; +} diff --git a/src/qemu/qemu_hotplug.h b/src/qemu/qemu_hotplug.h index b5f7afb076..ae8feb459c 100644 --- a/src/qemu/qemu_hotplug.h +++ b/src/qemu/qemu_hotplug.h @@ -166,3 +166,8 @@ int qemuHotplugAttachDBusVMState(virQEMUDriver *driver, int qemuHotplugRemoveDBusVMState(virQEMUDriver *driver, virDomainObj *vm, qemuDomainAsyncJob asyncJob); + +int qemuDomainChangeMemoryRequestedSize(virQEMUDriver *driver, + virDomainObj *vm, + const char *alias, + unsigned long long requestedsize); diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 8f35b4240f..be04684585 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -4743,3 +4743,16 @@ qemuMonitorQueryDirtyRate(qemuMonitor *mon, return qemuMonitorJSONQueryDirtyRate(mon, info); } + + +int +qemuMonitorChangeMemoryRequestedSize(qemuMonitor *mon, + const char *alias, + unsigned long long requestedsize) +{ + VIR_DEBUG("alias=%s requestedsize=%llu", alias, requestedsize); + + QEMU_CHECK_MONITOR(mon); + + return qemuMonitorJSONChangeMemoryRequestedSize(mon, alias, requestedsize); +} diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 6a25def78b..b5de5711e0 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1496,3 +1496,8 @@ struct _qemuMonitorDirtyRateInfo { int qemuMonitorQueryDirtyRate(qemuMonitor *mon, qemuMonitorDirtyRateInfo *info); + +int +qemuMonitorChangeMemoryRequestedSize(qemuMonitor *mon, + const char *alias, + unsigned long long requestedsize); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 223777739d..d3db88a389 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -9404,3 +9404,18 @@ qemuMonitorJSONQueryDirtyRate(qemuMonitor *mon, return qemuMonitorJSONExtractDirtyRateInfo(data, info); } + + +int +qemuMonitorJSONChangeMemoryRequestedSize(qemuMonitor *mon, + const char *alias, + unsigned long long requestedsize) +{ + g_autofree char *path = g_strdup_printf("/machine/peripheral/%s", alias); + qemuMonitorJSONObjectProperty prop = { + .type = QEMU_MONITOR_OBJECT_PROPERTY_ULONG, + .val.ul = requestedsize * 1024, /* monitor needs bytes */ + }; + + return qemuMonitorJSONSetObjectProperty(mon, path, "requested-size", &prop); +} diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index 01a3ba25f1..f91917b671 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -706,3 +706,8 @@ qemuMonitorJSONStartDirtyRateCalc(qemuMonitor *mon, int qemuMonitorJSONQueryDirtyRate(qemuMonitor *mon, qemuMonitorDirtyRateInfo *info); + +int +qemuMonitorJSONChangeMemoryRequestedSize(qemuMonitor *mon, + const char *alias, + unsigned long long requestedsize); -- 2.31.1

The virtio-mem has another property that isn't exposed yet: actual size exposed to the guest. Please note, that this is different to <requested/> because esp. on sizing the memory down guest may refuse to release some blocks. Therefore, let's have another size to report in the XML. But because of its nature, the <actual/> won't be parsed and is report only (for live XMLs). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.rst | 7 +++++++ docs/schemas/domaincommon.rng | 5 +++++ src/conf/domain_conf.c | 9 +++++++-- src/conf/domain_conf.h | 3 +++ 4 files changed, 22 insertions(+), 2 deletions(-) diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 967d5bbb91..01e5a3fe03 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -7773,6 +7773,7 @@ Example: usage of the memory devices <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> + <actual unit='KiB'>524288</requested> </target> </memory> </devices> @@ -7888,6 +7889,12 @@ Example: usage of the memory devices The total size exposed to the guest. Must respect ``block`` granularity and be smaller than or equal to ``size``. + ``actual`` + Active XML for ``virtio-mem`` model may contain ``actual`` element that + reflects the actual size of the corresponding virtio memory device. The + element is formatted into live XML and never parsed, i.e. it is + output-only element. + :anchor:`<a id="elementsIommu"/>` IOMMU devices diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 439dd893ff..b832dce19b 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -6649,6 +6649,11 @@ <ref name="scaledInteger"/> </element> </optional> + <optional> + <element name="actual"> + <ref name="scaledInteger"/> + </element> + </optional> <optional> <element name="label"> <element name="size"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 76344b592d..1ef70e8a80 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -25870,7 +25870,8 @@ virDomainMemorySourceDefFormat(virBuffer *buf, static void virDomainMemoryTargetDefFormat(virBuffer *buf, - virDomainMemoryDef *def) + virDomainMemoryDef *def, + unsigned int flags) { g_auto(virBuffer) childBuf = VIR_BUFFER_INIT_CHILD(buf); @@ -25892,6 +25893,10 @@ virDomainMemoryTargetDefFormat(virBuffer *buf, virBufferAsprintf(&childBuf, "<requested unit='KiB'>%llu</requested>\n", def->requestedsize); + if (!(flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE)) { + virBufferAsprintf(&childBuf, "<actual unit='KiB'>%llu</actual>\n", + def->actualsize); + } } virXMLFormatElement(buf, "target", NULL, &childBuf); @@ -25924,7 +25929,7 @@ virDomainMemoryDefFormat(virBuffer *buf, if (virDomainMemorySourceDefFormat(buf, def) < 0) return -1; - virDomainMemoryTargetDefFormat(buf, def); + virDomainMemoryTargetDefFormat(buf, def, flags); virDomainDeviceInfoFormat(buf, &def->info, flags); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 0c43a6ae64..b57eba655d 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2482,6 +2482,9 @@ struct _virDomainMemoryDef { unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */ unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */ unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM */ + unsigned long long actualsize; /* kibibytes, valid for VIRTIO_MEM and + active domain only, only to report never + parse */ bool readonly; /* valid only for NVDIMM */ /* required for QEMU NVDIMM ppc64 support */ -- 2.31.1

This function will be needed in the next commit where we will want to find virtio-mem given its alias by QEMU on the monitor. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 17 +++++++++++++++++ src/conf/domain_conf.h | 5 +++++ src/libvirt_private.syms | 1 + 3 files changed, 23 insertions(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 1ef70e8a80..d75c73884c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -16585,6 +16585,23 @@ virDomainMemoryFindByDeviceInfo(virDomainDef *def, } +virDomainMemoryDef * +virDomainMemoryFindByDeviceAlias(virDomainDef *def, + const char *alias) +{ + size_t i; + + for (i = 0; i < def->nmems; i++) { + virDomainMemoryDef *tmp = def->mems[i]; + + if (STREQ_NULLABLE(tmp->info.alias, alias)) + return tmp; + } + + return NULL; +} + + /** * virDomainMemoryInsert: * diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index b57eba655d..c41f371aa9 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -3779,6 +3779,11 @@ virDomainMemoryFindByDeviceInfo(virDomainDef *dev, virDomainDeviceInfo *info) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; +virDomainMemoryDef * +virDomainMemoryFindByDeviceAlias(virDomainDef *def, + const char *alias) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; + int virDomainShmemDefInsert(virDomainDef *def, virDomainShmemDef *shmem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; bool virDomainShmemDefEquals(virDomainShmemDef *src, virDomainShmemDef *dst) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 57290b6573..0721e13143 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -503,6 +503,7 @@ virDomainMemballoonModelTypeFromString; virDomainMemballoonModelTypeToString; virDomainMemoryDefFree; virDomainMemoryFindByDef; +virDomainMemoryFindByDeviceAlias; virDomainMemoryFindByDeviceInfo; virDomainMemoryFindInactiveByDef; virDomainMemoryInsert; -- 2.31.1

As advertised in previous commit, this event is delivered to us when virtio-mem module changes the allocation inside the guest. It comes with one attribute - size - which holds the new size of the virtio-mem (well, allocated size), in bytes. Mind you, this is not necessarily the same number as 'requested size'. It almost certainly will be when sizing the memory up, but it might not be when sizing the memory down - the guest kernel might be unable to free some blocks. This actual size is reported in the domain XML as an output element only. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- examples/c/misc/event-test.c | 17 ++++++ include/libvirt/libvirt-domain.h | 23 ++++++++ src/conf/domain_event.c | 84 +++++++++++++++++++++++++++++ src/conf/domain_event.h | 10 ++++ src/libvirt_private.syms | 2 + src/qemu/qemu_domain.c | 3 ++ src/qemu/qemu_domain.h | 1 + src/qemu/qemu_driver.c | 37 +++++++++++++ src/qemu/qemu_monitor.c | 24 +++++++++ src/qemu/qemu_monitor.h | 20 +++++++ src/qemu/qemu_monitor_json.c | 24 +++++++++ src/qemu/qemu_process.c | 42 +++++++++++++++ src/remote/remote_daemon_dispatch.c | 30 +++++++++++ src/remote/remote_driver.c | 32 +++++++++++ src/remote/remote_protocol.x | 14 ++++- src/remote_protocol-structs | 7 +++ tools/virsh-domain.c | 20 +++++++ 17 files changed, 389 insertions(+), 1 deletion(-) diff --git a/examples/c/misc/event-test.c b/examples/c/misc/event-test.c index 10c707e66b..1eec76c79d 100644 --- a/examples/c/misc/event-test.c +++ b/examples/c/misc/event-test.c @@ -982,6 +982,22 @@ myDomainEventMemoryFailureCallback(virConnectPtr conn G_GNUC_UNUSED, } +static int +myDomainEventMemoryDeviceSizeChangeCallback(virConnectPtr conn G_GNUC_UNUSED, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque G_GNUC_UNUSED) +{ + /* Casts to uint64_t to work around mingw not knowing %lld */ + printf("%s EVENT: Domain %s(%d) memory device size change: " + "alias: '%s' new size %" PRIu64 "'\n", + __func__, virDomainGetName(dom), virDomainGetID(dom), + alias, (uint64_t)size); + return 0; +} + + static int myDomainEventMigrationIterationCallback(virConnectPtr conn G_GNUC_UNUSED, virDomainPtr dom, @@ -1113,6 +1129,7 @@ struct domainEventData domainEvents[] = { DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_METADATA_CHANGE, myDomainEventMetadataChangeCallback), DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD, myDomainEventBlockThresholdCallback), DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_MEMORY_FAILURE, myDomainEventMemoryFailureCallback), + DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE, myDomainEventMemoryDeviceSizeChangeCallback), }; struct storagePoolEventData { diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 7ef8ac51e5..534063d15b 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -4654,6 +4654,28 @@ typedef void (*virConnectDomainEventMemoryFailureCallback)(virConnectPtr conn, unsigned int flags, void *opaque); +/** + * virConnectDomainEventMemoryDeviceSizeChangeCallback: + * @conn: connection object + * @dom: domain on which the event occurred + * @alias: memory device alias + * @size: new actual size of memory device (in KiB) + * @opaque: application specified data + * + * The callback occurs when the guest acknowledges request to change size of + * memory device (so far only virtio-mem model supports this). The @size then + * reflects the new amount of guest visible memory (in kibibytes). + * + * The callback signature to use when registering for an event of type + * VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE with + * virConnectDomainEventRegisterAny(). + */ +typedef void (*virConnectDomainEventMemoryDeviceSizeChangeCallback)(virConnectPtr conn, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque); + /** * VIR_DOMAIN_EVENT_CALLBACK: @@ -4698,6 +4720,7 @@ typedef enum { VIR_DOMAIN_EVENT_ID_METADATA_CHANGE = 23, /* virConnectDomainEventMetadataChangeCallback */ VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD = 24, /* virConnectDomainEventBlockThresholdCallback */ VIR_DOMAIN_EVENT_ID_MEMORY_FAILURE = 25, /* virConnectDomainEventMemoryFailureCallback */ + VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE = 26, /* virConnectDomainEventMemoryDeviceSizeChangeCallback */ # ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_EVENT_ID_LAST diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c index 15a3baedf7..18539e348b 100644 --- a/src/conf/domain_event.c +++ b/src/conf/domain_event.c @@ -58,6 +58,7 @@ static virClass *virDomainEventDeviceRemovalFailedClass; static virClass *virDomainEventMetadataChangeClass; static virClass *virDomainEventBlockThresholdClass; static virClass *virDomainEventMemoryFailureClass; +static virClass *virDomainEventMemoryDeviceSizeChangeClass; static void virDomainEventDispose(void *obj); static void virDomainEventLifecycleDispose(void *obj); @@ -81,6 +82,7 @@ static void virDomainEventDeviceRemovalFailedDispose(void *obj); static void virDomainEventMetadataChangeDispose(void *obj); static void virDomainEventBlockThresholdDispose(void *obj); static void virDomainEventMemoryFailureDispose(void *obj); +static void virDomainEventMemoryDeviceSizeChangeDispose(void *obj); static void virDomainEventDispatchDefaultFunc(virConnectPtr conn, @@ -277,6 +279,15 @@ struct _virDomainEventMemoryFailure { }; typedef struct _virDomainEventMemoryFailure virDomainEventMemoryFailure; +struct _virDomainEventMemoryDeviceSizeChange { + virDomainEvent parent; + + char *alias; + unsigned long long size; +}; +typedef struct _virDomainEventMemoryDeviceSizeChange virDomainEventMemoryDeviceSizeChange; +typedef virDomainEventMemoryDeviceSizeChange *virDomainEventMemoryDeviceSizeChangePtr; + static int virDomainEventsOnceInit(void) { @@ -324,6 +335,8 @@ virDomainEventsOnceInit(void) return -1; if (!VIR_CLASS_NEW(virDomainEventMemoryFailure, virDomainEventClass)) return -1; + if (!VIR_CLASS_NEW(virDomainEventMemoryDeviceSizeChange, virDomainEventClass)) + return -1; return 0; } @@ -540,6 +553,14 @@ virDomainEventMemoryFailureDispose(void *obj) VIR_DEBUG("obj=%p", event); } +static void +virDomainEventMemoryDeviceSizeChangeDispose(void *obj) +{ + virDomainEventMemoryDeviceSizeChangePtr event = obj; + VIR_DEBUG("obj=%p", event); + + g_free(event->alias); +} static void * virDomainEventNew(virClass *klass, @@ -1664,6 +1685,57 @@ virDomainEventMemoryFailureNewFromDom(virDomainPtr dom, recipient, action, flags); } + +static virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNew(int id, + const char *name, + unsigned char *uuid, + const char *alias, + unsigned long long size) +{ + virDomainEventMemoryDeviceSizeChangePtr ev; + + if (virDomainEventsInitialize() < 0) + return NULL; + + if (!(ev = virDomainEventNew(virDomainEventMemoryDeviceSizeChangeClass, + VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE, + id, name, uuid))) + return NULL; + + ev->alias = g_strdup(alias); + ev->size = size; + + return (virObjectEvent *)ev; +} + + +virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNewFromObj(virDomainObj *obj, + const char *alias, + unsigned long long size) +{ + return virDomainEventMemoryDeviceSizeChangeNew(obj->def->id, + obj->def->name, + obj->def->uuid, + alias, + size); +} + + +virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNewFromDom(virDomainPtr dom, + const char *alias, + unsigned long long size) +{ + return virDomainEventMemoryDeviceSizeChangeNew(dom->id, + dom->name, + dom->uuid, + alias, + size); +} + + static void virDomainEventDispatchDefaultFunc(virConnectPtr conn, virObjectEvent *event, @@ -1960,6 +2032,18 @@ virDomainEventDispatchDefaultFunc(virConnectPtr conn, goto cleanup; } + case VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE: + { + virDomainEventMemoryDeviceSizeChangePtr memoryDeviceSizeChangeEvent; + + memoryDeviceSizeChangeEvent = (virDomainEventMemoryDeviceSizeChangePtr)event; + ((virConnectDomainEventMemoryDeviceSizeChangeCallback)cb)(conn, dom, + memoryDeviceSizeChangeEvent->alias, + memoryDeviceSizeChangeEvent->size, + cbopaque); + goto cleanup; + } + case VIR_DOMAIN_EVENT_ID_LAST: break; } diff --git a/src/conf/domain_event.h b/src/conf/domain_event.h index 2a59e613cd..4a9f6b988b 100644 --- a/src/conf/domain_event.h +++ b/src/conf/domain_event.h @@ -267,6 +267,16 @@ virDomainEventMemoryFailureNewFromDom(virDomainPtr dom, int action, unsigned int flags); +virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNewFromObj(virDomainObj *obj, + const char *alias, + unsigned long long size); + +virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNewFromDom(virDomainPtr dom, + const char *alias, + unsigned long long size); + int virDomainEventStateRegister(virConnectPtr conn, virObjectEventState *state, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 0721e13143..e25866c60b 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -719,6 +719,8 @@ virDomainEventLifecycleNew; virDomainEventLifecycleNewFromDef; virDomainEventLifecycleNewFromDom; virDomainEventLifecycleNewFromObj; +virDomainEventMemoryDeviceSizeChangeNewFromDom; +virDomainEventMemoryDeviceSizeChangeNewFromObj; virDomainEventMemoryFailureNewFromDom; virDomainEventMemoryFailureNewFromObj; virDomainEventMetadataChangeNewFromDom; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index d4ad29fac4..91a598c208 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11007,6 +11007,9 @@ qemuProcessEventFree(struct qemuProcessEvent *event) case QEMU_PROCESS_EVENT_JOB_STATUS_CHANGE: virObjectUnref(event->data); break; + case QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE: + qemuMonitorMemoryDeviceSizeChangeFree(event->data); + break; case QEMU_PROCESS_EVENT_PR_DISCONNECT: case QEMU_PROCESS_EVENT_LAST: break; diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index acf6ca5ab6..282ae345d5 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -430,6 +430,7 @@ typedef enum { QEMU_PROCESS_EVENT_PR_DISCONNECT, QEMU_PROCESS_EVENT_RDMA_GID_STATUS_CHANGED, QEMU_PROCESS_EVENT_GUEST_CRASHLOADED, + QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE, QEMU_PROCESS_EVENT_LAST } qemuProcessEventType; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index a638c67f1e..996f97b526 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4300,6 +4300,40 @@ processGuestCrashloadedEvent(virQEMUDriver *driver, } +static void +processMemoryDeviceSizeChange(virQEMUDriver *driver, + virDomainObj *vm, + qemuMonitorMemoryDeviceSizeChange *info) +{ + virDomainMemoryDef *mem = NULL; + virObjectEvent *event = NULL; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) + return; + + if (!virDomainObjIsActive(vm)) { + VIR_DEBUG("Domain is not running"); + goto endjob; + } + + mem = virDomainMemoryFindByDeviceAlias(vm->def, info->devAlias); + if (!mem) { + VIR_DEBUG("Memory device '%s' not found", info->devAlias); + goto endjob; + } + + mem->actualsize = VIR_DIV_UP(info->size, 1024); + + event = virDomainEventMemoryDeviceSizeChangeNewFromObj(vm, + info->devAlias, + mem->actualsize); + + endjob: + qemuDomainObjEndJob(driver, vm); + virObjectEventStateQueue(driver->domainEventState, event); +} + + static void qemuProcessEventHandler(void *data, void *opaque) { struct qemuProcessEvent *processEvent = data; @@ -4349,6 +4383,9 @@ static void qemuProcessEventHandler(void *data, void *opaque) case QEMU_PROCESS_EVENT_GUEST_CRASHLOADED: processGuestCrashloadedEvent(driver, vm); break; + case QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE: + processMemoryDeviceSizeChange(driver, vm, processEvent->data); + break; case QEMU_PROCESS_EVENT_LAST: break; } diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index be04684585..aeaa7ee975 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1446,6 +1446,20 @@ qemuMonitorEmitSpiceMigrated(qemuMonitor *mon) } +int +qemuMonitorEmitMemoryDeviceSizeChange(qemuMonitor *mon, + const char *devAlias, + unsigned long long size) +{ + int ret = -1; + VIR_DEBUG("mon=%p, devAlias='%s', size=%llu", mon, devAlias, size); + + QEMU_MONITOR_CALLBACK(mon, ret, domainMemoryDeviceSizeChange, mon->vm, devAlias, size); + + return ret; +} + + int qemuMonitorEmitMemoryFailure(qemuMonitor *mon, qemuMonitorEventMemoryFailure *mfp) @@ -4393,6 +4407,16 @@ qemuMonitorEventRdmaGidStatusFree(qemuMonitorRdmaGidStatus *info) } +void +qemuMonitorMemoryDeviceSizeChangeFree(qemuMonitorMemoryDeviceSizeChangePtr info) +{ + if (!info) + return; + + g_free(info->devAlias); +} + + int qemuMonitorSetWatchdogAction(qemuMonitor *mon, const char *action) diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index b5de5711e0..1c69a86af0 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -101,6 +101,14 @@ struct _qemuMonitorRdmaGidStatus { }; +typedef struct _qemuMonitorMemoryDeviceSizeChange qemuMonitorMemoryDeviceSizeChange; +typedef qemuMonitorMemoryDeviceSizeChange *qemuMonitorMemoryDeviceSizeChangePtr; +struct _qemuMonitorMemoryDeviceSizeChange { + char *devAlias; + unsigned long long size; +}; + + typedef enum { QEMU_MONITOR_JOB_TYPE_UNKNOWN, /* internal value, not exposed by qemu */ QEMU_MONITOR_JOB_TYPE_COMMIT, @@ -145,6 +153,7 @@ struct _qemuMonitorJobInfo { char *qemuMonitorGuestPanicEventInfoFormatMsg(qemuMonitorEventPanicInfo *info); void qemuMonitorEventPanicInfoFree(qemuMonitorEventPanicInfo *info); void qemuMonitorEventRdmaGidStatusFree(qemuMonitorRdmaGidStatus *info); +void qemuMonitorMemoryDeviceSizeChangeFree(qemuMonitorMemoryDeviceSizeChange *info); typedef void (*qemuMonitorDestroyCallback)(qemuMonitor *mon, virDomainObj *vm, @@ -364,6 +373,12 @@ typedef int (*qemuMonitorDomainMemoryFailureCallback)(qemuMonitor *mon, qemuMonitorEventMemoryFailure *mfp, void *opaque); +typedef int (*qemuMonitorDomainMemoryDeviceSizeChange)(qemuMonitor *mon, + virDomainObj *vm, + const char *alias, + unsigned long long size, + void *opaque); + typedef struct _qemuMonitorCallbacks qemuMonitorCallbacks; struct _qemuMonitorCallbacks { qemuMonitorDestroyCallback destroy; @@ -400,6 +415,7 @@ struct _qemuMonitorCallbacks { qemuMonitorDomainRdmaGidStatusChangedCallback domainRdmaGidStatusChanged; qemuMonitorDomainGuestCrashloadedCallback domainGuestCrashloaded; qemuMonitorDomainMemoryFailureCallback domainMemoryFailure; + qemuMonitorDomainMemoryDeviceSizeChange domainMemoryDeviceSizeChange; }; qemuMonitor *qemuMonitorOpen(virDomainObj *vm, @@ -496,6 +512,10 @@ int qemuMonitorEmitSerialChange(qemuMonitor *mon, bool connected); int qemuMonitorEmitSpiceMigrated(qemuMonitor *mon); +int qemuMonitorEmitMemoryDeviceSizeChange(qemuMonitor *mon, + const char *devAlias, + unsigned long long size); + int qemuMonitorEmitMemoryFailure(qemuMonitor *mon, qemuMonitorEventMemoryFailure *mfp); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index d3db88a389..57b78cd686 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -113,6 +113,7 @@ static void qemuMonitorJSONHandleDumpCompleted(qemuMonitor *mon, virJSONValue *d static void qemuMonitorJSONHandlePRManagerStatusChanged(qemuMonitor *mon, virJSONValue *data); static void qemuMonitorJSONHandleRdmaGidStatusChanged(qemuMonitor *mon, virJSONValue *data); static void qemuMonitorJSONHandleMemoryFailure(qemuMonitor *mon, virJSONValue *data); +static void qemuMonitorJSONHandleMemoryDeviceSizeChange(qemuMonitor *mon, virJSONValue *data); typedef struct { const char *type; @@ -133,6 +134,7 @@ static qemuEventHandler eventHandlers[] = { { "GUEST_CRASHLOADED", qemuMonitorJSONHandleGuestCrashloaded, }, { "GUEST_PANICKED", qemuMonitorJSONHandleGuestPanic, }, { "JOB_STATUS_CHANGE", qemuMonitorJSONHandleJobStatusChange, }, + { "MEMORY_DEVICE_SIZE_CHANGE", qemuMonitorJSONHandleMemoryDeviceSizeChange, }, { "MEMORY_FAILURE", qemuMonitorJSONHandleMemoryFailure, }, { "MIGRATION", qemuMonitorJSONHandleMigrationStatus, }, { "MIGRATION_PASS", qemuMonitorJSONHandleMigrationPass, }, @@ -1320,6 +1322,28 @@ qemuMonitorJSONHandleSpiceMigrated(qemuMonitor *mon, } +static void +qemuMonitorJSONHandleMemoryDeviceSizeChange(qemuMonitor *mon, + virJSONValue *data) +{ + const char *name; + unsigned long long size; + + if (!(name = virJSONValueObjectGetString(data, "id"))) { + VIR_WARN("missing device alias in MEMORY_DEVICE_SIZE_CHANGE event"); + return; + } + + if (virJSONValueObjectGetNumberUlong(data, "size", &size) < 0) { + VIR_WARN("missing new size for '%s' in MEMORY_DEVICE_SIZE_CHANGE event", name); + return; + } + + + qemuMonitorEmitMemoryDeviceSizeChange(mon, name, size); +} + + static void qemuMonitorJSONHandleMemoryFailure(qemuMonitor *mon, virJSONValue *data) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 62208048b0..78e5a65d5f 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1919,6 +1919,47 @@ qemuProcessHandleMemoryFailure(qemuMonitor *mon G_GNUC_UNUSED, } +static int +qemuProcessHandleMemoryDeviceSizeChange(qemuMonitor *mon G_GNUC_UNUSED, + virDomainObj *vm, + const char *devAlias, + unsigned long long size, + void *opaque) +{ + virQEMUDriver *driver = opaque; + struct qemuProcessEvent *processEvent = NULL; + qemuMonitorMemoryDeviceSizeChange *info = NULL; + int ret = -1; + + virObjectLock(vm); + + VIR_DEBUG("Memory device '%s' changed size to '%llu' in domain '%s'", + devAlias, size, vm->def->name); + + info = g_new0(qemuMonitorMemoryDeviceSizeChange, 1); + info->devAlias = g_strdup(devAlias); + info->size = size; + + processEvent = g_new0(struct qemuProcessEvent, 1); + processEvent->eventType = QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE; + processEvent->vm = virObjectRef(vm); + processEvent->data = g_steal_pointer(&info); + + if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) { + qemuProcessEventFree(processEvent); + virObjectUnref(vm); + goto cleanup; + } + + processEvent = NULL; + ret = 0; + cleanup: + qemuProcessEventFree(processEvent); + virObjectUnlock(vm); + return ret; +} + + static qemuMonitorCallbacks monitorCallbacks = { .eofNotify = qemuProcessHandleMonitorEOF, .errorNotify = qemuProcessHandleMonitorError, @@ -1952,6 +1993,7 @@ static qemuMonitorCallbacks monitorCallbacks = { .domainRdmaGidStatusChanged = qemuProcessHandleRdmaGidStatusChanged, .domainGuestCrashloaded = qemuProcessHandleGuestCrashloaded, .domainMemoryFailure = qemuProcessHandleMemoryFailure, + .domainMemoryDeviceSizeChange = qemuProcessHandleMemoryDeviceSizeChange, }; static void diff --git a/src/remote/remote_daemon_dispatch.c b/src/remote/remote_daemon_dispatch.c index 65aa20f7d1..ff81ecfa79 100644 --- a/src/remote/remote_daemon_dispatch.c +++ b/src/remote/remote_daemon_dispatch.c @@ -1334,6 +1334,35 @@ remoteRelayDomainEventMemoryFailure(virConnectPtr conn, } +static int +remoteRelayDomainEventMemoryDeviceSizeChange(virConnectPtr conn, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque) +{ + daemonClientEventCallback *callback = opaque; + remote_domain_event_memory_device_size_change_msg data; + + if (callback->callbackID < 0 || + !remoteRelayDomainEventCheckACL(callback->client, conn, dom)) + return -1; + + /* build return data */ + memset(&data, 0, sizeof(data)); + data.callbackID = callback->callbackID; + data.alias = g_strdup(alias); + data.size = size; + make_nonnull_domain(&data.dom, dom); + + remoteDispatchObjectEventSend(callback->client, remoteProgram, + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE, + (xdrproc_t)xdr_remote_domain_event_memory_device_size_change_msg, + &data); + return 0; +} + + static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventLifecycle), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventReboot), @@ -1361,6 +1390,7 @@ static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMetadataChange), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventBlockThreshold), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMemoryFailure), + VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMemoryDeviceSizeChange), }; G_STATIC_ASSERT(G_N_ELEMENTS(domainEventCallbacks) == VIR_DOMAIN_EVENT_ID_LAST); diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index c03c68ec30..1f94be6cbc 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -410,6 +410,10 @@ remoteDomainBuildEventMemoryFailure(virNetClientProgram *prog, void *evdata, void *opaque); static void +remoteDomainBuildEventMemoryDeviceSizeChange(virNetClientProgram *prog, + virNetClient *client, + void *evdata, void *opaque); +static void remoteConnectNotifyEventConnectionClosed(virNetClientProgram *prog G_GNUC_UNUSED, virNetClient *client G_GNUC_UNUSED, void *evdata, void *opaque); @@ -624,6 +628,10 @@ static virNetClientProgramEvent remoteEvents[] = { remoteDomainBuildEventMemoryFailure, sizeof(remote_domain_event_memory_failure_msg), (xdrproc_t)xdr_remote_domain_event_memory_failure_msg }, + { REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE, + remoteDomainBuildEventMemoryDeviceSizeChange, + sizeof(remote_domain_event_memory_device_size_change_msg), + (xdrproc_t)xdr_remote_domain_event_memory_device_size_change_msg }, }; static void @@ -5436,6 +5444,30 @@ remoteDomainBuildEventMemoryFailure(virNetClientProgram *prog G_GNUC_UNUSED, } +static void +remoteDomainBuildEventMemoryDeviceSizeChange(virNetClientProgram *prog G_GNUC_UNUSED, + virNetClient *client G_GNUC_UNUSED, + void *evdata, void *opaque) +{ + virConnectPtr conn = opaque; + remote_domain_event_memory_device_size_change_msg *msg = evdata; + struct private_data *priv = conn->privateData; + virDomainPtr dom; + virObjectEvent *event = NULL; + + if (!(dom = get_nonnull_domain(conn, msg->dom))) + return; + + event = virDomainEventMemoryDeviceSizeChangeNewFromDom(dom, + msg->alias, + msg->size); + + virObjectUnref(dom); + + virObjectEventStateQueueRemote(priv->eventState, event, msg->callbackID); +} + + static int remoteStreamSend(virStreamPtr st, const char *data, diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index de69704b68..a45bdaa7e8 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -3837,6 +3837,13 @@ struct remote_domain_start_dirty_rate_calc_args { }; +struct remote_domain_event_memory_device_size_change_msg { + int callbackID; + remote_nonnull_domain dom; + remote_nonnull_string alias; + unsigned hyper size; +}; + /*----- Protocol. -----*/ /* Define the program number, protocol version and procedure numbers here. */ @@ -6784,6 +6791,11 @@ enum remote_procedure { * @priority: high * @acl: node_device:start */ - REMOTE_PROC_NODE_DEVICE_CREATE = 430 + REMOTE_PROC_NODE_DEVICE_CREATE = 430, + /** + * @generate: both + * @acl: none + */ + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE = 431 }; diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index 6b46328adc..7286c06ae8 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -3192,6 +3192,12 @@ struct remote_domain_start_dirty_rate_calc_args { int seconds; u_int flags; }; +struct remote_domain_event_memory_device_size_change_msg { + int callbackID; + remote_nonnull_domain dom; + remote_nonnull_string alias; + uint64_t size; +}; enum remote_procedure { REMOTE_PROC_CONNECT_OPEN = 1, REMOTE_PROC_CONNECT_CLOSE = 2, @@ -3623,4 +3629,5 @@ enum remote_procedure { REMOTE_PROC_NODE_DEVICE_DEFINE_XML = 428, REMOTE_PROC_NODE_DEVICE_UNDEFINE = 429, REMOTE_PROC_NODE_DEVICE_CREATE = 430, + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE = 431, }; diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index f2a5fb03a4..b69eb3e346 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -13464,6 +13464,24 @@ virshEventMemoryFailurePrint(virConnectPtr conn G_GNUC_UNUSED, } +static void +virshEventMemoryDeviceSizeChangePrint(virConnectPtr conn G_GNUC_UNUSED, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque) +{ + g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER; + + virBufferAsprintf(&buf, + _("event 'memory-device-size-change' for domain '%s':\n" + "alias: %s\nsize: %llu\n"), + virDomainGetName(dom), alias, size); + + virshEventPrint(opaque, &buf); +} + + virshDomainEventCallback virshDomainEventCallbacks[] = { { "lifecycle", VIR_DOMAIN_EVENT_CALLBACK(virshEventLifecyclePrint), }, @@ -13515,6 +13533,8 @@ virshDomainEventCallback virshDomainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(virshEventBlockThresholdPrint), }, { "memory-failure", VIR_DOMAIN_EVENT_CALLBACK(virshEventMemoryFailurePrint), }, + { "memory-device-size-change", + VIR_DOMAIN_EVENT_CALLBACK(virshEventMemoryDeviceSizeChangePrint), }, }; G_STATIC_ASSERT(VIR_DOMAIN_EVENT_ID_LAST == G_N_ELEMENTS(virshDomainEventCallbacks)); -- 2.31.1

If the QEMU driver restarts it loses the track of the actual size of virtio-mem (because it's runtime type of information and thus not stored in XML) and therefore, we have to refresh it when reconnecting to the domain monitor. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_domain.c | 20 +++++++++++-- src/qemu/qemu_monitor.h | 3 ++ src/qemu/qemu_monitor_json.c | 58 ++++++++++++++++++++++-------------- src/qemu/qemu_process.c | 3 ++ 4 files changed, 58 insertions(+), 26 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 91a598c208..914ecbd3b6 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -8289,9 +8289,23 @@ qemuDomainUpdateMemoryDeviceInfo(virQEMUDriver *driver, if (!(dimm = virHashLookup(meminfo, mem->info.alias))) continue; - mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM; - mem->info.addr.dimm.slot = dimm->slot; - mem->info.addr.dimm.base = dimm->address; + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + mem->actualsize = VIR_DIV_UP(dimm->size, 1024); + break; + + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM; + mem->info.addr.dimm.slot = dimm->slot; + mem->info.addr.dimm.base = dimm->address; + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } } virHashFree(meminfo); diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 1c69a86af0..70cdcf9ad2 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1351,10 +1351,13 @@ int qemuMonitorSetIOThread(qemuMonitor *mon, typedef struct _qemuMonitorMemoryDeviceInfo qemuMonitorMemoryDeviceInfo; struct _qemuMonitorMemoryDeviceInfo { + /* For pc-dimm */ unsigned long long address; unsigned int slot; bool hotplugged; bool hotpluggable; + /* For virtio-mem */ + unsigned long long size; /* in bytes */ }; int qemuMonitorGetMemoryDeviceInfo(qemuMonitor *mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 57b78cd686..5d8f24db4f 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -8056,7 +8056,6 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, virJSONValue *cmd; virJSONValue *reply = NULL; virJSONValue *data = NULL; - qemuMonitorMemoryDeviceInfo *meminfo = NULL; size_t i; if (!(cmd = qemuMonitorJSONMakeCommand("query-memory-devices", NULL))) @@ -8077,6 +8076,9 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, for (i = 0; i < virJSONValueArraySize(data); i++) { virJSONValue *elem = virJSONValueArrayGet(data, i); + g_autofree qemuMonitorMemoryDeviceInfo *meminfo = NULL; + virJSONValue *dimminfo; + const char *devalias; const char *type; if (!(type = virJSONValueObjectGetString(elem, "type"))) { @@ -8086,26 +8088,26 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, goto cleanup; } + if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("query-memory-devices reply data doesn't " + "contain enum data")); + goto cleanup; + } + + /* While 'id' attribute is marked as optional in QEMU's QAPI + * specification, Libvirt always sets it. Thus we can fail if not + * present. */ + if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("dimm memory info data is missing 'id'")); + goto cleanup; + } + + meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1); + /* dimm memory devices */ if (STREQ(type, "dimm")) { - virJSONValue *dimminfo; - const char *devalias; - - if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("query-memory-devices reply data doesn't " - "contain enum data")); - goto cleanup; - } - - if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("dimm memory info data is missing 'id'")); - goto cleanup; - } - - meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1); - if (virJSONValueObjectGetNumberUlong(dimminfo, "addr", &meminfo->address) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -8136,17 +8138,27 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, } - if (virHashAddEntry(info, devalias, meminfo) < 0) + } else if (STREQ(type, "virtio-mem")) { + if (virJSONValueObjectGetNumberUlong(dimminfo, "size", + &meminfo->size) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("malformed/missing size in virtio memory info")); goto cleanup; - - meminfo = NULL; + } + } else { + /* type not handled yet */ + continue; } + + if (virHashAddEntry(info, devalias, meminfo) < 0) + goto cleanup; + + meminfo = NULL; } ret = 0; cleanup: - VIR_FREE(meminfo); virJSONValueFree(cmd); virJSONValueFree(reply); return ret; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 78e5a65d5f..8e82043f43 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -8735,6 +8735,9 @@ qemuProcessReconnect(void *opaque) qemuDomainVcpuPersistOrder(obj->def); + if (qemuDomainUpdateMemoryDeviceInfo(driver, obj, QEMU_ASYNC_JOB_NONE) < 0) + goto error; + if (qemuProcessDetectIOThreadPIDs(driver, obj, QEMU_ASYNC_JOB_NONE) < 0) goto error; -- 2.31.1

Reporting how much memory is exposed to the guest happens under <currentMemory/> which is taken from def->mem.cur_balloon. The reported amount should account for both balloon size and the sum of @actualsize of all virtio-mems. For instance, if domain has 4GiB via balloon and additional 2GiB via virtio-mem, then the domain XML should report 6GiB. The same applies for domain statistics. The way to achieve this is to account for either balloon or virtio-mem when the size of the other is changed, e.g. on balloon change we have to add all @actualsize (for non virtio-mem these will be zero, so the check for memory model is needless, but makes it more obvious what's happening), and vice versa. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_driver.c | 9 +++++++++ src/qemu/qemu_process.c | 25 +++++++++++++++++++++++++ 2 files changed, 34 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 996f97b526..735e383da6 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4307,6 +4307,7 @@ processMemoryDeviceSizeChange(virQEMUDriver *driver, { virDomainMemoryDef *mem = NULL; virObjectEvent *event = NULL; + unsigned long long balloon; if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) return; @@ -4322,7 +4323,15 @@ processMemoryDeviceSizeChange(virQEMUDriver *driver, goto endjob; } + /* If this looks weird it's because it is. The balloon size + * as reported by QEMU does not include any of @actualsize. + * It really contains just the balloon size. But in domain + * definition we want to report also sum of @actualsize. Do + * a bit of math to fix the domain definition. */ + balloon = vm->def->mem.cur_balloon - mem->actualsize; mem->actualsize = VIR_DIV_UP(info->size, 1024); + balloon += mem->actualsize; + vm->def->mem.cur_balloon = balloon; event = virDomainEventMemoryDeviceSizeChangeNewFromObj(vm, info->devAlias, diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 8e82043f43..06d46b31dc 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1232,10 +1232,22 @@ qemuProcessHandleBalloonChange(qemuMonitor *mon G_GNUC_UNUSED, virQEMUDriver *driver = opaque; virObjectEvent *event = NULL; g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); + size_t i; virObjectLock(vm); event = virDomainEventBalloonChangeNewFromObj(vm, actual); + /* We want the balloon size stored in domain definition to + * account for the actual size of virtio-mem too. But the + * balloon size as reported by QEMU (@actual) contains just + * the balloon size without any virtio-mem. Do a wee bit of + * math to fix it. */ + VIR_DEBUG("balloon size before fix is %lld", actual); + for (i = 0; i < vm->def->nmems; i++) { + if (vm->def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + actual += vm->def->mems[i]->actualsize; + } + VIR_DEBUG("Updating balloon from %lld to %lld kb", vm->def->mem.cur_balloon, actual); vm->def->mem.cur_balloon = actual; @@ -2429,6 +2441,7 @@ qemuProcessRefreshBalloonState(virQEMUDriver *driver, int asyncJob) { unsigned long long balloon; + size_t i; int rc; /* if no ballooning is available, the current size equals to the current @@ -2445,6 +2458,18 @@ qemuProcessRefreshBalloonState(virQEMUDriver *driver, if (qemuDomainObjExitMonitor(driver, vm) < 0 || rc < 0) return -1; + /* We want the balloon size stored in domain definition to + * account for the actual size of virtio-mem too. But the + * balloon size as reported by QEMU (@balloon) contains just + * the balloon size without any virtio-mem. Do a wee bit of + * math to fix it. */ + VIR_DEBUG("balloon size before fix is %lld", balloon); + for (i = 0; i < vm->def->nmems; i++) { + if (vm->def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + balloon += vm->def->mems[i]->actualsize; + } + VIR_DEBUG("Updating balloon from %lld to %lld kb", + vm->def->mem.cur_balloon, balloon); vm->def->mem.cur_balloon = balloon; return 0; -- 2.31.1

The qemuDomainSetMemoryFlags() allows for memballoon (<currentMemory/>) changes for both active and inactive guests. And just before doing any change, we have to make sure that the new size is not greater than the total memory (<memory/>). However, the total memory includes not only the regular guest memory, but also sum of maximum sizes of all virtio-mems (in fact all memory devices for that matter). But virtio-mem devices are modified differently (via virDomainUpdateDevice()) and thus the upper limit for new balloon size has to be lowered. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_driver.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 735e383da6..b5610ca8e4 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2416,12 +2416,28 @@ static int qemuDomainSetMemoryFlags(virDomainPtr dom, unsigned long newmem, } else { /* resize the current memory */ unsigned long oldmax = 0; + size_t i; - if (def) + if (def) { oldmax = virDomainDefGetMemoryTotal(def); + + /* While virtio-mem is regular mem from guest POV, it can't be + * modified through this API. */ + for (i = 0; i < def->nmems; i++) { + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + oldmax -= def->mems[i]->size; + } + } + if (persistentDef) { - if (!oldmax || oldmax > virDomainDefGetMemoryTotal(persistentDef)) + if (!def || oldmax > virDomainDefGetMemoryTotal(persistentDef)) { oldmax = virDomainDefGetMemoryTotal(persistentDef); + + for (i = 0; i < persistentDef->nmems; i++) { + if (persistentDef->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + oldmax -= persistentDef->mems[i]->size; + } + } } if (newmem > oldmax) { -- 2.31.1

New 'update-memory-device' command is introduced which aims on making it user friendly to change <memory/> device. So far I just need to change <requested/> so I'm introducing --requested-size only; but the idea is that this is extensible for other cases too. For instance, want to change <myElement/>? A new --my-element argument can be easily introduced. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/manpages/virsh.rst | 30 ++++++++ tools/virsh-domain.c | 149 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 179 insertions(+) diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst index 87668f2b9a..80802b28ee 100644 --- a/docs/manpages/virsh.rst +++ b/docs/manpages/virsh.rst @@ -4923,6 +4923,36 @@ results as some fields may be autogenerated and thus match devices other than expected. +update-memory-device +-------------------- + +**Syntax:** + +:: + + update-memory-device domain [--print-xml] [--alias alias] + [[--live] [--config] | [--current]] + [--requested-size size] + +This command finds ``<memory/>`` device inside given *domain*, changes +requested values and passes updated device XML to daemon. If *--print-xml* is +specified then the device is not changed, but the updated device XML is printed +to stdout. If there are more than one ``<memory/>`` devices in *domain* use +*--alias* to select the desired one. + +If *--live* is specified, affect a running domain. +If *--config* is specified, affect the next startup of a persistent guest. +If *--current* is specified, it is equivalent to either *--live* or +*--config*, depending on the current state of the guest. +Both *--live* and *--config* flags may be given, but *--current* is +exclusive. Not specifying any flag is the same as specifying *--current*. + +If *--requested-size* is specified then ``<requested/>`` under memory target is +changed to requested *size* (as scaled integer, see ``NOTES`` above). It +defaults to kibibytes if no suffix is provided. The option is valid only for +``virtio-mem`` memory device model. + + change-media ------------ diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index b69eb3e346..70efca811d 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -9107,6 +9107,149 @@ cmdSetmaxmem(vshControl *ctl, const vshCmd *cmd) return ret; } + +/* + * "update-memory-device" command + */ +static const vshCmdInfo info_update_memory_device[] = { + {.name = "help", + .data = N_("update memory device of a domain") + }, + {.name = "desc", + .data = N_("Update values of a memory device of a domain") + }, + {.name = NULL} +}; + +static const vshCmdOptDef opts_update_memory_device[] = { + VIRSH_COMMON_OPT_DOMAIN_FULL(0), + VIRSH_COMMON_OPT_DOMAIN_CONFIG, + VIRSH_COMMON_OPT_DOMAIN_LIVE, + VIRSH_COMMON_OPT_DOMAIN_CURRENT, + {.name = "print-xml", + .type = VSH_OT_BOOL, + .help = N_("print updated memory device XML instead of executing the change") + }, + {.name = "alias", + .type = VSH_OT_STRING, + .completer = virshDomainDeviceAliasCompleter, + .help = N_("memory device alias"), + }, + {.name = "requested-size", + .type = VSH_OT_INT, + .help = N_("new value of <requested/> size, as scaled integer (default KiB)") + }, + {.name = NULL} +}; + +static int +virshGetUpdatedMemoryXML(char **updatedMemoryXML, + vshControl *ctl, + const vshCmd *cmd, + virDomainPtr dom, + unsigned int flags) +{ + const char *alias = NULL; + g_autoptr(xmlDoc) doc = NULL; + g_autoptr(xmlXPathContext) ctxt = NULL; + g_autofree char *xpath = NULL; + int nmems; + g_autofree xmlNodePtr *mems = NULL; + unsigned int domainXMLFlags = 0; + + if (flags & VIR_DOMAIN_AFFECT_CONFIG) + domainXMLFlags |= VIR_DOMAIN_XML_INACTIVE; + + if (virshDomainGetXMLFromDom(ctl, dom, domainXMLFlags, &doc, &ctxt) < 0) + return -1; + + if (vshCommandOptStringReq(ctl, cmd, "alias", &alias) < 0) + return -1; + + if (alias) { + xpath = g_strdup_printf("/domain/devices/memory[./alias/@name='%s']", alias); + } else { + xpath = g_strdup("/domain/devices/memory"); + } + + nmems = virXPathNodeSet(xpath, ctxt, &mems); + if (nmems < 0) { + vshSaveLibvirtError(); + return -1; + } else if (nmems == 0) { + vshError(ctl, _("no memory device found")); + return -1; + } else if (nmems > 1) { + vshError(ctl, _("multiple memory devices found, use --alias to select one")); + return -1; + } + + ctxt->node = mems[0]; + + if (vshCommandOptBool(cmd, "requested-size")) { + xmlNodePtr requestedSizeNode; + g_autofree char *kibibytesStr = NULL; + unsigned long long bytes = 0; + unsigned long kibibytes = 0; + + if (vshCommandOptScaledInt(ctl, cmd, "requested-size", &bytes, 1024, ULLONG_MAX) < 0) + return -1; + kibibytes = VIR_DIV_UP(bytes, 1024); + + requestedSizeNode = virXPathNode("./target/requested", ctxt); + + if (!requestedSizeNode) { + vshError(ctl, _("virtio-mem device is missing <requested/>")); + return -1; + } + + kibibytesStr = g_strdup_printf("%lu", kibibytes); + xmlNodeSetContent(requestedSizeNode, BAD_CAST kibibytesStr); + } + + if (!(*updatedMemoryXML = virXMLNodeToString(doc, mems[0]))) { + vshSaveLibvirtError(); + return -1; + } + + return 0; +} + +static bool +cmdUpdateMemoryDevice(vshControl *ctl, const vshCmd *cmd) +{ + g_autoptr(virshDomain) dom = NULL; + bool config = vshCommandOptBool(cmd, "config"); + bool live = vshCommandOptBool(cmd, "live"); + bool current = vshCommandOptBool(cmd, "current"); + g_autofree char *updatedMemoryXML = NULL; + unsigned int flags = VIR_DOMAIN_AFFECT_CURRENT; + + VSH_EXCLUSIVE_OPTIONS_VAR(current, live); + VSH_EXCLUSIVE_OPTIONS_VAR(current, config); + + if (config) + flags |= VIR_DOMAIN_AFFECT_CONFIG; + if (live) + flags |= VIR_DOMAIN_AFFECT_LIVE; + + if (!(dom = virshCommandOptDomain(ctl, cmd, NULL))) + return false; + + if (virshGetUpdatedMemoryXML(&updatedMemoryXML, ctl, cmd, dom, flags) < 0) + return false; + + if (vshCommandOptBool(cmd, "print-xml")) { + vshPrint(ctl, "%s", updatedMemoryXML); + } else { + if (virDomainUpdateDeviceFlags(dom, updatedMemoryXML, flags) < 0) + return false; + } + + return true; +} + + /* * "memtune" command */ @@ -15024,6 +15167,12 @@ const vshCmdDef domManagementCmds[] = { .info = info_update_device, .flags = 0 }, + {.name = "update-memory-device", + .handler = cmdUpdateMemoryDevice, + .opts = opts_update_memory_device, + .info = info_update_memory_device, + .flags = 0 + }, {.name = "vcpucount", .handler = cmdVcpucount, .opts = opts_vcpucount, -- 2.31.1

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- NEWS.rst | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/NEWS.rst b/NEWS.rst index 2c7180cb95..3cb4289784 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -8,6 +8,18 @@ the changes introduced by each of them. For a more fine-grained view, use the `git log`_. +v7.6.0 (unreleased) +=================== + +* **New features** + + * Introduce virtio-mem ``<memory/>`` model + + New virtio-mem model is introduced for ``<memory/>`` device which is a + paravirtualized mechanism of adding/removing memory to/from a VM. Use + ``virDomainUpdateDeviceFlags()`` API to adjust amount of memory or ``virsh + update-memory-device`` for convenience. + v7.5.0 (unreleased) =================== -- 2.31.1

This commit adds new memorydevices.rst page which should serve all models of memory devices. Yet, I'm documenting virtio-mem quirks only. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 150 +++++++++++++++++++++++++++++++++++ docs/kbase/meson.build | 1 + 3 files changed, 155 insertions(+) create mode 100644 docs/kbase/memorydevices.rst diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst index 91083ee49d..6355fe4f1d 100644 --- a/docs/kbase/index.rst +++ b/docs/kbase/index.rst @@ -52,6 +52,10 @@ Usage `PCI topology <../pci-addresses.html>`__ Addressing schemes for PCI devices +`Memory devices <memorydevices.html>`__ + Memory devices and their use + + Internals / Debugging --------------------- diff --git a/docs/kbase/memorydevices.rst b/docs/kbase/memorydevices.rst new file mode 100644 index 0000000000..23ccd6da88 --- /dev/null +++ b/docs/kbase/memorydevices.rst @@ -0,0 +1,150 @@ +============== +Memory devices +============== + +.. contents:: + +Basics +====== + +Memory devices can be divided into two families: volatile and non-volatile. +The former is typical RAM memory: it's volatile and thus its contents doesn't +survive reboots nor guest shut downs and power ons. The latter retains its +contents across reboots or power outages. + +In Libvirt, there are two models for volatile memory: + +* ``dimm`` model: + + :: + + <memory model='dimm'> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-mem`` model: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>1048576</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>524288</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </memory> + +Then there are two models for non-volatile memory: + +* ``nvidmm`` model: + + :: + + <memory model='nvdimm'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-pmem`` model: + + :: + + <memory model='virtio-pmem' access='shared'> + <source> + <path>/tmp/virtio_pmem</path> + </source> + <target> + <size unit='KiB'>524288</size> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </memory> + + +Please note that (maybe somewhat surprisingly) virtio models go onto PCI bus +instead of DIMM slots. + +Furthermore, DIMMs can have ``<source/>`` element which configures backend for +devices. For NVDIMMs the element is mandatory and reflects where the contents +is saved. + +See `memory devices documentation <../formatdomain.html#elementsMemory>`_. + +``virtio-mem`` model +==================== + +The ``virtio-mem`` model can be viewed as revised memory balloon. It offers +adding and removing memory (without the actual hotplug of the device). It +solves problems that memory balloon can't solve on its own and thus is more +flexible than DIMM + balloon solution. ``virtio-mem`` is NUMA aware, and thus +memory can be inflated/deflated only for a subset of guest NUMA nodes. Also, +it works with chunks that are either exposed to guest or taken back from it. + +See https://virtio-mem.gitlab.io/ + +Under the hood, ``virtio-mem`` device is split into chunks of equal size which +are then exposed to the guest. Either all of them or only a portion depending +on user's request. Therefore there are three important sizes for +``virtio-mem``. All are to be found under ``<target/>`` element: + +#. The maximum size the device can ever offer, exposed under ``<size/>`` +#. The size of a single block, exposed under ``<block/>`` +#. The current size exposed to the guest, exposed under ``<requested/>`` + +For instance, the following example the maximum size is 4GiB, the block size is +2MiB and only 1GiB should be exposed to the guest: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>4194304</size> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + </memory> + +Please note that ``<requested/>`` must be an integer multiple of ``<block/>`` +size or zero (no blocks exposed to the guest) and has to be less or equal to +``<size/>`` (all blocks exposed to the guest). Furthermore, QEMU recommends the +``<block/>`` size to be as big as a Transparent Huge Page (usually 2MiB). + +To change the size exposed to the guest, users should pass memory device XML +with nothing but ``<requested/>`` changed into the +``virDomainUpdateDeviceFlags()`` API. For user's convenience this can be done +via virsh too: + + :: + + # virsh update-memory-device $dom --requested-size 2GiB + +If there are two or more ``<memory/>`` devices then ``--alias`` shall be used +to tell virsh which memory device should be updated. + +For running guests there is fourth size that can be found under ``<target/>``: + + :: + + <actual unit='KiB'>2097152</actual> + +The ``<actual/>`` reflects the actual size used by the guest. In general it +can differ from ``<requested/>``. Reasons include guest kernel missing +``virtio-mem`` module and thus being unable to take offered memory, or guest +kernel being unable to free memory. Since ``<actual/>`` only reports size to +users, the element is never parsed. It is formatted only into live XML. + +Since changing actual allocation requires cooperation with guest kernel, +requests for change are not instant. Therefore, libvirt emits +``VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE`` event whenever actual +allocation changed. diff --git a/docs/kbase/meson.build b/docs/kbase/meson.build index 7631b47018..f93f687efb 100644 --- a/docs/kbase/meson.build +++ b/docs/kbase/meson.build @@ -10,6 +10,7 @@ docs_kbase_files = [ 'locking-lockd', 'locking', 'locking-sanlock', + 'memorydevices', 'merging_disk_image_chains', 'migrationinternals', 'qemu-passthrough-security', -- 2.31.1

On 6/23/21 4:12 AM, Michal Privoznik wrote:
This commit adds new memorydevices.rst page which should serve all models of memory devices. Yet, I'm documenting virtio-mem quirks only.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 150 +++++++++++++++++++++++++++++++++++ docs/kbase/meson.build | 1 + 3 files changed, 155 insertions(+) create mode 100644 docs/kbase/memorydevices.rst
diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst index 91083ee49d..6355fe4f1d 100644 --- a/docs/kbase/index.rst +++ b/docs/kbase/index.rst @@ -52,6 +52,10 @@ Usage `PCI topology <../pci-addresses.html>`__ Addressing schemes for PCI devices
+`Memory devices <memorydevices.html>`__ + Memory devices and their use + + Internals / Debugging ---------------------
diff --git a/docs/kbase/memorydevices.rst b/docs/kbase/memorydevices.rst new file mode 100644 index 0000000000..23ccd6da88 --- /dev/null +++ b/docs/kbase/memorydevices.rst @@ -0,0 +1,150 @@ +============== +Memory devices +============== + +.. contents:: + +Basics +====== + +Memory devices can be divided into two families: volatile and non-volatile. +The former is typical RAM memory: it's volatile and thus its contents doesn't +survive reboots nor guest shut downs and power ons.
The last part of this sentence is a little awkward. How about something like "... its contents doesn't survive guest reboots or power cycles." ?
The latter retains its +contents across reboots or power outages. + +In Libvirt, there are two models for volatile memory: + +* ``dimm`` model: + + :: + + <memory model='dimm'> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-mem`` model: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>1048576</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>524288</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </memory> + +Then there are two models for non-volatile memory: + +* ``nvidmm`` model:
nvdimm
+ + :: + + <memory model='nvdimm'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-pmem`` model: + + :: + + <memory model='virtio-pmem' access='shared'> + <source> + <path>/tmp/virtio_pmem</path> + </source> + <target> + <size unit='KiB'>524288</size> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </memory> + + +Please note that (maybe somewhat surprisingly) virtio models go onto PCI bus +instead of DIMM slots. + +Furthermore, DIMMs can have ``<source/>`` element which configures backend for +devices. For NVDIMMs the element is mandatory and reflects where the contents +is saved.
"where the content is saved" or "where the contents are saved"
+ +See `memory devices documentation <../formatdomain.html#elementsMemory>`_. + +``virtio-mem`` model +==================== + +The ``virtio-mem`` model can be viewed as revised memory balloon. It offers +adding and removing memory (without the actual hotplug of the device). It +solves problems that memory balloon can't solve on its own and thus is more +flexible than DIMM + balloon solution. ``virtio-mem`` is NUMA aware, and thus +memory can be inflated/deflated only for a subset of guest NUMA nodes. Also, +it works with chunks that are either exposed to guest or taken back from it.
"or reclaimed from it" ?
+ +See https://virtio-mem.gitlab.io/ + +Under the hood, ``virtio-mem`` device is split into chunks of equal size which +are then exposed to the guest. Either all of them or only a portion depending +on user's request. Therefore there are three important sizes for +``virtio-mem``. All are to be found under ``<target/>`` element: + +#. The maximum size the device can ever offer, exposed under ``<size/>`` +#. The size of a single block, exposed under ``<block/>`` +#. The current size exposed to the guest, exposed under ``<requested/>`` + +For instance, the following example the maximum size is 4GiB, the block size is
"For instance, in the following example ..."
+2MiB and only 1GiB should be exposed to the guest: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>4194304</size> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + </memory> + +Please note that ``<requested/>`` must be an integer multiple of ``<block/>`` +size or zero (no blocks exposed to the guest) and has to be less or equal to +``<size/>`` (all blocks exposed to the guest). Furthermore, QEMU recommends the +``<block/>`` size to be as big as a Transparent Huge Page (usually 2MiB). + +To change the size exposed to the guest, users should pass memory device XML +with nothing but ``<requested/>`` changed into the +``virDomainUpdateDeviceFlags()`` API. For user's convenience this can be done +via virsh too: + + :: + + # virsh update-memory-device $dom --requested-size 2GiB + +If there are two or more ``<memory/>`` devices then ``--alias`` shall be used +to tell virsh which memory device should be updated. + +For running guests there is fourth size that can be found under ``<target/>``: + + :: + + <actual unit='KiB'>2097152</actual> + +The ``<actual/>`` reflects the actual size used by the guest. In general it +can differ from ``<requested/>``. Reasons include guest kernel missing +``virtio-mem`` module and thus being unable to take offered memory, or guest +kernel being unable to free memory. Since ``<actual/>`` only reports size to +users, the element is never parsed. It is formatted only into live XML. + +Since changing actual allocation requires cooperation with guest kernel, +requests for change are not instant. Therefore, libvirt emits +``VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE`` event whenever actual +allocation changed.
Nice doc, and nice addition to the KB! Reviewed-by: Jim Fehlig <jfehlig@suse.com> Regards, Jim
diff --git a/docs/kbase/meson.build b/docs/kbase/meson.build index 7631b47018..f93f687efb 100644 --- a/docs/kbase/meson.build +++ b/docs/kbase/meson.build @@ -10,6 +10,7 @@ docs_kbase_files = [ 'locking-lockd', 'locking', 'locking-sanlock', + 'memorydevices', 'merging_disk_image_chains', 'migrationinternals', 'qemu-passthrough-security',

On 6/23/21 5:52 PM, Jim Fehlig wrote:
On 6/23/21 4:12 AM, Michal Privoznik wrote:
This commit adds new memorydevices.rst page which should serve all models of memory devices. Yet, I'm documenting virtio-mem quirks only.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 150 +++++++++++++++++++++++++++++++++++ docs/kbase/meson.build | 1 + 3 files changed, 155 insertions(+) create mode 100644 docs/kbase/memorydevices.rst
Nice doc, and nice addition to the KB!
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Thanks, I've made the changes locally for now. Michal

On 23.06.21 12:12, Michal Privoznik wrote:
v4 of:
https://listman.redhat.com/archives/libvir-list/2021-April/msg01138.html
diff to v3: - Rebased code on the top of master - Tried to work in all Peter's review suggestions - Fixed a bug where adjusting <requested/> was viewed as hotplug of new <memory/> by XML validator and thus if <maxMemory/> was close enough to <currentMemory/> the validator reported an error (this was reported by David).
Hi Michal, I just retested with this version and it mostly works as expected. I tested quite some memory configurations and have some comments / reports :) I tested successfully: - 1 node with one device - 2 nodes with one device on each node - 2 nodes with two devices on one node - "virsh update-memory-device" on live domains -- works great - huge pages and anonymous memory with access=private and access=shared. There is only one issue with hugepages and memfd (prealloc=on gets set). - shared memory on memfd and anonymous memory (-> shared file) with access=shared I only tested on a single host NUMA node so far, but don't expect surprises with host numa policies. 1. "virsh update-memory-device" and stopped domains Once I have more than one virtio-mem device defined for a VM, "virsh update-memory-device" cannot be used anymore as aliases don't seem to be available on stopped VMs. If I manually define an alias on a stopped VM, the alias silently gets dropped. Is there any way to identify a virtio-mem device on a stopped domain? 2. "virsh update-memory-device" with --config on a running domain # virsh update-memory-device "Fedora34" --config --alias "virtiomem1" --requested-size 16G error: no memory device found I guess the issue is again, that alias don't apply to the "!live" XML. So the "--config" option doesn't really work when having more than one virtio-mem device defined for a VM. 3. "virsh update-memory-device" and nodes In addition to "--alias", something like "--node" would also be nice to have -- assuming there is only a single virtio-mem device per NUMA node, which is usually the case. For example: "virsh update-memory-device "Fedora34" --node 1 --requested-size 16G" could come in handy. This would also work on "!live" domains. 4. "actual" vs. "current" "<actual unit='KiB'>16777216</actual>" I wonder if "current" instead of "actual" would be more in line with "currentMemory". But no strong opinion. 5. Slot handling. As already discussed, virtio-mem and virtio-pmem don't need slots. Yet, the "slots" definition is required and libvirt reserves once slot for each such device ("error: unsupported configuration: memory device count '2' exceeds slots count '1'"). This is certainly future work, if we ever want to change that. 6. 4k source results in an error <source> <pagesize unit='KiB'>4096</pagesize> <nodemask>0-1</nodemask> </source> "error: internal error: Unable to find any usable hugetlbfs mount for 4096 KiB" This example is taken from https://libvirt.org/formatdomain.html for DIMMs. Not sure what the expected behavior is. 7. File source gets silently dropped <source> <path>/dev/shmem/vm0</path> </source> The statement gets silently dropped, which is somewhat surprising. However, I did not test what happens with DIMMs, maybe it's the same. 8. Global preallocation of memory With <memoryBacking> <allocation mode="immediate"\> </memoryBacking> we also get "prealloc=on" set for the memory backends of the virito-mem devices, which is sub-optimal, because we end up preallocating all memory of the memory backend (which is unexpected for a virtio-mem device) and virtio-mem will then discard all memory immediately again. So it's essentially a dangerous NOP -- dangerous because we temporarily consume a lot of memory. In an ideal world, we would not set this for the memory backend used for the virtio-mem devices, but for the virtio-mem devices themselves, such that preallocation happens when new memory blocks are actually exposed to the VM. As virtio-mem does not support "prealloc=on" for virtio-mem devices yet, this is future work. We might want to error out, though, if <allocation mode="immediate"\> is used along with virtio-mem devices for now. I'm planning on implementing this in QEMU soon. Until then, it might also be good enough to simply document that this setup should be avoided. 9. Memfd and huge pages <memoryBacking> <source type="memfd"/> </memoryBacking> and <memory model='virtio-mem' access='shared'> <source> <pagesize unit='KiB'>2048</pagesize> </source> ... </memory> I get on the QEMU cmdline "-object {"qom-type":"memory-backend-memfd","id":"memvirtiomem0","hugetlb":true,"hugetlbsize":2097152,"share":true,"prealloc":true,"size":17179869184}" Dropping "the memfd" source I get on the QEMU cmdline: -object^@{"qom-type":"memory-backend-file","id":"memvirtiomem0","mem-path":"/dev/hugepages/libvirt/qemu/2-Fedora34-2","share":true,"size":17179869184} "prealloc":true should not have been added for virtio-mem in case of memfd. !memfd does what's expected. 10. Memory locking With <memoryBacking> <locked/> </memoryBacking> virtio-mem fails with "qemu-system-x86_64: -device virtio-mem-pci,node=0,block-size=2097152,requested-size=0,memdev=memvirtiomem0,id=virtiomem0,bus=pci.0,addr=0x2: Incompatible with mlock" Unfortunately,for example, on shmem like: <memoryBacking> <locked/> <access mode="shared"/> <source type="memfd"/> </memoryBacking> it seems to fail after essentially (temporarily) preallocating all memory for the memory backend of the virtio-mem device. In the future, virtio-mem might be able to support mlock, until then, this is suboptimal but at least fails at some point. 11. Reservation of memory With new QEMU versions we'll want to pass "reserve=off" for the memory backend used, especially with hugepages and private mappings. While this change was merged into QEMU, it's not part of an official release yet. Future work. https://lore.kernel.org/qemu-devel/20210510114328.21835-1-david@redhat.com/ Otherwise, when we don't have the "size" currently in free and "unreserved" hugepages, we'll fail with "qemu-system-x86_64: unable to map backing store for guest RAM: Cannot allocate memory". The same thing can easily happen on anonymous memory when memory overcommit isn't disabled. So this is future work, but at least the QEMU part is already upstream. I'm planning on adding some libvirt documentation to https://virtio-mem.gitlab.io/ soon, where I'll document some of this, including care that has to be taken with mlock and preallocation. Thanks for all your work! -- Thanks, David / dhildenb

On 7/7/21 12:30 PM, David Hildenbrand wrote:
On 23.06.21 12:12, Michal Privoznik wrote:
v4 of:
https://listman.redhat.com/archives/libvir-list/2021-April/msg01138.html
diff to v3: - Rebased code on the top of master - Tried to work in all Peter's review suggestions - Fixed a bug where adjusting <requested/> was viewed as hotplug of new <memory/> by XML validator and thus if <maxMemory/> was close enough to <currentMemory/> the validator reported an error (this was reported by David).
Hi Michal,
Hi, sorry for replying this late.
I just retested with this version and it mostly works as expected. I tested quite some memory configurations and have some comments / reports :)
I tested successfully: - 1 node with one device - 2 nodes with one device on each node - 2 nodes with two devices on one node - "virsh update-memory-device" on live domains -- works great - huge pages and anonymous memory with access=private and access=shared. There is only one issue with hugepages and memfd (prealloc=on gets set). - shared memory on memfd and anonymous memory (-> shared file) with access=shared
I only tested on a single host NUMA node so far, but don't expect surprises with host numa policies.
1. "virsh update-memory-device" and stopped domains
Once I have more than one virtio-mem device defined for a VM, "virsh update-memory-device" cannot be used anymore as aliases don't seem to be available on stopped VMs. If I manually define an alias on a stopped VM, the alias silently gets dropped. Is there any way to identify a virtio-mem device on a stopped domain?
Yes. You want what we call user aliases. They have to have "ua-" prefix so that they don't clash with whatever libvirt generates. Something like this: <memory model="virtio-mem"> <source> <pagesize unit="KiB">2048</pagesize> </source> <target> <size unit="KiB">2097152</size> <node>0</node> <block unit="KiB">2048</block> <requested unit="KiB">1048576</requested> </target> <alias name="ua-virtiomem0"/> <address type="pci" domain="0x0000" bus="0x00" slot="0x06" function="0x0"/> </memory> But then you get to the fact that I haven't implemented update for inactive domains. Will do in v5.
2. "virsh update-memory-device" with --config on a running domain
# virsh update-memory-device "Fedora34" --config --alias "virtiomem1" --requested-size 16G error: no memory device found
I guess the issue is again, that alias don't apply to the "!live" XML. So the "--config" option doesn't really work when having more than one virtio-mem device defined for a VM.
Good point. I wonder what piece of input XML I can use to look up corresponding virtio-mem. Since it has a PCI address maybe I can use that instead of alias? But then we are back to the old problem - in general inactive and active XMLs can be different (due to hot(un-)plug). So even when I'd find a device on the same PCI address it may be different, actually. Therefore, I think the safest is to use aliases. At anyrate - this can be implemented afterwards.
3. "virsh update-memory-device" and nodes
In addition to "--alias", something like "--node" would also be nice to have -- assuming there is only a single virtio-mem device per NUMA node, which is usually the case. For example:
"virsh update-memory-device "Fedora34" --node 1 --requested-size 16G" could come in handy. This would also work on "!live" domains.
Yes, makes sense.
4. "actual" vs. "current"
"<actual unit='KiB'>16777216</actual>" I wonder if "current" instead of "actual" would be more in line with "currentMemory". But no strong opinion.
Yeah, I don't have any opinion either. I can change it.
5. Slot handling.
As already discussed, virtio-mem and virtio-pmem don't need slots. Yet, the "slots" definition is required and libvirt reserves once slot for each such device ("error: unsupported configuration: memory device count '2' exceeds slots count '1'"). This is certainly future work, if we ever want to change that.
I can look into this.
6. 4k source results in an error
<source> <pagesize unit='KiB'>4096</pagesize> <nodemask>0-1</nodemask> </source>
"error: internal error: Unable to find any usable hugetlbfs mount for 4096 KiB"
This example is taken from https://libvirt.org/formatdomain.html for DIMMs. Not sure what the expected behavior is.
Ouch, this is a clear bug. Let me investigate and fix in next version.
7. File source gets silently dropped
<source> <path>/dev/shmem/vm0</path> </source>
The statement gets silently dropped, which is somewhat surprising. However, I did not test what happens with DIMMs, maybe it's the same.
Yeah, this is somewhat expected. I mean, the part that's expected is that libvirt drops parts it doesn't parse. Sometimes it is pretty obvious (<source someRandomAttribute='value'/>) and sometimes it's not so (like in your example when <path/> makes sense for other memory models like virtio-pmem). But just to be sure - since virtio-mem can be backed by any memory-backing-* backend, does it make sense to have <path/> there? So far my code would use memory-backend-file for hugepages only.
8. Global preallocation of memory
With
<memoryBacking> <allocation mode="immediate"\> </memoryBacking>
we also get "prealloc=on" set for the memory backends of the virito-mem devices, which is sub-optimal, because we end up preallocating all memory of the memory backend (which is unexpected for a virtio-mem device) and virtio-mem will then discard all memory immediately again. So it's essentially a dangerous NOP -- dangerous because we temporarily consume a lot of memory.
In an ideal world, we would not set this for the memory backend used for the virtio-mem devices, but for the virtio-mem devices themselves, such that preallocation happens when new memory blocks are actually exposed to the VM.
As virtio-mem does not support "prealloc=on" for virtio-mem devices yet, this is future work. We might want to error out, though, if <allocation mode="immediate"\> is used along with virtio-mem devices for now. I'm planning on implementing this in QEMU soon. Until then, it might also be good enough to simply document that this setup should be avoided.
Right. Meanwhile this was implemented in QEMU and thus I can drop prealloc=on. But then my question is what happens when user wants to expose additional memory to the guest but doesn't have enough free hugepages in the pool? Libvirt's using prealloc=on so that QEMU doesn't get killed later, after the guest booted up.
9. Memfd and huge pages
<memoryBacking> <source type="memfd"/> </memoryBacking>
and
<memory model='virtio-mem' access='shared'> <source> <pagesize unit='KiB'>2048</pagesize> </source> ... </memory>
I get on the QEMU cmdline
"-object {"qom-type":"memory-backend-memfd","id":"memvirtiomem0","hugetlb":true,"hugetlbsize":2097152,"share":true,"prealloc":true,"size":17179869184}"
Dropping "the memfd" source I get on the QEMU cmdline:
-object^@{"qom-type":"memory-backend-file","id":"memvirtiomem0","mem-path":"/dev/hugepages/libvirt/qemu/2-Fedora34-2","share":true,"size":17179869184}
"prealloc":true should not have been added for virtio-mem in case of memfd. !memfd does what's expected.
Okay, I will fix this. But can you shed more light here? I mean, why the difference?
10. Memory locking
With
<memoryBacking> <locked/> </memoryBacking>
virtio-mem fails with
"qemu-system-x86_64: -device virtio-mem-pci,node=0,block-size=2097152,requested-size=0,memdev=memvirtiomem0,id=virtiomem0,bus=pci.0,addr=0x2: Incompatible with mlock"
Unfortunately,for example, on shmem like:
<memoryBacking> <locked/> <access mode="shared"/> <source type="memfd"/> </memoryBacking>
it seems to fail after essentially (temporarily) preallocating all memory for the memory backend of the virtio-mem device. In the future, virtio-mem might be able to support mlock, until then, this is suboptimal but at least fails at some point.
11. Reservation of memory
With new QEMU versions we'll want to pass "reserve=off" for the memory backend used, especially with hugepages and private mappings. While this change was merged into QEMU, it's not part of an official release yet. Future work.
https://lore.kernel.org/qemu-devel/20210510114328.21835-1-david@redhat.com/
Otherwise, when we don't have the "size" currently in free and "unreserved" hugepages, we'll fail with "qemu-system-x86_64: unable to map backing store for guest RAM: Cannot allocate memory". The same thing can easily happen on anonymous memory when memory overcommit isn't disabled.
So this is future work, but at least the QEMU part is already upstream.
So what's the difference between reserve and prealloc? Michal

Hi,
sorry for replying this late.
Thanks for looking into this. It's a fairly long list, so it's understandable that it took a while. :)
5. Slot handling.
As already discussed, virtio-mem and virtio-pmem don't need slots. Yet, the "slots" definition is required and libvirt reserves once slot for each such device ("error: unsupported configuration: memory device count '2' exceeds slots count '1'"). This is certainly future work, if we ever want to change that.
I can look into this.
Yeah, but it can certainly be considered as future work as well.
7. File source gets silently dropped
<source> <path>/dev/shmem/vm0</path> </source>
The statement gets silently dropped, which is somewhat surprising. However, I did not test what happens with DIMMs, maybe it's the same.
Yeah, this is somewhat expected. I mean, the part that's expected is that libvirt drops parts it doesn't parse. Sometimes it is pretty obvious (<source someRandomAttribute='value'/>) and sometimes it's not so (like in your example when <path/> makes sense for other memory models like virtio-pmem). But just to be sure - since virtio-mem can be backed by any memory-backing-* backend, does it make sense to have <path/> there? So far my code would use memory-backend-file for hugepages only.
So it could be backed by a file residing on a filesystem that supports sparse files (shmem, hugetlbfs, ext4, ...) -- IOW anything modern :) It's supposed to work, but if it makes your life easier, we can consider supporting other file sources future work.
8. Global preallocation of memory
With
<memoryBacking> <allocation mode="immediate"\> </memoryBacking>
we also get "prealloc=on" set for the memory backends of the virito-mem devices, which is sub-optimal, because we end up preallocating all memory of the memory backend (which is unexpected for a virtio-mem device) and virtio-mem will then discard all memory immediately again. So it's essentially a dangerous NOP -- dangerous because we temporarily consume a lot of memory.
In an ideal world, we would not set this for the memory backend used for the virtio-mem devices, but for the virtio-mem devices themselves, such that preallocation happens when new memory blocks are actually exposed to the VM.
As virtio-mem does not support "prealloc=on" for virtio-mem devices yet, this is future work. We might want to error out, though, if <allocation mode="immediate"\> is used along with virtio-mem devices for now. I'm planning on implementing this in QEMU soon. Until then, it might also be good enough to simply document that this setup should be avoided.
Right. Meanwhile this was implemented in QEMU and thus I can drop prealloc=on. But then my question is what happens when user wants to expose additional memory to the guest but doesn't have enough free hugepages in the pool? Libvirt's using prealloc=on so that QEMU doesn't get killed later, after the guest booted up.
So "prealloc=on" support for virtio-mem is unfortunately not part of QEMU yet (only "reserve=off" for memory backends). As you correctly state, until that is in place, huge pages cannot be used in a safe way with virtio-mem, which is why they are not officially supported yet by virtio-mem. The idea is to specify "prealloc=on" on the virtio-mem device level once supported, instead of on the memory backend level. So virtio-mem will preallocate the relevant memory before actually giving new block to the guest via virtio-mem, not when creating the memory backend. If preallocation fails at that point, no new blocks are given to the guest and we won't get killed. Think of it like this: you defer preallocation to the point where you actually use the memory and handle preallocation errors still in a safe way. More details are below.
9. Memfd and huge pages
<memoryBacking> <source type="memfd"/> </memoryBacking>
and
<memory model='virtio-mem' access='shared'> <source> <pagesize unit='KiB'>2048</pagesize> </source> ... </memory>
I get on the QEMU cmdline
"-object {"qom-type":"memory-backend-memfd","id":"memvirtiomem0","hugetlb":true,"hugetlbsize":2097152,"share":true,"prealloc":true,"size":17179869184}"
Dropping "the memfd" source I get on the QEMU cmdline:
-object^@{"qom-type":"memory-backend-file","id":"memvirtiomem0","mem-path":"/dev/hugepages/libvirt/qemu/2-Fedora34-2","share":true,"size":17179869184}
"prealloc":true should not have been added for virtio-mem in case of memfd. !memfd does what's expected.
Okay, I will fix this. But can you shed more light here? I mean, why the difference?
Assume you want a 1TB virtio-mem device backed by huge pages but initially only expose 1GB to the VM. When setting prealloc=on on the memory backend, we will preallocate 1TB of huge pages when starting QEMU to discard the memory immediately again within virtio-mem startup code (first thing it does is make sure there is no memory backing at all, meaning the memory backend is completely "empty"). We end up with no preallocated memory and temporarily having allocated 1 TB. When setting "prealloc=on" (once supported) on the virtio-mem device instead, we'll preallocate memory dynamically as we hand it to the VM -- so initially only 1GB. Assume we want to give the VM an additional 16GB via that virtio-mem device. virtio-mem will dynamically try preallocating the memory before giving the guest 16GB. Assume only 8GB could be preallocated, then the VM will only get additional 8GB and we won't crash. [...]
11. Reservation of memory
With new QEMU versions we'll want to pass "reserve=off" for the memory backend used, especially with hugepages and private mappings. While this change was merged into QEMU, it's not part of an official release yet. Future work.
https://lore.kernel.org/qemu-devel/20210510114328.21835-1-david@redhat.com/
Otherwise, when we don't have the "size" currently in free and "unreserved" hugepages, we'll fail with "qemu-system-x86_64: unable to map backing store for guest RAM: Cannot allocate memory". The same thing can easily happen on anonymous memory when memory overcommit isn't disabled.
So this is future work, but at least the QEMU part is already upstream.
So what's the difference between reserve and prealloc?
It's difficult the way especially huge pages work. Say you mmap() 1TB of huge pages. Linux will "reserve" 1TB of huge pages and fail mmap() if it can't. BUT it will not preallocate huge pages yet, they are only accounted for in the OS as reserved for this mapping. The idea is that you cannot really overcommit huge pages in the traditional sense, so the reservation mechanism was implemented to make it harder for user space to do something stupid. BUT we still need preallocation because the whole "huge page reservation" code is broken and not NUMA aware! This, however, breaks the idea of virtio-mem, where you want to dynamically decide how much memory you actually give to a VM. If you reserve all huge pages of the memory backend upfront, they cannot be used for anything else in the meantime and you can just stop using virtio-mem and use a large DIMM instead. In the end, what we want in virtio-mem with huge pages in the future is: * reserve=off for the memory backend: don't reserve any huge pages by in the OS, we'll be preallocating instead. * prealloc=on for the virtio-mem device: preallocate memory dynamically when really about to be used by the VM and fail in a safe way if preallcoation fails. In addition to that, "reserve=off" can be useful with virtio-mem also when backed by ordinary system RAM where we don't use preallocation, just due to the way some memory overcommit modes work. But that's also stuff for the future to optimize and you don't have to bother about that just now. :) -- Thanks, David / dhildenb

On 7/7/21 12:30 PM, David Hildenbrand wrote:
On 23.06.21 12:12, Michal Privoznik wrote:
v4 of:
https://listman.redhat.com/archives/libvir-list/2021-April/msg01138.html
diff to v3: - Rebased code on the top of master - Tried to work in all Peter's review suggestions - Fixed a bug where adjusting <requested/> was viewed as hotplug of new <memory/> by XML validator and thus if <maxMemory/> was close enough to <currentMemory/> the validator reported an error (this was reported by David).
Hi Michal,
6. 4k source results in an error
<source> <pagesize unit='KiB'>4096</pagesize> <nodemask>0-1</nodemask> </source>
"error: internal error: Unable to find any usable hugetlbfs mount for 4096 KiB"
This example is taken from https://libvirt.org/formatdomain.html for DIMMs. Not sure what the expected behavior is.
Just realized that this IS expected behavior. 4096KiB pages (=4MiB) are not regular system pages (4KiB). Thus libvirt is trying to find hugetlbfs mount point that's serving 4MiB pages, unsuccessfully. I'll post a patch that fixes the example though. Michal

On 13.09.21 08:53, Michal Prívozník wrote:
On 7/7/21 12:30 PM, David Hildenbrand wrote:
On 23.06.21 12:12, Michal Privoznik wrote:
v4 of:
https://listman.redhat.com/archives/libvir-list/2021-April/msg01138.html
diff to v3: - Rebased code on the top of master - Tried to work in all Peter's review suggestions - Fixed a bug where adjusting <requested/> was viewed as hotplug of new <memory/> by XML validator and thus if <maxMemory/> was close enough to <currentMemory/> the validator reported an error (this was reported by David).
Hi Michal,
6. 4k source results in an error
<source> <pagesize unit='KiB'>4096</pagesize> <nodemask>0-1</nodemask> </source>
"error: internal error: Unable to find any usable hugetlbfs mount for 4096 KiB"
This example is taken from https://libvirt.org/formatdomain.html for DIMMs. Not sure what the expected behavior is.
Just realized that this IS expected behavior. 4096KiB pages (=4MiB) are not regular system pages (4KiB). Thus libvirt is trying to find hugetlbfs mount point that's serving 4MiB pages, unsuccessfully. I'll post a patch that fixes the example though.
Ah, very right. I blindly copied the example ... make sense to me and the error message is correct. -- Thanks, David / dhildenb
participants (4)
-
David Hildenbrand
-
Jim Fehlig
-
Michal Privoznik
-
Michal Prívozník