[PATCH v3 00/15] Introduce virtio-mem <memory/> model

v3 of: https://listman.redhat.com/archives/libvir-list/2021-February/msg00961.html diff to v2: - Dropped code that forbade use of virtio-mem and memballoon at the same time; - This meant that I had to adjust memory accounting, qemuDomainSetMemoryFlags() - see patches 11/15 and 12/15 which are new. - Fixed small nits raised by Peter in his review of v2 Michal Prívozník (15): virhostmem: Introduce virHostMemGetTHPSize() qemu_process: Deduplicate code in qemuProcessNeedHugepagesPath() qemu_process: Drop needless check in qemuProcessNeedMemoryBackingPath() qemu_capabilities: Introduce QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI conf: Introduce virtio-mem <memory/> model qemu: Build command line for virtio-mem qemu: Wire up <memory/> live update qemu: Wire up MEMORY_DEVICE_SIZE_CHANGE event Introduce MEMORY_DEVICE_SIZE_CHANGE event qemu: Refresh the actual size of virtio-mem on monitor reconnect qemu: Account for both memballoon and virtio-mem qemuDomainSetMemoryFlags: Take virtio-mem into consideration virsh: Introduce update-memory-device command news: document recent virtio memory addition kbase: Document virtio-mem NEWS.rst | 7 + docs/formatdomain.rst | 45 +++- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 150 +++++++++++ docs/kbase/meson.build | 1 + docs/manpages/virsh.rst | 30 +++ docs/schemas/domaincommon.rng | 16 ++ examples/c/misc/event-test.c | 17 ++ include/libvirt/libvirt-domain.h | 23 ++ src/conf/domain_conf.c | 115 ++++++++- src/conf/domain_conf.h | 15 ++ src/conf/domain_event.c | 84 +++++++ src/conf/domain_event.h | 10 + src/conf/domain_validate.c | 39 +++ src/libvirt_private.syms | 5 + src/qemu/qemu_alias.c | 10 +- src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 13 +- src/qemu/qemu_domain.c | 50 +++- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_domain_address.c | 38 ++- src/qemu/qemu_driver.c | 233 +++++++++++++++++- src/qemu/qemu_hotplug.c | 18 ++ src/qemu/qemu_hotplug.h | 5 + src/qemu/qemu_monitor.c | 37 +++ src/qemu/qemu_monitor.h | 27 ++ src/qemu/qemu_monitor_json.c | 97 ++++++-- src/qemu/qemu_monitor_json.h | 5 + src/qemu/qemu_process.c | 118 ++++++++- src/qemu/qemu_validate.c | 8 + src/remote/remote_daemon_dispatch.c | 30 +++ src/remote/remote_driver.c | 32 +++ src/remote/remote_protocol.x | 15 +- src/remote_protocol-structs | 7 + src/security/security_apparmor.c | 1 + src/security/security_dac.c | 2 + src/security/security_selinux.c | 2 + src/util/virhostmem.c | 63 +++++ src/util/virhostmem.h | 3 + tests/domaincapsmock.c | 9 + .../caps_5.1.0.x86_64.xml | 1 + .../caps_5.2.0.x86_64.xml | 1 + .../caps_6.0.0.x86_64.xml | 1 + ...mory-hotplug-virtio-mem.x86_64-latest.args | 49 ++++ .../memory-hotplug-virtio-mem.xml | 67 +++++ tests/qemuxml2argvtest.c | 1 + ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 + tests/qemuxml2xmltest.c | 1 + tools/virsh-domain.c | 169 +++++++++++++ 50 files changed, 1612 insertions(+), 67 deletions(-) create mode 100644 docs/kbase/memorydevices.rst create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml -- 2.26.2

New virHostMemGetTHPSize() is introduced which allows caller to obtain THP PMD (Page Middle Directory) size, which is equal to the minimal size that THP can use, taken from kernel doc (Documentation/admin-guide/mm/transhuge.rst): Some userspace (such as a test program, or an optimized memory allocation library) may want to know the size (in bytes) of a transparent hugepage:: cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size Since this size depends on the host architecture and the kernel it won't change whilst libvirtd is running. Therefore, we can use virOnce() and cache the value. Of course, we can be running under kernel that has THP disabled or has no notion of THP at all. In that case a negative value is returned to signal error. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virhostmem.c | 63 ++++++++++++++++++++++++++++++++++++++++ src/util/virhostmem.h | 3 ++ tests/domaincapsmock.c | 9 ++++++ 4 files changed, 76 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 526dcee11a..929880aab7 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2377,6 +2377,7 @@ virHostMemGetFreePages; virHostMemGetInfo; virHostMemGetParameters; virHostMemGetStats; +virHostMemGetTHPSize; virHostMemSetParameters; diff --git a/src/util/virhostmem.c b/src/util/virhostmem.c index ae42978ed2..89b31af3ca 100644 --- a/src/util/virhostmem.c +++ b/src/util/virhostmem.c @@ -45,11 +45,14 @@ #include "virstring.h" #include "virnuma.h" #include "virlog.h" +#include "virthread.h" #define VIR_FROM_THIS VIR_FROM_NONE VIR_LOG_INIT("util.hostmem"); +static unsigned long long virHostTHPPMDSize; /* in kibibytes */ +static virOnceControl virHostMemGetTHPSizeOnce = VIR_ONCE_CONTROL_INITIALIZER; #ifdef __FreeBSD__ # define BSD_MEMORY_STATS_ALL 4 @@ -920,3 +923,63 @@ virHostMemAllocPages(unsigned int npages, return ncounts; } + +#if defined(__linux__) +# define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" +static int +virHostMemGetTHPSizeSysfs(unsigned long long *size) +{ + g_autofree char *buf = NULL; + + /* 1KiB limit is more than enough. */ + if (virFileReadAll(HPAGE_PMD_SIZE_PATH, 1024, &buf) < 0) + return -1; + + virStringTrimOptionalNewline(buf); + if (virStrToLong_ull(buf, NULL, 10, size) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("unable to parse THP PMD size: %s"), buf); + return -1; + } + + /* Size is now in bytes. Convert to KiB. */ + *size >>= 10; + return 0; +} +#endif /* defined(__linux__) */ + + +static void +virHostMemGetTHPSizeOnceInit(void) +{ +#if defined(__linux__) + virHostMemGetTHPSizeSysfs(&virHostTHPPMDSize); +#else /* !defined(__linux__) */ + VIR_WARN("Getting THP size not ported yet"); +#endif /* !defined(__linux__) */ +} + + +/** + * virHostMemGetTHPSize: + * @size: returned size of THP in kibibytes + * + * Obtain Transparent Huge Page size in kibibytes. The size + * depends on host architecture and kernel. Because of virOnce(), + * do not rely on errno in case of failure. + * + * Returns: 0 on success, + * -1 on failure. + */ +int +virHostMemGetTHPSize(unsigned long long *size) +{ + if (virOnce(&virHostMemGetTHPSizeOnce, virHostMemGetTHPSizeOnceInit) < 0) + return -1; + + if (virHostTHPPMDSize == 0) + return -1; + + *size = virHostTHPPMDSize; + return 0; +} diff --git a/src/util/virhostmem.h b/src/util/virhostmem.h index 1369829807..bf15c40698 100644 --- a/src/util/virhostmem.h +++ b/src/util/virhostmem.h @@ -53,3 +53,6 @@ int virHostMemAllocPages(unsigned int npages, int startCell, unsigned int cellCount, bool add); + +int virHostMemGetTHPSize(unsigned long long *size) + G_GNUC_NO_INLINE; diff --git a/tests/domaincapsmock.c b/tests/domaincapsmock.c index d81a898dc0..34a2f9ad81 100644 --- a/tests/domaincapsmock.c +++ b/tests/domaincapsmock.c @@ -17,6 +17,7 @@ #include <config.h> #include "virhostcpu.h" +#include "virhostmem.h" #ifdef WITH_LIBXL # include "libxl/libxl_capabilities.h" #endif @@ -40,3 +41,11 @@ virHostCPUGetMicrocodeVersion(virArch hostArch G_GNUC_UNUSED) { return 0; } + +int +virHostMemGetTHPSize(unsigned long long *size) +{ + /* Pretend Transparent Huge Page size is 2MiB. */ + *size = 2048; + return 0; +} -- 2.26.2

The aim of qemuProcessNeedHugepagesPath() is to return whether guest needs private path inside HugeTLBFS mounts (deducted from domain definition @def) or whether the memory device that user is hotplugging in needs the private path (deducted from the @mem argument). The actual creation of the path is done in the only caller qemuProcessBuildDestroyMemoryPaths(). The rule for the first case (@def) and the second case (@mem) is the same (domain has a DIMM device that has HP requested) and is written twice. Move the logic into a function to deduplicate the code. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 5f31260221..74667206b9 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3862,6 +3862,27 @@ qemuProcessReconnectCheckMemAliasOrderMismatch(virDomainObjPtr vm) } +static bool +qemuProcessDomainMemoryDefNeedHugepagesPath(const virDomainMemoryDef *mem, + const long system_pagesize) +{ + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + return mem->pagesize && + mem->pagesize != system_pagesize; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + /* None of these can be backed by hugepages. */ + return false; + } + + return false; +} + + static bool qemuProcessNeedHugepagesPath(virDomainDefPtr def, virDomainMemoryDefPtr mem) @@ -3878,16 +3899,12 @@ qemuProcessNeedHugepagesPath(virDomainDefPtr def, } for (i = 0; i < def->nmems; i++) { - if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_DIMM && - def->mems[i]->pagesize && - def->mems[i]->pagesize != system_pagesize) + if (qemuProcessDomainMemoryDefNeedHugepagesPath(def->mems[i], system_pagesize)) return true; } if (mem && - mem->model == VIR_DOMAIN_MEMORY_MODEL_DIMM && - mem->pagesize && - mem->pagesize != system_pagesize) + qemuProcessDomainMemoryDefNeedHugepagesPath(mem, system_pagesize)) return true; return false; -- 2.26.2

The aim of this function is to return whether domain definition and/or memory device that user intents to hotplug needs a private path inside cfg->memoryBackingDir. The rule for the memory device that's being hotplug includes checking whether corresponding guest NUMA node needs memoryBackingDir. Well, while the rationale behind makes sense it is not necessary to check for that really - just a few lines above every guest NUMA node was checked exactly for that. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 74667206b9..7eeeeead33 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3929,13 +3929,24 @@ qemuProcessNeedMemoryBackingPath(virDomainDefPtr def, return true; } - if (mem && - mem->model == VIR_DOMAIN_MEMORY_MODEL_DIMM && - (mem->access != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT || - (mem->targetNode >= 0 && - virDomainNumaGetNodeMemoryAccessMode(def->numa, mem->targetNode) - != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT))) - return true; + if (mem) { + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + if (mem->access != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT) { + /* No need to check for access mode on the target node, + * it was checked for in the previous loop. */ + return true; + } + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + /* Backed by user provided path. Not stored in memory + * backing dir anyway. */ + break; + } + } return false; } -- 2.26.2

This commit introduces a new capability that reflects virtio-mem-pci device support in QEMU: QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */ The virtio-mem-pci device was introduced in QEMU 5.1. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml | 1 + 5 files changed, 6 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 447cf77875..f0735774c1 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -622,6 +622,7 @@ VIR_ENUM_IMPL(virQEMUCaps, /* 395 */ "vnc-power-control", "audiodev", + "virtio-mem-pci", ); @@ -1343,6 +1344,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { { "am53c974", QEMU_CAPS_SCSI_AM53C974 }, { "virtio-pmem-pci", QEMU_CAPS_DEVICE_VIRTIO_PMEM_PCI }, { "vhost-user-blk", QEMU_CAPS_DEVICE_VHOST_USER_BLK }, + { "virtio-mem-pci", QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI }, }; diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index ee321df66c..e74c1eeaba 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -602,6 +602,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ /* 395 */ QEMU_CAPS_VNC_POWER_CONTROL, /* -vnc power-control option */ QEMU_CAPS_AUDIODEV, /* -audiodev instead of QEMU_AUDIO_DRV */ + QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml index dd91777c9b..ff946a6864 100644 --- a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml @@ -255,6 +255,7 @@ <flag name='memory-backend-file.x-use-canonical-path-for-ramblock-id'/> <flag name='vnc-opts'/> <flag name='audiodev'/> + <flag name='virtio-mem-pci'/> <version>5001000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml index 17cec44cd8..059146f373 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml @@ -256,6 +256,7 @@ <flag name='memory-backend-file.x-use-canonical-path-for-ramblock-id'/> <flag name='vnc-opts'/> <flag name='audiodev'/> + <flag name='virtio-mem-pci'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml index 6d546268e4..e1145bf8dd 100644 --- a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml @@ -259,6 +259,7 @@ <flag name='migration-param.block-bitmap-mapping'/> <flag name='vnc-power-control'/> <flag name='audiodev'/> + <flag name='virtio-mem-pci'/> <version>5002050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> -- 2.26.2

The virtio-mem is paravirtualized mechanism of adding/removing memory to/from a VM. A virtio-mem-pci device is split into blocks of equal size which are then exposed (all or only a requested portion of them) to the guest kernel to use as regular memory. Therefore, the device has two important attributes: 1) block-size, which defines the size of a block 2) requested-size, which defines how much memory (in bytes) is the device requested to expose to the guest. The 'block-size' is configured on command line and immutable throughout device's lifetime. The 'requested-size' can be set on the command line too, but also is adjustable via monitor. In fact, that is how management software places its requests to change the memory allocation. If it wants to give more memory to the guest it changes 'requested-size' to a bigger value, and if it wants to shrink guest memory it changes the 'requested-size' to a smaller value. Note, value of zero means that guest should release all memory offered by the device. Of course, guest has to cooperate. Therefore, there is a third attribute 'size' which is read only and reflects how much memory the guest still has. This can be different to 'requested-size', obviously. Because of name clash, I've named it 'actualsize' and it is dealt with in future commits (it is a runtime information anyway). In the backend, memory for virtio-mem is backed by usual objects: memory-backend-{ram,file,memfd} and their size puts the cap on the amount of memory that a virtio-mem device can offer to a guest. But we are already able to express this info using <size/> under <target/>. Therefore, we need only two more elements to cover 'block-size' and 'requested-size' attributes. This is the XML I've came up with: <memory model='virtio-mem'> <source> <nodemask>1-3</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>2097152</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> </target> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </memory> I hope by now it is obvious that: 1) 'requested-size' must be an integer multiple of 'block-size', and 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI address. Then there is a limitation that the minimal 'block-size' is transparent huge page size (I'll leave this without explanation). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.rst | 38 +++++++++-- docs/schemas/domaincommon.rng | 11 +++ src/conf/domain_conf.c | 53 ++++++++++++++- src/conf/domain_conf.h | 3 + src/conf/domain_validate.c | 39 +++++++++++ src/qemu/qemu_alias.c | 1 + src/qemu/qemu_command.c | 1 + src/qemu/qemu_domain.c | 10 +++ src/qemu/qemu_domain_address.c | 38 ++++++++--- src/qemu/qemu_process.c | 2 + src/qemu/qemu_validate.c | 8 +++ src/security/security_apparmor.c | 1 + src/security/security_dac.c | 2 + src/security/security_selinux.c | 2 + .../memory-hotplug-virtio-mem.xml | 67 +++++++++++++++++++ ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 + tests/qemuxml2xmltest.c | 1 + 17 files changed, 261 insertions(+), 17 deletions(-) create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index a2ea2690a5..a09ea68888 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -7673,6 +7673,18 @@ Example: usage of the memory devices <size unit='KiB'>524288</size> </target> </memory> + <memory model='virtio-mem'> + <source> + <nodemask>1-3</nodemask> + <pagesize unit='KiB'>2048</pagesize> + </source> + <target> + <size unit='KiB'>2097152</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + </memory> </devices> ... @@ -7680,7 +7692,8 @@ Example: usage of the memory devices Provide ``dimm`` to add a virtual DIMM module to the guest. :since:`Since 1.2.14` Provide ``nvdimm`` model that adds a Non-Volatile DIMM module. :since:`Since 3.2.0` Provide ``virtio-pmem`` model to add a paravirtualized - persistent memory device. :since:`Since 7.1.0` + persistent memory device. :since:`Since 7.1.0` Provide ``virtio-mem`` model + to add paravirtualized memory device. :since:`Since 7.1.0` ``access`` An optional attribute ``access`` ( :since:`since 3.2.0` ) that provides @@ -7703,10 +7716,11 @@ Example: usage of the memory devices allowed only for ``model='nvdimm'`` for pSeries guests. :since:`Since 6.2.0` ``source`` - For model ``dimm`` this element is optional and allows to fine tune the - source of the memory used for the given memory device. If the element is not - provided defaults configured via ``numatune`` are used. If ``dimm`` is - provided, then the following optional elements can be provided as well: + For model ``dimm`` and model ``virtio-mem`` this element is optional and + allows to fine tune the source of the memory used for the given memory + device. If the element is not provided defaults configured via ``numatune`` + are used. If the element is provided, then the following optional elements + can be provided: ``pagesize`` This element can be used to override the default host page size used for @@ -7745,7 +7759,8 @@ Example: usage of the memory devices added memory from the perspective of the guest. The mandatory ``size`` subelement configures the size of the added memory as - a scaled integer. + a scaled integer. For ``virtio-mem`` this represents the maximum possible + size exposed to the guest. The ``node`` subelement configures the guest NUMA node to attach the memory to. The element shall be used only if the guest has NUMA nodes configured. @@ -7772,6 +7787,17 @@ Example: usage of the memory devices so other backend types should use the ``readonly`` element. :since:`Since 5.0.0` + ``block`` + For ``virtio-mem`` only. + The size of an individual block, granularity of division of memory module. + Must be power of two and at least equal to size of a transparent hugepage + (2MiB on x84_64). The default is hypervisor dependent. + + ``requested`` + For ``virtio-mem`` only. + The total size exposed to the guest. Must respect ``block`` granularity + and be smaller or equal to ``size``. + :anchor:`<a id="elementsIommu"/>` IOMMU devices diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index e6db2f5b74..a1e3d96724 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -6407,6 +6407,7 @@ <value>dimm</value> <value>nvdimm</value> <value>virtio-pmem</value> + <value>virtio-mem</value> </choice> </attribute> <optional> @@ -6491,6 +6492,16 @@ <ref name="unsignedInt"/> </element> </optional> + <optional> + <element name="block"> + <ref name="scaledInteger"/> + </element> + </optional> + <optional> + <element name="requested"> + <ref name="scaledInteger"/> + </element> + </optional> <optional> <element name="label"> <element name="size"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 47756ff0be..7cd373547b 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -1358,6 +1358,7 @@ VIR_ENUM_IMPL(virDomainMemoryModel, "dimm", "nvdimm", "virtio-pmem", + "virtio-mem", ); VIR_ENUM_IMPL(virDomainShmemModel, @@ -5464,6 +5465,7 @@ virDomainMemoryDefPostParse(virDomainMemoryDefPtr mem, } break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_DIMM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: @@ -15827,6 +15829,7 @@ virDomainMemorySourceDefParseXML(xmlNodePtr node, switch (def->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt, &def->pagesize, false, false) < 0) return -1; @@ -15893,7 +15896,8 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node, &def->size, true, false) < 0) return -1; - if (def->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { + switch (def->model) { + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: if (virDomainParseMemory("./label/size", "./label/size/@unit", ctxt, &def->labelsize, false, false) < 0) return -1; @@ -15912,6 +15916,23 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node, if (virXPathBoolean("boolean(./readonly)", ctxt)) def->readonly = true; + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (virDomainParseMemory("./block", "./block/@unit", ctxt, + &def->blocksize, false, false) < 0) + return -1; + + if (virDomainParseMemory("./requested", "./requested/@unit", ctxt, + &def->requestedsize, false, false) < 0) + return -1; + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; } return 0; @@ -17730,11 +17751,14 @@ virDomainMemoryFindByDefInternal(virDomainDefPtr def, /* target info -> always present */ if (tmp->model != mem->model || tmp->targetNode != mem->targetNode || - tmp->size != mem->size) + tmp->size != mem->size || + tmp->blocksize != mem->blocksize || + tmp->requestedsize != mem->requestedsize) continue; switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: /* source stuff -> match with device */ if (tmp->pagesize != mem->pagesize) continue; @@ -23301,6 +23325,22 @@ virDomainMemoryDefCheckABIStability(virDomainMemoryDefPtr src, return false; } + if (src->blocksize != dst->blocksize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Target memory device block size '%llu' doesn't match " + "source memory device block size '%llu'"), + dst->blocksize, src->blocksize); + return false; + } + + if (src->requestedsize != dst->requestedsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Target memory device requested size '%llu' doesn't match " + "source memory device requested size '%llu'"), + dst->requestedsize, src->requestedsize); + return false; + } + if (src->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { if (src->labelsize != dst->labelsize) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -27211,6 +27251,7 @@ virDomainMemorySourceDefFormat(virBufferPtr buf, switch (def->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (def->sourceNodes) { if (!(bitmap = virBitmapFormat(def->sourceNodes))) return -1; @@ -27267,6 +27308,14 @@ virDomainMemoryTargetDefFormat(virBufferPtr buf, if (def->readonly) virBufferAddLit(&childBuf, "<readonly/>\n"); + if (def->blocksize) { + virBufferAsprintf(&childBuf, "<block unit='KiB'>%llu</block>\n", + def->blocksize); + + virBufferAsprintf(&childBuf, "<requested unit='KiB'>%llu</requested>\n", + def->requestedsize); + } + virXMLFormatElement(buf, "target", NULL, &childBuf); } diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 54a631853b..1815ba17b7 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2431,6 +2431,7 @@ typedef enum { VIR_DOMAIN_MEMORY_MODEL_DIMM, /* dimm hotpluggable memory device */ VIR_DOMAIN_MEMORY_MODEL_NVDIMM, /* nvdimm memory device */ VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM, /* virtio-pmem memory device */ + VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM, /* virtio-mem memory device */ VIR_DOMAIN_MEMORY_MODEL_LAST } virDomainMemoryModel; @@ -2451,6 +2452,8 @@ struct _virDomainMemoryDef { int targetNode; unsigned long long size; /* kibibytes */ unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */ + unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */ + unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM */ bool readonly; /* valid only for NVDIMM */ /* required for QEMU NVDIMM ppc64 support */ diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c index ceefe18ac4..a13333ae6b 100644 --- a/src/conf/domain_validate.c +++ b/src/conf/domain_validate.c @@ -27,6 +27,7 @@ #include "virconftypes.h" #include "virlog.h" #include "virutil.h" +#include "virhostmem.h" #define VIR_FROM_THIS VIR_FROM_DOMAIN @@ -1636,6 +1637,8 @@ static int virDomainMemoryDefValidate(const virDomainMemoryDef *mem, const virDomainDef *def) { + unsigned long long thpSize; + switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: if (!mem->nvdimmPath) { @@ -1689,6 +1692,42 @@ virDomainMemoryDefValidate(const virDomainMemoryDef *mem, _("virtio-pmem does not support NUMA nodes")); return -1; } + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (mem->requestedsize > mem->size) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("requested size must be smaller than @size")); + return -1; + } + + if (!VIR_IS_POW2(mem->blocksize)) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("block size must be a power of two")); + return -1; + } + + if (virHostMemGetTHPSize(&thpSize) < 0) { + /* We failed to get THP size, fall back to a sane default. On + * almost every architecture the size will be 2MiB, except for some + * funky arches like sparc and m68k. Use 2MiB and refine later if + * somebody complains. */ + thpSize = 2048; + } + + if (mem->blocksize < thpSize) { + virReportError(VIR_ERR_XML_DETAIL, + _("block size too small, must be at least %lluKiB"), + thpSize); + return -1; + } + + if (mem->requestedsize % mem->blocksize != 0) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("requested size must be an integer multiple of block size")); + return -1; + } + break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: break; diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index 1c6f04c0ba..1a330829a3 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -522,6 +522,7 @@ qemuAssignDeviceMemoryAlias(virDomainDefPtr def, case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: prefix = "virtiopmem"; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 5717f7b98d..9754cff61d 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3378,6 +3378,7 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, device = "virtio-pmem-pci"; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index ed2a1481d4..bfecbdcfb4 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -8963,6 +8963,16 @@ qemuDomainDefValidateMemoryHotplugDevice(const virDomainMemoryDef *mem, needsNuma = false; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI && + mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("only 'pci' addresses are supported for the %s device"), + virDomainMemoryModelTypeToString(mem->model)); + return -1; + } + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: return -1; diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 68dbf9e95b..64c57f1cad 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -380,9 +380,18 @@ qemuDomainPrimeVirtioDeviceAddresses(virDomainDefPtr def, } for (i = 0; i < def->nmems; i++) { - if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM && - def->mems[i]->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) - def->mems[i]->info.type = type; + switch (def->mems[i]->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (def->mems[i]->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) + def->mems[i]->info.type = type; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } } if (type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_CCW) { @@ -1011,6 +1020,7 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, case VIR_DOMAIN_DEVICE_MEMORY: switch (dev->data.memory->model) { case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return virtioFlags; case VIR_DOMAIN_MEMORY_MODEL_NONE: @@ -2438,12 +2448,19 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, for (i = 0; i < def->nmems; i++) { virDomainMemoryDefPtr mem = def->mems[i]; - if (mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM || - !virDeviceInfoPCIAddressIsWanted(&mem->info)) - continue; - - if (qemuDomainPCIAddressReserveNextAddr(addrs, &mem->info) < 0) - return -1; + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (virDeviceInfoPCIAddressIsWanted(&mem->info) && + qemuDomainPCIAddressReserveNextAddr(addrs, &mem->info) < 0) + return -1; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } } return 0; @@ -3110,6 +3127,7 @@ qemuDomainAssignMemoryDeviceSlot(virQEMUDriverPtr driver, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return qemuDomainEnsurePCIAddress(vm, &dev, driver); break; @@ -3135,6 +3153,7 @@ qemuDomainReleaseMemoryDeviceSlot(virDomainObjPtr vm, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: qemuDomainReleaseDeviceAddress(vm, &mem->info); break; @@ -3169,6 +3188,7 @@ qemuDomainAssignMemorySlots(virDomainDefPtr def) break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: /* handled in qemuDomainAssignPCIAddresses() */ break; case VIR_DOMAIN_MEMORY_MODEL_NONE: diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 7eeeeead33..6eb61f5667 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3868,6 +3868,7 @@ qemuProcessDomainMemoryDefNeedHugepagesPath(const virDomainMemoryDef *mem, { switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return mem->pagesize && mem->pagesize != system_pagesize; @@ -3932,6 +3933,7 @@ qemuProcessNeedMemoryBackingPath(virDomainDefPtr def, if (mem) { switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (mem->access != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT) { /* No need to check for access mode on the target node, * it was checked for in the previous loop. */ diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 6043f974ce..f20a88fc35 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -4825,6 +4825,14 @@ qemuValidateDomainDeviceDefMemory(virDomainMemoryDefPtr mem, } break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-mem isn't supported by this QEMU binary")); + return -1; + } + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c index e6ecb513b1..5aeaf975a1 100644 --- a/src/security/security_apparmor.c +++ b/src/security/security_apparmor.c @@ -693,6 +693,7 @@ AppArmorSetMemoryLabel(virSecurityManagerPtr mgr, return reload_profile(mgr, def, mem->nvdimmPath, true); case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; } diff --git a/src/security/security_dac.c b/src/security/security_dac.c index 344bd6fc5f..0a6bc6a121 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -1880,6 +1880,7 @@ virSecurityDACRestoreMemoryLabel(virSecurityManagerPtr mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: case VIR_DOMAIN_MEMORY_MODEL_NONE: ret = 0; @@ -2063,6 +2064,7 @@ virSecurityDACSetMemoryLabel(virSecurityManagerPtr mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: case VIR_DOMAIN_MEMORY_MODEL_NONE: ret = 0; diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 9da4e96fa6..60114d6f20 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1584,6 +1584,7 @@ virSecuritySELinuxSetMemoryLabel(virSecurityManagerPtr mgr, case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; } @@ -1611,6 +1612,7 @@ virSecuritySELinuxRestoreMemoryLabel(virSecurityManagerPtr mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: ret = 0; diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml new file mode 100644 index 0000000000..c10528aad8 --- /dev/null +++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml @@ -0,0 +1,67 @@ +<domain type='kvm'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <maxMemory slots='16' unit='KiB'>1099511627776</maxMemory> + <memory unit='KiB'>8388608</memory> + <currentMemory unit='KiB'>8388608</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <cpu mode='custom' match='exact' check='none'> + <model fallback='forbid'>qemu64</model> + <topology sockets='2' dies='1' cores='1' threads='1'/> + <numa> + <cell id='0' cpus='0-1' memory='2095104' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i386</emulator> + <disk type='block' device='disk'> + <driver name='qemu' type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='usb' index='0' model='piix3-uhci'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <audio id='1' type='none'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </memballoon> + <memory model='virtio-mem'> + <target> + <size unit='KiB'>1048576</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>524288</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </memory> + <memory model='virtio-mem'> + <source> + <nodemask>1-3</nodemask> + <pagesize unit='KiB'>2048</pagesize> + </source> + <target> + <size unit='KiB'>2097152</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </memory> + </devices> +</domain> diff --git a/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml b/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml new file mode 120000 index 0000000000..a9d298129c --- /dev/null +++ b/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml @@ -0,0 +1 @@ +../qemuxml2argvdata/memory-hotplug-virtio-mem.xml \ No newline at end of file diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index f25a0902c9..d375cd755c 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -1268,6 +1268,7 @@ mymain(void) QEMU_CAPS_DEVICE_NVDIMM, QEMU_CAPS_LAST); DO_TEST_CAPS_LATEST("memory-hotplug-virtio-pmem"); + DO_TEST_CAPS_LATEST("memory-hotplug-virtio-mem"); DO_TEST("net-udp", NONE); -- 2.26.2

Nothing special is happening here. All important changes were done when for 'virtio-pmem' (adjusting the code to put virtio memory on PCI bus, generating alias using qemuDomainDeviceAliasIndex(). The only bit that might look suspicious is no prealloc for virtio-mem. But if you think about it, the whole purpose of this device is to change amount of memory exposed to guest on the fly. There is no point in locking the whole backend in memory. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_alias.c | 9 +++- src/qemu/qemu_command.c | 12 ++++- ...mory-hotplug-virtio-mem.x86_64-latest.args | 49 +++++++++++++++++++ tests/qemuxml2argvtest.c | 1 + 4 files changed, 68 insertions(+), 3 deletions(-) create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index 1a330829a3..cef02ae916 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -475,8 +475,11 @@ qemuDeviceMemoryGetAliasID(virDomainDefPtr def, size_t i; int maxidx = 0; - /* virtio-pmem goes onto PCI bus and thus DIMM address is not valid */ - if (!oldAlias && mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM) + /* virtio-pmem and virtio-mem go onto PCI bus and thus DIMM address is not + * valid */ + if (!oldAlias && + mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM && + mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) return mem->info.addr.dimm.slot; for (i = 0; i < def->nmems; i++) { @@ -523,6 +526,8 @@ qemuAssignDeviceMemoryAlias(virDomainDefPtr def, prefix = "virtiopmem"; break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + prefix = "virtiomem"; + break; case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 9754cff61d..ba6e149e27 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3150,7 +3150,9 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps, } else if (useHugepage) { if (qemuGetDomainHupageMemPath(priv->driver, def, pagesize, &memPath) < 0) return -1; - prealloc = true; + /* For virtio-mem backed by hugepages we don't need prealloc. */ + if (mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + prealloc = true; } else { /* We can have both pagesize and mem source. If that's the case, * prefer hugepages as those are more specific. */ @@ -3379,6 +3381,9 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + device = "virtio-mem-pci"; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: @@ -3395,6 +3400,11 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, if (mem->labelsize) virBufferAsprintf(&buf, "label-size=%llu,", mem->labelsize * 1024); + if (mem->blocksize) { + virBufferAsprintf(&buf, "block-size=%llu,", mem->blocksize * 1024); + virBufferAsprintf(&buf, "requested-size=%llu,", mem->requestedsize * 1024); + } + if (mem->uuid) { char uuidstr[VIR_UUID_STRING_BUFLEN]; diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args new file mode 100644 index 0000000000..85aa3c4555 --- /dev/null +++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args @@ -0,0 +1,49 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +/usr/bin/qemu-system-i386 \ +-name guest=QEMUGuest1,debug-threads=on \ +-S \ +-object secret,id=masterKey0,format=raw,\ +file=/tmp/lib/domain--1-QEMUGuest1/master-key.aes \ +-machine pc,accel=kvm,usb=off,dump-guest-core=off \ +-cpu qemu64 \ +-m size=2095104k,slots=16,maxmem=1099511627776k \ +-overcommit mem-lock=off \ +-smp 2,sockets=2,dies=1,cores=1,threads=1 \ +-object memory-backend-ram,id=ram-node0,size=2145386496 \ +-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \ +-object memory-backend-ram,id=memvirtiomem0,size=1073741824 \ +-device virtio-mem-pci,node=0,block-size=2097152,requested-size=536870912,\ +memdev=memvirtiomem0,id=virtiomem0,bus=pci.0,addr=0x2 \ +-object memory-backend-file,id=memvirtiomem1,\ +mem-path=/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1,size=2147483648,\ +host-nodes=1-3,policy=bind \ +-device virtio-mem-pci,node=0,block-size=2097152,requested-size=1073741824,\ +memdev=memvirtiomem1,id=virtiomem1,bus=pci.0,addr=0x3 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-boot strict=on \ +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ +-blockdev '{"driver":"host_device","filename":"/dev/HostVG/QEMUGuest1",\ +"node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw",\ +"file":"libvirt-1-storage"}' \ +-device ide-hd,bus=ide.0,unit=0,drive=libvirt-1-format,id=ide0-0-0,bootindex=1 \ +-audiodev id=audio1,driver=none \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,\ +resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 29054ba168..b421e00559 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -3163,6 +3163,7 @@ mymain(void) QEMU_CAPS_DEVICE_NVDIMM, QEMU_CAPS_LAST); DO_TEST_CAPS_LATEST("memory-hotplug-virtio-pmem"); + DO_TEST_CAPS_LATEST("memory-hotplug-virtio-mem"); DO_TEST("machine-aeskeywrap-on-caps", QEMU_CAPS_AES_KEY_WRAP, -- 2.26.2

As advertised in one of previous commits, we want to be able to change 'requested-size' attribute of virtio-mem on the fly. This commit does exactly that. Changing anything else is checked for and forbidden. Once guest has changed the allocation, QEMU emits an event which we will use to track the allocation. In the next commit. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 36 ++++++++ src/conf/domain_conf.h | 4 + src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 172 ++++++++++++++++++++++++++++++++++- src/qemu/qemu_hotplug.c | 18 ++++ src/qemu/qemu_hotplug.h | 5 + src/qemu/qemu_monitor.c | 13 +++ src/qemu/qemu_monitor.h | 4 + src/qemu/qemu_monitor_json.c | 15 +++ src/qemu/qemu_monitor_json.h | 5 + 10 files changed, 272 insertions(+), 1 deletion(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 7cd373547b..f6d071dc17 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -17813,6 +17813,42 @@ virDomainMemoryFindInactiveByDef(virDomainDefPtr def, } +/** + * virDomainMemoryFindByDeviceInfo: + * @def: domain defintion + * @info: device info to match + * + * For given domain definition @def find <memory/> device with + * matching address and matching device alias (if set in @info, + * otherwise ignored). + * + * Returns: device if found, + * NULL otherwise. + */ +virDomainMemoryDefPtr +virDomainMemoryFindByDeviceInfo(virDomainDefPtr def, + virDomainDeviceInfoPtr info) +{ + size_t i; + + for (i = 0; i < def->nmems; i++) { + virDomainMemoryDefPtr tmp = def->mems[i]; + + if (!virDomainDeviceInfoAddressIsEqual(&tmp->info, info)) + continue; + + /* alias, if present */ + if (info->alias && + STRNEQ_NULLABLE(tmp->info.alias, info->alias)) + continue; + + return tmp; + } + + return NULL; +} + + /** * virDomainMemoryInsert: * diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 1815ba17b7..169887e1a8 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -3741,6 +3741,10 @@ int virDomainMemoryFindByDef(virDomainDefPtr def, virDomainMemoryDefPtr mem) int virDomainMemoryFindInactiveByDef(virDomainDefPtr def, virDomainMemoryDefPtr mem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; +virDomainMemoryDefPtr +virDomainMemoryFindByDeviceInfo(virDomainDefPtr dev, + virDomainDeviceInfoPtr info) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; int virDomainShmemDefInsert(virDomainDefPtr def, virDomainShmemDefPtr shmem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 929880aab7..a317992f1e 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -499,6 +499,7 @@ virDomainMemballoonModelTypeFromString; virDomainMemballoonModelTypeToString; virDomainMemoryDefFree; virDomainMemoryFindByDef; +virDomainMemoryFindByDeviceInfo; virDomainMemoryFindInactiveByDef; virDomainMemoryInsert; virDomainMemoryModelTypeToString; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 16c5ccae45..c055fc6e55 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7129,6 +7129,165 @@ qemuDomainChangeDiskLive(virDomainObjPtr vm, return 0; } + +static bool +qemuDomainChangeMemoryLiveValidateChange(const virDomainMemoryDef *oldDef, + const virDomainMemoryDef *newDef) +{ + /* The only thing that is allowed to change is 'requestedsize' for virtio + * model. */ + if (oldDef->model != newDef->model) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory model from '%s' to '%s'"), + virDomainMemoryModelTypeToString(oldDef->model), + virDomainMemoryModelTypeToString(newDef->model)); + return false; + } + + if (oldDef->access != newDef->access) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory access from '%s' to '%s'"), + virDomainMemoryAccessTypeToString(oldDef->access), + virDomainMemoryAccessTypeToString(newDef->access)); + return false; + } + + if (oldDef->discard != newDef->discard) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory discard from '%s' to '%s'"), + virTristateBoolTypeToString(oldDef->discard), + virTristateBoolTypeToString(newDef->discard)); + return false; + } + + if (!virBitmapEqual(oldDef->sourceNodes, + newDef->sourceNodes)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory source nodes")); + return false; + } + + if (oldDef->pagesize != newDef->pagesize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory pagesize from '%llu' to '%llu'"), + oldDef->pagesize, + newDef->pagesize); + return false; + } + + if (STRNEQ_NULLABLE(oldDef->nvdimmPath, newDef->nvdimmPath)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory path from '%s' to '%s'"), + NULLSTR(oldDef->nvdimmPath), + NULLSTR(newDef->nvdimmPath)); + return false; + } + + if (oldDef->alignsize != newDef->alignsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory align size from '%llu' to '%llu'"), + oldDef->alignsize, newDef->alignsize); + return false; + } + + if (oldDef->nvdimmPmem != newDef->nvdimmPmem) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory pmem from '%d' to '%d'"), + oldDef->nvdimmPmem, newDef->nvdimmPmem); + return false; + } + + if (oldDef->targetNode != newDef->targetNode) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory targetNode from '%d' to '%d'"), + oldDef->targetNode, newDef->targetNode); + return false; + } + + if (oldDef->size != newDef->size) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory size from '%llu' to '%llu'"), + oldDef->size, newDef->size); + return false; + } + + if (oldDef->labelsize != newDef->labelsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory label size from '%llu' to '%llu'"), + oldDef->labelsize, newDef->labelsize); + return false; + } + if (oldDef->blocksize != newDef->blocksize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory block size from '%llu' to '%llu'"), + oldDef->blocksize, newDef->blocksize); + return false; + } + + /* requestedsize can change */ + + if (oldDef->readonly != newDef->readonly) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory pmem flag")); + return false; + } + + if ((oldDef->uuid || newDef->uuid) && + !(oldDef->uuid && newDef->uuid && + memcmp(oldDef->uuid, newDef->uuid, VIR_UUID_BUFLEN) == 0)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory UUID")); + return false; + } + + switch (oldDef->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory of model '%s'"), + virDomainMemoryModelTypeToString(oldDef->model)); + return false; + break; + } + + return true; +} + + +static int +qemuDomainChangeMemoryLive(virQEMUDriverPtr driver G_GNUC_UNUSED, + virDomainObjPtr vm, + virDomainDeviceDefPtr dev) +{ + virDomainMemoryDefPtr newDef = dev->data.memory; + virDomainMemoryDefPtr oldDef = NULL; + + oldDef = virDomainMemoryFindByDeviceInfo(vm->def, &dev->data.memory->info); + if (!oldDef) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("memory '%s' not found"), dev->data.memory->info.alias); + return -1; + } + + if (!qemuDomainChangeMemoryLiveValidateChange(oldDef, newDef)) + return -1; + + if (qemuDomainChangeMemoryRequestedSize(driver, vm, + newDef->info.alias, + newDef->requestedsize) < 0) + return -1; + + oldDef->requestedsize = newDef->requestedsize; + return 0; +} + + static int qemuDomainUpdateDeviceLive(virDomainObjPtr vm, virDomainDeviceDefPtr dev, @@ -7170,6 +7329,18 @@ qemuDomainUpdateDeviceLive(virDomainObjPtr vm, ret = qemuDomainChangeNet(driver, vm, dev); break; + case VIR_DOMAIN_DEVICE_MEMORY: + oldDev.data.memory = virDomainMemoryFindByDeviceInfo(vm->def, &dev->data.memory->info); + if (oldDev.data.memory) { + if (virDomainDefCompatibleDevice(vm->def, dev, &oldDev, + VIR_DOMAIN_DEVICE_ACTION_UPDATE, + true) < 0) + return -1; + } + + ret = qemuDomainChangeMemoryLive(driver, vm, dev); + break; + case VIR_DOMAIN_DEVICE_FS: case VIR_DOMAIN_DEVICE_INPUT: case VIR_DOMAIN_DEVICE_SOUND: @@ -7185,7 +7356,6 @@ qemuDomainUpdateDeviceLive(virDomainObjPtr vm, case VIR_DOMAIN_DEVICE_HOSTDEV: case VIR_DOMAIN_DEVICE_CONTROLLER: case VIR_DOMAIN_DEVICE_REDIRDEV: - case VIR_DOMAIN_DEVICE_MEMORY: case VIR_DOMAIN_DEVICE_CHR: case VIR_DOMAIN_DEVICE_NONE: case VIR_DOMAIN_DEVICE_TPM: diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index a66354426d..bfd5fb80d7 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -6714,3 +6714,21 @@ qemuDomainSetVcpuInternal(virQEMUDriverPtr driver, virBitmapFree(livevcpus); return ret; } + + +int +qemuDomainChangeMemoryRequestedSize(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *alias, + unsigned long long requestedsize) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + int rc; + + qemuDomainObjEnterMonitor(driver, vm); + rc = qemuMonitorChangeMemoryRequestedSize(priv->mon, alias, requestedsize); + if (qemuDomainObjExitMonitor(driver, vm) < 0) + return -1; + + return rc; +} diff --git a/src/qemu/qemu_hotplug.h b/src/qemu/qemu_hotplug.h index 6287c5b5e8..9e551a1f82 100644 --- a/src/qemu/qemu_hotplug.h +++ b/src/qemu/qemu_hotplug.h @@ -160,3 +160,8 @@ int qemuHotplugAttachDBusVMState(virQEMUDriverPtr driver, int qemuHotplugRemoveDBusVMState(virQEMUDriverPtr driver, virDomainObjPtr vm, qemuDomainAsyncJob asyncJob); + +int qemuDomainChangeMemoryRequestedSize(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *alias, + unsigned long long requestedsize); diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index f6cd9d9eda..7e7d2a376e 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -4742,3 +4742,16 @@ qemuMonitorTransactionBackup(virJSONValuePtr actions, return qemuMonitorJSONTransactionBackup(actions, device, jobname, target, bitmap, syncmode); } + + +int +qemuMonitorChangeMemoryRequestedSize(qemuMonitorPtr mon, + const char *alias, + unsigned long long requestedsize) +{ + VIR_DEBUG("alias=%s requestedsize=%llu", alias, requestedsize); + + QEMU_CHECK_MONITOR(mon); + + return qemuMonitorJSONChangeMemoryRequestedSize(mon, alias, requestedsize); +} diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index d25c26343a..51c5732c93 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1526,3 +1526,7 @@ qemuMonitorTransactionBackup(virJSONValuePtr actions, const char *target, const char *bitmap, qemuMonitorTransactionBackupSyncMode syncmode); + +int qemuMonitorChangeMemoryRequestedSize(qemuMonitorPtr mon, + const char *alias, + unsigned long long requestedsize); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index ad62e24cfc..78519bcd57 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -9480,3 +9480,18 @@ qemuMonitorJSONGetCPUMigratable(qemuMonitorPtr mon, return virJSONValueGetBoolean(virJSONValueObjectGet(reply, "return"), migratable); } + + +int +qemuMonitorJSONChangeMemoryRequestedSize(qemuMonitorPtr mon, + const char *alias, + unsigned long long requestedsize) +{ + g_autofree char *path = g_strdup_printf("/machine/peripheral/%s", alias); + qemuMonitorJSONObjectProperty prop = { + .type = QEMU_MONITOR_OBJECT_PROPERTY_ULONG, + .val.ul = requestedsize * 1024, /* monitor needs bytes */ + }; + + return qemuMonitorJSONSetObjectProperty(mon, path, "requested-size", &prop); +} diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index 3dd1eb24c7..5e802f54e6 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -711,3 +711,8 @@ int qemuMonitorJSONSetDBusVMStateIdList(qemuMonitorPtr mon, int qemuMonitorJSONGetCPUMigratable(qemuMonitorPtr mon, bool *migratable); + +int +qemuMonitorJSONChangeMemoryRequestedSize(qemuMonitorPtr mon, + const char *alias, + unsigned long long requestedsize); -- 2.26.2

As advertised in previous commit, this event is delivered to us when virtio-mem module changes the allocation inside the guest. It comes with one attribute - size - which holds the new size of the virtio-mem (well, allocated size), in bytes. Mind you, this is not necessarily the same number as 'requested size'. It almost certainly will be when sizing the memory up, but it might not be when sizing the memory down - the guest kernel might be unable to free some blocks. This actual size is reported in the domain XML as an output element only. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.rst | 7 ++++++ docs/schemas/domaincommon.rng | 5 +++++ src/conf/domain_conf.c | 26 ++++++++++++++++++++-- src/conf/domain_conf.h | 8 +++++++ src/libvirt_private.syms | 1 + src/qemu/qemu_domain.c | 3 +++ src/qemu/qemu_domain.h | 1 + src/qemu/qemu_driver.c | 31 ++++++++++++++++++++++++++ src/qemu/qemu_monitor.c | 24 ++++++++++++++++++++ src/qemu/qemu_monitor.h | 20 +++++++++++++++++ src/qemu/qemu_monitor_json.c | 24 ++++++++++++++++++++ src/qemu/qemu_process.c | 42 +++++++++++++++++++++++++++++++++++ 12 files changed, 190 insertions(+), 2 deletions(-) diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index a09ea68888..80adda5b6e 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -7683,6 +7683,7 @@ Example: usage of the memory devices <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> + <actual unit='KiB'>524288</requested> </target> </memory> </devices> @@ -7798,6 +7799,12 @@ Example: usage of the memory devices The total size exposed to the guest. Must respect ``block`` granularity and be smaller or equal to ``size``. + ``actual`` + Active XML for ``virtio-mem`` model may contain ``actual`` element that + reflects the actual size of the corresponding virtio memory device. The + element is formatted into live XML and never parsed, i.e. it is + output-only element. + :anchor:`<a id="elementsIommu"/>` IOMMU devices diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index a1e3d96724..4ce0fdcd49 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -6502,6 +6502,11 @@ <ref name="scaledInteger"/> </element> </optional> + <optional> + <element name="actual"> + <ref name="scaledInteger"/> + </element> + </optional> <optional> <element name="label"> <element name="size"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index f6d071dc17..c6efc26357 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -17849,6 +17849,23 @@ virDomainMemoryFindByDeviceInfo(virDomainDefPtr def, } +virDomainMemoryDefPtr +virDomainMemoryFindByDeviceAlias(virDomainDefPtr def, + const char *alias) +{ + size_t i; + + for (i = 0; i < def->nmems; i++) { + virDomainMemoryDefPtr tmp = def->mems[i]; + + if (STREQ_NULLABLE(tmp->info.alias, alias)) + return tmp; + } + + return NULL; +} + + /** * virDomainMemoryInsert: * @@ -27328,7 +27345,8 @@ virDomainMemorySourceDefFormat(virBufferPtr buf, static void virDomainMemoryTargetDefFormat(virBufferPtr buf, - virDomainMemoryDefPtr def) + virDomainMemoryDefPtr def, + unsigned int flags) { g_auto(virBuffer) childBuf = VIR_BUFFER_INIT_CHILD(buf); @@ -27350,6 +27368,10 @@ virDomainMemoryTargetDefFormat(virBufferPtr buf, virBufferAsprintf(&childBuf, "<requested unit='KiB'>%llu</requested>\n", def->requestedsize); + if (!(flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE)) { + virBufferAsprintf(&childBuf, "<actual unit='KiB'>%llu</actual>\n", + def->actualsize); + } } virXMLFormatElement(buf, "target", NULL, &childBuf); @@ -27382,7 +27404,7 @@ virDomainMemoryDefFormat(virBufferPtr buf, if (virDomainMemorySourceDefFormat(buf, def) < 0) return -1; - virDomainMemoryTargetDefFormat(buf, def); + virDomainMemoryTargetDefFormat(buf, def, flags); virDomainDeviceInfoFormat(buf, &def->info, flags); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 169887e1a8..e3a0d19e4f 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2454,6 +2454,9 @@ struct _virDomainMemoryDef { unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */ unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */ unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM */ + unsigned long long actualsize; /* kibibytes, valid for VIRTIO_MEM and + active domain only, only to report never + parse */ bool readonly; /* valid only for NVDIMM */ /* required for QEMU NVDIMM ppc64 support */ @@ -3746,6 +3749,11 @@ virDomainMemoryFindByDeviceInfo(virDomainDefPtr dev, virDomainDeviceInfoPtr info) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; +virDomainMemoryDefPtr +virDomainMemoryFindByDeviceAlias(virDomainDefPtr def, + const char *alias) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; + int virDomainShmemDefInsert(virDomainDefPtr def, virDomainShmemDefPtr shmem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; bool virDomainShmemDefEquals(virDomainShmemDefPtr src, virDomainShmemDefPtr dst) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index a317992f1e..038b17ce08 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -499,6 +499,7 @@ virDomainMemballoonModelTypeFromString; virDomainMemballoonModelTypeToString; virDomainMemoryDefFree; virDomainMemoryFindByDef; +virDomainMemoryFindByDeviceAlias; virDomainMemoryFindByDeviceInfo; virDomainMemoryFindInactiveByDef; virDomainMemoryInsert; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index bfecbdcfb4..621ae3b694 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11058,6 +11058,9 @@ qemuProcessEventFree(struct qemuProcessEvent *event) case QEMU_PROCESS_EVENT_JOB_STATUS_CHANGE: virObjectUnref(event->data); break; + case QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE: + qemuMonitorMemoryDeviceSizeChangeFree(event->data); + break; case QEMU_PROCESS_EVENT_PR_DISCONNECT: case QEMU_PROCESS_EVENT_LAST: break; diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 949307229b..3d693d840d 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -441,6 +441,7 @@ typedef enum { QEMU_PROCESS_EVENT_PR_DISCONNECT, QEMU_PROCESS_EVENT_RDMA_GID_STATUS_CHANGED, QEMU_PROCESS_EVENT_GUEST_CRASHLOADED, + QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE, QEMU_PROCESS_EVENT_LAST } qemuProcessEventType; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index c055fc6e55..006c8fd1a4 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4316,6 +4316,34 @@ processGuestCrashloadedEvent(virQEMUDriverPtr driver, } +static void +processMemoryDeviceSizeChange(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuMonitorMemoryDeviceSizeChangePtr info) +{ + virDomainMemoryDefPtr mem = NULL; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) + return; + + if (!virDomainObjIsActive(vm)) { + VIR_DEBUG("Domain is not running"); + goto endjob; + } + + mem = virDomainMemoryFindByDeviceAlias(vm->def, info->devAlias); + if (!mem) { + VIR_DEBUG("Memory device '%s' not found", info->devAlias); + goto endjob; + } + + mem->actualsize = VIR_DIV_UP(info->size, 1024); + + endjob: + qemuDomainObjEndJob(driver, vm); +} + + static void qemuProcessEventHandler(void *data, void *opaque) { struct qemuProcessEvent *processEvent = data; @@ -4365,6 +4393,9 @@ static void qemuProcessEventHandler(void *data, void *opaque) case QEMU_PROCESS_EVENT_GUEST_CRASHLOADED: processGuestCrashloadedEvent(driver, vm); break; + case QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE: + processMemoryDeviceSizeChange(driver, vm, processEvent->data); + break; case QEMU_PROCESS_EVENT_LAST: break; } diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 7e7d2a376e..6f05a0eaab 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1444,6 +1444,20 @@ qemuMonitorEmitSpiceMigrated(qemuMonitorPtr mon) } +int +qemuMonitorEmitMemoryDeviceSizeChange(qemuMonitorPtr mon, + const char *devAlias, + unsigned long long size) +{ + int ret = -1; + VIR_DEBUG("mon=%p, devAlias='%s', size=%llu", mon, devAlias, size); + + QEMU_MONITOR_CALLBACK(mon, ret, domainMemoryDeviceSizeChange, mon->vm, devAlias, size); + + return ret; +} + + int qemuMonitorEmitMemoryFailure(qemuMonitorPtr mon, qemuMonitorEventMemoryFailurePtr mfp) @@ -4416,6 +4430,16 @@ qemuMonitorEventRdmaGidStatusFree(qemuMonitorRdmaGidStatusPtr info) } +void +qemuMonitorMemoryDeviceSizeChangeFree(qemuMonitorMemoryDeviceSizeChangePtr info) +{ + if (!info) + return; + + g_free(info->devAlias); +} + + int qemuMonitorSetWatchdogAction(qemuMonitorPtr mon, const char *action) diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 51c5732c93..4c352125c9 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -108,6 +108,14 @@ struct _qemuMonitorRdmaGidStatus { }; +typedef struct _qemuMonitorMemoryDeviceSizeChange qemuMonitorMemoryDeviceSizeChange; +typedef qemuMonitorMemoryDeviceSizeChange *qemuMonitorMemoryDeviceSizeChangePtr; +struct _qemuMonitorMemoryDeviceSizeChange { + char *devAlias; + unsigned long long size; +}; + + typedef enum { QEMU_MONITOR_JOB_TYPE_UNKNOWN, /* internal value, not exposed by qemu */ QEMU_MONITOR_JOB_TYPE_COMMIT, @@ -153,6 +161,7 @@ struct _qemuMonitorJobInfo { char *qemuMonitorGuestPanicEventInfoFormatMsg(qemuMonitorEventPanicInfoPtr info); void qemuMonitorEventPanicInfoFree(qemuMonitorEventPanicInfoPtr info); void qemuMonitorEventRdmaGidStatusFree(qemuMonitorRdmaGidStatusPtr info); +void qemuMonitorMemoryDeviceSizeChangeFree(qemuMonitorMemoryDeviceSizeChangePtr info); typedef void (*qemuMonitorDestroyCallback)(qemuMonitorPtr mon, virDomainObjPtr vm, @@ -374,6 +383,12 @@ typedef int (*qemuMonitorDomainMemoryFailureCallback)(qemuMonitorPtr mon, qemuMonitorEventMemoryFailurePtr mfp, void *opaque); +typedef int (*qemuMonitorDomainMemoryDeviceSizeChange)(qemuMonitorPtr mon, + virDomainObjPtr vm, + const char *alias, + unsigned long long size, + void *opaque); + typedef struct _qemuMonitorCallbacks qemuMonitorCallbacks; typedef qemuMonitorCallbacks *qemuMonitorCallbacksPtr; struct _qemuMonitorCallbacks { @@ -411,6 +426,7 @@ struct _qemuMonitorCallbacks { qemuMonitorDomainRdmaGidStatusChangedCallback domainRdmaGidStatusChanged; qemuMonitorDomainGuestCrashloadedCallback domainGuestCrashloaded; qemuMonitorDomainMemoryFailureCallback domainMemoryFailure; + qemuMonitorDomainMemoryDeviceSizeChange domainMemoryDeviceSizeChange; }; qemuMonitorPtr qemuMonitorOpen(virDomainObjPtr vm, @@ -507,6 +523,10 @@ int qemuMonitorEmitSerialChange(qemuMonitorPtr mon, bool connected); int qemuMonitorEmitSpiceMigrated(qemuMonitorPtr mon); +int qemuMonitorEmitMemoryDeviceSizeChange(qemuMonitorPtr mon, + const char *devAlias, + unsigned long long size); + int qemuMonitorEmitMemoryFailure(qemuMonitorPtr mon, qemuMonitorEventMemoryFailurePtr mfp); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 78519bcd57..ba1c63a37b 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -113,6 +113,7 @@ static void qemuMonitorJSONHandleDumpCompleted(qemuMonitorPtr mon, virJSONValueP static void qemuMonitorJSONHandlePRManagerStatusChanged(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleRdmaGidStatusChanged(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleMemoryFailure(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleMemoryDeviceSizeChange(qemuMonitorPtr mon, virJSONValuePtr data); typedef struct { const char *type; @@ -133,6 +134,7 @@ static qemuEventHandler eventHandlers[] = { { "GUEST_CRASHLOADED", qemuMonitorJSONHandleGuestCrashloaded, }, { "GUEST_PANICKED", qemuMonitorJSONHandleGuestPanic, }, { "JOB_STATUS_CHANGE", qemuMonitorJSONHandleJobStatusChange, }, + { "MEMORY_DEVICE_SIZE_CHANGE", qemuMonitorJSONHandleMemoryDeviceSizeChange, }, { "MEMORY_FAILURE", qemuMonitorJSONHandleMemoryFailure, }, { "MIGRATION", qemuMonitorJSONHandleMigrationStatus, }, { "MIGRATION_PASS", qemuMonitorJSONHandleMigrationPass, }, @@ -1329,6 +1331,28 @@ qemuMonitorJSONHandleSpiceMigrated(qemuMonitorPtr mon, } +static void +qemuMonitorJSONHandleMemoryDeviceSizeChange(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + const char *name; + unsigned long long size; + + if (!(name = virJSONValueObjectGetString(data, "id"))) { + VIR_WARN("missing device alias in MEMORY_DEVICE_SIZE_CHANGE event"); + return; + } + + if (virJSONValueObjectGetNumberUlong(data, "size", &size) < 0) { + VIR_WARN("missing new size for '%s' in MEMORY_DEVICE_SIZE_CHANGE event", name); + return; + } + + + qemuMonitorEmitMemoryDeviceSizeChange(mon, name, size); +} + + static void qemuMonitorJSONHandleMemoryFailure(qemuMonitorPtr mon, virJSONValuePtr data) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 6eb61f5667..eb9fab8e9a 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1940,6 +1940,47 @@ qemuProcessHandleMemoryFailure(qemuMonitorPtr mon G_GNUC_UNUSED, } +static int +qemuProcessHandleMemoryDeviceSizeChange(qemuMonitorPtr mon G_GNUC_UNUSED, + virDomainObjPtr vm, + const char *devAlias, + unsigned long long size, + void *opaque) +{ + virQEMUDriverPtr driver = opaque; + struct qemuProcessEvent *processEvent = NULL; + qemuMonitorMemoryDeviceSizeChangePtr info = NULL; + int ret = -1; + + virObjectLock(vm); + + VIR_DEBUG("Memory device '%s' changed size to '%llu' in domain '%s'", + devAlias, size, vm->def->name); + + info = g_new0(qemuMonitorMemoryDeviceSizeChange, 1); + info->devAlias = g_strdup(devAlias); + info->size = size; + + processEvent = g_new0(struct qemuProcessEvent, 1); + processEvent->eventType = QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE; + processEvent->vm = virObjectRef(vm); + processEvent->data = g_steal_pointer(&info); + + if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) { + qemuProcessEventFree(processEvent); + virObjectUnref(vm); + goto cleanup; + } + + processEvent = NULL; + ret = 0; + cleanup: + qemuProcessEventFree(processEvent); + virObjectUnlock(vm); + return ret; +} + + static qemuMonitorCallbacks monitorCallbacks = { .eofNotify = qemuProcessHandleMonitorEOF, .errorNotify = qemuProcessHandleMonitorError, @@ -1973,6 +2014,7 @@ static qemuMonitorCallbacks monitorCallbacks = { .domainRdmaGidStatusChanged = qemuProcessHandleRdmaGidStatusChanged, .domainGuestCrashloaded = qemuProcessHandleGuestCrashloaded, .domainMemoryFailure = qemuProcessHandleMemoryFailure, + .domainMemoryDeviceSizeChange = qemuProcessHandleMemoryDeviceSizeChange, }; static void -- 2.26.2

New event is introduced that is emitted whenever guest acknowledges allocation change request of a virtio-mem. The aim is to let applications know when that happens, because changes in allocation are not synchronous with issuing the request. Under the hood, the event is emitted whenever QEMU emits MEMORY_DEVICE_SIZE_CHANGE event. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- examples/c/misc/event-test.c | 17 ++++++ include/libvirt/libvirt-domain.h | 23 ++++++++ src/conf/domain_event.c | 84 +++++++++++++++++++++++++++++ src/conf/domain_event.h | 10 ++++ src/libvirt_private.syms | 2 + src/qemu/qemu_driver.c | 6 +++ src/remote/remote_daemon_dispatch.c | 30 +++++++++++ src/remote/remote_driver.c | 32 +++++++++++ src/remote/remote_protocol.x | 15 +++++- src/remote_protocol-structs | 7 +++ tools/virsh-domain.c | 20 +++++++ 11 files changed, 245 insertions(+), 1 deletion(-) diff --git a/examples/c/misc/event-test.c b/examples/c/misc/event-test.c index 76d4f3f6e8..0b77914715 100644 --- a/examples/c/misc/event-test.c +++ b/examples/c/misc/event-test.c @@ -978,6 +978,22 @@ myDomainEventMemoryFailureCallback(virConnectPtr conn G_GNUC_UNUSED, } +static int +myDomainEventMemoryDeviceSizeChangeCallback(virConnectPtr conn G_GNUC_UNUSED, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque G_GNUC_UNUSED) +{ + /* Casts to uint64_t to work around mingw not knowing %lld */ + printf("%s EVENT: Domain %s(%d) memory device size change: " + "alias: '%s' new size %" PRIu64 "'\n", + __func__, virDomainGetName(dom), virDomainGetID(dom), + alias, (uint64_t)size); + return 0; +} + + static int myDomainEventMigrationIterationCallback(virConnectPtr conn G_GNUC_UNUSED, virDomainPtr dom, @@ -1109,6 +1125,7 @@ struct domainEventData domainEvents[] = { DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_METADATA_CHANGE, myDomainEventMetadataChangeCallback), DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD, myDomainEventBlockThresholdCallback), DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_MEMORY_FAILURE, myDomainEventMemoryFailureCallback), + DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE, myDomainEventMemoryDeviceSizeChangeCallback), }; struct storagePoolEventData { diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 8011cf9fe1..2c449b4f31 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -4651,6 +4651,28 @@ typedef void (*virConnectDomainEventMemoryFailureCallback)(virConnectPtr conn, unsigned int flags, void *opaque); +/** + * virConnectDomainEventMemoryDeviceSizeChangeCallback: + * @conn: connection object + * @dom: domain on which the event occurred + * @alias: memory device alias + * @size: new actual size of memory device (in KiB) + * @opaque: application specified data + * + * The callback occurs when the guest acknowledges request to change size of + * memory device (so far only virtio-mem model supports this). The @size then + * reflects the new amount of guest visible memory (in kibibytes). + * + * The callback signature to use when registering for an event of type + * VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE with + * virConnectDomainEventRegisterAny(). + */ +typedef void (*virConnectDomainEventMemoryDeviceSizeChangeCallback)(virConnectPtr conn, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque); + /** * VIR_DOMAIN_EVENT_CALLBACK: @@ -4695,6 +4717,7 @@ typedef enum { VIR_DOMAIN_EVENT_ID_METADATA_CHANGE = 23, /* virConnectDomainEventMetadataChangeCallback */ VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD = 24, /* virConnectDomainEventBlockThresholdCallback */ VIR_DOMAIN_EVENT_ID_MEMORY_FAILURE = 25, /* virConnectDomainEventMemoryFailureCallback */ + VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE = 26, /* virConnectDomainEventMemoryDeviceSizeChangeCallback */ # ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_EVENT_ID_LAST diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c index 726a792dae..31f9cc34f1 100644 --- a/src/conf/domain_event.c +++ b/src/conf/domain_event.c @@ -58,6 +58,7 @@ static virClassPtr virDomainEventDeviceRemovalFailedClass; static virClassPtr virDomainEventMetadataChangeClass; static virClassPtr virDomainEventBlockThresholdClass; static virClassPtr virDomainEventMemoryFailureClass; +static virClassPtr virDomainEventMemoryDeviceSizeChangeClass; static void virDomainEventDispose(void *obj); static void virDomainEventLifecycleDispose(void *obj); @@ -81,6 +82,7 @@ static void virDomainEventDeviceRemovalFailedDispose(void *obj); static void virDomainEventMetadataChangeDispose(void *obj); static void virDomainEventBlockThresholdDispose(void *obj); static void virDomainEventMemoryFailureDispose(void *obj); +static void virDomainEventMemoryDeviceSizeChangeDispose(void *obj); static void virDomainEventDispatchDefaultFunc(virConnectPtr conn, @@ -299,6 +301,15 @@ struct _virDomainEventMemoryFailure { typedef struct _virDomainEventMemoryFailure virDomainEventMemoryFailure; typedef virDomainEventMemoryFailure *virDomainEventMemoryFailurePtr; +struct _virDomainEventMemoryDeviceSizeChange { + virDomainEvent parent; + + char *alias; + unsigned long long size; +}; +typedef struct _virDomainEventMemoryDeviceSizeChange virDomainEventMemoryDeviceSizeChange; +typedef virDomainEventMemoryDeviceSizeChange *virDomainEventMemoryDeviceSizeChangePtr; + static int virDomainEventsOnceInit(void) { @@ -346,6 +357,8 @@ virDomainEventsOnceInit(void) return -1; if (!VIR_CLASS_NEW(virDomainEventMemoryFailure, virDomainEventClass)) return -1; + if (!VIR_CLASS_NEW(virDomainEventMemoryDeviceSizeChange, virDomainEventClass)) + return -1; return 0; } @@ -562,6 +575,14 @@ virDomainEventMemoryFailureDispose(void *obj) VIR_DEBUG("obj=%p", event); } +static void +virDomainEventMemoryDeviceSizeChangeDispose(void *obj) +{ + virDomainEventMemoryDeviceSizeChangePtr event = obj; + VIR_DEBUG("obj=%p", event); + + g_free(event->alias); +} static void * virDomainEventNew(virClassPtr klass, @@ -1686,6 +1707,57 @@ virDomainEventMemoryFailureNewFromDom(virDomainPtr dom, recipient, action, flags); } + +static virObjectEventPtr +virDomainEventMemoryDeviceSizeChangeNew(int id, + const char *name, + unsigned char *uuid, + const char *alias, + unsigned long long size) +{ + virDomainEventMemoryDeviceSizeChangePtr ev; + + if (virDomainEventsInitialize() < 0) + return NULL; + + if (!(ev = virDomainEventNew(virDomainEventMemoryDeviceSizeChangeClass, + VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE, + id, name, uuid))) + return NULL; + + ev->alias = g_strdup(alias); + ev->size = size; + + return (virObjectEventPtr)ev; +} + + +virObjectEventPtr +virDomainEventMemoryDeviceSizeChangeNewFromObj(virDomainObjPtr obj, + const char *alias, + unsigned long long size) +{ + return virDomainEventMemoryDeviceSizeChangeNew(obj->def->id, + obj->def->name, + obj->def->uuid, + alias, + size); +} + + +virObjectEventPtr +virDomainEventMemoryDeviceSizeChangeNewFromDom(virDomainPtr dom, + const char *alias, + unsigned long long size) +{ + return virDomainEventMemoryDeviceSizeChangeNew(dom->id, + dom->name, + dom->uuid, + alias, + size); +} + + static void virDomainEventDispatchDefaultFunc(virConnectPtr conn, virObjectEventPtr event, @@ -1982,6 +2054,18 @@ virDomainEventDispatchDefaultFunc(virConnectPtr conn, goto cleanup; } + case VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE: + { + virDomainEventMemoryDeviceSizeChangePtr memoryDeviceSizeChangeEvent; + + memoryDeviceSizeChangeEvent = (virDomainEventMemoryDeviceSizeChangePtr)event; + ((virConnectDomainEventMemoryDeviceSizeChangeCallback)cb)(conn, dom, + memoryDeviceSizeChangeEvent->alias, + memoryDeviceSizeChangeEvent->size, + cbopaque); + goto cleanup; + } + case VIR_DOMAIN_EVENT_ID_LAST: break; } diff --git a/src/conf/domain_event.h b/src/conf/domain_event.h index 13a1c56ce1..717d1c7307 100644 --- a/src/conf/domain_event.h +++ b/src/conf/domain_event.h @@ -267,6 +267,16 @@ virDomainEventMemoryFailureNewFromDom(virDomainPtr dom, int action, unsigned int flags); +virObjectEventPtr +virDomainEventMemoryDeviceSizeChangeNewFromObj(virDomainObjPtr obj, + const char *alias, + unsigned long long size); + +virObjectEventPtr +virDomainEventMemoryDeviceSizeChangeNewFromDom(virDomainPtr dom, + const char *alias, + unsigned long long size); + int virDomainEventStateRegister(virConnectPtr conn, virObjectEventStatePtr state, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 038b17ce08..547b42b9e2 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -714,6 +714,8 @@ virDomainEventLifecycleNew; virDomainEventLifecycleNewFromDef; virDomainEventLifecycleNewFromDom; virDomainEventLifecycleNewFromObj; +virDomainEventMemoryDeviceSizeChangeNewFromDom; +virDomainEventMemoryDeviceSizeChangeNewFromObj; virDomainEventMemoryFailureNewFromDom; virDomainEventMemoryFailureNewFromObj; virDomainEventMetadataChangeNewFromDom; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 006c8fd1a4..6bcbc75fbc 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4322,6 +4322,7 @@ processMemoryDeviceSizeChange(virQEMUDriverPtr driver, qemuMonitorMemoryDeviceSizeChangePtr info) { virDomainMemoryDefPtr mem = NULL; + virObjectEventPtr event = NULL; if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) return; @@ -4339,8 +4340,13 @@ processMemoryDeviceSizeChange(virQEMUDriverPtr driver, mem->actualsize = VIR_DIV_UP(info->size, 1024); + event = virDomainEventMemoryDeviceSizeChangeNewFromObj(vm, + info->devAlias, + mem->actualsize); + endjob: qemuDomainObjEndJob(driver, vm); + virObjectEventStateQueue(driver->domainEventState, event); } diff --git a/src/remote/remote_daemon_dispatch.c b/src/remote/remote_daemon_dispatch.c index 9700dba450..4af82dc809 100644 --- a/src/remote/remote_daemon_dispatch.c +++ b/src/remote/remote_daemon_dispatch.c @@ -1333,6 +1333,35 @@ remoteRelayDomainEventMemoryFailure(virConnectPtr conn, } +static int +remoteRelayDomainEventMemoryDeviceSizeChange(virConnectPtr conn, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque) +{ + daemonClientEventCallbackPtr callback = opaque; + remote_domain_event_memory_device_size_change_msg data; + + if (callback->callbackID < 0 || + !remoteRelayDomainEventCheckACL(callback->client, conn, dom)) + return -1; + + /* build return data */ + memset(&data, 0, sizeof(data)); + data.callbackID = callback->callbackID; + data.alias = g_strdup(alias); + data.size = size; + make_nonnull_domain(&data.dom, dom); + + remoteDispatchObjectEventSend(callback->client, remoteProgram, + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE, + (xdrproc_t)xdr_remote_domain_event_memory_device_size_change_msg, + &data); + return 0; +} + + static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventLifecycle), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventReboot), @@ -1360,6 +1389,7 @@ static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMetadataChange), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventBlockThreshold), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMemoryFailure), + VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMemoryDeviceSizeChange), }; G_STATIC_ASSERT(G_N_ELEMENTS(domainEventCallbacks) == VIR_DOMAIN_EVENT_ID_LAST); diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 31868269b1..b2394ec2a8 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -410,6 +410,10 @@ remoteDomainBuildEventMemoryFailure(virNetClientProgramPtr prog, void *evdata, void *opaque); static void +remoteDomainBuildEventMemoryDeviceSizeChange(virNetClientProgramPtr prog, + virNetClientPtr client, + void *evdata, void *opaque); +static void remoteConnectNotifyEventConnectionClosed(virNetClientProgramPtr prog G_GNUC_UNUSED, virNetClientPtr client G_GNUC_UNUSED, void *evdata, void *opaque); @@ -624,6 +628,10 @@ static virNetClientProgramEvent remoteEvents[] = { remoteDomainBuildEventMemoryFailure, sizeof(remote_domain_event_memory_failure_msg), (xdrproc_t)xdr_remote_domain_event_memory_failure_msg }, + { REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE, + remoteDomainBuildEventMemoryDeviceSizeChange, + sizeof(remote_domain_event_memory_device_size_change_msg), + (xdrproc_t)xdr_remote_domain_event_memory_device_size_change_msg }, }; static void @@ -5463,6 +5471,30 @@ remoteDomainBuildEventMemoryFailure(virNetClientProgramPtr prog G_GNUC_UNUSED, } +static void +remoteDomainBuildEventMemoryDeviceSizeChange(virNetClientProgramPtr prog G_GNUC_UNUSED, + virNetClientPtr client G_GNUC_UNUSED, + void *evdata, void *opaque) +{ + virConnectPtr conn = opaque; + remote_domain_event_memory_device_size_change_msg *msg = evdata; + struct private_data *priv = conn->privateData; + virDomainPtr dom; + virObjectEventPtr event = NULL; + + if (!(dom = get_nonnull_domain(conn, msg->dom))) + return; + + event = virDomainEventMemoryDeviceSizeChangeNewFromDom(dom, + msg->alias, + msg->size); + + virObjectUnref(dom); + + virObjectEventStateQueueRemote(priv->eventState, event, msg->callbackID); +} + + static int remoteStreamSend(virStreamPtr st, const char *data, diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index d3724bc305..d154bde7db 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -3812,6 +3812,13 @@ struct remote_domain_get_messages_ret { }; +struct remote_domain_event_memory_device_size_change_msg { + int callbackID; + remote_nonnull_domain dom; + remote_nonnull_string alias; + unsigned hyper size; +}; + /*----- Protocol. -----*/ /* Define the program number, protocol version and procedure numbers here. */ @@ -6733,5 +6740,11 @@ enum remote_procedure { * @generate: none * @acl: domain:read */ - REMOTE_PROC_DOMAIN_GET_MESSAGES = 426 + REMOTE_PROC_DOMAIN_GET_MESSAGES = 426, + + /** + * @generate: both + * @acl: none + */ + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE = 427 }; diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index c0c034ac6a..37120be91b 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -3172,6 +3172,12 @@ struct remote_domain_get_messages_ret { remote_nonnull_string * msgs_val; } msgs; }; +struct remote_domain_event_memory_device_size_change_msg { + int callbackID; + remote_nonnull_domain dom; + remote_nonnull_string alias; + uint64_t size; +}; enum remote_procedure { REMOTE_PROC_CONNECT_OPEN = 1, REMOTE_PROC_CONNECT_CLOSE = 2, @@ -3599,4 +3605,5 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_AUTHORIZED_SSH_KEYS_GET = 424, REMOTE_PROC_DOMAIN_AUTHORIZED_SSH_KEYS_SET = 425, REMOTE_PROC_DOMAIN_GET_MESSAGES = 426, + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE = 427, }; diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index f8f36ce361..a4c5542ada 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -13494,6 +13494,24 @@ virshEventMemoryFailurePrint(virConnectPtr conn G_GNUC_UNUSED, } +static void +virshEventMemoryDeviceSizeChangePrint(virConnectPtr conn G_GNUC_UNUSED, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque) +{ + g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER; + + virBufferAsprintf(&buf, + _("event 'memory-device-size-change' for domain '%s':\n" + "alias: %s\nsize: %llu\n"), + virDomainGetName(dom), alias, size); + + virshEventPrint(opaque, &buf); +} + + virshDomainEventCallback virshDomainEventCallbacks[] = { { "lifecycle", VIR_DOMAIN_EVENT_CALLBACK(virshEventLifecyclePrint), }, @@ -13545,6 +13563,8 @@ virshDomainEventCallback virshDomainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(virshEventBlockThresholdPrint), }, { "memory-failure", VIR_DOMAIN_EVENT_CALLBACK(virshEventMemoryFailurePrint), }, + { "memory-device-size-change", + VIR_DOMAIN_EVENT_CALLBACK(virshEventMemoryDeviceSizeChangePrint), }, }; G_STATIC_ASSERT(VIR_DOMAIN_EVENT_ID_LAST == G_N_ELEMENTS(virshDomainEventCallbacks)); -- 2.26.2

If the QEMU driver restarts it loses the track of the actual size of virtio-mem (because it's runtime type of information and thus not stored in XML) and therefore, we have to refresh it when reconnecting to the domain monitor. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 37 +++++++++++++++++++---- src/qemu/qemu_monitor.h | 3 ++ src/qemu/qemu_monitor_json.c | 58 ++++++++++++++++++++++-------------- src/qemu/qemu_process.c | 3 ++ 4 files changed, 73 insertions(+), 28 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 621ae3b694..87b110ed73 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -8273,9 +8273,21 @@ qemuDomainUpdateMemoryDeviceInfo(virQEMUDriverPtr driver, return -1; } - /* if qemu doesn't support the info request, just carry on */ - if (rc == -2) + /* If qemu doesn't support the info request, just carry on, unless we + * really need it. */ + if (rc == -2) { + for (i = 0; i < vm->def->nmems; i++) { + virDomainMemoryDefPtr mem = vm->def->mems[i]; + + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("qemu did not return info on vitio-mem device")); + return -1; + } + } + return 0; + } if (rc < 0) return -1; @@ -8290,9 +8302,24 @@ qemuDomainUpdateMemoryDeviceInfo(virQEMUDriverPtr driver, if (!(dimm = virHashLookup(meminfo, mem->info.alias))) continue; - mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM; - mem->info.addr.dimm.slot = dimm->slot; - mem->info.addr.dimm.base = dimm->address; + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + mem->actualsize = VIR_DIV_UP(dimm->size, 1024); + break; + + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM; + mem->info.addr.dimm.slot = dimm->slot; + mem->info.addr.dimm.base = dimm->address; + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + /* nada */ + break; + } } virHashFree(meminfo); diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 4c352125c9..ed1cc092b2 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1396,10 +1396,13 @@ typedef struct _qemuMonitorMemoryDeviceInfo qemuMonitorMemoryDeviceInfo; typedef qemuMonitorMemoryDeviceInfo *qemuMonitorMemoryDeviceInfoPtr; struct _qemuMonitorMemoryDeviceInfo { + /* For pc-dimm */ unsigned long long address; unsigned int slot; bool hotplugged; bool hotpluggable; + /* For virtio-mem */ + unsigned long long size; /* in bytes */ }; int qemuMonitorGetMemoryDeviceInfo(qemuMonitorPtr mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index ba1c63a37b..5229db71ab 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -8233,7 +8233,6 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitorPtr mon, virJSONValuePtr cmd; virJSONValuePtr reply = NULL; virJSONValuePtr data = NULL; - qemuMonitorMemoryDeviceInfoPtr meminfo = NULL; size_t i; if (!(cmd = qemuMonitorJSONMakeCommand("query-memory-devices", NULL))) @@ -8254,6 +8253,9 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitorPtr mon, for (i = 0; i < virJSONValueArraySize(data); i++) { virJSONValuePtr elem = virJSONValueArrayGet(data, i); + g_autofree qemuMonitorMemoryDeviceInfoPtr meminfo = NULL; + virJSONValuePtr dimminfo; + const char *devalias; const char *type; if (!(type = virJSONValueObjectGetString(elem, "type"))) { @@ -8263,26 +8265,26 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitorPtr mon, goto cleanup; } + if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("query-memory-devices reply data doesn't " + "contain enum data")); + goto cleanup; + } + + /* While 'id' attribute is marked as optional in QEMU's QAPI + * specification, Libvirt always sets it. Thus we can fail if not + * present. */ + if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("dimm memory info data is missing 'id'")); + goto cleanup; + } + + meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1); + /* dimm memory devices */ if (STREQ(type, "dimm")) { - virJSONValuePtr dimminfo; - const char *devalias; - - if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("query-memory-devices reply data doesn't " - "contain enum data")); - goto cleanup; - } - - if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("dimm memory info data is missing 'id'")); - goto cleanup; - } - - meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1); - if (virJSONValueObjectGetNumberUlong(dimminfo, "addr", &meminfo->address) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -8313,17 +8315,27 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitorPtr mon, } - if (virHashAddEntry(info, devalias, meminfo) < 0) + } else if (STREQ(type, "virtio-mem")) { + if (virJSONValueObjectGetNumberUlong(dimminfo, "size", + &meminfo->size) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("malformed/missing slot in dimm memory info")); goto cleanup; - - meminfo = NULL; + } + } else { + /* type not handled yet */ + continue; } + + if (virHashAddEntry(info, devalias, meminfo) < 0) + goto cleanup; + + meminfo = NULL; } ret = 0; cleanup: - VIR_FREE(meminfo); virJSONValueFree(cmd); virJSONValueFree(reply); return ret; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index eb9fab8e9a..dc301371ef 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -8542,6 +8542,9 @@ qemuProcessReconnect(void *opaque) qemuDomainVcpuPersistOrder(obj->def); + if (qemuDomainUpdateMemoryDeviceInfo(driver, obj, QEMU_ASYNC_JOB_NONE) < 0) + goto error; + if (qemuProcessDetectIOThreadPIDs(driver, obj, QEMU_ASYNC_JOB_NONE) < 0) goto error; -- 2.26.2

Reporting how much memory is exposed to the guest happens under <currentMemory/> which is taken from def->mem.cur_balloon. The reported amount should account for both balloon size and the sum of @actualsize of all virtio-mems. For instance, if domain has 4GiB via balloon and additional 2GiB via virtio-mem, then the domain XML should report 6GiB. The same applies for domain statistics. The way to achieve this is to account for either balloon or virtio-mem when the size of the other is changed, e.g. on balloon change we have to add all @actualsize (for non virtio-mem these will be zero, so the check for memory model is needless, but makes it more obvious what's happening), and vice versa. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_driver.c | 4 ++++ src/qemu/qemu_process.c | 17 +++++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 6bcbc75fbc..d5891e32c3 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4323,6 +4323,7 @@ processMemoryDeviceSizeChange(virQEMUDriverPtr driver, { virDomainMemoryDefPtr mem = NULL; virObjectEventPtr event = NULL; + unsigned long long balloon; if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) return; @@ -4338,7 +4339,10 @@ processMemoryDeviceSizeChange(virQEMUDriverPtr driver, goto endjob; } + balloon = vm->def->mem.cur_balloon - mem->actualsize; mem->actualsize = VIR_DIV_UP(info->size, 1024); + balloon += mem->actualsize; + vm->def->mem.cur_balloon = balloon; event = virDomainEventMemoryDeviceSizeChangeNewFromObj(vm, info->devAlias, diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index dc301371ef..f6da5afb48 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1253,10 +1253,19 @@ qemuProcessHandleBalloonChange(qemuMonitorPtr mon G_GNUC_UNUSED, virQEMUDriverPtr driver = opaque; virObjectEventPtr event = NULL; g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); + size_t i; virObjectLock(vm); event = virDomainEventBalloonChangeNewFromObj(vm, actual); + /* We want the balloon size stored in domain definition to account for the + * actual size of virtio-mem too. */ + VIR_DEBUG("balloon size before fix is %lld", actual); + for (i = 0; i < vm->def->nmems; i++) { + if (vm->def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + actual += vm->def->mems[i]->actualsize; + } + VIR_DEBUG("Updating balloon from %lld to %lld kb", vm->def->mem.cur_balloon, actual); vm->def->mem.cur_balloon = actual; @@ -2451,6 +2460,7 @@ qemuProcessRefreshBalloonState(virQEMUDriverPtr driver, int asyncJob) { unsigned long long balloon; + size_t i; int rc; /* if no ballooning is available, the current size equals to the current @@ -2467,6 +2477,13 @@ qemuProcessRefreshBalloonState(virQEMUDriverPtr driver, if (qemuDomainObjExitMonitor(driver, vm) < 0 || rc < 0) return -1; + /* We want the balloon size stored in domain definition to account for the + * actual size of virtio-mem too. */ + VIR_DEBUG("balloon size before fix is %lld", balloon); + for (i = 0; i < vm->def->nmems; i++) { + if (vm->def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + balloon += vm->def->mems[i]->actualsize; + } vm->def->mem.cur_balloon = balloon; return 0; -- 2.26.2

The qemuDomainSetMemoryFlags() allows for memballoon (<currentMemory/>) changes for both active and inactive guests. And just before doing any change, we have to make sure that the new size is not greater than the total memory (<memory/>). However, the total memory includes not only the regular guest memory, but also sum of maximum sizes of all virtio-mems (in fact all memory devices for that matter). But virtio-mem devices are modified differently (via virDomainUpdateDevice()) and thus the upper limit for new balloon size has to be lowered. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_driver.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index d5891e32c3..235cc8566e 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2432,12 +2432,28 @@ static int qemuDomainSetMemoryFlags(virDomainPtr dom, unsigned long newmem, } else { /* resize the current memory */ unsigned long oldmax = 0; + size_t i; - if (def) + if (def) { oldmax = virDomainDefGetMemoryTotal(def); + + /* While virtio-mem is regular mem from guest POV, it can't be + * modified through this API. */ + for (i = 0; i < def->nmems; i++) { + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + oldmax -= def->mems[i]->size; + } + } + if (persistentDef) { - if (!oldmax || oldmax > virDomainDefGetMemoryTotal(persistentDef)) + if (!oldmax || oldmax > virDomainDefGetMemoryTotal(persistentDef)) { oldmax = virDomainDefGetMemoryTotal(persistentDef); + + for (i = 0; i < persistentDef->nmems; i++) { + if (persistentDef->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + oldmax -= persistentDef->mems[i]->size; + } + } } if (newmem > oldmax) { -- 2.26.2

New 'update-memory-device' command is introduced which aims on making it user friendly to change <memory/> device. So far I just need to change <requested/> so I'm introducing --requested-size only; but the idea is that this is extensible for other cases too. For instance, want to change <myElement/>? A new --my-element argument can be easily introduced. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/manpages/virsh.rst | 30 ++++++++ tools/virsh-domain.c | 149 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 179 insertions(+) diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst index 9ff31a0160..85aff0b1ee 100644 --- a/docs/manpages/virsh.rst +++ b/docs/manpages/virsh.rst @@ -4892,6 +4892,36 @@ results as some fields may be autogenerated and thus match devices other than expected. +update-memory-device +-------------------- + +**Syntax:** + +:: + + update-memory-device domain [--print-xml] [--alias alias] + [[--live] [--config] | [--current]] + [--requested-size size] + +This command finds ``<memory/>`` device inside given *domain*, changes +requested values and passes updated device XML to daemon. If *--print-xml* is +specified then the device is not changed, but the updated device XML is printed +to stdout. If there are more than one ``<memory/>`` devices in *domain* use +*--alias* to select the desired one. + +If *--live* is specified, affect a running domain. +If *--config* is specified, affect the next startup of a persistent guest. +If *--current* is specified, it is equivalent to either *--live* or +*--config*, depending on the current state of the guest. +Both *--live* and *--config* flags may be given, but *--current* is +exclusive. Not specifying any flag is the same as specifying *--current*. + +If *--requested-size* is specified then ``<requested/>`` under memory target is +changed to requested *size* (as scaled integer, see ``NOTES`` above). It +defaults to kibibytes if no suffix is provided. The option is valid only for +``virtio-mem`` memory device model. + + change-media ------------ diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index a4c5542ada..e2acd4f543 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -9129,6 +9129,149 @@ cmdSetmaxmem(vshControl *ctl, const vshCmd *cmd) return ret; } + +/* + * "update-memory-device" command + */ +static const vshCmdInfo info_update_memory_device[] = { + {.name = "help", + .data = N_("update memory device of a domain") + }, + {.name = "desc", + .data = N_("Update values of a memory device of a domain") + }, + {.name = NULL} +}; + +static const vshCmdOptDef opts_update_memory_device[] = { + VIRSH_COMMON_OPT_DOMAIN_FULL(0), + VIRSH_COMMON_OPT_DOMAIN_CONFIG, + VIRSH_COMMON_OPT_DOMAIN_LIVE, + VIRSH_COMMON_OPT_DOMAIN_CURRENT, + {.name = "print-xml", + .type = VSH_OT_BOOL, + .help = N_("print updated memory device XML instead of executing the change") + }, + {.name = "alias", + .type = VSH_OT_STRING, + .completer = virshDomainDeviceAliasCompleter, + .help = N_("memory device alias"), + }, + {.name = "requested-size", + .type = VSH_OT_INT, + .help = N_("new value of <requested/> size, as scaled integer (default KiB)") + }, + {.name = NULL} +}; + +static int +virshGetUpdatedMemoryXML(char **updatedMemoryXML, + vshControl *ctl, + const vshCmd *cmd, + virDomainPtr dom, + unsigned int flags) +{ + const char *alias = NULL; + g_autoptr(xmlDoc) doc = NULL; + g_autoptr(xmlXPathContext) ctxt = NULL; + g_autofree char *xpath = NULL; + int nmems; + g_autofree xmlNodePtr *mems = NULL; + unsigned int domainXMLFlags = 0; + + if (flags & VIR_DOMAIN_AFFECT_CONFIG) + domainXMLFlags |= VIR_DOMAIN_XML_INACTIVE; + + if (virshDomainGetXMLFromDom(ctl, dom, domainXMLFlags, &doc, &ctxt) < 0) + return -1; + + if (vshCommandOptStringReq(ctl, cmd, "alias", &alias) < 0) + return -1; + + if (alias) { + xpath = g_strdup_printf("/domain/devices/memory[./alias/@name='%s']", alias); + } else { + xpath = g_strdup("/domain/devices/memory"); + } + + nmems = virXPathNodeSet(xpath, ctxt, &mems); + if (nmems < 0) { + vshSaveLibvirtError(); + return -1; + } else if (nmems == 0) { + vshError(ctl, _("no memory device found")); + return -1; + } else if (nmems > 1) { + vshError(ctl, _("multiple memory devices found, use --alias to select one")); + return -1; + } + + ctxt->node = mems[0]; + + if (vshCommandOptBool(cmd, "requested-size")) { + xmlNodePtr requestedSizeNode; + g_autofree char *kibibytesStr = NULL; + unsigned long long bytes = 0; + unsigned long kibibytes = 0; + + if (vshCommandOptScaledInt(ctl, cmd, "requested-size", &bytes, 1024, ULLONG_MAX) < 0) + return -1; + kibibytes = VIR_DIV_UP(bytes, 1024); + + requestedSizeNode = virXPathNode("./target/requested", ctxt); + + if (!requestedSizeNode) { + vshError(ctl, _("virtio-mem device is missing <requested/>")); + return -1; + } + + kibibytesStr = g_strdup_printf("%lu", kibibytes); + xmlNodeSetContent(requestedSizeNode, BAD_CAST kibibytesStr); + } + + if (!(*updatedMemoryXML = virXMLNodeToString(doc, mems[0]))) { + vshSaveLibvirtError(); + return -1; + } + + return 0; +} + +static bool +cmdUpdateMemoryDevice(vshControl *ctl, const vshCmd *cmd) +{ + g_autoptr(virshDomain) dom = NULL; + bool config = vshCommandOptBool(cmd, "config"); + bool live = vshCommandOptBool(cmd, "live"); + bool current = vshCommandOptBool(cmd, "current"); + g_autofree char *updatedMemoryXML = NULL; + unsigned int flags = VIR_DOMAIN_AFFECT_CURRENT; + + VSH_EXCLUSIVE_OPTIONS_VAR(current, live); + VSH_EXCLUSIVE_OPTIONS_VAR(current, config); + + if (config) + flags |= VIR_DOMAIN_AFFECT_CONFIG; + if (live) + flags |= VIR_DOMAIN_AFFECT_LIVE; + + if (!(dom = virshCommandOptDomain(ctl, cmd, NULL))) + return false; + + if (virshGetUpdatedMemoryXML(&updatedMemoryXML, ctl, cmd, dom, flags) < 0) + return false; + + if (vshCommandOptBool(cmd, "print-xml")) { + vshPrint(ctl, "%s", updatedMemoryXML); + } else { + if (virDomainUpdateDeviceFlags(dom, updatedMemoryXML, flags) < 0) + return false; + } + + return true; +} + + /* * "memtune" command */ @@ -15007,6 +15150,12 @@ const vshCmdDef domManagementCmds[] = { .info = info_update_device, .flags = 0 }, + {.name = "update-memory-device", + .handler = cmdUpdateMemoryDevice, + .opts = opts_update_memory_device, + .info = info_update_memory_device, + .flags = 0 + }, {.name = "vcpucount", .handler = cmdVcpucount, .opts = opts_vcpucount, -- 2.26.2

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- NEWS.rst | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/NEWS.rst b/NEWS.rst index bd40373a80..a58b04edb0 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -13,6 +13,13 @@ v7.2.0 (unreleased) * **New features** + * Introduce virtio-mem ``<memory/>`` model + + New virtio-mem model is introduced for ``<memory/>`` device which is a + paravirtualized mechanism of adding/removing memory to/from a VM. Use + ``virDomainUpdateDeviceFlags()`` API to adjust amount of memory or ``virsh + update-memory-device`` for convenience. + * **Improvements** * **Bug fixes** -- 2.26.2

This commit adds new memorydevices.rst page which should serve all models of memory devices. Yet, I'm documenting virtio-mem quirks only. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 150 +++++++++++++++++++++++++++++++++++ docs/kbase/meson.build | 1 + 3 files changed, 155 insertions(+) create mode 100644 docs/kbase/memorydevices.rst diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst index 532804fe05..45450bf33b 100644 --- a/docs/kbase/index.rst +++ b/docs/kbase/index.rst @@ -46,6 +46,10 @@ Usage `PCI topology <../pci-addresses.html>`__ Addressing schemes for PCI devices +`Memory devices <memorydevices.html>`__ + Memory devices and their use + + Internals / Debugging --------------------- diff --git a/docs/kbase/memorydevices.rst b/docs/kbase/memorydevices.rst new file mode 100644 index 0000000000..5c4c45a77f --- /dev/null +++ b/docs/kbase/memorydevices.rst @@ -0,0 +1,150 @@ +============== +Memory devices +============== + +.. contents:: + +Basics +====== + +Memory devices can be divided into two families: DIMMs and NVDIMMs. The former +is typical RAM memory: it's volatile and thus its contents doesn't survive +reboots nor guest shut downs and power ons. The latter retains its contents +across reboots or power outages. + +In Libvirt, there are two models for DIMMs: + +* ``dimm`` model: + + :: + + <memory model='dimm'> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-mem`` model: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>1048576</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>524288</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </memory> + +Then there are two models for NVDIMMs: + +* ``nvidmm`` model: + + :: + + <memory model='nvdimm'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-pmem`` model: + + :: + + <memory model='virtio-pmem' access='shared'> + <source> + <path>/tmp/virtio_pmem</path> + </source> + <target> + <size unit='KiB'>524288</size> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </memory> + + +Please note that (maybe somewhat surprisingly) virtio models go onto PCI bus +instead of DIMM slots. + +Furthermore, DIMMs can have ``<source/>`` element which configures backend for +devices. For NVDIMMs the element is mandatory and reflects where the contents +is saved. + +See `memory devices documentation <../formatdomain.html#elementsMemory>`_. + +``virtio-mem`` model +==================== + +The ``virtio-mem`` model can be viewed as revised memory balloon. It offers +adding and removing memory (without the actual hotplug of the device). It +solves problems that memory balloon can't solve on its own and thus is more +flexible than DIMM + balloon solution. ``virtio-mem`` is NUMA aware, and thus +memory can be inflated/deflated only for a subset of guest NUMA nodes. Also, +it works with chunks that are either exposed to guest or taken back from it. + +See https://virtio-mem.gitlab.io/ + +Under the hood, ``virtio-mem`` device is split into chunks of equal size which +are then exposed to the guest. Either all of them or only a portion depending +on user's request. Therefore there are three important sizes for +``virtio-mem``. All are to be found under ``<target/>`` element: + +#. The maximum size the device can ever offer, exposed under ``<size/>`` +#. The size of a single block, exposed under ``<block/>`` +#. The current size exposed to the guest, exposed under ``<requested/>`` + +For instance, the following example the maximum size is 4GiB, the block size is +2MiB and only 1GiB should be exposed to the guest: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>4194304</size> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + </memory> + +Please note that ``<requested/>`` must be an integer multiple of ``<block/>`` +size or zero (no blocks exposed to the guest) and has to be less or equal to +``<size/>`` (all blocks exposed to the guest). Furthermore, QEMU recommends the +``<block/>`` size to be as big as a Transparent Huge Page (usually 2MiB). + +To change the size exposed to the guest, users should pass memory device XML +with nothing but ``<requested/>`` changed into the +``virDomainUpdateDeviceFlags()`` API. For user's convenience this can be done +via virsh too: + + :: + + # virsh update-memory-device $dom --requested size 2GiB + +If there are two or more ``<memory/>`` devices then ``--alias`` shall be used +to tell virsh which memory device should be updated. + +For running guests there is fourth size that can be found under ``<target/>``: + + :: + + <actual unit='KiB'>2097152</actual> + +The ``<actual/>`` reflects the actual size used by the guest. In general it +can differ from ``<requested/>``. Reasons include guest kernel missing +``virtio-mem`` module and thus being unable to take offered memory, or guest +kernel being unable to free memory. Since ``<actual/>`` only reports size to +users, the element is never parsed. It is formatted only into live XML. + +Since changing actual allocation requires cooperation with guest kernel, +requests for change are not instant. Therefore, libvirt emits +``VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE`` event whenever actual +allocation changed. diff --git a/docs/kbase/meson.build b/docs/kbase/meson.build index 7b4e7abbd3..7f6adba915 100644 --- a/docs/kbase/meson.build +++ b/docs/kbase/meson.build @@ -9,6 +9,7 @@ docs_kbase_files = [ 'locking-lockd', 'locking', 'locking-sanlock', + 'memorydevices', 'migrationinternals', 'qemu-passthrough-security', 'rpm-deployment', -- 2.26.2

On 17.03.21 12:57, Michal Privoznik wrote:
v3 of:
https://listman.redhat.com/archives/libvir-list/2021-February/msg00961.html
diff to v2: - Dropped code that forbade use of virtio-mem and memballoon at the same time; - This meant that I had to adjust memory accounting, qemuDomainSetMemoryFlags() - see patches 11/15 and 12/15 which are new. - Fixed small nits raised by Peter in his review of v2
Hi Michal, do you have a branch somewhere that I can easily checkout to play with it/test it? I can spot that at least patch #2 and #3 are already upstream. The remaining patches don't seem to apply cleanly on current upstream/master. Thanks! -- Thanks, David / dhildenb

On 6/17/21 11:44 AM, David Hildenbrand wrote:
On 17.03.21 12:57, Michal Privoznik wrote:
v3 of:
https://listman.redhat.com/archives/libvir-list/2021-February/msg00961.html
diff to v2: - Dropped code that forbade use of virtio-mem and memballoon at the same time; - This meant that I had to adjust memory accounting, qemuDomainSetMemoryFlags() - see patches 11/15 and 12/15 which are new. - Fixed small nits raised by Peter in his review of v2
Hi Michal, do you have a branch somewhere that I can easily checkout to play with it/test it?
Yes: https://gitlab.com/MichalPrivoznik/libvirt/-/tree/virtio_mem_v4 There were some comments in Peter's review and I really should fix my code according to them and merge/send v4. Michal

On 17.06.21 13:18, Michal Prívozník wrote:
On 6/17/21 11:44 AM, David Hildenbrand wrote:
On 17.03.21 12:57, Michal Privoznik wrote:
v3 of:
https://listman.redhat.com/archives/libvir-list/2021-February/msg00961.html
diff to v2: - Dropped code that forbade use of virtio-mem and memballoon at the same time; - This meant that I had to adjust memory accounting, qemuDomainSetMemoryFlags() - see patches 11/15 and 12/15 which are new. - Fixed small nits raised by Peter in his review of v2
Hi Michal, do you have a branch somewhere that I can easily checkout to play with it/test it?
Yes:
https://gitlab.com/MichalPrivoznik/libvirt/-/tree/virtio_mem_v4
There were some comments in Peter's review and I really should fix my code according to them and merge/send v4.
Michal
Thanks! I started with a single virtio-mem device. 1. NUMA requirement Right now, one can really only configure "maxMemory" with NUMA specified, otherwise there will be "error: unsupported configuration: At least one numa node has to be configured when enabling memory hotplug". I recall this is a limitation of older QEMU which would not create ACPI SRAT tables otherwise. In QEMU, this is no longer the case. As soon as "maxmem" is specified on the QEMU cmdline, we fake a single NUMA node: hw/core/numa.c:numa_complete_configuration() "Enable NUMA implicitly by adding a new NUMA node automatically" -> m->auto_enable_numa_with_memdev / mc->auto_enable_numa_with_memhp m->auto_enable_numa_with_memdev (slots=0) is set since 5.1 on x86-64 and arm64 m->auto_enable_numa_with_memhp (slots>0) is set since 2.11 on x86-64 and 4.1 on arm64 So in theory, with newer QEMU on x86-64 and arm64 we could drop that limitation in libvirt (might require some changes eventually regarding the "node" specification handling). ppc64 shouldn't care as there is no ACPI. 2. "virsh update-memory-device" My domain looks something like: <domain type='kvm' id='3'> <name>Fedora 34</name> ... <maxMemory slots='16' unit='KiB'>20971520</maxMemory> <memory unit='KiB'>20971520</memory> <currentMemory unit='KiB'>4194304</currentMemory> <vcpu placement='static'>16</vcpu> ... <cpu mode='custom' match='exact' check='full'> <numa> <cell id='0' cpus='0-15' memory='4194304' unit='KiB'/> </numa> </cpu> ... <devices> ... <memory model='virtio-mem'> <target> <size unit='KiB'>16777216</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>524288</requested> <actual unit='KiB'>0</actual> </target> <alias name='virtiomem0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </memory> </devices> ... It starts just fine and I can query the current properties # virsh dumpxml "Fedora 34" ... <memory model='virtio-mem'> <target> <size unit='KiB'>16777216</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>524288</requested> <actual unit='KiB'>524288</actual> </target> <alias name='virtiomem0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </memory> Trying to update the requested size, however results in [root@virtlab704 ~]# virsh update-memory-device "Fedora 34" --requested-size 0 error: unsupported configuration: Attaching memory device with size '16777216' would exceed domain's maxMemory config size '20971520' Tried with "--alias" with the same result. Is it maybe an issue regarding handling of "maxMemory" vs "memory" ? 3. "memory" item handling Whenever I edit the XML and set e.g., "<memory unit='GiB'>4</memory>", it's silently converted back to "20 GiB". Maybe that's just always implicitly calculated from the NUMA spec and the defined devices. Thanks! -- Thanks, David / dhildenb

On Thu, Jun 17, 2021 at 14:03:44 +0200, David Hildenbrand wrote:
On 17.06.21 13:18, Michal Prívozník wrote:
On 6/17/21 11:44 AM, David Hildenbrand wrote:
On 17.03.21 12:57, Michal Privoznik wrote:
v3 of:
https://listman.redhat.com/archives/libvir-list/2021-February/msg00961.html
diff to v2: - Dropped code that forbade use of virtio-mem and memballoon at the same time; - This meant that I had to adjust memory accounting, qemuDomainSetMemoryFlags() - see patches 11/15 and 12/15 which are new. - Fixed small nits raised by Peter in his review of v2
Hi Michal, do you have a branch somewhere that I can easily checkout to play with it/test it?
Yes:
https://gitlab.com/MichalPrivoznik/libvirt/-/tree/virtio_mem_v4
There were some comments in Peter's review and I really should fix my code according to them and merge/send v4.
Michal
Thanks! I started with a single virtio-mem device.
1. NUMA requirement
Right now, one can really only configure "maxMemory" with NUMA specified, otherwise there will be "error: unsupported configuration: At least one numa node has to be configured when enabling memory hotplug".
I recall this is a limitation of older QEMU which would not create ACPI SRAT tables otherwise. In QEMU, this is no longer the case. As soon as "maxmem" is specified on the QEMU cmdline, we fake a single NUMA node:
hw/core/numa.c:numa_complete_configuration()
"Enable NUMA implicitly by adding a new NUMA node automatically" -> m->auto_enable_numa_with_memdev / mc->auto_enable_numa_with_memhp
m->auto_enable_numa_with_memdev (slots=0) is set since 5.1 on x86-64 and arm64 m->auto_enable_numa_with_memhp (slots>0) is set since 2.11 on x86-64 and 4.1 on arm64
So in theory, with newer QEMU on x86-64 and arm64 we could drop that limitation in libvirt (might require some changes eventually regarding the "node" specification handling). ppc64 shouldn't care as there is no ACPI.
The main reason for the check to be present is actually exactly what you are describing. qemu fakes the numa node in case none are configured, this means that in case where libvirt would not enforce that you'd get a discrepancy between the config and what qemu exposes. [...]
3. "memory" item handling
Whenever I edit the XML and set e.g., "<memory unit='GiB'>4</memory>", it's silently converted back to "20 GiB". Maybe that's just always implicitly calculated from the NUMA spec and the defined devices.
In cases where you've got numa confuigured, the <memory> element is re-calculated back from the size of the numa nodes, as when you change the value there isn't any obvious algorithm on picking NUMA nodes where to pull from.

On 17.06.21 14:17, Peter Krempa wrote:
On Thu, Jun 17, 2021 at 14:03:44 +0200, David Hildenbrand wrote:
On 17.06.21 13:18, Michal Prívozník wrote:
On 6/17/21 11:44 AM, David Hildenbrand wrote:
On 17.03.21 12:57, Michal Privoznik wrote:
v3 of:
https://listman.redhat.com/archives/libvir-list/2021-February/msg00961.html
diff to v2: - Dropped code that forbade use of virtio-mem and memballoon at the same time; - This meant that I had to adjust memory accounting, qemuDomainSetMemoryFlags() - see patches 11/15 and 12/15 which are new. - Fixed small nits raised by Peter in his review of v2
Hi Michal, do you have a branch somewhere that I can easily checkout to play with it/test it?
Yes:
https://gitlab.com/MichalPrivoznik/libvirt/-/tree/virtio_mem_v4
There were some comments in Peter's review and I really should fix my code according to them and merge/send v4.
Michal
Thanks! I started with a single virtio-mem device.
1. NUMA requirement
Right now, one can really only configure "maxMemory" with NUMA specified, otherwise there will be "error: unsupported configuration: At least one numa node has to be configured when enabling memory hotplug".
I recall this is a limitation of older QEMU which would not create ACPI SRAT tables otherwise. In QEMU, this is no longer the case. As soon as "maxmem" is specified on the QEMU cmdline, we fake a single NUMA node:
hw/core/numa.c:numa_complete_configuration()
"Enable NUMA implicitly by adding a new NUMA node automatically" -> m->auto_enable_numa_with_memdev / mc->auto_enable_numa_with_memhp
m->auto_enable_numa_with_memdev (slots=0) is set since 5.1 on x86-64 and arm64 m->auto_enable_numa_with_memhp (slots>0) is set since 2.11 on x86-64 and 4.1 on arm64
So in theory, with newer QEMU on x86-64 and arm64 we could drop that limitation in libvirt (might require some changes eventually regarding the "node" specification handling). ppc64 shouldn't care as there is no ACPI.
The main reason for the check to be present is actually exactly what you are describing. qemu fakes the numa node in case none are configured, this means that in case where libvirt would not enforce that you'd get a discrepancy between the config and what qemu exposes.
Right, but it fakes a single NUMA node just for the purpose of creating the ACPI SRAT. (the same thing Linux will do implicitly if there are no NUMA nodes, because no NUMA == single NUMA node) Anyhow, just something I found awkward to run into, because QEMU doesn't have this limitation anymore and that handling is properly glued to compatibility machines so there is no change when migrating etc. But it's certainly more a "nice to have for smaller XML files" :) I guess auto_enable_numa handling might result in a different QEMU behavior (like the result for HMP "info numa" etc.), so if we'd ever want to go down that path, we'd have to double check what the effects are.
[...]
3. "memory" item handling
Whenever I edit the XML and set e.g., "<memory unit='GiB'>4</memory>", it's silently converted back to "20 GiB". Maybe that's just always implicitly calculated from the NUMA spec and the defined devices.
In cases where you've got numa confuigured, the <memory> element is re-calculated back from the size of the numa nodes, as when you change the value there isn't any obvious algorithm on picking NUMA nodes where to pull from.
Makes sense. Another minor thing that is similar to 1. (suboptimal but certainly no show stopper) 4. QEMU does no longer require a "slots" specification when maxmem is set (because virtio-based memory devices don't require ACPI memory module slots). Specifying "<maxMemory unit='KiB'>20971520</maxMemory>" results in (IMHO confusing) error: "error: XML document failed to validate against schema: Unable to validate doc against /usr/local/share/libvirt/schemas/domain.rng Extra element maxMemory in interleave Invalid sequence in interleave Element domain failed to validate content" "Extra element maxMemory in interleave Invalid sequence in interleave Element domain failed to validate content" Specifying "<maxMemory slots='0' unit='KiB'>20971520</maxMemory>" results in "error: XML error: both maximum memory size and memory slot count must be specified" However, older QEMU version have that requirement. Supported since 3.0 I think: $ git tag --contains 951f2269af2 ... v3.0.0 ... -- Thanks, David / dhildenb

On Thu, Jun 17, 2021 at 14:34:21 +0200, David Hildenbrand wrote:
On 17.06.21 14:17, Peter Krempa wrote:
On Thu, Jun 17, 2021 at 14:03:44 +0200, David Hildenbrand wrote:
[...]
4. QEMU does no longer require a "slots" specification when maxmem is set (because virtio-based memory devices don't require ACPI memory module slots).
Specifying "<maxMemory unit='KiB'>20971520</maxMemory>" results in (IMHO confusing) error:
"error: XML document failed to validate against schema: Unable to validate doc against /usr/local/share/libvirt/schemas/domain.rng Extra element maxMemory in interleave Invalid sequence in interleave Element domain failed to validate content"
"Extra element maxMemory in interleave Invalid sequence in interleave Element domain failed to validate content"
This is because the XML schema for maxMemory requires the element. Unfortunately the XML schema validator from libxml2 has extremely user-unfriendly errors.
Specifying "<maxMemory slots='0' unit='KiB'>20971520</maxMemory>" results in "error: XML error: both maximum memory size and memory slot count must be specified"
This is the error from the XML parser which has human-crafter errors.
However, older QEMU version have that requirement. Supported since 3.0 I think:
$ git tag --contains 951f2269af2 ... v3.0.0 ...
Is there a way how to detect that it's not needed? E.g. via the QMP schema or such? In this instance we could perhaps drop it and let qemu report the error ... well if it's reasonable though.

On 17.06.21 14:42, Peter Krempa wrote:
On Thu, Jun 17, 2021 at 14:34:21 +0200, David Hildenbrand wrote:
On 17.06.21 14:17, Peter Krempa wrote:
On Thu, Jun 17, 2021 at 14:03:44 +0200, David Hildenbrand wrote:
[...]
4. QEMU does no longer require a "slots" specification when maxmem is set (because virtio-based memory devices don't require ACPI memory module slots).
Specifying "<maxMemory unit='KiB'>20971520</maxMemory>" results in (IMHO confusing) error:
"error: XML document failed to validate against schema: Unable to validate doc against /usr/local/share/libvirt/schemas/domain.rng Extra element maxMemory in interleave Invalid sequence in interleave Element domain failed to validate content"
"Extra element maxMemory in interleave Invalid sequence in interleave Element domain failed to validate content"
This is because the XML schema for maxMemory requires the element. Unfortunately the XML schema validator from libxml2 has extremely user-unfriendly errors.
Specifying "<maxMemory slots='0' unit='KiB'>20971520</maxMemory>" results in "error: XML error: both maximum memory size and memory slot count must be specified"
This is the error from the XML parser which has human-crafter errors.
However, older QEMU version have that requirement. Supported since 3.0 I think:
$ git tag --contains 951f2269af2 ... v3.0.0 ...
Is there a way how to detect that it's not needed? E.g. via the QMP schema or such?
In this instance we could perhaps drop it and let qemu report the error ... well if it's reasonable though.
I guess only implicitly, for example, if virtio-mem-pci or virtio-pmem-pci are possible (I assume supported devices can be queried via QMP). And without these devices, it doesn't make sense to have "slots=0". virtio-pmem-pci was introduced with v4.1.0, virtio-mem-pci was introduced with v5.1.0. -- Thanks, David / dhildenb
participants (4)
-
David Hildenbrand
-
Michal Privoznik
-
Michal Prívozník
-
Peter Krempa