[PATCH v5 00/16] Introduce virtio-mem <memory/> model

v4 of: https://listman.redhat.com/archives/libvir-list/2021-June/msg00679.html diff to v4: - Rebased onto current master - Worked in David's suggestions, e.g. rename from <actual/> to <current/>, implemented offline memory update, implemented --node argument to virsh update-memory-device, prealloc is OFF and reserve is ON for virtio-mem Some suggestions are left as future work. For instance: - Don't require memory slots because virtio-mem lives on PCI bus anyway - Allow path backed backend for virtio-mem - support .prealloc for virtio-mem object (not memory-backend-* !) I keep occasionally rebased version on my gitlab: https://gitlab.com/MichalPrivoznik/libvirt/-/commits/virtio_mem_v5/ Michal Prívozník (16): virhostmem: Introduce virHostMemGetTHPSize() qemu_capabilities: Introduce QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI qemu_capabilities: Introduce QEMU_CAPS_MEMORY_BACKEND_RESERVE conf: Introduce virtio-mem <memory/> model qemu: Build command line for virtio-mem qemu: Wire up <memory/> live update qemu: Wire up <memory/> offline update Introduce <current/> property to virtio-mem conf: Introduce virDomainMemoryFindByDeviceAlias() qemu: Wire up MEMORY_DEVICE_SIZE_CHANGE event qemu: Refresh the current size of virtio-mem on monitor reconnect qemu: Account for both memballoon and virtio-mem qemuDomainSetMemoryFlags: Take virtio-mem into consideration virsh: Introduce update-memory-device command news: document recent virtio memory addition kbase: Document virtio-mem NEWS.rst | 7 + docs/formatdomain.rst | 45 ++- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 150 ++++++++++ docs/kbase/meson.build | 1 + docs/manpages/virsh.rst | 30 ++ docs/schemas/domaincommon.rng | 16 ++ examples/c/misc/event-test.c | 17 ++ include/libvirt/libvirt-domain.h | 24 ++ src/conf/domain_conf.c | 126 ++++++++- src/conf/domain_conf.h | 16 ++ src/conf/domain_event.c | 84 ++++++ src/conf/domain_event.h | 10 + src/conf/domain_validate.c | 39 +++ src/libvirt_private.syms | 5 + src/qemu/qemu_alias.c | 10 +- src/qemu/qemu_capabilities.c | 6 + src/qemu/qemu_capabilities.h | 4 + src/qemu/qemu_command.c | 25 +- src/qemu/qemu_domain.c | 33 ++- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_domain_address.c | 38 ++- src/qemu/qemu_driver.c | 259 +++++++++++++++++- src/qemu/qemu_hotplug.c | 18 ++ src/qemu/qemu_hotplug.h | 5 + src/qemu/qemu_monitor.c | 34 +++ src/qemu/qemu_monitor.h | 28 ++ src/qemu/qemu_monitor_json.c | 97 +++++-- src/qemu/qemu_monitor_json.h | 5 + src/qemu/qemu_process.c | 72 +++++ src/qemu/qemu_validate.c | 8 + src/remote/remote_daemon_dispatch.c | 30 ++ src/remote/remote_driver.c | 32 +++ src/remote/remote_protocol.x | 15 +- src/remote_protocol-structs | 7 + src/security/security_apparmor.c | 1 + src/security/security_dac.c | 2 + src/security/security_selinux.c | 2 + src/util/virhostmem.c | 54 ++++ src/util/virhostmem.h | 3 + tests/domaincapsmock.c | 9 + .../caps_5.1.0.x86_64.xml | 1 + .../caps_5.2.0.x86_64.xml | 1 + .../caps_6.0.0.x86_64.xml | 1 + .../caps_6.1.0.x86_64.xml | 2 + ...mory-hotplug-virtio-mem.x86_64-latest.args | 41 +++ .../memory-hotplug-virtio-mem.xml | 67 +++++ tests/qemuxml2argvtest.c | 1 + ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 + tests/qemuxml2xmltest.c | 1 + tools/virsh-domain.c | 181 ++++++++++++ 51 files changed, 1613 insertions(+), 56 deletions(-) create mode 100644 docs/kbase/memorydevices.rst create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml -- 2.32.0

New virHostMemGetTHPSize() is introduced which allows caller to obtain THP PMD (Page Middle Directory) size, which is equal to the minimal size that THP can use, taken from kernel doc (Documentation/admin-guide/mm/transhuge.rst): Some userspace (such as a test program, or an optimized memory allocation library) may want to know the size (in bytes) of a transparent hugepage:: cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size Since this size depends on the host architecture and the kernel it won't change whilst libvirtd is running. Therefore, we can use virOnce() and cache the value. Of course, we can be running under kernel that has THP disabled or has no notion of THP at all. In that case a negative value is returned to signal error. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virhostmem.c | 54 ++++++++++++++++++++++++++++++++++++++++ src/util/virhostmem.h | 3 +++ tests/domaincapsmock.c | 9 +++++++ 4 files changed, 67 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 2778fe7f8f..d42d3abcce 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2408,6 +2408,7 @@ virHostMemGetFreePages; virHostMemGetInfo; virHostMemGetParameters; virHostMemGetStats; +virHostMemGetTHPSize; virHostMemSetParameters; diff --git a/src/util/virhostmem.c b/src/util/virhostmem.c index b13c3fe38e..5984dfab3f 100644 --- a/src/util/virhostmem.c +++ b/src/util/virhostmem.c @@ -45,11 +45,14 @@ #include "virstring.h" #include "virnuma.h" #include "virlog.h" +#include "virthread.h" #define VIR_FROM_THIS VIR_FROM_NONE VIR_LOG_INIT("util.hostmem"); +static unsigned long long virHostTHPPMDSize; /* in kibibytes */ +static virOnceControl virHostMemGetTHPSizeOnce = VIR_ONCE_CONTROL_INITIALIZER; #ifdef __FreeBSD__ # define BSD_MEMORY_STATS_ALL 4 @@ -930,3 +933,54 @@ virHostMemAllocPages(unsigned int npages, return ncounts; } + +#if defined(__linux__) +# define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" +static void +virHostMemGetTHPSizeSysfs(unsigned long long *size) +{ + if (virFileReadValueUllong(size, "%s", HPAGE_PMD_SIZE_PATH) < 0) { + VIR_WARN("unable to get THP PMD size: %s", g_strerror(errno)); + return; + } + + /* Size is now in bytes. Convert to KiB. */ + *size >>= 10; +} +#endif /* defined(__linux__) */ + + +static void +virHostMemGetTHPSizeOnceInit(void) +{ +#if defined(__linux__) + virHostMemGetTHPSizeSysfs(&virHostTHPPMDSize); +#else /* !defined(__linux__) */ + VIR_WARN("Getting THP size not ported yet"); +#endif /* !defined(__linux__) */ +} + + +/** + * virHostMemGetTHPSize: + * @size: returned size of THP in kibibytes + * + * Obtain Transparent Huge Page size in kibibytes. The size + * depends on host architecture and kernel. Because of virOnce(), + * do not rely on errno in case of failure. + * + * Returns: 0 on success, + * -1 on failure. + */ +int +virHostMemGetTHPSize(unsigned long long *size) +{ + if (virOnce(&virHostMemGetTHPSizeOnce, virHostMemGetTHPSizeOnceInit) < 0) + return -1; + + if (virHostTHPPMDSize == 0) + return -1; + + *size = virHostTHPPMDSize; + return 0; +} diff --git a/src/util/virhostmem.h b/src/util/virhostmem.h index 3265215d84..c36de94f0f 100644 --- a/src/util/virhostmem.h +++ b/src/util/virhostmem.h @@ -55,3 +55,6 @@ int virHostMemAllocPages(unsigned int npages, unsigned int cellCount, int lastCell, bool add); + +int virHostMemGetTHPSize(unsigned long long *size) + G_GNUC_NO_INLINE; diff --git a/tests/domaincapsmock.c b/tests/domaincapsmock.c index 0a6c541f77..d382d06e27 100644 --- a/tests/domaincapsmock.c +++ b/tests/domaincapsmock.c @@ -17,6 +17,7 @@ #include <config.h> #include "virhostcpu.h" +#include "virhostmem.h" #if WITH_QEMU # include "virmock.h" @@ -51,3 +52,11 @@ virQEMUCapsGetKVMSupportsSecureGuest(virQEMUCaps *qemuCaps) return real_virQEMUCapsGetKVMSupportsSecureGuest(qemuCaps); } #endif + +int +virHostMemGetTHPSize(unsigned long long *size) +{ + /* Pretend Transparent Huge Page size is 2MiB. */ + *size = 2048; + return 0; +} -- 2.32.0

This commit introduces a new capability that reflects virtio-mem-pci device support in QEMU: QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */ The virtio-mem-pci device was introduced in QEMU 5.1. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml | 1 + 6 files changed, 7 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index f27a621f8c..711dc2f248 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -638,6 +638,7 @@ VIR_ENUM_IMPL(virQEMUCaps, "query-display-options", /* QEMU_CAPS_QUERY_DISPLAY_OPTIONS */ "s390-pv-guest", /* QEMU_CAPS_S390_PV_GUEST */ "set-action", /* QEMU_CAPS_SET_ACTION */ + "virtio-mem-pci", /* QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI */ ); @@ -1354,6 +1355,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { { "virtio-gpu-gl-pci", QEMU_CAPS_VIRTIO_GPU_GL_PCI }, { "virtio-vga-gl", QEMU_CAPS_VIRTIO_VGA_GL }, { "s390-pv-guest", QEMU_CAPS_S390_PV_GUEST }, + { "virtio-mem-pci", QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI }, }; diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index f3379f556c..1e88591975 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -618,6 +618,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ QEMU_CAPS_QUERY_DISPLAY_OPTIONS, /* 'query-display-options' qmp command present */ QEMU_CAPS_S390_PV_GUEST, /* -object s390-pv-guest,... */ QEMU_CAPS_SET_ACTION, /* 'set-action' QMP command */ + QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml index fc0b502ef9..ff984afaa4 100644 --- a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml @@ -245,6 +245,7 @@ <flag name='rotation-rate'/> <flag name='input-linux'/> <flag name='query-display-options'/> + <flag name='virtio-mem-pci'/> <version>5001000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml index cb1226fc04..1d6ac320b3 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml @@ -246,6 +246,7 @@ <flag name='rotation-rate'/> <flag name='input-linux'/> <flag name='query-display-options'/> + <flag name='virtio-mem-pci'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml index d03b8aa726..8be83614ad 100644 --- a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml @@ -254,6 +254,7 @@ <flag name='confidential-guest-support'/> <flag name='query-display-options'/> <flag name='set-action'/> + <flag name='virtio-mem-pci'/> <version>6000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml index 8239f4266a..20ec221deb 100644 --- a/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml @@ -256,6 +256,7 @@ <flag name='confidential-guest-support'/> <flag name='query-display-options'/> <flag name='set-action'/> + <flag name='virtio-mem-pci'/> <version>6001000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100243</microcodeVersion> -- 2.32.0

This capability tracks whether memory-backend-* supports .reserve attribute which is going to be important for backends associated with virtio-mem devices. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 4 ++++ src/qemu/qemu_capabilities.h | 3 +++ tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml | 1 + 3 files changed, 8 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 711dc2f248..f92ef069d1 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -639,6 +639,9 @@ VIR_ENUM_IMPL(virQEMUCaps, "s390-pv-guest", /* QEMU_CAPS_S390_PV_GUEST */ "set-action", /* QEMU_CAPS_SET_ACTION */ "virtio-mem-pci", /* QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI */ + + /* 410 */ + "memory-backend-file.reserve", /* QEMU_CAPS_MEMORY_BACKEND_RESERVE */ ); @@ -1719,6 +1722,7 @@ static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsMemoryBackendFile[] = * supported. The 'x-' prefix was kept for compatibility with already * released qemu versions. */ { "x-use-canonical-path-for-ramblock-id", QEMU_CAPS_X_USE_CANONICAL_PATH_FOR_RAMBLOCK_ID }, + { "reserve", QEMU_CAPS_MEMORY_BACKEND_RESERVE }, }; static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsMemoryBackendMemfd[] = { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 1e88591975..79a7df4f7d 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -620,6 +620,9 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ QEMU_CAPS_SET_ACTION, /* 'set-action' QMP command */ QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */ + /* 410 */ + QEMU_CAPS_MEMORY_BACKEND_RESERVE, /* -object memory-backend-*.reserve= */ + QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml index 20ec221deb..ba4b65f04d 100644 --- a/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml @@ -257,6 +257,7 @@ <flag name='query-display-options'/> <flag name='set-action'/> <flag name='virtio-mem-pci'/> + <flag name='memory-backend-file.reserve'/> <version>6001000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100243</microcodeVersion> -- 2.32.0

The virtio-mem is paravirtualized mechanism of adding/removing memory to/from a VM. A virtio-mem-pci device is split into blocks of equal size which are then exposed (all or only a requested portion of them) to the guest kernel to use as regular memory. Therefore, the device has two important attributes: 1) block-size, which defines the size of a block 2) requested-size, which defines how much memory (in bytes) is the device requested to expose to the guest. The 'block-size' is configured on command line and immutable throughout device's lifetime. The 'requested-size' can be set on the command line too, but also is adjustable via monitor. In fact, that is how management software places its requests to change the memory allocation. If it wants to give more memory to the guest it changes 'requested-size' to a bigger value, and if it wants to shrink guest memory it changes the 'requested-size' to a smaller value. Note, value of zero means that guest should release all memory offered by the device. Of course, guest has to cooperate. Therefore, there is a third attribute 'size' which is read only and reflects how much memory the guest still has. This can be different to 'requested-size', obviously. Because of name clash, I've named it 'current' and it is dealt with in future commits (it is a runtime information anyway). In the backend, memory for virtio-mem is backed by usual objects: memory-backend-{ram,file,memfd} and their size puts the cap on the amount of memory that a virtio-mem device can offer to a guest. But we are already able to express this info using <size/> under <target/>. Therefore, we need only two more elements to cover 'block-size' and 'requested-size' attributes. This is the XML I've came up with: <memory model='virtio-mem'> <source> <nodemask>1-3</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>2097152</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> </target> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </memory> I hope by now it is obvious that: 1) 'requested-size' must be an integer multiple of 'block-size', and 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI address. Then there is a limitation that the minimal 'block-size' is transparent huge page size (I'll leave this without explanation). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.rst | 38 +++++++++-- docs/schemas/domaincommon.rng | 11 +++ src/conf/domain_conf.c | 53 ++++++++++++++- src/conf/domain_conf.h | 3 + src/conf/domain_validate.c | 39 +++++++++++ src/qemu/qemu_alias.c | 1 + src/qemu/qemu_command.c | 1 + src/qemu/qemu_domain.c | 10 +++ src/qemu/qemu_domain_address.c | 38 ++++++++--- src/qemu/qemu_process.c | 2 + src/qemu/qemu_validate.c | 8 +++ src/security/security_apparmor.c | 1 + src/security/security_dac.c | 2 + src/security/security_selinux.c | 2 + .../memory-hotplug-virtio-mem.xml | 67 +++++++++++++++++++ ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 + tests/qemuxml2xmltest.c | 1 + 17 files changed, 261 insertions(+), 17 deletions(-) create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 9173d9f1d6..c327e1deae 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -7790,6 +7790,18 @@ Example: usage of the memory devices <size unit='KiB'>524288</size> </target> </memory> + <memory model='virtio-mem'> + <source> + <nodemask>1-3</nodemask> + <pagesize unit='KiB'>2048</pagesize> + </source> + <target> + <size unit='KiB'>2097152</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + </memory> </devices> ... @@ -7797,7 +7809,8 @@ Example: usage of the memory devices Provide ``dimm`` to add a virtual DIMM module to the guest. :since:`Since 1.2.14` Provide ``nvdimm`` model that adds a Non-Volatile DIMM module. :since:`Since 3.2.0` Provide ``virtio-pmem`` model to add a paravirtualized - persistent memory device. :since:`Since 7.1.0` + persistent memory device. :since:`Since 7.1.0` Provide ``virtio-mem`` model + to add paravirtualized memory device. :since:`Since 7.4.0` ``access`` An optional attribute ``access`` ( :since:`since 3.2.0` ) that provides @@ -7820,10 +7833,11 @@ Example: usage of the memory devices allowed only for ``model='nvdimm'`` for pSeries guests. :since:`Since 6.2.0` ``source`` - For model ``dimm`` this element is optional and allows to fine tune the - source of the memory used for the given memory device. If the element is not - provided defaults configured via ``numatune`` are used. If ``dimm`` is - provided, then the following optional elements can be provided as well: + For model ``dimm`` and model ``virtio-mem`` this element is optional and + allows to fine tune the source of the memory used for the given memory + device. If the element is not provided defaults configured via ``numatune`` + are used. If the element is provided, then the following optional elements + can be provided: ``pagesize`` This element can be used to override the default host page size used for @@ -7862,7 +7876,8 @@ Example: usage of the memory devices added memory from the perspective of the guest. The mandatory ``size`` subelement configures the size of the added memory as - a scaled integer. + a scaled integer. For ``virtio-mem`` this represents the maximum possible + size exposed to the guest. The ``node`` subelement configures the guest NUMA node to attach the memory to. The element shall be used only if the guest has NUMA nodes configured. @@ -7889,6 +7904,17 @@ Example: usage of the memory devices so other backend types should use the ``readonly`` element. :since:`Since 5.0.0` + ``block`` + For ``virtio-mem`` only. + The size of an individual block, granularity of division of memory block. + Must be power of two and at least equal to size of a transparent hugepage + (2MiB on x84_64). The default is hypervisor dependent. + + ``requested`` + For ``virtio-mem`` only. + The total size exposed to the guest. Must respect ``block`` granularity + and be smaller than or equal to ``size``. + :anchor:`<a id="elementsIommu"/>` IOMMU devices diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 11fa24f398..82fdc7c8b5 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -6595,6 +6595,7 @@ <value>dimm</value> <value>nvdimm</value> <value>virtio-pmem</value> + <value>virtio-mem</value> </choice> </attribute> <optional> @@ -6682,6 +6683,16 @@ <ref name="unsignedInt"/> </element> </optional> + <optional> + <element name="block"> + <ref name="scaledInteger"/> + </element> + </optional> + <optional> + <element name="requested"> + <ref name="scaledInteger"/> + </element> + </optional> <optional> <element name="label"> <element name="size"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 858f6f923a..c4cac65a7b 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -1392,6 +1392,7 @@ VIR_ENUM_IMPL(virDomainMemoryModel, "dimm", "nvdimm", "virtio-pmem", + "virtio-mem", ); VIR_ENUM_IMPL(virDomainShmemModel, @@ -5494,6 +5495,7 @@ virDomainMemoryDefPostParse(virDomainMemoryDef *mem, } break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_DIMM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: @@ -14650,6 +14652,7 @@ virDomainMemorySourceDefParseXML(xmlNodePtr node, switch (def->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt, &def->pagesize, false, false) < 0) return -1; @@ -14716,7 +14719,8 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node, &def->size, true, false) < 0) return -1; - if (def->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { + switch (def->model) { + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: if (virDomainParseMemory("./label/size", "./label/size/@unit", ctxt, &def->labelsize, false, false) < 0) return -1; @@ -14735,6 +14739,23 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node, if (virXPathBoolean("boolean(./readonly)", ctxt)) def->readonly = true; + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (virDomainParseMemory("./block", "./block/@unit", ctxt, + &def->blocksize, false, false) < 0) + return -1; + + if (virDomainParseMemory("./requested", "./requested/@unit", ctxt, + &def->requestedsize, false, false) < 0) + return -1; + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; } return 0; @@ -16491,11 +16512,14 @@ virDomainMemoryFindByDefInternal(virDomainDef *def, /* target info -> always present */ if (tmp->model != mem->model || tmp->targetNode != mem->targetNode || - tmp->size != mem->size) + tmp->size != mem->size || + tmp->blocksize != mem->blocksize || + tmp->requestedsize != mem->requestedsize) continue; switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: /* source stuff -> match with device */ if (tmp->pagesize != mem->pagesize) continue; @@ -21875,6 +21899,22 @@ virDomainMemoryDefCheckABIStability(virDomainMemoryDef *src, return false; } + if (src->blocksize != dst->blocksize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Target memory device block size '%llu' doesn't match " + "source memory device block size '%llu'"), + dst->blocksize, src->blocksize); + return false; + } + + if (src->requestedsize != dst->requestedsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Target memory device requested size '%llu' doesn't match " + "source memory device requested size '%llu'"), + dst->requestedsize, src->requestedsize); + return false; + } + if (src->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { if (src->labelsize != dst->labelsize) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -25780,6 +25820,7 @@ virDomainMemorySourceDefFormat(virBuffer *buf, switch (def->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (def->sourceNodes) { if (!(bitmap = virBitmapFormat(def->sourceNodes))) return -1; @@ -25836,6 +25877,14 @@ virDomainMemoryTargetDefFormat(virBuffer *buf, if (def->readonly) virBufferAddLit(&childBuf, "<readonly/>\n"); + if (def->blocksize) { + virBufferAsprintf(&childBuf, "<block unit='KiB'>%llu</block>\n", + def->blocksize); + + virBufferAsprintf(&childBuf, "<requested unit='KiB'>%llu</requested>\n", + def->requestedsize); + } + virXMLFormatElement(buf, "target", NULL, &childBuf); } diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index c7e6df7981..3bdfb66a16 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2472,6 +2472,7 @@ typedef enum { VIR_DOMAIN_MEMORY_MODEL_DIMM, /* dimm hotpluggable memory device */ VIR_DOMAIN_MEMORY_MODEL_NVDIMM, /* nvdimm memory device */ VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM, /* virtio-pmem memory device */ + VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM, /* virtio-mem memory device */ VIR_DOMAIN_MEMORY_MODEL_LAST } virDomainMemoryModel; @@ -2492,6 +2493,8 @@ struct _virDomainMemoryDef { int targetNode; unsigned long long size; /* kibibytes */ unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */ + unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */ + unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM */ bool readonly; /* valid only for NVDIMM */ /* required for QEMU NVDIMM ppc64 support */ diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c index 1bc62c364d..f023d22f23 100644 --- a/src/conf/domain_validate.c +++ b/src/conf/domain_validate.c @@ -28,6 +28,7 @@ #include "virlog.h" #include "virutil.h" #include "virstring.h" +#include "virhostmem.h" #define VIR_FROM_THIS VIR_FROM_DOMAIN @@ -1966,6 +1967,8 @@ static int virDomainMemoryDefValidate(const virDomainMemoryDef *mem, const virDomainDef *def) { + unsigned long long thpSize; + switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: if (!mem->nvdimmPath) { @@ -2019,6 +2022,42 @@ virDomainMemoryDefValidate(const virDomainMemoryDef *mem, _("virtio-pmem does not support NUMA nodes")); return -1; } + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (mem->requestedsize > mem->size) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("requested size must be smaller than or equal to @size")); + return -1; + } + + if (!VIR_IS_POW2(mem->blocksize)) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("block size must be a power of two")); + return -1; + } + + if (virHostMemGetTHPSize(&thpSize) < 0) { + /* We failed to get THP size, fall back to a sane default. On + * almost every architecture the size will be 2MiB, except for some + * funky arches like sparc and m68k. Use 2MiB and refine later if + * somebody complains. */ + thpSize = 2048; + } + + if (mem->blocksize < thpSize) { + virReportError(VIR_ERR_XML_DETAIL, + _("block size too small, must be at least %lluKiB"), + thpSize); + return -1; + } + + if (mem->requestedsize % mem->blocksize != 0) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("requested size must be an integer multiple of block size")); + return -1; + } + break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: break; diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index ed47fa335a..79e8953b2f 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -522,6 +522,7 @@ qemuAssignDeviceMemoryAlias(virDomainDef *def, case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: prefix = "virtiopmem"; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 4d29313f45..594e5643b1 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3317,6 +3317,7 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, device = "virtio-pmem-pci"; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 5de7461fb3..298f9b076d 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -9038,6 +9038,16 @@ qemuDomainDefValidateMemoryHotplugDevice(const virDomainMemoryDef *mem, needsNuma = false; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI && + mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("only 'pci' addresses are supported for the %s device"), + virDomainMemoryModelTypeToString(mem->model)); + return -1; + } + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: return -1; diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 392368bd38..c43ad23cf5 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -380,9 +380,18 @@ qemuDomainPrimeVirtioDeviceAddresses(virDomainDef *def, } for (i = 0; i < def->nmems; i++) { - if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM && - def->mems[i]->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) - def->mems[i]->info.type = type; + switch (def->mems[i]->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (def->mems[i]->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) + def->mems[i]->info.type = type; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } } if (type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_CCW) { @@ -1010,6 +1019,7 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDef *dev, case VIR_DOMAIN_DEVICE_MEMORY: switch (dev->data.memory->model) { case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return virtioFlags; case VIR_DOMAIN_MEMORY_MODEL_NONE: @@ -2370,12 +2380,19 @@ qemuDomainAssignDevicePCISlots(virDomainDef *def, for (i = 0; i < def->nmems; i++) { virDomainMemoryDef *mem = def->mems[i]; - if (mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM || - !virDeviceInfoPCIAddressIsWanted(&mem->info)) - continue; - - if (qemuDomainPCIAddressReserveNextAddr(addrs, &mem->info) < 0) - return -1; + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (virDeviceInfoPCIAddressIsWanted(&mem->info) && + qemuDomainPCIAddressReserveNextAddr(addrs, &mem->info) < 0) + return -1; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } } return 0; @@ -3034,6 +3051,7 @@ qemuDomainAssignMemoryDeviceSlot(virQEMUDriver *driver, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return qemuDomainEnsurePCIAddress(vm, &dev, driver); break; @@ -3059,6 +3077,7 @@ qemuDomainReleaseMemoryDeviceSlot(virDomainObj *vm, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: qemuDomainReleaseDeviceAddress(vm, &mem->info); break; @@ -3093,6 +3112,7 @@ qemuDomainAssignMemorySlots(virDomainDef *def) break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: /* handled in qemuDomainAssignPCIAddresses() */ break; case VIR_DOMAIN_MEMORY_MODEL_NONE: diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 632cb817b9..07bd7e928a 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3791,6 +3791,7 @@ qemuProcessDomainMemoryDefNeedHugepagesPath(const virDomainMemoryDef *mem, { switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return mem->pagesize && mem->pagesize != system_pagesize; case VIR_DOMAIN_MEMORY_MODEL_NONE: @@ -3859,6 +3860,7 @@ qemuProcessNeedMemoryBackingPath(virDomainDef *def, if (mem) { switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (mem->access != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT) { /* No need to check for access mode on the target node, * it was checked for in the previous loop. */ diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 9d93f373ab..5eaa6fa5ac 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -4993,6 +4993,14 @@ qemuValidateDomainDeviceDefMemory(virDomainMemoryDef *mem, } break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-mem isn't supported by this QEMU binary")); + return -1; + } + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c index d942ea5005..2cc15ba6af 100644 --- a/src/security/security_apparmor.c +++ b/src/security/security_apparmor.c @@ -693,6 +693,7 @@ AppArmorSetMemoryLabel(virSecurityManager *mgr, return reload_profile(mgr, def, mem->nvdimmPath, true); case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; } diff --git a/src/security/security_dac.c b/src/security/security_dac.c index 04b9ecf028..1733d63410 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -1849,6 +1849,7 @@ virSecurityDACRestoreMemoryLabel(virSecurityManager *mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: case VIR_DOMAIN_MEMORY_MODEL_NONE: ret = 0; @@ -2033,6 +2034,7 @@ virSecurityDACSetMemoryLabel(virSecurityManager *mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: case VIR_DOMAIN_MEMORY_MODEL_NONE: ret = 0; diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 0e5ea0366d..cc72453329 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1582,6 +1582,7 @@ virSecuritySELinuxSetMemoryLabel(virSecurityManager *mgr, case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; } @@ -1609,6 +1610,7 @@ virSecuritySELinuxRestoreMemoryLabel(virSecurityManager *mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: ret = 0; diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml new file mode 100644 index 0000000000..c10528aad8 --- /dev/null +++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml @@ -0,0 +1,67 @@ +<domain type='kvm'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <maxMemory slots='16' unit='KiB'>1099511627776</maxMemory> + <memory unit='KiB'>8388608</memory> + <currentMemory unit='KiB'>8388608</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <cpu mode='custom' match='exact' check='none'> + <model fallback='forbid'>qemu64</model> + <topology sockets='2' dies='1' cores='1' threads='1'/> + <numa> + <cell id='0' cpus='0-1' memory='2095104' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i386</emulator> + <disk type='block' device='disk'> + <driver name='qemu' type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='usb' index='0' model='piix3-uhci'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <audio id='1' type='none'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </memballoon> + <memory model='virtio-mem'> + <target> + <size unit='KiB'>1048576</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>524288</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </memory> + <memory model='virtio-mem'> + <source> + <nodemask>1-3</nodemask> + <pagesize unit='KiB'>2048</pagesize> + </source> + <target> + <size unit='KiB'>2097152</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </memory> + </devices> +</domain> diff --git a/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml b/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml new file mode 120000 index 0000000000..a9d298129c --- /dev/null +++ b/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml @@ -0,0 +1 @@ +../qemuxml2argvdata/memory-hotplug-virtio-mem.xml \ No newline at end of file diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 6d3526f91f..ccefef3a5c 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -1214,6 +1214,7 @@ mymain(void) QEMU_CAPS_DEVICE_NVDIMM, QEMU_CAPS_LAST, ARG_END); DO_TEST_CAPS_LATEST("memory-hotplug-virtio-pmem"); + DO_TEST_CAPS_LATEST("memory-hotplug-virtio-mem"); DO_TEST_NOCAPS("net-udp"); -- 2.32.0

Nothing special is happening here. All important changes were done when for 'virtio-pmem' (adjusting the code to put virtio memory on PCI bus, generating alias using qemuDomainDeviceAliasIndex(). The only bit that might look suspicious is no prealloc for virtio-mem. But if you think about it, the whole purpose of this device is to change amount of memory exposed to guest on the fly. There is no point in locking the whole backend in memory. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_alias.c | 9 +++- src/qemu/qemu_command.c | 24 +++++++++-- ...mory-hotplug-virtio-mem.x86_64-latest.args | 41 +++++++++++++++++++ tests/qemuxml2argvtest.c | 1 + 4 files changed, 70 insertions(+), 5 deletions(-) create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index 79e8953b2f..81a1e7eeed 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -475,8 +475,11 @@ qemuDeviceMemoryGetAliasID(virDomainDef *def, size_t i; int maxidx = 0; - /* virtio-pmem goes onto PCI bus and thus DIMM address is not valid */ - if (!oldAlias && mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM) + /* virtio-pmem and virtio-mem go onto PCI bus and thus DIMM address is not + * valid */ + if (!oldAlias && + mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM && + mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) return mem->info.addr.dimm.slot; for (i = 0; i < def->nmems; i++) { @@ -523,6 +526,8 @@ qemuAssignDeviceMemoryAlias(virDomainDef *def, prefix = "virtiopmem"; break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + prefix = "virtiomem"; + break; case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 594e5643b1..91f3094ac8 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3136,9 +3136,19 @@ qemuBuildMemoryBackendProps(virJSONValue **backendProps, virJSONValueObjectAdd(props, "b:x-use-canonical-path-for-ramblock-id", false, NULL) < 0) return -1; - if (!priv->memPrealloc && - virJSONValueObjectAdd(props, "B:prealloc", prealloc, NULL) < 0) - return -1; + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) { + /* Explicitly disable prealloc for virtio-mem */ + if (priv->memPrealloc && + virJSONValueObjectAppendBoolean(props, "prealloc", 0) < 0) + return -1; + if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MEMORY_BACKEND_RESERVE) && + virJSONValueObjectAppendBoolean(props, "reserve", 0) < 0) + return -1; + } else { + if (!priv->memPrealloc && + virJSONValueObjectAdd(props, "B:prealloc", prealloc, NULL) < 0) + return -1; + } if (virJSONValueObjectAdd(props, "U:size", mem->size * 1024, NULL) < 0) return -1; @@ -3318,6 +3328,9 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + device = "virtio-mem-pci"; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: @@ -3334,6 +3347,11 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, if (mem->labelsize) virBufferAsprintf(&buf, "label-size=%llu,", mem->labelsize * 1024); + if (mem->blocksize) { + virBufferAsprintf(&buf, "block-size=%llu,", mem->blocksize * 1024); + virBufferAsprintf(&buf, "requested-size=%llu,", mem->requestedsize * 1024); + } + if (mem->uuid) { char uuidstr[VIR_UUID_STRING_BUFLEN]; diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args new file mode 100644 index 0000000000..22ee1bc459 --- /dev/null +++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args @@ -0,0 +1,41 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +/usr/bin/qemu-system-i386 \ +-name guest=QEMUGuest1,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/tmp/lib/domain--1-QEMUGuest1/master-key.aes"}' \ +-machine pc,accel=kvm,usb=off,dump-guest-core=off \ +-cpu qemu64 \ +-m size=2095104k,slots=16,maxmem=1099511627776k \ +-overcommit mem-lock=off \ +-smp 2,sockets=2,dies=1,cores=1,threads=1 \ +-object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":2145386496}' \ +-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \ +-object '{"qom-type":"memory-backend-ram","id":"memvirtiomem0","reserve":false,"size":1073741824}' \ +-device virtio-mem-pci,node=0,block-size=2097152,requested-size=536870912,memdev=memvirtiomem0,id=virtiomem0,bus=pci.0,addr=0x2 \ +-object '{"qom-type":"memory-backend-file","id":"memvirtiomem1","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","reserve":false,"size":2147483648,"host-nodes":[1,2,3],"policy":"bind"}' \ +-device virtio-mem-pci,node=0,block-size=2097152,requested-size=1073741824,memdev=memvirtiomem1,id=virtiomem1,bus=pci.0,addr=0x3 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-boot strict=on \ +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ +-blockdev '{"driver":"host_device","filename":"/dev/HostVG/QEMUGuest1","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \ +-device ide-hd,bus=ide.0,unit=0,drive=libvirt-1-format,id=ide0-0-0,bootindex=1 \ +-audiodev id=audio1,driver=none \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 74af93b08f..23f4a8471f 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -3143,6 +3143,7 @@ mymain(void) ARG_END); DO_TEST_CAPS_VER("memory-hotplug-virtio-pmem", "5.2.0"); DO_TEST_CAPS_LATEST("memory-hotplug-virtio-pmem"); + DO_TEST_CAPS_LATEST("memory-hotplug-virtio-mem"); DO_TEST("machine-aeskeywrap-on-caps", QEMU_CAPS_AES_KEY_WRAP, -- 2.32.0

On 13.09.21 16:52, Michal Privoznik wrote:
Nothing special is happening here. All important changes were done when for 'virtio-pmem' (adjusting the code to put virtio memory on PCI bus, generating alias using qemuDomainDeviceAliasIndex(). The only bit that might look suspicious is no prealloc for virtio-mem. But if you think about it, the whole purpose of this device is to change amount of memory exposed to guest on the fly. There is no point in locking the whole backend in memory.
Do I understand correctly that we'll always try setting "reserve=off", and "prealloc=off"? That would be awesome :) I do wonder if we want to warn or bail out if "priv->memPrealloc" is set but we are temporarily not able to support it -- as discussed, because virtio-mem in QEMU yet has to be taught about it. FYI: QEMU will bail out if "reserve=off,prealloc=on" is set in combination.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_alias.c | 9 +++- src/qemu/qemu_command.c | 24 +++++++++-- ...mory-hotplug-virtio-mem.x86_64-latest.args | 41 +++++++++++++++++++ tests/qemuxml2argvtest.c | 1 + 4 files changed, 70 insertions(+), 5 deletions(-) create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args
diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index 79e8953b2f..81a1e7eeed 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -475,8 +475,11 @@ qemuDeviceMemoryGetAliasID(virDomainDef *def, size_t i; int maxidx = 0;
- /* virtio-pmem goes onto PCI bus and thus DIMM address is not valid */ - if (!oldAlias && mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM) + /* virtio-pmem and virtio-mem go onto PCI bus and thus DIMM address is not + * valid */ + if (!oldAlias && + mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM && + mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) return mem->info.addr.dimm.slot;
for (i = 0; i < def->nmems; i++) { @@ -523,6 +526,8 @@ qemuAssignDeviceMemoryAlias(virDomainDef *def, prefix = "virtiopmem"; break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + prefix = "virtiomem"; + break; case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 594e5643b1..91f3094ac8 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3136,9 +3136,19 @@ qemuBuildMemoryBackendProps(virJSONValue **backendProps, virJSONValueObjectAdd(props, "b:x-use-canonical-path-for-ramblock-id", false, NULL) < 0) return -1;
- if (!priv->memPrealloc && - virJSONValueObjectAdd(props, "B:prealloc", prealloc, NULL) < 0) - return -1; + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) { + /* Explicitly disable prealloc for virtio-mem */ + if (priv->memPrealloc && + virJSONValueObjectAppendBoolean(props, "prealloc", 0) < 0) + return -1; + if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MEMORY_BACKEND_RESERVE) && + virJSONValueObjectAppendBoolean(props, "reserve", 0) < 0) + return -1; + } else { + if (!priv->memPrealloc && + virJSONValueObjectAdd(props, "B:prealloc", prealloc, NULL) < 0) + return -1; + }
if (virJSONValueObjectAdd(props, "U:size", mem->size * 1024, NULL) < 0) return -1; @@ -3318,6 +3328,9 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, break;
case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + device = "virtio-mem-pci"; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: @@ -3334,6 +3347,11 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, if (mem->labelsize) virBufferAsprintf(&buf, "label-size=%llu,", mem->labelsize * 1024);
+ if (mem->blocksize) { + virBufferAsprintf(&buf, "block-size=%llu,", mem->blocksize * 1024); + virBufferAsprintf(&buf, "requested-size=%llu,", mem->requestedsize * 1024); + } + if (mem->uuid) { char uuidstr[VIR_UUID_STRING_BUFLEN];
diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args new file mode 100644 index 0000000000..22ee1bc459 --- /dev/null +++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args @@ -0,0 +1,41 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +/usr/bin/qemu-system-i386 \ +-name guest=QEMUGuest1,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/tmp/lib/domain--1-QEMUGuest1/master-key.aes"}' \ +-machine pc,accel=kvm,usb=off,dump-guest-core=off \ +-cpu qemu64 \ +-m size=2095104k,slots=16,maxmem=1099511627776k \ +-overcommit mem-lock=off \ +-smp 2,sockets=2,dies=1,cores=1,threads=1 \ +-object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":2145386496}' \ +-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \ +-object '{"qom-type":"memory-backend-ram","id":"memvirtiomem0","reserve":false,"size":1073741824}' \ +-device virtio-mem-pci,node=0,block-size=2097152,requested-size=536870912,memdev=memvirtiomem0,id=virtiomem0,bus=pci.0,addr=0x2 \ +-object '{"qom-type":"memory-backend-file","id":"memvirtiomem1","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","reserve":false,"size":2147483648,"host-nodes":[1,2,3],"policy":"bind"}' \ +-device virtio-mem-pci,node=0,block-size=2097152,requested-size=1073741824,memdev=memvirtiomem1,id=virtiomem1,bus=pci.0,addr=0x3 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-boot strict=on \ +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ +-blockdev '{"driver":"host_device","filename":"/dev/HostVG/QEMUGuest1","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \ +-device ide-hd,bus=ide.0,unit=0,drive=libvirt-1-format,id=ide0-0-0,bootindex=1 \ +-audiodev id=audio1,driver=none \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 74af93b08f..23f4a8471f 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -3143,6 +3143,7 @@ mymain(void) ARG_END); DO_TEST_CAPS_VER("memory-hotplug-virtio-pmem", "5.2.0"); DO_TEST_CAPS_LATEST("memory-hotplug-virtio-pmem"); + DO_TEST_CAPS_LATEST("memory-hotplug-virtio-mem");
DO_TEST("machine-aeskeywrap-on-caps", QEMU_CAPS_AES_KEY_WRAP,
-- Thanks, David / dhildenb

Hi Michal, I tried to test the virtio-mem with upstream version v7.7.0-136-g9b49c2c6d3 adding the current patch (with qemu-6.1.0-7.fc36.x86_64) - <maxMemory slots='16' unit='KiB'>8388608</maxMemory> <memory unit='KiB'>1179648</memory> <currentMemory unit='KiB'>1179648</currentMemory> <memoryBacking> <allocation mode='immediate'/> </memoryBacking> .... <cpu mode='custom' match='exact' check='none'> <model fallback='forbid'>qemu64</model> <numa> <cell id='0' cpus='0-1' memory='1048576' unit='KiB' discard='yes'/> </numa> </cpu> ... <memory model='virtio-mem'> <source> <nodemask>0</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>131072</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>131072</requested> </target> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </memory> The vm was started successfully and there was no "prealloc: true" in the command line. But I also found when <allocation mode='immediate'/> was added in the domain and the "virtio-mem" device was not added, the qemu command line also didn't have "prealloc:true". Is this correct? This result is different in "libvirt-7.6.0-3.module+el8.5.0+12510+80564ecf.x86_64" & "qemu-kvm-6.0.0-29.module+el8.5.0+12386+43574bac.x86_64". Best Regards, Jing Qi On Tue, Sep 14, 2021 at 9:07 PM David Hildenbrand <david@redhat.com> wrote:
On 13.09.21 16:52, Michal Privoznik wrote:
Nothing special is happening here. All important changes were done when for 'virtio-pmem' (adjusting the code to put virtio memory on PCI bus, generating alias using qemuDomainDeviceAliasIndex(). The only bit that might look suspicious is no prealloc for virtio-mem. But if you think about it, the whole purpose of this device is to change amount of memory exposed to guest on the fly. There is no point in locking the whole backend in memory.
Do I understand correctly that we'll always try setting "reserve=off", and "prealloc=off"? That would be awesome :)
I do wonder if we want to warn or bail out if "priv->memPrealloc" is set but we are temporarily not able to support it -- as discussed, because virtio-mem in QEMU yet has to be taught about it.
FYI: QEMU will bail out if "reserve=off,prealloc=on" is set in combination.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_alias.c | 9 +++- src/qemu/qemu_command.c | 24 +++++++++-- ...mory-hotplug-virtio-mem.x86_64-latest.args | 41 +++++++++++++++++++ tests/qemuxml2argvtest.c | 1 + 4 files changed, 70 insertions(+), 5 deletions(-) create mode 100644
tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args
diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index 79e8953b2f..81a1e7eeed 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -475,8 +475,11 @@ qemuDeviceMemoryGetAliasID(virDomainDef *def, size_t i; int maxidx = 0;
- /* virtio-pmem goes onto PCI bus and thus DIMM address is not valid
*/
- if (!oldAlias && mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM) + /* virtio-pmem and virtio-mem go onto PCI bus and thus DIMM address is not + * valid */ + if (!oldAlias && + mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM && + mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) return mem->info.addr.dimm.slot;
for (i = 0; i < def->nmems; i++) { @@ -523,6 +526,8 @@ qemuAssignDeviceMemoryAlias(virDomainDef *def, prefix = "virtiopmem"; break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + prefix = "virtiomem"; + break; case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 594e5643b1..91f3094ac8 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3136,9 +3136,19 @@ qemuBuildMemoryBackendProps(virJSONValue **backendProps, virJSONValueObjectAdd(props, "b:x-use-canonical-path-for-ramblock-id", false, NULL) < 0) return -1;
- if (!priv->memPrealloc && - virJSONValueObjectAdd(props, "B:prealloc", prealloc, NULL) < 0) - return -1; + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) { + /* Explicitly disable prealloc for virtio-mem */ + if (priv->memPrealloc && + virJSONValueObjectAppendBoolean(props, "prealloc", 0) < 0) + return -1; + if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MEMORY_BACKEND_RESERVE) && + virJSONValueObjectAppendBoolean(props, "reserve", 0) < 0) + return -1; + } else { + if (!priv->memPrealloc && + virJSONValueObjectAdd(props, "B:prealloc", prealloc, NULL) < 0) + return -1; + }
if (virJSONValueObjectAdd(props, "U:size", mem->size * 1024, NULL) < 0) return -1; @@ -3318,6 +3328,9 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, break;
case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + device = "virtio-mem-pci"; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: @@ -3334,6 +3347,11 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, if (mem->labelsize) virBufferAsprintf(&buf, "label-size=%llu,", mem->labelsize * 1024);
+ if (mem->blocksize) { + virBufferAsprintf(&buf, "block-size=%llu,", mem->blocksize * 1024); + virBufferAsprintf(&buf, "requested-size=%llu,", mem->requestedsize * 1024); + } + if (mem->uuid) { char uuidstr[VIR_UUID_STRING_BUFLEN];
diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args new file mode 100644 index 0000000000..22ee1bc459 --- /dev/null +++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args @@ -0,0 +1,41 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +/usr/bin/qemu-system-i386 \ +-name guest=QEMUGuest1,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/tmp/lib/domain--1-QEMUGuest1/master-key.aes"}' \ +-machine pc,accel=kvm,usb=off,dump-guest-core=off \ +-cpu qemu64 \ +-m size=2095104k,slots=16,maxmem=1099511627776k \ +-overcommit mem-lock=off \ +-smp 2,sockets=2,dies=1,cores=1,threads=1 \ +-object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":2145386496}' \ +-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \ +-object '{"qom-type":"memory-backend-ram","id":"memvirtiomem0","reserve":false,"size":1073741824}' \ +-device virtio-mem-pci,node=0,block-size=2097152,requested-size=536870912,memdev=memvirtiomem0,id=virtiomem0,bus=pci.0,addr=0x2 \ +-object '{"qom-type":"memory-backend-file","id":"memvirtiomem1","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","reserve":false,"size":2147483648,"host-nodes":[1,2,3],"policy":"bind"}' \ +-device virtio-mem-pci,node=0,block-size=2097152,requested-size=1073741824,memdev=memvirtiomem1,id=virtiomem1,bus=pci.0,addr=0x3 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-boot strict=on \ +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ +-blockdev '{"driver":"host_device","filename":"/dev/HostVG/QEMUGuest1","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \ +-device ide-hd,bus=ide.0,unit=0,drive=libvirt-1-format,id=ide0-0-0,bootindex=1 \ +-audiodev id=audio1,driver=none \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 74af93b08f..23f4a8471f 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -3143,6 +3143,7 @@ mymain(void) ARG_END); DO_TEST_CAPS_VER("memory-hotplug-virtio-pmem", "5.2.0"); DO_TEST_CAPS_LATEST("memory-hotplug-virtio-pmem"); + DO_TEST_CAPS_LATEST("memory-hotplug-virtio-mem");
DO_TEST("machine-aeskeywrap-on-caps", QEMU_CAPS_AES_KEY_WRAP,
-- Thanks,
David / dhildenb
-- Thanks & Regards, Jing,Qi

On 9/15/21 8:16 AM, Jing Qi wrote:
Hi Michal, I tried to test the virtio-mem with upstream version v7.7.0-136-g9b49c2c6d3 adding the current patch (with qemu-6.1.0-7.fc36.x86_64) -
Hey, thanks for that!
<maxMemory slots='16' unit='KiB'>8388608</maxMemory> <memory unit='KiB'>1179648</memory> <currentMemory unit='KiB'>1179648</currentMemory> <memoryBacking> <allocation mode='immediate'/> </memoryBacking> .... <cpu mode='custom' match='exact' check='none'> <model fallback='forbid'>qemu64</model> <numa> <cell id='0' cpus='0-1' memory='1048576' unit='KiB' discard='yes'/> </numa> </cpu> ... <memory model='virtio-mem'> <source> <nodemask>0</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>131072</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>131072</requested> </target> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </memory>
The vm was started successfully and there was no "prealloc: true" in the command line. But I also found when <allocation mode='immediate'/> was added in the domain and the "virtio-mem" device was not added, the qemu command line also didn't have "prealloc:true". Is this correct? This result is different in "libvirt-7.6.0-3.module+el8.5.0+12510+80564ecf.x86_64" & "qemu-kvm-6.0.0-29.module+el8.5.0+12386+43574bac.x86_64".
This doesn't sound correct. But I'm unable to reproduce locally. Can you please share your domain XML that's failing? Michal

On 9/14/21 3:05 PM, David Hildenbrand wrote:
On 13.09.21 16:52, Michal Privoznik wrote:
Nothing special is happening here. All important changes were done when for 'virtio-pmem' (adjusting the code to put virtio memory on PCI bus, generating alias using qemuDomainDeviceAliasIndex(). The only bit that might look suspicious is no prealloc for virtio-mem. But if you think about it, the whole purpose of this device is to change amount of memory exposed to guest on the fly. There is no point in locking the whole backend in memory.
Do I understand correctly that we'll always try setting "reserve=off", and "prealloc=off"? That would be awesome :)
prealloc=off would be set (almost) unconditionally for all memory-backend-* objects that are associated with virtio-mem device. And if QEMU is new enough to have .reserve attribute then reserve=off is set too.
I do wonder if we want to warn or bail out if "priv->memPrealloc" is set but we are temporarily not able to support it -- as discussed, because virtio-mem in QEMU yet has to be taught about it.
priv->memPrealloc might not be what you think it is. The fact whether immediate allocation was requested is stored in def->mem.allocation; and priv->memPrealloc is just there to track whether -mem-prealloc what already put onto (paritally) generated cmd line and thus the rest of generators must refrain from setting prealloc=on. Codewise speaking, if you'd look at qemuBuildCommandLine() (this is where cmd line generation happens) then you'd see qemuBuildMemCommandLine() being called first and only after that qemuBuildMemoryDeviceCommandLine() being called second. The former is responsible for generating generic memory for guest (e.g. -m XX -mem-prealloc -mem-path ... which is the old style, nowadays a combination of -machine memory-backend= + -object memory-backend-* is generated). Long story short, if priv->memPrealloc isn't true at this point, then libvirt doesn't have to set prealloc=off explicitly, because off is the default.
FYI: QEMU will bail out if "reserve=off,prealloc=on" is set in combination.
Yeah, this combination should never happen. Michal

Hi Michal, The domain xml is attached. Regards, Jing Qi On Wed, Sep 15, 2021 at 7:14 PM Michal Prívozník <mprivozn@redhat.com> wrote:
On 9/14/21 3:05 PM, David Hildenbrand wrote:
On 13.09.21 16:52, Michal Privoznik wrote:
Nothing special is happening here. All important changes were done when for 'virtio-pmem' (adjusting the code to put virtio memory on PCI bus, generating alias using qemuDomainDeviceAliasIndex(). The only bit that might look suspicious is no prealloc for virtio-mem. But if you think about it, the whole purpose of this device is to change amount of memory exposed to guest on the fly. There is no point in locking the whole backend in memory.
Do I understand correctly that we'll always try setting "reserve=off", and "prealloc=off"? That would be awesome :)
prealloc=off would be set (almost) unconditionally for all memory-backend-* objects that are associated with virtio-mem device. And if QEMU is new enough to have .reserve attribute then reserve=off is set too.
I do wonder if we want to warn or bail out if "priv->memPrealloc" is set but we are temporarily not able to support it -- as discussed, because virtio-mem in QEMU yet has to be taught about it.
priv->memPrealloc might not be what you think it is. The fact whether immediate allocation was requested is stored in def->mem.allocation; and priv->memPrealloc is just there to track whether -mem-prealloc what already put onto (paritally) generated cmd line and thus the rest of generators must refrain from setting prealloc=on.
Codewise speaking, if you'd look at qemuBuildCommandLine() (this is where cmd line generation happens) then you'd see qemuBuildMemCommandLine() being called first and only after that qemuBuildMemoryDeviceCommandLine() being called second.
The former is responsible for generating generic memory for guest (e.g. -m XX -mem-prealloc -mem-path ... which is the old style, nowadays a combination of -machine memory-backend= + -object memory-backend-* is generated).
Long story short, if priv->memPrealloc isn't true at this point, then libvirt doesn't have to set prealloc=off explicitly, because off is the default.
FYI: QEMU will bail out if "reserve=off,prealloc=on" is set in
combination.
Yeah, this combination should never happen.
Michal
-- Thanks & Regards, Jing,Qi

On 9/15/21 1:26 PM, Jing Qi wrote:
Hi Michal, The domain xml is attached.
Thank you. I was able to reproduce but the problem is not in these patches. I can reproduce even without them. I have a fix that I will post shortly. Michal

On 15.09.21 13:07, Michal Prívozník wrote:
On 9/14/21 3:05 PM, David Hildenbrand wrote:
On 13.09.21 16:52, Michal Privoznik wrote:
Nothing special is happening here. All important changes were done when for 'virtio-pmem' (adjusting the code to put virtio memory on PCI bus, generating alias using qemuDomainDeviceAliasIndex(). The only bit that might look suspicious is no prealloc for virtio-mem. But if you think about it, the whole purpose of this device is to change amount of memory exposed to guest on the fly. There is no point in locking the whole backend in memory.
Do I understand correctly that we'll always try setting "reserve=off", and "prealloc=off"? That would be awesome :)
prealloc=off would be set (almost) unconditionally for all memory-backend-* objects that are associated with virtio-mem device. And if QEMU is new enough to have .reserve attribute then reserve=off is set too.
Ah, perfect.
I do wonder if we want to warn or bail out if "priv->memPrealloc" is set but we are temporarily not able to support it -- as discussed, because virtio-mem in QEMU yet has to be taught about it.
priv->memPrealloc might not be what you think it is. The fact whether immediate allocation was requested is stored in def->mem.allocation; and priv->memPrealloc is just there to track whether -mem-prealloc what already put onto (paritally) generated cmd line and thus the rest of generators must refrain from setting prealloc=on.
Codewise speaking, if you'd look at qemuBuildCommandLine() (this is where cmd line generation happens) then you'd see qemuBuildMemCommandLine() being called first and only after that qemuBuildMemoryDeviceCommandLine() being called second.
The former is responsible for generating generic memory for guest (e.g. -m XX -mem-prealloc -mem-path ... which is the old style, nowadays a combination of -machine memory-backend= + -object memory-backend-* is generated).
Long story short, if priv->memPrealloc isn't true at this point, then libvirt doesn't have to set prealloc=off explicitly, because off is the default.
That makes sense. Yet I wonder if we should warn that preallocation is currently not supported for virtio-mem and will be ignored. Your decision :) Thanks a bunch! -- Thanks, David / dhildenb

On 9/17/21 2:54 PM, David Hildenbrand wrote:
On 15.09.21 13:07, Michal Prívozník wrote:
On 9/14/21 3:05 PM, David Hildenbrand wrote:
On 13.09.21 16:52, Michal Privoznik wrote:
Nothing special is happening here. All important changes were done when for 'virtio-pmem' (adjusting the code to put virtio memory on PCI bus, generating alias using qemuDomainDeviceAliasIndex(). The only bit that might look suspicious is no prealloc for virtio-mem. But if you think about it, the whole purpose of this device is to change amount of memory exposed to guest on the fly. There is no point in locking the whole backend in memory.
Do I understand correctly that we'll always try setting "reserve=off", and "prealloc=off"? That would be awesome :)
prealloc=off would be set (almost) unconditionally for all memory-backend-* objects that are associated with virtio-mem device. And if QEMU is new enough to have .reserve attribute then reserve=off is set too.
Ah, perfect.
I do wonder if we want to warn or bail out if "priv->memPrealloc" is set but we are temporarily not able to support it -- as discussed, because virtio-mem in QEMU yet has to be taught about it.
priv->memPrealloc might not be what you think it is. The fact whether immediate allocation was requested is stored in def->mem.allocation; and priv->memPrealloc is just there to track whether -mem-prealloc what already put onto (paritally) generated cmd line and thus the rest of generators must refrain from setting prealloc=on.
Codewise speaking, if you'd look at qemuBuildCommandLine() (this is where cmd line generation happens) then you'd see qemuBuildMemCommandLine() being called first and only after that qemuBuildMemoryDeviceCommandLine() being called second.
The former is responsible for generating generic memory for guest (e.g. -m XX -mem-prealloc -mem-path ... which is the old style, nowadays a combination of -machine memory-backend= + -object memory-backend-* is generated).
Long story short, if priv->memPrealloc isn't true at this point, then libvirt doesn't have to set prealloc=off explicitly, because off is the default.
That makes sense. Yet I wonder if we should warn that preallocation is currently not supported for virtio-mem and will be ignored. Your decision :)
We could do that, yes. Let me respin the series with that change - these patches don't apply cleanly anymore anyway. Should a similar warning be produced when hugepages are configured? Michal

On 21.09.21 10:56, Michal Prívozník wrote:
On 9/17/21 2:54 PM, David Hildenbrand wrote:
On 15.09.21 13:07, Michal Prívozník wrote:
On 9/14/21 3:05 PM, David Hildenbrand wrote:
On 13.09.21 16:52, Michal Privoznik wrote:
Nothing special is happening here. All important changes were done when for 'virtio-pmem' (adjusting the code to put virtio memory on PCI bus, generating alias using qemuDomainDeviceAliasIndex(). The only bit that might look suspicious is no prealloc for virtio-mem. But if you think about it, the whole purpose of this device is to change amount of memory exposed to guest on the fly. There is no point in locking the whole backend in memory.
Do I understand correctly that we'll always try setting "reserve=off", and "prealloc=off"? That would be awesome :)
prealloc=off would be set (almost) unconditionally for all memory-backend-* objects that are associated with virtio-mem device. And if QEMU is new enough to have .reserve attribute then reserve=off is set too.
Ah, perfect.
I do wonder if we want to warn or bail out if "priv->memPrealloc" is set but we are temporarily not able to support it -- as discussed, because virtio-mem in QEMU yet has to be taught about it.
priv->memPrealloc might not be what you think it is. The fact whether immediate allocation was requested is stored in def->mem.allocation; and priv->memPrealloc is just there to track whether -mem-prealloc what already put onto (paritally) generated cmd line and thus the rest of generators must refrain from setting prealloc=on.
Codewise speaking, if you'd look at qemuBuildCommandLine() (this is where cmd line generation happens) then you'd see qemuBuildMemCommandLine() being called first and only after that qemuBuildMemoryDeviceCommandLine() being called second.
The former is responsible for generating generic memory for guest (e.g. -m XX -mem-prealloc -mem-path ... which is the old style, nowadays a combination of -machine memory-backend= + -object memory-backend-* is generated).
Long story short, if priv->memPrealloc isn't true at this point, then libvirt doesn't have to set prealloc=off explicitly, because off is the default.
That makes sense. Yet I wonder if we should warn that preallocation is currently not supported for virtio-mem and will be ignored. Your decision :)
We could do that, yes. Let me respin the series with that change - these patches don't apply cleanly anymore anyway. Should a similar warning be produced when hugepages are configured?
I think it might make sense to print a warning if we cannot support prealloc and it was requested. We can then skip the warning once virtio-mem supports the "prealloc" property that we can just enable. Using hugepages without prealloc is in general unsafe, so I don't think we need a virtio-mem specific warning. :) Thanks! -- Thanks, David / dhildenb

As advertised in one of previous commits, we want to be able to change 'requested-size' attribute of virtio-mem on the fly. This commit does exactly that. Changing anything else is checked for and forbidden. Once guest has changed the allocation, QEMU emits an event which we will use to track the allocation. In the next commit. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 47 +++++++++- src/conf/domain_conf.h | 5 + src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 174 ++++++++++++++++++++++++++++++++++- src/qemu/qemu_hotplug.c | 18 ++++ src/qemu/qemu_hotplug.h | 5 + src/qemu/qemu_monitor.c | 13 +++ src/qemu/qemu_monitor.h | 5 + src/qemu/qemu_monitor_json.c | 15 +++ src/qemu/qemu_monitor_json.h | 5 + 10 files changed, 286 insertions(+), 2 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index c4cac65a7b..9562065499 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -16574,6 +16574,50 @@ virDomainMemoryFindInactiveByDef(virDomainDef *def, } +/** + * virDomainMemoryFindByDeviceInfo: + * @def: domain defintion + * @info: device info to match + * @pos: store position within array + * + * For given domain definition @def find <memory/> device with + * matching address and matching device alias (if set in @info, + * otherwise ignored). + * + * If @pos is not NULL then the position of the matched device + * within the array is stored there. + * + * Returns: device if found, + * NULL otherwise. + */ +virDomainMemoryDef * +virDomainMemoryFindByDeviceInfo(virDomainDef *def, + virDomainDeviceInfo *info, + int *pos) +{ + size_t i; + + for (i = 0; i < def->nmems; i++) { + virDomainMemoryDef *tmp = def->mems[i]; + + if (!virDomainDeviceInfoAddressIsEqual(&tmp->info, info)) + continue; + + /* alias, if present */ + if (info->alias && + STRNEQ_NULLABLE(tmp->info.alias, info->alias)) + continue; + + if (pos) + *pos = i; + + return tmp; + } + + return NULL; +} + + /** * virDomainMemoryInsert: * @@ -28567,7 +28611,8 @@ virDomainDefCompatibleDevice(virDomainDef *def, return -1; } - if ((virDomainDefGetMemoryTotal(def) + sz) > def->mem.max_memory) { + if (action == VIR_DOMAIN_DEVICE_ACTION_ATTACH && + (virDomainDefGetMemoryTotal(def) + sz) > def->mem.max_memory) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("Attaching memory device with size '%llu' would " "exceed domain's maxMemory config size '%llu'"), diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 3bdfb66a16..e9d8e0c7ae 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -3806,6 +3806,11 @@ int virDomainMemoryFindByDef(virDomainDef *def, virDomainMemoryDef *mem) int virDomainMemoryFindInactiveByDef(virDomainDef *def, virDomainMemoryDef *mem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; +virDomainMemoryDef * +virDomainMemoryFindByDeviceInfo(virDomainDef *dev, + virDomainDeviceInfo *info, + int *pos) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; int virDomainShmemDefInsert(virDomainDef *def, virDomainShmemDef *shmem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index d42d3abcce..3f9035c25b 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -508,6 +508,7 @@ virDomainMemballoonModelTypeFromString; virDomainMemballoonModelTypeToString; virDomainMemoryDefFree; virDomainMemoryFindByDef; +virDomainMemoryFindByDeviceInfo; virDomainMemoryFindInactiveByDef; virDomainMemoryInsert; virDomainMemoryModelTypeToString; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index dfc27572c4..d05052a3d7 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7087,6 +7087,175 @@ qemuDomainChangeDiskLive(virDomainObj *vm, return 0; } + +static bool +qemuDomainChangeMemoryLiveValidateChange(const virDomainMemoryDef *oldDef, + const virDomainMemoryDef *newDef) +{ + /* The only thing that is allowed to change is 'requestedsize' for + * virtio-mem model. Check if user isn't trying to sneak in change for + * something else. */ + + switch (oldDef->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory of model '%s'"), + virDomainMemoryModelTypeToString(oldDef->model)); + return false; + break; + } + + if (oldDef->model != newDef->model) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory model from '%s' to '%s'"), + virDomainMemoryModelTypeToString(oldDef->model), + virDomainMemoryModelTypeToString(newDef->model)); + return false; + } + + if (oldDef->access != newDef->access) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory access from '%s' to '%s'"), + virDomainMemoryAccessTypeToString(oldDef->access), + virDomainMemoryAccessTypeToString(newDef->access)); + return false; + } + + if (oldDef->discard != newDef->discard) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory discard from '%s' to '%s'"), + virTristateBoolTypeToString(oldDef->discard), + virTristateBoolTypeToString(newDef->discard)); + return false; + } + + if (!virBitmapEqual(oldDef->sourceNodes, + newDef->sourceNodes)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory source nodes")); + return false; + } + + if (oldDef->pagesize != newDef->pagesize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory pagesize from '%llu' to '%llu'"), + oldDef->pagesize, + newDef->pagesize); + return false; + } + + if (STRNEQ_NULLABLE(oldDef->nvdimmPath, newDef->nvdimmPath)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory path from '%s' to '%s'"), + NULLSTR(oldDef->nvdimmPath), + NULLSTR(newDef->nvdimmPath)); + return false; + } + + if (oldDef->alignsize != newDef->alignsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory align size from '%llu' to '%llu'"), + oldDef->alignsize, newDef->alignsize); + return false; + } + + if (oldDef->nvdimmPmem != newDef->nvdimmPmem) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory pmem from '%d' to '%d'"), + oldDef->nvdimmPmem, newDef->nvdimmPmem); + return false; + } + + if (oldDef->targetNode != newDef->targetNode) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory targetNode from '%d' to '%d'"), + oldDef->targetNode, newDef->targetNode); + return false; + } + + if (oldDef->size != newDef->size) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory size from '%llu' to '%llu'"), + oldDef->size, newDef->size); + return false; + } + + if (oldDef->labelsize != newDef->labelsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory label size from '%llu' to '%llu'"), + oldDef->labelsize, newDef->labelsize); + return false; + } + if (oldDef->blocksize != newDef->blocksize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory block size from '%llu' to '%llu'"), + oldDef->blocksize, newDef->blocksize); + return false; + } + + /* requestedsize can change */ + + if (oldDef->readonly != newDef->readonly) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory pmem flag")); + return false; + } + + if ((oldDef->uuid || newDef->uuid) && + !(oldDef->uuid && newDef->uuid && + memcmp(oldDef->uuid, newDef->uuid, VIR_UUID_BUFLEN) == 0)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory UUID")); + return false; + } + + return true; +} + + +static int +qemuDomainChangeMemoryLive(virQEMUDriver *driver G_GNUC_UNUSED, + virDomainObj *vm, + virDomainDeviceDef *dev) +{ + virDomainDeviceDef oldDev = { .type = VIR_DOMAIN_DEVICE_MEMORY }; + virDomainMemoryDef *newDef = dev->data.memory; + virDomainMemoryDef *oldDef = NULL; + + oldDef = virDomainMemoryFindByDeviceInfo(vm->def, &dev->data.memory->info, NULL); + if (!oldDef) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("memory '%s' not found"), dev->data.memory->info.alias); + return -1; + } + + oldDev.data.memory = oldDef; + + if (virDomainDefCompatibleDevice(vm->def, dev, &oldDev, + VIR_DOMAIN_DEVICE_ACTION_UPDATE, + true) < 0) + return -1; + + if (!qemuDomainChangeMemoryLiveValidateChange(oldDef, newDef)) + return -1; + + if (qemuDomainChangeMemoryRequestedSize(driver, vm, + newDef->info.alias, + newDef->requestedsize) < 0) + return -1; + + oldDef->requestedsize = newDef->requestedsize; + return 0; +} + + static int qemuDomainUpdateDeviceLive(virDomainObj *vm, virDomainDeviceDef *dev, @@ -7128,6 +7297,10 @@ qemuDomainUpdateDeviceLive(virDomainObj *vm, ret = qemuDomainChangeNet(driver, vm, dev); break; + case VIR_DOMAIN_DEVICE_MEMORY: + ret = qemuDomainChangeMemoryLive(driver, vm, dev); + break; + case VIR_DOMAIN_DEVICE_FS: case VIR_DOMAIN_DEVICE_INPUT: case VIR_DOMAIN_DEVICE_SOUND: @@ -7143,7 +7316,6 @@ qemuDomainUpdateDeviceLive(virDomainObj *vm, case VIR_DOMAIN_DEVICE_HOSTDEV: case VIR_DOMAIN_DEVICE_CONTROLLER: case VIR_DOMAIN_DEVICE_REDIRDEV: - case VIR_DOMAIN_DEVICE_MEMORY: case VIR_DOMAIN_DEVICE_CHR: case VIR_DOMAIN_DEVICE_NONE: case VIR_DOMAIN_DEVICE_TPM: diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 9c16ab4567..ccfd83cf66 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -6800,3 +6800,21 @@ qemuDomainSetVcpuInternal(virQEMUDriver *driver, virBitmapFree(livevcpus); return ret; } + + +int +qemuDomainChangeMemoryRequestedSize(virQEMUDriver *driver, + virDomainObj *vm, + const char *alias, + unsigned long long requestedsize) +{ + qemuDomainObjPrivate *priv = vm->privateData; + int rc; + + qemuDomainObjEnterMonitor(driver, vm); + rc = qemuMonitorChangeMemoryRequestedSize(priv->mon, alias, requestedsize); + if (qemuDomainObjExitMonitor(driver, vm) < 0) + return -1; + + return rc; +} diff --git a/src/qemu/qemu_hotplug.h b/src/qemu/qemu_hotplug.h index b5f7afb076..ae8feb459c 100644 --- a/src/qemu/qemu_hotplug.h +++ b/src/qemu/qemu_hotplug.h @@ -166,3 +166,8 @@ int qemuHotplugAttachDBusVMState(virQEMUDriver *driver, int qemuHotplugRemoveDBusVMState(virQEMUDriver *driver, virDomainObj *vm, qemuDomainAsyncJob asyncJob); + +int qemuDomainChangeMemoryRequestedSize(virQEMUDriver *driver, + virDomainObj *vm, + const char *alias, + unsigned long long requestedsize); diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 42846349c4..24119ef66e 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -4625,3 +4625,16 @@ qemuMonitorSetAction(qemuMonitor *mon, return qemuMonitorJSONSetAction(mon, shutdown, reboot, watchdog, panic); } + + +int +qemuMonitorChangeMemoryRequestedSize(qemuMonitor *mon, + const char *alias, + unsigned long long requestedsize) +{ + VIR_DEBUG("alias=%s requestedsize=%llu", alias, requestedsize); + + QEMU_CHECK_MONITOR(mon); + + return qemuMonitorJSONChangeMemoryRequestedSize(mon, alias, requestedsize); +} diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 2f08357c0c..0ebe73ac04 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1537,3 +1537,8 @@ qemuMonitorSetAction(qemuMonitor *mon, qemuMonitorActionReboot reboot, qemuMonitorActionWatchdog watchdog, qemuMonitorActionPanic panic); + +int +qemuMonitorChangeMemoryRequestedSize(qemuMonitor *mon, + const char *alias, + unsigned long long requestedsize); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 8d3c4031a6..d5e95f3dda 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -9404,3 +9404,18 @@ qemuMonitorJSONSetAction(qemuMonitor *mon, return 0; } + + +int +qemuMonitorJSONChangeMemoryRequestedSize(qemuMonitor *mon, + const char *alias, + unsigned long long requestedsize) +{ + g_autofree char *path = g_strdup_printf("/machine/peripheral/%s", alias); + qemuMonitorJSONObjectProperty prop = { + .type = QEMU_MONITOR_OBJECT_PROPERTY_ULONG, + .val.ul = requestedsize * 1024, /* monitor needs bytes */ + }; + + return qemuMonitorJSONSetObjectProperty(mon, path, "requested-size", &prop); +} diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index c8cf734a1c..0289f8c3bf 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -710,3 +710,8 @@ qemuMonitorJSONSetAction(qemuMonitor *mon, qemuMonitorActionReboot reboot, qemuMonitorActionWatchdog watchdog, qemuMonitorActionPanic panic); + +int +qemuMonitorJSONChangeMemoryRequestedSize(qemuMonitor *mon, + const char *alias, + unsigned long long requestedsize); -- 2.32.0

Updating offline XML of <memory/> devices might come handy when dealing with virtio-mem devices. But it's implemented to just replace one virDomainMemoryDef with another so it can be used to change almost anything. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_driver.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index d05052a3d7..10ff8519c5 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7764,6 +7764,7 @@ qemuDomainUpdateDeviceConfig(virDomainDef *vmdef, virDomainDiskDef *newDisk; virDomainGraphicsDef *newGraphics; virDomainNetDef *net; + virDomainMemoryDef *mem; virDomainDeviceDef oldDev = { .type = dev->type }; int pos; @@ -7826,6 +7827,23 @@ qemuDomainUpdateDeviceConfig(virDomainDef *vmdef, dev->data.net = NULL; break; + case VIR_DOMAIN_DEVICE_MEMORY: + mem = virDomainMemoryFindByDeviceInfo(vmdef, &dev->data.memory->info, &pos); + if (!mem) { + virReportError(VIR_ERR_INVALID_ARG, "%s", _("memory not found")); + return -1; + } + + oldDev.data.memory = mem; + if (virDomainDefCompatibleDevice(vmdef, dev, &oldDev, + VIR_DOMAIN_DEVICE_ACTION_UPDATE, + false) < 0) + return -1; + + virDomainMemoryDefFree(vmdef->mems[pos]); + vmdef->mems[pos] = g_steal_pointer(&dev->data.memory); + break; + case VIR_DOMAIN_DEVICE_FS: case VIR_DOMAIN_DEVICE_INPUT: case VIR_DOMAIN_DEVICE_SOUND: @@ -7842,7 +7860,6 @@ qemuDomainUpdateDeviceConfig(virDomainDef *vmdef, case VIR_DOMAIN_DEVICE_CONTROLLER: case VIR_DOMAIN_DEVICE_REDIRDEV: case VIR_DOMAIN_DEVICE_CHR: - case VIR_DOMAIN_DEVICE_MEMORY: case VIR_DOMAIN_DEVICE_NONE: case VIR_DOMAIN_DEVICE_TPM: case VIR_DOMAIN_DEVICE_PANIC: -- 2.32.0

The virtio-mem has another property that isn't exposed yet: current size exposed to the guest. Please note, that this is different to <requested/> because esp. on sizing the memory down guest may refuse to release some blocks. Therefore, let's have another size to report in the XML. But because of its nature, the <current/> won't be parsed and is report only (for live XMLs). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.rst | 7 +++++++ docs/schemas/domaincommon.rng | 5 +++++ src/conf/domain_conf.c | 9 +++++++-- src/conf/domain_conf.h | 3 +++ 4 files changed, 22 insertions(+), 2 deletions(-) diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index c327e1deae..7dc93d6abe 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -7800,6 +7800,7 @@ Example: usage of the memory devices <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> + <current unit='KiB'>524288</current> </target> </memory> </devices> @@ -7915,6 +7916,12 @@ Example: usage of the memory devices The total size exposed to the guest. Must respect ``block`` granularity and be smaller than or equal to ``size``. + ``current`` + Active XML for ``virtio-mem`` model may contain ``current`` element that + reflects the current size of the corresponding virtio memory device. The + element is formatted into live XML and never parsed, i.e. it is + output-only element. + :anchor:`<a id="elementsIommu"/>` IOMMU devices diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 82fdc7c8b5..a68f794262 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -6693,6 +6693,11 @@ <ref name="scaledInteger"/> </element> </optional> + <optional> + <element name="current"> + <ref name="scaledInteger"/> + </element> + </optional> <optional> <element name="label"> <element name="size"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 9562065499..c91dd9d81c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -25905,7 +25905,8 @@ virDomainMemorySourceDefFormat(virBuffer *buf, static void virDomainMemoryTargetDefFormat(virBuffer *buf, - virDomainMemoryDef *def) + virDomainMemoryDef *def, + unsigned int flags) { g_auto(virBuffer) childBuf = VIR_BUFFER_INIT_CHILD(buf); @@ -25927,6 +25928,10 @@ virDomainMemoryTargetDefFormat(virBuffer *buf, virBufferAsprintf(&childBuf, "<requested unit='KiB'>%llu</requested>\n", def->requestedsize); + if (!(flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE)) { + virBufferAsprintf(&childBuf, "<current unit='KiB'>%llu</current>\n", + def->currentsize); + } } virXMLFormatElement(buf, "target", NULL, &childBuf); @@ -25959,7 +25964,7 @@ virDomainMemoryDefFormat(virBuffer *buf, if (virDomainMemorySourceDefFormat(buf, def) < 0) return -1; - virDomainMemoryTargetDefFormat(buf, def); + virDomainMemoryTargetDefFormat(buf, def, flags); virDomainDeviceInfoFormat(buf, &def->info, flags); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index e9d8e0c7ae..91bf52ff3b 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2495,6 +2495,9 @@ struct _virDomainMemoryDef { unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */ unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */ unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM */ + unsigned long long currentsize; /* kibibytes, valid for VIRTIO_MEM and + active domain only, only to report never + parse */ bool readonly; /* valid only for NVDIMM */ /* required for QEMU NVDIMM ppc64 support */ -- 2.32.0

This function will be needed in the next commit where we will want to find virtio-mem given its alias by QEMU on the monitor. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 17 +++++++++++++++++ src/conf/domain_conf.h | 5 +++++ src/libvirt_private.syms | 1 + 3 files changed, 23 insertions(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index c91dd9d81c..ae0218a674 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -16618,6 +16618,23 @@ virDomainMemoryFindByDeviceInfo(virDomainDef *def, } +virDomainMemoryDef * +virDomainMemoryFindByDeviceAlias(virDomainDef *def, + const char *alias) +{ + size_t i; + + for (i = 0; i < def->nmems; i++) { + virDomainMemoryDef *tmp = def->mems[i]; + + if (STREQ_NULLABLE(tmp->info.alias, alias)) + return tmp; + } + + return NULL; +} + + /** * virDomainMemoryInsert: * diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 91bf52ff3b..2ce32934d5 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -3815,6 +3815,11 @@ virDomainMemoryFindByDeviceInfo(virDomainDef *dev, int *pos) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; +virDomainMemoryDef * +virDomainMemoryFindByDeviceAlias(virDomainDef *def, + const char *alias) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; + int virDomainShmemDefInsert(virDomainDef *def, virDomainShmemDef *shmem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; bool virDomainShmemDefEquals(virDomainShmemDef *src, virDomainShmemDef *dst) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 3f9035c25b..dfb1e29ca0 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -508,6 +508,7 @@ virDomainMemballoonModelTypeFromString; virDomainMemballoonModelTypeToString; virDomainMemoryDefFree; virDomainMemoryFindByDef; +virDomainMemoryFindByDeviceAlias; virDomainMemoryFindByDeviceInfo; virDomainMemoryFindInactiveByDef; virDomainMemoryInsert; -- 2.32.0

As advertised in previous commit, this event is delivered to us when virtio-mem module changes the allocation inside the guest. It comes with one attribute - size - which holds the new size of the virtio-mem (well, allocated size), in bytes. Mind you, this is not necessarily the same number as 'requested size'. It almost certainly will be when sizing the memory up, but it might not be when sizing the memory down - the guest kernel might be unable to free some blocks. This current size is reported in the domain XML as an output element only. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- examples/c/misc/event-test.c | 17 ++++++ include/libvirt/libvirt-domain.h | 24 +++++++++ src/conf/domain_event.c | 84 +++++++++++++++++++++++++++++ src/conf/domain_event.h | 10 ++++ src/libvirt_private.syms | 2 + src/qemu/qemu_domain.c | 3 ++ src/qemu/qemu_domain.h | 1 + src/qemu/qemu_driver.c | 37 +++++++++++++ src/qemu/qemu_monitor.c | 21 ++++++++ src/qemu/qemu_monitor.h | 20 +++++++ src/qemu/qemu_monitor_json.c | 24 +++++++++ src/qemu/qemu_process.c | 42 +++++++++++++++ src/remote/remote_daemon_dispatch.c | 30 +++++++++++ src/remote/remote_driver.c | 32 +++++++++++ src/remote/remote_protocol.x | 15 +++++- src/remote_protocol-structs | 7 +++ tools/virsh-domain.c | 20 +++++++ 17 files changed, 388 insertions(+), 1 deletion(-) diff --git a/examples/c/misc/event-test.c b/examples/c/misc/event-test.c index 10c707e66b..1eec76c79d 100644 --- a/examples/c/misc/event-test.c +++ b/examples/c/misc/event-test.c @@ -982,6 +982,22 @@ myDomainEventMemoryFailureCallback(virConnectPtr conn G_GNUC_UNUSED, } +static int +myDomainEventMemoryDeviceSizeChangeCallback(virConnectPtr conn G_GNUC_UNUSED, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque G_GNUC_UNUSED) +{ + /* Casts to uint64_t to work around mingw not knowing %lld */ + printf("%s EVENT: Domain %s(%d) memory device size change: " + "alias: '%s' new size %" PRIu64 "'\n", + __func__, virDomainGetName(dom), virDomainGetID(dom), + alias, (uint64_t)size); + return 0; +} + + static int myDomainEventMigrationIterationCallback(virConnectPtr conn G_GNUC_UNUSED, virDomainPtr dom, @@ -1113,6 +1129,7 @@ struct domainEventData domainEvents[] = { DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_METADATA_CHANGE, myDomainEventMetadataChangeCallback), DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD, myDomainEventBlockThresholdCallback), DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_MEMORY_FAILURE, myDomainEventMemoryFailureCallback), + DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE, myDomainEventMemoryDeviceSizeChangeCallback), }; struct storagePoolEventData { diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 7ef8ac51e5..eaafcc6b29 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -4655,6 +4655,29 @@ typedef void (*virConnectDomainEventMemoryFailureCallback)(virConnectPtr conn, void *opaque); +/** + * virConnectDomainEventMemoryDeviceSizeChangeCallback: + * @conn: connection object + * @dom: domain on which the event occurred + * @alias: memory device alias + * @size: new current size of memory device (in KiB) + * @opaque: application specified data + * + * The callback occurs when the guest acknowledges request to change size of + * memory device (so far only virtio-mem model supports this). The @size then + * reflects the new amount of guest visible memory (in kibibytes). + * + * The callback signature to use when registering for an event of type + * VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE with + * virConnectDomainEventRegisterAny(). + */ +typedef void (*virConnectDomainEventMemoryDeviceSizeChangeCallback)(virConnectPtr conn, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque); + + /** * VIR_DOMAIN_EVENT_CALLBACK: * @@ -4698,6 +4721,7 @@ typedef enum { VIR_DOMAIN_EVENT_ID_METADATA_CHANGE = 23, /* virConnectDomainEventMetadataChangeCallback */ VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD = 24, /* virConnectDomainEventBlockThresholdCallback */ VIR_DOMAIN_EVENT_ID_MEMORY_FAILURE = 25, /* virConnectDomainEventMemoryFailureCallback */ + VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE = 26, /* virConnectDomainEventMemoryDeviceSizeChangeCallback */ # ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_EVENT_ID_LAST diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c index 15a3baedf7..18539e348b 100644 --- a/src/conf/domain_event.c +++ b/src/conf/domain_event.c @@ -58,6 +58,7 @@ static virClass *virDomainEventDeviceRemovalFailedClass; static virClass *virDomainEventMetadataChangeClass; static virClass *virDomainEventBlockThresholdClass; static virClass *virDomainEventMemoryFailureClass; +static virClass *virDomainEventMemoryDeviceSizeChangeClass; static void virDomainEventDispose(void *obj); static void virDomainEventLifecycleDispose(void *obj); @@ -81,6 +82,7 @@ static void virDomainEventDeviceRemovalFailedDispose(void *obj); static void virDomainEventMetadataChangeDispose(void *obj); static void virDomainEventBlockThresholdDispose(void *obj); static void virDomainEventMemoryFailureDispose(void *obj); +static void virDomainEventMemoryDeviceSizeChangeDispose(void *obj); static void virDomainEventDispatchDefaultFunc(virConnectPtr conn, @@ -277,6 +279,15 @@ struct _virDomainEventMemoryFailure { }; typedef struct _virDomainEventMemoryFailure virDomainEventMemoryFailure; +struct _virDomainEventMemoryDeviceSizeChange { + virDomainEvent parent; + + char *alias; + unsigned long long size; +}; +typedef struct _virDomainEventMemoryDeviceSizeChange virDomainEventMemoryDeviceSizeChange; +typedef virDomainEventMemoryDeviceSizeChange *virDomainEventMemoryDeviceSizeChangePtr; + static int virDomainEventsOnceInit(void) { @@ -324,6 +335,8 @@ virDomainEventsOnceInit(void) return -1; if (!VIR_CLASS_NEW(virDomainEventMemoryFailure, virDomainEventClass)) return -1; + if (!VIR_CLASS_NEW(virDomainEventMemoryDeviceSizeChange, virDomainEventClass)) + return -1; return 0; } @@ -540,6 +553,14 @@ virDomainEventMemoryFailureDispose(void *obj) VIR_DEBUG("obj=%p", event); } +static void +virDomainEventMemoryDeviceSizeChangeDispose(void *obj) +{ + virDomainEventMemoryDeviceSizeChangePtr event = obj; + VIR_DEBUG("obj=%p", event); + + g_free(event->alias); +} static void * virDomainEventNew(virClass *klass, @@ -1664,6 +1685,57 @@ virDomainEventMemoryFailureNewFromDom(virDomainPtr dom, recipient, action, flags); } + +static virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNew(int id, + const char *name, + unsigned char *uuid, + const char *alias, + unsigned long long size) +{ + virDomainEventMemoryDeviceSizeChangePtr ev; + + if (virDomainEventsInitialize() < 0) + return NULL; + + if (!(ev = virDomainEventNew(virDomainEventMemoryDeviceSizeChangeClass, + VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE, + id, name, uuid))) + return NULL; + + ev->alias = g_strdup(alias); + ev->size = size; + + return (virObjectEvent *)ev; +} + + +virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNewFromObj(virDomainObj *obj, + const char *alias, + unsigned long long size) +{ + return virDomainEventMemoryDeviceSizeChangeNew(obj->def->id, + obj->def->name, + obj->def->uuid, + alias, + size); +} + + +virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNewFromDom(virDomainPtr dom, + const char *alias, + unsigned long long size) +{ + return virDomainEventMemoryDeviceSizeChangeNew(dom->id, + dom->name, + dom->uuid, + alias, + size); +} + + static void virDomainEventDispatchDefaultFunc(virConnectPtr conn, virObjectEvent *event, @@ -1960,6 +2032,18 @@ virDomainEventDispatchDefaultFunc(virConnectPtr conn, goto cleanup; } + case VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE: + { + virDomainEventMemoryDeviceSizeChangePtr memoryDeviceSizeChangeEvent; + + memoryDeviceSizeChangeEvent = (virDomainEventMemoryDeviceSizeChangePtr)event; + ((virConnectDomainEventMemoryDeviceSizeChangeCallback)cb)(conn, dom, + memoryDeviceSizeChangeEvent->alias, + memoryDeviceSizeChangeEvent->size, + cbopaque); + goto cleanup; + } + case VIR_DOMAIN_EVENT_ID_LAST: break; } diff --git a/src/conf/domain_event.h b/src/conf/domain_event.h index 2a59e613cd..4a9f6b988b 100644 --- a/src/conf/domain_event.h +++ b/src/conf/domain_event.h @@ -267,6 +267,16 @@ virDomainEventMemoryFailureNewFromDom(virDomainPtr dom, int action, unsigned int flags); +virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNewFromObj(virDomainObj *obj, + const char *alias, + unsigned long long size); + +virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNewFromDom(virDomainPtr dom, + const char *alias, + unsigned long long size); + int virDomainEventStateRegister(virConnectPtr conn, virObjectEventState *state, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index dfb1e29ca0..83fd2b022a 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -726,6 +726,8 @@ virDomainEventLifecycleNew; virDomainEventLifecycleNewFromDef; virDomainEventLifecycleNewFromDom; virDomainEventLifecycleNewFromObj; +virDomainEventMemoryDeviceSizeChangeNewFromDom; +virDomainEventMemoryDeviceSizeChangeNewFromObj; virDomainEventMemoryFailureNewFromDom; virDomainEventMemoryFailureNewFromObj; virDomainEventMetadataChangeNewFromDom; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 298f9b076d..325e45d009 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11033,6 +11033,9 @@ qemuProcessEventFree(struct qemuProcessEvent *event) case QEMU_PROCESS_EVENT_JOB_STATUS_CHANGE: virObjectUnref(event->data); break; + case QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE: + qemuMonitorMemoryDeviceSizeChangeFree(event->data); + break; case QEMU_PROCESS_EVENT_PR_DISCONNECT: case QEMU_PROCESS_EVENT_LAST: break; diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index ae9d76ec4a..802de93e2e 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -438,6 +438,7 @@ typedef enum { QEMU_PROCESS_EVENT_PR_DISCONNECT, QEMU_PROCESS_EVENT_RDMA_GID_STATUS_CHANGED, QEMU_PROCESS_EVENT_GUEST_CRASHLOADED, + QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE, QEMU_PROCESS_EVENT_LAST } qemuProcessEventType; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 10ff8519c5..712ec68f7c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4254,6 +4254,40 @@ processGuestCrashloadedEvent(virQEMUDriver *driver, } +static void +processMemoryDeviceSizeChange(virQEMUDriver *driver, + virDomainObj *vm, + qemuMonitorMemoryDeviceSizeChange *info) +{ + virDomainMemoryDef *mem = NULL; + virObjectEvent *event = NULL; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) + return; + + if (!virDomainObjIsActive(vm)) { + VIR_DEBUG("Domain is not running"); + goto endjob; + } + + mem = virDomainMemoryFindByDeviceAlias(vm->def, info->devAlias); + if (!mem) { + VIR_DEBUG("Memory device '%s' not found", info->devAlias); + goto endjob; + } + + mem->currentsize = VIR_DIV_UP(info->size, 1024); + + event = virDomainEventMemoryDeviceSizeChangeNewFromObj(vm, + info->devAlias, + mem->currentsize); + + endjob: + qemuDomainObjEndJob(driver, vm); + virObjectEventStateQueue(driver->domainEventState, event); +} + + static void qemuProcessEventHandler(void *data, void *opaque) { struct qemuProcessEvent *processEvent = data; @@ -4303,6 +4337,9 @@ static void qemuProcessEventHandler(void *data, void *opaque) case QEMU_PROCESS_EVENT_GUEST_CRASHLOADED: processGuestCrashloadedEvent(driver, vm); break; + case QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE: + processMemoryDeviceSizeChange(driver, vm, processEvent->data); + break; case QEMU_PROCESS_EVENT_LAST: break; } diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 24119ef66e..5fc23f13d3 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1382,6 +1382,17 @@ qemuMonitorEmitSpiceMigrated(qemuMonitor *mon) } +void +qemuMonitorEmitMemoryDeviceSizeChange(qemuMonitor *mon, + const char *devAlias, + unsigned long long size) +{ + VIR_DEBUG("mon=%p, devAlias='%s', size=%llu", mon, devAlias, size); + + QEMU_MONITOR_CALLBACK(mon, domainMemoryDeviceSizeChange, mon->vm, devAlias, size); +} + + void qemuMonitorEmitMemoryFailure(qemuMonitor *mon, qemuMonitorEventMemoryFailure *mfp) @@ -4266,6 +4277,16 @@ qemuMonitorEventRdmaGidStatusFree(qemuMonitorRdmaGidStatus *info) } +void +qemuMonitorMemoryDeviceSizeChangeFree(qemuMonitorMemoryDeviceSizeChangePtr info) +{ + if (!info) + return; + + g_free(info->devAlias); +} + + int qemuMonitorSetWatchdogAction(qemuMonitor *mon, const char *action) diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 0ebe73ac04..962b516542 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -101,6 +101,14 @@ struct _qemuMonitorRdmaGidStatus { }; +typedef struct _qemuMonitorMemoryDeviceSizeChange qemuMonitorMemoryDeviceSizeChange; +typedef qemuMonitorMemoryDeviceSizeChange *qemuMonitorMemoryDeviceSizeChangePtr; +struct _qemuMonitorMemoryDeviceSizeChange { + char *devAlias; + unsigned long long size; +}; + + typedef enum { QEMU_MONITOR_ACTION_SHUTDOWN_KEEP, /* do not change the current setting */ QEMU_MONITOR_ACTION_SHUTDOWN_POWEROFF, @@ -187,6 +195,7 @@ struct _qemuMonitorJobInfo { char *qemuMonitorGuestPanicEventInfoFormatMsg(qemuMonitorEventPanicInfo *info); void qemuMonitorEventPanicInfoFree(qemuMonitorEventPanicInfo *info); void qemuMonitorEventRdmaGidStatusFree(qemuMonitorRdmaGidStatus *info); +void qemuMonitorMemoryDeviceSizeChangeFree(qemuMonitorMemoryDeviceSizeChange *info); typedef void (*qemuMonitorDestroyCallback)(qemuMonitor *mon, virDomainObj *vm, @@ -403,6 +412,12 @@ typedef void (*qemuMonitorDomainMemoryFailureCallback)(qemuMonitor *mon, qemuMonitorEventMemoryFailure *mfp, void *opaque); +typedef int (*qemuMonitorDomainMemoryDeviceSizeChange)(qemuMonitor *mon, + virDomainObj *vm, + const char *alias, + unsigned long long size, + void *opaque); + typedef struct _qemuMonitorCallbacks qemuMonitorCallbacks; struct _qemuMonitorCallbacks { qemuMonitorEofNotifyCallback eofNotify; @@ -437,6 +452,7 @@ struct _qemuMonitorCallbacks { qemuMonitorDomainRdmaGidStatusChangedCallback domainRdmaGidStatusChanged; qemuMonitorDomainGuestCrashloadedCallback domainGuestCrashloaded; qemuMonitorDomainMemoryFailureCallback domainMemoryFailure; + qemuMonitorDomainMemoryDeviceSizeChange domainMemoryDeviceSizeChange; }; qemuMonitor *qemuMonitorOpen(virDomainObj *vm, @@ -532,6 +548,10 @@ void qemuMonitorEmitSerialChange(qemuMonitor *mon, bool connected); void qemuMonitorEmitSpiceMigrated(qemuMonitor *mon); +void qemuMonitorEmitMemoryDeviceSizeChange(qemuMonitor *mon, + const char *devAlias, + unsigned long long size); + void qemuMonitorEmitMemoryFailure(qemuMonitor *mon, qemuMonitorEventMemoryFailure *mfp); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index d5e95f3dda..ddce0fd2cd 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -112,6 +112,7 @@ static void qemuMonitorJSONHandleDumpCompleted(qemuMonitor *mon, virJSONValue *d static void qemuMonitorJSONHandlePRManagerStatusChanged(qemuMonitor *mon, virJSONValue *data); static void qemuMonitorJSONHandleRdmaGidStatusChanged(qemuMonitor *mon, virJSONValue *data); static void qemuMonitorJSONHandleMemoryFailure(qemuMonitor *mon, virJSONValue *data); +static void qemuMonitorJSONHandleMemoryDeviceSizeChange(qemuMonitor *mon, virJSONValue *data); typedef struct { const char *type; @@ -132,6 +133,7 @@ static qemuEventHandler eventHandlers[] = { { "GUEST_CRASHLOADED", qemuMonitorJSONHandleGuestCrashloaded, }, { "GUEST_PANICKED", qemuMonitorJSONHandleGuestPanic, }, { "JOB_STATUS_CHANGE", qemuMonitorJSONHandleJobStatusChange, }, + { "MEMORY_DEVICE_SIZE_CHANGE", qemuMonitorJSONHandleMemoryDeviceSizeChange, }, { "MEMORY_FAILURE", qemuMonitorJSONHandleMemoryFailure, }, { "MIGRATION", qemuMonitorJSONHandleMigrationStatus, }, { "MIGRATION_PASS", qemuMonitorJSONHandleMigrationPass, }, @@ -1313,6 +1315,28 @@ qemuMonitorJSONHandleSpiceMigrated(qemuMonitor *mon, } +static void +qemuMonitorJSONHandleMemoryDeviceSizeChange(qemuMonitor *mon, + virJSONValue *data) +{ + const char *name; + unsigned long long size; + + if (!(name = virJSONValueObjectGetString(data, "id"))) { + VIR_WARN("missing device alias in MEMORY_DEVICE_SIZE_CHANGE event"); + return; + } + + if (virJSONValueObjectGetNumberUlong(data, "size", &size) < 0) { + VIR_WARN("missing new size for '%s' in MEMORY_DEVICE_SIZE_CHANGE event", name); + return; + } + + + qemuMonitorEmitMemoryDeviceSizeChange(mon, name, size); +} + + static void qemuMonitorJSONHandleMemoryFailure(qemuMonitor *mon, virJSONValue *data) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 07bd7e928a..8ca496cfb4 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1833,6 +1833,47 @@ qemuProcessHandleMemoryFailure(qemuMonitor *mon G_GNUC_UNUSED, } +static int +qemuProcessHandleMemoryDeviceSizeChange(qemuMonitor *mon G_GNUC_UNUSED, + virDomainObj *vm, + const char *devAlias, + unsigned long long size, + void *opaque) +{ + virQEMUDriver *driver = opaque; + struct qemuProcessEvent *processEvent = NULL; + qemuMonitorMemoryDeviceSizeChange *info = NULL; + int ret = -1; + + virObjectLock(vm); + + VIR_DEBUG("Memory device '%s' changed size to '%llu' in domain '%s'", + devAlias, size, vm->def->name); + + info = g_new0(qemuMonitorMemoryDeviceSizeChange, 1); + info->devAlias = g_strdup(devAlias); + info->size = size; + + processEvent = g_new0(struct qemuProcessEvent, 1); + processEvent->eventType = QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE; + processEvent->vm = virObjectRef(vm); + processEvent->data = g_steal_pointer(&info); + + if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) { + qemuProcessEventFree(processEvent); + virObjectUnref(vm); + goto cleanup; + } + + processEvent = NULL; + ret = 0; + cleanup: + qemuProcessEventFree(processEvent); + virObjectUnlock(vm); + return ret; +} + + static qemuMonitorCallbacks monitorCallbacks = { .eofNotify = qemuProcessHandleMonitorEOF, .errorNotify = qemuProcessHandleMonitorError, @@ -1866,6 +1907,7 @@ static qemuMonitorCallbacks monitorCallbacks = { .domainRdmaGidStatusChanged = qemuProcessHandleRdmaGidStatusChanged, .domainGuestCrashloaded = qemuProcessHandleGuestCrashloaded, .domainMemoryFailure = qemuProcessHandleMemoryFailure, + .domainMemoryDeviceSizeChange = qemuProcessHandleMemoryDeviceSizeChange, }; static void diff --git a/src/remote/remote_daemon_dispatch.c b/src/remote/remote_daemon_dispatch.c index 0bac7d4f51..bcfeadc2ae 100644 --- a/src/remote/remote_daemon_dispatch.c +++ b/src/remote/remote_daemon_dispatch.c @@ -1334,6 +1334,35 @@ remoteRelayDomainEventMemoryFailure(virConnectPtr conn, } +static int +remoteRelayDomainEventMemoryDeviceSizeChange(virConnectPtr conn, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque) +{ + daemonClientEventCallback *callback = opaque; + remote_domain_event_memory_device_size_change_msg data; + + if (callback->callbackID < 0 || + !remoteRelayDomainEventCheckACL(callback->client, conn, dom)) + return -1; + + /* build return data */ + memset(&data, 0, sizeof(data)); + data.callbackID = callback->callbackID; + data.alias = g_strdup(alias); + data.size = size; + make_nonnull_domain(&data.dom, dom); + + remoteDispatchObjectEventSend(callback->client, remoteProgram, + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE, + (xdrproc_t)xdr_remote_domain_event_memory_device_size_change_msg, + &data); + return 0; +} + + static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventLifecycle), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventReboot), @@ -1361,6 +1390,7 @@ static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMetadataChange), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventBlockThreshold), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMemoryFailure), + VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMemoryDeviceSizeChange), }; G_STATIC_ASSERT(G_N_ELEMENTS(domainEventCallbacks) == VIR_DOMAIN_EVENT_ID_LAST); diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index b64a86af63..ae6dc849ff 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -410,6 +410,10 @@ remoteDomainBuildEventMemoryFailure(virNetClientProgram *prog, void *evdata, void *opaque); static void +remoteDomainBuildEventMemoryDeviceSizeChange(virNetClientProgram *prog, + virNetClient *client, + void *evdata, void *opaque); +static void remoteConnectNotifyEventConnectionClosed(virNetClientProgram *prog G_GNUC_UNUSED, virNetClient *client G_GNUC_UNUSED, void *evdata, void *opaque); @@ -624,6 +628,10 @@ static virNetClientProgramEvent remoteEvents[] = { remoteDomainBuildEventMemoryFailure, sizeof(remote_domain_event_memory_failure_msg), (xdrproc_t)xdr_remote_domain_event_memory_failure_msg }, + { REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE, + remoteDomainBuildEventMemoryDeviceSizeChange, + sizeof(remote_domain_event_memory_device_size_change_msg), + (xdrproc_t)xdr_remote_domain_event_memory_device_size_change_msg }, }; static void @@ -5436,6 +5444,30 @@ remoteDomainBuildEventMemoryFailure(virNetClientProgram *prog G_GNUC_UNUSED, } +static void +remoteDomainBuildEventMemoryDeviceSizeChange(virNetClientProgram *prog G_GNUC_UNUSED, + virNetClient *client G_GNUC_UNUSED, + void *evdata, void *opaque) +{ + virConnectPtr conn = opaque; + remote_domain_event_memory_device_size_change_msg *msg = evdata; + struct private_data *priv = conn->privateData; + virDomainPtr dom; + virObjectEvent *event = NULL; + + if (!(dom = get_nonnull_domain(conn, msg->dom))) + return; + + event = virDomainEventMemoryDeviceSizeChangeNewFromDom(dom, + msg->alias, + msg->size); + + virObjectUnref(dom); + + virObjectEventStateQueueRemote(priv->eventState, event, msg->callbackID); +} + + static int remoteStreamSend(virStreamPtr st, const char *data, diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index df1b126b0c..34c2751755 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -3855,6 +3855,13 @@ struct remote_domain_start_dirty_rate_calc_args { }; +struct remote_domain_event_memory_device_size_change_msg { + int callbackID; + remote_nonnull_domain dom; + remote_nonnull_string alias; + unsigned hyper size; +}; + /*----- Protocol. -----*/ /* Define the program number, protocol version and procedure numbers here. */ @@ -6818,5 +6825,11 @@ enum remote_procedure { * @acl: network:write * @acl: network:save */ - REMOTE_PROC_NETWORK_DEFINE_XML_FLAGS = 432 + REMOTE_PROC_NETWORK_DEFINE_XML_FLAGS = 432, + + /** + * @generate: both + * @acl: none + */ + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE = 433 }; diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index dad83361fa..b91e1b867f 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -3206,6 +3206,12 @@ struct remote_domain_start_dirty_rate_calc_args { int seconds; u_int flags; }; +struct remote_domain_event_memory_device_size_change_msg { + int callbackID; + remote_nonnull_domain dom; + remote_nonnull_string alias; + uint64_t size; +}; enum remote_procedure { REMOTE_PROC_CONNECT_OPEN = 1, REMOTE_PROC_CONNECT_CLOSE = 2, @@ -3639,4 +3645,5 @@ enum remote_procedure { REMOTE_PROC_NODE_DEVICE_CREATE = 430, REMOTE_PROC_NWFILTER_DEFINE_XML_FLAGS = 431, REMOTE_PROC_NETWORK_DEFINE_XML_FLAGS = 432, + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE = 433, }; diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index e5bd1fdd75..b06f651c13 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -13148,6 +13148,24 @@ virshEventMemoryFailurePrint(virConnectPtr conn G_GNUC_UNUSED, } +static void +virshEventMemoryDeviceSizeChangePrint(virConnectPtr conn G_GNUC_UNUSED, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque) +{ + g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER; + + virBufferAsprintf(&buf, + _("event 'memory-device-size-change' for domain '%s':\n" + "alias: %s\nsize: %llu\n"), + virDomainGetName(dom), alias, size); + + virshEventPrint(opaque, &buf); +} + + virshDomainEventCallback virshDomainEventCallbacks[] = { { "lifecycle", VIR_DOMAIN_EVENT_CALLBACK(virshEventLifecyclePrint), }, @@ -13199,6 +13217,8 @@ virshDomainEventCallback virshDomainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(virshEventBlockThresholdPrint), }, { "memory-failure", VIR_DOMAIN_EVENT_CALLBACK(virshEventMemoryFailurePrint), }, + { "memory-device-size-change", + VIR_DOMAIN_EVENT_CALLBACK(virshEventMemoryDeviceSizeChangePrint), }, }; G_STATIC_ASSERT(VIR_DOMAIN_EVENT_ID_LAST == G_N_ELEMENTS(virshDomainEventCallbacks)); -- 2.32.0

If the QEMU driver restarts it loses the track of the current size of virtio-mem (because it's runtime type of information and thus not stored in XML) and therefore, we have to refresh it when reconnecting to the domain monitor. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 20 +++++++++++-- src/qemu/qemu_monitor.h | 3 ++ src/qemu/qemu_monitor_json.c | 58 ++++++++++++++++++++++-------------- src/qemu/qemu_process.c | 3 ++ 4 files changed, 58 insertions(+), 26 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 325e45d009..00c6ae877c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -8375,9 +8375,23 @@ qemuDomainUpdateMemoryDeviceInfo(virQEMUDriver *driver, if (!(dimm = virHashLookup(meminfo, mem->info.alias))) continue; - mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM; - mem->info.addr.dimm.slot = dimm->slot; - mem->info.addr.dimm.base = dimm->address; + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + mem->currentsize = VIR_DIV_UP(dimm->size, 1024); + break; + + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM; + mem->info.addr.dimm.slot = dimm->slot; + mem->info.addr.dimm.base = dimm->address; + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } } virHashFree(meminfo); diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 962b516542..648fe293ed 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1385,10 +1385,13 @@ int qemuMonitorSetIOThread(qemuMonitor *mon, typedef struct _qemuMonitorMemoryDeviceInfo qemuMonitorMemoryDeviceInfo; struct _qemuMonitorMemoryDeviceInfo { + /* For pc-dimm */ unsigned long long address; unsigned int slot; bool hotplugged; bool hotpluggable; + /* For virtio-mem */ + unsigned long long size; /* in bytes */ }; int qemuMonitorGetMemoryDeviceInfo(qemuMonitor *mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index ddce0fd2cd..a4e60b4ee9 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -7984,7 +7984,6 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, virJSONValue *cmd; virJSONValue *reply = NULL; virJSONValue *data = NULL; - qemuMonitorMemoryDeviceInfo *meminfo = NULL; size_t i; if (!(cmd = qemuMonitorJSONMakeCommand("query-memory-devices", NULL))) @@ -8005,6 +8004,9 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, for (i = 0; i < virJSONValueArraySize(data); i++) { virJSONValue *elem = virJSONValueArrayGet(data, i); + g_autofree qemuMonitorMemoryDeviceInfo *meminfo = NULL; + virJSONValue *dimminfo; + const char *devalias; const char *type; if (!(type = virJSONValueObjectGetString(elem, "type"))) { @@ -8014,26 +8016,26 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, goto cleanup; } + if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("query-memory-devices reply data doesn't " + "contain enum data")); + goto cleanup; + } + + /* While 'id' attribute is marked as optional in QEMU's QAPI + * specification, Libvirt always sets it. Thus we can fail if not + * present. */ + if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("dimm memory info data is missing 'id'")); + goto cleanup; + } + + meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1); + /* dimm memory devices */ if (STREQ(type, "dimm")) { - virJSONValue *dimminfo; - const char *devalias; - - if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("query-memory-devices reply data doesn't " - "contain enum data")); - goto cleanup; - } - - if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("dimm memory info data is missing 'id'")); - goto cleanup; - } - - meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1); - if (virJSONValueObjectGetNumberUlong(dimminfo, "addr", &meminfo->address) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -8064,17 +8066,27 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, } - if (virHashAddEntry(info, devalias, meminfo) < 0) + } else if (STREQ(type, "virtio-mem")) { + if (virJSONValueObjectGetNumberUlong(dimminfo, "size", + &meminfo->size) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("malformed/missing size in virtio memory info")); goto cleanup; - - meminfo = NULL; + } + } else { + /* type not handled yet */ + continue; } + + if (virHashAddEntry(info, devalias, meminfo) < 0) + goto cleanup; + + meminfo = NULL; } ret = 0; cleanup: - VIR_FREE(meminfo); virJSONValueFree(cmd); virJSONValueFree(reply); return ret; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 8ca496cfb4..b2ac5937ec 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -8718,6 +8718,9 @@ qemuProcessReconnect(void *opaque) qemuDomainVcpuPersistOrder(obj->def); + if (qemuDomainUpdateMemoryDeviceInfo(driver, obj, QEMU_ASYNC_JOB_NONE) < 0) + goto error; + if (qemuProcessDetectIOThreadPIDs(driver, obj, QEMU_ASYNC_JOB_NONE) < 0) goto error; -- 2.32.0

Reporting how much memory is exposed to the guest happens under <currentMemory/> which is taken from def->mem.cur_balloon. The reported amount should account for both balloon size and the sum of @currentsize of all virtio-mems. For instance, if domain has 4GiB via balloon and additional 2GiB via virtio-mem, then the domain XML should report 6GiB. The same applies for domain statistics. The way to achieve this is to account for either balloon or virtio-mem when the size of the other is changed, e.g. on balloon change we have to add all @currentsize (for non virtio-mem these will be zero, so the check for memory model is needless, but makes it more obvious what's happening), and vice versa. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_driver.c | 9 +++++++++ src/qemu/qemu_process.c | 25 +++++++++++++++++++++++++ 2 files changed, 34 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 712ec68f7c..84adeec1c2 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4261,6 +4261,7 @@ processMemoryDeviceSizeChange(virQEMUDriver *driver, { virDomainMemoryDef *mem = NULL; virObjectEvent *event = NULL; + unsigned long long balloon; if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) return; @@ -4276,7 +4277,15 @@ processMemoryDeviceSizeChange(virQEMUDriver *driver, goto endjob; } + /* If this looks weird it's because it is. The balloon size + * as reported by QEMU does not include any of @currentsize. + * It really contains just the balloon size. But in domain + * definition we want to report also sum of @currentsize. Do + * a bit of math to fix the domain definition. */ + balloon = vm->def->mem.cur_balloon - mem->currentsize; mem->currentsize = VIR_DIV_UP(info->size, 1024); + balloon += mem->currentsize; + vm->def->mem.cur_balloon = balloon; event = virDomainEventMemoryDeviceSizeChangeNewFromObj(vm, info->devAlias, diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index b2ac5937ec..f8e7345329 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1213,10 +1213,22 @@ qemuProcessHandleBalloonChange(qemuMonitor *mon G_GNUC_UNUSED, virQEMUDriver *driver = opaque; virObjectEvent *event = NULL; g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); + size_t i; virObjectLock(vm); event = virDomainEventBalloonChangeNewFromObj(vm, actual); + /* We want the balloon size stored in domain definition to + * account for the actual size of virtio-mem too. But the + * balloon size as reported by QEMU (@actual) contains just + * the balloon size without any virtio-mem. Do a wee bit of + * math to fix it. */ + VIR_DEBUG("balloon size before fix is %lld", actual); + for (i = 0; i < vm->def->nmems; i++) { + if (vm->def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + actual += vm->def->mems[i]->currentsize; + } + VIR_DEBUG("Updating balloon from %lld to %lld kb", vm->def->mem.cur_balloon, actual); vm->def->mem.cur_balloon = actual; @@ -2343,6 +2355,7 @@ qemuProcessRefreshBalloonState(virQEMUDriver *driver, int asyncJob) { unsigned long long balloon; + size_t i; int rc; /* if no ballooning is available, the current size equals to the current @@ -2359,6 +2372,18 @@ qemuProcessRefreshBalloonState(virQEMUDriver *driver, if (qemuDomainObjExitMonitor(driver, vm) < 0 || rc < 0) return -1; + /* We want the balloon size stored in domain definition to + * account for the actual size of virtio-mem too. But the + * balloon size as reported by QEMU (@balloon) contains just + * the balloon size without any virtio-mem. Do a wee bit of + * math to fix it. */ + VIR_DEBUG("balloon size before fix is %lld", balloon); + for (i = 0; i < vm->def->nmems; i++) { + if (vm->def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + balloon += vm->def->mems[i]->currentsize; + } + VIR_DEBUG("Updating balloon from %lld to %lld kb", + vm->def->mem.cur_balloon, balloon); vm->def->mem.cur_balloon = balloon; return 0; -- 2.32.0

The qemuDomainSetMemoryFlags() allows for memballoon (<currentMemory/>) changes for both active and inactive guests. And just before doing any change, we have to make sure that the new size is not greater than the total memory (<memory/>). However, the total memory includes not only the regular guest memory, but also sum of maximum sizes of all virtio-mems (in fact all memory devices for that matter). But virtio-mem devices are modified differently (via virDomainUpdateDevice()) and thus the upper limit for new balloon size has to be lowered. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_driver.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 84adeec1c2..171a5217a4 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2388,12 +2388,28 @@ static int qemuDomainSetMemoryFlags(virDomainPtr dom, unsigned long newmem, } else { /* resize the current memory */ unsigned long oldmax = 0; + size_t i; - if (def) + if (def) { oldmax = virDomainDefGetMemoryTotal(def); + + /* While virtio-mem is regular mem from guest POV, it can't be + * modified through this API. */ + for (i = 0; i < def->nmems; i++) { + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + oldmax -= def->mems[i]->size; + } + } + if (persistentDef) { - if (!oldmax || oldmax > virDomainDefGetMemoryTotal(persistentDef)) + if (!def || oldmax > virDomainDefGetMemoryTotal(persistentDef)) { oldmax = virDomainDefGetMemoryTotal(persistentDef); + + for (i = 0; i < persistentDef->nmems; i++) { + if (persistentDef->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + oldmax -= persistentDef->mems[i]->size; + } + } } if (newmem > oldmax) { -- 2.32.0

New 'update-memory-device' command is introduced which aims on making it user friendly to change <memory/> device. So far I just need to change <requested/> so I'm introducing --requested-size only; but the idea is that this is extensible for other cases too. For instance, want to change <myElement/>? A new --my-element argument can be easily introduced. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/manpages/virsh.rst | 30 ++++++++ tools/virsh-domain.c | 161 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 191 insertions(+) diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst index 9561b3f59d..c57937a55a 100644 --- a/docs/manpages/virsh.rst +++ b/docs/manpages/virsh.rst @@ -4919,6 +4919,36 @@ results as some fields may be autogenerated and thus match devices other than expected. +update-memory-device +-------------------- + +**Syntax:** + +:: + + update-memory-device domain [--print-xml] [[--alias alias] | [--node node]] + [[--live] [--config] | [--current]] + [--requested-size size] + +This command finds ``<memory/>`` device inside given *domain*, changes +requested values and passes updated device XML to daemon. If *--print-xml* is +specified then the device is not changed, but the updated device XML is printed +to stdout. If there are more than one ``<memory/>`` devices in *domain* use +*--alias* or *--node* to select the desired one. + +If *--live* is specified, affect a running domain. +If *--config* is specified, affect the next startup of a persistent guest. +If *--current* is specified, it is equivalent to either *--live* or +*--config*, depending on the current state of the guest. +Both *--live* and *--config* flags may be given, but *--current* is +exclusive. Not specifying any flag is the same as specifying *--current*. + +If *--requested-size* is specified then ``<requested/>`` under memory target is +changed to requested *size* (as scaled integer, see ``NOTES`` above). It +defaults to kibibytes if no suffix is provided. The option is valid only for +``virtio-mem`` memory device model. + + change-media ------------ diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index b06f651c13..0c02ce5ae0 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -8910,6 +8910,161 @@ cmdSetmaxmem(vshControl *ctl, const vshCmd *cmd) return true; } + +/* + * "update-memory-device" command + */ +static const vshCmdInfo info_update_memory_device[] = { + {.name = "help", + .data = N_("update memory device of a domain") + }, + {.name = "desc", + .data = N_("Update values of a memory device of a domain") + }, + {.name = NULL} +}; + +static const vshCmdOptDef opts_update_memory_device[] = { + VIRSH_COMMON_OPT_DOMAIN_FULL(0), + VIRSH_COMMON_OPT_DOMAIN_CONFIG, + VIRSH_COMMON_OPT_DOMAIN_LIVE, + VIRSH_COMMON_OPT_DOMAIN_CURRENT, + {.name = "print-xml", + .type = VSH_OT_BOOL, + .help = N_("print updated memory device XML instead of executing the change") + }, + {.name = "alias", + .type = VSH_OT_STRING, + .completer = virshDomainDeviceAliasCompleter, + .help = N_("memory device alias") + }, + {.name = "node", + .type = VSH_OT_INT, + .help = N_("memory device target node") + }, + {.name = "requested-size", + .type = VSH_OT_INT, + .help = N_("new value of <requested/> size, as scaled integer (default KiB)") + }, + {.name = NULL} +}; + +static int +virshGetUpdatedMemoryXML(char **updatedMemoryXML, + vshControl *ctl, + const vshCmd *cmd, + virDomainPtr dom, + unsigned int flags) +{ + const char *alias = NULL; + bool nodeOpt = false; + unsigned int node = 0; + g_autoptr(xmlDoc) doc = NULL; + g_autoptr(xmlXPathContext) ctxt = NULL; + g_autofree char *xpath = NULL; + int nmems; + g_autofree xmlNodePtr *mems = NULL; + unsigned int domainXMLFlags = 0; + + if (flags & VIR_DOMAIN_AFFECT_CONFIG) + domainXMLFlags |= VIR_DOMAIN_XML_INACTIVE; + + if (virshDomainGetXMLFromDom(ctl, dom, domainXMLFlags, &doc, &ctxt) < 0) + return -1; + + nodeOpt = vshCommandOptBool(cmd, "node"); + if (vshCommandOptStringReq(ctl, cmd, "alias", &alias) < 0 || + vshCommandOptUInt(ctl, cmd, "node", &node) < 0) { + return -1; + } + + if (nodeOpt) { + xpath = g_strdup_printf("/domain/devices/memory[./target/node='%u']", node); + } else if (alias) { + xpath = g_strdup_printf("/domain/devices/memory[./alias/@name='%s']", alias); + } else { + xpath = g_strdup("/domain/devices/memory"); + } + + nmems = virXPathNodeSet(xpath, ctxt, &mems); + if (nmems < 0) { + vshSaveLibvirtError(); + return -1; + } else if (nmems == 0) { + vshError(ctl, _("no memory device found")); + return -1; + } else if (nmems > 1) { + vshError(ctl, _("multiple memory devices found, use --alias or --node to select one")); + return -1; + } + + ctxt->node = mems[0]; + + if (vshCommandOptBool(cmd, "requested-size")) { + xmlNodePtr requestedSizeNode; + g_autofree char *kibibytesStr = NULL; + unsigned long long bytes = 0; + unsigned long kibibytes = 0; + + if (vshCommandOptScaledInt(ctl, cmd, "requested-size", &bytes, 1024, ULLONG_MAX) < 0) + return -1; + kibibytes = VIR_DIV_UP(bytes, 1024); + + requestedSizeNode = virXPathNode("./target/requested", ctxt); + + if (!requestedSizeNode) { + vshError(ctl, _("virtio-mem device is missing <requested/>")); + return -1; + } + + kibibytesStr = g_strdup_printf("%lu", kibibytes); + xmlNodeSetContent(requestedSizeNode, BAD_CAST kibibytesStr); + } + + if (!(*updatedMemoryXML = virXMLNodeToString(doc, mems[0]))) { + vshSaveLibvirtError(); + return -1; + } + + return 0; +} + +static bool +cmdUpdateMemoryDevice(vshControl *ctl, const vshCmd *cmd) +{ + g_autoptr(virshDomain) dom = NULL; + bool config = vshCommandOptBool(cmd, "config"); + bool live = vshCommandOptBool(cmd, "live"); + bool current = vshCommandOptBool(cmd, "current"); + g_autofree char *updatedMemoryXML = NULL; + unsigned int flags = VIR_DOMAIN_AFFECT_CURRENT; + + VSH_EXCLUSIVE_OPTIONS_VAR(current, live); + VSH_EXCLUSIVE_OPTIONS_VAR(current, config); + VSH_EXCLUSIVE_OPTIONS("node", "alias"); + + if (config) + flags |= VIR_DOMAIN_AFFECT_CONFIG; + if (live) + flags |= VIR_DOMAIN_AFFECT_LIVE; + + if (!(dom = virshCommandOptDomain(ctl, cmd, NULL))) + return false; + + if (virshGetUpdatedMemoryXML(&updatedMemoryXML, ctl, cmd, dom, flags) < 0) + return false; + + if (vshCommandOptBool(cmd, "print-xml")) { + vshPrint(ctl, "%s", updatedMemoryXML); + } else { + if (virDomainUpdateDeviceFlags(dom, updatedMemoryXML, flags) < 0) + return false; + } + + return true; +} + + /* * "memtune" command */ @@ -14675,6 +14830,12 @@ const vshCmdDef domManagementCmds[] = { .info = info_update_device, .flags = 0 }, + {.name = "update-memory-device", + .handler = cmdUpdateMemoryDevice, + .opts = opts_update_memory_device, + .info = info_update_memory_device, + .flags = 0 + }, {.name = "vcpucount", .handler = cmdVcpucount, .opts = opts_vcpucount, -- 2.32.0

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- NEWS.rst | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/NEWS.rst b/NEWS.rst index 4521499db7..68afc01633 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -17,6 +17,13 @@ v7.8.0 (unreleased) * **New features** + * Introduce virtio-mem ``<memory/>`` model + + New virtio-mem model is introduced for ``<memory/>`` device which is a + paravirtualized mechanism of adding/removing memory to/from a VM. Use + ``virDomainUpdateDeviceFlags()`` API to adjust amount of memory or ``virsh + update-memory-device`` for convenience. + * **Improvements** * **Bug fixes** -- 2.32.0

This commit adds new memorydevices.rst page which should serve all models of memory devices. Yet, I'm documenting virtio-mem quirks only. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 150 +++++++++++++++++++++++++++++++++++ docs/kbase/meson.build | 1 + 3 files changed, 155 insertions(+) create mode 100644 docs/kbase/memorydevices.rst diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst index 351ea2c93b..09b19ed1af 100644 --- a/docs/kbase/index.rst +++ b/docs/kbase/index.rst @@ -52,6 +52,10 @@ Usage `PCI topology <../pci-addresses.html>`__ Addressing schemes for PCI devices +`Memory devices <memorydevices.html>`__ + Memory devices and their use + + Internals / Debugging --------------------- diff --git a/docs/kbase/memorydevices.rst b/docs/kbase/memorydevices.rst new file mode 100644 index 0000000000..7f58a4247b --- /dev/null +++ b/docs/kbase/memorydevices.rst @@ -0,0 +1,150 @@ +============== +Memory devices +============== + +.. contents:: + +Basics +====== + +Memory devices can be divided into two families: volatile and non-volatile. +The former is typical RAM memory: it's volatile and thus its contents doesn't +survive guest reboots or power cycles. The latter retains its contents across +reboots or power outages. + +In Libvirt, there are two models for volatile memory: + +* ``dimm`` model: + + :: + + <memory model='dimm'> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-mem`` model: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>1048576</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>524288</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </memory> + +Then there are two models for non-volatile memory: + +* ``nvdimm`` model: + + :: + + <memory model='nvdimm'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-pmem`` model: + + :: + + <memory model='virtio-pmem' access='shared'> + <source> + <path>/tmp/virtio_pmem</path> + </source> + <target> + <size unit='KiB'>524288</size> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </memory> + + +Please note that (maybe somewhat surprisingly) virtio models go onto PCI bus +instead of DIMM slots. + +Furthermore, DIMMs can have ``<source/>`` element which configures backend for +devices. For NVDIMMs the element is mandatory and reflects where the content +is saved. + +See `memory devices documentation <../formatdomain.html#elementsMemory>`_. + +``virtio-mem`` model +==================== + +The ``virtio-mem`` model can be viewed as revised memory balloon. It offers +adding and removing memory (without the actual hotplug of the device). It +solves problems that memory balloon can't solve on its own and thus is more +flexible than DIMM + balloon solution. ``virtio-mem`` is NUMA aware, and thus +memory can be inflated/deflated only for a subset of guest NUMA nodes. Also, +it works with chunks that are either exposed to guest or reclaimed from it. + +See https://virtio-mem.gitlab.io/ + +Under the hood, ``virtio-mem`` device is split into chunks of equal size which +are then exposed to the guest. Either all of them or only a portion depending +on user's request. Therefore there are three important sizes for +``virtio-mem``. All are to be found under ``<target/>`` element: + +#. The maximum size the device can ever offer, exposed under ``<size/>`` +#. The size of a single block, exposed under ``<block/>`` +#. The current size exposed to the guest, exposed under ``<requested/>`` + +For instance, in the following example the maximum size is 4GiB, the block size +is 2MiB and only 1GiB should be exposed to the guest: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>4194304</size> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + </memory> + +Please note that ``<requested/>`` must be an integer multiple of ``<block/>`` +size or zero (no blocks exposed to the guest) and has to be less or equal to +``<size/>`` (all blocks exposed to the guest). Furthermore, QEMU recommends the +``<block/>`` size to be as big as a Transparent Huge Page (usually 2MiB). + +To change the size exposed to the guest, users should pass memory device XML +with nothing but ``<requested/>`` changed into the +``virDomainUpdateDeviceFlags()`` API. For user's convenience this can be done +via virsh too: + + :: + + # virsh update-memory-device $dom --requested-size 2GiB + +If there are two or more ``<memory/>`` devices then ``--alias`` shall be used +to tell virsh which memory device should be updated. + +For running guests there is fourth size that can be found under ``<target/>``: + + :: + + <current unit='KiB'>2097152</current> + +The ``<current/>`` reflects the current size used by the guest. In general it +can differ from ``<requested/>``. Reasons include guest kernel missing +``virtio-mem`` module and thus being unable to take offered memory, or guest +kernel being unable to free memory. Since ``<current/>`` only reports size to +users, the element is never parsed. It is formatted only into live XML. + +Since changing ``<current/>`` allocation requires cooperation with guest +kernel, requests for change are not instant. Therefore, libvirt emits +``VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE`` event whenever current +allocation changed. diff --git a/docs/kbase/meson.build b/docs/kbase/meson.build index 73dc11837f..af067db453 100644 --- a/docs/kbase/meson.build +++ b/docs/kbase/meson.build @@ -10,6 +10,7 @@ docs_kbase_files = [ 'locking-lockd', 'locking', 'locking-sanlock', + 'memorydevices', 'merging_disk_image_chains', 'migrationinternals', 'qemu-core-dump', -- 2.32.0

On 13.09.21 16:52, Michal Privoznik wrote:
v4 of:
https://listman.redhat.com/archives/libvir-list/2021-June/msg00679.html
diff to v4: - Rebased onto current master - Worked in David's suggestions, e.g. rename from <actual/> to <current/>, implemented offline memory update, implemented --node argument to virsh update-memory-device, prealloc is OFF and reserve is ON for virtio-mem
Some suggestions are left as future work. For instance: - Don't require memory slots because virtio-mem lives on PCI bus anyway - Allow path backed backend for virtio-mem
Just a note that <memoryBacking> <source type='file'/> <access mode='shared'/> </memoryBacking> is doing what it's supposed to do. So only explicit file paths are not supported yet.
- support .prealloc for virtio-mem object (not memory-backend-* !)
I keep occasionally rebased version on my gitlab:
https://gitlab.com/MichalPrivoznik/libvirt/-/commits/virtio_mem_v5/
I just played with it and "virsh update-memory-device" is working like a charm now: a) with "--node" b) with "--alias", including manually specified alias like "<alias name='ua-virtiomem1'/>" c) with --config, --live, --current I see that "aliases" prefixed with "ua-" are an existing concept. Maybe we want to cross-reference that in the virtio-mem documentation? Nothing unusual found during my testing. I did not play with huge pages, as it's initially not supported. Thanks a bunch! -- Thanks, David / dhildenb

On 14.09.21 14:34, David Hildenbrand wrote:
On 13.09.21 16:52, Michal Privoznik wrote:
v4 of:
https://listman.redhat.com/archives/libvir-list/2021-June/msg00679.html
diff to v4: - Rebased onto current master - Worked in David's suggestions, e.g. rename from <actual/> to <current/>, implemented offline memory update, implemented --node argument to virsh update-memory-device, prealloc is OFF and reserve is ON for virtio-mem
Some suggestions are left as future work. For instance: - Don't require memory slots because virtio-mem lives on PCI bus anyway - Allow path backed backend for virtio-mem
Just a note that
<memoryBacking> <source type='file'/> <access mode='shared'/> </memoryBacking>
is doing what it's supposed to do. So only explicit file paths are not supported yet.
- support .prealloc for virtio-mem object (not memory-backend-* !)
I keep occasionally rebased version on my gitlab:
https://gitlab.com/MichalPrivoznik/libvirt/-/commits/virtio_mem_v5/
I just played with it and "virsh update-memory-device" is working like a charm now:
a) with "--node" b) with "--alias", including manually specified alias like "<alias name='ua-virtiomem1'/>" c) with --config, --live, --current
I see that "aliases" prefixed with "ua-" are an existing concept. Maybe we want to cross-reference that in the virtio-mem documentation?
Nothing unusual found during my testing. I did not play with huge pages, as it's initially not supported.
... and I just played with huge pages, and due to the added "reserve=off" it works just as expected, nice. (prealloc support to be added to make it actually safe to use) -- Thanks, David / dhildenb

On 13.09.21 16:52, Michal Privoznik wrote:
v4 of:
https://listman.redhat.com/archives/libvir-list/2021-June/msg00679.html
diff to v4: - Rebased onto current master - Worked in David's suggestions, e.g. rename from <actual/> to <current/>, implemented offline memory update, implemented --node argument to virsh update-memory-device, prealloc is OFF and reserve is ON for virtio-mem
Some suggestions are left as future work. For instance: - Don't require memory slots because virtio-mem lives on PCI bus anyway - Allow path backed backend for virtio-mem - support .prealloc for virtio-mem object (not memory-backend-* !)
I keep occasionally rebased version on my gitlab:
https://gitlab.com/MichalPrivoznik/libvirt/-/commits/virtio_mem_v5/
Hi Michal, I noticed one minor thing: If I start a VM with <numa> <cell id='0' cpus='0-7' memory='2097152' unit='KiB'/> <cell id='1' cpus='8-15' memory='2097152' unit='KiB'/> </numa> ... <memory model='virtio-mem'> <target> <size unit='KiB'>16777216</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>2097152</requested> </target> <alias name='ua-virtiomem0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </memory> <memory model='virtio-mem'> <target> <size unit='KiB'>16777216</size> <node>1</node> <block unit='KiB'>2048</block> <requested unit='KiB'>2097152</requested> </target> <alias name='ua-virtiomem1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </memory> I get after it booted up <maxMemory slots='2' unit='KiB'>41943040</maxMemory> <memory unit='KiB'>37748736</memory> <currentMemory unit='KiB'>9437184</currentMemory> <memory model='virtio-mem'> <target> <size unit='KiB'>16777216</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>2097152</requested> <current unit='KiB'>131072</current> </target> <alias name='ua-virtiomem0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </memory> <memory model='virtio-mem'> <target> <size unit='KiB'>16777216</size> <node>1</node> <block unit='KiB'>2048</block> <requested unit='KiB'>2097152</requested> <current unit='KiB'>2097152</current> </target> <alias name='ua-virtiomem1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </memory> Note the "<current unit='KiB'>131072</current>". Inside of the guest, I can see that it really is 2G. If I trigger a new "virsh update-memory-device Fedora34", it gets updated properly. I assume the devices get initialized in the guest in parallel. Could it be that libvirt gets confused when there are concurrent notifications about two devices or could it be QEMU accidentally swallows some events? I'll investigate the latter regarding rate limiting the output in QEMU, maybe something is messed up. -- Thanks, David / dhildenb

On 21.09.21 11:33, David Hildenbrand wrote:
On 13.09.21 16:52, Michal Privoznik wrote:
v4 of:
https://listman.redhat.com/archives/libvir-list/2021-June/msg00679.html
diff to v4: - Rebased onto current master - Worked in David's suggestions, e.g. rename from <actual/> to <current/>, implemented offline memory update, implemented --node argument to virsh update-memory-device, prealloc is OFF and reserve is ON for virtio-mem
Some suggestions are left as future work. For instance: - Don't require memory slots because virtio-mem lives on PCI bus anyway - Allow path backed backend for virtio-mem - support .prealloc for virtio-mem object (not memory-backend-* !)
I keep occasionally rebased version on my gitlab:
https://gitlab.com/MichalPrivoznik/libvirt/-/commits/virtio_mem_v5/
Hi Michal,
I noticed one minor thing:
If I start a VM with
<numa> <cell id='0' cpus='0-7' memory='2097152' unit='KiB'/> <cell id='1' cpus='8-15' memory='2097152' unit='KiB'/> </numa> ... <memory model='virtio-mem'> <target> <size unit='KiB'>16777216</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>2097152</requested> </target> <alias name='ua-virtiomem0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </memory> <memory model='virtio-mem'> <target> <size unit='KiB'>16777216</size> <node>1</node> <block unit='KiB'>2048</block> <requested unit='KiB'>2097152</requested> </target> <alias name='ua-virtiomem1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </memory>
I get after it booted up
<maxMemory slots='2' unit='KiB'>41943040</maxMemory> <memory unit='KiB'>37748736</memory> <currentMemory unit='KiB'>9437184</currentMemory> <memory model='virtio-mem'> <target> <size unit='KiB'>16777216</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>2097152</requested> <current unit='KiB'>131072</current> </target> <alias name='ua-virtiomem0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </memory> <memory model='virtio-mem'> <target> <size unit='KiB'>16777216</size> <node>1</node> <block unit='KiB'>2048</block> <requested unit='KiB'>2097152</requested> <current unit='KiB'>2097152</current> </target> <alias name='ua-virtiomem1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </memory>
Note the "<current unit='KiB'>131072</current>". Inside of the guest, I can see that it really is 2G.
If I trigger a new "virsh update-memory-device Fedora34", it gets updated properly.
I assume the devices get initialized in the guest in parallel. Could it be that libvirt gets confused when there are concurrent notifications about two devices or could it be QEMU accidentally swallows some events? I'll investigate the latter regarding rate limiting the output in QEMU, maybe something is messed up.
QEMU BUG, will send a fix :) -- Thanks, David / dhildenb
participants (4)
-
David Hildenbrand
-
Jing Qi
-
Michal Privoznik
-
Michal Prívozník