[PATCH v2 RESEND 00/13] Introduce virtio-mem <memory/> model

Resend of: https://listman.redhat.com/archives/libvir-list/2021-February/msg00534.html We had couple of changes made (new capabilities added mostly) and the original patch set no long applies cleanly. This is pure rebase of the original patch set, no functional change (hence I'm keeping version number). Michal Prívozník (13): virhostmem: Introduce virHostMemGetTHPSize() qemu_process: Deduplicate code in qemuProcessNeedHugepagesPath() qemu_process: Drop needless check in qemuProcessNeedMemoryBackingPath() qemu_capabilities: Introduce QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI conf: Introduce virtio-mem <memory/> model qemu: Build command line for virtio-mem qemu: Wire up <memory/> live update qemu: Wire up MEMORY_DEVICE_SIZE_CHANGE event Introduce MEMORY_DEVICE_SIZE_CHANGE event qemu: Refresh the actual size of virtio-mem on monitor reconnect virsh: Introduce update-memory command news: document recent virtio memory addition kbase: Document virtio-mem NEWS.rst | 7 + docs/formatdomain.rst | 58 ++++- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 160 ++++++++++++++ docs/kbase/meson.build | 1 + docs/manpages/virsh.rst | 30 +++ docs/schemas/domaincommon.rng | 16 ++ examples/c/misc/event-test.c | 17 ++ include/libvirt/libvirt-domain.h | 23 ++ src/conf/domain_conf.c | 115 +++++++++- src/conf/domain_conf.h | 15 ++ src/conf/domain_event.c | 84 +++++++ src/conf/domain_event.h | 10 + src/conf/domain_validate.c | 39 ++++ src/libvirt_private.syms | 5 + src/qemu/qemu_alias.c | 10 +- src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 13 +- src/qemu/qemu_domain.c | 67 +++++- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_domain_address.c | 38 +++- src/qemu/qemu_driver.c | 209 +++++++++++++++++- src/qemu/qemu_hotplug.c | 18 ++ src/qemu/qemu_hotplug.h | 5 + src/qemu/qemu_monitor.c | 37 ++++ src/qemu/qemu_monitor.h | 27 +++ src/qemu/qemu_monitor_json.c | 95 ++++++-- src/qemu/qemu_monitor_json.h | 5 + src/qemu/qemu_process.c | 101 +++++++-- src/qemu/qemu_validate.c | 22 +- src/remote/remote_daemon_dispatch.c | 30 +++ src/remote/remote_driver.c | 32 +++ src/remote/remote_protocol.x | 15 +- src/remote_protocol-structs | 7 + src/security/security_apparmor.c | 1 + src/security/security_dac.c | 2 + src/security/security_selinux.c | 2 + src/util/virhostmem.c | 63 ++++++ src/util/virhostmem.h | 3 + tests/domaincapsmock.c | 9 + .../caps_5.1.0.x86_64.xml | 1 + .../caps_5.2.0.x86_64.xml | 1 + .../caps_6.0.0.x86_64.xml | 1 + ...mory-hotplug-virtio-mem.x86_64-latest.args | 48 ++++ .../memory-hotplug-virtio-mem.xml | 64 ++++++ tests/qemuxml2argvtest.c | 1 + ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 + tests/qemuxml2xmltest.c | 1 + tools/virsh-domain.c | 170 ++++++++++++++ 50 files changed, 1619 insertions(+), 68 deletions(-) create mode 100644 docs/kbase/memorydevices.rst create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml -- 2.26.2

New virHostMemGetTHPSize() is introduced which allows caller to obtain THP PMD (Page Middle Directory) size, which is equal to the minimal size that THP can use, taken from kernel doc (Documentation/admin-guide/mm/transhuge.rst): Some userspace (such as a test program, or an optimized memory allocation library) may want to know the size (in bytes) of a transparent hugepage:: cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size Since this size depends on the host architecture and the kernel it won't change whilst libvirtd is running. Therefore, we can use virOnce() and cache the value. Of course, we can be running under kernel that has THP disabled or has no notion of THP at all. In that case a negative value is returned to signal error. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virhostmem.c | 63 ++++++++++++++++++++++++++++++++++++++++ src/util/virhostmem.h | 3 ++ tests/domaincapsmock.c | 9 ++++++ 4 files changed, 76 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 6b7261b987..0e6ad1222f 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2374,6 +2374,7 @@ virHostMemGetFreePages; virHostMemGetInfo; virHostMemGetParameters; virHostMemGetStats; +virHostMemGetTHPSize; virHostMemSetParameters; diff --git a/src/util/virhostmem.c b/src/util/virhostmem.c index ae42978ed2..ef7b97806f 100644 --- a/src/util/virhostmem.c +++ b/src/util/virhostmem.c @@ -45,11 +45,14 @@ #include "virstring.h" #include "virnuma.h" #include "virlog.h" +#include "virthread.h" #define VIR_FROM_THIS VIR_FROM_NONE VIR_LOG_INIT("util.hostmem"); +static unsigned long long virHostTHPPMDSize; +static virOnceControl virHostMemGetTHPSizeOnce = VIR_ONCE_CONTROL_INITIALIZER; #ifdef __FreeBSD__ # define BSD_MEMORY_STATS_ALL 4 @@ -920,3 +923,63 @@ virHostMemAllocPages(unsigned int npages, return ncounts; } + +#if defined(__linux__) +# define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" +static int +virHostMemGetTHPSizeSysfs(unsigned long long *size) +{ + g_autofree char *buf = NULL; + + /* 1KiB limit is more than enough. */ + if (virFileReadAll(HPAGE_PMD_SIZE_PATH, 1024, &buf) < 0) + return -1; + + virStringTrimOptionalNewline(buf); + if (virStrToLong_ull(buf, NULL, 10, size) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("unable to parse THP PMD size: %s"), buf); + return -1; + } + + /* Size is now in bytes. Convert to KiB. */ + *size >>= 10; + return 0; +} +#endif /* defined(__linux__) */ + + +static void +virHostMemGetTHPSizeOnceInit(void) +{ +#if defined(__linux__) + virHostMemGetTHPSizeSysfs(&virHostTHPPMDSize); +#else /* !defined(__linux__) */ + VIR_WARN("Getting THP size not ported yet"); +#endif /* !defined(__linux__) */ +} + + +/** + * virHostMemGetTHPSize: + * @size: returned size of THP in kibibytes + * + * Obtain Transparent Huge Page size in kibibytes. The size + * depends on host architecture and kernel. Because of virOnce(), + * do not rely on errno in case of failure. + * + * Returns: 0 on success, + * -1 on failure. + */ +int +virHostMemGetTHPSize(unsigned long long *size) +{ + if (virOnce(&virHostMemGetTHPSizeOnce, virHostMemGetTHPSizeOnceInit) < 0) + return -1; + + if (virHostTHPPMDSize == 0) + return -1; + + *size = virHostTHPPMDSize; + return 0; +} diff --git a/src/util/virhostmem.h b/src/util/virhostmem.h index 1369829807..bf15c40698 100644 --- a/src/util/virhostmem.h +++ b/src/util/virhostmem.h @@ -53,3 +53,6 @@ int virHostMemAllocPages(unsigned int npages, int startCell, unsigned int cellCount, bool add); + +int virHostMemGetTHPSize(unsigned long long *size) + G_GNUC_NO_INLINE; diff --git a/tests/domaincapsmock.c b/tests/domaincapsmock.c index d81a898dc0..34a2f9ad81 100644 --- a/tests/domaincapsmock.c +++ b/tests/domaincapsmock.c @@ -17,6 +17,7 @@ #include <config.h> #include "virhostcpu.h" +#include "virhostmem.h" #ifdef WITH_LIBXL # include "libxl/libxl_capabilities.h" #endif @@ -40,3 +41,11 @@ virHostCPUGetMicrocodeVersion(virArch hostArch G_GNUC_UNUSED) { return 0; } + +int +virHostMemGetTHPSize(unsigned long long *size) +{ + /* Pretend Transparent Huge Page size is 2MiB. */ + *size = 2048; + return 0; +} -- 2.26.2

On Thu, Feb 18, 2021 at 14:30:56 +0100, Michal Privoznik wrote:
New virHostMemGetTHPSize() is introduced which allows caller to obtain THP PMD (Page Middle Directory) size, which is equal to the minimal size that THP can use, taken from kernel doc (Documentation/admin-guide/mm/transhuge.rst):
Some userspace (such as a test program, or an optimized memory allocation library) may want to know the size (in bytes) of a transparent hugepage::
cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
Since this size depends on the host architecture and the kernel it won't change whilst libvirtd is running. Therefore, we can use virOnce() and cache the value. Of course, we can be running under kernel that has THP disabled or has no notion of THP at all. In that case a negative value is returned to signal error.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virhostmem.c | 63 ++++++++++++++++++++++++++++++++++++++++ src/util/virhostmem.h | 3 ++ tests/domaincapsmock.c | 9 ++++++ 4 files changed, 76 insertions(+)
[...]
diff --git a/src/util/virhostmem.c b/src/util/virhostmem.c index ae42978ed2..ef7b97806f 100644 --- a/src/util/virhostmem.c +++ b/src/util/virhostmem.c @@ -45,11 +45,14 @@ #include "virstring.h" #include "virnuma.h" #include "virlog.h" +#include "virthread.h"
#define VIR_FROM_THIS VIR_FROM_NONE
VIR_LOG_INIT("util.hostmem");
+static unsigned long long virHostTHPPMDSize;
Please add a comment that it's in kiB.
+static virOnceControl virHostMemGetTHPSizeOnce = VIR_ONCE_CONTROL_INITIALIZER;

The aim of qemuProcessNeedHugepagesPath() is to return whether guest needs private path inside HugeTLBFS mounts (deducted from domain definition @def) or whether the memory device that user is hotplugging in needs the private path (deducted from the @mem argument). The actual creation of the path is done in the only caller qemuProcessBuildDestroyMemoryPaths(). The rule for the first case (@def) and the second case (@mem) is the same (domain has a DIMM device that has HP requested) and is written twice. Move the logic into a function to deduplicate the code. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 7feb35e609..6a223e18c2 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3863,6 +3863,27 @@ qemuProcessReconnectCheckMemAliasOrderMismatch(virDomainObjPtr vm) } +static bool +qemuProcessDomainMemoryDefNeedHugepagesPath(const virDomainMemoryDef *mem, + const long system_pagesize) +{ + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + return mem->pagesize && + mem->pagesize != system_pagesize; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + /* None of these can be backed by hugepages. */ + return false; + } + + return false; +} + + static bool qemuProcessNeedHugepagesPath(virDomainDefPtr def, virDomainMemoryDefPtr mem) @@ -3879,16 +3900,12 @@ qemuProcessNeedHugepagesPath(virDomainDefPtr def, } for (i = 0; i < def->nmems; i++) { - if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_DIMM && - def->mems[i]->pagesize && - def->mems[i]->pagesize != system_pagesize) + if (qemuProcessDomainMemoryDefNeedHugepagesPath(def->mems[i], system_pagesize)) return true; } if (mem && - mem->model == VIR_DOMAIN_MEMORY_MODEL_DIMM && - mem->pagesize && - mem->pagesize != system_pagesize) + qemuProcessDomainMemoryDefNeedHugepagesPath(mem, system_pagesize)) return true; return false; -- 2.26.2

The aim of this function is to return whether domain definition and/or memory device that user intents to hotplug needs a private path inside cfg->memoryBackingDir. The rule for the memory device that's being hotplug includes checking whether corresponding guest NUMA node needs memoryBackingDir. Well, while the rationale behind makes sense it is not necessary to check for that really - just a few lines above every guest NUMA node was checked exactly for that. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 6a223e18c2..066f153703 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3930,13 +3930,24 @@ qemuProcessNeedMemoryBackingPath(virDomainDefPtr def, return true; } - if (mem && - mem->model == VIR_DOMAIN_MEMORY_MODEL_DIMM && - (mem->access != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT || - (mem->targetNode >= 0 && - virDomainNumaGetNodeMemoryAccessMode(def->numa, mem->targetNode) - != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT))) - return true; + if (mem) { + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + if (mem->access != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT) { + /* No need to check for access mode on the target node, + * it was checked for in the previous loop. */ + return true; + } + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + /* Backed by user provided path. Not stored in memory + * backing dir anyway. */ + break; + } + } return false; } -- 2.26.2

This commit introduces a new capability that reflects virtio-mem-pci device support in QEMU: QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */ The virtio-mem-pci device was introduced in QEMU 5.1. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml | 1 + 5 files changed, 6 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 3f8593a9e5..3f47a6aaf5 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -617,6 +617,7 @@ VIR_ENUM_IMPL(virQEMUCaps, "cpu-max", "memory-backend-file.x-use-canonical-path-for-ramblock-id", "vnc-opts", + "virtio-mem-pci", ); @@ -1338,6 +1339,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { { "am53c974", QEMU_CAPS_SCSI_AM53C974 }, { "virtio-pmem-pci", QEMU_CAPS_DEVICE_VIRTIO_PMEM_PCI }, { "vhost-user-blk", QEMU_CAPS_DEVICE_VHOST_USER_BLK }, + { "virtio-mem-pci", QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI }, }; diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 38574eef16..9eb69a5fd7 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -597,6 +597,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ QEMU_CAPS_CPU_MAX, /* -cpu max */ QEMU_CAPS_X_USE_CANONICAL_PATH_FOR_RAMBLOCK_ID, /* -object memory-backend-file,x-use-canonical-path-for-ramblock-id= */ QEMU_CAPS_VNC_OPTS, /* -vnc uses QemuOpts parser instead of custom code */ + QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml index d1096c3cb7..2fb2edb2a5 100644 --- a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml @@ -254,6 +254,7 @@ <flag name='cpu-max'/> <flag name='memory-backend-file.x-use-canonical-path-for-ramblock-id'/> <flag name='vnc-opts'/> + <flag name='virtio-mem-pci'/> <version>5001000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml index 853f73fa53..75c5008f8a 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml @@ -255,6 +255,7 @@ <flag name='cpu-max'/> <flag name='memory-backend-file.x-use-canonical-path-for-ramblock-id'/> <flag name='vnc-opts'/> + <flag name='virtio-mem-pci'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml index 23fb5b7393..eee225dc92 100644 --- a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml @@ -256,6 +256,7 @@ <flag name='cpu-max'/> <flag name='memory-backend-file.x-use-canonical-path-for-ramblock-id'/> <flag name='vnc-opts'/> + <flag name='virtio-mem-pci'/> <version>5002050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> -- 2.26.2

The virtio-mem is paravirtualized mechanism of adding/removing memory to/from a VM. A virtio-mem-pci device is split into blocks of equal size which are then exposed (all or only a requested portion of them) to the guest kernel to use as regular memory. Therefore, the device has two important attributes: 1) block-size, which defines the size of a block 2) requested-size, which defines how much memory (in bytes) is the device requested to expose to the guest. The 'block-size' is configured on command line and immutable throughout device's lifetime. The 'requested-size' can be set on the command line too, but also is adjustable via monitor. In fact, that is how management software places its requests to change the memory allocation. If it wants to give more memory to the guest it changes 'requested-size' to a bigger value, and if it wants to shrink guest memory it changes the 'requested-size' to a smaller value. Note, value of zero means that guest should release all memory offered by the device. Of course, guest has to cooperate. Therefore, there is a third attribute 'size' which is read only and reflects how much memory the guest still has. This can be different to 'requested-size', obviously. Because of name clash, I've named it 'actualsize' and it is dealt with in future commits (it is a runtime information anyway). In the backend, memory for virtio-mem is backed by usual objects: memory-backend-{ram,file,memfd} and their size puts the cap on the amount of memory that a virtio-mem device can offer to a guest. But we are already able to express this info using <size/> under <target/>. Therefore, we need only two more elements to cover 'block-size' and 'requested-size' attributes. This is the XML I've came up with: <memory model='virtio-mem'> <source> <nodemask>1-3</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>2097152</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> </target> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </memory> I hope by now it is obvious that: 1) 'requested-size' must be an integer multiple of 'block-size', and 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI address. Then there is a limitation that the minimal 'block-size' is transparent huge page size (I'll leave this without explanation). Since now we have (possibly) two or more devices that allow memory inflation/deflation and accounting for all of them (and thus keeping <currentMemory/> updated) might be hard. Therefore, I'm deliberately forbidding memballoon. It's okay - virtio-mem is superior to memballoon anyway. We can always reevaluate later. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.rst | 51 +++++++++++++-- docs/schemas/domaincommon.rng | 11 ++++ src/conf/domain_conf.c | 53 ++++++++++++++- src/conf/domain_conf.h | 3 + src/conf/domain_validate.c | 39 +++++++++++ src/qemu/qemu_alias.c | 1 + src/qemu/qemu_command.c | 1 + src/qemu/qemu_domain.c | 27 +++++++- src/qemu/qemu_domain_address.c | 38 ++++++++--- src/qemu/qemu_process.c | 2 + src/qemu/qemu_validate.c | 22 ++++++- src/security/security_apparmor.c | 1 + src/security/security_dac.c | 2 + src/security/security_selinux.c | 2 + .../memory-hotplug-virtio-mem.xml | 64 +++++++++++++++++++ ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 + tests/qemuxml2xmltest.c | 1 + 17 files changed, 298 insertions(+), 21 deletions(-) create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 2587106191..05d359a32d 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -6924,6 +6924,8 @@ appropriate, so there is no need to explicitly add this element in the guest XML unless a specific PCI slot needs to be assigned. :since:`Since 0.8.3, Xen, QEMU and KVM only` Additionally, :since:`since 0.8.4` , if the memballoon device needs to be explicitly disabled, ``model='none'`` may be used. +Using memballoon among with ``virtio-mem`` memory device is unsupported. In +that case, setting model to anything else then ``none`` results in an error. Example: automatically added device with KVM @@ -6935,6 +6937,16 @@ Example: automatically added device with KVM </devices> ... +Example: automatically added device with QEMU/KVM and a ``virtio-mem`` device: + +:: + + ... + <devices> + <memballoon model='none'/> + </devices> + ... + Example: manually added device with static PCI slot 2 requested :: @@ -7374,6 +7386,18 @@ Example: usage of the memory devices <size unit='KiB'>524288</size> </target> </memory> + <memory model='virtio-mem'> + <source> + <nodemask>1-3</nodemask> + <pagesize unit='KiB'>2048</pagesize> + </source> + <target> + <size unit='KiB'>2097152</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + </memory> </devices> ... @@ -7381,7 +7405,9 @@ Example: usage of the memory devices Provide ``dimm`` to add a virtual DIMM module to the guest. :since:`Since 1.2.14` Provide ``nvdimm`` model that adds a Non-Volatile DIMM module. :since:`Since 3.2.0` Provide ``virtio-pmem`` model to add a paravirtualized - persistent memory device. :since:`Since 7.1.0` + persistent memory device. :since:`Since 7.1.0` Provide ``virtio-mem`` model + to add paravirtualized memory device. :since:`Since 7.1.0` Please note that + using ``virtio-mem`` with memory balloon is unsupported. ``access`` An optional attribute ``access`` ( :since:`since 3.2.0` ) that provides @@ -7404,10 +7430,11 @@ Example: usage of the memory devices allowed only for ``model='nvdimm'`` for pSeries guests. :since:`Since 6.2.0` ``source`` - For model ``dimm`` this element is optional and allows to fine tune the - source of the memory used for the given memory device. If the element is not - provided defaults configured via ``numatune`` are used. If ``dimm`` is - provided, then the following optional elements can be provided as well: + For model ``dimm`` and model ``virtio-mem`` this element is optional and + allows to fine tune the source of the memory used for the given memory + device. If the element is not provided defaults configured via ``numatune`` + are used. If the element is provided, then the following optional elements + can be provided: ``pagesize`` This element can be used to override the default host page size used for @@ -7446,7 +7473,8 @@ Example: usage of the memory devices added memory from the perspective of the guest. The mandatory ``size`` subelement configures the size of the added memory as - a scaled integer. + a scaled integer. For ``virtio-mem`` this represents the maximum possible + size exposed to the guest. The ``node`` subelement configures the guest NUMA node to attach the memory to. The element shall be used only if the guest has NUMA nodes configured. @@ -7473,6 +7501,17 @@ Example: usage of the memory devices so other backend types should use the ``readonly`` element. :since:`Since 5.0.0` + ``block`` + For ``virtio-mem`` only. + The size of an individual block, granularity of division of memory module. + Must be power of two and at least equal to size of a transparent hugepage + (2MiB on x84_64). The default is hypervisor dependent. + + ``requested`` + For ``virtio-mem`` only. + The total size exposed to the guest. Must respect ``block`` granularity + and be smaller or equal to ``size``. + :anchor:`<a id="elementsIommu"/>` IOMMU devices diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index e6de934456..1f672756bd 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -6056,6 +6056,7 @@ <value>dimm</value> <value>nvdimm</value> <value>virtio-pmem</value> + <value>virtio-mem</value> </choice> </attribute> <optional> @@ -6140,6 +6141,16 @@ <ref name="unsignedInt"/> </element> </optional> + <optional> + <element name="block"> + <ref name="scaledInteger"/> + </element> + </optional> + <optional> + <element name="requested"> + <ref name="scaledInteger"/> + </element> + </optional> <optional> <element name="label"> <element name="size"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index b731744f04..f4c08ef43d 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -1329,6 +1329,7 @@ VIR_ENUM_IMPL(virDomainMemoryModel, "dimm", "nvdimm", "virtio-pmem", + "virtio-mem", ); VIR_ENUM_IMPL(virDomainShmemModel, @@ -5369,6 +5370,7 @@ virDomainMemoryDefPostParse(virDomainMemoryDefPtr mem, } break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_DIMM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: @@ -15392,6 +15394,7 @@ virDomainMemorySourceDefParseXML(xmlNodePtr node, switch (def->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt, &def->pagesize, false, false) < 0) return -1; @@ -15458,7 +15461,8 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node, &def->size, true, false) < 0) return -1; - if (def->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { + switch (def->model) { + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: if (virDomainParseMemory("./label/size", "./label/size/@unit", ctxt, &def->labelsize, false, false) < 0) return -1; @@ -15477,6 +15481,23 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node, if (virXPathBoolean("boolean(./readonly)", ctxt)) def->readonly = true; + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (virDomainParseMemory("./block", "./block/@unit", ctxt, + &def->blocksize, false, false) < 0) + return -1; + + if (virDomainParseMemory("./requested", "./requested/@unit", ctxt, + &def->requestedsize, false, false) < 0) + return -1; + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; } return 0; @@ -17294,11 +17315,14 @@ virDomainMemoryFindByDefInternal(virDomainDefPtr def, /* target info -> always present */ if (tmp->model != mem->model || tmp->targetNode != mem->targetNode || - tmp->size != mem->size) + tmp->size != mem->size || + tmp->blocksize != mem->blocksize || + tmp->requestedsize != mem->requestedsize) continue; switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: /* source stuff -> match with device */ if (tmp->pagesize != mem->pagesize) continue; @@ -22873,6 +22897,22 @@ virDomainMemoryDefCheckABIStability(virDomainMemoryDefPtr src, return false; } + if (src->blocksize != dst->blocksize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Target memory device block size '%llu' doesn't match " + "source memory device block size '%llu'"), + dst->blocksize, src->blocksize); + return false; + } + + if (src->requestedsize != dst->requestedsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Target memory device requested size '%llu' doesn't match " + "source memory device requested size '%llu'"), + dst->requestedsize, src->requestedsize); + return false; + } + if (src->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { if (src->labelsize != dst->labelsize) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -26629,6 +26669,7 @@ virDomainMemorySourceDefFormat(virBufferPtr buf, switch (def->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (def->sourceNodes) { if (!(bitmap = virBitmapFormat(def->sourceNodes))) return -1; @@ -26685,6 +26726,14 @@ virDomainMemoryTargetDefFormat(virBufferPtr buf, if (def->readonly) virBufferAddLit(&childBuf, "<readonly/>\n"); + if (def->blocksize) { + virBufferAsprintf(&childBuf, "<block unit='KiB'>%llu</block>\n", + def->blocksize); + + virBufferAsprintf(&childBuf, "<requested unit='KiB'>%llu</requested>\n", + def->requestedsize); + } + virXMLFormatElement(buf, "target", NULL, &childBuf); } diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 930eed60de..14d238df4b 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2312,6 +2312,7 @@ typedef enum { VIR_DOMAIN_MEMORY_MODEL_DIMM, /* dimm hotpluggable memory device */ VIR_DOMAIN_MEMORY_MODEL_NVDIMM, /* nvdimm memory device */ VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM, /* virtio-pmem memory device */ + VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM, /* virtio-mem memory device */ VIR_DOMAIN_MEMORY_MODEL_LAST } virDomainMemoryModel; @@ -2332,6 +2333,8 @@ struct _virDomainMemoryDef { int targetNode; unsigned long long size; /* kibibytes */ unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */ + unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */ + unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM */ bool readonly; /* valid only for NVDIMM */ /* required for QEMU NVDIMM ppc64 support */ diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c index b47ecba86b..7ab91c8f2f 100644 --- a/src/conf/domain_validate.c +++ b/src/conf/domain_validate.c @@ -26,6 +26,7 @@ #include "virconftypes.h" #include "virlog.h" #include "virutil.h" +#include "virhostmem.h" #define VIR_FROM_THIS VIR_FROM_DOMAIN @@ -1612,6 +1613,8 @@ static int virDomainMemoryDefValidate(const virDomainMemoryDef *mem, const virDomainDef *def) { + unsigned long long thpSize; + switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: if (!mem->nvdimmPath) { @@ -1665,6 +1668,42 @@ virDomainMemoryDefValidate(const virDomainMemoryDef *mem, _("virtio-pmem does not support NUMA nodes")); return -1; } + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (mem->requestedsize > mem->size) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("requested size must be smaller than @size")); + return -1; + } + + if (!VIR_IS_POW2(mem->blocksize)) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("block size must be a power of two")); + return -1; + } + + if (virHostMemGetTHPSize(&thpSize) < 0) { + /* We failed to get THP size, fall back to a sane default. On + * almost every architecture the size will be 2MiB, except for some + * funky arches like sparc and m68k. Use 2MiB and refine later if + * somebody complains. */ + thpSize = 2048; + } + + if (mem->blocksize < thpSize) { + virReportError(VIR_ERR_XML_DETAIL, + _("block size too small, must be at least %lluKiB"), + thpSize); + return -1; + } + + if (mem->requestedsize % mem->blocksize != 0) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("requested size must be an integer multiple of block size")); + return -1; + } + break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: break; diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index 1c6f04c0ba..1a330829a3 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -522,6 +522,7 @@ qemuAssignDeviceMemoryAlias(virDomainDefPtr def, case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: prefix = "virtiopmem"; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index d801018aa2..ac9a8c9f6a 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3355,6 +3355,7 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, device = "virtio-pmem-pci"; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 541d592bbe..26b9b5583e 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -3679,10 +3679,21 @@ qemuDomainDefAddDefaultDevices(virDomainDefPtr def, } if (addDefaultMemballoon && !def->memballoon) { - virDomainMemballoonDefPtr memballoon; - memballoon = g_new0(virDomainMemballoonDef, 1); + virDomainMemballoonDefPtr memballoon = g_new0(virDomainMemballoonDef, 1); + size_t i; - memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO; + /* To simplify virtio-mem implementation, memballoon has to be turned + * off if domain has a virtio-mem device. See + * qemuValidateDomainDeviceDefMemory() for more details. */ + for (i = 0; i < def->nmems; i++) { + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + break; + } + + if (i == def->nmems) + memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO; + else + memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_NONE; def->memballoon = memballoon; } @@ -8775,6 +8786,16 @@ qemuDomainDefValidateMemoryHotplugDevice(const virDomainMemoryDef *mem, needsNuma = false; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI && + mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("only 'pci' addresses are supported for the %s device"), + virDomainMemoryModelTypeToString(mem->model)); + return -1; + } + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: return -1; diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 68dbf9e95b..64c57f1cad 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -380,9 +380,18 @@ qemuDomainPrimeVirtioDeviceAddresses(virDomainDefPtr def, } for (i = 0; i < def->nmems; i++) { - if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM && - def->mems[i]->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) - def->mems[i]->info.type = type; + switch (def->mems[i]->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (def->mems[i]->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) + def->mems[i]->info.type = type; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } } if (type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_CCW) { @@ -1011,6 +1020,7 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDefPtr dev, case VIR_DOMAIN_DEVICE_MEMORY: switch (dev->data.memory->model) { case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return virtioFlags; case VIR_DOMAIN_MEMORY_MODEL_NONE: @@ -2438,12 +2448,19 @@ qemuDomainAssignDevicePCISlots(virDomainDefPtr def, for (i = 0; i < def->nmems; i++) { virDomainMemoryDefPtr mem = def->mems[i]; - if (mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM || - !virDeviceInfoPCIAddressIsWanted(&mem->info)) - continue; - - if (qemuDomainPCIAddressReserveNextAddr(addrs, &mem->info) < 0) - return -1; + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (virDeviceInfoPCIAddressIsWanted(&mem->info) && + qemuDomainPCIAddressReserveNextAddr(addrs, &mem->info) < 0) + return -1; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } } return 0; @@ -3110,6 +3127,7 @@ qemuDomainAssignMemoryDeviceSlot(virQEMUDriverPtr driver, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return qemuDomainEnsurePCIAddress(vm, &dev, driver); break; @@ -3135,6 +3153,7 @@ qemuDomainReleaseMemoryDeviceSlot(virDomainObjPtr vm, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: qemuDomainReleaseDeviceAddress(vm, &mem->info); break; @@ -3169,6 +3188,7 @@ qemuDomainAssignMemorySlots(virDomainDefPtr def) break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: /* handled in qemuDomainAssignPCIAddresses() */ break; case VIR_DOMAIN_MEMORY_MODEL_NONE: diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 066f153703..ebf42259b5 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3869,6 +3869,7 @@ qemuProcessDomainMemoryDefNeedHugepagesPath(const virDomainMemoryDef *mem, { switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return mem->pagesize && mem->pagesize != system_pagesize; @@ -3933,6 +3934,7 @@ qemuProcessNeedMemoryBackingPath(virDomainDefPtr def, if (mem) { switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (mem->access != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT) { /* No need to check for access mode on the target node, * it was checked for in the previous loop. */ diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 2541ae856a..6150083251 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -4653,6 +4653,7 @@ qemuValidateDomainDeviceDefHub(virDomainHubDefPtr hub, static int qemuValidateDomainDeviceDefMemory(virDomainMemoryDefPtr mem, + const virDomainDef *def, virQEMUCapsPtr qemuCaps) { switch (mem->model) { @@ -4688,6 +4689,25 @@ qemuValidateDomainDeviceDefMemory(virDomainMemoryDefPtr mem, } break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + /* Accounting balloon and virtio-mem is hard. We have plenty of APIs + * which take balloon from QEMU and report it to users. We would have + * to change all that and account for virtio-mem actual size. Also, + * virtio-mem is supposed to be replacement for balloon. Disable + * coexistence of these two for now. */ + if (virDomainDefHasMemballoon(def)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-mem is not supported with memory balloon")); + return -1; + } + + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-mem isn't supported by this QEMU binary")); + return -1; + } + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; @@ -4842,7 +4862,7 @@ qemuValidateDomainDeviceDef(const virDomainDeviceDef *dev, break; case VIR_DOMAIN_DEVICE_MEMORY: - ret = qemuValidateDomainDeviceDefMemory(dev->data.memory, qemuCaps); + ret = qemuValidateDomainDeviceDefMemory(dev->data.memory, def, qemuCaps); break; case VIR_DOMAIN_DEVICE_SHMEM: diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c index e6ecb513b1..5aeaf975a1 100644 --- a/src/security/security_apparmor.c +++ b/src/security/security_apparmor.c @@ -693,6 +693,7 @@ AppArmorSetMemoryLabel(virSecurityManagerPtr mgr, return reload_profile(mgr, def, mem->nvdimmPath, true); case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; } diff --git a/src/security/security_dac.c b/src/security/security_dac.c index 00eeae0d27..5ae645f25b 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -1871,6 +1871,7 @@ virSecurityDACRestoreMemoryLabel(virSecurityManagerPtr mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: case VIR_DOMAIN_MEMORY_MODEL_NONE: ret = 0; @@ -2054,6 +2055,7 @@ virSecurityDACSetMemoryLabel(virSecurityManagerPtr mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: case VIR_DOMAIN_MEMORY_MODEL_NONE: ret = 0; diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 1d1d9edfff..c0a3b28044 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1584,6 +1584,7 @@ virSecuritySELinuxSetMemoryLabel(virSecurityManagerPtr mgr, case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; } @@ -1611,6 +1612,7 @@ virSecuritySELinuxRestoreMemoryLabel(virSecurityManagerPtr mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: ret = 0; diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml new file mode 100644 index 0000000000..eed9e35d0d --- /dev/null +++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml @@ -0,0 +1,64 @@ +<domain type='kvm'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <maxMemory slots='16' unit='KiB'>1099511627776</maxMemory> + <memory unit='KiB'>8388608</memory> + <currentMemory unit='KiB'>8388608</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <cpu mode='custom' match='exact' check='none'> + <model fallback='forbid'>qemu64</model> + <topology sockets='2' dies='1' cores='1' threads='1'/> + <numa> + <cell id='0' cpus='0-1' memory='2095104' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i386</emulator> + <disk type='block' device='disk'> + <driver name='qemu' type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='usb' index='0' model='piix3-uhci'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='none'/> + <memory model='virtio-mem'> + <target> + <size unit='KiB'>1048576</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>524288</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </memory> + <memory model='virtio-mem'> + <source> + <nodemask>1-3</nodemask> + <pagesize unit='KiB'>2048</pagesize> + </source> + <target> + <size unit='KiB'>2097152</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </memory> + </devices> +</domain> diff --git a/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml b/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml new file mode 120000 index 0000000000..a9d298129c --- /dev/null +++ b/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml @@ -0,0 +1 @@ +../qemuxml2argvdata/memory-hotplug-virtio-mem.xml \ No newline at end of file diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 5cd945f28f..3a77f8d6c0 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -1267,6 +1267,7 @@ mymain(void) QEMU_CAPS_DEVICE_NVDIMM, QEMU_CAPS_LAST); DO_TEST_CAPS_LATEST("memory-hotplug-virtio-pmem"); + DO_TEST_CAPS_LATEST("memory-hotplug-virtio-mem"); DO_TEST("net-udp", NONE); -- 2.26.2

On Thu, Feb 18, 2021 at 14:31:00 +0100, Michal Privoznik wrote:
The virtio-mem is paravirtualized mechanism of adding/removing memory to/from a VM. A virtio-mem-pci device is split into blocks of equal size which are then exposed (all or only a requested portion of them) to the guest kernel to use as regular memory. Therefore, the device has two important attributes:
1) block-size, which defines the size of a block 2) requested-size, which defines how much memory (in bytes) is the device requested to expose to the guest.
The 'block-size' is configured on command line and immutable throughout device's lifetime. The 'requested-size' can be set on the command line too, but also is adjustable via monitor. In fact, that is how management software places its requests to change the memory allocation. If it wants to give more memory to the guest it changes 'requested-size' to a bigger value, and if it wants to shrink guest memory it changes the 'requested-size' to a smaller value. Note, value of zero means that guest should release all memory offered by the device. Of course, guest has to cooperate. Therefore, there is a third attribute 'size' which is read only and reflects how much memory the guest still has. This can be different to 'requested-size', obviously. Because of name clash, I've named it 'actualsize' and it is dealt with in future commits (it is a runtime information anyway).
In the backend, memory for virtio-mem is backed by usual objects: memory-backend-{ram,file,memfd} and their size puts the cap on the amount of memory that a virtio-mem device can offer to a guest. But we are already able to express this info using <size/> under <target/>.
Therefore, we need only two more elements to cover 'block-size' and 'requested-size' attributes. This is the XML I've came up with:
<memory model='virtio-mem'> <source> <nodemask>1-3</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>2097152</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> </target> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </memory>
I hope by now it is obvious that:
1) 'requested-size' must be an integer multiple of 'block-size', and 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI address.
Then there is a limitation that the minimal 'block-size' is transparent huge page size (I'll leave this without explanation).
Since now we have (possibly) two or more devices that allow memory inflation/deflation and accounting for all of them (and thus keeping <currentMemory/> updated) might be hard. Therefore, I'm deliberately forbidding memballoon. It's okay - virtio-mem is superior to memballoon anyway. We can always reevaluate later.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.rst | 51 +++++++++++++-- docs/schemas/domaincommon.rng | 11 ++++ src/conf/domain_conf.c | 53 ++++++++++++++- src/conf/domain_conf.h | 3 + src/conf/domain_validate.c | 39 +++++++++++ src/qemu/qemu_alias.c | 1 + src/qemu/qemu_command.c | 1 + src/qemu/qemu_domain.c | 27 +++++++- src/qemu/qemu_domain_address.c | 38 ++++++++--- src/qemu/qemu_process.c | 2 + src/qemu/qemu_validate.c | 22 ++++++- src/security/security_apparmor.c | 1 + src/security/security_dac.c | 2 + src/security/security_selinux.c | 2 + .../memory-hotplug-virtio-mem.xml | 64 +++++++++++++++++++ ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 + tests/qemuxml2xmltest.c | 1 + 17 files changed, 298 insertions(+), 21 deletions(-) create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml
diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 2587106191..05d359a32d 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst
[...]
@@ -6935,6 +6937,16 @@ Example: automatically added device with KVM </devices> ...
+Example: automatically added device with QEMU/KVM and a ``virtio-mem`` device: + +:: + + ... + <devices>
The example is missing the 'virtio-mem' device which causes the balloon to be disabled.
+ <memballoon model='none'/> + </devices> + ... + Example: manually added device with static PCI slot 2 requested
::
[...]
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 541d592bbe..26b9b5583e 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -3679,10 +3679,21 @@ qemuDomainDefAddDefaultDevices(virDomainDefPtr def, }
if (addDefaultMemballoon && !def->memballoon) { - virDomainMemballoonDefPtr memballoon; - memballoon = g_new0(virDomainMemballoonDef, 1); + virDomainMemballoonDefPtr memballoon = g_new0(virDomainMemballoonDef, 1); + size_t i;
- memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO; + /* To simplify virtio-mem implementation, memballoon has to be turned + * off if domain has a virtio-mem device. See + * qemuValidateDomainDeviceDefMemory() for more details. */ + for (i = 0; i < def->nmems; i++) { + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + break; + } + + if (i == def->nmems) + memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO; + else + memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_NONE; def->memballoon = memballoon; }
This might be a point of contention. I can see why we'd want to do this but people might have other ideas. Preferably I'd like danpb's input. I think the best approach will be to not hide it into the massive patch adding the boring XML bits but split it separately. [...]
diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 2541ae856a..6150083251 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c
[...]
@@ -4688,6 +4689,25 @@ qemuValidateDomainDeviceDefMemory(virDomainMemoryDefPtr mem, } break;
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + /* Accounting balloon and virtio-mem is hard. We have plenty of APIs + * which take balloon from QEMU and report it to users. We would have + * to change all that and account for virtio-mem actual size. Also, + * virtio-mem is supposed to be replacement for balloon. Disable + * coexistence of these two for now. */ + if (virDomainDefHasMemballoon(def)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-mem is not supported with memory balloon")); + return -1; + } + + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-mem isn't supported by this QEMU binary")); + return -1; + } + break;
And this one too. I agree with this hunk as is, but not in this commit.

On 18.02.21 14:31, Michal Privoznik wrote:
The virtio-mem is paravirtualized mechanism of adding/removing memory to/from a VM. A virtio-mem-pci device is split into blocks of equal size which are then exposed (all or only a requested portion of them) to the guest kernel to use as regular memory. Therefore, the device has two important attributes:
1) block-size, which defines the size of a block 2) requested-size, which defines how much memory (in bytes) is the device requested to expose to the guest.
The 'block-size' is configured on command line and immutable throughout device's lifetime. The 'requested-size' can be set on the command line too, but also is adjustable via monitor. In fact, that is how management software places its requests to change the memory allocation. If it wants to give more memory to the guest it changes 'requested-size' to a bigger value, and if it wants to shrink guest memory it changes the 'requested-size' to a smaller value. Note, value of zero means that guest should release all memory offered by the device. Of course, guest has to cooperate. Therefore, there is a third attribute 'size' which is read only and reflects how much memory the guest still has. This can be different to 'requested-size', obviously. Because of name clash, I've named it 'actualsize' and it is dealt with in future commits (it is a runtime information anyway).
In the backend, memory for virtio-mem is backed by usual objects: memory-backend-{ram,file,memfd} and their size puts the cap on the amount of memory that a virtio-mem device can offer to a guest. But we are already able to express this info using <size/> under <target/>.
Therefore, we need only two more elements to cover 'block-size' and 'requested-size' attributes. This is the XML I've came up with:
<memory model='virtio-mem'> <source> <nodemask>1-3</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>2097152</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> </target> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </memory>
I hope by now it is obvious that:
1) 'requested-size' must be an integer multiple of 'block-size', and 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI address.
Then there is a limitation that the minimal 'block-size' is transparent huge page size (I'll leave this without explanation).
Since now we have (possibly) two or more devices that allow memory inflation/deflation and accounting for all of them (and thus keeping <currentMemory/> updated) might be hard. Therefore, I'm deliberately forbidding memballoon. It's okay - virtio-mem is superior to memballoon anyway. We can always reevaluate later.
That's a bad idea. It'll still be used for getting memory stats, free page hinting and free page reporting. Very weird use cases might even want to mix balloon inflation/deflation with virtio-mem ... -- Thanks, David / dhildenb

On 2/18/21 4:00 PM, David Hildenbrand wrote:
On 18.02.21 14:31, Michal Privoznik wrote:
Since now we have (possibly) two or more devices that allow memory inflation/deflation and accounting for all of them (and thus keeping <currentMemory/> updated) might be hard. Therefore, I'm deliberately forbidding memballoon. It's okay - virtio-mem is superior to memballoon anyway. We can always reevaluate later.
That's a bad idea. It'll still be used for getting memory stats, free page hinting and free page reporting.
Very weird use cases might even want to mix balloon inflation/deflation with virtio-mem ...
I understand that, but then accounting for memory at libvirt level becomes hard. What I mean, currently we have three elements for reporting guest memory: <domain> <maxMemory slots='16' unit='KiB'>1524288</maxMemory> <memory unit='KiB'>524288</memory> <currentMemory unit='KiB'>524288</currentMemory> ... </domain> <maxMemory/> is for memory hotplug (corresponds to -m maxmem=X,slots=Y) <memory/> is the initial amount guest has PLUS sum of sizes of all memory devices. For instance if I want to give 1GiB initial amount and have 1 pc-dimm of 2GiB size this element will report 3GiB. <currentMemory/> then reflects actual balloon size (gets updated on BALLOON_CHANGE event and when libvirt finds fit via query-balloon). In my experiments, balloon does not account for virtio-mem actual size. Therefore, I can see following options: 1) fix <currentMemory/> so that it does balloon size + sum(virtio-mem.actual), or 2) don't touch <currentMemory/> at all and let mgmt apps compute the sum themselves (should they need it- which I believe they will), 3) come up with new, forth element (awful). The benefit of 1) is that the element will report the same number as 'free' ran within guest. Problem is that I'd have to change all places where we take the size of the balloon device directly and add sum of actual sizes of virtio-mem devices (e.g. all stats APIs, ballon changed event that we fire when we see qemu's event). And okay, if I do that - then the only way to learn true value of balloon would be to take current memory and subtract the sum of virtio-mem actual sizes. Which is like option 2) but reversed. And since I could not decide which way to go, I made it problem of future me by forbidding balloon and virtio-mem coexistence. Michal

On 22.02.21 16:46, Michal Privoznik wrote:
On 2/18/21 4:00 PM, David Hildenbrand wrote:
On 18.02.21 14:31, Michal Privoznik wrote:
Since now we have (possibly) two or more devices that allow memory inflation/deflation and accounting for all of them (and thus keeping <currentMemory/> updated) might be hard. Therefore, I'm deliberately forbidding memballoon. It's okay - virtio-mem is superior to memballoon anyway. We can always reevaluate later.
That's a bad idea. It'll still be used for getting memory stats, free page hinting and free page reporting.
Very weird use cases might even want to mix balloon inflation/deflation with virtio-mem ...
I understand that, but then accounting for memory at libvirt level becomes hard. What I mean, currently we have three elements for reporting guest memory:
<domain> <maxMemory slots='16' unit='KiB'>1524288</maxMemory> <memory unit='KiB'>524288</memory> <currentMemory unit='KiB'>524288</currentMemory> ... </domain>
<maxMemory/> is for memory hotplug (corresponds to -m maxmem=X,slots=Y)
<memory/> is the initial amount guest has PLUS sum of sizes of all memory devices. For instance if I want to give 1GiB initial amount and have 1 pc-dimm of 2GiB size this element will report 3GiB.
<currentMemory/> then reflects actual balloon size (gets updated on BALLOON_CHANGE event and when libvirt finds fit via query-balloon).
In my experiments, balloon does not account for virtio-mem actual size. Therefore, I can see following options:
1) fix <currentMemory/> so that it does balloon size + sum(virtio-mem.actual), or
2) don't touch <currentMemory/> at all and let mgmt apps compute the sum themselves (should they need it- which I believe they will),
3) come up with new, forth element (awful).
The benefit of 1) is that the element will report the same number as 'free' ran within guest. Problem is that I'd have to change all places where we take the size of the balloon device directly and add sum of actual sizes of virtio-mem devices (e.g. all stats APIs, ballon changed event that we fire when we see qemu's event). And okay, if I do that - then the only way to learn true value of balloon would be to take current memory and subtract the sum of virtio-mem actual sizes. Which is like option 2) but reversed. And since I could not decide which way to go, I made it problem of future me by forbidding balloon and virtio-mem coexistence.
In QEMU, we could make "info balloon" etc. also include virtio-mem provided memory. The main reason I did not do so initially is that 1) It's racy when reading/writing the balloon size. The QEMU interface is broken as we don't get/set the size of the balloon size but instead the logical VM size. While someone sets the logical VM size via virtio-balloon, the logical VM size might change due to virtio- mem guest activity. 2) I don't want people to be tempted to use both at the same time. Maybe the right think to do is make "info balloon" report the current logical VM size. If there are races, bad luck - better not use both things at the same time. -- Thanks, David / dhildenb

On Mon, Feb 22, 2021 at 16:57:11 +0100, David Hildenbrand wrote:
On 22.02.21 16:46, Michal Privoznik wrote:
On 2/18/21 4:00 PM, David Hildenbrand wrote:
On 18.02.21 14:31, Michal Privoznik wrote:
[...]
In QEMU, we could make "info balloon" etc. also include virtio-mem provided memory.
The main reason I did not do so initially is that 1) It's racy when reading/writing the balloon size. The QEMU interface is broken as we don't get/set the size of the balloon size but instead the logical VM size. While someone sets the logical VM size via virtio-balloon, the logical VM size might change due to virtio- mem guest activity. 2) I don't want people to be tempted to use both at the same time.
Maybe the right think to do is make "info balloon" report the current logical VM size. If there are races, bad luck - better not use both things at the same time.
We specifically don't call info balloon or any equivalent nowadays if qemu supports the balloon event and we should keep it like that at least in terms of the active XML. The bulk stats functions we have already call the monitor so it's okay to do it there. Both virtio-mem and balloon do have events so we should be able to update any required definition bits without an explicit call to the balloon info apart from the stats code and the refresh on reconnect.

On 22.02.21 17:28, Peter Krempa wrote:
On Mon, Feb 22, 2021 at 16:57:11 +0100, David Hildenbrand wrote:
On 22.02.21 16:46, Michal Privoznik wrote:
On 2/18/21 4:00 PM, David Hildenbrand wrote:
On 18.02.21 14:31, Michal Privoznik wrote:
[...]
In QEMU, we could make "info balloon" etc. also include virtio-mem provided memory.
The main reason I did not do so initially is that 1) It's racy when reading/writing the balloon size. The QEMU interface is broken as we don't get/set the size of the balloon size but instead the logical VM size. While someone sets the logical VM size via virtio-balloon, the logical VM size might change due to virtio- mem guest activity. 2) I don't want people to be tempted to use both at the same time.
Maybe the right think to do is make "info balloon" report the current logical VM size. If there are races, bad luck - better not use both things at the same time.
We specifically don't call info balloon or any equivalent nowadays if qemu supports the balloon event and we should keep it like that at least in terms of the active XML.
The bulk stats functions we have already call the monitor so it's okay to do it there.
Both virtio-mem and balloon do have events so we should be able to update any required definition bits without an explicit call to the balloon info apart from the stats code and the refresh on reconnect.
Just to clarify, any BALLOON communication with the outer world (set/get/balloon_change) are in terms of "logical VM size", not "balloon size". logical_vm_size = vm_size - balloon_size vm_size does currently not include virtio-mem provided memory. -- Thanks, David / dhildenb

On Thu, Feb 18, 2021 at 04:00:05PM +0100, David Hildenbrand wrote:
On 18.02.21 14:31, Michal Privoznik wrote:
The virtio-mem is paravirtualized mechanism of adding/removing memory to/from a VM. A virtio-mem-pci device is split into blocks of equal size which are then exposed (all or only a requested portion of them) to the guest kernel to use as regular memory. Therefore, the device has two important attributes:
1) block-size, which defines the size of a block 2) requested-size, which defines how much memory (in bytes) is the device requested to expose to the guest.
The 'block-size' is configured on command line and immutable throughout device's lifetime. The 'requested-size' can be set on the command line too, but also is adjustable via monitor. In fact, that is how management software places its requests to change the memory allocation. If it wants to give more memory to the guest it changes 'requested-size' to a bigger value, and if it wants to shrink guest memory it changes the 'requested-size' to a smaller value. Note, value of zero means that guest should release all memory offered by the device. Of course, guest has to cooperate. Therefore, there is a third attribute 'size' which is read only and reflects how much memory the guest still has. This can be different to 'requested-size', obviously. Because of name clash, I've named it 'actualsize' and it is dealt with in future commits (it is a runtime information anyway).
In the backend, memory for virtio-mem is backed by usual objects: memory-backend-{ram,file,memfd} and their size puts the cap on the amount of memory that a virtio-mem device can offer to a guest. But we are already able to express this info using <size/> under <target/>.
Therefore, we need only two more elements to cover 'block-size' and 'requested-size' attributes. This is the XML I've came up with:
<memory model='virtio-mem'> <source> <nodemask>1-3</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>2097152</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> </target> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </memory>
I hope by now it is obvious that:
1) 'requested-size' must be an integer multiple of 'block-size', and 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI address.
Then there is a limitation that the minimal 'block-size' is transparent huge page size (I'll leave this without explanation).
Since now we have (possibly) two or more devices that allow memory inflation/deflation and accounting for all of them (and thus keeping <currentMemory/> updated) might be hard. Therefore, I'm deliberately forbidding memballoon. It's okay - virtio-mem is superior to memballoon anyway. We can always reevaluate later.
That's a bad idea. It'll still be used for getting memory stats, free page hinting and free page reporting.
Yep, and this feature is broadly used in mgmt apps, so if we enforce this mutual exclusion it is putting apps into a no-win scenario.
Very weird use cases might even want to mix balloon inflation/deflation with virtio-mem ...
I'd feel less bad about blocking inflation/deflation, but that does go against the general libvirt POV that we should leave usage policies upto the mgmt app to decide upon. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Nothing special is happening here. All important changes were done when for 'virtio-pmem' (adjusting the code to put virtio memory on PCI bus, generating alias using qemuDomainDeviceAliasIndex(). The only bit that might look suspicious is no prealloc for virtio-mem. But if you think about it, the whole purpose of this device is to change amount of memory exposed to guest on the fly. There is no point in locking the whole backend in memory. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_alias.c | 9 +++- src/qemu/qemu_command.c | 12 ++++- ...mory-hotplug-virtio-mem.x86_64-latest.args | 48 +++++++++++++++++++ tests/qemuxml2argvtest.c | 1 + 4 files changed, 67 insertions(+), 3 deletions(-) create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index 1a330829a3..cef02ae916 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -475,8 +475,11 @@ qemuDeviceMemoryGetAliasID(virDomainDefPtr def, size_t i; int maxidx = 0; - /* virtio-pmem goes onto PCI bus and thus DIMM address is not valid */ - if (!oldAlias && mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM) + /* virtio-pmem and virtio-mem go onto PCI bus and thus DIMM address is not + * valid */ + if (!oldAlias && + mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM && + mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) return mem->info.addr.dimm.slot; for (i = 0; i < def->nmems; i++) { @@ -523,6 +526,8 @@ qemuAssignDeviceMemoryAlias(virDomainDefPtr def, prefix = "virtiopmem"; break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + prefix = "virtiomem"; + break; case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index ac9a8c9f6a..d56ba26874 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3127,7 +3127,9 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps, } else if (useHugepage) { if (qemuGetDomainHupageMemPath(priv->driver, def, pagesize, &memPath) < 0) return -1; - prealloc = true; + /* For virtio-mem backed by hugepages we don't need prealloc. */ + if (mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + prealloc = true; } else { /* We can have both pagesize and mem source. If that's the case, * prefer hugepages as those are more specific. */ @@ -3356,6 +3358,9 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + device = "virtio-mem-pci"; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: @@ -3372,6 +3377,11 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, if (mem->labelsize) virBufferAsprintf(&buf, "label-size=%llu,", mem->labelsize * 1024); + if (mem->blocksize) { + virBufferAsprintf(&buf, "block-size=%llu,", mem->blocksize * 1024); + virBufferAsprintf(&buf, "requested-size=%llu,", mem->requestedsize * 1024); + } + if (mem->uuid) { char uuidstr[VIR_UUID_STRING_BUFLEN]; diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args new file mode 100644 index 0000000000..76d8c41886 --- /dev/null +++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args @@ -0,0 +1,48 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-i386 \ +-name guest=QEMUGuest1,debug-threads=on \ +-S \ +-object secret,id=masterKey0,format=raw,\ +file=/tmp/lib/domain--1-QEMUGuest1/master-key.aes \ +-machine pc,accel=kvm,usb=off,dump-guest-core=off \ +-cpu qemu64 \ +-m size=2095104k,slots=16,maxmem=1099511627776k \ +-overcommit mem-lock=off \ +-smp 2,sockets=2,dies=1,cores=1,threads=1 \ +-object memory-backend-ram,id=ram-node0,size=2145386496 \ +-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \ +-object memory-backend-ram,id=memvirtiomem0,size=1073741824 \ +-device virtio-mem-pci,node=0,block-size=2097152,requested-size=536870912,\ +memdev=memvirtiomem0,id=virtiomem0,bus=pci.0,addr=0x2 \ +-object memory-backend-file,id=memvirtiomem1,\ +mem-path=/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1,size=2147483648,\ +host-nodes=1-3,policy=bind \ +-device virtio-mem-pci,node=0,block-size=2097152,requested-size=1073741824,\ +memdev=memvirtiomem1,id=virtiomem1,bus=pci.0,addr=0x3 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-boot strict=on \ +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ +-blockdev '{"driver":"host_device","filename":"/dev/HostVG/QEMUGuest1",\ +"node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw",\ +"file":"libvirt-1-storage"}' \ +-device ide-hd,bus=ide.0,unit=0,drive=libvirt-1-format,id=ide0-0-0,bootindex=1 \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,\ +resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index db438c5466..61aff38abd 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -3060,6 +3060,7 @@ mymain(void) QEMU_CAPS_DEVICE_NVDIMM, QEMU_CAPS_LAST); DO_TEST_CAPS_LATEST("memory-hotplug-virtio-pmem"); + DO_TEST_CAPS_LATEST("memory-hotplug-virtio-mem"); DO_TEST("machine-aeskeywrap-on-caps", QEMU_CAPS_AES_KEY_WRAP, -- 2.26.2

As advertised in one of previous commits, we want to be able to change 'requested-size' attribute of virtio-mem on the fly. This commit does exactly that. Changing anything else is checked for and forbidden. Once guest has changed the allocation, QEMU emits an event which we will use to track the allocation. In the next commit. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 36 ++++++++ src/conf/domain_conf.h | 4 + src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 172 ++++++++++++++++++++++++++++++++++- src/qemu/qemu_hotplug.c | 18 ++++ src/qemu/qemu_hotplug.h | 5 + src/qemu/qemu_monitor.c | 13 +++ src/qemu/qemu_monitor.h | 4 + src/qemu/qemu_monitor_json.c | 15 +++ src/qemu/qemu_monitor_json.h | 5 + 10 files changed, 272 insertions(+), 1 deletion(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index f4c08ef43d..5f9210a112 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -17377,6 +17377,42 @@ virDomainMemoryFindInactiveByDef(virDomainDefPtr def, } +/** + * virDomainMemoryFindByDeviceInfo: + * @def: domain defintion + * @info: device info to match + * + * For given domain definition @def find <memory/> device with + * matching address and matching device alias (if set in @info, + * otherwise ignored). + * + * Returns: device if found, + * NULL otherwise. + */ +virDomainMemoryDefPtr +virDomainMemoryFindByDeviceInfo(virDomainDefPtr def, + virDomainDeviceInfoPtr info) +{ + size_t i; + + for (i = 0; i < def->nmems; i++) { + virDomainMemoryDefPtr tmp = def->mems[i]; + + if (!virDomainDeviceInfoAddressIsEqual(&tmp->info, info)) + continue; + + /* alias, if present */ + if (info->alias && + STRNEQ_NULLABLE(tmp->info.alias, info->alias)) + continue; + + return tmp; + } + + return NULL; +} + + /** * virDomainMemoryInsert: * diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 14d238df4b..af38d8aa81 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -3622,6 +3622,10 @@ int virDomainMemoryFindByDef(virDomainDefPtr def, virDomainMemoryDefPtr mem) int virDomainMemoryFindInactiveByDef(virDomainDefPtr def, virDomainMemoryDefPtr mem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; +virDomainMemoryDefPtr +virDomainMemoryFindByDeviceInfo(virDomainDefPtr dev, + virDomainDeviceInfoPtr info) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; int virDomainShmemDefInsert(virDomainDefPtr def, virDomainShmemDefPtr shmem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 0e6ad1222f..ec1af101a9 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -494,6 +494,7 @@ virDomainMemballoonModelTypeFromString; virDomainMemballoonModelTypeToString; virDomainMemoryDefFree; virDomainMemoryFindByDef; +virDomainMemoryFindByDeviceInfo; virDomainMemoryFindInactiveByDef; virDomainMemoryInsert; virDomainMemoryModelTypeToString; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index b9bbdf8d48..1a121e9278 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7093,6 +7093,165 @@ qemuDomainChangeDiskLive(virDomainObjPtr vm, return 0; } + +static bool +qemuDomainChangeMemoryLiveValidateChange(const virDomainMemoryDef *oldDef, + const virDomainMemoryDef *newDef) +{ + /* The only thing that is allowed to change is 'requestedsize' for virtio + * model. */ + if (oldDef->model != newDef->model) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory model from '%s' to '%s'"), + virDomainMemoryModelTypeToString(oldDef->model), + virDomainMemoryModelTypeToString(newDef->model)); + return false; + } + + if (oldDef->access != newDef->access) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory access from '%s' to '%s'"), + virDomainMemoryAccessTypeToString(oldDef->access), + virDomainMemoryAccessTypeToString(newDef->access)); + return false; + } + + if (oldDef->discard != newDef->discard) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory discard from '%s' to '%s'"), + virTristateBoolTypeToString(oldDef->discard), + virTristateBoolTypeToString(newDef->discard)); + return false; + } + + if (!virBitmapEqual(oldDef->sourceNodes, + newDef->sourceNodes)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory source nodes")); + return false; + } + + if (oldDef->pagesize != newDef->pagesize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory pagesize from '%llu' to '%llu'"), + oldDef->pagesize, + newDef->pagesize); + return false; + } + + if (STRNEQ_NULLABLE(oldDef->nvdimmPath, newDef->nvdimmPath)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory path from '%s' to '%s'"), + NULLSTR(oldDef->nvdimmPath), + NULLSTR(newDef->nvdimmPath)); + return false; + } + + if (oldDef->alignsize != newDef->alignsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory align size from '%llu' to '%llu'"), + oldDef->alignsize, newDef->alignsize); + return false; + } + + if (oldDef->nvdimmPmem != newDef->nvdimmPmem) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory pmem from '%d' to '%d'"), + oldDef->nvdimmPmem, newDef->nvdimmPmem); + return false; + } + + if (oldDef->targetNode != newDef->targetNode) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory targetNode from '%d' to '%d'"), + oldDef->targetNode, newDef->targetNode); + return false; + } + + if (oldDef->size != newDef->size) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory size from '%llu' to '%llu'"), + oldDef->size, newDef->size); + return false; + } + + if (oldDef->labelsize != newDef->labelsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory label size from '%llu' to '%llu'"), + oldDef->labelsize, newDef->labelsize); + return false; + } + if (oldDef->blocksize != newDef->blocksize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory block size from '%llu' to '%llu'"), + oldDef->blocksize, newDef->blocksize); + return false; + } + + /* requestedsize can change */ + + if (oldDef->readonly != newDef->readonly) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory pmem flag")); + return false; + } + + if ((oldDef->uuid || newDef->uuid) && + !(oldDef->uuid && newDef->uuid && + memcmp(oldDef->uuid, newDef->uuid, VIR_UUID_BUFLEN) == 0)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory UUID")); + return false; + } + + switch (oldDef->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory of model '%s'"), + virDomainMemoryModelTypeToString(oldDef->model)); + return false; + break; + } + + return true; +} + + +static int +qemuDomainChangeMemoryLive(virQEMUDriverPtr driver G_GNUC_UNUSED, + virDomainObjPtr vm, + virDomainDeviceDefPtr dev) +{ + virDomainMemoryDefPtr newDef = dev->data.memory; + virDomainMemoryDefPtr oldDef = NULL; + + oldDef = virDomainMemoryFindByDeviceInfo(vm->def, &dev->data.memory->info); + if (!oldDef) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("memory '%s' not found"), dev->data.memory->info.alias); + return -1; + } + + if (!qemuDomainChangeMemoryLiveValidateChange(oldDef, newDef)) + return -1; + + if (qemuDomainChangeMemoryRequestedSize(driver, vm, + newDef->info.alias, + newDef->requestedsize) < 0) + return -1; + + oldDef->requestedsize = newDef->requestedsize; + return 0; +} + + static int qemuDomainUpdateDeviceLive(virDomainObjPtr vm, virDomainDeviceDefPtr dev, @@ -7134,6 +7293,18 @@ qemuDomainUpdateDeviceLive(virDomainObjPtr vm, ret = qemuDomainChangeNet(driver, vm, dev); break; + case VIR_DOMAIN_DEVICE_MEMORY: + oldDev.data.memory = virDomainMemoryFindByDeviceInfo(vm->def, &dev->data.memory->info); + if (oldDev.data.memory) { + if (virDomainDefCompatibleDevice(vm->def, dev, &oldDev, + VIR_DOMAIN_DEVICE_ACTION_UPDATE, + true) < 0) + return -1; + } + + ret = qemuDomainChangeMemoryLive(driver, vm, dev); + break; + case VIR_DOMAIN_DEVICE_FS: case VIR_DOMAIN_DEVICE_INPUT: case VIR_DOMAIN_DEVICE_SOUND: @@ -7149,7 +7320,6 @@ qemuDomainUpdateDeviceLive(virDomainObjPtr vm, case VIR_DOMAIN_DEVICE_HOSTDEV: case VIR_DOMAIN_DEVICE_CONTROLLER: case VIR_DOMAIN_DEVICE_REDIRDEV: - case VIR_DOMAIN_DEVICE_MEMORY: case VIR_DOMAIN_DEVICE_CHR: case VIR_DOMAIN_DEVICE_NONE: case VIR_DOMAIN_DEVICE_TPM: diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index a66354426d..bfd5fb80d7 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -6714,3 +6714,21 @@ qemuDomainSetVcpuInternal(virQEMUDriverPtr driver, virBitmapFree(livevcpus); return ret; } + + +int +qemuDomainChangeMemoryRequestedSize(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *alias, + unsigned long long requestedsize) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + int rc; + + qemuDomainObjEnterMonitor(driver, vm); + rc = qemuMonitorChangeMemoryRequestedSize(priv->mon, alias, requestedsize); + if (qemuDomainObjExitMonitor(driver, vm) < 0) + return -1; + + return rc; +} diff --git a/src/qemu/qemu_hotplug.h b/src/qemu/qemu_hotplug.h index 6287c5b5e8..9e551a1f82 100644 --- a/src/qemu/qemu_hotplug.h +++ b/src/qemu/qemu_hotplug.h @@ -160,3 +160,8 @@ int qemuHotplugAttachDBusVMState(virQEMUDriverPtr driver, int qemuHotplugRemoveDBusVMState(virQEMUDriverPtr driver, virDomainObjPtr vm, qemuDomainAsyncJob asyncJob); + +int qemuDomainChangeMemoryRequestedSize(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *alias, + unsigned long long requestedsize); diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 0476d606f5..217870c437 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -4721,3 +4721,16 @@ qemuMonitorTransactionBackup(virJSONValuePtr actions, return qemuMonitorJSONTransactionBackup(actions, device, jobname, target, bitmap, syncmode); } + + +int +qemuMonitorChangeMemoryRequestedSize(qemuMonitorPtr mon, + const char *alias, + unsigned long long requestedsize) +{ + VIR_DEBUG("alias=%s requestedsize=%llu", alias, requestedsize); + + QEMU_CHECK_MONITOR(mon); + + return qemuMonitorJSONChangeMemoryRequestedSize(mon, alias, requestedsize); +} diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 32dc96ee82..05df73cff8 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1518,3 +1518,7 @@ qemuMonitorTransactionBackup(virJSONValuePtr actions, const char *target, const char *bitmap, qemuMonitorTransactionBackupSyncMode syncmode); + +int qemuMonitorChangeMemoryRequestedSize(qemuMonitorPtr mon, + const char *alias, + unsigned long long requestedsize); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index f8c78d9093..07d19b5e6d 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -9462,3 +9462,18 @@ qemuMonitorJSONGetCPUMigratable(qemuMonitorPtr mon, return virJSONValueGetBoolean(virJSONValueObjectGet(reply, "return"), migratable); } + + +int +qemuMonitorJSONChangeMemoryRequestedSize(qemuMonitorPtr mon, + const char *alias, + unsigned long long requestedsize) +{ + g_autofree char *path = g_strdup_printf("/machine/peripheral/%s", alias); + qemuMonitorJSONObjectProperty prop = { + .type = QEMU_MONITOR_OBJECT_PROPERTY_ULONG, + .val.ul = requestedsize * 1024, /* monitor needs bytes */ + }; + + return qemuMonitorJSONSetObjectProperty(mon, path, "requested-size", &prop); +} diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index 31652d4207..9f264500f3 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -705,3 +705,8 @@ int qemuMonitorJSONSetDBusVMStateIdList(qemuMonitorPtr mon, int qemuMonitorJSONGetCPUMigratable(qemuMonitorPtr mon, bool *migratable); + +int +qemuMonitorJSONChangeMemoryRequestedSize(qemuMonitorPtr mon, + const char *alias, + unsigned long long requestedsize); -- 2.26.2

As advertised in previous commit, this event is delivered to us when virtio-mem module changes the allocation inside the guest. It comes with one attribute - size - which holds the new size of the virtio-mem (well, allocated size), in bytes. Mind you, this is not necessarily the same number as 'requested size'. It almost certainly will be when sizing the memory up, but it might not be when sizing the memory down - the guest kernel might be unable to free some blocks. This actual size is reported in the domain XML as an output element only. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.rst | 7 ++++++ docs/schemas/domaincommon.rng | 5 +++++ src/conf/domain_conf.c | 26 ++++++++++++++++++++-- src/conf/domain_conf.h | 8 +++++++ src/libvirt_private.syms | 1 + src/qemu/qemu_domain.c | 3 +++ src/qemu/qemu_domain.h | 1 + src/qemu/qemu_driver.c | 31 ++++++++++++++++++++++++++ src/qemu/qemu_monitor.c | 24 ++++++++++++++++++++ src/qemu/qemu_monitor.h | 20 +++++++++++++++++ src/qemu/qemu_monitor_json.c | 24 ++++++++++++++++++++ src/qemu/qemu_process.c | 42 +++++++++++++++++++++++++++++++++++ 12 files changed, 190 insertions(+), 2 deletions(-) diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 05d359a32d..ccce01fd3b 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -7396,6 +7396,7 @@ Example: usage of the memory devices <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> + <actual unit='KiB'>524288</requested> </target> </memory> </devices> @@ -7512,6 +7513,12 @@ Example: usage of the memory devices The total size exposed to the guest. Must respect ``block`` granularity and be smaller or equal to ``size``. + ``actual`` + Active XML for ``virtio-mem`` model may contain ``actual`` element that + reflects the actual size of the corresponding virtio memory device. The + element is formatted into live XML and never parsed, i.e. it is + output-only element. + :anchor:`<a id="elementsIommu"/>` IOMMU devices diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 1f672756bd..fc7569ba8f 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -6151,6 +6151,11 @@ <ref name="scaledInteger"/> </element> </optional> + <optional> + <element name="actual"> + <ref name="scaledInteger"/> + </element> + </optional> <optional> <element name="label"> <element name="size"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 5f9210a112..aaeb50f2a2 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -17413,6 +17413,23 @@ virDomainMemoryFindByDeviceInfo(virDomainDefPtr def, } +virDomainMemoryDefPtr +virDomainMemoryFindByDeviceAlias(virDomainDefPtr def, + const char *alias) +{ + size_t i; + + for (i = 0; i < def->nmems; i++) { + virDomainMemoryDefPtr tmp = def->mems[i]; + + if (STREQ_NULLABLE(tmp->info.alias, alias)) + return tmp; + } + + return NULL; +} + + /** * virDomainMemoryInsert: * @@ -26746,7 +26763,8 @@ virDomainMemorySourceDefFormat(virBufferPtr buf, static void virDomainMemoryTargetDefFormat(virBufferPtr buf, - virDomainMemoryDefPtr def) + virDomainMemoryDefPtr def, + unsigned int flags) { g_auto(virBuffer) childBuf = VIR_BUFFER_INIT_CHILD(buf); @@ -26768,6 +26786,10 @@ virDomainMemoryTargetDefFormat(virBufferPtr buf, virBufferAsprintf(&childBuf, "<requested unit='KiB'>%llu</requested>\n", def->requestedsize); + if (!(flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE)) { + virBufferAsprintf(&childBuf, "<actual unit='KiB'>%llu</actual>\n", + def->actualsize); + } } virXMLFormatElement(buf, "target", NULL, &childBuf); @@ -26800,7 +26822,7 @@ virDomainMemoryDefFormat(virBufferPtr buf, if (virDomainMemorySourceDefFormat(buf, def) < 0) return -1; - virDomainMemoryTargetDefFormat(buf, def); + virDomainMemoryTargetDefFormat(buf, def, flags); virDomainDeviceInfoFormat(buf, &def->info, flags); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index af38d8aa81..8a43c9655f 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2335,6 +2335,9 @@ struct _virDomainMemoryDef { unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */ unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */ unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM */ + unsigned long long actualsize; /* kibibytes, valid for VIRTIO_MEM and + active domain only, only to report never + parse */ bool readonly; /* valid only for NVDIMM */ /* required for QEMU NVDIMM ppc64 support */ @@ -3627,6 +3630,11 @@ virDomainMemoryFindByDeviceInfo(virDomainDefPtr dev, virDomainDeviceInfoPtr info) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; +virDomainMemoryDefPtr +virDomainMemoryFindByDeviceAlias(virDomainDefPtr def, + const char *alias) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; + int virDomainShmemDefInsert(virDomainDefPtr def, virDomainShmemDefPtr shmem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; bool virDomainShmemDefEquals(virDomainShmemDefPtr src, virDomainShmemDefPtr dst) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index ec1af101a9..4f362e3c54 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -494,6 +494,7 @@ virDomainMemballoonModelTypeFromString; virDomainMemballoonModelTypeToString; virDomainMemoryDefFree; virDomainMemoryFindByDef; +virDomainMemoryFindByDeviceAlias; virDomainMemoryFindByDeviceInfo; virDomainMemoryFindInactiveByDef; virDomainMemoryInsert; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 26b9b5583e..7de9f8c050 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -10880,6 +10880,9 @@ qemuProcessEventFree(struct qemuProcessEvent *event) case QEMU_PROCESS_EVENT_JOB_STATUS_CHANGE: virObjectUnref(event->data); break; + case QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE: + qemuMonitorMemoryDeviceSizeChangeFree(event->data); + break; case QEMU_PROCESS_EVENT_PR_DISCONNECT: case QEMU_PROCESS_EVENT_LAST: break; diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 9a7d997d65..91d6d6ef80 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -441,6 +441,7 @@ typedef enum { QEMU_PROCESS_EVENT_PR_DISCONNECT, QEMU_PROCESS_EVENT_RDMA_GID_STATUS_CHANGED, QEMU_PROCESS_EVENT_GUEST_CRASHLOADED, + QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE, QEMU_PROCESS_EVENT_LAST } qemuProcessEventType; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 1a121e9278..995faf2abc 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4279,6 +4279,34 @@ processGuestCrashloadedEvent(virQEMUDriverPtr driver, } +static void +processMemoryDeviceSizeChange(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuMonitorMemoryDeviceSizeChangePtr info) +{ + virDomainMemoryDefPtr mem = NULL; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) + return; + + if (!virDomainObjIsActive(vm)) { + VIR_DEBUG("Domain is not running"); + goto endjob; + } + + mem = virDomainMemoryFindByDeviceAlias(vm->def, info->devAlias); + if (!mem) { + VIR_DEBUG("Memory device '%s' not found", info->devAlias); + goto endjob; + } + + mem->actualsize = VIR_DIV_UP(info->size, 1024); + + endjob: + qemuDomainObjEndJob(driver, vm); +} + + static void qemuProcessEventHandler(void *data, void *opaque) { struct qemuProcessEvent *processEvent = data; @@ -4328,6 +4356,9 @@ static void qemuProcessEventHandler(void *data, void *opaque) case QEMU_PROCESS_EVENT_GUEST_CRASHLOADED: processGuestCrashloadedEvent(driver, vm); break; + case QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE: + processMemoryDeviceSizeChange(driver, vm, processEvent->data); + break; case QEMU_PROCESS_EVENT_LAST: break; } diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 217870c437..a7b7a78a55 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1437,6 +1437,20 @@ qemuMonitorEmitSpiceMigrated(qemuMonitorPtr mon) } +int +qemuMonitorEmitMemoryDeviceSizeChange(qemuMonitorPtr mon, + const char *devAlias, + unsigned long long size) +{ + int ret = -1; + VIR_DEBUG("mon=%p, devAlias='%s', size=%llu", mon, devAlias, size); + + QEMU_MONITOR_CALLBACK(mon, ret, domainMemoryDeviceSizeChange, mon->vm, devAlias, size); + + return ret; +} + + int qemuMonitorEmitMemoryFailure(qemuMonitorPtr mon, qemuMonitorEventMemoryFailurePtr mfp) @@ -4408,6 +4422,16 @@ qemuMonitorEventRdmaGidStatusFree(qemuMonitorRdmaGidStatusPtr info) } +void +qemuMonitorMemoryDeviceSizeChangeFree(qemuMonitorMemoryDeviceSizeChangePtr info) +{ + if (!info) + return; + + g_free(info->devAlias); +} + + int qemuMonitorSetWatchdogAction(qemuMonitorPtr mon, const char *action) diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 05df73cff8..53f2aa9600 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -108,6 +108,14 @@ struct _qemuMonitorRdmaGidStatus { }; +typedef struct _qemuMonitorMemoryDeviceSizeChange qemuMonitorMemoryDeviceSizeChange; +typedef qemuMonitorMemoryDeviceSizeChange *qemuMonitorMemoryDeviceSizeChangePtr; +struct _qemuMonitorMemoryDeviceSizeChange { + char *devAlias; + unsigned long long size; +}; + + typedef enum { QEMU_MONITOR_JOB_TYPE_UNKNOWN, /* internal value, not exposed by qemu */ QEMU_MONITOR_JOB_TYPE_COMMIT, @@ -153,6 +161,7 @@ struct _qemuMonitorJobInfo { char *qemuMonitorGuestPanicEventInfoFormatMsg(qemuMonitorEventPanicInfoPtr info); void qemuMonitorEventPanicInfoFree(qemuMonitorEventPanicInfoPtr info); void qemuMonitorEventRdmaGidStatusFree(qemuMonitorRdmaGidStatusPtr info); +void qemuMonitorMemoryDeviceSizeChangeFree(qemuMonitorMemoryDeviceSizeChangePtr info); typedef void (*qemuMonitorDestroyCallback)(qemuMonitorPtr mon, virDomainObjPtr vm, @@ -374,6 +383,12 @@ typedef int (*qemuMonitorDomainMemoryFailureCallback)(qemuMonitorPtr mon, qemuMonitorEventMemoryFailurePtr mfp, void *opaque); +typedef int (*qemuMonitorDomainMemoryDeviceSizeChange)(qemuMonitorPtr mon, + virDomainObjPtr vm, + const char *alias, + unsigned long long size, + void *opaque); + typedef struct _qemuMonitorCallbacks qemuMonitorCallbacks; typedef qemuMonitorCallbacks *qemuMonitorCallbacksPtr; struct _qemuMonitorCallbacks { @@ -411,6 +426,7 @@ struct _qemuMonitorCallbacks { qemuMonitorDomainRdmaGidStatusChangedCallback domainRdmaGidStatusChanged; qemuMonitorDomainGuestCrashloadedCallback domainGuestCrashloaded; qemuMonitorDomainMemoryFailureCallback domainMemoryFailure; + qemuMonitorDomainMemoryDeviceSizeChange domainMemoryDeviceSizeChange; }; qemuMonitorPtr qemuMonitorOpen(virDomainObjPtr vm, @@ -507,6 +523,10 @@ int qemuMonitorEmitSerialChange(qemuMonitorPtr mon, bool connected); int qemuMonitorEmitSpiceMigrated(qemuMonitorPtr mon); +int qemuMonitorEmitMemoryDeviceSizeChange(qemuMonitorPtr mon, + const char *devAlias, + unsigned long long size); + int qemuMonitorEmitMemoryFailure(qemuMonitorPtr mon, qemuMonitorEventMemoryFailurePtr mfp); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 07d19b5e6d..9ce7ba52ba 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -113,6 +113,7 @@ static void qemuMonitorJSONHandleDumpCompleted(qemuMonitorPtr mon, virJSONValueP static void qemuMonitorJSONHandlePRManagerStatusChanged(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleRdmaGidStatusChanged(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleMemoryFailure(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleMemoryDeviceSizeChange(qemuMonitorPtr mon, virJSONValuePtr data); typedef struct { const char *type; @@ -133,6 +134,7 @@ static qemuEventHandler eventHandlers[] = { { "GUEST_CRASHLOADED", qemuMonitorJSONHandleGuestCrashloaded, }, { "GUEST_PANICKED", qemuMonitorJSONHandleGuestPanic, }, { "JOB_STATUS_CHANGE", qemuMonitorJSONHandleJobStatusChange, }, + { "MEMORY_DEVICE_SIZE_CHANGE", qemuMonitorJSONHandleMemoryDeviceSizeChange, }, { "MEMORY_FAILURE", qemuMonitorJSONHandleMemoryFailure, }, { "MIGRATION", qemuMonitorJSONHandleMigrationStatus, }, { "MIGRATION_PASS", qemuMonitorJSONHandleMigrationPass, }, @@ -1333,6 +1335,28 @@ qemuMonitorJSONHandleSpiceMigrated(qemuMonitorPtr mon, } +static void +qemuMonitorJSONHandleMemoryDeviceSizeChange(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + const char *name; + unsigned long long size; + + if (!(name = virJSONValueObjectGetString(data, "id"))) { + VIR_WARN("missing device alias in MEMORY_DEVICE_SIZE_CHANGE event"); + return; + } + + if (virJSONValueObjectGetNumberUlong(data, "size", &size) < 0) { + VIR_WARN("missing new size for '%s' in MEMORY_DEVICE_SIZE_CHANGE event", name); + return; + } + + + qemuMonitorEmitMemoryDeviceSizeChange(mon, name, size); +} + + static void qemuMonitorJSONHandleMemoryFailure(qemuMonitorPtr mon, virJSONValuePtr data) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index ebf42259b5..4ab085841a 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1937,6 +1937,47 @@ qemuProcessHandleMemoryFailure(qemuMonitorPtr mon G_GNUC_UNUSED, } +static int +qemuProcessHandleMemoryDeviceSizeChange(qemuMonitorPtr mon G_GNUC_UNUSED, + virDomainObjPtr vm, + const char *devAlias, + unsigned long long size, + void *opaque) +{ + virQEMUDriverPtr driver = opaque; + struct qemuProcessEvent *processEvent = NULL; + qemuMonitorMemoryDeviceSizeChangePtr info = NULL; + int ret = -1; + + virObjectLock(vm); + + VIR_DEBUG("Memory device '%s' changed size to '%llu' in domain '%s'", + devAlias, size, vm->def->name); + + info = g_new0(qemuMonitorMemoryDeviceSizeChange, 1); + info->devAlias = g_strdup(devAlias); + info->size = size; + + processEvent = g_new0(struct qemuProcessEvent, 1); + processEvent->eventType = QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE; + processEvent->vm = virObjectRef(vm); + processEvent->data = g_steal_pointer(&info); + + if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) { + qemuProcessEventFree(processEvent); + virObjectUnref(vm); + goto cleanup; + } + + processEvent = NULL; + ret = 0; + cleanup: + qemuProcessEventFree(processEvent); + virObjectUnlock(vm); + return ret; +} + + static qemuMonitorCallbacks monitorCallbacks = { .eofNotify = qemuProcessHandleMonitorEOF, .errorNotify = qemuProcessHandleMonitorError, @@ -1970,6 +2011,7 @@ static qemuMonitorCallbacks monitorCallbacks = { .domainRdmaGidStatusChanged = qemuProcessHandleRdmaGidStatusChanged, .domainGuestCrashloaded = qemuProcessHandleGuestCrashloaded, .domainMemoryFailure = qemuProcessHandleMemoryFailure, + .domainMemoryDeviceSizeChange = qemuProcessHandleMemoryDeviceSizeChange, }; static void -- 2.26.2

New event is introduced that is emitted whenever guest acknowledges allocation change request of a virtio-mem. The aim is to let applications know when that happens, because changes in allocation are not synchronous with issuing the request. Under the hood, the event is emitted whenever QEMU emits MEMORY_DEVICE_SIZE_CHANGE event. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- examples/c/misc/event-test.c | 17 ++++++ include/libvirt/libvirt-domain.h | 23 ++++++++ src/conf/domain_event.c | 84 +++++++++++++++++++++++++++++ src/conf/domain_event.h | 10 ++++ src/libvirt_private.syms | 2 + src/qemu/qemu_driver.c | 6 +++ src/remote/remote_daemon_dispatch.c | 30 +++++++++++ src/remote/remote_driver.c | 32 +++++++++++ src/remote/remote_protocol.x | 15 +++++- src/remote_protocol-structs | 7 +++ tools/virsh-domain.c | 20 +++++++ 11 files changed, 245 insertions(+), 1 deletion(-) diff --git a/examples/c/misc/event-test.c b/examples/c/misc/event-test.c index f164e825e1..e4b29ef67d 100644 --- a/examples/c/misc/event-test.c +++ b/examples/c/misc/event-test.c @@ -978,6 +978,22 @@ myDomainEventMemoryFailureCallback(virConnectPtr conn G_GNUC_UNUSED, } +static int +myDomainEventMemoryDeviceSizeChangeCallback(virConnectPtr conn G_GNUC_UNUSED, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque G_GNUC_UNUSED) +{ + /* Casts to uint64_t to work around mingw not knowing %lld */ + printf("%s EVENT: Domain %s(%d) memory device size change: " + "alias: '%s' new size %" PRIu64 "'\n", + __func__, virDomainGetName(dom), virDomainGetID(dom), + alias, (uint64_t)size); + return 0; +} + + static int myDomainEventMigrationIterationCallback(virConnectPtr conn G_GNUC_UNUSED, virDomainPtr dom, @@ -1109,6 +1125,7 @@ struct domainEventData domainEvents[] = { DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_METADATA_CHANGE, myDomainEventMetadataChangeCallback), DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD, myDomainEventBlockThresholdCallback), DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_MEMORY_FAILURE, myDomainEventMemoryFailureCallback), + DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE, myDomainEventMemoryDeviceSizeChangeCallback), }; struct storagePoolEventData { diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 8011cf9fe1..2c449b4f31 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -4651,6 +4651,28 @@ typedef void (*virConnectDomainEventMemoryFailureCallback)(virConnectPtr conn, unsigned int flags, void *opaque); +/** + * virConnectDomainEventMemoryDeviceSizeChangeCallback: + * @conn: connection object + * @dom: domain on which the event occurred + * @alias: memory device alias + * @size: new actual size of memory device (in KiB) + * @opaque: application specified data + * + * The callback occurs when the guest acknowledges request to change size of + * memory device (so far only virtio-mem model supports this). The @size then + * reflects the new amount of guest visible memory (in kibibytes). + * + * The callback signature to use when registering for an event of type + * VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE with + * virConnectDomainEventRegisterAny(). + */ +typedef void (*virConnectDomainEventMemoryDeviceSizeChangeCallback)(virConnectPtr conn, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque); + /** * VIR_DOMAIN_EVENT_CALLBACK: @@ -4695,6 +4717,7 @@ typedef enum { VIR_DOMAIN_EVENT_ID_METADATA_CHANGE = 23, /* virConnectDomainEventMetadataChangeCallback */ VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD = 24, /* virConnectDomainEventBlockThresholdCallback */ VIR_DOMAIN_EVENT_ID_MEMORY_FAILURE = 25, /* virConnectDomainEventMemoryFailureCallback */ + VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE = 26, /* virConnectDomainEventMemoryDeviceSizeChangeCallback */ # ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_EVENT_ID_LAST diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c index 726a792dae..31f9cc34f1 100644 --- a/src/conf/domain_event.c +++ b/src/conf/domain_event.c @@ -58,6 +58,7 @@ static virClassPtr virDomainEventDeviceRemovalFailedClass; static virClassPtr virDomainEventMetadataChangeClass; static virClassPtr virDomainEventBlockThresholdClass; static virClassPtr virDomainEventMemoryFailureClass; +static virClassPtr virDomainEventMemoryDeviceSizeChangeClass; static void virDomainEventDispose(void *obj); static void virDomainEventLifecycleDispose(void *obj); @@ -81,6 +82,7 @@ static void virDomainEventDeviceRemovalFailedDispose(void *obj); static void virDomainEventMetadataChangeDispose(void *obj); static void virDomainEventBlockThresholdDispose(void *obj); static void virDomainEventMemoryFailureDispose(void *obj); +static void virDomainEventMemoryDeviceSizeChangeDispose(void *obj); static void virDomainEventDispatchDefaultFunc(virConnectPtr conn, @@ -299,6 +301,15 @@ struct _virDomainEventMemoryFailure { typedef struct _virDomainEventMemoryFailure virDomainEventMemoryFailure; typedef virDomainEventMemoryFailure *virDomainEventMemoryFailurePtr; +struct _virDomainEventMemoryDeviceSizeChange { + virDomainEvent parent; + + char *alias; + unsigned long long size; +}; +typedef struct _virDomainEventMemoryDeviceSizeChange virDomainEventMemoryDeviceSizeChange; +typedef virDomainEventMemoryDeviceSizeChange *virDomainEventMemoryDeviceSizeChangePtr; + static int virDomainEventsOnceInit(void) { @@ -346,6 +357,8 @@ virDomainEventsOnceInit(void) return -1; if (!VIR_CLASS_NEW(virDomainEventMemoryFailure, virDomainEventClass)) return -1; + if (!VIR_CLASS_NEW(virDomainEventMemoryDeviceSizeChange, virDomainEventClass)) + return -1; return 0; } @@ -562,6 +575,14 @@ virDomainEventMemoryFailureDispose(void *obj) VIR_DEBUG("obj=%p", event); } +static void +virDomainEventMemoryDeviceSizeChangeDispose(void *obj) +{ + virDomainEventMemoryDeviceSizeChangePtr event = obj; + VIR_DEBUG("obj=%p", event); + + g_free(event->alias); +} static void * virDomainEventNew(virClassPtr klass, @@ -1686,6 +1707,57 @@ virDomainEventMemoryFailureNewFromDom(virDomainPtr dom, recipient, action, flags); } + +static virObjectEventPtr +virDomainEventMemoryDeviceSizeChangeNew(int id, + const char *name, + unsigned char *uuid, + const char *alias, + unsigned long long size) +{ + virDomainEventMemoryDeviceSizeChangePtr ev; + + if (virDomainEventsInitialize() < 0) + return NULL; + + if (!(ev = virDomainEventNew(virDomainEventMemoryDeviceSizeChangeClass, + VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE, + id, name, uuid))) + return NULL; + + ev->alias = g_strdup(alias); + ev->size = size; + + return (virObjectEventPtr)ev; +} + + +virObjectEventPtr +virDomainEventMemoryDeviceSizeChangeNewFromObj(virDomainObjPtr obj, + const char *alias, + unsigned long long size) +{ + return virDomainEventMemoryDeviceSizeChangeNew(obj->def->id, + obj->def->name, + obj->def->uuid, + alias, + size); +} + + +virObjectEventPtr +virDomainEventMemoryDeviceSizeChangeNewFromDom(virDomainPtr dom, + const char *alias, + unsigned long long size) +{ + return virDomainEventMemoryDeviceSizeChangeNew(dom->id, + dom->name, + dom->uuid, + alias, + size); +} + + static void virDomainEventDispatchDefaultFunc(virConnectPtr conn, virObjectEventPtr event, @@ -1982,6 +2054,18 @@ virDomainEventDispatchDefaultFunc(virConnectPtr conn, goto cleanup; } + case VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE: + { + virDomainEventMemoryDeviceSizeChangePtr memoryDeviceSizeChangeEvent; + + memoryDeviceSizeChangeEvent = (virDomainEventMemoryDeviceSizeChangePtr)event; + ((virConnectDomainEventMemoryDeviceSizeChangeCallback)cb)(conn, dom, + memoryDeviceSizeChangeEvent->alias, + memoryDeviceSizeChangeEvent->size, + cbopaque); + goto cleanup; + } + case VIR_DOMAIN_EVENT_ID_LAST: break; } diff --git a/src/conf/domain_event.h b/src/conf/domain_event.h index 13a1c56ce1..717d1c7307 100644 --- a/src/conf/domain_event.h +++ b/src/conf/domain_event.h @@ -267,6 +267,16 @@ virDomainEventMemoryFailureNewFromDom(virDomainPtr dom, int action, unsigned int flags); +virObjectEventPtr +virDomainEventMemoryDeviceSizeChangeNewFromObj(virDomainObjPtr obj, + const char *alias, + unsigned long long size); + +virObjectEventPtr +virDomainEventMemoryDeviceSizeChangeNewFromDom(virDomainPtr dom, + const char *alias, + unsigned long long size); + int virDomainEventStateRegister(virConnectPtr conn, virObjectEventStatePtr state, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 4f362e3c54..e4c8dd7c1e 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -708,6 +708,8 @@ virDomainEventLifecycleNew; virDomainEventLifecycleNewFromDef; virDomainEventLifecycleNewFromDom; virDomainEventLifecycleNewFromObj; +virDomainEventMemoryDeviceSizeChangeNewFromDom; +virDomainEventMemoryDeviceSizeChangeNewFromObj; virDomainEventMemoryFailureNewFromDom; virDomainEventMemoryFailureNewFromObj; virDomainEventMetadataChangeNewFromDom; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 995faf2abc..0d12939021 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4285,6 +4285,7 @@ processMemoryDeviceSizeChange(virQEMUDriverPtr driver, qemuMonitorMemoryDeviceSizeChangePtr info) { virDomainMemoryDefPtr mem = NULL; + virObjectEventPtr event = NULL; if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) return; @@ -4302,8 +4303,13 @@ processMemoryDeviceSizeChange(virQEMUDriverPtr driver, mem->actualsize = VIR_DIV_UP(info->size, 1024); + event = virDomainEventMemoryDeviceSizeChangeNewFromObj(vm, + info->devAlias, + mem->actualsize); + endjob: qemuDomainObjEndJob(driver, vm); + virObjectEventStateQueue(driver->domainEventState, event); } diff --git a/src/remote/remote_daemon_dispatch.c b/src/remote/remote_daemon_dispatch.c index e9f2a0ce5b..1ea79d0cf9 100644 --- a/src/remote/remote_daemon_dispatch.c +++ b/src/remote/remote_daemon_dispatch.c @@ -1333,6 +1333,35 @@ remoteRelayDomainEventMemoryFailure(virConnectPtr conn, } +static int +remoteRelayDomainEventMemoryDeviceSizeChange(virConnectPtr conn, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque) +{ + daemonClientEventCallbackPtr callback = opaque; + remote_domain_event_memory_device_size_change_msg data; + + if (callback->callbackID < 0 || + !remoteRelayDomainEventCheckACL(callback->client, conn, dom)) + return -1; + + /* build return data */ + memset(&data, 0, sizeof(data)); + data.callbackID = callback->callbackID; + data.alias = g_strdup(alias); + data.size = size; + make_nonnull_domain(&data.dom, dom); + + remoteDispatchObjectEventSend(callback->client, remoteProgram, + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE, + (xdrproc_t)xdr_remote_domain_event_memory_device_size_change_msg, + &data); + return 0; +} + + static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventLifecycle), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventReboot), @@ -1360,6 +1389,7 @@ static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMetadataChange), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventBlockThreshold), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMemoryFailure), + VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMemoryDeviceSizeChange), }; G_STATIC_ASSERT(G_N_ELEMENTS(domainEventCallbacks) == VIR_DOMAIN_EVENT_ID_LAST); diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index a83cd866e7..7155179744 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -410,6 +410,10 @@ remoteDomainBuildEventMemoryFailure(virNetClientProgramPtr prog, void *evdata, void *opaque); static void +remoteDomainBuildEventMemoryDeviceSizeChange(virNetClientProgramPtr prog, + virNetClientPtr client, + void *evdata, void *opaque); +static void remoteConnectNotifyEventConnectionClosed(virNetClientProgramPtr prog G_GNUC_UNUSED, virNetClientPtr client G_GNUC_UNUSED, void *evdata, void *opaque); @@ -624,6 +628,10 @@ static virNetClientProgramEvent remoteEvents[] = { remoteDomainBuildEventMemoryFailure, sizeof(remote_domain_event_memory_failure_msg), (xdrproc_t)xdr_remote_domain_event_memory_failure_msg }, + { REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE, + remoteDomainBuildEventMemoryDeviceSizeChange, + sizeof(remote_domain_event_memory_device_size_change_msg), + (xdrproc_t)xdr_remote_domain_event_memory_device_size_change_msg }, }; static void @@ -5463,6 +5471,30 @@ remoteDomainBuildEventMemoryFailure(virNetClientProgramPtr prog G_GNUC_UNUSED, } +static void +remoteDomainBuildEventMemoryDeviceSizeChange(virNetClientProgramPtr prog G_GNUC_UNUSED, + virNetClientPtr client G_GNUC_UNUSED, + void *evdata, void *opaque) +{ + virConnectPtr conn = opaque; + remote_domain_event_memory_device_size_change_msg *msg = evdata; + struct private_data *priv = conn->privateData; + virDomainPtr dom; + virObjectEventPtr event = NULL; + + if (!(dom = get_nonnull_domain(conn, msg->dom))) + return; + + event = virDomainEventMemoryDeviceSizeChangeNewFromDom(dom, + msg->alias, + msg->size); + + virObjectUnref(dom); + + virObjectEventStateQueueRemote(priv->eventState, event, msg->callbackID); +} + + static int remoteStreamSend(virStreamPtr st, const char *data, diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index d3724bc305..d154bde7db 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -3812,6 +3812,13 @@ struct remote_domain_get_messages_ret { }; +struct remote_domain_event_memory_device_size_change_msg { + int callbackID; + remote_nonnull_domain dom; + remote_nonnull_string alias; + unsigned hyper size; +}; + /*----- Protocol. -----*/ /* Define the program number, protocol version and procedure numbers here. */ @@ -6733,5 +6740,11 @@ enum remote_procedure { * @generate: none * @acl: domain:read */ - REMOTE_PROC_DOMAIN_GET_MESSAGES = 426 + REMOTE_PROC_DOMAIN_GET_MESSAGES = 426, + + /** + * @generate: both + * @acl: none + */ + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE = 427 }; diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index c0c034ac6a..37120be91b 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -3172,6 +3172,12 @@ struct remote_domain_get_messages_ret { remote_nonnull_string * msgs_val; } msgs; }; +struct remote_domain_event_memory_device_size_change_msg { + int callbackID; + remote_nonnull_domain dom; + remote_nonnull_string alias; + uint64_t size; +}; enum remote_procedure { REMOTE_PROC_CONNECT_OPEN = 1, REMOTE_PROC_CONNECT_CLOSE = 2, @@ -3599,4 +3605,5 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_AUTHORIZED_SSH_KEYS_GET = 424, REMOTE_PROC_DOMAIN_AUTHORIZED_SSH_KEYS_SET = 425, REMOTE_PROC_DOMAIN_GET_MESSAGES = 426, + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE = 427, }; diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index d40995f44d..49ddd69d1b 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -13503,6 +13503,24 @@ virshEventMemoryFailurePrint(virConnectPtr conn G_GNUC_UNUSED, } +static void +virshEventMemoryDeviceSizeChangePrint(virConnectPtr conn G_GNUC_UNUSED, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque) +{ + g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER; + + virBufferAsprintf(&buf, + _("event 'memory-device-size-change' for domain '%s':\n" + "alias: %s\nsize: %llu\n"), + virDomainGetName(dom), alias, size); + + virshEventPrint(opaque, &buf); +} + + virshDomainEventCallback virshDomainEventCallbacks[] = { { "lifecycle", VIR_DOMAIN_EVENT_CALLBACK(virshEventLifecyclePrint), }, @@ -13554,6 +13572,8 @@ virshDomainEventCallback virshDomainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(virshEventBlockThresholdPrint), }, { "memory-failure", VIR_DOMAIN_EVENT_CALLBACK(virshEventMemoryFailurePrint), }, + { "memory-device-size-change", + VIR_DOMAIN_EVENT_CALLBACK(virshEventMemoryDeviceSizeChangePrint), }, }; G_STATIC_ASSERT(VIR_DOMAIN_EVENT_ID_LAST == G_N_ELEMENTS(virshDomainEventCallbacks)); -- 2.26.2

On Thu, Feb 18, 2021 at 14:31:04 +0100, Michal Privoznik wrote:
New event is introduced that is emitted whenever guest acknowledges allocation change request of a virtio-mem. The aim is to let applications know when that happens, because changes in allocation are not synchronous with issuing the request. Under the hood, the event is emitted whenever QEMU emits MEMORY_DEVICE_SIZE_CHANGE event.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- examples/c/misc/event-test.c | 17 ++++++ include/libvirt/libvirt-domain.h | 23 ++++++++ src/conf/domain_event.c | 84 +++++++++++++++++++++++++++++ src/conf/domain_event.h | 10 ++++ src/libvirt_private.syms | 2 + src/qemu/qemu_driver.c | 6 +++ src/remote/remote_daemon_dispatch.c | 30 +++++++++++ src/remote/remote_driver.c | 32 +++++++++++ src/remote/remote_protocol.x | 15 +++++- src/remote_protocol-structs | 7 +++ tools/virsh-domain.c | 20 +++++++ 11 files changed, 245 insertions(+), 1 deletion(-)
[...]
diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 8011cf9fe1..2c449b4f31 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -4651,6 +4651,28 @@ typedef void (*virConnectDomainEventMemoryFailureCallback)(virConnectPtr conn, unsigned int flags, void *opaque);
+/** + * virConnectDomainEventMemoryDeviceSizeChangeCallback: + * @conn: connection object + * @dom: domain on which the event occurred + * @alias: memory device alias + * @size: new actual size of memory device (in KiB)
I'd specifically call this actualsize or something to maybe add one more layer to avoid confuision with the 'size'
+ * @opaque: application specified data + * + * The callback occurs when the guest acknowledges request to change size of + * memory device (so far only virtio-mem model supports this). The @size then + * reflects the new amount of guest visible memory (in kibibytes). + * + * The callback signature to use when registering for an event of type + * VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE with + * virConnectDomainEventRegisterAny(). + */ +typedef void (*virConnectDomainEventMemoryDeviceSizeChangeCallback)(virConnectPtr conn, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque); +
/** * VIR_DOMAIN_EVENT_CALLBACK:
But I don't think it needs to be changed in the internals though.

If the QEMU driver restarts it loses the track of the actual size of virtio-mem (because it's runtime type of information and thus not stored in XML) and therefore, we have to refresh it when reconnecting to the domain monitor. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 37 ++++++++++++++++++++---- src/qemu/qemu_monitor.h | 3 ++ src/qemu/qemu_monitor_json.c | 56 ++++++++++++++++++++++-------------- src/qemu/qemu_process.c | 3 ++ 4 files changed, 72 insertions(+), 27 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 7de9f8c050..3275514908 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -8096,9 +8096,21 @@ qemuDomainUpdateMemoryDeviceInfo(virQEMUDriverPtr driver, return -1; } - /* if qemu doesn't support the info request, just carry on */ - if (rc == -2) + /* If qemu doesn't support the info request, just carry on, unless we + * really need it. */ + if (rc == -2) { + for (i = 0; i < vm->def->nmems; i++) { + virDomainMemoryDefPtr mem = vm->def->mems[i]; + + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("qemu did not return info on vitio-mem device")); + return -1; + } + } + return 0; + } if (rc < 0) return -1; @@ -8113,9 +8125,24 @@ qemuDomainUpdateMemoryDeviceInfo(virQEMUDriverPtr driver, if (!(dimm = virHashLookup(meminfo, mem->info.alias))) continue; - mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM; - mem->info.addr.dimm.slot = dimm->slot; - mem->info.addr.dimm.base = dimm->address; + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + mem->actualsize = VIR_DIV_UP(dimm->size, 1024); + break; + + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM; + mem->info.addr.dimm.slot = dimm->slot; + mem->info.addr.dimm.base = dimm->address; + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + /* nada */ + break; + } } virHashFree(meminfo); diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 53f2aa9600..c369b8c6ef 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1393,10 +1393,13 @@ typedef struct _qemuMonitorMemoryDeviceInfo qemuMonitorMemoryDeviceInfo; typedef qemuMonitorMemoryDeviceInfo *qemuMonitorMemoryDeviceInfoPtr; struct _qemuMonitorMemoryDeviceInfo { + /* For pc-dimm */ unsigned long long address; unsigned int slot; bool hotplugged; bool hotpluggable; + /* For virtio-mem */ + unsigned long long size; /* in bytes */ }; int qemuMonitorGetMemoryDeviceInfo(qemuMonitorPtr mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 9ce7ba52ba..09e898442d 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -8238,7 +8238,6 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitorPtr mon, virJSONValuePtr cmd; virJSONValuePtr reply = NULL; virJSONValuePtr data = NULL; - qemuMonitorMemoryDeviceInfoPtr meminfo = NULL; size_t i; if (!(cmd = qemuMonitorJSONMakeCommand("query-memory-devices", NULL))) @@ -8259,6 +8258,9 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitorPtr mon, for (i = 0; i < virJSONValueArraySize(data); i++) { virJSONValuePtr elem = virJSONValueArrayGet(data, i); + g_autofree qemuMonitorMemoryDeviceInfoPtr meminfo = NULL; + virJSONValuePtr dimminfo; + const char *devalias; const char *type; if (!(type = virJSONValueObjectGetString(elem, "type"))) { @@ -8268,26 +8270,17 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitorPtr mon, goto cleanup; } + if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("query-memory-devices reply data doesn't " + "contain enum data")); + goto cleanup; + } + + meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1); + /* dimm memory devices */ if (STREQ(type, "dimm")) { - virJSONValuePtr dimminfo; - const char *devalias; - - if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("query-memory-devices reply data doesn't " - "contain enum data")); - goto cleanup; - } - - if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("dimm memory info data is missing 'id'")); - goto cleanup; - } - - meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1); - if (virJSONValueObjectGetNumberUlong(dimminfo, "addr", &meminfo->address) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -8318,17 +8311,36 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitorPtr mon, } - if (virHashAddEntry(info, devalias, meminfo) < 0) + } else if (STREQ(type, "virtio-mem")) { + /* While 'id' attribute is marked as optional in QEMU's QAPI + * specification, Libvirt always sets it. Thus we can fail if not + * present. */ + if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("dimm memory info data is missing 'id'")); goto cleanup; + } - meminfo = NULL; + if (virJSONValueObjectGetNumberUlong(dimminfo, "size", + &meminfo->size) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("malformed/missing slot in dimm memory info")); + goto cleanup; + } + } else { + /* type not handled yet */ + continue; } + + if (virHashAddEntry(info, devalias, meminfo) < 0) + goto cleanup; + + meminfo = NULL; } ret = 0; cleanup: - VIR_FREE(meminfo); virJSONValueFree(cmd); virJSONValueFree(reply); return ret; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 4ab085841a..5881511218 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -8523,6 +8523,9 @@ qemuProcessReconnect(void *opaque) qemuDomainVcpuPersistOrder(obj->def); + if (qemuDomainUpdateMemoryDeviceInfo(driver, obj, QEMU_ASYNC_JOB_NONE) < 0) + goto error; + if (qemuProcessDetectIOThreadPIDs(driver, obj, QEMU_ASYNC_JOB_NONE) < 0) goto error; -- 2.26.2

On Thu, Feb 18, 2021 at 14:31:05 +0100, Michal Privoznik wrote:
If the QEMU driver restarts it loses the track of the actual size of virtio-mem (because it's runtime type of information and thus not stored in XML) and therefore, we have to refresh it when reconnecting to the domain monitor.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 37 ++++++++++++++++++++---- src/qemu/qemu_monitor.h | 3 ++ src/qemu/qemu_monitor_json.c | 56 ++++++++++++++++++++++-------------- src/qemu/qemu_process.c | 3 ++ 4 files changed, 72 insertions(+), 27 deletions(-)
[...]
int qemuMonitorGetMemoryDeviceInfo(qemuMonitorPtr mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 9ce7ba52ba..09e898442d 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -8238,7 +8238,6 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitorPtr mon, virJSONValuePtr cmd; virJSONValuePtr reply = NULL; virJSONValuePtr data = NULL; - qemuMonitorMemoryDeviceInfoPtr meminfo = NULL; size_t i;
if (!(cmd = qemuMonitorJSONMakeCommand("query-memory-devices", NULL))) @@ -8259,6 +8258,9 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitorPtr mon,
for (i = 0; i < virJSONValueArraySize(data); i++) { virJSONValuePtr elem = virJSONValueArrayGet(data, i); + g_autofree qemuMonitorMemoryDeviceInfoPtr meminfo = NULL; + virJSONValuePtr dimminfo; + const char *devalias; const char *type;
../../../libvirt/src/qemu/qemu_monitor_json.c:8335:13: error: ‘devalias’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 8335 | if (virHashAddEntry(info, devalias, meminfo) < 0) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

New 'update-memory' command is introduced which aims on making it user friendly to change <memory/> device. So far I just need to change <requested/> so I'm introducing --requested-size only; but the idea is that this is extensible for other cases too. For instance, want to change <myElement/>? A new --my-element argument can be easily introduced. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/manpages/virsh.rst | 30 ++++++++ tools/virsh-domain.c | 150 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 180 insertions(+) diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst index e3afa48f7b..ea26a47fd7 100644 --- a/docs/manpages/virsh.rst +++ b/docs/manpages/virsh.rst @@ -4891,6 +4891,36 @@ results as some fields may be autogenerated and thus match devices other than expected. +update-memory-device +-------------------- + +**Syntax:** + +:: + + update-memory-device domain [--print-xml] [--alias alias] + [[--live] [--config] | [--current]] + [--requested-size size] + +This command finds ``<memory/>`` device inside given *domain*, changes +requested values and passes updated device XML to daemon. If *--print-xml* is +specified then the device is not changed, but the updated device XML is printed +to stdout. If there are more than one ``<memory/>`` devices in *domain* use +*--alias* to select the desired one. + +If *--live* is specified, affect a running domain. +If *--config* is specified, affect the next startup of a persistent guest. +If *--current* is specified, it is equivalent to either *--live* or +*--config*, depending on the current state of the guest. +Both *--live* and *--config* flags may be given, but *--current* is +exclusive. Not specifying any flag is the same as specifying *--current*. + +If *--requested-size* is specified then ``<requested/>`` under memory target is +changed to requested *size* (as scaled integer, see ``NOTES`` above). It +defaults to kibibytes if no suffix is provided. The option is valid only for +``virtio-mem`` memory device model. + + change-media ------------ diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 49ddd69d1b..b6e27e0e63 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -9129,6 +9129,150 @@ cmdSetmaxmem(vshControl *ctl, const vshCmd *cmd) return ret; } + +/* + * "update-memory-device" command + */ +static const vshCmdInfo info_update_memory_device[] = { + {.name = "help", + .data = N_("update memory device of a domain") + }, + {.name = "desc", + .data = N_("Update values of a memory device of a domain") + }, + {.name = NULL} +}; + +static const vshCmdOptDef opts_update_memory_device[] = { + VIRSH_COMMON_OPT_DOMAIN_FULL(0), + VIRSH_COMMON_OPT_DOMAIN_CONFIG, + VIRSH_COMMON_OPT_DOMAIN_LIVE, + VIRSH_COMMON_OPT_DOMAIN_CURRENT, + {.name = "print-xml", + .type = VSH_OT_BOOL, + .help = N_("print updated memory device XML instead of executing the change") + }, + {.name = "alias", + .type = VSH_OT_STRING, + .completer = virshDomainDeviceAliasCompleter, + .help = N_("memory device alias"), + }, + {.name = "requested-size", + .type = VSH_OT_INT, + .help = N_("new value of <requested/> size, as scaled integer (default KiB)") + }, + {.name = NULL} +}; + +static int +virshGetUpdatedMemoryXML(char **updatedMemoryXML, + vshControl *ctl, + const vshCmd *cmd, + virDomainPtr dom, + unsigned int flags) +{ + const char *alias = NULL; + g_autoptr(xmlDoc) doc = NULL; + g_autoptr(xmlXPathContext) ctxt = NULL; + g_autofree char *xpath = NULL; + int nmems; + g_autofree xmlNodePtr *mems = NULL; + g_autoptr(xmlBuffer) xmlbuf = NULL; + unsigned int domainXMLFlags = 0; + + if (flags & VIR_DOMAIN_AFFECT_CONFIG) + domainXMLFlags |= VIR_DOMAIN_XML_INACTIVE; + + if (virshDomainGetXMLFromDom(ctl, dom, domainXMLFlags, &doc, &ctxt) < 0) + return -1; + + if (vshCommandOptStringReq(ctl, cmd, "alias", &alias) < 0) + return -1; + + if (alias) { + xpath = g_strdup_printf("/domain/devices/memory[./alias/@name='%s']", alias); + } else { + xpath = g_strdup("/domain/devices/memory"); + } + + nmems = virXPathNodeSet(xpath, ctxt, &mems); + if (nmems < 0) { + vshSaveLibvirtError(); + return -1; + } else if (nmems == 0) { + vshError(ctl, _("no memory device found")); + return -1; + } else if (nmems > 1) { + vshError(ctl, _("multiple memory devices found, use --alias to select one")); + return -1; + } + + ctxt->node = mems[0]; + + if (vshCommandOptBool(cmd, "requested-size")) { + xmlNodePtr requestedSizeNode; + g_autofree char *kibibytesStr = NULL; + unsigned long long bytes = 0; + unsigned long kibibytes = 0; + + if (vshCommandOptScaledInt(ctl, cmd, "requested-size", &bytes, 1024, ULLONG_MAX) < 0) + return -1; + kibibytes = VIR_DIV_UP(bytes, 1024); + + requestedSizeNode = virXPathNode("./target/requested", ctxt); + + if (!requestedSizeNode) { + vshError(ctl, _("virtio-mem device is missing <requested/>")); + return -1; + } + + kibibytesStr = g_strdup_printf("%lu", kibibytes); + xmlNodeSetContent(requestedSizeNode, BAD_CAST kibibytesStr); + } + + if (!(*updatedMemoryXML = virXMLNodeToString(doc, mems[0]))) { + vshSaveLibvirtError(); + return -1; + } + + return 0; +} + +static bool +cmdUpdateMemoryDevice(vshControl *ctl, const vshCmd *cmd) +{ + g_autoptr(virshDomain) dom = NULL; + bool config = vshCommandOptBool(cmd, "config"); + bool live = vshCommandOptBool(cmd, "live"); + bool current = vshCommandOptBool(cmd, "current"); + g_autofree char *updatedMemoryXML = NULL; + unsigned int flags = VIR_DOMAIN_AFFECT_CURRENT; + + VSH_EXCLUSIVE_OPTIONS_VAR(current, live); + VSH_EXCLUSIVE_OPTIONS_VAR(current, config); + + if (config) + flags |= VIR_DOMAIN_AFFECT_CONFIG; + if (live) + flags |= VIR_DOMAIN_AFFECT_LIVE; + + if (!(dom = virshCommandOptDomain(ctl, cmd, NULL))) + return false; + + if (virshGetUpdatedMemoryXML(&updatedMemoryXML, ctl, cmd, dom, flags) < 0) + return false; + + if (vshCommandOptBool(cmd, "print-xml")) { + vshPrint(ctl, "%s", updatedMemoryXML); + } else { + if (virDomainUpdateDeviceFlags(dom, updatedMemoryXML, flags) < 0) + return false; + } + + return true; +} + + /* * "memtune" command */ @@ -15016,6 +15160,12 @@ const vshCmdDef domManagementCmds[] = { .info = info_update_device, .flags = 0 }, + {.name = "update-memory-device", + .handler = cmdUpdateMemoryDevice, + .opts = opts_update_memory_device, + .info = info_update_memory_device, + .flags = 0 + }, {.name = "vcpucount", .handler = cmdVcpucount, .opts = opts_vcpucount, -- 2.26.2

On Thu, Feb 18, 2021 at 14:31:06 +0100, Michal Privoznik wrote: s/update-memory/update-memory-device/ in subject
New 'update-memory' command is introduced which aims on making it
and here.
user friendly to change <memory/> device. So far I just need to change <requested/> so I'm introducing --requested-size only; but the idea is that this is extensible for other cases too. For instance, want to change <myElement/>? A new --my-element argument can be easily introduced.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- NEWS.rst | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/NEWS.rst b/NEWS.rst index 440dbf2601..abb3f5d867 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -47,6 +47,13 @@ v7.1.0 (unreleased) to set the hostdev device's MAC address (which is a necessary part of the alternate <interface type='hostdev'>). + * Introduce virtio-mem ``<memory/>`` model + + New virtio-mem model is introduced for ``<memory/>`` device which is a + paravirtualized mechanism of adding/removing memory to/from a VM. Use + ``virDomainUpdateDeviceFlags()`` API to adjust amount of memory or ``virsh + update-memory-device`` for convenience. + * **Improvements** * **Bug fixes** -- 2.26.2

This commit adds new memorydevices.rst page which should serve all models of memory devices. Yet, I'm documenting virtio-mem quirks only. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 160 +++++++++++++++++++++++++++++++++++ docs/kbase/meson.build | 1 + 3 files changed, 165 insertions(+) create mode 100644 docs/kbase/memorydevices.rst diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst index 532804fe05..45450bf33b 100644 --- a/docs/kbase/index.rst +++ b/docs/kbase/index.rst @@ -46,6 +46,10 @@ Usage `PCI topology <../pci-addresses.html>`__ Addressing schemes for PCI devices +`Memory devices <memorydevices.html>`__ + Memory devices and their use + + Internals / Debugging --------------------- diff --git a/docs/kbase/memorydevices.rst b/docs/kbase/memorydevices.rst new file mode 100644 index 0000000000..23adf54e16 --- /dev/null +++ b/docs/kbase/memorydevices.rst @@ -0,0 +1,160 @@ +============== +Memory devices +============== + +.. contents:: + +Basics +====== + +Memory devices can be divided into two families: DIMMs and NVDIMMs. The former +is typical RAM memory: it's volatile and thus its contents doesn't survive +reboots nor guest shut downs and power ons. The latter retains its contents +across reboots or power outages. + +In Libvirt, there are two models for DIMMs: + +* ``dimm`` model: + + :: + + <memory model='dimm'> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-mem`` model: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>1048576</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>524288</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </memory> + +Then there are two models for NVDIMMs: + +* ``nvidmm`` model: + + :: + + <memory model='nvdimm'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-pmem`` model: + + :: + + <memory model='virtio-pmem' access='shared'> + <source> + <path>/tmp/virtio_pmem</path> + </source> + <target> + <size unit='KiB'>524288</size> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </memory> + + +Please not that (maybe somewhat surprisingly) virtio models go onto PCI bus +instead of DIMM slots. + +Furthermore, DIMMs can have ``<source/>`` element which configures backend for +devices. For NVDIMMs the element is mandatory and reflects where the contents +is saved. + +See https://libvirt.org/formatdomain.html#elementsMemory + +``virtio-mem`` model +==================== + +The ``virtio-mem`` model can be viewed as revised memory balloon. It offers +memory hotplug and hotunplug solution (without the actual hotplug of the +device). It solves problems that memory balloon can't solve on its own and thus +is more flexible than DIMM + balloon solution. ``virtio-mem`` is NUMA aware, +and thus memory can be inflated/deflated only for a subset of guest NUMA nodes. +Also, it works with chunks that are either exposed to guest or taken back from +it. + +See https://virtio-mem.gitlab.io/ + +Under the hood, ``virtio-mem`` device is split into chunks of equal size which +are then exposed to the guest. Either all of them or only a portion depending +on user's request. Therefore there are three important sizes for +``virtio-mem``. All are to be found under ``<target/>`` element: + +#. The maximum size the device can ever offer, exposed under ``<size/>`` +#. The size a single block, exposed under ``<block/>`` +#. The current size exposed to the guest, exposed under ``<requested/>`` + +For instance, the following example the maximum size is 4GiB, the block size is +2MiB and only 1GiB should be exposed to the guest: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>4194304</size> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + </memory> + +Please note that ``<requested/>`` must be an integer multiple of ``<block/>`` +size or zero (memory completely deflated) and has to be less or equal to +``<size/>`` (memory completely inflated). Furthermore, QEMU recommends the +``<block/>`` size to be as big as a Transparent Huge Page (usually 2MiB). + +To change the size exposed to the guest, users should pass memory device XML +with nothing but ``<requested/>`` changed into the +``virDomainUpdateDeviceFlags()`` API. For user's convenience this can be done +via virsh too: + + :: + + # virsh update-memory-device $dom --requested size 2GiB + +If there are two or more ``<memory/>`` devices then ``--alias`` shall be used +to tell virsh which memory device should be updated. + +For running guests there is fourth size that can be found under ``<target/>``: + + :: + + <actual unit='KiB'>2097152</actual> + +The ``<actual/>`` reflects the actual size consumed by the guest. In general it +can differ from ``<requested/>``. Reasons include guest kernel missing +``virtio-mem`` module and thus being unable to take offered memory, or guest +kernel being unable to free memory and allow deflation. Since ``<actual/>`` +only reports size to users, the element is never parsed. It is formatted only +into live XML. + +Since changing actual allocation requires cooperation with guest kernel, +requests for change are not instant. Therefore, libvirt emits +``VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE`` event whenever actual +allocation changed. + +Please not that using ``virtio-mem`` with memory balloon is not possible, +currently. The real reason is that libvirt's memory accounting isn't ready and +mixing these two would be confusing to users. Libvirt exposes current value of +memory balloon under ``<currentMemory/>`` but if it were to account for +``<actual/>`` too then it would be impossible to learn true size of the +balloon. Also it might result in mistakenly trying to deflate ``virtio-mem`` +via ``setmem`` command. diff --git a/docs/kbase/meson.build b/docs/kbase/meson.build index 7b4e7abbd3..7f6adba915 100644 --- a/docs/kbase/meson.build +++ b/docs/kbase/meson.build @@ -9,6 +9,7 @@ docs_kbase_files = [ 'locking-lockd', 'locking', 'locking-sanlock', + 'memorydevices', 'migrationinternals', 'qemu-passthrough-security', 'rpm-deployment', -- 2.26.2

On Thu, Feb 18, 2021 at 14:31:08 +0100, Michal Privoznik wrote:
This commit adds new memorydevices.rst page which should serve all models of memory devices. Yet, I'm documenting virtio-mem quirks only.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 160 +++++++++++++++++++++++++++++++++++ docs/kbase/meson.build | 1 + 3 files changed, 165 insertions(+) create mode 100644 docs/kbase/memorydevices.rst
diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst index 532804fe05..45450bf33b 100644 --- a/docs/kbase/index.rst +++ b/docs/kbase/index.rst @@ -46,6 +46,10 @@ Usage `PCI topology <../pci-addresses.html>`__ Addressing schemes for PCI devices
+`Memory devices <memorydevices.html>`__ + Memory devices and their use + + Internals / Debugging ---------------------
diff --git a/docs/kbase/memorydevices.rst b/docs/kbase/memorydevices.rst new file mode 100644 index 0000000000..23adf54e16 --- /dev/null +++ b/docs/kbase/memorydevices.rst @@ -0,0 +1,160 @@ +============== +Memory devices +============== + +.. contents:: + +Basics +====== + +Memory devices can be divided into two families: DIMMs and NVDIMMs. The former +is typical RAM memory: it's volatile and thus its contents doesn't survive +reboots nor guest shut downs and power ons. The latter retains its contents +across reboots or power outages. + +In Libvirt, there are two models for DIMMs: + +* ``dimm`` model: + + :: + + <memory model='dimm'> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-mem`` model: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>1048576</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>524288</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </memory> + +Then there are two models for NVDIMMs: + +* ``nvidmm`` model: + + :: + + <memory model='nvdimm'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-pmem`` model: + + :: + + <memory model='virtio-pmem' access='shared'> + <source> + <path>/tmp/virtio_pmem</path> + </source> + <target> + <size unit='KiB'>524288</size> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </memory> + + +Please not that (maybe somewhat surprisingly) virtio models go onto PCI bus
s/not/note/
+instead of DIMM slots. + +Furthermore, DIMMs can have ``<source/>`` element which configures backend for +devices. For NVDIMMs the element is mandatory and reflects where the contents +is saved. + +See https://libvirt.org/formatdomain.html#elementsMemory
Please use a relative link inside of the libvirt project. You might need to enclose it.
+``virtio-mem`` model +==================== + +The ``virtio-mem`` model can be viewed as revised memory balloon. It offers +memory hotplug and hotunplug solution (without the actual hotplug of the
I'd say 'adding and removing of memory without actual hotplug or unplug of devices'.
+device). It solves problems that memory balloon can't solve on its own and thus +is more flexible than DIMM + balloon solution. ``virtio-mem`` is NUMA aware, +and thus memory can be inflated/deflated only for a subset of guest NUMA nodes. +Also, it works with chunks that are either exposed to guest or taken back from +it. + +See https://virtio-mem.gitlab.io/ + +Under the hood, ``virtio-mem`` device is split into chunks of equal size which +are then exposed to the guest. Either all of them or only a portion depending +on user's request. Therefore there are three important sizes for +``virtio-mem``. All are to be found under ``<target/>`` element: + +#. The maximum size the device can ever offer, exposed under ``<size/>`` +#. The size a single block, exposed under ``<block/>``
size of a
+#. The current size exposed to the guest, exposed under ``<requested/>`` + +For instance, the following example the maximum size is 4GiB, the block size is +2MiB and only 1GiB should be exposed to the guest: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>4194304</size> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + </memory> + +Please note that ``<requested/>`` must be an integer multiple of ``<block/>`` +size or zero (memory completely deflated) and has to be less or equal to +``<size/>`` (memory completely inflated). Furthermore, QEMU recommends the
I'd avoid inflated/deflated since it has exactly the opposite meaning with memballoon.
+``<block/>`` size to be as big as a Transparent Huge Page (usually 2MiB). + +To change the size exposed to the guest, users should pass memory device XML +with nothing but ``<requested/>`` changed into the +``virDomainUpdateDeviceFlags()`` API. For user's convenience this can be done +via virsh too: + + :: + + # virsh update-memory-device $dom --requested size 2GiB + +If there are two or more ``<memory/>`` devices then ``--alias`` shall be used +to tell virsh which memory device should be updated. + +For running guests there is fourth size that can be found under ``<target/>``: + + :: + + <actual unit='KiB'>2097152</actual> + +The ``<actual/>`` reflects the actual size consumed by the guest. In general it
s/consumed/used/
+can differ from ``<requested/>``. Reasons include guest kernel missing +``virtio-mem`` module and thus being unable to take offered memory, or guest +kernel being unable to free memory and allow deflation. Since ``<actual/>``
s/and allow deflation//
+only reports size to users, the element is never parsed. It is formatted only +into live XML. + +Since changing actual allocation requires cooperation with guest kernel, +requests for change are not instant. Therefore, libvirt emits +``VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE`` event whenever actual +allocation changed. + +Please not that using ``virtio-mem`` with memory balloon is not possible, +currently. The real reason is that libvirt's memory accounting isn't ready and +mixing these two would be confusing to users. Libvirt exposes current value of +memory balloon under ``<currentMemory/>`` but if it were to account for +``<actual/>`` too then it would be impossible to learn true size of the +balloon. Also it might result in mistakenly trying to deflate ``virtio-mem`` +via ``setmem`` command.
participants (4)
-
Daniel P. Berrangé
-
David Hildenbrand
-
Michal Privoznik
-
Peter Krempa