[PATCH v3 RESEND for v7.4.0 00/14] Introduce virtio-mem <memory/> model

This is a rebased version of v3 I've sent about a month ago: https://listman.redhat.com/archives/libvir-list/2021-March/msg00823.html v2 here: https://listman.redhat.com/archives/libvir-list/2021-February/msg00961.html My intent is to merge these only after the upcoming release so that we have the biggest test window possible. diff to v2: - Dropped code that forbade use of virtio-mem and memballoon at the same time; - This meant that I had to adjust memory accounting, qemuDomainSetMemoryFlags() - see patches 11/15 and 12/15 which are new. - Fixed small nits raised by Peter in his review of v2 Michal Prívozník (14): virhostmem: Introduce virHostMemGetTHPSize() qemu_process: Deduplicate code in qemuProcessNeedHugepagesPath() qemu_process: Drop needless check in qemuProcessNeedMemoryBackingPath() qemu_capabilities: Introduce QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI conf: Introduce virtio-mem <memory/> model qemu: Build command line for virtio-mem qemu: Wire up <memory/> live update qemu: Wire up MEMORY_DEVICE_SIZE_CHANGE event qemu: Refresh the actual size of virtio-mem on monitor reconnect qemu: Account for both memballoon and virtio-mem qemuDomainSetMemoryFlags: Take virtio-mem into consideration virsh: Introduce update-memory-device command news: document recent virtio memory addition kbase: Document virtio-mem NEWS.rst | 16 ++ docs/formatdomain.rst | 45 +++- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 150 +++++++++++ docs/kbase/meson.build | 1 + docs/manpages/virsh.rst | 30 +++ docs/schemas/domaincommon.rng | 16 ++ examples/c/misc/event-test.c | 17 ++ include/libvirt/libvirt-domain.h | 23 ++ src/conf/domain_conf.c | 115 ++++++++- src/conf/domain_conf.h | 15 ++ src/conf/domain_event.c | 84 +++++++ src/conf/domain_event.h | 10 + src/conf/domain_validate.c | 39 +++ src/libvirt_private.syms | 5 + src/qemu/qemu_alias.c | 10 +- src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 13 +- src/qemu/qemu_domain.c | 50 +++- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_domain_address.c | 38 ++- src/qemu/qemu_driver.c | 233 +++++++++++++++++- src/qemu/qemu_hotplug.c | 18 ++ src/qemu/qemu_hotplug.h | 5 + src/qemu/qemu_monitor.c | 37 +++ src/qemu/qemu_monitor.h | 28 +++ src/qemu/qemu_monitor_json.c | 97 ++++++-- src/qemu/qemu_monitor_json.h | 5 + src/qemu/qemu_process.c | 118 ++++++++- src/qemu/qemu_validate.c | 8 + src/remote/remote_daemon_dispatch.c | 30 +++ src/remote/remote_driver.c | 32 +++ src/remote/remote_protocol.x | 14 +- src/remote_protocol-structs | 7 + src/security/security_apparmor.c | 1 + src/security/security_dac.c | 2 + src/security/security_selinux.c | 2 + src/util/virhostmem.c | 63 +++++ src/util/virhostmem.h | 3 + tests/domaincapsmock.c | 9 + .../caps_5.1.0.x86_64.xml | 1 + .../caps_5.2.0.x86_64.xml | 1 + .../caps_6.0.0.x86_64.xml | 1 + ...mory-hotplug-virtio-mem.x86_64-latest.args | 41 +++ .../memory-hotplug-virtio-mem.xml | 67 +++++ tests/qemuxml2argvtest.c | 1 + ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 + tests/qemuxml2xmltest.c | 1 + tools/virsh-domain.c | 169 +++++++++++++ 50 files changed, 1613 insertions(+), 67 deletions(-) create mode 100644 docs/kbase/memorydevices.rst create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml -- 2.26.3

New virHostMemGetTHPSize() is introduced which allows caller to obtain THP PMD (Page Middle Directory) size, which is equal to the minimal size that THP can use, taken from kernel doc (Documentation/admin-guide/mm/transhuge.rst): Some userspace (such as a test program, or an optimized memory allocation library) may want to know the size (in bytes) of a transparent hugepage:: cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size Since this size depends on the host architecture and the kernel it won't change whilst libvirtd is running. Therefore, we can use virOnce() and cache the value. Of course, we can be running under kernel that has THP disabled or has no notion of THP at all. In that case a negative value is returned to signal error. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virhostmem.c | 63 ++++++++++++++++++++++++++++++++++++++++ src/util/virhostmem.h | 3 ++ tests/domaincapsmock.c | 9 ++++++ 4 files changed, 76 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 8f8d399d88..1a3295b111 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2387,6 +2387,7 @@ virHostMemGetFreePages; virHostMemGetInfo; virHostMemGetParameters; virHostMemGetStats; +virHostMemGetTHPSize; virHostMemSetParameters; diff --git a/src/util/virhostmem.c b/src/util/virhostmem.c index ae42978ed2..89b31af3ca 100644 --- a/src/util/virhostmem.c +++ b/src/util/virhostmem.c @@ -45,11 +45,14 @@ #include "virstring.h" #include "virnuma.h" #include "virlog.h" +#include "virthread.h" #define VIR_FROM_THIS VIR_FROM_NONE VIR_LOG_INIT("util.hostmem"); +static unsigned long long virHostTHPPMDSize; /* in kibibytes */ +static virOnceControl virHostMemGetTHPSizeOnce = VIR_ONCE_CONTROL_INITIALIZER; #ifdef __FreeBSD__ # define BSD_MEMORY_STATS_ALL 4 @@ -920,3 +923,63 @@ virHostMemAllocPages(unsigned int npages, return ncounts; } + +#if defined(__linux__) +# define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" +static int +virHostMemGetTHPSizeSysfs(unsigned long long *size) +{ + g_autofree char *buf = NULL; + + /* 1KiB limit is more than enough. */ + if (virFileReadAll(HPAGE_PMD_SIZE_PATH, 1024, &buf) < 0) + return -1; + + virStringTrimOptionalNewline(buf); + if (virStrToLong_ull(buf, NULL, 10, size) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("unable to parse THP PMD size: %s"), buf); + return -1; + } + + /* Size is now in bytes. Convert to KiB. */ + *size >>= 10; + return 0; +} +#endif /* defined(__linux__) */ + + +static void +virHostMemGetTHPSizeOnceInit(void) +{ +#if defined(__linux__) + virHostMemGetTHPSizeSysfs(&virHostTHPPMDSize); +#else /* !defined(__linux__) */ + VIR_WARN("Getting THP size not ported yet"); +#endif /* !defined(__linux__) */ +} + + +/** + * virHostMemGetTHPSize: + * @size: returned size of THP in kibibytes + * + * Obtain Transparent Huge Page size in kibibytes. The size + * depends on host architecture and kernel. Because of virOnce(), + * do not rely on errno in case of failure. + * + * Returns: 0 on success, + * -1 on failure. + */ +int +virHostMemGetTHPSize(unsigned long long *size) +{ + if (virOnce(&virHostMemGetTHPSizeOnce, virHostMemGetTHPSizeOnceInit) < 0) + return -1; + + if (virHostTHPPMDSize == 0) + return -1; + + *size = virHostTHPPMDSize; + return 0; +} diff --git a/src/util/virhostmem.h b/src/util/virhostmem.h index 1369829807..bf15c40698 100644 --- a/src/util/virhostmem.h +++ b/src/util/virhostmem.h @@ -53,3 +53,6 @@ int virHostMemAllocPages(unsigned int npages, int startCell, unsigned int cellCount, bool add); + +int virHostMemGetTHPSize(unsigned long long *size) + G_GNUC_NO_INLINE; diff --git a/tests/domaincapsmock.c b/tests/domaincapsmock.c index d81a898dc0..34a2f9ad81 100644 --- a/tests/domaincapsmock.c +++ b/tests/domaincapsmock.c @@ -17,6 +17,7 @@ #include <config.h> #include "virhostcpu.h" +#include "virhostmem.h" #ifdef WITH_LIBXL # include "libxl/libxl_capabilities.h" #endif @@ -40,3 +41,11 @@ virHostCPUGetMicrocodeVersion(virArch hostArch G_GNUC_UNUSED) { return 0; } + +int +virHostMemGetTHPSize(unsigned long long *size) +{ + /* Pretend Transparent Huge Page size is 2MiB. */ + *size = 2048; + return 0; +} -- 2.26.3

On Fri, Apr 23, 2021 at 15:24:23 +0200, Michal Privoznik wrote:
New virHostMemGetTHPSize() is introduced which allows caller to obtain THP PMD (Page Middle Directory) size, which is equal to the minimal size that THP can use, taken from kernel doc (Documentation/admin-guide/mm/transhuge.rst):
Some userspace (such as a test program, or an optimized memory allocation library) may want to know the size (in bytes) of a transparent hugepage::
cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
Since this size depends on the host architecture and the kernel it won't change whilst libvirtd is running. Therefore, we can use virOnce() and cache the value. Of course, we can be running under kernel that has THP disabled or has no notion of THP at all. In that case a negative value is returned to signal error.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virhostmem.c | 63 ++++++++++++++++++++++++++++++++++++++++ src/util/virhostmem.h | 3 ++ tests/domaincapsmock.c | 9 ++++++ 4 files changed, 76 insertions(+)
[...]
diff --git a/src/util/virhostmem.c b/src/util/virhostmem.c index ae42978ed2..89b31af3ca 100644 --- a/src/util/virhostmem.c +++ b/src/util/virhostmem.c @@ -45,11 +45,14 @@ #include "virstring.h" #include "virnuma.h" #include "virlog.h" +#include "virthread.h"
#define VIR_FROM_THIS VIR_FROM_NONE
VIR_LOG_INIT("util.hostmem");
+static unsigned long long virHostTHPPMDSize; /* in kibibytes */ +static virOnceControl virHostMemGetTHPSizeOnce = VIR_ONCE_CONTROL_INITIALIZER;
#ifdef __FreeBSD__ # define BSD_MEMORY_STATS_ALL 4 @@ -920,3 +923,63 @@ virHostMemAllocPages(unsigned int npages,
return ncounts; } + +#if defined(__linux__) +# define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" +static int
Nothing checks the return value.
+virHostMemGetTHPSizeSysfs(unsigned long long *size) +{ + g_autofree char *buf = NULL; + + /* 1KiB limit is more than enough. */ + if (virFileReadAll(HPAGE_PMD_SIZE_PATH, 1024, &buf) < 0) + return -1; + + virStringTrimOptionalNewline(buf); + if (virStrToLong_ull(buf, NULL, 10, size) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("unable to parse THP PMD size: %s"), buf); + return -1; + } + + /* Size is now in bytes. Convert to KiB. */ + *size >>= 10; + return 0; +} +#endif /* defined(__linux__) */ + + +static void +virHostMemGetTHPSizeOnceInit(void) +{ +#if defined(__linux__) + virHostMemGetTHPSizeSysfs(&virHostTHPPMDSize); +#else /* !defined(__linux__) */ + VIR_WARN("Getting THP size not ported yet"); +#endif /* !defined(__linux__) */ +} + + +/** + * virHostMemGetTHPSize: + * @size: returned size of THP in kibibytes + * + * Obtain Transparent Huge Page size in kibibytes. The size + * depends on host architecture and kernel. Because of virOnce(), + * do not rely on errno in case of failure. + * + * Returns: 0 on success, + * -1 on failure. + */ +int +virHostMemGetTHPSize(unsigned long long *size) +{ + if (virOnce(&virHostMemGetTHPSizeOnce, virHostMemGetTHPSizeOnceInit) < 0)
This directly returns the return value from 'pthread_once' whose manual entry states: RETURN VALUE Upon successful completion, pthread_once() shall return zero; otherwise, an error number shall be returned to indicate the error. Which reads as if 'errno' is returned by the function which would make the '< 0' check wrong.
+ return -1; + + if (virHostTHPPMDSize == 0) + return -1; + + *size = virHostTHPPMDSize; + return 0; +}
With the above problems addressed: Reviewed-by: Peter Krempa <pkrempa@redhat.com>

On 5/3/21 1:07 PM, Peter Krempa wrote:
On Fri, Apr 23, 2021 at 15:24:23 +0200, Michal Privoznik wrote:
New virHostMemGetTHPSize() is introduced which allows caller to obtain THP PMD (Page Middle Directory) size, which is equal to the minimal size that THP can use, taken from kernel doc (Documentation/admin-guide/mm/transhuge.rst):
Some userspace (such as a test program, or an optimized memory allocation library) may want to know the size (in bytes) of a transparent hugepage::
cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
Since this size depends on the host architecture and the kernel it won't change whilst libvirtd is running. Therefore, we can use virOnce() and cache the value. Of course, we can be running under kernel that has THP disabled or has no notion of THP at all. In that case a negative value is returned to signal error.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virhostmem.c | 63 ++++++++++++++++++++++++++++++++++++++++ src/util/virhostmem.h | 3 ++ tests/domaincapsmock.c | 9 ++++++ 4 files changed, 76 insertions(+)
[...]
diff --git a/src/util/virhostmem.c b/src/util/virhostmem.c index ae42978ed2..89b31af3ca 100644 --- a/src/util/virhostmem.c +++ b/src/util/virhostmem.c @@ -45,11 +45,14 @@ #include "virstring.h" #include "virnuma.h" #include "virlog.h" +#include "virthread.h"
#define VIR_FROM_THIS VIR_FROM_NONE
VIR_LOG_INIT("util.hostmem");
+static unsigned long long virHostTHPPMDSize; /* in kibibytes */ +static virOnceControl virHostMemGetTHPSizeOnce = VIR_ONCE_CONTROL_INITIALIZER;
#ifdef __FreeBSD__ # define BSD_MEMORY_STATS_ALL 4 @@ -920,3 +923,63 @@ virHostMemAllocPages(unsigned int npages,
return ncounts; } + +#if defined(__linux__) +# define HPAGE_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" +static int
Nothing checks the return value.
+virHostMemGetTHPSizeSysfs(unsigned long long *size) +{ + g_autofree char *buf = NULL; + + /* 1KiB limit is more than enough. */ + if (virFileReadAll(HPAGE_PMD_SIZE_PATH, 1024, &buf) < 0) + return -1; + + virStringTrimOptionalNewline(buf); + if (virStrToLong_ull(buf, NULL, 10, size) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("unable to parse THP PMD size: %s"), buf); + return -1; + } + + /* Size is now in bytes. Convert to KiB. */ + *size >>= 10; + return 0; +} +#endif /* defined(__linux__) */ + + +static void +virHostMemGetTHPSizeOnceInit(void) +{ +#if defined(__linux__) + virHostMemGetTHPSizeSysfs(&virHostTHPPMDSize); +#else /* !defined(__linux__) */ + VIR_WARN("Getting THP size not ported yet"); +#endif /* !defined(__linux__) */ +} + + +/** + * virHostMemGetTHPSize: + * @size: returned size of THP in kibibytes + * + * Obtain Transparent Huge Page size in kibibytes. The size + * depends on host architecture and kernel. Because of virOnce(), + * do not rely on errno in case of failure. + * + * Returns: 0 on success, + * -1 on failure. + */ +int +virHostMemGetTHPSize(unsigned long long *size) +{ + if (virOnce(&virHostMemGetTHPSizeOnce, virHostMemGetTHPSizeOnceInit) < 0)
This directly returns the return value from 'pthread_once' whose manual entry states:
RETURN VALUE Upon successful completion, pthread_once() shall return zero; otherwise, an error number shall be returned to indicate the error.
Which reads as if 'errno' is returned by the function which would make the '< 0' check wrong.
Interesting, never though about this. Just blindly copied what we have elsewhere. Looking into glibc - pthread_once will never return anything else than 0. Then uclibc-ng doesn't provide pthread_once and musl also always returns 0. I'll send a patch that fixes this issue shortly. Michal

The aim of qemuProcessNeedHugepagesPath() is to return whether guest needs private path inside HugeTLBFS mounts (deducted from domain definition @def) or whether the memory device that user is hotplugging in needs the private path (deducted from the @mem argument). The actual creation of the path is done in the only caller qemuProcessBuildDestroyMemoryPaths(). The rule for the first case (@def) and the second case (@mem) is the same (domain has a DIMM device that has HP requested) and is written twice. Move the logic into a function to deduplicate the code. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 92c1f0ab74..449e5f1547 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3872,6 +3872,27 @@ qemuProcessReconnectCheckMemAliasOrderMismatch(virDomainObj *vm) } +static bool +qemuProcessDomainMemoryDefNeedHugepagesPath(const virDomainMemoryDef *mem, + const long system_pagesize) +{ + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + return mem->pagesize && + mem->pagesize != system_pagesize; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + /* None of these can be backed by hugepages. */ + return false; + } + + return false; +} + + static bool qemuProcessNeedHugepagesPath(virDomainDef *def, virDomainMemoryDef *mem) @@ -3888,16 +3909,12 @@ qemuProcessNeedHugepagesPath(virDomainDef *def, } for (i = 0; i < def->nmems; i++) { - if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_DIMM && - def->mems[i]->pagesize && - def->mems[i]->pagesize != system_pagesize) + if (qemuProcessDomainMemoryDefNeedHugepagesPath(def->mems[i], system_pagesize)) return true; } if (mem && - mem->model == VIR_DOMAIN_MEMORY_MODEL_DIMM && - mem->pagesize && - mem->pagesize != system_pagesize) + qemuProcessDomainMemoryDefNeedHugepagesPath(mem, system_pagesize)) return true; return false; -- 2.26.3

On Fri, Apr 23, 2021 at 15:24:24 +0200, Michal Privoznik wrote:
The aim of qemuProcessNeedHugepagesPath() is to return whether guest needs private path inside HugeTLBFS mounts (deducted from domain definition @def) or whether the memory device that user is hotplugging in needs the private path (deducted from the @mem argument). The actual creation of the path is done in the only caller qemuProcessBuildDestroyMemoryPaths().
The rule for the first case (@def) and the second case (@mem) is the same (domain has a DIMM device that has HP requested) and is written twice. Move the logic into a function to deduplicate the code.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 92c1f0ab74..449e5f1547 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3872,6 +3872,27 @@ qemuProcessReconnectCheckMemAliasOrderMismatch(virDomainObj *vm) }
+static bool +qemuProcessDomainMemoryDefNeedHugepagesPath(const virDomainMemoryDef *mem, + const long system_pagesize) +{ + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + return mem->pagesize && + mem->pagesize != system_pagesize;
Put both terms on a single line. Reviewed-by: Peter Krempa <pkrempa@redhat.com>

The aim of this function is to return whether domain definition and/or memory device that user intents to hotplug needs a private path inside cfg->memoryBackingDir. The rule for the memory device that's being hotplug includes checking whether corresponding guest NUMA node needs memoryBackingDir. Well, while the rationale behind makes sense it is not necessary to check for that really - just a few lines above every guest NUMA node was checked exactly for that. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 449e5f1547..a5f256154f 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3939,13 +3939,24 @@ qemuProcessNeedMemoryBackingPath(virDomainDef *def, return true; } - if (mem && - mem->model == VIR_DOMAIN_MEMORY_MODEL_DIMM && - (mem->access != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT || - (mem->targetNode >= 0 && - virDomainNumaGetNodeMemoryAccessMode(def->numa, mem->targetNode) - != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT))) - return true; + if (mem) { + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + if (mem->access != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT) { + /* No need to check for access mode on the target node, + * it was checked for in the previous loop. */ + return true; + } + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + /* Backed by user provided path. Not stored in memory + * backing dir anyway. */ + break; + } + } return false; } -- 2.26.3

On Fri, Apr 23, 2021 at 15:24:25 +0200, Michal Privoznik wrote:
The aim of this function is to return whether domain definition and/or memory device that user intents to hotplug needs a private path inside cfg->memoryBackingDir. The rule for the memory device that's being hotplug includes checking whether corresponding guest NUMA node needs memoryBackingDir. Well, while the rationale behind makes sense it is not necessary to check for that really - just a few lines above every guest NUMA node was checked exactly for that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 449e5f1547..a5f256154f 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3939,13 +3939,24 @@ qemuProcessNeedMemoryBackingPath(virDomainDef *def, return true; }
- if (mem && - mem->model == VIR_DOMAIN_MEMORY_MODEL_DIMM && - (mem->access != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT || - (mem->targetNode >= 0 && - virDomainNumaGetNodeMemoryAccessMode(def->numa, mem->targetNode) - != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT))) - return true; + if (mem) { + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + if (mem->access != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT) { + /* No need to check for access mode on the target node, + * it was checked for in the previous loop. */ + return true; + }
missing 'break;' and fall-through doesn't make sense here.
+ + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + /* Backed by user provided path. Not stored in memory + * backing dir anyway. */ + break; + } + }
With the above addressed: Reviewed-by: Peter Krempa <pkrempa@redhat.com>

This commit introduces a new capability that reflects virtio-mem-pci device support in QEMU: QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */ The virtio-mem-pci device was introduced in QEMU 5.1. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml | 1 + 5 files changed, 6 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 7971a9c557..1d246725df 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -629,6 +629,7 @@ VIR_ENUM_IMPL(virQEMUCaps, /* 400 */ "compat-deprecated", "acpi-index", + "virtio-mem-pci", ); @@ -1347,6 +1348,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { { "am53c974", QEMU_CAPS_SCSI_AM53C974 }, { "virtio-pmem-pci", QEMU_CAPS_DEVICE_VIRTIO_PMEM_PCI }, { "vhost-user-blk", QEMU_CAPS_DEVICE_VHOST_USER_BLK }, + { "virtio-mem-pci", QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI }, }; diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index f54aad5dfd..9119221925 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -609,6 +609,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ /* 400 */ QEMU_CAPS_COMPAT_DEPRECATED, /* -compat deprecated-(input|output) is supported */ QEMU_CAPS_ACPI_INDEX, /* PCI device 'acpi-index' property */ + QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml index 12bb9a1b0f..253f050e0a 100644 --- a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml @@ -257,6 +257,7 @@ <flag name='audiodev'/> <flag name='blockdev-backup'/> <flag name='rotation-rate'/> + <flag name='virtio-mem-pci'/> <version>5001000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml index 2fee135b1e..470cb769aa 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml @@ -258,6 +258,7 @@ <flag name='audiodev'/> <flag name='blockdev-backup'/> <flag name='rotation-rate'/> + <flag name='virtio-mem-pci'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml index 8cc949d735..db46a385cb 100644 --- a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml @@ -263,6 +263,7 @@ <flag name='rotation-rate'/> <flag name='compat-deprecated'/> <flag name='acpi-index'/> + <flag name='virtio-mem-pci'/> <version>5002092</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> -- 2.26.3

On Fri, Apr 23, 2021 at 15:24:26 +0200, Michal Privoznik wrote:
This commit introduces a new capability that reflects virtio-mem-pci device support in QEMU:
QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */
The virtio-mem-pci device was introduced in QEMU 5.1.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml | 1 + 5 files changed, 6 insertions(+)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

The virtio-mem is paravirtualized mechanism of adding/removing memory to/from a VM. A virtio-mem-pci device is split into blocks of equal size which are then exposed (all or only a requested portion of them) to the guest kernel to use as regular memory. Therefore, the device has two important attributes: 1) block-size, which defines the size of a block 2) requested-size, which defines how much memory (in bytes) is the device requested to expose to the guest. The 'block-size' is configured on command line and immutable throughout device's lifetime. The 'requested-size' can be set on the command line too, but also is adjustable via monitor. In fact, that is how management software places its requests to change the memory allocation. If it wants to give more memory to the guest it changes 'requested-size' to a bigger value, and if it wants to shrink guest memory it changes the 'requested-size' to a smaller value. Note, value of zero means that guest should release all memory offered by the device. Of course, guest has to cooperate. Therefore, there is a third attribute 'size' which is read only and reflects how much memory the guest still has. This can be different to 'requested-size', obviously. Because of name clash, I've named it 'actualsize' and it is dealt with in future commits (it is a runtime information anyway). In the backend, memory for virtio-mem is backed by usual objects: memory-backend-{ram,file,memfd} and their size puts the cap on the amount of memory that a virtio-mem device can offer to a guest. But we are already able to express this info using <size/> under <target/>. Therefore, we need only two more elements to cover 'block-size' and 'requested-size' attributes. This is the XML I've came up with: <memory model='virtio-mem'> <source> <nodemask>1-3</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>2097152</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> </target> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </memory> I hope by now it is obvious that: 1) 'requested-size' must be an integer multiple of 'block-size', and 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI address. Then there is a limitation that the minimal 'block-size' is transparent huge page size (I'll leave this without explanation). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.rst | 38 +++++++++-- docs/schemas/domaincommon.rng | 11 +++ src/conf/domain_conf.c | 53 ++++++++++++++- src/conf/domain_conf.h | 3 + src/conf/domain_validate.c | 39 +++++++++++ src/qemu/qemu_alias.c | 1 + src/qemu/qemu_command.c | 1 + src/qemu/qemu_domain.c | 10 +++ src/qemu/qemu_domain_address.c | 38 ++++++++--- src/qemu/qemu_process.c | 2 + src/qemu/qemu_validate.c | 8 +++ src/security/security_apparmor.c | 1 + src/security/security_dac.c | 2 + src/security/security_selinux.c | 2 + .../memory-hotplug-virtio-mem.xml | 67 +++++++++++++++++++ ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 + tests/qemuxml2xmltest.c | 1 + 17 files changed, 261 insertions(+), 17 deletions(-) create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 1b9b221611..44e478c88a 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -7721,6 +7721,18 @@ Example: usage of the memory devices <size unit='KiB'>524288</size> </target> </memory> + <memory model='virtio-mem'> + <source> + <nodemask>1-3</nodemask> + <pagesize unit='KiB'>2048</pagesize> + </source> + <target> + <size unit='KiB'>2097152</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + </memory> </devices> ... @@ -7728,7 +7740,8 @@ Example: usage of the memory devices Provide ``dimm`` to add a virtual DIMM module to the guest. :since:`Since 1.2.14` Provide ``nvdimm`` model that adds a Non-Volatile DIMM module. :since:`Since 3.2.0` Provide ``virtio-pmem`` model to add a paravirtualized - persistent memory device. :since:`Since 7.1.0` + persistent memory device. :since:`Since 7.1.0` Provide ``virtio-mem`` model + to add paravirtualized memory device. :since:`Since 7.4.0` ``access`` An optional attribute ``access`` ( :since:`since 3.2.0` ) that provides @@ -7751,10 +7764,11 @@ Example: usage of the memory devices allowed only for ``model='nvdimm'`` for pSeries guests. :since:`Since 6.2.0` ``source`` - For model ``dimm`` this element is optional and allows to fine tune the - source of the memory used for the given memory device. If the element is not - provided defaults configured via ``numatune`` are used. If ``dimm`` is - provided, then the following optional elements can be provided as well: + For model ``dimm`` and model ``virtio-mem`` this element is optional and + allows to fine tune the source of the memory used for the given memory + device. If the element is not provided defaults configured via ``numatune`` + are used. If the element is provided, then the following optional elements + can be provided: ``pagesize`` This element can be used to override the default host page size used for @@ -7793,7 +7807,8 @@ Example: usage of the memory devices added memory from the perspective of the guest. The mandatory ``size`` subelement configures the size of the added memory as - a scaled integer. + a scaled integer. For ``virtio-mem`` this represents the maximum possible + size exposed to the guest. The ``node`` subelement configures the guest NUMA node to attach the memory to. The element shall be used only if the guest has NUMA nodes configured. @@ -7820,6 +7835,17 @@ Example: usage of the memory devices so other backend types should use the ``readonly`` element. :since:`Since 5.0.0` + ``block`` + For ``virtio-mem`` only. + The size of an individual block, granularity of division of memory module. + Must be power of two and at least equal to size of a transparent hugepage + (2MiB on x84_64). The default is hypervisor dependent. + + ``requested`` + For ``virtio-mem`` only. + The total size exposed to the guest. Must respect ``block`` granularity + and be smaller or equal to ``size``. + :anchor:`<a id="elementsIommu"/>` IOMMU devices diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index a2e5c50c1d..b75c2f3ead 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -6521,6 +6521,7 @@ <value>dimm</value> <value>nvdimm</value> <value>virtio-pmem</value> + <value>virtio-mem</value> </choice> </attribute> <optional> @@ -6608,6 +6609,16 @@ <ref name="unsignedInt"/> </element> </optional> + <optional> + <element name="block"> + <ref name="scaledInteger"/> + </element> + </optional> + <optional> + <element name="requested"> + <ref name="scaledInteger"/> + </element> + </optional> <optional> <element name="label"> <element name="size"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 24c0943d62..240bcfa020 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -1373,6 +1373,7 @@ VIR_ENUM_IMPL(virDomainMemoryModel, "dimm", "nvdimm", "virtio-pmem", + "virtio-mem", ); VIR_ENUM_IMPL(virDomainShmemModel, @@ -5470,6 +5471,7 @@ virDomainMemoryDefPostParse(virDomainMemoryDef *mem, } break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_DIMM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: @@ -15396,6 +15398,7 @@ virDomainMemorySourceDefParseXML(xmlNodePtr node, switch (def->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt, &def->pagesize, false, false) < 0) return -1; @@ -15462,7 +15465,8 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node, &def->size, true, false) < 0) return -1; - if (def->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { + switch (def->model) { + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: if (virDomainParseMemory("./label/size", "./label/size/@unit", ctxt, &def->labelsize, false, false) < 0) return -1; @@ -15481,6 +15485,23 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node, if (virXPathBoolean("boolean(./readonly)", ctxt)) def->readonly = true; + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (virDomainParseMemory("./block", "./block/@unit", ctxt, + &def->blocksize, false, false) < 0) + return -1; + + if (virDomainParseMemory("./requested", "./requested/@unit", ctxt, + &def->requestedsize, false, false) < 0) + return -1; + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; } return 0; @@ -17301,11 +17322,14 @@ virDomainMemoryFindByDefInternal(virDomainDef *def, /* target info -> always present */ if (tmp->model != mem->model || tmp->targetNode != mem->targetNode || - tmp->size != mem->size) + tmp->size != mem->size || + tmp->blocksize != mem->blocksize || + tmp->requestedsize != mem->requestedsize) continue; switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: /* source stuff -> match with device */ if (tmp->pagesize != mem->pagesize) continue; @@ -22943,6 +22967,22 @@ virDomainMemoryDefCheckABIStability(virDomainMemoryDef *src, return false; } + if (src->blocksize != dst->blocksize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Target memory device block size '%llu' doesn't match " + "source memory device block size '%llu'"), + dst->blocksize, src->blocksize); + return false; + } + + if (src->requestedsize != dst->requestedsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Target memory device requested size '%llu' doesn't match " + "source memory device requested size '%llu'"), + dst->requestedsize, src->requestedsize); + return false; + } + if (src->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) { if (src->labelsize != dst->labelsize) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -26862,6 +26902,7 @@ virDomainMemorySourceDefFormat(virBuffer *buf, switch (def->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (def->sourceNodes) { if (!(bitmap = virBitmapFormat(def->sourceNodes))) return -1; @@ -26918,6 +26959,14 @@ virDomainMemoryTargetDefFormat(virBuffer *buf, if (def->readonly) virBufferAddLit(&childBuf, "<readonly/>\n"); + if (def->blocksize) { + virBufferAsprintf(&childBuf, "<block unit='KiB'>%llu</block>\n", + def->blocksize); + + virBufferAsprintf(&childBuf, "<requested unit='KiB'>%llu</requested>\n", + def->requestedsize); + } + virXMLFormatElement(buf, "target", NULL, &childBuf); } diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index a7cad31896..884e2c4d17 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2446,6 +2446,7 @@ typedef enum { VIR_DOMAIN_MEMORY_MODEL_DIMM, /* dimm hotpluggable memory device */ VIR_DOMAIN_MEMORY_MODEL_NVDIMM, /* nvdimm memory device */ VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM, /* virtio-pmem memory device */ + VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM, /* virtio-mem memory device */ VIR_DOMAIN_MEMORY_MODEL_LAST } virDomainMemoryModel; @@ -2466,6 +2467,8 @@ struct _virDomainMemoryDef { int targetNode; unsigned long long size; /* kibibytes */ unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */ + unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */ + unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM */ bool readonly; /* valid only for NVDIMM */ /* required for QEMU NVDIMM ppc64 support */ diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c index 686b9e8d16..16fa65bc6b 100644 --- a/src/conf/domain_validate.c +++ b/src/conf/domain_validate.c @@ -28,6 +28,7 @@ #include "virlog.h" #include "virutil.h" #include "virstring.h" +#include "virhostmem.h" #define VIR_FROM_THIS VIR_FROM_DOMAIN @@ -1843,6 +1844,8 @@ static int virDomainMemoryDefValidate(const virDomainMemoryDef *mem, const virDomainDef *def) { + unsigned long long thpSize; + switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: if (!mem->nvdimmPath) { @@ -1896,6 +1899,42 @@ virDomainMemoryDefValidate(const virDomainMemoryDef *mem, _("virtio-pmem does not support NUMA nodes")); return -1; } + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (mem->requestedsize > mem->size) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("requested size must be smaller than @size")); + return -1; + } + + if (!VIR_IS_POW2(mem->blocksize)) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("block size must be a power of two")); + return -1; + } + + if (virHostMemGetTHPSize(&thpSize) < 0) { + /* We failed to get THP size, fall back to a sane default. On + * almost every architecture the size will be 2MiB, except for some + * funky arches like sparc and m68k. Use 2MiB and refine later if + * somebody complains. */ + thpSize = 2048; + } + + if (mem->blocksize < thpSize) { + virReportError(VIR_ERR_XML_DETAIL, + _("block size too small, must be at least %lluKiB"), + thpSize); + return -1; + } + + if (mem->requestedsize % mem->blocksize != 0) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("requested size must be an integer multiple of block size")); + return -1; + } + break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: break; diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index 63638b1402..e1caeec006 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -523,6 +523,7 @@ qemuAssignDeviceMemoryAlias(virDomainDef *def, case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: prefix = "virtiopmem"; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index be93182092..422efa22ab 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3452,6 +3452,7 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, device = "virtio-pmem-pci"; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 6e3e3555c7..c182b47f38 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -8968,6 +8968,16 @@ qemuDomainDefValidateMemoryHotplugDevice(const virDomainMemoryDef *mem, needsNuma = false; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI && + mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("only 'pci' addresses are supported for the %s device"), + virDomainMemoryModelTypeToString(mem->model)); + return -1; + } + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: return -1; diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 1ee75b8f2e..c632b055f1 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -380,9 +380,18 @@ qemuDomainPrimeVirtioDeviceAddresses(virDomainDef *def, } for (i = 0; i < def->nmems; i++) { - if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM && - def->mems[i]->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) - def->mems[i]->info.type = type; + switch (def->mems[i]->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (def->mems[i]->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) + def->mems[i]->info.type = type; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } } if (type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_CCW) { @@ -1012,6 +1021,7 @@ qemuDomainDeviceCalculatePCIConnectFlags(virDomainDeviceDef *dev, case VIR_DOMAIN_DEVICE_MEMORY: switch (dev->data.memory->model) { case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return virtioFlags; case VIR_DOMAIN_MEMORY_MODEL_NONE: @@ -2439,12 +2449,19 @@ qemuDomainAssignDevicePCISlots(virDomainDef *def, for (i = 0; i < def->nmems; i++) { virDomainMemoryDef *mem = def->mems[i]; - if (mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM || - !virDeviceInfoPCIAddressIsWanted(&mem->info)) - continue; - - if (qemuDomainPCIAddressReserveNextAddr(addrs, &mem->info) < 0) - return -1; + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (virDeviceInfoPCIAddressIsWanted(&mem->info) && + qemuDomainPCIAddressReserveNextAddr(addrs, &mem->info) < 0) + return -1; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } } return 0; @@ -3110,6 +3127,7 @@ qemuDomainAssignMemoryDeviceSlot(virQEMUDriver *driver, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return qemuDomainEnsurePCIAddress(vm, &dev, driver); break; @@ -3135,6 +3153,7 @@ qemuDomainReleaseMemoryDeviceSlot(virDomainObj *vm, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: qemuDomainReleaseDeviceAddress(vm, &mem->info); break; @@ -3169,6 +3188,7 @@ qemuDomainAssignMemorySlots(virDomainDef *def) break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: /* handled in qemuDomainAssignPCIAddresses() */ break; case VIR_DOMAIN_MEMORY_MODEL_NONE: diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index a5f256154f..9c16e51917 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3878,6 +3878,7 @@ qemuProcessDomainMemoryDefNeedHugepagesPath(const virDomainMemoryDef *mem, { switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: return mem->pagesize && mem->pagesize != system_pagesize; @@ -3942,6 +3943,7 @@ qemuProcessNeedMemoryBackingPath(virDomainDef *def, if (mem) { switch (mem->model) { case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: if (mem->access != VIR_DOMAIN_MEMORY_ACCESS_DEFAULT) { /* No need to check for access mode on the target node, * it was checked for in the previous loop. */ diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 255d653118..b02b33a0f1 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -4895,6 +4895,14 @@ qemuValidateDomainDeviceDefMemory(virDomainMemoryDef *mem, } break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-mem isn't supported by this QEMU binary")); + return -1; + } + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c index a2a8435fe4..35642129f7 100644 --- a/src/security/security_apparmor.c +++ b/src/security/security_apparmor.c @@ -693,6 +693,7 @@ AppArmorSetMemoryLabel(virSecurityManager *mgr, return reload_profile(mgr, def, mem->nvdimmPath, true); case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; } diff --git a/src/security/security_dac.c b/src/security/security_dac.c index e973964735..c2b80200f0 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -1872,6 +1872,7 @@ virSecurityDACRestoreMemoryLabel(virSecurityManager *mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: case VIR_DOMAIN_MEMORY_MODEL_NONE: ret = 0; @@ -2055,6 +2056,7 @@ virSecurityDACSetMemoryLabel(virSecurityManager *mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: case VIR_DOMAIN_MEMORY_MODEL_NONE: ret = 0; diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 29628d8953..f8fc6bae89 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1578,6 +1578,7 @@ virSecuritySELinuxSetMemoryLabel(virSecurityManager *mgr, case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_LAST: break; } @@ -1605,6 +1606,7 @@ virSecuritySELinuxRestoreMemoryLabel(virSecurityManager *mgr, break; case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: ret = 0; diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml new file mode 100644 index 0000000000..c10528aad8 --- /dev/null +++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml @@ -0,0 +1,67 @@ +<domain type='kvm'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <maxMemory slots='16' unit='KiB'>1099511627776</maxMemory> + <memory unit='KiB'>8388608</memory> + <currentMemory unit='KiB'>8388608</currentMemory> + <vcpu placement='static' cpuset='0-1'>2</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <cpu mode='custom' match='exact' check='none'> + <model fallback='forbid'>qemu64</model> + <topology sockets='2' dies='1' cores='1' threads='1'/> + <numa> + <cell id='0' cpus='0-1' memory='2095104' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i386</emulator> + <disk type='block' device='disk'> + <driver name='qemu' type='raw'/> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='ide' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> + </controller> + <controller type='usb' index='0' model='piix3-uhci'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <audio id='1' type='none'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </memballoon> + <memory model='virtio-mem'> + <target> + <size unit='KiB'>1048576</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>524288</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </memory> + <memory model='virtio-mem'> + <source> + <nodemask>1-3</nodemask> + <pagesize unit='KiB'>2048</pagesize> + </source> + <target> + <size unit='KiB'>2097152</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </memory> + </devices> +</domain> diff --git a/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml b/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml new file mode 120000 index 0000000000..a9d298129c --- /dev/null +++ b/tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml @@ -0,0 +1 @@ +../qemuxml2argvdata/memory-hotplug-virtio-mem.xml \ No newline at end of file diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index c37de0c704..e37d6eac62 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -1288,6 +1288,7 @@ mymain(void) QEMU_CAPS_DEVICE_NVDIMM, QEMU_CAPS_LAST); DO_TEST_CAPS_LATEST("memory-hotplug-virtio-pmem"); + DO_TEST_CAPS_LATEST("memory-hotplug-virtio-mem"); DO_TEST("net-udp", NONE); -- 2.26.3

On Fri, Apr 23, 2021 at 15:24:27 +0200, Michal Privoznik wrote: [...]
diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 1b9b221611..44e478c88a 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst
[...]
@@ -7793,7 +7807,8 @@ Example: usage of the memory devices added memory from the perspective of the guest.
The mandatory ``size`` subelement configures the size of the added memory as - a scaled integer. + a scaled integer. For ``virtio-mem`` this represents the maximum possible + size exposed to the guest.
The ``node`` subelement configures the guest NUMA node to attach the memory to. The element shall be used only if the guest has NUMA nodes configured. @@ -7820,6 +7835,17 @@ Example: usage of the memory devices so other backend types should use the ``readonly`` element. :since:`Since 5.0.0`
+ ``block`` + For ``virtio-mem`` only. + The size of an individual block, granularity of division of memory module.
I'd refrain from using 'memory module' as this is specifically paravirtualized -> no modules.
+ Must be power of two and at least equal to size of a transparent hugepage + (2MiB on x84_64). The default is hypervisor dependent. + + ``requested`` + For ``virtio-mem`` only. + The total size exposed to the guest. Must respect ``block`` granularity + and be smaller or equal to ``size``. + :anchor:`<a id="elementsIommu"/>`
IOMMU devices
[...]
diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c index 686b9e8d16..16fa65bc6b 100644 --- a/src/conf/domain_validate.c +++ b/src/conf/domain_validate.c
[...]
@@ -1896,6 +1899,42 @@ virDomainMemoryDefValidate(const virDomainMemoryDef *mem, _("virtio-pmem does not support NUMA nodes")); return -1; } + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (mem->requestedsize > mem->size) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("requested size must be smaller than @size")); + return -1; + }
I'd expect that 'mem->requestedsize == mem->size' is also okay otherwise it'll rob you from using the last block.
+ + if (!VIR_IS_POW2(mem->blocksize)) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("block size must be a power of two")); + return -1; + } + + if (virHostMemGetTHPSize(&thpSize) < 0) { + /* We failed to get THP size, fall back to a sane default. On + * almost every architecture the size will be 2MiB, except for some + * funky arches like sparc and m68k. Use 2MiB and refine later if + * somebody complains. */ + thpSize = 2048; + }
[...]
diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index 63638b1402..e1caeec006 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -523,6 +523,7 @@ qemuAssignDeviceMemoryAlias(virDomainDef *def, case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: prefix = "virtiopmem"; break; + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default:
In this hunk you refrain from adding qemuisms and they are added later, but all the following hunks add qemu-specific details. Specifically the last one adding qemuCaps check is questionable.
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 6e3e3555c7..c182b47f38 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -8968,6 +8968,16 @@ qemuDomainDefValidateMemoryHotplugDevice(const virDomainMemoryDef *mem, needsNuma = false; break;
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI && + mem->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("only 'pci' addresses are supported for the %s device"), + virDomainMemoryModelTypeToString(mem->model)); + return -1; + } + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: return -1; diff --git a/src/qemu/qemu_domain_address.c b/src/qemu/qemu_domain_address.c index 1ee75b8f2e..c632b055f1 100644 --- a/src/qemu/qemu_domain_address.c +++ b/src/qemu/qemu_domain_address.c @@ -380,9 +380,18 @@ qemuDomainPrimeVirtioDeviceAddresses(virDomainDef *def, }
for (i = 0; i < def->nmems; i++) { - if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM && - def->mems[i]->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) - def->mems[i]->info.type = type; + switch (def->mems[i]->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (def->mems[i]->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE) + def->mems[i]->info.type = type; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } }
if (type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_CCW) {
[...]
@@ -2439,12 +2449,19 @@ qemuDomainAssignDevicePCISlots(virDomainDef *def, for (i = 0; i < def->nmems; i++) { virDomainMemoryDef *mem = def->mems[i];
- if (mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM || - !virDeviceInfoPCIAddressIsWanted(&mem->info)) - continue; - - if (qemuDomainPCIAddressReserveNextAddr(addrs, &mem->info) < 0) - return -1; + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (virDeviceInfoPCIAddressIsWanted(&mem->info) && + qemuDomainPCIAddressReserveNextAddr(addrs, &mem->info) < 0) + return -1; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + break; + } }
return 0;
[...]
diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 255d653118..b02b33a0f1 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -4895,6 +4895,14 @@ qemuValidateDomainDeviceDefMemory(virDomainMemoryDef *mem, } break;
+ case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("virtio-mem isn't supported by this QEMU binary")); + return -1; + } + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: break;
With the nits addressed: Reviewed-by: Peter Krempa <pkrempa@redhat.com>

Nothing special is happening here. All important changes were done when for 'virtio-pmem' (adjusting the code to put virtio memory on PCI bus, generating alias using qemuDomainDeviceAliasIndex(). The only bit that might look suspicious is no prealloc for virtio-mem. But if you think about it, the whole purpose of this device is to change amount of memory exposed to guest on the fly. There is no point in locking the whole backend in memory. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_alias.c | 9 +++- src/qemu/qemu_command.c | 12 +++++- ...mory-hotplug-virtio-mem.x86_64-latest.args | 41 +++++++++++++++++++ tests/qemuxml2argvtest.c | 1 + 4 files changed, 60 insertions(+), 3 deletions(-) create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c index e1caeec006..39a48f0e71 100644 --- a/src/qemu/qemu_alias.c +++ b/src/qemu/qemu_alias.c @@ -476,8 +476,11 @@ qemuDeviceMemoryGetAliasID(virDomainDef *def, size_t i; int maxidx = 0; - /* virtio-pmem goes onto PCI bus and thus DIMM address is not valid */ - if (!oldAlias && mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM) + /* virtio-pmem and virtio-mem go onto PCI bus and thus DIMM address is not + * valid */ + if (!oldAlias && + mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM && + mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) return mem->info.addr.dimm.slot; for (i = 0; i < def->nmems; i++) { @@ -524,6 +527,8 @@ qemuAssignDeviceMemoryAlias(virDomainDef *def, prefix = "virtiopmem"; break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + prefix = "virtiomem"; + break; case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 422efa22ab..065a8649c8 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3219,7 +3219,9 @@ qemuBuildMemoryBackendProps(virJSONValue **backendProps, } else if (useHugepage) { if (qemuGetDomainHupageMemPath(priv->driver, def, pagesize, &memPath) < 0) return -1; - prealloc = true; + /* For virtio-mem backed by hugepages we don't need prealloc. */ + if (mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + prealloc = true; } else { /* We can have both pagesize and mem source. If that's the case, * prefer hugepages as those are more specific. */ @@ -3453,6 +3455,9 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, break; case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + device = "virtio-mem-pci"; + break; + case VIR_DOMAIN_MEMORY_MODEL_NONE: case VIR_DOMAIN_MEMORY_MODEL_LAST: default: @@ -3469,6 +3474,11 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def, if (mem->labelsize) virBufferAsprintf(&buf, "label-size=%llu,", mem->labelsize * 1024); + if (mem->blocksize) { + virBufferAsprintf(&buf, "block-size=%llu,", mem->blocksize * 1024); + virBufferAsprintf(&buf, "requested-size=%llu,", mem->requestedsize * 1024); + } + if (mem->uuid) { char uuidstr[VIR_UUID_STRING_BUFLEN]; diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args new file mode 100644 index 0000000000..16f10c14ad --- /dev/null +++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args @@ -0,0 +1,41 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +/usr/bin/qemu-system-i386 \ +-name guest=QEMUGuest1,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/tmp/lib/domain--1-QEMUGuest1/master-key.aes"}' \ +-machine pc,accel=kvm,usb=off,dump-guest-core=off \ +-cpu qemu64 \ +-m size=2095104k,slots=16,maxmem=1099511627776k \ +-overcommit mem-lock=off \ +-smp 2,sockets=2,dies=1,cores=1,threads=1 \ +-object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":2145386496}' \ +-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \ +-object '{"qom-type":"memory-backend-ram","id":"memvirtiomem0","size":1073741824}' \ +-device virtio-mem-pci,node=0,block-size=2097152,requested-size=536870912,memdev=memvirtiomem0,id=virtiomem0,bus=pci.0,addr=0x2 \ +-object '{"qom-type":"memory-backend-file","id":"memvirtiomem1","mem-path":"/dev/hugepages2M/libvirt/qemu/-1-QEMUGuest1","size":2147483648,"host-nodes":[1,2,3],"policy":"bind"}' \ +-device virtio-mem-pci,node=0,block-size=2097152,requested-size=1073741824,memdev=memvirtiomem1,id=virtiomem1,bus=pci.0,addr=0x3 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-boot strict=on \ +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ +-blockdev '{"driver":"host_device","filename":"/dev/HostVG/QEMUGuest1","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \ +-device ide-hd,bus=ide.0,unit=0,drive=libvirt-1-format,id=ide0-0-0,bootindex=1 \ +-audiodev id=audio1,driver=none \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index f0efe98d7e..b5de93eccb 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -3228,6 +3228,7 @@ mymain(void) QEMU_CAPS_LAST); DO_TEST_CAPS_VER("memory-hotplug-virtio-pmem", "5.2.0"); DO_TEST_CAPS_LATEST("memory-hotplug-virtio-pmem"); + DO_TEST_CAPS_LATEST("memory-hotplug-virtio-mem"); DO_TEST("machine-aeskeywrap-on-caps", QEMU_CAPS_AES_KEY_WRAP, -- 2.26.3

On Fri, Apr 23, 2021 at 15:24:28 +0200, Michal Privoznik wrote:
Nothing special is happening here. All important changes were done when for 'virtio-pmem' (adjusting the code to put virtio memory on PCI bus, generating alias using qemuDomainDeviceAliasIndex(). The only bit that might look suspicious is no prealloc for virtio-mem. But if you think about it, the whole purpose of this device is to change amount of memory exposed to guest on the fly. There is no point in locking the whole backend in memory.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_alias.c | 9 +++- src/qemu/qemu_command.c | 12 +++++- ...mory-hotplug-virtio-mem.x86_64-latest.args | 41 +++++++++++++++++++ tests/qemuxml2argvtest.c | 1 + 4 files changed, 60 insertions(+), 3 deletions(-) create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

As advertised in one of previous commits, we want to be able to change 'requested-size' attribute of virtio-mem on the fly. This commit does exactly that. Changing anything else is checked for and forbidden. Once guest has changed the allocation, QEMU emits an event which we will use to track the allocation. In the next commit. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 36 ++++++++ src/conf/domain_conf.h | 4 + src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 172 ++++++++++++++++++++++++++++++++++- src/qemu/qemu_hotplug.c | 18 ++++ src/qemu/qemu_hotplug.h | 5 + src/qemu/qemu_monitor.c | 13 +++ src/qemu/qemu_monitor.h | 5 + src/qemu/qemu_monitor_json.c | 15 +++ src/qemu/qemu_monitor_json.h | 5 + 10 files changed, 273 insertions(+), 1 deletion(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 240bcfa020..e75c5c9dcd 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -17384,6 +17384,42 @@ virDomainMemoryFindInactiveByDef(virDomainDef *def, } +/** + * virDomainMemoryFindByDeviceInfo: + * @def: domain defintion + * @info: device info to match + * + * For given domain definition @def find <memory/> device with + * matching address and matching device alias (if set in @info, + * otherwise ignored). + * + * Returns: device if found, + * NULL otherwise. + */ +virDomainMemoryDef * +virDomainMemoryFindByDeviceInfo(virDomainDef *def, + virDomainDeviceInfo *info) +{ + size_t i; + + for (i = 0; i < def->nmems; i++) { + virDomainMemoryDef *tmp = def->mems[i]; + + if (!virDomainDeviceInfoAddressIsEqual(&tmp->info, info)) + continue; + + /* alias, if present */ + if (info->alias && + STRNEQ_NULLABLE(tmp->info.alias, info->alias)) + continue; + + return tmp; + } + + return NULL; +} + + /** * virDomainMemoryInsert: * diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 884e2c4d17..4fc864b1eb 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -3757,6 +3757,10 @@ int virDomainMemoryFindByDef(virDomainDef *def, virDomainMemoryDef *mem) int virDomainMemoryFindInactiveByDef(virDomainDef *def, virDomainMemoryDef *mem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; +virDomainMemoryDef * +virDomainMemoryFindByDeviceInfo(virDomainDef *dev, + virDomainDeviceInfo *info) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; int virDomainShmemDefInsert(virDomainDef *def, virDomainShmemDef *shmem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 1a3295b111..66af60a775 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -502,6 +502,7 @@ virDomainMemballoonModelTypeFromString; virDomainMemballoonModelTypeToString; virDomainMemoryDefFree; virDomainMemoryFindByDef; +virDomainMemoryFindByDeviceInfo; virDomainMemoryFindInactiveByDef; virDomainMemoryInsert; virDomainMemoryModelTypeToString; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index d908e95ba7..9a241ad551 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7130,6 +7130,165 @@ qemuDomainChangeDiskLive(virDomainObj *vm, return 0; } + +static bool +qemuDomainChangeMemoryLiveValidateChange(const virDomainMemoryDef *oldDef, + const virDomainMemoryDef *newDef) +{ + /* The only thing that is allowed to change is 'requestedsize' for virtio + * model. */ + if (oldDef->model != newDef->model) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory model from '%s' to '%s'"), + virDomainMemoryModelTypeToString(oldDef->model), + virDomainMemoryModelTypeToString(newDef->model)); + return false; + } + + if (oldDef->access != newDef->access) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory access from '%s' to '%s'"), + virDomainMemoryAccessTypeToString(oldDef->access), + virDomainMemoryAccessTypeToString(newDef->access)); + return false; + } + + if (oldDef->discard != newDef->discard) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory discard from '%s' to '%s'"), + virTristateBoolTypeToString(oldDef->discard), + virTristateBoolTypeToString(newDef->discard)); + return false; + } + + if (!virBitmapEqual(oldDef->sourceNodes, + newDef->sourceNodes)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory source nodes")); + return false; + } + + if (oldDef->pagesize != newDef->pagesize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory pagesize from '%llu' to '%llu'"), + oldDef->pagesize, + newDef->pagesize); + return false; + } + + if (STRNEQ_NULLABLE(oldDef->nvdimmPath, newDef->nvdimmPath)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory path from '%s' to '%s'"), + NULLSTR(oldDef->nvdimmPath), + NULLSTR(newDef->nvdimmPath)); + return false; + } + + if (oldDef->alignsize != newDef->alignsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory align size from '%llu' to '%llu'"), + oldDef->alignsize, newDef->alignsize); + return false; + } + + if (oldDef->nvdimmPmem != newDef->nvdimmPmem) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory pmem from '%d' to '%d'"), + oldDef->nvdimmPmem, newDef->nvdimmPmem); + return false; + } + + if (oldDef->targetNode != newDef->targetNode) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory targetNode from '%d' to '%d'"), + oldDef->targetNode, newDef->targetNode); + return false; + } + + if (oldDef->size != newDef->size) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory size from '%llu' to '%llu'"), + oldDef->size, newDef->size); + return false; + } + + if (oldDef->labelsize != newDef->labelsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory label size from '%llu' to '%llu'"), + oldDef->labelsize, newDef->labelsize); + return false; + } + if (oldDef->blocksize != newDef->blocksize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory block size from '%llu' to '%llu'"), + oldDef->blocksize, newDef->blocksize); + return false; + } + + /* requestedsize can change */ + + if (oldDef->readonly != newDef->readonly) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory pmem flag")); + return false; + } + + if ((oldDef->uuid || newDef->uuid) && + !(oldDef->uuid && newDef->uuid && + memcmp(oldDef->uuid, newDef->uuid, VIR_UUID_BUFLEN) == 0)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory UUID")); + return false; + } + + switch (oldDef->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory of model '%s'"), + virDomainMemoryModelTypeToString(oldDef->model)); + return false; + break; + } + + return true; +} + + +static int +qemuDomainChangeMemoryLive(virQEMUDriver *driver G_GNUC_UNUSED, + virDomainObj *vm, + virDomainDeviceDef *dev) +{ + virDomainMemoryDef *newDef = dev->data.memory; + virDomainMemoryDef *oldDef = NULL; + + oldDef = virDomainMemoryFindByDeviceInfo(vm->def, &dev->data.memory->info); + if (!oldDef) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("memory '%s' not found"), dev->data.memory->info.alias); + return -1; + } + + if (!qemuDomainChangeMemoryLiveValidateChange(oldDef, newDef)) + return -1; + + if (qemuDomainChangeMemoryRequestedSize(driver, vm, + newDef->info.alias, + newDef->requestedsize) < 0) + return -1; + + oldDef->requestedsize = newDef->requestedsize; + return 0; +} + + static int qemuDomainUpdateDeviceLive(virDomainObj *vm, virDomainDeviceDef *dev, @@ -7171,6 +7330,18 @@ qemuDomainUpdateDeviceLive(virDomainObj *vm, ret = qemuDomainChangeNet(driver, vm, dev); break; + case VIR_DOMAIN_DEVICE_MEMORY: + oldDev.data.memory = virDomainMemoryFindByDeviceInfo(vm->def, &dev->data.memory->info); + if (oldDev.data.memory) { + if (virDomainDefCompatibleDevice(vm->def, dev, &oldDev, + VIR_DOMAIN_DEVICE_ACTION_UPDATE, + true) < 0) + return -1; + } + + ret = qemuDomainChangeMemoryLive(driver, vm, dev); + break; + case VIR_DOMAIN_DEVICE_FS: case VIR_DOMAIN_DEVICE_INPUT: case VIR_DOMAIN_DEVICE_SOUND: @@ -7186,7 +7357,6 @@ qemuDomainUpdateDeviceLive(virDomainObj *vm, case VIR_DOMAIN_DEVICE_HOSTDEV: case VIR_DOMAIN_DEVICE_CONTROLLER: case VIR_DOMAIN_DEVICE_REDIRDEV: - case VIR_DOMAIN_DEVICE_MEMORY: case VIR_DOMAIN_DEVICE_CHR: case VIR_DOMAIN_DEVICE_NONE: case VIR_DOMAIN_DEVICE_TPM: diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 4344edc75b..1ac055bd6c 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -6704,3 +6704,21 @@ qemuDomainSetVcpuInternal(virQEMUDriver *driver, virBitmapFree(livevcpus); return ret; } + + +int +qemuDomainChangeMemoryRequestedSize(virQEMUDriver *driver, + virDomainObj *vm, + const char *alias, + unsigned long long requestedsize) +{ + qemuDomainObjPrivate *priv = vm->privateData; + int rc; + + qemuDomainObjEnterMonitor(driver, vm); + rc = qemuMonitorChangeMemoryRequestedSize(priv->mon, alias, requestedsize); + if (qemuDomainObjExitMonitor(driver, vm) < 0) + return -1; + + return rc; +} diff --git a/src/qemu/qemu_hotplug.h b/src/qemu/qemu_hotplug.h index df8f76f8d6..c8a9e5ebbb 100644 --- a/src/qemu/qemu_hotplug.h +++ b/src/qemu/qemu_hotplug.h @@ -160,3 +160,8 @@ int qemuHotplugAttachDBusVMState(virQEMUDriver *driver, int qemuHotplugRemoveDBusVMState(virQEMUDriver *driver, virDomainObj *vm, qemuDomainAsyncJob asyncJob); + +int qemuDomainChangeMemoryRequestedSize(virQEMUDriver *driver, + virDomainObj *vm, + const char *alias, + unsigned long long requestedsize); diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index f3f14c46b6..c5793a8238 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -4781,3 +4781,16 @@ qemuMonitorQueryDirtyRate(qemuMonitor *mon, return qemuMonitorJSONQueryDirtyRate(mon, info); } + + +int +qemuMonitorChangeMemoryRequestedSize(qemuMonitor *mon, + const char *alias, + unsigned long long requestedsize) +{ + VIR_DEBUG("alias=%s requestedsize=%llu", alias, requestedsize); + + QEMU_CHECK_MONITOR(mon); + + return qemuMonitorJSONChangeMemoryRequestedSize(mon, alias, requestedsize); +} diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 230d00a894..8d27a7e3fb 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1503,3 +1503,8 @@ struct _qemuMonitorDirtyRateInfo { int qemuMonitorQueryDirtyRate(qemuMonitor *mon, qemuMonitorDirtyRateInfo *info); + +int +qemuMonitorChangeMemoryRequestedSize(qemuMonitor *mon, + const char *alias, + unsigned long long requestedsize); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 652034472a..e2f68e838b 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -9568,3 +9568,18 @@ qemuMonitorJSONQueryDirtyRate(qemuMonitor *mon, return qemuMonitorJSONExtractDirtyRateInfo(data, info); } + + +int +qemuMonitorJSONChangeMemoryRequestedSize(qemuMonitor *mon, + const char *alias, + unsigned long long requestedsize) +{ + g_autofree char *path = g_strdup_printf("/machine/peripheral/%s", alias); + qemuMonitorJSONObjectProperty prop = { + .type = QEMU_MONITOR_OBJECT_PROPERTY_ULONG, + .val.ul = requestedsize * 1024, /* monitor needs bytes */ + }; + + return qemuMonitorJSONSetObjectProperty(mon, path, "requested-size", &prop); +} diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index 0846325a81..28280ca333 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -716,3 +716,8 @@ qemuMonitorJSONStartDirtyRateCalc(qemuMonitor *mon, int qemuMonitorJSONQueryDirtyRate(qemuMonitor *mon, qemuMonitorDirtyRateInfo *info); + +int +qemuMonitorJSONChangeMemoryRequestedSize(qemuMonitor *mon, + const char *alias, + unsigned long long requestedsize); -- 2.26.3

On Fri, Apr 23, 2021 at 15:24:29 +0200, Michal Privoznik wrote:
As advertised in one of previous commits, we want to be able to change 'requested-size' attribute of virtio-mem on the fly. This commit does exactly that. Changing anything else is checked for and forbidden.
Once guest has changed the allocation, QEMU emits an event which we will use to track the allocation. In the next commit.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 36 ++++++++ src/conf/domain_conf.h | 4 + src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 172 ++++++++++++++++++++++++++++++++++- src/qemu/qemu_hotplug.c | 18 ++++ src/qemu/qemu_hotplug.h | 5 + src/qemu/qemu_monitor.c | 13 +++ src/qemu/qemu_monitor.h | 5 + src/qemu/qemu_monitor_json.c | 15 +++ src/qemu/qemu_monitor_json.h | 5 + 10 files changed, 273 insertions(+), 1 deletion(-)
[...]
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index d908e95ba7..9a241ad551 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7130,6 +7130,165 @@ qemuDomainChangeDiskLive(virDomainObj *vm, return 0; }
+ +static bool +qemuDomainChangeMemoryLiveValidateChange(const virDomainMemoryDef *oldDef, + const virDomainMemoryDef *newDef) +{
I must say this function feels almost as the ABI stability checker.
+ /* The only thing that is allowed to change is 'requestedsize' for virtio + * model. */
Also this is a bit weird. You mention that we are dealing with virtio-mem ...
+ if (oldDef->model != newDef->model) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory model from '%s' to '%s'"), + virDomainMemoryModelTypeToString(oldDef->model), + virDomainMemoryModelTypeToString(newDef->model)); + return false; + } + + if (oldDef->access != newDef->access) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory access from '%s' to '%s'"), + virDomainMemoryAccessTypeToString(oldDef->access), + virDomainMemoryAccessTypeToString(newDef->access)); + return false; + } + + if (oldDef->discard != newDef->discard) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory discard from '%s' to '%s'"), + virTristateBoolTypeToString(oldDef->discard), + virTristateBoolTypeToString(newDef->discard)); + return false; + } + + if (!virBitmapEqual(oldDef->sourceNodes, + newDef->sourceNodes)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory source nodes")); + return false; + } + + if (oldDef->pagesize != newDef->pagesize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory pagesize from '%llu' to '%llu'"), + oldDef->pagesize, + newDef->pagesize); + return false; + } + + if (STRNEQ_NULLABLE(oldDef->nvdimmPath, newDef->nvdimmPath)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory path from '%s' to '%s'"), + NULLSTR(oldDef->nvdimmPath), + NULLSTR(newDef->nvdimmPath)); + return false; + }
... but also check stuff not relevant to virtio-mem ...
+ + if (oldDef->alignsize != newDef->alignsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory align size from '%llu' to '%llu'"), + oldDef->alignsize, newDef->alignsize); + return false; + } + + if (oldDef->nvdimmPmem != newDef->nvdimmPmem) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory pmem from '%d' to '%d'"), + oldDef->nvdimmPmem, newDef->nvdimmPmem); + return false; + } + + if (oldDef->targetNode != newDef->targetNode) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory targetNode from '%d' to '%d'"), + oldDef->targetNode, newDef->targetNode); + return false; + } + + if (oldDef->size != newDef->size) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory size from '%llu' to '%llu'"), + oldDef->size, newDef->size); + return false; + } + + if (oldDef->labelsize != newDef->labelsize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory label size from '%llu' to '%llu'"), + oldDef->labelsize, newDef->labelsize); + return false; + } + if (oldDef->blocksize != newDef->blocksize) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory block size from '%llu' to '%llu'"), + oldDef->blocksize, newDef->blocksize); + return false; + } + + /* requestedsize can change */ + + if (oldDef->readonly != newDef->readonly) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory pmem flag")); + return false; + } + + if ((oldDef->uuid || newDef->uuid) && + !(oldDef->uuid && newDef->uuid && + memcmp(oldDef->uuid, newDef->uuid, VIR_UUID_BUFLEN) == 0)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cannot modify memory UUID")); + return false; + } + + switch (oldDef->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + break; + + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("cannot modify memory of model '%s'"), + virDomainMemoryModelTypeToString(oldDef->model)); + return false;
... but at the end only allow virtio-mem, so the irrelevant checks are actually not even necessary.
+ break; + } + + return true; +} + + +static int +qemuDomainChangeMemoryLive(virQEMUDriver *driver G_GNUC_UNUSED, + virDomainObj *vm, + virDomainDeviceDef *dev) +{ + virDomainMemoryDef *newDef = dev->data.memory; + virDomainMemoryDef *oldDef = NULL; + + oldDef = virDomainMemoryFindByDeviceInfo(vm->def, &dev->data.memory->info); + if (!oldDef) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("memory '%s' not found"), dev->data.memory->info.alias);
[1]
+ return -1; + } + + if (!qemuDomainChangeMemoryLiveValidateChange(oldDef, newDef)) + return -1; + + if (qemuDomainChangeMemoryRequestedSize(driver, vm, + newDef->info.alias, + newDef->requestedsize) < 0) + return -1; + + oldDef->requestedsize = newDef->requestedsize; + return 0; +} + + static int qemuDomainUpdateDeviceLive(virDomainObj *vm, virDomainDeviceDef *dev, @@ -7171,6 +7330,18 @@ qemuDomainUpdateDeviceLive(virDomainObj *vm, ret = qemuDomainChangeNet(driver, vm, dev); break;
+ case VIR_DOMAIN_DEVICE_MEMORY: + oldDev.data.memory = virDomainMemoryFindByDeviceInfo(vm->def, &dev->data.memory->info); + if (oldDev.data.memory) {
Also this is a bit weird too. Here you do this optionally if virDomainMemoryFindByDeviceInfo finds the device ...
+ if (virDomainDefCompatibleDevice(vm->def, dev, &oldDev, + VIR_DOMAIN_DEVICE_ACTION_UPDATE, + true) < 0) + return -1; + } + + ret = qemuDomainChangeMemoryLive(driver, vm, dev);
... but here the first thing that is done is that virDomainMemoryFindByDeviceInfo is called again and a NULL return is fatal.
+ break; + case VIR_DOMAIN_DEVICE_FS: case VIR_DOMAIN_DEVICE_INPUT: case VIR_DOMAIN_DEVICE_SOUND:
With the wierdness reduced: Reviewed-by: Peter Krempa <pkrempa@redhat.com>

As advertised in previous commit, this event is delivered to us when virtio-mem module changes the allocation inside the guest. It comes with one attribute - size - which holds the new size of the virtio-mem (well, allocated size), in bytes. Mind you, this is not necessarily the same number as 'requested size'. It almost certainly will be when sizing the memory up, but it might not be when sizing the memory down - the guest kernel might be unable to free some blocks. This actual size is reported in the domain XML as an output element only. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.rst | 7 +++ docs/schemas/domaincommon.rng | 5 ++ examples/c/misc/event-test.c | 17 ++++++ include/libvirt/libvirt-domain.h | 23 ++++++++ src/conf/domain_conf.c | 26 ++++++++- src/conf/domain_conf.h | 8 +++ src/conf/domain_event.c | 84 +++++++++++++++++++++++++++++ src/conf/domain_event.h | 10 ++++ src/libvirt_private.syms | 3 ++ src/qemu/qemu_domain.c | 3 ++ src/qemu/qemu_domain.h | 1 + src/qemu/qemu_driver.c | 37 +++++++++++++ src/qemu/qemu_monitor.c | 24 +++++++++ src/qemu/qemu_monitor.h | 20 +++++++ src/qemu/qemu_monitor_json.c | 24 +++++++++ src/qemu/qemu_process.c | 42 +++++++++++++++ src/remote/remote_daemon_dispatch.c | 30 +++++++++++ src/remote/remote_driver.c | 32 +++++++++++ src/remote/remote_protocol.x | 14 ++++- src/remote_protocol-structs | 7 +++ tools/virsh-domain.c | 20 +++++++ 21 files changed, 434 insertions(+), 3 deletions(-) diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index 44e478c88a..7ab5d25488 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -7731,6 +7731,7 @@ Example: usage of the memory devices <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>1048576</requested> + <actual unit='KiB'>524288</requested> </target> </memory> </devices> @@ -7846,6 +7847,12 @@ Example: usage of the memory devices The total size exposed to the guest. Must respect ``block`` granularity and be smaller or equal to ``size``. + ``actual`` + Active XML for ``virtio-mem`` model may contain ``actual`` element that + reflects the actual size of the corresponding virtio memory device. The + element is formatted into live XML and never parsed, i.e. it is + output-only element. + :anchor:`<a id="elementsIommu"/>` IOMMU devices diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index b75c2f3ead..0ab9e82c09 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -6619,6 +6619,11 @@ <ref name="scaledInteger"/> </element> </optional> + <optional> + <element name="actual"> + <ref name="scaledInteger"/> + </element> + </optional> <optional> <element name="label"> <element name="size"> diff --git a/examples/c/misc/event-test.c b/examples/c/misc/event-test.c index 10c707e66b..1eec76c79d 100644 --- a/examples/c/misc/event-test.c +++ b/examples/c/misc/event-test.c @@ -982,6 +982,22 @@ myDomainEventMemoryFailureCallback(virConnectPtr conn G_GNUC_UNUSED, } +static int +myDomainEventMemoryDeviceSizeChangeCallback(virConnectPtr conn G_GNUC_UNUSED, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque G_GNUC_UNUSED) +{ + /* Casts to uint64_t to work around mingw not knowing %lld */ + printf("%s EVENT: Domain %s(%d) memory device size change: " + "alias: '%s' new size %" PRIu64 "'\n", + __func__, virDomainGetName(dom), virDomainGetID(dom), + alias, (uint64_t)size); + return 0; +} + + static int myDomainEventMigrationIterationCallback(virConnectPtr conn G_GNUC_UNUSED, virDomainPtr dom, @@ -1113,6 +1129,7 @@ struct domainEventData domainEvents[] = { DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_METADATA_CHANGE, myDomainEventMetadataChangeCallback), DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD, myDomainEventBlockThresholdCallback), DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_MEMORY_FAILURE, myDomainEventMemoryFailureCallback), + DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE, myDomainEventMemoryDeviceSizeChangeCallback), }; struct storagePoolEventData { diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index e99bfb7654..171a2ae704 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -4653,6 +4653,28 @@ typedef void (*virConnectDomainEventMemoryFailureCallback)(virConnectPtr conn, unsigned int flags, void *opaque); +/** + * virConnectDomainEventMemoryDeviceSizeChangeCallback: + * @conn: connection object + * @dom: domain on which the event occurred + * @alias: memory device alias + * @size: new actual size of memory device (in KiB) + * @opaque: application specified data + * + * The callback occurs when the guest acknowledges request to change size of + * memory device (so far only virtio-mem model supports this). The @size then + * reflects the new amount of guest visible memory (in kibibytes). + * + * The callback signature to use when registering for an event of type + * VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE with + * virConnectDomainEventRegisterAny(). + */ +typedef void (*virConnectDomainEventMemoryDeviceSizeChangeCallback)(virConnectPtr conn, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque); + /** * VIR_DOMAIN_EVENT_CALLBACK: @@ -4697,6 +4719,7 @@ typedef enum { VIR_DOMAIN_EVENT_ID_METADATA_CHANGE = 23, /* virConnectDomainEventMetadataChangeCallback */ VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD = 24, /* virConnectDomainEventBlockThresholdCallback */ VIR_DOMAIN_EVENT_ID_MEMORY_FAILURE = 25, /* virConnectDomainEventMemoryFailureCallback */ + VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE = 26, /* virConnectDomainEventMemoryDeviceSizeChangeCallback */ # ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_EVENT_ID_LAST diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index e75c5c9dcd..d578a95526 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -17420,6 +17420,23 @@ virDomainMemoryFindByDeviceInfo(virDomainDef *def, } +virDomainMemoryDef * +virDomainMemoryFindByDeviceAlias(virDomainDef *def, + const char *alias) +{ + size_t i; + + for (i = 0; i < def->nmems; i++) { + virDomainMemoryDef *tmp = def->mems[i]; + + if (STREQ_NULLABLE(tmp->info.alias, alias)) + return tmp; + } + + return NULL; +} + + /** * virDomainMemoryInsert: * @@ -26979,7 +26996,8 @@ virDomainMemorySourceDefFormat(virBuffer *buf, static void virDomainMemoryTargetDefFormat(virBuffer *buf, - virDomainMemoryDef *def) + virDomainMemoryDef *def, + unsigned int flags) { g_auto(virBuffer) childBuf = VIR_BUFFER_INIT_CHILD(buf); @@ -27001,6 +27019,10 @@ virDomainMemoryTargetDefFormat(virBuffer *buf, virBufferAsprintf(&childBuf, "<requested unit='KiB'>%llu</requested>\n", def->requestedsize); + if (!(flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE)) { + virBufferAsprintf(&childBuf, "<actual unit='KiB'>%llu</actual>\n", + def->actualsize); + } } virXMLFormatElement(buf, "target", NULL, &childBuf); @@ -27033,7 +27055,7 @@ virDomainMemoryDefFormat(virBuffer *buf, if (virDomainMemorySourceDefFormat(buf, def) < 0) return -1; - virDomainMemoryTargetDefFormat(buf, def); + virDomainMemoryTargetDefFormat(buf, def, flags); virDomainDeviceInfoFormat(buf, &def->info, flags); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 4fc864b1eb..87b0d53a27 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2469,6 +2469,9 @@ struct _virDomainMemoryDef { unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */ unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */ unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM */ + unsigned long long actualsize; /* kibibytes, valid for VIRTIO_MEM and + active domain only, only to report never + parse */ bool readonly; /* valid only for NVDIMM */ /* required for QEMU NVDIMM ppc64 support */ @@ -3762,6 +3765,11 @@ virDomainMemoryFindByDeviceInfo(virDomainDef *dev, virDomainDeviceInfo *info) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; +virDomainMemoryDef * +virDomainMemoryFindByDeviceAlias(virDomainDef *def, + const char *alias) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; + int virDomainShmemDefInsert(virDomainDef *def, virDomainShmemDef *shmem) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; bool virDomainShmemDefEquals(virDomainShmemDef *src, virDomainShmemDef *dst) diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c index 15a3baedf7..196093884a 100644 --- a/src/conf/domain_event.c +++ b/src/conf/domain_event.c @@ -58,6 +58,7 @@ static virClass *virDomainEventDeviceRemovalFailedClass; static virClass *virDomainEventMetadataChangeClass; static virClass *virDomainEventBlockThresholdClass; static virClass *virDomainEventMemoryFailureClass; +static virClass *virDomainEventMemoryDeviceSizeChangeClass; static void virDomainEventDispose(void *obj); static void virDomainEventLifecycleDispose(void *obj); @@ -81,6 +82,7 @@ static void virDomainEventDeviceRemovalFailedDispose(void *obj); static void virDomainEventMetadataChangeDispose(void *obj); static void virDomainEventBlockThresholdDispose(void *obj); static void virDomainEventMemoryFailureDispose(void *obj); +static void virDomainEventMemoryDeviceSizeChangeDispose(void *obj); static void virDomainEventDispatchDefaultFunc(virConnectPtr conn, @@ -277,6 +279,15 @@ struct _virDomainEventMemoryFailure { }; typedef struct _virDomainEventMemoryFailure virDomainEventMemoryFailure; +struct _virDomainEventMemoryDeviceSizeChange { + virDomainEvent parent; + + char *alias; + unsigned long long size; +}; +typedef struct _virDomainEventMemoryDeviceSizeChange virDomainEventMemoryDeviceSizeChange; +typedef virDomainEventMemoryDeviceSizeChange *virDomainEventMemoryDeviceSizeChangePtr; + static int virDomainEventsOnceInit(void) { @@ -324,6 +335,8 @@ virDomainEventsOnceInit(void) return -1; if (!VIR_CLASS_NEW(virDomainEventMemoryFailure, virDomainEventClass)) return -1; + if (!VIR_CLASS_NEW(virDomainEventMemoryDeviceSizeChange, virDomainEventClass)) + return -1; return 0; } @@ -540,6 +553,14 @@ virDomainEventMemoryFailureDispose(void *obj) VIR_DEBUG("obj=%p", event); } +static void +virDomainEventMemoryDeviceSizeChangeDispose(void *obj) +{ + virDomainEventMemoryDeviceSizeChangePtr event = obj; + VIR_DEBUG("obj=%p", event); + + g_free(event->alias); +} static void * virDomainEventNew(virClass *klass, @@ -1664,6 +1685,57 @@ virDomainEventMemoryFailureNewFromDom(virDomainPtr dom, recipient, action, flags); } + +static virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNew(int id, + const char *name, + unsigned char *uuid, + const char *alias, + unsigned long long size) +{ + virDomainEventMemoryDeviceSizeChangePtr ev; + + if (virDomainEventsInitialize() < 0) + return NULL; + + if (!(ev = virDomainEventNew(virDomainEventMemoryDeviceSizeChangeClass, + VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE, + id, name, uuid))) + return NULL; + + ev->alias = g_strdup(alias); + ev->size = size; + + return (virObjectEvent *)ev; +} + + +virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNewFromObj(virDomainObj *obj, + const char *alias, + unsigned long long size) +{ + return virDomainEventMemoryDeviceSizeChangeNew(obj->def->id, + obj->def->name, + obj->def->uuid, + alias, + size); +} + + +virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNewFromDom(virDomainPtr dom, + const char *alias, + unsigned long long size) +{ + return virDomainEventMemoryDeviceSizeChangeNew(dom->id, + dom->name, + dom->uuid, + alias, + size); +} + + static void virDomainEventDispatchDefaultFunc(virConnectPtr conn, virObjectEvent *event, @@ -1960,6 +2032,18 @@ virDomainEventDispatchDefaultFunc(virConnectPtr conn, goto cleanup; } + case VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE: + { + virDomainEventMemoryDeviceSizeChangePtr memoryDeviceSizeChangeEvent; + + memoryDeviceSizeChangeEvent = (virDomainEventMemoryDeviceSizeChangePtr)event; + ((virConnectDomainEventMemoryDeviceSizeChangeCallback)cb)(conn, dom, + memoryDeviceSizeChangeEvent->alias, + memoryDeviceSizeChangeEvent->size, + cbopaque); + goto cleanup; + } + case VIR_DOMAIN_EVENT_ID_LAST: break; } diff --git a/src/conf/domain_event.h b/src/conf/domain_event.h index 2a59e613cd..4a9f6b988b 100644 --- a/src/conf/domain_event.h +++ b/src/conf/domain_event.h @@ -267,6 +267,16 @@ virDomainEventMemoryFailureNewFromDom(virDomainPtr dom, int action, unsigned int flags); +virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNewFromObj(virDomainObj *obj, + const char *alias, + unsigned long long size); + +virObjectEvent * +virDomainEventMemoryDeviceSizeChangeNewFromDom(virDomainPtr dom, + const char *alias, + unsigned long long size); + int virDomainEventStateRegister(virConnectPtr conn, virObjectEventState *state, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 66af60a775..29efdaac6b 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -502,6 +502,7 @@ virDomainMemballoonModelTypeFromString; virDomainMemballoonModelTypeToString; virDomainMemoryDefFree; virDomainMemoryFindByDef; +virDomainMemoryFindByDeviceAlias; virDomainMemoryFindByDeviceInfo; virDomainMemoryFindInactiveByDef; virDomainMemoryInsert; @@ -716,6 +717,8 @@ virDomainEventLifecycleNew; virDomainEventLifecycleNewFromDef; virDomainEventLifecycleNewFromDom; virDomainEventLifecycleNewFromObj; +virDomainEventMemoryDeviceSizeChangeNewFromDom; +virDomainEventMemoryDeviceSizeChangeNewFromObj; virDomainEventMemoryFailureNewFromDom; virDomainEventMemoryFailureNewFromObj; virDomainEventMetadataChangeNewFromDom; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index c182b47f38..226e1d9b79 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11053,6 +11053,9 @@ qemuProcessEventFree(struct qemuProcessEvent *event) case QEMU_PROCESS_EVENT_JOB_STATUS_CHANGE: virObjectUnref(event->data); break; + case QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE: + qemuMonitorMemoryDeviceSizeChangeFree(event->data); + break; case QEMU_PROCESS_EVENT_PR_DISCONNECT: case QEMU_PROCESS_EVENT_LAST: break; diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 2626f5dcaa..3302ced640 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -427,6 +427,7 @@ typedef enum { QEMU_PROCESS_EVENT_PR_DISCONNECT, QEMU_PROCESS_EVENT_RDMA_GID_STATUS_CHANGED, QEMU_PROCESS_EVENT_GUEST_CRASHLOADED, + QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE, QEMU_PROCESS_EVENT_LAST } qemuProcessEventType; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 9a241ad551..b495a261d1 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4317,6 +4317,40 @@ processGuestCrashloadedEvent(virQEMUDriver *driver, } +static void +processMemoryDeviceSizeChange(virQEMUDriver *driver, + virDomainObj *vm, + qemuMonitorMemoryDeviceSizeChange *info) +{ + virDomainMemoryDef *mem = NULL; + virObjectEvent *event = NULL; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) + return; + + if (!virDomainObjIsActive(vm)) { + VIR_DEBUG("Domain is not running"); + goto endjob; + } + + mem = virDomainMemoryFindByDeviceAlias(vm->def, info->devAlias); + if (!mem) { + VIR_DEBUG("Memory device '%s' not found", info->devAlias); + goto endjob; + } + + mem->actualsize = VIR_DIV_UP(info->size, 1024); + + event = virDomainEventMemoryDeviceSizeChangeNewFromObj(vm, + info->devAlias, + mem->actualsize); + + endjob: + qemuDomainObjEndJob(driver, vm); + virObjectEventStateQueue(driver->domainEventState, event); +} + + static void qemuProcessEventHandler(void *data, void *opaque) { struct qemuProcessEvent *processEvent = data; @@ -4366,6 +4400,9 @@ static void qemuProcessEventHandler(void *data, void *opaque) case QEMU_PROCESS_EVENT_GUEST_CRASHLOADED: processGuestCrashloadedEvent(driver, vm); break; + case QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE: + processMemoryDeviceSizeChange(driver, vm, processEvent->data); + break; case QEMU_PROCESS_EVENT_LAST: break; } diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index c5793a8238..9292d63a7f 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1449,6 +1449,20 @@ qemuMonitorEmitSpiceMigrated(qemuMonitor *mon) } +int +qemuMonitorEmitMemoryDeviceSizeChange(qemuMonitor *mon, + const char *devAlias, + unsigned long long size) +{ + int ret = -1; + VIR_DEBUG("mon=%p, devAlias='%s', size=%llu", mon, devAlias, size); + + QEMU_MONITOR_CALLBACK(mon, ret, domainMemoryDeviceSizeChange, mon->vm, devAlias, size); + + return ret; +} + + int qemuMonitorEmitMemoryFailure(qemuMonitor *mon, qemuMonitorEventMemoryFailure *mfp) @@ -4431,6 +4445,16 @@ qemuMonitorEventRdmaGidStatusFree(qemuMonitorRdmaGidStatus *info) } +void +qemuMonitorMemoryDeviceSizeChangeFree(qemuMonitorMemoryDeviceSizeChangePtr info) +{ + if (!info) + return; + + g_free(info->devAlias); +} + + int qemuMonitorSetWatchdogAction(qemuMonitor *mon, const char *action) diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 8d27a7e3fb..078310de61 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -101,6 +101,14 @@ struct _qemuMonitorRdmaGidStatus { }; +typedef struct _qemuMonitorMemoryDeviceSizeChange qemuMonitorMemoryDeviceSizeChange; +typedef qemuMonitorMemoryDeviceSizeChange *qemuMonitorMemoryDeviceSizeChangePtr; +struct _qemuMonitorMemoryDeviceSizeChange { + char *devAlias; + unsigned long long size; +}; + + typedef enum { QEMU_MONITOR_JOB_TYPE_UNKNOWN, /* internal value, not exposed by qemu */ QEMU_MONITOR_JOB_TYPE_COMMIT, @@ -145,6 +153,7 @@ struct _qemuMonitorJobInfo { char *qemuMonitorGuestPanicEventInfoFormatMsg(qemuMonitorEventPanicInfo *info); void qemuMonitorEventPanicInfoFree(qemuMonitorEventPanicInfo *info); void qemuMonitorEventRdmaGidStatusFree(qemuMonitorRdmaGidStatus *info); +void qemuMonitorMemoryDeviceSizeChangeFree(qemuMonitorMemoryDeviceSizeChange *info); typedef void (*qemuMonitorDestroyCallback)(qemuMonitor *mon, virDomainObj *vm, @@ -364,6 +373,12 @@ typedef int (*qemuMonitorDomainMemoryFailureCallback)(qemuMonitor *mon, qemuMonitorEventMemoryFailure *mfp, void *opaque); +typedef int (*qemuMonitorDomainMemoryDeviceSizeChange)(qemuMonitor *mon, + virDomainObj *vm, + const char *alias, + unsigned long long size, + void *opaque); + typedef struct _qemuMonitorCallbacks qemuMonitorCallbacks; struct _qemuMonitorCallbacks { qemuMonitorDestroyCallback destroy; @@ -400,6 +415,7 @@ struct _qemuMonitorCallbacks { qemuMonitorDomainRdmaGidStatusChangedCallback domainRdmaGidStatusChanged; qemuMonitorDomainGuestCrashloadedCallback domainGuestCrashloaded; qemuMonitorDomainMemoryFailureCallback domainMemoryFailure; + qemuMonitorDomainMemoryDeviceSizeChange domainMemoryDeviceSizeChange; }; qemuMonitor *qemuMonitorOpen(virDomainObj *vm, @@ -496,6 +512,10 @@ int qemuMonitorEmitSerialChange(qemuMonitor *mon, bool connected); int qemuMonitorEmitSpiceMigrated(qemuMonitor *mon); +int qemuMonitorEmitMemoryDeviceSizeChange(qemuMonitor *mon, + const char *devAlias, + unsigned long long size); + int qemuMonitorEmitMemoryFailure(qemuMonitor *mon, qemuMonitorEventMemoryFailure *mfp); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index e2f68e838b..071c6a19be 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -113,6 +113,7 @@ static void qemuMonitorJSONHandleDumpCompleted(qemuMonitor *mon, virJSONValue *d static void qemuMonitorJSONHandlePRManagerStatusChanged(qemuMonitor *mon, virJSONValue *data); static void qemuMonitorJSONHandleRdmaGidStatusChanged(qemuMonitor *mon, virJSONValue *data); static void qemuMonitorJSONHandleMemoryFailure(qemuMonitor *mon, virJSONValue *data); +static void qemuMonitorJSONHandleMemoryDeviceSizeChange(qemuMonitor *mon, virJSONValue *data); typedef struct { const char *type; @@ -133,6 +134,7 @@ static qemuEventHandler eventHandlers[] = { { "GUEST_CRASHLOADED", qemuMonitorJSONHandleGuestCrashloaded, }, { "GUEST_PANICKED", qemuMonitorJSONHandleGuestPanic, }, { "JOB_STATUS_CHANGE", qemuMonitorJSONHandleJobStatusChange, }, + { "MEMORY_DEVICE_SIZE_CHANGE", qemuMonitorJSONHandleMemoryDeviceSizeChange, }, { "MEMORY_FAILURE", qemuMonitorJSONHandleMemoryFailure, }, { "MIGRATION", qemuMonitorJSONHandleMigrationStatus, }, { "MIGRATION_PASS", qemuMonitorJSONHandleMigrationPass, }, @@ -1320,6 +1322,28 @@ qemuMonitorJSONHandleSpiceMigrated(qemuMonitor *mon, } +static void +qemuMonitorJSONHandleMemoryDeviceSizeChange(qemuMonitor *mon, + virJSONValue *data) +{ + const char *name; + unsigned long long size; + + if (!(name = virJSONValueObjectGetString(data, "id"))) { + VIR_WARN("missing device alias in MEMORY_DEVICE_SIZE_CHANGE event"); + return; + } + + if (virJSONValueObjectGetNumberUlong(data, "size", &size) < 0) { + VIR_WARN("missing new size for '%s' in MEMORY_DEVICE_SIZE_CHANGE event", name); + return; + } + + + qemuMonitorEmitMemoryDeviceSizeChange(mon, name, size); +} + + static void qemuMonitorJSONHandleMemoryFailure(qemuMonitor *mon, virJSONValue *data) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 9c16e51917..47fa4068b0 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1919,6 +1919,47 @@ qemuProcessHandleMemoryFailure(qemuMonitor *mon G_GNUC_UNUSED, } +static int +qemuProcessHandleMemoryDeviceSizeChange(qemuMonitor *mon G_GNUC_UNUSED, + virDomainObj *vm, + const char *devAlias, + unsigned long long size, + void *opaque) +{ + virQEMUDriver *driver = opaque; + struct qemuProcessEvent *processEvent = NULL; + qemuMonitorMemoryDeviceSizeChange *info = NULL; + int ret = -1; + + virObjectLock(vm); + + VIR_DEBUG("Memory device '%s' changed size to '%llu' in domain '%s'", + devAlias, size, vm->def->name); + + info = g_new0(qemuMonitorMemoryDeviceSizeChange, 1); + info->devAlias = g_strdup(devAlias); + info->size = size; + + processEvent = g_new0(struct qemuProcessEvent, 1); + processEvent->eventType = QEMU_PROCESS_EVENT_MEMORY_DEVICE_SIZE_CHANGE; + processEvent->vm = virObjectRef(vm); + processEvent->data = g_steal_pointer(&info); + + if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) { + qemuProcessEventFree(processEvent); + virObjectUnref(vm); + goto cleanup; + } + + processEvent = NULL; + ret = 0; + cleanup: + qemuProcessEventFree(processEvent); + virObjectUnlock(vm); + return ret; +} + + static qemuMonitorCallbacks monitorCallbacks = { .eofNotify = qemuProcessHandleMonitorEOF, .errorNotify = qemuProcessHandleMonitorError, @@ -1952,6 +1993,7 @@ static qemuMonitorCallbacks monitorCallbacks = { .domainRdmaGidStatusChanged = qemuProcessHandleRdmaGidStatusChanged, .domainGuestCrashloaded = qemuProcessHandleGuestCrashloaded, .domainMemoryFailure = qemuProcessHandleMemoryFailure, + .domainMemoryDeviceSizeChange = qemuProcessHandleMemoryDeviceSizeChange, }; static void diff --git a/src/remote/remote_daemon_dispatch.c b/src/remote/remote_daemon_dispatch.c index 1b4f5256c3..285078f772 100644 --- a/src/remote/remote_daemon_dispatch.c +++ b/src/remote/remote_daemon_dispatch.c @@ -1333,6 +1333,35 @@ remoteRelayDomainEventMemoryFailure(virConnectPtr conn, } +static int +remoteRelayDomainEventMemoryDeviceSizeChange(virConnectPtr conn, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque) +{ + daemonClientEventCallback *callback = opaque; + remote_domain_event_memory_device_size_change_msg data; + + if (callback->callbackID < 0 || + !remoteRelayDomainEventCheckACL(callback->client, conn, dom)) + return -1; + + /* build return data */ + memset(&data, 0, sizeof(data)); + data.callbackID = callback->callbackID; + data.alias = g_strdup(alias); + data.size = size; + make_nonnull_domain(&data.dom, dom); + + remoteDispatchObjectEventSend(callback->client, remoteProgram, + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE, + (xdrproc_t)xdr_remote_domain_event_memory_device_size_change_msg, + &data); + return 0; +} + + static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventLifecycle), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventReboot), @@ -1360,6 +1389,7 @@ static virConnectDomainEventGenericCallback domainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMetadataChange), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventBlockThreshold), VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMemoryFailure), + VIR_DOMAIN_EVENT_CALLBACK(remoteRelayDomainEventMemoryDeviceSizeChange), }; G_STATIC_ASSERT(G_N_ELEMENTS(domainEventCallbacks) == VIR_DOMAIN_EVENT_ID_LAST); diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 0c72d69933..0e0b92d584 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -410,6 +410,10 @@ remoteDomainBuildEventMemoryFailure(virNetClientProgram *prog, void *evdata, void *opaque); static void +remoteDomainBuildEventMemoryDeviceSizeChange(virNetClientProgram *prog, + virNetClient *client, + void *evdata, void *opaque); +static void remoteConnectNotifyEventConnectionClosed(virNetClientProgram *prog G_GNUC_UNUSED, virNetClient *client G_GNUC_UNUSED, void *evdata, void *opaque); @@ -624,6 +628,10 @@ static virNetClientProgramEvent remoteEvents[] = { remoteDomainBuildEventMemoryFailure, sizeof(remote_domain_event_memory_failure_msg), (xdrproc_t)xdr_remote_domain_event_memory_failure_msg }, + { REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE, + remoteDomainBuildEventMemoryDeviceSizeChange, + sizeof(remote_domain_event_memory_device_size_change_msg), + (xdrproc_t)xdr_remote_domain_event_memory_device_size_change_msg }, }; static void @@ -5462,6 +5470,30 @@ remoteDomainBuildEventMemoryFailure(virNetClientProgram *prog G_GNUC_UNUSED, } +static void +remoteDomainBuildEventMemoryDeviceSizeChange(virNetClientProgram *prog G_GNUC_UNUSED, + virNetClient *client G_GNUC_UNUSED, + void *evdata, void *opaque) +{ + virConnectPtr conn = opaque; + remote_domain_event_memory_device_size_change_msg *msg = evdata; + struct private_data *priv = conn->privateData; + virDomainPtr dom; + virObjectEvent *event = NULL; + + if (!(dom = get_nonnull_domain(conn, msg->dom))) + return; + + event = virDomainEventMemoryDeviceSizeChangeNewFromDom(dom, + msg->alias, + msg->size); + + virObjectUnref(dom); + + virObjectEventStateQueueRemote(priv->eventState, event, msg->callbackID); +} + + static int remoteStreamSend(virStreamPtr st, const char *data, diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index de69704b68..a45bdaa7e8 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -3837,6 +3837,13 @@ struct remote_domain_start_dirty_rate_calc_args { }; +struct remote_domain_event_memory_device_size_change_msg { + int callbackID; + remote_nonnull_domain dom; + remote_nonnull_string alias; + unsigned hyper size; +}; + /*----- Protocol. -----*/ /* Define the program number, protocol version and procedure numbers here. */ @@ -6784,6 +6791,11 @@ enum remote_procedure { * @priority: high * @acl: node_device:start */ - REMOTE_PROC_NODE_DEVICE_CREATE = 430 + REMOTE_PROC_NODE_DEVICE_CREATE = 430, + /** + * @generate: both + * @acl: none + */ + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE = 431 }; diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index 6b46328adc..7286c06ae8 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -3192,6 +3192,12 @@ struct remote_domain_start_dirty_rate_calc_args { int seconds; u_int flags; }; +struct remote_domain_event_memory_device_size_change_msg { + int callbackID; + remote_nonnull_domain dom; + remote_nonnull_string alias; + uint64_t size; +}; enum remote_procedure { REMOTE_PROC_CONNECT_OPEN = 1, REMOTE_PROC_CONNECT_CLOSE = 2, @@ -3623,4 +3629,5 @@ enum remote_procedure { REMOTE_PROC_NODE_DEVICE_DEFINE_XML = 428, REMOTE_PROC_NODE_DEVICE_UNDEFINE = 429, REMOTE_PROC_NODE_DEVICE_CREATE = 430, + REMOTE_PROC_DOMAIN_EVENT_MEMORY_DEVICE_SIZE_CHANGE = 431, }; diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 0825f82522..7384d2d68d 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -13475,6 +13475,24 @@ virshEventMemoryFailurePrint(virConnectPtr conn G_GNUC_UNUSED, } +static void +virshEventMemoryDeviceSizeChangePrint(virConnectPtr conn G_GNUC_UNUSED, + virDomainPtr dom, + const char *alias, + unsigned long long size, + void *opaque) +{ + g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER; + + virBufferAsprintf(&buf, + _("event 'memory-device-size-change' for domain '%s':\n" + "alias: %s\nsize: %llu\n"), + virDomainGetName(dom), alias, size); + + virshEventPrint(opaque, &buf); +} + + virshDomainEventCallback virshDomainEventCallbacks[] = { { "lifecycle", VIR_DOMAIN_EVENT_CALLBACK(virshEventLifecyclePrint), }, @@ -13526,6 +13544,8 @@ virshDomainEventCallback virshDomainEventCallbacks[] = { VIR_DOMAIN_EVENT_CALLBACK(virshEventBlockThresholdPrint), }, { "memory-failure", VIR_DOMAIN_EVENT_CALLBACK(virshEventMemoryFailurePrint), }, + { "memory-device-size-change", + VIR_DOMAIN_EVENT_CALLBACK(virshEventMemoryDeviceSizeChangePrint), }, }; G_STATIC_ASSERT(VIR_DOMAIN_EVENT_ID_LAST == G_N_ELEMENTS(virshDomainEventCallbacks)); -- 2.26.3

On Fri, Apr 23, 2021 at 15:24:30 +0200, Michal Privoznik wrote:
As advertised in previous commit, this event is delivered to us when virtio-mem module changes the allocation inside the guest. It comes with one attribute - size - which holds the new size of the virtio-mem (well, allocated size), in bytes. Mind you, this is not necessarily the same number as 'requested size'. It almost certainly will be when sizing the memory up, but it might not be when sizing the memory down - the guest kernel might be unable to free some blocks.
This actual size is reported in the domain XML as an output element only.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.rst | 7 +++ docs/schemas/domaincommon.rng | 5 ++ examples/c/misc/event-test.c | 17 ++++++ include/libvirt/libvirt-domain.h | 23 ++++++++ src/conf/domain_conf.c | 26 ++++++++- src/conf/domain_conf.h | 8 +++ src/conf/domain_event.c | 84 +++++++++++++++++++++++++++++ src/conf/domain_event.h | 10 ++++ src/libvirt_private.syms | 3 ++ src/qemu/qemu_domain.c | 3 ++ src/qemu/qemu_domain.h | 1 + src/qemu/qemu_driver.c | 37 +++++++++++++ src/qemu/qemu_monitor.c | 24 +++++++++ src/qemu/qemu_monitor.h | 20 +++++++ src/qemu/qemu_monitor_json.c | 24 +++++++++ src/qemu/qemu_process.c | 42 +++++++++++++++ src/remote/remote_daemon_dispatch.c | 30 +++++++++++ src/remote/remote_driver.c | 32 +++++++++++ src/remote/remote_protocol.x | 14 ++++- src/remote_protocol-structs | 7 +++ tools/virsh-domain.c | 20 +++++++ 21 files changed, 434 insertions(+), 3 deletions(-)
Mixing public API changes, wit XML and qemu changes is NOT acceptable. This commit is also too chaotic. I'll review it once you split it appropriately.

If the QEMU driver restarts it loses the track of the actual size of virtio-mem (because it's runtime type of information and thus not stored in XML) and therefore, we have to refresh it when reconnecting to the domain monitor. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 37 +++++++++++++++++++---- src/qemu/qemu_monitor.h | 3 ++ src/qemu/qemu_monitor_json.c | 58 ++++++++++++++++++++++-------------- src/qemu/qemu_process.c | 3 ++ 4 files changed, 73 insertions(+), 28 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 226e1d9b79..3c17d8704a 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -8278,9 +8278,21 @@ qemuDomainUpdateMemoryDeviceInfo(virQEMUDriver *driver, return -1; } - /* if qemu doesn't support the info request, just carry on */ - if (rc == -2) + /* If qemu doesn't support the info request, just carry on, unless we + * really need it. */ + if (rc == -2) { + for (i = 0; i < vm->def->nmems; i++) { + virDomainMemoryDef *mem = vm->def->mems[i]; + + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("qemu did not return info on vitio-mem device")); + return -1; + } + } + return 0; + } if (rc < 0) return -1; @@ -8295,9 +8307,24 @@ qemuDomainUpdateMemoryDeviceInfo(virQEMUDriver *driver, if (!(dimm = virHashLookup(meminfo, mem->info.alias))) continue; - mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM; - mem->info.addr.dimm.slot = dimm->slot; - mem->info.addr.dimm.base = dimm->address; + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + mem->actualsize = VIR_DIV_UP(dimm->size, 1024); + break; + + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM; + mem->info.addr.dimm.slot = dimm->slot; + mem->info.addr.dimm.base = dimm->address; + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + /* nada */ + break; + } } virHashFree(meminfo); diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 078310de61..a76a5c8799 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1358,10 +1358,13 @@ int qemuMonitorSetIOThread(qemuMonitor *mon, typedef struct _qemuMonitorMemoryDeviceInfo qemuMonitorMemoryDeviceInfo; struct _qemuMonitorMemoryDeviceInfo { + /* For pc-dimm */ unsigned long long address; unsigned int slot; bool hotplugged; bool hotpluggable; + /* For virtio-mem */ + unsigned long long size; /* in bytes */ }; int qemuMonitorGetMemoryDeviceInfo(qemuMonitor *mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 071c6a19be..b0a65a8617 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -8220,7 +8220,6 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, virJSONValue *cmd; virJSONValue *reply = NULL; virJSONValue *data = NULL; - qemuMonitorMemoryDeviceInfo *meminfo = NULL; size_t i; if (!(cmd = qemuMonitorJSONMakeCommand("query-memory-devices", NULL))) @@ -8241,6 +8240,9 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, for (i = 0; i < virJSONValueArraySize(data); i++) { virJSONValue *elem = virJSONValueArrayGet(data, i); + g_autofree qemuMonitorMemoryDeviceInfo *meminfo = NULL; + virJSONValue *dimminfo; + const char *devalias; const char *type; if (!(type = virJSONValueObjectGetString(elem, "type"))) { @@ -8250,26 +8252,26 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, goto cleanup; } + if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("query-memory-devices reply data doesn't " + "contain enum data")); + goto cleanup; + } + + /* While 'id' attribute is marked as optional in QEMU's QAPI + * specification, Libvirt always sets it. Thus we can fail if not + * present. */ + if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("dimm memory info data is missing 'id'")); + goto cleanup; + } + + meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1); + /* dimm memory devices */ if (STREQ(type, "dimm")) { - virJSONValue *dimminfo; - const char *devalias; - - if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("query-memory-devices reply data doesn't " - "contain enum data")); - goto cleanup; - } - - if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("dimm memory info data is missing 'id'")); - goto cleanup; - } - - meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1); - if (virJSONValueObjectGetNumberUlong(dimminfo, "addr", &meminfo->address) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -8300,17 +8302,27 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, } - if (virHashAddEntry(info, devalias, meminfo) < 0) + } else if (STREQ(type, "virtio-mem")) { + if (virJSONValueObjectGetNumberUlong(dimminfo, "size", + &meminfo->size) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("malformed/missing slot in dimm memory info")); goto cleanup; - - meminfo = NULL; + } + } else { + /* type not handled yet */ + continue; } + + if (virHashAddEntry(info, devalias, meminfo) < 0) + goto cleanup; + + meminfo = NULL; } ret = 0; cleanup: - VIR_FREE(meminfo); virJSONValueFree(cmd); virJSONValueFree(reply); return ret; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 47fa4068b0..55f081cc9d 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -8561,6 +8561,9 @@ qemuProcessReconnect(void *opaque) qemuDomainVcpuPersistOrder(obj->def); + if (qemuDomainUpdateMemoryDeviceInfo(driver, obj, QEMU_ASYNC_JOB_NONE) < 0) + goto error; + if (qemuProcessDetectIOThreadPIDs(driver, obj, QEMU_ASYNC_JOB_NONE) < 0) goto error; -- 2.26.3

Tested the patch with libvirt v7.2.0-381-g3c3c55be66 & qemu-kvm-5.2.0-0.7.rc2.fc34.x86_64 S1: Start domain with virtio-mem device # virsh start pc_test # virsh dumpxml pc_test <domain type='kvm' id='11'> <name>pc_test</name> <uuid>927da985-2937-4dfe-ac13-be723293e0d9</uuid> <maxMemory slots='16' unit='KiB'>6291456</maxMemory> <memory unit='KiB'>1179648</memory> <currentMemory unit='KiB'>1179648</currentMemory> ... <memory model='virtio-mem'> <source> <nodemask>0</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>131072</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>131072</requested> <actual unit='KiB'>131072</actual> </target> <alias name='virtiomem0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/> </memory> .. 2. Hot plug a virtio-mem device, the values of memory & currentMemory are updated accordingly. # cat mem.xml <memory model='virtio-mem'> <source> <nodemask>0</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>131072</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>131072</requested> <actual unit='KiB'>131072</actual> </target> </memory> #virsh attach-device pc_test mem.xml Device attached successfully # virsh dumpxml pc_test <memory unit='KiB'>1310720</memory> <currentMemory unit='KiB'>1310720</currentMemory> ... <memory model='virtio-mem'> <source> <nodemask>0</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>131072</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>131072</requested> <actual unit='KiB'>131072</actual> </target> <alias name='virtiomem0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/> </memory> <memory model='virtio-mem'> <source> <nodemask>0</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>131072</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>131072</requested> <actual unit='KiB'>131072</actual> </target> <alias name='virtiomem1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/> </memory> 3. Updated the requested size, it's changed as expected. # virsh update-memory-device pc_test --alias virtiomem1 --requested-size 100MiB # virsh dumpxml pc_test <memory unit='KiB'>1310720</memory> <currentMemory unit='KiB'>1282048</currentMemory> ... <memory model='virtio-mem'> <source> <nodemask>0</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>131072</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>102400</requested> <actual unit='KiB'>102400</actual> </target> <alias name='virtiomem1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/> </memory> S2: From the docs, "virtio-pmem" works also. # cat pmem.xml <memory model='virtio-pmem' access='shared'> <source> <path>/tmp/nvdimm</path> </source> <target> <size unit='KiB'>131072</size> <label> <size unit='KiB'>128</size> </label> </target> </memory> # virsh attach-device pc_test pmem.xml Device attached successfully #virsh dumpxml pc_test <memory unit='KiB'>1441792</memory> <currentMemory unit='KiB'>1282048</currentMemory> Question: But seems it don't support <node> element , and if add <node> element, below errors are prompted - error: Failed to attach device from pmem error: unsupported configuration: virtio-pmem does not support NUMA nodes Since the <node> element is supported by nvdimm & virtio-mem, can you help to confirm if it's as expected? @Michal Privoznik <mprivozn@redhat.com> Thanks, Jing Qi On Fri, Apr 23, 2021 at 9:26 PM Michal Privoznik <mprivozn@redhat.com> wrote:
If the QEMU driver restarts it loses the track of the actual size of virtio-mem (because it's runtime type of information and thus not stored in XML) and therefore, we have to refresh it when reconnecting to the domain monitor.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 37 +++++++++++++++++++---- src/qemu/qemu_monitor.h | 3 ++ src/qemu/qemu_monitor_json.c | 58 ++++++++++++++++++++++-------------- src/qemu/qemu_process.c | 3 ++ 4 files changed, 73 insertions(+), 28 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 226e1d9b79..3c17d8704a 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -8278,9 +8278,21 @@ qemuDomainUpdateMemoryDeviceInfo(virQEMUDriver *driver, return -1; }
- /* if qemu doesn't support the info request, just carry on */ - if (rc == -2) + /* If qemu doesn't support the info request, just carry on, unless we + * really need it. */ + if (rc == -2) { + for (i = 0; i < vm->def->nmems; i++) { + virDomainMemoryDef *mem = vm->def->mems[i]; + + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("qemu did not return info on vitio-mem device")); + return -1; + } + } + return 0; + }
if (rc < 0) return -1; @@ -8295,9 +8307,24 @@ qemuDomainUpdateMemoryDeviceInfo(virQEMUDriver *driver, if (!(dimm = virHashLookup(meminfo, mem->info.alias))) continue;
- mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM; - mem->info.addr.dimm.slot = dimm->slot; - mem->info.addr.dimm.base = dimm->address; + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + mem->actualsize = VIR_DIV_UP(dimm->size, 1024); + break; + + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM; + mem->info.addr.dimm.slot = dimm->slot; + mem->info.addr.dimm.base = dimm->address; + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + /* nada */ + break; + } }
virHashFree(meminfo); diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 078310de61..a76a5c8799 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1358,10 +1358,13 @@ int qemuMonitorSetIOThread(qemuMonitor *mon,
typedef struct _qemuMonitorMemoryDeviceInfo qemuMonitorMemoryDeviceInfo; struct _qemuMonitorMemoryDeviceInfo { + /* For pc-dimm */ unsigned long long address; unsigned int slot; bool hotplugged; bool hotpluggable; + /* For virtio-mem */ + unsigned long long size; /* in bytes */ };
int qemuMonitorGetMemoryDeviceInfo(qemuMonitor *mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 071c6a19be..b0a65a8617 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -8220,7 +8220,6 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, virJSONValue *cmd; virJSONValue *reply = NULL; virJSONValue *data = NULL; - qemuMonitorMemoryDeviceInfo *meminfo = NULL; size_t i;
if (!(cmd = qemuMonitorJSONMakeCommand("query-memory-devices", NULL))) @@ -8241,6 +8240,9 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon,
for (i = 0; i < virJSONValueArraySize(data); i++) { virJSONValue *elem = virJSONValueArrayGet(data, i); + g_autofree qemuMonitorMemoryDeviceInfo *meminfo = NULL; + virJSONValue *dimminfo; + const char *devalias; const char *type;
if (!(type = virJSONValueObjectGetString(elem, "type"))) { @@ -8250,26 +8252,26 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, goto cleanup; }
+ if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("query-memory-devices reply data doesn't " + "contain enum data")); + goto cleanup; + } + + /* While 'id' attribute is marked as optional in QEMU's QAPI + * specification, Libvirt always sets it. Thus we can fail if not + * present. */ + if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("dimm memory info data is missing 'id'")); + goto cleanup; + } + + meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1); + /* dimm memory devices */ if (STREQ(type, "dimm")) { - virJSONValue *dimminfo; - const char *devalias; - - if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("query-memory-devices reply data doesn't " - "contain enum data")); - goto cleanup; - } - - if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("dimm memory info data is missing 'id'")); - goto cleanup; - } - - meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1); - if (virJSONValueObjectGetNumberUlong(dimminfo, "addr", &meminfo->address) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -8300,17 +8302,27 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon,
}
- if (virHashAddEntry(info, devalias, meminfo) < 0) + } else if (STREQ(type, "virtio-mem")) { + if (virJSONValueObjectGetNumberUlong(dimminfo, "size", + &meminfo->size) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("malformed/missing slot in dimm memory info")); goto cleanup; - - meminfo = NULL; + } + } else { + /* type not handled yet */ + continue; } + + if (virHashAddEntry(info, devalias, meminfo) < 0) + goto cleanup; + + meminfo = NULL; }
ret = 0;
cleanup: - VIR_FREE(meminfo); virJSONValueFree(cmd); virJSONValueFree(reply); return ret; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 47fa4068b0..55f081cc9d 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -8561,6 +8561,9 @@ qemuProcessReconnect(void *opaque)
qemuDomainVcpuPersistOrder(obj->def);
+ if (qemuDomainUpdateMemoryDeviceInfo(driver, obj, QEMU_ASYNC_JOB_NONE) < 0) + goto error; + if (qemuProcessDetectIOThreadPIDs(driver, obj, QEMU_ASYNC_JOB_NONE) < 0) goto error;
-- 2.26.3
-- Thanks & Regards, Jing,Qi

On 4/26/21 9:52 AM, Jing Qi wrote:
S2: From the docs, "virtio-pmem" works also. # cat pmem.xml <memory model='virtio-pmem' access='shared'> <source> <path>/tmp/nvdimm</path> </source> <target> <size unit='KiB'>131072</size> <label> <size unit='KiB'>128</size> </label> </target> </memory> # virsh attach-device pc_test pmem.xml Device attached successfully #virsh dumpxml pc_test <memory unit='KiB'>1441792</memory> <currentMemory unit='KiB'>1282048</currentMemory>
Question: But seems it don't support <node> element , and if add <node> element, below errors are prompted - error: Failed to attach device from pmem error: unsupported configuration: virtio-pmem does not support NUMA nodes Since the <node> element is supported by nvdimm & virtio-mem, can you help to confirm if it's as expected? @Michal Privoznik <mailto:mprivozn@redhat.com>
Yes, this is expected. virtio-pmem does not support NUMA node: https://listman.redhat.com/archives/libvir-list/2020-December/msg00211.html Michal

On Fri, Apr 23, 2021 at 15:24:31 +0200, Michal Privoznik wrote:
If the QEMU driver restarts it loses the track of the actual size of virtio-mem (because it's runtime type of information and thus not stored in XML) and therefore, we have to refresh it when reconnecting to the domain monitor.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 37 +++++++++++++++++++---- src/qemu/qemu_monitor.h | 3 ++ src/qemu/qemu_monitor_json.c | 58 ++++++++++++++++++++++-------------- src/qemu/qemu_process.c | 3 ++ 4 files changed, 73 insertions(+), 28 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 226e1d9b79..3c17d8704a 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -8278,9 +8278,21 @@ qemuDomainUpdateMemoryDeviceInfo(virQEMUDriver *driver, return -1; }
- /* if qemu doesn't support the info request, just carry on */ - if (rc == -2) + /* If qemu doesn't support the info request, just carry on, unless we + * really need it. */ + if (rc == -2) { + for (i = 0; i < vm->def->nmems; i++) { + virDomainMemoryDef *mem = vm->def->mems[i]; + + if (mem->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("qemu did not return info on vitio-mem device")); + return -1;
IMO this error doesn't justify losing a VM on reconnect.
+ } + } + return 0; + }
if (rc < 0) return -1; @@ -8295,9 +8307,24 @@ qemuDomainUpdateMemoryDeviceInfo(virQEMUDriver *driver, if (!(dimm = virHashLookup(meminfo, mem->info.alias))) continue;
- mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM; - mem->info.addr.dimm.slot = dimm->slot; - mem->info.addr.dimm.base = dimm->address; + switch (mem->model) { + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: + mem->actualsize = VIR_DIV_UP(dimm->size, 1024); + break; + + case VIR_DOMAIN_MEMORY_MODEL_DIMM: + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM: + mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM; + mem->info.addr.dimm.slot = dimm->slot; + mem->info.addr.dimm.base = dimm->address; + break; + + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM: + case VIR_DOMAIN_MEMORY_MODEL_NONE: + case VIR_DOMAIN_MEMORY_MODEL_LAST: + /* nada */
Remove the pointless coment.
+ break; + } }
virHashFree(meminfo);
[...]
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 071c6a19be..b0a65a8617 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c
[...]
@@ -8250,26 +8252,26 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon, goto cleanup; }
+ if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("query-memory-devices reply data doesn't " + "contain enum data")); + goto cleanup; + } + + /* While 'id' attribute is marked as optional in QEMU's QAPI + * specification, Libvirt always sets it. Thus we can fail if not + * present. */ + if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("dimm memory info data is missing 'id'")); + goto cleanup; + } + + meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1); + /* dimm memory devices */ if (STREQ(type, "dimm")) { - virJSONValue *dimminfo; - const char *devalias; - - if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("query-memory-devices reply data doesn't " - "contain enum data")); - goto cleanup; - } - - if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("dimm memory info data is missing 'id'")); - goto cleanup; - } - - meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1); - if (virJSONValueObjectGetNumberUlong(dimminfo, "addr", &meminfo->address) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -8300,17 +8302,27 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon,
}
- if (virHashAddEntry(info, devalias, meminfo) < 0) + } else if (STREQ(type, "virtio-mem")) { + if (virJSONValueObjectGetNumberUlong(dimminfo, "size", + &meminfo->size) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("malformed/missing slot in dimm memory info"));
Misleading error message.
goto cleanup; - - meminfo = NULL; + } + } else { + /* type not handled yet */ + continue; } +
With the above issues resolved: Reviewed-by: Peter Krempa <pkrempa@redhat.com>

Reporting how much memory is exposed to the guest happens under <currentMemory/> which is taken from def->mem.cur_balloon. The reported amount should account for both balloon size and the sum of @actualsize of all virtio-mems. For instance, if domain has 4GiB via balloon and additional 2GiB via virtio-mem, then the domain XML should report 6GiB. The same applies for domain statistics. The way to achieve this is to account for either balloon or virtio-mem when the size of the other is changed, e.g. on balloon change we have to add all @actualsize (for non virtio-mem these will be zero, so the check for memory model is needless, but makes it more obvious what's happening), and vice versa. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_driver.c | 4 ++++ src/qemu/qemu_process.c | 17 +++++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index b495a261d1..e376a64bee 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4324,6 +4324,7 @@ processMemoryDeviceSizeChange(virQEMUDriver *driver, { virDomainMemoryDef *mem = NULL; virObjectEvent *event = NULL; + unsigned long long balloon; if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) return; @@ -4339,7 +4340,10 @@ processMemoryDeviceSizeChange(virQEMUDriver *driver, goto endjob; } + balloon = vm->def->mem.cur_balloon - mem->actualsize; mem->actualsize = VIR_DIV_UP(info->size, 1024); + balloon += mem->actualsize; + vm->def->mem.cur_balloon = balloon; event = virDomainEventMemoryDeviceSizeChangeNewFromObj(vm, info->devAlias, diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 55f081cc9d..b529983f73 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1232,10 +1232,19 @@ qemuProcessHandleBalloonChange(qemuMonitor *mon G_GNUC_UNUSED, virQEMUDriver *driver = opaque; virObjectEvent *event = NULL; g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); + size_t i; virObjectLock(vm); event = virDomainEventBalloonChangeNewFromObj(vm, actual); + /* We want the balloon size stored in domain definition to account for the + * actual size of virtio-mem too. */ + VIR_DEBUG("balloon size before fix is %lld", actual); + for (i = 0; i < vm->def->nmems; i++) { + if (vm->def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + actual += vm->def->mems[i]->actualsize; + } + VIR_DEBUG("Updating balloon from %lld to %lld kb", vm->def->mem.cur_balloon, actual); vm->def->mem.cur_balloon = actual; @@ -2430,6 +2439,7 @@ qemuProcessRefreshBalloonState(virQEMUDriver *driver, int asyncJob) { unsigned long long balloon; + size_t i; int rc; /* if no ballooning is available, the current size equals to the current @@ -2446,6 +2456,13 @@ qemuProcessRefreshBalloonState(virQEMUDriver *driver, if (qemuDomainObjExitMonitor(driver, vm) < 0 || rc < 0) return -1; + /* We want the balloon size stored in domain definition to account for the + * actual size of virtio-mem too. */ + VIR_DEBUG("balloon size before fix is %lld", balloon); + for (i = 0; i < vm->def->nmems; i++) { + if (vm->def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + balloon += vm->def->mems[i]->actualsize; + } vm->def->mem.cur_balloon = balloon; return 0; -- 2.26.3

On Fri, Apr 23, 2021 at 15:24:32 +0200, Michal Privoznik wrote:
Reporting how much memory is exposed to the guest happens under <currentMemory/> which is taken from def->mem.cur_balloon. The reported amount should account for both balloon size and the sum of @actualsize of all virtio-mems. For instance, if domain has 4GiB via balloon and additional 2GiB via virtio-mem, then the domain XML should report 6GiB. The same applies for domain statistics.
The way to achieve this is to account for either balloon or virtio-mem when the size of the other is changed, e.g. on balloon change we have to add all @actualsize (for non virtio-mem these will be zero, so the check for memory model is needless, but makes it more obvious what's happening), and vice versa.
I think this needs to be recorded in a comment where the update happens. Also the detail that the virtio-mem size isn't actually included in the balloon size reported by qemu, because that's a rather important quirk of all of this.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_driver.c | 4 ++++ src/qemu/qemu_process.c | 17 +++++++++++++++++ 2 files changed, 21 insertions(+)
The code looks okay, but I'll do a final review with the comments in place.

The qemuDomainSetMemoryFlags() allows for memballoon (<currentMemory/>) changes for both active and inactive guests. And just before doing any change, we have to make sure that the new size is not greater than the total memory (<memory/>). However, the total memory includes not only the regular guest memory, but also sum of maximum sizes of all virtio-mems (in fact all memory devices for that matter). But virtio-mem devices are modified differently (via virDomainUpdateDevice()) and thus the upper limit for new balloon size has to be lowered. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_driver.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e376a64bee..6a9da62ded 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2433,12 +2433,28 @@ static int qemuDomainSetMemoryFlags(virDomainPtr dom, unsigned long newmem, } else { /* resize the current memory */ unsigned long oldmax = 0; + size_t i; - if (def) + if (def) { oldmax = virDomainDefGetMemoryTotal(def); + + /* While virtio-mem is regular mem from guest POV, it can't be + * modified through this API. */ + for (i = 0; i < def->nmems; i++) { + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + oldmax -= def->mems[i]->size; + } + } + if (persistentDef) { - if (!oldmax || oldmax > virDomainDefGetMemoryTotal(persistentDef)) + if (!oldmax || oldmax > virDomainDefGetMemoryTotal(persistentDef)) { oldmax = virDomainDefGetMemoryTotal(persistentDef); + + for (i = 0; i < persistentDef->nmems; i++) { + if (persistentDef->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + oldmax -= persistentDef->mems[i]->size; + } + } } if (newmem > oldmax) { -- 2.26.3

On Fri, Apr 23, 2021 at 15:24:33 +0200, Michal Privoznik wrote:
The qemuDomainSetMemoryFlags() allows for memballoon (<currentMemory/>) changes for both active and inactive guests. And just before doing any change, we have to make sure that the new size is not greater than the total memory (<memory/>).
However, the total memory includes not only the regular guest memory, but also sum of maximum sizes of all virtio-mems (in fact all memory devices for that matter). But virtio-mem devices are modified differently (via virDomainUpdateDevice()) and thus the upper limit for new balloon size has to be lowered.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_driver.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e376a64bee..6a9da62ded 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2433,12 +2433,28 @@ static int qemuDomainSetMemoryFlags(virDomainPtr dom, unsigned long newmem, } else { /* resize the current memory */ unsigned long oldmax = 0; + size_t i;
- if (def) + if (def) { oldmax = virDomainDefGetMemoryTotal(def); + + /* While virtio-mem is regular mem from guest POV, it can't be + * modified through this API. */ + for (i = 0; i < def->nmems; i++) { + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) + oldmax -= def->mems[i]->size; + } + } + if (persistentDef) { - if (!oldmax || oldmax > virDomainDefGetMemoryTotal(persistentDef)) + if (!oldmax || oldmax > virDomainDefGetMemoryTotal(persistentDef)) {
IMO this code will not be correct any more when you do the subtraction of virtio-mem devices above as you compare it with the total memory of the persistent def.

New 'update-memory-device' command is introduced which aims on making it user friendly to change <memory/> device. So far I just need to change <requested/> so I'm introducing --requested-size only; but the idea is that this is extensible for other cases too. For instance, want to change <myElement/>? A new --my-element argument can be easily introduced. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/manpages/virsh.rst | 30 ++++++++ tools/virsh-domain.c | 149 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 179 insertions(+) diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst index ad91cd6356..bc2cf8e93e 100644 --- a/docs/manpages/virsh.rst +++ b/docs/manpages/virsh.rst @@ -4922,6 +4922,36 @@ results as some fields may be autogenerated and thus match devices other than expected. +update-memory-device +-------------------- + +**Syntax:** + +:: + + update-memory-device domain [--print-xml] [--alias alias] + [[--live] [--config] | [--current]] + [--requested-size size] + +This command finds ``<memory/>`` device inside given *domain*, changes +requested values and passes updated device XML to daemon. If *--print-xml* is +specified then the device is not changed, but the updated device XML is printed +to stdout. If there are more than one ``<memory/>`` devices in *domain* use +*--alias* to select the desired one. + +If *--live* is specified, affect a running domain. +If *--config* is specified, affect the next startup of a persistent guest. +If *--current* is specified, it is equivalent to either *--live* or +*--config*, depending on the current state of the guest. +Both *--live* and *--config* flags may be given, but *--current* is +exclusive. Not specifying any flag is the same as specifying *--current*. + +If *--requested-size* is specified then ``<requested/>`` under memory target is +changed to requested *size* (as scaled integer, see ``NOTES`` above). It +defaults to kibibytes if no suffix is provided. The option is valid only for +``virtio-mem`` memory device model. + + change-media ------------ diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 7384d2d68d..82f3d7118b 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -9119,6 +9119,149 @@ cmdSetmaxmem(vshControl *ctl, const vshCmd *cmd) return ret; } + +/* + * "update-memory-device" command + */ +static const vshCmdInfo info_update_memory_device[] = { + {.name = "help", + .data = N_("update memory device of a domain") + }, + {.name = "desc", + .data = N_("Update values of a memory device of a domain") + }, + {.name = NULL} +}; + +static const vshCmdOptDef opts_update_memory_device[] = { + VIRSH_COMMON_OPT_DOMAIN_FULL(0), + VIRSH_COMMON_OPT_DOMAIN_CONFIG, + VIRSH_COMMON_OPT_DOMAIN_LIVE, + VIRSH_COMMON_OPT_DOMAIN_CURRENT, + {.name = "print-xml", + .type = VSH_OT_BOOL, + .help = N_("print updated memory device XML instead of executing the change") + }, + {.name = "alias", + .type = VSH_OT_STRING, + .completer = virshDomainDeviceAliasCompleter, + .help = N_("memory device alias"), + }, + {.name = "requested-size", + .type = VSH_OT_INT, + .help = N_("new value of <requested/> size, as scaled integer (default KiB)") + }, + {.name = NULL} +}; + +static int +virshGetUpdatedMemoryXML(char **updatedMemoryXML, + vshControl *ctl, + const vshCmd *cmd, + virDomainPtr dom, + unsigned int flags) +{ + const char *alias = NULL; + g_autoptr(xmlDoc) doc = NULL; + g_autoptr(xmlXPathContext) ctxt = NULL; + g_autofree char *xpath = NULL; + int nmems; + g_autofree xmlNodePtr *mems = NULL; + unsigned int domainXMLFlags = 0; + + if (flags & VIR_DOMAIN_AFFECT_CONFIG) + domainXMLFlags |= VIR_DOMAIN_XML_INACTIVE; + + if (virshDomainGetXMLFromDom(ctl, dom, domainXMLFlags, &doc, &ctxt) < 0) + return -1; + + if (vshCommandOptStringReq(ctl, cmd, "alias", &alias) < 0) + return -1; + + if (alias) { + xpath = g_strdup_printf("/domain/devices/memory[./alias/@name='%s']", alias); + } else { + xpath = g_strdup("/domain/devices/memory"); + } + + nmems = virXPathNodeSet(xpath, ctxt, &mems); + if (nmems < 0) { + vshSaveLibvirtError(); + return -1; + } else if (nmems == 0) { + vshError(ctl, _("no memory device found")); + return -1; + } else if (nmems > 1) { + vshError(ctl, _("multiple memory devices found, use --alias to select one")); + return -1; + } + + ctxt->node = mems[0]; + + if (vshCommandOptBool(cmd, "requested-size")) { + xmlNodePtr requestedSizeNode; + g_autofree char *kibibytesStr = NULL; + unsigned long long bytes = 0; + unsigned long kibibytes = 0; + + if (vshCommandOptScaledInt(ctl, cmd, "requested-size", &bytes, 1024, ULLONG_MAX) < 0) + return -1; + kibibytes = VIR_DIV_UP(bytes, 1024); + + requestedSizeNode = virXPathNode("./target/requested", ctxt); + + if (!requestedSizeNode) { + vshError(ctl, _("virtio-mem device is missing <requested/>")); + return -1; + } + + kibibytesStr = g_strdup_printf("%lu", kibibytes); + xmlNodeSetContent(requestedSizeNode, BAD_CAST kibibytesStr); + } + + if (!(*updatedMemoryXML = virXMLNodeToString(doc, mems[0]))) { + vshSaveLibvirtError(); + return -1; + } + + return 0; +} + +static bool +cmdUpdateMemoryDevice(vshControl *ctl, const vshCmd *cmd) +{ + g_autoptr(virshDomain) dom = NULL; + bool config = vshCommandOptBool(cmd, "config"); + bool live = vshCommandOptBool(cmd, "live"); + bool current = vshCommandOptBool(cmd, "current"); + g_autofree char *updatedMemoryXML = NULL; + unsigned int flags = VIR_DOMAIN_AFFECT_CURRENT; + + VSH_EXCLUSIVE_OPTIONS_VAR(current, live); + VSH_EXCLUSIVE_OPTIONS_VAR(current, config); + + if (config) + flags |= VIR_DOMAIN_AFFECT_CONFIG; + if (live) + flags |= VIR_DOMAIN_AFFECT_LIVE; + + if (!(dom = virshCommandOptDomain(ctl, cmd, NULL))) + return false; + + if (virshGetUpdatedMemoryXML(&updatedMemoryXML, ctl, cmd, dom, flags) < 0) + return false; + + if (vshCommandOptBool(cmd, "print-xml")) { + vshPrint(ctl, "%s", updatedMemoryXML); + } else { + if (virDomainUpdateDeviceFlags(dom, updatedMemoryXML, flags) < 0) + return false; + } + + return true; +} + + /* * "memtune" command */ @@ -15035,6 +15178,12 @@ const vshCmdDef domManagementCmds[] = { .info = info_update_device, .flags = 0 }, + {.name = "update-memory-device", + .handler = cmdUpdateMemoryDevice, + .opts = opts_update_memory_device, + .info = info_update_memory_device, + .flags = 0 + }, {.name = "vcpucount", .handler = cmdVcpucount, .opts = opts_vcpucount, -- 2.26.3

On Fri, Apr 23, 2021 at 15:24:34 +0200, Michal Privoznik wrote:
New 'update-memory-device' command is introduced which aims on making it user friendly to change <memory/> device. So far I just need to change <requested/> so I'm introducing --requested-size only; but the idea is that this is extensible for other cases too. For instance, want to change <myElement/>? A new --my-element argument can be easily introduced.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/manpages/virsh.rst | 30 ++++++++ tools/virsh-domain.c | 149 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 179 insertions(+)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- NEWS.rst | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/NEWS.rst b/NEWS.rst index 5f8b0ae02d..af6631724a 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -8,6 +8,22 @@ the changes introduced by each of them. For a more fine-grained view, use the `git log`_. +v7.4.0 (unreleased) +=================== + +* **New features** + + * Introduce virtio-mem ``<memory/>`` model + + New virtio-mem model is introduced for ``<memory/>`` device which is a + paravirtualized mechanism of adding/removing memory to/from a VM. Use + ``virDomainUpdateDeviceFlags()`` API to adjust amount of memory or ``virsh + update-memory-device`` for convenience. + +* **Improvements** + +* **Bug fixes** + v7.3.0 (unreleased) =================== -- 2.26.3

On Fri, Apr 23, 2021 at 15:24:35 +0200, Michal Privoznik wrote:
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- NEWS.rst | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

This commit adds new memorydevices.rst page which should serve all models of memory devices. Yet, I'm documenting virtio-mem quirks only. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 150 +++++++++++++++++++++++++++++++++++ docs/kbase/meson.build | 1 + 3 files changed, 155 insertions(+) create mode 100644 docs/kbase/memorydevices.rst diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst index 532804fe05..45450bf33b 100644 --- a/docs/kbase/index.rst +++ b/docs/kbase/index.rst @@ -46,6 +46,10 @@ Usage `PCI topology <../pci-addresses.html>`__ Addressing schemes for PCI devices +`Memory devices <memorydevices.html>`__ + Memory devices and their use + + Internals / Debugging --------------------- diff --git a/docs/kbase/memorydevices.rst b/docs/kbase/memorydevices.rst new file mode 100644 index 0000000000..5c4c45a77f --- /dev/null +++ b/docs/kbase/memorydevices.rst @@ -0,0 +1,150 @@ +============== +Memory devices +============== + +.. contents:: + +Basics +====== + +Memory devices can be divided into two families: DIMMs and NVDIMMs. The former +is typical RAM memory: it's volatile and thus its contents doesn't survive +reboots nor guest shut downs and power ons. The latter retains its contents +across reboots or power outages. + +In Libvirt, there are two models for DIMMs: + +* ``dimm`` model: + + :: + + <memory model='dimm'> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-mem`` model: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>1048576</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>524288</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </memory> + +Then there are two models for NVDIMMs: + +* ``nvidmm`` model: + + :: + + <memory model='nvdimm'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-pmem`` model: + + :: + + <memory model='virtio-pmem' access='shared'> + <source> + <path>/tmp/virtio_pmem</path> + </source> + <target> + <size unit='KiB'>524288</size> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </memory> + + +Please note that (maybe somewhat surprisingly) virtio models go onto PCI bus +instead of DIMM slots. + +Furthermore, DIMMs can have ``<source/>`` element which configures backend for +devices. For NVDIMMs the element is mandatory and reflects where the contents +is saved. + +See `memory devices documentation <../formatdomain.html#elementsMemory>`_. + +``virtio-mem`` model +==================== + +The ``virtio-mem`` model can be viewed as revised memory balloon. It offers +adding and removing memory (without the actual hotplug of the device). It +solves problems that memory balloon can't solve on its own and thus is more +flexible than DIMM + balloon solution. ``virtio-mem`` is NUMA aware, and thus +memory can be inflated/deflated only for a subset of guest NUMA nodes. Also, +it works with chunks that are either exposed to guest or taken back from it. + +See https://virtio-mem.gitlab.io/ + +Under the hood, ``virtio-mem`` device is split into chunks of equal size which +are then exposed to the guest. Either all of them or only a portion depending +on user's request. Therefore there are three important sizes for +``virtio-mem``. All are to be found under ``<target/>`` element: + +#. The maximum size the device can ever offer, exposed under ``<size/>`` +#. The size of a single block, exposed under ``<block/>`` +#. The current size exposed to the guest, exposed under ``<requested/>`` + +For instance, the following example the maximum size is 4GiB, the block size is +2MiB and only 1GiB should be exposed to the guest: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>4194304</size> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + </memory> + +Please note that ``<requested/>`` must be an integer multiple of ``<block/>`` +size or zero (no blocks exposed to the guest) and has to be less or equal to +``<size/>`` (all blocks exposed to the guest). Furthermore, QEMU recommends the +``<block/>`` size to be as big as a Transparent Huge Page (usually 2MiB). + +To change the size exposed to the guest, users should pass memory device XML +with nothing but ``<requested/>`` changed into the +``virDomainUpdateDeviceFlags()`` API. For user's convenience this can be done +via virsh too: + + :: + + # virsh update-memory-device $dom --requested size 2GiB + +If there are two or more ``<memory/>`` devices then ``--alias`` shall be used +to tell virsh which memory device should be updated. + +For running guests there is fourth size that can be found under ``<target/>``: + + :: + + <actual unit='KiB'>2097152</actual> + +The ``<actual/>`` reflects the actual size used by the guest. In general it +can differ from ``<requested/>``. Reasons include guest kernel missing +``virtio-mem`` module and thus being unable to take offered memory, or guest +kernel being unable to free memory. Since ``<actual/>`` only reports size to +users, the element is never parsed. It is formatted only into live XML. + +Since changing actual allocation requires cooperation with guest kernel, +requests for change are not instant. Therefore, libvirt emits +``VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE`` event whenever actual +allocation changed. diff --git a/docs/kbase/meson.build b/docs/kbase/meson.build index 7b4e7abbd3..7f6adba915 100644 --- a/docs/kbase/meson.build +++ b/docs/kbase/meson.build @@ -9,6 +9,7 @@ docs_kbase_files = [ 'locking-lockd', 'locking', 'locking-sanlock', + 'memorydevices', 'migrationinternals', 'qemu-passthrough-security', 'rpm-deployment', -- 2.26.3

On Fri, Apr 23, 2021 at 15:24:36 +0200, Michal Privoznik wrote:
This commit adds new memorydevices.rst page which should serve all models of memory devices. Yet, I'm documenting virtio-mem quirks only.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 150 +++++++++++++++++++++++++++++++++++ docs/kbase/meson.build | 1 + 3 files changed, 155 insertions(+) create mode 100644 docs/kbase/memorydevices.rst
diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst index 532804fe05..45450bf33b 100644 --- a/docs/kbase/index.rst +++ b/docs/kbase/index.rst @@ -46,6 +46,10 @@ Usage `PCI topology <../pci-addresses.html>`__ Addressing schemes for PCI devices
+`Memory devices <memorydevices.html>`__ + Memory devices and their use + + Internals / Debugging ---------------------
diff --git a/docs/kbase/memorydevices.rst b/docs/kbase/memorydevices.rst new file mode 100644 index 0000000000..5c4c45a77f --- /dev/null +++ b/docs/kbase/memorydevices.rst @@ -0,0 +1,150 @@ +============== +Memory devices +============== + +.. contents:: + +Basics +====== + +Memory devices can be divided into two families: DIMMs and NVDIMMs. The former
DIMMs/NVDIMMs description is misleading since you then put virtio-mem under dimm. I'd suggest you use 'volatile' and 'non-volatile'
+is typical RAM memory: it's volatile and thus its contents doesn't survive +reboots nor guest shut downs and power ons. The latter retains its contents +across reboots or power outages. + +In Libvirt, there are two models for DIMMs: + +* ``dimm`` model: + + :: + + <memory model='dimm'> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-mem`` model: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>1048576</size> + <node>0</node> + <block unit='KiB'>2048</block> + <requested unit='KiB'>524288</requested> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> + </memory> + +Then there are two models for NVDIMMs: + +* ``nvidmm`` model: + + :: + + <memory model='nvdimm'> + <source> + <path>/tmp/nvdimm</path> + </source> + <target> + <size unit='KiB'>523264</size> + <node>0</node> + </target> + <address type='dimm' slot='0'/> + </memory> + +* ``virtio-pmem`` model: + + :: + + <memory model='virtio-pmem' access='shared'> + <source> + <path>/tmp/virtio_pmem</path> + </source> + <target> + <size unit='KiB'>524288</size> + </target> + <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> + </memory> + + +Please note that (maybe somewhat surprisingly) virtio models go onto PCI bus +instead of DIMM slots.
This weird note won't be needed if you clarify the volatility in the first place. You can then mention the differences for each device type along with advantages etc ...
+ +Furthermore, DIMMs can have ``<source/>`` element which configures backend for +devices. For NVDIMMs the element is mandatory and reflects where the contents +is saved. + +See `memory devices documentation <../formatdomain.html#elementsMemory>`_. + +``virtio-mem`` model +==================== + +The ``virtio-mem`` model can be viewed as revised memory balloon. It offers
IMO it's closer to better memory hotplug with semantics closer to the balloon
+adding and removing memory (without the actual hotplug of the device). It +solves problems that memory balloon can't solve on its own and thus is more +flexible than DIMM + balloon solution. ``virtio-mem`` is NUMA aware, and thus +memory can be inflated/deflated only for a subset of guest NUMA nodes. Also, +it works with chunks that are either exposed to guest or taken back from it. + +See https://virtio-mem.gitlab.io/ + +Under the hood, ``virtio-mem`` device is split into chunks of equal size which +are then exposed to the guest. Either all of them or only a portion depending +on user's request. Therefore there are three important sizes for +``virtio-mem``. All are to be found under ``<target/>`` element: + +#. The maximum size the device can ever offer, exposed under ``<size/>`` +#. The size of a single block, exposed under ``<block/>`` +#. The current size exposed to the guest, exposed under ``<requested/>`` + +For instance, the following example the maximum size is 4GiB, the block size is +2MiB and only 1GiB should be exposed to the guest: + + :: + + <memory model='virtio-mem'> + <target> + <size unit='KiB'>4194304</size> + <block unit='KiB'>2048</block> + <requested unit='KiB'>1048576</requested> + </target> + </memory> + +Please note that ``<requested/>`` must be an integer multiple of ``<block/>`` +size or zero (no blocks exposed to the guest) and has to be less or equal to +``<size/>`` (all blocks exposed to the guest). Furthermore, QEMU recommends the +``<block/>`` size to be as big as a Transparent Huge Page (usually 2MiB). + +To change the size exposed to the guest, users should pass memory device XML +with nothing but ``<requested/>`` changed into the +``virDomainUpdateDeviceFlags()`` API. For user's convenience this can be done +via virsh too: + + :: + + # virsh update-memory-device $dom --requested size 2GiB
--requested-size
+ +If there are two or more ``<memory/>`` devices then ``--alias`` shall be used +to tell virsh which memory device should be updated. + +For running guests there is fourth size that can be found under ``<target/>``: + + :: + + <actual unit='KiB'>2097152</actual> + +The ``<actual/>`` reflects the actual size used by the guest. In general it +can differ from ``<requested/>``. Reasons include guest kernel missing +``virtio-mem`` module and thus being unable to take offered memory, or guest +kernel being unable to free memory. Since ``<actual/>`` only reports size to +users, the element is never parsed. It is formatted only into live XML. + +Since changing actual allocation requires cooperation with guest kernel, +requests for change are not instant. Therefore, libvirt emits +``VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE`` event whenever actual +allocation changed.
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

Tested with libvirt v7.2.0-381-g3c3c55be66 and qemu-5.2.0-0.7.rc2.fc34.x86_64 S1. Without hugepage 1.Start domain with virtio-mem device, and check the value of <memory> is the total memory number including virtio-mem device. # virsh start pc_test 2. Hot plug a virtio-mem device, the values of memory & currentMemory are refreshed accordingly. # cat mem.xml <memory model='virtio-mem'> <source> <nodemask>0</nodemask> <pagesize unit='KiB'>2048</pagesize> </source> <target> <size unit='KiB'>131072</size> <node>0</node> <block unit='KiB'>2048</block> <requested unit='KiB'>131072</requested> <actual unit='KiB'>131072</actual> </target> </memory> #virsh attach-device pc_test mem.xml 3. Updated the requested size, and it's changed as expected from the output of "virsh dumpxml " # virsh update-memory-device pc_test --alias virtiomem1 --requested-size 100MiB S2. With below hugepage configuration, tested above 3 steps again, and all steps worked. <memoryBacking> <hugepages/> </memoryBacking> S3. Hot plug virtio-pmem and the value of <memory> is updated accordingly. <memory model='virtio-pmem' access='shared'> <source> <path>/tmp/nvdimm</path> </source> <target> <size unit='KiB'>131072</size> <label> <size unit='KiB'>128</size> </label> </target> </memory> # virsh attach-device pc_test pmem.xml Device attached successfully #virsh dumpxml pc_test ... -- Tested-by Jing Qi <jinqi@redhat.com> On Fri, Apr 23, 2021 at 9:25 PM Michal Privoznik <mprivozn@redhat.com> wrote:
This is a rebased version of v3 I've sent about a month ago:
https://listman.redhat.com/archives/libvir-list/2021-March/msg00823.html
v2 here:
https://listman.redhat.com/archives/libvir-list/2021-February/msg00961.html
My intent is to merge these only after the upcoming release so that we have the biggest test window possible.
diff to v2: - Dropped code that forbade use of virtio-mem and memballoon at the same time; - This meant that I had to adjust memory accounting, qemuDomainSetMemoryFlags() - see patches 11/15 and 12/15 which are new. - Fixed small nits raised by Peter in his review of v2
Michal Prívozník (14): virhostmem: Introduce virHostMemGetTHPSize() qemu_process: Deduplicate code in qemuProcessNeedHugepagesPath() qemu_process: Drop needless check in qemuProcessNeedMemoryBackingPath() qemu_capabilities: Introduce QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI conf: Introduce virtio-mem <memory/> model qemu: Build command line for virtio-mem qemu: Wire up <memory/> live update qemu: Wire up MEMORY_DEVICE_SIZE_CHANGE event qemu: Refresh the actual size of virtio-mem on monitor reconnect qemu: Account for both memballoon and virtio-mem qemuDomainSetMemoryFlags: Take virtio-mem into consideration virsh: Introduce update-memory-device command news: document recent virtio memory addition kbase: Document virtio-mem
NEWS.rst | 16 ++ docs/formatdomain.rst | 45 +++- docs/kbase/index.rst | 4 + docs/kbase/memorydevices.rst | 150 +++++++++++ docs/kbase/meson.build | 1 + docs/manpages/virsh.rst | 30 +++ docs/schemas/domaincommon.rng | 16 ++ examples/c/misc/event-test.c | 17 ++ include/libvirt/libvirt-domain.h | 23 ++ src/conf/domain_conf.c | 115 ++++++++- src/conf/domain_conf.h | 15 ++ src/conf/domain_event.c | 84 +++++++ src/conf/domain_event.h | 10 + src/conf/domain_validate.c | 39 +++ src/libvirt_private.syms | 5 + src/qemu/qemu_alias.c | 10 +- src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 13 +- src/qemu/qemu_domain.c | 50 +++- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_domain_address.c | 38 ++- src/qemu/qemu_driver.c | 233 +++++++++++++++++- src/qemu/qemu_hotplug.c | 18 ++ src/qemu/qemu_hotplug.h | 5 + src/qemu/qemu_monitor.c | 37 +++ src/qemu/qemu_monitor.h | 28 +++ src/qemu/qemu_monitor_json.c | 97 ++++++-- src/qemu/qemu_monitor_json.h | 5 + src/qemu/qemu_process.c | 118 ++++++++- src/qemu/qemu_validate.c | 8 + src/remote/remote_daemon_dispatch.c | 30 +++ src/remote/remote_driver.c | 32 +++ src/remote/remote_protocol.x | 14 +- src/remote_protocol-structs | 7 + src/security/security_apparmor.c | 1 + src/security/security_dac.c | 2 + src/security/security_selinux.c | 2 + src/util/virhostmem.c | 63 +++++ src/util/virhostmem.h | 3 + tests/domaincapsmock.c | 9 + .../caps_5.1.0.x86_64.xml | 1 + .../caps_5.2.0.x86_64.xml | 1 + .../caps_6.0.0.x86_64.xml | 1 + ...mory-hotplug-virtio-mem.x86_64-latest.args | 41 +++ .../memory-hotplug-virtio-mem.xml | 67 +++++ tests/qemuxml2argvtest.c | 1 + ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 + tests/qemuxml2xmltest.c | 1 + tools/virsh-domain.c | 169 +++++++++++++ 50 files changed, 1613 insertions(+), 67 deletions(-) create mode 100644 docs/kbase/memorydevices.rst create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml
-- 2.26.3
-- Thanks & Regards, Jing,Qi
participants (4)
-
Jing Qi
-
Michal Privoznik
-
Michal Prívozník
-
Peter Krempa