[libvirt] [PATCH 0/5] Use memfd if possible

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Hi,

This is an alternative to "[RFC v2 02/16] qemu: add memfd memory backing",
which added a new source type. Instead, Daniel suggested automatically
using memfd for anonymous memory when available.

Marc-André Lureau (5):
  qemu: add memory-backend-memfd capability check
  qemu: check memory-backend-memfd.hugetlb capability
  qemu: prefer memfd for anonymous memory
  conf: drop hugepage non-anonymous memory requirement
  tests: add qemuxml2argv memfd-memory-numa test

 src/conf/domain_conf.c                          |   7 --
 src/qemu/qemu_capabilities.c                    |  10 ++
 src/qemu/qemu_capabilities.h                    |   2 +
 src/qemu/qemu_command.c                         |  61 +++++++---
 .../caps_2.12.0.aarch64.replies                 |  94 ++++++++++++---
 .../caps_2.12.0.aarch64.xml                     |   4 +-
 .../caps_2.12.0.ppc64.replies                   |  90 +++++++++++---
 .../caps_2.12.0.ppc64.xml                       |   4 +-
 .../caps_2.12.0.s390x.replies                   |  98 ++++++++++++----
 .../caps_2.12.0.s390x.xml                       |   4 +-
 .../caps_2.12.0.x86_64.replies                  | 110 +++++++++++++-----
 .../caps_2.12.0.x86_64.xml                      |   4 +-
 .../caps_3.0.0.ppc64.replies                    |  90 +++++++++++---
 .../qemucapabilitiesdata/caps_3.0.0.ppc64.xml   |   4 +-
 .../caps_3.0.0.riscv32.replies                  |  86 +++++++++++---
 .../caps_3.0.0.riscv32.xml                      |   2 +
 .../caps_3.0.0.riscv64.replies                  |  86 +++++++++++---
 .../caps_3.0.0.riscv64.xml                      |   2 +
 .../caps_3.0.0.x86_64.replies                   | 110 +++++++++++++-----
 .../caps_3.0.0.x86_64.xml                       |   4 +-
 tests/qemuxml2argvdata/memfd-memory-numa.args   |  28 +++++
 tests/qemuxml2argvdata/memfd-memory-numa.xml    |  36 ++++++
 tests/qemuxml2argvtest.c                        |   5 +
 23 files changed, 760 insertions(+), 181 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.args
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml

-- 
2.19.0.rc1

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Check availability of "-object memory-backend-memfd".

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 src/qemu/qemu_capabilities.c                       | 2 ++
 src/qemu/qemu_capabilities.h                       | 1 +
 tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml | 1 +
 tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml   | 1 +
 tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml   | 1 +
 tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml  | 1 +
 tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml    | 1 +
 tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml  | 1 +
 tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml  | 1 +
 tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml   | 1 +
 10 files changed, 11 insertions(+)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index a075677421..2c2f193aae 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -508,6 +508,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST,
               /* 315 */
               "vfio-pci.display",
               "blockdev",
+              "memory-backend-memfd",
     );


@@ -1148,6 +1149,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = {
     { "vhost-vsock-device", QEMU_CAPS_DEVICE_VHOST_VSOCK },
     { "mch", QEMU_CAPS_DEVICE_MCH },
     { "sev-guest", QEMU_CAPS_SEV_GUEST },
+    { "memory-backend-memfd", QEMU_CAPS_OBJECT_MEMORY_MEMFD },
 };

 static struct virQEMUCapsStringFlags virQEMUCapsDevicePropsVirtioBalloon[] = {
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index 3d3a978759..24ce4545a4 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -492,6 +492,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */
     /* 315 */
     QEMU_CAPS_VFIO_PCI_DISPLAY, /* -device vfio-pci.display */
     QEMU_CAPS_BLOCKDEV, /* -blockdev and blockdev-add are supported */
+    QEMU_CAPS_OBJECT_MEMORY_MEMFD, /* -object memory-backend-memfd */

     QEMU_CAPS_LAST /* this must always be the last item */
 } virQEMUCapsFlags;
diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml
index 3c1a704100..a1f5111fc4 100644
--- a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml
+++ b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml
@@ -169,6 +169,7 @@
   <flag name='tpm-emulator'/>
   <flag name='egl-headless'/>
   <flag name='vfio-pci.display'/>
+  <flag name='memory-backend-memfd'/>
   <version>2011090</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>347144</microcodeVersion>
diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml
index 877362eaef..c246e5c94a 100644
--- a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml
+++ b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml
@@ -167,6 +167,7 @@
   <flag name='machine.pseries.cap-htm'/>
   <flag name='egl-headless'/>
   <flag name='vfio-pci.display'/>
+  <flag name='memory-backend-memfd'/>
   <version>2011090</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>427928</microcodeVersion>
diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml
index b8e46a970a..2e6baf42a7 100644
--- a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml
+++ b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml
@@ -133,6 +133,7 @@
   <flag name='tpm-emulator'/>
   <flag name='egl-headless'/>
   <flag name='vfio-pci.display'/>
+  <flag name='memory-backend-memfd'/>
   <version>2012000</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>375593</microcodeVersion>
diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml
index edf944bc35..4b410997d1 100644
--- a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml
@@ -211,6 +211,7 @@
   <flag name='sev-guest'/>
   <flag name='egl-headless'/>
   <flag name='vfio-pci.display'/>
+  <flag name='memory-backend-memfd'/>
   <version>2011090</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>415790</microcodeVersion>
diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml
index 6892c9bd64..a9967d67a3 100644
--- a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml
+++ b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml
@@ -167,6 +167,7 @@
   <flag name='machine.pseries.cap-htm'/>
   <flag name='egl-headless'/>
   <flag name='vfio-pci.display'/>
+  <flag name='memory-backend-memfd'/>
   <version>2012050</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>446365</microcodeVersion>
diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml
index 39cc480dd2..183ad7cc6c 100644
--- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml
+++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml
@@ -104,6 +104,7 @@
   <flag name='chardev-fd-pass'/>
   <flag name='tpm-emulator'/>
   <flag name='egl-headless'/>
+  <flag name='memory-backend-memfd'/>
   <version>3000000</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>0</microcodeVersion>
diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml
index 344740879e..f2f32e3025 100644
--- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml
+++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml
@@ -104,6 +104,7 @@
   <flag name='chardev-fd-pass'/>
   <flag name='tpm-emulator'/>
   <flag name='egl-headless'/>
+  <flag name='memory-backend-memfd'/>
   <version>3000000</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>0</microcodeVersion>
diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml
index 747f51b799..e4665af165 100644
--- a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml
@@ -213,6 +213,7 @@
   <flag name='usb-storage.werror'/>
   <flag name='egl-headless'/>
   <flag name='vfio-pci.display'/>
+  <flag name='memory-backend-memfd'/>
   <version>3000000</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>427391</microcodeVersion>
-- 
2.19.0.rc1

On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Check availability of "-object memory-backend-memfd".
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml | 1 + tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml | 1 + 10 files changed, 11 insertions(+)
Reviewed-by: John Ferlan <jferlan@redhat.com>

John

From: Marc-André Lureau <marcandre.lureau@redhat.com>

QEMU 3.1 should only expose the property if the host is actually
capable of creating hugetlb-backed memfd. However, it may fail at
runtime depending on the requested "hugetlbsize".

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 src/qemu/qemu_capabilities.c                  |   8 ++
 src/qemu/qemu_capabilities.h                  |   1 +
 .../caps_2.12.0.aarch64.replies               |  94 ++++++++++++---
 .../caps_2.12.0.aarch64.xml                   |   3 +-
 .../caps_2.12.0.ppc64.replies                 |  90 +++++++++++---
 .../caps_2.12.0.ppc64.xml                     |   3 +-
 .../caps_2.12.0.s390x.replies                 |  98 ++++++++++++----
 .../caps_2.12.0.s390x.xml                     |   3 +-
 .../caps_2.12.0.x86_64.replies                | 110 +++++++++++++-----
 .../caps_2.12.0.x86_64.xml                    |   3 +-
 .../caps_3.0.0.ppc64.replies                  |  90 +++++++++++---
 .../qemucapabilitiesdata/caps_3.0.0.ppc64.xml |   3 +-
 .../caps_3.0.0.riscv32.replies                |  86 +++++++++++---
 .../caps_3.0.0.riscv32.xml                    |   1 +
 .../caps_3.0.0.riscv64.replies                |  86 +++++++++++---
 .../caps_3.0.0.riscv64.xml                    |   1 +
 .../caps_3.0.0.x86_64.replies                 | 110 +++++++++++++-----
 .../caps_3.0.0.x86_64.xml                     |   3 +-
 18 files changed, 637 insertions(+), 156 deletions(-)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index 2c2f193aae..9d8a18c7ff 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -509,6 +509,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST,
               "vfio-pci.display",
               "blockdev",
               "memory-backend-memfd",
+              "memory-backend-memfd.hugetlb",
     );


@@ -1439,6 +1440,10 @@ static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsMemoryBackendFile[] =
     { "discard-data", QEMU_CAPS_OBJECT_MEMORY_FILE_DISCARD },
 };

+static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsMemoryBackendMemfd[] = {
+    { "hugetlb", QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB },
+};
+
 static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsSPAPRMachine[] = {
     { "cap-hpt-max-page-size", QEMU_CAPS_MACHINE_PSERIES_CAP_HPT_MAX_PAGE_SIZE },
     { "cap-htm", QEMU_CAPS_MACHINE_PSERIES_CAP_HTM },
@@ -1448,6 +1453,9 @@ static virQEMUCapsObjectTypeProps virQEMUCapsObjectProps[] = {
     { "memory-backend-file", virQEMUCapsObjectPropsMemoryBackendFile,
       ARRAY_CARDINALITY(virQEMUCapsObjectPropsMemoryBackendFile),
       QEMU_CAPS_OBJECT_MEMORY_FILE },
+    { "memory-backend-memfd", virQEMUCapsObjectPropsMemoryBackendMemfd,
+      ARRAY_CARDINALITY(virQEMUCapsObjectPropsMemoryBackendMemfd),
+      QEMU_CAPS_OBJECT_MEMORY_MEMFD },
     { "spapr-machine", virQEMUCapsObjectPropsSPAPRMachine,
       ARRAY_CARDINALITY(virQEMUCapsObjectPropsSPAPRMachine),
       -1 },
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index 24ce4545a4..4ab2a6061e 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -493,6 +493,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */
     QEMU_CAPS_VFIO_PCI_DISPLAY, /* -device vfio-pci.display */
     QEMU_CAPS_BLOCKDEV, /* -blockdev and blockdev-add are supported */
     QEMU_CAPS_OBJECT_MEMORY_MEMFD, /* -object memory-backend-memfd */
+    QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB, /* -object memory-backend-memfd.hugetlb */

     QEMU_CAPS_LAST /* this must always be the last item */
 } virQEMUCapsFlags;
diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.replies b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.replies
index db5b5140d5..7ce37bc27f 100644
--- a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.replies
+++ b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.replies
@@ -5551,13 +5551,71 @@
 {
   "execute": "qom-list-properties",
   "arguments": {
-    "typename": "spapr-machine"
+    "typename": "memory-backend-memfd"
}, "id": "libvirt-35" } { - "id": "libvirt-35", + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + }, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-35" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-36" +} + +{ + "id": "libvirt-36", "error": { "class": "DeviceNotFound", "desc": "Class 'spapr-machine' not found" @@ -5566,7 +5624,7 @@ { "execute": "query-machines", - "id": "libvirt-36" + "id": "libvirt-37" } { @@ -5863,12 +5921,12 @@ "cpu-max": 1 } ], - "id": "libvirt-36" + "id": "libvirt-37" } { "execute": "query-cpu-definitions", - "id": "libvirt-37" + "id": "libvirt-38" } { @@ -6044,35 +6102,35 @@ "static": false } ], - "id": "libvirt-37" + "id": "libvirt-38" } { "execute": "query-tpm-models", - "id": "libvirt-38" + "id": "libvirt-39" } { "return": [ ], - "id": "libvirt-38" + "id": "libvirt-39" } { "execute": "query-tpm-types", - "id": "libvirt-39" + "id": "libvirt-40" } { "return": [ "emulator" ], - "id": "libvirt-39" + "id": "libvirt-40" } { "execute": "query-command-line-options", - "id": "libvirt-40" + "id": "libvirt-41" } { @@ -7233,12 +7291,12 @@ "option": "drive" } ], - "id": "libvirt-40" + "id": "libvirt-41" } { "execute": "query-migrate-capabilities", - "id": "libvirt-41" + "id": "libvirt-42" } { @@ -7300,12 +7358,12 @@ "capability": "dirty-bitmaps" } ], - "id": "libvirt-41" + "id": "libvirt-42" } { "execute": "query-qmp-schema", - "id": "libvirt-42" + "id": "libvirt-43" } { @@ -18673,12 +18731,12 @@ "meta-type": "object" } ], - "id": "libvirt-42" + "id": "libvirt-43" } { "execute": "query-gic-capabilities", - "id": "libvirt-43" + "id": "libvirt-44" } { @@ -18694,7 +18752,7 @@ "kernel": false } ], - "id": "libvirt-43" + "id": "libvirt-44" } { diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml index a1f5111fc4..5d5be965f0 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml @@ -170,9 +170,10 @@ <flag name='egl-headless'/> <flag name='vfio-pci.display'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>2011090</version> <kvmVersion>0</kvmVersion> - <microcodeVersion>347144</microcodeVersion> + <microcodeVersion>347959</microcodeVersion> <package>v2.12.0-rc0</package> <arch>aarch64</arch> <cpu type='kvm' name='pxa262'/> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.replies b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.replies index 786cb1844a..4ba3abfac1 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.replies +++ b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.replies @@ -5606,11 +5606,69 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": "memory-backend-memfd" }, "id": "libvirt-36" } +{ + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + 
}, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-36" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-37" +} + { "return": [ { @@ -5769,12 +5827,12 @@ "type": "bool" } ], - "id": "libvirt-36" + "id": "libvirt-37" } { "execute": "query-machines", - "id": "libvirt-37" + "id": "libvirt-38" } { @@ -5912,12 +5970,12 @@ "cpu-max": 1 } ], - "id": "libvirt-37" + "id": "libvirt-38" } { "execute": "query-cpu-definitions", - "id": "libvirt-38" + "id": "libvirt-39" } { @@ -8113,35 +8171,35 @@ "static": false } ], - "id": "libvirt-38" + "id": "libvirt-39" } { "execute": "query-tpm-models", - "id": "libvirt-39" + "id": "libvirt-40" } { "return": [ ], - "id": "libvirt-39" + "id": "libvirt-40" } { "execute": "query-tpm-types", - "id": "libvirt-40" + "id": "libvirt-41" } { "return": [ "emulator" ], - "id": "libvirt-40" + "id": "libvirt-41" } { "execute": "query-command-line-options", - "id": "libvirt-41" + "id": "libvirt-42" } { @@ -9297,12 +9355,12 @@ "option": "drive" } ], - "id": "libvirt-41" + "id": "libvirt-42" } { "execute": "query-migrate-capabilities", - "id": "libvirt-42" + "id": "libvirt-43" } { @@ -9364,12 +9422,12 @@ "capability": "dirty-bitmaps" } ], - "id": "libvirt-42" + "id": "libvirt-43" } { "execute": "query-qmp-schema", - "id": "libvirt-43" + "id": "libvirt-44" } { @@ -20737,7 +20795,7 @@ "meta-type": "object" } ], - "id": "libvirt-43" + "id": "libvirt-44" } { diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml index c246e5c94a..703fa86de7 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml @@ -168,9 +168,10 @@ <flag name='egl-headless'/> <flag name='vfio-pci.display'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>2011090</version> <kvmVersion>0</kvmVersion> - <microcodeVersion>427928</microcodeVersion> + <microcodeVersion>428743</microcodeVersion> <package>v2.12.0-rc0</package> <arch>ppc64</arch> <cpu type='kvm' name='default'/> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.replies b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.replies index 25a005c5cd..946d48e083 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.replies +++ b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.replies @@ -3905,13 +3905,71 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": "memory-backend-memfd" }, "id": "libvirt-35" } { - "id": "libvirt-35", + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + }, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-35" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-36" +} + +{ + "id": "libvirt-36", "error": { "class": "DeviceNotFound", 
"desc": "Class 'spapr-machine' not found" @@ -3920,7 +3978,7 @@ { "execute": "query-machines", - "id": "libvirt-36" + "id": "libvirt-37" } { @@ -3978,12 +4036,12 @@ "alias": "s390-ccw-virtio" } ], - "id": "libvirt-36" + "id": "libvirt-37" } { "execute": "query-cpu-definitions", - "id": "libvirt-37" + "id": "libvirt-38" } { @@ -4518,35 +4576,35 @@ "migration-safe": true } ], - "id": "libvirt-37" + "id": "libvirt-38" } { "execute": "query-tpm-models", - "id": "libvirt-38" + "id": "libvirt-39" } { "return": [ ], - "id": "libvirt-38" + "id": "libvirt-39" } { "execute": "query-tpm-types", - "id": "libvirt-39" + "id": "libvirt-40" } { "return": [ "emulator" ], - "id": "libvirt-39" + "id": "libvirt-40" } { "execute": "query-command-line-options", - "id": "libvirt-40" + "id": "libvirt-41" } { @@ -5671,12 +5729,12 @@ "option": "drive" } ], - "id": "libvirt-40" + "id": "libvirt-41" } { "execute": "query-migrate-capabilities", - "id": "libvirt-41" + "id": "libvirt-42" } { @@ -5738,12 +5796,12 @@ "capability": "dirty-bitmaps" } ], - "id": "libvirt-41" + "id": "libvirt-42" } { "execute": "query-qmp-schema", - "id": "libvirt-42" + "id": "libvirt-43" } { @@ -17111,7 +17169,7 @@ "meta-type": "object" } ], - "id": "libvirt-42" + "id": "libvirt-43" } { @@ -17122,7 +17180,7 @@ "name": "host" } }, - "id": "libvirt-43" + "id": "libvirt-44" } { @@ -17160,7 +17218,7 @@ } } }, - "id": "libvirt-43" + "id": "libvirt-44" } { @@ -17174,11 +17232,11 @@ } } }, - "id": "libvirt-44" + "id": "libvirt-45" } { - "id": "libvirt-44", + "id": "libvirt-45", "error": { "class": "GenericError", "desc": "Property '.migratable' not found" diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml index 2e6baf42a7..dd9125e2a5 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml @@ -134,9 +134,10 @@ <flag name='egl-headless'/> <flag name='vfio-pci.display'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>2012000</version> <kvmVersion>0</kvmVersion> - <microcodeVersion>375593</microcodeVersion> + <microcodeVersion>376408</microcodeVersion> <package></package> <arch>s390x</arch> <hostCPU type='kvm' model='z14-base' migratability='no'> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.replies b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.replies index 9d7e653216..ec015f775a 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.replies +++ b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.replies @@ -4964,13 +4964,71 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": "memory-backend-memfd" }, "id": "libvirt-40" } { - "id": "libvirt-40", + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + }, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-40" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-41" +} + +{ + "id": "libvirt-41", "error": { "class": "DeviceNotFound", "desc": "Class 'spapr-machine' not found" @@ -4979,7 
+5037,7 @@ { "execute": "query-machines", - "id": "libvirt-41" + "id": "libvirt-42" } { @@ -5178,12 +5236,12 @@ "cpu-max": 255 } ], - "id": "libvirt-41" + "id": "libvirt-42" } { "execute": "query-cpu-definitions", - "id": "libvirt-42" + "id": "libvirt-43" } { @@ -5697,12 +5755,12 @@ "migration-safe": true } ], - "id": "libvirt-42" + "id": "libvirt-43" } { "execute": "query-tpm-models", - "id": "libvirt-43" + "id": "libvirt-44" } { @@ -5710,12 +5768,12 @@ "tpm-crb", "tpm-tis" ], - "id": "libvirt-43" + "id": "libvirt-44" } { "execute": "query-tpm-types", - "id": "libvirt-44" + "id": "libvirt-45" } { @@ -5723,12 +5781,12 @@ "passthrough", "emulator" ], - "id": "libvirt-44" + "id": "libvirt-45" } { "execute": "query-command-line-options", - "id": "libvirt-45" + "id": "libvirt-46" } { @@ -7015,12 +7073,12 @@ "option": "drive" } ], - "id": "libvirt-45" + "id": "libvirt-46" } { "execute": "query-migrate-capabilities", - "id": "libvirt-46" + "id": "libvirt-47" } { @@ -7082,12 +7140,12 @@ "capability": "dirty-bitmaps" } ], - "id": "libvirt-46" + "id": "libvirt-47" } { "execute": "query-qmp-schema", - "id": "libvirt-47" + "id": "libvirt-48" } { @@ -18455,7 +18513,7 @@ "meta-type": "object" } ], - "id": "libvirt-47" + "id": "libvirt-48" } { @@ -18466,7 +18524,7 @@ "name": "host" } }, - "id": "libvirt-48" + "id": "libvirt-49" } { @@ -18656,7 +18714,7 @@ } } }, - "id": "libvirt-48" + "id": "libvirt-49" } { @@ -18848,7 +18906,7 @@ } } }, - "id": "libvirt-49" + "id": "libvirt-50" } { @@ -19103,7 +19161,7 @@ } } }, - "id": "libvirt-49" + "id": "libvirt-50" } { @@ -19117,7 +19175,7 @@ } } }, - "id": "libvirt-50" + "id": "libvirt-51" } { @@ -19307,7 +19365,7 @@ } } }, - "id": "libvirt-50" + "id": "libvirt-51" } { @@ -19499,7 +19557,7 @@ } } }, - "id": "libvirt-51" + "id": "libvirt-52" } { @@ -19754,12 +19812,12 @@ } } }, - "id": "libvirt-51" + "id": "libvirt-52" } { "execute": "query-sev-capabilities", - "id": "libvirt-52" + "id": "libvirt-53" } { @@ -19769,7 +19827,7 @@ "cert-chain": "AQAAAAAOAAAAQAAAAAOAAAAQAAAAAOAAAAQAAAAAOAAAAQAAAAAOAAA", "pdh": "AQAAAAAOAAAAQAAAAAOAAAAQAAAAAOAAAAQAAAAAOAAAAQAAAAAOAAAAQAAAAAOAAA" }, - "id": "libvirt-52" + "id": "libvirt-53" } { diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml index 4b410997d1..704f905e39 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml @@ -212,9 +212,10 @@ <flag name='egl-headless'/> <flag name='vfio-pci.display'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>2011090</version> <kvmVersion>0</kvmVersion> - <microcodeVersion>415790</microcodeVersion> + <microcodeVersion>416605</microcodeVersion> <package>v2.12.0-rc0</package> <arch>x86_64</arch> <hostCPU type='kvm' model='base' migratability='yes'> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.replies b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.replies index 932c418f3f..42a14d6688 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.replies +++ b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.replies @@ -5689,11 +5689,69 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": "memory-backend-memfd" }, "id": "libvirt-36" } +{ + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + 
}, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-36" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-37" +} + { "return": [ { @@ -5862,12 +5920,12 @@ "type": "bool" } ], - "id": "libvirt-36" + "id": "libvirt-37" } { "execute": "query-machines", - "id": "libvirt-37" + "id": "libvirt-38" } { @@ -6010,12 +6068,12 @@ "cpu-max": 1 } ], - "id": "libvirt-37" + "id": "libvirt-38" } { "execute": "query-cpu-definitions", - "id": "libvirt-38" + "id": "libvirt-39" } { @@ -8211,35 +8269,35 @@ "static": false } ], - "id": "libvirt-38" + "id": "libvirt-39" } { "execute": "query-tpm-models", - "id": "libvirt-39" + "id": "libvirt-40" } { "return": [ ], - "id": "libvirt-39" + "id": "libvirt-40" } { "execute": "query-tpm-types", - "id": "libvirt-40" + "id": "libvirt-41" } { "return": [ "emulator" ], - "id": "libvirt-40" + "id": "libvirt-41" } { "execute": "query-command-line-options", - "id": "libvirt-41" + "id": "libvirt-42" } { @@ -9369,12 +9427,12 @@ "option": "drive" } ], - "id": "libvirt-41" + "id": "libvirt-42" } { "execute": "query-migrate-capabilities", - "id": "libvirt-42" + "id": "libvirt-43" } { @@ -9444,12 +9502,12 @@ "capability": "late-block-activate" } ], - "id": "libvirt-42" + "id": "libvirt-43" } { "execute": "query-qmp-schema", - "id": "libvirt-43" + "id": "libvirt-44" } { @@ -21608,7 +21666,7 @@ "meta-type": "object" } ], - "id": "libvirt-43" + "id": "libvirt-44" } { diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml index a9967d67a3..fe7b831144 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml @@ -168,9 +168,10 @@ <flag name='egl-headless'/> <flag name='vfio-pci.display'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>2012050</version> <kvmVersion>0</kvmVersion> - <microcodeVersion>446365</microcodeVersion> + <microcodeVersion>447180</microcodeVersion> <package>v2.12.0-1689-g518d23a</package> <arch>ppc64</arch> <cpu type='kvm' name='default'/> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.replies b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.replies index 97eeec3dbc..11cfbbc654 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.replies +++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.replies @@ -1821,13 +1821,71 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": "memory-backend-memfd" }, "id": "libvirt-31" } { - "id": "libvirt-31", + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + }, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-31" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-32" +} + +{ + "id": "libvirt-32", "error": { "class": 
"DeviceNotFound", "desc": "Class 'spapr-machine' not found" @@ -1836,7 +1894,7 @@ { "execute": "query-machines", - "id": "libvirt-32" + "id": "libvirt-33" } { @@ -1873,23 +1931,23 @@ "cpu-max": 1 } ], - "id": "libvirt-32" + "id": "libvirt-33" } { "execute": "query-tpm-models", - "id": "libvirt-33" + "id": "libvirt-34" } { "return": [ ], - "id": "libvirt-33" + "id": "libvirt-34" } { "execute": "query-tpm-types", - "id": "libvirt-34" + "id": "libvirt-35" } { @@ -1897,12 +1955,12 @@ "passthrough", "emulator" ], - "id": "libvirt-34" + "id": "libvirt-35" } { "execute": "query-command-line-options", - "id": "libvirt-35" + "id": "libvirt-36" } { @@ -3027,12 +3085,12 @@ "option": "drive" } ], - "id": "libvirt-35" + "id": "libvirt-36" } { "execute": "query-migrate-capabilities", - "id": "libvirt-36" + "id": "libvirt-37" } { @@ -3102,12 +3160,12 @@ "capability": "late-block-activate" } ], - "id": "libvirt-36" + "id": "libvirt-37" } { "execute": "query-qmp-schema", - "id": "libvirt-37" + "id": "libvirt-38" } { @@ -14782,5 +14840,5 @@ "meta-type": "object" } ], - "id": "libvirt-37" + "id": "libvirt-38" } diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml index 183ad7cc6c..892c40f632 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml @@ -105,6 +105,7 @@ <flag name='tpm-emulator'/> <flag name='egl-headless'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>3000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.replies b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.replies index 23188fffb6..71cbcb85fd 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.replies +++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.replies @@ -1821,13 +1821,71 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": "memory-backend-memfd" }, "id": "libvirt-31" } { - "id": "libvirt-31", + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + }, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-31" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-32" +} + +{ + "id": "libvirt-32", "error": { "class": "DeviceNotFound", "desc": "Class 'spapr-machine' not found" @@ -1836,7 +1894,7 @@ { "execute": "query-machines", - "id": "libvirt-32" + "id": "libvirt-33" } { @@ -1873,23 +1931,23 @@ "cpu-max": 1 } ], - "id": "libvirt-32" + "id": "libvirt-33" } { "execute": "query-tpm-models", - "id": "libvirt-33" + "id": "libvirt-34" } { "return": [ ], - "id": "libvirt-33" + "id": "libvirt-34" } { "execute": "query-tpm-types", - "id": "libvirt-34" + "id": "libvirt-35" } { @@ -1897,12 +1955,12 @@ "passthrough", "emulator" ], - "id": "libvirt-34" + "id": "libvirt-35" } { "execute": "query-command-line-options", - "id": "libvirt-35" + "id": "libvirt-36" } { @@ -3027,12 +3085,12 @@ "option": "drive" } ], - "id": "libvirt-35" + "id": "libvirt-36" } { 
"execute": "query-migrate-capabilities", - "id": "libvirt-36" + "id": "libvirt-37" } { @@ -3102,12 +3160,12 @@ "capability": "late-block-activate" } ], - "id": "libvirt-36" + "id": "libvirt-37" } { "execute": "query-qmp-schema", - "id": "libvirt-37" + "id": "libvirt-38" } { @@ -14782,5 +14840,5 @@ "meta-type": "object" } ], - "id": "libvirt-37" + "id": "libvirt-38" } diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml index f2f32e3025..0676cef108 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml @@ -105,6 +105,7 @@ <flag name='tpm-emulator'/> <flag name='egl-headless'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>3000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.replies b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.replies index fdab682f94..f06b44724d 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.replies +++ b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.replies @@ -5076,13 +5076,71 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": "memory-backend-memfd" }, "id": "libvirt-40" } { - "id": "libvirt-40", + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + }, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-40" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-41" +} + +{ + "id": "libvirt-41", "error": { "class": "DeviceNotFound", "desc": "Class 'spapr-machine' not found" @@ -5091,7 +5149,7 @@ { "execute": "query-machines", - "id": "libvirt-41" + "id": "libvirt-42" } { @@ -5300,12 +5358,12 @@ "cpu-max": 255 } ], - "id": "libvirt-41" + "id": "libvirt-42" } { "execute": "query-cpu-definitions", - "id": "libvirt-42" + "id": "libvirt-43" } { @@ -5742,12 +5800,12 @@ "migration-safe": true } ], - "id": "libvirt-42" + "id": "libvirt-43" } { "execute": "query-tpm-models", - "id": "libvirt-43" + "id": "libvirt-44" } { @@ -5755,12 +5813,12 @@ "tpm-crb", "tpm-tis" ], - "id": "libvirt-43" + "id": "libvirt-44" } { "execute": "query-tpm-types", - "id": "libvirt-44" + "id": "libvirt-45" } { @@ -5768,12 +5826,12 @@ "passthrough", "emulator" ], - "id": "libvirt-44" + "id": "libvirt-45" } { "execute": "query-command-line-options", - "id": "libvirt-45" + "id": "libvirt-46" } { @@ -7072,12 +7130,12 @@ "option": "drive" } ], - "id": "libvirt-45" + "id": "libvirt-46" } { "execute": "query-migrate-capabilities", - "id": "libvirt-46" + "id": "libvirt-47" } { @@ -7147,12 +7205,12 @@ "capability": "late-block-activate" } ], - "id": "libvirt-46" + "id": "libvirt-47" } { "execute": "query-qmp-schema", - "id": "libvirt-47" + "id": "libvirt-48" } { @@ -19032,7 +19090,7 @@ "meta-type": "object" } ], - "id": "libvirt-47" + "id": "libvirt-48" } { @@ -19043,7 +19101,7 @@ "name": "host" } }, - "id": "libvirt-48" + "id": "libvirt-49" } { @@ -19236,7 +19294,7 @@ } } }, - "id": "libvirt-48" + "id": 
"libvirt-49" } { @@ -19431,7 +19489,7 @@ } } }, - "id": "libvirt-49" + "id": "libvirt-50" } { @@ -19694,7 +19752,7 @@ } } }, - "id": "libvirt-49" + "id": "libvirt-50" } { @@ -19708,7 +19766,7 @@ } } }, - "id": "libvirt-50" + "id": "libvirt-51" } { @@ -19901,7 +19959,7 @@ } } }, - "id": "libvirt-50" + "id": "libvirt-51" } { @@ -20096,7 +20154,7 @@ } } }, - "id": "libvirt-51" + "id": "libvirt-52" } { @@ -20359,16 +20417,16 @@ } } }, - "id": "libvirt-51" + "id": "libvirt-52" } { "execute": "query-sev-capabilities", - "id": "libvirt-52" + "id": "libvirt-53" } { - "id": "libvirt-52", + "id": "libvirt-53", "error": { "class": "GenericError", "desc": "SEV feature is not available" diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml index e4665af165..19354ed72a 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml @@ -214,9 +214,10 @@ <flag name='egl-headless'/> <flag name='vfio-pci.display'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>3000000</version> <kvmVersion>0</kvmVersion> - <microcodeVersion>427391</microcodeVersion> + <microcodeVersion>428206</microcodeVersion> <package>v3.0.0</package> <arch>x86_64</arch> <hostCPU type='kvm' model='base' migratability='yes'> -- 2.19.0.rc1

On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
QEMU 3.1 should only expose the property if the host is actually capable of creating hugetlb-backed memfd. However, it may fail at runtime depending on the requested "hugetlbsize".
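[For illustration, the runtime failure mode described above comes down to the underlying memfd_create(2) syscall: creating a hugetlb-backed memfd can succeed while the actual huge-page allocation fails later, depending on the requested page size and how many pages the host has free. A minimal standalone C sketch of the syscall, not libvirt or QEMU code:

    #define _GNU_SOURCE
    #include <linux/memfd.h>   /* MFD_HUGETLB, MFD_HUGE_2MB */
    #include <sys/mman.h>      /* memfd_create() (glibc >= 2.27) */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        size_t sz = 16 * 1024 * 1024;  /* must be a multiple of the huge page size */

        /* Hugetlb-backed memfd with 2 MiB pages; the MFD_HUGE_* flag encodes
         * the page-size shift, which is what "hugetlbsize" selects in QEMU. */
        int fd = memfd_create("guestmem", MFD_CLOEXEC | MFD_HUGETLB | MFD_HUGE_2MB);
        if (fd < 0) {
            perror("memfd_create");  /* e.g. kernel without hugetlb memfd support */
            return 1;
        }
        if (ftruncate(fd, sz) < 0) {
            perror("ftruncate");
            return 1;
        }
        /* The huge page reservation happens at mmap time: this fails with
         * ENOMEM when no free huge pages of the requested size exist. */
        if (mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0) == MAP_FAILED)
            perror("mmap");
        close(fd);
        return 0;
    }
]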
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_capabilities.c | 8 ++ src/qemu/qemu_capabilities.h | 1 + .../caps_2.12.0.aarch64.replies | 94 ++++++++++++--- .../caps_2.12.0.aarch64.xml | 3 +- .../caps_2.12.0.ppc64.replies | 90 +++++++++++--- .../caps_2.12.0.ppc64.xml | 3 +- .../caps_2.12.0.s390x.replies | 98 ++++++++++++---- .../caps_2.12.0.s390x.xml | 3 +- .../caps_2.12.0.x86_64.replies | 110 +++++++++++++----- .../caps_2.12.0.x86_64.xml | 3 +- .../caps_3.0.0.ppc64.replies | 90 +++++++++++--- .../qemucapabilitiesdata/caps_3.0.0.ppc64.xml | 3 +- .../caps_3.0.0.riscv32.replies | 86 +++++++++++--- .../caps_3.0.0.riscv32.xml | 1 + .../caps_3.0.0.riscv64.replies | 86 +++++++++++--- .../caps_3.0.0.riscv64.xml | 1 + .../caps_3.0.0.x86_64.replies | 110 +++++++++++++----- .../caps_3.0.0.x86_64.xml | 3 +- 18 files changed, 637 insertions(+), 156 deletions(-)
This one ended up being uglier because Jano posted, got ACK'd, and pushed a series that removed some capabilities and adjusted the *.replies numbering. I was able to merge with some amount of effort. Some grumble, mumble about it being easier when there's a qapi/*.json file that lists the command and argument options, as opposed to looking through source code ;-)

Reviewed-by: John Ferlan <jferlan@redhat.com>

John

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 src/qemu/qemu_command.c | 61 +++++++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 18 deletions(-)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index c9e3a91e32..97cfc8a18d 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -3113,6 +3113,24 @@ qemuBuildControllerDevCommandLine(virCommandPtr cmd,
     return ret;
 }

+static int
+qemuBuildMemoryBackendPropsShare(virJSONValuePtr props,
+                                 virDomainMemoryAccess memAccess)
+{
+    switch (memAccess) {
+    case VIR_DOMAIN_MEMORY_ACCESS_SHARED:
+        return virJSONValueObjectAdd(props, "b:share", true, NULL);
+
+    case VIR_DOMAIN_MEMORY_ACCESS_PRIVATE:
+        return virJSONValueObjectAdd(props, "b:share", false, NULL);
+
+    case VIR_DOMAIN_MEMORY_ACCESS_DEFAULT:
+    case VIR_DOMAIN_MEMORY_ACCESS_LAST:
+        break;
+    }
+
+    return 0;
+}

 /**
  * qemuBuildMemoryBackendProps:
@@ -3259,7 +3277,23 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps,
     if (!(props = virJSONValueNewObject()))
         return -1;

-    if (useHugepage || mem->nvdimmPath || memAccess ||
+    if (!mem->nvdimmPath &&
+        def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_ANONYMOUS &&
+        virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD) &&
+        (!useHugepage ||
+         virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB))) {
+        backendType = "memory-backend-memfd";
+
+        if (useHugepage &&
+            (virJSONValueObjectAdd(props, "b:hugetlb", useHugepage, NULL) < 0 ||
+             virJSONValueObjectAdd(props, "U:hugetlbsize", pagesize << 10, NULL) < 0)) {
+            goto cleanup;
+        }
+
+        if (qemuBuildMemoryBackendPropsShare(props, memAccess) < 0) {
+            goto cleanup;
+        }
+
+    } else if (useHugepage || mem->nvdimmPath || memAccess ||
         def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_FILE) {

         if (mem->nvdimmPath) {
@@ -3297,20 +3331,8 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps,
             goto cleanup;
         }

-        switch (memAccess) {
-        case VIR_DOMAIN_MEMORY_ACCESS_SHARED:
-            if (virJSONValueObjectAdd(props, "b:share", true, NULL) < 0)
-                goto cleanup;
-            break;
-
-        case VIR_DOMAIN_MEMORY_ACCESS_PRIVATE:
-            if (virJSONValueObjectAdd(props, "b:share", false, NULL) < 0)
-                goto cleanup;
-            break;
-
-        case VIR_DOMAIN_MEMORY_ACCESS_DEFAULT:
-        case VIR_DOMAIN_MEMORY_ACCESS_LAST:
-            break;
+        if (qemuBuildMemoryBackendPropsShare(props, memAccess) < 0) {
+            goto cleanup;
         }
     } else {
         backendType = "memory-backend-ram";
@@ -7609,7 +7631,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,

     if (virDomainNumatuneHasPerNodeBinding(def->numa) &&
         !(virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) ||
-          virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE))) {
+          virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE) ||
+          virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD))) {
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                        _("Per-node memory binding is not supported "
                          "with this QEMU"));
@@ -7618,7 +7641,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,

     if (def->mem.nhugepages &&
         def->mem.hugepages[0].size != system_page_size &&
-        !virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE)) {
+        !(virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE) ||
+          virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB))) {
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                        _("huge pages per NUMA node are not "
                          "supported with this QEMU"));
@@ -7635,7 +7659,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,
      * need to check which approach to use */
     for (i = 0; i < ncells; i++) {
         if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) ||
-            virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE)) {
+            virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE) ||
+            virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD)) {

             if ((rc = qemuBuildMemoryCellBackendStr(def, cfg, i,
                                                     priv,
                                                     &nodeBackends[i])) < 0)
-- 
2.19.0.rc1

On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Would be nice to have a few more words here. If you provide them I can add them... The if statement is difficult to read unless you know what each field really means.

Secondary question - should we document what gets used? e.g.:

https://libvirt.org/formatdomain.html#elementsMemoryBacking

Seems to me the preference to use memfd is for memory backing using anonymous source for nvdimm's without a defined path, but sometimes my wording doesn't match reality.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_command.c | 61 +++++++++++++++++++++++++++++------------ 1 file changed, 43 insertions(+), 18 deletions(-)
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index c9e3a91e32..97cfc8a18d 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -3113,6 +3113,24 @@ qemuBuildControllerDevCommandLine(virCommandPtr cmd,
     return ret;
 }

+static int
+qemuBuildMemoryBackendPropsShare(virJSONValuePtr props,
+                                 virDomainMemoryAccess memAccess)
+{
+    switch (memAccess) {
+    case VIR_DOMAIN_MEMORY_ACCESS_SHARED:
+        return virJSONValueObjectAdd(props, "b:share", true, NULL);
+
+    case VIR_DOMAIN_MEMORY_ACCESS_PRIVATE:
+        return virJSONValueObjectAdd(props, "b:share", false, NULL);
+
+    case VIR_DOMAIN_MEMORY_ACCESS_DEFAULT:
+    case VIR_DOMAIN_MEMORY_ACCESS_LAST:
+        break;
+    }
+
+    return 0;
+}
/** * qemuBuildMemoryBackendProps:
The comments should have been updated... In particular: "Then, if one of the two memory-backend-* should be used..."
@@ -3259,7 +3277,23 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps, if (!(props = virJSONValueNewObject())) return -1;
- if (useHugepage || mem->nvdimmPath || memAccess ||
Is this preference over "-ram" or "-file"? It would seem to me someone choosing "file" has a specific case and this is more for those other options where if capabilities exist, then we try to use them.
+    if (!mem->nvdimmPath &&
+        def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_ANONYMOUS &&
+        virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD) &&
+        (!useHugepage ||
+         virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB))) {
+        backendType = "memory-backend-memfd";
+
+        if (useHugepage &&
+            (virJSONValueObjectAdd(props, "b:hugetlb", useHugepage, NULL) < 0 ||
+             virJSONValueObjectAdd(props, "U:hugetlbsize", pagesize << 10, NULL) < 0)) {
+            goto cleanup;
+        }
+
+        if (qemuBuildMemoryBackendPropsShare(props, memAccess) < 0) {
+            goto cleanup;
+        }
Running syntax-check would fail because of the { }
+
+    } else if (useHugepage || mem->nvdimmPath || memAccess ||
         def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_FILE) {

         if (mem->nvdimmPath) {
@@ -3297,20 +3331,8 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps,
             goto cleanup;
         }

-        switch (memAccess) {
-        case VIR_DOMAIN_MEMORY_ACCESS_SHARED:
-            if (virJSONValueObjectAdd(props, "b:share", true, NULL) < 0)
-                goto cleanup;
-            break;
-
-        case VIR_DOMAIN_MEMORY_ACCESS_PRIVATE:
-            if (virJSONValueObjectAdd(props, "b:share", false, NULL) < 0)
-                goto cleanup;
-            break;
-
-        case VIR_DOMAIN_MEMORY_ACCESS_DEFAULT:
-        case VIR_DOMAIN_MEMORY_ACCESS_LAST:
-            break;
+        if (qemuBuildMemoryBackendPropsShare(props, memAccess) < 0) {
+            goto cleanup;
         }
Running syntax-check would fail here because of the { }

All this is fixable without you needing to post another series, but I need you to provide the verbiage for the intro and perhaps something that could be added to the web page. I can adjust the patch accordingly.

Assuming of course Michal doesn't have other reservations...

Reviewed-by: John Ferlan <jferlan@redhat.com>

John
     } else {
         backendType = "memory-backend-ram";
@@ -7609,7 +7631,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,

     if (virDomainNumatuneHasPerNodeBinding(def->numa) &&
         !(virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) ||
-          virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE))) {
+          virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE) ||
+          virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD))) {
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                        _("Per-node memory binding is not supported "
                          "with this QEMU"));
@@ -7618,7 +7641,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,

     if (def->mem.nhugepages &&
         def->mem.hugepages[0].size != system_page_size &&
-        !virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE)) {
+        !(virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE) ||
+          virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB))) {
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                        _("huge pages per NUMA node are not "
                          "supported with this QEMU"));
@@ -7635,7 +7659,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,
      * need to check which approach to use */
     for (i = 0; i < ncells; i++) {
         if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) ||
-            virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE)) {
+            virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE) ||
+            virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD)) {
if ((rc = qemuBuildMemoryCellBackendStr(def, cfg, i, priv, &nodeBackends[i])) < 0)

Hi On Tue, Sep 11, 2018 at 2:46 AM, John Ferlan <jferlan@redhat.com> wrote:
On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Would be nice to have a few more words here. If you provide them I can add them... The if statement is difficult to read unless you know what each field really means.
hostmem-memfd is quite similar to hostmem-file. The main benefits are that it doesn't need to create filesystem files, and it also enforces sealing, providing a bit more safety.
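[To make the difference concrete, the generated backend objects differ roughly as follows; these are illustrative command lines with made-up id/path/size values, not output taken from the series' test data:

    # file-backed: needs a path on a hugetlbfs/tmpfs mount that libvirt must manage
    -object memory-backend-file,id=ram-node0,mem-path=/dev/hugepages/libvirt/qemu,share=yes,size=1073741824

    # memfd-backed: no filesystem path, and QEMU seals the memory region
    -object memory-backend-memfd,id=ram-node0,share=yes,size=1073741824
]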
secondary question - should we document what gets used?, e.g.:
https://libvirt.org/formatdomain.html#elementsMemoryBacking
Seems to me the preference to use memfd is for memory backing using anonymous source for nvdimm's without a defined path, but sometimes my wording doesn't match reality.
Yes, it could be documented. But it's now an allocation decision that could evolve, or an implementation detail.

Would you like to see something like that?

         <dt><code>source</code></dt>
-        <dd>In this attribute you can switch to file memorybacking or keep
-        default anonymous.</dd>
+        <dd>In this attribute you can switch to file memorybacking or
+        keep default anonymous. <span class="since">Since 4.8.0</span>,
+        when the memory is anonymous and the host supports it, libvirt
+        will use a memfd memory backing, providing additional safety
+        guarantees.
+        </dd>
         <dt><code>access</code></dt>
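[For context, the element being documented is <memoryBacking> in the domain XML; a guest with the default anonymous source would transparently get the memfd backend on capable hosts. An illustrative snippet (page size and access mode are made up for the example):

    <memoryBacking>
      <hugepages>
        <page size='2048' unit='KiB'/>
      </hugepages>
      <source type='anonymous'/>
      <access mode='shared'/>
    </memoryBacking>
]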
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- src/qemu/qemu_command.c | 61 +++++++++++++++++++++++++++++------------ 1 file changed, 43 insertions(+), 18 deletions(-)
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index c9e3a91e32..97cfc8a18d 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3113,6 +3113,24 @@ qemuBuildControllerDevCommandLine(virCommandPtr cmd, return ret; }
+static int +qemuBuildMemoryBackendPropsShare(virJSONValuePtr props, + virDomainMemoryAccess memAccess) +{ + switch (memAccess) { + case VIR_DOMAIN_MEMORY_ACCESS_SHARED: + return virJSONValueObjectAdd(props, "b:share", true, NULL); + + case VIR_DOMAIN_MEMORY_ACCESS_PRIVATE: + return virJSONValueObjectAdd(props, "b:share", false, NULL); + + case VIR_DOMAIN_MEMORY_ACCESS_DEFAULT: + case VIR_DOMAIN_MEMORY_ACCESS_LAST: + break; + } + + return 0; +}
/** * qemuBuildMemoryBackendProps:
The comments should have been updated... In particular:
"Then, if one of the two memory-backend-* should be used..."
@@ -3259,7 +3277,23 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps, if (!(props = virJSONValueNewObject())) return -1;
- if (useHugepage || mem->nvdimmPath || memAccess ||
Is this preference over "-ram" or "-file"? It would seem to me someone choosing "file" has a specific case and this is more for those other options where if capabilities exist, then we try to use them.
(tbh, I don't know if you could have both nvdimmPath and source == ANONYMOUS set. That seems incompatible to me.)

So the 'if' statement reads: if the memory source is anonymous, and QEMU supports memfd (and, if hugepages are requested, memfd supports them), then let's use memfd; otherwise, keep/use the existing allocation rules.
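[Concretely, with 2 MiB hugepages requested that branch would emit something along these lines; an illustrative line, not copied from the test data. Note that "hugetlbsize" is in bytes, hence the pagesize << 10 conversion from libvirt's KiB page size:

    -object memory-backend-memfd,id=ram-node0,hugetlb=yes,hugetlbsize=2097152,share=yes,size=1073741824
]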
+    if (!mem->nvdimmPath &&
+        def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_ANONYMOUS &&
+        virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD) &&
+        (!useHugepage ||
+         virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB))) {
+        backendType = "memory-backend-memfd";
+
+        if (useHugepage &&
+            (virJSONValueObjectAdd(props, "b:hugetlb", useHugepage, NULL) < 0 ||
+             virJSONValueObjectAdd(props, "U:hugetlbsize", pagesize << 10, NULL) < 0)) {
+            goto cleanup;
+        }
+
+        if (qemuBuildMemoryBackendPropsShare(props, memAccess) < 0) {
+            goto cleanup;
+        }
Running syntax-check would fail because of the { }
Sorry, I keep mixing with the QEMU coding style...
+ + } else if (useHugepage || mem->nvdimmPath || memAccess || def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_FILE) {
if (mem->nvdimmPath) { @@ -3297,20 +3331,8 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps, goto cleanup; }
- switch (memAccess) { - case VIR_DOMAIN_MEMORY_ACCESS_SHARED: - if (virJSONValueObjectAdd(props, "b:share", true, NULL) < 0) - goto cleanup; - break; - - case VIR_DOMAIN_MEMORY_ACCESS_PRIVATE: - if (virJSONValueObjectAdd(props, "b:share", false, NULL) < 0) - goto cleanup; - break; - - case VIR_DOMAIN_MEMORY_ACCESS_DEFAULT: - case VIR_DOMAIN_MEMORY_ACCESS_LAST: - break; + if (qemuBuildMemoryBackendPropsShare(props, memAccess) < 0) { + goto cleanup; }
Running syntax-check would fail here because of the { }
All this is fix-able without you needing to post another series, but I need you to provide the verbiage for the intro and perhaps something that could be added to the web page. I can adjust the patch accordingly.
Assuming of course Michal doesn't have other reservations...
Reviewed-by: John Ferlan <jferlan@redhat.com>
If you already resolved the patch 1 & 2 conflicts, I would appreciate it if you could take care of it. Otherwise I'll have to rebase & resend the patches.

thanks
John
} else { backendType = "memory-backend-ram"; @@ -7609,7 +7631,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,
if (virDomainNumatuneHasPerNodeBinding(def->numa) && !(virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) || - virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE))) { + virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE) || + virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD))) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("Per-node memory binding is not supported " "with this QEMU")); @@ -7618,7 +7641,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,
if (def->mem.nhugepages && def->mem.hugepages[0].size != system_page_size && - !virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE)) { + !(virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE) || + virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB))) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("huge pages per NUMA node are not " "supported with this QEMU")); @@ -7635,7 +7659,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, * need to check which approach to use */ for (i = 0; i < ncells; i++) { if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) || - virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE)) { + virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE) || + virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD)) {
if ((rc = qemuBuildMemoryCellBackendStr(def, cfg, i, priv, &nodeBackends[i])) < 0)

On 09/11/2018 09:53 AM, Marc-André Lureau wrote:
Hi
On Tue, Sep 11, 2018 at 2:46 AM, John Ferlan <jferlan@redhat.com> wrote:
On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Would be nice to have a few more words here. If you provide them I can add them... The if statement is difficult to read unless you know what each field really means.
hostmem-memfd is quite similar to hostmem-file. The main benefits are that it doesn't need to create filesystem files, and it also enforces sealing, providing a bit more safety.
secondary question - should we document what gets used?, e.g.:
https://libvirt.org/formatdomain.html#elementsMemoryBacking
Seems to me the preference to use memfd is for memory backing using anonymous source for nvdimm's without a defined path, but sometimes my wording doesn't match reality.
Yes it could be documented. But it's now an allocation decision that could evolve, or an implementation detail.
Would you like to see something like that?
         <dt><code>source</code></dt>
-        <dd>In this attribute you can switch to file memorybacking or keep
-        default anonymous.</dd>
+        <dd>In this attribute you can switch to file memorybacking or
+        keep default anonymous. <span class="since">Since 4.8.0</span>,
+        when the memory is anonymous and the host supports it, libvirt
+        will use a memfd memory backing, providing additional safety
+        guarantees.
+        </dd>
         <dt><code>access</code></dt>
I don't think we should document this because:

a) once we do, it's harder to change because of backwards compatibility. Imagine a bz like this: "with these domain settings libvirt was putting backend X onto the cmd line and now that's changed to backend Y".

b) it's none of the user's business how libvirt fulfils the domain settings.

In general, libvirt has to deal with custom-built qemus where features might be disabled, for instance memfd.hugetlb. Then we don't use that and fall back to memory-backend-*. Also, users that rely on a certain backend probably have a broken setup anyway.

Michal

On 09/11/2018 03:53 AM, Marc-André Lureau wrote:
Hi
On Tue, Sep 11, 2018 at 2:46 AM, John Ferlan <jferlan@redhat.com> wrote:
On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Would be nice to have a few more words here. If you provide them I can add them... The if statement is difficult to read unless you know what each field really means.
hostmem-memfd is quite similar to hostmem-file. The main benefits are that it doesn't need to create filesystem files, and it also enforces sealing, providing a bit more safety.
So I can modify the commit message to:

    qemu: Prefer using memfd for anonymous memory

    Add support for and prefer usage of the hostmem-memfd backing when
    the memory backing source is anonymous and the QEMU capabilities
    support it. The main benefits are that it doesn't need to create
    filesystem files, and it also enforces sealing, providing a bit
    more safety.
[...]
I won't make changes to the docs to describe how things are working; I'll defer to Michal's reasoning. It's why I ask, though, especially in areas where I have less exposure.
[...]
If you already resolved the patch 1 & 2 conflicts, I would appreciate it if you could take care of it. Otherwise I'll have to rebase & resend the patches.
thanks
I don't mind making the changes; however, based on the continued migration discussion and the possible need for more libvirt changes, I can also hold off. I could push the adjustments for patches 1 and 2, since they're essentially ready. That may ruffle a few feathers, but so does making changes to the capabilities when there are already patches on the list that will be impacted by the pushed changes.

I also don't think at this point it's a question of "if" support for -memfd will be added, but "when". I also assume the migration questions can be worked out. Since this change would affect a 'default rule' for putting together the backing store, I think perhaps we could modify the existing helper qemuDomainABIStabilityCheck in order to make some "similar" or "sort of" checks about the existing and future assumption. I think that'll solve the "issue" at least from the POV of what QEMU will allow.

John

[...]
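A rough sketch of the kind of check being suggested (qemuDomainABIStabilityCheck is the real helper; qemuDomainDefMemfdEligible() is invented here purely for illustration):

static bool
qemuDomainABIStabilityCheck(const virDomainDef *src,
                            const virDomainDef *dst)
{
    /* ... existing checks ... */

    /* Hypothetical: refuse a destination definition whose anonymous
     * memory would be backed by a different object type than the
     * source's. */
    if (qemuDomainDefMemfdEligible(src) != qemuDomainDefMemfdEligible(dst)) {
        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                       _("target memory backing would not match source"));
        return false;
    }

    return true;
}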

On 09/11/2018 12:46 AM, John Ferlan wrote:
On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Would be nice to have a few more words here. If you provide them I can add them... The if statement is difficult to read unless you know what each field really means.
secondary question - should we document what gets used?, e.g.:
https://libvirt.org/formatdomain.html#elementsMemoryBacking
Seems to me the preference to use memfd is for memory backing using anonymous source for nvdimm's without a defined path, but sometimes my wording doesn't match reality.
I don't think we want to tell users which backend we are going to use under what conditions. Firstly, these conditions will change (as they did in the past). Secondly, which backend libvirt decides to use is no business of users. I mean, they care about providing XML that matches their demands; it's libvirt's job to fulfil them.

Look at this from the other way: if a user wants to have memory-backend-file for their domain, how would they enforce it once memfd is merged? Sure, they can tweak their memoryBacking settings, but that would work only until we decide to change the decision process for the mem backend.

What I am more worried about is migration. What happens if I migrate a hugepages domain from an older libvirt to a newer one (the former doesn't support memfd, the latter does)? On the source the domain was started with memory-backend-file (or memory-backend-ram with -mem-path), and during migration the generated cmd line would use memfd. And I don't think qemu is capable of dealing with this discrepancy, is it?

Or is memfd going to be used only for the hugepages + <source type='anonymous'/> case (which is not allowed now, and thus the migration scenario I'm describing can't happen)?

Michal

Hi

On Tue, Sep 11, 2018 at 12:37 PM, Michal Privoznik <mprivozn@redhat.com> wrote:
[...]
What I am more worried about is migration. What happens if I migrate a hugepages domain from older libvirt to a newer one (the former doesn't support memfd, the latter does). On the source the domain was started with memory-backend-file (or memory-backend-ram with -mem-path). And during migration, the generated cmd line would use memfd. And I don't think qemu is capable of dealing with this discrepancy, is it?
Actually, qemu doesn't care about the hostmem backend kind, it should handle the migration ok. However, there seems to be a bug in qemu, and hostmem backends don't use the right qom object name.

with memory-backend-ram:

(qemu) info qom-tree /objects
/objects (container)
  /mem (memory-backend-file)
    /mem[0] (qemu:memory-region)

But with memory-backend-file or memory-backend-memfd:

(qemu) info qom-tree /objects
/objects (container)
  /mem (memory-backend-file)
    /\x2fobjects\x2fmem[0] (qemu:memory-region)

This causes migration to fail because of the object naming mismatch. It can migrate from/to -file and -memfd, since they use the same "broken" name, but not with -ram.

I don't know how we can solve this migration issue without breaking things further. Any idea David?
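For reference, the mismatch is easy to reproduce with a local migration between the two backends (ports are illustrative; the destination then rejects the stream with an "Unknown ramblock"-style error, since one side names the block "mem" and the other "/objects/mem"):

# destination: anonymous memory via memfd
qemu-system-x86_64 -m 1024 -object memory-backend-memfd,id=mem,size=1G \
    -numa node,memdev=mem -incoming tcp:0:4444

# source: same guest, but memory-backend-ram
qemu-system-x86_64 -m 1024 -object memory-backend-ram,id=mem,size=1G \
    -numa node,memdev=mem -monitor stdio
(qemu) migrate tcp:127.0.0.1:4444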
Or is memfd going to be used only for hugepages + <source type='anonymous'/> case (which is not allowed now and thus migration scenario I'm describing can't happen)?
With those patches, memfd is used for anonymous memory (shared or not, hpt or not) with an explicit numa configuration.

thanks
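Concretely (an illustrative pairing, not the exact data from the memfd-memory-numa test added in patch 5), a guest with anonymous shared memory backing, e.g.:

  <memoryBacking>
    <access mode='shared'/>
  </memoryBacking>

and a <numa> cell would now be started with something along the lines of:

  -object memory-backend-memfd,id=ram-node0,share=yes,size=1073741824 \
  -numa node,nodeid=0,cpus=0-7,memdev=ram-node0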

* Marc-André Lureau (marcandre.lureau@redhat.com) wrote:
[...]
Actually, qemu doesn't care about the hostmem backend kind, it should handle the migration ok.
However, there seems to be a bug in qemu, and hostmem backends don't use the right qom object name.
Can you give me the command lines you're using?

Dave
[...]

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Hi

On Tue, Sep 11, 2018 at 2:32 PM, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
[...]
Can you give me the command lines you're using?
qemu -m 4096 -object memory-backend-ram,id=mem,size=4G -numa node,memdev=mem -monitor stdio
qemu -m 4096 -object memory-backend-file,id=mem,size=4G,mem-path=/tmp/foo -numa node,memdev=mem -monitor stdio
qemu -m 4096 -object memory-backend-memfd,id=mem,size=4G -numa node,memdev=mem -monitor stdio
[...]

* Marc-André Lureau (marcandre.lureau@redhat.com) wrote:
[...]
There seem to be two different problems (at least); there's that escaping problem where the /'s are shown as \x2f in info qom-tree, but info ramblock looks saner, though it is still showing the difference:

./x86_64-softmmu/qemu-system-x86_64 -m 1024 -object memory-backend-ram,id=mem,size=1G -numa node,memdev=mem -monitor stdio
(qemu) info ramblock
  Block Name    PSize                Offset               Used              Total
         mem    4 KiB  0x0000000000000000 0x0000000040000000 0x0000000040000000

./x86_64-softmmu/qemu-system-x86_64 -m 1024 -object memory-backend-file,id=mem,size=1G,mem-path=/tmp/foo -numa node,memdev=mem -monitor stdio
(qemu) info ramblock
  Block Name    PSize                Offset               Used              Total
/objects/mem    4 KiB  0x0000000000000000 0x0000000040000000 0x0000000040000000

./x86_64-softmmu/qemu-system-x86_64 -m 1024 -object memory-backend-memfd,id=mem,size=1G -numa node,memdev=mem -monitor stdio
QEMU 3.0.50 monitor - type 'help' for more information
(qemu) info ramblock
  Block Name    PSize                Offset               Used              Total
/objects/mem    4 KiB  0x0000000000000000 0x0000000040000000 0x0000000040000000

hostmem-file.c is using object_get_canonical_path to get the RAMBlock name, whereas hostmem-ram.c is using object_get_canonical_path_**component**.

The problem is that if we change either of them then again we break migration compatibility. We could wire it to a machine type and/or property, so that memory-backend-ram would use the long name on newer qemus with an appropriate flag?

Dave
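Compressed into a few lines, the divergence looks like this (a sketch against the QEMU QOM API; error handling and string ownership are elided):

/* Given -object memory-backend-ram,id=mem, the backend object sits at
 * /objects/mem, so the two helpers return different RAMBlock names. */
Object *o = object_resolve_path("/objects/mem", NULL);
char *full = object_get_canonical_path(o);            /* "/objects/mem" */
char *comp = object_get_canonical_path_component(o);  /* "mem" */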
[...]

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Hi

On Tue, Sep 11, 2018 at 3:26 PM, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
[...]
There seem to be two different problems (at least); there's that escaping problem where the /'s are shown as \x2f in info qom-tree,
That's not a problem, this is done in memory_region_escape_name()
but info ramblock looks saner, but is still showing the difference:
[...]
hostmem-file.c is using object_get_canonical_path to get the RAMBlock name, whereas hostmem-ram.c is using object_get_canonical_path_**component**.
The problem is that if we change either of them then again we break migration compatibility.
Yes, that was the object of my question :)
We could wire it to a machine type and/or property, so that memory-backend-ram would use the long name on newer qemus with an appropriate flag?
Good idea, I can prepare a patch. However, libvirt will have to learn about this migration issue with older versions; it's probably not worth trying to make more workarounds.
[...]

* Marc-André Lureau (marcandre.lureau@redhat.com) wrote:
[...]
Good idea, I can prepare a patch.
Great; if you add the property to use the long name, then turn that property on in the newer machine type, it should work. A qemu that has the property can then be assumed to do the right thing when it's set.
However, libvirt will have to learn about this migration issue with older versions; it's probably not worth trying to make more workarounds.
Yeh, I'm not sure what your heuristics look like for these choices. But for a VM without this fix, you can't convert from backend-ram to memfd.

Dave
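A sketch of what that wiring could look like in backends/hostmem-ram.c (the use_canonical_path flag is hypothetical at this point; ram_backend_memory_alloc() and the two QOM helpers are real):

#include "qemu/osdep.h"
#include "sysemu/hostmem.h"
#include "qom/object.h"

static void
ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
{
    char *name;

    /* Hypothetical machine-type-controlled flag: new machine types get
     * the canonical (long) name, old ones keep the component name. */
    if (backend->use_canonical_path)
        name = object_get_canonical_path(OBJECT(backend));
    else
        name = object_get_canonical_path_component(OBJECT(backend));

    memory_region_init_ram(&backend->mr, OBJECT(backend), name,
                           backend->size, errp);
    g_free(name);
}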
[...]

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

On Tue, 11 Sep 2018 12:49:12 +0100 "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
[...]
Good idea, I can prepare a patch.
Great; if you add the property to use the long name, then turn that property on in the newer machine type, it should work. A qemu that has the property can then be assumed to do the right thing when it's set.

The compat properties mechanism is applicable only to device-based objects, and backends are not based on it. So it won't be so easy; one would basically need to re-implement, or even better extend, the compat props mechanism to backends.
However, libvirt will have to learn about this migration issue with older versions; it's probably not worth trying to make more workarounds.
Yeh, I'm not sure what your heuristics look like for these choices. But for a VM without this fix, you can't convert from backend-ram to memfd.

I wouldn't try to migrate from one backend type to another automatically. If the domain used backend-ram, then libvirt should start the target with the same backend (it's not only the ram block name in the migration stream; it could also involve the ramblock's alignment, padding, guard pages or something else, as these are different backends and can potentially change their default behavior independently from each other).
Redefining the meaning of 'anonymous' from backend-ram to memfd is fine only if libvirt is able to distinguish old domains with the ram backend from memfd ones (so it could start domains accordingly, i.e. no cross migration). Otherwise we would be creating a time bomb that would explode when 2 independent backends change in an incompatible manner.
[...]

Hi

On Tue, Sep 11, 2018 at 5:14 PM, Igor Mammedov <imammedo@redhat.com> wrote:
[...]

compat properties mechanism is applicable only to device-based objects, and backends are not based on it. So it won't be so easy; one would basically need to re-implement, or even better extend, the compat props mechanism to backends.
indeed
However, libvirt will have to learn about this migration issue with older versions; it's probably not worth trying to make more workarounds.
I wouldn't try to migrate from one backend type to another automatically. If the domain used backend-ram, then libvirt should start the target with the same backend (it's not only the ram block name in the migration stream; it could also involve the ramblock's alignment, padding, guard pages or something else, as these are different backends and can potentially change their default behavior independently from each other).
Then libvirt can't transparently use memfd, and we will go back to my initial suggestion to have a new memory backing source kind in the domain XML named "memfd".

Are "ramblock's alignment, padding, guard pages" exposed in the domain XML? Didn't they change over time in qemu without libvirt noticing? Why couldn't allocation with memfd be transparently changed the same way?
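(For reference, that earlier suggestion surfaces in the domain XML as an explicit source type, along these lines -- a sketch based on the RFC mentioned in the cover letter:)

  <memoryBacking>
    <source type='memfd'/>
  </memoryBacking>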
Redefining the meaning of 'anonymous' from backend-ram to memfd is fine only if libvirt is able to distinguish old domains with the ram backend from memfd ones (so it could start domains accordingly, i.e. no cross migration).
And memory-backend-file used as anonymous memory (without explicit path etc).
Otherwise we would be creating a time bomb that would explode when 2 independent backends change in an incompatible manner.
If there is such a limitation, qemu should prevent it then. It seems qemu lets you migrate from/to the various hostmem-* (as long as they use the same name, which is the case for -file and -memfd at this point). Why restrict that now?
[...]

On Tue, 11 Sep 2018 17:30:31 +0400
Marc-André Lureau <marcandre.lureau@redhat.com> wrote:

[...]

Then libvirt can't transparently use memfd, and we will go back to my initial suggestion to have a new memory backing source kind in the domain XML named "memfd".

The less magic the better; the only downside is that implementation details of a QEMU backend seep through the abstraction libvirt is supposed to produce for its users, and there is the question of how users are supposed to pick a backend variant for their needs.
Are "ramblock's alignment, padding, guard pages" exposed in domain XML? Didn't they change over time in qemu wtihout libvirt noticing? Why allocation with memfd couldn't be transparently be changed the same way?
Redefining the meaning of 'anonymous' from backend-ram to memfd is fine only if libvirt is able to distinguish old domains with the ram backend from memfd ones (so it could start domains accordingly, i.e. no cross migration).
And memory-backend-file used as anonymous memory (without explicit path etc).
Otherwise we would be creating a time bomb that would explode when 2 independent backends change in an incompatible manner.
If there is such a limitation, qemu should prevent it then. It seems qemu lets you migrate from/to the various hostmem-* (as long as they use the same name, which is the case for -file and -memfd at this point). Why restrict that now?
It works by luck, not by design. Even though qemu doesn't block it, that doesn't mean it's the right thing to do. The rule of thumb with migration is that the CLI on the destination should match the one on the source (i.e. no magical cli replacements). If it doesn't, then the user is to blame.
[...]

* Igor Mammedov (imammedo@redhat.com) wrote:
On Tue, 11 Sep 2018 17:30:31 +0400 Marc-André Lureau <marcandre.lureau@redhat.com> wrote:
Hi
On Tue, Sep 11, 2018 at 5:14 PM, Igor Mammedov <imammedo@redhat.com> wrote:
On Tue, 11 Sep 2018 12:49:12 +0100 "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
* Marc-André Lureau (marcandre.lureau@redhat.com) wrote:
Hi
On Tue, Sep 11, 2018 at 3:26 PM, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
* Marc-André Lureau (marcandre.lureau@redhat.com) wrote: > Hi > > On Tue, Sep 11, 2018 at 2:32 PM, Dr. David Alan Gilbert > <dgilbert@redhat.com> wrote: > > * Marc-André Lureau (marcandre.lureau@redhat.com) wrote: > >> Hi > >> > >> On Tue, Sep 11, 2018 at 12:37 PM, Michal Privoznik <mprivozn@redhat.com> wrote: > >> > On 09/11/2018 12:46 AM, John Ferlan wrote: > >> >> > >> >> On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote: > >> >>> From: Marc-André Lureau <marcandre.lureau@redhat.com> > >> >>> > >> >> > >> >> Would be nice to have a few more words here. If you provide them I can > >> >> add them... The if statement is difficult to read unless you know what > >> >> each field really means. > >> >> > >> >> secondary question - should we document what gets used?, e.g.: > >> >> > >> >> https://libvirt.org/formatdomain.html#elementsMemoryBacking > >> >> > >> >> Seems to me the preference to use memfd is for memory backing using > >> >> anonymous source for nvdimm's without a defined path, but sometimes my > >> >> wording doesn't match reality. > >> > > >> > I don't think we want to tell users what backend are we going to use > >> > under what conditions. Firstly, these conditions will change (as they > >> > did in the past). Secondly, what backend libvirt decides to use is no > >> > business of users. I mean, they care about providing XML that matches > >> > their demands. It's libvirt's job to fulfil them. > >> > > >> > Look at this from the other way: if an user wants to have > >> > memory-backend-file for his domain, how would they enforce it once memfd > >> > is merged? Sure, they can tweak their memoryBacking settings, but that > >> > would work only until we decide to change the decision process for mem > >> > backend. > >> > > >> > What I am more worried about is migration. What happens if I migrate a > >> > hugepages domain from older libvirt to a newer one (the former doesn't > >> > support memfd, the latter does). On the source the domain was started > >> > with memory-backend-file (or memory-backend-ram with -mem-path). And > >> > during migration, the generated cmd line would use memfd. And I don't > >> > think qemu is capable of dealing with this discrepancy, is it? > >> > >> > >> Actually, qemu doesn't care about the hostmem backend kind, it should > >> handle the migration ok. > >> > >> However, there seems to be a bug in qemu, and hostmem backend don't > >> use the right qom object name. > > > > Can you give me the command lines you're using? > > qemu -m 4096 -object memory-backend-ram,id=mem,size=4G -numa > node,memdev=mem -monitor stdio > qemu -m 4096 -object > memory-backend-file,id=mem,size=4G,mem-path=/tmp/foo -numa > node,memdev=mem -monitor stdio > qemu -m 4096 -object memory-backend-memfd,id=mem,size=4G -numa > node,memdev=mem -monitor stdio
There seem to be two different problems (at least); there's that escaping problem where the /'s are shown as \x2f in info qom-tree,
That's not a problem, this is done in memory_region_escape_name()
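A minimal illustrative sketch of this kind of escaping, assuming the behaviour described in the qom-tree output above; this is not the actual memory_region_escape_name() implementation. '/' is not safe inside a single name component, so it gets rewritten as \x2f, which is why "/objects/mem" shows up as "\x2fobjects\x2fmem":

#include <stdio.h>
#include <ctype.h>

/* Illustrative only: escape characters that are unsafe in a name
 * component as \xNN. */
static void escape_name(const char *name, char *out, size_t outlen)
{
    size_t o = 0;

    for (; *name && o + 5 < outlen; name++) {
        if (isalnum((unsigned char)*name) ||
            *name == '-' || *name == '_' || *name == '.')
            out[o++] = *name;
        else
            o += snprintf(out + o, outlen - o, "\\x%02x",
                          (unsigned char)*name);
    }
    out[o] = '\0';
}

int main(void)
{
    char buf[128];

    escape_name("/objects/mem", buf, sizeof(buf));
    printf("%s\n", buf); /* prints \x2fobjects\x2fmem */
    return 0;
}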
but info ramblock, while looking saner, is still showing the difference:
./x86_64-softmmu/qemu-system-x86_64 -m 1024 -object memory-backend-ram,id=mem,size=1G -numa node,memdev=mem -monitor stdio
(qemu) info ramblock
              Block Name    PSize                Offset                 Used                Total
                     mem    4 KiB    0x0000000000000000   0x0000000040000000   0x0000000040000000

./x86_64-softmmu/qemu-system-x86_64 -m 1024 -object memory-backend-file,id=mem,size=1G,mem-path=/tmp/foo -numa node,memdev=mem -monitor stdio
(qemu) info ramblock
              Block Name    PSize                Offset                 Used                Total
            /objects/mem    4 KiB    0x0000000000000000   0x0000000040000000   0x0000000040000000

./x86_64-softmmu/qemu-system-x86_64 -m 1024 -object memory-backend-memfd,id=mem,size=1G -numa node,memdev=mem -monitor stdio
QEMU 3.0.50 monitor - type 'help' for more information
(qemu) info ramblock
              Block Name    PSize                Offset                 Used                Total
            /objects/mem    4 KiB    0x0000000000000000   0x0000000040000000   0x0000000040000000
hostmem-file.c is using object_get_canonical_path to get the RAMBlock name, whereas hostmem-ram.c is using object_get_canonical_path_**component**.
The problem is if we change either of them then again we break migration compatibility.
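To make the divergence concrete, a small hedged sketch (plain C, not the actual QEMU code) of the two naming strategies just described; the helper names are invented for illustration:

#include <stdio.h>
#include <string.h>

/* Sketch only: hostmem-file/-memfd style naming keeps the whole
 * canonical QOM path, hostmem-ram style keeps just the last component. */
static const char *name_from_canonical_path(const char *path)
{
    return path; /* e.g. "/objects/mem" */
}

static const char *name_from_path_component(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path; /* e.g. "mem" */
}

int main(void)
{
    const char *path = "/objects/mem";

    printf("file/memfd backend RAMBlock: %s\n", name_from_canonical_path(path));
    printf("ram backend RAMBlock:        %s\n", name_from_path_component(path));
    return 0;
}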
Yes, that was the object of my question :)
We could wire it to a machine type and/or property, so that memory-backend-ram would use the long name on newer QEMUs with an appropriate flag?
Good idea, I can prepare a patch.
Great; if you add the property to use the long name, then turn that property on in the newer machine type, it should work. A qemu that has the property can then be assumed to do the right thing when set.

The compat properties mechanism is applicable only to device-based objects, and backends are not based on it. So it won't be so easy; one would basically need to re-implement, or even better extend, the compat props mechanism to cover backends.
indeed
However, libvirt will have to learn of this migration issue with older versions; it's probably not worth trying to make more workarounds.
Yeh I'm not sure what your heuristics look like for these choices. But for a VM without this fix you can't convert from backend-ram to memfd.

I wouldn't try to migrate from one backend type to another automatically. If a domain used backend-ram then libvirt should start the target with the same backend (it's not only the RAM block name in the migration stream; it could also involve the ramblock's alignment, padding, guard pages or something else, as these are different backends and each can potentially change its default behavior independently of the other).
Then libvirt can't transparently use memfd, and we will go back to my initial suggestion to have a new memory backing source kind in the domain XML named "memfd".

The less magic the better. The only downside is that implementation details of a QEMU backend seep through the abstraction libvirt is supposed to provide for its users, and there is the question of how users are supposed to pick a backend variant for their needs.
Are "ramblock's alignment, padding, guard pages" exposed in domain XML? Didn't they change over time in qemu wtihout libvirt noticing? Why allocation with memfd couldn't be transparently be changed the same way?
Redefining the meaning of 'anonymous' from backend-ram to memfd is fine only if libvirt is able to distinguish old domains with the ram backend from memfd ones (so it could start domains accordingly, i.e. no cross migration).
And memory-backend-file used as anonymous memory (without explicit path etc).
Otherwise we would be creating a time bomb that would explode when the 2 independent backends change in an incompatible manner.
If there is such a limitation, qemu should prevent it then. It seems qemu lets you migrate from/to the various hostmem-* (as long as they use the same name, which is the case for -file and -memfd at this point). Why restrict that now?
It works by luck, not by design. Even though qemu doesn't block it, that doesn't mean it's the right thing to do. The rule of thumb with migration is that the CLI on the destination should match the one on the source (i.e. no magical CLI replacements). If it's not, then the user is to blame.
The rule isn't actually that strong. We normally allow the backends to change as long as the guest visible parts don't. For example, it's perfectly legal to migrate between a qemu that's got its virtio-blk wired to an NFS disk and a qemu that's got it wired to iSCSI - the guest view in the two cases is the same but the command line is quite different. Similarly for networking you can flip to different tap setups.

So as long as the change:

  a) looks identical to the guest
  b) doesn't have any backend specific migration data

then a migration should work and I'd expect it to work.

Dave
[...]

-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

On Thu, 13 Sep 2018 15:36:38 +0100 "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
* Igor Mammedov (imammedo@redhat.com) wrote:
On Tue, 11 Sep 2018 17:30:31 +0400 Marc-André Lureau <marcandre.lureau@redhat.com> wrote:
Hi
On Tue, Sep 11, 2018 at 5:14 PM, Igor Mammedov <imammedo@redhat.com> wrote:
On Tue, 11 Sep 2018 12:49:12 +0100 "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
* Marc-André Lureau (marcandre.lureau@redhat.com) wrote:
Hi
On Tue, Sep 11, 2018 at 3:26 PM, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
[...]
> There seem to be two different problems (at least); there's that escaping problem where the /'s are shown as \x2f in info qom-tree,
That's not a problem, this is done in memory_region_escape_name()
[...]

> The problem is if we change either of them then again we break migration compatibility.
Yes, that was the object of my question :)
We could wire it to a machine type and/or property, so that memory-backend-ram would use the long name on newer QEMUs with an appropriate flag?
Good idea, I can prepare a patch.
Great; if you add the property to use the long name, then turn that property on in the newer machine type, it should work. A qemu that has the property can then be assumed to do the right thing when set.

The compat properties mechanism is applicable only to device-based objects, and backends are not based on it. So it won't be so easy; one would basically need to re-implement, or even better extend, the compat props mechanism to cover backends.
indeed
However, libvirt will have to learn of this migration issue with older versions; it's probably not worth trying to make more workarounds.
Yeh I'm not sure what your heuristics look like for these choices. But for a VM without this fix you can't convert from backend-ram to memfd.

I wouldn't try to migrate from one backend type to another automatically. If a domain used backend-ram then libvirt should start the target with the same backend (it's not only the RAM block name in the migration stream; it could also involve the ramblock's alignment, padding, guard pages or something else, as these are different backends and each can potentially change its default behavior independently of the other).
Then libvirt can't transparently use memfd, and we will go back to my initial suggestion to have a new memory backing source kind in the domain XML named "memfd".

The less magic the better. The only downside is that implementation details of a QEMU backend seep through the abstraction libvirt is supposed to provide for its users, and there is the question of how users are supposed to pick a backend variant for their needs.
Are "ramblock's alignment, padding, guard pages" exposed in domain XML? Didn't they change over time in qemu wtihout libvirt noticing? Why allocation with memfd couldn't be transparently be changed the same way?
Redefining the meaning of 'anonymous' from backend-ram to memfd is fine only if libvirt is able to distinguish old domains with the ram backend from memfd ones (so it could start domains accordingly, i.e. no cross migration).
And memory-backend-file used as anonymous memory (without explicit path etc).
Otherwise we would be creating a time bomb that would explode when the 2 independent backends change in an incompatible manner.
If there is such a limitation, qemu should prevent it then. It seems qemu lets you migrate from/to the various hostmem-* (as long as they use the same name, which is the case for -file and -memfd at this point). Why restrict that now?
It works by luck, not by design. Even though qemu doesn't block it, that doesn't mean it's the right thing to do. The rule of thumb with migration is that the CLI on the destination should match the one on the source (i.e. no magical CLI replacements). If it's not, then the user is to blame.
The rule isn't actually that strong. We normally allow the backends to change as long as the guest visible parts don't. For example, it's perfectly legal to migrate between a qemu that's got its virtio-blk wired to an NFS disk and a qemu that's got it wired to iSCSI - the guest view in the two cases is the same but the command line is quite different. Similarly for networking you can flip to different tap setups.
So as long as the change: a) looks identical to the guest

Well, in the case of memory-backend, the alignment of the backing storage/memory_region affects the guest ABI (RAM layout), and maybe there are other variables. So someone switching memory backends should ensure that the new backend and its options match the old backend's properties; on the QEMU side we don't have anything to ensure it (hopefully migration would fail due to a different GPA layout, but that's it).
Considering that a backend author typically cares only about his own backend, I wouldn't bet on backend switching not being regressed. Hence I don't really support the idea of magical backend switching and putting extra effort on the QEMU side to make it work and maintain it. It would be more robust to add an additional type of anonymous memory in the domain description, or to let libvirt toggle the meaning/backend of anonymous memory based on the domain machine type/version and host capabilities (has memfd or not / supported hugepage sizes / ...); a sketch of such a capability-gated choice follows after this message. This way old domains will continue using old backends and new ones will use new ones.
b) Doesn't have any backend specific migration data
then a migration should work and I'd expect it to work.
Dave
[...]

-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
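A rough sketch of the capability-gated backend choice Igor suggests above (the type and function names are invented for illustration and are not libvirt API):

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool hasMemfd;        /* e.g. QEMU_CAPS_OBJECT_MEMORY_MEMFD */
    bool hasMemfdHugetlb; /* e.g. QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB */
} Caps;

/* Sketch only: decide the backend for <source type='anonymous'/>. */
static const char *chooseAnonymousBackend(const Caps *caps, bool hugepages)
{
    if (caps->hasMemfd && (!hugepages || caps->hasMemfdHugetlb))
        return "memory-backend-memfd";
    if (hugepages)
        return "memory-backend-file"; /* needs a hugetlbfs mount */
    return "memory-backend-ram";
}

int main(void)
{
    Caps older = { false, false }, newer = { true, true };

    printf("old qemu, hugepages: %s\n", chooseAnonymousBackend(&older, true));
    printf("new qemu, hugepages: %s\n", chooseAnonymousBackend(&newer, true));
    return 0;
}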

From: Marc-André Lureau <marcandre.lureau@redhat.com>

memfd is able to allocate hugepage anonymous memory.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 src/conf/domain_conf.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 86199623cc..696cf6ef18 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -6186,13 +6186,6 @@ virDomainDefMemtuneValidate(const virDomainDef *def)
         return -1;
     }
 
-    if (mem->source == VIR_DOMAIN_MEMORY_SOURCE_ANONYMOUS) {
-        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-                       _("hugepages are not allowed with anonymous "
-                         "memory source"));
-        return -1;
-    }
-
     for (i = 0; i < mem->nhugepages; i++) {
         size_t j;
         ssize_t nextBit;
-- 
2.19.0.rc1

"non-anonymous" On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
memfd is able to allocate hugepage anonymous memory.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 src/conf/domain_conf.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 86199623cc..696cf6ef18 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -6186,13 +6186,6 @@ virDomainDefMemtuneValidate(const virDomainDef *def)
         return -1;
     }
 
-    if (mem->source == VIR_DOMAIN_MEMORY_SOURCE_ANONYMOUS) {
-        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-                       _("hugepages are not allowed with anonymous "
-                         "memory source"));
-        return -1;
-    }
-
I believe we need to move this check into qemu specific code that would then be able to test for QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB. See qemuDomainDefValidateMemory and go from there.

I think this may require 2 patches though... one to move the two checks that I don't think are "mem" specific, and the next to add the "filter" that if the capability exists, then we can support it; otherwise, still fail. "Theoretically speaking" those are qemu specific checks - the nodemask checks done after this would appear to be more generic.

John
     for (i = 0; i < mem->nhugepages; i++) {
         size_t j;
         ssize_t nextBit;

Hi

On Tue, Sep 11, 2018 at 2:56 AM, John Ferlan <jferlan@redhat.com> wrote:
"non-anonymous"
On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
memfd is able to allocate hugepage anonymous memory.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 src/conf/domain_conf.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 86199623cc..696cf6ef18 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -6186,13 +6186,6 @@ virDomainDefMemtuneValidate(const virDomainDef *def)
         return -1;
     }
 
-    if (mem->source == VIR_DOMAIN_MEMORY_SOURCE_ANONYMOUS) {
-        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-                       _("hugepages are not allowed with anonymous "
-                         "memory source"));
-        return -1;
-    }
-
I believe we need to move this check into qemu specific code that would then be able to test for QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB
Would that be what you have in mind?

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 5329899b13..d152466e28 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -3950,10 +3950,19 @@ qemuDomainDefValidateFeatures(const virDomainDef *def,
 
 
 static int
-qemuDomainDefValidateMemory(const virDomainDef *def)
+qemuDomainDefValidateMemory(const virDomainDef *def,
+                            virQEMUCapsPtr qemuCaps)
 {
     const long system_page_size = virGetSystemPageSizeKB();
 
+    if (def->mem.nhugepages != 0 &&
+        def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_ANONYMOUS &&
+        !virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("anonymous memory source with hugepages is not supported"));
+        return -1;
+    }
+
     /* We can't guarantee any other mem.access
      * if no guest NUMA nodes are defined. */
     if (def->mem.nhugepages != 0 &&
@@ -4094,7 +4103,7 @@ qemuDomainDefValidate(const virDomainDef *def,
     if (qemuDomainDefValidateFeatures(def, qemuCaps) < 0)
         goto cleanup;
 
-    if (qemuDomainDefValidateMemory(def) < 0)
+    if (qemuDomainDefValidateMemory(def, qemuCaps) < 0)
         goto cleanup;
 
     ret = 0;
See qemuDomainDefValidateMemory and go from there. I think this may require 2 patches though... One to move the two checks that I don't think are "mem" specific and the next to add the "filter" that if the capability exists, then we can support; otherwise, still fail.
"Theoretically speaking" those are qemu specific checks - the nodemask checks done after this would appear to be more generic.
John
     for (i = 0; i < mem->nhugepages; i++) {
         size_t j;
         ssize_t nextBit;

On 09/11/2018 04:37 AM, Marc-André Lureau wrote:
Hi
On Tue, Sep 11, 2018 at 2:56 AM, John Ferlan <jferlan@redhat.com> wrote:
"non-anonymous"
On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
memfd is able to allocate hugepage anonymous memory.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 src/conf/domain_conf.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 86199623cc..696cf6ef18 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -6186,13 +6186,6 @@ virDomainDefMemtuneValidate(const virDomainDef *def)
         return -1;
     }
 
-    if (mem->source == VIR_DOMAIN_MEMORY_SOURCE_ANONYMOUS) {
-        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-                       _("hugepages are not allowed with anonymous "
-                         "memory source"));
-        return -1;
-    }
-
I believe we need to move this check into qemu specific code that would then be able to test for QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB
Would that be what you have in mind?
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 5329899b13..d152466e28 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -3950,10 +3950,19 @@ qemuDomainDefValidateFeatures(const virDomainDef *def,
 
 
 static int
-qemuDomainDefValidateMemory(const virDomainDef *def)
+qemuDomainDefValidateMemory(const virDomainDef *def,
+                            virQEMUCapsPtr qemuCaps)
 {
     const long system_page_size = virGetSystemPageSizeKB();
 
+    if (def->mem.nhugepages != 0 &&
+        def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_ANONYMOUS &&
+        !virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("anonymous memory source with hugepages is not supported"));
+        return -1;
+    }
+
     /* We can't guarantee any other mem.access
      * if no guest NUMA nodes are defined. */
     if (def->mem.nhugepages != 0 &&
@@ -4094,7 +4103,7 @@ qemuDomainDefValidate(const virDomainDef *def,
     if (qemuDomainDefValidateFeatures(def, qemuCaps) < 0)
         goto cleanup;
 
-    if (qemuDomainDefValidateMemory(def) < 0)
+    if (qemuDomainDefValidateMemory(def, qemuCaps) < 0)
         goto cleanup;
 
     ret = 0;
Yes, more or less... That's the end result, but we have to "get there" first. I'll post the "getting there" first patch and then I can adjust your series from there.

Tks

John
See qemuDomainDefValidateMemory and go from there. I think this may require 2 patches though... One to move the two checks that I don't think are "mem" specific and the next to add the "filter" that if the capability exists, then we can support; otherwise, still fail.
"Theoretically speaking" those are qemu specific checks - the nodemask checks done after this would appear to be more generic.
John
     for (i = 0; i < mem->nhugepages; i++) {
         size_t j;
         ssize_t nextBit;

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Check anonymous memory is backed by memfd if qemu is capable.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 tests/qemuxml2argvdata/memfd-memory-numa.args | 28 +++++++++++++++
 tests/qemuxml2argvdata/memfd-memory-numa.xml  | 36 +++++++++++++++++++
 tests/qemuxml2argvtest.c                      |  5 +++
 3 files changed, 69 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.args
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml

diff --git a/tests/qemuxml2argvdata/memfd-memory-numa.args b/tests/qemuxml2argvdata/memfd-memory-numa.args
new file mode 100644
index 0000000000..b26c476196
--- /dev/null
+++ b/tests/qemuxml2argvdata/memfd-memory-numa.args
@@ -0,0 +1,28 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/home/test \
+USER=test \
+LOGNAME=test \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-x86_64 \
+-name instance-00000092 \
+-S \
+-machine pc-i440fx-wily,accel=kvm,usb=off,dump-guest-core=off \
+-m 14336 \
+-mem-prealloc \
+-smp 20,sockets=1,cores=8,threads=1 \
+-object memory-backend-memfd,id=ram-node0,hugetlb=yes,hugetlbsize=2097152,share=yes,\
+size=15032385536,host-nodes=3,policy=preferred \
+-numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \
+-uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \
+-display none \
+-no-user-config \
+-nodefaults \
+-chardev socket,id=charmonitor,\
+path=/tmp/lib/domain--1-instance-00000092/monitor.sock,server,nowait \
+-mon chardev=charmonitor,id=monitor,mode=control \
+-rtc base=utc \
+-no-shutdown \
+-no-acpi \
+-usb \
+-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
diff --git a/tests/qemuxml2argvdata/memfd-memory-numa.xml b/tests/qemuxml2argvdata/memfd-memory-numa.xml
new file mode 100644
index 0000000000..abe93e8c4b
--- /dev/null
+++ b/tests/qemuxml2argvdata/memfd-memory-numa.xml
@@ -0,0 +1,36 @@
+<domain type='kvm' id='56'>
+  <name>instance-00000092</name>
+  <uuid>126f2720-6f8e-45ab-a886-ec9277079a67</uuid>
+  <memory unit='KiB'>14680064</memory>
+  <currentMemory unit='KiB'>14680064</currentMemory>
+  <memoryBacking>
+    <hugepages>
+      <page size="2" unit="M"/>
+    </hugepages>
+    <source type='anonymous'/>
+    <access mode='shared'/>
+    <allocation mode='immediate'/>
+  </memoryBacking>
+  <numatune>
+    <memnode cellid='0' mode='preferred' nodeset='3'/>
+  </numatune>
+  <vcpu placement='static'>20</vcpu>
+  <os>
+    <type arch='x86_64' machine='pc-i440fx-wily'>hvm</type>
+    <boot dev='hd'/>
+  </os>
+  <cpu>
+    <topology sockets='1' cores='8' threads='1'/>
+    <numa>
+      <cell id='0' cpus='0-7' memory='14680064' unit='KiB'/>
+    </numa>
+  </cpu>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-x86_64</emulator>
+    <memballoon model='virtio'/>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index 35df63b2ac..76008a8d07 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -2928,6 +2928,11 @@ mymain(void)
     DO_TEST("fd-memory-no-numa-topology", QEMU_CAPS_OBJECT_MEMORY_FILE,
             QEMU_CAPS_KVM);
 
+    DO_TEST("memfd-memory-numa",
+            QEMU_CAPS_OBJECT_MEMORY_MEMFD,
+            QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB,
+            QEMU_CAPS_KVM);
+
     DO_TEST("cpu-check-none", QEMU_CAPS_KVM);
     DO_TEST("cpu-check-partial", QEMU_CAPS_KVM);
     DO_TEST("cpu-check-full", QEMU_CAPS_KVM);
-- 
2.19.0.rc1

On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Check anonymous memory is backed by memfd if qemu is capable.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 tests/qemuxml2argvdata/memfd-memory-numa.args | 28 +++++++++++++++
 tests/qemuxml2argvdata/memfd-memory-numa.xml  | 36 +++++++++++++++++++
 tests/qemuxml2argvtest.c                      |  5 +++
 3 files changed, 69 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.args
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml

diff --git a/tests/qemuxml2argvdata/memfd-memory-numa.args b/tests/qemuxml2argvdata/memfd-memory-numa.args
new file mode 100644
index 0000000000..b26c476196
--- /dev/null
+++ b/tests/qemuxml2argvdata/memfd-memory-numa.args
@@ -0,0 +1,28 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/home/test \
+USER=test \
+LOGNAME=test \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-x86_64 \
+-name instance-00000092 \
+-S \
+-machine pc-i440fx-wily,accel=kvm,usb=off,dump-guest-core=off \
+-m 14336 \
+-mem-prealloc \
+-smp 20,sockets=1,cores=8,threads=1 \
+-object memory-backend-memfd,id=ram-node0,hugetlb=yes,hugetlbsize=2097152,share=yes,\
+size=15032385536,host-nodes=3,policy=preferred \
Another syntax-check error here, needed to move the "share=yes," to the subsequent line.
[...]

diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index 35df63b2ac..76008a8d07 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -2928,6 +2928,11 @@ mymain(void)
     DO_TEST("fd-memory-no-numa-topology", QEMU_CAPS_OBJECT_MEMORY_FILE,
             QEMU_CAPS_KVM);
 
+ DO_TEST("memfd-memory-numa", + QEMU_CAPS_OBJECT_MEMORY_MEMFD, + QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB, + QEMU_CAPS_KVM); +
Theoretically, if we have 3.1 capabilities to test against, then this would use a DO_TEST_CAPS_LATEST, while a "pre-3.1" would still be using -ramfd, right? That is, using DO_TEST_CAPS_VER w/ "3.0.0" would generate different results.

I'm conflicted if we should wait for someone to generate the 3.1 caps or not. For whatever reason, when I post them they're not quite right for someone else's tastes...

Let's see if anyone else has strong feelings one way or another.

John
DO_TEST("cpu-check-none", QEMU_CAPS_KVM); DO_TEST("cpu-check-partial", QEMU_CAPS_KVM); DO_TEST("cpu-check-full", QEMU_CAPS_KVM);

Hi

On Tue, Sep 11, 2018 at 2:57 AM, John Ferlan <jferlan@redhat.com> wrote:
On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Check anonymous memory is backed by memfd if qemu is capable.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 tests/qemuxml2argvdata/memfd-memory-numa.args | 28 +++++++++++++++
 tests/qemuxml2argvdata/memfd-memory-numa.xml  | 36 +++++++++++++++++++
 tests/qemuxml2argvtest.c                      |  5 +++
 3 files changed, 69 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.args
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml

diff --git a/tests/qemuxml2argvdata/memfd-memory-numa.args b/tests/qemuxml2argvdata/memfd-memory-numa.args
new file mode 100644
index 0000000000..b26c476196
--- /dev/null
+++ b/tests/qemuxml2argvdata/memfd-memory-numa.args
@@ -0,0 +1,28 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/home/test \
+USER=test \
+LOGNAME=test \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-x86_64 \
+-name instance-00000092 \
+-S \
+-machine pc-i440fx-wily,accel=kvm,usb=off,dump-guest-core=off \
+-m 14336 \
+-mem-prealloc \
+-smp 20,sockets=1,cores=8,threads=1 \
+-object memory-backend-memfd,id=ram-node0,hugetlb=yes,hugetlbsize=2097152,share=yes,\
+size=15032385536,host-nodes=3,policy=preferred \
Another syntax-check error here, needed to move the "share=yes," to the subsequent line.
ok
[...]

diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index 35df63b2ac..76008a8d07 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -2928,6 +2928,11 @@ mymain(void)
     DO_TEST("fd-memory-no-numa-topology", QEMU_CAPS_OBJECT_MEMORY_FILE,
             QEMU_CAPS_KVM);
 
+ DO_TEST("memfd-memory-numa", + QEMU_CAPS_OBJECT_MEMORY_MEMFD, + QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB, + QEMU_CAPS_KVM); +
Theoretically, if we have 3.1 capabilities to test against, then this would use a DO_TEST_CAPS_LATEST, while a "pre-3.1" would still be using -ramfd, right? That is, using DO_TEST_CAPS_VER w/ "3.0.0" would generate different results.
I'm conflicted if we should wait for someone to generate the 3.1 caps or not. For whatever reason, when I post them they're not quite right for someone else's tastes...
Let's see if anyone else has strong feelings one way or another.
-memfd is available since 2.12. After patch 1 & 2 are applied, we should probably switch to use DO_TEST_CAPS_LATEST.

Before 2.12 (or if the capabilities are not exposed by the host qemu) the argv will use -file. This is already covered by existing tests, like hugepages-shared.

thanks
John
DO_TEST("cpu-check-none", QEMU_CAPS_KVM); DO_TEST("cpu-check-partial", QEMU_CAPS_KVM); DO_TEST("cpu-check-full", QEMU_CAPS_KVM);

[...]
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index 35df63b2ac..76008a8d07 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -2928,6 +2928,11 @@ mymain(void)
     DO_TEST("fd-memory-no-numa-topology", QEMU_CAPS_OBJECT_MEMORY_FILE,
             QEMU_CAPS_KVM);
 
+    DO_TEST("memfd-memory-numa",
+            QEMU_CAPS_OBJECT_MEMORY_MEMFD,
+            QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB,
+            QEMU_CAPS_KVM);
+
Theoretically, if we have 3.1 capabilities to test against, then this would use a DO_TEST_CAPS_LATEST, while a "pre-3.1" would still be using -ramfd, right? That is, using DO_TEST_CAPS_VER w/ "3.0.0" would generate different results.
I'm conflicted if we should wait for someone to generate the 3.1 caps or not. For whatever reason, when I post them they're not quite right for someone else's tastes...
Let's see if anyone else has strong feelings one way or another.
-memfd is available since 2.12. After patch 1 & 2 are applied, we should probably switch to use DO_TEST_CAPS_LATEST.
Theoretically patches 3, 4, and 5 could be one patch, but having them separate also works well for review purposes!

While MEMFD is there, it is the HUGETLB capability and the comment in patch 2 about QEMU 3.1 that I was concerned with, especially since 2.12 and 3.0 find the value... Looking at the QEMU sources, I see you added the field in commit dbb9e0f40, which is 2.12 based.

Still reading deeper into the comments in patch 2, it just seems that @hugetlbsize has some sort of run-time issue that gets fixed by 3.1. It's harder for libvirt to detect that an issue exists unless something was added in 3.1 that libvirt could test on for a capability. I'm not sure what the issue is, but maybe that's something document-able, at least with respect to what values are provided in the XML for memoryBacking.

John
Before 2.12 (or if the capabilities are not exposed by the host qemu) the argv will use -file. This is already covered by existing tests, like hugepages-shared.
thanks
John
DO_TEST("cpu-check-none", QEMU_CAPS_KVM); DO_TEST("cpu-check-partial", QEMU_CAPS_KVM); DO_TEST("cpu-check-full", QEMU_CAPS_KVM);

Hi

On Tue, Sep 11, 2018 at 5:21 PM, John Ferlan <jferlan@redhat.com> wrote:
[...]
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index 35df63b2ac..76008a8d07 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -2928,6 +2928,11 @@ mymain(void)
     DO_TEST("fd-memory-no-numa-topology", QEMU_CAPS_OBJECT_MEMORY_FILE,
             QEMU_CAPS_KVM);
 
+    DO_TEST("memfd-memory-numa",
+            QEMU_CAPS_OBJECT_MEMORY_MEMFD,
+            QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB,
+            QEMU_CAPS_KVM);
+
Theoretically, if we have 3.1 capabilities to test against, then this would use a DO_TEST_CAPS_LATEST, while a "pre-3.1" would still be using -ramfd, right? That is, using DO_TEST_CAPS_VER w/ "3.0.0" would generate different results.
I'm conflicted if we should wait for someone to generate the 3.1 caps or not. For whatever reason, when I post them they're not quite right for someone else's tastes...
Let's see if anyone else has strong feelings one way or another.
-memfd is available since 2.12. After patch 1 & 2 are applied, we should probably switch to use DO_TEST_CAPS_LATEST.
Theoretically patches 3, 4, and 5 could be one patch, but having them separate also works well for review purposes!
While MEMFD is there, it is the HUGETLB capability and the comment in patch 2 about QEMU 3.1 that I was concerned with, especially since 2.12 and 3.0 find the value...
Looking at the QEMU sources, I see you added the field in commit dbb9e0f40, which is 2.12 based.
It's added in 2.12:

  $ git describe --contains --match=v2* dbb9e0f40
  v2.12.0-rc0~107^2~8

However, only with the upcoming patch for 3.1 (queued by Paolo today) will the hugetlb properties be run-time checked/exposed.
Still reading deeper into the comments in patch 2, it just seems that @hugetlbsize has some sort of run-time issue that gets fixed by 3.1. It's harder
It's not an issue, but it will help libvirt to figure out before starting qemu if anonymous memfd hugetlb is supported.
for libvirt to detect that an issue exists unless something was added in 3.1 that libvirt could test on for a capability. I'm not sure what the issue is, but maybe that's something document-able at least with respect to what values are provided in the XML for memoryBacking.
If you request anonymous memory & hugetlb today, you get a libvirt error. With the series, if the host/qemu doesn't support it, you will get an error.

https://libvirt.org/formatdomain.html#elementsMemoryBacking

There is no documentation about the file memory backing requirement today (it seems). We could explain it and add that a memfd-hugetlb-capable qemu doesn't need it (when there is no numa assignment). Is this what you are asking?
John
Before 2.12 (or if the capabilities are not exposed by the host qemu) the argv will use -file. This is already covered by existing tests, like hugepages-shared.
thanks
John
DO_TEST("cpu-check-none", QEMU_CAPS_KVM); DO_TEST("cpu-check-partial", QEMU_CAPS_KVM); DO_TEST("cpu-check-full", QEMU_CAPS_KVM);

On 09/11/2018 09:45 AM, Marc-André Lureau wrote:
Hi
On Tue, Sep 11, 2018 at 5:21 PM, John Ferlan <jferlan@redhat.com> wrote:
[...]
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index 35df63b2ac..76008a8d07 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -2928,6 +2928,11 @@ mymain(void)
     DO_TEST("fd-memory-no-numa-topology", QEMU_CAPS_OBJECT_MEMORY_FILE,
             QEMU_CAPS_KVM);
 
+    DO_TEST("memfd-memory-numa",
+            QEMU_CAPS_OBJECT_MEMORY_MEMFD,
+            QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB,
+            QEMU_CAPS_KVM);
+
Theoretically, if we have 3.1 capabilities to test against, then this would use a DO_TEST_CAPS_LATEST, while a "pre-3.1" would still be using -ramfd, right? That is, using DO_TEST_CAPS_VER w/ "3.0.0" would generate different results.
I'm conflicted if we should wait for someone to generate the 3.1 caps or not. For whatever reason, when I post them they're not quite right for someone else's tastes...
Let's see if anyone else has strong feelings one way or another.
-memfd is available since 2.12. After patch 1 & 2 are applied, we should probably switch to use DO_TEST_CAPS_LATEST.
Theoretically patches 3, 4, and 5 could be one patch, but having separate also works well for review purposes!
While MEMFD is there, it is the HUGETLB capability and the comment in patch 2 about QEMU 3.1 that I was concerned with, especially since 2.12 and 3.0 find the value...
Looking at the QEMU sources, I see you added the field in commit dbb9e0f40, which is 2.12 based.
It's added in 2.12:

  $ git describe --contains --match=v2* dbb9e0f40
  v2.12.0-rc0~107^2~8
However, only with upcoming patch for 3.1 (queued by Paolo today) will the hugetlb properties be run-time checked/exposed.
Still reading deeper into the comments in patch 2, it just seems that @hugetlbsize has some sort of run-time issue that gets fixed by 3.1. It's harder
It's not an issue, but it will help libvirt to figure out before starting qemu if anonymous memfd hugetlb is supported.
for libvirt to detect that an issue exists unless something was added in 3.1 that libvirt could test on for a capability. I'm not sure what the issue is, but maybe that's something document-able at least with respect to what values are provided in the XML for memoryBacking.
If you request anonymous memory & hugetlb today, you get a libvirt error. With the series, if the host/qemu doesn't support it, you will get an error.
Now I'm getting more confused. With this patch series applied, but without the 3.1 changes, if anonymous memfd hugetlb is used, will there be a run time issue?

IOW: Does it really only work in 3.1? If so, then we need to figure out a mechanism for determining that, as there's no reason to "default to" -memfd then for 2.12 and 3.0, right?
https://libvirt.org/formatdomain.html#elementsMemoryBacking
There is no documentation about the file memory backing requirement today (it seems). We could explain it and add that a memfd-hugetlb-capable qemu doesn't need it (when there is no numa assignment). Is this what you are asking?
Essentially - I'm sure we'd have to carefully word things to take into account Michal's position that we don't want to describe the conditions related to what backend is being used "by default" and for "which version". Still, I think whether to document or not is related to what the hugetlb problem is. Tough to say "don't use this unless you have qemu 3.1 installed" even though it's supported back to 2.12. I don't even want to think about describing the migration discussion...

John
John
Before 2.12 (or if the capabilities are not exposed by the host qemu) the argv will use -file. This is already covered by existing tests, like hugepages-shared.
thanks
John
DO_TEST("cpu-check-none", QEMU_CAPS_KVM); DO_TEST("cpu-check-partial", QEMU_CAPS_KVM); DO_TEST("cpu-check-full", QEMU_CAPS_KVM);

Hi

On Tue, Sep 11, 2018 at 7:39 PM, John Ferlan <jferlan@redhat.com> wrote:
On 09/11/2018 09:45 AM, Marc-André Lureau wrote:
Hi
On Tue, Sep 11, 2018 at 5:21 PM, John Ferlan <jferlan@redhat.com> wrote:
[...]
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index 35df63b2ac..76008a8d07 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -2928,6 +2928,11 @@ mymain(void)
     DO_TEST("fd-memory-no-numa-topology", QEMU_CAPS_OBJECT_MEMORY_FILE,
             QEMU_CAPS_KVM);
 
+    DO_TEST("memfd-memory-numa",
+            QEMU_CAPS_OBJECT_MEMORY_MEMFD,
+            QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB,
+            QEMU_CAPS_KVM);
+
Theoretically, if we have 3.1 capabilities to test against, then this would use a DO_TEST_CAPS_LATEST, while a "pre-3.1" would still be using -ramfd, right? That is, using DO_TEST_CAPS_VER w/ "3.0.0" would generate different results.
I'm conflicted if we should wait for someone to generate the 3.1 caps or not. For whatever reason, when I post them they're not quite right for someone else's tastes...
Let's see if anyone else has strong feelings one way or another.
-memfd is available since 2.12. After patch 1 & 2 are applied, we should probably switch to use DO_TEST_CAPS_LATEST.
Theoretically patches 3, 4, and 5 could be one patch, but having separate also works well for review purposes!
While MEMFD is there, it is the HUGETLB capability and the comment in patch 2 about QEMU 3.1 that I was concerned with, especially since 2.12 and 3.0 find the value...
Looking at the QEMU sources, I see you added the field in commit dbb9e0f40, which is 2.12 based.
It's added in 2.12:

  $ git describe --contains --match=v2* dbb9e0f40
  v2.12.0-rc0~107^2~8
However, only with upcoming patch for 3.1 (queued by Paolo today) will the hugetlb properties be run-time checked/exposed.
Still reading deeper into the comments in patch 2, it just seems that @hugetlbsize has some sort of run-time issue that gets fixed by 3.1. It's harder
It's not an issue, but it will help libvirt to figure out before starting qemu if anonymous memfd hugetlb is supported.
for libvirt to detect that an issue exists unless something was added in 3.1 that libvirt could test on for a capability. I'm not sure what the issue is, but maybe that's something document-able at least with respect to what values are provided in the XML for memoryBacking.
If you request anonymous memory & hugetlb today, you get a libvirt error. With the series, if the host/qemu doesn't support it, you will get an error.
Now I'm getting more confused. With this patch series applied, but without the 3.1 changes, if anonymous memfd hugetlb is used, will there be a run time issue?
IOW: Does it really only work in 3.1? If so, then we need to figure out a mechanism for determining that as there's no reason to "default to" -memfd then for 2.12 and 3.0, right?
No, it will work with 2.12, 3.0 or 3.1 as long as the host is capable. What qemu will do in 3.1 is probe the host a bit to check if hugetlb-memfd is supported. In all cases, hugetlb allocation (or allocation in general) can still fail at run time due to an unsatisfiable request (limits, page size etc.).
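A hedged sketch of the kind of host probe described here (illustrative, not the actual QEMU probe code): try to create a hugetlb-backed memfd and see whether the kernel accepts it. This assumes a Linux host with memfd_create() exposed by libc:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

#ifndef MFD_HUGETLB
#define MFD_HUGETLB 0x0004U /* in case the libc headers are too old */
#endif

int main(void)
{
    int fd = memfd_create("probe", MFD_CLOEXEC | MFD_HUGETLB);

    if (fd < 0) {
        printf("hugetlb memfd not supported: %s\n", strerror(errno));
        return 1;
    }
    printf("hugetlb memfd supported\n");
    close(fd);
    return 0;
}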
https://libvirt.org/formatdomain.html#elementsMemoryBacking
There is no documentation about the file memory backing requirement today (it seems). We could explain it and add that a memfd-hugetlb-capable doesn't need it (when there is no numa assignment). Is this what you are asking?
Essentially - I'm sure we'd have to carefully word things to take into account Michal's position that we don't want to describe the conditions related to what backend is being used "by default" and for "which version". Still, I think whether to document or not is related to what the hugetlb problem is. Tough to say "don't use this unless you have qemu 3.1 installed" even though it's supported back to 2.12. I don't even want to think about describing the migration discussion...
ok, I think it's not worth documenting at this point if we want and can make things transparent to the user.
John
John
Before 2.12 (or if the capabilities are not exposed by the host qemu) the argv will use -file. This is already covered by existing tests, like hugepages-shared.
thanks
John
DO_TEST("cpu-check-none", QEMU_CAPS_KVM); DO_TEST("cpu-check-partial", QEMU_CAPS_KVM); DO_TEST("cpu-check-full", QEMU_CAPS_KVM);

On 09/11/2018 04:48 AM, Marc-André Lureau wrote:
Hi
On Tue, Sep 11, 2018 at 2:57 AM, John Ferlan <jferlan@redhat.com> wrote:
On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Check anonymous memory is backed by memfd if qemu is capable.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 tests/qemuxml2argvdata/memfd-memory-numa.args | 28 +++++++++++++++
 tests/qemuxml2argvdata/memfd-memory-numa.xml  | 36 +++++++++++++++++++
 tests/qemuxml2argvtest.c                      |  5 +++
 3 files changed, 69 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.args
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml

diff --git a/tests/qemuxml2argvdata/memfd-memory-numa.args b/tests/qemuxml2argvdata/memfd-memory-numa.args
new file mode 100644
index 0000000000..b26c476196
--- /dev/null
+++ b/tests/qemuxml2argvdata/memfd-memory-numa.args
@@ -0,0 +1,28 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/home/test \
+USER=test \
+LOGNAME=test \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-x86_64 \
+-name instance-00000092 \
+-S \
+-machine pc-i440fx-wily,accel=kvm,usb=off,dump-guest-core=off \
+-m 14336 \
+-mem-prealloc \
+-smp 20,sockets=1,cores=8,threads=1 \
+-object memory-backend-memfd,id=ram-node0,hugetlb=yes,hugetlbsize=2097152,share=yes,\
+size=15032385536,host-nodes=3,policy=preferred \
Another syntax-check error here, needed to move the "share=yes," to the subsequent line.
ok
[...]

diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index 35df63b2ac..76008a8d07 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -2928,6 +2928,11 @@ mymain(void)
     DO_TEST("fd-memory-no-numa-topology", QEMU_CAPS_OBJECT_MEMORY_FILE,
             QEMU_CAPS_KVM);
 
+ DO_TEST("memfd-memory-numa", + QEMU_CAPS_OBJECT_MEMORY_MEMFD, + QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB, + QEMU_CAPS_KVM); +
Theoretically, if we have 3.1 capabilities to test against, then this would use a DO_TEST_CAPS_LATEST, while a "pre-3.1" would still be using -ramfd, right? That is, using DO_TEST_CAPS_VER w/ "3.0.0" would generate different results.
I'm conflicted if we should wait for someone to generate the 3.1 caps or not. For whatever reason, when I post them they're not quite right for someone else's tastes...
Let's see if anyone else has strong feelings one way or another.
-memfd is available since 2.12. After patch 1 & 2 are applied, we should probably switch to use DO_TEST_CAPS_LATEST.
hrmph - tried using CAPS_LATEST, and got the error

  "CPU topology doesn't match maximum vcpu count"

well *that's* helpful /-|... The only libvirt test that cares about it currently is cpu-hotplug-startup and yes, the maxvcpus matches the cpu topology calculation...

So, as long as I change the vcpu count from 20 to 8, rename tests/qemuxml2argvdata/memfd-memory-numa.args to memfd-memory-numa.x86_64-latest.args, and regenerate the output to:

LC_ALL=C \
PATH=/bin \
HOME=/home/test \
USER=test \
LOGNAME=test \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-x86_64 \
-name guest=instance-00000092,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,\
file=/tmp/lib/domain--1-instance-00000092/master-key.aes \
-machine pc-i440fx-wily,accel=kvm,usb=off,dump-guest-core=off \
-m 14336 \
-mem-prealloc \
-realtime mlock=off \
-smp 8,sockets=1,cores=8,threads=1 \
-object memory-backend-memfd,id=ram-node0,hugetlb=yes,hugetlbsize=2097152,\
share=yes,size=15032385536,host-nodes=3,policy=preferred \
-numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \
-uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=1729,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-no-acpi \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x2 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,\
resourcecontrol=deny \
-msg timestamp=on

then the test is happy. The memory-backend-memfd object doesn't change.

So all that's "left":

1. "Add" a check in qemuDomainABIStabilityCheck to ensure we're not changing from memory-backend-ram to memory-backend-memfd. We already check that "(src->mem.source != dst->mem.source)" - so we know we're already anonymous or not. Any suggestions? If source is anonymous, then what? I think we can use the qemuDomainObjPrivatePtr in some way to determine that we were started with -memfd (or not started that way).

2. Get the patches I posted today to cleanup/move the memory backing checks from domain_conf to qemu_domain: https://www.redhat.com/archives/libvir-list/2018-September/msg00463.html reviewed and pushed so that patch4 can use the qemu_domain API to alter its hugepages check.

John
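A hypothetical sketch of the check outlined in item 1 above (qemuDomainABIStabilityCheck itself exists in libvirt, but every type and field below is invented for illustration): record the backend the domain was actually started with, and refuse a migration that would silently change it.

#include <stdbool.h>

/* Hypothetical illustration only; these types and fields are invented. */
typedef enum {
    MEM_BACKEND_NONE = 0,
    MEM_BACKEND_RAM,
    MEM_BACKEND_FILE,
    MEM_BACKEND_MEMFD,
} memBackendType;

typedef struct {
    memBackendType memBackend; /* recorded when the domain was started */
} domainPrivateData;

static bool
memoryBackendIsABIStable(const domainPrivateData *src,
                         const domainPrivateData *dst)
{
    /* The same anonymous <source/> can hide different backends; only
     * allow migration when the actually-used backend matches. */
    return src->memBackend == dst->memBackend;
}

int main(void)
{
    domainPrivateData src = { MEM_BACKEND_RAM };
    domainPrivateData dst = { MEM_BACKEND_MEMFD };

    return memoryBackendIsABIStable(&src, &dst) ? 0 : 1;
}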
Before 2.12 (or if the capabilities are not exposed by the host qemu) the argv will use -file. This is already covered by existing tests, like hugepages-shared.
thanks
John
DO_TEST("cpu-check-none", QEMU_CAPS_KVM); DO_TEST("cpu-check-partial", QEMU_CAPS_KVM); DO_TEST("cpu-check-full", QEMU_CAPS_KVM);

Hi

On Wed, Sep 12, 2018 at 4:01 AM, John Ferlan <jferlan@redhat.com> wrote:
On 09/11/2018 04:48 AM, Marc-André Lureau wrote:
Hi
On Tue, Sep 11, 2018 at 2:57 AM, John Ferlan <jferlan@redhat.com> wrote:
On 09/07/2018 07:32 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Check anonymous memory is backed by memfd if qemu is capable.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 tests/qemuxml2argvdata/memfd-memory-numa.args | 28 +++++++++++++++
 tests/qemuxml2argvdata/memfd-memory-numa.xml  | 36 +++++++++++++++++++
 tests/qemuxml2argvtest.c                      |  5 +++
 3 files changed, 69 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.args
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml

diff --git a/tests/qemuxml2argvdata/memfd-memory-numa.args b/tests/qemuxml2argvdata/memfd-memory-numa.args
new file mode 100644
index 0000000000..b26c476196
--- /dev/null
+++ b/tests/qemuxml2argvdata/memfd-memory-numa.args
@@ -0,0 +1,28 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/home/test \
+USER=test \
+LOGNAME=test \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-x86_64 \
+-name instance-00000092 \
+-S \
+-machine pc-i440fx-wily,accel=kvm,usb=off,dump-guest-core=off \
+-m 14336 \
+-mem-prealloc \
+-smp 20,sockets=1,cores=8,threads=1 \
+-object memory-backend-memfd,id=ram-node0,hugetlb=yes,hugetlbsize=2097152,share=yes,\
+size=15032385536,host-nodes=3,policy=preferred \
Another syntax-check error here, needed to move the "share=yes," to the subsequent line.
ok
+-numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \
+-uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \
+-display none \
+-no-user-config \
+-nodefaults \
+-chardev socket,id=charmonitor,\
+path=/tmp/lib/domain--1-instance-00000092/monitor.sock,server,nowait \
+-mon chardev=charmonitor,id=monitor,mode=control \
+-rtc base=utc \
+-no-shutdown \
+-no-acpi \
+-usb \
+-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
diff --git a/tests/qemuxml2argvdata/memfd-memory-numa.xml b/tests/qemuxml2argvdata/memfd-memory-numa.xml
new file mode 100644
index 0000000000..abe93e8c4b
--- /dev/null
+++ b/tests/qemuxml2argvdata/memfd-memory-numa.xml
@@ -0,0 +1,36 @@
+<domain type='kvm' id='56'>
+  <name>instance-00000092</name>
+  <uuid>126f2720-6f8e-45ab-a886-ec9277079a67</uuid>
+  <memory unit='KiB'>14680064</memory>
+  <currentMemory unit='KiB'>14680064</currentMemory>
+  <memoryBacking>
+    <hugepages>
+      <page size="2" unit="M"/>
+    </hugepages>
+    <source type='anonymous'/>
+    <access mode='shared'/>
+    <allocation mode='immediate'/>
+  </memoryBacking>
+  <numatune>
+    <memnode cellid='0' mode='preferred' nodeset='3'/>
+  </numatune>
+  <vcpu placement='static'>20</vcpu>
+  <os>
+    <type arch='x86_64' machine='pc-i440fx-wily'>hvm</type>
+    <boot dev='hd'/>
+  </os>
+  <cpu>
+    <topology sockets='1' cores='8' threads='1'/>
+    <numa>
+      <cell id='0' cpus='0-7' memory='14680064' unit='KiB'/>
+    </numa>
+  </cpu>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-x86_64</emulator>
+    <memballoon model='virtio'/>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index 35df63b2ac..76008a8d07 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -2928,6 +2928,11 @@ mymain(void)
     DO_TEST("fd-memory-no-numa-topology", QEMU_CAPS_OBJECT_MEMORY_FILE,
             QEMU_CAPS_KVM);
+ DO_TEST("memfd-memory-numa", + QEMU_CAPS_OBJECT_MEMORY_MEMFD, + QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB, + QEMU_CAPS_KVM); +
Theoretically, if we have 3.1 capabilities to test against, then this would use a DO_TEST_CAPS_LATEST, while a "pre-3.1" would still be using -ram, right? That is, using DO_TEST_CAPS_VER w/ "3.0.0" would generate different results.
I'm conflicted about whether we should wait for someone to generate the 3.1 caps or not. For whatever reason, when I post them they're not quite right for someone else's tastes...
Let's see if anyone else has strong feelings one way or another.
-memfd is available since 2.12. After patches 1 & 2 are applied, we should probably switch to using DO_TEST_CAPS_LATEST.
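A sketch of what that switch could look like in qemuxml2argvtest.c (assuming the latest caps data carries the memfd flags; DO_TEST_CAPS_LATEST replays the newest tests/qemucapabilitiesdata files instead of a hand-picked QEMU_CAPS_* list):

/* replaces the hand-rolled DO_TEST(...) above */
DO_TEST_CAPS_LATEST("memfd-memory-numa");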
hrmph - tried using CAPS_LATEST, and got the error
"CPU topology doesn't match maximum vcpu count"
well *that's* helpful /-|...
The only libvirt test that cares about it currently is cpu-hotplug-startup and yes, the maxvcpus matches the cpu topology calculation...
So, as long as I change vcpu count from 20 to 8, rename the tests/qemuxml2argvdata/memfd-memory-numa.args to memfd-memory-numa.x86_64-latest.args, and regenerate the output to:
LC_ALL=C \
PATH=/bin \
HOME=/home/test \
USER=test \
LOGNAME=test \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-x86_64 \
-name guest=instance-00000092,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,\
file=/tmp/lib/domain--1-instance-00000092/master-key.aes \
-machine pc-i440fx-wily,accel=kvm,usb=off,dump-guest-core=off \
-m 14336 \
-mem-prealloc \
-realtime mlock=off \
-smp 8,sockets=1,cores=8,threads=1 \
-object memory-backend-memfd,id=ram-node0,hugetlb=yes,hugetlbsize=2097152,\
share=yes,size=15032385536,host-nodes=3,policy=preferred \
-numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \
-uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=1729,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-no-acpi \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x2 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,\
resourcecontrol=deny \
-msg timestamp=on
Then, the test is happy. The memory-backend-memfd object doesn't change.
ok
So all that's "left":
1. "Add" a check in qemuDomainABIStabilityCheck to ensure we're not changing from memory-backend-ram to memory-backend-memfd. We already check that "(src->mem.source != dst->mem.source)" - so we know we're already anonymous or not.
Any suggestions? If source is anonymous, then what? I think we can use the qemuDomainObjPrivatePtr in some way to determine that we were started with -memfd (or not started that way).
No idea how we could save that information across various restarts / version changes.

Tbh, I would try to migrate, and let qemu fail if something is incompatible (such as incompatible memory backends or memory region name mismatch). See also my qemu series "[PATCH 0/9] hostmem-ram: use whole path for region name with >= 3.1". It feels like libvirt duplicates some qemu logic/error otherwise.
2. Get the patches I posted today to cleanup/move the memory backing checks from domain_conf to qemu_domain:
https://www.redhat.com/archives/libvir-list/2018-September/msg00463.html
reviewed and pushed so that patch 4 can use the qemu_domain API to alter its hugepages check.
done

feel free to update & resend my series, or else I will rebase and resend it

thanks
John
Before 2.12 (or if the capabilities are not exposed by the host qemu) the argv will use -file. This is already covered by existing tests, like hugepages-shared.
thanks
John
DO_TEST("cpu-check-none", QEMU_CAPS_KVM); DO_TEST("cpu-check-partial", QEMU_CAPS_KVM); DO_TEST("cpu-check-full", QEMU_CAPS_KVM);

[...]
So all that's "left":
1. "Add" a check in qemuDomainABIStabilityCheck to ensure we're not changing from memory-backend-ram to memory-backend-memfd. We already check that "(src->mem.source != dst->mem.source)" - so we know we're already anonymous or not.
Any suggestions? If source is anonymous, then what? I think we can use the qemuDomainObjPrivatePtr in some way to determine that we were started with -memfd (or not started that way).
No idea how we could save that information across various restarts / version changes.
I think it'd be ugly... I think migration cookies would have to be used... I considered other mechanisms, but each wouldn't quite work. Without writing the code, if we cared to do this, then we'd have:

1. Add a field to qemuDomainObjPrivatePtr that indicates what got started (none, memfd, file, or ram). Add a typedef enum that has unknown, none, memfd, file, and ram. Add the Parse/Format code to handle the field.

2. Modify the qemu_command code to set the field in priv based on what got started, if something got started. The value would be > 0...

3. Mess with the migration cookie logic to add checks for what the source started. On the destination side of that cookie, if we had the "right capabilities", then check the source cookie to see what it has. If it didn't have that field, then I think one could assume the source with anonymous memory backing would be using -ram. We'd already fail the src/dst mem.source check if one used -file. I'm not all that versed in the cookies, but I think that'd work "logically thinking" at least. The devil would be in the details.

Assuming your 3.1 patches do something to handle the condition, I guess it comes down to how much of a problem it's believed this could be in 2.12 and 3.0 if someone is running -ram and migrates to a host that would default to -memfd.
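To make step 1 concrete, a rough sketch of the enum and private field - every name below is invented for illustration, none of this is from the posted series:

typedef enum {
    QEMU_DOMAIN_MEMORY_BACKEND_UNKNOWN = 0, /* field absent: pre-existing domain */
    QEMU_DOMAIN_MEMORY_BACKEND_NONE,        /* no memory-backend-* object built */
    QEMU_DOMAIN_MEMORY_BACKEND_MEMFD,       /* memory-backend-memfd */
    QEMU_DOMAIN_MEMORY_BACKEND_FILE,        /* memory-backend-file */
    QEMU_DOMAIN_MEMORY_BACKEND_RAM,         /* memory-backend-ram */
} qemuDomainMemoryBackend;

/* plus a new member in struct _qemuDomainObjPrivate, e.g.
 *     qemuDomainMemoryBackend memBackend;
 * set from qemu_command once the backend object is built (step 2), and
 * written/read by the private-data XML Format/Parse callbacks. */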
Tbh, I would try to migrate, and let qemu fail if something is incompatible (such as incompatible memory backends or memory region name mismatch). See also my qemu series "[PATCH 0/9] hostmem-ram: use whole path for region name with >= 3.1". It feels like libvirt duplicates some qemu logic/error otherwise.
I'm sure there's lots of duplication, but generally doing the checks in libvirt allows for a bit "easier" (at least in terms of libvirt) backout logic. Once the qemu process starts - if the process eventually dies because of something, then the logging only goes to libvirt log files. If the process fails to start, libvirt does capture and give that information back to the consumer. So call it preventative duplication. I think historically some qemu error messages have been a bit too vague to figure out why something didn't work.
2. Get the patches I posted today to cleanup/move the memory backing checks from domain_conf to qemu_domain:
https://www.redhat.com/archives/libvir-list/2018-September/msg00463.html
reviewed and pushed so that patch 4 can use the qemu_domain API to alter its hugepages check.
done
Thanks - I pushed that...
feel free to update & resend my series, or else I will rebase and resend it
thanks
OK - I adjusted your changes to handle the previously agreed upon "issues" and was ready to push the series when it dawned on me that the MEMFD and MEMFD_HUGETLB capabilities both use the 2.12 release - so realistically would the latter really be necessary?

Again if something doesn't quite work in 2.12 and 3.0 for hugetlb, then perhaps there's something in 3.1 that can be checked.

I can remove or keep patch 2. If removed, then just use MEMFD as the basis. Your call.

John

Hi

On Thu, Sep 13, 2018 at 2:25 AM, John Ferlan <jferlan@redhat.com> wrote:
[...]
So all that's "left":
1. "Add" a check in qemuDomainABIStabilityCheck to ensure we're not changing from memory-backend-ram to memory-backend-memfd. We already check that "(src->mem.source != dst->mem.source)" - so we know we're already anonymous or not.
Any suggestions? If source is anonymous, then what? I think we can use the qemuDomainObjPrivatePtr in some way to determine that we were started with -memfd (or not started that way).
No idea how we could save that information across various restarts / version changes.
I think it'd be ugly... I think migration cookies would have to be used... I considered other mechanisms, but each wouldn't quite work. Without writing the code, if we cared to do this, then we'd have:
1. Add a field to qemuDomainObjPrivatePtr that indicates what got started (none, memfd, file, or ram). Add a typedef enum that has unknown, none, memfd, file, and ram. Add the Parse/Format code to handle the field.
2. Modify the qemu_command code to set the field in priv based on what got started, if something got started. The value would be > 0...
3. Mess with the migration cookie logic to add checks for what the source started. On the destination side of that cookie, if we had the "right capabilities", then check the source cookie to see what it has. If it didn't have that field, then I think one could assume the source with anonymous memory backing would be using -ram. We'd already fail the src/dst mem.source check if one used -file. I'm not all that versed in the cookies, but I think that'd work "logically thinking" at least. The devil would be in the details.
Assuming your 3.1 patches do something to handle the condition, I guess it comes down to how much of a problem it's believed this could be in 2.12 and 3.0 if someone is running -ram and migrates to a host that would default to -memfd.
I am afraid we will need to do it to handle transparent -memfd usage. I'll look at it with your help.
Tbh, I would try to migrate, and let qemu fail if something is incompatible (such as incompatible memory backends or memory region name mismatch). See also my qemu series "[PATCH 0/9] hostmem-ram: use whole path for region name with >= 3.1". It feels like libvirt duplicates some qemu logic/error otherwise.
I'm sure there's lots of duplication, but generally doing the checks in libvirt allows for a bit "easier" (at least in terms of libvirt) backout logic. Once the qemu process starts - if the process eventually dies because of something, then the logging only goes to libvirt log files. If the process fails to start, libvirt does capture and give that information back to the consumer. So call it preventative duplication. I think historically some qemu error messages have been a bit too vague to figure out why something didn't work.
2. Get the patches I posted today to cleanup/move the memory backing checks from domain_conf to qemu_domain:
https://www.redhat.com/archives/libvir-list/2018-September/msg00463.html
reviewed and pushed so that patch 4 can use the qemu_domain API to alter its hugepages check.
done
Thanks - I pushed that...
feel free to update & resend my series, or else I will rebase and resend it
thanks
OK - I adjusted your changes to handle the previously agreed upon "issues" and was ready to push the series when it dawned on me that the MEMFD and MEMFD_HUGETLB capabilities both use the 2.12 release - so realistically would the latter really be necessary?
Again if something doesn't quite work in 2.12 and 3.0 for hugetlb, then perhaps there's something in 3.1 that can be checked.
I can remove or keep patch 2. If removed, then just use MEMFD as the basis. Your call.
I'd keep the MEMFD_HUGETLB check, even with <3.1.
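For the record, the gating I'd expect in qemu_command.c then - a sketch only, the surrounding variable names are assumptions:

if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD)) {
    backendType = "memory-backend-memfd";
    /* hugetlb=yes is only valid when the separate property is present */
    if (useHugepage &&
        !virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB)) {
        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                       _("this QEMU's memory-backend-memfd does not support hugetlb"));
        return -1;
    }
}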

On 09/13/2018 03:39 AM, Marc-André Lureau wrote:
Hi
On Thu, Sep 13, 2018 at 2:25 AM, John Ferlan <jferlan@redhat.com> wrote:
[...]
So all that's "left":
1. "Add" a check in qemuDomainABIStabilityCheck to ensure we're not changing from memory-backend-ram to memory-backend-memfd. We already check that "(src->mem.source != dst->mem.source)" - so we know we're already anonymous or not.
Any suggestions? If source is anonymous, then what? I think we can use the qemuDomainObjPrivatePtr in some way to determine that we were started with -memfd (or not started that way).
No idea how we could save that information across various restarts / version changes.
I think it'd be ugly... I think migration cookies would have to be used... I considered other mechanisms, but each wouldn't quite work. Without writing the code, if we cared to do this, then we'd have:
1. Add a field to qemuDomainObjPrivatePtr that indicates what got started (none, memfd, file, or ram). Add a typedef enum that has unknown, none, memfd, file, and ram. Add the Parse/Format code to handle the field.
2. Modify the qemu_command code to set the field in priv based on what got started, if something got started. The value would be > 0...
3. Mess with the migration cookie logic to add checks for what the source started. On the destination side of that cookie, if we had the "right capabilities", then check the source cookie to see what it has. If it didn't have that field, then I think one could assume the source with anonymous memory backing would be using -ram. We'd already fail the src/dst mem.source check if one used -file. I'm not all that versed in the cookies, but I think that'd work "logically thinking" at least. The devil would be in the details.
Assuming your 3.1 patches do something to handle the condition, I guess it comes down to how much of a problem it's believed this could be in 2.12 and 3.0 if someone is running -ram and migrates to a host that would default to -memfd.
I am afraid we will need to do it to handle transparent -memfd usage. I'll look at it with your help.
Let's see what I can cobble together. I'll repost the series a bit later today hopefully.

John
Tbh, I would try to migrate, and let qemu fail if something is incompatible (such as incompatible memory backends or memory region name mismatch). See also my qemu series "[PATCH 0/9] hostmem-ram: use whole path for region name with >= 3.1". It feels like libvirt duplicates some qemu logic/error otherwise.
I'm sure there's lots of duplication, but generally doing the checks in libvirt allows for a bit "easier" (at least in terms of libvirt) backout logic. Once the qemu process starts - if the process eventually dies because of something, then the logging only goes to libvirt log files. If the process fails to start, libvirt does capture and give that information back to the consumer. So call it preventative duplication. I think historically some qemu error messages have been a bit too vague to figure out why something didn't work.
2. Get the patches I posted today to cleanup/move the memory backing checks from domain_conf to qemu_domain:
https://www.redhat.com/archives/libvir-list/2018-September/msg00463.html
reviewed and pushed so that patch 4 can use the qemu_domain API to alter its hugepages check.
done
Thanks - I pushed that...
feel free to update & resend my series, or else I will rebase and resend it
thanks
OK - I adjusted your changes to handle the previously agreed upon "issues" and was ready to push the series when it dawned on me that the MEMFD and MEMFD_HUGETLB capabilities both use the 2.12 release - so realistically would the latter really be necessary?
Again if something doesn't quite work in 2.12 and 3.0 for hugetlb, then perhaps there's something in 3.1 that can be checked.
I can remove or keep patch 2. If removed, then just use MEMFD as the basis. Your call.
I'd keep the MEMFD_HUGETLB check, even with <3.1.

On 09/13/2018 10:09 AM, John Ferlan wrote:
On 09/13/2018 03:39 AM, Marc-André Lureau wrote:
Hi
On Thu, Sep 13, 2018 at 2:25 AM, John Ferlan <jferlan@redhat.com> wrote:
[...]
So all that's "left":
1. "Add" a check in qemuDomainABIStabilityCheck to ensure we're not changing from memory-backend-ram to memory-backend-memfd. We already check that "(src->mem.source != dst->mem.source)" - so we know we're already anonymous or not.
Any suggestions? If source is anonymous, then what? I think we can use the qemuDomainObjPrivatePtr in some way to determine that we were started with -memfd (or not started that way).
No idea how we could save that information across various restarts / version changes.
I think it'd be ugly... I think migration cookies would have to be used... I considered other mechanisms, but each wouldn't quite work. Without writing the code, if we cared to do this, then we'd have:
1. Add a field to qemuDomainObjPrivatePtr that indicates what got started (none, memfd, file, or ram). Add a typedef enum that has unknown, none, memfd, file, and ram. Add the Parse/Format code to handle the field.
2. Modify the qemu_command code to set the field in priv based on what got started, if something got started. The value would be > 0...
3. Mess with the migration cookie logic to add checks for what the source started. On the destination side of that cookie, if we had the "right capabilities", then check the source cookie to see what it has. If it didn't have that field, then I think one could assume the source with anonymous memory backing would be using -ram. We'd already fail the src/dst mem.source check if one used -file. I'm not all that versed in the cookies, but I think that'd work "logically thinking" at least. The devil would be in the details.
Assuming your 3.1 patches do something to handle the condition, I guess it comes down to how much of a problem it's believed this could be in 2.12 and 3.0 if someone is running -ram and migrates to a host that would default to -memfd.
I am afraid we will need to do it to handle transparent -memfd usage. I'll look at it with your help.
Let's see what I can cobble together. I'll repost the series a bit later today hopefully.
After spending a few hours on this, the cookies just don't help enough or I don't know/understand enough about their usage.

I keep coming back to the problem of how do we disallow a migration from a host that has/knows about and uses anonymous memfd to one that doesn't know about it. Similarly, if a domain source w/ "file" or "ram" (whether at startup time or via hotplug) is migrated to a target host that would generate memfd - we have no mechanism to stop the migration because we have no way to tell what it was running, especially since what gets started isn't just based off the source type - hugepages have a tangential role. Lots of logic stuffed into qemu_command that probably should have been in some qemuDomainPrepareMemtune API.

So unfortunately, I think the only safe way is to create a new source type ("anonmem", "anonfile", "anonmemfd", ??) and describe it as lightly as the other entries are described (ironically the documented default of "anonymous" could be "file" or it could be "ram" based on 3 other factors not described in the docs). At least with a new type name/value we can guarantee that someone selects it by name rather than the multipurpose "anonymous" type. I think it would mean moving the caps checks to a bit later in the code; search for "otherwise check the required capability".

Unless someone still brave enough to keep reading this stream has an idea to try. I'm tapped out!

John

On 09/13/2018 11:51 PM, John Ferlan wrote:
On 09/13/2018 10:09 AM, John Ferlan wrote:
On 09/13/2018 03:39 AM, Marc-André Lureau wrote:
Hi
On Thu, Sep 13, 2018 at 2:25 AM, John Ferlan <jferlan@redhat.com> wrote:
[...]
So all that's "left":
1. "Add" a check in qemuDomainABIStabilityCheck to ensure we're not changing from memory-backend-ram to memory-backend-memfd. We already check that "(src->mem.source != dst->mem.source)" - so we know we're already anonymous or not.
Any suggestions? If source is anonymous, then what? I think we can use the qemuDomainObjPrivatePtr in some way to determine that we were started with -memfd (or not started that way).
No idea how we could save that information across various restarts / version changes.
I think it'd be ugly... I think migration cookies would have to be used... I considered other mechanisms, but each wouldn't quite work. Without writing the code, if we cared to do this, then we'd have:
1. Add a field to qemuDomainObjPrivatePtr that indicates what got started (none, memfd, file, or ram). Add a typedef enum that has unknown, none, memfd, file, and ram. Add the Parse/Format code to handle the field.
2. Modify the qemu_command code to set the field in priv based on what got started, if something got started. The value would be > 0...
3. Mess with the migration cookie logic to add checks for what the source started. On the destination side of that cookie, if we had the "right capabilities", then check the source cookie to see what it has. If it didn't have that field, then I think one could assume the source with anonymous memory backing would be using -ram. We'd already fail the src/dst mem.source check if one used -file. I'm not all that versed in the cookies, but I think that'd work "logically thinking" at least. The devil would be in the details.
Assuming your 3.1 patches do something to handle the condition, I guess it comes down to how much of a problem it's believed this could be in 2.12 and 3.0 if someone is running -ram and migrates to a host that would default to -memfd.
I am afraid we will need to do it to handle transparent -memfd usage. I'll look at it with your help.
Let's see what I can cobble together. I'll repost the series a bit later today hopefully.
After spending a few hours on this, the cookies just don't help enough or I don't know/understand enough about their usage.
I keep coming back to the problem of how do we disallow a migration from a host that has/knows about and uses anonymous memfd to one that doesn't know about it. Similarly, if a domain source w/ "file" or "ram" (whether at startup time or via hotplug) is migrated to a target host that would generate memfd - we have no mechanism to stop the migration because we have no way to tell what it was running, especially since what gets started isn't just based off the source type - hugepages have a tangential role. Lots of logic stuffed into qemu_command that probably should have been in some qemuDomainPrepareMemtune API.
So unfortunately, I think the only safe way is to create a new source type ("anonmem", "anonfile", "anonmemfd", ??) and describe it as lightly as the other entries are described (ironically the documented default of "anonymous" could be "file" or it could be "ram" based on 3 other factors not described in the docs). At least with a new type name/value we can guarantee that someone selects it by name rather than the multipurpose "anonymous" type. I think it would mean moving the caps checks to a bit later in the code; search for "otherwise check the required capability".
Unless someone still brave enough to keep reading this stream has an idea to try. I'm tapped out!
We can have an element/attribute in status XML/migration XML saying which backend we've used. This is slightly tricky because we have more places than one where users can tune configuration such that we use different backends. My personal favorite is:

<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB' nodeset='1'/>
  </hugepages>
</memoryBacking>

<cpu>
  <numa>
    <cell id='0' cpus='0' memory='1048576' unit='KiB'/>
    <cell id='1' cpus='1' memory='1048576' unit='KiB' memAccess='shared'/>
    <cell id='2' cpus='2' memory='1048576' unit='KiB' memAccess='private'/>
    <cell id='3' cpus='3' memory='1048576' unit='KiB'/>
  </numa>
</cpu>

<devices>
  <memory model='dimm'>
    <target>
      <size unit='KiB'>524288</size>
      <node>1</node>
    </target>
    <address type='dimm' slot='0' base='0x100000000'/>
  </memory>
</devices>

So what we can have is:

<hugepages>
  <page size=.... backend='memory-backend-file'/>
</hugepages>

<cell id='0' cpus='0' memory='1048576' unit='KiB' backend='memory-backend-ram'/>
<cell id='1' cpus='1' memory='1048576' unit='KiB' memAccess='shared' backend='memory-backend-file'/>
<cell id='2' cpus='2' memory='1048576' unit='KiB' memAccess='private' backend='memory-backend-file'/>
<cell id='3' cpus='3' memory='1048576' unit='KiB' backend='memory-backend-ram'/>

<devices>
  <memory model='dimm' backend='memory-backend-ram'/>
  ..
</devices>

This way we know what backend was used on the source (in saved state) and the only thing we need to know on dst (on restore) is to check if given backend is available.

I don't think putting anything in migration cookies is going to help. It might help migration if anything but it will definitely keep save/restore broken as there are no migration cookies.

Michal

Hi

On Fri, Sep 14, 2018 at 11:44 AM, Michal Prívozník <mprivozn@redhat.com> wrote:
On 09/13/2018 11:51 PM, John Ferlan wrote:
On 09/13/2018 10:09 AM, John Ferlan wrote:
On 09/13/2018 03:39 AM, Marc-André Lureau wrote:
Hi
On Thu, Sep 13, 2018 at 2:25 AM, John Ferlan <jferlan@redhat.com> wrote:
[...]
So all that's "left":

1. "Add" a check in qemuDomainABIStabilityCheck to ensure we're not changing from memory-backend-ram to memory-backend-memfd. We already check that "(src->mem.source != dst->mem.source)" - so we know we're already anonymous or not.

Any suggestions? If source is anonymous, then what? I think we can use the qemuDomainObjPrivatePtr in some way to determine that we were started with -memfd (or not started that way).
No idea how we could save that information across various restarts / version changes.
I think it'd be ugly... I think migration cookies would have to be used... I considered other mechanisms, but each wouldn't quite work. Without writing the code, if we cared to do this, then we'd have:
1. Add a field to qemuDomainObjPrivatePtr that indicates what got started (none, memfd, file, or ram). Add a typedef enum that has unknown, none, memfd, file, and ram. Add the Parse/Format code to handle the field.
2. Modify the qemu_command code to set the field in priv based on what got started, if something got started. The value would be > 0...
3. Mess with the migration cookie logic to add checks for what the source started. On the destination side of that cookie, if we had the "right capabilities", then check the source cookie to see what it has. If it didn't have that field, then I think one could assume the source with anonymous memory backing would be using -ram. We'd already fail the src/dst mem.source check if one used -file. I'm not all that versed in the cookies, but I think that'd work "logically thinking" at least. The devil would be in the details.
Assuming your 3.1 patches do something to handle the condition, I guess it comes down to how much of a problem it's believed this could be in 2.12 and 3.0 if someone is running -ram and migrates to a host that would default to -memfd.
I am afraid we will need to do it to handle transparent -memfd usage. I'll look at it with your help.
Let's see what I can cobble together. I'll repost the series a bit later today hopefully.
After spending a few hours on this, the cookies just don't help enough or I don't know/understand enough about their usage.
I keep coming back to the problem of how do we disallow a migration from a host that has/knows about and uses anonymous memfd to one that doesn't know about it. Similarly, if a domain source w/ "file" or "ram" (whether at startup time or via hotplug) is migrated to a target host that would generate memfd - we have no mechanism to stop the migration because we have no way to tell what it was running, especially since what gets started isn't just based off the source type - hugepages have a tangential role. Lots of logic stuffed into qemu_command that probably should have been in some qemuDomainPrepareMemtune API.
So unfortunately, I think the only safe way is to create a new source type ("anonmem", "anonfile", "anonmemfd", ??) and describe it as lightly as the other entries are described (ironically the documented default of "anonymous" could be "file" or it could be "ram" based on 3 other factors not described in the docs). At least with a new type name/value we can guarantee that someone selects it by name rather than the multipurpose "anonymous" type. I think it would mean moving the caps checks to a bit later in the code; search for "otherwise check the required capability".
Unless someone still brave enough to keep reading this stream has an idea to try. I'm tapped out!
We can have an element/attribute in status XML/migration XML saying which backend we've used. This is slightly tricky because we have more places than one where users can tune configuration such that we use different backends. My personal favorite is:
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB' nodeset='1'/>
  </hugepages>
</memoryBacking>
<cpu>
  <numa>
    <cell id='0' cpus='0' memory='1048576' unit='KiB'/>
    <cell id='1' cpus='1' memory='1048576' unit='KiB' memAccess='shared'/>
    <cell id='2' cpus='2' memory='1048576' unit='KiB' memAccess='private'/>
    <cell id='3' cpus='3' memory='1048576' unit='KiB'/>
  </numa>
</cpu>
<devices>
  <memory model='dimm'>
    <target>
      <size unit='KiB'>524288</size>
      <node>1</node>
    </target>
    <address type='dimm' slot='0' base='0x100000000'/>
  </memory>
</devices>
So what we can have is:
<hugepages>
  <page size=.... backend='memory-backend-file'/>
</hugepages>
<cell id='0' cpus='0' memory='1048576' unit='KiB' backend='memory-backend-ram'/>
<cell id='1' cpus='1' memory='1048576' unit='KiB' memAccess='shared' backend='memory-backend-file'/>
<cell id='2' cpus='2' memory='1048576' unit='KiB' memAccess='private' backend='memory-backend-file'/>
<cell id='3' cpus='3' memory='1048576' unit='KiB' backend='memory-backend-ram'/>
<devices>
  <memory model='dimm' backend='memory-backend-ram'/>
That's a bit overkill to me, since we don't have (yet) the capacity for a user to select the memory backend, and the value is a qemu-specific detail.
  ..
</devices>
This way we know what backend was used on the source (in saved state) and the only thing we need to know on dst (on restore) is to check if given backend is available.
I don't think putting anything in migration cookies is going to help. It might help migration if anything but it will definitely keep save/restore broken as there are no migration cookies.
Ah, too bad. I am not familiar enough with migration and save/restore in libvirt. But I started to imagine how the migration cookie could have been used.

Is the domain XML the only place we can save information?

If yes, then either we go with your proposal (although I wonder if it should be qemu: namespaced), or can we introduce libvirt capabilities? (something as simple as <capabilities><qemu-memorybackend-memfd/></capabilities>)?

thanks!
Michal

On 09/17/2018 11:30 AM, Marc-André Lureau wrote:
Hi
On Fri, Sep 14, 2018 at 11:44 AM, Michal Prívozník <mprivozn@redhat.com> wrote:
On 09/13/2018 11:51 PM, John Ferlan wrote:
On 09/13/2018 10:09 AM, John Ferlan wrote:
On 09/13/2018 03:39 AM, Marc-André Lureau wrote:
Hi
On Thu, Sep 13, 2018 at 2:25 AM, John Ferlan <jferlan@redhat.com> wrote:
[...]
So all that's "left":

1. "Add" a check in qemuDomainABIStabilityCheck to ensure we're not changing from memory-backend-ram to memory-backend-memfd. We already check that "(src->mem.source != dst->mem.source)" - so we know we're already anonymous or not.

Any suggestions? If source is anonymous, then what? I think we can use the qemuDomainObjPrivatePtr in some way to determine that we were started with -memfd (or not started that way).

No idea how we could save that information across various restarts / version changes.
I think it'd be ugly... I think migration cookies would have to be used... I considered other mechanisms, but each wouldn't quite work. Without writing the code, if we cared to do this, then we'd have:
1. Add a field to qemuDomainObjPrivatePtr that indicates what got started (none, memfd, file, or ram). Add a typedef enum that has unknown, none, memfd, file, and ram. Add the Parse/Format code to handle the field.
2. Modify the qemu_command code to set the field in priv based on what got started, if something got started. The value would be > 0...
3. Mess with the migration cookie logic to add checks for what the source started. On the destination side of that cookie, if we had the "right capabilities", then check the source cookie to see what it has. If it didn't have that field, then I think one could assume the source with anonymous memory backing would be using -ram. We'd already fail the src/dst mem.source check if one used -file. I'm not all that versed in the cookies, but I think that'd work "logically thinking" at least. The devil would be in the details.
Assuming your 3.1 patches do something to handle the condition, I guess it comes down to how much of a problem it's believed this could be in 2.12 and 3.0 if someone is running -ram and migrates to a host that would default to -memfd.
I am afraid we will need to do it to handle transparent -memfd usage. I'll look at it with your help.
Let's see what I can cobble together. I'll repost the series a bit later today hopefully.
After spending a few hours on this, the cookies just don't help enough or I don't know/understand enough about their usage.
I keep coming back to the problem of how do we disallow a migration from a host that has/knows about and uses anonymous memfd to one that doesn't know about it. Similarly, if a domain source w/ "file" or "ram" (whether at startup time or via hotplug) is migrated to a target host that would generate memfd - we have no mechanism to stop the migration because we have no way to tell what it was running, especially since what gets started isn't just based off the source type - hugepages have a tangential role. Lots of logic stuffed into qemu_command that probably should have been in some qemuDomainPrepareMemtune API.
So unfortunately, I think the only safe way is to create a new source type ("anonmem", "anonfile", "anonmemfd", ??) and describe it as lightly as the other entries are described (ironically the documented default of "anonymous" could be "file" or it could be "ram" based on 3 other factors not described in the docs). At least with a new type name/value we can guarantee that someone selects it by name rather than the multipurpose "anonymous" type. I think it would mean moving the caps checks to a bit later in the code; search for "otherwise check the required capability".
Unless someone still brave enough to keep reading this stream has an idea to try. I'm tapped out!
We can have an element/attribute in status XML/migration XML saying which backend we've used. This is slightly tricky because we have more places than one where users can tune configuration such that we use different backends. My personal favorite is:
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB' nodeset='1'/>
  </hugepages>
</memoryBacking>
<cpu>
  <numa>
    <cell id='0' cpus='0' memory='1048576' unit='KiB'/>
    <cell id='1' cpus='1' memory='1048576' unit='KiB' memAccess='shared'/>
    <cell id='2' cpus='2' memory='1048576' unit='KiB' memAccess='private'/>
    <cell id='3' cpus='3' memory='1048576' unit='KiB'/>
  </numa>
</cpu>
<devices>
  <memory model='dimm'>
    <target>
      <size unit='KiB'>524288</size>
      <node>1</node>
    </target>
    <address type='dimm' slot='0' base='0x100000000'/>
  </memory>
</devices>
So what we can have is:
<hugepages>
  <page size=.... backend='memory-backend-file'/>
</hugepages>
<cell id='0' cpus='0' memory='1048576' unit='KiB' backend='memory-backend-ram'/>
<cell id='1' cpus='1' memory='1048576' unit='KiB' memAccess='shared' backend='memory-backend-file'/>
<cell id='2' cpus='2' memory='1048576' unit='KiB' memAccess='private' backend='memory-backend-file'/>
<cell id='3' cpus='3' memory='1048576' unit='KiB' backend='memory-backend-ram'/>
<devices>
  <memory model='dimm' backend='memory-backend-ram'/>
That's a bit overkill to me, since we don't have (yet) the capacity for a user to select the memory backend, and the value is a qemu-specific detail.
So status XML is not something we parse from the user. It's produced by libvirt and it's a superset of user-provided XML and some runtime information. For instance, look around the lines where the VIR_DOMAIN_DEF_PARSE_STATUS flag occurs.
  ..
</devices>
This way we know what backend was used on the source (in saved state) and the only thing we need to know on dst (on restore) is to check if given backend is available.
I don't think putting anything in migration cookies is going to help. It might help migration if anything but it will definitely keep save/restore broken as there are no migration cookies.
Ah, too bad. I am not familiar enough with migration and save/restore in libvirt. But I started to imagine how the migration cookie could have been used.
From qemu's POV, there's no difference between migration and save/restore. All of them are migration, except save/restore is migration to/from a file (an FD, actually).
Is the domain XML the only place we can save information?
Yes, status XML. That's where libvirt keeps its runtime information (and which backend was used falls exactly into this category) so that it is preserved on the daemon restart.
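Roughly, on the format side it could be as small as this - a sketch only, the attribute name and helpers are invented here, not existing code:

/* somewhere in qemuDomainObjPrivateXMLFormat(), sketch only */
if (priv->memBackend != QEMU_DOMAIN_MEMORY_BACKEND_UNKNOWN)
    virBufferAsprintf(buf, "<memoryBackend type='%s'/>\n",
                      qemuDomainMemoryBackendTypeToString(priv->memBackend));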
If yes, then either we go with your proposal (although I wonder if it should be qemu: namespaced), or can we introduce libvirt capabilities? (something as simple as <capabilities><qemu-memorybackend-memfd/></capabilities>)?
No need. Once again, this is not something that users will ever see, nor would libvirt parse it when parsing input from the user.

The same applies for migration XML. These two are different in some aspects, but that is not critical for this feature. It's sufficient to say for now that status XML preserves runtime data between daemon restarts (we want freshly restarted libvirt to remember what backend was used) and migration XML preserves runtime data on migration (we want the destination to know what backend is used).

Michal

Hi

On Mon, Sep 17, 2018 at 3:07 PM, Michal Privoznik <mprivozn@redhat.com> wrote:
On 09/17/2018 11:30 AM, Marc-André Lureau wrote:
Hi
On Fri, Sep 14, 2018 at 11:44 AM, Michal Prívozník <mprivozn@redhat.com> wrote:
On 09/13/2018 11:51 PM, John Ferlan wrote:
On 09/13/2018 10:09 AM, John Ferlan wrote:
On 09/13/2018 03:39 AM, Marc-André Lureau wrote:
Hi
On Thu, Sep 13, 2018 at 2:25 AM, John Ferlan <jferlan@redhat.com> wrote:

[...]

So all that's "left":

1. "Add" a check in qemuDomainABIStabilityCheck to ensure we're not changing from memory-backend-ram to memory-backend-memfd. We already check that "(src->mem.source != dst->mem.source)" - so we know we're already anonymous or not.

Any suggestions? If source is anonymous, then what? I think we can use the qemuDomainObjPrivatePtr in some way to determine that we were started with -memfd (or not started that way).

No idea how we could save that information across various restarts / version changes.

I think it'd be ugly... I think migration cookies would have to be used... I considered other mechanisms, but each wouldn't quite work. Without writing the code, if we cared to do this, then we'd have:

1. Add a field to qemuDomainObjPrivatePtr that indicates what got started (none, memfd, file, or ram). Add a typedef enum that has unknown, none, memfd, file, and ram. Add the Parse/Format code to handle the field.

2. Modify the qemu_command code to set the field in priv based on what got started, if something got started. The value would be > 0...

3. Mess with the migration cookie logic to add checks for what the source started. On the destination side of that cookie, if we had the "right capabilities", then check the source cookie to see what it has. If it didn't have that field, then I think one could assume the source with anonymous memory backing would be using -ram. We'd already fail the src/dst mem.source check if one used -file. I'm not all that versed in the cookies, but I think that'd work "logically thinking" at least. The devil would be in the details.

Assuming your 3.1 patches do something to handle the condition, I guess it comes down to how much of a problem it's believed this could be in 2.12 and 3.0 if someone is running -ram and migrates to a host that would default to -memfd.
I am afraid we will need to do it to handle transparent -memfd usage. I'll look at it with your help.
Let's see what I can cobble together. I'll repost the series a bit later today hopefully.
After spending a few hours on this, the cookies just don't help enough or I don't know/understand enough about their usage.
I keep coming back to the problem of how do we disallow a migration from a host that has/knows about and uses anonymous memfd to one that doesn't know about it. Similarly, if a domain source w/ "file" or "ram" (whether at startup time or via hotplug) is migrated to a target host that would generate memfd - we have no mechanism to stop the migration because we have no way to tell what it was running, especially since what gets started isn't just based off the source type - hugepages have a tangential role. Lots of logic stuffed into qemu_command that probably should have been in some qemuDomainPrepareMemtune API.
So unfortunately, I think the only safe way is to create a new source type ("anonmem", "anonfile", "anonmemfd", ??) and describe it as lightly as the other entries are described (ironically the documented default of "anonymous" could be "file" or it could be "ram" based on 3 other factors not described in the docs). At least with a new type name/value we can guarantee that someone selects it by name rather than the multipurpose "anonymous" type. I think it would mean moving the caps checks to a bit later in the code; search for "otherwise check the required capability".
Unless someone still brave enough to keep reading this stream has an idea to try. I'm tapped out!
We can have an element/attribute in status XML/migration XML saying which backend we've used. This is slightly tricky because we have more places than one where users can tune configuration such that we use different backends. My personal favorite is:
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB' nodeset='1'/>
  </hugepages>
</memoryBacking>
<cpu>
  <numa>
    <cell id='0' cpus='0' memory='1048576' unit='KiB'/>
    <cell id='1' cpus='1' memory='1048576' unit='KiB' memAccess='shared'/>
    <cell id='2' cpus='2' memory='1048576' unit='KiB' memAccess='private'/>
    <cell id='3' cpus='3' memory='1048576' unit='KiB'/>
  </numa>
</cpu>
<devices>
  <memory model='dimm'>
    <target>
      <size unit='KiB'>524288</size>
      <node>1</node>
    </target>
    <address type='dimm' slot='0' base='0x100000000'/>
  </memory>
</devices>
So what we can have is:
<hugepages>
  <page size=.... backend='memory-backend-file'/>
</hugepages>
<cell id='0' cpus='0' memory='1048576' unit='KiB' backend='memory-backend-ram'/>
<cell id='1' cpus='1' memory='1048576' unit='KiB' memAccess='shared' backend='memory-backend-file'/>
<cell id='2' cpus='2' memory='1048576' unit='KiB' memAccess='private' backend='memory-backend-file'/>
<cell id='3' cpus='3' memory='1048576' unit='KiB' backend='memory-backend-ram'/>
<devices>
  <memory model='dimm' backend='memory-backend-ram'/>
That's a bit overkill to me, since we don't have (yet) the capacity for a user to select the memory backend, and the value is a qemu-specific detail.
So status XML is not something we parse from the user. It's produced by libvirt and it's a superset of user-provided XML and some runtime information. For instance, look around the lines where the VIR_DOMAIN_DEF_PARSE_STATUS flag occurs.
  ..
</devices>
This way we know what backend was used on the source (in saved state) and the only thing we need to know on dst (on restore) is to check if given backend is available.
I don't think putting anything in migration cookies is going to help. It might help migration if anything but it will definitely keep save/restore broken as there are no migration cookies.
Ah, too bad. I am not familiar enough with migration and save/restore in libvirt. But I started to imagine how the migration cookie could have been used.
From qemu's POV, there's no difference between migration and save/restore. All of them are migration, except save/restore is migration to/from a file (an FD, actually).
Is the domain XML the only place we can save information?
Yes, status XML. That's where libvirt keeps its runtime information (and which backend was used falls exactly into this category) so that it is preserved on the daemon restart.
If yes, then either we go with your proposal (although I wonder if it should be qemu: namespaced), or can we introduce libvirt capabilities? (something as simple as <capabilities><qemu-memorybackend-memfd/></capabilities>)?
No need. Once again, this is not something that users will ever see, nor would libvirt parse it when parsing input from the user.
The same applies for migration XML. These two are different in some aspects, but that is not critical for this feature. It's sufficient to say for now that status XML preserves runtime data between daemon restarts (we want freshly restarted libvirt to remember what backend was used) and migration XML preserves runtime data on migration (we want the destination to know what backend is used).
Ok

Wouldn't it be easier to have <source type="memfd"/>? (sketch below)

Daniel didn't have a strong objection against it; it was more of a suggestion for an "anonymous" type improvement: https://www.redhat.com/archives/libvir-list/2018-August/msg01841.html

Eventually, "anonymous" could be smartly changed to "memfd" by libvirt when possible (from a non-resume start).

thanks
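A sketch of how that could look in the domain XML, assuming it follows the existing <source type='file'/> form:

<memoryBacking>
  <source type='memfd'/>
  <access mode='shared'/>
</memoryBacking>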
participants (7)

- Dr. David Alan Gilbert
- Igor Mammedov
- John Ferlan
- Marc-André Lureau
- marcandre.lureau@redhat.com
- Michal Privoznik
- Michal Prívozník