[libvirt] [PATCH v2 0/3] Add "memfd" memory backing type

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Hi,

This is an alternative series from "[PATCH 0/5] Use memfd if
possible". Instead of automatically using memfd for anonymous memory
when available (as suggested by Daniel), it introduces the "memfd"
memory backing type.

Although using memfd transparently when possible is a good idea, it
is a source of various complications for migration & save/restore.
This could eventually be challenged in a different series.

The first two patches have been modified and reviewed by John
Ferlan. Hopefully they can be merged early, regardless of the last
patch outcome, to avoid the painful rebase conflicts due to
capabilities checks introduction.

Thanks

Marc-André Lureau (3):
  qemu: add memory-backend-memfd capability check
  qemu: check memory-backend-memfd.hugetlb capability
  qemu: add memfd source type

 docs/formatdomain.html.in                     |   9 +-
 docs/schemas/domaincommon.rng                 |   1 +
 src/conf/domain_conf.c                        |   3 +-
 src/conf/domain_conf.h                        |   1 +
 src/qemu/qemu_capabilities.c                  |  10 ++
 src/qemu/qemu_capabilities.h                  |   2 +
 src/qemu/qemu_command.c                       |  69 +++++++----
 src/qemu/qemu_domain.c                        |  12 +-
 .../caps_2.12.0.aarch64.replies               |  94 ++++++++++++---
 .../caps_2.12.0.aarch64.xml                   |   4 +-
 .../caps_2.12.0.ppc64.replies                 |  90 +++++++++++---
 .../caps_2.12.0.ppc64.xml                     |   4 +-
 .../caps_2.12.0.s390x.replies                 |  98 ++++++++++++----
 .../caps_2.12.0.s390x.xml                     |   4 +-
 .../caps_2.12.0.x86_64.replies                | 110 +++++++++++++-----
 .../caps_2.12.0.x86_64.xml                    |   4 +-
 .../caps_3.0.0.ppc64.replies                  |  90 +++++++++++---
 .../qemucapabilitiesdata/caps_3.0.0.ppc64.xml |   4 +-
 .../caps_3.0.0.riscv32.replies                |  86 +++++++++++---
 .../caps_3.0.0.riscv32.xml                    |   2 +
 .../caps_3.0.0.riscv64.replies                |  86 +++++++++++---
 .../caps_3.0.0.riscv64.xml                    |   2 +
 .../caps_3.0.0.x86_64.replies                 | 110 +++++++++++++-----
 .../caps_3.0.0.x86_64.xml                     |   4 +-
 .../memfd-memory-numa.x86_64-latest.args      |  34 ++++++
 tests/qemuxml2argvdata/memfd-memory-numa.xml  |  36 ++++++
 tests/qemuxml2argvtest.c                      |   2 +
 27 files changed, 788 insertions(+), 183 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml

--
2.19.0
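
For readers less familiar with memfd, here is a minimal sketch of the
Linux primitive the cover letter refers to. It is not part of the
series and is not taken from QEMU or libvirt. memfd_create(2),
ftruncate(2) and mmap(2) are the real Linux/glibc calls; the helper
name memfd_roundtrip and the "guest-ram-sketch" label are hypothetical.
That a memfd-backed guest RAM region is set up along these lines is an
assumption made for illustration only.

```c
#define _GNU_SOURCE          /* needed for memfd_create() in <sys/mman.h> */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an anonymous, fd-backed region of `size`
 * bytes, write `msg` through one shared mapping and read it back
 * through a second mapping of the same fd.
 * Returns 0 on success, -1 on any failure.
 */
static int memfd_roundtrip(size_t size, const char *msg)
{
    int ret = -1;
    size_t len = strlen(msg) + 1;
    void *a = MAP_FAILED;
    void *b = MAP_FAILED;
    int fd;

    if (len > size)
        return -1;

    /* The name is only a debugging label; it is not a filesystem path. */
    fd = memfd_create("guest-ram-sketch", MFD_CLOEXEC);
    if (fd < 0)
        return -1;

    /* A new memfd has size 0, so it must be grown before mapping. */
    if (ftruncate(fd, (off_t)size) < 0)
        goto cleanup;

    a = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    b = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED || b == MAP_FAILED)
        goto cleanup;

    /* Both mappings share the same pages, so the write is visible in b. */
    memcpy(a, msg, len);
    if (memcmp(b, msg, len) == 0)
        ret = 0;

 cleanup:
    if (a != MAP_FAILED)
        munmap(a, size);
    if (b != MAP_FAILED)
        munmap(b, size);
    close(fd);
    return ret;
}
```

Because the region is reachable through a file descriptor rather than
only through the creating process's address space, the fd can be
handed to another process, which is one reason a distinct backing type
matters for migration and save/restore.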

From: Marc-André Lureau <marcandre.lureau@redhat.com>

Check availability of "-object memory-backend-memfd".

Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: John Ferlan <jferlan@redhat.com>
---
 src/qemu/qemu_capabilities.c                       | 2 ++
 src/qemu/qemu_capabilities.h                       | 1 +
 tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml | 1 +
 tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml   | 1 +
 tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml   | 1 +
 tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml  | 1 +
 tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml    | 1 +
 tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml  | 1 +
 tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml  | 1 +
 tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml   | 1 +
 10 files changed, 11 insertions(+)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index e04a3d775f..d866fbc183 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -508,6 +508,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST,
               /* 315 */
               "vfio-pci.display",
               "blockdev",
+              "memory-backend-memfd",
     );
@@ -1147,6 +1148,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = {
     { "vhost-vsock-device", QEMU_CAPS_DEVICE_VHOST_VSOCK },
     { "mch", QEMU_CAPS_DEVICE_MCH },
     { "sev-guest", QEMU_CAPS_SEV_GUEST },
+    { "memory-backend-memfd", QEMU_CAPS_OBJECT_MEMORY_MEMFD },
 };

 static struct virQEMUCapsStringFlags virQEMUCapsDevicePropsVirtioBalloon[] = {
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index a0134493aa..b56d92c2f9 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -492,6 +492,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */
     /* 315 */
     QEMU_CAPS_VFIO_PCI_DISPLAY, /* -device vfio-pci.display */
     QEMU_CAPS_BLOCKDEV, /* -blockdev and blockdev-add are supported */
+    QEMU_CAPS_OBJECT_MEMORY_MEMFD, /* -object memory-backend-memfd */

     QEMU_CAPS_LAST /* this must always be the last item */
 } virQEMUCapsFlags;
diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml
index 71c3d0f53f..e4de7da349 100644
--- a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml
+++ b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml
@@ -163,6 +163,7 @@
   <flag name='tpm-emulator'/>
   <flag name='egl-headless'/>
   <flag name='vfio-pci.display'/>
+  <flag name='memory-backend-memfd'/>
   <version>2011090</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>344910</microcodeVersion>
diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml
index d638663c75..ffe451b332 100644
--- a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml
+++ b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml
@@ -161,6 +161,7 @@
   <flag name='machine.pseries.cap-htm'/>
   <flag name='egl-headless'/>
   <flag name='vfio-pci.display'/>
+  <flag name='memory-backend-memfd'/>
   <version>2011090</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>425694</microcodeVersion>
diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml
index f1a154c4c4..7fa50bdfaa 100644
--- a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml
+++ b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml
@@ -129,6 +129,7 @@
   <flag name='tpm-emulator'/>
   <flag name='egl-headless'/>
   <flag name='vfio-pci.display'/>
+  <flag name='memory-backend-memfd'/>
   <version>2012000</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>374287</microcodeVersion>
diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml
index 2bded9fc38..b3f6bdc302 100644
--- a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml
@@ -204,6 +204,7 @@
   <flag name='sev-guest'/>
   <flag name='egl-headless'/>
   <flag name='vfio-pci.display'/>
+  <flag name='memory-backend-memfd'/>
   <version>2011090</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>413556</microcodeVersion>
diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml
index ce70bbad61..f3df19fc33 100644
--- a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml
+++ b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml
@@ -161,6 +161,7 @@
   <flag name='machine.pseries.cap-htm'/>
   <flag name='egl-headless'/>
   <flag name='vfio-pci.display'/>
+  <flag name='memory-backend-memfd'/>
   <version>2012050</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>444131</microcodeVersion>
diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml
index f6e74ee7c6..bf04826ad0 100644
--- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml
+++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml
@@ -101,6 +101,7 @@
   <flag name='chardev-fd-pass'/>
   <flag name='tpm-emulator'/>
   <flag name='egl-headless'/>
+  <flag name='memory-backend-memfd'/>
   <version>3000000</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>0</microcodeVersion>
diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml
index b6b1bc12db..b5f8f94052 100644
--- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml
+++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml
@@ -101,6 +101,7 @@
   <flag name='chardev-fd-pass'/>
   <flag name='tpm-emulator'/>
   <flag name='egl-headless'/>
+  <flag name='memory-backend-memfd'/>
   <version>3000000</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>0</microcodeVersion>
diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml
index 1d910a9679..2f22fd10b6 100644
--- a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml
@@ -206,6 +206,7 @@
   <flag name='usb-storage.werror'/>
   <flag name='egl-headless'/>
   <flag name='vfio-pci.display'/>
+  <flag name='memory-backend-memfd'/>
   <version>3000000</version>
   <kvmVersion>0</kvmVersion>
   <microcodeVersion>425157</microcodeVersion>
--
2.19.0
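
The patch above adds one row to a table that maps QEMU object type
names to capability flags. As a rough illustration of how such a table
can be consulted, here is a small standalone sketch. It is not
libvirt's implementation: every identifier prefixed with sketch_ or
SKETCH_ is hypothetical, and the only assumption taken from the patch
is that a type name reported by QEMU is matched by string against the
table to decide which flag to set.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a few virQEMUCapsFlags values. */
enum sketch_cap {
    SKETCH_CAP_NONE = -1,
    SKETCH_CAP_DEVICE_MCH,
    SKETCH_CAP_SEV_GUEST,
    SKETCH_CAP_OBJECT_MEMORY_MEMFD
};

/* One row per type name, shaped like the rows in virQEMUCapsObjectTypes[]. */
struct sketch_string_flag {
    const char *value;
    enum sketch_cap flag;
};

static const struct sketch_string_flag sketch_object_types[] = {
    { "mch", SKETCH_CAP_DEVICE_MCH },
    { "sev-guest", SKETCH_CAP_SEV_GUEST },
    { "memory-backend-memfd", SKETCH_CAP_OBJECT_MEMORY_MEMFD },
};

/*
 * Return the capability flag for a type name reported by the emulator,
 * or SKETCH_CAP_NONE when the table has no matching row.
 */
static enum sketch_cap sketch_lookup(const char *type_name)
{
    size_t n = sizeof(sketch_object_types) / sizeof(sketch_object_types[0]);
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(sketch_object_types[i].value, type_name) == 0)
            return sketch_object_types[i].flag;
    }
    return SKETCH_CAP_NONE;
}
```

With a table-driven design like this, supporting a new object type is
a one-line change plus the matching enum entry, which is what the
two-line diff to qemu_capabilities.c amounts to.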

From: Marc-André Lureau <marcandre.lureau@redhat.com>

QEMU 3.1 should only expose the property if the host is actually
capable of creating a hugetlb-backed memfd. However, it may fail at
runtime depending on the requested "hugetlbsize".

Reviewed-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: John Ferlan <jferlan@redhat.com>
---
 src/qemu/qemu_capabilities.c                  |   8 ++
 src/qemu/qemu_capabilities.h                  |   1 +
 .../caps_2.12.0.aarch64.replies               |  94 ++++++++++++---
 .../caps_2.12.0.aarch64.xml                   |   3 +-
 .../caps_2.12.0.ppc64.replies                 |  90 +++++++++++---
 .../caps_2.12.0.ppc64.xml                     |   3 +-
 .../caps_2.12.0.s390x.replies                 |  98 ++++++++++++----
 .../caps_2.12.0.s390x.xml                     |   3 +-
 .../caps_2.12.0.x86_64.replies                | 110 +++++++++++++-----
 .../caps_2.12.0.x86_64.xml                    |   3 +-
 .../caps_3.0.0.ppc64.replies                  |  90 +++++++++++---
 .../qemucapabilitiesdata/caps_3.0.0.ppc64.xml |   3 +-
 .../caps_3.0.0.riscv32.replies                |  86 +++++++++++---
 .../caps_3.0.0.riscv32.xml                    |   1 +
 .../caps_3.0.0.riscv64.replies                |  86 +++++++++++---
 .../caps_3.0.0.riscv64.xml                    |   1 +
 .../caps_3.0.0.x86_64.replies                 | 110 +++++++++++++-----
 .../caps_3.0.0.x86_64.xml                     |   3 +-
 18 files changed, 637 insertions(+), 156 deletions(-)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index d866fbc183..e333e90171 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -509,6 +509,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST,
               "vfio-pci.display",
               "blockdev",
               "memory-backend-memfd",
+              "memory-backend-memfd.hugetlb",
     );
@@ -1411,6 +1412,10 @@ static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsMemoryBackendFile[] =
     { "discard-data", QEMU_CAPS_OBJECT_MEMORY_FILE_DISCARD },
 };

+static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsMemoryBackendMemfd[] = {
+    { "hugetlb", QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB },
+};
+
 static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsSPAPRMachine[] = {
     { "cap-hpt-max-page-size",
QEMU_CAPS_MACHINE_PSERIES_CAP_HPT_MAX_PAGE_SIZE }, { "cap-htm", QEMU_CAPS_MACHINE_PSERIES_CAP_HTM }, @@ -1420,6 +1425,9 @@ static virQEMUCapsObjectTypeProps virQEMUCapsObjectProps[] = { { "memory-backend-file", virQEMUCapsObjectPropsMemoryBackendFile, ARRAY_CARDINALITY(virQEMUCapsObjectPropsMemoryBackendFile), QEMU_CAPS_OBJECT_MEMORY_FILE }, + { "memory-backend-memfd", virQEMUCapsObjectPropsMemoryBackendMemfd, + ARRAY_CARDINALITY(virQEMUCapsObjectPropsMemoryBackendMemfd), + QEMU_CAPS_OBJECT_MEMORY_FILE }, { "spapr-machine", virQEMUCapsObjectPropsSPAPRMachine, ARRAY_CARDINALITY(virQEMUCapsObjectPropsSPAPRMachine), -1 }, diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index b56d92c2f9..1f0f49979c 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -493,6 +493,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ QEMU_CAPS_VFIO_PCI_DISPLAY, /* -device vfio-pci.display */ QEMU_CAPS_BLOCKDEV, /* -blockdev and blockdev-add are supported */ QEMU_CAPS_OBJECT_MEMORY_MEMFD, /* -object memory-backend-memfd */ + QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB, /* -object memory-backend-memfd.hugetlb */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.replies b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.replies index 4208a66156..2cd6705d78 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.replies +++ b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.replies @@ -5403,13 +5403,71 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": "memory-backend-memfd" }, "id": "libvirt-31" } { - "id": "libvirt-31", + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + }, + { + "name": "size", + 
"type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-31" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-32" +} + +{ + "id": "libvirt-32", "error": { "class": "DeviceNotFound", "desc": "Class 'spapr-machine' not found" @@ -5418,7 +5476,7 @@ { "execute": "query-machines", - "id": "libvirt-32" + "id": "libvirt-33" } { @@ -5715,12 +5773,12 @@ "cpu-max": 1 } ], - "id": "libvirt-32" + "id": "libvirt-33" } { "execute": "query-cpu-definitions", - "id": "libvirt-33" + "id": "libvirt-34" } { @@ -5896,35 +5954,35 @@ "static": false } ], - "id": "libvirt-33" + "id": "libvirt-34" } { "execute": "query-tpm-models", - "id": "libvirt-34" + "id": "libvirt-35" } { "return": [ ], - "id": "libvirt-34" + "id": "libvirt-35" } { "execute": "query-tpm-types", - "id": "libvirt-35" + "id": "libvirt-36" } { "return": [ "emulator" ], - "id": "libvirt-35" + "id": "libvirt-36" } { "execute": "query-command-line-options", - "id": "libvirt-36" + "id": "libvirt-37" } { @@ -7085,12 +7143,12 @@ "option": "drive" } ], - "id": "libvirt-36" + "id": "libvirt-37" } { "execute": "query-migrate-capabilities", - "id": "libvirt-37" + "id": "libvirt-38" } { @@ -7152,12 +7210,12 @@ "capability": "dirty-bitmaps" } ], - "id": "libvirt-37" + "id": "libvirt-38" } { "execute": "query-qmp-schema", - "id": "libvirt-38" + "id": "libvirt-39" } { @@ -18525,12 +18583,12 @@ "meta-type": "object" } ], - "id": "libvirt-38" + "id": "libvirt-39" } { "execute": "query-gic-capabilities", - "id": "libvirt-39" + "id": "libvirt-40" } { @@ -18546,7 +18604,7 @@ "kernel": false } ], - "id": "libvirt-39" + "id": "libvirt-40" } { diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml index 
e4de7da349..1bf025007d 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml @@ -164,9 +164,10 @@ <flag name='egl-headless'/> <flag name='vfio-pci.display'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>2011090</version> <kvmVersion>0</kvmVersion> - <microcodeVersion>344910</microcodeVersion> + <microcodeVersion>345725</microcodeVersion> <package>v2.12.0-rc0</package> <arch>aarch64</arch> <cpu type='kvm' name='pxa262'/> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.replies b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.replies index bd28546275..d8aef1e9d1 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.replies +++ b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.replies @@ -5458,11 +5458,69 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": "memory-backend-memfd" }, "id": "libvirt-32" } +{ + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + }, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-32" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-33" +} + { "return": [ { @@ -5621,12 +5679,12 @@ "type": "bool" } ], - "id": "libvirt-32" + "id": "libvirt-33" } { "execute": "query-machines", - "id": "libvirt-33" + "id": "libvirt-34" } { @@ -5764,12 +5822,12 @@ "cpu-max": 1 } ], - "id": "libvirt-33" + "id": "libvirt-34" } { "execute": "query-cpu-definitions", - "id": "libvirt-34" + "id": 
"libvirt-35" } { @@ -7965,35 +8023,35 @@ "static": false } ], - "id": "libvirt-34" + "id": "libvirt-35" } { "execute": "query-tpm-models", - "id": "libvirt-35" + "id": "libvirt-36" } { "return": [ ], - "id": "libvirt-35" + "id": "libvirt-36" } { "execute": "query-tpm-types", - "id": "libvirt-36" + "id": "libvirt-37" } { "return": [ "emulator" ], - "id": "libvirt-36" + "id": "libvirt-37" } { "execute": "query-command-line-options", - "id": "libvirt-37" + "id": "libvirt-38" } { @@ -9149,12 +9207,12 @@ "option": "drive" } ], - "id": "libvirt-37" + "id": "libvirt-38" } { "execute": "query-migrate-capabilities", - "id": "libvirt-38" + "id": "libvirt-39" } { @@ -9216,12 +9274,12 @@ "capability": "dirty-bitmaps" } ], - "id": "libvirt-38" + "id": "libvirt-39" } { "execute": "query-qmp-schema", - "id": "libvirt-39" + "id": "libvirt-40" } { @@ -20589,7 +20647,7 @@ "meta-type": "object" } ], - "id": "libvirt-39" + "id": "libvirt-40" } { diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml index ffe451b332..d722a5d6e8 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml @@ -162,9 +162,10 @@ <flag name='egl-headless'/> <flag name='vfio-pci.display'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>2011090</version> <kvmVersion>0</kvmVersion> - <microcodeVersion>425694</microcodeVersion> + <microcodeVersion>426509</microcodeVersion> <package>v2.12.0-rc0</package> <arch>ppc64</arch> <cpu type='kvm' name='default'/> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.replies b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.replies index f98afbceae..b5a14b5916 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.replies +++ b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.replies @@ -3818,13 +3818,71 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": 
"memory-backend-memfd" }, "id": "libvirt-31" } { - "id": "libvirt-31", + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + }, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-31" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-32" +} + +{ + "id": "libvirt-32", "error": { "class": "DeviceNotFound", "desc": "Class 'spapr-machine' not found" @@ -3833,7 +3891,7 @@ { "execute": "query-machines", - "id": "libvirt-32" + "id": "libvirt-33" } { @@ -3891,12 +3949,12 @@ "alias": "s390-ccw-virtio" } ], - "id": "libvirt-32" + "id": "libvirt-33" } { "execute": "query-cpu-definitions", - "id": "libvirt-33" + "id": "libvirt-34" } { @@ -4431,35 +4489,35 @@ "migration-safe": true } ], - "id": "libvirt-33" + "id": "libvirt-34" } { "execute": "query-tpm-models", - "id": "libvirt-34" + "id": "libvirt-35" } { "return": [ ], - "id": "libvirt-34" + "id": "libvirt-35" } { "execute": "query-tpm-types", - "id": "libvirt-35" + "id": "libvirt-36" } { "return": [ "emulator" ], - "id": "libvirt-35" + "id": "libvirt-36" } { "execute": "query-command-line-options", - "id": "libvirt-36" + "id": "libvirt-37" } { @@ -5584,12 +5642,12 @@ "option": "drive" } ], - "id": "libvirt-36" + "id": "libvirt-37" } { "execute": "query-migrate-capabilities", - "id": "libvirt-37" + "id": "libvirt-38" } { @@ -5651,12 +5709,12 @@ "capability": "dirty-bitmaps" } ], - "id": "libvirt-37" + "id": "libvirt-38" } { "execute": "query-qmp-schema", - "id": "libvirt-38" + "id": "libvirt-39" } { @@ -17024,7 +17082,7 @@ 
"meta-type": "object" } ], - "id": "libvirt-38" + "id": "libvirt-39" } { @@ -17035,7 +17093,7 @@ "name": "host" } }, - "id": "libvirt-39" + "id": "libvirt-40" } { @@ -17073,7 +17131,7 @@ } } }, - "id": "libvirt-39" + "id": "libvirt-40" } { @@ -17087,11 +17145,11 @@ } } }, - "id": "libvirt-40" + "id": "libvirt-41" } { - "id": "libvirt-40", + "id": "libvirt-41", "error": { "class": "GenericError", "desc": "Property '.migratable' not found" diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml index 7fa50bdfaa..2082b1f6d3 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml @@ -130,9 +130,10 @@ <flag name='egl-headless'/> <flag name='vfio-pci.display'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>2012000</version> <kvmVersion>0</kvmVersion> - <microcodeVersion>374287</microcodeVersion> + <microcodeVersion>375102</microcodeVersion> <package></package> <arch>s390x</arch> <hostCPU type='kvm' model='z14-base' migratability='no'> diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.replies b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.replies index e0b6d2f937..675b85b43d 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.replies +++ b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.replies @@ -4816,13 +4816,71 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": "memory-backend-memfd" }, "id": "libvirt-36" } { - "id": "libvirt-36", + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + }, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + 
"type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-36" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-37" +} + +{ + "id": "libvirt-37", "error": { "class": "DeviceNotFound", "desc": "Class 'spapr-machine' not found" @@ -4831,7 +4889,7 @@ { "execute": "query-machines", - "id": "libvirt-37" + "id": "libvirt-38" } { @@ -5030,12 +5088,12 @@ "cpu-max": 255 } ], - "id": "libvirt-37" + "id": "libvirt-38" } { "execute": "query-cpu-definitions", - "id": "libvirt-38" + "id": "libvirt-39" } { @@ -5549,12 +5607,12 @@ "migration-safe": true } ], - "id": "libvirt-38" + "id": "libvirt-39" } { "execute": "query-tpm-models", - "id": "libvirt-39" + "id": "libvirt-40" } { @@ -5562,12 +5620,12 @@ "tpm-crb", "tpm-tis" ], - "id": "libvirt-39" + "id": "libvirt-40" } { "execute": "query-tpm-types", - "id": "libvirt-40" + "id": "libvirt-41" } { @@ -5575,12 +5633,12 @@ "passthrough", "emulator" ], - "id": "libvirt-40" + "id": "libvirt-41" } { "execute": "query-command-line-options", - "id": "libvirt-41" + "id": "libvirt-42" } { @@ -6867,12 +6925,12 @@ "option": "drive" } ], - "id": "libvirt-41" + "id": "libvirt-42" } { "execute": "query-migrate-capabilities", - "id": "libvirt-42" + "id": "libvirt-43" } { @@ -6934,12 +6992,12 @@ "capability": "dirty-bitmaps" } ], - "id": "libvirt-42" + "id": "libvirt-43" } { "execute": "query-qmp-schema", - "id": "libvirt-43" + "id": "libvirt-44" } { @@ -18307,7 +18365,7 @@ "meta-type": "object" } ], - "id": "libvirt-43" + "id": "libvirt-44" } { @@ -18318,7 +18376,7 @@ "name": "host" } }, - "id": "libvirt-44" + "id": "libvirt-45" } { @@ -18508,7 +18566,7 @@ } } }, - "id": "libvirt-44" + "id": "libvirt-45" } { @@ -18700,7 +18758,7 @@ } } }, - "id": "libvirt-45" + "id": "libvirt-46" } { @@ -18955,7 +19013,7 @@ } } }, - "id": "libvirt-45" + "id": "libvirt-46" } { @@ -18969,7 +19027,7 @@ } } }, - "id": "libvirt-46" + 
"id": "libvirt-47" } { @@ -19159,7 +19217,7 @@ } } }, - "id": "libvirt-46" + "id": "libvirt-47" } { @@ -19351,7 +19409,7 @@ } } }, - "id": "libvirt-47" + "id": "libvirt-48" } { @@ -19606,12 +19664,12 @@ } } }, - "id": "libvirt-47" + "id": "libvirt-48" } { "execute": "query-sev-capabilities", - "id": "libvirt-48" + "id": "libvirt-49" } { @@ -19621,7 +19679,7 @@ "cert-chain": "AQAAAAAOAAAAQAAAAAOAAAAQAAAAAOAAAAQAAAAAOAAAAQAAAAAOAAA", "pdh": "AQAAAAAOAAAAQAAAAAOAAAAQAAAAAOAAAAQAAAAAOAAAAQAAAAAOAAAAQAAAAAOAAA" }, - "id": "libvirt-48" + "id": "libvirt-49" } { diff --git a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml index b3f6bdc302..f2a63d7a88 100644 --- a/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml @@ -205,9 +205,10 @@ <flag name='egl-headless'/> <flag name='vfio-pci.display'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>2011090</version> <kvmVersion>0</kvmVersion> - <microcodeVersion>413556</microcodeVersion> + <microcodeVersion>414371</microcodeVersion> <package>v2.12.0-rc0</package> <arch>x86_64</arch> <hostCPU type='kvm' model='base' migratability='yes'> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.replies b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.replies index eb57c77a90..aff01371a3 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.replies +++ b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.replies @@ -5541,11 +5541,69 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": "memory-backend-memfd" }, "id": "libvirt-32" } +{ + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + }, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", 
+ "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-32" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-33" +} + { "return": [ { @@ -5714,12 +5772,12 @@ "type": "bool" } ], - "id": "libvirt-32" + "id": "libvirt-33" } { "execute": "query-machines", - "id": "libvirt-33" + "id": "libvirt-34" } { @@ -5862,12 +5920,12 @@ "cpu-max": 1 } ], - "id": "libvirt-33" + "id": "libvirt-34" } { "execute": "query-cpu-definitions", - "id": "libvirt-34" + "id": "libvirt-35" } { @@ -8063,35 +8121,35 @@ "static": false } ], - "id": "libvirt-34" + "id": "libvirt-35" } { "execute": "query-tpm-models", - "id": "libvirt-35" + "id": "libvirt-36" } { "return": [ ], - "id": "libvirt-35" + "id": "libvirt-36" } { "execute": "query-tpm-types", - "id": "libvirt-36" + "id": "libvirt-37" } { "return": [ "emulator" ], - "id": "libvirt-36" + "id": "libvirt-37" } { "execute": "query-command-line-options", - "id": "libvirt-37" + "id": "libvirt-38" } { @@ -9221,12 +9279,12 @@ "option": "drive" } ], - "id": "libvirt-37" + "id": "libvirt-38" } { "execute": "query-migrate-capabilities", - "id": "libvirt-38" + "id": "libvirt-39" } { @@ -9296,12 +9354,12 @@ "capability": "late-block-activate" } ], - "id": "libvirt-38" + "id": "libvirt-39" } { "execute": "query-qmp-schema", - "id": "libvirt-39" + "id": "libvirt-40" } { @@ -21460,7 +21518,7 @@ "meta-type": "object" } ], - "id": "libvirt-39" + "id": "libvirt-40" } { diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml index f3df19fc33..c901c671a5 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml @@ -162,9 +162,10 @@ <flag name='egl-headless'/> <flag name='vfio-pci.display'/> <flag 
name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>2012050</version> <kvmVersion>0</kvmVersion> - <microcodeVersion>444131</microcodeVersion> + <microcodeVersion>444946</microcodeVersion> <package>v2.12.0-1689-g518d23a</package> <arch>ppc64</arch> <cpu type='kvm' name='default'/> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.replies b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.replies index 3e8d136a32..663b4a49c0 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.replies +++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.replies @@ -1734,13 +1734,71 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": "memory-backend-memfd" }, "id": "libvirt-27" } { - "id": "libvirt-27", + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + }, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-27" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-28" +} + +{ + "id": "libvirt-28", "error": { "class": "DeviceNotFound", "desc": "Class 'spapr-machine' not found" @@ -1749,7 +1807,7 @@ { "execute": "query-machines", - "id": "libvirt-28" + "id": "libvirt-29" } { @@ -1786,23 +1844,23 @@ "cpu-max": 1 } ], - "id": "libvirt-28" + "id": "libvirt-29" } { "execute": "query-tpm-models", - "id": "libvirt-29" + "id": "libvirt-30" } { "return": [ ], - "id": "libvirt-29" + "id": "libvirt-30" } { "execute": "query-tpm-types", - "id": "libvirt-30" + "id": "libvirt-31" } { @@ -1810,12 +1868,12 @@ 
"passthrough", "emulator" ], - "id": "libvirt-30" + "id": "libvirt-31" } { "execute": "query-command-line-options", - "id": "libvirt-31" + "id": "libvirt-32" } { @@ -2940,12 +2998,12 @@ "option": "drive" } ], - "id": "libvirt-31" + "id": "libvirt-32" } { "execute": "query-migrate-capabilities", - "id": "libvirt-32" + "id": "libvirt-33" } { @@ -3015,12 +3073,12 @@ "capability": "late-block-activate" } ], - "id": "libvirt-32" + "id": "libvirt-33" } { "execute": "query-qmp-schema", - "id": "libvirt-33" + "id": "libvirt-34" } { @@ -14695,5 +14753,5 @@ "meta-type": "object" } ], - "id": "libvirt-33" + "id": "libvirt-34" } diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml index bf04826ad0..2e1530e0eb 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml @@ -102,6 +102,7 @@ <flag name='tpm-emulator'/> <flag name='egl-headless'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>3000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.replies b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.replies index 3631193566..cc66c232ab 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.replies +++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.replies @@ -1734,13 +1734,71 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": "memory-backend-memfd" }, "id": "libvirt-27" } { - "id": "libvirt-27", + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + }, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { 
+ "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-27" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-28" +} + +{ + "id": "libvirt-28", "error": { "class": "DeviceNotFound", "desc": "Class 'spapr-machine' not found" @@ -1749,7 +1807,7 @@ { "execute": "query-machines", - "id": "libvirt-28" + "id": "libvirt-29" } { @@ -1786,23 +1844,23 @@ "cpu-max": 1 } ], - "id": "libvirt-28" + "id": "libvirt-29" } { "execute": "query-tpm-models", - "id": "libvirt-29" + "id": "libvirt-30" } { "return": [ ], - "id": "libvirt-29" + "id": "libvirt-30" } { "execute": "query-tpm-types", - "id": "libvirt-30" + "id": "libvirt-31" } { @@ -1810,12 +1868,12 @@ "passthrough", "emulator" ], - "id": "libvirt-30" + "id": "libvirt-31" } { "execute": "query-command-line-options", - "id": "libvirt-31" + "id": "libvirt-32" } { @@ -2940,12 +2998,12 @@ "option": "drive" } ], - "id": "libvirt-31" + "id": "libvirt-32" } { "execute": "query-migrate-capabilities", - "id": "libvirt-32" + "id": "libvirt-33" } { @@ -3015,12 +3073,12 @@ "capability": "late-block-activate" } ], - "id": "libvirt-32" + "id": "libvirt-33" } { "execute": "query-qmp-schema", - "id": "libvirt-33" + "id": "libvirt-34" } { @@ -14695,5 +14753,5 @@ "meta-type": "object" } ], - "id": "libvirt-33" + "id": "libvirt-34" } diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml index b5f8f94052..4f54dc0217 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml @@ -102,6 +102,7 @@ <flag name='tpm-emulator'/> <flag name='egl-headless'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>3000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git 
a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.replies b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.replies index 17edb990e1..f5bbe5c650 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.replies +++ b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.replies @@ -4928,13 +4928,71 @@ { "execute": "qom-list-properties", "arguments": { - "typename": "spapr-machine" + "typename": "memory-backend-memfd" }, "id": "libvirt-36" } { - "id": "libvirt-36", + "return": [ + { + "name": "policy", + "type": "HostMemPolicy" + }, + { + "name": "share", + "type": "bool" + }, + { + "name": "host-nodes", + "type": "int" + }, + { + "name": "prealloc", + "type": "bool" + }, + { + "name": "dump", + "type": "bool" + }, + { + "name": "size", + "type": "int" + }, + { + "name": "merge", + "type": "bool" + }, + { + "name": "seal", + "type": "bool" + }, + { + "name": "hugetlbsize", + "type": "int" + }, + { + "name": "hugetlb", + "type": "bool" + }, + { + "name": "type", + "type": "string" + } + ], + "id": "libvirt-36" +} + +{ + "execute": "qom-list-properties", + "arguments": { + "typename": "spapr-machine" + }, + "id": "libvirt-37" +} + +{ + "id": "libvirt-37", "error": { "class": "DeviceNotFound", "desc": "Class 'spapr-machine' not found" @@ -4943,7 +5001,7 @@ { "execute": "query-machines", - "id": "libvirt-37" + "id": "libvirt-38" } { @@ -5152,12 +5210,12 @@ "cpu-max": 255 } ], - "id": "libvirt-37" + "id": "libvirt-38" } { "execute": "query-cpu-definitions", - "id": "libvirt-38" + "id": "libvirt-39" } { @@ -5594,12 +5652,12 @@ "migration-safe": true } ], - "id": "libvirt-38" + "id": "libvirt-39" } { "execute": "query-tpm-models", - "id": "libvirt-39" + "id": "libvirt-40" } { @@ -5607,12 +5665,12 @@ "tpm-crb", "tpm-tis" ], - "id": "libvirt-39" + "id": "libvirt-40" } { "execute": "query-tpm-types", - "id": "libvirt-40" + "id": "libvirt-41" } { @@ -5620,12 +5678,12 @@ "passthrough", "emulator" ], - "id": "libvirt-40" + "id": "libvirt-41" } { "execute": "query-command-line-options", - "id": 
"libvirt-41" + "id": "libvirt-42" } { @@ -6924,12 +6982,12 @@ "option": "drive" } ], - "id": "libvirt-41" + "id": "libvirt-42" } { "execute": "query-migrate-capabilities", - "id": "libvirt-42" + "id": "libvirt-43" } { @@ -6999,12 +7057,12 @@ "capability": "late-block-activate" } ], - "id": "libvirt-42" + "id": "libvirt-43" } { "execute": "query-qmp-schema", - "id": "libvirt-43" + "id": "libvirt-44" } { @@ -18884,7 +18942,7 @@ "meta-type": "object" } ], - "id": "libvirt-43" + "id": "libvirt-44" } { @@ -18895,7 +18953,7 @@ "name": "host" } }, - "id": "libvirt-44" + "id": "libvirt-45" } { @@ -19088,7 +19146,7 @@ } } }, - "id": "libvirt-44" + "id": "libvirt-45" } { @@ -19283,7 +19341,7 @@ } } }, - "id": "libvirt-45" + "id": "libvirt-46" } { @@ -19546,7 +19604,7 @@ } } }, - "id": "libvirt-45" + "id": "libvirt-46" } { @@ -19560,7 +19618,7 @@ } } }, - "id": "libvirt-46" + "id": "libvirt-47" } { @@ -19753,7 +19811,7 @@ } } }, - "id": "libvirt-46" + "id": "libvirt-47" } { @@ -19948,7 +20006,7 @@ } } }, - "id": "libvirt-47" + "id": "libvirt-48" } { @@ -20211,16 +20269,16 @@ } } }, - "id": "libvirt-47" + "id": "libvirt-48" } { "execute": "query-sev-capabilities", - "id": "libvirt-48" + "id": "libvirt-49" } { - "id": "libvirt-48", + "id": "libvirt-49", "error": { "class": "GenericError", "desc": "SEV feature is not available" diff --git a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml index 2f22fd10b6..67e490dcdf 100644 --- a/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml @@ -207,9 +207,10 @@ <flag name='egl-headless'/> <flag name='vfio-pci.display'/> <flag name='memory-backend-memfd'/> + <flag name='memory-backend-memfd.hugetlb'/> <version>3000000</version> <kvmVersion>0</kvmVersion> - <microcodeVersion>425157</microcodeVersion> + <microcodeVersion>425972</microcodeVersion> <package>v3.0.0</package> <arch>x86_64</arch> <hostCPU type='kvm' model='base' 
migratability='yes'> -- 2.19.0

From: Marc-André Lureau <marcandre.lureau@redhat.com> Add a new memoryBacking source type "memfd", supported by QEMU (when the apability is available). A memfd is a specialized anonymous memory kind. As such, an anonymous source type could be automatically using a memfd. However, there are some complications when migrating from different memory backends in qemu (mainly due to the internal object naming at this point, but there could be more). For now, it is simpler and safer to simply introduce a new source type "memfd". Eventually, the "anonymous" type could learn to use memfd transparently in a seperate change. The main benefits are that it doesn't need to create filesystem files, and it also enforces sealing, providing a bit more safety. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- docs/formatdomain.html.in | 9 +-- docs/schemas/domaincommon.rng | 1 + src/conf/domain_conf.c | 3 +- src/conf/domain_conf.h | 1 + src/qemu/qemu_command.c | 69 +++++++++++++------ src/qemu/qemu_domain.c | 12 +++- .../memfd-memory-numa.x86_64-latest.args | 34 +++++++++ tests/qemuxml2argvdata/memfd-memory-numa.xml | 36 ++++++++++ tests/qemuxml2argvtest.c | 2 + 9 files changed, 140 insertions(+), 27 deletions(-) create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 1f12ab5b42..eeee1f6d40 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1099,7 +1099,7 @@ </hugepages> <nosharepages/> <locked/> - <source type="file|anonymous"/> + <source type="file|anonymous|memfd"/> <access mode="shared|private"/> <allocation mode="immediate|ondemand"/> <discard/> @@ -1150,9 +1150,10 @@ suitable for the specific environment at the same time to mitigate the risks described above. 
<span class="since">Since 1.0.6</span></dd> <dt><code>source</code></dt> - <dd>Using the <code>type</code> attribute, it's possible to provide - "file" to utilize file memorybacking or keep the default - "anonymous".</dd> + <dd>Using the <code>type</code> attribute, it's possible to + provide "file" to utilize file memorybacking or keep the + default "anonymous". <span class="since">Since 4.8.0</span>, + you may choose "memfd" backing. (QEMU/KVM only)</dd> <dt><code>access</code></dt> <dd>Using the <code>mode</code> attribute, specify if the memory is to be "shared" or "private". This can be overridden per numa node by diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 099a949cf8..4b431b4188 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -655,6 +655,7 @@ <choice> <value>file</value> <value>anonymous</value> + <value>memfd</value> </choice> </attribute> </element> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 1ee43950ae..648015b5b5 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -894,7 +894,8 @@ VIR_ENUM_IMPL(virDomainDiskMirrorState, VIR_DOMAIN_DISK_MIRROR_STATE_LAST, VIR_ENUM_IMPL(virDomainMemorySource, VIR_DOMAIN_MEMORY_SOURCE_LAST, "none", "file", - "anonymous") + "anonymous", + "memfd") VIR_ENUM_IMPL(virDomainMemoryAllocation, VIR_DOMAIN_MEMORY_ALLOCATION_LAST, "none", diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index e30a4b2fe7..00dacef3af 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -607,6 +607,7 @@ typedef enum { VIR_DOMAIN_MEMORY_SOURCE_NONE = 0, /* No memory source defined */ VIR_DOMAIN_MEMORY_SOURCE_FILE, /* Memory source is set as file */ VIR_DOMAIN_MEMORY_SOURCE_ANONYMOUS, /* Memory source is set as anonymous */ + VIR_DOMAIN_MEMORY_SOURCE_MEMFD, /* Memory source is set as memfd */ VIR_DOMAIN_MEMORY_SOURCE_LAST, } virDomainMemorySource; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 
0a353f87ba..b7481b622d 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3114,6 +3114,26 @@ qemuBuildControllerDevCommandLine(virCommandPtr cmd, } +static int +qemuBuildMemoryBackendPropsShare(virJSONValuePtr props, + virDomainMemoryAccess memAccess) +{ + switch (memAccess) { + case VIR_DOMAIN_MEMORY_ACCESS_SHARED: + return virJSONValueObjectAdd(props, "b:share", true, NULL); + + case VIR_DOMAIN_MEMORY_ACCESS_PRIVATE: + return virJSONValueObjectAdd(props, "b:share", false, NULL); + + case VIR_DOMAIN_MEMORY_ACCESS_DEFAULT: + case VIR_DOMAIN_MEMORY_ACCESS_LAST: + break; + } + + return 0; +} + + /** * qemuBuildMemoryBackendProps: * @backendProps: [out] constructed object @@ -3133,7 +3153,7 @@ qemuBuildControllerDevCommandLine(virCommandPtr cmd, * configuration value of 1 is returned. This behaviour can be suppressed by * setting @force to true in which case 0 would be returned. * - * Then, if one of the two memory-backend-* should be used, the @qemuCaps is + * Then, if one of the three memory-backend-* should be used, the @qemuCaps is * consulted to check if qemu does support it. 
* * Returns: 0 on success, @@ -3259,7 +3279,19 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps, if (!(props = virJSONValueNewObject())) return -1; - if (useHugepage || mem->nvdimmPath || memAccess || + if (def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_MEMFD) { + backendType = "memory-backend-memfd"; + + if (useHugepage && + (virJSONValueObjectAdd(props, "b:hugetlb", useHugepage, NULL) < 0 || + virJSONValueObjectAdd(props, "U:hugetlbsize", pagesize << 10, NULL) < 0)) { + goto cleanup; + } + + if (qemuBuildMemoryBackendPropsShare(props, memAccess) < 0) + goto cleanup; + + } else if (useHugepage || mem->nvdimmPath || memAccess || def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_FILE) { if (mem->nvdimmPath) { @@ -3297,21 +3329,8 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps, goto cleanup; } - switch (memAccess) { - case VIR_DOMAIN_MEMORY_ACCESS_SHARED: - if (virJSONValueObjectAdd(props, "b:share", true, NULL) < 0) - goto cleanup; - break; - - case VIR_DOMAIN_MEMORY_ACCESS_PRIVATE: - if (virJSONValueObjectAdd(props, "b:share", false, NULL) < 0) - goto cleanup; - break; - - case VIR_DOMAIN_MEMORY_ACCESS_DEFAULT: - case VIR_DOMAIN_MEMORY_ACCESS_LAST: - break; - } + if (qemuBuildMemoryBackendPropsShare(props, memAccess) < 0) + goto cleanup; } else { backendType = "memory-backend-ram"; } @@ -3341,7 +3360,9 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps, if (!needHugepage && !mem->sourceNodes && !nodeSpecified && !mem->nvdimmPath && memAccess == VIR_DOMAIN_MEMORY_ACCESS_DEFAULT && - def->mem.source != VIR_DOMAIN_MEMORY_SOURCE_FILE && !force) { + def->mem.source != VIR_DOMAIN_MEMORY_SOURCE_FILE && + def->mem.source != VIR_DOMAIN_MEMORY_SOURCE_MEMFD && + !force) { /* report back that using the new backend is not necessary * to achieve the desired configuration */ ret = 1; @@ -3359,6 +3380,12 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps, _("this qemu doesn't support the " "memory-backend-ram object")); goto cleanup; + } else if 
(STREQ(backendType, "memory-backend-memfd") && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("this qemu doesn't support the " + "memory-backend-memfd object")); + goto cleanup; } ret = 0; @@ -7567,7 +7594,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, if (virDomainNumatuneHasPerNodeBinding(def->numa) && !(virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) || - virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE))) { + virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE) || + virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD))) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("Per-node memory binding is not supported " "with this QEMU")); @@ -7593,7 +7621,8 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg, * need to check which approach to use */ for (i = 0; i < ncells; i++) { if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM) || - virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE)) { + virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE) || + virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD)) { if ((rc = qemuBuildMemoryCellBackendStr(def, cfg, i, priv, &nodeBackends[i])) < 0) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 2fd8a2a268..4983669a34 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -3949,7 +3949,8 @@ qemuDomainDefValidateFeatures(const virDomainDef *def, static int -qemuDomainDefValidateMemory(const virDomainDef *def) +qemuDomainDefValidateMemory(const virDomainDef *def, + virQEMUCapsPtr qemuCaps) { const long system_page_size = virGetSystemPageSizeKB(); const virDomainMemtune *mem = &def->mem; @@ -3971,6 +3972,13 @@ qemuDomainDefValidateMemory(const virDomainDef *def) return -1; } + if (mem->source == VIR_DOMAIN_MEMORY_SOURCE_MEMFD && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("hugepages is not support with memfd memory
source")); + return -1; + } + /* We can't guarantee any other mem.access * if no guest NUMA nodes are defined. */ if (mem->hugepages[0].size != system_page_size && @@ -4110,7 +4118,7 @@ qemuDomainDefValidate(const virDomainDef *def, if (qemuDomainDefValidateFeatures(def, qemuCaps) < 0) goto cleanup; - if (qemuDomainDefValidateMemory(def) < 0) + if (qemuDomainDefValidateMemory(def, qemuCaps) < 0) goto cleanup; ret = 0; diff --git a/tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args b/tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args new file mode 100644 index 0000000000..d0f4057e01 --- /dev/null +++ b/tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args @@ -0,0 +1,34 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-x86_64 \ +-name guest=instance-00000092,debug-threads=on \ +-S \ +-object secret,id=masterKey0,format=raw,\ +file=/tmp/lib/domain--1-instance-00000092/master-key.aes \ +-machine pc-i440fx-wily,accel=kvm,usb=off,dump-guest-core=off \ +-m 14336 \ +-mem-prealloc \ +-realtime mlock=off \ +-smp 8,sockets=1,cores=8,threads=1 \ +-object memory-backend-memfd,id=ram-node0,hugetlb=yes,hugetlbsize=2097152,\ +share=yes,size=15032385536,host-nodes=3,policy=preferred \ +-numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \ +-uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=1729,server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-boot strict=on \ +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x2 \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,\ +resourcecontrol=deny \ +-msg timestamp=on diff --git a/tests/qemuxml2argvdata/memfd-memory-numa.xml b/tests/qemuxml2argvdata/memfd-memory-numa.xml new file mode 100644 index 0000000000..8416a990fa --- /dev/null +++ 
b/tests/qemuxml2argvdata/memfd-memory-numa.xml @@ -0,0 +1,36 @@ + <domain type='kvm' id='56'> + <name>instance-00000092</name> + <uuid>126f2720-6f8e-45ab-a886-ec9277079a67</uuid> + <memory unit='KiB'>14680064</memory> + <currentMemory unit='KiB'>14680064</currentMemory> + <memoryBacking> + <hugepages> + <page size="2" unit="M"/> + </hugepages> + <source type='memfd'/> + <access mode='shared'/> + <allocation mode='immediate'/> + </memoryBacking> + <numatune> + <memnode cellid='0' mode='preferred' nodeset='3'/> + </numatune> + <vcpu placement='static'>8</vcpu> + <os> + <type arch='x86_64' machine='pc-i440fx-wily'>hvm</type> + <boot dev='hd'/> + </os> + <cpu> + <topology sockets='1' cores='8' threads='1'/> + <numa> + <cell id='0' cpus='0-7' memory='14680064' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <memballoon model='virtio'/> + </devices> + </domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 3d84cb346a..79938a973c 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -2945,6 +2945,8 @@ mymain(void) DO_TEST("fd-memory-no-numa-topology", QEMU_CAPS_OBJECT_MEMORY_FILE, QEMU_CAPS_KVM); + DO_TEST_CAPS_LATEST("memfd-memory-numa"); + DO_TEST("cpu-check-none", QEMU_CAPS_KVM); DO_TEST("cpu-check-partial", QEMU_CAPS_KVM); DO_TEST("cpu-check-full", QEMU_CAPS_KVM); -- 2.19.0
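The numbers in the expected command line follow from the test XML by unit conversion. The patch passes `pagesize << 10` as `hugetlbsize`, which suggests the page size is held in KiB at that point, and the guest memory is given in KiB as well. A minimal sketch of that arithmetic, with hypothetical helper names that are not part of libvirt:

```c
#include <stdint.h>

/* Hypothetical helper, not a libvirt function: convert a KiB count
 * (the unit used in the test XML) to bytes. A left shift by 10 is a
 * multiplication by 1024. */
uint64_t kib_to_bytes(uint64_t kib)
{
    return kib << 10;
}

/* Hypothetical helper: convert a KiB count to MiB, the unit that
 * appears after -m on the QEMU command line. */
uint64_t kib_to_mib(uint64_t kib)
{
    return kib >> 10;
}
```

With these, a 2 MiB page (2048 KiB) gives `hugetlbsize=2097152`, and 14680064 KiB of guest memory gives `size=15032385536` and `-m 14336`, matching the values in the `.args` file.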

On 09/17/2018 09:14 AM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Add a new memoryBacking source type "memfd", supported by QEMU (when the apability is available).
*capability
A memfd is a specialized anonymous memory kind. As such, an anonymous source type could be automatically using a memfd. However, there are some complications when migrating from different memory backends in qemu (mainly due to the internal object naming at this point, but there could be more). For now, it is simpler and safer to simply introduce a new source type "memfd". Eventually, the "anonymous" type could learn to use memfd transparently in a seperate change.
*separate
The main benefits are that it doesn't need to create filesystem files, and it also enforces sealing, providing a bit more safety.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- docs/formatdomain.html.in | 9 +-- docs/schemas/domaincommon.rng | 1 + src/conf/domain_conf.c | 3 +- src/conf/domain_conf.h | 1 + src/qemu/qemu_command.c | 69 +++++++++++++------ src/qemu/qemu_domain.c | 12 +++- .../memfd-memory-numa.x86_64-latest.args | 34 +++++++++ tests/qemuxml2argvdata/memfd-memory-numa.xml | 36 ++++++++++ tests/qemuxml2argvtest.c | 2 + 9 files changed, 140 insertions(+), 27 deletions(-) create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml
More recently I've been trying to enforce separating XML/conf/rng/docs changes from qemu/args changes... This makes review and testing a bit easier and more "restricted". Since I didn't make it clear previously and I can split things up - no problem.
I'll also be adding a "qemuxml2xmltest" for the input file to "prove" it generates the output. It'll of course need to add the QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB to the DO_TEST. Adding xml2xmltest is something required when we add new attributes or input options. I'll split the commit message appropriately too.
BTW: I think if "someone" follows this up with moving the qemu_command logic into a new qemuDomainPrepare* method, then I think we can separate the "new" or "fresh" start from the migration start and thus might be able to generate a mechanism that would use memfd for anonymous with the right capabilities present. Not sure it'll fly, but it may be worth a shot. It's getting more and more painful to be stuck with "old stuff".
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 1f12ab5b42..eeee1f6d40 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1099,7 +1099,7 @@ </hugepages> <nosharepages/> <locked/> - <source type="file|anonymous"/> + <source type="file|anonymous|memfd"/> <access mode="shared|private"/> <allocation mode="immediate|ondemand"/> <discard/> @@ -1150,9 +1150,10 @@ suitable for the specific environment at the same time to mitigate the risks described above. <span class="since">Since 1.0.6</span></dd> <dt><code>source</code></dt> - <dd>Using the <code>type</code> attribute, it's possible to provide - "file" to utilize file memorybacking or keep the default - "anonymous".</dd> + <dd>Using the <code>type</code> attribute, it's possible to + provide "file" to utilize file memorybacking or keep the + default "anonymous". <span class="since">Since 4.8.0</span>, + you may choose "memfd" backing. (QEMU/KVM only)</dd>
Need to keep format consistent, I'll adjust.
<dt><code>access</code></dt> <dd>Using the <code>mode</code> attribute, specify if the memory is to be "shared" or "private". This can be overridden per numa node by diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 099a949cf8..4b431b4188 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -655,6 +655,7 @@ <choice> <value>file</value> <value>anonymous</value> + <value>memfd</value> </choice> </attribute> </element> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 1ee43950ae..648015b5b5 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -894,7 +894,8 @@ VIR_ENUM_IMPL(virDomainDiskMirrorState, VIR_DOMAIN_DISK_MIRROR_STATE_LAST, VIR_ENUM_IMPL(virDomainMemorySource, VIR_DOMAIN_MEMORY_SOURCE_LAST, "none", "file", - "anonymous") + "anonymous", + "memfd")
syntax-check would tell you thou shalt not use tabs
VIR_ENUM_IMPL(virDomainMemoryAllocation, VIR_DOMAIN_MEMORY_ALLOCATION_LAST, "none",
[...]
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 2fd8a2a268..4983669a34 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -3949,7 +3949,8 @@ qemuDomainDefValidateFeatures(const virDomainDef *def,
 static int
-qemuDomainDefValidateMemory(const virDomainDef *def)
+qemuDomainDefValidateMemory(const virDomainDef *def,
+                            virQEMUCapsPtr qemuCaps)
 {
     const long system_page_size = virGetSystemPageSizeKB();
     const virDomainMemtune *mem = &def->mem;
@@ -3971,6 +3972,13 @@ qemuDomainDefValidateMemory(const virDomainDef *def)
         return -1;
     }
+    if (mem->source == VIR_DOMAIN_MEMORY_SOURCE_MEMFD &&
+        !virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("hugepages is not support with memfd memory source"));
_("hugepages are not supported using memfd memory " "source with this version of QEMU"));
+        return -1;
+    }
+
     /* We can't guarantee any other mem.access
      * if no guest NUMA nodes are defined. */
     if (mem->hugepages[0].size != system_page_size &&
@@ -4110,7 +4118,7 @@ qemuDomainDefValidate(const virDomainDef *def,
     if (qemuDomainDefValidateFeatures(def, qemuCaps) < 0)
         goto cleanup;
-    if (qemuDomainDefValidateMemory(def) < 0)
+    if (qemuDomainDefValidateMemory(def, qemuCaps) < 0)
         goto cleanup;
     ret = 0;
[...]
diff --git a/tests/qemuxml2argvdata/memfd-memory-numa.xml b/tests/qemuxml2argvdata/memfd-memory-numa.xml new file mode 100644 index 0000000000..8416a990fa --- /dev/null +++ b/tests/qemuxml2argvdata/memfd-memory-numa.xml
I don't recall from the original change, but each of the lines is prefixed by 2 extra spaces... I'll fix before pushing.
I can fixup the nits noted. I'll wait until tomorrow before pushing so that if Michal or Pavel wish to comment they can...
Reviewed-by: John Ferlan <jferlan@redhat.com>
John
@@ -0,0 +1,36 @@ + <domain type='kvm' id='56'> + <name>instance-00000092</name> + <uuid>126f2720-6f8e-45ab-a886-ec9277079a67</uuid> + <memory unit='KiB'>14680064</memory> + <currentMemory unit='KiB'>14680064</currentMemory> + <memoryBacking> + <hugepages> + <page size="2" unit="M"/> + </hugepages> + <source type='memfd'/> + <access mode='shared'/> + <allocation mode='immediate'/> + </memoryBacking> + <numatune> + <memnode cellid='0' mode='preferred' nodeset='3'/> + </numatune> + <vcpu placement='static'>8</vcpu> + <os> + <type arch='x86_64' machine='pc-i440fx-wily'>hvm</type> + <boot dev='hd'/> + </os> + <cpu> + <topology sockets='1' cores='8' threads='1'/> + <numa> + <cell id='0' cpus='0-7' memory='14680064' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <memballoon model='virtio'/> + </devices> + </domain>
[...]
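The validation discussed in this review can be sketched in isolation. The following is a simplified stand-in, not libvirt code: the enum, the function name and the boolean capability parameter are hypothetical. It also adds one assumption that the quoted hunk does not contain, namely that the error is only raised when hugepages are actually requested. The message text is the wording suggested in the review.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for virDomainMemorySource. */
typedef enum {
    MEM_SOURCE_NONE = 0,
    MEM_SOURCE_FILE,
    MEM_SOURCE_ANONYMOUS,
    MEM_SOURCE_MEMFD
} mem_source;

/* Returns NULL when the configuration is acceptable, or an error
 * message otherwise. The hugepages_requested condition is an
 * assumption of this sketch; the quoted hunk tests only the source
 * type and the capability. */
const char *validate_memfd_hugepages(mem_source source,
                                     bool hugepages_requested,
                                     bool cap_memfd_hugetlb)
{
    if (source == MEM_SOURCE_MEMFD &&
        hugepages_requested &&
        !cap_memfd_hugetlb) {
        return "hugepages are not supported using memfd memory "
               "source with this version of QEMU";
    }
    return NULL;
}
```

Under this assumption a memfd domain without hugepages would still start on a QEMU that has memory-backend-memfd but lacks the hugetlb property.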

On 09/17/2018 03:14 PM, marcandre.lureau@redhat.com wrote:
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 1f12ab5b42..eeee1f6d40 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1099,7 +1099,7 @@ </hugepages> <nosharepages/> <locked/> - <source type="file|anonymous"/> + <source type="file|anonymous|memfd"/>
I'm sorry but I do not think this is the way we should go. This effectively avoids libvirt making the decision and exposes the backend used directly. This puts unnecessary burden on mgmt applications because they have to make yet another decision (track another domain attribute).
IIUC, memfd is like memory-backend-file and -ram combined. It can do hugepages or just plain malloc(). Therefore it should be our first choice for freshly started domains. And only if qemu doesn't support it we should fall back to either -file or -ram backends.
This means we have to track what backend the domain was started with so that we preserve that on migration (although, the fact that these backends are not interchangeable makes me question 'backend' in their name :-P). For that we can use status/migration XML as I suggested earlier.
Once again, status XML is not editable by user [*] and is used solely by libvirtd to store runtime information for a running domain (and backend used falls into that category).
Michal
* - sure, an evil admin could edit the status XML file (which is usually stored under /var/run/libvirt/qemu/$domain.xml) and restart libvirtd to reload the changes. But hey, the file is readable/writable by root only and there are plenty other ways how an evil root could mess up with running domains. We (have to) trust root.
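The selection order proposed in this reply can be sketched as a small decision function. This is only an illustration under stated assumptions: the types, field names and the function are hypothetical and do not exist in libvirt, and the "recorded" backend stands in for whatever would be stored in the status/migration XML.

```c
#include <stdbool.h>

/* Hypothetical backend identifiers; BACKEND_NONE means "nothing
 * recorded" on input and "no suitable backend object" on output. */
typedef enum {
    BACKEND_NONE = 0,
    BACKEND_RAM,
    BACKEND_FILE,
    BACKEND_MEMFD
} mem_backend;

/* Hypothetical capability summary for one QEMU binary. */
typedef struct {
    bool memfd;          /* memory-backend-memfd is available */
    bool memfd_hugetlb;  /* memory-backend-memfd.hugetlb is available */
    bool file;           /* memory-backend-file is available */
    bool ram;            /* memory-backend-ram is available */
} backend_caps;

mem_backend choose_backend(const backend_caps *caps,
                           bool want_hugepages,
                           mem_backend recorded)
{
    /* A running or incoming domain keeps the backend it was started
     * with, since the backends are not interchangeable. */
    if (recorded != BACKEND_NONE)
        return recorded;

    /* Fresh start: prefer memfd when it can satisfy the request. */
    if (caps->memfd && (!want_hugepages || caps->memfd_hugetlb))
        return BACKEND_MEMFD;

    /* Otherwise fall back to the older backends. */
    if (want_hugepages)
        return caps->file ? BACKEND_FILE : BACKEND_NONE;

    return caps->ram ? BACKEND_RAM : BACKEND_NONE;
}
```

The point of the first branch is that the choice is made once, at fresh start, and afterwards treated as runtime state rather than as user configuration.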

On Wed, Sep 19, 2018 at 11:41:11AM +0200, Michal Privoznik wrote:
On 09/17/2018 03:14 PM, marcandre.lureau@redhat.com wrote:
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 1f12ab5b42..eeee1f6d40 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1099,7 +1099,7 @@ </hugepages> <nosharepages/> <locked/> - <source type="file|anonymous"/> + <source type="file|anonymous|memfd"/>
I'm sorry but I do not think this is the way we should go. This effectively avoids libvirt making the decision and exposes the backend used directly. This puts unnecessary burden on mgmt applications because they have to make yet another decision (track another domain attribute).
IIUC, memfd is like memory-backend-file and -ram combined. It can do hugepages or just plain malloc(). Therefore it should be our first choice for freshly started domains. And only if qemu doesn't support it we should fall back to either -file or -ram backends.
This means we have to track what backend the domain was started with so that we preserve that on migration (although, the fact that these backends are not interchangeable makes me question 'backend' in their name :-P). For that we can use status/migration XML as I suggested earlier.
Once again, status XML is not editable by user [*] and is used solely by libvirtd to store runtime information for a running domain (and backend used falls into that category).
I have to agree with Michal, we should not expose this implementation detail in domain XML if we can hide it in status/migratable XML.
One thing about the migration though. I'm not sure what we officially support in libvirt, and this might cause us some issues. We need to make sure that if you live-migrate a domain from old libvirt to new libvirt, you are able to migrate that domain back to old libvirt. The question is whether this still applies if you destroy and start the domain on the new libvirt before you live-migrate it back to old libvirt. Without the restart there is no issue, because the backend would not be changed, but once you start the same domain again we would pick the new backend, which would prevent migrating it back to the old libvirt.
I'm adding Jiri to CC, he should know more.
Pavel
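The round-trip concern can be made concrete with a tiny model. This is a sketch under assumptions, not libvirt behaviour: it supposes that the new host always picks memfd on a fresh start (the automatic selection proposed earlier in the thread) and that the old host does not know memfd at all. All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical backend identifiers. */
typedef enum {
    BACKEND_RAM,
    BACKEND_FILE,
    BACKEND_MEMFD
} mem_backend;

/* Backend the domain runs with on the new host. A live-migrated
 * domain keeps its backend; a domain destroyed and started again is
 * assumed to get memfd. */
mem_backend backend_on_new_host(mem_backend started_with, bool restarted)
{
    return restarted ? BACKEND_MEMFD : started_with;
}

/* Whether a destination can accept a domain using this backend. */
bool dest_supports(mem_backend backend, bool dest_has_memfd)
{
    return backend != BACKEND_MEMFD || dest_has_memfd;
}
```

In this model a domain that arrived by live migration can go back to the old host, while the same domain after a destroy and start on the new host cannot.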

On 09/19/2018 12:02 PM, Pavel Hrdina wrote:
On Wed, Sep 19, 2018 at 11:41:11AM +0200, Michal Privoznik wrote:
On 09/17/2018 03:14 PM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Add a new memoryBacking source type "memfd", supported by QEMU (when the capability is available).
A memfd is a specialized anonymous memory kind. As such, an anonymous source type could be automatically using a memfd. However, there are some complications when migrating from different memory backends in qemu (mainly due to the internal object naming at this point, but there could be more). For now, it is simpler and safer to simply introduce a new source type "memfd". Eventually, the "anonymous" type could learn to use memfd transparently in a separate change.
The main benefits are that it doesn't need to create filesystem files, and it also enforces sealing, providing a bit more safety.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 docs/formatdomain.html.in                     |  9 +--
 docs/schemas/domaincommon.rng                 |  1 +
 src/conf/domain_conf.c                        |  3 +-
 src/conf/domain_conf.h                        |  1 +
 src/qemu/qemu_command.c                       | 69 +++++++++++++------
 src/qemu/qemu_domain.c                        | 12 +++-
 .../memfd-memory-numa.x86_64-latest.args      | 34 +++++++++
 tests/qemuxml2argvdata/memfd-memory-numa.xml  | 36 ++++++++++
 tests/qemuxml2argvtest.c                      |  2 +
 9 files changed, 140 insertions(+), 27 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 1f12ab5b42..eeee1f6d40 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1099,7 +1099,7 @@
     </hugepages>
     <nosharepages/>
     <locked/>
-    <source type="file|anonymous"/>
+    <source type="file|anonymous|memfd"/>
I'm sorry but I do not think this is the way we should go. This effectively avoids libvirt making the decision and exposes the backend used directly. This puts unnecessary burden on mgmt applications because they have to make yet another decision (track another domain attribute).
IIUC, memfd is like memory-backend-file and -ram combined. It can do hugepages or just plain malloc(). Therefore it should be our first choice for freshly started domains. And only if qemu doesn't support it we should fall back to either -file or -ram backends.
This means we have to track what backend the domain was started with so that we preserve that on migration (although, the fact that these backends are not interchangeable makes me question 'backend' in their name :-P). For that we can use status/migration XML as I suggested earlier.
Once again, status XML is not editable by user [*] and is used solely by libvirtd to store runtime information for a running domain (and backend used falls into that category).
I have to agree with Michal, we should not expose this implementation detail in domain XML if we can hide it in status/migratable XML.
One thing about the migration though. I'm not sure what are we officially supporting in libvirt because it might cause us some issues.
We need to make sure that if you live-migrate domain from old libvirt to new libvirt you should be able to migrate that domain back to old libvirt. The question is, whether this applies if you destroy and start the domain on the new libvirt before you live-migrate it back to old libvirt.
Without the restart there is no issue, because the backend would not be changed, but once you start the same domain again we would pick new backend which would prevent migrating it back to the old libvirt.
This is not supported. Exactly because of this reason. If we were to preserve this forward compatibility (i.e. migration from newer libvirt to older) then we couldn't use any new feature qemu has. For instance, if qemu introduces a new device and a user starts a domain with it, migration to older qemu will not work then, obviously. This also applies for other qemu capabilities.
Migrating from newer version to older is not supported. It might work, but that's rather coincidence than intent.
Michal

On Wed, Sep 19, 2018 at 13:10:42 +0200, Michal Privoznik wrote:
On 09/19/2018 12:02 PM, Pavel Hrdina wrote:
One thing about the migration though. I'm not sure what are we officially supporting in libvirt because it might cause us some issues.
We need to make sure that if you live-migrate domain from old libvirt to new libvirt you should be able to migrate that domain back to old libvirt. The question is, whether this applies if you destroy and start the domain on the new libvirt before you live-migrate it back to old libvirt.
Without the restart there is no issue, because the backend would not be changed, but once you start the same domain again we would pick new backend which would prevent migrating it back to the old libvirt.
This is not supported. Exactly because of this reason. If we were to preserve this forward compatibility (i.e. migration from newer libvirt to older) then we couldn't use any new feature qemu has. For instance, if qemu introduces a new device and a user starts a domain with it, migration to older qemu will not work then, obviously. This also applies for other qemu capabilities.
Migrating from newer version to older is not supported. It might work, but that's rather coincidence than intent.
Not really. The key point is whether a user explicitly asked for the new feature. In other words, taking an old XML usable with old libvirt and starting it on new libvirt should result in a domain which can be migrated back to the old version.
Think about oVirt and its transient domains which are always started from scratch using a generated XML. They need to be able to support heterogeneous clusters where some hosts run an old OS while some were already updated to a newer version of the host OS (and libvirt). Both existing and newly started domains should be migratable to any host in the cluster no matter when they were initially started. Unless of course oVirt's cluster level (or what the name for it is) is upgraded, at which point all hosts need to support the new features.
So much for the theory. I think we have bugs which prevent such migration compatibility in some specific cases. I'm not sure whether we intentionally broke this in the past, but I think we should avoid breaking it if possible.
Jirka

On 09/19/2018 01:43 PM, Jiri Denemark wrote:
On Wed, Sep 19, 2018 at 13:10:42 +0200, Michal Privoznik wrote:
On 09/19/2018 12:02 PM, Pavel Hrdina wrote:
One thing about the migration though. I'm not sure what are we officially supporting in libvirt because it might cause us some issues.
We need to make sure that if you live-migrate domain from old libvirt to new libvirt you should be able to migrate that domain back to old libvirt. The question is, whether this applies if you destroy and start the domain on the new libvirt before you live-migrate it back to old libvirt.
Without the restart there is no issue, because the backend would not be changed, but once you start the same domain again we would pick new backend which would prevent migrating it back to the old libvirt.
This is not supported. Exactly because of this reason. If we were to preserve this forward compatibility (i.e. migration from newer libvirt to older) then we couldn't use any new feature qemu has. For instance, if qemu introduces a new device and a user starts a domain with it, migration to older qemu will not work then, obviously. This also applies for other qemu capabilities.
Migrating from newer version to older is not supported. It might work, but that's rather coincidence than intent.
Not really. The key point is whether a user explicitly asked for the new feature. In other words, taking an old XML usable with old libvirt and starting it on new libvirt should result in a domain which can be migrated back to the old version.
Think about oVirt and its transient domains which are always started from scratch using a generated XML. They need to be able to support heterogeneous clusters where some hosts run an old OS while some were already updated to a newer version of the host OS (and libvirt). Both existing and newly started domains should be migratable to any host in the cluster no matter when they were initially started. Unless of course oVirt's cluster level (or what the name for it is) is upgraded, at which point all hosts need to support the new features.
So much for the theory. I think we have bugs which prevent such migration compatibility in some specific cases. I'm not sure whether we intentionally broke this in the past, but I think we should avoid breaking it if possible.
Well, in that case we don't have a good option. If we have, say, three options to choose from (say mem backend to use), and they are not interchangeable, the moment we choose one the domain is not migratable. Yet, when starting the domain we want to give our users the best option available.
BTW: what features does this new memfd backend provide that can't be achieved via traditional -file or -ram backends?
Michal

Hi On Wed, Sep 19, 2018 at 5:31 PM Michal Privoznik <mprivozn@redhat.com> wrote:
On 09/19/2018 01:43 PM, Jiri Denemark wrote:
BTW: what features does this new memfd backend provide that can't be achieved via traditional -file or -ram backends?
2 things stand out:
- no need for files & mountpoint
- sealing, preventing grow/shrink (qemu doesn't seal write)
memfd is thus a better way to share anonymous memory between processes on linux (when you don't have other weird constraints). Read also the initial post: https://dvdhrm.wordpress.com/tag/memfd/. Since then memfd has grown a few more features, such as hugetlb support.
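To make the two points above concrete, here is a minimal C sketch of what "no files, plus grow/shrink sealing" means at the syscall level. It is not taken from QEMU or libvirt; the helper names create_sealed_memfd() and resize_is_blocked() are hypothetical, and it assumes Linux with a glibc that exposes memfd_create().

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an anonymous memory fd of `size` bytes
 * and seal it against resizing. No file or mount point is involved;
 * `name` only shows up in /proc for debugging.
 * Returns the fd, or -1 on error. */
int create_sealed_memfd(const char *name, off_t size)
{
    int fd = memfd_create(name, MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    /* Set the size first; once sealed it can no longer change. */
    if (ftruncate(fd, size) < 0 ||
        fcntl(fd, F_ADD_SEALS,
              F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL) < 0) {
        close(fd);
        return -1;
    }
    /* F_SEAL_WRITE is deliberately not set, matching the remark
     * that writes are not sealed. */
    return fd;
}

/* Hypothetical helper: returns 1 if the kernel refuses to resize
 * the fd to `new_size`, 0 if the resize went through. */
int resize_is_blocked(int fd, off_t new_size)
{
    return ftruncate(fd, new_size) < 0;
}
```

A process that receives such an fd (for example over a UNIX socket) can mmap() it knowing the peer cannot truncate the region underneath it, which is the "bit more safety" the commit message refers to.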

On 09/19/2018 07:10 AM, Michal Privoznik wrote:
On 09/19/2018 12:02 PM, Pavel Hrdina wrote:
On Wed, Sep 19, 2018 at 11:41:11AM +0200, Michal Privoznik wrote:
On 09/17/2018 03:14 PM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Add a new memoryBacking source type "memfd", supported by QEMU (when the capability is available).
A memfd is a specialized anonymous memory kind. As such, an anonymous source type could be automatically using a memfd. However, there are some complications when migrating from different memory backends in qemu (mainly due to the internal object naming at this point, but there could be more). For now, it is simpler and safer to simply introduce a new source type "memfd". Eventually, the "anonymous" type could learn to use memfd transparently in a separate change.
The main benefits are that it doesn't need to create filesystem files, and it also enforces sealing, providing a bit more safety.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 docs/formatdomain.html.in                     |  9 +--
 docs/schemas/domaincommon.rng                 |  1 +
 src/conf/domain_conf.c                        |  3 +-
 src/conf/domain_conf.h                        |  1 +
 src/qemu/qemu_command.c                       | 69 +++++++++++++------
 src/qemu/qemu_domain.c                        | 12 +++-
 .../memfd-memory-numa.x86_64-latest.args      | 34 +++++++++
 tests/qemuxml2argvdata/memfd-memory-numa.xml  | 36 ++++++++++
 tests/qemuxml2argvtest.c                      |  2 +
 9 files changed, 140 insertions(+), 27 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 1f12ab5b42..eeee1f6d40 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1099,7 +1099,7 @@
     </hugepages>
     <nosharepages/>
     <locked/>
-    <source type="file|anonymous"/>
+    <source type="file|anonymous|memfd"/>
I'm sorry but I do not think this is the way we should go. This effectively avoids libvirt making the decision and exposes the backend used directly. This puts unnecessary burden on mgmt applications because they have to make yet another decision (track another domain attribute).
IIUC, memfd is like memory-backend-file and -ram combined. It can do hugepages or just plain malloc(). Therefore it should be our first choice for freshly started domains. And only if qemu doesn't support it we should fall back to either -file or -ram backends.
This means we have to track what backend the domain was started with so that we preserve that on migration (although, the fact that these backends are not interchangeable makes me question 'backend' in their name :-P). For that we can use status/migration XML as I suggested earlier.
Once again, status XML is not editable by user [*] and is used solely by libvirtd to store runtime information for a running domain (and backend used falls into that category).
I have to agree with Michal, we should not expose this implementation detail in domain XML if we can hide it in status/migratable XML.
One thing about the migration though. I'm not sure what are we officially supporting in libvirt because it might cause us some issues.
We need to make sure that if you live-migrate domain from old libvirt to new libvirt you should be able to migrate that domain back to old libvirt. The question is, whether this applies if you destroy and start the domain on the new libvirt before you live-migrate it back to old libvirt.
Without the restart there is no issue, because the backend would not be changed, but once you start the same domain again we would pick new backend which would prevent migrating it back to the old libvirt.
This is not supported. Exactly because of this reason. If we were to preserve this forward compatibility (i.e. migration from newer libvirt to older) then we couldn't use any new feature qemu has. For instance, if qemu introduces a new device and a user starts a domain with it, migration to older qemu will not work then, obviously. This also applies for other qemu capabilities.
Migrating from newer version to older is not supported. It might work, but that's rather coincidence than intent.
Tough time to decide which of these to reply to... So I'll go here.
Anyway, perhaps it should be noted that in V1 the "decision" was to force the consumer to select "anonymous" as their source type. Go back to patch 3:
+    if (!mem->nvdimmPath &&
+        def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_ANONYMOUS &&
+        virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD) &&
+        (!useHugepage || virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB)))
Since "anonymous" has to be selected (it's not automatic nor is it "set" in the source type if nothing provided), then supplying "memfd" is only a stretch because it's a new type. IOW, mgmt apps still would have had to request "anonymous". If they didn't, then "file" would probably be used for any hugepage backend while "ram" would be used for others (anonymous or not provided). I say probably because there are the conditions for mem->nvdimmPath OR memAccess which also could choose "file" if source type was not supplied.
IIRC, the issue was migration if source type was "ram" and target was "memfd"; however, that would be a source that didn't use hugepages (in which case "file" would have been chosen). The issue was more qemu based because some path wasn't fully supplied when using ram.
I think the "root cause" of the angst is because all the decision making is left in the qemu_command code. If libvirtd is restarted, it's just going to "find" the domain already started and wouldn't be updating any status, so to that degree I think using status files won't work.
As for migration.... I think some of that just got answered while I was typing... But, if "A" is a system that doesn't have memfd and "B" is a system that has memfd *and* there's an automated decision to use memfd when the source type is "anonymous", then for A -> B there's no mechanism for A to tell B how it was started. If B -> A isn't supported, then fine - what stops that?
What I had a hard time determining in the migration code was what happens if the cookie bits the target has don't match what the source has. I didn't see anything that would fail the migration operation, but there's so much of that code that I can't be sure. If we send a cookie that indicates a new default for the anonymous backend, would that be automatically rejected if the target doesn't know about it? Likewise, if a source sends its cookies without some bit, would that be rejected? It's this second case where I came to an impasse on how to handle it.
John
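The backend choice John describes for V1 can be sketched as a small standalone C function. This is only an illustration of the decision as worded in this message, not the libvirt code: the enum, struct and function names are hypothetical, the capability checks are reduced to two booleans, and the fallback to "file" or "ram" follows John's "probably" description rather than the real qemu_command.c logic.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the domain XML source type and the
 * QEMU memory backend objects discussed in the thread. */
enum mem_source  { SRC_NONE, SRC_FILE, SRC_ANONYMOUS };
enum mem_backend { BACKEND_RAM, BACKEND_FILE, BACKEND_MEMFD };

struct mem_request {
    enum mem_source source;   /* <source type='...'/>, SRC_NONE if absent */
    bool use_hugepage;        /* hugepages requested for this memory */
    bool has_nvdimm_path;     /* an nvdimm path is set */
    bool has_mem_access;      /* an explicit memAccess is set */
    bool cap_memfd;           /* QEMU has memory-backend-memfd */
    bool cap_memfd_hugetlb;   /* ...and its hugetlb property */
};

/* Pick a backend the way the V1 patch condition is quoted above:
 * memfd only when "anonymous" was explicitly requested and the
 * capabilities allow it; otherwise fall back to file or ram. */
enum mem_backend pick_backend(const struct mem_request *r)
{
    if (!r->has_nvdimm_path &&
        r->source == SRC_ANONYMOUS &&
        r->cap_memfd &&
        (!r->use_hugepage || r->cap_memfd_hugetlb))
        return BACKEND_MEMFD;

    /* Assumption: simplified fallback, per the description above. */
    if (r->source == SRC_FILE || r->use_hugepage ||
        r->has_nvdimm_path || r->has_mem_access)
        return BACKEND_FILE;

    return BACKEND_RAM;
}
```

Note that the result depends on the capability flags of the host that starts the domain, which is why two hosts given the same XML can end up with different, non-interchangeable backends.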

Hi On Wed, Sep 19, 2018 at 1:41 PM Michal Privoznik <mprivozn@redhat.com> wrote:
On 09/17/2018 03:14 PM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Add a new memoryBacking source type "memfd", supported by QEMU (when the capability is available).
A memfd is a specialized anonymous memory kind. As such, an anonymous source type could be automatically using a memfd. However, there are some complications when migrating from different memory backends in qemu (mainly due to the internal object naming at this point, but there could be more). For now, it is simpler and safer to simply introduce a new source type "memfd". Eventually, the "anonymous" type could learn to use memfd transparently in a separate change.
The main benefits are that it doesn't need to create filesystem files, and it also enforces sealing, providing a bit more safety.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 docs/formatdomain.html.in                     |  9 +--
 docs/schemas/domaincommon.rng                 |  1 +
 src/conf/domain_conf.c                        |  3 +-
 src/conf/domain_conf.h                        |  1 +
 src/qemu/qemu_command.c                       | 69 +++++++++++++------
 src/qemu/qemu_domain.c                        | 12 +++-
 .../memfd-memory-numa.x86_64-latest.args      | 34 +++++++++
 tests/qemuxml2argvdata/memfd-memory-numa.xml  | 36 ++++++++++
 tests/qemuxml2argvtest.c                      |  2 +
 9 files changed, 140 insertions(+), 27 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 1f12ab5b42..eeee1f6d40 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1099,7 +1099,7 @@
     </hugepages>
     <nosharepages/>
     <locked/>
-    <source type="file|anonymous"/>
+    <source type="file|anonymous|memfd"/>
I'm sorry but I do not think this is the way we should go. This effectively avoids libvirt making the decision and exposes the backend used directly. This puts unnecessary burden on mgmt applications because they have to make yet another decision (track another domain attribute).
IIUC, memfd is like memory-backend-file and -ram combined. It can do hugepages or just plain malloc(). Therefore it should be our first choice for freshly started domains. And only if qemu doesn't support it we should fall back to either -file or -ram backends.
memory-backend-memfd doesn't replace either -file or -ram though. It's a specialized anonymous memory kind, linux-only atm, and not widely available.
-file should be used for nvram or complex hugepage/numa setup for ex.
But it's legitimate that a VM user request memfd to be used.
The point of this patch is not to say that we shouldn't try to use memfd when possible, but rather let the user request specifically memfd, for security reasons for example. If the setup cannot be satisfied with -memfd, the user should get an error.
This means we have to track what backend the domain was started with so that we preserve that on migration (although, the fact that these backends are not interchangeable makes me question 'backend' in their name :-P). For that we can use status/migration XML as I suggested earlier.
Once again, status XML is not editable by user [*] and is used solely by libvirtd to store runtime information for a running domain (and backend used falls into that category).
Why not do this transparent memfd-usage in a separate series?
Michal
* - sure, an evil admin could edit the status XML file (which is usually stored under /var/run/libvirt/qemu/$domain.xml) and restart libvirtd to reload the changes. But hey, the file is readable/writable by root only and there are plenty other ways how an evil root could mess up with running domains. We (have to) trust root.

On 09/19/2018 12:03 PM, Marc-André Lureau wrote:
Hi
On Wed, Sep 19, 2018 at 1:41 PM Michal Privoznik <mprivozn@redhat.com> wrote:
On 09/17/2018 03:14 PM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Add a new memoryBacking source type "memfd", supported by QEMU (when the capability is available).
A memfd is a specialized anonymous memory kind. As such, an anonymous source type could be automatically using a memfd. However, there are some complications when migrating from different memory backends in qemu (mainly due to the internal object naming at this point, but there could be more). For now, it is simpler and safer to simply introduce a new source type "memfd". Eventually, the "anonymous" type could learn to use memfd transparently in a separate change.
The main benefits are that it doesn't need to create filesystem files, and it also enforces sealing, providing a bit more safety.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 docs/formatdomain.html.in                     |  9 +--
 docs/schemas/domaincommon.rng                 |  1 +
 src/conf/domain_conf.c                        |  3 +-
 src/conf/domain_conf.h                        |  1 +
 src/qemu/qemu_command.c                       | 69 +++++++++++++------
 src/qemu/qemu_domain.c                        | 12 +++-
 .../memfd-memory-numa.x86_64-latest.args      | 34 +++++++++
 tests/qemuxml2argvdata/memfd-memory-numa.xml  | 36 ++++++++++
 tests/qemuxml2argvtest.c                      |  2 +
 9 files changed, 140 insertions(+), 27 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 1f12ab5b42..eeee1f6d40 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1099,7 +1099,7 @@
     </hugepages>
     <nosharepages/>
     <locked/>
-    <source type="file|anonymous"/>
+    <source type="file|anonymous|memfd"/>
I'm sorry but I do not think this is the way we should go. This effectively avoids libvirt making the decision and exposes the backend used directly. This puts unnecessary burden on mgmt applications because they have to make yet another decision (track another domain attribute).
IIUC, memfd is like memory-backend-file and -ram combined. It can do hugepages or just plain malloc(). Therefore it should be our first choice for freshly started domains. And only if qemu doesn't support it we should fall back to either -file or -ram backends.
memory-backend-memfd doesn't replace either -file or -ram though. It's a specialized anonymous memory kind, linux-only atm, and not widely available.
Well, neither libvirt nor qemu really support hugepages on anything else than linux. Nor it ever will? Because if we merge these patches and expose it in domain XML, there is no turning back. We can't stop supporting it.
-file should be used for nvram or complex hugepage/numa setup for ex.
How come? I can see .host-nodes and .policy attributes for -memfd backend too. Sure, nvram is special, but for plain hugepages use case -file and -memfd are interchangeable, aren't they?
  -object memory-backend-memfd,id=ram-node0,\
          hugetlb=yes,hugetlbsize=2097152,\
          share=yes,size=15032385536,host-nodes=3,policy=preferred
  -object memory-backend-file,id=ram-node0,\
          path=/path/to/2M/hugetlfs,\
          size=15032385536,host-nodes=3,policy=preferred
And for -ram there is no difference from usage/libvirt POV.
  -object memory-backend-memfd,id=ram-node0,\
          share=yes,size=15032385536,host-nodes=3,policy=preferred
  -object memory-backend-ram,id=ram-node0,\
          size=15032385536,host-nodes=3,policy=preferred
But it's legitimate that a VM user request memfd to be used.
The point of this patch is not to say that we shouldn't try to use memfd when possible, but rather let the user request specifically memfd, for security reasons for example. If the setup cannot be satisfied with -memfd, the user should get an error.
What security reasons do you have in mind?
This means we have to track what backend the domain was started with so that we preserve that on migration (although, the fact that these backends are not interchangeable makes me question 'backend' in their name :-P). For that we can use status/migration XML as I suggested earlier.
Once again, status XML is not editable by user [*] and is used solely by libvirtd to store runtime information for a running domain (and backend used falls into that category).
Why not do this transparent memfd-usage in a separate series?
Depends what we want libvirt to be. If we want it to be mere XML->qemu cmd line generator, then we can expose all qemu settings as they are. If we want it to have some logic built in (so that mgmt applications can offload some decisions to it), then we can't expose all qemu settings.
In my ideal world, I'd like to tell libvirt "I want a machine that uses hugepages of this size" and let libvirt figure out the best command line to fulfil my request (either use -file or -memfd or even -ram + -mem-path).
On the other hand, I don't want to discourage you from posting patches, so this is the point where I will no longer object. I pointed out my objections enough :-)
Michal
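As a rough illustration of the "-file and -memfd are interchangeable from the usage POV" argument in this message, the C sketch below formats the -object values quoted above from one common description of the guest memory. It is a hypothetical helper, not libvirt's command-line builder: the type and function names are invented, only the properties that appear in the quoted examples are emitted, and a single host node is assumed.

```c
#include <stddef.h>
#include <stdio.h>

enum backend_kind { KIND_RAM, KIND_FILE, KIND_MEMFD };

/* Hypothetical description of one guest NUMA node's memory. */
struct backend_opts {
    enum backend_kind kind;
    const char *id;
    unsigned long long size;        /* bytes */
    int host_node;                  /* single host NUMA node */
    const char *policy;             /* e.g. "preferred" */
    const char *hugetlbfs_path;     /* KIND_FILE only */
    unsigned long long hugetlbsize; /* KIND_MEMFD only, 0 = no hugepages */
};

/* Write the value of a QEMU -object option into buf.
 * Returns 0 on success, -1 on truncation or invalid input. */
int format_backend(char *buf, size_t len, const struct backend_opts *o)
{
    int n;

    switch (o->kind) {
    case KIND_MEMFD:
        if (o->hugetlbsize)
            n = snprintf(buf, len,
                         "memory-backend-memfd,id=%s,hugetlb=yes,"
                         "hugetlbsize=%llu,share=yes,size=%llu,"
                         "host-nodes=%d,policy=%s",
                         o->id, o->hugetlbsize, o->size,
                         o->host_node, o->policy);
        else
            n = snprintf(buf, len,
                         "memory-backend-memfd,id=%s,share=yes,size=%llu,"
                         "host-nodes=%d,policy=%s",
                         o->id, o->size, o->host_node, o->policy);
        break;
    case KIND_FILE:
        if (!o->hugetlbfs_path)
            return -1;              /* -file needs a path */
        n = snprintf(buf, len,
                     "memory-backend-file,id=%s,path=%s,size=%llu,"
                     "host-nodes=%d,policy=%s",
                     o->id, o->hugetlbfs_path, o->size,
                     o->host_node, o->policy);
        break;
    default:
        n = snprintf(buf, len,
                     "memory-backend-ram,id=%s,size=%llu,"
                     "host-nodes=%d,policy=%s",
                     o->id, o->size, o->host_node, o->policy);
        break;
    }
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The caller only states size, node and policy; which backend string comes out is a single field, which is the kind of decision Michal would like libvirt to make on the user's behalf.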

Hi On Wed, Sep 19, 2018 at 5:58 PM Michal Privoznik <mprivozn@redhat.com> wrote:
On 09/19/2018 12:03 PM, Marc-André Lureau wrote:
Hi
On Wed, Sep 19, 2018 at 1:41 PM Michal Privoznik <mprivozn@redhat.com> wrote:
On 09/17/2018 03:14 PM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Add a new memoryBacking source type "memfd", supported by QEMU (when the capability is available).
A memfd is a specialized anonymous memory kind. As such, an anonymous source type could be automatically using a memfd. However, there are some complications when migrating from different memory backends in qemu (mainly due to the internal object naming at this point, but there could be more). For now, it is simpler and safer to simply introduce a new source type "memfd". Eventually, the "anonymous" type could learn to use memfd transparently in a separate change.
The main benefits are that it doesn't need to create filesystem files, and it also enforces sealing, providing a bit more safety.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 docs/formatdomain.html.in                     |  9 +--
 docs/schemas/domaincommon.rng                 |  1 +
 src/conf/domain_conf.c                        |  3 +-
 src/conf/domain_conf.h                        |  1 +
 src/qemu/qemu_command.c                       | 69 +++++++++++++------
 src/qemu/qemu_domain.c                        | 12 +++-
 .../memfd-memory-numa.x86_64-latest.args      | 34 +++++++++
 tests/qemuxml2argvdata/memfd-memory-numa.xml  | 36 ++++++++++
 tests/qemuxml2argvtest.c                      |  2 +
 9 files changed, 140 insertions(+), 27 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
 create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 1f12ab5b42..eeee1f6d40 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1099,7 +1099,7 @@
     </hugepages>
     <nosharepages/>
     <locked/>
-    <source type="file|anonymous"/>
+    <source type="file|anonymous|memfd"/>
I'm sorry but I do not think this is the way we should go. This effectively avoids libvirt making the decision and exposes the backend used directly. This puts unnecessary burden on mgmt applications because they have to make yet another decision (track another domain attribute).
IIUC, memfd is like memory-backend-file and -ram combined. It can do hugepages or just plain malloc(). Therefore it should be our first choice for freshly started domains. And only if qemu doesn't support it we should fall back to either -file or -ram backends.
memory-backend-memfd doesn't replace either -file or -ram though. It's a specialized anonymous memory kind, linux-only atm, and not widely available.
Well, neither libvirt nor qemu really support hugepages on anything else than linux.
Nor it ever will? Because if we merge these patches and expose it in domain XML, there is no turning back. We can't stop supporting it.
-file should be used for nvram or complex hugepage/numa setup for ex.
How come? I can see .host-nodes and .policy attributes for -memfd backend too. Sure, nvram is special, but for plain hugepages use case -file and -memfd are interchangeable, aren't they?
Sorry, I think I misunderstood the problem then. The qemu mbind() might do all the work. David, didn't you point out limitation of -memfd compared to -file for NUMA setup?
  -object memory-backend-memfd,id=ram-node0,\
          hugetlb=yes,hugetlbsize=2097152,\
          share=yes,size=15032385536,host-nodes=3,policy=preferred
  -object memory-backend-file,id=ram-node0,\
          path=/path/to/2M/hugetlfs,\
          size=15032385536,host-nodes=3,policy=preferred
And for -ram there is no difference from usage/libvirt POV.
  -object memory-backend-memfd,id=ram-node0,\
          share=yes,size=15032385536,host-nodes=3,policy=preferred
  -object memory-backend-ram,id=ram-node0,\
          size=15032385536,host-nodes=3,policy=preferred
But it's legitimate that a VM user request memfd to be used.
The point of this patch is not to say that we shouldn't try to use memfd when possible, but rather let the user request specifically memfd, for security reasons for example. If the setup cannot be satisfied with -memfd, the user should get an error.
What security reasons do you have in mind?
grow/shrink sealing (and avoiding somewhat hazardous file system operations).
This means we have to track what backend the domain was started with so that we preserve that on migration (although, the fact that these backends are not interchangeable makes me question 'backend' in their name :-P). For that we can use status/migration XML as I suggested earlier.
Once again, status XML is not editable by user [*] and is used solely by libvirtd to store runtime information for a running domain (and backend used falls into that category).
Why not do this transparent memfd-usage in a separate series?
Depends what we want libvirt to be. If we want it to be mere XML->qemu cmd line generator, then we can expose all qemu settings as they are. If we want it to have some logic built in (so that mgmt applications can offload some decisions to it), then we can't expose all qemu settings.
In my ideal world, I'd like to tell libvirt "I want a machine that uses hugepages of this size" and let libvirt figure out the best command line to fulfil my request (either use -file or -memfd or even -ram + -mem-path).
On the other hand, I don't want to discourage you from posting patches, so this is the point where I will no longer object. I pointed out my objections enough :-)
I see the benefit in using memfd whenever possible. But I also see a benefit in being able to request its usage explicitly. That's why I think the 2 approaches are compatible. Thanks!

* Marc-André Lureau (marcandre.lureau@redhat.com) wrote:
Hi
On Wed, Sep 19, 2018 at 5:58 PM Michal Privoznik <mprivozn@redhat.com> wrote:
On 09/19/2018 12:03 PM, Marc-André Lureau wrote:
Hi
On Wed, Sep 19, 2018 at 1:41 PM Michal Privoznik <mprivozn@redhat.com> wrote:
On 09/17/2018 03:14 PM, marcandre.lureau@redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau@redhat.com>
Add a new memoryBacking source type "memfd", supported by QEMU (when the apability is available).
A memfd is a specialized anonymous memory kind. As such, an anonymous source type could be automatically using a memfd. However, there are some complications when migrating from different memory backends in qemu (mainly due to the internal object naming at this point, but there could be more). For now, it is simpler and safer to simply introduce a new source type "memfd". Eventually, the "anonymous" type could learn to use memfd transparently in a seperate change.
The main benefits are that it doesn't need to create filesystem files, and it also enforces sealing, providing a bit more safety.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- docs/formatdomain.html.in | 9 +-- docs/schemas/domaincommon.rng | 1 + src/conf/domain_conf.c | 3 +- src/conf/domain_conf.h | 1 + src/qemu/qemu_command.c | 69 +++++++++++++------ src/qemu/qemu_domain.c | 12 +++- .../memfd-memory-numa.x86_64-latest.args | 34 +++++++++ tests/qemuxml2argvdata/memfd-memory-numa.xml | 36 ++++++++++ tests/qemuxml2argvtest.c | 2 + 9 files changed, 140 insertions(+), 27 deletions(-) create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 1f12ab5b42..eeee1f6d40 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1099,7 +1099,7 @@
     </hugepages>
     <nosharepages/>
     <locked/>
-    <source type="file|anonymous"/>
+    <source type="file|anonymous|memfd"/>
I'm sorry but I do not think this is the way we should go. This effectively avoids libvirt making the decision and exposes the backend used directly. This puts unnecessary burden on mgmt applications because they have to make yet another decision (track another domain attribute).
IIUC, memfd is like memory-backend-file and -ram combined. It can do hugepages or just plain malloc(). Therefore it should be our first choice for freshly started domains. And only if qemu doesn't support it we should fall back to either -file or -ram backends.
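The "memfd first, then fall back" policy described here can be sketched as a small decision function. This is only an illustration of the proposal, not libvirt code: the function name, the `needs_path` flag (standing in for cases such as nvram that need a real file), and the use of plain strings as capability labels are all assumptions.

```python
def pick_backend(caps, want_hugepages, needs_path=False):
    """Pick a QEMU memory backend, preferring memfd when it can do the job.

    caps: set of capability labels the QEMU binary is assumed to report.
    (Hypothetical helper; labels mirror the patch titles, not a real API.)
    """
    if "memory-backend-memfd" in caps and not needs_path:
        # Hugepages through memfd need the separate hugetlb capability.
        if not want_hugepages or "memory-backend-memfd.hugetlb" in caps:
            return "memory-backend-memfd"
    # Fallbacks: a file backend for hugepages or path-based setups,
    # a plain RAM backend otherwise.
    if want_hugepages or needs_path:
        return "memory-backend-file"
    return "memory-backend-ram"
```

For example, `pick_backend({"memory-backend-memfd"}, want_hugepages=True)` falls back to `memory-backend-file` because the hugetlb capability is missing.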
memory-backend-memfd doesn't replace either -file or -ram though. It's a specialized anonymous memory kind, linux-only atm, and not widely available.
Well, neither libvirt nor qemu really supports hugepages on anything other than Linux.
Nor will it ever? Because if we merge these patches and expose it in domain XML, there is no turning back. We can't stop supporting it.
-file should be used for nvram or complex hugepage/NUMA setups, for example.
How come? I can see .host-nodes and .policy attributes for -memfd backend too. Sure, nvram is special, but for plain hugepages use case -file and -memfd are interchangeable, aren't they?
Sorry, I think I misunderstood the problem then. The qemu mbind() might do all the work.
David, didn't you point out a limitation of -memfd compared to -file for NUMA setups?
<thinks> I think we came to the conclusion they're mostly the same, but with the gotcha that it's harder to control allocation with memfd. I think, for example, you can create a fixed-size hugetlbfs mount and put a set of VMs in it, and now they're limited to that size. I think you can do similar things with /dev/shm-like mounts. Dave
-object memory-backend-memfd,id=ram-node0,\
        hugetlb=yes,hugetlbsize=2097152,\
        share=yes,size=15032385536,host-nodes=3,policy=preferred

-object memory-backend-file,id=ram-node0,\
        path=/path/to/2M/hugetlbfs,\
        size=15032385536,host-nodes=3,policy=preferred
And for -ram there is no difference from usage/libvirt POV.
-object memory-backend-memfd,id=ram-node0,\
        share=yes,size=15032385536,host-nodes=3,policy=preferred

-object memory-backend-ram,id=ram-node0,\
        size=15032385536,host-nodes=3,policy=preferred
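The command lines above differ only in the backend name and a few properties. A small sketch that assembles such an `-object` argument makes this visible, and also shows where the numbers come from (15032385536 bytes is 14 GiB, 2097152 bytes is 2 MiB). The helper name is hypothetical, and the sketch does not escape commas inside values.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def memory_object_arg(backend, obj_id, props):
    """Join a backend name, an id and properties into one -object value.

    props is a dict; its insertion order is kept in the output.
    (Hypothetical helper, not part of libvirt or QEMU.)
    """
    parts = [backend, f"id={obj_id}"]
    parts += [f"{key}={value}" for key, value in props.items()]
    return ",".join(parts)


# Properties shared by every variant in the examples above.
common = {"size": 14 * GiB, "host-nodes": 3, "policy": "preferred"}

memfd_huge = memory_object_arg(
    "memory-backend-memfd", "ram-node0",
    {"hugetlb": "yes", "hugetlbsize": 2 * MiB, "share": "yes", **common})
plain_ram = memory_object_arg("memory-backend-ram", "ram-node0", common)
```

Here `memfd_huge` reproduces the first memfd example and `plain_ram` the last one, so switching backends is mostly a matter of swapping the name and the backend-specific properties.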
But it's legitimate for a VM user to request that memfd be used.
The point of this patch is not to say that we shouldn't try to use memfd when possible, but rather to let the user request memfd specifically, for security reasons for example. If the setup cannot be satisfied with -memfd, the user should get an error.
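The explicit-request behaviour argued for here (honour "memfd" or fail, never silently substitute) might look like the following sketch. The function name, the capability labels, and the simplified one-to-one mapping from source type to backend are assumptions for illustration; the real mapping in libvirt is more involved.

```python
def resolve_requested_source(requested, caps):
    """Map an explicitly requested source type to a backend, or raise.

    (Hypothetical helper; the mapping below is a simplification.)
    """
    backends = {
        "anonymous": "memory-backend-ram",
        "file": "memory-backend-file",
        "memfd": "memory-backend-memfd",
    }
    if requested not in backends:
        raise ValueError(f"unknown memory source type: {requested!r}")
    backend = backends[requested]
    # An explicit memfd request is never downgraded to another backend.
    if requested == "memfd" and backend not in caps:
        raise ValueError("memfd requested but not supported by this QEMU")
    return backend
```

Raising instead of falling back is the design choice under discussion: a user who asked for memfd for its sealing guarantees would otherwise not learn that the guest ended up on a different backend.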
What security reasons do you have in mind?
grow/shrink sealing (and avoiding somewhat hazardous file system operations).
This means we have to track what backend the domain was started with so that we preserve that on migration (although, the fact that these backends are not interchangeable makes me question 'backend' in their name :-P). For that we can use status/migration XML as I suggested earlier.
Once again, status XML is not editable by user [*] and is used solely by libvirtd to store runtime information for a running domain (and backend used falls into that category).
Why not do this transparent memfd usage in a separate series?
Depends what we want libvirt to be. If we want it to be a mere XML->qemu cmd line generator, then we can expose all qemu settings as they are. If we want it to have some logic built in (so that mgmt applications can offload some decisions to it), then we can't expose all qemu settings.
In my ideal world, I'd like to tell libvirt "I want a machine that uses hugepages of this size" and let libvirt figure out the best command line to fulfil my request (either use -file or -memfd or even -ram + -mem-path).
On the other hand, I don't want to discourage you from posting patches, so this is the point where I will no longer object. I pointed out my objections enough :-)
I see the benefit in using memfd whenever possible. But I also see a benefit in being able to request its usage explicitly. That's why I think the two approaches are compatible.
Thanks!
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Hi On Mon, Sep 17, 2018 at 5:15 PM <marcandre.lureau@redhat.com> wrote:
Ping. There is a minor qemu caps conflict now, and the version should be updated. Thanks
-- Marc-André Lureau

Hi On Thu, Nov 8, 2018 at 4:40 PM Marc-André Lureau <marcandre.lureau@gmail.com> wrote:
ping,
There is a minor qemu caps conflict now, and version should be updated.
fwiw, I pushed an updated tree https://github.com/elmarco/libvirt/commits/memfd waiting for some feedback on v2 before resending.
thanks
-- Marc-André Lureau
participants (8)
- Dr. David Alan Gilbert
- Jiri Denemark
- John Ferlan
- Marc-André Lureau
- Marc-André Lureau
- marcandre.lureau@redhat.com
- Michal Privoznik
- Pavel Hrdina