On 09/17/2018 09:14 AM, marcandre.lureau(a)redhat.com wrote:
From: Marc-André Lureau <marcandre.lureau(a)redhat.com>
Add a new memoryBacking source type "memfd", supported by QEMU (when
the apability is available).
*capability
A memfd is a specialized anonymous memory kind. As such, an anonymous
source type could be automatically using a memfd. However, there are
some complications when migrating from different memory backends in
qemu (mainly due to the internal object naming at this point, but
there could be more). For now, it is simpler and safer to simply
introduce a new source type "memfd". Eventually, the "anonymous" type
could learn to use memfd transparently in a seperate change.
*separate
The main benefits are that it doesn't need to create filesystem files,
and it also enforces sealing, providing a bit more safety.
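
For readers unfamiliar with the mechanism, here is a minimal sketch of what "no filesystem files" and "sealing" mean at the syscall level on Linux. It is not taken from QEMU or libvirt. The helper name create_sealed_memfd is hypothetical, and the particular set of seals is an assumption chosen for illustration, since this thread does not show which seals QEMU applies.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create an anonymous memory file of `size` bytes
 * and seal its size. Returns the file descriptor, or -1 on failure.
 * No path is created on any mounted filesystem. */
static int create_sealed_memfd(const char *name, off_t size)
{
    assert(size > 0);

    /* MFD_ALLOW_SEALING is required, otherwise F_ADD_SEALS is refused. */
    int fd = memfd_create(name, MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    /* Set the size first; once sealed it can no longer change.
     * F_SEAL_SEAL prevents any further seals from being added. */
    if (ftruncate(fd, size) < 0 ||
        fcntl(fd, F_ADD_SEALS,
              F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After a successful call, any later ftruncate() on the descriptor fails, so a process sharing the memory cannot resize it underneath its peer.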
Signed-off-by: Marc-André Lureau <marcandre.lureau(a)redhat.com>
---
docs/formatdomain.html.in | 9 +--
docs/schemas/domaincommon.rng | 1 +
src/conf/domain_conf.c | 3 +-
src/conf/domain_conf.h | 1 +
src/qemu/qemu_command.c | 69 +++++++++++++------
src/qemu/qemu_domain.c | 12 +++-
.../memfd-memory-numa.x86_64-latest.args | 34 +++++++++
tests/qemuxml2argvdata/memfd-memory-numa.xml | 36 ++++++++++
tests/qemuxml2argvtest.c | 2 +
9 files changed, 140 insertions(+), 27 deletions(-)
create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml
More recently I've been trying to enforce separating XML/conf/rng/docs
changes from qemu/args changes... This makes review and testing a bit
easier and more "restricted".
Since I didn't make that clear previously, I can split things up - no
problem. I'll also be adding a "qemuxml2xmltest" for the input file to
"prove" it generates the output. That test will of course need
QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB added to its DO_TEST.
Adding xml2xmltest is something required when we add new attributes or
input options.
I'll split the commit message appropriately too.
BTW: I think if "someone" follows this up with moving the qemu_command
logic into a new qemuDomainPrepare* method, then I think we can separate
the "new" or "fresh" start from the migration start and thus might be
able to generate a mechanism that would use memfd for anonymous with the
right capabilities present. Not sure it'll fly, but it may be worth a
shot. It's getting more and more painful to be stuck with "old stuff".
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 1f12ab5b42..eeee1f6d40 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1099,7 +1099,7 @@
</hugepages>
<nosharepages/>
<locked/>
- <source type="file|anonymous"/>
+ <source type="file|anonymous|memfd"/>
<access mode="shared|private"/>
<allocation mode="immediate|ondemand"/>
<discard/>
@@ -1150,9 +1150,10 @@
suitable for the specific environment at the same time to mitigate
the risks described above. <span class="since">Since
1.0.6</span></dd>
<dt><code>source</code></dt>
- <dd>Using the <code>type</code> attribute, it's possible to provide
- "file" to utilize file memorybacking or keep the default
- "anonymous".</dd>
+ <dd>Using the <code>type</code> attribute, it's possible to
+ provide "file" to utilize file memorybacking or keep the
+ default "anonymous". <span class="since">Since 4.8.0</span>,
+ you may choose "memfd" backing. (QEMU/KVM only)</dd>
Need to keep format consistent, I'll adjust.
<dt><code>access</code></dt>
<dd>Using the <code>mode</code> attribute, specify if the memory is
to be "shared" or "private". This can be overridden per numa node by
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 099a949cf8..4b431b4188 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -655,6 +655,7 @@
<choice>
<value>file</value>
<value>anonymous</value>
+ <value>memfd</value>
</choice>
</attribute>
</element>
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 1ee43950ae..648015b5b5 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -894,7 +894,8 @@ VIR_ENUM_IMPL(virDomainDiskMirrorState, VIR_DOMAIN_DISK_MIRROR_STATE_LAST,
VIR_ENUM_IMPL(virDomainMemorySource, VIR_DOMAIN_MEMORY_SOURCE_LAST,
"none",
"file",
- "anonymous")
+ "anonymous",
+ "memfd")
syntax-check would tell you thou shalt not use tabs
VIR_ENUM_IMPL(virDomainMemoryAllocation, VIR_DOMAIN_MEMORY_ALLOCATION_LAST,
"none",
[...]
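
For context on why the new string has to be appended in enum order, here is a simplified sketch of the kind of string table that an enum-to-string macro generates. It is a stand-in, not libvirt's actual VIR_ENUM_IMPL expansion; the type and function names (MemorySource, memory_source_to_string, memory_source_from_string) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the memory source enum. */
typedef enum {
    MEMORY_SOURCE_NONE = 0,
    MEMORY_SOURCE_FILE,
    MEMORY_SOURCE_ANONYMOUS,
    MEMORY_SOURCE_MEMFD,

    MEMORY_SOURCE_LAST
} MemorySource;

/* One string per enum value, in the same order as the enum. */
static const char *const memory_source_names[] = {
    "none",
    "file",
    "anonymous",
    "memfd",
};

/* Fail the build if a value is added without its string. */
static_assert(sizeof(memory_source_names) / sizeof(memory_source_names[0])
              == MEMORY_SOURCE_LAST,
              "string table out of sync with enum");

/* Returns the XML string for `value`, or NULL if out of range. */
static const char *memory_source_to_string(int value)
{
    if (value < 0 || value >= MEMORY_SOURCE_LAST)
        return NULL;
    return memory_source_names[value];
}

/* Returns the enum value for `name`, or -1 if it is unknown. */
static int memory_source_from_string(const char *name)
{
    for (int i = 0; i < MEMORY_SOURCE_LAST; i++) {
        if (strcmp(memory_source_names[i], name) == 0)
            return i;
    }
    return -1;
}
```

With a table like this, parsing type='memfd' from the XML yields the new enum value and formatting it back yields the same string, which is what an xml2xml round-trip test exercises.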
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 2fd8a2a268..4983669a34 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -3949,7 +3949,8 @@ qemuDomainDefValidateFeatures(const virDomainDef *def,
static int
-qemuDomainDefValidateMemory(const virDomainDef *def)
+qemuDomainDefValidateMemory(const virDomainDef *def,
+ virQEMUCapsPtr qemuCaps)
{
const long system_page_size = virGetSystemPageSizeKB();
const virDomainMemtune *mem = &def->mem;
@@ -3971,6 +3972,13 @@ qemuDomainDefValidateMemory(const virDomainDef *def)
return -1;
}
+ if (mem->source == VIR_DOMAIN_MEMORY_SOURCE_MEMFD &&
+ !virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB)) {
+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+ _("hugepages is not support with memfd memory source"));
_("hugepages are not supported using memfd memory "
"source with this version of QEMU"));
+ return -1;
+ }
+
/* We can't guarantee any other mem.access
* if no guest NUMA nodes are defined. */
if (mem->hugepages[0].size != system_page_size &&
@@ -4110,7 +4118,7 @@ qemuDomainDefValidate(const virDomainDef *def,
if (qemuDomainDefValidateFeatures(def, qemuCaps) < 0)
goto cleanup;
- if (qemuDomainDefValidateMemory(def) < 0)
+ if (qemuDomainDefValidateMemory(def, qemuCaps) < 0)
goto cleanup;
ret = 0;
[...]
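
The validation hunk above can be reduced to a small standalone sketch. This is an illustration only: MemSource, Caps and validate_memory_source are hypothetical stand-ins for virDomainMemorySource, virQEMUCapsPtr and qemuDomainDefValidateMemory, and the boolean field replaces the real virQEMUCapsGet() lookup.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the memory source enum. */
typedef enum {
    SOURCE_NONE = 0,
    SOURCE_FILE,
    SOURCE_ANONYMOUS,
    SOURCE_MEMFD
} MemSource;

/* Hypothetical capability set; the real code queries the QEMU
 * capabilities object for QEMU_CAPS_OBJECT_MEMORY_MEMFD_HUGETLB. */
typedef struct {
    bool memfd_hugetlb;
} Caps;

/* Mirrors the condition in the hunk as posted: reject a memfd source
 * when the capability is missing. Returns 0 if OK, -1 otherwise. */
static int validate_memory_source(MemSource source, const Caps *caps)
{
    if (source == SOURCE_MEMFD && !caps->memfd_hugetlb) {
        fprintf(stderr,
                "hugepages are not supported using memfd memory "
                "source with this version of QEMU\n");
        return -1;
    }
    return 0;
}
```

As in the hunk as posted, the condition keys only on the source type and the capability; it does not look at whether hugepages were requested in the domain XML.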
diff --git a/tests/qemuxml2argvdata/memfd-memory-numa.xml b/tests/qemuxml2argvdata/memfd-memory-numa.xml
new file mode 100644
index 0000000000..8416a990fa
--- /dev/null
+++ b/tests/qemuxml2argvdata/memfd-memory-numa.xml
I don't recall from the original change, but each of the lines is
prefixed by 2 extra spaces... I'll fix before pushing.
I can fixup the nits noted. I'll wait until tomorrow before pushing so
that if Michal or Pavel wish to comment they can...
Reviewed-by: John Ferlan <jferlan(a)redhat.com>
John
@@ -0,0 +1,36 @@
+ <domain type='kvm' id='56'>
+ <name>instance-00000092</name>
+ <uuid>126f2720-6f8e-45ab-a886-ec9277079a67</uuid>
+ <memory unit='KiB'>14680064</memory>
+ <currentMemory unit='KiB'>14680064</currentMemory>
+ <memoryBacking>
+ <hugepages>
+ <page size="2" unit="M"/>
+ </hugepages>
+ <source type='memfd'/>
+ <access mode='shared'/>
+ <allocation mode='immediate'/>
+ </memoryBacking>
+ <numatune>
+ <memnode cellid='0' mode='preferred' nodeset='3'/>
+ </numatune>
+ <vcpu placement='static'>8</vcpu>
+ <os>
+ <type arch='x86_64' machine='pc-i440fx-wily'>hvm</type>
+ <boot dev='hd'/>
+ </os>
+ <cpu>
+ <topology sockets='1' cores='8' threads='1'/>
+ <numa>
+ <cell id='0' cpus='0-7' memory='14680064' unit='KiB'/>
+ </numa>
+ </cpu>
+ <clock offset='utc'/>
+ <on_poweroff>destroy</on_poweroff>
+ <on_reboot>restart</on_reboot>
+ <on_crash>destroy</on_crash>
+ <devices>
+ <emulator>/usr/bin/qemu-system-x86_64</emulator>
+ <memballoon model='virtio'/>
+ </devices>
+ </domain>
[...]