* Marc-André Lureau (marcandre.lureau(a)redhat.com) wrote:
Hi
On Wed, Sep 19, 2018 at 5:58 PM Michal Privoznik <mprivozn(a)redhat.com> wrote:
>
> On 09/19/2018 12:03 PM, Marc-André Lureau wrote:
> > Hi
> >
> > On Wed, Sep 19, 2018 at 1:41 PM Michal Privoznik <mprivozn(a)redhat.com> wrote:
> >>
> >> On 09/17/2018 03:14 PM, marcandre.lureau(a)redhat.com wrote:
> >>> From: Marc-André Lureau <marcandre.lureau(a)redhat.com>
> >>>
> >>> Add a new memoryBacking source type "memfd", supported by QEMU (when
> >>> the capability is available).
> >>>
> >>> A memfd is a specialized anonymous memory kind. As such, an anonymous
> >>> source type could be automatically using a memfd. However, there are
> >>> some complications when migrating from different memory backends in
> >>> qemu (mainly due to the internal object naming at this point, but
> >>> there could be more). For now, it is simpler and safer to simply
> >>> introduce a new source type "memfd". Eventually, the "anonymous" type
> >>> could learn to use memfd transparently in a separate change.
> >>>
> >>> The main benefits are that it doesn't need to create filesystem files,
> >>> and it also enforces sealing, providing a bit more safety.
> >>>
> >>> Signed-off-by: Marc-André Lureau <marcandre.lureau(a)redhat.com>
> >>> ---
> >>> docs/formatdomain.html.in | 9 +--
> >>> docs/schemas/domaincommon.rng | 1 +
> >>> src/conf/domain_conf.c | 3 +-
> >>> src/conf/domain_conf.h | 1 +
> >>> src/qemu/qemu_command.c | 69 +++++++++++++------
> >>> src/qemu/qemu_domain.c | 12 +++-
> >>> .../memfd-memory-numa.x86_64-latest.args | 34 +++++++++
> >>> tests/qemuxml2argvdata/memfd-memory-numa.xml | 36 ++++++++++
> >>> tests/qemuxml2argvtest.c | 2 +
> >>> 9 files changed, 140 insertions(+), 27 deletions(-)
> >>> create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.x86_64-latest.args
> >>> create mode 100644 tests/qemuxml2argvdata/memfd-memory-numa.xml
> >>>
> >>> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> >>> index 1f12ab5b42..eeee1f6d40 100644
> >>> --- a/docs/formatdomain.html.in
> >>> +++ b/docs/formatdomain.html.in
> >>> @@ -1099,7 +1099,7 @@
> >>> </hugepages>
> >>> <nosharepages/>
> >>> <locked/>
> >>> - <source type="file|anonymous"/>
> >>> + <source type="file|anonymous|memfd"/>
> >>
> >> I'm sorry but I do not think this is the way we should go. This
> >> effectively avoids libvirt making the decision and exposes the backend
> >> used directly. This puts unnecessary burden on mgmt applications because
> >> they have to make yet another decision (track another domain attribute).
> >>
> >> IIUC, memfd is like memory-backend-file and -ram combined. It can do
> >> hugepages or just plain malloc(). Therefore it should be our first
> >> choice for freshly started domains. And only if qemu doesn't support it
> >> we should fall back to either -file or -ram backends.
> >
> > memory-backend-memfd doesn't replace either -file or -ram though. It's
> > a specialized anonymous memory kind, linux-only atm, and not widely
> > available.
>
> Well, neither libvirt nor qemu really support hugepages on anything else
> than linux.
>
> Nor it ever will? Because if we merge these patches and expose it in
> domain XML, there is no turning back. We can't stop supporting it.
>
> >
> > -file should be used for nvram or complex hugepage/numa setup for ex.
>
> How come? I can see .host-nodes and .policy attributes for -memfd
> backend too. Sure, nvram is special, but for plain hugepages use case
> -file and -memfd are interchangeable, aren't they?
Sorry, I think I misunderstood the problem then. The qemu mbind()
might do all the work.

David, didn't you point out a limitation of -memfd compared to -file for
NUMA setup?

<thinks> I think we came to the conclusion they're mostly the same, but
with the gotcha that it's harder to control allocation with memfd.
I think for example you can create a fixed size hugetlbfs mount and
put a set of VMs in it and now they're limited to that size.
I think you can do similar things with /dev/shm like mounts.

Dave
>
> -object memory-backend-memfd,id=ram-node0,\
> hugetlb=yes,hugetlbsize=2097152,\
> share=yes,size=15032385536,host-nodes=3,policy=preferred
>
> -object memory-backend-file,id=ram-node0,\
> path=/path/to/2M/hugetlfs,\
> size=15032385536,host-nodes=3,policy=preferred
>
>
> And for -ram there is no difference from usage/libvirt POV.
>
> -object memory-backend-memfd,id=ram-node0,\
> share=yes,size=15032385536,host-nodes=3,policy=preferred
>
> -object memory-backend-ram,id=ram-node0,\
> size=15032385536,host-nodes=3,policy=preferred
>
>
> >
> > But it's legitimate for a VM user to request that memfd be used.
> >
> > The point of this patch is not to say that we shouldn't try to use
> > memfd when possible, but rather let the user request specifically
> > memfd, for security reasons for example. If the setup cannot be
> > satisfied with -memfd, the user should get an error.
>
> What security reasons do you have in mind?
grow/shrink sealing (and avoiding somewhat hazardous file system operations).
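
To make the sealing point concrete, here is a minimal Python sketch (Linux
only, Python 3.8+) of what grow/shrink seals do to a memfd. It is an
illustration, not QEMU's code: the helper name `resize_is_blocked` and the
memfd name "guest-ram-demo" are made up for the example, and the exact set
of seals QEMU applies is not stated in this thread.

```python
import fcntl
import os

# os.memfd_create and the F_SEAL_* constants only exist on Linux (Python 3.8+).
HAVE_MEMFD = hasattr(os, "memfd_create") and hasattr(fcntl, "F_ADD_SEALS")


def resize_is_blocked(size=4096):
    """Create a sealed memfd and report whether every resize attempt is refused."""
    # MFD_ALLOW_SEALING is required, otherwise seals cannot be added later.
    fd = os.memfd_create("guest-ram-demo", os.MFD_CLOEXEC | os.MFD_ALLOW_SEALING)
    try:
        # Give the anonymous file its final size before sealing it.
        os.ftruncate(fd, size)
        # From here on the size can neither grow nor shrink.
        fcntl.fcntl(fd, fcntl.F_ADD_SEALS,
                    fcntl.F_SEAL_GROW | fcntl.F_SEAL_SHRINK)
        for new_size in (size * 2, size // 2):
            try:
                os.ftruncate(fd, new_size)
            except PermissionError:
                # EPERM: the kernel rejected the resize because of the seal.
                continue
            return False  # a resize went through, so the seal did not hold
        return True
    finally:
        os.close(fd)


if __name__ == "__main__" and HAVE_MEMFD:
    print("resize blocked:", resize_is_blocked())
```

No file is created on any filesystem here, which is the other half of the
argument: there is no path to unlink, chown or relabel.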
>
> >
> >>
> >> This means we have to track what backend the domain was started with so
> >> that we preserve that on migration (although, the fact that these
> >> backends are not interchangeable makes me question 'backend' in their
> >> name :-P). For that we can use status/migration XML as I suggested earlier.
> >>
> >> Once again, status XML is not editable by user [*] and is used solely by
> >> libvirtd to store runtime information for a running domain (and backend
> >> used falls into that category).
> >
> > Why not do this transparent memfd-usage in a separate series?
>
> Depends what we want libvirt to be. If we want it to be mere XML->qemu
> cmd line generator, then we can expose all qemu settings as they are. If
> we want it to have some logic built in (so that mgmt applications can
> offload some decisions to it), then we can't expose all qemu settings.
>
> In my ideal world, I'd like to tell libvirt "I want a machine that uses
> hugepages of this size" and let libvirt figure out the best command line
> to fulfil my request (either use -file or -memfd or even -ram + -mem-path).
>
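
As a rough illustration of the fallback described above (prefer -memfd,
otherwise -file for hugepages, otherwise -ram), here is a minimal Python
sketch. It is not libvirt code: the function name `backend_object` and the
`have_memfd` capability flag are hypothetical, and only the -object
properties already quoted in this thread are used. It also assumes
share=yes for memfd simply because both examples above set it.

```python
def backend_object(size, node_id, host_nodes, policy, *,
                   hugepage_size=None, hugetlbfs_path=None, have_memfd=False):
    """Build the value of a QEMU -object option for one guest NUMA node.

    Preference order (an assumption for illustration): memfd when the
    capability is present, else a hugetlbfs-backed file when hugepages
    are requested, else plain RAM.
    """
    ident = f"id=ram-node{node_id}"
    tail = f"size={size},host-nodes={host_nodes},policy={policy}"

    if have_memfd:
        props = ["memory-backend-memfd", ident]
        if hugepage_size is not None:
            props += ["hugetlb=yes", f"hugetlbsize={hugepage_size}"]
        props += ["share=yes", tail]
    elif hugepage_size is not None:
        # Without memfd, hugepages need a hugetlbfs mount point.
        if hugetlbfs_path is None:
            raise ValueError("hugepages without memfd need a hugetlbfs path")
        props = ["memory-backend-file", ident, f"path={hugetlbfs_path}", tail]
    else:
        props = ["memory-backend-ram", ident, tail]
    return ",".join(props)


if __name__ == "__main__":
    # Same numbers as the command lines quoted earlier in this mail.
    print("-object", backend_object(15032385536, 0, 3, "preferred",
                                    hugepage_size=2097152, have_memfd=True))
```

A caller that explicitly wants memfd and nothing else would skip the
fallback and report an error instead, which is the behaviour the patch
asks for.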
> On the other hand, I don't want to discourage you from posting patches,
> so this is the point where I will no longer object. I pointed out my
> objections enough :-)
I see the benefit in using memfd whenever possible. But I also see a
benefit in being able to request its usage explicitly. That's why I
think the 2 approaches are compatible.
Thanks!
--
Dr. David Alan Gilbert / dgilbert(a)redhat.com / Manchester, UK