-----Original Message-----
From: Daniel P. Berrange [mailto:berrange@redhat.com]
Sent: Thursday, May 12, 2016 5:28 PM
To: Mooney, Sean K <sean.k.mooney@intel.com>
Cc: libvir-list@redhat.com
Subject: Re: [libvirt] adding a new libvirt xml element for File
Descriptor backed memory for use with vhost-user
On Thu, May 12, 2016 at 04:00:29PM +0000, Mooney, Sean K wrote:
> > > Today it is possible to use libvirt to spawn a VM without hugepage
> > > memory, using a file descriptor backed memdev, via the use of the
> > > qemu:commandline element:
> > >
> > > <qemu:commandline>
> > > <qemu:arg value='-object'/>
> > > <qemu:arg value='memory-backend-file,id=mem,size=1024M,mem-path=/var/lib/libvirt/qemu,share=on'/>
> > > <qemu:arg value='-numa'/>
> > > <qemu:arg value='node,memdev=mem'/>
> > > <qemu:arg value='-mem-prealloc'/>
> > > </qemu:commandline>
> > >
> > > I created a proof of concept patch to Nova to demonstrate that
> > > this works; however, to support this use case in Nova a new xml
> > > element is required:
> > >
> > > https://review.openstack.org/#/c/309565/1
> > >
> > > I would like to propose the introduction of a new subelement to
> > > the memoryBacking element to request file descriptor backed memory:
> > >
> > > <memoryBacking>
> > > <filedescriptor size_mb="1024" path="/var/lib/libvirt/qemu" prealloc="true" shared="on" />
> > > </memoryBacking>
> >
> > Specifying a size is not required - we already know how big memory
> > must be for the guest.
> >
> > We already have a memAccess='shared' attribute against the <numa>
> > element that is used to determine if the underlying memory should be
> > set up as shared. We could define a further element that lets us
> > control memory access mode for guests without a NUMA topology specified.
> [Mooney, Sean K] Hi, yes the reason I added the shared attribute was to
> cater for the case of guests without a NUMA topology. For guests with a
> NUMA topology I agree that using memAccess='shared' on the cell is
> better for consistency with hugepage memory.
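> For reference, the existing per-cell syntax is something like the
> following (the cell values here are just illustrative):
>
> <cpu>
>   <numa>
>     <cell id='0' cpus='0-1' memory='1048576' unit='KiB' memAccess='shared'/>
>   </numa>
> </cpu>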
>
> > <memoryBacking>
> > <access mode="shared"/>
> > </memoryBacking>
> >
> > For huge pages it seems we unconditionally pass --mem-prealloc. I'm
> > thinking we could perhaps make that configurable via an element
> >
> >
> > <memoryBacking>
> > <allocation mode="immediate|ondemand"/>
> > </memoryBacking>
> >
> > to control use of -mem-prealloc or not.
> [Mooney, Sean K] For the vhost-user case the mem-prealloc is
> required because you are basically doing DMA, so you really want the
> memory to be allocated. Generally though, from a libvirt point of view,
> I do think it makes sense for this to be configurable, to allow
> oversubscription of memory for higher density.
> >
> > So all that remains is a way to request file based backing of RAM.
> > As with huge pages, I think we should hide the actual path from the user.
> > We should just use /dev/shm as the backing for non-hugepage RAM. For
> > this we could define something like
> >
> > <memoryBacking>
> > <source type="file|anonymous"/>
> > </memoryBacking>
> >
> [Mooney, Sean K] For some reason when I used /dev/shm I could only
> boot one instance at a time. That was my first choice, but maybe we
> would have to create a file per instance under /dev/shm to make it work.
QEMU should create the file itself - it's no different from our use of
hugetlbfs, in fact. Possibly you hit a limit on the amount of memory
allowed to be used via /dev/shm - IIRC the mount point is limited to 50%
of RAM by default.
If you use /var/lib/libvirt/ as the location you get a real file backed
by disk, so akin to putting the VM on swap IIUC!
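For comparison, hugepage backing today is requested in the guest XML with
something along these lines (the page size shown is just an example):

<memoryBacking>
  <hugepages>
    <page size="2048" unit="KiB"/>
  </hugepages>
</memoryBacking>
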
[Mooney, Sean K] That was my initial assumption too; however, when you
use /var/lib/libvirt/ or /dev/shm, QEMU does not create a file in the
directory. What I think is happening is that it does not actually create
a file, just a file descriptor that is mapped to a memory region. I
believe it is merely using the path to determine what the default page
size should be when allocating the file backed memory. This is something
that we can look into though.
> > Putting that all together, to get what you want we'd have
> >
> > <memoryBacking>
> > <source type="file"/>
> > <access mode="shared"/>
> > <allocation mode="immediate"/>
> > </memoryBacking>
> >
> [Mooney, Sean K]
> Yes, this seems like it would be a clean way to address this use case.
> Can you gauge how small or large a change this would be? It's been a
> while since I worked with C directly, but if you could point me in the
> right direction in the libvirt codebase I would be happy to look at
> creating an RFC patch.
First there's defining the XML extensions - needs
docs/schemas/domaincommon.rng and src/conf/domain_conf.{c,h} to be
changed.
Then there's wiring up QEMU XML -> ARGV conversion -
src/qemu/qemu_command.c and adding test cases in
tests/qemuxml2argvtest.c
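For the schema side, the addition would be roughly along the following
lines (the define name is just for illustration; the actual structure in
domaincommon.rng may differ, and the access/allocation elements would get
similar defines hooked into the existing memoryBacking grammar):

<define name="memorybackingSource">
  <element name="source">
    <attribute name="type">
      <choice>
        <value>file</value>
        <value>anonymous</value>
      </choice>
    </attribute>
    <empty/>
  </element>
</define>
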
> From the Nova side, assuming libvirt was extended for this feature,
> should I open a blueprint to extend the existing guest memory backing
> support in parallel to the libvirt implementation, or wait until after
> it is supported in libvirt to start the Nova discussion? In either case
> I think we agree that any support in Nova would depend on libvirt
> support to be accepted in upstream Nova.
You're going to hit the deadline for approval of Newton specs in Nova
fairly soon, and unless the libvirt impl is done before then, I think it
is unlikely you'd get a spec approved. So by all means work on this in
parallel, but be realistic about chances of approval in Nova for this
cycle.
[Mooney, Sean K] Actually, I was assuming that this would be completed
early in Ocata, as it requires changes in libvirt first.