For the Kove integration, the memory is allocated on external devices, similar to a SAN
device LUN allocation. As such, each virt will have its own separate allocation, and will
need its memory file(s) managed independently of other virts. We also use information from
the virtual machine management layer (such as RHV, oVirt, or OpenStack) to associate VM
metadata (such as VM ID, owner, or project) with the allocation, to assist the
administrators with monitoring, tracking, and billing memory usage. This data also assists
in maintenance and troubleshooting by identifying which VMs and which hosts are utilizing
memory on a given external device. I don't believe we could (easily) derive this data
from information about the process creating or opening the files alone; it would take
significant work to trace it back through the qemu/libvirt/management-layer stack.
Pre-allocating the files at the integration point, such as oVirt/RHV hooks or the libvirt
prepare hooks, allows us to collect this information from the upper layers or from the
domain XML directly.
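
As a rough illustration of that hook-based pre-allocation (the hook interface below is
libvirt's existing qemu hook, which receives the full domain XML on stdin; the
preallocate_external_memory() helper is purely a placeholder for the Kove-side work, not
an existing API):

    #!/usr/bin/env python
    # Sketch of /etc/libvirt/hooks/qemu handling the "prepare" operation.
    # libvirt passes the full domain XML on stdin for qemu hook calls.
    import sys
    import xml.etree.ElementTree as ET

    def preallocate_external_memory(uuid, name, mem_kib, metadata):
        # Placeholder only: this is where the external allocation would be
        # created and the backing file pre-created under memory_backing_dir,
        # tagged with the VM metadata for tracking/billing.
        pass

    def main():
        guest_name, op = sys.argv[1], sys.argv[2]
        if op != "prepare":
            return
        dom = ET.parse(sys.stdin).getroot()
        uuid = dom.findtext("uuid")
        mem_kib = int(dom.findtext("memory"))
        # Management-layer metadata (oVirt/RHV/OpenStack) is carried in the
        # <metadata> element; which fields are present depends on the layer.
        metadata = dom.find("metadata")
        preallocate_external_memory(uuid, guest_name, mem_kib, metadata)

    if __name__ == "__main__":
        main()
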
We don't actually need the file path exposed within the domain XML itself. All that's
really needed is some mechanism for giving qemu predictable filename(s), instead of the
memory filenames that qemu currently generates internally using mktemp. Our original
proposal was to use the domain UUID as the filename, placing the file within the
"memory_backing_dir" directory from qemu.conf. This has the limitation of not supporting
multiple memory backing files or hotplug. An adaptation would be to use the domain UUID
(or domain name) plus the memory device id in the filename (for example:
<domain_uuid>_<mem_id1>). This would reuse the same mem_id generation that is already
used for creating the memory device in qemu. Any other mechanism that results in
well-defined filenames would also work, as long as the filename is predictable prior to
qemu startup.
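
To make the naming concrete, here is a minimal sketch of how such a path could be
composed (the directory value and the "dimm0" id are only illustrative; the real mem_id
would come from libvirt's existing memory-device naming):

    import os

    # Illustrative; the real value comes from qemu.conf's memory_backing_dir.
    MEMORY_BACKING_DIR = "/var/lib/libvirt/qemu/ram"

    def predictable_mem_path(domain_uuid, mem_id=None):
        # <memory_backing_dir>/<domain_uuid> for the main guest memory,
        # <memory_backing_dir>/<domain_uuid>_<mem_id> for a <memory/> device.
        name = domain_uuid if mem_id is None else "%s_%s" % (domain_uuid, mem_id)
        return os.path.join(MEMORY_BACKING_DIR, name)

    # predictable_mem_path("c7a5fdbd-edaf-9466-926a-d65c16db1809", "dimm0")
    #   -> /var/lib/libvirt/qemu/ram/c7a5fdbd-edaf-9466-926a-d65c16db1809_dimm0
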
We may wish to add an additional flag in qemu.conf to enable this behavior, defaulting to
the current random filename generation if it is not specified. Since the path comes from
qemu.conf and the filename would be generated internally by libvirt, this avoids exposing
any file paths within the domain XML, keeping it system-agnostic.
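
For example (the flag name below is purely hypothetical; only memory_backing_dir exists
in qemu.conf today):

    # qemu.conf
    memory_backing_dir = "/mnt/kove/ram"
    # Hypothetical: when enabled, libvirt would name memory backing files
    # <domain_uuid>_<mem_id> instead of using a random mktemp-style name.
    predictable_memory_paths = 1
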
An alternative would be to allow the filename to be specified directly in the domain XML,
while continuing to take the path from qemu.conf's memory_backing_dir directive. With
this approach, libvirt would need to sanitize the filename input to prevent escaping the
memory_backing_dir directory with "..". This method does expose the filenames (but not
the paths) in the XML, but it allows the management layer (such as oVirt, RHV, or
OpenStack) to control the file creation locations directly.
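
For what it's worth, a minimal sketch of that sanitization (in Python for brevity; a real
implementation would of course live in libvirt's C code):

    import os

    def sanitize_mem_filename(backing_dir, filename):
        # Reject anything that could escape memory_backing_dir: path
        # separators, "." / "..", or symlink tricks resolved by realpath.
        if not filename or os.sep in filename or filename in (".", ".."):
            raise ValueError("invalid memory backing filename: %r" % filename)
        base = os.path.realpath(backing_dir)
        path = os.path.realpath(os.path.join(base, filename))
        if os.path.dirname(path) != base:
            raise ValueError("filename escapes memory_backing_dir: %r" % filename)
        return path
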
--
Zack Cornelius
zack.cornelius(a)kove.net
----- Original Message -----
From: "Michal Privoznik" <mprivozn(a)redhat.com>
To: "Daniel P. Berrange" <berrange(a)redhat.com>
Cc: "libvir-list" <libvir-list(a)redhat.com>, "Zack Cornelius"
<zack.cornelius(a)kove.net>
Sent: Thursday, September 14, 2017 6:46:48 AM
Subject: Re: [libvirt] Exposing mem-path in domain XML
On 09/06/2017 01:42 PM, Daniel P. Berrange wrote:
> On Wed, Sep 06, 2017 at 01:35:45PM +0200, Michal Privoznik wrote:
>> On 09/05/2017 04:07 PM, Daniel P. Berrange wrote:
>>> On Tue, Sep 05, 2017 at 03:59:09PM +0200, Michal Privoznik wrote:
>>>> On 07/28/2017 10:59 AM, Daniel P. Berrange wrote:
>>>>> On Fri, Jul 28, 2017 at 10:45:21AM +0200, Michal Privoznik wrote:
>>>>>> On 07/27/2017 03:50 PM, Daniel P. Berrange wrote:
>>>>>>> On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
>>>>>>>> Dear list,
>>>>>>>>
>>>>>>>> there is the following bug [1] which I'm not quite sure how to grasp. So
>>>>>>>> there is this application/infrastructure called Kove [2] that allows you
>>>>>>>> to have memory for your application stored on a distant host in network
>>>>>>>> and basically fetch needed region on pagefault. Now imagine that
>>>>>>>> somebody wants to use it for backing up domain memory. However, the way
>>>>>>>> that the tool works is it has some kernel module and then some userland
>>>>>>>> binary that is fed with the path of the mmaped file. I don't know all
>>>>>>>> the details, but the point is, in order to let users use this we need to
>>>>>>>> expose the paths for mem-path for the guest memory. I know we did not
>>>>>>>> want to do this in the past, but now it looks like we don't have a way
>>>>>>>> around it, do we?
>>>>>>>
>>>>>>> We don't want to expose the concept of paths in the XML because this is
>>>>>>> a linux specific way to configure hugepages / shared memory. So we hide
>>>>>>> the particular path used in the internal impl of the QEMU driver, and
>>>>>>> or via the qemu.conf global config file. I don't really want to change
>>>>>>> that approach, particularly if the only reason is to integrate with a
>>>>>>> closed source binary like Kove.
>>>>>>
>>>>>> Yep, I agree with that. However, if you read the discussion in the
>>>>>> linked bug you'll find that they need to know what file in the
>>>>>> memory_backing_dir (from qemu.conf) corresponds to which domain. The
>>>>>> reported suggested using UUID based filenames, which I fear is not
>>>>>> enough because one can have multiple <memory type='dimm'/> -s configured
>>>>>> for their domain. But I guess we could go with:
>>>>>>
>>>>>> ${memory_backing_dir}/${domName} for generic memory
>>>>>> ${memory_backing_dir}/${domName}_N for Nth <memory/>
>>>>>
>>>>> This feels like it is going to lead to hell when you add in memory
>>>>> hotplug/unplug, with inevitable races.
>>>>>
>>>>>> BTW: IIUC they want predictable names because they need to create the
>>>>>> files before spawning qemu so that they are picked by qemu instead of
>>>>>> using temporary names.
>>>>>
>>>>> I would like to know why they even need to associate particular memory
>>>>> files with particular QEMU processes. eg if they're just exposing a
>>>>> new type of tmpfs filesystem from the kernel why does it matter what
>>>>> each file is used for.
>>>>
>>>> This might get you answer:
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1461214#c4
>>>>
>>>> So the way I understand it is that they will create the files, and
>>>> provide us with paths. So luckily, we don't have to make up the paths on
>>>> our own.
>>>
>>> IOW it is pretending to be tmpfs except it is not behaving like tmpfs.
>>> This doesn't really make me any more inclined to support this closed
>>> source stuff in libvirt.
>>
>> Yeah, that's my feeling too. So, what about the following: let's assume
>> they will fix their code so that it is proper tmpfs. Libvirt can then
>> behave to it just like it is already doing so for hugetlbfs. For us
>> it'll be just yet another type of hugepages. I mean, for hugepages we
>> already create /hupages/mount/point/libvirt/$domain per each domain so
>> the separation is there (even though this is considered internal impl),
>> since it would be a proper tmpfs they can see the pid of qemu which is
>> trying to mmap() (and take the name or whatever unique ID they want from
>> there).
>
> Yep, we can at least make a reasonable guarantee that all files belonging
> to a single QEMU process will always be within the same sub-directory.
> This allows the kmod to distinguish 2 files owned by separate VMs, from 2
> files owned by the same VM and do what's needed. I don't see why it would
> need to care about naming conventions beyond the layout.
>
>> I guess what I'm trying to ask is if it was proper tmpfs, we would be
>> okay with it, wouldn't we?
>
> If it is indistinguishable from tmpfs/hugetlbfs from libvirt's POV, we
> should be fine - at most you would need /etc/libvirt/qemu.conf change
> to explicitly point at the custom mount point if libvirt doesn't
> auto-detect the right one.
>
Zack, can you join the discussion and tell us if our design sounds
reasonable to you?
Michal