Re: [libvirt] Exposing mem-path in domain XML

14 Sep 2017


On 09/06/2017 01:42 PM, Daniel P. Berrange wrote:
...
On Wed, Sep 06, 2017 at 01:35:45PM +0200, Michal Privoznik wrote:
...
On 09/05/2017 04:07 PM, Daniel P. Berrange wrote:
...
On Tue, Sep 05, 2017 at 03:59:09PM +0200, Michal Privoznik wrote:
...
On 07/28/2017 10:59 AM, Daniel P. Berrange wrote:
...
On Fri, Jul 28, 2017 at 10:45:21AM +0200, Michal Privoznik wrote:
...
On 07/27/2017 03:50 PM, Daniel P. Berrange wrote:
> On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
>> Dear list,
>>
>> there is the following bug [1] which I'm not quite sure how to grasp. So
>> there is this application/infrastructure called Kove [2] that allows you
>> to have memory for your application stored on a remote host on the network
>> and basically fetch the needed region on page fault. Now imagine that
>> somebody wants to use it for backing domain memory. However, the way
>> that the tool works is it has some kernel module and then some userland
>> binary that is fed with the path of the mmaped file. I don't know all
>> the details, but the point is, in order to let users use this we need to
>> expose the paths for mem-path for the guest memory. I know we did not
>> want to do this in the past, but now it looks like we don't have a way
>> around it, do we?
>
> We don't want to expose the concept of paths in the XML because this is
> a Linux-specific way to configure hugepages / shared memory. So we hide
> the particular path used in the internal impl of the QEMU driver, and/or
> via the qemu.conf global config file. I don't really want to change
> that approach, particularly if the only reason is to integrate with a
> closed source binary like Kove.
Yep, I agree with that. However, if you read the discussion in the
linked bug you'll find that they need to know what file in the
memory_backing_dir (from qemu.conf) corresponds to which domain. The
reporter suggested using UUID-based filenames, which I fear is not
enough because one can have multiple <memory model='dimm'/> devices
configured for their domain. But I guess we could go with:
${memory_backing_dir}/${domName}        for generic memory
${memory_backing_dir}/${domName}_N      for Nth <memory/>
This feels like it is going to lead to hell when you add in memory
hotplug/unplug, with inevitable races.
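For illustration only, the naming scheme proposed above could be sketched as follows. This is not libvirt code: the function name backing_paths, the 0-based numbering of N, and the example directory are assumptions made for the sketch. It also does nothing to keep N stable across hotplug/unplug, which is exactly the concern raised above.

```python
from pathlib import PurePosixPath


def backing_paths(memory_backing_dir, dom_name, n_memory_devices):
    """Return the file paths the proposed scheme would produce.

    Hypothetical helper: the first entry is the generic guest memory,
    followed by one entry per <memory/> device, numbered from 0
    (the starting index is an assumption, the thread only says "Nth").
    """
    base = PurePosixPath(memory_backing_dir)
    paths = [base / dom_name]
    paths += [base / f"{dom_name}_{n}" for n in range(n_memory_devices)]
    return [str(p) for p in paths]


if __name__ == "__main__":
    # Example directory and domain name are made up for the sketch.
    for path in backing_paths("/var/lib/libvirt/qemu/ram", "guest1", 2):
        print(path)
```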
...
BTW: IIUC they want predictable names because they need to create the
files before spawning qemu, so that qemu picks them up instead of
using temporary names.
I would like to know why they even need to associate particular memory
files with particular QEMU processes. eg if they're just exposing a
new type of tmpfs filesystem from the kernel why does it matter what
each file is used for.
This might give you the answer:
https://bugzilla.redhat.com/show_bug.cgi?id=1461214#c4
So the way I understand it is that they will create the files, and
provide us with paths. So luckily, we don't have to make up the paths on
our own.
IOW it is pretending to be tmpfs except it is not behaving like tmpfs.
This doesn't really make me any more inclined to support this closed
source stuff in libvirt.
Yeah, that's my feeling too. So, what about the following: let's assume
they will fix their code so that it is a proper tmpfs. Libvirt can then
treat it just like it already treats hugetlbfs. For us it'll be just
yet another type of hugepages. I mean, for hugepages we already create
/hugepages/mount/point/libvirt/$domain for each domain, so the
separation is there (even though this is considered internal impl).
And since it would be a proper tmpfs, they can see the pid of the qemu
which is trying to mmap() (and take the name or whatever unique ID they
want from there).
Yep, we can at least make a reasonable guarantee that all files belonging
to a single QEMU process will always be within the same sub-directory.
This allows the kmod to distinguish 2 files owned by separate VMs, from 2
files owned by the same VM and do what's needed. I don't see why it would
need to care about naming conventions beyond the layout.
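To make the layout guarantee concrete, here is a small sketch of how a consumer could tell VMs apart purely by the parent directory, without relying on file names. The function name group_by_domain_dir and the example paths are hypothetical. A real kernel module would do this in kernel space, so this only illustrates the idea.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_domain_dir(paths):
    """Group backing files by their parent directory.

    Under the guarantee discussed above, files sharing a parent
    directory belong to the same QEMU process. File names themselves
    carry no meaning here.
    """
    groups = defaultdict(list)
    for raw in paths:
        path = PurePosixPath(raw)
        groups[str(path.parent)].append(path.name)
    return dict(groups)


if __name__ == "__main__":
    # Made-up paths: two files for one guest, one file for another.
    example = [
        "/mnt/backing/libvirt/guest1/fileA",
        "/mnt/backing/libvirt/guest1/fileB",
        "/mnt/backing/libvirt/guest2/fileC",
    ]
    for directory, files in group_by_domain_dir(example).items():
        print(directory, files)
```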
...
I guess what I'm trying to ask is if it was proper tmpfs, we would be
okay with it, wouldn't we?
If it is indistinguishable from tmpfs/hugetlbfs from libvirt's POV, we
should be fine - at most you would need an /etc/libvirt/qemu.conf change
to explicitly point at the custom mount point if libvirt doesn't
auto-detect the right one.
Zack, can you join the discussion and tell us if our design sounds
reasonable to you?

Michal