On 09/06/2017 01:42 PM, Daniel P. Berrange wrote:
On Wed, Sep 06, 2017 at 01:35:45PM +0200, Michal Privoznik wrote:
> On 09/05/2017 04:07 PM, Daniel P. Berrange wrote:
>> On Tue, Sep 05, 2017 at 03:59:09PM +0200, Michal Privoznik wrote:
>>> On 07/28/2017 10:59 AM, Daniel P. Berrange wrote:
>>>> On Fri, Jul 28, 2017 at 10:45:21AM +0200, Michal Privoznik wrote:
>>>>> On 07/27/2017 03:50 PM, Daniel P. Berrange wrote:
>>>>>> On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
>>>>>>> Dear list,
>>>>>>>
>>>>>>> there is the following bug [1] which I'm not quite sure how to grasp. So
>>>>>>> there is this application/infrastructure called Kove [2] that allows you
>>>>>>> to have memory for your application stored on a distant host in network
>>>>>>> and basically fetch needed region on pagefault. Now imagine that
>>>>>>> somebody wants to use it for backing up domain memory. However, the way
>>>>>>> that the tool works is it has some kernel module and then some userland
>>>>>>> binary that is fed with the path of the mmaped file. I don't know all
>>>>>>> the details, but the point is, in order to let users use this we need to
>>>>>>> expose the paths for mem-path for the guest memory. I know we did not
>>>>>>> want to do this in the past, but now it looks like we don't have a way
>>>>>>> around it, do we?
>>>>>>
>>>>>> We don't want to expose the concept of paths in the XML because this is
>>>>>> a Linux-specific way to configure hugepages / shared memory. So we hide
>>>>>> the particular path used in the internal impl of the QEMU driver,
>>>>>> and/or via the qemu.conf global config file. I don't really want to change
>>>>>> that approach, particularly if the only reason is to integrate with a
>>>>>> closed source binary like Kove.
>>>>>
>>>>> Yep, I agree with that. However, if you read the discussion in the
>>>>> linked bug you'll find that they need to know what file in the
>>>>> memory_backing_dir (from qemu.conf) corresponds to which domain. The
>>>>> reporter suggested using UUID based filenames, which I fear is not
>>>>> enough because one can have multiple <memory type='dimm'/> -s configured
>>>>> for their domain. But I guess we could go with:
>>>>>
>>>>> ${memory_backing_dir}/${domName} for generic memory
>>>>> ${memory_backing_dir}/${domName}_N for Nth <memory/>
>>>>
>>>> This feels like it is going to lead to hell when you add in memory
>>>> hotplug/unplug, with inevitable races.
>>>>
>>>>> BTW: IIUC they want predictable names because they need to create the
>>>>> files before spawning qemu so that they are picked by qemu instead of
>>>>> using temporary names.
>>>>
>>>> I would like to know why they even need to associate particular memory
>>>> files with particular QEMU processes. eg if they're just exposing a
>>>> new type of tmpfs filesystem from the kernel why does it matter what
>>>> each file is used for.
>>>
>>> This might get you an answer:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1461214#c4
>>>
>>> So the way I understand it is that they will create the files, and
>>> provide us with paths. So luckily, we don't have to make up the paths on
>>> our own.
>>
>> IOW it is pretending to be tmpfs except it is not behaving like tmpfs.
>> This doesn't really make me any more inclined to support this closed
>> source stuff in libvirt.
>
> Yeah, that's my feeling too. So, what about the following: let's assume
> they will fix their code so that it is a proper tmpfs. Libvirt can then
> treat it just like it already treats hugetlbfs. For us it'll be just yet
> another type of hugepages. I mean, for hugepages we already create
> /hugepages/mount/point/libvirt/$domain for each domain, so the separation
> is there (even though this is considered internal impl). And since it
> would be a proper tmpfs, they can see the pid of the qemu process that is
> trying to mmap() (and take the name or whatever unique ID they want from
> there).
Yep, we can at least make a reasonable guarantee that all files belonging
to a single QEMU process will always be within the same sub-directory.
This allows the kmod to distinguish 2 files owned by separate VMs, from 2
files owned by the same VM and do what's needed. I don't see why it would
need to care about naming conventions beyond the layout.
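
To illustrate the layout guarantee above, here is a minimal sketch of how a consumer could tell files of one VM apart from files of another using only the parent directory, without relying on file names. The mount point, directory names and file names below are purely hypothetical placeholders, and the helpers group_by_vm() and same_vm() are made up for this example; they are not part of libvirt or of any kernel module.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_vm(paths):
    """Group backing-file paths by parent directory (assumed: one directory per VM)."""
    groups = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        # Only the directory is meaningful; the file name is treated as opaque.
        groups[str(path.parent)].append(path.name)
    return dict(groups)


def same_vm(a, b):
    """Two files belong to the same VM if they share a parent directory."""
    return PurePosixPath(a).parent == PurePosixPath(b).parent


# Hypothetical paths, for illustration only.
example_paths = [
    "/mnt/custom/libvirt/guest1/mem-a",
    "/mnt/custom/libvirt/guest1/mem-b",
    "/mnt/custom/libvirt/guest2/mem-a",
]
vm_files = group_by_vm(example_paths)
```

Here vm_files maps each per-VM directory to the files inside it, so the names of the individual files never need to follow any convention.
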
> I guess what I'm trying to ask is if it was proper tmpfs, we would be
> okay with it, wouldn't we?
If it is indistinguishable from tmpfs/hugetlbfs from libvirt's POV, we
should be fine - at most you would need an /etc/libvirt/qemu.conf change
to explicitly point at the custom mount point if libvirt doesn't
auto-detect the right one.

Zack, can you join the discussion and tell us if our design sounds
reasonable to you?

Michal