Re: [libvirt] Exposing mem-path in domain XML

25 Sep 2017


      ----- Original Message -----
...
From: "Michal Privoznik" <mprivozn@redhat.com>
To: "Zack Cornelius" <zack.cornelius@kove.net>
Cc: "Daniel P. Berrange" <berrange@redhat.com>, "libvir-list" <libvir-list@redhat.com>
Sent: Monday, September 25, 2017 9:17:10 AM
Subject: Re: [libvirt] Exposing mem-path in domain XML
...
On 09/15/2017 03:49 PM, Zack Cornelius wrote:
...
For the Kove integration, the memory is allocated on external devices, similar
to a SAN device LUN allocation. As such, each virt will have its own separate
allocation, and will need its memory file(s) managed independently of other
virts. We also use information from the virtual machine management layer (such
as RHV, oVirt, or OpenStack) to associate VM metadata (such as VM ID, owner, or
project) with the allocation, to assist the administrators with monitoring,
tracking, and billing memory usage. This data also assists in maintenance and
troubleshooting by identifying which VMs and which hosts are utilizing memory
on a given external device. I don't believe we could (easily) get this data
just from the process information of the process creating or opening files, but
would need to do some significant work to trace this information from the
qemu/libvirt/management layer stack. Pre-allocating the files at the
integration point, such as oVirt / RHV hooks or libvirt prepare hooks, allows
us to collect this information from the upper layers or domain XML directly.
We don't actually need the file path exposed within the domain XML itself. All
that's really needed is just to have some mechanism for using predictable
filename(s) for qemu, instead of memory filenames that are currently generated
within qemu itself using mktemp. Our original proposal for this was to use the
domain UUID as the filename, and to place the file within the
"memory_backing_dir" directory from qemu.conf. This does have the limitation of
not supporting multiple memory backing files or hotplug. An adaptation of this
would be to use the domain UUID (or domain name), plus the memory device id in
the filename (for example: <domain_uuid>_<mem_id1>). This would utilize the
same generation for the mem_id that is already in use for creating the memory
device in qemu. Any other mechanism which would result in well-defined
filenames would also work, as long as the filename is predictable prior to qemu
startup.
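
The <domain_uuid>_<mem_id> naming described above could be computed before qemu
starts, for example from a hook that has the domain XML at hand. The following
is a minimal Python sketch, not libvirt code: the function name
backing_file_path is hypothetical, and the directory and memory device id in
the usage line are illustrative values only.

```python
import posixpath
import xml.etree.ElementTree as ET


def backing_file_path(domain_xml: str, memory_backing_dir: str, mem_id: str) -> str:
    """Return <memory_backing_dir>/<domain_uuid>_<mem_id> for the given domain XML."""
    root = ET.fromstring(domain_xml)
    uuid = root.findtext("uuid")
    if not uuid or not uuid.strip():
        raise ValueError("domain XML has no <uuid> element")
    # The name depends only on values known before qemu is started.
    return posixpath.join(memory_backing_dir, f"{uuid.strip()}_{mem_id}")


# Illustrative usage; the directory and the memory id are assumptions.
example_xml = (
    "<domain type='kvm'><name>fedora</name>"
    "<uuid>63840878-0deb-4095-97e6-fc444d9bc9fa</uuid></domain>"
)
print(backing_file_path(example_xml, "/var/lib/libvirt/qemu/ram", "ram-node0"))
```
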
I think qemu uses random file names only if the path provided is a
directory. I've tried this locally and indeed, when a full path ending
with a file name was provided, qemu just used it. So I've written a patch
that creates the mem-path argument with the following structure:
$memory_backing_dir/$alias
The problem with this approach is that $alias is not stable. It may change
on device hot(un-)plug. Moreover, we'd like to keep the possibility of
changing it in the future should we find ourselves in such a
situation.
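
To make the $memory_backing_dir/$alias structure concrete, here is a small
Python sketch that builds the value passed to -object. It is an illustration
only and not the patch itself: the function name mem_backend_object is
hypothetical, and only properties that appear elsewhere in this thread (id,
mem-path, share, size) are emitted.

```python
import posixpath


def mem_backend_object(memory_backing_dir: str, alias: str,
                       size_bytes: int, share: bool = True) -> str:
    """Build a memory-backend-file -object value with mem-path = dir/alias."""
    mem_path = posixpath.join(memory_backing_dir, alias)
    props = [
        "memory-backend-file",
        f"id={alias}",
        f"mem-path={mem_path}",
        f"share={'yes' if share else 'no'}",
        f"size={size_bytes}",
    ]
    return ",".join(props)


# Illustrative usage; the directory is an assumed value.
print("-object", mem_backend_object("/var/lib/libvirt/qemu/ram", "ram-node0", 4294967296))
```

Because the file name is just the alias, any change to the alias (for example
after hot(un-)plug) changes the path as well, which is the stability concern
raised above.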
...
We may wish to add an additional flag in qemu.conf to enable this behavior,
defaulting to the current random filename generation if not specified. As the
path is in qemu.conf, and the filename would be generated internally within
libvirt, this avoids exposing any file paths within the domain XML, keeping it
system agnostic.
I don't think we need such a switch. Others don't really care what the
file is named.
...
An alternative would be to allow specification of the filename directly in the
domain XML, while continuing to use the path from qemu.conf's
memory_backing_dir directive. With this approach, libvirt would need to
sanitize the filename input to prevent escaping the memory_backing_dir
directory with "..". This method does expose the filenames (but not the path)
in the XML, but allows the management layer (such as oVirt, RHV, or OpenStack)
to control the file creation locations directly.
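
The sanitization mentioned above (rejecting names that would escape the
memory_backing_dir directory via "..") could look roughly like the following
Python sketch. This is an assumption about one reasonable approach, not
libvirt's implementation, and safe_backing_path is a hypothetical name.

```python
import os


def safe_backing_path(memory_backing_dir: str, filename: str) -> str:
    """Return the resolved path of filename inside memory_backing_dir.

    Raises ValueError if the name is empty, is "." or "..", contains a
    path separator or NUL byte, or resolves outside the directory.
    """
    if not filename or filename in (".", "..") or "\0" in filename:
        raise ValueError(f"invalid file name: {filename!r}")
    # A plain file name must equal its own basename (no directory parts).
    if os.path.basename(filename) != filename:
        raise ValueError(f"file name must not contain separators: {filename!r}")
    base = os.path.realpath(memory_backing_dir)
    candidate = os.path.realpath(os.path.join(base, filename))
    # Second check after symlink resolution, in case the name is a symlink.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes {base!r}: {filename!r}")
    return candidate
```

Checking the resolved path in addition to the raw name also covers the case
where an existing entry in the directory is a symlink pointing elsewhere.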
Well, this is an interesting idea. However, it may happen that we use
memory-backend-file even if there is no <memory model='dimm'/> device. The
code that decides this is pretty complex:
libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_command.c;hb=HEAD#l3234
Therefore we might not always have the user define the file name.
Kove would only use our integration with domains that use file-backed memory via the following memoryBacking XML, which I think simplifies the cases where memory-backend-file gets used.

 <memoryBacking>
   <source type='file'/>
   <access mode='shared'/>
 </memoryBacking>

The Kove integration is not compatible with huge pages, so we're just interested in the memoryBacking source='file' case, and not the hugepages cases, if that simplifies things.
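
As an illustration of narrowing things down to that one case, the Python
sketch below checks whether a domain uses file-backed memory without
hugepages. It assumes the source element carries a type attribute, as in
<source type='file'/>; the function name uses_plain_file_backing is
hypothetical.

```python
import xml.etree.ElementTree as ET


def uses_plain_file_backing(domain_xml: str) -> bool:
    """True if <memoryBacking> selects source type 'file' and no hugepages."""
    root = ET.fromstring(domain_xml)
    backing = root.find("memoryBacking")
    if backing is None:
        return False
    source = backing.find("source")
    is_file = source is not None and source.get("type") == "file"
    has_hugepages = backing.find("hugepages") is not None
    return is_file and not has_hugepages
```
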
...
Personally, I like the idea I've locally implemented. But the problem is
that we can't make such a promise. Although, as a fix for a different,
unrelated bug, we might generate the aliases at define time. If we did
that, then we could sort of make the promise about the file naming. Well,
sort of, because, for instance, for the aforementioned <memory model='dimm'/>
the alias for the corresponding memory-backend-file object is 'memdimmX',
and therefore the constructed path is different:
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/hugepages2M/libvirt/qemu/13-fedora/ram-node0,share=yes,size=4294967296
-numa node,nodeid=0,cpus=0-3,memdev=ram-node0
-object memory-backend-file,id=memdimm0,prealloc=yes,mem-path=/hugepages2M/libvirt/qemu/13-fedora/memdimm0,share=yes,size=536870912
-device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0
The corresponding XML looks like this:
<domain type='kvm'>
 <name>fedora</name>
 <uuid>63840878-0deb-4095-97e6-fc444d9bc9fa</uuid>
 <maxMemory slots='16' unit='KiB'>8388608</maxMemory>
 <memory unit='KiB'>4717568</memory>
 <currentMemory unit='KiB'>4194304</currentMemory>
 <memoryBacking>
   <hugepages/>
   <access mode='shared'/>
 </memoryBacking>
 ...
 <cpu mode='host-passthrough' check='none'>
   <topology sockets='1' cores='2' threads='2'/>
   <numa>
     <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
   </numa>
 </cpu>
 ...
 <devices>
   ...
   <memory model='dimm'>
     <target>
       <size unit='KiB'>523264</size>
       <node>0</node>
     </target>
     <address type='dimm' slot='0'/>
   </memory>
 </devices>
</domain>
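
One way to check the naming seen in the command line above is to split each
-object value into its properties and compare the basename of mem-path with
the id. This is a rough Python sketch: it assumes the values contain no
escaped commas (qemu writes a literal comma as ",,", which the sketch does
not handle), and parse_object_arg and path_matches_id are hypothetical
helper names.

```python
import posixpath


def parse_object_arg(arg: str) -> dict:
    """Split 'type,key=value,...' into a dict; the type is stored as 'qom-type'."""
    type_name, _, rest = arg.partition(",")
    props = {"qom-type": type_name}
    for item in rest.split(","):
        if not item:
            continue
        key, _, value = item.partition("=")
        props[key] = value
    return props


def path_matches_id(arg: str) -> bool:
    """True if the last component of mem-path equals the object's id."""
    props = parse_object_arg(arg)
    return posixpath.basename(props.get("mem-path", "")) == props.get("id")
```
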
With the other bugfix that defines the aliases within the XML, and your locally implemented idea, would the filenames then be predictable or readable from the XML when using memory source file in all of these cases: memory defined in the <memory> element, memory defined as part of the NUMA node, and memory defined as a dimm device?

--Zack