On 09/15/2017 03:49 PM, Zack Cornelius wrote:
For the Kove integration, the memory is allocated on external
devices, similar to a SAN device LUN allocation. As such, each virt will have its own
separate allocation, and will need its memory file(s) managed independently of other
virts. We also use information from the virtual machine management layer (such as RHV,
oVirt, or OpenStack) to associate VM metadata (such as VM ID, owner, or project) with the
allocation, to assist the administrators with monitoring, tracking, and billing memory
usage. This data also assists in maintenance and troubleshooting by identifying which VMs
and which hosts are utilizing memory on a given external device. I don't believe we
could (easily) get this data just from the process information of the process creating or
opening files, but would need to do some significant work to trace this information from
the qemu/libvirt/management layer stack. Pre-allocating the files at the integration
point, such as oVirt / RHV hooks or libvirt prepare hooks, allows us to collect this
information from the upper layers or domain XML directly.
We don't actually need the file path exposed within the domain XML itself. All
that's really needed is just to have some mechanism for using predictable filename(s)
for qemu, instead of memory filenames that are currently generated within qemu itself
using mktemp. Our original proposal was to use the domain UUID for the filename,
placing the file within the "memory_backing_dir" directory from qemu.conf.
This does have the limitation of not supporting multiple memory backing files or hotplug.
An adaptation of this would be to use the domain UUID (or domain name), plus the memory
device id in the filename (for example: <domain_uuid>_<mem_id1>). This would
utilize the same generation for the mem_id that is already in use for creating the memory
device in qemu. Any other mechanism which would result in well-defined filenames would
also work, as long as the filename is predictable prior to qemu startup.
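To illustrate the naming scheme (assuming memory_backing_dir is left at its
default of /var/lib/libvirt/qemu/ram, and reusing the domain UUID from the
example later in this mail; the exact UUID/mem_id suffix format is only a
suggestion, not an existing convention):

```
# qemu.conf
memory_backing_dir = "/var/lib/libvirt/qemu/ram"

# Hypothetical predictable backing files for two memory devices:
/var/lib/libvirt/qemu/ram/63840878-0deb-4095-97e6-fc444d9bc9fa_dimm0
/var/lib/libvirt/qemu/ram/63840878-0deb-4095-97e6-fc444d9bc9fa_dimm1
```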
I think qemu uses random file names only if the path provided ends with
a directory. I've tried this locally and indeed, when a full path ending
with a filename was provided, qemu just used it. So I've written a patch that
constructs the mem-path argument with the following structure:
$memory_backing_dir/$alias
The problem with this approach is that $alias is not stable. It may change
on device hot(un-)plug. Moreover, we'd like to retain the ability to
change it in the future should we find ourselves in such a
situation.
We may wish to add an additional flag in qemu.conf to enable this behavior, defaulting to
the current random filename generation if not specified. As the path is in qemu.conf, and
the filename would be generated internally within libvirt, this avoids exposing any file
paths within the domain XML, keeping it system agnostic.
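Such a switch in qemu.conf might look like this (the
predictable_memory_paths name is purely illustrative; it is not an
existing libvirt option):

```
# /etc/libvirt/qemu.conf

# Existing option: directory holding memory backing files.
memory_backing_dir = "/var/lib/libvirt/qemu/ram"

# Hypothetical switch: when 1, libvirt names backing files predictably
# (e.g. from the domain UUID and memory device id) instead of letting
# random names be generated. Defaults to 0, i.e. current behavior.
#predictable_memory_paths = 1
```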
I don't think we need such a switch. Others don't really care what the
file is named.
An alternative would be to allow specification of the filename directly in the domain
XML, while continuing to use the path from qemu.conf's memory_backing_dir directive.
With this approach, libvirt would need to sanitize the filename input to prevent escaping
the memory_backing_dir directory with "..". This method does expose the
filenames (but not the path) in the XML, but allows the management layer (such as oVirt,
RHV, or OpenStack) to control the file creation locations directly.
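A minimal sketch of the kind of sanitization libvirt would need before
accepting such a filename (in Python for brevity; libvirt itself is C,
and the function name here is hypothetical):

```python
import os

def sanitize_backing_filename(memory_backing_dir, filename):
    """Reject filenames that would escape memory_backing_dir.

    Returns the full backing-file path if the name is safe, otherwise
    raises ValueError. This mirrors the check libvirt would need before
    passing a user-supplied name to qemu's mem-path.
    """
    # Disallow path separators and the "."/".." components outright.
    if os.sep in filename or (os.altsep and os.altsep in filename):
        raise ValueError("filename must not contain path separators")
    if filename in ("", ".", ".."):
        raise ValueError("invalid filename")

    full = os.path.realpath(os.path.join(memory_backing_dir, filename))
    base = os.path.realpath(memory_backing_dir)
    # Belt and braces: the resolved path must stay inside the directory.
    if os.path.commonpath([full, base]) != base:
        raise ValueError("filename escapes memory_backing_dir")
    return full
```

Whether the check lives at XML parse time or at command-line build time
is an open design question; either way it has to run before qemu is
handed the path.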
Well, this is an interesting idea. However, it may happen that we use
memory-backend-file even when there is no <memory model='dimm'/> device.
The code that decides this is pretty complex:
libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_command.c;hb=HEAD#l...
Therefore we might not always have the user define the file name.
Personally, I like the idea I've implemented locally. But the problem is
that we can't make such a promise. Although, as a fix for a different,
unrelated bug, we might generate the aliases at define time. If we did
that, then we could sort of make the promise about the file naming.
Well, sort of, because for instance for the aforementioned
<memory model='dimm'/> device the alias for the corresponding
memory-backend-file object is 'memdimmX', therefore
the constructed path is different:
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/hugepages2M/libvirt/qemu/13-fedora/ram-node0,share=yes,size=4294967296
-numa node,nodeid=0,cpus=0-3,memdev=ram-node0
-object memory-backend-file,id=memdimm0,prealloc=yes,mem-path=/hugepages2M/libvirt/qemu/13-fedora/memdimm0,share=yes,size=536870912
-device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0
The corresponding XML looks like this:
<domain type='kvm'>
  <name>fedora</name>
  <uuid>63840878-0deb-4095-97e6-fc444d9bc9fa</uuid>
  <maxMemory slots='16' unit='KiB'>8388608</maxMemory>
  <memory unit='KiB'>4717568</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages/>
    <access mode='shared'/>
  </memoryBacking>
  ...
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='2' threads='2'/>
    <numa>
      <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
    </numa>
  </cpu>
  ...
  <devices>
    ...
    <memory model='dimm'>
      <target>
        <size unit='KiB'>523264</size>
        <node>0</node>
      </target>
      <address type='dimm' slot='0'/>
    </memory>
  </devices>
</domain>
Michal