[libvirt] Exposing mem-path in domain XML

Dear list,

there is the following bug [1] which I'm not quite sure how to grasp. So there is this application/infrastructure called Kove [2] that allows you to have memory for your application stored on a distant host on the network and basically fetch the needed region on page fault. Now imagine that somebody wants to use it for backing domain memory. However, the way that the tool works is that it has some kernel module and then some userland binary that is fed the path of the mmapped file. I don't know all the details, but the point is that in order to let users use this we need to expose the mem-path used for the guest memory. I know we did not want to do this in the past, but now it looks like we don't have a way around it, do we?

Michal

1: https://bugzilla.redhat.com/show_bug.cgi?id=1461214
2: http://kove.net

On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
Dear list,
there is the following bug [1] which I'm not quite sure how to grasp. So there is this application/infrastructure called Kove [2] that allows you to have memory for your application stored on a distant host on the network and basically fetch the needed region on page fault. Now imagine that somebody wants to use it for backing domain memory. However, the way that the tool works is that it has some kernel module and then some userland binary that is fed the path of the mmapped file. I don't know all the details, but the point is that in order to let users use this we need to expose the mem-path used for the guest memory. I know we did not want to do this in the past, but now it looks like we don't have a way around it, do we?
We don't want to expose the concept of paths in the XML because this is a Linux-specific way to configure hugepages / shared memory. So we hide the particular path used in the internal impl of the QEMU driver, and/or via the qemu.conf global config file. I don't really want to change that approach, particularly if the only reason is to integrate with a closed-source binary like Kove.

Regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 07/27/2017 03:50 PM, Daniel P. Berrange wrote:
On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
Dear list,
there is the following bug [1] which I'm not quite sure how to grasp. So there is this application/infrastructure called Kove [2] that allows you to have memory for your application stored on a distant host on the network and basically fetch the needed region on page fault. Now imagine that somebody wants to use it for backing domain memory. However, the way that the tool works is that it has some kernel module and then some userland binary that is fed the path of the mmapped file. I don't know all the details, but the point is that in order to let users use this we need to expose the mem-path used for the guest memory. I know we did not want to do this in the past, but now it looks like we don't have a way around it, do we?
We don't want to expose the concept of paths in the XML because this is a Linux-specific way to configure hugepages / shared memory. So we hide the particular path used in the internal impl of the QEMU driver, and/or via the qemu.conf global config file. I don't really want to change that approach, particularly if the only reason is to integrate with a closed-source binary like Kove.
Yep, I agree with that. However, if you read the discussion in the linked bug you'll find that they need to know which file in the memory_backing_dir (from qemu.conf) corresponds to which domain. The reporter suggested using UUID-based filenames, which I fear is not enough because one can have multiple <memory type='dimm'/> devices configured for their domain. But I guess we could go with:

${memory_backing_dir}/${domName} for generic memory
${memory_backing_dir}/${domName}_N for Nth <memory/>

BTW: IIUC they want predictable names because they need to create the files before spawning qemu so that they are picked by qemu instead of using temporary names.

Michal
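For illustration, assuming memory_backing_dir is set to /var/lib/libvirt/qemu/ram and a domain named "fedora" has two <memory/> devices, the proposed scheme would yield paths along these lines (the directory and domain name are made-up examples; only the naming pattern comes from the proposal above):

/var/lib/libvirt/qemu/ram/fedora
/var/lib/libvirt/qemu/ram/fedora_1
/var/lib/libvirt/qemu/ram/fedora_2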

On Fri, Jul 28, 2017 at 10:45:21AM +0200, Michal Privoznik wrote:
On 07/27/2017 03:50 PM, Daniel P. Berrange wrote:
On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
Dear list,
there is the following bug [1] which I'm not quite sure how to grasp. So there is this application/infrastructure called Kove [2] that allows you to have memory for your application stored on a distant host on the network and basically fetch the needed region on page fault. Now imagine that somebody wants to use it for backing domain memory. However, the way that the tool works is that it has some kernel module and then some userland binary that is fed the path of the mmapped file. I don't know all the details, but the point is that in order to let users use this we need to expose the mem-path used for the guest memory. I know we did not want to do this in the past, but now it looks like we don't have a way around it, do we?
We don't want to expose the concept of paths in the XML because this is a Linux-specific way to configure hugepages / shared memory. So we hide the particular path used in the internal impl of the QEMU driver, and/or via the qemu.conf global config file. I don't really want to change that approach, particularly if the only reason is to integrate with a closed-source binary like Kove.
Yep, I agree with that. However, if you read the discussion in the linked bug you'll find that they need to know which file in the memory_backing_dir (from qemu.conf) corresponds to which domain. The reporter suggested using UUID-based filenames, which I fear is not enough because one can have multiple <memory type='dimm'/> devices configured for their domain. But I guess we could go with:
${memory_backing_dir}/${domName} for generic memory
${memory_backing_dir}/${domName}_N for Nth <memory/>
This feels like it is going to lead to hell when you add in memory hotplug/unplug, with inevitable races.
BTW: IIUC they want predictable names because they need to create the files before spawning qemu so that they are picked by qemu instead of using temporary names.
I would like to know why they even need to associate particular memory files with particular QEMU processes. eg if they're just exposing a new type of tmpfs filesystem from the kernel why does it matter what each file is used for.

Regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 07/28/2017 10:59 AM, Daniel P. Berrange wrote:
On Fri, Jul 28, 2017 at 10:45:21AM +0200, Michal Privoznik wrote:
On 07/27/2017 03:50 PM, Daniel P. Berrange wrote:
On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
Dear list,
there is the following bug [1] which I'm not quite sure how to grasp. So there is this application/infrastructure called Kove [2] that allows you to have memory for your application stored on a distant host on the network and basically fetch the needed region on page fault. Now imagine that somebody wants to use it for backing domain memory. However, the way that the tool works is that it has some kernel module and then some userland binary that is fed the path of the mmapped file. I don't know all the details, but the point is that in order to let users use this we need to expose the mem-path used for the guest memory. I know we did not want to do this in the past, but now it looks like we don't have a way around it, do we?
We don't want to expose the concept of paths in the XML because this is a Linux-specific way to configure hugepages / shared memory. So we hide the particular path used in the internal impl of the QEMU driver, and/or via the qemu.conf global config file. I don't really want to change that approach, particularly if the only reason is to integrate with a closed-source binary like Kove.
Yep, I agree with that. However, if you read the discussion in the linked bug you'll find that they need to know which file in the memory_backing_dir (from qemu.conf) corresponds to which domain. The reporter suggested using UUID-based filenames, which I fear is not enough because one can have multiple <memory type='dimm'/> devices configured for their domain. But I guess we could go with:
${memory_backing_dir}/${domName} for generic memory
${memory_backing_dir}/${domName}_N for Nth <memory/>
This feels like it is going to lead to hell when you add in memory hotplug/unplug, with inevitable races.
BTW: IIUC they want predictable names because they need to create the files before spawning qemu so that they are picked by qemu instead of using temporary names.
I would like to know why they even need to associate particular memory files with particular QEMU processes. eg if they're just exposing a new type of tmpfs filesystem from the kernel why does it matter what each file is used for.
This might get you the answer:

https://bugzilla.redhat.com/show_bug.cgi?id=1461214#c4

So the way I understand it is that they will create the files and provide us with the paths. So luckily, we don't have to make up the paths on our own.

Michal

On Tue, Sep 05, 2017 at 03:59:09PM +0200, Michal Privoznik wrote:
On 07/28/2017 10:59 AM, Daniel P. Berrange wrote:
On Fri, Jul 28, 2017 at 10:45:21AM +0200, Michal Privoznik wrote:
On 07/27/2017 03:50 PM, Daniel P. Berrange wrote:
On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
Dear list,
there is the following bug [1] which I'm not quite sure how to grasp. So there is this application/infrastructure called Kove [2] that allows you to have memory for your application stored on a distant host on the network and basically fetch the needed region on page fault. Now imagine that somebody wants to use it for backing domain memory. However, the way that the tool works is that it has some kernel module and then some userland binary that is fed the path of the mmapped file. I don't know all the details, but the point is that in order to let users use this we need to expose the mem-path used for the guest memory. I know we did not want to do this in the past, but now it looks like we don't have a way around it, do we?
We don't want to expose the concept of paths in the XML because this is a Linux-specific way to configure hugepages / shared memory. So we hide the particular path used in the internal impl of the QEMU driver, and/or via the qemu.conf global config file. I don't really want to change that approach, particularly if the only reason is to integrate with a closed-source binary like Kove.
Yep, I agree with that. However, if you read the discussion in the linked bug you'll find that they need to know which file in the memory_backing_dir (from qemu.conf) corresponds to which domain. The reporter suggested using UUID-based filenames, which I fear is not enough because one can have multiple <memory type='dimm'/> devices configured for their domain. But I guess we could go with:
${memory_backing_dir}/${domName} for generic memory
${memory_backing_dir}/${domName}_N for Nth <memory/>
This feels like it is going to lead to hell when you add in memory hotplug/unplug, with inevitable races.
BTW: IIUC they want predictable names because they need to create the files before spawning qemu so that they are picked by qemu instead of using temporary names.
I would like to know why they even need to associate particular memory files with particular QEMU processes. eg if they're just exposing a new type of tmpfs filesystem from the kernel why does it matter what each file is used for.
This might get you the answer:
https://bugzilla.redhat.com/show_bug.cgi?id=1461214#c4
So the way I understand it is that they will create the files, and provide us with paths. So luckily, we don't have to make up the paths on our own.
IOW it is pretending to be tmpfs except it is not behaving like tmpfs. This doesn't really make me any more inclined to support this closed source stuff in libvirt.

Regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 09/05/2017 04:07 PM, Daniel P. Berrange wrote:
On Tue, Sep 05, 2017 at 03:59:09PM +0200, Michal Privoznik wrote:
On 07/28/2017 10:59 AM, Daniel P. Berrange wrote:
On Fri, Jul 28, 2017 at 10:45:21AM +0200, Michal Privoznik wrote:
On 07/27/2017 03:50 PM, Daniel P. Berrange wrote:
On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
Dear list,
there is the following bug [1] which I'm not quite sure how to grasp. So there is this application/infrastructure called Kove [2] that allows you to have memory for your application stored on a distant host on the network and basically fetch the needed region on page fault. Now imagine that somebody wants to use it for backing domain memory. However, the way that the tool works is that it has some kernel module and then some userland binary that is fed the path of the mmapped file. I don't know all the details, but the point is that in order to let users use this we need to expose the mem-path used for the guest memory. I know we did not want to do this in the past, but now it looks like we don't have a way around it, do we?
We don't want to expose the concept of paths in the XML because this is a Linux-specific way to configure hugepages / shared memory. So we hide the particular path used in the internal impl of the QEMU driver, and/or via the qemu.conf global config file. I don't really want to change that approach, particularly if the only reason is to integrate with a closed-source binary like Kove.
Yep, I agree with that. However, if you read the discussion in the linked bug you'll find that they need to know which file in the memory_backing_dir (from qemu.conf) corresponds to which domain. The reporter suggested using UUID-based filenames, which I fear is not enough because one can have multiple <memory type='dimm'/> devices configured for their domain. But I guess we could go with:
${memory_backing_dir}/${domName} for generic memory
${memory_backing_dir}/${domName}_N for Nth <memory/>
This feels like it is going to lead to hell when you add in memory hotplug/unplug, with inevitable races.
BTW: IIUC they want predictable names because they need to create the files before spawning qemu so that they are picked by qemu instead of using temporary names.
I would like to know why they even need to associate particular memory files with particular QEMU processes. eg if they're just exposing a new type of tmpfs filesystem from the kernel why does it matter what each file is used for.
This might get you the answer:
https://bugzilla.redhat.com/show_bug.cgi?id=1461214#c4
So the way I understand it is that they will create the files, and provide us with paths. So luckily, we don't have to make up the paths on our own.
IOW it is pretending to be tmpfs except it is not behaving like tmpfs. This doesn't really make me any more inclined to support this closed source stuff in libvirt.
Yeah, that's my feeling too. So, what about the following: let's assume they will fix their code so that it is a proper tmpfs. Libvirt can then treat it just like it already treats hugetlbfs; for us it'll be just yet another type of hugepages. I mean, for hugepages we already create /hugepages/mount/point/libvirt/$domain for each domain, so the separation is there (even though this is considered internal impl). And since it would be a proper tmpfs, they can see the pid of the qemu process that is trying to mmap() (and take the name or whatever unique ID they want from there).

I guess what I'm trying to ask is if it was proper tmpfs, we would be okay with it, wouldn't we?

Michal
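For illustration, with a 2M hugepage mount the per-domain separation mentioned above ends up looking roughly like this (the paths follow the example command lines later in the thread; the exact layout is internal and may change):

/hugepages2M/libvirt/qemu/13-fedora/ram-node0
/hugepages2M/libvirt/qemu/13-fedora/memdimm0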

On Wed, Sep 06, 2017 at 01:35:45PM +0200, Michal Privoznik wrote:
On 09/05/2017 04:07 PM, Daniel P. Berrange wrote:
On Tue, Sep 05, 2017 at 03:59:09PM +0200, Michal Privoznik wrote:
On 07/28/2017 10:59 AM, Daniel P. Berrange wrote:
On Fri, Jul 28, 2017 at 10:45:21AM +0200, Michal Privoznik wrote:
On 07/27/2017 03:50 PM, Daniel P. Berrange wrote:
On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
Dear list,
there is the following bug [1] which I'm not quite sure how to grasp. So there is this application/infrastructure called Kove [2] that allows you to have memory for your application stored on a distant host on the network and basically fetch the needed region on page fault. Now imagine that somebody wants to use it for backing domain memory. However, the way that the tool works is that it has some kernel module and then some userland binary that is fed the path of the mmapped file. I don't know all the details, but the point is that in order to let users use this we need to expose the mem-path used for the guest memory. I know we did not want to do this in the past, but now it looks like we don't have a way around it, do we?
We don't want to expose the concept of paths in the XML because this is a Linux-specific way to configure hugepages / shared memory. So we hide the particular path used in the internal impl of the QEMU driver, and/or via the qemu.conf global config file. I don't really want to change that approach, particularly if the only reason is to integrate with a closed-source binary like Kove.
Yep, I agree with that. However, if you read the discussion in the linked bug you'll find that they need to know which file in the memory_backing_dir (from qemu.conf) corresponds to which domain. The reporter suggested using UUID-based filenames, which I fear is not enough because one can have multiple <memory type='dimm'/> devices configured for their domain. But I guess we could go with:
${memory_backing_dir}/${domName} for generic memory
${memory_backing_dir}/${domName}_N for Nth <memory/>
This feels like it is going to lead to hell when you add in memory hotplug/unplug, with inevitable races.
BTW: IIUC they want predictable names because they need to create the files before spawning qemu so that they are picked by qemu instead of using temporary names.
I would like to know why they even need to associate particular memory files with particular QEMU processes. eg if they're just exposing a new type of tmpfs filesystem from the kernel why does it matter what each file is used for.
This might get you the answer:
https://bugzilla.redhat.com/show_bug.cgi?id=1461214#c4
So the way I understand it is that they will create the files, and provide us with paths. So luckily, we don't have to make up the paths on our own.
IOW it is pretending to be tmpfs except it is not behaving like tmpfs. This doesn't really make me any more inclined to support this closed source stuff in libvirt.
Yeah, that's my feeling too. So, what about the following: let's assume they will fix their code so that it is a proper tmpfs. Libvirt can then treat it just like it already treats hugetlbfs; for us it'll be just yet another type of hugepages. I mean, for hugepages we already create /hugepages/mount/point/libvirt/$domain for each domain, so the separation is there (even though this is considered internal impl). And since it would be a proper tmpfs, they can see the pid of the qemu process that is trying to mmap() (and take the name or whatever unique ID they want from there).
Yep, we can at least make a reasonable guarantee that all files belonging to a single QEMU process will always be within the same sub-directory. This allows the kmod to distinguish 2 files owned by separate VMs, from 2 files owned by the same VM and do what's needed. I don't see why it would need to care about naming conventions beyond the layout.
I guess what I'm trying to ask is if it was proper tmpfs, we would be okay with it, wouldn't we?
If it is indistinguishable from tmpfs/hugetlbfs from libvirt's POV, we should be fine - at most you would need an /etc/libvirt/qemu.conf change to explicitly point at the custom mount point if libvirt doesn't auto-detect the right one.

Regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
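For example, if the Kove-backed filesystem were mounted at a hypothetical /mnt/kove/ram, the only change needed would be a single line in /etc/libvirt/qemu.conf pointing the existing knob at it (the mount point here is invented for illustration):

memory_backing_dir = "/mnt/kove/ram"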

On 09/06/2017 01:42 PM, Daniel P. Berrange wrote:
On Wed, Sep 06, 2017 at 01:35:45PM +0200, Michal Privoznik wrote:
On 09/05/2017 04:07 PM, Daniel P. Berrange wrote:
On Tue, Sep 05, 2017 at 03:59:09PM +0200, Michal Privoznik wrote:
On 07/28/2017 10:59 AM, Daniel P. Berrange wrote:
On Fri, Jul 28, 2017 at 10:45:21AM +0200, Michal Privoznik wrote:
On 07/27/2017 03:50 PM, Daniel P. Berrange wrote:
On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
Dear list,
there is the following bug [1] which I'm not quite sure how to grasp. So there is this application/infrastructure called Kove [2] that allows you to have memory for your application stored on a distant host on the network and basically fetch the needed region on page fault. Now imagine that somebody wants to use it for backing domain memory. However, the way that the tool works is that it has some kernel module and then some userland binary that is fed the path of the mmapped file. I don't know all the details, but the point is that in order to let users use this we need to expose the mem-path used for the guest memory. I know we did not want to do this in the past, but now it looks like we don't have a way around it, do we?
We don't want to expose the concept of paths in the XML because this is a Linux-specific way to configure hugepages / shared memory. So we hide the particular path used in the internal impl of the QEMU driver, and/or via the qemu.conf global config file. I don't really want to change that approach, particularly if the only reason is to integrate with a closed-source binary like Kove.
Yep, I agree with that. However, if you read the discussion in the linked bug you'll find that they need to know which file in the memory_backing_dir (from qemu.conf) corresponds to which domain. The reporter suggested using UUID-based filenames, which I fear is not enough because one can have multiple <memory type='dimm'/> devices configured for their domain. But I guess we could go with:
${memory_backing_dir}/${domName} for generic memory
${memory_backing_dir}/${domName}_N for Nth <memory/>
This feels like it is going to lead to hell when you add in memory hotplug/unplug, with inevitable races.
BTW: IIUC they want predictable names because they need to create the files before spawning qemu so that they are picked by qemu instead of using temporary names.
I would like to know why they even need to associate particular memory files with particular QEMU processes. eg if they're just exposing a new type of tmpfs filesystem from the kernel why does it matter what each file is used for.
This might get you the answer:
https://bugzilla.redhat.com/show_bug.cgi?id=1461214#c4
So the way I understand it is that they will create the files, and provide us with paths. So luckily, we don't have to make up the paths on our own.
IOW it is pretending to be tmpfs except it is not behaving like tmpfs. This doesn't really make me any more inclined to support this closed source stuff in libvirt.
Yeah, that's my feeling too. So, what about the following: let's assume they will fix their code so that it is a proper tmpfs. Libvirt can then treat it just like it already treats hugetlbfs; for us it'll be just yet another type of hugepages. I mean, for hugepages we already create /hugepages/mount/point/libvirt/$domain for each domain, so the separation is there (even though this is considered internal impl). And since it would be a proper tmpfs, they can see the pid of the qemu process that is trying to mmap() (and take the name or whatever unique ID they want from there).
Yep, we can at least make a reasonable guarantee that all files belonging to a single QEMU process will always be within the same sub-directory. This allows the kmod to distinguish 2 files owned by separate VMs, from 2 files owned by the same VM and do what's needed. I don't see why it would need to care about naming conventions beyond the layout.
I guess what I'm trying to ask is if it was proper tmpfs, we would be okay with it, wouldn't we?
If it is indistinguishable from tmpfs/hugetlbfs from libvirt's POV, we should be fine - at most you would need an /etc/libvirt/qemu.conf change to explicitly point at the custom mount point if libvirt doesn't auto-detect the right one.
Zack, can you join the discussion and tell us if our design sounds reasonable to you?

Michal

For the Kove integration, the memory is allocated on external devices, similar to a SAN device LUN allocation. As such, each virt will have its own separate allocation, and will need its memory file(s) managed independently of other virts. We also use information from the virtual machine management layer (such as RHV, oVirt, or OpenStack) to associate VM metadata (such as VM ID, owner, or project) with the allocation, to assist the administrators with monitoring, tracking, and billing memory usage. This data also assists in maintenance and troubleshooting by identifying which VMs and which hosts are utilizing memory on a given external device. I don't believe we could (easily) get this data just from the process information of the process creating or opening files, but would need to do some significant work to trace this information from the qemu/libvirt/management layer stack. Pre-allocating the files at the integration point, such as oVirt / RHV hooks or libvirt prepare hooks, allows us to collect this information from the upper layers or domain XML directly.

We don't actually need the file path exposed within the domain XML itself. All that's really needed is just to have some mechanism for using predictable filename(s) for qemu, instead of memory filenames that are currently generated within qemu itself using mktemp. Our original proposal for this was to use the domain UUID for the filename, and using the file within the "memory_backing_dir" directory from qemu.conf. This does have the limitation of not supporting multiple memory backing files or hotplug. An adaptation of this would be to use the domain UUID (or domain name), plus the memory device id in the filename (for example: <domain_uuid>_<mem_id1>). This would utilize the same generation for the mem_id that is already in use for creating the memory device in qemu. Any other mechanism which would result in well-defined filenames would also work, as long as the filename is predictable prior to qemu startup.

We may wish to add an additional flag in qemu.conf to enable this behavior, defaulting to the current random filename generation if not specified. As the path is in qemu.conf, and the filename would be generated internally within libvirt, this avoids exposing any file paths within the domain XML, keeping it system agnostic.

An alternative would be to allow specification of the filename directly in the domain XML, while continuing to use the path from qemu.conf's memory_backing_dir directive. With this approach, libvirt would need to sanitize the filename input to prevent escaping the memory_backing_dir directory with "..". This method does expose the filenames (but not the path) in the XML, but allows the management layer (such as oVirt, RHV, or OpenStack) to control the file creation locations directly.

--
Zack Cornelius
zack.cornelius@kove.net

----- Original Message -----
From: "Michal Privoznik" <mprivozn@redhat.com> To: "Daniel P. Berrange" <berrange@redhat.com> Cc: "libvir-list" <libvir-list@redhat.com>, "Zack Cornelius" <zack.cornelius@kove.net> Sent: Thursday, September 14, 2017 6:46:48 AM Subject: Re: [libvirt] Exposing mem-path in domain XML
On 09/06/2017 01:42 PM, Daniel P. Berrange wrote:
On Wed, Sep 06, 2017 at 01:35:45PM +0200, Michal Privoznik wrote:
On 09/05/2017 04:07 PM, Daniel P. Berrange wrote:
On Tue, Sep 05, 2017 at 03:59:09PM +0200, Michal Privoznik wrote:
On 07/28/2017 10:59 AM, Daniel P. Berrange wrote:
On Fri, Jul 28, 2017 at 10:45:21AM +0200, Michal Privoznik wrote:
On 07/27/2017 03:50 PM, Daniel P. Berrange wrote:
On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
Dear list,
there is the following bug [1] which I'm not quite sure how to grasp. So there is this application/infrastructure called Kove [2] that allows you to have memory for your application stored on a distant host on the network and basically fetch the needed region on page fault. Now imagine that somebody wants to use it for backing domain memory. However, the way that the tool works is that it has some kernel module and then some userland binary that is fed the path of the mmapped file. I don't know all the details, but the point is that in order to let users use this we need to expose the mem-path used for the guest memory. I know we did not want to do this in the past, but now it looks like we don't have a way around it, do we?
We don't want to expose the concept of paths in the XML because this is a Linux-specific way to configure hugepages / shared memory. So we hide the particular path used in the internal impl of the QEMU driver, and/or via the qemu.conf global config file. I don't really want to change that approach, particularly if the only reason is to integrate with a closed-source binary like Kove.
Yep, I agree with that. However, if you read the discussion in the linked bug you'll find that they need to know which file in the memory_backing_dir (from qemu.conf) corresponds to which domain. The reporter suggested using UUID-based filenames, which I fear is not enough because one can have multiple <memory type='dimm'/> devices configured for their domain. But I guess we could go with:
${memory_backing_dir}/${domName} for generic memory
${memory_backing_dir}/${domName}_N for Nth <memory/>
This feels like it is going to lead to hell when you add in memory hotplug/unplug, with inevitable races.
BTW: IIUC they want predictable names because they need to create the files before spawning qemu so that they are picked by qemu instead of using temporary names.
I would like to know why they even need to associate particular memory files with particular QEMU processes. eg if they're just exposing a new type of tmpfs filesystem from the kernel why does it matter what each file is used for.
This might get you the answer:
https://bugzilla.redhat.com/show_bug.cgi?id=1461214#c4
So the way I understand it is that they will create the files, and provide us with paths. So luckily, we don't have to make up the paths on our own.
IOW it is pretending to be tmpfs except it is not behaving like tmpfs. This doesn't really make me any more inclined to support this closed source stuff in libvirt.
Yeah, that's my feeling too. So, what about the following: let's assume they will fix their code so that it is a proper tmpfs. Libvirt can then treat it just like it already treats hugetlbfs; for us it'll be just yet another type of hugepages. I mean, for hugepages we already create /hugepages/mount/point/libvirt/$domain for each domain, so the separation is there (even though this is considered internal impl). And since it would be a proper tmpfs, they can see the pid of the qemu process that is trying to mmap() (and take the name or whatever unique ID they want from there).
Yep, we can at least make a reasonable guarantee that all files belonging to a single QEMU process will always be within the same sub-directory. This allows the kmod to distinguish 2 files owned by separate VMs, from 2 files owned by the same VM and do what's needed. I don't see why it would need to care about naming conventions beyond the layout.
I guess what I'm trying to ask is if it was proper tmpfs, we would be okay with it, wouldn't we?
If it is indistinguishable from tmpfs/hugetlbfs from libvirt's POV, we should be fine - at most you would need an /etc/libvirt/qemu.conf change to explicitly point at the custom mount point if libvirt doesn't auto-detect the right one.
Zack, can you join the discussion and tell us if our design sounds reasonable to you?
Michal

On 09/15/2017 03:49 PM, Zack Cornelius wrote:
For the Kove integration, the memory is allocated on external devices, similar to a SAN device LUN allocation. As such, each virt will have its own separate allocation, and will need its memory file(s) managed independently of other virts. We also use information from the virtual machine management layer (such as RHV, oVirt, or OpenStack) to associate VM metadata (such as VM ID, owner, or project) with the allocation, to assist the administrators with monitoring, tracking, and billing memory usage. This data also assists in maintenance and troubleshooting by identifying which VMs and which hosts are utilizing memory on a given external device. I don't believe we could (easily) get this data just from the process information of the process creating or opening files, but would need to do some significant work to trace this information from the qemu/libvirt/management layer stack. Pre-allocating the files at the integration point, such as oVirt / RHV hooks or libvirt prepare hooks, allows us to collect this information from the upper layers or domain XML directly.
We don't actually need the file path exposed within the domain XML itself. All that's really needed is just to have some mechanism for using predictable filename(s) for qemu, instead of memory filenames that are currently generated within qemu itself using mktemp. Our original proposal for this was to use the domain UUID for the filename, and using the file within the "memory_backing_dir" directory from qemu.conf. This does have the limitation of not supporting multiple memory backing files or hotplug. An adaptation of this would be to use the domain UUID (or domain name), plus the memory device id in the filename (for example: <domain_uuid>_<mem_id1>). This would utilize the same generation for the mem_id that is already in use for creating the memory device in qemu. Any other mechanism which would result in well-defined filenames would also work, as long as the filename is predictable prior to qemu startup.
I think qemu uses random file names only if the path provided is a directory. I've tried this locally and indeed, when a full path ending with a file name was provided, qemu just used it. So I've written a patch that creates the mem-path argument with the following structure:

$memory_backing_dir/$alias

The problem with this approach is that $alias is not stable. It may change on device hot(un-)plug. Moreover, we'd like to keep the possibility of changing it in the future should we find ourselves in such a situation.
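For illustration, with memory_backing_dir = "/var/lib/libvirt/qemu/ram" such a patch would emit something along these lines (a sketch of the intended shape, not the actual patch output):

-object memory-backend-file,id=memdimm0,prealloc=yes,mem-path=/var/lib/libvirt/qemu/ram/memdimm0,share=yes,size=536870912
-device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0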
We may wish to add an additional flag in qemu.conf to enable this behavior, defaulting to the current random filename generation if not specified. As the path is in qemu.conf, and the filename would be generated internally within libvirt, this avoids exposing any file paths within the domain XML, keeping it system agnostic.
I don't think we need such a switch. Others don't really care what the file is named.
An alternative would be to allow specification of the filename directly in the domain XML, while continuing to use the path from qemu.conf's memory_backing_dir directive. With this approach, libvirt would need to sanitize the filename input to prevent escaping the memory_backing_dir directory with "..". This method does expose the filenames (but not the path) in the XML, but allows the management layer (such as oVirt, RHV, or Openstack) to control the file creation locations directly.
Well, this is an interesting idea. However, it may happen that we use memory-backend-file even with no <memory model='dimm'/> device. The code that decides this is pretty complex:

libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_command.c;hb=HEAD#l3234

Therefore we might not always have the user define the file name.

Personally, I like the idea I've implemented locally. But the problem is we can't make such a promise. Although, as a fix for a different unrelated bug we might generate the aliases at define time. If we did that, then we sort of can make the promise about the file naming. Well, sort of, because for instance for the aforementioned <memory model='dimm'/> the alias for the corresponding memory-backend-file object is 'memdimmX' and therefore the constructed path is different:

-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/hugepages2M/libvirt/qemu/13-fedora/ram-node0,share=yes,size=4294967296
-numa node,nodeid=0,cpus=0-3,memdev=ram-node0
-object memory-backend-file,id=memdimm0,prealloc=yes,mem-path=/hugepages2M/libvirt/qemu/13-fedora/memdimm0,share=yes,size=536870912
-device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0

The corresponding XML looks like this:

<domain type='kvm'>
  <name>fedora</name>
  <uuid>63840878-0deb-4095-97e6-fc444d9bc9fa</uuid>
  <maxMemory slots='16' unit='KiB'>8388608</maxMemory>
  <memory unit='KiB'>4717568</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages/>
    <access mode='shared'/>
  </memoryBacking>
  ...
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='2' threads='2'/>
    <numa>
      <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
    </numa>
  </cpu>
  ...
  <devices>
    ...
    <memory model='dimm'>
      <target>
        <size unit='KiB'>523264</size>
        <node>0</node>
      </target>
      <address type='dimm' slot='0'/>
    </memory>
  </devices>

Michal

----- Original Message -----
From: "Michal Privoznik" <mprivozn@redhat.com> To: "Zack Cornelius" <zack.cornelius@kove.net> Cc: "Daniel P. Berrange" <berrange@redhat.com>, "libvir-list" <libvir-list@redhat.com> Sent: Monday, September 25, 2017 9:17:10 AM Subject: Re: [libvirt] Exposing mem-path in domain XML
On 09/15/2017 03:49 PM, Zack Cornelius wrote:
For the Kove integration, the memory is allocated on external devices, similar to a SAN device LUN allocation. As such, each virt will have its own separate allocation, and will need its memory file(s) managed independently of other virts. We also use information from the virtual machine management layer (such as RHV, oVirt, or OpenStack) to associate VM metadata (such as VM ID, owner, or project) with the allocation, to assist the administrators with monitoring, tracking, and billing memory usage. This data also assists in maintenance and troubleshooting by identifying which VMs and which hosts are utilizing memory on a given external device. I don't believe we could (easily) get this data just from the process information of the process creating or opening files, but would need to do some significant work to trace this information from the qemu/libvirt/management layer stack. Pre-allocating the files at the integration point, such as oVirt / RHV hooks or libvirt prepare hooks, allows us to collect this information from the upper layers or domain XML directly.
We don't actually need the file path exposed within the domain XML itself. All that's really needed is just to have some mechanism for using predictable filename(s) for qemu, instead of memory filenames that are currently generated within qemu itself using mktemp. Our original proposal for this was to use the domain UUID for the filename, and using the file within the "memory_backing_dir" directory from qemu.conf. This does have the limitation of not supporting multiple memory backing files or hotplug. An adaptation of this would be to use the domain UUID (or domain name), plus the memory device id in the filename (for example: <domain_uuid>_<mem_id1>). This would utilize the same generation for the mem_id that is already in use for creating the memory device in qemu. Any other mechanism which would result in well-defined filenames would also work, as long as the filename is predictable prior to qemu startup.
I think qemu uses random file names only if the path provided is a directory. I've tried this locally and indeed, when a full path ending with a file name was provided, qemu just used it. So I've written a patch that creates the mem-path argument with the following structure:
$memory_backing_dir/$alias
The problem with this approach is that $alias is not stable. It may change on device hot(un-)plug. Moreover, we'd like to keep the possibility of changing it in the future should we find ourselves in such a situation.
We may wish to add an additional flag in qemu.conf to enable this behavior, defaulting to the current random filename generation if not specified. As the path is in qemu.conf, and the filename would be generated internally within libvirt, this avoids exposing any file paths within the domain XML, keeping it system agnostic.
I don't think we need such a switch. Others don't really care what the file is named.
An alternative would be to allow specification of the filename directly in the domain XML, while continuing to use the path from qemu.conf's memory_backing_dir directive. With this approach, libvirt would need to sanitize the filename input to prevent escaping the memory_backing_dir directory with "..". This method does expose the filenames (but not the path) in the XML, but allows the management layer (such as oVirt, RHV, or Openstack) to control the file creation locations directly.
Well, this is an interesting idea. However, it may happen that we use memory-backend-file even with no <memory model='dimm'/> device. The code that decides this is pretty complex:
libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_command.c;hb=HEAD#l3234
Therefore we might not always have the user define the file name.
Kove would only be using our integration with domains using file memory backing via the following XML, which I think simplifies the cases where memory-backend-file gets used:

<memoryBacking>
  <source type='file'/>
  <access mode='shared'/>
</memoryBacking>

The Kove integration is not compatible with huge pages, so we're just interested in the memoryBacking source='file' case, and not the hugepages cases, if that simplifies things.
Personally, I like the idea I've implemented locally. But the problem is we can't make such a promise. Although, as a fix for a different unrelated bug we might generate the aliases at define time. If we did that, then we sort of can make the promise about the file naming. Well, sort of, because for instance for the aforementioned <memory model='dimm'/> the alias for the corresponding memory-backend-file object is 'memdimmX' and therefore the constructed path is different:
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/hugepages2M/libvirt/qemu/13-fedora/ram-node0,share=yes,size=4294967296
-numa node,nodeid=0,cpus=0-3,memdev=ram-node0
-object memory-backend-file,id=memdimm0,prealloc=yes,mem-path=/hugepages2M/libvirt/qemu/13-fedora/memdimm0,share=yes,size=536870912
-device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0
The corresponding XML looks like this:
<domain type='kvm'>
  <name>fedora</name>
  <uuid>63840878-0deb-4095-97e6-fc444d9bc9fa</uuid>
  <maxMemory slots='16' unit='KiB'>8388608</maxMemory>
  <memory unit='KiB'>4717568</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages/>
    <access mode='shared'/>
  </memoryBacking>
  ...
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='2' threads='2'/>
    <numa>
      <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
    </numa>
  </cpu>
  ...
  <devices>
    ...
    <memory model='dimm'>
      <target>
        <size unit='KiB'>523264</size>
        <node>0</node>
      </target>
      <address type='dimm' slot='0'/>
    </memory>
  </devices>
With the other bugfix that defines the aliases within the XML, and your locally implemented idea, would the filenames then be predictable or readable from the XML when using memory source 'file' in all the cases: memory defined in the <memory> element, memory defined as part of the NUMA node, and memory defined as a dimm device?

--Zack

On 09/26/2017 12:00 AM, Zack Cornelius wrote:
----- Original Message -----
From: "Michal Privoznik" <mprivozn@redhat.com> To: "Zack Cornelius" <zack.cornelius@kove.net> Cc: "Daniel P. Berrange" <berrange@redhat.com>, "libvir-list" <libvir-list@redhat.com> Sent: Monday, September 25, 2017 9:17:10 AM Subject: Re: [libvirt] Exposing mem-path in domain XML
On 09/15/2017 03:49 PM, Zack Cornelius wrote:
Kove would only be using our integration with domains using file memory backing via the following XML, which I think simplifies the cases where memory-backend-file gets used:
<memoryBacking>
  <source type='file'/>
  <access mode='shared'/>
</memoryBacking>
The Kove integration is not compatible with huge pages, so we're just interested in the memoryBacking source='file' case, and not the hugepages cases, if that simplifies things.
Not really. Consider the following domain configuration:

<domain type='kvm'>
  <name>fedora</name>
  <memory unit='KiB'>4718592</memory>
  <memoryBacking>
    <source type='file'/>
    <access mode='shared'/>
  </memoryBacking>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='2' threads='2'/>
    <numa>
      <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
    </numa>
  </cpu>
  <devices>
    <memory model='dimm'>
      <target>
        <size unit='KiB'>524288</size>
        <node>0</node>
      </target>
      <alias name='dimm0'/>
      <address type='dimm' slot='0'/>
    </memory>
  </devices>
</domain>

For this configuration, two memory-backend-file objects are created. The first one is for the guest RAM (for the NUMA node), the second is for the DIMM module. While the alias for the DIMM module is exposed in the XML, the alias for the NUMA node is missing:

-object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram,share=yes,size=4294967296
-numa node,nodeid=0,cpus=0-3,memdev=ram-node0
-object memory-backend-file,id=memdimm0,mem-path=/var/lib/libvirt/qemu/ram,share=yes,size=536870912
-device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0

This complicates things IMO. I guess what I'm saying is that we can generate full paths including the device alias as the filename, but I'm not sure it is going to help, is it?

Alternatively, and I admit I don't know much about Kove, if libvirt put all the files into per-domain directories, would that be enough for you? In the example above all the files are put into a generic path. However, if the path looked like this:

$memoryBackingDir/$domain/

you could differentiate which files belong to which domain.

Michal
With the other bugfix that defines the aliases within the XML, and your locally implemented idea, would the filenames then be predictable or readable from the XML when using memory source 'file' in all the cases: memory defined in the <memory> element, memory defined as part of the NUMA node, and memory defined as a dimm device?
That's the point. No. There's no direct 1:1 relationship between the domain XML and the qemu command line, in terms of objects. To fulfill the XML, libvirt may decide to add some objects, just like we see in my example.

Michal

----- Original Message -----
From: "Michal Privoznik" <mprivozn@redhat.com> To: "Zack Cornelius" <zack.cornelius@kove.net> Cc: "libvir-list" <libvir-list@redhat.com> Sent: Friday, September 29, 2017 2:44:13 AM Subject: Re: [libvirt] Exposing mem-path in domain XML
For this configuration, two memory-backend-file objects are created. The first one is for the guest RAM (for the NUMA node), second is for the DIMM module. While the alias for the DIMM module is exposed in the XML, the alias for the NUMA node is missing:
-object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram,share=yes,size=4294967296
-numa node,nodeid=0,cpus=0-3,memdev=ram-node0
-object memory-backend-file,id=memdimm0,mem-path=/var/lib/libvirt/qemu/ram,share=yes,size=536870912
-device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0
This complicates things IMO. I guess what I'm saying is that we can generate full paths including the device alias as filename, but I'm not sure it is going to help, is it?
Looking at the examples, it seems like these automatically added memory-backend-file objects would always be generated in a consistent manner. As long as this is true, and these objects always end up with consistent filenames, our integration components can assume the filenames in a consistent way.

For instance, as long as every numa <cell> element which has memory defined generates a file named ram-node<numa_node_number>, we can handle that mapping within our integration components by parsing the XML and identifying the NUMA nodes with memory, and allocating an appropriate amount, with the correct filename.
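For illustration (an assumption about the naming on our side, not something libvirt guarantees), a domain with two NUMA cells such as:

<numa>
  <cell id='0' cpus='0-1' memory='2097152' unit='KiB'/>
  <cell id='1' cpus='2-3' memory='2097152' unit='KiB'/>
</numa>

would be expected to produce backing files named ram-node0 and ram-node1, which our components could pre-create and size from the cell memory values.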
Alternatively, and I admit I don't know much about Kove, if libvirt would put all the files into per-domain directories, would that be enough for you? In the example above all the files are put into generic path. However, if the path looked like this:
$memoryBackingDir/$domain/
You could differentiate which files belong to which domain.
For our integration to work well for users, we need to allocate the external memory prior to qemu starting, so we can give descriptive error messages as part of our integration components. The directory-based approach would require us to allocate external memory when the individual files are created or mmap'd by qemu. This may lead to a situation where there is insufficient external memory, with the failure happening within qemu and a much more generic "qemu unable to allocate memory" error. With the external memory allocations and files being created and sized prior to qemu startup, via our integration components called from the libvirt prepare hook, we can detect this situation during the libvirt hook and return a more appropriate error message.

--Zack
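For reference, the prepare hook mentioned above is the standard libvirt qemu hook, invoked roughly as below with the full domain XML on stdin (this is a sketch based on the libvirt hooks documentation, showing where our pre-allocation would plug in):

/etc/libvirt/hooks/qemu <guest_name> prepare begin -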

On 10/16/2017 11:42 PM, Zack Cornelius wrote:
----- Original Message -----
From: "Michal Privoznik" <mprivozn@redhat.com> To: "Zack Cornelius" <zack.cornelius@kove.net> Cc: "libvir-list" <libvir-list@redhat.com> Sent: Friday, September 29, 2017 2:44:13 AM Subject: Re: [libvirt] Exposing mem-path in domain XML
For this configuration, two memory-backend-file objects are created. The first one is for the guest RAM (for the NUMA node), second is for the DIMM module. While the alias for the DIMM module is exposed in the XML, the alias for the NUMA node is missing:
-object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram,share=yes,size=4294967296
-numa node,nodeid=0,cpus=0-3,memdev=ram-node0
-object memory-backend-file,id=memdimm0,mem-path=/var/lib/libvirt/qemu/ram,share=yes,size=536870912
-device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0
This complicates things IMO. I guess what I'm saying is that we can generate full paths including the device alias as filename, but I'm not sure it is going to help, is it?
Looking at the examples, it seems like these automatically added memory-backend-file objects would always be generated in a consistent manner. As long as this is true, and these objects always end up with consistent filenames, our integration components can assume the filenames in a consistent way.
For instance, as long as every numa <cell> element which has memory defined generates a file named ram-node<numa_node_number>, we can handle that mapping within our integration components by parsing the XML and identifying the NUMA nodes with memory, and allocating an appropriate amount, with the correct filename.
Okay. But I just realized that if /var/lib/libvirt/qemu/ram is the memoryBackingDir set in qemu.conf, we're in trouble. I mean, in general, memoryBackingDir is shared across all the domains. So if the first domain comes and creates/uses $memoryBackingDir/ram0, the second domain is going to construct the very same path and things will clash.

So what if I introduced yet another config option into qemu.conf that accepts a boolean value, say memoryBackingDirUsePredictive (a very bad name, I'm all ears for a better one), and if it is enabled, libvirt constructs the following path instead:

$memoryBackingDir/libvirt/qemu/$shortName/ramN

where $shortName is the result of virDomainObjGetShortName(), which in 99% of cases is ${domainID}-${domainName}.

BTW, I'm still not fully convinced this is a good idea. But if we document that the constructed path can change across releases (if we find ourselves in need to do so), we should be safe. So let me write the patch and post it onto the list.

Michal
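For illustration, assuming the option keeps the name proposed above (both the name and the exact syntax may well change in the final patch), qemu.conf would gain something like:

memory_backing_dir = "/var/lib/libvirt/qemu/ram"
memoryBackingDirUsePredictive = 1

and a domain with ID 13 named "fedora" would then get its guest RAM backed by a path along the lines of:

/var/lib/libvirt/qemu/ram/libvirt/qemu/13-fedora/ram0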
Participants (3):
- Daniel P. Berrange
- Michal Privoznik
- Zack Cornelius