[libvirt] Designing NVDIMM & memory-backend-file

Dear list,

I'd like to fix the following bug [1]. Long story short, the only way to have a domain use a memory-backend-file object is to configure hugepages, either for the whole domain under the <memoryBacking/> element, or in a <memory/> device:

  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='1'/>
    </hugepages>
  </memoryBacking>

  -object memory-backend-file,id=ram-node1,prealloc=yes,\
  mem-path=/dev/hugepages2M/libvirt/qemu,size=1073741824,host-nodes=0-3,\
  policy=bind \
  -numa node,nodeid=1,cpus=1,memdev=ram-node1 \

  <memory model='dimm'>
    <source>
      <nodemask>1-3</nodemask>
      <pagesize unit='KiB'>2048</pagesize>
    </source>
    <target>
      <size unit='KiB'>524287</size>
      <node>0</node>
    </target>
  </memory>

  -object memory-backend-file,id=memdimm1,prealloc=yes,\
  mem-path=/dev/hugepages2M/libvirt/qemu,size=536870912,host-nodes=1-3,policy=bind \
  -device pc-dimm,node=0,memdev=memdimm1,id=dimm1 \

Now, there's a request in the BZ to allow applications to use memory-backend-file but with a different backend, or rather a different path for backing the memory, e.g. shm, which is usually mounted at /dev/shm. I'd like to consult my idea before I dig deep into the patches.

My idea is to extend the <memory/> device we have, more precisely its <source/> element, to allow something like this:

  <memory model='dimm'>
    <source>
      <path>/dev/shm</path>
    </source>
    <target/>
  </memory>

This way we can allow users to pass an arbitrary path to the memory-backend-file. Also, the amount of code needing change would be fairly small O:-)

And while I'm working on this, I want to take a look at the NVDIMM feature that qemu introduced recently [2]. The way I understand it, it is very similar to my problem from above. The only difference (for consistency) would be that the memory model would be something other than 'dimm' ('nvdimm' perhaps?):

  <memory model='nvdimm'>
    <source>
      <path>/tmp/nvdimm1</path>
    </source>
    <target>
      <size unit='GiB'>10</size>
    </target>
  </memory>

  -machine pc,nvdimm -m 8G,maxmem=100G,slots=100 \
  -object memory-backend-file,id=mem1,share,mem-path=/tmp/nvdimm1,size=10G \
  -device nvdimm,memdev=mem1,id=nv1

Please let me know what you think of this. Thank you for saving me from doing a couple of useless iterations of patches to get the design right :-)

Michal

1: https://bugzilla.redhat.com/show_bug.cgi?id=1214369
2: http://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg07068.html
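A side note on the 'dimm' example above: the XML asks for 524287 KiB while the generated command line says size=536870912 (exactly 512 MiB). One plausible reading, which is an assumption and not something stated in this thread, is that the requested size gets rounded up to an alignment boundary before it is handed to qemu. A minimal Python sketch of that arithmetic, with a hypothetical helper name and an assumed 1 MiB alignment:

```python
KIB = 1024


def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple of `multiple`."""
    return -(-value // multiple) * multiple


def dimm_size_bytes(size_kib: int, align_kib: int = 1024) -> int:
    """Convert a <size unit='KiB'> value to bytes after aligning it.

    The 1 MiB default alignment is an assumption made for this sketch.
    """
    return round_up(size_kib, align_kib) * KIB


print(dimm_size_bytes(524287))  # 536870912
```

Under that assumption, 524287 KiB becomes 524288 KiB, which is the 536870912 bytes seen on the command line.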

On Fri, Jul 01, 2016 at 04:35:18PM +0200, Michal Privoznik wrote: [...]
> And while I'll be working on this, I want to take a look at NVDIMM feature that qemu introduced recently [2]. The way I understand it, it is very similar to my problem from above. The only difference (for the consistence) would be that the memory model would be something else than 'dimm' ('nvdimm' perhaps?)
>
>   <memory model='nvdimm'>
>     <source>
>       <path>/tmp/nvdimm1</path>
>     </source>
>     <target>
>       <size unit='GiB'>10</size>
>     </target>
>   </memory>
>
>   -machine pc,nvdimm -m 8G,maxmem=100G,slots=100 \
>   -object memory-backend-file,id=mem1,share,mem-path=/tmp/nvdimm1,size=10G \
>   -device nvdimm,memdev=mem1,id=nv1
This patch shows how libguestfs would like to construct the qemu command line to enable DAX support:

https://www.redhat.com/archives/libguestfs/2016-May/msg00138.html

Note we need to control share (set share=off specifically).

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org
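To make the mapping under discussion concrete, here is a rough Python sketch of how the proposed <memory model='nvdimm'> element could be turned into the -object/-device pair quoted above, with share left under the caller's control as Rich asks. This is not libvirt code: the function name, the default ids and the optional share parameter are all hypothetical, and the XML shape is only the proposal from this thread.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# qemu accepts K/M/G suffixes on size= (the thread itself uses size=10G).
UNIT_SUFFIX = {"KiB": "K", "MiB": "M", "GiB": "G"}

EXAMPLE_XML = """
<memory model='nvdimm'>
  <source>
    <path>/tmp/nvdimm1</path>
  </source>
  <target>
    <size unit='GiB'>10</size>
  </target>
</memory>
"""


def nvdimm_args(xml_text: str, mem_id: str = "mem1", dev_id: str = "nv1",
                share: Optional[str] = None) -> List[str]:
    """Build qemu arguments from the proposed nvdimm XML (hypothetical helper)."""
    mem = ET.fromstring(xml_text)
    if mem.get("model") != "nvdimm":
        raise ValueError("expected <memory model='nvdimm'>")

    path = mem.findtext("source/path")
    size = mem.find("target/size")
    if path is None or size is None or not size.text:
        raise ValueError("nvdimm needs <source><path> and <target><size>")

    props = ["memory-backend-file", f"id={mem_id}"]
    if share is not None:
        # Emit share explicitly so callers can request share=off.
        props.append(f"share={share}")
    props.append(f"mem-path={path.strip()}")
    props.append(f"size={size.text.strip()}{UNIT_SUFFIX[size.get('unit', 'KiB')]}")

    return ["-object", ",".join(props),
            "-device", f"nvdimm,memdev={mem_id},id={dev_id}"]


print(" ".join(nvdimm_args(EXAMPLE_XML, share="off")))
```

Leaving share unset omits the property entirely, so the sketch never silently picks a sharing mode on the user's behalf.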

On Fri, Jul 01, 2016 at 04:35:18PM +0200, Michal Privoznik wrote:
[...]
> Now, there's a request in the BZ to allow applications to use the memory-backend-file but let them use different backend, well different path for backing the memory, e.g. shm which is usually mounted at /dev/shm. And I'd like to consult my idea before I dig deep in the patches.
>
> My idea is to extend <memory/> device we have, more precisely <source/> element to allow something like this:
>
>   <memory model='dimm'>
>     <source>
>       <path>/dev/shm</path>
>     </source>
>     <target/>
>   </memory>
>
> This way we can allow users to pass an arbitrary path to the memory-backend-file. Also the amount of code needing change would be fairly small O:-)
I'm not really a fan of exposing file paths for shm in the XML. This filesystem exposure of shm is a Linux-specific concept which other UNIXes with shm support don't have, afaik. This is the same reason why we don't expose the actual hugepages path in the XML: we just let the user request hugepages and libvirt figures out the Linux-specific way to enable it.

It is ok to allow apps to request anonymous vs file-backed memory as a general concept, but I don't think we should let them request specific paths with specific semantics. I.e., with your suggestion here an app could request /dev/hugepages as the path and turn on huge pages for the guest via the back door.

I much prefer this proposed design to allow requesting file-backed memory:

https://www.redhat.com/archives/libvir-list/2016-June/msg01651.html

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
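The objection above amounts to this: the application names an abstract kind of backing and the management layer picks the host-specific path. A small Python sketch of that split, purely illustrative. The kind names ("anonymous", "hugepages", "shm") and the function are hypothetical; the two paths are the ones that appear earlier in this thread.

```python
from typing import Optional

# Host-specific details live in one place and never appear in the XML.
# Both paths are taken from the examples in this thread.
BACKING_PATHS = {
    "hugepages": "/dev/hugepages2M/libvirt/qemu",
    "shm": "/dev/shm",
}


def resolve_mem_path(kind: str) -> Optional[str]:
    """Map an abstract backing request to a mem-path, or None for anonymous memory."""
    if kind == "anonymous":
        return None
    try:
        return BACKING_PATHS[kind]
    except KeyError:
        raise ValueError(f"unsupported memory backing: {kind!r}") from None


print(resolve_mem_path("shm"))        # /dev/shm
print(resolve_mem_path("anonymous"))  # None
```

Because callers can only pick from the known kinds, there is no way to pass /dev/hugepages as a plain path and get huge pages through the back door.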

On 01.07.2016 17:06, Daniel P. Berrange wrote:
[...]
> I'm not really a fan of exposing file paths for shm in the XML. This filesystem exposure of shm is a linux specific concept which other UNIX with shm support don't do afaik. This is the same reason why we don't expose the actual hugepages path in the XML, just let the user request hugepages and libvirt figures out the linux-specific way to enable it.
Okay, fair enough. And I can see it working with /dev/shm. But for NVDIMM we certainly have to let users specify the path to the DIMM module. That is why I suggested what I suggested. But maybe I'm joining two separate problems together.

So for the /dev/shm part there seems to be some movement. But for the NVDIMM module, any ideas how to express that one?

Michal

On Mon, Jul 04, 2016 at 11:21:20AM +0200, Michal Privoznik wrote:
[...]
> Okay, fair enough. And I can see it working with /dev/shm. But for NVDIMM we certainly have to let users specify path to the DIMM module. Therefore I suggested what I've suggested. But maybe I'm joining two separate problems together.
>
> So for the /dev/shm part there seems to be some movement. But for the NVDIMM module - any ideas how to express that one?
Ok, I guess it is sensible to view the NVDIMM as just another type of block device, so from that POV it is reasonable to allow the user to specify a path to the backing store for NVDIMM, in the same way we do with <disk/> sources.

So we just need to make clear the distinction between NVDIMM config and the general ability to use volatile shared memory backing with regular DIMMs.

Regards,
Daniel
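The distinction drawn here could be enforced with a check along these lines: a backing path is accepted (and required) for 'nvdimm', and rejected for a regular 'dimm'. This is a Python sketch of the rule as discussed in this thread, not libvirt's actual validation; the function name and error messages are made up for illustration.

```python
from typing import Optional


def check_source_path(model: str, path: Optional[str]) -> None:
    """Raise ValueError if <source><path> is used inconsistently with the model."""
    if model == "nvdimm":
        # NVDIMM is treated like a block device: the user names its backing store.
        if not path:
            raise ValueError("model 'nvdimm' requires <source><path>")
    elif path:
        # Regular DIMMs request backing abstractly; no arbitrary paths.
        raise ValueError(f"<source><path> is not allowed for model {model!r}")


check_source_path("nvdimm", "/tmp/nvdimm1")  # accepted
check_source_path("dimm", None)              # accepted
```

With this rule, the earlier <memory model='dimm'> example carrying <path>/dev/shm</path> would be refused, while the nvdimm example with /tmp/nvdimm1 passes.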
participants (3)
- Daniel P. Berrange
- Michal Privoznik
- Richard W.M. Jones