[libvirt] Error when creating VM with persistent memory

I am trying to create a VM with persistent memory attached, using virsh. It fails when the persistent memory namespace is larger than the available system memory. Please see the error below; prealloc=yes is set on the generated qemu command line. As I understand it, prealloc should not be needed for a dax type namespace. Is adding persistent memory supported in virsh? If so, how do I set prealloc=no? Is anything else needed in the xml for persistent memory? Thanks.
1. dax namespace (400GB), which is larger than system memory.
# ndctl create-namespace -t pmem -m devdax --align=4k -s 400G
{
  "dev":"namespace1.0",
  "mode":"devdax",
  "map":"dev",
  "size":"393.75 GiB (422.78 GB)",
  "uuid":"00670b70-7871-4749-a901-9184b7a388d8",
  "raw_uuid":"a5a1d3f8-01e4-45a2-bad1-38fd09bf4672",
  "daxregion":{
    "id":1,
    "size":"393.75 GiB (422.78 GB)",
    "align":4096,
    "devices":[
      {
        "chardev":"dax1.0",
        "size":"393.75 GiB (422.78 GB)"
      }
    ]
  },
  "numa_node":1
}
2. Relevant parts of xml
<domain type='kvm'>
  <name>test</name>
  <uuid>02c49a19-ce9e-4320-9a6b-1ebf0913a10e</uuid>
  <maxMemory slots='16' unit='KiB'>459800576</maxMemory>
  <memory unit='KiB'>65536000</memory>
  <currentMemory unit='KiB'>65536000</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <os>
  :
  :
  <cpu mode='custom' match='exact' check='partial'>
    <feature policy='require' name='hypervisor'/>
    <numa>
      <cell id='0' cpus='0-3' memory='65536000' unit='KiB'/>
    </numa>
  </cpu>
  :
  :
  <memory model='nvdimm'>
    <source>
      <path>/dev/dax1.0</path>
    </source>
    <target>
      <size unit='GiB'>376</size>
      <node>0</node>
      <label>
        <size unit='KiB'>128</size>
      </label>
    </target>
    <address type='dimm' slot='0'/>
  </memory>
3. Error:
# virsh start test
error: Failed to start domain test
error: internal error: qemu unexpectedly closed the monitor: ftruncate: Invalid argument
2019-08-22T04:16:08.744402Z qemu-system-x86_64: -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/dev/dax1.0,size=403726925824: unable to map backing store for guest RAM: Cannot allocate memory
Virsh version 4.7.0, qemu version 3.0.0
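As a cross-check on the numbers above: the size= value in the qemu error is exactly the 376 GiB nvdimm target from the xml, and maxMemory equals the guest RAM plus that nvdimm. A small Python sketch of the arithmetic follows; the helper names are made up for illustration and are not part of libvirt or qemu.

```python
KIB = 1024
GIB = 1024 ** 3


def kib_to_bytes(n_kib):
    """Convert a KiB count (as used in the domain xml) to bytes."""
    return n_kib * KIB


def gib_to_bytes(n_gib):
    """Convert a GiB count (as used in the nvdimm <target>) to bytes."""
    return n_gib * GIB


nvdimm = gib_to_bytes(376)            # <size unit='GiB'>376</size>
guest_ram = kib_to_bytes(65536000)    # <memory unit='KiB'>65536000</memory>
max_memory = kib_to_bytes(459800576)  # <maxMemory ... unit='KiB'>459800576</maxMemory>
label = kib_to_bytes(128)             # <label><size unit='KiB'>128</size></label>

print(nvdimm)                          # 403726925824, the size= in the error
print(guest_ram + nvdimm == max_memory)  # True
print(label)                           # 131072
```

So the backend qemu is asked to map is the full 376 GiB, regardless of how much DRAM the host has.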

On 8/27/19 9:58 AM, Seema Pandit wrote:
error: internal error: qemu unexpectedly closed the monitor: ftruncate: Invalid argument
2019-08-22T04:16:08.744402Z qemu-system-x86_64: -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/dev/dax1.0,size=403726925824: unable to map backing store for guest RAM: Cannot allocate memory
I wonder if dropping 'prealloc' would help. This error message comes from a phase before 'prealloc' is processed. You can try the following patch:
diff --git i/src/qemu/qemu_command.c w/src/qemu/qemu_command.c
index 373ebd6d1a..c375955eab 100644
--- i/src/qemu/qemu_command.c
+++ w/src/qemu/qemu_command.c
@@ -3468,7 +3468,7 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps,
     }
     if (virJSONValueObjectAdd(props,
-                              "B:prealloc", prealloc,
+                              "B:prealloc", prealloc ? false : false,
                               "s:mem-path", memPath,
                               NULL) < 0)
         goto cleanup;
Michal

Hi Michal,
Thank you for the reply. I was having issues compiling the qemu code on Fedora 29, so instead of dropping prealloc in virsh, I tried adding prealloc=yes on the qemu command line. prealloc=yes works. It does not lead to using more system memory when using DAX.
+Dan
Here are the steps:
ndctl create-namespace -t pmem -m fsdax --align=4k -s 400G
mkfs.ext4 /dev/pmem0
mount -o dax /dev/pmem0 /mnt/pmem0
dd if=/dev/zero of=/mnt/pmem0/file1 bs=4k count=104857600
[root@system-name]# dd if=/dev/zero of=/mnt/pmem0/file1 bs=4k count=104857600
dd: error writing '/mnt/pmem0/file1': No space left on device
101313980+0 records in
101313979+0 records out
414982057984 bytes (415 GB, 386 GiB) copied, 946.495 s, 438 MB/s
A slightly smaller file is created than asked for.
[root@system-name]# du -sh
387G .
Sample qemu command line which works:
qemu-system-x86_64 \
  -name test \
  -drive file=/var/lib/libvirt/images/test-ad.qcow2,format=qcow2,index=0,media=disk \
  -m 2G,slots=4,maxmem=428G \
  -smp 2 \
  -machine pc,accel=kvm,nvdimm=on \
  -enable-kvm \
  -object memory-backend-file,id=pmem1,prealloc=yes,share=on,mem-path=/mnt/pmem0/file1,size=386G,align=4K \
  -device nvdimm,memdev=pmem1,id=nv1 \
  -daemonize
So the prealloc option works. But passing /mnt/pmem0/file1 to virsh as the nvdimm still fails to start the VM; it errors out saying it cannot allocate that much memory.

On 8/31/19 7:33 PM, Seema Pandit wrote:
sample qemu command line which works:
qemu-system-x86_64 \
  -name test \
  -drive file=/var/lib/libvirt/images/test-ad.qcow2,format=qcow2,index=0,media=disk \
  -m 2G,slots=4,maxmem=428G \
  -smp 2 \
  -machine pc,accel=kvm,nvdimm=on \
  -enable-kvm \
  -object memory-backend-file,id=pmem1,prealloc=yes,share=on,mem-path=/mnt/pmem0/file1,size=386G,align=4K \
  -device nvdimm,memdev=pmem1,id=nv1 \
  -daemonize
Looks like the only difference between the cmd line generated by libvirt and this one is align=4K. To confirm that, can you share the full qemu cmd line as generated by libvirt please? Libvirt does not do anything special with guest memory, so this is a matter of the qemu cmd line, and that's why we need to see what's different, what works and what doesn't. Then we have some lead to understand the problem IMO.
Michal
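One way to compare two -object arguments property by property is sketched below in Python. This is only an illustration: the function names are hypothetical, the parser ignores qemu's ",," escaping of literal commas, and the libvirt-side string is the one from the first error in this thread (the /dev/dax1.0 attempt), not the full generated command line being asked for here.

```python
def parse_object_props(arg):
    """Split a qemu -object argument into (type_name, {property: value}).

    Simplified: does not handle ',,' escapes inside values.
    """
    type_name, _, rest = arg.partition(",")
    props = {}
    for item in rest.split(","):
        if not item:
            continue
        key, _, value = item.partition("=")
        props[key] = value
    return type_name, props


def only_in_first(first, second):
    """Return the property names present in `first` but missing in `second`."""
    return sorted(set(first) - set(second))


manual = ("memory-backend-file,id=pmem1,prealloc=yes,share=on,"
          "mem-path=/mnt/pmem0/file1,size=386G,align=4K")
libvirt = ("memory-backend-file,id=memnvdimm0,prealloc=yes,"
           "mem-path=/dev/dax1.0,size=403726925824")

_, manual_props = parse_object_props(manual)
_, libvirt_props = parse_object_props(libvirt)

print(only_in_first(manual_props, libvirt_props))  # ['align', 'share']
print(only_in_first(libvirt_props, manual_props))  # []
```

For these two strings the manual command line carries both align= and share=, so align=4K may not be the only delta worth checking.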

After adding the memoryBacking tag in the xml as below (in addition to the other xml changes that add the nvdimm), virsh could allocate AD memory larger than the system RAM and VMs could start successfully.
<memoryBacking>
  <access mode='shared'/>
  <discard/>
</memoryBacking>
This adds share=yes to the command line:
-object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/mnt/pmem0/file1,share=yes,size=414464344064 -device nvdimm,node=0,label-size=131072,memdev=memnvdimm0,id=nvdimm0,slot=0
For reference, the qemu command line where the VM starts quickly:
qemu-system-x86_64 \
  -name qemu-gold29 \
  -drive file=/var/lib/libvirt/images/gold29-ad.qcow2,format=qcow2,index=0,media=disk \
  -m 2G,slots=4,maxmem=428G \
  -smp 2 \
  -machine pc,accel=kvm,nvdimm=on \
  -enable-kvm \
  -object memory-backend-file,id=pmem1,share=on,mem-path=/mnt/pmem0/file1,size=386G,align=4K \
  -device nvdimm,memdev=pmem1,id=nv1 \
  -daemonize
Qemu command line generated from virsh (please note the VM now starts with this command line, share=yes):
/usr/bin/qemu-system-x86_64
  -machine accel=kvm
  -name guest=mix-test,debug-threads=on
  -S
  -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-18-mix-test/master-key.aes
  -machine pc-i440fx-3.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off,nvdimm=on
  -cpu Skylake-Server-IBRS,hypervisor=on
  -m size=2097152k,slots=16,maxmem=419430400k
  -realtime mlock=off
  -smp 2,sockets=2,cores=1,threads=1
  -object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qem /ram/libvirt/qemu/18-mix-test/ram-node0,discard-data=yes,share=yes,size=2147483648
  -numa node,nodeid=0,cpus=0-1,memdev=ram-node0
  -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/mnt/pmem0/file1,share=yes,size=414464344064
  -device nvdimm,node=0,label-size=131072,memdev=memnvdimm0,id=nvdimm0,slot=0
  -uuid 318c0529-0330-460b-8d0a-3b253e9decdd
  -no-user-config
  -nodefaults
  -chardev socket,id=charmonitor,fd=32,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control
  -rtc base=utc,driftfix=slew
  -global kvm-pit.lost_tick_policy=delay
  -no-hpet
  -no-shutdown
  -global PIIX4_PM.disable_s3=1
  -global PIIX4_PM.disable_s4=1
  -boot strict=on
  -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7
  -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5
  -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1
  -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6
  -drive file=/var/lib/libvirt/images/mix-test.qcow2,format=qcow2,if=none,id=drive-virtio-disk0
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
  -drive if=none,id=drive-ide0-0-0,readonly=on
  -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0
  -netdev tap,fd=34,id=hostnet0
  -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:06:db:55,bus=pci.0,addr=0xa
  -chardev pty,id=charserial0
  -device isa-serial,chardev=charserial0,id=serial0
  -chardev socket,id=charchannel0,fd=35,server,nowait
  -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
  -chardev spicevmc,id=charchannel1,name=vdagent
  -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
  -device usb-tablet,id=input0,bus=usb.0,port=1
  -spice port=5901,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on
  -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
  -device intel-hda,id=sound0,bus=pci.0,addr=0x4
  -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0
  -chardev spicevmc,id=charredir0,name=usbredir
  -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2
  -chardev spicevmc,id=charredir1,name=usbredir
  -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8
  -object rng-random,id=objrng0,filename=/dev/urandom
  -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x9
  -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
  -msg timestamp=on
VM start takes longer than with plain qemu. Any thoughts? Why is prealloc=yes on by default for nvdimm? Any other important deltas?

On Mon, Sep 2, 2019 at 10:10 AM Seema Pandit <pan.blr.17@gmail.com> wrote:
VM start takes longer than with plain qemu. Any thoughts? Why is prealloc=yes on by default for nvdimm? Any other important deltas?
The "prealloc" option is a "pay me now" vs "pay me later" type of decision. If the guest workload is OK with absorbing fault latency at run time, then disable prealloc; if it would prefer more predictable latency and to pay all the fault penalty up front, then specify prealloc.
The "shared" parameter, if it means MAP_SHARED vs MAP_PRIVATE, must be set to "yes" if the guest expects persistence. MAP_PRIVATE is otherwise unacceptable for emulating persistent memory because writes to private memory are discarded, and, as you have seen, it requires volatile DRAM backing even in the DAX case.
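The MAP_SHARED vs MAP_PRIVATE difference can be seen with an ordinary file. The Python sketch below is an illustration only: it uses a temporary file rather than a DAX file or qemu, and the function name is made up. It writes through a mapping and then reads the file back with a plain read.

```python
import mmap
import os
import tempfile


def write_through_mapping(flags):
    """Write b"new!" through a mapping created with `flags`.

    Returns the first four bytes of the backing file afterwards, read with
    pread() rather than through the mapping.
    """
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"old!" + b"\0" * (mmap.PAGESIZE - 4))
        f.flush()
        m = mmap.mmap(f.fileno(), mmap.PAGESIZE, flags=flags,
                      prot=mmap.PROT_READ | mmap.PROT_WRITE)
        m[:4] = b"new!"
        m.close()
        return os.pread(f.fileno(), 4, 0)


print(write_through_mapping(mmap.MAP_SHARED))   # b'new!'
print(write_through_mapping(mmap.MAP_PRIVATE))  # b'old!'
```

With MAP_PRIVATE the write lands in a private copy of the page held in ordinary memory and never reaches the file, which matches the point above that a private mapping cannot provide persistence and needs DRAM behind it.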

Actually there is still some issue around this. When trying to start another VM, and so using even more pmem, there is a different error now. It is copied below.
[]# virsh start manual_clone
error: Failed to start domain manual_clone
error: internal error: qemu unexpectedly closed the monitor: ftruncate: Invalid argument
2019-09-03T21:28:41.031924Z qemu-system-x86_64: -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/dev/dax1.1,share=yes,size=73014444032: os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM
The old error was:
# virsh start test
error: Failed to start domain test
error: internal error: qemu unexpectedly closed the monitor: ftruncate: Invalid argument
2019-08-22T04:16:08.744402Z qemu-system-x86_64: -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/dev/dax1.0,size=403726925824: unable to map backing store for guest RAM: Cannot allocate memory

On Wed, Sep 4, 2019 at 12:56 AM Seema Pandit <pan.blr.17@gmail.com> wrote:
Actually there is still some issue around this. When trying to start another VM, so using even more pmem, there is a different issue now. Error copied below.
[]# virsh start manual_clone
error: Failed to start domain manual_clone
error: internal error: qemu unexpectedly closed the monitor: ftruncate: Invalid argument
2019-09-03T21:28:41.031924Z qemu-system-x86_64: -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/dev/dax1.1,share=yes,size=73014444032: os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM
/dev/dax instances do not support the ftruncate syscall, and device-dax instances are already fully allocated by definition. The prealloc option is simply invalid for /dev/dax targets.
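The ftruncate failure is not specific to device-dax: character devices in general cannot be resized. The Python sketch below uses /dev/null as a stand-in character device, since a /dev/dax node is unlikely to be available for testing; the function names are hypothetical. On Linux the errno is expected to be EINVAL, the same "Invalid argument" seen in the log above.

```python
import errno
import os
import stat


def is_char_device(path):
    """Return True if `path` is a character device node."""
    return stat.S_ISCHR(os.stat(path).st_mode)


def ftruncate_errno(path, size):
    """Try to ftruncate `path` to `size` bytes.

    Returns None on success, otherwise the errno of the failure.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        os.ftruncate(fd, size)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)


print(is_char_device("/dev/null"))
err = ftruncate_errno("/dev/null", 0)
print(errno.errorcode.get(err, err))
```

A management layer could use a check like is_char_device() to decide that size adjustment and preallocation make no sense for the given mem-path.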

On 9/4/19 6:31 PM, Dan Williams wrote:
/dev/dax instances do not support the ftruncate syscall, and device-dax instances are already fully allocated by definition. The prealloc option is simply invalid for /dev/dax targets.
Ah, in that case we need an XML knob that, if set, doesn't put prealloc=yes onto the qemu cmd line. Unfortunately, I don't think I have any spare time to implement that, so please be my guest.
Michal
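The effect such a knob would have on the generated argument can be sketched as follows. This is a hypothetical Python illustration, not libvirt code (the real logic lives in the C function qemuBuildMemoryBackendProps shown in the patch earlier in the thread); the function and its parameters are invented for the example, and the property order simply mirrors the generated command lines quoted above.

```python
def memory_backend_arg(obj_id, mem_path, size, share=True, prealloc=True):
    """Build a memory-backend-file -object argument string.

    `prealloc=False` models the proposed knob: the property is left out
    entirely instead of being emitted as prealloc=yes.
    """
    props = [f"id={obj_id}"]
    if prealloc:
        props.append("prealloc=yes")
    props.append(f"mem-path={mem_path}")
    if share:
        props.append("share=yes")
    props.append(f"size={size}")
    return "memory-backend-file," + ",".join(props)


# Device-dax backend from the failing start, with the knob set.
print(memory_backend_arg("memnvdimm0", "/dev/dax1.1", 73014444032,
                         prealloc=False))
```

With prealloc=False the example prints the same argument as in the error above minus the prealloc=yes property.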
participants (4): Dan Williams, Michal Privoznik, Michal Prívozník, Seema Pandit