On Mon, Sep 2, 2019 at 10:10 AM Seema Pandit <pan.blr.17(a)gmail.com> wrote:
After adding the memoryBacking tag to the XML as below (in addition to the other XML
changes to add nvdimm), virsh could allocate AD memory larger than the system RAM and VMs
could start successfully.
<memoryBacking>
<access mode='shared'/>
<discard/>
</memoryBacking>
This adds share=yes to the command line:
-object
memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/mnt/pmem0/file1,share=yes,size=414464344064
-device nvdimm,node=0,label-size=131072,memdev=memnvdimm0,id=nvdimm0,slot=0
For reference, the qemu command line where the VM starts quickly:
qemu-system-x86_64 \
-name qemu-gold29 \
-drive file=/var/lib/libvirt/images/gold29-ad.qcow2,format=qcow2,index=0,media=disk \
-m 2G,slots=4,maxmem=428G \
-smp 2 \
-machine pc,accel=kvm,nvdimm=on \
-enable-kvm \
-object memory-backend-file,id=pmem1,share=on,mem-path=/mnt/pmem0/file1,size=386G,align=4K \
-device nvdimm,memdev=pmem1,id=nv1 \
-daemonize
Qemu command line generated from virsh (please note the VM now starts with this command
line, which has share=yes):
/usr/bin/qemu-system-x86_64 -machine accel=kvm -name guest=mix-test,debug-threads=on -S
-object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-18-mix-test/master-key.aes
-machine pc-i440fx-3.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off,nvdimm=on -cpu
Skylake-Server-IBRS,hypervisor=on -m size=2097152k,slots=16,maxmem=419430400k -realtime
mlock=off -smp 2,sockets=2,cores=1,threads=1 -object
memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qem
/ram/libvirt/qemu/18-mix-test/ram-node0,discard-data=yes,share=yes,size=2147483648 -numa
node,nodeid=0,cpus=0-1,memdev=ram-node0 -object
memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/mnt/pmem0/file1,share=yes,size=414464344064
-device nvdimm,node=0,label-size=131072,memdev=memnvdimm0,id=nvdimm0,slot=0 -uuid
318c0529-0330-460b-8d0a-3b253e9decdd -no-user-config -nodefaults -chardev
socket,id=charmonitor,fd=32,server,nowait -mon chardev=charmonitor,id=monitor,mode=control
-rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown
-global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device
ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device
ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device
ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive
file=/var/lib/libvirt/images/mix-test.qcow2,format=qcow2,if=none,id=drive-virtio-disk0
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-drive if=none,id=drive-ide0-0-0,readonly=on -device
ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=34,id=hostnet0
-device e1000,netdev=hostnet0,id=net0,mac=52:54:00:06:db:55,bus=pci.0,addr=0xa -chardev
pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev
socket,id=charchannel0,fd=35,server,nowait -device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
-chardev spicevmc,id=charchannel1,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
-device usb-tablet,id=input0,bus=usb.0,port=1 -spice
port=5901,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on
-device
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
-device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev
spicevmc,id=charredir0,name=usbredir -device
usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev
spicevmc,id=charredir1,name=usbredir -device
usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -object
rng-random,id=objrng0,filename=/dev/urandom -device
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x9 -sandbox
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
VM start takes longer than with the plain qemu command line. Any thoughts? Why is
prealloc=yes on by default for nvdimm? Any other important deltas?
The "prealloc" option is a "pay me now" vs "pay me later" type of
decision. If the guest workload can absorb fault latency at run time,
then disable prealloc; if it would prefer more predictable latency and
to pay all of the fault penalty up front, then specify prealloc.
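
To make the "pay me now" side concrete, here is a minimal sketch of
prefaulting a file-backed mapping by touching one byte in every page
up front. It only illustrates the idea and is not QEMU's actual
prealloc code; the helper names prefault() and demo() and the use of a
temporary file are assumptions made for the example.

```python
import mmap
import tempfile


def prefault(mm, length, page_size=mmap.PAGESIZE):
    """Touch one byte in every page so all page faults are taken now."""
    touched = 0
    for offset in range(0, length, page_size):
        # Read the byte and write the same value back: the contents are
        # unchanged, but the page is faulted in (and allocated) up front.
        mm[offset] = mm[offset]
        touched += 1
    return touched


def demo(pages=64):
    """Map a scratch file of `pages` pages and prefault all of it."""
    length = pages * mmap.PAGESIZE
    with tempfile.TemporaryFile() as f:
        f.truncate(length)  # size the backing file before mapping it
        with mmap.mmap(f.fileno(), length) as mm:
            return prefault(mm, length)


if __name__ == "__main__":
    print("pages touched before first use:", demo())
```

The cost of this loop grows with the size of the mapping, which is why
a region of several hundred GB takes noticeably longer to start when
it is preallocated.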
The "share" parameter, if it means MAP_SHARED vs MAP_PRIVATE, must be
set to "yes" if the guest expects persistence. MAP_PRIVATE is
unacceptable for emulating persistent memory because writes to a
private mapping are never written back to the backing file and, as you
have seen, require volatile DRAM backing even in the DAX case.
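
The difference is easy to see outside of QEMU. The following is a
small sketch, assuming a Unix host and an ordinary temporary file
rather than a DAX file; the helper name write_through_mapping() is
made up for the example. It writes through a mapping and then reads
the file back with a normal read().

```python
import mmap
import os
import tempfile


def write_through_mapping(flags):
    """Write via an mmap created with `flags`; return the file's first 4 bytes."""
    fd, path = tempfile.mkstemp()
    try:
        # Fill one page of the backing file with known contents.
        os.write(fd, b"old!" + b"\0" * (mmap.PAGESIZE - 4))
        mm = mmap.mmap(fd, mmap.PAGESIZE, flags=flags,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
        mm[:4] = b"new!"
        if flags & mmap.MAP_SHARED:
            mm.flush()  # push the shared pages back to the file
        mm.close()
        # Read the file through the normal I/O path, not the mapping.
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 4)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print("MAP_SHARED :", write_through_mapping(mmap.MAP_SHARED))
    print("MAP_PRIVATE:", write_through_mapping(mmap.MAP_PRIVATE))
```

With MAP_SHARED the file contains the new data; with MAP_PRIVATE the
write lands in a private copy-on-write page and the file still holds
the old data once the mapping is gone.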