[libvirt] Question : Configuring a VM with backing 1G huge pages across 2 NUMA nodes


[CCing Martin Kletzander]

On 12.09.2014 08:25, Vinod, Chegu wrote:
Hi Michal,
I have a kernel+qemu+libvirt setup with all recent upstream bits on a given host and was trying to configure a VM backed by 1G huge pages spanning 2 NUMA nodes.
The host had 3 1G huge pages on each of the 2 NUMA nodes:
# cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
3
# cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
3
And I had the following in /etc/fstab:
hugetlbfs /hugepages_1G hugetlbfs pagesize=1GB 0 0
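[Side note: the mount and the per-node reservation can be sanity-checked from sysfs. Something like the following should work, assuming a kernel recent enough to allocate 1G pages at runtime (otherwise they have to be reserved on the kernel command line with hugepagesz=1G hugepages=N):

# mount | grep hugepages_1G
hugetlbfs on /hugepages_1G type hugetlbfs (rw,relatime)
# echo 3 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# echo 3 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

The exact mount output varies by kernel version.]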
I added the following entries to the XML file for the 4G/4-vCPU VM:
<memoryBacking>
  <hugepages>
    <page size='1048576' unit='KiB' nodeset='0'/>
    <page size='1048576' unit='KiB' nodeset='1'/>
  </hugepages>
</memoryBacking>
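[Note: the nodeset= attribute of <page/> refers to guest NUMA nodes here. Since both guest nodes use the same page size, this could (IIUC) be written as a single element that applies to all guest nodes:

<memoryBacking>
  <hugepages>
    <page size='1048576' unit='KiB'/>
  </hugepages>
</memoryBacking>
]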
<vcpu placement='static'>4</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='0'/>
  <vcpupin vcpu='1' cpuset='1'/>
  <vcpupin vcpu='2' cpuset='8'/>
  <vcpupin vcpu='3' cpuset='9'/>
</cputune>
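[This pinning presumably assumes host CPUs 0-1 sit on host node0 and CPUs 8-9 on host node1; numactl --hardware (or lscpu) shows the actual CPU <-> node mapping on the host.]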
....
<numatune>
  <memory node="strict" nodeset="0-1"/>
[This is a copy-paste error, right? It should have been s/node/mode/]

This is incomplete. It basically says nothing more than: all the guest NUMA nodes must be placed on host NUMA nodes 0-1. And the qemu command line that libvirt came up with satisfied that constraint. You may want to pin guest NUMA nodes to host NUMA nodes like this:

a) Use <memory mode="interleave" placement="static" nodeset="0-1"/>. I haven't tested it myself, but IIRC this should place guest NUMA nodes sequentially over host NUMA nodes 0-1, so (for a four-cell guest) you'd end up with:

host0: guest0, guest2
host1: guest1, guest3

b) Use manual guest <-> host pinning:

<numatune>
  <memory mode='strict' nodeset='0-1'/>
  <memnode cellid='0' mode='strict' nodeset='0'/>
  <memnode cellid='1' mode='strict' nodeset='0'/>
  <memnode cellid='2' mode='strict' nodeset='1'/>
  <memnode cellid='3' mode='strict' nodeset='1'/>
</numatune>

This will tie guest0 and guest1 onto host0, and guest2 and guest3 onto host1. (Again a four-cell example; your guest has only two cells, so only cellid='0' and cellid='1' would apply.)

I must admit this is not the bit I've implemented, so I don't know all the details. Therefore I'm CCing Martin Kletzander, who's done the major piece of work in this field.
</numatune>
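[Adapting example b) above to this two-cell guest, the pinning would presumably be:

<numatune>
  <memory mode='strict' nodeset='0-1'/>
  <memnode cellid='0' mode='strict' nodeset='0'/>
  <memnode cellid='1' mode='strict' nodeset='1'/>
</numatune>

With that in place, libvirt should bind each memory backend to a single host node, so the generated command line would look roughly like this (an untested sketch, derived from the actual command line quoted below by changing only host-nodes=):

-object memory-backend-file,prealloc=yes,mem-path=/hugepages_1G/libvirt/qemu,size=2048M,id=ram-node0,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \
-object memory-backend-file,prealloc=yes,mem-path=/hugepages_1G/libvirt/qemu,size=2048M,id=ram-node1,host-nodes=1,policy=bind -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 \

That way exactly two 1G pages would be taken from each host node.]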
....
<cpu>
  <numa>
    <cell id='0' cpus='0-1' memory='2097152'/>
    <cell id='1' cpus='2-3' memory='2097152'/>
  </numa>
</cpu>
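[For reference: 2097152 KiB = 2 GiB per guest cell, i.e. 4 GiB total, matching the -m 4096 below.]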
The resulting qemu command looked like this:
/usr/local/bin/qemu-system-x86_64 -name vm1 -S -machine pc-i440fx-2.2,accel=kvm,usb=off \
-m 4096 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 \
-object memory-backend-file,prealloc=yes,mem-path=/hugepages_1G/libvirt/qemu,size=2048M,id=ram-node0,host-nodes=0-1,policy=bind -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \
-object memory-backend-file,prealloc=yes,mem-path=/hugepages_1G/libvirt/qemu,size=2048M,id=ram-node1,host-nodes=0-1,policy=bind -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 \
....
There were three 1G pages available on each host NUMA node, as shown above, and I noticed that the VM got backed by three 1G pages from node0 and one 1G page from node1.
# cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/free_hugepages
0
# cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/free_hugepages
2
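[One way to see which host node actually backs each guest node is to inspect the huge-page mappings of the running qemu process, e.g.:

# grep huge /proc/$(pidof qemu-system-x86_64)/numa_maps

Each 2048M backend shows up as one mapping; IIUC the N0=/N1= fields count pages in units of the mapping's page size (here 1G), so the 3/1 split above should be directly visible there.]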
Not sure if this was expected behavior given the options I specified in the XML file? If yes, is there some additional option to specify (in the XML file) such that only a given number of 1G huge pages per node are picked to back the VM (i.e. in the above case just two 1G pages from each node)?
Thanks!
Vinod
Hopefully my answer is sufficient, Martin?

Michal

Thanks Michal... It seems to work OK now. (My bad that I forgot about Martin's patch for the memnode-related binding.)

Thanks!
Vinod