[libvirt] 1GB huge pages and incompatible VM memory size

Hi All,

Matt encountered the following issue when using 1GB huge pages with libvirt

This problem turned out to be entirely my fault because I didn't round the VM's memory size to a 1G multiple, and the kernel tried to split the VMA at the end of the region, triggering this code in the kernel do_mbind() path:
    static int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
                           unsigned long addr, int new_below)
    {
            struct vm_area_struct *new;
            int err;

            if (is_vm_hugetlb_page(vma) && (addr &
                                            ~(huge_page_mask(hstate_vma(vma)))))
                    return -EINVAL;
I have no idea how a less fortunate developer without access to MM experts would have figured this out. It's a shame virsh didn't error out when I initially set up 1G hugepages with an incompatible VM memory size.

What do folks think about improving libvirt to warn or error when using a VM memory size that is not compatible with the host hugepage configuration?

Regards,
Jim

On 02/13/2017 02:52 AM, Jim Fehlig wrote:
What do folks think about improving libvirt to warn or error when using a VM memory size that is not compatible with the host hugepage configuration?
Jim,

what is the actual problem? I've tried to reproduce this by running a VM with 3.5G of RAM backed by 1GB huge pages, and the guest runs just fine (a KVM guest, I mean). This is because the value on the command line is already aligned:

    -m size=4194304k,slots=16,maxmem=8388608k

This is the result of qemuDomainAlignMemorySizes(). So perhaps there's a bug somewhere in the function?

Michal

On Tue, 21 Feb, at 11:23:52AM, Michal Privoznik wrote:
This is the result of qemuDomainAlignMemorySizes(). So perhaps there's a bug somewhere in the function?
Quite possibly. Some memory values work fine and appear to be rounded up to the next gigabyte boundary. One KiB value that fails for me is:

    <memory unit='KiB'>52428801</memory>
    <currentMemory unit='KiB'>52428801</currentMemory>

which results in

    qemu-kvm ... -m 51201

On 02/23/2017 03:26 PM, Matt Fleming wrote:
<memory unit='KiB'>52428801</memory> <currentMemory unit='KiB'>52428801</currentMemory>
Which results in qemu-kvm ... -m 51201
I currently don't have access to a host with 50+GB of RAM, so I started small:

    <memory unit='KiB'>3145739</memory>
    <currentMemory unit='KiB'>3145739</currentMemory>
    <memoryBacking>
      <hugepages>
        <page size='1048576' unit='KiB'/>
      </hugepages>
    </memoryBacking>

which gives

    -m 3073

And this works just fine. One thing that I've noticed is that we don't take into account sizes of huge pages in qemuDomainAlignMemorySizes(). But then again - what is the scenario you're seeing? What's the error message?

Michal

On 02/23/2017 08:50 AM, Michal Privoznik wrote:
And this works just fine. One thing that I've noticed is that we don't take into account sizes of huge pages in qemuDomainAlignMemorySizes(). But then again - what is the scenario you're seeing? What's the error message?
Sorry for the delay. I was having problems reproducing it until Matt reminded me that some vNUMA was needed in the mix. E.g. a config containing

    <memory unit='KiB'>10000000</memory>
    <currentMemory unit='KiB'>10000000</currentMemory>
    <memoryBacking>
      <hugepages>
        <page size='1048576' unit='KiB'/>
      </hugepages>
    </memoryBacking>
    <cpu mode='host-passthrough'>
      <topology sockets='2' cores='20' threads='2'/>
      <numa>
        <cell id='0' cpus='0-39' memory='5000000' unit='KiB'/>
        <cell id='1' cpus='40-79' memory='5000000' unit='KiB'/>
      </numa>
    </cpu>

fails with

    2017-03-14T21:50:01.228693Z qemu-system-x86_64: -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=5120196608,host-nodes=0-1,policy=bind: cannot bind memory to host NUMA nodes: Invalid argument

Changing the config by s/10000000/10485760/ and s/5000000/5242880/ works.

It was only then that I remembered the machine was running libvirt 2.0.0 :-/. I updated to 3.1.0 and the original, unaligned config works, so there have been some fixes to the alignment code in the meantime. I'll test some of the other problematic configurations, which in some cases resulted in different qemu errors, and report back if I find any problems with the current code.

Regards,
Jim
participants (3)
- Jim Fehlig
- Matt Fleming
- Michal Privoznik