
On 4/11/19 11:56 AM, Michal Privoznik wrote:
On 4/11/19 4:23 PM, Daniel Henrique Barboza wrote:
Hi,
I've tested these patches again, twice, in setups similar to the ones I used for the first version (first on a Power8 server, then on a Power9 server).
Same results, though: libvirt will not prevent a pseries guest with numatune mode='strict' from launching, even if the NUMA node does not have enough available RAM. If I stress-test the guest's memory to force the allocation, QEMU exits with an error as soon as the memory of the host NUMA node is exhausted.
Yes, this is expected. I mean, by default QEMU doesn't fully allocate the guest's memory up front. You'd have to force it:

  <memoryBacking>
    <allocation mode='immediate'/>
  </memoryBacking>
Tried with this extra setting; still no good. The domain still boots, even when there is not enough memory in the NUMA node I am pinning it to in order to back all of its RAM. For reference, this is the top of the guest XML:

  <name>vm1</name>
  <uuid>f48e9e35-8406-4784-875f-5185cb4d47d7</uuid>
  <memory unit='KiB'>314572800</memory>
  <currentMemory unit='KiB'>314572800</currentMemory>
  <memoryBacking>
    <allocation mode='immediate'/>
  </memoryBacking>
  <vcpu placement='static'>16</vcpu>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <os>
    <type arch='ppc64' machine='pseries'>hvm</type>
    <boot dev='hd'/>
  </os>
  <clock offset='utc'/>

While doing this test, I recalled that some of my IBM peers recently mentioned that they were unable to pre-allocate the RAM of a pseries guest using libvirt, but they were able to do it with QEMU directly (using -realtime mlock=on). In fact, I just tried it with command-line QEMU and the guest allocated all of its memory at boot. This means the pseries guest is capable of memory pre-allocation. I'd say that something might be missing somewhere (XML, host setup, libvirt config ...), or perhaps there is even a bug preventing libvirt from doing this pre-allocation. This explains why I can't verify this patch series. I'll dig into it further to understand why when I have the time.

Thanks,

DHB
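For what it's worth, libvirt does expose QEMU's memory-locking behavior in the XML: the <locked/> element under <memoryBacking>, paired with a <memtune><hard_limit>, should be the closest equivalent of running QEMU by hand with -realtime mlock=on. A sketch only; the hard_limit value below (guest RAM plus some headroom) is an assumption, not a tested number:

```xml
<!-- Sketch: libvirt XML roughly equivalent to QEMU's "-realtime mlock=on".
     <locked/> asks libvirt to keep the guest's memory pages locked in host
     RAM; the libvirt docs advise setting a <memtune><hard_limit> alongside
     it so the locked allocation is bounded. The value here (guest RAM plus
     headroom) is an assumption for illustration. -->
<memoryBacking>
  <locked/>
</memoryBacking>
<memtune>
  <hard_limit unit='KiB'>330301440</hard_limit>
</memtune>
```

Whether this path actually forces the allocation on pseries the way the raw QEMU flag does is exactly the open question here.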
If I change the numatune mode to 'preferred' and repeat the test, QEMU doesn't exit with an error; instead, the process starts taking memory from other NUMA nodes. This indicates that the NUMA policy is apparently being enforced on the QEMU process; however, it is not enforced at VM boot.
I've debugged it a little and haven't found anything wrong that jumps out. All the functions that follow qemuSetupCpusetMems exit with ret = 0. Unfortunately, I don't have access to an x86 server with more than one NUMA node to compare results.
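One way to narrow it down might be to compare what libvirt's cgroup code wrote against what the kernel is actually enforcing on the running QEMU process. A diagnostic sketch only; the cgroup path and machine name below are assumptions (cgroup v1 layout, a guest named "vm1") and will differ per distro and cgroup version:

```shell
# Sketch: check the NUMA constraints actually applied to the QEMU process.
# Paths and names are assumptions; adjust for your host.

# What qemuSetupCpusetMems should have written into the cpuset cgroup:
cat "/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\\x2d1\\x2dvm1.scope/cpuset.mems"

# The kernel's view of the process' allowed nodes and per-node usage:
pid=$(pgrep -f 'qemu.*vm1' | head -n1)
grep Mems_allowed_list "/proc/$pid/status"
numastat -p "$pid"
```

If cpuset.mems and Mems_allowed_list both show only node 0 while numastat still reports pages on other nodes at boot, the policy is in place but the early allocation is escaping it, which would match what you're describing.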
Since I can't say for sure whether what I'm seeing is pseries-specific behavior, I see no problem with pushing this series upstream if it makes sense for x86. We can debug/fix the Power side later.
I bet that if you force the allocation then the domain will be unable to boot.
Thanks for the testing!
Michal