Hi! If a KVM-QEMU guest (spawned using libvirt) requires memory and CPU cores that span socket boundaries, and we want to avoid memory access across NUMA nodes, what is the best way to proceed?


I came across the following statement during my search: “If a guest requires eight virtual CPUs, as each NUMA node only has four physical CPUs, a better utilization may be obtained by running a pair of four virtual CPU guests and splitting the work between them, rather than using a single 8 CPU guest.”


Is this statement still valid? Can one use the ‘numatune’ XML tag with ‘auto’ placement to create a guest whose vCPUs and memory are allocated efficiently across sockets based on available memory?
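For reference, this is the sort of fragment I mean (a minimal sketch of the domain XML; the mode value is just one possibility):

```xml
<!-- Ask libvirt (via numad) to choose the host NUMA node(s) automatically
     for both memory and vCPU placement -->
<numatune>
  <memory mode='strict' placement='auto'/>
</numatune>
```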
If one allocates memory across sockets and passes the topology information to the guest using the 'numa' tag, will it help in avoiding the NUMA penalty (e.g. <numa><cell cpus='0' memory='256000'/><cell cpus='1' memory='512000'/></numa>)?
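To make the question concrete, here is a sketch of what I have in mind for a two-node guest topology (the cell sizes and CPU ranges are just illustrative, and I assume one would also bind each guest cell to a host node via memnode):

```xml
<!-- Expose two NUMA cells to the guest... -->
<cpu>
  <numa>
    <cell id='0' cpus='0-3' memory='262144' unit='KiB'/>
    <cell id='1' cpus='4-7' memory='524288' unit='KiB'/>
  </numa>
</cpu>

<!-- ...and pin each guest cell's memory to a specific host NUMA node -->
<numatune>
  <memnode cellid='0' mode='strict' nodeset='0'/>
  <memnode cellid='1' mode='strict' nodeset='1'/>
</numatune>
```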

Any advice on this subject will be appreciated.