First, I'll quickly summarize my understanding of how to configure NUMA...
In "//memoryBacking/hugepages/page[@nodeset]" I am telling libvirt to
use hugepages for the guest, and to get those hugepages from a
particular host NUMA node.
In "//numatune/memory[@nodeset]" I am telling libvirt to pin the
memory allocation to the guest from a particular host numa node.
In "//numatune/memnode[@nodeset]" I am telling libvirt which guest
NUMA node (cellid) should come from which host NUMA node (nodeset).
In "//cpu/numa/cell[@id]" I am telling libvirt how much memory to
allocate to each guest NUMA node (cell).
Basically, I thought "nodeset", regardless of where it appeared in the
domain XML, referred to a host NUMA node, while "cell" (<cell id=''/>
or @cellid) referred to a guest NUMA node.
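To make that concrete, here's a minimal sketch of the configuration
that reading implies (the cpus and memory values are illustrative, not
from my actual domain XML):

  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0,2'/>   <!-- host nodes, or so I thought -->
    </hugepages>
  </memoryBacking>
  <numatune>
    <memory mode='strict' nodeset='0,2'/>            <!-- host nodes -->
    <memnode cellid='0' mode='strict' nodeset='0'/>  <!-- guest cell 0 from host node 0 -->
    <memnode cellid='1' mode='strict' nodeset='2'/>  <!-- guest cell 1 from host node 2 -->
  </numatune>
  <cpu>
    <numa>
      <cell id='0' cpus='0-3' memory='8388608' unit='KiB'/>
      <cell id='1' cpus='4-7' memory='8388608' unit='KiB'/>
    </numa>
  </cpu>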
However....
Atlas [1] starts without issue, prometheus [2] fails with "libvirtd[]:
hugepages: node 2 not found". I found a patch that contains the code
responsible for throwing this error [3]:
+    if (def->cpu && def->cpu->ncells) {
+        /* Fortunately, we allow only guest NUMA nodes to be continuous
+         * starting from zero. */
+        pos = def->cpu->ncells - 1;
+    }
+
+    next_bit = virBitmapNextSetBit(page->nodemask, pos);
+    if (next_bit >= 0) {
+        virReportError(VIR_ERR_XML_DETAIL,
+                       _("hugepages: node %zd not found"),
+                       next_bit);
+        return -1;
+    }
Without digging too deeply into the actual code, and just inferring
from the above, it looks like we read the number of guest cells defined
in "//cpu/numa" via def->cpu->ncells, and check it against the node
numbers set in the @nodeset of "//memoryBacking/hugepages/page". I
think this means that I misunderstand what that nodeset is for...
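If I'm reading the snippet right (assuming virBitmapNextSetBit()
returns the position of the first set bit after the given position, or
a negative value when there is none), the check only allows @nodeset to
name guest cells 0..ncells-1. For a domain with, say, two guest cells:

  ncells   = 2                 (guest cells 0 and 1 under //cpu/numa)
  pos      = ncells - 1 = 1
  nodemask = {2}               (from nodeset='2' on <page/>)
  virBitmapNextSetBit(nodemask, 1) = 2, which is >= 0
  => "hugepages: node 2 not found"

which matches the failure exactly.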
Of note is the fact that my host has non-contiguous NUMA node numbers:
2015-02-09 08:53:06
root@eanna i ~ # numastat
                    node0      node2
numa_hit        216225024  440311113
numa_miss               0     795018
numa_foreign       795018          0
interleave_hit      15835      15783
local_node      214029815  221903122
other_node        2195209  219203009
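Worth noting: if the patch comment above is right that guest cell ids
must be contiguous from zero, host node numbers can still be sparse.
So, assuming //numatune/memory/@nodeset really does refer to host
nodes, pinning anything to my second node would have to say:

  <numatune>
    <memory mode='strict' nodeset='2'/>
  </numatune>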
Thanks again for any help.
[1]: http://sprunge.us/jZgS
[2]: http://sprunge.us/iETF
[3]: https://www.redhat.com/archives/libvir-list/2014-September/msg00090.html
On Wed, Feb 4, 2015 at 12:03 PM, G. Richard Bellamy
<rbellamy(a)pteradigm.com> wrote:
*facepalm*
Now that I'm re-reading the documentation, it's obvious that <page/>
and its @nodeset are for the guest: "This tells the hypervisor that the
guest should have its memory allocated using hugepages instead of the
normal native page size." Pretty clear there.
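So a corrected sketch for one of my domains would define the guest
cells first and then reference them (again, cpus and memory values are
illustrative):

  <cpu>
    <numa>
      <cell id='0' cpus='0-3' memory='8388608' unit='KiB'/>
      <cell id='1' cpus='4-7' memory='8388608' unit='KiB'/>
    </numa>
  </cpu>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0-1'/>  <!-- guest cells 0 and 1 -->
    </hugepages>
  </memoryBacking>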
Thank you SO much for the guidance; I'll return to my tweaking and
report back here with my results.
On Wed, Feb 4, 2015 at 12:17 AM, Michal Privoznik <mprivozn(a)redhat.com> wrote:
> On 04.02.2015 01:59, G. Richard Bellamy wrote:
>> As I mentioned, I got the instances to launch... but they're only
>> taking HugePages from "Node 0", when I believe my setup should pull
>> from both nodes.
>>
>> [atlas] http://sprunge.us/FSEf
>> [prometheus] http://sprunge.us/PJcR
>
> [pasting interesting bits from both XMLs]
>
> <domain type='kvm' id='2'>
> <name>atlas</name>
> <uuid>d9991b1c-2f2d-498a-9d21-51f3cf8e6cd9</uuid>
> <memory unit='KiB'>16777216</memory>
> <currentMemory unit='KiB'>16777216</currentMemory>
> <memoryBacking>
> <hugepages>
> <page size='2048' unit='KiB' nodeset='0'/>
> </hugepages>
> <nosharepages/>
> </memoryBacking>
> <!-- no numa pinning -->
> </domain>
>
>
> <domain type='kvm' id='3'>
> <name>prometheus</name>
> <uuid>dda7d085-701b-4d0a-96d4-584678104fb3</uuid>
> <memory unit='KiB'>16777216</memory>
> <currentMemory unit='KiB'>16777216</currentMemory>
> <memoryBacking>
> <hugepages>
> <page size='2048' unit='KiB' nodeset='2'/>
> </hugepages>
> <nosharepages/>
> </memoryBacking>
> <!-- again no numa pinning -->
> </domain>
>
> So, to start: the @nodeset attribute of the <page/> element refers to
> guest NUMA nodes, not host ones. And since you don't define any NUMA
> nodes for your guests, it's useless. Side note - I wonder if we should
> make libvirt fail explicitly in this case.
>
> Moreover, you haven't pinned your guests onto any host NUMA nodes. This
> means it's up to the host kernel and its scheduler where the guest will
> take memory from - and, subsequently, hugepages as well. I think you
> want to add:
>
> <numatune>
> <memory mode='strict' nodeset='0'/>
> </numatune>
>
> to the guest XMLs, where @nodeset refers to host NUMA nodes and tells
> libvirt where the guest should be placed. There are other modes too, so
> please see the documentation to tune the XML to match your use case.
>
> Michal