Dear list,
QEMU recently gained support for configuring HMAT (see
v4.2.0-415-g9b12dfa03a and friends). HMAT stands for Heterogeneous Memory
Attribute Table and defines various attributes of NUMA nodes. A guest OS or
application can read this information and fine-tune its optimizations. See
[1] for more info (esp. links in the transcript).
QEMU defines a so-called initiator, which is an attribute of a NUMA node
and, if specified, points to another node that has the best performance
when accessing this node. For instance:
-machine hmat=on \
-m 2G,slots=2,maxmem=4G \
-object memory-backend-ram,size=1G,id=m0 \
-object memory-backend-ram,size=1G,id=m1 \
-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1,initiator=0 \
-smp 2,sockets=2,maxcpus=2 \
-numa cpu,node-id=0,socket-id=0 \
-numa cpu,node-id=0,socket-id=1
creates a machine with 2 NUMA nodes: node 0 has CPUs, and node 1 has memory
only and its initiator is node 0 (yes, HMAT allows you to create CPU-less
"NUMA" nodes). The initiator of node 0 is not specified, but since the node
has at least one CPU it is its own initiator (and has to be, per the spec).
This could be represented by an attribute of our /domain/cpu/numa/cell
element, for instance like this:
<domain>
  <vcpu>2</vcpu>
  <cpu>
    <numa>
      <cell id='0' cpus='0,1' memory='1' unit='GiB'/>
      <cell id='1' memory='1' unit='GiB' initiator='0'/>
    </numa>
  </cpu>
</domain>
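As a rough sketch of how such an attribute could map back onto the command
line: the snippet below reads the proposed XML and prints the matching
"-numa node" arguments. The initiator attribute is only the proposal from
this mail, the helper name numa_node_args is made up for illustration, and
naming the memory backend of cell N "mN" is an assumption taken from the
example above.

```python
import xml.etree.ElementTree as ET

# Proposed XML from this mail; 'initiator' is not an existing libvirt attribute.
PROPOSED_XML = """
<domain>
  <vcpu>2</vcpu>
  <cpu>
    <numa>
      <cell id='0' cpus='0,1' memory='1' unit='GiB'/>
      <cell id='1' memory='1' unit='GiB' initiator='0'/>
    </numa>
  </cpu>
</domain>
"""


def numa_node_args(domain_xml):
    """Return one 'node,...' argument string per <cell/> (hypothetical helper)."""
    root = ET.fromstring(domain_xml)
    args = []
    for cell in root.findall("./cpu/numa/cell"):
        node_id = cell.get("id")
        # Assumption: the memory backend for cell N is named mN, as in the example.
        arg = f"node,nodeid={node_id},memdev=m{node_id}"
        initiator = cell.get("initiator")
        if initiator is not None:
            # Only emit initiator= when the XML specifies it explicitly.
            arg += f",initiator={initiator}"
        args.append(arg)
    return args


for arg in numa_node_args(PROPOSED_XML):
    print("-numa", arg)
```

Cell 0 carries no initiator attribute, so nothing is emitted for it and QEMU
is left to treat the node as its own initiator.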
Then, QEMU allows us to control two other important memory attributes:
1) hmat-lb for Latency and Bandwidth
2) hmat-cache for cache attributes
For example:
-machine hmat=on \
-m 2G,slots=2,maxmem=4G \
-object memory-backend-ram,size=1G,id=m0 \
-object memory-backend-ram,size=1G,id=m1 \
-smp 2,sockets=2,maxcpus=2 \
-numa node,nodeid=0,memdev=m0 \
-numa node,nodeid=1,memdev=m1,initiator=0 \
-numa cpu,node-id=0,socket-id=0 \
-numa cpu,node-id=0,socket-id=1 \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5 \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10 \
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
-numa hmat-cache,node-id=0,size=10K,level=1,associativity=direct,policy=write-back,line=8 \
-numa hmat-cache,node-id=1,size=10K,level=1,associativity=direct,policy=write-back,line=8
This extends the previous example by defining some latencies and cache
attributes. Node 0 has an access latency of 5 ns and a bandwidth of 200 MB/s,
and node 1 has an access latency of 10 ns and a bandwidth of only 100 MB/s.
The level 1 memory cache on both nodes is 10 KB, with an 8 B cache line,
write-back policy and direct associativity (whatever that means).
For better future extensibility I'd express these as separate elements,
rather than attributes of the <cell/> element. For instance like this:
<domain>
  <vcpu>2</vcpu>
  <cpu>
    <numa>
      <cell id='0' cpus='0,1' memory='1' unit='GiB'>
        <latencies>
          <latency type='access' value='5'/>
          <bandwidth type='access' unit='MiB' value='200'/>
        </latencies>
        <caches>
          <cache level='1' associativity='direct' policy='write-back'>
            <size unit='KiB' value='10'/>
            <line unit='B' value='8'/>
          </cache>
        </caches>
      </cell>
      <cell id='1' memory='1' unit='GiB' initiator='0'>
        <latencies>
          <latency type='access' value='10'/>
          <bandwidth type='access' unit='MiB' value='100'/>
        </latencies>
        <caches>
          <cache level='1' associativity='direct' policy='write-back'>
            <size unit='KiB' value='10'/>
            <line unit='B' value='8'/>
          </cache>
        </caches>
      </cell>
    </numa>
  </cpu>
</domain>
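To check that the proposed elements carry enough information, here is a
minimal sketch that turns one such <cell/> back into hmat-lb and hmat-cache
arguments. Everything about it is an assumption made for illustration: the
helper names, the unit-to-suffix table, hard-coding hierarchy=memory, and
treating a cell without an initiator attribute as its own initiator.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from the proposed XML units to QEMU size suffixes.
SUFFIX = {"B": "", "KiB": "K", "MiB": "M", "GiB": "G"}

CELL_XML = """
<cell id='1' memory='1' unit='GiB' initiator='0'>
  <latencies>
    <latency type='access' value='10'/>
    <bandwidth type='access' unit='MiB' value='100'/>
  </latencies>
  <caches>
    <cache level='1' associativity='direct' policy='write-back'>
      <size unit='KiB' value='10'/>
      <line unit='B' value='8'/>
    </cache>
  </caches>
</cell>
"""


def sized(elem):
    """Format an element's value/unit pair as a QEMU size, e.g. 100M."""
    return elem.get("value") + SUFFIX[elem.get("unit")]


def hmat_args(cell_xml):
    """Return hmat-lb/hmat-cache argument strings for one proposed <cell/>."""
    cell = ET.fromstring(cell_xml)
    target = cell.get("id")
    # Assumption: a cell without an explicit initiator is its own initiator.
    initiator = cell.get("initiator", target)
    args = []
    for entry in cell.findall("./latencies/*"):
        kind = entry.tag  # "latency" or "bandwidth"
        value = sized(entry) if kind == "bandwidth" else entry.get("value")
        args.append(
            f"hmat-lb,initiator={initiator},target={target},hierarchy=memory,"
            f"data-type={entry.get('type')}-{kind},{kind}={value}"
        )
    for cache in cell.findall("./caches/cache"):
        args.append(
            f"hmat-cache,node-id={target},size={sized(cache.find('size'))},"
            f"level={cache.get('level')},"
            f"associativity={cache.get('associativity')},"
            f"policy={cache.get('policy')},line={sized(cache.find('line'))}"
        )
    return args


for arg in hmat_args(CELL_XML):
    print("-numa", arg)
```

For cell 1 this reproduces the three node 1 arguments from the command line
above. The hard-coded hierarchy=memory is exactly the part the XML cannot
express yet.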
The thing is, the @hierarchy argument accepts memory (referring to the whole
memory) or first-level|second-level|third-level (referring to side caches
for each domain). I haven't figured out yet how to express the levels in XML.
The @data-type argument combines an access type, access|read|write (expressed
by the @type attribute of the <latency/> and <bandwidth/> elements), with
either latency or bandwidth, giving six values: access-latency, read-latency,
write-latency, access-bandwidth, read-bandwidth, write-bandwidth. These 6 can
then be combined with the 4 aforementioned @hierarchy values, producing 24
combinations (if I read the QEMU command line specs correctly [2]).
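The count can be double-checked by enumerating the combinations. This is
just a sketch built from the values listed in this mail; it has not been
checked against the QEMU sources.

```python
from itertools import product

# Values as listed in this mail for the hmat-lb arguments.
HIERARCHIES = ["memory", "first-level", "second-level", "third-level"]
ACCESS_TYPES = ["access", "read", "write"]
METRICS = ["latency", "bandwidth"]

# Each combination is a (hierarchy, data-type) pair, e.g.
# ("memory", "access-latency").
combos = [
    (hierarchy, f"{access}-{metric}")
    for hierarchy, access, metric in product(HIERARCHIES, ACCESS_TYPES, METRICS)
]

for hierarchy, data_type in combos:
    print(f"hierarchy={hierarchy},data-type={data_type}")

# 4 hierarchies x 3 access types x 2 metrics
print(len(combos))
```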
What are your thoughts?
Michal
1: https://bugzilla.redhat.com/show_bug.cgi?id=1786303
2: https://git.qemu.org/?p=qemu.git;a=blob;f=qemu-options.hx;h=d4b73ef60c1d4...