* Daniel Veillard <veillard(a)redhat.com> [2007-06-13 12:52]:
> On Wed, Jun 13, 2007 at 10:40:40AM -0500, Ryan Harper wrote:
> > Hello all,
> Hello Ryan,
Hey, thanks for the swift reply.
> > I wanted to start a discussion on how we might get libvirt to be able to
> > probe the NUMA topology of Xen and Linux (for QEMU/KVM). In Xen, I've
> > recently posted patches for exporting topology into the [1]physinfo
> > hypercall, as well as adding a [2]hypercall to probe the Xen heap. I
> > believe the topology and memory info is already available in Linux.
> > With these, we have enough information to be able to write some simple
> > policy above libvirt that can create guests in a NUMA-aware fashion.
> >
> > I'd like to suggest the following for discussion:
> >
> > (1) A function to discover topology
> > (2) A function to check available memory
> > (3) Specifying which cpus to use prior to domain start
> Okay, but let's start by defining the scope a bit. Historically, NUMA
> systems have explored various paths, and I assume we are going to work in
> a rather small subset of what NUMA (Non Uniform Memory Access) has meant
> over time. I assume the following, tell me if I'm wrong:
> - we are just considering memory and processor affinity
> - the topology, i.e. the affinity between the processors and the various
>   memory areas, is fixed and the kind of mapping is rather simple
Correct. Currently we are not processing the SLIT (System Locality
Information Table), which provides relative access cost values between
cpus and memory.
> To get into more specifics:
> - we will need to expand the model of libvirt
>     http://libvirt.org/intro.html
>   to split the Node resources into separate sets containing processors
>   and memory areas which are highly connected together (assuming the
>   model is a simple partition of the resources between the equivalent
>   of sub-Nodes)
Yeah, the topology of the physical machine is split up into NUMA nodes.
Each NUMA node has a set of cpus and some physical memory. This topology
is static across reboots, until the admin reconfigures the hardware.
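
To make the sub-Node ("cell") idea concrete, here is a rough sketch of the
sort of structure a topology-discovery call could fill in. The names below
(virNodeCell, virNodeGetTopology) are placeholders for discussion, not
existing libvirt API:

#include <libvirt/libvirt.h>

/* Placeholder types for discussion only -- not current libvirt API.
 * One "cell" corresponds to a NUMA node: a set of physical cpus plus
 * the memory attached to them. */
typedef struct _virNodeCell {
    int            id;       /* NUMA node (cell) number                */
    int            ncpus;    /* number of physical cpus in this cell   */
    int           *cpus;     /* physical cpu numbers in this cell      */
    unsigned long  memory;   /* total memory attached to the cell, KiB */
} virNodeCell;

/* (1) discover the topology: fill at most maxcells entries and return
 * the number of cells present, or -1 on error. */
int virNodeGetTopology(virConnectPtr conn, virNodeCell *cells, int maxcells);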
> - the function (2) would, for a given processor, tell how much of its
>   memory is already allocated (to existing running or paused domains)
Memory is tracked as the amount free in a given NUMA node. We could
implement the function in terms of a cpu, but we would still be probing
on a per-NUMA-node basis and then mapping the cpu to the NUMA node it
belongs to. We should be able to answer the reverse question (what is in
use) by examining the domain configurations.
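
As a strawman for (2), something along these lines would do; the name and
signature below are only a suggestion for discussion, with values in KiB:

#include <libvirt/libvirt.h>

/* (2) report how much memory is still free in each cell (NUMA node),
 * starting at cell number startcell.  freemems must have room for
 * maxcells entries; returns the number of entries filled in, or -1
 * on error.  (Proposed interface only, not existing libvirt API.) */
int virNodeGetCellsFreeMemory(virConnectPtr conn,
                              unsigned long long *freemems,
                              int startcell,
                              int maxcells);

/* A per-cpu variant would simply map the cpu to its cell first
 * (cell = cpu_to_cell[cpu]) and return freemems[cell]. */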
> Right? Is the partition model sufficient for the architectures?
> If yes, then we will need a new definition and terminology for those
> sub-Nodes.
I believe so.
> For (3) we already have support for pinning the domain's virtual CPUs to
> physical CPUs, but I guess it's not sufficient because you want this to
> be activated from the definition of the domain:
Correct. Knowing the cpus that are being used allows for NUMA-node-local
memory allocation. For Xen specifically, pinning after memory has been
allocated (which happens during domain creation) is not sufficient to
ensure that the memory selected will be local to the processors backing
the guest virtual CPUs.
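
To illustrate the problem, a minimal fragment using the existing call
(error handling trimmed, dom assumed to be an already running domain): by
the time we can call virDomainPinVcpu() the guest's memory has already been
allocated, so the pinning cannot influence where that memory lives.

#include <stdlib.h>
#include <libvirt/libvirt.h>

/* Pin vcpu 0 of a running domain to physical cpus 0-3.  This works,
 * but runs after domain creation, i.e. after Xen has already picked
 * the pages backing the guest -- possibly on another NUMA node. */
static int pin_vcpu0_to_first_four_cpus(virDomainPtr dom, int nphyscpus)
{
    int maplen = VIR_CPU_MAPLEN(nphyscpus);    /* bytes in the cpu bitmap */
    unsigned char *cpumap = calloc(maplen, 1);
    int cpu, ret;

    if (cpumap == NULL)
        return -1;
    for (cpu = 0; cpu < 4 && cpu < nphyscpus; cpu++)
        VIR_USE_CPU(cpumap, cpu);              /* set the bit for this cpu */

    ret = virDomainPinVcpu(dom, 0, cpumap, maplen);
    free(cpumap);
    return ret;
}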
>   http://libvirt.org/html/libvirt-libvirt.html#virDomainPinVcpu
> So the XML format would have to be extended to allow specifying the
> subset of processors the domain is supposed to start on:
>   http://libvirt.org/format.html
> I would assume that if nothing is specified, the underlying Hypervisor
> (in libvirt terminology, that could be a Linux kernel in practice) will
> by default try to do the optimal placement by itself, i.e. (3) is only
> useful if you want to override the default behaviour.
(3) is required if one wants to ensure that the resources allocated to
the guest are local. It is possible that the hypervisor will allocate
local resources on its own, but without specifying them there is no
guarantee.
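
As a strawman for how the XML could look: a cpuset attribute on the <vcpu>
element, honoured at define/create time so the hypervisor can allocate
memory on the matching node. The attribute name and syntax are made up
here for discussion; only virDomainDefineXML() itself is existing API.

#include <libvirt/libvirt.h>

/* Strawman definition: the cpuset attribute on <vcpu> is the proposed
 * extension, not something libvirt parses today.  The point is that the
 * cpu restriction is known before any guest memory is allocated. */
static const char *guest_xml =
    "<domain type='xen'>"
    "  <name>numa-guest</name>"
    "  <memory>2097152</memory>"            /* 2 GiB, in KiB */
    "  <vcpu cpuset='0-3'>2</vcpu>"         /* proposed: cpus 0-3 only */
    "  <os><type>linux</type></os>"
    "</domain>";

static virDomainPtr define_numa_guest(virConnectPtr conn)
{
    /* The cpu set travels with the definition, unlike pinning after
     * the fact with virDomainPinVcpu(). */
    return virDomainDefineXML(conn, guest_xml);
}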
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh(a)us.ibm.com