On Wed, Jun 13, 2007 at 01:48:21PM -0400, Daniel Veillard wrote:
On Wed, Jun 13, 2007 at 10:40:40AM -0500, Ryan Harper wrote:
> Hello all,
Hello Ryan,
> I wanted to start a discussion on how we might get libvirt to be able to
> probe the NUMA topology of Xen and Linux (for QEMU/KVM). In Xen, I've
> recently posted patches for exporting topology into the [1]physinfo
> hypercall, as well as adding a [2]hypercall to probe the Xen heap. I
> believe the topology and memory info is already available in Linux.
> With these, we have enough information to be able to write some simple
> policy above libvirt that can create guests in a NUMA-aware fashion.
>
> I'd like to suggest the following for discussion:
>
> (1) A function to discover topology
> (2) A function to check available memory
> (3) Specifying which cpus to use prior to domain start
Okay, but let's start by defining the scope a bit. Historically NUMA
has explored various paths, and I assume we are gonna work in a rather
small subset of what NUMA (Non-Uniform Memory Access) has meant over time.
I assume the following, tell me if I'm wrong:
- we are just considering memory and processor affinity
- the topology, i.e. the affinity between the processors and the various
memory areas is fixed and the kind of mapping is rather simple
to get into more specifics:
- we will need to expand the model of libvirt
http://libvirt.org/intro.html
to split the Node resources into separate sets containing processors
and memory areas which are highly connected together (assuming the
model is a simple partition of the resources between the equivalent
of sub-Nodes)
- the function (2) would for a given processor tell how much of its memory
is already allocated (to existing running or paused domains)
Right? Is the partition model sufficient for the architectures?
If yes then we will need a new definition and terminology for those sub-Nodes.
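To make the partition model concrete, here is a minimal sketch in C of what such sub-Nodes and function (2) could look like. Everything in it is hypothetical: the `numa_cell` structure, the name "cell" for a sub-Node, and both functions are illustrations only, not existing or proposed libvirt API. It assumes each physical CPU belongs to exactly one cell.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS_PER_CELL 64

/* Hypothetical "cell": one partition of the host's CPUs and memory. */
struct numa_cell {
    int id;                        /* cell number */
    int cpus[MAX_CPUS_PER_CELL];   /* physical CPU ids in this cell */
    size_t ncpus;                  /* number of valid entries in cpus[] */
    unsigned long long total_kb;   /* memory local to this cell, in KiB */
    unsigned long long used_kb;    /* already allocated to domains, in KiB */
};

/* Return the cell that owns physical CPU `cpu`, or NULL if none does. */
const struct numa_cell *
cell_for_cpu(const struct numa_cell *cells, size_t ncells, int cpu)
{
    for (size_t i = 0; i < ncells; i++)
        for (size_t j = 0; j < cells[i].ncpus; j++)
            if (cells[i].cpus[j] == cpu)
                return &cells[i];
    return NULL;
}

/*
 * Sketch of function (2): for a given processor, report how much of its
 * local memory is already allocated and how much is still free.
 * Returns 0 on success, -1 if the CPU is not part of any cell.
 */
int
cell_memory_for_cpu(const struct numa_cell *cells, size_t ncells, int cpu,
                    unsigned long long *used_kb, unsigned long long *free_kb)
{
    const struct numa_cell *c = cell_for_cpu(cells, ncells, cpu);

    if (c == NULL)
        return -1;
    assert(c->used_kb <= c->total_kb);
    *used_kb = c->used_kb;
    *free_kb = c->total_kb - c->used_kb;
    return 0;
}
```

A policy layer above libvirt could then pick the cell with the most free memory and restrict the new domain to that cell's CPUs.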
We have 3 core models we should refer to when deciding how to present
things.
- Linux/Solaris Xen - hypercalls
- Linux non-Xen - libnuma
- Solaris non-Xen - liblgrp
The Xen & Linux modelling seems reasonably similar IIRC, but Solaris takes
a slightly different representational approach.
For (3) we already have support for pinning the domain virtual CPUs to
physical CPUs, but I guess it's not sufficient because you want this to be
activated from the definition of the domain:
http://libvirt.org/html/libvirt-libvirt.html#virDomainPinVcpu
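For reference, virDomainPinVcpu() takes the set of allowed physical CPUs as a bitmap of 8-bit bytes, with the lowest CPU number in the least significant bit of each byte. The sketch below builds such a bitmap from a list of CPU numbers without linking against libvirt. The helper names `cpumap_len` and `cpumap_build` are hypothetical, made up for this example; libvirt itself ships the VIR_CPU_MAPLEN and VIR_USE_CPU macros for the same job.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes needed for a bitmap covering `ncpus` physical CPUs. */
size_t
cpumap_len(unsigned int ncpus)
{
    return (ncpus + 7) / 8;
}

/*
 * Fill `cpumap` (maplen bytes) so that exactly the CPUs listed in `cpus`
 * are marked usable. CPU n lives in byte n / 8, bit n % 8.
 * Returns 0 on success, -1 if a CPU number does not fit in the map.
 */
int
cpumap_build(unsigned char *cpumap, size_t maplen,
             const unsigned int *cpus, size_t n)
{
    assert(cpumap != NULL);
    memset(cpumap, 0, maplen);
    for (size_t i = 0; i < n; i++) {
        if (cpus[i] / 8 >= maplen)
            return -1;
        cpumap[cpus[i] / 8] |= (unsigned char)(1u << (cpus[i] % 8));
    }
    return 0;
}
```

With a running domain, the resulting buffer would be passed as virDomainPinVcpu(dom, vcpu, cpumap, maplen), once per virtual CPU. That call only works after the domain exists, which is the limitation being discussed here.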
So the XML format would have to be extended to allow specifying the subset
of processors the domain is supposed to start on:
Yeah, I've previously argued against including VCPU pinning information
in the XML since it's a tunable, not a hardware description. Reluctantly
though we'll have to add this VCPU info, since it's an absolute requirement
for this info to be provided at the time of domain creation for NUMA support.
http://libvirt.org/format.html
I would assume that if nothing is specified, the underlying Hypervisor
(in libvirt terminology, that could be a linux kernel in practice) will
by default try to do the optimal placement by itself, i.e. (3) is only
useful if you want to override the default behaviour.
Yes, that is correct. We should not change the default - let the OS apply
whatever policy it sees fit by default, since over time OSes are tending
towards being able to automagically determine & apply NUMA policy.
Dan
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|