* Daniel Veillard <veillard@redhat.com> [2007-10-11 08:01]:
>   There are a few things I gathered on this issue. This affects
> NUMA setups, where basically, if a domain must be placed on a given cell,
> it is not good to let the hypervisor place it first with its own heuristics
> and then migrate it later to a different set of CPUs; it is better to
> instruct the hypervisor to start the domain on the given set.
>   - For Xen it is possible to instruct the hypervisor by passing
>     (cpus '2,3') in the SExpr, where the argument is the list of
>     physical processors allowed
A bit more detail here just FYI:
Xen takes the cpu list and converts that into an affinity bitmap that is
then applied to each vcpu allocated to the guest.
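To make that concrete, here is a minimal sketch of such a conversion
(illustrative only, not Xen's actual code; the cpumap_t typedef is an
assumption for the example):

    #include <stdint.h>

    typedef uint64_t cpumap_t;  /* illustrative; real maps may be wider */

    /* Turn a list of allowed physical CPUs, e.g. {2, 3}, into an
     * affinity bitmap (0b1100); the same map is then applied to
     * every vcpu allocated to the guest. */
    static cpumap_t
    cpulist_to_map(const int *cpus, int ncpus)
    {
        cpumap_t map = 0;
        for (int i = 0; i < ncpus; i++)
            map |= (cpumap_t)1 << cpus[i];
        return map;
    }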
>   - For KVM I think the standard way would be to select the
>     cpuset using sched_setaffinity() between the fork of the
>     current process and the exec of the qemu process
Yep.
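For reference, a minimal sketch of that fork/exec dance, assuming CPUs 2
and 3 and a placeholder qemu command line:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <unistd.h>

    /* Pin the child to CPUs 2 and 3 before exec'ing qemu, so the
     * guest never starts out on the wrong cell. */
    static pid_t
    spawn_qemu_pinned(void)
    {
        pid_t pid = fork();
        if (pid == 0) {                        /* child */
            cpu_set_t mask;
            CPU_ZERO(&mask);
            CPU_SET(2, &mask);
            CPU_SET(3, &mask);
            if (sched_setaffinity(0, sizeof(mask), &mask) < 0)
                _exit(127);
            execlp("qemu", "qemu", "-hda", "guest.img", (char *)NULL);
            _exit(127);                        /* exec failed */
        }
        return pid;
    }

Since the affinity is inherited across exec, the kernel's default local
allocation policy then places the guest's memory on the matching cell.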
>   - there is no need (from a NUMA perspective) to do fine-grained
>     allocation at that point; as long as the domain can be restricted
>     to a given cell at startup, virDomainPinVcpu() can be used later,
>     if needed, to do more precise pinning and optimize placement
kvm-46 added user-space allocated memory, which means that we can use
libnuma/numactl to set the appropriate node.
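E.g., a minimal sketch with the libnuma v1 interface (link with -lnuma;
the node number and error handling are illustrative):

    #include <numa.h>

    /* Restrict both execution and future memory allocations -
     * including the guest RAM qemu now allocates in user space -
     * to the given NUMA node. */
    static int
    bind_to_node(int node)
    {
        if (numa_available() < 0)
            return -1;                /* host has no NUMA support */
        nodemask_t mask;
        nodemask_zero(&mask);
        nodemask_set(&mask, node);
        numa_set_membind(&mask);      /* memory from this node only */
        numa_run_on_node(node);       /* run on this node's CPUs */
        return 0;
    }

The same effect from the shell would be roughly
numactl --cpunodebind=0 --membind=0 qemu ... .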
>   - to be able to instruct the hypervisor at creation time, adding the
>     information to the domain XML description looks like the most natural
>     way (another option would be to force the use of virDomainDefineXML,
>     add a call using the resulting virDomainPtr to define the set, and
>     then use virDomainCreate to do the actual start)
>       + the good point of having this embedded in the XML is that
>         we still have all the information about the domain settings in
>         the XML if we want to restart it later
>       + the bad point is that we need to fetch and carry this extra
>         information when doing XML dumps so as not to lose it, for
>         example when manipulating the domain to add or remove devices
>   - extracting a cpuset can still be a heavy operation; for example,
>     when using xend one needs one RPC per vcpu in the domain, the cpuset
>     being constructed by logically OR'ing all the cpumaps used by the
>     vcpus of the domain (though in most cases the map will be full after
>     the first vcpu and the scan can stop immediately)
Yeah, that might be a decent patch to xend - build up an array of
affinity masks for each vcpu.
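On the libvirt side, the OR'ing with early exit could look something like
this (a sketch; get_vcpu_map() is a hypothetical stand-in for the per-vcpu
xend RPC):

    #include <stdint.h>
    #include <string.h>

    int get_vcpu_map(int vcpu, uint8_t *map, size_t maplen); /* hypothetical */

    /* OR together the cpumaps of all vcpus, stopping as soon as all
     * ncpus physical CPUs are allowed (the common, unpinned case). */
    static int
    domain_cpuset(int nvcpus, int ncpus, uint8_t *map, size_t maplen)
    {
        memset(map, 0, maplen);
        for (int v = 0; v < nvcpus; v++) {
            uint8_t vmap[128];
            if (maplen > sizeof(vmap) || get_vcpu_map(v, vmap, maplen) < 0)
                return -1;
            int nset = 0;
            for (size_t i = 0; i < maplen; i++) {
                map[i] |= vmap[i];
                nset += __builtin_popcount(map[i]);
            }
            if (nset >= ncpus)    /* map already full, skip further RPCs */
                return 0;
        }
        return 0;
    }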
>   - for the mapping at the XML level I suggest using a simple extension
>     to <vcpu>n</vcpu>, extending it to
>         <vcpu cpuset='2,3'>n</vcpu>
>     with a limited syntax which is just the comma-separated list of
>     allowed CPU numbers (the attribute would only be emitted if the code
>     actually detects that such a cpuset is in effect, i.e. in general it
>     won't be added)
I think we should support the same cpuset notation that Xen supports,
which means including ranges (1-4) and negation (^1). These two
features make describing large ranges much more compact.
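A parser for that notation stays small; here is a minimal sketch (not
libvirt's actual code) that fills a per-CPU byte map from strings like
"0,2-4,^3":

    #include <ctype.h>
    #include <stdlib.h>
    #include <string.h>

    /* Parse a Xen-style cpuset string into map[maxcpu] (1 = allowed).
     * Supports single CPUs, ranges (1-4) and negation (^1).
     * Returns 0 on success, -1 on a malformed string. */
    static int
    parse_cpuset(const char *str, char *map, int maxcpu)
    {
        memset(map, 0, maxcpu);
        while (*str) {
            char *endp;
            int neg = 0;
            if (*str == '^') { neg = 1; str++; }
            if (!isdigit((unsigned char)*str))
                return -1;
            long start = strtol(str, &endp, 10);
            long end = start;
            str = endp;
            if (*str == '-') {                 /* range, e.g. 2-4 */
                str++;
                if (!isdigit((unsigned char)*str))
                    return -1;
                end = strtol(str, &endp, 10);
                str = endp;
            }
            if (start < 0 || end >= maxcpu || start > end)
                return -1;
            for (long i = start; i <= end; i++)
                map[i] = neg ? 0 : 1;          /* ^N clears CPU N */
            if (*str == ',')
                str++;
            else if (*str)
                return -1;
        }
        return 0;
    }

So "0-7,^4" on an 8-way machine yields CPUs 0-3 and 5-7.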
>   Internally, implementing this should not be too hard; I would probably
>   refactor some of the existing parsing code and provide functions to get
>   the cpuset and the number of physical processors.
>
>   Does this sound okay?
Yeah, I think this covers everything we'd need.
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com