On Thu, Oct 11, 2007 at 09:00:14AM -0400, Daniel Veillard wrote:
There are a few things I gathered on this issue. This affects
NUMA setups, where basically, if a domain must be placed on a given cell,
it is not good to let the hypervisor place it first with its own heuristics
and then later migrate it to a different set of CPUs; it is better to
instruct the hypervisor to start said domain on the given set.
- For Xen it is possible to instruct the hypervisor by passing
(cpus '2,3') in the SExpr where the argument is a list of
the physical processors allowed
- For KVM I think the standard way would be to select the
cpuset using sched_setaffinity() between the fork of the
current process and the exec of the qemu process
Yep, as with Xen, this will only let you specify a coarse mapping at the time
of creating the VM, i.e. you can say 'this VM is allowed to run on pCPUs 1 & 3',
but you can't say 'this VM's vCPU 1 is allowed on pCPU 1 and vCPU 2 is allowed
on pCPU 3'. This is because KVM has one thread per vCPU, and at the time
of creating the VM the vCPU threads don't yet exist & so there's nothing
to pin. Not a huge problem really, just something we should document.
Basically we are setting VM affinity at time of creation. VCPU affinity
can be set once the VM is running, to fine-tune.
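
For illustration, a minimal sketch of the fork/exec affinity approach
mentioned above could look like the following; the qemu binary name, its
arguments and the pCPU numbers are just placeholders, not what libvirt
would actually generate:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        cpu_set_t mask;
        pid_t pid;

        CPU_ZERO(&mask);
        CPU_SET(2, &mask);              /* allow pCPU 2 */
        CPU_SET(3, &mask);              /* allow pCPU 3 */

        pid = fork();
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {
            /* child: restrict affinity before exec'ing qemu, so every
               vCPU thread qemu later creates inherits this cpuset */
            if (sched_setaffinity(0, sizeof(mask), &mask) < 0) {
                perror("sched_setaffinity");
                _exit(1);
            }
            execlp("qemu-kvm", "qemu-kvm", "-m", "512", (char *)NULL);
            _exit(1);                   /* only reached if exec failed */
        }
        /* parent: would normally keep track of the child here */
        return 0;
    }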
- there is no need (from a NUMA perspective) to do fine-grained
allocation at that point, as long as the domain can be restricted
to a given cell at startup; if needed, virDomainPinVcpu() can be
used later to do more precise pinning in order to try to optimize
placement
Yep, from a NUMA pov we're only concerned with the VM memory allocation,
so it is sufficient to consider VM affinity and not vCPU affinity.
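
As a hedged sketch (not libvirt's own code) of that later fine-tuning step,
pinning vCPU 0 of a running domain to pCPU 2 with virDomainPinVcpu() could
look like this; the domain name "demo" and the pCPU choice are placeholders:

    #include <stdio.h>
    #include <libvirt/libvirt.h>

    int main(void)
    {
        virConnectPtr conn = virConnectOpen(NULL);
        virDomainPtr dom;
        unsigned char cpumap[1] = { 0 };    /* enough bits for pCPUs 0-7 */

        if (conn == NULL)
            return 1;
        dom = virDomainLookupByName(conn, "demo");
        if (dom == NULL) {
            virConnectClose(conn);
            return 1;
        }

        cpumap[0] |= 1 << 2;                /* allow pCPU 2 only */
        if (virDomainPinVcpu(dom, 0, cpumap, sizeof(cpumap)) < 0)
            fprintf(stderr, "failed to pin vCPU 0\n");

        virDomainFree(dom);
        virConnectClose(conn);
        return 0;
    }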
- to be able to instruct the hypervisor at creation time, adding the
information in the domain XML description looks like the most natural way
(another option would be to force the use of virDomainDefineXML, add a
call using the resulting virDomainPtr to define the set, and
then use virDomainCreate to do the actual start)
+ the good point of having this embedded in the XML is that
we still have all the information about the domain settings in
the XML if we want to restart it later
+ the bad point is that we need to fetch and carry this extra
information when doing XML dumps so as not to lose it, for example
when manipulating the domain to add or remove devices
- extracting a cpuset can still be a heavy operation; for example,
when using xend one needs one RPC per vcpu in the domain, the cpuset
being constructed by logically OR'ing all the cpumaps used by the
vcpus of the domain (though in most cases this will be the full
map after the first vcpu and the scan can be stopped immediately)
Fetching /xend/domain/%s?op=vcpuinfo lets us get info for all vCPUs
in a domain in a single RPC, doesn't it? In any case we should first
just try the hypercall - in all normal scenarios that'll work fine.
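
For illustration only, here is a sketch of the OR'ing step at the public
API level, using virDomainGetVcpus() to fetch all the per-vCPU cpumaps in
one call ("conn" and "dom" are assumed valid, error handling kept brief):

    #include <stdlib.h>
    #include <libvirt/libvirt.h>

    /* returns a malloc'ed bitmap of the physical CPUs the domain may use,
       built by OR'ing the cpumaps of all its vcpus, or NULL on error */
    static unsigned char *
    domain_cpuset(virConnectPtr conn, virDomainPtr dom, int *maplen_out)
    {
        virNodeInfo nodeinfo;
        virDomainInfo dominfo;
        virVcpuInfoPtr info = NULL;
        unsigned char *cpumaps = NULL, *cpuset = NULL;
        int maplen, nvcpus, i, j;

        if (virNodeGetInfo(conn, &nodeinfo) < 0 ||
            virDomainGetInfo(dom, &dominfo) < 0)
            return NULL;

        maplen = (nodeinfo.cpus + 7) / 8;   /* one bit per physical CPU */
        info = malloc(dominfo.nrVirtCpu * sizeof(*info));
        cpumaps = calloc(dominfo.nrVirtCpu, maplen);
        cpuset = calloc(1, maplen);
        if (!info || !cpumaps || !cpuset)
            goto error;

        nvcpus = virDomainGetVcpus(dom, info, dominfo.nrVirtCpu,
                                   cpumaps, maplen);
        if (nvcpus < 0)
            goto error;

        /* the domain cpuset is the logical OR of every vcpu's cpumap */
        for (i = 0; i < nvcpus; i++)
            for (j = 0; j < maplen; j++)
                cpuset[j] |= cpumaps[i * maplen + j];

        free(info);
        free(cpumaps);
        *maplen_out = maplen;
        return cpuset;

    error:
        free(info);
        free(cpumaps);
        free(cpuset);
        return NULL;
    }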
- for the mapping at the XML level I suggest a simple extension
to the <vcpu>n</vcpu> element:
<vcpu cpuset='2,3'>n</vcpu>
with a limited syntax which is just the comma-separated list of
allowed CPU numbers (the attribute would only be emitted if the code
actually detects that such a cpuset is in effect, i.e. in general it
won't be added).
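
For example, in a fuller domain description the proposed attribute would
sit like this (the values are only illustrative):

    <domain type='xen'>
      <name>demo</name>
      <memory>262144</memory>
      <vcpu cpuset='2,3'>4</vcpu>
      <!-- os, devices, etc. unchanged -->
    </domain>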
It doesn't make sense to me to include the info at a vCPU level in the
XML, since our granularity at time of creation is only at the VM level.
When dumping XML, the VM's affinity is basically the union of the affinities
of all of its vCPUs.
Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|