
On 2012-10-15 14:22, Hu Tao wrote:
On Fri, Oct 12, 2012 at 01:27:27PM +0800, Osier Yang wrote:
These 3 elements conflict with each other in both the doc and the underlying code. This is to propose a solution.
Before writing any code, I want to see if the principle is correct; any advice is welcome.
Current problems:
Problem 1:
The doc shouldn't simply say "These settings are superseded by CPU tuning." for the <vcpu> element. Besides the tuning, <vcpu> allows specifying the current and maximum number of vcpus. Apart from that, <vcpu> also allows specifying the placement as "auto", which binds the domain process to the advisory nodeset from numad.
Problem 2:
The doc for <vcpu> says its "cpuset" specifies the physical CPUs that the vcpus can be pinned to. But that's not true, as it actually only pins the domain process to the specified physical CPUs. So it's either a documentation bug or a code bug.
Problem 3:
The doc for <vcpupin> says it supersedes "cpuset" of <vcpu>. That's not quite correct, as each <vcpupin> specifies the pinning policy for only one vcpu. What about the ones which don't have <vcpupin> specified? The doc says the vcpu will be pinned to all available physical CPUs, but what's the meaning of the "cpuset" attribute of <vcpu> then?
Problem 4:
The doc for <emulatorpin> says it pins the emulator threads (the domain process in other contexts; perhaps a follow-up patch to clean up the inconsistency is needed) to the physical CPUs specified by its "cpuset" attribute, which conflicts with <vcpu>'s "cpuset". And actually, in the underlying code, the affinity for the domain process is set twice if both "cpuset" of <vcpu> and <emulatorpin> are specified, and <emulatorpin>'s pinning overrides <vcpu>'s.
Problem 5:
When "placement" of <vcpu> is "auto" (i.e. numad is used to get the advisory nodeset to which the domain process is pinned), it will also be overridden by <emulatorpin>.
This patch tries to sort out the conflicts or bugs by:
1) Don't say <vcpu> is superseded by <cputune>
You mean in the documentation of the XML format?
Yes.
Actually, the VCPU placement settings of <vcpu> will be overridden by those of <cputune>. So I think it's better to keep the wording in the doc to make users aware of this.
The problem is that <vcpu> doesn't only define the vcpu affinities. And in the new design (see below), "cpuset" of <vcpu> defines the **default** placement for both the domain process (emulator threads in the emulatorpin context) and the vcpu threads. Vcpus which don't have <vcpupin> specified still inherit the default placement. Also, the domain process will be pinned to the default placement if <emulatorpin> is not specified.
2) Keep the semantics of "cpuset" of <vcpu> (i.e. still say it specifies the physical CPUs the virtual CPUs can be pinned to), but modify the doc to mention that it also sets the pinning policy for the domain process. The CPU placement of the domain process specified by "cpuset" of <vcpu> will be ignored if <emulatorpin> is specified, and similarly, the CPU placement of a vcpu thread will be ignored if it has <vcpupin> specified. A vcpu which doesn't have <vcpupin> specified inherits "cpuset" of <vcpu>.
OK.
3) Don't say <vcpu> is superseded by <vcpupin>. If neither <vcpupin> nor "cpuset" of <vcpu> is specified, the vcpu will be pinned to all available pCPUs.
OK.
4) If neither <emulatorpin> nor "cpuset" of <vcpu> is specified, the domain process (emulator threads in this context) will be pinned to all available pCPUs.
OK.
5) If "placement" of <vcpu> is "auto", <emulatorpin> is not allowed.
Conflicts with 2). Why not just override the emulator part? For vcpu threads the "placement" is still "auto".
In the final patch, <emulatorpin> is ignored if vcpu placement is "auto" when parsing.
6) Hotplugged vcpus will also inherit "cpuset" of <vcpu>.
OK.
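To make rules 2), 3), 4) and 6) concrete, here is a minimal sketch in Python. It is not libvirt code: the function name `resolve_placement` and its parameters are hypothetical, CPU sets are modelled as plain Python sets, and only the static case is covered ("placement" == "auto" is left out).

```python
from typing import Dict, Optional, Set, Tuple


def resolve_placement(
    host_cpus: Set[int],
    n_vcpus: int,
    vcpu_cpuset: Optional[Set[int]],
    vcpupin: Dict[int, Set[int]],
    emulatorpin: Optional[Set[int]],
) -> Tuple[Dict[int, Set[int]], Set[int]]:
    """Return (per-vcpu CPU sets, emulator CPU set) under the proposed rules.

    Hypothetical helper for illustration only; all names are assumptions.
    """
    # "cpuset" of <vcpu> is the default placement; with no cpuset at all
    # the default is every available physical CPU (rules 3 and 4).
    default = set(vcpu_cpuset) if vcpu_cpuset is not None else set(host_cpus)

    # A vcpu with its own <vcpupin> uses that; every other vcpu, including
    # hotplugged ones, inherits the default (rules 2 and 6).
    vcpus = {i: set(vcpupin.get(i, default)) for i in range(n_vcpus)}

    # <emulatorpin> wins for the domain process; otherwise it also falls
    # back to the default placement (rules 2 and 4).
    emulator = set(emulatorpin) if emulatorpin is not None else set(default)
    return vcpus, emulator


# Example: 2 vcpus, <vcpu cpuset='0-1'>, vcpu 1 pinned to pCPU 3, no <emulatorpin>.
vcpus, emulator = resolve_placement({0, 1, 2, 3}, 2, {0, 1}, {1: {3}}, None)
```

In this example vcpu 0 and the domain process inherit the default set, while vcpu 1 keeps its own pinning. Calling the function again with a larger `n_vcpus` models a hotplugged vcpu picking up the default.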
The document changes above will cause these code changes:
1) Inherit def->cpumask for each vcpu which doesn't have <vcpupin> specified, during parsing.
OK.
2) ping the vcpu which doesn't have <vcpupin> specified to def->cpumask
s/ping/pin/
either by cgroup or sched_setaffinity(2), which is actually done by 1).
Pin each vcpu according to this_vcpu->cpumask, since the cpumask either inherits from def->cpumask (<vcpu>) or is set by <vcpupin>.
Yeah, but this is already done by either cgroup or sched_setaffinity when the domain starts. So no new code is needed.
3) Error out if "placement" == "auto" and <emulatorpin> is specified. Otherwise, <emulatorpin> is honored, and "cpuset" of <cpuset> is ignored.
You mean "cpuset" of <vcpu> here?
Right, a typo.
But I still don't understand why "placement" = "auto" and <emulatorpin> cannot both exist, with the latter overriding the former.
I think we have agreement on the principle: we must use either "placement == auto" or <emulatorpin> to set the affinity for the domain process, but not both, right? Based on that agreement, there are two ways: one is to ignore one of them inside the driver, the other is to ignore it when parsing the conf. The latter is better, as other drivers could support "placement" and "emulatorpin" in the future; doing the ignoring inside each driver is duplicate work, and we need a general rule about the relationship between them for the doc.

In case we don't have agreement that we can't use both of them to set the affinities, the reason is: numad is likely to manage the affinity for the domain process dynamically in the future, and it's unpredictable to override the advisory affinity from numad with cgroup afterwards.

Regards,
Osier
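As an illustration of ignoring <emulatorpin> at parse time rather than inside each driver, here is a rough Python sketch using only the standard library. It is not the libvirt parser: the function name `parse_emulator_placement` is hypothetical, and treating a missing "placement" attribute as "static" is a simplifying assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def parse_emulator_placement(domain_xml: str) -> Tuple[str, Optional[str]]:
    """Return (placement, emulatorpin cpuset string or None).

    Hypothetical helper: <emulatorpin> is dropped during parsing when
    "placement" of <vcpu> is "auto", so no driver ever sees both.
    """
    root = ET.fromstring(domain_xml)

    vcpu = root.find("vcpu")
    # Assumption for this sketch: no placement attribute means "static".
    placement = vcpu.get("placement", "static") if vcpu is not None else "static"

    pin = root.find("cputune/emulatorpin")
    cpuset = pin.get("cpuset") if pin is not None else None

    if placement == "auto":
        # numad's advisory nodeset governs the domain process; ignore the pin.
        cpuset = None
    return placement, cpuset


AUTO_XML = """
<domain>
  <vcpu placement='auto'>4</vcpu>
  <cputune><emulatorpin cpuset='2-3'/></cputune>
</domain>
"""

STATIC_XML = """
<domain>
  <vcpu placement='static' cpuset='0-3'>4</vcpu>
  <cputune><emulatorpin cpuset='2-3'/></cputune>
</domain>
"""

auto_result = parse_emulator_placement(AUTO_XML)
static_result = parse_emulator_placement(STATIC_XML)
```

With the rule applied once in the parser, every driver receives an already consistent definition, which is the duplicate-work argument made above.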