On 2011-05-07 05:04, Bill Gray wrote:
Looks like there is only a single call-back function --
qemudSecurityHook() -- which has had some cgroup and CPU affinity code
already added in it.
Perhaps a good approach would be to add an invocation of a new function
-- qemudInitMemAffinity() -- as a peer to the already present invocation
of qemudInitCpuAffinity(). The qemudInitMemAffinity() function could use
set_mempolicy() to bind or prefer local/specific memory (depending on
whether the user specifies the explicit memory node list as mandatory or
just advisory). Advisory / preferred won't work correctly for large,
multi-node guests until multiple nodes can be preferred (presumably
selected by amount of free memory resources when multiple nodes are
preferred). It would also be helpful to have an additional attribute to
specify interleaved memory.
How does this approach sound?
Yes, the process should be like that, except that the code you are
looking at is not upstream libvirt. :)
I will add support for "interleave" in the next patch series.
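
For illustration only, here is a minimal sketch of the idea discussed in
this thread: set the memory policy in the child after fork() and before
exec(), so that the policy is inherited by the exec'ed binary without an
intermediate numactl process. This is not the libvirt patch. All sketch_*
names are hypothetical, the mode values are copied from <linux/mempolicy.h>,
and the raw Linux syscall is used in place of libnuma's set_mempolicy()
wrapper only so that the example builds without extra libraries.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Mode values as defined in <linux/mempolicy.h>; repeated here (an
 * assumption of this sketch) to avoid depending on libnuma's <numaif.h>. */
enum {
    SKETCH_MPOL_PREFERRED  = 1,  /* advisory: prefer a node, fall back to others */
    SKETCH_MPOL_BIND       = 2,  /* mandatory: allocate only from listed nodes */
    SKETCH_MPOL_INTERLEAVE = 3   /* spread pages across the listed nodes */
};

#define SKETCH_MAX_NODES     1024
#define SKETCH_BITS_PER_LONG (8 * sizeof(unsigned long))
#define SKETCH_MASK_LONGS    (SKETCH_MAX_NODES / SKETCH_BITS_PER_LONG)

typedef struct {
    unsigned long bits[SKETCH_MASK_LONGS];
} sketch_nodemask;

/* Set one node's bit in the mask. Returns 0, or -1 with errno = EINVAL. */
int sketch_nodemask_set(sketch_nodemask *mask, unsigned int node)
{
    assert(mask != NULL);
    if (node >= SKETCH_MAX_NODES) {
        errno = EINVAL;
        return -1;
    }
    mask->bits[node / SKETCH_BITS_PER_LONG] |=
        1UL << (node % SKETCH_BITS_PER_LONG);
    return 0;
}

/* Hypothetical counterpart to the qemudInitMemAffinity() proposed above:
 * apply 'mode' for the given node list to the calling thread. */
int sketch_init_mem_affinity(const unsigned int *nodes, size_t nnodes, int mode)
{
    sketch_nodemask mask;
    size_t i;

    memset(&mask, 0, sizeof(mask));
    for (i = 0; i < nnodes; i++) {
        if (sketch_nodemask_set(&mask, nodes[i]) < 0)
            return -1;
    }
    /* The kernel consults one bit fewer than maxnode, so pass size + 1. */
    return (int)syscall(SYS_set_mempolicy, mode, mask.bits,
                        (unsigned long)SKETCH_MAX_NODES + 1);
}

/* Fork, set the policy in the child, then exec. The policy is preserved
 * across execve(), so the new program starts with it already in place. */
pid_t sketch_spawn(char *const argv[], const unsigned int *nodes,
                   size_t nnodes, int mode)
{
    pid_t pid = fork();

    if (pid == 0) {
        if (sketch_init_mem_affinity(nodes, nnodes, mode) < 0)
            _exit(126);          /* refuse to run with the wrong policy */
        execvp(argv[0], argv);
        _exit(127);              /* exec failed */
    }
    return pid;                  /* child pid in the parent, -1 on error */
}
```

A caller would pass SKETCH_MPOL_BIND when the node list is mandatory,
SKETCH_MPOL_PREFERRED when it is advisory, and SKETCH_MPOL_INTERLEAVE for
interleaving. Note that with the preferred mode the kernel uses only the
first node in the mask, which is the limitation Bill mentions for
multi-node guests. The call can fail, for example on a kernel built
without NUMA support, in which case the child exits instead of starting
the program with the wrong policy.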
Regards
Osier
On 05/06/2011 09:24 AM, Daniel P. Berrange wrote:
> On Fri, May 06, 2011 at 09:20:18PM +0800, Osier Yang wrote:
>> On 2011-05-06 17:23, Daniel P. Berrange wrote:
>>> On Thu, May 05, 2011 at 04:30:30PM -0400, Bill Gray wrote:
>>>>
>>>> Hi Daniel,
>>>>
>>>> How can we get NUMA-aligned memory and CPUs if we apply binding APIs
>>>> after the process has already started? Might not all the memory
>>>> already be allocated on the wrong nodes by then?
>>>
>>> The policy has to be set after fork'ing the new QEMU process, but
>>> before exec'ing QEMU. This is essentially what you're doing with
>>> numactl, but with the problem of an extra binary that screws up
>>> the SELinux domain transitions from libvirtd_t -> svirt_t.
>>>
>>>> For expert users, what are the problems with starting qemu with an
>>>> external numactl command (with --cpunodebind and --membind) to
>>>> guarantee optimal alignment?
>>>
>>> Adding an intermediate process will prevent the necessary SELinux
>>> domain transitions from working. We don't want to allow the
>>> numactl binary to be able to transition to svirt_t because that
>>> would be inappropriate for most users of numactl
>>
>> This makes sense. As you said in another mail, perhaps we need to do some
>> work on __virExec; I will make a v2 series. Thanks for the feedback.
>
> Not virExec, but rather in the QEMU exec hook function
>
>
> Daniel