Re: [libvirt] [numatune PATCH v2] Support NUMA tuning

20 May 2011

On 2011-05-19 15:34, Daniel Veillard wrote:
...
On Sun, May 15, 2011 at 09:37:21PM -0400, Mark Wagner wrote:
...
On 05/12/2011 06:45 AM, Daniel P. Berrange wrote:
...
On Thu, May 12, 2011 at 06:22:49PM +0800, Osier Yang wrote:
...
Hi, All
This series adopts Daniel's suggestion on v1: it uses libnuma
directly instead of invoking numactl to set the NUMA policy. It also
adds support for the "interleave" and "preferred" modes, in addition
to the "strict" mode supported in v1.
The new XML is like:
<numatune>
   <memory model="interleave" nodeset="+0-4,8-12"/>
</numatune>
I insist on using the numactl nodeset syntax to represent
the "nodeset", as I think the purpose of adding NUMA tuning
support is to serve NUMA users, and keeping the syntax the
same as numactl's will make it more familiar to them.
Compatibility with numactl syntax is an explicit non-goal.
numactl is just one platform specific impl.  Compatibility
with numactl syntax is of no interest to the ESX or VirtualBox
drivers. The libvirt NUMA syntax should be using other
existing libvirt XML as the design compatibility target.
I won't argue the semantics of the XML with you, but please keep in mind
that one of the main differences between using a numactl like
mechanism and taskset is that the NUMA mechanisms also let you
bind to specific, NUMA node memory, as well as specifying the
access type.
So from the outside looking in, keeping things in terms of cpusets
would seem to not be in full agreement with the RFE for NUMA support.
I would think that the specification of NUMA binding would need to
include NUMA nodes and specify memory bindings as well as the
access type. From a performance perspective, support for true
NUMA is the last hurdle keeping libvirt from being used in
high performance situations.
I think that specifying things in terms of nodes instead of
cpus will make it easier for the end user. So I guess I need
to withdraw the part about not arguing XML...
Hi Mark,
I'm not 100% sure I understand what you are disagreeing with:
   - it seems to me that the proposed model does allow the specification
     of the nodes and the associated memory binding
   - I wonder if you just object to the "nodeset" attribute name here
   - please note that "Node" in the context of libvirt has the specific
     meaning of the whole physical machine http://libvirt.org/goals.html
     that terminology was set up 5 years ago and present in many places
     of the libvirt API. On the other hand "nodeset" is being used in
     other places to specify a set of cpu nodes in a NUMA context.
I guess Mark is not objecting to the attribute name "nodeset". It
seems he means that if we use the same syntax as "cpuset", we will
not be in full agreement with the RFE for NUMA support, as we will
lose some of the syntax that libnuma uses.

To conclude the discussion: we will use "nodeset" as the attribute
name, with the same syntax as "cpuset", and we won't use the node
string parsing function "numa_parse_nodestring" provided by libnuma,
to avoid making a mess of things:

"numa_parse_nodestring" only accepts "!" at the beginning of the
node string (also "+", but since we won't support "+", I skip it
here), e.g. "0-4,!8-12" is not valid. Our current "cpuset" syntax,
however, allows "^" to be specified anywhere, e.g. "0-8,^2-4" is
valid. So even if we converted "^" to "!" before passing the string
to "numa_parse_nodestring", it still wouldn't work, unless we
declared in the documentation that "nodeset" uses the same syntax as
"cpuset" except that "^" must be specified at the beginning, and
that is no better than introducing a different syntax. On the other
hand, "numa_parse_nodestring" doesn't support syntax like "!6". So,
in short, if we use the same syntax as "cpuset", we can't/won't use
the libnuma parsing function.
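
To make the "cpuset"-style syntax concrete, here is a minimal Python
sketch of how a string such as "0-8,^2-4" could be turned into a bit
mask. It is not libvirt's parser: the function name "parse_nodeset",
the "max_nodes" limit, left-to-right evaluation, and accepting "^" in
front of a range are all assumptions made for illustration.

```python
def parse_nodeset(spec, max_nodes=64):
    """Parse a cpuset-style string into an integer bit mask.

    Hypothetical helper, not libvirt code. Items are applied from
    left to right; an item prefixed with "^" clears bits instead of
    setting them.
    """
    mask = 0
    for raw in spec.split(","):
        item = raw.strip()
        exclude = item.startswith("^")
        if exclude:
            item = item[1:]
        lo_text, sep, hi_text = item.partition("-")
        if not sep:
            hi_text = lo_text  # a single node is a one-element range
        if not (lo_text.isdigit() and hi_text.isdigit()):
            raise ValueError("bad nodeset item: %r" % raw)
        lo, hi = int(lo_text), int(hi_text)
        if lo > hi or hi >= max_nodes:
            raise ValueError("bad node range: %r" % raw)
        bits = ((1 << (hi - lo + 1)) - 1) << lo  # bits lo..hi set
        mask = (mask & ~bits) if exclude else (mask | bits)
    return mask


def nodes_in(mask):
    """List the node numbers whose bit is set in mask."""
    return [n for n in range(mask.bit_length()) if mask >> n & 1]
```

With these assumptions, nodes_in(parse_nodeset("0-8,^2-4")) gives
nodes 0, 1, 5, 6, 7 and 8, because the exclusion is applied to what
was collected before it.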

We will use "virDomainCpuSetParse" to parse the value of "nodeset"
into a bit mask and then pass it to the NUMA setting functions. We
need to do some conversion before passing it to the NUMA functions,
though, as the data types are different.
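
The conversion could look roughly like the Python sketch below. It
assumes, purely for illustration, that the parser yields one flag per
node and that the NUMA side wants the bits packed into fixed-width
words (libnuma's "struct bitmask" keeps its bits in an array of
unsigned long). The name "pack_flags" and the 64-bit word size are
hypothetical.

```python
def pack_flags(flags, bits_per_word=64):
    """Pack a per-node flag list into a list of integer words.

    Hypothetical helper: flags[i] is truthy when node i is selected.
    Bit (i % bits_per_word) of word (i // bits_per_word) is set for
    every selected node i.
    """
    nwords = (len(flags) + bits_per_word - 1) // bits_per_word
    words = [0] * nwords
    for node, selected in enumerate(flags):
        if selected:
            words[node // bits_per_word] |= 1 << (node % bits_per_word)
    return words


# Nodes 0, 1 and 5 selected out of 8 possible nodes.
flags = [1, 1, 0, 0, 0, 1, 0, 0]
words = pack_flags(flags)
```

In C the same loop would set the bits in a libnuma bitmask one node
at a time; only the container type differs.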

Even if we modify the current "cpuset" parsing function to support
"^2-4", that will still differ from what "!" means in libnuma.
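
A small Python model of the difference, under two assumptions: that
"^" removes nodes from the set listed in the same string, and that a
leading "!" in libnuma inverts the listed nodes against all nodes on
the host (my reading of the numactl documentation). The function
names and the 16-node host are made up for the example.

```python
def exclude_inline(listed, excluded):
    """Model of cpuset-style "^": drop nodes from the listed set."""
    return sorted(set(listed) - set(excluded))


def invert_all(listed, all_nodes):
    """Model of a leading "!": every host node that is NOT listed."""
    return sorted(set(all_nodes) - set(listed))


host_nodes = range(16)  # hypothetical 16-node host

# "0-8,^2-4" in cpuset style: start from 0-8, then remove 2-4.
caret_result = exclude_inline(range(0, 9), range(2, 5))

# "!2-4" in this model: all host nodes except 2-4.
bang_result = invert_all(range(2, 5), host_nodes)
```

In this model the "^" result never leaves the 0-8 range, while the
"!" result also includes nodes 9-15, so a plain character swap would
not preserve the meaning.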

That means libvirt will represent NUMA nodes with a syntax almost
completely different from libnuma's, losing the semantics of both
"+" and "!" in the presentation layer.

Thoughts?

Regards
Osier