-----Original Message-----
From: Martin Kletzander [mailto:mkletzan@redhat.com]
Sent: Wednesday, July 18, 2018 10:03 PM
To: Wang, Huaqiang <huaqiang.wang(a)intel.com>
Cc: libvir-list(a)redhat.com; Feng, Shaohe <shaohe.feng(a)intel.com>; Niu, Bing
<bing.niu(a)intel.com>; Ding, Jian-feng <jian-feng.ding(a)intel.com>; Zang, Rui
<rui.zang(a)intel.com>
Subject: Re: [libvirt] [RFC PATCHv2 00/10] x86 RDT Cache Monitoring
Technology (CMT)
On Wed, Jul 18, 2018 at 12:19:18PM +0000, Wang, Huaqiang wrote:
>
>
>> -----Original Message-----
>> From: Martin Kletzander [mailto:mkletzan@redhat.com]
>> Sent: Wednesday, July 18, 2018 8:07 PM
>> To: Wang, Huaqiang <huaqiang.wang(a)intel.com>
>> Cc: libvir-list(a)redhat.com; Feng, Shaohe <shaohe.feng(a)intel.com>;
>> Niu, Bing <bing.niu(a)intel.com>; Ding, Jian-feng
>> <jian-feng.ding(a)intel.com>; Zang, Rui <rui.zang(a)intel.com>
>> Subject: Re: [libvirt] [RFC PATCHv2 00/10] x86 RDT Cache Monitoring
>> Technology (CMT)
>>
>> On Wed, Jul 18, 2018 at 02:29:32AM +0000, Wang, Huaqiang wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Martin Kletzander [mailto:mkletzan@redhat.com]
>> >> Sent: Tuesday, July 17, 2018 5:11 PM
>> >> To: Wang, Huaqiang <huaqiang.wang(a)intel.com>
>> >> Cc: libvir-list(a)redhat.com; Feng, Shaohe <shaohe.feng(a)intel.com>;
>> >> Niu, Bing <bing.niu(a)intel.com>; Ding, Jian-feng
>> >> <jian-feng.ding(a)intel.com>; Zang, Rui <rui.zang(a)intel.com>
>> >> Subject: Re: [libvirt] [RFC PATCHv2 00/10] x86 RDT Cache
>> >> Monitoring Technology (CMT)
>> >>
>> >> On Tue, Jul 17, 2018 at 07:19:41AM +0000, Wang, Huaqiang wrote:
>> >> >Hi Martin,
>> >> >
>> >> >Thanks for your comments. Please see my reply inline.
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Martin Kletzander [mailto:mkletzan@redhat.com]
>> >> >> Sent: Tuesday, July 17, 2018 2:27 PM
>> >> >> To: Wang, Huaqiang <huaqiang.wang(a)intel.com>
>> >> >> Cc: libvir-list(a)redhat.com; Feng, Shaohe
>> >> >> <shaohe.feng(a)intel.com>; Niu, Bing <bing.niu(a)intel.com>; Ding,
>> >> >> Jian-feng <jian-feng.ding(a)intel.com>; Zang, Rui
>> >> >> <rui.zang(a)intel.com>
>> >> >> Subject: Re: [libvirt] [RFC PATCHv2 00/10] x86 RDT Cache
>> >> >> Monitoring Technology (CMT)
>> >> >>
>> >> >> On Mon, Jul 09, 2018 at 03:00:48PM +0800, Wang Huaqiang wrote:
>> >> >> >
>> >> >> >This is the V2 of the RFC and the POC source code for introducing
>> >> >> >the x86 RDT CMT feature. Thanks to Martin Kletzander for his review
>> >> >> >and constructive suggestions on V1.
>> >> >> >
>> >> >> >This series tries to provide functions similar to the
>> >> >> >perf-event-based CMT, MBMT and MBML features in reporting cache
>> >> >> >occupancy, total memory bandwidth utilization and local memory
>> >> >> >bandwidth utilization information in libvirt. Firstly we focus
>> >> >> >on CMT.
>> >> >> >
>> >> >> >x86 RDT Cache Monitoring Technology (CMT) provides a method
>> >> >> >to track the cache occupancy information per CPU thread. We
>> >> >> >are leveraging the implementation of the kernel resctrl
>> >> >> >filesystem and building our patches on top of that.
>> >> >> >
>> >> >> >Describing the functionality at a high level:
>> >> >> >
>> >> >> >1. Extend the output of 'domstats' and report CMT information.
>> >> >> >
>> >> >> >Compared with the perf-event-based CMT implementation in
>> >> >> >libvirt, this series extends the output of the 'domstats'
>> >> >> >command and reports cache occupancy information like this:
>> >> >> ><pre>
>> >> >> >[root@dl-c200 libvirt]# virsh domstats vm3 --cpu-resource
>> >> >> >Domain: 'vm3'
>> >> >> > cpu.cacheoccupancy.vcpus_2.value=4415488
>> >> >> > cpu.cacheoccupancy.vcpus_2.vcpus=2
>> >> >> > cpu.cacheoccupancy.vcpus_1.value=7839744
>> >> >> > cpu.cacheoccupancy.vcpus_1.vcpus=1
>> >> >> > cpu.cacheoccupancy.vcpus_0,3.value=53796864
>> >> >> > cpu.cacheoccupancy.vcpus_0,3.vcpus=0,3
>> >> >> ></pre>
>> >> >> >The vcpus have been arranged into three monitoring groups;
>> >> >> >these three groups cover vcpu 1, vcpu 2 and vcpus 0,3
>> >> >> >respectively. For example, 'cpu.cacheoccupancy.vcpus_0,3.value'
>> >> >> >reports the cache occupancy information for vcpu 0 and vcpu 3,
>> >> >> >and 'cpu.cacheoccupancy.vcpus_0,3.vcpus'
>> >> >> >represents the vcpu group information.
>> >> >> >
>> >> >> >To address Martin's suggestion "beware as 1-4 is something else
>> >> >> >than 1,4 so you need to differentiate that.", the content of
>> >> >> >'vcpus' (cpu.cacheoccupancy.<groupname>.vcpus=xxx) is specially
>> >> >> >processed: if vcpus is a continuous range, e.g. 0-2, then the
>> >> >> >output of cpu.cacheoccupancy.vcpus_0-2.vcpus will be
>> >> >> >'cpu.cacheoccupancy.vcpus_0-2.vcpus=0,1,2'
>> >> >> >instead of
>> >> >> >'cpu.cacheoccupancy.vcpus_0-2.vcpus=0-2'.
>> >> >> >Please note that 'vcpus_0-2' is the name of this monitoring
>> >> >> >group; it can be set to any other word in the XML configuration
>> >> >> >file or changed on the fly with the command introduced in the
>> >> >> >following part.
>> >> >> >
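>> >> >> >For illustration, here is a minimal Python sketch of this range
>> >> >> >normalization (the helper name expand_vcpu_range is hypothetical,
>> >> >> >just showing the intended behaviour, not code from the patches):
>> >> >> >
>> >> >> ><pre>
>> >> >> >def expand_vcpu_range(spec):
>> >> >> >    """Expand a vcpu spec such as '0-2,5' into the explicit
>> >> >> >    list '0,1,2,5', so that a range like 1-4 cannot be
>> >> >> >    confused with the pair 1,4."""
>> >> >> >    vcpus = []
>> >> >> >    for part in spec.split(','):
>> >> >> >        if '-' in part:
>> >> >> >            lo, hi = part.split('-')
>> >> >> >            vcpus.extend(range(int(lo), int(hi) + 1))
>> >> >> >        else:
>> >> >> >            vcpus.append(int(part))
>> >> >> >    return ','.join(str(v) for v in vcpus)
>> >> >> >
>> >> >> >print(expand_vcpu_range('0-2'))  # prints: 0,1,2
>> >> >> ></pre>
>> >> >> >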
>> >> >>
>> >> >> One small nit regarding the naming (but it shouldn't block
>> >> >> any reviewers from reviewing, just keep this in mind for the
>> >> >> next version, for example) is that this is still inconsistent.
>> >> >
>> >> >OK. I'll try to use words such as 'cache', 'cpu resource' and
>> >> >avoid using 'RDT', 'CMT'.
>> >> >
>> >>
>> >> Oh, you misunderstood, I meant the naming in the domstats output =)
>> >>
>> >> >> The way domstats are structured when there is something like an
>> >> >> array could shed some light into this. What you suggested is
>> >> >> really kind of hard to parse (although it looks better). What
>> >> >> would you say to something like this:
>> >> >>
>> >> >> cpu.cacheoccupancy.count = 3
>> >> >> cpu.cacheoccupancy.0.value=4415488
>> >> >> cpu.cacheoccupancy.0.vcpus=2
>> >> >> cpu.cacheoccupancy.0.name=vcpus_2
>> >> >> cpu.cacheoccupancy.1.value=7839744
>> >> >> cpu.cacheoccupancy.1.vcpus=1
>> >> >> cpu.cacheoccupancy.1.name=vcpus_1
>> >> >> cpu.cacheoccupancy.2.value=53796864
>> >> >> cpu.cacheoccupancy.2.vcpus=0,3
>> >> >> cpu.cacheoccupancy.2.name=0,3
>> >> >>
>> >> >
>> >> >Your arrangement looks more reasonable, thanks for your advice.
>> >> >However, as I mentioned in another email that I sent to
>> >> >libvirt-list hours ago, the kernel resctrl interface provides
>> >> >cache occupancy information for each cache block for every
>> >> >resource group.
>> >> >Maybe we need to expose the cache occupancy for each cache block.
>> >> >If you agree, we need to refine the 'domstats' output message;
>> >> >how about this:
>> >> >
>> >> > cpu.cacheoccupancy.count=3
>> >> > cpu.cacheoccupancy.0.name=vcpus_2
>> >> > cpu.cacheoccupancy.0.vcpus=2
>> >> > cpu.cacheoccupancy.0.block.count=2
>> >> > cpu.cacheoccupancy.0.block.0.bytes=5488
>> >> > cpu.cacheoccupancy.0.block.1.bytes=4410000
>> >> > cpu.cacheoccupancy.1.name=vcpus_1
>> >> > cpu.cacheoccupancy.1.vcpus=1
>> >> > cpu.cacheoccupancy.1.block.count=2
>> >> > cpu.cacheoccupancy.1.block.0.bytes=7839744
>> >> > cpu.cacheoccupancy.1.block.1.bytes=0
>> >> > cpu.cacheoccupancy.2.name=0,3
>> >> > cpu.cacheoccupancy.2.vcpus=0,3
>> >> > cpu.cacheoccupancy.2.block.count=2
>> >> > cpu.cacheoccupancy.2.block.0.bytes=53796864
>> >> > cpu.cacheoccupancy.2.block.1.bytes=0
>> >> >
>> >>
>> >> What do you mean by cache block? Is that (cache_size / granularity)?
>> >> In that case it looks fine, I guess (without putting too much
>> >> thought into it).
>> >
>> >No. The 'cache block' that I mean is indexed by 'cache id', with the
>> >id number kept in '/sys/devices/system/cpu/cpu*/cache/index*/id'.
>> >
>> >Generally, for a two-socket server node there are two sockets (with
>> >CPU E5-2680 v4, for example) in the system, and each socket has an
>> >L3 cache. If a resctrl monitoring group is created
>> >(/sys/fs/resctrl/p0, for example), you can find the cache occupancy
>> >information for these two L3 cache areas separately in the file
>> >/sys/fs/resctrl/p0/mon_data/mon_L3_00/llc_occupancy
>> >and the file
>> >/sys/fs/resctrl/p0/mon_data/mon_L3_01/llc_occupancy
>> >Cache information for an individual socket is meaningful for
>> >detecting performance issues such as workload imbalance, etc. We'd
>> >better expose these details to libvirt users.
>> >In summary, I am using 'cache block' to describe the CPU cache
>> >indexed by the number found in
>> >'/sys/devices/system/cpu/cpu*/cache/index*/id'.
>> >I welcome suggestions on other naming for it.
>> >
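>> >For illustration, a minimal Python sketch of reading these per-bank
>> >numbers from resctrl (the function name and the example group p0 are
>> >just for this sketch; error handling omitted):
>> >
>> ><pre>
>> >import glob, os
>> >
>> >def read_llc_occupancy(group='/sys/fs/resctrl/p0'):
>> >    """Return {cache_id: occupancy_in_bytes} for one resctrl
>> >    monitoring group, one entry per L3 cache (e.g. per socket)."""
>> >    result = {}
>> >    for d in glob.glob(os.path.join(group, 'mon_data', 'mon_L3_*')):
>> >        cache_id = int(os.path.basename(d).split('_')[-1])
>> >        with open(os.path.join(d, 'llc_occupancy')) as f:
>> >            result[cache_id] = int(f.read())
>> >    return result
>> ></pre>
>> >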
>>
>> To be consistent I'd prefer "cache", "cache bank", and "index" or "id".
>> I don't have specific requirements, I just don't want to invent new
>> words. Look at how it is described in capabilities for example.
>>
>Makes sense. Then let's use 'id' for the purpose, and the output would be:
>
>cpu.cacheoccupancy.count=3
>cpu.cacheoccupancy.0.name=vcpus_2
>cpu.cacheoccupancy.0.vcpus=2
>cpu.cacheoccupancy.0.id.count=2
>cpu.cacheoccupancy.0.id.0.bytes=5488
>cpu.cacheoccupancy.0.id.1.bytes=4410000
>cpu.cacheoccupancy.1.name=vcpus_1
>cpu.cacheoccupancy.1.vcpus=1
>cpu.cacheoccupancy.1.id.count=2
>cpu.cacheoccupancy.1.id.0.bytes=7839744
>cpu.cacheoccupancy.1.id.1.bytes=0
>cpu.cacheoccupancy.2.name=0,3
>cpu.cacheoccupancy.2.vcpus=0,3
>cpu.cacheoccupancy.2.id.count=2
>cpu.cacheoccupancy.2.id.0.bytes=53796864
>cpu.cacheoccupancy.2.id.1.bytes=0
>
>How about it?
>
I'm switching contexts too much and hence I didn't make myself clear. Since IDs
are not guaranteed to be consecutive, this might be more future-proof:
cpu.cacheoccupancy.count=3
cpu.cacheoccupancy.0.name=vcpus_2
cpu.cacheoccupancy.0.vcpus=2
cpu.cacheoccupancy.0.bank.count=2
cpu.cacheoccupancy.0.bank.0.id=0
cpu.cacheoccupancy.0.bank.0.bytes=5488
cpu.cacheoccupancy.0.bank.1.id=1
cpu.cacheoccupancy.0.bank.1.bytes=4410000
cpu.cacheoccupancy.1.name=vcpus_1
cpu.cacheoccupancy.1.vcpus=1
cpu.cacheoccupancy.1.bank.count=2
cpu.cacheoccupancy.1.bank.0.id=0
cpu.cacheoccupancy.1.bank.0.bytes=7839744
cpu.cacheoccupancy.1.bank.1.id=1
cpu.cacheoccupancy.1.bank.1.bytes=0
cpu.cacheoccupancy.2.name=0,3
cpu.cacheoccupancy.2.vcpus=0,3
cpu.cacheoccupancy.2.bank.count=2
cpu.cacheoccupancy.2.bank.0.id=0
cpu.cacheoccupancy.2.bank.0.bytes=53796864
cpu.cacheoccupancy.2.bank.1.id=1
cpu.cacheoccupancy.2.bank.1.bytes=0
It is better now. Agreed.
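
For completeness, here is a minimal Python sketch of how a client could
walk this schema from the text output of 'virsh domstats'. It assumes the
proposed keys and the '--cpu-resource' flag land exactly as shown above;
the function name is hypothetical and this is an illustration, not code
from the series:

<pre>
import subprocess

def parse_cacheoccupancy(domain):
    """Collect the proposed cpu.cacheoccupancy.* stats for a domain."""
    out = subprocess.check_output(
        ['virsh', 'domstats', domain, '--cpu-resource'], text=True)
    stats = dict(line.strip().split('=', 1)
                 for line in out.splitlines() if '=' in line)
    groups = []
    for i in range(int(stats.get('cpu.cacheoccupancy.count', 0))):
        g = 'cpu.cacheoccupancy.%d' % i
        banks = {}
        for b in range(int(stats['%s.bank.count' % g])):
            bank_id = int(stats['%s.bank.%d.id' % (g, b)])
            banks[bank_id] = int(stats['%s.bank.%d.bytes' % (g, b)])
        groups.append({'name': stats['%s.name' % g],
                       'vcpus': stats['%s.vcpus' % g],
                       'banks': banks})
    return groups

print(parse_cacheoccupancy('vm3'))
</pre>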