Re: [libvirt] [RFC PATCH 0/2] Introduce RDT memory bandwidth allocation support

Wednesday, 6 June 2018

Hi Pavel,
Thanks for your valuable inputs here. please see my respond.

On 2018年06月05日 20:11, Pavel Hrdina wrote:
...
 On Tue, May 29, 2018 at 06:58:01PM +0800, bing.niu(a)intel.com wrote:
> From: Bing Niu <bing.niu(a)intel.com&gt;
>
> This series is to introduce RDT memory bandwidth allocation support by extending
> current virresctrl implementation.
>
> The Memory Bandwidth Allocation (MBA) feature provides indirect and approximate
> control over memory bandwidth available per-core. This feature provides a method to
> control applications which may be over-utilizing bandwidth relative to their
priority
> in environments such as the data-center. The details can be found in Intel's SDM
17.19.7.
> Kernel supports MBA through resctrl file system same as CAT. Each resctrl group have
a
> MB parameter to control how much memory bandwidth it can utilize in unit of
percentage.
>
> In this series, MBA is enabled by enhancing existing virresctrl implementation. The
> policy employed for MBA is similar with CAT: The sum of each MBA group's
bandwidth
> dose not exceed 100%. The enhancement of virresctrl include two parts:
>
> Patch 1: Add two new structure virResctrlInfoMB and virResctrlAllocMB for collecting
>           host system MBA capability and domain memory bandwidth allocation.
>
> Patch 2: On frontend XML parsing, add new element "llc" in cachetune
section for
>           MBA allocation.

 Hi,

 Thanks for the patches.  Before we start with the actual implementation
 it would be nice to agree on the design. Total agree. The RFC code acts as baseline
for discuss.
...

 ------------------------------------------------------------------------

 So first point is that we should do it similarly as the cache
 allocation, we will not allow to "share" the bandwidth so the sum should
 be 100% as you already have that in your patches, but we need to do it
 in a way that in the future we can allow to "share" the bandwidth.
Yes, the memory bandwidth allocation policy is derived from existing CAT  in
libvirt. no share or overlap. In the patch, I follow the existing CAT 
behavior. When allocating memory bandwidth. First, calculate the unused 
memory bandwidth by subtracting all existing RDT groups. If we want to 
enable memory bandwidth sharing. We can just simply skip this part and 
do allocation directly.
Could this fit your comment " we need to do it in a way that in the 
future we can allow to "share" the bandwidth."?
If there is anything missing or my understanding incorrect, Please point 
me out. :)

...
 Second point is how the XML will look like.  There are two parts, one
is
 the capabilities XML and second one is domain XML.

 It looks like that your patches don't expose any information in
 capabilities, we should do that in order to let management applications
 know that the feature is available and what are the possible values that
 they can use.

 ------------------------------------------------------------------------

 I've tried to configure MBA on one machine that I have access to witch
 has this cpu: 'Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz' and it behaves
 strangely.  If I configure 'schemata' the output of 'pqos -s' command is
 in some situations different:

      schemata                pqos -s output

      MB:0=10                 MBA COS0 => 10% available
      MB:0=20                 MBA COS0 => 20% available
      MB:0=30                 MBA COS0 => 30% available
      MB:0=40                 MBA COS0 => 40% available
      MB:0=50                 MBA COS0 => 50% available
      MB:0=60                 MBA COS0 => 60% available
      MB:0=70                 MBA COS0 => 90% available
      MB:0=80                 MBA COS0 => 90% available
      MB:0=90                 MBA COS0 => 90% available
      MB:0=100                MBA COS0 => 100% available

 If you look at the table you can see that for values 70-90 the pqos
 shows that the available bandwidth is 90%.

 Tested using Fedora 28:
 kernel-4.16.13-300.fc28.x86_64
 intel-cmt-cat-1.2.0-2.fc28.x86_64
  hmm.., that is strange. I directly manipulate resctrl fs. So I didn't 
hit such kind of issue. I will take a look at this pqos package and let 
you know.
...

------------------------------------------------------------------------

 Since CAT (cache allocation technology) and MBA (memory bandwidth
 allocation) are unrelated and they are controlling different limitation
 we should not group MBA together with CAT in our XML files.  From poor
 documentation it looks like that MBA is related to memory controller.  From Intel
sdm 17.19. MBA used to control the request rate for flushing 
data from llc to memory, usually MBA and llc have a 1:1 mapping 
relation. Yes, I miss exposing capability part. Thanks for pointing out.
...

 Currently the cache allocation in capabilities XML is reported like
 this:

 <capabilities>
    <host>
      ...
      <cache>
        <bank id='0' level='3' type='both' size='30720'
unit='KiB'
cpus='0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46'>
          <control granularity='1536' unit='KiB' type='code'
maxAllocs='8'/>
          <control granularity='1536' unit='KiB' type='data'
maxAllocs='8'/>
        </bank>
        <bank id='1' level='3' type='both' size='30720'
unit='KiB'
cpus='1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47'>
          <control granularity='1536' unit='KiB' type='code'
maxAllocs='8'/>
          <control granularity='1536' unit='KiB' type='data'
maxAllocs='8'/>
        </bank>
      </cache>
      ...
    </host>
 </capabilities>

 So the possible capabilities XML could look like this:

 <capabilities>
    <host>
      ...
      <memory>
        <bank id='0'
cpus='0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46'>
          <control granularity='10' maxAllocs='8'/>
        </bank>
        <bank id='1'
cpus='1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47'>
          <control granularity='10' maxAllocs='8'/>
        </bank>
      </memory>
      ...
    </host>
 </capabilities>

 The element names 'memory' and 'bank' can be named differently,
 suggestions are welcome. How about change bank to node?
...

 Then there is the domain XML, for CAT we use this:

 <domain>
    ...
    <cputune>
      ...
      <cachetune vcpus='0-3'>
        <cache id='0' level='3' type='both' size='3'
unit='MiB'/>
        <cache id='1' level='3' type='both' size='3'
unit='MiB'/>
      </cachetune>
      ...
    <cputune>
    ...
 </domain>

 so the possible domain XML could look like this:

 <domain>
    ...
    <cputune>
      ...
      <memory vcpus='0-3'>
        <socket id='0' bandwidth='30'/>
        <socket id='1' bandwidth='20'/>
      </memory>
      ...
    <cputune>
    ...
 </domain>

 Again, the element names 'memory' and 'socket' can be named differently.
socket --> node?

Since the existing virrestrl implementation only care about cache part 
during development, So we may need change some names of structure and 
functions when enable MBA. How do you think
...

 Pavel

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [libvirt] [RFC PATCH 0/2] Introduce RDT memory bandwidth allocation support