Re: [libvirt] Yet another RFC for CAT

Monday, 4 September 2017

On Mon, Sep 04, 2017 at 04:14:00PM +0200, Martin Kletzander wrote:
...
 * The current design (finally something libvirt-related, right?)

 The discussion ended with the following conclusions (to the best of my
 knowledge; there were so many discussions about so many things that I
 would spend too much time looking up all of them):

 - Users should not need to specify bit masks, such complexity should be
   abstracted.  We'll use sizes (e.g. 4MB)

 - Multiple vCPUs might need to share the same allocation.

 - Exclusivity of allocations is to be assumed, that is only unoccupied
   cache should be used for new allocations.

 The last point seems trivial but it's actually a very specific condition
 that, if removed, can cause several problems.  If it's hard to grasp the
 last point together with the second one, you're on the right track.  If
 not, then I'll try to make a point for why the last point should be
 removed in 3... 2... 1...
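
The size-based interface and the exclusivity rule from the list above can
be sketched roughly as follows.  This is only an illustration, not
libvirt code: size_to_mask(), the way size and the occupied mask are all
hypothetical, and the real values would come from the host's cache
topology.  The one hard fact it relies on is that CAT requires the set
bits in a mask to be contiguous.

```python
def size_to_mask(size_bytes, way_size_bytes, num_ways, occupied_mask=0):
    """Return a contiguous bitmask covering size_bytes of unoccupied cache.

    All parameters are hypothetical inputs for this sketch.
    """
    ways_needed = -(-size_bytes // way_size_bytes)  # ceiling division
    if ways_needed == 0 or ways_needed > num_ways:
        raise ValueError("requested size does not fit in the cache")
    run = (1 << ways_needed) - 1
    # Slide the run of bits upwards until it no longer touches occupied bits.
    for shift in range(num_ways - ways_needed + 1):
        candidate = run << shift
        if candidate & occupied_mask == 0:
            return candidate
    raise ValueError("not enough contiguous unoccupied cache")


MiB = 1024 * 1024
# Assumed layout: 16 ways of 1 MiB each, the low 4 ways already allocated.
mask = size_to_mask(4 * MiB, MiB, 16, occupied_mask=0x000F)
print(f"{mask:04x}")
```

With the default group holding all bits (occupied_mask=0xffff in this
sketch) every request fails, which is the problem described under 2a
below.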

 * Design flaws

 1) Users cannot specify any allocation that would share only part with
    some other allocation of the domain or the default group.

 2) It was not specified what to do with the default resource group.
    There might be several ways to approach this, with varying pros and
    cons:

     a) Treat it as any other group.  That is, any bit set for this group
        will be excluded from usable bits when creating a new allocation
        for a domain.

         - Very predictable behaviour

         - You will not be able to allocate any amount of cache without
           first changing the setting for the default group, as that
           will have all the bits set, which makes all the cache unusable

     b) Automatically remove the appropriate amount of bits that are
        needed for new domains.

         - No need to do any change to the system settings in order to
           use this new feature

         - We would have to change system settings, which is generally
           frowned upon when done "automatically" as a side effect of
           starting a domain, especially for such a scarce resource as
           cache

         - The change to system settings would not be entirely
           predictable

     c) Act like it doesn't exist and don't remove its allocations from
        consideration

         - Doesn't really make sense, as system processes might be
           thrashing the cache as much as any VM, especially since all
           VM processes without allocations will be placed in the
           default group as well

 3) There is no way for users to know what the particular settings are
    for any running domain.

 The first point was deemed a corner case.  Fair enough on its own, but
 considering point 2 and its solutions, it is rather difficult for me to
 justify it.  Also, let's say you have a domain with 4 vCPUs, one of which
 you know might be thrashing the cache.  You don't want to restrict it
 completely, and the others will utilize the cache very nicely.  Sensible
 allocations for such a domain's vCPUs might be:

  vCPU  0:   000f
  vCPUs 1-3: ffff

 as you want vCPUs 1-3 to utilize even the part of cache that might get
 thrashed by vCPU 0.  Or they might share some data (especially
 guest-memory-related).

 The case above is not possible to set up with only a per-vcpu(s) scalar
 setting.  And there are more, as you might imagine now.  For example, how
 do we behave with iothreads and emulator threads?

Ok, I see what you're getting at.  I've actually forgotten what
our current design looks like though :-)

What level of granularity were we allowing within a guest?
All vCPUs use separate cache regions from each other, or all
vCPUs use a shared cache region, but separate from other guests,
or a mix?
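
To make the three options concrete, here is a rough sketch that treats
each vCPU's allocation as a capacity bitmask.  The classify() helper and
every mask value except the 000f/ffff pair quoted above are made up for
illustration; none of this is libvirt or kernel API.

```python
from itertools import combinations


def classify(masks):
    """Classify a guest's per-vCPU masks as 'shared', 'separate' or 'mix'."""
    values = list(masks.values())
    if len(set(values)) == 1:
        return "shared"      # every vCPU uses the same region
    if all(a & b == 0 for a, b in combinations(values, 2)):
        return "separate"    # no two vCPUs share any bit
    return "mix"             # some regions overlap, some do not


# Hypothetical layouts for a 4-vCPU guest on a 16-bit mask.
separate = {"vcpu0": 0x000F, "vcpu1": 0x00F0, "vcpu2": 0x0F00, "vcpu3": 0xF000}
shared = {"vcpu0": 0x00FF, "vcpu1": 0x00FF, "vcpu2": 0x00FF, "vcpu3": 0x00FF}
# The example from the quoted text: vCPU 0 restricted, vCPUs 1-3 get it all.
mix = {"vcpu0": 0x000F, "vcpu1": 0xFFFF, "vcpu2": 0xFFFF, "vcpu3": 0xFFFF}

for name, layout in (("separate", separate), ("shared", shared), ("mix", mix)):
    print(name, "->", classify(layout))
```

The 000f/ffff case lands in "mix", which is exactly the layout a single
size per vCPU cannot express.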

...
 * My suggestion:

 - Provide an API for querying and changing the allocation of the
   default resource group.  This would be similar to setting and
   querying hugepage allocations (see virsh's freepages/allocpages
   commands).

Reasonable

...
 - Let users specify the starting position in addition to the size, i.e.
   not only specifying "size", but also "from".  If "from" is not
   specified, the whole allocation must be exclusive.  If "from" is
   specified it will be set without checking for collisions.  The latter
   needs them to query the system or know what settings are applied
   (this should be the case all the time), but is better than adding
   non-specific and/or meaningless exclusivity settings (how do you
   specify part-exclusivity of the cache as in the example above?)

I'm concerned about the idea of not checking 'from' for collisions,
if a mix of guests with & without 'from' is allowed.

eg consider

 * Initially 24 MB of cache is free, starting at 8MB
 * run guest A   from=8M, size=8M
 * run guest B   size=8M
     => libvirt sets from=16M, so doesn't clash with A
 * stop guest A
 * run guest C   size=8M
     => libvirt sets from=8M, so doesn't clash with B
 * restart guest A
     => now clashes with guest C, whereas if you had
        left guest A running, then C would have
        got from=24M and avoided clash

IOW, if we're to allow users to set 'from', I think we need to
have an explicit flag to indicate whether this is an exclusive
or shared allocation. That way guest A would set 'exclusive',
and so at least see an error when it got a clash with guest
C in the example.
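
The sequence above, with the proposed flag, can be played through in a
small sketch.  CacheAllocator, AllocationClash and the run()/stop()
methods are hypothetical names for this illustration only, positions are
in MB, and automatic placement is assumed to take the lowest free
region; libvirt's actual behaviour may differ.

```python
class AllocationClash(Exception):
    """Raised when an exclusive allocation would overlap a running one."""


class CacheAllocator:
    def __init__(self, free_start, free_size):
        self.start = free_start
        self.end = free_start + free_size
        self.running = {}  # guest name -> (from, size)

    def _clashes(self, start, size):
        # Names of running guests whose region overlaps [start, start+size).
        return [name for name, (s, sz) in self.running.items()
                if start < s + sz and s < start + size]

    def _first_free(self, size):
        for start in range(self.start, self.end - size + 1):
            if not self._clashes(start, size):
                return start
        raise AllocationClash("no free region of the requested size")

    def run(self, name, size, start=None, exclusive=False):
        if start is None:
            start = self._first_free(size)   # libvirt picks 'from'
        elif exclusive:
            clashing = self._clashes(start, size)
            if clashing:
                raise AllocationClash(f"{name} clashes with {clashing}")
        self.running[name] = (start, size)
        return start

    def stop(self, name):
        del self.running[name]


alloc = CacheAllocator(free_start=8, free_size=24)
alloc.run("A", size=8, start=8, exclusive=True)
b_from = alloc.run("B", size=8)          # placed after A
alloc.stop("A")
c_from = alloc.run("C", size=8)          # reuses the region A vacated
try:
    alloc.run("A", size=8, start=8, exclusive=True)
    clash_reported = False
except AllocationClash as err:
    clash_reported = True
    print("error:", err)
print(b_from, c_from, clash_reported)
```

Without exclusive=True the restart of guest A would silently overlap
guest C, which is the situation the flag is meant to surface.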

...
 - After starting a domain, fill in any missing information about the
   allocation (I'm generalizing here, but for now it would only be the
   optional "from" attribute)

 - Add settings not only for vCPUs, but also for other threads as we do
   with pinning, schedulers, etc.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
