Re: [libvirt] OpenStack/libvirt CAT interface

11 Jan 2017

On Wed, Jan 11, 2017 at 10:18:11AM +0000, Daniel P. Berrange wrote:
...
On Tue, Jan 10, 2017 at 02:18:43PM -0200, Marcelo Tosatti wrote:
...
There have been queries about the OpenStack interface for CAT:
FYI, there's another mail discussing libvirt design here:
https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
...
http://bugzilla.redhat.com/show_bug.cgi?id=1299678
Comment 2 says:
Sahid Ferdjaoui 2016-01-19 10:58:48 EST
A spec will have to be addressed; at first look, this feature needs
some work in several components of Nova to maintain/schedule/consume
the host's cache. I can work on that spec and implement it once libvirt
provides information about the cache and a feature to use it for guests.
I could add a comment about parameters to resctrltool, but since
this depends on the libvirt interface, it would be good to know
what the libvirt interface exposes first.
I believe it should be essentially similar to OpenStack's
"reserved_host_memory_mb":
        Set the reserved_host_memory_mb to reserve RAM for host
        processes. For the purposes of testing I am going to use
        the default of 512 MB:
        reserved_host_memory_mb=512
But rather use:
rdt_cat_cache_reservation=type=code/data/both,size=10mb,cacheid=2;
                          type=code/data/both,size=2mb,cacheid=1;...
(per-vcpu), where the cache ID is optional.
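To make the proposed syntax concrete, here is a minimal parsing sketch. Only the option name and the type/size/cacheid fields come from the proposal above; the Reservation class, the function names and the unit handling (reading "mb" as MiB, accepting "kb" as KiB) are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical unit table: the proposal only shows "mb"; treating it as MiB
# (and accepting "kb" as KiB) is an assumption of this sketch.
_UNITS = {"kb": 1024, "mb": 1024 * 1024}


@dataclass
class Reservation:
    kind: str                # "code", "data" or "both"
    size_bytes: int
    cache_id: Optional[int]  # None when the optional cache ID is omitted


def parse_size(text: str) -> int:
    text = text.strip().lower()
    for suffix, factor in _UNITS.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)  # no suffix: plain bytes


def parse_reservations(value: str) -> List[Reservation]:
    """Parse the value of the proposed rdt_cat_cache_reservation option."""
    result = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        fields = dict(item.strip().split("=", 1) for item in entry.split(","))
        kind = fields["type"]
        if kind not in ("code", "data", "both"):
            raise ValueError(f"unknown reservation type {kind!r}")
        # The thread spells the key both "cacheid" and "cache-id".
        raw_id = fields.get("cacheid", fields.get("cache-id"))
        cache_id = int(raw_id) if raw_id is not None else None
        result.append(Reservation(kind, parse_size(fields["size"]), cache_id))
    return result
```

An entry without a cache ID comes back with cache_id set to None, leaving placement to whatever consumes the reservation.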
What a cache ID is (from Documentation/x86/intel_rdt_ui.txt in recent
kernel sources):
Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2
caches are generally just shared by the hyperthreads on a core, but this
isn't an architectural requirement. We could have multiple separate L3
caches on a socket, multiple cores could share an L2 cache. So instead
of using "socket" or "core" to define the set of logical cpus sharing
a resource we use a "Cache ID". At a given cache level this will be a
unique number across the whole system (but it isn't guaranteed to be a
contiguous sequence, there may be gaps).  To find the ID for each logical
CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id
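A minimal sketch of how a management layer could build the cache-ID-to-CPU map from the sysfs files mentioned above. Besides "id", it reads the "level" file in each index* directory so that L2 and L3 IDs are not mixed up. The function names are hypothetical, and the grouping step is kept separate from the sysfs reading so it can be exercised without a real host.

```python
import glob
import os
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

CacheEntry = Tuple[int, int, int]  # (logical cpu, cache level, cache id)


def read_cache_entries(root: str = "/sys/devices/system/cpu") -> List[CacheEntry]:
    """Collect (cpu, level, id) for every cache index of every logical CPU."""
    entries = []
    pattern = os.path.join(root, "cpu[0-9]*", "cache", "index[0-9]*")
    for index_dir in glob.glob(pattern):
        cpu_dir = os.path.dirname(os.path.dirname(index_dir))  # .../cpuN
        cpu = int(os.path.basename(cpu_dir)[len("cpu"):])
        id_path = os.path.join(index_dir, "id")
        if not os.path.exists(id_path):
            continue  # kernels without cache ID support have no "id" file
        with open(os.path.join(index_dir, "level")) as f:
            level = int(f.read())
        with open(id_path) as f:
            cache_id = int(f.read())
        entries.append((cpu, level, cache_id))
    return entries


def group_cpus_by_cache_id(entries: Iterable[CacheEntry], level: int) -> Dict[int, List[int]]:
    """Return {cache id: sorted logical CPUs} for the given cache level."""
    groups = defaultdict(set)  # a set, since L1 data and L1 instruction share a level
    for cpu, lvl, cache_id in entries:
        if lvl == level:
            groups[cache_id].add(cpu)
    return {cid: sorted(cpus) for cid, cpus in sorted(groups.items())}
```

On a real host one would call group_cpus_by_cache_id(read_cache_entries(), level=3) to see which logical CPUs sit behind each L3 cache ID.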
So it seems like cache ID is something we need to add to the XML
I proposed at
https://www.redhat.com/archives/libvir-list/2017-January/msg00489.html
...
WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)
==========================================================
For virtualization the following scenario is desired,
on a given socket:
* VM-A with VCPUs VM-A.vcpu-1, VM-A.vcpu-2.
* VM-B with VCPUs VM-B.vcpu-1, VM-B.vcpu-2.
With one realtime workload on each vcpu-2.
Assume VM-A.vcpu-2 on pcpu 3.
Assume VM-B.vcpu-2 on pcpu 5.
Assume pcpus 0-5 on cacheid 0.
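The assumptions above are enough to work out which cache ID each realtime vCPU's reservation has to target. A minimal sketch with hypothetical names, assuming the pinning and the pCPU-to-cache-ID topology (e.g. from /sys/devices/system/cpu/cpu*/cache/index*/id) are already known:

```python
from typing import Dict, Tuple

VcpuKey = Tuple[str, str]  # (VM name, vCPU name)


def cache_ids_for_pinned_vcpus(pinning: Dict[VcpuKey, int],
                               pcpu_to_cache_id: Dict[int, int]) -> Dict[VcpuKey, int]:
    """Map each pinned vCPU to the cache ID of the pCPU it is pinned to."""
    return {vcpu: pcpu_to_cache_id[pcpu] for vcpu, pcpu in pinning.items()}


# The scenario above: realtime vCPUs pinned to pcpus 3 and 5,
# and pcpus 0-5 all behind cache ID 0.
scenario_pinning = {("VM-A", "vcpu-2"): 3, ("VM-B", "vcpu-2"): 5}
scenario_topology = {pcpu: 0 for pcpu in range(6)}
targets = cache_ids_for_pinned_vcpus(scenario_pinning, scenario_topology)
```

Only the vcpu-2 entries appear in the pinning map, since vcpu-1 of each VM stays in the default group.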
We want VM-A.vcpu-2 to have a certain region of cache reserved,
and VM-B.vcpu-2 as well. vcpu-1 of both VMs can use the default group
(that is, have no reserved L3 cache).
This translates to the following resctrltool-style reservations:
res.vm-a.vcpu-2
        type=both,size=VM-A-RESSIZE,cache-id=0
res.vm-b.vcpu-2
        type=both,size=VM-B-RESSIZE,cache-id=0
Which translates to the following in resctrlfs:
res.vm-a.vcpu-2
        type=both,size=VM-A-RESSIZE,cache-id=0
        type=both,size=default-size,cache-id=1
        ...
res.vm-b.vcpu-2
        type=both,size=VM-B-RESSIZE,cache-id=0
        type=both,size=default-size,cache-id=1
        ...
Which is what we want, since the VCPUs are pinned.
res.vm-a.vcpu-1 and res.vm-b.vcpu-1 don't need to
be assigned to any reservation, which means they'll
remain on the default group.
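The step from the resctrltool-style reservation to the resctrlfs listing amounts to giving every cache ID the user did not name the default size. A sketch that reproduces the listing above; the function name is hypothetical and sizes are kept as opaque strings, as in the thread:

```python
from typing import Dict, List


def expand_to_all_cache_ids(requested: Dict[int, str], all_cache_ids: List[int],
                            kind: str = "both",
                            default_size: str = "default-size") -> List[str]:
    """Give every cache ID the user did not name the default size."""
    return [f"type={kind},size={requested.get(cid, default_size)},cache-id={cid}"
            for cid in all_cache_ids]


# res.vm-a.vcpu-2 on a host with cache IDs 0 and 1:
for line in expand_to_all_cache_ids({0: "VM-A-RESSIZE"}, [0, 1]):
    print(line)
```

Because VM-A.vcpu-2 is pinned to a pCPU behind cache ID 0, the default-size entry for cache ID 1 is never exercised by that vCPU.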
You're showing type=both here, which IIUC means data
and instruction cache.
No, type=both is for non-CDP hosts (data and instruction
reservations shared).

type=data,type=code is for CDP hosts (data and instruction
reservations separate).
...
Is that configuring one cache
that serves both purposes?
Yes.
...
Do we need to be able
to configure them independently?
Yes.
...
...
RESTRICTIONS TO THE SYNTAX ABOVE
================================
Rules for the parameters:
* type=code must be paired with a type=data entry.
What does this mean exactly when configuring guests? Do
we have to configure data + instruction cache on the same
cache ID, do they have to be the same size, or are they
completely independent?
This means that a user can't specify this reservation:

	type=data,size=10mb,cache-id=1

They have to specify _both_ code and data
sizes:

	type=data,size=10mb,cache-id=1;
	type=code,size=2mb,cache-id=1

A single type=both reservation, however, is valid:

	type=both,size=10mb,cache-id=1
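The pairing rule can be checked mechanically before anything is written to resctrlfs. A minimal sketch under two assumptions: that code and data must be paired per cache ID (as in the example, though the thread does not state it explicitly), and that type=both cannot be mixed with code/data in one reservation, since type=both is for non-CDP hosts and code/data for CDP hosts. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


def reservation_errors(entries: Iterable[Tuple[str, int]]) -> List[str]:
    """entries are (type, cache id) pairs from one reservation.
    Returns a list of rule violations (empty when the reservation is valid)."""
    types_by_id = defaultdict(set)
    for kind, cache_id in entries:
        types_by_id[cache_id].add(kind)
    all_types: Set[str] = set().union(*types_by_id.values())
    errors = []
    # type=both is for non-CDP hosts, type=code/data for CDP hosts.
    if "both" in all_types and all_types & {"code", "data"}:
        errors.append("type=both cannot be mixed with type=code/type=data")
    # Assumption: code and data must be paired on the same cache ID.
    for cache_id in sorted(types_by_id):
        kinds = types_by_id[cache_id]
        if ("code" in kinds) != ("data" in kinds):
            errors.append(f"cache-id={cache_id}: type=code and type=data must be paired")
    return errors
```

With this, the lone type=data entry above is reported, while the code+data pair and the single type=both entry pass.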
...
...
ABOUT THE LIST INTERFACE
========================
Regarding an interface for listing the system's reservations
to OpenStack:
I think that what OpenStack needs is to check, before
starting a guest on a given host, that there is sufficient
space available for the reservation.
To do that, it can either:
1) use resctrltool list (the end of the output shows
   how much free space is available), or read
   resctrlfs directly (it has to lock the filesystem,
   read each directory AND each schemata, and count
   the number of zero bits), or
2) go via libvirt.
We should fix resctrltool/API to list the amount of contiguous free space.
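Counting free space from the schemata files could look roughly like this. The "L3:<cache id>=<hex mask>;..." line format is the one described in the kernel's intel_rdt_ui.txt; the function names are hypothetical, and the sketch assumes the caller passes only the masks of the reservation groups (not the default group, whose mask may span the whole cache). Since Intel CAT requires each capacity bitmask to be a contiguous run of set bits, the longest run of free bits is reported alongside the total.

```python
from typing import Dict, Iterable, Tuple


def parse_schemata_line(line: str) -> Dict[int, str]:
    """'L3:0=ff0;1=00f' -> {0: 'ff0', 1: '00f'} (cache id -> hex mask)."""
    _resource, domains = line.strip().split(":", 1)
    result = {}
    for domain in domains.split(";"):
        cache_id, mask = domain.strip().split("=", 1)
        result[int(cache_id)] = mask.strip()
    return result


def free_cache_bits(masks: Iterable[str], cbm_len: int) -> Tuple[int, int]:
    """Return (total free bits, longest contiguous run of free bits) for one
    cache ID, given the hex masks already allocated on it."""
    used = 0
    for mask in masks:
        used |= int(mask, 16)
    total_free = longest = run = 0
    for bit in range(cbm_len):
        if used & (1 << bit):
            run = 0
        else:
            total_free += 1
            run += 1
            longest = max(longest, run)
    return total_free, longest
```

A fragmented cache shows why the two numbers differ: with a single group holding 0x0f0 on a 12-bit mask, 8 bits are free but the largest new reservation can only use 4 of them. With CDP enabled, the kernel documentation describes separate L3DATA and L3CODE lines, which would have to be handled per line.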
OpenStack should just use libvirt APIs exclusively; there should not
be any need for it to use other tools if we've designed the libvirt API
correctly.
Got it.