On Thu, Jan 12, 2017 at 09:44:36AM +0800, 乔立勇(Eli Qiao) wrote:
> Hi, it's really good to have you involved in supporting CAT in
> libvirt/OpenStack.
> Replied inline.
>
> 2017-01-11 20:19 GMT+08:00 Marcelo Tosatti <mtosatti(a)redhat.com>:
>
> >
> > Hi,
> >
> > Comments/questions related to:
> >
> > https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
> >
> > 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
> >
> > How does allocation of code/data look like?
> >
>
> My plan is to expose new options:
>
> virsh cachetune kvm02 --l3data.count 2 --l3code.count 2
>
> Please note, you can use only l3, or l3data/l3code (if CDP is enabled
> when mounting the resctrl fs).
Fine. However, you should be able to emulate a type=both reservation
(non-CDP) by writing a schemata file with the same CBM bits:
L3code:0=0x000ff;1=0x000ff
L3data:0=0x000ff;1=0x000ff
(*)
I don't see how this interface enables that possibility.
I suppose it would be easier for mgmt software to have it
done automatically:
virsh cachetune kvm02 --l3 size_in_kbytes
which would create the reservations as in (*) in resctrlfs, in
case the host is CDP enabled.
(Also, please use kbytes, or give a reason not to use
kbytes.)
Note: exposing the unit size is fine, as mgmt software might
decide on a placement of VMs which reduces the amount of L3
cache reservation rounding (although I doubt anyone is going
to care about that in practice).
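For illustration, a rough sketch (Python, error handling omitted) of the
kbytes -> CBM conversion and the duplicated L3code/L3data write I have in
mind. Treating cpu0/cache/index3 as the L3 cache and grabbing the low bits
of the mask are simplifications on my part, not how a real allocator
would do it:

    import os

    RESCTRL = "/sys/fs/resctrl"

    def read(path):
        with open(path) as f:
            return f.read().strip()

    def kbytes_to_cbm(size_kb):
        # Total L3 size in KiB (assumes index3 is the L3 cache).
        size = read("/sys/devices/system/cpu/cpu0/cache/index3/size")
        total_kb = int(size.rstrip("K"))        # e.g. "56320K" -> 56320
        # Width of the full CBM in bits.
        info = "L3CODE" if os.path.isdir(RESCTRL + "/info/L3CODE") else "L3"
        full_mask = int(read(RESCTRL + "/info/" + info + "/cbm_mask"), 16)
        cbm_len = bin(full_mask).count("1")     # e.g. fffff -> 20 bits
        unit_kb = total_kb // cbm_len           # KiB covered by one CBM bit
        nbits = max(1, -(-size_kb // unit_kb))  # round up to whole units
        # Simplification: take the low bits; a real allocator has to pick
        # a *free* contiguous range instead.
        return (1 << nbits) - 1

    def write_schemata(group, cbm, cdp, cache_ids=(0, 1)):
        mask_str = ";".join("%d=%x" % (c, cbm) for c in cache_ids)
        if cdp:
            # Emulate a type=both reservation: same CBM for code and
            # data, as in (*) above.
            lines = ["L3CODE:" + mask_str, "L3DATA:" + mask_str]
        else:
            lines = ["L3:" + mask_str]
        with open(RESCTRL + "/" + group + "/schemata", "w") as f:
            f.write("\n".join(lines) + "\n")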
> > 2) 'nodecachestats' command:
> >
> > 3. Add new virsh command 'nodecachestats':
> > This API is to expose the cache resource left on each hardware unit (cpu
> > socket).
> > It will be formatted as:
> > <resource_type>.<resource_id>: left size KiB
> >
> > Does this take into account that only contiguous regions of cbm masks
> > can be used for allocations?
> >
> >
> Yes, it is the contiguous-region CBM; in other words, it's the cache value
> represented by the default CBM.
>
> resctrl doesn't allow setting a non-contiguous CBM (which is a restriction
> of the hardware).
OK.
>
>
> > Also, it should return the amount of free cache on each cacheid.
> >
>
> yes, it is. resource_id == cacheid
OK.
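To make the contiguity point concrete, this is roughly how I'd expect the
per-cache-id free value to be computed (a sketch; it only looks at the
non-CDP "L3" lines, and "free" here means bits not claimed by any
non-default group, which is my reading of what you describe):

    import glob

    RESCTRL = "/sys/fs/resctrl"

    def parse_schemata(path):
        # "L3:0=fffff;1=fffff" -> {"L3": {0: 0xfffff, 1: 0xfffff}}
        res = {}
        with open(path) as f:
            for line in f:
                name, rest = line.strip().split(":", 1)
                res[name] = dict((int(c), int(m, 16))
                                 for c, m in (p.split("=")
                                              for p in rest.split(";")))
        return res

    def largest_contiguous_bits(mask):
        # Length of the longest run of set bits; only a contiguous run
        # can actually be handed out as a new reservation.
        best = run = 0
        while mask:
            run = run + 1 if mask & 1 else 0
            best = max(best, run)
            mask >>= 1
        return best

    def free_kib_per_cache_id(unit_kb):
        # unit_kb: KiB covered by one CBM bit (as computed earlier).
        full = int(open(RESCTRL + "/info/L3/cbm_mask").read(), 16)
        # Cache ids are taken from the default group's schemata.
        used = dict.fromkeys(parse_schemata(RESCTRL + "/schemata")["L3"], 0)
        # OR together the CBMs of all non-default groups.
        for sch in glob.glob(RESCTRL + "/*/schemata"):
            for cid, cbm in parse_schemata(sch).get("L3", {}).items():
                used[cid] = used.get(cid, 0) | cbm
        return dict((cid, largest_contiguous_bits(full & ~u) * unit_kb)
                    for cid, u in used.items())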
> >
> > 3) The interface should support different sizes for different
> > cache-ids. See the KVM-RT use case at
> >
> > https://www.redhat.com/archives/libvir-list/2017-January/msg00415.html
> > "WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)".
> >
>
> I don't think it's good to let the user specify cache-ids when doing cache
> allocation.
This is necessary for our usecase.
> the cache ids used should depend on what cpu affinity the VM has set.
The cache ids configuration should match the cpu affinity configuration.
> eg.
>
> 1. for hosts that have only one cache id (single-socket hosts), we don't
> need to set the cache id
Right.
> 2. if there are multiple cache ids (sockets), the user should set the
> vcpu -> pcpu mapping (define a cpuset for the VM), then we (libvirt) need
> to compute how much cache should be set on which cache id.
> Which is to say, the user should set the cpu affinity before cache allocation.
>
> I know that most cases of using CAT are for NFV. As far as I know, NFV
> uses NUMA and cpu pinning (vcpu -> pcpu mapping), so we don't need to
> worry about which cache id we set the cache size on.
>
> So, just let the user specify the cache size (here my proposal is a cache
> unit count) and let libvirt decide how much cache to set on which cache id.
OK, fine, it's OK not to expose this to the user but to calculate it
internally in libvirt, as long as you recompute the schematas whenever
the cpu affinity changes. But using different cache-ids in the schemata is
necessary for our usecase.
Hum, thinking again about this, it needs to be per-vcpu. So for the NFV
use-case you want:
vcpu0: no reservation (belongs to the default group).
vcpu1: reservation with particular size.
Then if a vcpu is pinned, "trim" the reservation down to the
particular cache-id it is pinned to.
This is important because it allows the vcpu0 workload not to
interfere with the realtime workload running on vcpu1.
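Concretely, a rough sketch of the "trim" step (the group name is made up,
reading the L3 id from the sysfs cache directory whose level file says 3
is an assumption, and leaving the full mask on the other cache-ids is just
one possible choice):

    import glob

    def l3_cache_id(pcpu):
        # The L3 id of a given pcpu: the sysfs cache index whose level is 3.
        for d in glob.glob("/sys/devices/system/cpu/cpu%d/cache/index*" % pcpu):
            if open(d + "/level").read().strip() == "3":
                return int(open(d + "/id").read())
        return None

    def trim_reservation(group, cbm, pinned_pcpu, full_mask, cache_ids):
        # Reserve `cbm` only on the cache-id the vcpu is pinned to; leave
        # the full (shared) mask on the other cache-ids.
        target = l3_cache_id(pinned_pcpu)
        parts = ["%d=%x" % (cid, cbm if cid == target else full_mask)
                 for cid in cache_ids]
        with open("/sys/fs/resctrl/%s/schemata" % group, "w") as f:
            f.write("L3:" + ";".join(parts) + "\n")

    # e.g. vcpu1 pinned to pcpu 9, which sits on cache-id 1:
    #   trim_reservation("kvm02-vcpu1", 0x3, 9, 0xfffff, [0, 1])
    # writes:  L3:0=fffff;1=3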