[libvirt] OpenStack/libvirt CAT interface

There have been queries about the OpenStack interface for CAT:

http://bugzilla.redhat.com/show_bug.cgi?id=1299678

Comment 2 says:
Sahid Ferdjaoui 2016-01-19 10:58:48 EST
A spec will have to be addressed; after a first look, this feature needs some work in several components of Nova to maintain/schedule/consume the host's cache. I can work on that spec and implement it once libvirt provides information about the cache and a feature to use it for guests.

I could add a comment about parameters to resctrltool, but since this depends on the libvirt interface, it would be good to know what the libvirt interface exposes first.

I believe it should be essentially similar to OpenStack's "reserved_host_memory_mb":

Set the reserved_host_memory_mb to reserve RAM for host processes. For the purposes of testing I am going to use the default of 512 MB: reserved_host_memory_mb=512

But rather use:

rdt_cat_cache_reservation=type=code/data/both,size=10mb,cacheid=2; type=code/data/both,size=2mb,cacheid=1;...
(per-vcpu).
Where cache-id is optional.

What is cache-id (from Documentation/x86/intel_rdt_ui.txt on recent kernel sources):

Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2 caches are generally just shared by the hyperthreads on a core, but this isn't an architectural requirement. We could have multiple separate L3 caches on a socket, multiple cores could share an L2 cache. So instead of using "socket" or "core" to define the set of logical cpus sharing a resource we use a "Cache ID". At a given cache level this will be a unique number across the whole system (but it isn't guaranteed to be a contiguous sequence, there may be gaps). To find the ID for each logical CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id

WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)
==========================================================

For virtualization the following scenario is desired, on a given socket:

* VM-A with VCPUs VM-A.vcpu-1, VM-A.vcpu-2.
* VM-B with VCPUs VM-B.vcpu-1, VM-B.vcpu-2.

With one realtime workload on each vcpu-2.

Assume VM-A.vcpu-2 on pcpu 3.
Assume VM-B.vcpu-2 on pcpu 5.
Assume pcpus 0-5 on cacheid 0.

We want VM-A.vcpu-2 to have a certain region of cache reserved, and VM-B.vcpu-2 as well. vcpu-1 for both VMs can use the default group (that is not have reserved L3 cache).

This translates to the following resctrltool-style reservations:

res.vm-a.vcpu-2
type=both,size=VM-A-RESSIZE,cache-id=0

res.vm-b.vcpu-2
type=both,size=VM-B-RESSIZE,cache-id=0

Which translate to the following in resctrlfs:

res.vm-a.vcpu-2
type=both,size=VM-A-RESSIZE,cache-id=0 type=both,size=default-size,cache-id=1 ...

res.vm-b.vcpu-2
type=both,size=VM-B-RESSIZE,cache-id=0 type=both,size=default-size,cache-id=1 ...

Which is what we want, since the VCPUs are pinned. res.vm-a.vcpu-1 and res.vm-b.vcpu-1 don't need to be assigned to any reservation, which means they'll remain on the default group.

RESTRICTIONS TO THE SYNTAX ABOVE
================================

Rules for the parameters:
* type=code must be paired with type=data entry.

ABOUT THE LIST INTERFACE
========================

About an interface for listing the reservations of the system to OpenStack. I think that what OpenStack needs is to check, before starting a guest on a given host, that there is sufficient space available for the reservation.

To do that, it can:
1) resctrltool list (the end of the output mentions how much free space is available), or via resctrlfs directly (have to lock the filesystem, read each directory AND each schemata, and count the number of zero bits).
2) Via libvirt

Should fix resctrltool/API to list amount of contiguous free space BTW.
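To make the proposed syntax concrete, here is a minimal parsing sketch in Python. It assumes only the rdt_cat_cache_reservation string proposed above; the function names are made up for illustration and are not existing Nova, libvirt or resctrltool code.

    # Minimal sketch: parse the proposed rdt_cat_cache_reservation string into a
    # list of per-entry dicts. Only the syntax proposed above is assumed.

    def parse_cache_reservation(value):
        """Parse e.g. 'type=both,size=10mb,cacheid=2; type=code,size=2mb,cacheid=1'."""
        reservations = []
        for chunk in value.split(';'):
            chunk = chunk.strip()
            if not chunk:
                continue
            entry = {}
            for field in chunk.split(','):
                key, _, val = field.partition('=')
                entry[key.strip()] = val.strip()
            if entry.get('type') not in ('code', 'data', 'both'):
                raise ValueError('type must be code, data or both: %r' % chunk)
            entry['size_kb'] = _size_to_kb(entry.pop('size'))
            if 'cacheid' in entry:              # cache-id is optional
                entry['cacheid'] = int(entry['cacheid'])
            reservations.append(entry)
        return reservations

    def _size_to_kb(text):
        text = text.lower()
        if text.endswith('mb'):
            return int(text[:-2]) * 1024
        if text.endswith('kb'):
            return int(text[:-2])
        return int(text)                        # assume kbytes if no unit given

    print(parse_cache_reservation(
        "type=both,size=10mb,cacheid=2; type=code,size=2mb,cacheid=1"))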

On Tue, Jan 10, 2017 at 02:18:43PM -0200, Marcelo Tosatti wrote:
There have been queries about the OpenStack interface for CAT:
FYI, there's another mail discussing libvirt design here: https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
http://bugzilla.redhat.com/show_bug.cgi?id=1299678
Comment 2 says:
Sahid Ferdjaoui 2016-01-19 10:58:48 EST
A spec will have to be addressed; after a first look, this feature needs some work in several components of Nova to maintain/schedule/consume the host's cache. I can work on that spec and implement it once libvirt provides information about the cache and a feature to use it for guests.
I could add a comment about parameters to resctrltool, but since this depends on the libvirt interface, it would be good to know what the libvirt interface exposes first.
I believe it should be essentially similar to OpenStack's "reserved_host_memory_mb":
Set the reserved_host_memory_mb to reserve RAM for host processes. For the purposes of testing I am going to use the default of 512 MB: reserved_host_memory_mb=512
But rather use:
rdt_cat_cache_reservation=type=code/data/both,size=10mb,cacheid=2; type=code/data/both,size=2mb,cacheid=1;...
(per-vcpu).
Where cache-id is optional.
What is cache-id (from Documentation/x86/intel_rdt_ui.txt on recent kernel sources):
Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2 caches are generally just shared by the hyperthreads on a core, but this isn't an architectural requirement. We could have multiple separate L3 caches on a socket, multiple cores could share an L2 cache. So instead of using "socket" or "core" to define the set of logical cpus sharing a resource we use a "Cache ID". At a given cache level this will be a unique number across the whole system (but it isn't guaranteed to be a contiguous sequence, there may be gaps). To find the ID for each logical CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id
So it seems like cache ID is something we need to add to the XML I proposed at https://www.redhat.com/archives/libvir-list/2017-January/msg00489.html
WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)
==========================================================
For virtualization the following scenario is desired, on a given socket:
* VM-A with VCPUs VM-A.vcpu-1, VM-A.vcpu-2.
* VM-B with VCPUs VM-B.vcpu-1, VM-B.vcpu-2.
With one realtime workload on each vcpu-2.
Assume VM-A.vcpu-2 on pcpu 3. Assume VM-B.vcpu-2 on pcpu 5.
Assume pcpus 0-5 on cacheid 0.
We want VM-A.vcpu-2 to have a certain region of cache reserved, and VM-B.vcpu-2 as well. vcpu-1 for both VMs can use the default group (that is not have reserved L3 cache).
This translates to the following resctrltool-style reservations:
res.vm-a.vcpu-2
type=both,size=VM-A-RESSIZE,cache-id=0
res.vm-b.vcpu-2
type=both,size=VM-B-RESSIZE,cache-id=0
Which translate to the following in resctrlfs:
res.vm-a.vcpu-2
type=both,size=VM-A-RESSIZE,cache-id=0 type=both,size=default-size,cache-id=1 ...
res.vm-b.vcpu-2
type=both,size=VM-B-RESSIZE,cache-id=0 type=both,size=default-size,cache-id=1 ...
Which is what we want, since the VCPUs are pinned.
res.vm-a.vcpu-1 and res.vm-b.vcpu-1 don't need to be assigned to any reservation, which means they'll remain on the default group.
You're showing type=both here which, IIUC, means data and instruction cache. Is that configuring one cache that serves both purposes? Do we need to be able to configure them independently?
RESTRICTIONS TO THE SYNTAX ABOVE
================================
Rules for the parameters:
* type=code must be paired with type=data entry.
What does this mean exactly when configuring guests? Do we have to configure data + instruction cache on the same cache ID, do they have to be the same size, or are they completely independent?
ABOUT THE LIST INTERFACE
========================
About an interface for listing the reservations of the system to OpenStack.
I think that what OpenStack needs is to check, before starting a guest on a given host, that there is sufficient space available for the reservation.
To do that, it can:
1) resctrltool list (the end of the output mentions how much free space is available), or via resctrlfs directly (have to lock the filesystem, read each directory AND each schemata, and count the number of zero bits).
2) Via libvirt
Should fix resctrltool/API to list amount of contiguous free space
OpenStack should just use libvirt APIs exclusively - there should not be any need for it to use other tools if we've designed the libvirt API correctly.
Regards, Daniel

On Wed, Jan 11, 2017 at 10:18:11AM +0000, Daniel P. Berrange wrote:
On Tue, Jan 10, 2017 at 02:18:43PM -0200, Marcelo Tosatti wrote:
There have been queries about the OpenStack interface for CAT:
FYI, there's another mail discussing libvirt design here:
https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
http://bugzilla.redhat.com/show_bug.cgi?id=1299678
Comment 2 says:
Sahid Ferdjaoui 2016-01-19 10:58:48 EST
A spec will have to be addressed; after a first look, this feature needs some work in several components of Nova to maintain/schedule/consume the host's cache. I can work on that spec and implement it once libvirt provides information about the cache and a feature to use it for guests.
I could add a comment about parameters to resctrltool, but since this depends on the libvirt interface, it would be good to know what the libvirt interface exposes first.
I believe it should be essentially similar to OpenStack's "reserved_host_memory_mb":
Set the reserved_host_memory_mb to reserve RAM for host processes. For the purposes of testing I am going to use the default of 512 MB: reserved_host_memory_mb=512
But rather use:
rdt_cat_cache_reservation=type=code/data/both,size=10mb,cacheid=2; type=code/data/both,size=2mb,cacheid=1;...
(per-vcpu).
Where cache-id is optional.
What is cache-id (from Documentation/x86/intel_rdt_ui.txt on recent kernel sources):
Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2 caches are generally just shared by the hyperthreads on a core, but this isn't an architectural requirement. We could have multiple separate L3 caches on a socket, multiple cores could share an L2 cache. So instead of using "socket" or "core" to define the set of logical cpus sharing a resource we use a "Cache ID". At a given cache level this will be a unique number across the whole system (but it isn't guaranteed to be a contiguous sequence, there may be gaps). To find the ID for each logical CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id
So it seems like cache ID is something we need to add to the XML I proposed at
https://www.redhat.com/archives/libvir-list/2017-January/msg00489.html
WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)
==========================================================
For virtualization the following scenario is desired, on a given socket:
* VM-A with VCPUs VM-A.vcpu-1, VM-A.vcpu-2.
* VM-B with VCPUs VM-B.vcpu-1, VM-B.vcpu-2.
With one realtime workload on each vcpu-2.
Assume VM-A.vcpu-2 on pcpu 3. Assume VM-B.vcpu-2 on pcpu 5.
Assume pcpus 0-5 on cacheid 0.
We want VM-A.vcpu-2 to have a certain region of cache reserved, and VM-B.vcpu-2 as well. vcpu-1 for both VMs can use the default group (that is not have reserved L3 cache).
This translates to the following resctrltool-style reservations:
res.vm-a.vcpu-2
type=both,size=VM-A-RESSIZE,cache-id=0
res.vm-b.vcpu-2
type=both,size=VM-B-RESSIZE,cache-id=0
Which translate to the following in resctrlfs:
res.vm-a.vcpu-2
type=both,size=VM-A-RESSIZE,cache-id=0 type=both,size=default-size,cache-id=1 ...
res.vm-b.vcpu-2
type=both,size=VM-B-RESSIZE,cache-id=0 type=both,size=default-size,cache-id=1 ...
Which is what we want, since the VCPUs are pinned.
res.vm-a.vcpu-1 and res.vm-b.vcpu-1 don't need to be assigned to any reservation, which means they'll remain on the default group.
You're showing type=both here which, IIUC, means data and instruction cache.
No, type=both is for non-CDP hosts (data and instruction reservations shared). type=data,type=code is for CDP hosts (data and instruction reservations separate).
Is that configuring one cache that serves both purposes?
Yes.
Do we need to be able to configure them independently?
Yes.
RESTRICTIONS TO THE SYNTAX ABOVE
================================
Rules for the parameters:
* type=code must be paired with type=data entry.
What does this mean exactly when configuring guests? Do we have to configure data + instruction cache on the same cache ID, do they have to be the same size, or are they completely independent?
This means that a user can't specify this reservation:
type=data,size=10mb,cache-id=1
They have to specify _both_ code and data sizes:
type=data,size=10mb,cache-id=1; type=code,size=2mb,cache-id=1
Now a single both reservation is valid:
type=both,size=10mb,cache-id=1
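As an illustration only, a small check for that pairing rule, reusing the dict format from the parsing sketch in the first mail; whether code/data must be paired per cache-id is my reading of the rule, not something stated here.

    # Sketch of the "type=code must be paired with type=data" rule. Assumes
    # reservations are dicts like {'type': 'code', 'cacheid': 1, 'size_kb': 2048}
    # and that the pairing applies per cache-id (an assumption on my part).

    def check_code_data_pairing(reservations):
        types_by_cache = {}
        for res in reservations:
            types_by_cache.setdefault(res.get('cacheid'), set()).add(res['type'])
        for cacheid, types in types_by_cache.items():
            if ('code' in types) != ('data' in types):
                raise ValueError('type=code must be paired with a type=data entry '
                                 '(cache-id %s)' % cacheid)

    # valid: data + code on the same cache-id
    check_code_data_pairing([{'type': 'data', 'cacheid': 1, 'size_kb': 10240},
                             {'type': 'code', 'cacheid': 1, 'size_kb': 2048}])
    # valid: a single type=both entry
    check_code_data_pairing([{'type': 'both', 'cacheid': 1, 'size_kb': 10240}])
    # invalid (raises ValueError): data without a matching code entry
    # check_code_data_pairing([{'type': 'data', 'cacheid': 1, 'size_kb': 10240}])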
ABOUT THE LIST INTERFACE
========================
About an interface for listing the reservations of the system to OpenStack.
I think that what OpenStack needs is to check, before starting a guest on a given host, that there is sufficient space available for the reservation.
To do that, it can:
1) resctrltool list (the end of the output mentions how much free space is available), or via resctrlfs directly (have to lock the filesystem, read each directory AND each schemata, and count the number of zero bits).
2) Via libvirt
Should fix resctrltool/API to list amount of contiguous free space
OpenStack should just use libvirt APIs exclusively - there should not be any need for it to use other tools if we've designed the libvirt API correctly.
Got it.

This translates to the following resctrltool-style reservations:
res.vm-a.vcpu-2
type=both,size=VM-A-RESSIZE,cache-id=0
res.vm-b.vcpu-2
type=both,size=VM-B-RESSIZE,cache-id=0
Which translate to the following in resctrlfs:
res.vm-a.vcpu-2
type=both,size=VM-A-RESSIZE,cache-id=0 type=both,size=default-size,cache-id=1 ...
res.vm-b.vcpu-2
type=both,size=VM-B-RESSIZE,cache-id=0 type=both,size=default-size,cache-id=1 ...
If we specify a cache-id and size without taking the already existing vcpu->pcpu affinity into account, the cache allocation will be useless. Say a VM is pinned to socket 1 (whose cache_id should be 1), but when specifying the cache resource the user specifies cache-id=0. Since the VM won't be scheduled on socket 0, the cache allocated on socket 0 (cache_id=0) will never be used. I suggest letting libvirt detect the cache-id from the existing vcpu->pcpu pinning.
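A rough sketch of that idea, using the sysfs path from the intel_rdt_ui.txt excerpt quoted earlier; the function name is illustrative and this is not proposed libvirt code.

    # Sketch: derive the L3 cache-id for a pinned pcpu from sysfs, so libvirt
    # could check (or fill in) the cache-id instead of trusting the user.

    import glob
    import os

    def l3_cache_id(pcpu):
        base = '/sys/devices/system/cpu/cpu%d/cache' % pcpu
        for index in glob.glob(os.path.join(base, 'index*')):
            with open(os.path.join(index, 'level')) as f:
                if f.read().strip() != '3':
                    continue
            with open(os.path.join(index, 'id')) as f:
                return int(f.read().strip())
        raise RuntimeError('no L3 cache found for pcpu %d' % pcpu)

    # In the example above, VM-A.vcpu-2 is pinned to pcpu 3 and pcpus 0-5 are on
    # cache-id 0, so l3_cache_id(3) would return 0 on such a host.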
Which is what we want, since the VCPUs are pinned.
res.vm-a.vcpu-1 and res.vm-b.vcpu-1 don't need to be assigned to any reservation, which means they'll remain on the default group.
You're showing type=both here which, IIUC, means data and instruction cache.
No, type=both is for non-CDP hosts (data and instruction reservations shared).
type=data,type=code is for CDP hosts (data and instruction reservations separate).
Is that configuring one cache that serves both purposes?
Yes.
Do we need to be able to configure them independently?
Yes.
RESTRICTIONS TO THE SYNTAX ABOVE
================================
Rules for the parameters:
* type=code must be paired with type=data entry.
What does this mean exactly when configuring guests? Do we have to configure data + instruction cache on the same cache ID, do they have to be the same size, or are they completely independent?
This means that a user can't specify this reservation:
type=data,size=10mb,cache-id=1
They have to specify _both_ code and data sizes:
type=data,size=10mb,cache-id=1; type=code,size=2mb,cache-id=1
Now a single both reservation is valid:
type=both,size=10mb,cache-id=1
ABOUT THE LIST INTERFACE
========================
About an interface for listing the reservations of the system to OpenStack.
I think that what OpenStack needs is to check, before starting a guest on a given host, that there is sufficient space available for the reservation.
To do that, it can:
1) resctrltool list (the end of the output mentions how much free space is available), or via resctrlfs directly (have to lock the filesystem, read each directory AND each schemata, and count the number of zero bits).
2) Via libvirt
Should fix resctrltool/API to list amount of contiguous free space
OpenStack should just use libvirt APIs exclusively - there should not be any need for it to use other tools if we've designed the libvirt API correctly.
Got it.
-- Best regards - Eli

On Tue, Jan 10, 2017 at 02:18:41PM -0200, Marcelo Tosatti wrote:
There have been queries about the OpenStack interface for CAT:
http://bugzilla.redhat.com/show_bug.cgi?id=1299678
Comment 2 says:
Sahid Ferdjaoui 2016-01-19 10:58:48 EST
A spec will have to be addressed; after a first look, this feature needs some work in several components of Nova to maintain/schedule/consume the host's cache. I can work on that spec and implement it once libvirt provides information about the cache and a feature to use it for guests.
I could add a comment about parameters to resctrltool, but since this depends on the libvirt interface, it would be good to know what the libvirt interface exposes first.
I believe it should be essentially similar to OpenStack's "reserved_host_memory_mb":
Set the reserved_host_memory_mb to reserve RAM for host processes. For the purposes of testing I am going to use the default of 512 MB: reserved_host_memory_mb=512
But rather use:
rdt_cat_cache_reservation=type=code/data/both,size=10mb,cacheid=2; type=code/data/both,size=2mb,cacheid=1;...
(per-vcpu).
Where cache-id is optional.
What is cache-id (from Documentation/x86/intel_rdt_ui.txt on recent kernel sources):
Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2 caches are generally just shared by the hyperthreads on a core, but this isn't an architectural requirement. We could have multiple separate L3 caches on a socket, multiple cores could share an L2 cache. So instead of using "socket" or "core" to define the set of logical cpus sharing a resource we use a "Cache ID". At a given cache level this will be a unique number across the whole system (but it isn't guaranteed to be a contiguous sequence, there may be gaps). To find the ID for each logical CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id
WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)
==========================================================
For virtualization the following scenario is desired, on a given socket:
* VM-A with VCPUs VM-A.vcpu-1, VM-A.vcpu-2.
* VM-B with VCPUs VM-B.vcpu-1, VM-B.vcpu-2.
With one realtime workload on each vcpu-2.
Assume VM-A.vcpu-2 on pcpu 3. Assume VM-B.vcpu-2 on pcpu 5.
Assume pcpus 0-5 on cacheid 0.
We want VM-A.vcpu-2 to have a certain region of cache reserved, and VM-B.vcpu-2 as well. vcpu-1 for both VMs can use the default group (that is not have reserved L3 cache).
This translates to the following resctrltool-style reservations:
res.vm-a.vcpu-2
type=both,size=VM-A-RESSIZE,cache-id=0
res.vm-b.vcpu-2
type=both,size=VM-B-RESSIZE,cache-id=0
Which translate to the following in resctrlfs:
res.vm-a.vcpu-2
type=both,size=VM-A-RESSIZE,cache-id=0 type=both,size=default-size,cache-id=1 ...
res.vm-b.vcpu-2
type=both,size=VM-B-RESSIZE,cache-id=0 type=both,size=default-size,cache-id=1 ...
Which is what we want, since the VCPUs are pinned.
res.vm-a.vcpu-1 and res.vm-b.vcpu-1 don't need to be assigned to any reservation, which means they'll remain on the default group.
RESTRICTIONS TO THE SYNTAX ABOVE
================================
Rules for the parameters:
* type=code must be paired with type=data entry.
ABOUT THE LIST INTERFACE
========================
About an interface for listing the reservations of the system to OpenStack.
I think that what OpenStack needs is to check, before starting a guest on a given host, that there is sufficient space available for the reservation.
To do that, it can:
1) resctrltool list (the end of the output mentions how much free space is available), or via resctrlfs directly (have to lock the filesystem, read each directory AND each schemata, and count the number of zero bits).
2) Via libvirt
Should fix resctrltool/API to list amount of contiguous free space BTW.
Elements of the libvirt CAT interface:

1) Conversion of kbytes (user specification) --> number of CBM bits for the host.

resctrlfs exposes the CBM bitmask HW format, where every bit indicates a portion of L3 cache. Each bit therefore refers to a number of ways of L3 cache, and thus a number of kbytes. Users measure or determine the CAT size per VM, so the specification should be in kbytes and not in a number of bits on any particular host. If you expose the "schemata" interface to users, they need to convert kbytes --> bits of CBM for that particular host. IMO there is no benefit in exposing this information to higher layers (in fact you only want to think about it when programming the HW interface).

2) Sharing of groups.

It is possible that two groups share a certain portion of cache, that is:

  1 2 3 4 5 6 7 8     (CBM bits)
[ 0 0 1 1 1 1 0 0 ]   process-A
[ 0 0 0 0 1 1 1 1 ]   process-B

In this example, processes A and B share bits 5 and 6 of the CBM mask, which indicate a certain portion of L3 cache. That scheme could be generalized in a format as follows: GroupA.size = X kbytes, GroupB.size = Y kbytes, (GroupA,GroupB) share Z kbytes.

However, for VMs (and even for normal CAT usage), I don't see any use for that configuration, because:
* Determinism is lost: for the shared regions of L3 cache, process-A can reclaim into process-B's L3 cache.
* You have to measure both applications together when determining the shared size.

3) CAT allocation type: both or code/data separation.

Older CAT-enabled processors support a CBM bitmask without separation of code/data, that is, both code and data cachelines can be reclaimed from a given L3 cache reservation. This means that an application with the following pattern:

NR OF ACCESSES | TYPE OF ACCESS
10000          | DATA
100            | CODE
10000          | DATA
100            | CODE

can have a high rate of code cache-misses, even with cache allocation. So newer CAT-enabled processors support CBM bitmask separation, that is: you can reserve a certain portion of L3 cache for code and another portion of L3 cache for data. This is called CDP (Code and Data Prioritization).

Given a {type=code, type=data} reservation request from a user, with different sizes, the host can be:

CDP enabled host: no problem.
Non-CDP enabled host: the reservation can only be shared, which means that a high rate of code or data misses can still be seen.

What is done in resctrlfs, when converting a {type=code, type=data} reservation to type=both, is to create a type=both reservation whose size equals the sum of the type=code and type=data reservations. However, it is useful to expose to OpenStack whether the host is CDP enabled or not (so it can decide whether or not to fail initialization of a VM with a {type=code,type=data} reservation on a non-CDP host).

4) Size of allocatable reservation.

Other than exposing the L3 cache size, exposing the amount of reservable L3 cache is also required to determine eligibility of execution of a VM on a particular host.

Options for the libvirt interface:

OPTION-1: expose the full resctrlfs interface
=============================================
There is no point in having OpenStack perform "1) Conversion of kbytes (user specification) --> number of CBM bits for the host" as detailed above. So we want to expose kbytes to OpenStack.

OPTION-2: expose sharing of groups
==================================
As noted above, sharing of L3 portions by VMs is not beneficial.

OPTION-3: don't expose cbm bits and don't expose sharing of groups
==================================================================
What remains is the "type={both,data,code}, size=X, cache-id=Z" format, along with an interface to expose whether the host is CDP capable, and another to expose the allocatable L3 cache size at that moment.
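For illustration, a minimal sketch of the two calculations behind items 1 and 4 above (kbytes to CBM bits, and the largest contiguous free region). The cache size, CBM width and already-allocated masks are passed in explicitly; how the host reads them (resctrl info files, sysfs) is deliberately left out, and the function names are made up, not resctrltool or libvirt code.

    # Item 1: each CBM bit covers l3_size_kb / cbm_bits kbytes, so a kbytes
    # request becomes a (rounded-up) number of contiguous bits.
    def kbytes_to_cbm_bits(request_kb, l3_size_kb, cbm_bits):
        kb_per_bit = l3_size_kb // cbm_bits
        return -(-request_kb // kb_per_bit)       # ceiling division

    # Item 4: the largest contiguous run of bits not set in any allocated CBM
    # bounds the biggest reservation that can still be granted on that cache-id.
    def largest_free_run(allocated_cbms, cbm_bits):
        used = 0
        for cbm in allocated_cbms:
            used |= cbm
        best = run = 0
        for bit in range(cbm_bits):
            if used & (1 << bit):
                run = 0
            else:
                run += 1
                best = max(best, run)
        return best

    # Example: a 28160 KB L3 with a 20-bit CBM -> 1408 KB per bit, so a 10 MB
    # request needs 8 bits.
    print(kbytes_to_cbm_bits(10 * 1024, 28160, 20))            # -> 8
    # Two groups occupy bits 2-7; bits 8-19 form the largest free run (12 bits).
    print(largest_free_run([0b00111100, 0b11110000], 20))      # -> 12

The point of the sketch is only that exposing kbytes (and the remaining allocatable size) is enough for OpenStack; the CBM bit bookkeeping can stay below the libvirt interface.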

On Wed, Jan 11, 2017 at 08:39:22AM -0200, Marcelo Tosatti wrote:
On Tue, Jan 10, 2017 at 02:18:41PM -0200, Marcelo Tosatti wrote:
There have been queries about the OpenStack interface for CAT:
http://bugzilla.redhat.com/show_bug.cgi?id=1299678
Comment 2 says:
Sahid Ferdjaoui 2016-01-19 10:58:48 EST
A spec will have to be addressed; after a first look, this feature needs some work in several components of Nova to maintain/schedule/consume the host's cache. I can work on that spec and implement it once libvirt provides information about the cache and a feature to use it for guests.
I could add a comment about parameters to resctrltool, but since this depends on the libvirt interface, it would be good to know what the libvirt interface exposes first.
I believe it should be essentially similar to OpenStack's "reserved_host_memory_mb":
Set the reserved_host_memory_mb to reserve RAM for host processes. For the purposes of testing I am going to use the default of 512 MB: reserved_host_memory_mb=512
But rather use:
rdt_cat_cache_reservation=type=code/data/both,size=10mb,cacheid=2; type=code/data/both,size=2mb,cacheid=1;...
(per-vcpu).
Where cache-id is optional.
What is cache-id (from Documentation/x86/intel_rdt_ui.txt on recent kernel sources):
Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2 caches are generally just shared by the hyperthreads on a core, but this isn't an architectural requirement. We could have multiple separate L3 caches on a socket, multiple cores could share an L2 cache. So instead of using "socket" or "core" to define the set of logical cpus sharing a resource we use a "Cache ID". At a given cache level this will be a unique number across the whole system (but it isn't guaranteed to be a contiguous sequence, there may be gaps). To find the ID for each logical CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id
WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)
==========================================================
For virtualization the following scenario is desired, on a given socket:
* VM-A with VCPUs VM-A.vcpu-1, VM-A.vcpu-2.
* VM-B with VCPUs VM-B.vcpu-1, VM-B.vcpu-2.
With one realtime workload on each vcpu-2.
Assume VM-A.vcpu-2 on pcpu 3. Assume VM-B.vcpu-2 on pcpu 5.
Assume pcpus 0-5 on cacheid 0.
We want VM-A.vcpu-2 to have a certain region of cache reserved, and VM-B.vcpu-2 as well. vcpu-1 for both VMs can use the default group (that is not have reserved L3 cache).
This translates to the following resctrltool-style reservations:
res.vm-a.vcpu-2
type=both,size=VM-A-RESSIZE,cache-id=0
res.vm-b.vcpu-2
type=both,size=VM-B-RESSIZE,cache-id=0
Which translate to the following in resctrlfs:
res.vm-a.vcpu-2
type=both,size=VM-A-RESSIZE,cache-id=0 type=both,size=default-size,cache-id=1 ...
res.vm-b.vcpu-2
type=both,size=VM-B-RESSIZE,cache-id=0 type=both,size=default-size,cache-id=1 ...
Which is what we want, since the VCPUs are pinned.
res.vm-a.vcpu-1 and res.vm-b.vcpu-1 don't need to be assigned to any reservation, which means they'll remain on the default group.
RESTRICTIONS TO THE SYNTAX ABOVE
================================
Rules for the parameters:
* type=code must be paired with type=data entry.
ABOUT THE LIST INTERFACE
========================
About an interface for listing the reservations of the system to OpenStack.
I think that what OpenStack needs is to check, before starting a guest on a given host, that there is sufficient space available for the reservation.
To do that, it can:
1) resctrltool list (the end of the output mentions how much free space is available), or via resctrlfs directly (have to lock the filesystem, read each directory AND each schemata, and count the number of zero bits).
2) Via libvirt
Should fix resctrltool/API to list amount of contiguous free space BTW.
Elements of the libvirt CAT interface:
<snip>
Please, let's keep the libvirt API design discussion in one place on this thread - it's too confusing to split it up:
https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
Regards, Daniel
participants (3):
- Daniel P. Berrange
- Marcelo Tosatti
- 乔立勇 (Eli Qiao)