[libvirt] [V3] RFC for support cache tune in libvirt

Add support for cache allocation.

Thanks Martin for the previous version comments; this is the v3 of the RFC, and I have some PoC code [2]. The following changes are partly implemented in the PoC.

# Proposed Changes

## virsh command line

1. Extend the output of nodeinfo to expose the L3 (last level) cache size.

This exposes how much cache a host has available for use.

root@s2600wt:~/linux# virsh nodeinfo | grep L3
L3 cache size:       56320 KiB

2. Extend the capabilities output.

virsh capabilities | grep resctrl
  <cpu>
    ...
    <resctrl name='L3' unit='KiB' cache_size='56320' cache_unit='2816'/>
  </cpu>

This tells us that the host has resctrl enabled (you can find it in /sys/fs/resctrl), that it supports allocating 'L3' cache, that the total 'L3' cache size is 56320 KiB, and that the minimum allocation unit of 'L3' cache is 2816 KiB.

P.S. The L3 cache unit is the minimum amount of L3 cache that can be allocated. It is hardware-dependent and cannot be changed.

3. Add a new virsh command 'nodecachestats'. This API exposes how much cache resource is left on each piece of hardware (CPU socket).

It will be formatted as:

<resource_type>.<resource_id>: left size KiB

For example, on a 2-socket host with only the cat_l3 feature enabled:

root@s2600wt:~/linux# virsh nodecachestats
L3.0 : 56320 KiB
L3.1 : 56320 KiB

P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now.

4. Add a new interface to manage how much cache can be allocated to a domain.

root@s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
root@s2600wt:~/linux# virsh cachetune kvm02
l3.count       : 2

This allocates 2 units (2816 KiB * 2) of L3 cache for domain kvm02.

## Domain XML changes

Cache tuning:

<domain>
  ...
  <cachetune>
    <l3_cache_count>2</l3_cache_count>
  </cachetune>
  ...
</domain>

## Restriction for using cache tune on a multi-socket host

The L3 cache is a per-socket resource, and the kernel needs to know what the affinity looks like. So a VM running on a multi-socket host should have a NUMA setting or a vcpu pinning setting; otherwise cache tuning will fail.

[1] kernel support
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/tree/arch/x86/kerne...

[2] libvirt PoC (not finished yet)
https://github.com/taget/libvirt/commits/cat_new

Best Regards,
Eli Qiao (乔立勇), OpenStack Core team, OTC Intel.
--
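For readers unfamiliar with the kernel interface this RFC builds on, here is a rough sketch of driving the resctrl filesystem by hand; the group name, PID and bitmask values are made up for illustration:

  # mount the resource control filesystem (needs a CAT-capable CPU and kernel)
  mount -t resctrl resctrl /sys/fs/resctrl

  # each subdirectory is an allocation group; the kernel creates the control files inside it
  mkdir /sys/fs/resctrl/vm-kvm02

  # reserve two ways of the L3 cache on cache id 0, leave cache id 1 unrestricted
  # (the bitmask width comes from /sys/fs/resctrl/info/L3/cbm_mask)
  echo "L3:0=3;1=fffff" > /sys/fs/resctrl/vm-kvm02/schemata

  # associate the vCPU thread PIDs with the group
  echo 12345 > /sys/fs/resctrl/vm-kvm02/tasks

Libvirt would perform equivalent operations behind the virsh commands proposed above.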

On Tue, Jan 10, 2017 at 07:42:59AM +0000, Qiao, Liyong wrote:
Add support for cache allocation.
Thanks Martin for the previous version comments; this is the v3 of the RFC, and I have some PoC code [2]. The following changes are partly implemented in the PoC.
# Proposed Changes
## virsh command line
1. Extend the output of nodeinfo to expose the L3 (last level) cache size.
This exposes how much cache a host has available for use.
root@s2600wt:~/linux# virsh nodeinfo | grep L3
L3 cache size:       56320 KiB
Ok, as previously discussed, we should include this in the capabilities XML instead and have info about all the caches. We likely also want to relate which CPUs are associated with which cache in some way. E.g. if we have this topology:

<topology>
  <cells num='2'>
    <cell id='0'>
      <cpus num='6'>
        <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
        <cpu id='1' socket_id='0' core_id='2' siblings='1'/>
        <cpu id='2' socket_id='0' core_id='4' siblings='2'/>
        <cpu id='6' socket_id='0' core_id='1' siblings='6'/>
        <cpu id='7' socket_id='0' core_id='3' siblings='7'/>
        <cpu id='8' socket_id='0' core_id='5' siblings='8'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='6'>
        <cpu id='3' socket_id='1' core_id='0' siblings='3'/>
        <cpu id='4' socket_id='1' core_id='2' siblings='4'/>
        <cpu id='5' socket_id='1' core_id='4' siblings='5'/>
        <cpu id='9' socket_id='1' core_id='1' siblings='9'/>
        <cpu id='10' socket_id='1' core_id='3' siblings='10'/>
        <cpu id='11' socket_id='1' core_id='5' siblings='11'/>
      </cpus>
    </cell>
  </cells>
</topology>

we might have something like this cache info:

<cache>
  <bank type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8"/>
  <bank type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11"/>
  <bank type="l2" size="256" units="KiB" cpus="0"/>
  <bank type="l2" size="256" units="KiB" cpus="1"/>
  <bank type="l2" size="256" units="KiB" cpus="2"/>
  <bank type="l2" size="256" units="KiB" cpus="3"/>
  <bank type="l2" size="256" units="KiB" cpus="4"/>
  <bank type="l2" size="256" units="KiB" cpus="5"/>
  <bank type="l2" size="256" units="KiB" cpus="6"/>
  <bank type="l2" size="256" units="KiB" cpus="7"/>
  <bank type="l2" size="256" units="KiB" cpus="8"/>
  <bank type="l2" size="256" units="KiB" cpus="9"/>
  <bank type="l2" size="256" units="KiB" cpus="10"/>
  <bank type="l2" size="256" units="KiB" cpus="11"/>
  <bank type="l1i" size="256" units="KiB" cpus="0"/>
  <bank type="l1i" size="256" units="KiB" cpus="1"/>
  <bank type="l1i" size="256" units="KiB" cpus="2"/>
  <bank type="l1i" size="256" units="KiB" cpus="3"/>
  <bank type="l1i" size="256" units="KiB" cpus="4"/>
  <bank type="l1i" size="256" units="KiB" cpus="5"/>
  <bank type="l1i" size="256" units="KiB" cpus="6"/>
  <bank type="l1i" size="256" units="KiB" cpus="7"/>
  <bank type="l1i" size="256" units="KiB" cpus="8"/>
  <bank type="l1i" size="256" units="KiB" cpus="9"/>
  <bank type="l1i" size="256" units="KiB" cpus="10"/>
  <bank type="l1i" size="256" units="KiB" cpus="11"/>
  <bank type="l1d" size="256" units="KiB" cpus="0"/>
  <bank type="l1d" size="256" units="KiB" cpus="1"/>
  <bank type="l1d" size="256" units="KiB" cpus="2"/>
  <bank type="l1d" size="256" units="KiB" cpus="3"/>
  <bank type="l1d" size="256" units="KiB" cpus="4"/>
  <bank type="l1d" size="256" units="KiB" cpus="5"/>
  <bank type="l1d" size="256" units="KiB" cpus="6"/>
  <bank type="l1d" size="256" units="KiB" cpus="7"/>
  <bank type="l1d" size="256" units="KiB" cpus="8"/>
  <bank type="l1d" size="256" units="KiB" cpus="9"/>
  <bank type="l1d" size="256" units="KiB" cpus="10"/>
  <bank type="l1d" size="256" units="KiB" cpus="11"/>
</cache>

which shows each socket has its own dedicated L3 cache, and each core has its own L2 & L1 cache.
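For reference, the data needed for such a <cache> listing is already exposed by the kernel in sysfs; the paths below are standard, while the output values are illustrative for the host in this example:

  # level, type, size and sharing of the last-level cache as seen from CPU 0
  cat /sys/devices/system/cpu/cpu0/cache/index3/level            # 3
  cat /sys/devices/system/cpu/cpu0/cache/index3/type             # Unified
  cat /sys/devices/system/cpu/cpu0/cache/index3/size             # 56320K
  cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list  # 0-2,6-8

  # the lower index numbers are the per-core L1/L2 caches
  cat /sys/devices/system/cpu/cpu0/cache/index0/type             # Data
  cat /sys/devices/system/cpu/cpu0/cache/index1/type             # Instruction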
2. Extend capabilities outputs.
virsh capabilities | grep resctrl
  <cpu>
    ...
    <resctrl name='L3' unit='KiB' cache_size='56320' cache_unit='2816'/>
  </cpu>
This tells us that the host has resctrl enabled (you can find it in /sys/fs/resctrl), that it supports allocating 'L3' cache, that the total 'L3' cache size is 56320 KiB, and that the minimum allocation unit of 'L3' cache is 2816 KiB.
P.S. The L3 cache unit is the minimum amount of L3 cache that can be allocated. It is hardware-dependent and cannot be changed.
If we're already reporting cache in the capabilities from step one, then it ought to be extendable to cover this reporting:

<cache>
  <bank type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8">
    <control unit="KiB" min="2816"/>
  </bank>
  <bank type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11">
    <control unit="KiB" min="2816"/>
  </bank>
</cache>

Note how we report the control info for both L3 caches, since they come from separate sockets and thus could conceivably report different info if different CPUs were in each socket.
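For what it's worth, the unit/min values in the <control> element could be derived from the resctrl info directory; the numbers below are illustrative and assume a 20-bit capacity bitmask:

  cat /sys/fs/resctrl/info/L3/cbm_mask      # fffff  -> 20 bits in the capacity bitmask
  cat /sys/fs/resctrl/info/L3/min_cbm_bits  # 1      -> smallest allowed allocation, in bits
  cat /sys/fs/resctrl/info/L3/num_closids   # 16     -> how many allocation groups fit

  # one bit then corresponds to 56320 KiB / 20 = 2816 KiB, which is where
  # a min="2816" value would come from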
3. Add a new virsh command 'nodecachestats'. This API exposes how much cache resource is left on each piece of hardware (CPU socket).
It will be formatted as:
<resource_type>.<resource_id>: left size KiB
For example, on a 2-socket host with only the cat_l3 feature enabled:
root@s2600wt:~/linux# virsh nodecachestats
L3.0 : 56320 KiB
L3.1 : 56320 KiB
P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now.
This feels like something we should have in the capabilities XML too, rather than a new command:

<cache>
  <bank type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8">
    <control unit="KiB" min="2816" avail="56320"/>
  </bank>
  <bank type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11">
    <control unit="KiB" min="2816" avail="56320"/>
  </bank>
</cache>
4. Add a new interface to manage how much cache can be allocated to a domain.
root@s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
root@s2600wt:~/linux# virsh cachetune kvm02
l3.count       : 2
This allocates 2 units (2816 KiB * 2) of L3 cache for domain kvm02.
## Domain XML changes
Cache tuning:
<domain>
  ...
  <cachetune>
    <l3_cache_count>2</l3_cache_count>
  </cachetune>
  ...
</domain>
IIUC, the kernel lets us associate individual PIDs with each cache. Since each vCPU is a PID, this means we are able to allocate different cache sizes to different CPUs, so we need to be able to represent that in the XML. I think we should also represent the allocation as a normal size (i.e. KiB), not as a count of the minimum unit.

So e.g. this shows allocating two cache banks and giving one to the first 4 cpus, and one to the second 4 cpus:

<cachetune>
  <bank type="l3" size="5632" unit="KiB" cpus="0,1,2,3"/>
  <bank type="l3" size="5632" unit="KiB" cpus="4,5,6,7"/>
</cachetune>

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
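Just to make the mapping concrete, a 5632 KiB allocation for a group of vCPUs would presumably end up as resctrl operations along these lines; the group name, PIDs and bit placement are invented for illustration, and real code would have to pick non-conflicting bits per cache ID:

  # 5632 KiB / 2816 KiB per bit = 2 bits of the L3 capacity bitmask
  mkdir /sys/fs/resctrl/libvirt-kvm02-vcpus0-3
  echo "L3:0=3;1=fffff" > /sys/fs/resctrl/libvirt-kvm02-vcpus0-3/schemata

  # move the vCPU 0-3 thread PIDs into the group
  for pid in 12345 12346 12347 12348; do
      echo $pid > /sys/fs/resctrl/libvirt-kvm02-vcpus0-3/tasks
  done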

On Wed, Jan 11, 2017 at 10:05:26AM +0000, Daniel P. Berrange wrote:
On Tue, Jan 10, 2017 at 07:42:59AM +0000, Qiao, Liyong wrote:
Add support for cache allocation.
Thanks Martin for the previous version comments; this is the v3 of the RFC, and I have some PoC code [2]. The following changes are partly implemented in the PoC.
# Proposed Changes
## virsh command line
1. Extend the output of nodeinfo to expose the L3 (last level) cache size.
This exposes how much cache a host has available for use.
root@s2600wt:~/linux# virsh nodeinfo | grep L3
L3 cache size:       56320 KiB
Ok, as previously discussed, we should include this in the capabilities XML instead and have info about all the caches. We likely also want to relate which CPUs are associated with which cache in some way.
eg if we have this topology
<topology>
  <cells num='2'>
    <cell id='0'>
      <cpus num='6'>
        <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
        <cpu id='1' socket_id='0' core_id='2' siblings='1'/>
        <cpu id='2' socket_id='0' core_id='4' siblings='2'/>
        <cpu id='6' socket_id='0' core_id='1' siblings='6'/>
        <cpu id='7' socket_id='0' core_id='3' siblings='7'/>
        <cpu id='8' socket_id='0' core_id='5' siblings='8'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='6'>
        <cpu id='3' socket_id='1' core_id='0' siblings='3'/>
        <cpu id='4' socket_id='1' core_id='2' siblings='4'/>
        <cpu id='5' socket_id='1' core_id='4' siblings='5'/>
        <cpu id='9' socket_id='1' core_id='1' siblings='9'/>
        <cpu id='10' socket_id='1' core_id='3' siblings='10'/>
        <cpu id='11' socket_id='1' core_id='5' siblings='11'/>
      </cpus>
    </cell>
  </cells>
</topology>
We might have something like this cache info
<cache>
  <bank type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8"/>
  <bank type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11"/>
  <bank type="l2" size="256" units="KiB" cpus="0"/>
  <bank type="l2" size="256" units="KiB" cpus="1"/>
  <bank type="l2" size="256" units="KiB" cpus="2"/>
  <bank type="l2" size="256" units="KiB" cpus="3"/>
  <bank type="l2" size="256" units="KiB" cpus="4"/>
  <bank type="l2" size="256" units="KiB" cpus="5"/>
  <bank type="l2" size="256" units="KiB" cpus="6"/>
  <bank type="l2" size="256" units="KiB" cpus="7"/>
  <bank type="l2" size="256" units="KiB" cpus="8"/>
  <bank type="l2" size="256" units="KiB" cpus="9"/>
  <bank type="l2" size="256" units="KiB" cpus="10"/>
  <bank type="l2" size="256" units="KiB" cpus="11"/>
  <bank type="l1i" size="256" units="KiB" cpus="0"/>
  <bank type="l1i" size="256" units="KiB" cpus="1"/>
  <bank type="l1i" size="256" units="KiB" cpus="2"/>
  <bank type="l1i" size="256" units="KiB" cpus="3"/>
  <bank type="l1i" size="256" units="KiB" cpus="4"/>
  <bank type="l1i" size="256" units="KiB" cpus="5"/>
  <bank type="l1i" size="256" units="KiB" cpus="6"/>
  <bank type="l1i" size="256" units="KiB" cpus="7"/>
  <bank type="l1i" size="256" units="KiB" cpus="8"/>
  <bank type="l1i" size="256" units="KiB" cpus="9"/>
  <bank type="l1i" size="256" units="KiB" cpus="10"/>
  <bank type="l1i" size="256" units="KiB" cpus="11"/>
  <bank type="l1d" size="256" units="KiB" cpus="0"/>
  <bank type="l1d" size="256" units="KiB" cpus="1"/>
  <bank type="l1d" size="256" units="KiB" cpus="2"/>
  <bank type="l1d" size="256" units="KiB" cpus="3"/>
  <bank type="l1d" size="256" units="KiB" cpus="4"/>
  <bank type="l1d" size="256" units="KiB" cpus="5"/>
  <bank type="l1d" size="256" units="KiB" cpus="6"/>
  <bank type="l1d" size="256" units="KiB" cpus="7"/>
  <bank type="l1d" size="256" units="KiB" cpus="8"/>
  <bank type="l1d" size="256" units="KiB" cpus="9"/>
  <bank type="l1d" size="256" units="KiB" cpus="10"/>
  <bank type="l1d" size="256" units="KiB" cpus="11"/>
</cache>
which shows each socket has its own dedicated L3 cache, and each core has its own L2 & L1 cache.
2. Extend capabilities outputs.
virsh capabilities | grep resctrl
  <cpu>
    ...
    <resctrl name='L3' unit='KiB' cache_size='56320' cache_unit='2816'/>
  </cpu>
This tells us that the host has resctrl enabled (you can find it in /sys/fs/resctrl), that it supports allocating 'L3' cache, that the total 'L3' cache size is 56320 KiB, and that the minimum allocation unit of 'L3' cache is 2816 KiB.
P.S. The L3 cache unit is the minimum amount of L3 cache that can be allocated. It is hardware-dependent and cannot be changed.
If we're already reporting cache in the capabilities from step one, then it ought to be extendable to cover this reporting:

<cache>
  <bank type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8">
    <control unit="KiB" min="2816"/>
  </bank>
  <bank type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11">
    <control unit="KiB" min="2816"/>
  </bank>
</cache>
note how we report the control info for both l3 caches, since they come from separate sockets and thus could conceivably report different info if different CPUs were in each socket.
3. Add a new virsh command 'nodecachestats'. This API exposes how much cache resource is left on each piece of hardware (CPU socket).
It will be formatted as:
<resource_type>.<resource_id>: left size KiB
For example, on a 2-socket host with only the cat_l3 feature enabled:
root@s2600wt:~/linux# virsh nodecachestats
L3.0 : 56320 KiB
L3.1 : 56320 KiB
P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now.
This feels like something we should have in the capabilities XML too rather than a new command
<cache>
  <bank type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8">
    <control unit="KiB" min="2816" avail="56320"/>
  </bank>
  <bank type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11">
    <control unit="KiB" min="2816" avail="56320"/>
  </bank>
</cache>
4. Add a new interface to manage how much cache can be allocated to a domain.
root@s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
root@s2600wt:~/linux# virsh cachetune kvm02
l3.count       : 2
This allocates 2 units (2816 KiB * 2) of L3 cache for domain kvm02.
## Domain XML changes
Cache tuning:
<domain>
  ...
  <cachetune>
    <l3_cache_count>2</l3_cache_count>
  </cachetune>
  ...
</domain>
IIUC, the kernel lets us associate individual PIDs with each cache. Since each vCPU is a PID, this means we are able to allocate different cache size to different CPUs. So we need to be able to represent that in the XML. I think we should also represent the allocation in a normal size (ie KiB), not in count of min unit.
So eg this shows allocating two cache banks and giving one to the first 4 cpus, and one to the second 4 cpus
<cachetune>
  <bank type="l3" size="5632" unit="KiB" cpus="0,1,2,3"/>
  <bank type="l3" size="5632" unit="KiB" cpus="4,5,6,7"/>
</cachetune>

I agree with your approach; we just need to keep in mind two more things: I/O threads and the main QEMU (emulator) thread can have allocations as well. Also, we need to say on which socket the allocation should be done.
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On Wed, Jan 11, 2017 at 11:55:28AM +0100, Martin Kletzander wrote:
On Wed, Jan 11, 2017 at 10:05:26AM +0000, Daniel P. Berrange wrote:
IIUC, the kernel lets us associate individual PIDs with each cache. Since each vCPU is a PID, this means we are able to allocate different cache size to different CPUs. So we need to be able to represent that in the XML. I think we should also represent the allocation in a normal size (ie KiB), not in count of min unit.
So eg this shows allocating two cache banks and giving one to the first 4 cpus, and one to the second 4 cpus
<cachetune>
  <bank type="l3" size="5632" unit="KiB" cpus="0,1,2,3"/>
  <bank type="l3" size="5632" unit="KiB" cpus="4,5,6,7"/>
</cachetune>

I agree with your approach; we just need to keep in mind two more things: I/O threads and the main QEMU (emulator) thread can have allocations as well. Also, we need to say on which socket the allocation should be done.

Also, I wonder if this is better put in the existing <cputune> element, since this is really an aspect of the CPU configuration. Perhaps split the configuration of cache banks from the mapping to cpus/iothreads/emulator. Also, per Marcello's mail, we need to include the host cache ID, so we know where to allocate from if there are multiple caches of the same type. So the XML could look more like this:

<cputune>
  <cache id="1" host_id="2" type="l3" size="5632" unit="KiB"/>
  <cache id="2" host_id="4" type="l3" size="5632" unit="KiB"/>
  <cpu_cache vcpus="0-3" id="1"/>
  <cpu_cache vcpus="4-7" id="2"/>
  <iothread_cache iothreads="0-1" id="1"/>
  <emulator_cache id="2"/>
</cputune>

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

2017-01-11 19:09 GMT+08:00 Daniel P. Berrange <berrange@redhat.com>:
On Wed, Jan 11, 2017 at 11:55:28AM +0100, Martin Kletzander wrote:
On Wed, Jan 11, 2017 at 10:05:26AM +0000, Daniel P. Berrange wrote:
IIUC, the kernel lets us associate individual PIDs with each cache. Since each vCPU is a PID, this means we are able to allocate different cache size to different CPUs. So we need to be able to represent that in the XML. I think we should also represent the allocation in a normal size (ie KiB), not in count of min unit.
So eg this shows allocating two cache banks and giving one to the first 4 cpus, and one to the second 4 cpus
<cachetune>
  <bank type="l3" size="5632" unit="KiB" cpus="0,1,2,3"/>
  <bank type="l3" size="5632" unit="KiB" cpus="4,5,6,7"/>
</cachetune>

I agree with your approach; we just need to keep in mind two more things: I/O threads and the main QEMU (emulator) thread can have allocations as well. Also, we need to say on which socket the allocation should be done.
Also, I wonder if this is better put in the existing <cputune> element, since this is really an aspect of the CPU configuration.
Perhaps split configuration of cache banks from the mapping to cpus/iothreads/emulator. Also, per Marcello's mail, we need to include the host cache ID, so we know where to allocate from if there's multiple caches of the same type. So XML could look more like this:
<cputune>
  <cache id="1" host_id="2" type="l3" size="5632" unit="KiB"/>
  <cache id="2" host_id="4" type="l3" size="5632" unit="KiB"/>

I don't think we require host_id here. We could allow setting cache allocation only if the VM has a vcpu -> pcpu affinity setting, and let libvirt calculate where to set the cache (i.e. on which cache_id / resource_id / socket_id; the three ids mean the same thing). Since L3 cache is a CPU resource, only a VM running on the specific CPUs can benefit from the cache. If we explicitly allocate cache without regard for the VM's pcpu affinity, it does not help.

  <cpu_cache vcpus="0-3" id="1"/>
  <cpu_cache vcpus="4-7" id="2"/>
  <iothread_cache iothreads="0-1" id="1"/>
  <emulator_cache id="2"/>
</cputune>
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
-- Best regards - Eli 天涯无处不重逢 a leaf duckweed belongs to the sea , where not to meet in life

On Thu, Jan 12, 2017 at 10:58:53AM +0800, 乔立勇(Eli Qiao) wrote:
2017-01-11 19:09 GMT+08:00 Daniel P. Berrange <berrange@redhat.com>:
On Wed, Jan 11, 2017 at 11:55:28AM +0100, Martin Kletzander wrote:
On Wed, Jan 11, 2017 at 10:05:26AM +0000, Daniel P. Berrange wrote:
IIUC, the kernel lets us associate individual PIDs with each cache. Since each vCPU is a PID, this means we are able to allocate different cache size to different CPUs. So we need to be able to represent that in the XML. I think we should also represent the allocation in a normal size (ie KiB), not in count of min unit.
So eg this shows allocating two cache banks and giving one to the first 4 cpus, and one to the second 4 cpus
<cachetune>
  <bank type="l3" size="5632" unit="KiB" cpus="0,1,2,3"/>
  <bank type="l3" size="5632" unit="KiB" cpus="4,5,6,7"/>
</cachetune>

I agree with your approach; we just need to keep in mind two more things: I/O threads and the main QEMU (emulator) thread can have allocations as well. Also, we need to say on which socket the allocation should be done.
Also, I wonder if this is better put in the existing <cputune> element, since this is really an aspect of the CPU configuration.
Perhaps split configuration of cache banks from the mapping to cpus/iothreads/emulator. Also, per Marcello's mail, we need to include the host cache ID, so we know where to allocate from if there's multiple caches of the same type. So XML could look more like this:
<cputune>
  <cache id="1" host_id="2" type="l3" size="5632" unit="KiB"/>
  <cache id="2" host_id="4" type="l3" size="5632" unit="KiB"/>

I don't think we require host_id here. We could allow setting cache allocation only if the VM has a vcpu -> pcpu affinity setting, and let libvirt calculate where to set the cache (i.e. on which cache_id / resource_id / socket_id; the three ids mean the same thing). Since L3 cache is a CPU resource, only a VM running on the specific CPUs can benefit from the cache.

If we explicitly allocate cache without regard for the VM's pcpu affinity, it does not help.
One thing we need to decide upfront is whether we are going to be fixing user misconfiguration, and to what extent, because I feel like there's too much discussion about that. So either:

a) We make sure that each thread that utilizes CAT is pinned to host threads without split cache, i.e. it cannot be scheduled outside of those. I'm not using socket/core/thread and L3 here because we need to be prepared in case any other cache hierarchy is used.

b) We let the user specify whatever they want.

Option (a) requires more code, more work, and must be checked on all changes (vcpupin API, XML change, CPU hotplug, etc.), but option (b) goes more with the rest of libvirt's config, where we just let users shoot themselves in the foot by misconfiguration, i.e. if someone wants to allocate cache on socket 0 and schedule all CPUs on socket 1, then it's their fault.

Option (a) can save us some specification in the XML, because we can compute some of the values. However, that might not be very reliable, and we might end up requiring all the values to be specified in the end anyway.

So from my point of view, I'd rather go with (b), just so we don't swamp ourselves with the details; also, we can add the checks later. And most importantly, as mentioned before, it goes with the rest of the code.
<cpu_cache vcpus="0-3" id="1"/>
<cpu_cache vcpus="4-7" id="2"/> <iothread_cache iothreads="0-1" id="1"/> <emulator_cache id="2"/> </cputune>
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
-- Best regards - Eli
天涯无处不重逢 a leaf duckweed belongs to the sea , where not to meet in life

On Thu, Jan 12, 2017 at 10:58:53AM +0800, 乔立勇(Eli Qiao) wrote:
2017-01-11 19:09 GMT+08:00 Daniel P. Berrange <berrange@redhat.com>:
On Wed, Jan 11, 2017 at 11:55:28AM +0100, Martin Kletzander wrote:
On Wed, Jan 11, 2017 at 10:05:26AM +0000, Daniel P. Berrange wrote:
IIUC, the kernel lets us associate individual PIDs with each cache. Since each vCPU is a PID, this means we are able to allocate different cache size to different CPUs. So we need to be able to represent that in the XML. I think we should also represent the allocation in a normal size (ie KiB), not in count of min unit.
So eg this shows allocating two cache banks and giving one to the first 4 cpus, and one to the second 4 cpus
<cachetune>
  <bank type="l3" size="5632" unit="KiB" cpus="0,1,2,3"/>
  <bank type="l3" size="5632" unit="KiB" cpus="4,5,6,7"/>
</cachetune>

I agree with your approach; we just need to keep in mind two more things: I/O threads and the main QEMU (emulator) thread can have allocations as well. Also, we need to say on which socket the allocation should be done.
Also, I wonder if this is better put in the existing <cputune> element, since this is really an aspect of the CPU configuration.
Perhaps split configuration of cache banks from the mapping to cpus/iothreads/emulator. Also, per Marcello's mail, we need to include the host cache ID, so we know where to allocate from if there's multiple caches of the same type. So XML could look more like this:
<cputune>
  <cache id="1" host_id="2" type="l3" size="5632" unit="KiB"/>
  <cache id="2" host_id="4" type="l3" size="5632" unit="KiB"/>

I don't think we require host_id here. We could allow setting cache allocation only if the VM has a vcpu -> pcpu affinity setting, and let libvirt calculate where to set the cache (i.e. on which cache_id / resource_id / socket_id; the three ids mean the same thing). Since L3 cache is a CPU resource, only a VM running on the specific CPUs can benefit from the cache.

Let's say the guest is pinned to CPU 3, and there are two separate L3 caches associated with CPU 3. If we don't include host_id, then libvirt has to decide which of the two possible caches to allocate from. We can do that, but generally we've tried to avoid such policy decisions in libvirt before, hence I thought it preferable to have the admin be explicit about which cache they want.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

more like this:
<cputune>
  <cache id="1" host_id="2" type="l3" size="5632" unit="KiB"/>
  <cache id="2" host_id="4" type="l3" size="5632" unit="KiB"/>

If so, do we need to extend "virsh cputune", or add a new API like cachetune?

  <cpu_cache vcpus="0-3" id="1"/>
  <cpu_cache vcpus="4-7" id="2"/>
  <iothread_cache iothreads="0-1" id="1"/>
  <emulator_cache id="2"/>
</cputune>
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
-- Best regards - Eli 天涯无处不重逢 a leaf duckweed belongs to the sea , where not to meet in life

On Thu, Jan 12, 2017 at 11:19:07AM +0800, 乔立勇(Eli Qiao) wrote:
more like this:
<cputune>
  <cache id="1" host_id="2" type="l3" size="5632" unit="KiB"/>
  <cache id="2" host_id="4" type="l3" size="5632" unit="KiB"/>

If so, do we need to extend "virsh cputune", or add a new API like cachetune?
Yeah, sure, that's a detail to be done after the design is done.
<cpu_cache vcpus="0-3" id="1"/> <cpu_cache vcpus="4-7" id="2"/> <iothread_cache iothreads="0-1" id="1"/> <emulator_cache id="2"/> </cputune>
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
-- Best regards - Eli
天涯无处不重逢 a leaf duckweed belongs to the sea , where not to meet in life

On Wed, Jan 11, 2017 at 10:05:26AM +0000, Daniel P. Berrange wrote:
On Tue, Jan 10, 2017 at 07:42:59AM +0000, Qiao, Liyong wrote:
Add support for cache allocation.
Thanks Martin for the previous version comments; this is the v3 of the RFC, and I have some PoC code [2]. The following changes are partly implemented in the PoC.
# Proposed Changes
## virsh command line
1. Extend the output of nodeinfo to expose the L3 (last level) cache size.
This exposes how much cache a host has available for use.
root@s2600wt:~/linux# virsh nodeinfo | grep L3
L3 cache size:       56320 KiB
Ok, as previously discussed, we should include this in the capabilities XML instead and have info about all the caches. We likely also want to relate which CPUs are associated with which cache in some way.
eg if we have this topology
<topology>
  <cells num='2'>
    <cell id='0'>
      <cpus num='6'>
        <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
        <cpu id='1' socket_id='0' core_id='2' siblings='1'/>
        <cpu id='2' socket_id='0' core_id='4' siblings='2'/>
        <cpu id='6' socket_id='0' core_id='1' siblings='6'/>
        <cpu id='7' socket_id='0' core_id='3' siblings='7'/>
        <cpu id='8' socket_id='0' core_id='5' siblings='8'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='6'>
        <cpu id='3' socket_id='1' core_id='0' siblings='3'/>
        <cpu id='4' socket_id='1' core_id='2' siblings='4'/>
        <cpu id='5' socket_id='1' core_id='4' siblings='5'/>
        <cpu id='9' socket_id='1' core_id='1' siblings='9'/>
        <cpu id='10' socket_id='1' core_id='3' siblings='10'/>
        <cpu id='11' socket_id='1' core_id='5' siblings='11'/>
      </cpus>
    </cell>
  </cells>
</topology>
We might have something like this cache info
<cache>
  <bank type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8"/>
  <bank type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11"/>
  <bank type="l2" size="256" units="KiB" cpus="0"/>
  <bank type="l2" size="256" units="KiB" cpus="1"/>
  <bank type="l2" size="256" units="KiB" cpus="2"/>
  <bank type="l2" size="256" units="KiB" cpus="3"/>
  <bank type="l2" size="256" units="KiB" cpus="4"/>
  <bank type="l2" size="256" units="KiB" cpus="5"/>
  <bank type="l2" size="256" units="KiB" cpus="6"/>
  <bank type="l2" size="256" units="KiB" cpus="7"/>
  <bank type="l2" size="256" units="KiB" cpus="8"/>
  <bank type="l2" size="256" units="KiB" cpus="9"/>
  <bank type="l2" size="256" units="KiB" cpus="10"/>
  <bank type="l2" size="256" units="KiB" cpus="11"/>
  <bank type="l1i" size="256" units="KiB" cpus="0"/>
  <bank type="l1i" size="256" units="KiB" cpus="1"/>
  <bank type="l1i" size="256" units="KiB" cpus="2"/>
  <bank type="l1i" size="256" units="KiB" cpus="3"/>
  <bank type="l1i" size="256" units="KiB" cpus="4"/>
  <bank type="l1i" size="256" units="KiB" cpus="5"/>
  <bank type="l1i" size="256" units="KiB" cpus="6"/>
  <bank type="l1i" size="256" units="KiB" cpus="7"/>
  <bank type="l1i" size="256" units="KiB" cpus="8"/>
  <bank type="l1i" size="256" units="KiB" cpus="9"/>
  <bank type="l1i" size="256" units="KiB" cpus="10"/>
  <bank type="l1i" size="256" units="KiB" cpus="11"/>
  <bank type="l1d" size="256" units="KiB" cpus="0"/>
  <bank type="l1d" size="256" units="KiB" cpus="1"/>
  <bank type="l1d" size="256" units="KiB" cpus="2"/>
  <bank type="l1d" size="256" units="KiB" cpus="3"/>
  <bank type="l1d" size="256" units="KiB" cpus="4"/>
  <bank type="l1d" size="256" units="KiB" cpus="5"/>
  <bank type="l1d" size="256" units="KiB" cpus="6"/>
  <bank type="l1d" size="256" units="KiB" cpus="7"/>
  <bank type="l1d" size="256" units="KiB" cpus="8"/>
  <bank type="l1d" size="256" units="KiB" cpus="9"/>
  <bank type="l1d" size="256" units="KiB" cpus="10"/>
  <bank type="l1d" size="256" units="KiB" cpus="11"/>
</cache>
which shows each socket has its own dedicated L3 cache, and each core has its own L2 & L1 cache.
We need to also include the host cache ID value in the XML, to let us reliably distinguish / associate with different cache banks when placing guests, if there are multiple caches of the same type associated with the same CPU.

<cache>
  <bank id="0" type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8"/>
  <bank id="1" type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8"/>
  <bank id="2" type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11"/>
  <bank id="3" type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11"/>
  <bank id="4" type="l2" size="256" units="KiB" cpus="0"/>
  ....
</cache>
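A note on where such a host cache ID could come from: recent kernels expose a per-cache 'id' attribute in sysfs, which is the same identifier that resctrl uses in schemata lines; the values shown are illustrative:

  cat /sys/devices/system/cpu/cpu0/cache/index3/id   # 0
  cat /sys/devices/system/cpu/cpu3/cache/index3/id   # 1

  # these ids match the "<id>=<mask>" entries in schemata files, e.g. L3:0=fffff;1=fffff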
3. Add a new virsh command 'nodecachestats'. This API exposes how much cache resource is left on each piece of hardware (CPU socket).
It will be formatted as:
<resource_type>.<resource_id>: left size KiB
For example, on a 2-socket host with only the cat_l3 feature enabled:
root@s2600wt:~/linux# virsh nodecachestats
L3.0 : 56320 KiB
L3.1 : 56320 KiB
P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now.
This feels like something we should have in the capabilities XML too rather than a new command
<cache>
  <bank type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8">
    <control unit="KiB" min="2816" avail="56320"/>
  </bank>
  <bank type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11">
    <control unit="KiB" min="2816" avail="56320"/>
  </bank>
</cache>
Oops, ignore this. I remember now: the reason we always report available resources separately from physically present resources is that we don't want to re-generate the capabilities XML every time the available resource changes.

So, yes, we do need some API like virNodeFreeCache() / virsh nodefreecache.

We probably want to use a 2D array of typed parameters. The first level of the array would represent the cache bank, and the second level would represent the parameters for that bank. E.g. if we had 3 cache banks, we'd report a 3x3 typed parameter array, with parameters for the cache ID, its type and the available / free size:

id=0 type=l3 avail=56320
id=1 type=l3 avail=56320
id=2 type=l3 avail=56320

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
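As a rough illustration of where such free-size numbers could come from (this is only one possible heuristic, and the values shown are illustrative): the total number of capacity bits is in the resctrl info directory, the bits already claimed by each resource group are visible in its schemata file, and a management layer could subtract the bits reserved by non-default groups from the total:

  cat /sys/fs/resctrl/info/L3/cbm_mask      # fffff -> 20 bits, i.e. 56320 KiB in total
  grep 'L3:' /sys/fs/resctrl/*/schemata     # per-group masks, e.g. .../vm1/schemata: L3:0=3;1=fffff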

which shows each socket has its own dedicated L3 cache, and each core has its own L2 & L1 cache.
We need to also include the host cache ID value in the XML, to let us reliably distinguish / associate with different cache banks when placing guests, if there are multiple caches of the same type associated with the same CPU.

<cache>
  <bank id="0" type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8"/>
  <bank id="1" type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8"/>
  <bank id="2" type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11"/>
  <bank id="3" type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11"/>
  <bank id="4" type="l2" size="256" units="KiB" cpus="0"/>
  ....
</cache>
3. Add a new virsh command 'nodecachestats'. This API exposes how much cache resource is left on each piece of hardware (CPU socket).
It will be formatted as:
<resource_type>.<resource_id>: left size KiB
For example, on a 2-socket host with only the cat_l3 feature enabled:
root@s2600wt:~/linux# virsh nodecachestats
L3.0 : 56320 KiB
L3.1 : 56320 KiB
P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now.
This feels like something we should have in the capabilities XML too rather than a new command
<cache>
  <bank type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8">
    <control unit="KiB" min="2816" avail="56320"/>
  </bank>
  <bank type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11">
    <control unit="KiB" min="2816" avail="56320"/>
  </bank>
</cache>
Oops, ignore this. I remember now: the reason we always report available resources separately from physically present resources is that we don't want to re-generate the capabilities XML every time the available resource changes.
So, yes, we do need some API like virNodeFreeCache() / virsh nodefreecache.
yes, we need this.
We probably want to use a 2D array of typed parameters. The first level of the array would represent the cache bank, and the second level would represent the parameters for that bank. E.g. if we had 3 cache banks, we'd report a 3x3 typed parameter array, with parameters for the cache ID, its type and the available / free size:
id=0 type=l3 avail=56320
id=1 type=l3 avail=56320
id=2 type=l3 avail=56320
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
-- Best regards - Eli 天涯无处不重逢 a leaf duckweed belongs to the sea , where not to meet in life

<cache>
  <bank type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8"/>
  <bank type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11"/>

Yes, I like this too; it conveys the resource-sharing logic via the cpus attribute.

Another thought: if the kernel enables CDP, it will split the L3 cache into code / data types:

<cache>
  <bank type="l3code" size="28160" units="KiB" cpus="0,2,3,6,7,8"/>
  <bank type="l3data" size="28160" units="KiB" cpus="3,4,5,9,10,11"/>

So this information should come not only from /sys/devices/system/cpu/cpu0/cache/index3/size, but also depend on whether Linux resctrl is available under /sys/fs/resctrl/.
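For completeness, the code/data split mentioned here corresponds to mounting resctrl with CDP enabled, in which case the schemata file carries separate L3CODE and L3DATA lines (values illustrative):

  mount -t resctrl -o cdp resctrl /sys/fs/resctrl
  cat /sys/fs/resctrl/schemata
  #   L3DATA:0=fffff;1=fffff
  #   L3CODE:0=fffff;1=fffff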
<bank type="l2" size="256" units="KiB" cpus="0"/>
I think SMT is not enabled on your system; on a system with SMT enabled we would have:

<bank type="l2" size="256" units="KiB" cpus="0,44"/>

<bank type="l2" size="256" units="KiB" cpus="1"/>
<bank type="l2" size="256" units="KiB" cpus="2"/>
<bank type="l2" size="256" units="KiB" cpus="3"/>
<bank type="l2" size="256" units="KiB" cpus="4"/>
<bank type="l2" size="256" units="KiB" cpus="5"/>
<bank type="l2" size="256" units="KiB" cpus="6"/>
<bank type="l2" size="256" units="KiB" cpus="7"/>
<bank type="l2" size="256" units="KiB" cpus="8"/>
<bank type="l2" size="256" units="KiB" cpus="9"/>
<bank type="l2" size="256" units="KiB" cpus="10"/>
<bank type="l2" size="256" units="KiB" cpus="11"/>
<bank type="l1i" size="256" units="KiB" cpus="0"/>
<bank type="l1i" size="256" units="KiB" cpus="1"/>
<bank type="l1i" size="256" units="KiB" cpus="2"/>
<bank type="l1i" size="256" units="KiB" cpus="3"/>
<bank type="l1i" size="256" units="KiB" cpus="4"/>
<bank type="l1i" size="256" units="KiB" cpus="5"/>
<bank type="l1i" size="256" units="KiB" cpus="6"/>
<bank type="l1i" size="256" units="KiB" cpus="7"/>
<bank type="l1i" size="256" units="KiB" cpus="8"/>
<bank type="l1i" size="256" units="KiB" cpus="9"/>
<bank type="l1i" size="256" units="KiB" cpus="10"/>
<bank type="l1i" size="256" units="KiB" cpus="11"/>
<bank type="l1d" size="256" units="KiB" cpus="0"/>
<bank type="l1d" size="256" units="KiB" cpus="1"/>
<bank type="l1d" size="256" units="KiB" cpus="2"/>
<bank type="l1d" size="256" units="KiB" cpus="3"/>
<bank type="l1d" size="256" units="KiB" cpus="4"/>
<bank type="l1d" size="256" units="KiB" cpus="5"/>
<bank type="l1d" size="256" units="KiB" cpus="6"/>
<bank type="l1d" size="256" units="KiB" cpus="7"/>
<bank type="l1d" size="256" units="KiB" cpus="8"/>
<bank type="l1d" size="256" units="KiB" cpus="9"/>
<bank type="l1d" size="256" units="KiB" cpus="10"/>
<bank type="l1d" size="256" units="KiB" cpus="11"/>
</cache>

Hmm... L2 and L1 caches are per core; I am not sure we really need to tune the L2 and L1 cache at all, that's too low level.

Per my understanding, if we expose this kind of capability we should support managing it; I just wonder whether it is too early to expose it, since the low level (Linux kernel) does not support it yet.
which shows each socket has its own dedicated L3 cache, and each core has its own L2 & L1 cache.
2. Extend capabilities outputs.
virsh capabilities | grep resctrl
  <cpu>
    ...
    <resctrl name='L3' unit='KiB' cache_size='56320' cache_unit='2816'/>
  </cpu>
This tells us that the host has resctrl enabled (you can find it in /sys/fs/resctrl), that it supports allocating 'L3' cache, that the total 'L3' cache size is 56320 KiB, and that the minimum allocation unit of 'L3' cache is 2816 KiB.
P.S. The L3 cache unit is the minimum amount of L3 cache that can be allocated. It is hardware-dependent and cannot be changed.
If we're already reporting cache in the capabilities from step one, then it ought to be extendable to cover this reporting:
<cache>
  <bank type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8">
    <control unit="KiB" min="2816"/>
  </bank>
  <bank type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11">
    <control unit="KiB" min="2816"/>
  </bank>
</cache>
Looks good to me.
note how we report the control info for both l3 caches, since they come from separate sockets and thus could conceivably report different info if different CPUs were in each socket.
3. Add a new virsh command 'nodecachestats'. This API exposes how much cache resource is left on each piece of hardware (CPU socket).
It will be formatted as:
<resource_type>.<resource_id>: left size KiB
For example, on a 2-socket host with only the cat_l3 feature enabled:
root@s2600wt:~/linux# virsh nodecachestats
L3.0 : 56320 KiB
L3.1 : 56320 KiB
P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now.
This feels like something we should have in the capabilities XML too, rather than a new command:
<cache>
  <bank type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8">
    <control unit="KiB" min="2816" avail="56320"/>
  </bank>
  <bank type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11">
    <control unit="KiB" min="2816" avail="56320"/>
  </bank>
</cache>
4. Add a new interface to manage how much cache can be allocated to a domain.
root@s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
root@s2600wt:~/linux# virsh cachetune kvm02
l3.count       : 2
This allocates 2 units (2816 KiB * 2) of L3 cache for domain kvm02.
## Domain XML changes
Cache tuning:
<domain>
  ...
  <cachetune>
    <l3_cache_count>2</l3_cache_count>
  </cachetune>
  ...
</domain>
IIUC, the kernel lets us associate individual PIDs with each cache. Since each vCPU is a PID, this means we are able to allocate different cache size to different CPUs. So we need to be able to represent that in the XML. I think we should also represent the allocation in a normal size (ie KiB), not in count of min unit.
ok
So eg this shows allocating two cache banks and giving one to the first 4 cpus, and one to the second 4 cpus
<cachetune>
  <bank type="l3" size="5632" unit="KiB" cpus="0,1,2,3"/>
  <bank type="l3" size="5632" unit="KiB" cpus="4,5,6,7"/>

Oh, that depends on the CPU topology, so I don't like adding cpus="0,1,2,3" here; we cannot guarantee that the VM will run on CPUs 0 1 2 3, so they may not benefit from the cache bank.
</cachetune>
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
-- Best regards - Eli 天涯无处不重逢 a leaf duckweed belongs to the sea , where not to meet in life

On Thu, Jan 12, 2017 at 11:15:39AM +0800, 乔立勇(Eli Qiao) wrote:
<cache>
  <bank type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8"/>
  <bank type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11"/>

Yes, I like this too; it conveys the resource-sharing logic via the cpus attribute.

Another thought: if the kernel enables CDP, it will split the L3 cache into code / data types:

<cache>
  <bank type="l3code" size="28160" units="KiB" cpus="0,2,3,6,7,8"/>
  <bank type="l3data" size="28160" units="KiB" cpus="3,4,5,9,10,11"/>

So this information should come not only from /sys/devices/system/cpu/cpu0/cache/index3/size, but also depend on whether Linux resctrl is available under /sys/fs/resctrl/.
<bank type="l2" size="256" units="KiB" cpus="0"/>
I think SMT is not enabled on your system; on a system with SMT enabled we would have:

<bank type="l2" size="256" units="KiB" cpus="0,44"/>
<bank type="l2" size="256" units="KiB" cpus="1"/>
<bank type="l2" size="256" units="KiB" cpus="2"/> <bank type="l2" size="256" units="KiB" cpus="3"/> <bank type="l2" size="256" units="KiB" cpus="4"/> <bank type="l2" size="256" units="KiB" cpus="5"/> <bank type="l2" size="256" units="KiB" cpus="6"/> <bank type="l2" size="256" units="KiB" cpus="7"/> <bank type="l2" size="256" units="KiB" cpus="8"/> <bank type="l2" size="256" units="KiB" cpus="9"/> <bank type="l2" size="256" units="KiB" cpus="10"/> <bank type="l2" size="256" units="KiB" cpus="11"/> <bank type="l1i" size="256" units="KiB" cpus="0"/> <bank type="l1i" size="256" units="KiB" cpus="1"/> <bank type="l1i" size="256" units="KiB" cpus="2"/> <bank type="l1i" size="256" units="KiB" cpus="3"/> <bank type="l1i" size="256" units="KiB" cpus="4"/> <bank type="l1i" size="256" units="KiB" cpus="5"/> <bank type="l1i" size="256" units="KiB" cpus="6"/> <bank type="l1i" size="256" units="KiB" cpus="7"/> <bank type="l1i" size="256" units="KiB" cpus="8"/> <bank type="l1i" size="256" units="KiB" cpus="9"/> <bank type="l1i" size="256" units="KiB" cpus="10"/> <bank type="l1i" size="256" units="KiB" cpus="11"/> <bank type="l1d" size="256" units="KiB" cpus="0"/> <bank type="l1d" size="256" units="KiB" cpus="1"/> <bank type="l1d" size="256" units="KiB" cpus="2"/> <bank type="l1d" size="256" units="KiB" cpus="3"/> <bank type="l1d" size="256" units="KiB" cpus="4"/> <bank type="l1d" size="256" units="KiB" cpus="5"/> <bank type="l1d" size="256" units="KiB" cpus="6"/> <bank type="l1d" size="256" units="KiB" cpus="7"/> <bank type="l1d" size="256" units="KiB" cpus="8"/> <bank type="l1d" size="256" units="KiB" cpus="9"/> <bank type="l1d" size="256" units="KiB" cpus="10"/> <bank type="l1d" size="256" units="KiB" cpus="11"/> </cache>
Hmm... L2 and L1 caches are per core; I am not sure we really need to tune the L2 and L1 cache at all, that's too low level.
Per my understanding, if we expose this kind of capability we should support managing it; I just wonder whether it is too early to expose it, since the low level (Linux kernel) does not support it yet.
We don't need to list l2/l1 cache in the XML right now. The example above shows that the schema is capable of supporting it in the future, which is the important thing. So we can start with only reporting L3, and add l2/l1 later if we find it is needed, without having to change the XML again.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On Thu, Jan 12, 2017 at 09:20:30AM +0000, Daniel P. Berrange wrote:
On Thu, Jan 12, 2017 at 11:15:39AM +0800, 乔立勇(Eli Qiao) wrote:
<cache>
  <bank type="l3" size="56320" units="KiB" cpus="0,2,3,6,7,8"/>
  <bank type="l3" size="56320" units="KiB" cpus="3,4,5,9,10,11"/>

Yes, I like this too; it conveys the resource-sharing logic via the cpus attribute.

Another thought: if the kernel enables CDP, it will split the L3 cache into code / data types:

<cache>
  <bank type="l3code" size="28160" units="KiB" cpus="0,2,3,6,7,8"/>
  <bank type="l3data" size="28160" units="KiB" cpus="3,4,5,9,10,11"/>

So this information should come not only from /sys/devices/system/cpu/cpu0/cache/index3/size, but also depend on whether Linux resctrl is available under /sys/fs/resctrl/.
<bank type="l2" size="256" units="KiB" cpus="0"/>
I think SMT is not enabled on your system; on a system with SMT enabled we would have:

<bank type="l2" size="256" units="KiB" cpus="0,44"/>
<bank type="l2" size="256" units="KiB" cpus="1"/>
<bank type="l2" size="256" units="KiB" cpus="2"/> <bank type="l2" size="256" units="KiB" cpus="3"/> <bank type="l2" size="256" units="KiB" cpus="4"/> <bank type="l2" size="256" units="KiB" cpus="5"/> <bank type="l2" size="256" units="KiB" cpus="6"/> <bank type="l2" size="256" units="KiB" cpus="7"/> <bank type="l2" size="256" units="KiB" cpus="8"/> <bank type="l2" size="256" units="KiB" cpus="9"/> <bank type="l2" size="256" units="KiB" cpus="10"/> <bank type="l2" size="256" units="KiB" cpus="11"/> <bank type="l1i" size="256" units="KiB" cpus="0"/> <bank type="l1i" size="256" units="KiB" cpus="1"/> <bank type="l1i" size="256" units="KiB" cpus="2"/> <bank type="l1i" size="256" units="KiB" cpus="3"/> <bank type="l1i" size="256" units="KiB" cpus="4"/> <bank type="l1i" size="256" units="KiB" cpus="5"/> <bank type="l1i" size="256" units="KiB" cpus="6"/> <bank type="l1i" size="256" units="KiB" cpus="7"/> <bank type="l1i" size="256" units="KiB" cpus="8"/> <bank type="l1i" size="256" units="KiB" cpus="9"/> <bank type="l1i" size="256" units="KiB" cpus="10"/> <bank type="l1i" size="256" units="KiB" cpus="11"/> <bank type="l1d" size="256" units="KiB" cpus="0"/> <bank type="l1d" size="256" units="KiB" cpus="1"/> <bank type="l1d" size="256" units="KiB" cpus="2"/> <bank type="l1d" size="256" units="KiB" cpus="3"/> <bank type="l1d" size="256" units="KiB" cpus="4"/> <bank type="l1d" size="256" units="KiB" cpus="5"/> <bank type="l1d" size="256" units="KiB" cpus="6"/> <bank type="l1d" size="256" units="KiB" cpus="7"/> <bank type="l1d" size="256" units="KiB" cpus="8"/> <bank type="l1d" size="256" units="KiB" cpus="9"/> <bank type="l1d" size="256" units="KiB" cpus="10"/> <bank type="l1d" size="256" units="KiB" cpus="11"/> </cache>
Hmm... L2 and L1 caches are per core; I am not sure we really need to tune the L2 and L1 cache at all, that's too low level.
Per my understanding, if we expose this kind of capability we should support managing it; I just wonder whether it is too early to expose it, since the low level (Linux kernel) does not support it yet.
We don't need to list l2/l1 cache in the XML right now. The example above shows that the schema is capable of supporting it in the future, which is the important thing. So we can start with only reporting L3, and add l2/l1 later if we find it is needed, without having to change the XML again.
Another idea of mine was to expose only those caches that the host supports allocation on (i.e. a capability a client can use). But that could feel messy in the end. Just a thought.
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
participants (4): Daniel P. Berrange; Martin Kletzander; Qiao, Liyong; 乔立勇 (Eli Qiao)