
See the attachment with two graphs: (1) cache bandwidth, and (2) a blowup of the sustained memory bandwidth region.

- The X axis has a log scale.
- The light blue line is an older system with 32K L1 and 6M L2 caches.
- All other measurements are on perf34: 32K L1, 256K L2, 30M L3 caches.
- The majority of the variation in the L1 cache region comes from the two guest measurements done without taskset pinning to a VCPU (yellow and maroon lines). Perhaps this reflects the test bouncing between VCPUs in the guest.
- The sustained memory bandwidth for the guest with no pinning is only 80% of native (maroon line), which motivates more convenient and comprehensive numactl support for guests.
- Virtualized bandwidth is otherwise nearly in line with native, which confirms the importance of the virtual CPUID communicating the actual native cache sizes to cache-size-aware guest applications, since guest apps could benefit from the full size of the native cache. (The guest was started with "-cpu host", but lscpu in the guest showed a 4M cache despite the actual 30M cache.)
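To make the cache-size mismatch easy to check, here is a minimal Python sketch (not part of the original measurements) that reads the cache sizes the kernel exports under /sys/devices/system/cpu/cpuN/cache, which should roughly correspond to what lscpu reports. Running it on both host and guest would show whether the guest sees the native sizes. The function names (parse_cache_size, read_cache_sizes, pin_to_cpu) are made up for illustration. pin_to_cpu is a rough in-process stand-in for "taskset -c" and only pins within the guest; it says nothing about where the host schedules the VCPU.

```python
import glob
import os

# Multipliers for the unit suffixes used in sysfs cache size strings.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_cache_size(text):
    """Convert a sysfs-style size string such as '32K' or '4M' to bytes."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def read_cache_sizes(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return {(level, type): size_in_bytes} for one CPU as the kernel sees it."""
    sizes = {}
    pattern = os.path.join(sysfs_root, "cpu%d" % cpu, "cache", "index*")
    for index_dir in sorted(glob.glob(pattern)):
        try:
            with open(os.path.join(index_dir, "level")) as f:
                level = int(f.read())
            with open(os.path.join(index_dir, "type")) as f:
                kind = f.read().strip()
            with open(os.path.join(index_dir, "size")) as f:
                size = parse_cache_size(f.read())
        except (OSError, ValueError):
            continue  # entry missing or unreadable; skip it
        sizes[(level, kind)] = size
    return sizes


def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU (Linux only)."""
    os.sched_setaffinity(0, {cpu})


if __name__ == "__main__":
    # Print one line per cache level/type found for CPU 0.
    for (level, kind), size in sorted(read_cache_sizes().items()):
        print("L%d %-12s %8d KiB" % (level, kind, size // 1024))
```

Comparing the output in the guest against the host is one way to spot a case like the 4M versus 30M discrepancy noted above. Calling pin_to_cpu(0) before starting a benchmark loop is one way to rule out VCPU bouncing as a source of variation.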