See the attachment, which has two graphs: (1) cache bandwidth and (2) a
blowup of the sustained memory bandwidth region.
- X axis has a log scale
- Light blue line is an older system with 32K L1 and 6M L2 caches
- All other measurements on perf34: 32K L1, 256K L2, 30M L3 caches
- Most of the variation in the L1 cache region comes from the two guest
measurements run without taskset pinning to a VCPU (yellow and maroon
lines). Perhaps this reflects the test bouncing between VCPUs in the
guest.
- The sustained memory bandwidth for the guest with no pinning (maroon
line) is only 80% of native, which motivates more convenient and
comprehensive numactl support for guests.
- Virtualized bandwidth is otherwise nearly in line with native, which
confirms the importance of the virtual CPUID communicating actual native
cache sizes to cache-size-aware guest applications, since guest apps
could benefit from the full size of the native cache. (The guest was
started with "-cpu host", but lscpu in the guest showed a 4M cache
despite the actual 30M cache.)
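
To make the cache-size mismatch in the last point easy to check, here is
a minimal sketch that prints the cache sizes a Linux system reports
through sysfs. Running it on both the host and the guest should show
whether the guest sees the native sizes. It assumes the standard
/sys/devices/system/cpu/cpuN/cache/indexM layout; the function names
(parse_cache_size, read_cache_sizes) are made up for illustration and
are not part of any existing tool.

```python
import glob
import os
import re

# Multipliers for the unit suffixes used in sysfs cache size strings.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_cache_size(text):
    """Convert a size string such as '32K' or '30M' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized cache size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def read_cache_sizes(cpu=0, root="/sys/devices/system/cpu"):
    """Return {(level, type): bytes} for one CPU, or {} if nothing is found.

    Assumes the usual Linux sysfs layout; 'root' is a parameter only so
    the function can be pointed at another directory for testing.
    """
    sizes = {}
    pattern = os.path.join(root, f"cpu{cpu}", "cache", "index*")
    for index_dir in sorted(glob.glob(pattern)):
        try:
            with open(os.path.join(index_dir, "level")) as f:
                level = int(f.read())
            with open(os.path.join(index_dir, "type")) as f:
                kind = f.read().strip()
            with open(os.path.join(index_dir, "size")) as f:
                size = parse_cache_size(f.read())
        except (OSError, ValueError):
            continue  # skip entries that are missing or unreadable
        sizes[(level, kind)] = size
    return sizes


if __name__ == "__main__":
    # Print one line per cache that the kernel reports for CPU 0.
    for (level, kind), size in sorted(read_cache_sizes().items()):
        print(f"L{level} {kind}: {size // 1024} KiB")
```

If the guest's last-level entry is much smaller than the host's (for
example 4M versus 30M), cache-size-aware applications in the guest will
likely tune themselves for the smaller value.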