On Fri, Mar 06, 2020 at 09:41:43 +0100, Christian Ehrhardt wrote:
One of the mitigation methods for TAA[1] is to disable TSX
support on the host system. Linux added a mechanism to disable
TSX globally through the kernel command line, and many Linux
distributions now default to tsx=off. This makes existing CPU
models that have HLE and RTM enabled unusable.
Add new versions of all CPU models that have the HLE and RTM
features enabled, with those features removed, so that they can
be used when TSX is disabled on the host system.
On systems that disable these features, if the corresponding
types are not defined in cpu-maps, users end up without any modern
CPU types in the list of usable CPUs reported by, e.g., virsh
domcapabilities or by tools higher in the stack such as virt-manager.
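To make the usability problem concrete, here is a minimal sketch assuming a model counts as usable only when the host provides every feature the model requires. The feature sets below are toy values chosen for illustration and are NOT the real cpu_map contents; the helper name usable_models is hypothetical.

```python
# Toy feature sets (NOT the real cpu_map contents), for illustration only.
TSX = frozenset({"hle", "rtm"})
CASCADELAKE_SERVER = frozenset({"avx2", "avx512f", "avx512vnni", "hle", "rtm"})

MODELS = {
    "Skylake-Server": frozenset({"avx2", "avx512f", "hle", "rtm"}),
    "Cascadelake-Server": CASCADELAKE_SERVER,
    # A -noTSX variant is the same model with hle and rtm removed.
    "Cascadelake-Server-noTSX": CASCADELAKE_SERVER - TSX,
}


def usable_models(host_features, models):
    """Hypothetical helper: a model is usable only if the host
    provides every feature the model requires."""
    return sorted(name for name, feats in models.items() if feats <= host_features)


host_tsx_on = frozenset({"avx2", "avx512f", "avx512vnni", "hle", "rtm"})
host_tsx_off = host_tsx_on - TSX  # e.g. a host booted with tsx=off

print(usable_models(host_tsx_on, MODELS))
print(usable_models(host_tsx_off, MODELS))  # ['Cascadelake-Server-noTSX']
```

With tsx=off, every model that still lists hle and rtm drops out. Without the -noTSX entry the second list would be empty, which is the situation described above.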
This adds:
- Cascadelake-Server-noTSX
- Icelake-Client-noTSX
- Icelake-Server-noTSX
- Skylake-Server-noTSX-IBRS
- Skylake-Client-noTSX-IBRS
Originally, I thought we should just ignore these new CPU models.
After all, there was a consensus that the -IBRS models should never
have existed, and no new suffixes were introduced for other
vulnerabilities either.
However, noTSX is different. Mitigating a CPU vulnerability usually
involves adding a new CPU feature that needs to be passed to a guest,
and it is perfectly fine to keep using an existing model and just
enable the new feature on top of it (either manually or automatically).
But noTSX is about removing existing features. While an existing model
can still be used when the hle and rtm features are explicitly
disabled, the model itself is not directly usable on a host that masks
TSX at the host level. Domains with host-model CPUs will work just
fine, but other use cases will break because several CPU models will
suddenly be marked as unusable in domain capabilities.
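For reference, "using an existing model with hle and rtm explicitly disabled" corresponds to a custom-mode CPU definition in the domain XML. The sketch below builds such a fragment with Python's standard xml.etree.ElementTree. The function name cpu_without_tsx is hypothetical. The match="exact" and fallback="forbid" choices are illustrative, not required.

```python
import xml.etree.ElementTree as ET


def cpu_without_tsx(model):
    """Hypothetical helper: build a libvirt <cpu> fragment requesting
    `model` with the TSX features (hle, rtm) explicitly disabled."""
    cpu = ET.Element("cpu", mode="custom", match="exact")
    # fallback="forbid": do not silently substitute a different model.
    ET.SubElement(cpu, "model", fallback="forbid").text = model
    for feature in ("hle", "rtm"):
        ET.SubElement(cpu, "feature", policy="disable", name=feature)
    return ET.tostring(cpu, encoding="unicode")


print(cpu_without_tsx("Cascadelake-Server"))
```

The resulting <cpu> element would go inside a domain definition. It works on a tsx=off host without any new model, which is why the -noTSX variants are not strictly needed for existing domains.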
So I changed my mind, and I think we should add all these noTSX
variants. But for better compatibility with existing libvirt releases,
we should make sure libvirt never uses these new models automatically,
i.e., as a host-model CPU, because we can express the same CPU in a
compatible way by taking the original model and disabling hle and rtm.
Of course, the new CPU models would still be advertised as supported
and usable in domain capabilities, and users could request them
explicitly. I guess we could do this by adding a flag to the CPU model
XML and checking for it in the CPU model detection code.
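The proposed flag could behave roughly as in the sketch below. It rests on several assumptions. The flag name skip_for_host_model is hypothetical, and the feature sets are toy values. The "fewest feature adjustments" scoring is a deliberate simplification, NOT libvirt's actual detection algorithm. The flagged model is still reported as usable, but host-model selection skips it and falls back to the original model with hle and rtm disabled.

```python
from dataclasses import dataclass

# Toy feature sets (NOT the real cpu_map contents), for illustration only.
TSX = frozenset({"hle", "rtm"})
CASCADELAKE = frozenset({"avx2", "avx512f", "avx512vnni", "hle", "rtm"})


@dataclass(frozen=True)
class CPUModel:
    name: str
    features: frozenset
    # Hypothetical flag: the model is advertised and may be requested
    # explicitly, but is never picked automatically for host-model.
    skip_for_host_model: bool = False


MODELS = [
    CPUModel("Skylake-Server", frozenset({"avx2", "avx512f", "hle", "rtm"})),
    CPUModel("Cascadelake-Server", CASCADELAKE),
    CPUModel("Cascadelake-Server-noTSX", CASCADELAKE - TSX, skip_for_host_model=True),
]


def usable(host, models):
    """Every model whose features the host provides, flagged or not."""
    return [m.name for m in models if m.features <= host]


def host_model(host, models):
    """Pick the unflagged model needing the fewest feature adjustments.

    Returns (model name, features to disable, features to add), or None
    if every model is flagged."""
    best = None
    for m in models:
        if m.skip_for_host_model:
            continue
        disable = sorted(m.features - host)
        add = sorted(host - m.features)
        cost = len(disable) + len(add)
        if best is None or cost < best[0]:
            best = (cost, m.name, disable, add)
    return None if best is None else best[1:]


host = CASCADELAKE - TSX  # a tsx=off host
print(usable(host, MODELS))      # ['Cascadelake-Server-noTSX']
print(host_model(host, MODELS))  # ('Cascadelake-Server', ['hle', 'rtm'], [])
```

Dropping the flag from the -noTSX entry would make host_model return it with no adjustments. That mirrors the intermediate step in the patch ordering, where both the host and guest CPUs change.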
In addition to this, we should add new cputest data for a CPU with TSX
disabled. I already have the test locally, based on the CPU data you
gave me on IRC, and I'll send the patch shortly. I imagine the test
should go first (showing the wrong CPU model being used as the host
CPU), followed by the patch adding the new noTSX models (both the host
and guest CPUs should change in the test), and finally a patch that
ignores the new models for host-model (the guest CPU should change back
to the original model without -noTSX).
Jirka