On Mon, Jun 02, 2025 at 14:30:43 +0200, Hector Cao wrote:
Hello Jiri,
Thanks for the feedback,
On Mon, Jun 2, 2025 at 9:30 AM Jiri Denemark <jdenemar(a)redhat.com> wrote:
> On Mon, Jun 02, 2025 at 01:19:29 +0200, Hector Cao wrote:
> > Several Intel CPU models with TSX technology (HLE & RTM features) are
> > affected by the vulnerability TAA[1]. One of the mitigation methods
> > for TAA is to disable TSX support on the host system. For that purpose,
> > in 2021, Intel published a microcode update to disable TSX. Linux kernel
> > also disables TSX globally by default. Even though TSX can be activated
> > via the kernel command line (tsx=on), many Linux distributions stick with
> > this default behavior and have TSX disabled. This makes existing CPU
> > models that have HLE and RTM enabled not correctly detected by
> > libvirt.
>
> Can you describe the issue in more details? Especially where libvirt
> incorrectly detects CPU models because of this?
>
>
On my platform (Granite Rapids CPU), TSX is disabled by default in the
kernel, so the TSX features rtm and hle are missing. As a consequence,
`virsh capabilities` detects the CPU as the Icelake-Server-noTSX model.

I see, this is what I suspected. The CPU definition provided in host
capabilities is limited: it cannot describe a CPU that lacks some
features of the corresponding CPU model, so a simpler CPU model has to
be shown instead. Thus this information is mostly useless (except for
checking which exact features a host CPU supports) and libvirt itself
does not use it for anything. And since we have a much better way of
describing the host CPU, or rather a CPU that can be provided to a guest
on the host,

    virsh domcapabilities --xpath "//cpu/mode[@name='host-model']"

there's no reason other applications or users should look at the CPU in
virsh capabilities either. It's similar to how the cpu/topology element
in virsh capabilities is useless and should not be used.
So except for not having the right CPU model in the capabilities XML
(which is not a bug, but rather a known limitation), is there any other
issue? I believe the host CPU would be correctly reported as
SapphireRapids/GraniteRapids with both hle and rtm disabled in domain
capabilities XML.
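
For anyone who wants to check this programmatically, here is a minimal
sketch of reading the host-model CPU from the domain capabilities XML.
The sample XML is hand-written for illustration (it is not output from a
real Sapphire Rapids or Granite Rapids host), and the helper name
host_model_summary is hypothetical. It only assumes the usual
<cpu>/<mode name='host-model'> layout of the domain capabilities
document.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only: hand-written in the shape of domain
# capabilities XML, NOT captured from a real host.
SAMPLE_DOMCAPS = """
<domainCapabilities>
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>SapphireRapids</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='ss'/>
      <feature policy='disable' name='hle'/>
      <feature policy='disable' name='rtm'/>
    </mode>
  </cpu>
</domainCapabilities>
"""


def host_model_summary(domcaps_xml):
    """Return (model name, sorted list of disabled features) for host-model.

    Returns (None, []) when the host-model mode is absent or unsupported.
    """
    root = ET.fromstring(domcaps_xml)
    mode = root.find("./cpu/mode[@name='host-model']")
    if mode is None or mode.get("supported") != "yes":
        return None, []
    model = mode.findtext("model")
    disabled = sorted(
        feature.get("name")
        for feature in mode.findall("feature")
        if feature.get("policy") == "disable"
    )
    return model, disabled


if __name__ == "__main__":
    print(host_model_summary(SAMPLE_DOMCAPS))
```

On a real host the input would be the output of `virsh domcapabilities`
rather than the sample string. A result with the expected model name and
hle/rtm in the disabled list would match the behaviour described above.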
> > This commit adds 2 remaining -noTSX models:
> > - SapphireRapids-noTSX
> > - GraniteRapids-noTSX
>
> QEMU switched away from adding suffixes to CPU models and just adds a
> new version for a CPU model in case it needs to be updated. There's no
> point adding these models to libvirt. Any CPU model that would only
> exist in libvirt would not be directly usable anyway and would have to
> be translated to another CPU model.
>
I would be grateful if you could provide some background on the criteria
for adding a new version to an existing model. In the case of Intel, how
do we know that we need to add a new version to the CPU model?

I don't know, you'd need to ask QEMU developers.

Beyond the naming issue (version vs. suffix), I understand that we
stopped doing what we did for older CPU models, as in this commit for
Icelake. Is that correct?
i386: Add -noTSX aliases for hle=off, rtm=off CPU models
https://github.com/qemu/qemu/commit/02fa60d10137ed2ef17534718d7467e0d2170142
This was the original approach for creating modified CPU models that can
be used as-is without having to manually specify a bunch of features. But
when more cases appeared, they realized such an approach didn't scale and
switched to versioned CPU models with -v* suffixes instead.
Do you think that adding a new version of the Sapphire Rapids and Granite
Rapids CPU models, both in QEMU and libvirt, would be a sensible way to
tackle this issue?
Well, you can try asking whether adding such a CPU model in QEMU would
make sense. From libvirt's POV this is just a cosmetic issue, so it's
not worth the effort IMHO.
Jirka