On Mon, Jun 2, 2025 at 3:23 PM Jiří Denemark <jdenemar(a)redhat.com> wrote:
On Mon, Jun 02, 2025 at 14:30:43 +0200, Hector Cao wrote:
> Hello Jiri,
>
> Thanks for the feedback,
>
> On Mon, Jun 2, 2025 at 9:30 AM Jiri Denemark <jdenemar(a)redhat.com> wrote:
>
> > On Mon, Jun 02, 2025 at 01:19:29 +0200, Hector Cao wrote:
> > > Several Intel CPU models with TSX technology (HLE & RTM features) are
> > > affected by the TAA vulnerability [1]. One of the mitigation methods
> > > for TAA is to disable TSX support on the host system. For that purpose,
> > > in 2021, Intel published a microcode update to disable TSX. The Linux
> > > kernel also disables TSX globally by default. Even though TSX can be
> > > activated via the kernel command line (tsx=on), many Linux distributions
> > > stick with this default behavior and have TSX disabled. As a result,
> > > libvirt does not correctly detect existing CPU models that have HLE
> > > and RTM enabled.
> >
> > Can you describe the issue in more detail? Especially where libvirt
> > incorrectly detects CPU models because of this?
> >
> >
> On my platform (Granite Rapids CPU), TSX is disabled by default in the
> kernel, so the TSX features rtm and hle are missing. As a consequence,
> `virsh capabilities` detects the CPU as the Icelake-Server-noTSX model.
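
To confirm the host-side state described above, here is a minimal Python sketch. It assumes a Linux host where /proc/cpuinfo reports CPU feature flags on a "flags" line. The function names tsx_flags and read_host_tsx_flags are illustrative only and not part of libvirt or any other tool.

```python
def tsx_flags(cpuinfo_text):
    """Report whether the TSX flags (hle, rtm) appear in cpuinfo text.

    Only the first 'flags' line is inspected; this assumes all CPUs
    on the host expose the same feature set.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())
            return {name: name in flags for name in ("hle", "rtm")}
    # No 'flags' line found: treat both features as absent.
    return {"hle": False, "rtm": False}


def read_host_tsx_flags(path="/proc/cpuinfo"):
    """Read the host's cpuinfo file (Linux only) and report TSX flags."""
    with open(path, encoding="utf-8") as handle:
        return tsx_flags(handle.read())
```

On a host booted with the kernel default, both values would be expected to be False, which matches the missing rtm and hle features mentioned above.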
I see, I was thinking this was the case. The CPU definition provided in
host capabilities is limited and cannot cover CPUs that lack some
features compared to the corresponding CPU model, so a simpler CPU model
has to be shown instead. Thus this information is mostly useless (except
for checking what exact features a host CPU supports) and it's not used
for anything by libvirt itself. And since we have a much better way of
describing the host CPU, or rather a CPU that can be provided to a guest
on the host
(virsh domcapabilities --xpath "//cpu/mode[@name='host-model']"),
there's no reason other applications or users should look at the CPU in
virsh capabilities either. It's similar to how the cpu/topology element
in virsh capabilities is useless and should not be used.

So except for not having the right CPU model in the capabilities XML
(which is not a bug, but rather a known limitation), is there any other
issue? I believe the host CPU would be correctly reported as
SapphireRapids/GraniteRapids with both hle and rtm disabled in domain
capabilities XML.
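
As an illustration of reading the host-model CPU from domain capabilities instead of from virsh capabilities, here is a minimal Python sketch. The SAMPLE_DOMCAPS document is hand-written for illustration and is not output captured from a real host. The helper names host_model_summary and query_domcapabilities are hypothetical. The sketch assumes the usual domain capabilities layout, where features turned off for the host appear as feature elements with policy='disable'.

```python
import subprocess
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample; NOT captured from a real host.
SAMPLE_DOMCAPS = """
<domainCapabilities>
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>GraniteRapids</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='vmx'/>
      <feature policy='disable' name='rtm'/>
      <feature policy='disable' name='hle'/>
    </mode>
  </cpu>
</domainCapabilities>
"""


def host_model_summary(domcaps_xml):
    """Return (model name, sorted disabled features) for host-model mode.

    Returns None when the host-model mode is absent or unsupported.
    """
    root = ET.fromstring(domcaps_xml)
    mode = root.find("./cpu/mode[@name='host-model']")
    if mode is None or mode.get("supported") != "yes":
        return None
    model = mode.findtext("model")
    disabled = sorted(
        feature.get("name")
        for feature in mode.findall("feature")
        if feature.get("policy") == "disable"
    )
    return model, disabled


def query_domcapabilities():
    """Run `virsh domcapabilities` and return its XML output as text."""
    result = subprocess.run(
        ["virsh", "domcapabilities"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


print(host_model_summary(SAMPLE_DOMCAPS))
```

On a real host, host_model_summary(query_domcapabilities()) would give the same kind of summary, provided virsh is installed and can reach a hypervisor connection.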
Yes, you are right: if the rtm and hle features are available, Granite
Rapids will be correctly reported by virsh capabilities, provided the MSR
bug is fixed (please take a look at:
https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/XN...
).

You are also right that this is not a bug but rather a known limitation.
However, we regularly get bug reports from users who are not aware of
this known limitation and are confused by it. If we can offer a better
experience and save time for everyone, it might be worth the effort,
especially since GraniteRapids would be the last CPU model affected by
this issue.

If you still believe that this small effort is not useful, we could
tackle the issue by offering better documentation about this known
limitation. What do you think? We are thinking about documenting it on
Ubuntu, but do you think we can do something more upstream?

Thanks!

> > > This commit adds 2 remaining -noTSX models:
> > > - SapphireRapids-noTSX
> > > - GraniteRapids-noTSX
> >
> > QEMU switched away from adding suffixes to CPU models and just adds a
> > new version for a CPU model in case it needs to be updated. There's no
> > point adding these models to libvirt. Any CPU model that would only
> > exist in libvirt would not be directly usable anyway and would have to
> > be translated to another CPU model.
> >
>
> I would be grateful if you could provide some background on the
> criteria for adding a new version to an existing model. In the case of
> Intel, how do we know that we need to add a new version to the CPU model?
I don't know, you'd need to ask QEMU developers.
> Beyond the naming issue (version vs suffix), I understand that we stopped
> doing what we did for older CPU models, as in this commit for Icelake.
> Do I understand it correctly?
>
> i386: Add -noTSX aliases for hle=off, rtm=off CPU models
> https://github.com/qemu/qemu/commit/02fa60d10137ed2ef17534718d7467e0d2170142
This was the original approach for creating modified CPU models that can
be used as-is without having to manually specify a bunch of features. But
when more cases appeared they realized such an approach didn't scale and
switched to versioned CPU models with -v* suffixes instead.
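
To make concrete what "manually specify a bunch of features" means, here is a minimal Python sketch that builds the value of a QEMU -cpu option from a base model plus features to switch off. The helper name cpu_arg is hypothetical. The hle=off, rtm=off pair is taken from the title of the QEMU commit quoted above. The sketch only formats a string and does not check that QEMU knows the model or the features.

```python
def cpu_arg(model, disabled=()):
    """Build a QEMU -cpu value: the model name followed by name=off pairs."""
    return ",".join([model] + [f"{name}=off" for name in disabled])


# What a -noTSX alias spares the user from typing out by hand.
print(cpu_arg("Icelake-Server", ["hle", "rtm"]))
```

The result would be passed as `-cpu <value>` on the QEMU command line. A -noTSX alias or a versioned model bundles the same feature changes under a single name.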
> Do you think that adding a new version for Sapphire and Granite Rapids
> CPU models both in QEMU and libvirt would be something that makes
> sense to tackle this issue ?
Well, you can try asking whether adding such a CPU model in QEMU would
make sense. From libvirt's POV this is just a cosmetic issue so not
worth the effort IMHO.
Jirka
--
Hector CAO
Software Engineer – Partner Engineering Team
hector.cao(a)canonical.com
https://launchpad.net/~hectorcao