Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Wednesday, 21 February 2024

On 2/21/24 13:56, Jonathon Jongsma wrote:
...
 On 2/20/24 6:08 PM, Jim Fehlig wrote:
> On 12/15/23 15:11, Jonathon Jongsma wrote:
>> Previously, the script only generated the parent CPU and any versions
>> that had a defined alias. The script now generates all CPU versions. Any
>> version that had a defined alias will continue to use that alias, but
>> those without aliases will use the generated name $BASECPUNAME-vN.
>>
>> The reason for this change is two-fold. First, we need to add new models
>> that support new features (such as SEV-SNP). To deal with this, the
>> script now generates model definitions for all versions.
>>
>> But we also need to ensure that our CPU definitions are migration-safe.
>> To deal with this issue we need to make sure we're always using the
>> canonical versioned names for CPUs.
>
> Related to migration safety, do we need to be concerned with the expansion of 
> 'host-model' CPU? E.g. is it possible 'host-model' expands to EPYC
before 
> introducing the new models, and EPYC-v4 afterwards? If so, what are the 
> ramifications of that?

 Indeed, that behavior is possible. In fact, you can see it happening in the test 
 results for e.g. patch #5 ("Add versioned EPYC CPUs"). But I think that in 
 general we should be fine, because we don't just expand to a plain model name, 
 we expand to a model name plus a list of enabled or disabled feature flags.

 For example, consider one test output file 
 (tests/domaincapsdata/qemu_8.1.0-q35.x86_64.xml) that exhibits this behavior 
 change. Before my patches the host CPU expanded to a model of 'EPYC-ROME', with 
 a number of feature flags which included (among others):

        <feature policy='require' name='amd-ssbd'/>
        <feature policy='disable' name='xsaves'/>

 After my patches, the host model was expanded to 'EPYC-Rome-v4', with a slightly

 different set of feature flags. The main differences being that the 'amd-ssbd' 
 and 'xsaves' features were no longer mentioned because they are now included in 
 the -v4 definition. But it also adds a new feature flag to disable the 'ibrs' 
 feature since this feature is included in the -v4 definition but not in the -v1 
 definition and is not provided by the host CPU:

        <feature policy='disable' name='ibrs'/>

 So although the model name is changed, the definition of the host CPU as a whole 
 should expand to the same set of CPU features. 
Thanks for the detailed explanation! It confirms my understanding, which is 
helpful for reviewing the series.

...
 I think that this sort of thing can happen every time a new CPU model
is added 
 (e.g. the -noTSX-IBRS variants), so I think it should work. But if anyone knows 
 of any specific migration issues, please let me know. 
Yep, it's not the first time we've introduced a new CPU model. And I'm now 
vaguely recalling a bug with Skylake-Client-IBRS vs Skylake-Client-noTSX-IBRS. 
The full details escape me, but IIRC migration of a VM using host-model failed 
between supposedly identical hosts with something like "missing hle,rtm 
features". In the end I think one or more hosts in the cluster had microcode 
update applied, which caused the host CPU to switch from Skylake-Client-IBRS to 
Skylake-Client-noTSX-IBRS. No fault of libvirt in that case, but an example of 
the scenarios to consider.

Regards,
Jim

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs