On 2/21/24 13:56, Jonathon Jongsma wrote:
On 2/20/24 6:08 PM, Jim Fehlig wrote:
> On 12/15/23 15:11, Jonathon Jongsma wrote:
>> Previously, the script only generated the parent CPU and any versions
>> that had a defined alias. The script now generates all CPU versions. Any
>> version that had a defined alias will continue to use that alias, but
>> those without aliases will use the generated name $BASECPUNAME-vN.
>>
>> The reason for this change is two-fold. First, we need to add new models
>> that support new features (such as SEV-SNP). To deal with this, the
>> script now generates model definitions for all versions.
>>
>> But we also need to ensure that our CPU definitions are migration-safe.
>> To deal with this issue we need to make sure we're always using the
>> canonical versioned names for CPUs.
>
> Related to migration safety, do we need to be concerned with the expansion of
> 'host-model' CPU? E.g. is it possible 'host-model' expands to EPYC
before
> introducing the new models, and EPYC-v4 afterwards? If so, what are the
> ramifications of that?
Indeed, that behavior is possible. In fact, you can see it happening in the test
results for e.g. patch #5 ("Add versioned EPYC CPUs"). But I think that in
general we should be fine, because we don't just expand to a plain model name,
we expand to a model name plus a list of enabled or disabled feature flags.
For example, consider one test output file
(tests/domaincapsdata/qemu_8.1.0-q35.x86_64.xml) that exhibits this behavior
change. Before my patches the host CPU expanded to a model of 'EPYC-ROME', with
a number of feature flags which included (among others):
<feature policy='require' name='amd-ssbd'/>
<feature policy='disable' name='xsaves'/>
After my patches, the host model was expanded to 'EPYC-Rome-v4', with a slightly
different set of feature flags. The main differences being that the 'amd-ssbd'
and 'xsaves' features were no longer mentioned because they are now included in
the -v4 definition. But it also adds a new feature flag to disable the 'ibrs'
feature since this feature is included in the -v4 definition but not in the -v1
definition and is not provided by the host CPU:
<feature policy='disable' name='ibrs'/>
So although the model name is changed, the definition of the host CPU as a whole
should expand to the same set of CPU features.
Thanks for the detailed explanation! It confirms my understanding, which is
helpful for reviewing the series.
I think that this sort of thing can happen every time a new CPU model
is added
(e.g. the -noTSX-IBRS variants), so I think it should work. But if anyone knows
of any specific migration issues, please let me know.
Yep, it's not the first time we've introduced a new CPU model. And I'm now
vaguely recalling a bug with Skylake-Client-IBRS vs Skylake-Client-noTSX-IBRS.
The full details escape me, but IIRC migration of a VM using host-model failed
between supposedly identical hosts with something like "missing hle,rtm
features". In the end I think one or more hosts in the cluster had microcode
update applied, which caused the host CPU to switch from Skylake-Client-IBRS to
Skylake-Client-noTSX-IBRS. No fault of libvirt in that case, but an example of
the scenarios to consider.
Regards,
Jim