On 3/4/24 11:35 AM, Jim Fehlig wrote:
On 3/1/24 10:13, Daniel P. Berrangé wrote:
> On Fri, Mar 01, 2024 at 10:36:12AM -0600, Jonathon Jongsma wrote:
>> On 3/1/24 10:13 AM, Daniel P. Berrangé wrote:
>>> On Tue, Feb 20, 2024 at 05:08:02PM -0700, Jim Fehlig wrote:
>>>> On 12/15/23 15:11, Jonathon Jongsma wrote:
>>>>> Previously, the script only generated the parent CPU and any versions
>>>>> that had a defined alias. The script now generates all CPU versions. Any
>>>>> version that had a defined alias will continue to use that alias, but
>>>>> those without aliases will use the generated name $BASECPUNAME-vN.
>>>>>
>>>>> The reason for this change is two-fold. First, we need to add new models
>>>>> that support new features (such as SEV-SNP). To deal with this, the
>>>>> script now generates model definitions for all versions.
>>>>>
>>>>> But we also need to ensure that our CPU definitions are migration-safe.
>>>>> To deal with this issue we need to make sure we're always using the
>>>>> canonical versioned names for CPUs.
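The naming rule in the commit message above can be sketched as follows. This is a minimal illustration, not the actual generator script: `model_name`, `generated_models`, and the version/alias table are hypothetical, and the parent model is deliberately left out of the sketch.

```python
from typing import Optional


def model_name(base: str, version: int, alias: Optional[str]) -> str:
    """Name generated for one versioned CPU model (hypothetical helper)."""
    # A version with a defined alias keeps using that alias...
    if alias is not None:
        return alias
    # ...otherwise it gets the generated $BASECPUNAME-vN name.
    return f"{base}-v{version}"


def generated_models(base: str, versions: dict[int, Optional[str]]) -> list[str]:
    """Names for every listed version of `base`, in version order."""
    return [model_name(base, v, versions[v]) for v in sorted(versions)]


# Hypothetical version table: version 2 has an alias, 3 and 4 do not.
print(generated_models("Example", {2: "Example-Alias", 3: None, 4: None}))
# ['Example-Alias', 'Example-v3', 'Example-v4']
```

Under this rule a model such as Snowridge version 4 with no alias would surface as `Snowridge-v4`, which matches the name seen in the domcapabilities output later in the thread.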
>>>>
>>>> Related to migration safety, do we need to be concerned with the expansion
>>>> of 'host-model' CPU? E.g. is it possible 'host-model' expands to EPYC before
>>>> introducing the new models, and EPYC-v4 afterwards? If so, what are the
>>>> ramifications of that?
>>>
>>> Yes, I see that happening on my laptop in domcapabilities:
>>>
>>> Currently libvirt reports:
>>>
>>>   <mode name='host-model' supported='yes'>
>>>     <model fallback='forbid'>Snowridge</model>
>>>     <vendor>Intel</vendor>
>>>     <maxphysaddr mode='passthrough' limit='46'/>
>>>     <feature policy='require' name='ss'/>
>>>     <feature policy='require' name='vmx'/>
>>>     ...snip...
>>>
>>>
>>> and after this series it reports:
>>>
>>>   <mode name='host-model' supported='yes'>
>>>     <model fallback='forbid'>Snowridge-v4</model>
>>>     <vendor>Intel</vendor>
>>>     <maxphysaddr mode='passthrough' limit='46'/>
>>>     <feature policy='require' name='ss'/>
>>>     <feature policy='require' name='vmx'/>
>>>     ...snip...
>>>
>>>
>>> That's not wrong per se, because Snowridge-v4 has a smaller
>>> delta against my host CPU.
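The "smaller delta" point can be illustrated with a deliberately simplified model of host-model expansion: for each candidate CPU model, count the features that would have to be listed explicitly (host features the model lacks, plus model features the host lacks) and pick the candidate with the smallest count. This is an assumption-laden sketch, not libvirt's actual x86 decoder, whose selection logic is more involved; the feature and model names are hypothetical.

```python
def delta(host: set[str], model: set[str]) -> int:
    """Number of features a guest definition would list explicitly."""
    # Host features the model lacks would need policy='require';
    # model features the host lacks would need policy='disable'.
    return len(host - model) + len(model - host)


def pick_model(host: set[str], models: dict[str, set[str]]) -> str:
    """Return the candidate with the smallest delta (first one wins ties)."""
    return min(models, key=lambda name: delta(host, models[name]))


# Hypothetical host and models; only "Base" exists before the update.
host = {"f1", "f2", "f3", "f4"}
before = {"Base": {"f1", "f2"}}
after = {"Base": {"f1", "f2"}, "Base-v4": {"f1", "f2", "f3"}}

print(pick_model(host, before))  # Base
print(pick_model(host, after))   # Base-v4
```

Even in this toy version, adding a closer candidate is enough to change the result without anything about the host changing, which is the same shape as the switch from `Snowridge` to `Snowridge-v4`.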
>>>
>>> The problem is that libvirt updates the *live* XML for the
>>> guest with this expansion. IIUC, if we now attempt to
>>> live migrate to a compatible machine running older libvirt
>>> the migrate will fail as old libvirt doesn't know the -v4
>>> CPU.
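The migration failure described here comes down to a name lookup: the destination must recognize the CPU model named in the source's live XML. A minimal sketch of that check follows, assuming the destination's set of known model names is available somehow; the `known_models` set is hypothetical and hard-coded, and this is not how libvirt itself performs the check.

```python
import xml.etree.ElementTree as ET


def cpu_model(cpu_xml: str) -> str:
    """Extract the model name from a <cpu> element."""
    root = ET.fromstring(cpu_xml)
    model = root.find("model")
    if model is None or not (model.text or "").strip():
        raise ValueError("no <model> in <cpu> definition")
    return model.text.strip()


def destination_accepts(cpu_xml: str, known_models: set[str]) -> bool:
    """True if the destination knows the model the live XML refers to."""
    return cpu_model(cpu_xml) in known_models


live_xml = """<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Snowridge-v4</model>
  <vendor>Intel</vendor>
</cpu>"""

# Hypothetical model list of an older libvirt that predates the -vN names.
old_known = {"Snowridge"}
print(destination_accepts(live_xml, old_known))  # False
```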
Downstream, we (SUSE) don't really support migrating from new -> old. Is
this something we aim to support upstream?
I don't know the answer to this question.
>>>
>>> I'm not sure how to address this?
>>
>> But don't we have this issue any time we add a new CPU model to libvirt?
>> Any time there's a new model, it has the potential to be a closer match to
>> the host CPU than an existing model definition was. As I mentioned in my
>> previous reply, when e.g. the -noTSX CPU variants were added, didn't the
>> same sort of thing (potentially) happen? Or am I doing something
>> meaningfully different in this patch set than what happens in those
>> scenarios?
>
> I think it probably /did/ happen, but that doesn't make it acceptable.
> The noTSX stuff was the cause of massive amounts of compatibility pain
> for mgmt apps, so the incompatibility in libvirt might have been glossed
> over. We're adding a lot of new versions here, so this possibly increases
> the visibility/impact of this libvirt change.
It can happen when we introduce an entirely new CPU model too. E.g. on a
Genoa machine, prior to commit bfe53e9145c, host-model expanded to:
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>EPYC-Milan</model>
  <vendor>AMD</vendor>
  <feature policy='require' name='x2apic'/>
  <feature policy='require' name='tsc-deadline'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='avx512f'/>
  <feature policy='require' name='avx512dq'/>
  <feature policy='require' name='avx512ifma'/>
  <feature policy='require' name='avx512cd'/>
  <feature policy='require' name='avx512bw'/>
  <feature policy='require' name='avx512vl'/>
  <feature policy='require' name='avx512vbmi'/>
  <feature policy='require' name='avx512vbmi2'/>
  <feature policy='require' name='gfni'/>
  <feature policy='require' name='vaes'/>
  <feature policy='require' name='vpclmulqdq'/>
  <feature policy='require' name='avx512vnni'/>
  <feature policy='require' name='avx512bitalg'/>
  <feature policy='require' name='avx512-vpopcntdq'/>
  <feature policy='require' name='la57'/>
  <feature policy='require' name='spec-ctrl'/>
  <feature policy='require' name='stibp'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='avx512-bf16'/>
  <feature policy='require' name='cmp_legacy'/>
  <feature policy='require' name='virt-ssbd'/>
  <feature policy='require' name='rdctl-no'/>
  <feature policy='require' name='skip-l1dfl-vmentry'/>
  <feature policy='require' name='mds-no'/>
  <feature policy='require' name='pschange-mc-no'/>
  <feature policy='disable' name='svm'/>
  <feature policy='require' name='topoext'/>
  <feature policy='disable' name='npt'/>
  <feature policy='disable' name='nrip-save'/>
  <feature policy='disable' name='svme-addr-chk'/>
</cpu>
After commit bfe53e9145c, it expands to:
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>EPYC-Genoa</model>
  <vendor>AMD</vendor>
  <feature policy='require' name='x2apic'/>
  <feature policy='require' name='tsc-deadline'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='spec-ctrl'/>
  <feature policy='require' name='stibp'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='cmp_legacy'/>
  <feature policy='require' name='virt-ssbd'/>
  <feature policy='require' name='rdctl-no'/>
  <feature policy='require' name='skip-l1dfl-vmentry'/>
  <feature policy='require' name='mds-no'/>
  <feature policy='require' name='pschange-mc-no'/>
  <feature policy='disable' name='svm'/>
  <feature policy='require' name='topoext'/>
  <feature policy='disable' name='npt'/>
  <feature policy='disable' name='nrip-save'/>
  <feature policy='disable' name='svme-addr-chk'/>
</cpu>
Regards,
Jim
Does anybody have a response to this point from Jim? I can't really
think of a way forward if it's not acceptable for the host model
expansion to change between different versions of libvirt.
Jonathon