On 3/22/24 04:54, Daniel P. Berrangé wrote:
On Mon, Mar 04, 2024 at 10:35:25AM -0700, Jim Fehlig wrote:
> On 3/1/24 10:13, Daniel P. Berrangé wrote:
>> On Fri, Mar 01, 2024 at 10:36:12AM -0600, Jonathon Jongsma wrote:
>>> On 3/1/24 10:13 AM, Daniel P. Berrangé wrote:
>>>> On Tue, Feb 20, 2024 at 05:08:02PM -0700, Jim Fehlig wrote:
>>>>> On 12/15/23 15:11, Jonathon Jongsma wrote:
>>>>>> Previously, the script only generated the parent CPU and any versions
>>>>>> that had a defined alias. The script now generates all CPU versions. Any
>>>>>> version that had a defined alias will continue to use that alias, but
>>>>>> those without aliases will use the generated name $BASECPUNAME-vN.
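
For readers skimming the thread, the naming rule described above can be sketched roughly as follows. This is only an illustration, not the code from the patch: the function names, the shape of the `aliases` mapping, and the "ExampleCPU" data are all made up here.

```python
def model_name(base, version, aliases):
    """Pick the name for one CPU version.

    `aliases` is a hypothetical mapping of (base, version) -> alias name.
    A version with a defined alias keeps that alias; any other version
    gets the generated name "<base>-v<N>".
    """
    alias = aliases.get((base, version))
    if alias is not None:
        return alias
    return f"{base}-v{version}"


def all_model_names(base, versions, aliases):
    """Generate a name for every listed version, not only the aliased ones."""
    return [model_name(base, v, aliases) for v in versions]


if __name__ == "__main__":
    # Placeholder data: "ExampleCPU" is not a real model.
    aliases = {("ExampleCPU", 2): "ExampleCPU-IBRS"}
    print(all_model_names("ExampleCPU", [2, 3, 4], aliases))
```

How the parent (unversioned) model is handled is left out of the sketch on purpose, since the description above does not spell it out.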
>>>>>>
>>>>>> The reason for this change is two-fold. First, we need to add new models
>>>>>> that support new features (such as SEV-SNP). To deal with this, the
>>>>>> script now generates model definitions for all versions.
>>>>>>
>>>>>> But we also need to ensure that our CPU definitions are migration-safe.
>>>>>> To deal with this issue we need to make sure we're always using the
>>>>>> canonical versioned names for CPUs.
>>>>>
>>>>> Related to migration safety, do we need to be concerned with the expansion
>>>>> of 'host-model' CPU? E.g. is it possible 'host-model' expands to EPYC before
>>>>> introducing the new models, and EPYC-v4 afterwards? If so, what are the
>>>>> ramifications of that?
>>>>
>>>> Yes, I see that happening on my laptop in domcapabilities:
>>>>
>>>> Currently libvirt reports:
>>>>
>>>> <mode name='host-model' supported='yes'>
>>>> <model fallback='forbid'>Snowridge</model>
>>>> <vendor>Intel</vendor>
>>>> <maxphysaddr mode='passthrough' limit='46'/>
>>>> <feature policy='require' name='ss'/>
>>>> <feature policy='require' name='vmx'/>
>>>> ...snip...
>>>>
>>>>
>>>> and after this series it reports:
>>>>
>>>> <mode name='host-model' supported='yes'>
>>>> <model fallback='forbid'>Snowridge-v4</model>
>>>> <vendor>Intel</vendor>
>>>> <maxphysaddr mode='passthrough' limit='46'/>
>>>> <feature policy='require' name='ss'/>
>>>> <feature policy='require' name='vmx'/>
>>>> ...snip...
>>>>
>>>>
>>>> That's not wrong per se, because Snowridge-v4 has a smaller
>>>> delta against my host CPU.
>>>>
>>>> The problem is that libvirt updates the *live* XML for the
>>>> guest with this expansion. IIUC, if we now attempt to
>>>> live migrate to a compatible machine running older libvirt
>>>> the migrate will fail as old libvirt doesn't know the -v4
>>>> CPU.
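
One way to spot this situation ahead of a migration is to pull the host-model name out of the domcapabilities XML and check it against the model names the destination libvirt knows. The sketch below assumes the XML has the usual `<domainCapabilities><cpu><mode name='host-model'>` layout; the helper names and the `destination_models` set are hypothetical, and collecting that set from the older host is left to the caller.

```python
import xml.etree.ElementTree as ET


def host_model_name(domcaps_xml):
    """Return the model name from the host-model <mode>, or None if absent."""
    root = ET.fromstring(domcaps_xml)
    for mode in root.iter("mode"):
        if mode.get("name") == "host-model":
            model = mode.find("model")
            if model is not None and model.text:
                return model.text.strip()
    return None


def known_to_destination(domcaps_xml, destination_models):
    """True if the expanded host-model name is one the destination knows.

    `destination_models` is a set of model names, assumed to have been
    gathered from the destination host by some other means.
    """
    name = host_model_name(domcaps_xml)
    return name is not None and name in destination_models
```

With the "after this series" output above, `known_to_destination(xml, {"Snowridge"})` would be false, which is the case where the older libvirt rejects the -v4 name.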
>
> Downstream, we (SUSE) don't really support migrating from new -> old. Is
> this something we aim to support upstream?

Kind of, sort of, yes and no :)

The VIR_DOMAIN_XML_MIGRATABLE flag is a bit of an attempt to make
it possible to format XML in a way that's (hopefully) mostly acceptable
to older libvirt.

The devil is in the detail though, and there's never really been
any formal testing to prove correctness, so new -> old is one of
those things that may work, please report bugs if we missed
something.
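
For reference, requesting that form of the XML through the Python bindings looks roughly like this. It is a minimal sketch: the `migratable_xml` helper name is made up, and the fallback constant is only there so the snippet can be read without libvirt-python installed (the flag is 1 << 3 in virDomainXMLFlags).

```python
try:
    import libvirt  # libvirt-python bindings, if available
    MIGRATABLE = libvirt.VIR_DOMAIN_XML_MIGRATABLE
except ImportError:
    # Fallback for reading the sketch without the bindings installed.
    MIGRATABLE = 1 << 3


def migratable_xml(dom):
    """Return the domain XML formatted with the MIGRATABLE flag.

    `dom` is expected to be a libvirt.virDomain (or anything that
    offers the same XMLDesc(flags) method).
    """
    return dom.XMLDesc(MIGRATABLE)
```

A caller would typically get `dom` from something like `libvirt.open("qemu:///system").lookupByName("guest")`, where the URI and guest name are placeholders.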
>>>> I'm not sure how to address this ?
>>>
>>> But don't we have this issue any time we add a new CPU model to libvirt?
>>> Anytime there's a new model, it has the potential to be a closer match to
>>> the host CPU than an existing model definition was. As I mentioned in my
>>> previous reply, when e.g. the -noTSX CPU variants were added, didn't the
>>> same sort of thing (potentially) happen? Or am I doing something
>>> meaningfully different in this patch set than what happens in those
>>> scenarios?
>>
>> I think it probably /did/ happen, but that doesn't make it acceptable.
>> The noTSX stuff was the cause of massive amounts of compatibility pain
>> for mgmt apps, so the incompatibility in libvirt might have been glossed
>> over. We're adding a lot of new versions here, so possibly increasing
>> the visibility/impact of this libvirt change.
>
> It can happen when we introduce an entirely new CPU model too. E.g. on a
> Genoa machine, prior to commit bfe53e9145c, host model expanded to
Yeah, true, so that's a general problem with 'host-model' when
introducing new CPU generations, if that post-dates a user
deploying on said CPU generation.
So what's the consensus on this series? With it, there will certainly be cases
where host-model expands to one of the new versioned models. Question is, how
many of those result in an incompatible CPU from the guest perspective? Without
the series, we'll need to selectively add versioned CPUs when supporting new
features such as sev-snp.
Regards,
Jim