On 21.03.22 10:25, Daniel P. Berrangé wrote:
On Fri, Mar 18, 2022 at 01:23:03PM -0400, Collin Walling wrote:
> On 3/15/22 15:08, David Hildenbrand wrote:
>> On 15.03.22 18:40, Boris Fiuczynski wrote:
>>> On 3/15/22 4:58 PM, David Hildenbrand wrote:
>>>> On 11.03.22 13:44, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> Am 11.03.22 um 10:30 schrieb David Hildenbrand:
>>>>>> On 11.03.22 05:17, Collin Walling wrote:
>>>>>>> The s390x architecture has a growing list of features that will no
>>>>>>> longer be supported on future hardware releases. This introduces an
>>>>>>> issue with migration such that guests, running on models with these
>>>>>>> features enabled, will be rejected outright by machines that do not
>>>>>>> support these features.
>>>>>>>
>>>>>>> A current example is the CSSKE feature that has been deprecated for
>>>>>>> some time. It has been publicly announced that gen15 will be the last
>>>>>>> release to support this feature, however we have postponed this to
>>>>>>> gen16a. A possible solution to remedy this would be to create a new
>>>>>>> QEMU QMP Response that allows users to query for
>>>>>>> deprecated/unsupported features.
>>>>>>>
>>>>>>> This presents two parts of the puzzle: how to report deprecated
>>>>>>> features to a user (libvirt) and how libvirt should handle this
>>>>>>> information.
>>>>>>>
>>>>>>> First, let's discuss the latter. The patch presented alongside this
>>>>>>> cover letter attempts to solve the migration issue by hard-coding the
>>>>>>> CSSKE feature to be disabled for all s390x CPU models. This is done by
>>>>>>> simply appending the CSSKE feature with the disabled policy to the
>>>>>>> host-model.
>>>>>>>
>>>>>>> libvirt pseudo:
>>>>>>>
>>>>>>>   if arch is s390x
>>>>>>>     set CSSKE to disabled for host-model
>>>>>>
>>>>>> That violates host-model semantics and possibly the user intent. There
>>>>>> would have to be some toggle to manually specify this, for example, a
>>>>>> new model type or some magical flag.
>>>>>
>>>>> What we actually want to do is to disable csske completely from QEMU
>>>>> and thus from the host-model. Then it would not violate the spec.
>>>>> But this has all kinds of issues (you cannot migrate from older
>>>>> versions of software and machines) although the hardware still can
>>>>> provide the feature.
>>>>>
>>>>> The hardware guys promised me to deprecate things two generations
>>>>> earlier and we usually deprecate things that are not used or where
>>>>> software has a runtime switch.
>>>>>
>>>>> What I hear from you is that you do not want to modify the host-model
>>>>> semantics to something more useful but rather define a new thing
>>>>> (e.g. "host-sane")?
>>>>
>>>> My take would be, to keep the host model consistent, meaning, the
>>>> semantics in QEMU exactly match the semantics in Libvirt. It defines the
>>>> maximum CPU model that's runnable under KVM. If a feature is not
>>>> included (e.g., csske) that feature cannot be enabled in any way.
>>>>
>>>> The "host model" has the semantics of resembling the actual host CPU.
>>>> This is only partially true, because we support some features the host
>>>> might not support (e.g., zPCI IIRC) and obviously don't support all
>>>> host features in QEMU.
>>>>
>>>> So instead of playing games on the libvirt side with the host model, I
>>>> see the following alternatives:
>>>>
>>>> 1. Remove the problematic features from the host model in QEMU, like
>>>> "we just don't support this feature". Consequently, any migration of a
>>>> VM with csske=on to a new QEMU version will fail, similar to having an
>>>> older QEMU version without support for a certain feature.
>>>>
>>>> "host-passthrough" would change between QEMU versions ... which I see
>>>> as problematic.
>>>>
>>>> 2. Introduce a new CPU model that has these new semantics: "host model"
>>>> minus deprecated features. Migration of older VMs with csske=on to a new
>>>> QEMU version will work. Make libvirt use/expand that new CPU model.
>>>>
>>>> It doesn't necessarily have to be an actual new CPU model. We can use a
>>>> feature group, like "-cpu host,deprecated-features=false". What's
>>>> inside "deprecated-features" will actually change between QEMU
>>>> versions, but we don't really care, as the expanded CPU model won't
>>>> change.
>>>>
>>>> "host-passthrough" won't change between QEMU versions ...
>>>>
>>>> 3. As Daniel suggested, don't use the host model, but a CPU model
>>>> indicated as "suggested".
>>>>
>>>> The real issue is that in reality, we don't simply always use a model
>>>> like "gen15a", but usually want optional features, if they are around.
>>>> Prime examples are "sie" and friends.
>>>>
>>>>
>>>>
>>>> I tend to prefer 2. With 3. I see issues with optional features like
>>>> "sie" and friends. Often, you really want "give me all you got, but
>>>> disable deprecated features that might cause problems in the future".
>>>>
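
Option 2 above amounts to a set difference at expansion time. The following is a minimal sketch of that idea only, not QEMU or libvirt code; the function name, the "vx" feature and the contents of both sets are assumptions made for illustration.

```python
# Illustrative only: the set contents and the "vx" feature are assumptions,
# not QEMU's real CPU model definitions.
HOST_FEATURES = frozenset({"sie", "csske", "vx"})
DEPRECATED_FEATURES = frozenset({"csske"})


def expand_host_model(host, deprecated, deprecated_features=True):
    """Return the sorted feature list a guest would be started with.

    deprecated_features=True  models the plain host model.
    deprecated_features=False models "host model minus deprecated features".
    """
    if deprecated_features:
        return sorted(host)
    return sorted(host - deprecated)


print(expand_host_model(HOST_FEATURES, DEPRECATED_FEATURES))
print(expand_host_model(HOST_FEATURES, DEPRECATED_FEATURES,
                        deprecated_features=False))
```

The deprecated set may grow between QEMU versions, but a guest stores the expanded list, so an already-defined guest would be unaffected.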
>>>
>>> David,
>>> if I understand your proposal 2 correctly it sounds a lot like
>>> Christian's idea of leaving the CPU mode "host-model" as is and
>>> introducing a new CPU mode "host-recommended" for the new semantics in
>>> which query-cpu-model-expansion would be called with the additional
>>> "deprecated-features" property.
>>> That way libvirt would not have to fiddle around with the deprecation
>>> itself and users would have the option which semantic they want to use.
>>> Is that correct?
>>
>> Yes, exactly.
>>
>>
>
> From what I understand:
>
> QEMU
> - add a "deprecated-features" feature group (more-or-less David's code)
>
> libvirt
> - recognize a new model name "host-recommended"
> - query QEMU for host-model + deprecated-features and cache it in caps
> file (something like <hostRecCPU>)
> - when guest is defined with "host-recommended", pull <hostRecCPU> from
> caps when guest is started (similar to how host-model works today)
>
> If this is sufficient, then I can get to work on this.
>
> My question is what would be the best way to include the deprecated
> features when calculating a baseline or comparison. Both work with the
> host-model and may no longer present an accurate result. Say, for
> example, we baseline a z15 with a gen17 (which will outright not support
> CSSKE). With today's implementation, this might result in a ridiculously
> old CPU model which also does not support CSSKE. The ideal response
> would be a z15 - deprecated features (i.e. host-recommended on a z15),
> but we'd need a way to flag to QEMU that we want to exclude the
> deprecated features. Or am I totally wrong about this?
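
The baseline question can be illustrated at the feature level with plain sets. This is a rough sketch under stated assumptions: the feature sets for "z15" and "gen17" are invented, the `baseline` helper is hypothetical, and it does not model how QEMU picks a named CPU model for the result.

```python
def baseline(host_a, host_b, deprecated=frozenset()):
    """Features common to both hosts, minus any features flagged deprecated."""
    return (set(host_a) & set(host_b)) - set(deprecated)


# Invented feature sets; only "csske" and "sie" are named in this thread.
z15 = {"sie", "csske", "vx"}
gen17 = {"sie", "vx"}

print(sorted(baseline(z15, gen17)))             # csske already absent
print(sorted(baseline(z15, z15)))               # csske survives
print(sorted(baseline(z15, z15, {"csske"})))    # csske excluded on request
```

The last two calls show why a flag would be needed: between two hosts that both still offer CSSKE, a plain intersection keeps it, so the caller has to ask for the exclusion explicitly.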

QEMU has a concept of versioned QEMU models, so you could define a
z15-v2 version without CSSKE

gen15a already comes with csske=false. s390x does not implement
versioned CPU models and as I raised in the past, that concept is rather
a bad fit for s390x.

--
Thanks,
David / dhildenb