On 3/18/22 14:33, David Hildenbrand wrote:
On 18.03.22 18:23, Collin Walling wrote:
> On 3/15/22 15:08, David Hildenbrand wrote:
>> On 15.03.22 18:40, Boris Fiuczynski wrote:
>>> On 3/15/22 4:58 PM, David Hildenbrand wrote:
>>>> On 11.03.22 13:44, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> Am 11.03.22 um 10:30 schrieb David Hildenbrand:
>>>>>> On 11.03.22 05:17, Collin Walling wrote:
>>>>>>> The s390x architecture has a growing list of features that will no longer
>>>>>>> be supported on future hardware releases. This introduces an issue with
>>>>>>> migration such that guests, running on models with these features enabled,
>>>>>>> will be rejected outright by machines that do not support these features.
>>>>>>>
>>>>>>> A current example is the CSSKE feature that has been deprecated for some time.
>>>>>>> It has been publicly announced that gen15 will be the last release to
>>>>>>> support this feature, however we have postponed this to gen16a. A possible
>>>>>>> solution to remedy this would be to create a new QEMU QMP Response that allows
>>>>>>> users to query for deprecated/unsupported features.
>>>>>>>
>>>>>>> This presents two parts of the puzzle: how to report deprecated features to
>>>>>>> a user (libvirt) and how should libvirt handle this information.
>>>>>>>
>>>>>>> First, let's discuss the latter. The patch presented alongside this cover letter
>>>>>>> attempts to solve the migration issue by hard-coding the CSSKE feature to be
>>>>>>> disabled for all s390x CPU models. This is done by simply appending the CSSKE
>>>>>>> feature with the disabled policy to the host-model.
>>>>>>>
>>>>>>> libvirt pseudo:
>>>>>>>
>>>>>>> if arch is s390x
>>>>>>> set CSSKE to disabled for host-model
>>>>>>
>>>>>> That violates host-model semantics and possibly the user intent. There
>>>>>> would have to be some toggle to manually specify this, for example, a
>>>>>> new model type or some magical flag.
>>>>>
>>>>> What we actually want to do is to disable csske completely from QEMU and
>>>>> thus from the host-model. Then it would not violate the spec.
>>>>> But this has all kinds of issues (you cannot migrate from older versions
>>>>> of software and machines) although the hardware still can provide the feature.
>>>>>
>>>>> The hardware guys promised me to deprecate things two generations earlier
>>>>> and we usually deprecate things that are not used or where software has a
>>>>> runtime switch.
>>>>>
>>>>> What I hear from you is that you do not want to modify the host-model
>>>>> semantics to something more useful but rather define a new thing
>>>>> (e.g. "host-sane")?
>>>>
>>>> My take would be, to keep the host model consistent, meaning, the
>>>> semantics in QEMU exactly match the semantics in Libvirt. It defines the
>>>> maximum CPU model that's runnable under KVM. If a feature is not
>>>> included (e.g., csske) that feature cannot be enabled in any way.
>>>>
>>>> The "host model" has the semantics of resembling the actual host CPU.
>>>> This is only partially true, because we support some features the host
>>>> might not support (e.g., zPCI IIRC) and obviously don't support all host
>>>> features in QEMU.
>>>>
>>>> So instead of playing games on the libvirt side with the host model, I
>>>> see the following alternatives:
>>>>
>>>> 1. Remove the problematic features from the host model in QEMU, like "we
>>>> just don't support this feature". Consequently, any migration of a VM
>>>> with csske=on to a new QEMU version will fail, similar to having an
>>>> older QEMU version without support for a certain feature.
>>>>
>>>> "host-passthrough" would change between QEMU versions ... which I see as
>>>> problematic.
>>>>
>>>> 2. Introduce a new CPU model that has these new semantics:
>>>> "host model" - deprecated features. Migration of older VMs with csske=on
>>>> to a new QEMU version will work. Make libvirt use/expand that new CPU model.
>>>>
>>>> It doesn't necessarily have to be an actual new cpu model. We can use a
>>>> feature group, like "-cpu host,deprecated-features=false". What's
>>>> inside "deprecated-features" will actually change between QEMU versions,
>>>> but we don't really care, as the expanded CPU model won't change.
>>>>
>>>> "host-passthrough" won't change between QEMU versions ...
>>>>
>>>> 3. As Daniel suggested, don't use the host model, but a CPU model
>>>> indicated as "suggested".
>>>>
>>>> The real issue is that in reality, we don't simply always use a model
>>>> like "gen15a", but usually want optional features, if they are around.
>>>> Prime examples are "sie" and friends.
>>>>
>>>>
>>>>
>>>> I tend to prefer 2. With 3. I see issues with optional features like
>>>> "sie" and friends. Often, you really want "give me all you got, but
>>>> disable deprecated features that might cause problems in the future".
>>>>
>>>
>>> David,
>>> if I understand your proposal 2 correctly it sounds a lot like Christian's
>>> idea of leaving the CPU mode "host-model" as is and introducing a new CPU
>>> mode "host-recommended" for the new semantics in which
>>> query-cpu-model-expansion would be called with the additional
>>> "deprecated-features" property.
>>> That way libvirt would not have to fiddle around with the deprecation
>>> itself and users would have the option which semantic they want to use.
>>> Is that correct?
>>
>> Yes, exactly.
>>
>>
>
> From what I understand:
>
> QEMU
> - add a "deprecated-features" feature group (more-or-less David's code)
>
> libvirt
> - recognize a new model name "host-recommended"
> - query QEMU for host-model + deprecated-features and cache it in caps
> file (something like <hostRecCPU>)
> - when guest is defined with "host-recommended", pull <hostRecCPU> from
> caps when guest is started (similar to how host-model works today)
>
> If this is sufficient, then I can get to work on this.
>
> My question is what would be the best way to include the deprecated
> features when calculating a baseline or comparison. Both work with the
> host-model and may no longer present an accurate result. Say, for
> example, we baseline a z15 with a gen17 (which will outright not support
> CSSKE). With today's implementation, this might result in a ridiculously
> old CPU model which also does not support CSSKE. The ideal response
> would be a z15 - deprecated features (i.e. host-recommended on a z15),
> but we'd need a way to flag to QEMU that we want to exclude the
> deprecated features. Or am I totally wrong about this?
For baselining, it would be reasonable to always disable deprecated
features, and to ignore them during the model selection. Should be
fairly easy to implement, let me know if you need any pointers.
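
To make the baselining idea concrete, here is a rough Python sketch. It is not
QEMU code: the model table, the "feat-*" feature names and the baseline()
helper are all made up for illustration, and only "csske" comes from this
thread. It shows why ignoring deprecated features during model selection
avoids falling back to a much older model.

```python
from typing import FrozenSet, List, Tuple

# Assumption: a toy deprecated-feature list. In the proposal QEMU would own it.
DEPRECATED: FrozenSet[str] = frozenset({"csske"})

# Assumption: toy model table, oldest first. Names and features are hypothetical.
MODEL_DEFS: List[Tuple[str, FrozenSet[str]]] = [
    ("gen-old", frozenset({"feat-a"})),
    ("gen15", frozenset({"feat-a", "feat-b", "csske"})),
    ("gen17", frozenset({"feat-a", "feat-b", "feat-c"})),
]


def baseline(hosts: List[FrozenSet[str]],
             ignore_deprecated: bool = True) -> Tuple[str, FrozenSet[str]]:
    """Pick the newest model whose required features all hosts provide."""
    if not hosts:
        raise ValueError("need at least one host feature set")
    common = frozenset.intersection(*hosts)
    if ignore_deprecated:
        # Deprecated features are always disabled in the result.
        common = common - DEPRECATED
    best = None
    for name, base in MODEL_DEFS:
        # Deprecated features do not count as requirements when ignored.
        required = base - DEPRECATED if ignore_deprecated else base
        if required <= common:
            best = name  # the table is oldest first, so later matches win
    if best is None:
        raise ValueError("no common model")
    return best, common


z15_host = frozenset({"feat-a", "feat-b", "csske"})
gen17_host = frozenset({"feat-a", "feat-b", "feat-c"})
```

With these toy inputs, baseline([z15_host, gen17_host]) selects "gen15" without
csske, while passing ignore_deprecated=False falls back to "gen-old".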

Thanks David. I'll take a look when I can. I may not be very active this
week due to personal items, but intend to knock this out as soon as
things settle down on my end.

I *assume* that for comparison there is nothing to do.

I think you're right, at least on QEMU's end.

For libvirt, IIRC, comparison will compare the CPU model cached under
the hostCPU tag to whatever is in the XML. If comparing, say, a gen17
host (no csske support) with a gen15 XML, the result should come up as
"incompatible". A user may think "what the heck, shouldn't old
gen run on new gen?"

Doesn't the comparison QAPI report which features cause the result of
"incompatible"? Would it make sense to amend the libvirt API to report
features causing this issue? I believe this is what the --error flag is
meant to do, but as far as I know, nothing useful is currently reported.

Something like this (assume we're a gen17 host, and cpu.xml contains a
gen15 host-model):

# virsh hypervisor-cpu-compare cpu.xml --error
error: Failed to compare hypervisor CPU with cpu.xml
error: the CPU is incompatible with host CPU
error: host CPU does not support: csske
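
The logic behind such a report could be as simple as a set difference. The
following Python sketch is only an illustration of that idea, not libvirt or
QEMU code: compare_cpu(), the result labels as used here, and the "feat-*"
names are assumptions on my side.

```python
from typing import Iterable, List, Tuple


def compare_cpu(host_features: Iterable[str],
                guest_features: Iterable[str]) -> Tuple[str, List[str]]:
    """Compare a guest CPU definition against the host.

    Returns a result label plus the guest features the host lacks.
    """
    host = set(host_features)
    guest = set(guest_features)
    missing = sorted(guest - host)
    if missing:
        # These are the features a user would want to see in the error.
        return "incompatible", missing
    if host == guest:
        return "identical", []
    return "superset", []


def format_errors(missing: List[str]) -> List[str]:
    """Render the missing features in the style of the sample output."""
    return ["error: host CPU does not support: " + ", ".join(missing)]
```

For a gen17-like host without csske and a gen15-like guest definition that
requires it, compare_cpu() returns "incompatible" together with ["csske"].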
--
Regards,
Collin
Stay safe and stay healthy