[PATCH RFC 0/1] s390x CPU Model Feature Deprecation

The s390x architecture has a growing list of features that will no longer be supported on future hardware releases. This introduces a migration issue: guests running on models with these features enabled will be rejected outright by machines that do not support them. A current example is the CSSKE feature, which has been deprecated for some time. It was publicly announced that gen15 would be the last release to support this feature; however, we have postponed this to gen16a.

A possible remedy is to create a new QEMU QMP response that allows users to query for deprecated/unsupported features. This presents two parts of the puzzle: how to report deprecated features to a user (libvirt), and how libvirt should handle this information. First, let's discuss the latter.

The patch presented alongside this cover letter attempts to solve the migration issue by hard-coding the CSSKE feature to be disabled for all s390x CPU models. This is done by simply appending the CSSKE feature with the disabled policy to the host-model.

libvirt pseudo:

    if arch is s390x
        set CSSKE to disabled for host-model

This will be recorded under the host-model as (observable via domcapabilities):

    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>z13.2-base</model>
      <feature policy='require' name='aen'/>
      <feature policy='require' name='aefsi'/>
      <feature policy='require' name='diag318'/>
      ...
      <feature policy='disable' name='csske'/>
      ...

Obviously a hard-coded path is not a desired approach, as it requires a constant update whenever newer features are listed for deprecation. The patch is instead presented to spin up the discussion as to where it is appropriate to record these deprecated features (e.g. should these be reported under the host-model? or added to the guest CPU definition prior to startup? etc.).

There is one issue observed with this change to the host-model, described directly below.

A change in the host-model XML affects the hypervisor-cpu-comparison command, which uses the libvirt-recorded host-model XML. Issuing a comparison on a machine that still supports CSSKE (but with it flagged as disabled in the host-model XML) against an equal or older CPU model that does *not* present CSSKE as disabled in the XML will be reported as incompatible. The response should report "identical" or "superset", because technically the hardware still supports the feature. A possible solution is to modify the hypervisor-cpu-comparison command to query the host-model via expansion to obtain the proper hypervisor CPU model, as opposed to using libvirt's modified definition.

Second, let's discuss how to report the deprecated features, namely via the introduction of a new QEMU QMP response. This would be a long-term approach that allows a user to query a list of deprecated features for a particular architecture. A list would be kept within QEMU that contains all deprecated CPU features. This allows the retention of CPU model definitions within QEMU. Libvirt may query this list and update the host-model definition to disable the features reported by QEMU.
QEMU QMP Response example:

    { "execute": "query-cpu-model-deprecated-features" }
    { "return": { "props": [ { "name": "csske" },
                             { "name": "feat_a" },
                             { "name": "feat_b" } ] } }

libvirt pseudo:

    if query_deprecated_features is supported
        list = query_deprecated_features()
        for each feat in list
            set feat to disabled for host-model

Then, any new features that are flagged for deprecation in the future may simply be added to this "deprecated features" list in QEMU alongside a new CPU definition.

Please let me know your thoughts on these approaches. All input is welcome. Thanks.

Collin Walling (1):
  qemu: capabilities: disable csske for host cpu

 src/qemu/qemu_capabilities.c               | 10 ++++++++++
 tests/domaincapsdata/qemu_2.11.0.s390x.xml |  1 +
 tests/domaincapsdata/qemu_2.12.0.s390x.xml |  1 +
 tests/domaincapsdata/qemu_2.8.0.s390x.xml  |  1 +
 tests/domaincapsdata/qemu_2.9.0.s390x.xml  |  1 +
 tests/domaincapsdata/qemu_3.0.0.s390x.xml  |  1 +
 tests/domaincapsdata/qemu_4.2.0.s390x.xml  |  1 +
 tests/domaincapsdata/qemu_6.0.0.s390x.xml  |  1 +
 8 files changed, 17 insertions(+)

-- 
2.31.1
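As a rough illustration of the pseudo-code above (purely a sketch, not part of the attached patch): assuming a hypothetical qemuMonitorGetCPUModelDeprecatedFeatures() wrapper around the proposed QMP command, the libvirt side could reuse virCPUDefAddFeatureIfMissing(), the same helper the patch calls:

    /*
     * Hypothetical sketch (would live near virQEMUCapsInitHostCPUModel()
     * in src/qemu/qemu_capabilities.c): disable every feature that QEMU
     * reports as deprecated. 'deprecated' is an assumed NULL-terminated
     * string list returned by a wrapper around the proposed
     * query-cpu-model-deprecated-features command.
     */
    static int
    virQEMUCapsDisableDeprecatedFeatures(virCPUDef *cpu,
                                         char **deprecated)
    {
        size_t i;

        for (i = 0; deprecated && deprecated[i]; i++) {
            if (virCPUDefAddFeatureIfMissing(cpu, deprecated[i],
                                             VIR_CPU_FEATURE_DISABLE) < 0)
                return -1;
        }
        return 0;
    }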

CPU models past gen16a will no longer support the csske feature. In
order to secure migration of guests running on machines that still
support this feature to machines that do not, let's disable csske
in the host-model.

Signed-off-by: Collin Walling <walling@linux.ibm.com>
---
 src/qemu/qemu_capabilities.c               | 10 ++++++++++
 tests/domaincapsdata/qemu_2.11.0.s390x.xml |  1 +
 tests/domaincapsdata/qemu_2.12.0.s390x.xml |  1 +
 tests/domaincapsdata/qemu_2.8.0.s390x.xml  |  1 +
 tests/domaincapsdata/qemu_2.9.0.s390x.xml  |  1 +
 tests/domaincapsdata/qemu_3.0.0.s390x.xml  |  1 +
 tests/domaincapsdata/qemu_4.2.0.s390x.xml  |  1 +
 tests/domaincapsdata/qemu_6.0.0.s390x.xml  |  1 +
 8 files changed, 17 insertions(+)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index 1b28c3f161..6a65c81f81 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -3804,6 +3804,16 @@ virQEMUCapsInitHostCPUModel(virQEMUCaps *qemuCaps,
         goto error;
     }
 
+    if (ARCH_IS_S390(qemuCaps->arch)) {
+        /*
+         * The CSSKE feature will no longer be supported beyond gen16a.
+         * To protect migration, disable this feature ahead of time
+         * for all s390x CPU models.
+         */
+        if (virCPUDefAddFeatureIfMissing(cpu, "csske", VIR_CPU_FEATURE_DISABLE) < 0)
+            goto error;
+    }
+
     virQEMUCapsSetHostModel(qemuCaps, type, cpu, migCPU, fullCPU);
 
 cleanup:
diff --git a/tests/domaincapsdata/qemu_2.11.0.s390x.xml b/tests/domaincapsdata/qemu_2.11.0.s390x.xml
index 804bf8020e..f21efca122 100644
--- a/tests/domaincapsdata/qemu_2.11.0.s390x.xml
+++ b/tests/domaincapsdata/qemu_2.11.0.s390x.xml
@@ -61,6 +61,7 @@
       <feature policy='require' name='sea_esop2'/>
       <feature policy='require' name='te'/>
       <feature policy='require' name='cmm'/>
+      <feature policy='disable' name='csske'/>
     </mode>
     <mode name='custom' supported='yes'>
       <model usable='yes'>z890.2</model>
diff --git a/tests/domaincapsdata/qemu_2.12.0.s390x.xml b/tests/domaincapsdata/qemu_2.12.0.s390x.xml
index 5c3d9ce7db..9dc5d1396c 100644
--- a/tests/domaincapsdata/qemu_2.12.0.s390x.xml
+++ b/tests/domaincapsdata/qemu_2.12.0.s390x.xml
@@ -60,6 +60,7 @@
       <feature policy='require' name='sea_esop2'/>
       <feature policy='require' name='te'/>
       <feature policy='require' name='cmm'/>
+      <feature policy='disable' name='csske'/>
     </mode>
     <mode name='custom' supported='yes'>
       <model usable='yes'>z890.2</model>
diff --git a/tests/domaincapsdata/qemu_2.8.0.s390x.xml b/tests/domaincapsdata/qemu_2.8.0.s390x.xml
index 2c075d7cdb..857cb1ad5b 100644
--- a/tests/domaincapsdata/qemu_2.8.0.s390x.xml
+++ b/tests/domaincapsdata/qemu_2.8.0.s390x.xml
@@ -48,6 +48,7 @@
       <feature policy='require' name='cte'/>
       <feature policy='require' name='te'/>
       <feature policy='require' name='cmm'/>
+      <feature policy='disable' name='csske'/>
     </mode>
     <mode name='custom' supported='yes'>
       <model usable='unknown'>z10EC-base</model>
diff --git a/tests/domaincapsdata/qemu_2.9.0.s390x.xml b/tests/domaincapsdata/qemu_2.9.0.s390x.xml
index d5b58a786d..2e1ba62dc0 100644
--- a/tests/domaincapsdata/qemu_2.9.0.s390x.xml
+++ b/tests/domaincapsdata/qemu_2.9.0.s390x.xml
@@ -49,6 +49,7 @@
       <feature policy='require' name='cte'/>
       <feature policy='require' name='te'/>
       <feature policy='require' name='cmm'/>
+      <feature policy='disable' name='csske'/>
     </mode>
     <mode name='custom' supported='yes'>
       <model usable='unknown'>z10EC-base</model>
diff --git a/tests/domaincapsdata/qemu_3.0.0.s390x.xml b/tests/domaincapsdata/qemu_3.0.0.s390x.xml
index f49b6907ff..1b6f64e69f 100644
--- a/tests/domaincapsdata/qemu_3.0.0.s390x.xml
+++ b/tests/domaincapsdata/qemu_3.0.0.s390x.xml
@@ -64,6 +64,7 @@
       <feature policy='require' name='sea_esop2'/>
       <feature policy='require' name='te'/>
       <feature policy='require' name='cmm'/>
+      <feature policy='disable' name='csske'/>
     </mode>
     <mode name='custom' supported='yes'>
       <model usable='yes'>z890.2</model>
diff --git a/tests/domaincapsdata/qemu_4.2.0.s390x.xml b/tests/domaincapsdata/qemu_4.2.0.s390x.xml
index fb162ea578..b41929b585 100644
--- a/tests/domaincapsdata/qemu_4.2.0.s390x.xml
+++ b/tests/domaincapsdata/qemu_4.2.0.s390x.xml
@@ -81,6 +81,7 @@
       <feature policy='require' name='sea_esop2'/>
       <feature policy='require' name='te'/>
       <feature policy='require' name='cmm'/>
+      <feature policy='disable' name='csske'/>
     </mode>
     <mode name='custom' supported='yes'>
       <model usable='yes'>z800-base</model>
diff --git a/tests/domaincapsdata/qemu_6.0.0.s390x.xml b/tests/domaincapsdata/qemu_6.0.0.s390x.xml
index 13fa3a637e..da4017541d 100644
--- a/tests/domaincapsdata/qemu_6.0.0.s390x.xml
+++ b/tests/domaincapsdata/qemu_6.0.0.s390x.xml
@@ -84,6 +84,7 @@
       <feature policy='require' name='sea_esop2'/>
       <feature policy='require' name='te'/>
       <feature policy='require' name='cmm'/>
+      <feature policy='disable' name='csske'/>
     </mode>
     <mode name='custom' supported='yes'>
       <model usable='yes'>z800-base</model>
-- 
2.31.1

On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
CPU models past gen16a will no longer support the csske feature. In order to secure migration of guests running on machines that still support this feature to machines that do not, let's disable csske in the host-model.
The problem scenario you describe is the intended semantics of host-model though. It enables all features available in the host that you launched on. It lets you live migrate to a target host with the same, or a greater number of features. If the target has a greater number of features, it should restrict the VM to the subset of features that were present on the original source CPU. If the target has fewer features, then you simply can't live migrate a VM using host-model.

To get live migration in both directions across CPUs with differing featuresets, then the VM needs to be configured with a named CPU model that is a subset of both, rather than host-model.

With regards,
Daniel

On 11.03.22 10:17, Daniel P. Berrangé wrote:
On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
CPU models past gen16a will no longer support the csske feature. In order to secure migration of guests running on machines that still support this feature to machines that do not, let's disable csske in the host-model.
Sorry to say, removing CPU features is a no-go when wanting to guarantee forward migration without taking care about CPU model details manually and simply using the host model. Self-made HW vendor problem.
The problem scenario you describe is the intended semantics of host-model though. It enables all features available in the host that you launched on. It lets you live migrate to a target host with the same, or a greater number of features. If the target has a greater number of features, it should restrict the VM to the subset of features that were present on the original source CPU. If the target has fewer features, then you simply can't live migrate a VM using host-model.
To get live migration in both directions across CPUs with differing featuresets, then the VM needs to be configured with a named CPU model that is a subset of both, rather than host-model.
Right, and cpu-model-baseline does that job for you if you're lazy to lookup the proper model.

Thanks,
David / dhildenb
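For reference, libvirt exposes that baseline operation through virsh. An illustrative invocation (input file name and resulting model are made up for the example):

    # compute a CPU model that both hosts support, from their two
    # <cpu> definitions collected in cpus.xml
    $ virsh hypervisor-cpu-baseline cpus.xml
    <cpu mode='custom' match='exact'>
      <model fallback='forbid'>z13.2-base</model>
      ...
    </cpu>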

Am 11.03.22 um 10:23 schrieb David Hildenbrand:
On 11.03.22 10:17, Daniel P. Berrangé wrote:
On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
CPU models past gen16a will no longer support the csske feature. In order to secure migration of guests running on machines that still support this feature to machines that do not, let's disable csske in the host-model.
Sorry to say, removing CPU features is a no-go when wanting to guarantee forward migration without taking care about CPU model details manually and simply using the host model. Self-made HW vendor problem.
And this simply does not reflect reality. Intel and Power have removed TX for example. We can now sit back and please ourselves how we live in our world of dreams. Or we can try to define an interface that deals with reality and actually solves problems.
The problem scenario you describe is the intended semantics of host-model though. It enables all features available in the host that you launched on. It lets you live migrate to a target host with the same, or a greater number of features. If the target has a greater number of features, it should restrict the VM to the subset of features that were present on the original source CPU. If the target has fewer features, then you simply can't live migrate a VM using host-model.
To get live migration in both directions across CPUs with differing featuresets, then the VM needs to be configured with a named CPU model that is a subset of both, rather than host-model.
Right, and cpu-model-baseline does that job for you if you're lazy to lookup the proper model.
Yes, baseline will work, but this requires tooling like openstack. The normal user will just use the default, and this is host-model.

Let me explain the usecase for this feature. Migration between different versions:

  baseline: always works
  host-passthrough: you get what you deserve
  default model: works
    We have disabled CSSKE from our default models (-cpu gen15a will not
    present csske), so that works as well.
  host-model: Also works for all machines that have csske.

Now: let's say gen17 will no longer support this. That means that we cannot migrate host-model from gen16 or gen15, because those will have csske. What options do we have? If we disable csske in the host capabilities, that would mean that a host compare against an XML from an older QEMU would fail (even if you move from gen14 to gen14). So this is not a good option.

By disabling deprecated features ONLY for the _initial_ expansion of host-model, but keeping them in the host capabilities, you can migrate existing guests (with the feature), as we only disable in the expansion, but manually asking for it still works. AND it will allow moving this instantiation of the guest to future machines without the feature. Basically everything works.

The alternative of removing csske would result in too many failure scenarios.
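To illustrate the idea with assumed, abridged XML: the host capabilities keep advertising the feature, while a guest freshly defined with host-model gets it pinned off at the initial expansion:

    domcapabilities (unchanged, csske still available on this host):

      <mode name='host-model' supported='yes'>
        <model fallback='forbid'>z13.2-base</model>
        <feature policy='require' name='csske'/>
        ...

    live guest XML after the initial expansion of host-model:

      <cpu mode='custom' match='exact'>
        <model fallback='forbid'>z13.2-base</model>
        <feature policy='disable' name='csske'/>
        ...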

On 11.03.22 13:12, Christian Borntraeger wrote:
Am 11.03.22 um 10:23 schrieb David Hildenbrand:
On 11.03.22 10:17, Daniel P. Berrangé wrote:
On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
CPU models past gen16a will no longer support the csske feature. In order to secure migration of guests running on machines that still support this feature to machines that do not, let's disable csske in the host-model.
Sorry to say, removing CPU features is a no-go when wanting to guarantee forward migration without taking care about CPU model details manually and simply using the host model. Self-made HW vendor problem.
And this simply does not reflect reality. Intel and Power have removed TX for example. We can now sit back and please ourselves how we live in our world of dreams. Or we can try to define an interface that deals with reality and actually solves problems.
Ehm, so, I spell out the obvious and get such a reaction? Okay, thank you.

See my other reply, maybe we want a different kind of "host-model" from QEMU.

It's Friday and I'm not particularly motivated to participate further in this discussion today. So I'm going to step away for today, please myself and live in a world of dreams.

Thanks,
David / dhildenb

Am 11.03.22 um 13:27 schrieb David Hildenbrand:
On 11.03.22 13:12, Christian Borntraeger wrote:
Am 11.03.22 um 10:23 schrieb David Hildenbrand:
On 11.03.22 10:17, Daniel P. Berrangé wrote:
On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
CPU models past gen16a will no longer support the csske feature. In order to secure migration of guests running on machines that still support this feature to machines that do not, let's disable csske in the host-model.
Sorry to say, removing CPU features is a no-go when wanting to guarantee forward migration without taking care about CPU model details manually and simply using the host model. Self-made HW vendor problem.
And this simply does not reflect reality. Intel and Power have removed TX for example. We can now sit back and please ourselves how we live in our world of dreams. Or we can try to define an interface that deals with reality and actually solves problems.
Ehm, so, I spell out the obvious and get such a reaction? Okay, thank you.
Sorry, reading my writing again shows that I clearly miscommunicated in a very bad style. My point was rather about trying to solve a problem; instead I wrote something up in a hurry, which resulted in something offensive. Please accept my apologies.

On 11.03.22 13:54, Christian Borntraeger wrote:
Am 11.03.22 um 13:27 schrieb David Hildenbrand:
On 11.03.22 13:12, Christian Borntraeger wrote:
Am 11.03.22 um 10:23 schrieb David Hildenbrand:
On 11.03.22 10:17, Daniel P. Berrangé wrote:
On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
CPU models past gen16a will no longer support the csske feature. In order to secure migration of guests running on machines that still support this feature to machines that do not, let's disable csske in the host-model.
Sorry to say, removing CPU features is a no-go when wanting to guarantee forward migration without taking care about CPU model details manually and simply using the host model. Self-made HW vendor problem.
And this simply does not reflect reality. Intel and Power have removed TX for example. We can now sit back and please ourselves how we live in our world of dreams. Or we can try to define an interface that deals with reality and actually solves problems.
Ehm, so, I spell out the obvious and get such a reaction? Okay, thank you.
Sorry, reading my writing again shows that I clearly miscommunicated in a very bad style. My point was rather about trying to solve a problem; instead I wrote something up in a hurry, which resulted in something offensive.
Please accept my apologies.
No hard feelings, I understand that this is an important thing to sort out for IBM.

Thanks,
David / dhildenb

On Fri, Mar 11, 2022 at 01:12:35PM +0100, Christian Borntraeger wrote:
Am 11.03.22 um 10:23 schrieb David Hildenbrand:
On 11.03.22 10:17, Daniel P. Berrangé wrote:
On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
CPU models past gen16a will no longer support the csske feature. In order to secure migration of guests running on machines that still support this feature to machines that do not, let's disable csske in the host-model.
Sorry to say, removing CPU features is a no-go when wanting to guarantee forward migration without taking care about CPU model details manually and simply using the host model. Self-made HW vendor problem.
And this simply does not reflect reality. Intel and Power have removed TX for example. We can now sit back and please ourselves how we live in our world of dreams. Or we can try to define an interface that deals with reality and actually solves problems.
This proposal wouldn't have helped in the case of Intel removing TSX, because it was removed without prior warning in the middle of the product lifecycle. At that time there were already millions of VMs in existence using the removed feature.
The problem scenario you describe is the intended semantics of host-model though. It enables all features available in the host that you launched on. It lets you live migrate to a target host with the same, or a greater number of features. If the target has a greater number of features, it should restrict the VM to the subset of features that were present on the original source CPU. If the target has fewer features, then you simply can't live migrate a VM using host-model.
To get live migration in both directions across CPUs with differing featuresets, then the VM needs to be configured with a named CPU model that is a subset of both, rather than host-model.
Right, and cpu-model-baseline does that job for you if you're lazy to lookup the proper model.
Yes baseline will work, but this requires tooling like openstack. The normal user will just use the default and this is host-model.
Let me explain the usecase for this feature. Migration between different versions:

  baseline: always works
  host-passthrough: you get what you deserve
  default model: works
    We have disabled CSSKE from our default models (-cpu gen15a will not
    present csske), so that works as well.
  host-model: Also works for all machines that have csske.

Now: let's say gen17 will no longer support this. That means that we cannot migrate host-model from gen16 or gen15, because those will have csske. What options do we have? If we disable csske in the host capabilities, that would mean that a host compare against an XML from an older QEMU would fail (even if you move from gen14 to gen14). So this is not a good option.
By disabling deprecated features ONLY for the _initial_ expansion of host-model, but keeping them in the host capabilities, you can migrate existing guests (with the feature), as we only disable in the expansion, but manually asking for it still works. AND it will allow moving this instantiation of the guest to future machines without the feature. Basically everything works.
The change you propose works functionally, but none the less it is changing the semantics of host-model. It is defined to expose all the features in the host, and the proposal changes that. If an app actually /wants/ to use the deprecated feature and it exists in the host, then host-model should be allowing that as it does today.

The problem scenario you describe is ultimately that OpenStack does not have a future proof default CPU choice. Libvirt and QEMU provide a mechanism for them to pick other CPU models that would address the problem, but they're not using that. The challenge is that OpenStack defaults currently are a zero-interaction thing.

They could retain their zero-interaction defaults, if at install time they queried the libvirt capabilities to learn which named CPU models are available, whereupon they could decide to use gen15a. The main challenge here is that the list of named CPU models is an unordered set, so it is hard to programmatically figure out which of the available named CPU models is the newest/best/recommended.

IOW, what's missing is a way for apps to easily identify that 'gen15a' is the best CPU to use on the host, without needing human interaction.

Regards,
Daniel
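For illustration, what an application sees today in domcapabilities is just a flat list of usable models with no ordering information (abridged composite of the test data above):

    <mode name='custom' supported='yes'>
      <model usable='yes'>z800-base</model>
      <model usable='yes'>z890.2</model>
      <model usable='unknown'>z10EC-base</model>
      <model usable='yes'>gen15a</model>
      ...

Nothing in that list tells the application which entry is the newest or preferred one.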

On Fri, Mar 11, 2022 at 12:37:46PM +0000, Daniel P. Berrangé wrote:
On Fri, Mar 11, 2022 at 01:12:35PM +0100, Christian Borntraeger wrote:
Am 11.03.22 um 10:23 schrieb David Hildenbrand:
On 11.03.22 10:17, Daniel P. Berrangé wrote:
On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
CPU models past gen16a will no longer support the csske feature. In order to secure migration of guests running on machines that still support this feature to machines that do not, let's disable csske in the host-model.
Sorry to say, removing CPU features is a no-go when wanting to guarantee forward migration without taking care about CPU model details manually and simply using the host model. Self-made HW vendor problem.
And this simply does not reflect reality. Intel and Power have removed TX for example. We can now sit back and please ourselves how we live in our world of dreams. Or we can try to define an interface that deals with reality and actually solves problems.
This proposal wouldn't have helped in the case of Intel removing TSX, because it was removed without prior warning in the middle of the product lifecycle. At that time there were already millions of VMs in existence using the removed feature.
The problem scenario you describe is the intended semantics of host-model though. It enables all features available in the host that you launched on. It lets you live migrate to a target host with the same, or a greater number of features. If the target has a greater number of features, it should restrict the VM to the subset of features that were present on the original source CPU. If the target has fewer features, then you simply can't live migrate a VM using host-model.
To get live migration in both directions across CPUs with differing featuresets, then the VM needs to be configured with a named CPU model that is a subset of both, rather than host-model.
Right, and cpu-model-baseline does that job for you if you're lazy to lookup the proper model.
Yes baseline will work, but this requires tooling like openstack. The normal user will just use the default and this is host-model.
Let me explain the usecase for this feature. Migration between different versions:

  baseline: always works
  host-passthrough: you get what you deserve
  default model: works
    We have disabled CSSKE from our default models (-cpu gen15a will not
    present csske), so that works as well.
  host-model: Also works for all machines that have csske.

Now: let's say gen17 will no longer support this. That means that we cannot migrate host-model from gen16 or gen15, because those will have csske. What options do we have? If we disable csske in the host capabilities, that would mean that a host compare against an XML from an older QEMU would fail (even if you move from gen14 to gen14). So this is not a good option.
By disabling deprecated features ONLY for the _initial_ expansion of host-model, but keeping them in the host capabilities, you can migrate existing guests (with the feature), as we only disable in the expansion, but manually asking for it still works. AND it will allow moving this instantiation of the guest to future machines without the feature. Basically everything works.
The change you propose works functionally, but none the less it is changing the semantics of host-model. It is defined to expose all the features in the host, and the proposal changes that. If an app actually /wants/ to use the deprecated feature and it exists in the host, then host-model should be allowing that as it does today.
The problem scenario you describe is ultimately that OpenStack does not have a future proof default CPU choice. Libvirt and QEMU provide a mechanism for them to pick other CPU models that would address the problem, but they're not using that. The challenge is that OpenStack defaults currently are a zero-interaction thing.
They could retain their zero-interaction defaults, if at install time they queried the libvirt capabilities to learn which named CPU models are available, whereupon they could decide to use gen15a. The main challenge here is that the list of named CPU models is an unordered set, so it is hard to programmatically figure out which of the available named CPU models is the newest/best/recommended.
IOW, what's missing is a way for apps to easily identify that 'gen15a' is the best CPU to use on the host, without needing human interaction.
I think this could be solved with a change to query-cpu-definitions in QEMU, to add an extra 'recommended: bool' attribute to the CpuDefinitionInfo struct. This would be defined to be only set for 1 CPU model in the list, and would reflect the recommended CPU model given the current version of QEMU, kernel and hardware. Or we could allow 'recommended' to be set for more than 1 CPU, provided we define an explicit ordering of returned CPU models.

OpenStack can query this when first launching a guest, and remember it for that guest on future boots.

This makes guests robust against QEMU changing its recommendation over time. For example, when it became clear that "TSX" was going to be removed, QEMU could have switched to recommending one of the Intel no-TSX CPU model variants, but existing guests wouldn't be affected.

Regards,
Daniel
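For illustration, the extra attribute might surface in the existing query-cpu-definitions reply roughly like this (abridged; 'recommended' is the hypothetical addition, the other fields already exist in CpuDefinitionInfo):

    { "execute": "query-cpu-definitions" }
    { "return": [
        { "name": "gen15a", "typename": "gen15a-s390x-cpu",
          "static": false, "migration-safe": true,
          "recommended": true },
        { "name": "z14", "typename": "z14-s390x-cpu",
          "static": false, "migration-safe": true },
        ... ] }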

Am 11.03.22 um 14:08 schrieb Daniel P. Berrangé:
On Fri, Mar 11, 2022 at 12:37:46PM +0000, Daniel P. Berrangé wrote:
On Fri, Mar 11, 2022 at 01:12:35PM +0100, Christian Borntraeger wrote:
Am 11.03.22 um 10:23 schrieb David Hildenbrand:
On 11.03.22 10:17, Daniel P. Berrangé wrote:
On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
CPU models past gen16a will no longer support the csske feature. In order to secure migration of guests running on machines that still support this feature to machines that do not, let's disable csske in the host-model.
Sorry to say, removing CPU features is a no-go when wanting to guarantee forward migration without taking care about CPU model details manually and simply using the host model. Self-made HW vendor problem.
And this simply does not reflect reality. Intel and Power have removed TX for example. We can now sit back and please ourselves how we live in our world of dreams. Or we can try to define an interface that deals with reality and actually solves problems.
This proposal wouldn't have helped in the case of Intel removing TSX, because it was removed without prior warning in the middle of the product lifecycle. At that time there were already millions of VMs in existence using the removed feature.
The problem scenario you describe is the intended semantics of host-model though. It enables all features available in the host that you launched on. It lets you live migrate to a target host with the same, or a greater number of features. If the target has a greater number of features, it should restrict the VM to the subset of features that were present on the original source CPU. If the target has fewer features, then you simply can't live migrate a VM using host-model.
To get live migration in both directions across CPUs with differing featuresets, then the VM needs to be configured with a named CPU model that is a subset of both, rather than host-model.
Right, and cpu-model-baseline does that job for you if you're lazy to lookup the proper model.
Yes baseline will work, but this requires tooling like openstack. The normal user will just use the default and this is host-model.
Let me explain the usecase for this feature. Migration between different versions:

  baseline: always works
  host-passthrough: you get what you deserve
  default model: works
    We have disabled CSSKE from our default models (-cpu gen15a will not
    present csske), so that works as well.
  host-model: Also works for all machines that have csske.

Now: let's say gen17 will no longer support this. That means that we cannot migrate host-model from gen16 or gen15, because those will have csske. What options do we have? If we disable csske in the host capabilities, that would mean that a host compare against an XML from an older QEMU would fail (even if you move from gen14 to gen14). So this is not a good option.
By disabling deprecated features ONLY for the _initial_ expansion of host-model, but keeping them in the host capabilities, you can migrate existing guests (with the feature), as we only disable in the expansion, but manually asking for it still works. AND it will allow moving this instantiation of the guest to future machines without the feature. Basically everything works.
The change you propose works functionally, but none the less it is changing the semantics of host-model. It is defined to expose all the features in the host, and the proposal changes that. If an app actually /wants/ to use the deprecated feature and it exists in the host, then host-model should be allowing that as it does today.
The problem scenario you describe is ultimately that OpenStack does not have a future proof default CPU choice. Libvirt and QEMU provide a mechanism for them to pick other CPU models that would address the problem, but they're not using that. The challenge is that OpenStack defaults currently are a zero-interaction thing.
They could retain their zero-interaction defaults, if at install time they queried the libvirt capabilities to learn which named CPU models are available, whereupon they could decide to use gen15a. The main challenge here is that the list of named CPU models is an unordered set, so it is hard to programmatically figure out which of the available named CPU models is the newest/best/recommended.
IOW, what's missing is a way for apps to easily identify that 'gen15a' is the best CPU to use on the host, without needing human interaction.
I think this could be solved with a change to query-cpu-definitions in QEMU, to add an extra 'recommended: bool' attribute to the CpuDefinitionInfo struct. This would be defined to be only set for 1 CPU model in the list, and would reflect the recommended CPU model given the current version of QEMU, kernel and hardware. Or we could allow 'recommended' to be set for more than 1 CPU, provided we define an explicit ordering of returned CPU models.
I like the recommended: bool attribute. It should provide what we need. Would you then also suggest using this for host-model, or only for a new type like "host-recommended"?
OpenStack can query this when first launching a guest, and remember it for that guest on future boots.
This makes guests robust against QEMU changing its recommendation over time. For example, when it became clear that "TSX" was going to be removed, QEMU could have switched to recommending one of the Intel no-TSX CPU model variants, but existing guests wouldn't be affected.

On Fri, Mar 11, 2022 at 03:52:57PM +0100, Christian Borntraeger wrote:
Am 11.03.22 um 14:08 schrieb Daniel P. Berrangé:
On Fri, Mar 11, 2022 at 12:37:46PM +0000, Daniel P. Berrangé wrote:
On Fri, Mar 11, 2022 at 01:12:35PM +0100, Christian Borntraeger wrote:
Am 11.03.22 um 10:23 schrieb David Hildenbrand:
On 11.03.22 10:17, Daniel P. Berrangé wrote:
On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
> CPU models past gen16a will no longer support the csske feature. In
> order to secure migration of guests running on machines that still
> support this feature to machines that do not, let's disable csske
> in the host-model.
Sorry to say, removing CPU features is a no-go when wanting to guarantee forward migration without taking care about CPU model details manually and simply using the host model. Self-made HW vendor problem.
And this simply does not reflect reality. Intel and Power have removed TX for example. We can now sit back and please ourselves how we live in our world of dreams. Or we can try to define an interface that deals with reality and actually solves problems.
This proposal wouldn't have helped in the case of Intel removing TSX, because it was removed without prior warning in the middle of the product lifecycle. At that time there were already millions of VMs in existence using the removed feature.
The problem scenario you describe is the intended semantics of host-model though. It enables all features available in the host that you launched on. It lets you live migrate to a target host with the same, or a greater number of features. If the target has a greater number of features, it should restrict the VM to the subset of features that were present on the original source CPU. If the target has fewer features, then you simply can't live migrate a VM using host-model.
To get live migration in both directions across CPUs with differing featuresets, then the VM needs to be configured with a named CPU model that is a subset of both, rather than host-model.
Right, and cpu-model-baseline does that job for you if you're lazy to lookup the proper model.
Yes baseline will work, but this requires tooling like openstack. The normal user will just use the default and this is host-model.
Let me explain the usecase for this feature. Migration between different versions:

  baseline: always works
  host-passthrough: you get what you deserve
  default model: works
    We have disabled CSSKE from our default models (-cpu gen15a will not
    present csske), so that works as well.
  host-model: Also works for all machines that have csske.

Now: let's say gen17 will no longer support this. That means that we cannot migrate host-model from gen16 or gen15, because those will have csske. What options do we have? If we disable csske in the host capabilities, that would mean that a host compare against an XML from an older QEMU would fail (even if you move from gen14 to gen14). So this is not a good option.
By disabling deprecated features ONLY for the _initial_ expansion of host-model, but keeping them in the host capabilities, you can migrate existing guests (with the feature), as we only disable in the expansion, but manually asking for it still works. AND it will allow moving this instantiation of the guest to future machines without the feature. Basically everything works.
The change you propose works functionally, but none the less it is changing the semantics of host-model. It is defined to expose all the features in the host, and the proposal changes that. If an app actually /wants/ to use the deprecated feature and it exists in the host, then host-model should be allowing that as it does today.
The problem scenario you describe is ultimately that OpenStack does not have a future proof default CPU choice. Libvirt and QEMU provide a mechanism for them to pick other CPU models that would address the problem, but they're not using that. The challenge is that OpenStack defaults currently are a zero-interaction thing.
They could retain their zero-interaction defaults, if at install time they queried the libvirt capabilities to learn which named CPU models are available, whereupon they could decide to use gen15a. The main challenge here is that the list of named CPU models is an unordered set, so it is hard to programmatically figure out which of the available named CPU models is the newest/best/recommended.
IOW, what's missing is a way for apps to easily identify that 'gen15a' is the best CPU to use on the host, without needing human interaction.
I think this could be solved with a change to query-cpu-definitions in QEMU, to add an extra 'recommended: bool' attribute to the CpuDefinitionInfo struct. This would be defined to be only set for 1 CPU model in the list, and would reflect the recommended CPU model given the current version of QEMU, kernel and hardware. Or we could allow 'recommended' to be set for more than 1 CPU, provided we define an explicit ordering of returned CPU models.
I like the recommended: bool attribute. It should provide what we need.
Would you then also suggest using this for host-model, or only for a new type like "host-recommended"?
Neither of those. Libvirt would simply report this attribute in the information it exposes about CPUs.

OpenStack would explicitly extract this and set it in the XML for the guest, so that each guest's view of "recommended" is fixed from the time that guest is first created, rather than potentially changing on each later boot.

Regards,
Daniel
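In libvirt terms, that could surface, for example, as a hypothetical attribute on the models already listed under the custom mode in domcapabilities:

    <mode name='custom' supported='yes'>
      <model usable='yes' recommended='yes'>gen15a</model>
      <model usable='yes'>z14</model>
      ...

A management app would then copy the marked model into the guest definition once, at creation time, so later changes to the recommendation do not affect existing guests.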

Am 11.03.22 um 15:56 schrieb Daniel P. Berrangé:
On Fri, Mar 11, 2022 at 03:52:57PM +0100, Christian Borntraeger wrote:
Am 11.03.22 um 14:08 schrieb Daniel P. Berrangé:
On Fri, Mar 11, 2022 at 12:37:46PM +0000, Daniel P. Berrangé wrote:
On Fri, Mar 11, 2022 at 01:12:35PM +0100, Christian Borntraeger wrote:
Am 11.03.22 um 10:23 schrieb David Hildenbrand:
On 11.03.22 10:17, Daniel P. Berrangé wrote:
> On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
>> CPU models past gen16a will no longer support the csske feature. In
>> order to secure migration of guests running on machines that still
>> support this feature to machines that do not, let's disable csske
>> in the host-model.
Sorry to say, removing CPU features is a no-go when wanting to guarantee forward migration without taking care about CPU model details manually and simply using the host model. Self-made HW vendor problem.
And this simply does not reflect reality. Intel and Power have removed TX for example. We can now sit back and please ourselves how we live in our world of dreams. Or we can try to define an interface that deals with reality and actually solves problems.
This proposal wouldn't have helped in the case of Intel removing TSX, because it was removed without prior warning in the middle of the product lifecycle. At that time there were already millions of VMs in existence using the removed feature.
> The problem scenario you describe is the intended semantics of
> host-model though. It enables all features available in the host
> that you launched on. It lets you live migrate to a target host
> with the same, or a greater number of features. If the target has
> a greater number of features, it should restrict the VM to the
> subset of features that were present on the original source CPU.
> If the target has fewer features, then you simply can't live
> migrate a VM using host-model.
>
> To get live migration in both directions across CPUs with differing
> featuresets, then the VM needs to be configured with a named CPU
> model that is a subset of both, rather than host-model.
Right, and cpu-model-baseline does that job for you if you're lazy to lookup the proper model.
Yes baseline will work, but this requires tooling like openstack. The normal user will just use the default and this is host-model.
Let me explain the usecase for this feature. Migration between different versions:

  baseline: always works
  host-passthrough: you get what you deserve
  default model: works
    We have disabled CSSKE from our default models (-cpu gen15a will not
    present csske), so that works as well.
  host-model: Also works for all machines that have csske.

Now: let's say gen17 will no longer support this. That means that we cannot migrate host-model from gen16 or gen15, because those will have csske. What options do we have? If we disable csske in the host capabilities, that would mean that a host compare against an XML from an older QEMU would fail (even if you move from gen14 to gen14). So this is not a good option.
By disabling deprecated features ONLY for the _initial_ expansion of host-model, but keeping them in the host capabilities, you can migrate existing guests (with the feature), as we only disable in the expansion, but manually asking for it still works. AND it will allow moving this instantiation of the guest to future machines without the feature. Basically everything works.
The change you propose works functionally, but none the less it is changing the semantics of host-model. It is defined to expose all the features in the host, and the proposal changes that. If an app actually /wants/ to use the deprecated feature and it exists in the host, then host-model should be allowing that as it does today.
The problem scenario you describe is ultimately that OpenStack does not have a future proof default CPU choice. Libvirt and QEMU provide a mechanism for them to pick other CPU models that would address the problem, but they're not using that. The challenge is that OpenStack defaults currently are a zero-interaction thing.
They could retain their zero-interaction defaults, if at install time they queried the libvirt capabilities to learn which named CPU models are available, whereupon they could decide to use gen15a. The main challenge here is that the list of named CPU models is an unordered set, so it is hard to programmatically figure out which of the available named CPU models is the newest/best/recommended.
IOW, what's missing is a way for apps to easily identify that 'gen15a' is the best CPU to use on the host, without needing human interaction.
I think this could be solved with a change to query-cpu-definitions in QEMU, to add an extra 'recommended: bool' attribute to the CpuDefinitionInfo struct. This would be defined to be only set for 1 CPU model in the list, and would reflect the recommended CPU model given the current version of QEMU, kernel and hardware. Or we could allow 'recommended' to be set for more than 1 CPU, provided we define an explicit ordering of returned CPU models.
I like the recommended: bool attribute. It should provide what we need.
Would you then also suggest using this for host-model, or only for a new type like "host-recommended"?
Neither of those. Libvirt would simply report this attribute in the information it exposes about CPUs.
OpenStack would explicitly extract this and set it in the XML for the guest, so that each guest's view of "recommended" is fixed from the time that guest is first created, rather than potentially changing on each later boot.
OpenStack is one thing, but I think this flag would really be useful for instantiation without OpenStack.

Am 11.03.22 um 16:24 schrieb Christian Borntraeger:
Am 11.03.22 um 15:56 schrieb Daniel P. Berrangé:
On Fri, Mar 11, 2022 at 03:52:57PM +0100, Christian Borntraeger wrote:
Am 11.03.22 um 14:08 schrieb Daniel P. Berrangé:
On Fri, Mar 11, 2022 at 12:37:46PM +0000, Daniel P. Berrangé wrote:
On Fri, Mar 11, 2022 at 01:12:35PM +0100, Christian Borntraeger wrote:
Am 11.03.22 um 10:23 schrieb David Hildenbrand:
> On 11.03.22 10:17, Daniel P. Berrangé wrote:
>> On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
>>> CPU models past gen16a will no longer support the csske feature. In
>>> order to secure migration of guests running on machines that still
>>> support this feature to machines that do not, let's disable csske
>>> in the host-model.
>
> Sorry to say, removing CPU features is a no-go when wanting to guarantee
> forward migration without taking care about CPU model details manually
> and simply using the host model. Self-made HW vendor problem.
And this simply does not reflect reality. Intel and Power have removed TX for example. We can now sit back and please ourselves how we live in our world of dreams. Or we can try to define an interface that deals with reality and actually solves problems.
This proposal wouldn't have helped in the case of Intel removing TSX, because it was removed without prior warning in the middle of the product lifecycle. At that time there were already millions of VMs in existence using the removed feature.
>> The problem scenario you describe is the intended semantics of
>> host-model though. It enables all features available in the host
>> that you launched on. It lets you live migrate to a target host
>> with the same, or a greater number of features. If the target has
>> a greater number of features, it should restrict the VM to the
>> subset of features that were present on the original source CPU.
>> If the target has fewer features, then you simply can't live
>> migrate a VM using host-model.
>>
>> To get live migration in both directions across CPUs with differing
>> featuresets, then the VM needs to be configured with a named CPU
>> model that is a subset of both, rather than host-model.
>
> Right, and cpu-model-baseline does that job for you if you're lazy to
> lookup the proper model.
Yes baseline will work, but this requires tooling like openstack. The normal user will just use the default and this is host-model.
Let me explain the usecase for this feature. Migration between different versions:

  baseline: always works
  host-passthrough: you get what you deserve
  default model: works
    We have disabled CSSKE from our default models (-cpu gen15a will not
    present csske), so that works as well.
  host-model: Also works for all machines that have csske.

Now: let's say gen17 will no longer support this. That means that we cannot migrate host-model from gen16 or gen15, because those will have csske. What options do we have? If we disable csske in the host capabilities, that would mean that a host compare against an XML from an older QEMU would fail (even if you move from gen14 to gen14). So this is not a good option.
By disabling deprecated features ONLY for the _initial_ expansion of host-model, but keeping them in the host capabilities, you can migrate existing guests (with the feature), as we only disable in the expansion, but manually asking for it still works. AND it will allow moving this instantiation of the guest to future machines without the feature. Basically everything works.
The change you propose works functionally, but none the less it is changing the semantics of host-model. It is defined to expose all the features in the host, and the proposal changes that. If an app actually /wants/ to use the deprecated feature and it exists in the host, then host-model should be allowing that as it does today.
The problem scenario you describe is ultimately that OpenStack does not have a future proof default CPU choice. Libvirt and QEMU provide a mechanism for them to pick other CPU models that would address the problem, but they're not using that. The challenge is that OpenStack defaults currently are a zero-interaction thing.
They could retain their zero-interaction defaults, if at install time they queried the libvirt capabilities to learn which named CPU models are available, whereupon they could decide to use gen15a. The main challenge here is that the list of named CPU models is an unordered set, so it is hard to programmatically figure out which of the available named CPU models is the newest/best/recommended.
IOW, what's missing is a way for apps to easily identify that 'gen15a' is the best CPU to use on the host, without needing human interaction.
I think this could be solved with a change to query-cpu-definitions in QEMU, to add an extra 'recommended: bool' attribute to the CpuDefinitionInfo struct. This would be defined to be only set for 1 CPU model in the list, and would reflect the recommended CPU model given the current version of QEMU, kernel and hardware. Or we could allow 'recommended' to be set for more than 1 CPU, provided we define an explicit ordering of returned CPU models.
I like the recommended: bool attribute. It should provide what we need.
Would you then also suggest using this for host-model, or only for a new type like "host-recommended"?
Neither of those. Libvirt would simply report this attribute in the information it exposes about CPUs.
OpenStack would explicitly extract this and set it in the XML for the guest, so that each guest's view of "recommended" is fixed from the time that guest is first created, rather than potentially changing on each later boot.
OpenStack is one thing, but I think this flag would really be useful for instantiation without OpenStack.
To make things more clear: I would like to have a way where a virsh start of a guest XML without a CPU model would work for migration in as many scenarios as possible. And if the default model (today host-model) would ignore features that are not recommended, this would be perfect.

On Fri, Mar 11, 2022 at 04:31:49PM +0100, Christian Borntraeger wrote:
Am 11.03.22 um 16:24 schrieb Christian Borntraeger:
Am 11.03.22 um 15:56 schrieb Daniel P. Berrangé:
On Fri, Mar 11, 2022 at 03:52:57PM +0100, Christian Borntraeger wrote:
Am 11.03.22 um 14:08 schrieb Daniel P. Berrangé:
On Fri, Mar 11, 2022 at 12:37:46PM +0000, Daniel P. Berrangé wrote:
On Fri, Mar 11, 2022 at 01:12:35PM +0100, Christian Borntraeger wrote:
>
>
> Am 11.03.22 um 10:23 schrieb David Hildenbrand:
> > On 11.03.22 10:17, Daniel P. Berrangé wrote:
> > > On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
> > > > CPU models past gen16a will no longer support the csske feature. In
> > > > order to secure migration of guests running on machines that still
> > > > support this feature to machines that do not, let's disable csske
> > > > in the host-model.
> >
> > Sorry to say, removing CPU features is a no-go when wanting to guarantee
> > forward migration without taking care about CPU model details manually
> > and simply using the host model. Self-made HW vendor problem.
>
> And this simply does not reflect reality. Intel and Power have removed TX
> for example. We can now sit back and please ourselves how we live in our
> world of dreams. Or we can try to define an interface that deals with
> reality and actually solves problems.
This proposal wouldn't have helped in the case of Intel removing TSX, because it was removed without prior warning in the middle of the product lifecycle. At that time there were already millions of VMs in existence using the removed feature.
> > > The problem scenario you describe is the intended semantics of
> > > host-model though. It enables all features available in the host
> > > that you launched on. It lets you live migrate to a target host
> > > with the same, or a greater number of features. If the target has
> > > a greater number of features, it should restrict the VM to the
> > > subset of features that were present on the original source CPU.
> > > If the target has fewer features, then you simply can't live
> > > migrate a VM using host-model.
> > >
> > > To get live migration in both directions across CPUs with differing
> > > featuresets, then the VM needs to be configured with a named CPU
> > > model that is a subset of both, rather than host-model.
> >
> > Right, and cpu-model-baseline does that job for you if you're lazy to
> > lookup the proper model.
>
> Yes baseline will work, but this requires tooling like openstack. The normal
> user will just use the default and this is host-model.
>
> Let me explain the usecase for this feature. Migration between different versions:
> baseline: always works
> host-passthrough: you get what you deserve
> default model: works
> We have disabled CSSKE from our default models (-cpu gen15a will not present csske).
> So that works as well.
> host-model: Also works for all machines that have csske.
> Now: let's say gen17 will no longer support this. That means that we cannot migrate
> host-model from gen16 or gen15 because those will have csske.
> What options do we have? If we disable csske in the host capabilities that would mean
> that a host compare against an XML from an older QEMU would fail (even if you move
> from gen14 to gen14). So this is not a good option.
>
> By disabling deprecated features ONLY for the _initial_ expansion of host-model, but
> keeping them in the host capabilities you can migrate existing guests (with the
> feature) as we only disable in the expansion, but manually asking for it still works.
> AND it will allow moving this instantiation of the guest to future machines without
> the feature. Basically everything works.
The change you propose works functionally, but none the less it is changing the semantics of host-model. It is defined to expose all the features in the host, and the proposal changes that. If an app actually /wants/ to use the deprecated feature and it exists in the host, then host-model should be allowing that as it does today.
The problem scenario you describe is ultimately that OpenStack does not have a future proof default CPU choice. Libvirt and QEMU provide a mechanism for them to pick other CPU models that would address the problem, but they're not using that. The challenge is that OpenStack defaults currently are a zero-interaction thing.
They could retain their zero-interaction defaults, if at install time they queried the libvirt capabilities to learn which named CPU models are available, whereupon they could decide to use gen15a. The main challenge here is that the list of named CPU models is an unordered set, so it is hard to programmatically figure out which of the available named CPU models is the newest/best/recommended.
IOW, what's missing is a way for apps to easily identify that 'gen15a' is the best CPU to use on the host, without needing human interaction.
I think this could be solved with a change to query-cpu-definitions in QEMU, to add an extra 'recommended: bool' attribute to the CpuDefinitionInfo struct. This would be defined to be only set for 1 CPU model in the list, and would reflect the recommended CPU model given the current version of QEMU, kernel and hardware. Or we could allow 'recommended' to be set for more than 1 CPU, provided we define an explicit ordering of returned CPU models.
I like the recommended: bool attribute. It should provide what we need.
Would you then also suggest using this for host-model, or only for a new type like "host-recommended"?
Neither of those. Libvirt would simply report this attribute in the information it exposes about CPUs.
OpenStack would explicitly extract this and set it in the XML for the guest, so that each guest's view of "recommended" is fixed from the time that guest is first created, rather than potentially changing on each later boot.
Openstack is one thing, but I think this flag would really be useful for instantiation without open stack.
To make things more clear. I would like to have a way where a virsh start of a guest xml without CPU model would work for migration in as many scenarios as possible. And if the default model (today host-model) would ignore features that are not recommended, this would be perfect.
Libvirt's ABI/API guarantee policy is to not change semantics of historical configuration. So anything we might in this relation would require an explicit XML addition compared to today. If someone makes alot of use of features like live migration then they will be using a serious mgmt app, not virsh. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Fri, Mar 11, 2022 at 04:24:22PM +0100, Christian Borntraeger wrote:
Am 11.03.22 um 15:56 schrieb Daniel P. Berrangé:
On Fri, Mar 11, 2022 at 03:52:57PM +0100, Christian Borntraeger wrote:
Am 11.03.22 um 14:08 schrieb Daniel P. Berrangé:
On Fri, Mar 11, 2022 at 12:37:46PM +0000, Daniel P. Berrangé wrote:
On Fri, Mar 11, 2022 at 01:12:35PM +0100, Christian Borntraeger wrote:
Am 11.03.22 um 10:23 schrieb David Hildenbrand:
On 11.03.22 10:17, Daniel P. Berrangé wrote:
On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
CPU models past gen16a will no longer support the csske feature. In order to secure migration of guests running on machines that still support this feature to machines that do not, let's disable csske in the host-model.
Sorry to say, removing CPU features is a no-go when wanting to guarantee forward migration without taking care about CPU model details manually and simply using the host model. Self-made HW vendor problem.
And this simply does not reflect reality. Intel and Power have removed TX for example. We can now sit back and please ourselves how we live in our world of dreams. Or we can try to define an interface that deals with reality and actually solves problems.
This proposal wouldn't have helped in the case of Intel removing TSX, because it was removed without prior warning in the middle of the product lifecycle. At that time there were already millions of VMs in existence using the removed feature.
The problem scenario you describe is the intended semantics of host-model though. It enables all features available in the host that you launched on. It lets you live migrate to a target host with the same, or a greater number of features. If the target has a greater number of features, it should restrict the VM to the subset of features that were present on the original source CPU. If the target has fewer features, then you simply can't live migrate a VM using host-model.
To get live migration in both directions across CPUs with differing featuresets, then the VM needs to be configured with a named CPU model that is a subset of both, rather than host-model.
Right, and cpu-model-baseline does that job for you if you're lazy to lookup the proper model.
Yes, baseline will work, but this requires tooling like OpenStack. The normal user will just use the default, and this is host-model.
Let me explain the usecase for this feature. Migration between different versions:
baseline: always works
host-passthrough: you get what you deserve
default model: works
We have disabled CSSKE from our default models (-cpu gen15a will not present csske). So that works as well.
host-model: Also works for all machines that have csske.
Now: Let's say gen17 will no longer support this. That means that we cannot migrate host-model from gen16 or gen15 because those will have csske.
What options do we have? If we disable csske in the host capabilities, that would mean that a host compare against an XML from an older QEMU would fail (even if you move from gen14 to gen14). So this is not a good option.
By disabling deprecated features ONLY for the _initial_ expansion of host-model, but keeping it in the host capabilities, you can migrate existing guests (with the feature) as we only disable in the expansion, but manually asking for it still works. AND it will allow moving this instantiation of the guest to future machines without the feature. Basically everything works.
The change you propose works functionally, but nonetheless it is changing the semantics of host-model. It is defined to expose all the features in the host, and the proposal changes that. If an app actually /wants/ to use the deprecated feature and it exists in the host, then host-model should be allowing that, as it does today.
The problem scenario you describe is ultimately that OpenStack does not have a future-proof default CPU choice. Libvirt and QEMU provide a mechanism for them to pick other CPU models that would address the problem, but they're not using that. The challenge is that OpenStack defaults currently are a zero-interaction thing.
They could retain their zero-interaction defaults if, at install time, they queried the libvirt capabilities to learn which named CPU models are available, whereupon they could decide to use gen15a. The main challenge here is that the list of named CPU models is an unordered set, so it is hard to programmatically figure out which of the available named CPU models is the newest/best/recommended.
IOW, what's missing is a way for apps to easily identify that 'gen15a' is the best CPU to use on the host, without needing human interaction.
I think this could be solved with a change to query-cpu-definitions in QEMU, to add an extra 'recommended: bool' attribute to the CpuDefinitionInfo struct. This would be defined to be only set for 1 CPU model in the list, and would reflect the recommended CPU model given the current version of QEMU, kernel and hardware. Or we could allow 'recommended' to be set for more than 1 CPU, provided we define an explicit ordering of returned CPU models.
I like the recommended: bool attribute. It should provide what we need.
Would you then also suggest using this for host-model, or only for a new type like "host-recommended"?
Neither of those. Libvirt would simply report this attribute in the information it exposes about CPUs.
OpenStack would explicitly extract this and set it in the XML for the guest, so that each guest's view of "recommended" is fixed from the time that guest is first created, rather than potentially changing on each later boot.
OpenStack is one thing, but I think this flag would really be useful for instantiation without OpenStack.
Sure, any mgmt app using libvirt that provisions guests can use this approach. I just mentioned openstack as that was what you mentioned at the start of this thread. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 11.03.22 05:17, Collin Walling wrote:
The s390x architecture has a growing list of features that will no longer be supported on future hardware releases. This introduces an issue with migration such that guests, running on models with these features enabled, will be rejected outright by machines that do not support these features.
A current example is the CSSKE feature that has been deprecated for some time. It has been publicly announced that gen15 will be the last release to support this feature, however we have postponed this to gen16a. A possible solution to remedy this would be to create a new QEMU QMP Response that allows users to query for deprecated/unsupported features.
This presents two parts of the puzzle: how to report deprecated features to a user (libvirt) and how should libvirt handle this information.
First, let's discuss the latter. The patch presented alongside this cover letter attempts to solve the migration issue by hard-coding the CSSKE feature to be disabled for all s390x CPU models. This is done by simply appending the CSSKE feature with the disabled policy to the host-model.
libvirt pseudo:
if arch is s390x
    set CSSKE to disabled for host-model
That violates host-model semantics and possibly the user intent. There would have to be some toggle to manually specify this, for example, a new model type or some magical flag. Gluing this to the "host-model" feels wrong.
The other concern I have is that deprecated features are a moving target, and with a new QEMU version you could suddenly have more deprecated features. Hm.
Maybe you'd want some kind of a host-based-model from QEMU that does this automatically? I need more coffee to get creative on a name. -- Thanks, David / dhildenb

Am 11.03.22 um 10:30 schrieb David Hildenbrand:
On 11.03.22 05:17, Collin Walling wrote:
The s390x architecture has a growing list of features that will no longer be supported on future hardware releases. This introduces an issue with migration such that guests, running on models with these features enabled, will be rejected outright by machines that do not support these features.
A current example is the CSSKE feature that has been deprecated for some time. It has been publicly announced that gen15 will be the last release to support this feature, however we have postponed this to gen16a. A possible solution to remedy this would be to create a new QEMU QMP Response that allows users to query for deprecated/unsupported features.
This presents two parts of the puzzle: how to report deprecated features to a user (libvirt) and how should libvirt handle this information.
First, let's discuss the latter. The patch presented alongside this cover letter attempts to solve the migration issue by hard-coding the CSSKE feature to be disabled for all s390x CPU models. This is done by simply appending the CSSKE feature with the disabled policy to the host-model.
libvirt pseudo:
if arch is s390x
    set CSSKE to disabled for host-model
That violates host-model semantics and possibly the user intent. There would have to be some toggle to manually specify this, for example, a new model type or some magical flag.
What we actually want to do is to disable csske completely in QEMU and thus in the host-model. Then it would not violate the spec. But this has all kinds of issues (you cannot migrate from older versions of software and machines) although the hardware can still provide the feature.
The hardware guys promised me to deprecate things two generations earlier and we usually deprecate things that are not used or where software has a runtime switch.
What I hear from you is that you do not want to modify the host-model semantics to something more useful but rather define a new thing (e.g. "host-sane")?
Gluing this to the "host-model" feels wrong.
The other concern I have is that deprecated features are a moving target, and with a new QEMU version you could suddenly have more deprecated features. Hm.
Maybe you'd want some kind of a host-based-model from QEMU that does this automatically? I need more coffee to get creative on a name.

On 11.03.22 13:44, Christian Borntraeger wrote:
Am 11.03.22 um 10:30 schrieb David Hildenbrand:
On 11.03.22 05:17, Collin Walling wrote:
The s390x architecture has a growing list of features that will no longer be supported on future hardware releases. This introduces an issue with migration such that guests, running on models with these features enabled, will be rejected outright by machines that do not support these features.
A current example is the CSSKE feature that has been deprecated for some time. It has been publicly announced that gen15 will be the last release to support this feature, however we have postponed this to gen16a. A possible solution to remedy this would be to create a new QEMU QMP Response that allows users to query for deprecated/unsupported features.
This presents two parts of the puzzle: how to report deprecated features to a user (libvirt) and how should libvirt handle this information.
First, let's discuss the latter. The patch presented alongside this cover letter attempts to solve the migration issue by hard-coding the CSSKE feature to be disabled for all s390x CPU models. This is done by simply appending the CSSKE feature with the disabled policy to the host-model.
libvirt pseudo:
if arch is s390x
    set CSSKE to disabled for host-model
That violates host-model semantics and possibly the user intent. There would have to be some toggle to manually specify this, for example, a new model type or some magical flag.
What we actually want to do is to disable csske completely in QEMU and thus in the host-model. Then it would not violate the spec. But this has all kinds of issues (you cannot migrate from older versions of software and machines) although the hardware can still provide the feature.
The hardware guys promised me to deprecate things two generations earlier and we usually deprecate things that are not used or where software has a runtime switch.
What I hear from you is that you do not want to modify the host-model semantics to something more useful but rather define a new thing (e.g. "host-sane")?
My take would be to keep the host model consistent, meaning the semantics in QEMU exactly match the semantics in Libvirt. It defines the maximum CPU model that's runnable under KVM. If a feature is not included (e.g., csske) that feature cannot be enabled in any way.
The "host model" has the semantics of resembling the actual host CPU. This is only partially true, because we support some features the host might not support (e.g., zPCI IIRC) and obviously don't support all host features in QEMU.
So instead of playing games on the libvirt side with the host model, I see the following alternatives:
1. Remove the problematic features from the host model in QEMU, like "we just don't support this feature". Consequently, any migration of a VM with csske=on to a new QEMU version will fail, similar to having an older QEMU version without support for a certain feature.
"host-passthrough" would change between QEMU versions ... which I see as problematic.
2. Introduce a new CPU model that has these new semantics: "host model" - deprecated features. Migration of older VMs with csske=on to a new QEMU version will work. Make libvirt use/expand that new CPU model.
It doesn't necessarily have to be an actual new cpu model. We can use a feature group, like "-cpu host,deprecated-features=false". What's inside "deprecated-features" will actually change between QEMU versions, but we don't really care, as the expanded CPU model won't change.
"host-passthrough" won't change between QEMU versions ...
3. As Daniel suggested, don't use the host model, but a CPU model indicated as "suggested".
The real issue is that in reality, we don't simply always use a model like "gen15a", but usually want optional features, if they are around. Prime examples are "sie" and friends.
I tend to prefer 2. With 3. I see issues with optional features like "sie" and friends. Often, you really want "give me all you got, but disable deprecated features that might cause problems in the future". -- Thanks, David / dhildenb
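For illustration, option 2 on the QEMU command line would presumably look something like the line below; the "deprecated-features" group is the proposal above, not an option QEMU provides today:
qemu-system-s390x -machine s390-ccw-virtio,accel=kvm -cpu host,deprecated-features=false ...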

On 15.03.22 16:58, David Hildenbrand wrote:
On 11.03.22 13:44, Christian Borntraeger wrote:
Am 11.03.22 um 10:30 schrieb David Hildenbrand:
On 11.03.22 05:17, Collin Walling wrote:
The s390x architecture has a growing list of features that will no longer be supported on future hardware releases. This introduces an issue with migration such that guests, running on models with these features enabled, will be rejected outright by machines that do not support these features.
A current example is the CSSKE feature that has been deprecated for some time. It has been publicly announced that gen15 will be the last release to support this feature, however we have postponed this to gen16a. A possible solution to remedy this would be to create a new QEMU QMP Response that allows users to query for deprecated/unsupported features.
This presents two parts of the puzzle: how to report deprecated features to a user (libvirt) and how should libvirt handle this information.
First, let's discuss the latter. The patch presented alongside this cover letter attempts to solve the migration issue by hard-coding the CSSKE feature to be disabled for all s390x CPU models. This is done by simply appending the CSSKE feature with the disabled policy to the host-model.
libvirt pseudo:
if arch is s390x
    set CSSKE to disabled for host-model
That violates host-model semantics and possibly the user intent. There would have to be some toggle to manually specify this, for example, a new model type or some magical flag.
What we actually want to do is to disable csske completely in QEMU and thus in the host-model. Then it would not violate the spec. But this has all kinds of issues (you cannot migrate from older versions of software and machines) although the hardware can still provide the feature.
The hardware guys promised me to deprecate things two generations earlier and we usually deprecate things that are not used or where software has a runtime switch.
What I hear from you is that you do not want to modify the host-model semantics to something more useful but rather define a new thing (e.g. "host-sane")?
My take would be to keep the host model consistent, meaning the semantics in QEMU exactly match the semantics in Libvirt. It defines the maximum CPU model that's runnable under KVM. If a feature is not included (e.g., csske) that feature cannot be enabled in any way.
The "host model" has the semantics of resembling the actual host CPU. This is only partially true, because we support some features the host might not support (e.g., zPCI IIRC) and obviously don't support all host features in QEMU.
So instead of playing games on the libvirt side with the host model, I see the following alternatives:
1. Remove the problematic features from the host model in QEMU, like "we just don't support this feature". Consequently, any migration of a VM with csske=on to a new QEMU version will fail, similar to having an older QEMU version without support for a certain feature.
"host-passthrough" would change between QEMU versions ... which I see as problematic.
2. Introduce a new CPU model that has these new semantics: "host model" - deprecated features. Migration of older VMs with csske=on to a new QEMU version will work. Make libvirt use/expand that new CPU model
It doesn't necessarily have to be an actual new cpu model. We can use a feature group, like "-cpu host,deprecated-features=false". What's inside "deprecated-features" will actually change between QEMU versions, but we don't really care, as the expanded CPU model won't change.
"host-passthrough" won't change between QEMU versions ...
3. As Daniel suggested, don't use the host model, but a CPU model indicated as "suggested".
The real issue is that in reality, we don't simply always use a model like "gen15a", but usually want optional features, if they are around. Prime examples are "sie" and friends.
I tend to prefer 2. With 3. I see issues with optional features like "sie" and friends. Often, you really want "give me all you got, but disable deprecated features that might cause problems in the future".
Something as hacky as this:

diff --git a/slirp b/slirp
--- a/slirp
+++ b/slirp
@@ -1 +1 @@
-Subproject commit a88d9ace234a24ce1c17189642ef9104799425e0
+Subproject commit a88d9ace234a24ce1c17189642ef9104799425e0-dirty
diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 11e06cc51f..37200989c6 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -708,6 +708,34 @@ static void set_feature_group(Object *obj, Visitor *v, const char *name,
     }
 }
 
+static void set_deprecated_features(Object *obj, Visitor *v, const char *name,
+                                    void *opaque, Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    S390CPU *cpu = S390_CPU(obj);
+    bool value;
+
+    if (dev->realized) {
+        error_setg(errp, "Attempt to set property '%s' on '%s' after "
+                   "it was realized", name, object_get_typename(obj));
+        return;
+    } else if (!cpu->model) {
+        error_setg(errp, "Details about the host CPU model are not available, "
+                   "features cannot be changed.");
+        return;
+    }
+
+    if (!visit_type_bool(v, name, &value, errp)) {
+        return;
+    }
+    if (value) {
+        error_setg(errp, "Group '%s' can only be disabled.", name);
+        return;
+    }
+
+    clear_bit(S390_FEAT_CONDITIONAL_SSKE, cpu->model->features);
+}
+
 static void s390_cpu_model_initfn(Object *obj)
 {
     S390CPU *cpu = S390_CPU(obj);
@@ -823,6 +851,8 @@ void s390_cpu_model_class_register_props(ObjectClass *oc)
     object_class_property_add_bool(oc, "static", get_is_static, NULL);
     object_class_property_add_str(oc, "description", get_description, NULL);
+    object_class_property_add(oc, "deprecated-features", "bool", NULL,
+                              set_deprecated_features, NULL, NULL);
 
     for (feat = 0; feat < S390_FEAT_MAX; feat++) {
         const S390FeatDef *def = s390_feat_def(feat);

While it's primarily useful for the "host" model, it *might* be useful for other (older) models as well. Under TCG:

{ "execute": "query-cpu-model-expansion", "arguments": { "type": "static", "model": { "name": "z14" } } }
{"return": {"model": {"name": "z14-base", "props": {"aen": true, "aefsi": true, "mepoch": true, "msa8": true, "msa7": true, "msa6": true, "msa5": true, "msa4": true, "msa3": true, "msa2": true, "msa1": true, "sthyi": true, "edat": true, "ri": true, "edat2": true, "vx": true, "ipter": true, "mepochptff": true, "vxeh": true, "vxpd": true, "esop": true, "iep": true, "cte": true, "bpb": true, "gs": true, "ppa15": true, "zpci": true, "sea_esop2": true, "te": true, "cmm": true}}}}

{ "execute": "query-cpu-model-expansion", "arguments": { "type": "static", "model": { "name": "z14", "props": {"deprecated-features": false} } } }
{"return": {"model": {"name": "z14-base", "props": {"aen": true, "aefsi": true, "csske": false, "mepoch": true, "msa8": true, "msa7": true, "msa6": true, "msa5": true, "msa4": true, "msa3": true, "msa2": true, "msa1": true, "sthyi": true, "edat": true, "ri": true, "edat2": true, "vx": true, "ipter": true, "mepochptff": true, "vxeh": true, "vxpd": true, "esop": true, "iep": true, "cte": true, "bpb": true, "gs": true, "ppa15": true, "zpci": true, "sea_esop2": true, "te": true, "cmm": true}}}}

Note the "csske=false" change. -- Thanks, David / dhildenb

On 3/15/22 4:58 PM, David Hildenbrand wrote:
On 11.03.22 13:44, Christian Borntraeger wrote:
Am 11.03.22 um 10:30 schrieb David Hildenbrand:
On 11.03.22 05:17, Collin Walling wrote:
The s390x architecture has a growing list of features that will no longer be supported on future hardware releases. This introduces an issue with migration such that guests, running on models with these features enabled, will be rejected outright by machines that do not support these features.
A current example is the CSSKE feature that has been deprecated for some time. It has been publicly announced that gen15 will be the last release to support this feature, however we have postponed this to gen16a. A possible solution to remedy this would be to create a new QEMU QMP Response that allows users to query for deprecated/unsupported features.
This presents two parts of the puzzle: how to report deprecated features to a user (libvirt) and how should libvirt handle this information.
First, let's discuss the latter. The patch presented alongside this cover letter attempts to solve the migration issue by hard-coding the CSSKE feature to be disabled for all s390x CPU models. This is done by simply appending the CSSKE feature with the disabled policy to the host-model.
libvirt pseudo:
if arch is s390x
    set CSSKE to disabled for host-model
That violates host-model semantics and possibly the user intent. There would have to be some toggle to manually specify this, for example, a new model type or some magical flag.
What we actually want to do is to disable csske completely in QEMU and thus in the host-model. Then it would not violate the spec. But this has all kinds of issues (you cannot migrate from older versions of software and machines) although the hardware can still provide the feature.
The hardware guys promised me to deprecate things two generations earlier and we usually deprecate things that are not used or where software has a runtime switch.
What I hear from you is that you do not want to modify the host-model semantics to something more useful but rather define a new thing (e.g. "host-sane")?
My take would be to keep the host model consistent, meaning the semantics in QEMU exactly match the semantics in Libvirt. It defines the maximum CPU model that's runnable under KVM. If a feature is not included (e.g., csske) that feature cannot be enabled in any way.
The "host model" has the semantics of resembling the actual host CPU. This is only partially true, because we support some features the host might not support (e.g., zPCI IIRC) and obviously don't support all host features in QEMU.
So instead of playing games on the libvirt side with the host model, I see the following alternatives:
1. Remove the problematic features from the host model in QEMU, like "we just don't support this feature". Consequently, any migration of a VM with csske=on to a new QEMU version will fail, similar to having an older QEMU version without support for a certain feature.
"host-passthrough" would change between QEMU versions ... which I see as problematic.
2. Introduce a new CPU model that has these new semantics: "host model" - deprecated features. Migration of older VMs with csske=on to a new QEMU version will work. Make libvirt use/expand that new CPU model
It doesn't necessarily have to be an actual new cpu model. We can use a feature group, like "-cpu host,deprecated-features=false". What's inside "deprecated-features" will actually change between QEMU versions, but we don't really care, as the expanded CPU model won't change.
"host-passthrough" won't change between QEMU versions ...
3. As Daniel suggested, don't use the host model, but a CPU model indicated as "suggested".
The real issue is that in reality, we don't simply always use a model like "gen15a", but usually want optional features, if they are around. Prime examples are "sie" and friends.
I tend to prefer 2. With 3. I see issues with optional features like "sie" and friends. Often, you really want "give me all you got, but disable deprecated features that might cause problems in the future".
David, if I understand your proposal 2 correctly, it sounds a lot like Christian's idea of leaving the CPU mode "host-model" as is and introducing a new CPU mode "host-recommended" for the new semantics, in which query-cpu-model-expansion would be called with the additional "deprecated-features" property. That way libvirt would not have to fiddle around with the deprecation itself and users would have the option of which semantics they want to use. Is that correct? -- Mit freundlichen Grüßen/Kind regards Boris Fiuczynski IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Gregor Pillen Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294

On 15.03.22 18:40, Boris Fiuczynski wrote:
On 3/15/22 4:58 PM, David Hildenbrand wrote:
On 11.03.22 13:44, Christian Borntraeger wrote:
Am 11.03.22 um 10:30 schrieb David Hildenbrand:
On 11.03.22 05:17, Collin Walling wrote:
The s390x architecture has a growing list of features that will no longer be supported on future hardware releases. This introduces an issue with migration such that guests, running on models with these features enabled, will be rejected outright by machines that do not support these features.
A current example is the CSSKE feature that has been deprecated for some time. It has been publicly announced that gen15 will be the last release to support this feature, however we have postponed this to gen16a. A possible solution to remedy this would be to create a new QEMU QMP Response that allows users to query for deprecated/unsupported features.
This presents two parts of the puzzle: how to report deprecated features to a user (libvirt) and how should libvirt handle this information.
First, let's discuss the latter. The patch presented alongside this cover letter attempts to solve the migration issue by hard-coding the CSSKE feature to be disabled for all s390x CPU models. This is done by simply appending the CSSKE feature with the disabled policy to the host-model.
libvirt pseudo:
if arch is s390x
    set CSSKE to disabled for host-model
That violates host-model semantics and possibly the user intent. There would have to be some toggle to manually specify this, for example, a new model type or some magical flag.
What we actually want to do is to disable csske completely in QEMU and thus in the host-model. Then it would not violate the spec. But this has all kinds of issues (you cannot migrate from older versions of software and machines) although the hardware can still provide the feature.
The hardware guys promised me to deprecate things two generations earlier and we usually deprecate things that are not used or where software has a runtime switch.
What I hear from you is that you do not want to modify the host-model semantics to something more useful but rather define a new thing (e.g. "host-sane")?
My take would be to keep the host model consistent, meaning the semantics in QEMU exactly match the semantics in Libvirt. It defines the maximum CPU model that's runnable under KVM. If a feature is not included (e.g., csske) that feature cannot be enabled in any way.
The "host model" has the semantics of resembling the actual host CPU. This is only partially true, because we support some features the host might not support (e.g., zPCI IIRC) and obviously don't support all host features in QEMU.
So instead of playing games on the libvirt side with the host model, I see the following alternatives:
1. Remove the problematic features from the host model in QEMU, like "we just don't support this feature". Consequently, any migration of a VM with csske=on to a new QEMU version will fail, similar to having an older QEMU version without support for a certain feature.
"host-passthrough" would change between QEMU versions ... which I see as problematic.
2. Introduce a new CPU model that has these new semantics: "host model" - deprecated features. Migration of older VMs with csske=on to a new QEMU version will work. Make libvirt use/expand that new CPU model
It doesn't necessarily have to be an actual new cpu model. We can use a feature group, like "-cpu host,deprecated-features=false". What's inside "deprecated-features" will actually change between QEMU versions, but we don't really care, as the expanded CPU model won't change.
"host-passthrough" won't change between QEMU versions ...
3. As Daniel suggested, don't use the host model, but a CPU model indicated as "suggested".
The real issue is that in reality, we don't simply always use a model like "gen15a", but usually want optional features, if they are around. Prime examples are "sie" and friends.
I tend to prefer 2. With 3. I see issues with optional features like "sie" and friends. Often, you really want "give me all you got, but disable deprecated features that might cause problems in the future".
David, if I understand your proposal 2 correctly, it sounds a lot like Christian's idea of leaving the CPU mode "host-model" as is and introducing a new CPU mode "host-recommended" for the new semantics, in which query-cpu-model-expansion would be called with the additional "deprecated-features" property. That way libvirt would not have to fiddle around with the deprecation itself and users would have the option of which semantics they want to use. Is that correct?
Yes, exactly. -- Thanks, David / dhildenb

On 3/15/22 15:08, David Hildenbrand wrote:
On 15.03.22 18:40, Boris Fiuczynski wrote:
On 3/15/22 4:58 PM, David Hildenbrand wrote:
On 11.03.22 13:44, Christian Borntraeger wrote:
Am 11.03.22 um 10:30 schrieb David Hildenbrand:
On 11.03.22 05:17, Collin Walling wrote:
The s390x architecture has a growing list of features that will no longer be supported on future hardware releases. This introduces an issue with migration such that guests, running on models with these features enabled, will be rejected outright by machines that do not support these features.
A current example is the CSSKE feature that has been deprecated for some time. It has been publicly announced that gen15 will be the last release to support this feature, however we have postponed this to gen16a. A possible solution to remedy this would be to create a new QEMU QMP Response that allows users to query for deprecated/unsupported features.
This presents two parts of the puzzle: how to report deprecated features to a user (libvirt) and how should libvirt handle this information.
First, let's discuss the latter. The patch presented alongside this cover letter attempts to solve the migration issue by hard-coding the CSSKE feature to be disabled for all s390x CPU models. This is done by simply appending the CSSKE feature with the disabled policy to the host-model.
libvirt pseudo:
if arch is s390x
    set CSSKE to disabled for host-model
That violates host-model semantics and possibly the user intent. There would have to be some toggle to manually specify this, for example, a new model type or some magical flag.
What we actually want to do is to disable csske completely in QEMU and thus in the host-model. Then it would not violate the spec. But this has all kinds of issues (you cannot migrate from older versions of software and machines) although the hardware can still provide the feature.
The hardware guys promised me to deprecate things two generations earlier and we usually deprecate things that are not used or where software has a runtime switch.
What I hear from you is that you do not want to modify the host-model semantics to something more useful but rather define a new thing (e.g. "host-sane")?
My take would be to keep the host model consistent, meaning the semantics in QEMU exactly match the semantics in Libvirt. It defines the maximum CPU model that's runnable under KVM. If a feature is not included (e.g., csske) that feature cannot be enabled in any way.
The "host model" has the semantics of resembling the actual host CPU. This is only partially true, because we support some features the host might not support (e.g., zPCI IIRC) and obviously don't support all host features in QEMU.
So instead of playing games on the libvirt side with the host model, I see the following alternatives:
1. Remove the problematic features from the host model in QEMU, like "we just don't support this feature". Consequently, any migration of a VM with csske=on to a new QEMU version will fail, similar to having an older QEMU version without support for a certain feature.
"host-passthrough" would change between QEMU versions ... which I see as problematic.
2. Introduce a new CPU model that has these new semantics: "host model" - deprecated features. Migration of older VMs with csske=on to a new QEMU version will work. Make libvirt use/expand that new CPU model
It doesn't necessarily have to be an actual new cpu model. We can use a feature group, like "-cpu host,deprecated-features=false". What's inside "deprecated-features" will actually change between QEMU versions, but we don't really care, as the expanded CPU model won't change.
"host-passthrough" won't change between QEMU versions ...
3. As Daniel suggested, don't use the host model, but a CPU model indicated as "suggested".
The real issue is that in reality, we don't simply always use a model like "gen15a", but usually want optional features, if they are around. Prime examples are "sie" and friends.
I tend to prefer 2. With 3. I see issues with optional features like "sie" and friends. Often, you really want "give me all you got, but disable deprecated features that might cause problems in the future".
David, if I understand your proposal 2 correctly, it sounds a lot like Christian's idea of leaving the CPU mode "host-model" as is and introducing a new CPU mode "host-recommended" for the new semantics, in which query-cpu-model-expansion would be called with the additional "deprecated-features" property. That way libvirt would not have to fiddle around with the deprecation itself and users would have the option of which semantics they want to use. Is that correct?
Yes, exactly.
From what I understand:
QEMU - add a "deprecated-features" feature group (more-or-less David's code) libvirt - recognize a new model name "host-recommended" - query QEMU for host-model + deprecated-features and cache it in caps file (something like <hostRecCpu>) - when guest is defined with "host-recommended", pull <hostRecCPU> from caps when guest is started (similar to how host-model works today) If this is sufficient, then I can then get to work on this. My question is what would be the best way to include the deprecated features when calculating a baseline or comparison. Both work with the host-model and may no longer present an accurate result. Say, for example, we baseline a z15 with a gen17 (which will outright not support CSSKE). With today's implementation, this might result in a ridiculously old CPU model which also does not support CSSKE. The ideal response would be a z15 - deprecated features (i.e. host-recommended on a z15), but we'd need a way to flag to QEMU that we want to exclude the deprecated features. Or am I totally wrong about this? -- Regards, Collin Stay safe and stay healthy

On 18.03.22 18:23, Collin Walling wrote:
On 3/15/22 15:08, David Hildenbrand wrote:
On 15.03.22 18:40, Boris Fiuczynski wrote:
On 3/15/22 4:58 PM, David Hildenbrand wrote:
On 11.03.22 13:44, Christian Borntraeger wrote:
Am 11.03.22 um 10:30 schrieb David Hildenbrand:
On 11.03.22 05:17, Collin Walling wrote:
The s390x architecture has a growing list of features that will no longer be supported on future hardware releases. This introduces an issue with migration such that guests, running on models with these features enabled, will be rejected outright by machines that do not support these features.
A current example is the CSSKE feature that has been deprecated for some time. It has been publicly announced that gen15 will be the last release to support this feature, however we have postponed this to gen16a. A possible solution to remedy this would be to create a new QEMU QMP Response that allows users to query for deprecated/unsupported features.
This presents two parts of the puzzle: how to report deprecated features to a user (libvirt) and how should libvirt handle this information.
First, let's discuss the latter. The patch presented alongside this cover letter attempts to solve the migration issue by hard-coding the CSSKE feature to be disabled for all s390x CPU models. This is done by simply appending the CSSKE feature with the disabled policy to the host-model.
libvirt pseudo:
if arch is s390x
    set CSSKE to disabled for host-model
That violates host-model semantics and possibly the user intent. There would have to be some toggle to manually specify this, for example, a new model type or some magical flag.
What we actually want to do is to disable csske completely in QEMU and thus in the host-model. Then it would not violate the spec. But this has all kinds of issues (you cannot migrate from older versions of software and machines) although the hardware can still provide the feature.
The hardware guys promised me to deprecate things two generations earlier and we usually deprecate things that are not used or where software has a runtime switch.
What I hear from you is that you do not want to modify the host-model semantics to something more useful but rather define a new thing (e.g. "host-sane")?
My take would be to keep the host model consistent, meaning the semantics in QEMU exactly match the semantics in Libvirt. It defines the maximum CPU model that's runnable under KVM. If a feature is not included (e.g., csske) that feature cannot be enabled in any way.
The "host model" has the semantics of resembling the actual host CPU. This is only partially true, because we support some features the host might not support (e.g., zPCI IIRC) and obviously don't support all host features in QEMU.
So instead of playing games on the libvirt side with the host model, I see the following alternatives:
1. Remove the problematic features from the host model in QEMU, like "we just don't support this feature". Consequently, any migration of a VM with csske=on to a new QEMU version will fail, similar to having an older QEMU version without support for a certain feature.
"host-passthrough" would change between QEMU versions ... which I see as problematic.
2. Introduce a new CPU model that has these new semantics: "host model" - deprecated features. Migration of older VMs with csske=on to a new QEMU version will work. Make libvirt use/expand that new CPU model
It doesn't necessarily have to be an actual new cpu model. We can use a feature group, like "-cpu host,deprecated-features=false". What's inside "deprecated-features" will actually change between QEMU versions, but we don't really care, as the expanded CPU model won't change.
"host-passthrough" won't change between QEMU versions ...
3. As Daniel suggested, don't use the host model, but a CPU model indicated as "suggested".
The real issue is that in reality, we don't simply always use a model like "gen15a", but usually want optional features, if they are around. Prime examples are "sie" and friends.
I tend to prefer 2. With 3. I see issues with optional features like "sie" and friends. Often, you really want "give me all you got, but disable deprecated features that might cause problems in the future".
David, if I understand your proposal 2 correctly, it sounds a lot like Christian's idea of leaving the CPU mode "host-model" as is and introducing a new CPU mode "host-recommended" for the new semantics, in which query-cpu-model-expansion would be called with the additional "deprecated-features" property. That way libvirt would not have to fiddle around with the deprecation itself and users would have the option of which semantics they want to use. Is that correct?
Yes, exactly.
From what I understand:
QEMU - add a "deprecated-features" feature group (more-or-less David's code)
libvirt - recognize a new model name "host-recommended" - query QEMU for host-model + deprecated-features and cache it in caps file (something like <hostRecCpu>) - when guest is defined with "host-recommended", pull <hostRecCpu> from caps when guest is started (similar to how host-model works today)
If this is sufficient, then I can get to work on this.
My question is what would be the best way to include the deprecated features when calculating a baseline or comparison. Both work with the host-model and may no longer present an accurate result. Say, for example, we baseline a z15 with a gen17 (which will outright not support CSSKE). With today's implementation, this might result in a ridiculously old CPU model which also does not support CSSKE. The ideal response would be a z15 - deprecated features (i.e. host-recommended on a z15), but we'd need a way to flag to QEMU that we want to exclude the deprecated features. Or am I totally wrong about this?
For baselining, it would be reasonable to always disable deprecated features, and to ignore them during the model selection. Should be fairly easy to implement, let me know if you need any pointers. I *assume* that for comparison there is nothing to do. -- Thanks, David / dhildenb
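A sketch of how that could look through the existing query-cpu-model-baseline command; the command itself exists today, but always dropping deprecated features from the result is the proposal here, "gen17" stands in for the hypothetical future model from the question above, and the abbreviated return is illustrative only:
{ "execute": "query-cpu-model-baseline", "arguments": { "modela": { "name": "z15" }, "modelb": { "name": "gen17" } } }
{ "return": { "model": { "name": "z15-base", "props": { "csske": false, ... } } } }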

On 3/18/22 14:33, David Hildenbrand wrote:
On 18.03.22 18:23, Collin Walling wrote:
On 3/15/22 15:08, David Hildenbrand wrote:
On 15.03.22 18:40, Boris Fiuczynski wrote:
On 3/15/22 4:58 PM, David Hildenbrand wrote:
On 11.03.22 13:44, Christian Borntraeger wrote:
Am 11.03.22 um 10:30 schrieb David Hildenbrand:
On 11.03.22 05:17, Collin Walling wrote:
The s390x architecture has a growing list of features that will no longer be supported on future hardware releases. This introduces an issue with migration such that guests, running on models with these features enabled, will be rejected outright by machines that do not support these features.
A current example is the CSSKE feature that has been deprecated for some time. It has been publicly announced that gen15 will be the last release to support this feature, however we have postponed this to gen16a. A possible solution to remedy this would be to create a new QEMU QMP Response that allows users to query for deprecated/unsupported features.
This presents two parts of the puzzle: how to report deprecated features to a user (libvirt) and how should libvirt handle this information.
First, let's discuss the latter. The patch presented alongside this cover letter attempts to solve the migration issue by hard-coding the CSSKE feature to be disabled for all s390x CPU models. This is done by simply appending the CSSKE feature with the disabled policy to the host-model.
libvirt pseudo:
if arch is s390x
    set CSSKE to disabled for host-model
That violates host-model semantics and possibly the user intent. There would have to be some toggle to manually specify this, for example, a new model type or some magical flag.
What we actually want to do is to disable csske completely in QEMU and thus in the host-model. Then it would not violate the spec. But this has all kinds of issues (you cannot migrate from older versions of software and machines) although the hardware can still provide the feature.
The hardware guys promised me to deprecate things two generations earlier and we usually deprecate things that are not used or where software has a runtime switch.
What I hear from you is that you do not want to modify the host-model semantics to something more useful but rather define a new thing (e.g. "host-sane")?
My take would be to keep the host model consistent, meaning the semantics in QEMU exactly match the semantics in Libvirt. It defines the maximum CPU model that's runnable under KVM. If a feature is not included (e.g., csske) that feature cannot be enabled in any way.
The "host model" has the semantics of resembling the actual host CPU. This is only partially true, because we support some features the host might not support (e.g., zPCI IIRC) and obviously don't support all host features in QEMU.
So instead of playing games on the libvirt side with the host model, I see the following alternatives:
1. Remove the problematic features from the host model in QEMU, like "we just don't support this feature". Consequently, any migration of a VM with csske=on to a new QEMU version will fail, similar to having an older QEMU version without support for a certain feature.
"host-passthrough" would change between QEMU versions ... which I see as problematic.
2. Introduce a new CPU model that has these new semantics: "host model" - deprecated features. Migration of older VMs with csske=on to a new QEMU version will work. Make libvirt use/expand that new CPU model
It doesn't necessarily have to be an actual new cpu model. We can use a feature group, like "-cpu host,deprecated-features=false". What's inside "deprecated-features" will actually change between QEMU versions, but we don't really care, as the expanded CPU model won't change.
"host-passthrough" won't change between QEMU versions ...
3. As Daniel suggested, don't use the host model, but a CPU model indicated as "suggested".
The real issue is that in reality, we don't simply always use a model like "gen15a", but usually want optional features, if they are around. Prime examples are "sie" and friends.
I tend to prefer 2. With 3. I see issues with optional features like "sie" and friends. Often, you really want "give me all you got, but disable deprecated features that might cause problems in the future".
David, if I understand your proposal 2 correctly, it sounds a lot like Christian's idea of leaving the CPU mode "host-model" as is and introducing a new CPU mode "host-recommended" for the new semantics, in which query-cpu-model-expansion would be called with the additional "deprecated-features" property. That way libvirt would not have to fiddle around with the deprecation itself and users would have the option of which semantics they want to use. Is that correct?
Yes, exactly.
From what I understand:
QEMU - add a "deprecated-features" feature group (more-or-less David's code)
libvirt - recognize a new model name "host-recommended" - query QEMU for host-model + deprecated-features and cache it in caps file (something like <hostRecCpu>) - when guest is defined with "host-recommended", pull <hostRecCpu> from caps when guest is started (similar to how host-model works today)
If this is sufficient, then I can get to work on this.
My question is what would be the best way to include the deprecated features when calculating a baseline or comparison. Both work with the host-model and may no longer present an accurate result. Say, for example, we baseline a z15 with a gen17 (which will outright not support CSSKE). With today's implementation, this might result in a ridiculously old CPU model which also does not support CSSKE. The ideal response would be a z15 - deprecated features (i.e. host-recommended on a z15), but we'd need a way to flag to QEMU that we want to exclude the deprecated features. Or am I totally wrong about this?
For baselining, it would be reasonable to always disable deprecated features, and to ignore them during the model selection. Should be fairly easy to implement, let me know if you need any pointers.
Thanks David. I'll take a look when I can. I may not be very active this week due to personal items, but intend to knock this out as soon as things settle down on my end.
I *assume* that for comparison there is nothing to do.
I think you're right, at least on QEMU's end.
For libvirt, IIRC, comparison will compare the CPU model cached under the hostCPU tag to whatever is in the XML. If comparing, say, a gen17 host (no csske support) with a gen15 XML, the result should come up as "incompatible". To a user, they may think "what the heck, shouldn't old gen run on new gen?"
Doesn't the comparison QAPI report which features cause the result of "incompatible"? Would it make sense to amend the libvirt API to report the features causing this issue? I believe this is what the --error flag is meant to do, but as far as I know, nothing useful is currently reported.
Something like this (assume we're a gen17 host, and cpu.xml contains a gen15 host-model):
# virsh hypervisor-cpu-compare cpu.xml --error
error: Failed to compare hypervisor CPU with cpu.xml
error: the CPU is incompatible with host CPU
error: host CPU does not support: csske
-- Regards, Collin Stay safe and stay healthy
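On the QMP side, s390x's query-cpu-model-comparison does report the properties that caused the verdict via "responsible-properties", so libvirt would mainly need to surface that list; an abbreviated sketch, with model names chosen to mirror the example above:
{ "execute": "query-cpu-model-comparison", "arguments": { "modela": { "name": "host" }, "modelb": { "name": "z15" } } }
{ "return": { "result": "incompatible", "responsible-properties": [ "csske" ] } }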

From what I understand:
QEMU - add a "deprecated-features" feature group (more-or-less David's code)
libvirt - recognize a new model name "host-recommended" - query QEMU for host-model + deprecated-features and cache it in caps file (something like <hostRecCpu>) - when guest is defined with "host-recommended", pull <hostRecCpu> from caps when guest is started (similar to how host-model works today)
If this is sufficient, then I can get to work on this.
My question is what would be the best way to include the deprecated features when calculating a baseline or comparison. Both work with the host-model and may no longer present an accurate result. Say, for example, we baseline a z15 with a gen17 (which will outright not support CSSKE). With today's implementation, this might result in a ridiculously old CPU model which also does not support CSSKE. The ideal response would be a z15 - deprecated features (i.e. host-recommended on a z15), but we'd need a way to flag to QEMU that we want to exclude the deprecated features. Or am I totally wrong about this?
For baselining, it would be reasonable to always disable deprecated features, and to ignore them during the model selection. Should be fairly easy to implement, let me know if you need any pointers.
Thanks David. I'll take a look when I can. I may not be very active this week due to personal items, but intend to knock this out as soon as things settle down on my end.
No need to rush :)
I *assume* that for comparison there is nothing to do.
I think you're right, at least on QEMU's end.
For libvirt, IIRC, comparison will compare the CPU model cached under the hostCPU tag to whatever is in the XML. If comparing, say, a gen17 host (no csske support) with a gen15 XML, the result should come up as "incompatible". To a user, they may think "what the heck, shouldn't old gen run on new gen?"
I assume you mean an expanded host model on a z15 that still shows "csske=true". And it would be correct: the deprecated feature that is still around on the older machine (indicated in the host model) is not around on the newer machine (not indicated in the host model). So a VM started with the "host-model" on the old machine cannot be migrated to the new machine. You'd need to start the VM with the new host-TOBENAMED CPU model. Comparing with that would work as expected, as the deprecated features would not be included.
Doesn't the comparison QAPI report which features cause the result of "incompatible"? Would it make sense to amend the libvirt API to report features causing this issue? I believe this is what the --error flag is meant to do, but as far as I know, nothing useful is currently reported.
Most probably it was never implemented on s390x. Makes sense to me.
Something like this (assume we're a gen17 host, and cpu.xml contains a gen15 host-model)
# virsh hypervisor-cpu-compare cpu.xml --error
error: Failed to compare hypervisor CPU with cpu.xml
error: the CPU is incompatible with host CPU
error: host CPU does not support: csske
I guess instead of "host CPU" you'd want to indicate one of the two CPU models provided. Not sure how to differentiate them from the XML. -- Thanks, David / dhildenb

On Fri, Mar 18, 2022 at 01:23:03PM -0400, Collin Walling wrote:
On 3/15/22 15:08, David Hildenbrand wrote:
On 15.03.22 18:40, Boris Fiuczynski wrote:
On 3/15/22 4:58 PM, David Hildenbrand wrote:
On 11.03.22 13:44, Christian Borntraeger wrote:
Am 11.03.22 um 10:30 schrieb David Hildenbrand:
On 11.03.22 05:17, Collin Walling wrote:
The s390x architecture has a growing list of features that will no longer be supported on future hardware releases. This introduces an issue with migration such that guests, running on models with these features enabled, will be rejected outright by machines that do not support these features.
A current example is the CSSKE feature that has been deprecated for some time. It has been publicly announced that gen15 will be the last release to support this feature, however we have postponed this to gen16a. A possible solution to remedy this would be to create a new QEMU QMP Response that allows users to query for deprecated/unsupported features.
This presents two parts of the puzzle: how to report deprecated features to a user (libvirt) and how should libvirt handle this information.
First, let's discuss the latter. The patch presented alongside this cover letter attempts to solve the migration issue by hard-coding the CSSKE feature to be disabled for all s390x CPU models. This is done by simply appending the CSSKE feature with the disabled policy to the host-model.
libvirt pseudo:
if arch is s390x
    set CSSKE to disabled for host-model
That violates host-model semantics and possibly the user intent. There would have to be some toggle to manually specify this, for example, a new model type or some magical flag.
What we actually want to do is to disable csske completely in QEMU and thus in the host-model. Then it would not violate the spec. But this has all kinds of issues (you cannot migrate from older versions of software and machines) although the hardware can still provide the feature.
The hardware guys promised me to deprecate things two generations earlier and we usually deprecate things that are not used or where software has a runtime switch.
What I hear from you is that you do not want to modify the host-model semantics to something more useful but rather define a new thing (e.g. "host-sane")?
My take would be to keep the host model consistent, meaning the semantics in QEMU exactly match the semantics in Libvirt. It defines the maximum CPU model that's runnable under KVM. If a feature is not included (e.g., csske) that feature cannot be enabled in any way.
The "host model" has the semantics of resembling the actual host CPU. This is only partially true, because we support some features the host might not support (e.g., zPCI IIRC) and obviously don't support all host features in QEMU.
So instead of playing games on the libvirt side with the host model, I see the following alternatives:
1. Remove the problematic features from the host model in QEMU, like "we just don't support this feature". Consequently, any migration of a VM with csske=on to a new QEMU version will fail, similar to having an older QEMU version without support for a certain feature.
"host-passthrough" would change between QEMU versions ... which I see as problematic.
2. Introduce a new CPU model that has these new semantics: "host model" minus deprecated features. Migration of older VMs with csske=on to a new QEMU version will work. Make libvirt use/expand that new CPU model.
It doesn't necessarily have to be an actual new CPU model. We can use a feature group, like "-cpu host,deprecated-features=false". What's inside "deprecated-features" will actually change between QEMU versions, but we don't really care, as the expanded CPU model won't change (a rough sketch follows below).
"host-passthrough" won't change between QEMU versions ...
3. As Daniel suggested, don't use the host model, but a CPU model indicated as "suggested".
The real issue is that in reality, we don't simply always use a model like "gen15a", but usually want optional features, if they are around. Prime examples are "sie" and friends.
I tend to prefer 2. With 3. I see issues with optional features like "sie" and friends. Often, you really want "give me all you got, but disable deprecated features that might cause problems in the future".
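To sketch what option 2 could look like, assuming the feature group materializes ("deprecated-features" does not exist in QEMU at this point):

# hypothetical command line
-cpu host,deprecated-features=false

# hypothetical QMP equivalent for expansion
{ "execute": "query-cpu-model-expansion",
  "arguments": { "type": "full",
                 "model": { "name": "host",
                            "props": { "deprecated-features": false } } } }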
David, if I understand your proposal 2 correctly, it sounds a lot like Christian's idea of leaving the CPU mode "host-model" as is and introducing a new CPU mode "host-recommended" for the new semantics, in which query-cpu-model-expansion would be called with the additional "deprecated-features" property. That way libvirt would not have to fiddle around with the deprecation itself, and users would have the option of which semantics they want to use. Is that correct?
Yes, exactly.
From what I understand:
QEMU:
- add a "deprecated-features" feature group (more-or-less David's code)

libvirt:
- recognize a new model name "host-recommended"
- query QEMU for host-model + deprecated-features and cache it in the caps file (something like <hostRecCpu>)
- when a guest is defined with "host-recommended", pull <hostRecCpu> from caps when the guest is started (similar to how host-model works today)
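A guest definition would then only need to request the new mode (hypothetical XML; "host-recommended" is not an accepted mode name yet):

<cpu mode='host-recommended'/>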
If this is sufficient, then I can get to work on this.
My question is: what would be the best way to include the deprecated features when calculating a baseline or comparison? Both work with the host-model and may no longer present an accurate result. Say, for example, we baseline a z15 with a gen17 (which will outright not support CSSKE). With today's implementation, this might result in a ridiculously old CPU model which also does not support CSSKE. The ideal response would be a z15 minus deprecated features (i.e. host-recommended on a z15), but we'd need a way to flag to QEMU that we want to exclude the deprecated features. Or am I totally wrong about this?
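For reference, today's baseline invocation only takes the two models; the exclusion would somehow have to ride along, sketched here with the same hypothetical property ("gen17" is a stand-in name for the future model):

{ "execute": "query-cpu-model-baseline",
  "arguments": { "modela": { "name": "gen15a",
                             "props": { "deprecated-features": false } },
                 "modelb": { "name": "gen17" } } }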
QEMU has a concept of versioned CPU models, so you could define a z15-v2 version without CSSKE.

With regards,
Daniel

On 21.03.22 10:25, Daniel P. Berrangé wrote:
QEMU has a concept of versioned CPU models, so you could define a z15-v2 version without CSSKE.
gen15a already comes with csske=false. s390x does not implement versioned CPU models and, as I raised in the past, that concept is rather a bad fit for s390x.

--
Thanks,
David / dhildenb
participants (5):
- Boris Fiuczynski
- Christian Borntraeger
- Collin Walling
- Daniel P. Berrangé
- David Hildenbrand