Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

21 Feb 2024

On 2/20/24 6:08 PM, Jim Fehlig wrote:
...
On 12/15/23 15:11, Jonathon Jongsma wrote:
...
Previously, the script only generated the parent CPU and any versions
that had a defined alias. The script now generates all CPU versions. Any
version that had a defined alias will continue to use that alias, but
those without aliases will use the generated name $BASECPUNAME-vN.
The reason for this change is two-fold. First, we need to add new models
that support new features (such as SEV-SNP). To deal with this, the
script now generates model definitions for all versions.
But we also need to ensure that our CPU definitions are migration-safe.
To deal with this issue we need to make sure we're always using the
canonical versioned names for CPUs.
Related to migration safety, do we need to be concerned with the 
expansion of 'host-model' CPU? E.g. is it possible 'host-model' expands 
to EPYC before introducing the new models, and EPYC-v4 afterwards? If 
so, what are the ramifications of that?
Indeed, that behavior is possible. In fact, you can see it happening in 
the test results for e.g. patch #5 ("Add versioned EPYC CPUs"). But I 
think that in general we should be fine, because we don't just expand to 
a plain model name, we expand to a model name plus a list of enabled or 
disabled feature flags.

For example, consider one test output file 
(tests/domaincapsdata/qemu_8.1.0-q35.x86_64.xml) that exhibits this 
behavior change. Before my patches the host CPU expanded to a model of 
'EPYC-ROME', with a number of feature flags which included (among others):

       <feature policy='require' name='amd-ssbd'/>
       <feature policy='disable' name='xsaves'/>

After my patches, the host model was expanded to 'EPYC-Rome-v4', with a 
slightly different set of feature flags. The main differences being that 
the 'amd-ssbd' and 'xsaves' features were no longer mentioned because 
they are now included in the -v4 definition. But it also adds a new 
feature flag to disable the 'ibrs' feature since this feature is 
included in the -v4 definition but not in the -v1 definition and is not 
provided by the host CPU:

       <feature policy='disable' name='ibrs'/>

So although the model name is changed, the definition of the host CPU as 
a whole should expand to the same set of CPU features.

I think that this sort of thing can happen every time a new CPU model is 
added (e.g. the -noTSX-IBRS variants), so I think it should work. But if 
anyone knows of any specific migration issues, please let me know.
...
...
Qemu documentation states that unversioned names for CPU models (e.g.
'EPYC') are actually aliases to a specific versioned CPU model (e.g.
'EPYC-v1'). The documentation further states that the specific version
targeted by the alias may change based on the machine type of the
domain. Management software such as libvirt is directed to translate
these aliases to a concrete version in order to make sure that the CPU
definition is safe when migrating between different qemu versions that
may make different choices for which underlying versioned model
represents the alias.
Do you have a link to the qemu documentation? Can it be added to the 
commit message?
Yes, I could certainly add a link to the commit message. It is mentioned 
here: 
https://www.qemu.org/docs/master/about/deprecated.html#runnability-guarantee...
...
...
In practice, at the time of writing qemu always maps the unversioned
aliases to the -v1 model. And libvirt's CPU model definitions also
assume that this is the case. For example, the 'x86_EPYC.xml' file
contains the features that are defined for the EPYC-v1 inside of qemu.
But if qemu ever changes their alias mapping, libvirt's idea of what an
'EPYC' CPU means and qemu's idea of what an 'EPYC' CPU means will no
longer match. So when choosing a CPU model for a domain, we should
always pass the canonical versioned name to libvirt rather than the
unversioned alias. To enable this, the script will generate a new
'canonical_name' field to the CPU model xml definition.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
---
  src/cpu_map/sync_qemu_models_i386.py | 42 ++++++++++++++++++++++------
  1 file changed, 34 insertions(+), 8 deletions(-)
The logic changes to the script LGTM.
Reviewed-by: Jim Fehlig <jfehlig@suse.com>
Regards,
Jim
...

diff --git a/src/cpu_map/sync_qemu_models_i386.py 
b/src/cpu_map/sync_qemu_models_i386.py
index 1c6a2d4d27..7fd62eba4a 100755
--- a/src/cpu_map/sync_qemu_models_i386.py
+++ b/src/cpu_map/sync_qemu_models_i386.py
@@ -322,31 +322,55 @@ def expand_model(model):
      different fields and may have differing versions into several 
libvirt-
      friendly cpu models."""
-    result = {
-        "name": model.pop(".name"),
+    basename = model.pop(".name")
+    parent = {
+        "name": basename,
          "vendor": translate_vendor(model.pop(".vendor")),
          "features": set(),
          "extra": dict()}
      if ".family" in model and ".model" in model:
-        result["family"] = model.pop(".family")
-        result["model"] = model.pop(".model")
+        parent["family"] = model.pop(".family")
+        parent["model"] = model.pop(".model")
      for k in [k for k in model if k.startswith(".features")]:
          v = model.pop(k)
          for feature in v.split():
              translated = translate_feature(feature)
              if translated:
-                result["features"].add(translated)
+                parent["features"].add(translated)
      versions = model.pop(".versions", [])
      for k, v in model.items():
-        result["extra"]["model" + k] = v
-    yield result
+        parent["extra"]["model" + k] = v
+
+    if not versions:
+        yield parent
+        return
+
+    result = parent
      for version in versions:
+        # each version builds on the previous one
          result = copy.deepcopy(result)
-        result["name"] = version.pop(".alias", result["name"])
+        vnum = int(version.pop(".version"))
+        vname = "{}-v{}".format(basename, vnum)
+        result["canonical_name"] = vname
+        if vnum == 1:
+            # the first version should always be an alias for the 
parent and
+            # should therefore have no extra properties
+            if version.items():
+                raise RuntimeError("Unexpected properties in version 1")
+            yield result
+            continue
+
+        # prefer the 'alias' over the generated the name if it exists 
since we
+        # have already been using these aliases
+        alias = version.pop(".alias", None)
+        if alias:
+            result["name"] = alias
+        else:
+            result["name"] = vname
          props = version.pop(".props", dict())
          for k, v in props:
@@ -377,6 +401,8 @@ def output_model(f, model):
      f.write("<cpus>\n")
      f.write("  <model name='{}'>\n".format(model["name"]))
+    if "canonical_name" in model and model["name"] != 
model["canonical_name"]:
+        f.write("    
<canonical_name>{}</canonical_name>\n".format(model["canonical_name"]))
      f.write("    <decode host='on' guest='on'/>\n")
      f.write("    <signature family='{}' model='{}'/>\n".format(
          model["family"], model["model"]))