Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE

CCing libvir-list, Jiri Denemark, Michal Privoznik, so they are aware that the definition of "supported CPU features" will probably become a bit more complex in the future.
On Tue, Jul 6, 2021 at 5:58 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
On 06/07/21 23:33, Eduardo Habkost wrote:
On Tue, Jul 6, 2021 at 5:05 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
It's a bit tricky, because HLE and RTM won't really behave well. An old guest that sees RTM=1 might end up retrying and aborting transactions too much. So I'm not sure that a QEMU "-cpu host" guest should have HLE and RTM enabled.
Is the purpose of GET_SUPPORTED_CPUID to return what is supported by KVM, or to return what "-cpu host" should enable by default? They are conflicting requirements in this case.
In theory there is GET_EMULATED_CPUID for the former, so it should be the latter. In practice neither QEMU nor Libvirt use it; maybe now we have a good reason to add it, but note that userspace could also check host RTM_ALWAYS_ABORT.
Returning HLE=1,RTM=1 in GET_SUPPORTED_CPUID makes existing userspace make bad decisions until it's updated.
Returning HLE=0,RTM=0 in GET_SUPPORTED_CPUID prevents existing userspace from resuming existing VMs (despite being technically possible).
The first option has an easy workaround that doesn't require a software update (disabling HLE/RTM in the VM configuration). The second option doesn't have a workaround. I'm inclined towards the first option.
The default has already been tsx=off for a while though, so checking either GET_EMULATED_CPUID or host RTM_ALWAYS_ABORT in userspace might also be feasible for those that are still on tsx=on.
This sounds like a perfect use case for GET_EMULATED_CPUID. My only concern is breaking existing userspace. But if this was already broken for a few kernel releases due to tsx=off being the default, maybe GET_EMULATED_CPUID will be a reasonable approach.
-- Eduardo
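A minimal sketch of the userspace-side check mentioned above, for illustration only. It assumes userspace already has the host's own CPUID.(EAX=07H,ECX=0):EDX and the leaf-7 EBX value returned by KVM_GET_SUPPORTED_CPUID; the helper name default_leaf7_ebx() is hypothetical and is not a QEMU or libvirt function. Bit positions follow the Intel SDM: HLE is EBX bit 4, RTM is EBX bit 11, and RTM_ALWAYS_ABORT is EDX bit 11.

```c
#include <stdint.h>

/* CPUID.(EAX=07H,ECX=0) bit positions (Intel SDM). */
#define LEAF7_EBX_HLE              (1u << 4)
#define LEAF7_EBX_RTM              (1u << 11)
#define LEAF7_EDX_RTM_ALWAYS_ABORT (1u << 11)

/*
 * Hypothetical policy helper: compute the leaf-7 EBX value that a
 * "-cpu host" style guest would get by default.
 *   supported_ebx: EBX of leaf 7 as reported by KVM_GET_SUPPORTED_CPUID
 *   host_edx:      the host's own CPUID.(EAX=07H,ECX=0):EDX
 */
static uint32_t default_leaf7_ebx(uint32_t supported_ebx, uint32_t host_edx)
{
    uint32_t ebx = supported_ebx;

    /*
     * Transactions always abort on this host: advertising HLE/RTM would
     * only make an old guest retry and abort, so leave them out of the
     * default.  An explicit VM configuration can still request them.
     */
    if (host_edx & LEAF7_EDX_RTM_ALWAYS_ABORT)
        ebx &= ~(LEAF7_EBX_HLE | LEAF7_EBX_RTM);

    return ebx;
}
```

Dropping the bits only from the default, rather than from what KVM reports, keeps the first option's property: an existing VM whose configuration explicitly enables HLE/RTM can still be resumed.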

On Wed, Jul 7, 2021 at 8:09 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
CCing libvir-list, Jiri Denemark, Michal Privoznik, so they are aware that the definition of "supported CPU features" will probably become a bit more complex in the future.
Has there ever been a clear definition? Family, model, and stepping, for instance: are these the only values supported? That would make cross-platform migration impossible. What about the vendor string? Is that the only value supported? That would make cross-vendor migration impossible. For the maximum input value for basic CPUID information (CPUID.0H:EAX), is that the only value supported, or is it the maximum value supported? On the various individual feature bits, does a '1' imply that '0' is also supported, or is '1' the only value supported? What about the feature bits with reversed polarity (e.g. CPUID.(EAX=07H,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6])? This API has never made sense to me. I have no idea how to interpret what it is telling me.

On Wed, Jul 7, 2021 at 12:42 PM Jim Mattson <jmattson@google.com> wrote:
Has there ever been a clear definition? Family, model, and stepping, for instance: are these the only values supported? That would make cross-platform migration impossible. What about the vendor string? Is that the only value supported? That would make cross-vendor migration impossible. For the maximum input value for basic CPUID information (CPUID.0H:EAX), is that the only value supported, or is it the maximum value supported? On the various individual feature bits, does a '1' imply that '0' is also supported, or is '1' the only value supported? What about the feature bits with reversed polarity (e.g. CPUID.(EAX=07H,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6])?
This API has never made sense to me. I have no idea how to interpret what it is telling me.
Is this about GET_SUPPORTED_CPUID, QEMU's query-cpu-model-expansion & related commands, or the libvirt CPU APIs?
-- Eduardo

On Wed, Jul 7, 2021 at 10:08 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
Is this about GET_SUPPORTED_CPUID, QEMU's query-cpu-model-expansion & related commands, or the libvirt CPU APIs?
This is my ongoing rant about KVM_GET_SUPPORTED_CPUID.

On Wed, Jul 7, 2021 at 1:18 PM Jim Mattson <jmattson@google.com> wrote:
This is my ongoing rant about KVM_GET_SUPPORTED_CPUID.
I agree the definition is not clear. I have tried to enumerate below what QEMU assumes about the return value of KVM_GET_SUPPORTED_CPUID. These are a collection of workarounds and feature-specific rules that are encoded in the kvm_arch_get_supported_cpuid(), x86_cpu_filter_features(), and cpu_x86_cpuid() functions in QEMU.
1. Passing through the returned values (unchanged) from KVM_GET_SUPPORTED_CPUID to KVM_SET_CPUID is assumed to be always safe, as long as the ability to save/resume VCPU state is not required. (This is the behavior implemented by "-cpu host,migratable=off".)
2. The safety of setting a bit to a different value requires specific knowledge about the CPUID bit.
2.1. For a specific set of registers (see below), QEMU assumes it's safe to set the bit to 0 when KVM_GET_SUPPORTED_CPUID returns 1.
2.2. For a few specific leaves (see below), there are more complex rules.
2.3. For all other leaves, QEMU doesn't use the return value of KVM_GET_SUPPORTED_CPUID at all (AFAICS).
The CPUID leaves mentioned in 2.1 are:
CPUID[1].EDX
CPUID[1].ECX
CPUID[6].EAX
CPUID[EAX=7,ECX=0].EBX
  - This unfortunately includes de-feature bits like FDP_EXCPTN_ONLY and ZERO_FCS_FDS
CPUID[EAX=7,ECX=0].ECX
CPUID[EAX=7,ECX=0].EDX
CPUID[EAX=7,ECX=1].EAX
CPUID[EAX=0Dh,ECX=0].EAX
CPUID[EAX=0Dh,ECX=0].EDX
CPUID[EAX=0Dh,ECX=1].EAX
  - Note that CPUID[0Dh] has additional logic to ensure XSAVE component info on CPUID is consistent
CPUID[40000001h].EAX
CPUID[40000001h].EDX
CPUID[80000001h].EDX
CPUID[80000001h].ECX
CPUID[80000007h].EDX
CPUID[80000008h].EBX
CPUID[8000000Ah].EDX
CPUID[C0000001h].EDX
Some of the CPUID leaves mentioned in 2.2 are:
CPUID[1].ECX.HYPERVISOR[bit 31]
  - Can be enabled unconditionally
CPUID[1].ECX.TSC_DEADLINE_TIMER[bit 24]
  - Can be set to 1 if using the in-kernel irqchip and KVM_CAP_TSC_DEADLINE_TIMER is enabled
CPUID[1].ECX.X2APIC[bit 21]
  - Can be set to 1 if using the in-kernel irqchip
CPUID[1].ECX.MONITOR[bit 3]
  - Can be set to 1 if KVM_X86_DISABLE_EXITS_MWAIT is enabled
CPUID[6].EAX.ARAT[bit 2]
  - Can be enabled unconditionally
CPUID[EAX=7,ECX=0].EDX.ARCH_CAPABILITIES
  - Workaround for KVM bug in Linux v4.17-v4.20
CPUID[EAX=14h,ECX=0], CPUID[EAX=14h,ECX=1]
  - Most bits must match the host, unless CPUID[EAX=7,ECX=0].EBX.INTEL_PT[bit 25] is 0
CPUID[80000001h].EDX
  - AMD-specific feature flag aliases can be set based on CPUID[1].EDX
CPUID[40000001h].EAX
  - KVM_FEATURE_PV_UNHALT requires in-kernel irqchip
  - KVM_FEATURE_MSI_EXT_DEST_ID requires split irqchip
CPUID[40000001h].EDX.KVM_HINTS_REALTIME
  - Can be enabled unconditionally
-- Eduardo
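To make the reversed-polarity problem in rule 2.1 concrete, here is a hypothetical C helper (unsafe_cpuid_bits() is an illustrative name, not QEMU code) that gives de-feature bits the opposite safety direction instead of applying the plain "requested must be a subset of supported" rule to every bit. It assumes that advertising a de-feature bit the host lacks is harmless (the guest merely stops relying on a behavior), while hiding one the host has is not. Bit positions for FDP_EXCPTN_ONLY (6) and ZERO_FCS_FDS (13) follow the Intel SDM.

```c
#include <stdint.h>

/* CPUID.(EAX=07H,ECX=0):EBX de-feature bits: a 1 removes or restricts
 * behavior instead of adding it (bit positions per the Intel SDM). */
#define LEAF7_EBX_FDP_EXCPTN_ONLY (1u << 6)
#define LEAF7_EBX_ZERO_FCS_FDS    (1u << 13)

/*
 * Hypothetical helper: return the bits of `requested` that cannot safely
 * be given to the guest.
 *   supported: KVM_GET_SUPPORTED_CPUID value for the same register
 *   defeature: mask of reversed-polarity bits in that register
 *
 * Ordinary bit:   requesting 1 needs supported == 1; requesting 0 is safe.
 * De-feature bit: requesting 1 is safe; requesting 0 needs supported == 0.
 */
static uint32_t unsafe_cpuid_bits(uint32_t requested, uint32_t supported,
                                  uint32_t defeature)
{
    /* Ordinary features the guest wants but KVM cannot provide. */
    uint32_t missing_features = requested & ~supported & ~defeature;
    /* Restrictions the host has but the guest would not be told about. */
    uint32_t hidden_defeatures = ~requested & supported & defeature;

    return missing_features | hidden_defeatures;
}
```

With a naive subset check, clearing FDP_EXCPTN_ONLY on a host that sets it would look "safe"; the helper above reports it instead, and accepts setting ZERO_FCS_FDS on a host that lacks it.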

On 07/07/21 20:23, Eduardo Habkost wrote:
I agree the definition is not clear. I have tried to enumerate below what QEMU assumes about the return value of KVM_GET_SUPPORTED_CPUID. These are a collection of workarounds and feature-specific rules that are encoded in the kvm_arch_get_supported_cpuid(), x86_cpu_filter_features(), and cpu_x86_cpuid() functions in QEMU.
1. Passing through the returned values (unchanged) from KVM_GET_SUPPORTED_CPUID to KVM_SET_CPUID is assumed to be always safe, as long as the ability to save/resume VCPU state is not required. (This is the behavior implemented by "-cpu host,migratable=off".)
Right, this is basically the definition of KVM_GET_SUPPORTED_CPUID.
2. The safety of setting a bit to a different value requires specific knowledge about the CPUID bit.
2.1. For a specific set of registers (see below), QEMU assumes it's safe to set the bit to 0 when KVM_GET_SUPPORTED_CPUID returns 1.
2.2. For a few specific leaves (see below), there are more complex rules.
2.3. For all other leaves, QEMU doesn't use the return value of KVM_GET_SUPPORTED_CPUID at all (AFAICS).
The CPUID leaves mentioned in 2.1 are:
CPUID[1].EDX
CPUID[1].ECX
CPUID[6].EAX
CPUID[EAX=7,ECX=0].EBX
  - This unfortunately includes de-feature bits like FDP_EXCPTN_ONLY and ZERO_FCS_FDS
CPUID[EAX=7,ECX=0].ECX
CPUID[EAX=7,ECX=0].EDX
CPUID[EAX=7,ECX=1].EAX
CPUID[EAX=0Dh,ECX=0].EAX
CPUID[EAX=0Dh,ECX=0].EDX
CPUID[EAX=0Dh,ECX=1].EAX
  - Note that CPUID[0Dh] has additional logic to ensure XSAVE component info on CPUID is consistent
CPUID[40000001h].EAX
CPUID[40000001h].EDX
CPUID[80000001h].EDX
CPUID[80000001h].ECX
CPUID[80000007h].EDX
CPUID[80000008h].EBX
CPUID[8000000Ah].EDX
CPUID[C0000001h].EDX
Plus all unknown leaves.
Some of the CPUID leaves mentioned in 2.2 are:
CPUID[1].ECX.HYPERVISOR[bit 31]
  - Can be enabled unconditionally
CPUID[1].ECX.TSC_DEADLINE_TIMER[bit 24]
  - Can be set to 1 if using the in-kernel irqchip and KVM_CAP_TSC_DEADLINE_TIMER is enabled
CPUID[1].ECX.X2APIC[bit 21]
  - Can be set to 1 if using the in-kernel irqchip
CPUID[1].ECX.MONITOR[bit 3]
  - Can be set to 1 if KVM_X86_DISABLE_EXITS_MWAIT is enabled
Can always be set to 1, but only makes sense to do so if KVM_X86_DISABLE_EXITS_MWAIT is enabled.
CPUID[6].EAX.ARAT[bit 2]
  - Can be enabled unconditionally
CPUID[EAX=7,ECX=0].EDX.ARCH_CAPABILITIES
  - Workaround for KVM bug in Linux v4.17-v4.20
CPUID[EAX=14h,ECX=0], CPUID[EAX=14h,ECX=1]
  - Most bits must match the host, unless CPUID[EAX=7,ECX=0].EBX.INTEL_PT[bit 25] is 0
CPUID[80000001h].EDX
  - AMD-specific feature flag aliases can be set based on CPUID[1].EDX
CPUID[40000001h].EAX
  - KVM_FEATURE_PV_UNHALT requires in-kernel irqchip
  - KVM_FEATURE_MSI_EXT_DEST_ID requires split irqchip
CPUID[40000001h].EDX.KVM_HINTS_REALTIME
  - Can be enabled unconditionally
This should apply to all of CPUID[4000_0001h].EDX in the future.
Thanks Eduardo, this is a great start for kernel-side documentation! I'll wrap it in a patch.
Paolo
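The conditional rules from 2.2, including Paolo's note on MONITOR, could be encoded roughly as below. This is a hypothetical sketch, not QEMU's implementation: struct vm_config, its fields, and both helper names are illustrative, and counting a split irqchip as "in-kernel" (the local APIC still lives in KVM) is an assumption. Bit numbers come from the Intel SDM for CPUID[1].ECX and from KVM's paravirt feature list for CPUID[40000001h].EAX (KVM_FEATURE_PV_UNHALT is bit 7, KVM_FEATURE_MSI_EXT_DEST_ID is bit 15).

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID[1].ECX bits (Intel SDM). */
#define LEAF1_ECX_MONITOR      (1u << 3)
#define LEAF1_ECX_X2APIC       (1u << 21)
#define LEAF1_ECX_TSC_DEADLINE (1u << 24)
#define LEAF1_ECX_HYPERVISOR   (1u << 31)

/* CPUID[40000001h].EAX bits (KVM paravirt feature numbers). */
#define KVM_PV_UNHALT_BIT       (1u << 7)
#define KVM_MSI_EXT_DEST_ID_BIT (1u << 15)

/* Hypothetical VM configuration; names are illustrative only. */
enum irqchip_mode { IRQCHIP_USERSPACE, IRQCHIP_SPLIT, IRQCHIP_KERNEL };

struct vm_config {
    enum irqchip_mode irqchip;
    bool cap_tsc_deadline_timer; /* KVM_CAP_TSC_DEADLINE_TIMER available */
    bool mwait_exits_disabled;   /* KVM_X86_DISABLE_EXITS_MWAIT enabled */
};

/* Assumption: a split irqchip counts as "in-kernel" (LAPIC in KVM). */
static bool irqchip_in_kernel(const struct vm_config *cfg)
{
    return cfg->irqchip != IRQCHIP_USERSPACE;
}

/* CPUID[1].ECX bits that may be set even when KVM_GET_SUPPORTED_CPUID
 * reports them as 0, following the 2.2 rules in this thread. */
static uint32_t leaf1_ecx_extra_bits(const struct vm_config *cfg)
{
    uint32_t bits = LEAF1_ECX_HYPERVISOR; /* unconditional */

    if (irqchip_in_kernel(cfg)) {
        bits |= LEAF1_ECX_X2APIC;
        if (cfg->cap_tsc_deadline_timer)
            bits |= LEAF1_ECX_TSC_DEADLINE;
    }
    /* MONITOR could always be set, but it is only useful when MWAIT
     * does not cause a VM exit. */
    if (cfg->mwait_exits_disabled)
        bits |= LEAF1_ECX_MONITOR;

    return bits;
}

/* Drop CPUID[40000001h].EAX bits whose irqchip prerequisite is missing. */
static uint32_t filter_kvm_pv_eax(uint32_t supported_eax,
                                  const struct vm_config *cfg)
{
    uint32_t eax = supported_eax;

    if (!irqchip_in_kernel(cfg))
        eax &= ~KVM_PV_UNHALT_BIT;
    if (cfg->irqchip != IRQCHIP_SPLIT)
        eax &= ~KVM_MSI_EXT_DEST_ID_BIT;

    return eax;
}
```

Keeping each rule as a small pure function of the VM configuration makes it straightforward to document next to the KVM capability it depends on.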