Re: [libvirt] [PATCH 0/5] Use non-blacklisted family/model/stepping for Haswell CPU model

12 Jan 2017

On Thu, Jan 12, 2017 at 12:38:00PM +0000, Dr. David Alan Gilbert wrote:
...
* Eduardo Habkost (ehabkost@redhat.com) wrote:
...
On Mon, Jan 09, 2017 at 11:35:54AM +0000, Dr. David Alan Gilbert wrote:
...
* Eduardo Habkost (ehabkost@redhat.com) wrote:
...
A recent glibc commit[1] added a blacklist to ensure it won't use
TSX on hosts that are known to have a broken TSX implementation.
Our existing Haswell CPU model has a blacklisted
family/model/stepping combination, so it has to be updated to
make sure guests will really use TSX. This is done by patch 5/5.
However, to do this safely we need to ensure the host CPU is not
a blacklisted one, so we won't mislead guests by exposing
known-to-be-good FMS values on a known-to-be-broken host. This is
done by patch 3/5.
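
As a rough illustration of the host check described above, the sketch below decodes family/model/stepping from CPUID leaf 1 EAX and matches it against a blacklist. It is not the code from patch 3/5 or from glibc: the function names and the blacklist entries are placeholders invented for this example, and only the bit layout of CPUID.01H:EAX is standard.

```python
# Sketch only. decode_fms() follows the standard CPUID.01H:EAX layout;
# the blacklist below is a made-up placeholder, not the real glibc list.

def decode_fms(eax):
    """Return (family, model, stepping) decoded from CPUID.01H:EAX."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # The extended model applies only when the base family is 6 or 15.
    if family in (0x6, 0xF):
        model |= ext_model << 4
    # The extended family applies only when the base family is 15.
    if family == 0xF:
        family += ext_family
    return family, model, stepping


# Hypothetical entries: (family, model, predicate on stepping).
PLACEHOLDER_TSX_BLACKLIST = [
    (6, 0x01, lambda stepping: True),          # every stepping is bad
    (6, 0x02, lambda stepping: stepping < 3),  # only early steppings are bad
]


def tsx_blacklisted(eax, blacklist=PLACEHOLDER_TSX_BLACKLIST):
    """Return True if the decoded FMS matches a blacklist entry."""
    family, model, stepping = decode_fms(eax)
    return any(f == family and m == model and is_bad(stepping)
               for f, m, is_bad in blacklist)
```

The same lookup is what makes the guest-visible FMS matter: a guest applying such a list to the virtual CPU's values will disable TSX if those values match, regardless of what the host can do.
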
I'd just like to make sure I understand the way this will fail in a migration;
let's say we have a guest that doesn't have the new libc and hosts
with a blacklisted CPU, and -cpu Haswell.
If I understand correctly then:
  a) With 'enforce', the destination QEMU will fail to start,
     printing an error about the host's lack of the TSX feature.
Yes.
...
  b) Without 'enforce', the destination will start but print
     the same error as a warning, and the guest will probably
     break as soon as it tries to use a TSX feature?
Yes. The general rule is: without "enforce", live migration can
break in unpredictable ways.
Without "enforce", QEMU will print a warning, and the VCPU will
run _without_ the TSX features in CPUID. If we're live-migrating,
it may break the guest if it tries to use a TSX feature, or break
migration if a TSX-related bit is already set in an MSR.
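
To make the difference between the two modes concrete, here is a minimal sketch of the behaviour described above. It is not QEMU's implementation: resolve_features() and FeatureCheckError are hypothetical names, and the message text is illustrative. "hle" and "rtm" are the two CPUID feature flags that make up TSX.

```python
# Sketch only: models "enforce" vs. non-"enforce" feature filtering.
# resolve_features() and FeatureCheckError are hypothetical names.

class FeatureCheckError(Exception):
    """Raised when 'enforce' is set and the host lacks a requested feature."""


def resolve_features(requested, host_supported, enforce):
    """Return (exposed_features, warnings) for a requested CPU model."""
    missing = sorted(set(requested) - set(host_supported))
    warnings = ["host lacks requested feature: %s" % name for name in missing]
    if missing and enforce:
        # With "enforce", refuse to start instead of changing the guest ABI.
        raise FeatureCheckError("; ".join(warnings))
    # Without "enforce", the missing features are silently dropped from
    # what the guest sees, and only a warning is printed.
    exposed = set(requested) & set(host_supported)
    return exposed, warnings
```

In the non-enforce case a guest that was already using "hle" or "rtm" on the source keeps running on a VCPU that no longer advertises them, which is where the unpredictable breakage comes from.
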
OK, but you've been telling people to use "enforce" long enough that
they should have listened.
Are there any other cases we have to worry about; let's say a VM with the
new libc being migrated from an older QEMU, it suddenly changes
CPU ID to one that's supported; what happens?
I assume you are talking about the stepping change, when
migrating from an old QEMU to a host that is _not_ on the
blacklist. In this case, the guest won't see any changes: CPUID
family/model/stepping will be kept the same for the whole life of
the VM (even if it is shut down), thanks to the machine-type
compatibility code.
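
The machine-type mechanism can be pictured as a table of per-version property overrides applied on top of the current CPU model defaults. The sketch below is only a model of that idea: the table names and guest_cpu_props() are hypothetical, and the stepping numbers are illustrative rather than taken from the patches.

```python
# Sketch only: versioned machine types pin old CPU model properties.
# All names and the stepping values here are illustrative assumptions.

# Defaults as defined by the newest QEMU version.
CPU_MODEL_DEFAULTS = {
    "Haswell": {"stepping": 4},
}

# Overrides that keep older machine types looking as they always did.
MACHINE_COMPAT = {
    "pc-i440fx-2.8": {"Haswell": {"stepping": 1}},
    "pc-i440fx-2.9": {},
}


def guest_cpu_props(machine_type, cpu_model):
    """Return the CPU properties a guest sees for this machine type."""
    props = dict(CPU_MODEL_DEFAULTS[cpu_model])
    props.update(MACHINE_COMPAT[machine_type].get(cpu_model, {}))
    return props
```

A VM created with the older machine type keeps that machine type across migrations and restarts, so it keeps the old stepping even when running on a newer QEMU.
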
...
I'm hoping the guest CPU ID is preserved with the TSX disabled until
a reboot?
CPUID changes are effective immediately on migration. But guests
often notice the change only on the next reboot.

We could do something to make these problems less likely:
including CPUID data in the migration stream. I have considered
it in the past, but never implemented it. Maybe I should
reconsider that.

(This is another case where it would be interesting to have a
mechanism to let the destination host abort migration early: we
could make QEMU work in "enforce" mode when live-migrating, but
using the migrated CPUID data. This way CPUID changes would never
happen.)
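
A minimal sketch of what such a destination-side check could look like, assuming the CPUID data travels as a mapping from (leaf, subleaf, register) to a 32-bit value. Nothing like this exists in the series; the function names, the data layout, and the MigrationAborted exception are all hypothetical. Only the bit positions of HLE and RTM in CPUID.(EAX=7,ECX=0):EBX are standard.

```python
# Sketch only: "enforce"-style check against CPUID data sent by the source.
# The data layout and every name below are assumptions for illustration.

CPUID_7_0_EBX_HLE = 1 << 4   # HLE bit in CPUID.(EAX=7,ECX=0):EBX
CPUID_7_0_EBX_RTM = 1 << 11  # RTM bit in CPUID.(EAX=7,ECX=0):EBX


class MigrationAborted(Exception):
    """Raised when the destination cannot reproduce the source's CPUID."""


def cpuid_mismatches(source_cpuid, dest_cpuid):
    """Return [(key, source_value, dest_value)] for every differing entry."""
    mismatches = []
    for key, src_val in sorted(source_cpuid.items()):
        dst_val = dest_cpuid.get(key, 0)
        if dst_val != src_val:
            mismatches.append((key, src_val, dst_val))
    return mismatches


def accept_incoming_cpuid(source_cpuid, dest_cpuid):
    """Abort early instead of letting CPUID change under a running guest."""
    mismatches = cpuid_mismatches(source_cpuid, dest_cpuid)
    if mismatches:
        raise MigrationAborted("CPUID differs on destination: %r" % mismatches)
```

With a check like this, a source exposing HLE and RTM would fail to migrate to a destination that has to mask them, rather than silently losing the bits.
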
...
Dave
...
...
Any other combination?
Dave
...
[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=2702856bf45c82cf8e69f2064...
---
Cc: dgilbert@redhat.com
Cc: fweimer@redhat.com
Cc: carlos@redhat.com
Cc: triegel@redhat.com
Cc: berrange@redhat.com
Cc: jdenemar@redhat.com
Cc: pbonzini@redhat.com
Eduardo Habkost (5):
  i386: Add explicit array size to x86_cpu_vendor_words2str()
  i386: host_vendor_fms() helper function
  i386/kvm: Blacklist TSX on known broken hosts
  pc: Add 2.9 machine-types
  i386: Change stepping of Haswell to non-blacklisted value
 include/hw/i386/pc.h |  6 ++++++
 target/i386/cpu.h    |  1 +
 hw/i386/pc_piix.c    | 15 ++++++++++++---
 hw/i386/pc_q35.c     | 13 +++++++++++--
 target/i386/cpu.c    | 32 ++++++++++++++++++++++----------
 target/i386/kvm.c    | 17 +++++++++++++++++
 6 files changed, 69 insertions(+), 15 deletions(-)
-- 
2.11.0.259.g40922b1
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
-- 
Eduardo