
On Thu, Jan 12, 2017 at 12:38:00PM +0000, Dr. David Alan Gilbert wrote:
* Eduardo Habkost (ehabkost@redhat.com) wrote:
On Mon, Jan 09, 2017 at 11:35:54AM +0000, Dr. David Alan Gilbert wrote:
* Eduardo Habkost (ehabkost@redhat.com) wrote:
A recent glibc commit[1] added a blacklist to ensure it won't use TSX on hosts that are known to have a broken TSX implementation.
Our existing Haswell CPU model has a blacklisted family/model/stepping combination, so it has to be updated to make sure guests will really use TSX. This is done by patch 5/5.
However, to do this safely we need to ensure the host CPU is not a blacklisted one, so we won't mislead guests by exposing known-to-be-good FMS values on a known-to-be-broken host. This is done by patch 3/5.
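A minimal sketch of the host check described above. It assumes the standard Intel decoding of CPUID leaf 1 EAX into family/model/stepping. The names `struct fms`, `decode_fms()`, `struct tsx_rule` and `fms_blacklisted()` are hypothetical. The blacklist entries are PLACEHOLDER values for illustration only, not the list from the glibc commit or from patch 3/5. A real check would also compare the CPU vendor string; the series adds a host_vendor_fms() helper that presumably provides both. That comparison is omitted here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for a decoded family/model/stepping triple. */
struct fms {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode CPUID leaf 1 EAX using Intel's documented rules: the extended
 * family is added only when the base family is 0xF, and the extended
 * model is prepended only when the base family is 0x6 or 0xF. */
static struct fms decode_fms(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model = (eax >> 4) & 0xF;
    struct fms r;

    r.stepping = eax & 0xF;
    r.family = base_family;
    if (base_family == 0xF) {
        r.family += (eax >> 20) & 0xFF;
    }
    r.model = base_model;
    if (base_family == 0x6 || base_family == 0xF) {
        r.model |= ((eax >> 16) & 0xF) << 4;
    }
    return r;
}

/* Hypothetical blacklist entry: steppings below first_good_stepping are
 * treated as having broken TSX. PLACEHOLDER values, not a real list. */
struct tsx_rule {
    unsigned family;
    unsigned model;
    unsigned first_good_stepping;
};

static const struct tsx_rule tsx_blacklist[] = {
    { 6, 0x3C, 4 },
    { 6, 0x3F, 4 },
};

/* Returns true if the decoded FMS matches a blacklist entry. */
static bool fms_blacklisted(struct fms f)
{
    for (size_t i = 0; i < sizeof(tsx_blacklist) / sizeof(tsx_blacklist[0]); i++) {
        const struct tsx_rule *r = &tsx_blacklist[i];
        if (f.family == r->family && f.model == r->model &&
            f.stepping < r->first_good_stepping) {
            return true;
        }
    }
    return false;
}
```

For example, `decode_fms(0x000306C3)` yields family 6, model 0x3C, stepping 3, which matches the first placeholder rule.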
I'd just like to make sure I understand how this will fail in a migration; let's say we have a guest that doesn't have the new libc, hosts with a blacklisted CPU, and -cpu Haswell.
If I understand correctly, then: a) With 'enforce', the destination QEMU will fail to start, printing an error about the host's lack of the TSX feature.
Yes.
b) Without 'enforce', the destination will start but print the same error as a warning, and the guest will probably break as soon as it tries to use a TSX feature?
Yes. The general rule is: without "enforce", live migration can break in unpredictable ways.
Without "enforce", QEMU will print a warning, and the VCPU will run _without_ the TSX features on CPUID. If we're live-migrating, this may break the guest if it tries to use a TSX feature, or break migration if a TSX-related bit is already set in an MSR.
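The difference between running with and without "enforce" can be modelled as a feature-mask check. This is a simplified sketch, not QEMU's actual code: `check_features()` and the single 32-bit mask are hypothetical, and the warning text is illustrative. The two TSX bits used are HLE (bit 4) and RTM (bit 11) of CPUID.(EAX=7,ECX=0):EBX.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* TSX feature bits in CPUID.(EAX=7,ECX=0):EBX. */
#define CPUID_7_0_EBX_HLE (1u << 4)
#define CPUID_7_0_EBX_RTM (1u << 11)

/* Hypothetical helper: given the bits the CPU model requests and the bits
 * the host can provide, compute what the guest will see. Returns false if
 * the VM must not start ('enforce' with missing bits). */
static bool check_features(uint32_t requested, uint32_t host, bool enforce,
                           uint32_t *guest_out)
{
    uint32_t missing = requested & ~host;

    if (missing != 0) {
        /* Illustrative message, not QEMU's real wording. */
        fprintf(stderr, "%s: missing requested feature bits 0x%08x\n",
                enforce ? "error" : "warning", (unsigned)missing);
        if (enforce) {
            return false;  /* 'enforce': refuse to start */
        }
    }
    /* Without 'enforce', missing bits are filtered out of the guest CPUID;
     * a migrated guest that already relied on them may break. */
    *guest_out = requested & host;
    return true;
}
```

With `enforce` the failure happens before the guest runs; without it the VM starts with a reduced CPUID, which is exactly the unpredictable case for live migration.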
OK, but you've been telling people to use "enforce" long enough that they should have listened.
Are there any other cases we have to worry about? Let's say a VM with the new libc is migrated from an older QEMU, and its CPUID suddenly changes to one that's supported; what happens?
I assume you are talking about the stepping change, when migrating from an old QEMU to a host that is _not_ on the blacklist. In this case, the guest won't see any changes: CPUID family/model/stepping will be kept the same for the whole life of the VM (even if it is shut down), thanks to the machine-type compatibility code.
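The machine-type compatibility idea can be sketched as a per-machine-type override table: older machine types keep the old Haswell stepping, so a VM created with them sees the same CPUID for its whole life. In QEMU this is done through compat properties on the machine types (the series touches pc.h, pc_piix.c and pc_q35.c); the table, `haswell_stepping_for()` and the stepping values below are hypothetical placeholders, and a real implementation would cover every older machine type, not only 2.8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical compat entry: machine type -> Haswell stepping to expose. */
struct stepping_compat {
    const char *machine;
    unsigned stepping;
};

/* Placeholder values: old machine types pin the old stepping. */
static const struct stepping_compat haswell_compat[] = {
    { "pc-i440fx-2.8", 1 },
    { "pc-q35-2.8",    1 },
};

/* Placeholder for the new, non-blacklisted stepping used by new types. */
#define HASWELL_DEFAULT_STEPPING 4

/* Returns the stepping a guest on the given machine type should see. */
static unsigned haswell_stepping_for(const char *machine)
{
    for (size_t i = 0; i < sizeof(haswell_compat) / sizeof(haswell_compat[0]); i++) {
        if (strcmp(machine, haswell_compat[i].machine) == 0) {
            return haswell_compat[i].stepping;
        }
    }
    return HASWELL_DEFAULT_STEPPING;
}
```

Because the lookup depends only on the machine type, which stays fixed across migrations and reboots, the guest-visible stepping never changes underneath it.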
I'm hoping the guest CPUID is preserved, with TSX disabled, until a reboot?
CPUID changes are effective immediately on migration, but guests often notice the change only on the next reboot. We could do something to make these problems less likely: include CPUID data in the migration stream. I have considered it in the past, but never implemented it. Maybe I should reconsider that. (This is another case where it would be interesting to have a mechanism to let the destination host abort migration early: we could make QEMU work in "enforce" mode when live-migrating, but using the migrated CPUID data. This way CPUID changes would never happen.)
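A rough sketch of the proposed destination-side check, assuming the source sends the guest's CPUID feature words in the migration stream. Everything here is hypothetical (QEMU did not do this at the time of this thread): `struct feature_word`, `host_bits()` and `dest_accepts_cpuid()` are illustrative names. The destination accepts the VM only if it can provide every migrated feature bit, then keeps the migrated values verbatim, so CPUID never changes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one CPUID feature word (leaf/subleaf/register). */
struct feature_word {
    uint32_t leaf;
    uint32_t subleaf;
    char reg;       /* 'a', 'b', 'c' or 'd' */
    uint32_t bits;  /* feature bits */
};

/* Look up the bits the destination host supports for the same word;
 * an unknown word is treated as "nothing supported". */
static uint32_t host_bits(const struct feature_word *host, size_t m,
                          const struct feature_word *w)
{
    for (size_t j = 0; j < m; j++) {
        if (host[j].leaf == w->leaf && host[j].subleaf == w->subleaf &&
            host[j].reg == w->reg) {
            return host[j].bits;
        }
    }
    return 0;
}

/* 'enforce'-style check on the migrated CPUID: abort early if any bit
 * the guest was started with cannot be provided by this host. */
static bool dest_accepts_cpuid(const struct feature_word *migrated, size_t n,
                               const struct feature_word *host, size_t m)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t missing = migrated[i].bits & ~host_bits(host, m, &migrated[i]);
        if (missing != 0) {
            return false;
        }
    }
    return true;
}
```

Checking only feature words (rather than whole leaves) avoids treating non-feature fields such as family/model/stepping as bitmasks; those would instead be copied from the stream unchanged.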
Dave
Any other combination?
Dave
[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=2702856bf45c82cf8e69f2064...
---
Cc: dgilbert@redhat.com
Cc: fweimer@redhat.com
Cc: carlos@redhat.com
Cc: triegel@redhat.com
Cc: berrange@redhat.com
Cc: jdenemar@redhat.com
Cc: pbonzini@redhat.com
Eduardo Habkost (5):
  i386: Add explicit array size to x86_cpu_vendor_words2str()
  i386: host_vendor_fms() helper function
  i386/kvm: Blacklist TSX on known broken hosts
  pc: Add 2.9 machine-types
  i386: Change stepping of Haswell to non-blacklisted value

 include/hw/i386/pc.h |  6 ++++++
 target/i386/cpu.h    |  1 +
 hw/i386/pc_piix.c    | 15 ++++++++++++---
 hw/i386/pc_q35.c     | 13 +++++++++++--
 target/i386/cpu.c    | 32 ++++++++++++++++++++++----------
 target/i386/kvm.c    | 17 +++++++++++++++++
 6 files changed, 69 insertions(+), 15 deletions(-)
-- 2.11.0.259.g40922b1
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
-- Eduardo