On Mon, Dec 14, 2015 at 07:17:27PM -0500, Raj, Ashok wrote:
> On Mon, Dec 14, 2015 at 11:37:16PM +0100, Borislav Petkov wrote:
>> On Mon, Dec 14, 2015 at 02:11:46PM -0500, Raj, Ashok wrote:
>>> This is mostly harmless, since the MCG_CAP space is shared and has no
>>> conflicts between vendors. Also, just having the CAP bit set has no effect.
>>
>> Of course it does: we check SER_P in machine_check_poll(), and when
>> I emulate an AMD guest and inject errors into it, error handling is
>> obviously wrong, see:
>>
>> https://lkml.kernel.org/r/20151123150355.GE5134@pd.tnic
>
> I can see how this hurts, since the poller isn't doing any
> CPU-model-specific checks.
>
> In the LMCE case, even if you advertise MCG_LMCE_P in MCG_CAP, the guest
> kernel calls intel_init_lmce() only from mce_intel.c, i.e. only on Intel
> CPUs, so the same problem won't happen.
>
> But the issue Eduardo mentioned seems to be the following:
> LMCE-capable QEMU + LMCE-capable KVM + LMCE-aware guest: no problem.
> But if you were to migrate the LMCE-enabled guest to a KVM host that
> doesn't support LMCE, we could run into an issue.
> Is this the compatibility issue you were looking to fix, Eduardo?

If I understood you correctly, yes. Also, note that kvm_arch_init_vcpu()
currently just warns about missing capabilities instead of preventing
the VM from running or migrating (as it should). We need to change that,
and figure out a good way to report "feature FOO can't be enabled on
this host" errors to management software[1]. The main problem is that
we don't even have a QMP console available anymore if machine
initialization is aborted.

CCing libvir-list to keep them in the loop.

[1] This is similar to what we need for CPUID checks, but the new MCE
    feature means we need something more generic (probably something
    that just reports QOM property names?).
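Footnote [1] suggests reporting property names rather than raw bits. A sketch of that translation step, assuming a hypothetical table from MCG_CAP bits to property names: "mcg-ser-p" and "lmce" are placeholders, not confirmed QOM property names, and format_missing_props() is illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MCG_SER_P  (1ULL << 24)
#define MCG_LMCE_P (1ULL << 27)

/* Hypothetical mapping from MCG_CAP bits to user-visible property names;
 * the names are placeholders, not confirmed QOM property names. */
struct mcg_feature_name {
    uint64_t bit;
    const char *prop;
};

static const struct mcg_feature_name mcg_feature_names[] = {
    { MCG_SER_P,  "mcg-ser-p" },
    { MCG_LMCE_P, "lmce" },
};

/*
 * Write a comma-separated list of the property names whose bits are set
 * in 'missing' into buf (always NUL-terminated). Stops early, without
 * writing a partial name, if buf is too small. Returns the number of
 * names written.
 */
static size_t format_missing_props(uint64_t missing, char *buf, size_t len)
{
    size_t count = 0, used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(mcg_feature_names) / sizeof(mcg_feature_names[0]); i++) {
        const char *sep = count ? "," : "";
        const char *prop = mcg_feature_names[i].prop;
        size_t need;

        if (!(missing & mcg_feature_names[i].bit))
            continue;
        need = strlen(sep) + strlen(prop);
        if (used + need >= len)   /* no room for name plus NUL: stop here */
            break;
        memcpy(buf + used, sep, strlen(sep));
        memcpy(buf + used + strlen(sep), prop, strlen(prop) + 1); /* copies NUL */
        used += need;
        count++;
    }
    return count;
}
```

Management software would then see something like "lmce" instead of a hex MCG_CAP mask, which it can map back to the option it passed on the command line.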
--
Eduardo