On Fri, Dec 19, 2014 at 09:43:14AM +0100, Guido Günther wrote:
> On Tue, Dec 16, 2014 at 02:47:24PM +0000, Daniel P. Berrange wrote:
> > On Tue, Dec 16, 2014 at 03:37:06PM +0100, Guido Günther wrote:
> > > On Tue, Dec 16, 2014 at 12:40:26PM +0000, Daniel P. Berrange wrote:
> > > > On Tue, Dec 16, 2014 at 01:33:15PM +0100, Guido Günther wrote:
> > > > > The intel-microcode 3.20140913.1 update disables TSX-NI (transactional
> > > > > memory instructions). When a server running libvirt is rebooted with
> > > > > this update, libvirt no longer considers the machine to have a Haswell
> > > > > CPU:
> > > > >
> > > > > # virsh capabilities | grep -A1 '<arch>x86_64'
> > > > > <arch>x86_64</arch>
> > > > > <model>SandyBridge</model>
> > > > >
> > > > > Since Intel disables the feature on their CPUs we shouldn't check for it
> > > > > as well to keep VMs using Haswell working.
> > > > >
> > > > > This was debugged and reported by Chris Boot at
> > > > >
> > > > > http://bugs.debian.org/773189
> > > > > ---
> > > > > src/cpu/cpu_map.xml | 1 -
> > > > > 1 file changed, 1 deletion(-)
> > > > >
> > > > > diff --git a/src/cpu/cpu_map.xml b/src/cpu/cpu_map.xml
> > > > > index bd9b056..f41dbce 100644
> > > > > --- a/src/cpu/cpu_map.xml
> > > > > +++ b/src/cpu/cpu_map.xml
> > > > > @@ -507,7 +507,6 @@
> > > > > <feature name='movbe'/>
> > > > > <feature name='fsgsbase'/>
> > > > > <feature name='bmi1'/>
> > > > > - <feature name='hle'/>
> > > > > <feature name='avx2'/>
> > > > > <feature name='smep'/>
> > > > > <feature name='bmi2'/>
> > > >
> > > > NACK. We cannot change existing models in cpu_map.xml because that
> > > > results in a guest ABI change.
> > >
> > > But Intel changed the ABI as well by removing hle so what would we do
> > > to move forward? Introduce another model? This would still break
> > > systems with the microcode update.
> >
> > Simply do nothing. There is no requirement that guest CPU model names
> > match the host CPU you are running on. A guest CPU model name is simply
> > a synonym for a collection of features. If libvirt reports 'SandyBridge'
> > for the host CPU model, when it is in fact 'Haswell' that's not a serious
> > problem, because the key fact is that the feature bits are correctly
> > detected regardless of what name is reported.
>
> But this means that VMs that have e.g.
>
>     <cpu mode='custom' match='exact'>
>       <model fallback='forbid'>Haswell</model>
>       <vendor>Intel</vendor>
>       ...
>     </cpu>
>
> refuse to start after the microcode upgrade like
>
>     error: Failed to start domain foo
>     error: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: hle
>
> Following your argument this makes a lot of sense since the feature
> isn't in the set of supported features anymore.

In particular this ensures that the correct behaviour happens during
live migration. ie libvirt will refuse to migrate a guest from a
CPU without the microcode, over to a CPU with the microcode, since
that target host would not be able to support the features the guest
is running with.
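
As a rough illustration of that check (a sketch only, not libvirt's actual
implementation; the function name and the abbreviated feature sets are
made up for the example), a host is compatible only if it provides every
feature the guest requires:

```python
def missing_features(guest_required, host_provided):
    """Return the guest features the host cannot provide, sorted by name."""
    return sorted(set(guest_required) - set(host_provided))


# Hypothetical, abbreviated feature sets for illustration only.
guest = {"movbe", "fsgsbase", "bmi1", "hle", "avx2", "smep", "bmi2"}
host_with_microcode = guest - {"hle"}  # the microcode update removes hle

missing = missing_features(guest, host_with_microcode)
if missing:
    # Refuse to start or migrate: the target cannot honour the guest ABI.
    print("Host CPU does not provide required features: " + ", ".join(missing))
```

The same test applies at start-up and at migration time: only the host
feature set being compared against changes.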

> From a user's perspective it's confusing since they have a Haswell CPU
> built in still. So should we define a Haswell2 cpu type without hle so
> users can select it in case they don't want to use 'host-model'?

I'd really recommend that they just copy the entire <cpu> block
out of the host capabilities XML into the guest XML if they want
to have a matching host CPU model.
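
A minimal sketch of pulling that block out programmatically, assuming the
capabilities document has the host CPU at <host>/<cpu>. The sample XML is
an abbreviated, made-up stand-in for real `virsh capabilities` output, and
the helper name is hypothetical:

```python
import xml.etree.ElementTree as ET

# Abbreviated, made-up stand-in for `virsh capabilities` output.
SAMPLE_CAPABILITIES = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <feature name='avx2'/>
      <feature name='bmi2'/>
    </cpu>
  </host>
</capabilities>
"""


def host_cpu_block(capabilities_xml):
    """Return the host <cpu> element of a capabilities document as XML text."""
    root = ET.fromstring(capabilities_xml)
    cpu = root.find("./host/cpu")
    if cpu is None:
        raise ValueError("no <host>/<cpu> element found")
    # tostring() also emits trailing whitespace after the element; strip it.
    return ET.tostring(cpu, encoding="unicode").strip()


print(host_cpu_block(SAMPLE_CAPABILITIES))
```

The printed block is only the starting point; it still needs to be merged
into the guest definition by hand or with further tooling.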
There is some long term work going on to try and get CPU models
that can be versioned against machine types, so that we have a
way to fix these kinds of messes in the future, without adding
many new CPUs with a suffix of '2'. This isn't the first mistake
we have in this area - there are quite a few others which already
exist with CPU models having the wrong features compared to the
physical silicon.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|