[libvirt] [PATCH] Drop hle feature for Haswell CPUs

The intel-microcode 3.20140913.1 update disables TSX-NI (transactional
memory instructions). When a server running libvirt is rebooted with
this update, libvirt no longer considers the machine to have a Haswell
CPU:

  # virsh capabilities | grep -A1 '<arch>x86_64'
      <arch>x86_64</arch>
      <model>SandyBridge</model>

Since Intel disables the feature on their CPUs, we shouldn't check for
it either, to keep VMs using Haswell working.

This was debugged and reported by Chris Boot at

    http://bugs.debian.org/773189
---
 src/cpu/cpu_map.xml | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/cpu/cpu_map.xml b/src/cpu/cpu_map.xml
index bd9b056..f41dbce 100644
--- a/src/cpu/cpu_map.xml
+++ b/src/cpu/cpu_map.xml
@@ -507,7 +507,6 @@
       <feature name='movbe'/>
       <feature name='fsgsbase'/>
       <feature name='bmi1'/>
-      <feature name='hle'/>
       <feature name='avx2'/>
       <feature name='smep'/>
       <feature name='bmi2'/>
-- 
2.1.3

On Tue, Dec 16, 2014 at 01:33:15PM +0100, Guido Günther wrote:
> Since Intel disables the feature on their CPUs, we shouldn't check for
> it either, to keep VMs using Haswell working.
> [...]
> -      <feature name='hle'/>

NACK. We cannot change existing models in cpu_map.xml because that
results in a guest ABI change.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

On Tue, Dec 16, 2014 at 12:40:26PM +0000, Daniel P. Berrange wrote:
> NACK. We cannot change existing models in cpu_map.xml because that
> results in a guest ABI change.

But Intel changed the ABI as well by removing hle, so what should we do
to move forward? Introduce another model? That would still break systems
with the microcode update.

Cheers,
 -- Guido

On Tue, Dec 16, 2014 at 03:37:06PM +0100, Guido Günther wrote:
> But Intel changed the ABI as well by removing hle, so what should we do
> to move forward? Introduce another model? That would still break systems
> with the microcode update.

Simply do nothing. There is no requirement that guest CPU model names
match the host CPU you are running on. A guest CPU model name is simply
a synonym for a collection of features. If libvirt reports 'SandyBridge'
for the host CPU model when it is in fact 'Haswell', that's not a
serious problem, because the key fact is that the feature bits are
correctly detected regardless of what name is reported.

Regards,
Daniel
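
A minimal Python sketch of the "model name is a synonym for a set of features" idea. It assumes a simplified selection rule (report the known model with the most features that the host fully provides, plus any leftover features); libvirt's real detection logic is more involved. The feature sets are heavily abbreviated for illustration and are not the real cpu_map.xml lists, and `describe_host` is a hypothetical helper, not a libvirt API.

```python
# Illustrative, abbreviated feature sets. NOT the real cpu_map.xml contents.
MODELS = {
    "SandyBridge": {"sse4.2", "aes", "avx"},
    "Haswell": {"sse4.2", "aes", "avx", "movbe", "fsgsbase", "bmi1",
                "hle", "avx2", "smep", "bmi2"},
}


def describe_host(host_features):
    """Return (model, extra_features) for the richest model the host fully provides.

    Hypothetical helper: a model "fits" only if every one of its features
    is present in the host's detected feature set.
    """
    candidates = [name for name, feats in MODELS.items() if feats <= host_features]
    if not candidates:
        raise ValueError("no known model fits this host")
    best = max(candidates, key=lambda name: len(MODELS[name]))
    return best, host_features - MODELS[best]


# The same physical CPU before and after the microcode update hides hle.
haswell_before = set(MODELS["Haswell"])
haswell_after = haswell_before - {"hle"}

model_before, extra_before = describe_host(haswell_before)
model_after, extra_after = describe_host(haswell_after)

print(model_before, sorted(extra_before))
print(model_after, sorted(extra_after))
```

Under these assumptions the host stops matching 'Haswell' once hle disappears and falls back to 'SandyBridge', yet every remaining feature bit is still reported alongside the model name.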

On Tue, Dec 16, 2014 at 02:47:24PM +0000, Daniel P. Berrange wrote:
> Simply do nothing. There is no requirement that guest CPU model names
> match the host CPU you are running on. A guest CPU model name is simply
> a synonym for a collection of features. If libvirt reports 'SandyBridge'
> for the host CPU model when it is in fact 'Haswell', that's not a
> serious problem, because the key fact is that the feature bits are
> correctly detected regardless of what name is reported.

But this means that VMs that have e.g.

  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>Haswell</model>
    <vendor>Intel</vendor>
    ...
  </cpu>

refuse to start after the microcode upgrade with an error like

  error: Failed to start domain foo
  error: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: hle

Following your argument this makes a lot of sense, since the feature
isn't in the set of supported features anymore. From a user's
perspective it's confusing, since they still have a Haswell CPU built
in. So should we define a Haswell2 CPU type without hle so users can
select it in case they don't want to use 'host-model'?

Cheers,
 -- Guido

On Fri, Dec 19, 2014 at 09:43:14AM +0100, Guido Günther wrote:
> But this means that VMs that have e.g.
>
>   <cpu mode='custom' match='exact'>
>     <model fallback='forbid'>Haswell</model>
>     <vendor>Intel</vendor>
>     ...
>   </cpu>
>
> refuse to start after the microcode upgrade with an error like
>
>   error: Failed to start domain foo
>   error: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: hle
>
> Following your argument this makes a lot of sense, since the feature
> isn't in the set of supported features anymore.

In particular this ensures that the correct behaviour happens during live migration. ie libvirt will refuse to migrate a guest from a CPU without the microcode, over to a CPU with the microcode, since that target host would not be able to support the features the guest is running with.
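
A minimal Python sketch of that migration rule, assuming the check reduces to a set comparison between the features the guest is running with and the features the target host provides. The helper names and the abbreviated feature sets are hypothetical and do not reflect libvirt's internal implementation.

```python
def missing_features(guest_features, target_host_features):
    """Features the guest relies on that the target host cannot provide."""
    return sorted(set(guest_features) - set(target_host_features))


def can_migrate(guest_features, target_host_features):
    """Allow migration only if the target provides every guest feature."""
    return not missing_features(guest_features, target_host_features)


# Abbreviated, illustrative feature sets (assumptions, not real host data).
guest = {"avx2", "bmi1", "bmi2", "hle"}            # started on a host without the microcode
host_without_microcode = {"avx2", "bmi1", "bmi2", "hle"}
host_with_microcode = {"avx2", "bmi1", "bmi2"}     # hle hidden by the update

ok_same = can_migrate(guest, host_without_microcode)
ok_updated = can_migrate(guest, host_with_microcode)
missing = missing_features(guest, host_with_microcode)

print(ok_same)     # migration between two hosts without the microcode is fine
print(ok_updated)  # migration to the updated host is refused
print(missing)     # the features that block it
```

Had hle been dropped from the existing Haswell model, a comparison like this would no longer see the difference between the two hosts for guests that are already running with hle.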

> From a user's perspective it's confusing, since they still have a
> Haswell CPU built in. So should we define a Haswell2 CPU type without
> hle so users can select it in case they don't want to use 'host-model'?

I'd really recommend that they just copy the entire <cpu> block out of
the host capabilities XML into the guest XML if they want to have a
matching host CPU model.

There is some long-term work going on to try to get CPU models that can
be versioned against machine types, so that we have a way to fix these
kinds of messes in the future without adding many new CPUs with a suffix
of '2'. This isn't the first mistake we have in this area: quite a few
others already exist, with CPU models having the wrong features compared
to the physical silicon.

Regards,
Daniel
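
A minimal Python sketch of pulling the host <cpu> element out of capabilities XML, as a starting point for the copy suggested above. The sample document is an abbreviated, hypothetical stand-in for real `virsh capabilities` output, and `extract_host_cpu` is an illustrative helper. Whether the extracted block needs adjusting before it is pasted into a domain definition is not covered here.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical stand-in for `virsh capabilities` output.
SAMPLE_CAPABILITIES = """\
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <feature name='avx2'/>
      <feature name='bmi1'/>
    </cpu>
  </host>
</capabilities>
"""


def extract_host_cpu(capabilities_xml):
    """Return the <host>/<cpu> element from a capabilities document."""
    root = ET.fromstring(capabilities_xml)
    cpu = root.find("./host/cpu")
    if cpu is None:
        raise ValueError("no <host>/<cpu> element found")
    return cpu


cpu = extract_host_cpu(SAMPLE_CAPABILITIES)
model = cpu.findtext("model")
features = [f.get("name") for f in cpu.findall("feature")]

print(model, features)
# Serialized element, ready to be reviewed before reuse in guest XML.
print(ET.tostring(cpu, encoding="unicode"))
```

On a real host the input would come from the capabilities XML rather than a hard-coded string.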

participants (2)
- Daniel P. Berrange
- Guido Günther