Re: [libvirt] [PATCH 1/6] kvm: Add KVM_GET_EMULATED_CPUID

On Tue, Sep 24, 2013 at 01:04:14PM +0300, Gleb Natapov wrote:
On Tue, Sep 24, 2013 at 11:57:00AM +0200, Borislav Petkov wrote:
On Mon, September 23, 2013 6:28 pm, Eduardo Habkost wrote:
On Sun, Sep 22, 2013 at 04:44:50PM +0200, Borislav Petkov wrote:
From: Borislav Petkov <bp@suse.de>
Add a kvm ioctl which states which system functionality kvm emulates. The format used is that of CPUID and we return the corresponding CPUID bits set for which we do emulate functionality.
Let me check if I understood the purpose of the new ioctl correctly: the only reason for GET_EMULATED_CPUID to exist is to allow userspace to differentiate features that are native or that are emulated efficiently (GET_SUPPORTED_CPUID) and features that are emulated not very efficiently (GET_EMULATED_CPUID)?
Not only that - emulated features are not reported in CPUID so they can be enabled only when specifically and explicitly requested, i.e. "+movbe". Basically, you want to emulate that feature for the guest but only for this specific guest - the others shouldn't see it.
Then we may have a problem: some CPU models already have "movbe" included (e.g. Haswell), and patch 6/6 will make "-cpu Haswell" get movbe enabled even if it is being emulated. So if we really want to avoid enabling emulated features by mistake, we may need a new CPU flag in addition to "enforce" to tell QEMU that it is OK to enable emulated features (maybe "-cpu ...,emulate"?).
If that's the case, how do we decide how efficient emulation should be, to deserve inclusion in GET_SUPPORTED_CPUID? I am guessing that the criterion will be: if enabling it doesn't risk making performance worse, it can get in GET_SUPPORTED_CPUID.
Well, in the MOVBE case, supported means, the host can execute this instruction natively. Now, you guys say you can emulate x2apic very efficiently and I'm guessing emulating x2apic doesn't bring any emulation overhead, thus SUPPORTED_CPUID.
x2apic emulation has nothing to do with x2apic in a host. It is emulated same way no matter if host has it or not. x2apic is not really cpu feature, but apic one and apic is fully emulated by KVM anyway.
But my question still stands: suppose we had x2apic emulation implemented but for some reason it was painfully slow, we wouldn't want to enable it by mistake. In this case, it would end up on EMULATED_CPUID and not on SUPPORTED_CPUID, right?
But for single instructions or group of instructions, the distinction should be very clear.
At least this is how I see it but Gleb probably can comment too.
That's how I see it two. Basically you want to use movbe emulation (as opposite of virtualization) only if you have binary kernel that compiled for CPU with movbe (Borislav's use case), or you want to migrate temporarily from movbe enabled host to non movbe host because downtime is not an option. We should avoid enabling it "by mistake".
"we should avoid enabling it 'by mistake'" sounds like a good criterion for including something on GET_EMULATED_CPUID instead of GET_SUPPORTED_CPUID. In that case, I believe QEMU should use GET_EMULATED_CPUID only if explicitly requested in the configuration/command-line (that's not what patch 6/6 does). -- Eduardo

On Thu, Sep 26, 2013 at 11:19:15AM -0300, Eduardo Habkost wrote:
Then we may have a problem: some CPU models already have "movbe" included (e.g. Haswell), and patch 6/6 will make "-cpu Haswell" get movbe enabled even if it is being emulated.
Huh? HSW has MOVBE so we won't #UD on it and MOVBE will get executed in hardware when executing the guest. IOW, we'll never get to the emulation path of piggybacking on the #UD.
So if we really want to avoid enabling emulated features by mistake, we may need a new CPU flag in addition to "enforce" to tell QEMU that it is OK to enable emulated features (maybe "-cpu ...,emulate"?).
EMULATED_CPUID are off by default and only if you request them specifically, they get enabled. If you start with "-cpu Haswell", MOVBE will be already set in the host CPUID. Or am I missing something?
But my question still stands: suppose we had x2apic emulation implemented but for some reason it was painfully slow, we wouldn't want to enable it by mistake. In this case, it would end up on EMULATED_CPUID and not on SUPPORTED_CPUID, right?
IMHO we want to enable emulation only when explicitly requested... regardless of the emulation performance. Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. --

On Thu, Sep 26, 2013 at 08:55:24PM +0200, Borislav Petkov wrote:
On Thu, Sep 26, 2013 at 11:19:15AM -0300, Eduardo Habkost wrote:
Then we may have a problem: some CPU models already have "movbe" included (e.g. Haswell), and patch 6/6 will make "-cpu Haswell" get movbe enabled even if it is being emulated.
Huh? HSW has MOVBE so we won't #UD on it and MOVBE will get executed in hardware when executing the guest. IOW, we'll never get to the emulation path of piggybacking on the #UD.
So if we really want to avoid enabling emulated features by mistake, we may need a new CPU flag in addition to "enforce" to tell QEMU that it is OK to enable emulated features (maybe "-cpu ...,emulate"?).
EMULATED_CPUID are off by default and only if you request them specifically, they get enabled.
Please point me to the code that does this, because I don't see it on patch 6/6.
If you start with "-cpu Haswell", MOVBE will be already set in the host CPUID.
Or am I missing something?
In the Haswell example, it is unlikely but possible in theory: you would need a CPU that supported all features from Haswell except movbe. But what will happen if you are using "-cpu n270,enforce" on a SandyBridge host? Also, we don't know anything about future CPUs or future features that will end up on EMULATED_CPUID. The current code doesn't have anything to differentiate features that were already included in the CPU definition and ones explicitly enabled in the command-line (and I would like to keep it that way). And just because a feature was explicitly enabled in the command-line, that doesn't mean the user believe it is acceptable to get it running in emulated mode. That's why I propose a new "emulate" flag, to allow features to be enabled in emulated mode.
But my question still stands: suppose we had x2apic emulation implemented but for some reason it was painfully slow, we wouldn't want to enable it by mistake. In this case, it would end up on EMULATED_CPUID and not on SUPPORTED_CPUID, right?
IMHO we want to enable emulation only when explicitly requested... regardless of the emulation performance.
Well, x2apic is emulated by KVM, and it is on SUPPORTED_CPUID. Ditto for tsc-deadline. Or are you talking specifically about instruction emulation? -- Eduardo

On Thu, Sep 26, 2013 at 04:20:59PM -0300, Eduardo Habkost wrote:
Please point me to the code that does this, because I don't see it on patch 6/6.
@@ -1850,7 +1850,14 @@ static void filter_features_for_kvm(X86CPU *cpu) wi->cpuid_ecx, wi->cpuid_reg); uint32_t requested_features = env->features[w]; + + uint32_t emul_features = kvm_arch_get_emulated_cpuid(s, wi->cpuid_eax, + wi->cpuid_ecx, + wi->cpuid_reg); + env->features[w] &= host_feat; + env->features[w] |= (requested_features & emul_features); Basically we give the requested_features a second chance here. If we don't request an emulated feature, it won't get enabled.
If you start with "-cpu Haswell", MOVBE will be already set in the host CPUID.
Or am I missing something?
In the Haswell example, it is unlikely but possible in theory: you would need a CPU that supported all features from Haswell except movbe. But what will happen if you are using "-cpu n270,enforce" on a SandyBridge host?
That's an interesting question: AFAICT, it will fail because MOVBE is not available on the host, right? And if so, then this is correct behavior IMHO, or how exactly is the "enforce" thing supposed to work? Enforce host CPUID?
Also, we don't know anything about future CPUs or future features that will end up on EMULATED_CPUID. The current code doesn't have anything to differentiate features that were already included in the CPU definition and ones explicitly enabled in the command-line (and I would like to keep it that way).
Ok.
And just because a feature was explicitly enabled in the command-line, that doesn't mean the user believe it is acceptable to get it running in emulated mode. That's why I propose a new "emulate" flag, to allow features to be enabled in emulated mode.
And I think, saying "-cpu ...,+movbe" is an explicit statement enough to say that yes, I am starting this guest and I want MOVBE emulation.
Well, x2apic is emulated by KVM, and it is on SUPPORTED_CPUID. Ditto for tsc-deadline. Or are you talking specifically about instruction emulation?
Basically, I'm viewing this from a very practical standpoint - if I build a kernel which requires MOVBE support but I cannot boot it in kvm because it doesn't emulate MOVBE (TCG does now but it didn't before) I'd like to be able to address that shortcoming by emulating that instruction, if possible. And the whole discussion grew out from the standpoint of being able to emulate stuff so that you can do quick and dirty booting of kernels but not show that emulation capability to the wide audience since it is slow and it shouldn't be used and then migration has issues, etc, etc. But hey, I don't really care all that much if I have to also say -emulate in order to get my functionality. Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. --

On Thu, Sep 26, 2013 at 10:32:06PM +0200, Borislav Petkov wrote:
On Thu, Sep 26, 2013 at 04:20:59PM -0300, Eduardo Habkost wrote:
Please point me to the code that does this, because I don't see it on patch 6/6.
@@ -1850,7 +1850,14 @@ static void filter_features_for_kvm(X86CPU *cpu) wi->cpuid_ecx, wi->cpuid_reg); uint32_t requested_features = env->features[w]; + + uint32_t emul_features = kvm_arch_get_emulated_cpuid(s, wi->cpuid_eax, + wi->cpuid_ecx, + wi->cpuid_reg); + env->features[w] &= host_feat; + env->features[w] |= (requested_features & emul_features);
Basically we give the requested_features a second chance here.
If we don't request an emulated feature, it won't get enabled.
The problem here is that "requested_features" doesn't include just the explicit "+flag" flags, but any flag included in the CPU model definition. See the "-cpu n270" example below.
If you start with "-cpu Haswell", MOVBE will be already set in the host CPUID.
Or am I missing something?
In the Haswell example, it is unlikely but possible in theory: you would need a CPU that supported all features from Haswell except movbe. But what will happen if you are using "-cpu n270,enforce" on a SandyBridge host?
That's an interesting question: AFAICT, it will fail because MOVBE is not available on the host, right?
It should, but your patch will make it stop failing because of MOVBE, as now it can be emulated[1].
And if so, then this is correct behavior IMHO, or how exactly is the "enforce" thing supposed to work? Enforce host CPUID?
"enforce" makes sure all features are really being enabled. It makes QEMU abort if there's any feature that can't be enabled on that host. [1] Maybe one source of confusion is that the existing code have two feature-filtering functions doing basically the same thing: filter_features_for_kvm() and kvm_check_features_against_host(). That's something we must clean up, and they should be unified. "enforce" should become synonymous to "make sure filtered_features is all zeroes". This way, libvirt can emulate what 'enforce" does while being able to collect detailed error information (which is not easy to do if QEMU simply aborts).
Also, we don't know anything about future CPUs or future features that will end up on EMULATED_CPUID. The current code doesn't have anything to differentiate features that were already included in the CPU definition and ones explicitly enabled in the command-line (and I would like to keep it that way).
Ok.
And just because a feature was explicitly enabled in the command-line, that doesn't mean the user believe it is acceptable to get it running in emulated mode. That's why I propose a new "emulate" flag, to allow features to be enabled in emulated mode.
And I think, saying "-cpu ...,+movbe" is an explicit statement enough to say that yes, I am starting this guest and I want MOVBE emulation.
Not necessarily. libvirt has some code that will translate its own CPU model definition to a "-cpu Model,+flag,+flag,+flag,-flag" command-line when necessary. It is by design that there is no difference between explicit "+flag" options and existing flags from the CPU model definition.
Well, x2apic is emulated by KVM, and it is on SUPPORTED_CPUID. Ditto for tsc-deadline. Or are you talking specifically about instruction emulation?
Basically, I'm viewing this from a very practical standpoint - if I build a kernel which requires MOVBE support but I cannot boot it in kvm because it doesn't emulate MOVBE (TCG does now but it didn't before) I'd like to be able to address that shortcoming by emulating that instruction, if possible.
And the whole discussion grew out from the standpoint of being able to emulate stuff so that you can do quick and dirty booting of kernels but not show that emulation capability to the wide audience since it is slow and it shouldn't be used and then migration has issues, etc, etc.
But hey, I don't really care all that much if I have to also say -emulate in order to get my functionality.
OK, I undestand your use case, now. Thanks for your explanation. -- Eduardo

On Fri, Sep 27, 2013 at 11:21:34AM -0300, Eduardo Habkost wrote:
The problem here is that "requested_features" doesn't include just the explicit "+flag" flags, but any flag included in the CPU model definition. See the "-cpu n270" example below.
Oh, you mean if requested_features would contain a flag included from the CPU model definition - a flag which we haven't requested explicitly - and if kvm emulates that flag, then it will get enabled? Hmm.
It should, but your patch will make it stop failing because of MOVBE, as now it can be emulated[1].
Right.
"enforce" makes sure all features are really being enabled. It makes QEMU abort if there's any feature that can't be enabled on that host.
Ok.
[1] Maybe one source of confusion is that the existing code have two feature-filtering functions doing basically the same thing: filter_features_for_kvm() and kvm_check_features_against_host(). That's
Yes, and the first gets executed unconditionally and does the feature filtering, right after the second has run in the kvm_enabled() branch.
something we must clean up, and they should be unified. "enforce" should become synonymous to "make sure filtered_features is all zeroes". This way, libvirt can emulate what 'enforce" does while being able to collect detailed error information (which is not easy to do if QEMU simply aborts).
Ok, maybe someone who's more knowledgeable with this code should do it - not me :) Also, there's another aspect, while we're here: now that QEMU emulates MOVBE with TCG too, how do we specify on the command line, which emulation should be used - kvm.ko or QEMU? Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. --

On Sat, Sep 28, 2013 at 12:49:04PM +0200, Borislav Petkov wrote:
On Fri, Sep 27, 2013 at 11:21:34AM -0300, Eduardo Habkost wrote:
The problem here is that "requested_features" doesn't include just the explicit "+flag" flags, but any flag included in the CPU model definition. See the "-cpu n270" example below.
Oh, you mean if requested_features would contain a flag included from the CPU model definition - a flag which we haven't requested explicitly - and if kvm emulates that flag, then it will get enabled?
Exactly. The code needs to filter/check all feature bits on the CPU, not just the ones requested explicitly in the command-line. [...]
[1] Maybe one source of confusion is that the existing code have two feature-filtering functions doing basically the same thing: filter_features_for_kvm() and kvm_check_features_against_host(). That's
Yes, and the first gets executed unconditionally and does the feature filtering, right after the second has run in the kvm_enabled() branch.
This should be fixed, too: eventually "enforce" should work on TCG mode as well.
something we must clean up, and they should be unified. "enforce" should become synonymous to "make sure filtered_features is all zeroes". This way, libvirt can emulate what 'enforce" does while being able to collect detailed error information (which is not easy to do if QEMU simply aborts).
Ok, maybe someone who's more knowledgeable with this code should do it - not me :)
I have added it to my TODO-list. :-)
Also, there's another aspect, while we're here: now that QEMU emulates MOVBE with TCG too, how do we specify on the command line, which emulation should be used - kvm.ko or QEMU?
You can use accel={tcg,kvm} option on the "-machine" argument, e.g. "-machine pc,accel=kvm". Or the "-enable-kvm" option. -- Eduardo

On Mon, Sep 30, 2013 at 01:13:34PM -0300, Eduardo Habkost wrote:
I have added it to my TODO-list. :-)
Cool, thanks. Let me know if I can test stuff and help out somehow.
Also, there's another aspect, while we're here: now that QEMU emulates MOVBE with TCG too, how do we specify on the command line, which emulation should be used - kvm.ko or QEMU?
You can use accel={tcg,kvm} option on the "-machine" argument, e.g. "-machine pc,accel=kvm". Or the "-enable-kvm" option.
Ah, ok. Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. --
participants (2)
-
Borislav Petkov
-
Eduardo Habkost