[libvirt] [RFC] Support for CPUID masking v2

Hi,

This is an attempt to provide similar flexibility to CPU ID masking without being x86-specific or unfriendly to users. As suggested by Dan, we need a way to specify both CPU flags and topology to achieve this goal.

Firstly, CPU topology and all (actually all that libvirt knows about) CPU features have to be advertised in host capabilities:

<host>
  <cpu>
    ...
    <features>
      <feature>NAME</feature>
    </features>
    <topology>
      <sockets>NUMBER_OF_SOCKETS</sockets>
      <cores>CORES_PER_SOCKET</cores>
      <threads>THREADS_PER_CORE</threads>
    </topology>
  </cpu>
  ...
</host>

I'm not 100% sure we should represent CPU features as <feature>NAME</feature>, especially because some features are currently advertised as <NAME/>. However, extending the XML schema every time a new feature is introduced doesn't look like a good idea at all. The problem is we can't get rid of <NAME/>-style features, which would result in redundancy:

<features>
  <vmx/>
  <feature>vmx</feature>
</features>

But I think it's better than changing the schema to add new features.

Secondly, drivers which support detailed CPU specification have to advertise it in guest capabilities. In case <features> are meant to be hypervisor features, then it could look like:

<guest>
  ...
  <features>
    <cpu/>
  </features>
</guest>

But if they are meant to be CPU features, we need to come up with something else:

<guest>
  ...
  <cpu_selection/>
</guest>

I'm not sure how to deal with the named CPUs suggested by Dan. Either we need to come up with a global set of named CPUs and document what they mean, or let drivers specify their own named CPUs and advertise them through guest capabilities:

<guest>
  ...
  <cpu model="NAME">
    <feature>NAME</feature>
    ...
  </cpu>
</guest>

The former approach would make matching named CPUs with those defined by a hypervisor (such as qemu) quite hard. The latter could bring the need to hardcode the features provided by specific CPU models or, in case we decide not to provide a list of features for each CPU model, it can complicate transferring a domain from one hypervisor to another.

And finally, the CPU may be configured in the domain XML configuration:

<domain>
  ...
  <cpu model="NAME">
    <topology>
      <sockets>NUMBER_OF_SOCKETS</sockets>
      <cores>CORES_PER_SOCKET</cores>
      <threads>THREADS_PER_CORE</threads>
    </topology>
    <feature name="NAME" mode="set|check" value="on|off"/>
  </cpu>
</domain>

Mode 'check' checks the physical CPU for the feature and refuses to start the domain if it doesn't match; the VCPU feature is set to the same value. Mode 'set' just sets the VCPU feature.

Final note: <topology> could also be called <cpu_topology> to avoid confusion with the NUMA <topology>, which is used in host capabilities. However, I prefer <cpu><topology>...</topology></cpu> over <cpu><cpu_topology>...</cpu_topology></cpu>.

Thanks for your comments.

Jirka
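To make the proposed host capabilities layout concrete, a filled-in example might look like the following; the machine size and the feature names are hypothetical, chosen only for illustration (a two-socket box with four cores per socket and two threads per core, i.e. 16 logical CPUs):

<host>
  <cpu>
    ...
    <features>
      <feature>sse2</feature>
      <feature>ssse3</feature>
      <feature>vmx</feature>
    </features>
    <topology>
      <sockets>2</sockets>
      <cores>4</cores>
      <threads>2</threads>
    </topology>
  </cpu>
  ...
</host>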

Hi,
This is an attempt to provide similar flexibility to CPU ID masking without being x86-specific and unfriendly to users. As suggested by Dan, we need a way to specify both CPU flags and topology to achieve this goal.
Firstly, CPU topology and all (actually all that libvirt knows about) CPU features have to be advertised in host capabilities:
<host> <cpu> ... <features> <feature>NAME</feature> </features> <topology> <sockets>NUMBER_OF_SOCKETS</sockets> <cores>CORES_PER_SOCKET</cores> <threads>THREADS_PER_CORE</threads> </topology> </cpu> ... </host>
I'm not 100% sure we should represent CPU features as <feature>NAME</feature> especially because some features are currently advertised as <NAME/>. However, extending XML schema every time a new feature is introduced doesn't look like a good idea at all. The problem is we can't get rid of <NAME/>-style features, which would result in redundancy:
<features> <vmx/> <feature>vmx</feature> </features>
But I think it's better than changing the schema to add new features.
Secondly, drivers which support detailed CPU specification have to advertise it in guest capabilities. In case <features> are meant to be hypervisor features, then it could look like:
<guest> ... <features> <cpu/> </features> </guest>
But if they are meant to be CPU features, we need to come up with something else:
<guest> ... <cpu_selection/> </guest>
I'm not sure how to deal with the named CPUs suggested by Dan. Either we need to come up with a global set of named CPUs and document what they mean, or let drivers specify their own named CPUs and advertise them through guest capabilities:
<guest> ... <cpu model="NAME"> <feature>NAME</feature> ... </cpu> </guest> [IH] You also need to support removing a feature from the base CPU model if it is disabled by the BIOS (like the nx flag).
The former approach would make matching named CPUs with those defined by a hypervisor (such as qemu) quite hard. The latter could bring the need for hardcoding features provided by specific CPU models or, in case we decide not to provide a list of features for each CPU model, it can complicate transferring a domain from one hypervisor to another.
And finally, CPU may be configured in domain XML configuration:
<domain> ... <cpu model="NAME"> <topology> <sockets>NUMBER_OF_SOCKETS</sockets> <cores>CORES_PER_SOCKET</cores> <threads>THREADS_PER_CORE</threads> </topology>
<feature name="NAME" mode="set|check" value="on|off"/> </cpu> </domain>
Mode 'check' checks the physical CPU for the feature and refuses to start the domain if it doesn't match; the VCPU feature is set to the same value. Mode 'set' just sets the VCPU feature.
Final note: <topology> could also be called <cpu_topology> to avoid confusion with NUMA <topology>, which is used in host capabilities. However, I prefer <cpu><topology>...</topology></cpu> over <cpu><cpu_topology>...</cpu_topology></cpu>.
Thanks for your comments.
Jirka

I'm not sure how to deal with the named CPUs suggested by Dan. Either we need to come up with a global set of named CPUs and document what they mean, or let drivers specify their own named CPUs and advertise them through guest capabilities: <guest> ... <cpu model="NAME"> <feature>NAME</feature> ... </cpu> </guest> [IH] You also need to support removing a feature from the base CPU model if it is disabled by the BIOS (like the nx flag).
Indeed, the above XML snippet describes capabilities, that is, which features are turned on by each model name. ...
And finally, CPU may be configured in domain XML configuration:
<domain> ... <cpu model="NAME"> <topology> <sockets>NUMBER_OF_SOCKETS</sockets> <cores>CORES_PER_SOCKET</cores> <threads>THREADS_PER_CORE</threads> </topology>
<feature name="NAME" mode="set|check" value="on|off"/> </cpu> </domain>
Mode 'check' checks the physical CPU for the feature and refuses to start the domain if it doesn't match; the VCPU feature is set to the same value. Mode 'set' just sets the VCPU feature.
While we are at it: when configuring a domain, you would use something like <cpu model="whatever"> <feature name="sse6" mode="set" value="off"/> </cpu> to turn off the 'sse6' feature which was turned on by selecting CPU model 'whatever'. Jirka

On Fri, Sep 04, 2009 at 04:58:25PM +0200, Jiri Denemark wrote:
Firstly, CPU topology and all (actually all that libvirt knows about) CPU features have to be advertised in host capabilities:
<host> <cpu> ... <features> <feature>NAME</feature> </features> <topology> <sockets>NUMBER_OF_SOCKETS</sockets> <cores>CORES_PER_SOCKET</cores> <threads>THREADS_PER_CORE</threads> </topology> </cpu> ... </host>
FWIW, we already have the host topology sockets/core/threads exposed in the virNodeInfo API / struct, though I don't see any harm in having it in the node capabilities XML too, particularly since we put NUMA topology in there.
I'm not 100% sure we should represent CPU features as <feature>NAME</feature> especially because some features are currently advertised as <NAME/>. However, extending XML schema every time a new feature is introduced doesn't look like a good idea at all. The problem is we can't get rid of <NAME/>-style features, which would result in redundancy:
<features> <vmx/> <feature>vmx</feature> </features>
But I think it's better than changing the schema to add new features.
I think we need more than just the features in the capabilities XML though. E.g., if an application wants to configure a guest with a CPU model of 'core2duo' then it needs to know whether the host OS is at least a 'core2duo' or a superset. In essence I think the host capabilities XML needs to be more closely aligned with your proposed guest XML, specifically including a base CPU model name, along with any additional features beyond the basic set provided by that model.

Which brings me neatly to the next question. The host capabilities XML for some random machine says the host CPU is a 'core2duo' + 'ssse3' + '3dnow'. There is a guest to be run with an XML config requesting 'pentium3' + 'ssse3' as a minimum requirement. Now pretend you are not a human who knows pentium3 is a subset of core2duo. How do we know whether it is possible to run the guest on that host?

We could say that we'll make 'virDomainCreate' just throw an error when you try to start a guest (or incoming migration, etc), but if we have a data center of many hosts, apps won't want to just try to start a guest on each host. They'll want some way to figure out equivalence between CPU + feature sets. Perhaps this suggests we want a virConnectCompareCPU(conn, "<guest cpu xml fragment>") which returns 0 if the CPU is not compatible (i.e. the host is a subset of the requested CPU), 1 if it is identical, or 2 if it is a superset. If we further declare that host capabilities for the CPU model follow the same schema as the guest XML for the CPU model, we can use this same API to test 2 separate hosts for equivalence and thus figure out the lowest common denominator between a set of hosts, and thus also what guests are available for that set of hosts.

For x86, this would require the libvirt internal driver to have an XML -> CPUID converter, but then we already need one of those if we have to implement this stuff for the Xen and VMWare drivers, so I don't see this as too bad. We also of course need a CPUID -> XML converter to populate the host capabilities XML. For all this I'm thinking we should have some basic external data files which map named CPUs to sets of CPUID features, and named flags to CPUID bits. Populate this with the set of CPUs QEMU knows about for now, and then we can extend this later simply by dropping in new data files.

Back to your question about duplication:
<features> <vmx/> <feature>vmx</feature> </features>
Just ignore the fact that we have vmx, pae + svm features defined for now. Focus on determining what XML schema we want to use consistently across host + guest for describing a CPU model + features. Once that's determined, we'll just fill in the legacy vmx/pae/svm features based off the data for the new format and recommend in the docs not to use the old style.
Secondly, drivers which support detailed CPU specification have to advertise it in guest capabilities. In case <features> are meant to be hypervisor features, then it could look like:
<guest> ... <features> <cpu/> </features> </guest>
But if they are meant to be CPU features, we need to come up with something else:
<guest> ... <cpu_selection/> </guest>
I'm not sure how to deal with the named CPUs suggested by Dan. Either we need to come up with a global set of named CPUs and document what they mean, or let drivers specify their own named CPUs and advertise them through guest capabilities:
<guest> ... <cpu model="NAME"> <feature>NAME</feature> ... </cpu> </guest>
The former approach would make matching named CPUs with those defined by a hypervisor (such as qemu) quite hard. The latter could bring the need for hardcoding features provided by specific CPU models or, in case we decide not to provide a list of features for each CPU model, it can complicate transferring a domain from one hypervisor to another.
As mentioned above I think we want to define a set of named CPU models that can be used across all drivers. For non-x86 we can just follow the standard CPU model names in QEMU. For x86, since there are so many possible models and new ones appearing all the time, I think we should define a set of CPU models starting from those in QEMU, but provide a way to add new models via data files defining the CPUID mapping.

Internally to libvirt we'll need bi-directional CPUID <-> model+feature converters to allow good support in all our drivers. Model+feature -> CPUID is easy - that's just a lookup. CPUID -> model+feature is harder. We'd need to iterate over all known models, and do a CPUID -> model+feature conversion for each model. Then pick the one that resulted in the fewest named features, which will probably be the newest CPU model. This will ensure the XML will always be the most concise.
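As a sketch of what such an external data file might look like; the element names and layout below are purely illustrative rather than an agreed format, and the CPUID bit positions should be double-checked against the vendor documentation before being relied on:

<cpus>
  <!-- named flags mapped to CPUID bits -->
  <feature name='sse'   function='0x00000001' register='edx' bit='25'/>
  <feature name='sse2'  function='0x00000001' register='edx' bit='26'/>
  <feature name='ssse3' function='0x00000001' register='ecx' bit='9'/>
  ...

  <!-- named CPU models defined as sets of named features -->
  <model name='pentium3'>
    <feature name='sse'/>
    ...
  </model>
  <model name='core2duo'>
    <feature name='sse2'/>
    <feature name='ssse3'/>
    ...
  </model>
</cpus>

With data like this, model+feature -> CPUID is just the union of the listed bits, and CPUID -> model+feature picks whichever model leaves the fewest leftover feature bits to list explicitly.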
And finally, CPU may be configured in domain XML configuration:
<domain> ... <cpu model="NAME"> <topology> <sockets>NUMBER_OF_SOCKETS</sockets> <cores>CORES_PER_SOCKET</cores> <threads>THREADS_PER_CORE</threads> </topology>
This bit about topology looks just fine.
<feature name="NAME" mode="set|check" value="on|off"/> </cpu> </domain>
Mode 'check' checks the physical CPU for the feature and refuses to start the domain if it doesn't match; the VCPU feature is set to the same value. Mode 'set' just sets the VCPU feature.
The <feature> bit is probably a little too verbose for my liking. How about:

<feature name='ssse3' policy='XXX'/>

with 'policy' allowing one of:

- 'force' - set to '1', even if the host doesn't have it
- 'require' - set to '1', fail if the host doesn't have it
- 'optional' - set to '1', only if the host has it
- 'disable' - set to '0', even if the host has it
- 'forbid' - set to '0', fail if the host has it

'force' is unlikely to be used but it's there for completeness, since Xen and VMWare allow it. 'forbid' is for cases where you disable the CPUID bit but a guest may still try to access the feature anyway and you don't want it to succeed - if you used 'disable' the guest could still try to use the feature if the host supported it, even if masked out in CPUID. The final complication is the 'optional' flag here. If we set it to 'optional' and we boot the guest on a host that has this feature, then when trying to migrate this in essence becomes a 'require' feature flag, since you can't take it away from a running guest.
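Put together, a guest <cpu> element using this policy scheme might read as follows; the feature names are just ones already mentioned in this thread and the combination is only meant to show one use of each policy, not a sensible real-world config:

<cpu model='core2duo'>
  <feature name='ssse3' policy='require'/>
  <feature name='3dnow' policy='optional'/>
  <feature name='sse3'  policy='force'/>
  <feature name='nx'    policy='disable'/>
  <feature name='vmx'   policy='forbid'/>
</cpu>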
Final note: <topology> could also be called <cpu_topology> to avoid confusion with NUMA <topology>, which is used in host capabilities. However, I prefer <cpu><topology>...</topology></cpu> over <cpu><cpu_topology>...</cpu_topology></cpu>.
<cpu_topology> is redundant naming - the context within a <cpu> tag is more than sufficient to distinguish it from the host capabilities NUMA topology when using <topology>.

Finally, throughout this discussion I'm assuming that for non-x86 archs we'll merely use the named CPU model and not bother about any features or flags beyond this - just strict equivalence... until someone who cares enough about those archs complains.

Daniel

I'm not 100% sure we should represent CPU features as <feature>NAME</feature> especially because some features are currently advertised as <NAME/>. However, extending XML schema every time a new feature is introduced doesn't look like a good idea at all. The problem is we can't get rid of <NAME/>-style features, which would result in redundancy:
<features> <vmx/> <feature>vmx</feature> </features>
But I think it's better than changing the schema to add new features.
I think we need more than just the features in the capabilities XML though. E.g., if an application wants to configure a guest with a CPU model of 'core2duo' then it needs to know whether the host OS is at least a 'core2duo' or a superset.
In essence I think the host capabilities XML needs to be more closely aligned with your proposed guest XML, specifically including a base CPU model name, along with any additional features beyond the basic set provided by that model.
Which brings me neatly to the next question
The host capabilities XML for some random machine says the host CPU is a 'core2duo' + 'ssse3' + '3dnow'.
There is a guest to be run with a XML config requesting 'pentium3' + 'ssse3' as a minimum requirement.
Now pretend you are not a human who knows pentium3 is a sub-set of core2duo. How do we know whether it is possible to run the guest on that host ?
When I was proposing this, I thought of the CPU name as just a shortcut to a set of features. That is, you could see if pentium3 is a subset of core2duo by just translating both into lists of features and comparing them. Thanks for your comments. Jirka

On Tue, Sep 22, 2009 at 05:51:02PM +0200, Jiri Denemark wrote:
I'm not 100% sure we should represent CPU features as <feature>NAME</feature> especially because some features are currently advertised as <NAME/>. However, extending XML schema every time a new feature is introduced doesn't look like a good idea at all. The problem is we can't get rid of <NAME/>-style features, which would result in redundancy:
<features> <vmx/> <feature>vmx</feature> </features>
But I think it's better than changing the schema to add new features.
I think we need more than just the features in the capabilities XML though. E.g., if an application wants to configure a guest with a CPU model of 'core2duo' then it needs to know whether the host OS is at least a 'core2duo' or a superset.
In essence I think the host capabilities XML needs to be more closely aligned with your proposed guest XML, specifically including a base CPU model name, along with any additional features beyond the basic set provided by that model.
Which brings me neatly to the next question
The host capabilities XML for some random machine says the host CPU is a 'core2duo' + 'ssse3' + '3dnow'.
There is a guest to be run with a XML config requesting 'pentium3' + 'ssse3' as a minimum requirement.
Now pretend you are not a human who knows pentium3 is a sub-set of core2duo. How do we know whether it is possible to run the guest on that host ?
When I was proposing this, I thought of the CPU name as just a shortcut to a set of features. That is, you could see if pentium3 is a subset of core2duo by just translating both into lists of features and comparing them.
True, but the issue with that is that it is an x86-specific concept. Non-x86 CPU models can't be decomposed into a list of features for comparison. So I reckon it's best to provide some explicit API or facility in libvirt to compare 2 CPU+feature descriptions for compatibility, so we can hide the x86-specific bits from applications.

Daniel
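As an illustration of that comparison idea, the 'pentium3' + 'ssse3' minimum requirement from the earlier example could be expressed as a CPU fragment like the one below and handed to the virConnectCompareCPU() call sketched earlier in the thread; against the 'core2duo' + 'ssse3' + '3dnow' host it would be expected to report a superset. The element names simply follow the guest <cpu> proposal; nothing here is a settled API:

<cpu model='pentium3'>
  <feature name='ssse3'/>
</cpu>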

[ Sending again as my mail from yesterday seems to not have gone out :-( ] On Fri, Sep 04, 2009 at 04:58:25PM +0200, Jiri Denemark wrote:
Hi,
This is an attempt to provide similar flexibility to CPU ID masking without being x86-specific and unfriendly to users. As suggested by Dan, we need a way to specify both CPU flags and topology to achieve this goal.
Right, thanks for trying to get this rolling :-)
Firstly, CPU topology and all (actually all that libvirt knows about) CPU features have to be advertised in host capabilities:
<host> <cpu> ... <features> <feature>NAME</feature> </features> <topology> <sockets>NUMBER_OF_SOCKETS</sockets> <cores>CORES_PER_SOCKET</cores> <threads>THREADS_PER_CORE</threads> </topology>
<topology sockets="x" cores="y" threads="z"/> would work too, and would leave the possibility to extend it in a completely different way later, by using subelements, if CPU architectures were to evolve drastically.
</cpu> ... </host>
I'm not 100% sure we should represent CPU features as <feature>NAME</feature> especially because some features are currently advertised as <NAME/>. However, extending XML schema every time a new feature is introduced doesn't look like a good idea at all. The problem is we can't get rid of <NAME/>-style features, which would result in redundancy:
<features> <vmx/> <feature>vmx</feature> </features>
I'm not afraid of that; it's not ideal, but since those are virtualization-related features, having them separated sounds fine. We just can't grow the schemas and parsing code to accommodate a different element for each different name.

IMHO the worst part is the definition of the names. First, there is gonna be a bunch of them, and second, their names, if you rely just on the procinfo output, may not be sufficient in the absolute. Registries are a nightmare by definition, and we should not add a registry of features in libvirt, nor try to assert any semantics for those names. So I'm afraid we are best off just sampling/dumping /proc/cpuinfo and leaving the mess to the kernel. The feature list will grow quite long but that's fine IMHO.
But I think it's better than changing the schema to add new features.
Yeah that's unmaintainable.
Secondly, drivers which support detailed CPU specification have to advertise it in guest capabilities. In case <features> are meant to be hypervisor features, then it could look like:
<guest> ... <features> <cpu/> </features> </guest>
Somehow we will get the same mess. I assume the QEMU interface can provide that list, right? I'm also wondering if it's not possibly dependent on the machine; I hope not, i.e. that the emulated CPU features are not also dependent on the emulated hardware...
But if they are meant to be CPU features, we need to come up with something else:
<guest> ... <cpu_selection/> </guest>
Something like:

<guest>
  <cpu model="foo">
    <features>fpu vme de pse tsc msr pae mce cx8 apic</features>
  </cpu>
  <cpu model="bar">
    <features>fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca</features>
  </cpu>
</guest>

hoping it doesn't go per machine!
I'm not sure how to deal with the named CPUs suggested by Dan. Either we need to come up with a global set of named CPUs and document what they mean, or let drivers specify their own named CPUs and advertise them through guest capabilities:
<guest> ... <cpu model="NAME"> <feature>NAME</feature> ... </cpu> </guest>
Again, I would not build the registry in libvirt itself, at least as a first approach; let the drivers provide them if available and expose them in the capabilities for the given guest type. If we really start to see duplication, then maybe we can provide a helper. We could certainly provide utility APIs in utils/ to extract the set of flags and topology information, but I would let the drivers be responsible for the list in the end.
The former approach would make matching named CPUs with those defined by a hypervisor (such as qemu) quite hard. The latter could bring the need for hardcoding features provided by specific CPU models or, in case we decide not to provide a list of features for each CPU model, it can complicate transferring a domain from one hypervisor to another.
And finally, CPU may be configured in domain XML configuration:
<domain> ... <cpu model="NAME"> <topology> <sockets>NUMBER_OF_SOCKETS</sockets> <cores>CORES_PER_SOCKET</cores> <threads>THREADS_PER_CORE</threads> </topology>
<topology sockets="x" cores="y" threads="z"/> might be better; in any case it should be kept consistent with the capabilities section format.
<feature name="NAME" mode="set|check" value="on|off"/> </cpu> </domain>
Mode 'check' checks the physical CPU for the feature and refuses to start the domain if it doesn't match; the VCPU feature is set to the same value. Mode 'set' just sets the VCPU feature.
Okay. I expect NAME, for the model or feature name, to come from the list shown by the capabilities, so that there is no need to guess in the process, and management apps can build a UI based on data available dynamically from libvirt (libvirt hopefully fetching it from the kernel or the hypervisor itself). With your follow-up mail, it seems mode="set|check" value="on|off" should really be sufficient.
Final note: <topology> could also be called <cpu_topology> to avoid confusion with NUMA <topology>, which is used in host capabilities. However, I prefer <cpu><topology>...</topology></cpu> over <cpu><cpu_topology>...</cpu_topology></cpu>.
Agreed,
Daniel

On Tue, Sep 22, 2009 at 02:41:08PM +0200, Daniel Veillard wrote:
I'm not 100% sure we should represent CPU features as <feature>NAME</feature> especially because some features are currently advertised as <NAME/>. However, extending XML schema every time a new feature is introduced doesn't look like a good idea at all. The problem is we can't get rid of <NAME/>-style features, which would result in redundancy:
<features> <vmx/> <feature>vmx</feature> </features>
I'm not afraid of that; it's not ideal, but since those are virtualization-related features, having them separated sounds fine. We just can't grow the schemas and parsing code to accommodate a different element for each different name.
IMHO the worst part is the definition of the names. First, there is gonna be a bunch of them, and second, their names, if you rely just on the procinfo output, may not be sufficient in the absolute.
No, we shouldn't rely on /proc/cpuinfo because that is Linux-specific. For the Xen and VMWare drivers we want a naming scheme for flags that is OS agnostic, in particular so Xen works on Solaris and VMWare works nicely on Windows. That's why I think we should define a naming scheme for all flags in libvirt, albeit not hardcoded in the source or XML schema, but in an external data file that can be easily extended when new CPUs come out.
Registries are a nightmare by definition, and we should not add a registry of features in libvirt, nor try to assert any semantics for those names. So I'm afraid we are best off just sampling/dumping /proc/cpuinfo and leaving the mess to the kernel. The feature list will grow quite long but that's fine IMHO.
We can actually keep the feature list very short. The key is that we expose a named CPU model which covers 95% of the host CPU features. We then just need to list CPU features which are not explicitly part of that CPU model - which should be a mere handful, certainly less than 10.
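In that scheme the host capabilities entry for the 'core2duo' + 'ssse3' + '3dnow' machine used as an example earlier would stay short, along the lines of the fragment below (purely illustrative; whether the model is an attribute or a child element, and where the fragment sits inside <host>, is exactly the kind of detail still under discussion):

<cpu model='core2duo'>
  <feature name='ssse3'/>
  <feature name='3dnow'/>
</cpu>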
Secondly, drivers which support detailed CPU specification have to advertise it in guest capabilities. In case <features> are meant to be hypervisor features, then it could look like:
<guest> ... <features> <cpu/> </features> </guest>
Somehow we will get the same mess. I assume the QEMU interface can provide that list, right? I'm also wondering if it's not possibly dependent on the machine; I hope not, i.e. that the emulated CPU features are not also dependent on the emulated hardware...
CPU features are really just an artifact of the CPU model. The existing 'acpi' and 'apic' flags are not CPU features - they are chipset features, so out of scope for this discussion.
But if they are meant to be CPU features, we need to come up with something else:
<guest> ... <cpu_selection/> </guest>
Something like <guest> <cpu model="foo"> <features>fpu vme de pse tsc msr pae mce cx8 apic</features> </cpu> <cpu model="bar"> <features>fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca</features> </cpu> </guest>
hoping it doesn't go per machine !
Exposing flags as one big string like this is really not nice for applications. Having different representations for the capabilities XML description of a CPU vs the guest XML description of a CPU is also undesirable, thus I think that the capabilities XML should essentially follow whatever schema we decide to use for the guest XML.

Daniel

On Tue, Sep 22, 2009 at 02:25:54PM +0100, Daniel P. Berrange wrote:
On Tue, Sep 22, 2009 at 02:41:08PM +0200, Daniel Veillard wrote:
IMHO the worst part is the definition of the names. First, there is gonna be a bunch of them, and second, their names, if you rely just on the procinfo output, may not be sufficient in the absolute.
No, we shouldn't rely on /proc/cpuinfo because that is Linux-specific. For the Xen and VMWare drivers we want a naming scheme for flags that is OS agnostic, in particular so Xen works on Solaris and VMWare works nicely on Windows.
For VMWare I expect the flag list and CPU descriptions to come from the driver, which can probably extract them from ESX itself.
That's why I think we should define a naming scheme for all flags in libvirt, albeit not hardcoded in the source, or XML schema, but in an external data file that can be easily extended when new CPUs come out.
I don't see how that's gonna scale, just with the set of processors supported by QEMU and the number of flags each of them may or may not export. Sure, an external file would make maintenance way easier, but still...
Registries are a nightmare by definition, and we should not add a registry of features in libvirt, nor try to assert any semantics for those names. So I'm afraid we are best off just sampling/dumping /proc/cpuinfo and leaving the mess to the kernel. The feature list will grow quite long but that's fine IMHO.
We can actually keep the feature list very short. The key is that we expose a named CPU model which covers 95% of the host CPU features. We then just need to list CPU features which are not explicitly part of that CPU model - which should be a mere handful, certainly less than 10.
How do you know those lists and subsets, and how are you gonna keep them in the long term? If you take a processor definition from 5 years ago and want to make sure none of the new CPU features are used, what's the scenario in practice? Would the application have to know the logic behind the name we would be defining for the processor type? Would it have to have that knowledge to know that, based on that processor type, such and such flags are not set? If we export names which are managed by libvirt, then it becomes libvirt's responsibility to define the matrix of flag names and their semantics. And that's really something I'm afraid of. I prefer to delegate to the kernel or the virtualization layers (via the drivers) to provide those flags and semantics; ultimately they end up being maintained either by the chip makers themselves or the hypervisor implementors (VMWare).
Secondly, drivers which support detailed CPU specification have to advertise it in guest capabilities. In case <features> are meant to be hypervisor features, than it could look like:
<guest> ... <features> <cpu/> </features> </guest>
Somehow we will get the same mess. I assume the QEMU interface can provide that list, right? I'm also wondering if it's not possibly dependent on the machine; I hope not, i.e. that the emulated CPU features are not also dependent on the emulated hardware...
CPU features are really just an artifact of the CPU model.
Hopefully... but my experience with embedded (some time ago admittedly) made clear that this or that processor feature may be activated or not depending on how they got wired. Even nowadays your CPU may have support for things which get deactivated by the BIOS, for example. Not that simple IMHO.
The existing 'acpi' and 'apic' flags are not CPU features - they are chipset features, so out of scope for this discussion
Okay
Something like <guest> <cpu model="foo"> <features>fpu vme de pse tsc msr pae mce cx8 apic</features> </cpu> <cpu model="bar"> <features>fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca</features> </cpu> </guest>
hoping it doesn't go per machine !
Exposing flags as one big string like this is really not nice for applications. Having different representations for the capabilities XML description of a CPU vs the guest XML description of a CPU is also undesirable, thus I think that the capabilities XML should essentially follow whatever schema we decide to use for the guest XML.
Yes, I agree we should unify both formats. The flag list as a single string isn't fun for applications, granted; something more structured would be better.

Daniel

On Tue, Sep 22, 2009 at 03:52:08PM +0200, Daniel Veillard wrote:
On Tue, Sep 22, 2009 at 02:25:54PM +0100, Daniel P. Berrange wrote:
No, we shouldn't rely on /proc/cpuinfo because that is Linux-specific. For the Xen and VMWare drivers we want a naming scheme for flags that is OS agnostic, in particular so Xen works on Solaris and VMWare works nicely on Windows.
For VMWare I expect the flag list and CPU descriptions to come from the driver, which can probably extract them from ESX itself.
VMWare doesn't expose any named flags / CPUs. It just exports the raw CPUID bitmask. So libvirt has to maintain a database of names + flags to convert the VMWare CPUID into something useful. The same situation exists with Xen.
That's why I think we should define a naming scheme for all flags in libvirt, albeit not hardcoded in the source, or XML schema, but in an external data file that can be easily extended when new CPUs come out.
I don't see how that's gonna scale, just with the set of processors supported by QEMU and the number of flags each of them may or may not export. Sure, an external file would make maintenance way easier, but still...
The key is that you don't try to create a named CPU model for every possible CPU that Intel/AMD release. You just have a handful of CPU models, and then use flags to indicate extra features. Thus it becomes a tradeoff between the number of CPU models available vs the number of extra flags an app has to list, which lets us control the way it scales while still giving flexibility to apps. As a point of reference, QEMU has < 10 named CPU models currently for x86, but with the combination of possible names + flags, it can expose many hundreds of different CPUs to the guest.
Registries are a nightmare by definition, and we should not add a registry of features in libvirt, nor try to assert any semantics for those names. So I'm afraid we are best off just sampling/dumping /proc/cpuinfo and leaving the mess to the kernel. The feature list will grow quite long but that's fine IMHO.
We can actually keep the feature list very short. The key is that we expose a named CPU model which covers 95% of the host CPU features. We then just need to list CPU features which are not explicitly part of that CPU model - which should be a mere handful, certainly less than 10.
How do you know those lists and subsets, and how are you gonna keep them in the long term? If you take a processor definition from 5 years ago and want to make sure none of the new CPU features are used, what's the scenario in practice? Would the application have to know the logic behind the name we would be defining for the processor type? Would it have to have that knowledge to know that, based on that processor type, such and such flags are not set? If we export names which are managed by libvirt, then it becomes libvirt's responsibility to define the matrix of flag names and their semantics. And that's really something I'm afraid of. I prefer to delegate to the kernel or the virtualization layers (via the drivers) to provide those flags and semantics; ultimately they end up being maintained either by the chip makers themselves or the hypervisor implementors (VMWare).
The key issue here is that there is nothing to delegate to in the VMWare or Xen case, since both use the raw CPUID format as their config model - which is x86-specific, so we need to apply a conversion in libvirt. Once we're doing that, it becomes trivial to do it for exposing the host CPU model too.
Somehow we will get the same mess. I assume the QEMU interface can provide that list, right? I'm also wondering if it's not possibly dependent on the machine; I hope not, i.e. that the emulated CPU features are not also dependent on the emulated hardware...
CPU features are really just an artifact of the CPU model.
Hopefully... but my experience with embedded (some time ago admittedly) made clear that this or that processor feature may be activated or not depending on how they got wired. Even nowadays your CPU may have support for things which get deactivated by the BIOS, for example. Not that simple IMHO.
The BIOS settings aren't actually toggling the CPU features. If you have VT/SVM disabled in the BIOS, it'll still be visible in the /proc/cpuinfo flags data. The BIOS is toggling something else to prevent the feature being used, even when present.

Regards,
Daniel

On Tue, Sep 22, 2009 at 03:01:18PM +0100, Daniel P. Berrange wrote:
On Tue, Sep 22, 2009 at 03:52:08PM +0200, Daniel Veillard wrote:
On Tue, Sep 22, 2009 at 02:25:54PM +0100, Daniel P. Berrange wrote:
No, we shouldn't rely on /proc/cpuinfo because that is Linux-specific. For the Xen and VMWare drivers we want a naming scheme for flags that is OS agnostic, in particular so Xen works on Solaris and VMWare works nicely on Windows.
For VMWare I expect the flag list and CPU descriptions to come from the driver, which can probably extract them from ESX itself.
VMWare doesn't expose any named flags / CPUs. It just exports the raw CPUID bitmask. So libvirt has to maintain a database of names + flags to convert the VMWare CPUID into something useful. The same situation exists with Xen.
<sigh/>
That's why I think we should define a naming scheme for all flags in libvirt, albeit not hardcoded in the source, or XML schema, but in an external data file that can be easily extended when new CPUs come out.
I don't see how that's gonna scale, just with the set of processors supported by QEMU and the number of flags each of them may or may not export. Sure, an external file would make maintenance way easier, but still...
The key is that you don't try to create a named CPU model for every possible CPU that Intel/AMD release. You just have a handful of CPU models, and then use flags to indicate extra features. Thus it becomes a tradeoff between the number of CPU models available vs the number of extra flags an app has to list, which lets us control the way it scales while still giving flexibility to apps. As a point of reference, QEMU has < 10 named CPU models currently for x86, but with the combination of possible names + flags, it can expose many hundreds of different CPUs to the guest.
Okay, maybe we can keep this maintainable. The alternative is a very unfriendly and unreliable API. It's just a shame that those things end up in libvirt, because honestly it really sounds like low-level logic that is not directly tied to virtualization and which should really come from the system; but since we have remote-only drivers like VMWare, where that doesn't exist, and we need this to be portable, then okay. If we go that way, then yes, definitely let's make those descriptions an external file easily updated by the distro or sysadmin, so that we don't get upstream update requests in 5 years when nobody knows anymore what a core2duo might have been...

Daniel
participants (5)
- 'Jiri Denemark'
- Daniel P. Berrange
- Daniel Veillard
- Itamar Heim
- Jiri Denemark