Thanks a lot for the explanations, Daniel.
Comments about specific items inline.
On Wed, Mar 07, 2012 at 02:18:28PM +0000, Daniel P. Berrange wrote:
> > I have two main points I would like to understand/discuss:
> >
> > 1) The relationship between libvirt's cpu_map.xml and the Qemu CPU model
> > definitions.
> We have several areas of code in which we use CPU definitions
>
> - Reporting the host CPU definition (virsh capabilities)
> - Calculating host CPU compatibility / baseline definitions
> - Checking guest / host CPU compatibility
> - Configuring the guest CPU definition
>
> libvirt targets multiple platforms, and our CPU handling code is designed
> to be common & sharable across all the libvirt drivers, VMWare, Xen, KVM,
> LXC, etc. Obviously for container based virt, only the host side of things
> is relevant.
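
For readers following along, the host CPU definition reported by
"virsh capabilities" looks roughly like this (values illustrative):

  <cpu>
    <arch>x86_64</arch>
    <model>Nehalem</model>
    <vendor>Intel</vendor>
    <topology sockets='1' cores='4' threads='2'/>
    <feature name='rdtscp'/>
  </cpu>
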
> The libvirt CPU XML definition consists of
>
> - Model name
> - Vendor name
> - zero or more feature flags added/removed.
>
> A model name is basically just an alias for a bunch of feature flags,
> so that the CPU XML definitions are a) reasonably short b) have
> some sensible default baselines.
>
> The cpu_map.xml is the database of the CPU models that libvirt
> supports. We use this database to transform the CPU definition
> from the guest XML, into the hypervisor's own format.

Understood. Makes sense.
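
Just so the shape of that is concrete: a guest CPU definition along
those lines looks roughly like this (model and flags illustrative):

  <cpu match='exact'>
    <model>Nehalem</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='x2apic'/>
    <feature policy='disable' name='vme'/>
  </cpu>
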
> As luck would have it, the cpu_map.xml file contents match what
> QEMU has. This need not be the case though. If there is a model
> in the libvirt cpu_map.xml that QEMU doesn't know, we'll just
> pick the nearest matching QEMU cpu model & specify the feature
> flags to compensate.

Awesome. So, if Qemu and libvirt disagree, libvirt will know that and
add the necessary flags? That was my main worry. If disagreement between
Qemu and libvirt is not a problem, it would make things much easier.
...but:
Is that really implemented? I simply don't see libvirt doing that. I see
code that calls "-cpu ?" to list the available CPU models, but no code
calling "-cpu ?dump", or parsing the Qemu CPU definition config file. I
even removed some random flags from the Nehalem model on my machine
(running Fedora 16), and no additional flags were added.
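
For instance, if cpu_map.xml's Nehalem required a flag that Qemu's
Nehalem lacks, I would expect libvirt to compensate on the Qemu
command-line with something like (flag names illustrative):

  -cpu Nehalem,+x2apic,-vme

but I don't see that happening.
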
> We could go one step further and just write
> out a cpu.conf file that we load in QEMU with -loadconfig.

Sounds good. Anyway, I want to make everything that is configurable in
the cpudef config file configurable on the command-line too, so both
options (command-line or config file) would work.
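
For reference, the cpudef config sections I mean look like this today
(abridged from qemu's target-x86_64.conf):

  [cpudef]
     name = "Nehalem"
     level = "2"
     vendor = "GenuineIntel"
     family = "6"
     model = "26"
     stepping = "3"
     feature_ecx = "sse3 ssse3 sse4.1 sse4.2 popcnt"
     xlevel = "0x8000000A"
     model_id = "Intel Core i7 9xx (Nehalem Class Core i7)"
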
> On Xen we would use the cpu_map.xml to generate the CPUID
> masks that Xen expects. Similarly for VMWare.

> > 2) How we could properly allow CPU models to be changed without breaking
> > existing virtual machines?
> What is the scope of changes expected to CPU models ?

We already have at least three cases, affecting different fields of the
CPU definitions:

A) Adding/removing flags. Examples:
- When the current set of flags is simply incorrect. See commit df07ec56
on qemu.git, where lots of flags that weren't supposed to be set
were removed from some models.
- When a new feature is now supported by Qemu+KVM and is present
on real-world CPUs, but our CPU definitions don't have the feature
yet. e.g. x2apic, which is present on real-world Westmere CPUs but
disabled in the Qemu Westmere CPU definition.
B) Changing "level" for some reason. One example: Conroe, Penryn and
Nehalem have level=2, but need to have level>=4 to make CPU topology
work, so they have to be changed.
C) Enabling/disabling or overriding specific CPUID leafs. This isn't
even configurable on the config files today, but I plan to allow it
to be configured; otherwise users won't be able to enable/disable
some features that the guest probes by simply looking at a CPUID
leaf (e.g. the 0xA CPUID leaf that contains PMU information). A
possible syntax is sketched below.
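
Nothing like this is configurable today, so purely as a strawman, the
cpudef sections could grow a knob like:

  [cpudef]
     name = "Nehalem"
     # hypothetical option, does not exist today:
     cpuid_leaf_0a = "off"

(the option name and syntax are entirely made up, just to illustrate
the kind of control I mean).
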
The PMU leaf is an example where the CPU looks different to the guest
depending simply on the Qemu or kernel version in use, and libvirt
can't control the visibility of that feature:
- If you start a Virtual Machine using Qemu-1.0 today, with the "pc-1.0"
machine-type, the PMU CPUID leaf won't be visible to the guest
(as Qemu-1.0 doesn't support the PMU leaf).
- If you start a Virtual Machine using Qemu-1.1 in the future, using the
"pc-1.1" machine-type, with a recent kernel, the PMU CPUID leaf _will_
be visible to the guest (as the qemu.git master branch supports it).
Up to now this is OK, because in theory the machine-type helps us
control the feature, but we have a problem in this case:
- If you start a Virtual Machine using Qemu-1.1 in the future, using the
"pc-1.1" machine-type, using exactly the same command-line as above,
but using an old kernel, the PMU CPUID leaf will _not_ be visible to
the guest.
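
To summarize the three scenarios:

  Qemu-1.0, pc-1.0, any kernel     -> PMU leaf not visible
  Qemu-1.1, pc-1.1, recent kernel  -> PMU leaf visible
  Qemu-1.1, pc-1.1, old kernel     -> PMU leaf not visible

so what the guest sees depends on the host kernel, and neither the
machine-type nor libvirt captures that today.
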
> > 1) Qemu and cpu_map.xml
> >
> > I would like to understand how cpu_map.xml is supposed to be used, and
> > how it is supposed to interact with the CPU model definitions provided
> > by Qemu. More precisely:
> >
> > 1.1) Do we want to eliminate the duplication between the Qemu CPU
> > definitions and cpu_map.xml?
> It isn't possible for us to eliminate the libvirt cpu_map.xml, since we
> need that across all our hypervisor targets.

OK, as you already explained. It's not a problem to me as long as things
work as expected when Qemu and libvirt disagree about a CPU model
definition.
So, about the specific questions:

> > 1.1.1) If we want to eliminate the duplication, how can we accomplish
> > that? What interfaces do you miss, that Qemu could provide?
> >
> > 1.1.2) If the duplication has a purpose and you want to keep
> > cpu_map.xml, then:
> > - First, I would like to understand why libvirt needs cpu_map.xml? Is
> > it part of the "public" interface of libvirt, or is it just an
> > internal file where libvirt stores non-user-visible data?
You answered that above.

> > - How can we make sure there is no confusion between libvirt and Qemu
> > about the CPU models? For example, what if cpu_map.xml says model
> > 'Moo' has the flag 'foo' enabled, but Qemu disagrees? How do we
> > guarantee that libvirt gets exactly what it expects from Qemu when
> > it asks for a CPU model? We have "-cpu ?dump" today, but it's not
> > the best interface we could have. Is anything missing in the
> > Qemu<->libvirt interface that would help with that?
So, it looks like either I am missing something in my tests, or libvirt
is _not_ probing the Qemu CPU model definitions to make sure libvirt
gets all the features it expects.
Also, I would like to ask if you have suggestions to implement
the equivalent of "-cpu ?dump" in a more friendly and extensible way.
Would a QMP command be a good alternative? Would a command-line option
with json output be good enough?
(Do we have any case of capability-querying being made using QMP before
starting any actual VM, today?)
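
For example, a QMP command could return the model definitions as json.
Hypothetical command name and output shape, just to illustrate:

  -> { "execute": "query-cpu-definitions" }
  <- { "return": [
         { "name": "Nehalem", "level": 2,
           "features": [ "sse3", "ssse3", "sse4.1", "sse4.2" ] } ] }
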
> > 1.2) About the probing of available features on the host system: Qemu
> > has code specialized to query KVM about the available features, and to
> > check what can be enabled and what can't be enabled in a VM. In many
> > cases, the available features match exactly what is returned by the
> > CPUID instruction on the host system, but there are some
> > exceptions:
> > - Some features can be enabled even when the host CPU doesn't support
> > them (because they are completely emulated by KVM, e.g. x2apic).
> > - In many other cases, the feature may be available but we have to
> > check if Qemu+KVM are really able to expose it to the guest (many
> > features work this way, as many depend on specific support by the
> > KVM kernel module and/or Qemu).
> >
> > I suppose libvirt does want to check which flags can be enabled in a
> > VM, as it already has checks for host CPU features (e.g.
> > src/cpu/cpu_x86.c:x86Compute()). But I also suppose that libvirt
> > doesn't want to duplicate the KVM feature probing code present in
> > Qemu, and in this case we could have an interface where libvirt could
> > query for the actually-available CPU features. Would it be useful for
> > libvirt? What's the best way to expose this interface?
So, about the above: the cases where libvirt thinks a feature is
available but Qemu knows it is not available are sort-of OK today,
because Qemu would simply refuse to start and an error message would be
returned to the user.

But what about the features that are not available on the host CPU, so
libvirt thinks they can't be enabled, but that actually _can_ be
enabled? x2apic seems to be the only case today, but we may have others
in the future.
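
(For the x2apic case, something like this works today even on a host
CPU without x2apic, because KVM emulates it:

  qemu-system-x86_64 -enable-kvm -cpu qemu64,+x2apic

but libvirt, looking only at the host CPUID bits, would conclude that
+x2apic can't be enabled.)
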
> >
> > 1.3) Some features are not plain CPU feature bits: e.g. level=X can be
> > set in "-cpu" argument, and other features are enabled/disabled by
> > exposing specific CPUID leafs and not just a feature bit (e.g. PMU
> > CPUID leaf support). I suppose libvirt wants to be able to probe for
> > those features too, and be able to enable/disable them, right?
> The libvirt CPU definition does not currently store info about the
> level, family, model, stepping, xlevel or model_id items. We really
> ought to fix this, so that libvirt does have that info. Then we'd
> be able to write out a QEMU config that fully specified the exact
> model.

OK, good to know that this is being planned.
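
I imagine it would end up looking something like this in the guest XML
(hypothetical schema, just mirroring the cpudef fields):

  <cpu match='exact'>
    <model>Nehalem</model>
    <vendor>Intel</vendor>
    <level>4</level>
    <family>6</family>
    <stepping>3</stepping>
  </cpu>
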
> > 2) How to change an existing model and keep existing VMs working?
> >
> > Sometimes we have to update a CPU model definition because of some bug.
> > Examples:
> >
> > - The CPU models Conroe, Penryn and Nehalem have level=2 set. This
> > works most times, but it breaks CPU core/thread topology enumeration.
> > We have to change those CPU models to use level=4 to fix the bug.
> This is an example of why libvirt needs to represent the level/family
> etc in its CPU definition. That way, when a guest is first created,
> the XML will save the CPU model, feature flags, level, family, etc
> it is created with. Should the level be changed later, existing guests
> would then not be affected; only new guests would get the level=4 change.

Correct.

> > - This can happen with plain CPU feature bits, too, not just "level":
> > sometimes real-world CPU models have a feature that is not supported
> > by Qemu+KVM yet, but when the kernel and Qemu finally start to
> > support it, we may want to enable it on existing CPU models. Sometimes
> > a model simply has the wrong set of feature bits, and we have to fix
> > it to have the right set of features.
> >
> > 2.3) How will all this interact with cpu_map.xml? Right now there's the
> > assumption that the CPU model definitions are immutable, right?
> >
> > 2.4) How do you think libvirt would expose this "CPU model version"
> > to the user? Should it just expose the unversioned CPU models to the
> > user, and let Qemu or libvirt choose the right version based on
> > machine-type? Should it expose only the versioned CPU models (because
> > they are immutable) and never expose the unversioned aliases? Should
> > it expose the unversioned alias, but change the Domain XML definition
> > automatically to the versioned immutable one (like it happens with
> > machine-type)?
> We should only expose unversioned CPU models, but then record the
> precise details of the current version in the guest XML.

Sounds good to me.
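
So the flow would be analogous to machine-types: the user asks for a
generic model and libvirt pins down the specifics, e.g. (hypothetical
version suffix):

  user writes:      <model>Nehalem</model>
  libvirt records:  <model>Nehalem-1.1</model>

much like "-M pc" being recorded as "pc-1.0".
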
That answers most of my questions about how libvirt would handle
changes to CPU models. Now we need good mechanisms that allow libvirt
to do that. If you have specific requirements or suggestions in mind,
please let me know.
--
Eduardo