Hi,
Sorry for the long message, but I didn't find a way to summarize the
questions and issues and make it shorter.
For people who don't know me: I recently started working on the
Qemu CPU model code. I have been looking at how things work on
libvirt+Qemu today w.r.t. CPU models, and I have some points I would
like to understand better and see if they can be improved.
I have two main points I would like to understand/discuss:
1) The relationship between libvirt's cpu_map.xml and the Qemu CPU model
definitions.
2) How can we properly allow CPU models to be changed without breaking
existing virtual machines?
Note that for all the questions below, I don't expect that we design the
whole solution and discuss every single detail in this thread. I just
want to collect suggestions, information about libvirt requirements and
assumptions, and warnings about expected pitfalls before I start working
on a solution in Qemu.
1) Qemu and cpu_map.xml
I would like to understand how cpu_map.xml is supposed to be used, and
how it is supposed to interact with the CPU model definitions provided
by Qemu. More precisely:
1.1) Do we want to eliminate the duplication between the Qemu CPU
definitions and cpu_map.xml?
1.1.1) If we want to eliminate the duplication, how can we accomplish
that? What interfaces are you missing that Qemu could provide?
1.1.2) If the duplication has a purpose and you want to keep
cpu_map.xml, then:
- First, I would like to understand why libvirt needs cpu_map.xml. Is
it part of the "public" interface of libvirt, or is it just an
internal file where libvirt stores non-user-visible data?
- How can we make sure there is no confusion between libvirt and Qemu
about the CPU models? For example, what if cpu_map.xml says model
'Moo' has the flag 'foo' enabled, but Qemu disagrees? How do we
guarantee that libvirt gets exactly what it expects from Qemu when
it asks for a CPU model? We have "-cpu ?dump" today, but it's not
the best interface we could have. Is there anything in particular
you are missing in the Qemu<->libvirt interface that would help here?
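To make the 'Moo'/'foo' concern concrete, here is a minimal Python
sketch of the consistency check I have in mind. The feature sets are
made-up data and diff_model() is a hypothetical helper, not an existing
libvirt or Qemu function; how each side would actually export its
definitions is exactly the open question.

```python
def diff_model(libvirt_features, qemu_features):
    """Compare one CPU model as seen by libvirt and by Qemu.

    Returns (missing_in_qemu, extra_in_qemu), both as sorted lists.
    """
    libvirt_features = set(libvirt_features)
    qemu_features = set(qemu_features)
    missing_in_qemu = sorted(libvirt_features - qemu_features)
    extra_in_qemu = sorted(qemu_features - libvirt_features)
    return missing_in_qemu, extra_in_qemu


# Hypothetical data: cpu_map.xml says 'Moo' has 'foo', Qemu disagrees.
cpu_map_view = {"Moo": {"sse2", "foo"}}
qemu_view = {"Moo": {"sse2"}}

missing, extra = diff_model(cpu_map_view["Moo"], qemu_view["Moo"])
print(missing, extra)  # prints: ['foo'] []
```

A non-empty result on either side would mean the guest does not get
the CPU that libvirt believes it asked for.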
1.2) About the probing of available features on the host system: Qemu
has code specialized to query KVM about the available features, and to
check what can be enabled and what can't be enabled in a VM. In many
cases, the available features match exactly what is returned by the
CPUID instruction on the host system, but there are some
exceptions:
- Some features can be enabled even when the host CPU doesn't support
them (because they are completely emulated by KVM, e.g. x2apic).
- In many other cases, the feature may be available on the host, but we
have to check whether Qemu+KVM are really able to expose it to the
guest (many features work this way, as many depend on specific support
by the KVM kernel module and/or Qemu).
I suppose libvirt does want to check which flags can be enabled in a
VM, as it already has checks for host CPU features (e.g.
src/cpu/cpu_x86.c:x86Compute()). But I also suppose that libvirt
doesn't want to duplicate the KVM feature probing code present in
Qemu, and in this case we could have an interface where libvirt could
query for the actually-available CPU features. Would it be useful for
libvirt? What's the best way to expose this interface?
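As a rough illustration of why the host CPUID alone is not enough, the
rule described above can be sketched in Python as follows. This is a
simplification under my own assumptions: the three input sets and the
guest_available() helper are hypothetical, and the real probing logic
in Qemu has more special cases than this.

```python
def guest_available(host_cpuid, kvm_supported, kvm_emulated):
    """Features that could actually be enabled in a VM.

    host_cpuid:    features reported by CPUID on the host
    kvm_supported: host features Qemu+KVM know how to expose to a guest
    kvm_emulated:  features KVM emulates regardless of the host CPU
    """
    passthrough = set(host_cpuid) & set(kvm_supported)
    return passthrough | set(kvm_emulated)


# Hypothetical host: 'pcid' is present on the CPU but not yet supported
# by Qemu+KVM, while 'x2apic' is absent on the CPU but emulated by KVM.
host = {"sse2", "avx", "pcid"}
supported = {"sse2", "avx"}
emulated = {"x2apic"}

print(sorted(guest_available(host, supported, emulated)))
# prints: ['avx', 'sse2', 'x2apic']
```

A check based only on the host CPUID would get both 'pcid' and
'x2apic' wrong in this example.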
1.3) Some features are not plain CPU feature bits: e.g. level=X can be
set in the "-cpu" argument, and other features are enabled/disabled by
exposing specific CPUID leaves and not just a feature bit (e.g. PMU
CPUID leaf support). I suppose libvirt wants to be able to probe for
those features too, and be able to enable/disable them, right?
2) How to change an existing model and keep existing VMs working?
Sometimes we have to update a CPU model definition because of some bug.
Examples:
- The CPU models Conroe, Penryn and Nehalem have level=2 set. This
works most of the time, but it breaks CPU core/thread topology
enumeration. We have to change those CPU models to use level=4 to fix
the bug.
- This can happen with plain CPU feature bits, too, not just "level":
sometimes real-world CPU models have a feature that is not supported
by Qemu+KVM yet, but when the kernel and Qemu finally start to
support it, we may want to enable it on existing CPU models. Sometimes
a model simply has the wrong set of feature bits, and we have to fix
it to have the right set of features.
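To illustrate the level=2 case, here is a small Python model of a guest
enumerating cores. It assumes the guest reads the core count from CPUID
leaf 4 (EAX bits 31:26, plus one) and only does so when the maximum
basic leaf reported in CPUID.0:EAX (which is what "level" sets) is at
least 4. The fallback to a single core is my simplification of what a
guest OS does, and the function name is made up for illustration.

```python
TOPOLOGY_LEAF = 4  # CPUID leaf assumed to carry the core count


def guest_cores_per_package(level, leaf4_eax):
    """Model a guest that reads cores-per-package from CPUID leaf 4.

    level:     maximum basic CPUID leaf (CPUID.0:EAX)
    leaf4_eax: the EAX value the CPU would return for leaf 4
    """
    if level < TOPOLOGY_LEAF:
        # Leaf 4 is out of range, so the guest never queries it and
        # (in this simplified model) assumes a single core.
        return 1
    # Bits 31:26 hold "maximum core IDs per package, minus one".
    return ((leaf4_eax >> 26) & 0x3F) + 1


four_cores = 3 << 26  # leaf 4 EAX encoding for 4 cores per package

print(guest_cores_per_package(2, four_cores))  # prints: 1
print(guest_cores_per_package(4, four_cores))  # prints: 4
```

The same leaf 4 contents give different results depending only on
"level", which is why the fix changes what the guest sees.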
But if we simply change the existing model definition, this will break
existing machines:
- Today, it would break on live migration, but that's relatively easy to
fix: we have to migrate the CPUID information too, to make sure we
won't change the CPU under the guest OS's feet.
- Even if we fix live migration, simple "cold" migration will make the
guest OS see a different CPU after a reboot, and that's undesirable
too. Even if the Qemu developers disagree with me and decide that this
is not a problem, libvirt may want to expose a more stable CPU to the
guest, and some cooperation from Qemu would be necessary.
So, my questions are:
About the libvirt<->Qemu interface:
2.1) What's the best mechanism to have different versions of a CPU
model? An alias system like the one used by machine-types? How to
implement this without confusing the existing libvirt probing code?
2.2) We have to make the CPU model version-choosing mechanism depend on
the machine-type. e.g. if the user has a pc-1.0 machine using the
Nehalem CPU model, we have to keep using the level=2 version of that
CPU. But if the user chose a newer machine-type version, we can safely
get the latest-and-greatest version of the Nehalem CPU model. How to
make this work without confusing libvirt?
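One possible shape for 2.1 and 2.2 together, sketched in Python just to
have something concrete to discuss. Everything here is an assumption:
the "Nehalem-v1"/"Nehalem-v2" names, the "pc-1.1" machine-type and the
resolve_cpu_model() helper are invented for the example and do not
exist in Qemu today.

```python
# Immutable, versioned definitions (hypothetical names).
CPU_MODEL_VERSIONS = {
    "Nehalem-v1": {"level": 2},
    "Nehalem-v2": {"level": 4},
}

# Unversioned alias -> newest version, like machine-type aliases.
LATEST = {"Nehalem": "Nehalem-v2"}

# Old machine-types pin the alias to the version they shipped with.
MACHINE_TYPE_PINS = {
    "pc-1.0": {"Nehalem": "Nehalem-v1"},
}


def resolve_cpu_model(machine_type, model):
    """Map a requested CPU model name to an immutable versioned name."""
    if model in CPU_MODEL_VERSIONS:
        return model  # already versioned, never remapped
    pinned = MACHINE_TYPE_PINS.get(machine_type, {})
    return pinned.get(model, LATEST[model])


print(resolve_cpu_model("pc-1.0", "Nehalem"))  # prints: Nehalem-v1
print(resolve_cpu_model("pc-1.1", "Nehalem"))  # prints: Nehalem-v2
```

With something like this, an existing pc-1.0 guest keeps level=2 while
new guests get the fixed model, and a management tool could store the
resolved versioned name if it wants a stable definition.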
About the user<->libvirt interface:
2.3) How will all this interact with cpu_map.xml? Right now there's the
assumption that the CPU model definitions are immutable, right?
2.4) How do you think libvirt would expose this "CPU model version"
to the user? Should it just expose the unversioned CPU models to the
user, and let Qemu or libvirt choose the right version based on
machine-type? Should it expose only the versioned CPU models (because
they are immutable) and never expose the unversioned aliases? Should
it expose the unversioned alias, but change the Domain XML definition
automatically to the versioned immutable one (as happens with
machine-type)?
I don't plan to interfere with the libvirt interface design, but I
suppose that libvirt design assumptions will be impacted by the solution
we choose in Qemu. For example: right now libvirt seems to assume that
CPU models are immutable. Are you going to keep this assumption in the
libvirt interfaces? I ask because I am already willing to break this
assumption in Qemu, although I would like to cooperate with libvirt and
not break any requirements/assumptions without warning.
--
Eduardo