Re: [libvirt] [RFC] qemu: Redesigning guest CPU configuration

23 Jun 2015


On 19.06.2015 14:27, Daniel Hansel wrote:
...
On 18.06.2015 15:41, Daniel P. Berrange wrote:
...
On Wed, Jun 17, 2015 at 05:37:42PM +0200, Jiri Denemark wrote:
...
Hi all (and sorry for the long email),
The current way QEMU driver handles guest CPU configuration is not
ideal. We detect host CPU capabilities only by querying the CPU and we
don't check with QEMU what features it supports. We don't check QEMU's
definitions of CPU models, which may be different from libvirt's
definitions. All this results in several issues:
- guest CPU may change during migration, save/restore
- libvirt may ask for a CPU which QEMU cannot provide; the guest will
  see a slightly different CPU but libvirt client won't know about it
- libvirt may come up with a CPU that doesn't make sense and which won't
  work for a guest (the guest may even crash)
Although usually everything just works, it is very fragile.
A third issue is that if there is no <cpu> in the guest config, we
just delegate CPU choice to QEMU and then ignore any CPU checks when
migrating. If libvirt owns the full CPU config, we'd probably want
to also decide the default ourselves, so that we will always be able
to do CPU checks on migration.
...
Since we want to fix all these issues, we need to:
- guarantee stable guest ABI (a single domain XML should always result
  in the same guest ABI). Once a domain is started, its CPU definition
  should never change (unless someone changes the XML, of course,
  similar to, e.g. PCI addresses). However, there are a few exceptions:
    - host-passthrough CPU mode will always result in "-cpu host"
    - host-model CPU mode should recompute the CPU model on every start,
      but the CPU must not change during migration
- always make sure QEMU provides the CPU we asked for. Starting a domain
  should fail in case QEMU cannot provide exactly the CPU we asked for.
- provide usable host-model mode and custom mode with minimum match. We
  need to generate CPU configurations that actually work, i.e., we need
  to ask QEMU what CPU it can provide on current host rather than
  requesting a bunch of features on top of a CPU model which does not
  always match the host CPU.
QEMU already provides or will soon provide everything we need to meet
these requirements:
- we can cover every configurable part of a CPU in our cpu_map.xml and
  instead of asking QEMU for a specific CPU model we can use "-cpu
  custom" with a fully specified CPU
- we can use the additional data about CPU models to choose the right
  one for a host CPU
- when starting a domain we can check whether QEMU filtered out any of
  the features we asked for and refuse to start the domain
- we can ask QEMU what "-cpu host" would look like and use that for
  host-model and minimum match CPUs (it won't work for TCG mode, but we
  can keep using the current CPUID detection code for TCG)
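
As a rough illustration of the "refuse to start the domain if QEMU filtered out any feature" check described in the list above, here is a minimal Python sketch. The function names, the exception and the feature sets are hypothetical; how libvirt would actually learn which features QEMU enabled is not specified in this thread.

```python
class CPUFeatureMismatch(Exception):
    """Raised when QEMU cannot provide every requested CPU feature."""


def filtered_features(requested, provided):
    """Return the requested features that QEMU did not actually enable."""
    return sorted(set(requested) - set(provided))


def check_guest_cpu(requested, provided):
    """Fail instead of silently running the guest with a different CPU."""
    missing = filtered_features(requested, provided)
    if missing:
        raise CPUFeatureMismatch("QEMU filtered out: " + ", ".join(missing))


# Hypothetical example data: the domain asks for three features,
# but QEMU reports that only two of them are enabled.
requested = {"sse4.2", "aes", "avx"}
provided = {"sse4.2", "aes"}

try:
    check_guest_cpu(requested, provided)
except CPUFeatureMismatch as err:
    print(err)
```

The sketch only looks for missing features; a stricter "exactly the CPU we asked for" check would presumably also compare other parameters of the model.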
In TCG mode, of course, 'host-model' and 'host-passthrough' are
effectively identical, and don't actually need the host to support
all the features, since TCG is fully emulated. Which means that you
can migrate TCG guests to any host with any model :-) I wonder if
we are accidentally restricting that today, because we
assume KVM needs host support.
...
Once we start maintaining CPU models with all the details, we will
likely meet the same issues QEMU folks meet, i.e., we will need to fix
bugs in existing CPU models. And it's not just about adding or removing CPU
features but also fixing other parameters, such as wrong level, etc.
It's clear every change will require a new CPU model to be defined. But
I think we should do it in a way that applications or users should not
need (if they don't want to) to care about it. I'm thinking about doing
something similar to machine types. Each CPU model could be defined in
several versions and a CPU spec without a version would be an alias to
the latest version.
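
To make the machine-type-like aliasing concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the model name "ExampleCPU", the "-vN" suffix syntax and the table contents are not part of the proposal.

```python
# Hypothetical model table: model name -> {version: definition}.
CPU_MODELS = {
    "ExampleCPU": {
        1: {"level": 2, "features": {"feat-a"}},
        2: {"level": 4, "features": {"feat-a", "feat-b"}},
    },
}


def resolve_model(spec):
    """Resolve "Name" or "Name-vN" to a (name, version) pair.

    A spec without a version suffix is an alias for the latest version.
    """
    name, sep, suffix = spec.rpartition("-v")
    if sep and suffix.isdigit():
        version = int(suffix)
    else:
        name, version = spec, None

    versions = CPU_MODELS.get(name)
    if versions is None:
        raise KeyError("unknown CPU model: " + name)
    if version is None:
        version = max(versions)
    elif version not in versions:
        raise KeyError("unknown version %d of %s" % (version, name))
    return name, version


print(resolve_model("ExampleCPU"))
print(resolve_model("ExampleCPU-v1"))
```

With such a scheme, fixing a bug in a model means adding a new version, while an explicitly versioned spec keeps resolving to the same definition.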
Agreed, I think that versioning CPU models independently of machine
types makes sense. It is probably a little more complex - in most cases
we'd increase the version, but in some cases I think we'd end up wanting
to define new named models. For example, with the recent TSX scenario we
had, using versions would not have been appropriate, because Intel in
fact ship 2 variants of the silicon. So even with versioning, we
would still have wanted to introduce the noTSX variants of the models.
...
The problem is, we need to maintain backward compatibility and we should
avoid breaking existing domains (shouldn't we?) which just work even
though their guest CPUs do not exactly match the domain XML definitions.
Yep, breaking existing domains isn't too pleasant!
...
So either we need to define all existing CPU models in all their
variants used for various machine types and have a mapping between
(model without a version, machine type) to a specific version of the
model (which may be quite hard) or we need to be able to distinguish
between an existing domain and a new domain with no CPU model version.
While host-model and host-passthrough CPU modes are easy because they
are designed to change every time a domain starts (which means we don't
need to be able to distinguish between existing and new domains), custom
CPU mode is tricky. Currently, the only halfway reasonable thing
which came to my mind is to have a new CPU mode, but it still seems
awkward so please share your ideas if you have any.
Introducing a new CPU mode feels pretty unpleasant to me.
Although it will certainly be tedious work, getting details of all the
CPU variants for historical machine types should be doable I think.
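
The mapping from (model without a version, machine type) to a specific model version discussed above could look roughly like the following Python sketch. The table contents, the placeholder machine type names and the fallback to the latest version for unknown machine types are all assumptions; real entries would have to come from auditing each historical machine type.

```python
# Hypothetical data: which version of a model each machine type used.
VERSION_BY_MACHINE = {
    ("ExampleCPU", "machine-1.0"): 1,
    ("ExampleCPU", "machine-2.0"): 2,
}

# Hypothetical latest version of each model.
LATEST_VERSION = {"ExampleCPU": 2}


def pick_version(model, machine_type, explicit_version=None):
    """Choose the model version for a domain.

    An explicit version in the domain XML always wins. Otherwise the
    machine type decides, so an existing domain keeps the CPU it has
    always had. Unknown machine types fall back to the latest version.
    """
    if explicit_version is not None:
        return explicit_version
    key = (model, machine_type)
    return VERSION_BY_MACHINE.get(key, LATEST_VERSION[model])


print(pick_version("ExampleCPU", "machine-1.0"))
print(pick_version("ExampleCPU", "machine-3.0"))
```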
...
BTW, I don't think we should try to expose every part of the CPU model
definitions in domain XML, they should remain hidden behind the CPU
model name. It would be hard to explain what each of the extra
parameters mean, each model would have to include them anyway since we
can't expect users to provide all the details correctly, and once
visible in domain XML it could encourage users to play with the values.
Yeah, I don't think we need to expose all the raw details. If people really
badly want to be able to customize that, then we should instead look at
how we could better enable the cpu_map.xml file to be admin extensible.
Hi Daniel and Jirka,

just a ping in case you missed my comment...
...
Hi,
Michael Mueller (IBM) is currently working on a QEMU extension to support CPU models on the s390x platform.
Following discussion on the QEMU mailing list, the implementation was made more generic so that it can support all platforms.
Based on that new implementation, I have implemented a first version for libvirt that retrieves the CPU model(s) supported by QEMU on s390x.
Because the discussion is still ongoing, my prototype is not ready to be tested yet.
A short overview of the current prototype (it requires the QEMU CPU model support patches from Michael Mueller):
1. When the libvirt daemon starts, the QEMU monitor is used to retrieve the CPU models QEMU supports (i.e. just the model names; QEMU handles all other settings such as features).
2. The supported CPU models are stored in libvirt's QEMU capabilities (and in the capabilities cache file).
3. On the s390x platform, each call of virConnectGetCPUModelNames() (i.e. qemuConnectGetCPUModelNames()) retrieves the information from the QEMU capabilities (cached or not).
All other platforms keep using the current approach of parsing cpu_map.xml.
With that implementation, all requests for CPU models (e.g. for CPU model comparison or CPU model listing) will return more accurate results (e.g. if the QEMU binary is replaced by a manually built
one).
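
The flow of the prototype can be sketched in Python as follows. This is not the actual libvirt code: the class, the function names, the placeholder model names and the dictionary standing in for cpu_map.xml are all hypothetical, and the probe callable merely stands in for the QEMU monitor query.

```python
class QemuCaps:
    """Holds the CPU model names probed once from a QEMU binary."""

    def __init__(self, probe_cpu_models):
        # probe_cpu_models stands in for the QEMU monitor query made when
        # the daemon starts; the real monitor command is not shown here.
        self.cpu_models = list(probe_cpu_models())


def get_cpu_model_names(arch, qemu_caps, cpu_map):
    """Return model names from QEMU capabilities on s390x, else cpu_map."""
    if arch == "s390x":
        return list(qemu_caps.cpu_models)
    return list(cpu_map.get(arch, []))


probe_calls = []


def fake_probe():
    """Pretend to ask QEMU for its models and count how often we do so."""
    probe_calls.append(1)
    return ["model-a", "model-b"]


caps = QemuCaps(fake_probe)            # step 1 and 2: probe once, then cache
cpu_map = {"x86_64": ["model-x"]}      # stand-in for the parsed cpu_map.xml

print(get_cpu_model_names("s390x", caps, cpu_map))   # step 3: served from caps
print(get_cpu_model_names("x86_64", caps, cpu_map))  # other platforms: cpu_map
print(len(probe_calls))
```

Because the names come from the QEMU binary itself, swapping in a different binary and re-probing would change the list without touching cpu_map.xml.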
...
Regards,
Daniel
-- 

Kind regards
Daniel Hansel

IBM Deutschland Research & Development GmbH
Chairwoman of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp
Registered office: Böblingen
Registration court: Amtsgericht Stuttgart, HRB 243294