On Thu, Jul 25, 2013 at 10:45:10AM +0100, Daniel P. Berrange wrote:
On Wed, Jul 24, 2013 at 03:25:19PM -0300, Eduardo Habkost wrote:
> On Tue, Jul 23, 2013 at 07:32:46PM +0200, Jiri Denemark wrote:
> > On Tue, Jul 23, 2013 at 19:28:38 +0200, Jiri Denemark wrote:
> > > On Tue, Jul 23, 2013 at 17:32:42 +0100, Daniel Berrange wrote:
> > > > On Tue, Jul 23, 2013 at 06:11:33PM +0200, Jiri Denemark wrote:
> > > > > ---
> > > > > src/qemu/qemu_monitor.c | 21 +++
> > > > > src/qemu/qemu_monitor.h | 3 +
> > > > > src/qemu/qemu_monitor_json.c | 162 +++++++++++++++++++++
> > > > > src/qemu/qemu_monitor_json.h | 6 +
> > > > > tests/Makefile.am | 1 +
> > > > > .../qemumonitorjson-getcpu-empty.data | 2 +
> > > > > .../qemumonitorjson-getcpu-empty.json | 46 ++++++
> > > > > .../qemumonitorjson-getcpu-filtered.data | 4 +
> > > > > .../qemumonitorjson-getcpu-filtered.json | 46 ++++++
> > > > > .../qemumonitorjson-getcpu-full.data | 4 +
> > > > > .../qemumonitorjson-getcpu-full.json | 46 ++++++
> > > > > .../qemumonitorjson-getcpu-host.data | 5 +
> > > > > .../qemumonitorjson-getcpu-host.json | 45 ++++++
> > > > > tests/qemumonitorjsontest.c | 74 ++++++++++
> > > > > 14 files changed, 465 insertions(+)
> > > > > create mode 100644 tests/qemumonitorjsondata/qemumonitorjson-getcpu-empty.data
> > > > > create mode 100644 tests/qemumonitorjsondata/qemumonitorjson-getcpu-empty.json
> > > > > create mode 100644 tests/qemumonitorjsondata/qemumonitorjson-getcpu-filtered.data
> > > > > create mode 100644 tests/qemumonitorjsondata/qemumonitorjson-getcpu-filtered.json
> > > > > create mode 100644 tests/qemumonitorjsondata/qemumonitorjson-getcpu-full.data
> > > > > create mode 100644 tests/qemumonitorjsondata/qemumonitorjson-getcpu-full.json
> > > > > create mode 100644 tests/qemumonitorjsondata/qemumonitorjson-getcpu-host.data
> > > > > create mode 100644 tests/qemumonitorjsondata/qemumonitorjson-getcpu-host.json
> > > >
> > > > ACK, though I believe the design of this monitor API is flawed
> > > > because it requires you to re-launch QEMU with different accel
> > > > args
> > >
> > > Not really, this can be used in tcg too. It's just that when we want
> > > to get the data for the "host" CPU, we need to enable kvm, as tcg
> > > knows nothing about that CPU. Which makes sense, as kvm (the kernel
> > > module) influences what the "host" CPU will look like.
> >
> > However, you need to have a CPU to be able to ask for its properties
> > (which kinda makes sense too) and for that you also need a machine with
> > type != none. Which also makes sense, as the CPU may differ depending on
> > the machine type (which, however, does not happen for the "host" CPU).
>
> In addition to the "-cpu host" KVM initialization problem, this is
> another problem with the current interfaces provided by QEMU:
>
> 1) libvirt needs to query data that depend on the chosen machine-type
> and CPU model
> 2) Some machine-type behavior is code and not introspectable data
> * Luckily most of the data we need in this case should/will be
> encoded in the compat_props tables.
> * In either case, we don't have an API to query for machine-type
> compat_props information yet.
> 3) CPU model behavior will be modelled as CPU class behavior. As in
> the machine-type case, some of the CPU-model-specific behavior may
> be modelled as code, and not introspectable data.
> * However, we may be able to eventually encode most or all of the
> CPU-model-specific behavior simply as different per-CPU-class
> property defaults.
> * In either case, we don't have an API for QOM class introspection,
> yet.
>
> But there's something important in this case: the resulting CPUID data
> for a specific machine-type + CPU-model combination must always be the
> same, forever. This means libvirt may even use a static table, or cache
> this information indefinitely.
>
> (Note that I am not talking about "-cpu host" here, but about all the
> other CPU models.)

> Hmm, so if the CPU filtering can vary per every single individual
> machine type, then the approach Jiri started here, of invoking QEMU
> with the machine type set to query the CPU after it was created, is
> definitely not something we can follow. It is just far too inefficient.

I believe there's some confusion here: we are trying to solve two
problems:
1) CPU feature filtering (checking which features are available on a
   given host)
2) CPU model probing (checking what exactly is going to be available
   when a given CPU model is used, in case nothing is filtered out)

Item (1) depends on: host CPU capabilities, host kernel capabilities,
QEMU capabilities, and the presence of a few QEMU command-line options
(e.g. kernel irqchip), but it shouldn't depend on the machine-type. It
does require /dev/kvm to be opened.

Item (2) depends on the machine-type, but it is static and must never
change in future QEMU versions (if it changes, it is a QEMU bug). It
doesn't depend on opening /dev/kvm.

Item (1) can be solved if libvirt does the work itself, by opening
/dev/kvm, querying KVM_GET_SUPPORTED_CPUID, and checking for QEMU
options/capabilities (as long as we document that very carefully). But
adding a more specific QMP command that won't require accel=kvm to
work may be simpler and better for everybody.

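Just to make the "libvirt does the work itself" option concrete, here is
a minimal standalone sketch (illustration only, not code from this
series; it assumes an x86 host whose kernel implements the
KVM_GET_SUPPORTED_CPUID ioctl):

/* Dump every CPUID leaf the host kernel is willing to expose to guests,
 * which is roughly the raw data libvirt would need for item (1). */
#include <errno.h>
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/kvm", O_RDWR | O_CLOEXEC);
    if (fd < 0) {
        perror("open /dev/kvm");
        return 1;
    }

    struct kvm_cpuid2 *cpuid;
    int nent = 64;

    for (;;) {
        cpuid = calloc(1, sizeof(*cpuid) +
                          nent * sizeof(struct kvm_cpuid_entry2));
        cpuid->nent = nent;
        if (ioctl(fd, KVM_GET_SUPPORTED_CPUID, cpuid) == 0)
            break;                  /* cpuid->nent now holds the real count */
        if (errno != E2BIG) {
            perror("KVM_GET_SUPPORTED_CPUID");
            return 1;
        }
        free(cpuid);                /* buffer too small: grow it and retry */
        nent *= 2;
    }

    for (unsigned i = 0; i < cpuid->nent; i++) {
        struct kvm_cpuid_entry2 *e = &cpuid->entries[i];
        printf("leaf 0x%08x index %u: eax=%08x ebx=%08x ecx=%08x edx=%08x\n",
               e->function, e->index, e->eax, e->ebx, e->ecx, e->edx);
    }

    free(cpuid);
    close(fd);
    return 0;
}

The grow-on-E2BIG retry matters because the kernel only fills as many
entries as the caller provides room for; QEMU does essentially the same
thing internally before filtering the result against the requested CPU
model.
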
Item (2) may be solved today using a static table and/or caching (so
libvirt just needs to query this information once in a lifetime; a rough
sketch of such a cached table follows below). In theory it could also be
solved partially (without machine-type support) if QEMU let libvirt
repeatedly create and destroy CPU objects just to query the resulting
feature properties.

...but both problems could be solved very easily using current QEMU
interfaces, if libvirt simply executed the QEMU binary more than once.
Is "must not run QEMU more than once" a hard requirement? Perfect is the
enemy of good. :)
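
To make the static-table idea above concrete: the cached data could be
as simple as the sketch below. All names and feature strings are made up
for illustration; nothing like this exists in libvirt today.

/* Hypothetical cache of expanded CPU models, keyed on machine type and
 * model name.  Because the expansion of a given pair must never change,
 * entries never expire; unknown pairs would be probed once via QEMU and
 * then added. */
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *machine_type;   /* e.g. "pc-i440fx-1.5" (example value)  */
    const char *cpu_model;      /* e.g. "SandyBridge"                    */
    const char *features;       /* expanded feature flags (placeholder)  */
} CpuModelExpansion;

static const CpuModelExpansion expansions[] = {
    { "pc-i440fx-1.5", "SandyBridge", "aes avx x2apic ... (example only)" },
    { "pc-i440fx-1.5", "Westmere",    "aes sse4.2 ... (example only)" },
};

static const char *lookup_expansion(const char *machine, const char *model)
{
    for (size_t i = 0; i < sizeof(expansions) / sizeof(expansions[0]); i++) {
        if (strcmp(expansions[i].machine_type, machine) == 0 &&
            strcmp(expansions[i].cpu_model, model) == 0)
            return expansions[i].features;
    }
    return NULL;    /* not cached yet: probe QEMU once, then remember it */
}

int main(void)
{
    const char *f = lookup_expansion("pc-i440fx-1.5", "SandyBridge");
    printf("%s\n", f ? f : "not cached");
    return 0;
}
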
> I understand that the QEMU code isn't currently structured in a way
> that lets it easily expose information that varies per machine type,
> but I don't think we need to solve the entire problem space in a
> perfectly generic fashion here. Perfect is the enemy of good.

Right. Also, the more important item (item 1) is not affected by
machine-types. Host features change every time you run on a new
host/kernel, so probing them precisely is very useful to detect problems
earlier (not just at the last moment before starting a VM).

On the other hand, per-machine-type CPU model changes are much rarer, and
libvirt can still detect unexpected results immediately before the VM is
started. (I don't know what libvirt would do if it detected such a
mismatch, though. Abort? Log a warning?)

> If we can get all the CPU feature flag filtering information to be
> in statically defined data structures, then it seems that it would
> be pretty straightforward to add a monitor API that takes a CPU
> model name and machine type name, and returns the list of feature
> flags, without actually having to initialize the machine type or
> CPU. It can even just open /dev/kvm & issue the necessary ioctl,
> without having to initialize the entire KVM CPU subsystem in QEMU.

The "without actually having to initialize the machine" part may be
complicated, but it may be doable. But depending on the direction QEMU
machine-types design is going (I don't know if there are plans to
eventually make them more QOM-friendly), the solution accepted by QEMU
may be different.
I will suggest this as a topic for the next KVM call. Are you interested
in joining the call?
--
Eduardo