On Thu, Jul 25, 2013 at 10:15:56AM -0300, Eduardo Habkost wrote:
On Thu, Jul 25, 2013 at 10:45:10AM +0100, Daniel P. Berrange wrote:
> On Wed, Jul 24, 2013 at 03:25:19PM -0300, Eduardo Habkost wrote:
> > In addition to the "-cpu host" KVM initialization problem, this is an
> > additional problem with the current interfaces provided by QEMU:
> >
> > 1) libvirt needs to query data that depend on chosen machine-type and
> > CPU model
> > 2) Some machine-type behavior is code and not introspectable data
> > * Luckily most of the data we need in this case should/will be
> > encoded in the compat_props tables.
> > * In either case, we don't have an API to query for machine-type
> > compat_props information yet.
> > 3) CPU model behavior will be modelled as CPU class behavior. Like
> > on the machine-type case, some of the CPU-model-specific behavior may
> > be modelled as code, and not introspectable data.
> > * However, we may be able to eventually encode most or all of
> > CPU-model-specific behavior simply as different per-CPU-class
> > property defaults.
> > * In either case, we don't have an API for QOM class introspection,
> > yet.
> >
> > But there's something important in this case: the resulting CPUID data
> > for a specific machine-type + CPU-model combination must be always the
> > same, forever. This means libvirt may even use a static table, or cache
> > this information indefinitely.
> >
> > (Note that I am not talking about "-cpu host", here, but about all the
> > other CPU models)
>
> Hmm, so if the CPU filtering can vary per every single individual
> machine type, then the approach Jiri started here, of invoking QEMU
> with machine type set to query the CPU after it was created, is
> definitely not something we can follow. It is just far too inefficient.
I believe there's some confusion here: we are trying to solve two
problems:
1) CPU feature filtering (checking which features are available in a
given host)
2) CPU model probing (checking what exactly is going to be available
when a given CPU model is used, in case nothing is filtered out)
Yep, what Jiri proposed in the original libvirt thread was just a
solution to 1). In seeing that though, I was concerned about how it
scales up once we have to deal with 2) as well, which I believe is
planned future work.
Item (1) depends on: host CPU capabilities, host kernel capabilities,
QEMU capabilities, presence of a few QEMU command-line options (e.g.
kernel irqchip), but shouldn't depend on the machine-type. It depends on
/dev/kvm being open.
Item (2) depends on the machine-type, but is static and must never
change on future QEMU versions (if it changes, it is a QEMU bug). It
doesn't depend on opening /dev/kvm.
Item (1) can be solved if libvirt does the work itself, by opening
/dev/kvm and checking for GET_SUPPORTED_CPUID and checking for QEMU
options/capabilities (as long as we document that very carefully). But
adding a more specific QMP command that won't require accel=kvm to work
may be simpler and better for everybody.
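
As a rough illustration of the "libvirt does the work itself" option, a
minimal sketch of the GET_SUPPORTED_CPUID query might look like the
following. Only /dev/kvm and the ioctl come from the discussion above. The
struct layouts are assumed to follow the Linux KVM headers on x86, and the
helper names (parse_cpuid2, get_supported_cpuid) and the grow-on-E2BIG
sizing are assumptions made for illustration. It does not cover the
QEMU options/capabilities half of the check.

```python
import errno
import fcntl
import os
import struct

# Assumed x86 Linux layouts:
#   struct kvm_cpuid2       { __u32 nent; __u32 padding; entries[]; }
#   struct kvm_cpuid_entry2 { function, index, flags, eax, ebx, ecx, edx,
#                             padding[3] }  (all __u32)
HEADER = struct.Struct("=II")
ENTRY = struct.Struct("=10I")

KVMIO = 0xAE


def _iowr(type_, nr, size):
    # Linux _IOWR() encoding on x86: direction | size | type | number.
    return (3 << 30) | (size << 16) | (type_ << 8) | nr


KVM_GET_SUPPORTED_CPUID = _iowr(KVMIO, 0x05, HEADER.size)


def parse_cpuid2(buf):
    """Decode a kvm_cpuid2 buffer into a list of per-leaf dicts."""
    nent, _padding = HEADER.unpack_from(buf, 0)
    entries = []
    for i in range(nent):
        offset = HEADER.size + i * ENTRY.size
        fields = ENTRY.unpack_from(buf, offset)
        entries.append({
            "function": fields[0], "index": fields[1], "flags": fields[2],
            "eax": fields[3], "ebx": fields[4],
            "ecx": fields[5], "edx": fields[6],
        })
    return entries


def get_supported_cpuid(path="/dev/kvm", start_entries=16, limit=4096):
    """Ask the host kernel which CPUID bits KVM can expose to guests."""
    fd = os.open(path, os.O_RDWR)
    try:
        nent = start_entries
        while nent <= limit:
            buf = bytearray(HEADER.size + nent * ENTRY.size)
            HEADER.pack_into(buf, 0, nent, 0)
            try:
                fcntl.ioctl(fd, KVM_GET_SUPPORTED_CPUID, buf)
            except OSError as exc:
                if exc.errno != errno.E2BIG:
                    raise
                nent *= 2  # buffer too small for this host, try a larger one
                continue
            return parse_cpuid2(buf)
        raise RuntimeError("CPUID table larger than %d entries" % limit)
    finally:
        os.close(fd)
```

The result reflects only the host CPU and host kernel, which is why it
would still have to be combined with the QEMU-side checks mentioned above.
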
Item (2) may be solved today using a static table and/or caching (so
libvirt just needs to query this information once in a lifetime). It can
also be solved partially (without machine-type support) in theory if
QEMU let libvirt repeatedly create and destroy CPU objects just to query
the resulting feature properties.
We really don't want to have static tables, since that creates pain
in the case where distro vendors create their own custom machine types
or CPU models. It would mean libvirt had to record info not only about
upstream QEMU, but about every vendor's QEMU builds. Probing the actual
binary is the only sensible way here.
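
One way to reconcile "query once" with "probe the actual binary" is a cache
keyed on the identity of the QEMU binary as well as on the machine type and
CPU model. The sketch below is only an assumption about how such a cache
could be organised: the class name, the probe callback and the use of file
size plus mtime as the binary's identity are all hypothetical, and the
probe itself (however it ends up being implemented) is passed in from
outside.

```python
import os
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

# (real path, mtime in ns, size, machine type, CPU model)
Key = Tuple[str, int, int, str, str]


class CpuModelCache:
    """Remember probed CPU features per (binary, machine type, CPU model)."""

    def __init__(self, probe: Callable[[str, str, str], Iterable[str]]):
        # probe(binary, machine, cpu_model) is a hypothetical callback that
        # returns the feature names for that combination.
        self._probe = probe
        self._cache: Dict[Key, FrozenSet[str]] = {}
        self.probe_calls = 0

    def _key(self, binary: str, machine: str, cpu_model: str) -> Key:
        st = os.stat(binary)
        # A vendor rebuild or upgrade changes size and/or mtime, which
        # invalidates every entry recorded for the old binary.
        return (os.path.realpath(binary), st.st_mtime_ns, st.st_size,
                machine, cpu_model)

    def features(self, binary: str, machine: str,
                 cpu_model: str) -> FrozenSet[str]:
        key = self._key(binary, machine, cpu_model)
        if key not in self._cache:
            self.probe_calls += 1
            self._cache[key] = frozenset(
                self._probe(binary, machine, cpu_model))
        return self._cache[key]
```

With this shape, a distro's custom machine types or CPU models need no
entries shipped in libvirt: they are probed from that vendor's binary the
first time they are requested and reused until the binary changes.
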
...but both problems could be solved very easily using current QEMU
interfaces, if libvirt simply executed the QEMU binary more than once.
Is "must not run QEMU more than once" a hard requirement? Perfect is the
enemy of good. :)
Yes, it is a hard requirement.
> I understand that the QEMU code isn't currently structured in a way
> that lets it easily expose information that varies per machine type,
> but I don't think we need to solve the entire problem space in a
> perfectly generic fashion here. Perfect is the enemy of good.
Right. Also, the more important item (item 1) is not affected by
machine-types. Host features change every time you run on a new
host/kernel, so probing it precisely is very useful, to detect problems
earlier (not just at the last moment before starting a VM).
On the other hand, per-machine-type CPU model changes are more rare, and
libvirt can still detect unexpected results immediately before the VM is
started. (I don't know what libvirt would do in case it detects it,
though. Abort? Log a warning?)
We don't want to be running QEMU multiple times during the startup
process for a VM, because that adds delays to the startup process.
It might not sound like much but adding a few 100ms to probe CPUs
by running QEMU is quite significant for apps like libvirt-sandbox
and libguestfs where absolute boot time is important. We used to run
QEMU at startup to probe things & we just recently got rid of that
delay, so I don't want to re-introduce it. When starting
a VM, we only want to start QEMU once, as the actual instance that
is going to run the VM.
> If we can get all the CPU feature flag filtering information to be
> in statically defined data structures, then it seems that it would
> be pretty straightforward to add a monitor API that takes a CPU
> model name and machine type name, and returns the list of feature
> flags, without actually having to initialize the machine type or
> CPU. It can even just open /dev/kvm & issue the necessary ioctl,
> without having to initialize the entire KVM CPU subsystem in QEMU.
The "without actually having to initialize the machine" part may be
complicated, but it may be doable. But depending on the direction QEMU
machine-types design is going (I don't know if there are plans to
eventually make them more QOM-friendly), the solution accepted by QEMU
may be different.
I will suggest this as a topic for the next KVM call. Are you interested
in joining the call?
Yes, assuming it doesn't clash with anything else I have scheduled.
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|