On Mon, Mar 19, 2018 at 04:14:20PM +0100, Marek Marczykowski-Górecki wrote:
On Mon, Mar 19, 2018 at 02:11:02PM +0000, Daniel P. Berrangé wrote:
> On Fri, Mar 16, 2018 at 08:54:42PM +0100, Marek Marczykowski-Górecki wrote:
> > On Fri, Mar 16, 2018 at 05:44:28PM +0000, Daniel P. Berrangé wrote:
> > > Since Xen lets you specify raw "cpuid" register values here,
surely
> > > this is flexible enough to allow us to support the mode=custom CPU
> > > models ?
> > >
> > > We would just need to make sure every bit poisition used either
> > > 0 or 1, and not 'x', so that we are fully overriding whatever
> > > defaults are presented by the hypervisor "host" CPU model. Or
is
> > > life more complicated than that ?
> >
> > This is what v1 of this series had... See discussion there, especially
> > message from Jiri:
> >
https://www.redhat.com/archives/libvir-list/2017-June/msg01304.html
> >
> > And this, from... you:
> >
https://www.redhat.com/archives/libvir-list/2017-June/msg01308.html
>
> Ok, yeah, I remember these discussions now. What I didn't realize at the
> time though, was that the revised patches would end up silently changing
> any XML that says mode="custom" + model="Foobar", into
mode="host-passthrough"
>
> The core problem we faced in QEMU was that if you take a base named CPU
> model and then add/remove 20+ features, some guest OS can get confused.
> This is because the combination of features may be unusual, or the
> level & xlevel values reported alongside CPUID are unsual for the given
> set of features enabled. The Xen config doesn't provide a way to configure
> the level & xlevel values for the CPU, so will potentially hit this same
> problem. This is *already* possible though, even with host-passthrough,
> since the user can ask for host-passthrough and also disable a whole
> bunch of features.
>
> IOW, Jiri is right, but I don't think that neccesarily implies we should
> not implement support for mode="custom". I see we have 3 options here
> currently
>
> 1. Support mode="custom", using the cpuid config option in libxl
> 2. Raise an error when starting a guest with mode="custom"
> 3. Translate mode="custom" into mode="host-passthrough" and
ignore
> requested named model
>
> So currently situation is that users of libxl libvirt driver may have
> deployed guests using arbitrary <cpu> model setup, and that would have
> been ignored, giving that a host-passthrough setup.
>
> If we choose option (1), users will have changed semantics for their
> existing guests, as instead of getting full host-passthrough, they'll
> get a CPU that actually matches the mode=custom their XML has beeen
> requesting all along.
>
> If we choose option (2), we'll break existing deployed guests with
> mode=custom.
>
> If we choose option (3), users will have preserved semantics for their
> existing guests, but we'll never be honouring the actual XML config
> the user requested. If we did ever want to properly support mode=custom
> in future we're still going doom existing users.
There is a variant of this option: do not translate mode="custom" into
mode="host-passthrough" explicitly (keep it mode="custom" in the
config), but ignore both model and features in mode="custom" - as it is
currently.
In other words - add CPUID support only to mode="host-passthrough" and
keep other modes untouched.
Implementing proper mode=custom support (either now or later) will
change guest view of the system anyway, it doesn't matter when we do it.
And I'm leaning to doing it later.
Yes, just continuing to ignore mode=custom entirely would be acceptable,
as that's no worse than what we do today.
Perhaps you could actually emit a warning message
"Ignoring CPU with mode=custom, update your config to mode=host-passthrough
to avoid risk of changed guest semantics when mode=custom is supported in
the future".
so that we reduce level of surprise in future if we implement it.
> Ideally we would have choosen option (2) from day 1, but that
ship
> has sailed.
>
> I really dislike, even hate, option (3) because it is explicitly
> ignoring a valid XML configuration request and turning it into
> something completely different.
Well, libvirt does it all the time - by ignoring not supported domain
config elements instead of raising some error. For example most
<feature policy=...> are in this category, regardless of the mode, until
this patch series. Same for <cputune>, <memtune>, and other <*tune>,
various hypervisor features (libxl supports apic, pae, and acpi, others
are silently ignored). I can go on with the list...
And I guess it isn't only libxl driver.
Yes, we do unfortunately ignore unsupported XML elements in general, but
once we do support something, we try to report errors if we can't support
every enum choice it offers.
I don't want to implement both mode=custom and
mode=host-passthrough
in this patch series, mostly for non-technical reasons. It already took
very long time and I'd like to finally have at least some of it in. And
also it looks like mode=custom will be mostly separate code (maybe even
using old xend syntax, to have control over all cpuid bits, not only
those listed in libxl_cpuid.c).
Yes, understandable - if you drop the code that silently re-writes the
XML for mode=custom & add a warning message, that would be acceptable
to merge from my pov.
> So despite the problems / limitations of mode=custom that were
previously
> raised, I none the less think we should implement mode=custom for the
> libxl driver. Then document that its usage is discouraged - for the same
> reason that we discourage users from arbitrarily blanking out features
> for mode=host-passthrough.
>
> Perhaps in future, we could make mode=custom work more reliably if the
> libxl driver provided a way to set the level & xlevel values.
What exactly do you mean by level & xlevel values?
libxl cpuid syntax supports maxleaf and maxhvleaf - is it the same?
See here for details:
http://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html#cpuid-LIBXL_STRING...
Ok, yeah, I guess I should have avoided QEMU specific terminology.
'level', and 'xlevel' are the values reported in eax, when
the user issues CPU ID index '0' or '0x800000000' respectively.
IIUC, essentially they value of 'level' is saying what is the
maximum leaf supported, and xlevel is the maximum hypevisor
leaf value. So this does sound like it probably matches to the Xen
maxleaf/maxhvleaf values.
The other thing I forgot were the family, model & stepping values.
I think those were actually the more critical thing from POV of
avoiding guest bugs, as certain OS / apps have (broken) assumption
about existance of features implying a certain min stepping level
or vica-verca.
> > This is not only about 'x'. But also about setting
'1' where hardware
> > does not really support given feature. This will also result in
"broken"
> > CPU.
>
> If we host hardware does not support a given feature bit, then the code
> has to make a decision based on the <feature policy=...> setting. ie
> "force" would present the feature to the guest, regardless of whether
> the host supports ir, while "require" would refuse to start the guest,
> and "optional" would silently disable the feature.
Hmm, but <feature policy=...> applies only to bits you explicitly
specify, not all those implied by given model, right? This means that
innocent model setting may also result in weird configuration. This is
nothing libxl specific, just about mode=custom on hardware assisted
virtualization (as opposed to pure software one, that could provide some
features regardless of host support).
For any CPU ID bits implied by the model, I believe we just decided they
would be assumed to have policy=require. Though in QEMU driver we actually
end up having policy=optional in many cases since historically we've no
way to identify if KVM silently blocks certain features, even if the host
CPU supports them. Even if a feature is implied by the <model>, the user
could still explicitly list if with <feature> to set a contrary policy.
Regards,
Daniel
--
|:
https://berrange.com -o-
https://www.flickr.com/photos/dberrange :|
|:
https://libvirt.org -o-
https://fstop138.berrange.com :|
|:
https://entangle-photo.org -o-
https://www.instagram.com/dberrange :|