On Mon, Jul 29, 2024 at 01:00:30PM -0400, Peter Xu wrote:
On Mon, Jul 29, 2024 at 04:58:03PM +0100, Daniel P. Berrangé wrote:
>
> We've got two mutually conflicting goals with the machine type
> definitions.
>
> Primarily we use them to ensure stable ABI, but an important
> secondary goal is to enable new tunables to have new defaults
> set, without having to update every mgmt app. The latter
> works very well when the defaults have no dependency on the
> platform kernel/OS, but breaks migration when they do have a
> platform dependency.
>
> > - Firstly, never quietly flipping any bit that affects the ABI...
> >
> > - Have a default value of off, then QEMU will always allow the VM to boot
> > by default, while advanced users can opt-in on new features. We can't
> > make this ON by default, otherwise some VMs can already fail to boot.
> >
> > - If the host doesn't support the feature while the cmdline enabled it,
> > it needs to fail QEMU boot rather than flipping, so that it says "hey,
> > this host does not support running such VM specified, due to XXX
> > feature missing".
> >
> > That's the only way a user could understand what happened, and IMHO that's
> > a clean way that we stick with QEMU cmdline on defining the guest ABI,
> > in which the machine type is the foundation of such definition, as the
> > machine type can decide many of the rest of the compat properties. And
> > that's the whole point of the compat properties too (to make sure the
> > guest ABI is stable).
> >
> > If the kernel can break it that easily, all the compat properties that we
> > maintain stop making sense in general, because they no longer define the
> > whole guest ABI.
> >
> > So AFAIU that's really what we used for years, I hope I didn't overlook
> > something. And maybe we don't yet need the "-platform" layer if we can
> > keep up with this rule?
>
> We've failed at this for years wrt enabling use of new defaults that have
> a platform dependency, so historical practice isn't a good reference.
>
> There are 100's (possibly 1000's) of tunables set implicitly as part of
> the machine type, and of those, libvirt likely only exposes a few 10's
> of tunables. The vast majority are low level details that no mgmt app
> wants to know about, they just want to accept QEMU's new defaults,
> while preserving machine ABI. This is a good thing. No one wants the
> burden of wiring up every single tunable into libvirt and mgmt apps.
>
> This is what the "-platform" concept would be intended to preserve. It
> would allow a way to enable groups of settings that have a platform level
> dependency, without ever having to teach either libvirt or the mgmt apps
> about the individual tunables.
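
To make the layering in the paragraph above concrete, here is a minimal
sketch. It is not QEMU code, and no "-platform" option exists today; the
machine type name, the platform group names and the tunable names are all
invented for illustration. The only point is the ordering: machine type
defaults first, then a named platform group, then explicit overrides.

```python
# Hypothetical data: none of these names exist in QEMU.
MACHINE_DEFAULTS = {
    # ABI-stable defaults tied to a (made-up) versioned machine type.
    "demo-9.0": {"feat-a": False, "feat-b": False},
}

PLATFORM_GROUPS = {
    # Groups of tunables that depend on what the host platform supports.
    "old-host": {},
    "new-host": {"feat-a": True, "feat-b": True},
}


def effective_config(machine, platform=None, overrides=None):
    """Layer machine defaults, then a platform group, then explicit overrides."""
    config = dict(MACHINE_DEFAULTS[machine])
    if platform is not None:
        config.update(PLATFORM_GROUPS[platform])
    config.update(overrides or {})
    return config


if __name__ == "__main__":
    # The mgmt app only names a platform group, never the individual tunables.
    print(effective_config("demo-9.0", platform="new-host"))
```

In this sketch a mgmt app would only need to pass the same group name on
both sides of a migration; it never has to learn about "feat-a" or "feat-b".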

Do you think we can achieve a similar goal by simply turning the feature to
ON only after a few QEMU releases?  I also mentioned that idea below.

https://lore.kernel.org/r/ZqQNKZ9_OPhDq2AK@x1n

So far it really sounds like the right thing to do to me to fix all similar
issues, even without introducing anything new we need to maintain.

Turning a feature with a platform dependency to "on" implies that
the machine type will cease to work out of the box for platforms
which lack the feature. IMHO that's not acceptable behaviour for
any of our supported platforms.

IOW, "after a few QEMU releases" implies a delay of as much as
5 years, while we wait for platforms which don't support the
feature to drop out of our supported targets list. I don't
think that'll satisfy the desire to get the new feature
available to users as soon as practical for their particular
platform.

To put that again, what we need to do is this:

- To start: QEMU should NEVER automatically flip any guest ABI relevant
  bits, for sure.

- When introducing any new device feature that may both (1) affect guest
  ABI, and (2) depend on host kernel features, we always set the default
  value to OFF at the start. So this already covers old machine
  types, no compat property needed so far.

- We always fail hard on QEMU boot whenever we detect that such a property
  is ON but not supported by the current host (and since it's OFF by
  default, it must be that the user specified ON).

- After a stabilization period for the new feature to land in most
  kernels (we may consider looking at how major Linux distros update their
  kernel versions), when we're pretty sure the new feature is
  available to most modern QEMU users, we add a patch to make the
  property default to ON on the new machine type, and add a compat
  property for old machines.

Our supported platform list determines when this will be, and given
our current criteria, this can be as long as 5 years.
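
As a rough illustration of the rules listed above (not QEMU code; the
machine type names, the property name and the helper are all made up for
the example), the default-OFF, fail-hard and later default-ON steps could
be sketched like this:

```python
class HostUnsupportedError(RuntimeError):
    """Raised instead of silently turning a requested feature off."""


# Hypothetical property name, for illustration only.
FEATURE = "x-new-feature"

# Hypothetical machine types: the old one keeps the feature OFF (what a
# compat property would pin), the new one defaults it to ON.
MACHINE_DEFAULTS = {
    "demo-1.0": {FEATURE: False},
    "demo-2.0": {FEATURE: True},
}


def resolve_properties(machine, user_opts, host_features):
    """Apply machine defaults, then user options; never flip a bit quietly."""
    props = dict(MACHINE_DEFAULTS[machine])
    props.update(user_opts)
    for name, enabled in props.items():
        if enabled and name not in host_features:
            # Fail hard so the user learns why this host cannot run the VM.
            raise HostUnsupportedError(
                f"host does not support '{name}' required by this VM")
    return props


if __name__ == "__main__":
    # Old machine type, feature left at its OFF default: boots on any host.
    print(resolve_properties("demo-1.0", {}, host_features=set()))
```

Calling resolve_properties("demo-2.0", {}, host_features=set()) raises in
this sketch, which is the case of a new machine type no longer working out
of the box on a supported platform that lacks the feature.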
With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|