On 2024/08/07 5:41, Peter Xu wrote:
> On Mon, Aug 05, 2024 at 04:27:43PM +0900, Akihiko Odaki wrote:
> > On 2024/08/04 22:08, Peter Xu wrote:
> > > On Sun, Aug 04, 2024 at 03:49:45PM +0900, Akihiko Odaki wrote:
> > > > On 2024/08/03 1:26, Peter Xu wrote:
> > > > > On Sat, Aug 03, 2024 at 12:54:51AM +0900, Akihiko Odaki wrote:
> > > > > > > > > > I'm not sure if I read it right. Perhaps you meant something more generic
> > > > > > > > > > than -platform but similar?
> > > > > > > > >
> > > > > > > > > > For example, "-profile [PROFILE]" qemu cmdline, where PROFILE can be either
> > > > > > > > > > "perf" or "compat", while by default to "compat"?
> > > > > > > >
> > > > > > > > "perf" would cover 4) and "compat" will cover 1). However neither of them
> > > > > > > > will cover 2) because an enum is not enough to know about all hosts. I
> > > > > > > > presented a design that will cover 2) in:
> > > > > > > >
> > > > > > > > https://lore.kernel.org/r/2da4ebcd-2058-49c3-a4ec-8e60536e5cbb@daynix.com
> > > > > > >
> > > > > > > "-merge-platform" shouldn't be a QEMU parameter, but should be something
> > > > > > > separate.
> > > > > >
> > > > > > Do you mean merging platform dumps should be done with another command? I
> > > > > > think we will want to know the QOM tree is in use when implementing
> > > > > > -merge-platform. For example, you cannot define a "platform" when e.g., you
> > > > > > don't know what netdev backend (e.g., user, vhost-net, vhost-vdpa) is
> > > > > > connected to virtio-net devices. Of course we can include those information
> > > > > > in dumps, but we don't do so for VMState.
> > > > >
> > > > > What I was thinking is the generated platform dump shouldn't care about
> > > > > what is used as backend: it should try to probe whatever is specified in
> > > > > the qemu cmdline, and it's the user's job to make sure the exact same qemu
> > > > > cmdline is used in other hosts to dump this information.
> > > > >
> > > > > IOW, the dump will only contain the information that was based on the qemu
> > > > > cmdline. E.g., if it doesn't include virtio device at all, and if we only
> > > > > support such dump for virtio, it should dump nothing.
> > > > >
> > > > > Then the -merge-platform will expect all dumps to look the same too,
> > > > > merging them with AND on each field.
> > > >
> > > > I think we will still need the QOM tree in that case. I think the platform
> > > > information will look somewhat similar to VMState, which requires the QOM
> > > > tree to interpret.
> > >
> > > Ah yes, I assume you meant when multiple devices can report different thing
> > > even if with the same frontend / device type. QOM should work, or anything
> > > that can identify a device, e.g. with id / instance_id attached along with
> > > the device class.
> > >
> > > One thing that I still don't know how it works is how it interacts with new
> > > hosts being added.
> > >
> > > This idea is based on the fact that the cluster is known before starting
> > > any VM. However in reality I think it can happen when VMs started with a
> > > small cluster but then cluster extended, when the -merge-platform has been
> > > done on the smaller set.
> > >
> > > >
> > > > >
> > > > > Said that, I actually am still not clear on how / whether it should work at
> > > > > last. At least my previous concern (1) didn't have a good answer yet, on
> > > > > what we do when a profile collides with qemu cmdlines. So far I actually
> > > > > still think it more straightforward that in migration we handshake on these
> > > > > capabilities if possible.
> > > > >
> > > > > And that's why I was thinking (where I totally agree with you on this) that
> > > > > whether we should settle a short term plan first to be on the safe side
> > > > > that we start with migration always being compatible, then we figure the
> > > > > other approach. That seems easier to me, and it's also a matter of whether
> > > > > we want to do something for 9.1, or leaving that for 9.2 for USO*.
> > > >
> > > > I suggest disabling all offload features of virtio-net with 9.2.
> > > >
> > > > I want to keep things consistent so I want to disable all at once. This
> > > > change will be very uncomfortable for us, who are implementing offload
> > > > features, but I hope it will motivate us to implement a proper solution.
> > > >
> > > > That said, it will be surely a breaking change so we should wait for 9.1
> > > > before making such a change.
> > >
> > > Personally I don't worry too much on other offload bits besides USO* so far
> > > if we have them ON for longer time. My wish was that they're old good
> > > kernel features mostly supported everywhere who runs QEMU, then we're good.
> >
> > Unfortunately, we cannot expect everyone runs Linux, and the offload
> > features are provided by Linux. However, QEMU can run on other platforms,
> > and offload features may be provided by vhost-user or vhost-vdpa.
>
> I see. I am not familiar with the status quo there, so I'll leave that to
> you and other experts that know better on this..
>
> Personally I do care more on Linux, as that's what we ship within RH..
>
> >
> > >
> > > And I definitely worry about future offload features, or any feature that
> > > may probe host like this and auto-OFF: I hope we can do them on the safe
> > > side starting from day1.
> > >
> > > So I don't know whether we should do that to USO* only or all. But I agree
> > > with you that'll definitely be cleaner.
> > >
> > > On the details of how to turn them off properly.. Taking an example if we
> > > want to turn off all the offload features by default (or simply we replace
> > > that with USO-only)..
> > >
> > > Upstream machine type is flexible to all kinds of kernels, so we may not
> > > want to regress anyone using an existing machine type even on perf,
> > > especially if we want to turn off all.
> > >
> > > In that case we may need one more knob (I'm assuming this is virtio-net
> > > specific issue, but maybe not; using it as an example) to make sure the old
> > > machine types perfs as well, with:
> > >
> > > - x-virtio-net-offload-enforce
> > >
> > > When set, the offload features with value ON are enforced, so when
> > > the host doesn't support an offload feature it will fail to boot,
> > > showing the error that specific offload feature is not supported by the
> > > virtio backend.
> > >
> > > When clear, the offload features with value ON are not enforced, so
> > > these features can be automatically turned OFF when it's detected the
> > > backend doesn't support them. This may bring best perf but has the
> > > risk of breaking migration.
> >
> > "[PATCH v3 0/5] virtio-net: Convert feature properties to OnOffAuto" adds
> > "x-force-features-auto" compatibility property to virtio-net for this
> > purpose:
> >
> > https://lore.kernel.org/r/20240714-auto-v3-0-e27401aabab3@daynix.com
>
> Ah ok. But note that there's still a slight difference: we need to avoid
> AUTO being an option, at all, IMHO.
>
> It's about making qemu cmdline the ABI: when with AUTO it's still possible
> the user uses AUTO on both sides, then ABI may not be guaranteed.
>
> AUTO would be fine if: (1) the property doesn't affect guest ABI, or (2)
> the AUTO bit will always generate the same thing on both hosts. However
> USO* isn't such case.. so the AUTO option is IMHO not wanted.
>
> What I mentioned above "x-virtio-net-offload-enforce" shouldn't add
> anything new to "uso"; it still can only be ON/OFF. However it should
> affect "flip that to OFF automatically" or "fail the boot" behavior on
> missing features.
My rationale for the OnOffAuto change is that "flipping ON to OFF
automatically" is more confusing than making users specify AUTO to allow
QEMU to turn the feature OFF. "ON" will always make the boot fail when the
feature is unavailable.

The ABI guarantee will be gone anyway if x-virtio-net-offload-enforce=off.
AUTO is no different in that sense.
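
To make the comparison concrete, here is a rough Python sketch of the two
behaviours being discussed. It is not QEMU code: the function names, the
exception, and the boolean "host_supports" probe are all made up for
illustration, and only the ON/OFF/AUTO values and the "enforce" knob come
from this thread.

```python
from enum import Enum


class OnOffAuto(Enum):
    """Illustrative stand-in for a tri-state property value."""
    ON = "on"
    OFF = "off"
    AUTO = "auto"


class FeatureUnsupported(Exception):
    """Hypothetical error: a required offload feature is missing."""


def resolve_onoffauto(requested: OnOffAuto, host_supports: bool) -> bool:
    """Tri-state model: ON is strict, AUTO follows the host."""
    if requested is OnOffAuto.OFF:
        return False
    if host_supports:
        return True
    if requested is OnOffAuto.ON:
        raise FeatureUnsupported("requested ON but backend lacks the feature")
    return False  # AUTO: fall back to OFF


def resolve_enforce(requested_on: bool, host_supports: bool,
                    enforce: bool) -> bool:
    """ON/OFF model with an enforce knob: enforce=False lets ON degrade."""
    if not requested_on:
        return False
    if host_supports:
        return True
    if enforce:
        raise FeatureUnsupported("requested ON but backend lacks the feature")
    return False  # not enforced: ON is flipped to OFF


# Same command line on two hosts; only the source supports the feature.
src_auto = resolve_onoffauto(OnOffAuto.AUTO, host_supports=True)
dst_auto = resolve_onoffauto(OnOffAuto.AUTO, host_supports=False)
src_lax = resolve_enforce(True, host_supports=True, enforce=False)
dst_lax = resolve_enforce(True, host_supports=False, enforce=False)

print(src_auto, dst_auto)  # True False
print(src_lax, dst_lax)    # True False
```

In this toy model both AUTO and enforce=off give different feature sets on
the two hosts, while ON (or enforce=on) fails early on the host that lacks
the feature.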
Hmm yes; I wish we could have device properties that the user can never
specify, but that are only set from internals. It's just that applying a
compat property so far requires a generic device property. Or say, it'd be
nice if a compat property could tweak a class variable too, so there is no
property to introduce.

We could even add a migration blocker for x-virtio-net-offload-enforce=ON,
but again it could be too aggressive. I think it might be better to bet that
nobody will even know the parameter exists, so it won't be used in manual
setup. OTOH, with "guest_uso4" it is too easy to find that there's the AUTO
option: I normally use ",guest_uso4=?" to just dump the possible values.
Thanks,
--
Peter Xu