On Thu, Aug 08, 2024 at 08:43:22PM +0900, Akihiko Odaki wrote:
> On 2024/08/07 5:41, Peter Xu wrote:
>> On Mon, Aug 05, 2024 at 04:27:43PM +0900, Akihiko Odaki wrote:
>>> On 2024/08/04 22:08, Peter Xu wrote:
>>>> On Sun, Aug 04, 2024 at 03:49:45PM +0900, Akihiko Odaki wrote:
>>>>> On 2024/08/03 1:26, Peter Xu wrote:
>>>>>> On Sat, Aug 03, 2024 at 12:54:51AM +0900, Akihiko Odaki wrote:
>>>>>>>>>> I'm not sure if I read it right. Perhaps you
meant something more generic
>>>>>>>>>> than -platform but similar?
>>>>>>>>>>
>>>>>>>>>> For example, "-profile [PROFILE]" qemu
cmdline, where PROFILE can be either
>>>>>>>>>> "perf" or "compat", while by
default to "compat"?
>>>>>>>>>
>>>>>>>>> "perf" would cover 4) and
"compat" will cover 1). However neither of them
>>>>>>>>> will cover 2) because an enum is not enough to know
about all hosts. I
>>>>>>>>> presented a design that will cover 2) in:
>>>>>>>>>
https://lore.kernel.org/r/2da4ebcd-2058-49c3-a4ec-8e60536e5cbb@daynix.com
>>>>>>>>
>>>>>>>> "-merge-platform" shouldn't be a QEMU
parameter, but should be something
>>>>>>>> separate.
>>>>>>>
>>>>>>> Do you mean merging platform dumps should be done with
another command? I
>>>>>>> think we will want to know the QOM tree is in use when
implementing
>>>>>>> -merge-platform. For example, you cannot define a
"platform" when e.g., you
>>>>>>> don't know what netdev backend (e.g., user, vhost-net,
vhost-vdpa) is
>>>>>>> connected to virtio-net devices. Of course we can include
those information
>>>>>>> in dumps, but we don't do so for VMState.
>>>>>>
>>>>>> What I was thinking is the generated platform dump shouldn't
care about
>>>>>> what is used as backend: it should try to probe whatever is
specified in
>>>>>> the qemu cmdline, and it's the user's job to make sure
the exact same qemu
>>>>>> cmdline is used in other hosts to dump this information.
>>>>>>
>>>>>> IOW, the dump will only contain the information that was based on
the qemu
>>>>>> cmdline. E.g., if it doesn't include virtio device at all,
and if we only
>>>>>> support such dump for virtio, it should dump nothing.
>>>>>>
>>>>>> Then the -merge-platform will expect all dumps to look the same
too,
>>>>>> merging them with AND on each field.
>>>>>
>>>>> I think we will still need the QOM tree in that case. I think the
platform
>>>>> information will look somewhat similar to VMState, which requires the
QOM
>>>>> tree to interpret.
>>>>
>>>> Ah yes, I assume you meant when multiple devices can report different
thing
>>>> even if with the same frontend / device type. QOM should work, or
anything
>>>> that can identify a device, e.g. with id / instance_id attached along
with
>>>> the device class.
>>>>
>>>> One thing that I still don't know how it works is how it interacts
with new
>>>> hosts being added.
>>>>
>>>> This idea is based on the fact that the cluster is known before starting
>>>> any VM. However in reality I think it can happen when VMs started with
a
>>>> small cluster but then cluster extended, when the -merge-platform has
been
>>>> done on the smaller set.
>>>>
>>>>>
>>>>>>
>>>>>> Said that, I actually am still not clear on how / whether it
should work at
>>>>>> last. At least my previous concern (1) didn't has a good
answer yet, on
>>>>>> what we do when profile collisions with qemu cmdlines. So far I
actually
>>>>>> still think it more straightforward that in migration we
handshake on these
>>>>>> capabilities if possible.
>>>>>>
>>>>>> And that's why I was thinking (where I totally agree with you
on this) that
>>>>>> whether we should settle a short term plan first to be on the
safe side
>>>>>> that we start with migration always being compatible, then we
figure the
>>>>>> other approach. That seems easier to me, and it's also a
matter of whether
>>>>>> we want to do something for 9.1, or leaving that for 9.2 for
USO*.
>>>>>
>>>>> I suggest disabling all offload features of virtio-net with 9.2.
>>>>>
>>>>> I want to keep things consistent so I want to disable all at once.
This
>>>>> change will be very uncomfortable for us, who are implementing
offload
>>>>> features, but I hope it will motivate us to implement a proper
solution.
>>>>>
>>>>> That said, it will be surely a breaking change so we should wait for
9.1
>>>>> before making such a change.
>>>>
>>>> Personally I don't worry too much on other offload bits besides USO*
so far
>>>> if we have them ON for longer time. My wish was that they're old
good
>>>> kernel features mostly supported everywhere who runs QEMU, then we're
good.
>>>
>>> Unfortunately, we cannot expect everyone runs Linux, and the offload
>>> features are provided by Linux. However, QEMU can run on other platforms,
>>> and offload features may be provided by vhost-user or vhost-vdpa.
>>
>> I see. I am not familiar with the status quo there, so I'll leave that to
>> you and other experts that know better on this..
>>
>> Personally I do care more on Linux, as that's what we ship within RH..
>>
>>>
>>>>
>>>> And I definitely worry about future offload features, or any feature
that
>>>> may probe host like this and auto-OFF: I hope we can do them on the safe
>>>> side starting from day1.
>>>>
>>>> So I don't know whether we should do that to USO* only or all. But I
agree
>>>> with you that'll definitely be cleaner.
>>>>
>>>> On the details of how to turn them off properly.. Taking an example if
we
>>>> want to turn off all the offload features by default (or simply we
replace
>>>> that with USO-only)..
>>>>
>>>> Upstream machine type is flexible to all kinds of kernels, so we may not
>>>> want to regress anyone using an existing machine type even on perf,
>>>> especially if we want to turn off all.
>>>>
>>>> In that case we may need one more knob (I'm assuming this is
virtio-net
>>>> specific issue, but maybe not; using it as an example) to make sure the
old
>>>> machine types perfs as well, with:
>>>>
>>>> - x-virtio-net-offload-enforce
>>>>
>>>> When set, the offload features with value ON are enforced, so
when
>>>> the host doesn't support a offload feature it will fail to
boot,
>>>> showing the error that specific offload feature is not supported
by the
>>>> virtio backend.
>>>>
>>>> When clear, the offload features with value ON are not enforced,
so
>>>> these features can be automatically turned OFF when it's
detected the
>>>> backend doesn't support them. This may bring best perf but
has the
>>>> risk of breaking migration.
>>>
>>> "[PATCH v3 0/5] virtio-net: Convert feature properties to
OnOffAuto" adds
>>> "x-force-features-auto" compatibility property to virtio-net for
this
>>> purpose:
>>>
https://lore.kernel.org/r/20240714-auto-v3-0-e27401aabab3@daynix.com
>>
>> Ah ok. But note that there's still a slight difference: we need to avoid
>> AUTO being an option, at all, IMHO.
>>
>> It's about making qemu cmdline the ABI: when with AUTO it's still
possible
>> the user uses AUTO on both sides, then ABI may not be guaranteed.
>>
>> AUTO would be fine if: (1) the property doesn't affect guest ABI, or (2)
>> the AUTO bit will always generate the same thing on both hosts. However
>> USO* isn't such case.. so the AUTO option is IMHO not wanted.
>>
>> What I mentioned above "x-virtio-net-offload-enforce" shouldn't
add
>> anything new to "uso"; it still can only be ON/OFF. However it should
>> affect "flip that to OFF automatically" or "fail the boot"
behavior on
>> missing features.
>
> My rationale for the OnOffAuto change is that "flipping ON to OFF
> automatically" is more confusing than making users specify AUTO to allow
> QEMU making the feature OFF. "ON" will always make the boot fail.
>
> The ABI guarantee will be gone anyway if x-virtio-net-offload-enforce=off.
> AUTO is no different in that sense.
Hmm yes; I wished we can have device properties that user can never
specify, but only set from internals. It's just that applying a compat
property so far require a generic device property. Or say, it'll be nice
that compat property can tweak a class variable too then no property to
introduce.
We could even add a migration blocker for x-virtio-net-offload-enforce=ON,
but again it could be too aggresive. I think it might be better we bet
nobody will even know there's the parameter, so it won't be used in manual
setup. OTOH, "guest_uso4" can be too easiy to find there's the AUTO
option: I normally use ",guest_uso4=?" to just dump the possible values.
We can detect and reject AUTO when cross-migrate=on if desired, but I'm
not sure it's worthwhile.
Regards,
Akihiko Odaki