On Tue, Jul 30, 2024 at 10:57 AM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
On 2024/07/30 11:04, Jason Wang wrote:
On Tue, Jul 30, 2024 at 12:43 AM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
On 2024/07/29 23:29, Peter Xu wrote:
On Mon, Jul 29, 2024 at 01:45:12PM +0900, Akihiko Odaki wrote:
On 2024/07/29 12:50, Jason Wang wrote:
On Sun, Jul 28, 2024 at 11:19 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>
> On 2024/07/27 5:47, Peter Xu wrote:
>> On Fri, Jul 26, 2024 at 04:17:12PM +0100, Daniel P. Berrangé wrote:
>>> On Fri, Jul 26, 2024 at 10:43:42AM -0400, Peter Xu wrote:
>>>> On Fri, Jul 26, 2024 at 09:48:02AM +0100, Daniel P. Berrangé wrote:
>>>>> On Fri, Jul 26, 2024 at 09:03:24AM +0200, Thomas Huth wrote:
>>>>>> On 26/07/2024 08.08, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Jul 25, 2024 at 06:18:20PM -0400, Peter Xu wrote:
>>>>>>>> On Tue, Aug 01, 2023 at 01:31:48AM +0300, Yuri Benditovich wrote:
>>>>>>>>> USO features of the virtio-net device depend on the kernel's ability
>>>>>>>>> to support them; for backward compatibility the features are
>>>>>>>>> disabled by default on 8.0 and earlier.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
>>>>>>>>> Signed-off-by: Andrew Melnychecnko <andrew@daynix.com>
>>>>>>>>
>>>>>>>> Looks like this patch broke migration when the VM starts on a host that
>>>>>>>> has USO supported, to another host that doesn't..
>>>>>>>
>>>>>>> This was always the case with all offloads. The answer at the moment is,
>>>>>>> don't do this.
>>>>>>
>>>>>> May I ask for my understanding:
>>>>>> "don't do this" = don't automatically enable/disable virtio features in
>>>>>> QEMU depending on host kernel features, or "don't do this" = don't try
>>>>>> to migrate between machines that have different host kernel features?
>>>>>>
>>>>>>> Long term, we need to start exposing management APIs
>>>>>>> to discover this, and management has to disable unsupported features.
>>>>>>
>>>>>> Ack, this likely needs some treatment from the libvirt side, too.
>>>>>
>>>>> When QEMU automatically toggles machine type features based on the host
>>>>> kernel, relying on libvirt to then disable them again is impractical,
>>>>> as we cannot assume that the libvirt version people are using knows about
>>>>> newly introduced features.
>>>>> Even if libvirt is updated to know about
>>>>> it, people can easily be using a previous libvirt release.
>>>>>
>>>>> QEMU itself needs to make the machine types do what they are there
>>>>> to do, which is to define a stable machine ABI.
>>>>>
>>>>> What QEMU is missing here is a "platform ABI" concept, to encode
>>>>> sets of features which are tied to specific platform generations.
>>>>> As long as we don't have that we'll keep having these broken
>>>>> migration problems from machine types dynamically changing instead
>>>>> of providing a stable guest ABI.
>>>>
>>>> Any more elaboration on this idea? Would it be easily feasible in
>>>> implementation?
>>>
>>> In terms of launching QEMU I'd imagine:
>>>
>>>   $QEMU -machine pc-q35-9.1 -platform linux-6.9 ...args...
>>>
>>> Any virtual machine HW features which are tied to host kernel features
>>> would have their defaults set based on the requested -platform. The
>>> -machine will be fully invariant wrt the host kernel.
>>>
>>> You would have -platform help to list available platforms, and a
>>> corresponding QMP "query-platforms" command to list what platforms
>>> are supported on a given host OS.
>>>
>>> Downstream distros can provide their own platform definitions
>>> (eg "linux-rhel-9.5") if they have kernels whose feature set
>>> diverges from upstream due to backports.
>>>
>>> Mgmt apps won't need to be taught about every single little QEMU
>>> setting whose default is derived from the kernel. Individual
>>> defaults are opaque and controlled by the requested platform.
>>>
>>> Live migration has clearly defined semantics, and a mgmt app can
>>> use query-platforms to validate that two hosts are compatible.
>>>
>>> Omitting -platform should pick the very latest platform that is
>>> compatible with the current host (not necessarily the latest
>>> platform built into QEMU).
>>
>> This seems to add one more layer to maintain, and so far I don't know
>> whether it's a must.
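To make the proposal above concrete, here is a minimal sketch of what a mgmt app might do with the proposed "query-platforms" command. Note that neither -platform nor query-platforms exists in QEMU today; the function, platform names, and data shape below are all assumptions based on the idea as described, not a real API.

```python
# Hypothetical sketch only: "query-platforms" is a proposal, not a real
# QMP command. We assume it would return the platforms supported on a
# host, newest first, e.g. ["linux-6.9", "linux-6.8", "linux-rhel-9.5"].

def common_platforms(src_platforms, dst_platforms):
    """Return platforms supported on both hosts, preserving the
    newest-first ordering of the source host's list."""
    dst = set(dst_platforms)
    return [p for p in src_platforms if p in dst]

# Before migrating, a mgmt app would query both hosts and start the
# destination with the newest common platform (assumed data below).
src = ["linux-6.9", "linux-6.8", "linux-6.6"]   # assumed source host
dst = ["linux-6.8", "linux-6.6"]                # assumed destination host
candidates = common_platforms(src, dst)
print(candidates[0] if candidates else "no compatible platform")
```

The point of the design, as described above, is that the mgmt app never inspects individual kernel-derived defaults; it only compares opaque platform names.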
>>
>> To put it simply, can we simply rely on the qemu cmdline as "the guest
>> ABI"? I thought it was mostly the case already, except some extremely
>> rare outliers.
>>
>> When we have one host that boots up a VM using:
>>
>>   $QEMU1 $cmdline
>>
>> Then another host boots up:
>>
>>   $QEMU2 $cmdline -incoming XXX
>>
>> Then migration should succeed if $cmdline is exactly the same, and the VM
>> can boot up all fine without errors on both sides.
>>
>> AFAICT this has nothing to do with what kernel is underneath, even not
>> Linux? I think either QEMU1 / QEMU2 has the option to fail. But if it
>> didn't, I thought the ABI should be guaranteed.
>>
>> That's why I think this is a migration violation, as 99.99% of other
>> device properties should be following this rule. The issue here is, we
>> have the same virtio-net-pci cmdline on both sides in this case, but the
>> ABI got broken.
>>
>> That's also why I was suggesting that if the property contributes to the
>> guest ABI, then AFAIU QEMU needs to:
>>
>> - Firstly, never quietly flip any bit that affects the ABI...
>>
>> - Have a default value of off, then QEMU will always allow the VM to boot
>>   by default, while advanced users can opt-in on new features. We can't
>>   make this ON by default otherwise some VMs can already fail to boot.
>
> It may not necessarily be the case that old features are supported by
> every system. In an extreme case, a user may migrate a VM from Linux to
> Windows, which probably doesn't support any offloading at all. A more
> convincing scenario is RSS offloading with eBPF; using eBPF requires a
> privilege, so we cannot assume it is always available even on the latest
> version of Linux.
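Until something like -platform exists, the practical workaround implied by the discussion is for the mgmt layer to pin host-dependent properties explicitly so the cmdline, not the host kernel, fixes the guest ABI. The sketch below uses the USO property names from the patch under discussion (guest_uso4, guest_uso6, host_uso); treat it as an illustration, not an exhaustive list of host-dependent properties.

```python
# Sketch: build virtio-net-pci device args with the USO-related
# properties pinned explicitly, so defaults cannot silently follow the
# host kernel. Property names are taken from the USO patch discussed
# above; other host-dependent properties would need the same treatment.

def pin_offloads(netdev_id, enable_uso=False):
    """Return -device args with USO properties set explicitly."""
    val = "on" if enable_uso else "off"
    props = ",".join(f"{name}={val}"
                     for name in ("guest_uso4", "guest_uso6", "host_uso"))
    return ["-device", f"virtio-net-pci,netdev={netdev_id},{props}"]

# Both $QEMU1 and $QEMU2 would then be launched with identical,
# fully-specified args, restoring the "same cmdline => same ABI" rule.
print(" ".join(pin_offloads("net0")))
```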
I don't get why eBPF matters here. It is something that is not noticed by the guest and we have a fallback anyhow.
It is noticeable to the guest, and the fallback is not effective with vhost.
It's a bug then. QEMU can fall back to tuntap if it sees issues in vhost.
We can certainly fall back to in-QEMU RSS by disabling vhost, but I would not say the lack of such a fallback is a bug.
Such fallback is by design since the introduction of vhost.
We don't provide in-QEMU fallback for other offloads.
Yes, but what I want to say is that eBPF RSS is different from those segmentation offloads. And technically, QEMU can do fallback for offloads (as RSC did).

Thanks
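The fallback being debated above can be sketched as a small decision function: if RSS is requested but the eBPF steering program cannot be used (e.g. for lack of privilege), software RSS only works when packets pass through QEMU, so vhost would have to be dropped. The helper name and return shape are hypothetical; QEMU's real logic lives in its virtio-net and ebpf code, and (as noted above) it does not currently disable vhost automatically.

```python
# Hypothetical sketch of the RSS backend fallback policy discussed in
# this thread. Not QEMU's actual behavior: today the eBPF fallback is
# in-QEMU software RSS, which is not effective while vhost is enabled.

def choose_backend(rss_requested, ebpf_available, vhost_requested):
    """Return (use_vhost, rss_mode) under the sketched policy."""
    if not rss_requested:
        return (vhost_requested, "none")
    if ebpf_available:
        # Steering happens in the kernel; vhost can stay as requested.
        return (vhost_requested, "ebpf")
    # eBPF unavailable (e.g. unprivileged): fall back to software RSS,
    # which requires packets to traverse QEMU, hence no vhost.
    return (False, "software")

print(choose_backend(True, False, True))
```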
Regards,
Akihiko Odaki