On Tue, Jul 30, 2024 at 10:57 AM Akihiko Odaki <akihiko.odaki(a)daynix.com> wrote:
On 2024/07/30 11:04, Jason Wang wrote:
> On Tue, Jul 30, 2024 at 12:43 AM Akihiko Odaki <akihiko.odaki(a)daynix.com> wrote:
>>
>> On 2024/07/29 23:29, Peter Xu wrote:
>>> On Mon, Jul 29, 2024 at 01:45:12PM +0900, Akihiko Odaki wrote:
>>>> On 2024/07/29 12:50, Jason Wang wrote:
>>>>> On Sun, Jul 28, 2024 at 11:19 PM Akihiko Odaki <akihiko.odaki(a)daynix.com> wrote:
>>>>>>
>>>>>> On 2024/07/27 5:47, Peter Xu wrote:
>>>>>>> On Fri, Jul 26, 2024 at 04:17:12PM +0100, Daniel P. Berrangé wrote:
>>>>>>>> On Fri, Jul 26, 2024 at 10:43:42AM -0400, Peter Xu wrote:
>>>>>>>>> On Fri, Jul 26, 2024 at 09:48:02AM +0100, Daniel P. Berrangé wrote:
>>>>>>>>>> On Fri, Jul 26, 2024 at 09:03:24AM +0200, Thomas Huth wrote:
>>>>>>>>>>> On 26/07/2024 08.08, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Thu, Jul 25, 2024 at 06:18:20PM -0400, Peter Xu wrote:
>>>>>>>>>>>>> On Tue, Aug 01, 2023 at 01:31:48AM +0300, Yuri Benditovich wrote:
>>>>>>>>>>>>>> USO features of virtio-net device depend on kernel ability
>>>>>>>>>>>>>> to support them, for backward compatibility by default the
>>>>>>>>>>>>>> features are disabled on 8.0 and earlier.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Yuri Benditovich <yuri.benditovich(a)daynix.com>
>>>>>>>>>>>>>> Signed-off-by: Andrew Melnychecnko <andrew(a)daynix.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looks like this patch broke migration when the VM starts on a host that has
>>>>>>>>>>>>> USO supported, to another host that doesn't..
>>>>>>>>>>>>
>>>>>>>>>>>> This was always the case with all offloads. The answer at the moment is,
>>>>>>>>>>>> don't do this.
>>>>>>>>>>>
>>>>>>>>>>> May I ask for my understanding:
>>>>>>>>>>> "don't do this" = don't automatically enable/disable virtio features in QEMU
>>>>>>>>>>> depending on host kernel features, or "don't do this" = don't try to migrate
>>>>>>>>>>> between machines that have different host kernel features?
>>>>>>>>>>>
>>>>>>>>>>>> Long term, we need to start exposing management APIs
>>>>>>>>>>>> to discover this, and management has to disable unsupported features.
>>>>>>>>>>>
>>>>>>>>>>> Ack, this likely needs some treatments from the libvirt side, too.
>>>>>>>>>>
>>>>>>>>>> When QEMU automatically toggles machine type features based on host
>>>>>>>>>> kernel, relying on libvirt to then disable them again is impractical,
>>>>>>>>>> as we cannot assume that the libvirt people are using knows about
>>>>>>>>>> newly introduced features. Even if libvirt is updated to know about
>>>>>>>>>> it, people can easily be using a previous libvirt release.
>>>>>>>>>>
>>>>>>>>>> QEMU itself needs to make the machine types do what they are there
>>>>>>>>>> to do, which is to define a stable machine ABI.
>>>>>>>>>>
>>>>>>>>>> What QEMU is missing here is a "platform ABI" concept, to encode
>>>>>>>>>> sets of features which are tied to specific platform generations.
>>>>>>>>>> As long as we don't have that we'll keep having these broken
>>>>>>>>>> migration problems from machine types dynamically changing instead
>>>>>>>>>> of providing a stable guest ABI.
>>>>>>>>>
>>>>>>>>> Any more elaboration on this idea? Would it be easily feasible in
>>>>>>>>> implementation?
>>>>>>>>
>>>>>>>> In terms of launching QEMU I'd imagine:
>>>>>>>>
>>>>>>>> $QEMU -machine pc-q35-9.1 -platform linux-6.9 ...args...
>>>>>>>>
>>>>>>>> Any virtual machine HW features which are tied to host kernel features
>>>>>>>> would have their defaults set based on the requested -platform. The
>>>>>>>> -machine will be fully invariant wrt the host kernel.
>>>>>>>>
>>>>>>>> You would have -platform help to list available platforms, and
>>>>>>>> corresponding QMP "query-platforms" command to list what platforms
>>>>>>>> are supported on a given host OS.
>>>>>>>>
>>>>>>>> Downstream distros can provide their own platform definitions
>>>>>>>> (eg "linux-rhel-9.5") if they have kernels whose feature set
>>>>>>>> diverges from upstream due to backports.
>>>>>>>>
>>>>>>>> Mgmt apps won't need to be taught about every single little QEMU
>>>>>>>> setting whose default is derived from the kernel. Individual
>>>>>>>> defaults are opaque and controlled by the requested platform.
>>>>>>>>
>>>>>>>> Live migration has clearly defined semantics, and mgmt app can
>>>>>>>> use query-platforms to validate two hosts are compatible.
>>>>>>>>
>>>>>>>> Omitting -platform should pick the very latest platform that is
>>>>>>>> compatible with the current host (not necessarily the latest
>>>>>>>> platform built-in to QEMU).
>>>>>>>
>>>>>>> This seems to add one more layer to maintain, and so far I don't know
>>>>>>> whether it's a must.
>>>>>>>
>>>>>>> To put it simply, can we simply rely on qemu cmdline as "the guest ABI"? I
>>>>>>> thought it was mostly the case already, except some extremely rare
>>>>>>> outliers.
>>>>>>>
>>>>>>> When we have one host that boots up a VM using:
>>>>>>>
>>>>>>> $QEMU1 $cmdline
>>>>>>>
>>>>>>> Then another host boots up:
>>>>>>>
>>>>>>> $QEMU2 $cmdline -incoming XXX
>>>>>>>
>>>>>>> Then migration should succeed if $cmdline is exactly the same, and the VM
>>>>>>> can boot up all fine without errors on both sides.
>>>>>>>
>>>>>>> AFAICT this has nothing to do with what kernel is underneath, even not
>>>>>>> Linux? I think either QEMU1 / QEMU2 has the option to fail. But if it
>>>>>>> didn't, I thought the ABI should be guaranteed.
>>>>>>>
>>>>>>> That's why I think this is a migration violation, as 99.99% of other device
>>>>>>> properties should be following this rule. The issue here is, we have the
>>>>>>> same virtio-net-pci cmdline on both sides in this case, but the ABI got
>>>>>>> broken.
>>>>>>>
>>>>>>> That's also why I was suggesting if the property contributes to the guest
>>>>>>> ABI, then AFAIU QEMU needs to:
>>>>>>>
>>>>>>> - Firstly, never quietly flip any bit that affects the ABI...
>>>>>>>
>>>>>>> - Have a default value of off, then QEMU will always allow the VM to boot
>>>>>>> by default, while advanced users can opt-in on new features. We can't
>>>>>>> make this ON by default otherwise some VMs can already fail to boot,
>>>>>>
>>>>>> It is not necessarily the case that old features are supported by
>>>>>> every system. In an extreme case, a user may migrate a VM from Linux to
>>>>>> Windows, which probably doesn't support any offloading at all. A more
>>>>>> convincing scenario is RSS offloading with eBPF; using eBPF requires a
>>>>>> privilege so we cannot assume it is always available even on the latest
>>>>>> version of Linux.
>>>>>
>>>>> I don't get why eBPF matters here. It is something that is not noticed
>>>>> by the guest and we have a fallback anyhow.
>>
>> It is noticeable for the guest, and the fallback is not effective with
>> vhost.
>
> It's a bug then. Qemu can fallback to tuntap if it sees issues in vhost.
We can certainly fallback to in-QEMU RSS by disabling vhost, but I would
not say lack of such fallback is a bug.
Such fallback is by design since the introduction of vhost.
We don't provide in-QEMU fallback for other offloads.
Yes but what I want to say is that eBPF RSS is different from those
segmentation offloads. And technically, Qemu can do fallback for
offloads (as RSC did).
Thanks
Regards,
Akihiko Odaki