On 2024/07/30 11:04, Jason Wang wrote:
On Tue, Jul 30, 2024 at 12:43 AM Akihiko Odaki
<akihiko.odaki(a)daynix.com> wrote:
>
> On 2024/07/29 23:29, Peter Xu wrote:
>> On Mon, Jul 29, 2024 at 01:45:12PM +0900, Akihiko Odaki wrote:
>>> On 2024/07/29 12:50, Jason Wang wrote:
>>>> On Sun, Jul 28, 2024 at 11:19 PM Akihiko Odaki <akihiko.odaki(a)daynix.com> wrote:
>>>>>
>>>>> On 2024/07/27 5:47, Peter Xu wrote:
>>>>>> On Fri, Jul 26, 2024 at 04:17:12PM +0100, Daniel P. Berrangé wrote:
>>>>>>> On Fri, Jul 26, 2024 at 10:43:42AM -0400, Peter Xu wrote:
>>>>>>>> On Fri, Jul 26, 2024 at 09:48:02AM +0100, Daniel P. Berrangé wrote:
>>>>>>>>> On Fri, Jul 26, 2024 at 09:03:24AM +0200, Thomas Huth wrote:
>>>>>>>>>> On 26/07/2024 08.08, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Thu, Jul 25, 2024 at 06:18:20PM -0400, Peter Xu wrote:
>>>>>>>>>>>> On Tue, Aug 01, 2023 at 01:31:48AM +0300, Yuri Benditovich wrote:
>>>>>>>>>>>>> USO features of virtio-net device depend on kernel ability
>>>>>>>>>>>>> to support them, for backward compatibility by default the
>>>>>>>>>>>>> features are disabled on 8.0 and earlier.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Yuri Benditovich <yuri.benditovich(a)daynix.com>
>>>>>>>>>>>>> Signed-off-by: Andrew Melnychecnko <andrew(a)daynix.com>
>>>>>>>>>>>>
>>>>>>>>>>>> Looks like this patch broke migration when the VM starts on a host that has
>>>>>>>>>>>> USO supported, to another host that doesn't..
>>>>>>>>>>>
>>>>>>>>>>> This was always the case with all offloads. The answer at the moment is,
>>>>>>>>>>> don't do this.
>>>>>>>>>>
>>>>>>>>>> May I ask for my understanding:
>>>>>>>>>> "don't do this" = don't automatically enable/disable virtio features in QEMU
>>>>>>>>>> depending on host kernel features, or "don't do this" = don't try to migrate
>>>>>>>>>> between machines that have different host kernel features?
>>>>>>>>>>
>>>>>>>>>>> Long term, we need to start exposing management APIs
>>>>>>>>>>> to discover this, and management has to disable unsupported features.
>>>>>>>>>>
>>>>>>>>>> Ack, this likely needs some treatments from the libvirt side, too.
>>>>>>>>>
>>>>>>>>> When QEMU automatically toggles machine type features based on host
>>>>>>>>> kernel, relying on libvirt to then disable them again is impractical,
>>>>>>>>> as we cannot assume that the libvirt people are using knows about
>>>>>>>>> newly introduced features. Even if libvirt is updated to know about
>>>>>>>>> it, people can easily be using a previous libvirt release.
>>>>>>>>>
>>>>>>>>> QEMU itself needs to make the machine types do what they are there
>>>>>>>>> to do, which is to define a stable machine ABI.
>>>>>>>>>
>>>>>>>>> What QEMU is missing here is a "platform ABI" concept, to encode
>>>>>>>>> sets of features which are tied to specific platform generations.
>>>>>>>>> As long as we don't have that we'll keep having these broken
>>>>>>>>> migration problems from machine types dynamically changing instead
>>>>>>>>> of providing a stable guest ABI.
>>>>>>>>
>>>>>>>> Any more elaboration on this idea? Would it be easily feasible in
>>>>>>>> implementation?
>>>>>>>
>>>>>>> In terms of launching QEMU I'd imagine:
>>>>>>>
>>>>>>> $QEMU -machine pc-q35-9.1 -platform linux-6.9 ...args...
>>>>>>>
>>>>>>> Any virtual machine HW features which are tied to host kernel features
>>>>>>> would have their defaults set based on the requested -platform. The
>>>>>>> -machine will be fully invariant wrt the host kernel.
>>>>>>>
>>>>>>> You would have -platform help to list available platforms, and
>>>>>>> corresponding QMP "query-platforms" command to list what platforms
>>>>>>> are supported on a given host OS.
>>>>>>>
>>>>>>> Downstream distros can provide their own platform definitions
>>>>>>> (eg "linux-rhel-9.5") if they have kernels whose feature set
>>>>>>> diverges from upstream due to backports.
>>>>>>>
>>>>>>> Mgmt apps won't need to be taught about every single little QEMU
>>>>>>> setting whose default is derived from the kernel. Individual
>>>>>>> defaults are opaque and controlled by the requested platform.
>>>>>>>
>>>>>>> Live migration has clearly defined semantics, and mgmt app can
>>>>>>> use query-platforms to validate two hosts are compatible.
>>>>>>>
>>>>>>> Omitting -platform should pick the very latest platform that is
>>>>>>> compatible with the current host (not necessarily the latest
>>>>>>> platform built-in to QEMU).
>>>>>>
>>>>>> This seems to add one more layer to maintain, and so far I don't know
>>>>>> whether it's a must.
>>>>>>
>>>>>> To put it simply, can we simply rely on qemu cmdline as "the guest ABI"? I
>>>>>> thought it was mostly the case already, except some extremely rare
>>>>>> outliers.
>>>>>>
>>>>>> When we have one host that boots up a VM using:
>>>>>>
>>>>>> $QEMU1 $cmdline
>>>>>>
>>>>>> Then another host boots up:
>>>>>>
>>>>>> $QEMU2 $cmdline -incoming XXX
>>>>>>
>>>>>> Then migration should succeed if $cmdline is exactly the same, and the VM
>>>>>> can boot up all fine without errors on both sides.
>>>>>>
>>>>>> AFAICT this has nothing to do with what kernel is underneath, even not
>>>>>> Linux? I think either QEMU1 / QEMU2 has the option to fail. But if it
>>>>>> didn't, I thought the ABI should be guaranteed.
>>>>>>
>>>>>> That's why I think this is a migration violation, as 99.99% of other device
>>>>>> properties should be following this rule. The issue here is, we have the
>>>>>> same virtio-net-pci cmdline on both sides in this case, but the ABI got
>>>>>> broken.
>>>>>>
>>>>>> That's also why I was suggesting if the property contributes to the guest
>>>>>> ABI, then AFAIU QEMU needs to:
>>>>>>
>>>>>> - Firstly, never quietly flip any bit that affects the ABI...
>>>>>>
>>>>>> - Have a default value of off, then QEMU will always allow the VM to boot
>>>>>>   by default, while advanced users can opt-in on new features. We can't
>>>>>>   make this ON by default otherwise some VMs can already fail to boot,
>>>>>
>>>>> It may not necessarily be the case that old features are supported by
>>>>> every system. In an extreme case, a user may migrate a VM from Linux to
>>>>> Windows, which probably doesn't support any offloading at all. A more
>>>>> convincing scenario is RSS offloading with eBPF; using eBPF requires a
>>>>> privilege so we cannot assume it is always available even on the latest
>>>>> version of Linux.
>>>>
>>>> I don't get why eBPF matters here. It is something that is not noticed
>>>> by the guest and we have a fallback anyhow.
>
> It is noticeable for the guest, and the fallback is not effective with
> vhost.

It's a bug then. QEMU can fall back to tuntap if it sees issues in vhost.

We can certainly fall back to in-QEMU RSS by disabling vhost, but I would
not say the lack of such a fallback is a bug. We don't provide an in-QEMU
fallback for other offloads.

Regards,
Akihiko Odaki
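
For readers following the thread, the migration problem discussed above can be modelled with a small sketch. It is only an illustration under stated assumptions: the feature names are shorthand and the helpers `guest_features` and `can_migrate` are hypothetical, not QEMU or libvirt APIs. It assumes that offloads the host cannot provide are silently dropped at startup, which is the behaviour the thread is debating.

```python
# Illustrative model only: the feature names and both helpers are
# hypothetical and do not correspond to any QEMU API.

def guest_features(requested, host_supported):
    """Features the guest sees if unsupported offloads are silently dropped."""
    return frozenset(requested) & frozenset(host_supported)


def can_migrate(source_guest_features, dest_host_supported):
    """The guest ABI survives migration only if the destination can offer
    every feature the guest was already shown on the source."""
    missing = frozenset(source_guest_features) - frozenset(dest_host_supported)
    return (not missing, sorted(missing))


# The same command line is used on both hosts; USO is requested by default.
requested = {"csum", "tso4", "uso4", "uso6"}
host_a = {"csum", "tso4", "uso4", "uso6"}  # host kernel with USO support
host_b = {"csum", "tso4"}                  # host kernel without USO support

# Start on host A, migrate to host B: the guest already saw USO.
ok, missing = can_migrate(guest_features(requested, host_a), host_b)
print(ok, missing)

# Start on host B, migrate to host A: nothing the guest saw is missing.
ok_rev, missing_rev = can_migrate(guest_features(requested, host_b), host_a)
print(ok_rev, missing_rev)
```

In this model an identical command line yields different guest-visible feature sets on the two hosts, so the A-to-B direction fails while B-to-A succeeds. Whether the fix belongs in the defaults, in a "platform" definition, or in the management layer is exactly what the thread is discussing.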