On Tue, Jul 30, 2024 at 12:43 AM Akihiko Odaki <akihiko.odaki(a)daynix.com> wrote:
On 2024/07/29 23:29, Peter Xu wrote:
> On Mon, Jul 29, 2024 at 01:45:12PM +0900, Akihiko Odaki wrote:
>> On 2024/07/29 12:50, Jason Wang wrote:
>>> On Sun, Jul 28, 2024 at 11:19 PM Akihiko Odaki <akihiko.odaki(a)daynix.com> wrote:
>>>>
>>>> On 2024/07/27 5:47, Peter Xu wrote:
>>>>> On Fri, Jul 26, 2024 at 04:17:12PM +0100, Daniel P. Berrangé wrote:
>>>>>> On Fri, Jul 26, 2024 at 10:43:42AM -0400, Peter Xu wrote:
>>>>>>> On Fri, Jul 26, 2024 at 09:48:02AM +0100, Daniel P. Berrangé wrote:
>>>>>>>> On Fri, Jul 26, 2024 at 09:03:24AM +0200, Thomas Huth wrote:
>>>>>>>>> On 26/07/2024 08.08, Michael S. Tsirkin wrote:
>>>>>>>>>> On Thu, Jul 25, 2024 at 06:18:20PM -0400, Peter Xu wrote:
>>>>>>>>>>> On Tue, Aug 01, 2023 at 01:31:48AM +0300, Yuri Benditovich wrote:
>>>>>>>>>>>> USO features of the virtio-net device depend on the kernel's ability
>>>>>>>>>>>> to support them; for backward compatibility, the features are
>>>>>>>>>>>> disabled by default on machine types 8.0 and earlier.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Yuri Benditovich <yuri.benditovich(a)daynix.com>
>>>>>>>>>>>> Signed-off-by: Andrew Melnychecnko <andrew(a)daynix.com>
>>>>>>>>>>>
>>>>>>>>>>> Looks like this patch broke migration when the VM starts on a host that
>>>>>>>>>>> supports USO and is migrated to another host that doesn't.
>>>>>>>>>>
>>>>>>>>>> This was always the case with all offloads. The answer at the moment is,
>>>>>>>>>> don't do this.
>>>>>>>>>
>>>>>>>>> May I ask for my understanding:
>>>>>>>>> "don't do this" = don't automatically enable/disable virtio features in QEMU
>>>>>>>>> depending on host kernel features, or "don't do this" = don't try to migrate
>>>>>>>>> between machines that have different host kernel features?
>>>>>>>>>
>>>>>>>>>> Long term, we need to start exposing management APIs
>>>>>>>>>> to discover this, and management has to disable unsupported features.
>>>>>>>>>
>>>>>>>>> Ack, this likely needs some treatment from the libvirt side, too.
>>>>>>>>
>>>>>>>> When QEMU automatically toggles machine type features based on the host
>>>>>>>> kernel, relying on libvirt to then disable them again is impractical,
>>>>>>>> as we cannot assume that the libvirt version people are using knows about
>>>>>>>> newly introduced features. Even if libvirt is updated to know about
>>>>>>>> them, people can easily be using a previous libvirt release.
>>>>>>>>
>>>>>>>> QEMU itself needs to make the machine types do what they are there
>>>>>>>> to do, which is to define a stable machine ABI.
>>>>>>>>
>>>>>>>> What QEMU is missing here is a "platform ABI" concept, to encode
>>>>>>>> sets of features which are tied to specific platform generations.
>>>>>>>> As long as we don't have that, we'll keep having these broken
>>>>>>>> migration problems from machine types dynamically changing instead
>>>>>>>> of providing a stable guest ABI.
>>>>>>>
>>>>>>> Could you elaborate on this idea? Would it be easily feasible to
>>>>>>> implement?
>>>>>>
>>>>>> In terms of launching QEMU I'd imagine:
>>>>>>
>>>>>> $QEMU -machine pc-q35-9.1 -platform linux-6.9 ...args...
>>>>>>
>>>>>> Any virtual machine HW features which are tied to host kernel features
>>>>>> would have their defaults set based on the requested -platform. The
>>>>>> -machine will be fully invariant wrt the host kernel.
>>>>>>
>>>>>> You would have "-platform help" to list available platforms, and a
>>>>>> corresponding QMP "query-platforms" command to list what platforms
>>>>>> are supported on a given host OS.
>>>>>>
>>>>>> Downstream distros can provide their own platform definitions
>>>>>> (eg "linux-rhel-9.5") if they have kernels whose feature set
>>>>>> diverges from upstream due to backports.
>>>>>>
>>>>>> Mgmt apps won't need to be taught about every single little QEMU
>>>>>> setting whose default is derived from the kernel. Individual
>>>>>> defaults are opaque and controlled by the requested platform.
>>>>>>
>>>>>> Live migration has clearly defined semantics, and mgmt apps can
>>>>>> use query-platforms to validate that two hosts are compatible.
>>>>>>
>>>>>> Omitting -platform should pick the very latest platform that is
>>>>>> compatible with the current host (not necessarily the latest
>>>>>> platform built into QEMU).
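A minimal sketch of how the proposed -platform resolution could behave, written as a plain Python model rather than QEMU code. Everything here is hypothetical: the `Platform` type, the `PLATFORMS` table, `query_platforms()`, `resolve_defaults()` and the capability labels "tap-vnet-hdr" / "tun-uso" are invented for illustration, and the virtio-net property names are used only as labels. The point it shows is that the host only decides whether a platform is usable; the defaults themselves come from the platform name, so two hosts given the same -platform produce the same device defaults.

```python
"""Hypothetical model of the "-platform" idea; not a real QEMU API."""
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Platform:
    name: str                  # e.g. "linux-6.9" (hypothetical)
    host_caps: FrozenSet[str]  # host capabilities this platform assumes (made up)
    defaults: Dict[str, bool]  # device-property defaults implied by the platform


# Ordered oldest -> newest; capability labels are illustrative assumptions.
PLATFORMS: List[Platform] = [
    Platform("linux-6.1", frozenset({"tap-vnet-hdr"}),
             {"host_uso": False, "guest_uso4": False, "guest_uso6": False}),
    Platform("linux-6.9", frozenset({"tap-vnet-hdr", "tun-uso"}),
             {"host_uso": True, "guest_uso4": True, "guest_uso6": True}),
]


def query_platforms(host_caps: FrozenSet[str]) -> List[str]:
    """Analogue of the proposed "query-platforms": platforms this host can run."""
    return [p.name for p in PLATFORMS if p.host_caps <= host_caps]


def resolve_defaults(host_caps: FrozenSet[str],
                     requested: Optional[str] = None) -> Dict[str, bool]:
    """Return the property defaults for the requested platform.

    The host only answers "is this platform supported?"; it never changes
    the defaults of a platform it does support.
    """
    supported = query_platforms(host_caps)
    if requested is None:
        if not supported:
            raise RuntimeError("no platform is compatible with this host")
        # Latest *compatible* platform, not the latest one built in.
        requested = supported[-1]
    elif requested not in supported:
        raise RuntimeError(f"platform {requested!r} is not supported on this host")
    platform = next(p for p in PLATFORMS if p.name == requested)
    return dict(platform.defaults)
```

In this model, a VM started on a USO-capable host with an explicit `linux-6.1` gets exactly the defaults an older host would compute for `linux-6.1`, so a management app could compare `query_platforms()` results on source and destination before migrating. Omitting the platform still tracks the host, which is why the sketch treats that only as a convenience default.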
>>>>>
>>>>> This seems to add one more layer to maintain, and so far I don't know
>>>>> whether it's a must.
>>>>>
>>>>> To put it simply, can we just rely on the QEMU cmdline as "the guest ABI"?
>>>>> I thought it was mostly the case already, except for some extremely rare
>>>>> outliers.
>>>>>
>>>>> When we have one host that boots up a VM using:
>>>>>
>>>>> $QEMU1 $cmdline
>>>>>
>>>>> Then another host boots up:
>>>>>
>>>>> $QEMU2 $cmdline -incoming XXX
>>>>>
>>>>> Then migration should succeed if $cmdline is exactly the same, and the VM
>>>>> can boot up fine without errors on both sides.
>>>>>
>>>>> AFAICT this has nothing to do with what kernel is underneath, or even
>>>>> whether it is Linux at all? I think either QEMU1 or QEMU2 has the option
>>>>> to fail. But if it didn't, I thought the ABI should be guaranteed.
>>>>>
>>>>> That's why I think this is a migration violation, as 99.99% of other device
>>>>> properties should be following this rule. The issue here is, we have the
>>>>> same virtio-net-pci cmdline on both sides in this case, but the ABI got
>>>>> broken.
>>>>>
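A minimal sketch of the "cmdline is the guest ABI" rule, modeled in Python rather than QEMU's C device code. The property-to-capability table, the capability label "tun-uso" and the functions `realize_host_dependent()`, `realize_fixed_default()` and `migratable()` are all hypothetical. The first function models an unset property whose default follows the host; the second models the default-off behavior suggested in this thread, where the host may only refuse to start QEMU and never changes the ABI silently.

```python
"""Hypothetical model: guest ABI as a function of the command line only."""
from typing import Dict, Set

# Property -> host capability it needs (illustrative names, not QEMU's).
HOST_DEPENDENT: Dict[str, str] = {"host_uso": "tun-uso"}


def realize_host_dependent(cmdline: Dict[str, bool],
                           host_caps: Set[str]) -> Dict[str, bool]:
    """Unset properties silently follow the host, so the ABI varies by host."""
    abi = dict(cmdline)
    for prop, cap in HOST_DEPENDENT.items():
        abi.setdefault(prop, cap in host_caps)
    return abi


def realize_fixed_default(cmdline: Dict[str, bool],
                          host_caps: Set[str]) -> Dict[str, bool]:
    """Unset properties default to off; an explicit "on" fails on an
    incapable host instead of being quietly flipped."""
    abi = dict(cmdline)
    for prop, cap in HOST_DEPENDENT.items():
        abi.setdefault(prop, False)
        if abi[prop] and cap not in host_caps:
            raise RuntimeError(f"{prop}=on requires host capability {cap!r}")
    return abi


def migratable(src_abi: Dict[str, bool], dst_abi: Dict[str, bool]) -> bool:
    """Migration is only safe if both sides expose the same guest ABI."""
    return src_abi == dst_abi
```

With an empty cmdline, `realize_host_dependent()` yields different ABIs on a USO-capable host and on one without USO, which is the breakage reported above, while `realize_fixed_default()` yields the same ABI on both. An explicit `host_uso=on` on the older host makes startup fail, which corresponds to "QEMU1 or QEMU2 has the option to fail".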
>>>>> That's also why I was suggesting that if a property contributes to the
>>>>> guest ABI, then AFAIU QEMU needs to:
>>>>>
>>>>> - Firstly, never quietly flip any bit that affects the ABI...
>>>>>
>>>>> - Have a default value of off, so QEMU will always allow the VM to boot
>>>>>   by default, while advanced users can opt in to new features. We can't
>>>>>   make this ON by default; otherwise some VMs can already fail to boot,
>>>>
>>>> It may not necessarily be the case that old features are supported by
>>>> every system. In an extreme case, a user may migrate a VM from Linux to
>>>> Windows, which probably doesn't support any offloading at all. A more
>>>> convincing scenario is RSS offloading with eBPF; using eBPF requires
>>>> privileges, so we cannot assume it is always available even on the latest
>>>> version of Linux.
>>>
>>> I don't get why eBPF matters here. It is something that is not noticed
>>> by the guest and we have a fallback anyhow.
It is noticeable to the guest, and the fallback is not effective with vhost.
It's a bug then. QEMU can fall back to tuntap if it sees issues in vhost.
Thanks