On 2024/07/30 12:03, Jason Wang wrote:
> On Tue, Jul 30, 2024 at 10:57 AM Akihiko Odaki <akihiko.odaki(a)daynix.com> wrote:
>>
>> On 2024/07/30 11:04, Jason Wang wrote:
>>> On Tue, Jul 30, 2024 at 12:43 AM Akihiko Odaki <akihiko.odaki(a)daynix.com> wrote:
>>>>
>>>> On 2024/07/29 23:29, Peter Xu wrote:
>>>>> On Mon, Jul 29, 2024 at 01:45:12PM +0900, Akihiko Odaki wrote:
>>>>>> On 2024/07/29 12:50, Jason Wang wrote:
>>>>>>> On Sun, Jul 28, 2024 at 11:19 PM Akihiko Odaki <akihiko.odaki(a)daynix.com> wrote:
>>>>>>>>
>>>>>>>> On 2024/07/27 5:47, Peter Xu wrote:
>>>>>>>>> On Fri, Jul 26, 2024 at 04:17:12PM +0100, Daniel P. Berrangé wrote:
>>>>>>>>>> On Fri, Jul 26, 2024 at 10:43:42AM -0400, Peter Xu wrote:
>>>>>>>>>>> On Fri, Jul 26, 2024 at 09:48:02AM +0100, Daniel P. Berrangé wrote:
>>>>>>>>>>>> On Fri, Jul 26, 2024 at 09:03:24AM +0200, Thomas Huth wrote:
>>>>>>>>>>>>> On 26/07/2024 08.08, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>> On Thu, Jul 25, 2024 at 06:18:20PM -0400, Peter Xu wrote:
>>>>>>>>>>>>>>> On Tue, Aug 01, 2023 at 01:31:48AM +0300, Yuri Benditovich wrote:
>>>>>>>>>>>>>>>> USO features of the virtio-net device depend on the kernel's
>>>>>>>>>>>>>>>> ability to support them; for backward compatibility, by
>>>>>>>>>>>>>>>> default the features are disabled on 8.0 and earlier.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Yuri Benditovich <yuri.benditovich(a)daynix.com>
>>>>>>>>>>>>>>>> Signed-off-by: Andrew Melnychecnko <andrew(a)daynix.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looks like this patch broke migration when the VM starts on a
>>>>>>>>>>>>>>> host that supports USO, to another host that doesn't..
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This was always the case with all offloads. The answer at the
>>>>>>>>>>>>>> moment is: don't do this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> May I ask for my understanding:
>>>>>>>>>>>>> "don't do this" = don't automatically enable/disable virtio
>>>>>>>>>>>>> features in QEMU depending on host kernel features, or "don't do
>>>>>>>>>>>>> this" = don't try to migrate between machines that have
>>>>>>>>>>>>> different host kernel features?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Long term, we need to start exposing management APIs
>>>>>>>>>>>>>> to discover this, and management has to disable unsupported
>>>>>>>>>>>>>> features.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ack, this likely needs some treatment from the libvirt side, too.
>>>>>>>>>>>>
>>>>>>>>>>>> When QEMU automatically toggles machine type features based on
>>>>>>>>>>>> the host kernel, relying on libvirt to then disable them again is
>>>>>>>>>>>> impractical, as we cannot assume that the libvirt version people
>>>>>>>>>>>> are using knows about newly introduced features. Even if libvirt
>>>>>>>>>>>> is updated to know about them, people can easily be using a
>>>>>>>>>>>> previous libvirt release.
>>>>>>>>>>>>
>>>>>>>>>>>> QEMU itself needs to make the machine types do what they are
>>>>>>>>>>>> there to do, which is to define a stable machine ABI.
>>>>>>>>>>>>
>>>>>>>>>>>> What QEMU is missing here is a "platform ABI" concept, to encode
>>>>>>>>>>>> sets of features which are tied to specific platform generations.
>>>>>>>>>>>> As long as we don't have that, we'll keep having these broken
>>>>>>>>>>>> migration problems from machine types dynamically changing
>>>>>>>>>>>> instead of providing a stable guest ABI.
>>>>>>>>>>>
>>>>>>>>>>> Any more elaboration on this idea? Would it be easily feasible to
>>>>>>>>>>> implement?
>>>>>>>>>>
>>>>>>>>>> In terms of launching QEMU I'd imagine:
>>>>>>>>>>
>>>>>>>>>>    $QEMU -machine pc-q35-9.1 -platform linux-6.9 ...args...
>>>>>>>>>>
>>>>>>>>>> Any virtual machine HW features which are tied to host kernel
>>>>>>>>>> features would have their defaults set based on the requested
>>>>>>>>>> -platform. The -machine will be fully invariant wrt the host
>>>>>>>>>> kernel.
>>>>>>>>>>
>>>>>>>>>> You would have "-platform help" to list available platforms, and a
>>>>>>>>>> corresponding QMP "query-platforms" command to list what platforms
>>>>>>>>>> are supported on a given host OS.
>>>>>>>>>>
>>>>>>>>>> Downstream distros can provide their own platform definitions
>>>>>>>>>> (eg "linux-rhel-9.5") if they have kernels whose feature set
>>>>>>>>>> diverges from upstream due to backports.
>>>>>>>>>>
>>>>>>>>>> Mgmt apps won't need to be taught about every single little QEMU
>>>>>>>>>> setting whose default is derived from the kernel. Individual
>>>>>>>>>> defaults are opaque and controlled by the requested platform.
>>>>>>>>>>
>>>>>>>>>> Live migration has clearly defined semantics, and a mgmt app can
>>>>>>>>>> use query-platforms to validate that two hosts are compatible.
>>>>>>>>>>
>>>>>>>>>> Omitting -platform should pick the very latest platform that is
>>>>>>>>>> compatible with the current host (not necessarily the latest
>>>>>>>>>> platform built into QEMU).
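[Editor's note: as a purely illustrative sketch of the proposal above — no
"query-platforms" command exists in QEMU today, and the command name, reply
fields, and platform names here are all hypothetical — the QMP exchange
might look like:]

```
-> { "execute": "query-platforms" }
<- { "return": { "platforms": [
       { "name": "linux-6.9",      "compatible": true  },
       { "name": "linux-6.5",      "compatible": true  },
       { "name": "linux-rhel-9.5", "compatible": false } ] } }
```

[A management app could then intersect the "compatible" platform lists of
the two hosts before attempting a migration.]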
>>>>>>>>>
>>>>>>>>> This seems to add one more layer to maintain, and so far I don't
>>>>>>>>> know whether it's a must.
>>>>>>>>>
>>>>>>>>> To put it simply, can we rely on the qemu cmdline as "the guest
>>>>>>>>> ABI"? I thought it was mostly the case already, except some
>>>>>>>>> extremely rare outliers.
>>>>>>>>>
>>>>>>>>> When we have one host that boots up a VM using:
>>>>>>>>>
>>>>>>>>>    $QEMU1 $cmdline
>>>>>>>>>
>>>>>>>>> Then another host boots up:
>>>>>>>>>
>>>>>>>>>    $QEMU2 $cmdline -incoming XXX
>>>>>>>>>
>>>>>>>>> Then migration should succeed if $cmdline is exactly the same, and
>>>>>>>>> the VM can boot up all fine without errors on both sides.
>>>>>>>>>
>>>>>>>>> AFAICT this has nothing to do with what kernel is underneath, even
>>>>>>>>> not Linux? I think either QEMU1 / QEMU2 has the option to fail. But
>>>>>>>>> if it didn't, I thought the ABI should be guaranteed.
>>>>>>>>>
>>>>>>>>> That's why I think this is a migration violation, as 99.99% of other
>>>>>>>>> device properties should be following this rule. The issue here is,
>>>>>>>>> we have the same virtio-net-pci cmdline on both sides in this case,
>>>>>>>>> but the ABI got broken.
>>>>>>>>>
>>>>>>>>> That's also why I was suggesting that if the property contributes to
>>>>>>>>> the guest ABI, then AFAIU QEMU needs to:
>>>>>>>>>
>>>>>>>>>    - Firstly, never quietly flip any bit that affects the ABI...
>>>>>>>>>
>>>>>>>>>    - Have a default value of off, then QEMU will always allow the VM
>>>>>>>>>      to boot by default, while advanced users can opt in to new
>>>>>>>>>      features. We can't make this ON by default otherwise some VMs
>>>>>>>>>      can already fail to boot,
>>>>>>>>
>>>>>>>> It may not necessarily be the case that old features are supported by
>>>>>>>> every system. In an extreme case, a user may migrate a VM from Linux
>>>>>>>> to Windows, which probably doesn't support any offloading at all. A
>>>>>>>> more convincing scenario is RSS offloading with eBPF; using eBPF
>>>>>>>> requires a privilege, so we cannot assume it is always available even
>>>>>>>> on the latest version of Linux.
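[Editor's note: with today's QEMU, the way to keep the guest ABI
host-invariant in this situation is to pin the offload properties
explicitly rather than leaving them at host-dependent defaults. A sketch —
the netdev details are illustrative, but guest_uso4/guest_uso6/host_uso are
the virtio-net properties the patch under discussion introduced:]

```sh
# Pin USO off on both source and destination so migratability does not
# depend on whether either host kernel supports USO; the tap netdev
# details here are illustrative.
$QEMU -machine pc-q35-9.1 \
      -netdev tap,id=net0 \
      -device virtio-net-pci,netdev=net0,guest_uso4=off,guest_uso6=off,host_uso=off
```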
>>>>>>>
>>>>>>> I don't get why eBPF matters here. It is something that is not
>>>>>>> noticed by the guest, and we have a fallback anyhow.
>>>>
>>>> It is noticeable for the guest, and the fallback is not effective with
>>>> vhost.
>>>
>>> It's a bug then. Qemu can fall back to tuntap if it sees issues in
>>> vhost.
>>
>> We can certainly fall back to in-QEMU RSS by disabling vhost, but I
>> would not say lack of such fallback is a bug.
>
> Such fallback is by design since the introduction of vhost.
>
>> We don't provide in-QEMU fallback for other offloads.
>
> Yes, but what I want to say is that eBPF RSS is different from those
> segmentation offloads. And technically, Qemu can do fallback for
> offloads (as RSC did).

Well, I couldn't find any code disabling vhost for the in-QEMU RSC
implementation.

Looking at the code, I also found the case of vhost-vdpa. vhost can be
simply disabled if it is backed by tuntap, but that is not the case for
vDPA.
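[Editor's note: a sketch of the tap-backed case discussed above, with
illustrative netdev options. vhost=off on a tap netdev keeps the virtio-net
datapath inside QEMU, where software fallbacks such as the in-QEMU RSS
implementation can apply; with vhost=on, or with a vhost-vdpa netdev, there
is no equivalent in-QEMU path to fall back to:]

```sh
# tap backend with vhost disabled: packets traverse the in-QEMU
# virtio-net datapath, so the software RSS/hash implementation is usable.
$QEMU -netdev tap,id=net0,vhost=off \
      -device virtio-net-pci,netdev=net0,rss=on,hash=on
```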
Regards,
Akihiko Odaki