On Mon, Jul 13, 2020 at 14:04:25 +0200, Jiri Denemark wrote:
On Sat, Jul 11, 2020 at 13:44:19 -0400, Mark Mielke wrote:
> On Sat, Jul 11, 2020 at 6:04 AM Mark Mielke <mark.mielke(a)gmail.com> wrote:
>
> > On Fri, Jul 10, 2020 at 7:48 AM Mark Mielke <mark.mielke(a)gmail.com> wrote:
> >
> >> On Fri, Jul 10, 2020 at 7:14 AM Jiri Denemark <jdenemar(a)redhat.com>
> >> wrote:
> >>
> >>> The implementation seems to be doing exactly what the commit message
> >>> says. The migratable=off default should be used only when QEMU does not
> >>> support -cpu host,migratable=on|off, that is only when QEMU is very old.
> >>> Every non-ancient version of libvirt should have
> >>> QEMU_CAPS_CPU_MIGRATABLE set and thus this code should choose the
> >>> migratable=on default.
> >>>
> >> Is QEMU_CAPS_CPU_MIGRATABLE only from the <cpu> element? If so, doesn't this
> >> mean that it is not explicitly listed for host-passthrough, and this means
> >> the check is not properly detecting whether it is enabled or not?
> >>
> > Trying to understand what is going on more - I see "migratable" seems to
> > be ok when launching a new machine, but the failure scenario was live
> > migration from 6.4.0 to 6.5.0.
> >
> > Is this because the QEMU_CAPS_CPU_MIGRATABLE was not filled in for 6.4.0,
> > and live migration grabs the capabilities from the source, where the
> > absence of this capability makes it presume an older Qemu in the above code?
> >
>
> Sorry all - I am having trouble reproducing now. The expected use cases are
> now working.
>
> Is it possible that the "migratable" flag might have been missing on some
> of the instances, although migration worked fine, and despite having used
> Qemu 4.2 and Qemu 5.0?

When an updated libvirtd which knows about this new capability starts,
it would reprobe all QEMU capabilities (lazily, i.e., once they are
needed). However, if there is a running domain, libvirt will use cached
capabilities probed when the domain was started. I suspect migrating
such a domain could be a problem. I'll try to reproduce locally.

OK, I did not reproduce the failure, because migratable=off doesn't
enable anything more than migratable=on (likely because the L1 VM in my
nested environment does not have any non-migratable features enabled).
But I was able to reproduce the issue itself and the migration could
clearly fail if migratable=off enabled some non-migratable features. The
reproducer is actually easy and one doesn't even need to migrate to see
that libvirt did something wrong:

1. run libvirtd older than 6.5.0
2. start a domain with host-passthrough CPU (QEMU would default to
   migratable=on)
3. upgrade libvirt to 6.5.0 and restart libvirtd
4. virsh dumpxml $DOMAIN_STARTED_IN_STEP_2

Now you would see

    <cpu mode='host-passthrough' check='none' migratable='off'/>

which differs from the default used by QEMU. Migrating such a domain would
succeed anyway, because it was actually started with migratable='on'.
But when such a domain is migrated to libvirt 6.5.0, we would honor the
migratable attribute and start QEMU with -cpu host,migratable=off which
could cause failures when trying to migrate this domain again.
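
To check many domains for the symptom shown in step 4, the output of
virsh dumpxml can be inspected with a few lines of Python. This is only a
minimal sketch: the helper name cpu_migratable is made up for this example,
and it assumes the <cpu> element is either the root of the input or a direct
child of <domain>.

```python
import xml.etree.ElementTree as ET


def cpu_migratable(domain_xml):
    """Return the migratable attribute of a host-passthrough <cpu>, or None."""
    root = ET.fromstring(domain_xml)
    # Accept either a bare <cpu> snippet or a full <domain> document.
    cpu = root if root.tag == "cpu" else root.find("cpu")
    if cpu is None or cpu.get("mode") != "host-passthrough":
        return None
    # None here means the attribute is absent and QEMU's default applies.
    return cpu.get("migratable")


snippet = "<cpu mode='host-passthrough' check='none' migratable='off'/>"
print(cpu_migratable(snippet))
```

A domain started before the upgrade that reports 'off' here is a candidate
for the problem described above.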

The problem is exactly where I was afraid it could be. When libvirtd
starts, it reads the QEMU capabilities probed by older libvirt
(QEMU_CAPS_CPU_MIGRATABLE would be off) and wrongly updates the XML of
the running domain. I'll prepare a patch to fix this.

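
The interaction between stale cached capabilities and the XML update can be
illustrated with a toy model. This is not libvirt code and not necessarily
what the patch will do: the string "cpu-migratable", the function names and
the skip_running guard are all assumptions made up for illustration.

```python
# Hypothetical stand-in for libvirt's QEMU_CAPS_CPU_MIGRATABLE flag.
CPU_MIGRATABLE = "cpu-migratable"


def default_migratable(caps):
    """Pick the default: 'on' if the capability is known, else 'off'."""
    return "on" if CPU_MIGRATABLE in caps else "off"


def update_cpu(cpu, cached_caps, running, skip_running):
    """Return a copy of cpu with a missing migratable attribute filled in.

    With skip_running=True, a running domain is left untouched because its
    cached capabilities may predate the new flag.
    """
    cpu = dict(cpu)
    if cpu.get("mode") != "host-passthrough" or "migratable" in cpu:
        return cpu
    if running and skip_running:
        return cpu
    cpu["migratable"] = default_migratable(cached_caps)
    return cpu


# Capabilities cached by the older libvirt: the new flag is absent.
stale_caps = set()
cpu = {"mode": "host-passthrough", "check": "none"}

# Behaviour described above: the running domain's XML is wrongly updated.
print(update_cpu(cpu, stale_caps, running=True, skip_running=False))
# With the guard, the running domain's definition is left as it was.
print(update_cpu(cpu, stale_caps, running=True, skip_running=True))
```

In this model the wrong value comes purely from consulting capabilities that
were probed before the flag existed, which matches the reproducer above.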
Jirka