On Wed, May 18, 2022 at 14:53:50 +0200, Jiri Denemark wrote:
On Thu, May 12, 2022 at 13:50:08 +0200, Peter Krempa wrote:
> On Tue, May 10, 2022 at 17:21:11 +0200, Jiri Denemark wrote:
> > This phase marks a migration protocol as broken in a post-copy phase.
> > Libvirt is no longer actively watching the migration in this phase as
> > the migration API that started the migration failed.
> >
> > This may either happen when post-copy migration really fails (QEMU
> > enters postcopy-paused migration state) or when the migration still
> > progresses between both QEMU processes, but libvirt lost control of it
> > because the connection between libvirt daemons (in p2p migration) or a
> > daemon and client (non-p2p migration) was closed. For example, when one
> > of the daemons was restarted.
> >
> > Signed-off-by: Jiri Denemark <jdenemar(a)redhat.com>
> > ---
> > src/qemu/qemu_migration.c | 15 +++++++++++----
> > src/qemu/qemu_process.c | 16 +++++++++++++---
> > 2 files changed, 24 insertions(+), 7 deletions(-)
> >
>
> [...]
>
>
> > @@ -6327,9 +6334,9 @@ qemuMigrationProcessUnattended(virQEMUDriver *driver,
> > vm->def->name);
> >
> > if (job == VIR_ASYNC_JOB_MIGRATION_IN)
> > - phase = QEMU_MIGRATION_PHASE_FINISH3;
> > + phase = QEMU_MIGRATION_PHASE_FINISH_RESUME;
> > else
> > - phase = QEMU_MIGRATION_PHASE_CONFIRM3;
> > + phase = QEMU_MIGRATION_PHASE_CONFIRM_RESUME;
> >
> > if (qemuMigrationJobStartPhase(vm, phase) < 0)
> > return;
>
> This hunk seems to be misplaced or at least doesn't really seem to be
> related to anything this patch is claiming to do.
Thanks to changes in this patch migration is in
QEMU_MIGRATION_PHASE_POSTCOPY_FAILED phase when we get here so to make
it all work, we need to use RESUME phases >
QEMU_MIGRATION_PHASE_POSTCOPY_FAILED. Otherwise qemuMigrationCheckPhase
would complain. Perhaps this could be moved to "Add new migration phases
for post-copy recovery", but I haven't checked for sure.
>
> [..]
>
> > @@ -3751,9 +3752,18 @@ qemuProcessRecoverMigration(virQEMUDriver *driver,
> > return -1;
> >
> > if (rc > 0) {
> > + job->phase = QEMU_MIGRATION_PHASE_POSTCOPY_FAILED;
> > +
> > if (migStatus == VIR_DOMAIN_JOB_STATUS_POSTCOPY) {
> > VIR_DEBUG("Post-copy migration of domain %s still running, it
"
> > "will be handled as unattended",
vm->def->name);
> > +
> > + if (state == VIR_DOMAIN_RUNNING)
> > + reason = VIR_DOMAIN_RUNNING_POSTCOPY;
> > + else
> > + reason = VIR_DOMAIN_PAUSED_POSTCOPY;
>
> This bit also doesn't seem to be justified by what this patch is
> supposed to do.
Until now broken migration protocol in post-copy phase has been
indicated with VIR_DOMAIN_*_POSTCOPY_FAILED states, which changed in
this patch and we can use the right state here depending on the QEMU
state. That said, it could be moved into a separate patch.
Fair enough on both points.
Reviewed-by: Peter Krempa <pkrempa(a)redhat.com>