On Thu, May 12, 2022 at 13:50:08 +0200, Peter Krempa wrote:
On Tue, May 10, 2022 at 17:21:11 +0200, Jiri Denemark wrote:
> This phase marks a migration protocol as broken in a post-copy phase.
> Libvirt is no longer actively watching the migration in this phase as
> the migration API that started the migration failed.
>
> This may either happen when post-copy migration really fails (QEMU
> enters postcopy-paused migration state) or when the migration still
> progresses between both QEMU processes, but libvirt lost control of it
> because the connection between libvirt daemons (in p2p migration) or a
> daemon and client (non-p2p migration) was closed. For example, when one
> of the daemons was restarted.
>
> Signed-off-by: Jiri Denemark <jdenemar(a)redhat.com>
> ---
> src/qemu/qemu_migration.c | 15 +++++++++++----
> src/qemu/qemu_process.c | 16 +++++++++++++---
> 2 files changed, 24 insertions(+), 7 deletions(-)
>
[...]
> @@ -6327,9 +6334,9 @@ qemuMigrationProcessUnattended(virQEMUDriver *driver,
> vm->def->name);
>
> if (job == VIR_ASYNC_JOB_MIGRATION_IN)
> - phase = QEMU_MIGRATION_PHASE_FINISH3;
> + phase = QEMU_MIGRATION_PHASE_FINISH_RESUME;
> else
> - phase = QEMU_MIGRATION_PHASE_CONFIRM3;
> + phase = QEMU_MIGRATION_PHASE_CONFIRM_RESUME;
>
> if (qemuMigrationJobStartPhase(vm, phase) < 0)
> return;
This hunk seems to be misplaced or at least doesn't really seem to be
related to anything this patch is claiming to do.

Thanks to the changes in this patch, migration is in the
QEMU_MIGRATION_PHASE_POSTCOPY_FAILED phase when we get here, so to make
it all work we need to use the RESUME phases, which are greater than
QEMU_MIGRATION_PHASE_POSTCOPY_FAILED. Otherwise qemuMigrationCheckPhase
would complain. Perhaps this could be moved to "Add new migration phases
for post-copy recovery", but I haven't checked for sure.

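
To illustrate the ordering constraint, here is a minimal sketch. It is not
the actual libvirt code: the enum is shortened, all sketch* / SKETCH_* names
are made up, and it only assumes that the phase check rejects a transition to
a phase with a lower value than the current one.

```c
#include <stdio.h>

/* Hypothetical, shortened phase list. Only the relative order matters:
 * the RESUME phases are placed after POSTCOPY_FAILED. */
typedef enum {
    SKETCH_PHASE_NONE = 0,
    SKETCH_PHASE_FINISH3,
    SKETCH_PHASE_CONFIRM3,
    SKETCH_PHASE_POSTCOPY_FAILED,
    SKETCH_PHASE_FINISH_RESUME,
    SKETCH_PHASE_CONFIRM_RESUME,
} sketchPhase;

/* Returns 0 when moving from cur to next is allowed and -1 when the
 * transition would go backwards in the phase order. */
static int
sketchCheckPhase(sketchPhase cur, sketchPhase next)
{
    if (next < cur) {
        fprintf(stderr, "migration protocol going backwards: %d => %d\n",
                (int)cur, (int)next);
        return -1;
    }
    return 0;
}
```

Under these assumptions, going from POSTCOPY_FAILED to FINISH3 or CONFIRM3
is rejected, while going to FINISH_RESUME or CONFIRM_RESUME passes the check.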
[..]
> @@ -3751,9 +3752,18 @@ qemuProcessRecoverMigration(virQEMUDriver *driver,
> return -1;
>
> if (rc > 0) {
> + job->phase = QEMU_MIGRATION_PHASE_POSTCOPY_FAILED;
> +
> if (migStatus == VIR_DOMAIN_JOB_STATUS_POSTCOPY) {
> VIR_DEBUG("Post-copy migration of domain %s still running, it "
> "will be handled as unattended", vm->def->name);
> +
> + if (state == VIR_DOMAIN_RUNNING)
> + reason = VIR_DOMAIN_RUNNING_POSTCOPY;
> + else
> + reason = VIR_DOMAIN_PAUSED_POSTCOPY;
This bit also doesn't seem to be justified by what this patch is
supposed to do.

Until now, a broken migration protocol in the post-copy phase has been
indicated with the VIR_DOMAIN_*_POSTCOPY_FAILED states. This patch changes
that, so we can use the right state here depending on the QEMU state.
That said, it could be moved into a separate patch.

Jirka