
On 07 Oct 2014, at 23:02 , Jiri Denemark <jdenemar@redhat.com> wrote:
On Tue, Sep 30, 2014 at 16:39:27 +0200, Cristian Klein wrote:
Perform phase stops once migration switched to post-copy. Confirm phase waits for post-copy to finish before killing the VM.
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- src/qemu/qemu_driver.c | 8 ++++++++ src/qemu/qemu_migration.c | 25 ++++++++++++++++++++++--- 2 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e873d45..3fe2216 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -10942,6 +10942,14 @@ qemuDomainMigratePrepare2(virConnectPtr dconn,
virCheckFlags(QEMU_MIGRATION_FLAGS, -1);
+ if (flags & VIR_MIGRATE_POSTCOPY) { + /* post-copy migration does not work with Sequence v2 */ + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Post-copy migration requested but not " + "supported by v2 protocol")); + goto cleanup; + } +
This code should be unreachable. If both source and destination daemons support VIR_MIGRATE_POSTCOPY, they support v3 protocol as well. And a client new enough to specify VIR_MIGRATE_POSTCOPY will also support v3 migration protocol. However, it doesn't hurt to check this to be safe.
if (flags & VIR_MIGRATE_TUNNELLED) { /* this is a logical error; we never should have gotten here with * VIR_MIGRATE_TUNNELLED set diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 4a36946..436b701 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2039,6 +2039,11 @@ qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver, ret = 0; break;
+ case QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_ACTIVE: + jobInfo->type = VIR_DOMAIN_JOB_PHASE1_COMPLETED; + ret = 0; + break; +
This will need to be dropped after 5/8 is removed. However, it reminds me...
enum { QEMU_MONITOR_MIGRATION_STATUS_INACTIVE, QEMU_MONITOR_MIGRATION_STATUS_ACTIVE, QEMU_MONITOR_MIGRATION_STATUS_COMPLETED, QEMU_MONITOR_MIGRATION_STATUS_ERROR, QEMU_MONITOR_MIGRATION_STATUS_CANCELLED, QEMU_MONITOR_MIGRATION_STATUS_SETUP, QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_ACTIVE,
QEMU_MONITOR_MIGRATION_STATUS_LAST };
in qemu_monitor.h needs to be turned into a proper typedef so that
switch (status.status) {
line in qemuMigrationUpdateJobStatus may be changed to explicitly mention the enum so that the compiler may report a warning whenever we add new status but forgot to handle it in this switch.
What about `QEMU_MONITOR_MIGRATION_STATUS_LAST`? Should this be included in a switch with an assertion that that code should never be reached?
Which means the new state will need to be handled in the same patch it was introduced, i.e, in 3/8.
case QEMU_MONITOR_MIGRATION_STATUS_INACTIVE: jobInfo->type = VIR_DOMAIN_JOB_NONE; virReportError(VIR_ERR_OPERATION_FAILED, @@ -2077,6 +2082,7 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, qemuDomainJobInfoPtr jobInfo = priv->job.current; const char *job; int pauseReason; + bool inPhase2 = (jobInfo->type == VIR_DOMAIN_JOB_PHASE1_COMPLETED);
I think it would be cleaner to pass this info in a new parameter for qemuMigrationWaitForCompletion instead of doing a hidden black magic here.
I agree with you, I wasn’t sure about with this way of coding things either. Is it okey to ask developers to always create a new intermediate parameter to call this function, so as to make its usage easier to read? E.g., “”” bool return_on_postcopy_active = false; rv = qemuMigrationWaitForCompletion(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT, conn, abort_on_error, return_on_postcopy_active); “”"
switch (priv->job.asyncJob) { case QEMU_ASYNC_JOB_MIGRATION_OUT: @@ -2092,9 +2098,11 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, job = _("job"); }
- jobInfo->type = VIR_DOMAIN_JOB_UNBOUNDED; + if (!inPhase2) + jobInfo->type = VIR_DOMAIN_JOB_UNBOUNDED;
- while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) { + while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED || + (inPhase2 && jobInfo->type == VIR_DOMAIN_JOB_PHASE1_COMPLETED)) {
Just use jobInfo->status.status directly.
/* Poll every 50ms for progress & to allow cancellation */ struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull };
@@ -2123,7 +2131,8 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, virObjectLock(vm); }
- if (jobInfo->type == VIR_DOMAIN_JOB_COMPLETED) { + if (jobInfo->type == VIR_DOMAIN_JOB_COMPLETED || + jobInfo->type == VIR_DOMAIN_JOB_PHASE1_COMPLETED) {
This shouldn't be needed when 5/8 is dropped.
qemuDomainJobInfoUpdateDowntime(jobInfo); VIR_FREE(priv->job.completed); if (VIR_ALLOC(priv->job.completed) == 0) @@ -3149,6 +3158,16 @@ qemuMigrationConfirmPhase(virQEMUDriverPtr driver,
virCheckFlags(QEMU_MIGRATION_FLAGS, -1);
+ /* Wait for post-copy to complete */ + if (flags & VIR_MIGRATE_POSTCOPY) { + bool abort_on_error = !!(flags & VIR_MIGRATE_ABORT_ON_ERROR); + rv = qemuMigrationWaitForCompletion(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT, + conn, abort_on_error); + if (rv < 0) + goto cleanup; + } +
This has to be done after setting the phase. We may need to add a new so that we can recover from post-copy migrations when libvirtd is restarted.
Correct me if I’m wrong, but isn’t the phase already set to `QEMU_MIGRATION_PHASE_CONFIRM3` by `qemuMigrationConfirm`, which calls `qemuMigrationConfirmPhase`? Also, I’m not sure we need another phase. If libvirtd is restarted, it should continue with phase CONFIRM3, wait for post-copy migration to finish or observe that the VM has finished post-copy migration, and kill it. Am I missing something?
qemuMigrationJobSetPhase(driver, vm, retcode == 0 ? QEMU_MIGRATION_PHASE_CONFIRM3
Cristian