On 07 Oct 2014, at 23:02 , Jiri Denemark <jdenemar(a)redhat.com> wrote:
On Tue, Sep 30, 2014 at 16:39:27 +0200, Cristian Klein wrote:
> Perform phase stops once migration switched to post-copy.
> Confirm phase waits for post-copy to finish before killing the VM.
>
> Signed-off-by: Cristian Klein <cristian.klein(a)cs.umu.se>
> ---
> src/qemu/qemu_driver.c | 8 ++++++++
> src/qemu/qemu_migration.c | 25 ++++++++++++++++++++++---
> 2 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index e873d45..3fe2216 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -10942,6 +10942,14 @@ qemuDomainMigratePrepare2(virConnectPtr dconn,
>
> virCheckFlags(QEMU_MIGRATION_FLAGS, -1);
>
> + if (flags & VIR_MIGRATE_POSTCOPY) {
> + /* post-copy migration does not work with Sequence v2 */
> + virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> + _("Post-copy migration requested but not "
> + "supported by v2 protocol"));
> + goto cleanup;
> + }
> +
This code should be unreachable. If both source and destination daemons
support VIR_MIGRATE_POSTCOPY, they support v3 protocol as well. And a
client new enough to specify VIR_MIGRATE_POSTCOPY will also support v3
migration protocol. However, it doesn't hurt to check this to be safe.
> if (flags & VIR_MIGRATE_TUNNELLED) {
> /* this is a logical error; we never should have gotten here with
> * VIR_MIGRATE_TUNNELLED set
> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> index 4a36946..436b701 100644
> --- a/src/qemu/qemu_migration.c
> +++ b/src/qemu/qemu_migration.c
> @@ -2039,6 +2039,11 @@ qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver,
> ret = 0;
> break;
>
> + case QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_ACTIVE:
> + jobInfo->type = VIR_DOMAIN_JOB_PHASE1_COMPLETED;
> + ret = 0;
> + break;
> +
This will need to be dropped after 5/8 is removed. However, it reminds
me...
enum {
QEMU_MONITOR_MIGRATION_STATUS_INACTIVE,
QEMU_MONITOR_MIGRATION_STATUS_ACTIVE,
QEMU_MONITOR_MIGRATION_STATUS_COMPLETED,
QEMU_MONITOR_MIGRATION_STATUS_ERROR,
QEMU_MONITOR_MIGRATION_STATUS_CANCELLED,
QEMU_MONITOR_MIGRATION_STATUS_SETUP,
QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_ACTIVE,
QEMU_MONITOR_MIGRATION_STATUS_LAST
};
in qemu_monitor.h needs to be turned into a proper typedef so that
switch (status.status) {
line in qemuMigrationUpdateJobStatus may be changed to explicitly
mention the enum so that the compiler may report a warning whenever we
add new status but forgot to handle it in this switch.
What about `QEMU_MONITOR_MIGRATION_STATUS_LAST`? Should this be included in a switch with
an assertion that that code should never be reached?
Which means the
new state will need to be handled in the same patch it was introduced,
i.e, in 3/8.
> case QEMU_MONITOR_MIGRATION_STATUS_INACTIVE:
> jobInfo->type = VIR_DOMAIN_JOB_NONE;
> virReportError(VIR_ERR_OPERATION_FAILED,
> @@ -2077,6 +2082,7 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver,
> qemuDomainJobInfoPtr jobInfo = priv->job.current;
> const char *job;
> int pauseReason;
> + bool inPhase2 = (jobInfo->type == VIR_DOMAIN_JOB_PHASE1_COMPLETED);
I think it would be cleaner to pass this info in a new parameter for
qemuMigrationWaitForCompletion instead of doing a hidden black magic
here.
I agree with you, I wasn’t sure about with this way of coding things either. Is it okey to
ask developers to always create a new intermediate parameter to call this function, so as
to make its usage easier to read? E.g.,
“””
bool return_on_postcopy_active = false;
rv = qemuMigrationWaitForCompletion(driver, vm,
QEMU_ASYNC_JOB_MIGRATION_OUT,
conn, abort_on_error,
return_on_postcopy_active);
“”"
>
> switch (priv->job.asyncJob) {
> case QEMU_ASYNC_JOB_MIGRATION_OUT:
> @@ -2092,9 +2098,11 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver,
> job = _("job");
> }
>
> - jobInfo->type = VIR_DOMAIN_JOB_UNBOUNDED;
> + if (!inPhase2)
> + jobInfo->type = VIR_DOMAIN_JOB_UNBOUNDED;
>
> - while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) {
> + while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED ||
> + (inPhase2 && jobInfo->type ==
VIR_DOMAIN_JOB_PHASE1_COMPLETED)) {
Just use jobInfo->status.status directly.
> /* Poll every 50ms for progress & to allow cancellation */
> struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull };
>
> @@ -2123,7 +2131,8 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver,
> virObjectLock(vm);
> }
>
> - if (jobInfo->type == VIR_DOMAIN_JOB_COMPLETED) {
> + if (jobInfo->type == VIR_DOMAIN_JOB_COMPLETED ||
> + jobInfo->type == VIR_DOMAIN_JOB_PHASE1_COMPLETED) {
This shouldn't be needed when 5/8 is dropped.
> qemuDomainJobInfoUpdateDowntime(jobInfo);
> VIR_FREE(priv->job.completed);
> if (VIR_ALLOC(priv->job.completed) == 0)
> @@ -3149,6 +3158,16 @@ qemuMigrationConfirmPhase(virQEMUDriverPtr driver,
>
> virCheckFlags(QEMU_MIGRATION_FLAGS, -1);
>
> + /* Wait for post-copy to complete */
> + if (flags & VIR_MIGRATE_POSTCOPY) {
> + bool abort_on_error = !!(flags & VIR_MIGRATE_ABORT_ON_ERROR);
> + rv = qemuMigrationWaitForCompletion(driver, vm,
> + QEMU_ASYNC_JOB_MIGRATION_OUT,
> + conn, abort_on_error);
> + if (rv < 0)
> + goto cleanup;
> + }
> +
This has to be done after setting the phase. We may need to add a new so
that we can recover from post-copy migrations when libvirtd is
restarted.
Correct me if I’m wrong, but isn’t the phase already set to
`QEMU_MIGRATION_PHASE_CONFIRM3` by `qemuMigrationConfirm`, which calls
`qemuMigrationConfirmPhase`?
Also, I’m not sure we need another phase. If libvirtd is restarted, it should continue
with phase CONFIRM3, wait for post-copy migration to finish or observe that the VM has
finished post-copy migration, and kill it. Am I missing something?
> qemuMigrationJobSetPhase(driver, vm,
> retcode == 0
> ? QEMU_MIGRATION_PHASE_CONFIRM3
Cristian