[libvirt] [PATCH 0/3] Fix various migration issues

Jiri Denemark (3): Fix error message when TUNNELLED flag is used in non-p2p migration qemu: Send migrate_cancel when aborting migration qemu: Properly abort migration to a file src/libvirt.c | 8 ++++- src/qemu/qemu_migration.c | 74 ++++++++++++++++++++++++++++++----------------- 2 files changed, 54 insertions(+), 28 deletions(-) -- 1.9.3

The current error message is error: use virDomainMigrateToURI3 for peer-to-peer migration which is correct but a bit misleading because the client did not specify VIR_MIGRATE_PEER2PEER flag. This patch changes the error message to error: cannot perform tunnelled migration without using peer2peer flag which is consistent with the error reported by older migration APIs. Reported by Rich Jones in https://bugzilla.redhat.com/show_bug.cgi?id=1095924 Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/libvirt.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/libvirt.c b/src/libvirt.c index 19fa18b..72a9f6d 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -5723,12 +5723,18 @@ virDomainMigrate3(virDomainPtr domain, __FUNCTION__); goto error; } - if (flags & (VIR_MIGRATE_PEER2PEER | VIR_MIGRATE_TUNNELLED)) { + if (flags & VIR_MIGRATE_PEER2PEER) { virReportInvalidArg(flags, "%s", _("use virDomainMigrateToURI3 for peer-to-peer " "migration")); goto error; } + if (flags & VIR_MIGRATE_TUNNELLED) { + virReportInvalidArg(flags, "%s", + _("cannot perform tunnelled migration " + "without using peer2peer flag")); + goto error; + } if (flags & VIR_MIGRATE_OFFLINE) { if (!VIR_DRV_SUPPORTS_FEATURE(domain->conn->driver, domain->conn, -- 1.9.3

On 05/22/14 13:55, Jiri Denemark wrote:
The current error message is
error: use virDomainMigrateToURI3 for peer-to-peer migration
which is correct but a bit misleading because the client did not specify VIR_MIGRATE_PEER2PEER flag. This patch changes the error message to
error: cannot perform tunnelled migration without using peer2peer flag
which is consistent with the error reported by older migration APIs.
Reported by Rich Jones in https://bugzilla.redhat.com/show_bug.cgi?id=1095924
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/libvirt.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/libvirt.c b/src/libvirt.c index 19fa18b..72a9f6d 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -5723,12 +5723,18 @@ virDomainMigrate3(virDomainPtr domain, __FUNCTION__); goto error; } - if (flags & (VIR_MIGRATE_PEER2PEER | VIR_MIGRATE_TUNNELLED)) { + if (flags & VIR_MIGRATE_PEER2PEER) { virReportInvalidArg(flags, "%s", _("use virDomainMigrateToURI3 for peer-to-peer " "migration")); goto error; } + if (flags & VIR_MIGRATE_TUNNELLED) { + virReportInvalidArg(flags, "%s", + _("cannot perform tunnelled migration " + "without using peer2peer flag"));
<bikeshed> This error message will incline the user to specifying the VIR_MIGRATE_PEER2PEER flag for this api, so that he receives the error above. how about "use virDomainMigrateToURI3 and specify the VIR_MIGRATE_PEER2PEER flag for tunelled migration" Not sure whether that will be more clear though. </bikeshed>
+ goto error; + }
if (flags & VIR_MIGRATE_OFFLINE) { if (!VIR_DRV_SUPPORTS_FEATURE(domain->conn->driver, domain->conn,
ACK, Peter

On Thu, May 22, 2014 at 15:56:57 +0200, Peter Krempa wrote:
On 05/22/14 13:55, Jiri Denemark wrote:
The current error message is
error: use virDomainMigrateToURI3 for peer-to-peer migration
which is correct but a bit misleading because the client did not specify VIR_MIGRATE_PEER2PEER flag. This patch changes the error message to
error: cannot perform tunnelled migration without using peer2peer flag
which is consistent with the error reported by older migration APIs.
Reported by Rich Jones in https://bugzilla.redhat.com/show_bug.cgi?id=1095924
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/libvirt.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/libvirt.c b/src/libvirt.c index 19fa18b..72a9f6d 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -5723,12 +5723,18 @@ virDomainMigrate3(virDomainPtr domain, __FUNCTION__); goto error; } - if (flags & (VIR_MIGRATE_PEER2PEER | VIR_MIGRATE_TUNNELLED)) { + if (flags & VIR_MIGRATE_PEER2PEER) { virReportInvalidArg(flags, "%s", _("use virDomainMigrateToURI3 for peer-to-peer " "migration")); goto error; } + if (flags & VIR_MIGRATE_TUNNELLED) { + virReportInvalidArg(flags, "%s", + _("cannot perform tunnelled migration " + "without using peer2peer flag"));
<bikeshed> This error message will incline the user to specifying the VIR_MIGRATE_PEER2PEER flag for this api, so that he receives the error above.
how about "use virDomainMigrateToURI3 and specify the VIR_MIGRATE_PEER2PEER flag for tunelled migration"
Not sure whether that will be more clear though.
</bikeshed>
Well, the API documentation says VIR_MIGRATE_TUNNELLED and VIR_MIGRATE_PEER2PEER are not supported by this API, use virDomainMigrateToURI3 instead. so any direct user of this API should already call the right API. And virsh users would just see the new error and once they add the peer2peer flag, virsh will automatically use the correct API. So I guess it's good enough :-) Jirka

When QEMU reports failed or cancelled migration, we don't need to send it migrate_cancel QMP command. But in all other error paths, such as if we detect broken connection to a destination daemon or something else happens inside libvirt, we need to explicitly send migrate_cancel command instead of relying on the migration to be implicitly cancelled when destination QEMU is killed. Because we were not doing so, one could end up with a paused domain after failed migration. https://bugzilla.redhat.com/show_bug.cgi?id=807023 Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 53 ++++++++++++++++++++++++----------------------- 1 file changed, 27 insertions(+), 26 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index d6271fb..ae18acb 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1724,10 +1724,9 @@ qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver, priv->job.status = status; - if (ret < 0 || virTimeMillisNow(&priv->job.info.timeElapsed) < 0) { - priv->job.info.type = VIR_DOMAIN_JOB_FAILED; + if (ret < 0 || virTimeMillisNow(&priv->job.info.timeElapsed) < 0) return -1; - } + priv->job.info.timeElapsed -= priv->job.start; ret = -1; @@ -1784,6 +1783,9 @@ qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver, } +/* Returns 0 on success, -2 when migration needs to be cancelled, or -1 when + * QEMU reports failed migration. + */ static int qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, virDomainObjPtr vm, enum qemuDomainAsyncJob asyncJob, @@ -1814,18 +1816,21 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, virDomainObjPtr vm, struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull }; if (qemuMigrationUpdateJobStatus(driver, vm, job, asyncJob) == -1) - goto cleanup; + break; /* cancel migration if disk I/O error is emitted while migrating */ if (abort_on_error && virDomainObjGetState(vm, &pauseReason) == VIR_DOMAIN_PAUSED && - pauseReason == VIR_DOMAIN_PAUSED_IOERROR) - goto cancel; + pauseReason == VIR_DOMAIN_PAUSED_IOERROR) { + virReportError(VIR_ERR_OPERATION_FAILED, + _("%s: %s"), job, _("failed due to I/O error")); + break; + } if (dconn && virConnectIsAlive(dconn) <= 0) { virReportError(VIR_ERR_OPERATION_FAILED, "%s", _("Lost connection to destination host")); - goto cleanup; + break; } virObjectUnlock(vm); @@ -1835,25 +1840,17 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, virDomainObjPtr vm, virObjectLock(vm); } - cleanup: - if (priv->job.info.type == VIR_DOMAIN_JOB_COMPLETED) + if (priv->job.info.type == VIR_DOMAIN_JOB_COMPLETED) { return 0; - else + } else if (priv->job.info.type == VIR_DOMAIN_JOB_UNBOUNDED) { + /* The migration was aborted by us rather than QEMU itself so let's + * update the job type and notify the caller to send migrate_cancel. + */ + priv->job.info.type = VIR_DOMAIN_JOB_FAILED; + return -2; + } else { return -1; - - cancel: - if (virDomainObjIsActive(vm)) { - if (qemuDomainObjEnterMonitorAsync(driver, vm, - priv->job.asyncJob) == 0) { - qemuMonitorMigrateCancel(priv->mon); - qemuDomainObjExitMonitor(driver, vm); - } } - - priv->job.info.type = VIR_DOMAIN_JOB_FAILED; - virReportError(VIR_ERR_OPERATION_FAILED, - _("%s: %s"), job, _("failed due to I/O error")); - return -1; } @@ -3229,6 +3226,7 @@ qemuMigrationRun(virQEMUDriverPtr driver, virErrorPtr orig_err = NULL; unsigned int cookieFlags = 0; bool abort_on_error = !!(flags & VIR_MIGRATE_ABORT_ON_ERROR); + int rc; VIR_DEBUG("driver=%p, vm=%p, cookiein=%s, cookieinlen=%d, " "cookieout=%p, cookieoutlen=%p, flags=%lx, resource=%lu, " @@ -3385,9 +3383,12 @@ qemuMigrationRun(virQEMUDriverPtr driver, !(iothread = qemuMigrationStartTunnel(spec->fwd.stream, fd))) goto cancel; - if (qemuMigrationWaitForCompletion(driver, vm, - QEMU_ASYNC_JOB_MIGRATION_OUT, - dconn, abort_on_error) < 0) + rc = qemuMigrationWaitForCompletion(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT, + dconn, abort_on_error); + if (rc == -2) + goto cancel; + else if (rc == -1) goto cleanup; /* When migration completed, QEMU will have paused the -- 1.9.3

On 05/22/14 13:55, Jiri Denemark wrote:
When QEMU reports failed or cancelled migration, we don't need to send it migrate_cancel QMP command. But in all other error paths, such as if we detect broken connection to a destination daemon or something else happens inside libvirt, we need to explicitly send migrate_cancel command instead of relying on the migration to be implicitly cancelled when destination QEMU is killed.
Because we were not doing so, one could end up with a paused domain after failed migration.
https://bugzilla.redhat.com/show_bug.cgi?id=807023 Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 53 ++++++++++++++++++++++++----------------------- 1 file changed, 27 insertions(+), 26 deletions(-)
ACK, Peter

-----Original Message----- From: libvir-list-bounces@redhat.com [mailto:libvir-list-bounces@redhat.com] On Behalf Of Jiri Denemark Sent: Thursday, May 22, 2014 7:55 PM To: libvir-list@redhat.com Subject: [libvirt] [PATCH 2/3] qemu: Send migrate_cancel when aborting migration
When QEMU reports failed or cancelled migration, we don't need to send it migrate_cancel QMP command. But in all other error paths, such as if we detect broken connection to a destination daemon or something else happens inside libvirt, we need to explicitly send migrate_cancel command instead of relying on the migration to be implicitly cancelled when destination QEMU is killed.
Because we were not doing so, one could end up with a paused domain after failed migration.
Hi, Jiri I'm interested in the patch. I want to know the specific scene. But the bug link you posted is about device_del. Is there another link about migration? (I think)
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> ---

----- Original Message -----
From: "Wangrui (K)" <moon.wangrui@huawei.com> To: "Jiri Denemark" <jdenemar@redhat.com>, libvir-list@redhat.com Sent: Friday, May 23, 2014 9:40:23 AM Subject: Re: [libvirt] [PATCH 2/3] qemu: Send migrate_cancel when aborting migration
-----Original Message----- From: libvir-list-bounces@redhat.com [mailto:libvir-list-bounces@redhat.com] On Behalf Of Jiri Denemark Sent: Thursday, May 22, 2014 7:55 PM To: libvir-list@redhat.com Subject: [libvirt] [PATCH 2/3] qemu: Send migrate_cancel when aborting migration
When QEMU reports failed or cancelled migration, we don't need to send it migrate_cancel QMP command. But in all other error paths, such as if we detect broken connection to a destination daemon or something else happens inside libvirt, we need to explicitly send migrate_cancel command instead of relying on the migration to be implicitly cancelled when destination QEMU is killed.
Because we were not doing so, one could end up with a paused domain after failed migration.
Hi, Jiri I'm interested in the patch. I want to know the specific scene. But the bug link you posted is about device_del. Is there another link about migration? (I think)
https://bugzilla.redhat.com/show_bug.cgi?id=1098833
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> ---
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
-- Best Regards Zhe Peng

On Fri, May 23, 2014 at 01:40:23 +0000, Wangrui (K) wrote:
-----Original Message----- From: libvir-list-bounces@redhat.com [mailto:libvir-list-bounces@redhat.com] On Behalf Of Jiri Denemark Sent: Thursday, May 22, 2014 7:55 PM To: libvir-list@redhat.com Subject: [libvirt] [PATCH 2/3] qemu: Send migrate_cancel when aborting migration
When QEMU reports failed or cancelled migration, we don't need to send it migrate_cancel QMP command. But in all other error paths, such as if we detect broken connection to a destination daemon or something else happens inside libvirt, we need to explicitly send migrate_cancel command instead of relying on the migration to be implicitly cancelled when destination QEMU is killed.
Because we were not doing so, one could end up with a paused domain after failed migration.
Hi, Jiri I'm interested in the patch. I want to know the specific scene. But the bug link you posted is about device_del. Is there another link about migration? (I think)
Oops, that's what you get for working on several bugs at once :-) As already mentioned, the correct bug number is 1098833. I'll push the patch shortly with a corrected BZ link. Jirka

This is similar to the previous commit in that we need to explicitly send migrate_cancel when libvirt detects an error other than those reported by query-migrate. However, the possibility to hit such error is pretty small. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index ae18acb..5754f73 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -4704,6 +4704,7 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm, int pipeFD[2] = { -1, -1 }; unsigned long saveMigBandwidth = priv->migMaxBandwidth; char *errbuf = NULL; + virErrorPtr orig_err = NULL; /* Increase migration bandwidth to unlimited since target is a file. * Failure to change migration speed is not fatal. */ @@ -4806,8 +4807,17 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm, rc = qemuMigrationWaitForCompletion(driver, vm, asyncJob, NULL, false); - if (rc < 0) + if (rc < 0) { + if (rc == -2) { + orig_err = virSaveLastError(); + virCommandAbort(cmd); + if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) == 0) { + qemuMonitorMigrateCancel(priv->mon); + qemuDomainObjExitMonitor(driver, vm); + } + } goto cleanup; + } if (cmd && virCommandWait(cmd, NULL) < 0) goto cleanup; @@ -4815,6 +4825,9 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm, ret = 0; cleanup: + if (ret < 0 && !orig_err) + orig_err = virSaveLastError(); + /* Restore max migration bandwidth */ if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) == 0) { qemuMonitorSetMigrationSpeed(priv->mon, saveMigBandwidth); @@ -4840,6 +4853,12 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm, VIR_CGROUP_DEVICE_RWM); virDomainAuditCgroupPath(vm, priv->cgroup, "deny", path, "rwm", rv == 0); } + + if (orig_err) { + virSetError(orig_err); + virFreeError(orig_err); + } + return ret; } -- 1.9.3

On 05/22/14 13:55, Jiri Denemark wrote:
This is similar to the previous commit in that we need to explicitly send migrate_cancel when libvirt detects an error other than those reported by query-migrate. However, the possibility to hit such error is pretty small.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index ae18acb..5754f73 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -4704,6 +4704,7 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm, int pipeFD[2] = { -1, -1 }; unsigned long saveMigBandwidth = priv->migMaxBandwidth; char *errbuf = NULL; + virErrorPtr orig_err = NULL;
/* Increase migration bandwidth to unlimited since target is a file. * Failure to change migration speed is not fatal. */ @@ -4806,8 +4807,17 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm,
rc = qemuMigrationWaitForCompletion(driver, vm, asyncJob, NULL, false);
- if (rc < 0) + if (rc < 0) { + if (rc == -2) { + orig_err = virSaveLastError(); + virCommandAbort(cmd);
Shouldn't we abort the migration in qemu before killing the I/O helper? Qemu will get a sigpipe probably after we do that. Not sure wether that's worth dealing with as the migration is still going to fail.
+ if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) == 0) { + qemuMonitorMigrateCancel(priv->mon); + qemuDomainObjExitMonitor(driver, vm); + } + } goto cleanup; + }
if (cmd && virCommandWait(cmd, NULL) < 0) goto cleanup;
ACK, preferably you respond to my question before pushing :) Peter

On Thu, May 22, 2014 at 16:14:22 +0200, Peter Krempa wrote:
On 05/22/14 13:55, Jiri Denemark wrote:
This is similar to the previous commit in that we need to explicitly send migrate_cancel when libvirt detects an error other than those reported by query-migrate. However, the possibility to hit such error is pretty small.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index ae18acb..5754f73 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -4704,6 +4704,7 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm, int pipeFD[2] = { -1, -1 }; unsigned long saveMigBandwidth = priv->migMaxBandwidth; char *errbuf = NULL; + virErrorPtr orig_err = NULL;
/* Increase migration bandwidth to unlimited since target is a file. * Failure to change migration speed is not fatal. */ @@ -4806,8 +4807,17 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm,
rc = qemuMigrationWaitForCompletion(driver, vm, asyncJob, NULL, false);
- if (rc < 0) + if (rc < 0) { + if (rc == -2) { + orig_err = virSaveLastError(); + virCommandAbort(cmd);
Shouldn't we abort the migration in qemu before killing the I/O helper? Qemu will get a sigpipe probably after we do that. Not sure wether that's worth dealing with as the migration is still going to fail.
Hmm, good question. Do we even need to abort I/O helper as I guess cancelling the migration will result in EOF on the I/O helper's input?
+ if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) == 0) { + qemuMonitorMigrateCancel(priv->mon); + qemuDomainObjExitMonitor(driver, vm); + } + }
Jirka

On 05/22/14 16:28, Jiri Denemark wrote:
On Thu, May 22, 2014 at 16:14:22 +0200, Peter Krempa wrote:
On 05/22/14 13:55, Jiri Denemark wrote:
This is similar to the previous commit in that we need to explicitly send migrate_cancel when libvirt detects an error other than those reported by query-migrate. However, the possibility to hit such error is pretty small.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index ae18acb..5754f73 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -4704,6 +4704,7 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm, int pipeFD[2] = { -1, -1 }; unsigned long saveMigBandwidth = priv->migMaxBandwidth; char *errbuf = NULL; + virErrorPtr orig_err = NULL;
/* Increase migration bandwidth to unlimited since target is a file. * Failure to change migration speed is not fatal. */ @@ -4806,8 +4807,17 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm,
rc = qemuMigrationWaitForCompletion(driver, vm, asyncJob, NULL, false);
- if (rc < 0) + if (rc < 0) { + if (rc == -2) { + orig_err = virSaveLastError(); + virCommandAbort(cmd);
Shouldn't we abort the migration in qemu before killing the I/O helper? Qemu will get a sigpipe probably after we do that. Not sure wether that's worth dealing with as the migration is still going to fail.
Hmm, good question. Do we even need to abort I/O helper as I guess cancelling the migration will result in EOF on the I/O helper's input?
Well and a counter-example is if qemu get's stuck, then you at least abort the iohelper and then get stuck a few lines below on the monitor.
+ if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) == 0) { + qemuMonitorMigrateCancel(priv->mon); + qemuDomainObjExitMonitor(driver, vm); + } + }
Jirka

On Thu, May 22, 2014 at 18:33:08 +0200, Peter Krempa wrote:
On 05/22/14 16:28, Jiri Denemark wrote:
On Thu, May 22, 2014 at 16:14:22 +0200, Peter Krempa wrote:
On 05/22/14 13:55, Jiri Denemark wrote:
This is similar to the previous commit in that we need to explicitly send migrate_cancel when libvirt detects an error other than those reported by query-migrate. However, the possibility to hit such error is pretty small.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index ae18acb..5754f73 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -4704,6 +4704,7 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm, int pipeFD[2] = { -1, -1 }; unsigned long saveMigBandwidth = priv->migMaxBandwidth; char *errbuf = NULL; + virErrorPtr orig_err = NULL;
/* Increase migration bandwidth to unlimited since target is a file. * Failure to change migration speed is not fatal. */ @@ -4806,8 +4807,17 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm,
rc = qemuMigrationWaitForCompletion(driver, vm, asyncJob, NULL, false);
- if (rc < 0) + if (rc < 0) { + if (rc == -2) { + orig_err = virSaveLastError(); + virCommandAbort(cmd);
Shouldn't we abort the migration in qemu before killing the I/O helper? Qemu will get a sigpipe probably after we do that. Not sure wether that's worth dealing with as the migration is still going to fail.
Hmm, good question. Do we even need to abort I/O helper as I guess cancelling the migration will result in EOF on the I/O helper's input?
Well and a counter-example is if qemu get's stuck, then you at least abort the iohelper and then get stuck a few lines below on the monitor.
Fair enough, I'll push the patch as-is. We don't really care if QEMU aborts migration after the I/O helper is killed since we're going to cancel it anyway. Jirka

On Thu, May 22, 2014 at 13:55:14 +0200, Jiri Denemark wrote:
Jiri Denemark (3): Fix error message when TUNNELLED flag is used in non-p2p migration qemu: Send migrate_cancel when aborting migration qemu: Properly abort migration to a file
src/libvirt.c | 8 ++++- src/qemu/qemu_migration.c | 74 ++++++++++++++++++++++++++++++----------------- 2 files changed, 54 insertions(+), 28 deletions(-)
I fixed the bugzilla link in 2/3 and pushed these patches. Jirka
participants (4)
-
Jiri Denemark
-
Peter Krempa
-
Wangrui (K)
-
Zhe Peng