Tunnelled migration can hang if the destination qemu exits despite all the
ABI checks. This happens whenever the destination qemu exits before the
complete transfer. The savevm state checks at runtime can fail at
destination qemu and error out. The source qemu cant notice it as the
EPIPE is not propogated to qemu. The qemuMigrationIOFunc() notices
the stream being broken from virStreamSend() and it cleans up the stream alone.
The qemuMigrationWaitForCompletion() would never get to 100% transfer
completion. The qemuMigrationWaitForCompletion() never breaks out as well since
the ssh connection to destination is healthy, and the source qemu also thinks
the migration is ongoing as the Fd to which it transfers, is never
closed or broken. So, the migration will hang forever. Even Ctrl-C on the
virsh migrate wouldn't be honoured. Close the source side FD when there is
an error in the stream. That way, the source qemu updates itself and
qemuMigrationWaitForCompletion() notices the failure.
Close the FD for all kinds of errors to be sure. The error message is not
copied for EPIPE so that the destination error is copied instead later.
Note:
Reproducible with repeated migrations between Power hosts running in different
subcores-per-core modes.
Signed-off-by: Shivaprasad G Bhat <sbhat(a)linux.vnet.ibm.com>
---
src/qemu/qemu_migration.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index ff89ab5..4f9aced 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -4029,7 +4029,11 @@ static void qemuMigrationIOFunc(void *arg)
}
error:
- virCopyLastError(&data->err);
+ /* Let the source qemu know that the transfer can't continue anymore.
+ * Don't copy the error for EPIPE as destination has the actual error */
+ VIR_FORCE_CLOSE(data->sock);
+ if (!virLastErrorIsSystemErrno(EPIPE))
+ virCopyLastError(&data->err);
virResetLastError();
VIR_FREE(buffer);
}