[libvirt] [PATCH 0/4] Fix tunnelled migration

All bugs fixed by the following patches were spotted while testing tunnelled migration. However, the first three of them may also be hit in other scenarios.

Jiri Denemark (4):
  qemu: Preserve original error during migration
  rpc: Discard non-blocking calls only when necessary
  qemu: Fix detection of failed migration
  qemu: Avoid bogus error at the end of tunnelled migration

 src/qemu/qemu_migration.c |  171 +++++++++++++++++++++++++++++++++++++--------
 src/rpc/virnetclient.c    |   21 +++---
 2 files changed, 151 insertions(+), 41 deletions(-)

-- 
1.7.8.5

In some cases (spotted with broken connection during tunneled migration) we were overwriting the original error with worse or even misleading errors generated when we were cleaning up after failed migration.
---
 src/qemu/qemu_migration.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index fc83805..1bee66a 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -1733,6 +1733,7 @@ qemuMigrationRun(struct qemud_driver *driver,
     qemuMigrationIOThreadPtr iothread = NULL;
     int fd = -1;
     unsigned long migrate_speed = resource ? resource : priv->migMaxBandwidth;
+    virErrorPtr orig_err = NULL;
 
     VIR_DEBUG("driver=%p, vm=%p, cookiein=%s, cookieinlen=%d, "
               "cookieout=%p, cookieoutlen=%p, flags=%lx, resource=%lu, "
@@ -1874,6 +1875,9 @@ qemuMigrationRun(struct qemud_driver *driver,
     ret = 0;
 
 cleanup:
+    if (ret < 0 && !orig_err)
+        orig_err = virSaveLastError();
+
     if (spec->fwdType != MIGRATION_FWD_DIRECT) {
         /* Close now to ensure the IO thread quits & is joinable */
         VIR_FORCE_CLOSE(fd);
@@ -1888,9 +1892,16 @@ cleanup:
 
     qemuMigrationCookieFree(mig);
 
+    if (orig_err) {
+        virSetError(orig_err);
+        virFreeError(orig_err);
+    }
+
     return ret;
 
 cancel:
+    orig_err = virSaveLastError();
+
     if (virDomainObjIsActive(vm)) {
         if (qemuDomainObjEnterMonitorAsync(driver, vm,
                                            QEMU_ASYNC_JOB_MIGRATION_OUT) == 0) {
@@ -2495,6 +2506,7 @@ qemuMigrationPerformJob(struct qemud_driver *driver,
     virDomainEventPtr event = NULL;
     int ret = -1;
     int resume = 0;
+    virErrorPtr orig_err = NULL;
 
     if (qemuMigrationJobStart(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT) < 0)
         goto cleanup;
@@ -2540,6 +2552,9 @@ qemuMigrationPerformJob(struct qemud_driver *driver,
     resume = 0;
 
 endjob:
+    if (ret < 0)
+        orig_err = virSaveLastError();
+
     if (resume && virDomainObjGetState(vm, NULL) == VIR_DOMAIN_PAUSED) {
         /* we got here through some sort of failure; start the domain again */
         if (qemuProcessStartCPUs(driver, vm, conn,
@@ -2569,6 +2584,11 @@ endjob:
         vm = NULL;
     }
 
+    if (orig_err) {
+        virSetError(orig_err);
+        virFreeError(orig_err);
+    }
+
 cleanup:
     if (vm)
         virDomainObjUnlock(vm);
-- 
1.7.8.5

On 04/25/2012 02:07 PM, Jiri Denemark wrote:
In some cases (spotted with broken connection during tunneled migration) we were overwriting the original error with worse or even misleading errors generated when we were cleaning up after failed migration.
---
 src/qemu/qemu_migration.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)
ACK. Peter
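
For readers less familiar with the error-handling APIs used above: the whole fix boils down to snapshotting the thread-local error before cleanup and restoring it afterwards. The following is a minimal standalone sketch of that idiom built only on the public libvirt error API (virSaveLastError, virSetError, virFreeError); doCleanup() and performWithPreservedError() are made-up stand-ins for the real cleanup paths in qemuMigrationRun(), not libvirt functions, and the bogus URIs are just a cheap way to raise distinguishable errors.

/* Sketch: preserve the original error across cleanup that may overwrite it. */
#include <stdio.h>
#include <libvirt/libvirt.h>
#include <libvirt/virterror.h>

static void doCleanup(void)
{
    /* Hypothetical cleanup step that fails and overwrites the last error. */
    virConnectOpen("anotherbogus://example/system");
}

static int performWithPreservedError(int ret)
{
    virErrorPtr orig_err = NULL;

    if (ret < 0)
        orig_err = virSaveLastError();   /* snapshot the real failure */

    doCleanup();                         /* may raise misleading errors */

    if (orig_err) {
        virSetError(orig_err);           /* restore the original error */
        virFreeError(orig_err);
    }
    return ret;
}

int main(void)
{
    /* Trigger an initial error, then show it survives the cleanup step:
     * the message printed below should refer to "nosuchdriver", not to
     * the URI used inside doCleanup(). */
    virConnectOpen("nosuchdriver://example/system");
    performWithPreservedError(-1);

    virErrorPtr err = virGetLastError();
    printf("last error: %s\n", err && err->message ? err->message : "none");
    return 0;
}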

Currently, non-blocking calls are either sent immediately or discarded in case sending would block. This was implemented based on the assumption that the non-blocking keepalive call is not needed as there are other calls in the queue which would keep the connection alive. However, if those calls are no-reply calls (such as those carrying stream data), the remote party knows the connection is alive but since we don't get any reply from it, we think the connection is dead.

This is most visible in tunnelled migration. If it happens to be longer than keepalive timeout (30s by default), it may be unexpectedly aborted because the connection is considered to be dead.

With this patch, we only discard non-blocking calls when the last call with a thread is completed and thus there is no thread left to keep sending the remaining non-blocking calls.
---
 src/rpc/virnetclient.c |   21 +++++++++------------
 1 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/src/rpc/virnetclient.c b/src/rpc/virnetclient.c
index 33b7701..8952dd4 100644
--- a/src/rpc/virnetclient.c
+++ b/src/rpc/virnetclient.c
@@ -1257,6 +1257,13 @@ static void virNetClientIOEventLoopPassTheBuck(virNetClientPtr client, virNetCli
     }
     client->haveTheBuck = false;
 
+    /* Remove non-blocking calls from the dispatch list since there is no
+     * call with a thread in the list which could take care of them.
+     */
+    virNetClientCallRemovePredicate(&client->waitDispatch,
+                                    virNetClientIOEventLoopRemoveNonBlocking,
+                                    thiscall);
+
     VIR_DEBUG("No thread to pass the buck to");
     if (client->wantClose) {
         virNetClientCloseLocked(client);
@@ -1300,12 +1307,9 @@ static int virNetClientIOEventLoop(virNetClientPtr client,
         if (virNetSocketHasCachedData(client->sock) || client->wantClose)
             timeout = 0;
 
-        /* If there are any non-blocking calls in the queue,
-         * then we don't want to sleep in poll()
+        /* If we are non-blocking, we don't want to sleep in poll()
          */
-        if (virNetClientCallMatchPredicate(client->waitDispatch,
-                                           virNetClientIOEventLoopWantNonBlock,
-                                           NULL))
+        if (thiscall->nonBlock)
             timeout = 0;
 
         fds[0].events = fds[0].revents = 0;
@@ -1410,13 +1414,6 @@ static int virNetClientIOEventLoop(virNetClientPtr client,
                                        virNetClientIOEventLoopRemoveDone,
                                        thiscall);
 
-        /* Iterate through waiting calls and if any are
-         * non-blocking, remove them from the dispatch list...
-         */
-        virNetClientCallRemovePredicate(&client->waitDispatch,
-                                        virNetClientIOEventLoopRemoveNonBlocking,
-                                        thiscall);
-
         /* Now see if *we* are done */
         if (thiscall->mode == VIR_NET_CLIENT_MODE_COMPLETE) {
             virNetClientCallRemove(&client->waitDispatch, thiscall);
-- 
1.7.8.5

On 04/25/2012 02:07 PM, Jiri Denemark wrote:
Currently, non-blocking calls are either sent immediately or discarded in case sending would block. This was implemented based on the assumption that the non-blocking keepalive call is not needed as there are other calls in the queue which would keep the connection alive. However, if those calls are no-reply calls (such as those carrying stream data), the remote party knows the connection is alive but since we don't get any reply from it, we think the connection is dead.
This is most visible in tunnelled migration. If it happens to be longer than keepalive timeout (30s by default), it may be unexpectedly aborted because the connection is considered to be dead.
With this patch, we only discard non-blocking calls when the last call with a thread is completed and thus there is no thread left to keep sending the remaining non-blocking calls. ---
ACK. Peter
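
The 30s default mentioned in the commit message comes from the keepalive protocol negotiated on the client connection. For reference, a client opts in with virConnectSetKeepAlive(); the sketch below is a minimal standalone example, not code from this series. The interval/count values mirror the common defaults (5s interval, 5 missed responses), and the connection URI is only a placeholder.

/* Sketch: enabling client-side keepalive on a libvirt connection. */
#include <stdio.h>
#include <libvirt/libvirt.h>
#include <libvirt/virterror.h>

int main(void)
{
    /* An event loop implementation must be registered for keepalive
     * timers to work; the default impl is enough for this sketch. */
    if (virEventRegisterDefaultImpl() < 0)
        return 1;

    virConnectPtr conn = virConnectOpen("qemu:///system");  /* placeholder URI */
    if (!conn) {
        fprintf(stderr, "failed to connect\n");
        return 1;
    }

    /* Send a keepalive request every 5 seconds and consider the connection
     * dead after 5 unanswered requests (roughly the 30s mentioned above).
     * A real client would also run virEventRunDefaultImpl() in a loop so
     * that the timers actually fire. */
    if (virConnectSetKeepAlive(conn, 5, 5) < 0)
        fprintf(stderr, "keepalive not supported on this connection\n");

    virConnectClose(conn);
    return 0;
}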

When QEMU reported failed or canceled migration, we correctly detected it but didn't really consider it an error condition and the migration protocol just went on. Luckily, some of the subsequent steps eventually failed and we reported an (unrelated and mostly random) error back to the caller.
---
 src/qemu/qemu_migration.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 1bee66a..43e56b7 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -878,38 +878,39 @@ qemuMigrationSetOffline(struct qemud_driver *driver,
 
 static int
 qemuMigrationUpdateJobStatus(struct qemud_driver *driver,
                              virDomainObjPtr vm,
                              const char *job,
                              enum qemuDomainAsyncJob asyncJob)
 {
     qemuDomainObjPrivatePtr priv = vm->privateData;
-    int ret = -1;
+    int ret;
     int status;
     unsigned long long memProcessed;
     unsigned long long memRemaining;
     unsigned long long memTotal;
 
     ret = qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob);
     if (ret < 0) {
         /* Guest already exited; nothing further to update. */
         return -1;
     }
     ret = qemuMonitorGetMigrationStatus(priv->mon,
                                         &status,
                                         &memProcessed,
                                         &memRemaining,
                                         &memTotal);
     qemuDomainObjExitMonitorWithDriver(driver, vm);
 
     if (ret < 0 || virTimeMillisNow(&priv->job.info.timeElapsed) < 0) {
         priv->job.info.type = VIR_DOMAIN_JOB_FAILED;
         return -1;
     }
     priv->job.info.timeElapsed -= priv->job.start;
 
+    ret = -1;
     switch (status) {
     case QEMU_MONITOR_MIGRATION_STATUS_INACTIVE:
         priv->job.info.type = VIR_DOMAIN_JOB_NONE;
         qemuReportError(VIR_ERR_OPERATION_FAILED,
                         _("%s: %s"), job, _("is not active"));
         break;
-- 
1.7.8.5

On 04/25/2012 02:07 PM, Jiri Denemark wrote:
When QEMU reported failed or canceled migration, we correctly detected it but didn't really consider it an error condition and the migration protocol just went on. Luckily, some of the subsequent steps eventually failed and we reported an (unrelated and mostly random) error back to the caller. ---
Thanks for extra context. ACK Peter
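
The essence of the fix is the `ret = -1;` placed just before the switch: every monitor status that is not an explicit success path now propagates failure instead of silently continuing. The sketch below mirrors that idiom in a standalone program; the migStatus enum and updateJobStatus() are hypothetical simplifications, not the internal QEMU_MONITOR_MIGRATION_STATUS_* values or libvirt code.

/* Sketch: default to failure, mark only the genuinely good states as OK. */
#include <stdio.h>

enum migStatus {
    MIG_STATUS_INACTIVE,
    MIG_STATUS_ACTIVE,
    MIG_STATUS_COMPLETED,
    MIG_STATUS_ERROR,
    MIG_STATUS_CANCELLED,
};

static int
updateJobStatus(enum migStatus status)
{
    int ret = -1;             /* assume failure until proven otherwise */

    switch (status) {
    case MIG_STATUS_ACTIVE:
    case MIG_STATUS_COMPLETED:
        ret = 0;              /* migration is progressing or finished */
        break;
    case MIG_STATUS_INACTIVE:
        fprintf(stderr, "migration job is not active\n");
        break;
    case MIG_STATUS_ERROR:
        fprintf(stderr, "migration job unexpectedly failed\n");
        break;
    case MIG_STATUS_CANCELLED:
        fprintf(stderr, "migration job was canceled\n");
        break;
    }

    return ret;               /* callers abort the protocol on -1 */
}

int main(void)
{
    printf("completed -> %d\n", updateJobStatus(MIG_STATUS_COMPLETED));
    printf("error     -> %d\n", updateJobStatus(MIG_STATUS_ERROR));
    return 0;
}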

Once qemu monitor reports migration has completed, we just closed our end of the pipe and let the migration tunnel die. This generated a bogus error in case we did so before the thread saw EOF on the pipe, and migration was aborted even though it was in fact successful.

With this patch, we first wake up the tunnel thread and only once it has read all data from the pipe and finished the stream do we close the file descriptor.

A small additional bonus of this patch is that real errors reported inside qemuMigrationIOFunc are not overwritten by virStreamAbort any more.
---
 src/qemu/qemu_migration.c |  148 ++++++++++++++++++++++++++++++++++++---------
 1 files changed, 120 insertions(+), 28 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 43e56b7..3a420be 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -25,6 +25,7 @@
 #include <gnutls/gnutls.h>
 #include <gnutls/x509.h>
 #include <fcntl.h>
+#include <poll.h>
 
 #include "qemu_migration.h"
 #include "qemu_monitor.h"
@@ -1585,49 +1586,113 @@ struct _qemuMigrationIOThread {
     virStreamPtr st;
     int sock;
     virError err;
+    int wakeupRecvFD;
+    int wakeupSendFD;
 };
 
 static void qemuMigrationIOFunc(void *arg)
 {
     qemuMigrationIOThreadPtr data = arg;
-    char *buffer;
-    int nbytes = TUNNEL_SEND_BUF_SIZE;
+    char *buffer = NULL;
+    struct pollfd fds[2];
+    int timeout = -1;
+    virErrorPtr err = NULL;
+
+    VIR_DEBUG("Running migration tunnel; stream=%p, sock=%d",
+              data->st, data->sock);
 
     if (VIR_ALLOC_N(buffer, TUNNEL_SEND_BUF_SIZE) < 0) {
         virReportOOMError();
-        virStreamAbort(data->st);
-        goto error;
+        goto abrt;
     }
 
+    fds[0].fd = data->sock;
+    fds[1].fd = data->wakeupRecvFD;
+
     for (;;) {
-        nbytes = saferead(data->sock, buffer, TUNNEL_SEND_BUF_SIZE);
-        if (nbytes < 0) {
+        int ret;
+
+        fds[0].events = fds[1].events = POLLIN;
+        fds[0].revents = fds[1].revents = 0;
+
+        ret = poll(fds, ARRAY_CARDINALITY(fds), timeout);
+
+        if (ret < 0) {
+            if (errno == EAGAIN || errno == EINTR)
+                continue;
             virReportSystemError(errno, "%s",
-                                 _("tunnelled migration failed to read from qemu"));
-            virStreamAbort(data->st);
-            VIR_FREE(buffer);
-            goto error;
+                                 _("poll failed in migration tunnel"));
+            goto abrt;
         }
-        else if (nbytes == 0)
-            /* EOF; get out of here */
+
+        if (ret == 0) {
+            /* We were asked to gracefully stop but reading would block. This
+             * can only happen if qemu told us migration finished but didn't
+             * close the migration fd. We handle this in the same way as EOF.
+             */
+            VIR_DEBUG("QEMU forgot to close migration fd");
             break;
+        }
 
-        if (virStreamSend(data->st, buffer, nbytes) < 0) {
-            VIR_FREE(buffer);
-            goto error;
+        if (fds[1].revents & (POLLIN | POLLERR | POLLHUP)) {
+            char stop = 0;
+
+            if (saferead(data->wakeupRecvFD, &stop, 1) != 1) {
+                virReportSystemError(errno, "%s",
+                                     _("failed to read from wakeup fd"));
+                goto abrt;
+            }
+
+            VIR_DEBUG("Migration tunnel was asked to %s",
+                      stop ? "abort" : "finish");
+            if (stop) {
+                goto abrt;
+            } else {
+                timeout = 0;
+            }
         }
-    }
 
-    VIR_FREE(buffer);
+        if (fds[0].revents & (POLLIN | POLLERR | POLLHUP)) {
+            int nbytes;
+
+            nbytes = saferead(data->sock, buffer, TUNNEL_SEND_BUF_SIZE);
+            if (nbytes > 0) {
+                if (virStreamSend(data->st, buffer, nbytes) < 0)
+                    goto error;
+            } else if (nbytes < 0) {
+                virReportSystemError(errno, "%s",
+                        _("tunnelled migration failed to read from qemu"));
+                goto abrt;
+            } else {
+                /* EOF; get out of here */
+                break;
+            }
+        }
+    }
 
     if (virStreamFinish(data->st) < 0)
         goto error;
 
+    VIR_FREE(buffer);
+
+    return;
+
+abrt:
+    err = virSaveLastError();
+    if (err && err->code == VIR_ERR_OK) {
+        virFreeError(err);
+        err = NULL;
+    }
+    virStreamAbort(data->st);
+    if (err) {
+        virSetError(err);
+        virFreeError(err);
+    }
+
 error:
     virCopyLastError(&data->err);
     virResetLastError();
+    VIR_FREE(buffer);
 }
 
 
@@ -1635,37 +1700,63 @@ static qemuMigrationIOThreadPtr
 qemuMigrationStartTunnel(virStreamPtr st,
                          int sock)
 {
-    qemuMigrationIOThreadPtr io;
+    qemuMigrationIOThreadPtr io = NULL;
+    int wakeupFD[2] = { -1, -1 };
 
-    if (VIR_ALLOC(io) < 0) {
-        virReportOOMError();
-        return NULL;
+    if (pipe2(wakeupFD, O_CLOEXEC) < 0) {
+        virReportSystemError(errno, "%s",
+                             _("Unable to make pipe"));
+        goto error;
     }
 
+    if (VIR_ALLOC(io) < 0)
+        goto no_memory;
+
     io->st = st;
     io->sock = sock;
+    io->wakeupRecvFD = wakeupFD[0];
+    io->wakeupSendFD = wakeupFD[1];
 
     if (virThreadCreate(&io->thread, true,
                         qemuMigrationIOFunc,
                         io) < 0) {
         virReportSystemError(errno, "%s",
                              _("Unable to create migration thread"));
-        VIR_FREE(io);
-        return NULL;
+        goto error;
     }
 
     return io;
+
+no_memory:
+    virReportOOMError();
+error:
+    VIR_FORCE_CLOSE(wakeupFD[0]);
+    VIR_FORCE_CLOSE(wakeupFD[1]);
+    VIR_FREE(io);
+    return NULL;
 }
 
 static int
-qemuMigrationStopTunnel(qemuMigrationIOThreadPtr io)
+qemuMigrationStopTunnel(qemuMigrationIOThreadPtr io, bool error)
 {
     int rv = -1;
+    char stop = error ? 1 : 0;
+
+    /* make sure the thread finishes its job and is joinable */
+    if (safewrite(io->wakeupSendFD, &stop, 1) != 1) {
+        virReportSystemError(errno, "%s",
+                             _("failed to wakeup migration tunnel"));
+        goto cleanup;
+    }
+
     virThreadJoin(&io->thread);
 
     /* Forward error from the IO thread, to this thread */
     if (io->err.code != VIR_ERR_OK) {
-        virSetError(&io->err);
+        if (error)
+            rv = 0;
+        else
+            virSetError(&io->err);
         virResetError(&io->err);
         goto cleanup;
     }
@@ -1673,6 +1764,8 @@ qemuMigrationStopTunnel(qemuMigrationIOThreadPtr io)
     rv = 0;
 
 cleanup:
+    VIR_FORCE_CLOSE(io->wakeupSendFD);
+    VIR_FORCE_CLOSE(io->wakeupRecvFD);
     VIR_FREE(io);
     return rv;
 }
@@ -1880,10 +1973,9 @@ cleanup:
         orig_err = virSaveLastError();
 
     if (spec->fwdType != MIGRATION_FWD_DIRECT) {
-        /* Close now to ensure the IO thread quits & is joinable */
-        VIR_FORCE_CLOSE(fd);
-        if (iothread && qemuMigrationStopTunnel(iothread) < 0)
+        if (iothread && qemuMigrationStopTunnel(iothread, ret < 0) < 0)
             ret = -1;
+        VIR_FORCE_CLOSE(fd);
     }
 
     if (ret == 0 &&
-- 
1.7.8.5

On 04/25/2012 02:07 PM, Jiri Denemark wrote:
Once qemu monitor reports migration has completed, we just closed our end of the pipe and let the migration tunnel die. This generated a bogus error in case we did so before the thread saw EOF on the pipe, and migration was aborted even though it was in fact successful.
With this patch, we first wake up the tunnel thread and only once it has read all data from the pipe and finished the stream do we close the file descriptor.
A small additional bonus of this patch is that real errors reported inside qemuMigrationIOFunc are not overwritten by virStreamAbort any more. ---
ACK. Peter
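
The wakeup pipe added by this patch is an instance of a generic pattern: a worker thread poll()s its data fd together with the read end of a pipe, and the controlling thread writes a single byte to request either a graceful drain or an abort before joining the worker. The standalone sketch below demonstrates the pattern with plain POSIX calls; all names and the stdin/stdout plumbing are illustrative and deliberately unrelated to the libvirt sources.

/* Sketch: waking up a poll()-based forwarding thread via a pipe. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct tunnel {
    int datafd;        /* fd we forward data from (stdin in this demo) */
    int wakeupRecvFD;  /* read end of the wakeup pipe */
    int wakeupSendFD;  /* write end of the wakeup pipe */
};

static void *worker(void *arg)
{
    struct tunnel *t = arg;
    char buf[4096];
    int timeout = -1;  /* block forever until asked to finish, then drain */

    for (;;) {
        struct pollfd fds[2] = {
            { .fd = t->datafd, .events = POLLIN },
            { .fd = t->wakeupRecvFD, .events = POLLIN },
        };
        int ret = poll(fds, 2, timeout);

        if (ret < 0) {
            if (errno == EINTR)
                continue;
            perror("poll");
            return NULL;
        }
        if (ret == 0)          /* asked to finish and nothing left to read */
            break;

        if (fds[1].revents & POLLIN) {
            char stop;
            if (read(t->wakeupRecvFD, &stop, 1) != 1)
                return NULL;
            if (stop)          /* abort requested */
                return NULL;
            timeout = 0;       /* finish: keep reading until we would block */
        }

        if (fds[0].revents & (POLLIN | POLLHUP)) {
            ssize_t n = read(t->datafd, buf, sizeof(buf));
            if (n <= 0)        /* EOF or error: stop forwarding */
                break;
            fwrite(buf, 1, n, stdout);  /* stand-in for virStreamSend() */
        }
    }
    return NULL;
}

int main(void)
{
    struct tunnel t = { .datafd = STDIN_FILENO };
    int p[2];
    pthread_t thr;
    char finish = 0;

    if (pipe2(p, O_CLOEXEC) < 0)
        return 1;
    t.wakeupRecvFD = p[0];
    t.wakeupSendFD = p[1];

    if (pthread_create(&thr, NULL, worker, &t) != 0)
        return 1;

    sleep(1);                           /* pretend the job just completed */
    write(t.wakeupSendFD, &finish, 1);  /* 0 = drain remaining data, then stop */
    pthread_join(thr, NULL);

    close(p[0]);
    close(p[1]);
    return 0;
}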

On Wed, Apr 25, 2012 at 14:07:44 +0200, Jiri Denemark wrote:
All bugs fixed by the following patches were spotted while testing tunnelled migration. However, the first three of them may also be hit in other scenarios.
Jiri Denemark (4):
  qemu: Preserve original error during migration
  rpc: Discard non-blocking calls only when necessary
  qemu: Fix detection of failed migration
  qemu: Avoid bogus error at the end of tunnelled migration
 src/qemu/qemu_migration.c |  171 +++++++++++++++++++++++++++++++++++++--------
 src/rpc/virnetclient.c    |   21 +++---
 2 files changed, 151 insertions(+), 41 deletions(-)
Thanks for the review. I pushed this series. Jirka