Re: [libvirt] [PATCH v2] Close the source fd if the destination qemu exits during tunnelled migration

28 Sep 2015

      On 09/28/2015 11:02 AM, Shivaprasad bhat wrote:
...
Hi Jon,
Thanks a lot for attaching the patch.  Replies inline.
On Thu, Sep 24, 2015 at 1:52 AM, John Ferlan <jferlan@redhat.com> wrote:
...
On 09/22/2015 07:21 AM, Shivaprasad bhat wrote:
...
On Mon, Sep 21, 2015 at 8:04 PM, John Ferlan <jferlan@redhat.com> wrote:
...
On 09/21/2015 05:09 AM, Shivaprasad bhat wrote:
...
Thanks John for the comments.
On Fri, Sep 18, 2015 at 10:34 PM, John Ferlan <jferlan@redhat.com> wrote:
...
On 09/14/2015 10:44 AM, Shivaprasad G Bhat wrote:
> Tunnelled migration can hang if the destination qemu exits despite all the
> ABI checks. This happens whenever the destination qemu exits before the
> complete transfer is noticed by source qemu. The savevm state checks at
> runtime can fail at destination and cause qemu to error out.
> The source qemu cant notice it as the EPIPE is not propogated to it.
> The qemuMigrationIOFunc() notices the stream being broken from virStreamSend()
> and it cleans up the stream alone. The qemuMigrationWaitForCompletion() would
> never get to 100% transfer completion.
> The qemuMigrationWaitForCompletion() never breaks out as well since
> the ssh connection to destination is healthy, and the source qemu also thinks
> the migration is ongoing as the Fd to which it transfers, is never
> closed or broken. So, the migration will hang forever. Even Ctrl-C on the
> virsh migrate wouldn't be honoured. Close the source side FD when there is
> an error in the stream. That way, the source qemu updates itself and
> qemuMigrationWaitForCompletion() notices the failure.
>
> Close the FD for all kinds of errors to be sure. The error message is not
> copied for EPIPE so that the destination error is copied instead later.
>
> Note:
> Reproducible with repeated migrations between Power hosts running in different
> subcores-per-core modes.
>
> Changes from v1 -> v2:
>    VIR_FORCE_CLOSE() was called twice for this use case which would log
>    unneccessary warnings. So, move the fd close to qemuMigrationIOFunc
>    so that there are no unnecessary duplicate attempts.(Would this trigger
>    a Coverity error? I don't have a setup to check.)
>
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
> ---
>  src/qemu/qemu_migration.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> index ff89ab5..9602fb2 100644
> --- a/src/qemu/qemu_migration.c
> +++ b/src/qemu/qemu_migration.c
> @@ -4012,6 +4012,7 @@ static void qemuMigrationIOFunc(void *arg)
>      if (virStreamFinish(data->st) < 0)
>          goto error;
>
> +    VIR_FORCE_CLOSE(data->sock);
>      VIR_FREE(buffer);
>
>      return;
> @@ -4029,7 +4030,11 @@ static void qemuMigrationIOFunc(void *arg)
>      }
>
>   error:
> -    virCopyLastError(&data->err);
> +    /* Let the source qemu know that the transfer cant continue anymore.
> +     * Don't copy the error for EPIPE as destination has the actual error. */
> +    VIR_FORCE_CLOSE(data->sock);
> +    if (!virLastErrorIsSystemErrno(EPIPE))
> +        virCopyLastError(&data->err);
>      virResetLastError();
>      VIR_FREE(buffer);
>  }
> @@ -4397,7 +4402,6 @@ qemuMigrationRun(virQEMUDriverPtr driver,
>          if (iothread && qemuMigrationStopTunnel(iothread, ret < 0) < 0)
>              ret = -1;
>      }
> -    VIR_FORCE_CLOSE(fd);
^^^
This causes Coverity to claim a RESOURCE_LEAK
Feels like this was a mistake edit...
The VIR_FORCE_CLOSE() inside qemuMigrationIOFunc() alone is sufficient.
Having this again here would lead to Warning in the logs. I too thought coverity
would complain. Is there a way to force ignore a coverity warning?
Typically a marker of sorts, such as
/* coverity[leaked_handle] <some extra comment explaining why> */
Although I'm not sure that's the best way to handle this either.
The problem I see though is this is an "error path" issue and when
perhaps it's safe/required to close fd/io->sock/data->sock.
Your commit comment notes that the issue is seen on a fairly specific
event (virStreamSend failure) for a specific type of migration.
I believe the failure can be seen for all types of migration with savestate
mismatch.
My thoughts were based mostly on your commit message comments:
"The qemuMigrationIOFunc() notices the stream being broken from
virStreamSend() and it cleans up the stream alone. The
qemuMigrationWaitForCompletion() would never get to 100% transfer
completion."
and the belief that the core issue you described is there's no way for
qemuMigrationCompleted to know the qemuMigrationIOFunc thread failed and
thus that's what caused the hang. Since we can pass the iothread "io"
object to the Completion function, I figured we could also use that to
check the status of the thread in the list of things already checked. I
guess it's not clear to me why your test fails below. Maybe it'll help
to attach/show you my thoughts in a patch format. I just took your
changes and "adjusted" them, merging the differences. So you could 'git
am' to top of trunk in order to compare my thoughts.
I tried your patch. Gave migrations in a while loop. Hit the hang in 7th run
itlsef. I feel the small time window is not very negligible.
Perhaps I should clarify my query-migrate has no timeout comment... It
seems based on what I've read so far, the 'query-migrate' command
started successfully, because if it hadn't we would have received a
failure (as shown below). Thus libvirt has sent the command via the
monitor and is waiting for a response (e.g. the virCondWait after the
qemuMonitorSend in the trace below). The response isn't coming because
either "A" qemu didn't send it back or "B" libvirt missed it - that
should be determinable.

There's a way to turn on debugging so the monitor dialog can be seen -
via changes to /etc/libvirt/libvirtd.conf. I use :

log_level = 1
log_filters="3:remote 4:event 3:json 3:rpc"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

But you may need to remove the "3:json" in order to see the dialog since
that where it "feels like" the issue might be. Then start libvirtd in
the debugger again. Once it's hung - you should be able to scan (eg,
edit) the libvirtd.log file and search for the "query-migrate" command
being sent and then follow the copious output looking for the presence
of a returned command.  If there is none, then something in qemu isn't
returning the failure correctly and it would need to be fixed there I
would think as opposed to throwing down the big hammer of closing the fd.

I think you've hit some edge/error "timing" condition - one of the more
difficult to debug. Finding the root cause of the problem could be
beneficial especially if it could be considered some more "common" type
problem for other similar migrate commands.

John
...
...
I'm not opposed to the chosen method, although perhaps Dan or Jiri who
understand more of the inner workings of migration than I do can provide
some comments. Changing where the fd is closed could have some corner
race condition I'm not aware of...
One issue that needs to be resolved with your patch is what happens to
the 'fd' when qemuMigrationIOFunc isn't involved?  IOW: if "fwdType ==
MIGRATION_FWD_DIRECT", then qemuMigrationStartTunnel isn't called and
thus qemuMigrationIOFunc won't be closing the 'fd' in the form of
data->sock. That's perhaps a path Coverity was considering (and I hadn't
considered initially).
I think to resolve this the following should be added to your patch (I
tried it, Coverity is happy too)...
if (spec->fwdType != MIGRATION_FWD_DIRECT) {
    if (!(iothread = qemuMigrationStartTunnel(spec->fwd.stream, fd)))
        goto cancel;
    /* If we've created a tunnel, then the 'fd' will be closed in the
     * qemuMigrationIOFunc as data->sock
     */
    fd = -1;
}
and restore the VIR_FORCE_CLOSE(fd);
I agree. Let me know if we should go with this approach. I'll spin
the next version with the above corrections. I am copying Jiri on this
patch for his opinion as well.
...
...
...
As I
read the code, that failure jumps to error (as does virStreamFinish). So
now you have a fairly specific set of instances which perhaps would
cause qemuMigrationWaitForCompletion to need to fail. The fix you have
proposed is to close the data->sock (io->sock, fd). However, your
proposal is a larger hammer. I assume closure of data->sock causes
WaitForCompletion to fail (perhaps differently) and that's why you chose it.
Going back to the beginning of qemuMigrationRun, setting the 'fd' and
using StartTunnel seems to rely on MIGRATION_FWD_DIRECT, but if a tunnel
is created an (ugh) 'iothread' variable is NON-NULL. What if 'iothread'
were passed to qemuMigrationWaitForCompletion? Then again passed to
qemuMigrationCompleted which would check if iothread non-NULL and for
some new flag that could be created in _qemuMigrationIOThread, being
true. If it were true, then that would cause failure so that
WaitForCompletion would return error status. The only way the flag is
set to true is in qemuMigrationIOFunc when the code jumps to error.
(e.g. add bool stream_abort and data->stream_abort = true in "error:",
then check iothread && iothread->stream_abort, force error).
I tried this. Until the fd is closed, the 'info migrate' wouldn't
return from the
qemu. So, the migration hangs. Checking the stream status before/after
fetching the job status would still leave a race. So, having it in a different
thread (qemuMigrationIOFunc) here seems inevitable to me.
Although perhaps susceptible to a smaller "timing" window, if we move
the iothread check to before the CheckJobStatus call in Completed, then
that'll work better?
It seems the "query-migrate" command has no timeout... I assume qemu
would indicate that if you call it when the migration pipe had an error,
then you unpredictable results (only a guess - I didn't look at that
code as well). Perhaps that means libvirt needs some code to handle the
condition where the command doesn't finish in a timely manner.
No, If the pipe was broken, the query-migrate returns "failed". This is
anyway handled in a generic way in libvirt.
90.408 > 0x3fff80007410 {"execute":"query-migrate","id":"libvirt-725"}
 90.408 < 0x3fff80007410 {"return": {"status": "failed"}, "id": "libvirt-725"}
Thanks and Regards,
Shiva
...
Perhaps worth a shot...  I did change my attached patch to reflect this
thought..
John
...
#1  0x00003fff83ff593c in virCondWait (c=<optimized out>, m=<optimized out>)
    at util/virthread.c:154
#2  0x00003fff76235544 in qemuMonitorSend (mon=0x3fff54003970,
    msg=<optimized out>) at qemu/qemu_monitor.c:1035
#3  0x00003fff7624fb30 in qemuMonitorJSONCommandWithFd (mon=0x3fff54003970,
    cmd=0x3fff5c001420, scm_fd=-1, reply=0x3fff79fbd388)
    at qemu/qemu_monitor_json.c:293
#4  0x00003fff76254b90 in qemuMonitorJSONCommand (reply=0x3fff79fbd388,
    cmd=0x3fff5c001420, mon=0x3fff54003970) at qemu/qemu_monitor_json.c:323
#5  qemuMonitorJSONGetMigrationStatus (mon=0x3fff54003970,
    status=0x3fff79fbd538) at qemu/qemu_monitor_json.c:2620
#6  0x00003fff7623a664 in qemuMonitorGetMigrationStatus (mon=<optimized out>,
    status=<optimized out>) at qemu/qemu_monitor.c:2134
#7  0x00003fff76228e6c in qemuMigrationFetchJobStatus (driver=0x3fff6c118d80,
    vm=0x3fff6c27faf0, asyncJob=<optimized out>, jobInfo=0x3fff79fbd4f0)
    at qemu/qemu_migration.c:2528
#8  0x00003fff76228fb4 in qemuMigrationUpdateJobStatus (driver=0x3fff6c118d80,
    vm=0x3fff6c27faf0, asyncJob=QEMU_ASYNC_JOB_MIGRATION_OUT)
    at qemu/qemu_migration.c:2565
#9  0x00003fff76229200 in qemuMigrationCheckJobStatus (driver=0x3fff6c118d80,
    vm=0x3fff6c27faf0, asyncJob=QEMU_ASYNC_JOB_MIGRATION_OUT)
---Type <return> to continue, or q <return> to quit---cont
    at qemu/qemu_migration.c:2585
#10 0x00003fff76229408 in qemuMigrationCompleted (iothread=<optimized out>,
...
The only question then in my mind then is would this also be something
that should be done for the virStreamFinish failure path which also
jumps to error?
Yes, I too am not sure if its appropriate for virStreamFinish failure
case. But, given
this an error path, I feel we can safely go ahead and close the
data->sock.
...
Doing this means you shouldn't need the VIR_FILE_CLOSE(data->sock) for
either the normal or error path.  Does this seem to be a reasonable
approach and solve the issue you're facing?
John
...
Thanks and Regards,
Shivaprasad
...
The rest of the patch looks reasonable; however, I'm far from the expert
in these matters.
John
>
>      if (priv->job.completed) {
>          qemuDomainJobInfoUpdateTime(priv->job.completed);
>
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
>