On Tue, Oct 13, 2015 at 4:51 PM, Shivaprasad bhat
<shivaprasadbhat(a)gmail.com> wrote:
On Mon, Oct 12, 2015 at 8:03 PM, Jiri Denemark
<jdenemar(a)redhat.com> wrote:
> On Thu, Oct 08, 2015 at 17:59:07 +0530, Shivaprasad G Bhat wrote:
>> Tunnelled migration can hang if the destination qemu exits despite all the
>> ABI checks. This happens whenever the destination qemu exits before the
>> complete transfer is noticed by source qemu. The savevm state checks at
>> runtime can fail at destination and cause qemu to error out.
>> The source qemu cant notice it as the EPIPE is not propogated to it.
>> The qemuMigrationIOFunc() notices the stream being broken from virStreamSend()
>> and it cleans up the stream alone. The qemuMigrationWaitForCompletion() would
>> never get to 100% transfer completion.
>> The qemuMigrationWaitForCompletion() never breaks out as well since
>> the ssh connection to destination is healthy, and the source qemu also thinks
>> the migration is ongoing as the Fd to which it transfers, is never
>> closed or broken. So, the migration will hang forever. Even Ctrl-C on the
>> virsh migrate wouldn't be honoured. Close the source side FD when there is
>> an error in the stream. That way, the source qemu updates itself and
>> qemuMigrationWaitForCompletion() notices the failure.
>>
>> Close the FD for all kinds of errors to be sure. The error message is not
>> copied for EPIPE so that the destination error is copied instead later.
>>
>> Note:
>> Reproducible with repeated migrations between Power hosts running in different
>> subcores-per-core modes.
>>
>> Changes from v2 -> v3:
>> VIR_FORCE_CLOSE() needn't be called on the qemu source fd as that is
closed
>> by qemuMigrationIOFunc() for tunnelled migration. Set the fd to -1.
>>
>> Signed-off-by: Shivaprasad G Bhat <sbhat(a)linux.vnet.ibm.com>
>> ---
>> src/qemu/qemu_migration.c | 18 ++++++++++++++----
>> 1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>> index 7440108..c982838 100644
>> --- a/src/qemu/qemu_migration.c
>> +++ b/src/qemu/qemu_migration.c
>> @@ -4019,6 +4019,7 @@ static void qemuMigrationIOFunc(void *arg)
>> if (virStreamFinish(data->st) < 0)
>> goto error;
>>
>> + VIR_FORCE_CLOSE(data->sock);
>> VIR_FREE(buffer);
>>
>> return;
>> @@ -4036,7 +4037,11 @@ static void qemuMigrationIOFunc(void *arg)
>> }
>>
>> error:
>> - virCopyLastError(&data->err);
>> + /* Let the source qemu know that the transfer cant continue anymore.
>> + * Don't copy the error for EPIPE as destination has the actual error.
*/
>> + VIR_FORCE_CLOSE(data->sock);
>> + if (!virLastErrorIsSystemErrno(EPIPE))
>> + virCopyLastError(&data->err);
>> virResetLastError();
>> VIR_FREE(buffer);
>> }
>
> Hmm, I don't think we need to filter out any error here. Were you
> actually able to get the EPIPE error except for seeing it in the logs?
> What I'm trying to say is: if we get an EPIPE in qemuMigrationIOFunc, we
> close data->sock which will propagate to the source QEMU and
> qemuMigrationWaitForCompletion should return a "migration unexpectedly
> failed" error, which will get stored in orig_err and even though
> qemuMigrationStopTunnel will set the thread local error to the one from
> iothread->data, we will reset it back to the orig_err one. Only in case
> qemuMigrationWaitForCompletion returned success but we still got error
> on the tunnel, we'd actually see the error from iothread->err.
>
Yes. The qemuMigrationWaitForCompletion() returns success sometimes and
StopTunnel propogating the failure in such a case. This happens inconsistently,
but I do see it happening. Please let me know what I should be doing
if this is not the
way.
Hi Jirka,
As you mentioned over IRC, I checked what would be the result of
migration if the
destination has no error message. I modified the qemu to not to emit
any error messages.
The error in such case is "internal error: End of file from monitor"
which of course is coming
from libvirtd is as good as "cannot write to stream: Broken pipe".
Both anyway convey very less
as to what to do.
So, I think ignoring EPIPE is the way to go. Please let me know your opinion.
Thanks,
Shiva
Thanks,
Shivaprasad
> Otherwise the patch looks fine.
>
> Jirka