Re: [libvirt] [PATCH v3 2/6] fdstream: Report error from the I/O thread

13 Jun 2017

On 06/12/2017 11:48 PM, John Ferlan wrote:
...
On 06/05/2017 04:22 AM, Michal Privoznik wrote:
...
Problem with our error reporting is that the error object is a
thread local variable. That means if there's an error reported
within the I/O thread it gets logged and everything, but later
when the event loop aborts the stream it doesn't see the original
error. So we are left with some generic error. We can do better
if we copy the error message between the threads.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 daemon/stream.c        | 18 ++++++++++++------
 src/util/virfdstream.c |  9 ++++++---
 2 files changed, 18 insertions(+), 9 deletions(-)
Perhaps I'm lost in weeds of this code, but I thought the magic of error
message details for the threads was handled through rerr (and
virNetMessageSaveError).
Not really. When you call virReportError()/virReportSystemError() these
functions:

1) fetch thread local error object
2) if it doesn't exist they create one
3) copy the error message and couple of other things into the object

Therefore, every thread has its own error object and unless two threads
share a pointer, they cannot see each others error object. Now, we have
two threads when it comes to server side of streams: the event loop
(which sends data to the client), and I/O thread. Should something go
wrong in the latter, we need to save the error object somewhere so that
the former can pick it up.
...
Still, based on later patches which seem to care about the value of
errno, it interesting this one removes an errno setting and replaces it
with an error message.
With a pointer. That way we can still check if it is set or not. Prior
to this patch, threadErr is an int, and checks are: "if
(!fdst->threadErr)". After that threadErr is a pointer so that the
checks can stay the same.
...
Maybe the answer is rather than replacing threadErr it should be
'augmented' with a 'threadErrMsg'.
Why? Prior to this patch threadErr despite being type of was used as a
boolean really. What would be the benefits of having both int threadErr;
and virErrorPtr trheadErrMsg;?
...
...

diff --git a/daemon/stream.c b/daemon/stream.c
index 1d5b50ad7..5077ac8b0 100644
--- a/daemon/stream.c
+++ b/daemon/stream.c
@@ -231,17 +231,23 @@ daemonStreamEvent(virStreamPtr st, int events, void *opaque)
         int ret;
         virNetMessagePtr msg;
         virNetMessageError rerr;
+        virErrorPtr origErr = virSaveLastError();
memset(&rerr, 0, sizeof(rerr));
         stream->closed = true;
         virStreamEventRemoveCallback(stream->st);
         virStreamAbort(stream->st);
-        if (events & VIR_STREAM_EVENT_HANGUP)
-            virReportError(VIR_ERR_RPC,
-                           "%s", _("stream had unexpected termination"));
-        else
-            virReportError(VIR_ERR_RPC,
-                           "%s", _("stream had I/O failure"));
+        if (origErr && origErr->code != VIR_ERR_OK) {
+            virSetError(origErr);
+            virFreeError(origErr);
+        } else {
+            if (events & VIR_STREAM_EVENT_HANGUP)
+                virReportError(VIR_ERR_RPC,
+                               "%s", _("stream had unexpected termination"));
+            else
+                virReportError(VIR_ERR_RPC,
+                               "%s", _("stream had I/O failure"));
+        }
This seems fine - display the error that we got from some lower spot if
it exists; otherwise,
Might be important to note the saving of the error has a lot to do with
virStreamAbort's clearing.
Well, we don't have comment on other places like this, but okay, I can
add a comment that says "save original error in case something resets
its later", or something among those lines.
...
At the very least this is an error in our processing vs. something
within the stream, correct? Or is that the thicket of weeds I'm lost in.
...
msg = virNetMessageNew(false);
         if (!msg) {
diff --git a/src/util/virfdstream.c b/src/util/virfdstream.c
index cd24757e6..5b59765fa 100644
--- a/src/util/virfdstream.c
+++ b/src/util/virfdstream.c
@@ -106,7 +106,7 @@ struct virFDStreamData {
     /* Thread data */
     virThreadPtr thread;
     virCond threadCond;
-    int threadErr;
+    virErrorPtr threadErr;
     bool threadQuit;
     bool threadAbort;
     bool threadDoRead;
@@ -123,6 +123,7 @@ virFDStreamDataDispose(void *obj)
     virFDStreamDataPtr fdst = obj;
VIR_DEBUG("obj=%p", fdst);
+    virFreeError(fdst->threadErr);
     virFDStreamMsgQueueFree(&fdst->msg);
 }
@@ -312,8 +313,10 @@ static void virFDStreamEvent(int watch ATTRIBUTE_UNUSED,
         return;
     }
-    if (fdst->threadErr)
+    if (fdst->threadErr) {
         events |= VIR_STREAM_EVENT_ERROR;
+        virSetError(fdst->threadErr);> +    }
cb = fdst->cb;
     cbopaque = fdst->opaque;
@@ -637,7 +640,7 @@ virFDStreamThread(void *opaque)
     return;
error:
-    fdst->threadErr = errno;
+    fdst->threadErr = virSaveLastError();
"Typically" the processing is:
orig_err = virSaveLastError();
    { do stuff };
    if (orig_err) {
        virSetError(orig_err);
        virFreeError(orig_err);
    }
This is not typical. This hunk here (virSaveLastError()) is called from
within the I/O thread. And as I've explained earlier, error objects are
thread local. So what this hunk does is save any error reported earlier
so that the other hunk above (virSetError()) can set it (it runs in a
different threat). What's not so typical about this? I don't think that
we copy error objects between two threads anywhere else in the code.
...
So, how do we know threadErr got something in return and if it didn't
we've messed up "other" algorithms. In particular, the lines in the
previous hunk won't set 'events' any more.
How come? threadErr is set if and only if the I/O thread finished with
an error.
...
I'm not worried about being called twice (we know that won't happen
;-)), but spreading out the error message processing through 3 functions
causes me wonder whether that virFreeError should be done right after
the virSetError rather than waiting for virFDStreamDataDispose.
I don't think so. threadError is part of virFDStreamData object and as
such the dispose function is responsible for freeing the object.

Michal