[libvirt] [PATCH] remote_daemon_stream: Hold an extra reference to stream in daemonStreamFilter

In v5.9.0-273-g8ecab214de I've tried to fix a lock ordering problem, but introduced a crasher. Problem is that because the client lock is unlocked (in order to honour lock ordering) the stream we are currently checking in daemonStreamFilter() might be freed and thus stream->priv might not even exist when the control get to virMutexLock() call. To resolve this, grab an extra reference to the stream and handle its cleanup should the refcounter reach zero after the deref. If that's the case and we are the only ones holding a reference to the stream, we MUST return a positive value to make virNetServerClientDispatchRead() break its loop where it iterates over filters. The problem is, if we did not do so, then "filter = filter->next" line will read from a memory that was just freed (freeing a stream also unregisters its filter). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- Reproducing this issue is very easy: 1) put sleep(5) right after virObjectUnlock(client) in the fist hunk, 2) virsh console --force $dom and type something so that the stream has some data to process 3) while 2) is still running, run the same command from another terminal 4) observe libvirtd crash src/remote/remote_daemon_stream.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/src/remote/remote_daemon_stream.c b/src/remote/remote_daemon_stream.c index 82cadb67ac..73e4d7befb 100644 --- a/src/remote/remote_daemon_stream.c +++ b/src/remote/remote_daemon_stream.c @@ -293,10 +293,25 @@ daemonStreamFilter(virNetServerClientPtr client, daemonClientStream *stream = opaque; int ret = 0; + /* We must honour lock ordering here. Client private data lock must + * be acquired before client lock. Bu we are already called with + * client locked. To avoid stream disappearing while we unlock + * everything, let's increase its refcounter. This has some + * implications though. */ + stream->refs++; virObjectUnlock(client); virMutexLock(&stream->priv->lock); virObjectLock(client); + if (stream->refs == 1) { + /* So we are the only ones holding the reference to the stream. + * Return 1 to signal to the caller that we've processed the + * message. And to "process" means free. */ + virNetMessageFree(msg); + ret = 1; + goto cleanup; + } + if (msg->header.type != VIR_NET_STREAM && msg->header.type != VIR_NET_STREAM_HOLE) goto cleanup; @@ -318,6 +333,10 @@ daemonStreamFilter(virNetServerClientPtr client, cleanup: virMutexUnlock(&stream->priv->lock); + /* Don't pass client here, because client is locked here and this + * function might try to lock it again which would result in a + * deadlock. */ + daemonFreeClientStream(NULL, stream); return ret; } -- 2.23.0

Yeah, your patch is perfectly fine for stream freed problem. But I have reviewed code in daemonStreamEvent() and daemonStreamFilter() again, and I think there still two issue need to be reconsidered: 1. stream->ref only ++ in daemonStreamFilter, except for error occurred call daemonRemoveClientStream() lead to ref--, like code as follow. Though code won't be executed because of no VIR_STREAM_EVENT_HANGUP event taken place, but it is not good code: /* If we got HANGUP, we need to only send an empty * packet so the client sees an EOF and cleans up */ if (!stream->closed && !stream->recvEOF && (events & VIR_STREAM_EVENT_HANGUP)) { virNetMessagePtr msg; events &= ~(VIR_STREAM_EVENT_HANGUP); stream->tx = false; stream->recvEOF = true; if (!(msg = virNetMessageNew(false))) { daemonRemoveClientStream(client, stream); virNetServerClientClose(client); goto cleanup; } msg->cb = daemonStreamMessageFinished; msg->opaque = stream; stream->refs++; if (virNetServerProgramSendStreamData(stream->prog, client, msg, stream->procedure, stream->serial, "", 0) < 0) { virNetMessageFree(msg); daemonRemoveClientStream(client, stream); virNetServerClientClose(client); goto cleanup; } } 2. call virNetServerClientClose() is still inappropriate in because the it may free the client resource which other threads need ref, may be replace it with virNetServerClientImmediateClose() is better, the daemon can process client resource release unified 3. Segmentfault may still exists when another thread call remoteClientCloseFunc in virNetServerClientClose()(for example, when new force console session break down existed console session, and existed console session 's client close the session simutaneously, but remoteClientCloseFunc not lock(priv->lock), so the resource maybe released when daemonStreamEvent ref) How do you see it? Best Regards! Michal Privoznik <mprivozn@redhat.com> 于2019年11月26日周二 上午12:49写道:
In v5.9.0-273-g8ecab214de I've tried to fix a lock ordering problem, but introduced a crasher. Problem is that because the client lock is unlocked (in order to honour lock ordering) the stream we are currently checking in daemonStreamFilter() might be freed and thus stream->priv might not even exist when the control get to virMutexLock() call.
To resolve this, grab an extra reference to the stream and handle its cleanup should the refcounter reach zero after the deref. If that's the case and we are the only ones holding a reference to the stream, we MUST return a positive value to make virNetServerClientDispatchRead() break its loop where it iterates over filters. The problem is, if we did not do so, then "filter = filter->next" line will read from a memory that was just freed (freeing a stream also unregisters its filter).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
Reproducing this issue is very easy:
1) put sleep(5) right after virObjectUnlock(client) in the fist hunk, 2) virsh console --force $dom and type something so that the stream has some data to process 3) while 2) is still running, run the same command from another terminal 4) observe libvirtd crash
src/remote/remote_daemon_stream.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
diff --git a/src/remote/remote_daemon_stream.c b/src/remote/remote_daemon_stream.c index 82cadb67ac..73e4d7befb 100644 --- a/src/remote/remote_daemon_stream.c +++ b/src/remote/remote_daemon_stream.c @@ -293,10 +293,25 @@ daemonStreamFilter(virNetServerClientPtr client, daemonClientStream *stream = opaque; int ret = 0;
+ /* We must honour lock ordering here. Client private data lock must + * be acquired before client lock. Bu we are already called with + * client locked. To avoid stream disappearing while we unlock + * everything, let's increase its refcounter. This has some + * implications though. */ + stream->refs++; virObjectUnlock(client); virMutexLock(&stream->priv->lock); virObjectLock(client);
+ if (stream->refs == 1) { + /* So we are the only ones holding the reference to the stream. + * Return 1 to signal to the caller that we've processed the + * message. And to "process" means free. */ + virNetMessageFree(msg); + ret = 1; + goto cleanup; + } + if (msg->header.type != VIR_NET_STREAM && msg->header.type != VIR_NET_STREAM_HOLE) goto cleanup; @@ -318,6 +333,10 @@ daemonStreamFilter(virNetServerClientPtr client,
cleanup: virMutexUnlock(&stream->priv->lock); + /* Don't pass client here, because client is locked here and this + * function might try to lock it again which would result in a + * deadlock. */ + daemonFreeClientStream(NULL, stream); return ret; }
-- 2.23.0

Hi, Michal: I think the daemonFreeClientStream(NULL, stream); in your patch should line above virMutexUnlock(&stream->priv->lock), or else there is issue WAW. That is two threads may sub stream-> Lance Liu <liu.lance.89@gmail.com> 于2019年11月26日周二 下午1:54写道:
Yeah, your patch is perfectly fine for stream freed problem. But I have reviewed code in daemonStreamEvent() and daemonStreamFilter() again, and I think there still two issue need to be reconsidered: 1. stream->ref only ++ in daemonStreamFilter, except for error occurred call daemonRemoveClientStream() lead to ref--, like code as follow. Though code won't be executed because of no VIR_STREAM_EVENT_HANGUP event taken place, but it is not good code: /* If we got HANGUP, we need to only send an empty * packet so the client sees an EOF and cleans up */ if (!stream->closed && !stream->recvEOF && (events & VIR_STREAM_EVENT_HANGUP)) { virNetMessagePtr msg; events &= ~(VIR_STREAM_EVENT_HANGUP); stream->tx = false; stream->recvEOF = true; if (!(msg = virNetMessageNew(false))) { daemonRemoveClientStream(client, stream); virNetServerClientClose(client); goto cleanup; } msg->cb = daemonStreamMessageFinished; msg->opaque = stream; stream->refs++; if (virNetServerProgramSendStreamData(stream->prog, client, msg, stream->procedure, stream->serial, "", 0) < 0) { virNetMessageFree(msg); daemonRemoveClientStream(client, stream); virNetServerClientClose(client); goto cleanup; } } 2. call virNetServerClientClose() is still inappropriate in because the it may free the client resource which other threads need ref, may be replace it with virNetServerClientImmediateClose() is better, the daemon can process client resource release unified 3. Segmentfault may still exists when another thread call remoteClientCloseFunc in virNetServerClientClose()(for example, when new force console session break down existed console session, and existed console session 's client close the session simutaneously, but remoteClientCloseFunc not lock(priv->lock), so the resource maybe released when daemonStreamEvent ref)
How do you see it?
Best Regards!
Michal Privoznik <mprivozn@redhat.com> 于2019年11月26日周二 上午12:49写道:
In v5.9.0-273-g8ecab214de I've tried to fix a lock ordering problem, but introduced a crasher. Problem is that because the client lock is unlocked (in order to honour lock ordering) the stream we are currently checking in daemonStreamFilter() might be freed and thus stream->priv might not even exist when the control get to virMutexLock() call.
To resolve this, grab an extra reference to the stream and handle its cleanup should the refcounter reach zero after the deref. If that's the case and we are the only ones holding a reference to the stream, we MUST return a positive value to make virNetServerClientDispatchRead() break its loop where it iterates over filters. The problem is, if we did not do so, then "filter = filter->next" line will read from a memory that was just freed (freeing a stream also unregisters its filter).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
Reproducing this issue is very easy:
1) put sleep(5) right after virObjectUnlock(client) in the fist hunk, 2) virsh console --force $dom and type something so that the stream has some data to process 3) while 2) is still running, run the same command from another terminal 4) observe libvirtd crash
src/remote/remote_daemon_stream.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
diff --git a/src/remote/remote_daemon_stream.c b/src/remote/remote_daemon_stream.c index 82cadb67ac..73e4d7befb 100644 --- a/src/remote/remote_daemon_stream.c +++ b/src/remote/remote_daemon_stream.c @@ -293,10 +293,25 @@ daemonStreamFilter(virNetServerClientPtr client, daemonClientStream *stream = opaque; int ret = 0;
+ /* We must honour lock ordering here. Client private data lock must + * be acquired before client lock. Bu we are already called with + * client locked. To avoid stream disappearing while we unlock + * everything, let's increase its refcounter. This has some + * implications though. */ + stream->refs++; virObjectUnlock(client); virMutexLock(&stream->priv->lock); virObjectLock(client);
+ if (stream->refs == 1) { + /* So we are the only ones holding the reference to the stream. + * Return 1 to signal to the caller that we've processed the + * message. And to "process" means free. */ + virNetMessageFree(msg); + ret = 1; + goto cleanup; + } + if (msg->header.type != VIR_NET_STREAM && msg->header.type != VIR_NET_STREAM_HOLE) goto cleanup; @@ -318,6 +333,10 @@ daemonStreamFilter(virNetServerClientPtr client,
cleanup: virMutexUnlock(&stream->priv->lock); + /* Don't pass client here, because client is locked here and this + * function might try to lock it again which would result in a + * deadlock. */ + daemonFreeClientStream(NULL, stream); return ret; }
-- 2.23.0

On 11/28/19 4:09 AM, Lance Liu wrote:
Hi, Michal: I think the daemonFreeClientStream(NULL, stream); in your patch should line above virMutexUnlock(&stream->priv->lock), or else there is issue WAW.
I'm sorry, I don't know what WAW is.
That is two threads may sub stream->
Lance Liu <liu.lance.89@gmail.com <mailto:liu.lance.89@gmail.com>> 于 2019年11月26日周二 下午1:54写道:
Yeah, your patch is perfectly fine for stream freed problem. But I have reviewed code in daemonStreamEvent() and daemonStreamFilter() again, and I think there still two issue need to be reconsidered: 1. stream->ref only ++ in daemonStreamFilter, except for error occurred call daemonRemoveClientStream() lead to ref--, like code as follow. Though code won't be executed because of no VIR_STREAM_EVENT_HANGUP event taken place, but it is not good code:
Do you perhaps mean daemonStreamEvent() instead of daemonStreamFilter()? Because in daemonStreamFilter() we do both ref & unref - the unref is hidden in daemonFreeClientStream().
/* If we got HANGUP, we need to only send an empty * packet so the client sees an EOF and cleans up */ if (!stream->closed && !stream->recvEOF && (events & VIR_STREAM_EVENT_HANGUP)) { virNetMessagePtr msg; events &= ~(VIR_STREAM_EVENT_HANGUP); stream->tx = false; stream->recvEOF = true; if (!(msg = virNetMessageNew(false))) { daemonRemoveClientStream(client, stream); virNetServerClientClose(client); goto cleanup; } msg->cb = daemonStreamMessageFinished; msg->opaque = stream; stream->refs++; if (virNetServerProgramSendStreamData(stream->prog, client, msg, stream->procedure, stream->serial, "", 0) < 0) { virNetMessageFree(msg); daemonRemoveClientStream(client, stream); virNetServerClientClose(client); goto cleanup; }
Again, the unref is hidden in daemonStreamMessageFinished() which calls daemonFreeClientStream(). The msg->cb is guaranteed to be called eventually upon successful return from virNetServerProgramSendStreamData(). That's why you don't see the direct unref in the success path.
} 2. call virNetServerClientClose() is still inappropriate in because the it may free the client resource which other threads need ref, may be replace it with virNetServerClientImmediateClose() is better, the daemon can process client resource release unified
I've seen the patch you send, but it was without any commit message so no one had any idea why the change is needed. I understand your argument here, but I'm having hard time finding practical example in the code. I mean, have you seen such issue in real?
3. Segmentfault may still exists when another thread call remoteClientCloseFunc in virNetServerClientClose()(for example, when new force console session break down existed console session, and existed console session 's client close the session simutaneously, but remoteClientCloseFunc not lock(priv->lock), so the resource maybe released when daemonStreamEvent ref)
Again, sounds plausible, but this is a subset of the previous problem since remoteClientCloseFunc() is called from virNetServerClientCloseLocked() which is called from virNetServerClientClose(). Michal

Michal Privoznik 下午5:58 (1小时前) 发送至 我、 libvir-list On 11/28/19 4:09 AM, Lance Liu wrote:
Hi, Michal: I think the daemonFreeClientStream(NULL, stream); in your patch should line above virMutexUnlock(&stream->priv->lock), or else there is issue WAW.
I'm sorry, I don't know what WAW is. write after write, daemonFreeClientStream() outside stream->priv->lock may lead to two threads call daemonFreeClientStream() simultaneously
That is two threads may sub stream->
Lance Liu <liu.lance.89@gmail.com <mailto:liu.lance.89@gmail.com>> 于 2019年11月26日周二 下午1:54写道:
Yeah, your patch is perfectly fine for stream freed problem. But I have reviewed code in daemonStreamEvent() and daemonStreamFilter() again, and I think there still two issue need to be reconsidered: 1. stream->ref only ++ in daemonStreamFilter, except for error occurred call daemonRemoveClientStream() lead to ref--, like code as follow. Though code won't be executed because of no VIR_STREAM_EVENT_HANGUP event taken place, but it is not good code:
Do you perhaps mean daemonStreamEvent() instead of daemonStreamFilter()? Because in daemonStreamFilter() we do both ref & unref - the unref is hidden in daemonFreeClientStream(). Again, the unref is hidden in daemonStreamMessageFinished() which calls daemonFreeClientStream(). The msg->cb is guaranteed to be called eventually upon successful return from virNetServerProgramSendStreamData(). That's why you don't see the direct unref in the success path. Yeah, got it. I missed that!
} 2. call virNetServerClientClose() is still inappropriate in because the it may free the client resource which other threads need ref, may be replace it with virNetServerClientImmediateClose() is better, the daemon can process client resource release unified
I've seen the patch you send, but it was without any commit message so no one had any idea why the change is needed. I understand your argument here, but I'm having hard time finding practical example in the code. I mean, have you seen such issue in real?
3. Segmentfault may still exists when another thread call remoteClientCloseFunc in virNetServerClientClose()(for example, when new force console session break down existed console session, and existed console session 's client close the session simutaneously, but remoteClientCloseFunc not lock(priv->lock), so the resource maybe released when daemonStreamEvent ref)
Again, sounds plausible, but this is a subset of the previous problem since remoteClientCloseFunc() is called from virNetServerClientCloseLocked() which is called from virNetServerClientClose(). I'll show you how to get the bug。Use gdb attach the libvirt daemon and break in the middle of daemonStreamEvent()(with event == 4, called by new force console break down existed console session), then exit from the previous virsh console client, so libvirt daemon can percent console client connection fd closed, then call the virNetServerClientClose() in main thread, because the callback remote remoteClientCloseFunc() downes not lock priv->lock, so it will release stream directly, regardless anther thread reference it in daemonStreamEvent() Michal Privoznik <mprivozn@redhat.com> 于2019年11月28日周四 下午5:58写道:
On 11/28/19 4:09 AM, Lance Liu wrote:
Hi, Michal: I think the daemonFreeClientStream(NULL, stream); in your patch should line above virMutexUnlock(&stream->priv->lock), or else there is issue WAW.
I'm sorry, I don't know what WAW is.
That is two threads may sub stream->
Lance Liu <liu.lance.89@gmail.com <mailto:liu.lance.89@gmail.com>> 于 2019年11月26日周二 下午1:54写道:
Yeah, your patch is perfectly fine for stream freed problem. But I have reviewed code in daemonStreamEvent() and daemonStreamFilter() again, and I think there still two issue need to be reconsidered: 1. stream->ref only ++ in daemonStreamFilter, except for error occurred call daemonRemoveClientStream() lead to ref--, like code as follow. Though code won't be executed because of no VIR_STREAM_EVENT_HANGUP event taken place, but it is not good code:
Do you perhaps mean daemonStreamEvent() instead of daemonStreamFilter()? Because in daemonStreamFilter() we do both ref & unref - the unref is hidden in daemonFreeClientStream().
/* If we got HANGUP, we need to only send an empty * packet so the client sees an EOF and cleans up */ if (!stream->closed && !stream->recvEOF && (events & VIR_STREAM_EVENT_HANGUP)) { virNetMessagePtr msg; events &= ~(VIR_STREAM_EVENT_HANGUP); stream->tx = false; stream->recvEOF = true; if (!(msg = virNetMessageNew(false))) { daemonRemoveClientStream(client, stream); virNetServerClientClose(client); goto cleanup; } msg->cb = daemonStreamMessageFinished; msg->opaque = stream; stream->refs++; if (virNetServerProgramSendStreamData(stream->prog, client, msg, stream->procedure, stream->serial, "", 0) < 0) { virNetMessageFree(msg); daemonRemoveClientStream(client, stream); virNetServerClientClose(client); goto cleanup; }
Again, the unref is hidden in daemonStreamMessageFinished() which calls daemonFreeClientStream(). The msg->cb is guaranteed to be called eventually upon successful return from virNetServerProgramSendStreamData(). That's why you don't see the direct unref in the success path.
} 2. call virNetServerClientClose() is still inappropriate in because the it may free the client resource which other threads need ref, may be replace it with virNetServerClientImmediateClose() is better, the daemon can process client resource release unified
I've seen the patch you send, but it was without any commit message so no one had any idea why the change is needed. I understand your argument here, but I'm having hard time finding practical example in the code. I mean, have you seen such issue in real?
3. Segmentfault may still exists when another thread call remoteClientCloseFunc in virNetServerClientClose()(for example, when new force console session break down existed console session, and existed console session 's client close the session simutaneously, but remoteClientCloseFunc not lock(priv->lock), so the resource maybe released when daemonStreamEvent ref)
Again, sounds plausible, but this is a subset of the previous problem since remoteClientCloseFunc() is called from virNetServerClientCloseLocked() which is called from virNetServerClientClose().
Michal

On Mon, Nov 25, 2019 at 05:49:35PM +0100, Michal Privoznik wrote:
In v5.9.0-273-g8ecab214de I've tried to fix a lock ordering problem, but introduced a crasher. Problem is that because the client lock is unlocked (in order to honour lock ordering) the stream we are currently checking in daemonStreamFilter() might be freed and thus stream->priv might not even exist when the control get to virMutexLock() call.
To resolve this, grab an extra reference to the stream and handle its cleanup should the refcounter reach zero after the deref. If that's the case and we are the only ones holding a reference to the stream, we MUST return a positive value to make virNetServerClientDispatchRead() break its loop where it iterates over filters. The problem is, if we did not do so, then "filter = filter->next" line will read from a memory that was just freed (freeing a stream also unregisters its filter).
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
Reproducing this issue is very easy:
1) put sleep(5) right after virObjectUnlock(client) in the fist hunk, 2) virsh console --force $dom and type something so that the stream has some data to process 3) while 2) is still running, run the same command from another terminal 4) observe libvirtd crash
src/remote/remote_daemon_stream.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
Reviewed-by: Ján Tomko <jtomko@redhat.com> Jano

On Fri, Nov 29, 2019 at 02:04:35PM +0100, Michal Privoznik wrote:
On 11/29/19 1:30 PM, Ján Tomko wrote:
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Thanks, I'd like to push this one during the freeze since it fixes a crash. Are you okay with that?
Yes, that's what the freeze is for. Jano
Michal

On 11/29/19 2:25 PM, Ján Tomko wrote:
On Fri, Nov 29, 2019 at 02:04:35PM +0100, Michal Privoznik wrote:
On 11/29/19 1:30 PM, Ján Tomko wrote:
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Thanks, I'd like to push this one during the freeze since it fixes a crash. Are you okay with that?
Yes, that's what the freeze is for.
Thanks, pushed. Michal
participants (3)
-
Ján Tomko
-
Lance Liu
-
Michal Privoznik