On 03/01/2018 04:43 AM, xinhua.Cao wrote:
From: l00425170 <liujunjie23(a)huawei.com>
When libvirt client registers callback function,
remoteDispatchConnectDomainEventCallbackRegisterAny is called, then virInsertElementsN
will be called to add new callback for struct daemonClientPrivate *priv.
But at the same time, if the corresponding socket is closed (one way is to kill the
client process), the close callback remoteClientCloseFunc is called, then
remoteFreePrivCallbacks(priv) is called to free priv. So there exists data race to
cause libvirtd coredump.
More details, in add callback, when memset for ptrptr (that is priv) is done in
virExpandN at the process of virInsertElementsN. At this time point, priv can be
free to be 0x0 when close client.
So when the procedure goes to memcpy in virInsertElementsN, the libvirtd will coredump.
The description above is not easy to read/understand without looking at
the code as well. Took a while to chase through these and understand
what the issue is (or at least what it appears to be).
So the problem is partially that remoteClientFreePrivateCallbacks is
called without taking the priv->lock.
But it may also be partially that the *EventCallback Register/Deregister
API's will check conn->priv and then take the priv->lock (unlike the
sasl API code).
The solution you have come up with is to add a bool @clientClosed to
daemonClientPrivate to be set when remoteClientCloseFunc is fired. Then
in the various remoteDispatchConnect*EventRegister calls, you check if
that bool is set before continuing. IOW: Don't allow registering a new
one if it's determined that remoteClientCloseFunc already ran.
So my first thought is - why limit to just those API's? Couldn't any one
of the API's that calls virNetServerClientGetPrivateData to fetch @priv
be "interrupted [in]appropriately" prior to checking "if
(!priv->conn)"?
A @priv that could have been deleted by something else? I'm probably
over thinking though ;-)
Anyway, let's assume that cannot happen. Let's assume that checking
priv->conn is still "good enough". Thus there's two things that I'd
like
to see tested/done before adding a new boolean:
1. Rather the check "if (!priv->conn)" before the virMutexLock, move
the virMutexLock(&priv->lock); to before the if condition. You'll note
the "if" failure will "goto cleanup" where cleanup: will
virMutexUnlock
even though we don't have the lock. We don't check the error, so no big
deal. This should be a separate/initial patch.
2. Add the virMutexLock before remoteClientFreePrivateCallbacks and
virMutexUnlock afterwards. This would be a second patch.
Can you reproduce the stack trace under those conditions? If so, then
there may be a really nasty TOCTOU problem which could be quite invasive
to resolve properly. If not, then it seems "priv->conn" is "good
enough"
as the gating condition.
If you can reproduce the condition, then since remoteClientCloseFunc is
essentially being in the middle of a close is either of the client
fields "delayedClose" or "wantClose" already set? (easy enough to
check
while debugging). If so, can we use that knowledge somehow to not allow
new EventCallbacks to be added?
I think the "worst" that could happen with the proper locking in place
for remoteClientFreePrivateCallbacks is some event callback is added and
leaked because we were able to RegisterAny after remoteClientCloseFunc
ran. In which case, so what, we were dying anyway...
NB: Couple of other things to think about below...
And the stack are as follows:
Thread 1 (Thread 0x7feadf7fe700 (LWP 8978)):
#0 0x00007feb7b77f314 in __memcpy_ssse3_back() from /usr/lib64/libc.so.6
#1 0x00007feb7e769a82 in memcpy
(__len=8, __src=0x7feadf7fdb58, __dest=<optimized out>)
at /usr/include/bits/string3.h:51
#2 virInsertElementsN
(ptrptr=ptrptr@entry=0x558e9c36fae8, size=size@entry=8,
at=<optimized out>, at@entry=18446744073709551615,
countptr=countptr@entry=0x558e9c36faf0, add=add@entry=1,
newelems=newelems@entry=0x7feadf7fdb58, clearOriginal=clearOriginal@entry=true,
inPlace=inPlace@entry=false, report=report@entry=true,
domcode=domcode@entry=7, filename=filename@entry=0x558e9af16107
"remote.c",
funcname=funcname@entry=0x558e9af24600 <__FUNCTION__.48097>
"remoteDispatchConnectDomainEventCallbackRegisterAny",
linenr=linenr@entry=4424)
at util/viralloc.c:454
#3 0x0000558e9aed7001 in remoteDispatchConnectDomainEventCallbackRegisterAny
(server=<optimized out>, msg=<optimized out>, ret=0x7feab805e6f0,
args=0x7feab806aa40, rerr=0x7feadf7fdc90, client=<optimized out>)
at remote.c:4422
#4 remoteDispatchConnectDomainEventCallbackRegisterAnyHelper
(server=<optimized out>, client=<optimized out>, msg=<optimized
out>,
rerr=0x7feadf7fdc90, args=0x7feab806aa40, ret=0x7feab805e6f0)
at remote_dispatch.h:424
#5 0x0000558e9af0ca48 in virNetServerProgramDispatchCall
(msg=0x558e9c3925c0, client=0x558e9c36f970, server=0x558e9c351570,
prog=0x558e9c367d50)
at rpc/virnetserverprogram.c:474
#6 virNetServerProgramDispatch
(prog=0x558e9c367d50, server=server@entry=0x558e9c351570, client=0x558e9c36f970,
msg=0x558e9c3925c0)
at rpc/virnetserverprogram.c:311
#7 0x0000558e9af085ed in virNetServerProcessMsg
(msg=<optimized out>, prog=<optimized out>, client=<optimized
out>,
srv=0x558e9c351570)
at rpc/virnetserver.c:148
#8 virNetServerHandleJob
(jobOpaque=<optimized out>, opaque=0x558e9c351570)
at rpc/virnetserver.c:169
#9 0x00007feb7e7f1861 in virThreadPoolWorker
(opaque=opaque@entry=0x558e9c36b5f0)
at util/virthreadpool.c:167
#10 0x00007feb7e7f0b78 in virThreadHelper
(data=<optimized out>)
at util/virthread.c:219
#11 0x00007feb7b9fcdc5 in start_thread () from /usr/lib64/libpthread.so.0
#12 0x00007feb7b72b75d in clone () from /usr/lib64/libc.so.6
Signed-off-by: l00425170 <liujunjie23(a)huawei.com>
^^^^^^^^^^
This needs to be a more proper name.
---
src/remote/remote_daemon.h | 2 ++
src/remote/remote_daemon_dispatch.c | 46 ++++++++++++++++++++++++++++++++++++-
2 files changed, 47 insertions(+), 1 deletion(-)
diff --git a/src/remote/remote_daemon.h b/src/remote/remote_daemon.h
index 4467f71..8accafc 100644
--- a/src/remote/remote_daemon.h
+++ b/src/remote/remote_daemon.h
@@ -76,6 +76,8 @@ struct daemonClientPrivate {
virConnectPtr conn;
daemonClientStreamPtr streams;
+
+ bool clientClosed;
If we find we really need to add it, I think this bool should be moved
before the various EventCallback defs and renamed to "inhibitEventAdd"
(or something similar) with a comment before it indicating how it'll be
used.
"This will be set when the remoteClientCloseFunc is called due to socket
closure as a result of a client process being killed. It will ensure no
other thread could be adding to the various client private *EventCallbacks"
};
diff --git a/src/remote/remote_daemon_dispatch.c b/src/remote/remote_daemon_dispatch.c
index fdb0a36..59e69e6 100644
--- a/src/remote/remote_daemon_dispatch.c
+++ b/src/remote/remote_daemon_dispatch.c
@@ -1752,8 +1752,12 @@ static void remoteClientCloseFunc(virNetServerClientPtr client)
daemonRemoveAllClientStreams(priv->streams);
/* Deregister event delivery callback */
- if (priv->conn)
+ if (priv->conn) {
+ virMutexLock(&priv->lock);
remoteClientFreePrivateCallbacks(priv);
+ priv->clientClosed = true;
+ virMutexUnlock(&priv->lock);
+ }
}
@@ -3889,6 +3893,11 @@ remoteDispatchConnectDomainEventRegister(virNetServerPtr server
ATTRIBUTE_UNUSED
virMutexLock(&priv->lock);
+ if (priv->clientClosed) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection want
close"));
If we find we need to add it, then the message could be "connection
closing".
John
+ goto cleanup;
+ }
+
/* If we call register first, we could append a complete callback
* to our array, but on OOM append failure, we'd have to then hope
* deregister works to undo our register. So instead we append an
@@ -4116,6 +4125,11 @@ remoteDispatchConnectDomainEventRegisterAny(virNetServerPtr server
ATTRIBUTE_UNU
virMutexLock(&priv->lock);
+ if (priv->clientClosed) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection want
close"));
+ goto cleanup;
+ }
+
/* We intentionally do not use VIR_DOMAIN_EVENT_ID_LAST here; any
* new domain events added after this point should only use the
* modern callback style of RPC. */
@@ -4192,6 +4206,11 @@
remoteDispatchConnectDomainEventCallbackRegisterAny(virNetServerPtr server ATTRI
virMutexLock(&priv->lock);
+ if (priv->clientClosed) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection want
close"));
+ goto cleanup;
+ }
+
if (args->dom &&
!(dom = get_nonnull_domain(priv->conn, *args->dom)))
goto cleanup;
@@ -5702,6 +5721,11 @@ remoteDispatchConnectNetworkEventRegisterAny(virNetServerPtr
server ATTRIBUTE_UN
virMutexLock(&priv->lock);
+ if (priv->clientClosed) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection want
close"));
+ goto cleanup;
+ }
+
if (args->net &&
!(net = get_nonnull_network(priv->conn, *args->net)))
goto cleanup;
@@ -5824,6 +5848,11 @@ remoteDispatchConnectStoragePoolEventRegisterAny(virNetServerPtr
server ATTRIBUT
virMutexLock(&priv->lock);
+ if (priv->clientClosed) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection want
close"));
+ goto cleanup;
+ }
+
if (args->pool &&
!(pool = get_nonnull_storage_pool(priv->conn, *args->pool)))
goto cleanup;
@@ -5945,6 +5974,11 @@ remoteDispatchConnectNodeDeviceEventRegisterAny(virNetServerPtr
server ATTRIBUTE
virMutexLock(&priv->lock);
+ if (priv->clientClosed) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection want
close"));
+ goto cleanup;
+ }
+
if (args->dev &&
!(dev = get_nonnull_node_device(priv->conn, *args->dev)))
goto cleanup;
@@ -6066,6 +6100,11 @@ remoteDispatchConnectSecretEventRegisterAny(virNetServerPtr server
ATTRIBUTE_UNU
virMutexLock(&priv->lock);
+ if (priv->clientClosed) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection want
close"));
+ goto cleanup;
+ }
+
if (args->secret &&
!(secret = get_nonnull_secret(priv->conn, *args->secret)))
goto cleanup;
@@ -6188,6 +6227,11 @@ qemuDispatchConnectDomainMonitorEventRegister(virNetServerPtr
server ATTRIBUTE_U
virMutexLock(&priv->lock);
+ if (priv->clientClosed) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection want
close"));
+ goto cleanup;
+ }
+
if (args->dom &&
!(dom = get_nonnull_domain(priv->conn, *args->dom)))
goto cleanup;