[libvirt] [PATCH v4 0/2] daemon: fix termination/reload issues

This is a repost/fixup of Nikolay's v3: https://www.redhat.com/archives/libvir-list/2017-October/msg01089.html The primary difference here is a reorder of the patches to perform the proper refcnt operations before reordering the shutdown path in order to clean up servers out of the hash table sooner than later. In particular, the lock_manager sequencing in order to make sure that the virObjectUnref is done "orderly". Nikolay - if you're fine with these changes let me know and I can then push them (unless of course someone else ACK's before that). Hopefully the change to lock_daemon makes sense. Theoretically spaking the virHashFree in virNetDaemonDispose is probably now superfluous since patch 2/2 will remove all the servers much sooner, but let's keep it there just to be safe! Nikolay Shirokovskiy (2): rpc: When adding srv to dmn servers, need to add ref libvirtd: fix crash on termination src/locking/lock_daemon.c | 19 ++++++++++++++----- src/rpc/virnetdaemon.c | 4 ++++ 2 files changed, 18 insertions(+), 5 deletions(-) -- 2.13.6

From: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> Commit id '252610f7d' modified net server management to use a hash table to store/manage the various servers; however, during virNetDaemonAddServerPostExec an @srv object is created, added to the dmn->servers hash table, but did not increment the object refcnt like was done during virNetDaemonAddServer as if @srv were being created for the first time. Without the proper ref a subsequent virObjectUnref done during virNetDaemonDispose when virHashFree calls the hash table entry @dataFree function virObjectFreeHashData could inadvertently free the memory before some caller is really done with it or cause the caller to attempt to free something that's already been freed (and it no longer necessarily owns). Additionally, since commit id 'f08b1c58f3' removed the @srv from virLockManager in favor of using virNetDaemonGetServer which creates a ref on @srv, we need to modify the lock_manager code in order to properly handle and dispose the references to the @srv object allowing either the cleanup processing or the virNetDaemonDispose processing to remove the last ref to the object. Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/locking/lock_daemon.c | 19 ++++++++++++++----- src/rpc/virnetdaemon.c | 2 ++ 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/src/locking/lock_daemon.c b/src/locking/lock_daemon.c index fe3eaf9032..ee2c35fdb8 100644 --- a/src/locking/lock_daemon.c +++ b/src/locking/lock_daemon.c @@ -1312,14 +1312,14 @@ int main(int argc, char **argv) { srv = virNetDaemonGetServer(lockDaemon->dmn, "virtlockd"); if ((rv = virLockDaemonSetupNetworkingSystemD(srv)) < 0) { ret = VIR_LOCK_DAEMON_ERR_NETWORK; - goto cleanup; + goto cleanup_srvref; } /* Only do this, if systemd did not pass a FD */ if (rv == 0 && virLockDaemonSetupNetworkingNative(srv, sock_file) < 0) { ret = VIR_LOCK_DAEMON_ERR_NETWORK; - goto cleanup; + goto cleanup_srvref; } } else if (rv == 1) { srv = virNetDaemonGetServer(lockDaemon->dmn, "virtlockd"); @@ -1333,7 +1333,7 @@ int main(int argc, char **argv) { if ((virLockDaemonSetupSignals(lockDaemon->dmn)) < 0) { ret = VIR_LOCK_DAEMON_ERR_SIGNAL; - goto cleanup; + goto cleanup_srvref; } if (!(lockProgram = virNetServerProgramNew(VIR_LOCK_SPACE_PROTOCOL_PROGRAM, @@ -1341,14 +1341,17 @@ int main(int argc, char **argv) { virLockSpaceProtocolProcs, virLockSpaceProtocolNProcs))) { ret = VIR_LOCK_DAEMON_ERR_INIT; - goto cleanup; + goto cleanup_srvref; } if (virNetServerAddProgram(srv, lockProgram) < 0) { ret = VIR_LOCK_DAEMON_ERR_INIT; - goto cleanup; + goto cleanup_srvref; } + /* Completed srv usage from virNetDaemonGetServer */ + virObjectUnref(srv); + /* Disable error func, now logging is setup */ virSetErrorFunc(NULL, virLockDaemonErrorHandler); @@ -1403,4 +1406,10 @@ int main(int argc, char **argv) { no_memory: VIR_ERROR(_("Can't allocate memory")); exit(EXIT_FAILURE); + + cleanup_srvref: + /* Unref for virNetDaemonGetServer ref - still have "our" ref from + * allocation and possibly a ref for being in netserver hash table */ + virObjectUnref(srv); + goto cleanup; } diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index e3b9390af2..d970c47ad4 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -313,6 +313,8 @@ virNetDaemonAddServerPostExec(virNetDaemonPtr dmn, if (virHashAddEntry(dmn->servers, serverName, srv) < 0) goto error; + virObjectRef(srv); + virJSONValueFree(object); virObjectUnlock(dmn); return srv; -- 2.13.6

On 27.10.2017 08:26, John Ferlan wrote:
From: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Commit id '252610f7d' modified net server management to use a hash table to store/manage the various servers; however, during virNetDaemonAddServerPostExec an @srv object is created, added to the dmn->servers hash table, but did not increment the object refcnt like was done during virNetDaemonAddServer as if @srv were being created for the first time.
I'm not agree that 252610f7d introduced the problem. Before this commit the situation was the same as now. virNetDaemonAddServer takes extra reference, virNetDaemonAddServerPostExec does not take extra reference, virNetDaemonDispose unref every server in array (just as hash table does). lock daemon does not store object returned by virNetDaemonAddServerPostExec and log daemon store the object and unref it in virLogDaemonFree.
Without the proper ref a subsequent virObjectUnref done during virNetDaemonDispose when virHashFree calls the hash table entry @dataFree function virObjectFreeHashData could inadvertently free the memory before some caller is really done with it or cause the caller to attempt to free something that's already been freed (and it no longer necessarily owns).
Additionally, since commit id 'f08b1c58f3' removed the @srv from virLockManager in favor of using virNetDaemonGetServer which creates a ref on @srv, we need to modify the lock_manager code in order to properly handle and dispose the references to the @srv object allowing either the cleanup processing or the virNetDaemonDispose processing to remove the last ref to the object.
But @srv is unrefed on cleanup already. What use of creating extra label? Besides this patch is missing unref @srv in virLockDaemonNewPostExecRestart in the original patch. I think it is necessary.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/locking/lock_daemon.c | 19 ++++++++++++++----- src/rpc/virnetdaemon.c | 2 ++ 2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/src/locking/lock_daemon.c b/src/locking/lock_daemon.c index fe3eaf9032..ee2c35fdb8 100644 --- a/src/locking/lock_daemon.c +++ b/src/locking/lock_daemon.c @@ -1312,14 +1312,14 @@ int main(int argc, char **argv) { srv = virNetDaemonGetServer(lockDaemon->dmn, "virtlockd"); if ((rv = virLockDaemonSetupNetworkingSystemD(srv)) < 0) { ret = VIR_LOCK_DAEMON_ERR_NETWORK; - goto cleanup; + goto cleanup_srvref; }
/* Only do this, if systemd did not pass a FD */ if (rv == 0 && virLockDaemonSetupNetworkingNative(srv, sock_file) < 0) { ret = VIR_LOCK_DAEMON_ERR_NETWORK; - goto cleanup; + goto cleanup_srvref; } } else if (rv == 1) { srv = virNetDaemonGetServer(lockDaemon->dmn, "virtlockd"); @@ -1333,7 +1333,7 @@ int main(int argc, char **argv) {
if ((virLockDaemonSetupSignals(lockDaemon->dmn)) < 0) { ret = VIR_LOCK_DAEMON_ERR_SIGNAL; - goto cleanup; + goto cleanup_srvref; }
if (!(lockProgram = virNetServerProgramNew(VIR_LOCK_SPACE_PROTOCOL_PROGRAM, @@ -1341,14 +1341,17 @@ int main(int argc, char **argv) { virLockSpaceProtocolProcs, virLockSpaceProtocolNProcs))) { ret = VIR_LOCK_DAEMON_ERR_INIT; - goto cleanup; + goto cleanup_srvref; }
if (virNetServerAddProgram(srv, lockProgram) < 0) { ret = VIR_LOCK_DAEMON_ERR_INIT; - goto cleanup; + goto cleanup_srvref; }
+ /* Completed srv usage from virNetDaemonGetServer */ + virObjectUnref(srv); + /* Disable error func, now logging is setup */ virSetErrorFunc(NULL, virLockDaemonErrorHandler);
@@ -1403,4 +1406,10 @@ int main(int argc, char **argv) { no_memory: VIR_ERROR(_("Can't allocate memory")); exit(EXIT_FAILURE); + + cleanup_srvref: + /* Unref for virNetDaemonGetServer ref - still have "our" ref from + * allocation and possibly a ref for being in netserver hash table */ + virObjectUnref(srv); + goto cleanup; } diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index e3b9390af2..d970c47ad4 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -313,6 +313,8 @@ virNetDaemonAddServerPostExec(virNetDaemonPtr dmn, if (virHashAddEntry(dmn->servers, serverName, srv) < 0) goto error;
+ virObjectRef(srv); + virJSONValueFree(object); virObjectUnlock(dmn); return srv;

On 10/27/2017 02:29 AM, Nikolay Shirokovskiy wrote:
On 27.10.2017 08:26, John Ferlan wrote:
From: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Commit id '252610f7d' modified net server management to use a hash table to store/manage the various servers; however, during virNetDaemonAddServerPostExec an @srv object is created, added to the dmn->servers hash table, but did not increment the object refcnt like was done during virNetDaemonAddServer as if @srv were being created for the first time.
I'm not agree that 252610f7d introduced the problem. Before this commit the situation was the same as now. virNetDaemonAddServer takes extra reference, virNetDaemonAddServerPostExec does not take extra reference, virNetDaemonDispose unref every server in array (just as hash table does). lock daemon does not store object returned by virNetDaemonAddServerPostExec and log daemon store the object and unref it in virLogDaemonFree.
My point wasn't to blame commit 252610f7d entirely, but it helps point out where the logic was added to use hash tables instead of linked lists. Still true though that for virNetDaemonAddServer we've ref'd when adding to the hash table and for virNetDaemonAddServerPostExec we do not.
Without the proper ref a subsequent virObjectUnref done during virNetDaemonDispose when virHashFree calls the hash table entry @dataFree function virObjectFreeHashData could inadvertently free the memory before some caller is really done with it or cause the caller to attempt to free something that's already been freed (and it no longer necessarily owns).
Additionally, since commit id 'f08b1c58f3' removed the @srv from virLockManager in favor of using virNetDaemonGetServer which creates a ref on @srv, we need to modify the lock_manager code in order to properly handle and dispose the references to the @srv object allowing either the cleanup processing or the virNetDaemonDispose processing to remove the last ref to the object.
But @srv is unrefed on cleanup already. What use of creating extra label?
You have to read my last response to your patch, but again... Consider that for the virLockDaemonNew path (e.g. rv == 0), there are two refs created - 1 for the alloc and 1 for the insert to hash table. When virNetDaemonGetServer is called a 3rd ref is generated. So when "undo-ing" your refs - you need to cover all your bases. For any failure after virNetDaemonGetServer succeeds, one needs to virObjectUnref that @srv ref, hence the new label. If we reach the cleanup: label, then the virObjectUnref there is the same as the virObjectUnref in log_daemon.c w/r/t virLogDaemonFree and the virObjectUnref(logd->srv) - that is at this point we're declaring we're done with @srv. If there's still a ref because virNetDaemonDispose hasn't run, then when that runs it'll free things; otherwise, @srv would be free'd on our Unref.
Besides this patch is missing unref @srv in virLockDaemonNewPostExecRestart in the original patch. I think it is necessary.
And I don't believe that's necessary. What purpose does it serve? When we leave virLockDaemonNewPostExecRestart or virNetDaemonAddServerPostExec there should be 2 refs *just like* there were when we left virLockDaemonNew. That way when we reach the "else (rv == 1)" code (back in main) and call virNetDaemonGetServer we create that 3rd ref to @srv just like we had in the rv == 0 path. If virLockDaemonPostExecRestart fails, eg, rv < 0, then all bets are off, the @srv == NULL, the virObjectUnref does nothing and we die quite ungracefully. At this point in time virLockDaemonPostExecRestart only ever returns -1, 0, or 1. Hopefully this makes sense. John
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/locking/lock_daemon.c | 19 ++++++++++++++----- src/rpc/virnetdaemon.c | 2 ++ 2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/src/locking/lock_daemon.c b/src/locking/lock_daemon.c index fe3eaf9032..ee2c35fdb8 100644 --- a/src/locking/lock_daemon.c +++ b/src/locking/lock_daemon.c @@ -1312,14 +1312,14 @@ int main(int argc, char **argv) { srv = virNetDaemonGetServer(lockDaemon->dmn, "virtlockd"); if ((rv = virLockDaemonSetupNetworkingSystemD(srv)) < 0) { ret = VIR_LOCK_DAEMON_ERR_NETWORK; - goto cleanup; + goto cleanup_srvref; }
/* Only do this, if systemd did not pass a FD */ if (rv == 0 && virLockDaemonSetupNetworkingNative(srv, sock_file) < 0) { ret = VIR_LOCK_DAEMON_ERR_NETWORK; - goto cleanup; + goto cleanup_srvref; } } else if (rv == 1) { srv = virNetDaemonGetServer(lockDaemon->dmn, "virtlockd"); @@ -1333,7 +1333,7 @@ int main(int argc, char **argv) {
if ((virLockDaemonSetupSignals(lockDaemon->dmn)) < 0) { ret = VIR_LOCK_DAEMON_ERR_SIGNAL; - goto cleanup; + goto cleanup_srvref; }
if (!(lockProgram = virNetServerProgramNew(VIR_LOCK_SPACE_PROTOCOL_PROGRAM, @@ -1341,14 +1341,17 @@ int main(int argc, char **argv) { virLockSpaceProtocolProcs, virLockSpaceProtocolNProcs))) { ret = VIR_LOCK_DAEMON_ERR_INIT; - goto cleanup; + goto cleanup_srvref; }
if (virNetServerAddProgram(srv, lockProgram) < 0) { ret = VIR_LOCK_DAEMON_ERR_INIT; - goto cleanup; + goto cleanup_srvref; }
+ /* Completed srv usage from virNetDaemonGetServer */ + virObjectUnref(srv); + /* Disable error func, now logging is setup */ virSetErrorFunc(NULL, virLockDaemonErrorHandler);
@@ -1403,4 +1406,10 @@ int main(int argc, char **argv) { no_memory: VIR_ERROR(_("Can't allocate memory")); exit(EXIT_FAILURE); + + cleanup_srvref: + /* Unref for virNetDaemonGetServer ref - still have "our" ref from + * allocation and possibly a ref for being in netserver hash table */ + virObjectUnref(srv); + goto cleanup; } diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index e3b9390af2..d970c47ad4 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -313,6 +313,8 @@ virNetDaemonAddServerPostExec(virNetDaemonPtr dmn, if (virHashAddEntry(dmn->servers, serverName, srv) < 0) goto error;
+ virObjectRef(srv); + virJSONValueFree(object); virObjectUnlock(dmn); return srv;

On 27.10.2017 13:44, John Ferlan wrote:
On 10/27/2017 02:29 AM, Nikolay Shirokovskiy wrote:
On 27.10.2017 08:26, John Ferlan wrote:
From: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Commit id '252610f7d' modified net server management to use a hash table to store/manage the various servers; however, during virNetDaemonAddServerPostExec an @srv object is created, added to the dmn->servers hash table, but did not increment the object refcnt like was done during virNetDaemonAddServer as if @srv were being created for the first time.
I'm not agree that 252610f7d introduced the problem. Before this commit the situation was the same as now. virNetDaemonAddServer takes extra reference, virNetDaemonAddServerPostExec does not take extra reference, virNetDaemonDispose unref every server in array (just as hash table does). lock daemon does not store object returned by virNetDaemonAddServerPostExec and log daemon store the object and unref it in virLogDaemonFree.
My point wasn't to blame commit 252610f7d entirely, but it helps point out where the logic was added to use hash tables instead of linked lists.
Still true though that for virNetDaemonAddServer we've ref'd when adding to the hash table and for virNetDaemonAddServerPostExec we do not.
Without the proper ref a subsequent virObjectUnref done during virNetDaemonDispose when virHashFree calls the hash table entry @dataFree function virObjectFreeHashData could inadvertently free the memory before some caller is really done with it or cause the caller to attempt to free something that's already been freed (and it no longer necessarily owns).
Additionally, since commit id 'f08b1c58f3' removed the @srv from virLockManager in favor of using virNetDaemonGetServer which creates a ref on @srv, we need to modify the lock_manager code in order to properly handle and dispose the references to the @srv object allowing either the cleanup processing or the virNetDaemonDispose processing to remove the last ref to the object.
But @srv is unrefed on cleanup already. What use of creating extra label?
You have to read my last response to your patch, but again... Consider that for the virLockDaemonNew path (e.g. rv == 0), there are two refs created - 1 for the alloc and 1 for the insert to hash table. When> virNetDaemonGetServer is called a 3rd ref is generated. So when "undo-ing" your refs - you need to cover all your bases. For any failure after virNetDaemonGetServer succeeds, one needs to virObjectUnref that @srv ref, hence the new label.
If we reach the cleanup: label, then the virObjectUnref there is the same as the virObjectUnref in log_daemon.c w/r/t virLogDaemonFree and the virObjectUnref(logd->srv) - that is at this point we're declaring we're done with @srv. If there's still a ref because virNetDaemonDispose hasn't run, then when that runs it'll free things; otherwise, @srv would be free'd on our Unref.
Ohh, I see know. You adressing another problem in virtlockd code. virLockDaemonNew forgets to unref @srv. I think this should be better addressed in virLockDaemonNew.
Besides this patch is missing unref @srv in virLockDaemonNewPostExecRestart in the original patch. I think it is necessary.
And I don't believe that's necessary. What purpose does it serve? When we leave virLockDaemonNewPostExecRestart or virNetDaemonAddServerPostExec there should be 2 refs *just like* there were when we left virLockDaemonNew. That way when we reach the "else (rv == 1)" code (back in main) and call virNetDaemonGetServer we create that 3rd ref to @srv just like we had in the rv == 0 path.
Or we can fix virLockDaemonNew to have only 1 refs. 1 ref is enough AFAIU.
If virLockDaemonPostExecRestart fails, eg, rv < 0, then all bets are off, the @srv == NULL, the virObjectUnref does nothing and we die quite ungracefully.
I think we better make virLockDaemonPostExecRestart to return with no references to net servers in case of fail then we have no problems.
At this point in time virLockDaemonPostExecRestart only ever returns -1, 0, or 1.
Hopefully this makes sense.
In short I would leave unref in virLockDaemonPostExecRestart of the original patch and add unref to virLockDaemonNew in a distinct patch.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/locking/lock_daemon.c | 19 ++++++++++++++----- src/rpc/virnetdaemon.c | 2 ++ 2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/src/locking/lock_daemon.c b/src/locking/lock_daemon.c index fe3eaf9032..ee2c35fdb8 100644 --- a/src/locking/lock_daemon.c +++ b/src/locking/lock_daemon.c @@ -1312,14 +1312,14 @@ int main(int argc, char **argv) { srv = virNetDaemonGetServer(lockDaemon->dmn, "virtlockd"); if ((rv = virLockDaemonSetupNetworkingSystemD(srv)) < 0) { ret = VIR_LOCK_DAEMON_ERR_NETWORK; - goto cleanup; + goto cleanup_srvref; }
/* Only do this, if systemd did not pass a FD */ if (rv == 0 && virLockDaemonSetupNetworkingNative(srv, sock_file) < 0) { ret = VIR_LOCK_DAEMON_ERR_NETWORK; - goto cleanup; + goto cleanup_srvref; } } else if (rv == 1) { srv = virNetDaemonGetServer(lockDaemon->dmn, "virtlockd"); @@ -1333,7 +1333,7 @@ int main(int argc, char **argv) {
if ((virLockDaemonSetupSignals(lockDaemon->dmn)) < 0) { ret = VIR_LOCK_DAEMON_ERR_SIGNAL; - goto cleanup; + goto cleanup_srvref; }
if (!(lockProgram = virNetServerProgramNew(VIR_LOCK_SPACE_PROTOCOL_PROGRAM, @@ -1341,14 +1341,17 @@ int main(int argc, char **argv) { virLockSpaceProtocolProcs, virLockSpaceProtocolNProcs))) { ret = VIR_LOCK_DAEMON_ERR_INIT; - goto cleanup; + goto cleanup_srvref; }
if (virNetServerAddProgram(srv, lockProgram) < 0) { ret = VIR_LOCK_DAEMON_ERR_INIT; - goto cleanup; + goto cleanup_srvref; }
+ /* Completed srv usage from virNetDaemonGetServer */ + virObjectUnref(srv); + /* Disable error func, now logging is setup */ virSetErrorFunc(NULL, virLockDaemonErrorHandler);
@@ -1403,4 +1406,10 @@ int main(int argc, char **argv) { no_memory: VIR_ERROR(_("Can't allocate memory")); exit(EXIT_FAILURE); + + cleanup_srvref: + /* Unref for virNetDaemonGetServer ref - still have "our" ref from + * allocation and possibly a ref for being in netserver hash table */ + virObjectUnref(srv); + goto cleanup; } diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index e3b9390af2..d970c47ad4 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -313,6 +313,8 @@ virNetDaemonAddServerPostExec(virNetDaemonPtr dmn, if (virHashAddEntry(dmn->servers, serverName, srv) < 0) goto error;
+ virObjectRef(srv); + virJSONValueFree(object); virObjectUnlock(dmn); return srv;

On 27.10.2017 14:29, Nikolay Shirokovskiy wrote:
On 27.10.2017 13:44, John Ferlan wrote:
On 10/27/2017 02:29 AM, Nikolay Shirokovskiy wrote:
On 27.10.2017 08:26, John Ferlan wrote:
From: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Commit id '252610f7d' modified net server management to use a hash table to store/manage the various servers; however, during virNetDaemonAddServerPostExec an @srv object is created, added to the dmn->servers hash table, but did not increment the object refcnt like was done during virNetDaemonAddServer as if @srv were being created for the first time.
I'm not agree that 252610f7d introduced the problem. Before this commit the situation was the same as now. virNetDaemonAddServer takes extra reference, virNetDaemonAddServerPostExec does not take extra reference, virNetDaemonDispose unref every server in array (just as hash table does). lock daemon does not store object returned by virNetDaemonAddServerPostExec and log daemon store the object and unref it in virLogDaemonFree.
My point wasn't to blame commit 252610f7d entirely, but it helps point out where the logic was added to use hash tables instead of linked lists.
Still true though that for virNetDaemonAddServer we've ref'd when adding to the hash table and for virNetDaemonAddServerPostExec we do not.
Without the proper ref a subsequent virObjectUnref done during virNetDaemonDispose when virHashFree calls the hash table entry @dataFree function virObjectFreeHashData could inadvertently free the memory before some caller is really done with it or cause the caller to attempt to free something that's already been freed (and it no longer necessarily owns).
Additionally, since commit id 'f08b1c58f3' removed the @srv from virLockManager in favor of using virNetDaemonGetServer which creates a ref on @srv, we need to modify the lock_manager code in order to properly handle and dispose the references to the @srv object allowing either the cleanup processing or the virNetDaemonDispose processing to remove the last ref to the object.
But @srv is unrefed on cleanup already. What use of creating extra label?
You have to read my last response to your patch, but again... Consider that for the virLockDaemonNew path (e.g. rv == 0), there are two refs created - 1 for the alloc and 1 for the insert to hash table. When> virNetDaemonGetServer is called a 3rd ref is generated. So when "undo-ing" your refs - you need to cover all your bases. For any failure after virNetDaemonGetServer succeeds, one needs to virObjectUnref that @srv ref, hence the new label.
If we reach the cleanup: label, then the virObjectUnref there is the same as the virObjectUnref in log_daemon.c w/r/t virLogDaemonFree and the virObjectUnref(logd->srv) - that is at this point we're declaring we're done with @srv. If there's still a ref because virNetDaemonDispose hasn't run, then when that runs it'll free things; otherwise, @srv would be free'd on our Unref.
Ohh, I see know. You adressing another problem in virtlockd code. virLockDaemonNew forgets to unref @srv. I think this should be better addressed in virLockDaemonNew.
Besides this patch is missing unref @srv in virLockDaemonNewPostExecRestart in the original patch. I think it is necessary.
And I don't believe that's necessary. What purpose does it serve? When we leave virLockDaemonNewPostExecRestart or virNetDaemonAddServerPostExec there should be 2 refs *just like* there were when we left virLockDaemonNew. That way when we reach the "else (rv == 1)" code (back in main) and call virNetDaemonGetServer we create that 3rd ref to @srv just like we had in the rv == 0 path.
Or we can fix virLockDaemonNew to have only 1 refs. 1 ref is enough AFAIU.
If virLockDaemonPostExecRestart fails, eg, rv < 0, then all bets are off, the @srv == NULL, the virObjectUnref does nothing and we die quite ungracefully.
I think we better make virLockDaemonPostExecRestart to return with no references to net servers in case of fail then we have no problems.
At this point in time virLockDaemonPostExecRestart only ever returns -1, 0, or 1.
Hopefully this makes sense.
In short I would leave unref in virLockDaemonPostExecRestart of the original patch and add unref to virLockDaemonNew in a distinct patch.
Particularly the former missing unref is introduced in fa1420736 and the latter is in f08b1c58f3.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/locking/lock_daemon.c | 19 ++++++++++++++----- src/rpc/virnetdaemon.c | 2 ++ 2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/src/locking/lock_daemon.c b/src/locking/lock_daemon.c index fe3eaf9032..ee2c35fdb8 100644 --- a/src/locking/lock_daemon.c +++ b/src/locking/lock_daemon.c @@ -1312,14 +1312,14 @@ int main(int argc, char **argv) { srv = virNetDaemonGetServer(lockDaemon->dmn, "virtlockd"); if ((rv = virLockDaemonSetupNetworkingSystemD(srv)) < 0) { ret = VIR_LOCK_DAEMON_ERR_NETWORK; - goto cleanup; + goto cleanup_srvref; }
/* Only do this, if systemd did not pass a FD */ if (rv == 0 && virLockDaemonSetupNetworkingNative(srv, sock_file) < 0) { ret = VIR_LOCK_DAEMON_ERR_NETWORK; - goto cleanup; + goto cleanup_srvref; } } else if (rv == 1) { srv = virNetDaemonGetServer(lockDaemon->dmn, "virtlockd"); @@ -1333,7 +1333,7 @@ int main(int argc, char **argv) {
if ((virLockDaemonSetupSignals(lockDaemon->dmn)) < 0) { ret = VIR_LOCK_DAEMON_ERR_SIGNAL; - goto cleanup; + goto cleanup_srvref; }
if (!(lockProgram = virNetServerProgramNew(VIR_LOCK_SPACE_PROTOCOL_PROGRAM, @@ -1341,14 +1341,17 @@ int main(int argc, char **argv) { virLockSpaceProtocolProcs, virLockSpaceProtocolNProcs))) { ret = VIR_LOCK_DAEMON_ERR_INIT; - goto cleanup; + goto cleanup_srvref; }
if (virNetServerAddProgram(srv, lockProgram) < 0) { ret = VIR_LOCK_DAEMON_ERR_INIT; - goto cleanup; + goto cleanup_srvref; }
+ /* Completed srv usage from virNetDaemonGetServer */ + virObjectUnref(srv); + /* Disable error func, now logging is setup */ virSetErrorFunc(NULL, virLockDaemonErrorHandler);
@@ -1403,4 +1406,10 @@ int main(int argc, char **argv) { no_memory: VIR_ERROR(_("Can't allocate memory")); exit(EXIT_FAILURE); + + cleanup_srvref: + /* Unref for virNetDaemonGetServer ref - still have "our" ref from + * allocation and possibly a ref for being in netserver hash table */ + virObjectUnref(srv); + goto cleanup; } diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index e3b9390af2..d970c47ad4 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -313,6 +313,8 @@ virNetDaemonAddServerPostExec(virNetDaemonPtr dmn, if (virHashAddEntry(dmn->servers, serverName, srv) < 0) goto error;
+ virObjectRef(srv); + virJSONValueFree(object); virObjectUnlock(dmn); return srv;
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 10/27/2017 08:02 AM, Nikolay Shirokovskiy wrote:
On 27.10.2017 14:29, Nikolay Shirokovskiy wrote:
On 27.10.2017 13:44, John Ferlan wrote:
On 10/27/2017 02:29 AM, Nikolay Shirokovskiy wrote:
On 27.10.2017 08:26, John Ferlan wrote:
From: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Commit id '252610f7d' modified net server management to use a hash table to store/manage the various servers; however, during virNetDaemonAddServerPostExec an @srv object is created, added to the dmn->servers hash table, but did not increment the object refcnt like was done during virNetDaemonAddServer as if @srv were being created for the first time.
I'm not agree that 252610f7d introduced the problem. Before this commit the situation was the same as now. virNetDaemonAddServer takes extra reference, virNetDaemonAddServerPostExec does not take extra reference, virNetDaemonDispose unref every server in array (just as hash table does). lock daemon does not store object returned by virNetDaemonAddServerPostExec and log daemon store the object and unref it in virLogDaemonFree.
My point wasn't to blame commit 252610f7d entirely, but it helps point out where the logic was added to use hash tables instead of linked lists.
Still true though that for virNetDaemonAddServer we've ref'd when adding to the hash table and for virNetDaemonAddServerPostExec we do not.
Without the proper ref a subsequent virObjectUnref done during virNetDaemonDispose when virHashFree calls the hash table entry @dataFree function virObjectFreeHashData could inadvertently free the memory before some caller is really done with it or cause the caller to attempt to free something that's already been freed (and it no longer necessarily owns).
Additionally, since commit id 'f08b1c58f3' removed the @srv from virLockManager in favor of using virNetDaemonGetServer which creates a ref on @srv, we need to modify the lock_manager code in order to properly handle and dispose the references to the @srv object allowing either the cleanup processing or the virNetDaemonDispose processing to remove the last ref to the object.
But @srv is unrefed on cleanup already. What use of creating extra label?
You have to read my last response to your patch, but again... Consider that for the virLockDaemonNew path (e.g. rv == 0), there are two refs created - 1 for the alloc and 1 for the insert to hash table. When> virNetDaemonGetServer is called a 3rd ref is generated. So when "undo-ing" your refs - you need to cover all your bases. For any failure after virNetDaemonGetServer succeeds, one needs to virObjectUnref that @srv ref, hence the new label.
If we reach the cleanup: label, then the virObjectUnref there is the same as the virObjectUnref in log_daemon.c w/r/t virLogDaemonFree and the virObjectUnref(logd->srv) - that is at this point we're declaring we're done with @srv. If there's still a ref because virNetDaemonDispose hasn't run, then when that runs it'll free things; otherwise, @srv would be free'd on our Unref.
Ohh, I see know. You adressing another problem in virtlockd code. virLockDaemonNew forgets to unref @srv. I think this should be better addressed in virLockDaemonNew.
I suppose that's another way to handle this since virLockDaemonNew allocates and places into hash table and then "theoretically" becomes hands off allowing the main() to call virNetDaemonGetServer in order to get a new ref to the object. The virLogDaemon handling should do the same thing IMO - that is don't save @srv in logd - but that's a different issue. Also of note is that @srv is leaked when going to error: in virLockDaemonNew if virNetDaemonNew fails <sigh>. I do see now why you also had the Unref patch in patch2 of v2. I'll hopefully talk with Erik in the morning and we can come to a conclusion on this. John
Besides this patch is missing unref @srv in virLockDaemonNewPostExecRestart in the original patch. I think it is necessary.
And I don't believe that's necessary. What purpose does it serve? When we leave virLockDaemonNewPostExecRestart or virNetDaemonAddServerPostExec there should be 2 refs *just like* there were when we left virLockDaemonNew. That way when we reach the "else (rv == 1)" code (back in main) and call virNetDaemonGetServer we create that 3rd ref to @srv just like we had in the rv == 0 path.
Or we can fix virLockDaemonNew to have only 1 refs. 1 ref is enough AFAIU.
If virLockDaemonPostExecRestart fails, eg, rv < 0, then all bets are off, the @srv == NULL, the virObjectUnref does nothing and we die quite ungracefully.
I think we better make virLockDaemonPostExecRestart to return with no references to net servers in case of fail then we have no problems.
At this point in time virLockDaemonPostExecRestart only ever returns -1, 0, or 1.
Hopefully this makes sense.
In short I would leave unref in virLockDaemonPostExecRestart of the original patch and add unref to virLockDaemonNew in a distinct patch.
Particularly the former missing unref is introduced in fa1420736 and the latter is in f08b1c58f3.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/locking/lock_daemon.c | 19 ++++++++++++++----- src/rpc/virnetdaemon.c | 2 ++ 2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/src/locking/lock_daemon.c b/src/locking/lock_daemon.c index fe3eaf9032..ee2c35fdb8 100644 --- a/src/locking/lock_daemon.c +++ b/src/locking/lock_daemon.c @@ -1312,14 +1312,14 @@ int main(int argc, char **argv) { srv = virNetDaemonGetServer(lockDaemon->dmn, "virtlockd"); if ((rv = virLockDaemonSetupNetworkingSystemD(srv)) < 0) { ret = VIR_LOCK_DAEMON_ERR_NETWORK; - goto cleanup; + goto cleanup_srvref; }
/* Only do this, if systemd did not pass a FD */ if (rv == 0 && virLockDaemonSetupNetworkingNative(srv, sock_file) < 0) { ret = VIR_LOCK_DAEMON_ERR_NETWORK; - goto cleanup; + goto cleanup_srvref; } } else if (rv == 1) { srv = virNetDaemonGetServer(lockDaemon->dmn, "virtlockd"); @@ -1333,7 +1333,7 @@ int main(int argc, char **argv) {
if ((virLockDaemonSetupSignals(lockDaemon->dmn)) < 0) { ret = VIR_LOCK_DAEMON_ERR_SIGNAL; - goto cleanup; + goto cleanup_srvref; }
if (!(lockProgram = virNetServerProgramNew(VIR_LOCK_SPACE_PROTOCOL_PROGRAM, @@ -1341,14 +1341,17 @@ int main(int argc, char **argv) { virLockSpaceProtocolProcs, virLockSpaceProtocolNProcs))) { ret = VIR_LOCK_DAEMON_ERR_INIT; - goto cleanup; + goto cleanup_srvref; }
if (virNetServerAddProgram(srv, lockProgram) < 0) { ret = VIR_LOCK_DAEMON_ERR_INIT; - goto cleanup; + goto cleanup_srvref; }
+ /* Completed srv usage from virNetDaemonGetServer */ + virObjectUnref(srv); + /* Disable error func, now logging is setup */ virSetErrorFunc(NULL, virLockDaemonErrorHandler);
@@ -1403,4 +1406,10 @@ int main(int argc, char **argv) { no_memory: VIR_ERROR(_("Can't allocate memory")); exit(EXIT_FAILURE); + + cleanup_srvref: + /* Unref for virNetDaemonGetServer ref - still have "our" ref from + * allocation and possibly a ref for being in netserver hash table */ + virObjectUnref(srv); + goto cleanup; } diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index e3b9390af2..d970c47ad4 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -313,6 +313,8 @@ virNetDaemonAddServerPostExec(virNetDaemonPtr dmn, if (virHashAddEntry(dmn->servers, serverName, srv) < 0) goto error;
+ virObjectRef(srv); + virJSONValueFree(object); virObjectUnlock(dmn); return srv;
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 10/27/2017 02:29 AM, Nikolay Shirokovskiy wrote:
On 27.10.2017 08:26, John Ferlan wrote:
From: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Commit id '252610f7d' modified net server management to use a hash table to store/manage the various servers; however, during virNetDaemonAddServerPostExec an @srv object is created, added to the dmn->servers hash table, but did not increment the object refcnt like was done during virNetDaemonAddServer as if @srv were being created for the first time.
I'm not agree that 252610f7d introduced the problem. Before this commit the situation was the same as now. virNetDaemonAddServer takes extra reference, virNetDaemonAddServerPostExec does not take extra reference, virNetDaemonDispose unref every server in array (just as hash table does). lock daemon does not store object returned by virNetDaemonAddServerPostExec and log daemon store the object and unref it in virLogDaemonFree.
So I dug deeper and can alter the commit message to: Commit id 'fa1420736' introduced virNetDaemonAddServerPostExec and virNetDaemonAddServer; however, for the former when adding @srv to the dmn->servers list, there was no corresponding virObjectRef as there was in the latter. Commit id '252610f7d' modified net server management to use a hash table to store/manage the various servers and did not alter the code to add the object reference. [...] John

On 27.10.2017 14:20, John Ferlan wrote:
On 10/27/2017 02:29 AM, Nikolay Shirokovskiy wrote:
On 27.10.2017 08:26, John Ferlan wrote:
From: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
Commit id '252610f7d' modified net server management to use a hash table to store/manage the various servers; however, during virNetDaemonAddServerPostExec an @srv object is created, added to the dmn->servers hash table, but did not increment the object refcnt like was done during virNetDaemonAddServer as if @srv were being created for the first time.
I'm not agree that 252610f7d introduced the problem. Before this commit the situation was the same as now. virNetDaemonAddServer takes extra reference, virNetDaemonAddServerPostExec does not take extra reference, virNetDaemonDispose unref every server in array (just as hash table does). lock daemon does not store object returned by virNetDaemonAddServerPostExec and log daemon store the object and unref it in virLogDaemonFree.
So I dug deeper and can alter the commit message to:
Commit id 'fa1420736' introduced virNetDaemonAddServerPostExec and virNetDaemonAddServer; however, for the former when adding @srv to the dmn->servers list, there was no corresponding virObjectRef as there was in the latter.
Commit id '252610f7d' modified net server management to use a hash table to store/manage the various servers and did not alter the code to add the object reference.
I still think 252610f7d is not related. Commit only change the way servers list is stored not be way their reference are managed. Particularly virNetDaemonAddServerPostExec does not make ref before 252610f7d so the commit supposes it ok. Nikolay

From: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> The problem is incorrect order of qemu driver shutdown and shutdown of netserver threads that serve client requests (thru qemu driver particularly). Net server threads are shutdown upon dispose which is triggered by last daemon object unref at the end of main function. At the same time qemu driver is shutdown earlier in virStateCleanup. As a result netserver threads see invalid driver object in the middle of request processing. Let's move shutting down netserver threads earlier to virNetDaemonClose. Note: order of last daemon unref and virStateCleanup is introduced in 85c3a182 for a valid reason. Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/rpc/virnetdaemon.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index d970c47ad4..7cb3214166 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -882,6 +882,8 @@ virNetDaemonClose(virNetDaemonPtr dmn) virObjectLock(dmn); virHashForEach(dmn->servers, daemonServerClose, NULL); + virHashRemoveAll(dmn->servers); + dmn->servers = NULL; virObjectUnlock(dmn); } -- 2.13.6

On 27.10.2017 08:26, John Ferlan wrote:
From: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
The problem is incorrect order of qemu driver shutdown and shutdown of netserver threads that serve client requests (thru qemu driver particularly).
Net server threads are shutdown upon dispose which is triggered by last daemon object unref at the end of main function. At the same time qemu driver is shutdown earlier in virStateCleanup. As a result netserver threads see invalid driver object in the middle of request processing.
Let's move shutting down netserver threads earlier to virNetDaemonClose.
Note: order of last daemon unref and virStateCleanup is introduced in 85c3a182 for a valid reason.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/rpc/virnetdaemon.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index d970c47ad4..7cb3214166 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -882,6 +882,8 @@ virNetDaemonClose(virNetDaemonPtr dmn) virObjectLock(dmn);
virHashForEach(dmn->servers, daemonServerClose, NULL); + virHashRemoveAll(dmn->servers); + dmn->servers = NULL;
Setting NULL is not good, we leak hash table memory. Originally I call virHashFree instead of virHashRemoveAll thus this line. But calling virHashFree earlier then dispose is dangerous I agee with you here, virHashRemoveAll is better.
virObjectUnlock(dmn); }

On 10/27/2017 02:32 AM, Nikolay Shirokovskiy wrote:
On 27.10.2017 08:26, John Ferlan wrote:
From: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
The problem is incorrect order of qemu driver shutdown and shutdown of netserver threads that serve client requests (thru qemu driver particularly).
Net server threads are shutdown upon dispose which is triggered by last daemon object unref at the end of main function. At the same time qemu driver is shutdown earlier in virStateCleanup. As a result netserver threads see invalid driver object in the middle of request processing.
Let's move shutting down netserver threads earlier to virNetDaemonClose.
Note: order of last daemon unref and virStateCleanup is introduced in 85c3a182 for a valid reason.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/rpc/virnetdaemon.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index d970c47ad4..7cb3214166 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -882,6 +882,8 @@ virNetDaemonClose(virNetDaemonPtr dmn) virObjectLock(dmn);
virHashForEach(dmn->servers, daemonServerClose, NULL); + virHashRemoveAll(dmn->servers); + dmn->servers = NULL;
Setting NULL is not good, we leak hash table memory. Originally I call virHashFree instead of virHashRemoveAll thus this line. But calling virHashFree earlier then dispose is dangerous I agee with you here, virHashRemoveAll is better.
See and I left the assignment from your patch. Still, right virHashRemoveAll doesn't do the VIR_FREE(table->table) and VIR_FREE(table)... I'll remove the assignment. John
virObjectUnlock(dmn); }
participants (2)
-
John Ferlan
-
Nikolay Shirokovskiy