[libvirt] [PATCH v3 0/2] daemon: fix termination/reload issues

v1: https://www.redhat.com/archives/libvir-list/2017-September/msg01006.html Changes from v2: Fix syntax check issues. Changes from v1: 1. Patches that fixes qemu driver termination are moved out of this series 2. New version of [1] now fixes issue in a way that suggested by reviewer 3. [2] fixes another issue that was discovered in the process of review Nikolay Shirokovskiy (2): libvirtd: fix crash on termination [1] virtlogd: add missing netserver refcount increment on reload [2] src/locking/lock_daemon.c | 1 + src/rpc/virnetdaemon.c | 3 +++ 2 files changed, 4 insertions(+) -- 1.8.3.1

The problem is incorrect order of qemu driver shutdown and shutdown of netserver threads that serve client requests (thru qemu driver particularly). Net server threads are shutdowned upon dispose which is triggered by last daemon object unref at the end of main function. At the same time qemu driver is shutdowned earlier in virStateCleanup. As a result netserver threads see invalid driver object in the middle of request processing. Let's move shutting down netserver threads earlier to virNetDaemonClose. Note: order of last daemon unref and virStateCleanup is introduced in 85c3a182 for a valid reason. --- One can use next patch to trigger crash on termination. Call domstats function and then send TERM to libvirtd. Patch to trigger crash # diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c # index cf5e4ad..39a57aa 100644 # --- a/src/qemu/qemu_driver.c # +++ b/src/qemu/qemu_driver.c # @@ -20144,6 +20144,8 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, # domflags = 0; # vm = vms[i]; # # + sleep(5); # + # virObjectLock(vm); # # if (HAVE_JOB(privflags) && src/rpc/virnetdaemon.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index e3b9390..495bc4b 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -880,6 +880,8 @@ virNetDaemonClose(virNetDaemonPtr dmn) virObjectLock(dmn); virHashForEach(dmn->servers, daemonServerClose, NULL); + virHashFree(dmn->servers); + dmn->servers = NULL; virObjectUnlock(dmn); } -- 1.8.3.1

On 10/24/2017 06:47 AM, Nikolay Shirokovskiy wrote:
The problem is incorrect order of qemu driver shutdown and shutdown of netserver threads that serve client requests (thru qemu driver particularly).
Net server threads are shutdowned upon dispose which is triggered by last daemon object unref at the end of main function. At the same time qemu driver is shutdowned earlier in virStateCleanup. As a result netserver threads see invalid driver object in the middle of request processing.
Let's move shutting down netserver threads earlier to virNetDaemonClose.
Note: order of last daemon unref and virStateCleanup is introduced in 85c3a182 for a valid reason. ---
One can use next patch to trigger crash on termination. Call domstats function and then send TERM to libvirtd.
Patch to trigger crash # diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c # index cf5e4ad..39a57aa 100644 # --- a/src/qemu/qemu_driver.c # +++ b/src/qemu/qemu_driver.c # @@ -20144,6 +20144,8 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, # domflags = 0; # vm = vms[i]; # # + sleep(5); # + # virObjectLock(vm); # # if (HAVE_JOB(privflags) &&
src/rpc/virnetdaemon.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index e3b9390..495bc4b 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -880,6 +880,8 @@ virNetDaemonClose(virNetDaemonPtr dmn) virObjectLock(dmn);
virHashForEach(dmn->servers, daemonServerClose, NULL); + virHashFree(dmn->servers);
Should this be virHashRemoveAll? Or I wonder if the virHashFree in virNetDaemonDispose. John (I'm at KVM Forum this week so responses will be delayed)
+ dmn->servers = NULL;
virObjectUnlock(dmn); }

On 25.10.2017 11:07, John Ferlan wrote:
On 10/24/2017 06:47 AM, Nikolay Shirokovskiy wrote:
The problem is incorrect order of qemu driver shutdown and shutdown of netserver threads that serve client requests (thru qemu driver particularly).
Net server threads are shutdowned upon dispose which is triggered by last daemon object unref at the end of main function. At the same time qemu driver is shutdowned earlier in virStateCleanup. As a result netserver threads see invalid driver object in the middle of request processing.
Let's move shutting down netserver threads earlier to virNetDaemonClose.
Note: order of last daemon unref and virStateCleanup is introduced in 85c3a182 for a valid reason. ---
One can use next patch to trigger crash on termination. Call domstats function and then send TERM to libvirtd.
Patch to trigger crash # diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c # index cf5e4ad..39a57aa 100644 # --- a/src/qemu/qemu_driver.c # +++ b/src/qemu/qemu_driver.c # @@ -20144,6 +20144,8 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, # domflags = 0; # vm = vms[i]; # # + sleep(5); # + # virObjectLock(vm); # # if (HAVE_JOB(privflags) &&
src/rpc/virnetdaemon.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index e3b9390..495bc4b 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -880,6 +880,8 @@ virNetDaemonClose(virNetDaemonPtr dmn) virObjectLock(dmn);
virHashForEach(dmn->servers, daemonServerClose, NULL); + virHashFree(dmn->servers);
Should this be virHashRemoveAll?
Yep, it would be better. I guess no need to resend the series... Nikolay
Or I wonder if the virHashFree in virNetDaemonDispose.
John
(I'm at KVM Forum this week so responses will be delayed)
+ dmn->servers = NULL;
virObjectUnlock(dmn); }

On 10/25/2017 04:24 AM, Nikolay Shirokovskiy wrote:
On 25.10.2017 11:07, John Ferlan wrote:
On 10/24/2017 06:47 AM, Nikolay Shirokovskiy wrote:
The problem is incorrect order of qemu driver shutdown and shutdown of netserver threads that serve client requests (thru qemu driver particularly).
Net server threads are shutdowned upon dispose which is triggered by last daemon object unref at the end of main function. At the same time qemu driver is shutdowned earlier in virStateCleanup. As a result netserver threads see invalid driver object in the middle of request processing.
Let's move shutting down netserver threads earlier to virNetDaemonClose.
Note: order of last daemon unref and virStateCleanup is introduced in 85c3a182 for a valid reason. ---
One can use next patch to trigger crash on termination. Call domstats function and then send TERM to libvirtd.
Patch to trigger crash # diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c # index cf5e4ad..39a57aa 100644 # --- a/src/qemu/qemu_driver.c # +++ b/src/qemu/qemu_driver.c # @@ -20144,6 +20144,8 @@ qemuConnectGetAllDomainStats(virConnectPtr conn, # domflags = 0; # vm = vms[i]; # # + sleep(5); # + # virObjectLock(vm); # # if (HAVE_JOB(privflags) &&
src/rpc/virnetdaemon.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index e3b9390..495bc4b 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -880,6 +880,8 @@ virNetDaemonClose(virNetDaemonPtr dmn) virObjectLock(dmn);
virHashForEach(dmn->servers, daemonServerClose, NULL); + virHashFree(dmn->servers);
Should this be virHashRemoveAll?
Yep, it would be better.
I guess no need to resend the series...
I can change it locally before pushing - just wanted to check John
Nikolay
Or I wonder if the virHashFree in virNetDaemonDispose.
John
(I'm at KVM Forum this week so responses will be delayed)
+ dmn->servers = NULL;
virObjectUnlock(dmn); }

After virNetDaemonAddServerPostExec call in virtlogd we should have netserver refcount set to 2. One goes to netdaemon servers hashtable and one goes to virtlogd own reference to netserver. Let's add missing increment in virNetDaemonAddServerPostExec itself while holding daemon lock. We also have to unref new extra ref after virtlockd call to virNetDaemonAddServerPostExec. --- src/locking/lock_daemon.c | 1 + src/rpc/virnetdaemon.c | 1 + 2 files changed, 2 insertions(+) diff --git a/src/locking/lock_daemon.c b/src/locking/lock_daemon.c index fe3eaf9..41a06b2 100644 --- a/src/locking/lock_daemon.c +++ b/src/locking/lock_daemon.c @@ -275,6 +275,7 @@ virLockDaemonNewPostExecRestart(virJSONValuePtr object, bool privileged) virLockDaemonClientFree, (void*)(intptr_t)(privileged ? 0x1 : 0x0)))) goto error; + virObjectUnref(srv); return lockd; diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index 495bc4b..f3e3ffe 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -312,6 +312,7 @@ virNetDaemonAddServerPostExec(virNetDaemonPtr dmn, if (virHashAddEntry(dmn->servers, serverName, srv) < 0) goto error; + virObjectRef(srv); virJSONValueFree(object); virObjectUnlock(dmn); -- 1.8.3.1

On 10/24/2017 06:47 AM, Nikolay Shirokovskiy wrote:
After virNetDaemonAddServerPostExec call in virtlogd we should have netserver refcount set to 2. One goes to netdaemon servers hashtable and one goes to virtlogd own reference to netserver. Let's add missing increment in virNetDaemonAddServerPostExec itself while holding daemon lock.
We also have to unref new extra ref after virtlockd call to virNetDaemonAddServerPostExec. --- src/locking/lock_daemon.c | 1 + src/rpc/virnetdaemon.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/src/locking/lock_daemon.c b/src/locking/lock_daemon.c index fe3eaf9..41a06b2 100644 --- a/src/locking/lock_daemon.c +++ b/src/locking/lock_daemon.c @@ -275,6 +275,7 @@ virLockDaemonNewPostExecRestart(virJSONValuePtr object, bool privileged) virLockDaemonClientFree, (void*)(intptr_t)(privileged ? 0x1 : 0x0)))) goto error; + virObjectUnref(srv);
I need to think a bit more about this one... different model in lockd vs. logd over @srv usage especially w/r/t this particular path. I see that in this path eventually something calls virNetDaemonGetServer instead of storing the @srv in order to add lockProgram... In any case, I guess I'd be worried/concerned that something could remove @srv causing the Hash table code to unref/delete the srv... Also, in the cleanup: path of main() wouldn't the Unref(srv) cause problems? Again- need to think a bit longer. John
return lockd;
diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index 495bc4b..f3e3ffe 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -312,6 +312,7 @@ virNetDaemonAddServerPostExec(virNetDaemonPtr dmn,
if (virHashAddEntry(dmn->servers, serverName, srv) < 0) goto error; + virObjectRef(srv);
virJSONValueFree(object); virObjectUnlock(dmn);

On 25.10.2017 12:06, John Ferlan wrote:
On 10/24/2017 06:47 AM, Nikolay Shirokovskiy wrote:
After virNetDaemonAddServerPostExec call in virtlogd we should have netserver refcount set to 2. One goes to netdaemon servers hashtable and one goes to virtlogd own reference to netserver. Let's add missing increment in virNetDaemonAddServerPostExec itself while holding daemon lock.
We also have to unref new extra ref after virtlockd call to virNetDaemonAddServerPostExec. --- src/locking/lock_daemon.c | 1 + src/rpc/virnetdaemon.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/src/locking/lock_daemon.c b/src/locking/lock_daemon.c index fe3eaf9..41a06b2 100644 --- a/src/locking/lock_daemon.c +++ b/src/locking/lock_daemon.c @@ -275,6 +275,7 @@ virLockDaemonNewPostExecRestart(virJSONValuePtr object, bool privileged) virLockDaemonClientFree, (void*)(intptr_t)(privileged ? 0x1 : 0x0)))) goto error; + virObjectUnref(srv);
I need to think a bit more about this one... different model in lockd vs. logd over @srv usage especially w/r/t this particular path.
I see that in this path eventually something calls virNetDaemonGetServer instead of storing the @srv in order to add lockProgram... In any case,> I guess I'd be worried/concerned that something could remove @srv causing the Hash table code to unref/delete the srv... Also, in the cleanup: path of main() wouldn't the Unref(srv) cause problems?
But this unref only balance newly added ref. If there are other problems with reference conting in virtlockd - its a different concern.
Again- need to think a bit longer.
John
return lockd;
diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index 495bc4b..f3e3ffe 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -312,6 +312,7 @@ virNetDaemonAddServerPostExec(virNetDaemonPtr dmn,
if (virHashAddEntry(dmn->servers, serverName, srv) < 0) goto error; + virObjectRef(srv);
virJSONValueFree(object); virObjectUnlock(dmn);

On 10/25/2017 05:15 AM, Nikolay Shirokovskiy wrote:
On 25.10.2017 12:06, John Ferlan wrote:
On 10/24/2017 06:47 AM, Nikolay Shirokovskiy wrote:
After virNetDaemonAddServerPostExec call in virtlogd we should have netserver refcount set to 2. One goes to netdaemon servers hashtable and one goes to virtlogd own reference to netserver. Let's add missing increment in virNetDaemonAddServerPostExec itself while holding daemon lock.
We also have to unref new extra ref after virtlockd call to virNetDaemonAddServerPostExec. --- src/locking/lock_daemon.c | 1 + src/rpc/virnetdaemon.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/src/locking/lock_daemon.c b/src/locking/lock_daemon.c index fe3eaf9..41a06b2 100644 --- a/src/locking/lock_daemon.c +++ b/src/locking/lock_daemon.c @@ -275,6 +275,7 @@ virLockDaemonNewPostExecRestart(virJSONValuePtr object, bool privileged) virLockDaemonClientFree, (void*)(intptr_t)(privileged ? 0x1 : 0x0)))) goto error; + virObjectUnref(srv);
I need to think a bit more about this one... different model in lockd vs. logd over @srv usage especially w/r/t this particular path.
I see that in this path eventually something calls virNetDaemonGetServer instead of storing the @srv in order to add lockProgram... In any case,> I guess I'd be worried/concerned that something could remove @srv causing the Hash table code to unref/delete the srv... Also, in the cleanup: path of main() wouldn't the Unref(srv) cause problems?
But this unref only balance newly added ref. If there are other problems with reference conting in virtlockd - its a different concern.
Ok - so what I've learned is that virLockDaemonPostExecRestart returns rv == 1 which indicates a successful restore resulting in a call to virNetDaemonGetServer which will do a virObjectRef anyway leaving us with (theoretically) 2 refs prior to the code that adds programs to @srv (and less concern from my part of the subsequent Unref in cleanup: here). If we saved srv in lockd like logd does, then we wouldn't need to call virNetDaemonGetServer resulting in the same eventual result except for a window after the Unref here where the refcnt == 1 until virNetDaemonGetServer is run. Since I believe nothing could happen in between that would cause the Unref due to HashFree being called, then I think we're OK. The concern being is if virNetDaemonGetServer didn't find the server, then @srv would be NULL and the subsequent call to virNetServerAddProgram for lockProgram would fail miserably, but I don't think that can happen theoretically at least! John
Again- need to think a bit longer.
John
return lockd;
diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index 495bc4b..f3e3ffe 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -312,6 +312,7 @@ virNetDaemonAddServerPostExec(virNetDaemonPtr dmn,
if (virHashAddEntry(dmn->servers, serverName, srv) < 0) goto error; + virObjectRef(srv);
virJSONValueFree(object); virObjectUnlock(dmn);

On 10/25/2017 05:48 AM, John Ferlan wrote:
On 10/25/2017 05:15 AM, Nikolay Shirokovskiy wrote:
On 25.10.2017 12:06, John Ferlan wrote:
On 10/24/2017 06:47 AM, Nikolay Shirokovskiy wrote:
After virNetDaemonAddServerPostExec call in virtlogd we should have netserver refcount set to 2. One goes to netdaemon servers hashtable and one goes to virtlogd own reference to netserver. Let's add missing increment in virNetDaemonAddServerPostExec itself while holding daemon lock.
We also have to unref new extra ref after virtlockd call to virNetDaemonAddServerPostExec. --- src/locking/lock_daemon.c | 1 + src/rpc/virnetdaemon.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/src/locking/lock_daemon.c b/src/locking/lock_daemon.c index fe3eaf9..41a06b2 100644 --- a/src/locking/lock_daemon.c +++ b/src/locking/lock_daemon.c @@ -275,6 +275,7 @@ virLockDaemonNewPostExecRestart(virJSONValuePtr object, bool privileged) virLockDaemonClientFree, (void*)(intptr_t)(privileged ? 0x1 : 0x0)))) goto error; + virObjectUnref(srv);
I need to think a bit more about this one... different model in lockd vs. logd over @srv usage especially w/r/t this particular path.
I see that in this path eventually something calls virNetDaemonGetServer instead of storing the @srv in order to add lockProgram... In any case,> I guess I'd be worried/concerned that something could remove @srv causing the Hash table code to unref/delete the srv... Also, in the cleanup: path of main() wouldn't the Unref(srv) cause problems?
But this unref only balance newly added ref. If there are other problems with reference conting in virtlockd - its a different concern.
Ok - so what I've learned is that virLockDaemonPostExecRestart returns rv == 1 which indicates a successful restore resulting in a call to virNetDaemonGetServer which will do a virObjectRef anyway leaving us with (theoretically) 2 refs prior to the code that adds programs to @srv (and less concern from my part of the subsequent Unref in cleanup: here).
If we saved srv in lockd like logd does, then we wouldn't need to call virNetDaemonGetServer resulting in the same eventual result except for a window after the Unref here where the refcnt == 1 until virNetDaemonGetServer is run. Since I believe nothing could happen in between that would cause the Unref due to HashFree being called, then I think we're OK.
The concern being is if virNetDaemonGetServer didn't find the server, then @srv would be NULL and the subsequent call to virNetServerAddProgram for lockProgram would fail miserably, but I don't think that can happen theoretically at least!
John
Again- need to think a bit longer.
I've been thinking more about this more... Looking through history - I see commit id f08b1c58 removed @srv from _virLockDaemon, but _virLogDaemon wasn't adjusted likewise. I also see commit id 252610f7 which modified the virnetdaemon code to use hash tables. What I'm missing is why that patch didn't add the virObjectRef as well since the @srv is placed into the hash table to match the similar operation done for the virNetDaemonAddServer. It seems when first written perhaps 252610f7 came first with f08b1c58 afterwards, but when committed it was in reverse order. There's just something not quite right and I'm not quite sure what the exact right patch(es) should be. "Theoretically speaking"... If we get a REF for alloc, then we need an UNREF If we get a REF for hash, then we need an UNREF If we get a REF for virNetDaemonGetServer, then we need an UNREF Without the REF in PostExec, then LogD will have a problem if there's a usage of logd->srv after the UNREF for the hash table happens. In particular, the UNREF in virLogDaemonFree. However, for LockD there's not a problem because there's REF's for GetServer and no UNREF on the allocation. For the "non"-PostExec path - there's a leak of @srv (theoretical) and what seems to be normal processing for the PostExec path. With the REF in PostExec, LogD would be fixed; however, LockD then has a leak because of the extra REF. Pushing patch1 seems to solve at least the problem w/ the late/delayed client access (as shown by the sleep() comments), but I fear it opens the door for a LogD problem because hash table ref gets decremented sooner. It's all very tricky and timing based. Perhaps someone else wants to have a look too and provide some thoughts? John
John
return lockd;
diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index 495bc4b..f3e3ffe 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -312,6 +312,7 @@ virNetDaemonAddServerPostExec(virNetDaemonPtr dmn,
if (virHashAddEntry(dmn->servers, serverName, srv) < 0) goto error; + virObjectRef(srv);
virJSONValueFree(object); virObjectUnlock(dmn);
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Thu, Oct 26, 2017 at 09:14:02AM -0400, John Ferlan wrote:
On 10/25/2017 05:48 AM, John Ferlan wrote:
On 10/25/2017 05:15 AM, Nikolay Shirokovskiy wrote:
On 25.10.2017 12:06, John Ferlan wrote:
On 10/24/2017 06:47 AM, Nikolay Shirokovskiy wrote:
After virNetDaemonAddServerPostExec call in virtlogd we should have netserver refcount set to 2. One goes to netdaemon servers hashtable and one goes to virtlogd own reference to netserver. Let's add missing increment in virNetDaemonAddServerPostExec itself while holding daemon lock.
We also have to unref new extra ref after virtlockd call to virNetDaemonAddServerPostExec. --- src/locking/lock_daemon.c | 1 + src/rpc/virnetdaemon.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/src/locking/lock_daemon.c b/src/locking/lock_daemon.c index fe3eaf9..41a06b2 100644 --- a/src/locking/lock_daemon.c +++ b/src/locking/lock_daemon.c @@ -275,6 +275,7 @@ virLockDaemonNewPostExecRestart(virJSONValuePtr object, bool privileged) virLockDaemonClientFree, (void*)(intptr_t)(privileged ? 0x1 : 0x0)))) goto error; + virObjectUnref(srv);
I need to think a bit more about this one... different model in lockd vs. logd over @srv usage especially w/r/t this particular path.
I see that in this path eventually something calls virNetDaemonGetServer instead of storing the @srv in order to add lockProgram... In any case,> I guess I'd be worried/concerned that something could remove @srv causing the Hash table code to unref/delete the srv... Also, in the cleanup: path of main() wouldn't the Unref(srv) cause problems?
But this unref only balance newly added ref. If there are other problems with reference conting in virtlockd - its a different concern.
Ok - so what I've learned is that virLockDaemonPostExecRestart returns rv == 1 which indicates a successful restore resulting in a call to virNetDaemonGetServer which will do a virObjectRef anyway leaving us with (theoretically) 2 refs prior to the code that adds programs to @srv (and less concern from my part of the subsequent Unref in cleanup: here).
If we saved srv in lockd like logd does, then we wouldn't need to call virNetDaemonGetServer resulting in the same eventual result except for a window after the Unref here where the refcnt == 1 until virNetDaemonGetServer is run. Since I believe nothing could happen in between that would cause the Unref due to HashFree being called, then I think we're OK.
The concern being is if virNetDaemonGetServer didn't find the server, then @srv would be NULL and the subsequent call to virNetServerAddProgram for lockProgram would fail miserably, but I don't think that can happen theoretically at least!
John
Again- need to think a bit longer.
I've been thinking more about this more...
Looking through history - I see commit id f08b1c58 removed @srv from _virLockDaemon, but _virLogDaemon wasn't adjusted likewise.
I also see commit id 252610f7 which modified the virnetdaemon code to use hash tables. What I'm missing is why that patch didn't add the virObjectRef as well since the @srv is placed into the hash table to match the similar operation done for the virNetDaemonAddServer.
It seems when first written perhaps 252610f7 came first with f08b1c58 afterwards, but when committed it was in reverse order.
There's just something not quite right and I'm not quite sure what the exact right patch(es) should be.
"Theoretically speaking"...
If we get a REF for alloc, then we need an UNREF If we get a REF for hash, then we need an UNREF If we get a REF for virNetDaemonGetServer, then we need an UNREF
Without the REF in PostExec, then LogD will have a problem if there's a usage of logd->srv after the UNREF for the hash table happens. In particular, the UNREF in virLogDaemonFree. However, for LockD there's not a problem because there's REF's for GetServer and no UNREF on the allocation. For the "non"-PostExec path - there's a leak of @srv (theoretical) and what seems to be normal processing for the PostExec path.
With the REF in PostExec, LogD would be fixed; however, LockD then has a leak because of the extra REF.
Pushing patch1 seems to solve at least the problem w/ the late/delayed client access (as shown by the sleep() comments), but I fear it opens the door for a LogD problem because hash table ref gets decremented sooner.
It's all very tricky and timing based.
Perhaps someone else wants to have a look too and provide some thoughts?
I'll definitely have a look at the series, but being at KVM Forum, my workflow is broken so to speak, hopefully I'll manage to get through this thoroughly before you push this. Erik
participants (3)
-
Erik Skultety
-
John Ferlan
-
Nikolay Shirokovskiy