[libvirt] [PATCH 0/5] libvirtd: Adjustments to startup and cleanup processing

Essentially a followup from the review of: daemon: fix termination/reload issues https://www.redhat.com/archives/libvir-list/2017-October/msg01347.html and in particular patch 3/3, a review which has exteneded into Nov.: https://www.redhat.com/archives/libvir-list/2017-November/msg00066.html Make a few adjustments as noted in the patches as an alternative to patch 3/3 from above. The cleanup path is essentially reordered in inverse order of startup. Other adjustments include virObjectUnref after attempts to place object into domain or server hash tables or linked lists. John Ferlan (5): libvirtd: Move pid_file_fd setup to before run_dir libvirtd: Alter order of virNetDaemonNew libvirtd: Alter refcnt processing for domain server objects libvirtd: Alter refcnt processing for server program objects libvirtd: Fix order of cleanup processing daemon/libvirtd.c | 95 +++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 65 insertions(+), 30 deletions(-) -- 2.13.6

Once we have forked the daemon successfully, let's claim the pidfile immediately rather than waiting for setup of run_dir. Signed-off-by: John Ferlan <jferlan@redhat.com> --- daemon/libvirtd.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index 589b32192e..4fc33ba0d4 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -1258,6 +1258,12 @@ int main(int argc, char **argv) { } } + /* Try to claim the pidfile, exiting if we can't */ + if ((pid_file_fd = virPidFileAcquirePath(pid_file, false, getpid())) < 0) { + ret = VIR_DAEMON_ERR_PIDFILE; + goto cleanup; + } + /* Ensure the rundir exists (on tmpfs on some systems) */ if (privileged) { if (VIR_STRDUP_QUIET(run_dir, LOCALSTATEDIR "/run/libvirt") < 0) { @@ -1286,12 +1292,6 @@ int main(int argc, char **argv) { } umask(old_umask); - /* Try to claim the pidfile, exiting if we can't */ - if ((pid_file_fd = virPidFileAcquirePath(pid_file, false, getpid())) < 0) { - ret = VIR_DAEMON_ERR_PIDFILE; - goto cleanup; - } - if (virNetlinkStartup() < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; -- 2.13.6

On Tue, Nov 07, 2017 at 09:39:52PM -0500, John Ferlan wrote:
Once we have forked the daemon successfully, let's claim the pidfile immediately rather than waiting for setup of run_dir.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>

Let's be sure we can get a Daemon object before the server object. This is a more "orderly" way to do things since the svr object would be added to the dmn object afterwards. Signed-off-by: John Ferlan <jferlan@redhat.com> --- daemon/libvirtd.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index 4fc33ba0d4..73f24915df 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -1297,6 +1297,11 @@ int main(int argc, char **argv) { goto cleanup; } + if (!(dmn = virNetDaemonNew())) { + ret = VIR_DAEMON_ERR_INIT; + goto cleanup; + } + if (!(srv = virNetServerNew("libvirtd", 1, config->min_workers, config->max_workers, @@ -1314,8 +1319,7 @@ int main(int argc, char **argv) { goto cleanup; } - if (!(dmn = virNetDaemonNew()) || - virNetDaemonAddServer(dmn, srv) < 0) { + if (virNetDaemonAddServer(dmn, srv) < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } -- 2.13.6

On Tue, Nov 07, 2017 at 09:39:53PM -0500, John Ferlan wrote:
Let's be sure we can get a Daemon object before the server object. This is a more "orderly" way to do things since the svr object would be added to the dmn object afterwards.
Signed-off-by: John Ferlan <jferlan@redhat.com>
Reviewed-by: Erik Skultety <eskultet@redhat.com>

Whether the @srv/@srvAdm is added to the dmn->servers list or not, the reference kept for the allocation can be dropped leaving just the reference for being on the dmn->servers list be the sole deciding factor when to really free the associated memory. The @dmn dispose function (virNetDaemonDispose) will handle the Unref of the objects during the virHashFree. Signed-off-by: John Ferlan <jferlan@redhat.com> --- daemon/libvirtd.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index 73f24915df..5c47e49d48 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -1065,6 +1065,7 @@ int main(int argc, char **argv) { char *remote_config_file = NULL; int statuswrite = -1; int ret = 1; + int rc; int pid_file_fd = -1; char *pid_file = NULL; char *sock_file = NULL; @@ -1319,7 +1320,11 @@ int main(int argc, char **argv) { goto cleanup; } - if (virNetDaemonAddServer(dmn, srv) < 0) { + /* Add @srv to @dmn servers hash table and Unref to allow removal from + * hash table at @dmn disposal to decide when to free @srv */ + rc = virNetDaemonAddServer(dmn, srv); + virObjectUnref(srv); + if (rc < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } @@ -1393,7 +1398,11 @@ int main(int argc, char **argv) { goto cleanup; } - if (virNetDaemonAddServer(dmn, srvAdm) < 0) { + /* Add @srvAdm to @dmn servers hash table and Unref to allow removal from + * hash table at @dmn disposal to decide when to free @srvAdm */ + rc = virNetDaemonAddServer(dmn, srvAdm); + virObjectUnref(srvAdm); + if (rc < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } @@ -1509,8 +1518,6 @@ int main(int argc, char **argv) { virObjectUnref(qemuProgram); virObjectUnref(adminProgram); virNetDaemonClose(dmn); - virObjectUnref(srv); - virObjectUnref(srvAdm); virNetlinkShutdown(); if (statuswrite != -1) { if (ret != 0) { -- 2.13.6

On Tue, Nov 07, 2017 at 09:39:54PM -0500, John Ferlan wrote:
Whether the @srv/@srvAdm is added to the dmn->servers list or not, the reference kept for the allocation can be dropped leaving just the reference for being on the dmn->servers list be the sole deciding factor when to really free the associated memory. The @dmn dispose function (virNetDaemonDispose) will handle the Unref of the objects during the virHashFree.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- daemon/libvirtd.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index 73f24915df..5c47e49d48 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -1065,6 +1065,7 @@ int main(int argc, char **argv) { char *remote_config_file = NULL; int statuswrite = -1; int ret = 1; + int rc; int pid_file_fd = -1; char *pid_file = NULL; char *sock_file = NULL; @@ -1319,7 +1320,11 @@ int main(int argc, char **argv) { goto cleanup; }
- if (virNetDaemonAddServer(dmn, srv) < 0) { + /* Add @srv to @dmn servers hash table and Unref to allow removal from + * hash table at @dmn disposal to decide when to free @srv */ + rc = virNetDaemonAddServer(dmn, srv);
<rant> Sorry for the delay, I've been playing with LXC for 2 days, trying to either debug or just use valgrind on lxc_controller to get the info I need (explanation below), but after those 2 days, I just give up, not worth any more time, if someone knows how I can actually hit the cleanup section in lxc_controller within gdb telling me that it suddenly temporarily disabled breakpoints at the moment it's supposed to stop at one that it had let me to set earlier, I'm all ears. Btw to my biggest surprise valgrind still doesn't handle setns syscall which has probably been the reason why we forbade running LXC under valgrind in 2009, unbelievable. </rant> Anyhow, based purely on a visual perception (sigh...), it looks like that the LXC "server" is leaked in lxc_controller.c because virLXCControllerSetupServer creates the object, calls virNetDaemonAddServer which increments @refcnt, but I don't see two corresponding virObjectUnrefs (that's the thing, once you don't run gdb or valgrind, you can't be sure whether this suspicion is right or not ...). Now the actual review. virNetDaemonAddServer is only used when spawning a new daemon or setting up LXC controller. The function essentially does: lock(@dmn) hash_table_add(@srv) ref(@srv) unlock(@dmn) and then you unref @dmn right upon the completion of adding the server to the hash table, what's the purpose of the extra ref when you discard it immediately? Unless I'm failing to see something, I'd prefer to just drop the extra ref in virNetDaemonAddServer and leave the proposed hunks unchanged - you have 1 ref which you got by creating the object, now it should be just simply inserted into the hash table, the additional ref after insertion doesn't make sense, if someone managed to unref it before inserting it into the hash table, hash insert would most likely fail, if not, hashFree surely would, not mentioning that at that point there's no entity that is touching servers.
+ virObjectUnref(srv); + if (rc < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } @@ -1393,7 +1398,11 @@ int main(int argc, char **argv) { goto cleanup; }
- if (virNetDaemonAddServer(dmn, srvAdm) < 0) { + /* Add @srvAdm to @dmn servers hash table and Unref to allow removal from + * hash table at @dmn disposal to decide when to free @srvAdm */ + rc = virNetDaemonAddServer(dmn, srvAdm); + virObjectUnref(srvAdm); + if (rc < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } @@ -1509,8 +1518,6 @@ int main(int argc, char **argv) { virObjectUnref(qemuProgram); virObjectUnref(adminProgram); virNetDaemonClose(dmn); - virObjectUnref(srv); - virObjectUnref(srvAdm);
^This hunk is of course fine. Erik

On 11/10/2017 10:08 AM, Erik Skultety wrote:
On Tue, Nov 07, 2017 at 09:39:54PM -0500, John Ferlan wrote:
Whether the @srv/@srvAdm is added to the dmn->servers list or not, the reference kept for the allocation can be dropped leaving just the reference for being on the dmn->servers list be the sole deciding factor when to really free the associated memory. The @dmn dispose function (virNetDaemonDispose) will handle the Unref of the objects during the virHashFree.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- daemon/libvirtd.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index 73f24915df..5c47e49d48 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -1065,6 +1065,7 @@ int main(int argc, char **argv) { char *remote_config_file = NULL; int statuswrite = -1; int ret = 1; + int rc; int pid_file_fd = -1; char *pid_file = NULL; char *sock_file = NULL; @@ -1319,7 +1320,11 @@ int main(int argc, char **argv) { goto cleanup; }
- if (virNetDaemonAddServer(dmn, srv) < 0) { + /* Add @srv to @dmn servers hash table and Unref to allow removal from + * hash table at @dmn disposal to decide when to free @srv */ + rc = virNetDaemonAddServer(dmn, srv);
<rant> Sorry for the delay, I've been playing with LXC for 2 days, trying to either debug or just use valgrind on lxc_controller to get the info I need (explanation below), but after those 2 days, I just give up, not worth any more time, if someone knows how I can actually hit the cleanup section in lxc_controller within gdb telling me that it suddenly temporarily disabled breakpoints at the moment it's supposed to stop at one that it had let me to set earlier, I'm all ears. Btw to my biggest surprise valgrind still doesn't handle setns syscall which has probably been the reason why we forbade running LXC under valgrind in 2009, unbelievable. </rant>
I feel your pain ;-) Chasing these memory things is challenging especially when variable names change or things just aren't consistent. They're a real time sync too - that whole NetDaemon, NetServer, NetServerClient, etc. code is a beast to understand. I only have figured out a small piece of it...
Anyhow, based purely on a visual perception (sigh...), it looks like that the LXC "server" is leaked in lxc_controller.c because virLXCControllerSetupServer creates the object, calls virNetDaemonAddServer which increments @refcnt, but I don't see two corresponding virObjectUnrefs (that's the thing, once you don't run gdb or valgrind, you can't be sure whether this suspicion is right or not ...).
So let's see - I see the 2 REF's on @srv as you do in main(). The ctrl->daemon->servers will be UNREF'd in virNetDaemonDispose when the last Unref of ctrl->daemon is done in virLXCControllerFree via the call to virHashFree... BTW: Chasing that is painful, but once the server is removed from the hash table it does end up calling an Unref when virObjectFreeHashData is called (e.g. the 2nd arg to virHashCreate). This may be one of those cases where you need to add "temporary" debug code to virObjectRef and virObjectUnref along with the address of the object in order to see "when" it's Ref'd and Unref'd and what the count is (I've done this using VIR_WARN and just perusing generated output). It is painful... Don't forget to add debug code to this function that would print the addr of the daemon and server object's so that you can output everything to a file and then comb through the output looking for when things are ref/unref'd and most importantly some free of an object that didn't have an object it contains being free'd. Once you see the daemon object free'd, but the @srv object still hanging on - you know you're pretty safe in assuming there's a missing Unref... Anyway, I agree with you upon visual inspection - there's a leak since @srv is never Unref'd. Once it's added to the Server hash table, then there should be a virObjectUnref(srv). As for other consumers.... The reason that log_daemon doesn't do the virObjectUnref is that it saves @srv and eventually will Unref it later. Unlike the lock_daemon code which uses virNetDaemonGetServer to find the @srv; whereas, log_daemon goes direct to logDaemon->srv.
Now the actual review. virNetDaemonAddServer is only used when spawning a new daemon or setting up LXC controller. The function essentially does:
lock(@dmn) hash_table_add(@srv) ref(@srv) unlock(@dmn)
and then you unref @dmn right upon the completion of adding the server to the hash table, what's the purpose of the extra ref when you discard it immediately?
The REF is for the Hash Table. We've added something to the hash table - we should increment the refcnt - nothing more nothing less. The UNREF for that occurs as part of virHashFree because virHashCreate uses virObjectFreeHashData. The UNREF is for the allocation (or virNetServerNew) and an optimization because we know it's in the Hash Table and wouldn't be removed from the Hash Table until the virNetDaemonDispose calls virHashFree once the last @dmn is UNREF'd. Dropping this patch and the next one is fine by me, but the Unref's below would be reinstated.
Unless I'm failing to see something, I'd prefer to just drop the extra ref in virNetDaemonAddServer and leave the proposed hunks unchanged - you have 1 ref which you got by creating the object, now it should be just simply inserted into the hash table, the additional ref after insertion doesn't make sense, if someone managed to unref it before inserting it into the hash table, hash insert would most likely fail, if not, hashFree surely would, not mentioning that at that point there's no entity that is touching servers.
That messes up log_daemon as it's stores @srv in logDamon->srv and then will virObjectUnref it virLogDaemonFree. If log_daemon didn't save a ref of the object in logDaemon->srv, then it too could do a virObjectUnref right away too.
+ virObjectUnref(srv); + if (rc < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } @@ -1393,7 +1398,11 @@ int main(int argc, char **argv) { goto cleanup; }
- if (virNetDaemonAddServer(dmn, srvAdm) < 0) { + /* Add @srvAdm to @dmn servers hash table and Unref to allow removal from + * hash table at @dmn disposal to decide when to free @srvAdm */ + rc = virNetDaemonAddServer(dmn, srvAdm); + virObjectUnref(srvAdm); + if (rc < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } @@ -1509,8 +1518,6 @@ int main(int argc, char **argv) { virObjectUnref(qemuProgram); virObjectUnref(adminProgram); virNetDaemonClose(dmn); - virObjectUnref(srv); - virObjectUnref(srvAdm);
^This hunk is of course fine.
Erik

On Fri, Nov 10, 2017 at 05:41:51PM -0500, John Ferlan wrote:
On 11/10/2017 10:08 AM, Erik Skultety wrote:
On Tue, Nov 07, 2017 at 09:39:54PM -0500, John Ferlan wrote:
Whether the @srv/@srvAdm is added to the dmn->servers list or not, the reference kept for the allocation can be dropped leaving just the reference for being on the dmn->servers list be the sole deciding factor when to really free the associated memory. The @dmn dispose function (virNetDaemonDispose) will handle the Unref of the objects during the virHashFree.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- daemon/libvirtd.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index 73f24915df..5c47e49d48 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -1065,6 +1065,7 @@ int main(int argc, char **argv) { char *remote_config_file = NULL; int statuswrite = -1; int ret = 1; + int rc; int pid_file_fd = -1; char *pid_file = NULL; char *sock_file = NULL; @@ -1319,7 +1320,11 @@ int main(int argc, char **argv) { goto cleanup; }
- if (virNetDaemonAddServer(dmn, srv) < 0) { + /* Add @srv to @dmn servers hash table and Unref to allow removal from + * hash table at @dmn disposal to decide when to free @srv */ + rc = virNetDaemonAddServer(dmn, srv);
<rant> Sorry for the delay, I've been playing with LXC for 2 days, trying to either debug or just use valgrind on lxc_controller to get the info I need (explanation below), but after those 2 days, I just give up, not worth any more time, if someone knows how I can actually hit the cleanup section in lxc_controller within gdb telling me that it suddenly temporarily disabled breakpoints at the moment it's supposed to stop at one that it had let me to set earlier, I'm all ears. Btw to my biggest surprise valgrind still doesn't handle setns syscall which has probably been the reason why we forbade running LXC under valgrind in 2009, unbelievable. </rant>
I feel your pain ;-) Chasing these memory things is challenging especially when variable names change or things just aren't consistent.
Well, it wouldn't have to be if one didn't have to fight the mentioned tools so much - if you've been using fedora since v25 (I don't know whether debian/ubuntu suffer the same way), you might have come across this weird thing that even if you compile libvirt with debugging symbols, you're not able to actually step into non-static functions with "step" (you'd have to use stepi multiple times) because the linker doesn't generate the PLT stubs for some reason, which is quite frustrating for someone who debugs a lot and more breakpoints just don't help here really since some of them might get hit constantly from places I don't want and disabling them simply doesn't feel efficient like using 'step'. ...
Anyway, I agree with you upon visual inspection - there's a leak since @srv is never Unref'd. Once it's added to the Server hash table, then there should be a virObjectUnref(srv).
\o/ This is just one in a ... lot ..., since the Unrefs are almost impossible to track visually... :)
As for other consumers.... The reason that log_daemon doesn't do the virObjectUnref is that it saves @srv and eventually will Unref it later. Unlike the lock_daemon code which uses virNetDaemonGetServer to find the @srv; whereas, log_daemon goes direct to logDaemon->srv.
Yeah, that just doesn't feel right, virtlogd is just a copy of virtlockd, I haven't found a compelling reason why they're different in this aspect, so I sent simple patches to address that.
Now the actual review. virNetDaemonAddServer is only used when spawning a new daemon or setting up LXC controller. The function essentially does:
lock(@dmn) hash_table_add(@srv) ref(@srv) unlock(@dmn)
and then you unref @dmn right upon the completion of adding the server to the hash table, what's the purpose of the extra ref when you discard it immediately?
The REF is for the Hash Table. We've added something to the hash table - we should increment the refcnt - nothing more nothing less. The UNREF for that occurs as part of virHashFree because virHashCreate uses virObjectFreeHashData.
Sure, I understand that the ref is for the hash table, but there's no other caller that would interfere with the object (the idea behind ref counting), you already have a ref for the daemon, which is not going to need it anymore (we know this), so you can just pass it onto the hash table, because right now, it looks odd, you wait dor successful addition to the table (it's not event the table doing the ref increment), increment ref, return, drop ref - every time I look at it, it raises the question 'why'... Erik

[...]
Now the actual review. virNetDaemonAddServer is only used when spawning a new daemon or setting up LXC controller. The function essentially does:
lock(@dmn) hash_table_add(@srv) ref(@srv) unlock(@dmn)
and then you unref @dmn right upon the completion of adding the server to the hash table, what's the purpose of the extra ref when you discard it immediately? Unless I'm failing to see something, I'd prefer to just drop the extra ref in virNetDaemonAddServer and leave the proposed hunks unchanged - you have 1 ref which you got by creating the object, now it should be just simply inserted into the hash table, the additional ref after insertion doesn't make sense, if someone managed to unref it before inserting it into the hash table, hash insert would most likely fail, if not, hashFree surely would, not mentioning that at that point there's no entity that is touching servers.
Since I was "digging" on a different issue - check out how this is done elsewhere... Start with virNetServerDispatchNewClient. It'll call the ClientNew function which generates a REF. Then it will call AddClient which generates a REF. Then because it's on that list and this code is theoretically done with it - it will UNREF the client before returning. John [...]

On Sun, Nov 12, 2017 at 09:46:48AM -0500, John Ferlan wrote:
[...]
Now the actual review. virNetDaemonAddServer is only used when spawning a new daemon or setting up LXC controller. The function essentially does:
lock(@dmn) hash_table_add(@srv) ref(@srv) unlock(@dmn)
and then you unref @dmn right upon the completion of adding the server to the hash table, what's the purpose of the extra ref when you discard it immediately? Unless I'm failing to see something, I'd prefer to just drop the extra ref in virNetDaemonAddServer and leave the proposed hunks unchanged - you have 1 ref which you got by creating the object, now it should be just simply inserted into the hash table, the additional ref after insertion doesn't make sense, if someone managed to unref it before inserting it into the hash table, hash insert would most likely fail, if not, hashFree surely would, not mentioning that at that point there's no entity that is touching servers.
Since I was "digging" on a different issue - check out how this is done elsewhere... Start with virNetServerDispatchNewClient. It'll call the ClientNew function which generates a REF. Then it will call AddClient which generates a REF. Then because it's on that list and this code is theoretically done with it - it will UNREF the client before returning.
Sure, but we shouldn't treat everything in a uniform manner - the fact that we can do that and it works still doesn't necessarily mean it's right. BTW I only looked at NetClient briefly, but that too looks to me like the reffing is unnecessary. Erik

On 11/14/2017 09:17 AM, Erik Skultety wrote:
On Sun, Nov 12, 2017 at 09:46:48AM -0500, John Ferlan wrote:
[...]
Now the actual review. virNetDaemonAddServer is only used when spawning a new daemon or setting up LXC controller. The function essentially does:
lock(@dmn) hash_table_add(@srv) ref(@srv) unlock(@dmn)
and then you unref @dmn right upon the completion of adding the server to the hash table, what's the purpose of the extra ref when you discard it immediately? Unless I'm failing to see something, I'd prefer to just drop the extra ref in virNetDaemonAddServer and leave the proposed hunks unchanged - you have 1 ref which you got by creating the object, now it should be just simply inserted into the hash table, the additional ref after insertion doesn't make sense, if someone managed to unref it before inserting it into the hash table, hash insert would most likely fail, if not, hashFree surely would, not mentioning that at that point there's no entity that is touching servers.
Since I was "digging" on a different issue - check out how this is done elsewhere... Start with virNetServerDispatchNewClient. It'll call the ClientNew function which generates a REF. Then it will call AddClient which generates a REF. Then because it's on that list and this code is theoretically done with it - it will UNREF the client before returning.
Sure, but we shouldn't treat everything in a uniform manner - the fact that we can do that and it works still doesn't necessarily mean it's right. BTW I only looked at NetClient briefly, but that too looks to me like the reffing is unnecessary.
Erik
I do understand your point, but undoing the Ref when placing into the HashTable consistently across all consumers is perhaps a large task. At this point, I'm thinking I just drop patches 3 & 4 - it just means the objects will have 2 refs and allows me/us to just make forward progress putting the discussion about the need to Ref when adding to the hash table for a different day or set of patches. In the end the more important patch is 5 - at least with respect to does it handle the issue that Nikolay was originally trying to handle. John

On Wed, Nov 15, 2017 at 05:59:39PM -0500, John Ferlan wrote:
On 11/14/2017 09:17 AM, Erik Skultety wrote:
On Sun, Nov 12, 2017 at 09:46:48AM -0500, John Ferlan wrote:
[...]
Now the actual review. virNetDaemonAddServer is only used when spawning a new daemon or setting up LXC controller. The function essentially does:
lock(@dmn) hash_table_add(@srv) ref(@srv) unlock(@dmn)
and then you unref @dmn right upon the completion of adding the server to the hash table, what's the purpose of the extra ref when you discard it immediately? Unless I'm failing to see something, I'd prefer to just drop the extra ref in virNetDaemonAddServer and leave the proposed hunks unchanged - you have 1 ref which you got by creating the object, now it should be just simply inserted into the hash table, the additional ref after insertion doesn't make sense, if someone managed to unref it before inserting it into the hash table, hash insert would most likely fail, if not, hashFree surely would, not mentioning that at that point there's no entity that is touching servers.
Since I was "digging" on a different issue - check out how this is done elsewhere... Start with virNetServerDispatchNewClient. It'll call the ClientNew function which generates a REF. Then it will call AddClient which generates a REF. Then because it's on that list and this code is theoretically done with it - it will UNREF the client before returning.
Sure, but we shouldn't treat everything in a uniform manner - the fact that we can do that and it works still doesn't necessarily mean it's right. BTW I only looked at NetClient briefly, but that too looks to me like the reffing is unnecessary.
Erik
I do understand your point, but undoing the Ref when placing into the HashTable consistently across all consumers is perhaps a large task.
At this point, I'm thinking I just drop patches 3 & 4 - it just means the objects will have 2 refs and allows me/us to just make forward progress putting the discussion about the need to Ref when adding to the
OK, we can do that...
hash table for a different day or set of patches. In the end the more important patch is 5 - at least with respect to does it handle the issue that Nikolay was originally trying to handle.
Damn, I forgot to respond to patch 5, sorry about that, I just did. Erik

Instead of holding onto the reference for each program added to a NetServer, let's Unref the program reference as soon as the attempt is made to add to the @srv->programs list. This then allows the virNetServerDispose to handle the final Unref of the object for actual deletion and cleans up the cleanup: path. Signed-off-by: John Ferlan <jferlan@redhat.com> --- daemon/libvirtd.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index 5c47e49d48..87c5b22710 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -1352,7 +1352,10 @@ int main(int argc, char **argv) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } - if (virNetServerAddProgram(srv, remoteProgram) < 0) { + + rc = virNetServerAddProgram(srv, remoteProgram); + virObjectUnref(remoteProgram); + if (rc < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } @@ -1364,7 +1367,10 @@ int main(int argc, char **argv) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } - if (virNetServerAddProgram(srv, lxcProgram) < 0) { + + rc = virNetServerAddProgram(srv, lxcProgram); + virObjectUnref(lxcProgram); + if (rc < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } @@ -1376,7 +1382,10 @@ int main(int argc, char **argv) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } - if (virNetServerAddProgram(srv, qemuProgram) < 0) { + + rc = virNetServerAddProgram(srv, qemuProgram); + virObjectUnref(qemuProgram); + if (rc < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } @@ -1414,7 +1423,10 @@ int main(int argc, char **argv) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } - if (virNetServerAddProgram(srvAdm, adminProgram) < 0) { + + rc = virNetServerAddProgram(srvAdm, adminProgram); + virObjectUnref(adminProgram); + if (rc < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } @@ -1513,10 +1525,6 @@ int main(int argc, char **argv) { cleanup: virNetlinkEventServiceStopAll(); - virObjectUnref(remoteProgram); - virObjectUnref(lxcProgram); - virObjectUnref(qemuProgram); - virObjectUnref(adminProgram); virNetDaemonClose(dmn); virNetlinkShutdown(); if (statuswrite != -1) { -- 2.13.6

On Tue, Nov 07, 2017 at 09:39:55PM -0500, John Ferlan wrote:
Instead of holding onto the reference for each program added to a NetServer, let's Unref the program reference as soon as the attempt is made to add to the @srv->programs list. This then allows the virNetServerDispose to handle the final Unref of the object for actual deletion and cleans up the cleanup: path.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- daemon/libvirtd.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index 5c47e49d48..87c5b22710 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -1352,7 +1352,10 @@ int main(int argc, char **argv) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } - if (virNetServerAddProgram(srv, remoteProgram) < 0) { + + rc = virNetServerAddProgram(srv, remoteProgram); + virObjectUnref(remoteProgram);
Again, why can't we drop the ref from the AddProgram, if we as the @srv mutex owners got rescheduled and someone unreff'd @program before we inserted it to the array (while holding mutex), we'd fail to ref a dangling pointer consequently, so I don't see any added protection here and I think ^these hunks are unnecessary, once we agree to drop virObjectRef from virNet<Foo>Add<Bar> during the init phase.
+ if (rc < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } @@ -1364,7 +1367,10 @@ int main(int argc, char **argv) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } - if (virNetServerAddProgram(srv, lxcProgram) < 0) { + + rc = virNetServerAddProgram(srv, lxcProgram); + virObjectUnref(lxcProgram); + if (rc < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } @@ -1376,7 +1382,10 @@ int main(int argc, char **argv) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } - if (virNetServerAddProgram(srv, qemuProgram) < 0) { + + rc = virNetServerAddProgram(srv, qemuProgram); + virObjectUnref(qemuProgram); + if (rc < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } @@ -1414,7 +1423,10 @@ int main(int argc, char **argv) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } - if (virNetServerAddProgram(srvAdm, adminProgram) < 0) { + + rc = virNetServerAddProgram(srvAdm, adminProgram); + virObjectUnref(adminProgram); + if (rc < 0) { ret = VIR_DAEMON_ERR_INIT; goto cleanup; } @@ -1513,10 +1525,6 @@ int main(int argc, char **argv) {
cleanup: virNetlinkEventServiceStopAll(); - virObjectUnref(remoteProgram); - virObjectUnref(lxcProgram); - virObjectUnref(qemuProgram); - virObjectUnref(adminProgram);
This hunk would stay either way. Erik

Current cleanup processing is ad-hoc at best - it's led to a couple of strange and hard to diagnose timing problems and crashes. So rather than perform cleanup in a somewhat random order, let's perform cleanup in the exact opposite order of startup. Signed-off-by: John Ferlan <jferlan@redhat.com> --- daemon/libvirtd.c | 42 +++++++++++++++++++++++++++++------------- 1 file changed, 29 insertions(+), 13 deletions(-) diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index 87c5b22710..92037e9070 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -1524,9 +1524,35 @@ int main(int argc, char **argv) { 0, "shutdown", NULL, NULL); cleanup: - virNetlinkEventServiceStopAll(); + /* NB: Order for cleanup should attempt to kept in the inverse order + * was was used to create/start the daemon - there are some fairly + * important reliances built into the startup processing that use + * refcnt's in order to manage objects. Removing too early could + * cause crashes. */ virNetDaemonClose(dmn); + + virNetlinkEventServiceStopAll(); + + if (driversInitialized) { + /* Set via daemonRunStateInit thread in daemonStateInit. + * NB: It is possible that virNetlinkEventServerStart fails + * and we jump to cleanup before driversInitialized has + * been set. That could leave things inconsistent; however, + * resolution of that possibility is perhaps more trouble than + * it's worth to handle. */ + driversInitialized = false; + virStateCleanup(); + } + + virObjectUnref(dmn); + virNetlinkShutdown(); + + if (pid_file_fd != -1) + virPidFileReleasePath(pid_file, pid_file_fd); + + VIR_FREE(run_dir); + if (statuswrite != -1) { if (ret != 0) { /* Tell parent of daemon what failed */ @@ -1537,25 +1563,15 @@ int main(int argc, char **argv) { } VIR_FORCE_CLOSE(statuswrite); } - if (pid_file_fd != -1) - virPidFileReleasePath(pid_file, pid_file_fd); VIR_FREE(sock_file); VIR_FREE(sock_file_ro); VIR_FREE(sock_file_adm); + VIR_FREE(pid_file); - VIR_FREE(remote_config_file); - VIR_FREE(run_dir); + VIR_FREE(remote_config_file); daemonConfigFree(config); - if (driversInitialized) { - driversInitialized = false; - virStateCleanup(); - } - /* Now that the hypervisor shutdown inhibition functions that use - * 'dmn' as a parameter are done, we can finally unref 'dmn' */ - virObjectUnref(dmn); - return ret; } -- 2.13.6

On Tue, Nov 07, 2017 at 09:39:56PM -0500, John Ferlan wrote:
Current cleanup processing is ad-hoc at best - it's led to a couple of strange and hard to diagnose timing problems and crashes.
So rather than perform cleanup in a somewhat random order, let's perform cleanup in the exact opposite order of startup.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- daemon/libvirtd.c | 42 +++++++++++++++++++++++++++++------------- 1 file changed, 29 insertions(+), 13 deletions(-)
diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index 87c5b22710..92037e9070 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -1524,9 +1524,35 @@ int main(int argc, char **argv) { 0, "shutdown", NULL, NULL);
cleanup: - virNetlinkEventServiceStopAll(); + /* NB: Order for cleanup should attempt to kept in the inverse order + * was was used to create/start the daemon - there are some fairly + * important reliances built into the startup processing that use + * refcnt's in order to manage objects. Removing too early could + * cause crashes. */
^Not a very useful commentary, more suitable for the commit message - so I preferred if you'd drop it from here.
virNetDaemonClose(dmn); + + virNetlinkEventServiceStopAll(); + + if (driversInitialized) { + /* Set via daemonRunStateInit thread in daemonStateInit. + * NB: It is possible that virNetlinkEventServerStart fails + * and we jump to cleanup before driversInitialized has + * been set. That could leave things inconsistent; however, + * resolution of that possibility is perhaps more trouble than + * it's worth to handle. */
True, I guess, nevertheless it's not very useful either... ...
+ virObjectUnref(dmn);
Why ^here? Just because we're doing the cleanup the exact opposite way? The approach is right, but I think in general cleanup routines should be run first followed by releasing all the objects, so this should stay where it is right now - at the very end. I agree with the rest of the hunks. Reviewed-by: Erik Skultety <eskultet@redhat.com>

On 11/10/2017 10:23 AM, Erik Skultety wrote:
On Tue, Nov 07, 2017 at 09:39:56PM -0500, John Ferlan wrote:
Current cleanup processing is ad-hoc at best - it's led to a couple of strange and hard to diagnose timing problems and crashes.
So rather than perform cleanup in a somewhat random order, let's perform cleanup in the exact opposite order of startup.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- daemon/libvirtd.c | 42 +++++++++++++++++++++++++++++------------- 1 file changed, 29 insertions(+), 13 deletions(-)
diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index 87c5b22710..92037e9070 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -1524,9 +1524,35 @@ int main(int argc, char **argv) { 0, "shutdown", NULL, NULL);
cleanup: - virNetlinkEventServiceStopAll(); + /* NB: Order for cleanup should attempt to kept in the inverse order + * was was used to create/start the daemon - there are some fairly + * important reliances built into the startup processing that use + * refcnt's in order to manage objects. Removing too early could + * cause crashes. */
^Not a very useful commentary, more suitable for the commit message - so I preferred if you'd drop it from here.
I like it here because I won't necessarily go find the commit that added this code in order to know that we need to be very careful and orderly about the cleanup order. The only time I'd go back to a commit message is to perhaps point out why the order is the way it is when some other patch f*ks it up ;-)
virNetDaemonClose(dmn); + + virNetlinkEventServiceStopAll(); + + if (driversInitialized) { + /* Set via daemonRunStateInit thread in daemonStateInit. + * NB: It is possible that virNetlinkEventServerStart fails + * and we jump to cleanup before driversInitialized has + * been set. That could leave things inconsistent; however, + * resolution of that possibility is perhaps more trouble than + * it's worth to handle. */
True, I guess, nevertheless it's not very useful either...
...
It would be for someone chasing a very rogue and odd driver core. I think it was important to know/understand that there is an open window here which someone could drive a truck through. Maybe someone else has a more wonderful idea or set of patches that would avoid this... I could add an XXX in front of which seems to be our norm of - someone someday when they have copious free time and wants to pull their hair out could actually fix this problem.
+ virObjectUnref(dmn);
Why ^here? Just because we're doing the cleanup the exact opposite way? The approach is right, but I think in general cleanup routines should be run first followed by releasing all the objects, so this should stay where it is right now - at the very end.
Nothing after this theoretically relies on @dmn still existing. Nothing else uses it or has a reference to it.... John
I agree with the rest of the hunks.
Reviewed-by: Erik Skultety <eskultet@redhat.com>

On Fri, Nov 10, 2017 at 05:51:26PM -0500, John Ferlan wrote:
On 11/10/2017 10:23 AM, Erik Skultety wrote:
On Tue, Nov 07, 2017 at 09:39:56PM -0500, John Ferlan wrote:
Current cleanup processing is ad-hoc at best - it's led to a couple of strange and hard to diagnose timing problems and crashes.
So rather than perform cleanup in a somewhat random order, let's perform cleanup in the exact opposite order of startup.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- daemon/libvirtd.c | 42 +++++++++++++++++++++++++++++------------- 1 file changed, 29 insertions(+), 13 deletions(-)
diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index 87c5b22710..92037e9070 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -1524,9 +1524,35 @@ int main(int argc, char **argv) { 0, "shutdown", NULL, NULL);
cleanup: - virNetlinkEventServiceStopAll(); + /* NB: Order for cleanup should attempt to kept in the inverse order + * was was used to create/start the daemon - there are some fairly + * important reliances built into the startup processing that use + * refcnt's in order to manage objects. Removing too early could + * cause crashes. */
^Not a very useful commentary, more suitable for the commit message - so I preferred if you'd drop it from here.
I like it here because I won't necessarily go find the commit that added this code in order to know that we need to be very careful and orderly about the cleanup order.
The only time I'd go back to a commit message is to perhaps point out why the order is the way it is when some other patch f*ks it up ;-)
Well, I think that releasing memory in the inverse order should be an implicit rationale, you could at least shorten the commentary to something as simple as "Cleanup order needs to be kept strictly in the inverse order"...
virNetDaemonClose(dmn); + + virNetlinkEventServiceStopAll(); + + if (driversInitialized) { + /* Set via daemonRunStateInit thread in daemonStateInit. + * NB: It is possible that virNetlinkEventServerStart fails + * and we jump to cleanup before driversInitialized has + * been set. That could leave things inconsistent; however, + * resolution of that possibility is perhaps more trouble than + * it's worth to handle. */
True, I guess, nevertheless it's not very useful either...
...
It would be for someone chasing a very rogue and odd driver core. I think it was important to know/understand that there is an open window here which someone could drive a truck through.
Maybe I'm just paranoid enough to expect such a window to exist everywhere... not a fan of the commentary though... Reviewed-by: Erik Skultety <eskultet@redhat.com>

On 11/16/2017 02:58 AM, Erik Skultety wrote:
On Fri, Nov 10, 2017 at 05:51:26PM -0500, John Ferlan wrote:
On 11/10/2017 10:23 AM, Erik Skultety wrote:
On Tue, Nov 07, 2017 at 09:39:56PM -0500, John Ferlan wrote:
Current cleanup processing is ad-hoc at best - it's led to a couple of strange and hard to diagnose timing problems and crashes.
So rather than perform cleanup in a somewhat random order, let's perform cleanup in the exact opposite order of startup.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- daemon/libvirtd.c | 42 +++++++++++++++++++++++++++++------------- 1 file changed, 29 insertions(+), 13 deletions(-)
diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c index 87c5b22710..92037e9070 100644 --- a/daemon/libvirtd.c +++ b/daemon/libvirtd.c @@ -1524,9 +1524,35 @@ int main(int argc, char **argv) { 0, "shutdown", NULL, NULL);
cleanup: - virNetlinkEventServiceStopAll(); + /* NB: Order for cleanup should attempt to kept in the inverse order + * was was used to create/start the daemon - there are some fairly + * important reliances built into the startup processing that use + * refcnt's in order to manage objects. Removing too early could + * cause crashes. */
^Not a very useful commentary, more suitable for the commit message - so I preferred if you'd drop it from here.
I like it here because I won't necessarily go find the commit that added this code in order to know that we need to be very careful and orderly about the cleanup order.
The only time I'd go back to a commit message is to perhaps point out why the order is the way it is when some other patch f*ks it up ;-)
Well, I think that releasing memory in the inverse order should be an implicit rationale, you could at least shorten the commentary to something as simple as "Cleanup order needs to be kept strictly in the inverse order"...
I don't disagree about implicit rationale - unfortunately as seen in this cleanup reality is further from those logical thoughts.
virNetDaemonClose(dmn); + + virNetlinkEventServiceStopAll(); + + if (driversInitialized) { + /* Set via daemonRunStateInit thread in daemonStateInit. + * NB: It is possible that virNetlinkEventServerStart fails + * and we jump to cleanup before driversInitialized has + * been set. That could leave things inconsistent; however, + * resolution of that possibility is perhaps more trouble than + * it's worth to handle. */
True, I guess, nevertheless it's not very useful either...
...
It would be for someone chasing a very rogue and odd driver core. I think it was important to know/understand that there is an open window here which someone could drive a truck through.
Maybe I'm just paranoid enough to expect such a window to exist everywhere... not a fan of the commentary though...
Reviewed-by: Erik Skultety <eskultet@redhat.com>
Fair enough - point made. I'll adjust as follows and add the extra commentary about drivers initialized in the commit message for anyone that cares to go back through history... FWIW, all of cleanup: cleanup: /* Keep cleanup order in inverse order of startup */ virNetDaemonClose(dmn); virNetlinkEventServiceStopAll(); if (driversInitialized) { /* NB: Possible issue with timing window between driversInitialized * setting if virNetlinkEventServerStart fails */ driversInitialized = false; virStateCleanup(); } virObjectUnref(adminProgram); virObjectUnref(srvAdm); virObjectUnref(qemuProgram); virObjectUnref(lxcProgram); virObjectUnref(remoteProgram); virObjectUnref(srv); virObjectUnref(dmn); virNetlinkShutdown(); if (pid_file_fd != -1) virPidFileReleasePath(pid_file, pid_file_fd); VIR_FREE(run_dir); if (statuswrite != -1) { if (ret != 0) { /* Tell parent of daemon what failed */ char status = ret; while (write(statuswrite, &status, 1) == -1 && errno == EINTR) ; } VIR_FORCE_CLOSE(statuswrite); } VIR_FREE(sock_file); VIR_FREE(sock_file_ro); VIR_FREE(sock_file_adm); VIR_FREE(pid_file); VIR_FREE(remote_config_file); daemonConfigFree(config); return ret; John
participants (2)
-
Erik Skultety
-
John Ferlan