On Fri, Oct 08, 2021 at 01:12:49PM +0200, Michal Privoznik wrote:
When a server decides to close a client, the
virNetServerClientCloseLocked() is called. In here various
cleanup steps are taken, but the most important part (from this
commit's POV at least) is the way that the socket is closed.
Firstly, removal of the socket associated with the client from
the event loop is signalized and then the socket is unrefed. The
socket is not closed just yet though, because the event loop
holds a reference to it. This reference will be freed as soon as
the event loop wakes up and starts issuing callbacks (in this
case virNetSocketEventFree()).
So far, this is how things usually work. But if the daemon
reaches the number of opened files limit, things start to work
differently.
If the RLIMIT_NOFILE limit is reached and there's a client that
wants to connect then the event loop wakes up, sees POLLIN on the
socket and calls virNetServerServiceAccept() which in turn calls
virNetSocketAccept(). But because of the limit, accept() fails
with EMFILE leaving the POLLIN event unhandled. The dispatch then
continues to next FDs with events on them. BUT, it will NOT call
the socket removal callback (virNetSocketEventFree()) because it
has low priority (G_PRIORITY_DEFAULT_IDLE). Per glib's
documentation:
if virNetSocketAccept fails, there's no client socket to
remove/close. AFAIK, we are not removing the server
socket either.
So I'm not following which socket is supposedly being
removed in this scenario ?
* Each event source is assigned a priority. The default priority,
* %G_PRIORITY_DEFAULT, is 0. Values less than 0 denote higher priorities.
* Values greater than 0 denote lower priorities. Events from high priority
* sources are always processed before events from lower priority sources.
and per g_idle_add() documentation:
* Adds a function to be called whenever there are no higher priority
* events pending to the default main loop. The function is given the
* default idle priority, %G_PRIORITY_DEFAULT_IDLE.
Now, because we did not accept() the client we are constantly
seeing POLLIN on the main socket and thus the removal of the
client socket won't ever happen.
The fix is to set at least the same priority as other sources,
but since we want to just close an FD, let's give it the highest
priority and call it before handling other events.
This issue can be easily reproduced, for instance:
# ulimit -S -n 40 (tweak this number if needed)
# ./src/libvirtd
from another terminal:
# for ((i=0; i<100; i++)); do virsh list & done; virsh list
The last `virsh list` must not get stuck.
The bug below describes libvirt core dumps rather than
hangs. Do you know why it dumped ?
Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=2007168
Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
src/util/vireventglib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/util/vireventglib.c b/src/util/vireventglib.c
index f3e5a344b0..983787932f 100644
--- a/src/util/vireventglib.c
+++ b/src/util/vireventglib.c
@@ -287,7 +287,7 @@ virEventGLibHandleRemove(int watch)
* 'removed' to prevent reuse
*/
data->removed = TRUE;
- g_idle_add(virEventGLibHandleRemoveIdle, data);
+ g_idle_add_full(G_PRIORITY_HIGH, virEventGLibHandleRemoveIdle, data, NULL);
ret = 0;
--
2.32.0
Regards,
Daniel
--
|:
https://berrange.com -o-
https://www.flickr.com/photos/dberrange :|
|:
https://libvirt.org -o-
https://fstop138.berrange.com :|
|:
https://entangle-photo.org -o-
https://www.instagram.com/dberrange :|