On 10/8/21 2:23 PM, Daniel P. Berrangé wrote:
> On Fri, Oct 08, 2021 at 01:56:24PM +0200, Michal Prívozník wrote:
>> On 10/8/21 1:36 PM, Daniel P. Berrangé wrote:
>>> On Fri, Oct 08, 2021 at 01:12:49PM +0200, Michal Privoznik wrote:
>>>> When a server decides to close a client,
>>>> virNetServerClientCloseLocked() is called. Here, various
>>>> cleanup steps are taken, but the most important part (from this
>>>> commit's POV at least) is the way the socket is closed.
>>>> Firstly, removal of the client's socket from the event loop is
>>>> signalled, and then the socket is unrefed. The socket is not
>>>> closed just yet though, because the event loop still holds a
>>>> reference to it. This reference will be released as soon as the
>>>> event loop wakes up and starts issuing callbacks (in this case
>>>> virNetSocketEventFree()).
>>>>
>>>> So far, this is how things usually work. But if the daemon
>>>> reaches its limit on the number of open files, things start to
>>>> work differently.
>>>>
>>>> If the RLIMIT_NOFILE limit is reached and there's a client that
>>>> wants to connect, then the event loop wakes up, sees POLLIN on
>>>> the socket and calls virNetServerServiceAccept(), which in turn
>>>> calls virNetSocketAccept(). But because of the limit, accept()
>>>> fails with EMFILE, leaving the POLLIN event unhandled. The
>>>> dispatch then continues to the next FDs with events on them.
>>>> BUT, it will NOT call the socket removal callback
>>>> (virNetSocketEventFree()), because it has low priority
>>>> (G_PRIORITY_DEFAULT_IDLE). Per glib's documentation:
>>>
>>> if virNetSocketAccept fails, there's no client socket to
>>> remove/close. AFAIK, we are not removing the server
>>> socket either.
>>>
>>> So I'm not following which socket is supposedly being
>>> removed in this scenario?
>>
>> Right. I didn't phrase it properly. Let's call
>> /var/run/libvirt/libvirt-sock SockA and the accepted (i.e.
>> per-client) socket SockB.
>> Let the NOFILE limit be just the right value so that both SockA and
>> SockB can exist, but no more. Now, let's have another client trying
>> to connect(). The event loop wakes up, because there's an event on
>> SockA. But the connection can't be accept()-ed. Alright, so let the
>> already connected client disconnect (that is, SockB). This will
>> result in:
>>
>> g_idle_add(virEventGLibHandleRemoveIdle, SockB); /* I've taken a
>> shortcut here; in reality there's a data variable that points to
>> SockB, but you get the idea */
>>
>> Now, because we still haven't handled the connection attempt on
>> SockA, the idle source won't ever be dispatched, because it has
>> lower priority. Therefore, the event loop is stuck in an endless
>> loop trying to accept() the connection on SockA.
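
The starvation isn't libvirt-specific, by the way. Here's a minimal
standalone glib sketch (purely illustrative demo code, nothing from
the libvirt sources) where an always-ready FD watch at
G_PRIORITY_DEFAULT keeps a G_PRIORITY_DEFAULT_IDLE source from ever
running, just like the pending POLLIN on SockA starves the SockB
removal:

  #include <glib.h>
  #include <glib-unix.h>
  #include <unistd.h>

  static gboolean
  on_idle(gpointer data)
  {
      /* Never reached: the loop is never idle. */
      g_print("idle callback ran\n");
      return G_SOURCE_REMOVE;
  }

  static gboolean
  on_readable(gint fd, GIOCondition cond, gpointer data)
  {
      /* Deliberately don't read(), so the FD stays readable;
       * this mimics the never-accept()-ed connection on SockA. */
      return G_SOURCE_CONTINUE;
  }

  int
  main(void)
  {
      int fds[2];

      if (pipe(fds) < 0 || write(fds[1], "x", 1) < 0)
          return 1;

      /* FD watch at G_PRIORITY_DEFAULT ... */
      g_unix_fd_add(fds[0], G_IO_IN, on_readable, NULL);
      /* ... starves this G_PRIORITY_DEFAULT_IDLE source forever. */
      g_idle_add(on_idle, NULL);

      g_main_loop_run(g_main_loop_new(NULL, FALSE));
      return 0;
  }
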
> Ah, I see: "idle" callbacks aren't called because the event loop
> isn't idle.
>
> So with this change, we'll let the idle callback run, which will
> close the FD, which will let libvirtd accept the client, which gets
> us out of the 100% CPU burn loop... unless there were two
> pending clients...
>
> So, ACK to this patch as a quick fix, but it feels like we
> ultimately have a more serious problem.
>
> I wonder if we should temporarily stop watching events on
> the incoming server socket if we get EMFILE the first time on
> accept(), and set a timer to re-enable events 60 seconds later.
> Rinse, repeat.
Yes, good idea. But I worry that 60 seconds may be too long, or
needlessly short, depending on usage. Nevertheless, let's start with
60 seconds and fine-tune later.

So are you okay with me pushing this patch and sending this ^^ as a
follow-up?
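
Roughly, I'd imagine the follow-up along these lines (a sketch only;
the helper names and the port are invented for illustration and do
not match the actual libvirt code):

  #include <glib.h>
  #include <glib-unix.h>
  #include <errno.h>
  #include <sys/socket.h>
  #include <netinet/in.h>

  static gboolean on_incoming(gint fd, GIOCondition cond, gpointer data);

  /* One-shot timer: start watching the listening socket again. */
  static gboolean
  reenable_accept(gpointer fd_ptr)
  {
      g_unix_fd_add(GPOINTER_TO_INT(fd_ptr), G_IO_IN, on_incoming, NULL);
      return G_SOURCE_REMOVE;
  }

  static gboolean
  on_incoming(gint fd, GIOCondition cond, gpointer data)
  {
      int client = accept(fd, NULL, NULL);

      if (client < 0 && (errno == EMFILE || errno == ENFILE)) {
          /* Stop polling the listener so the loop can't spin on
           * POLLIN, and retry in 60 seconds. Rinse, repeat. */
          g_timeout_add_seconds(60, reenable_accept,
                                GINT_TO_POINTER(fd));
          return G_SOURCE_REMOVE;
      }

      /* ... hand 'client' over to the server as usual ... */
      return G_SOURCE_CONTINUE;
  }

  int
  main(void)
  {
      struct sockaddr_in sa = { 0 };
      int s = socket(AF_INET, SOCK_STREAM, 0);

      sa.sin_family = AF_INET;
      sa.sin_port = htons(16509);      /* arbitrary demo port */
      sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
      bind(s, (struct sockaddr *)&sa, sizeof(sa));
      listen(s, SOMAXCONN);

      g_unix_fd_add(s, G_IO_IN, on_incoming, NULL);
      g_main_loop_run(g_main_loop_new(NULL, FALSE));
      return 0;
  }
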
>>>> * Each event source is assigned a priority. The default priority,
>>>> * %G_PRIORITY_DEFAULT, is 0. Values less than 0 denote higher priorities.
>>>> * Values greater than 0 denote lower priorities. Events from high priority
>>>> * sources are always processed before events from lower priority sources.
>>>>
>>>> and per the g_idle_add() documentation:
>>>>
>>>> * Adds a function to be called whenever there are no higher priority
>>>> * events pending to the default main loop. The function is given the
>>>> * default idle priority, %G_PRIORITY_DEFAULT_IDLE.
>>>>
>>>> Now, because we did not accept() the client, we keep seeing
>>>> POLLIN on the main socket and thus the removal of the client
>>>> socket won't ever happen.
>>>>
>>>> The fix is to give the removal callback at least the same
>>>> priority as other sources, but since we want to just close an FD,
>>>> let's give it the highest priority and call it before handling
>>>> other events.
>>>>
>>>> This issue can be easily reproduced, for instance:
>>>>
>>>> # ulimit -S -n 40 (tweak this number if needed)
>>>> # ./src/libvirtd
>>>>
>>>> from another terminal:
>>>>
>>>> # for ((i=0; i<100; i++)); do virsh list & done; virsh list
>>>>
>>>> The last `virsh list` must not get stuck.
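
For reference, the fix boils down to raising the priority of the
removal source. Schematically it is this kind of change (an
illustrative diff, not the verbatim patch):

  -    g_idle_add(virEventGLibHandleRemoveIdle, data);
  +    g_idle_add_full(G_PRIORITY_HIGH,
  +                    virEventGLibHandleRemoveIdle, data, NULL);

G_PRIORITY_HIGH (-100) sorts before every default glib source, so the
FD gets closed before the loop tries to dispatch anything else.
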
>>>
>>> The bug below describes libvirt core dumps rather than
>>> hangs. Do you know why it dumped?
>>
>> Because of glib. We called some glib function which internally
>> wanted to open a pipe, which failed, and thus glib aborted. I don't
>> remember the exact function, but I can look it up if you want.
> Oh, so this patch is not actually fixing the core dump scenario
> then. Even after we fix the bug that prevents clients disconnecting,
> it will still dump if this API is called at a point where we are in
> an EMFILE state?
Yes. But I'm afraid there's no way around it. I found the stack trace:
Stack trace of thread 52231:
#0 0x000003ffa90be9e4 raise (libc.so.6)
#1 0x000003ffa96590ee g_log_default_handler (libglib-2.0.so.0)
#2 0x000003ffa965935e g_logv (libglib-2.0.so.0)
#3 0x000003ffa965955a g_log (libglib-2.0.so.0)
#4 0x000003ffa969dd70 g_wakeup_new (libglib-2.0.so.0)
#5 0x000003ffa964eb38 g_main_context_new (libglib-2.0.so.0)
#6 0x000003ffa9946886 vir_event_thread_init (libvirt.so.0)
#7 0x000003ffa95b5ff8 g_type_create_instance (libgobject-2.0.so.0)
#8 0x000003ffa95978cc g_object_new_internal (libgobject-2.0.so.0)
#9 0x000003ffa95990de g_object_new_with_properties (libgobject-2.0.so.0)
#10 0x000003ffa9599ab6 g_object_new (libgobject-2.0.so.0)
#11 0x000003ffa9946b78 virEventThreadNew (libvirt.so.0)
#12 0x000003ff6c2c8bd8 qemuProcessQMPNew (libvirt_driver_qemu.so)
#13 0x000003ff6c1f57b8 virQEMUCapsInitQMPSingle (libvirt_driver_qemu.so)
#14 0x000003ff6c1f59ec virQEMUCapsNewForBinaryInternal (libvirt_driver_qemu.so)
#15 0x000003ff6c1f5c92 virQEMUCapsNewData (libvirt_driver_qemu.so)
#16 0x000003ffa99510f4 virFileCacheValidate (libvirt.so.0)
#17 0x000003ffa995133c virFileCacheLookup (libvirt.so.0)
#18 0x000003ff6c1f5e4a virQEMUCapsCacheLookup (libvirt_driver_qemu.so)
#19 0x000003ff6c1f621e virQEMUCapsCacheLookupCopy (libvirt_driver_qemu.so)
#20 0x000003ff6c2c4a44 qemuProcessInit (libvirt_driver_qemu.so)
#21 0x000003ff6c2c67f2 qemuProcessStart (libvirt_driver_qemu.so)
#22 0x000003ff6c26358c qemuDomainObjStart.constprop.55 (libvirt_driver_qemu.so)
#23 0x000003ff6c2639c8 qemuDomainCreateWithFlags (libvirt_driver_qemu.so)
#24 0x000003ffa9b77902 virDomainCreate (libvirt.so.0)
#25 0x000002aa34ad2d52 remoteDispatchDomainCreateHelper (libvirtd)
#26 0x000003ffa9a6094a virNetServerProgramDispatch (libvirt.so.0)
#27 0x000003ffa9a665b6 virNetServerHandleJob (libvirt.so.0)
#28 0x000003ffa99a0d32 virThreadPoolWorker (libvirt.so.0)
#29 0x000003ffa99a02d6 virThreadHelper (libvirt.so.0)
#30 0x000003ffa85880c6 start_thread (libpthread.so.0)
#31 0x000003ffa917bd36 thread_start (libc.so.6)
So my bad, it failed opening an eventfd (in frame #4, g_wakeup_new()),
but anyway, I don't think we can mitigate this. I think there was
another stack trace attached to the BZ where it failed in glib's
pipe() call. It doesn't really matter though: it's glib, and we
accepted its abort policy when we adopted it.
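
FWIW, that abort path is trivially reproducible outside libvirt too.
A tiny sketch, assuming only what the trace above shows, i.e. that
g_wakeup_new() g_error()s when it can't get an FD:

  #include <glib.h>
  #include <sys/resource.h>

  int
  main(void)
  {
      /* Allow no FDs beyond stdin/stdout/stderr, so any new
       * open()/pipe()/eventfd() fails with EMFILE. */
      struct rlimit rl = { 3, 3 };

      setrlimit(RLIMIT_NOFILE, &rl);

      /* g_main_context_new() -> g_wakeup_new() -> eventfd()/pipe()
       * fails -> glib g_error()s and aborts, as in frames #4-#5. */
      g_main_context_new();
      return 0;
  }
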
Michal