On Fri, Oct 08, 2021 at 03:13:27PM +0200, Michal Prívozník wrote:
> On 10/8/21 2:23 PM, Daniel P. Berrangé wrote:
>> On Fri, Oct 08, 2021 at 01:56:24PM +0200, Michal Prívozník wrote:
>>> On 10/8/21 1:36 PM, Daniel P. Berrangé wrote:
>>
>> So, ACK to this patch as a quick fix, but it feels like we
>> ultimately have a more serious problem.
>>
>> I wonder if we should temporarily stop watching events on
>> the incoming server if we get EMFILE first time on accept,
>> and set a timer to re-enable events 60 seconds later. Rinse,
>> repeat.
> Yes, good idea. But I worry that 60 seconds may be too long. Or
> needlessly short - depending on usage. Nevertheless, let us start with
> 60 seconds and we can fine-tune later.
Once we're in this out-of-FDs situation, we're in a bad mess no
matter what, so the length of the timeout isn't a big issue IMHO.
Essentially we just need to stop burning CPU gratuitously.
Even 5 seconds is long enough to achieve that goal, so just
pick a number you're happy with - I don't feel strongly about
it being 60 seconds, it was just an arbitrary number I picked.
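
Something along these lines is what I had in mind - a completely
untested sketch in plain GLib terms rather than our own event APIs,
with made-up names (struct incoming, RETRY_SECS, the callbacks) just
to show the shape of it:

  #include <glib.h>
  #include <glib-unix.h>
  #include <sys/socket.h>
  #include <unistd.h>
  #include <errno.h>

  #define RETRY_SECS 60              /* or 5 - anything that stops the spin */

  struct incoming {
      int listen_fd;                 /* the listening socket */
      guint watch;                   /* id of the POLLIN watch, 0 while paused */
  };

  static gboolean on_incoming(gint fd, GIOCondition cond, gpointer opaque);

  /* Timer callback: re-arm the POLLIN watch on the listening socket. */
  static gboolean
  resume_incoming(gpointer opaque)
  {
      struct incoming *inc = opaque;

      inc->watch = g_unix_fd_add(inc->listen_fd, G_IO_IN, on_incoming, inc);
      return G_SOURCE_REMOVE;
  }

  static gboolean
  on_incoming(gint fd, GIOCondition cond G_GNUC_UNUSED, gpointer opaque)
  {
      struct incoming *inc = opaque;
      int client = accept(fd, NULL, NULL);

      if (client < 0) {
          if (errno == EMFILE || errno == ENFILE) {
              /* Out of FDs: drop the watch so the main loop stops
               * spinning on POLLIN, and retry once the timer fires. */
              inc->watch = 0;
              g_timeout_add_seconds(RETRY_SECS, resume_incoming, inc);
              return G_SOURCE_REMOVE;
          }
          return G_SOURCE_CONTINUE;  /* transient error, keep listening */
      }

      /* Real code would register the new client with the server here. */
      close(client);
      return G_SOURCE_CONTINUE;
  }

The exact re-enable policy doesn't matter much; the point is just that
the listener stops generating POLLIN wakeups while accept() cannot
succeed.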
> So are you okay with me pushing this patch and sending this ^^ as a
> follow-up?
Yes, consider it Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
>>
>>
>>>>> * Each event source is assigned a priority. The default priority,
>>>>> * %G_PRIORITY_DEFAULT, is 0. Values less than 0 denote higher priorities.
>>>>> * Values greater than 0 denote lower priorities. Events from high priority
>>>>> * sources are always processed before events from lower priority sources.
>>>>>
>>>>> and per g_idle_add() documentation:
>>>>>
>>>>> * Adds a function to be called whenever there are no higher priority
>>>>> * events pending to the default main loop. The function is given the
>>>>> * default idle priority, %G_PRIORITY_DEFAULT_IDLE.
>>>>>
>>>>> Now, because we did not accept() the client we are constantly
>>>>> seeing POLLIN on the main socket and thus the removal of the
>>>>> client socket won't ever happen.
>>>>>
>>>>> The fix is to set at least the same priority as other sources,
>>>>> but since we want to just close an FD, let's give it the highest
>>>>> priority and call it before handling other events.
>>>>>
>>>>> This issue can be easily reproduced, for instance:
>>>>>
>>>>> # ulimit -S -n 40 (tweak this number if needed)
>>>>> # ./src/libvirtd
>>>>>
>>>>> from another terminal:
>>>>>
>>>>> # for ((i=0; i<100; i++)); do virsh list & done; virsh list
>>>>>
>>>>> The last `virsh list` must not get stuck.
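
As an aside, in plain GLib terms the priority difference described
above amounts to roughly the following; this is only an illustrative
sketch with made-up names (close_client_cb, schedule_client_close),
not the actual patch:

  #include <glib.h>
  #include <unistd.h>

  /* Hypothetical stand-in for the deferred "close this client" callback. */
  static gboolean
  close_client_cb(gpointer opaque)
  {
      close(GPOINTER_TO_INT(opaque));
      return G_SOURCE_REMOVE;          /* one-shot source */
  }

  static void
  schedule_client_close(int fd)
  {
      /* Plain g_idle_add() queues the callback at G_PRIORITY_DEFAULT_IDLE
       * (200), i.e. below the default-priority I/O watches, so it starves
       * for as long as POLLIN keeps firing on the listening socket:
       *
       *   g_idle_add(close_client_cb, GINT_TO_POINTER(fd));
       *
       * Queueing it at a higher priority gets it dispatched before the
       * default-priority sources are looked at again: */
      g_idle_add_full(G_PRIORITY_HIGH, close_client_cb,
                      GINT_TO_POINTER(fd), NULL);
  }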
>>>>
>>>> The bug below describes libvirt core dumps rather than
>>>> hangs. Do you know why it dumped?
>>>
>>> Because of glib. We called some glib function which internally wanted to
>>> open a pipe which failed. Thus glib aborted. I don't remember the exact
>>> function, but I can look it up if you want.
>>
>> Oh, so this patch is not actually fixing the core dump scenario
>> then. Even after we fix the bug that prevents clients disconnecting,
>> it will still dump if this API is called at a point where we are in
>> an EMFILE state?
> Yes. But I am afraid there's no way around it. Found the stack trace:
>
> Stack trace of thread 52231:
> #0 0x000003ffa90be9e4 raise (libc.so.6)
> #1 0x000003ffa96590ee g_log_default_handler (libglib-2.0.so.0)
> #2 0x000003ffa965935e g_logv (libglib-2.0.so.0)
> #3 0x000003ffa965955a g_log (libglib-2.0.so.0)
> #4 0x000003ffa969dd70 g_wakeup_new (libglib-2.0.so.0)
> #5 0x000003ffa964eb38 g_main_context_new (libglib-2.0.so.0)
> #6 0x000003ffa9946886 vir_event_thread_init (libvirt.so.0)
> #7 0x000003ffa95b5ff8 g_type_create_instance (libgobject-2.0.so.0)
> #8 0x000003ffa95978cc g_object_new_internal (libgobject-2.0.so.0)
> #9 0x000003ffa95990de g_object_new_with_properties (libgobject-2.0.so.0)
> #10 0x000003ffa9599ab6 g_object_new (libgobject-2.0.so.0)
> #11 0x000003ffa9946b78 virEventThreadNew (libvirt.so.0)
> #12 0x000003ff6c2c8bd8 qemuProcessQMPNew (libvirt_driver_qemu.so)
> #13 0x000003ff6c1f57b8 virQEMUCapsInitQMPSingle (libvirt_driver_qemu.so)
> #14 0x000003ff6c1f59ec virQEMUCapsNewForBinaryInternal (libvirt_driver_qemu.so)
> #15 0x000003ff6c1f5c92 virQEMUCapsNewData (libvirt_driver_qemu.so)
> #16 0x000003ffa99510f4 virFileCacheValidate (libvirt.so.0)
> #17 0x000003ffa995133c virFileCacheLookup (libvirt.so.0)
> #18 0x000003ff6c1f5e4a virQEMUCapsCacheLookup (libvirt_driver_qemu.so)
> #19 0x000003ff6c1f621e virQEMUCapsCacheLookupCopy (libvirt_driver_qemu.so)
> #20 0x000003ff6c2c4a44 qemuProcessInit (libvirt_driver_qemu.so)
> #21 0x000003ff6c2c67f2 qemuProcessStart (libvirt_driver_qemu.so)
> #22 0x000003ff6c26358c qemuDomainObjStart.constprop.55 (libvirt_driver_qemu.so)
> #23 0x000003ff6c2639c8 qemuDomainCreateWithFlags (libvirt_driver_qemu.so)
> #24 0x000003ffa9b77902 virDomainCreate (libvirt.so.0)
> #25 0x000002aa34ad2d52 remoteDispatchDomainCreateHelper (libvirtd)
> #26 0x000003ffa9a6094a virNetServerProgramDispatch (libvirt.so.0)
> #27 0x000003ffa9a665b6 virNetServerHandleJob (libvirt.so.0)
> #28 0x000003ffa99a0d32 virThreadPoolWorker (libvirt.so.0)
> #29 0x000003ffa99a02d6 virThreadHelper (libvirt.so.0)
> #30 0x000003ffa85880c6 start_thread (libpthread.so.0)
> #31 0x000003ffa917bd36 thread_start (libc.so.6)
> So my bad, it failed opening an eventfd (in frame #4, g_wakeup_new()), but
> anyway, I don't think we can mitigate this. But I think there was
> another stack trace attached to the BZ where it failed in glib's pipe().
> It doesn't really matter though. It's glib - we accepted its abort policy
> when we adopted it.
Yeah, we're basically doomed with FD exhaustion. If a host got into
this state once, it's likely to do it again, so the only real fix is
to re-configure the host to have sufficient FDs. Anything we do to
cope with FD exhaustion is just a "nice to have", not a real fix.
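
FWIW, the abort itself is easy to demonstrate outside libvirt; a rough
sketch (not the reproducer from the BZ) that starves a process of FDs
and then asks GLib for a new main context:

  #include <glib.h>
  #include <sys/resource.h>

  int main(void)
  {
      /* Keep only stdin/stdout/stderr; any new FD would need to be >= 3,
       * which the limit now forbids, so eventfd()/pipe() fail with EMFILE. */
      struct rlimit rl = { .rlim_cur = 3, .rlim_max = 3 };
      setrlimit(RLIMIT_NOFILE, &rl);

      /* g_wakeup_new() cannot get its eventfd/pipe and GLib calls g_error(),
       * which aborts - there is no error return to handle here. */
      g_main_context_new();

      return 0;
  }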
Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|