On Thu, Feb 18, 2021 at 12:04:56PM -0600, Jonathon Jongsma wrote:
On Wed, 17 Feb 2021 14:35:51 +0100
Erik Skultety <eskultet(a)redhat.com> wrote:
> > +static void
> > +mdevctlEventHandleCallback(GFileMonitor *monitor G_GNUC_UNUSED,
> > + GFile *file,
> > + GFile *other_file G_GNUC_UNUSED,
> > + GFileMonitorEvent event_type,
> > + gpointer user_data)
> > +{
> > + udevEventDataPtr priv = user_data;
> > + /* if a new directory appears, monitor that directory for
> > changes */
> > + if (event_type == G_FILE_MONITOR_EVENT_CREATED &&
> > + g_file_query_file_type(file, G_FILE_QUERY_INFO_NONE, NULL)
> > ==
> > + G_FILE_TYPE_DIRECTORY) {
> > + GList *newmonitors = monitorFileRecursively(priv, file);
> > + priv->mdevctlMonitors =
> > g_list_concat(priv->mdevctlMonitors, newmonitors);
>
> priv->mdevctlMonitors is shared among threads, but you only acquire
> the mutex just before exec'ing mdevctl in the mdevctlHandler thread,
> so ^this can yield unexpected results in terms of directories
> monitored.
A) It doesn't really affect the directories being monitored, it only
affects our ability to free these monitors later in the destructor.
B) Aside from the very first time this variable is set in
nodeStateInitialize(), the only place that accesses the mdevctlMonitors
variable is this glib signal callback (to add new monitors to the list)
and the destructor (to free the monitors). And I believe those should
always run in the main thread. So I don't think there's really much of
a thread-safety issue here. Perhaps there's a slight race in
nodeStateInitialize() which appears to be called from a non-main
thread. If a directory gets created immediately after connecting to the
file monitor signal but before the function returns and the GList gets
assigned to priv->mdevctlMonitors, I suppose the signal handler could
run and it could get corrupted. Would you like me to protect all
accesses with a Mutex?
If for some reason we need to extend the code in the future, it's so easy to
miss a single spot. So, I'd say - better safe than sorry - let's protect the
variable the way it's supposed to be even though the race might not exist right
now.
Erik
>
> > + }
> > +
> > + /* When mdevctl creates a device, it can result in multiple
> > notify events
> > + * emitted for a single logical change (e.g. several CHANGED
> > events, or a
> > + * CREATED and CHANGED event followed by CHANGES_DONE_HINT).
> > To avoid
> > + * spawning a mdevctl thread multiple times for a single
> > logical
> > + * configuration change, try to coalesce these changes by
> > waiting for the
> > + * CHANGES_DONE_HINT event. As a fallback, add a timeout to
> > trigger the
> > + * signal if that event never comes */
>
> So there's no guaranteed pattern to monitor, is that so? IOW Even the
> CHANGES_DONE_HINT may not actually arrive? That's a very poor design.
> I'd like to ditch the timer as much as possible.
Maybe we can simply rely on the CHANGES_DONE_HINT. I tend to be
conservative and was trying to handle the case where the
CHANGES_DONE_HINT might not be delivered (it is called a "hint" after
all). Part of this was caused by my reading this comment in a test
case within the glib repository:
/* Sometimes the emission of 'CHANGES_DONE_HINT' may be late because
* it depends on the ability of file monitor implementation to report
* 'CHANGES_DONE_HINT' itself. If the file monitor implementation
* doesn't report 'CHANGES_DONE_HINT' itself, it may be emitted by
* GLocalFileMonitor after a few seconds, which causes the event to
* mix with results from different steps. Since 'CHANGES_DONE_HINT'
* is just a hint, we don't require it to be reliable and we simply
* ignore unexpected 'CHANGES_DONE_HINT' events here. */
So it is not reliably communicated if your file monitor backend doesn't
implement it. But the above comment seems to suggest that if the file monitor
backend doesn't support it, the event will be generated by glib itself and will
simply be delayed by a couple of seconds rather than missing altogether. I'd
have to read the glib code to tell whether it's possible for it to be omitted
altogether. I guess I don't see the timer as a horrible workaround, but I can
ditch it if you want.
Alternatively I suppose I could simply kick off a new thread for each file
change event and not bother trying to coalesce them at all. This will result in
perhaps a an extra thread or two being launched when mdevctl modifies
the mdev registry behind our backs, but that's unlikely to be a common
occurrence and so shouldn't really be a significant performance issue.
If there may be multiple events connected to a single change, which one would
trigger the actual device list update? I guess I'm starting to be okay with the
timeout in the end.
Erik