On Thu, Aug 31, 2017 at 12:23:09PM -0400, John Ferlan wrote:
> On 08/29/2017 08:49 AM, Erik Skultety wrote:
> On Mon, Aug 28, 2017 at 12:26:08PM -0400, John Ferlan wrote:
>>
>>
>> On 08/24/2017 07:23 AM, Erik Skultety wrote:
>>> Adjust udevEventHandleThread to be a proper thread routine running in an
>>> infinite loop handling devices. Also introduce udevEventThreadData
>>> private structure.
>>> Every time there's an incoming event from udev, udevEventHandleCallback
>>> only increments the number of events queuing on the monitor and signals
>>> the worker thread to handle them.
>>>
>>> Signed-off-by: Erik Skultety <eskultet@redhat.com>
>>> ---
>>> src/node_device/node_device_udev.c | 132 ++++++++++++++++++++++++++++++-------
>>> 1 file changed, 108 insertions(+), 24 deletions(-)
>>>
> Let me take my head out of the VxHS sandbox ;-) and peek into this other
> quagmire!
>>
>> Once we get here I'm not sure I understand the udev interaction. I guess
>> it's not "crystal clear" that the socket relationship is a queue of
>> device events. I also haven't studied the udev code nor have I been
>> working through this as much as you have lately - so perhaps something
>> you've uncovered will help educate me in this manner. Still here's my
>> thoughts and a small sliver of investigative data...
>>
>> So far there's been a 1-to-1 interaction between libvirt and udev event.
>> With this change there could be an n-to-1 interaction - as in receive @n
>> devices.
>>
>> Up to this point the interaction has been:
>>
>> 1. udev event
>> 2. validate fd/udev_monitor
>> 3. call udev_monitor_receive_device to get @device
>> 4. process @device
>> 5. unref @device
>> 6. return to udev
>>
>> After this point
>>
>> 1. udev event
>> 2. validate fd/udev_monitor
>> 3. increase event count
>> 4. signal condition to thread to wakeup
>> 5. return to udev
>>
>> asynchronously in a loop
>>
>> 1. wakeup on condition and for each 'event' refcnt
>> 2. decrease event refcnt
>> 3. validate fd/udev_monitor
>> 4. call udev_monitor_receive_device to get @device
>> 5. process @device
>> 6. unref @device
>>
>> If there's more than one event, does udev_monitor_receive_device
>> guarantee the order? Are we sure udev buffers/queues in this manner?
>
> This should be fairly deterministic as it's not that much of a udev problem as
> it is a kernel problem (since we're talking about a netlink socket which is
> internally represented by a 'struct socket' structure in the kernel which in
> fact implements socket buffers, more specifically - very important - queues).
> So, if the kernel didn't guarantee the order of packets/events on the socket, you'd
> experience the same problem with calling udev_monitor_receive_device, which
> basically only does recvmsg(), directly from udevEventHandleCallback. IOW if
> that was the case, udev itself could get easily confused by the kernel events
> about a device being deleted that had not been previously added etc. So, yes,
> the order has to be guaranteed and if it is not, then it's a kernel bug.
>
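To make that concrete, here is a rough illustration of the mechanics
udev_monitor_receive_device() sits on top of - this is not libudev's actual
implementation, just a raw NETLINK_KOBJECT_UEVENT socket showing that the
kernel queues the messages per socket and a plain recv() drains them in
order:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/netlink.h>

    int main(void)
    {
        struct sockaddr_nl sa;
        char buf[8192];
        ssize_t n;
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_KOBJECT_UEVENT);

        if (fd < 0)
            return 1;

        memset(&sa, 0, sizeof(sa));
        sa.nl_family = AF_NETLINK;
        sa.nl_groups = 1;       /* kernel uevent multicast group */
        if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
            return 1;

        /* Each recv() returns the next message the kernel queued on this
         * socket; ordering is preserved by the socket's receive queue. */
        while ((n = recv(fd, buf, sizeof(buf) - 1, 0)) > 0) {
            buf[n] = '\0';
            printf("uevent: %s\n", buf);    /* e.g. "add@/devices/..." */
        }

        close(fd);
        return 0;
    }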
> Ah... OK great - I hadn't dug into that code and figured you'd been
> looking more in depth. I think as part of the commit message for this
> (or whatever this evolves into), a description of how the udev
> processing works could help. At the very least something that says that
> kernel udev keeps messages in a queue on the socket so they can be
> pulled off by udev_monitor_receive_device; if too many messages
> stockpile, ENOSPC is generated, while if we try to pull something off
> but nothing exists, then EAGAIN is generated - since the socket has a
> remote end (libvirt) that "should" be notified when data arrives (or
> something like that).
OK, the commit message is going to be enhanced, but probably not this one,
rather the one for patch 4/6.
...
> Seems to me then 'nevents' counting is pointless other than to indicate
> that you got "some" event that needs to be processed.
>
> When the thread wakes up or is scheduled, it processes all it can find
> (locked of course) and then clears the flag indicating the presence of
> events. Of course, that blocks the Callback function from setting the
> flag that there's an event.
>
> Thus we'd still have a somewhat similar problem to the one you're trying
> to fix, but also introduce a new one: the thread wakes up and locks the
> private data, the event is signaled again indicating the same event
> waiting, but the callback cannot get the private data lock; meanwhile the
> thread processes all events and releases the lock. This allows the other
> thread to gain the lock and set the new event bit, but since we've already
> processed it - when the thread wakes up it finds no event and generates
> the bogus message.
It doesn't process everything it can; it only processes up to the N events
it already knows about. So if there are 2 events and you're about to
process the second one, you lock the private data, decrement the event
count, process the device, and then either wait (if the handle callback
wasn't already blocked on the lock trying to increment @nevents again) or
go on to process any new events the handle callback has signaled in the
meantime. So the important thing is that the problem you're describing
cannot happen, since the worker thread *does* rely on the actual number of
events (the ones it knows about) waiting on the monitor; it's not a
pointless counter.
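Roughly, the idea is the following; this is just a sketch of the two sides,
not the literal patch code (the struct layout, the @monitor field and the
udevHandleOneDevice() helper name are illustrative):

    #include <libudev.h>
    #include "virthread.h"

    /* Illustrative only: private data shared between the event-loop
     * callback and the worker thread. */
    typedef struct _udevEventThreadData {
        virMutex lock;                  /* protects @nevents */
        virCond cond;                   /* signaled by the callback */
        unsigned long long nevents;     /* events queued on the monitor */
        struct udev_monitor *monitor;
    } udevEventThreadData;

    /* Event-loop side: only count the event and signal the worker,
     * then return to the event loop as quickly as possible. */
    static void
    udevEventHandleCallback(int watch, int fd, int events, void *opaque)
    {
        udevEventThreadData *priv = opaque;

        virMutexLock(&priv->lock);
        priv->nevents++;
        virCondSignal(&priv->cond);
        virMutexUnlock(&priv->lock);
    }

    /* Worker side: consume exactly as many events as it has been told
     * about, sleeping on the condition when the count drops to zero. */
    static void
    udevEventHandleThread(void *opaque)
    {
        udevEventThreadData *priv = opaque;

        while (1) {
            struct udev_device *device;

            virMutexLock(&priv->lock);
            while (priv->nevents == 0)
                virCondWait(&priv->cond, &priv->lock);
            priv->nevents--;
            virMutexUnlock(&priv->lock);

            /* Pulls the next queued event off the netlink socket. */
            device = udev_monitor_receive_device(priv->monitor);
            if (!device)
                continue;   /* nothing there after all, e.g. EAGAIN */

            udevHandleOneDevice(device);    /* illustrative helper */
            udev_device_unref(device);
        }
    }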
> IIUC, then patch4 resolves this by disabling the main event loop from
> sending events until we've processed them. Thus should patch4 be merged
> with this one?

To be precise, patch 4 only takes into account the existence of a
scheduler :). While I could merge 4/6 into this one, I'd rather not, simply
because I tried to make all the changes as gradual as possible and I'm
afraid that if I merged it, the change and the meaning of 4/6 might easily
get overlooked, which is quite the opposite of what I was trying to
achieve: I wanted to make the change and the corresponding issue explicit,
in case someone hits a similar issue in the future.
> In any case, that's some nice detective work - I can only imagine this
> was *really* frustrating to dig into! Some day it'll be "fixed" and we
> won't trip into the problem, but until then, I now understand things a
> lot better...
Well, honestly, it could have been much worse given libudev's codebase:
there are exit paths that do not set any errno, they just return NULL
straight away, and I still have to find out where libudev actually outputs
its error messages, since I'm pretty sure it's not within our context, as
I've been told. Trying to debug a race by investigating libudev's internals
and exit points is not something I would describe as fun, nor something I
would recommend in any way.
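For illustration, the kind of check I mean looks roughly like this (a
sketch only; tryReceiveDevice() is a made-up name and @monitor is assumed
to be the existing udev monitor):

    #include <errno.h>
    #include <libudev.h>

    /* Sketch: udev_monitor_receive_device() returns NULL both for the
     * benign "nothing to read" case and for real failures, and some of
     * libudev's internal exit paths return NULL without setting errno at
     * all, which is what makes the failure hard to report accurately. */
    static struct udev_device *
    tryReceiveDevice(struct udev_monitor *monitor)
    {
        struct udev_device *device;

        errno = 0;
        device = udev_monitor_receive_device(monitor);
        if (!device) {
            if (errno == EAGAIN) {
                /* benign: nothing was queued on the socket after all */
            } else if (errno == 0) {
                /* one of libudev's exit paths that returns NULL without
                 * setting errno - no way to tell what went wrong */
            } else {
                /* a genuine receive error reported via errno */
            }
        }
        return device;
    }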
Erik