On Mon, Aug 28, 2017 at 12:26:08PM -0400, John Ferlan wrote:
On 08/24/2017 07:23 AM, Erik Skultety wrote:
> Adjust udevEventHandleThread to be a proper thread routine running in an
> infinite loop handling devices. Also introduce udevEventThreadData
> private structure.
> Every time there's and incoming event from udev, udevEventHandleCallback
> only increments the number of events queuing on the monitor and signals
> the worker thread to handle them.
>
> Signed-off-by: Erik Skultety <eskultet(a)redhat.com>
> ---
> src/node_device/node_device_udev.c | 132 ++++++++++++++++++++++++++++++-------
> 1 file changed, 108 insertions(+), 24 deletions(-)
>
Once we get here I'm not sure I understand the udev interaction. I guess
it's not "crystal clear" that the socket relationship is a queue of
device events. I also haven't studied the udev code nor have I been
working through this as much as you have lately - so perhaps something
you've uncovered will help educate me in this manner. Still here's my
thoughts and a small sliver of investigative data...
So far there's been a 1-to-1 interaction between libvirt and udev event.
With this change there could be an n-to-1 interaction - as in receive @n
devices.
Up to this point the interaction has been:
1. udev event
2. validate fd/udev_monitor
3. call udev_monitor_receive_device to get @device
4. process @device
5. unref @device
6. return to udev
After this point
1. udev event
2. validate fd/udev_monitor
3. increase event count
4. signal condition to thread to wakeup
5. return to udev
asynchronously in a loop
1. wakeup on condition and for each 'event' refcnt
2. decrease event refcnt
3. validate fd/udev_monitor
4. call udev_monitor_receive_device to get @device
5. process @device
6. unref @device
If there's more than one event, does udev_monitor_receive_device
guarantee the order? Are we sure udev buffers/queues in this manner?
This should be fairly deterministic as it's not that much of a udev problem as
it is a kernel problem (since we're talking about a netlink socket which is
internally represented by a 'struct socket' structure in the kernel which in
fact implements socket buffers, more specifically - very important - queues).
So, if kernel didn't guarantee the order of packets/events on the socket, you'd
experience the same problem with calling udev_monitor_receive_device, which
basically only does recvmsg(), directly from udevEventHandleCallback. IOW if
that was the case, udev itself could get easily confused by the kernel events
about a device being deleted that had not been previously added etc. So, yes,
the order has to be guaranteed and if it is not, then it's a kernel bug.
That is since we're not grabbing @device before we return to udev
when
the event is signaled, is there any guarantee from udev that *the same*
device will still exist when we do get around to making our call? My
concern being how "long" is the data guaranteed to be there.
As I said, unless the socket buffer size limit has been reached, in which case
ENOSPC was returned by libudev - I had a BZ on this:
https://bugzilla.redhat.com/show_bug.cgi?id=1450960
As I read the subsequent patch - I'm thinking perhaps we really want to
keep that 1-to-1 relationship. Could the reason that you're getting
those messages be because we're now calling udev out of the order it
expected? Do we know why we're getting a NULL return? I found:
Yeah, I was getting EAGAIN and it wasn't until that point that I realized that
the scheduler came into play, IOW the main event loop thread polls the file
descriptor list, invoking the corresponding handlers, which means that
udevEventHandleCallback just increased the overall number of events waiting to
be processed by the handler thread, except that since no one actually extracted
the data out of the socket and the thread hadn't been scheduled in the meantime
it's still the same data that the main event loop thread finds waiting on the
socket the next time it polls the @fd, therefore incrementing the event counter
multiple times for the same event. Naturally, when the handler thread finally
removes the event from the monitor, not knowing the counter had incremented
for this event multiple times, it will fail to remove any more events from the
monitor, as none had arrived in the meantime, therefore the EAGAIN.
https://sourcecodebrowser.com/udev/141/libudev-monitor_8c.html
which seems to indicate a number of reasons to return NULL... Maybe udev
is processing another device and while doing so it blocks anyone calling
udev_monitor_receive_device. Conversely while processing an event that
same block wouldn't be present because the expectation is that the
called function will call udev_monitor_receive_device.
If this was the case, sooner or later we'd hit the problem anyway.
I think perhaps the processing thread is fine to have; however, instead
of processing the event it's processing an @device that was managed in
udevEventHandleCallback and put onto some "worker queue". At least that
This would work nicely as well, I just went with a simpler solution where I
didn't have to create another device queue and only offloaded the whole
processing onto the handler thread, so it's just a matter taste :).
Erik
way we know we have a refcnt'd udev_device and we can process in
an
expected order.
John
> diff --git a/src/node_device/node_device_udev.c
b/src/node_device/node_device_udev.c
> index fe943c78b..444e5be4d 100644
> --- a/src/node_device/node_device_udev.c
> +++ b/src/node_device/node_device_udev.c
> @@ -59,6 +59,54 @@ struct _udevPrivate {
> bool privileged;
> };
>
> +typedef struct _udevEventThreadData udevEventThreadData;
> +typedef udevEventThreadData *udevEventThreadDataPtr;
> +
> +struct _udevEventThreadData {
> + virMutex lock;
> + virCond threadCond;
> +
> + bool threadQuit;
> + int monitor_fd;
> + unsigned long long nevents; /* number of udev events queuing on monitor */
> +};
> +
> +
> +static udevEventThreadDataPtr
> +udevEventThreadDataNew(int fd)
> +{
> + udevEventThreadDataPtr ret = NULL;
> +
> + if (VIR_ALLOC_QUIET(ret) < 0)
> + return NULL;
> +
> + if (virMutexInit(&ret->lock) < 0)
> + goto cleanup;
> +
> + if (virCondInit(&ret->threadCond) < 0)
> + goto cleanup_mutex;
> +
> + ret->monitor_fd = fd;
> +
> + return ret;
> +
> + cleanup_mutex:
> + virMutexDestroy(&ret->lock);
> +
> + cleanup:
> + VIR_FREE(ret);
> + return NULL;
> +}
> +
> +
> +static void
> +udevEventThreadDataFree(udevEventThreadDataPtr data)
> +{
> + virMutexDestroy(&data->lock);
> + virCondDestroy(&data->threadCond);
> + VIR_FREE(data);
> +}
> +
>
> static bool
> udevHasDeviceProperty(struct udev_device *dev,
> @@ -1647,35 +1695,53 @@ udevCheckMonitorFD(struct udev_monitor *udev_monitor, int
fd)
>
>
> static void
> -udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
> +udevEventHandleThread(void *opaque)
> {
> + udevEventThreadDataPtr privateData = opaque;
> struct udev_device *device = NULL;
> struct udev_monitor *udev_monitor = NULL;
> - int fd = (intptr_t) opaque;
>
> - nodeDeviceLock();
> - udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
> + /* continue rather than break from the loop on non-fatal errors */
> + while (1) {
> + virMutexLock(&privateData->lock);
> + while (privateData->nevents == 0 && !privateData->threadQuit)
{
> + if (virCondWait(&privateData->threadCond,
&privateData->lock)) {
> + virReportSystemError(errno, "%s",
> + _("handler failed to wait on
condition"));
> + goto cleanup;
> + }
> + }
>
> - if (!udevCheckMonitorFD(udev_monitor, fd))
> - goto unlock;
> + privateData->nevents--;
> + virMutexUnlock(&privateData->lock);
>
> - device = udev_monitor_receive_device(udev_monitor);
> - if (device == NULL) {
> - virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> - _("udev_monitor_receive_device returned NULL"));
> - goto unlock;
> + nodeDeviceLock();
> + udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
> +
> + if (!udevCheckMonitorFD(udev_monitor, privateData->monitor_fd)) {
> + nodeDeviceUnlock();
> + goto cleanup;
> + }
> +
> + device = udev_monitor_receive_device(udev_monitor);
> + nodeDeviceUnlock();
> +
> + if (!device) {
> + virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> + _("udev_monitor_receive_device returned
NULL"));
> + goto next;
> + }
> +
> + udevHandleOneDevice(device);
> +
> + next:
> + udev_device_unref(device);
> }
>
> - nodeDeviceUnlock();
> - udevHandleOneDevice(device);
> -
> cleanup:
> udev_device_unref(device);
> + udevEventThreadDataFree(privateData);
> return;
> -
> - unlock:
> - nodeDeviceUnlock();
> - goto cleanup;
> }
>
>
> @@ -1683,20 +1749,28 @@ static void
> udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
> int fd,
> int events ATTRIBUTE_UNUSED,
> - void *data ATTRIBUTE_UNUSED)
> + void *opaque)
> {
> struct udev_monitor *udev_monitor = NULL;
> + udevEventThreadDataPtr threadData = opaque;
>
> nodeDeviceLock();
> udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
> -
> if (!udevCheckMonitorFD(udev_monitor, fd)) {
> + virMutexLock(&threadData->lock);
> + threadData->threadQuit = true;
> + virCondSignal(&threadData->threadCond);
> + virMutexUnlock(&threadData->lock);
> +
> nodeDeviceUnlock();
> return;
> }
> nodeDeviceUnlock();
>
> - udevEventHandleThread((void *)(intptr_t) fd);
> + virMutexLock(&threadData->lock);
> + threadData->nevents++;
> + virCondSignal(&threadData->threadCond);
> + virMutexUnlock(&threadData->lock);
> }
>
>
> @@ -1823,6 +1897,9 @@ nodeStateInitialize(bool privileged,
> {
> udevPrivate *priv = NULL;
> struct udev *udev = NULL;
> + int monitor_fd = -1;
> + virThread th;
> + udevEventThreadDataPtr threadData = NULL;
>
> if (VIR_ALLOC(priv) < 0)
> return -1;
> @@ -1883,6 +1960,14 @@ nodeStateInitialize(bool privileged,
> 128 * 1024 * 1024);
> #endif
>
> + monitor_fd = udev_monitor_get_fd(priv->udev_monitor);
> + if (!(threadData = udevEventThreadDataNew(monitor_fd)) ||
> + virThreadCreate(&th, false, udevEventHandleThread, threadData) < 0)
{
> + virReportSystemError(errno, "%s",
> + _("failed to create udev handling
thread"));
> + goto cleanup;
> + }
> +
> /* We register the monitor with the event callback so we are
> * notified by udev of device changes before we enumerate existing
> * devices because libvirt will simply recreate the device if we
> @@ -1891,9 +1976,8 @@ nodeStateInitialize(bool privileged,
> * enumeration. The alternative is to register the callback after
> * we enumerate, in which case we will fail to create any devices
> * that appear while the enumeration is taking place. */
> - priv->watch = virEventAddHandle(udev_monitor_get_fd(priv->udev_monitor),
> - VIR_EVENT_HANDLE_READABLE,
> - udevEventHandleCallback, NULL, NULL);
> + priv->watch = virEventAddHandle(monitor_fd, VIR_EVENT_HANDLE_READABLE,
> + udevEventHandleCallback, threadData, NULL);
> if (priv->watch == -1)
> goto unlock;
>
>