Re: [libvirt] [PATCH v4 5/7] nodedev: Disable/re-enable polling on the udev fd

2 Oct 2017


      On Fri, Sep 29, 2017 at 09:46:48AM -0400, John Ferlan wrote:
...
On 09/28/2017 06:00 AM, Erik Skultety wrote:
...
[...]
nodeDeviceLock();
+        priv = driver->privateData;
         udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
if (!udevEventCheckMonitorFD(udev_monitor, privateData->monitor_fd)) {
@@ -1725,6 +1727,9 @@ udevEventHandleThread(void *opaque)
         device = udev_monitor_receive_device(udev_monitor);
         nodeDeviceUnlock();
+        /* Re-enable polling for new events on the @udev_monitor */
+        virEventUpdateHandle(priv->watch, VIR_EVENT_HANDLE_READABLE);
+
I think this should only be done when privateData->nevents == 0?  If we
have multiple events to read, then calling virEventPollUpdateHandle
(eventually) for every pass through the loop seems like a bit of
overkill especially if udevEventHandleCallback turns right around and
disables it again.
Also fortunately there isn't more than one udev thread sending the
events since you access the priv->watch without the driver lock...
Conversely perhaps we only disable if events > 1... Then again, how does
one get to 2 events queued if we're disabling as soon as we increment
Very good point, technically events would still get queued, we just wouldn't
check and yes, we would process 1 event at a time. Not optimal, but if you look
at the original code and compare it with this one performance-wise (and I hope
I haven't missed anything that would render everything I write next complete
rubbish), the time complexity hasn't changed, the space complexity hasn't
changed, what changed is code complexity which makes the code a bit slower due
to the excessive locking and toggling the FD polling back and forth. So
essentially you've got the same thing as you had before..but asynchronous.
However, yes, the usage of @nevents is completely useless now (I hadn't realized
that immediately, thanks) and simple signalling should suffice.
Having it "slower" isn't necessarily bad ;-)  That gives some of the other
slower buggers a chance to fill in the details we need. Throwing the
control back to udev quicker could aid in that too.
I'm sorry, but I don't follow, how do you hand control back over to something
you can't control? udev is not paused in any way during the phase libvirt is
processing the events, so it keeps pushing new events to the socket queue,
until it can't push any more.
...
...
So how could we make it faster though? I thought more about the idea you shared
in one of the previous reviews, letting the thread actually pull all the data
from the monitor, to which I IIRC replied something in the sense that the event
counting mechanism wouldn't allow that and it would break. Okay, let's drop the
event counting. What if we now let both the udev handler thread and the event
loop "poll" the file descriptor, IOW let the event loop poll the monitor fd,
thus invoking udevHandleCallback which would in turn keep signalling the handler
thread that there are some data. The difference now in the handler thread would
be that it wouldn't blindly trust the callback about the data, because of the
scheduling issue, it would keep poking the monitor itself until it gets either
EAGAIN or EWOULDBLOCK from recvmsg() called in libudev, which would then be the
signal to start waiting on the condition. The next time an event appears, a signal
from udevHandleCallback would finally have a meaning and wouldn't be ignored.
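For illustration only, the "keep poking until EAGAIN or EWOULDBLOCK" part of
the idea above could be sketched roughly as follows. This is not libvirt code:
a plain non-blocking file descriptor stands in for the udev monitor socket,
read() stands in for the recvmsg() call made inside libudev, and the names
receive_one() and drain_events() are hypothetical. In the real handler thread,
a return of "queue empty" would be the point at which it goes back to waiting
on the condition variable.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for udev_monitor_receive_device(): pull one
 * one-byte "event" from a non-blocking fd.
 * Returns 1 if an event was read, 0 if the queue is empty
 * (EAGAIN/EWOULDBLOCK), and -1 on EOF or any other error. */
static int
receive_one(int fd, char *event)
{
    ssize_t n;

    do {
        n = read(fd, event, 1);
    } while (n < 0 && errno == EINTR);

    if (n == 1)
        return 1;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* nothing queued: caller should wait for a signal */
    return -1;      /* "failed for some reason, but we can't tell why" */
}

/* Drain everything currently queued instead of trusting a single
 * notification from the event loop. Returns the number of events
 * handled, or -1 if reading failed. */
static int
drain_events(int fd)
{
    int handled = 0;
    char event;
    int rc;

    while ((rc = receive_one(fd, &event)) == 1)
        handled++;  /* real code would process the device here */

    return rc < 0 ? -1 : handled;
}
```

With this shape, a notification that arrives while the loop is still draining
is harmless: the events it refers to are consumed anyway, and the loop only
stops once the descriptor itself reports that nothing is left.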
Is making it faster really a goal?  It's preferable that it works
"I would think it would be overkill to disable/enable for just 1 event." I was
trying to come up with something faster based on that statement.
...
consistently I think. The various errno possibilities and the desire to
avoid the "udev_monitor_receive_device returned NULL" message processing
because we had too many "cooks in the kitchen" trying to determine
whether a new device was really available or was this just another
notification for something we're already processing.
Also, not that I expect the udev code to change, but if a new errno is
added then we may have to keep up... Always a fear especially if we're
Not necessarily, that would mean that either libudev replaces recvmsg with
something else or that recvmsg starts returning more/different errnos to signal
that there are no data to be pulled - since there are already 2 such errnos,
it's highly unlikely IMHO. Unless libudev changes substantially, in terms of
error signaling, I think we can safely ignore the other errnos as the actual
outcome for libvirt is that libudev failed because of some socket error, but
since that can either come from kernel or from libudev itself, the best thing
we can do is shrug our shoulders and say "libudev failed for some reason, but
we can't tell why".
...
using the errno to dictate our algorithm.
...
This way, it's actually the handler thread who's 'the boss', as most of the
signals from udevHandleCallback would be lost/NOPs.
Would you agree to such an approach?
At some point we get too fancy and think too hard about a problem
letting it consume us.  I think when I presented my idea - I wasn't
really understanding the details of how we consume the udev data. Now
that I have a bit more of a clue - perhaps the original idea wasn't so
good. If we receive multiple notifications that a device is ready to be
processed even though we could be processing it - how are we to truly
know whether we missed one or we really got it and udev was just
reminding us again.
udev doesn't remind us of anything, it's our event loop that does - if we don't
pre-process the events somehow, we can't tell them apart and even if we did, it
would be utterly unreliable since you'd have to include some kind of nonce and
a hash to be really sure that it's a duplicate event and not a device that was
un-plugged and re-plugged again in a very short period of time. So, this issue
is not a matter of our communication with udev, it's about how well we can
manage our discover-pull-process event algorithm.
...
I'm not against looking at a different mechanism - the question then
becomes from your perspective is it worth it?
I think the ultimate question is how fast do we want to deliver a fix. Of
course I'm open to explore more approaches, but we can do that even with a
"preliminary" fix included (even though, honestly, once this is merged, I doubt
anyone will find time to do any kind of experiments unless there's another
breakage to fix in this code). So, based on what I just wrote, I'm inclined to
say no, it really wouldn't be of much worth.

Even though I came up with another proposal, I personally still like the previous
one-by-one event version, since despite (perhaps) being a bit
slower, it's a more transparent and consistent (minus the event counter)
solution than the one I proposed the last time.

Erik