On Thu, Oct 18, 2018 at 12:04:55PM -0400, John Ferlan wrote:
[...]
>>
>> The one thing I have forgotten or perhaps I should say struck me -
>> should this code use virCondWaitUntil or would it just not matter?
>>
>> Let's say 10 devices were added and on the 10th one we had this issue,
>> set dataReady to false, but then didn't get another udev device ready
>> event for "a while" leaving that 10th one in limbo until such time a
new
>> device was available (e.g. udevEventHandleCallback is called by udev).
>
> Can you be more specific on how we could utilize virCondWaitUntil? Because the
> way I see it, it wouldn't help us in either of the cases because the problem
> isn't the thread waiting to be woken up to process an event rather it's the
> recvmsg syscall in libudev which would cause the issue on old CentOS.
> Looking at your example, if you put virCondWaitUntil in the !device conditional
> path, how would that prevent us to hang after processing the last device if we
> never got a chance to even hit the !device code path, since we kept getting
> legitimate events until the last device where we got several events for a
> single piece of data? It very well might be the jetlag speaking for me, but I
> just don't see it in your example.
>
> Erik
>
I guess it was the "getting of legitimate events" that I couldn't recall
the exact workings. My "theory" was we got a group of 10 devices, failed
on the last one, and didn't get another call for let's say hours. Then
we have this device in limbo just because udev told us too soon an we
acted, but the scheduler had other ideas vis-a-vis filling in the
@device. I'm fine with things as they are still.
Aha, now I see your thoughts. So, what you're mentioning was a different, kinda
related issue with mdevs where the sysfs tree wasn't completely ready at the
time of emission of the event. However, we need to forget about that udev might
tell us about the device too soon, because we technically got the device, it's
just the device is not not fully ready, but that's a different piece of code
handling that situation.
With your code however, you wouldn't really remedy the problem, because on
CentOS 6 the !device branch is essentially dead code and on CentOS 7 the code
is just unnecessarily more complex, less readable and doesn't really change
(let alone help) the status quo. Thanks for sharing the details though,
it helped me understand your thoughts.
Erik
More details if you care to read... First, I think the theory of
thinking about using Until processing works better than the actual
practice of coding it - Until would requiring using virTimeMillisNow
which can fail so we need to account for that...
My general thought was without writing a lick of code was:
bool timedWait = false;
int rc;
...
while (!priv->dataReady && !priv->threadQuit) {
errno = 0;
if (!timedWait)
rc = virCondWait(...)
else
rc = virCondWaitUntil(...);
if (rc < 0 && errno != ETIMEDOUT) {
virReportSystemError ...
}
timedWait = false;
...
if (!device) {
...
virObjectLock(priv);
priv->dataReady = false;
virObjectUnlock(priv);
timedWait = true;
...
IOW: Only use the timedWait when/if !device caused the "continue;". I
also note now that the virCondWait error check doesn't check < 0, just
that the return is not 0... Maybe that was copied from virFDStreamThread
since other callers check < 0.
Once I started writing code I realized the virTimeMillisNow issue, so
then things would become:
errno = 0;
if (timedWait) {
/* If we fail this, then just use virCondWait */
if (virTimeMillisNow(&now) < 0) {
timedWait = false;
} else {
when = now + 1000ull;
rc = virCondWait(&priv->threadCond,
&priv->parent.lock,
when);
}
}
if (!timedWait)
rc = virCondWait(&priv->threadCond, &priv->parent.lock);
if (rc < 0 && errno != ETIMEDOUT) {
Much uglier and scarier than I first thought.
John