On 8/24/22 2:09 AM, Erik Skultety wrote:
> On Tue, Aug 23, 2022 at 12:43:03PM -0500, Jonathon Jongsma wrote:
> > OpenStack developers reported that newly-created mdevs were not
> > recognized by libvirt until after a libvirt daemon restart. The source
> > of the problem appears to be that when libvirt gets the udev 'add'
> > event, the sysfs tree for that device might not be ready and so libvirt
> > waits 100ms for it to appear (max 100 waits of 1ms each). But in the
> > OpenStack environment, the sysfs tree for new mediated devices was
> > taking closer to 250ms to appear and therefore libvirt gave up waiting
> > and didn't add these new devices to its list of nodedevs.
> >
> > By changing the wait time to 1 second (max 100 waits of 10ms each), this
> > should provide enough time to enable these deployments to recognize
> > newly-created mediated devices, but it shouldn't increase the delay for
> > more traditional deployments too much.
> >
> > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2109450
> >
> > Signed-off-by: Jonathon Jongsma <jjongsma(a)redhat.com>
> > ---
> >
> > Alternatively, we could switch to triggering off of the udev 'bind' event
> > rather than the 'add' event, but I wasn't able to convince myself that this
> > would result in 100% compatible behavior, so this felt like the safest
> > solution. If others can convince me that switching to 'bind' is safe, I can
> > re-submit this patch.
>
> Is there a guarantee that the filesystem tree is ready by the time the event
> arrives? I remember back in the day when I implemented this, this was even
> discussed on the kernel list and the outcome was that each application needs to
> sort this out on its own hinting that at least at that time there wasn't
> any other way to do this reliably? Has something changed in the meantime?
>
> Erik
>
I'm afraid I don't actually know if anything has changed in the kernel in
this area. That's basically the reason that I proposed the approach that I
did. But I do know that in the bug referenced, the 'bind' event comes about
250ms later than the 'add' event. I'm not sure if the filesystem tree is
necessarily ready on 'bind', but the fact that it is 250ms later means that,
at minimum, there's a significantly better chance that it is ready by that
point than at the time of 'add'.
In that case I'd accept this solution over bind, since on a loaded system you
have neither a guarantee that the filesystem tree is ready by the time bind is
delivered nor a guarantee that bind cannot be delayed for a significantly
longer period (less likely).
So, from my POV:
Reviewed-by: Erik Skultety <eskultet(a)redhat.com>
to the patch as is.
Regards,
Erik
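
For readers following the thread, the bounded wait discussed above can be
sketched roughly as below. This is only an illustration of the idea (poll for
the sysfs path a fixed number of times with a fixed sleep in between), not the
actual libvirt code: the names wait_for_path() and max_wait_ms() are
hypothetical and were made up for this sketch.

```c
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not libvirt's function): check whether `path`
 * exists, retrying up to `max_tries` times and sleeping `interval_ms`
 * milliseconds after each failed check. Returns true as soon as the
 * path is visible, false if every attempt failed. */
static bool
wait_for_path(const char *path, unsigned int max_tries,
              unsigned int interval_ms)
{
    struct timespec delay = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (long)(interval_ms % 1000) * 1000000L,
    };

    for (unsigned int i = 0; i < max_tries; i++) {
        if (access(path, F_OK) == 0)
            return true;
        nanosleep(&delay, NULL);
    }
    return false;
}

/* Worst-case total wait in milliseconds for a given retry budget. */
static unsigned int
max_wait_ms(unsigned int max_tries, unsigned int interval_ms)
{
    return max_tries * interval_ms;
}
```

With the numbers from the patch description, max_wait_ms(100, 1) is 100ms (the
old budget, too short for the ~250ms seen in the bug) and max_wait_ms(100, 10)
is 1000ms (the new budget). Since the loop returns as soon as the path shows
up, the larger budget only costs extra time when the sysfs tree is genuinely
slow to appear or never appears at all.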