On Wed, Aug 24, 2022 at 06:14:51PM +0100, Daniel P. Berrangé wrote:
> On Wed, Aug 24, 2022 at 10:56:22AM -0500, Jonathon Jongsma wrote:
> > On 8/24/22 2:09 AM, Erik Skultety wrote:
> > > On Tue, Aug 23, 2022 at 12:43:03PM -0500, Jonathon Jongsma wrote:
> > > > Openstack developers reported that newly-created mdevs were not
> > > > recognized by libvirt until after a libvirt daemon restart. The
source
> > > > of the problem appears to be that when libvirt gets the udev
'add'
> > > > event, the sysfs tree for that device might not be ready and so
libvirt
> > > > waits 100ms for it to appear (max 100 waits of 1ms each). But in the
> > > > OpenStack environment, the sysfs tree for new mediated devices was
> > > > taking closer to 250ms to appear and therefore libvirt gave up
waiting
> > > > and didn't add these new devices to its list of nodedevs.
> > > >
> > > > By changing the wait time to 1 second (max 100 waits of 10ms each),
this
> > > > should provide enough time to enable these deployments to recognize
> > > > newly-created mediated devices, but it shouldn't increase the
delay for
> > > > more traditional deployments too much.
> > > >
> > > > Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=2109450
> > > >
> > > > Signed-off-by: Jonathon Jongsma <jjongsma(a)redhat.com>
> > > > ---
> > > >
> > > > Alternatively, we could switch to triggering off of the udev
'bind' event
> > > > rather than the 'add' event, but I wasn't able to
convince myself that this
> > > > would result in 100% compatible behavior, so this felt like the
safest
> > > > solution. If others can convince me that switching to 'bind'
is safe, I can
> > > > re-submit this patch.
> > >
> > > Is there a guarantee that the filesystem tree is ready by the time the
event
> > > arrives? I remember back in the day when I implemented this, this was
even
> > > discussed on the kernel list and the outcome was that each application
needs to
> > > sort this out on its own hinting that at least at that time there
wasn't
> > > any other way to do this reliably? Has something changed in the meantime?
> > >
> > > Erik
> > >
> >
> > I'm afraid I don't actually know if anything has changed in the kernel
in
> > this area. That's basically the reason that I proposed the approach that I
> > did. But I do know that in the bug referenced, the 'bind' event comes
about
> > 250ms later than the 'add' event. I'm not sure if the filesystem
tree is
> > necessarily ready on 'bind', but the fact that it is 250ms later means
that,
> > at minimum, there's a significantly better chance that it is ready by that
> > point than at the time of 'add'.
>
> Looking at the commit message for 'bind' events
>
>
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit...
>
> [quote]
> Also, many drivers create additional driver-specific device attributes
> when binding to the device, to provide userspace with additional controls.
> The new events allow userspace to adjust these driver-specific attributes
> without worrying that they are not there yet.
> [/quote]
>
> My reading of this is that we should NOT assume sysfs files exist
> until we've seen the 'bind' event.
>
> IOW, the reason we have this problem with sysfs is precisely because
> we hooked into 'add' instead of 'bind'. The sleep papers over the
> mistake, while using 'bind' instead would address the root cause.
From the quote you posted I don't see how we're guaranteed that the sysfs tree
is ready and it looks like each driver has too much room for interpreting
'bind' whichever way they want. That said, I agree that switching to listening
to 'bind' instead of 'add' is an improvement, however we still may need
to have
a timeout in place anyway as I still don't see any guarantees from the NVIDIA
driver that the sysfs tree will be ready by then.
Right, lets do both, so might as well commit this timeout change
immediately before release since it should be safe.
With regards,
Daniel
--
|: