On Thu, Jul 18, 2024 at 03:50:27PM +0200, Tim Wiederhake wrote:
> On Thu, 2024-07-04 at 08:10 +0100, Daniel P. Berrangé wrote:
> > On Wed, Jul 03, 2024 at 02:44:37PM +0200, Tim Wiederhake wrote:
> > > `pthread_mutex_destroy`, `pthread_mutex_lock` and `pthread_mutex_unlock`
> > > return an error code that is currently ignored.
> > >
> > > Add debug information if one of these operations failed, e.g. when there
> > > is an attempt to destroy a still locked mutex or unlock an already
> > > unlocked mutex. Both scenarios are considered undefined behavior.
> > >
> > > Signed-off-by: Tim Wiederhake <twiederh(a)redhat.com>
> > > ---
> > >  src/util/virthread.c | 15 ++++++++++++---
> > >  1 file changed, 12 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/src/util/virthread.c b/src/util/virthread.c
> > > index 5422bb74fd..14116a2221 100644
> > > --- a/src/util/virthread.c
> > > +++ b/src/util/virthread.c
> > > @@ -35,7 +35,10 @@
> > >
> > >  #include "viralloc.h"
> > >  #include "virthreadjob.h"
> > > +#include "virlog.h"
> > >
> > > +#define VIR_FROM_THIS VIR_FROM_THREAD
> > > +VIR_LOG_INIT("util.thread");
> > >
> > >  int virOnce(virOnceControl *once, virOnceFunc init)
> > >  {
> > > @@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m)
> > >
> > >  void virMutexDestroy(virMutex *m)
> > >  {
> > > -    pthread_mutex_destroy(&m->lock);
> > > +    if (pthread_mutex_destroy(&m->lock)) {
> > > +        VIR_WARN("Failed to destroy mutex=%p", m);
> > > +    }
> > >  }
> > >
> > >  void virMutexLock(virMutex *m)
> > >  {
> > > -    pthread_mutex_lock(&m->lock);
> > > +    if (pthread_mutex_lock(&m->lock)) {
> > > +        VIR_WARN("Failed to lock mutex=%p", m);
> > > +    }
> > >  }
> > >
> > >  void virMutexUnlock(virMutex *m)
> > >  {
> > > -    pthread_mutex_unlock(&m->lock);
> > > +    if (pthread_mutex_unlock(&m->lock)) {
> > > +        VIR_WARN("Failed to unlock mutex=%p", m);
> > > +    }
> > >  }
> >
> > I'd be surprised if these lock/unlock warnings ever trigger, since IIUC
> > they would need us to be using an error checking mutex, not a regular
> > mutex. IOW, aren't these just adding condition test overhead + unreachable
> > code to the lock calls ?
> >
> > The 2nd patch shows failures in the destroy calls IIUC.
>
> I have looked more closely into the issue now. pthread_mutex_lock and
> pthread_mutex_unlock indeed never return a non-zero value, because we are
> not using error checking mutexes.
>
> During my last attempt at fixing the issues I had a patch that would
> count lockings and unlockings of mutexes explicitly, and I believe I
> recall seeing problems in that area as well. Sadly, I cannot reproduce
> that now, at least not reliably: Ignoring the warnings for
> pthread_mutex_destroy, virnetdaemontest does seem to trigger my "number
> of locks == number of unlocks" check in about 3 out of 10,000 runs. And
> sometimes with a frequency of 1 in 10. Sometimes not at all. In any
> case: I do not consider the checks for locking / unlocking dead code.
>
> So far I have been using the test suite to check for obvious issues,
> but I cannot rule out that libvirt itself has race conditions too.

But if the checks never fire, due to the mutex type, surely this isn't
helping us diagnose anything ?

> I would advocate for merging this patch as is, and add a patch to
> enable error checking for the mutexes.

Error checking mutexes aren't something we can easily enable, because
they break any code which needs to hold a mutex across fork() and
unlock it in the child. IOW, we would need two different types of mutex
and pick which to use, where.
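
To make the mutex-type point concrete, here is a minimal sketch written
against plain POSIX threads, not libvirt's virMutex wrappers. The demo_*
names are made up for illustration and do not exist in libvirt. It only
shows the two cases POSIX defines for PTHREAD_MUTEX_ERRORCHECK; with a
normal mutex the same misuse is undefined behaviour, so the return value
cannot be relied on there.

```c
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_ERRORCHECK under strict -std= */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper (not a libvirt API): initialise a mutex of the given
 * POSIX type. Returns 0 on success or a pthread error code. */
static int demo_mutex_init(pthread_mutex_t *m, int type)
{
    pthread_mutexattr_t attr;
    int rc;

    if ((rc = pthread_mutexattr_init(&attr)) != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Unlock an error-checking mutex that this thread never locked.
 * POSIX requires the call to fail with EPERM for this mutex type. */
static int demo_unlock_unowned(void)
{
    pthread_mutex_t m;
    int rc;

    rc = demo_mutex_init(&m, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_unlock(&m);      /* misuse: mutex is not held */
    pthread_mutex_destroy(&m);
    return rc;
}

/* Lock an error-checking mutex twice from the same thread.
 * POSIX requires the second lock to fail with EDEADLK rather than hang. */
static int demo_relock(void)
{
    pthread_mutex_t m;
    int rc;

    rc = demo_mutex_init(&m, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_lock(&m);
    assert(rc == 0);
    rc = pthread_mutex_lock(&m);        /* misuse: already held by us */
    pthread_mutex_unlock(&m);           /* release the first lock */
    pthread_mutex_destroy(&m);
    return rc;
}
```

Calling demo_unlock_unowned() and demo_relock() from a small main() and
printing the results is enough to see the non-zero codes. The ownership
check that produces them is the same one that gets in the way when a
mutex locked before fork() has to be unlocked in the child.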
With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|