
On Thu, 2024-07-04 at 08:10 +0100, Daniel P. Berrangé wrote:
On Wed, Jul 03, 2024 at 02:44:37PM +0200, Tim Wiederhake wrote:
`pthread_mutex_destroy`, `pthread_mutex_lock` and `pthread_mutex_unlock` return an error code that is currently ignored.
Add debug information if one of these operations failed, e.g. when there is an attempt to destroy a still locked mutex or unlock an already unlocked mutex. Both scenarios are considered undefined behavior.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com> --- src/util/virthread.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/src/util/virthread.c b/src/util/virthread.c index 5422bb74fd..14116a2221 100644 --- a/src/util/virthread.c +++ b/src/util/virthread.c @@ -35,7 +35,10 @@ #include "viralloc.h" #include "virthreadjob.h" +#include "virlog.h" +#define VIR_FROM_THIS VIR_FROM_THREAD +VIR_LOG_INIT("util.thread"); int virOnce(virOnceControl *once, virOnceFunc init) { @@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m) void virMutexDestroy(virMutex *m) { - pthread_mutex_destroy(&m->lock); + if (pthread_mutex_destroy(&m->lock)) { + VIR_WARN("Failed to destroy mutex=%p", m); + } } void virMutexLock(virMutex *m) { - pthread_mutex_lock(&m->lock); + if (pthread_mutex_lock(&m->lock)) { + VIR_WARN("Failed to lock mutex=%p", m); + } } void virMutexUnlock(virMutex *m) { - pthread_mutex_unlock(&m->lock); + if (pthread_mutex_unlock(&m->lock)) { + VIR_WARN("Failed to unlock mutex=%p", m); + } }
I'd be surprised if these lock/unlock warnings ever trigger, since IIUC they would need us to be using an error checking mutex, not a regular mutex. IOW, aren't these just adding condition test overhead + unreachable code to the lock calls ?
The 2nd patch shows failures in the destroy calls IIUC.
With regards, Daniel
I have looked more closely into the issue now. pthread_mutex_lock and pthread_mutex_unlock do indeed not return a non-zero value over us not using error checking mutexes. During my last attempt at fixing the issues I had a patch that would count lockings and unlockings of mutexes explicitly, and I believe I recall seeing problems in that area as well. Sadly, I cannot reproduce that now, at least not reliably: Ignoring the warnings for pthread_mutex_destroy, virnetdaemontest does seem to trigger my "number of locks == number of unlocks" check in about 3 out of 10.000 runs. And sometimes with a frequency of 1 in 10. Sometimes not at all. In any case: I do not consider the checks for locking / unlocking dead code. So far I have been using the test suite to check for obvious issues, but I cannot rule out that libvirt itself has race conditions too. I would advocate for merging this patch as is, and add a patch to enable error checking for the mutexes. Regards, Tim