On Thu, 2024-07-04 at 08:10 +0100, Daniel P. Berrangé wrote:
On Wed, Jul 03, 2024 at 02:44:37PM +0200, Tim Wiederhake wrote:
> `pthread_mutex_destroy`, `pthread_mutex_lock` and `pthread_mutex_unlock`
> return an error code that is currently ignored.
>
> Add debug information if one of these operations failed, e.g. when there
> is an attempt to destroy a still locked mutex or unlock an already
> unlocked mutex. Both scenarios are considered undefined behavior.
>
> Signed-off-by: Tim Wiederhake <twiederh(a)redhat.com>
> ---
> src/util/virthread.c | 15 ++++++++++++---
> 1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/src/util/virthread.c b/src/util/virthread.c
> index 5422bb74fd..14116a2221 100644
> --- a/src/util/virthread.c
> +++ b/src/util/virthread.c
> @@ -35,7 +35,10 @@
>
> #include "viralloc.h"
> #include "virthreadjob.h"
> +#include "virlog.h"
>
> +#define VIR_FROM_THIS VIR_FROM_THREAD
> +VIR_LOG_INIT("util.thread");
>
> int virOnce(virOnceControl *once, virOnceFunc init)
> {
> @@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m)
>
> void virMutexDestroy(virMutex *m)
> {
> - pthread_mutex_destroy(&m->lock);
> + if (pthread_mutex_destroy(&m->lock)) {
> + VIR_WARN("Failed to destroy mutex=%p", m);
> + }
> }
>
> void virMutexLock(virMutex *m)
> {
> - pthread_mutex_lock(&m->lock);
> + if (pthread_mutex_lock(&m->lock)) {
> + VIR_WARN("Failed to lock mutex=%p", m);
> + }
> }
>
> void virMutexUnlock(virMutex *m)
> {
> - pthread_mutex_unlock(&m->lock);
> + if (pthread_mutex_unlock(&m->lock)) {
> + VIR_WARN("Failed to unlock mutex=%p", m);
> + }
> }

I'd be surprised if these lock/unlock warnings ever trigger, since IIUC
they would need us to be using an error checking mutex, not a regular
mutex. IOW, aren't these just adding condition test overhead + unreachable
code to the lock calls?

The 2nd patch shows failures in the destroy calls IIUC.

With regards,
Daniel

I have now looked more closely into the issue. pthread_mutex_lock and
pthread_mutex_unlock do indeed not return a non-zero value here, since we
are not using error-checking mutexes.
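For illustration, a minimal standalone sketch (not libvirt code; `probe_errorcheck_mutex` is a hypothetical name) of why the return codes only carry information for an error-checking mutex: per POSIX, a `PTHREAD_MUTEX_ERRORCHECK` mutex reports a relock by its owner as `EDEADLK` and an unlock by a thread that does not hold it as `EPERM`. With the default mutex type the same misuse is undefined behaviour, so there is no error code to rely on.

```c
#define _XOPEN_SOURCE 700  /* exposes PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Illustrative probe: misuse an error-checking mutex on purpose and
 * record the codes pthread returns for the two misuse cases. */
static void
probe_errorcheck_mutex(int *relock_rc, int *double_unlock_rc)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&lock);
    assert(rc == 0);
    /* The owner locks again: reported as EDEADLK instead of hanging. */
    *relock_rc = pthread_mutex_lock(&lock);

    rc = pthread_mutex_unlock(&lock);
    assert(rc == 0);
    /* Unlocking a mutex this thread no longer holds: reported as EPERM. */
    *double_unlock_rc = pthread_mutex_unlock(&lock);
    (void)rc;

    pthread_mutex_destroy(&lock);
}
```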

During my previous attempt at fixing these issues, I had a patch that
explicitly counted mutex lock and unlock operations, and I recall seeing
problems in that area as well. Unfortunately, I cannot reproduce them now,
at least not reliably: ignoring the pthread_mutex_destroy warnings,
virnetdaemontest seems to trigger my "number of locks == number of
unlocks" check in about 3 out of 10,000 runs, sometimes as often as 1 in
10, and sometimes not at all. In any case, I do not consider the
lock/unlock checks to be dead code.
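The counting patch mentioned above is not part of this thread, so the following is only a minimal sketch, assuming a hypothetical `CountedMutex` wrapper around a plain pthread mutex, of what a "number of locks == number of unlocks" check could look like. Both counters are updated while the mutex is held (under correct usage), so no atomics are needed, and the balance is checked when the mutex is destroyed.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical wrapper, not libvirt's virMutex: counts lock and unlock
 * operations so that an imbalance can be detected at destroy time. */
typedef struct {
    pthread_mutex_t lock;
    unsigned long locks;    /* updated while the lock is held */
    unsigned long unlocks;  /* updated while the lock is held */
} CountedMutex;

static int
counted_mutex_init(CountedMutex *m)
{
    m->locks = 0;
    m->unlocks = 0;
    return pthread_mutex_init(&m->lock, NULL);
}

static void
counted_mutex_lock(CountedMutex *m)
{
    pthread_mutex_lock(&m->lock);
    m->locks++;     /* counted after acquiring the lock */
}

static void
counted_mutex_unlock(CountedMutex *m)
{
    m->unlocks++;   /* counted before releasing the lock */
    pthread_mutex_unlock(&m->lock);
}

/* Returns nonzero if every lock has been matched by an unlock. */
static int
counted_mutex_balanced(const CountedMutex *m)
{
    return m->locks == m->unlocks;
}

static void
counted_mutex_destroy(CountedMutex *m)
{
    /* The "number of locks == number of unlocks" check. */
    assert(counted_mutex_balanced(m));
    pthread_mutex_destroy(&m->lock);
}
```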
So far I have been using the test suite to check for obvious issues,
but I cannot rule out that libvirt itself has race conditions too.

I would advocate merging this patch as is and adding a follow-up patch
that enables error checking for the mutexes.
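A minimal sketch of what such a follow-up could look like, assuming a hypothetical `CheckedMutex` type standing in for `virMutex` and `fprintf` standing in for `VIR_WARN`: the mutex is created with `PTHREAD_MUTEX_ERRORCHECK`, so the return-code checks added by this patch become reachable. Unlike `virMutexLock`/`virMutexUnlock`, which return void, these helpers return the pthread error code so the behaviour can be observed.

```c
#define _XOPEN_SOURCE 700  /* exposes PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for virMutex; not the real libvirt type. */
typedef struct {
    pthread_mutex_t lock;
} CheckedMutex;

/* Create the mutex as PTHREAD_MUTEX_ERRORCHECK so that relocking or
 * unlocking an unheld mutex is reported through the return code
 * instead of being undefined behaviour. */
static int
checked_mutex_init(CheckedMutex *m)
{
    pthread_mutexattr_t attr;
    int rc;
    int attr_rc;

    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&m->lock, &attr);
    /* Destroying an initialised attribute object is not expected to fail. */
    attr_rc = pthread_mutexattr_destroy(&attr);
    assert(attr_rc == 0);
    (void)attr_rc;
    return rc;
}

/* fprintf() stands in for VIR_WARN(); the pthread code is returned. */
static int
checked_mutex_lock(CheckedMutex *m)
{
    int rc = pthread_mutex_lock(&m->lock);
    if (rc != 0)
        fprintf(stderr, "Failed to lock mutex=%p: %s\n",
                (void *)m, strerror(rc));
    return rc;
}

static int
checked_mutex_unlock(CheckedMutex *m)
{
    int rc = pthread_mutex_unlock(&m->lock);
    if (rc != 0)
        fprintf(stderr, "Failed to unlock mutex=%p: %s\n",
                (void *)m, strerror(rc));
    return rc;
}
```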
Regards,
Tim