[libvirt PATCH 0/7] Fix some mutex related issues

The calls to pthread_mutex_{lock,unlock,destroy} in "virthread.c" ignore the return value of these functions. This series fixes all "double unlock mutex" and "destroy locked mutex" issues I came across and adds checks to the calls to pthread_mutex_*, so no new issues of this type are introduced. Patches 1, 2, 3, and 4 fix the issues, patches 5, 6, and 7 add the checks. See https://gitlab.com/twiederh/libvirt/-/pipelines/348505233 and https://gitlab.com/twiederh/libvirt/-/pipelines/348505233 (only patches 5, 6, and 7). Regards, Tim Tim Wiederhake (7): tests: qemuMonitorTestFree: Unlock virDomainObj before freeing qemusecuritytest: Unlock virDomainObj before disposal qemuxml2argvtest: Unlock virDomainObj before disposal qemuhotplugtest: Lock reused virDomainObj Meson: Define "DYNAMIC_ANALYSIS" if built with sanitizers enabled virMutex*: Warn on error virMutex*: Fail loudly meson.build | 4 ++++ src/util/virthread.c | 24 +++++++++++++++++++++--- tests/qemuhotplugtest.c | 1 + tests/qemumonitortestutils.c | 2 +- tests/qemusecuritytest.c | 3 ++- tests/qemuxml2argvtest.c | 12 +++++++----- 6 files changed, 36 insertions(+), 10 deletions(-) -- 2.31.1

virDomainObj contains a mutex. Destroying a locked mutex results in undefined behavior. Signed-off-by: Tim Wiederhake <twiederh@redhat.com> --- tests/qemumonitortestutils.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qemumonitortestutils.c b/tests/qemumonitortestutils.c index 143dd954e6..f316fe823f 100644 --- a/tests/qemumonitortestutils.c +++ b/tests/qemumonitortestutils.c @@ -411,7 +411,7 @@ qemuMonitorTestFree(qemuMonitorTest *test) g_object_unref(test->eventThread); - virObjectUnref(test->vm); + virDomainObjEndAPI(&test->vm); if (test->started) virThreadJoin(&test->thread); -- 2.31.1

virDomainObj contains a mutex. Destroying a locked mutex results in undefined behavior. Signed-off-by: Tim Wiederhake <twiederh@redhat.com> --- tests/qemusecuritytest.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tests/qemusecuritytest.c b/tests/qemusecuritytest.c index a7e87fdf8f..b14da08f1b 100644 --- a/tests/qemusecuritytest.c +++ b/tests/qemusecuritytest.c @@ -114,7 +114,7 @@ testDomain(const void *opaque) /* Mocking is enabled only when this env variable is set. * See mock code for explanation. */ if (g_setenv(ENVVAR, "1", FALSE) == FALSE) - return -1; + goto cleanup; if (qemuSecuritySetAllLabel(data->driver, vm, NULL, false) < 0) goto cleanup; @@ -126,6 +126,7 @@ testDomain(const void *opaque) ret = 0; cleanup: + virObjectUnlock(vm); g_unsetenv(ENVVAR); freePaths(); return ret; -- 2.31.1

virDomainObj contains a mutex. Destroying a locked mutex results in undefined behavior. Signed-off-by: Tim Wiederhake <twiederh@redhat.com> --- tests/qemuxml2argvtest.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index b552f5deed..4276f062a2 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -562,8 +562,9 @@ testCompareXMLToArgvValidateSchema(virQEMUDriver *drv, if (virBitmapParse("0-3", &priv->autoNodeset, 4) < 0) return -1; - if (!(cmd = testCompareXMLToArgvCreateArgs(drv, vm, migrateURI, info, flags, - true))) + cmd = testCompareXMLToArgvCreateArgs(drv, vm, migrateURI, info, flags, true); + virDomainObjEndAPI(&vm); + if (!cmd) return -1; if (virCommandGetArgList(cmd, &args, &nargs) < 0) @@ -767,8 +768,9 @@ testCompareXMLToArgv(const void *data) VIR_FREE(log); virResetLastError(); - if (!(cmd = testCompareXMLToArgvCreateArgs(&driver, vm, migrateURI, info, - flags, false))) { + cmd = testCompareXMLToArgvCreateArgs(&driver, vm, migrateURI, info, flags, false); + virDomainObjEndAPI(&vm); + if (!cmd) { err = virGetLastError(); if (!err) { VIR_TEST_DEBUG("no error was reported for expected failure"); @@ -816,7 +818,7 @@ testCompareXMLToArgv(const void *data) cleanup: VIR_FREE(log); virDomainChrSourceDefClear(&monitor_chr); - virObjectUnref(vm); + virDomainObjEndAPI(&vm); virIdentitySetCurrent(NULL); virSetConnectSecret(NULL); virSetConnectStorage(NULL); -- 2.31.1

On Thu, Aug 05, 2021 at 15:08:47 +0200, Tim Wiederhake wrote:
virDomainObj contains a mutex. Destroying a locked mutex results in undefined behavior.
I presume all of the unlocks you've added are purely based on success code paths you've observed, right?
Signed-off-by: Tim Wiederhake <twiederh@redhat.com> --- tests/qemuxml2argvtest.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index b552f5deed..4276f062a2 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c
Specifically, we have: G_DEFINE_AUTOPTR_CLEANUP_FUNC(virDomainObj, virObjectUnref); and then in this function: g_autoptr(virDomainObj) vm = NULL;
@@ -562,8 +562,9 @@ testCompareXMLToArgvValidateSchema(virQEMUDriver *drv, if (virBitmapParse("0-3", &priv->autoNodeset, 4) < 0) return -1;
So in all other cases that hit the automatic cleanup path before virDomainObjEndAPI (yes, they would result in test suite failure), such as in the above call to virBitmapParse, or rather the 'virDomainDefParseFile' call before it (which can fail easily on XML errors) will still hit the problem. So it seems that we either will rely on the fact that the undefined behaviour is sane in our case, or we need a more systemic approach than unlocking the few cases, to fix this properly.
- if (!(cmd = testCompareXMLToArgvCreateArgs(drv, vm, migrateURI, info, flags, - true))) + cmd = testCompareXMLToArgvCreateArgs(drv, vm, migrateURI, info, flags, true); + virDomainObjEndAPI(&vm); + if (!cmd) return -1;
if (virCommandGetArgList(cmd, &args, &nargs) < 0)

If the previous test had `keep == 1` set, this function inherits its (unlocked) virDomainObj. Signed-off-by: Tim Wiederhake <twiederh@redhat.com> --- tests/qemuhotplugtest.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tests/qemuhotplugtest.c b/tests/qemuhotplugtest.c index 57d76845c6..d8d6a3a226 100644 --- a/tests/qemuhotplugtest.c +++ b/tests/qemuhotplugtest.c @@ -280,6 +280,7 @@ testQemuHotplug(const void *data) if (test->vm) { vm = test->vm; + virObjectLock(vm); if (!vm->def) { VIR_TEST_VERBOSE("test skipped due to failure of dependent test"); goto cleanup; -- 2.31.1

This is complementary to the "STATIC_ANALYSIS" macro defined in "internal.h", but not as straight-forward to detect: clang defines "__has_feature(address_sanitizer)" if the address sanitizer is enabled. "__has_feature" is clang specific and does not exist in gcc (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60512) gcc, depending on the version, defines "__SANITIZE_ADDRESS__ 1" if compiled with the address sanitizer enabled. No similar macro is defined for the undefined behavior sanitizer. Instead, let meson set this symbol. Signed-off-by: Tim Wiederhake <twiederh@redhat.com> --- meson.build | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/meson.build b/meson.build index e25dc17fc8..2c424076ba 100644 --- a/meson.build +++ b/meson.build @@ -224,6 +224,10 @@ alloc_max = run_command( 'print(min(2**(@0@ * 8 - 1) - 1, 2**(@1@ * 8) - 1))'.format(ptrdiff_max, size_max), ) +if (get_option('b_sanitize') != 'none') + conf.set('DYNAMIC_ANALYSIS', 1) +endif + # sanitizer instrumentation may enlarge stack frames stack_frame_size = get_option('b_sanitize') == 'none' ? 4096 : 8192 -- 2.31.1

`pthread_mutex_destroy`, `pthread_mutex_lock` and `pthread_mutex_unlock` return an error code that is currently ignored. Add debug information if one of these operations failed, e.g. when there is an attempt to destroy a still locked mutex or unlock and already unlocked mutex. Signed-off-by: Tim Wiederhake <twiederh@redhat.com> --- src/util/virthread.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/src/util/virthread.c b/src/util/virthread.c index e89c1a09fb..f64dbee9e9 100644 --- a/src/util/virthread.c +++ b/src/util/virthread.c @@ -35,7 +35,10 @@ #include "viralloc.h" #include "virthreadjob.h" +#include "virlog.h" +#define VIR_FROM_THIS VIR_FROM_THREAD +VIR_LOG_INIT("util.thread"); int virOnce(virOnceControl *once, virOnceFunc init) { @@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m) void virMutexDestroy(virMutex *m) { - pthread_mutex_destroy(&m->lock); + if (pthread_mutex_destroy(&m->lock)) { + VIR_WARN("Failed to destroy mutex=%p", m); + } } void virMutexLock(virMutex *m) { - pthread_mutex_lock(&m->lock); + if (pthread_mutex_lock(&m->lock)) { + VIR_WARN("Failed to lock mutex=%p", m); + } } void virMutexUnlock(virMutex *m) { - pthread_mutex_unlock(&m->lock); + if (pthread_mutex_unlock(&m->lock)) { + VIR_WARN("Failed to unlock mutex=%p", m); + } } -- 2.31.1

On Thu, Aug 05, 2021 at 03:08:50PM +0200, Tim Wiederhake wrote:
`pthread_mutex_destroy`, `pthread_mutex_lock` and `pthread_mutex_unlock` return an error code that is currently ignored.
Add debug information if one of these operations failed, e.g. when there is an attempt to destroy a still locked mutex or unlock and already unlocked mutex.
We ignore the errors because in practice we found impls don't return any error code when using PTHREAD_MUTEX_NORMAL - it would require use of PTHREAD_MUTEX_ERRORCHECK. Did you actually see errors returned for real ?
Signed-off-by: Tim Wiederhake <twiederh@redhat.com> --- src/util/virthread.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/src/util/virthread.c b/src/util/virthread.c index e89c1a09fb..f64dbee9e9 100644 --- a/src/util/virthread.c +++ b/src/util/virthread.c @@ -35,7 +35,10 @@
#include "viralloc.h" #include "virthreadjob.h" +#include "virlog.h"
+#define VIR_FROM_THIS VIR_FROM_THREAD +VIR_LOG_INIT("util.thread");
int virOnce(virOnceControl *once, virOnceFunc init) { @@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m)
void virMutexDestroy(virMutex *m) { - pthread_mutex_destroy(&m->lock); + if (pthread_mutex_destroy(&m->lock)) { + VIR_WARN("Failed to destroy mutex=%p", m); + } }
void virMutexLock(virMutex *m) { - pthread_mutex_lock(&m->lock); + if (pthread_mutex_lock(&m->lock)) { + VIR_WARN("Failed to lock mutex=%p", m); + } }
void virMutexUnlock(virMutex *m) { - pthread_mutex_unlock(&m->lock); + if (pthread_mutex_unlock(&m->lock)) { + VIR_WARN("Failed to unlock mutex=%p", m); + } }
-- 2.31.1
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, 2021-08-05 at 14:24 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 03:08:50PM +0200, Tim Wiederhake wrote:
`pthread_mutex_destroy`, `pthread_mutex_lock` and `pthread_mutex_unlock` return an error code that is currently ignored.
Add debug information if one of these operations failed, e.g. when there is an attempt to destroy a still locked mutex or unlock and already unlocked mutex.
We ignore the errors because in practice we found impls don't return any error code when using PTHREAD_MUTEX_NORMAL - it would require use of PTHREAD_MUTEX_ERRORCHECK.
Did you actually see errors returned for real ?
Yes. Please note that I apprently copy-pasted the wrong link in the cover letter. The pipeline for this series is correct (https://gitlab.com/twiederh/libvirt/-/pipelines/348505233), the pipeline for only patches 5, 6, 7 that demonstrate the non-zero return values of pthread_mutex_{lock,unlock,destroy} is here: https://gitlab.com/twiederh/libvirt/-/pipelines/348505770 Regards, Tim
Signed-off-by: Tim Wiederhake <twiederh@redhat.com> --- src/util/virthread.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/src/util/virthread.c b/src/util/virthread.c index e89c1a09fb..f64dbee9e9 100644 --- a/src/util/virthread.c +++ b/src/util/virthread.c @@ -35,7 +35,10 @@ #include "viralloc.h" #include "virthreadjob.h" +#include "virlog.h" +#define VIR_FROM_THIS VIR_FROM_THREAD +VIR_LOG_INIT("util.thread"); int virOnce(virOnceControl *once, virOnceFunc init) { @@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m) void virMutexDestroy(virMutex *m) { - pthread_mutex_destroy(&m->lock); + if (pthread_mutex_destroy(&m->lock)) { + VIR_WARN("Failed to destroy mutex=%p", m); + } } void virMutexLock(virMutex *m) { - pthread_mutex_lock(&m->lock); + if (pthread_mutex_lock(&m->lock)) { + VIR_WARN("Failed to lock mutex=%p", m); + } } void virMutexUnlock(virMutex *m) { - pthread_mutex_unlock(&m->lock); + if (pthread_mutex_unlock(&m->lock)) { + VIR_WARN("Failed to unlock mutex=%p", m); + } } -- 2.31.1
Regards, Daniel

On Thu, Aug 05, 2021 at 03:33:57PM +0200, Tim Wiederhake wrote:
On Thu, 2021-08-05 at 14:24 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 03:08:50PM +0200, Tim Wiederhake wrote:
`pthread_mutex_destroy`, `pthread_mutex_lock` and `pthread_mutex_unlock` return an error code that is currently ignored.
Add debug information if one of these operations failed, e.g. when there is an attempt to destroy a still locked mutex or unlock and already unlocked mutex.
We ignore the errors because in practice we found impls don't return any error code when using PTHREAD_MUTEX_NORMAL - it would require use of PTHREAD_MUTEX_ERRORCHECK.
Did you actually see errors returned for real ?
Yes. Please note that I apprently copy-pasted the wrong link in the cover letter. The pipeline for this series is correct (https://gitlab.com/twiederh/libvirt/-/pipelines/348505233), the pipeline for only patches 5, 6, 7 that demonstrate the non-zero return values of pthread_mutex_{lock,unlock,destroy} is here:
All that's showing is (killed by signal 6 SIGABRT) where's the VIR_WARN output that ought to have ended up on stderr from the failed test before aborting ?
Signed-off-by: Tim Wiederhake <twiederh@redhat.com> --- src/util/virthread.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/src/util/virthread.c b/src/util/virthread.c index e89c1a09fb..f64dbee9e9 100644 --- a/src/util/virthread.c +++ b/src/util/virthread.c @@ -35,7 +35,10 @@ #include "viralloc.h" #include "virthreadjob.h" +#include "virlog.h" +#define VIR_FROM_THIS VIR_FROM_THREAD +VIR_LOG_INIT("util.thread"); int virOnce(virOnceControl *once, virOnceFunc init) { @@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m) void virMutexDestroy(virMutex *m) { - pthread_mutex_destroy(&m->lock); + if (pthread_mutex_destroy(&m->lock)) { + VIR_WARN("Failed to destroy mutex=%p", m); + } } void virMutexLock(virMutex *m) { - pthread_mutex_lock(&m->lock); + if (pthread_mutex_lock(&m->lock)) { + VIR_WARN("Failed to lock mutex=%p", m); + } } void virMutexUnlock(virMutex *m) { - pthread_mutex_unlock(&m->lock); + if (pthread_mutex_unlock(&m->lock)) { + VIR_WARN("Failed to unlock mutex=%p", m); + } } -- 2.31.1
Regards, Daniel
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, Aug 05, 2021 at 02:58:03PM +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 03:33:57PM +0200, Tim Wiederhake wrote:
On Thu, 2021-08-05 at 14:24 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 03:08:50PM +0200, Tim Wiederhake wrote:
`pthread_mutex_destroy`, `pthread_mutex_lock` and `pthread_mutex_unlock` return an error code that is currently ignored.
Add debug information if one of these operations failed, e.g. when there is an attempt to destroy a still locked mutex or unlock and already unlocked mutex.
We ignore the errors because in practice we found impls don't return any error code when using PTHREAD_MUTEX_NORMAL - it would require use of PTHREAD_MUTEX_ERRORCHECK.
Did you actually see errors returned for real ?
Yes. Please note that I apprently copy-pasted the wrong link in the cover letter. The pipeline for this series is correct (https://gitlab.com/twiederh/libvirt/-/pipelines/348505233), the pipeline for only patches 5, 6, 7 that demonstrate the non-zero return values of pthread_mutex_{lock,unlock,destroy} is here:
All that's showing is
(killed by signal 6 SIGABRT)
where's the VIR_WARN output that ought to have ended up on stderr from the failed test before aborting ?
Ok, so the VIR_WARN output is getting squelched by the tests somewhere, as its missing when i run locally too. If I add a test that does virMutex m; ignore_value(virMutexInit(&m)); virMutexUnlock(&m); then I get no warnings with this patch, nor abort with the next patch. It only triggers with PTHREAD_MUTEX_ERRORCHECK The pthread_mutex_destroy call is the only one that I see returning errors with PTHREAD_MUTEX_NORMAL. So I don't think there's benefit to adding the code to lock/unlock paths. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, Aug 05, 2021 at 16:11:15 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 02:58:03PM +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 03:33:57PM +0200, Tim Wiederhake wrote:
On Thu, 2021-08-05 at 14:24 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 03:08:50PM +0200, Tim Wiederhake wrote:
[...]
The pthread_mutex_destroy call is the only one that I see returning errors with PTHREAD_MUTEX_NORMAL. So I don't think there's benefit to adding the code to lock/unlock paths.
And as noted I'd be very careful with that one too. I've got at least one counterexample of a code path which effectively does the same on a very common code path. Namely 'virDomainObjParseXML' uses: g_autoptr(virDomainObj) obj = NULL; and then returns directly on common failures such as XML parsing errors, post-parse callback errors or even errors from the validation callbacks. Spamming logs in such a common code path is definitely not acceptable.

On Thu, Aug 05, 2021 at 05:29:24PM +0200, Peter Krempa wrote:
On Thu, Aug 05, 2021 at 16:11:15 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 02:58:03PM +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 03:33:57PM +0200, Tim Wiederhake wrote:
On Thu, 2021-08-05 at 14:24 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 03:08:50PM +0200, Tim Wiederhake wrote:
[...]
The pthread_mutex_destroy call is the only one that I see returning errors with PTHREAD_MUTEX_NORMAL. So I don't think there's benefit to adding the code to lock/unlock paths.
And as noted I'd be very careful with that one too. I've got at least one counterexample of a code path which effectively does the same on a very common code path.
Namely 'virDomainObjParseXML' uses:
g_autoptr(virDomainObj) obj = NULL;
and then returns directly on common failures such as XML parsing errors, post-parse callback errors or even errors from the validation callbacks.
Spamming logs in such a common code path is definitely not acceptable.
Destroying a locked mutex though is a clear bug that should be fixed though, not a false positive. If it is a commonly triggered bug that is even worse and more reason to fix it. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, Aug 05, 2021 at 16:33:57 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 05:29:24PM +0200, Peter Krempa wrote:
On Thu, Aug 05, 2021 at 16:11:15 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 02:58:03PM +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 03:33:57PM +0200, Tim Wiederhake wrote:
On Thu, 2021-08-05 at 14:24 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 03:08:50PM +0200, Tim Wiederhake wrote:
[...]
The pthread_mutex_destroy call is the only one that I see returning errors with PTHREAD_MUTEX_NORMAL. So I don't think there's benefit to adding the code to lock/unlock paths.
And as noted I'd be very careful with that one too. I've got at least one counterexample of a code path which effectively does the same on a very common code path.
Namely 'virDomainObjParseXML' uses:
g_autoptr(virDomainObj) obj = NULL;
and then returns directly on common failures such as XML parsing errors, post-parse callback errors or even errors from the validation callbacks.
Spamming logs in such a common code path is definitely not acceptable.
Destroying a locked mutex though is a clear bug that should be fixed though, not a false positive. If it is a commonly triggered bug that is even worse and more reason to fix it.
I definitely agree, but we need a better approach to do this, because it will keep happening since the g_auto* infrastructure is really convenient. And this is something that can happen also without it. Unfortunately I don't think that just replacing G_DEFINE_AUTOPTR_CLEANUP_FUNC(virDomainObj, virObjectUnref); by an equivalent of virObjectUnrefAndUnlock is sufficient. IMO we should do some form of lock state tracking for 'virObjectLockable' so that we can keep using g_auto for it without either accidentaly unlocking or unrefing a locked object.

On Thu, Aug 05, 2021 at 05:44:15PM +0200, Peter Krempa wrote:
On Thu, Aug 05, 2021 at 16:33:57 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 05:29:24PM +0200, Peter Krempa wrote:
On Thu, Aug 05, 2021 at 16:11:15 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 02:58:03PM +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 03:33:57PM +0200, Tim Wiederhake wrote:
On Thu, 2021-08-05 at 14:24 +0100, Daniel P. Berrangé wrote: > On Thu, Aug 05, 2021 at 03:08:50PM +0200, Tim Wiederhake wrote:
[...]
The pthread_mutex_destroy call is the only one that I see returning errors with PTHREAD_MUTEX_NORMAL. So I don't think there's benefit to adding the code to lock/unlock paths.
And as noted I'd be very careful with that one too. I've got at least one counterexample of a code path which effectively does the same on a very common code path.
Namely 'virDomainObjParseXML' uses:
g_autoptr(virDomainObj) obj = NULL;
and then returns directly on common failures such as XML parsing errors, post-parse callback errors or even errors from the validation callbacks.
Spamming logs in such a common code path is definitely not acceptable.
Destroying a locked mutex though is a clear bug that should be fixed though, not a false positive. If it is a commonly triggered bug that is even worse and more reason to fix it.
I definitely agree, but we need a better approach to do this, because it will keep happening since the g_auto* infrastructure is really convenient. And this is something that can happen also without it.
Unfortunately I don't think that just replacing
G_DEFINE_AUTOPTR_CLEANUP_FUNC(virDomainObj, virObjectUnref);
by an equivalent of virObjectUnrefAndUnlock is sufficient.
IMO we should do some form of lock state tracking for 'virObjectLockable' so that we can keep using g_auto for it without either accidentaly unlocking or unrefing a locked object.
Lock state tracking doesn't sound very appealing to me. QEMU makes use of g_auto for unlocking mutexes within scopes WITH_QEMU_LOCK_GUARD(&mutex) { ... if (error) { return; <-- mutex is automatically unlocked } if (early_exit) { break; <-- leave this scope early, unlocking } ... } and another pattern ... <-- mutex not locked QEMU_LOCK_GUARD(&mutex); <-- mutex locked from here onwards ... if (error) { return; <-- mutex is automatically unlocked } This would work for libvirt too, at least for cases where the lock+unlock are in the same scope (which should be the majority). Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, Aug 05, 2021 at 16:48:59 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 05:44:15PM +0200, Peter Krempa wrote:
On Thu, Aug 05, 2021 at 16:33:57 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 05:29:24PM +0200, Peter Krempa wrote:
[...]
Lock state tracking doesn't sound very appealing to me.
QEMU makes use of g_auto for unlocking mutexes within scopes
WITH_QEMU_LOCK_GUARD(&mutex) { ... if (error) { return; <-- mutex is automatically unlocked }
if (early_exit) { break; <-- leave this scope early, unlocking } ... }
and another pattern
... <-- mutex not locked QEMU_LOCK_GUARD(&mutex); <-- mutex locked from here onwards ... if (error) { return; <-- mutex is automatically unlocked }
This would work for libvirt too, at least for cases where the lock+unlock are in the same scope (which should be the majority).
Yes I thought about this some time ago, but unfortunately in this very specific case it doesn't work. For some specific/historic reason 'virDomainObjNew' returns a locked mutex. Caller is expected then to attach the object into the list or dispose of it, so this doesn't happen in the same context. Given how important virDomainObjs are this needs to be approached very carefully.

On Thu, Aug 05, 2021 at 05:53:15PM +0200, Peter Krempa wrote:
On Thu, Aug 05, 2021 at 16:48:59 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 05:44:15PM +0200, Peter Krempa wrote:
On Thu, Aug 05, 2021 at 16:33:57 +0100, Daniel P. Berrangé wrote:
On Thu, Aug 05, 2021 at 05:29:24PM +0200, Peter Krempa wrote:
[...]
Lock state tracking doesn't sound very appealing to me.
QEMU makes use of g_auto for unlocking mutexes within scopes
WITH_QEMU_LOCK_GUARD(&mutex) { ... if (error) { return; <-- mutex is automatically unlocked }
if (early_exit) { break; <-- leave this scope early, unlocking } ... }
and another pattern
... <-- mutex not locked QEMU_LOCK_GUARD(&mutex); <-- mutex locked from here onwards ... if (error) { return; <-- mutex is automatically unlocked }
This would work for libvirt too, at least for cases where the lock+unlock are in the same scope (which should be the majority).
Yes I thought about this some time ago, but unfortunately in this very specific case it doesn't work.
For some specific/historic reason 'virDomainObjNew' returns a locked mutex. Caller is expected then to attach the object into the list or dispose of it, so this doesn't happen in the same context.
There's not that many callers of virDomainObjNew, nor users of g_autoptr(virDomainObj), so we could remove that lock call and push it up into the caller. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, Aug 05, 2021 at 15:08:50 +0200, Tim Wiederhake wrote:
`pthread_mutex_destroy`, `pthread_mutex_lock` and `pthread_mutex_unlock` return an error code that is currently ignored.
Add debug information if one of these operations failed, e.g. when there is an attempt to destroy a still locked mutex or unlock and already unlocked mutex.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com> --- src/util/virthread.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/src/util/virthread.c b/src/util/virthread.c index e89c1a09fb..f64dbee9e9 100644 --- a/src/util/virthread.c +++ b/src/util/virthread.c @@ -35,7 +35,10 @@
#include "viralloc.h" #include "virthreadjob.h" +#include "virlog.h"
+#define VIR_FROM_THIS VIR_FROM_THREAD +VIR_LOG_INIT("util.thread");
int virOnce(virOnceControl *once, virOnceFunc init) { @@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m)
void virMutexDestroy(virMutex *m) { - pthread_mutex_destroy(&m->lock); + if (pthread_mutex_destroy(&m->lock)) { + VIR_WARN("Failed to destroy mutex=%p", m); + } }
void virMutexLock(virMutex *m) { - pthread_mutex_lock(&m->lock); + if (pthread_mutex_lock(&m->lock)) { + VIR_WARN("Failed to lock mutex=%p", m); + } }
void virMutexUnlock(virMutex *m) { - pthread_mutex_unlock(&m->lock); + if (pthread_mutex_unlock(&m->lock)) { + VIR_WARN("Failed to unlock mutex=%p", m); + } }
I'm not very persuaded that this is a good idea: 1) The warning itself is not useful The pointer itself is valid only when the program is running, having this in the logs has almost no value. 2) It spams logs As noted above, by getting just the log you don't get any useful information and users tend to report that log is being spammed in certain cases, so while this would note that there is something going on a lot of additional work is required to figure out what, thus the value of such entry is rather low. 3) It can create an endless loop VIR_WARN -> VIR_WARN_INT -> virLogMessage -> virLogVMessage -> virLogLock -> virMutexLock()

On Thu, Aug 05, 2021 at 03:43:11PM +0200, Peter Krempa wrote:
On Thu, Aug 05, 2021 at 15:08:50 +0200, Tim Wiederhake wrote:
`pthread_mutex_destroy`, `pthread_mutex_lock` and `pthread_mutex_unlock` return an error code that is currently ignored.
Add debug information if one of these operations failed, e.g. when there is an attempt to destroy a still locked mutex or unlock and already unlocked mutex.
Signed-off-by: Tim Wiederhake <twiederh@redhat.com> --- src/util/virthread.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/src/util/virthread.c b/src/util/virthread.c index e89c1a09fb..f64dbee9e9 100644 --- a/src/util/virthread.c +++ b/src/util/virthread.c @@ -35,7 +35,10 @@
#include "viralloc.h" #include "virthreadjob.h" +#include "virlog.h"
+#define VIR_FROM_THIS VIR_FROM_THREAD +VIR_LOG_INIT("util.thread");
int virOnce(virOnceControl *once, virOnceFunc init) { @@ -83,17 +86,23 @@ int virMutexInitRecursive(virMutex *m)
void virMutexDestroy(virMutex *m) { - pthread_mutex_destroy(&m->lock); + if (pthread_mutex_destroy(&m->lock)) { + VIR_WARN("Failed to destroy mutex=%p", m); + } }
void virMutexLock(virMutex *m) { - pthread_mutex_lock(&m->lock); + if (pthread_mutex_lock(&m->lock)) { + VIR_WARN("Failed to lock mutex=%p", m); + } }
void virMutexUnlock(virMutex *m) { - pthread_mutex_unlock(&m->lock); + if (pthread_mutex_unlock(&m->lock)) { + VIR_WARN("Failed to unlock mutex=%p", m); + } }
I'm not very persuaded that this is a good idea:
1) The warning itself is not useful
The pointer itself is valid only when the program is running, having this in the logs has almost no value.
2) It spams logs
As noted above, by getting just the log you don't get any useful information and users tend to report that log is being spammed in certain cases, so while this would note that there is something going on a lot of additional work is required to figure out what, thus the value of such entry is rather low.
3) It can create an endless loop
VIR_WARN -> VIR_WARN_INT -> virLogMessage -> virLogVMessage -> virLogLock -> virMutexLock()
I think we could use perror() to make it more useful and avoid the loop void virMutexDestroy(virMutex *m) { if ((errno = pthread_mutex_destroy(&m->lock)) != 0) { perror("pthread_mutex_destroy"); } } still not going to be enough to diagnose the problem fro the logs though, as we have no use context. This change would give a bit more of a fighting chance at diagnosis: in virthread.h #define virMutexDestroy(m) \ virMutexDestroyImpl(m, __FILE__, __func__, __LINE__) void virMutexDestroyImpl(virMutex *mm, const char *file, const char *func, int line); and in virthread.c void virMutexDestroyImpl(virMutex *m, const char *file, const char *func, int line) { if ((errno = pthread_mutex_destroy(&m->lock)) != 0) { g_autofree char *msg = g_strdup_printf("pthread_mutex_destroy (caller %s in %s:%d)", func, file, line); perror(msg); } } That results in: $ ./build/tests/qemusecuritytest TEST: qemusecuritytest pthread_mutex_destroy (caller virObjectLockableDispose in ../src/util/virobject.c:349): Device or resource busy !pthread_mutex_destroy (caller virObjectLockableDispose in ../src/util/virobject.c:349): Device or resource busy !pthread_mutex_destroy (caller virObjectLockableDispose in ../src/util/virobject.c:349): Device or resource busy !pthread_mutex_destroy (caller virObjectLockableDispose in ../src/util/virobject.c:349): Device or resource busy !pthread_mutex_destroy (caller virObjectLockableDispose in ../src/util/virobject.c:349): Device or resource busy !pthread_mutex_destroy (caller virObjectLockableDispose in ../src/util/virobject.c:349): Device or resource busy Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Make failure to lock, unlock or destroy mutexes visible in the CI. As "abort()" is quite a harsh treatment and pthread generally speaking "does the right thing" regardless, limit this to analysis builds in the CI as to not bother users but still make developers aware of issues. Signed-off-by: Tim Wiederhake <twiederh@redhat.com> --- src/util/virthread.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/src/util/virthread.c b/src/util/virthread.c index f64dbee9e9..95213bd511 100644 --- a/src/util/virthread.c +++ b/src/util/virthread.c @@ -88,6 +88,9 @@ void virMutexDestroy(virMutex *m) { if (pthread_mutex_destroy(&m->lock)) { VIR_WARN("Failed to destroy mutex=%p", m); +#if DYNAMIC_ANALYSIS + abort(); +#endif } } @@ -95,6 +98,9 @@ void virMutexLock(virMutex *m) { if (pthread_mutex_lock(&m->lock)) { VIR_WARN("Failed to lock mutex=%p", m); +#if DYNAMIC_ANALYSIS + abort(); +#endif } } @@ -102,6 +108,9 @@ void virMutexUnlock(virMutex *m) { if (pthread_mutex_unlock(&m->lock)) { VIR_WARN("Failed to unlock mutex=%p", m); +#if DYNAMIC_ANALYSIS + abort(); +#endif } } -- 2.31.1
participants (3)
-
Daniel P. Berrangé
-
Peter Krempa
-
Tim Wiederhake