[libvirt PATCH] Fix incorrect uses of g_clear_pointer() introduced in 8.1.0

This is a partial revert of 87a43a907f0ad4897a28ad7c216bc70f37270b93. The change to use g_clear_pointer() in more places was accidentally applied to cases involving vir_g_source_unref(). In some cases, the ordering of g_source_destroy() and vir_g_source_unref() was reversed, which resulted in the source being marked as destroyed, after it is already unreferenced. This use-after-free case might work in many cases, but with versions of glibc older than 2.64.0 it may defer unref to run within the main thread to avoid a race condition, which creates a large distance between the g_source_unref() and g_source_destroy(). In some cases, the call to vir_g_source_unref() was replaced with a second call to g_source_destroy(), leading to a memory leak or worse. In our experience, the symptoms were that use of libvirt-python became slower over time, with OpenStack nova-compute initially taking around one second to periodically query the host PCI devices, and within an hour it was taking over a minute to complete the same operation, until it is was eventually running this query back-to-back, resulting in the nova-compute process consuming 100% of one CPU thread, losing its RabbitMQ connection frequently, and showing up as down to the control plane. --- src/qemu/qemu_agent.c | 3 ++- src/qemu/qemu_monitor.c | 3 ++- src/util/vireventglib.c | 12 ++++++++---- 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c index f57a8d5f25..e6e92c7dc4 100644 --- a/src/qemu/qemu_agent.c +++ b/src/qemu/qemu_agent.c @@ -452,8 +452,9 @@ static void qemuAgentUnregister(qemuAgent *agent) { if (agent->watch) { + g_source_destroy(agent->watch); vir_g_source_unref(agent->watch, agent->context); - g_clear_pointer(&agent->watch, g_source_destroy); + agent->watch = NULL; } } diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 37bcbde31e..32c993a941 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -794,8 +794,9 @@ void qemuMonitorUnregister(qemuMonitor *mon) { if (mon->watch) { + g_source_destroy(mon->watch); vir_g_source_unref(mon->watch, mon->context); - g_clear_pointer(&mon->watch, g_source_destroy); + mon->watch = NULL; } } diff --git a/src/util/vireventglib.c b/src/util/vireventglib.c index fc04d8f712..983787932f 100644 --- a/src/util/vireventglib.c +++ b/src/util/vireventglib.c @@ -228,7 +228,8 @@ virEventGLibHandleUpdate(int watch, VIR_DEBUG("Removed old handle source=%p", data->source); g_source_destroy(data->source); - g_clear_pointer(&data->source, g_source_destroy); + vir_g_source_unref(data->source, NULL); + data->source = NULL; data->events = 0; } @@ -275,8 +276,9 @@ virEventGLibHandleRemove(int watch) data, watch, data->fd); if (data->source != NULL) { + g_source_destroy(data->source); vir_g_source_unref(data->source, NULL); - g_clear_pointer(&data->source, g_source_destroy); + data->source = NULL; data->events = 0; } @@ -417,8 +419,9 @@ virEventGLibTimeoutUpdate(int timer, if (data->source == NULL) goto cleanup; + g_source_destroy(data->source); vir_g_source_unref(data->source, NULL); - g_clear_pointer(&data->source, g_source_destroy); + data->source = NULL; } cleanup: @@ -465,8 +468,9 @@ virEventGLibTimeoutRemove(int timer) data, timer); if (data->source != NULL) { + g_source_destroy(data->source); vir_g_source_unref(data->source, NULL); - g_clear_pointer(&data->source, g_source_destroy); + data->source = NULL; } /* since the actual timeout deletion is done asynchronously, a timeoutUpdate call may -- 2.36.1

On a Sunday in 2022, Mark Mielke wrote:
This is a partial revert of 87a43a907f0ad4897a28ad7c216bc70f37270b93.
I'll drop the trailing period before pushing.
The change to use g_clear_pointer() in more places was accidentally applied to cases involving vir_g_source_unref().
In some cases, the ordering of g_source_destroy() and vir_g_source_unref() was reversed, which resulted in the source being marked as destroyed, after it is already unreferenced.
Oops, sorry I missed that during review.
This use-after-free case might work in many cases, but with versions of glibc older than 2.64.0 it may defer unref to run within the main
s/glibc/glib/
thread to avoid a race condition, which creates a large distance between the g_source_unref() and g_source_destroy().
In some cases, the call to vir_g_source_unref() was replaced with a second call to g_source_destroy(), leading to a memory leak or worse.
In our experience, the symptoms were that use of libvirt-python became slower over time, with OpenStack nova-compute initially taking around one second to periodically query the host PCI devices, and within an hour it was taking over a minute to complete the same operation, until it is was eventually running this query back-to-back, resulting in the nova-compute process consuming 100% of one CPU thread, losing its RabbitMQ connection frequently, and showing up as down to the control plane.
Your patch is missing a sign-off https://libvirt.org/hacking.html#developer-certificate-of-origin Just replying to this e-mail with the Signed-off-by line is enough - no need to resend the patch. I'll push it with the sign-off included after the pipeline succeds: https://gitlab.com/janotomko/libvirt/-/pipelines/562139546
--- src/qemu/qemu_agent.c | 3 ++- src/qemu/qemu_monitor.c | 3 ++- src/util/vireventglib.c | 12 ++++++++---- 3 files changed, 12 insertions(+), 6 deletions(-)
Reviewed-by: Ján Tomko <jtomko@redhat.com> Jano

Sounds good - thank you... Patch is Signed-off-by: Mark Mielke <mark.mielke@gmail.com> On Mon, Jun 13, 2022 at 5:08 AM Ján Tomko <jtomko@redhat.com> wrote:
On a Sunday in 2022, Mark Mielke wrote:
This is a partial revert of 87a43a907f0ad4897a28ad7c216bc70f37270b93.
I'll drop the trailing period before pushing.
The change to use g_clear_pointer() in more places was accidentally applied to cases involving vir_g_source_unref().
In some cases, the ordering of g_source_destroy() and vir_g_source_unref() was reversed, which resulted in the source being marked as destroyed, after it is already unreferenced.
Oops, sorry I missed that during review.
This use-after-free case might work in many cases, but with versions of glibc older than 2.64.0 it may defer unref to run within the main
s/glibc/glib/
thread to avoid a race condition, which creates a large distance between the g_source_unref() and g_source_destroy().
In some cases, the call to vir_g_source_unref() was replaced with a second call to g_source_destroy(), leading to a memory leak or worse.
In our experience, the symptoms were that use of libvirt-python became slower over time, with OpenStack nova-compute initially taking around one second to periodically query the host PCI devices, and within an hour it was taking over a minute to complete the same operation, until it is was eventually running this query back-to-back, resulting in the nova-compute process consuming 100% of one CPU thread, losing its RabbitMQ connection frequently, and showing up as down to the control plane.
Your patch is missing a sign-off https://libvirt.org/hacking.html#developer-certificate-of-origin
Just replying to this e-mail with the Signed-off-by line is enough - no need to resend the patch. I'll push it with the sign-off included after the pipeline succeds: https://gitlab.com/janotomko/libvirt/-/pipelines/562139546
--- src/qemu/qemu_agent.c | 3 ++- src/qemu/qemu_monitor.c | 3 ++- src/util/vireventglib.c | 12 ++++++++---- 3 files changed, 12 insertions(+), 6 deletions(-)
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Jano
-- Mark Mielke <mark.mielke@gmail.com>
participants (2)
-
Ján Tomko
-
Mark Mielke