Hello,
I found an issue that libvirt isn't close an unix domain socket to connect to the qemu
monitor
even after the VM is destroyed.
This issue happens since commit 695bdb3841 ("src: ensure GSource background unref
happens in correct event loop")
on the system whose glib version is 2.56.
I would appreciate it if you could give any ideas to solve the issue.
The socket is allocated in qemuMonitorOpenUnix(), and used by the monitor->socket
and monitor->watch:
qemuMonitorOpen
qemuMonitorOpenUnix
if ((monfd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) {
qemuMonitorOpenInternal
mon->socket = g_socket_new_from_fd(fd, &gerr);
qemuMonitorRegister
mon->watch = g_socket_create_source(mon->socket,
Usually, the socket is closed when the reference counter of the glib object
(mon->socket and mon->watch) gets 0:
qemuMonitorClose
qemuMonitorUnregister
vir_g_source_unref(mon->watch, mon->context);
g_source_set_callback(idle, virEventGLibSourceUnrefIdle, src, NULL);
virEventGLibSourceUnrefIdle
g_source_unref(src); <== unref monitor->watch
g_object_unref(mon->socket); <== unref monitor->socket
It seems that the callback virEventGLibSourceUnrefIdle() to unref the monitor->watch
doesn't
work when qemuMonitorUnregister() is called via qemuProcessStop(), so the socket isn't
closed.
I'm not sure why the callback doesn't work at that time. I suspect that the VM is
closing
so the main loop of the monitor doesn't work any more.
We can close the socket to add g_socket_close() before unref the mon->socket, however,
it may remain a memory leak because of mon->watch (GSource object), so probably
it isn't a good idea to close the socket.
We can unref the mon->watch to set NULL to the second argument of vir_g_source_unref()
because the default main loop still works at that time, however, I'm not
sure it's an appropriate way to avoid the gobject issue which the commit solves...
I found this issue on the qemu monitor, and probably the qemu agent has the same issue
because the socket procedure is almost the same as the monitor.
I would appreciate it if you could give any ideas to solve this issue.
Following is to observe the callback working:
---
diff --git a/src/util/glibcompat.c b/src/util/glibcompat.c
index eb6dcc0111..b8b1770424 100644
--- a/src/util/glibcompat.c
+++ b/src/util/glibcompat.c
@@ -24,6 +24,9 @@
#include "glibcompat.h"
+#include "virlog.h"
+
+VIR_LOG_INIT("util.glibcompat");
/*
* Note that because of the GLIB_VERSION_MAX_ALLOWED constant in
* config-post.h, allowing use of functions from newer GLib via
@@ -244,6 +247,7 @@ virEventGLibSourceUnrefIdle(gpointer data)
GSource *src = data;
g_source_unref(src);
+ VIR_DEBUG("unrefed: %p", src);
return FALSE;
}
@@ -257,6 +261,7 @@ void vir_g_source_unref(GSource *src, GMainContext *ctx)
g_source_attach(idle, ctx);
g_source_unref(idle);
+ VIR_DEBUG("unref registered: %p ctx: %p", src, ctx);
}
#endif
---
Case the mon->watch (0xffff28008af0) is unreffed correctly
(via qemuMonitorUpdateWatch()):
18:54:15.403+0000: 16845: debug : qemuMonitorEmitResume:1159 : mon=0xffff683ac020
18:54:15.403+0000: 16845: debug : qemuProcessHandleResume:713 : Transitioned guest test-vm
into running state, reason 'booted', event detail 0
18:54:15.404+0000: 16845: debug : vir_g_source_unref:264 : unref registered:
0xffff28008af0 ctx: 0xffff780169a0
18:54:15.404+0000: 16845: debug : qemuMonitorJSONIOProcessLine:222 : Line
[{"return": {}, "id": "libvirt-10"}]
18:54:15.404+0000: 16845: info : qemuMonitorJSONIOProcessLine:242 :
QEMU_MONITOR_RECV_REPLY: mon=0xffff683ac020 reply={"return": {}, "id":
"libvirt-10"}
18:54:15.404+0000: 16845: debug : vir_g_source_unref:264 : unref registered:
0xffff2819a260 ctx: 0xffff780169a0
18:54:15.404+0000: 16845: debug : virEventGLibSourceUnrefIdle:250 : unrefed:
0xffff28008af0
Case the mon->watch (0xffff7802bb30) isn't unreffed
(via qemuProcessStop()):
18:54:15.642+0000: 16589: debug : qemuProcessStop:8008 : Shutting down vm=0xaaaad40edec0
name=test-vm id=3 pid=16842, reason=destroyed, asyncJob=none, flags=0x0
18:54:15.642+0000: 16589: debug : qemuDomainLogAppendMessage:6733 : Append log message
(vm='test-vm' message='2022-01-25 18:54:15.642+0000: shutting down,
reason=destroyed) stdioLogD=1
18:54:15.643+0000: 16589: info : qemuMonitorClose:834 : QEMU_MONITOR_CLOSE:
mon=0xffff683ac020
18:54:15.643+0000: 16589: debug : vir_g_source_unref:264 : unref registered:
0xffff7802bb30 ctx: 0xffff780169a0
18:54:15.643+0000: 16845: debug : qemuMonitorJSONIOProcessEvent:209 : handle SHUTDOWN
handler=0xffff4ef057c0 data=0xffff28007da0
18:54:15.643+0000: 16845: debug : qemuMonitorEmitShutdown:1132 : mon=0xffff683ac020
guest=2
18:54:15.643+0000: 16845: debug : qemuProcessHandleShutdown:572 : vm=0xaaaad40edec0
18:54:15.643+0000: 16845: debug : qemuProcessHandleShutdown:592 : Transitioned guest
test-vm to shutdown state
18:54:15.643+0000: 16845: debug : qemuProcessKill:7921 : vm=0xaaaad40edec0 name=test-vm
pid=16842 flags=0x2
18:54:15.643+0000: 16845: debug : qemuMonitorDispose:216 : mon=0xffff683ac020
18:54:15.643+0000: 16589: debug : qemuProcessKill:7921 : vm=0xaaaad40edec0 name=test-vm
pid=16842 flags=0x5
18:54:15.643+0000: 16589: debug : qemuDomainCleanupRun:7313 : driver=0xffff281441b0,
vm=test-vm
18:54:15.644+0000: 16589: debug : qemuProcessAutoDestroyRemove:8395 : vm=test-vm
18:54:15.645+0000: 16589: debug : qemuRemoveCgroup:1221 : Failed to terminate cgroup for
test-vm
18:54:15.647+0000: 16589: debug : qemuDomainObjEndJob:1141 : Stopping job: destroy
(async=none vm=0xaaaad40edec0 name=test-vm)
Thanks!
Masa