On a Thursday in 2024, Daniel P. Berrangé wrote:
Users are seeing periodic segfaults from libvirt client apps,
especially thread heavy ones like virt-manager. A typical
stack trace would end up in the virNetClientIOEventFD method,
with illegal access to stale stack data. eg
==238721==ERROR: AddressSanitizer: stack-use-after-return on address 0x75cd18709788 at pc
0x75cd3111f907 bp 0x75cd181ff550 sp 0x75cd181ff548
WRITE of size 4 at 0x75cd18709788 thread T11
#0 0x75cd3111f906 in virNetClientIOEventFD
/usr/src/debug/libvirt/libvirt-10.2.0/build/../src/rpc/virnetclient.c:1634:15
#1 0x75cd3210d198 (/usr/lib/libglib-2.0.so.0+0x5a198) (BuildId:
0a2311dfbbc6c215dc36f4b6bdd2b4b6fbae55a2)
#2 0x75cd3216c3be (/usr/lib/libglib-2.0.so.0+0xb93be) (BuildId:
0a2311dfbbc6c215dc36f4b6bdd2b4b6fbae55a2)
#3 0x75cd3210ddc6 in g_main_loop_run (/usr/lib/libglib-2.0.so.0+0x5adc6) (BuildId:
0a2311dfbbc6c215dc36f4b6bdd2b4b6fbae55a2)
#4 0x75cd3111a47c in virNetClientIOEventLoop
/usr/src/debug/libvirt/libvirt-10.2.0/build/../src/rpc/virnetclient.c:1722:9
#5 0x75cd3111a47c in virNetClientIO
/usr/src/debug/libvirt/libvirt-10.2.0/build/../src/rpc/virnetclient.c:2002:10
#6 0x75cd3111a47c in virNetClientSendInternal
/usr/src/debug/libvirt/libvirt-10.2.0/build/../src/rpc/virnetclient.c:2170:11
#7 0x75cd311198a8 in virNetClientSendWithReply
/usr/src/debug/libvirt/libvirt-10.2.0/build/../src/rpc/virnetclient.c:2198:11
#8 0x75cd31111653 in virNetClientProgramCall
/usr/src/debug/libvirt/libvirt-10.2.0/build/../src/rpc/virnetclientprogram.c:318:9
#9 0x75cd31241c8f in callFull
/usr/src/debug/libvirt/libvirt-10.2.0/build/../src/remote/remote_driver.c:6054:10
#10 0x75cd31241c8f in call
/usr/src/debug/libvirt/libvirt-10.2.0/build/../src/remote/remote_driver.c:6076:12
#11 0x75cd31241c8f in remoteNetworkGetXMLDesc
/usr/src/debug/libvirt/libvirt-10.2.0/build/src/remote/remote_client_bodies.h:5959:9
#12 0x75cd31410ff7 in virNetworkGetXMLDesc
/usr/src/debug/libvirt/libvirt-10.2.0/build/../src/libvirt-network.c:952:15
The root cause is a bad assumption in the virNetClientIOEventLoop
method. This method is run by whichever thread currently owns the
buck, and is responsible for handling I/O. Inside a for(;;) loop,
this method creates a temporary GSource, adds it to the event loop
and runs g_main_loop_run(). When I/O is ready, the GSource callback
(virNetClientIOEventFD) will fire and call g_main_loop_quit(), and
return G_SOURCE_REMOVE which results in the temporary GSource being
destroyed. A g_autoptr() will then remove the last reference.
What was overlooked, is that a second thread can come along and
while it can't enter virNetClientIOEventLoop, it will register an
idle source that uses virNetClientIOWakeup to interrupt the
original thread's 'g_main_loop_run' call. When this happens the
virNetClientIOEventFD callback never runs, and so the temporary
GSource is not destroyed. The g_autoptr() will remove a reference,
but by virtue of still being attached to the event context, there
is an extra reference held causing GSource to be leaked. The
next time 'g_main_loop_run' is called, the original GSource will
trigger its callback, and access data that was allocated on the
stack by the previous thread, and likely SEGV.
To solve this, the thread calling 'g_main_loop_run' must call
g_source_destroy, immediately upon return, to guarantee that
the temporary GSource is removed.
CVE-2024-4418
Reported-by: Martin Shirokov <shirokovmartin(a)gmail.com>
Signed-off-by: Daniel P. Berrangé <berrange(a)redhat.com>
---
src/rpc/virnetclient.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
Reviewed-by: Ján Tomko <jtomko(a)redhat.com>
Jano