On 03/31/2013 10:20 AM, Peter Krempa wrote:
The last Viktor's effort to fix the race and memory corruption
unfortunately
wasn't complete in the case the close callback was not registered in an
connection. At that time, the trail of event's that I'll describe later could
still happend and corrupt the memory or cause a crash of the client (including
the daemon in case of a p2p migration).
Consider the following prerequisities and trail of events:
Let's have a remote connection to a hypervisor that doesn't have a close
callback registered and the client is using the event loop. The crash happens in
cooperation of 2 threads. Thread E is the event loop and thread W is the worker
that does some stuff. R denotes the remote client.
1.) W - The client finishes everything and sheds the last reference on the client
2.) W - The virObject stuff invokes virConnectDispose that invokes doRemoteClose
3.) W - the remote close method invokes the REMOTE_PROC_CLOSE RPC method.
4.) W - The thread is preempted at this point.
5.) R - The remote side recieves the close and closes the socket.
6.) E - poll() wakes up due to the closed socket and invokes the close callback
7.) E - The event loop is preempted right before remoteClientCloseFunc is called
8.) W - The worker now finishes, and frees the conn object.
9.) E - The remoteClientCloseFunc accesses the now-freed conn object in the
attempt to retrieve pointer for the real close callback.
10.) Kaboom, corrupted memory/segfault.
With patch 1/6 alone, I think I saw exactly this scenario by doing:
# virsh migrate dom qemu+ssh://remote/system --live --p2p --verbose
where I observed this on the source side under valgrind.
[as to the patch itself, I'm still testing it, and ran out of time to
definitively say ack today, but so far I'm liking it]
DEBUG: Connection close called, sleeping
2013-04-01 23:31:06.966+0000: 8724: error : virNetSocketReadWire:1181 :
End of file while reading data: : Input/output error
DEBUG: calling the close callback
DEBUG: Finishing close
==8724== Invalid read of size 4
==8724== at 0x390A009220: pthread_mutex_lock (pthread_mutex_lock.c:50)
==8724== by 0x4E89372: virMutexLock (threads-pthread.c:85)
==8724== by 0x4F76CC0: remoteClientCloseFunc (remote_driver.c:337)
==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693)
==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866)
==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500)
==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485)
==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632)
==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247)
==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748)
==8724== by 0x40DC17: main (libvirtd.c:1228)
==8724== Address 0x14df2540 is 160 bytes inside a block of size 312 free'd
==8724== at 0x4A063F0: free (vg_replace_malloc.c:446)
==8724== by 0x4E7AA26: virFree (memory.c:419)
==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145)
==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458)
==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630)
==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685)
==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872)
==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107)
==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253)
==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593)
==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper
(remote_dispatch.h:3629)
==8724== by 0x4FAC16D: virNetServerProgramDispatchCall
(virnetserverprogram.c:431)
==8724==
==8724== Invalid read of size 4
==8724== at 0x390A00924C: pthread_mutex_lock (pthread_mutex_lock.c:61)
==8724== by 0x4E89372: virMutexLock (threads-pthread.c:85)
==8724== by 0x4F76CC0: remoteClientCloseFunc (remote_driver.c:337)
==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693)
==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866)
==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500)
==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485)
==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632)
==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247)
==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748)
==8724== by 0x40DC17: main (libvirtd.c:1228)
==8724== Address 0x14df2530 is 144 bytes inside a block of size 312 free'd
==8724== at 0x4A063F0: free (vg_replace_malloc.c:446)
==8724== by 0x4E7AA26: virFree (memory.c:419)
==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145)
==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458)
==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630)
==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685)
==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872)
==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107)
==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253)
==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593)
==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper
(remote_dispatch.h:3629)
==8724== by 0x4FAC16D: virNetServerProgramDispatchCall
(virnetserverprogram.c:431)
==8724==
==8724== Invalid read of size 4
==8724== at 0x390A009257: pthread_mutex_lock (pthread_mutex_lock.c:127)
==8724== by 0x4E89372: virMutexLock (threads-pthread.c:85)
==8724== by 0x4F76CC0: remoteClientCloseFunc (remote_driver.c:337)
==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693)
==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866)
==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500)
==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485)
==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632)
==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247)
==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748)
==8724== by 0x40DC17: main (libvirtd.c:1228)
==8724== Address 0x14df253c is 156 bytes inside a block of size 312 free'd
==8724== at 0x4A063F0: free (vg_replace_malloc.c:446)
==8724== by 0x4E7AA26: virFree (memory.c:419)
==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145)
==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458)
==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630)
==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685)
==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872)
==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107)
==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253)
==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593)
==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper
(remote_dispatch.h:3629)
==8724== by 0x4FAC16D: virNetServerProgramDispatchCall
(virnetserverprogram.c:431)
==8724==
==8724== Invalid write of size 4
==8724== at 0x390A00925C: pthread_mutex_lock (pthread_mutex_lock.c:125)
==8724== by 0x4E89372: virMutexLock (threads-pthread.c:85)
==8724== by 0x4F76CC0: remoteClientCloseFunc (remote_driver.c:337)
==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693)
==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866)
==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500)
==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485)
==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632)
==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247)
==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748)
==8724== by 0x40DC17: main (libvirtd.c:1228)
==8724== Address 0x14df2538 is 152 bytes inside a block of size 312 free'd
==8724== at 0x4A063F0: free (vg_replace_malloc.c:446)
==8724== by 0x4E7AA26: virFree (memory.c:419)
==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145)
==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458)
==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630)
==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685)
==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872)
==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107)
==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253)
==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593)
==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper
(remote_dispatch.h:3629)
==8724== by 0x4FAC16D: virNetServerProgramDispatchCall
(virnetserverprogram.c:431)
==8724==
==8724== Invalid read of size 8
==8724== at 0x4F76CC5: remoteClientCloseFunc (remote_driver.c:338)
==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693)
==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866)
==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500)
==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485)
==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632)
==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247)
==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748)
==8724== by 0x40DC17: main (libvirtd.c:1228)
==8724== Address 0x14df25b8 is 280 bytes inside a block of size 312 free'd
==8724== at 0x4A063F0: free (vg_replace_malloc.c:446)
==8724== by 0x4E7AA26: virFree (memory.c:419)
==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145)
==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458)
==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630)
==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685)
==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872)
==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107)
==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253)
==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593)
==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper
(remote_dispatch.h:3629)
==8724== by 0x4FAC16D: virNetServerProgramDispatchCall
(virnetserverprogram.c:431)
==8724==
==8724== Invalid read of size 4
==8724== at 0x390A00A630: pthread_mutex_unlock
(pthread_mutex_unlock.c:37)
==8724== by 0x4E893AF: virMutexUnlock (threads-pthread.c:90)
==8724== by 0x4F76DD1: remoteClientCloseFunc (remote_driver.c:354)
==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693)
==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866)
==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500)
==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485)
==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632)
==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247)
==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748)
==8724== by 0x40DC17: main (libvirtd.c:1228)
==8724== Address 0x14df2540 is 160 bytes inside a block of size 312 free'd
==8724== at 0x4A063F0: free (vg_replace_malloc.c:446)
==8724== by 0x4E7AA26: virFree (memory.c:419)
==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145)
==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458)
==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630)
==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685)
==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872)
==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107)
==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253)
==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593)
==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper
(remote_dispatch.h:3629)
==8724== by 0x4FAC16D: virNetServerProgramDispatchCall
(virnetserverprogram.c:431)
==8724==
==8724== Invalid write of size 4
==8724== at 0x390A00A645: pthread_mutex_unlock
(pthread_mutex_unlock.c:46)
==8724== by 0x4E893AF: virMutexUnlock (threads-pthread.c:90)
==8724== by 0x4F76DD1: remoteClientCloseFunc (remote_driver.c:354)
==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693)
==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866)
==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500)
==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485)
==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632)
==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247)
==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748)
==8724== by 0x40DC17: main (libvirtd.c:1228)
==8724== Address 0x14df2538 is 152 bytes inside a block of size 312 free'd
==8724== at 0x4A063F0: free (vg_replace_malloc.c:446)
==8724== by 0x4E7AA26: virFree (memory.c:419)
==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145)
==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458)
==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630)
==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685)
==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872)
==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107)
==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253)
==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593)
==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper
(remote_dispatch.h:3629)
==8724== by 0x4FAC16D: virNetServerProgramDispatchCall
(virnetserverprogram.c:431)
==8724==
==8724== Invalid read of size 4
==8724== at 0x390A00A64C: pthread_mutex_unlock
(pthread_mutex_unlock.c:49)
==8724== by 0x4E893AF: virMutexUnlock (threads-pthread.c:90)
==8724== by 0x4F76DD1: remoteClientCloseFunc (remote_driver.c:354)
==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693)
==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866)
==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500)
==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485)
==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632)
==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247)
==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748)
==8724== by 0x40DC17: main (libvirtd.c:1228)
==8724== Address 0x14df253c is 156 bytes inside a block of size 312 free'd
==8724== at 0x4A063F0: free (vg_replace_malloc.c:446)
==8724== by 0x4E7AA26: virFree (memory.c:419)
==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145)
==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458)
==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630)
==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685)
==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872)
==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107)
==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253)
==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593)
==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper
(remote_dispatch.h:3629)
==8724== by 0x4FAC16D: virNetServerProgramDispatchCall
(virnetserverprogram.c:431)
==8724==
==8724== Invalid read of size 4
==8724== at 0x390A00A656: pthread_mutex_unlock
(pthread_mutex_unlock.c:52)
==8724== by 0x4E893AF: virMutexUnlock (threads-pthread.c:90)
==8724== by 0x4F76DD1: remoteClientCloseFunc (remote_driver.c:354)
==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693)
==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866)
==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500)
==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485)
==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632)
==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247)
==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748)
==8724== by 0x40DC17: main (libvirtd.c:1228)
==8724== Address 0x14df2530 is 144 bytes inside a block of size 312 free'd
==8724== at 0x4A063F0: free (vg_replace_malloc.c:446)
==8724== by 0x4E7AA26: virFree (memory.c:419)
==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145)
==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458)
==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630)
==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685)
==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872)
==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107)
==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253)
==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593)
==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper
(remote_dispatch.h:3629)
==8724== by 0x4FAC16D: virNetServerProgramDispatchCall
(virnetserverprogram.c:431)
==8724==
DEBUG: callback returned
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org