
On 03/31/2013 10:20 AM, Peter Krempa wrote:
The last Viktor's effort to fix the race and memory corruption unfortunately wasn't complete in the case the close callback was not registered in an connection. At that time, the trail of event's that I'll describe later could still happend and corrupt the memory or cause a crash of the client (including the daemon in case of a p2p migration).
Consider the following prerequisities and trail of events: Let's have a remote connection to a hypervisor that doesn't have a close callback registered and the client is using the event loop. The crash happens in cooperation of 2 threads. Thread E is the event loop and thread W is the worker that does some stuff. R denotes the remote client.
1.) W - The client finishes everything and sheds the last reference on the client 2.) W - The virObject stuff invokes virConnectDispose that invokes doRemoteClose 3.) W - the remote close method invokes the REMOTE_PROC_CLOSE RPC method. 4.) W - The thread is preempted at this point. 5.) R - The remote side recieves the close and closes the socket. 6.) E - poll() wakes up due to the closed socket and invokes the close callback 7.) E - The event loop is preempted right before remoteClientCloseFunc is called 8.) W - The worker now finishes, and frees the conn object. 9.) E - The remoteClientCloseFunc accesses the now-freed conn object in the attempt to retrieve pointer for the real close callback. 10.) Kaboom, corrupted memory/segfault.
With patch 1/6 alone, I think I saw exactly this scenario by doing: # virsh migrate dom qemu+ssh://remote/system --live --p2p --verbose where I observed this on the source side under valgrind. [as to the patch itself, I'm still testing it, and ran out of time to definitively say ack today, but so far I'm liking it] DEBUG: Connection close called, sleeping 2013-04-01 23:31:06.966+0000: 8724: error : virNetSocketReadWire:1181 : End of file while reading data: : Input/output error DEBUG: calling the close callback DEBUG: Finishing close ==8724== Invalid read of size 4 ==8724== at 0x390A009220: pthread_mutex_lock (pthread_mutex_lock.c:50) ==8724== by 0x4E89372: virMutexLock (threads-pthread.c:85) ==8724== by 0x4F76CC0: remoteClientCloseFunc (remote_driver.c:337) ==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693) ==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866) ==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500) ==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485) ==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632) ==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247) ==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748) ==8724== by 0x40DC17: main (libvirtd.c:1228) ==8724== Address 0x14df2540 is 160 bytes inside a block of size 312 free'd ==8724== at 0x4A063F0: free (vg_replace_malloc.c:446) ==8724== by 0x4E7AA26: virFree (memory.c:419) ==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145) ==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458) ==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630) ==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685) ==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872) ==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107) ==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253) ==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593) ==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper (remote_dispatch.h:3629) ==8724== by 0x4FAC16D: virNetServerProgramDispatchCall (virnetserverprogram.c:431) ==8724== ==8724== Invalid read of size 4 ==8724== at 0x390A00924C: pthread_mutex_lock (pthread_mutex_lock.c:61) ==8724== by 0x4E89372: virMutexLock (threads-pthread.c:85) ==8724== by 0x4F76CC0: remoteClientCloseFunc (remote_driver.c:337) ==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693) ==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866) ==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500) ==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485) ==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632) ==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247) ==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748) ==8724== by 0x40DC17: main (libvirtd.c:1228) ==8724== Address 0x14df2530 is 144 bytes inside a block of size 312 free'd ==8724== at 0x4A063F0: free (vg_replace_malloc.c:446) ==8724== by 0x4E7AA26: virFree (memory.c:419) ==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145) ==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458) ==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630) ==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685) ==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872) ==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107) ==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253) ==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593) ==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper (remote_dispatch.h:3629) ==8724== by 0x4FAC16D: virNetServerProgramDispatchCall (virnetserverprogram.c:431) ==8724== ==8724== Invalid read of size 4 ==8724== at 0x390A009257: pthread_mutex_lock (pthread_mutex_lock.c:127) ==8724== by 0x4E89372: virMutexLock (threads-pthread.c:85) ==8724== by 0x4F76CC0: remoteClientCloseFunc (remote_driver.c:337) ==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693) ==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866) ==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500) ==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485) ==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632) ==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247) ==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748) ==8724== by 0x40DC17: main (libvirtd.c:1228) ==8724== Address 0x14df253c is 156 bytes inside a block of size 312 free'd ==8724== at 0x4A063F0: free (vg_replace_malloc.c:446) ==8724== by 0x4E7AA26: virFree (memory.c:419) ==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145) ==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458) ==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630) ==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685) ==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872) ==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107) ==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253) ==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593) ==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper (remote_dispatch.h:3629) ==8724== by 0x4FAC16D: virNetServerProgramDispatchCall (virnetserverprogram.c:431) ==8724== ==8724== Invalid write of size 4 ==8724== at 0x390A00925C: pthread_mutex_lock (pthread_mutex_lock.c:125) ==8724== by 0x4E89372: virMutexLock (threads-pthread.c:85) ==8724== by 0x4F76CC0: remoteClientCloseFunc (remote_driver.c:337) ==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693) ==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866) ==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500) ==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485) ==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632) ==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247) ==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748) ==8724== by 0x40DC17: main (libvirtd.c:1228) ==8724== Address 0x14df2538 is 152 bytes inside a block of size 312 free'd ==8724== at 0x4A063F0: free (vg_replace_malloc.c:446) ==8724== by 0x4E7AA26: virFree (memory.c:419) ==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145) ==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458) ==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630) ==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685) ==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872) ==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107) ==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253) ==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593) ==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper (remote_dispatch.h:3629) ==8724== by 0x4FAC16D: virNetServerProgramDispatchCall (virnetserverprogram.c:431) ==8724== ==8724== Invalid read of size 8 ==8724== at 0x4F76CC5: remoteClientCloseFunc (remote_driver.c:338) ==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693) ==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866) ==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500) ==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485) ==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632) ==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247) ==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748) ==8724== by 0x40DC17: main (libvirtd.c:1228) ==8724== Address 0x14df25b8 is 280 bytes inside a block of size 312 free'd ==8724== at 0x4A063F0: free (vg_replace_malloc.c:446) ==8724== by 0x4E7AA26: virFree (memory.c:419) ==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145) ==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458) ==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630) ==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685) ==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872) ==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107) ==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253) ==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593) ==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper (remote_dispatch.h:3629) ==8724== by 0x4FAC16D: virNetServerProgramDispatchCall (virnetserverprogram.c:431) ==8724== ==8724== Invalid read of size 4 ==8724== at 0x390A00A630: pthread_mutex_unlock (pthread_mutex_unlock.c:37) ==8724== by 0x4E893AF: virMutexUnlock (threads-pthread.c:90) ==8724== by 0x4F76DD1: remoteClientCloseFunc (remote_driver.c:354) ==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693) ==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866) ==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500) ==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485) ==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632) ==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247) ==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748) ==8724== by 0x40DC17: main (libvirtd.c:1228) ==8724== Address 0x14df2540 is 160 bytes inside a block of size 312 free'd ==8724== at 0x4A063F0: free (vg_replace_malloc.c:446) ==8724== by 0x4E7AA26: virFree (memory.c:419) ==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145) ==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458) ==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630) ==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685) ==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872) ==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107) ==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253) ==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593) ==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper (remote_dispatch.h:3629) ==8724== by 0x4FAC16D: virNetServerProgramDispatchCall (virnetserverprogram.c:431) ==8724== ==8724== Invalid write of size 4 ==8724== at 0x390A00A645: pthread_mutex_unlock (pthread_mutex_unlock.c:46) ==8724== by 0x4E893AF: virMutexUnlock (threads-pthread.c:90) ==8724== by 0x4F76DD1: remoteClientCloseFunc (remote_driver.c:354) ==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693) ==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866) ==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500) ==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485) ==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632) ==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247) ==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748) ==8724== by 0x40DC17: main (libvirtd.c:1228) ==8724== Address 0x14df2538 is 152 bytes inside a block of size 312 free'd ==8724== at 0x4A063F0: free (vg_replace_malloc.c:446) ==8724== by 0x4E7AA26: virFree (memory.c:419) ==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145) ==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458) ==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630) ==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685) ==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872) ==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107) ==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253) ==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593) ==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper (remote_dispatch.h:3629) ==8724== by 0x4FAC16D: virNetServerProgramDispatchCall (virnetserverprogram.c:431) ==8724== ==8724== Invalid read of size 4 ==8724== at 0x390A00A64C: pthread_mutex_unlock (pthread_mutex_unlock.c:49) ==8724== by 0x4E893AF: virMutexUnlock (threads-pthread.c:90) ==8724== by 0x4F76DD1: remoteClientCloseFunc (remote_driver.c:354) ==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693) ==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866) ==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500) ==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485) ==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632) ==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247) ==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748) ==8724== by 0x40DC17: main (libvirtd.c:1228) ==8724== Address 0x14df253c is 156 bytes inside a block of size 312 free'd ==8724== at 0x4A063F0: free (vg_replace_malloc.c:446) ==8724== by 0x4E7AA26: virFree (memory.c:419) ==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145) ==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458) ==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630) ==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685) ==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872) ==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107) ==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253) ==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593) ==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper (remote_dispatch.h:3629) ==8724== by 0x4FAC16D: virNetServerProgramDispatchCall (virnetserverprogram.c:431) ==8724== ==8724== Invalid read of size 4 ==8724== at 0x390A00A656: pthread_mutex_unlock (pthread_mutex_unlock.c:52) ==8724== by 0x4E893AF: virMutexUnlock (threads-pthread.c:90) ==8724== by 0x4F76DD1: remoteClientCloseFunc (remote_driver.c:354) ==8724== by 0x4FA7E12: virNetClientCloseLocked (virnetclient.c:693) ==8724== by 0x4FAAB69: virNetClientIncomingEvent (virnetclient.c:1866) ==8724== by 0x4FBFE73: virNetSocketEventHandle (virnetsocket.c:1500) ==8724== by 0x4E6E69F: virEventPollDispatchHandles (event_poll.c:485) ==8724== by 0x4E6EF38: virEventPollRunOnce (event_poll.c:632) ==8724== by 0x4E6CFD7: virEventRunDefaultImpl (event.c:247) ==8724== by 0x4FB2DEB: virNetServerRun (virnetserver.c:748) ==8724== by 0x40DC17: main (libvirtd.c:1228) ==8724== Address 0x14df2530 is 144 bytes inside a block of size 312 free'd ==8724== at 0x4A063F0: free (vg_replace_malloc.c:446) ==8724== by 0x4E7AA26: virFree (memory.c:419) ==8724== by 0x4E94D5C: virObjectUnref (virobject.c:145) ==8724== by 0x4F2EEDB: virConnectClose (libvirt.c:1458) ==8724== by 0x12A9A67E: doPeer2PeerMigrate (qemu_migration.c:2630) ==8724== by 0x12A9A8BB: qemuMigrationPerformJob (qemu_migration.c:2685) ==8724== by 0x12A9B0D4: qemuMigrationPerform (qemu_migration.c:2872) ==8724== by 0x12ADA7E0: qemuDomainMigratePerform3 (qemu_driver.c:10107) ==8724== by 0x4F3D7D0: virDomainMigratePerform3 (libvirt.c:6253) ==8724== by 0x438CD7: remoteDispatchDomainMigratePerform3 (remote.c:3593) ==8724== by 0x418A8D: remoteDispatchDomainMigratePerform3Helper (remote_dispatch.h:3629) ==8724== by 0x4FAC16D: virNetServerProgramDispatchCall (virnetserverprogram.c:431) ==8724== DEBUG: callback returned -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org