Hi everyone,

 

I am performing a RAM migration in a KVM environment (libvirt 1.2.4, qemu 1.5.1).
libvirtd deadlocks or segfaults when I destroy the migrating VM on the
destination.

I hit the problem with the following steps:

step 1: migrate the VM.
step 2: immediately execute "virsh destroy [VMName]" to destroy the migrating
        VM on the destination.
step 3: the destination libvirtd will probably deadlock or segfault.

 

The deadlock stack is as follows:

#0  0x00007fb5c18132d4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fb5c180e659 in _L_lock_1008 () from /lib64/libpthread.so.0
#2  0x00007fb5c180e46e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fb5c45d175f in virMutexLock (m=0x7fb5b0066ed0) at util/virthread.c:88
#4  0x00007fb5c45b6b04 in virObjectLock (anyobj=0x7fb5b0066ec0) at
    util/virobject.c:323
#5  0x00007fb5b8f4842a in qemuMonitorEmitEvent (mon=0x7fb5b0066ec0,
    event=0x7fb5b00688d0 "SHUTDOWN", seconds=1399374472, micros=509994,
    details=0x0) at qemu/qemu_monitor.c:1185
#6  0x00007fb5b8f62af2 in qemuMonitorJSONIOProcessEvent (mon=0x7fb5b0066ec0,
    obj=0x7fb5b0069080) at qemu/qemu_monitor_json.c:158
#7  0x00007fb5b8f62d25 in qemuMonitorJSONIOProcessLine (mon=0x7fb5b0066ec0,
    line=0x7fb5b005bbe0 "{\"timestamp\": {\"seconds\": 1399374472,
    \"microseconds\": 509994}, \"event\": \"SHUTDOWN\"}",msg=0x7fb5bd873c80)
    at qemu/qemu_monitor_json.c:195
#8  0x00007fb5b8f62f85 in qemuMonitorJSONIOProcess (mon=0x7fb5b0066ec0,
    data=0x7fb5b0060770 "{\"timestamp\": {\"seconds\": 1399374472,
    \"microseconds\": 509994},\"event\": \"SHUTDOWN\"}\r\n", len=85,
    msg=0x7fb5bd873c80) at qemu/qemu_monitor_json.c:237
#9  0x00007fb5b8f49aa0 in qemuMonitorIOProcess (mon=0x7fb5b0066ec0)
    at qemu/qemu_monitor.c:402
#10 0x00007fb5b8f4a09b in qemuMonitorIO (watch=20, fd=24, events=0,
    opaque=0x7fb5b0066ec0) at qemu/qemu_monitor.c:651
#11 0x00007fb5c458c4d9 in virEventPollDispatchHandles (nfds=17, fds=0x7fb5b0068a60)
    at util/vireventpoll.c:510
#12 0x00007fb5c458decf in virEventPollRunOnce () at util/vireventpoll.c:659
#13 0x00007fb5c458bfcc in virEventRunDefaultImpl () at util/virevent.c:308
#14 0x00007fb5c51a17a9 in virNetServerRun (srv=0x7fb5c5411d70)
    at rpc/virnetserver.c:1139
#15 0x00007fb5c5157f63 in main (argc=3, argv=0x7fff7fc04f48) at libvirtd.c:150

 

 

After analysis, I found that it may be caused by multiple threads simultaneously
accessing the shared variable "vm->privateData->mon".

When the problem occurs, there are three libvirtd threads at work on the
destination host. Suppose:

ThreadA: migration thread, doing qemuProcessStart.
ThreadB: destroy thread, doing qemuDomainDestroy -> qemuProcessStop.
ThreadC: monitor thread, doing IOWrite, IORead and some other operations
         according to mon->msg when mon->fd changes. When the destroy in
         threadB happens, this thread handles the SHUTDOWN event.

 

In threadA, when it sends a QMP command to QEMU, it takes the
vm->privateData->mon lock, for example in the sequence
"qemuDomainObjEnterMonitor -> qemuMonitorSetBalloon -> qemuDomainObjExitMonitor".
This sequence is not atomic. If "virsh destroy [VMName]" happens in the middle
of it, threadB sets vm->privateData->mon to NULL in qemuProcessStop. Then, in
threadA, qemuDomainObjExitMonitor fails to unlock vm->privateData->mon because
it is NULL. As a result, threadC can never acquire the vm->privateData->mon
lock, and the deadlock occurs.

 

What is worse, if qemuMonitorSetBalloon succeeds in threadA, threadA continues
to execute until it enters the function qemuMonitorSetDomainLog, where it
segfaults at VIR_FORCE_CLOSE(mon->logfd) because mon is NULL.

 

I could not find a good way to solve this problem. Does anyone have good ideas?

Thanks.

 

PS:

I found an easy way to reproduce this problem more reliably, with the
following steps:

step 1: fault injection: apply this patch and update libvirtd on the
        destination host:

--- src/qemu/qemu_process.c     2014-05-06 19:06:00.000000000 +0800
+++ src/qemu/qemu_process.c     2014-05-06 19:07:12.000000000 +0800
@@ -4131,6 +4131,8 @@
                        vm->def->mem.cur_balloon);
         goto cleanup;
     }
+    VIR_DEBUG("Fault Injection, sleep 3 seconds.");
+    sleep(3);
     qemuDomainObjEnterMonitor(driver, vm);
     if (vm->def->memballoon && vm->def->memballoon->period)
         qemuMonitorSetMemoryStatsPeriod(priv->mon, vm->def->memballoon->period);

 

step 2: migrate the VM.
step 3: execute "virsh destroy [VMName]" to destroy the migrating VM on the
        destination when the log prints "Fault Injection, sleep 3 seconds."
step 4: libvirtd deadlocks with the stack shown above.

 

Regards