
On 2015/4/23 19:06, Daniel P. Berrange wrote:
On Thu, Apr 23, 2015 at 07:00:21PM +0800, zhang bo wrote:
The reason for the problem is:
1. guestA locks its vm while creating each tap device (virNetDevTapCreate) in qemuBuildCommandLine(), for about 10 seconds.
2. guestB calls qemuMigrationPrepareAny->*virDomainObjListAdd* to get its vm object, which locks 'doms' and then waits for the vm lock.
3. doms stays locked until guestA unlocks its vm, say for 10 seconds.
4. guestC calls qemuDomainMigrateFinish3->virDomainObjListFindByName, which tries to lock doms. Because it's
Ok, this is the real core problem - FindByName has a bad impl that requires iterating over every single guest. Unfortunately due to the design of the migration API we can't avoid this call, but we could add a second hash table of name -> virDomainObj so we make it O(1) and lock-less.
I have a question: would we add an object (similar to doms) and lock it while searching for the vm in the new hash table? If so, the problem may still exist.
now locked by guestB, guestC blocks here, and it can't be unpaused for at least 10 seconds.
5. The same then happens to guestD, guestE, guestF, etc., so the downtime adds up, to 50 seconds or more.
6. The command 'virsh list' is blocked as well.
Thus, we think the problem must be solved.
Regards, Daniel
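
To make the "second hash table of name -> virDomainObj" idea and the follow-up locking question more concrete, here is a minimal sketch in C. It is not libvirt code: DomObj, NameTable and the name_table_* functions are hypothetical names invented for illustration, and the table is not truly lock-less. It assumes a small table mutex that is held only for the duration of the hash lookup and is never held while waiting on a per-domain lock, which is the property that would avoid the chain of waits described in steps 1-6.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NAME_BUCKETS 64

/* Hypothetical stand-in for virDomainObj: only what the sketch needs. */
typedef struct DomObj {
    char *name;
    int refs;                /* protected by NameTable.lock in this sketch */
    pthread_mutex_t lock;    /* per-domain lock, may be held for a long time */
    struct DomObj *next;     /* bucket chain */
} DomObj;

typedef struct {
    pthread_mutex_t lock;    /* protects buckets and refs, held only briefly */
    DomObj *buckets[NAME_BUCKETS];
} NameTable;

/* djb2 string hash, reduced to a bucket index. */
static unsigned long name_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NAME_BUCKETS;
}

void name_table_init(NameTable *t)
{
    pthread_mutex_init(&t->lock, NULL);
    memset(t->buckets, 0, sizeof(t->buckets));
}

/* Add a domain by name. Returns NULL on a duplicate name or on allocation
 * failure. Duplicate detection only compares names in one bucket; it never
 * takes any per-domain lock. */
DomObj *name_table_add(NameTable *t, const char *name)
{
    unsigned long b = name_hash(name);
    size_t len = strlen(name) + 1;
    DomObj *d;

    pthread_mutex_lock(&t->lock);
    for (d = t->buckets[b]; d; d = d->next) {
        if (strcmp(d->name, name) == 0) {
            pthread_mutex_unlock(&t->lock);
            return NULL;
        }
    }
    d = calloc(1, sizeof(*d));
    if (d && !(d->name = malloc(len))) {
        free(d);
        d = NULL;
    }
    if (d) {
        memcpy(d->name, name, len);
        pthread_mutex_init(&d->lock, NULL);
        d->refs = 1;                 /* reference owned by the table */
        d->next = t->buckets[b];
        t->buckets[b] = d;
    }
    pthread_mutex_unlock(&t->lock);
    return d;
}

/* Look up a domain by name and take a reference on it. The per-domain lock
 * is NOT taken here; the caller locks d->lock after this returns. */
DomObj *name_table_find(NameTable *t, const char *name)
{
    DomObj *d;

    pthread_mutex_lock(&t->lock);
    for (d = t->buckets[name_hash(name)]; d; d = d->next) {
        if (strcmp(d->name, name) == 0) {
            d->refs++;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return d;
}
```

Under these assumptions, a thread that holds guestA's per-domain lock for 10 seconds only delays callers that go on to lock guestA itself; a lookup or add for guestC completes as soon as the short table lock is free. Whether the real fix can drop the table lock entirely, as "lock-less" suggests, is not something this sketch shows.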