On 12/03/14 06:49, Martin Kletzander wrote:
There is one problem that causes various errors in the daemon. When
domain is waiting for a job, it is unlocked while waiting on the
condition. However, if that domain is for example transient and
being removed in another API (e.g. cancelling incoming migration), it
get's unref'd. If the first call, that was waiting, fails to get the
job, it unref's the domain object, and because it was the last
reference, it causes clearing of the whole domain object. However,
when finishing the call, the domain must be unlocked, but there is no
way for the API to know whether it was cleaned or not (unless there
is some ugly temporary variable, but let's scratch that).
The root cause is that our APIs don't ref the objects they are using
and all use the implicit reference that the object has when it is in
the domain list. That reference can be removed when the API is
waiting for a job. And because each domain doesn't do its ref'ing,
it results in the ugly checking of the return value of
virObjectUnref() that we have everywhere.
This patch changes qemuDomObjFromDomain() to ref the domain (using
virDomainObjListFindByUUIDRef()) and adds qemuDomObjEndAPI() which
should be the only function in which the return value of
virObjectUnref() is checked. This makes all reference counting
deterministic and makes the code a bit clearer.
Signed-off-by: Martin Kletzander <mkletzan(a)redhat.com> ---
src/qemu/THREADS.txt | 40 ++- src/qemu/qemu_domain.c | 28
+- src/qemu/qemu_domain.h | 12 +- src/qemu/qemu_driver.c | 708
++++++++++++++++------------------------------
src/qemu/qemu_migration.c | 108 +++---- src/qemu/qemu_migration.h |
10 +- src/qemu/qemu_process.c | 126 ++++----- 7 files changed, 379
insertions(+), 653 deletions(-)
...
index 08d6b7c..a0582d3 100644 --- a/src/qemu/qemu_process.c +++
b/src/qemu/qemu_process.c
...
@@ -3756,6 +3736,9 @@ qemuProcessReconnectHelper(virDomainObjPtr
obj, struct qemuProcessReconnectData *src = opaque; struct
qemuProcessReconnectData *data;
+ virObjectLock(obj); + virObjectRef(obj); + if (!obj->pid)
return 0;
This hunk causes deadlock of the daemon when starting if you have at
least one inactive VM. You lock the object, find out that the pid
is 0 and return without unlocking.
The daemon then locks up when trying to reload snapshot list for
the domain as the domain is still left locked:
Thread 2 (Thread 0x7fd6165f1700 (LWP 847791)):
#0 0x00007fd62c16faf4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fd62c16b459 in _L_lock_535 () from /lib64/libpthread.so.0
#2 0x00007fd62c16b280 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fd62f99b453 in virMutexLock (m=0x7fd61c0e4810) at util/virthread.c:88
#4 0x00007fd62f97cdc6 in virObjectLock (anyobj=0x7fd61c0e4800) at util/virobject.c:323
#5 0x00007fd617672af5 in qemuDomainSnapshotLoad (vm=0x7fd61c0e4800, data=0x7fd61c0023f0)
at qemu/qemu_driver.c:474
#6 0x00007fd62f9f20b8 in virDomainObjListHelper (payload=0x7fd61c0e4800,
name=0x7fd61c0f0de0, opaque=0x7fd6165f0810) at conf/domain_conf.c:20454
#7 0x00007fd62f9559be in virHashForEach (table=0x7fd61c008050, iter=0x7fd62f9f2072
<virDomainObjListHelper>, data=0x7fd6165f0810) at util/virhash.c:521
#8 0x00007fd62f9f213e in virDomainObjListForEach (doms=0x7fd61c03bdd0,
callback=0x7fd617672a55 <qemuDomainSnapshotLoad>, opaque=0x7fd61c0023f0) at
conf/domain_conf.c:20467
#9 0x00007fd61767445c in qemuStateInitialize (privileged=true, callback=0x7fd630584ab2
<daemonInhibitCallback>, opaque=0x7fd63258dba0) at qemu/qemu_driver.c:877
#10 0x00007fd62fa5f05a in virStateInitialize (privileged=true, callback=0x7fd630584ab2
<daemonInhibitCallback>, opaque=0x7fd63258dba0) at libvirt.c:742
#11 0x00007fd630584b7a in daemonRunStateInit (opaque=0x7fd63258dba0) at libvirtd.c:918
#12 0x00007fd62f99b8b4 in virThreadHelper (data=0x7fd6325b4480) at util/virthread.c:197
#13 0x00007fd62c169023 in start_thread () from /lib64/libpthread.so.0
#14 0x00007fd62bea470d in clone () from /lib64/libc.so.6
@@ -3780,8 +3763,6 @@ qemuProcessReconnectHelper(virDomainObjPtr
obj, * this early phase. */
- virObjectLock(obj); - qemuDomainObjRestoreJob(obj,
&data->oldjob);
if (qemuDomainObjBeginJob(src->driver, obj, QEMU_JOB_MODIFY) < 0)
Rest of the review will follow once I find a way to solve that issue.
Peter