On 09/13/2018 10:11 PM, wang.yechao255(a)zte.com.cn wrote:
I just code review, found there may be problem.
The follow statement in founction qemuProcessReconnectHelper:
"if (virThreadCreate(&thread, false, qemuProcessReconnect, data) < 0) "
may be failed (no one can guarantee 'virThreadCreate' always success).
Not sure how, but your response isn't threaded with your original
posting. Makes it difficult to follow.
Posting a few bugs related to qemuProcessReconnectHelper - makes me
wonder "how" you get yourself into this situation. Is it like the
libnuma bug and you're creating scripts that have frequent and continual
libvirtd restarts, which I noted there isn't not really a normal situation.
In any case, I think in this situation what we want is to call
virDomainObjListRemoveLocked instead of virDomainObjListRemove from
qemuDomainRemoveInactive because all the prerequisites for calling the
Locked function are true (the obj is locked/reffed and the list lock is
held).
Now this is *not* to say qemuDomainRemoveInactive should always call the
*Locked variant. It means adding a "bool useLocked" to the API and then
passing true *only* from qemuProcessReconnectHelper and then of course
calling the virDomainObjListRemoveLocked variant when the bool is true.
Requires changing all the other callers to pass false.
Another preferred option:
Patch 1. Split up qemuDomainRemoveInactive, such that code from "cfg =
virQEMUDriverGetConfig(driver);" through/including
"qemuExtDevicesCleanupHost(driver, vm->def);" plus the
"virObjectUnref(cfg);" is in a static helper
"qemuDomainRemoveInactiveCommon".
Patch 2. Create a qemuDomainRemoveInactiveLocked which is a copy of
qemuDomainRemoveInactive except that instead of calling
virDomainObjListRemove it calls virDomainObjListRemoveLocked.
Patch 3. Create a qemuDomainRemoveInactiveJobLocked which copies
qemuDomainRemoveInactiveJob except of course calling
virDomainObjListRemoveLocked.
Patch 4. Modify qemuProcessReconnectHelper to call
qemuDomainRemoveInactiveJobLocked.
John
if ‘virThreadCreate’ failed, the follow backstrace we will get:
#0 0x00007fa89921203e in pthread_rwlock_wrlock () from
/lib64/libpthread.so.0
#1 0x00007fa89ba218e5 in virRWLockWrite (m=<optimized out>) at
util/virthread.c:122
#2 0x00007fa89b9f9ebb in virObjectRWLockWrite (anyobj=<optimized out>)
at util/virobject.c:487
#3 0x00007fa89ba82a68 in virDomainObjListRemove (doms=0x7fa87411fde0,
dom=0x7fa8740f94f0) at conf/virdomainobjlist.c:400
#4 0x00007fa87e1b9ace in qemuDomainRemoveInactive
(driver=driver@entry=0x7fa87411aa20, vm=vm@entry=0x7fa8740f94f0) at
qemu/qemu_domain.c:8309
#5 0x00007fa87e1b9c02 in qemuDomainRemoveInactiveJob
(driver=0x7fa87411aa20, vm=0x7fa8740f94f0) at qemu/qemu_domain.c:8331
#6 0x00007fa87e1ef36d in qemuProcessReconnectHelper
(obj=0x7fa8740f94f0, opaque=0x7fa87b4b3c30) at qemu/qemu_process.c:8035
#7 0x00007fa89ba81e9a in virDomainObjListHelper (payload=<optimized
out>, name=<optimized out>, opaque=0x7fa87b4b3c00) at
conf/virdomainobjlist.c:804
#8 0x00007fa89b9ccaa0 in virHashForEach (table=0x7fa87410e520,
iter=iter@entry=0x7fa89ba81e90 <virDomainObjListHelper>,
data=data@entry=0x7fa87b4b3c00)
at util/virhash.c:580
#9 0x00007fa89ba83391 in virDomainObjListForEach (doms=0x7fa87411fde0,
callback=callback@entry=0x7fa87e1ef220 <qemuProcessReconnectHelper>,
opaque=opaque@entry=0x7fa87b4b3c30) at conf/virdomainobjlist.c:819
#10 0x00007fa87e1f1564 in qemuProcessReconnectAll (driver=<optimized
out>) at qemu/qemu_process.c:8056
#11 0x00007fa87e227928 in qemuStateInitialize (privileged=true,
callback=<optimized out>, opaque=<optimized out>) at qemu/qemu_driver.c:919
#12 0x00007fa89bb9f91f in virStateInitialize (privileged=true,
callback=callback@entry=0x7fa89c547cd0 <daemonInhibitCallback>,
opaque=opaque@entry=0x7fa89d875c00)
at libvirt.c:662
#13 0x00007fa89c547d2b in daemonRunStateInit (opaque=0x7fa89d875c00) at
remote/remote_daemon.c:803
#14 0x00007fa89ba21712 in virThreadHelper (data=<optimized out>) at
util/virthread.c:206
#15 0x00007fa89920edc5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007fa898b3673d in clone () from /lib64/libc.so.6
frame 8, virHashForEach has called virObjectLock(doms)
frame 3, virDomainObjListRemove calls virObjectRWLockWrite(doms) again.
thus deadlock occurs.
原始邮件
*发件人:*PeterKrempa <pkrempa(a)redhat.com>
*收件人:*王业超10154425;
*抄送人:*libvir-list(a)redhat.com <libvir-list(a)redhat.com>
*日 期 :*2018年09月13日 19:31
*主 题 :**Re: [libvirt] [PATCH v2] qemu: fix deadlock if
createqemuProcessReconnect thread failed*
On Thu, Sep 13, 2018 at 19:28:12 +0800, Wang Yechao wrote:
> qemuProcessReconnectHelper has hold the doms lock, if create
> qemuProcessReconnect thread failed, it will get the doms lock
> again to remove the dom from doms list.
>
> add obj->inReconnetCtx flag to avoid deadlock.
Please describe the situation more or provide a reproducer.
>
> Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
> ---
> src/conf/domain_conf.h | 1 +
> src/conf/virdomainobjlist.c | 6 ++++--
> src/qemu/qemu_process.c | 1 +
> 3 files changed, 6 insertions(+), 2 deletions(-)
--
libvir-list mailing list
libvir-list(a)redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list