Yes, we have recently encountered this issue, and it is exactly the same
as the scenario described in the bug report. We investigated the code
and found this potential problem. The issue occurs when collecting
information through the 'virsh capabilities' while concurrently creating
vm. The higher the frequency and concurrency, the more likely the issue
will reproduce. We believe this is a very rare bug; our environment has
been running for three years, and we've only encountered this issue
once. To successfully reproduce the problem, we constructed a test case
by adding a sleep between the lock nested and unlock. But we still need
to try to fix this issue, even though the probability is very low.
在 2024/7/3 8:49 下午, Michal Prívozník 写道:
> On 6/25/24 12:03, hongmianquan via Devel wrote:
>> We need to ensure top lock is acquired before nested lock. Otherwise deadlock
>> issues may arise. We have the stack security driver, which internally manages
>> other security drivers, we call them "top" and "nested".
>>
>> We call virSecurityStackPreFork() to lock the top one, and it also locks
>> and then unlocks the nested drivers prior to fork. Then in
qemuSecurityPostFork(),
>> it unlocks the top one, but not the nested ones. Thus, if one of the nested
>> drivers ("dac" or "selinux") is still locked, it will cause a
deadlock.
>> We discovered this case: the nested list obtained through the
qemuSecurityGetNested()
>> will be locked for subsequent use, such as in virQEMUDriverCreateCapabilities(),
>> where the nested list is locked using qemuSecurityGetDOI, but the top one is not
locked beforehand.
>>
>> The problem stack is as follows:
>>
>> libvirtd thread1 libvirtd thread2 child libvirtd
>> | | |
>> | | |
>> virsh capabilities qemuProcessLanuch |
>> | | |
>> | lock top |
>> | | |
>> lock nested | |
>> | | |
>> | fork------------------->|(held nested
lock)
>> | | |
>> | | |
>> unlock nested unlock top unlock top
>> |
>> |
>> qemuSecuritySetSocketLabel
>> |
>> |
>> lock nested (deadlock)
>>
>> In this commit, we ensure that the top lock is acquired before the nested lock,
>> so during fork, it's not possible for another task to acquire the nested
lock.
>>
>> Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1303031
>>
>> Signed-off-by: hongmianquan <hongmianquan(a)bytedance.com>
>> ---
>> src/libvirt_private.syms | 3 ++-
>> src/qemu/qemu_conf.c | 9 ++++++++-
>> src/qemu/qemu_driver.c | 16 +++++++++-------
>> src/qemu/qemu_security.h | 2 ++
>> src/security/security_manager.c | 22 ++++++++++++++++++++++
>> src/security/security_manager.h | 2 ++
>> 6 files changed, 45 insertions(+), 9 deletions(-)
>
> I can get myself convinced there is a deadlock, but I haven't seen it in
> a while. Have you encountered this problem recently?
>
> Michal
>