On Mon, Apr 26, 2021 at 17:23:15 +0800, gongwei(a)smartx.com wrote:
When libvirtd is stopped, it exits the event loop and cleans up
driverState first, then releases the threadPool. If a worker thread
is still executing at that point, it still needs to access driverState.
If the values in driverState are not checked before use, the direct
access causes an abnormal exit and damages libvirt's cache file.
In our example, a migration is in progress: the source is waiting for
the destination libvirtd's dstFinish to return when the source libvirtd
is stopped, and a crash occurs. After libvirtd is started again,
the corresponding virtual machine process cannot be managed by libvirt.
This explanation seems to suggest that this happens when you are
shutting down libvirtd ...
stack:
#0 virSecurityManagerGetNested (mgr=0x7f76141143c0) at
security/security_manager.c:1033
1033 if (STREQ("stack", mgr->drv->name))
(gdb) bt
#0 virSecurityManagerGetNested (mgr=0x7f76141143c0) at
security/security_manager.c:1033
#1 0x00007f761c31660e in virQEMUDriverCreateCapabilities
(driver=driver@entry=0x7f7614111060)
at qemu/qemu_conf.c:1043
... so I'd say that calling this function would be invalid in the first
place.
#2 0x00007f761c3168b3 in virQEMUDriverGetCapabilities
(driver=0x7f7614111060,
refresh=<optimized out>) at qemu/qemu_conf.c:1103
#3 0x00007f761c334d16 in qemuMigrationCookieXMLParse (flags=32,
ctxt=0x7f76040040c0,
doc=0x7f76040425c0, driver=0x7f7614111060, mig=0x7f760400ee10)
at qemu/qemu_migration_cookie.c:1209
#4 qemuMigrationCookieXMLParseStr (flags=32,
xml=0x7f7604004580 "<qemu-migration>\n
<name>519ed304-375a-4819-a2d5-2f0ba662b9bc</name>
<uuid>049152ab-efdf-4aaf-ab08-b57ac1816351</uuid>
<hostname>gongwei-nestedcluster-20210330042359-1</hostname>
<hostuuid>41d69"..., driver=0x7f7614111060, mig=0x7f760400ee10)
at qemu/qemu_migration_cookie.c:1404
#5 qemuMigrationEatCookie (driver=driver@entry=0x7f7614111060,
dom=dom@entry=0x7f7604001ac0,
This function was renamed in v6.8.0-17-g43f0944f66 so this backtrace
comes from an old version. Which version did you observe the bug in?
I'm asking because around v6.7.0-71-g399039a6b1 libvirt's shutdown code
was modified to actually wait for threads, so the bug as you are
describing it may no longer be valid.
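
To illustrate the ordering in question, here is a minimal sketch using
plain pthreads. It is not libvirt code: DriverState, worker() and
run_and_shutdown() are hypothetical names made up for this example. It
only shows the general idea that shutdown joins the worker threads
before freeing the state they dereference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define N_WORKERS 4

/* Hypothetical stand-in for a driver's global state (not libvirt's struct). */
typedef struct {
    const char *name;
} DriverState;

static DriverState *driver_state;  /* shared with the workers */
static int jobs_done;              /* workers that saw valid state */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    /* Dereferencing driver_state is safe only because shutdown
     * joins this thread before freeing the state. */
    pthread_mutex_lock(&lock);
    if (driver_state && driver_state->name)
        jobs_done++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts the workers, then shuts down in the safe order.
 * Returns the number of workers that ran against valid state,
 * or -1 if the state could not be allocated. */
int run_and_shutdown(void)
{
    pthread_t threads[N_WORKERS];
    int started = 0;
    int i;

    assert(driver_state == NULL);  /* state must not already exist */
    jobs_done = 0;

    driver_state = malloc(sizeof(*driver_state));
    if (!driver_state)
        return -1;
    driver_state->name = "stack";

    for (i = 0; i < N_WORKERS; i++) {
        if (pthread_create(&threads[started], NULL, worker, NULL) == 0)
            started++;
    }

    /* Step 1: wait for every worker that was started. */
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    /* Step 2: only now release the state the workers were using. */
    free(driver_state);
    driver_state = NULL;

    return jobs_done;
}
```

Build with -pthread. Swapping step 1 and step 2 would recreate the kind
of use-after-free described above, where a worker dereferences state
that shutdown has already released.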
If you can reproduce it with upstream libvirtd, which this patch
should be against, please also include some reasonable steps to show that
it's still happening, or at least a recent backtrace.