
On Mon, Sep 22, 2025 at 16:15:47 +0800, Yong Huang wrote:
On Mon, Sep 22, 2025 at 2:59 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Mon, Sep 22, 2025 at 11:30:46 +0800, Yong Huang wrote:
On Fri, Sep 19, 2025 at 8:23 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Fri, Sep 19, 2025 at 17:09:07 +0800, yong.huang@smartx.com wrote:
From: Hyman Huang <yong.huang@smartx.com>
[...]
3. Launch the migration and use "systemctl restart libvirt" to restart libvirtd once the migration has entered the perform phase.
[...]
Okay so my understanding from your description is that an (early startup) failure in virDomainObjListLoadAllConfigs() (and surrounding code) can result in the daemon shutting down before the threads handling the already loaded(? ... impossible to tell with the abbreviated log below) domains terminate? Right?
Yes. In our productized libvirt, an early failure in virDomainObjListLoadAllConfigs()
can result in the daemon shutting down.
In upstream libvirt, the daemon starts up successfully but fails to manage the VM
(virDomainObjListLoadAllConfigs() returns an error because the private data is
missing from the status XML).
So I assume a non-upstream version. What is it based on? What else did you change?
Thus the other threads trigger a use-after-free on the driver object?
Anyways I think it's clear now that just checking if the callbacks are present doesn't make sense.
Additionally there's now an upstream issue https://gitlab.com/libvirt/libvirt/-/issues/814 which seems to claim a use-after-free on a different code path but still triggered by the cleanup code freeing private data.
Unfortunately I didn't get any logs or backtrace there either.
I'll look into the shutdown code path and see if I can figure it out.
4. Search the log message:
$ cat /var/log/zbs/libvirtd.log |egrep "PrivateData formatter driver does not exist|remoteDispatchDomainMigratePerform3Params"
2025-09-22 03:06:12.517+0000: 1124258: debug : virThreadJobSet:94 : Thread 1124258 (rpc-worker) is now running job remoteDispatchDomainMigratePerform3Params
This log line indicates that thread 1124258 is now executing remoteDispatchDomainMigratePerform3Params.
2025-09-22 03:06:12.517+0000: 1124258: debug : remoteDispatchDomainMigratePerform3ParamsHelper:8804 : server=0x556317979660 client=0x55631799eff0 msg=0x55631799c010 rerr=0x7f08c688b9c0 args=0x7f08a800a820 ret=0x7f08a80053b0
2025-09-22 03:06:21.959+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist
On the remoteDispatchDomainMigratePerform3Params execution path, the check below is reached and
its warning is logged; in a successful migration this warning never appears:
+    if (!xmlopt->privateData.format) {
+        VIR_WARN("PrivateData formatter driver does not exist");
+    }
The following shows the backtrace of virDomainObjFormat in a successful migration:
Successful, meaning you didn't hit the bug?
#0 virDomainObjFormat (obj=obj@entry=0x7fa3342598e0, xmlopt=0x7fa3341c54b0, flags=flags@entry=313) at ../../src/conf/domain_conf.c:30166
#1 0x00007fa395ae8684 in virDomainObjSave (obj=obj@entry=0x7fa3342598e0, xmlopt=<optimized out>, statusDir=0x7fa33412aec0 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:30375
[...]
I asked for a backtrace of all threads as I want to see what the other threads are doing during the shutdown.
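A backtrace of all threads can usually be captured by attaching gdb in batch mode. The following is only a sketch: it assumes gdb and the libvirt debug symbols are installed on the host, the helper name dump_all_threads is made up for illustration, and the attach has to happen while the daemon is still alive, which may be hard to time during shutdown.

```shell
# Hypothetical helper: attach to a process, print a backtrace of every
# thread, then detach. Assumes gdb is installed; PID is the daemon's pid.
dump_all_threads() {
    if [ -z "$1" ]; then
        echo "usage: dump_all_threads PID" >&2
        return 2
    fi
    # -batch exits after running the -ex commands; -p attaches to the pid.
    gdb -batch -p "$1" -ex "thread apply all bt"
}
```

For example, `dump_all_threads "$(pidof libvirtd)" > libvirtd-threads.txt`, assuming the process is named libvirtd.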
2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist
2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist
2025-09-22 03:06:25.153+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist
2025-09-22 03:06:25.153+0000: 1124258: debug : virThreadJobClear:119 : Thread 1124258 (rpc-worker) finished job remoteDispatchDomainMigratePerform3Params with ret=-1
This log is so abbreviated that it's useless. Please post the full thing somewhere.
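For a full log, a small helper can isolate everything a single worker thread logged while it ran one job, which keeps the context that the egrep filter above drops. This is a sketch only: the function name job_window is hypothetical, and it assumes the log line layout seen in the excerpts in this thread (date, time, thread id followed by a colon, then the message).

```shell
# Hypothetical helper: print every line logged by one thread between
# "is now running job JOB" and "finished job JOB".
# Usage: job_window LOGFILE THREAD_ID JOB_NAME
job_window() {
    awk -v tid="$2" -v job="$3" '
        # Field 3 is the thread id followed by a colon, e.g. "1124258:".
        $3 != (tid ":") { next }
        # The job name is the last field on the "now running" line.
        /is now running job/ && $NF == job { inside = 1 }
        inside { print }
        # Stop after the matching "finished job" line has been printed.
        /finished job/ && index($0, " " job " ") { inside = 0 }
    ' "$1"
}
```

For example: `job_window /var/log/zbs/libvirtd.log 1124258 remoteDispatchDomainMigratePerform3Params`.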
:( Since we focus on the shutdown code of Libvirtd, getting the backtrace is not easy, so I added the debug patch.
Additionally if you can reproduce this without the patch I'd be interested in that log as well.
Yes, I reproduced this with libvirt 6.2.0. The latest upstream version uses the same
logic, so I assume it has this issue as well; reproducing it is not that hard.
So, can you hit this problem with current upstream code? While the logic for formatting data is the same, it's not actually the problem. The problem is in the shutdown logic and I do remember some changes in the code. Especially if you're claiming to use libvirt-6.2 which is 5 years old at this point.