[PATCH 0/1] Refine the domain status saving logic

From: Hyman Huang <yong.huang@smartx.com> When saving the domain status in the path of the virDomainObjSave function, Libvirtd will omit the private data if the format interface is not provided. When the Libvirtd service is starting up and the driver has not yet been initialized, this behavior is justified. However, when the service is shutting down, the driver is being finalized and the interface is being released, a migration job or another job may call the qemuDomainSaveStatus and save the domain status at the same time. For the latter, this behavior causes the XML to be saved without private information (such as monitor path and qemuCaps), which is required for the Libvirtd service to manage a VM during startup. As a result, after restarting the Libvirtd service, a running VM that is being migrated previously might escape management. Thus, when formatting the status XML for a running virtual machine, we need to presume that the "privateData.format" interface is present. Add a thorough check in the virDomainObjSave path to make sure that private data in the status XML file always exists for the running VM so that we won't lose it after restart the Libvirtd service. Please review, thanks. Yong Hyman Huang (1): domain_conf: Do not omit private data when formatting the status XML src/conf/domain_conf.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) -- 2.27.0

From: Hyman Huang <yong.huang@smartx.com> Add a thorough check in the virDomainObjSave path to make sure that private data in the status XML file always exists for the running VM so that we won't lose them after restart the service. --- src/conf/domain_conf.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 281846dfbe..74af08e584 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -29542,9 +29542,24 @@ virDomainObjFormat(virDomainObj *obj, obj->deprecations[i]); } - if (xmlopt->privateData.format && - xmlopt->privateData.format(&buf, obj) < 0) - return NULL; + if (xmlopt->privateData.format) { + if (xmlopt->privateData.format(&buf, obj) < 0) + return NULL; + } else { + /* + * Add a thorough check in the virDomainObjSave path to make + * sure that private data in the status XML file always exists + * for the running VM so that we won't lose them after restart + * the service. + */ + if (virDomainObjIsActive(obj) && + (obj->def->virtType == VIR_DOMAIN_VIRT_KVM)) { + VIR_WARN("Do not omit private data when formatting the" + " status XML for a %s VM", + virDomainStateTypeToString(state)); + return NULL; + } + } if (virDomainDefFormatInternal(obj->def, xmlopt, &buf, flags) < 0) return NULL; -- 2.27.0

On Fri, Sep 19, 2025 at 17:09:08 +0800, yong.huang@smartx.com wrote:
From: Hyman Huang <yong.huang@smartx.com>
Add a thorough check in the virDomainObjSave path to make sure that private data in the status XML file always exists for the running VM so that we won't lose them after restart the service. --- src/conf/domain_conf.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 281846dfbe..74af08e584 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -29542,9 +29542,24 @@ virDomainObjFormat(virDomainObj *obj, obj->deprecations[i]); }
- if (xmlopt->privateData.format && - xmlopt->privateData.format(&buf, obj) < 0) - return NULL; + if (xmlopt->privateData.format) { + if (xmlopt->privateData.format(&buf, obj) < 0) + return NULL; + } else { + /* + * Add a thorough check in the virDomainObjSave path to make + * sure that private data in the status XML file always exists + * for the running VM so that we won't lose them after restart + * the service. + */ + if (virDomainObjIsActive(obj) && + (obj->def->virtType == VIR_DOMAIN_VIRT_KVM)) { + VIR_WARN("Do not omit private data when formatting the" + " status XML for a %s VM", + virDomainStateTypeToString(state)); + return NULL; + } + }
So IIUC (based on the description in the cover letter) you're saying that there's a possibility that on the shutdown of the qemu driver something unregisters the private data formatting functions and thus we'd overwrite the status XML with a XML missing private data, right? In such case this is not the right fix. It just covers up the that fact and also prints unactionable warnings caused by internal bug. The proper fix is that the private data formatters must not be unregistered if we have internal state. Can you elaborate when that happens? This seems like a race condition in the shutdown code.

On Fri, Sep 19, 2025 at 5:44 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Fri, Sep 19, 2025 at 17:09:08 +0800, yong.huang@smartx.com wrote:
From: Hyman Huang <yong.huang@smartx.com>
Add a thorough check in the virDomainObjSave path to make sure that private data in the status XML file always exists for the running VM so that we won't lose them after restart the service. --- src/conf/domain_conf.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 281846dfbe..74af08e584 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -29542,9 +29542,24 @@ virDomainObjFormat(virDomainObj *obj, obj->deprecations[i]); }
- if (xmlopt->privateData.format && - xmlopt->privateData.format(&buf, obj) < 0) - return NULL; + if (xmlopt->privateData.format) { + if (xmlopt->privateData.format(&buf, obj) < 0) + return NULL; + } else { + /* + * Add a thorough check in the virDomainObjSave path to make + * sure that private data in the status XML file always exists + * for the running VM so that we won't lose them after restart + * the service. + */ + if (virDomainObjIsActive(obj) && + (obj->def->virtType == VIR_DOMAIN_VIRT_KVM)) { + VIR_WARN("Do not omit private data when formatting the" + " status XML for a %s VM", + virDomainStateTypeToString(state)); + return NULL; + } + }
So IIUC (based on the description in the cover letter) you're saying that there's a possibility that on the shutdown of the qemu driver something unregisters the private data formatting functions and thus we'd overwrite the status XML with a XML missing private data, right?
In such case this is not the right fix. It just covers up the that fact and also prints unactionable warnings caused by internal bug.
The proper fix is that the private data formatters must not be unregistered if we have internal state.
Can you elaborate when that happens? This seems like a race condition in the shutdown code.
The xmlopt driver contains privateData that implements the parse/format interface:
virQEMUDriverCreateXMLConf virDomainXMLOptionNew xmlopt->privateData = &virQEMUDriverPrivateDataCallbacks; The shutdown code frees the xmlopt field in qemu_driver, thereby releasing the parse/format interface: qemuStateCleanup virObjectUnref(qemu_driver->xmlopt); Meanwhile, the migration thread may call the virDomainObjSave function during the execution of qemuMigrationSrcConfirmPhase or qemuMigrationDstFinish: virDomainObjSave virDomainObjFormat if (xmlopt->privateData.format && xmlopt->privateData.format(&buf, obj) < 0) return NULL; As shown above, the shutdown code and the migration thread race to access the privateData format interface. -- Best regards

On Fri, Sep 19, 2025 at 17:09:07 +0800, yong.huang@smartx.com wrote:
From: Hyman Huang <yong.huang@smartx.com>
When saving the domain status in the path of the virDomainObjSave function, Libvirtd will omit the private data if the format interface is not provided.
When the Libvirtd service is starting up and the driver has not yet been initialized, this behavior is justified.
However, when the service is shutting down, the driver is being finalized and the interface is being released, a migration job or another job may call the qemuDomainSaveStatus and save the domain status at the same time. For the latter, this behavior
I had another look and `virQEMUDriverPrivateDataCallbacks` is a global variable in the qemu driver which is just populated and never changed. This struct is then copied (by assignment) into the virDomainXMLOption instance that is used inside the qemu driver. Thus while this does have a private copy of the function pointers. The virDomainXMLOption instance lives the whole time the virt drivers live, so it's only ever cleared in virStateCleanup, which in the daemons happens right before exit. Now looking at the formatter code (let's use your patch as an example): --- src/conf/domain_conf.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 281846dfbe..74af08e584 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -29542,9 +29542,24 @@ virDomainObjFormat(virDomainObj *obj, obj->deprecations[i]); } =20 - if (xmlopt->privateData.format && - xmlopt->privateData.format(&buf, obj) < 0) - return NULL; + if (xmlopt->privateData.format) { + if (xmlopt->privateData.format(&buf, obj) < 0) Here you see that the code assumes that 'xmlopt' is *not NULL*. Inside of XMLopt you have only pointers that don't change. As of such the 'format' callback being NULL is only if something overwrote the data inside 'xmlopt', which we don't do anywhere because it doesn't make sense. The other possibility is that 'xmlopt' is used after free, thus was randomly overwritten, but that would be a much different kind of bug and your fix wouldn't fix that at all.
causes the XML to be saved without private information (such as monitor path and qemuCaps), which is required for the Libvirtd service to manage a VM during startup. As a result, after restarting the Libvirtd service, a running VM that is being migrated previously might escape management.
Thus, when formatting the status XML for a running virtual machine, we need to presume that the "privateData.format" interface is present.
Add a thorough check in the virDomainObjSave path to make sure that private data in the status XML file always exists for the running VM so that we won't lose it after restart the Libvirtd service.
Now this outlines the implications of losing the private data but doesn't really give any information how this happened. Can you please clarify: 1) What did you do to trigger this bug? What was the state of the host? 2) How did you figure out that what you claim happened? 3) Can you reproduce the bug? If you can reproduce the bug please make sure to collect debug logs of libvirtd, and potentially consider running the daemon inside e.g. valgrind to see if it reproduces with such heavy instrumentation.

On Fri, Sep 19, 2025 at 8:23 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Fri, Sep 19, 2025 at 17:09:07 +0800, yong.huang@smartx.com wrote:
From: Hyman Huang <yong.huang@smartx.com>
When saving the domain status in the path of the virDomainObjSave function, Libvirtd will omit the private data if the format interface is not provided.
When the Libvirtd service is starting up and the driver has not yet been initialized, this behavior is justified.
However, when the service is shutting down, the driver is being finalized and the interface is being released, a migration job or another job may call the qemuDomainSaveStatus and save the domain status at the same time. For the latter, this behavior
I had another look and `virQEMUDriverPrivateDataCallbacks` is a global variable in the qemu driver which is just populated and never changed.
This struct is then copied (by assignment) into the virDomainXMLOption instance that is used inside the qemu driver. Thus while this does have a private copy of the function pointers. The virDomainXMLOption instance lives the whole time the virt drivers live, so it's only ever cleared in virStateCleanup, which in the daemons happens right before exit.
Now looking at the formatter code (let's use your patch as an example):
--- src/conf/domain_conf.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 281846dfbe..74af08e584 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -29542,9 +29542,24 @@ virDomainObjFormat(virDomainObj *obj, obj->deprecations[i]); } =20 - if (xmlopt->privateData.format && - xmlopt->privateData.format(&buf, obj) < 0) - return NULL; + if (xmlopt->privateData.format) { + if (xmlopt->privateData.format(&buf, obj) < 0)
Here you see that the code assumes that 'xmlopt' is *not NULL*. Inside of XMLopt you have only pointers that don't change.
As of such the 'format' callback being NULL is only if something overwrote the data inside 'xmlopt', which we don't do anywhere because it doesn't make sense.
The other possibility is that 'xmlopt' is used after free, thus was randomly overwritten, but that would be a much different kind of bug and your fix wouldn't fix that at all.
causes the XML to be saved without private information (such as monitor path and qemuCaps), which is required for the Libvirtd service to manage a VM during startup. As a result, after restarting the Libvirtd service, a running VM that is being migrated previously might escape management.
Thus, when formatting the status XML for a running virtual machine, we need to presume that the "privateData.format" interface is present.
Add a thorough check in the virDomainObjSave path to make sure that private data in the status XML file always exists for the running VM so that we won't lose it after restart the Libvirtd service.
Now this outlines the implications of losing the private data but doesn't really give any information how this happened.
Can you please clarify: 1) What did you do to trigger this bug? What was the state of the host? 2) How did you figure out that what you claim happened? 3) Can you reproduce the bug?
If you can reproduce the bug please make sure to collect debug logs of libvirtd, and potentially consider running the daemon inside e.g. valgrind to see if it reproduces with such heavy instrumentation.
1. Add a debuginfo print patch on Libvirtd, if privateData.format does not exist, log it: --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -30186,6 +30186,11 @@ virDomainObjFormat(virDomainObjPtr obj, xmlopt->privateData.format(&buf, obj) < 0) return NULL; + if (!xmlopt->privateData.format) { + VIR_WARN("PrivateData formatter driver does not exist"); + } + 2. Add a private patch(Not the complete code): Stop process if malformed config found Malformed material may prevent the virtual machine from loading in its live state, according to the configuration XML. If such a scenario was found by the Libvirt service during startup, report an error and terminate. diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index ba5eb7ad1b..9a97fa63fd 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -951,7 +951,7 @@ qemuStateInitialize(bool privileged, /* Get all the running persistent or transient configs first */ if (virDomainObjListLoadAllConfigs(qemu_driver->domains, cfg->stateDir, - NULL, true, false, + NULL, true, true, qemu_driver->xmlopt, NULL, NULL) < 0) --- a/src/bhyve/bhyve_driver.c +++ b/src/bhyve/bhyve_driver.c @@ -1286,14 +1286,14 @@ bhyveStateInitialize(bool privileged, if (virDomainObjListLoadAllConfigs(bhyve_driver->domains, BHYVE_STATE_DIR, - NULL, true, + NULL, true, false, bhyve_driver->xmlopt, NULL, NULL) < 0) goto cleanup; if (virDomainObjListLoadAllConfigs(bhyve_driver->domains, BHYVE_CONFIG_DIR, - BHYVE_AUTOSTART_DIR, false, + BHYVE_AUTOSTART_DIR, false, false, bhyve_driver->xmlopt, NULL, NULL) < 0) --- a/src/conf/virdomainobjlist.c +++ b/src/conf/virdomainobjlist.c @@ -584,6 +584,7 @@ virDomainObjListLoadAllConfigs(virDomainObjListPtr doms, const char *configDir, const char *autostartDir, bool liveStatus, + bool liveStatusErrorFatal, virDomainXMLOptionPtr xmlopt, virDomainLoadConfigNotify notify, void *opaque) @@ -630,9 +631,14 @@ virDomainObjListLoadAllConfigs(virDomainObjListPtr doms, virDomainObjEndAPI(&dom); } else { VIR_ERROR(_("Failed to load config for domain '%s'"), entry->d_name); + if (liveStatus && liveStatusErrorFatal) { + ret = -1; + goto end; + } } } +end: VIR_DIR_CLOSE(dir); virObjectRWUnlock(doms); diff --git a/src/conf/virdomainobjlist.h b/src/conf/virdomainobjlist.h index 6150e13aa4..1ff3679c50 100644 --- a/src/conf/virdomainobjlist.h +++ b/src/conf/virdomainobjlist.h @@ -66,6 +66,7 @@ int virDomainObjListLoadAllConfigs(virDomainObjListPtr doms, const char *configDir, const char *autostartDir, bool liveStatus, + bool liveStatusErrorFatal, virDomainXMLOptionPtr xmlopt, virDomainLoadConfigNotify notify, void *opaque); ... 3. Launch the migration and use "systemctl restart libvirt" to restart Libvirtd once after migration enters the perform phase. 4. Search the log message: $ cat /var/log/zbs/libvirtd.log |egrep "PrivateData formatter driver does not exist|remoteDispatchDomainMigratePerform3Params" 2025-09-22 03:06:12.517+0000: 1124258: debug : virThreadJobSet:94 : Thread 1124258 (rpc-worker) is now running job remoteDispatchDomainMigratePerform3Params 2025-09-22 03:06:12.517+0000: 1124258: debug : remoteDispatchDomainMigratePerform3ParamsHelper:8804 : server=0x556317979660 client=0x55631799eff0 msg=0x55631799c010 rerr=0x7f08c688b9c0 args=0x7f08a800a820 ret=0x7f08a80053b0 2025-09-22 03:06:21.959+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.153+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.153+0000: 1124258: debug : virThreadJobClear:119 : Thread 1124258 (rpc-worker) finished job remoteDispatchDomainMigratePerform3Params with ret=-1 Migration rpc-worker routine print the message 5. Here we see the status XML: <domstatus state='running' reason='migrated' pid='1092202'> <taint flag='custom-argv'/> <taint flag='custom-ga-command'/> <domain type='kvm' id='1' xmlns:qemu=' http://libvirt.org/schemas/domain/qemu/1.0'> <name>88030027-a5ce-4c19-aacc-5f1decf0b647</name> <uuid>6c677d6f-d140-48ca-b4ae-aff7fa7e4c50</uuid> <description>SmartX</description> <maxMemory slots='255' unit='KiB'>4194304000</maxMemory> ... It does not contain private data. 6. Finally, in our case, Libvirtd service can not startup; And in the upstream case, the behavior is that VM escapes from managing after startup logically. -- Best regards

On Mon, Sep 22, 2025 at 11:30:46 +0800, Yong Huang wrote:
On Fri, Sep 19, 2025 at 8:23 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Fri, Sep 19, 2025 at 17:09:07 +0800, yong.huang@smartx.com wrote:
From: Hyman Huang <yong.huang@smartx.com>
When saving the domain status in the path of the virDomainObjSave function, Libvirtd will omit the private data if the format interface is not provided.
When the Libvirtd service is starting up and the driver has not yet been initialized, this behavior is justified.
However, when the service is shutting down, the driver is being finalized and the interface is being released, a migration job or another job may call the qemuDomainSaveStatus and save the domain status at the same time. For the latter, this behavior
I had another look and `virQEMUDriverPrivateDataCallbacks` is a global variable in the qemu driver which is just populated and never changed.
This struct is then copied (by assignment) into the virDomainXMLOption instance that is used inside the qemu driver. Thus while this does have a private copy of the function pointers. The virDomainXMLOption instance lives the whole time the virt drivers live, so it's only ever cleared in virStateCleanup, which in the daemons happens right before exit.
Now looking at the formatter code (let's use your patch as an example):
--- src/conf/domain_conf.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 281846dfbe..74af08e584 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -29542,9 +29542,24 @@ virDomainObjFormat(virDomainObj *obj, obj->deprecations[i]); } =20 - if (xmlopt->privateData.format && - xmlopt->privateData.format(&buf, obj) < 0) - return NULL; + if (xmlopt->privateData.format) { + if (xmlopt->privateData.format(&buf, obj) < 0)
Here you see that the code assumes that 'xmlopt' is *not NULL*. Inside of XMLopt you have only pointers that don't change.
As of such the 'format' callback being NULL is only if something overwrote the data inside 'xmlopt', which we don't do anywhere because it doesn't make sense.
The other possibility is that 'xmlopt' is used after free, thus was randomly overwritten, but that would be a much different kind of bug and your fix wouldn't fix that at all.
causes the XML to be saved without private information (such as monitor path and qemuCaps), which is required for the Libvirtd service to manage a VM during startup. As a result, after restarting the Libvirtd service, a running VM that is being migrated previously might escape management.
Thus, when formatting the status XML for a running virtual machine, we need to presume that the "privateData.format" interface is present.
Add a thorough check in the virDomainObjSave path to make sure that private data in the status XML file always exists for the running VM so that we won't lose it after restart the Libvirtd service.
Now this outlines the implications of losing the private data but doesn't really give any information how this happened.
Can you please clarify: 1) What did you do to trigger this bug? What was the state of the host? 2) How did you figure out that what you claim happened? 3) Can you reproduce the bug?
If you can reproduce the bug please make sure to collect debug logs of libvirtd, and potentially consider running the daemon inside e.g. valgrind to see if it reproduces with such heavy instrumentation.
1. Add a debuginfo print patch on Libvirtd, if privateData.format does not exist, log it:
[...]
@@ -630,9 +631,14 @@ virDomainObjListLoadAllConfigs(virDomainObjListPtr doms, virDomainObjEndAPI(&dom); } else { VIR_ERROR(_("Failed to load config for domain '%s'"), entry->d_name); + if (liveStatus && liveStatusErrorFatal) { + ret = -1; + goto end; + } } }
+end: VIR_DIR_CLOSE(dir); virObjectRWUnlock(doms);
[..]
3. Launch the migration and use "systemctl restart libvirt" to restart Libvirtd once after migration enters the perform phase.
[...] Okay so my understanding from your description is that an (early startup) failure in virDomainObjListLoadAllConfigs() (and surrounding code) can result in the daemon shutting down before the threads handling the already loaded(? ... impossible to tell with the abbreviated log below) domains terminate? Right? Thus the other threads trigger a use-after-free on the driver object? Anyways I think it's clear now that just checking if the callbacks are present doesn't make sense. Additionally there's now an upstream issue https://gitlab.com/libvirt/libvirt/-/issues/814 which seems to claim a use-after-free on a different code path but still triggered by the cleanup code freeing private data. Unfortunately I didn't get any logs or backtrace there either. I'll look into the shutdown code path and see if I can figure it out.
4. Search the log message:
$ cat /var/log/zbs/libvirtd.log |egrep "PrivateData formatter driver does not exist|remoteDispatchDomainMigratePerform3Params" 2025-09-22 03:06:12.517+0000: 1124258: debug : virThreadJobSet:94 : Thread 1124258 (rpc-worker) is now running job remoteDispatchDomainMigratePerform3Params 2025-09-22 03:06:12.517+0000: 1124258: debug : remoteDispatchDomainMigratePerform3ParamsHelper:8804 : server=0x556317979660 client=0x55631799eff0 msg=0x55631799c010 rerr=0x7f08c688b9c0 args=0x7f08a800a820 ret=0x7f08a80053b0 2025-09-22 03:06:21.959+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.153+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.153+0000: 1124258: debug : virThreadJobClear:119 : Thread 1124258 (rpc-worker) finished job remoteDispatchDomainMigratePerform3Params with ret=-1
This log is so abbreviated that it's useless. Please post the full thing somewhere. Additionally if you can reproduce this without the patch I'd be interested in that log as well.
Migration rpc-worker routine print the message
5. Here we see the status XML:
<domstatus state='running' reason='migrated' pid='1092202'> <taint flag='custom-argv'/> <taint flag='custom-ga-command'/> <domain type='kvm' id='1' xmlns:qemu=' http://libvirt.org/schemas/domain/qemu/1.0'> <name>88030027-a5ce-4c19-aacc-5f1decf0b647</name> <uuid>6c677d6f-d140-48ca-b4ae-aff7fa7e4c50</uuid> <description>SmartX</description> <maxMemory slots='255' unit='KiB'>4194304000</maxMemory> ... It does not contain private data.
6. Finally, in our case, Libvirtd service can not startup; And in the upstream case, the behavior is that VM escapes from managing after startup logically.
-- Best regards

On Mon, Sep 22, 2025 at 2:59 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Fri, Sep 19, 2025 at 8:23 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Fri, Sep 19, 2025 at 17:09:07 +0800, yong.huang@smartx.com wrote:
From: Hyman Huang <yong.huang@smartx.com>
When saving the domain status in the path of the virDomainObjSave function, Libvirtd will omit the private data if the format interface is not provided.
When the Libvirtd service is starting up and the driver has not yet been initialized, this behavior is justified.
However, when the service is shutting down, the driver is being finalized and the interface is being released, a migration job or another job may call the qemuDomainSaveStatus and save the domain status at the same time. For the latter, this behavior
I had another look and `virQEMUDriverPrivateDataCallbacks` is a global variable in the qemu driver which is just populated and never changed.
This struct is then copied (by assignment) into the virDomainXMLOption instance that is used inside the qemu driver. Thus while this does have a private copy of the function pointers. The virDomainXMLOption instance lives the whole time the virt drivers live, so it's only ever cleared in virStateCleanup, which in the daemons happens right before exit.
Now looking at the formatter code (let's use your patch as an example):
--- src/conf/domain_conf.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 281846dfbe..74af08e584 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -29542,9 +29542,24 @@ virDomainObjFormat(virDomainObj *obj, obj->deprecations[i]); } =20 - if (xmlopt->privateData.format && - xmlopt->privateData.format(&buf, obj) < 0) - return NULL; + if (xmlopt->privateData.format) { + if (xmlopt->privateData.format(&buf, obj) < 0)
Here you see that the code assumes that 'xmlopt' is *not NULL*. Inside of XMLopt you have only pointers that don't change.
As of such the 'format' callback being NULL is only if something overwrote the data inside 'xmlopt', which we don't do anywhere because it doesn't make sense.
The other possibility is that 'xmlopt' is used after free, thus was randomly overwritten, but that would be a much different kind of bug and your fix wouldn't fix that at all.
causes the XML to be saved without private information (such as monitor path and qemuCaps), which is required for the Libvirtd service to manage a VM during startup. As a result, after restarting the Libvirtd service, a running VM that is being migrated previously might escape management.
Thus, when formatting the status XML for a running virtual machine, we need to presume that the "privateData.format" interface is
On Mon, Sep 22, 2025 at 11:30:46 +0800, Yong Huang wrote: present.
Add a thorough check in the virDomainObjSave path to make sure that private data in the status XML file always exists for the running VM so that we won't lose it after restart the Libvirtd service.
Now this outlines the implications of losing the private data but doesn't really give any information how this happened.
Can you please clarify: 1) What did you do to trigger this bug? What was the state of the host? 2) How did you figure out that what you claim happened? 3) Can you reproduce the bug?
If you can reproduce the bug please make sure to collect debug logs of libvirtd, and potentially consider running the daemon inside e.g. valgrind to see if it reproduces with such heavy instrumentation.
1. Add a debuginfo print patch on Libvirtd, if privateData.format does not exist, log it:
[...]
@@ -630,9 +631,14 @@ virDomainObjListLoadAllConfigs(virDomainObjListPtr doms, virDomainObjEndAPI(&dom); } else { VIR_ERROR(_("Failed to load config for domain '%s'"), entry->d_name); + if (liveStatus && liveStatusErrorFatal) { + ret = -1; + goto end; + } } }
+end: VIR_DIR_CLOSE(dir); virObjectRWUnlock(doms);
[..]
3. Launch the migration and use "systemctl restart libvirt" to restart Libvirtd once after migration enters the perform phase.
[...]
Okay so my understanding from your description is that an (early startup) failure in virDomainObjListLoadAllConfigs() (and surrounding code) can result in the daemon shutting down before the threads handling the already loaded(? ... impossible to tell with the abbreviated log below) domains terminate? Right?
Yes, in our productized Libvirt, an early failure in virDomainObjListLoadAllConfigs() can result in the daemon shutting down. In the upstream Libvirt, the daemon started up successfully but failed to manage the VM (The virDomainObjListLoadAllConfigs returns an error since the missing private data in status XML).
Thus the other threads trigger a use-after-free on the driver object?
Anyways I think it's clear now that just checking if the callbacks are present doesn't make sense.
Additionally there's now an upstream issue https://gitlab.com/libvirt/libvirt/-/issues/814 which seems to claim a use-after-free on a different code path but still triggered by the cleanup code freeing private data.
Unfortunately I didn't get any logs or backtrace there either.
I'll look into the shutdown code path and see if I can figure it out.
4. Search the log message:
$ cat /var/log/zbs/libvirtd.log |egrep "PrivateData formatter driver does not exist|remoteDispatchDomainMigratePerform3Params" 2025-09-22 03:06:12.517+0000: 1124258: debug : virThreadJobSet:94 :
Thread
1124258 (rpc-worker) is now running job remoteDispatchDomainMigratePerform3Params
This log indicate that 1124258 thread now execute the remoteDispatchDomainMigratePerform3Params
2025-09-22 03:06:12.517+0000: 1124258: debug :
remoteDispatchDomainMigratePerform3ParamsHelper:8804 : server=0x556317979660 client=0x55631799eff0 msg=0x55631799c010 rerr=0x7f08c688b9c0 args=0x7f08a800a820 ret=0x7f08a80053b0 2025-09-22 03:06:21.959+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist
In the execution path of remoteDispatchDomainMigratePerform3Params, it enters the code and the warning message is logged, while the following warning message is never logged in a successful migration: + if (!xmlopt->privateData.format) { + VIR_WARN("PrivateData formatter driver does not exist"); + } The following info shows the backtrace of virDomainObjFormat in an successful migration: #0 virDomainObjFormat (obj=obj@entry=0x7fa3342598e0, xmlopt=0x7fa3341c54b0, flags=flags@entry=313) at ../../src/conf/domain_conf.c:30166 #1 0x00007fa395ae8684 in virDomainObjSave (obj=obj@entry=0x7fa3342598e0, xmlopt=<optimized out>, statusDir=0x7fa33412aec0 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:30375 #2 0x00007fa37dde816c in qemuDomainObjSaveStatus (driver=driver@entry=0x7fa334115690, obj=obj@entry=0x7fa3342598e0) at ../../src/qemu/qemu_domain.c:9669 #3 0x00007fa37ddefb0d in qemuDomainObjSetJobPhase (driver=driver@entry=0x7fa334115690, obj=obj@entry=0x7fa3342598e0, phase=phase@entry=3) at ../../src/qemu/qemu_domain.c:9727 #4 0x00007fa37de37900 in qemuMigrationJobSetPhase (driver=driver@entry=0x7fa334115690, vm=vm@entry=0x7fa3342598e0, phase=phase@entry=QEMU_MIGRATION_PHASE_PERFORM3) at ../../src/qemu/qemu_migration.c:5747 #5 0x00007fa37de41470 in qemuMigrationSrcPerformPeer2Peer3 (flags=<optimized out>, useParams=<optimized out>, bandwidth=0, migParams=0x7fa374011b80, nbdPort=0, migrate_disks_detect_zeroes=0x0, migrate_disks=0x0, nmigrate_disks=0, listenAddress=<optimized out>, graphicsuri=0x0, uri=0x7fa37401ccf0 "tcp://10.123.82.12:49152", dname=0x0, persist_xml=0x0, xmlin=<optimized out>, vm=0x7fa3342598e0, dconnuri=0x7fa374014350 "qemu+tls:// 10.123.82.12/system?no_verify=1", dconn=0x7fa3740148b0, sconn=0x7fa37400eef0, driver=0x7fa334115690) at ../../src/qemu/qemu_migration.c:4523 #6 qemuMigrationSrcPerformPeer2Peer (v3proto=<synthetic pointer>, resource=0, dname=0x0, flags=<optimized out>, migParams=0x7fa374011b80, nbdPort=0, migrate_disks_detect_zeroes=0x0, migrate_disks=0x0, nmigrate_disks=0, listenAddress=<optimized out>, graphicsuri=0x0, uri=<optimized out>, dconnuri=0x7fa374014350 "qemu+tls:// 10.123.82.12/system?no_verify=1", persist_xml=0x0, xmlin=<optimized out>, vm=0x7fa3342598e0, sconn=0x7fa37400eef0, driver=0x7fa334115690) at ../../src/qemu/qemu_migration.c:4835 #7 qemuMigrationSrcPerformJob (driver=driver@entry=0x7fa334115690, conn=conn@entry=0x7fa37400eef0, vm=vm@entry=0x7fa3342598e0, xmlin=xmlin@entry=0x7fa374016570 "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<domain type=\"kvm\" xmlns:qemu=\" http://libvirt.org/schemas/domain/qemu/1.0\"><name>88030027-a5ce-4c19-aacc-5f1decf0b647</name><uuid>6c677d6f-d140-48ca-b4ae-aff7fa7"..., persist_xml=persist_xml@entry=0x0, dconnuri=0x7fa374014350 "qemu+tls:// 10.123.82.12/system?no_verify=1", uri=0x7fa374003a30 "tcp://10.123.82.12", graphicsuri=0x0, listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, migrate_disks_detect_zeroes=0x0, nbdPort=0, migParams=0x7fa374011b80, cookiein=0x0, cookieinlen=0, cookieout=0x7fa3871578c8, cookieoutlen=0x7fa3871578bc, flags=143899, dname=0x0, resource=0, v3proto=<optimized out>) at ../../src/qemu/qemu_migration.c:4911 #8 0x00007fa37de4215b in qemuMigrationSrcPerform (driver=driver@entry=0x7fa334115690, conn=0x7fa37400eef0, vm=0x7fa3342598e0, xmlin=0x7fa374016570 "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<domain type=\"kvm\" xmlns:qemu=\" http://libvirt.org/schemas/domain/qemu/1.0\"><name>88030027-a5ce-4c19-aacc-5f1decf0b647</name><uuid>6c677d6f-d140-48ca-b4ae-aff7fa7"..., persist_xml=0x0, dconnuri=dconnuri@entry=0x7fa374014350 "qemu+tls:// 10.123.82.12/system?no_verify=1", uri=0x7fa374003a30 "tcp://10.123.82.12", graphicsuri=0x0, listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, migrate_disks_detect_zeroes=0x0, nbdPort=0, migParams=0x7fa374011b80, cookiein=0x0, cookieinlen=0, cookieout=0x7fa3871578c8, cookieoutlen=0x7fa3871578bc, flags=143899, dname=0x0, resource=0, v3proto=true) at ../../src/qemu/qemu_migration.c:5101 #9 0x00007fa37de73f98 in qemuDomainMigratePerform3Params (dom=0x7fa374013bc0, dconnuri=0x7fa374014350 "qemu+tls:// 10.123.82.12/system?no_verify=1", params=<optimized out>, nparams=<optimized out>, cookiein=0x0, cookieinlen=0, cookieout=0x7fa3871578c8, cookieoutlen=0x7fa3871578bc, flags=143899) at ../../src/qemu/qemu_driver.c:12742 #10 0x00007fa395bde0f8 in virDomainMigratePerform3Params (domain=domain@entry=0x7fa374013bc0, dconnuri=0x7fa374014350 "qemu+tls:// 10.123.82.12/system?no_verify=1", params=0x7fa374010510, nparams=5, cookiein=0x0, cookieinlen=0, cookieout=0x7fa3871578c8, cookieoutlen=0x7fa3871578bc, flags=143899) at ../../src/libvirt-domain.c:4989 #11 0x000055c9014bfb1e in remoteDispatchDomainMigratePerform3Params (server=0x55c9033c4660, msg=0x55c9033ef470, ret=0x7fa3740142c0, args=0x7fa374011ee0, rerr=0x7fa3871579c0, client=<optimized out>) at ../../src/remote/remote_daemon_dispatch.c:5736 #12 remoteDispatchDomainMigratePerform3ParamsHelper (server=0x55c9033c4660, client=<optimized out>, msg=0x55c9033ef470, rerr=0x7fa3871579c0, args=0x7fa374011ee0, ret=0x7fa3740142c0) at ./remote/remote_daemon_dispatch_stubs.h:8805 #13 0x00007fa395b791ad in virNetServerProgramDispatchCall (msg=0x55c9033ef470, client=0x55c9033ee120, server=0x55c9033c4660, prog=0x55c9033ceb90) at ../../src/rpc/virnetserverprogram.c:430 #14 virNetServerProgramDispatch (prog=0x55c9033ceb90, server=server@entry=0x55c9033c4660, client=0x55c9033ee120, msg=0x55c9033ef470) at ../../src/rpc/virnetserverprogram.c:302 #15 0x00007fa395b7e142 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x55c9033c4660) at ../../src/rpc/virnetserver.c:137 #16 virNetServerHandleJob (jobOpaque=0x55c9033dff40, opaque=0x55c9033c4660) at ../../src/rpc/virnetserver.c:154 #17 0x00007fa395a92330 in virThreadPoolWorker (opaque=<optimized out>) at ../../src/util/virthreadpool.c:163 #18 0x00007fa395a91967 in virThreadHelper (data=<optimized out>) at ../../src/util/virthread.c:233 #19 0x00007fa394ed1f1b in ?? () from /usr/lib64/libpthread.so.0 #20 0x00007fa394de92e0 in clone () from /usr/lib64/libc.so.6
2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.153+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist
2025-09-22 03:06:25.153+0000: 1124258: debug : virThreadJobClear:119 :
Thread 1124258 (rpc-worker) finished job remoteDispatchDomainMigratePerform3Params with ret=-1
This log is so abbreviated that it's useless. Please post the full thing somewhere.
:( Since we focus on the shutdown code of Libvirtd, getting the backtrace is not easy, so I added the debug patch.
Additionally if you can reproduce this without the patch I'd be interested in that log as well.
Yes, I reproduce this with Libvirt 6.2.0, the latest version in the
upstream uses the same logic and I assume that it also has this issue and reproducing is not that hard.
Migration rpc-worker routine print the message
5. Here we see the status XML:
<domstatus state='running' reason='migrated' pid='1092202'> <taint flag='custom-argv'/> <taint flag='custom-ga-command'/> <domain type='kvm' id='1' xmlns:qemu=' http://libvirt.org/schemas/domain/qemu/1.0'> <name>88030027-a5ce-4c19-aacc-5f1decf0b647</name> <uuid>6c677d6f-d140-48ca-b4ae-aff7fa7e4c50</uuid> <description>SmartX</description> <maxMemory slots='255' unit='KiB'>4194304000</maxMemory> ... It does not contain private data.
6. Finally, in our case, Libvirtd service can not startup; And in the upstream case, the behavior is that VM escapes from managing after startup logically.
-- Best regards
-- Best regards

On Mon, Sep 22, 2025 at 16:15:47 +0800, Yong Huang wrote:
On Mon, Sep 22, 2025 at 2:59 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Mon, Sep 22, 2025 at 11:30:46 +0800, Yong Huang wrote:
On Fri, Sep 19, 2025 at 8:23 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Fri, Sep 19, 2025 at 17:09:07 +0800, yong.huang@smartx.com wrote:
From: Hyman Huang <yong.huang@smartx.com>
[...]
3. Launch the migration and use "systemctl restart libvirt" to restart Libvirtd once after migration enters the perform phase.
[...]
Okay so my understanding from your description is that an (early startup) failure in virDomainObjListLoadAllConfigs() (and surrounding code) can result in the daemon shutting down before the threads handling the already loaded(? ... impossible to tell with the abbreviated log below) domains terminate? Right?
Yes, in our productized Libvirt, an early failure in virDomainObjListLoadAllConfigs()
can result in the daemon shutting down.
In the upstream Libvirt, the daemon started up successfully but failed to manage the VM
(The virDomainObjListLoadAllConfigs returns an error since the missing private data
in status XML).
So I assume a non-upstream version. What is it based on? What else did you change?
Thus the other threads trigger a use-after-free on the driver object?
Anyways I think it's clear now that just checking if the callbacks are present doesn't make sense.
Additionally there's now an upstream issue https://gitlab.com/libvirt/libvirt/-/issues/814 which seems to claim a use-after-free on a different code path but still triggered by the cleanup code freeing private data.
Unfortunately I didn't get any logs or backtrace there either.
I'll look into the shutdown code path and see if I can figure it out.
4. Search the log message:
$ cat /var/log/zbs/libvirtd.log |egrep "PrivateData formatter driver does not exist|remoteDispatchDomainMigratePerform3Params" 2025-09-22 03:06:12.517+0000: 1124258: debug : virThreadJobSet:94 :
Thread
1124258 (rpc-worker) is now running job remoteDispatchDomainMigratePerform3Params
This log indicate that 1124258 thread now execute the remoteDispatchDomainMigratePerform3Params
2025-09-22 03:06:12.517+0000: 1124258: debug :
remoteDispatchDomainMigratePerform3ParamsHelper:8804 : server=0x556317979660 client=0x55631799eff0 msg=0x55631799c010 rerr=0x7f08c688b9c0 args=0x7f08a800a820 ret=0x7f08a80053b0 2025-09-22 03:06:21.959+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist
In the execution path of remoteDispatchDomainMigratePerform3Params, it enters the code and the
warning message is logged, while the following warning message is never logged in a successful migration:
+ if (!xmlopt->privateData.format) { + VIR_WARN("PrivateData formatter driver does not exist"); + }
The following info shows the backtrace of virDomainObjFormat in an successful migration:
Successful, meaning you didn't hit the bug?
#0 virDomainObjFormat (obj=obj@entry=0x7fa3342598e0, xmlopt=0x7fa3341c54b0, flags=flags@entry=313) at ../../src/conf/domain_conf.c:30166 #1 0x00007fa395ae8684 in virDomainObjSave (obj=obj@entry=0x7fa3342598e0, xmlopt=<optimized out>, statusDir=0x7fa33412aec0 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:30375
[...] I asked for a backtrace of all threads as I want to see what the other threads are doing during the shutdown.
2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.153+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist
2025-09-22 03:06:25.153+0000: 1124258: debug : virThreadJobClear:119 :
Thread 1124258 (rpc-worker) finished job remoteDispatchDomainMigratePerform3Params with ret=-1
This log is so abbreviated that it's useless. Please post the full thing somewhere.
:( Since we focus on the shutdown code of Libvirtd, getting the backtrace is not easy, so I added the debug patch.
Additionally if you can reproduce this without the patch I'd be interested in that log as well.
Yes, I reproduce this with Libvirt 6.2.0, the latest version in the
upstream uses the same logic and I assume that it also has this issue and reproducing is not that hard.
So, can you hit this problem with current upstream code? While the logic for formatting data is the same, it's not actually the problem. The problem is in the shutdown logic and I do remember some changes in the code. Especially if you're claiming to use libvirt-6.2 which is 5 years old at this point.

On Mon, Sep 22, 2025 at 4:28 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Mon, Sep 22, 2025 at 16:15:47 +0800, Yong Huang wrote:
On Mon, Sep 22, 2025 at 2:59 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Mon, Sep 22, 2025 at 11:30:46 +0800, Yong Huang wrote:
On Fri, Sep 19, 2025 at 8:23 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Fri, Sep 19, 2025 at 17:09:07 +0800, yong.huang@smartx.com wrote:
From: Hyman Huang <yong.huang@smartx.com>
[...]
3. Launch the migration and use "systemctl restart libvirt" to restart Libvirtd once after migration enters the perform phase.
[...]
Okay so my understanding from your description is that an (early startup) failure in virDomainObjListLoadAllConfigs() (and surrounding code) can result in the daemon shutting down before the threads handling the already loaded(? ... impossible to tell with the abbreviated log below) domains terminate? Right?
Yes, in our productized Libvirt, an early failure in virDomainObjListLoadAllConfigs()
can result in the daemon shutting down.
In the upstream Libvirt, the daemon started up successfully but failed to manage the VM
(The virDomainObjListLoadAllConfigs returns an error since the missing private data
in status XML).
So I assume a non-upstream version. What is it based on? What else did you change?
Thus the other threads trigger a use-after-free on the driver object?
Anyways I think it's clear now that just checking if the callbacks are present doesn't make sense.
Additionally there's now an upstream issue https://gitlab.com/libvirt/libvirt/-/issues/814 which seems to claim a use-after-free on a different code path but still triggered by the cleanup code freeing private data.
Unfortunately I didn't get any logs or backtrace there either.
I'll look into the shutdown code path and see if I can figure it out.
4. Search the log message:
$ cat /var/log/zbs/libvirtd.log |egrep "PrivateData formatter driver
does
not exist|remoteDispatchDomainMigratePerform3Params" 2025-09-22 03:06:12.517+0000: 1124258: debug : virThreadJobSet:94 : Thread 1124258 (rpc-worker) is now running job remoteDispatchDomainMigratePerform3Params
This log indicate that 1124258 thread now execute the remoteDispatchDomainMigratePerform3Params
2025-09-22 03:06:12.517+0000: 1124258: debug :
remoteDispatchDomainMigratePerform3ParamsHelper:8804 : server=0x556317979660 client=0x55631799eff0 msg=0x55631799c010 rerr=0x7f08c688b9c0 args=0x7f08a800a820 ret=0x7f08a80053b0 2025-09-22 03:06:21.959+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist
In the execution path of remoteDispatchDomainMigratePerform3Params, it enters the code and the
warning message is logged, while the following warning message is never logged in a successful migration:
+ if (!xmlopt->privateData.format) { + VIR_WARN("PrivateData formatter driver does not exist"); + }
The following info shows the backtrace of virDomainObjFormat in an successful migration:
Successful, meaning you didn't hit the bug?
#0 virDomainObjFormat (obj=obj@entry=0x7fa3342598e0, xmlopt=0x7fa3341c54b0, flags=flags@entry=313) at ../../src/conf/domain_conf.c:30166 #1 0x00007fa395ae8684 in virDomainObjSave (obj=obj@entry =0x7fa3342598e0, xmlopt=<optimized out>, statusDir=0x7fa33412aec0 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:30375
[...]
I asked for a backtrace of all threads as I want to see what the other threads are doing during the shutdown.
2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist 2025-09-22 03:06:25.153+0000: 1124258: warning : virDomainObjFormat:30190 : PrivateData formatter driver does not exist
2025-09-22 03:06:25.153+0000: 1124258: debug : virThreadJobClear:119 :
Thread 1124258 (rpc-worker) finished job remoteDispatchDomainMigratePerform3Params with ret=-1
This log is so abbreviated that it's useless. Please post the full thing somewhere.
:( Since we focus on the shutdown code of Libvirtd, getting the backtrace is not easy, so I added the debug patch.
Additionally if you can reproduce this without the patch I'd be interested in that log as well.
Yes, I reproduce this with Libvirt 6.2.0, the latest version in the
upstream uses the same logic and I assume that it also has this issue and reproducing is not that hard.
So, can you hit this problem with current upstream code?
While the logic for formatting data is the same, it's not actually the problem. The problem is in the shutdown logic and I do remember some changes in the code. Especially if you're claiming to use libvirt-6.2 which is 5 years old at this point.
Ok, in case of focusing on a nonexistent issue in the upstream code, I'll try to reproduce this and reply to you once I get the result. Thanks for the reply. -- Best regards

On Mon, Sep 22, 2025 at 11:30:46 +0800, Yong Huang wrote:
On Fri, Sep 19, 2025 at 8:23 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Fri, Sep 19, 2025 at 17:09:07 +0800, yong.huang@smartx.com wrote:
From: Hyman Huang <yong.huang@smartx.com>
When saving the domain status in the path of the virDomainObjSave function, Libvirtd will omit the private data if the format interface is not provided.
When the Libvirtd service is starting up and the driver has not yet been initialized, this behavior is justified.
However, when the service is shutting down, the driver is being finalized and the interface is being released, a migration job or another job may call the qemuDomainSaveStatus and save the domain status at the same time. For the latter, this behavior
I had another look and `virQEMUDriverPrivateDataCallbacks` is a global variable in the qemu driver which is just populated and never changed.
This struct is then copied (by assignment) into the virDomainXMLOption instance that is used inside the qemu driver. Thus while this does have a private copy of the function pointers. The virDomainXMLOption instance lives the whole time the virt drivers live, so it's only ever cleared in virStateCleanup, which in the daemons happens right before exit.
Now looking at the formatter code (let's use your patch as an example):
--- src/conf/domain_conf.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 281846dfbe..74af08e584 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -29542,9 +29542,24 @@ virDomainObjFormat(virDomainObj *obj, obj->deprecations[i]); } =20 - if (xmlopt->privateData.format && - xmlopt->privateData.format(&buf, obj) < 0) - return NULL; + if (xmlopt->privateData.format) { + if (xmlopt->privateData.format(&buf, obj) < 0)
Here you see that the code assumes that 'xmlopt' is *not NULL*. Inside of XMLopt you have only pointers that don't change.
As of such the 'format' callback being NULL is only if something overwrote the data inside 'xmlopt', which we don't do anywhere because it doesn't make sense.
The other possibility is that 'xmlopt' is used after free, thus was randomly overwritten, but that would be a much different kind of bug and your fix wouldn't fix that at all.
causes the XML to be saved without private information (such as monitor path and qemuCaps), which is required for the Libvirtd service to manage a VM during startup. As a result, after restarting the Libvirtd service, a running VM that is being migrated previously might escape management.
Thus, when formatting the status XML for a running virtual machine, we need to presume that the "privateData.format" interface is present.
Add a thorough check in the virDomainObjSave path to make sure that private data in the status XML file always exists for the running VM so that we won't lose it after restart the Libvirtd service.
Now this outlines the implications of losing the private data but doesn't really give any information how this happened.
Can you please clarify: 1) What did you do to trigger this bug? What was the state of the host? 2) How did you figure out that what you claim happened? 3) Can you reproduce the bug?
If you can reproduce the bug please make sure to collect debug logs of libvirtd, and potentially consider running the daemon inside e.g. valgrind to see if it reproduces with such heavy instrumentation.
1. Add a debuginfo print patch on Libvirtd, if privateData.format does not exist, log it:
--- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -30186,6 +30186,11 @@ virDomainObjFormat(virDomainObjPtr obj, xmlopt->privateData.format(&buf, obj) < 0) return NULL;
+ if (!xmlopt->privateData.format) { + VIR_WARN("PrivateData formatter driver does not exist"); + } +
2. Add a private patch(Not the complete code):
Stop process if malformed config found
Malformed material may prevent the virtual machine from loading in its live state, according to the configuration XML. If such a scenario was found by the Libvirt service during startup, report an error and terminate.
Okay I had a deeper look and I now understand what you wanted to do by this code. It's also completely irelevant to the use-after-free bug. Note that we do not want this behaviour, because it prevents the whole deamon from starting. We need to fix the root cause rather than cover up. Losing a VM is obviously very bad, but fixing the root cause for it is what we should focus on. Please provide the unabreviated debug log and/or a *full* backtrace of all threads if you can attach a debugger to the daemon and setup a breakpoint at the point where the formatting of XML would use the freed data.
participants (3)
-
Peter Krempa
-
Yong Huang
-
yong.huang@smartx.com