
On Mon, Sep 22, 2025 at 11:30:46 +0800, Yong Huang wrote:
On Fri, Sep 19, 2025 at 8:23 PM Peter Krempa <pkrempa@redhat.com> wrote:
On Fri, Sep 19, 2025 at 17:09:07 +0800, yong.huang@smartx.com wrote:
From: Hyman Huang <yong.huang@smartx.com>
When saving the domain status in the path of the virDomainObjSave function, Libvirtd will omit the private data if the format interface is not provided.
When the Libvirtd service is starting up and the driver has not yet been initialized, this behavior is justified.
However, when the service is shutting down, the driver is being finalized and the interface is being released, a migration job or another job may call the qemuDomainSaveStatus and save the domain status at the same time. For the latter, this behavior
I had another look and `virQEMUDriverPrivateDataCallbacks` is a global variable in the qemu driver which is just populated and never changed.
This struct is then copied (by assignment) into the virDomainXMLOption instance that is used inside the qemu driver. Thus while this does have a private copy of the function pointers. The virDomainXMLOption instance lives the whole time the virt drivers live, so it's only ever cleared in virStateCleanup, which in the daemons happens right before exit.
Now looking at the formatter code (let's use your patch as an example):
--- src/conf/domain_conf.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 281846dfbe..74af08e584 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -29542,9 +29542,24 @@ virDomainObjFormat(virDomainObj *obj, obj->deprecations[i]); } =20 - if (xmlopt->privateData.format && - xmlopt->privateData.format(&buf, obj) < 0) - return NULL; + if (xmlopt->privateData.format) { + if (xmlopt->privateData.format(&buf, obj) < 0)
Here you see that the code assumes that 'xmlopt' is *not NULL*. Inside of XMLopt you have only pointers that don't change.
As of such the 'format' callback being NULL is only if something overwrote the data inside 'xmlopt', which we don't do anywhere because it doesn't make sense.
The other possibility is that 'xmlopt' is used after free, thus was randomly overwritten, but that would be a much different kind of bug and your fix wouldn't fix that at all.
causes the XML to be saved without private information (such as monitor path and qemuCaps), which is required for the Libvirtd service to manage a VM during startup. As a result, after restarting the Libvirtd service, a running VM that is being migrated previously might escape management.
Thus, when formatting the status XML for a running virtual machine, we need to presume that the "privateData.format" interface is present.
Add a thorough check in the virDomainObjSave path to make sure that private data in the status XML file always exists for the running VM so that we won't lose it after restart the Libvirtd service.
Now this outlines the implications of losing the private data but doesn't really give any information how this happened.
Can you please clarify: 1) What did you do to trigger this bug? What was the state of the host? 2) How did you figure out that what you claim happened? 3) Can you reproduce the bug?
If you can reproduce the bug please make sure to collect debug logs of libvirtd, and potentially consider running the daemon inside e.g. valgrind to see if it reproduces with such heavy instrumentation.
1. Add a debuginfo print patch on Libvirtd, if privateData.format does not exist, log it:
--- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -30186,6 +30186,11 @@ virDomainObjFormat(virDomainObjPtr obj, xmlopt->privateData.format(&buf, obj) < 0) return NULL;
+ if (!xmlopt->privateData.format) { + VIR_WARN("PrivateData formatter driver does not exist"); + } +
2. Add a private patch(Not the complete code):
Stop process if malformed config found
Malformed material may prevent the virtual machine from loading in its live state, according to the configuration XML. If such a scenario was found by the Libvirt service during startup, report an error and terminate.
Okay I had a deeper look and I now understand what you wanted to do by this code. It's also completely irelevant to the use-after-free bug. Note that we do not want this behaviour, because it prevents the whole deamon from starting. We need to fix the root cause rather than cover up. Losing a VM is obviously very bad, but fixing the root cause for it is what we should focus on. Please provide the unabreviated debug log and/or a *full* backtrace of all threads if you can attach a debugger to the daemon and setup a breakpoint at the point where the formatting of XML would use the freed data.