On Mon, Sep 22, 2025 at 11:30:46 +0800, Yong Huang wrote:
> On Fri, Sep 19, 2025 at 8:23 PM Peter Krempa <pkrempa@redhat.com> wrote:
>
> > On Fri, Sep 19, 2025 at 17:09:07 +0800, yong.huang@smartx.com wrote:
> > > From: Hyman Huang <yong.huang@smartx.com>
> > >
> > > When saving the domain status in the path of the virDomainObjSave
> > > function, Libvirtd will omit the private data if the format
> > > interface is not provided.
> > >
> > > When the Libvirtd service is starting up and the driver has not
> > > yet been initialized, this behavior is justified.
> > >
> > > However, when the service is shutting down, the driver is being
> > > finalized and the interface is being released, a migration job
> > > or another job may call the qemuDomainSaveStatus and save the
> > > domain status at the same time. For the latter, this behavior
> >
> > I had another look and `virQEMUDriverPrivateDataCallbacks` is a global
> > variable in the qemu driver which is just populated and never changed.
> >
> > This struct is then copied (by assignment) into the virDomainXMLOption
> > instance that is used inside the qemu driver. Thus while this does have
> > a private copy of the function pointers. The virDomainXMLOption
> > instance lives the whole time the virt drivers live, so it's only ever
> > cleared in virStateCleanup, which in the daemons happens right before
> > exit.
> >
> > Now looking at the formatter code (let's use your patch as an example):
> >
> > ---
> > src/conf/domain_conf.c | 21 ++++++++++++++++++---
> > 1 file changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> > index 281846dfbe..74af08e584 100644
> > --- a/src/conf/domain_conf.c
> > +++ b/src/conf/domain_conf.c
> > @@ -29542,9 +29542,24 @@ virDomainObjFormat(virDomainObj *obj,
> > obj->deprecations[i]);
> > }
> > =20
> > - if (xmlopt->privateData.format &&
> > - xmlopt->privateData.format(&buf, obj) < 0)
> > - return NULL;
> > + if (xmlopt->privateData.format) {
> > + if (xmlopt->privateData.format(&buf, obj) < 0)
> >
> > Here you see that the code assumes that 'xmlopt' is *not NULL*. Inside
> > of XMLopt you have only pointers that don't change.
> >
> > As of such the 'format' callback being NULL is only if something
> > overwrote the data inside 'xmlopt', which we don't do anywhere because
> > it doesn't make sense.
> >
> > The other possibility is that 'xmlopt' is used after free, thus was
> > randomly overwritten, but that would be a much different kind of bug and
> > your fix wouldn't fix that at all.
> >
> >
> > > causes the XML to be saved without private information (such as
> > > monitor path and qemuCaps), which is required for the Libvirtd
> > > service to manage a VM during startup. As a result, after restarting
> > > the Libvirtd service, a running VM that is being migrated previously
> > > might escape management.
> > >
> > > Thus, when formatting the status XML for a running virtual machine,
> > > we need to presume that the "privateData.format" interface is present.
> > >
> > > Add a thorough check in the virDomainObjSave path to make sure that
> > > private data in the status XML file always exists for the running VM
> > > so that we won't lose it after restart the Libvirtd service.
> >
> > Now this outlines the implications of losing the private data but
> > doesn't really give any information how this happened.
> >
> > Can you please clarify:
> > 1) What did you do to trigger this bug? What was the state of the host?
> > 2) How did you figure out that what you claim happened?
> > 3) Can you reproduce the bug?
> >
> > If you can reproduce the bug please make sure to collect debug logs of
> > libvirtd, and potentially consider running the daemon inside e.g.
> > valgrind to see if it reproduces with such heavy instrumentation.
> >
> >
> 1. Add a debuginfo print patch on Libvirtd, if privateData.format does not
> exist, log it:
[...]
> @@ -630,9 +631,14 @@ virDomainObjListLoadAllConfigs(virDomainObjListPtr
> doms,
> virDomainObjEndAPI(&dom);
> } else {
> VIR_ERROR(_("Failed to load config for domain '%s'"),
> entry->d_name);
> + if (liveStatus && liveStatusErrorFatal) {
> + ret = -1;
> + goto end;
> + }
> }
> }
>
> +end:
> VIR_DIR_CLOSE(dir);
> virObjectRWUnlock(doms);
[..]
>
> 3. Launch the migration and use "systemctl restart libvirt" to restart
> Libvirtd
> once after migration enters the perform phase.
[...]
Okay so my understanding from your description is that an (early
startup) failure in virDomainObjListLoadAllConfigs() (and surrounding
code) can result in the daemon shutting down before the threads handling
the already loaded(? ... impossible to tell with the abbreviated log
below) domains terminate? Right?
Yes, in our productized Libvirt, an early failure in virDomainObjListLoadAllConfigs()
can result in the daemon shutting down.
In the upstream Libvirt, the daemon started up successfully but failed to manage the VM
(The virDomainObjListLoadAllConfigs returns an error since the missing private data
in status XML).
Thus the other threads trigger a use-after-free on the driver object?
Anyways I think it's clear now that just checking if the callbacks are
present doesn't make sense.
Additionally there's now an upstream issue https://gitlab.com/libvirt/libvirt/-/issues/814
which seems to claim a use-after-free on a different code path but still
triggered by the cleanup code freeing private data.
Unfortunately I didn't get any logs or backtrace there either.
I'll look into the shutdown code path and see if I can figure it out.
>
>
> 4. Search the log message:
>
> $ cat /var/log/zbs/libvirtd.log |egrep "PrivateData formatter driver does
> not exist|remoteDispatchDomainMigratePerform3Params"
> 2025-09-22 03:06:12.517+0000: 1124258: debug : virThreadJobSet:94 : Thread
> 1124258 (rpc-worker) is now running job
> remoteDispatchDomainMigratePerform3Params
> 2025-09-22 03:06:12.517+0000: 1124258: debug :
> remoteDispatchDomainMigratePerform3ParamsHelper:8804 :
> server=0x556317979660 client=0x55631799eff0 msg=0x55631799c010
> rerr=0x7f08c688b9c0 args=0x7f08a800a820 ret=0x7f08a80053b0
> 2025-09-22 03:06:21.959+0000: 1124258: warning : virDomainObjFormat:30190 :
> PrivateData formatter driver does not exist
In the execution path of remoteDispatchDomainMigratePerform3Params, it enters the code and the
warning message is logged, while the following warning message is never logged in a successful migration:
> 2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 :
> PrivateData formatter driver does not exist
> 2025-09-22 03:06:25.141+0000: 1124258: warning : virDomainObjFormat:30190 :
> PrivateData formatter driver does not exist
> 2025-09-22 03:06:25.153+0000: 1124258: warning : virDomainObjFormat:30190 :
> PrivateData formatter driver does not exist
> 2025-09-22 03:06:25.153+0000: 1124258: debug : virThreadJobClear:119 :
> Thread 1124258 (rpc-worker) finished job
> remoteDispatchDomainMigratePerform3Params with ret=-1
This log is so abbreviated that it's useless. Please post the full thing
somewhere.
Additionally if you can reproduce this without the patch I'd be
interested in that log as well.
>
> Migration rpc-worker routine print the message
>
> 5. Here we see the status XML:
>
> <domstatus state='running' reason='migrated' pid='1092202'>
> <taint flag='custom-argv'/>
> <taint flag='custom-ga-command'/>
> <domain type='kvm' id='1' xmlns:qemu='
> http://libvirt.org/schemas/domain/qemu/1.0'>
> <name>88030027-a5ce-4c19-aacc-5f1decf0b647</name>
> <uuid>6c677d6f-d140-48ca-b4ae-aff7fa7e4c50</uuid>
> <description>SmartX</description>
> <maxMemory slots='255' unit='KiB'>4194304000</maxMemory>
> ...
> It does not contain private data.
>
> 6. Finally, in our case, Libvirtd service can not startup; And in the
> upstream case, the behavior
> is that VM escapes from managing after startup logically.
>
>
> --
> Best regards