-----Original Message-----
From: Thanos Makatos <thanos.makatos(a)nutanix.com>
Sent: Friday, March 8, 2024 2:21 PM
To: devel(a)lists.libvirt.org
Cc: Michal Privoznik <mprivozn(a)redhat.com>
Subject: RE: join running core dump job
> -----Original Message-----
> From: Thanos Makatos <thanos.makatos(a)nutanix.com>
> Sent: Monday, March 4, 2024 9:45 PM
> To: Thanos Makatos <thanos.makatos(a)nutanix.com>; devel(a)lists.libvirt.org
> Subject: RE: join running core dump job
>
> > -----Original Message-----
> > From: Thanos Makatos <thanos.makatos(a)nutanix.com>
> > Sent: Monday, March 4, 2024 5:24 PM
> > To: devel(a)lists.libvirt.org
> > Subject: join running core dump job
> >
> > Is there a way to programmatically wait for a previously initiated
> > virDomainCoreDumpWithFormat() where the process that started it died?
> > I'm looking at the API and don't seem to find anything relevant. I
> > suppose I could poll via virDomainGetJobStats(), but, ideally, I'd like
> > a function that would join the dump job and return when the dump job
> > finishes.
>
> I see there's qemuDumpWaitForCompletion(), looks promising.
I've made some progress: I added a virHypervisorDriver.domainCoreDumpWait
callback and the relevant scaffolding to make 'virsh dump --wait' work.
Calling qemuDumpWaitForCompletion() is all that's needed.
However, it doesn't seem trivial to implement this in the test_driver.
First, IIUC testDomainCoreDumpWithFormat() gets an exclusive lock on the
domain (haven't tested anything yet), so calling domainCoreDumpWait()
would block for the wrong reason. Is it unavoidable to make
testDomainCoreDumpWithFormat() use asynchronous jobs internally?
Second, I want to test behaviour in my application where (1) it calls
domainCoreDump(), (2) crashes before domainCoreDump() finishes, (3) then
my application starts again and looks for a pending dump job and (4) joins it
using domainCoreDumpWait(). I can't see an easy way of faking a dump job
in the test_driver when it starts. How about adding persistent tasks, which I
can pre-populate before starting my application, or fake jobs via an
environment variable, so that when the test_driver starts it can internally
continue them? E.g. we could specify how long the job should run for, and
domainCoreDumpWait() would sleep for that long.
I ended up using an environment variable in the test_driver to fake jobs, so the
application under test doesn't need to know anything. Would something like that be
accepted?
I'm open to suggestions.
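To make the fake-job idea concrete, here is a minimal sketch of the behaviour,
modelled in Python rather than in the test_driver's C. Everything in it is an
assumption for illustration: the variable name TEST_DRIVER_FAKE_DUMP_SECONDS and
the FakeDumpJob class are hypothetical and do not exist in libvirt. The job is
created from the environment when the "driver" starts, and waiting on it simply
sleeps until the configured duration has elapsed.

```python
import os
import time

# Hypothetical variable name, chosen only for this sketch.
ENV_VAR = "TEST_DRIVER_FAKE_DUMP_SECONDS"


class FakeDumpJob:
    """Models a dump job that is already 'running' when the driver starts."""

    def __init__(self, environ=os.environ, clock=time.monotonic):
        self._clock = clock
        raw = environ.get(ENV_VAR)
        # No variable set means there is no pending job to join.
        if raw is None:
            self.deadline = None
        else:
            self.deadline = clock() + float(raw)

    def pending(self):
        """True while the fake job has not reached its deadline."""
        return self.deadline is not None and self._clock() < self.deadline

    def wait(self, sleep=time.sleep):
        """Block until the fake job ends; return False if there was no job."""
        if self.deadline is None:
            return False
        remaining = self.deadline - self._clock()
        if remaining > 0:
            sleep(remaining)
        self.deadline = None  # the job is finished once it has been joined
        return True
```

With this shape the application under test only sees a pending job that
eventually completes; it needs no knowledge of how the job was faked.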
I have managed to implement a PoC by introducing virDomainJobWait(), which checks the job
type and, if it's a dump, calls qemuDumpWaitForCompletion(), purely because that's
the easiest thing to do (it fails with VIR_ERR_OPERATION_INVALID for all other
operations). We'd like to implement this for all other job types and I'm looking
for some pointers: is virCondWaitUntil(&priv->job.asyncCond,
&obj->parent.lock) all that's required? (And return priv->job.error?)
Also, as I explained earlier, I want to implement joining a potentially existing dump job.
In my specific use case, I want to either join an ongoing dump job if it exists, otherwise
start a new one. I do this by calling virDomainGetJobStats(), but I'm wondering whether
we could add a new, optional dump flag, e.g. 'join', which would make
virDomainCoreDumpWithFormat() do all this internally. Would something like that be accepted
upstream?
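
For reference, the client-side workaround described above (poll the job stats,
then join or start) can be sketched roughly as follows. This is only an
illustration written against the libvirt-python bindings: it assumes that
jobStats() returns a dict carrying "type" and "operation" keys and that the two
constants below match libvirt's virDomainJobType and virDomainJobOperation
enums. The helper names dump_in_progress() and join_or_start_dump() are made up
for this sketch.

```python
import time

# Assumed to mirror libvirt's enums; with the real bindings one would use
# libvirt.VIR_DOMAIN_JOB_NONE and libvirt.VIR_DOMAIN_JOB_OPERATION_DUMP.
VIR_DOMAIN_JOB_NONE = 0
VIR_DOMAIN_JOB_OPERATION_DUMP = 8


def dump_in_progress(dom):
    """Return True if the domain currently reports an active dump job."""
    stats = dom.jobStats()
    return (stats.get("type", VIR_DOMAIN_JOB_NONE) != VIR_DOMAIN_JOB_NONE
            and stats.get("operation") == VIR_DOMAIN_JOB_OPERATION_DUMP)


def join_or_start_dump(dom, path, dumpformat, flags=0,
                       poll_interval=1.0, sleep=time.sleep):
    """Join a running dump by polling, or start a new one if none exists.

    Returns "joined" or "started" so the caller knows which path was taken.
    """
    if not dump_in_progress(dom):
        # No dump is running: start one. This call blocks until it finishes.
        dom.coreDumpWithFormat(path, dumpformat, flags)
        return "started"
    # A dump started by someone else is running: poll until it goes away.
    while dump_in_progress(dom):
        sleep(poll_interval)
    return "joined"
```

Two limitations of this approach are why an internal 'join' would be nicer:
the check and the start are not atomic, so another job can slip in between
them, and polling alone does not tell the caller whether the joined dump
succeeded.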