RE: join running core dump job

23 Mar 2024

      ...
-----Original Message-----
From: Thanos Makatos <thanos.makatos@nutanix.com>
Sent: Friday, March 8, 2024 2:21 PM
To: devel@lists.libvirt.org
Cc: Michal Privoznik <mprivozn@redhat.com>
Subject: RE: join running core dump job
...
-----Original Message-----
From: Thanos Makatos <thanos.makatos@nutanix.com>
Sent: Monday, March 4, 2024 9:45 PM
To: Thanos Makatos <thanos.makatos@nutanix.com>;
devel@lists.libvirt.org
Subject: RE: join running core dump job
...
-----Original Message-----
From: Thanos Makatos <thanos.makatos@nutanix.com>
Sent: Monday, March 4, 2024 5:24 PM
To: devel@lists.libvirt.org
Subject: join running core dump job
Is there a way to programmatically wait for a previously initiated
virDomainCoreDumpWithFormat() where the process that started it died? I'm
looking at the API and don't seem to find anything relevant.  I suppose I
could poll via virDomainGetJobStats(), but, ideally, I'd like a function
that would join the dump job and return when the dump job finishes.
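
For illustration only, the polling fallback mentioned above could look roughly
like the sketch below. It is self-contained and does not link against libvirt:
the job_state enum, the job_probe_fn callback and fake_probe() are hypothetical
stand-ins, and a real client would implement the probe by calling
virDomainGetJobStats() and inspecting the returned job type.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical job states, loosely modelled on the job "type" that
 * virDomainGetJobStats() reports; these are NOT the libvirt enum values. */
enum job_state { JOB_NONE, JOB_ACTIVE, JOB_COMPLETED, JOB_FAILED };

/* Hypothetical probe: a real client would wrap virDomainGetJobStats(). */
typedef enum job_state (*job_probe_fn)(void *opaque);

/* Poll until the job is no longer active or max_polls probes were made.
 * Returns the last observed state. */
static enum job_state
poll_until_done(job_probe_fn probe, void *opaque,
                long interval_ms, int max_polls)
{
    enum job_state st = JOB_ACTIVE;
    struct timespec ts = { interval_ms / 1000,
                           (interval_ms % 1000) * 1000000L };

    assert(probe != NULL);
    for (int i = 0; i < max_polls; i++) {
        st = probe(opaque);
        if (st != JOB_ACTIVE)
            break;
        nanosleep(&ts, NULL);   /* back off before the next probe */
    }
    return st;
}

/* Fake probe for demonstration: reports JOB_ACTIVE until the counter
 * pointed to by opaque reaches zero, then JOB_COMPLETED. */
static enum job_state
fake_probe(void *opaque)
{
    int *remaining = opaque;

    if (*remaining > 0) {
        (*remaining)--;
        return JOB_ACTIVE;
    }
    return JOB_COMPLETED;
}
```

The drawback is the usual one for polling: the caller learns about completion
up to one interval late and gets no result code from the job itself, which is
why a blocking "join" call is preferable.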
I see there's qemuDumpWaitForCompletion(), looks promising.
I've made some progress (added a
virHypervisorDriver.domainCoreDumpWait and the relevant scaffolding to
make 'virsh dump --wait' work); calling qemuDumpWaitForCompletion() is all
that's needed.
However, it doesn't seem trivial to implement this in the test_driver.
First, IIUC testDomainCoreDumpWithFormat() gets an exclusive lock on the
domain (haven't tested anything yet), so calling domainCoreDumpWait()
would block for the wrong reason. Is making
testDomainCoreDumpWithFormat() use asynchronous jobs internally
unavoidable?
Second, I want to test behaviour in my application where (1) it calls
domainCoreDump(), (2) crashes before domainCoreDump() finishes, (3) then
my application starts again and looks for a pending dump job and (4) joins it
using domainCoreDumpWait(). I can't see an easy way of faking a dump job
in the test_driver when it starts. How about adding persistent tasks, which I
can pre-populate before starting my application, or fake jobs via an
environment variable, so that when the test_driver starts it can internally
continue them? E.g. we can specify how long to run the job for and have
domainCoreDumpWait() sleep for that long.
I ended up using an environment variable in the test_driver to fake jobs, so the application under test doesn't need to know anything. Would something like that be accepted?
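
As a rough sketch of the environment-variable idea (not the actual patch), the
driver could read the remaining duration of a fake dump job at start-up and
have the wait call sleep for that long. The variable name
TEST_DRIVER_FAKE_DUMP_MS and the helper fake_job_duration_ms() are hypothetical;
the thread does not say which name or format was used.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical variable name, chosen for this example only. */
#define FAKE_JOB_ENV "TEST_DRIVER_FAKE_DUMP_MS"

/* Parse the remaining duration (in milliseconds) of a fake dump job from
 * the environment.  Returns 0 and sets *ms on success, or -1 if the
 * variable is unset, empty, or not a non-negative decimal integer. */
static int
fake_job_duration_ms(long *ms)
{
    const char *val = getenv(FAKE_JOB_ENV);
    char *end = NULL;
    long parsed;

    assert(ms != NULL);
    if (!val || *val == '\0')
        return -1;

    errno = 0;
    parsed = strtol(val, &end, 10);
    if (errno != 0 || *end != '\0' || parsed < 0)
        return -1;

    *ms = parsed;
    return 0;
}
```

With this approach the application under test is started with the variable set
and sees a pending job without any test-specific code of its own.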
...
I'm open to suggestions.
I have managed to implement a PoC by introducing virDomainJobWait(), which checks the job type and, if it's a dump, calls qemuDumpWaitForCompletion(), purely because that's the easiest thing to do (it fails with VIR_ERR_OPERATION_INVALID for all other operations). We'd like to implement this for all other job types, and I'm looking for some pointers: is virCondWaitUntil(&priv->job.asyncCond, &obj->parent.lock) all that's required? (And return priv->job.error?)
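
To make the question concrete, the generic "join on a condition variable"
pattern being asked about can be sketched with plain pthreads as below. This is
not libvirt's job code: struct fake_job and its fields are hypothetical
stand-ins for the per-domain job state, and whether waiting on
priv->job.asyncCond alone is sufficient in the QEMU driver remains the open
question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-domain job state; field names are
 * illustrative and do not match libvirt's private structures. */
struct fake_job {
    pthread_mutex_t lock;
    pthread_cond_t  done_cond;
    bool            active;
    int             error;      /* result recorded when the job ends */
};

/* Block until the job is no longer active, then return its recorded error. */
static int
fake_job_join(struct fake_job *job)
{
    int ret;

    assert(job != NULL);
    pthread_mutex_lock(&job->lock);
    while (job->active)         /* loop guards against spurious wakeups */
        pthread_cond_wait(&job->done_cond, &job->lock);
    ret = job->error;
    pthread_mutex_unlock(&job->lock);
    return ret;
}

/* Called by the job owner on completion; wakes every joiner. */
static void
fake_job_finish(struct fake_job *job, int error)
{
    pthread_mutex_lock(&job->lock);
    job->active = false;
    job->error = error;
    pthread_cond_broadcast(&job->done_cond);
    pthread_mutex_unlock(&job->lock);
}

/* Thread entry point that ends the job with error -1 from another thread. */
static void *
fake_job_worker(void *opaque)
{
    fake_job_finish(opaque, -1);
    return NULL;
}
```

The points that matter are re-checking the predicate in a loop after each
wakeup and reading the error while still holding the lock.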

Also, as I explained earlier, I want to implement joining a potentially existing dump job. In my specific use case, I want to join an ongoing dump job if one exists, and otherwise start a new one. I currently do this by calling virDomainGetJobStats(), but I'm wondering whether we could add a new, optional dump flag, e.g. 'join', which would make virDomainCoreDumpWithFormat() do all this internally. Would something like that be accepted upstream?
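
The client-side "join if running, otherwise start" flow described above
amounts to something like the following sketch. All names here (struct
dump_ops, dump_join_or_start(), the fake_* helpers) are hypothetical; in a real
client the hooks would wrap virDomainGetJobStats(), virDomainCoreDumpWithFormat()
and whatever wait call ends up being added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks standing in for the real libvirt calls. */
struct dump_ops {
    bool (*job_running)(void *dom);  /* is a dump job in progress? */
    int  (*start)(void *dom);        /* start a new dump and wait for it */
    int  (*join)(void *dom);         /* wait for the existing dump */
};

/* Join an ongoing dump if there is one, otherwise start a new one. */
static int
dump_join_or_start(const struct dump_ops *ops, void *dom)
{
    assert(ops && ops->job_running && ops->start && ops->join);
    if (ops->job_running(dom))
        return ops->join(dom);
    return ops->start(dom);
}

/* Fake domain and hooks used only to exercise the logic. */
struct fake_dom { bool running; int started; int joined; };

static bool fake_running(void *d) { return ((struct fake_dom *)d)->running; }
static int  fake_start(void *d)   { ((struct fake_dom *)d)->started++; return 0; }
static int  fake_join(void *d)    { ((struct fake_dom *)d)->joined++; return 0; }
```

Done in the client, the check and the start are two separate calls, so the job
state can change in between; handling the flag inside
virDomainCoreDumpWithFormat() would presumably let the driver decide under its
own locking.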

Thanos Makatos
