RE: join running core dump job

-----Original Message-----
From: Thanos Makatos <thanos.makatos@nutanix.com>
Sent: Friday, March 8, 2024 2:21 PM
To: devel@lists.libvirt.org
Cc: Michal Privoznik <mprivozn@redhat.com>
Subject: RE: join running core dump job

-----Original Message-----
From: Thanos Makatos <thanos.makatos@nutanix.com>
Sent: Monday, March 4, 2024 9:45 PM
To: Thanos Makatos <thanos.makatos@nutanix.com>; devel@lists.libvirt.org
Subject: RE: join running core dump job

-----Original Message-----
From: Thanos Makatos <thanos.makatos@nutanix.com>
Sent: Monday, March 4, 2024 5:24 PM
To: devel@lists.libvirt.org
Subject: join running core dump job
Is there a way to programmatically wait for a previously initiated virDomainCoreDumpWithFormat() when the process that started it has died? I'm looking at the API and can't find anything relevant. I suppose I could poll via virDomainGetJobStats(), but ideally I'd like a function that joins the dump job and returns when the dump job finishes.

_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org
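For reference, the polling workaround can be sketched against the public API alone (assuming a connected virDomainPtr; the one-second interval and the helper name are made up for illustration):

```c
#include <libvirt/libvirt.h>
#include <unistd.h>

/* Poll until no job remains on the domain. Returns 0 once the job has
 * drained, -1 on a libvirt error. Note this cannot tell a dump that
 * completed apart from one that failed or was aborted. */
static int
wait_for_dump_by_polling(virDomainPtr dom)
{
    for (;;) {
        int type = VIR_DOMAIN_JOB_NONE;
        virTypedParameterPtr params = NULL;
        int nparams = 0;

        if (virDomainGetJobStats(dom, &type, &params, &nparams, 0) < 0)
            return -1;
        virTypedParamsFree(params, nparams);

        if (type == VIR_DOMAIN_JOB_NONE)
            return 0;
        sleep(1);
    }
}
```

The drawback, as noted above, is the busy-wait and the loss of the job's final status, which is exactly what a proper join/wait API would fix.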
I see there's qemuDumpWaitForCompletion(), looks promising.
I've made some progress: I added a virHypervisorDriver.domainCoreDumpWait callback and the relevant scaffolding to make 'virsh dump --wait' work; calling qemuDumpWaitForCompletion() is all that's needed.
However, it doesn't seem trivial to implement this in the test_driver.

First, IIUC testDomainCoreDumpWithFormat() takes an exclusive lock on the domain (I haven't tested anything yet), so calling domainCoreDumpWait() would block for the wrong reason. Is making testDomainCoreDumpWithFormat() use asynchronous jobs internally unavoidable?

Second, I want to test behaviour in my application where (1) it calls domainCoreDump(), (2) it crashes before domainCoreDump() finishes, (3) the application starts again and looks for a pending dump job, and (4) it joins that job using domainCoreDumpWait(). I can't see an easy way of faking a dump job in the test_driver when it starts. How about adding persistent tasks, which I could pre-populate before starting my application, or fake jobs configured via an environment variable, so that when the test_driver starts it can internally continue them? E.g. the variable could specify how long the job should run, and domainCoreDumpWait() would sleep for that long.
I ended up using an environment variable in the test_driver to fake jobs, so the application under test doesn't need to know anything. Would something like that be accepted?
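To make the idea concrete, the test_driver could decode such a variable with a small parser; the "&lt;jobtype&gt;:&lt;seconds&gt;" format and the function name below are made up for illustration, not what any driver currently does:

```c
#include <stdio.h>
#include <string.h>

/* Parse a hypothetical fake-job spec such as "dump:5" (a dump job that
 * should appear to run for 5 seconds). Copies the job type into 'type'
 * and the duration into 'secs'. Returns 0 on success, -1 on a
 * malformed spec. */
static int
parse_fake_job(const char *spec, char *type, size_t typelen, int *secs)
{
    const char *colon = strchr(spec, ':');
    size_t n;

    if (!colon || colon == spec)
        return -1;

    n = (size_t)(colon - spec);
    if (n >= typelen)
        return -1;

    memcpy(type, spec, n);
    type[n] = '\0';

    if (sscanf(colon + 1, "%d", secs) != 1 || *secs < 0)
        return -1;

    return 0;
}
```

On startup the driver would read the variable, call the parser, and register a fake asynchronous job of the given type whose wait path simply sleeps for the requested duration.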
I'm open to suggestions.
I have managed to implement a PoC by introducing virDomainJobWait(), which checks the job type and, if it's a dump, calls qemuDumpWaitForCompletion(), purely because that's the easiest thing to do (it fails with VIR_ERR_OPERATION_INVALID for all other operations). We'd like to implement this for all other job types and I'm looking for some pointers: is virCondWaitUntil(&priv->job.asyncCond, &obj->parent.lock) all that's required? (And returning priv->job.error?)

Also, as I explained earlier, I want to join a potentially existing dump job. In my specific use case, I want to either join an ongoing dump job if one exists, or otherwise start a new one. I currently do this by calling virDomainGetJobStats(), but I'm wondering whether we could add a new, optional dump flag, e.g. 'join', which would make virDomainCoreDumpWithFormat() do all of this internally. Would something like that be accepted upstream?
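The caller-side join-or-start workaround described above could look roughly like this (a sketch against the public API only; the function name, format, and flags are illustrative choices, not part of any proposal):

```c
#include <libvirt/libvirt.h>

/* Join an ongoing job if one exists, otherwise start a new core dump.
 * In either case the caller must still wait for completion afterwards,
 * e.g. by polling virDomainGetJobStats(). */
static int
dump_join_or_start(virDomainPtr dom, const char *path)
{
    int type = VIR_DOMAIN_JOB_NONE;
    virTypedParameterPtr params = NULL;
    int nparams = 0;

    if (virDomainGetJobStats(dom, &type, &params, &nparams, 0) < 0)
        return -1;
    virTypedParamsFree(params, nparams);

    if (type != VIR_DOMAIN_JOB_NONE)
        return 0; /* a job is already running: just wait on it */

    /* No job pending: start the dump ourselves. Note the window
     * between the stats check and this call, during which another
     * client could start a job; that race is one argument for doing
     * the join inside libvirt via a dump flag. */
    return virDomainCoreDumpWithFormat(dom, path,
                                       VIR_DOMAIN_CORE_DUMP_FORMAT_RAW,
                                       VIR_DUMP_MEMORY_ONLY);
}
```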