On Mon, Feb 23, 2026 at 22:39:43 -0300, Lucas Amaral wrote:
> When creating multiple VMs in parallel, concurrent volume creation holds an async job on the pool. A pool refresh during this window
Other operations, such as download/upload/wipe, increase the async job count too.
> fails immediately with "pool has asynchronous jobs running", which causes cascading failures in parallel provisioning workflows.
> The refresh operation genuinely cannot run while a volume build is in progress (it clears all volume metadata via virStoragePoolObjClearVols), but the failure is premature since
Yes, in particular the list of volumes can't be cleared while any of the operations that set the 'in_use' semaphore of virStorageVol are running. Without that we could perhaps use refcounted volume objects and let the running API operate on the old list while refresh populates a new one, but that wouldn't work well with the interlocking.
> the async job will finish shortly.
IMO 'shortly' is relative once upload/download/wipe are considered.
> Add a condition variable to virStoragePoolObj that allows storagePoolRefresh() to wait up to 30 seconds for async jobs to drain. The volume build thread broadcasts the condition when it decrements asyncjobs to zero. After waking, the refresh function re-validates preconditions (pool still active, not starting) since
The pre-condition checks ought to be packaged into a helper function if this approach is to be taken.
> the pool lock was released during the wait.
> Only storagePoolRefresh() gets the wait mechanism. The other three operations (destroy, undefine, delete) keep the immediate error
IMO this doesn't make sense; our async job handling on VM objects doesn't make this distinction either. There, all APIs get to wait for the async job to finish, and I don't see any reason for this to be an exception. In fact, I think we need to extract the logic you propose and apply it to all operations which check the reference count, and also add a variant which considers the 'in_use' semaphore in addition to the 'asyncjobs' semaphore.
> because waiting to destroy or delete a pool during volume creation is not a sensible user workflow.
Technically the 'delete'/'destroy' would happen. Also, as noted, there are other operations which take the 'async job' too.
> Resolves: https://issues.redhat.com/browse/RHEL-150758
> Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
> ---
[...]