Re: [PATCH 2/2] storage: wait for async jobs to drain during pool refresh

20 Mar 2026

On Mon, Feb 23, 2026 at 22:39:43 -0300, Lucas Amaral wrote:
> When creating multiple VMs in parallel, concurrent volume creation
> holds an async job on the pool. A pool refresh during this window

Other operations increase the async job count too, such as
download/upload/wipe.

> fails immediately with "pool has asynchronous jobs running", which
> causes cascading failures in parallel provisioning workflows.
> The refresh operation genuinely cannot run while a volume build is
> in progress (it clears all volume metadata via
> virStoragePoolObjClearVols), but the failure is premature since

Yes, clearing the list of volumes in particular can't happen while any
of the operations that set the 'in_use' semaphore of virStorageVol is
running.

Without that we could perhaps do refcounted volume objects and let the
running API operate on the old variant while refresh populates a new
list, but that wouldn't work well with the interlocking.

> the async job will finish shortly.

IMO 'shortly' is relative when upload/download/wipe are considered.

> Add a condition variable to virStoragePoolObj that allows
> storagePoolRefresh() to wait up to 30 seconds for async jobs to
> drain. The volume build thread broadcasts the condition when it
> decrements asyncjobs to zero. After waking, the refresh function
> re-validates preconditions (pool still active, not starting) since

The pre-condition checks ought to be packaged into a helper function if
this approach is to be taken.
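
A minimal sketch of the mechanism described in the patch, combined with the suggested precondition helper. It uses plain POSIX threads rather than libvirt's own wrappers, and every name in it (PoolObj, pool_async_job_end(), pool_check_refreshable(), pool_wait_async_jobs()) is hypothetical; it assumes that asyncjobs, active and starting are all protected by the pool lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, heavily simplified stand-in for virStoragePoolObj.
 * Assumption: every field below is protected by 'lock'. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t asyncjobs_cond; /* broadcast when asyncjobs drops to 0 */
    unsigned int asyncjobs;
    bool active;
    bool starting;
} PoolObj;

void pool_init(PoolObj *pool)
{
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->asyncjobs_cond, NULL);
    pool->asyncjobs = 0;
    pool->active = false;
    pool->starting = false;
}

/* Called by a job thread (e.g. volume build) with pool->lock held. */
void pool_async_job_end(PoolObj *pool)
{
    assert(pool->asyncjobs > 0);
    if (--pool->asyncjobs == 0)
        pthread_cond_broadcast(&pool->asyncjobs_cond);
}

/* Hypothetical precondition helper, as suggested in the review: the checks
 * that must be repeated whenever the pool lock may have been dropped. */
static int pool_check_refreshable(const PoolObj *pool)
{
    if (!pool->active || pool->starting)
        return -1;
    return 0;
}

/* Caller holds pool->lock. Waits up to timeout_sec seconds for asyncjobs to
 * reach zero, then re-validates the preconditions. Returns 0 on success and
 * -1 on timeout or if the preconditions no longer hold. */
int pool_wait_async_jobs(PoolObj *pool, time_t timeout_sec)
{
    struct timespec deadline;

    /* CLOCK_REALTIME is the default clock used by pthread_cond_timedwait(). */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    while (pool->asyncjobs > 0) {
        /* Releases pool->lock while sleeping and re-acquires it on return. */
        int rc = pthread_cond_timedwait(&pool->asyncjobs_cond, &pool->lock,
                                        &deadline);
        if (rc != 0 && pool->asyncjobs > 0)
            return -1; /* timed out (or failed) with jobs still running */
    }

    /* The lock may have been dropped above, so the state must be rechecked. */
    return pool_check_refreshable(pool);
}
```

In this sketch a return of -1 covers both a timeout and a pool that became inactive (or started) while the lock was released; a real implementation would probably want to report those cases with distinct errors.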

> the pool lock was released during the wait.
> Only storagePoolRefresh() gets the wait mechanism. The other three
> operations (destroy, undefine, delete) keep the immediate error

This IMO doesn't make sense, and additionally our async job handling on
VM objects doesn't make this distinction. All APIs get to wait for the
async job to finish. I don't see any reason for this to be an exception.

I think we in fact need to extract the logic you propose, apply it to
all operations which check the reference count, and also add a variant
of it which considers the 'in_use' semaphore in addition to the
'asyncjobs' semaphore.
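
A sketch of the generalized variant suggested above: a single wait helper that every operation checking the job counter could share, with a flag that additionally waits for the per-volume 'in_use' counters. As before it uses POSIX threads, all names (Pool, Vol, pool_wait_idle(), vol_in_use_end()) are hypothetical, and for simplicity it assumes the volumes' in_use counters are protected by the pool lock, which may not match libvirt's actual locking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified volume: only the 'in_use' counter matters here. */
typedef struct {
    unsigned int in_use;
} Vol;

/* Hypothetical, simplified pool. Assumption: asyncjobs and every volume's
 * in_use counter are protected by 'lock', and 'idle_cond' is broadcast
 * whenever any of these counters drops to zero. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t idle_cond;
    unsigned int asyncjobs;
    Vol *vols;
    size_t nvols;
} Pool;

void pool_init(Pool *pool, Vol *vols, size_t nvols)
{
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->idle_cond, NULL);
    pool->asyncjobs = 0;
    pool->vols = vols;
    pool->nvols = nvols;
}

/* Called with pool->lock held when an operation releases a volume. */
void vol_in_use_end(Pool *pool, Vol *vol)
{
    assert(vol->in_use > 0);
    if (--vol->in_use == 0)
        pthread_cond_broadcast(&pool->idle_cond);
}

static bool pool_is_busy(const Pool *pool, bool check_in_use)
{
    if (pool->asyncjobs > 0)
        return true;
    if (check_in_use) {
        for (size_t i = 0; i < pool->nvols; i++) {
            if (pool->vols[i].in_use > 0)
                return true;
        }
    }
    return false;
}

/* Shared helper for every API that currently fails on a non-zero job count.
 * Caller holds pool->lock and must re-validate its own preconditions after a
 * 0 return, since the lock may have been dropped while waiting. */
int pool_wait_idle(Pool *pool, bool check_in_use, time_t timeout_sec)
{
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    while (pool_is_busy(pool, check_in_use)) {
        int rc = pthread_cond_timedwait(&pool->idle_cond, &pool->lock,
                                        &deadline);
        if (rc != 0 && pool_is_busy(pool, check_in_use))
            return -1; /* still busy at the deadline */
    }
    return 0;
}
```

Refresh would pass check_in_use = true here, since it clears the volume list; whichever flag a caller passes, it still has to repeat its own precondition checks after the helper returns 0.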

> because waiting to destroy or delete a pool during volume creation
> is not a sensible user workflow.

Technically, the 'delete'/'destroy' would still happen. Also, as noted,
there are other operations which take the 'async job' too.

> Resolves: https://issues.redhat.com/browse/RHEL-150758
> Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
> ---
[...]

Peter Krempa