On 11/07/2017 04:18 AM, Cedric Bosdonnat wrote:
On Mon, 2017-11-06 at 15:53 -0500, John Ferlan wrote:
> Resolve a storage driver crash as a result of a long running
> storageVolCreateXML when the virStorageVolPoolRefreshThread is
> run as a result of a storageVolUpload complete and ran the
> virStoragePoolObjClearVols without checking if the creation
> code was currently processing a buildVol after incrementing
> the driver->asyncjob count.
>
> The refreshThread needs to wait until all creation threads
> are completed so as to not alter the volume list while the
> volume is being built.
>
> Crash from valgrind is as follows (with a bit of editing):
>
> ==21309== Invalid read of size 8
> ==21309== at 0x153E47AF: storageBackendUpdateVolTargetInfo
> ==21309== by 0x153E4C30: virStorageBackendUpdateVolInfo
> ==21309== by 0x153E52DE: virStorageBackendVolRefreshLocal
> ==21309== by 0x153DE29E: storageVolCreateXML
> ==21309== by 0x562035B: virStorageVolCreateXML
> ==21309== by 0x147366: remoteDispatchStorageVolCreateXML
> ...
> ==21309== Address 0x2590a720 is 64 bytes inside a block of size 336 free'd
> ==21309== at 0x4C2F2BB: free
> ==21309== by 0x54CB9FA: virFree
> ==21309== by 0x55BC800: virStorageVolDefFree
> ==21309== by 0x55BF1D8: virStoragePoolObjClearVols
> ==21309== by 0x153D967E: virStorageVolPoolRefreshThread
> ...
> ==21309== Block was alloc'd at
> ==21309== at 0x4C300A5: calloc
> ==21309== by 0x54CB483: virAlloc
> ==21309== by 0x55BDC1F: virStorageVolDefParseXML
> ==21309== by 0x55BDC1F: virStorageVolDefParseNode
> ==21309== by 0x55BE5A4: virStorageVolDefParse
> ==21309== by 0x153DDFF1: storageVolCreateXML
> ==21309== by 0x562035B: virStorageVolCreateXML
> ==21309== by 0x147366: remoteDispatchStorageVolCreateXML
> ...
>
> Signed-off-by: John Ferlan <jferlan(a)redhat.com>
> ---
> src/storage/storage_driver.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/src/storage/storage_driver.c b/src/storage/storage_driver.c
> index b0edf9f885..5e920fc14c 100644
> --- a/src/storage/storage_driver.c
> +++ b/src/storage/storage_driver.c
> @@ -2257,6 +2257,7 @@ virStorageVolPoolRefreshThread(void *opaque)
> virStorageBackendPtr backend;
> virObjectEventPtr event = NULL;
>
> + retry:
> storageDriverLock();
> if (cbdata->vol_path) {
> if (virStorageBackendPloopRestoreDesc(cbdata->vol_path) < 0)
> @@ -2270,6 +2271,14 @@ virStorageVolPoolRefreshThread(void *opaque)
> if (!(backend = virStorageBackendForType(def->type)))
> goto cleanup;
>
> + /* Some thread is creating a new volume in the pool, we need to retry */
> + if (virStoragePoolObjGetAsyncjobs(obj) > 0) {
> + virStoragePoolObjUnlock(obj);
> + storageDriverUnlock();
> + usleep(100 * 1000);
> + goto retry;
> + }
> +
> virStoragePoolObjClearVols(obj);
> if (backend->refreshPool(NULL, obj) < 0)
> VIR_DEBUG("Failed to refresh storage pool");
ACK, does the job here. I'm rather surprised to see you implementing it
with sleep, while you pointed me towards virCondWait yesterday. But using
sleep is a way less intrusive change.
Well you only asked about an alternative mechanism and condition
variable was the first thing that popped into my mind; however, as I got
to actually doing more thinking about it - asyncjobs is not blocking
multiple creates from occurring; whereas, a condition variable would be
waiting for one thing to complete.
My response to Jan also lists 2 other alternatives. This was just the
"easiest" of the 3. If there's other ideas, I'm open to suggestions.
John
--
Cedric