
On 11/07/2017 04:18 AM, Cedric Bosdonnat wrote:
On Mon, 2017-11-06 at 15:53 -0500, John Ferlan wrote:
Resolve a storage driver crash as a result of a long running storageVolCreateXML when the virStorageVolPoolRefreshThread is run as a result of a storageVolUpload complete and ran the virStoragePoolObjClearVols without checking if the creation code was currently processing a buildVol after incrementing the driver->asyncjob count.
The refreshThread needs to wait until all creation threads are completed so as to not alter the volume list while the volume is being built.
Crash from valgrind is as follows (with a bit of editing):
==21309== Invalid read of size 8 ==21309== at 0x153E47AF: storageBackendUpdateVolTargetInfo ==21309== by 0x153E4C30: virStorageBackendUpdateVolInfo ==21309== by 0x153E52DE: virStorageBackendVolRefreshLocal ==21309== by 0x153DE29E: storageVolCreateXML ==21309== by 0x562035B: virStorageVolCreateXML ==21309== by 0x147366: remoteDispatchStorageVolCreateXML ... ==21309== Address 0x2590a720 is 64 bytes inside a block of size 336 free'd ==21309== at 0x4C2F2BB: free ==21309== by 0x54CB9FA: virFree ==21309== by 0x55BC800: virStorageVolDefFree ==21309== by 0x55BF1D8: virStoragePoolObjClearVols ==21309== by 0x153D967E: virStorageVolPoolRefreshThread ... ==21309== Block was alloc'd at ==21309== at 0x4C300A5: calloc ==21309== by 0x54CB483: virAlloc ==21309== by 0x55BDC1F: virStorageVolDefParseXML ==21309== by 0x55BDC1F: virStorageVolDefParseNode ==21309== by 0x55BE5A4: virStorageVolDefParse ==21309== by 0x153DDFF1: storageVolCreateXML ==21309== by 0x562035B: virStorageVolCreateXML ==21309== by 0x147366: remoteDispatchStorageVolCreateXML ...
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/storage/storage_driver.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/src/storage/storage_driver.c b/src/storage/storage_driver.c index b0edf9f885..5e920fc14c 100644 --- a/src/storage/storage_driver.c +++ b/src/storage/storage_driver.c @@ -2257,6 +2257,7 @@ virStorageVolPoolRefreshThread(void *opaque) virStorageBackendPtr backend; virObjectEventPtr event = NULL;
+ retry: storageDriverLock(); if (cbdata->vol_path) { if (virStorageBackendPloopRestoreDesc(cbdata->vol_path) < 0) @@ -2270,6 +2271,14 @@ virStorageVolPoolRefreshThread(void *opaque) if (!(backend = virStorageBackendForType(def->type))) goto cleanup;
+ /* Some thread is creating a new volume in the pool, we need to retry */ + if (virStoragePoolObjGetAsyncjobs(obj) > 0) { + virStoragePoolObjUnlock(obj); + storageDriverUnlock(); + usleep(100 * 1000); + goto retry; + } + virStoragePoolObjClearVols(obj); if (backend->refreshPool(NULL, obj) < 0) VIR_DEBUG("Failed to refresh storage pool");
ACK, does the job here. I'm rather surprised to see you implementing it with sleep, while you pointed me towards virCondWait yesterday. But using sleep is a way less intrusive change.
Well you only asked about an alternative mechanism and condition variable was the first thing that popped into my mind; however, as I got to actually doing more thinking about it - asyncjobs is not blocking multiple creates from occurring; whereas, a condition variable would be waiting for one thing to complete. My response to Jan also lists 2 other alternatives. This was just the "easiest" of the 3. If there's other ideas, I'm open to suggestions. John
-- Cedric