On 11/07/2017 04:36 AM, Ján Tomko wrote:
On Mon, Nov 06, 2017 at 03:53:08PM -0500, John Ferlan wrote:
> Resolve a storage driver crash as a result of a long running
> storageVolCreateXML when the virStorageVolPoolRefreshThread is
> run as a result of a storageVolUpload complete and ran the
> virStoragePoolObjClearVols without checking if the creation
> code was currently processing a buildVol after incrementing
> the driver->asyncjob count.
I thought the whole point of refreshing the pool in a thread
was to have the storage driver usable for other things.
This effectively disables pool refresh during long-running
jobs.
This would "disable" the refresh during a long running create/build job,
but that's already being done as other than another create, not much can
be done while a create/build is "in progress". The pool undefine,
destroy, delete, and refresh calls are all blocked (they fail) until the
create completes.
The reason for virStorageVolPoolRefreshThread is that updateVol can be
long running and once it's done (the stream is closed), it was desired
that the pool sizing values be updated. That meant either refresh the
entire pool (the easy way - theoretically) or update just the volume and
adjust the pool sizing values with the delta from the upload (similar to
how create/build alters the values).
Unfortunately the latter is not possible (yet) because only a subset of
pool types that support uploadVol also support refreshVol. But this
really is only a "problem" for the intersection of those that support
the buildVol too. That is the only reason we'd be blocking is if a pool
type supports both buildVol/buildVolFrom and uploadVol - so the fs,
logical, and vstorage pools.
The virStoragePoolFCRefreshThread used similar type logic w/r/t
refreshing the pool after a createVport completed. Here though, we're
waiting for udev to work it's magic in creating the vHBA LUN. The
asyncjob won't be > 0 for scsi pools yet, so perhaps patch 2 isn't
necessary, but it's preventative.
In any case, at this point in time I would think it's more desirable to
block during the build/upload condition as opposed to crash.
An "alternative" is to refresh the pool after a create vol rather than
just altering the pool available/allocation values from the created
volume target allocation. That way the VolPoolRefreshThread could just
"exit" knowing that volCreate will do the update. However, this refresh
could only happen when there are no asyncjobs outstanding. In the mean
time the pool sizing values probably would still need to be updated if
asyncjobs > 0 to avoid multiple create jobs from "exhausting" the pool
because the sizing values weren't properly updated.
>
> The refreshThread needs to wait until all creation threads
> are completed so as to not alter the volume list while the
> volume is being built.
>
> Crash from valgrind is as follows (with a bit of editing):
>
> ==21309== Invalid read of size 8
> ==21309== at 0x153E47AF: storageBackendUpdateVolTargetInfo
> ==21309== by 0x153E4C30: virStorageBackendUpdateVolInfo
> ==21309== by 0x153E52DE: virStorageBackendVolRefreshLocal
> ==21309== by 0x153DE29E: storageVolCreateXML
This is a fault of storageVolCreateXML for using invalid objects
after dropping the locks.
Partially, but the code also goes through the trouble of incrementing
the pool->asyncjobs count and setting the voldef->building flag for what
appears to be the reasoning that nothing else should modify the pool
volume list while we create/build a new volume, but the refresh thread
just ignores that. The asyncjobs was added in much simpler times in
commit id '2cd9b2d8'.
And this brings up a possible 3rd alternative. Perhaps the more real
problem is that virStoragePoolObjClearVols and refreshPool processing is
overly invasive with respect to free-ing the entire volume list and
rebuilding it rather than perhaps doing some sort of more intelligent
processing to "add" new volumes found in a pool that are not in the
list, "remove" ones no longer in the pool that are in the list, and
"update" those that are still in the list. That to me is outside the
scope of this change, but a noble goal at some point in time.
John
Jan
> ==21309== by 0x562035B: virStorageVolCreateXML
> ==21309== by 0x147366: remoteDispatchStorageVolCreateXML
> ...
> ==21309== Address 0x2590a720 is 64 bytes inside a block of size 336
> free'd
> ==21309== at 0x4C2F2BB: free
> ==21309== by 0x54CB9FA: virFree
> ==21309== by 0x55BC800: virStorageVolDefFree
> ==21309== by 0x55BF1D8: virStoragePoolObjClearVols
> ==21309== by 0x153D967E: virStorageVolPoolRefreshThread
> ...
> ==21309== Block was alloc'd at
> ==21309== at 0x4C300A5: calloc
> ==21309== by 0x54CB483: virAlloc
> ==21309== by 0x55BDC1F: virStorageVolDefParseXML
> ==21309== by 0x55BDC1F: virStorageVolDefParseNode
> ==21309== by 0x55BE5A4: virStorageVolDefParse
> ==21309== by 0x153DDFF1: storageVolCreateXML
> ==21309== by 0x562035B: virStorageVolCreateXML
> ==21309== by 0x147366: remoteDispatchStorageVolCreateXML
> ...
>
> Signed-off-by: John Ferlan <jferlan(a)redhat.com>
> ---
> src/storage/storage_driver.c | 9 +++++++++
> 1 file changed, 9 insertions(+)