On 07/01/14 02:30, John Ferlan wrote:
On 01/06/2014 05:19 AM, Osier Yang wrote:
> The SCSI device corresponding to the vHBA might not show up in
> sysfs yet when we trying to scan the LUNs. As a result, we will
> end up with an empty volume set for the pool after pool-start,
> even if there are LUNs.
So what causes the "delay" to get the LUN's into sysfs?
It's basicly from the delay of udev.
Is there
something that can be done at creation time (or wherever) to sync that
sooner?
I thought like that, let's say at the point of "createVport". But the
"createVport" is just to create the vHBA, and nothing else to do,
the left work for device showing up in the sysfs tree is on the
udev then.
Polling right after "createVport" for the SCSI device in sysfs tree
will take more time.
Is there a way to determine that the SCSI device hasn't shown
up yet other than the readdir()? You're adding a delay/loop for some
other subsystem's inability? to sync or provide the resources.
It's the only way AFAIK.
> Though the time of the device showing up is rather depended,
> better than doing nothing, this patch introduces the polling
> with 5 * 1 seconds in maximum (the time works fine on my
> testing machine at least). Note that for the pool which doesn't
> have any LUN, it will still take 5 seconds to poll, but it's
> not a bad trade, 5 seconds is not much, and in most cases,
> one won't use an empty pool in practice.
Since the paths that call into this only seem to be via refresh and
perhaps a subsequent refresh would resolve things.
Exactly, in most cases it will work, since the time window between
pool-start and pool-refresh should be enough for the SCSI device
showing up in the sysfs tree, *generally*. but it's not necessary
to be that.
Could this be better
served by documenting that it's possible that depending on circumstance
"X" (answer to my first question) that in order to see elements in the
pool, one may have to reload again. Assuming of course that the next
reload would find them...
I thought like this too, but the problem is it's not guaranteed that
the volume could be loaded after execute "pool-refresh" one time,
may be 2 times, 3 times, ... N times. Also the time of each
"pool-refresh" is not fixed, it depends on how long the
"udevadm settle" (see virFileWaitForDevices) will take.
I guess I'm a bit cautious about adding randomly picked timeout values
based on some test because while it may work for you, perhaps it's 10
seconds for someone else. While you may consider a 5 second pause "OK"
and "reasonable" a customer may not consider that to be reasonable.
People (and testers) do strange and random things.
Exactly, this is not the only problem we faced regarding to the
storage stuffs, and the users keeps asking why, why, and why.
Furthermore, could it be possible that you "catch" things perfectly and
only say 10 of 100 devices are found... But if you waited another 5
seconds the other 90 devices would show up.. I think by adding this
code you end up down a slippery slope of handing fc_host devices
specially...
We are exactly on the same page, but the question is what the
best solution we should provide? It looks ugly if we add documentation
saying one should use pool-refresh after the pool is started, if the
volumes are not loaded, but how many times to use the pool-refresh
is depended? This patch was basicly a proposal for discussion. I
didn't expect it could be committed smoothly.
Regards,
Osier