[libvirt] [PATCH 0/2] Storage: Fixes for the "fc_host" type pool

*** Enough details in the patch commits ***

Osier Yang (2):
  storage: Fix autostart of pool with "fc_host" type adapter
  storage: Polling the sysfs for pool with "fc_host" type adapter

 src/storage/storage_backend_scsi.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

--
1.8.1.4

The "checkPool" is a bit different for a pool with a "fc_host" type source adapter, since the vHBA it's based on might not be created yet (it's created by "startPool", which is invoked after "checkPool" in storageDriverAutostart). So it should not fail, otherwise the "autostart" of the pool will fail too.

The problem is easy to reproduce:

  * Enable "autostart" for the pool
  * Restart the libvirtd service
  * Check the pool's state

---
 src/storage/storage_backend_scsi.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c
index 6f86ffc..93039c1 100644
--- a/src/storage/storage_backend_scsi.c
+++ b/src/storage/storage_backend_scsi.c
@@ -702,8 +702,18 @@ virStorageBackendSCSICheckPool(virConnectPtr conn ATTRIBUTE_UNUSED,

     *isActive = false;

-    if (!(name = getAdapterName(pool->def->source.adapter)))
-        return -1;
+    if (!(name = getAdapterName(pool->def->source.adapter))) {
+        /* It's normal for a pool with a "fc_host" type source
+         * adapter to fail to get the adapter name, since the vHBA
+         * the adapter is based on might not be created yet.
+         */
+        if (pool->def->source.adapter.type ==
+            VIR_STORAGE_POOL_SOURCE_ADAPTER_TYPE_FC_HOST) {
+            return 0;
+        } else {
+            return -1;
+        }
+    }

     if (getHostNumber(name, &host) < 0)
         goto cleanup;

--
1.8.1.4

On 01/06/2014 05:19 AM, Osier Yang wrote:
The "checkPool" is a bit different for a pool with a "fc_host" type source adapter, since the vHBA it's based on might not be created yet (it's created by "startPool", which is invoked after "checkPool" in storageDriverAutostart). So it should not fail, otherwise the "autostart" of the pool will fail too.
The problem is easy to reproduce:

  * Enable "autostart" for the pool
  * Restart the libvirtd service
  * Check the pool's state

---
 src/storage/storage_backend_scsi.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
Not sure this is the right thing to do. With this change it doesn't matter what getAdapterName() returns for fc_host's...

The getAdapterName() already has some code to specifically handle VIR_STORAGE_POOL_SOURCE_ADAPTER_TYPE_FC_HOST. The call to virGetFCHostNameByWWN() is similar to createVport(), which is called by the 'start' path, except that the createVport() path will be happy if the name already exists.

Since it seems checkPool is meant to initialize things (from reading the error message in the calling function), why not create the vport in the check function? It's a bit more refactoring than perhaps initially desired, although having a vport hanging around unused may not be quite right either.

Another option would be to have the check function perform "enough initialization" or "checks" to make it more likely that the start path will succeed.

Looking at the code does make me wonder what happens in the "existing" code if the vport was already created when the CheckPool function was called, returning the NameByWWN() in the 'name' field. The subsequent getHostNumber() call uses 'name', instead of the parent value used by the start path.

It seems the "check" for fc_host and non-fc_host is different enough that the distinction needs to be in the check routine rather than hidden in the getAdapterName() function.

John
diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c
index 6f86ffc..93039c1 100644
--- a/src/storage/storage_backend_scsi.c
+++ b/src/storage/storage_backend_scsi.c
@@ -702,8 +702,18 @@ virStorageBackendSCSICheckPool(virConnectPtr conn ATTRIBUTE_UNUSED,
*isActive = false;
-    if (!(name = getAdapterName(pool->def->source.adapter)))
-        return -1;
+    if (!(name = getAdapterName(pool->def->source.adapter))) {
+        /* It's normal for a pool with a "fc_host" type source
+         * adapter to fail to get the adapter name, since the vHBA
+         * the adapter is based on might not be created yet.
+         */
+        if (pool->def->source.adapter.type ==
+            VIR_STORAGE_POOL_SOURCE_ADAPTER_TYPE_FC_HOST) {
+            return 0;
+        } else {
+            return -1;
+        }
+    }
     if (getHostNumber(name, &host) < 0)
         goto cleanup;

On 07/01/14 01:21, John Ferlan wrote:
On 01/06/2014 05:19 AM, Osier Yang wrote:
The "checkPool" is a bit different for a pool with a "fc_host" type source adapter, since the vHBA it's based on might not be created yet (it's created by "startPool", which is invoked after "checkPool" in storageDriverAutostart). So it should not fail, otherwise the "autostart" of the pool will fail too.
The problem is easy to reproduce:

  * Enable "autostart" for the pool
  * Restart the libvirtd service
  * Check the pool's state

---
 src/storage/storage_backend_scsi.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
Not sure this is the right thing to do. With this change it doesn't matter what getAdapterName() returns for fc_host's...
It matters for whether "*isActive" ends up true or false. We don't need to try to start the pool if it's already active.
The getAdapterName() already has some code to specifically handle VIR_STORAGE_POOL_SOURCE_ADAPTER_TYPE_FC_HOST. The call to virGetFCHostNameByWWN() is similar to createVport() which is called by the 'start' path, except that the createVport() path will be happy if the name already exists.
Since it seems checkPool is meant to initialize things (from reading the error message in the calling function), why not create the vport in the check function? It's a bit more refactoring than perhaps initially desired, although having a vport hanging around unused may not be quite right either.
No, "checkPool" is only invoked when autostarting the pools, not by poolCreate, poolCreateXML, etc. That's why the creation must fall into "startPool".
Another option would be to have the check function perform "enough initialization" or "checks" to make it more likely that the start path will succeed.
That's actually what "checkPool" *should* do. But the "fc_host" pool is a bit special: for the other pool types, the underlying physical devices must already exist on the host.
Looking at the code does make me wonder what happens in the "existing" code if the vport was already created when the CheckPool function was called, returning the NameByWWN() in the 'name' field. The subsequent getHostNumber() call uses 'name', instead of the parent value used by the start path.
Right. "getHostNumber" in "checkPool" gets the SCSI host number of the vHBA, and then we check whether the SCSI device shows up in the sysfs tree under that host number. "getHostNumber" in "startPool" should get the parent's host number instead, since it needs to know the sysfs file path "/.../.../create_vport" and write the command to it.
It seems the "check" for fc_host and non-fc_host is different enough that the distinction needs to be in the check routine rather than hidden in the getAdapterName() function.
Not quite clear on your meaning. But getAdapterName is just a wrapper to get the adapter name according to the adapter type. How does it relate to the "check" difference?

Regards, Osier

On 01/07/2014 06:07 AM, Osier Yang wrote:
On 07/01/14 01:21, John Ferlan wrote:
On 01/06/2014 05:19 AM, Osier Yang wrote:
The "checkPool" is a bit different for a pool with a "fc_host" type source adapter, since the vHBA it's based on might not be created yet (it's created by "startPool", which is invoked after "checkPool" in storageDriverAutostart). So it should not fail, otherwise the "autostart" of the pool will fail too.
The problem is easy to reproduce:

  * Enable "autostart" for the pool
  * Restart the libvirtd service
  * Check the pool's state

---
 src/storage/storage_backend_scsi.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
Not sure this is the right thing to do. With this change it doesn't matter what getAdapterName() returns for fc_host's...
It matters for whether "*isActive" ends up true or false. We don't need to try to start the pool if it's already active.
Fair enough; however, let's consider the failure case. On failure, a message is reported and name == NULL. Do we need to clear that error now?

I think my original thoughts were along the lines of having getAdapterName do more of the heavy lifting. Have both check and start call it. It would call createVport, which would be mostly unchanged, except for the virFileWaitForDevices(), which could move to start... Thus getAdapterName becomes (and moves below deleteVport):

    if !fc_host
        VIR_STRDUP(name)
    else
        name = createVport
    return name

CheckPool doesn't change.

createVport changes to return the name; that means after the ManageVport the name would need to be fabricated or NameByWWN() called again. The virFileWaitForDevices() moves to the start function.

startPool calls getAdapterName():
  - if it doesn't get a name back, fail with the error message that was
    in getAdapterName when NameByWWN first failed
  - free the name
  - call virFileWaitForDevices()

I've attached a format-patch output for this logic - your call as to whether or not you want to use it... If you keep your logic, then just decide how to handle the error message... It won't be necessary for the check case, but would be for the refresh case.

I am OK with things as they are; it just looks odd in the CheckPool function to special-case FC_HOST just because it's not created yet, and then to have the start function do the create anyway. It just seems we could be smarter/better.

John
The getAdapterName() already has some code to specifically handle VIR_STORAGE_POOL_SOURCE_ADAPTER_TYPE_FC_HOST. The call to virGetFCHostNameByWWN() is similar to createVport() which is called by the 'start' path, except that the createVport() path will be happy if the name already exists.
Since it seems checkPool is meant to initialize things (from reading the error message in the calling function), why not create the vport in the check function? It's a bit more refactoring than perhaps initially desired, although having a vport hanging around unused may not be quite right either.
No, "checkPool" is only invoked when autostarting the pools, not by poolCreate, poolCreateXML, etc. That's why the creation must fall into "startPool".
Another option would be to have the check function perform "enough initialization" or "checks" to make it more likely that the start path will succeed.
That's actually what "checkPool" *should* do. But the "fc_host" pool is a bit special: for the other pool types, the underlying physical devices must already exist on the host.
Looking at the code does make me wonder what happens in the "existing" code if the vport was already created when the CheckPool function was called, returning the NameByWWN() in the 'name' field. The subsequent getHostNumber() call uses 'name', instead of the parent value used by the start path.
Right. "getHostNumber" in "checkPool" gets the SCSI host number of the vHBA, and then we check whether the SCSI device shows up in the sysfs tree under that host number.

"getHostNumber" in "startPool" should get the parent's host number instead, since it needs to know the sysfs file path "/.../.../create_vport" and write the command to it.
It seems the "check" for fc_host and non-fc_host is different enough that the distinction needs to be in the check routine rather than hidden in the getAdapterName() function.
Not quite clear on your meaning. But getAdapterName is just a wrapper to get the adapter name according to the adapter type. How does it relate to the "check" difference?
Regards, Osier

On 22/01/14 21:36, John Ferlan wrote:
On 01/07/2014 06:07 AM, Osier Yang wrote:
On 07/01/14 01:21, John Ferlan wrote:
On 01/06/2014 05:19 AM, Osier Yang wrote:
The "checkPool" is a bit different for a pool with a "fc_host" type source adapter, since the vHBA it's based on might not be created yet (it's created by "startPool", which is invoked after "checkPool" in storageDriverAutostart). So it should not fail, otherwise the "autostart" of the pool will fail too.
The problem is easy to reproduce:

  * Enable "autostart" for the pool
  * Restart the libvirtd service
  * Check the pool's state

---
 src/storage/storage_backend_scsi.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
Not sure this is the right thing to do. With this change it doesn't matter what getAdapterName() returns for fc_host's...

It matters for whether "*isActive" ends up true or false. We don't need to try to start the pool if it's already active.
Fair enough; however, let's consider the failure case. On failure, a message is reported and name == NULL. Do we need to clear that error now?
Indeed.
I've attached a format-patch output for this logic - your call as to whether or not you want to use it... If you keep your logic, then just decide upon how to handle the error message... It won't be necessary for the check case, but would be for the refresh case.
I am OK with things as they are, it just looks odd in the CheckPool function to be special casing FC_HOST just because it's not created yet and then to have the start function do the create anyway. It just seems we could be smarter/better.
I agree your method is more linear, except there is no need for the error statement in startPool (hope the indentation is not broken); it would just override the useful errors from the code path.

diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c
index d3d14ce..f6cb820 100644
--- a/src/storage/storage_backend_scsi.c
+++ b/src/storage/storage_backend_scsi.c
@@ -744,13 +744,8 @@ virStorageBackendSCSIStartPool(virConnectPtr conn ATTRIBUTE_UNUSED,
 {
     char *name = NULL;
     virStoragePoolSourceAdapter adapter = pool->def->source.adapter;

-    if (!(name = getAdapterName(pool->def->source.adapter))) {
-        virReportError(VIR_ERR_XML_ERROR,
-                       _("Failed to find SCSI host with wwnn='%s', "
-                         "wwpn='%s'"), adapter.data.fchost.wwnn,
-                       adapter.data.fchost.wwpn);
+    if (!(name = getAdapterName(pool->def->source.adapter)))
         return -1;
-    }

     VIR_FREE(name);
     virFileWaitForDevices();
     return 0;

To ensure things work well, I'm going to put the patch into practice tomorrow on an NPIV machine. If it goes well, I will push it with the above change.

Osier

On 22/01/14 21:36, John Ferlan wrote:
On 01/07/2014 06:07 AM, Osier Yang wrote:
On 07/01/14 01:21, John Ferlan wrote:
On 01/06/2014 05:19 AM, Osier Yang wrote:
The "checkPool" is a bit different for a pool with a "fc_host" type source adapter, since the vHBA it's based on might not be created yet (it's created by "startPool", which is invoked after "checkPool" in storageDriverAutostart). So it should not fail, otherwise the "autostart" of the pool will fail too.
The problem is easy to reproduce:

  * Enable "autostart" for the pool
  * Restart the libvirtd service
  * Check the pool's state

---
 src/storage/storage_backend_scsi.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
Not sure this is the right thing to do. With this change it doesn't matter what getAdapterName() returns for fc_host's...

It matters for whether "*isActive" ends up true or false. We don't need to try to start the pool if it's already active.
Fair enough; however, let's consider the failure case. On failure, a message is reported and name == NULL. Do we need to clear that error now?
I think my original thoughts were along the lines of having getAdapterName do more of the heavy lifting. Have both check and start call it. It would call the createVport which would be mostly unchanged, except for the virFileWaitForDevices() which could move to start...
Found your method *might* break one piece of logic.

Assume the pool is not marked as "autostart" and the vHBA was not created yet. Since with your method "checkPool" will create the vHBA now, it's possible for "checkPool" to return "*isActive" as true, as long as the device path for the created vHBA shows up on the host quickly enough, i.e. during the "checkPool" process (as we discussed at length in PATCH 2/2, how long it takes to show up varies; it can be long, but it can also be short). And thus the pool will be marked as "Active". As a result, the problem on one side of the coin is fixed, but a similar problem is introduced on the other side. :-)

That's why I said previously I don't want to do any creation work in "checkPool"; it should only check something.

So, I'm going forward to push my patch, with the last error reset in "checkPool".

Osier

On 01/23/2014 04:45 AM, Osier Yang wrote:
On 22/01/14 21:36, John Ferlan wrote:
On 01/07/2014 06:07 AM, Osier Yang wrote:
On 07/01/14 01:21, John Ferlan wrote:
On 01/06/2014 05:19 AM, Osier Yang wrote:
The "checkPool" is a bit different for a pool with a "fc_host" type source adapter, since the vHBA it's based on might not be created yet (it's created by "startPool", which is invoked after "checkPool" in storageDriverAutostart). So it should not fail, otherwise the "autostart" of the pool will fail too.
The problem is easy to reproduce:

  * Enable "autostart" for the pool
  * Restart the libvirtd service
  * Check the pool's state

---
 src/storage/storage_backend_scsi.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
Not sure this is the right thing to do. With this change it doesn't matter what getAdapterName() returns for fc_host's...

It matters for whether "*isActive" ends up true or false. We don't need to try to start the pool if it's already active.
Fair enough; however, let's consider the failure case. On failure, a message is reported and name == NULL. Do we need to clear that error now?
I think my original thoughts were along the lines of having getAdapterName do more of the heavy lifting. Have both check and start call it. It would call the createVport which would be mostly unchanged, except for the virFileWaitForDevices() which could move to start...
Found your method *might* break one piece of logic.

Assume the pool is not marked as "autostart" and the vHBA was not created yet. Since with your method "checkPool" will create the vHBA now, it's possible for "checkPool" to return "*isActive" as true, as long as the device path for the created vHBA shows up on the host quickly enough, i.e. during the "checkPool" process (as we discussed at length in PATCH 2/2, how long it takes to show up varies; it can be long, but it can also be short). And thus the pool will be marked as "Active". As a result, the problem on one side of the coin is fixed, but a similar problem is introduced on the other side. :-)
Huh? Not sure I see what problem you're driving at.

Look back at the caller - isActive from _scsi translates to "started" in the caller. The 'started' is used to decide whether to call 'startPool' and 'refreshPool'. If autostart is disabled, then 'startPool' won't be called. Truly all that's not done prior to refreshPool being called is the 'virFileWaitForDevices();'... So move that into createVport(). Or am I missing something?

If the pool isn't autostarted, then the next start will call getAdapterName(), which determines the pool is started (or not) and be happy.

John
That's why I said previously I don't want to do any creation work in "checkPool", it should only check something.
So, I'm going forward to push my patch, with the last error reset in "checkPool".
Osier

On 23/01/14 21:35, John Ferlan wrote:
On 01/23/2014 04:45 AM, Osier Yang wrote:
On 22/01/14 21:36, John Ferlan wrote:
On 01/07/2014 06:07 AM, Osier Yang wrote:
On 07/01/14 01:21, John Ferlan wrote:
On 01/06/2014 05:19 AM, Osier Yang wrote:
The "checkPool" is a bit different for a pool with a "fc_host" type source adapter, since the vHBA it's based on might not be created yet (it's created by "startPool", which is invoked after "checkPool" in storageDriverAutostart). So it should not fail, otherwise the "autostart" of the pool will fail too.
The problem is easy to reproduce:

  * Enable "autostart" for the pool
  * Restart the libvirtd service
  * Check the pool's state

---
 src/storage/storage_backend_scsi.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
Not sure this is the right thing to do. With this change it doesn't matter what getAdapterName() returns for fc_host's...

It matters for whether "*isActive" ends up true or false. We don't need to try to start the pool if it's already active.
Fair enough; however, let's consider the failure case. On failure, a message is reported and name == NULL. Do we need to clear that error now?
I think my original thoughts were along the lines of having getAdapterName do more of the heavy lifting. Have both check and start call it. It would call the createVport which would be mostly unchanged, except for the virFileWaitForDevices() which could move to start...
Found your method *might* break one piece of logic.

Assume the pool is not marked as "autostart" and the vHBA was not created yet. Since with your method "checkPool" will create the vHBA now, it's possible for "checkPool" to return "*isActive" as true, as long as the device path for the created vHBA shows up on the host quickly enough, i.e. during the "checkPool" process (as we discussed at length in PATCH 2/2, how long it takes to show up varies; it can be long, but it can also be short). And thus the pool will be marked as "Active". As a result, the problem on one side of the coin is fixed, but a similar problem is introduced on the other side. :-)

Huh? Not sure I see what problem you're driving at.
Look back at the caller - isActive from _scsi translates to "started" in the caller. The 'started' is used to decide whether to call 'startPool' and 'refreshPool'. If autostart is disabled, then 'startPool' won't be called.
Assuming pool->autostart is false, and "started" happens to be true: calling refreshPool is likely to cause pool->active == 1.

<...>
        if (started) {
            if (backend->refreshPool(conn, pool) < 0) {
                virErrorPtr err = virGetLastError();
                if (backend->stopPool)
                    backend->stopPool(conn, pool);
                VIR_ERROR(_("Failed to autostart storage pool '%s': %s"),
                          pool->def->name,
                          err ? err->message : _("no error message found"));
                virStoragePoolObjUnlock(pool);
                continue;
            }
            pool->active = 1;
        }
</...>

Since the vHBA was already created in checkPool, I see not much reason why refreshPool cannot succeed.

Osier

The SCSI device corresponding to the vHBA might not show up in sysfs yet when we try to scan the LUNs. As a result, we will end up with an empty volume set for the pool after pool-start, even if there are LUNs.

Though the time at which the device shows up varies, this is better than doing nothing: this patch introduces polling with a maximum of 5 * 1 seconds (the time works fine on my testing machine at least). Note that for a pool which doesn't have any LUN, it will still take 5 seconds to poll, but it's not a bad trade; 5 seconds is not much, and in most cases one won't use an empty pool in practice.

---
 src/storage/storage_backend_scsi.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c
index 93039c1..2efcdb8 100644
--- a/src/storage/storage_backend_scsi.c
+++ b/src/storage/storage_backend_scsi.c
@@ -495,6 +495,8 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool,
     DIR *devicedir = NULL;
     struct dirent *lun_dirent = NULL;
     char devicepattern[64];
+    bool found = false;
+    size_t i = 0;

     VIR_DEBUG("Discovering LUs on host %u", scanhost);

@@ -510,6 +512,7 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool,
     snprintf(devicepattern, sizeof(devicepattern), "%u:%%u:%%u:%%u\n", scanhost);

+retry:
     while ((lun_dirent = readdir(devicedir))) {
         if (sscanf(lun_dirent->d_name, devicepattern,
                    &bus, &target, &lun) != 3) {
@@ -518,9 +521,22 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool,

         VIR_DEBUG("Found LU '%s'", lun_dirent->d_name);

+        found = true;
         processLU(pool, scanhost, bus, target, lun);
     }

+    /* Sleep for 5 seconds at maximum if the pool's source
+     * adapter type is "fc_host", since the corresponding
+     * SCSI device might not show up in the sysfs yet.
+     */
+    if (!found && i++ < 5 &&
+        pool->def->source.adapter.type ==
+        VIR_STORAGE_POOL_SOURCE_ADAPTER_TYPE_FC_HOST) {
+        sleep(1);
+        rewinddir(devicedir);
+        goto retry;
+    }
+
     closedir(devicedir);

     return retval;

--
1.8.1.4

On 01/06/2014 05:19 AM, Osier Yang wrote:
The SCSI device corresponding to the vHBA might not show up in sysfs yet when we try to scan the LUNs. As a result, we will end up with an empty volume set for the pool after pool-start, even if there are LUNs.
So what causes the "delay" to get the LUNs into sysfs? Is there something that can be done at creation time (or wherever) to sync that sooner? Is there a way to determine that the SCSI device hasn't shown up yet, other than the readdir()? You're adding a delay/loop for some other subsystem's inability(?) to sync or provide the resources.
Though the time at which the device shows up varies, this is better than doing nothing: this patch introduces polling with a maximum of 5 * 1 seconds (the time works fine on my testing machine at least). Note that for a pool which doesn't have any LUN, it will still take 5 seconds to poll, but it's not a bad trade; 5 seconds is not much, and in most cases one won't use an empty pool in practice.
Since the paths that call into this only seem to be via refresh, perhaps a subsequent refresh would resolve things. Could this be better served by documenting that, depending on circumstance "X" (answer to my first question), in order to see elements in the pool one may have to reload again? Assuming of course that the next reload would find them...

I guess I'm a bit cautious about adding randomly picked timeout values based on some test, because while it may work for you, perhaps it's 10 seconds for someone else. While you may consider a 5 second pause "OK" and "reasonable", a customer may not consider that to be reasonable. People (and testers) do strange and random things.

Furthermore, could it be possible that you "catch" things perfectly and only say 10 of 100 devices are found... But if you waited another 5 seconds the other 90 devices would show up... I think by adding this code you end up down a slippery slope of handling fc_host devices specially...

John
--- src/storage/storage_backend_scsi.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c
index 93039c1..2efcdb8 100644
--- a/src/storage/storage_backend_scsi.c
+++ b/src/storage/storage_backend_scsi.c
@@ -495,6 +495,8 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool,
     DIR *devicedir = NULL;
     struct dirent *lun_dirent = NULL;
     char devicepattern[64];
+    bool found = false;
+    size_t i = 0;
VIR_DEBUG("Discovering LUs on host %u", scanhost);
@@ -510,6 +512,7 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool,
snprintf(devicepattern, sizeof(devicepattern), "%u:%%u:%%u:%%u\n", scanhost);
+retry:
     while ((lun_dirent = readdir(devicedir))) {
         if (sscanf(lun_dirent->d_name, devicepattern,
                    &bus, &target, &lun) != 3) {
@@ -518,9 +521,22 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool,
VIR_DEBUG("Found LU '%s'", lun_dirent->d_name);
+        found = true;
         processLU(pool, scanhost, bus, target, lun);
     }
+    /* Sleep for 5 seconds at maximum if the pool's source
+     * adapter type is "fc_host", since the corresponding
+     * SCSI device might not show up in the sysfs yet.
+     */
+    if (!found && i++ < 5 &&
+        pool->def->source.adapter.type ==
+        VIR_STORAGE_POOL_SOURCE_ADAPTER_TYPE_FC_HOST) {
+        sleep(1);
+        rewinddir(devicedir);
+        goto retry;
+    }
+
     closedir(devicedir);
return retval;

On 07/01/14 02:30, John Ferlan wrote:
On 01/06/2014 05:19 AM, Osier Yang wrote:
The SCSI device corresponding to the vHBA might not show up in sysfs yet when we try to scan the LUNs. As a result, we will end up with an empty volume set for the pool after pool-start, even if there are LUNs.

So what causes the "delay" to get the LUNs into sysfs?
It's basically from the delay of udev.
Is there something that can be done at creation time (or wherever) to sync that sooner?
I thought about that, say at the point of "createVport". But "createVport" just creates the vHBA and nothing else; the remaining work of the device showing up in the sysfs tree is then on udev. Polling right after "createVport" for the SCSI device in the sysfs tree would take more time.
Is there a way to determine that the SCSI device hasn't shown up yet other than the readdir()? You're adding a delay/loop for some other subsystem's inability? to sync or provide the resources.
It's the only way AFAIK.
Though the time at which the device shows up varies, this is better than doing nothing: this patch introduces polling with a maximum of 5 * 1 seconds (the time works fine on my testing machine at least). Note that for a pool which doesn't have any LUN, it will still take 5 seconds to poll, but it's not a bad trade; 5 seconds is not much, and in most cases one won't use an empty pool in practice.
Since the paths that call into this only seem to be via refresh, perhaps a subsequent refresh would resolve things.
Exactly, in most cases it will work, since the time window between pool-start and pool-refresh should *generally* be enough for the SCSI device to show up in the sysfs tree. But that's not guaranteed.
Could this be better served by documenting that it's possible that depending on circumstance "X" (answer to my first question) that in order to see elements in the pool, one may have to reload again. Assuming of course that the next reload would find them...
I thought about this too, but the problem is that it's not guaranteed the volumes will be loaded after executing "pool-refresh" once; it may take 2 times, 3 times, ... N times. Also, the duration of each "pool-refresh" is not fixed; it depends on how long the "udevadm settle" (see virFileWaitForDevices) takes.
I guess I'm a bit cautious about adding randomly picked timeout values based on some test because while it may work for you, perhaps it's 10 seconds for someone else. While you may consider a 5 second pause "OK" and "reasonable" a customer may not consider that to be reasonable. People (and testers) do strange and random things.
Exactly, this is not the only problem we face regarding the storage stuff, and the users keep asking why, why, and why.
Furthermore, could it be possible that you "catch" things perfectly and only say 10 of 100 devices are found... But if you waited another 5 seconds the other 90 devices would show up... I think by adding this code you end up down a slippery slope of handling fc_host devices specially...
We are exactly on the same page, but the question is: what is the best solution we should provide? It looks ugly to add documentation saying one should use pool-refresh after the pool is started if the volumes are not loaded, when how many times pool-refresh is needed varies.

This patch was basically a proposal for discussion. I didn't expect it to be committed smoothly.

Regards, Osier

On 01/07/2014 05:37 AM, Osier Yang wrote:
On 07/01/14 02:30, John Ferlan wrote:
On 01/06/2014 05:19 AM, Osier Yang wrote:
The SCSI device corresponding to the vHBA might not show up in sysfs yet when we try to scan the LUNs. As a result, we will end up with an empty volume set for the pool after pool-start, even if there are LUNs.

So what causes the "delay" to get the LUNs into sysfs?
It's basically from the delay of udev.
Is there something that can be done at creation time (or wherever) to sync that sooner?
I thought about that, say at the point of "createVport". But "createVport" just creates the vHBA and nothing else; the remaining work of the device showing up in the sysfs tree is then on udev.

Polling right after "createVport" for the SCSI device in the sysfs tree would take more time.
Is there a way to determine that the SCSI device hasn't shown up yet other than the readdir()? You're adding a delay/loop for some other subsystem's inability? to sync or provide the resources.
It's the only way AFAIK.
Though the time it takes for the device to show up varies, doing this is better than doing nothing: this patch introduces polling with a maximum of 5 * 1 seconds (a time that works fine on my testing machine at least). Note that for a pool which doesn't have any LUN, it will still take 5 seconds to poll, but it's not a bad trade; 5 seconds is not much, and in most cases one won't use an empty pool in practice.
Since the paths that call into this only seem to be via refresh and perhaps a subsequent refresh would resolve things.
Exactly, in most cases it will work, since the time window between pool-start and pool-refresh should *generally* be enough for the SCSI device to show up in the sysfs tree, but that is not guaranteed.
Could this be better served by documenting that it's possible that depending on circumstance "X" (answer to my first question) that in order to see elements in the pool, one may have to reload again. Assuming of course that the next reload would find them...
I thought like this too, but the problem is it's not guaranteed that the volumes will be loaded after executing "pool-refresh" once; it may take 2 times, 3 times, ... N times. Also, the time of each "pool-refresh" is not fixed; it depends on how long the "udevadm settle" (see virFileWaitForDevices) will take.
I guess I'm a bit cautious about adding randomly picked timeout values based on some test because while it may work for you, perhaps it's 10 seconds for someone else. While you may consider a 5 second pause "OK" and "reasonable" a customer may not consider that to be reasonable. People (and testers) do strange and random things.
Exactly, this is not the only problem we have faced regarding the storage stuff, and the users keep asking why, why, and why.
Furthermore, could it be possible that you "catch" things perfectly and only say 10 of 100 devices are found... But if you waited another 5 seconds the other 90 devices would show up.. I think by adding this code you end up down a slippery slope of handling fc_host devices specially...
We are exactly on the same page, but the question is: what is the best solution we should provide? It looks ugly if we add documentation saying one should use pool-refresh after the pool is started if the volumes are not loaded, but how many times pool-refresh is needed is undetermined. This patch was basically a proposal for discussion; I didn't expect it could be committed smoothly.
I'm still not convinced that the retry loop is the right way to go. We could conceivably still have a failure even after 5 retries at once per second. Also is sleep() the right call or should it be usleep()? I have this alarm (sic) going off in the back of my head about using sleep() in a threaded context. Although I do see node_device_driver.c uses it... Regardless of whether we (u)sleep() or not, if we still have a failure, then we're still left documenting the fact that for fc_host type pools there are "conditions" that need to be met which are out of the control of libvirt's scope of influence that would cause the pool to not be populated. The "user" is still left trying to refresh multiple times. And we still don't know exactly why we're not seeing the devices. The reality is this is also true for other pools as well since it's not guaranteed that processLU() is ever called and 'retval' is always 0 (and probably could be removed since it's pointless). Secondary question - is it possible for someone to configure an empty directory? Then add LU's afterwards? Why wait the extra 5 seconds just because it was configured that way? In any case, if you're going to be checking that no LU's were found, then perhaps a VIR_DEBUG() message indicating no LU's were found... That message could be generic regardless of fc_host or not (eg. "No LU's found in %s", device_path)... John

On 22/01/14 22:02, John Ferlan wrote:
On 01/07/2014 05:37 AM, Osier Yang wrote:
On 07/01/14 02:30, John Ferlan wrote:
On 01/06/2014 05:19 AM, Osier Yang wrote:
The SCSI device corresponding to the vHBA might not show up in sysfs yet when we are trying to scan the LUNs. As a result, we will end up with an empty volume set for the pool after pool-start, even if there are LUNs. So what causes the "delay" to get the LUN's into sysfs? It's basically due to the delay from udev.
Is there something that can be done at creation time (or wherever) to sync that sooner? I thought about that, let's say at the point of "createVport". But "createVport" just creates the vHBA and does nothing else; the remaining work of the device showing up in the sysfs tree is then left to udev.
Polling right after "createVport" for the SCSI device in the sysfs tree will take more time.
Is there a way to determine that the SCSI device hasn't shown up yet other than the readdir()? You're adding a delay/loop for some other subsystem's inability? to sync or provide the resources. It's the only way AFAIK.
Though the time it takes for the device to show up varies, doing this is better than doing nothing: this patch introduces polling with a maximum of 5 * 1 seconds (a time that works fine on my testing machine at least). Note that for a pool which doesn't have any LUN, it will still take 5 seconds to poll, but it's not a bad trade; 5 seconds is not much, and in most cases one won't use an empty pool in practice. Since the paths that call into this only seem to be via refresh and perhaps a subsequent refresh would resolve things. Exactly, in most cases it will work, since the time window between pool-start and pool-refresh should *generally* be enough for the SCSI device to show up in the sysfs tree, but that is not guaranteed.
Could this be better served by documenting that it's possible that depending on circumstance "X" (answer to my first question) that in order to see elements in the pool, one may have to reload again. Assuming of course that the next reload would find them... I thought like this too, but the problem is it's not guaranteed that the volumes will be loaded after executing "pool-refresh" once; it may take 2 times, 3 times, ... N times. Also, the time of each "pool-refresh" is not fixed; it depends on how long the "udevadm settle" (see virFileWaitForDevices) will take.
I guess I'm a bit cautious about adding randomly picked timeout values based on some test because while it may work for you, perhaps it's 10 seconds for someone else. While you may consider a 5 second pause "OK" and "reasonable" a customer may not consider that to be reasonable. People (and testers) do strange and random things. Exactly, this is not the only problem we have faced regarding the storage stuff, and the users keep asking why, why, and why.
Furthermore, could it be possible that you "catch" things perfectly and only say 10 of 100 devices are found... But if you waited another 5 seconds the other 90 devices would show up.. I think by adding this code you end up down a slippery slope of handling fc_host devices specially... We are exactly on the same page, but the question is: what is the best solution we should provide? It looks ugly if we add documentation saying one should use pool-refresh after the pool is started if the volumes are not loaded, but how many times pool-refresh is needed is undetermined. This patch was basically a proposal for discussion; I didn't expect it could be committed smoothly.
I'm still not convinced that the retry loop is the right way to go. We could conceivably still have a failure even after 5 retries at once per second.
Also is sleep() the right call or should it be usleep()? I have this alarm (sic) going off in the back of my head about using sleep() in a threaded context. Although I do see node_device_driver.c uses it...
Regardless of whether we (u)sleep() or not, if we still have a failure, then we're still left documenting the fact that for fc_host type pools there are "conditions" that need to be met which are out of the control of libvirt's scope of influence that would cause the pool to not be populated. The "user" is still left trying to refresh multiple times. And we still don't know exactly why we're not seeing the devices. The reality is this is also true for other pools as well since it's not guaranteed that processLU() is ever called and 'retval' is always 0 (and probably could be removed since it's pointless).
I gave up hacking the code; instead, I'm making a patch that adds documentation to the virsh manual as a "Note" to point out the fact. Something like this: <...> +B<Note>: For pool which relies on remote resources, e.g. a "iscsi" type +pool, or a "scsi" type pool based on a (v)HBA, since the corresponding +devices for the volumes may not show up on the host's sysfs yet during +the pool starting process, it may need to refresh the pool (see +B<pool-refresh>) to get the volumes correctly loaded. How many times needed +for the refreshing is depended on the network connection and the time +the host takes to export the corresponding devices to sysfs. </...> Osier

For a pool which relies on remote resources, such as an "iscsi" type pool, how long it takes to export the corresponding devices to the host's sysfs is unpredictable: it could depend on the network connection, and it could also depend on the host's udev procedures. So it's likely that the volumes cannot be detected during the pool starting process. Polling the sysfs doesn't work, since we don't know how much time is best for the polling, and even worse, the volumes could still be undetected or partly undetected even after the polling. So we end up with documentation in the virsh manual to point out the fact.

And as a small improvement, let's explicitly say no LUNs found in the debug log in that case.
---
 src/storage/storage_backend_scsi.c | 5 +++++
 tools/virsh.pod                    | 8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c
index 93039c1..fbcb102 100644
--- a/src/storage/storage_backend_scsi.c
+++ b/src/storage/storage_backend_scsi.c
@@ -495,6 +495,7 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool,
     DIR *devicedir = NULL;
     struct dirent *lun_dirent = NULL;
     char devicepattern[64];
+    bool found = false;

     VIR_DEBUG("Discovering LUs on host %u", scanhost);

@@ -516,11 +517,15 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool,
             continue;
         }

+        found = true;
         VIR_DEBUG("Found LU '%s'", lun_dirent->d_name);

         processLU(pool, scanhost, bus, target, lun);
     }

+    if (!found)
+        VIR_DEBUG("No LU found for pool %s", pool->def->name);
+
     closedir(devicedir);

     return retval;

diff --git a/tools/virsh.pod b/tools/virsh.pod
index 77f9383..073465e 100644
--- a/tools/virsh.pod
+++ b/tools/virsh.pod
@@ -2650,6 +2650,14 @@ Refresh the list of volumes contained in I<pool>.

 Start the storage I<pool>, which is previously defined but inactive.

+B<Note>: For pool which relies on remote resources, e.g. a "iscsi" type
+pool, or a "scsi" type pool based on a (v)HBA, since the corresponding
+devices for the volumes may not show up on the host's sysfs yet during
+the pool starting process, it may need to refresh the pool (see
+B<pool-refresh>) to get the volumes correctly detected. How many times
+needed for the refreshing is depended on the network connection and the
+time the host takes to export the corresponding devices to sysfs.
+
 =item B<pool-undefine> I<pool-or-uuid>

 Undefine the configuration for an inactive I<pool>.
--
1.8.1.4

On 01/22/2014 09:39 AM, Osier Yang wrote:
For a pool which relies on remote resources, such as an "iscsi" type pool, how long it takes to export the corresponding devices to the host's sysfs is unpredictable: it could depend on the network connection, and it could also depend on the host's udev procedures. So it's likely that the volumes cannot be detected during the pool starting process. Polling the sysfs doesn't work, since we don't know how much time is best for the polling, and even worse, the volumes could still be undetected or partly undetected even after the polling. So we end up with documentation in the virsh manual to point out the fact.
Probably some of the above is overkill since the virsh.pod repeats much of it - although having the intro comment specifically target udev procedures (or "udevadm settle") as the culprit are good. Also see my change to your text below.
And as a small improvement, let's explicitly say no LUNs found in the debug log in that case.
---
 src/storage/storage_backend_scsi.c | 5 +++++
 tools/virsh.pod                    | 8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c
index 93039c1..fbcb102 100644
--- a/src/storage/storage_backend_scsi.c
+++ b/src/storage/storage_backend_scsi.c
@@ -495,6 +495,7 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool,
     DIR *devicedir = NULL;
     struct dirent *lun_dirent = NULL;
     char devicepattern[64];
+    bool found = false;

     VIR_DEBUG("Discovering LUs on host %u", scanhost);

@@ -516,11 +517,15 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool,
             continue;
         }

+        found = true;
         VIR_DEBUG("Found LU '%s'", lun_dirent->d_name);

         processLU(pool, scanhost, bus, target, lun);
     }

+    if (!found)
+        VIR_DEBUG("No LU found for pool %s", pool->def->name);
+
     closedir(devicedir);

     return retval;

diff --git a/tools/virsh.pod b/tools/virsh.pod
index 77f9383..073465e 100644
--- a/tools/virsh.pod
+++ b/tools/virsh.pod
@@ -2650,6 +2650,14 @@ Refresh the list of volumes contained in I<pool>.

 Start the storage I<pool>, which is previously defined but inactive.

+B<Note>: For pool which relies on remote resources, e.g. a "iscsi" type
+pool, or a "scsi" type pool based on a (v)HBA, since the corresponding
+devices for the volumes may not show up on the host's sysfs yet during
+the pool starting process, it may need to refresh the pool (see
+B<pool-refresh>) to get the volumes correctly detected. How many times
+needed for the refreshing is depended on the network connection and the
+time the host takes to export the corresponding devices to sysfs.
+
 =item B<pool-undefine> I<pool-or-uuid>
Undefine the configuration for an inactive I<pool>.
B<Note>: A storage pool that relies on remote resources such as an "iscsi" or a vHBA backed "scsi" pool may need to be refreshed multiple times in order to have all the volumes detected (see B<pool-refresh>). This is because the corresponding volume devices may not be present in the host's filesystem during the initial pool startup or the current refresh attempt. The number of refresh retries is dependent upon the network connection and the time the host takes to export the corresponding devices. (sic) note that formatstorage.html.in has both vHBA and (v)HBA, while virsh.pod presently only uses vHBA. Whatever the *correct* syntax is for the instance being used is what we should use here. ACK with the change John

On 22/01/14 23:15, John Ferlan wrote:
On 01/22/2014 09:39 AM, Osier Yang wrote:
For a pool which relies on remote resources, such as an "iscsi" type pool, how long it takes to export the corresponding devices to the host's sysfs is unpredictable: it could depend on the network connection, and it could also depend on the host's udev procedures. So it's likely that the volumes cannot be detected during the pool starting process. Polling the sysfs doesn't work, since we don't know how much time is best for the polling, and even worse, the volumes could still be undetected or partly undetected even after the polling. So we end up with documentation in the virsh manual to point out the fact.
Probably some of the above is overkill since the virsh.pod repeats much of it - although having the intro comment specifically target udev procedures (or "udevadm settle") as the culprit is good. Also see my change to your text below.
And as a small improvement, let's explicitly say no LUNs found in the debug log in that case.
---
 src/storage/storage_backend_scsi.c | 5 +++++
 tools/virsh.pod                    | 8 ++++++++
 2 files changed, 13 insertions(+)

diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c
index 93039c1..fbcb102 100644
--- a/src/storage/storage_backend_scsi.c
+++ b/src/storage/storage_backend_scsi.c
@@ -495,6 +495,7 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool,
     DIR *devicedir = NULL;
     struct dirent *lun_dirent = NULL;
     char devicepattern[64];
+    bool found = false;

     VIR_DEBUG("Discovering LUs on host %u", scanhost);

@@ -516,11 +517,15 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool,
             continue;
         }

+        found = true;
         VIR_DEBUG("Found LU '%s'", lun_dirent->d_name);

         processLU(pool, scanhost, bus, target, lun);
     }

+    if (!found)
+        VIR_DEBUG("No LU found for pool %s", pool->def->name);
+
     closedir(devicedir);

     return retval;

diff --git a/tools/virsh.pod b/tools/virsh.pod
index 77f9383..073465e 100644
--- a/tools/virsh.pod
+++ b/tools/virsh.pod
@@ -2650,6 +2650,14 @@ Refresh the list of volumes contained in I<pool>.

 Start the storage I<pool>, which is previously defined but inactive.

+B<Note>: For pool which relies on remote resources, e.g. a "iscsi" type
+pool, or a "scsi" type pool based on a (v)HBA, since the corresponding
+devices for the volumes may not show up on the host's sysfs yet during
+the pool starting process, it may need to refresh the pool (see
+B<pool-refresh>) to get the volumes correctly detected. How many times
+needed for the refreshing is depended on the network connection and the
+time the host takes to export the corresponding devices to sysfs.
+
 =item B<pool-undefine> I<pool-or-uuid>
Undefine the configuration for an inactive I<pool>.
B<Note>: A storage pool that relies on remote resources such as an "iscsi" or a vHBA backed "scsi" pool may need to be refreshed multiple times in order to have all the volumes detected (see B<pool-refresh>). This is because the corresponding volume devices may not be present in the host's filesystem during the initial pool startup or the current refresh attempt. The number of refresh retries is dependent upon the network connection and the time the host takes to export the corresponding devices.
I like it. :-)
(sic) note that formatstorage.html.in has both vHBA and (v)HBA, while
We have to use (v)HBA here, since it's the same case for HBA.
virsh.pod presently only uses vHBA. Whatever the *correct* syntax is
The only place that uses "vHBA" in virsh.pod is nodedev-destroy, which indeed only works for a vHBA; it doesn't work for an HBA. Pushed with "s/vHBA/(v)HBA/", thanks. Osier

On 23/01/14 13:44, Osier Yang wrote:
On 22/01/14 23:15, John Ferlan wrote:
On 01/22/2014 09:39 AM, Osier Yang wrote:
For a pool which relies on remote resources, such as an "iscsi" type pool, how long it takes to export the corresponding devices to the host's sysfs is unpredictable: it could depend on the network connection, and it could also depend on the host's udev procedures. So it's likely that the volumes cannot be detected during the pool starting process. Polling the sysfs doesn't work, since we don't know how much time is best for the polling, and even worse, the volumes could still be undetected or partly undetected even after the polling. So we end up with documentation in the virsh manual to point out the fact.
Probably some of the above is overkill since the virsh.pod repeats much of it - although having the intro comment specifically target udev procedures (or "udevadm settle") as the culprit is good. Also see my change to your text below.
And as a small improvement, let's explicitly say no LUNs found in the debug log in that case. --- src/storage/storage_backend_scsi.c | 5 +++++ tools/virsh.pod | 8 ++++++++ 2 files changed, 13 insertions(+)
diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c index 93039c1..fbcb102 100644 --- a/src/storage/storage_backend_scsi.c +++ b/src/storage/storage_backend_scsi.c @@ -495,6 +495,7 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool, DIR *devicedir = NULL; struct dirent *lun_dirent = NULL; char devicepattern[64]; + bool found = false; VIR_DEBUG("Discovering LUs on host %u", scanhost); @@ -516,11 +517,15 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool, continue; } + found = true; VIR_DEBUG("Found LU '%s'", lun_dirent->d_name); processLU(pool, scanhost, bus, target, lun); } + if (!found) + VIR_DEBUG("No LU found for pool %s", pool->def->name); + closedir(devicedir); return retval; diff --git a/tools/virsh.pod b/tools/virsh.pod index 77f9383..073465e 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -2650,6 +2650,14 @@ Refresh the list of volumes contained in I<pool>. Start the storage I<pool>, which is previously defined but inactive. +B<Note>: For pool which relies on remote resources, e.g. a "iscsi" type +pool, or a "scsi" type pool based on a (v)HBA, since the corresponding +devices for the volumes may not show up on the host's sysfs yet during +the pool starting process, it may need to refresh the pool (see +B<pool-refresh>) to get the volumes correctly detected. How many times +needed for the refreshing is depended on the network connection and the +time the host takes to export the corresponding devices to sysfs. + =item B<pool-undefine> I<pool-or-uuid> Undefine the configuration for an inactive I<pool>.
B<Note>: A storage pool that relies on remote resources such as an "iscsi" or a vHBA backed "scsi" pool may need to be refreshed multiple times in order to have all the volumes detected (see B<pool-refresh>). This is because the corresponding volume devices may not be present in the host's filesystem during the initial pool startup or the current refresh attempt. The number of refresh retries is dependent upon the network connection and the time the host takes to export the corresponding devices.
I like it. :-)
(sic) note that formatstorage.html.in has both vHBA and (v)HBA, while
We have to use (v)HBA here, since it's the same case for HBA.
Btw, I'm not sure whether using "(v)HBA" as the abbreviation for "vHBA and/or HBA" is acceptable in formal documentation or not; if it's not quite good, a further patch will be needed to clean them up in both virsh.pod and formatdomain.html. Anyway, it's something trivial. I'm going forward to push this patch. Osier