[libvirt] [PATCH v2 0/3] Fix a couple issues found w/ vHBA logic

v1: https://www.redhat.com/archives/libvir-list/2017-July/msg00029.html Changes in v2: Rewrote patch 1 to resolve (new) bz1472277. This can also be cherry picked into 3.2-maint (and any others that get created). Added patch 2 - it's obvious why Adjusted patch 3 to make the change in storage rather than conf, but removed the change to the checkParent condition since that's fixed in patch 1. The change to checkParent is the same though. John Ferlan (3): storage: Fix existing parent check for vHBA creation storage: Remove @conn from virNodeDeviceCreateVport conf: Fix vHBA checkParent logic for pool creation src/conf/node_device_conf.c | 63 ++------------------------ src/conf/node_device_conf.h | 3 +- src/storage/storage_backend_scsi.c | 92 +++++++++++++++++++++++++++++++++++++- 3 files changed, 95 insertions(+), 63 deletions(-) -- 2.9.4

https://bugzilla.redhat.com/show_bug.cgi?id=1472277 Commit id '106930aaa' altered the order of checking for an existing vHBA (e.g something created via nodedev-create functionality outside of the storage pool logic) which inadvertantly broke the code to decide whether to alter/force the fchost->managed field to be 'yes' because the storage pool will be managing the created vHBA in order to ensure when the storage pool is destroyed that the vHBA is also destroyed. This patch moves the check (and checkParent helper) for an existing vHBA back into the createVport in storage_backend_scsi. It also restores the logic back to what commit id '79ab0935' had with respect to using "if (fchost->parent && !checkParent(...))" and returns immediately. The changes made by commit id '08c0ea16f' are only necessary to run the virStoragePoolFCRefreshThread when a vHBA was really created because there's a timing lag such that the refreshPool call made after a startPool from storagePoolCreate* wouldn't necessarily find LUNs, but the thread would. For an already existing vHBA, using the thread is unnecessary since the vHBA already exists and the lag to configure the LUNs wouldn't exist. Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/conf/node_device_conf.c | 55 ------------------------------------ src/storage/storage_backend_scsi.c | 57 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 57 insertions(+), 55 deletions(-) diff --git a/src/conf/node_device_conf.c b/src/conf/node_device_conf.c index 503b129..9c0ffa5 100644 --- a/src/conf/node_device_conf.c +++ b/src/conf/node_device_conf.c @@ -2259,48 +2259,6 @@ virNodeDeviceGetParentName(virConnectPtr conn, } -/* - * Using the host# name found via wwnn/wwpn lookup in the fc_host - * sysfs tree to get the parent 'scsi_host#' to ensure it matches. - */ -static bool -checkParent(virConnectPtr conn, - const char *name, - const char *parent_name) -{ - char *scsi_host_name = NULL; - char *vhba_parent = NULL; - bool retval = false; - - VIR_DEBUG("conn=%p, name=%s, parent_name=%s", conn, name, parent_name); - - /* autostarted pool - assume we're OK */ - if (!conn) - return true; - - if (virAsprintf(&scsi_host_name, "scsi_%s", name) < 0) - goto cleanup; - - if (!(vhba_parent = virNodeDeviceGetParentName(conn, scsi_host_name))) - goto cleanup; - - if (STRNEQ(parent_name, vhba_parent)) { - virReportError(VIR_ERR_XML_ERROR, - _("Parent attribute '%s' does not match parent '%s' " - "determined for the '%s' wwnn/wwpn lookup."), - parent_name, vhba_parent, name); - goto cleanup; - } - - retval = true; - - cleanup: - VIR_FREE(vhba_parent); - VIR_FREE(scsi_host_name); - return retval; -} - - /** * @conn: Connection pointer * @fchost: Pointer to vHBA adapter @@ -2326,19 +2284,6 @@ virNodeDeviceCreateVport(virConnectPtr conn, VIR_DEBUG("conn=%p, parent='%s', wwnn='%s' wwpn='%s'", conn, NULLSTR(fchost->parent), fchost->wwnn, fchost->wwpn); - /* If we find an existing HBA/vHBA within the fc_host sysfs - * using the wwnn/wwpn, then a nodedev is already created for - * this pool and we don't have to create the vHBA - */ - if ((name = virVHBAGetHostByWWN(NULL, fchost->wwnn, fchost->wwpn))) { - /* If a parent was provided, let's make sure the 'name' we've - * retrieved has the same parent. If not this will cause failure. */ - if (fchost->parent && checkParent(conn, name, fchost->parent)) - VIR_FREE(name); - - return name; - } - if (fchost->parent) { if (VIR_STRDUP(parent_hoststr, fchost->parent) < 0) goto cleanup; diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c index f7378d3..aa0fb99 100644 --- a/src/storage/storage_backend_scsi.c +++ b/src/storage/storage_backend_scsi.c @@ -211,6 +211,48 @@ getAdapterName(virStorageAdapterPtr adapter) } +/* + * Using the host# name found via wwnn/wwpn lookup in the fc_host + * sysfs tree to get the parent 'scsi_host#' to ensure it matches. + */ +static bool +checkParent(virConnectPtr conn, + const char *name, + const char *parent_name) +{ + char *scsi_host_name = NULL; + char *vhba_parent = NULL; + bool retval = false; + + VIR_DEBUG("conn=%p, name=%s, parent_name=%s", conn, name, parent_name); + + /* autostarted pool - assume we're OK */ + if (!conn) + return true; + + if (virAsprintf(&scsi_host_name, "scsi_%s", name) < 0) + goto cleanup; + + if (!(vhba_parent = virNodeDeviceGetParentName(conn, scsi_host_name))) + goto cleanup; + + if (STRNEQ(parent_name, vhba_parent)) { + virReportError(VIR_ERR_XML_ERROR, + _("Parent attribute '%s' does not match parent '%s' " + "determined for the '%s' wwnn/wwpn lookup."), + parent_name, vhba_parent, name); + goto cleanup; + } + + retval = true; + + cleanup: + VIR_FREE(vhba_parent); + VIR_FREE(scsi_host_name); + return retval; +} + + static int createVport(virConnectPtr conn, virStoragePoolDefPtr def, @@ -226,6 +268,21 @@ createVport(virConnectPtr conn, conn, NULLSTR(configFile), NULLSTR(fchost->parent), fchost->wwnn, fchost->wwpn); + /* If we find an existing HBA/vHBA within the fc_host sysfs + * using the wwnn/wwpn, then a nodedev is already created for + * this pool and we don't have to create the vHBA + */ + if ((name = virVHBAGetHostByWWN(NULL, fchost->wwnn, fchost->wwpn))) { + ret = 0; + + /* If a parent was provided, let's make sure the 'name' we've + * retrieved has the same parent. If not this will cause failure. */ + if (fchost->parent && !checkParent(conn, name, fchost->parent)) + ret = -1; + + VIR_FREE(name); + return ret; + } /* Since we're creating the vHBA, then we need to manage removing it * as well. Since we need this setting to "live" through a libvirtd -- 2.9.4

On Tue, Jul 18, 2017 at 11:10:38AM -0400, John Ferlan wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1472277
Commit id '106930aaa' altered the order of checking for an existing vHBA (e.g something created via nodedev-create functionality outside of the storage pool logic) which inadvertantly broke the code to decide whether to alter/force the fchost->managed field to be 'yes' because the storage pool will be managing the created vHBA in order to ensure when the storage pool is destroyed that the vHBA is also destroyed.
Right, I agree with this - unless the user explicitly states they want the pre-created vHBA to be managed, we don't force any setting. I was wondering though, what about a use case when the user wants the vHBA to be auto-created, but non-managed at the same time...(yeah, I know I'm stretching it a bit with these unlikely scenarios, but then, you'd surely agree, you've already seen some of similar sort and one can never expect what creative ways of defining the XML are there to be found :)) I was also about to point out in the previous version, that I didn't like how complex virNodeDeviceCreateVport was getting - do wwnn and wwpn relate to a vHBA at all or is it a regular HBA, does the vHBA exist already or should we actually create and manage it.
This patch moves the check (and checkParent helper) for an existing vHBA back into the createVport in storage_backend_scsi. It also restores the logic back to what commit id '79ab0935' had with
As I'm writing below, if you want to go for a partial revert, this patch needs to be split in 2, it's never a good idea to move code segments and doing adjustments to it at the same time.
respect to using "if (fchost->parent && !checkParent(...))" and returns immediately. The changes made by commit id '08c0ea16f' are only necessary to run the virStoragePoolFCRefreshThread when a vHBA was really created because there's a timing lag such that the refreshPool call made after a startPool from storagePoolCreate* wouldn't necessarily find LUNs, but the thread would. For an already existing vHBA, using the thread is unnecessary since the vHBA already exists and the lag to configure the LUNs wouldn't exist.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/conf/node_device_conf.c | 55 ------------------------------------ src/storage/storage_backend_scsi.c | 57 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 57 insertions(+), 55 deletions(-)
diff --git a/src/conf/node_device_conf.c b/src/conf/node_device_conf.c index 503b129..9c0ffa5 100644 --- a/src/conf/node_device_conf.c +++ b/src/conf/node_device_conf.c @@ -2259,48 +2259,6 @@ virNodeDeviceGetParentName(virConnectPtr conn, }
-/* - * Using the host# name found via wwnn/wwpn lookup in the fc_host - * sysfs tree to get the parent 'scsi_host#' to ensure it matches. - */ -static bool -checkParent(virConnectPtr conn, - const char *name, - const char *parent_name) -{ - char *scsi_host_name = NULL; - char *vhba_parent = NULL; - bool retval = false; - - VIR_DEBUG("conn=%p, name=%s, parent_name=%s", conn, name, parent_name); - - /* autostarted pool - assume we're OK */ - if (!conn) - return true; - - if (virAsprintf(&scsi_host_name, "scsi_%s", name) < 0) - goto cleanup; - - if (!(vhba_parent = virNodeDeviceGetParentName(conn, scsi_host_name))) - goto cleanup; - - if (STRNEQ(parent_name, vhba_parent)) { - virReportError(VIR_ERR_XML_ERROR, - _("Parent attribute '%s' does not match parent '%s' " - "determined for the '%s' wwnn/wwpn lookup."), - parent_name, vhba_parent, name); - goto cleanup; - } - - retval = true; - - cleanup: - VIR_FREE(vhba_parent); - VIR_FREE(scsi_host_name); - return retval; -} - - /** * @conn: Connection pointer * @fchost: Pointer to vHBA adapter @@ -2326,19 +2284,6 @@ virNodeDeviceCreateVport(virConnectPtr conn, VIR_DEBUG("conn=%p, parent='%s', wwnn='%s' wwpn='%s'", conn, NULLSTR(fchost->parent), fchost->wwnn, fchost->wwpn);
- /* If we find an existing HBA/vHBA within the fc_host sysfs - * using the wwnn/wwpn, then a nodedev is already created for - * this pool and we don't have to create the vHBA - */ - if ((name = virVHBAGetHostByWWN(NULL, fchost->wwnn, fchost->wwpn))) { - /* If a parent was provided, let's make sure the 'name' we've - * retrieved has the same parent. If not this will cause failure. */ - if (fchost->parent && checkParent(conn, name, fchost->parent)) - VIR_FREE(name); - - return name; - } - if (fchost->parent) { if (VIR_STRDUP(parent_hoststr, fchost->parent) < 0) goto cleanup; diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c index f7378d3..aa0fb99 100644 --- a/src/storage/storage_backend_scsi.c +++ b/src/storage/storage_backend_scsi.c @@ -211,6 +211,48 @@ getAdapterName(virStorageAdapterPtr adapter) }
+/* + * Using the host# name found via wwnn/wwpn lookup in the fc_host + * sysfs tree to get the parent 'scsi_host#' to ensure it matches. + */ +static bool +checkParent(virConnectPtr conn, + const char *name, + const char *parent_name) +{ + char *scsi_host_name = NULL; + char *vhba_parent = NULL; + bool retval = false; + + VIR_DEBUG("conn=%p, name=%s, parent_name=%s", conn, name, parent_name); + + /* autostarted pool - assume we're OK */ + if (!conn) + return true; + + if (virAsprintf(&scsi_host_name, "scsi_%s", name) < 0) + goto cleanup; + + if (!(vhba_parent = virNodeDeviceGetParentName(conn, scsi_host_name))) + goto cleanup; + + if (STRNEQ(parent_name, vhba_parent)) { + virReportError(VIR_ERR_XML_ERROR, + _("Parent attribute '%s' does not match parent '%s' " + "determined for the '%s' wwnn/wwpn lookup."), + parent_name, vhba_parent, name); + goto cleanup; + } + + retval = true; + + cleanup: + VIR_FREE(vhba_parent); + VIR_FREE(scsi_host_name); + return retval; +} + + static int createVport(virConnectPtr conn, virStoragePoolDefPtr def, @@ -226,6 +268,21 @@ createVport(virConnectPtr conn, conn, NULLSTR(configFile), NULLSTR(fchost->parent), fchost->wwnn, fchost->wwpn);
+ /* If we find an existing HBA/vHBA within the fc_host sysfs + * using the wwnn/wwpn, then a nodedev is already created for + * this pool and we don't have to create the vHBA + */ + if ((name = virVHBAGetHostByWWN(NULL, fchost->wwnn, fchost->wwpn))) { + ret = 0; + + /* If a parent was provided, let's make sure the 'name' we've + * retrieved has the same parent. If not this will cause failure. */ + if (fchost->parent && !checkParent(conn, name, fchost->parent)) + ret = -1; + + VIR_FREE(name); + return ret;
Firstly, this patch is trying to do 2 things at once - a partial revert of commit and then altering a check to fix a BZ, it's easier for a reviewer to both track and read, I also think it will potentially make anyone who's going to later look at the git history in the future happier. Secondly, @ret is already initialized to -1 when entering ^this block and I don't find it that much readable when you potentially set it twice and do something that is already handled in the cleanup section. Since we return 0 either when the vHBA exist AND no parent to be checked was provided or when both the name and the parent which satisfies the sanity check was provided I suggest squashing the following in instead: diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c index aa0fb992d..d75553f1b 100644 --- a/src/storage/storage_backend_scsi.c +++ b/src/storage/storage_backend_scsi.c @@ -273,15 +273,13 @@ createVport(virConnectPtr conn, * this pool and we don't have to create the vHBA */ if ((name = virVHBAGetHostByWWN(NULL, fchost->wwnn, fchost->wwpn))) { - ret = 0; /* If a parent was provided, let's make sure the 'name' we've * retrieved has the same parent. If not this will cause failure. */ - if (fchost->parent && !checkParent(conn, name, fchost->parent)) - ret = -1; + if (!fchost->parent || checkParent(conn, name, fchost->parent)) + ret = 0; - VIR_FREE(name); - return ret; + goto cleanup; } /* Since we're creating the vHBA, then we need to manage removing it Other than this, the patch looks reasonable, I'll try a few scenarios in the meantime. Erik
+ }
/* Since we're creating the vHBA, then we need to manage removing it * as well. Since we need this setting to "live" through a libvirtd

On 07/19/2017 07:38 AM, Erik Skultety wrote:
On Tue, Jul 18, 2017 at 11:10:38AM -0400, John Ferlan wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1472277
Commit id '106930aaa' altered the order of checking for an existing vHBA (e.g something created via nodedev-create functionality outside of the storage pool logic) which inadvertantly broke the code to decide whether to alter/force the fchost->managed field to be 'yes' because the storage pool will be managing the created vHBA in order to ensure when the storage pool is destroyed that the vHBA is also destroyed.
Right, I agree with this - unless the user explicitly states they want the pre-created vHBA to be managed, we don't force any setting. I was wondering though, what about a use case when the user wants the vHBA to be auto-created, but non-managed at the same time...(yeah, I know I'm stretching it a bit with these unlikely scenarios, but then, you'd surely agree, you've already seen some of similar sort and one can never expect what creative ways of defining the XML are there to be found :))
I think you're becoming the new vHBA expert ;-) In this case, I tell them to go see figure one or go read the docs. From the storage pool page, managed description: "For configurations that do not provide an already created vHBA from a 'virsh nodedev-create', libvirt will set this property to "yes"."
I was also about to point out in the previous version, that I didn't like how complex virNodeDeviceCreateVport was getting - do wwnn and wwpn relate to a vHBA at all or is it a regular HBA, does the vHBA exist already or should we actually create and manage it.
The wwnn/wwpn are the wwnn/wwpn used to create the vHBA. Typically, a SAN admin will "assign" the pair (there's some specific rules about them). If not provided for a storage pool, then it's an XML error. For a nodedev it's possible to have libvirt generate a wwnn/wwpn, which yes, is quite confusing. In doing so libvirt uses a specific base and adds to it (see virRandomGenerateWWN and cover at least one eye). I agree in general about the "complexity" thing, but as time has gone on requests for new ways to determine which parent to use has caused code bloat. Complexity is a time factored calculation. When originally implemented using "host#" for parent was perfectly fine, but then someone realized that it should be "scsi_host#". Then someone said, that "scsi_host#" wasn't good enough because it can change between reboots. So parent by wwnn/wwpn was added. During the discussions for that someone else said what about using the fabric_wwn in order to find a parent. Oh and the "future" holds being able to define multiple vHBA's because it's felt migration of domains using vHBA pools is going to need more than one vHBA because on the target host using the same wwnn/wwpn won't work (although I have doubts here, but haven't had the cycles to investigate). If you're interested, go read the tail end of the wiki (http://wiki.libvirt.org/page/NPIV_in_libvirt) - it'll show a bit of the history of how things looked at one time. TBH: Sometimes I think QE reads the code and figures out a way to create bugs based on assumptions the code makes rather than making sure the intended use cases "work properly". Hence this whole need to know whether the parent is the HBA or the vHBA. Why would *anyone* provide the HBA parent wwnn/wwpn when it's perfectly fine to create a storage pool of the HBA without it via: <adapter type='scsi_host' name='scsi_host3'/> but no, someone wants to be inventive and think/believe: <adapter type='fc_host' [parent='scsi_host3'] wwnn='HBA_wwnn' wwpn='HBA_wwpn'/> should work as well (where HBA_wwnn/wwpn are the wwnn/wwpn of scsi_host3 in this example). The second XML isn't illegal, but because scsi_host3 has vHBA characteristics (in this case the ability to create vports), thus it passes certain tests and "could" be a different, but legal way to essentially define the HBA as the storage pool. Wouldn't anyone use it this way - probably not as there's no reason since the vHBA LUN's won't be available. It's a digression + tirade ;-)...
This patch moves the check (and checkParent helper) for an existing vHBA back into the createVport in storage_backend_scsi. It also restores the logic back to what commit id '79ab0935' had with
As I'm writing below, if you want to go for a partial revert, this patch needs to be split in 2, it's never a good idea to move code segments and doing adjustments to it at the same time.
While I understand your point, I disagree. Purely moving the code doesn't work as the code in node_device_conf returns @name and code in storage_backend_scsi returns -1/0, so some amount of massaging the code is going to be required anyway. May as well get it right in one shot rather than confuse a future git bisector even more. I can change it to the snippet you show below, but I feel it should be all in one patch. Thanks for jumping in (and becoming the new expert) - John
respect to using "if (fchost->parent && !checkParent(...))" and returns immediately. The changes made by commit id '08c0ea16f' are only necessary to run the virStoragePoolFCRefreshThread when a vHBA was really created because there's a timing lag such that the refreshPool call made after a startPool from storagePoolCreate* wouldn't necessarily find LUNs, but the thread would. For an already existing vHBA, using the thread is unnecessary since the vHBA already exists and the lag to configure the LUNs wouldn't exist.
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/conf/node_device_conf.c | 55 ------------------------------------ src/storage/storage_backend_scsi.c | 57 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 57 insertions(+), 55 deletions(-)
diff --git a/src/conf/node_device_conf.c b/src/conf/node_device_conf.c index 503b129..9c0ffa5 100644 --- a/src/conf/node_device_conf.c +++ b/src/conf/node_device_conf.c @@ -2259,48 +2259,6 @@ virNodeDeviceGetParentName(virConnectPtr conn, }
-/* - * Using the host# name found via wwnn/wwpn lookup in the fc_host - * sysfs tree to get the parent 'scsi_host#' to ensure it matches. - */ -static bool -checkParent(virConnectPtr conn, - const char *name, - const char *parent_name) -{ - char *scsi_host_name = NULL; - char *vhba_parent = NULL; - bool retval = false; - - VIR_DEBUG("conn=%p, name=%s, parent_name=%s", conn, name, parent_name); - - /* autostarted pool - assume we're OK */ - if (!conn) - return true; - - if (virAsprintf(&scsi_host_name, "scsi_%s", name) < 0) - goto cleanup; - - if (!(vhba_parent = virNodeDeviceGetParentName(conn, scsi_host_name))) - goto cleanup; - - if (STRNEQ(parent_name, vhba_parent)) { - virReportError(VIR_ERR_XML_ERROR, - _("Parent attribute '%s' does not match parent '%s' " - "determined for the '%s' wwnn/wwpn lookup."), - parent_name, vhba_parent, name); - goto cleanup; - } - - retval = true; - - cleanup: - VIR_FREE(vhba_parent); - VIR_FREE(scsi_host_name); - return retval; -} - - /** * @conn: Connection pointer * @fchost: Pointer to vHBA adapter @@ -2326,19 +2284,6 @@ virNodeDeviceCreateVport(virConnectPtr conn, VIR_DEBUG("conn=%p, parent='%s', wwnn='%s' wwpn='%s'", conn, NULLSTR(fchost->parent), fchost->wwnn, fchost->wwpn);
- /* If we find an existing HBA/vHBA within the fc_host sysfs - * using the wwnn/wwpn, then a nodedev is already created for - * this pool and we don't have to create the vHBA - */ - if ((name = virVHBAGetHostByWWN(NULL, fchost->wwnn, fchost->wwpn))) { - /* If a parent was provided, let's make sure the 'name' we've - * retrieved has the same parent. If not this will cause failure. */ - if (fchost->parent && checkParent(conn, name, fchost->parent)) - VIR_FREE(name); - - return name; - } - if (fchost->parent) { if (VIR_STRDUP(parent_hoststr, fchost->parent) < 0) goto cleanup; diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c index f7378d3..aa0fb99 100644 --- a/src/storage/storage_backend_scsi.c +++ b/src/storage/storage_backend_scsi.c @@ -211,6 +211,48 @@ getAdapterName(virStorageAdapterPtr adapter) }
+/* + * Using the host# name found via wwnn/wwpn lookup in the fc_host + * sysfs tree to get the parent 'scsi_host#' to ensure it matches. + */ +static bool +checkParent(virConnectPtr conn, + const char *name, + const char *parent_name) +{ + char *scsi_host_name = NULL; + char *vhba_parent = NULL; + bool retval = false; + + VIR_DEBUG("conn=%p, name=%s, parent_name=%s", conn, name, parent_name); + + /* autostarted pool - assume we're OK */ + if (!conn) + return true; + + if (virAsprintf(&scsi_host_name, "scsi_%s", name) < 0) + goto cleanup; + + if (!(vhba_parent = virNodeDeviceGetParentName(conn, scsi_host_name))) + goto cleanup; + + if (STRNEQ(parent_name, vhba_parent)) { + virReportError(VIR_ERR_XML_ERROR, + _("Parent attribute '%s' does not match parent '%s' " + "determined for the '%s' wwnn/wwpn lookup."), + parent_name, vhba_parent, name); + goto cleanup; + } + + retval = true; + + cleanup: + VIR_FREE(vhba_parent); + VIR_FREE(scsi_host_name); + return retval; +} + + static int createVport(virConnectPtr conn, virStoragePoolDefPtr def, @@ -226,6 +268,21 @@ createVport(virConnectPtr conn, conn, NULLSTR(configFile), NULLSTR(fchost->parent), fchost->wwnn, fchost->wwpn);
+ /* If we find an existing HBA/vHBA within the fc_host sysfs + * using the wwnn/wwpn, then a nodedev is already created for + * this pool and we don't have to create the vHBA + */ + if ((name = virVHBAGetHostByWWN(NULL, fchost->wwnn, fchost->wwpn))) { + ret = 0; + + /* If a parent was provided, let's make sure the 'name' we've + * retrieved has the same parent. If not this will cause failure. */ + if (fchost->parent && !checkParent(conn, name, fchost->parent)) + ret = -1; + + VIR_FREE(name); + return ret;
Firstly, this patch is trying to do 2 things at once - a partial revert of commit and then altering a check to fix a BZ, it's easier for a reviewer to both track and read, I also think it will potentially make anyone who's going to later look at the git history in the future happier.
Secondly, @ret is already initialized to -1 when entering ^this block and I don't find it that much readable when you potentially set it twice and do something that is already handled in the cleanup section. Since we return 0 either when the vHBA exist AND no parent to be checked was provided or when both the name and the parent which satisfies the sanity check was provided I suggest squashing the following in instead:
diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c index aa0fb992d..d75553f1b 100644 --- a/src/storage/storage_backend_scsi.c +++ b/src/storage/storage_backend_scsi.c @@ -273,15 +273,13 @@ createVport(virConnectPtr conn, * this pool and we don't have to create the vHBA */ if ((name = virVHBAGetHostByWWN(NULL, fchost->wwnn, fchost->wwpn))) { - ret = 0;
/* If a parent was provided, let's make sure the 'name' we've * retrieved has the same parent. If not this will cause failure. */ - if (fchost->parent && !checkParent(conn, name, fchost->parent)) - ret = -1; + if (!fchost->parent || checkParent(conn, name, fchost->parent)) + ret = 0;
- VIR_FREE(name); - return ret; + goto cleanup; }
/* Since we're creating the vHBA, then we need to manage removing it
Other than this, the patch looks reasonable, I'll try a few scenarios in the meantime.
Erik
+ }
/* Since we're creating the vHBA, then we need to manage removing it * as well. Since we need this setting to "live" through a libvirtd

On Wed, Jul 19, 2017 at 09:59:06AM -0400, John Ferlan wrote:
On 07/19/2017 07:38 AM, Erik Skultety wrote:
On Tue, Jul 18, 2017 at 11:10:38AM -0400, John Ferlan wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1472277
Commit id '106930aaa' altered the order of checking for an existing vHBA (e.g something created via nodedev-create functionality outside of the storage pool logic) which inadvertantly broke the code to decide whether to alter/force the fchost->managed field to be 'yes' because the storage pool will be managing the created vHBA in order to ensure when the storage pool is destroyed that the vHBA is also destroyed.
Right, I agree with this - unless the user explicitly states they want the pre-created vHBA to be managed, we don't force any setting. I was wondering though, what about a use case when the user wants the vHBA to be auto-created, but non-managed at the same time...(yeah, I know I'm stretching it a bit with these unlikely scenarios, but then, you'd surely agree, you've already seen some of similar sort and one can never expect what creative ways of defining the XML are there to be found :))
I think you're becoming the new vHBA expert ;-)
In this case, I tell them to go see figure one or go read the docs. From the storage pool page, managed description: "For configurations that do not provide an already created vHBA from a 'virsh nodedev-create', libvirt will set this property to "yes"."
I was also about to point out in the previous version, that I didn't like how complex virNodeDeviceCreateVport was getting - do wwnn and wwpn relate to a vHBA at all or is it a regular HBA, does the vHBA exist already or should we actually create and manage it.
The wwnn/wwpn are the wwnn/wwpn used to create the vHBA. Typically, a SAN admin will "assign" the pair (there's some specific rules about them). If not provided for a storage pool, then it's an XML error. For a nodedev it's possible to have libvirt generate a wwnn/wwpn, which yes, is quite confusing. In doing so libvirt uses a specific base and adds to it (see virRandomGenerateWWN and cover at least one eye).
I agree in general about the "complexity" thing, but as time has gone on requests for new ways to determine which parent to use has caused code bloat. Complexity is a time factored calculation. When originally implemented using "host#" for parent was perfectly fine, but then someone realized that it should be "scsi_host#". Then someone said, that "scsi_host#" wasn't good enough because it can change between reboots. So parent by wwnn/wwpn was added. During the discussions for that someone else said what about using the fabric_wwn in order to find a parent. Oh and the "future" holds being able to define multiple vHBA's because it's felt migration of domains using vHBA pools is going to need more than one vHBA because on the target host using the same wwnn/wwpn won't work (although I have doubts here, but haven't had the cycles to investigate).
If you're interested, go read the tail end of the wiki (http://wiki.libvirt.org/page/NPIV_in_libvirt) - it'll show a bit of the history of how things looked at one time.
TBH: Sometimes I think QE reads the code and figures out a way to create bugs based on assumptions the code makes rather than making sure the intended use cases "work properly". Hence this whole need to know whether the parent is the HBA or the vHBA. Why would *anyone* provide the HBA parent wwnn/wwpn when it's perfectly fine to create a storage pool of the HBA without it via:
<adapter type='scsi_host' name='scsi_host3'/>
but no, someone wants to be inventive and think/believe:
<adapter type='fc_host' [parent='scsi_host3'] wwnn='HBA_wwnn' wwpn='HBA_wwpn'/>
should work as well (where HBA_wwnn/wwpn are the wwnn/wwpn of scsi_host3 in this example).
The second XML isn't illegal, but because scsi_host3 has vHBA
Well, I'd call it a misconfiguration, how can a device be a parent to itself? That's not what the attribute's supposed to serve and using it this way is a misuse - we should either ignore it in that case or return error. The storage pool won't start but it should never have in the first place and the XML def won't disappear in this case, so IMHO we could and probably should forbid it.
characteristics (in this case the ability to create vports), thus it passes certain tests and "could" be a different, but legal way to essentially define the HBA as the storage pool. Wouldn't anyone use it this way - probably not as there's no reason since the vHBA LUN's won't be available. It's a digression + tirade ;-)...
This patch moves the check (and checkParent helper) for an existing vHBA back into the createVport in storage_backend_scsi. It also restores the logic back to what commit id '79ab0935' had with
As I'm writing below, if you want to go for a partial revert, this patch needs to be split in 2, it's never a good idea to move code segments and doing adjustments to it at the same time.
While I understand your point, I disagree. Purely moving the code doesn't work as the code in node_device_conf returns @name and code in storage_backend_scsi returns -1/0, so some amount of massaging the code is going to be required anyway. May as well get it right in one shot rather than confuse a future git bisector even more.
Huh, you're right, let's go with your proposal then, with squashing in the tiny optimization snippet I've suggested in my previous response. Erik

On 07/19/2017 10:21 AM, Erik Skultety wrote:
On Wed, Jul 19, 2017 at 09:59:06AM -0400, John Ferlan wrote:
On 07/19/2017 07:38 AM, Erik Skultety wrote:
On Tue, Jul 18, 2017 at 11:10:38AM -0400, John Ferlan wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1472277
Commit id '106930aaa' altered the order of checking for an existing vHBA (e.g something created via nodedev-create functionality outside of the storage pool logic) which inadvertantly broke the code to decide whether to alter/force the fchost->managed field to be 'yes' because the storage pool will be managing the created vHBA in order to ensure when the storage pool is destroyed that the vHBA is also destroyed.
Right, I agree with this - unless the user explicitly states they want the pre-created vHBA to be managed, we don't force any setting. I was wondering though, what about a use case when the user wants the vHBA to be auto-created, but non-managed at the same time...(yeah, I know I'm stretching it a bit with these unlikely scenarios, but then, you'd surely agree, you've already seen some of similar sort and one can never expect what creative ways of defining the XML are there to be found :))
I think you're becoming the new vHBA expert ;-)
In this case, I tell them to go see figure one or go read the docs. From the storage pool page, managed description: "For configurations that do not provide an already created vHBA from a 'virsh nodedev-create', libvirt will set this property to "yes"."
I was also about to point out in the previous version, that I didn't like how complex virNodeDeviceCreateVport was getting - do wwnn and wwpn relate to a vHBA at all or is it a regular HBA, does the vHBA exist already or should we actually create and manage it.
The wwnn/wwpn are the wwnn/wwpn used to create the vHBA. Typically, a SAN admin will "assign" the pair (there's some specific rules about them). If not provided for a storage pool, then it's an XML error. For a nodedev it's possible to have libvirt generate a wwnn/wwpn, which yes, is quite confusing. In doing so libvirt uses a specific base and adds to it (see virRandomGenerateWWN and cover at least one eye).
I agree in general about the "complexity" thing, but as time has gone on requests for new ways to determine which parent to use has caused code bloat. Complexity is a time factored calculation. When originally implemented using "host#" for parent was perfectly fine, but then someone realized that it should be "scsi_host#". Then someone said, that "scsi_host#" wasn't good enough because it can change between reboots. So parent by wwnn/wwpn was added. During the discussions for that someone else said what about using the fabric_wwn in order to find a parent. Oh and the "future" holds being able to define multiple vHBA's because it's felt migration of domains using vHBA pools is going to need more than one vHBA because on the target host using the same wwnn/wwpn won't work (although I have doubts here, but haven't had the cycles to investigate).
If you're interested, go read the tail end of the wiki (http://wiki.libvirt.org/page/NPIV_in_libvirt) - it'll show a bit of the history of how things looked at one time.
TBH: Sometimes I think QE reads the code and figures out a way to create bugs based on assumptions the code makes rather than making sure the intended use cases "work properly". Hence this whole need to know whether the parent is the HBA or the vHBA. Why would *anyone* provide the HBA parent wwnn/wwpn when it's perfectly fine to create a storage pool of the HBA without it via:
<adapter type='scsi_host' name='scsi_host3'/>
but no, someone wants to be inventive and think/believe:
<adapter type='fc_host' [parent='scsi_host3'] wwnn='HBA_wwnn' wwpn='HBA_wwpn'/>
should work as well (where HBA_wwnn/wwpn are the wwnn/wwpn of scsi_host3 in this example).
The second XML isn't illegal, but because scsi_host3 has vHBA
Well, I'd call it a misconfiguration, how can a device be a parent to itself? That's not what the attribute's supposed to serve and using it this way is a misuse - we should either ignore it in that case or return error. The storage pool won't start but it should never have in the first place and the XML def won't disappear in this case, so IMHO we could and probably should forbid it.
Because it hasn't really been characterized as a misconfiguration previously. I doubt anyone outside of QE has ever done something like this as there's no reason to do so. If they want to use the HBA they'd use the 'scsi_host' format. IMO: something with 'fc_host' what uses the HBA wwnn/wwpn is misconfigured because it's not a vHBA then, but there's been no attempt to prohibit that configuration, hence the current mess. I'd be perfectly fine with turning this particular bz/check into - don't configure things this way because it's not how it's supposed to work. If you want a storage pool backed to an HBA, then use the scsi_host syntax. If you want a vHBA/NPIV then use the fc_host syntax.
characteristics (in this case the ability to create vports), thus it passes certain tests and "could" be a different, but legal way to essentially define the HBA as the storage pool. Wouldn't anyone use it this way - probably not as there's no reason since the vHBA LUN's won't be available. It's a digression + tirade ;-)...
This patch moves the check (and checkParent helper) for an existing vHBA back into the createVport in storage_backend_scsi. It also restores the logic back to what commit id '79ab0935' had with
As I'm writing below, if you want to go for a partial revert, this patch needs to be split in 2, it's never a good idea to move code segments and doing adjustments to it at the same time.
While I understand your point, I disagree. Purely moving the code doesn't work as the code in node_device_conf returns @name and code in storage_backend_scsi returns -1/0, so some amount of massaging the code is going to be required anyway. May as well get it right in one shot rather than confuse a future git bisector even more.
Huh, you're right, let's go with your proposal then, with squashing in the tiny optimization snippet I've suggested in my previous response.
Erik
Something I've done in my branch John

On Wed, Jul 19, 2017 at 11:09:00AM -0400, John Ferlan wrote:
On 07/19/2017 10:21 AM, Erik Skultety wrote:
On Wed, Jul 19, 2017 at 09:59:06AM -0400, John Ferlan wrote:
On 07/19/2017 07:38 AM, Erik Skultety wrote:
On Tue, Jul 18, 2017 at 11:10:38AM -0400, John Ferlan wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1472277
Commit id '106930aaa' altered the order of checking for an existing vHBA (e.g something created via nodedev-create functionality outside of the storage pool logic) which inadvertantly broke the code to decide whether to alter/force the fchost->managed field to be 'yes' because the storage pool will be managing the created vHBA in order to ensure when the storage pool is destroyed that the vHBA is also destroyed.
Right, I agree with this - unless the user explicitly states they want the pre-created vHBA to be managed, we don't force any setting. I was wondering though, what about a use case when the user wants the vHBA to be auto-created, but non-managed at the same time...(yeah, I know I'm stretching it a bit with these unlikely scenarios, but then, you'd surely agree, you've already seen some of similar sort and one can never expect what creative ways of defining the XML are there to be found :))
I think you're becoming the new vHBA expert ;-)
In this case, I tell them to go see figure one or go read the docs. From the storage pool page, managed description: "For configurations that do not provide an already created vHBA from a 'virsh nodedev-create', libvirt will set this property to "yes"."
I was also about to point out in the previous version, that I didn't like how complex virNodeDeviceCreateVport was getting - do wwnn and wwpn relate to a vHBA at all or is it a regular HBA, does the vHBA exist already or should we actually create and manage it.
The wwnn/wwpn are the wwnn/wwpn used to create the vHBA. Typically, a SAN admin will "assign" the pair (there's some specific rules about them). If not provided for a storage pool, then it's an XML error. For a nodedev it's possible to have libvirt generate a wwnn/wwpn, which yes, is quite confusing. In doing so libvirt uses a specific base and adds to it (see virRandomGenerateWWN and cover at least one eye).
I agree in general about the "complexity" thing, but as time has gone on requests for new ways to determine which parent to use has caused code bloat. Complexity is a time factored calculation. When originally implemented using "host#" for parent was perfectly fine, but then someone realized that it should be "scsi_host#". Then someone said, that "scsi_host#" wasn't good enough because it can change between reboots. So parent by wwnn/wwpn was added. During the discussions for that someone else said what about using the fabric_wwn in order to find a parent. Oh and the "future" holds being able to define multiple vHBA's because it's felt migration of domains using vHBA pools is going to need more than one vHBA because on the target host using the same wwnn/wwpn won't work (although I have doubts here, but haven't had the cycles to investigate).
If you're interested, go read the tail end of the wiki (http://wiki.libvirt.org/page/NPIV_in_libvirt) - it'll show a bit of the history of how things looked at one time.
TBH: Sometimes I think QE reads the code and figures out a way to create bugs based on assumptions the code makes rather than making sure the intended use cases "work properly". Hence this whole need to know whether the parent is the HBA or the vHBA. Why would *anyone* provide the HBA parent wwnn/wwpn when it's perfectly fine to create a storage pool of the HBA without it via:
<adapter type='scsi_host' name='scsi_host3'/>
but no, someone wants to be inventive and think/believe:
<adapter type='fc_host' [parent='scsi_host3'] wwnn='HBA_wwnn' wwpn='HBA_wwpn'/>
should work as well (where HBA_wwnn/wwpn are the wwnn/wwpn of scsi_host3 in this example).
The second XML isn't illegal, but because scsi_host3 has vHBA
Well, I'd call it a misconfiguration, how can a device be a parent to itself? That's not what the attribute's supposed to serve and using it this way is a misuse - we should either ignore it in that case or return error. The storage pool won't start but it should never have in the first place and the XML def won't disappear in this case, so IMHO we could and probably should forbid it.
Because it hasn't really been characterized as a misconfiguration previously. I doubt anyone outside of QE has ever done something like this as there's no reason to do so. If they want to use the HBA they'd use the 'scsi_host' format.
IMO: something with 'fc_host' what uses the HBA wwnn/wwpn is misconfigured because it's not a vHBA then, but there's been no attempt to prohibit that configuration, hence the current mess.
I'd be perfectly fine with turning this particular bz/check into - don't configure things this way because it's not how it's supposed to work. If you want a storage pool backed to an HBA, then use the scsi_host syntax. If you want a vHBA/NPIV then use the fc_host syntax.
After re-reading the docs, I'm quite convinced we should enforce the configuration, hopefully it would clean up the code significantly. Erik

[...]
Because it hasn't really been characterized as a misconfiguration previously. I doubt anyone outside of QE has ever done something like this as there's no reason to do so. If they want to use the HBA they'd use the 'scsi_host' format.
IMO: something with 'fc_host' what uses the HBA wwnn/wwpn is misconfigured because it's not a vHBA then, but there's been no attempt to prohibit that configuration, hence the current mess.
I'd be perfectly fine with turning this particular bz/check into - don't configure things this way because it's not how it's supposed to work. If you want a storage pool backed to an HBA, then use the scsi_host syntax. If you want a vHBA/NPIV then use the fc_host syntax.
After re-reading the docs, I'm quite convinced we should enforce the configuration, hopefully it would clean up the code significantly.
Erik
Not sure how much gets "cleaned up" as only the Create/Destroy storage pool code paths need to know. While I agree in principle that it's incorrect configuration, it has been allowed and removing something that was allowed at one time gets into a gray area. For the destroy path, we'd still have to be able to handle both since a storage pool could exist prior to disallowing the create path to work and we'd have no way to destroy the pool. For the create path, it's an adjustment "post" in patch3 to add checks for whether the parent or the parent found by wwnn/wwpn is an HBA, then disallow the startup. In any case, I'll post a new series with patch1 adjust as requested and patch 3 split to handle the two things - the first being checking the parent_name validity and the second to ensure the @name is not an HBA John

It's no longer needed since the checkParent code moved back to storage_backend_scsi.c Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/conf/node_device_conf.c | 8 +++----- src/conf/node_device_conf.h | 3 +-- src/storage/storage_backend_scsi.c | 2 +- 3 files changed, 5 insertions(+), 8 deletions(-) diff --git a/src/conf/node_device_conf.c b/src/conf/node_device_conf.c index 9c0ffa5..82f02fa 100644 --- a/src/conf/node_device_conf.c +++ b/src/conf/node_device_conf.c @@ -2260,7 +2260,6 @@ virNodeDeviceGetParentName(virConnectPtr conn, /** - * @conn: Connection pointer * @fchost: Pointer to vHBA adapter * * Create a vHBA for Storage. This code accomplishes this via searching @@ -2273,16 +2272,15 @@ virNodeDeviceGetParentName(virConnectPtr conn, * Returns vHBA name on success, NULL on failure with an error message set */ char * -virNodeDeviceCreateVport(virConnectPtr conn, - virStorageAdapterFCHostPtr fchost) +virNodeDeviceCreateVport(virStorageAdapterFCHostPtr fchost) { unsigned int parent_host; char *name = NULL; char *parent_hoststr = NULL; bool skip_capable_check = false; - VIR_DEBUG("conn=%p, parent='%s', wwnn='%s' wwpn='%s'", - conn, NULLSTR(fchost->parent), fchost->wwnn, fchost->wwpn); + VIR_DEBUG("parent='%s', wwnn='%s' wwpn='%s'", + NULLSTR(fchost->parent), fchost->wwnn, fchost->wwpn); if (fchost->parent) { if (VIR_STRDUP(parent_hoststr, fchost->parent) < 0) diff --git a/src/conf/node_device_conf.h b/src/conf/node_device_conf.h index d10683d..da56eaf 100644 --- a/src/conf/node_device_conf.h +++ b/src/conf/node_device_conf.h @@ -384,8 +384,7 @@ virNodeDeviceGetParentName(virConnectPtr conn, const char *nodedev_name); char * -virNodeDeviceCreateVport(virConnectPtr conn, - virStorageAdapterFCHostPtr fchost); +virNodeDeviceCreateVport(virStorageAdapterFCHostPtr fchost); int virNodeDeviceDeleteVport(virConnectPtr conn, diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c index aa0fb99..365ab77 100644 --- a/src/storage/storage_backend_scsi.c +++ b/src/storage/storage_backend_scsi.c @@ -297,7 +297,7 @@ createVport(virConnectPtr conn, } } - if (!(name = virNodeDeviceCreateVport(conn, fchost))) + if (!(name = virNodeDeviceCreateVport(fchost))) goto cleanup; /* Creating our own VPORT didn't leave enough time to find any LUN's, -- 2.9.4

https://bugzilla.redhat.com/show_bug.cgi?id=1458708 When originally designed in commit id '42a021c1', providing the 'parent' attribute to the <adapter type='fc_host' wwnn='vHBA_wwnn' wwpn='vHBA_wwpn'/> was checked to make sure that the "parent" of the wwnn/wwpn matched that of the provided parent "just in case" someone created the node device first, then defined the storage pool using that node device afterwards. The result is to not perform the vport_create call when the scsi_host represented by the wwnn/wwpn already exists since it would fail. Eventually someone came up with the brilliant idea to provide wwnn/wwpn of an HBA instead of a vHBA, e.g. <adapter type='fc_host' wwnn='HBA_wwnn' wwpn='HBA_wwpn'/>. This is the same as creating a storage pool backed to the HBA that just happens to also be vport (vHBA) capable. The logic to bypass the vport_create call was the same as the vHBA code since the wwn's already exist. Once that was determined to work, adding in the 'parent' attribute caused a problem for the DeleteVport code, which was fixed by commit id '2c8e30ee7e'. The next test tried was providing a valid pair of wwns that would find the scsi_host HBA, but providing the wrong name for the 'parent' attribute. This caused a different failure because at DeleteVport time if a parent is provided it was assumed valid especially since the CreateVport code would check that by calling virVHBAPathExists. So alter the checkParent code to first ensure that the provided parent_name is a valid HBA/vHBA, then check if the found scsi_host is an HBA or a vHBA and make the appropriate check against the parent_name similar to the check made in virNodeDeviceDeleteVport. Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/storage/storage_backend_scsi.c | 47 ++++++++++++++++++++++++++++++++------ 1 file changed, 40 insertions(+), 7 deletions(-) diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c index 365ab77..d225e4f 100644 --- a/src/storage/storage_backend_scsi.c +++ b/src/storage/storage_backend_scsi.c @@ -220,6 +220,7 @@ checkParent(virConnectPtr conn, const char *name, const char *parent_name) { + unsigned int host_num; char *scsi_host_name = NULL; char *vhba_parent = NULL; bool retval = false; @@ -230,20 +231,52 @@ checkParent(virConnectPtr conn, if (!conn) return true; - if (virAsprintf(&scsi_host_name, "scsi_%s", name) < 0) + if (virSCSIHostGetNumber(parent_name, &host_num) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("parent '%s' is not properly formatted"), name); goto cleanup; + } - if (!(vhba_parent = virNodeDeviceGetParentName(conn, scsi_host_name))) + if (!virVHBAPathExists(NULL, host_num)) { + virReportError(VIR_ERR_XML_ERROR, + ("parent '%s' is not a vHBA/HBA"), parent_name); goto cleanup; + } - if (STRNEQ(parent_name, vhba_parent)) { - virReportError(VIR_ERR_XML_ERROR, - _("Parent attribute '%s' does not match parent '%s' " - "determined for the '%s' wwnn/wwpn lookup."), - parent_name, vhba_parent, name); + if (virAsprintf(&scsi_host_name, "scsi_%s", name) < 0) + goto cleanup; + + if (virSCSIHostGetNumber(scsi_host_name, &host_num) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("host name '%s' is not properly formatted"), name); goto cleanup; } + /* If scsi_host_name is vport capable, then it's an HBA, thus compare + * only against the parent_name; otherwise, as long as the scsi_host_name + * path exists, then the scsi_host_name is a vHBA in which case we need + * to compare against it's parent. */ + if (virVHBAIsVportCapable(NULL, host_num)) { + if (STRNEQ(parent_name, scsi_host_name)) { + virReportError(VIR_ERR_XML_ERROR, + _("parent HBA '%s' doesn't match the wwnn/wwpn " + "scsi_host '%s'"), + parent_name, scsi_host_name); + goto cleanup; + } + } else { + if (!(vhba_parent = virNodeDeviceGetParentName(conn, scsi_host_name))) + goto cleanup; + + if (STRNEQ(parent_name, vhba_parent)) { + virReportError(VIR_ERR_XML_ERROR, + _("parent vHBA '%s' doesn't match the wwnn/wwpn " + "scsi_host '%s' parent '%s'"), + parent_name, scsi_host_name, vhba_parent); + goto cleanup; + } + } + retval = true; cleanup: -- 2.9.4

On Tue, Jul 18, 2017 at 11:10:40AM -0400, John Ferlan wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1458708
When originally designed in commit id '42a021c1', providing the 'parent' attribute to the <adapter type='fc_host' wwnn='vHBA_wwnn' wwpn='vHBA_wwpn'/> was checked to make sure that the "parent" of the wwnn/wwpn matched that of the provided parent "just in case" someone created the node device first, then defined the storage pool using that node device afterwards. The result is to not perform the vport_create call when the scsi_host represented by the wwnn/wwpn already exists since it would fail.
Okay, so we can both agree that these scenarios are rather unlikely. Why can't we just ignore the 'parent' attribute completely once we find out that a device with corresponding wwnn and wwpn already exist? Delete wouldn't be a problem in that case either, we do have a functional logic querying the parent automatically if none had been provided at the time of creation of the pool. Erik
Eventually someone came up with the brilliant idea to provide wwnn/wwpn of an HBA instead of a vHBA, e.g. <adapter type='fc_host' wwnn='HBA_wwnn' wwpn='HBA_wwpn'/>. This is the same as creating a storage pool backed to the HBA that just happens to also be vport (vHBA) capable. The logic to bypass the vport_create call was the same as the vHBA code since the wwn's already exist. Once that was determined to work, adding in the 'parent' attribute caused a problem for the DeleteVport code, which was fixed by commit id '2c8e30ee7e'.
The next test tried was providing a valid pair of wwns that would find the scsi_host HBA, but providing the wrong name for the 'parent' attribute. This caused a different failure because at DeleteVport time if a parent is provided it was assumed valid especially since the CreateVport code would check that by calling virVHBAPathExists.
So alter the checkParent code to first ensure that the provided parent_name is a valid HBA/vHBA, then check if the found scsi_host is an HBA or a vHBA and make the appropriate check against the parent_name similar to the check made in virNodeDeviceDeleteVport.

On 07/19/2017 08:42 AM, Erik Skultety wrote:
On Tue, Jul 18, 2017 at 11:10:40AM -0400, John Ferlan wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1458708
When originally designed in commit id '42a021c1', providing the 'parent' attribute to the <adapter type='fc_host' wwnn='vHBA_wwnn' wwpn='vHBA_wwpn'/> was checked to make sure that the "parent" of the wwnn/wwpn matched that of the provided parent "just in case" someone created the node device first, then defined the storage pool using that node device afterwards. The result is to not perform the vport_create call when the scsi_host represented by the wwnn/wwpn already exists since it would fail.
Okay, so we can both agree that these scenarios are rather unlikely. Why can't we just ignore the 'parent' attribute completely once we find out that a device with corresponding wwnn and wwpn already exist? Delete wouldn't be a problem in that case either, we do have a functional logic querying the parent automatically if none had been provided at the time of creation of the pool.
Erik
Nothing quickly sprang to mind until I read the storage pool section on @parent and remembered that it's useful to avoid multiple pools using the same source as the duplicate pool checks done as part of virStoragePoolObjSourceMatchTypeISCSI John
Eventually someone came up with the brilliant idea to provide wwnn/wwpn of an HBA instead of a vHBA, e.g. <adapter type='fc_host' wwnn='HBA_wwnn' wwpn='HBA_wwpn'/>. This is the same as creating a storage pool backed to the HBA that just happens to also be vport (vHBA) capable. The logic to bypass the vport_create call was the same as the vHBA code since the wwn's already exist. Once that was determined to work, adding in the 'parent' attribute caused a problem for the DeleteVport code, which was fixed by commit id '2c8e30ee7e'.
The next test tried was providing a valid pair of wwns that would find the scsi_host HBA, but providing the wrong name for the 'parent' attribute. This caused a different failure because at DeleteVport time if a parent is provided it was assumed valid especially since the CreateVport code would check that by calling virVHBAPathExists.
So alter the checkParent code to first ensure that the provided parent_name is a valid HBA/vHBA, then check if the found scsi_host is an HBA or a vHBA and make the appropriate check against the parent_name similar to the check made in virNodeDeviceDeleteVport.

diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c index 365ab77..d225e4f 100644 --- a/src/storage/storage_backend_scsi.c +++ b/src/storage/storage_backend_scsi.c @@ -220,6 +220,7 @@ checkParent(virConnectPtr conn, const char *name, const char *parent_name) { + unsigned int host_num; char *scsi_host_name = NULL; char *vhba_parent = NULL; bool retval = false; @@ -230,20 +231,52 @@ checkParent(virConnectPtr conn, if (!conn) return true;
- if (virAsprintf(&scsi_host_name, "scsi_%s", name) < 0) + if (virSCSIHostGetNumber(parent_name, &host_num) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("parent '%s' is not properly formatted"), name); goto cleanup; + }
- if (!(vhba_parent = virNodeDeviceGetParentName(conn, scsi_host_name))) + if (!virVHBAPathExists(NULL, host_num)) { + virReportError(VIR_ERR_XML_ERROR, + ("parent '%s' is not a vHBA/HBA"), parent_name);
Under what circumstances can parent be a ^^^^vHBA??
goto cleanup; + }
- if (STRNEQ(parent_name, vhba_parent)) { - virReportError(VIR_ERR_XML_ERROR, - _("Parent attribute '%s' does not match parent '%s' " - "determined for the '%s' wwnn/wwpn lookup."), - parent_name, vhba_parent, name); + if (virAsprintf(&scsi_host_name, "scsi_%s", name) < 0) + goto cleanup; + + if (virSCSIHostGetNumber(scsi_host_name, &host_num) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("host name '%s' is not properly formatted"), name); goto cleanup; }
+ /* If scsi_host_name is vport capable, then it's an HBA, thus compare + * only against the parent_name; otherwise, as long as the scsi_host_name + * path exists, then the scsi_host_name is a vHBA in which case we need + * to compare against it's parent. */ + if (virVHBAIsVportCapable(NULL, host_num)) {
I truly find it pointless to try to do any kind of parent validation on HBA device. Can a HBA device actually have a scsi_host parent? If not, then we should only validate whether the device given by its wwnn and wwpn exist (we already know that since we're here) is vport capable and then if the 'parent' attribute has been defined, then return an error about invalid configuration, rather than trying to compare parent names in that matter. Erik

On 07/19/2017 09:21 AM, Erik Skultety wrote:
diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c index 365ab77..d225e4f 100644 --- a/src/storage/storage_backend_scsi.c +++ b/src/storage/storage_backend_scsi.c @@ -220,6 +220,7 @@ checkParent(virConnectPtr conn, const char *name, const char *parent_name) { + unsigned int host_num; char *scsi_host_name = NULL; char *vhba_parent = NULL; bool retval = false; @@ -230,20 +231,52 @@ checkParent(virConnectPtr conn, if (!conn) return true;
- if (virAsprintf(&scsi_host_name, "scsi_%s", name) < 0) + if (virSCSIHostGetNumber(parent_name, &host_num) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("parent '%s' is not properly formatted"), name); goto cleanup; + }
- if (!(vhba_parent = virNodeDeviceGetParentName(conn, scsi_host_name))) + if (!virVHBAPathExists(NULL, host_num)) { + virReportError(VIR_ERR_XML_ERROR, + ("parent '%s' is not a vHBA/HBA"), parent_name);
Under what circumstances can parent be a ^^^^vHBA??
The function tests whether "/sys/class/fc_host/host%d" exists... On my system: ls /sys/class/fc_host host3 host4 host8 where host3 and host4 are HBA and host8 is vHBA The difference between them? host3/host4 have files "vport_create", "vport_delete", "max_npiv_vports", "npiv_vports_inuse", and "issue_lip". The XML for an HBA would be: <adapter type='fc_host' [parent='scsi_host3'] wwnn='HBA_wwnn' wwpn='HBA_wwpn'/> and the vHBA: <adapter type='fc_host' [parent='scsi_host3'] wwnn='vHBA_wwnn' wwpn='vHBA_wwpn'/> Both are legal XML.
goto cleanup; + }
- if (STRNEQ(parent_name, vhba_parent)) { - virReportError(VIR_ERR_XML_ERROR, - _("Parent attribute '%s' does not match parent '%s' " - "determined for the '%s' wwnn/wwpn lookup."), - parent_name, vhba_parent, name); + if (virAsprintf(&scsi_host_name, "scsi_%s", name) < 0) + goto cleanup; + + if (virSCSIHostGetNumber(scsi_host_name, &host_num) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("host name '%s' is not properly formatted"), name); goto cleanup; }
+ /* If scsi_host_name is vport capable, then it's an HBA, thus compare + * only against the parent_name; otherwise, as long as the scsi_host_name + * path exists, then the scsi_host_name is a vHBA in which case we need + * to compare against it's parent. */ + if (virVHBAIsVportCapable(NULL, host_num)) {
I truly find it pointless to try to do any kind of parent validation on HBA device. Can a HBA device actually have a scsi_host parent? If not, then we should only validate whether the device given by its wwnn and wwpn exist (we already know that since we're here) is vport capable and then if the 'parent' attribute has been defined, then return an error about invalid configuration, rather than trying to compare parent names in that matter.
Erik
vport capability is essentially the presence of "vport_create" in the /sys/class/fc_host/host# or /sys/class/scsi_host/host# path. An HBA has some sort of a PCI_* parent based on it's address. For this particular bug, someone provided a parent and it was wrong, so that's what's being checked. Although I agree the insanity of the checks for someone providing wrong data is annoying; however, if not checked then it was possible to create a storage pool backed to the HBA defined by the wwnn/wwpn provided; however, when it comes time to destroy it, the code ended up calling the VPORT_DELETE and obviously failing, causing the command to fail to remove the pool. And there was no way to remove the pool with the invalid parent name. So at startup, let's make sure if the parent name is provided that it's correct. These start checks essentially mimic the destroy tests. One could argue though that the failure paths in shutdown for virSCSIHostGetNumber and virNodeDeviceGetParentName should never happen, but assuming that is paramount to someone finding a way to make it happen. John
participants (2)
-
Erik Skultety
-
John Ferlan