
On 09/13/2016 04:49 PM, John Ferlan wrote:
[...]
I took a peek at the v1, but suffice to say didn't follow it that closely since there's a number of changes in v2 related to patch ordering. You could get 3 different opinions on that matter too! Hopefully a consensus before too long. :)
yeah - v1 was lots of little steps that you were asked to combine. OTOH there are those that like to combine multiple things in one patch which can make it really difficult to follow. The order here was "followable" - my suggestion was more to try and get the guts (qemu) done first, then the fluff (domain/xml).
Agreed. v1 was perhaps overkill in breaking things down so small, because I was picking up an idea with a bit of dust on it.
Consider your patch order to be making progress towards some sustainable feature/bug fix - each step along the way being able to pass "make check syntax-check" so that future attempts to git bisect can point the fickle finger of fate (at someone else!). If a patch adds something that will be external that a followup patch will use - then great. If it's testable, even better. For example, adding virSCSIOpenVhost as a separate patch with the tests/qemuxml2argvmock.c is fine. One more tidbit here - it's a very faint memory, but check out tests/commandtest.c and the lengthy comment in mymain() regarding "expected" fd values. I think you're "safe" in what you've done - although your mock should mimic whatever changes you make to the primary function.
[...]
A wwpn doesn't generally have the "naa." prefix on it. So, why not just add that to the command line instead? That is a wwpn is a 16 hex digit value. I'm assuming that in order to support this feature a "naa." is prefixed and sent to qemu. Does that necessarily mean some other hypervisor would choose to place the "naa." prefix or would then code need to be added to strip it off for those other hypervisors? Looking at the qemu code - it seems fairly specific. Fairly certain that it's more a vhost thing of which the hypervisor utilizes. Per a recommendation made on this list years ago, I used targetcli to do the configuration rather than direct manipulation of sysfs, and according to its man page:
All WWNs are prefaced by their type, such as "iqn", "naa", or "ib", and may be given with or without colons.
I don't use targetcli so I'm not familiar with it.
Still not convinced using naa.# in the XML is the best way to go, but I'm not against it. I guess a lot depends on how the sysfs views things.
As to whether other hypervisors follow the same behavior, I do not know. I was just trying to conform to what it would allow in the case of vhost-scsi. Now, I suppose I'm confusing matters by saying "wwpn" here since as you say that's 16 hex digits, and should probably say just "wwn" ?
I guess I've always thought of a wwn/wwpn/wwnn as that string of bytes...
I always thought WWPN and WWNN as types of WWN...
Usually WWN/WWNN have been interchangeable, while WWPN is a WWN assigned to a port in a Fibre Channel fabric
...but have seen it this way also. Either way makes sense to me.
FWIW: virsh nodedev-dumpxml on a vport capable HBA and the created vHBA will list a wwnn, wwpn, and fabric_wwn. The fabric_wwn's will match. The wwnn/wwpn for the vport is what was used when running the HBA's/vport_create function. It's a tangled web. Wait until vGPU's show up (that's another black magic trick).
[...]
It would be nice to tell a user how to find that wwpn value via some command (virsh or otherwise) in order to fill in the wwpn attribute. Although there's always varying opinions on this since many times it's distro dependent. That's why I suggest virsh - at least if it's supportable we can display the wwpn there using nodedev-dumpxml. Hrm... nodedev-dumpxml is new to me, neat! I'll go down that bunny trail, but it seems like it'd be a bit of work to add the smarts for this.
Hours of all sorts of fun.... The 'nodedev-{list|dumpxml}' are essentially the libvirt mechanism to list/describe the various rabbit holes that is/are the sysfs tree and especially branches. The nodedev-list --tree will probably be of most interest, but so is virsh nodedev-list {scsi_host|pci|usb|scsi|scsi_generic|scs_target|net}
What/where is the 'vhost_scsi_wwpn'? IOW: It's not something defined yet - I'm assuming there's some output somewhere where that's pulled from. As near as I can tell, it's not anything that can be autogenerated off the bat. Somebody somewhere needs to first establish the sysfs entries for the vhost target, which can then be stored and loaded on a next (host) boot. The end result looks something like this:
AFAIK systemd handles the sysfs creation (ok at least for my purposes).
This is what I was concerned about - we really need a way to tell someone how to "easily" find the wwpn that's to be used. Although I don't have vhost-scsi on my laptop ;-)
Me neither. Of course, I'm using an s390, so that's a little different too. :)
- I have to figure that by using 'ls -al' on the path from a 'virsh nodedev-dumpxml' of a scsi_hostN that is an HBA will show a <path> that path would then be a starting point...
Googling "scsi_host HBA wwn" shows me some interesting results including ones that seem to indicate 'fc_host' is involved in some way even for the HBA (I guess that makes sense - where the vHBA is just an abstraction off of that). That said, a '/sys/class/fc_host/' is the starting point then to find things and virsh nodedev-list --cap vports would be the way to get capable HBA's.
/sys/class/fc_host is just showing the individual SCSI disks, but nothing that looks to point to the vhost-scsi target that I'm passing to QEMU. /sys/class/fc_vports is empty. (Oh wait, is NPIV turned on here? Ugh, pick this up later.)
[root@xxxxxxxx ~]# ls -l /sys/kernel/config/target/vhost/naa.50014056e4794195/tpgt_1/lun/lun_0/ total 0 Is the 'vhost' or 'vhost/naa.%s' linked from some /sys/class/{scsi|fc}_host/host%d directory?
No, I can't see anybody linking it from anywhere except the couple cross links in /sys/kernel/config/ shown below.
I think that becomes the key for traversal. Or maybe finding "5001...4195" in some other link from /sys/class/{scsi|fc}_host/....
lrwxrwxrwx 1 root root 0 Sep 13 15:44 752e150d11 -> ../../../../../../target/core/iblock_0/disk0 -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_gp -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_offline -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_status -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_write_md drwxr-xr-x 5 root root 0 Sep 13 15:44 statistics [root@xxxxxxxx ~]# cat /sys/kernel/config/target/core/iblock_0/disk0/info Status: ACTIVATED Max Queue Depth: 0 SectorSize: 512 HwMaxSectors: 4592 iBlock device: dm-0 UDEV PATH: /dev/mapper/mpathb readonly: 0 Major: 252 Minor: 0 CLAIMED: IBLOCK
(Aside: note the folder name in /sys/kernel/config that contains the wwn/wwpn name with the "naa." prefix.)
I see... Then again "/sys/kernel" isn't the path used by nodedev driver, hence my question above.
Erg, so then the nodedev stuff becomes bigger still.
One final question - what is this exposed as on the guest? "Some" scsi_hostM? For a SCSI LUN it's some 'sgN' value. I see the qemu side seems to infer some amount of scsi_generic code (read_naa_id), so it's mostly curiosity/learning for me. Anything that's put behind the target gets seen to the guest. So that can be one or more sgN LUNs and their sdX equivalents, and as near as I can tell a scsi_hostM as you describe. A quick test with two LUNs behind one vhost-scsi target gives me the following in the guest:
# ls -l /sys/bus/scsi/devices total 0 lrwxrwxrwx 1 root root 0 Sep 13 16:10 0:0:1:0 -> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1/0:0:1:0 lrwxrwxrwx 1 root root 0 Sep 13 16:10 0:0:1:1 -> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1/0:0:1:1 lrwxrwxrwx 1 root root 0 Sep 13 16:10 host0 -> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0 lrwxrwxrwx 1 root root 0 Sep 13 16:11 target0:0:1 -> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1 # lsscsi [0:0:1:0] disk LIO-ORG disk0 4.0 /dev/sda [0:0:1:1] disk LIO-ORG disk1 4.0 /dev/sdb
FYI, hotplugging/unplugging devices to a target works just fine, except there's no notification that a guest can pick up on to know that a device had been added. Or worse, taken away.
Hmm. Maybe I'm just not asking the right question.
I guess my perception was that if you expose the 'scsi_hostN' HBA then you are giving the guest the whole HBA to use as it sees fit. The host shouldn't be touching the luns afterwards.
Correct, it's just not a "scsi_hostN" HBA that exists as part of the normal fibre attachment, but a vHBA that's defined to the vhost-scsi driver (via targetcli or some other manual handshaking).
What you've shown is 2 LUNS in the /sys/bus/scsi/ tree; however, is there a /sys/bus/scsi_host/hostN that corresponds to the passed through HBA?
Nope... (s390 host) # ls -l /sys/bus/ total 0 drwxr-xr-x 4 root root 0 Sep 13 23:12 ccw drwxr-xr-x 4 root root 0 Sep 13 23:12 clockevents drwxr-xr-x 4 root root 0 Sep 13 23:12 clocksource drwxr-xr-x 4 root root 0 Sep 13 23:12 container drwxr-xr-x 4 root root 0 Sep 13 23:12 cpu drwxr-xr-x 4 root root 0 Sep 13 23:12 css drwxr-xr-x 4 root root 0 Sep 13 23:12 etr drwxr-xr-x 4 root root 0 Sep 13 23:12 event_source drwxr-xr-x 4 root root 0 Sep 13 23:12 memory drwxr-xr-x 5 root root 0 Sep 13 23:12 pci drwxr-xr-x 4 root root 0 Sep 13 23:12 platform drwxr-xr-x 4 root root 0 Sep 13 23:12 scm drwxr-xr-x 4 root root 0 Sep 13 23:12 scsi drwxr-xr-x 4 root root 0 Sep 13 23:12 stp drwxr-xr-x 4 root root 0 Sep 13 23:12 virtio drwxr-xr-x 4 root root 0 Sep 13 23:12 workqueue
[...]
+ <attribute name="protocol"> + <choice> + <value>vhost</value> <!-- vhost, required --> + </choice> + </attribute> + <attribute name="wwpn"> + <data type="string"> + <param name="pattern">(naa\.)[0-9a-fA-F]{16}</param> + </data> If you go with the we'll add "naa." later, then you could just:
<ref name='wwn'/> This rings a bell with me, as to why I had the label be "wwpn" instead of "wwn" above. My attempt at trying to distinguish libvirt internals with something user-facing, maybe?
I think wwpn will still be technically correct (for some strange reason after skimming google results). The HBA will have a port which is how it's reached. I suppose the answer though is in how sysfs views things.
[...]
"pci", - "scsi") + "scsi", + "scsi_host") VIR_ENUM_IMPL(virDomainHostdevSubsysPCIBackend, VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_LAST, @@ -660,6 +661,10 @@ VIR_ENUM_IMPL(virDomainHostdevSubsysSCSIProtocol, "adapter", "iscsi") +VIR_ENUM_IMPL(virDomainHostdevSubsysHostProtocol, + VIR_DOMAIN_HOSTDEV_HOST_PROTOCOL_TYPE_LAST, + "vhost") +
Notice how the virDomainHostdevSubsysSCSIProtocol adds 'adapter' first - this should do something similar (although the preference is to use "none" - I cannot recall why that was different). Because "none" == "adapter" in the case of SCSI?
Some sort of RNG magic in commit id '54ac483e6' - I forget the details (too long ago), but I think it had to do with being able to reuse the existing RNG without creating a whole new option.
Fair enough.
[...]
hmmm... If you're going to change this here, then qemuDomainAttachHostDevice would be changed at the same time. Hence why this ordering stuff to patches is important.
I cannot recall if removing would cause a build error in this case. It does, because of the switch statement not having a "default" case.
Perhaps a different ordering of patches removes the need.... But I know it's tough with this new type... I guess it's odd the *RemoveHostDevice needed it, but the corollary Attach didn't... Ahhh - I see why - the switch in Attach doesn't have the typecast (virDomainHostdevSubsysType).
I can send a patch for that or you can make the modification too...
I have a patch started, but kept it separate from this series because I got distracted/hoped to get v2 out before the last freeze. I will dust that off.
[...]
Thanks for the review. (Silent ACK on a lot of the above comments.) I'll try to get these all in place for a v3 with a little runway before the next freeze.
OK - hopefully I won't be neck deep in some other firefight
Thanks for taking a few minutes away from those to look over this little pile. I appreciate the fires that pop up and get in the way of the usual todo list. :)
and will be able to give them quicker attention. At the very least getting the capabilities changes in which is generally the most conflict...
Apologies for that. I've already rebased/resolved the capability conflict that occurred between posting to the list and now, and moved that to the head of my series. Taking a day's holiday tomorrow, so will get back to the v3 series Thursday. - Eric