On 09/13/2016 04:49 PM, John Ferlan wrote:
[...]
>> I took a peek at the v1, but suffice to say didn't follow it that
>> closely since there's a number of changes in v2 related to patch
>> ordering. You could get 3 different opinions on that matter too!
> Hopefully a consensus before too long. :)
>
yeah - v1 was lots of little steps that you were asked to combine. OTOH
there are those that like to combine multiple things in one patch which
can make it really difficult to follow. The order here was "followable"
- my suggestion was more to try and get the guts (qemu) done first, then
the fluff (domain/xml).
Agreed. v1 was perhaps overkill in breaking things down so small,
because I was picking up an idea with a bit of dust on it.
Consider your patch order to be making progress towards some sustainable
feature/bug fix - each step along the way being able to pass "make check
syntax-check" so that future attempts to git bisect can point the fickle
finger of fate (at someone else!). If a patch adds something that will
be external that a followup patch will use - then great. If it's
testable, even better. For example, adding virSCSIOpenVhost as a
separate patch with the tests/qemuxml2argvmock.c is fine. One more
tidbit here - it's a very faint memory, but check out
tests/commandtest.c and the lengthy comment in mymain() regarding
"expected" fd values. I think you're "safe" in what you've
done -
although your mock should mimic whatever changes you make to the primary
function.
[...]
>> A wwpn doesn't generally have the "naa." prefix on it. So, why not
just
>> add that to the command line instead? That is a wwpn is a 16 hex digit
>> value. I'm assuming that in order to support this feature a "naa."
is
>> prefixed and sent to qemu. Does that necessarily mean some other
>> hypervisor would choose to place the "naa." prefix or would then code
>> need to be added to strip it off for those other hypervisors? Looking
>> at the qemu code - it seems fairly specific.
> Fairly certain that it's more a vhost thing of which the hypervisor
> utilizes. Per a recommendation made on this list years ago, I used
> targetcli to do the configuration rather than direct manipulation of
> sysfs, and according to its man page:
>
> All WWNs are prefaced by their type, such as "iqn", "naa",
or "ib",
> and may be given with or without colons.
>
I don't use targetcli so I'm not familiar with it.
Still not convinced using naa.# in the XML is the best way to go, but
I'm not against it. I guess a lot depends on how the sysfs views things.
> As to whether other hypervisors follow the same behavior, I do not
> know. I was just trying to conform to what it would allow in the case
> of vhost-scsi. Now, I suppose I'm confusing matters by saying "wwpn"
> here since as you say that's 16 hex digits, and should probably say just
> "wwn" ?
>
I guess I've always thought of a wwn/wwpn/wwnn as that string of
bytes...
I always thought WWPN and WWNN as types of WWN...
Usually WWN/WWNN have been interchangeable, while WWPN is a
WWN assigned to a port in a Fibre Channel fabric
...but have seen it this way also. Either way makes sense to me.
FWIW: virsh nodedev-dumpxml on a vport capable HBA and the created vHBA
will list a wwnn, wwpn, and fabric_wwn. The fabric_wwn's will match.
The wwnn/wwpn for the vport is what was used when running the
HBA's/vport_create function. It's a tangled web. Wait until vGPU's
show up (that's another black magic trick).
[...]
>> It would be nice to tell a user how to find that wwpn value via some
>> command (virsh or otherwise) in order to fill in the wwpn attribute.
>> Although there's always varying opinions on this since many times it's
>> distro dependent. That's why I suggest virsh - at least if it's
>> supportable we can display the wwpn there using nodedev-dumpxml.
> Hrm... nodedev-dumpxml is new to me, neat! I'll go down that bunny
> trail, but it seems like it'd be a bit of work to add the smarts for this.
>
Hours of all sorts of fun.... The 'nodedev-{list|dumpxml}' are
essentially the libvirt mechanism to list/describe the various rabbit
holes that is/are the sysfs tree and especially branches. The
nodedev-list --tree will probably be of most interest, but so is virsh
nodedev-list {scsi_host|pci|usb|scsi|scsi_generic|scs_target|net}
>> What/where is the 'vhost_scsi_wwpn'? IOW: It's not something
defined
>> yet - I'm assuming there's some output somewhere where that's pulled
>> from.
> As near as I can tell, it's not anything that can be autogenerated off
> the bat. Somebody somewhere needs to first establish the sysfs entries
> for the vhost target, which can then be stored and loaded on a next
> (host) boot. The end result looks something like this:
>
AFAIK systemd handles the sysfs creation (ok at least for my purposes).
This is what I was concerned about - we really need a way to tell
someone how to "easily" find the wwpn that's to be used. Although I
don't have vhost-scsi on my laptop ;-)
Me neither. Of course, I'm using an s390, so that's a little different
too. :)
- I have to figure that by using
'ls -al' on the path from a 'virsh nodedev-dumpxml' of a scsi_hostN that
is an HBA will show a <path> that path would then be a starting point...
Googling "scsi_host HBA wwn" shows me some interesting results including
ones that seem to indicate 'fc_host' is involved in some way even for
the HBA (I guess that makes sense - where the vHBA is just an
abstraction off of that). That said, a '/sys/class/fc_host/' is the
starting point then to find things and virsh nodedev-list --cap vports
would be the way to get capable HBA's.
/sys/class/fc_host is just showing the individual SCSI disks, but
nothing that looks to point to the vhost-scsi target that I'm passing to
QEMU. /sys/class/fc_vports is empty. (Oh wait, is NPIV turned on
here? Ugh, pick this up later.)
> [root@xxxxxxxx ~]# ls -l
> /sys/kernel/config/target/vhost/naa.50014056e4794195/tpgt_1/lun/lun_0/
> total 0
Is the 'vhost' or 'vhost/naa.%s' linked from some
/sys/class/{scsi|fc}_host/host%d directory?
No, I can't see anybody linking it from anywhere except the couple cross
links in /sys/kernel/config/ shown below.
I think that becomes the
key for traversal. Or maybe finding "5001...4195" in some other link
from /sys/class/{scsi|fc}_host/....
> lrwxrwxrwx 1 root root 0 Sep 13 15:44 752e150d11 ->
> ../../../../../../target/core/iblock_0/disk0
> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_gp
> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_offline
> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_status
> -rw-r--r-- 1 root root 4096 Sep 13 15:45 alua_tg_pt_write_md
> drwxr-xr-x 5 root root 0 Sep 13 15:44 statistics
> [root@xxxxxxxx ~]# cat /sys/kernel/config/target/core/iblock_0/disk0/info
> Status: ACTIVATED Max Queue Depth: 0 SectorSize: 512 HwMaxSectors: 4592
> iBlock device: dm-0 UDEV PATH: /dev/mapper/mpathb readonly: 0
> Major: 252 Minor: 0 CLAIMED: IBLOCK
>
> (Aside: note the folder name in /sys/kernel/config that contains the
> wwn/wwpn name with the "naa." prefix.)
>
>
I see... Then again "/sys/kernel" isn't the path used by nodedev driver,
hence my question above.
Erg, so then the nodedev stuff becomes bigger still.
>> One final question - what is this exposed as on the guest? "Some"
>> scsi_hostM? For a SCSI LUN it's some 'sgN' value. I see the qemu
side
>> seems to infer some amount of scsi_generic code (read_naa_id), so it's
>> mostly curiosity/learning for me.
> Anything that's put behind the target gets seen to the guest. So that
> can be one or more sgN LUNs and their sdX equivalents, and as near as I
> can tell a scsi_hostM as you describe. A quick test with two LUNs
> behind one vhost-scsi target gives me the following in the guest:
>
> # ls -l /sys/bus/scsi/devices
> total 0
> lrwxrwxrwx 1 root root 0 Sep 13 16:10 0:0:1:0 ->
> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1/0:0:1:0
> lrwxrwxrwx 1 root root 0 Sep 13 16:10 0:0:1:1 ->
> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1/0:0:1:1
> lrwxrwxrwx 1 root root 0 Sep 13 16:10 host0 ->
> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0
> lrwxrwxrwx 1 root root 0 Sep 13 16:11 target0:0:1 ->
> ../../../devices/css0/0.0.0002/0.0.beef/virtio2/host0/target0:0:1
> # lsscsi
> [0:0:1:0] disk LIO-ORG disk0 4.0 /dev/sda
> [0:0:1:1] disk LIO-ORG disk1 4.0 /dev/sdb
>
> FYI, hotplugging/unplugging devices to a target works just fine, except
> there's no notification that a guest can pick up on to know that a
> device had been added. Or worse, taken away.
>
Hmm. Maybe I'm just not asking the right question.
I guess my perception was that if you expose the 'scsi_hostN' HBA then
you are giving the guest the whole HBA to use as it sees fit. The host
shouldn't be touching the luns afterwards.
Correct, it's just not a "scsi_hostN" HBA that exists as part of the
normal fibre attachment, but a vHBA that's defined to the vhost-scsi
driver (via targetcli or some other manual handshaking).
What you've shown is 2 LUNS in the /sys/bus/scsi/ tree; however, is
there a /sys/bus/scsi_host/hostN that corresponds to the passed through
HBA?
Nope... (s390 host)
# ls -l /sys/bus/
total 0
drwxr-xr-x 4 root root 0 Sep 13 23:12 ccw
drwxr-xr-x 4 root root 0 Sep 13 23:12 clockevents
drwxr-xr-x 4 root root 0 Sep 13 23:12 clocksource
drwxr-xr-x 4 root root 0 Sep 13 23:12 container
drwxr-xr-x 4 root root 0 Sep 13 23:12 cpu
drwxr-xr-x 4 root root 0 Sep 13 23:12 css
drwxr-xr-x 4 root root 0 Sep 13 23:12 etr
drwxr-xr-x 4 root root 0 Sep 13 23:12 event_source
drwxr-xr-x 4 root root 0 Sep 13 23:12 memory
drwxr-xr-x 5 root root 0 Sep 13 23:12 pci
drwxr-xr-x 4 root root 0 Sep 13 23:12 platform
drwxr-xr-x 4 root root 0 Sep 13 23:12 scm
drwxr-xr-x 4 root root 0 Sep 13 23:12 scsi
drwxr-xr-x 4 root root 0 Sep 13 23:12 stp
drwxr-xr-x 4 root root 0 Sep 13 23:12 virtio
drwxr-xr-x 4 root root 0 Sep 13 23:12 workqueue
[...]
>>> + <attribute name="protocol">
>>> + <choice>
>>> + <value>vhost</value> <!-- vhost, required
-->
>>> + </choice>
>>> + </attribute>
>>> + <attribute name="wwpn">
>>> + <data type="string">
>>> + <param
name="pattern">(naa\.)[0-9a-fA-F]{16}</param>
>>> + </data>
>> If you go with the we'll add "naa." later, then you could just:
>>
>> <ref name='wwn'/>
> This rings a bell with me, as to why I had the label be "wwpn" instead
> of "wwn" above. My attempt at trying to distinguish libvirt internals
> with something user-facing, maybe?
>
I think wwpn will still be technically correct (for some strange reason
after skimming google results). The HBA will have a port which is how
it's reached. I suppose the answer though is in how sysfs views things.
[...]
>>> "pci",
>>> - "scsi")
>>> + "scsi",
>>> + "scsi_host")
>>> VIR_ENUM_IMPL(virDomainHostdevSubsysPCIBackend,
>>> VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_LAST,
>>> @@ -660,6 +661,10 @@ VIR_ENUM_IMPL(virDomainHostdevSubsysSCSIProtocol,
>>> "adapter",
>>> "iscsi")
>>> +VIR_ENUM_IMPL(virDomainHostdevSubsysHostProtocol,
>>> + VIR_DOMAIN_HOSTDEV_HOST_PROTOCOL_TYPE_LAST,
>>> + "vhost")
>>> +
>> Notice how the virDomainHostdevSubsysSCSIProtocol adds 'adapter' first -
>> this should do something similar (although the preference is to use
>> "none" - I cannot recall why that was different).
> Because "none" == "adapter" in the case of SCSI?
>
Some sort of RNG magic in commit id '54ac483e6' - I forget the details
(too long ago), but I think it had to do with being able to reuse the
existing RNG without creating a whole new option.
Fair enough.
[...]
>> hmmm... If you're going to change this here, then
>> qemuDomainAttachHostDevice would be changed at the same time. Hence why
>> this ordering stuff to patches is important.
>>
>> I cannot recall if removing would cause a build error in this case.
> It does, because of the switch statement not having a "default" case.
>
Perhaps a different ordering of patches removes the need.... But I know
it's tough with this new type... I guess it's odd the *RemoveHostDevice
needed it, but the corollary Attach didn't... Ahhh - I see why - the
switch in Attach doesn't have the typecast (virDomainHostdevSubsysType).
I can send a patch for that or you can make the modification too...
I have a patch started, but kept it separate from this series because I
got distracted/hoped to get v2 out before the last freeze. I will dust
that off.
[...]
> Thanks for the review. (Silent ACK on a lot of the above comments.)
> I'll try to get these all in place for a v3 with a little runway before
> the next freeze.
>
OK - hopefully I won't be neck deep in some other firefight
Thanks for taking a few minutes away from those to look over this little
pile. I appreciate the fires that pop up and get in the way of the
usual todo list. :)
and will be
able to give them quicker attention. At the very least getting the
capabilities changes in which is generally the most conflict...
Apologies for that. I've already rebased/resolved the capability
conflict that occurred between posting to the list and now, and moved
that to the head of my series. Taking a day's holiday tomorrow, so will
get back to the v3 series Thursday.
- Eric