On 01/26/2018 04:07 AM, John Ferlan wrote:
> On 01/18/2018 11:04 AM, Michal Privoznik wrote:
>> This is a definition that holds information on SCSI persistent
>> reservation settings. The XML part looks like this:
>>
>>   <reservations enabled='yes' managed='no'>
>>     <source type='unix' path='/path/to/qemu-pr-helper.sock' mode='client'/>
>>   </reservations>
>>
>> If @managed is set to 'yes' then the <source/> is not parsed.
>> This design was agreed on here:
>>
>> https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html
>>
>> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
>> ---
>>  docs/formatdomain.html.in                          |  25 +++-
>>  docs/schemas/domaincommon.rng                      |  19 +--
>>  docs/schemas/storagecommon.rng                     |  34 +++++
>>  src/conf/domain_conf.c                             |  36 +++++
>>  src/libvirt_private.syms                           |   3 +
>>  src/util/virstoragefile.c                          | 148 +++++++++++++++++++++
>>  src/util/virstoragefile.h                          |  15 +++
>>  .../disk-virtio-scsi-reservations-not-managed.xml  |  40 ++++++
>>  .../disk-virtio-scsi-reservations.xml              |  38 ++++++
>>  .../disk-virtio-scsi-reservations-not-managed.xml  |   1 +
>>  .../disk-virtio-scsi-reservations.xml              |   1 +
>>  tests/qemuxml2xmltest.c                            |   4 +
>>  12 files changed, 348 insertions(+), 16 deletions(-)
>>  create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml
>>  create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml
>>  create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml
>>  create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml
>
> Before digging too deep into this...
>
>  - I assume we're avoiding <disk> iSCSI mainly because those
>    reservations would take place elsewhere, safe assumption?
I believe so, but I'll let Paolo answer that. The way I understand
reservations is that qemu needs to issue 'privileged' SCSI commands,
and thus for regular SCSI (which for the purpose of this argument
includes iSCSI emulated by the kernel) qemu either needs CAP_SYS_RAWIO
or a helper process to which it passes the FD and which then issues the
'privileged' SCSI commands on qemu's behalf.
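To illustrate just the FD-passing part, here's a minimal sketch of the
generic SCM_RIGHTS mechanism (this is NOT the actual qemu-pr-helper
protocol; the socket path and device path are made up):

/* Sketch: hand an already-opened device FD to a helper over a UNIX
 * socket using SCM_RIGHTS.  Paths are hypothetical. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <sys/socket.h>
#include <sys/un.h>

static int
send_fd(int sock, int fd)
{
    char dummy = 0;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    /* Union guarantees proper alignment of the control buffer. */
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;           /* transfer a file descriptor */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

int
main(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    int sock = socket(AF_UNIX, SOCK_STREAM, 0);
    int fd = open("/dev/sdb", O_RDWR);      /* hypothetical SCSI LUN */

    strncpy(addr.sun_path, "/path/to/qemu-pr-helper.sock",
            sizeof(addr.sun_path) - 1);
    if (sock < 0 || fd < 0 ||
        connect(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        send_fd(sock, fd) < 0) {
        perror("passing FD to helper failed");
        return 1;
    }
    /* The helper can now issue the 'privileged' SCSI commands on that
     * FD on qemu's behalf. */
    return 0;
}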
>  - What about using lun's from a storage pool (and what could become
>    your favorite, NPIV devices ;-))
>
>      <disk type='volume' device='lun'>
>        <driver name='qemu' type='raw'/>
>        <source pool='sourcepool' volume='unit:0:4:0'/>
>        <target dev='sda' bus='scsi'/>
>      </disk>
These should work too with my patches (not tested, though - I don't
have any real SCSI hardware).
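For instance, I'd expect something like this to be accepted (a sketch
only, untested; treat the placement of <reservations> inside the disk's
<source> as an assumption of this sketch):

  <disk type='volume' device='lun'>
    <driver name='qemu' type='raw'/>
    <source pool='sourcepool' volume='unit:0:4:0'>
      <reservations enabled='yes' managed='yes'/>
    </source>
    <target dev='sda' bus='scsi'/>
  </disk>

With managed='yes' no inner <source/> is needed; libvirt spawns the
helper itself.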
>  - What about <hostdev>'s?
>
>      <hostdev mode='subsystem' type='scsi'>
>
>    but not iSCSI or vHost hostdev's. I think that creates the SCSI
>    generic LUN, but it's been a while since I've thought about the
>    terminology used for hostdev's...
I think these don't need the feature since qemu can access the device
directly.
> I also have this faint recollection of PR related to sgio filtering
> and perhaps even rawio, but dredging that back up could send me down
> the path of some "downstream only" type bz's. Although searching on
> just VIR_DOMAIN_DISK_DEVICE_LUN does bring up qemuSetUnprivSGIO.
>
> And finally... I assume there is one qemu-pr-manager (qemu.conf
> changes eventually)... Eventually there's magic that allows/adds per
> domain *and* per LUN some socket path. If libvirt provided, it's
> generated via the domain temporary directory; however, what's not
> really clear is how that unmanaged path really works. Need a virtual
> whiteboard...
So, in the case of an unmanaged path, here are the assumptions that my
patches are built on:
1) the unmanaged helper process (UHP) is spawned by somebody other than
libvirtd (hence unmanaged) - it doesn't have to be a user, it can be
systemd for instance (see the sketch after this list).
2) the path to the UHP's socket has to be labeled correctly - libvirt
doesn't touch it, because it knows nothing about the usage scenario:
whether the sysadmin intended one UHP for the whole host and thus
configured the label that way, or whether one is spawned by the mgmt
app (or systemd, or whomever) per domain, or even per disk. Therefore,
we can do nothing more than shrug our shoulders and require users to
label the socket correctly. Or use the managed helper.
3) in the future, when a UHP dies, libvirt will NOT spawn it again.
It's unmanaged after all. It's the user's/sysadmin's responsibility to
spawn it again.
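To give an example of 1): systemd could own the socket and spawn the
helper on the first connection (just a sketch - the unit names and
socket path below are made up, and it assumes the helper supports
socket activation):

  # pr-helper.socket (hypothetical)
  [Socket]
  ListenStream=/run/pr-helper.sock
  SocketMode=0600

  [Install]
  WantedBy=sockets.target

  # pr-helper.service (hypothetical)
  [Service]
  ExecStart=/usr/bin/qemu-pr-helper

The domain XML would then point at /run/pr-helper.sock with
mode='client', and labeling that socket (point 2) stays the admin's
job.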
Now, for the managed helper process (MHP) the assumptions are:
1) there's one MHP per domain (all SCSI disks in the domain share the
same MHP).
2) the MHP runs as root, but is placed into the same CGroup and mount
namespace as the qemu process it serves.
3) the MHP lives and dies with the domain it is associated with.
The code might be more complicated than needed - it is prepared to have
one MHP per disk rather than per domain (should we ever need that).
Therefore, instead of storing a single pid_t, we store the PIDs in a
hash table which can hold more of them.
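Conceptually it's nothing more than this stand-alone toy (libvirt of
course uses its internal hash table implementation; the key names here
are made up):

/* Toy model of the per-domain vs. per-disk helper PID bookkeeping. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define MAX_HELPERS 8

typedef struct {
    char alias[32];   /* disk alias, or a shared key for the per-domain MHP */
    pid_t pid;        /* PID of the helper serving that key */
} prHelperEntry;

static prHelperEntry helpers[MAX_HELPERS];
static size_t nhelpers;

static int
prHelperAdd(const char *alias, pid_t pid)
{
    if (nhelpers == MAX_HELPERS)
        return -1;
    snprintf(helpers[nhelpers].alias, sizeof(helpers[nhelpers].alias),
             "%s", alias);
    helpers[nhelpers].pid = pid;
    nhelpers++;
    return 0;
}

static pid_t
prHelperLookup(const char *alias)
{
    size_t i;
    for (i = 0; i < nhelpers; i++) {
        if (strcmp(helpers[i].alias, alias) == 0)
            return helpers[i].pid;
    }
    return -1;
}

int
main(void)
{
    /* Today: one MHP per domain, stored under a single shared key. */
    prHelperAdd("pr-helper-default", 12345);

    /* Later (should we ever need it): one helper per disk alias. */
    prHelperAdd("scsi0-0-0-0", 12346);

    printf("domain helper pid: %d\n", (int)prHelperLookup("pr-helper-default"));
    printf("sda helper pid:    %d\n", (int)prHelperLookup("scsi0-0-0-0"));
    return 0;
}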
Michal