On 29/01/2018 14:03, Michal Privoznik wrote:
On 01/26/2018 04:07 AM, John Ferlan wrote:
>
>
> On 01/18/2018 11:04 AM, Michal Privoznik wrote:
>> This is a definition that holds information on SCSI persistent
>> reservation settings. The XML part looks like this:
>>
>> <reservations enabled='yes' managed='no'>
>>   <source type='unix' path='/path/to/qemu-pr-helper.sock' mode='client'/>
>> </reservations>
>>
>> If @managed is set to 'yes' then the <source/> is not parsed.
>> This design was agreed on here:
>>
>> https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html
>>
>> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
>> ---
>> docs/formatdomain.html.in | 25 +++-
>> docs/schemas/domaincommon.rng | 19 +--
>> docs/schemas/storagecommon.rng | 34 +++++
>> src/conf/domain_conf.c | 36 +++++
>> src/libvirt_private.syms | 3 +
>> src/util/virstoragefile.c | 148 +++++++++++++++++++++
>> src/util/virstoragefile.h | 15 +++
>> .../disk-virtio-scsi-reservations-not-managed.xml | 40 ++++++
>> .../disk-virtio-scsi-reservations.xml | 38 ++++++
>> .../disk-virtio-scsi-reservations-not-managed.xml | 1 +
>> .../disk-virtio-scsi-reservations.xml | 1 +
>> tests/qemuxml2xmltest.c | 4 +
>> 12 files changed, 348 insertions(+), 16 deletions(-)
>> create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml
>> create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml
>> create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml
>> create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml
>>
>
> Before digging too deep into this...
>
> - I assume we're avoiding <disk> iSCSI mainly because those
> reservations would take place elsewhere, safe assumption?
I believe so, but I'll let Paolo answer that. The way I understand
reservations is that qemu needs to issue 'privileged' SCSI commands and
thus for regular SCSI (which for the purpose of this argument includes
iSCSI emulated by the kernel) either qemu needs CAP_SYS_RAWIO or a helper
process to which it'll pass the FD and which will issue the 'privileged'
SCSI commands on qemu's behalf.
Yes. There are two reasons for QEMU to access the helper. First, in
order to be able to issue the command without CAP_SYS_RAWIO. Second, in
order to access /dev/mapper/control and issue the command to all targets
in a multipath setup.
iSCSI in the kernel, including multipath over iSCSI, is included. iSCSI
in userspace does not need qemu-pr-manager because QEMU 1) can just send
the command down a TCP socket without needing CAP_SYS_RAWIO and 2) does
not support multipath for iSCSI in userspace.
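
To make the two reasons above concrete, here is a minimal sketch of the
decision in Python. It is not libvirt or QEMU code; the Transport enum,
the needs_pr_helper() name and the qemu_has_rawio flag are all made up
for illustration, and the multipath rule is only my reading of the
/dev/mapper/control point.

```python
from enum import Enum


class Transport(Enum):
    # Hypothetical labels, not identifiers from libvirt or QEMU.
    LOCAL_SCSI = "local-scsi"            # regular SCSI device
    ISCSI_KERNEL = "iscsi-kernel"        # iSCSI handled by the kernel
    MULTIPATH = "multipath"              # device-mapper multipath
    ISCSI_USERSPACE = "iscsi-userspace"  # iSCSI handled inside QEMU


def needs_pr_helper(transport, qemu_has_rawio=False):
    """Return True if persistent reservations would need the helper."""
    if transport is Transport.ISCSI_USERSPACE:
        # The command goes down a TCP socket: no CAP_SYS_RAWIO needed,
        # and there is no multipath in this mode.
        return False
    if transport is Transport.MULTIPATH:
        # Assumption: the helper is what talks to /dev/mapper/control
        # and fans the command out to all targets.
        return True
    # Kernel-backed SCSI: either QEMU holds CAP_SYS_RAWIO itself or the
    # helper issues the privileged command on its behalf.
    return not qemu_has_rawio


if __name__ == "__main__":
    for transport in Transport:
        print(transport.value, needs_pr_helper(transport))
```

In practice QEMU runs without CAP_SYS_RAWIO, so the qemu_has_rawio=True
branch is only there to mirror the "either ... or" wording above.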
> - What about using LUNs from a storage pool (and what could become
> your favorite, NPIV devices ;-))
>
> <disk type='volume' device='lun'>
> <driver name='qemu' type='raw'/>
> <source pool='sourcepool' volume='unit:0:4:0'/>
> <target dev='sda' bus='scsi'/>
> </disk>
These should work too with my patches (not tested though - I don't have
any real SCSI machine).
> - What about <hostdev>'s?
>
> <hostdev mode='subsystem' type='scsi'>
>
> but not iSCSI or vHost hostdev's. I think that creates the SCSI
> generic LUN, but it's been a while since I've thought about the
> terminology used for hostdev's...
I think these don't need the feature since qemu can access the device
directly.
They actually need the feature, but it can be added later.
> And finally... I assume there is one qemu-pr-manager (qemu.conf changes
> eventually)... Eventually there's magic that allows/adds per domain
> *and* per LUN some socket path. If libvirt provides it, it's generated
> in the domain's temporary directory; however, what's not really clear is
> how that unmanaged path really works. Need a virtual whiteboard...
So, in the case of an unmanaged path, here are the assumptions that my
patches are built on:
1) the unmanaged helper process (UHP) is spawned by somebody other than
libvirtd (hence unmanaged) - it doesn't have to be the user, it can be
systemd for instance.
2) the path to the UHP's socket has to be labeled correctly - libvirt
doesn't touch that
3) in the future, when the UHP dies, libvirt will NOT spawn it again.
It's unmanaged after all. It's the user's/sysadmin's responsibility to
spawn it again.
Correct.
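
For the whiteboard, a minimal sketch of how the managed/unmanaged split
maps onto the XML from the commit message. This is plain Python with
ElementTree, not the parser in the patch; parse_reservations() and the
returned dict are hypothetical, and requiring <source/> whenever
reservations are enabled and unmanaged is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

# The example element from the commit message.
EXAMPLE = """
<reservations enabled='yes' managed='no'>
  <source type='unix' path='/path/to/qemu-pr-helper.sock' mode='client'/>
</reservations>
"""


def parse_reservations(xml_text):
    """Parse a <reservations/> element into a small dict (illustrative)."""
    node = ET.fromstring(xml_text)
    if node.tag != "reservations":
        raise ValueError("expected a <reservations> element")

    result = {
        "enabled": node.get("enabled") == "yes",
        "managed": node.get("managed") == "yes",
        "path": None,
    }

    # If @managed is 'yes' the <source/> is not parsed at all.
    if result["managed"]:
        return result

    source = node.find("source")
    if source is None:
        # Assumption: an enabled, unmanaged helper must name its socket.
        if result["enabled"]:
            raise ValueError("unmanaged reservations need a <source/>")
        return result

    result["path"] = source.get("path")
    return result


if __name__ == "__main__":
    print(parse_reservations(EXAMPLE))
```

With managed='yes' the same function returns path=None even if a
<source/> is present, which matches the "not parsed" rule.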
Now, for the managed helper process (MHP) the assumptions are:
1) there's one MHP per domain (all SCSI disks in the domain share the
same MHP).
2) the MHP runs as root, but is placed into the same cgroup and mount
namespace as the qemu process it serves
3) the MHP lives and dies with the domain it is associated with.
Correct, with the caveat that QEMU must provide the MHP state and death
event for this to be complete.
Thanks,
Paolo
The code might be more complicated than needed - it is prepared to have
one MHP per disk rather than per domain (should we ever need it).
Therefore, instead of storing one pid_t, we store them in a hash table
where more can be stored.
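
Roughly, the idea looks like the sketch below. It is Python rather than
the C in the patch, and the HelperRegistry class, its method names and
the "domain" key are invented here purely to show the shape of the data
structure.

```python
class HelperRegistry:
    """Maps a key to the PID of a managed helper process (illustrative)."""

    # Hypothetical key used while there is a single helper per domain.
    DOMAIN_KEY = "domain"

    def __init__(self):
        self._pids = {}

    def register(self, pid, key=DOMAIN_KEY):
        # Refuse to silently overwrite a helper that is still tracked.
        if key in self._pids:
            raise KeyError("helper already registered for %r" % key)
        self._pids[key] = pid

    def lookup(self, key=DOMAIN_KEY):
        # Returns None when no helper is tracked under this key.
        return self._pids.get(key)

    def unregister(self, key=DOMAIN_KEY):
        # Returns the removed PID, or None if nothing was tracked.
        return self._pids.pop(key, None)


if __name__ == "__main__":
    registry = HelperRegistry()
    registry.register(4242)               # one helper for the whole domain
    registry.register(4343, key="scsi0")  # a per-disk helper, if ever needed
    print(registry.lookup(), registry.lookup("scsi0"))
```

Today only the single per-domain entry would be used; the per-disk keys
cost nothing until somebody needs them.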
Michal