On 02/21/2017 12:03 PM, Daniel P. Berrange wrote:
On Tue, Feb 21, 2017 at 11:33:25AM -0500, John Ferlan wrote:
> Repost:
http://www.redhat.com/archives/libvir-list/2017-February/msg00501.html
>
> to update to top of branch as of commit id '5ad03b9db2'
BTW, could you include the full cover letter in each new version rather
than making people follow links all the way back to v1 to find info
about the patch series goals.
OK - I'll try to remember.
IIUC, the intention here is that we automatically create NPIV devices
when starting guests and delete them when stopping guests. I can see
some appeal in this, but at the same time I'm not convinced we should
add such a feature.
A bit more than that - create the vHBA and assign the LUNs to the guest
as they are discovered and remove them as they are removed (events from
udev). This is a mechanism/idea from Paolo. The RHV team would be the
primary consumer and IIRC they don't use storage pools.
AFAICT, the node device APIs already allow a management application to
achieve the same end goal without needing this integration. Yes, it
would simplify usage of NPIV on the surface, but the cost of doing this
is that it embeds a specific usage policy for NPIV in the libvirt code and
makes error handling harder. In particular it is possible to get into a
situation where a VM fails to start and we're also unable to clear up
the NPIV device we just auto-created. Now this could be said to apply
to pretty much everything we do during guest startup, but in most cases
the failure is harmless or gets auto-cleaned up by the kernel (ie the
tap devices get auto-deleted when the FD is closed, or SELinux labels
get reset next time a VM wants that file, locks are released when we
close the virtlockd file handle, etc). NPIV is significantly more
complicated and more likely to hit failure scenarios due to the fact that
it involves interactions with off-node hardware resources.
I agree with your points. The "purpose" of libvirt taking care of it
would be to let libvirt handle all those nasty and odd failure or
integration issues - including migration. Of course from a libvirt
perspective, I'd rather take the 'scsi_hostX' vHBA and just pass that
through to QEMU directly to allow it (or the guest) to find the LUNs,
but that just pushes the problem the other way.
I said early on that this is something that could be done by the upper
layers that would be able to receive the add/remove lun events whether
they created a storage pool just for that purpose or they created the
vHBA themselves. It's probably even in the bz's on this.
Is there some aspect of NPIV mgmt that can only be achieved if libvirt
is explicitly managing the device lifecycle during VM start/stop, as
opposed to having the mgmt app manage it ?
Nothing beyond the upper layers not needing to handle anything other than
creating the vHBA for the domain and letting libvirt handle the rest.
If OpenStack were to provide NPIV support I think it'd probably end
up dealing with device setup explicitly via the node device APIs
rather than relying on libvirt to create/delete them. That way it
can track the lifecycle of NPIV devices explicitly, and if it is not
possible to delete them at time of QEMU shutdown for some reason, it
can easily arrange to delete them later.
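To make the "track the lifecycle explicitly and delete later" idea concrete, here is a minimal sketch of what a management application could do on its own side. It is not libvirt code and not anything from this patch series. The VhbaTracker class, its method names, and the injected create/destroy callables are all hypothetical. The XML layout follows the node device format for a vHBA (a parent HBA plus nested scsi_host/fc_host capabilities), and the WWNN/WWPN values in the usage note are made-up placeholders. In a real application the two callables would presumably wrap the node device create and destroy APIs; that wiring is assumed here, not shown.

```python
import xml.etree.ElementTree as ET


def vhba_xml(parent, wwnn, wwpn):
    """Build node device XML describing a vHBA on the given parent HBA."""
    device = ET.Element("device")
    ET.SubElement(device, "parent").text = parent
    scsi_host = ET.SubElement(device, "capability", type="scsi_host")
    fc_host = ET.SubElement(scsi_host, "capability", type="fc_host")
    ET.SubElement(fc_host, "wwnn").text = wwnn
    ET.SubElement(fc_host, "wwpn").text = wwpn
    return ET.tostring(device, encoding="unicode")


class VhbaTracker:
    """Hypothetical app-side bookkeeping for vHBAs created per guest.

    create(xml) must return the new device name; destroy(name) must
    raise on failure. Both are supplied by the caller (assumption).
    """

    def __init__(self, create, destroy):
        self._create = create
        self._destroy = destroy
        self.active = {}          # guest name -> vHBA device name
        self.pending_delete = []  # vHBAs whose deletion failed so far

    def guest_starting(self, guest, parent, wwnn, wwpn):
        """Create the vHBA before the guest starts and remember it."""
        name = self._create(vhba_xml(parent, wwnn, wwpn))
        self.active[guest] = name
        return name

    def guest_stopped(self, guest):
        """Try to delete the guest's vHBA; queue it if that fails."""
        name = self.active.pop(guest, None)
        if name is not None:
            self._try_destroy(name)

    def retry_pending(self):
        """Retry queued deletions; return the names still pending."""
        pending, self.pending_delete = self.pending_delete, []
        for name in pending:
            self._try_destroy(name)
        return list(self.pending_delete)

    def _try_destroy(self, name):
        try:
            self._destroy(name)
        except Exception:
            # Deletion can fail for off-node reasons; keep the name so
            # the application can arrange to delete it later.
            if name not in self.pending_delete:
                self.pending_delete.append(name)
```

The application would call guest_starting("vm1", "scsi_host3", "20000000c9831b4b", "10000000c9831b4b") before launching the guest, guest_stopped("vm1") after shutdown, and retry_pending() from a periodic task. Because the record of what was created lives in the application, a failed delete is not lost when the guest goes away.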
Overall I think one of the more successful aspects of libvirt's design
has been the way we minimise the addition of usage policy decisions, in
favour of providing mechanisms that applications can use to implement
policies. This has had a cost in that applications need to do more work
themselves, but on balance I still think it is a win to avoid adding
policy driven features to libvirt.
A key question is just where "autocreation/delete of NPIV devices" falls
in the line between mechanism & policy, since the line is not entirely
black & white. I tend towards it being policy though, since it is just
providing a less general purpose way to do something that can be achieved
already via the node device APIs.
Regards,
Daniel
I understand - to a degree I guess I had assumed some of these types of
discussions had taken place among those who wanted the feature added.
One other good thing that's come out of these changes is a bit more
testing for vHBA creation via nodedev/storage pool and quite a bit of
code cleanup once/if most of the patches I posted earlier in the week
are accepted.
John
(FWIW: I'll have limited access to email over the next couple of days...)