On Tue, Feb 21, 2017 at 10:45:05PM -0500, John Ferlan wrote:
> On 02/21/2017 12:03 PM, Daniel P. Berrange wrote:
> > On Tue, Feb 21, 2017 at 11:33:25AM -0500, John Ferlan wrote:
> >> Repost:
> >> http://www.redhat.com/archives/libvir-list/2017-February/msg00501.html
> >>
> >> to update to top of branch as of commit id '5ad03b9db2'
> >
> > BTW, could you include the full cover letter in each new version rather
> > than making people follow links all the way back to v1 to find info
> > about the patch series goals.
>
> OK - I'll try to remember.
> >
> > IIUC, the intention here is that we automatically create NPIV devices
> > when starting guests and delete them when stopping guests. I can see
> > some appeal in this, but at the same time I'm not convinced we should
> > add such a feature.
>
> A bit more than that - create the vHBA and assign the LUNs to the guest
> as they are discovered, and remove them as they are removed (events from
> udev). This is a mechanism/idea from Paolo. The RHV team would be the
> primary consumer and IIRC they don't use storage pools.
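
(For reference, creating the vHBA itself is already doable today via the
node device APIs. A rough sketch using the python bindings - the parent
HBA name is a placeholder, and IIRC libvirt generates the WWNN/WWPN if
you don't specify them:)

  import libvirt

  # Placeholder parent HBA - pick a real fc_host capable scsi_hostN
  VHBA_XML = """
  <device>
    <parent>scsi_host5</parent>
    <capability type='scsi_host'>
      <capability type='fc_host'/>
    </capability>
  </device>
  """

  conn = libvirt.open('qemu:///system')
  # Creates a new scsi_hostN on the host; the fabric then exposes its LUNs
  vhba = conn.nodeDeviceCreateXML(VHBA_XML, 0)
  print("created vHBA:", vhba.name())
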
> >
> > AFAICT, the node device APIs already allow a management application to
> > achieve the same end goal without needing this integration. Yes, it
> > would simplify usage of NPIV on the surface, but the cost of doing this
> > is that it embeds a specific usage policy for NPIV in the libvirt code
> > and makes error handling harder. In particular it is possible to get
> > into a situation where a VM fails to start and we're also unable to
> > clean up the NPIV device we just auto-created. Now this could be said
> > to apply to pretty much everything we do during guest startup, but in
> > most cases the failure is harmless or gets auto-cleaned up by the
> > kernel (ie the tap devices get auto-deleted when the FD is closed, or
> > SELinux labels get reset next time a VM wants that file, locks are
> > released when we close the virtlockd file handle, etc). NPIV is
> > significantly more complicated and more likely to hit failure scenarios
> > due to the fact that it involves interactions with off-node hardware
> > resources.
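
(To make that failure mode concrete: whoever auto-creates the device has
to own the cleanup, and the cleanup itself can fail. Something along
these lines in whichever layer does the creation - the guest name and
XML here are purely illustrative:)

  import libvirt

  VHBA_XML = """
  <device>
    <parent>scsi_host5</parent>
    <capability type='scsi_host'>
      <capability type='fc_host'/>
    </capability>
  </device>
  """

  conn = libvirt.open('qemu:///system')
  vhba = conn.nodeDeviceCreateXML(VHBA_XML, 0)
  try:
      dom = conn.lookupByName('guest1')
      dom.create()
  except libvirt.libvirtError:
      # Guest failed to start, so tear the vHBA down again. Note this
      # destroy can also fail, which is the orphaned-device case above.
      try:
          vhba.destroy()
      except libvirt.libvirtError as err:
          print("unable to clean up vHBA %s: %s" % (vhba.name(), err))
      raise
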
>
> I agree with your points. The "purpose" of libvirt taking care of it
> would be to let libvirt handle all those nasty and odd failure or
> integration issues - including migration. Of course from a libvirt
> perspective, I'd rather take the 'scsi_hostX' vHBA and just pass that
> through to QEMU directly to allow it (or the guest) to find the LUNs,
> but that just pushes the problem the other way.
>
> I said early on that this is something that could be done by the upper
> layers that would be able to receive the add/remove LUN events, whether
> they created a storage pool just for that purpose or they created the
> vHBA themselves. It's probably even noted in the BZs on this.
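
(FWIW the NPIV-backed 'scsi' storage pool already gives roughly that:
libvirt creates the vHBA when the pool starts and exposes the discovered
LUNs as volumes. A sketch with the python bindings - the pool name,
parent and WWNs below are placeholders:)

  import libvirt

  POOL_XML = """
  <pool type='scsi'>
    <name>npiv-pool</name>
    <source>
      <adapter type='fc_host' parent='scsi_host5'
               wwnn='20000000c9831b4b' wwpn='10000000c9831b4b'/>
    </source>
    <target>
      <path>/dev/disk/by-path</path>
    </target>
  </pool>
  """

  conn = libvirt.open('qemu:///system')
  pool = conn.storagePoolCreateXML(POOL_XML, 0)
  pool.refresh(0)
  for vol in pool.listAllVolumes():
      print(vol.name(), vol.path())
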
Yeah, I'm curious where the desire to assign individual LUNs comes
from - that makes me even less comfortable with this idea. It means
we are taking a single device from libvirt's POV - an HBA - and turning
it into many devices from the guest's POV, which means that for a single
device we'd surely need to track multiple SCSI addresses, and we'd
not know how many addresses we need to track up front either. I'd
really strongly prefer we track individual devices given to the guest
explicitly.
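
(ie every LUN handed to the guest ends up as its own <hostdev> with its
own guest side drive address, something like the following per LUN - the
host side and guest side addresses here are invented:)

  import libvirt

  LUN_HOSTDEV_XML = """
  <hostdev mode='subsystem' type='scsi'>
    <source>
      <adapter name='scsi_host6'/>
      <address bus='0' target='0' unit='0'/>
    </source>
    <address type='drive' controller='0' bus='0' target='0' unit='0'/>
  </hostdev>
  """

  conn = libvirt.open('qemu:///system')
  dom = conn.lookupByName('guest1')   # illustrative guest name
  dom.attachDeviceFlags(LUN_HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)
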
I'd also suggest just assigning the whole vHBA, but we can still make
it possible for a mgmt app to dynamically add/remove LUNs from a vHBA.
Our node device APIs can report events when devices appear/disappear
so an app can create a vHBA and monitor for LUN changes & update the
guest. That said, I'm curious how you'd handle removal, because surely
by the time you see a LUN disappear at the vHBA, the guest is already
broken as it's got an orphaned device. This seems like another reason
to just assign the vHBA itself and let the guest deal with LUN removal
more directly.
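
(Roughly what I have in mind, sketched with the python bindings; the
callback body is just a placeholder for whatever the mgmt app wants to
do, eg attach/detach the matching <hostdev>:)

  import libvirt

  def lifecycle_cb(conn, dev, event, detail, opaque):
      # A real app would filter on the device's parent vHBA here
      if event == libvirt.VIR_NODE_DEVICE_EVENT_CREATED:
          print("device appeared:", dev.name())
      elif event == libvirt.VIR_NODE_DEVICE_EVENT_DELETED:
          print("device removed:", dev.name())

  libvirt.virEventRegisterDefaultImpl()
  conn = libvirt.open('qemu:///system')
  conn.nodeDeviceEventRegisterAny(None,
                                  libvirt.VIR_NODE_DEVICE_EVENT_ID_LIFECYCLE,
                                  lifecycle_cb, None)
  while True:
      libvirt.virEventRunDefaultImpl()
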
Regards,
Daniel
--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-       http://search.cpan.org/~danberr/ :|