On Wed, Sep 21, 2022 at 12:06:49PM -0600, Alex Williamson wrote:
> I still think the compat gaps are small. I've realized that
> VFIO_DMA_UNMAP_FLAG_VADDR has no implementation in qemu, and since it
> can deadlock the kernel I propose we purge it completely.
Steve won't be happy to hear that, QEMU support exists but isn't yet
merged.
If Steve wants to keep it then someone needs to fix the deadlock in
the vfio implementation before any userspace starts to appear.
I can fix the deadlock in iommufd in a terrible expensive way, but
would rather we design a better interface if nobody is using it yet. I
advocate for passing the memfd to the kernel and use that as the page
provider, not a mm_struct.
The issue is where we account these pinned pages, where accounting
is
necessary such that a user cannot lock an arbitrary number of pages
into RAM to generate a DoS attack.
It is worth pointing out that preventing a DOS attack doesn't actually
work because a *task* limit is trivially bypassed by just spawning
more tasks. So, as a security feature, this is already very
questionable.
What we've done here is make the security feature work to actually
prevent DOS attacks, which then gives you this problem:
This obviously has implications. AFAICT, any management tool that
doesn't instantiate assigned device VMs under separate users are
essentially untenable.
Because now that the security feature works properly it detects the
DOS created by spawning multiple tasks :(
Somehow I was under the impression there was not user sharing in the
common cases, but I guess I don't know that for sure.
> So, I still like 2 because it yields the smallest next step
before we
> can bring all the parallel work onto the list, and it makes testing
> and converting non-qemu stuff easier even going forward.
If a vfio compatible interface isn't transparently compatible, then I
have a hard time understanding its value. Please correct my above
description and implications, but I suspect these are not just
theoretical ABI compat issues. Thanks,
Because it is just fine for everything that doesn't use the ulimit
feature, which is still a lot of use cases!
Remember, at this point we are not replacing /dev/vfio/vfio, this is
just providing the general compat in a form that has to be opted
into. I think if you open the /dev/iommu device node then you should
get secured accounting.
If /dev/vfio/vfio is provided by iommufd it may well have to trigger a
different ulimit tracking - if that is the only sticking point it
seems minor and should be addressed in some later series that adds
/dev/vfio/vfio support to iommufd..
Jason