On Wed, Sep 21, 2022 at 03:44:24PM -0300, Jason Gunthorpe wrote:
> If /dev/vfio/vfio is provided by iommufd it may well have to trigger a
> different ulimit tracking - if that is the only sticking point it
> seems minor and should be addressed in some later series that adds
> /dev/vfio/vfio support to iommufd..

And I have come up with a nice idea for this that feels OK:

 - Add a 'pin accounting compat' flag to struct iommufd_ctx (eg per FD)
   The flag is set to 1 if /dev/vfio/vfio was the cdev that opened the
   ctx
   An ioctl issued with CAP_SYS_ADMIN can also set the flag

 - If the flag is set we do not do pin accounting in the user.
   Instead we account for pins in the FD. The single FD cannot pass the
   rlimit.
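
To make the two accounting modes concrete, here is a rough userspace model of the idea, not kernel code. Every name in it (struct ctx_model, struct user_acct, model_pin, model_unpin) is made up for illustration, and the limit is passed in as a plain page count rather than read from the real rlimit:

```c
/* Userspace model only; all names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct user_acct {            /* stands in for the per-user pinned page counter */
    uint64_t locked_pages;
};

struct ctx_model {            /* stands in for struct iommufd_ctx */
    bool pin_acct_compat;     /* the proposed 'pin accounting compat' flag */
    uint64_t fd_pinned_pages; /* pins charged to this FD in compat mode */
    struct user_acct *user;   /* pins charged here when the flag is clear */
};

/* Pick the counter that the flag selects. */
static uint64_t *acct_counter(struct ctx_model *ctx)
{
    return ctx->pin_acct_compat ? &ctx->fd_pinned_pages
                                : &ctx->user->locked_pages;
}

/* Charge npages to the selected counter; refuse if that would pass the limit. */
static bool model_pin(struct ctx_model *ctx, uint64_t npages,
                      uint64_t limit_pages)
{
    uint64_t *counter = acct_counter(ctx);

    if (npages > limit_pages || *counter > limit_pages - npages)
        return false;
    *counter += npages;
    return true;
}

/* Return npages to the selected counter. */
static void model_unpin(struct ctx_model *ctx, uint64_t npages)
{
    uint64_t *counter = acct_counter(ctx);

    assert(npages <= *counter); /* catch unbalanced unpins in the model */
    *counter -= npages;
}
```

In this model two compat FDs each get their own budget even when they share a user, while FDs with the flag clear share the one user counter.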
This nicely emulates the desired behavior from virtualization without
creating all the problems with exec/fork/etc that per-task tracking
has.
Even in iommufd native mode a privileged virtualization layer can use
the ioctl to enter the old mode and pass the fd to qemu under a shared
user. This should ease migration I guess.
It can still be oversubscribed but it is now limited to the number of
iommufd_ctx's *with devices* that the userspace can create. Since each
device can be attached to only 1 iommufd this is a stronger limit than
the task limit. With 1 device given to the qemu, enforcement is perfect
(ignoring that a hostile qemu can still blow past the rlimit using
concurrent rdma or io_uring).
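
As a rough illustration of that bound, assuming the worst case is simply one full rlimit per compat-mode ctx that holds a device (the helper name is made up, and the rdma/io_uring paths are ignored as above):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: upper bound on pages pinned through compat-mode FDs. */
static uint64_t worst_case_pinned_pages(uint64_t ctxs_with_devices,
                                        uint64_t rlimit_pages)
{
    /* Guard the multiplication against overflow in this toy model. */
    assert(rlimit_pages == 0 ||
           ctxs_with_devices <= UINT64_MAX / rlimit_pages);
    return ctxs_with_devices * rlimit_pages;
}
```

For example, with a 64 MiB rlimit and 4 KiB pages (16384 pages), one device gives a bound of 16384 pages, exactly the rlimit, while four devices spread over four ctxs give 65536 pages.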
It is a small incremental step - does this suitably address the concern?
Jason