Re: [PATCH RFC v2 00/13] IOMMUFD Generic interface

Friday, 21 October 2022

On Fri, Sep 23, 2022 at 11:40:51AM -0400, Laine Stump wrote:
...
 It's been a few years, but my recollection is that before starting a
 libvirtd that will run a guest with a vfio device, a privileged process
 needs to

 1) increase the locked memory limit for the user that will be running qemu
 (eg. by adding a file with the increased limit to /etc/security/limits.d)

 2) bind the device to the vfio-pci driver, and

 3) chown /dev/vfio/$iommu_group to the user running qemu.

Here is what is going on to resolve this:

1) iommufd internally supports two ways to account ulimits, the vfio
   way and the io_uring way. Each FD operates in its own mode.

   When /dev/iommu is opened the FD defaults to the io_uring way; when
   /dev/vfio/vfio is opened it uses the VFIO way. This means
   /dev/vfio/vfio is not a symlink; instead there is a new kconfig
   option that makes iommufd provide it directly as a miscdev.

2) There is an ioctl IOMMU_OPTION_RLIMIT_MODE which allows a
   privileged user to query/set which mode the FD will run in.

   The idea is that libvirt will open iommufd, set vfio compat mode as
   its first action, and then pass the fd to qemu, so that qemu
   operates in the correct sandbox.

3) We are working on a cgroup for FOLL_LONGTERM, it is a big job but
   this should prove a comprehensive resolution to this problem across
   the kernel and improve the qemu sandbox security.

   Still TBD, but most likely when the cgroup supports this libvirt
   would set the rlimit to unlimited, then set new mlock and
   FOLL_LONGTERM cgroup limits to create the sandbox.
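
To illustrate the fd-passing step in (2), here is a minimal sketch of how
a manager process can hand an already-opened descriptor to a sandboxed
process over a Unix socket (SCM_RIGHTS). It is not libvirt or qemu code.
The temporary file is only a stand-in for /dev/iommu, the function names
are hypothetical, and the option-setting ioctl itself is left out because
its ABI is not spelled out here. It assumes Python 3.9+ on a Unix system.

```python
import os
import socket
import tempfile


def send_configured_fd(sock: socket.socket, fd: int) -> None:
    """Manager side: pass an already-configured descriptor to the peer."""
    # Assumption: a real manager would have set the rlimit accounting
    # mode on `fd` before this point; that call is omitted here.
    socket.send_fds(sock, [b"iommufd"], [fd])


def receive_fd(sock: socket.socket) -> int:
    """Sandboxed side: obtain the descriptor without opening any path."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    if msg != b"iommufd" or len(fds) != 1:
        raise RuntimeError("unexpected message while receiving fd")
    return fds[0]


def demo() -> bytes:
    """Run both sides in one process and read through the received fd."""
    manager_sock, sandbox_sock = socket.socketpair(
        socket.AF_UNIX, socket.SOCK_STREAM
    )
    # Stand-in for the device node; a real caller would open /dev/iommu.
    with tempfile.TemporaryFile() as stand_in:
        stand_in.write(b"configured by manager")
        stand_in.flush()
        send_configured_fd(manager_sock, stand_in.fileno())
        received = receive_fd(sandbox_sock)
    # The manager's copy is closed now; the received fd refers to the
    # same open file description and remains usable.
    try:
        os.lseek(received, 0, os.SEEK_SET)
        return os.read(received, 64)
    finally:
        os.close(received)
        manager_sock.close()
        sandbox_sock.close()


if __name__ == "__main__":
    print(demo())
```

The point of the pattern is that the receiver never needs permission to
open the device node itself; it only inherits whatever state the
privileged opener left on the descriptor.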

Jason
