
On 10/11/2022 8:30 AM, Jason Gunthorpe wrote:
On Mon, Oct 10, 2022 at 04:54:50PM -0400, Steven Sistare wrote:
Do we have a solution to this?
If not I would like to make a patch removing VFIO_DMA_UNMAP_FLAG_VADDR
Aside from the approach to use the FD, another idea is to just use fork.
qemu would do something like
.. stop all container ioctl activity .. fork() ioctl(CHANGE_MM) // switch all maps to this mm .. signal parent.. .. wait parent.. exit(0) .. wait child .. exec() ioctl(CHANGE_MM) // switch all maps to this mm ..signal child.. waitpid(childpid)
This way the kernel is never left without a page provider for the maps, the dummy mm_struct belonging to the fork will serve that role for the gap.
And the above is only required if we have mdevs, so we could imagine userspace optimizing it away for, eg vfio-pci only cases.
It is not as efficient as using a FD backing, but this is super easy to implement in the kernel.
I propose to avoid deadlock for mediated devices as follows. Currently, an mdev calling vfio_pin_pages blocks in vfio_wait while VFIO_DMA_UNMAP_FLAG_VADDR is asserted.
* In vfio_wait, I will maintain a list of waiters, each list element consisting of (task, mdev, close_flag=false).
* When the vfio device descriptor is closed, vfio_device_fops_release will notify the vfio_iommu driver, which will find the mdev on the waiters list, set elem->close_flag=true, and call wake_up_process for the task.
This alone is not sufficient, the mdev driver can continue to establish new mappings until it's close_device function returns. Killing only existing mappings is racy.
I think you are focusing on the one issue I pointed at, as I said, I'm sure there are more ways than just close to abuse this functionality to deadlock the kernel.
I continue to prefer we remove it completely and do something more robust. I suggested two options.
It's not racy. New pin requests also land in vfio_wait if any vaddr's have been invalidated in any vfio_dma in the iommu. See vfio_iommu_type1_pin_pages() if (iommu->vaddr_invalid_count) vfio_find_dma_valid() vfio_wait() However, I will investigate saving a reference to the file object in the vfio_dma (for mappings backed by a file) and using that to translate IOVA's. I think that will be easier to use than fork/CHANGE_MM/exec, and may even be easier to use than VFIO_DMA_UNMAP_FLAG_VADDR. To be continued. - Steve