On Mon, Oct 10, 2022 at 04:54:50PM -0400, Steven Sistare wrote:
> > Do we have a solution to this?
> >
> > If not, I would like to make a patch removing VFIO_DMA_UNMAP_FLAG_VADDR.
> >
> > Aside from the approach to use the FD, another idea is to just use
> > fork.
> >
> > qemu would do something like
> >
> > .. stop all container ioctl activity ..
> > fork()
> > ioctl(CHANGE_MM) // switch all maps to this mm
> > .. signal parent ..
> > .. wait parent ..
> > exit(0)
> > .. wait child ..
> > exec()
> > ioctl(CHANGE_MM) // switch all maps to this mm
> > .. signal child ..
> > waitpid(childpid)
> >
> > This way the kernel is never left without a page provider for the
> > maps; the dummy mm_struct belonging to the fork will serve that role
> > for the gap.
> >
> > And the above is only required if we have mdevs, so we could imagine
> > userspace optimizing it away for, e.g., vfio-pci-only cases.
> >
> > It is not as efficient as using an FD backing, but this is super easy
> > to implement in the kernel.
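The fork-based handoff quoted above can be sketched in userspace C to make the ordering between the two processes explicit. This is a sketch under stated assumptions, not the proposal's implementation: CHANGE_MM does not exist, so change_mm() is a hypothetical no-op stub; the pipe-based signalling and the helper names wait_for(), signal_peer(), resume_after_exec() and handoff() are illustrative; and instead of really calling exec(), the parent calls the post-exec half directly.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* HYPOTHETICAL stand-in for ioctl(container_fd, CHANGE_MM), which would
 * switch every DMA map of the container to the caller's mm. No such ioctl
 * exists; this stub does nothing and reports success. */
static int change_mm(int container_fd)
{
    (void)container_fd;
    return 0;
}

/* "..wait..": block until one byte arrives; -1 on EOF or error. */
static int wait_for(int fd)
{
    char c;
    return read(fd, &c, 1) == 1 ? 0 : -1;
}

/* "..signal..": send one byte to the peer process. */
static int signal_peer(int fd)
{
    return write(fd, "x", 1) == 1 ? 0 : -1;
}

/* Post-exec half of the parent's sequence. A real exec'd image would have
 * to learn to_child_wr and child somehow (argv/env is an assumption). */
static int resume_after_exec(int container_fd, int to_child_wr, pid_t child)
{
    int status, ok;

    ok = change_mm(container_fd) == 0;          /* ioctl(CHANGE_MM) */
    ok = ok && signal_peer(to_child_wr) == 0;   /* ..signal child.. */
    close(to_child_wr);                         /* EOF releases child on error */
    if (waitpid(child, &status, 0) != child)    /* waitpid(childpid) */
        return -1;
    return ok && WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1;
}

/* Caller must already have stopped all container ioctl activity. */
static int handoff(int container_fd)
{
    int to_parent[2], to_child[2];
    pid_t child;

    if (pipe(to_parent) != 0)
        return -1;
    if (pipe(to_child) != 0) {
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }

    child = fork();
    if (child == 0) {
        /* Child: its mm is the temporary page provider for the gap. */
        close(to_parent[0]);
        close(to_child[1]);
        if (change_mm(container_fd) != 0 || signal_peer(to_parent[1]) != 0)
            _exit(1);
        /* ..wait parent..; EOF also ends the wait if the parent goes away. */
        _exit(wait_for(to_child[0]) == 0 ? 0 : 1);
    }

    close(to_parent[1]);
    close(to_child[0]);
    if (child < 0) {
        close(to_parent[0]);
        close(to_child[1]);
        return -1;
    }

    /* ..wait child..: once signalled, the child's mm owns the maps. */
    if (wait_for(to_parent[0]) != 0) {
        close(to_parent[0]);
        close(to_child[1]);
        waitpid(child, NULL, 0);
        return -1;
    }
    close(to_parent[0]);

    /* Real code would exec() here; pipe() fds lack O_CLOEXEC, so
     * to_child[1] survives. This sketch calls the post-exec half directly. */
    return resume_after_exec(container_fd, to_child[1], child);
}
```

The pipes carry no data; each byte only acts as a signal. Because every process closes the pipe ends it does not use, an unexpected exit on one side shows up as EOF on the other instead of leaving the peer blocked forever. exec() keeps the PID, so the re-executed image is still the child's parent and can reap it with waitpid().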
>
> I propose to avoid deadlock for mediated devices as follows. Currently, an
> mdev calling vfio_pin_pages blocks in vfio_wait while VFIO_DMA_UNMAP_FLAG_VADDR
> is asserted.
>
>   * In vfio_wait, I will maintain a list of waiters, each list element
>     consisting of (task, mdev, close_flag=false).
>
>   * When the vfio device descriptor is closed, vfio_device_fops_release
>     will notify the vfio_iommu driver, which will find the mdev on the waiters
>     list, set elem->close_flag=true, and call wake_up_process for the task.

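The waiter-list proposal can be modelled in userspace C to show how the close notification is meant to release a blocked pinner. This is a sketch under assumptions, not vfio_iommu_type1 code: pin_wait() models vfio_wait, release_mdev() models the notification from vfio_device_fops_release, and restore_vaddr(), nwaiters(), struct pin_req and pin_thread() are hypothetical names; a pthread condition variable stands in for wake_up_process.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry per blocked pinner: the proposal's (task, mdev, close_flag).
 * The task is implicit: each entry lives on its caller thread's stack. */
struct waiter {
    const void *mdev;
    bool close_flag;
    struct waiter *next;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake = PTHREAD_COND_INITIALIZER;
static struct waiter *waiters;     /* the waiters list */
static bool vaddr_invalid = true;  /* models VFIO_DMA_UNMAP_FLAG_VADDR in effect */

/* Models vfio_wait: 0 once vaddrs are valid again, -1 if the caller's
 * mdev was closed while it slept. */
static int pin_wait(const void *mdev)
{
    struct waiter self = { .mdev = mdev, .close_flag = false, .next = NULL };
    struct waiter **pp;
    int ret;

    pthread_mutex_lock(&lock);
    self.next = waiters;
    waiters = &self;
    while (vaddr_invalid && !self.close_flag)
        pthread_cond_wait(&wake, &lock);
    ret = self.close_flag ? -1 : 0;
    /* Unlink our entry before its stack frame goes away. */
    for (pp = &waiters; *pp != &self; pp = &(*pp)->next)
        ;
    *pp = self.next;
    pthread_mutex_unlock(&lock);
    return ret;
}

/* Models the release path: flag every waiter for this mdev, then wake. */
static void release_mdev(const void *mdev)
{
    struct waiter *w;

    pthread_mutex_lock(&lock);
    for (w = waiters; w; w = w->next)
        if (w->mdev == mdev)
            w->close_flag = true;
    pthread_cond_broadcast(&wake);   /* stands in for wake_up_process */
    pthread_mutex_unlock(&lock);
}

/* Models the new process re-supplying vaddrs. */
static void restore_vaddr(void)
{
    pthread_mutex_lock(&lock);
    vaddr_invalid = false;
    pthread_cond_broadcast(&wake);
    pthread_mutex_unlock(&lock);
}

/* Helper: number of entries currently on the waiters list. */
static int nwaiters(void)
{
    struct waiter *w;
    int n = 0;

    pthread_mutex_lock(&lock);
    for (w = waiters; w; w = w->next)
        n++;
    pthread_mutex_unlock(&lock);
    return n;
}

/* Thread entry for trying the model: runs pin_wait, stores its result. */
struct pin_req { const void *mdev; int result; };
static void *pin_thread(void *arg)
{
    struct pin_req *r = arg;
    r->result = pin_wait(r->mdev);
    return NULL;
}
```

What the model does not close: a pin_wait() for the same mdev that starts after release_mdev() has run enqueues a fresh entry with close_flag false and blocks again. Waking only the waiters that already exist leaves exactly that window open.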
This alone is not sufficient: the mdev driver can continue to
establish new mappings until its close_device function
returns. Killing only the existing mappings is racy.

I think you are focusing on the one issue I pointed out. As I said, I'm
sure there are more ways than just close to abuse this functionality
to deadlock the kernel.

I continue to prefer that we remove it completely and do something more
robust. I suggested two options: an FD backing for the maps, or the
fork-based handoff above.

Jason