On Mon, 2021-03-08 at 13:17 +0000, Daniel P. Berrangé wrote:
> On Mon, Mar 08, 2021 at 02:11:56PM +0100, Andrea Bolognani wrote:
> > The reason why VFIO device assignment is currently not completely
> > broken in KubeVirt is that, when the QEMU process is initially
> > started, we set the memory locking limit after fork(), so we can do
> > that using setrlimit() which doesn't require additional capabilities,
> > but in the hotplug scenario libvirtd needs to change the limits of a
> > different process: in that case we are forced to use prlimit(), which
> > fails due to the lack of CAP_SYS_RESOURCE.
>
> Since you added code to parse existing limits from /proc, I'm wondering
> if we can just do without the config option. Simply try to use prlimit
> and if it fails, query existing limits to determine if we should treat
> the prlimit as fatal or ignore it. Overall I'd prefer libvirt to
> "just work" out of the box rather than requiring people to know about
> setting a "make-vfio-hotplug-work=yes" flag in the config file.

The problem with that approach is what to do when *lowering* the
limit, for example as a consequence of hot-unplugging the last VFIO
device from the VM.

If we're controlling the memory locking limit ourselves, then failure
to lower it should be an error, because leaving the limit much higher
than necessary creates potential for DoS by a compromised QEMU; on
the other hand, if the limit is controlled by an external process,
all we can really do is assume they will do the right thing after
hot-unplugging has happened.

I don't think discoverability is too much of an issue, as anyone who
needs to use this option will already have needed to figure out a lot
more in order to effectively take over memory locking limit
management responsibilities from libvirt...
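
To make the raise/lower distinction concrete, here is a rough Python
sketch of the decision I have in mind. It is not the libvirt code (that
is C), and the names parse_memlock_limit, prlimit_failure_is_fatal,
set_memlock_limit and the externally_managed flag are all made up for
illustration; the flag stands in for the proposed config option. The
"raising" branch is my reading of the fallback suggested above, where a
limit that is too low for the device is always an error.

```python
import math
import resource

MEMLOCK_ROW = "Max locked memory"


def parse_memlock_limit(limits_text):
    """Return (soft, hard) in bytes from the text of /proc/<pid>/limits.

    "unlimited" is mapped to math.inf so that limits compare naturally.
    """
    for line in limits_text.splitlines():
        if line.startswith(MEMLOCK_ROW):
            soft, hard = line[len(MEMLOCK_ROW):].split()[:2]
            return tuple(
                math.inf if v == "unlimited" else int(v) for v in (soft, hard)
            )
    raise ValueError("no 'Max locked memory' row found")


def prlimit_failure_is_fatal(current, desired, externally_managed):
    """Decide whether a failed prlimit() should abort the operation.

    current, desired: soft limits in bytes (math.inf for unlimited).
    externally_managed: hypothetical flag meaning someone other than
    us has taken over memory locking limit management.
    """
    if desired > current:
        # Raising failed: the limit is too low for VFIO to work.
        return True
    if desired < current:
        # Lowering failed: only tolerable if somebody else owns the limit.
        return not externally_managed
    # The limit already has the value we wanted.
    return False


def set_memlock_limit(pid, desired, externally_managed):
    """Try prlimit() on another process, falling back to the policy above."""
    value = resource.RLIM_INFINITY if desired == math.inf else desired
    try:
        resource.prlimit(pid, resource.RLIMIT_MEMLOCK, (value, value))
    except PermissionError:
        # Typically means we lack CAP_SYS_RESOURCE for the target process.
        with open(f"/proc/{pid}/limits") as f:
            current, _hard = parse_memlock_limit(f.read())
        if prlimit_failure_is_fatal(current, desired, externally_managed):
            raise
```

Note that without something like externally_managed, the "lowering
failed" case cannot be told apart from "somebody else set the limit
high on purpose", which is why I'd rather keep the option explicit.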
--
Andrea Bolognani / Red Hat / Virtualization