On Mon, 2021-03-08 at 15:57 +0000, Daniel P. Berrangé wrote:
> On Mon, Mar 08, 2021 at 04:32:26PM +0100, Andrea Bolognani wrote:
> > On Mon, 2021-03-08 at 13:17 +0000, Daniel P. Berrangé wrote:
> > > Since you added code to parse existing limits from /proc, I'm wondering
> > > if we can just do without the config option. Simply try to use prlimit
> > > and if it fails, query existing limits to determine if we should treat
> > > the prlimit failure as fatal or ignore it. Overall I'd prefer libvirt to
> > > "just work" out of the box rather than requiring people to know about
> > > setting a "make-vfio-hotplug-work=yes" flag in the config file.
> >
> > The problem with that approach is what to do when *lowering* the
> > limit, for example as a consequence of hot-unplugging the last VFIO
> > device from the VM.
> >
> > If we're controlling the memory locking limit ourselves, then failure
> > to lower it should be an error, because leaving the limit much higher
> > than necessary creates potential for DoS by a compromised QEMU; on
> > the other hand, if the limit is controlled by an external process,
> > all we can really do is assume they will do the right thing after
> > hot-unplugging has happened.
>
> IMHO once QEMU vCPUs start running, immediately assume QEMU is
> compromised / hostile. IOW, the DoS risk arrived the moment it
> was given the higher limit. We're just failing to close off the
> existing risk we've already accepted, which doesn't worry me much.
>
> On unplug the only thing we actually do when lowering the memory
> lock limit fails is to log a warning message; it is never treated
> as a fatal error.
>
> So the only difference is whether we skip the warning message
> when we get EPERM from prlimit(), or always emit the warning.

You're right, we're currently just soft-failing when we can't lower
the memlock limit on unplug. Given this and your assessment of the
security implications, which I trust, we should indeed be able to
avoid introducing the qemu.conf knob and just behave sanely in all
scenarios out of the box. I'll give it a try.
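
To make the behavior discussed above concrete, here is a rough sketch of
the decision logic. It is written in Python purely for illustration and is
not libvirt code: the function names, the "raising" flag and the returned
strings are all hypothetical. It only assumes that prlimit() can fail with
EPERM and that /proc/<pid>/limits carries a "Max locked memory" line.

```python
import errno
import logging
import resource

UNLIMITED = resource.RLIM_INFINITY
_FIELD = "Max locked memory"


def parse_memlock_limit(limits_text):
    """Extract the soft memlock limit (bytes) from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith(_FIELD):
            soft = line[len(_FIELD):].split()[0]
            return UNLIMITED if soft == "unlimited" else int(soft)
    raise ValueError("no %r entry found" % _FIELD)


def read_memlock_limit(pid):
    """Read the current soft memlock limit of a process from /proc."""
    with open("/proc/%d/limits" % pid) as f:
        return parse_memlock_limit(f.read())


def limit_covers(current, wanted):
    """True if the limit already in place is at least as high as wanted."""
    if current == UNLIMITED:
        return True
    return wanted != UNLIMITED and current >= wanted


def set_memlock_limit(pid, wanted, raising, prlimit=None,
                      read_limit=read_memlock_limit):
    """Try prlimit() first and only inspect /proc if it fails.

    Hypothetical return values: "set", "already-sufficient",
    "ignored" or "warned". Raises OSError when the failure is fatal.
    """
    prlimit = prlimit or resource.prlimit  # Linux-only call
    try:
        prlimit(pid, resource.RLIMIT_MEMLOCK, (wanted, wanted))
        return "set"
    except OSError as err:
        if raising:
            # Hotplug: EPERM is fine if somebody else already raised the
            # limit far enough; any other failure is fatal.
            if (err.errno == errno.EPERM
                    and limit_covers(read_limit(pid), wanted)):
                return "already-sufficient"
            raise
        # Unplug: failing to lower the limit is never fatal. On EPERM we
        # assume an external process owns the limit and stay quiet.
        if err.errno == errno.EPERM:
            return "ignored"
        logging.warning("could not lower memlock limit of %d: %s", pid, err)
        return "warned"
```

The prlimit and read_limit parameters exist only so that the logic can be
exercised without special privileges; a real implementation would not
need them.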
--
Andrea Bolognani / Red Hat / Virtualization