On Wed, Dec 07, 2022 at 12:42:06PM +0100, Martin Kletzander wrote:
> On Thu, Dec 01, 2022 at 10:17:49AM +0000, Daniel P. Berrangé wrote:
> > The other end of the
> >
> >   virInternalSetProcessSetMaxMemLockHandler
> >
> > wouldn't have the ability to validate the VM identity even if we
> > passed it, since the master source of VM identity info is
> > the unprivileged and untrusted component.
> >
> > This means it is a big challenge to do more than just a blanket
> > allow/deny for the entire 'max mem lock' feature, rather than try
> > to finesse it per VM.
>
> Exactly what I was afraid of with another approach I discussed with
> someone else a while ago. If you start providing ways to do arbitrary
> privileged operations, then you are effectively giving away privileged
> access.
>
> In this case I think it was an unfortunate choice of API. If the API
> is *designed* to provide the proper identifying information, then the
> management application can choose the action properly.
I think it is challenging no matter what, because the privileged component
is placing trust in the unprivileged component to supply honest identifying
info. This is a key reason why polkit ACL checks are done based on
process ID + permission name. The process ID can't be faked, and you're
asking the user to confirm the honesty of the permission name.

Overall, I think if you're going to allow "mem lock" for an unprivileged
VM that's fine, but the expectation should be that we're allowing this
for *any* VM, and not offering per-VM access control on that usage.
> > > > The application-provided handler would be responsible for performing
> > > > the privileged operation (in this case raising the memlock limit for
> > > > a process). For KubeVirt, virt-launcher would have to pass the baton
> > > > to virt-handler.
> > > >
> > > > If such a handler is installed, libvirt would invoke it (and likely
> > > > go through some sanity checks afterwards); if not, it would attempt
> > > > to perform the privileged operation itself, as it does today.
> >
> > > I think we also need to carefully consider how to do the callbacks in
> > > our RPC. Currently events don't wait for the client to respond, and these
> > > would need to do so.
> > >
> > > Also, use of these should e.g. require keepalive to be used.
>
> > What you're describing is no longer callbacks; it is method
> > calls operating in the reverse direction to what they do today.
> > Furthermore, this would tie up the RPC worker dispatch thread
> > while waiting for a response. This could easily result in a
> > deadlock if the client app servicing this request was busy
> > doing something else, or had to wait on its own locks while
> > those locks were held by another thread that is waiting on
> > libvirt. Then there's the complexity of tying this up into the
> > actual RPC implementations of both client and server.
>
> My idea for avoiding callback failures was to split
> e.g. virDomainCreate() similarly to how virDomainMigrate...() is
> split, i.e. into phases which would be invisible to the caller unless
> they want to do "something special" with the domain. Of course this
> does not work for actions initiated by the guest or by libvirt, but
> as a way to start small this was one of the ideas.
I think the experience with migration would strongly bias me against
decomposing the startup process. We had no choice with migrate but
to decompose, because we needed to co-ordinate across multiple hosts,
but it has been a horrible burden, requiring us to re-design the
internal migration steps many, many times. I don't want us to fall
into that trap for other APIs.
> > This really just reinforces the point that this doesn't belong
> > in the public API / RPC layer at all. It needs to all work via
> > a plugin loaded in-process. That plugin can call out to whatever
> > service it wants to use; whether this uses DBus or REST or
> > something else is up to the plugin author.
>
> One counterargument to this was that it would require mgmt app
> developers to write an .so, or at least a C-compatible plugin. While I
> don't see that as an issue (apart from it possibly hindering
> adoption), I can offer an idea which came up before we even discussed
> this particular topic. It actually started from the other end, but let
> me give you some idea of how it might be a more usable approach for
> mgmt apps like KubeVirt.
I don't see the need to write an .so as an unreasonable burden.
The plugin does not need to contain any significant amount of
logic. If desired, it could be nothing more than a shim which talks
to a separate $LANG daemon over an RPC system. Even if you want a
self-contained plugin, it is possible to write shared libraries in
common system languages like C, Go, or Rust, as they can expose shim
functions with C linkage semantics. If you really wanted, it would
be possible to write a C shim that loads the Python/Perl/Ruby/WASM/
etc. runtime, and then write your plugins in dynamic languages too.
> If we provide a way to initialise this "plugin" at runtime, then there
> is no need to fiddle with reloading the daemon. If we provide a way to
> do this over the already existing connection, then it might be even
> easier to use. However, that would require a way to transfer a coded
> algorithm and to be able to run it (ideally with reduced access to the
> system for security purposes). A format that could be pretty usable for
> this is WASM. And before anyone rolls their eyes, give me a minute to
> mention a few points about it:
>
>  - there is no limit on the source code language used
>  - it can be built specifically for the libvirt daemon dynamically (if
>    someone wants that)
>  - it can be run safely by giving it access only to a limited set of
>    predefined APIs
>  - libvirt can run it itself; the runtime can be built in
>
> I'm not saying this is the solution to the points in question, just an
> idea I had a few months ago which led nowhere because I did not have
> enough time to make a simple bytecode run in a C program.
>
> Of course most of the above points can be solved without utilising WASM,
> e.g. by allowing plugins to be handled via the admin APIs, running them
> in a sandbox, etc.
I feel like the type of functionality we need plugins to support is
not something that we need to, nor should, make dynamically configurable
at runtime. If we need matching logic at startup and shutdown of
a VM, or at plug and unplug of a device on a running VM, allowing this
to be changed dynamically creates an unnecessary problem for us. It
should be more than sufficient to configure this at host deployment
time, prior to starting libvirtd. For KubeVirt usage this is certainly
trivial, since they deploy and start a new libvirt at VM startup.

If we had static plugins, someone could write a loadable .so library
that contained the WASM runtime and exposed an RPC API for injecting
WASM code into a running libvirt, allowing the things you describe
above, without tying libvirt to that approach.
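For illustration, "configure at host deployment time" could be as simple as one key in a daemon config file. This is a made-up setting, not anything that exists today; it just shows how little machinery the static approach needs:

```
# Hypothetical libvirtd.conf-style setting -- no such key exists today.
# The daemon would dlopen() this once at startup, before any VM runs;
# changing it would require a daemon restart, which is the point.
hook_plugin = "/usr/lib64/libvirt/plugins/memlock-shim.so"
```

For KubeVirt, which lays down the config and starts a fresh libvirt per VM anyway, this is no extra operational cost.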
With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|