On Wed, Dec 07, 2022 at 12:07:11PM +0000, Daniel P. Berrangé wrote:
On Wed, Dec 07, 2022 at 12:42:06PM +0100, Martin Kletzander wrote:
> On Thu, Dec 01, 2022 at 10:17:49AM +0000, Daniel P. Berrangé wrote:
> > The other end of the
> >
> > virInternalSetProcessSetMaxMemLockHandler
> >
> > wouldn't have ability to validate the VM identity even if we
> > passed it, since the master source of VM identity info is
> > the unprivileged and untrusted component.
> >
> > This means it is a big challenge to do more than just a blanket
> > allow/deny for the entire 'max mem lock' feature, rather than try
> > to finesse it per VM.
> >
>
> Exactly what I was afraid of with another approach I discussed with
> someone else a while ago. If you start providing ways to do arbitrary
> privileged operations, then you are effectively giving away privileged
> access.
>
> In this case I think it was an unfortunate choice of an API. If the API
> is *designed* to provide the proper identifying information, then the
> management application can then choose the action properly.
I think it is challenging no matter what because the privileged component
is placing trust in the unprivileged component to supply honest identifying
info. This is a key reason why polkit ACL checks are done based on
process ID + permission name. Process ID can't be faked, and you're asking
the user to confirm the honesty of the permission name.
Overall, I think if you're going to allow "mem lock" to an unprivileged
VM that's fine, but the expectation should be that we're allowing this
for *any* VM, and not able to offer per-VM access control on that usage.
What I meant was something more along the lines of "place_vm_in_cgroup",
where we would offload the cgroup placement to kubevirt. It already has
to trust us with the information we provide now (which threads are placed
in which cgroups). Having the callback be per-connection, and calling it
on the connection that is starting the VM, would also make that trust
argument easier.
> > > > The application-provided handler would be responsible for performing
> > > > the privileged operation (in this case raising the memlock limit for
> > > > a process). For KubeVirt, virt-launcher would have to pass the baton
> > > > to virt-handler.
> > > >
> > > > If such an handler is installed, libvirt would invoke it (and likely
> > > > go through some sanity checks afterwards); if not, it would attempt
> > > > to perform the privileged operation itself, as it does today.
> > >
> > > I think we also need to carefully consider how to do the callbacks in
> > > our RPC. Currently events don't wait for the client to respond and
> > > these would need to do so.
> > >
> > > Also use of these should e.g. require keepalive to be used.
> >
> > What you're describing is no longer callbacks, but it is method
> > calls operating in the reverse direction to what they do today.
> > Furthermore this would tie up the RPC worker dispatch thread
> > while waiting for a response. This could easily result in a
> > deadlock situation if the client app servicing this request
> > was busy doing something else, or had to wait on its own
> > locks, but those locks were held by another thread that is
> > waiting on libvirt. Then there's the complexity of tying
> > this up into the actual RPC implementations of both client
> > and server.
> >
>
> My idea how to avoid callback failures was to split
> e.g. virDomainCreate() similarly to how virDomainMigrate...() is,
> i.e. into phases which would be invisible from the caller unless they
> want to do "something special" with the domain. Of course this does not
> work for actions initiated by guest/libvirt, but to start small this was
> one of the ideas.
I think the experience with migration would strongly bias me against
decomposing the startup process. We had no choice with migrate but
to decompose because we needed to co-ordinate across multiple hosts,
but it has been a horrible burden, requiring us to re-design the
internal migration steps many many times. I don't want us to get
into that trap for other APIs.
Fair point.
> > This really just reinforces the point that this doesn't belong
> > in the public API / RPC layer at all. It needs to all work via
> > a plugin loaded in-process. That plugin can call out to whatever
> > service it wants to use, whether this uses DBus or REST or
> > something else is upto the plugin author.
> >
>
> One counterargument to this was that it would require mgmt app
> developers to write an .so, or at least a C-compatible plugin. While I
> don't see that as an issue (apart from it possibly hindering the
> adoption) I can offer an idea which came before we even discussed this
> particular topic. It actually started from the other end, but let me
> give you some idea how it might be more usable approach for mgmt apps
> like kubevirt.
I don't see the need to write an .so as an unreasonable burden.
Me neither. I suggested that if someone wants to amend the behaviour of
libvirt in a specific way (e.g. requesting that the daemon not call some
function), that can already be done (or at least tried and tested) by
LD_PRELOAD-ing a library. The first time I said it, it was in a bit of
anger, but thinking about it the point is still valid. Of course that's
not fitting for any production usage, for which we'd want something at
least resembling a plug-in system.
The plugin does not need to contain any significant amount of
logic. If desired it could be nothing more than a shim which talks
to a separate $LANG daemon over an RPC system. Even if wanting a
self-contained plugin, it's possible to write shared libraries in
common system languages like C, Go, Rust, as they can expose shim
functions with C linkage semantics. If you really wanted, it would
be possible to write a C shim that loads the python/perl/ruby/wasm/
etc runtime, and then write your plugins in dynamic languages too.
> If we provide a way to initialise this "plugin" at runtime, then there
> is no need to mangle with reloading the daemon. If we provide a way to
> do this over the already existing connection, then it might be even
> easier to use. However that would require a way to transfer a coded
> algorithm and being able to run it (ideally with reduced access to the
> system for security purposes). A format that could be pretty usable for
> this is WASM. And before anyone rolls their eyes give me a minute to
> mention few points about it:
>
> - there is no limit to the source code language used
>
> - it can be built particularly for the libvirt daemon dynamically (if
> someone wants that)
>
> - it can be run safely by giving it access only to a limited set of
> predefined APIs
>
> - libvirt can run it itself, the runtime can be built in
>
> I'm not saying this is the solution to the points in question, just an
> idea I had few months ago which lead nowhere because I did not have
> enough time to make a simple bytecode run in a C program.
>
> Of course most of the above points can be solved without utilising WASM,
> e.g. by allowing plugins to be handled by admin APIs, running them in a
> sandbox, etc.
I feel like the type of functionality we need plugins to support is
not something that we need to, nor should, make dynamically configurable
at runtime. If we need to have matched logic at startup and shutdown of
a VM, or plug and unplug of a device on a running VM, allowing this
to be changed dynamically creates an unnecessary problem for us. It
should be more than sufficient to configure this at host deployment
time prior to starting libvirtd. For kubevirt usage this is certainly
trivial, since they deploy and start a new libvirt at VM startup.
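Deployment-time configuration could be as simple as a couple of keys in
libvirtd.conf; note that neither of the settings below exists today,
they are purely illustrative of the static approach:

```
# Hypothetical settings -- not real libvirt configuration options.
# Plugins listed here would be loaded from plugin_dir before the
# daemon accepts connections; changing them requires a restart.
plugin_dir = "/usr/lib64/libvirt/plugins"
plugins = [ "memlock-shim" ]
```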
If we had static plugins, someone could write a loadable .so library
that contained the WASM runtime, and exposed a RPC API for injecting
WASM code to a running libvirt, allowing the things you describe
above, without having to tie libvirt to that approach.
Once there is an interface defined (for example the plugin) then it can do
anything, of course. The advantage was mainly in code transfer from the
client to the server; once that is out of the question (well, it was never
really in), the other advantages fall apart too.
Static plug-ins are the safest (and probably easiest) choice. The argument
about writing them, and most of the other arguments as well, are mostly
speculation until people from the mgmt apps weigh in with their opinions.