On Wed, Nov 30, 2022 at 09:47:02AM +0100, Peter Krempa wrote:
On Tue, Nov 29, 2022 at 09:05:33 -0800, Andrea Bolognani wrote:
[...]
> Proposal
> --------
>
> In order to address the issues outlined above, I propose that we
> introduce a new set of APIs in libvirt.
>
> These APIs would expose some of the inner workings of libvirt, and
> as such would come with *massively reduced* stability guarantees
> compared to the rest of our public API.
>
> The idea is that applications such as KubeVirt, which track libvirt
> fairly closely and stay pinned to specific versions, would be able to
> adapt to changes in these APIs relatively painlessly. More
> traditional management applications such as virt-manager would simply
> not opt into using the new APIs and maintain the status quo.
>
> Using memlock as an example, the new API could look like
>
> typedef int (*virInternalSetMaxMemLockHandler)(pid_t pid,
> unsigned long long bytes);
>
> int virInternalSetProcessSetMaxMemLockHandler(virConnectPtr conn,
>
> virInternalSetMaxMemLockHandler handler);
As proposed, with 'pid'/'bytes' as arguments the callbacks don't
really
feel to provide any form of privilege restriction to libvirt.
Libvirt can choose to invoke the callback on any PID it likes and the
application registering the callback doesn't have any way to check that
the pid actually belongs to the started VM.
I'm afraid that with many of those you'll have the same issue.
At the very least the API/callback should carry additional information
outlining to which object the callback invocation is tied to.
(I understand that kubevirt most likely can assume that it's the single
VM it's handling inside that specific container, though)
By carefuly scoping this though the management layer can at least in
some situations trust this the data (e.g. if no untrusted code ran yet).
In fact libvirt does the same e.g. when probing vCPU thread pids at
startup (while cpus) are paused.
I simply want to point out that by this mechanism the code inside the
"unprivileged" container is not as unprivileged as it has backdoors to
higher layers to do privileged operations and the abstraction in the
callbacks needs to be carefully chosen so that it is not simply a
security charade.
Yep, authorization policy is the single biggest design challenge
with doing privilege separation and it is way too easy to screw
it up, even with very simple operations.
In the case of the libvirt polkit access control, polkit uses
SCM_RIGHTS to identify the calling process, and prompts the user
for password. We also provide the VM uuid, name, id along side.
The key thing there though is that the uuid, name + id have been
validated by libvirt which is the trusted component.
The other end of the
virInternalSetProcessSetMaxMemLockHandler
wouldn't have ability to validate the VM identity even if we
passed it, since the master source of VM identity info is
the unprivileged and untrusted component.
This means it is a big challenge to do more than just a blanket
allow/deny for the entire 'max mem lock' feature, rather than try
to finese it per VM.
> The application-provided handler would be responsible for
performing
> the privileged operation (in this case raising the memlock limit for
> a process). For KubeVirt, virt-launcher would have to pass the baton
> to virt-handler.
>
> If such an handler is installed, libvirt would invoke it (and likely
> go through some sanity checks afterwards); if not, it would attempt
> to perform the privileged operation itself, as it does today.
I think we also need to carefully consider how to do the callbacks in
our RPC. Currently events don't wait for the client to respond and these
would need to do so.
Also use of these should e.g. require keepalive to be used.
What you're describing is no longer callbacks, but it is method
calls operating in the reverse direction to what they do today.
Furthermore this would tie up the RPC worker dispatch thread
while waiting for a response. This could easily result in a
deadlock situation if the client app servicing this request
was busy doing something else, or had to wait on its own
locks, but those locks were held by another thread that is
waiting on libvirt. Then there's the complexity of tieing
this up into thue actual RPC implementations of both cllient
and server.
This really just re-inforces the point that this doesn't belong
in the public API / RPC layer at all. It needs to all work via
a plugin loaded in-process. That plugin can call out to whatever
service it wants to use, whether this uses DBus or REST or
something else is upto the plugin author.
With regards,
Daniel
--
|:
https://berrange.com -o-
https://www.flickr.com/photos/dberrange :|
|:
https://libvirt.org -o-
https://fstop138.berrange.com :|
|:
https://entangle-photo.org -o-
https://www.instagram.com/dberrange :|