On Tue, Jan 29, 2019 at 06:40:08PM +0000, Daniel P. Berrangé wrote:
On Tue, Jan 29, 2019 at 05:15:42PM +0100, Erik Skultety wrote:
> On Wed, Jan 23, 2019 at 03:02:28PM +0000, Singh, Brijesh wrote:
> >
> >
> > On 1/23/19 7:36 AM, Daniel P. Berrangé wrote:
> > > On Wed, Jan 23, 2019 at 02:33:01PM +0100, Erik Skultety wrote:
> > >> On Wed, Jan 23, 2019 at 01:24:13PM +0000, Daniel P. Berrangé wrote:
> > >>> On Wed, Jan 23, 2019 at 02:22:12PM +0100, Erik Skultety wrote:
> > >>>> On Wed, Jan 23, 2019 at 01:10:42PM +0000, Daniel P. Berrangé
wrote:
> > >>>>>
> > >>>>> It still looks like we can solve this entirely in libvirt
by just giving
> > >>>>> the libvirt capabilities probing code CAP_DAC_OVERRIDE.
This would make
> > >>>>> libvirt work for all currently released SEV support in
kernel/qemu.
> > >>>>
> > >>>> Sure we can, but that would make libvirt the only legitimate
user of /dev/sev
> > >>>> and everything else would require the admin to change the
permissions
> > >>>> explicitly so that other apps could at least retrieve the
platform info, if
> > >>>> it was intended to be for public use?
> > >>>> Additionally, we'll still get shot down by SELinux because
svirt_t wouldn't be
> > >>>> able to access /dev/sev by default.
> > >>>
> > >>> That's separate form probing and just needs SELinux policy to
define
> > >>> a new sev_device_t type and grant svirt_t access to it.
> > >>
> > >> I know, I misread "we can solve this entirely in libvirt"
then, I thought you
> > >> the SELinux part was included in the statement, my bad then. Still,
back to the
> > >> original issue, we could technically do both, libvirt would have run
qemu with
> > >> CAP_DAC_OVERRIDE and we keep working with everything's been
released for
> > >> SEV in kernel/qemu and for everyone else, systemd might add 0644 for
/dev/sev,
> > >> that way, everyone's happy, not that I'd be a fan of libvirt
often having
> > >> to work around something because projects underneath wouldn't
backport fixes to
> > >> all the distros we support, thus leaving the dirty work to us.
> > >
> > > Setting 0644 for /dev/sev looks unsafe to me unless someone can show
> > > where the permissions checks take place for the many ioctls that
> > > /dev/sev allows, such that only SEV_PDH_CERT_EXPORT or
SEV_PLATFORM_STATUS
> > > is allowed when /dev/sev is opened by a user who doesn't have write
> > > permissions.
> > >
> >
> > Agree its not safe to do 0644.
> >
> > Currently, anyone who has access to /dev/sev (read or write) will be
> > able to execute SEV platform command. In other words there is no
> > permission check per command basis. I must admit that while developing
> > the driver I was under assumption that only root will ever access the
> > /dev/sev device hence overlooked it. But now knowing that others may
> > also need to access the /dev/sev, I can submit patch in kernel to do
> > per command access control.
> >
> > Until then, can we follow Daniel's recommendation to elevate privilege
> > of the probing code?
>
> So, sorry for not coming back earlier, but I'm still fighting a permission
> issue when opening the /dev/sev device and I honestly don't know what's
> happening. If you apply the patch bellow (or attachment) and you run libvirtd
> with LIBVIRT_LOG_OUTPUTS=1:file:<your_log_file> env, you should see something
> like this in the logs:
>
> warning : virExec:778 : INHERITABLE CAPS: 'dac_override'
> warning : virExec:781 : EFFECTIVE CAPS: 'dac_override'
> warning : virExec:784 : PERMITTED CAPS: 'dac_override'
> warning : virExec:787 : BOUNDING CAPS: 'dac_override'
>
> ...and that is right before we issue execve. Yet, if you debug further into the
> QEMU process right after execve, it doesn't report any capabilities at all, so
> naturally it'll still get permission denied. Is there something I'm missing
> here?
Arrrrggghh. We're missing the painful thing that I re-learn the hard way
every time we play with capabilities in libvirt.....
*Even* if you have set INHERITABLE capabilities, they are discarded
on execve(), unless the binary you are exec'ing has file capabilities
set that match those you are trying to give it. Of course QEMU binaries
(and almost all binaries) lack such file capabilities.
Thanks for ^this bit which helped me understand the bits below. When I read the
man page yesterday the first question was, okay, how do I figure out whether
the file capabilities bit is set? Well, use xattrs...which didn't return
anything, so I was puzzled what exactly it should look like, but now that you
explained that most binaries actually lack the file capabilities, I see the
issue clearly :).
The capabilities man page does in fact explain this
[quote]
P’(permitted) = (P(inheritable) & F(inheritable)) |
(F(permitted) & cap_bset)
P’(effective) = F(effective) ? P’(permitted) : 0
where
P denotes the value of a thread capability set before the execve(2)
P' denotes the value of a thread capability set after the execve(2)
F denotes a file capability set
If ... real user ID of the process is 0 (root) then the file inheritable
and permitted sets are defined to be all ones
[/quote]
So this means your code works perfectly *if* we are executing
QEMU as root, but if we are executing QEMU as non-root then
all our work is for nothing.
Now I have actually intentionally mis-quoted from an old man
page here. In Linux 4.3 the kernel devs invented a new set of
"ambient" capabilities [1] which allow programs to deal with this
insanely annoying behaviour.
Right, now the ambient stuff makes sense, because at first I didn't understand
what the hell that was about after reading the man page.
[quote]
P'(ambient) = (file is privileged) ? 0 : P(ambient)
P'(permitted) = (P(inheritable) & F(inheritable)) |
(F(permitted) & cap_bset) | P'(ambient)
P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
[/quote]
so if we added CAP_DAC_OVERRIDE to the ambient capabilities set,
then your patch would work.
We can do that with this patch:
diff --git a/src/util/virutil.c b/src/util/virutil.c
index 5251b66454..be83e91fee 100644
--- a/src/util/virutil.c
+++ b/src/util/virutil.c
@@ -1601,6 +1601,12 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, gid_t *groups, int
ngroups,
goto cleanup;
}
+ for (i = 0; i <= CAP_LAST_CAP; i++) {
+ if (capBits & (1ULL << i)) {
+ prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, i, 0, 0);
+ }
+ }
I'll squash this in and give it another test.
+
ret = 0;
cleanup:
return ret;
though, we need a #ifdef check for existance of PR_CAP_AMBIENT
> An alternative question I've been playing ever since we exchanged the last few
> emails is that can't we wait until the ioctls are compared against permissions
> in kernel so that upstream libvirt (and downstream too for that matter) doesn't
> have to work around it and stick with that workaround for eternity?
IIUC, the SEV feature has already shipped with distros, so we'd effectively
be saying that what we already shipped is unusable to libvirt. This doesn't
feel like a desirable story to me.
It was, but it never worked, it always has been broken in this way. When we
were merging this upstream, we had a terrible shortage of machines and we had
to share, so the first person to provision the machine had already taken care
of the permissions in order to test so that led to this issue having been
overlooked until now. If it ever worked as expected and then we broke it, then
any fix from our side would make sense but otherwise I believe we should fix
this bottom up.
Thanks,
Erik