On Fri, Jan 18, 2019 at 10:16:38AM +0000, Daniel P. Berrangé wrote:
On Fri, Jan 18, 2019 at 10:39:35AM +0100, Erik Skultety wrote:
> Hi,
> this is a summary of a private discussion I've had with guys CC'd on this
> email
> about finding a solution to [1] - basically, the default permissions on
> /dev/sev (below) make it impossible to query for SEV platform capabilities,
> since by default we run QEMU as qemu:qemu when probing for capabilities. It's
> worth noting that this is only relevant to probing, since for a proper QEMU
> VM we create a mount namespace for the process and chown all the nodes (needs a
> SEV fix though).
>
> # ll /dev/sev
> crw-------. 1 root root
>
> I suggested either force running QEMU as root for probing (despite the obvious
> security implications) or using namespaces for probing too. Dan argued that
> this would have a significant perf impact and suggested we ask systemd to add a
> global udev rule.
I've just realized there is a potential 3rd solution. Remember there is
actually nothing inherently special about the 'root' user as an account
ID. 'root' gains its powers from the fact that it has many capabilities
by default. 'qemu' can't access /dev/sev because it is owned by a
different user (happens to be root) and 'qemu' does not have capabilities.
So we can make probing work by using our capabilities code to grant
CAP_DAC_OVERRIDE to the qemu process we spawn. So probing still runs
as 'qemu', but can nonetheless access /dev/sev while it is owned
by root. We were not using 'qemu' for the sake of security, as the probing
process is not executing any untrustworthy code, so we don't lose any
security protection by granting CAP_DAC_OVERRIDE.
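
The permission argument above can be illustrated with a minimal sketch in
Python. It is a simplified model of the kernel's discretionary access check for
read/write opens, not libvirt or kernel code. The function name dac_allows and
the uid/gid value 107 used for 'qemu' are assumptions made for illustration
only, and the model deliberately ignores ACLs, SELinux and execute permission.

```python
import stat


def dac_allows(mode, owner_uid, owner_gid, uid, gids,
               want_read=True, want_write=True, cap_dac_override=False):
    """Simplified model of the Linux read/write permission check.

    mode       -- permission bits of the node (e.g. 0o600 for /dev/sev)
    owner_uid  -- uid owning the node
    owner_gid  -- gid owning the node
    uid, gids  -- identity of the process trying to open the node
    """
    # CAP_DAC_OVERRIDE bypasses read/write permission checks entirely,
    # whatever uid the process runs as.
    if cap_dac_override:
        return True

    # Exactly one permission class applies: owner, else group, else other.
    if uid == owner_uid:
        read_bit, write_bit = stat.S_IRUSR, stat.S_IWUSR
    elif owner_gid in gids:
        read_bit, write_bit = stat.S_IRGRP, stat.S_IWGRP
    else:
        read_bit, write_bit = stat.S_IROTH, stat.S_IWOTH

    if want_read and not mode & read_bit:
        return False
    if want_write and not mode & write_bit:
        return False
    return True


if __name__ == "__main__":
    QEMU_UID = QEMU_GID = 107  # hypothetical ids for the 'qemu' account
    # /dev/sev as shown in the listing above: crw------- root root
    print(dac_allows(0o600, 0, 0, QEMU_UID, {QEMU_GID}))
    print(dac_allows(0o600, 0, 0, QEMU_UID, {QEMU_GID},
                     cap_dac_override=True))
```

In this model the plain 'qemu' identity is refused only because it falls into
the "other" class of a 0600 node, while the same identity holding
CAP_DAC_OVERRIDE is allowed without any change to the node's ownership.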
Truth be told, I was playing with the idea of using CAP_ capabilities.
However, because I'm paranoid (maybe too much), I took your comment "malicious
QEMU" as an axiom that the process is not to be trusted *at all*, regardless of
what it's doing, so I decided to ditch the idea: with CAP_DAC_OVERRIDE, you're
essentially giving QEMU an open ticket to any file on the filesystem (okay, not
any and not always, because QEMU executed by system libvirt will be confined, so
SELinux will still be our guardian angel).
[...]
>
> So, in conclusion, we absolutely need input from Brijesh (AMD) on whether there
> was something more than the low limit on the number of guests behind the default
> permissions. Also, we'd like to get some details on how the limit is managed,
> helping to assess the approaches mentioned above.
Regardless of this problem, I think it is important to have some docs
in either libvirt or QEMU that describe the resource usage constraints
so that management apps can decide how to best take advantage of SEV.
Agreed. In the meantime, I'll start working on adding /dev/sev into the
namespace only for SEV-enabled guests.
Thanks,
Erik