On 07/27/2018 11:01 AM, Daniel P. Berrangé wrote:
On Fri, Jul 27, 2018 at 09:56:23AM +0200, Michal Privoznik wrote:
> Dear list,
>
> we have this very old bug [1] that I just keep pushing in front of me. I
> made several attempts to fix it. However, none of them made into the
> tree. I guess it's time to have discussion what to do about it. IIRC,
> the algorithm that I implemented last was to keep original label in
> XATTRs (among with some ref counter) and the last one to restore the
> label will look there and find the original label. There was a problem
> with two libvirtds fighting over a file on shared FS.
IIRC there was a second problem around security of XATTRs. eg, if we set
an XATTR it was possible for QEMU to change it once we given QEMU privs
on the image. Thus a malicious QEMU can cause libvirtd to change the
image to an arbitrary user on shutdown.
I think it was possible to deal with that on Linux, because the kernel
hsa some xattr namespacing that is reserved for only root.
Yes, there are basically 4 levels (defined by prefix of attribute name):
security.
system.
trusted.
user.
For the first two there is no restriction on side of VFS (but there
might be some coming from underlying FS and/or security module). The
trusted. can be get/set only by CAP_SYS_ADMIN process. And finally user.
can be set only one regular files (plus some other restrictions), but
it's available for basically everybody.
IIRC, I used trusted. namespace because of CAP_SYS_ADMIN. Nota bene, if
a file is stored on NFS and only CAP_SYS_ADMIN can change the
attribute's value they are CAP_SYS_ADMIN already - they might change
seclabel too. Why bother mangling libvirt's records. IOW, if somebody
runs qemu with CAP_SYS_ADMIN all bets are off anyway.
This would of course mean that there's no label restore for session
daemon, but that can hardly ever work. Do we even initialize DAC/SELinux
drivers for session daemon?
This protection
is worthless though when using NFS images as you can't trust the remote NFS
client to honour the same restriction.
> So I guess my question is can we come up with a design that would work?
> Or at least work to the extent that we're satisfied with?
>
> Personally, I like the XATTRs approach. And to resolve the NFS race we
> can invent yet another lockspace to guard labelling - I also have a bug
> for that [2] (although, I'm not that familiar with lockspaces). I guess
> doing disk metadata locking is not going to be trivial, is it?
Yeah, for sure we need two distinct locking regimes. Our current locking
aims to protect disk content, and the lifetime of the lock is the entire
duration of the QEMU process. For XATTRs, we only need write protection
for the short time window in which the security drivers execute, which
is at start, during hotplug/unplug, during shutdown, and finally migration.
I guess migration would be covered by start. Since these locks would be
held only for very short period (basically they would be acquired just
before we try to set seclabel and released after disk content lock is
successfully acquired) they can be exclusive.
However, since we want to protect more than just disk seclabels, we need
to acquire this new lock from more places.
I think we could achieve this with the virtlockd if we make it use the
same locking file, but a different byte offset within the file. Just
need to make sure it doesn't clash with the byte offset QEMU has chosen,
nor the current offset we use.
Makes sense. So the first step is to introduce metadata locking. I'll
look into that.
Michal