On 07/27/2018 11:01 AM, Daniel P. Berrangé wrote:
On Fri, Jul 27, 2018 at 09:56:23AM +0200, Michal Privoznik wrote:
> Dear list,
>
> we have this very old bug [1] that I just keep pushing in front of me. I
> made several attempts to fix it. However, none of them made into the
> tree. I guess it's time to have discussion what to do about it. IIRC,
> the algorithm that I implemented last was to keep original label in
> XATTRs (among with some ref counter) and the last one to restore the
> label will look there and find the original label. There was a problem
> with two libvirtds fighting over a file on shared FS.
IIRC there was a second problem around security of XATTRs. eg, if we set
an XATTR it was possible for QEMU to change it once we given QEMU privs
on the image. Thus a malicious QEMU can cause libvirtd to change the
image to an arbitrary user on shutdown.
I think it was possible to deal with that on Linux, because the kernel
hsa some xattr namespacing that is reserved for only root. This protection
is worthless though when using NFS images as you can't trust the remote NFS
client to honour the same restriction.
> So I guess my question is can we come up with a design that would work?
> Or at least work to the extent that we're satisfied with?
>
> Personally, I like the XATTRs approach. And to resolve the NFS race we
> can invent yet another lockspace to guard labelling - I also have a bug
> for that [2] (although, I'm not that familiar with lockspaces). I guess
> doing disk metadata locking is not going to be trivial, is it?
Yeah, for sure we need two distinct locking regimes. Our current locking
aims to protect disk content, and the lifetime of the lock is the entire
duration of the QEMU process. For XATTRs, we only need write protection
for the short time window in which the security drivers execute, which
is at start, during hotplug/unplug, during shutdown, and finally migration.
I think we could achieve this with the virtlockd if we make it use the
same locking file, but a different byte offset within the file. Just
need to make sure it doesn't clash with the byte offset QEMU has chosen,
nor the current offset we use.
So do we really need virtlockd for this? I mean, the whole purpose of it
is to hold locks so that libvirtd can restart independently, without
losing the lock attached. However, since the metadata lock will be held
only for a fraction of a second and will be not held through more than a
single API aren't we just fine with libvirtd holding the lock? The way I
imagine this to work is the following:
lock_for_metadata(vm->def); // locks an unique byte in all the disks
qemuSecuritySetAllLabel();
...
qemuProcessStartCPUs(); // disk content lock is acquired here
unlock_for_metadata(vm->def);
Another way to look at it is: it's libvirtd who relabels all the paths
and the metadata lock should guard just that. Speaking of which, we need
these locks to wait for each other, not just set-or-fail. Again,
something that goes against virtlockd view of locking.
Frankly, I've started implementing this with virtlockd already, but the
changes I made so far simply do not feel right, e.g. I have to change
lock_protocol.x so that virLockSpaceProtocolAcquireResourceArgs has this
'type' argument which tells virtlockd which byte we want to lock.
Michal