On Mon, Aug 06, 2018 at 10:02:44AM +0200, Michal Prívozník wrote:
> On 07/27/2018 11:01 AM, Daniel P. Berrangé wrote:
>> On Fri, Jul 27, 2018 at 09:56:23AM +0200, Michal Privoznik wrote:
>>> Dear list,
>>>
>>> we have this very old bug [1] that I just keep pushing in front of
>>> me. I have made several attempts to fix it, but none of them made it
>>> into the tree. I guess it's time to discuss what to do about it.
>>> IIRC, the algorithm that I implemented last kept the original label
>>> in XATTRs (along with a ref counter), and the last one to restore
>>> the label would look there and find the original label. There was a
>>> problem with two libvirtds fighting over a file on a shared FS.
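For the archives, the idea in rough C. The attribute names and the
return conventions here are invented for illustration, not what any of
the actual patches used, and the error handling is minimal:

/*
 * Sketch of the XATTR idea: the first security driver to relabel a
 * file stashes the original label plus a refcount in XATTRs; each
 * restore decrements the count and only the final one reapplies the
 * stored label.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

#define ATTR_LABEL "trusted.libvirt.security.dac" /* original owner */
#define ATTR_REF   "trusted.libvirt.security.ref" /* active users   */

static int remember_label(const char *path, const char *label)
{
    char buf[32];
    ssize_t len = getxattr(path, ATTR_REF, buf, sizeof(buf) - 1);
    unsigned long ref = 0;

    if (len >= 0) {
        buf[len] = '\0';
        ref = strtoul(buf, NULL, 10);
    } else if (setxattr(path, ATTR_LABEL, label, strlen(label), 0) < 0) {
        return -1; /* first user records the pre-libvirt label */
    }

    snprintf(buf, sizeof(buf), "%lu", ref + 1);
    return setxattr(path, ATTR_REF, buf, strlen(buf), 0);
}

/* Returns 1 if 'label' was recovered and should now be reapplied,
 * 0 if other domains still use the file, -1 on error. */
static int recall_label(const char *path, char *label, size_t size)
{
    char buf[32];
    ssize_t len = getxattr(path, ATTR_REF, buf, sizeof(buf) - 1);
    unsigned long ref;

    if (len < 0)
        return -1;
    buf[len] = '\0';
    ref = strtoul(buf, NULL, 10);

    if (ref > 1) {
        /* Others still use the file: just drop our reference. */
        snprintf(buf, sizeof(buf), "%lu", ref - 1);
        return setxattr(path, ATTR_REF, buf, strlen(buf), 0);
    }

    /* Last user: recover the original label and clean up. */
    len = getxattr(path, ATTR_LABEL, label, size - 1);
    if (len < 0)
        return -1;
    label[len] = '\0';
    removexattr(path, ATTR_LABEL);
    removexattr(path, ATTR_REF);
    return 1;
}

The shared-FS race is exactly the read-modify-write on ATTR_REF above:
two libvirtds can interleave it, which is why some locking is needed
around this sequence.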
>>
>> IIRC there was a second problem around the security of XATTRs, e.g.
>> if we set an XATTR it was possible for QEMU to change it once we had
>> given QEMU privileges on the image. Thus a malicious QEMU could cause
>> libvirtd to change the image to an arbitrary user on shutdown.
>>
>> I think it is possible to deal with that on Linux, because the kernel
>> has some xattr namespacing that is reserved for root only. This
>> protection is worthless though when using NFS images, as you can't
>> trust the remote NFS client to honour the same restriction.
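Concretely, "trusted." xattrs need CAP_SYS_ADMIN on Linux, which is
easy to demonstrate (the attribute name below is invented for
illustration):

#include <stdio.h>
#include <sys/xattr.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    /* "trusted." xattrs require CAP_SYS_ADMIN, so a de-privileged
     * QEMU cannot rewrite them on a local filesystem - but an NFS
     * server only sees whatever the remote client chooses to send. */
    if (setxattr(argv[1], "trusted.libvirt.test", "x", 1, 0) < 0) {
        perror("setxattr"); /* EPERM when run without CAP_SYS_ADMIN */
        return 1;
    }
    return 0;
}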
>>
>>> So I guess my question is: can we come up with a design that would
>>> work? Or at least work to the extent that we're satisfied with?
>>>
>>> Personally, I like the XATTRs approach. And to resolve the NFS race
>>> we can invent yet another lockspace to guard labelling - I also have
>>> a bug for that [2] (although I'm not that familiar with lockspaces).
>>> I guess doing disk metadata locking is not going to be trivial, is
>>> it?
>>
>> Yeah, for sure we need two distinct locking regimes. Our current
>> locking aims to protect disk content, and the lifetime of the lock is
>> the entire duration of the QEMU process. For XATTRs, we only need
>> write protection for the short time window in which the security
>> drivers execute, which is at start, during hotplug/unplug, during
>> shutdown, and finally during migration.
>>
>> I think we could achieve this with virtlockd if we make it use the
>> same lock file, but a different byte offset within it. We just need
>> to make sure it clashes neither with the byte offset QEMU has chosen
>> nor with the offset we currently use.
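Roughly like this. The offsets below are made up for illustration; the
real values would have to avoid whatever bytes QEMU locks on the image
itself:

/* Sketch: content and metadata locks live in the same lock file, at
 * different byte offsets. */
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

enum {
    LOCK_BYTE_CONTENT  = 0, /* held for the whole QEMU lifetime     */
    LOCK_BYTE_METADATA = 1, /* held only while security drivers run */
};

static int lock_byte(int fd, off_t offset, bool wait)
{
    struct flock fl = {
        .l_type   = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start  = offset,
        .l_len    = 1, /* lock a single byte */
    };
    return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}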
> So do we really need virtlockd for this? I mean, its whole purpose is
> to hold locks so that libvirtd can restart independently, without
> losing the locks it has acquired. However, since the metadata lock
> will be held only for a fraction of a second and will not be held
> across more than a single API call, aren't we just fine with libvirtd
> holding the lock? The way I imagine this to work is the following:
There were two reasons for virtlockd. The first was the persistent
holding of locks, but the other reason is that POSIX fcntl() locks are
horrific: if one thread has an FD open with a lock held and another
thread opens and closes the same underlying file, the first thread's
lock gets dropped. We can't be confident that other threads won't open
the file, so the only way to be safe was to put the locking into a
separate process where we know exactly what all threads are doing.
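A standalone demo of that pitfall, for anyone who wants to watch it
happen. Two FDs in one process show the same per-process semantics as
two threads; "demo.lock" is just a scratch path and error checking is
omitted:

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int check_locked(const char *path)
{
    /* Probe from a child: POSIX locks never conflict with their own
     * process, so only another process can see the range as locked. */
    pid_t pid = fork();
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                            .l_start = 0, .l_len = 1 };
        fcntl(fd, F_GETLK, &fl);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    waitpid(pid, &status, 0);
    return WEXITSTATUS(status); /* 1 == still locked */
}

int main(void)
{
    const char *path = "demo.lock";
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 1 };
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);

    fcntl(fd1, F_SETLK, &fl);                   /* "thread 1" locks  */
    printf("locked: %d\n", check_locked(path)); /* -> locked: 1      */

    int fd2 = open(path, O_RDWR);               /* unrelated code... */
    close(fd2);                     /* ...silently drops fd1's lock! */
    printf("locked: %d\n", check_locked(path)); /* -> locked: 0      */

    close(fd1);
    unlink(path);
    return 0;
}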
> Frankly, I've started implementing this with virtlockd already, but
> the changes I made so far simply do not feel right, e.g. I have to
> change lock_protocol.x so that virLockSpaceProtocolAcquireResourceArgs
> has this 'type' argument which tells virtlockd which byte we want to
> lock.
We could perhaps make use of the "flags" field, giving a flag to
identify the lockspace to use. This could be turned into an offset on
the server side.
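i.e. something like the following sketch. The flag name and offset are
invented here; IIRC the first two bits are already taken by the SHARED
and AUTOCREATE flags in lock_protocol.x:

#include <sys/types.h>

#define VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_METADATA (1U << 2)

enum { CONTENT_LOCK_OFFSET = 0, METADATA_LOCK_OFFSET = 1 };

/* Server side: turn the wire flag into the byte offset to lock, so
 * the .x protocol structs need no new 'type' member. */
static off_t lock_offset_for(unsigned int flags)
{
    if (flags & VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_METADATA)
        return METADATA_LOCK_OFFSET;
    return CONTENT_LOCK_OFFSET;
}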
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|