On 10/17/2018 11:45 AM, Daniel P. Berrangé wrote:
On Wed, Oct 17, 2018 at 11:06:46AM +0200, Michal Privoznik wrote:
> Trying to use virlockd to lock metadata turns out to be too big
> gun. Since we will always spawn a separate process for relabeling
> we are safe to use thread unsafe POSIX locks and take out
> virtlockd completely out of the picture.
Note safety of POSIX locks is not merely about other threads.
If *any* code we invoke from security_dac or security_selinux
were to open() & close() the file we're labelling we'll
loose the locks.
This is a little concerning since we call out to a few functions
in src/util that might not be aware they should *never* open)(
and close() the path they're given. It looks like we lucky at
the moment, but this has future gun -> foot -> ouch potential.
Likewise we also call out to libselinux APIs. I've looked at
the code for the APIs we call and again, right now, we look
to be safe, but there's some risk there.
I'm not sure whether this is big enough risk to invalidate this
forking approach or not though. This kind of problem was the
reason why virtlockd was created to hold locks in a completely
separate process from the code with the critical section.
But unless we check for hard links virtlockd will suffer from the POSIX
locks too. For instance, assume that B is a hardlink to A. Then, if a
process opens A, locks it an the other opens B and closes it the lock is
released. Even though A and B look distinct.
The problem with virlockd is that our current virlockspace impl is not
aware of different offsets. We can try to make it so, but that would
complicate the code a lot. Just imagine that there's a domain already
running and it holds disk content lock over file C. If an user wants to
start a domain (which too has disk C), libvirtd will try to lock C again
(even though on different offset) but it will fail to do so because C is
already locked.
Another problem with virtlockd that I faced was that with current
implementation the connection to virlockd is opened in metadataLock(),
where the connection FD is dup()-ed to prevent connection closing. The
connection is then closed in metadataUnlock() where the dup()-ed FD is
closed. This works pretty much good, except for fork(). If a fork()
occurs between metadataLock() and metadataUnlock() the duplicated FD is
leaked into the child and we can't be certain when will the child close
it. The worst case scenario is that the child will close the FD when we
are holding some resources, which results in virtlockd killing libvirtd
(= registered owner of the locks).
Alternatively, we can switch to OFD locks and have the feature work only
on Linux. If anybody from BSD users will want the feature they will have
to push their kernel developers to implement some sensible file locking.
Also, one of the questions is what we want metadata locking to be. So
far I am aiming on having exclusive access to the file that we are
chown()-ing (or setfcon()-ing) so that I can put a code around it that
manipulates XATTR where the original owner would be stored. This way the
metadata lock is held only for a fraction of a second.
I guess we don't want the metadata lock be held through whole run of a
domain, i.e. obtained at domain start and released at domain shutdown. I
don't see much benefit in that.
Michal