Re: [libvirt] Use flock() instead of fcntl()

29 Jul 2013


      Am Montag, 29. Juli 2013, 12:52:00 schrieb Daniel P. Berrange:
...
On Fri, Jul 26, 2013 at 11:35:32AM +0100, Daniel P. Berrange wrote:
...
On Fri, Jul 26, 2013 at 12:31:35PM +0200, David Weber wrote:
...
Am Freitag, 26. Juli 2013, 10:14:59 schrieb Daniel P. Berrange:
...
On Fri, Jul 26, 2013 at 10:44:24AM +0200, David Weber wrote:
...
...
Looking again at flock() I see it cannot support locking of
ranges, only
the entire file. This makes it unsuitable as a replacement for
libvirt's
use of fcntl() I'm afraid. I can only sugggest you configure OCFS2
so
that it supports fcntl(), or setup virtlockd to use separate
indirect
leases on a diffrent shared filesystem, or perhaps try sanlock
instead
which doesn't require any special filesystem support.
It's true that flock() doesn't support locking of ranges but I can't
see
how this is necessary.
The code may not currently use ranges, but that doesn't mean it'll
stay
that way. By adding support for flock() we're preventing us from
making
use of this feature in the future, and I don't want to see that.
Just curious,  what would be a possible feature which would require
range
based locking?
I would really like to see flock() support in virtlockd because all
other
solutions have major drawbacks for me.
Currently we use locks to protect the content of disk images.
During startup/shutdown, however, libvirt also makes changes to the
metadata of images by setting SELinux labels, uid/gid ownership and
potentially ACLs. Currently we've delibrately crippled some of our
code during shutdown since it isn't safe in the face of multiple
libvirt's running. We need to introduce locking of file metadata
to protect this code. The metadata locks, however, must not conflict
with the content lock. Thus the reason why we only lock a single
byte (range 0-1) for content locks, is that we want to be able to
additional locks (range 1-2 or similar) for the metadata locks
on the same files.
Perhaps this was already your plan but the second lock mustn't come from the 
same process because it will reset the first lock if you use fcntl(). See for 
more details:
http://0pointer.de/blog/projects/locking.html
http://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html
...
I've been wondering over the weekend if there is any viable alternative
strategy we could take which could allow us to use flock(), without
badly compromising our option for metadata locking. Given that metadata
locks would be only held for very short periods of time, I think it
could be reasonable to say we don't need to do locks on the individual
disks files. It would be sufficient to do locks on the directory
containing the disk file, if we only have flock() available. This would
mean we wouldn't be blocked by the inability to lock byte ranges.
Sounds good!
...
I'm also wondering if there is a way to detect when fcntl() is not
available for OCFS2 ? With the current virtlockd code, what is the
behaviour when it tries to lock a file with fcntl() ?  Does the fcntl
lock attempt succeed, but only provide protection scoped to the single
host, or do we get a hard errno from fcntl() on OCFS (eg EINVAL or
something ?  If we can detect broken fcntl() on OCFS, then we should
not need to have a global config parameter - we would be able to
automatically use fcntl() by default ,and fallback to flock() on
OCFS2 deployments which aren't using cluster technology to enable
fcntl().
Unfortunately there seems to be no way to detect if fcntl() is cluster aware. 
When OCFS2 is used without an userspace cluster stack, the fcntl() locks will 
only exist on that machine. Every other machine in the cluster don't see them.

So we would still need a config parameter or libvirt always uses flock() and 
doesn't care about all crappy NFS servers which doen't support flock() :)

David
...
Daniel