Am Montag, 29. Juli 2013, 12:52:00 schrieb Daniel P. Berrange:
On Fri, Jul 26, 2013 at 11:35:32AM +0100, Daniel P. Berrange wrote:
> On Fri, Jul 26, 2013 at 12:31:35PM +0200, David Weber wrote:
> > Am Freitag, 26. Juli 2013, 10:14:59 schrieb Daniel P. Berrange:
> > > On Fri, Jul 26, 2013 at 10:44:24AM +0200, David Weber wrote:
> > > > > Looking again at flock() I see it cannot support locking of
> > > > > ranges, only
> > > > > the entire file. This makes it unsuitable as a replacement for
> > > > > libvirt's
> > > > > use of fcntl() I'm afraid. I can only sugggest you configure
OCFS2
> > > > > so
> > > > > that it supports fcntl(), or setup virtlockd to use separate
> > > > > indirect
> > > > > leases on a diffrent shared filesystem, or perhaps try sanlock
> > > > > instead
> > > > > which doesn't require any special filesystem support.
> > > >
> > > > It's true that flock() doesn't support locking of ranges but
I can't
> > > > see
> > > > how this is necessary.
> > >
> > > The code may not currently use ranges, but that doesn't mean
it'll
> > > stay
> > > that way. By adding support for flock() we're preventing us from
> > > making
> > > use of this feature in the future, and I don't want to see that.
> >
> > Just curious, what would be a possible feature which would require
> > range
> > based locking?
> >
> > I would really like to see flock() support in virtlockd because all
> > other
> > solutions have major drawbacks for me.
>
> Currently we use locks to protect the content of disk images.
>
> During startup/shutdown, however, libvirt also makes changes to the
> metadata of images by setting SELinux labels, uid/gid ownership and
> potentially ACLs. Currently we've delibrately crippled some of our
> code during shutdown since it isn't safe in the face of multiple
> libvirt's running. We need to introduce locking of file metadata
> to protect this code. The metadata locks, however, must not conflict
> with the content lock. Thus the reason why we only lock a single
> byte (range 0-1) for content locks, is that we want to be able to
> additional locks (range 1-2 or similar) for the metadata locks
> on the same files.
Perhaps this was already your plan but the second lock mustn't come from the
same process because it will reset the first lock if you use fcntl(). See for
more details:
http://0pointer.de/blog/projects/locking.html
http://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html
I've been wondering over the weekend if there is any viable alternative
strategy we could take which could allow us to use flock(), without
badly compromising our option for metadata locking. Given that metadata
locks would be only held for very short periods of time, I think it
could be reasonable to say we don't need to do locks on the individual
disks files. It would be sufficient to do locks on the directory
containing the disk file, if we only have flock() available. This would
mean we wouldn't be blocked by the inability to lock byte ranges.
Sounds good!
I'm also wondering if there is a way to detect when fcntl() is not
available for OCFS2 ? With the current virtlockd code, what is the
behaviour when it tries to lock a file with fcntl() ? Does the fcntl
lock attempt succeed, but only provide protection scoped to the single
host, or do we get a hard errno from fcntl() on OCFS (eg EINVAL or
something ? If we can detect broken fcntl() on OCFS, then we should
not need to have a global config parameter - we would be able to
automatically use fcntl() by default ,and fallback to flock() on
OCFS2 deployments which aren't using cluster technology to enable
fcntl().
Unfortunately there seems to be no way to detect if fcntl() is cluster aware.
When OCFS2 is used without an userspace cluster stack, the fcntl() locks will
only exist on that machine. Every other machine in the cluster don't see them.
So we would still need a config parameter or libvirt always uses flock() and
doesn't care about all crappy NFS servers which doen't support flock() :)
David
Daniel