[libvirt] Use flock() instead of fcntl()

Hi, we are interested in using virtlockd on an OCFS2 shared filesystem. We are now facing the problem that virtlockd uses fcntl() locks which aren't supported by OCFS2 with the o2cb cluster stack and we want to avoid using indirect leases. OCFS2 instead supports flock() which is quite similar to fcntl(). I attached a patch which makes libvirt use flock() *instead* of fcntl() and it seems to work. NFS on the contrast only supports fcntl() so it should be configurable which lock type to use. I'm not very experienced with locking, so would such a patch be acceptable or do you see possible problems with it? Cheers, David
From b823a9a9bd60a870d64341c4273c42d4eeba8d9b Mon Sep 17 00:00:00 2001 From: David Weber <wb@munzinger.de> Date: Thu, 25 Jul 2013 08:20:20 +0000 Subject: [PATCH] Use flock() instead of fcntl()
--- src/util/virfile.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/util/virfile.c b/src/util/virfile.c index 8f0eec3..e243c26 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c int virFileLock(int fd, bool shared, off_t start, off_t len) .l_len = len, }; - if (fcntl(fd, F_SETLK, &fl) < 0) + if (flock(fd, LOCK_EX | LOCK_NB) < 0) return -errno; return 0; int virFileUnlock(int fd, off_t start, off_t len) .l_len = len, }; - if (fcntl(fd, F_SETLK, &fl) < 0) + if (flock(fd, LOCK_UN) < 0) return -errno; return 0; -- 1.8.1.5

On Thu, Jul 25, 2013 at 08:23:24AM +0000, David Weber wrote:
Hi,
we are interested in using virtlockd on an OCFS2 shared filesystem. We are now facing the problem that virtlockd uses fcntl() locks which aren't supported by OCFS2 with the o2cb cluster stack and we want to avoid using indirect leases.
OCFS2 instead supports flock() which is quite similar to fcntl(). I attached a patch which makes libvirt use flock() *instead* of fcntl() and it seems to work.
NFS on the contrast only supports fcntl() so it should be configurable which lock type to use.
I'm not very experienced with locking, so would such a patch be acceptable or do you see possible problems with it?
We definitely can't use flock() unconditionally like that, as it is less widely supported than fcntl() locking and is known broken on most network filesystems (flock() will only lock wrt the local node, not network-wide). I'm pretty surprised that you can fcntl() is not supported in ocfs2. Are you really sure about that ? This mail message suggests it is supported https://oss.oracle.com/pipermail/ocfs2-users/2009-June/003572.html "Support for fcntl locking aka file-range locking aka posix locking is provided by vfs for all file systems. However, that support is appropriate only for local file systems. In ocfs2, we have added support for cluster-aware fcntl locking via the userspace clustering framework that allows one to use ocfs2 with different cluster-stacks." Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Thank you for your quick response! Am Donnerstag, 25. Juli 2013, 10:31:40 schrieb Daniel P. Berrange:
On Thu, Jul 25, 2013 at 08:23:24AM +0000, David Weber wrote:
Hi,
we are interested in using virtlockd on an OCFS2 shared filesystem. We are now facing the problem that virtlockd uses fcntl() locks which aren't supported by OCFS2 with the o2cb cluster stack and we want to avoid using indirect leases.
OCFS2 instead supports flock() which is quite similar to fcntl(). I attached a patch which makes libvirt use flock() *instead* of fcntl() and it seems to work.
NFS on the contrast only supports fcntl() so it should be configurable which lock type to use.
I'm not very experienced with locking, so would such a patch be acceptable or do you see possible problems with it?
We definitely can't use flock() unconditionally like that, as it is less widely supported than fcntl() locking and is known broken on most network filesystems (flock() will only lock wrt the local node, not network-wide).
flock() locks are cluster aware in recent versions of OCFS2 and I would try to make it configurable which lock type to use.
I'm pretty surprised that you can fcntl() is not supported in ocfs2. Are you really sure about that ?
This mail message suggests it is supported
https://oss.oracle.com/pipermail/ocfs2-users/2009-June/003572.html
"Support for fcntl locking aka file-range locking aka posix locking is provided by vfs for all file systems. However, that support is appropriate only for local file systems.
In ocfs2, we have added support for cluster-aware fcntl locking via the userspace clustering framework that allows one to use ocfs2 with different cluster-stacks."
OCFS2 supports fcntl() only with the userspace cluster stacks (pacemaker and cman) which we want to avoid. SUSE advises to use indirect leases which we also want to avoid: https://www.suse.com/documentation//sled11/book_kvm/?page=/documentation//sl... "virtlockd's default configuration does not allow you to lock disk files placed on a shared file system (for example NFS or OCFS2). Locking on these file systems requires you to specify a lockspace directory on the VM Host Server by setting" Although that's not completely correct because NFS supports fcntl() Cheers, David
Regards, Daniel

On Thu, Jul 25, 2013 at 12:07:44PM +0200, David Weber wrote:
Thank you for your quick response!
Am Donnerstag, 25. Juli 2013, 10:31:40 schrieb Daniel P. Berrange:
On Thu, Jul 25, 2013 at 08:23:24AM +0000, David Weber wrote:
Hi,
we are interested in using virtlockd on an OCFS2 shared filesystem. We are now facing the problem that virtlockd uses fcntl() locks which aren't supported by OCFS2 with the o2cb cluster stack and we want to avoid using indirect leases.
OCFS2 instead supports flock() which is quite similar to fcntl(). I attached a patch which makes libvirt use flock() *instead* of fcntl() and it seems to work.
NFS on the contrast only supports fcntl() so it should be configurable which lock type to use.
I'm not very experienced with locking, so would such a patch be acceptable or do you see possible problems with it?
We definitely can't use flock() unconditionally like that, as it is less widely supported than fcntl() locking and is known broken on most network filesystems (flock() will only lock wrt the local node, not network-wide).
flock() locks are cluster aware in recent versions of OCFS2 and I would try to make it configurable which lock type to use.
I'm pretty surprised that you can fcntl() is not supported in ocfs2. Are you really sure about that ?
This mail message suggests it is supported
https://oss.oracle.com/pipermail/ocfs2-users/2009-June/003572.html
"Support for fcntl locking aka file-range locking aka posix locking is provided by vfs for all file systems. However, that support is appropriate only for local file systems.
In ocfs2, we have added support for cluster-aware fcntl locking via the userspace clustering framework that allows one to use ocfs2 with different cluster-stacks."
OCFS2 supports fcntl() only with the userspace cluster stacks (pacemaker and cman) which we want to avoid.
Looking again at flock() I see it cannot support locking of ranges, only the entire file. This makes it unsuitable as a replacement for libvirt's use of fcntl() I'm afraid. I can only sugggest you configure OCFS2 so that it supports fcntl(), or setup virtlockd to use separate indirect leases on a diffrent shared filesystem, or perhaps try sanlock instead which doesn't require any special filesystem support.
SUSE advises to use indirect leases which we also want to avoid: https://www.suse.com/documentation//sled11/book_kvm/?page=/documentation//sl...
"virtlockd's default configuration does not allow you to lock disk files placed on a shared file system (for example NFS or OCFS2). Locking on these file systems requires you to specify a lockspace directory on the VM Host Server by setting"
Although that's not completely correct because NFS supports fcntl()
That's just badly written explanation for that config setting. It should really be saying that the default configuration does not provide protection across multiple hosts for file paths which are not visible via a shared filesystem. eg a SAN LUN in /dev/sdNNN won't be protected in the default config. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Am Donnerstag, 25. Juli 2013, 11:29:37 schrieb Daniel P. Berrange:
On Thu, Jul 25, 2013 at 12:07:44PM +0200, David Weber wrote:
Thank you for your quick response!
Am Donnerstag, 25. Juli 2013, 10:31:40 schrieb Daniel P. Berrange:
On Thu, Jul 25, 2013 at 08:23:24AM +0000, David Weber wrote:
Hi,
we are interested in using virtlockd on an OCFS2 shared filesystem. We are now facing the problem that virtlockd uses fcntl() locks which aren't supported by OCFS2 with the o2cb cluster stack and we want to avoid using indirect leases.
OCFS2 instead supports flock() which is quite similar to fcntl(). I attached a patch which makes libvirt use flock() *instead* of fcntl() and it seems to work.
NFS on the contrast only supports fcntl() so it should be configurable which lock type to use.
I'm not very experienced with locking, so would such a patch be acceptable or do you see possible problems with it?
We definitely can't use flock() unconditionally like that, as it is less widely supported than fcntl() locking and is known broken on most network filesystems (flock() will only lock wrt the local node, not network-wide).
flock() locks are cluster aware in recent versions of OCFS2 and I would try to make it configurable which lock type to use.
I'm pretty surprised that you can fcntl() is not supported in ocfs2. Are you really sure about that ?
This mail message suggests it is supported
https://oss.oracle.com/pipermail/ocfs2-users/2009-June/003572.html
"Support for fcntl locking aka file-range locking aka posix locking is
provided by vfs for all file systems. However, that support is appropriate only for local file systems.
In ocfs2, we have added support for cluster-aware fcntl locking via the userspace clustering framework that allows one to use ocfs2 with different cluster-stacks."
OCFS2 supports fcntl() only with the userspace cluster stacks (pacemaker and cman) which we want to avoid.
Looking again at flock() I see it cannot support locking of ranges, only the entire file. This makes it unsuitable as a replacement for libvirt's use of fcntl() I'm afraid. I can only sugggest you configure OCFS2 so that it supports fcntl(), or setup virtlockd to use separate indirect leases on a diffrent shared filesystem, or perhaps try sanlock instead which doesn't require any special filesystem support.
It's true that flock() doesn't support locking of ranges but I can't see how this is necessary. I attached a patch with extends virtlockd so it can either use flock() *or* fcntl(). Configurable in the configuration file.
SUSE advises to use indirect leases which we also want to avoid: https://www.suse.com/documentation//sled11/book_kvm/?page=/documentation// sled11/book_kvm/data/sec_libvirt_storage_locking.html
"virtlockd's default configuration does not allow you to lock disk files placed on a shared file system (for example NFS or OCFS2). Locking on these file systems requires you to specify a lockspace directory on the VM Host Server by setting"
Although that's not completely correct because NFS supports fcntl()
That's just badly written explanation for that config setting. It should really be saying that the default configuration does not provide protection across multiple hosts for file paths which are not visible via a shared filesystem. eg a SAN LUN in /dev/sdNNN won't be protected in the default config.
Daniel
From 2a91cc505b4fb5b467d2add416b2ebf4baa53311 Mon Sep 17 00:00:00 2001 From: David Weber <wb@munzinger.de> Date: Thu, 25 Jul 2013 14:10:23 +0200 Subject: [PATCH] Add flock() support for locking
Configurable in the virtlockd config --- src/locking/libvirt_lockd.aug | 1 + src/locking/lock_daemon_dispatch.c | 6 ++- src/locking/lock_driver_lockd.c | 14 ++++++- src/locking/lock_protocol.x | 3 +- src/locking/lockd.conf | 7 ++++ src/util/virfile.c | 59 +++++++++++++++++++++++++++- src/util/virfile.h | 4 ++ src/util/virlockspace.c | 80 ++++++++++++++++++++++++++++---------- src/util/virlockspace.h | 1 + 9 files changed, 148 insertions(+), 27 deletions(-) diff --git a/src/locking/libvirt_lockd.aug b/src/locking/libvirt_lockd.aug index 6a3bcba..7df715f 100644 --- a/src/locking/libvirt_lockd.aug +++ b/src/locking/libvirt_lockd.aug @@ -19,6 +19,7 @@ module Libvirt_lockd = (* Each enty in the config is one of the following three ... *) let entry = bool_entry "auto_disk_leases" | bool_entry "require_lease_for_disks" + | bool_entry "useFlock" | str_entry "file_lockspace_dir" | str_entry "lvm_lockspace_dir" | str_entry "scsi_lockspace_dir" diff --git a/src/locking/lock_daemon_dispatch.c b/src/locking/lock_daemon_dispatch.c index c2e1000..930018c 100644 --- a/src/locking/lock_daemon_dispatch.c +++ b/src/locking/lock_daemon_dispatch.c @@ -50,7 +50,8 @@ virLockSpaceProtocolDispatchAcquireResource(virNetServerPtr server ATTRIBUTE_UNU virMutexLock(&priv->lock); virCheckFlagsGoto(VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_SHARED | - VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_AUTOCREATE, cleanup); + VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_AUTOCREATE | + VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_USE_FLOCK, cleanup); if (priv->restricted) { virReportError(VIR_ERR_OPERATION_DENIED, "%s", @@ -76,7 +77,8 @@ virLockSpaceProtocolDispatchAcquireResource(virNetServerPtr server ATTRIBUTE_UNU newFlags |= VIR_LOCK_SPACE_ACQUIRE_SHARED; if (flags & VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_AUTOCREATE) newFlags |= VIR_LOCK_SPACE_ACQUIRE_AUTOCREATE; - + if (flags & VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_USE_FLOCK) + newFlags |= VIR_LOCK_SPACE_ACQUIRE_FLOCK; if (virLockSpaceAcquireResource(lockspace, args->name, priv->ownerPid, diff --git a/src/locking/lock_driver_lockd.c b/src/locking/lock_driver_lockd.c index b115799..1e07162 100644 --- a/src/locking/lock_driver_lockd.c +++ b/src/locking/lock_driver_lockd.c @@ -71,6 +71,7 @@ struct _virLockManagerLockDaemonPrivate { struct _virLockManagerLockDaemonDriver { bool autoDiskLease; bool requireLeaseForDisks; + bool useFlock; char *fileLockSpaceDir; char *lvmLockSpaceDir; @@ -125,6 +126,10 @@ static int virLockManagerLockDaemonLoadConfig(const char *configFile) CHECK_TYPE("auto_disk_leases", VIR_CONF_LONG); if (p) driver->autoDiskLease = p->l; + p = virConfGetValue(conf, "use_flock"); + CHECK_TYPE("use_flock", VIR_CONF_LONG); + if (p) driver->useFlock = p->l; + p = virConfGetValue(conf, "file_lockspace_dir"); CHECK_TYPE("file_lockspace_dir", VIR_CONF_STRING); if (p && p->str) { @@ -379,6 +384,7 @@ static int virLockManagerLockDaemonInit(unsigned int version, driver->requireLeaseForDisks = true; driver->autoDiskLease = true; + driver->useFlock = false; if (virLockManagerLockDaemonLoadConfig(configFile) < 0) goto error; @@ -724,6 +730,8 @@ static int virLockManagerLockDaemonAcquire(virLockManagerPtr lock, args.name = priv->resources[i].name; args.flags = priv->resources[i].flags; + if (driver->useFlock) + args.flags |= VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_USE_FLOCK; if (virNetClientProgramCall(program, client, counter++, @@ -780,10 +788,12 @@ static int virLockManagerLockDaemonRelease(virLockManagerPtr lock, args.name = priv->resources[i].name; args.flags = priv->resources[i].flags; - args.flags &= - ~(VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_SHARED | + args.flags &=~(VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_SHARED | VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_AUTOCREATE); + if (driver->useFlock) + args.flags |= VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_USE_FLOCK; + if (virNetClientProgramCall(program, client, counter++, diff --git a/src/locking/lock_protocol.x b/src/locking/lock_protocol.x index 354d51a..7828674 100644 --- a/src/locking/lock_protocol.x +++ b/src/locking/lock_protocol.x @@ -52,7 +52,8 @@ struct virLockSpaceProtocolDeleteResourceArgs { enum virLockSpaceProtocolAcquireResourceFlags { VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_SHARED = 1, - VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_AUTOCREATE = 2 + VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_AUTOCREATE = 2, + VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_USE_FLOCK = 4 }; struct virLockSpaceProtocolAcquireResourceArgs { diff --git a/src/locking/lockd.conf b/src/locking/lockd.conf index 85edb91..944c808 100644 --- a/src/locking/lockd.conf +++ b/src/locking/lockd.conf @@ -17,6 +17,13 @@ # #require_lease_for_disks = 1 +# +# The default lockd behaviour is to acquire locks via their +# the fnctl(2) syscall. This isn't aviable on all filesystems. +# In this case you can use this flag to use flock(2) instead. +# +#use_flock = 0 + # # The default lockd behaviour is to use the "direct" diff --git a/src/util/virfile.c b/src/util/virfile.c index 8f0eec3..56900f0 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -378,7 +378,7 @@ int virFileLock(int fd, bool shared, off_t start, off_t len) * @start: byte offset to start unlock * @len: length of lock (0 to release entire remaining file from @start) * - * Release a lock previously acquired with virFileUnlock(). + * Release a lock previously acquired with virFileLock(). * NB the lock will also be released if any open file descriptor * pointing to the same file as @fd is closed * @@ -398,6 +398,63 @@ int virFileUnlock(int fd, off_t start, off_t len) return 0; } + +/** + * virFileLockFlock: + * @fd: file descriptor to acquire the lock on + * @shared: type of lock to acquire + * + * Attempt to acquire a lock on the file @fd. If @shared + * is true, then a shared lock will be acquired, + * otherwise an exclusive lock will be acquired. If + * the lock cannot be acquired, an error will be + * returned. This will not wait to acquire the lock if + * another process already holds it. + * + * The lock will be released when @fd is closed. The lock + * will also be released if *any* other open file descriptor + * pointing to the same underlying file is closed. As such + * this function should not be relied on in multi-threaded + * apps where other threads can be opening/closing arbitrary + * files. + * + * Returns 0 on success, or -errno otherwise + */ +int virFileLockFlock(int fd, bool shared) +{ + if (shared) { + if (flock(fd, LOCK_SH | LOCK_NB) < 0) + return -errno; + } else { + if (flock(fd, LOCK_EX | LOCK_NB) < 0) + return -errno; + } + + return 0; +} + + +/** + * virFileUnlockFlock: + * @fd: file descriptor to release the lock on + * @start: byte offset to start unlock + * + * Release a lock previously acquired with virFileLockFlock(). + * NB the lock will also be released if any open file descriptor + * pointing to the same file as @fd is closed + * + * Returns 0 on succcess, or -errno on error + */ +int virFileUnlockFlock(int fd) +{ + if (flock(fd, LOCK_UN) < 0) + return -errno; + + return 0; +} + + + #else int virFileLock(int fd ATTRIBUTE_UNUSED, bool shared ATTRIBUTE_UNUSED, diff --git a/src/util/virfile.h b/src/util/virfile.h index 72d35ce..e8ced5c 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -100,6 +100,10 @@ void virFileWrapperFdFree(virFileWrapperFdPtr dfd); int virFileLock(int fd, bool shared, off_t start, off_t len); int virFileUnlock(int fd, off_t start, off_t len); +int virFileLockFlock(int fd, bool shared); +int virFileUnlockFlock(int fd); + + typedef int (*virFileRewriteFunc)(int fd, void *opaque); int virFileRewrite(const char *path, mode_t mode, diff --git a/src/util/virlockspace.c b/src/util/virlockspace.c index afb1abb..ef95672 100644 --- a/src/util/virlockspace.c +++ b/src/util/virlockspace.c @@ -154,17 +154,32 @@ virLockSpaceResourceNew(virLockSpacePtr lockspace, goto error; } - if (virFileLock(res->fd, shared, 0, 1) < 0) { - if (errno == EACCES || errno == EAGAIN) { - virReportError(VIR_ERR_RESOURCE_BUSY, - _("Lockspace resource '%s' is locked"), - resname); - } else { - virReportSystemError(errno, - _("Unable to acquire lock on '%s'"), - res->path); + if (flags & VIR_LOCK_SPACE_ACQUIRE_FLOCK) { + if (virFileLockFlock(res->fd, shared) < 0) { + if (errno == EACCES || errno == EAGAIN) { + virReportError(VIR_ERR_RESOURCE_BUSY, + _("Lockspace resource '%s' is locked"), + resname); + } else { + virReportSystemError(errno, + _("Unable to acquire lock on '%s'"), + res->path); + } + goto error; + } + } else { + if (virFileLock(res->fd, shared, 0, 1) < 0) { + if (errno == EACCES || errno == EAGAIN) { + virReportError(VIR_ERR_RESOURCE_BUSY, + _("Lockspace resource '%s' is locked"), + resname); + } else { + virReportSystemError(errno, + _("Unable to acquire lock on '%s'"), + res->path); + } + goto error; } - goto error; } /* Now make sure the pidfile we locked is the same @@ -201,17 +216,32 @@ virLockSpaceResourceNew(virLockSpacePtr lockspace, goto error; } - if (virFileLock(res->fd, shared, 0, 1) < 0) { - if (errno == EACCES || errno == EAGAIN) { - virReportError(VIR_ERR_RESOURCE_BUSY, - _("Lockspace resource '%s' is locked"), - resname); - } else { - virReportSystemError(errno, - _("Unable to acquire lock on '%s'"), - res->path); + if (flags & VIR_LOCK_SPACE_ACQUIRE_FLOCK) { + if (virFileLockFlock(res->fd, shared) < 0) { + if (errno == EACCES || errno == EAGAIN) { + virReportError(VIR_ERR_RESOURCE_BUSY, + _("Lockspace resource '%s' is locked"), + resname); + } else { + virReportSystemError(errno, + _("Unable to acquire lock on '%s'"), + res->path); + } + goto error; + } + } else { + if (virFileLock(res->fd, shared, 0, 1) < 0) { + if (errno == EACCES || errno == EAGAIN) { + virReportError(VIR_ERR_RESOURCE_BUSY, + _("Lockspace resource '%s' is locked"), + resname); + } else { + virReportSystemError(errno, + _("Unable to acquire lock on '%s'"), + res->path); + } + goto error; } - goto error; } } res->lockHeld = true; @@ -616,9 +646,17 @@ int virLockSpaceAcquireResource(virLockSpacePtr lockspace, VIR_DEBUG("lockspace=%p resname=%s flags=%x owner=%lld", lockspace, resname, flags, (unsigned long long)owner); + + if (flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) + VIR_DEBUG("David: shared"); + if (flags & VIR_LOCK_SPACE_ACQUIRE_AUTOCREATE) + VIR_DEBUG("David: autocreate"); + if (flags & VIR_LOCK_SPACE_ACQUIRE_FLOCK) + VIR_DEBUG("David: flock"); virCheckFlags(VIR_LOCK_SPACE_ACQUIRE_SHARED | - VIR_LOCK_SPACE_ACQUIRE_AUTOCREATE, -1); + VIR_LOCK_SPACE_ACQUIRE_AUTOCREATE | + VIR_LOCK_SPACE_ACQUIRE_FLOCK, -1); virMutexLock(&lockspace->lock); diff --git a/src/util/virlockspace.h b/src/util/virlockspace.h index 041cf20..a6ba3dd 100644 --- a/src/util/virlockspace.h +++ b/src/util/virlockspace.h @@ -45,6 +45,7 @@ int virLockSpaceDeleteResource(virLockSpacePtr lockspace, typedef enum { VIR_LOCK_SPACE_ACQUIRE_SHARED = (1 << 0), VIR_LOCK_SPACE_ACQUIRE_AUTOCREATE = (1 << 1), + VIR_LOCK_SPACE_ACQUIRE_FLOCK = (1 << 2), } virLockSpaceAcquireFlags; int virLockSpaceAcquireResource(virLockSpacePtr lockspace, -- 1.8.1.2

On Fri, Jul 26, 2013 at 10:44:24AM +0200, David Weber wrote:
Looking again at flock() I see it cannot support locking of ranges, only the entire file. This makes it unsuitable as a replacement for libvirt's use of fcntl() I'm afraid. I can only sugggest you configure OCFS2 so that it supports fcntl(), or setup virtlockd to use separate indirect leases on a diffrent shared filesystem, or perhaps try sanlock instead which doesn't require any special filesystem support.
It's true that flock() doesn't support locking of ranges but I can't see how this is necessary.
The code may not currently use ranges, but that doesn't mean it'll stay that way. By adding support for flock() we're preventing us from making use of this feature in the future, and I don't want to see that. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Am Freitag, 26. Juli 2013, 10:14:59 schrieb Daniel P. Berrange:
On Fri, Jul 26, 2013 at 10:44:24AM +0200, David Weber wrote:
Looking again at flock() I see it cannot support locking of ranges, only the entire file. This makes it unsuitable as a replacement for libvirt's use of fcntl() I'm afraid. I can only sugggest you configure OCFS2 so that it supports fcntl(), or setup virtlockd to use separate indirect leases on a diffrent shared filesystem, or perhaps try sanlock instead which doesn't require any special filesystem support.
It's true that flock() doesn't support locking of ranges but I can't see how this is necessary.
The code may not currently use ranges, but that doesn't mean it'll stay that way. By adding support for flock() we're preventing us from making use of this feature in the future, and I don't want to see that.
Just curious, what would be a possible feature which would require range based locking? I would really like to see flock() support in virtlockd because all other solutions have major drawbacks for me.
Daniel

On Fri, Jul 26, 2013 at 12:31:35PM +0200, David Weber wrote:
Am Freitag, 26. Juli 2013, 10:14:59 schrieb Daniel P. Berrange:
On Fri, Jul 26, 2013 at 10:44:24AM +0200, David Weber wrote:
Looking again at flock() I see it cannot support locking of ranges, only the entire file. This makes it unsuitable as a replacement for libvirt's use of fcntl() I'm afraid. I can only sugggest you configure OCFS2 so that it supports fcntl(), or setup virtlockd to use separate indirect leases on a diffrent shared filesystem, or perhaps try sanlock instead which doesn't require any special filesystem support.
It's true that flock() doesn't support locking of ranges but I can't see how this is necessary.
The code may not currently use ranges, but that doesn't mean it'll stay that way. By adding support for flock() we're preventing us from making use of this feature in the future, and I don't want to see that.
Just curious, what would be a possible feature which would require range based locking?
I would really like to see flock() support in virtlockd because all other solutions have major drawbacks for me.
Currently we use locks to protect the content of disk images. During startup/shutdown, however, libvirt also makes changes to the metadata of images by setting SELinux labels, uid/gid ownership and potentially ACLs. Currently we've delibrately crippled some of our code during shutdown since it isn't safe in the face of multiple libvirt's running. We need to introduce locking of file metadata to protect this code. The metadata locks, however, must not conflict with the content lock. Thus the reason why we only lock a single byte (range 0-1) for content locks, is that we want to be able to additional locks (range 1-2 or similar) for the metadata locks on the same files. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Fri, Jul 26, 2013 at 11:35:32AM +0100, Daniel P. Berrange wrote:
On Fri, Jul 26, 2013 at 12:31:35PM +0200, David Weber wrote:
Am Freitag, 26. Juli 2013, 10:14:59 schrieb Daniel P. Berrange:
On Fri, Jul 26, 2013 at 10:44:24AM +0200, David Weber wrote:
Looking again at flock() I see it cannot support locking of ranges, only the entire file. This makes it unsuitable as a replacement for libvirt's use of fcntl() I'm afraid. I can only sugggest you configure OCFS2 so that it supports fcntl(), or setup virtlockd to use separate indirect leases on a diffrent shared filesystem, or perhaps try sanlock instead which doesn't require any special filesystem support.
It's true that flock() doesn't support locking of ranges but I can't see how this is necessary.
The code may not currently use ranges, but that doesn't mean it'll stay that way. By adding support for flock() we're preventing us from making use of this feature in the future, and I don't want to see that.
Just curious, what would be a possible feature which would require range based locking?
I would really like to see flock() support in virtlockd because all other solutions have major drawbacks for me.
Currently we use locks to protect the content of disk images.
During startup/shutdown, however, libvirt also makes changes to the metadata of images by setting SELinux labels, uid/gid ownership and potentially ACLs. Currently we've delibrately crippled some of our code during shutdown since it isn't safe in the face of multiple libvirt's running. We need to introduce locking of file metadata to protect this code. The metadata locks, however, must not conflict with the content lock. Thus the reason why we only lock a single byte (range 0-1) for content locks, is that we want to be able to additional locks (range 1-2 or similar) for the metadata locks on the same files.
I've been wondering over the weekend if there is any viable alternative strategy we could take which could allow us to use flock(), without badly compromising our option for metadata locking. Given that metadata locks would be only held for very short periods of time, I think it could be reasonable to say we don't need to do locks on the individual disks files. It would be sufficient to do locks on the directory containing the disk file, if we only have flock() available. This would mean we wouldn't be blocked by the inability to lock byte ranges. I'm also wondering if there is a way to detect when fcntl() is not available for OCFS2 ? With the current virtlockd code, what is the behaviour when it tries to lock a file with fcntl() ? Does the fcntl lock attempt succeed, but only provide protection scoped to the single host, or do we get a hard errno from fcntl() on OCFS (eg EINVAL or something ? If we can detect broken fcntl() on OCFS, then we should not need to have a global config parameter - we would be able to automatically use fcntl() by default ,and fallback to flock() on OCFS2 deployments which aren't using cluster technology to enable fcntl(). Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Am Montag, 29. Juli 2013, 12:52:00 schrieb Daniel P. Berrange:
On Fri, Jul 26, 2013 at 11:35:32AM +0100, Daniel P. Berrange wrote:
On Fri, Jul 26, 2013 at 12:31:35PM +0200, David Weber wrote:
Am Freitag, 26. Juli 2013, 10:14:59 schrieb Daniel P. Berrange:
On Fri, Jul 26, 2013 at 10:44:24AM +0200, David Weber wrote:
Looking again at flock() I see it cannot support locking of ranges, only the entire file. This makes it unsuitable as a replacement for libvirt's use of fcntl() I'm afraid. I can only sugggest you configure OCFS2 so that it supports fcntl(), or setup virtlockd to use separate indirect leases on a diffrent shared filesystem, or perhaps try sanlock instead which doesn't require any special filesystem support.
It's true that flock() doesn't support locking of ranges but I can't see how this is necessary.
The code may not currently use ranges, but that doesn't mean it'll stay that way. By adding support for flock() we're preventing us from making use of this feature in the future, and I don't want to see that.
Just curious, what would be a possible feature which would require range based locking?
I would really like to see flock() support in virtlockd because all other solutions have major drawbacks for me.
Currently we use locks to protect the content of disk images.
During startup/shutdown, however, libvirt also makes changes to the metadata of images by setting SELinux labels, uid/gid ownership and potentially ACLs. Currently we've delibrately crippled some of our code during shutdown since it isn't safe in the face of multiple libvirt's running. We need to introduce locking of file metadata to protect this code. The metadata locks, however, must not conflict with the content lock. Thus the reason why we only lock a single byte (range 0-1) for content locks, is that we want to be able to additional locks (range 1-2 or similar) for the metadata locks on the same files.
Perhaps this was already your plan but the second lock mustn't come from the same process because it will reset the first lock if you use fcntl(). See for more details: http://0pointer.de/blog/projects/locking.html http://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html
I've been wondering over the weekend if there is any viable alternative strategy we could take which could allow us to use flock(), without badly compromising our option for metadata locking. Given that metadata locks would be only held for very short periods of time, I think it could be reasonable to say we don't need to do locks on the individual disks files. It would be sufficient to do locks on the directory containing the disk file, if we only have flock() available. This would mean we wouldn't be blocked by the inability to lock byte ranges.
Sounds good!
I'm also wondering if there is a way to detect when fcntl() is not available for OCFS2 ? With the current virtlockd code, what is the behaviour when it tries to lock a file with fcntl() ? Does the fcntl lock attempt succeed, but only provide protection scoped to the single host, or do we get a hard errno from fcntl() on OCFS (eg EINVAL or something ? If we can detect broken fcntl() on OCFS, then we should not need to have a global config parameter - we would be able to automatically use fcntl() by default ,and fallback to flock() on OCFS2 deployments which aren't using cluster technology to enable fcntl().
Unfortunately there seems to be no way to detect if fcntl() is cluster aware. When OCFS2 is used without an userspace cluster stack, the fcntl() locks will only exist on that machine. Every other machine in the cluster don't see them. So we would still need a config parameter or libvirt always uses flock() and doesn't care about all crappy NFS servers which doen't support flock() :) David
Daniel

Daniel P. Berrange wrote:
On Thu, Jul 25, 2013 at 12:07:44PM +0200, David Weber wrote:
SUSE advises to use indirect leases which we also want to avoid: https://www.suse.com/documentation//sled11/book_kvm/?page=/documentation//sl...
"virtlockd's default configuration does not allow you to lock disk files placed on a shared file system (for example NFS or OCFS2). Locking on these file systems requires you to specify a lockspace directory on the VM Host Server by setting"
Although that's not completely correct because NFS supports fcntl()
That's just badly written explanation for that config setting. It should really be saying that the default configuration does not provide protection across multiple hosts for file paths which are not visible via a shared filesystem. eg a SAN LUN in /dev/sdNNN won't be protected in the default config.
Agreed, poorly worded. I've submitted feedback to the doc team with suggested text improvement. Regards, Jim
participants (3)
-
Daniel P. Berrange
-
David Weber
-
Jim Fehlig