[libvirt] [PATCH] locking: Add io_timeout to sanlock

https://bugzilla.redhat.com/show_bug.cgi?id=1251190 So, if domain loses access to storage, sanlock tries to kill it after some timeout. So far, the default is 80 seconds. But for some scenarios this might not be enough. We should allow users to adjust the timeout according to their needs. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/locking/libvirt_sanlock.aug | 1 + src/locking/lock_driver_sanlock.c | 8 +++++++- src/locking/sanlock.conf | 7 +++++++ src/locking/test_libvirt_sanlock.aug.in | 1 + 4 files changed, 16 insertions(+), 1 deletion(-) diff --git a/src/locking/libvirt_sanlock.aug b/src/locking/libvirt_sanlock.aug index a78a444..8843590 100644 --- a/src/locking/libvirt_sanlock.aug +++ b/src/locking/libvirt_sanlock.aug @@ -22,6 +22,7 @@ module Libvirt_sanlock = | int_entry "host_id" | bool_entry "require_lease_for_disks" | bool_entry "ignore_readonly_and_shared_disks" + | int_entry "io_timeout" | str_entry "user" | str_entry "group" let comment = [ label "#comment" . del /#[ \t]*/ "# " . store /([^ \t\n][^\n]*)?/ . del /\n/ "\n" ] diff --git a/src/locking/lock_driver_sanlock.c b/src/locking/lock_driver_sanlock.c index e052875..f1dbbaf 100644 --- a/src/locking/lock_driver_sanlock.c +++ b/src/locking/lock_driver_sanlock.c @@ -73,6 +73,7 @@ struct _virLockManagerSanlockDriver { int hostID; bool autoDiskLease; char *autoDiskLeasePath; + unsigned int io_timeout; /* under which permissions does sanlock run */ uid_t user; @@ -151,6 +152,10 @@ static int virLockManagerSanlockLoadConfig(const char *configFile) else driver->requireLeaseForDisks = !driver->autoDiskLease; + p = virConfGetValue(conf, "io_timeout"); + CHECK_TYPE("io_timeout", VIR_CONF_ULONG); + if (p) driver->io_timeout = p->l; + p = virConfGetValue(conf, "user"); CHECK_TYPE("user", VIR_CONF_STRING); if (p) { @@ -338,7 +343,7 @@ static int virLockManagerSanlockSetupLockspace(void) * or we can fallback to polling. */ retry: - if ((rv = sanlock_add_lockspace(&ls, 0)) < 0) { + if ((rv = sanlock_add_lockspace_timeout(&ls, 0, driver->io_timeout)) < 0) { if (-rv == EINPROGRESS && --retries) { #ifdef HAVE_SANLOCK_INQ_LOCKSPACE /* we have this function which blocks until lockspace change the @@ -404,6 +409,7 @@ static int virLockManagerSanlockInit(unsigned int version, driver->requireLeaseForDisks = true; driver->hostID = 0; driver->autoDiskLease = false; + driver->io_timeout = 0; driver->user = (uid_t) -1; driver->group = (gid_t) -1; if (VIR_STRDUP(driver->autoDiskLeasePath, LOCALSTATEDIR "/lib/libvirt/sanlock") < 0) { diff --git a/src/locking/sanlock.conf b/src/locking/sanlock.conf index e5566ef..3a1a51c 100644 --- a/src/locking/sanlock.conf +++ b/src/locking/sanlock.conf @@ -54,6 +54,13 @@ #require_lease_for_disks = 1 # +# Sanlock is able to kill qemu processes on IO timeout. By its internal +# implementation, the current default is 80 seconds. If you need to adjust +# the value change the following variable. Value of zero means use the +# default sanlock timeout. +#io_timeout = 0 + +# # The combination of user and group under which the sanlock # daemon runs. Libvirt will chown created files (like # content of disk_lease_dir) to make sure sanlock daemon can diff --git a/src/locking/test_libvirt_sanlock.aug.in b/src/locking/test_libvirt_sanlock.aug.in index ef98ea6..7f66f81 100644 --- a/src/locking/test_libvirt_sanlock.aug.in +++ b/src/locking/test_libvirt_sanlock.aug.in @@ -6,5 +6,6 @@ module Test_libvirt_sanlock = { "disk_lease_dir" = "/var/lib/libvirt/sanlock" } { "host_id" = "1" } { "require_lease_for_disks" = "1" } +{ "io_timeout" = "0" } { "user" = "root" } { "group" = "root" } -- 2.4.10

On Fri, Oct 23, 2015 at 01:37:54PM +0200, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1251190
So, if domain loses access to storage, sanlock tries to kill it after some timeout. So far, the default is 80 seconds. But for some scenarios this might not be enough. We should allow users to adjust the timeout according to their needs.
Shouldn't we check for whether the current sanlock version supports that? Or require new enough version in configure/libvirt.spec? I understand the _timeout option was added later then the previous one. Otherwise looks good, if you convince me that there is no need for the check then it can go in as it is. Martin

On 27.10.2015 10:23, Martin Kletzander wrote:
On Fri, Oct 23, 2015 at 01:37:54PM +0200, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1251190
So, if domain loses access to storage, sanlock tries to kill it after some timeout. So far, the default is 80 seconds. But for some scenarios this might not be enough. We should allow users to adjust the timeout according to their needs.
Shouldn't we check for whether the current sanlock version supports that? Or require new enough version in configure/libvirt.spec? I understand the _timeout option was added later then the previous one.
Otherwise looks good, if you convince me that there is no need for the check then it can go in as it is.
The _timeout was added in e174a7e33728ba3ad587546693612476a081735d (sanlock-2.5~5) which dates back to Aug 2012. So I'd say since we are more than three years since then we are safe. But if you would feel safer with an explicit check I can do that. Michal
participants (2)
-
Martin Kletzander
-
Michal Privoznik