On Tue, Jul 03, 2018 at 10:20:29AM -0400, Steve Gaarder wrote:
I have several Qemu/kvm servers running VMs hosted on an NFS share, and am
using virtlockd. (lock_manager = "lockd" in qemu.conf) After a power
failure, one of the VMs will not start, claiming that it is locked. How do I
get out of this?

Libvirt uses fcntl() for locking disk images. In NFS v2 and v3, locking is
a side-band protocol, and when an NFS client host dies while holding locks,
the server will not release them. When the host comes back online it tells
the server to flush all locks it previously held. The problems obviously
arise if your dead host doesn't come back online, as nothing will release
the locks and so other hosts won't be able to lock the VM.
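
To make the failure mode concrete, here is a minimal sketch of a
non-blocking fcntl() lock attempt in Python. It is not libvirt's code; the
helper name try_exclusive_lock is made up for illustration, and it assumes
a POSIX system. On an NFS v3 mount where a dead client's lock was never
flushed, a probe like this would keep returning None.

```python
import errno
import fcntl
import os


def try_exclusive_lock(path):
    """Try to take a non-blocking exclusive fcntl() lock on the whole file.

    Hypothetical helper for illustration only. Returns the open file
    descriptor on success (keep it open to hold the lock), or None if
    another process already holds a conflicting lock.
    """
    # An exclusive lock needs a descriptor opened for writing.
    fd = os.open(path, os.O_RDWR)
    try:
        # lockf() is implemented on top of fcntl() record locking.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None  # someone else holds the lock
        raise
    return fd
```

Note that fcntl() locks belong to a process, so a second attempt from the
same process will not conflict; the check is only meaningful when run from
a different process or host than the lock holder.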
In NFS v4 the situation is much improved, as locking is part of the main
protocol, implemented as continually renewed leases. Thus when a client host
dies, it is possible for the server to time out any locks it held without
waiting for the host to come back online.
My best recommendation would thus be to use NFS v4. Note that there's still
a 60 second timeout IIRC by default before the server releases the dead
client's locks.
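
If you are unsure which protocol version your hosts actually negotiated,
the following sketch reports it from the contents of /proc/mounts. It
assumes a Linux client, where NFS entries have the filesystem type "nfs" or
"nfs4" and normally carry a "vers=" option; the function name
nfs_mount_versions is made up for illustration.

```python
def nfs_mount_versions(mounts_text):
    """Map each NFS mount point to its protocol version string.

    mounts_text is the content of /proc/mounts (assumed Linux format:
    device, mount point, fstype, options, dump, pass).
    """
    versions = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue  # not an NFS mount
        mount_point, options = fields[1], fields[3].split(",")
        version = "unknown"
        for opt in options:
            if opt.startswith(("vers=", "nfsvers=")):
                version = opt.split("=", 1)[1]
        versions[mount_point] = version
    return versions


# Example use on a live host:
#   with open("/proc/mounts") as f:
#       print(nfs_mount_versions(f.read()))
```

Any mount reported as version 2 or 3 still relies on the side-band locking
protocol described above.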
Take a read of "man 5 nfs" if you want to learn more - see the section
headings "Using file locks with NFS" and "NFS version 4 Leases".

Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|