[libvirt-users] sanlockd, virtlock and GFS2

Hi,

I'm trying to put in place a KVM cluster (using clvm and gfs2), but I'm running into some issues with either sanlock or virtlockd. All virtual machines are handled via the cluster (in /etc/cluster/cluster.conf), but I want some kind of locking to be in place as an extra security measure.

Sanlock
=======

At first I tried sanlock, but it seems that if one node goes down unexpectedly, sanlock sometimes blocks everything on the other node. I then can't start any VMs there, so cluster failover is not working for the VMs.

The locking dir for sanlock is on a shared dir (GFS2), so this might be the problem: the sanlock documentation recommends using a shared device (doing everything yourself) and advises against putting the locking dir on GFS2, but the libvirt config suggests using a shared dir. Red Hat even suggests that GFS2 is a good solution there.

So: is there a hint on how to format/mount the GFS2 partition for sanlock usage? This thread (http://web.archiveorange.com/archive/v/fZfKniEdO4ZFyIhhNbzU) recommends against GFS2+sanlock, but maybe using the "no_lock" gfs2 option might help? Any kind of advice?

Virtlockd
=========

Since sanlock was not working as expected, I tried virtlockd. This seems to be working well, but live migration fails because the file on the shared disk (used for locking) is being locked by the original server running the VM. So when trying to do live migration, this fails and leaves the VM paused on the original server. Trying to migrate a paused server then works ok, but that's of course not live migration.

So: is there a hint on how to format/mount the GFS2 partition for virtlockd usage?

Of course, since every VM is being started/stopped via the cluster, I could just leave out locking altogether and rely on the cluster software ...

Any insights/tips are greatly appreciated!

Franky

On Fri, May 03, 2013 at 10:39:33AM +0200, Franky Van Liedekerke wrote:
> Virtlockd
> =========
> Since sanlock was not working as expected, I tried virtlockd. This seems to be working well, but: live migration fails because the file on the shared disk (used for locking) is being locked by the original server running the VM. So when trying to do live migration, this fails and leaves the VM paused on the original server. Trying to migrate a paused server then works ok, but that's of course not live migration.

That would be a bug - the lock is supposed to be released on the source & re-acquired on the dest at the switch-over point.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 2013-05-03 10:51, Daniel P. Berrange wrote:
> That would be a bug - the lock is supposed to be released on the source & re-acquired on the dest at the switch-over point.
> Daniel

Okay ... any way to test/debug that? Any gfs2 options to take into account? I can recompile virtlockd if wanted ...

Franky

On Fri, May 03, 2013 at 11:07:52AM +0200, Franky Van Liedekerke wrote:
> Okay ... any way to test/debug that? Any gfs2 options to take into account? I can recompile virtlockd if wanted ...

It is unlikely to have anything to do with GFS2. It'll be a bug hiding somewhere in the lock manager or QEMU code, I expect. Some sequence of operations is not quite right.

Daniel

On 2013-05-03 11:10, Daniel P. Berrange wrote:
> It is unlikely to have anything to do with GFS2. It'll be a bug hiding somewhere in the lock manager or QEMU code, I expect. Some sequence of operations is not quite right.
> Daniel

I'm seeing this on the source host when doing a live migration:

error : virLockSpaceAcquireResource:662 : resource busy Lockspace resource '86f38e27a48275cc2b1a1e12897ba0339875d5b005e68fe1f7d4a1ac038ccdb3' is locked

and this on the destination:

error : virLockSpaceResourceNew:168 : resource busy Lockspace resource '86f38e27a48275cc2b1a1e12897ba0339875d5b005e68fe1f7d4a1ac038ccdb3' is locked

So ... any C code tips? Or should I just not use virtlockd?

Franky
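When debugging errors like these, it can help to know which disk a lockspace resource refers to. The resource name in the log is 64 hex characters long, which is consistent with libvirt's documented behaviour for virtlockd's file_lockspace_dir, where leases are named after the SHA-256 checksum of the disk's fully qualified path. The sketch below rests on that assumption; the helper names and the image paths are hypothetical and do not come from this thread.

```python
import hashlib


def lockspace_resource_name(disk_path: str) -> str:
    """Return the SHA-256 hex digest of a fully qualified disk path.

    Assumption: virtlockd names file-based leases this way.
    """
    return hashlib.sha256(disk_path.encode("utf-8")).hexdigest()


def find_disk_for_resource(resource: str, disk_paths):
    """Return the path whose digest equals `resource`, or None if none match."""
    for path in disk_paths:
        if lockspace_resource_name(path) == resource:
            return path
    return None


if __name__ == "__main__":
    # Hypothetical image paths; substitute the disks of your own guests.
    candidates = [
        "/mnt/gfs2/images/vm1.img",
        "/mnt/gfs2/images/vm2.img",
    ]
    busy = "86f38e27a48275cc2b1a1e12897ba0339875d5b005e68fe1f7d4a1ac038ccdb3"
    print(find_disk_for_resource(busy, candidates))
```

Feeding it the source paths from each guest's disk definitions should narrow down which image the "resource busy" message is about, provided the naming assumption holds for the libvirt version in use.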

On Fri, May 03, 2013 at 10:10:50AM +0100, Daniel P. Berrange wrote:
> It is unlikely to have anything to do with GFS2. It'll be a bug hiding somewhere in the lock manager or QEMU code, I expect. Some sequence of operations is not quite right.

A rather awful coding screwup caused us to never release locks when the VM was paused, thus breaking migration.

The fix is here:

https://www.redhat.com/archives/libvir-list/2013-May/msg00240.html

Daniel
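The failure mode described above can be illustrated with a toy model. This is only a sketch of the hand-off sequence (release on the source when the guest pauses, re-acquire on the destination before it resumes), not libvirt's or virtlockd's actual code. The LockSpace class, the migrate function and the release_on_pause flag are all hypothetical names introduced for illustration.

```python
class ResourceBusy(Exception):
    """Raised when a resource is already locked by another host."""


class LockSpace:
    """Toy shared lockspace: maps a resource name to the host holding it."""

    def __init__(self):
        self._owners = {}

    def acquire(self, resource, host):
        owner = self._owners.get(resource)
        if owner is not None and owner != host:
            raise ResourceBusy(f"Lockspace resource '{resource}' is locked")
        self._owners[resource] = host

    def release(self, resource, host):
        # Only the current owner can drop the lock.
        if self._owners.get(resource) == host:
            del self._owners[resource]

    def owner(self, resource):
        return self._owners.get(resource)


def migrate(space, resource, source, dest, release_on_pause=True):
    """Model the switch-over point of a live migration.

    With release_on_pause=False the source keeps its lock while paused,
    which mimics the bug: the destination can never acquire it.
    """
    if release_on_pause:
        space.release(resource, source)
    # The destination must hold the lock before it resumes the guest.
    space.acquire(resource, dest)
    return dest
```

In this model, calling migrate with release_on_pause=False raises ResourceBusy and leaves the lock with the source host, which matches the symptom of the VM staying paused on the original server.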

On 2013-05-03 15:07, Daniel P. Berrange wrote:
> A rather awful coding screwup caused us to never release locks when the VM was paused, thus breaking migration.
> The fix is here:
> https://www.redhat.com/archives/libvir-list/2013-May/msg00240.html
> Daniel

I confirm that the fix seems to work. I need to test some more, but it's looking good so far.

Franky
participants (2)
- Daniel P. Berrange
- Franky Van Liedekerke