Hi Daniel,

I thought migration might be the reason, but I'm still not seeing the behavior you describe with regard to pausing. I saw the following behavior (sketched in code after the list):

  1. Created VM on node 1
  2. Started VM on node 1
  3. Migrated VM to node 2; node 1's copy is now shut down, node 2's is running
  4. I paused node 2
  5. I started node 1, no error
  6. Paused node 1
  7. Unpaused node 2, no error
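
Expressed against the libvirt Python bindings, that sequence is roughly the following. This is a sketch rather than my literal commands: the guest name 'testvm' and the qemu+ssh URIs are placeholders for my real guest and hosts.

  import libvirt

  VM = 'testvm'  # placeholder guest name

  # one connection per hypervisor node
  conn1 = libvirt.open('qemu+ssh://node1/system')
  conn2 = libvirt.open('qemu+ssh://node2/system')

  dom1 = conn1.lookupByName(VM)
  dom1.create()                                     # 2. start on node 1

  # 3. live-migrate; node 1's copy ends up shut down, node 2's running
  dom2 = dom1.migrate(conn2, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

  dom2.suspend()                                    # 4. pause node 2's copy
  dom1.create()                                     # 5. start node 1 - no error raised
  dom1.suspend()                                    # 6. pause node 1's copy
  dom2.resume()                                     # 7. unpause node 2 - no error either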

I thought maybe the original VM had to be paused first, so I tried that as well (again sketched in code after the list):

  1. Created VM on node 1
  2. Started VM on node 1
  3. Migrated to node 2; node 1's copy is now shut down, node 2's is running
  4. I shut down node 2 instead of pausing it
  5. I started node 1
  6. I paused node 1
  7. Started node 2
  8. Paused node 2
  9. Started node 1
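
Picking up after the migration in step 3, that run looks roughly like this in the same placeholder terms (shutdown() only requests the shutdown, so I waited for the guest to actually power off before step 5; I'm reading my step 9 "started" as a resume of the paused copy):

  import libvirt

  VM = 'testvm'  # placeholder guest name
  conn1 = libvirt.open('qemu+ssh://node1/system')
  conn2 = libvirt.open('qemu+ssh://node2/system')
  dom1 = conn1.lookupByName(VM)
  dom2 = conn2.lookupByName(VM)   # the XML is defined on both nodes

  dom2.shutdown()    # 4. shut down node 2's copy (asynchronous)
  dom1.create()      # 5. start on node 1
  dom1.suspend()     # 6. pause node 1
  dom2.create()      # 7. start node 2
  dom2.suspend()     # 8. pause node 2
  dom1.resume()      # 9. node 1 runs again, despite node 2 running in between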

So sanlock is preventing both copies from running concurrently, but what I observe seems to contradict your statement:
"Even if you now pause the VM on node 2, and try to resume node 1, sanlock will still prevent node 1 from running again. It tracks a lock "version" number. The key is that once the original VM is paused, if any other VM runs on that disk, the original VM is never allowed to be unpaused again."

Am I mistaken?

Thanks,
Michael


On 1/31/2013 9:44 AM, Daniel P. Berrange wrote:
On Thu, Jan 31, 2013 at 09:34:47AM -0800, Michael Rodrigues wrote:
Hello,

I'm using libvirt and sanlock on qemu-kvm guests. Each guest has
its own Logical Volume for its root filesystem. Sanlock is
configured and working, and prevents me from starting the same VM
twice on multiple nodes and corrupting its root filesystem. Each
VM's domain XML resides on 2 servers that share the LVM volume group
over Fibre Channel.

In testing, I noticed that if I pause a VM on node 1, the sanlock
lock is relinquished, and I am able to start the same VM, using the
same root filesystem, on node 2. I get a lock error when unpausing
node 1's VM if node 2's copy is still running, but by this point,
the disk may already be corrupted.

The disk isn't corrupted - preventing the node 1 VM from un-pausing
is explicitly what is preventing the disk from becoming corrupted.

Even if you now pause the VM on node 2, and try to resume node 1,
sanlock will still prevent node 1 from running again. It tracks
a lock "version" number. The key is that once the original VM
is paused, if any other VM runs on that disk, the original VM is
never allowed to be unpaused again.

Is it necessary that paused VMs don't get to keep their locks? Is
there a way to configure sanlock to hold the lock while a VM is paused?

No, it isn't something that's configurable - this behaviour is
required in order to do migration, for example.

Regards,
Daniel


-- 
Michael Rodrigues
Interim Help Desk Manager
Gevirtz Graduate School of Education
Education Building 4203
(805) 893-8031
help@education.ucsb.edu