[libvirt-users] Sanlock gives up lock when VM is paused

Hello,

I'm using libvirt and sanlock on qemu-kvm guests. Each guest has its own logical volume for its root filesystem. Sanlock is configured and working, and prevents me from starting the same VM twice on multiple nodes and corrupting its root filesystem. Each VM's domain XML resides on 2 servers that share the LVM volume group over Fibre Channel.

In testing, I noticed that if I pause a VM on node 1, the sanlock lock is relinquished, and I am able to start the same VM, using the same root filesystem, on node 2. I get a lock error when unpausing node 1's VM if node 2's copy is still running, but by this point the disk may already be corrupted.

Is it necessary that paused VMs don't get to keep their locks? Is there a way to configure sanlock to keep the lock when a VM is paused?

Versions:
sanlock(-devel, -lib) 2.3-1.el6
libvirt(-lock-sanlock, -client) 0.9.10-21.el6_3.8

I'm using NFS for the lockspace.

Thanks,
Michael

--
Michael Rodrigues
Interim Help Desk Manager
Gevirtz Graduate School of Education
Education Building 4203
(805) 893-8031
help@education.ucsb.edu
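For reference, a setup like the one described above is usually wired together with libvirt's sanlock lock driver. A minimal sketch, assuming the stock option names from qemu.conf and qemu-sanlock.conf (the host_id and the NFS mount point are illustrative, not taken from the original report):

    # /etc/libvirt/qemu.conf: tell the QEMU driver to use the sanlock lock plugin
    lock_manager = "sanlock"

    # /etc/libvirt/qemu-sanlock.conf: plugin settings
    auto_disk_leases = 1                         # create a lease per disk automatically
    require_lease_for_disks = 1                  # refuse to start a guest whose disks cannot be leased
    disk_lease_dir = "/var/lib/libvirt/sanlock"  # shared NFS mount used as the lockspace
    host_id = 1                                  # unique per node (node 2 would use a different id)

    # restart libvirtd and sanlock on both nodes after changing these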

On Thu, Jan 31, 2013 at 09:34:47AM -0800, Michael Rodrigues wrote:
Hello,
I'm using libvirt and sanlock on qemu-kvm guests. Each guest has its own logical volume for its root filesystem. Sanlock is configured and working, and prevents me from starting the same VM twice on multiple nodes and corrupting its root filesystem. Each VM's domain XML resides on 2 servers that share the LVM volume group over Fibre Channel.
In testing, I noticed that if I pause a VM on node 1, the sanlock lock is relinquished, and I am able to start the same VM, using the same root filesystem, on node 2. I get a lock error when unpausing node 1's VM if node 2's copy is still running, but by this point, the disk may already be corrupted.
The disk isn't corrupted - preventing node 1's VM from un-pausing is explicitly what is preventing the disk from becoming corrupted. Even if you now pause the VM on node 2, and try to resume node 1, sanlock will still prevent node 1 from running again. It tracks a lock "version" number. The key is that once the original VM is paused, if any other VM runs on that disk, the original VM is never allowed to be unpaused again.
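One way to observe the lease state described here (a sketch; 'sanlock client status' and 'sanlock direct dump' are standard sanlock commands, but exact output details vary by version, and the lease path depends on the configured disk_lease_dir):

    # list the locks sanlock currently holds on this host; while the guest is
    # running there is one resource entry per leased disk
    sanlock client status

    # inspect the on-disk lease itself; the dump should include the lease
    # version ("lver") that is bumped each time a new holder acquires it
    # (with auto_disk_leases the lease file is named after a hash of the disk path)
    sanlock direct dump /var/lib/libvirt/sanlock/<lease-file>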
Is it necessary that paused VMs don't get to keep their locks? Is there a way to configure sanlock to lock when a VM is paused?
No, it isn't something that's configurable - this behaviour is required in order to do migration, for example.

Regards,
Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Hi Daniel,

I thought migration might be the reason, but I'm still not seeing the behavior you describe with regards to pausing. I saw the following behavior:

1. Created VM on node 1
2. Started VM on node 1
3. Migrated VM to node 2, node 1 is now shutdown, node 2 is running
4. I paused node 2
5. I started node 1, no error
6. Paused node 1
7. Unpaused node 2, no error

I thought maybe the original VM had to be paused first, so I tried that as well:

1. Created VM on node 1
2. Started VM on node 1
3. Migrated to node 2, node 1 is now shutdown, node 2 is running
4. I shutdown node 2 instead of pausing
5. I started node 1
6. I paused node 1
7. Started node 2
8. Paused node 2
9. Started node 1

So sanlock is preventing both from running concurrently, but it seems to contradict your statement: "Even if you now pause the VM on node 2, and try to resume node 1, sanlock will still prevent node 1 from running again. It tracks a lock "version" number. The key is that once the original VM is paused, if any other VM runs on that disk, the original VM is never allowed to be unpaused again."

Am I mistaken?

Thanks,
Michael

On 1/31/2013 9:44 AM, Daniel P. Berrange wrote:
On Thu, Jan 31, 2013 at 09:34:47AM -0800, Michael Rodrigues wrote:
Hello,
I'm using libvirt and sanlock on qemu-kvm guests. Each guest has its own logical volume for its root filesystem. Sanlock is configured and working, and prevents me from starting the same VM twice on multiple nodes and corrupting its root filesystem. Each VM's domain XML resides on 2 servers that share the LVM volume group over Fibre Channel.
In testing, I noticed that if I pause a VM on node 1, the sanlock lock is relinquished, and I am able to start the same VM, using the same root filesystem, on node 2. I get a lock error when unpausing node 1's VM if node 2's copy is still running, but by this point, the disk may already be corrupted.

The disk isn't corrupted - preventing node 1's VM from un-pausing is explicitly what is preventing the disk from becoming corrupted.
Even if you now pause the VM on node 2, and try to resume node 1, sanlock will still prevent node 1 from running again. It tracks a lock "version" number. The key is that once the original VM is paused, if any other VM runs on that disk, the original VM is never allowed to be unpaused again.
Is it necessary that paused VMs don't get to keep their locks? Is there a way to configure sanlock to keep the lock when a VM is paused?

No, it isn't something that's configurable - this behaviour is required in order to do migration, for example.
Regards, Daniel
--
Michael Rodrigues
Interim Help Desk Manager
Gevirtz Graduate School of Education
Education Building 4203
(805) 893-8031
help@education.ucsb.edu

On Thu, Jan 31, 2013 at 10:35:17AM -0800, Michael Rodrigues wrote:
Hi Daniel,
I thought migration might be the reason, but I'm still not seeing the behavior you describe with regards to pausing. I saw the following behavior:
1. Created VM on node 1
2. Started VM on node 1
3. Migrated VM to node 2, node 1 is now shutdown, node 2 is running
4. I paused node 2
5. I started node 1, no error
6. Paused node 1
7. Unpaused node 2, no error
I thought maybe the original VM had to be paused first, so I tried that as well:
1. Created VM on node 1
2. Started VM on node 1
3. Migrated to node 2, node 1 is now shutdown, node 2 is running
4. I shutdown node 2 instead of pausing
5. I started node 1
6. I paused node 1
7. Started node 2
8. Paused node 2
9. Started node 1
Hmm, that isn't supposed to be possible. When you paused node 1 in step 6, it was supposed to record the lease version number. When you resume in step 9, the version number should mismatch due to step 7, and thus sanlock ought to have caused an error at step 9. If that didn't happen, then I believe we have a bug.
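For reference, the second scenario maps onto roughly this sequence of virsh commands (a sketch; 'guest' and the node 2 connection URI are placeholders, not taken from the original report):

    # steps 1-3: start the guest on node 1 and migrate it to node 2
    virsh start guest
    virsh migrate --live guest qemu+ssh://node2/system

    # step 4: shut the guest down on node 2 (instead of pausing it)
    virsh -c qemu+ssh://node2/system shutdown guest

    # steps 5-6: start the guest on node 1 again, then pause it
    virsh start guest
    virsh suspend guest

    # steps 7-8: start a second copy on node 2, then pause it
    virsh -c qemu+ssh://node2/system start guest
    virsh -c qemu+ssh://node2/system suspend guest

    # step 9: resuming on node 1 should now fail with a lease error,
    # but in the behaviour reported above it succeeds
    virsh resume guest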
So sanlock is preventing both from running concurrently, but it seems to contradict your statement: "Even if you now pause the VM on node 2, and try to resume node 1, sanlock will still prevent node 1 from running again. It tracks a lock "version" number. The key is that once the original VM is paused, if any other VM runs on that disk, the original VM is never allowed to be unpaused again."
Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 1/31/2013 10:40 AM, Daniel P. Berrange wrote:
On Thu, Jan 31, 2013 at 10:35:17AM -0800, Michael Rodrigues wrote:
Hi Daniel,
I thought migration might be the reason, but I'm still not seeing the behavior you describe with regards to pausing. I saw the following behavior:
1. Created VM on node 1
2. Started VM on node 1
3. Migrated VM to node 2, node 1 is now shutdown, node 2 is running
4. I paused node 2
5. I started node 1, no error
6. Paused node 1
7. Unpaused node 2, no error
I thought maybe the original VM had to be paused first, so I tried that as well:
1. Created VM on node 1
2. Started VM on node 1
3. Migrated to node 2, node 1 is now shutdown, node 2 is running
4. I shutdown node 2 instead of pausing
5. I started node 1
6. I paused node 1
7. Started node 2
8. Paused node 2
9. Started node 1

Hmm, that isn't supposed to be possible. When you paused node 1 in step 6, it was supposed to record the lease version number. When you resume in step 9, the version number should mismatch due to step 7, and thus sanlock ought to have caused an error at step 9. If that didn't happen, then I believe we have a bug.

Should I file a report? I'm not really a developer but I can provide whatever information is necessary for a proper report. I don't have RHEL or a bugzilla account.
So sanlock is preventing both from running concurrently, but it seems to contradict your statement: "Even if you now pause the VM on node 2, and try to resume node 1, sanlock will still prevent node 1 from running again. It tracks a lock "version" number. The key is that once the original VM is paused, if any other VM runs on that disk, the original VM is never allowed to be unpaused again."

Daniel
--
Michael Rodrigues
Interim Help Desk Manager
Gevirtz Graduate School of Education
Education Building 4203
(805) 893-8031
help@education.ucsb.edu

On Thu, Jan 31, 2013 at 11:02:15AM -0800, Michael Rodrigues wrote:
Should I file a report? I'm not really a developer but I can provide whatever information is necessary for a proper report. I don't have RHEL or a bugzilla account.
Yes, please do file a bug report including this scenario to reproduce.

Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Just for follow up and future reference:

https://bugzilla.redhat.com/show_bug.cgi?id=906590

Thanks,
Michael

On 1/31/2013 12:03 PM, Daniel P. Berrange wrote:
Yes, please do file a bug report including this scenario to reproduce.
Daniel
--
Michael Rodrigues
Interim Help Desk Manager
Gevirtz Graduate School of Education
Education Building 4203
(805) 893-8031
help@education.ucsb.edu

In doing more testing I notice that the resource listed by 'sanlock client status' disappears when the VM is paused. I haven't found anything about lock versions yet, though.

However, even if sanlock behaved as you said, preventing the unpausing of a VM once another one has used its disk, this could still cause damage to the filesystem if the VM is paused during write operations and never allowed to finish (after the subsequent VM using the same storage takes over).

I would think the ideal behavior would be keeping the lock until the VM is completely off. I don't see why this would get in the way of migration, as the filesystem shouldn't need to be accessed until the RAM contents have been migrated to the other machine, but then again I'm not writing the code.

Thanks,
Michael

On 1/31/2013 3:56 PM, Michael Rodrigues wrote:
Just for follow up and future reference:
https://bugzilla.redhat.com/show_bug.cgi?id=906590
Thanks, Michael
--
Michael Rodrigues
Interim Help Desk Manager
Gevirtz Graduate School of Education
Education Building 4203
(805) 893-8031
help@education.ucsb.edu
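The observation above, that the resource disappears from 'sanlock client status' while the guest is paused, can be checked with something like the following (a sketch; 'guest' is a placeholder domain name):

    # lease is listed while the guest is running
    sanlock client status

    # pause the guest; the lease for its disk is released and drops out of the listing
    virsh suspend guest
    sanlock client status

    # resume; the lease is re-acquired (or the resume fails if another host holds it)
    virsh resume guest
    sanlock client status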
participants (2):
- Daniel P. Berrange
- Michael Rodrigues