[libvirt] Bug report 826704 - sanlock releases all resources on virsh detach-disk

Hello,

I logged a bug about virsh detach-disk cleaning up all sanlock resources for the domain instead of only the device in question.

After a quick look into the code, I think a new method similar to virLockManagerSanlockAddResource is needed for detaching a disk from the domain, e.g. virLockManagerSanlockDelResource (…).

Currently virLockManagerSanlockRelease is called, which releases all resources:

    if ((rv = sanlock_release(-1, priv->vm_pid, SANLK_REL_ALL, 0, NULL)) < 0) {

In my opinion, virsh detach-disk should then call virLockManagerSanlockDelResource for the given resource.

Any thoughts about this, or why it is implemented like this?

--
Frido Roose
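For context: sanlock_release() (declared in sanlock's public header, sanlock_resource.h, and matching the call quoted above) takes either an explicit lease list or the SANLK_REL_ALL flag. A minimal illustration of the two calling modes, with hypothetical wrapper names:

    #include <stddef.h>
    #include <sanlock_resource.h>

    /* Illustration of the two calling modes of sanlock_release().
     * Passing sock = -1 tells the daemon to look the client up by pid. */

    /* Drop every lease the process holds: what the quoted libvirt code does. */
    static int
    releaseAll(int pid)
    {
        return sanlock_release(-1, pid, SANLK_REL_ALL, 0, NULL);
    }

    /* Drop only the listed leases: the behavior detach-disk wants. */
    static int
    releaseOne(int pid, struct sanlk_resource *res)
    {
        struct sanlk_resource *res_args[1] = { res };

        return sanlock_release(-1, pid, 0, 1, res_args);
    }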

On Thursday 31 May 2012 at 10:44, Frido Roose wrote:

> Hello,
>
> I logged a bug about virsh detach-disk cleaning up all sanlock resources for the domain instead of only the device in question.
>
> After a quick look into the code, I think a new method similar to virLockManagerSanlockAddResource is needed for detaching a disk from the domain, e.g. virLockManagerSanlockDelResource (…).
>
> Currently virLockManagerSanlockRelease is called, which releases all resources:
>
>     if ((rv = sanlock_release(-1, priv->vm_pid, SANLK_REL_ALL, 0, NULL)) < 0) {
>
> In my opinion, virsh detach-disk should then call virLockManagerSanlockDelResource for the given resource.
>
> Any thoughts about this, or why it is implemented like this?

With the help of some debugging output and the source code, new questions arise:

Reproducing the problem
=======================

10:10:47 pan ~ # virsh domblklist vmor01
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/vmor01.img
vdb        /dev/disk/by-id/wwn-0x600a0b80002695880000cc4c4f865710

10:10:51 pan ~ # sanlock client status
daemon b7baf6f0-2d9b-4b6f-be1e-2169284c68a0.pan
p -1 listener
p 9846
p -1 status
s __LIBVIRT__DISKS__:14:/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__:0
r __LIBVIRT__DISKS__:d540f8af975019b0d41c4c65e4955072:/var/lib/libvirt/sanlock/d540f8af975019b0d41c4c65e4955072:0:11 p 9846
r __LIBVIRT__DISKS__:571a282fc5e8cbb42167ce9ec1d41a95:/var/lib/libvirt/sanlock/571a282fc5e8cbb42167ce9ec1d41a95:0:10 p 9846

10:10:53 pan ~ # virsh detach-disk vmor01 vdb
Disk detached successfully

10:11:39 pan ~ # sanlock client status
daemon b7baf6f0-2d9b-4b6f-be1e-2169284c68a0.pan
p -1 listener
p 9846
p -1 status
s __LIBVIRT__DISKS__:14:/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__:0

Note that after detaching only vdb, both disk resources ("r ..." lines) are gone from sanlock, not just the one for vdb.

DEBUG output
============

2012-06-01 08:11:39.198+0000: 9718: debug : virDomainLockManagerNew:123 : plugin=0x7fa5240cb4f0 dom=0x7fa52419d460 withResources=0
2012-06-01 08:11:39.198+0000: 9718: debug : virLockManagerNew:291 : plugin=0x7fa5240cb4f0 type=0 nparams=4 params=0x7fa533cdf8d0 flags=0
2012-06-01 08:11:39.198+0000: 9718: debug : virLockManagerLogParams:98 : key=uuid type=uuid value=c0ee1dfa-4056-f3f9-9f8c-f10e974b59f0
2012-06-01 08:11:39.198+0000: 9718: debug : virLockManagerLogParams:94 : key=name type=string value=vmor01
2012-06-01 08:11:39.198+0000: 9718: debug : virLockManagerLogParams:82 : key=id type=uint value=1
2012-06-01 08:11:39.198+0000: 9718: debug : virLockManagerLogParams:82 : key=pid type=uint value=9846
2012-06-01 08:11:39.198+0000: 9718: debug : virDomainLockManagerAddDisk:86 : Add disk /dev/disk/by-id/wwn-0x600a0b80002695880000cc4c4f865710
2012-06-01 08:11:39.198+0000: 9718: debug : virLockManagerAddResource:320 : lock=0x7fa524198360 type=0 name=/dev/disk/by-id/wwn-0x600a0b80002695880000cc4c4f865710 nparams=0 params=(nil) flags=0
2012-06-01 08:11:39.198+0000: 9718: debug : virLockManagerRelease:352 : lock=0x7fa524198360 state=(nil) flags=0
2012-06-01 08:11:39.214+0000: 9718: debug : virLockManagerFree:374 : lock=0x7fa524198360

SOURCE
======

qemuDomainDetachDiskDevice or qemuDomainDetachPciDiskDevice calls virDomainLockDiskDetach (src/locking/domain_lock.c):

    int virDomainLockDiskDetach(virLockManagerPluginPtr plugin,
                                virDomainObjPtr dom,
                                virDomainDiskDefPtr disk)
    {
        virLockManagerPtr lock;
        int ret = -1;

        if (!(lock = virDomainLockManagerNew(plugin, dom, false)))
            return -1;

        if (virDomainLockManagerAddDisk(lock, disk) < 0)
            goto cleanup;

        if (virLockManagerRelease(lock, NULL, 0) < 0)
            goto cleanup;

        ret = 0;

    cleanup:
        virLockManagerFree(lock);
        return ret;
    }

I wondered why virDomainLockManagerAddDisk is called when detaching a disk device. It even looks like this succeeds, and then virLockManagerRelease is called, which cleans up all sanlock resources for the domain.

I suspect a function like virDomainLockManagerRemoveDisk(lock, disk), which does not yet exist, should be called at this point to release the specific disk. Only virLockDriverAddResource exists in the _virLockDriver struct.

I have the feeling that this sanlock support is not completely implemented yet. Or am I missing something big?

Best regards,
Frido Roose

On 06/01/2012 02:46 AM, Frido Roose wrote:
> I wondered why virDomainLockManagerAddDisk is called when detaching a disk device. It even looks like this succeeds, and then virLockManagerRelease is called, which cleans up all sanlock resources for the domain.
>
> I suspect a function like virDomainLockManagerRemoveDisk(lock, disk), which does not yet exist, should be called at this point to release the specific disk.
Yes, that sounds reasonable.
> Only virLockDriverAddResource exists in the _virLockDriver struct.
>
> I have the feeling that this sanlock support is not completely implemented yet. Or am I missing something big?
You are correct that no one has coded it yet. Since it seems to be your itch, would you mind taking a stab at writing the patch to add it?

--
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Fri, Jun 01, 2012 at 10:46:04AM +0200, Frido Roose wrote:
> On Thursday 31 May 2012 at 10:44, Frido Roose wrote:
> > Hello,
> >
> > I logged a bug about virsh detach-disk cleaning up all sanlock resources for the domain instead of only the device in question.
> >
> > After a quick look into the code, I think a new method similar to virLockManagerSanlockAddResource is needed for detaching a disk from the domain, e.g. virLockManagerSanlockDelResource (…).
> >
> > Currently virLockManagerSanlockRelease is called, which releases all resources:
> >
> >     if ((rv = sanlock_release(-1, priv->vm_pid, SANLK_REL_ALL, 0, NULL)) < 0) {
> >
> > In my opinion, virsh detach-disk should then call virLockManagerSanlockDelResource for the given resource.
You are a little mixed up with the sanlock API naming here vs the hotplug/hotunplug actions. The way sanlock works is that you do:

- Connect to sanlock
- Register all the disks that will be operated on using virLockManagerSanlockAddResource
- Invoke the acquire/inquire/release actions on sanlock

In other words, using 'AddResource' is correct, even when releasing a resource/lease.

The problem is that the usage of SANLK_REL_ALL is wrong. We should be using the res_args + res_count fields to pass the explicit list of disks, instead of releasing all of them.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
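A minimal sketch of the release path Daniel describes, assuming the sanlock driver keeps the leases registered via AddResource in its res_args/res_count fields (the fields he names). This is a sketch of the idea, not the committed patch; the wrapper name and the fallback branch are illustrative:

    #include <stddef.h>
    #include <sanlock_resource.h>

    /* Sketch only: release just the leases registered for this
     * operation (res_args/res_count), falling back to SANLK_REL_ALL
     * when no explicit list was built (a plain VM shutdown, say). */
    static int
    sanlockReleaseRegistered(int vm_pid, int res_count,
                             struct sanlk_resource *res_args[])
    {
        if (res_count > 0)
            /* e.g. the single disk a detach operation registered */
            return sanlock_release(-1, vm_pid, 0, res_count, res_args);

        /* No explicit list: drop every lease the process holds,
         * as the pre-patch code always did. */
        return sanlock_release(-1, vm_pid, SANLK_REL_ALL, 0, NULL);
    }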

On Friday 8 June 2012 at 15:45, Daniel P. Berrange wrote:
> You are a little mixed up with the sanlock API naming here vs the hotplug/hotunplug actions. The way sanlock works is that you do:
>
> - Connect to sanlock
> - Register all the disks that will be operated on using virLockManagerSanlockAddResource
> - Invoke the acquire/inquire/release actions on sanlock
>
> In other words, using 'AddResource' is correct, even when releasing a resource/lease.
>
> The problem is that the usage of SANLK_REL_ALL is wrong. We should be using the res_args + res_count fields to pass the explicit list of disks, instead of releasing all of them.
You are absolutely right, I was confused about the naming. I found some time this weekend to look into the code again and changed it as you suggested.

I compiled and tested the patch, and the release of the resource looks fine now: only the specific resource/disk is removed from sanlock, while the others stay registered.

The patch is included in this message. Thanks for the help!

BR,
Frido

> I compiled and tested the patch, and the release of the resource looks fine now: only the specific resource/disk is removed from sanlock, while the others stay registered.
Hold on: although the behavior is as expected with the patch, there seems to be an issue with the return value or its check:

2012-07-09 10:53:29.105+0000: 6686: debug : virLockManagerAddResource:320 : lock=0x7f22e8009b60 type=0 name=/var/lib/libvirt/images/testlun2.img nparams=0 params=(nil) flags=0
2012-07-09 10:53:29.105+0000: 6686: debug : virLockManagerRelease:352 : lock=0x7f22e8009b60 state=(nil) flags=0
2012-07-09 10:53:29.105+0000: 6686: error : virLockManagerSanlockRelease:842 : Failed to release lock: Operation not permitted
2012-07-09 10:53:29.105+0000: 6686: debug : virLockManagerFree:374 : lock=0x7f22e8009b60
2012-07-09 10:53:29.105+0000: 6686: warning : qemuDomainDetachPciDiskDevice:1356 : Unable to release lock on /var/lib/libvirt/images/testlun2.img

The disk resource is, however, removed from the domain and from sanlock (as shown by sanlock client status), just as we expected. Probably something is missing in the return value handling.

Fyi,
Frido

> Hold on: although the behavior is as expected with the patch, there seems to be an issue with the return value or its check:
Problem found: the res_count variable was not properly initialized, causing the bogus return values. I have included a new patch, which I tested again by hot-attaching and hot-detaching disk devices. The debug output looks fine now and sanlock behaves as expected.

Could someone review this?

Thanks,
Frido
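To illustrate the bug class (a hypothetical, simplified reconstruction, not Frido's actual patch): if the count handed to sanlock_release() starts out as stack garbage, the return value check becomes unreliable even though the daemon may still release the right lease. Initializing the count before collecting the leases is the fix:

    #include <string.h>
    #include <sanlock_resource.h>

    #define MAX_RES 8   /* arbitrary bound for this illustration */

    /* Hypothetical reconstruction of the bug class described above:
     * collect the leases whose name matches, then release only those. */
    static int
    releaseMatchingLeases(int vm_pid,
                          struct sanlk_resource *all[], int n_all,
                          const char *name)
    {
        struct sanlk_resource *res_args[MAX_RES];
        int res_count = 0;   /* the fix: start at zero, not stack garbage */
        int i;

        for (i = 0; i < n_all && res_count < MAX_RES; i++) {
            if (strcmp(all[i]->name, name) == 0)
                res_args[res_count++] = all[i];
        }

        return sanlock_release(-1, vm_pid, 0, res_count, res_args);
    }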
participants (3)
- Daniel P. Berrange
- Eric Blake
- Frido Roose