
[moderator note: .pngs were stripped to avoid overwhelming the mail server and list recipients with 1M of data]

-------- Forwarded Message --------
Date: Wed, 30 Mar 2016 11:25:09 +0200
Message-ID: <20160330112509.Horde.0Oad-ZfkzdZnouXf-VkEmw1@webmailperso.univ-brest.fr>
From: villeneu@kassis.univ-brest.fr
To: Franky Van Liedekerke <liedekef@telenet.be>
Cc: libvirt-users@redhat.com
Subject: Re: [libvirt-users] VM crash and lock manager
References: <20160329192740.Horde.BZXrmDPzMBshASPvOs_x4Q6@webmailperso.univ-brest.fr> <20160329231311.1ae867a1@telenet.be>
In-Reply-To: <20160329231311.1ae867a1@telenet.be>

Franky Van Liedekerke <liedekef@telenet.be> wrote:
On Tue, 29 Mar 2016 19:27:40 +0200 villeneu@kassis.univ-brest.fr wrote:
Hello
I migrated my 8 hypervisors from CentOS 6.0 to Fedora 23 and added locking to prevent a VM from being started on multiple nodes, and that works well.
Since I changed the OS, some old VMs (CentOS 5.x and 6.x) with very old kernels have become very unstable and crash very often (after one or two days). They never had these problems before.
Perhaps the reason is that the device is too slow? See the attached screenshot.
A major problem with these crashes is that the VMs cannot be destroyed with the virsh command; the ps command reports the qemu process as defunct.
With "virsh destroy VM" I often get:
Failed to terminate process xxx with SIGTERM: Device or resource busy
and the VM is still in the list.
If I try to remove all the files associated with the VM in /var/run or /var/lib....channel, it has no effect. The VM is still listed as running with a defunct process, and after a libvirt restart, libvirt and the virsh commands hang.
With sanlock this is a problem because the lock is never released, so it is impossible to restart the VM. I can't remove the VM from the sanlock list. I tried kill commands, rm, and so on, but without any real success.
I tried many sanlock commands (perhaps I misunderstood some of them), but I never succeeded in removing the lock on the VMs. For example:
sanlock client rem_lockspace -r __LIBVIRT__DISKS__:b90b9c61e2d6413077205907ffb3281a:/var/lib/libvirt/images/POOL_ADMIN/sanlock/b90b9c61e2d6413077205907ffb3281a:0:4 -p 9383
So I am now trying to use lock_manager instead of sanlock, but lock_manager has no tools to look up the lock for a given VM name after a failure or crash.
My questions are: is there a way to release, or force the release of, a sanlock lock even if the qemu process is defunct, and a way to get libvirt restarted and working normally again?
Perhaps it is possible to force the release of the lock through the API?
Are there tools to retrieve or calculate the SHA256 lock for a VM with the lock_manager lock? I saw that the lock is a SHA256 hash of the path and the VM name, but I never get the right hash when I compute it by hand.
Thanks. Sorry for my poor English, I am French.
Michel
It seems the lock held by sanlock isn't being released, so the vm can't be shut down properly. Maybe some selinux issue? I've been using virtlockd without troubles for more than a year already (without selinux). For the locks, see if the lslocks command helps you.
Franky
_______________________________________________
libvirt-users mailing list
libvirt-users@redhat.com
https://www.redhat.com/mailman/listinfo/libvirt-users

I do not use SELinux either. I preferred sanlock at the beginning because sanlock comes with tools to remove locks. But when VMs crash abnormally or freeze (see the screenshot, which I forgot to attach yesterday), there is no way to release or remove the locks: they are still present even if I kill the qemu process, and I can't restart the VM because the lock is still there.

So I am now trying lock-manager, but do you know exactly how the lock is calculated? In case of a problem I want to be able to find the lock that belongs to a given VM. I have seen that it is a SHA256 hash of the path and the name, but if I compute a sha256 of the complete path of the VM by hand, I don't get the correct value (the same value as lock_manager).

Any ideas?

Thanks

crash1.png crash2.png
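
On computing the hash by hand, here is a minimal sketch. It assumes, as I understand the libvirt lockd documentation, that with file_lockspace_dir set the lease is named after the SHA256 checksum of the fully qualified disk path alone, without the VM name. A common reason for a mismatch is a trailing newline, for example from "echo path | sha256sum". The helper name lease_name and the example path are hypothetical. The 32-character name in the sanlock resource above has the length of an MD5 digest, so the algorithm is a parameter.

```python
import hashlib


def lease_name(disk_path: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of the path string exactly as given.

    Assumption: the lock manager hashes the fully qualified disk path
    only, with no trailing newline and no VM name appended.
    """
    return hashlib.new(algorithm, disk_path.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical disk image path, for illustration only.
    path = "/var/lib/libvirt/images/example.img"

    print("sha256:", lease_name(path))
    print("md5:   ", lease_name(path, "md5"))

    # What "echo path | sha256sum" hashes: the path plus a newline.
    print("sha256 with newline:", lease_name(path + "\n"))
```

If the first value matches a file name in the lockspace directory, the lease belongs to that disk. If only the newline variant matches what was computed by hand, the shell command added the newline.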