Michel Villeneuve <Michel.Villeneuve@univ-brest.fr> wrote:
Hello,
Since I changed my hypervisor from CentOS 6.3 to Fedora 23,
I have had many problems with different VMs.
Very often, about once a day (I have about 150 VMs),
some VMs crash or freeze, seemingly at random, and I get
messages like this on the console:
[<fffffffff8000a08bd>] wake_bit_function+0x0/0x23
.....
[<fffffffff8800a0ead>] :jbd:journal_get_write_access+0x22/0x33
.....
[<fffffffff800013ccd>] :ext3:ext3_dirty_inode+0x63/0x7b
The VM is then crashed and cannot be accessed, although it often still responds to ping.
Before the migration I never encountered these problems. The VMs are strictly the same on the 6.3 and
Fedora 23 releases. I only changed the parameter:
<type arch='x86_64' machine='rhel-6.0.0'>hvm</type>
to
<type arch='x86_64' machine='pc-i440fx-2.4'>hvm</type>
and ran virsh define.
I tried some other values of this parameter without success.
I also added a lockd manager on Fedora 23.
Previously I used libvirt 0.9.5 or 1.0.2 on CentOS 6.3 without lockd.
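For reference, the machine-type change described above can be sketched in Python as follows. This is only an illustration, assuming the domain XML has the usual <os><type> element; the helper name set_machine_type and the trimmed sample XML are made up for this example, and the result would still have to be loaded with virsh define.

```python
import xml.etree.ElementTree as ET


def set_machine_type(domain_xml: str, machine: str) -> str:
    """Return domain XML with the machine attribute of <os><type> replaced.

    Hypothetical helper for illustration; it only edits the XML text and
    does not talk to libvirt.
    """
    root = ET.fromstring(domain_xml)
    type_elem = root.find("./os/type")
    if type_elem is None:
        raise ValueError("no <os><type> element in domain XML")
    type_elem.set("machine", machine)
    return ET.tostring(root, encoding="unicode")


# Trimmed sample definition (assumption: a real one has many more elements).
SAMPLE = """<domain type='kvm'>
  <name>TEST-VM-A</name>
  <os>
    <type arch='x86_64' machine='rhel-6.0.0'>hvm</type>
  </os>
</domain>"""

if __name__ == "__main__":
    print(set_machine_type(SAMPLE, "pc-i440fx-2.4"))
```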
A major problem with these crashes is that the VMs cannot be destroyed
with the virsh command; ps reports the qemu process as defunct.
If I try a virsh destroy,
I get this in the log file:
2016-04-20 20:32:47.318+0000: 5541: info : virEventPollRunOnce:641 : EVENT_POLL_RUN: nhandles=11 timeout=-1
2016-04-20 20:32:55.028+0000: 5567: debug : virProcessKillPainfully:368 : Timed out waiting after SIGTERM to process 8720, sending SIGKILL
2016-04-20 20:33:00.032+0000: 5567: error : virProcessKillPainfully:398 : Failed to terminate process 8720 with SIGKILL: Device or resource busy
or, on the console:
Failed to terminate process xxx with SIGTERM: Device or resource busy
and the VM remains in the list in a "Stopping" state.
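To find these failures in a large libvirtd log, the lines can be filtered with a short script. This is a minimal sketch, assuming the log layout seen in the lines quoted above (timestamp, thread id, level, function:line, message); the names failed_kills and SAMPLE_LOG are made up for this example.

```python
import re

# Layout assumed from the quoted lines:
# "<timestamp>: <thread id>: <level> : <function>:<line> : <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+ \S+): (?P<thread>\d+): (?P<level>\w+) : "
    r"(?P<func>\w+):(?P<line>\d+) : (?P<msg>.*)$"
)
FAILED_KILL = re.compile(
    r"Failed to terminate process (?P<pid>\d+) with (?P<sig>SIG\w+)"
)


def failed_kills(lines):
    """Return (timestamp, pid, signal) for each 'Failed to terminate' error."""
    hits = []
    for raw in lines:
        m = LOG_LINE.match(raw.strip())
        if not m or m.group("level") != "error":
            continue
        k = FAILED_KILL.search(m.group("msg"))
        if k:
            hits.append((m.group("ts"), int(k.group("pid")), k.group("sig")))
    return hits


# The three log lines quoted in this message.
SAMPLE_LOG = [
    "2016-04-20 20:32:47.318+0000: 5541: info : virEventPollRunOnce:641 : "
    "EVENT_POLL_RUN: nhandles=11 timeout=-1",
    "2016-04-20 20:32:55.028+0000: 5567: debug : virProcessKillPainfully:368 : "
    "Timed out waiting after SIGTERM to process 8720, sending SIGKILL",
    "2016-04-20 20:33:00.032+0000: 5567: error : virProcessKillPainfully:398 : "
    "Failed to terminate process 8720 with SIGKILL: Device or resource busy",
]

if __name__ == "__main__":
    for ts, pid, sig in failed_kills(SAMPLE_LOG):
        print(ts, pid, sig)
```

The message text after the signal name is not matched, so the filter should work whatever locale the error string is printed in.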
Output of ps for the qemu process attached to the VM:
qemu 8720 1 0 avril20 ? 00:07:16 [qemu-system-x86] <defunct>
root 8733 2 0 avril20 ? 00:00:01 [vhost-8720]
root 8735 2 0 avril20 ? 00:00:00 [kvm-pit/8720]
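One way to look more closely at such a defunct process is to read the state of each of its threads from /proc, since a thread stuck in uninterruptible sleep (state D) may explain why the process cannot be reaped. This is a minimal sketch, assuming the Linux /proc/<pid>/task/<tid>/stat layout; the function names are made up for this example.

```python
import os


def parse_stat_state(stat_text: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so the state is taken after the last ')'.
    """
    after_comm = stat_text[stat_text.rindex(")") + 1:]
    return after_comm.split()[0]


def thread_states(pid: int, proc_root: str = "/proc") -> dict:
    """Map each thread id of `pid` to its state letter (e.g. 'Z', 'D', 'S')."""
    states = {}
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in sorted(os.listdir(task_dir), key=int):
        with open(os.path.join(task_dir, tid, "stat")) as f:
            states[int(tid)] = parse_stat_state(f.read())
    return states


if __name__ == "__main__":
    # Inspect the current process as a harmless demonstration (Linux only).
    print(thread_states(os.getpid()))
```

For the case above, the call would be thread_states(8720), run as root on the hypervisor.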
libvirtd seems to be in an abnormal state. If I restart libvirtd,
the virsh command just hangs and never removes the VM from the list.
The only solution seems to be to reboot the hypervisor, but that takes down all the production VMs too.
Is there a way to remove the defunct qemu process without rebooting the hypervisor?
Perhaps the problem comes from the VM parameters, which were created on CentOS 6.3 with a libvirt version < 1.0. Do I need to convert some other parameters?
I am trying to set up a new hypervisor at a version level older than Fedora 23, perhaps CentOS 7.2, to
see what happens and whether it shows the same problem.
Thanks
PS:
I set log_level to 1.
---------------------- information in the log file
[root@kvmserver6 ~]# ls -al /var/lib/libvirt/qemu/domain-1-TEST-VM-A
total 8
drwxr-x--- 2 qemu qemu 4096 20 avril 22:01 .
drwxr-x--x. 18 qemu qemu 4096 20 avril 22:01 ..
srwxrwxr-x 1 qemu qemu 0 20 avril 22:01 monitor.sock
/var/lib/libvirt/qemu/channel/target/domain-1-TEST-VM-A/
[root@kvmserver6 ~]# cat /var/log/libvirt/qemu/TEST-VM-A.log
2016-04-20 20:01:36.714+0000: starting up libvirt version: 1.3.3, package: 1.fc23 (Unknown, 2016-04-06-15:17:39, thinkpad2), qemu version: 2.4.1 (qemu-2.4.1-8.fc23), hostname: kvmserver6.univ-brest.fr
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -name TEST-VM-A,debug-threads=on -S -machine pc-i440fx-2.4,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 1e4c27e4-123e-719a-9fdf-f783d34cbb40 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-TEST-VM-A/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/POOL_PROD4/TEST-VM-A.img,format=raw,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:77:11:11,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 0.0.0.0:0,password -k fr -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device ES1370,id=sound0,bus=pci.0,addr=0x4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
qemu: terminating on signal 15 from pid 5541

Michel Villeneuve
Tel 02 98 01 71 61