
Hi Peter,

thanks for your help.

----- On Jun 5, 2019, at 9:27 AM, Peter Krempa pkrempa@redhat.com wrote:
=============================================================
...
2019-05-31 20:31:34.481+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-01 01:05:32.233+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
This message is printed if qemu crashes for some reason and then closes the monitor socket unexpectedly.
2019-06-01 01:05:43.804+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:05:43.848+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:11.438+0000: 26112: warning : qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for domain severin; current job is (modify, none) owned by (5372 remoteDispatchDomainBlockJobAbort, 0 <null>) for (39s, 0s)
2019-06-01 01:06:11.438+0000: 26112: error : qemuDomainObjBeginJobInternal:4877 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)
So this means that the virDomainBlockJobAbort API which is also used for --pivot got stuck for some time.
This is kind of strange if the VM crashed. There might also be a bug in the synchronous block job handling, but it's hard to tell from this log.
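As a rough aid for reading such logs, here is a minimal Python sketch that pulls the interesting fields out of the "Cannot start job" warning: which job was refused, which API call owns the current job, and for how long it has held it. The regular expression is derived only from the single sample line quoted above, not from libvirt's source, and the function name `parse_job_conflict` is made up for this illustration.

```python
import re
from typing import Optional

# Pattern inferred from the one warning line quoted in this thread.
# It is an assumption that other libvirt versions format the message the same way.
JOB_CONFLICT_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+): (?P<thread>\d+): warning : "
    r"qemuDomainObjBeginJobInternal:\d+ : "
    r"Cannot start job \((?P<wanted>\w+), \w+\) for domain (?P<domain>[^;\s]+); "
    r"current job is \((?P<current>\w+), \w+\) "
    r"owned by \((?P<owner_thread>\d+) (?P<owner_api>\w+), .*?\) "
    r"for \((?P<held_seconds>\d+)s, \d+s\)"
)

SAMPLE = (
    "2019-06-01 01:06:11.438+0000: 26112: warning : "
    "qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) "
    "for domain severin; current job is (modify, none) owned by "
    "(5372 remoteDispatchDomainBlockJobAbort, 0 <null>) for (39s, 0s)"
)


def parse_job_conflict(line: str) -> Optional[dict]:
    """Return the fields of a job-conflict warning, or None if the line differs."""
    match = JOB_CONFLICT_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the hold time to an integer so callers can compare or sort on it.
    fields["held_seconds"] = int(fields["held_seconds"])
    return fields


if __name__ == "__main__":
    info = parse_job_conflict(SAMPLE)
    if info is not None:
        print(info["domain"], info["owner_api"], info["held_seconds"])
```

Running this over a whole libvirtd log, line by line, shows at a glance which API call kept the job lock and for how long before the timeout was reported.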
The VM didn't crash. It kept running. See "last":

root     pts/49       ha-idg-2.scidom. Tue Jun  4 14:02 - 13:18  (23:16)
root     pts/47       pc60337.scidom.d Mon Jun  3 15:13   still logged in
reboot   system boot  2.6.4-52-smp     Wed May 15 20:19          (20+17:02)
reboot   system boot  2.6.4-52-smp     Fri Mar 15 17:38          (81+18:44)
reboot   system boot  2.6.4-52-smp     Wed Feb 27 20:29          (15+21:04)
The syslog from the domain itself didn't reveal anything; it just continues to run. The libvirt log for the domain just says:

qemu-system-x86_64: block/mirror.c:864: mirror_run: Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.
So that's interesting. Usually an assertion failure in qemu leads to calling abort(), and thus the VM would have crashed. Didn't your HA solution restart it?
No. As said the VM didn't crash. It kept running.
At any rate it would be really beneficial if you could collect debug logs for libvirtd which also contain the monitor interactions with qemu:
https://wiki.libvirt.org/page/DebugLogs
The qemu assertion failure above should ideally be reported to qemu, but if you are able to reproduce the problem with libvirtd debug logs enabled, I can extract more useful info from there, which the qemu project would ask you for anyway.
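For putting together such a report, a small Python sketch like the following can pick assertion failures out of a domain log. It assumes the lines follow the layout of the one quoted above (program, source file, line number, function, failed expression); the function name `find_assertions` is hypothetical and only used for illustration.

```python
import re
from typing import Dict, Iterable, List

# Layout assumed from the single assertion line quoted in this thread:
#   <program>: <file>:<line>: <function>: Assertion `<expr>' failed.
ASSERT_RE = re.compile(
    r"(?P<program>[^:\s]+): (?P<file>[^:\s]+):(?P<line>\d+): "
    r"(?P<function>\w+): Assertion `(?P<expression>.*)' failed\."
)


def find_assertions(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Collect every assertion failure found in the given log lines."""
    found = []
    for text in lines:
        match = ASSERT_RE.search(text)
        if match is not None:
            found.append(match.groupdict())
    return found


if __name__ == "__main__":
    sample = [
        "some unrelated log line",
        "qemu-system-x86_64: block/mirror.c:864: mirror_run: "
        "Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.",
    ]
    for hit in find_assertions(sample):
        print(hit["file"], hit["line"], hit["function"])
```

The source file, line number and function are usually the first things asked for in a bug report, so extracting them up front saves a round trip.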
I can't reproduce it. It seems to happen at random. But I can collect the logs. Do they get very large? I can contact you the next time it happens. Is that ok for you?

Bernd

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Stellv. Aufsichtsratsvorsitzender: MinDirig. Dr. Manfred Wolter
Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Heinrich Bassler, Kerstin Guenther
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671