[libvirt-users] blockcommit of domain not successful

Hi,

I have several domains running on a 2-node HA cluster. Each night I create snapshots of the domains; after copying the consistent raw file to a CIFS server I blockcommit the changes back into the raw files. That has been running quite well, but recently the blockcommit did not work for one domain. I create a logfile of the whole procedure:

===============================================================
...
Sat Jun  1 03:05:24 CEST 2019

Target     Source
------------------------------------------------
vdb        /mnt/snap/severin.sn
hdc        -

/usr/bin/virsh blockcommit severin /mnt/snap/severin.sn --verbose --active --pivot
Block commit: [  0 %]Block commit: [ 15 %]Block commit: [ 28 %]Block commit: [ 35 %]Block commit: [ 43 %]Block commit: [ 53 %]Block commit: [ 63 %]Block commit: [ 73 %]Block commit: [ 82 %]Block commit: [ 89 %]Block commit: [ 98 %]Block commit: [100 %]

Target     Source
------------------------------------------------
vdb        /mnt/snap/severin.sn
...
==============================================================

The libvirtd log says (it's UTC, IIRC):

=============================================================
...
2019-05-31 20:31:34.481+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-01 01:05:32.233+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-01 01:05:43.804+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:05:43.848+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:11.438+0000: 26112: warning : qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for domain severin; current job is (modify, none) owned by (5372 remoteDispatchDomainBlockJobAbort, 0 <null>) for (39s, 0s)
2019-06-01 01:06:11.438+0000: 26112: error : qemuDomainObjBeginJobInternal:4877 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)
2019-06-01 01:06:13.976+0000: 5369: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:14.028+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:44.165+0000: 5371: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:44.218+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:07:14.343+0000: 5369: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:07:14.387+0000: 22598: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:07:44.495+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
...
===========================================================

and "cannot parse process status data" repeats until the end of the logfile. The syslog from the domain itself didn't reveal anything; it just continues to run. The libvirt log for the domain just says:

qemu-system-x86_64: block/mirror.c:864: mirror_run: Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.

Hosts are SLES 12 SP4 with libvirt-daemon-4.0.0-8.9.1.x86_64.
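For context, the nightly cycle described above boils down to something like the following (a minimal sketch; the base image path, the CIFS mount point and the snapshot name are assumptions for illustration, not the actual script):

#!/bin/bash
# Sketch of the nightly snapshot/copy/blockcommit cycle (paths are assumed).
DOM=severin
OVERLAY=/mnt/snap/${DOM}.sn        # external snapshot overlay (from the log above)
BASE=/vm/${DOM}.raw                # hypothetical path of the consistent raw image
CIFS=/mnt/backup                   # hypothetical CIFS mount point

# 1. External disk-only snapshot: guest writes go to the overlay from now on.
#    (Other devices, e.g. the cdrom hdc, may need --diskspec hdc,snapshot=no.)
virsh snapshot-create-as ${DOM} nightly --disk-only --atomic --no-metadata \
      --diskspec vdb,file=${OVERLAY}

# 2. Copy the now-quiescent raw base image to the CIFS server.
cp ${BASE} ${CIFS}/

# 3. Commit the overlay back into the raw image and pivot the domain onto it.
/usr/bin/virsh blockcommit ${DOM} ${OVERLAY} --verbose --active --pivot

# 4. The overlay is no longer referenced and can be removed.
rm -f ${OVERLAY}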
Bernd

--
Bernd Lentes
Systemadministration
Institut für Entwicklungsgenetik
Helmholtz Zentrum München
bernd.lentes@helmholtz-muenchen.de
http://www.helmholtz-muenchen.de/idg

On Tue, Jun 04, 2019 at 14:44:29 +0200, Lentes, Bernd wrote:
Hi,
Hi,
I have several domains running on a 2-node HA cluster. Each night I create snapshots of the domains; after copying the consistent raw file to a CIFS server I blockcommit the changes back into the raw files. That has been running quite well, but recently the blockcommit did not work for one domain. I create a logfile of the whole procedure:

===============================================================
...
Sat Jun  1 03:05:24 CEST 2019

Target     Source
------------------------------------------------
vdb        /mnt/snap/severin.sn
hdc        -
/usr/bin/virsh blockcommit severin /mnt/snap/severin.sn --verbose --active --pivot
Block commit: [  0 %]Block commit: [ 15 %]Block commit: [ 28 %]Block commit: [ 35 %]Block commit: [ 43 %]Block commit: [ 53 %]Block commit: [ 63 %]Block commit: [ 73 %]Block commit: [ 82 %]Block commit: [ 89 %]Block commit: [ 98 %]Block commit: [100 %]

Target     Source
------------------------------------------------
vdb        /mnt/snap/severin.sn
...
==============================================================
The libvirtd-log says (it's UTC IIRC):

=============================================================
...
2019-05-31 20:31:34.481+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-01 01:05:32.233+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
This message is printed if qemu crashes for some reason and then closes the monitor socket unexpectedly.
2019-06-01 01:05:43.804+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:05:43.848+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:11.438+0000: 26112: warning : qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for domain severin; current job is (modify, none) owned by (5372 remoteDispatchDomainBlockJobAbort, 0 <null>) for (39s, 0s)
2019-06-01 01:06:11.438+0000: 26112: error : qemuDomainObjBeginJobInternal:4877 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)
So this means that the virDomainBlockJobAbort API, which is also used for --pivot, got stuck for some time. This is kind of strange if the VM crashed; there might also be a bug in the synchronous block job handling, but it's hard to tell from this log.
2019-06-01 01:06:13.976+0000: 5369: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:14.028+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:44.165+0000: 5371: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:44.218+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:07:14.343+0000: 5369: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:07:14.387+0000: 22598: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:07:44.495+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
...
===========================================================

and "cannot parse process status data" continuously until the end of the logfile.
The syslog from the domain itself didn't reveal anything, it just continues to run. The libvirt log from the domains just says:

qemu-system-x86_64: block/mirror.c:864: mirror_run: Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.
So that's interesting. Usually an assertion failure in qemu leads to calling abort(), and thus the VM would have crashed. Didn't your HA solution restart it?

At any rate, it would be really beneficial if you could collect debug logs for libvirtd which also contain the monitor interactions with qemu:

https://wiki.libvirt.org/page/DebugLogs

The qemu assertion failure above should ideally be reported to qemu, but if you are able to reproduce the problem with libvirtd debug logs enabled I can extract more useful info from there, which the qemu project would ask you for anyway.
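For reference, the daemon-side settings that page describes look roughly like this (a sketch in the style used on SLES with libvirt 4.0; the exact filter string on the wiki may have changed since):

# /etc/libvirt/libvirtd.conf -- debug logging for the daemon (sketch)
log_level = 1
log_filters="1:qemu 3:remote 4:event 3:util.json 3:rpc"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

# Restart the daemon afterwards so the settings take effect:
#   systemctl restart libvirtd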

Hi Peter,

thanks for your help.

----- On Jun 5, 2019, at 9:27 AM, Peter Krempa pkrempa@redhat.com wrote:
=============================================================
...
2019-05-31 20:31:34.481+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-01 01:05:32.233+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
This message is printed if qemu crashes for some reason and then closes the monitor socket unexpectedly.
2019-06-01 01:05:43.804+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:05:43.848+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-01 01:06:11.438+0000: 26112: warning : qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for domain severin; current job is (modify, none) owned by (5372 remoteDispatchDomainBlockJobAbort, 0 <null>) for (39s, 0s)
2019-06-01 01:06:11.438+0000: 26112: error : qemuDomainObjBeginJobInternal:4877 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)
So this means that the virDomainBlockJobAbort API which is also used for --pivot got stuck for some time.
This is kind of strange if the VM crashed; there might also be a bug in the synchronous block job handling, but it's hard to tell from this log.
The VM didn't crash. It kept running. See "last":

root     pts/49       ha-idg-2.scidom. Tue Jun  4 14:02 - 13:18  (23:16)
root     pts/47       pc60337.scidom.d Mon Jun  3 15:13   still logged in
reboot   system boot  2.6.4-52-smp     Wed May 15 20:19          (20+17:02)
reboot   system boot  2.6.4-52-smp     Fri Mar 15 17:38          (81+18:44)
reboot   system boot  2.6.4-52-smp     Wed Feb 27 20:29          (15+21:04)
The syslog from the domain itself didn't reveal anything, it just continues to run. The libvirt log from the domains just says:

qemu-system-x86_64: block/mirror.c:864: mirror_run: Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.
So that's interesting. Usually an assertion failure in qemu leads to calling abort() and thus the VM would have crashed. Didn't your HA solution restart it?
No. As said, the VM didn't crash. It kept running.
At any rate it would be really beneficial if you could collect debug logs for libvirtd which also contain the monitor interactions with qemu:
https://wiki.libvirt.org/page/DebugLogs
The qemu assertion failure above should ideally be reported to qemu, but if you are able to reproduce the problem with libvirtd debug logs enabled I can extract more useful info from there which the qemu project would ask you anyways.
I can't reproduce it. It seems to happen sporadically. But I can collect the logs. Do they get very large? I can contact you the next time it happens. Is that ok for you?

Bernd

On Wed, Jun 05, 2019 at 13:33:49 +0200, Lentes, Bernd wrote:
Hi Peter,
thanks for your help.
----- On Jun 5, 2019, at 9:27 AM, Peter Krempa pkrempa@redhat.com wrote:
[...]
So that's interesting. Usually an assertion failure in qemu leads to calling abort() and thus the VM would have crashed. Didn't your HA solution restart it?
No. As said, the VM didn't crash. It kept running.
That's interesting. I hope you manage to reproduce it then.
At any rate it would be really beneficial if you could collect debug logs for libvirtd which also contain the monitor interactions with qemu:
https://wiki.libvirt.org/page/DebugLogs
The qemu assertion failure above should ideally be reported to qemu, but if you are able to reproduce the problem with libvirtd debug logs enabled I can extract more useful info from there which the qemu project would ask you anyways.
I can't reproduce it. It seems to happen sporadically. But I can collect the logs. Do they get very large? I can contact you the next time it happens. Is that ok for you?
Unfortunately they do get very large if there's some monitoring gathering stats through libvirt, but it's okay to nuke them prior to attempting the block commit, or daily or so. Please do contact me if you gather anything interesting.
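A simple way to do that from the backup script would be to truncate the daemon log right before the blockcommit, so only the interesting window is kept (a sketch; it assumes log_outputs points at /var/log/libvirt/libvirtd.log as in the wiki-style configuration):

# Keep the debug log small: empty it just before the nightly blockcommit.
: > /var/log/libvirt/libvirtd.log
/usr/bin/virsh blockcommit severin /mnt/snap/severin.sn --verbose --active --pivot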

----- On Jun 5, 2019, at 4:49 PM, Peter Krempa pkrempa@redhat.com wrote:
On Wed, Jun 05, 2019 at 13:33:49 +0200, Lentes, Bernd wrote:
Hi Peter,
thanks for your help.
----- On Jun 5, 2019, at 9:27 AM, Peter Krempa pkrempa@redhat.com wrote:
[...]
So that's interesting. Usually an assertion failure in qemu leads to calling abort() and thus the VM would have crashed. Didn't your HA solution restart it?
No. As said, the VM didn't crash. It kept running.
That's interesting. I hope you manage to reproduce it then.
I can't reproduce it. It seems to happen sporadically. But I can collect the logs. Do they get very large? I can contact you the next time it happens. Is that ok for you?
Unfortunately they do get very large if there's some monitoring gathering stats through libvirt, but it's okay to nuke them prior to attempting the block commit, or daily or so.
Please do contact me if you gather anything interesting.
Hi,

I followed https://wiki.libvirt.org/page/DebugLogs. Where do I have to set LIBVIRT_LOG_OUTPUTS="1:file:/tmp/libvirt_client.log"? Also in /etc/libvirt/libvirtd.conf?

Bernd

On Wed, Jun 05, 2019 at 18:02:06 +0200, Lentes, Bernd wrote: [...]
Hi,
I followed https://wiki.libvirt.org/page/DebugLogs. Where do I have to set LIBVIRT_LOG_OUTPUTS="1:file:/tmp/libvirt_client.log"? Also in /etc/libvirt/libvirtd.conf?
No, that's an environment variable if you want to debug the client, but that's not necessary in this case. Only the daemon logs are interesting.
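To make the distinction concrete (a sketch): the environment variable only affects the client process that runs virsh, while the daemon is configured through libvirtd.conf.

# Client-side logging (not needed here): exported in the shell that runs virsh.
export LIBVIRT_LOG_OUTPUTS="1:file:/tmp/libvirt_client.log"
virsh list --all

# Daemon-side logging (what matters here): set in /etc/libvirt/libvirtd.conf,
# e.g. log_outputs="1:file:/var/log/libvirt/libvirtd.log", then restart it:
systemctl restart libvirtd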

----- On Jun 5, 2019, at 4:49 PM, Peter Krempa pkrempa@redhat.com wrote:
On Wed, Jun 05, 2019 at 13:33:49 +0200, Lentes, Bernd wrote:
Hi Peter,
thanks for your help.
----- On Jun 5, 2019, at 9:27 AM, Peter Krempa pkrempa@redhat.com wrote:
[...]
So that's interesting. Usually an assertion failure in qemu leads to calling abort() and thus the VM would have crashed. Didn't your HA solution restart it?
No. As said, the VM didn't crash. It kept running.
That's interesting. I hope you manage to reproduce it then.
At any rate it would be really beneficial if you could collect debug logs for libvirtd which also contain the monitor interactions with qemu:
https://wiki.libvirt.org/page/DebugLogs
The qemu assertion failure above should ideally be reported to qemu, but if you are able to reproduce the problem with libvirtd debug logs enabled I can extract more useful info from there which the qemu project would ask you anyways.
I can't reproduce it. It seems to happen sporadically. But I can collect the logs. Do they get very large? I can contact you the next time it happens. Is that ok for you?
Unfortunately they do get very large if there's some monitoring gathering stats through libvirt, but it's okay to nuke them prior to attempting the block commit, or daily or so.
Please do contact me if you gather anything interesting.
Hi,

it happened again. According to the log of my script, it started to blockcommit the domain on the 8th of June at 5:59:09 (UTC+2). These are the related lines in libvirtd.log:

===================================================
2019-06-07 20:30:57.170+0000: 30299: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-08 03:59:17.690+0000: 30299: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-08 03:59:26.145+0000: 30300: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-08 03:59:26.191+0000: 30303: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-08 03:59:56.095+0000: 27956: warning : qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for domain severin; current job is (modify, none) owned by (13061 remoteDispatchDomainBlockJobAbort, 0 <null>) for (38s, 0s)
2019-06-08 03:59:56.095+0000: 27956: error : qemuDomainObjBeginJobInternal:4877 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)
2019-06-08 03:59:56.325+0000: 13060: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-08 03:59:56.372+0000: 30304: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-08 04:00:26.503+0000: 13060: warning : qemuGetProcessInfo:1461 : cannot parse process status data
====================================================

Since then the script is stuck. Thanks for your help.

Bernd

On Tue, Jun 11, 2019 at 19:19:05 +0200, Lentes, Bernd wrote: [...]
Hi,
Hi,
it happened again. According to the log of my script, it started to blockcommit the domain on the 8th of June at 5:59:09 (UTC+2). These are the related lines in libvirtd.log:

===================================================
Thanks for coming back to me with the information. Unfortunately this is not a full debug log, but I can try to tell you what I see here:
2019-06-07 20:30:57.170+0000: 30299: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-08 03:59:17.690+0000: 30299: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
So this looks like qemu crashed. Or at least it's the usual symptom we get. Is there anything in /var/log/libvirt/qemu/$VMNAME.log?
2019-06-08 03:59:26.145+0000: 30300: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-08 03:59:26.191+0000: 30303: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-08 03:59:56.095+0000: 27956: warning : qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for domain severin; current job is (modify, none) owned by (13061 remoteDispatchDomainBlockJobAbort, 0 <null>) for (38s, 0s)
And this looks to me as if the Abort job can't be interrupted properly while waiting synchronously for the job to finish. This seems to be the problem. If the VM indeed crashed, there's apparently a problem in the job waiting. I'd still really like to have debug logs in this case to really see what happened.

----- On Jun 13, 2019, at 9:56 AM, Peter Krempa pkrempa@redhat.com wrote:
Thanks for coming back to me with the information.
Unfortunately this is not a full debug log but I can try to tell you what I see here:
I configured libvirtd this way:

ha-idg-1:~ # grep -Ev '^$|#' /etc/libvirt/libvirtd.conf
log_level = 1
log_filters="1:qemu 3:remote 4:event 3:util.json 3:rpc"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
keepalive_interval = -1

That's what I found on https://wiki.libvirt.org/page/DebugLogs. Isn't that correct? That should create informative logfiles. The other host has exactly the same configuration but produces much bigger logfiles!? I have libvirt-daemon-4.0.0-8.12.1.x86_64.
2019-06-07 20:30:57.170+0000: 30299: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
2019-06-08 03:59:17.690+0000: 30299: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
So this looks like qemu crashed. Or at least it's the usual symptom we get. Is there anything in /var/log/libvirt/qemu/$VMNAME.log?
That's all:

qemu-system-x86_64: block/mirror.c:864: mirror_run: Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.
2019-06-08 03:59:26.145+0000: 30300: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-08 03:59:26.191+0000: 30303: warning : qemuGetProcessInfo:1461 : cannot parse process status data
2019-06-08 03:59:56.095+0000: 27956: warning : qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for domain severin; current job is (modify, none) owned by (13061 remoteDispatchDomainBlockJobAbort, 0 <null>) for (38s, 0s)
And this looks to me as if the Abort job can't be interrupted properly while waiting synchronously for the job to finish. This seems to be the problem. If the VM indeed crashed, there's apparently a problem in the job waiting.
I'd still really like to have debug logs in this case to really see what happened.
I configured logging as I found on https://wiki.libvirt.org/page/DebugLogs. What else can I do?

Bernd

----- On Jun 13, 2019, at 1:08 PM, Bernd Lentes bernd.lentes@helmholtz-muenchen.de wrote:

I found further information in /var/log/messages for both occurrences:

2019-06-01T03:05:31.620725+02:00 ha-idg-2 systemd-coredump[14253]: Core Dumping has been disabled for process 30590 (qemu-system-x86).
2019-06-01T03:05:31.712673+02:00 ha-idg-2 systemd-coredump[14253]: Process 30590 (qemu-system-x86) of user 488 dumped core.
2019-06-01T03:05:32.173272+02:00 ha-idg-2 kernel: [294682.387828] br0: port 4(vnet2) entered disabled state
2019-06-01T03:05:32.177111+02:00 ha-idg-2 kernel: [294682.388384] device vnet2 left promiscuous mode
2019-06-01T03:05:32.177122+02:00 ha-idg-2 kernel: [294682.388391] br0: port 4(vnet2) entered disabled state
2019-06-01T03:05:32.208916+02:00 ha-idg-2 wickedd[2954]: error retrieving tap attribute from sysfs
2019-06-01T03:05:41.395685+02:00 ha-idg-2 systemd-machined[2824]: Machine qemu-31-severin terminated.

2019-06-08T05:59:17.502899+02:00 ha-idg-1 systemd-coredump[31089]: Core Dumping has been disabled for process 19489 (qemu-system-x86).
2019-06-08T05:59:17.523050+02:00 ha-idg-1 systemd-coredump[31089]: Process 19489 (qemu-system-x86) of user 489 dumped core.
2019-06-08T05:59:17.650334+02:00 ha-idg-1 kernel: [999258.577132] br0: port 9(vnet7) entered disabled state
2019-06-08T05:59:17.650354+02:00 ha-idg-1 kernel: [999258.578103] device vnet7 left promiscuous mode
2019-06-08T05:59:17.650355+02:00 ha-idg-1 kernel: [999258.578108] br0: port 9(vnet7) entered disabled state
2019-06-08T05:59:25.983702+02:00 ha-idg-1 systemd-machined[1383]: Machine qemu-205-severin terminated.

Core dumping is disabled, but nevertheless a core dump has been created? Where could I find it? Would it be useful to provide it?

Bernd

On Thu, Jun 13, 2019 at 16:01:18 +0200, Lentes, Bernd wrote:
----- On Jun 13, 2019, at 1:08 PM, Bernd Lentes bernd.lentes@helmholtz-muenchen.de wrote:
I found further information in /var/log/messages for both occurrences:
2019-06-01T03:05:31.620725+02:00 ha-idg-2 systemd-coredump[14253]: Core Dumping has been disabled for process 30590 (qemu-system-x86).
2019-06-01T03:05:31.712673+02:00 ha-idg-2 systemd-coredump[14253]: Process 30590 (qemu-system-x86) of user 488 dumped core.
2019-06-01T03:05:32.173272+02:00 ha-idg-2 kernel: [294682.387828] br0: port 4(vnet2) entered disabled state
2019-06-01T03:05:32.177111+02:00 ha-idg-2 kernel: [294682.388384] device vnet2 left promiscuous mode
2019-06-01T03:05:32.177122+02:00 ha-idg-2 kernel: [294682.388391] br0: port 4(vnet2) entered disabled state
2019-06-01T03:05:32.208916+02:00 ha-idg-2 wickedd[2954]: error retrieving tap attribute from sysfs
2019-06-01T03:05:41.395685+02:00 ha-idg-2 systemd-machined[2824]: Machine qemu-31-severin terminated.
2019-06-08T05:59:17.502899+02:00 ha-idg-1 systemd-coredump[31089]: Core Dumping has been disabled for process 19489 (qemu-system-x86).
2019-06-08T05:59:17.523050+02:00 ha-idg-1 systemd-coredump[31089]: Process 19489 (qemu-system-x86) of user 489 dumped core.
2019-06-08T05:59:17.650334+02:00 ha-idg-1 kernel: [999258.577132] br0: port 9(vnet7) entered disabled state
2019-06-08T05:59:17.650354+02:00 ha-idg-1 kernel: [999258.578103] device vnet7 left promiscuous mode
2019-06-08T05:59:17.650355+02:00 ha-idg-1 kernel: [999258.578108] br0: port 9(vnet7) entered disabled state
2019-06-08T05:59:25.983702+02:00 ha-idg-1 systemd-machined[1383]: Machine qemu-205-severin terminated.
Core dumping is disabled, but nevertheless a core dump has been created? Where could I find it? Would it be useful to provide it?
So this really hints at qemu crashing. It certainly will be beneficial to collect the backtrace, but you really should report this (including the error message from the VM log file) to the qemu team. They might have even fixed it by now, so a plain update might help.
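If systemd-coredump actually stored the dump, something along these lines should locate it and produce a backtrace (a sketch; whether a dump was kept depends on core-dump resource limits and /etc/systemd/coredump.conf, which the "Core Dumping has been disabled" message hints at):

# List the dumps known to systemd-coredump and look for the qemu process
# from the 2019-06-08 incident (PID 19489 in the /var/log/messages lines).
coredumpctl list

# Open that dump in gdb (the matching qemu debuginfo packages are needed
# for readable symbols) and capture a full backtrace for the qemu report.
coredumpctl gdb 19489
# inside gdb:
#   (gdb) thread apply all bt full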

----- On Jun 14, 2019, at 9:14 AM, Peter Krempa pkrempa@redhat.com wrote:
On Thu, Jun 13, 2019 at 16:01:18 +0200, Lentes, Bernd wrote:
----- On Jun 13, 2019, at 1:08 PM, Bernd Lentes bernd.lentes@helmholtz-muenchen.de wrote:
I found further information in /var/log/messages for both occurrences:
2019-06-01T03:05:31.620725+02:00 ha-idg-2 systemd-coredump[14253]: Core Dumping has been disabled for process 30590 (qemu-system-x86).
2019-06-01T03:05:31.712673+02:00 ha-idg-2 systemd-coredump[14253]: Process 30590 (qemu-system-x86) of user 488 dumped core.
2019-06-01T03:05:32.173272+02:00 ha-idg-2 kernel: [294682.387828] br0: port 4(vnet2) entered disabled state
2019-06-01T03:05:32.177111+02:00 ha-idg-2 kernel: [294682.388384] device vnet2 left promiscuous mode
2019-06-01T03:05:32.177122+02:00 ha-idg-2 kernel: [294682.388391] br0: port 4(vnet2) entered disabled state
2019-06-01T03:05:32.208916+02:00 ha-idg-2 wickedd[2954]: error retrieving tap attribute from sysfs
2019-06-01T03:05:41.395685+02:00 ha-idg-2 systemd-machined[2824]: Machine qemu-31-severin terminated.
2019-06-08T05:59:17.502899+02:00 ha-idg-1 systemd-coredump[31089]: Core Dumping has been disabled for process 19489 (qemu-system-x86).
2019-06-08T05:59:17.523050+02:00 ha-idg-1 systemd-coredump[31089]: Process 19489 (qemu-system-x86) of user 489 dumped core.
2019-06-08T05:59:17.650334+02:00 ha-idg-1 kernel: [999258.577132] br0: port 9(vnet7) entered disabled state
2019-06-08T05:59:17.650354+02:00 ha-idg-1 kernel: [999258.578103] device vnet7 left promiscuous mode
2019-06-08T05:59:17.650355+02:00 ha-idg-1 kernel: [999258.578108] br0: port 9(vnet7) entered disabled state
2019-06-08T05:59:25.983702+02:00 ha-idg-1 systemd-machined[1383]: Machine qemu-205-severin terminated.
Core dumping is disabled, but nevertheless a core dump has been created? Where could I find it? Would it be useful to provide it?
So this really hints at qemu crashing. It certainly will be beneficial to collect the backtrace, but you really should report this (including the error message from the VM log file) to the qemu team.
They might have even fixed it by now, so a plain update might help.
Hi Peter,

thanks for your help. I'll continue on the Qemu ML:

https://lists.nongnu.org/archive/html/qemu-discuss/2019-06/msg00014.html

Bernd