On 10/8/21 11:37 PM, Jim Fehlig wrote:
> There have been countless reports from users concerned about the following
> error reported by libvirtd when qemu domains are shut down
>
> internal error: End of file from qemu monitor
>
> While the error is harmless, users often mistake it for a real problem with
> their deployments. EOF from the monitor can't be entirely ignored since
> other threads may be using the monitor and must be able to detect the EOF
> condition.
>
> One potential fix is to delay reporting EOF until the monitor is used
> after EOF is detected. This patch adds a 'goteof' member to the
> qemuMonitor structure, which is set when EOF is detected on the monitor
> socket. If another thread later tries to send data on the monitor, the
> EOF error is reported.
>
> Signed-off-by: Jim Fehlig <jfehlig(a)suse.com>
> ---
>
> An RFC patch to squelch qemu monitor EOF error messages on VM shutdown.
> Previous discussions and information on testing of the patch can be
> found in this thread
>
> https://listman.redhat.com/archives/libvir-list/2021-September/msg00949.html
>
> src/qemu/qemu_monitor.c | 29 ++++++++++++++++-------------
> 1 file changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
> index 5fc23f13d3..751ec8ba6c 100644
> --- a/src/qemu/qemu_monitor.c
> +++ b/src/qemu/qemu_monitor.c
> @@ -113,6 +113,7 @@ struct _qemuMonitor {
>
> /* true if qemu no longer wants 'props' sub-object of object-add */
> bool objectAddNoWrap;
> + bool goteof;
> };
>
> /**
> @@ -526,10 +527,10 @@ qemuMonitorIO(GSocket *socket G_GNUC_UNUSED,
> {
> qemuMonitor *mon = opaque;
> bool error = false;
> - bool eof = false;
> bool hangup = false;
>
> virObjectRef(mon);
> + mon->goteof = false;
>
At this point, the monitor object is unlocked (see the line below), so
setting this flag outside the lock is potentially dangerous. But I don't
think we need to set ->goteof here at all, do we? I mean, the moment we see
EOF the monitor object will be disposed of and never used again.
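
To make the locking point concrete, here is a minimal standalone sketch of
the deferred-EOF idea. It is not the libvirt code: demoMonitor and the
demoMonitor* functions are hypothetical stand-ins, and a plain pthread mutex
is assumed in place of the monitor object lock. The flag is only ever
touched with the lock held, and the error is raised only when a later
sender finds it set.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for qemuMonitor; only the fields needed here. */
typedef struct {
    pthread_mutex_t lock;
    bool goteof;         /* set once EOF was seen on the monitor socket */
    int errorsReported;  /* counts how often the EOF error was raised */
} demoMonitor;

static void demoMonitorInit(demoMonitor *mon)
{
    pthread_mutex_init(&mon->lock, NULL);
    mon->goteof = false;
    mon->errorsReported = 0;
}

/* I/O callback path: record EOF quietly, with the lock held. */
static void demoMonitorHandleEOF(demoMonitor *mon)
{
    pthread_mutex_lock(&mon->lock);
    mon->goteof = true;
    pthread_mutex_unlock(&mon->lock);
}

/* Sender path: returns 0 on success, -1 if the monitor already hit EOF.
 * The EOF error is raised here, not when EOF was first detected. */
static int demoMonitorSend(demoMonitor *mon, const char *cmd)
{
    int ret = 0;

    (void)cmd; /* a real sender would write cmd to the socket */

    pthread_mutex_lock(&mon->lock);
    if (mon->goteof) {
        mon->errorsReported++;
        ret = -1;
    }
    pthread_mutex_unlock(&mon->lock);
    return ret;
}
```

With this shape, a clean shutdown where nobody uses the monitor after EOF
produces no error at all, while a thread that calls demoMonitorSend() after
demoMonitorHandleEOF() still sees the failure.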

Agreed. I've removed it and sent a non-RFC version of the patch.

As an update to the commentary in that patch, my tests have now survived 220
iterations.

Regards,
Jim