On 9/30/21 11:24, Daniel P. Berrangé wrote:
On Thu, Sep 30, 2021 at 11:15:18AM -0600, Jim Fehlig wrote:
> On 9/29/21 21:29, Jim Fehlig wrote:
>> Hi All,
>>
>> Likely Christian received a bug report that motivated commit aeda1b8c56,
>> which was later reverted by Michal with commit 72adaf2f10. In the past,
>> I recall being asked about "internal error: End of file from qemu
>> monitor" on normal VM shutdown and gave a hand wavy response using some
>> of Michal's words from the revert commit message.
>>
>> I recently received a bug report (sorry, but no public link) from a
>> concerned user about this error and wondered if there is some way to
>> improve it? I went down some dead ends before circling back to
>> Christian's patch. When rebased to latest master, I cannot reproduce the
>> hangs reported by Michal [1]. Perhaps Nikolay's series to resolve
>> hangs/crashes of libvirtd [2] has now made Christian's patch viable?
>
> Hmm, Nikolay's series improves thread management at daemon shutdown and
> doesn't touch VM shutdown logic. But there has been some behavior change
> from the time aeda1b8c56 was committed (3.4.0 dev cycle) to current git
> master. At least I don't see libvirtd hanging when running Michal's test on
> master + rebased aeda1b8c56.
This particular "eof" error message has been a source of never ending
complaints from people who mistakenly think it indicates a significant
problem.
Nod. I've probably been asked about it more times than I'm remembering.
"error"
and "fail" often catch the attention of users and monitor tools.
I've always been wary of hiding it by default as there are
potentially
scenarios it which it is interesting to see. I think I'm coming around
to the idea though that we're better off hiding it by default. The
scenarios which care about it will probably already need the user to
contribute full debug level logs in order to diagnose properly.
I've started a variant of Michal's test on git master + rebased aeda1b8c56. The
test starts 6 VMs, waits 90sec, shuts them down in parallel, waits for shutdowns
to finish, then repeats. All while continuously calling GetAllDomainStats from
another client. I'll see how that holds up before re-proposing Christian's patch.
Regards,
Jim