On 01/18/2018 08:25 AM, Ján Tomko wrote:
> On Wed, Jan 17, 2018 at 04:45:38PM +0200, Serhii Kharchenko wrote:
>> Hello libvirt-users list,
>>
>> We're catching the same bug since 3.4.0 version (3.3.0 works OK).
>> So, we have process that is permanently connected to libvirtd via socket
>> and it is collecting stats, listening to events and control the VPSes.
>>
>> When we try to 'shutdown' a number of VPSes we often catch the bug.
>> One of the VPSes sticks in the 'in shutdown' state, no related 'qemu'
>> process is present, and the following error appears in the log:
>>
>> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.005+0000:
>> 20438: warning : qemuGetProcessInfo:1460 : cannot parse process status
>> data
>> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000:
>> 20441: error : virFileReadAll:1420 : Failed to open file
>> '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage':
>> No such file or directory
>> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000:
>> 20441: error : virCgroupGetValueStr:844 : Unable to read from
>> '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage':
>> No such file or directory
>> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000:
>> 20441: error : virCgroupGetDomainTotalCpuStats:3319 : unable to get cpu
>> account: Operation not permitted
>> Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000:
>> 20522: warning : qemuDomainObjBeginJobInternal:4862 : Cannot start job
>> (destroy, none) for domain DOMAIN1; current job is (query, none) owned by
>> (20440 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
>> Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000:
>> 20522: error : qemuDomainObjBeginJobInternal:4874 : Timed out during
>> operation: cannot acquire state change lock (held by
>> remoteDispatchConnectGetAllDomainStats)
>>
>> I think only the last line matters.
>> The bug is highly reproducible. We can easily trigger it even by running
>> several 'virsh shutdown' commands one after another in a shell.
>>
>> When we shut down the process connected to the socket, everything becomes
>> OK and the bug is gone.
>>
>> The system used is Gentoo Linux. We tried all recent versions of libvirt
>> (3.4.0, 3.7.0, 3.8.0, 3.9.0, 3.10.0, 4.0.0-rc2 (today's version from
>> git)) and they all have this bug. 3.3.0 works OK.
>>
>
> I don't see anything obvious stats related in the diff between 3.3.0 and
> 3.4.0. We have added reporting of the shutdown reason, but that's just
> parsing one more JSON reply we previously ignored.
>
> Can you try running 'git bisect' to pinpoint the exact commit that
> caused this issue?
I am able to reproduce this issue. I ran bisect and found that the commit
which broke it is aeda1b8c56dc58b0a413acc61bbea938b40499e1:
https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=aeda1b8c56dc58b0a413acc61bbea938b40499e1;hp=ec337aee9b20091d6f9f60b78f210d55f812500b
But it's very unlikely that this commit itself is causing the error; if
anything, it is just exposing a pre-existing bug. That said, if I revert
the commit on top of current HEAD, I can no longer reproduce the issue.
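For reference, a bisect like this can be driven automatically with 'git
bisect run' when the failure is scriptable. The sketch below is only
illustrative: it builds a throwaway repo with a fake regression instead of
the real libvirt tree (where the endpoints would be v3.3.0 as good and
v3.4.0 as bad, with a script reproducing the stuck 'in shutdown' state):

```shell
# Self-contained 'git bisect run' demo in a throwaway repo.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4 5; do
    echo "rev $i" > file
    if [ "$i" -ge 4 ]; then
        echo "BUG" >> file    # the fake regression appears in commit 4
    fi
    git add file
    git commit -q -m "commit $i"
done
git bisect start HEAD HEAD~4         # bad = HEAD, good = first commit
# bisect run: exit 0 = good, non-zero = bad; "bad" here means BUG is present
git bisect run sh -c '! grep -q BUG file'
msg=$(git show -s --format=%s refs/bisect/bad)
echo "first bad commit: $msg"
git bisect reset
```

With a real reproducer, the test script would build libvirtd, attempt the
shutdowns, and exit non-zero when a domain gets stuck.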
Michal
Michal, Ján,
I've got the same results:
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[aeda1b8c56dc58b0a413acc61bbea938b40499e1] qemu: monitor: do not report
error on shutdown
And yes, when I revert it on top of HEAD, the problem is gone.
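For anyone wanting to verify the same workaround, reverting a single
suspect commit on top of HEAD looks like this. Again a minimal sketch in a
throwaway repo rather than the libvirt tree, where the command would be
'git revert aeda1b8c56dc58b0a413acc61bbea938b40499e1':

```shell
# Self-contained demo of reverting one suspect commit on top of HEAD.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name demo
echo "base" > f
git add f
git commit -q -m "base"
echo "suspect change" >> f
git add f
git commit -q -m "suspect"
git revert --no-edit HEAD   # creates a new commit undoing 'suspect'
cat f                       # the file is back to its pre-suspect content
```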