On Fri, Feb 02, 2018 at 02:53:50PM +0100, Viktor Mihajlovski wrote:
On 01.02.2018 21:26, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
>> 2018-02-01 12:54-0500, Luiz Capitulino:
>>>
>>> Libvirt needs to know when a vCPU is halted. To get this information,
>>
>> I don't see why upper level management should care about that, a single
>> bit about halted state that can be incorrect at the time it is processed
>> seems of very limited use.
>
> I don't see why, either.
>
> I'm CCing libvir-list and the people involved in the code that
> added halt state to libvirt domain statistics.
>
I'll try to explain the motivation for the "halted" state exposure and
why it ended up in the libvirt domain stats.
s390 CPUs can be present in a system (e.g. after being hotplugged) but
be offline (disabled) in which case they are not used by the operating
system. In Linux, disabled CPUs show a value of '0' in
/sys/devices/system/cpu/cpu<n>/online.
Higher level management software (on top of libvirt) can take advantage
of knowing whether a guest CPU is online and thus used or not.
Specifically it might not make sense to plug more CPUs if the guest OS
isn't using the CPUs at all.
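
For illustration, the guest-side check described above could be sketched roughly as follows. This is a minimal sketch, not code from libvirt or QEMU: the function name read_cpu_online and its sysfs_cpu_dir parameter are hypothetical, and treating a CPU without an "online" file as online is an assumption (some kernels do not expose that file for CPUs that cannot be taken offline).

```python
import os
import re


def read_cpu_online(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Return {cpu_index: is_online} based on the per-CPU 'online' files.

    Hypothetical helper for illustration only.
    """
    states = {}
    for entry in os.listdir(sysfs_cpu_dir):
        # Only consider cpu<n> directories; skip cpufreq, cpuidle, etc.
        match = re.fullmatch(r"cpu(\d+)", entry)
        if not match:
            continue
        index = int(match.group(1))
        online_path = os.path.join(sysfs_cpu_dir, entry, "online")
        try:
            with open(online_path) as f:
                # '0' means disabled (offline); '1' means online.
                states[index] = f.read().strip() == "1"
        except FileNotFoundError:
            # Assumption: no 'online' file means the CPU cannot be
            # disabled, so it is reported as online.
            states[index] = True
    return states
```

Run inside the guest, read_cpu_online() would return a mapping from CPU index to online state, and a management layer could count the False entries before deciding whether plugging more CPUs makes sense.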

Wasn't this already represented on "vcpu.<n>.state"? Why is
"vcpu.<n>.halted" needed?

A disabled guest CPU is represented as halted in the QEMU object model
and can therefore be identified by the QMP query-cpus command.
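
As a rough illustration of that identification step, the sketch below picks the halted CPUs out of a query-cpus reply. The reply text is made-up sample data (real replies carry additional, arch-specific fields), and the helper name halted_cpus is hypothetical; only the "CPU" and "halted" members of each entry are relied on.

```python
import json

# Hypothetical, trimmed-down query-cpus reply for a two-vCPU guest.
SAMPLE_REPLY = """
{"return": [
    {"CPU": 0, "current": true,  "halted": false, "thread_id": 1001},
    {"CPU": 1, "current": false, "halted": true,  "thread_id": 1002}
]}
"""


def halted_cpus(reply_text):
    """Return the indexes of CPUs reported as halted in a query-cpus reply."""
    reply = json.loads(reply_text)
    return [cpu["CPU"] for cpu in reply["return"] if cpu.get("halted")]
```

On s390 such a list would correspond to disabled CPUs, as described above; on other architectures the same bit may mean something different, so the result should not be interpreted without knowing the guest architecture.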
The initial patch proposal to expose this via virsh vcpuinfo was not
considered to be desirable because there was a concern that legacy
management software might be confused seeing halted vcpus. Therefore the
state information was added to the cpu domain statistics.
One issue we're facing is that the semantics of "halted" are different
between s390 and at least x86. The question might be whether they are
different enough to warrant a specific "disabled" indicator.
From your description, it looks like they are completely
different. On x86, a CPU that is online and in use can be moved
between halted and non-halted state many times a second.
If that's the case, we can probably fix this without breaking
existing code: explicitly documenting the semantics of
"vcpu.<n>.halted" at virConnectGetAllDomainStats() to mean "not
online" (i.e. the s390 semantics, not the x86 one), and making
qemuMonitorGetCpuHalted() s390-specific.
Possibly a better long-term solution is to deprecate
"vcpu.<n>.halted" and make "vcpu.<n>.state" work correctly on
s390.
It would also be interesting to update QEMU QMP documentation to
clarify the arch-specific semantics of "halted".
--
Eduardo