On Fri, 2 Feb 2018 14:53:50 +0100
Viktor Mihajlovski <mihajlov(a)linux.vnet.ibm.com> wrote:
On 01.02.2018 21:26, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
>> 2018-02-01 12:54-0500, Luiz Capitulino:
>>>
>>> Libvirt needs to know when a vCPU is halted. To get this information,
>>
>> I don't see why upper level management should care about that, a single
>> bit about halted state that can be incorrect at the time it is processed
>> seems of very limited use.
>
> I don't see why, either.
>
> I'm CCing libvir-list and the people involved in the code that
> added halt state to libvirt domain statistics.
>
I'll try to explain the motivation for the "halted" state exposure and
why it ended up in the libvirt domain stats.
s390 CPUs can be present in a system (e.g. after being hotplugged) but
be offline (disabled), in which case they are not used by the operating
system. In Linux, disabled CPUs show a value of '0' in
/sys/devices/system/cpu/cpu<n>/online.
If that's all you want, have you considered using the guest agent?
Higher level management software (on top of libvirt) can take advantage
of knowing whether a guest CPU is online and thus used or not.
Specifically it might not make sense to plug more CPUs if the guest OS
isn't using the CPUs at all.
OK, so what's the algorithm used by the higher level management
software where this all fits together? Something like:
1. Hotplug vCPU
2. Poll "halted" state
3. If "halted" becomes true, hotplug more vCPUs
4. If "halted" never becomes true, don't hotplug more CPUs
If that's the case, then I guess grepping for State in
/proc/qemu-pid/threadid/status will have the same end result, no?
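To make the /proc suggestion concrete, here is a minimal sketch of reading the State field for one thread. It assumes Linux, where per-thread status files live under /proc/<pid>/task/<tid>/status, and the helper names are made up for illustration. Note that a sleeping host thread only says the vCPU thread is not running right now; it is not necessarily the same thing as the guest having disabled the CPU.

```python
from pathlib import Path


def parse_thread_state(status_text: str) -> str:
    """Return the one-letter scheduler state from a /proc status dump."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The field looks like "State:\tS (sleeping)"; keep only the letter.
            return line.split()[1]
    raise ValueError("no State: line found")


def read_thread_state(pid: int, tid: int) -> str:
    """Read the state of one thread of a process (Linux only)."""
    path = Path(f"/proc/{pid}/task/{tid}/status")
    return parse_thread_state(path.read_text())
```

The thread id to pass in would be the vCPU thread id of the QEMU process, which has to be obtained separately.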
A disabled guest CPU is represented as halted in the QEMU object model
and can therefore be identified by the QMP query-cpus command.
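As an illustration only, the halted CPUs could be picked out of a query-cpus reply roughly as follows. The sketch assumes the reply has already been received as a JSON string and that each entry carries the "CPU" and "halted" fields; the function name and the sample values are invented for the example.

```python
import json
from typing import List


def halted_cpus(reply_json: str) -> List[int]:
    """Return the indices of all CPUs reported as halted in a query-cpus reply."""
    reply = json.loads(reply_json)
    return [cpu["CPU"] for cpu in reply["return"] if cpu["halted"]]


# Made-up sample reply with two CPUs, the second one halted.
sample_reply = json.dumps({
    "return": [
        {"CPU": 0, "current": True, "halted": False, "thread_id": 3134},
        {"CPU": 1, "current": False, "halted": True, "thread_id": 3135},
    ]
})
print(halted_cpus(sample_reply))
```

As discussed in this thread, "halted" can already be stale by the time the reply is processed, so a result like this is a snapshot rather than a reliable indicator.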
The initial patch proposal to expose this via virsh vcpuinfo was not
considered to be desirable because there was a concern that legacy
management software might be confused seeing halted vcpus. Therefore the
state information was added to the cpu domain statistics.
One issue we're facing is that the semantics of "halted" are different
between s390 and at least x86. The question might be whether they are
different enough to warrant a specific "disabled" indicator.
[...]