
On Fri, Feb 02, 2018 at 12:15:54PM -0200, Eduardo Habkost wrote:
On Fri, Feb 02, 2018 at 02:53:50PM +0100, Viktor Mihajlovski wrote:
On 01.02.2018 21:26, Eduardo Habkost wrote:
On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
2018-02-01 12:54-0500, Luiz Capitulino:
Libvirt needs to know when a vCPU is halted. To get this information,
I don't see why upper level management should care about that. A single bit about halted state that can be incorrect by the time it is processed seems of very limited use.
I don't see why, either.
I'm CCing libvir-list and the people involved in the code that added halt state to libvirt domain statistics.
I'll try to explain the motivation for the "halted" state exposure and why it ended up in the libvirt domain stats.
s390 CPUs can be present in a system (e.g. after being hotplugged) but be offline (disabled), in which case they are not used by the operating system. In Linux, disabled CPUs show a value of '0' in /sys/devices/system/cpu/cpu<n>/online.
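For illustration only, here is a minimal sketch of how the online flag described above could be read from inside a Linux guest. The function name read_cpu_online and its parameter are hypothetical, not part of any existing tool. It also assumes that a CPU directory without an "online" file (as can happen for CPUs that cannot be taken offline) should be counted as online.

```python
import os
import re


def read_cpu_online(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Return {cpu_number: is_online} for every cpu<n> directory found."""
    states = {}
    for entry in os.listdir(sysfs_cpu_dir):
        # Only cpu<n> directories matter; skip entries such as "cpufreq".
        match = re.fullmatch(r"cpu(\d+)", entry)
        if not match:
            continue
        cpu = int(match.group(1))
        online_path = os.path.join(sysfs_cpu_dir, entry, "online")
        try:
            with open(online_path) as f:
                # The file holds '1' for an online CPU and '0' for a disabled one.
                states[cpu] = f.read().strip() == "1"
        except FileNotFoundError:
            # Assumption: no "online" file means the CPU cannot be
            # disabled, so it is treated as online.
            states[cpu] = True
    return states


if __name__ == "__main__":
    for cpu, online in sorted(read_cpu_online().items()):
        print(f"cpu{cpu}: {'online' if online else 'offline'}")
```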
Higher level management software (on top of libvirt) can take advantage of knowing whether a guest CPU is online and thus used or not. Specifically it might not make sense to plug more CPUs if the guest OS isn't using the CPUs at all.
Wasn't this already represented on "vcpu.<n>.state"? Why is "vcpu.<n>.halted" needed?
A disabled guest CPU is represented as halted in the QEMU object model and can therefore be identified by the QMP query-cpus command.
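As a rough sketch of what a consumer of query-cpus sees, the snippet below picks the halted vCPUs out of a reply. The sample reply is made up for illustration and trimmed to a few fields ("CPU", "current", "halted"); it is not captured from a real guest, and the helper name halted_vcpus is hypothetical. Note that the result is only a snapshot: on x86 the flag can change many times per second.

```python
import json

# Illustrative reply only: invented values, trimmed to a few fields.
SAMPLE_REPLY = """
{"return": [
    {"CPU": 0, "current": true,  "halted": false},
    {"CPU": 1, "current": false, "halted": true},
    {"CPU": 2, "current": false, "halted": false}
]}
"""


def halted_vcpus(reply_text):
    """Return the sorted indices of vCPUs reported as halted."""
    reply = json.loads(reply_text)
    return sorted(cpu["CPU"] for cpu in reply["return"] if cpu["halted"])


if __name__ == "__main__":
    print(halted_vcpus(SAMPLE_REPLY))
```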
The initial patch proposal to expose this via virsh vcpuinfo was not considered to be desirable because there was a concern that legacy management software might be confused seeing halted vcpus. Therefore the state information was added to the cpu domain statistics.
One issue we're facing is that the semantics of "halted" are different between s390 and at least x86. The question might be whether they are different enough to warrant a specific "disabled" indicator.
From your description, it looks like they are completely different. On x86, a CPU that is online and in use can be moved between halted and non-halted state many times a second.
If that's the case, we can probably fix this without breaking existing code: explicitly documenting the semantics of "vcpu.<n>.halted" at virConnectGetAllDomainStats() to mean "not online" (i.e. the s390 semantics, not the x86 one), and making qemuMonitorGetCpuHalted() s390-specific.
Possibly a better long-term solution is to deprecate "vcpu.<n>.halted" and make "vcpu.<n>.state" work correctly on s390.
It would also be interesting to update QEMU QMP documentation to clarify the arch-specific semantics of "halted".
And especially clarify the awful performance implications of running this particular query command. In general I would not expect query-xxx monitor commands to interrupt all vcpus, so we should clearly warn about this!

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|