Re: [libvirt] [RFC] kvm: x86: export vCPU halted state to sysfs

Friday, 2 February 2018

On Fri, Feb 02, 2018 at 12:15:54PM -0200, Eduardo Habkost wrote:
...
 On Fri, Feb 02, 2018 at 02:53:50PM +0100, Viktor Mihajlovski wrote:
 > On 01.02.2018 21:26, Eduardo Habkost wrote:
 > > On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
 > >> 2018-02-01 12:54-0500, Luiz Capitulino:
 > >>>
 > >>> Libvirt needs to know when a vCPU is halted. To get this information,
 > >>
 > >> I don't see why upper level management should care about that, a single
 > >> bit about halted state that can be incorrect at the time it is processed
 > >> seems of very limited use.
 > > 
 > > I don't see why, either.
 > > 
 > > I'm CCing libvir-list and the people involved in the code that
 > > added halt state to libvirt domain statistics.
 > > 
 > I'll try to explain the motivation for the "halted" state exposure and
 > why it ended up in the libvirt domain stats.
 > 
 > s390 CPUs can be present in a system (e.g. after being hotplugged) but
 > be offline (disabled) in which case they are not used by the operating
 > system. In Linux disabled CPUs show a value of '0' in
 > /sys/devices/system/cpu/cpu<n>/online.
 > 
 > Higher level management software (on top of libvirt) can take advantage
 > of knowing whether a guest CPU is online and thus used or not.
 > Specifically it might not make sense to plug more CPUs if the guest OS
 > isn't using the CPUs at all.
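
For illustration, the guest-side check described above could look roughly
like the sketch below. It is only a sketch: the function name and the
sysfs_root parameter are made up here, and treating a CPU directory
without an "online" file as online is an assumption (such CPUs typically
cannot be taken offline at all).

```python
import os
import tempfile


def guest_cpu_online(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return True if the guest OS reports CPU number `cpu` as online.

    Hypothetical helper: reads <sysfs_root>/cpu<n>/online, where '0'
    means the CPU is present but disabled.
    """
    cpu_dir = os.path.join(sysfs_root, f"cpu{cpu}")
    if not os.path.isdir(cpu_dir):
        raise ValueError(f"cpu{cpu} is not present under {sysfs_root}")
    try:
        with open(os.path.join(cpu_dir, "online")) as f:
            return f.read().strip() == "1"
    except FileNotFoundError:
        # Assumption: no 'online' file means the CPU cannot be
        # disabled, so it is counted as online.
        return True
```

This has to run inside the guest, so it says nothing about what the host
side (QEMU or libvirt) reports for the same CPU.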

 Wasn't this already represented on "vcpu.<n>.state"?  Why is
 "vcpu.<n>.halted" needed?

 > 
 > A disabled guest CPU is represented as halted in the QEMU object model
 > and can therefore be identified by the QMP query-cpus command.
 > 
 > The initial patch proposal to expose this via virsh vcpuinfo was not
 > considered to be desirable because there was a concern that legacy
 > management software might be confused seeing halted vcpus. Therefore the
 > state information was added to the cpu domain statistics.
 > 
 > One issue we're facing is that the semantics of "halted" are different
 > between s390 and at least x86. The question might be whether they are
 > different enough to warrant a specific "disabled" indicator.

 From your description, it looks like they are completely
 different.  On x86, a CPU that is online and in use can be moved
 between halted and non-halted state many times a second.

 If that's the case, we can probably fix this without breaking
 existing code: explicitly documenting the semantics of
 "vcpu.<n>.halted" at virConnectGetAllDomainStats() to mean "not
 online" (i.e. the s390 semantics, not the x86 one), and making
 qemuMonitorGetCpuHalted() s390-specific.

 Possibly a better long-term solution is to deprecate
 "vcpu.<n>.halted" and make "vcpu.<n>.state" work correctly on
 s390.
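
To make the two fields concrete, here is a rough sketch of how a client
might read them out of a per-domain stats dictionary. It works on a plain
dict and does not call libvirt; the function names are invented for this
example, and accepting either a boolean or a "yes"/"no" string for the
halted value is an assumption about how the binding delivers it.

```python
import re

# Matches only the two per-vCPU keys discussed in this thread.
_VCPU_KEY = re.compile(r"^vcpu\.(\d+)\.(state|halted)$")


def group_vcpu_stats(stats):
    """Group 'vcpu.<n>.state' and 'vcpu.<n>.halted' entries by vCPU index."""
    vcpus = {}
    for key, value in stats.items():
        match = _VCPU_KEY.match(key)
        if match is None:
            continue  # skip unrelated stats such as vcpu.current
        index, field = int(match.group(1)), match.group(2)
        vcpus.setdefault(index, {})[field] = value
    return vcpus


def _is_set(value):
    # Assumption: the flag may arrive as a bool or as a yes/no string.
    if isinstance(value, str):
        return value.strip().lower() in ("yes", "true", "1")
    return bool(value)


def halted_vcpus(stats):
    """Return the sorted indexes of vCPUs whose 'halted' flag is set."""
    grouped = group_vcpu_stats(stats)
    return sorted(i for i, v in grouped.items() if _is_set(v.get("halted")))
```

Under the s390 reading proposed above, the indexes returned by
halted_vcpus() would be the vCPUs that are not online. On x86 the same
list would only be a momentary snapshot of idle vCPUs.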

 It would also be interesting to update QEMU QMP documentation to
 clarify the arch-specific semantics of "halted".

And especially clarify the awful performance implications of running
this particular query command. In general I would not expect query-xxx
monitor commands to interrupt all vcpus, so we should clearly warn about
this!

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
