Re: How to monitor domains in regards steal time and other important metrics (VIR_DOMAIN_STATS_VCPU) ?

Friday, 19 January 2024

Hi Christian,

I can't answer your question, which is too technical for my humble 
knowledge, but I wanted to take the opportunity to thank you for your 
effort in maintaining a Prometheus exporter for libvirt.

I also wanted to talk a bit about the features of your exporter. Maybe 
this discussion should be held elsewhere; if so, let me know.

You proposed that the prometheus-community adopt your exporter, which is a 
great idea. But IMHO, before that happens you should expose, or plan to 
expose, a few more metrics.

The metric list in the README contains only domain-related metrics. 
Other exporters like tinkoff (the one I'm using) expose a few more; I 
know there are metrics about pools, at least. Do you plan to include more 
metrics in the future (volumes, volume pools, networks...)? I can 
understand if you only need domain-related metrics, but I think the other 
metrics should be there if this becomes a kind of official exporter for 
libvirt.

Thanks again.

Guy Godfroy

On 19/01/2024 at 12:35, Christian Rohmann via Users wrote:
> With the holidays and all, I'm taking the liberty of bumping this post.
> Does anybody have an idea how to monitor steal time, then?
>
>
> On 21.12.23 17:36, Christian Rohmann wrote:
>> Hey libvirt-users,
>>
>> first allow me to give a little background.
>>
>> We monitor performance metrics of OpenStack Nova VMs with libvirt as 
>> the hypervisor. We used to run the libvirt Prometheus exporter written by 
>> zhangjianweibj [1].
>> This exporter, compared to the one from kumina / tinkoff ([2]) makes 
>> use of the DigitalOcean go-libvirt [3], but that should not make much 
>> of a difference for my questions.
>> Since the development of that exporter seems to have stalled and we 
>> wanted to rework and contribute new features to it, we created a fork 
>> [4].
>> After working through the various ideas we had and applying them to 
>> the code, we proposed that the prometheus-community adopt the exporter 
>> [5] to ensure it is maintained
>> and even serves as a reference exporter.
>>
>>
>> Now to my actual question ...
>>
>> Libvirt exposes per-vCPU stats for domains via [6]. I'd like to be 
>> able to export those via the exporter.
>> One important metric to me would be the steal time 
>> (vcpu.<num>.delay), to determine whether domains are starting to get cut 
>> short or even starve
>> on CPU time. Apparently those metrics are not / cannot be exposed anymore 
>> since the switch to cgroups v2? Judging by [7] and [8], others seem to have 
>> run into this as well.
>>
>> Is this actually still the case, even for more recent kernels? If so, 
>> is there an issue tracking the implementation of this 
>> functionality?
>> How is the steal time reported to the guest if the hypervisor is 
>> unable to export this info?
>>
>> Then there are other approaches like vmtop by DigitalOcean [9], 
>> which uses info and metrics available via /proc to determine 
>> steal time and other per-vCPU metrics.
>> So it seems the required data is available from the kernel, at least to some extent?
>>
>>
>> Last but not least, I'd like your opinion on what other key metrics 
>> are important to monitor on hypervisors and their guests.
>>
>>
>>
>>
>> Regards
>>
>>
>> Christian
>>
>>
>>
>>
>> [1] https://github.com/zhangjianweibj/prometheus-libvirt-exporter
>> [2] https://github.com/Tinkoff/libvirt-exporter
>> [3] https://github.com/digitalocean/go-libvirt
>> [4] https://github.com/inovex/prometheus-libvirt-exporter
>> [5] https://github.com/prometheus-community/community/issues/50
>> [6] https://libvirt.org/html/libvirt-libvirt-domain.html#VIR_DOMAIN_STATS_VCPU
>> [7] https://bugzilla.redhat.com/show_bug.cgi?id=2015763
>> [8] https://bugzilla.redhat.com/show_bug.cgi?id=1796543
>> [9] https://github.com/digitalocean/vmtop/
>>
> _______________________________________________
> Users mailing list -- users(a)lists.libvirt.org
> To unsubscribe send an email to users-leave(a)lists.libvirt.org
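
The `vcpu.<num>.delay` field from the `VIR_DOMAIN_STATS_VCPU` group [6] is a counter, so a steal-time rate has to be derived from two samples taken some interval apart. Below is a minimal sketch in Python, assuming the stats arrive as the flat key/value dict that libvirt-python returns per domain and assuming `delay` is a cumulative value in nanoseconds. The helper names `vcpu_delays` and `steal_percent` are hypothetical and not part of any exporter.

```python
import re
from typing import Dict, Mapping

# Matches keys such as "vcpu.0.delay"; other vcpu.* fields are ignored.
_DELAY_KEY = re.compile(r"^vcpu\.(\d+)\.delay$")


def vcpu_delays(stats: Mapping[str, int]) -> Dict[int, int]:
    """Extract per-vCPU delay counters (assumed cumulative ns) from a flat stats dict."""
    delays: Dict[int, int] = {}
    for key, value in stats.items():
        match = _DELAY_KEY.match(key)
        if match:
            delays[int(match.group(1))] = int(value)
    return delays


def steal_percent(prev: Mapping[str, int], cur: Mapping[str, int],
                  interval_s: float) -> Dict[int, float]:
    """Per-vCPU share of wall-clock time spent queued instead of running."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    before, after = vcpu_delays(prev), vcpu_delays(cur)
    interval_ns = interval_s * 1e9
    result: Dict[int, float] = {}
    for vcpu, delay_now in after.items():
        if vcpu not in before:
            continue  # no baseline for this vCPU (e.g. hot-plugged between samples)
        delta = delay_now - before[vcpu]
        # A negative delta means the counter was reset (e.g. the domain restarted).
        result[vcpu] = max(delta, 0) * 100.0 / interval_ns
    return result


if __name__ == "__main__":
    # Two made-up samples taken 10 seconds apart.
    t0 = {"vcpu.current": 2, "vcpu.0.time": 5_000_000_000,
          "vcpu.0.delay": 1_000_000_000, "vcpu.1.delay": 0}
    t1 = {"vcpu.current": 2, "vcpu.0.time": 9_000_000_000,
          "vcpu.0.delay": 1_500_000_000, "vcpu.1.delay": 100_000_000}
    print(steal_percent(t0, t1, 10.0))  # {0: 5.0, 1: 1.0}
```

In a live setup the two dicts could come from libvirt-python's `conn.getAllDomainStats(libvirt.VIR_DOMAIN_STATS_VCPU)`, which yields `(domain, stats)` pairs. Whether `vcpu.<num>.delay` is actually populated on a given host is exactly the open question raised above.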
