Hi Christian,
I can't answer your question, which is too technical for my humble
knowledge, but I wanted to seize the opportunity to thank you for your
effort in maintaining a Prometheus exporter for libvirt.
I also wanted to talk a bit about the features of your exporter. Maybe
this discussion should be held elsewhere; let me know.
You proposed that the prometheus-community adopt your exporter, which is a
super cool idea. But IMHO, before that, you should expose, or plan to
expose, a few more metrics.
The metric list in the README contains only domain-related metrics.
Other exporters like tinkoff (the one I'm using) expose a bit more; I
know there are metrics about pools at least. Do you plan to include more
metrics in the future (volumes, volume pools, networks...)? I can
understand if you need only domain-related metrics, but I think the other
metrics should be there if this becomes a kind of official exporter for
libvirt.
Thanks again.
Guy Godfroy
On 19/01/2024 at 12:35, Christian Rohmann via Users wrote:
> With the holidays and all, I'm taking the liberty of bumping this post.
> Does anybody have an idea of how to monitor steal time, then?
>
>
> On 21.12.23 17:36, Christian Rohmann wrote:
>> Hey libvirt-users,
>>
>> first allow me to give a little background.
>>
>> We monitor performance metrics of OpenStack Nova VMs using libvirt as
>> hypervisor. We used to run the libvirt prometheus exporter written by
>> zhangjianweibj [1].
>> This exporter, unlike the one from kumina / tinkoff [2], makes
>> use of the DigitalOcean go-libvirt [3], but that should not make much
>> of a difference for my questions.
>> Since the development of that exporter seems to have stalled and we
>> wanted to rework and contribute new features to it, we created a fork
>> [4].
>> After working through the various ideas we had and applying them to
>> the code, we proposed that the prometheus-community adopt the exporter
>> [5] to ensure it is maintained
>> and even to serve as a reference exporter.
>>
>>
>> Now to my actual question ...
>>
>> Libvirt exposes per VCPU stats for domains via [6]. I'd like to be
>> able to export those via the exporter.
>> One important metric to me would be the steal time
>> (vcpu.<num>.delay), to determine if domains are starting to get cut
>> short or even starve
>> on CPU time. Apparently those metrics are not / cannot be exposed anymore
>> since the switch to cgroups v2? Reading [7] or [8], others seem to have
>> run into this.
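
To illustrate what exporting these values would involve, here is a minimal Python sketch that groups flat per-vCPU keys of the form vcpu.<num>.<field> by vCPU number. The helper name group_vcpu_stats and the sample dict are assumptions made up for this example; real values would come from libvirt's domain stats (VIR_DOMAIN_STATS_VCPU), and the delay key may simply be missing on affected hosts, which the sketch tolerates.

```python
import re
from collections import defaultdict

# Matches keys such as "vcpu.0.delay" (the vcpu.<num>.<field> naming used above).
_VCPU_KEY = re.compile(r"^vcpu\.(\d+)\.(\w+)$")


def group_vcpu_stats(stats):
    """Group flat 'vcpu.<num>.<field>' entries into {num: {field: value}}."""
    per_vcpu = defaultdict(dict)
    for key, value in stats.items():
        match = _VCPU_KEY.match(key)
        if match:  # keys without a vCPU number, e.g. "vcpu.current", are skipped
            per_vcpu[int(match.group(1))][match.group(2)] = value
    return dict(per_vcpu)


# Hypothetical sample data, not real output from any host.
sample = {
    "vcpu.current": 2,
    "vcpu.0.time": 1_500_000_000,
    "vcpu.0.delay": 20_000_000,
    "vcpu.1.time": 900_000_000,  # no delay reported for this vCPU
}

for num, fields in sorted(group_vcpu_stats(sample).items()):
    # .get() yields None when no delay value was reported
    print(num, fields.get("time"), fields.get("delay"))
```

An exporter could then emit one sample per vCPU and skip the delay metric whenever the key is absent, rather than reporting a misleading zero.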
>>
>> Is this actually still the case, even for more recent kernels? If so,
>> I am wondering if there is an issue being tracked to implement this
>> functionality?
>> How is the steal time reported to the guest if the hypervisor is
>> unable to export this info?
>>
>> Then there are other approaches like vmtop by DigitalOcean [9],
>> which uses info and metrics available via /proc to determine
>> steal time and other vCPU-based metrics.
>> So it seems the required data is somewhat available from the kernel?
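
For reference, the per-thread file /proc/<pid>/task/<tid>/schedstat holds three numbers: time spent running on a CPU, time spent waiting on a run queue (both in nanoseconds), and the number of timeslices run. A tool can sample the second field for a vCPU thread and compare two readings. Below is a rough Python sketch of that idea. The function names and sample values are made up for illustration, and this is not necessarily how vmtop does it.

```python
def parse_schedstat(text):
    """Parse the content of a schedstat file into (run_ns, wait_ns, timeslices)."""
    run_ns, wait_ns, timeslices = (int(field) for field in text.split()[:3])
    return run_ns, wait_ns, timeslices


def wait_fraction(before, after):
    """Share of run + wait time spent waiting on a run queue between two samples."""
    run_delta = after[0] - before[0]
    wait_delta = after[1] - before[1]
    total = run_delta + wait_delta
    return wait_delta / total if total > 0 else 0.0


# Made-up samples for one vCPU thread. In practice the text would be read
# from /proc/<pid>/task/<tid>/schedstat of the QEMU process at two points in time.
first = parse_schedstat("1000000000 50000000 1200\n")
second = parse_schedstat("1800000000 250000000 2100\n")
print(f"waiting share: {wait_fraction(first, second):.1%}")
```

Working with deltas between two samples rather than absolute values matters here, since the counters accumulate over the lifetime of the thread.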
>>
>>
>> Last but not least, I'd like your opinion on what other key metrics
>> are important to monitor on hypervisors and their guests.
>>
>>
>>
>>
>> Regards
>>
>>
>> Christian
>>
>>
>>
>>
>> [1] https://github.com/zhangjianweibj/prometheus-libvirt-exporter
>> [2] https://github.com/Tinkoff/libvirt-exporter
>> [3] https://github.com/digitalocean/go-libvirt
>> [4] https://github.com/inovex/prometheus-libvirt-exporter
>> [5] https://github.com/prometheus-community/community/issues/50
>> [6] https://libvirt.org/html/libvirt-libvirt-domain.html#VIR_DOMAIN_STATS_VCPU
>> [7] https://bugzilla.redhat.com/show_bug.cgi?id=2015763
>> [8] https://bugzilla.redhat.com/show_bug.cgi?id=1796543
>> [9] https://github.com/digitalocean/vmtop/
>>