Re: [libvirt] [RFC][scale] new API for querying domains stats

Tuesday, 1 July 2014

On 01.07.2014 09:09, Francesco Romani wrote:
...
 Hi everyone,

 I'd like to discuss possible APIs and plans for new query APIs in libvirt.

 I'm one of the oVirt (http://www.ovirt.org) developers, and I write code for VDSM;
 VDSM is the node management daemon, which is in charge of, among many other things,
 gathering the host statistics and the per-Domain/VM statistics.

 Right now we aim for a number of VMs per node in the (few) hundreds, but we have big plans
 to scale much more, and to possibly reach thousands in the not so distant future.
 At the moment, we use one thread per VM to gather the VM stats (CPU, network, disk),
 and of course this scales poorly.

I think this is your main problem. Why not have only one thread that
would manage the list of domains to query and issue the API calls
periodically, instead of having one thread per domain?
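
To make the suggestion concrete, here is a minimal sketch of such a single sampling thread. The class name, the `sample_fn` callable and the interval are all made up for illustration; with the libvirt Python bindings `sample_fn` could be something like `lambda dom: dom.info()`, but the sketch itself does not depend on libvirt.

```python
import threading


class StatsSampler(object):
    """Poll all registered domains from ONE thread (hypothetical helper)."""

    def __init__(self, sample_fn, interval=15.0):
        self._sample_fn = sample_fn   # assumed: callable(dom) -> stats
        self._interval = interval     # seconds between sampling rounds
        self._domains = {}            # name -> domain object
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self.latest = {}              # name -> most recent stats
        self.errors = {}              # name -> last exception, if any

    def add(self, name, dom):
        with self._lock:
            self._domains[name] = dom

    def remove(self, name):
        with self._lock:
            self._domains.pop(name, None)
            self.latest.pop(name, None)
            self.errors.pop(name, None)

    def sample_once(self):
        # Copy the list so domains can be added/removed while we iterate.
        with self._lock:
            domains = list(self._domains.items())
        for name, dom in domains:
            try:
                self.latest[name] = self._sample_fn(dom)
                self.errors.pop(name, None)
            except Exception as exc:
                # One failing domain must not stop the whole round.
                self.errors[name] = exc

    def run(self):
        # Body of the single sampling thread.
        while not self._stop.is_set():
            self.sample_once()
            self._stop.wait(self._interval)

    def stop(self):
        self._stop.set()
```

You would start it with `threading.Thread(target=sampler.run)` and register domains as they come and go. Note that a slow or stuck domain delays everybody behind it in the round, so this is only the general shape, not a complete design.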

...

 This is made only worse by the fact that VDSM is a Python 2.7 application, and Python 2.x
 notoriously behaves very badly with threads. We are already working to improve our code,
 but I'd like to bring the discussion here and see if and when the querying API can be improved.

 We currently use these APIs for our sampling:
    virDomainBlockInfo
    virDomainGetInfo
    virDomainGetCPUStats
    virDomainBlockStats
    virDomainBlockStatsFlags
    virDomainInterfaceStats
    virDomainGetVcpusFlags
    virDomainGetMetadata

 What we'd like to have is

 * asynchronous APIs for querying domain stats (https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
    This would be just awesome. Either a single callback or a different one per call is fine
    (let's discuss this!).
    Please note that we are much more concerned about thread reduction than about performance
    numbers. We have had reports of the thread count becoming a real problem, while performance
    so far is not yet a concern (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54).

I'm not a big fan of this approach. I mean, IIRC Python has the Global
Interpreter Lock (GIL), which effectively prevents two threads from running
concurrently. So while in C this would make perfect sense, it doesn't in Python.
The callbacks would be called from the event loop, which, given how
frequently you dump the info, will block other threads. Therefore I'm
afraid the approach would not bring any speed-up, but rather a slowdown.

...

 * bulk APIs for querying domain stats (https://bugzilla.redhat.com/show_bug.cgi?id=1113116)
    would be really welcome as well. It is quite independent from the previous bullet point
    and would help us greatly with scale.

I think this one looks better, especially if you consider my suggestion
of having only one thread to serve all domains.
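
Just to have something to discuss, here is a rough sketch of the result shape I would imagine for a bulk call: one record per domain, with the stats grouped by kind. The function name, the `collectors` argument and the record layout are purely hypothetical, and this is only a client-side stand-in. The real gain would come from doing the loop inside the daemon, so that the client pays for one round trip instead of one per domain per API.

```python
def get_all_domain_stats(domains, collectors):
    """Client-side stand-in for a bulk stats query (hypothetical).

    domains:    dict mapping domain name -> domain object
    collectors: dict mapping stats group name -> callable(dom) -> dict
    Returns a list with one record (dict) per domain, sorted by name.
    """
    records = []
    for name, dom in sorted(domains.items()):
        record = {"name": name}
        for group, collect in collectors.items():
            try:
                record[group] = collect(dom)
            except Exception:
                # The domain may have disappeared meanwhile; report the
                # group as unavailable instead of failing the whole call.
                record[group] = None
        records.append(record)
    return records
```

The single sampling thread would then make one such call per interval and distribute the records, instead of walking the domains itself.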

...

 So, I'd like to discuss if these additions are (or can be) in the project roadmap,
 and, if so, what the API could look like and what the possible timeframe could be.
 Of course I'd be happy to provide any further information about VDSM and its workings.

 Thoughts very welcome!

 Thanks and best regards,

Michal
