On 01.07.2014 09:09, Francesco Romani wrote:
Hi everyone,
I'd like to discuss possible APIs and plans for new query APIs in libvirt.
I'm one of the oVirt (http://www.ovirt.org) developers, and I write code for VDSM;
VDSM is the node management daemon, which is in charge of, among many other things,
gathering host statistics and per-Domain/VM statistics.
Right now we aim for a number of VMs per node in the (few) hundreds, but we have big
plans to scale much more, and possibly to reach thousands in the not-so-distant future.
At the moment, we use one thread per VM to gather the VM stats (CPU, network, disk),
and this obviously scales poorly.
I think this is your main problem. Why not have only one thread that
would manage a list of domains to query and issue the API calls periodically,
instead of having one thread per domain?
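
To illustrate the single-thread idea, here is a minimal sketch, not actual
VDSM or libvirt code. The Sampler class, the query callable and the 15-second
interval are all made up for illustration; in practice query would wrap
whichever virDomain* calls are needed for one domain.

```python
import threading


class Sampler(object):
    """One thread that periodically polls every registered domain."""

    def __init__(self, query, interval=15.0):
        self._query = query        # hypothetical callable: domain -> stats
        self._interval = interval  # seconds between sampling rounds (assumed)
        self._domains = {}         # name -> domain handle
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self.stats = {}            # name -> last result (or the exception)

    def add(self, name, domain):
        with self._lock:
            self._domains[name] = domain

    def remove(self, name):
        with self._lock:
            self._domains.pop(name, None)
            self.stats.pop(name, None)

    def sample_once(self):
        # Snapshot the list so add/remove never block on a slow query.
        with self._lock:
            snapshot = list(self._domains.items())
        for name, dom in snapshot:
            try:
                result = self._query(dom)
            except Exception as exc:  # one failing domain must not stop the rest
                result = exc
            with self._lock:
                if name in self._domains:  # skip domains removed meanwhile
                    self.stats[name] = result

    def _run(self):
        while not self._stop.is_set():
            self.sample_once()
            self._stop.wait(self._interval)

    def start(self):
        thread = threading.Thread(target=self._run, name="sampler")
        thread.daemon = True
        thread.start()
        return thread

    def stop(self):
        self._stop.set()
```

The thread count stays at one no matter how many domains are registered; the
price is that a single slow query delays the rest of that sampling round.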
This is only made worse by the fact that VDSM is a Python 2.7 application, and
Python 2.x notoriously behaves very badly with threads. We are already working to
improve our code, but I'd like to bring the discussion here and see if and when the
querying API can be improved.
We currently use these APIs for our sampling:
virDomainBlockInfo
virDomainGetInfo
virDomainGetCPUStats
virDomainBlockStats
virDomainBlockStatsFlags
virDomainInterfaceStats
virDomainGetVcpusFlags
virDomainGetMetadata
What we'd like to have is
* asynchronous APIs for querying domain stats
(https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
This would be just awesome. Either a single callback or a different one per call is
fine (let's discuss this!).
Please note that we are much more concerned about thread reduction than about
performance numbers. We have had reports of the number of threads becoming a real
problem, while performance so far is not yet a concern
(https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54)
I'm not a big fan of this approach. I mean, IIRC Python has the Global
Interpreter Lock (GIL), which effectively prevents two threads from running
Python code concurrently. So while in C this would make perfect sense, it
doesn't do so in Python. The callbacks would be called from the event loop,
which, given how frequently you dump the info, will block other threads.
Therefore I'm afraid the approach would not bring any speedup, but rather a
slowdown.
* bulk APIs for querying domain stats
(https://bugzilla.redhat.com/show_bug.cgi?id=1113116)
would be really welcome as well. It is quite independent from the previous bullet
point and would help us greatly with scale.
I think this one looks better. Especially if you consider my suggestion
of having only one thread to serve all domains.
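
To make the shape concrete, here is a rough sketch of what a bulk query could
hand back to that single thread: one call in, one record per domain out. No
such libvirt call is assumed to exist here; emulate_bulk_stats and the getters
mapping are hypothetical names, and the loop only emulates on the client side
what a real bulk API would do inside libvirt in one round trip.

```python
def emulate_bulk_stats(domains, getters):
    """Return {domain name: {stats group: value}} for all domains at once.

    domains maps a name to a domain handle; getters maps a stats group
    name (for example "cpu" or "block") to a hypothetical callable that
    takes a handle and returns that group's stats.
    """
    result = {}
    for name, dom in domains.items():
        record = {}
        for group, getter in getters.items():
            try:
                record[group] = getter(dom)
            except Exception as exc:  # e.g. the domain vanished meanwhile
                record[group] = exc
        result[name] = record
    return result
```

With a return value like this, the sampling thread needs a single call per
round instead of one call per domain and per stats group.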
So, I'd like to discuss whether these additions are (or can be) on the project roadmap,
and, if so, what the API could look like and what the possible timeframe could be.
Of course I'd be happy to provide any further information about VDSM and its workings.
Thoughts very welcome!
Thanks and best regards,
Michal