On 01.07.2014 11:33, Daniel P. Berrange wrote:
On Tue, Jul 01, 2014 at 11:19:04AM +0200, Michal Privoznik wrote:
> On 01.07.2014 09:09, Francesco Romani wrote:
>> Hi everyone,
>>
>> I'd like to discuss possible APIs and plans for new query APIs in libvirt.
>>
>> I'm one of the oVirt (http://www.ovirt.org) developers, and I write code for VDSM;
>> VDSM is the node management daemon, which is in charge, among many other things,
>> of gathering host and per-Domain/VM statistics.
>>
>> Right now we aim for a number of VMs per node in the (few) hundreds, but we have
>> big plans to scale much further, possibly reaching thousands in the not-so-distant
>> future. At the moment we use one thread per VM to gather the VM stats (CPU,
>> network, disk), and this obviously scales poorly.
>
> I think this is your main problem. Why not have only one thread that would
> manage a list of domains to query and issue the APIs periodically, instead of
> having one thread per domain?
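The single-thread polling loop suggested above can be sketched as follows. This is only an illustration of the scheduling idea, not VDSM or libvirt code: `query_stats` is a hypothetical stand-in for the real per-domain calls (virDomainGetInfo, virDomainBlockStats, etc.).

```python
import time

# Hypothetical stand-in for a blocking libvirt stats call.
def query_stats(domain):
    return {"domain": domain, "cpu_time": 0}

def poll_all(domains, interval=1.0, rounds=1):
    """One thread walks the whole domain list once per polling period."""
    results = []
    for _ in range(rounds):
        start = time.monotonic()
        for dom in domains:
            results.append(query_stats(dom))
        elapsed = time.monotonic() - start
        if elapsed < interval:
            # Sleep off whatever is left of the period.
            time.sleep(interval - elapsed)
    return results

stats = poll_all(["vm%d" % i for i in range(5)], interval=0.01)
```

The catch, as discussed below, is that each `query_stats` call blocks for a full round trip, so the period puts a hard cap on how many domains one thread can cover.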
You suffer the round-trip time on every API call if you serialize it all
in a single thread. E.g. if every API call is 50ms and you want to check
once per second, you can only monitor 20 VMs before you take more time than
you have available. This really sucks when the majority of that 50ms is a
sleep in poll() waiting for the RPC response.
Unless you have the bulk query API which will take the RTT only once ;)
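The arithmetic behind both points can be made concrete with a rough cost model. The figures here are illustrative assumptions, not measurements, and the bulk query API is hypothetical:

```python
# Rough cost model for polling N domains once per second.
# Assumed split of the ~50ms call: 45ms round-trip wait (mostly a
# poll() sleep) plus 5ms of libvirtd processing per domain.
RTT = 0.045      # seconds spent waiting on the wire per call
SERVER = 0.005   # seconds of server-side processing per domain
PERIOD = 1.0     # polling interval

def serialized_capacity():
    # One blocking call per domain: the RTT is paid N times.
    per_call = RTT + SERVER
    return int(PERIOD // per_call)

def bulk_time(n_domains):
    # A hypothetical bulk query API pays the RTT once for all domains.
    return RTT + n_domains * SERVER

print(serialized_capacity())   # 20 domains fit in one period
print(bulk_time(100))          # 0.545s for 100 domains in one call
```

Under these assumed numbers, serialized calls cap out at 20 domains per second, while a single bulk call covers 100 domains in about half the period.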
>> This is made only worse by the fact that VDSM is a Python 2.7 application,
>> and Python 2.x is notorious for behaving badly with threads. We are already
>> working to improve our code, but I'd like to bring the discussion here and
>> see if and when the querying API can be improved.
>>
>> We currently use these APIs for our sampling:
>> virDomainBlockInfo
>> virDomainGetInfo
>> virDomainGetCPUStats
>> virDomainBlockStats
>> virDomainBlockStatsFlags
>> virDomainInterfaceStats
>> virDomainGetVcpusFlags
>> virDomainGetMetadata
>>
>> What we'd like to have is
>>
>> * asynchronous APIs for querying domain stats
>>   (https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
>>   This would be just awesome. Either a single callback or a different one per
>>   call is fine (let's discuss this!).
>>   Please note that we are much more concerned about thread reduction than about
>>   performance numbers. We have had reports of the thread count becoming a real
>>   problem, while performance so far is not a concern
>>   (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54).
>
> I'm not a big fan of this approach. I mean, IIRC Python has the Global
> Interpreter Lock, which effectively prevents two threads from running
> concurrently. So while in C this would make perfect sense, it doesn't in
> Python. The callbacks would be called from the event loop, which, given how
> frequently you dump the info, will block other threads. Therefore I'm afraid
> the approach would not bring any speed-up, but rather a slow-down.
I'm not sure I agree with your assessment here. If we consider a single
API call, the time this takes to complete is made up of a number of parts
1. Time to write() the RPC call to the socket
2. Time for libvirtd to process the RPC call
3. Time to recv() the RPC reply from the socket
1. Time to write() the RPC call to the socket
2. Time for libvirtd to process the RPC call
3. Time to recv() the RPC reply from the socket
1. Time to write() the RPC call to the socket
2. Time for libvirtd to process the RPC call
3. Time to recv() the RPC reply from the socket
...and so on..
If the time for item 2 dominates over the time for items 1 & 3 (which
it should really) then the client thread is going to be sleeping in a
poll() for the bulk of the duration of the libvirt API call. If we had
an async API mechanism, then the VDSM time would essentially be consumed
with
1. Time to write() the RPC call to the socket
2. Time to write() the RPC call to the socket
3. Time to write() the RPC call to the socket
4. Time to write() the RPC call to the socket
5. Time to write() the RPC call to the socket
6. Time to write() the RPC call to the socket
7. wait for replies to start arriving
8. Time to recv() the RPC reply from the socket
9. Time to recv() the RPC reply from the socket
10. Time to recv() the RPC reply from the socket
11. Time to recv() the RPC reply from the socket
12. Time to recv() the RPC reply from the socket
13. Time to recv() the RPC reply from the socket
14. Time to recv() the RPC reply from the socket
Well, in the async form you need to account for the time spent in the
callbacks as well:
1. write(serial=1, ...)
2. write(serial=2, ...)
..
7. wait for replies
8. recv(serial=x1, ...) // there's no guarantee on order of replies
9. callback(serial=x1, ...)
10. recv(serial=x2, ...)
11. callback(serial=x2, ....)
And it's the callback times I'm worried about. I'm not saying we should
not add the callback APIs. What I'm really saying is that I have doubts they
will help Python apps. They will definitely help C applications scale,
though.
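The serial matching that both message flows above rely on can be sketched as below. This is only an illustration of the dispatch idea; libvirt's actual RPC framing and client internals are different:

```python
# Minimal sketch of serial-tagged async dispatch: requests are fired
# without waiting, replies may arrive in any order, and a pending table
# maps each reply's serial back to the callback registered for it.
class AsyncClient:
    def __init__(self):
        self._serial = 0
        self._pending = {}   # serial -> callback

    def call(self, payload, callback):
        self._serial += 1
        self._pending[self._serial] = callback
        # A real client would write() the framed request here.
        return self._serial

    def on_reply(self, serial, result):
        # There is no guarantee on the order of replies, so look the
        # callback up by serial rather than by arrival order.
        self._pending.pop(serial)(result)

client = AsyncClient()
seen = []
for name in ("disk", "net", "cpu"):
    client.call(name, lambda r, n=name: seen.append((n, r)))

# Simulate replies arriving out of order (serial 2 answered first).
client.on_reply(2, "net-stats")
client.on_reply(1, "disk-stats")
client.on_reply(3, "cpu-stats")
print(seen)  # [('net', 'net-stats'), ('disk', 'disk-stats'), ('cpu', 'cpu-stats')]
```

Note that the callbacks here run inline in `on_reply`, which is exactly where the event-loop blocking concern applies: while a callback runs, no further replies are dispatched.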
Of course there's a limit to how many outstanding async calls you can
make before the event loop gets 100% busy processing the responses,
but I don't think that makes async calls worthless. Even if we had the
bulk list API calls, async calling would be useful, because it would
let VDSM fire off requests for disk, net, cpu, mem stats in parallel
from a single thread.
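Firing off the disk, net, cpu and mem requests in parallel from a single thread can be sketched with an event loop; this uses modern Python's asyncio purely as an illustration (VDSM at the time was Python 2.7), with a sleep standing in for the RPC round trip:

```python
import asyncio

# Four stats requests in flight at once from one thread: with blocking
# calls the four round-trip waits would add up, here they overlap on a
# single event loop.
async def fetch(kind, rtt=0.05):
    await asyncio.sleep(rtt)   # stand-in for waiting on the RPC reply
    return (kind, "stats")

async def poll_domain():
    return await asyncio.gather(
        *(fetch(k) for k in ("disk", "net", "cpu", "mem")))

results = asyncio.run(poll_domain())
print(results)
```

With the assumed 50ms round trip, the four requests complete in roughly 50ms total instead of roughly 200ms serialized.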
Regards,
Daniel
Michal