
Hi Daniel,

Thanks for confirming that I'm on the right track. But I still have problems with a heavily stressed node. Let me first explain my current node setup:

<xm info>
release          : 2.6.18-1.2835.slc4xen
version          : #1 SMP Wed Nov 29 21:05:58 CET 2006
machine          : i686
nr_cpus          : 2
nr_nodes         : 1
sockets_per_node : 2
cores_per_socket : 1
threads_per_core : 1
cpu_mhz          : 2800
total_memory     : 2047
xen_major        : 3
xen_minor        : 0
xen_extra        : .3-rc5-1.2835.s
xen_caps         : xen-3.0-x86_32p
xen_pagesize     : 4096
</xm info>

<xm vcpu-list>
Name        ID  VCPUs  CPU  State  Time(s)  CPU Affinity
Domain-0     0      0    0  r--     9761.7  any cpu
Domain-0     0      1    1  ---    10571.9  any cpu
stornode     2      0    1  r--     7287.6  1
stornode     2      1    1  ---     6473.1  1
worknode     3      0    0  ---     2139.3  0
worknode     3      1    0  ---     1368.2  0
worknode     3      2    0  ---     1223.1  0
worknode     3      3    0  ---     1349.5  0
</xm vcpu-list>

I'm running a CPU stress tool called cpuburn on every virtual CPU of all domains. My small sensor then calculates the CPU utilisation of the whole node: I calculate the CPU utilisation for each domain, one after another, and then sum up the results to get the node value. In this stress situation it took on average about 4 seconds to make the following two function calls, which provide me with the cpuTime of a domain:

    dom_old = virDomainLookupByID(conn_old, listOfDomains[i]);
    ret = virDomainGetInfo(dom_old, &info_old);

Here are the stats from my latest measurement:

              old cpuTime        new cpuTime
Domain-0:     3s 4294835190ms    3s 4294849513ms
stornode:     5s 580501ms        6s 4294550691ms
worknode:     6s 4294546809ms    5s 582761ms

As a result, the computed CPU utilisation for the node is around 75%, much lower than the real value (100%). One solution would be to add the measured duration of those calls to the used cpuTime. But this in turn can produce values that are too high, because I don't know at which point in time the value is written into the structure.
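The very large millisecond figures above (4294835190, 4294550691, ...) are all just below 2^32 = 4294967296. That pattern suggests, though nothing in this thread confirms it, that a negative microsecond difference (end tv_usec smaller than start tv_usec) was stored in or printed as an unsigned 32-bit value. The C sketch below assumes the timings come from struct timeval values filled by gettimeofday(); the helper names elapsed_us() and node_percent() are hypothetical. node_percent() also shows one possible way (an alternative to summing separately timed per-domain values) to get a node figure: sample all domains within one shared window and divide by window length times physical CPUs (nr_cpus is 2 in the xm info output), so that 100% means every physical CPU is busy.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Signed wall-clock difference (end - start) in microseconds. Keeping the
 * arithmetic in a signed 64-bit type means a negative tv_usec difference is
 * simply subtracted from the seconds part instead of wrapping to a value
 * just below 2^32. */
long long elapsed_us(const struct timeval *start, const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
           + (long long)(end->tv_usec - start->tv_usec);
}

/* Hypothetical helper: node-wide utilisation over ONE shared window.
 * cpu_old_ns[i] / cpu_cur_ns[i] are the cpuTime values (nanoseconds) of
 * domain i at the start and end of the window, window_us is the window
 * length in microseconds and nr_cpus the number of physical CPUs. */
double node_percent(const unsigned long long *cpu_old_ns,
                    const unsigned long long *cpu_cur_ns,
                    size_t ndomains, long long window_us, int nr_cpus)
{
    unsigned long long used_ns = 0;
    size_t i;

    if (window_us <= 0 || nr_cpus <= 0)
        return 0.0;
    for (i = 0; i < ndomains; i++)
        if (cpu_cur_ns[i] > cpu_old_ns[i])   /* skip counters that went back */
            used_ns += cpu_cur_ns[i] - cpu_old_ns[i];
    /* ns -> us, then divide by the CPU capacity of the window */
    return (double)used_ns / 1000.0 / ((double)window_us * nr_cpus) * 100.0;
}
```

With three domains using 1.0 s, 0.5 s and 0.5 s of CPU time during a 1 s window on a 2-CPU node, node_percent() yields 100.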
Nevertheless, xentop always shows me the correct CPU utilisation of each of my domains, so I conclude that this problem must have something to do with the libvirt API. Have you or has anybody else experienced similar issues? Do you know of any solution?

Cheers,
Jan

On 10.05.2007, at 18:32, Daniel P. Berrange wrote:
On Thu, May 10, 2007 at 05:41:33PM +0200, Jan Michael wrote:
Hi everyone,
Using libvirt, I'm trying to calculate the CPU utilization of a node in percent. But sometimes values above 100.0% are calculated. This is because, according to the measurements, a domain spent more time on a CPU than had elapsed in the meantime.
A short explanation of how the CPU utilization is computed in my case:
1. Open two connections with conn_cur/conn_old = virConnectOpenReadOnly(NULL);
2. - Get the current time: gettimeofday(&time_old, NULL);
   - Get the domain by ID: dom_old = virDomainLookupByID(conn_old, id);
   - Get the domain information: virDomainGetInfo(dom_old, &info_old);
3. Sleep for one second.
4. Do the same as in step 2, but with the _cur variables.
5. Compute the CPU utilization by dividing the used CPU time by the elapsed time and multiplying by 100.
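Step 5 can be sketched in C as below. It assumes, as the libvirt documentation states for virDomainInfo, that cpuTime is reported in nanoseconds, while the gettimeofday() interval is in microseconds, so the units must be matched before dividing. domain_cpu_percent() is a hypothetical helper name, and elapsed_us is assumed to be time_cur minus time_old already converted to microseconds.

```c
#include <assert.h>

/* Step 5 for a single domain: (cpuTime delta) / (elapsed wall time) * 100.
 * cpuTime is in nanoseconds, elapsed_us in microseconds, so the CPU-time
 * delta is converted to microseconds first. */
double domain_cpu_percent(unsigned long long cpu_old_ns,
                          unsigned long long cpu_cur_ns,
                          long long elapsed_us)
{
    assert(elapsed_us > 0);
    if (cpu_cur_ns < cpu_old_ns)   /* counter reset: this sketch reports 0 */
        return 0.0;
    return (double)(cpu_cur_ns - cpu_old_ns) / 1000.0
           / (double)elapsed_us * 100.0;
}
```

For example, 0.5 s of CPU time (500000000 ns) over a 1 s interval (1000000 us) gives 50%.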
Am I right to assume that the cpuTime field of the _virDomainInfo structure is acquired directly from the hypervisor in virDomainGetInfo(dom_old, &info_old), or is it already present when the domain itself is looked up? Is there a better, more precise way of doing this?
This is the best approach; the algorithm you summarized is basically the same as the one I use in virt-manager. The reason it sometimes goes above 100% is just timing / scheduler variations:
1. get timeofday
2. get cputime for domA
3. sleep a while
4. get timeofday
5. get cputime for domA
We're basically looking at the ratio of 5-2 (CPU time used) to 4-1 (wall-clock time elapsed). It would be 100% accurate if you could guarantee that no time elapsed between steps 1 & 2, or between steps 4 & 5, but there's always some latency in there, so occasionally you might end up calculating a value that is a tiny bit over 100%. In virt-manager I deal with this by simply rounding down to 100 if this occurs.
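A worked example with hypothetical numbers shows how the overshoot arises: suppose step 2 runs 2 ms after step 1 and step 5 runs 3 ms after step 4. A single-vCPU domain that is busy the whole time then accrues 1.001 s of CPU time between the two reads, while steps 1 and 4 are exactly 1.000 s apart, giving 100.1%. The C sketch below computes the ratio of (5-2) to (4-1) and rounds it down to 100 as described for virt-manager; struct cpu_sample and utilisation_percent() are hypothetical names, and both fields are assumed to be in nanoseconds.

```c
#include <assert.h>
#include <stddef.h>

/* One sample: a wall-clock timestamp (step 1 or 4) and the domain's
 * cpuTime read just after it (step 2 or 5), both in nanoseconds. */
struct cpu_sample {
    unsigned long long wall_ns;
    unsigned long long cpu_ns;
};

/* Ratio of (5 - 2) to (4 - 1) as a percentage. Latency between the paired
 * steps can push it slightly above 100, so it is rounded down to 100. */
double utilisation_percent(const struct cpu_sample *before,
                           const struct cpu_sample *after)
{
    double pct;

    assert(before != NULL && after != NULL);
    assert(after->wall_ns > before->wall_ns);
    if (after->cpu_ns < before->cpu_ns)
        return 0.0;   /* counter went backwards: this sketch reports 0 */
    pct = (double)(after->cpu_ns - before->cpu_ns)
          / (double)(after->wall_ns - before->wall_ns) * 100.0;
    return pct > 100.0 ? 100.0 : pct;
}
```

With the numbers above the raw ratio is 100.1%, which this sketch reports as 100; genuinely partial usage, such as 0.5 s of CPU time over 1 s, passes through unchanged as 50.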
Based on the hypercalls which are available to us, I don't see any way to avoid this scenario. Then again, we don't really need millisecond precision in calculating CPU usage, so I don't think it's a problem worth worrying about too much.
And another general question: Xen's monitoring utility, xentop, also provides statistics about networking and VBDs. Are there any plans to provide these values through libvirt in the future?
I'd like to see the ability to track network & disk I/O stats. No one has so far stepped forward to suggest an API or implementation, but I'd welcome anyone interested in taking a look at this area.
Regards,
Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|