Re: [Libvir] Question on acquiring cpuTime in struct _virDomainInfo

23 May 2007

Hi Daniel,

thanks for confirming that I'm on the right track. However, I still
experience problems with a heavily stressed node. Let me first
explain my current node setup:

<xm info>
	release                : 2.6.18-1.2835.slc4xen
	version                : #1 SMP Wed Nov 29 21:05:58 CET 2006
	machine                : i686
	nr_cpus                : 2
	nr_nodes               : 1
	sockets_per_node       : 2
	cores_per_socket       : 1
	threads_per_core       : 1
	cpu_mhz                : 2800
	total_memory           : 2047
	xen_major              : 3
	xen_minor              : 0
	xen_extra              : .3-rc5-1.2835.s
	xen_caps               : xen-3.0-x86_32p
	xen_pagesize           : 4096
</xm info>

<xm vcpu-list>
	Name             ID VCPUs   CPU State   Time(s) CPU Affinity
	Domain-0         0     0     0   r--    9761.7 any cpu
	Domain-0         0     1     1   ---   10571.9 any cpu
	stornode         2     0     1   r--    7287.6 1
	stornode         2     1     1   ---    6473.1 1
	worknode         3     0     0   ---    2139.3 0
	worknode         3     1     0   ---    1368.2 0
	worknode         3     2     0   ---    1223.1 0
	worknode         3     3     0   ---    1349.5 0
</xm vcpu-list>

On every virtual CPU of every domain I'm running a CPU stress tool
called cpuburn. My small sensor then calculates the CPU utilisation
of the whole node: it calculates the CPU utilisation for each domain,
one after another, and then sums the results to get the node value.

In this stress situation it takes about 4 seconds on average to make
the following two function calls, which give me the cpuTime of a
domain:

    dom_old = virDomainLookupByID(conn_old, listOfDomains[i]);
    ret = virDomainGetInfo(dom_old, &info_old);

Here are the stats from my latest measurement:

		old cpuTime		new cpuTime

Domain-0:	3s 4294835190ms		3s 4294849513ms
stornode:	5s 580501ms 		6s 4294550691ms
worknode:	6s 4294546809ms		5s 582761ms

As a result, the CPU utilisation computed for the node comes out at
around 75%, which is much lower than the real value (100%).
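
A side note on the figures above: several of the sub-second fields sit just below 2^32 = 4294967296 (for example 4294835190 = 2^32 - 132106), which is what a negative 32-bit value looks like when printed as unsigned. If they come from subtracting two tv_usec fields without a borrow (this is only a guess from the numbers, nothing in this thread confirms it), a normalising subtraction along these lines would avoid the wrap. The function name timeval_diff is made up for this sketch.

```c
#include <sys/time.h>

/* result = end - start, with result->tv_usec kept in [0, 999999].
 * Assumes end is not earlier than start. */
void timeval_diff(const struct timeval *start,
                  const struct timeval *end,
                  struct timeval *result)
{
    result->tv_sec  = end->tv_sec  - start->tv_sec;
    result->tv_usec = end->tv_usec - start->tv_usec;
    if (result->tv_usec < 0) {
        /* Borrow one second so the microsecond part is non-negative. */
        result->tv_sec  -= 1;
        result->tv_usec += 1000000;
    }
}
```

With hypothetical inputs start = 10 s 900000 us and end = 14 s 767894 us, this gives 3 s 867894 us instead of 4 s and a wrapped microsecond field.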

One solution would be to add the time measured for making those calls
to the used cpuTime. But this in turn can produce values that are too
high, because I don't really know at which point in time the value
is written to the structure.
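
One possible way to reduce that uncertainty, sketched here purely as an assumption and not as something libvirt provides, is to take a timestamp immediately before and immediately after the call and treat the midpoint as the sample time. The estimate can then be off by at most half the call duration. The helper names tv_to_usec and sample_time_usec are hypothetical.

```c
#include <sys/time.h>

/* Convert a timeval to microseconds since the epoch. */
long long tv_to_usec(const struct timeval *tv)
{
    return (long long)tv->tv_sec * 1000000LL + (long long)tv->tv_usec;
}

/* Midpoint between the timestamps taken just before and just after
 * the call; used as an estimate of when cpuTime was actually read. */
long long sample_time_usec(const struct timeval *before,
                           const struct timeval *after)
{
    long long b = tv_to_usec(before);
    long long a = tv_to_usec(after);
    return b + (a - b) / 2;
}
```

With a call that takes 4 seconds the remaining error can still be as large as 2 seconds, so this narrows the problem rather than removing it.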

Nevertheless, xentop always shows me the correct CPU utilisation of
each of my domains, so I conclude that this problem must have
something to do with the libvirt API.

Have you or has anybody else experienced similar issues? Do you know
of any solution to that?

Cheers,

	Jan

On 10.05.2007, at 18:32, Daniel P. Berrange wrote:
...
On Thu, May 10, 2007 at 05:41:33PM +0200, Jan Michael wrote:
...
Hi everyone,
using libvirt I'm trying to calculate cpu utilization of a node in
percent. But sometimes values beyond 100.0% are being calculated.
This is because a domain spends more time on a cpu than has
elapsed in the meantime.
A short explanation of how cpu utilization is computed in my case:
  1. - open two connections with
       conn_cur/conn_old = virConnectOpenReadOnly(NULL);
  2. - get current time
       gettimeofday(&time_old, NULL);
     - get domain by id with
       dom_old = virDomainLookupByID(conn_old, id)
     - get domain information
       virDomainGetInfo(dom_old, &info_old);
  3. - sleep a second
  4. - do the same as in 2., but with _cur
  5. - compute cpu utilization by dividing used cputime by elapsed
       time and multiplying by 100
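
Step 5 above could be sketched in C roughly as follows. It assumes that cpuTime is reported in nanoseconds (as documented for virDomainInfo) and that the wall-clock samples come from gettimeofday(). The helper names elapsed_usec and cpu_percent are made up for this illustration and are not part of libvirt.

```c
#include <sys/time.h>

/* Wall-clock time between two gettimeofday() samples, in microseconds. */
long long elapsed_usec(const struct timeval *t_old,
                       const struct timeval *t_new)
{
    return (long long)(t_new->tv_sec - t_old->tv_sec) * 1000000LL
         + (long long)(t_new->tv_usec - t_old->tv_usec);
}

/* CPU utilisation of one domain in percent.
 * cpu_old_ns and cpu_new_ns are the cpuTime values of the two
 * samples, assumed to be in nanoseconds. */
double cpu_percent(unsigned long long cpu_old_ns,
                   unsigned long long cpu_new_ns,
                   const struct timeval *t_old,
                   const struct timeval *t_new)
{
    long long wall_us = elapsed_usec(t_old, t_new);
    double used_us;

    if (wall_us <= 0 || cpu_new_ns < cpu_old_ns)
        return 0.0;  /* no usable interval between the samples */

    used_us = (double)(cpu_new_ns - cpu_old_ns) / 1000.0;
    return 100.0 * used_us / (double)wall_us;
}
```

For example, a domain whose cpuTime advanced by 0.6 s over a wall-clock interval of 1.2 s yields 50.0.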
Am I right to suppose that cpuTime in the _virDomainInfo structure
is acquired directly from the hypervisor in
virDomainGetInfo(dom_old, &info_old), or is it already present once
the domain itself has been looked up? Is there a better, more precise
way of doing this?
This is the best approach - the algorithm you summarized is basically
the same as I use in virt-manager. The reason it sometimes goes above
100% is just due to timing / scheduler variations:
   1. get timeofday
   2. get cputime for domA
   3. sleep a while
   4. get timeofday
   5. get cputime for domA
We're basically looking at the ratio of 4-1, against 5-2. It would
be 100% accurate if you could guarantee no time elapsed between
steps 1 & 2, or between steps 4 & 5, but there's always some latency
in there, so occasionally you might end up calculating a value that
is a tiny bit over 100%.  In virt-manager I deal with this by simply
rounding down to 100 if this occurs.
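
The rounding-down workaround amounts to a simple clamp. The following is a minimal sketch of the idea, not the actual virt-manager code. The function name clamp_percent and the extra lower bound at 0 are assumptions added for illustration.

```c
/* Cap a computed utilisation at 100 percent. The lower bound at 0 is
 * an extra guard assumed here; the text above only mentions the
 * upper bound. */
double clamp_percent(double pct)
{
    if (pct > 100.0)
        return 100.0;
    if (pct < 0.0)
        return 0.0;
    return pct;
}
```

A value such as 100.4 is reported as 100.0, while anything already within range passes through unchanged.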
Based on the hypercalls which are available to us, I don't see any
way to avoid this scenario. Then again it is not like we really need
millisecond precision in calculating CPU usage so I don't think it's
a problem worth worrying about too much.
...
And another general question:
The monitoring utility of xen, called xentop, also provides
statistics about networking and vbds. Are there any plans to provide
these values through libvirt in the future?
I'd like to see the ability to track network & disk I/O stats.
No one has so far stepped forward to suggest an API or implementation,
but I'd welcome anyone interested in taking a look at this area.
Regards,
Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|