[libvirt-users] What does cpu_time returned by virDomainGetCPUStats mean?

Hi, everyone. I found an 'interesting' thing involving virDomainGetCPUStats(). I call it to get the CPU usage consumed by a domain and get back an array of virTypedParameter. My system has 2 cores and the hypervisor returns 1 parameter per CPU, so the contents of the array look like this:

virTypedParameter[0] { .field = "cpu_time" .type = 4 .value.ul = 51640610899 }
virTypedParameter[1] { .field = "cpu_time" .type = 4 .value.ul = 55302820304 }

I thought this value stored the run time of the CPU since last boot. But I found I was wrong, because this value keeps increasing until it wraps and doesn't reset even when the domain is restarted. So, what does this value mean? How can I get the CPU usage of the domain? I found nothing on the API reference doc page :-(. Nothing there explains the meaning of the array of virTypedParameter returned by virDomainGetCPUStats().

On Mon 16 Apr 2012 13:37:24 CEST, Zhihua Che wrote: [...]
How can I get the CPU usage of the domain? [...]
As Daniel told me two days ago, cpuTime shows the absolute CPU time consumed since boot. To get % CPU time, take two readings 'n' seconds apart and calculate the delta between them. -- RaSca Mia Mamma Usa Linux: Nothing is impossible to understand, if you explain it well! rasca@miamammausalinux.org http://www.miamammausalinux.org
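The two-reading approach described above can be sketched in C. This is only an illustration, not libvirt code: the function name cpu_usage_percent and its parameters are made up for this example, and it assumes the readings are cumulative cpu_time values in nanoseconds and that the caller measures the wall-clock interval between them itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of libvirt).
 * t0_ns, t1_ns: two cumulative cpu_time readings, in nanoseconds.
 * interval_ns:  wall-clock time between the readings, in nanoseconds.
 * ncpus:        number of host CPUs the result is normalised against.
 * Returns the usage as a percentage of total host capacity, or -1.0
 * if the inputs cannot be used (zero interval, zero CPUs, or a counter
 * that went backwards between the two readings).
 */
double cpu_usage_percent(uint64_t t0_ns, uint64_t t1_ns,
                         uint64_t interval_ns, unsigned int ncpus)
{
    if (interval_ns == 0 || ncpus == 0 || t1_ns < t0_ns)
        return -1.0;
    return 100.0 * (double)(t1_ns - t0_ns)
           / ((double)interval_ns * (double)ncpus);
}
```

For instance, readings of 0 and 500000000 taken one second apart give 50% on one CPU, or 25% when normalised against a 2-CPU host.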

On 16 April 2012 at 7:56 PM, RaSca <rasca@miamammausalinux.org> wrote:
On Mon 16 Apr 2012 13:37:24 CEST, Zhihua Che wrote: [...]
How can I get the CPU usage of the domain? [...]
As Daniel told me two days ago, cpuTime shows the absolute CPU time consumed since boot. To get % CPU time, take two readings 'n' seconds apart and calculate the delta between them.
I guess so, but what is the precision of the returned CPU time? According to my experiments, I don't think it is 1/HZ as it is on the host machine. Thanks for your reply.

On 04/16/2012 05:37 AM, Zhihua Che wrote:
I thought this value store the run time of the cpu since last boot.
The intent is to store run time of the hypervisor process managing the guest (which therefore is larger than the amount of time that the guest thinks it has been running, since the hypervisor has some overhead). But the API is flexible enough that we can add more statistics, if it proves easy to collect such additional statistics.
But I find I was wrong because this value would increase until it wraps down and doesn't reset even the domain is restarted.
How are you restarting the guest? If it is by rebooting the guest _within the same qemu process_, then no, the numbers won't reset. Based on how the cpuacct cgroup works, the numbers should only wrap when you actually create a new qemu process (actually stop the guest and boot it fresh in a new qemu process, rather than rebooting the guest within the same qemu process). Perhaps we should be improving our XML to track delta usage since a given point in time, and when we detect a domain reboot, update that delta point so that the usage will again appear to be 0; allowing a delta calculation would also let us "track" CPU usage even across domain migration or managedsave/restore.
So, what does this value mean?
How can I get the CPU usage of the domain?
I found nothing on the API reference doc page:-(. No word is related with the meaning of the returned array of virTypedParameter by virDomainGetCPUStats().
I found this: http://libvirt.org/html/libvirt-libvirt.html#VIR_DOMAIN_CPU_STATS_CPUTIME "cpu usage in nanoseconds, as a ullong" and looking at libvirt.h, VIR_DOMAIN_CPU_STATS_CPUTIME maps to the "cpu_time" name of your API call. If that still isn't enough information, could you help out by submitting patches to improve our documentation? -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
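Since the documented unit is nanoseconds, a raw "cpu_time" entry can be turned into seconds by dividing by 1e9. The sketch below shows that lookup and conversion under stated assumptions: struct cpu_stat is a simplified local stand-in invented for this example (it is not the real virTypedParameter type), and stat_seconds is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for one typed parameter; NOT the libvirt type. */
struct cpu_stat {
    const char *field;          /* statistic name, e.g. "cpu_time" */
    unsigned long long value;   /* counter value in nanoseconds */
};

/*
 * Look up the statistic called `name` among `n` entries and return it
 * converted from nanoseconds to seconds, or -1.0 if it is not present.
 */
double stat_seconds(const struct cpu_stat *stats, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(stats[i].field, name) == 0)
            return (double)stats[i].value / 1e9;
    }
    return -1.0;
}
```

Applied to the first value from the original question, 51640610899 ns comes out as roughly 51.6 seconds of CPU time.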

On 17 April 2012 at 4:11 AM, Eric Blake <eblake@redhat.com> wrote:
On 04/16/2012 05:37 AM, Zhihua Che wrote:
I thought this value store the run time of the cpu since last boot.
The intent is to store run time of the hypervisor process managing the guest (which therefore is larger than the amount of time that the guest thinks it has been running, since the hypervisor has some overhead). But the API is flexible enough that we can add more statistics, if it proves easy to collect such additional statistics.
But I find I was wrong because this value would increase until it wraps down and doesn't reset even the domain is restarted.
How are you restarting the guest? If it is by rebooting the guest _within the same qemu process_, then no, the numbers won't reset. Based on how the cpuacct cgroup works, the numbers should only wrap when you actually create a new qemu process (actually stop the guest and boot it fresh in a new qemu process, rather than rebooting the guest within the same qemu process).
After reading your post, I ran the experiment below again. I started the domain by issuing 'start ubuntu-1' and shut it down by issuing 'destroy ubuntu-1'. (BTW, I cannot shut down my domain with 'shutdown ubuntu-1'. I guess something is wrong with my domain image. I hope this didn't affect the experiment.) Here are my two sample values.

start ubuntu-1
cpu0+cpu1 11239925290
cpu0 6034491893
cpu1 5205433307

destroy and start ubuntu-1
cpu0+cpu1 10621566430
cpu0 5403373809
cpu1 5218192621

I can't determine whether these values were reset, because the total is lower than before the restart, while one CPU's value is lower and the other's is higher. These values really confuse me. I checked the kvm process ID through ps, and I'm sure the two runs of the domain were assigned different process IDs. I guess that means they ran as different qemu processes, as you mentioned.

On 04/16/2012 09:15 PM, Zhihua Che wrote:
After reading your post, I did the below experiment again. I started the domain by issuing 'start ubuntu-1' and shut down my domain by issuing 'destroy ubuntu-1'.
That's a forceful shutdown - it removes the virtual power cord, and always works, but may leave your disks in a state that requires fsck to recover.
(BTW, I cannot shut down my domain with 'shutdown ubuntu-1'. I guess something is wrong with my domain image. I hope this didn't affect the experiment.)
shutdown requires the guest to understand an ACPI power request (not all guests do), or to have a guest agent installed (even fewer guests have qemu_ga installed, as it is still quite new). But when it does work, it is nicer, because it lets the guest shut down gracefully.
Here are my two sample values.
start ubuntu-1 cpu0+cpu1 11239925290 cpu0 6034491893
6 seconds active use attributed to cpu0,
cpu1 5205433307
5 seconds on cpu1
destroy and start ubuntu-1 cpu0+cpu1 10621566430 cpu0 5403373809 cpu1 5218192621
5 seconds on each cpu. Yes, the values reset for you, and your testing happened to time the guest with about the same amount of usage time between your two starts and queries. You'll notice much more impressive numbers if you let the guest run for 10 minutes, then query, then restart, then query right away.
These values really confuse me.
They are merely kernel counters of how many nanoseconds of processor time have been attributed to the cgroup owning the qemu process since the cgroup was created. A qemu process is created each time you boot the guest from scratch, and 'virsh destroy' followed by 'virsh start' is indeed sufficient to start the guest from scratch.
I checked the kvm process ID through ps. I'm sure the two runs of the domain were assigned two different process IDs. I guess that means they ran as different qemu processes, as you mentioned.
Technically, a cgroup can own multiple pids, so distinct pids is not always a guarantee of distinct cpuacct numbers. However, libvirt does indeed create a new cgroup for each qemu pid, so in this instance, yes, a different qemu pid should mean that the numbers started from scratch. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 17 April 2012 at 4:11 AM, Eric Blake <eblake@redhat.com> wrote:
If that still isn't enough information, could you help out by submitting patches to improve our documentation?
Yes, I will take a shot at it.

On 17 April 2012 at 4:11 AM, Eric Blake <eblake@redhat.com> wrote:
If that still isn't enough information, could you help out by submitting patches to improve our documentation?
I hope this helps users understand what the virTypedParameters mean. As for me, I hadn't connected the function with macros like VIR_DOMAIN_CPU_STATS_CPUTIME. I don't know which files in the docs directory I should modify. I guess our docs are generated by extracting comments from the source code, so I modified the comments above virDomainGetCPUStats(). Feel free to tell me if I'm wrong.
participants (3)
- Eric Blake
- RaSca
- Zhihua Che