[libvirt] qemu-kvm spending time in do_info_migrate() during virDomainSave(); 50ms polling too fast?

...or more particularly, in ram_bytes_remaining() called by do_info_migrate(); oddly, this is much more pronounced when running with <emulator> pointing at a shim prepending -no-kvm-irqchip to the invoked command line. (A sketch of such a shim follows the IRC log below.)

This VM was intended to be paused for the save event (if my software was doing its job correctly), so we shouldn't be spending time running the guest CPU and writing updates for already-once-written blocks.

I'm seeing much more CPU time spent inside qemu-kvm than in the exec'd lzop process compressing and writing the data stream; on attaching gdb and taking some stack traces to sample where execution time was spent, it appeared that we were spending our time responding to requests from the monitor.

The question then -- is the 50ms poll in qemuDomainWaitForMigrationComplete (called from qemudDomainSave) perhaps too frequent?

Thanks!

---

Below is an exchange from IRC:

<nDuff> How often should libvirt be calling "info migrate" during a virDomainSave (of a qemu domain)?
* nDuff is seeing his qemu-kvm spending the bulk of its time inside ram_bytes_remaining() under do_info_migrate().
<DV> nDuff: I doubt libvirt is doing this on its own; something else is asking for the information, I would assume
<nDuff> DV, I'm not running virt-manager or such; the only management layer on top is locally developed, and it only has a single thread that's blocked waiting for the dom.save() call [this is using the Python bindings] to complete.
<DV> interesting
<DV> nDuff -> send this to the list, someone needs to look at it, at least raise the problem; maybe we didn't expect that to be so costly
<DV> nDuff: maybe open a bugzilla
<nDuff> it might be that it's not _usually_ so costly except that I'm hitting a qemu/kvm bug; it only started expressing itself when I added -no-kvm-irqchip to the command line via a shim
<DV> nDuff: I could see how trying to extract this too often could stall the migration process
<nDuff> ...but yes, I'll post to the list.
<DV> nDuff: maybe it's related to the capability to force migration end when the full flush will be shorter than a user-defined limit
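For concreteness, such a shim is nothing more than a wrapper that re-execs the real emulator with the extra flag prepended. Below is a minimal sketch of the approach; the /usr/bin/qemu-kvm path and the details are illustrative, not my exact wrapper.

/* shim.c -- compile this and point <emulator> at the result.  It re-execs
 * the real qemu-kvm with -no-kvm-irqchip prepended to whatever arguments
 * libvirt passed in. */
#include <stdlib.h>
#include <unistd.h>

#define REAL_QEMU "/usr/bin/qemu-kvm"   /* assumed path to the real binary */

int main(int argc, char **argv)
{
    /* argc original entries + 1 inserted flag + 1 terminating NULL */
    char **args = malloc((argc + 2) * sizeof(char *));
    if (args == NULL)
        return EXIT_FAILURE;

    args[0] = REAL_QEMU;
    args[1] = "-no-kvm-irqchip";
    for (int i = 1; i <= argc; i++)     /* copies argv[1..] plus the NULL */
        args[i + 1] = argv[i];

    execv(REAL_QEMU, args);             /* replaces this process on success */
    return EXIT_FAILURE;                /* reached only if execv() failed */
}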

2010/5/3 Charles Duffy <charles@dyfis.net>:
> ...or more particularly, in ram_bytes_remaining() called by do_info_migrate(); oddly, this is much more pronounced when running with <emulator> pointing at a shim prepending -no-kvm-irqchip to the invoked command line.
> This VM was intended to be paused for the save event (if my software was doing its job correctly), so we shouldn't be spending time running the guest CPU and writing updates for already-once-written blocks.
> I'm seeing much more CPU time spent inside qemu-kvm than in the exec'd lzop process compressing and writing the data stream; on attaching gdb and taking some stack traces to sample where execution time was spent, it appeared that we were spending our time responding to requests from the monitor.
> The question then -- is the 50ms poll in qemuDomainWaitForMigrationComplete (called from qemudDomainSave) perhaps too frequent?
> Thanks!
What's the libvirt version?

Maybe the "Poll for migration end every 50ms instead of 50us" [1] commit (part of 0.8.1) fixes your problem, if you're currently using libvirt < 0.8.1.

[1] http://libvirt.org/git/?p=libvirt.git;a=commit;h=e2c059485cf062bf1f906623703...

Matthias

On Mon, May 3, 2010 at 1:48 PM, Matthias Bolte <matthias.bolte@googlemail.com> wrote:
> 2010/5/3 Charles Duffy <charles@dyfis.net>:
>> The question then -- is the 50ms poll in qemuDomainWaitForMigrationComplete (called from qemudDomainSave) perhaps too frequent?
> What's the libvirt version?
> Maybe the "Poll for migration end every 50ms instead of 50us" [1] commit (part of 0.8.1) fixes your problem, if you're currently using libvirt < 0.8.1.
Ahh; this predates 0.8, so that sounds to be precisely on-point.
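For anyone finding this thread later: the difference is on the order of 20,000 "info migrate" queries per second at 50us versus 20 per second at 50ms, which would explain qemu-kvm burning its time in the monitor. Here is a standalone sketch of that kind of unit mix-up; the stubs stand in for libvirt's real job-status check and monitor call, so this is not libvirt's actual code.

/* poll_sketch.c -- illustrates how a microsecond/millisecond mix-up in a
 * nanosleep() interval turns a 20/sec poll loop into a 20,000/sec one. */
#include <stdio.h>
#include <time.h>

static int migration_in_progress(void)   /* stub: pretend 3 polls suffice */
{
    static int remaining = 3;
    return remaining-- > 0;
}

static void query_info_migrate(void)     /* stub for the monitor query */
{
    puts("info migrate");
}

int main(void)
{
    /* Buggy interval: 50 * 1000 ns = 50us between polls:
     *     struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 };
     * Intended interval, as in the 0.8.1 fix: 50ms between polls: */
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };

    while (migration_in_progress()) {
        query_info_migrate();
        nanosleep(&ts, NULL);
    }
    return 0;
}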

On Mon, May 03, 2010 at 02:46:35PM -0500, Charles Duffy wrote:
> On Mon, May 3, 2010 at 1:48 PM, Matthias Bolte <matthias.bolte@googlemail.com> wrote:
>> 2010/5/3 Charles Duffy <charles@dyfis.net>:
>>> The question then -- is the 50ms poll in qemuDomainWaitForMigrationComplete (called from qemudDomainSave) perhaps too frequent?
>> What's the libvirt version?
>> Maybe the "Poll for migration end every 50ms instead of 50us" [1] commit (part of 0.8.1) fixes your problem, if you're currently using libvirt < 0.8.1.

Gahh, I had forgotten about the patch.

> Ahh; this predates 0.8, so that sounds to be precisely on-point.
Actually the fix is in 0.8.1; the bug was added on Feb 3, i.e. probably part of 0.7.6.

Daniel

--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/