...or more particularly, in ram_bytes_remaining() called by do_info_migrate();
oddly, this is much more pronounced when running with <emulator> pointing at a
shim prepending -no-kvm-irqchip to the invoked command line.
This VM was intended to be paused for the save event (if my software was doing
its job correctly), so we shouldn't be spending time running the guest CPU and
writing updates for already-once-written blocks.
I'm seeing much more CPU time spent inside qemu-kvm than in the exec'd lzop
process compressing and writing the data stream. On attaching gdb and taking a
few stack traces to sample where execution time was going, it appeared that
qemu-kvm was spending its time responding to requests from the monitor.
The question then -- is the 50ms poll in qemuDomainWaitForMigrationComplete
(called from qemudDomainSave) perhaps too frequent?
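
For a rough sense of scale, here is a back-of-the-envelope sketch in plain
Python (not libvirt code) counting how many "info migrate" requests a fixed
50ms poll would issue over one save, compared with a poll that backs off. The
60-second save duration, the doubling factor and the 1-second cap are
assumptions picked only for illustration, and count_polls is a hypothetical
helper, not anything in libvirt.

```python
def count_polls(duration_ms, first_interval_ms, factor=1.0, max_interval_ms=None):
    """Count the status polls that fit into a save lasting duration_ms.

    factor=1.0 models a fixed-interval poll; factor > 1.0 models a poll
    whose interval grows after each request, optionally capped.
    """
    polls = 0
    elapsed = 0.0
    interval = float(first_interval_ms)
    while elapsed + interval <= duration_ms:
        elapsed += interval      # sleep for the current interval
        polls += 1               # then issue one monitor request
        interval *= factor       # grow the interval (no-op when factor is 1.0)
        if max_interval_ms is not None:
            interval = min(interval, max_interval_ms)
    return polls


# Assumed 60-second save; both figures depend entirely on that assumption.
fixed = count_polls(60_000, 50)
backoff = count_polls(60_000, 50, factor=2.0, max_interval_ms=1000)
print("fixed 50ms poll:", fixed, "requests")
print("backoff to 1s:  ", backoff, "requests")
```

This only counts requests. It says nothing about what each one costs inside
qemu, which is the part I can't explain.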
Thanks!
---
Below is an exchange from IRC:
<nDuff> How often should libvirt be calling "info migrate" during a
virDomainSave (of a qemu domain)?
* nDuff is seeing his qemu-kvm spending the bulk of its time inside
ram_bytes_remaining() under do_info_migrate().
<DV> nDuff: I doubt libvirt is doing this on his own, something else is asking
for the information I would assume
<nDuff> DV, I'm not running virt-manager or such; the only management layer on
top is locally developed, and it only has a single thread that's blocked waiting
for the dom.save() call [this is using the Python bindings] to complete.
<DV> interesting
<DV> nDuff -> send this to the list, someone need to look at it, at least raise
the problem, maybe we didn't expected that to be so costly
<DV> nDuff: maybe open a bugzilla
<nDuff> it might be that it's not _usually_ so costly except that I'm hitting a
qemu/kvm bug; it only started expressing itself when I added -no-kvm-irqchip to
the commandline via a shim
<DV> nDuff: I could see how trying to extract this too often could stall the
migration process
<nDuff> ...but yes, I'll post to the list.
<DV> nDuff: maybe it's related to the capability to force migration end when the
full flush will be shorter than a user defined limit