On Mon, 23 Mar 2009 13:44:58 +0000
"Daniel P. Berrange" <berrange(a)redhat.com> wrote:
> On Sun, Mar 22, 2009 at 07:28:36PM +0900, Matt McCowan wrote:
> > Running into an issue where, if I/O is hampered by load for example,
> > reading a largish state file (created by 'virsh save') is not allowed to
> > complete.
> > qemudStartVMDaemon in src/qemu_driver.c has a loop that waits 10 seconds
> > for the VM to be brought up. An strace against libvirt when doing a
> > 'virsh restore' against a largish state file shows the VM being sent a
> > kill when it's still happily reading from the file.
My bad. It's not the timeout loop in qemudStartVMDaemon that's killing
it. It's as you suggested and the code is crapping out in
qemudReadMonitorOutput, seemingly when poll()ing the console's fd - it
doesn't get any POLLIN in the 10 secs it waits. (Against latest CVS
pull)
> This is a little odd to me - we had previously fixed KVM migration
> code so that during startup with -incoming, it would correctly
> respond to monitor commands, explicitly to avoid libvirt timing
> out in this way. I'm wondering what has broken since then, whether
> it's libvirt's usage changing, or KVM impl changing.
I'm running kvm-83 (QEMU 0.9.1) if that's of any help.
The state files I have dragged in during testing were generally 4G+ and
worked without problem. The ones I'm playing with in the production
environment are <3G, but on a more heavily loaded system with lots of
snapshotted LVs.
'virsh restore' on the other VMs with <2G state files works just fine.
Continuing to look ...
Regards
Matt
SysAdmin
RPS MetOcean
Perth, Western Australia