On Thu, Mar 26, 2009 at 05:13:00PM +0900, Matt McCowan wrote:
> On Mon, 23 Mar 2009 13:44:58 +0000
> "Daniel P. Berrange" <berrange(a)redhat.com> wrote:
> > On Sun, Mar 22, 2009 at 07:28:36PM +0900, Matt McCowan wrote:
> > > Running into an issue where, if I/O is hampered by load for example,
> > > reading a largish state file (created by 'virsh save') is not allowed
> > > to complete.
> > > qemudStartVMDaemon in src/qemu_driver.c has a loop that waits 10 seconds
> > > for the VM to be brought up. An strace against libvirt when doing a
> > > 'virsh restore' against a largish state file shows the VM being sent a
> > > kill when it's still happily reading from the file.
>
> My bad. It's not the timeout loop in qemudStartVMDaemon that's killing
> it. It's as you suggested and the code is crapping out in
> qemudReadMonitorOutput, seemingly when poll()ing the console's fd - it
> doesn't get any POLLIN in the 10 secs it waits. (Against latest CVS
> pull)
Hmm, this is the exact scenario I thought we had gotten fixed in upstream
QEMU/KVM.
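For illustration, the failure Matt describes reduces to reading the monitor with a poll() deadline. The sketch below assumes a simplified, hypothetical helper `read_monitor_with_timeout` with made-up return codes; it is not libvirt's actual qemudReadMonitorOutput. If QEMU writes nothing to the monitor before `timeout_ms` expires, poll() returns 0 and the caller gives up, which is the point where a VM still loading its state file gets killed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not libvirt's qemudReadMonitorOutput): wait up to
 * timeout_ms for the monitor fd to become readable, then read once.
 * Returns the byte count, 0 on EOF, -1 on error, or -2 on timeout. */
static ssize_t read_monitor_with_timeout(int fd, char *buf, size_t len,
                                         int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    ssize_t n;

    for (;;) {
        int ret = poll(&pfd, 1, timeout_ms);
        if (ret > 0)
            break;          /* readable or hung up; read() reports which */
        if (ret == 0)
            return -2;      /* no POLLIN before the deadline: the failure
                               reported in this thread */
        if (errno != EINTR)
            return -1;
        /* Interrupted by a signal: retry (this restarts the full timeout). */
    }

    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

A monitor that stays silent for the whole window is indistinguishable here from a dead one, which is why a slow state-file load under host load trips it.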
> > This is a little odd to me - we had previously fixed KVM migration
> > code so that during startup with -incoming, it would correctly
> > respond to monitor commands, explicitly to avoid libvirt timing
> > out in this way. I'm wondering what has broken since then, whether
> > it's libvirt's usage changing, or the KVM implementation changing.
>
> I'm running kvm-83 (QEMU 0.9.1) if that's of any help.
> The state files I have dragged in during testing were generally 4G+ and
> worked without problem. The ones I'm playing with in the production
> environment are <3G, but on a more heavily loaded system with lots of
> snapshotted LVs.
> 'virsh restore' on the other VMs with <2G state files works just fine.
Clearly the monitor console is not responding while it is reading in
the state file, and how long that takes is dependent on host OS load.
As a temporary workaround, the only real option is to increase that
10-second timeout significantly when doing a restore/migrate operation.
In parallel with that, we'll have to look at the KVM code again and
figure out why it's behaving this way.
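A minimal sketch of that workaround, assuming the driver knows whether the VM was started with -incoming for a restore or migration. The helper `monitor_timeout_ms`, the constant names, and the 5-minute value are hypothetical illustrations; only the 10-second default comes from this thread.

```c
#include <stdbool.h>

/* 10 s is the default monitor wait discussed in this thread. */
#define MONITOR_TIMEOUT_MS          (10 * 1000)
/* Hypothetical, illustrative value for VMs started with -incoming. */
#define MONITOR_TIMEOUT_INCOMING_MS (5 * 60 * 1000)

/* Hypothetical helper: pick how long to wait for the monitor, using a much
 * longer limit when the VM is loading a saved state (restore or migration). */
static int monitor_timeout_ms(bool incoming)
{
    return incoming ? MONITOR_TIMEOUT_INCOMING_MS : MONITOR_TIMEOUT_MS;
}
```

A fixed larger limit only postpones the failure under sufficiently heavy host load, which is why it is a temporary workaround rather than a fix.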
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|