"Daniel P. Berrange" <berrange(a)redhat.com> wrote:
The recent refactoring of the QEMU startup process now reads the
monitor
TTY from the logfile. Unfortunately in this refactoring we lost the check
for the 'ret == 0' scenario in the read() return value. So if QEMU quits
at startup, eg due to missing disk image, we loop forever on read() == 0
because we hit end-of-file and QEMU has quit.
This patch adds back handling for this scenario, and takes care to
propagate the contents of the log to the user as an error message
# start demo
libvir: QEMU error : internal error unable to start guest: qemu: could
not open disk image /home/berrange/Fedora-9-i686-Live.iso
error: Failed to start domain demo
In addition, there were a couple of other bugs
- a memory leak where we set the 'monitorpath' variable, even
though we'd just set it moments before.
- a missing check for whether the driver VNC password was present
when initializing passwords at VM startupo
- missing initialization of the monitor_watch field, and missing
checking for whether the watch was set before removing it.
- a gratuitous LOG_INFO when shutting down any VM, which could
just be LOG_DEBUG.
FYI, I reproduced the infloop with the following:
cat <<\EOF > d.xml
<domain type='qemu'>
<name>D</name>
<uuid>c7a5fdbd-cdaf-9455-926a-d65c16db1809</uuid>
<memory>219200</memory>
<currentMemory>219200</currentMemory>
<vcpu>2</vcpu>
<os>
<type arch='i686' machine='pc'>hvm</type>
<boot dev='cdrom'/>
</os>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type='file' device='cdrom'>
<source file='no-such-file'/>
<target dev='hdc'/>
<readonly/>
</disk>
<disk type='file' device='disk'>
<source file='no-such-file'/>
<target dev='hda'/>
</disk>
<graphics type='vnc' port='-1'/>
</devices>
</domain>
EOF
Before, this use of virsh would hang:
qemud/libvirtd &
src/virsh -c qemu:///session "define d.xml; start D"
Now, it fails with a diagnostic, as you'd expect.
Somehow, while testing this, I got numerous segfaults
from libvirtd (due to dereferencing NULL doms->objs[i]->def,
but those should never be NULL), yet when I went to set up a
reproducer, it stopped happening altogether.
So not only does the infloop-fix look like it does the right thing,
I confirmed it with the above. The rest looks fine, too.
Once we have something like my unix_sock_dir patch or an improved testing
framework, I'll add a test like the above. Or maybe you know how to
reproduce the infloop using the test driver?
ACK.