When I edit the domain's config file like this:
=====================
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/test3.img'/>
  <target dev='sdb' bus='scsi'/>
  <address type='drive' controller='0' bus='0' unit='10'/>
</disk>
=====================
Note that the unit is wrong, but libvirt does not check it.
When I start the VM with this wrong config file, libvirtd blocks because
qemu quits unexpectedly. This bug does not happen every time; it has only happened
once on my box.
So I tried using gdb and adding sleep() to trigger this bug. I have posted two patches
to fix two bugs, but there is still another bug, and I have no good way to fix it.
I added sleep() in qemuDomainObjExitMonitorWithDriver():
==============================
int qemuDomainObjExitMonitorWithDriver(struct qemud_driver *driver,
                                       virDomainObjPtr obj)
{
    qemuDomainObjPrivatePtr priv = obj->privateData;
    int refs;
    int debug = 0;

    refs = qemuMonitorUnref(priv->mon);
    if (refs > 0)
        qemuMonitorUnlock(priv->mon);

    /* Note: qemu may quit unexpectedly here, and the monitor will be freed.
     * If that happens, priv->mon will be NULL.
     */
    if (debug)
        sleep(100);

    qemuDriverLock(driver);
    virDomainObjLock(obj);

    if (refs == 0) {
        priv->mon = NULL;
    }
}
==============================
Steps to reproduce this bug:
1. Use gdb to attach to libvirtd, and set a breakpoint in the function
   qemuConnectMonitor().
2. Start a VM.
3. Let libvirtd run until qemuMonitorSetCapabilities() returns.
4. Kill the qemu process.
5. Step into qemuDomainObjExitMonitorWithDriver(), and set debug to 1.
Now qemuDomainObjExitMonitorWithDriver() will sleep for 100 seconds, which makes sure
qemuProcessHandleMonitorEOF() is done before qemuDomainObjExitMonitorWithDriver()
returns.
priv->mon will be NULL after qemuDomainObjExitMonitorWithDriver() returns,
so we must not use it. Unfortunately we still use it, and that causes
libvirtd to crash.
My first fix is to have qemuDomainObjExitMonitorWithDriver() return -1 and
have the caller check the return value, then do some cleanup and return an error.
Unfortunately we may use priv->mon while doing that cleanup. The only way
to avoid this is to add a local variable, set it when qemu quits
unexpectedly, and avoid using priv->mon in the cleanup code.
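
To make the idea concrete, here is a minimal single-threaded sketch of that approach. It is not libvirt code: Monitor, DomainPriv, monitor_new(), monitor_close(), enter_monitor(), exit_monitor() and connect_monitor() are all hypothetical stand-ins, and the qemu_dies flag simulates the EOF handler running inside the unlocked window instead of using a real second thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the monitor and the domain private data. */
typedef struct { int refs; } Monitor;
typedef struct { Monitor *mon; } DomainPriv;

static Monitor *monitor_new(void)
{
    Monitor *mon = malloc(sizeof(*mon));
    if (mon)
        mon->refs = 1;              /* the owner's reference */
    return mon;
}

/* Drop one reference; free the monitor when the count reaches zero. */
static int monitor_unref(Monitor *mon)
{
    int refs = --mon->refs;
    if (refs == 0)
        free(mon);
    return refs;
}

/* Drop the owner's reference, as an EOF handler or error cleanup would. */
static void monitor_close(DomainPriv *priv)
{
    if (priv->mon && monitor_unref(priv->mon) == 0)
        priv->mon = NULL;
}

static void enter_monitor(DomainPriv *priv)
{
    assert(priv->mon != NULL);
    priv->mon->refs++;
}

/* Returns 0 if the monitor is still usable, -1 if it is gone. */
static int exit_monitor(DomainPriv *priv, int qemu_dies)
{
    assert(priv->mon != NULL);
    if (monitor_unref(priv->mon) == 0) {
        priv->mon = NULL;
        return -1;
    }
    /* Unlocked window: simulate qemu dying and the EOF handler running. */
    if (qemu_dies)
        monitor_close(priv);
    return priv->mon ? 0 : -1;
}

static int connect_monitor(DomainPriv *priv, int qemu_dies, int later_step_fails)
{
    int monitor_gone = 0;           /* the local variable described above */

    enter_monitor(priv);
    /* ... monitor commands would run here ... */
    if (exit_monitor(priv, qemu_dies) < 0) {
        monitor_gone = 1;
        goto error;
    }
    if (later_step_fails)
        goto error;
    return 0;

error:
    /* Cleanup must not touch priv->mon once the monitor is gone. */
    if (!monitor_gone)
        monitor_close(priv);
    return -1;
}
```

With a fresh monitor, connect_monitor(&priv, 1, 0) returns -1 and leaves priv.mon NULL without the cleanup path ever dereferencing it, while connect_monitor(&priv, 0, 1) fails in a later step and can still close the monitor normally. The cost is the one described above: every caller needs its own flag and its own guarded cleanup.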
Is there a simpler way to fix this bug?