At 03/31/2011 03:59 AM, Eric Blake Write:
On 03/29/2011 04:17 AM, Wen Congyang wrote:
> When I edit the domain's config file like this:
> =====================
> <disk type='file' device='disk'>
> <driver name='qemu' type='qcow2'/>
> <source file='/var/lib/libvirt/images/test3.img'/>
> <target dev='sdb' bus='scsi'/>
> <address type='drive' controller='0' bus='0'
unit='10'/>
> </disk>
> =====================
>
> Note, the unit is wrong, but libvirt does not check it.
> When I start the vm with the wrong config file, libvirtd will be blocked because
> qemu quited unexpectedly. This bug does not happen every time, and it only happened
> once on my box.
>
> So I try to use gdb and add sleep() to trigger this bug. I have posted two patches
> to fix 2 bugs. But there is still another bug, and I have no good way to fix it.
Did you find a way to work around this yet? The typical solution is to
temporarily add another reference to an object that you intend to still
use after regaining locks, so that even if the qemu process dies in the
meantime, it does not free the last reference and therefore does not
delete the object in its cleanup code.
>
> Steps to reproduce this bug:
> 1. use gdb to attach libvirtd, and set a breakpoint in the function
> qemuConnectMonitor()
> 2. start a vm
> 3. let the libvirtd to run until qemuMonitorSetCapabilities() returns.
> 4. kill the qemu process
> 5. step into qemuDomainObjExitMonitorWithDriver(), and set debug to 1
>
> Now, qemuDomainObjExitMonitorWithDriver() will sleep 100s to make sure
> qemuProcessHandleMonitorEOF() is done before qemuProcessHandleMonitorEOF()
> returns.
>
> priv->mon will be null after qemuDomainObjExitMonitorWithDriver() returns.
> So we must not use it. Unfortunately we still use it, and it will cause
> libvirtd crashed.
Sounds like qemuConnectMonitor needs an extra reference around priv->mon
for the duration of the connect attempt, so that
qemuProcessHandleMonitorEOF will not free the monitor.
No, qemuConnectMonitor() calls qemuDomainObjEnterMonitorWithDriver() to hold
an extra reference around priv->mon, and release it in
qemuDomainObjExitMonitorWithDriver().
But qemuDomainObjExitMonitorWithDriver() does not tell the caller whether
priv->mon can be used.
If we will call qemuDomainObjEnterMonitorWithDriver()/qemuDomainObjEnterMonitor(),
I think we must check whether the domain is active. If we call them more than once,
we must check it every time. And we should not do other things between checking whether
the domain is active and calling them(If we do as this, the code can be maintained
easily)
But some codes does not respect this rule.
Here is the list of the functions that may have the similar problem:
1. qemu_migration.c
doNativeMigrate()
qemuMigrationToFile()
2. qemu_process.c
qemuProcessWaitForMonitor()
qemuProcessDetectVcpuPIDs()
qemuProcessInitPasswords()
qemuProcessInitPCIAddresses()
qemuProcessStartCPUs()
qemuProcessStopCPUs()
qemuProcessStart()
3. qemu_hotplug.c
qemuDomainChangeEjectableMedia()
qemuDomainAttachPciDiskDevice()
qemuDomainAttachPciControllerDevice
qemuDomainAttachSCSIDisk()
qemuDomainAttachUsbMassstorageDevice()
qemuDomainAttachNetDevice()
qemuDomainAttachHostPciDevice()
qemuDomainAttachHostUsbDevice()
qemuDomainDetachPciDiskDevice()
qemuDomainDetachDiskDevice()
qemuDomainDetachPciControllerDevice()
qemuDomainDetachNetDevice()
qemuDomainDetachHostPciDevice()
qemuDomainDetachHostUsbDevice()
qemuDomainChangeGraphicsPasswords()
4. qemu_driver.c
qemudDomainHotplugVcpus()
qemudDomainDumpXML()
qemuDomainSnapshotCreateActive()
qemuDomainMonitorCommand()
If everyont agrees this rule, I will fix them.
Thanks.
Wen Congyang