On 07/28/2011 03:13 AM, Wen Congyang wrote:
Current, we support run sync job and async job at the same time. It
means that the
monitor commands for two jobs can be run in any order.
In the function qemuDomainObjEnterMonitorInternal():
if (priv->job.active == QEMU_JOB_NONE&& priv->job.asyncJob) {
if (qemuDomainObjBeginNestedJob(driver, obj)< 0)
We check whether the caller is an async job by priv->job.active and
priv->job.asynJob. But when an async job is running, priv->job.active
is QEMU_JOB_NONE if no sync job is running, or priv->job.active is not
QEMU_JOB_NONE if a sync job is running. So we cannot check whether the
caller is an async job in the function qemuDomainObjEnterMonitorInternal().
Unfortunately, if sync job and async job are running at the same time, we may
try to send monitor command at the same time in two threads. It's very
dangerous, and it will cause libvirtd crashed.
Thanks for the analysis. I think you are spot on as to the problem's
root cause - we are not properly serializing access to the monitor when
mixing a sync and an async job.
However, I'm not yet convinced that adding yet another new condvar is
the solution. Rather, your idea of adding
qemuDomainObjEnterMonitorWithDriverAsync() might be the way to go. And
as for existing functions that can be used either by the async job or by
a sync job (qemuProcessStopCPUs), I think that merely means that
qemuDomainObjEnterMonitorWithDriverAsync is passed either
QEMU_ASYNC_JOB_NONE (sync job request) or the current async job, and
that qemuDomainObjEnterMonitorWithDriver becomes shorthand for
qemuDomainObjEnterMonitorWithDriver(,QEMU_ASYNC_JOB_NONE).
I'll try a patch along those lines, but may end up going with yours
after all.
--
Eric Blake eblake(a)redhat.com +1-801-349-2682
Libvirt virtualization library
http://libvirt.org