On Mon, Mar 21, 2011 at 06:15:36PM +0800, Hu Tao wrote:
On Fri, Mar 18, 2011 at 11:20:56AM +0000, Daniel P. Berrange wrote:
> On Wed, Mar 16, 2011 at 06:30:20PM +0800, Hu Tao wrote:
> > ---
> > src/qemu/qemu_domain.c | 26 +-----------
> > src/qemu/qemu_monitor.c | 106 ++++++++++++++++++++++++++--------------------
> > src/qemu/qemu_monitor.h | 4 +-
> > 3 files changed, 64 insertions(+), 72 deletions(-)
> >
> > diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> > index 8a2b9cc..e056ef0 100644
> > --- a/src/qemu/qemu_domain.c
> > +++ b/src/qemu/qemu_domain.c
> > @@ -566,7 +566,6 @@ void qemuDomainObjEnterMonitor(virDomainObjPtr obj)
> > qemuDomainObjPrivatePtr priv = obj->privateData;
> >
> > qemuMonitorLock(priv->mon);
> > - qemuMonitorRef(priv->mon);
> > virDomainObjUnlock(obj);
> > }
> >
> > @@ -578,19 +577,9 @@ void qemuDomainObjEnterMonitor(virDomainObjPtr obj)
> > void qemuDomainObjExitMonitor(virDomainObjPtr obj)
> > {
> > qemuDomainObjPrivatePtr priv = obj->privateData;
> > - int refs;
> > -
> > - refs = qemuMonitorUnref(priv->mon);
> > -
> > - if (refs > 0)
> > - qemuMonitorUnlock(priv->mon);
> >
> > + qemuMonitorUnlock(priv->mon);
> > virDomainObjLock(obj);
> > -
> > - if (refs == 0) {
> > - virDomainObjUnref(obj);
> > - priv->mon = NULL;
> > - }
> > }
> >
> >
> > @@ -608,7 +597,6 @@ void qemuDomainObjEnterMonitorWithDriver(struct
qemud_driver *driver,
> > qemuDomainObjPrivatePtr priv = obj->privateData;
> >
> > qemuMonitorLock(priv->mon);
> > - qemuMonitorRef(priv->mon);
> > virDomainObjUnlock(obj);
> > qemuDriverUnlock(driver);
> > }
> > @@ -623,20 +611,10 @@ void qemuDomainObjExitMonitorWithDriver(struct
qemud_driver *driver,
> > virDomainObjPtr obj)
> > {
> > qemuDomainObjPrivatePtr priv = obj->privateData;
> > - int refs;
> > -
> > - refs = qemuMonitorUnref(priv->mon);
> > -
> > - if (refs > 0)
> > - qemuMonitorUnlock(priv->mon);
> >
> > + qemuMonitorUnlock(priv->mon);
> > qemuDriverLock(driver);
> > virDomainObjLock(obj);
> > -
> > - if (refs == 0) {
> > - virDomainObjUnref(obj);
> > - priv->mon = NULL;
> > - }
> > }
>
> I don't think these changes aren't right. The EnterMonitor() method
> is releasing the lock on the virDomainObjPtr. This means that other
> threads (specifically the main I/O thread) can use the virDomainObjPtr
> and its qemuMonitorPtr instance. Incrementing reference count on the
> qemuMonitorPtr is the only thing that prevents it being free()d while
> it is unlocked by this thread.
>
> eg, consider this
>
> In the main thread making a monitor call
>
> qemuDomainObjEnterMonitor(obj)
>
> qemuMonitorSetCPU(mon, 1, 1);
>
> qemuDomainObjExitMonitor(obj)
>
>
> So, at step 2 there, qemuMonitorSetCPU() is asleep on the condition
> variable, waiting for the reply from QEMU.
>
>
> Now QEMU dies for some reason, so in the I/O thread, we have
> qemuMonitorIO() called. This sees the EOF on the socket, so
> it invokes the callback "(eofNotify)(mon, vm, failed);".
>
> This callback is qemuMonitorClose(), which calls qemuMonitorFree()
> which releases the last reference count, which deletes the condition
> variable. The other thread is now waiting on a condition variable
> which has been deleted. It'll either hang forever, or crash
>
> We *must* prevent the monitor being free'd while a thread is running
> a command, hence qemuDomainObjEnterMonitor() must call qemuMonitorRef()
> and qemuDomainObjExitMonitor() must call qemuMonitorUnref().
You're all right except on when to ref the monitor. The monitor may be
freed by another thread before call of qemuDomainObjEnterMonitor(), so
call of qemuMonitorRef() at this time doesn't help. I think the monitor
must have already been refed before a thread starts operating it.
No, that scenario should not be possible. As per src/qemu/THREADS.txt,
the 'virDomainObjPtr' lock *must* be held when making any change to
that object, so if 'vm' is locked, and you have checked virDomainObjIsActive,
then it should not be poissible for ',mon' to be NULL in the call to
qemuDomainObjEnterMonitor.
Consider this sequence:
virDomainObjPtr vm;
qemuDriverLock(driver);
vm = virDomainFindByUUID(&driver->domains, dom->uuid);
qemuDriverUnlock(driver);
/* The mutex on 'vm' is held at this point */
if (qemuDomainObjBeginJob(vm) < 0)
goto cleanup;
/* We know that no other thread has a monitor command active
at this point. */
if (virDomainObjIsActive(vm)) {
qemuReportError(VIR_ERR_OPERATION_INVALID,
"%s", _("domain is not running"));
goto cleanup;
}
/* We know that priv->mon is non-NULL at this point */
qemuDomainObjPrivatePtr priv = vm->privateData;
qemuDomainObjEnterMonitor(vm);
/* 'vm' is now unlocked, but priv->mon cannot be
free()d by the I/O thread, because EnterMonitor()
acquired an extra reference on it */
ret = qemuMonitorGetMemoryStats(priv->mon, stats, nr_stats);
qemuDomainObjExitMonitor(vm);
/* 'vm' is now locked again, but priv->mon may now
be NULL, since ExitMonitor released a reference */
if (qemuDomainObjEndJob(vm) == 0)
vm = NULL;
/* vm may be NULL, becuase EndJob released a reference
on it, that was acquired by BeginJob */
Provided the call to virDomainObjIsActive() is not forgotten,
we can be sure that 'mon' exists in qemuDomainObjEnterMonitor.
The qemuDomainObjEnterMonitor() call, acquires an extra reference
to guarentee that the monitor is never free'd while the
qemuMonitorGetMemoryStats() API is running, even if the I/O
thread releases the primary reference on the monitor.
The qemuDomainObjExitMonitor() call releases the reference that
was acquired by EnterMonitor, so, it is possible that the monitor
is now freed. If a second monitor call is to be made now, the
virDomainObjIsActive API *must* be checked again, before calling
EnterMonitor again.
Regards,
Daniel
--
|:
http://berrange.com -o-
http://www.flickr.com/photos/dberrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org :|
|:
http://autobuild.org -o-
http://search.cpan.org/~danberr/ :|
|:
http://entangle-photo.org -o-
http://live.gnome.org/gtk-vnc :|