On Tue, Nov 16, 2010 at 08:09:35AM -0500, Stefan Berger wrote:
On 11/10/2010 07:43 AM, Stefan Berger wrote:
>On 11/10/2010 05:41 AM, Daniel P. Berrange wrote:
>>On Tue, Nov 09, 2010 at 07:43:23PM -0500, Stefan Berger wrote:
>>>I am replacing the last instances of close() I found with VIR_CLOSE() /
>>>VIR_FORCE_CLOSE respectively.
>>>
>>>The first patch of virsh I missed out on previously.
>>>
>>>The 2nd patch I had left out intentionally to look at it more
>>>carefully:
>>>The 'closed' variable could be easily removed since it wasn't
used
>>>anywhere else. The possible race condition that could result from the
>>>filedescriptor being closed and not set to -1 (and possibly let us
>>>write
>>>into 'something' totally different if the fd was allocated by
another
>>>thread) seems to be prevented by the qemuMonitorLock() already placed
>>>around the code that reads from or writes to the fd. So the change of
>>>this code as shown in the patch should not have any side-effects.
>>>
>>>Signed-off-by: Stefan Berger<stefanb(a)us.ibm.com>
>>>--- libvirt-acl.orig/src/qemu/qemu_monitor.c
>>>+++ libvirt-acl/src/qemu/qemu_monitor.c
>>>@@ -73,8 +73,6 @@ struct _qemuMonitor {
>>>
>>> /* If the monitor EOF callback is currently active (stops more
>>>commands being run) */
>>> unsigned eofcb: 1;
>>>- /* If the monitor is in process of shutting down */
>>>- unsigned closed: 1;
>>>
>>> unsigned json: 1;
>>> };
>>>@@ -692,17 +690,11 @@ void qemuMonitorClose(qemuMonitorPtr mon
>>> VIR_DEBUG("mon=%p", mon);
>>>
>>> qemuMonitorLock(mon);
>>>- if (!mon->closed) {
>>>+
>>>+ if (mon->fd>= 0) {
>>> if (mon->watch)
>>> virEventRemoveHandle(mon->watch);
>>>- if (mon->fd != -1)
>>>- close(mon->fd);
>>>- /* NB: ordinarily one might immediately set mon->watch to -1
>>>- * and mon->fd to -1, but there may be a callback active
>>>- * that is still relying on these fields being valid. So
>>>- * we merely close them, but not clear their values and
>>>- * use this explicit 'closed' flag to track this state */
>>>- mon->closed = 1;
>>>+ VIR_FORCE_CLOSE(mon->fd);
>>> }
>>Err, the comment you deleted here explains why this change is not
>>safe.
>
>If I looked at the code correctly then this here is the callback:
>
>static void
>qemuMonitorIO(int watch, int fd, int events, void *opaque) {
> qemuMonitorPtr mon = opaque;
> int quit = 0, failed = 0;
>
> qemuMonitorLock(mon);
> qemuMonitorRef(mon);
>#if DEBUG_IO
> VIR_DEBUG("Monitor %p I/O on watch %d fd %d events %d", mon, watch, f
>#endif
>
> if (mon->fd != fd || mon->watch != watch) {
> VIR_ERROR(_("event from unexpected fd %d!=%d / watch %d!=%d"), mo
> failed = 1;
> } else {
>[...]
>
>
>It grabs the lock and then subsequently, with the lock held, calls
>into qemuMonitorIOWrite and qemuMonitorIORead (not shown, this is done
>in the [...]), which then access the mon->fd. Now the above
>VIR_FORCE_CLOSE() function is also being called with the lock held,
>thus serializes the access to the file descriptor (mon->fd). When the
>'if (mon->fd != fd)' is evaluate (while the lock is held) after the fd
>was closed and now set to -1, then we won't read from a non-open file
>descriptor anymore, which would otherwise occur in the else branch.
>Otherwise, if fd->mon != -1, it would do the reads and writes as
>intended and the above close could not happen while it is doing that.
>
>When I made the above change I first introduced another lock
>'mon->monFDLock' to serialize the access to the fd, but then I saw
>that the qemuMonitorLock() was already doing that as well, so I ended
>up removing it again entirely. Well, to me this was convincing so that
>the above changes seemed 'safe'. Actually, I think the previous code
>could write and read to a file descriptor that doesn't communicate
>with the monitor anymore but may belong to another thread now (since
>the mon->fd wasn't set to -1, but kept as-is). So, to me this is
>actually an improvement.
>
Did I convince you?
Yes & no. On the surface it looks ok, but I'm still a little worried about
the implications, because I put this code in there to avoid some nasty
re-entrancy problems. I guessing some later re-factoring might have fixed
those problems making this code redundant. So I'll ACK for nwo, but if we
notice any unusual behaviour with the monitor, then this is the place to
look...
Daniel
--
|: Red Hat, Engineering, London -o-
http://people.redhat.com/berrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org -o-
http://deltacloud.org :|
|:
http://autobuild.org -o-
http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|