On Wed, Nov 11, 2009 at 11:06:50AM +0000, Daniel P. Berrange wrote:
On Wed, Nov 11, 2009 at 11:55:29AM +0100, Daniel Veillard wrote:
> On Wed, Nov 11, 2009 at 03:11:31AM -0200, Eduardo Otubo wrote:
> > Hello all,
> >
> > Since I moved to libssh2 I had noticed a weird behaviour, virsh was
> > taking too much time to complete the operations when using phyp driver.
> > Just found the problem, 10 seconds of timeout passed to select().
> > Changed to zero, since I'm just polling the socket.
> >
> > Actually this patch is more important than it seems, now I can write a
> > script (using virsh) to test all the phyp features.
>
> The problem is that you end up creating a busy loop by doing this,
> and I would be surprized if that patch was actually correct. Somehow
> somewhere in one of the loops calling waitsocket() you should not make
> that call, this is what is blocking you and using gdb (or another
> debugger) during that time out will allow you to find out exactly where
> this is hapening. So rather than just dropping the ball and just looping
> you should take that opportunity of a reproduceable bug to debug it :-)
>
> In the meantime moving the timeout from 10s to 1 millisecond by
> changing tv_sec to 0 and tv_usec to 1000 is certainly a good
> workaround, it will still avoid looping needlessly, and the 1ms extra
> timeout should not break your scripts. I will change the code
> accordingly, but you should debug your issue with the 10s delay (or even
> increase it if it makes things easier to set up).
>
> So I pushed that modified fix but please debug the issue :-)
I don't think that is correct really.
This function is invoked when libssh2 gets EAGAIN, and needs to wait for
more data to arrive on the socket. It should be waiting for select() to
show the file handle is writable or readable, arguably using an inifinite
timeout, particularly because no caller of waitsocket() ever bothers to
check the return value for a timeout errors
The problem is that right now, the code doesn't call it only when
getting EAGAIN from libssh2 but in a number of various loops. It's
clearly called from places it shouldn't (hence the 10s timeouts), and
current patch is only a band aid. Hence my request to do the actual
debug.
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit
http://xmlsoft.org/
daniel(a)veillard.com | Rpmfind RPM search engine
http://rpmfind.net/
http://veillard.com/ | virtualization library
http://libvirt.org/