Ján Tomko wrote:
[cc: Guido]
On Sat, Jul 01, 2017 at 02:18:58PM +0400, Roman Bogorodskiy wrote:
> Andrea Bolognani wrote:
>> virnetsockettest also fails pretty often for me, certainly
>> more than your figure; even if that wasn't the case, 1/5
>> failure rate is way too high for a CI job.
>
>I played a little more with virnetsockettest to get real stats and
>figured the following:
>
> 1. On my desktop (i5) and laptop (i3), I didn't get any failures in 50
> 'check' runs
> 2. On a VM that I use to run test builds in Jenkins, out of 50 runs it
> fails from 1 to 6 times; I did this test a couple of times and either I
> was lucky or the failure rate is higher when my Jenkins performs regular
> builds.
>
>Anyway, I'll try to find a way to debug what's going on with
>virnetsockettest.
>
IIRC Debian disabled this test years ago.
Guido, have you ever discovered the cause?
Jan
I made some experiments on the weekend, and here are my results:
On a box where test fails from time to time, it fails at this point:
    virObjectUnref(csock);
    for (i = 0; i < nlsock; i++) {
        if (virNetSocketAccept(lsock[i], &ssock) != -1 && ssock) {
            char c = 'a';
            if (virNetSocketWrite(ssock, &c, 1) != -1 &&
                virNetSocketRead(ssock, &c, 1) != -1) {
                VIR_DEBUG("Unexpected client socket present"); <--- HERE
                goto cleanup;
            }
        }
        virObjectUnref(ssock);
        ssock = NULL;
    }
On a box where this test never fails, it reaches this block, but:
* virNetSocketWrite(ssock, &c, 1) != -1
* virNetSocketRead(ssock, &c, 1) == -1
That is enough to make the test pass. On a failing box, both Write() and
Read() return something other than -1 when the test fails.
I'm not quite sure what this specific block is testing, though. My guess
was that calling "virObjectUnref(csock);" destroys the client socket, so
Accept() would not work (which should also make the test pass, I
guess).
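
To check that guess outside of libvirt, here is a minimal sketch with plain
POSIX sockets. It is not libvirt code: accept_after_client_close() is a
hypothetical helper, and I'm assuming (not verified) that
virNetSocketAccept() behaves like a plain accept() at this point. It
connects a client over loopback, closes it right away, and then checks
whether accept() on the listener still returns a socket:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of libvirt.
 * Returns 1 if accept() still hands back a socket after the client
 * has been closed, 0 if it does not, and -1 on a setup error. */
int accept_after_client_close(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct pollfd pfd;
    int lfd = -1, cfd = -1, sfd = -1;
    int result = -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto cleanup;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto cleanup;

    /* Rough counterpart of virObjectUnref(csock): the client goes away. */
    close(cfd);
    cfd = -1;

    /* Wait up to 2 seconds for a queued connection so that accept()
     * cannot block forever if the kernel dropped it. */
    pfd.fd = lfd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 2000) > 0)
        sfd = accept(lfd, NULL, NULL);
    result = (sfd >= 0) ? 1 : 0;

cleanup:
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return result;
}
```

On Linux at least, I'd expect this to return 1: the connection was already
completed when connect() returned, so it stays queued on the listener even
though the client is gone. If that carries over to the test, Accept()
succeeding is normal, and the outcome depends only on what Write() and
Read() return on the accepted socket.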
Anyway, I tried inserting sleep(1) right after the virObjectUnref(csock)
call that precedes the virNetSocketAccept() loop, and the test stopped
failing. I started it in a loop like:
while test $? -eq 0; do ./virnetsockettest; done
and actually forgot to stop it, so it has been running since Saturday
without failures.
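
The loop above stops at the first failure. To get a failure count over a
fixed number of runs instead (like the 50-run numbers quoted earlier),
something along these lines could be used. This is only a sketch:
count_failures() is a hypothetical helper, and it runs the command through
system(), so it depends on /bin/sh being available:

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: runs `cmd` through the shell `runs` times and
 * returns how many runs did not exit with status 0. A run that could
 * not be started or was killed by a signal also counts as a failure. */
int count_failures(const char *cmd, int runs)
{
    int failures = 0;
    int i;

    for (i = 0; i < runs; i++) {
        int status = system(cmd);

        if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}
```

Calling count_failures("./virnetsockettest", 50) from a small main() would
then give the number of failed runs out of 50.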
I haven't had a chance to debug it further yet.
Roman Bogorodskiy