Occasional "broken pipe" errors on Travis/AppVeyor during reflection test #9501

Closed
tkelman opened this issue Dec 30, 2014 · 10 comments · Fixed by #9678
Labels
bug: Indicates an unexpected problem or unintended behavior
io: Involving the I/O subsystem: libuv, read, write, etc.
test: This change adds or pertains to unit tests

Comments

@tkelman
Contributor

tkelman commented Dec 30, 2014

These tests were added in #8251 - I've been seeing intermittent failures on both Travis and AppVeyor over the last month or so (https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.946/job/ih37028wuuky5mvs and https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.67/job/dyhnx0wobs5l5kkj and a bunch more I'll have to go digging for).

AppVeyor almost always gives an error message like

    From worker 3:       * reflection
ERROR: test error in expression: code_native(ismatch,(Regex,AbstractString)) == nothing
write: broken pipe (EPIPE)
 in wait at no file
 in stream_wait at no file
while loading reflection.jl, in expression starting on line 11
while loading C:\projects\julia\test\runtests.jl, in expression starting on line 42

but on Travis there is often no error message at all; the failure just shows up as the worker running the reflection test dying.

The I/O label is due to this code at the start of test/reflection.jl:

# redirect stdout and stderr to avoid spam during tests.
oldout = STDOUT
olderr = STDERR
redirect_stdout()
redirect_stderr()

@tkelman added the test and io labels Dec 30, 2014
@timholy
Member

timholy commented Dec 31, 2014

There's a certain amount of irony in the involvement of #8251, but obviously there's a real (and important) issue here. Thanks for noticing the pattern.

@vtjnash added the bug label Dec 31, 2014
@ihnorton
Member

This is a bit of a shot in the dark, and probably unrelated, but: I wonder if this might be a sort of race condition between libuv and the JIT. When running julia against LLVM built in debug mode (where codegen is quite slow) I occasionally see some really weird behavior in the REPL where interactions (for example Ctrl-R mode) will seem to get stuck for a second or so, and then suddenly everything "catches up". I think I also see this (but very rarely) against llvm-release when Julia itself is built in debug and running under gdb. Seems unlikely though because I would expect much more to be broken.

@timholy
Member

timholy commented Dec 31, 2014

Interestingly, if (starting from before #9521 got merged) I do make test-reflection, I get failures pretty often:

tim@diva:~/src/julia$ make test-reflection
    JULIA test/reflection
     * reflection
make[1]: *** [reflection] Error 141
make: *** [test-reflection] Error 2

But I did 10 successful runs in a row with the current master.

Here's the (to me) weird part: if I check out the old test/reflection.jl with

git show 78a7175561c830419f708c2df737cd39d9b689c1:test/reflection.jl > /tmp/reflection.jl

and then run it many times:

for i in `seq 1 10`; do julia /tmp/reflection.jl; done

it does not (visibly) produce any errors. But if I copy the old file into test, then

for i in `seq 1 10`; do make test-reflection; done

errors every time. So:

  • Current master works because the test changed, not because I fixed a bug
  • There is some difference I don't yet understand between running a script as make test-reflection and julia test/reflection.jl.

Now, check this out:

tim@diva:~/src/julia/test$ for i in `seq 1 10`; do julia --check-bounds=yes -f runtests.jl reflection; echo $?; done
     * reflection
141
     * reflection
    SUCCESS
0
     * reflection
141
     * reflection
141
     * reflection
141
     * reflection
141
     * reflection
141
     * reflection
141
     * reflection
141
     * reflection
141
tim@diva:~/src/julia/test$ for i in `seq 1 10`; do julia --check-bounds=yes reflection.jl; echo $?; done
1
1
1
1
1
1
1
1
1
1

Note that with a single test file, runtests.jl does not start up more processes.
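
For context on the recurring 141: shells such as bash report a process killed by signal N as exit status 128 + N, and 141 - 128 = 13, which is SIGPIPE on Linux. That is consistent with the "broken pipe (EPIPE)" message in the AppVeyor log. A minimal sketch of the decoding follows; the helper name `signal_from_status` is made up for illustration and is not part of Julia or its test suite.

```shell
#!/bin/sh
# Hypothetical helper: map a shell exit status above 128 to a signal name.
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # Shells such as bash report "killed by signal N" as 128 + N.
        kill -l "$((status - 128))"
    else
        echo "none"
    fi
}

signal_from_status 141
signal_from_status 0
```

On Linux the first call should name the PIPE signal, so the 141 runs above most likely mean the julia process was terminated by SIGPIPE rather than exiting on its own.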

@timholy
Member

timholy commented Dec 31, 2014

Update: if I change the bash line to

for i in `seq 1 10`; do julia --check-bounds=yes -f -e 'using Base.Test; include("reflection.jl")'; echo $?; done

then I get exit code 0 for all the runs. (And this is using the problematic test/reflection.jl, the one that fails when used with make test-reflection.)

@timholy
Member

timholy commented Jan 1, 2015

It sure seems like a race condition. I can strip runtests.jl down to this:

testnames = [
    "linalg", "core", "keywordargs", "numbers", "strings", "dates",
    "collections", "hashing", "remote", "iobuffer", "staged", "arrayops",
    "subarray", "reduce", "reducedim", "random", "intfuncs",
    "simdloop", "blas", "fft", "dsp", "sparse", "bitarray", "copy", "math",
    "functional", "bigint", "sorting", "statistics", "spawn",
    "backtrace", "priorityqueue", "arpack", "file", "suitesparse", "version",
    "resolve", "pollfd", "mpfr", "broadcast", "complex", "socket",
    "floatapprox", "readdlm", "reflection", "regex", "float16", "combinatorics",
    "sysinfo", "rounding", "ranges", "mod2pi", "euler", "show",
    "lineedit", "replcompletions", "repl", "test", "goto",
    "llvmcall", "grisu", "nullable", "meta", "profile",
    "libgit2", "docs", "base64"
]

tests = ARGS

@everywhere include("testdefs.jl")
# include("testdefs.jl")

reduce(propagate_errors, nothing, pmap(runtests, tests; err_retry=false, err_stop=true))

println("    \033[32;1mSUCCESS\033[0m")

and still get the 141 exit code much of the time. Curiously, if I delete the definition of testnames (which is unused), I almost never get the 141 code. And if I comment out the @everywhere line and uncomment the one below it, I almost never get the 141 code.

I don't know enough about pipes, etc. to take this any further.

@timholy
Member

timholy commented Jan 8, 2015

Hmm, somehow this got automatically closed when #9678 was merged. Reopening.

@timholy reopened this Jan 8, 2015
@tkelman
Contributor Author

tkelman commented Jan 8, 2015

Whoops, it had the string "fix #9501" in the PR description.

@SimonDanisch
Contributor

Is this related?
I get this randomly in some of my test code:

error in running finalizer: jl_uv_writecb() ERROR: broken pipe EPIPE
jl_uv_writecb() ERROR: broken pipe EPIPE
jl_uv_writecb() ERROR: broken pipe EPIPE
jl_uv_writecb() ERROR: broken pipe EPIPE
jl_uv_writecb() ERROR: broken pipe EPIPE
jl_uv_writecb() ERROR: broken pipe EPIPE
jl_uv_writecb() ERROR: broken pipe EPIPE
jl_uv_writecb() ERROR: broken pipe EPIPE
jl_uv_writecb() ERROR: broken pipe EPIPE
jl_uv_writecb() ERROR: broken pipe EPIPE

@SimonDanisch
Contributor

It seems to stem from an error in the finalizer, which then wants to be printed, but I guess the pipe is already freed? So not really related, but also not very nice...

@vtjnash
Member

vtjnash commented Jul 20, 2016

Resolved by #9521. The original bug was in the test: it redirected stdout but never read from it, and reading is needed to keep the kernel from blocking on more data.
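
As a rough illustration of the failure class (not the actual Julia test), the sketch below writes to a pipe after its reader has gone away. It assumes a POSIX shell; the writer ignores SIGPIPE so that the write fails with EPIPE instead of killing the process, and the file descriptor 3 plumbing exists only to get the writer's status out of the pipeline.

```shell
#!/bin/sh
# Sketch only: a writer whose reader has already exited.
# `true` exits immediately without reading, so the pipe's read end is
# closed by the time the writer calls echo.
writer_status=$(
    {
        (
            trap '' PIPE    # ignore SIGPIPE so the write fails with EPIPE
            sleep 1         # give the reader time to exit
            echo "test output" 2>/dev/null
            echo "$?" >&3   # report the failed echo's status via fd 3
        ) | true
    } 3>&1
)
echo "writer status after reader exited: $writer_status"
```

The reported status should be nonzero because the write fails. Without the `trap` line, SIGPIPE would normally terminate the writer before it could report anything, which corresponds to the 141 exit code seen in the earlier runs.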

5 participants