Segmentation fault when writing big arrays with zstd compression #599

Closed
ali-ramadhan opened this issue Sep 15, 2024 · 3 comments

@ali-ramadhan

Trying to write large arrays to disk using zstd compression seems to consistently result in segmentation faults.

This could be an issue with CodecZstd.jl (see JuliaIO/CodecZstd.jl#25) and unsafe_wrap usage. I see some unsafe_wrap usage in https://github.com/JuliaIO/JLD2.jl/blob/3647927e85d1fc832d460f7c05bb197c98eac3a4/src/data/writing_datatypes.jl

Should those calls be wrapped in a GC.@preserve block?

I'm not sure whether this issue belongs here or in CodecZstd.jl, but I thought I'd open it here to discuss.
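
For readers unfamiliar with the pattern in question, here is a minimal sketch of what wrapping unsafe_wrap usage in GC.@preserve looks like. It uses only Base; the buffer and variable names are hypothetical and not taken from the JLD2 or CodecZstd.jl source, and it is meant to illustrate the idiom, not to diagnose this crash.

```julia
# Hypothetical buffer for illustration; not from JLD2 or CodecZstd.jl.
buf = Vector{UInt8}(undef, 16)

# GC.@preserve keeps `buf` rooted for the duration of the block, so the raw
# pointer (and the array wrapped around it) cannot outlive the memory it refers to.
GC.@preserve buf begin
    ptr = pointer(buf)
    # The wrapped array shares memory with `buf`; Julia does not take ownership
    # of the pointer (own = false is the default).
    wrapped = unsafe_wrap(Array, ptr, length(buf))
    fill!(wrapped, 0x01)
end

# Writes through the wrapped array are visible in the original buffer.
all(==(0x01), buf)
```

Without the GC.@preserve block, nothing guarantees that `buf` stays alive while `ptr` or `wrapped` is still being used, which is the kind of hazard the question above is asking about.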

MWE:

using JLD2, CodecZstd

jldopen("test.jld2", "w"; compress = ZstdFrameCompressor()) do file
    x = randn(10, 500, 500, 500)
    for i in 1:10
        file[string(i)] = x[i, :, :, :]
    end
end

Error:

[104407] signal (11.1): Segmentation fault
in expression starting at REPL[10]:1
ZSTD_CCtx_reset at /home/alir/.julia/artifacts/4c45bf9c8292490acd9463bbfbf168277d9720b6/lib/libzstd.so (unknown line)
ZSTD_initCStream at /home/alir/.julia/artifacts/4c45bf9c8292490acd9463bbfbf168277d9720b6/lib/libzstd.so (unknown line)
ZSTD_initCStream at /home/alir/.julia/packages/CodecZstd/KmRP9/src/LibZstd_clang.jl:277 [inlined]
initialize! at /home/alir/.julia/packages/CodecZstd/KmRP9/src/libzstd.jl:60 [inlined]
initialize at /home/alir/.julia/packages/CodecZstd/KmRP9/src/compression.jl:82 [inlined]
deflate_data at /home/alir/.julia/packages/JLD2/JHhTf/src/compression.jl:177
unknown function (ip: 0x73792cd194c5)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
write_dataset at /home/alir/.julia/packages/JLD2/JHhTf/src/datasets.jl:360
unknown function (ip: 0x73792cd190e5)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
write_dataset at /home/alir/.julia/packages/JLD2/JHhTf/src/inlineunion.jl:48
write_dataset at /home/alir/.julia/packages/JLD2/JHhTf/src/inlineunion.jl:37 [inlined]
#write#127 at /home/alir/.julia/packages/JLD2/JHhTf/src/compression.jl:87
write at /home/alir/.julia/packages/JLD2/JHhTf/src/compression.jl:75 [inlined]
write at /home/alir/.julia/packages/JLD2/JHhTf/src/compression.jl:75 [inlined]
setindex! at /home/alir/.julia/packages/JLD2/JHhTf/src/groups.jl:103 [inlined]
setindex! at /home/alir/.julia/packages/JLD2/JHhTf/src/JLD2.jl:373 [inlined]
#5 at ./REPL[10]:4
unknown function (ip: 0x73792cd173f5)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
#jldopen#93 at /home/alir/.julia/packages/JLD2/JHhTf/src/loadsave.jl:4
jldopen at /home/alir/.julia/packages/JLD2/JHhTf/src/loadsave.jl:1
unknown function (ip: 0x73792cd01235)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/interpreter.c:617
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:877
jl_toplevel_eval_flex at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:877
jl_toplevel_eval_flex at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
eval_user_input at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
repl_backend_loop at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
#start_repl_backend#46 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
start_repl_backend at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:228
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
#run_repl#59 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
run_repl at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
jfptr_run_repl_91805.1 at /home/alir/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
#1013 at ./client.jl:432
jfptr_YY.1013_82772.1 at /home/alir/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_latest at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:892 [inlined]
invokelatest at ./essentials.jl:889 [inlined]
run_main_repl at ./client.jl:416
exec_options at ./client.jl:333
_start at ./client.jl:552
jfptr__start_82798.1 at /home/alir/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
unknown function (ip: 0x737934054e07)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 7222260 (Pool: 7219373; Big: 2887); GC: 13
[1]    104407 segmentation fault (core dumped)  julia

Environment:

julia> versioninfo()
Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 48 × AMD Ryzen Threadripper 7960X 24-Cores
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 48 virtual cores)
Environment:
  LD_PRELOAD = /usr/NX/lib/libnxegl.so

Packages:

(@v1.10) pkg> st -m
Status `~/.julia/environments/v1.10/Manifest.toml`
  [da1fd8a2] CodeTracking v1.3.6
  [6b39b394] CodecZstd v0.8.5
  [5789e2e9] FileIO v1.16.3
  [033835bb] JLD2 v0.5.1
  [692b3bcd] JLLWrappers v1.6.0
⌃ [aa1ae85d] JuliaInterpreter v0.9.35
⌃ [6f1432cf] LoweredCodeUtils v3.0.1
  [1914dd2f] MacroTools v0.5.13
  [bac558e1] OrderedCollections v1.6.3
  [aea7be01] PrecompileTools v1.2.1
  [21216c6a] Preferences v1.4.3
  [ae029012] Requires v1.3.0
  [295af30f] Revise v3.5.18
  [3bb67fe8] TranscodingStreams v0.11.2
  [3161d3a3] Zstd_jll v1.5.6+0
  [0dad84c5] ArgTools v1.1.1
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8ba89e20] Distributed
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching
  [b77e0a4c] InteractiveUtils
  [b27032c2] LibCURL v0.6.4
  [76f85450] LibGit2
  [8f399da3] Libdl
  [56ddb016] Logging
  [d6f4376e] Markdown
  [a63ad114] Mmap
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.10.0
  [de0858da] Printf
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization
  [6462fe0b] Sockets
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [deac9b47] LibCURL_jll v8.4.0+0
  [e37daf67] LibGit2_jll v1.6.4+0
  [29816b5a] LibSSH2_jll v1.11.0+1
  [c8ffd9c3] MbedTLS_jll v2.28.2+1
  [14a3606d] MozillaCACerts_jll v2023.1.10
  [83775a58] Zlib_jll v1.2.13+1
  [8e850ede] nghttp2_jll v1.52.0+1
  [3f19e933] p7zip_jll v17.4.0+2
Info Packages marked with ⌃ have new versions available and may be upgradable.
@JonasIsensee
Collaborator

JonasIsensee commented Sep 15, 2024

Hi @ali-ramadhan,
thank you for the report! (and also for digging a bit yourself!)

I'm afraid the issue is a bit more subtle, and JLD2 is not really at fault.

I can reproduce your segfault without using JLD2 at all:

using CodecZstd, TranscodingStreams

compressor = ZstdFrameCompressor()
x = rand(UInt8, 4*10^7);
TranscodingStreams.initialize(compressor)
ret1 = transcode(compressor, x);
TranscodingStreams.finalize(compressor)

# compress again using the same compressor
TranscodingStreams.initialize(compressor) # segfault happens here!
# ret2 = transcode(compressor, x);
# TranscodingStreams.finalize(compressor)

I've opened a new issue over at CodecZstd.jl: JuliaIO/CodecZstd.jl#70

@ali-ramadhan
Author

Thank you for looking into this, @JonasIsensee, and for reducing the MWE down to the core issue! I will track the issue you created in CodecZstd.jl.

@ali-ramadhan
Author

Confirming that my MWE no longer segfaults now that JuliaIO/CodecZstd.jl#74 has been merged. The fix will be in the next version of CodecZstd.jl 🎉 Thank you @nhz2 and @JonasIsensee!
