Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single file multi-threading (read-only) #477

Merged
merged 12 commits into from
Jul 26, 2023
Merged

Conversation

ejmeitz
Copy link
Contributor

@ejmeitz ejmeitz commented Jul 22, 2023

This PR aims to add support for multi-threaded read-only access to JLD2 files. I created a new global tracker OPEN_PARALLEL_FILES which keeps track of the files flagged as open for multi-threaded access. The flag for multi-threaded access was added to the jldopen function as parallel_read with a default of false. If the user changes this flag to true the program will check that the file is opened with in mode r and that it is not already present in the OPEN_FILES list. If these checks are satisfied jldopen proceeds as normal. The OPEN_FILES_LOCK is not invoked when parallel_read is true.

This is in reference to #403. Still need to add tests, but the existing tests pass locally.

Things I am unsure of:

  • Do I need to verify compression types are passed the same each time? (other flags as well)
  • How to write tests that guarantee parallel reads (e.g. could just get lucky with timing and one finishes before the other). I dont seem to be able to spawn threads in a testset.
  • Why the BigInt tests fails on two of the architectures in the CI (passes locally for me)

@codecov
Copy link

codecov bot commented Jul 22, 2023

Codecov Report

Patch coverage: 15.11% and project coverage change: -8.90% ⚠️

Comparison is base (0198f19) 86.06% compared to head (1e99703) 77.16%.
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #477      +/-   ##
==========================================
- Coverage   86.06%   77.16%   -8.90%     
==========================================
  Files          31       32       +1     
  Lines        4170     4704     +534     
==========================================
+ Hits         3589     3630      +41     
- Misses        581     1074     +493     
Files Changed Coverage Δ
src/data/type_defs.jl 100.00% <ø> (ø)
src/metadataview.jl 0.00% <0.00%> (ø)
src/datasets.jl 91.36% <50.00%> (ø)
src/data/reconstructing_datatypes.jl 80.79% <60.34%> (-1.10%) ⬇️
src/JLD2.jl 89.67% <95.45%> (+0.19%) ⬆️
src/backwards_compatibility.jl 36.63% <100.00%> (ø)
src/data/writing_datatypes.jl 96.55% <100.00%> (ø)

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@JonasIsensee JonasIsensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi

thank you for your contribution!

At this stage, this PR will not do what we want it to do.

I like the new keyword argument and that you check for the file to be in read-only mode.
However, the JLDFile object doesn not support parallel access.
To circumvent that, this PR must return a new JLDFile instance every time you do jldopen(filename).

Possibly contrary to what I said earlier, maybe we don't need to keep track of OPEN_PARALLEL_FILES. Currently it can't hold multiple file handles of the same file anyway. The only advantage would be to throw an error when a user first opens a parallel-read file handle, then edits the file in regular mode and continues to read with the original file handle.

src/JLD2.jl Outdated Show resolved Hide resolved
src/JLD2.jl Outdated Show resolved Hide resolved
@ejmeitz
Copy link
Contributor Author

ejmeitz commented Jul 23, 2023

Makes sense, that shouldn't be too hard to remedy.

@ejmeitz
Copy link
Contributor Author

ejmeitz commented Jul 23, 2023

Well the code definitely makes a new handle every time if parallel_read is true but I am still clueless as to how I can test this does what I think it does.

@JonasIsensee
Copy link
Collaborator

Why the BigInt tests fails on two of the architectures in the CI (passes locally for me)
Don't worry about this one. It happens sporadically. I'm not sure why.

@JonasIsensee
Copy link
Collaborator

Thank you for making the changes!

That is fine - I don't think we can properly test this through CI. I'll have a final close look at the code and test it locally.

Looking at the code now, I don't think I like the "OPEN_PARALLEL_FILES" list. I think we're better off without it.

Also, you (probably accidentally) committed a manifest and a .jld2 file.

@ejmeitz
Copy link
Contributor Author

ejmeitz commented Jul 23, 2023

  • I'll try and test it out as well. I have some of the banned use cases in tests but not actual parallel reads.
  • I left the OPEN_PARALLEL_FILES to check that case you mentioned of someone trying to open the same file elsewhere with parallel_read set to false and then writing to that file.
  • Yeah the manifest is from running the tests locally oops. It won't let me delete the test.jld2 file I committed for some reason.

@JonasIsensee
Copy link
Collaborator

  • I'll try and test it out as well. I have some of the banned use cases in tests but not actual parallel reads.

👍

* I left the `OPEN_PARALLEL_FILES` to check that case you mentioned of someone trying to open the same file elsewhere with `parallel_read` set to false and then writing to that file.

I understand and appreciate the intention. I'd also support it if it were that simple.
Imagine the following test-case.

fp = jldopen(testfile; parallel_read=true)
close(fp) # probably (already) fails because it can't be deleted from the OPEN_FILES Dict

fs = jldopen(testfile; parallel_read=false) # fails because OPEN_PARALLEL_FILES has an entry (that may be WeakRef(nothing))

Even harder to fix:

fp1 = jldopen(testfile; parallel_read=true)
fp2 = jldopen(testfile; parallel_read=true) # overwrites reference tot fp1
close(fp2)

# If you fix the case above, JLD2 would now still think that there is no open parallel file handle.
fs = jldopen(tesfile; parallel_read=false) # should fail according to current logic but doesn't  

Since this is quite a bit of work (Would need to keep a full list of ALL open file handles and also check whether they are still open) I would prefer removing this safety check and add a warning in the docs instead.

* Yeah the manifest is from running the tests locally oops. It won't let me delete the test.jld2 file I committed for some reason.

That's fine. I can fix that in the end.

@JonasIsensee
Copy link
Collaborator

Hi @ejmeitz,

thanks for helping out!
I had a few minutes to spare and took the liberty to finish this up real quick.

I relaxed the open conditions. (parallel and serial read is fine)
Will merge and release once tests pass.

@JonasIsensee JonasIsensee self-requested a review July 26, 2023 07:13
@JonasIsensee JonasIsensee merged commit c47f45b into JuliaIO:master Jul 26, 2023
11 of 14 checks passed
@ZenanH
Copy link

ZenanH commented Sep 25, 2024

Hi,

I want to read a jld2 file in multi-threaded mode, so I do my code with WriteVTK.jl like this:

paraview_collection(mps_path) do pvd
    @inbounds Threads.@threads for i in 1:itr
        jldopen(jld2dir, "r"; parallel_read=true) do fid
            # read data from jld2 file
            time = fid["group$(i)/time"]
            sig = fid["group$(i)/sig"]
            ...

            # write data
            VTU_cls = [MeshCell(VTKCellTypes.VTK_VERTEX, [i]) for i in 1:mp_num]
            VTU_pts = Array{Float64}(mp_pos')
            let vtk = vtk_grid(joinpath(anim_path, "iter_$(i)"), VTU_pts, VTU_cls)
                vtk["stress"] = sig[:, [1, 2, 4]]'
                pvd[time] = vtk
                ...
                
            end
        end
    end
end
output.mp4

I noticed an issue in a multithreading scenario where some random time steps cause 'shaking'. I've checked the JLD2 file, and it is fine. Therefore, I suspect the issue might be caused by multithreading (it always works fine when multithreading is disabled).

Is it because I am reading it the wrong way😅, or is it a bug in multithreading?

@JonasIsensee
Copy link
Collaborator

@ZenanH
The JLD2 part looks fine by itself.
You are reusing the iteration variable i a bunch of times. You could check that this is not accidentally messing things up in the rest of your code.

@ejmeitz
Copy link
Contributor Author

ejmeitz commented Sep 25, 2024

I use this feature in my code and it works correctly, so I think JLD2 is fine. I don't really understand your code but there could be a race condition in there somewhere. Its hard to know if the issue is JLD2 or elsewhere.

@ZenanH
Copy link

ZenanH commented Sep 27, 2024

Thank you for your reply👍, the problem is on WriteVTK.jl side, and here is the solution (JuliaVTK/WriteVTK.jl#150).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants