Asynchronous NetCDF output. #137

Merged · ali-ramadhan merged 6 commits into master on Mar 19, 2019

Conversation

@ali-ramadhan (Member) commented Mar 19, 2019

This PR adds an async::Bool option to the NetCDFOutputWriter (by default async=false). When async=true, the data to be written is copied and then written to disk asynchronously by a second worker/process. This is especially useful when running models on the GPU, since writing and compressing NetCDF output is time-consuming and the GPU would otherwise sit idle while it happens.
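
As a rough illustration, the mechanism is a copy-then-write-on-a-worker pattern. The sketch below is conceptual only, not the actual code in this PR; the names write_to_disk and write_output_async are hypothetical:

using Distributed
addprocs(1)  # one extra process dedicated to output

@everywhere function write_to_disk(data, filename)
    # Stand-in for the actual NetCDF write; this runs on the worker process.
    open(filename, "w") do io
        write(io, data)
    end
end

function write_output_async(data, filename)
    snapshot = copy(data)  # copy the field so the model can keep time stepping
    # remotecall returns a Future immediately; the worker does the writing.
    remotecall(write_to_disk, workers()[1], snapshot, filename)
end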

In particular, compressing NetCDF output seems to be very inefficient; it takes far too long. I'm considering setting compress=0 or trying out NCDatasets.jl. Alternatively, we could write output to JLD and then convert all the JLD files into one big NetCDF file after the model is done time stepping.

See the free convection example for this feature in action.

I tried to keep all the parallel computing machinery under the hood, but unfortunately I could not get this feature to work without manually calling addprocs() and @everywhere using Oceananigans in the example (see the setup snippet below), so for now it requires some work on the user's part.
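
Concretely, the setup required on the user's side currently looks like this (the worker count here is illustrative):

using Distributed
addprocs(1)                      # spawn a worker process to handle asynchronous output
@everywhere using Oceananigans   # the worker also needs Oceananigans loaded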

cc: @SandreOuza

Resolves #124.

@ali-ramadhan added the feature 🌟 (Something new and shiny) and performance 🏍️ (So we can get the wrong answer even faster) labels on Mar 19, 2019
@ali-ramadhan modified the milestones: v0.6, v0.5 on Mar 19, 2019
@ali-ramadhan self-assigned this on Mar 19, 2019
@glwagner (Member) commented:

+1 for NCDatasets.

Asynchronous output writing is great; we will also want this (or at least a similar concept) for high-frequency diagnostics gathering + occasional writing of gathered diagnostics.

I'll open an issue later today once I hammer out the requirements for this functionality.

@ali-ramadhan (Member, Author) commented Mar 19, 2019

It's pretty bad. I don't think the compression level functionality even does anything!

using NetCDF
using Humanize: datasize  # human-readable file sizes
# `prettytime` (human-readable durations) is assumed available, e.g. from Oceananigans' utilities.

N = 256^3
A = rand(N)

# Write the same random array at every compression level and time how long it takes.
for cl in 0:9
    tic = time_ns()

    filename = "compress" * string(cl) * ".nc"
    varname  = "rands"
    attribs  = Dict("units" => "m/s")

    nccreate(filename, varname, "x1", collect(1:N), Dict("units" => "m"), atts=attribs, compress=cl)
    ncwrite(A, filename, varname)
    ncclose(filename)

    toc = time_ns()

    ts = prettytime(toc - tic)
    fs = datasize(filesize(filename); style=:bin, format="%.3f")
    println("Compression level $cl: $ts $fs")
end
Compression level 0: 4.784 s 233.989 MiB
Compression level 1: 4.618 s 233.989 MiB
Compression level 2: 4.358 s 233.989 MiB
Compression level 3: 5.900 s 233.989 MiB
Compression level 4: 4.456 s 233.989 MiB
Compression level 5: 4.643 s 233.989 MiB
Compression level 6: 4.353 s 233.989 MiB
Compression level 7: 5.022 s 233.989 MiB
Compression level 8: 6.425 s 233.989 MiB
Compression level 9: 4.271 s 233.989 MiB
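
For comparison, here is a minimal sketch of how the same write might look with NCDatasets.jl, whose defVar accepts a deflatelevel keyword (API from my reading of the NCDatasets.jl docs; untested here):

using NCDatasets

N = 256^3          # same array as above
A = rand(N)

ds = Dataset("compressed.nc", "c")   # create a new NetCDF file
defDim(ds, "x1", N)
v = defVar(ds, "rands", Float64, ("x1",);
           deflatelevel=5, attrib=Dict("units" => "m/s"))
v[:] = A
close(ds)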

I opened an issue (JuliaGeo/NetCDF.jl#87), but it seems to be an inactive package, so I'm not expecting a reply.

Either way, I need to figure out why CI is crapping out; it might be the use of the Distributed package.

@ali-ramadhan (Member, Author) commented:

PS: @glwagner, we already have #31 open to discuss NetCDF output. If you open a new issue, we should close #31.

@ali-ramadhan (Member, Author) commented:

Tests are passing so I will merge unless there are any objections.

Code coverage decreased, but I'm not sure how to test this feature, as it would require spinning up another worker on CI, which isn't available to us on the free-tier CI plans.

@codecov bot commented Mar 19, 2019

Codecov Report

Merging #137 into master will decrease coverage by 0.75%.
The diff coverage is 0%.


@@            Coverage Diff             @@
##           master     #137      +/-   ##
==========================================
- Coverage   56.78%   56.03%   -0.76%     
==========================================
  Files          19       19              
  Lines         597      605       +8     
==========================================
  Hits          339      339              
- Misses        258      266       +8
Impacted Files Coverage Δ
src/fields.jl 43.54% <ø> (ø) ⬆️
src/output_writers.jl 0% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 47b4613...2aba006.

@ali-ramadhan merged commit af9ba13 into master on Mar 19, 2019
@ali-ramadhan deleted the async-netcdf-io branch on March 19, 2019 at 16:58