Document chunking in nccreate #87

Open
ali-ramadhan opened this issue Mar 19, 2019 · 6 comments

@ali-ramadhan

Please let me know if I'm doing this wrong. I was trying to find a good balance between compression time and file size by benchmarking compression levels 0-9, but found that the compression level has no effect on either.

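using NetCDF
# prettytime and datasize are not defined in this snippet;
# assuming they come from BenchmarkTools and Humanize respectively
using BenchmarkTools: prettytime
using Humanize: datasize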
N = 256^3
A = rand(N)

for cl in 0:9
    tic = time_ns()
    
    filename = "compress" * string(cl) * ".nc"
    varname  = "rands"
    attribs  = Dict("units"   => "m/s")

    nccreate(filename, varname, "x1", collect(1:N), Dict("units"=>"m"), atts=attribs, compress=cl)
    ncwrite(A, filename, varname)
    ncclose(filename)
    
    toc = time_ns()
    
    ts = prettytime(toc - tic)
    fs = datasize(filesize(filename); style=:bin, format="%.3f")
    println("Compression level $cl: $ts $fs")
end
Compression level 0: 4.784 s 233.989 MiB
Compression level 1: 4.618 s 233.989 MiB
Compression level 2: 4.358 s 233.989 MiB
Compression level 3: 5.900 s 233.989 MiB
Compression level 4: 4.456 s 233.989 MiB
Compression level 5: 4.643 s 233.989 MiB
Compression level 6: 4.353 s 233.989 MiB
Compression level 7: 5.022 s 233.989 MiB
Compression level 8: 6.425 s 233.989 MiB
Compression level 9: 4.271 s 233.989 MiB
@jarlela
Contributor

jarlela commented Mar 19, 2019

Compression of NetCDF files is only enabled if chunking is also enabled, and nccreate uses chunksize=(0,) by default. Try adding chunksize=(C,) to your nccreate arguments, where C is some integer larger than 1 and smaller than N.

@ali-ramadhan
Author

Thanks for the super quick reply @jarlela! Ah, I did not know about chunking (I don't think it was in the documentation).

I tried chunksize=(4096,), which I thought was a typical 4 MiB chunk size, but again the compression level made no difference to the file size.

I thought maybe it was because I'm saving a very long vector, so I'm now trying to save a 3D array. I magically found that it doesn't crash with chunksize=(1,16,256) (still 4 MiB I think?), but even then the compression level makes no difference to the file size.

N = 256
A = rand(N, N, N)

for cl in 0:9
    tic = time_ns()
    
    filename = "compress" * string(cl) * ".nc"
    varname  = "rands"
    attribs  = Dict("units"   => "m/s")

    nccreate(filename, varname,
             "x1", collect(1:N), Dict("units"=>"m"),
             "x2", collect(1:N), Dict("units"=>"m"),
             "x3", collect(1:N), Dict("units"=>"m"),
             atts=attribs, chunksize=(1,16,256), compress=cl)
    ncwrite(A, filename, varname)
    ncclose(filename)
    
    toc = time_ns()
    
    ts = prettytime(toc - tic)
    fs = datasize(filesize(filename); style=:bin, format="%.3f")
    println("Compression level $cl: $ts $fs")
end
Compression level 0: 2.996 s 111.172 MiB
Compression level 1: 3.009 s 111.172 MiB
Compression level 2: 3.058 s 111.172 MiB
Compression level 3: 3.047 s 111.172 MiB
Compression level 4: 3.069 s 111.172 MiB
Compression level 5: 3.027 s 111.172 MiB
Compression level 6: 3.029 s 111.172 MiB
Compression level 7: 3.046 s 111.172 MiB
Compression level 8: 3.016 s 111.172 MiB
Compression level 9: 2.969 s 111.172 MiB
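
A quick check of the chunk sizes tried above, as a minimal sketch. It assumes chunksize counts elements per dimension (as chunk sizes do in the underlying netCDF-C library) rather than bytes; chunk_bytes is a hypothetical helper, not part of NetCDF.jl. Under that assumption both attempts are 32 KiB chunks, not 4 MiB.

```julia
# Hypothetical helper: bytes occupied by one chunk, assuming each entry of
# chunksize is an element count along that dimension.
chunk_bytes(chunksize, T) = prod(chunksize) * sizeof(T)

bytes_1d = chunk_bytes((4096,), Float64)        # the 1D attempt
bytes_3d = chunk_bytes((1, 16, 256), Float64)   # the 3D attempt

println("1D chunk: ", bytes_1d / 1024, " KiB")
println("3D chunk: ", bytes_3d / 1024, " KiB")

# Number of Float64 elements a 4 MiB chunk would need
elements_4mib = (4 * 1024^2) ÷ sizeof(Float64)
println("4 MiB of Float64 = ", elements_4mib, " elements")
```

This does not explain the unchanged file sizes, which turned out to be a bug addressed in #88; it only shows how the chunk shape translates to bytes.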

@meggart
Member

meggart commented Mar 25, 2019

Thanks for reporting. Could you try the branch in #88?

On a side note, when running your example you should provide the chunksize in Julia-ordered dimensions, so if you care about performance you probably wanted chunksize = (256,16,1) to align the chunks along the first axis. Also, random numbers barely compress, so I used something like A = Float64.(rand(1:10,N,N,N)) to have data that can actually be compressed.
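
One way to read the chunksize remark: Julia arrays are column-major, so the first index varies fastest in memory. The sketch below only illustrates that memory layout with Base's LinearIndices; it does not call NetCDF.jl, and whether contiguity is the performance aspect meant here is an assumption. The name iscontiguous is a hypothetical helper.

```julia
N = 256
li = LinearIndices((N, N, N))   # maps (i, j, k) to a position in memory order

# Memory positions covered by the first chunk of each candidate shape
chunk_first_axis = vec(li[1:256, 1:16, 1:1])   # chunksize = (256, 16, 1)
chunk_last_axis  = vec(li[1:1, 1:16, 1:256])   # chunksize = (1, 16, 256)

# Hypothetical helper: true if the positions form one unbroken run
iscontiguous(v) = all(diff(v) .== 1)

println("(256,16,1) contiguous in memory: ", iscontiguous(chunk_first_axis))
println("(1,16,256) contiguous in memory: ", iscontiguous(chunk_last_axis))
```

With (256,16,1) each chunk maps to a single run of 4096 consecutive elements of A, while (1,16,256) gathers elements that are 256 or more positions apart.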

@ali-ramadhan
Author

Hey @meggart thanks for looking into this. Tried #88 and it's working as expected now!

N = 256
A = Float64.(rand(1:10, N, N, N))

for cl in 0:9
    tic = time_ns()
    
    filename = "compress" * string(cl) * ".nc"
    varname  = "rands"
    attribs  = Dict("units"   => "m/s")

    nccreate(filename, varname,
             "x1", collect(1:N), Dict("units"=>"m"),
             "x2", collect(1:N), Dict("units"=>"m"),
             "x3", collect(1:N), Dict("units"=>"m"),
             atts=attribs, chunksize=(256,16,1), compress=cl)
    ncwrite(A, filename, varname)
    ncclose(filename)
    
    toc = time_ns()
    
    ts = prettytime(toc - tic)
    fs = datasize(filesize(filename); style=:bin, format="%.3f")
    println("Compression level $cl: $ts $fs")
end
Compression level 0: 866.125 ms 128.280 MiB
Compression level 1: 930.343 ms 12.469 MiB
Compression level 2: 985.205 ms 12.097 MiB
Compression level 3: 1.078 s 11.716 MiB
Compression level 4: 1.407 s 11.421 MiB
Compression level 5: 1.557 s 11.212 MiB
Compression level 6: 1.918 s 11.010 MiB
Compression level 7: 2.348 s 10.952 MiB
Compression level 8: 4.791 s 10.858 MiB
Compression level 9: 6.997 s 10.843 MiB
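
To make the trade-off in these numbers easier to see, here is a small sketch that turns the output above into compression ratios and slowdowns relative to level 0. The values are copied from the run above; nothing is re-measured, and the variable names are illustrative.

```julia
# File sizes in MiB for compression levels 0 to 9, copied from the output above
sizes_mib = [128.280, 12.469, 12.097, 11.716, 11.421,
             11.212, 11.010, 10.952, 10.858, 10.843]
# Wall-clock times in seconds from the same run
times_s = [0.866125, 0.930343, 0.985205, 1.078, 1.407,
           1.557, 1.918, 2.348, 4.791, 6.997]

ratios   = sizes_mib[1] ./ sizes_mib   # size reduction relative to level 0
slowdown = times_s ./ times_s[1]       # time relative to level 0

for (level, r, s) in zip(0:9, ratios, slowdown)
    println("level $level: ratio ", round(r, digits=2),
            "x, time ", round(s, digits=2), "x")
end
```

For this data set, level 1 already shrinks the file roughly tenfold, while level 9 stays under twelvefold and takes about eight times as long as level 0.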

@bjarthur

bjarthur commented Aug 3, 2023

+1 for the suggestion above to document chunking. it's still not mentioned anywhere in docs/ nor in the docstrings, e.g.:

help?> nccreate
search: nccreate

  nccreate (filename, varname, dimensions ...)

  Create a variable in an existing NetCDF file or generates a new file. filename and varname
  are strings. After that follows a list of dimensions. Each dimension entry starts with a
  dimension name (a String), and may be followed by a dimension length, an array with
  dimension values or a Dict containing dimension attributes. Then the next dimension is
  entered and so on. Have a look at examples/high.jl for an example use.

  Keyword arguments
  –––––––––––––––––––

    •  atts Dict of attribute names and values to be assigned to the variable created

    •  gatts Dict of attribute names and values to be written as global attributes

    •  compress Integer [0..9] setting the compression level of the file, only valid if
       mode=NC_NETCDF4

    •  t variable type, currently supported types are: const NC_BYTE, NC_CHAR, NC_SHORT,
       NC_INT, NC_FLOAT, NC_LONG, NC_DOUBLE

    •  mode file creation mode, only valid when new file is created, choose one of:
       NC_NETCDF4, NC_CLASSIC_MODEL, NC_64BIT_OFFSET

also, @meggart, can you please elaborate on what you meant by:

you probably wanted chunksize = (256,16,1) to align along the first axis, in case you want performance

what aspect of performance is improved if the chunk is bigger in the first axis? compression ratio, read time, something else? thanks!

meggart reopened this Aug 4, 2023
meggart changed the title from "Compression level functionality in nccreate does not seem to work." to "Document chunking in nccreate" Aug 4, 2023
@meggart
Member

meggart commented Aug 4, 2023

Ok, I have re-opened and changed the title of the issue
