
Grab bag of outstanding issues #5

Open
7 of 11 tasks
omad opened this issue Oct 10, 2018 · 6 comments


omad (Contributor) commented Oct 10, 2018

Work Generation

  • Incremental. Need a way to compare what is already on S3 with what still needs converting.
    • Our current unit of work for COG conversion is a NetCDF file, either stacked or unstacked. Comparing stacked NetCDF files against data existing on S3 will be awkward, since a stacked file represents many datasets while an unstacked file represents one.
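The incremental comparison above could be sketched as a set difference between the S3 keys each NetCDF dataset is expected to produce and an actual bucket listing. The function names and key scheme here are hypothetical placeholders, not the tool's real API:

```python
# Sketch: diff expected COG keys against what is already on S3.
# expected_key_for and the (path, dataset_index) unit of work are
# illustrative assumptions; a stacked file yields many indices,
# an unstacked file just index 0.

def expected_key_for(netcdf_path, dataset_index=0):
    """Map a (NetCDF file, dataset index) pair to the S3 key its COG would get."""
    stem = netcdf_path.rsplit("/", 1)[-1]
    if stem.endswith(".nc"):
        stem = stem[:-3]
    return f"{stem}_{dataset_index}.tif"

def pending_work(netcdf_datasets, existing_keys):
    """Return the (path, index) pairs whose COG output is not yet on S3."""
    existing = set(existing_keys)
    return [(path, i) for path, i in netcdf_datasets
            if expected_key_for(path, i) not in existing]
```

Listing the bucket once and diffing in memory avoids one S3 HEAD request per dataset.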

COG Converter

  • Configurable COG parameters when generating overviews.
    • Resampling method for overviews, per product
    • Number of overview levels
    • Compression/chunk size (maybe, deflate/512 is good, but...)
  • Is it faster/easier/more configurable to use rio cogeo than raw GDAL?
  • Review/test the parameters used
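The per-product configurability above could be as simple as a defaults dict with product-level overrides merged on top. The keys, defaults, and the `ls8_fc_albers` example are illustrative assumptions, not the tool's actual schema:

```python
# Sketch: per-product COG parameters layered over sensible defaults.
# All names and values here are illustrative, not the real config schema.

DEFAULTS = {
    "resampling": "average",   # overview resampling method
    "overview_levels": 5,      # number of overview levels
    "compress": "deflate",     # "deflate/512 is good, but..."
    "blocksize": 512,
}

PRODUCT_OVERRIDES = {
    # e.g. categorical data should not be averaged when building overviews
    "ls8_fc_albers": {"resampling": "nearest"},
}

def cog_params(product):
    """Merge product-specific overrides onto the defaults."""
    return {**DEFAULTS, **PRODUCT_OVERRIDES.get(product, {})}
```

This keeps the defaults in one place while letting individual products change only the parameters that matter to them.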

Uploader

  • Specify bucket instead of having COG-Conversion define it.
  • Give uploader an option to move files to a COMPLETE directory instead of deleting them. Will let us test upload to a dev bucket, and then run again against the prod bucket.
  • SPEED: How fast can we upload in a single thread? Do we need parallel upload processes?
  • MAYBE: Ability to watch multiple directories?
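The "move to a COMPLETE directory instead of deleting" option could look like the sketch below; `finish_file` is a hypothetical post-upload hook, not the uploader's real API:

```python
# Sketch: after a successful upload, either archive the source file into a
# COMPLETE/ directory or delete it. Keeping the files lets a dev-bucket run
# be replayed later against the prod bucket. finish_file is a hypothetical name.
import shutil
from pathlib import Path

def finish_file(path, keep=True, complete_dir="COMPLETE"):
    """Dispose of a file after upload: archive it if keep=True, else delete it."""
    path = Path(path)
    if keep:
        dest = path.parent / complete_dir
        dest.mkdir(exist_ok=True)          # create COMPLETE/ on demand
        shutil.move(str(path), str(dest / path.name))
    else:
        path.unlink()
```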
ashoka1234 (Contributor) commented Oct 10, 2018

For the COG-conversion

  • Validation of COG datasets - On which side do we do this?

omad (Contributor, Author) commented Oct 12, 2018

A very quick test of rio cogeo indicates no significant performance difference.

omad (Contributor, Author) commented Oct 12, 2018

  • Easier submission to large parallel PBS jobs
  • Useful progress logging when run in PBS or as a background process
    • We currently use tqdm for an interactive progress bar; it might be possible to use it for background progress logs as well
  • Finer grained progress when converting stacked NetCDF files
  • Progress and speed metrics from the Uploader
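One way to get useful progress output from a PBS or background job (whether via tqdm's `file`/`mininterval` options or a plain logging wrapper like this sketch) is to throttle updates to one log line per interval. Everything here, including `log_progress`, is an illustrative assumption, not the tool's existing code:

```python
# Sketch: rate-limited progress logging suitable for PBS logs, where an
# interactive carriage-return progress bar would garble the output file.
import logging
import time

def log_progress(iterable, total, every=60, log=logging.getLogger("cog")):
    """Yield items, emitting at most one progress log line per `every` seconds.

    The final item always triggers a log line, so completed jobs end with
    an unambiguous 100% entry.
    """
    last = float("-inf")
    for n, item in enumerate(iterable, 1):
        now = time.monotonic()
        if now - last >= every or n == total:
            log.info("progress: %d/%d (%.0f%%)", n, total, 100 * n / total)
            last = now
        yield item
```

The same wrapper could report upload throughput by logging bytes transferred instead of item counts.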

ashoka1234 (Contributor) commented

For the COG-conversion

  • Further configurability of the upload directory structure. For example, some products want a flat directory structure in AWS where only a yearly time level makes sense, i.e. without month and day levels
  • What do we do when the time indicated in the file name, rather than the timestamp of the dataset, is the one that makes sense?

omad (Contributor, Author) commented Oct 12, 2018

  • Make the src_template more flexible, and simplify the COGProductConfiguration class by either using parse or accepting regexes.
  • Look into using MPIPoolExecutor for distributing work inside PBS jobs
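The regex option for src_template could use named groups, so a single pattern both validates a source filename and extracts its fields. The pattern below is purely illustrative, not the project's real template:

```python
# Sketch: a named-group regex standing in for src_template.
# The filename layout (product_x_y_date.nc) is an assumed example only.
import re

SRC_PATTERN = re.compile(
    r"(?P<product>[a-z0-9_]+)_(?P<x>-?\d+)_(?P<y>-?\d+)_"
    r"(?P<date>\d{8})\.nc$"
)

def parse_src(filename):
    """Return the template fields as a dict, or None if the name doesn't match."""
    m = SRC_PATTERN.search(filename)
    return m.groupdict() if m else None
```

A non-matching filename returns None instead of raising, which makes it easy to skip stray files during work generation.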

emmaai (Contributor) commented Nov 15, 2018

For the COG-conversion

* [x]  Validation of `COG` datasets - On which side do we do this?

On NCI, done!
