R access for CONUS404 #72

Open
amsnyder opened this issue Sep 22, 2022 · 4 comments

@amsnyder

amsnyder commented Sep 22, 2022

Creating a thread to discuss R access to the CONUS404 data subset on S3+Caldera. The data are in Zarr format, which is easy to read from Python but for which the R community is still working out solutions. We have many R users who will want access to this dataset, and we would like to be able to provide guidance to them. Some solutions that have been considered:

  1. Asking R users to download their data subset into a NetCDF file, which can then be read into R. This is possible, but it is not ideal to ask R users to download and set up a Python environment just for data access.
  2. @jesse-ross has done some initial exploration of the stars package in R, based on this blog. This seems to work, but it required specific R package versions, so a Docker container might be needed if this is our recommended approach.
  3. Using reticulate to run Python code from R that reads the Zarr data. This approach has not been explored by our team, but Lauren Koenig may have previously done some work on this.
  4. @rsignell-usgs has suggested looking into the latest NetCDF library, which can also read Zarr. He thinks that is what GDAL uses to read Zarr, and that most of the "R-reading-Zarr" demos he has seen are based on it.
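
For approaches 2 and 4, GDAL addresses a Zarr store through a connection string rather than a plain file path. Below is a minimal sketch of how such a string is assembled, written in Python purely for illustration. The store URL and the array name `T2` are hypothetical placeholders, not the actual CONUS404 location, and the helper `gdal_zarr_dsn` is not part of any library.

```python
from typing import Optional


def gdal_zarr_dsn(store_url: str, array: Optional[str] = None) -> str:
    """Build a GDAL Zarr connection string for a store served over HTTP(S).

    Hypothetical helper for illustration; not part of GDAL or stars.
    """
    # Drop any trailing slash so the store name is not followed by "/".
    url = store_url.rstrip("/")
    # /vsicurl/ is GDAL's virtual file system for reading over HTTP(S).
    dsn = 'ZARR:"/vsicurl/' + url + '"'
    if array is not None:
        # Optionally select a single array inside the store, e.g. "/T2".
        dsn += ":/" + array.lstrip("/")
    return dsn


# Placeholder URL and variable name (assumptions, not the real dataset path).
EXAMPLE_URL = "https://example.org/conus404/subset.zarr"
print(gdal_zarr_dsn(EXAMPLE_URL))
print(gdal_zarr_dsn(EXAMPLE_URL, "T2"))
```

A string of this form is what would be handed to a GDAL-backed reader, for example `stars::read_mdim()` in R, assuming the installed GDAL build includes the Zarr driver.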

Jesse will be leading this exploration, and we can use this thread to discuss and document our learnings along the way.

@jesse-ross

For the stars/GDAL approach (2), I think a Docker image will definitely be needed at present, because development versions of several geospatial packages are required (the blog post linked above is missing some details that are in its canonical version here). The image code.usgs.gov:5001/jross/zarr-in-r:latest has the necessary versions.

@amsnyder

Jesse, here are some example notebooks you could try to replicate:
https://github.com/hytest-org/hytest/tree/dev/dataset_access

I would start with the explore notebook. These notebooks will likely be updated in the coming weeks with additional instructional material, but they will give a sense of what we want to provide to our users.

@amsnyder

amsnyder commented Dec 5, 2022

@jesse-ross - have you looked into RNetCDF at all? I am not familiar with it, but Dave B. mentioned it in this issue about updating the geoknife package to work with zarr.

@jesse-ross

Looks interesting, thanks! It looks like it has gained Zarr support. I will look into it as well when I get to this work.
