Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adapt download_vpfiles() to aloft S3 bucket and VPTS-CSV format #550

Closed
7 tasks done
peterdesmet opened this issue Mar 10, 2023 · 2 comments
Closed
7 tasks done

Adapt download_vpfiles() to aloft S3 bucket and VPTS-CSV format #550

peterdesmet opened this issue Mar 10, 2023 · 2 comments
Assignees
Labels
VPTS CSV Related to VPTS CSV format
Milestone

Comments

@peterdesmet
Copy link
Collaborator

peterdesmet commented Mar 10, 2023

Current functionality

Download and unzip a selection of vertical profile (vp) files from the ENRAM data repository, where these are stored as monthly zips per radar.

download_vpfiles(
  date_min,
  date_max,
  radars,
  directory = ".",
  overwrite = FALSE
)

Reason for change

  • New endpoint for data (aloft)
  • Monthly zips of hdf5 files are no longer offered
  • VPTS CSV files are now available

Questions

  • Do we want to keep downloading and reading as two separate functions? Or should it be possible to read directly from a URL (read_vpfiles() support reading remote files #544) Keep as separate functions
  • Downloaded files will be gzipped CSV files. Should we retain possibility of downloading HDF5 files (much slower) from bucket? Can we retain the function name download_vpfiles() or should it be download_vpts()? Offer possibility, deprecate download_vpfiles() in favour of download_vpts_aloft(), see Create download_vpts_aloft() function #553
  • Should downloaded files be unzipped? I would say no, as they can be read directly via read_csv() (they are not zipped directories of files). No
  • Should we make use of an S3 dependency to download files? TBD, see Create download_vpts_aloft() function #553
  • date_min and date_max are precise to the day, but data are downloaded per month (= same as current behaviour). Should the values for date_min/date_max be structured as YYYY-MM? Note that these currently align with select_vpfiles(). Keep them as full dates
  • How should we warn for unrecognized radars? TBD, see Create download_vpts_aloft() function #553
  • How should we warn for dates without data? TBD, see Create download_vpts_aloft() function #553

Ping @CeciliaNilsson709 @adokter

@peterdesmet peterdesmet self-assigned this Mar 10, 2023
@peterdesmet peterdesmet added the VPTS CSV Related to VPTS CSV format label Mar 10, 2023
@CeciliaNilsson709
Copy link
Collaborator

Lots of hard questions, might be a good discussion to have during the coding sprint, or in a meeting preparing for the coding sprint? Some quick thoughts below.

Do we want to keep downloading and reading as two separate functions? Or should it be possible to read directly from a URL (Support reading remote vp files #544)

For the CSV's it would make sense to have direct reading I think, for the HDF5 not ofcourse.

Downloaded files will be gzipped CSV files. Should we retain possibility of downloading HDF5 files (much slower) from bucket? Can we retain the function name download_vpfiles() or should it be download_vpts()?

I think we should keep the current functionality as people probably use since we have been teaching it that way. Not likely that everyone will want to change to working from the CSVs.

Should downloaded files be unzipped? I would say no, as they can be read directly via read_csv() (they are not zipped directories of files).

I guess for the CSVs it won't matter much so they don't need to be unzipped.

Should we make use of an S3 dependency to download files?

Don't know.

date_min and date_max are precise to the day, but data are downloaded per month (= same as current behaviour). Should the values for date_min/date_max be structured as YYYY-MM? Note that these currently align with select_vpfiles().

I thought the data will not be monthly anymore? But if it is, then yes it would make sense to go for YYYY-MM. But also as noted this is the current behaviour and I don't think it has caused any confusion so far.

How should we warn for unrecognized radars?
How should we warn for dates without data?

Currently both these just give an error for the download per file, which makes it pretty easy to figure out what the problem is, so don't think we need to make it too complicated?

@peterdesmet
Copy link
Collaborator Author

peterdesmet commented Apr 4, 2023

Discussed in call with @adokter and @CeciliaNilsson709. We suggest to:

  • Keep downloading seperate from reading.
  • Split download functions per source (aloft, rmi, nexrad). That makes it easier to maintain, since the author of a function is likely familiar with a single rather than all sources. Makes it also clear to the user. And allows for specific parameters per source, rather than trying to combine it all in one function.
  • Keep functions split by data type (pvol vs vpts). Since most sources only offer one data type, it would not increase the number of functions that much.
  • Face out the use of singular vp, so name functions download_pvol_<source>() and download_vpts_<source>(). The latter could still offer the possibility to download singular vp hdf5 files

Suggested functions:

@adokter adokter added this to the 0.7.1 milestone Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
VPTS CSV Related to VPTS CSV format
Projects
None yet
Development

No branches or pull requests

3 participants