PCDS data downloads are very slow #299

Open
rod-glover opened this issue Sep 14, 2023 · 0 comments
rod-glover commented Sep 14, 2023

Data downloads from the Met Data Portal - PCDS can be extremely slow. The backend serving this data is running the branch pcds-only, which is an open branch maintained for precisely this application.

This investigation originated with a complaint about a different issue, caused by metadata values in the CRMP database, which has since been resolved. During testing for that issue, downloads were markedly slow.

The original complaint caused us to look at downloads using the following station filters:

  • Start date: 2014-01-01
  • End date: 2014-03-01
  • Include stations with no observations: CHECKED
  • Only include stations with climatology: UNCHECKED
  • Network: Single network at a time.
  • Variable: Temperature (Mean), Temperature (Min.), Temperature (Point), Temperature Climatology (Mean), Temperature Climatology (Min.)
  • Observation frequency: Hourly

And the following settings on the Station Data tab:

  • Clip time series to filter date range: CHECKED

I'm documenting an ongoing investigation. Summary of what I've done so far:

  • Downloads in NetCDF and CSV formats
  • All networks with non-zero station counts with those filter parameters
  • Timeseries only

Observations:

  • Some downloads are only moderately slow, but even those reach only 200-400 kB/min.
  • Download speeds vary by network. This may be because the number of stations and observations varies by an order of magnitude between networks, and because, for some reason, larger data sets download more slowly not just overall but also per station, per observation, and per MB.
  • These figures are consistent across several days of tests.
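
For anyone wanting to reproduce these measurements, here is a minimal sketch of how a download rate in kB/min, and a per-station rate for comparing networks, could be computed. It is not the procedure used for the figures above. The function names are hypothetical, 1 kB is assumed to mean 1000 bytes, and the URL passed to `iter_url_chunks` would have to be a real download link copied from the portal.

```python
import time
import urllib.request
from typing import Callable, Iterable, Iterator


def iter_url_chunks(url: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Stream the response body of `url` in chunks (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            yield chunk


def measure_throughput(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Consume a stream of byte chunks and report size, duration, and rate."""
    start = clock()
    total_bytes = 0
    for chunk in chunks:
        total_bytes += len(chunk)
    elapsed_s = clock() - start
    # Rate in kB/min, assuming 1 kB = 1000 bytes.
    if elapsed_s > 0:
        kb_per_min = (total_bytes / 1000) / (elapsed_s / 60)
    else:
        kb_per_min = float("inf")
    return {
        "total_bytes": total_bytes,
        "elapsed_s": elapsed_s,
        "kb_per_min": kb_per_min,
    }


def per_station_rate(kb_per_min: float, station_count: int) -> float:
    """Normalize a download rate by station count to compare networks."""
    return kb_per_min / station_count
```

A single test would then be `measure_throughput(iter_url_chunks(download_url))`, repeated for each network and format. The clock is injectable so that the arithmetic can be checked without touching the network.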

I will attach a spreadsheet with details later.
