
Enhancement to ERA5 Data Retrieval and Download Process #397

Open
wants to merge 9 commits into base: master

Conversation

@yndevops2 commented Oct 28, 2024

This update introduces an optimized approach to retrieving and caching ERA5 data from the Climate Data Store (CDS). Key changes include:

  1. Caching Mechanism: Added a caching mechanism to prevent repeated downloads for identical data requests. The cache files are named based on a unique hash of the request parameters, making subsequent retrievals faster by using pre-downloaded data.

  2. Custom Download Function: Integrated a custom download function with a progress bar. The function downloads in chunks and includes error handling and retries for a robust download process.

  3. Progress Bar: A dynamic progress bar displays the download status of multiple files, with completed files removed from the display to improve readability.
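The caching described in point 1 could be sketched roughly as follows. This is a minimal, hypothetical illustration, not the PR's actual code: `cache_key`, `retrieve`, and the `era5_cache` directory are invented names, and the real implementation presumably wraps the cdsapi client.

```python
import hashlib
import json
import os

CACHE_DIR = "era5_cache"  # hypothetical cache location

def cache_key(request: dict) -> str:
    """Derive a stable filename stem from the CDS request parameters.

    Sorting the keys makes the JSON serialisation deterministic, so
    identical requests always map to the same cache file.
    """
    payload = json.dumps(request, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:16]

def retrieve(request: dict, download, cache_dir: str = CACHE_DIR) -> str:
    """Return the cached file for `request`, downloading only on a cache miss.

    `download` is any callable that fetches the data for `request` and
    writes it to the given target path (e.g. a wrapper around cdsapi).
    """
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, cache_key(request) + ".nc")
    if not os.path.exists(target):
        download(request, target)
    return target
```

Hashing a key-sorted serialisation of the request means two requests with the same parameters in a different order still hit the same cache file.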
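Points 2 and 3 (chunked downloading with retries, plus a progress display) might look roughly like this sketch. `download_with_progress` is a hypothetical name; the actual PR presumably streams from the CDS endpoint rather than plain `urllib`, and its progress bar handles multiple files at once.

```python
import sys
import urllib.request

def download_with_progress(url: str, target: str,
                           chunk_size: int = 1 << 16, retries: int = 3) -> str:
    """Download `url` to `target` in chunks, printing a simple progress line.

    Retries the whole download up to `retries` times on I/O errors.
    """
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(url) as response:
                total = int(response.headers.get("Content-Length") or 0)
                done = 0
                with open(target, "wb") as f:
                    while True:
                        chunk = response.read(chunk_size)
                        if not chunk:
                            break
                        f.write(chunk)
                        done += len(chunk)
                        if total:  # size known: show percentage
                            pct = 100 * done // total
                            sys.stderr.write(f"\r{target}: {pct:3d}%")
            if total:
                sys.stderr.write("\n")
            return target
        except OSError:
            if attempt == retries:
                raise
```

Chunked reads keep memory use flat regardless of file size, which matters for multi-gigabyte ERA5 responses.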

These improvements aim to make data retrieval more efficient and user-friendly.

Closes # (if applicable).

Changes proposed in this Pull Request

Checklist

  • Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • Newly introduced dependencies are added to environment.yaml, environment_docs.yaml and setup.py (if applicable).
  • A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • Unit tests for new features were added (if applicable).
  • I consent to the release of this PR's code under the MIT license.

@yndevops2 yndevops2 changed the title Update era5.py Enhancement to ERA5 Data Retrieval and Download Process Nov 3, 2024
@fneum fneum requested a review from lkstrp November 4, 2024 11:03
@lkstrp (Member) left a comment

Thanks @yndevops2 for the contribution!
In general you can contact us via GitHub and PRs; you don't need to send any emails. Still, I haven't heard back from you.

A caching feature like this adds a lot of overhead, and I do not know whether it is needed. Also, the CDS API already provides caching, and there is no real use case for re-downloading data.

@awongel commented Jan 28, 2025

Hi @lkstrp,
We asked @yndevops2 to help us speed up the download, because with the main-branch version of Atlite it was not possible to download global, grid-scale, multi-year capacity factor time series, which we needed for a project. With this upgrade, that is now possible.

The caching is an optional flag anyway, but we can talk about whether you'd be interested in integrating only the sped-up download. The idea behind the caching was that when you change the region of interest to something smaller than what was downloaded before, one could avoid re-downloading the data.

@yndevops2 (Author)

Hi @lkstrp,
Downloading multi-year time series at a global scale is slow because the .nc files are downloaded one by one, and caching only works once all the data has been fully downloaded.

Please check this code:

import atlite
import logging
import geopandas as gpd

def main(year):
    
    logging.basicConfig(level=logging.INFO)

    url = "https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip"

    world = gpd.read_file(url)
    # Drop uninhabited regions and Antarctica
    world = world[(world["POP_EST"] > 0) & (world["NAME"] != "Antarctica")]

    region = world
    region_name = "world"

    # Process the requested year
    logging.info(f"Processing {year}")

    # Define the cutout; this will not yet trigger any major operations
    cutout = atlite.Cutout(
        path=f"{region_name}-{year}_timeseries", module="era5", 
        bounds=region.unary_union.bounds, 
        time=f"{year}",
        chunks={"time": 100,},)
    # This is where all the work happens (this can take some time).
    cutout.prepare(
        compression={"zlib": True, "complevel": 9},
        monthly_requests=True,
        concurrent_requests=True)

    # Extract the wind power generation capacity factors
    wind_power_generation = cutout.wind(
        "Vestas_V112_3MW", 
        capacity_factor_timeseries=True,
        )

    # Extract the solar power generation capacity factors
    solar_power_generation = cutout.pv(
        panel="CSi", 
        orientation='latitude_optimal', 
        tracking="horizontal",
        capacity_factor_timeseries=True,)
    
    # Extract the concentrated solar power (CSP) generation capacity factors
    csp_power_generation = cutout.csp(
        installation="SAM_parabolic_trough", 
        capacity_factor_timeseries=True,)

    # Save gridded data as netCDF files
    wind_power_generation.to_netcdf(f"{region_name}_wind_CF_timeseries_{year}.nc")
    solar_power_generation.to_netcdf(f"{region_name}_solar_CF_timeseries_{year}.nc")
    csp_power_generation.to_netcdf(f"{region_name}_csp_CF_timeseries_{year}.nc")

if __name__ == "__main__":
    main("2023")
