Currently, the automated data downloading includes two steps:

- data download and extraction: `motupydownload`
- format conversion from netCDF to Zarr: `netcdf2zarr`
In the future, there will be more conversion steps (e.g., from Zarr to Parquet) and/or an upload step to the systems of a collaborator.
For real-time downloading from the Copernicus Ocean website, the data are selected according to user-given parameters:

- spatial domain
- depth
- time span
- variables

To extract the data, we use the `motuclient`.
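The selection parameters map onto a `motuclient` call. The sketch below assembles such a call for one request; the long-option spellings (`--service-id`, `--date-min`, ...) follow the `motuclient` CLI, but treat the exact flag set as an assumption and check `motuclient --help` before relying on it. The helper name `build_motuclient_call` is ours, not part of the toolchain.

```python
from __future__ import annotations


def build_motuclient_call(
    service_id: str,
    product_id: str,
    date_min: str,
    date_max: str,
    variables: list[str],
    out_dir: str,
) -> list[str]:
    """Assemble a motuclient invocation for one download request.

    Spatial and depth selection flags (--longitude-min, --depth-min, ...)
    could be appended in the same way.
    """
    call = [
        "motuclient",
        "--service-id", service_id,
        "--product-id", product_id,
        "--date-min", date_min,
        "--date-max", date_max,
        "--out-dir", out_dir,
    ]
    # motuclient takes one --variable flag per requested variable
    for var in variables:
        call += ["--variable", var]
    return call
```

The resulting list can be handed to, e.g., `subprocess.run(call, check=True)`.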
The data are downloaded as netCDF files into directories named after the product:

- for the physics analysis dataset: `global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh/nc/`
- for the wave analysis dataset: `global-analysis-forecast-wav-001-027/nc/`
For each day, a separate `.nc` file is created, named according to the selected variable and the start and end time stamps, e.g., `global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh_uo_2021-01-23_2021-01-24.nc` or `global-analysis-forecast-wav-001-027_VPED_2021-01-23_2021-01-24.nc`.
We'll combine all timesteps into one Zarr store per variable. For the physics analysis dataset, there would, e.g., be `global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh/zarr/global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh_uo_2021-01-01_2021-02-01.zarr/`.
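A minimal sketch of this combination step, assuming `xarray` with a netCDF backend and `zarr` are installed; the helpers `zarr_store_path` and `combine_to_zarr` are hypothetical names for illustration, not the actual `netcdf2zarr` implementation:

```python
from pathlib import Path


def zarr_store_path(basedir, product_id, var, start, end):
    """Target store following the {product-id}/zarr/ directory layout."""
    name = f"{product_id}_{var}_{start}_{end}.zarr"
    return Path(basedir) / product_id / "zarr" / name


def combine_to_zarr(basedir, product_id, var, start, end):
    """Open all per-day netCDF files for one variable and write one Zarr store."""
    import xarray as xr  # lazy import: only needed for the actual conversion

    pattern = str(Path(basedir) / product_id / "nc" / f"{product_id}_{var}_*.nc")
    # Concatenate the daily files along their coordinates (time)
    ds = xr.open_mfdataset(pattern, combine="by_coords")
    ds.to_zarr(str(zarr_store_path(basedir, product_id, var, start, end)))
```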
The general naming scheme is

`{product-id}/{format}/{product-id}_{variable}_{start-date}_{end-date}.{extension}`

with

- `product-id` being, e.g., `global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh` or `global-analysis-forecast-wav-001-027`,
- `format` being `nc`, `zarr`, etc.,
- `variable` being `uo`, `vo`, etc.,
- `start-date` being interpreted as the left inclusive boundary of the time interval covered by the data file / data store,
- `end-date` being interpreted as the right exclusive boundary of the time interval,
- `extension` being `nc`, `zarr/`, etc.
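Since the scheme is regular, file and store names can be taken apart mechanically. The parser below is a hypothetical helper (`parse_store_name` is not part of the toolchain), shown only to make the field boundaries of the scheme explicit:

```python
import re
from pathlib import PurePosixPath

# Fields of {product-id}_{variable}_{start-date}_{end-date}; the variable
# part may not contain underscores, dates are ISO (YYYY-MM-DD).
_NAME_RE = re.compile(
    r"^(?P<product_id>.+)_(?P<variable>[^_]+)"
    r"_(?P<start_date>\d{4}-\d{2}-\d{2})_(?P<end_date>\d{4}-\d{2}-\d{2})$"
)


def parse_store_name(path):
    """Return the naming-scheme fields of a file/store path, or None on mismatch."""
    p = PurePosixPath(path)
    if "." not in p.name:
        return None
    stem, extension = p.name.rsplit(".", 1)
    match = _NAME_RE.match(stem)
    if match is None:
        return None
    return {**match.groupdict(), "extension": extension}
```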
The necessary container images can be built locally:

```shell
docker build -t rasmus-cmems-downloads:motupydownload-latest motupydownload/
docker build -t rasmus-cmems-downloads:netcdf2zarr-latest netcdf2zarr/
```
Or pre-built images can be pulled and tagged:

```shell
docker pull quay.io/willirath/rasmus-cmems-downloads:motupydownload-latest
docker pull quay.io/willirath/rasmus-cmems-downloads:netcdf2zarr-latest
docker tag \
    quay.io/willirath/rasmus-cmems-downloads:motupydownload-latest \
    rasmus-cmems-downloads:motupydownload-latest
docker tag \
    quay.io/willirath/rasmus-cmems-downloads:netcdf2zarr-latest \
    rasmus-cmems-downloads:netcdf2zarr-latest
```
For a help message, just run:

```shell
docker run rasmus-cmems-downloads:motupydownload-latest --help
```

and

```shell
docker run rasmus-cmems-downloads:netcdf2zarr-latest --help
```
To actually download data and authenticate with the Copernicus service providing the data, read on below.
We'll read credentials from environment variables:

```shell
export MOTU_USER="XXXXXXXXXXXXXXXX"
export MOTU_PASSWORD="XXXXXXXXXXXXXXXXX"
```
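Inside the container, these variables can be read and validated early, so a missing credential fails before any request is made. A minimal sketch (the helper name `motu_credentials` is hypothetical):

```python
import os


def motu_credentials():
    """Read MOTU_USER / MOTU_PASSWORD from the environment, failing early if unset."""
    try:
        return os.environ["MOTU_USER"], os.environ["MOTU_PASSWORD"]
    except KeyError as err:
        raise RuntimeError(f"missing credential variable: {err}") from err
```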
To run the container for downloading one month of the physics analysis product into `./data/`, do:

```shell
docker run -v $PWD:/work --rm \
    -e MOTU_USER -e MOTU_PASSWORD \
    rasmus-cmems-downloads:motupydownload-latest \
    --service_id GLOBAL_ANALYSIS_FORECAST_PHY_001_024-TDS \
    --product_id global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh \
    --time_min 2021-01-01 --time_max 2021-02-01 \
    --var uo --var vo --basedir /work/data
```
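Since one `.nc` file is created per day, a `--time_min`/`--time_max` range has to be split into left-inclusive, right-exclusive one-day intervals before requesting. A sketch of that splitting logic (the function `daily_intervals` is our illustration, not the downloader's actual code):

```python
from datetime import date, timedelta


def daily_intervals(time_min, time_max):
    """Split [time_min, time_max) into left-inclusive, right-exclusive day intervals."""
    start = date.fromisoformat(time_min)
    stop = date.fromisoformat(time_max)
    intervals = []
    while start < stop:
        nxt = start + timedelta(days=1)
        # Clip the last interval to the overall right boundary
        intervals.append((start.isoformat(), min(nxt, stop).isoformat()))
        start = nxt
    return intervals
```

For `--time_min 2021-01-01 --time_max 2021-02-01` this yields 31 daily requests, matching the one-file-per-day layout described above.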
To convert the data that was just downloaded to `./data/`, run:

```shell
docker run -v $PWD:/work --rm \
    rasmus-cmems-downloads:netcdf2zarr-latest \
    --product_id global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh \
    --var uo --var vo --basedir /work/data
```
On Nesh, make Singularity available by loading the module:

```shell
module load singularity/3.5.2
```
Then, pull both images:

```shell
singularity pull --disable-cache --dir $PWD \
    docker://quay.io/willirath/rasmus-cmems-downloads:motupydownload-latest
singularity pull --disable-cache --dir $PWD \
    docker://quay.io/willirath/rasmus-cmems-downloads:netcdf2zarr-latest
```
Again, we'll read credentials from environment variables:

```shell
export MOTU_USER="XXXXXXXXXXXXXXXX"
export MOTU_PASSWORD="XXXXXXXXXXXXXXXXX"
```
Then, run the download:

```shell
singularity run -B $PWD:/work \
    rasmus-cmems-downloads_motupydownload-latest.sif \
    --service_id GLOBAL_ANALYSIS_FORECAST_PHY_001_024-TDS \
    --product_id global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh \
    --time_min 2021-01-01 --time_max 2021-02-01 \
    --var uo --var vo \
    --basedir /work/data
```

and the conversion:

```shell
singularity run -B $PWD:/work \
    rasmus-cmems-downloads_netcdf2zarr-latest.sif \
    --product_id global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh \
    --var uo --var vo \
    --basedir /work/data
```
TBD