[Do not merge] Document new structure (#21)
* Black

* Provide default for credentials and fix times

* Fix badge

* First part of the docs done

* Ignore data directories

* Fix var handling and output paths

* Document container usage

* Add sections for TODO's
willirath authored Mar 2, 2021
1 parent 4e44c5e commit a385bec
Showing 4 changed files with 136 additions and 277 deletions.
2 changes: 2 additions & 0 deletions .gitignore
```diff
@@ -1,2 +1,4 @@
 *.nc
 *.csv
+zarr
+nc
```
276 changes: 60 additions & 216 deletions README.md
# CMEMS automated data retrieval

[![build-and-push-images](https://github.com/geomar-od/rasmus-cmems-downloads/actions/workflows/build_and_push_images.yaml/badge.svg)](https://github.com/geomar-od/rasmus-cmems-downloads/actions/workflows/build_and_push_images.yaml)
[![quay.io/willirath/rasmus-cmems-downloads](https://img.shields.io/badge/quay.io-build-blue)](https://quay.io/repository/willirath/rasmus-cmems-downloads)

## Overview

Currently, the automated data downloading includes two steps:

1. data download and extraction: [`motupydownload`](motupydownload/)

2. format conversion from netCDF to [Zarr](https://zarr.readthedocs.io/en/stable/): [`netcdf2zarr`](netcdf2zarr/)

In the future, there will be more conversion steps (e.g., from Zarr to Parquet) and/or an upload step to the systems of a collaborator.

## Description

For real-time downloading from the [Copernicus Ocean website](https://resources.marine.copernicus.eu/?option=com_csw&task=results), the data are selected according to user-given parameters:

- spatial domain
- depth
- time span
- variables

To extract the data we use the [`motuclient`](https://github.com/clstoulouse/motu-client-python/).

## File and directory naming

### netCDF

The data are downloaded as netCDF files into correspondingly named directories (below a base directory that can be chosen via a command-line argument):
- for the [physics analysis dataset](https://resources.marine.copernicus.eu/?option=com_csw&view=details&product_id=GLOBAL_ANALYSIS_FORECAST_PHY_001_024): `global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh/nc/`
- for the [wave analysis dataset](https://resources.marine.copernicus.eu/?option=com_csw&view=details&product_id=GLOBAL_ANALYSIS_FORECAST_WAV_001_027): `global-analysis-forecast-wav-001-027/nc/`

For each day, a separate `.nc` file is created, named after the product, the selected variable, and the start and end time stamps, e.g., `global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh_uo_2021-01-23_2021-01-24.nc` or `global-analysis-forecast-wav-001-027_VPED_2021-01-23_2021-01-24.nc`.

### Zarr

We'll combine all timesteps into one Zarr store per variable.
For the [physics analysis dataset](https://resources.marine.copernicus.eu/?option=com_csw&view=details&product_id=GLOBAL_ANALYSIS_FORECAST_PHY_001_024), there would be, e.g., `global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh/zarr/global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh_uo_2021-01-01_2021-02-01.zarr/`.

### General

`{product-id}/{format}/{product-id}_{variable}_{start-time}_{end-time}.{extension}`
with
- `product-id` being, e.g., `global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh` or `global-analysis-forecast-wav-001-027`
- `format` being `nc`, `zarr`, etc.
- `variable` being `uo`, `vo`, etc.
- `start-time` being interpreted as the left-inclusive boundary of the time interval covered by the data file / data store, and `end-time` as the right-exclusive boundary
- `extension` being `nc`, `zarr/`, etc.
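
As an illustration (a hypothetical sketch, not a script from this repository), the daily zonal-velocity file from the example above would be addressed like this:
```shell
# Hypothetical illustration of the naming convention; none of these shell
# variables are part of the repository's scripts.
product_id="global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh"
variable="uo"
start_time="2021-01-23"  # left-inclusive boundary
end_time="2021-01-24"    # right-exclusive boundary
echo "${product_id}/nc/${product_id}_${variable}_${start_time}_${end_time}.nc"
```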

Note that it is, however, recommended to use Docker ([see below](#usage-with-docker)).

## Usage (with Docker)

### Building or pulling the container images

The necessary container images can be built locally:
```shell
docker build -t rasmus-cmems-downloads:motupydownload-latest motupydownload/
docker build -t rasmus-cmems-downloads:netcdf2zarr-latest netcdf2zarr/
```
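To check that both images are available locally (an optional step, not part of the repository's docs):
```shell
# Both images should be listed with their *-latest tags.
docker image ls rasmus-cmems-downloads
```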
Or pre-built images can be pulled and tagged:
```shell
docker pull quay.io/willirath/rasmus-cmems-downloads:motupydownload-latest
docker pull quay.io/willirath/rasmus-cmems-downloads:netcdf2zarr-latest

docker tag \
    quay.io/willirath/rasmus-cmems-downloads:motupydownload-latest \
    rasmus-cmems-downloads:motupydownload-latest
docker tag \
    quay.io/willirath/rasmus-cmems-downloads:netcdf2zarr-latest \
    rasmus-cmems-downloads:netcdf2zarr-latest
```
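Equivalently, as a loop (a convenience sketch, not part of the repository):
```shell
# Pull and re-tag both images in one go.
for image in motupydownload netcdf2zarr; do
    docker pull "quay.io/willirath/rasmus-cmems-downloads:${image}-latest"
    docker tag \
        "quay.io/willirath/rasmus-cmems-downloads:${image}-latest" \
        "rasmus-cmems-downloads:${image}-latest"
done
```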

### Running the containers

For a help message, just run:
```shell
docker run rasmus-cmems-downloads:motupydownload-latest --help
```
and
```shell
docker run rasmus-cmems-downloads:netcdf2zarr-latest --help
```
For actually downloading data and for authenticating with the Copernicus service providing the data, read on below.

### Data download example

We'll read credentials from environment variables:
```shell
export MOTU_USER="XXXXXXXXXXXXXXXX"
export MOTU_PASSWORD="XXXXXXXXXXXXXXXXX"
```
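To keep the plain-text values out of your shell history, the two `export` lines could also live in a private file that is sourced instead (a sketch; the file name is hypothetical):
```shell
# Load CMEMS credentials from a private file containing the two export
# lines above (hypothetical path).
source ~/.cmems_credentials
```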

To run the container for downloading 10 days of the physics forecast product into `./data/`, do:
```shell
docker run -v $PWD:/work --rm \
    -e MOTU_USER -e MOTU_PASSWORD \
    rasmus-cmems-downloads:motupydownload-latest \
    --service_id GLOBAL_ANALYSIS_FORECAST_PHY_001_024-TDS \
    --product_id global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh \
    --var uo --var vo --basedir /work/data
```
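If the download succeeds, daily netCDF files following the naming convention above should show up (a quick check under that assumption):
```shell
# Lists files like {product-id}_{variable}_{start-time}_{end-time}.nc
ls data/global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh/nc/
```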

### Zarr conversion example

To convert the data that was just downloaded to `./data/`, run:
```shell
docker run -v $PWD:/work --rm \
    rasmus-cmems-downloads:netcdf2zarr-latest \
    --product_id global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh \
    --var uo --var vo --basedir /work/data
```
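Afterwards, there should be one Zarr store (a directory ending in `.zarr/`) per variable, again following the naming convention above:
```shell
# Lists stores like {product-id}_{variable}_{start-time}_{end-time}.zarr
ls -d data/global-analysis-forecast-phy-001-024-hourly-t-u-v-ssh/zarr/*.zarr
```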

## Usage (with Singularity)

TBD
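
Until this section is written, here is a minimal sketch based on an earlier revision of this README, with the image names updated (untested):
```shell
# Pull the Docker image as a Singularity image file (creates a *.sif in $PWD) ...
singularity pull --disable-cache --dir $PWD \
    docker://quay.io/willirath/rasmus-cmems-downloads:motupydownload-latest
# ... and show the downloader's help message.
singularity run rasmus-cmems-downloads_motupydownload-latest.sif --help
```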

## Usage (with local Python installation)

TBD
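
Until this section is written, a sketch based on the dependency list from an earlier revision of this README (`motuclient`, `xarray`, `netCDF4`, `pandas`), with `zarr` assumed for the conversion step:
```shell
python -m pip install motuclient xarray netCDF4 pandas zarr
```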