---
title: "copernicus"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Provides access, download and archiving tools for [Copernicus](https://marine.copernicus.eu/) marine datasets using the R language. This package was developed primarily around the [daily ocean physics forecast](https://data.marine.copernicus.eu/product/GLOBAL_ANALYSISFORECAST_PHY_001_024/download?dataset=cmems_mod_glo_phy_anfc_0.083deg_P1D-m), but it can work with other products with little or no modification.
### Note
In 2023/2024 Copernicus migrated to a new service model; learn more [here](https://help.marine.copernicus.eu/en/articles/7045314-migrating-to-the-new-global-physical-analysis-and-forecasting-system). This migration introduced the [Copernicus Marine Toolbox](https://help.marine.copernicus.eu/en/collections/4060068-copernicus-marine-toolbox), which provides a Python API and a command line interface (CLI). This package leverages the latter.
The toolbox is under active development, so if you are having trouble (as we sometimes do!) try reinstalling. We have some notes [here](https://github.com/BigelowLab/copernicus/wiki/Installation-of-%60copernicusmarine%60).
## Copernicus resources
Copernicus serves **so many** data resources that finding what you want can be a challenge. Check out the new [Marine Data Store](https://marine.copernicus.eu/news/introducing-new-copernicus-marine-data-store), and check out the listing of data producers [here](https://marine.copernicus.eu/about/producers).
### The all-important datasetID
Like data offerings from [OBPG](https://oceancolor.gsfc.nasa.gov/), Copernicus strives to provide consistent dataset identifiers that are easily decoded programmatically (and, with practice, by eye). To download programmatically you must have the datasetID in hand. Learn more about Copernicus [nomenclature rules here](https://help.marine.copernicus.eu/en/articles/6820094-how-is-defined-the-nomenclature-of-copernicus-marine-data#h_34a5a6f21d).
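For example, the components of a datasetID can be teased apart programmatically. This is just an illustration of the naming pattern, not a package function; our reading of the components follows the nomenclature rules linked above.
```
id = "cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m"
# split on single underscores to expose the naming components
strsplit(id, "_")[[1]]
# "cmems" "mod" "glo" "phy-cur" "anfc" "0.083deg" "P1D-m"
# roughly: producer, source type (model), area (global), variable group
# (physics - currents), product (analysis/forecast), resolution, period (daily mean)
```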
### `get` or `subset`
The [Copernicus Marine Toolbox](https://help.marine.copernicus.eu/en/collections/4060068-copernicus-marine-toolbox) command-line application, `copernicusmarine`, provides two primary methods for downloading data: `get` and `subset`. `get` is not well documented, but `subset` does what its name implies: it subsets resources by variable, spatial bounding box and time. This package only supports `subset`.
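For orientation, a hand-rolled `subset` request might look like the sketch below; `fetch_copernicus()` (shown later) assembles and runs something similar for you. The flag names follow the toolbox documentation at the time of writing and may change between toolbox versions.
```
# a sketch of a raw CLI subset request, run from R; normally you
# would let fetch_copernicus() build and execute this for you
system2("copernicusmarine",
        c("subset",
          "--dataset-id", "cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m",
          "--variable", "uo", "--variable", "vo",
          "--minimum-longitude", "-72", "--maximum-longitude", "-63",
          "--minimum-latitude", "39", "--maximum-latitude", "46",
          "--start-datetime", format(Sys.Date()),
          "--end-datetime", format(Sys.Date() + 9)))
```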
## Requirements
+ [R v4.1+](https://www.r-project.org/)
+ [rlang](https://CRAN.R-project.org/package=rlang)
+ [dplyr](https://CRAN.R-project.org/package=dplyr)
+ [ncdf4](https://CRAN.R-project.org/package=ncdf4)
+ [sf](https://CRAN.R-project.org/package=sf)
+ [stars](https://CRAN.R-project.org/package=stars)
+ [readr](https://CRAN.R-project.org/package=readr)
+ [twinkle](https://github.com/BigelowLab/twinkle) (not on CRAN)
## Installation
```
remotes::install_github("BigelowLab/twinkle")
remotes::install_github("BigelowLab/copernicus")
```
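If any of the CRAN requirements listed above are missing, install them first (including `remotes` itself, which handles the GitHub installs).
```
# install the CRAN dependencies listed under Requirements
install.packages(c("remotes", "rlang", "dplyr", "ncdf4", "sf", "stars", "readr"))
```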
## Configuration
You can preconfigure a credentials file (required) and a path definition file (optional) to streamline accessing and storing data.
### Configure credentials
You must have credentials to access Copernicus holdings; if you don't have them yet, please request access [here](https://data.marine.copernicus.eu/register).
Once you have them, run the [`copernicusmarine` app](https://github.com/BigelowLab/copernicus/wiki/Installation-of-%60copernicusmarine%60) to set them.
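The toolbox's `login` command stores your credentials locally; you can run it from a terminal, or from R as sketched below.
```
# run the toolbox login from R; it prompts for your Copernicus
# username and password and caches them for later use
system2("copernicusmarine", "login")
```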
### Configure data path
If you plan to use our directory-driven database storage system then you should set the root path for the data directory. You can always change or override it, but, like credentials, storing this path in a hidden file eases subsequent use of the functions. We don't actually run this in the README, but you can copy-and-paste it into R; replace the path with one suiting your own situation.
```
set_root_path("/the/path/to/copernicus/data")
```
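Once the root path is stored, `copernicus_path()` (used throughout the examples below) composes paths relative to it.
```
# builds "/the/path/to/copernicus/data/gom" given the root path above
copernicus_path("gom")
```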
## Fetching data
To fetch data we'll focus on the [ocean physics daily forecast](https://data.marine.copernicus.eu/product/GLOBAL_ANALYSISFORECAST_PHY_001_024/download?dataset=cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m), which serves daily mean sea surface currents. We'll define a date range and a bounding box that covers the Gulf of Maine (gom).
```{r fetch}
suppressPackageStartupMessages({
library(stars)
library(copernicus)
library(dplyr)
})
dataset_id = "cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m"
vars = c("uo","vo")
bb = c(xmin = -72, xmax = -63, ymin = 39, ymax = 46)
depth = c(0,1) # just the near-surface layer (top meter)
time = (c(0, 9) + Sys.Date()) # today plus a nine-day forecast window
x = fetch_copernicus(dataset_id = dataset_id,
vars = vars,
bb = bb,
time = time,
cleanup = FALSE,
verbose = TRUE)
x
```
We can plot a subset of these using base graphics and the `[` function. `stars` objects are indexed first by the attribute (variable) followed by the dimensions. In this case the index order is [`attribute`, `x`, `y`, `depth`, `time`] or [`attribute`, `x`, `y`, `time`] for single-depth objects.
```{r first_plot}
itime = 1
date = format(st_get_dimension_values(x, "time")[itime], "%Y-%m-%d")
plot(x[1, , , , itime], key.lab = paste("uo for each depth on", date))
```
If you want to plot all of the times for a given depth you can use `slice`.
```{r second_plot}
idepth = 1
depth = round(st_get_dimension_values(x, 'depth')[idepth],2)
plot(slice(x, "depth", idepth),
key.lab = paste0("uo at depth ", depth, "m"))
```
## Archiving data
You can download and archive data using the database functionality provided in this package. There are a number of ways to manage suites of data; this is just one fairly lightweight method.
Here, we store data in a directory tree that starts with `region` at its root. Within each `region` we divide by `year` and then `monthday`. Within each `monthday` directory there are one or more files uniquely named to completely identify the datasetid, time, depth, period, variable and treatment. Each file contains one raster for one variable at one depth and one time.
Here is an example of a file name that follows this pattern.
`datasetid__time_depth_period_variable_treatment.tif`
```
gom/2023/1205/cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m__2023-12-05T000000_0.49_day_uo_raw.tif
```
Here you can see that the datasetid and the remaining identifiers are separated by a double underscore to aid programmatic parsing. Time includes the hour in case we ever want to download the 6-hour datasets. The depth is rounded to three significant digits by default, but that is configurable. The treatment, `raw`, in this case means the values are as downloaded; if you ever wanted to roll your own statistic (say, an 8-day running mean) this naming system provides the flexibility you will need.
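As an illustration (not a package function), splitting such a file name is straightforward:
```
f = "cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m__2023-12-05T000000_0.49_day_uo_raw.tif"
# the double underscore separates the datasetid from the other identifiers
parts = strsplit(f, "__")[[1]]
datasetid = parts[1]
# the remainder splits on single underscores into
# time, depth, period, variable and treatment
strsplit(sub("\\.tif$", "", parts[2]), "_")[[1]]
# "2023-12-05T000000" "0.49" "day" "uo" "raw"
```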
**NOTE** Don't forget to [set your root data path](#configure-data-path).
First we define an output path for the Gulf of Maine data; the path isn't created until data is written to it. Then we simply call `archive_copernicus()` to automatically write individual GeoTIFF files into the database structure. Note that we supply an identifier that records the provenance of the data. We receive, in turn, a table that serves as a database.
```{r archive}
path = copernicus_path("gom")
db = archive_copernicus(x, path = path, id = 'cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m') |>
dplyr::glimpse()
```
Since this is the first time you have downloaded and archived data, be sure to save the database.
```{r write_database}
write_database(db, path)
```
### Using the database
The database is very light and easy to filter for just the records you need. Note that depth is a character data type; this gives you the flexibility to define depth as 'surface' or '50-75' or something like that.
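For example, to keep just the surface layer, match the depth label as a string. The label here (`"0.49"`) follows the file name example above; in practice use whatever labels your archive contains.
```
# depth is character, so match the label rather than testing numerically
surface = dplyr::filter(db, depth == "0.49")
```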
Let's walk through reading the database, filtering it for a subset, reading the files and finally displaying them.
```{r database, message = FALSE}
db <- copernicus::read_database(path) |>
dplyr::filter(dplyr::between(date, Sys.Date(), Sys.Date()+1)) |>
dplyr::glimpse()
```
Now we can read in the files.
```{r, read_files}
s = read_copernicus(db, path)
s
```
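The result is a `stars` object like the one returned by `fetch_copernicus()`, so the earlier plotting recipes apply; for example (assuming the re-read object carries a `time` dimension, as the fetched one did):
```
# plot the first time step of the re-read data
plot(slice(s, "time", 1))
```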