clifo -> cliflo (typo) and other chores
alpha-beta-soup committed Sep 9, 2024
1 parent 9869b6b commit fff6db1
Showing 15 changed files with 30 additions and 4,574,865 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,2 +1,3 @@
.snakemake/
config.local.yml
config.local.yml
static/cliflo/all_cf_*.csv
17 changes: 11 additions & 6 deletions README.md
@@ -114,10 +114,15 @@ This was more successful, although the data is relatively low resolution (30 arc
| Greymouth | West Coast | Regional | Continuous | Continuous |
| Dunedin | Otago | Regional | 1 Jul | 31 Jul |

Citations:

### Clifo
Stefan Lange, Matthias Mengel, Simon Treu, Matthias Büchner (2023): ISIMIP3a atmospheric climate input data (v1.2). ISIMIP Repository. https://doi.org/10.48364/ISIMIP.982724.2

![Map of New Zealand indicating the end of the growing season, using Clifo data](./maps/clifo/maps.png)
Dirk N. Karger, Stefan Lange, Chantal Hari, Christopher P.O. Reyer, Niklaus E. Zimmermann (2022): CHELSA-W5E5 v1.0: W5E5 v1.0 downscaled with CHELSA v2.0. ISIMIP Repository. https://doi.org/10.48364/ISIMIP.836809.3

### CliFlo

![Map of New Zealand indicating the end of the growing season, using CliFlo data](./maps/cliflo/maps.png)

## Method

@@ -129,11 +134,11 @@ This was more successful, although the data is relatively low resolution (30 arc
4. Calculate the median frost day for the first and last frost across a range of years. The output is written as NetCDF data, to match the input. (Rule `median_frost_doy`.)
5. We can also produce a summary table to compare against the baseline data. (Rule `summary_table`.)

### Clifo
### CliFlo

1. Data is downloaded from Clifo. This is done in a semi-automated fashion, but is not part of the workflow because it does require manual intervention. (The reason is that the Clifo database has a download limit of 2 million rows, which this exceeds. When the limit is reached, no data is returned until the user manually resets the user account.)
2. Clifo data is cleaned. Duplicate data (due to the semi-automated download process) is removed. Interpolation is performed to fill some small gaps (up to three days by default) using an [Akima spline interpolation](https://en.wikipedia.org/wiki/Akima_spline). In the same step, the first and last date of the frost threshold being breached is recorded, on an annual basis. (Rule `clean_clifo_data`.)
3. The median frost first/last day-of-year is captured for each station, over a period. The growing season is determined by these values. By default, stations are excluded from the result set if they are not present in the record for 6/18 of the requested period (in years). Also, stations are filtered out if they are within a distance threshold of other stations (the station with the longest record within the period is kept). (Rule `median_clifo_data`)
1. Data is downloaded from CliFlo. This is done in a semi-automated fashion, but is not part of the workflow because it does require manual intervention. (The reason is that the CliFlo database has a download limit of 2 million rows, which this exceeds. When the limit is reached, no data is returned until the user manually resets the user account.)
2. CliFlo data is cleaned. Duplicate data (due to the semi-automated download process) is removed. Interpolation is performed to fill some small gaps (up to three days by default) using an [Akima spline interpolation](https://en.wikipedia.org/wiki/Akima_spline). In the same step, the first and last date of the frost threshold being breached is recorded, on an annual basis. (Rule `clean_cliflo_data`.)
3. The median frost first/last day-of-year is captured for each station, over a period. The growing season is determined by these values. By default, stations are excluded from the result set if they are not present in the record for 6/18 of the requested period (in years). Also, stations are filtered out if they are within a distance threshold of other stations (the station with the longest record within the period is kept). (Rule `median_cliflo_data`)
4. To produce an interpolated result, we use a thin plate spline radial basis function. The dependent variables are the median day of year (or median growing season period, in days). The covariates are elevation and coastal proximity. Stations may be removed if they exceed a Mahalanobis distance threshold (considering the dependent variable, together with elevation). The growing season is interpolated first (using all remaining stations) then the median date of the last killing frost (omitting stations with no frost record to avoid specifying a fill value). The median date of the first frost is then derived from these estimates so that the output is internally consistent (i.e. the growing season length accords with the median dates). The output is written as three bands of a GeoTIFF with band descriptions. (Rule `thin_plate_spline`).

Various parameters can be adjusted without editing the scripts, such as the value of ε, the number of neighbours to consider, the smoothness of the approximation, whether to perform outlier detection, the outlier threshold, the spatial coincidence threshold, whether to include coastal proximity as a covariate, and whether to log transform the coastal proximity measurement.
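Steps 2 and 3 of the CliFlo method in the README (recording the first and last frost of each year, then deriving a growing season) can be sketched roughly as follows. This is a minimal, standard-library illustration rather than the repository's code: the function name `frost_summary`, the 0 °C threshold, and the strict `<` comparison are assumptions made for the example. The infinite growing season for frost-free years mirrors the `fillna(np.inf)` seen in `scripts/median_cliflo.py`.

```python
import calendar
import math
from datetime import date, timedelta

FROST_THRESHOLD_C = 0.0  # assumed value; the workflow's real threshold is configurable


def frost_summary(year, daily_min_temps, threshold=FROST_THRESHOLD_C):
    """Return (first_frost_doy, last_frost_doy, growing_season_days) for one year.

    daily_min_temps maps datetime.date -> daily minimum temperature in deg C.
    A year with no frost yields (None, None, math.inf).
    """
    days_in_year = 365 + calendar.isleap(year)
    # Day-of-year of every date in this year that falls below the threshold
    frost_days = sorted(
        d.timetuple().tm_yday
        for d, temp in daily_min_temps.items()
        if d.year == year and temp < threshold
    )
    if not frost_days:
        return None, None, math.inf
    first, last = frost_days[0], frost_days[-1]
    # Frost period spans first to last frost inclusive, never negative
    frost_period = max(last - first + 1, 0)
    return first, last, days_in_year - frost_period


# Usage: a mild year with frosts on 1 May and 30 September only
temps = {date(2021, 1, 1) + timedelta(days=i): 10.0 for i in range(365)}
temps[date(2021, 5, 1)] = -1.0
temps[date(2021, 9, 30)] = -2.0
print(frost_summary(2021, temps))  # (121, 273, 212)
```

Working within a calendar year suits New Zealand, where the frost period falls mid-year; a northern-hemisphere dataset would need a different year boundary.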
4 changes: 2 additions & 2 deletions Snakefile
@@ -21,9 +21,9 @@ FROST_PERIODS = [

include: "rules/elevation.smk"
include: "rules/chelsa.smk"
include: "rules/clifo.smk"
include: "rules/cliflo.smk"

rule all:
input:
chelsa=map(lambda period: expand(SUMMARY_TABLE_CSV, start=period[0], end=period[-1]), FROST_PERIODS),
clifo=map(lambda period: expand(MEDIAN_CLIFO_TPS, start=period[0], end=period[-1]), FROST_PERIODS)
cliflo=map(lambda period: expand(MEDIAN_CLIFLO_TPS, start=period[0], end=period[-1]), FROST_PERIODS)
File renamed without changes
File renamed without changes.
File renamed without changes.
24 changes: 12 additions & 12 deletions scripts/median_clifo.py → scripts/median_cliflo.py
@@ -24,13 +24,13 @@

days_in_year = lambda year: 365 + calendar.isleap(int(year))

clifo = gpd.read_parquet(smk.input[0])
cliflo = gpd.read_parquet(smk.input[0])

clifo = clifo[
(clifo['Year'] >= int(start)) & (clifo['Year'] <= int(end))
cliflo = cliflo[
(cliflo['Year'] >= int(start)) & (cliflo['Year'] <= int(end))
]
clifo['start'] = pd.to_datetime(clifo['start'])
clifo['end'] = pd.to_datetime(clifo['end'])
cliflo['start'] = pd.to_datetime(cliflo['start'])
cliflo['end'] = pd.to_datetime(cliflo['end'])

range_start = pd.to_datetime(f'{start}-01-01')
range_end = pd.to_datetime(f'{end}-12-31')
@@ -44,19 +44,19 @@ def calculate_duration_in_range(start, end, range_start, range_end):
else:
return 0

clifo['duration_in_range'] = clifo.apply(
cliflo['duration_in_range'] = cliflo.apply(
lambda row: calculate_duration_in_range(row['start'], row['end'], range_start, range_end), axis=1
).round(0)

# Ignore records with insufficient record lengths
clifo = clifo[clifo['duration_in_range'] >= min_period_Y]
cliflo = cliflo[cliflo['duration_in_range'] >= min_period_Y]

clifo['First_Frost_DOY'] = clifo['First_Date'].dt.dayofyear
clifo['Last_Frost_DOY'] = clifo['Last_Date'].dt.dayofyear
clifo['Frost_Period_D'] = (clifo['Last_Frost_DOY'] - clifo['First_Frost_DOY'] + 1).clip(lower=0)
cliflo['First_Frost_DOY'] = cliflo['First_Date'].dt.dayofyear
cliflo['Last_Frost_DOY'] = cliflo['Last_Date'].dt.dayofyear
cliflo['Frost_Period_D'] = (cliflo['Last_Frost_DOY'] - cliflo['First_Frost_DOY'] + 1).clip(lower=0)


clifo['Growing_Season_D'] = (clifo['Year'].apply(days_in_year) - clifo['Frost_Period_D']).fillna(np.inf)
cliflo['Growing_Season_D'] = (cliflo['Year'].apply(days_in_year) - cliflo['Frost_Period_D']).fillna(np.inf)

def median_frost_day(series, fill_value=np.nan):
# Calculate the median
@@ -69,7 +69,7 @@ def median_frost_day(series, fill_value=np.nan):
return median if placeholder_fraction < 0.5 else np.nan

group_by_cols = ['Station', 'network', 'agent', 'start', 'end', 'open', 'lat', 'lon', 'geometry', 'duration_in_range']
medians = gpd.GeoDataFrame(clifo.groupby(group_by_cols).agg({
medians = gpd.GeoDataFrame(cliflo.groupby(group_by_cols).agg({
'First_Frost_DOY': lambda x: median_frost_day(x, fill_value=365),
'Last_Frost_DOY': lambda x: median_frost_day(x, fill_value=0.0),
'Growing_Season_D': 'median',
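The body of `median_frost_day` is collapsed in the hunk above, so the following is only a guess at the idea its visible lines suggest: frost-free years are replaced by a placeholder (`365` for the first frost, `0.0` for the last), the median is taken over the filled values, and the result is discarded when half or more of the years were placeholders. The name `median_with_placeholder` and the plain-list input are hypothetical; the real function operates on a pandas series.

```python
import math
from statistics import median


def median_with_placeholder(values, fill_value):
    """Median of values, with missing entries (None or NaN) set to fill_value.

    Returns NaN when the input is empty or when at least half of the
    entries were missing, since the median would then be a placeholder.
    """
    if not values:
        return math.nan
    is_missing = [v is None or math.isnan(v) for v in values]
    filled = [fill_value if missing else v for v, missing in zip(values, is_missing)]
    placeholder_fraction = sum(is_missing) / len(values)
    return median(filled) if placeholder_fraction < 0.5 else math.nan


# Usage: one frost-free year out of four still gives a usable median
print(median_with_placeholder([100, 110, None, 120], fill_value=365))  # 115.0
```

Filling with an extreme day-of-year rather than dropping the year keeps frost-free years pulling the median toward "no frost", which seems to be the intent of the two different fill values.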
File renamed without changes.
6 changes: 3 additions & 3 deletions static/clifo/download.R → static/cliflo/download.R
@@ -6,8 +6,8 @@ install.packages("lubridate")
library(lubridate)
library(dplyr)

richard.cfuser = clifro::cf_user(
username="[email protected]",
user.cfuser = clifro::cf_user(
username="[email protected]",
password="**********"
)

@@ -46,7 +46,7 @@ for (region in regions) {
cf_data <- tryCatch({
# Attempt to query the data
cf_data <- clifro::cf_query(
user = richard.cfuser,
user = user.cfuser,
datatype = temp.dt,
station = temp.dt.stations,
start_date = start_date,