# Load Duration Curves {#sec-ldc}
The Load Duration Curve (LDC) is a simple method for visualizing and characterizing water quality concentrations across different flow regimes. It is supported by EPA, TCEQ, and TSSWCB for Total Maximum Daily Load (TMDL) and Watershed Protection Plan (WPP) development. The LDC is an extension of the flow duration curve (FDC). The FDC is a cumulative distribution plot with observed mean daily streamflow values on the y-axis and the proportion of time a given flow is exceeded on the x-axis. More specifically, the FDC displays the exceedance probability $p$ on the x-axis and the associated discharge, $Q$, on the y-axis [@vogelFlowDurationCurves1994]. The LDC is developed by multiplying an allowable pollutant concentration (water quality standard or screening level) by the daily streamflow volume to identify the allowable pollutant load across flow duration intervals. Measured pollutant concentrations are added on top of the duration curve by multiplying the concentration and streamflow volume on a given day to derive an instantaneous load at a given exceedance percentile. By overlaying measured values on the duration curve, we can infer the flow conditions under which pollutant concentrations and loads exceed water quality thresholds.
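Stated compactly (this notation is my own shorthand for the definitions above, not taken from the cited references), the two quantities plotted on an LDC are

$$
L_{\text{allowable}}(p) = C_{\text{criterion}} \times Q(p) \times k, \qquad
L_{\text{measured}} = C_{\text{measured}} \times Q \times k
$$

where $Q(p)$ is the mean daily streamflow at exceedance probability $p$, $C$ is a concentration, and $k$ is a unit-conversion constant (for concentrations in MPN/100 mL and flow in cfs, $k = 28{,}316.8 \times 86{,}400 / 100$ yields loads in MPN/day, the same constants used in the code later in this chapter).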
The LDC approach is most appropriate in water bodies where there is some type of correlation between flow condition and concentration (typically rivers and streams where loading is tied to runoff and there are not strong accumulation processes). The LDC approach is typically not appropriate for lakes and estuaries. The LDC approach can be modified to account for tidal influences.
@u.s.environmentalprotectionagencyApproachUsingLoad2007 provides a good introduction to and discussion of the appropriate uses of LDCs for both TMDL and WPP development.
## Data
This example uses *E. coli* bacteria concentrations collected at SWQMIS station `12517` on Tres Palacios Creek. Streamflow data comes from the co-located USGS streamgage `08162600`.
### Water Quality
:::{.column-margin}
**Get the data:**
These examples use the `swqmispublicdata.txt` data in the [example data](https://github.com/TxWRI/r-manual/raw/main/data/tutorial.zip)
:::
We will import the water quality data the same way as covered in @sec-waterquality.
```{r}
#| message: false
# install.packages("janitor")
library(tidyverse)
library(ggtext)
library(janitor)
library(dataRetrieval)
library(DescTools)
library(twriTemplates)
df <- read_delim("data/swqmispublicdata.txt",
delim = "|",
# note: the awkward column names (with spaces) have to be
# wrapped in backticks (`).
col_types = cols_only(
`Segment ID` = col_character(),
`Station ID` = col_factor(),
`Station Description` = col_character(),
`End Date` = col_date(format = "%m/%d/%Y"),
`Collecting Entity` = col_character(),
`Monitoring Type` = col_character(),
`Composite Category` = col_character(),
`Parameter Name` = col_character(),
`Parameter Code` = col_character(),
`Value` = col_number(),
`RFA/Tag ID` = col_character()
)) |>
clean_names() |>
filter(station_id == "12517") |>
filter(parameter_code == "31699")
df |> glimpse()
```
### Hydrology
Streamflow data is obtained from the USGS NWIS using the *dataRetrieval* package. It is important to note that streamflow data might not be static. USGS might update the data at some point due to new rating curves, data quality checks, etc. We want to work off a snapshot of the data.
:::{.column-margin}
:::{.callout-important}
In a project workflow, this download would live in its own script that saves the data to a csv. Your analysis script then reads the csv, so you can rerun the analysis without re-downloading the data or risking changes in the source data. Frequent repeated downloads might also cause server issues or get your IP temporarily blocked or rate-limited by the NWIS server.
:::
:::
```{r}
#| message: false
# This should be in a separate data download script!
Q_df <- readNWISdv(siteNumbers = "08162600",
startDate = "2000-01-01",
endDate = "2020-12-31",
parameterCd = "00060",
statCd = "00003")
Q_df <- renameNWISColumns(Q_df) |>
clean_names()
# save the data and work off the saved data
# write_csv(Q_df, "data/streamflow_08162600.csv")
# Q_df <- read_csv("data/streamflow_08162600.csv")
Q_df |> glimpse()
```
## Manual Method
First we will go through a manual method for developing the FDC and LDC. I go through this because it is worth understanding the specific steps for developing the FDC and LDC. The [*ldc*](https://txwri.github.io/ldc/index.html) package is available to streamline these steps, but I highly recommend understanding the fundamentals first.
### Flow Duration Curve
The first step in calculating the FDC is calculating the exceedance probability for each streamflow value. Without delving into statistics, there are a surprising number of ways to accomplish this. If you have lots of streamflow values, the method you choose does not matter much [@vogelFlowDurationCurves1994]. For manual calculations, we typically calculate the exceedance probability as the rank of a given flow (in descending order) divided by the number of streamflow values plus 1 [@morrisonDevelopmentDurationCurveBased2008]:
$$
p_i = \frac{i}{n+1}
$$
where $p_i$ is the exceedance probability, $i$ is the rank number of a given streamflow and $n$ is the number of observations. $p_i$ is also called the Weibull plotting position which is the mean of the cumulative distribution function of the $i$th observation [@gumbelStatisticsExtreme1958].
There are two methods for calculating this in *dplyr*. The first involves the direct calculation: ranking flows in descending order and dividing by the length plus one. The second involves using the `ppoints()` function in R, which returns ordered probability points for a given vector. By setting `a = 0` we specify that the function returns the Weibull plotting positions.
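As a quick check (a throwaway sketch, not part of the analysis), `ppoints()` with `a = 0` reproduces the $i/(n+1)$ formula for a small sample:

```{r}
#| message: false
## with a = 0, ppoints() returns the Weibull plotting positions i/(n + 1)
n <- 5
all.equal(ppoints(n, a = 0), (1:n) / (n + 1))
## exceedance probabilities are then 1 - ppoints(n, a = 0)
```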
```{r}
#| message: false
Q_df <- Q_df |>
select(date, flow) |>
arrange(flow) |>
mutate(
## direct calculation if you prefer
flow_exceedance = rank(desc(flow), ties.method = "last")/(length(flow)+1),
## or weibull pp function, you don't need to do both!
flow_exceedance_1 = 1-ppoints(flow, a = 0))
ggplot(Q_df) +
geom_line(aes(flow_exceedance, flow)) +
scale_y_log10() +
theme_TWRI_print() +
labs(y = "Mean Daily Flow [cfs]", x = "Proportion of Days Flow Exceeded")
```
The resulting FDC shows us the percent of time over the *entire period of record* that mean daily streamflows were exceeded. For example, the figure above shows the maximum streamflow was around 10,000 cfs and the minimum was less than 1 cfs. Also, 80% of the time, streamflows exceeded about 10 cfs.
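These values can also be read off numerically rather than from the plot; as a quick sketch using the `Q_df` built above, the 0.20 quantile of flow is the value exceeded on roughly 80% of days:

```{r}
#| message: false
## minimum flow, flow exceeded ~80% of days, and maximum flow
quantile(Q_df$flow, probs = c(0, 0.2, 1), na.rm = TRUE)
```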
### Load Duration Curve
Now the FDC can be converted to an LDC by multiplying the mean daily streamflow volume by the allowable bacteria concentration. The general steps are:
- convert mean daily discharge (cfs) to a daily volume of water (cubic feet, mL, whatever);
- multiply the pollutant concentration (here the allowable criterion; later, the measured values) by the daily volume.
This results in total mass or counts of pollutant per day.
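The conversion constants used in the next chunk follow directly from these steps (1 cubic foot is 28,316.8 mL and there are 86,400 seconds in a day). A minimal sketch of the arithmetic at the 126 MPN/100 mL geomean criterion and a flow of 1 cfs:

```{r}
## unit-conversion constants used in the chunk below
ml_per_cubic_foot <- 28316.8  # mL in one cubic foot
seconds_per_day <- 86400      # seconds in one day
## allowable load (MPN/day) = (126 MPN/100 mL / 100) * 1 cfs * mL/ft^3 * s/day
(126 / 100) * 1 * ml_per_cubic_foot * seconds_per_day
```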
```{r}
#| message: false
Q_df <- Q_df |>
# We don't need both flow exceedance columns
select(-c(flow_exceedance_1)) |>
# (MPN/100mL / 100) * cubic feet/sec * mL/cubic foot * sec/day = MPN/day
mutate(ldc = (126/100) * flow * 28316.8 * 86400)
ggplot(Q_df) +
geom_line(aes(flow_exceedance, ldc)) +
scale_y_log10() +
theme_TWRI_print() +
labs(y = "*E. coli* [MPN/day]", x = "Proportion of Days Load Exceeded") +
theme(axis.title.y = element_markdown())
```
The LDC looks exactly the same as the FDC; only the units on the y-axis change to pollutant load per day. The next step is to add the measured concentrations to the figure. We need to join the bacteria data to the flow data, then calculate the measured loads. In order to plot this and label the legends properly, we need to manually set some of the aesthetic values in *ggplot2*.
```{r}
#| message: false
#| warning: false
ecoli_df <- df |>
select(station_id, end_date, parameter_code, value)
Q_df <- Q_df |>
left_join(ecoli_df, by = c("date" = "end_date")) |>
mutate(measured_load = (value/100) * flow * 28316.8 * 86400)
ggplot(Q_df) +
geom_line(aes(flow_exceedance, ldc,
linetype = "Allowable Load at Geomean Criterion (126 MPN/100 mL)")) +
geom_point(aes(flow_exceedance, measured_load,
shape = "Measurement Value (MPN/day)",
color = "Measurement Value (MPN/day)")) +
scale_y_log10() +
scale_shape_manual(name = "values", values = c(21)) +
scale_color_manual(name = "values", values = c("dodgerblue4")) +
theme_TWRI_print() +
labs(y = "*E. coli* [MPN/day]", x = "Proportion of Days Load Exceeded") +
theme(axis.title.y = element_markdown(),
legend.direction = "vertical",
legend.title = element_blank())
```
We can already see general trends in the LDC and bacteria data. There is clearly higher variance, and probably a larger difference between the median measured load and the allowable load, at high flows. As flows decrease (toward the right-hand side of the graph), a higher proportion of measured loads appear to fall below the allowable load line.
There are several ways to quantify this. The easiest to explain to a general audience is to (1) split the flows into different regimes, (2) take the geomean of the loads within each flow regime, and (3) take the difference between the geomean measured load and the median allowable load. Alternatively, we could fit a log-regression, LOADEST, or generalized additive model to the data to estimate loads across all exceedance percentiles. The former approach is shown below:
```{r}
#| message: false
# create a summary table
load_summary <- Q_df |>
# classify flow conditions based on exceedance
mutate(flow_condition = case_when(
flow_exceedance >= 0 & flow_exceedance < 0.1 ~ "Highest Flows",
flow_exceedance >= 0.1 & flow_exceedance < 0.4 ~ "Moist Conditions",
flow_exceedance >= 0.4 & flow_exceedance < 0.6 ~ "Mid-range Conditions",
flow_exceedance >= 0.6 & flow_exceedance < 0.9 ~ "Dry Conditions",
flow_exceedance >= 0.9 & flow_exceedance <= 1 ~ "Lowest Flows"
)) |>
group_by(flow_condition) |>
summarize(median_flow = quantile(flow, 0.5, type = 5,
names = FALSE, na.rm = TRUE),
median_p = round(quantile(flow_exceedance,
.5, type = 5, names = FALSE,
na.rm = TRUE), 2),
geomean_ecoli = Gmean(value, na.rm = TRUE),
allowable_load = median_flow * 126/100 * 28316.8 * 86400,
geomean_load = median_flow * geomean_ecoli/100 * 28316.8 * 86400,
reduction_needed = case_when(
allowable_load < geomean_load ~ geomean_load - allowable_load,
allowable_load >= geomean_load ~ 0),
percent_reduction_needed = reduction_needed/geomean_load *100) |>
arrange(median_p) |>
mutate(flow_condition = as_factor(flow_condition))
load_summary
```
Now we have some summary data that includes median values for each flow regime and estimates of required reductions. The next step is to add this information to the ggplot.
```{r}
#| message: false
#| warning: false
# set the y-axis value for the flow-regime labels
label_max <- max(Q_df$measured_load, na.rm = TRUE) +
(0.5 * max(Q_df$measured_load, na.rm = TRUE))
ggplot(Q_df) +
## add some lines to indicate flow regimes
geom_vline(xintercept = c(.10, .40, .60, .90), color = "#cccccc") +
## add ldc line
geom_line(aes(flow_exceedance, ldc,
linetype = "Allowable Load at Geomean Criterion (126 MPN/100 mL)")) +
## add measured loads
geom_point(aes(flow_exceedance, measured_load,
shape = "Measurement Value (MPN/day)",
color = "Measurement Value (MPN/day)")) +
## add summarized measured loads
geom_point(data = load_summary, aes(median_p, geomean_load,
shape = "Exisiting Geomean Load (MPN/day)",
color = "Exisiting Geomean Load (MPN/day)")) +
## log10 y-axis
scale_y_log10() +
## shrink the ends of the x-axis a little bit
scale_x_continuous(expand = c(0.005,0.005)) +
## manually set the shapes for the point aesthetics
scale_shape_manual(name = "values", values = c(12, 21)) +
## manually set the colors for the color aesthetic
scale_color_manual(name = "values", values = c("red", "dodgerblue4")) +
## I like these tick marks that indicate a log-transformed scale
annotation_logticks(sides = "l", color = "#cccccc") +
## add some labels to the flow-regimes
annotate("text", x = .05, y = label_max,
label = "High\nflows", hjust = 0.5, size = 3,
family = "OpenSansCondensed_TWRI", lineheight = 1) +
annotate("text", x = .25, y = label_max,
label = "Moist\nconditions", hjust = 0.5, size = 3,
family = "OpenSansCondensed_TWRI", lineheight = 1) +
annotate("text", x = .50, y = label_max,
label = "Mid-range\nflows", hjust = 0.5, size = 3,
family = "OpenSansCondensed_TWRI", lineheight = 1) +
annotate("text", x = .75, y = label_max,
label = "Dry\nconditions", hjust = 0.5, size = 3,
family = "OpenSansCondensed_TWRI", lineheight = 1) +
annotate("text", x = .95, y = label_max,
label = "Low\nflows", hjust = 0.5, size = 3,
family = "OpenSansCondensed_TWRI", lineheight = 1) +
## labels
labs(y = "*E. coli* [MPN/day]", x = "Proportion of Days Load Exceeded") +
## general theme
theme_TWRI_print() +
theme(axis.title.y = element_markdown(),
legend.direction = "vertical",
legend.title = element_blank())
```
## ldc Package
Some of the steps described above can be skipped by using the [*ldc*](https://txwri.github.io/ldc/) package. The package includes functions for period of record LDCs and annualized LDCs, which are not covered here. One important piece of functionality introduced with the package is the use of [*units*](https://github.com/r-quantities/units/).
```{r}
#| message: false
# install.packages("ldc",
# repos = c(txwri = 'https://txwri.r-universe.dev',
# CRAN = 'https://cloud.r-project.org'))
library(ldc)
library(units)
## ldc uses the units package to facilitate unit conversions
## we need to create the MPN unit first, since it isn't included
## in the units package
install_unit("MPN")
## get a new clean dataframe with column of Date, flows, and E. coli
Q_df <- readNWISdv(siteNumbers = "08162600",
startDate = "2000-01-01",
endDate = "2020-12-31",
parameterCd = "00060",
statCd = "00003") |>
renameNWISColumns() |>
clean_names() |>
left_join(ecoli_df, by = c("date" = "end_date")) |>
## attach a unit to streamflow
mutate(flow = set_units(flow, "ft^3/s"),
value = set_units(value, "MPN/100mL"))
Q_df |> glimpse()
```
With the dataframe set up, the `calc_ldc()` function will generate exceedance probabilities and flow regime categories:
```{r}
#| message: false
## specify the allowable concentration
allowable_concentration <- 126
## set the units
units(allowable_concentration) <- "MPN/100mL"
df_ldc <- calc_ldc(Q_df,
Q = flow,
C = value,
allowable_concentration = allowable_concentration,
breaks = c(1, 0.9, 0.6, 0.4, 0.1, 0),
labels = c("Highest Flows",
"Moist Conditions",
"Mid-range Flows",
"Dry Conditions",
"Low Flows"))
df_ldc
```
With the LDC information calculated, the summary table can be generated with `summ_ldc()`:
```{r}
#| message: false
#|
df_sum <- summ_ldc(df_ldc,
Q = flow,
C = value,
Exceedance = P_Exceedance,
groups = Flow_Category,
method = "geomean")
df_sum
```
With the summary table we can finally plot the LDC:
```{r}
#| message: false
#| warning: false
draw_ldc(df_ldc,
df_sum,
label_nudge_y = log10(1000)) +
labs(y = "*E. coli* [MPN/day]") +
scale_y_log10() +
theme_TWRI_print() +
theme(axis.title.y = element_markdown(),
legend.title = element_blank(),
legend.direction = "vertical")
```
We get the same outputs as the manual method with far fewer functions and less chance of copy/paste errors. Another advantage of using *ldc* is the ability to change units on the fly without digging into a formula.
For example, the summary table shows the median daily flow volume in 100 mL/day. That isn't an intuitive unit. Let's report it in million gallons per day instead by using the `set_units()` function:
```{r}
#| message: false
#|
df_sum <- df_sum |>
mutate(Median_Daily_Flow_Volume = set_units(Median_Daily_Flow_Volume,
"1E6gallons/day"))
df_sum
```
Often we report bacteria loads as million or billion counts per day:
```{r}
#| message: false
#|
df_sum <- df_sum |>
mutate(Median_Flow_Load = set_units(Median_Flow_Load,
"1E9MPN/day"))
df_sum
```
Update the units in `df_ldc` as well:
```{r}
#| message: false
#|
df_ldc <- df_ldc |>
mutate(Daily_Load = set_units(Daily_Load,
"1E9MPN/day"),
Allowable_Daily_Load = set_units(Allowable_Daily_Load,
"1E9MPN/day"))
df_ldc
```
Now the updated units can be plotted on the LDC again:
```{r}
#| message: false
#| warning: false
draw_ldc(df_ldc,
df_sum,
label_nudge_y = log10(1000)) +
labs(y = "*E. coli* [Billion MPN/day]") +
## make our labels more reader friendly
scale_y_log10(labels = scales::comma) +
theme_TWRI_print() +
theme(axis.title.y = element_markdown(),
legend.title = element_blank(),
legend.direction = "vertical")
```