MW_Predicted-Floweriness.Rmd

---
title: "Calculate Flowering Indices from MeadoWatch Model Fits"
author: "Janneke Hille Ris Lambers, Aji John, Meera Sethi, Elli Theobald"
date: "13/01/2021"
output: html_document
---

#Goal / Objective
The goal of this R markdown is to estimate, per DOY, various metrics correlated to the flowering season. This file takes as input parameter fits from phenological curves fit to MeadoWatch data (see *Peak-Wildflower-Season.Rmd*)

#Setup for R script and R markdown
1. Load all libraries
2. Specify string behavior
3. Load packages

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(stringsAsFactors = FALSE)
library(boot)
 
```


#Read in necessary data: parameter fits and trail specific SDD data
Parameter fits are from maximum likelihood models fit to a subset of MeadoWatch data (see *MW_Modelfitting(byspecies).Rmd*). The snow disappearance data were measured using hobos, modeled or observed on the MeadoWatch trails (these observations are used to describe the snow dynamics of each year / trail). The following needs to be read in:

1. *output/Parameters_plotmodel.csv*: includes peak, range and max parameters (for a unimodal curve describing the relationship between DOY and flowering probability), in each year, trail, plot in which species were observed. In other words, this model was separately fit to year-trail-plot-species specific data.

2. A second set of models were fit to each species data. This model assumed the same functional form - with the relationship between DOY and flowering probability in each plot determined by 4 parameters (in *output/Parameters_speciesmodel.csv*. In this model, however, the following was assumed:
    + A. The peak parameter varied with plot specific snow duration. The slope and intercept parameters describing this relationship are the first two parameters.
    + B. The range / duration parameter describing phenological curves are assumed to be constant across species (fit across all years / plots)

3. *cleandata/MW_SDDall.csv* includes observed and predicted snow disappearance data collected from each trail / plot.

4. Also read in the necessary phenological functions (*HRL_phenology_functions.R*)
    
```{r}
# read in functions to calculate trail-wide flowering
source("./HRL_phenology_functions.R")

# Read in phenology data, station data
PlotPheno_pars <- read.csv("output/Parameters_plotmodel.csv", header=TRUE)
Species_pars <- read.csv("output/Parameters_speciesmodel.csv", header=TRUE)
SDDdat_0 <- read.csv("cleandata/MW_SDDall.csv", header=TRUE)
SiteInfo <- read.csv("data/MW_SiteInfo_2013_2020.csv", header=TRUE)

# Merge SDDdat_0 with SiteInfo
SDDdat_1 <- merge(SDDdat_0, SiteInfo, by = c("Site_Loc"))

# Create new SDD - observed when present, predicted when not
SDDfin <- SDDdat_1$SDD
# This substitutes in predicted SDD where sensors failed = NA
SDDfin[is.na(SDDdat_1$SDD)==TRUE] <- 
  round(SDDdat_0$predSDD[is.na(SDDdat_1$SDD)==TRUE],0)

# this removes unwanted columns, including SDD and pred SDD
SDDdat_2 <- subset(SDDdat_1, select=-c(Site_Num, Site_Loc, calibration,
                                     snow_appearance_date, 
                                     snow_disappearance_date,
                                     snow_cover_duration, minimum_soil_temp,
                                     SDD, predSDD, Latitude, 
                                     Longitude, Elevation))

# This adds SDDfin back to data frame as SDD
SDDdat_2$SDD <- SDDfin

# Subset so Forest Types aren't included
SDDdat <- SDDdat_2[SDDdat_2$Type!="Forest",]

# Determine unique species, years included in parameters
species <- unique(Species_pars$species)
nspp <- length(species)
yrs <- unique(PlotPheno_pars$year); yrs <- yrs[order(yrs)]
nyrs <- length(yrs)
trails <- c("Glacier Basin", "Reflection Lakes")

# Reminder of species, # years included
print(paste("Data examines phenology of", nspp, "species", sep=" "))
print("Species include:"); print(species)
print(paste("Data includes observations from", nyrs, "years", sep=" "))
print("Years of data included:"); print(yrs)      

#Examine data frames     
head(PlotPheno_pars)
head(Species_pars)
head(SDDdat)

#Write
write.csv(SDDdat, "output/MeadowSDD.csv", quote=FALSE,
          row.names=FALSE)
```


#Calculate trail wide probability of observing flowers and flowering richness 
This chunk of code estimates the probability of observing flowering of focal species along each MeadoWatch trail (RL and GB) for each year, between March 15 and October 15 (7 months). It does so using fitted models relating peak flowering to snow disappearance per species. We then make some assumptions to calculate the probability of observing flowering and flowering richness in the following way:

1. For each trail and year, we calculate the earliest and latest snowmelt observed that year (assumptions, this a reasonable proxy for range of snowmelt observed along the entire trail).

2. For each trail, year and focal species on the trail, we calculate the following parameters:
    + A. Earliest peak flowering and latest peakflowering (*peakearly* and *peaklate*) on the trail, from the relationship between SDD and peak flowering (in *Species_pars*).
    + B. Also extract year-specific range and max parameters from the same file.
    
3. Year, species and trail wide probability of flowering by DOY is calculated as follows:
    + DOY < *peakearly*; predict using *peakearly* (on the trail in question), *range* and *max*
    + DOY >= *peakearly* and <= *peaklate*, probability of flowering is inv.logit(*max*). This assumes that somewhere on the trail the visitor is observing flowering at the maximum probability.
    + DOY > *peaklate*, predict using *peaklate* (on the trail in question), *range* and *max*
    
4. Species-specific flowering probabilites can be used to calculate the probability of observing all focal species flowering, and the probability of observing at least one focal species flowering. 

5. Flowering richness is also calculated, as follows:
    + Per species, when flowering probability is > 0.5*max, flowering = yes; < = no
    + Sum all the yes's per DOY per trail to calculate richness

*Note that an issue with this is that we are assuming that yes / no observations of a specific species in a 2 x 1 meter plot correlate with abundance / number of flowers in meadows along the trail.*
    
```{r}
DOY_pred <- seq(105,285) #approximate April 15 - October 15
output_p <- c()
output_r <- c()
f_thresh <- 0.5 #between 0 and 1, modifies at what proportion of max flowering,
                # for example, at 0.5, implies than when 50% max close to peak
                

for(i in 1:nyrs){ #each year
    outputGB <- cbind(rep(yrs[i], times=length(DOY_pred)),
                      rep("GB", times=length(DOY_pred)), DOY_pred)
    outputRL <- cbind(rep(yrs[i], times=length(DOY_pred)),
                      rep("RL", times=length(DOY_pred)), DOY_pred)
    dimnames(outputGB) <- list(c(), c("year","trail","DOY"))
    dimnames(outputRL) <- list(c(), c("year","trail","DOY"))
    outputRL_yr_p <- outputRL; outputGB_yr_p <- outputGB
    outputRL_yr_r <- outputRL; outputGB_yr_r <- outputGB
    
    for(j in 1:2){ #each trail
        #Extract max and min SDD
        SDD_obs <- SDDdat$SDD[SDDdat$Year==yrs[i] &
                                  SDDdat$Transect==trails[j]]
        if(length(SDD_obs)==0){next} #break if no obs - GB 2013, 2014
        SDD_max <- max(SDD_obs); SDD_min <- min(SDD_obs)
        
        for(k in 1:length(species)){
            #determine early, late peak
            int_spp <- Species_pars$peakint[Species_pars$species==species[k]]
            slp_spp <- Species_pars$peakSDDslope[Species_pars$species
                                                 ==species[k]]
            peakearly <- int_spp + slp_spp*SDD_min
            peaklate <- int_spp + slp_spp*SDD_max
            
            #determine max, range
            max <- Species_pars$max[Species_pars$species==species[k]]
            rng <- Species_pars$range[Species_pars$species==species[k]]
            
            #calculate predflower
            param <- c(peakearly, peaklate, rng, max)
            ptrailf <- predflowertrail(DOY_pred,param)
            ptrailr <- ptrailf
            ptrailr[ptrailf>=f_thresh*max(ptrailf)] <- 1
            ptrailr[ptrailf<f_thresh*max(ptrailf)] <- 0
            
            #Now add tooutput files
            if(j==1){
              outputGB_yr_p <- cbind(outputGB_yr_p, ptrailf)
              dimnames(outputGB_yr_p)[[2]][k+3] <- species[k]
              outputGB_yr_r <- cbind(outputGB_yr_r, ptrailr)
              dimnames(outputGB_yr_r)[[2]][k+3] <- species[k]
              }
            if(j==2){
              outputRL_yr_p <- cbind(outputRL_yr_p, ptrailf)
              dimnames(outputRL_yr_p)[[2]][k+3] <- species[k]
              outputRL_yr_r <- cbind(outputRL_yr_r, ptrailr)
              dimnames(outputRL_yr_r)[[2]][k+3] <- species[k]
          }
        }
    if(j==1 & yrs[i]>2014){
             output_p <- rbind(output_p,outputGB_yr_p)
             output_r <- rbind(output_r,outputGB_yr_r)}
    if(j==2){output_p <- rbind(output_p,outputRL_yr_p)
             output_r <- rbind(output_r,outputRL_yr_r)}
    }
}

#change output_p and output_r to data frames
output_p <- data.frame(output_p)
output_r <- data.frame(output_r)

for(i in 3:dim(output_p)[2]){
  output_p[,i] <- as.numeric(output_p[,i])
  output_r[,i] <- as.numeric(output_r[,i])
}

#Calculate probability flowering indices
#probability all focal species flower
allfl <- apply(output_p[,4:dim(output_p)[2]], 1, prod)

#probability any focal species flowers
pnofl <- 1-output_p[,4:dim(output_p)[2]]
anyfl <- 1-apply(pnofl, 1, prod)

#append
output_p_fin <- data.frame(cbind(output_p, anyfl,allfl))

#Calculate richness 
rich <- apply(output_r[,4:dim(output_r)[2]], 1, sum)
flseason <- rep(0, times=length(rich))
flseason[rich[]>=0.5*length(species)] <- 1
output_r_fin <- data.frame(cbind(output_r, rich, flseason))

#Examine output
head(output_p_fin) 
head(output_r_fin)

#write output
write.csv(output_p_fin, "output/FloweringProbabilities.csv", quote=FALSE,
          row.names=FALSE)
write.csv(output_r_fin, "output/FloweringRichness.csv", quote=FALSE,
          row.names=FALSE)

```


#Visualize flowering richness for all years, on both transects
```{r}
  
#Set up plot
par(mfrow=c(1,2), omi=c(0,0,0,0), mai=c(0.5,0.4,0.4,0.2), 
    tck=-0.01, mgp=c(1.25,0.25,0), xpd=TRUE)

yrcols <- c("chartreuse","cyan","darkorange","darkorchid1","darkgreen",
            "deeppink3","cornflowerblue")

#first plot Reflection lakes
trldat_r <- output_r_fin[output_r_fin$trail=="RL",]
maxspp <- max(output_r_fin$rich)

for(i in 2013:max(yrs)){
    yrdat_r <- trldat_r[trldat_r$year==i,]
    
   #graph richness: create if 2013, add lines if later years
   if(i==2013){
   plot(yrdat_r$DOY,yrdat_r$rich, type="l", col=yrcols[i-2012],
       lwd=2, xaxt="n",xlab="DOY", ylab="flowering richness", 
       main=paste("RL - Richness", sep=" "), ylim=c(0,maxspp));
   text(c(105,135,165,195,225,255,285),-1*maxspp/12,
       labels=c("Apr","May","Jun","Jul","Aug","Sep","Oct"))}
    
   if(i>2013){
     lines(yrdat_r$DOY,yrdat_r$rich, type="l", col=yrcols[i-2012],lwd=2)}
}

#Now Glacier Basin
trldat_r <- output_r_fin[output_r_fin$trail=="GB",]
maxspp <- max(trldat_r$rich)

for(i in 2015:max(yrs)){
        yrdat_r <- trldat_r[trldat_r$year==i,]
    
   #graph richness: create if 2013, add lines if later years
   if(i==2015){
   plot(yrdat_r$DOY,yrdat_r$rich, type="l", col=yrcols[i-2012],
       lwd=2, xaxt="n",xlab="DOY", ylab="Flowering richness", 
       main=paste("GB - Richness", sep=" "), ylim=c(0,maxspp));
   text(c(105,135,165,195,225,255,285),-1,
       labels=c("Apr","May","Jun","Jul","Aug","Sep","Oct"))}
    
   if(i>2015){
     lines(yrdat_r$DOY,yrdat_r$rich, type="l", col=yrcols[i-2012],lwd=2)}
}

legend(x="topleft",legend=c(2013:2019), cex=0.75,
       col=yrcols, lwd=2)

```