Add D-M, random seed argument, and code improvements #6

Open
wants to merge 7 commits into
base: main
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -11,7 +11,7 @@ Description: One step ahead (OSA) residual diagnostic plots for composition data
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.1
RoxygenNote: 7.3.2
Depends:
R (>= 2.10)
Imports:
6 changes: 5 additions & 1 deletion NAMESPACE
@@ -1 +1,5 @@
exportPattern("^[[:alpha:]]+")
# Generated by roxygen2: do not edit by hand

export(plot_osa)
export(run_osa)
import(ggplot2)
72 changes: 55 additions & 17 deletions R/plot_osa.R
@@ -12,21 +12,41 @@
#' @param figwidth (default=NULL) by default the function scales the figure width by
#' the number of fleets being plotted. user may want to overwrite depending on
#' other variables like the number of years in the model.
#' @param plot Whether to plot and return the ggplot object (default) or return the
#' underlying data
#' @param addCI Whether to add a confidence interval to the SDNR
#' value for the QQ plots. See details section for further information.
#' @param vjust,hjust Values to control placement of the SDNR
#' text on QQ plots. See \code{?geom_text} for more details.
#'
#' @return Saves a multipanel figure with OSA bubble plots, standard normal QQ
#' plots, and aggregated fits to the composition data for one or more fleets.
#' Also returns these plots as an outputted list for further refinement by
#' user if needed. Outlying residuals are defined as being greater than an
#' absolute value of 3 and identified in the bubble plots as a triangle. The
#' QQ plots include the standard deviation of the normalized residuals (SDNR;
#' Francis, 2011), which if the models assumptions are met, should be 1.
#' @return Creates a multipanel figure with OSA bubble plots, standard normal QQ
#' plots, and aggregated fits to the composition data for one or more fleets. Also
#' returns these plots as an outputted list for further refinement by user if
#' needed (if plot=TRUE, otherwise it returns the underlying data.frames as a
#' list). Outlying residuals are defined as being greater than an absolute value of
#' 3 and identified in the bubble plots as a triangle. The QQ plots include the
#' standard deviation of the normalized residuals (SDNR; Francis, 2011), which if
#' the model's assumptions are met, should be 1.
#'
#' References:
#' @details The standard deviation of the normalized residuals
#' (SDNR) is calculated as sd(resid) because under a correctly
#' specified model the OSA residuals are iid standard normal
#' and thus already normalized. Under that assumption,
#' (n-1)*SDNR^2 follows a chi-squared distribution with (n-1)
#' degrees of freedom, where n is the number of residuals
#' (after dropping a bin). Francis (2011) suggests only an upper
#' confidence limit for indices, but here we are also interested
#' in overfitting and so calculate a two-sided 95\% confidence
#' interval. This is given in parentheses below the SDNR value.
#' We caution against strict threshold tests of this and instead
#' suggest using it to give context to the size of SDNR.
#'
#' @references
#' Francis, R.C., 2011. Data weighting in statistical fisheries stock
#' assessment models. Canadian Journal of Fisheries and Aquatic Sciences,
#' 68(6), pp.1124-1138.
#'
#'
#' @import ggplot2
#'
#' @export
@@ -65,15 +85,19 @@
#' osaplots$bubble
#' osaplots$qq
#' osaplots$aggcomp
plot_osa <- function(input, outpath = NULL, figheight = 8, figwidth = NULL) {
plot_osa <- function(input, plot=TRUE, outpath = NULL, figheight = 8, figwidth = NULL,
addCI = TRUE, hjust = -.1, vjust = 1.1) {

# create output filepath if it doesn't already exist
if(!is.null(outpath)) dir.create(file.path(outpath), showWarnings = FALSE)

## helper function so the order of input stays the same when plotted
fleets <- sapply(input, function(x) x[[1]]$fleet[1])
fleetf <- function(x) factor(x, levels=fleets)
# ensure osa inputs are structured properly:
res <- lapply(input, `[[`, 1) # extracts each element of the list of lists
if(all(unlist(lapply(res, is.data.frame)))) {
res <- do.call("rbind", res)
res$fleet <- fleetf(res$fleet)
} else {
stop("The input argument should be a list() of output objects from run_osa. The $res element in one of these lists was not a dataframe.")
}
@@ -85,10 +109,10 @@ plot_osa <- function(input, outpath = NULL, figheight = 8, figwidth = NULL) {
agg <- lapply(input, `[[`, 2)
if(all(unlist(lapply(agg, is.data.frame)))) {
agg <- do.call("rbind", agg)
agg$fleet <- fleetf(agg$fleet)
} else {
stop("The input argument should be a list() of output objects from run_osa. The $agg element in one of these lists was not a dataframe.")
}

# bubble plots
res <- res %>%
dplyr::mutate(sign = ifelse(resid < 0, "Neg", "Pos"),
@@ -116,9 +140,20 @@ plot_osa <- function(input, outpath = NULL, figheight = 8, figwidth = NULL) {
theme(legend.position = "top")

# QQ plots

sdnr <- res %>%
dplyr::group_by(fleet) %>%
dplyr::summarise(sdnr = paste0('SDNR = ', formatC(round(sd(resid),3), format = "f", digits = 2)))
dplyr::summarise(
df=dplyr::n()-1,
HCI = sqrt(qchisq(.975,df)/df),
LCI = sqrt(qchisq(.025,df)/df),
est= sd(resid)) %>%
dplyr::mutate(
sdnr=paste0('SDNR=',sprintf('%.2f', est))
)
if(addCI)
sdnr <- dplyr::mutate(sdnr,
sdnr=paste0(sdnr,'\n(', sprintf('%.2f', LCI), '-', sprintf('%.2f', HCI),')'))

qq_plot <- ggplot() +
stat_qq(data = res, aes(sample = resid), col = "blue") +
@@ -128,8 +163,7 @@ plot_osa <- function(input, outpath = NULL, figheight = 8, figwidth = NULL) {
theme_bw(base_size = 10) +
geom_text(data = sdnr,
aes(x = -Inf, y = Inf, label = sdnr),
hjust = -0.5,
vjust = 2.5)
hjust = hjust, vjust = vjust)

# aggregated fits

@@ -169,9 +203,13 @@ plot_osa <- function(input, outpath = NULL, figheight = 8, figwidth = NULL) {
}

# save and print figure
ggsave(plot = p, filename = fp, units = 'in', bg = 'white', height = figheight,
if(!is.null(outpath))
ggsave(plot = p, filename = fp, units = 'in', bg = 'white', height = figheight,
width = figwidth, dpi = 300)
print(p)
if(plot){
print(p)
return(p)
}
return(list(bubble = bubble_plot,
qq = qq_plot,
aggcomp = agg_plot))
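The interval described in the `@details` text above can be sketched in base R as follows. This is only an illustration under the stated assumption that the residuals are iid standard normal; `sdnr_ci` is a hypothetical helper name and is not part of this package. It mirrors the `qchisq` calls in the diff, and the residuals are simulated rather than taken from any model.

```r
# Hypothetical helper (not in the package): SDNR plus the two-sided range
# expected for it when residuals are iid standard normal.
sdnr_ci <- function(resid, level = 0.95) {
  df <- length(resid) - 1            # degrees of freedom, n - 1
  a <- (1 - level) / 2               # tail probability on each side
  c(est   = sd(resid),               # SDNR itself
    lower = sqrt(qchisq(a, df) / df),
    upper = sqrt(qchisq(1 - a, df) / df))
}

set.seed(1)
x <- rnorm(200)                      # simulated residuals, assumed well behaved
out <- sdnr_ci(x)
print(round(out, 2))
```

Because the bounds depend only on the number of residuals, fleets with few residuals get a wider range around 1, which is one reason to read the interval as context rather than as a pass/fail threshold.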
21 changes: 15 additions & 6 deletions R/run_osa.R
@@ -14,7 +14,10 @@
#' @param years vector of years associated with the observed ages or lengths
#' @param index_label character value indicating 'age' or 'length bin' depending
#' on comp type
#'
#' @param theta scalar for the linear Dirichlet-multinomial. If no value is
#' provided (the default), the function assumes a multinomial distribution; otherwise
#' alpha is calculated as the sample size N times the expected probabilities times theta.
#' @param seed A random seed (integer) passed to \code{set.seed} for reproducibility. If unspecified, a default of 99801 is used. Random values are necessary for integer observations.
#' @return a list with two elements: (1) \code{res}: a long-format dataframe with
#' columns fleet, index_label (indicates whether the comp is age or length),
#' year, index (age or length bin), resid (osa), and (2) \code{agg}: a dataframe of
@@ -43,19 +46,25 @@
#' out1$res # osa residual for each age and year
#' out1$agg # observed and expected value for each age aggregated across all yrs
#'
run_osa <- function(obs, exp, N, fleet, index, years, index_label = 'Age or Length'){
run_osa <- function(obs, exp, N, fleet, index, years,
index_label = 'Age or Length',
seed=99801, theta=NULL){

# check dimensions
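# all.equal() only compares its first two arguments, so check each pair explicitly
stopifnot(nrow(obs) == nrow(exp), nrow(obs) == length(N), nrow(obs) == length(years))
stopifnot(ncol(obs) == ncol(exp), ncol(obs) == length(index))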

if(!is.null(theta)) stopifnot(theta>0)
# calculate osa residuals for multinomial (note the rounding here, multinomial
# expects integer) - sum of obs should equal N
o <- round(N*obs/rowSums(obs), 0); p <- exp/rowSums(exp)
# o <-N*obs/rowSums(obs); p <- exp/rowSums(exp)
set.seed(99801)
res <- compResidual::resMulti(t(o), t(p))

set.seed(seed)
if(!is.null(theta)){
alpha <- rowSums(o)*p*theta
res <- compResidual::resDirM(t(o), t(alpha))
} else {
res <- compResidual::resMulti(t(o), t(p))
}
# aggregated fits to the composition data
oagg <- colSums(o)/sum(o)
eagg <- colSums(p)/sum(p)
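As a rough illustration of the `theta` argument, the sketch below reproduces the count rounding and the alpha calculation from the diff on toy inputs. All numbers are made up, and the residual call itself is omitted so the snippet runs in base R without `compResidual`. The variable names follow `run_osa`, which means `exp` here is a matrix that masks the base function of the same name.

```r
# Toy inputs (illustrative values only): 2 years x 3 bins
obs <- matrix(c(0.2, 0.5, 0.3,
                0.1, 0.6, 0.3), nrow = 2, byrow = TRUE)    # observed proportions
exp <- matrix(c(0.25, 0.45, 0.30,
                0.15, 0.55, 0.30), nrow = 2, byrow = TRUE) # expected proportions
N <- c(50, 80)     # input sample size for each year
theta <- 0.5       # linear Dirichlet-multinomial scalar, must be > 0

o <- round(N * obs / rowSums(obs), 0)  # integer counts; rows sum to about N
p <- exp / rowSums(exp)                # expected proportions, rows sum to 1
alpha <- rowSums(o) * p * theta        # Dirichlet-multinomial parameters

print(o)
print(alpha)
```

In the diff, `t(o)` and `t(alpha)` are then passed to `compResidual::resDirM`, while the multinomial branch passes `t(o)` and `t(p)` to `compResidual::resMulti`.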
60 changes: 47 additions & 13 deletions man/plot_osa.Rd


18 changes: 17 additions & 1 deletion man/run_osa.Rd
