SSE overall Design discussion #737

Open
6 tasks
barrettk opened this issue Feb 5, 2025 · 0 comments
Labels
needs SME input (SME = Subject Matter Expert)

Comments

barrettk commented Feb 5, 2025

We have drafted the idea of adding a new SSE model type to bbr in #735.

This issue provides a full example of running an SSE analysis to help manage and structure the design decisions we still need to make. We opted to refactor the existing bootstrap functionality, since much of the setup, management, and summarization of SSE runs overlaps with bootstraps. However, we need to better understand where they diverge, and what bbr should be responsible for.

Overall Design Decisions

  • Do we need more flexibility during the setup (e.g., setup_sse_run), or can users handle any case-specific setup beforehand relatively easily?
    • Try replacing the Simulate data step in the example using an mrgsolve simulation
  • Inspect the summary object and other function calls. Are we able to access everything we need with relative ease?
    • Is there anything else we should be capturing and/or making something easier to grab?
    • Is there a need for post-summary helper functions to perform standard SSE analyses (such as the initial_estimates_compare example)?
  • PsN approaches SSE a bit differently, in that it lets you provide "alternative models", whereas we offer less control over the input simulated data. This is a much larger conversation that should likely be discussed in detail in a separate issue, but it would be nice to discuss it at a higher level here. Is this a "version 2" thing? How do we think most users expect SSE to work?
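As a starting point for the mrgsolve bullet above, here is a minimal sketch of producing replicate simulated datasets outside of bbr. It uses mrgsolve's built-in `house()` demo model and tags each replicate with an `IREP` column; the dosing regimen and column name are illustrative assumptions, not existing bbr API:

```r
library(mrgsolve)
library(dplyr)

# Illustrative sketch: simulate N_SIM replicates with mrgsolve's demo model,
# tagging each replicate with IREP so the combined data could be passed to
# setup_sse_run(..., sim_col = "IREP").
N_SIM <- 200
mod <- house() # demo model shipped with mrgsolve

sim_data <- lapply(seq_len(N_SIM), function(irep) {
  mod %>%
    ev(amt = 100, ii = 24, addl = 2) %>% # hypothetical regimen
    mrgsim_df(end = 72, delta = 4) %>%
    mutate(IREP = irep)
}) %>%
  bind_rows()
```

This would let us test how much of `setup_sse_run()`'s flexibility is actually needed when the user fully controls the simulation step.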

Sub-issues

Example

Install development bbr from commit
devtools::install_git(
  url = "[email protected]:metrumresearchgroup/bbr.git",
  ref = "e5b79a47cbefcdaf573e3d59e2a11064623021f7",
  git = "external",
  dependencies = TRUE
)

library(bbr)
Helper functions
# This is a helper function we use in our test suite. Its only job is to
# add an MSFO = {x}.MSF option to an $EST record.
# - Used with bbr::add_simulation
add_msf_opt <- function(mod, msf_path = paste0(get_model_id(mod), ".MSF")){
  ctl <- bbr:::get_model_ctl(mod)
  mod_path <- get_model_path(mod)
  est <- nmrec::select_records(ctl, "est")[[1]]
  
  msf_path_ctl <- bbr:::get_msf_path(mod, .check_exists = FALSE)
  if(is.null(msf_path_ctl)){
    nmrec::set_record_option(est, "MSFO", msf_path)
    nmrec::write_ctl(ctl, mod_path)
  }else{
    rlang::inform(glue::glue("MSF option already exists: \n - {est$format()}"))
  }
  return(mod)
}

# Sample Quantiles for a given set of columns in a dataframe
get_percentiles <- function(
    df, 
    compare_cols, 
    probs = c(0.5, 0.025, 0.975), 
    na.rm = FALSE
){
  comp_df <- df %>% dplyr::select({{ compare_cols }})
  quantile_fn <- function(x)  {
    quantile(x, probs = probs, na.rm = na.rm)
  }
  comp_df <- comp_df %>%
    dplyr::reframe(across(.cols = everything(), .fns = quantile_fn)) %>%
    t() %>%
    as.data.frame() %>%
    tibble::rownames_to_column() %>% tibble::as_tibble()
  colnames(comp_df) <- c("parameter_names", paste0("p", probs * 100))
  return(comp_df)
}
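For example (illustrative only), applied to a toy data frame with a single column, `get_percentiles()` returns one row per column with the median and a 95% interval:

```r
library(dplyr)

# Toy data: get_percentiles() is the helper defined above
toy <- tibble::tibble(ofv = 1:100)
get_percentiles(toy, compare_cols = "ofv")
# 1-row tibble: parameter_names = "ofv", p50 = 50.5, p2.5 = 3.475, p97.5 = 97.525
```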
Simulate data
# Simulate ----------------------------------------------------------------

# New model with MSF saved out
# - MSF file is required for bbr::add_simulation()
# - We create a model, ensure it will output estimates to an MSF file, and then
#   simulate `N_SIM` times.
# - This is a working example using `bbr`, though we want to see how inputs could
#   vary when using mrgsolve. Do we need more flexibility below, or can some of it be
#   done using `dplyr`?

# Define N number of simulations
N_SIM <- 200

# Starting example model from bbr
model_dir <- system.file("model/nonmem/basic", package = "bbr")
mod1 <- read_model(file.path(model_dir, "1"))

# Submit a model we plan to simulate
mod2 <- copy_model_from(mod1, "2") %>% update_model_id()
mod2 <- add_msf_opt(mod2) # would normally be done manually
submit_model(mod2, .mode = "local")

# Simulate data - can also test with mrgsolve
add_simulation(mod2, n = N_SIM, .mode = "local", .overwrite = TRUE)
sim_data <- nm_join_sim(mod2)
New SSE run
# New SSE Run -------------------------------------------------------------


# new SSE run or read in previous runs
# mod2 <- read_model(file.path(model_dir, "2"))
# sse_run <- read_model(file.path(model_dir, "2-sse"))

# Can use `.suffix` to create multiple SSE designs from the same starting model
# sse_run <- new_sse_run(mod2, .suffix = "sse-design-1")
sse_run <- new_sse_run(mod2, .suffix = "sse")


# Set up the SSE run
# - This function takes a bbi_nmsse_model (created by a previous new_sse_run() 
#   call) and creates `n` new model objects and re-sampled datasets in a subdirectory. 
#   The control stream found at get_model_path(sse_run) is used as the "template"
#   for these new model objects, and the new datasets are sampled from the dataset
#   passed to data. 
# - See ?setup_sse_run for more details
sse_run <- setup_sse_run(
  sse_run,
  # Simulation dataset
  # - Could filter to a specific design here:
  # data = sim_data %>% dplyr::filter(DESIGN == 1),
  data = sim_data,
  # Stratification columns for sampling
  strat_cols = "SEX",
  # N simulations
  n = N_SIM,
  # Sample size for each dataset (uses ID column as KEY)
  sample_size = 30,
  # Simulation replicate column name (e.g., "IREP" for mrgsolve)
  # - Filters to each simulation before sampling.
  sim_col = "nn"
)

# Print to console to view SSE specifications
sse_run
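As an aside on the setup-flexibility question above: much case-specific setup can likely be handled with dplyr before calling `setup_sse_run()`, rather than adding more arguments to it. A sketch (the `DESIGN` and `AGE` columns are illustrative assumptions about the simulated data):

```r
library(dplyr)

# Illustrative: restrict the simulated data to one study design and one
# covariate subgroup before setup, instead of asking setup_sse_run() for
# additional knobs.
sim_data_design1 <- sim_data %>%
  filter(DESIGN == 1, AGE >= 18)
```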
Submit and get status
# Submit in batches
submit_model(sse_run, .batch_size = 100)

# Get status of run completion along the way
get_model_status(sse_run)
Summarize and save results
# Summarize the parameter estimates, run details, and any heuristics of an SSE run,
# saving the results to a `sse_summary.RDS` data file within the SSE run directory.
# - See ?summarize_sse_run() for more details
sse_sum <- summarize_sse_run(sse_run)

# Print to console to view high level information about the run
sse_sum

# You can look at different summary tables within this object
sse_sum$analysis_summary
sse_sum$run_details
sse_sum$run_heuristics

# Read in SSE estimates (faster once the run has been summarized above)
sse_estimates(sse_run) # same as sse_sum$parameter_estimates


# Summary log for each run
# You can also find this information in sse_sum 
# (e.g., sse_sum$analysis_summary has OFVs, termination codes, etc.)
summary_log(sse_run$absolute_model_path)


# Compare to initial or "true" estimates the SSE run is based on
# - initial_estimates_compare() is a prototype function in bbr
initial_estimates_compare(sse_sum, probs = c(0.5, 0.025, 0.975))

# Look at OFV distribution
# - get_percentiles() is a helper defined in this doc
get_percentiles(sse_sum$analysis_summary, compare_cols = "ofv")


# Read in all SSE models - can be helpful for inspecting specific model runs
sse_mods <- get_sse_models(sse_run)

length(sse_mods)
model_summary(sse_mods[[1]]) %>% param_estimates()
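Individual runs can also be inspected in bulk; for example, collecting parameter estimates from every SSE model into one table. The purrr usage is illustrative, but it relies only on the bbr accessors already shown above:

```r
library(dplyr)
library(purrr)

# Illustrative: pull parameter estimates from each SSE model and stack them,
# tagged by run index.
all_ests <- imap_dfr(sse_mods, function(mod, i) {
  model_summary(mod) %>%
    param_estimates() %>%
    mutate(run = i)
})
```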
Example plots (post summary)
# Example Plots
library(ggplot2)

# Look at OFV distribution
sse_sum$analysis_summary %>%
  ggplot(aes(x = ofv)) +
  geom_histogram(color = "white", alpha = 0.7) +
  theme_bw()


# Look at estimate distributions
par <- sse_estimates(sse_run) 
sum_data <- initial_estimates_compare(sse_sum) %>% 
  tidyr::pivot_longer(
    cols = c("initial","p50", "p2.5", "p97.5"), 
    names_to = "stat", values_to = "value"
  )

record_type <- "THETA" # Filter to record type (e.g., THETAs)
par_pl <- par %>% dplyr::filter(grepl(record_type, parameter_names))
sum_data_pl <- sum_data %>% dplyr::filter(grepl(record_type, parameter_names))

low_hi_sum <- sum_data_pl %>% dplyr::filter(stat != "initial")
orig_sum <- sum_data_pl %>% dplyr::filter(stat == "initial")

par_pl %>% ggplot(aes(x = estimate)) +
  facet_wrap(~parameter_names, scales = "free") +
  geom_histogram(color = "white", alpha = 0.7) +
  geom_vline(data = low_hi_sum, aes(xintercept = value), lwd = 1, color = "blue3") +
  geom_vline(data = orig_sum, aes(xintercept = value), lwd = 1, color = "red3") +
  theme_bw() + labs(x = "Value", y = "Count")
[Figure: Objective Function Value distribution]
[Figure: Parameter estimates compared to "truth" (red line)]
barrettk added the needs SME input label Feb 5, 2025
barrettk mentioned this issue Feb 5, 2025