SSE overall Design discussion #737

Open
6 tasks
barrettk opened this issue Feb 5, 2025 · 0 comments
Labels
needs SME input (SME = Subject Matter Expert)

Comments

barrettk commented Feb 5, 2025

We have drafted the idea of adding a new SSE model type to bbr in #735.

This issue provides a full example of running an SSE analysis to help manage and structure the design decisions we still need to make. We opted to refactor the existing bootstrap functionality, since much of the setup, management, and summarization of SSE runs overlaps with bootstraps. However, we need to better understand where they diverge, and what bbr should be responsible for.

Overall Design Decisions

  • Do we need more flexibility during the setup (e.g., setup_sse_run), or can users handle any case-specific setup beforehand relatively easily?
    • Try replacing the Simulate data step in the example using an mrgsolve simulation
  • Inspect the summary object and other function calls. Are we able to access everything we need with relative ease?
    • Is there anything else we should be capturing and/or making something easier to grab?
    • Is there a need for post-summary helper functions to perform standard SSE analyses (such as the initial_estimates_compare example)?
  • PsN approaches SSE a bit differently, in that it lets you provide "alternative models", whereas we offer less control over the input simulated data. This is a much larger conversation that should likely be discussed in detail in a separate issue, but it would be nice to discuss it at a higher level here. Is this a "version 2" thing? How do we think most users expect SSE to work?
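As a starting point for the mrgsolve bullet above, here is a minimal sketch of producing replicate simulated datasets outside of bbr. It uses mrgsolve's built-in `house()` demo model and tags each replicate with an `IREP` column; the dosing regimen and column name are illustrative assumptions, not existing bbr API:

```r
library(mrgsolve)
library(dplyr)

# Illustrative sketch: simulate N_SIM replicates with mrgsolve's demo model,
# tagging each replicate with IREP so the combined data could be passed to
# setup_sse_run(..., sim_col = "IREP").
N_SIM <- 200
mod <- house() # demo model shipped with mrgsolve

sim_data <- lapply(seq_len(N_SIM), function(irep) {
  mod %>%
    ev(amt = 100, ii = 24, addl = 2) %>% # hypothetical regimen
    mrgsim_df(end = 72, delta = 4) %>%
    mutate(IREP = irep)
}) %>%
  bind_rows()
```

This would let us test how much of `setup_sse_run()`'s flexibility is actually needed when the user fully controls the simulation step.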

Sub-issues

Example

Install development bbr from commit
devtools::install_git(
  url = "[email protected]:metrumresearchgroup/bbr.git",
  ref = "e5b79a47cbefcdaf573e3d59e2a11064623021f7",
  git = "external",
  dependencies = TRUE
)

library(bbr)
Helper functions
# This is a helper function we use in our test suite. Its only job is to
# add an MSFO = {x}.MSF option to an $EST record.
# - Used with bbr::add_simulation
add_msf_opt <- function(mod, msf_path = paste0(get_model_id(mod), ".MSF")){
  ctl <- bbr:::get_model_ctl(mod)
  mod_path <- get_model_path(mod)
  est <- nmrec::select_records(ctl, "est")[[1]]
  
  msf_path_ctl <- bbr:::get_msf_path(mod, .check_exists = FALSE)
  if(is.null(msf_path_ctl)){
    nmrec::set_record_option(est, "MSFO", msf_path)
    nmrec::write_ctl(ctl, mod_path)
  }else{
    rlang::inform(glue::glue("MSF option already exists: \n - {est$format()}"))
  }
  return(mod)
}

# Sample Quantiles for a given set of columns in a dataframe
get_percentiles <- function(
    df, 
    compare_cols, 
    probs = c(0.5, 0.025, 0.975), 
    na.rm = FALSE
){
  comp_df <- df %>% dplyr::select({{ compare_cols }})
  quantile_fn <- function(x)  {
    quantile(x, probs = probs, na.rm = na.rm)
  }
  comp_df <- comp_df %>%
    dplyr::reframe(across(.cols = everything(), .fns = quantile_fn)) %>%
    t() %>%
    as.data.frame() %>%
    tibble::rownames_to_column() %>% tibble::as_tibble()
  colnames(comp_df) <- c("parameter_names", paste0("p", probs * 100))
  return(comp_df)
}
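For example (illustrative only), applied to a toy data frame with a single column, `get_percentiles()` returns one row per column with the median and a 95% interval:

```r
library(dplyr)

# Toy data: get_percentiles() is the helper defined above
toy <- tibble::tibble(ofv = 1:100)
get_percentiles(toy, compare_cols = "ofv")
# 1-row tibble: parameter_names = "ofv", p50 = 50.5, p2.5 = 3.475, p97.5 = 97.525
```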
Simulate data
# Simulate ----------------------------------------------------------------

# New model with MSF saved out
# - MSF file is required for bbr::add_simulation()
# - We create a model, ensure it will output estimates to an MSF file, and then
#   simulate `N_SIM` times.
# - This is a working example using `bbr`, though we want to see how inputs could
#   vary when using mrgsolve. Do we need more flexibility below, or can some of it be
#   done using `dplyr`?

# Define N number of simulations
N_SIM <- 200

# Starting example model from bbr
model_dir <- system.file("model/nonmem/basic", package = "bbr")
mod1 <- read_model(file.path(model_dir, "1"))

# Submit a model we plan to simulate
mod2 <- copy_model_from(mod1, "2") %>% update_model_id()
mod2 <- add_msf_opt(mod2) # would normally be done manually
submit_model(mod2, .mode = "local")

# Simulate data - can also test with mrgsolve
add_simulation(mod2, n = N_SIM, .mode = "local", .overwrite = TRUE)
sim_data <- nm_join_sim(mod2)
New SSE run
# New SSE Run -------------------------------------------------------------


# new SSE run or read in previous runs
# mod2 <- read_model(file.path(model_dir, "2"))
# sse_run <- read_model(file.path(model_dir, "2-sse"))

# Can use `.suffix` to create multiple SSE designs from the same starting model
# sse_run <- new_sse_run(mod2, .suffix = "sse-design-1")
sse_run <- new_sse_run(mod2, .suffix = "sse")


# Set up the SSE run
# - This function takes a bbi_nmsse_model (created by a previous new_sse_run() 
#   call) and creates `n` new model objects and re-sampled datasets in a subdirectory. 
#   The control stream found at get_model_path(sse_run) is used as the "template"
#   for these new model objects, and the new datasets are sampled from the dataset
#   passed to data. 
# - See ?setup_sse_run for more details
sse_run <- setup_sse_run(
  sse_run,
  # Simulation dataset
  # - Could filter to a specific design here:
  # data = sim_data %>% dplyr::filter(DESIGN == 1),
  data = sim_data,
  # Stratification columns for sampling
  strat_cols = "SEX",
  # N simulations
  n = N_SIM,
  # Sample size for each dataset (uses ID column as KEY)
  sample_size = 30,
  # Simulation replicate column name (e.g., "IREP" for mrgsolve)
  # - Filters to each simulation before sampling.
  sim_col = "nn"
)

# Print to console to view SSE specifications
sse_run
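As an aside on the setup-flexibility question above: much case-specific setup can likely be handled with dplyr before calling `setup_sse_run()`, rather than adding more arguments to it. A sketch (the `DESIGN` and `AGE` columns are illustrative assumptions about the simulated data):

```r
library(dplyr)

# Illustrative: restrict the simulated data to one study design and one
# covariate subgroup before setup, instead of asking setup_sse_run() for
# additional knobs.
sim_data_design1 <- sim_data %>%
  filter(DESIGN == 1, AGE >= 18)
```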
Submit and get status
# Submit in batches
submit_model(sse_run, .batch_size = 100)

# Get status of run completion along the way
get_model_status(sse_run)
Summarize and save results
# Summarize the parameter estimates, run details, and any heuristics of an SSE run,
# saving the results to a `sse_summary.RDS` data file within the SSE run directory.
# - See ?summarize_sse_run() for more details
sse_sum <- summarize_sse_run(sse_run)

# Print to console to view high level information about the run
sse_sum

# You can look at different summary tables within this object
sse_sum$analysis_summary
sse_sum$run_details
sse_sum$run_heuristics

# Read in SSE estimates (faster once the run has been summarized above)
sse_estimates(sse_run) # same as sse_sum$parameter_estimates


# Summary log for each run
# You can also find this information in sse_sum 
# (e.g., sse_sum$analysis_summary has OFVs, termination codes, etc.)
summary_log(sse_run$absolute_model_path)


# Compare to initial or "true" estimates the SSE run is based on
# - initial_estimates_compare() is a prototype function in bbr
initial_estimates_compare(sse_sum, probs = c(0.5, 0.025, 0.975))

# Look at OFV distribution
# - get_percentiles() is a helper defined in this doc
get_percentiles(sse_sum$analysis_summary, compare_cols = "ofv")


# Read in all SSE models - can be helpful for inspecting specific model runs
sse_mods <- get_sse_models(sse_run)

length(sse_mods)
model_summary(sse_mods[[1]]) %>% param_estimates()
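Individual runs can also be inspected in bulk; for example, collecting parameter estimates from every SSE model into one table. The purrr usage is illustrative, but it relies only on the bbr accessors already shown above:

```r
library(dplyr)
library(purrr)

# Illustrative: pull parameter estimates from each SSE model and stack them,
# tagged by run index.
all_ests <- imap_dfr(sse_mods, function(mod, i) {
  model_summary(mod) %>%
    param_estimates() %>%
    mutate(run = i)
})
```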
Example plots (post summary)
# Example Plots
library(ggplot2)

# Look at OFV distribution
sse_sum$analysis_summary %>%
  ggplot(aes(x = ofv)) +
  geom_histogram(color = "white", alpha = 0.7) +
  theme_bw()


# Look at estimate distributions
par <- sse_estimates(sse_run) 
sum_data <- initial_estimates_compare(sse_sum) %>% 
  tidyr::pivot_longer(
    cols = c("initial","p50", "p2.5", "p97.5"), 
    names_to = "stat", values_to = "value"
  )

record_type <- "THETA" # Filter to record type (e.g., THETAs)
par_pl <- par %>% dplyr::filter(grepl(record_type, parameter_names))
sum_data_pl <- sum_data %>% dplyr::filter(grepl(record_type, parameter_names))

low_hi_sum <- sum_data_pl %>% dplyr::filter(stat != "initial")
orig_sum <- sum_data_pl %>% dplyr::filter(stat == "initial")

par_pl %>% ggplot(aes(x = estimate)) +
  facet_wrap(~parameter_names, scales = "free") +
  geom_histogram(color = "white", alpha = 0.7) +
  geom_vline(data = low_hi_sum, aes(xintercept = value), lwd = 1, color = "blue3") +
  geom_vline(data = orig_sum, aes(xintercept = value), lwd = 1, color = "red3") +
  theme_bw() + labs(x = "Value", y = "Count")
[Figure: Objective Function Value distribution]
[Figure: Parameter estimates compared to "truth" (red line)]
barrettk added the needs SME input label Feb 5, 2025
barrettk mentioned this issue Feb 5, 2025