Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorise ebola model #211

Merged
merged 25 commits into from
Apr 17, 2024
Merged

Vectorise ebola model #211

merged 25 commits into from
Apr 17, 2024

Conversation

pratikunterwegs
Copy link
Collaborator

@pratikunterwegs pratikunterwegs commented Apr 8, 2024

This PR is WIP on a project to release {epidemics} v0.3.0 focusing on functionality related to the Ebola model.

This PR:

  1. Fixes Stochastic model functions should accept parameter vectors #169 by allowing the Ebola model to accept vectors of infection parameters following Tidyverse recycling rules;
  2. Fixes Stochastic model should accept list of model component sets #170 by allowing the Ebola model to accept lists of rate intervention sets (time-dependence is the only other composable element and is not allowed to be vectorised);
  3. Fixes Preserve seeds across parameter and event sets for stochastic models #172 by using {withr} to reset the random seed across parameter-set and scenario combinations, while pre-generating and using local seeds for each replicate within each parameter-scenario combination; this ensures that each $i$-th replicate of each scenario uses the same seed;
  4. Fixes Stochastic Ebola model should have a two level structure #173 by lifting the dynamic parameter calculation and compartmental transitions code into an internal, 'unsafe' function without input checks called .model_ebola_internal(); the user facing model_ebola() holds code for input checking, parameter-scenario combination, and model output handling;
  5. Fixes Stochastic Ebola model should allow replicates #174 by introducing the replicates argument to model_ebola() with a default of 100 replicates;
  6. Fixes Show better seed management in Ebola vignette #163 by showing how to use {withr} for seed management in the Ebola vignette;
  7. Fixes Show multiple replicates for Ebola intervention scenarios #164 by showing multiple replicates for each model run in the Ebola vignette.

Note that the continuous benchmarking check is expected to fail (or show reduced performance) as the default for model_ebola() is now 100 replicates.

A small example showing seed preservation:

library(epidemics)
library(ggplot2)

# Prepare population and parameters
demography_vector <- 67000 # small population
contact_matrix <- matrix(1)

# manual case counts divided by pop size rather than proportions as small sizes
# introduce errors when converting to counts in the model code; extra
# individuals may appear
infectious <- 1
exposed <- 10
initial_conditions <- matrix(
  c(demography_vector - infectious - exposed, exposed, infectious, 0, 0, 0) /
    demography_vector,
  nrow = 1
)
rownames(contact_matrix) <- "full_pop"
pop <- population(
  contact_matrix = contact_matrix,
  demography_vector = demography_vector,
  initial_conditions = initial_conditions
)

compartments <- c(
  "susceptible", "exposed", "infectious", "hospitalised", "funeral", "removed"
)

# prepare integer values for expectations on data length
time_end <- 100L
replicates <- 10L

npi_list <- list(
  scenario_baseline = NULL,
  scenario_00 = list(
    transmission_rate = intervention(
      "transmission", "rate", 50, 100, 0.1
    )
  ),
  scenario_01 = list(
    transmission_rate = intervention(
      "transmission", "rate", 50, 100, 1.0
    )
  )
)
output <- withr::with_seed(
  1,
  model_ebola(
    population = pop,
    intervention = npi_list,
    replicates = 5L, # replicates
    time_end = time_end
  )
)
data <- output[, unlist(data, recursive = FALSE), by = "scenario"]

ggplot(data[compartment == "exposed"], aes(time, value)) +
  geom_line(
    aes(
      col = as.factor(scenario),
      group = interaction(scenario, replicate)
    )
  ) +
  facet_grid(~replicate)

Created on 2024-04-08 with reprex v2.0.2

@pratikunterwegs pratikunterwegs self-assigned this Apr 8, 2024
@pratikunterwegs pratikunterwegs marked this pull request as ready for review April 8, 2024 13:15
Copy link

github-actions bot commented Apr 8, 2024

This is how benchmark results would change (along with a 95% confidence interval in relative change) if e99365a is merged into main:

  • ✔️default_ode: 15ms -> 15.1ms [-0.51%, +1.92%]
  • ✔️default_ode_interventions: 73.4ms -> 73ms [-1.05%, +0.1%]
  • ✔️default_ode_param_vec: 838ms -> 841ms [-0.1%, +0.89%]
  • ✔️default_ode_paramvec_intervs: 6.25s -> 6.27s [-0.22%, +0.72%]
  • ❗🐌ebola: 43ms -> 615ms [+1325.2%, +1337.77%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if e99365a is merged into main:

  • ✔️default_ode: 14.8ms -> 14.9ms [-0.45%, +1.88%]
  • 🚀default_ode_interventions: 73.8ms -> 73ms [-1.94%, -0.28%]
  • ✔️default_ode_param_vec: 843ms -> 845ms [-1.37%, +1.72%]
  • 🚀default_ode_paramvec_intervs: 6.29s -> 6.24s [-1.23%, -0.31%]
  • ❗🐌ebola: 37.2ms -> 607ms [+1522.66%, +1535.58%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 6258c0c is merged into main:

  • ✔️default_ode: 15.6ms -> 15.5ms [-2.1%, +0.56%]
  • 🚀default_ode_interventions: 74.2ms -> 73.8ms [-1.11%, -0.09%]
  • ✔️default_ode_param_vec: 851ms -> 849ms [-1.05%, +0.55%]
  • ✔️default_ode_paramvec_intervs: 6.34s -> 6.33s [-0.69%, +0.36%]
  • ❗🐌ebola: 46.6ms -> 628ms [+1238.71%, +1257.64%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@joshwlambert joshwlambert self-requested a review April 16, 2024 13:10
Copy link
Member

@joshwlambert joshwlambert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @pratikunterwegs. I've only taken a relatively quick look through this PR but everything seems to work well.

I've left a few comments throughout the diffs. I haven't had time to look properly into the seed management, happy to look again another time, but no need to wait on merging this PR as I can investigate when reviewing future PRs.

R/check_args_ebola.R Outdated Show resolved Hide resolved
R/helpers.R Outdated Show resolved Hide resolved
sorted = FALSE
)

# Send warning when > 10,000 runs (including replicates) are requested
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the runtime of 10,000 runs? I struggle to gauge whether this is a sensible threshold to warn users. Is there also a memory usage aspect to this that should also be considered when running many thousand runs, e.g. if a user has a laptop with 1GB of RAM.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably depends on hardware - 10,000 feels like a lot even with replicates. If you remember, the Gaza work was 35K runs overall, over 11 pathogens. So 10K is probably a good middle ground where it's not too low, but also a fair warning threshold.

Copy link
Collaborator Author

@pratikunterwegs pratikunterwegs Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a reprex with a single run of 10k replicates. Probably higher with interventions added. It's not very slow on my hardware, but still worth a warning I'd think.

library(epidemics)
library(ggplot2)

# Prepare population and parameters
demography_vector <- 67000 # small population
contact_matrix <- matrix(1)

# manual case counts divided by pop size rather than proportions as small sizes
# introduce errors when converting to counts in the model code; extra
# individuals may appear
infectious <- 1
exposed <- 10
initial_conditions <- matrix(
  c(demography_vector - infectious - exposed, exposed, infectious, 0, 0, 0) /
    demography_vector,
  nrow = 1
)
pop <- population(
  contact_matrix = contact_matrix,
  demography_vector = demography_vector,
  initial_conditions = initial_conditions
)

# prepare integer values for expectations on data length
time_end <- 100L
replicates <- 10000L

tictoc::tic()
model_ebola(
  population = pop,
  replicates = replicates,
  time_end = time_end
)
#>           time demography_group  compartment value replicate
#>          <int>           <char>       <char> <num>     <int>
#>       1:     0     demo_group_1  susceptible 66989         1
#>       2:     0     demo_group_1      exposed    10         1
#>       3:     0     demo_group_1   infectious     1         1
#>       4:     0     demo_group_1 hospitalised     0         1
#>       5:     0     demo_group_1      funeral     0         1
#>      ---                                                    
#> 6059996:   100     demo_group_1      exposed    78     10000
#> 6059997:   100     demo_group_1   infectious   107     10000
#> 6059998:   100     demo_group_1 hospitalised    43     10000
#> 6059999:   100     demo_group_1      funeral     6     10000
#> 6060000:   100     demo_group_1      removed   395     10000
tictoc::toc()
#> 44.941 sec elapsed

Created on 2024-04-17 with reprex v2.0.2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that a warning is useful in this case as it reassures the user that their session isn't hanging and that something is running in the background that may take a substantial amount of time.

Something for a possible future PR, it might be worth restating "This may take some time" with "This can take several minutes", to be clearer to user how long they should expect to wait.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something for a possible future PR, it might be worth restating "This may take some time" with "This can take several minutes", to be clearer to user how long they should expect to wait.

An alternative would be add a cli progress bar especially since you already import {cli}.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll take a look into this.

@@ -111,20 +112,21 @@ This can also be interpreted as the proportion of funerals that are ebola-safe.
## Run epidemic model
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving my comment here as L101 has not changed to make the comment there.

Why are deaths and recovered in the same compartment? It might be worth adding a bit more explanation as to why this is the case other than saying it does not affect model dynamics. What if someone wanted to estimate the total number of deaths from the outbreak?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proximate reason really is that it's because neither really affects the outbreak size any further; the underlying reason is that removals due to deaths and recoveries weren't separated in the Li et al. consensus model if I recall correctly. It should be fairly easy to estimate deaths by scaling the epidemic size by the CFR.

@sbfnk or @adamkucharski - you've actually responded to Ebola - would separating deaths from recoveries be useful? I'm happy to raise an issue for this.

vignettes/model_ebola.Rmd Show resolved Hide resolved
vignettes/model_ebola.Rmd Outdated Show resolved Hide resolved
vignettes/model_ebola.Rmd Outdated Show resolved Hide resolved
vignettes/model_ebola.Rmd Outdated Show resolved Hide resolved
vignettes/model_ebola.Rmd Show resolved Hide resolved
vignettes/model_ebola.Rmd Show resolved Hide resolved
@pratikunterwegs
Copy link
Collaborator Author

Thanks @joshwlambert for looking through this, you've already managed to catch a few issues that I'll fix (func docs, notes, captions). Will go through the comments and answer there.

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 6258c0c is merged into main:

  • ✔️default_ode: 15.2ms -> 15.1ms [-2.41%, +1.07%]
  • ✔️default_ode_interventions: 73.7ms -> 73.2ms [-1.6%, +0.23%]
  • ✔️default_ode_param_vec: 837ms -> 836ms [-0.88%, +0.76%]
  • ✔️default_ode_paramvec_intervs: 6.38s -> 6.34s [-1.38%, +0.1%]
  • ❗🐌ebola: 47.8ms -> 617ms [+1183.9%, +1199.66%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@pratikunterwegs
Copy link
Collaborator Author

Thanks @joshwlambert for reviewing. I'm merging this PR now so the next one can be up for review; happy to take more feedback on the Ebola model as issues.

@pratikunterwegs pratikunterwegs merged commit 21ebc61 into main Apr 17, 2024
12 checks passed
@pratikunterwegs pratikunterwegs deleted the vectorise-ebola branch April 17, 2024 13:07
@pratikunterwegs pratikunterwegs mentioned this pull request May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment