Vectorise ebola model #211

pratikunterwegs · 2024-04-08T12:55:31Z

This PR is WIP on a project to release {epidemics} v0.3.0 focusing on functionality related to the Ebola model.

This PR:

Fixes "Stochastic model functions should accept parameter vectors" (#169) by allowing the Ebola model to accept vectors of infection parameters following Tidyverse recycling rules;
Fixes "Stochastic model should accept list of model component sets" (#170) by allowing the Ebola model to accept lists of rate intervention sets (time-dependence is the only other composable element and is not allowed to be vectorised);
Fixes "Preserve seeds across parameter and event sets for stochastic models" (#172) by using {withr} to reset the random seed across parameter-set and scenario combinations, while pre-generating and using local seeds for each replicate within each parameter-scenario combination; this ensures that the $i$-th replicate of every scenario uses the same seed;
Fixes "Stochastic Ebola model should have a two level structure" (#173) by lifting the dynamic parameter calculation and compartmental transitions code into an internal, 'unsafe' function without input checks called .model_ebola_internal(); the user-facing model_ebola() holds code for input checking, parameter-scenario combination, and model output handling;
Fixes "Stochastic Ebola model should allow replicates" (#174) by introducing the replicates argument to model_ebola() with a default of 100 replicates;
Fixes "Show better seed management in Ebola vignette" (#163) by showing how to use {withr} for seed management in the Ebola vignette;
Fixes "Show multiple replicates for Ebola intervention scenarios" (#164) by showing multiple replicates for each model run in the Ebola vignette.
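
The two-level structure and the recycling behaviour described in the list above can be sketched roughly as follows. This is a minimal base-R illustration under stated assumptions, not the {epidemics} implementation: `toy_model()`, `.toy_model_internal()`, and the arguments `rate_a` and `rate_b` are hypothetical names, and the "model" is a trivial deterministic calculation standing in for a real run.

```r
# Hypothetical internal worker: assumes valid scalar inputs, does no checking.
.toy_model_internal <- function(rate_a, rate_b, time_end) {
  time <- 0:time_end
  data.frame(time = time, value = rate_a / rate_b * time)
}

# Hypothetical user-facing wrapper: checks inputs, recycles, handles output.
toy_model <- function(rate_a, rate_b, time_end = 10L) {
  stopifnot(
    is.numeric(rate_a), is.numeric(rate_b),
    all(rate_a > 0), all(rate_b > 0)
  )
  params <- list(rate_a = rate_a, rate_b = rate_b)
  n <- max(lengths(params))
  # Tidyverse-style recycling: only length-1 vectors may be recycled
  if (!all(lengths(params) %in% c(1L, n))) {
    stop("Parameters must have length 1 or a common length.", call. = FALSE)
  }
  params <- lapply(params, rep_len, length.out = n)

  # One call to the unchecked worker per parameter set
  runs <- Map(
    function(a, b) .toy_model_internal(a, b, time_end),
    params$rate_a, params$rate_b
  )
  # Label each run with its parameter set and bind into one data.frame
  runs <- Map(function(df, i) cbind(param_set = i, df), runs, seq_len(n))
  do.call(rbind, runs)
}

# Three values of rate_a, one recycled value of rate_b
out <- toy_model(rate_a = c(1, 2, 3), rate_b = 0.5, time_end = 4L)
```

Keeping the checks in the wrapper means the worker can be called many times in a loop without repeating validation, which is the motivation given for the 'unsafe' internal function.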

Note that the continuous benchmarking check is expected to fail (or show reduced performance) as the default for model_ebola() is now 100 replicates.

A small example showing seed preservation:

library(epidemics)
library(ggplot2)

# Prepare population and parameters
demography_vector <- 67000 # small population
contact_matrix <- matrix(1)

# manual case counts divided by pop size rather than proportions as small sizes
# introduce errors when converting to counts in the model code; extra
# individuals may appear
infectious <- 1
exposed <- 10
initial_conditions <- matrix(
  c(demography_vector - infectious - exposed, exposed, infectious, 0, 0, 0) /
    demography_vector,
  nrow = 1
)
rownames(contact_matrix) <- "full_pop"
pop <- population(
  contact_matrix = contact_matrix,
  demography_vector = demography_vector,
  initial_conditions = initial_conditions
)

compartments <- c(
  "susceptible", "exposed", "infectious", "hospitalised", "funeral", "removed"
)

# prepare integer values for expectations on data length
time_end <- 100L
replicates <- 10L

npi_list <- list(
  scenario_baseline = NULL,
  scenario_00 = list(
    transmission_rate = intervention(
      "transmission", "rate", 50, 100, 0.1
    )
  ),
  scenario_01 = list(
    transmission_rate = intervention(
      "transmission", "rate", 50, 100, 1.0
    )
  )
)
output <- withr::with_seed(
  1,
  model_ebola(
    population = pop,
    intervention = npi_list,
    replicates = 5L, # replicates
    time_end = time_end
  )
)
data <- output[, unlist(data, recursive = FALSE), by = "scenario"]

ggplot(data[compartment == "exposed"], aes(time, value)) +
  geom_line(
    aes(
      col = as.factor(scenario),
      group = interaction(scenario, replicate)
    )
  ) +
  facet_grid(~replicate)

Created on 2024-04-08 with reprex v2.0.2
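
The seed handling that the reprex above relies on can be sketched as follows. This is an illustration under stated assumptions rather than the package code: it uses base R `set.seed()` in place of {withr}, and `run_once()` and `run_scenarios()` are hypothetical stand-ins. Unlike {withr}, this sketch does not restore the global RNG state afterwards.

```r
# Hypothetical stand-in for one stochastic model run; NOT the {epidemics} code.
run_once <- function(rate, n_steps = 20L) {
  cumsum(rpois(n_steps, lambda = rate))
}

# Hypothetical driver: `rates` plays the role of a list of scenarios.
run_scenarios <- function(rates, replicates = 5L, seed = 1L) {
  # Pre-generate one seed per replicate from the top-level seed
  set.seed(seed)
  replicate_seeds <- sample.int(.Machine$integer.max, replicates)

  lapply(rates, function(rate) {
    lapply(replicate_seeds, function(s) {
      # The i-th replicate of every scenario starts from the same seed
      set.seed(s)
      run_once(rate)
    })
  })
}

rates <- c(baseline = 2, baseline_copy = 2, reduced = 1)
out <- run_scenarios(rates)

# Two scenarios with identical inputs give identical replicates
identical(out$baseline, out$baseline_copy)
```

Because each replicate index maps to a fixed seed, differences between scenarios within a replicate reflect the scenario inputs rather than a different random number stream.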

github-actions · 2024-04-08T13:18:01Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if e99365a is merged into main:

✔️default_ode: 15ms -> 15.1ms [-0.51%, +1.92%]
✔️default_ode_interventions: 73.4ms -> 73ms [-1.05%, +0.1%]
✔️default_ode_param_vec: 838ms -> 841ms [-0.1%, +0.89%]
✔️default_ode_paramvec_intervs: 6.25s -> 6.27s [-0.22%, +0.72%]
❗🐌ebola: 43ms -> 615ms [+1325.2%, +1337.77%]
Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions · 2024-04-12T13:55:46Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if e99365a is merged into main:

✔️default_ode: 14.8ms -> 14.9ms [-0.45%, +1.88%]
🚀default_ode_interventions: 73.8ms -> 73ms [-1.94%, -0.28%]
✔️default_ode_param_vec: 843ms -> 845ms [-1.37%, +1.72%]
🚀default_ode_paramvec_intervs: 6.29s -> 6.24s [-1.23%, -0.31%]
❗🐌ebola: 37.2ms -> 607ms [+1522.66%, +1535.58%]
Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions · 2024-04-15T11:18:15Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 6258c0c is merged into main:

✔️default_ode: 15.6ms -> 15.5ms [-2.1%, +0.56%]
🚀default_ode_interventions: 74.2ms -> 73.8ms [-1.11%, -0.09%]
✔️default_ode_param_vec: 851ms -> 849ms [-1.05%, +0.55%]
✔️default_ode_paramvec_intervs: 6.34s -> 6.33s [-0.69%, +0.36%]
❗🐌ebola: 46.6ms -> 628ms [+1238.71%, +1257.64%]
Further explanation regarding interpretation and methodology can be found in the documentation.

joshwlambert

Nice work @pratikunterwegs. I've only taken a relatively quick look through this PR but everything seems to work well.

I've left a few comments throughout the diffs. I haven't had time to look properly into the seed management, happy to look again another time, but no need to wait on merging this PR as I can investigate when reviewing future PRs.

R/check_args_ebola.R

R/helpers.R

joshwlambert · 2024-04-16T14:07:35Z

R/model_ebola.R

+    sorted = FALSE
+  )
+
+  # Send warning when > 10,000 runs (including replicates) are requested


What is the runtime of 10,000 runs? I struggle to gauge whether this is a sensible threshold at which to warn users. Is there also a memory usage aspect that should be considered when running many thousands of runs, e.g. if a user has a laptop with 1GB of RAM?

Probably depends on hardware - 10,000 feels like a lot even with replicates. If you remember, the Gaza work was 35K runs overall, over 11 pathogens. So 10K is probably a good middle ground where it's not too low, but also a fair warning threshold.

Here's a reprex with a single run of 10k replicates. The runtime is probably higher with interventions added. It's not very slow on my hardware, but still worth a warning I'd think.

library(epidemics)
library(ggplot2)

# Prepare population and parameters
demography_vector <- 67000 # small population
contact_matrix <- matrix(1)

# manual case counts divided by pop size rather than proportions as small sizes
# introduce errors when converting to counts in the model code; extra
# individuals may appear
infectious <- 1
exposed <- 10
initial_conditions <- matrix(
  c(demography_vector - infectious - exposed, exposed, infectious, 0, 0, 0) /
    demography_vector,
  nrow = 1
)
pop <- population(
  contact_matrix = contact_matrix,
  demography_vector = demography_vector,
  initial_conditions = initial_conditions
)

# prepare integer values for expectations on data length
time_end <- 100L
replicates <- 10000L

tictoc::tic()
model_ebola(
  population = pop,
  replicates = replicates,
  time_end = time_end
)
#>           time demography_group  compartment value replicate
#>          <int>           <char>       <char> <num>     <int>
#>       1:     0     demo_group_1  susceptible 66989         1
#>       2:     0     demo_group_1      exposed    10         1
#>       3:     0     demo_group_1   infectious     1         1
#>       4:     0     demo_group_1 hospitalised     0         1
#>       5:     0     demo_group_1      funeral     0         1
#>      ---
#> 6059996:   100     demo_group_1      exposed    78     10000
#> 6059997:   100     demo_group_1   infectious   107     10000
#> 6059998:   100     demo_group_1 hospitalised    43     10000
#> 6059999:   100     demo_group_1      funeral     6     10000
#> 6060000:   100     demo_group_1      removed   395     10000
tictoc::toc()
#> 44.941 sec elapsed

Created on 2024-04-17 with reprex v2.0.2

I agree that a warning is useful in this case as it reassures the user that their session isn't hanging and that something is running in the background that may take a substantial amount of time.
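
For reference, the threshold check and the output size under discussion can be sketched as below. This is a rough illustration, not the code in R/model_ebola.R: `warn_if_many_runs()` and `expected_rows()` are hypothetical helpers, the message text is only indicative, and the row formula assumes one row per run, timepoint (0 to `time_end`), demography group, and compartment, which is consistent with the 6,060,000 rows in the reprex output.

```r
# Hypothetical helper: warn when more than `threshold` runs are requested.
warn_if_many_runs <- function(n_param_sets, n_scenarios, replicates,
                              threshold = 1e4) {
  n_runs <- n_param_sets * n_scenarios * replicates
  if (n_runs > threshold) {
    warning(
      sprintf(
        "Running %s model runs. This may take some time.",
        format(n_runs, big.mark = ",", scientific = FALSE)
      ),
      call. = FALSE
    )
  }
  invisible(n_runs)
}

# Hypothetical helper: rows in the long-format output for a given request.
expected_rows <- function(n_runs, time_end, n_compartments = 6L,
                          n_groups = 1L) {
  n_runs * (time_end + 1) * n_compartments * n_groups
}

# 10,000 replicates, 101 timepoints, 6 compartments, 1 demography group
n_rows <- expected_rows(n_runs = 10000, time_end = 100)
```

The row count grows linearly in every argument, so adding scenarios or parameter sets multiplies both runtime and memory use in the same way as adding replicates.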

Something for a possible future PR: it might be worth replacing "This may take some time" with "This can take several minutes", to make it clearer to users how long they should expect to wait.

An alternative would be to add a cli progress bar, especially since you already import {cli}.
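
A progress bar around the run loop could look roughly like the sketch below. To stay dependency-free it uses base R `utils::txtProgressBar()` rather than {cli}, so it only illustrates the pattern; `run_with_progress()` and the toy `run_fn` are hypothetical, and the real loop in model_ebola() may be structured differently.

```r
# Hypothetical wrapper: call `run_fn(i)` for each run and update a progress bar.
run_with_progress <- function(n_runs, run_fn) {
  pb <- utils::txtProgressBar(min = 0, max = n_runs, style = 3)
  on.exit(close(pb), add = TRUE) # close the bar even if a run fails
  results <- vector("list", n_runs)
  for (i in seq_len(n_runs)) {
    results[[i]] <- run_fn(i)
    utils::setTxtProgressBar(pb, i)
  }
  results
}

# Toy stand-in for a single model run
res <- run_with_progress(20L, function(i) sum(rpois(100L, lambda = 1)))
```

A {cli} version would follow the same create, update, and close pattern.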

Thanks, I'll take a look into this.

joshwlambert · 2024-04-16T14:17:19Z

vignettes/model_ebola.Rmd

@@ -111,20 +112,21 @@ This can also be interpreted as the proportion of funerals that are ebola-safe.
 ## Run epidemic model


Leaving my comment here because L101 has not changed, so I cannot comment there.

Why are deaths and recovered in the same compartment? It might be worth adding a bit more explanation as to why this is the case other than saying it does not affect model dynamics. What if someone wanted to estimate the total number of deaths from the outbreak?

The proximate reason is that neither affects the outbreak size any further; the underlying reason is that removals due to deaths and recoveries weren't separated in the Li et al. consensus model, if I recall correctly. It should be fairly easy to estimate deaths by scaling the epidemic size by the CFR.

@sbfnk or @adamkucharski - you've actually responded to Ebola - would separating deaths from recoveries be useful? I'm happy to raise an issue for this.
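
The scaling suggested above amounts to the following arithmetic. This is a sketch with illustrative numbers only: the CFR of 0.5 is an assumed value, not an estimate, the final "removed" count is used as a stand-in for epidemic size, and only the first count (395) comes from the reprex output in this thread; the other two are made up.

```r
# Final 'removed' counts per replicate (illustrative; only 395 is from the
# 10,000-replicate reprex output, the other values are invented)
removed_final <- c(395, 410, 372)

# Assumed case fatality risk; an illustrative value, not an estimate
cfr <- 0.5

# Point estimate of deaths per replicate, and the spread across replicates
deaths_estimate <- removed_final * cfr
deaths_range <- range(deaths_estimate)
```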

vignettes/model_ebola.Rmd

pratikunterwegs · 2024-04-16T14:37:47Z

Thanks @joshwlambert for looking through this, you've already managed to catch a few issues that I'll fix (func docs, notes, captions). Will go through the comments and answer there.

github-actions · 2024-04-17T10:01:23Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 6258c0c is merged into main:

✔️default_ode: 15.2ms -> 15.1ms [-2.41%, +1.07%]
✔️default_ode_interventions: 73.7ms -> 73.2ms [-1.6%, +0.23%]
✔️default_ode_param_vec: 837ms -> 836ms [-0.88%, +0.76%]
✔️default_ode_paramvec_intervs: 6.38s -> 6.34s [-1.38%, +0.1%]
❗🐌ebola: 47.8ms -> 617ms [+1183.9%, +1199.66%]
Further explanation regarding interpretation and methodology can be found in the documentation.

pratikunterwegs · 2024-04-17T13:06:48Z

Thanks @joshwlambert for reviewing. I'm merging this PR now so the next one can be up for review; happy to take more feedback on the Ebola model as issues.

pratikunterwegs self-assigned this Apr 8, 2024

pratikunterwegs marked this pull request as ready for review April 8, 2024 13:15

pratikunterwegs added a commit that referenced this pull request Apr 12, 2024

Handle Ebola model replicates in epidemic_size(), see #211

48bebd3

pratikunterwegs and others added 21 commits April 15, 2024 11:55

Initial rewrite of ebola model for vectorised params/composables, WIP #…

f43e022

…169

Initial implementation of ebola args checker, WIP #173

49d4be9

Initial implementation of ebola output processing

90d9390

Initial docs for new ebola model, WIP #169

62ecad4

Refactor ebola model for vector inputs, WIP #169, #170

c46e169

Update design docs for vectorised ebola mod, WIP #169, #170

7e5b932

Fix output processing in ebola model

9b82c52

Add improved test suite for ebola model

09d143e

Update ebola model snapshot

4910340

Use local_preserve_seed()

9d049fc

Add tests for ebola seed management, WIP: test failing

e512799

Move data struct init within iteration ebola model

588fa7b

Fix documentation and wordlist

1a3f585

Minor edit to .cross_check_intervention()

d4f81d7

Make intervention fns internal

d9de8db

{withr} moved to Imports

b729145

Update CITATION.cff

e7827ab

Correct seed management ebola model

fd1ddc3

Update ebola model tests and snapshot

fece4bc

Update ebola vignette

c450187

Ebola model timepoints match ODE models, 0:time_end

d4b31ab

pratikunterwegs force-pushed the vectorise-ebola branch from cb817b1 to d4b31ab Compare April 15, 2024 10:55

Update CITATION.cff

1c2324b

joshwlambert self-requested a review April 16, 2024 13:10

joshwlambert approved these changes Apr 16, 2024

pratikunterwegs added 3 commits April 17, 2024 10:37

Update fn docs .check_prepare_args_ebola()

22227e8

Rm input checking in internal fn .output_to_df()

0cc1000

Update ebola vignette with review comments

83547a3

pratikunterwegs merged commit 21ebc61 into main Apr 17, 2024
12 checks passed

pratikunterwegs deleted the vectorise-ebola branch April 17, 2024 13:07

pratikunterwegs added a commit that referenced this pull request Apr 17, 2024

Handle Ebola model replicates in epidemic_size(), see #211

072821d

pratikunterwegs added a commit that referenced this pull request Apr 22, 2024

Handle Ebola model replicates in epidemic_size(), see #211

c5019f7

pratikunterwegs mentioned this pull request Apr 23, 2024

Possible API for models #160

Closed

pratikunterwegs mentioned this pull request May 30, 2024

A few remarks #121

Closed
