Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tutorial: Porting An Existing Model (Influenza-Epidemia) To MSR #162

Conversation

AFg6K7h4fhy2
Copy link
Collaborator

NOTE: This is currently a work in progress. I've committed a skeleton of the Quarto tutorial but need to continuing finishing the implementation in Python before porting over more and then writing in detail. Consider this draft PR and its initial commit a placeholder.

Recall the goal from the corresponding issue:

Verify that DHM's influenza forecasting repository cfa-forecast-renewal-epidemia, which utilizes epidemia, can be instantiated via MSR.

And also recall the desired features:

Create quarto walk-through detailing the port-over steps and providing documentation.

Validate MSR implemented cfa-forecast-renewal-epidemia via simulation and fitting on NHSN data, which is the data cfa-forecast-renewal-epidemia has historically been used on.

@AFg6K7h4fhy2 AFg6K7h4fhy2 added documentation Improvements or additions to documentation request New feature or request labels Jun 6, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 added this to the 🦩 Kakuru milestone Jun 6, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 self-assigned this Jun 6, 2024
@AFg6K7h4fhy2 AFg6K7h4fhy2 linked an issue Jun 6, 2024 that may be closed by this pull request
Copy link

codecov bot commented Jun 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.43%. Comparing base (769712b) to head (5b3feac).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #162   +/-   ##
=======================================
  Coverage   93.43%   93.43%           
=======================================
  Files          38       38           
  Lines        1005     1005           
=======================================
  Hits          939      939           
  Misses         66       66           
Flag Coverage Δ
unittests 93.43% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@damonbayer damonbayer modified the milestones: 🦩 Kakuru, L Sprint Jun 10, 2024
@AFg6K7h4fhy2
Copy link
Collaborator Author

@dylanhmorris

Would appreciate some Why thoughts on the following settings:

day_of_week_effect_prior_modes = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
day_of_week_effect_prior_scales = [0.25, 0.25, 0.25, 0.25, 0.25, 0.25]
holiday_eff_prior_mode = -0.2
holiday_eff_prior_scale = 0.2
post_holiday_eff_prior_mode = 0.0
post_holiday_eff_prior_scale = 0.2
recency_eff_prior_mode = 0.0
recency_eff_prior_scale = 0.1
recency_effect_length = 0

generation_time_dist = [0.233, 0.359, 0.198, 0.103, 0.053, 0.027, 0.014, 0.007, 0.003, 0.002, 0.001]
ihr_intercept_prior_mode = -6.5
ihr_intercept_prior_scale = 0.5
inf_model_prior_infections_per_capita = 0.0001
inf_model_seeds = 8
inf_to_hosp_dist = [0.05, 0.1, 0.175, 0.225, 0.175, 0.125, 0.075, 0.05, 0.025]
max_rt = 3.0
n_pre_observation_days = 14
non_obs_effect_prior_mode = -50
non_obs_effect_prior_scale = 0.01
reciprocal_dispersion_prior_mode = 10
reciprocal_dispersion_prior_scale = 5
reference_day_of_week = 4
rt_intercept_prior_mode = -0.4054651
rt_intercept_prior_scale = 0.3
seed = 54321
susceptible_fraction_prior_mode = 0.9
susceptible_fraction_prior_scale = 0.1
weekly_rw_prior_scale = 0.25
first_fitting_date = "2023-09-15"

[mcmc]
adapt_delta = 0.9
max_treedepth = 12
n_chains = 1
n_warmup = 1000
n_iter = 2000

Apologies for not putting this question here sooner. I do not know why I bottled it in my markdown file.

@AFg6K7h4fhy2
Copy link
Collaborator Author

AFg6K7h4fhy2 commented Jun 20, 2024

Transmission Component

I would also appreciate some thoughts on my attempted description of the transmission component of cfaepim:


In cfaepim, the influenza hospitalizations for each US territory are modelled separately, as though influenza transmission, infection, and observation within each territory's boundaries is completely independent of other territories. This corresponds to an "unpooled" modelling case (see Epidemia's partial pooling page). For each US territory, $\mathscr{R}_t$ can be modeled as a transformed linear prediction:

$$\mathscr{R}_t = g^{-1}(\eta_t)$$

where $g$ is a link function and $\eta_t$ is the prediction on the transformed scale.

The link function for cfaepim is a scaled logit, so:

$$g^{-1}(x) = \frac{K}{1 + e^{-x}}$$

with $K = 3$ since max_rt = 3.0. As for each territory's transformed scale prediction $\eta_t$, we have

$$\eta_t = \beta_0 + W_t$$

where $W_t$ is a random walk satisfying $$W_t = W_{t-1} + \gamma_t$$ with $t &gt; 0$ and $W_0 = 0$, and $\gamma_t \sim \mathcal{N}(0, \sigma)$, where $\sigma \sim \mathcal{N}^{+}(0, 0.25)$ since weekly_rw_prior_scale = 0.25. Note that the $\mathscr{R}_t$ intercept, i.e. $\beta_0$, has a prior. Since rt_intercept_prior_mode = -0.4054651 and rt_intercept_prior_scale = 0.3, $\beta_0 \sim \mathcal{N}(-0.4054651, 0.3)$.


Original R code for reference:

build_light_rt <- function(rt_intercept_prior_mode,
                           rt_intercept_prior_scale,
                           max_rt,
                           rw_prior_scale) {
  rt_model <- epidemia::epirt(
    formula = as.formula(
      sprintf(
        paste0(
          "R(location, date) ~ 1 + ",
          "rw(time = week, gr = location, prior_scale = %f)"
        ),
        rw_prior_scale
      )
    ),
    prior_intercept = rstanarm::normal(
      location = rt_intercept_prior_mode,
      scale = rt_intercept_prior_scale
    ),
    link = epidemia::scaled_logit(K = max_rt)
  )
  return(rt_model)
}

poetry.lock Outdated Show resolved Hide resolved
@AFg6K7h4fhy2
Copy link
Collaborator Author

AFg6K7h4fhy2 commented Jun 20, 2024

Observation Component

Similar to above, would appreciate some thoughts on the veracity of the following description:

NOTE: As of 2024-07-09, the below is a "corrected" description.


Proceeding, each US territory has is own time series of observed influenza hospitalizations $Y = (Y_1, \dotsc, Y_n)$. The observed influenza hospitalizations at time $t$ are given by $$Y_t \sim \mathcal{D}(y_t, \phi)$$ where $\mathcal{D}(y_t, \phi)$ is the sampling distribution with auxiliary parameter $\phi$.

This distribution is not fixed; rather, it is determined by the family variable in Epidemia's epiobs function.

Since family = "neg_binom" in cfaepim, $\mathcal{D}(y_t, \phi)$ is the negative binomial distribution parameterized by its mean and reciprocal dispersion ($\phi$).

Note that since reciprocal_dispersion_prior_mode = 10 and reciprocal_dispersion_prior_scale = 5, we have the prior $\phi \sim \mathcal{N}(10, 5)$.

The expected value for the observed influenza hospitalizations on day $t$, i.e. $y_t$, is given by

$$y_t = \alpha_t \sum_{s < t}i_s \pi_{t-s}$$

with $s &lt; t$ and where

  • $\pi_{k}$ is the time distribution from an infection to an observation
  • $i_s$ are new infections as modelled via a renewal equation
  • $\alpha_t&gt;0$ represents an instantaneous ascertainment rate, which in this instance means the rate of influenza infections that become reported hospitalizations at time $t$, modeled by a linear model with a logit link (since link = "logit"), i.e. the function $g$, where $$g^{-1}(x) = \frac{1}{1 + e^{-x}}$$

Within cfaepim, the linear predictor for $\alpha_t$ consists of

  • A "day of the week effect on observation probability, relative to the reference day of the week" (day_of_week).
  • A holiday effect, if the day is a holiday (is_holiday).
  • A post-holiday effect, if the day occurs after a holiday (is_post_holiday).

We have:

$$ \begin{aligned} \alpha_t &= g^{-1}(\eta_t)\\ \eta_t &= \beta_0 + \sum_{i=1}^9 \beta_i x_i \end{aligned} $$

where

  • the infection to hospitalization delay distribution intercept $\beta_0 \sim \mathcal{N}(-6.5, 0.5)$, since ihr_intercept_prior_mode = -6.5 and ihr_intercept_prior_scale = 0.5 .
  • each of the coefficients $\beta_1, \dotsc, \beta_6$ have priors, $\beta_i \sim \mathcal{N}(0, 0.25) \quad i = 1, \dotsc, 6$ (see day_of_week_effect_prior_modes and day_of_week_effect_prior_scales) with reference_day_of_week = 4, i.e. Thursday, since epiweeks index Sunday at 0.
  • the coefficient $\beta_7 \sim \mathcal{N}(-0.2, 0.2)$, since holiday_eff_prior_mode = -0.2 and holiday_eff_prior_scale = 0.2, and refers to the holiday effect.
  • the coefficient $\beta_8 \sim \mathcal{N}(0, 0.2)$, since post_holiday_eff_prior_mode = 0.0 and post_holiday_eff_prior_scale = 0.2, and refers to the post-holiday effect.
  • the coefficient $\beta_9 \sim \mathcal{N}(-50, 0.01)$, since non_obs_effect_prior_mode = -50 and non_obs_effect_prior_scale = 0.01, and refers to the change in the observation probability during the nonobservation (seeding) period.

Within cfaepim, we have the infection to hospitalization delay distribution as

$$\pi = \begin{bmatrix} 0.05 & 0.1 & 0.175 & 0.225 & 0.175 & 0.125 & 0.075 & 0.05 & 0.025\end{bmatrix}$$

given the selected value for inf_to_hosp_dist.


The corresponding code in cfaepim:

build_light_obs <- function(inf_to_hosp_dist,
                            ihr_intercept_prior_mode,
                            ihr_intercept_prior_scale,
                            day_of_week_eff_prior_modes,
                            day_of_week_eff_prior_scales,
                            holiday_eff_prior_mode,
                            holiday_eff_prior_scale,
                            post_holiday_eff_prior_mode,
                            post_holiday_eff_prior_scale,
                            non_obs_effect_prior_mode,
                            non_obs_effect_prior_scale,
                            inv_dispersion_prior_mode,
                            inv_dispersion_prior_scale,
                            link = "logit") {
  return(epidemia::epiobs(
    formula = as.formula(paste0(
      "hosp ~ 1 + day_of_week + ",
      "is_holiday + ",
      "is_post_holiday + ",
      "nonobservation_period"
    )),
    ## Add a covariate for the
    ## nonobservation window to
    ## leave an initial evolution
    ## period with no observations
    i2o = inf_to_hosp_dist,
    link = link,
    family = "neg_binom",
    prior_intercept = rstanarm::normal(
      location = ihr_intercept_prior_mode,
      scale = ihr_intercept_prior_scale
    ),
    prior = rstanarm::normal(
      location = c(
        day_of_week_eff_prior_modes,
        holiday_eff_prior_mode,
        post_holiday_eff_prior_mode,
        non_obs_effect_prior_mode
      ),
      ## a large negative non_obs_effect
      ## effectively conditions on detection
      ## prob = 0 outside the observation period
      scale = c(
        day_of_week_eff_prior_scales,
        holiday_eff_prior_scale,
        post_holiday_eff_prior_scale,
        non_obs_effect_prior_scale
        ## non-obs prior scale
        ## should be small to
        ## enforce non-obs effect
        ## close to (large negative) mode
      )
    ),
    prior_aux = rstanarm::normal(
      location = inv_dispersion_prior_mode,
      scale = inv_dispersion_prior_scale
    )
  ))
}

@dylanhmorris
Copy link
Collaborator

@AFg6K7h4fhy2 terminology edits only for the post on the $\mathcal{R}(t)$ model.

@dylanhmorris
Copy link
Collaborator

dylanhmorris commented Jun 25, 2024

Comments on the observation component description

Proceeding, each US territory has is own time series of observed influenza hospitalizations $Y = (Y_1, \dotsc, Y_n)$. The observed influenza hospitalizations at time $t$ are given by $$Y_t \sim p(y_t, \phi)$$ where $p(y_t, \phi)$ is the sampling distribution with auxiliary parameter $\phi$. Note that since reciprocal_dispersion_prior_mode = 10 and reciprocal_dispersion_prior_scale = 5, we have $\phi \sim \mathcal{N}(10, 5)$. Since family = "neg_binom", $\phi$ represents the reciprocal dispersion of the sampling distribution.

Should state explicitly that family = "neg_binom" means that $p$ becomes a negative binomial parameterized by its mean and reciprocal dispersion. I also might not use $p$ to denote the distribution, since it often denotes a specific probability. Perhaps something like $\sim \mathcal{D}(y_t, \phi)$ or $\sim \mathrm{Dist}(y_t, \phi)$ if your goal is to indicate that the choice of distribution is not fixed but rather determined by the family argument.

  • $\alpha_t&gt;0$ represents an instantaneous ascertainment rate, which in this instance means the rate of influenza infections that become hospitalizations at time $t$

Suggest reported hospitalizations

transformed by the logit-link (since link = "logit") function $g$, where $$g^{-1}(x) = \frac{1}{1 + e^{-x}}$$

Suggest modeled by a linear model with a logit link

Within cfaepim, the linear predictor for $\alpha_t$ consists of

linear prediction

We have:

$$ \begin{aligned} \alpha_t &= g^{-1}(\eta_t)\\ \eta_t &= \sum_{i=0}^9 \beta_i \end{aligned} $$

Need to include the predictor variables, not just the coefficients:

$$ \begin{aligned} \alpha_t &= g^{-1}(\eta_t)\\ \eta_t &= \sum_{i=0}^9 \beta_i x_i \end{aligned} $$

  • each of the covariates $\beta_1, \dotsc, \beta_6$ have priors, $\beta_i \sim \mathcal{N}(0, 0.25) \quad i = 1, \dotsc, 6$, corresponding, to the week from Sunday to Friday (where the reference date is Saturday)
  1. The $\beta_i$ are not covariates, they are coefficients. Covariate has a more or less precise meaning depending on who is talking, but the closer thing here would be the predictor variables $x_i$, not the corresponding coefficients $\beta_i$.
  2. The reference date is not necessarily Saturday. It is set in the config by the reference_day_of_week entry. The day of week values are then encoded relative to this reference day in the build_state_light_model() function here:
  ## make sure day_of_week is properly
  ## set up as a factor
  dow_levels <- levels(
    lubridate::wday("2023-01-01",
      label = TRUE,
      week_start = params$reference_day_of_week
    )
  )


  clean_data <- clean_data |>
    dplyr::mutate(
      day_of_week = factor(day_of_week,
        ordered = FALSE,
        levels = dow_levels
      )
    )

@AFg6K7h4fhy2

@damonbayer damonbayer added this to the P Sprint milestone Aug 2, 2024
@damonbayer damonbayer modified the milestones: P Sprint, Q Sprint Aug 19, 2024
@AFg6K7h4fhy2
Copy link
Collaborator Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Instantiate CFA's Epidemia-For-Flu Model Via MSR
4 participants