- Arithmetic Engine
- Main existing pain-points:
- Interface for inputting rate expressions is not intuitive in many (but not all) cases
- Not all operations that a user would want to include in a rate expression are possible (e.g. polynomials with more than one term)
- Details:
- Ensure that operations can happen in any order (e.g. currently powers must be taken after sums of state variables)
- User can just use the formula interface without needing to consider storing intermediate values
- Use 'condensation operations' like convolution and lagged differencing more generally
- Templatize the C++ code so that the user can compile it themselves with custom operations (e.g. "for some reason I just need the digamma function")
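
A minimal sketch of how a templated custom operation could plug in, assuming a hypothetical `apply_elementwise` helper and a hypothetical `Polynomial` functor. Neither name exists in the current engine; they only illustrate that a user-supplied callable can be resolved at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the current engine): apply a user-supplied
// unary operation to every element. Op is resolved at compile time, so a new
// operation does not require changing the engine itself.
template <class Type, class Op>
std::vector<Type> apply_elementwise(const std::vector<Type>& x, Op op) {
    std::vector<Type> out;
    out.reserve(x.size());
    for (const Type& xi : x) out.push_back(op(xi));
    return out;
}

// Example user-defined operation: a polynomial with more than one term,
// evaluated with Horner's rule. coef[k] is the coefficient of x^k.
template <class Type>
struct Polynomial {
    std::vector<Type> coef;
    Type operator()(Type x) const {
        assert(!coef.empty());
        Type acc = Type(0);
        for (std::size_t k = coef.size(); k-- > 0;) acc = acc * x + coef[k];
        return acc;
    }
};
```

A user needing some other function (the digamma example, say) would supply their own callable in place of `Polynomial`.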
- Numerical Robustness
- Main existing pain-points:
- Users get no help on how to diagnose lack of convergence
- Users get warnings on negative state variables and rates, but it is not clear how to figure out why and how to stop it
- Details:
- Currently `convergence_info`, `opt_obj`, and `opt_par` provide basic access to the underlying optimization objects, which should be enough for people to construct their own convergence diagnostics
- The first step to making this easier is to finish this
- Then we might consider wrapping this stuff up
- Provide warnings for common scenarios (e.g. I've noticed that the default `bbmle` settings can get stuck on parameters that cause flat dynamics for simple models -- not sure how important this is for more complex models)
- Recommended and wise default priors
- Restricting state variables to be positive is difficult given that our formulation is not on the log scale
- We need better documentation on how to run simulations to diagnose negativity
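
A minimal sketch of why negativity can arise and of one common remedy, assuming a single compartment of size `x` with total per-capita outflow rate `rate` and step size `dt`. The function names are hypothetical and this is not a description of how the package currently computes flows.

```cpp
#include <cassert>
#include <cmath>

// Plain Euler update: outflow = x * rate * dt. When rate * dt > 1 the outflow
// exceeds the compartment size and the state goes negative.
inline double euler_outflow(double x, double rate, double dt) {
    return x * rate * dt;
}

// Hazard-style update: outflow = x * (1 - exp(-rate * dt)). The fraction
// leaving lies in [0, 1) for rate, dt >= 0, so a non-negative state stays
// non-negative. expm1 keeps the result accurate when rate * dt is tiny.
inline double hazard_outflow(double x, double rate, double dt) {
    assert(rate >= 0.0 && dt >= 0.0);
    return x * (-std::expm1(-rate * dt));
}
```

For small `rate * dt` the two updates agree closely, so comparing them in a simulation is one way to check whether a negative state comes from the step size rather than from the model itself.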
- Inference on Parameters
- Main existing pain-points:
- Not clear to users how to get confidence intervals for parameters
- Details:
- Can get confidence intervals from `bbmle::confint.mle2`, `TMB::sdreport`, `rstan::summary`
- Should at least provide examples in the guide, and then wrap them if it makes sense
- Log-Linear Models
- Main existing pain-points:
- The current piece-wise model of time-variation limits modelling flexibility
- Details:
- More general parameter time-variation
- Any parameter is a link-transformed linear matrix equation
- We should add terms for dependence on state variables
- How to extend semi-parametric log-linear models into forecasting periods?
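
A minimal sketch of "any parameter is a link-transformed linear matrix equation", assuming a log link and a dense row-major model matrix. The function name `loglinear_param` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One row of X per time step, one column per coefficient. The parameter at
// each time step is exp(X[t] . beta), i.e. the inverse log link applied to
// the linear predictor.
std::vector<double> loglinear_param(const std::vector<std::vector<double>>& X,
                                    const std::vector<double>& beta) {
    std::vector<double> theta;
    theta.reserve(X.size());
    for (const auto& row : X) {
        assert(row.size() == beta.size());
        double eta = 0.0;
        for (std::size_t j = 0; j < beta.size(); ++j) eta += row[j] * beta[j];
        theta.push_back(std::exp(eta));  // inverse of the log link
    }
    return theta;
}
```

Under this formulation the existing piece-wise model corresponds to indicator columns in `X`, while spline bases or other covariates would simply be different columns.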
- Process Error (Sim ProcErr and Calib ProcErr)
- Main existing pain-points:
- Process error has important dynamical consequences for many models, but is not currently possible
- Details:
- Should be easy to add to simulations, but it will be a little more challenging to include with calibrations
- This will need to plug in nicely with log-linear models, where we can decompose the model matrix into a fixed-effect and random-effect component
- The random-effect component will naturally make a fairly wide range of process error models available including
- Model Products
- Main existing pain-points:
- Model definition might be currently harder than it needs to be, given the lack of a formal but straightforward interface for combining sub-models
- Details:
- Model Specification as Products of Sub-Models
- Multiply sub-models (e.g. `SEIR * vax_status`)
- Currently possible in a lightweight way using string manipulation of names, so it is possible that we just need a better-documented set of examples
- Variation of parameters over sub-models
- i.e. like params_timevar but over other model categories
- e.g. transmission rate varies over vaccination status
- We have detailed notes that include a proposal
- ODE Solvers
- Main existing pain-points:
- Within-time-step variation can have important dynamical consequences in some cases, but this is not currently possible
- Details:
- Runge-Kutta-4 should be straightforward
- Is the boxcar model related to this?
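
A minimal sketch of a single classical Runge-Kutta-4 step, assuming the model can expose its gradient as a function of time and state. The `State` and `Gradient` aliases and the `rk4_step` name are hypothetical, not existing engine types.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using State = std::vector<double>;
using Gradient = std::function<State(double, const State&)>;

// Element-wise y + a * k.
static State axpy(const State& y, double a, const State& k) {
    assert(y.size() == k.size());
    State out(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) out[i] = y[i] + a * k[i];
    return out;
}

// One classical RK4 step of size h for dy/dt = f(t, y).
State rk4_step(const Gradient& f, double t, const State& y, double h) {
    const State k1 = f(t, y);
    const State k2 = f(t + 0.5 * h, axpy(y, 0.5 * h, k1));
    const State k3 = f(t + 0.5 * h, axpy(y, 0.5 * h, k2));
    const State k4 = f(t + h, axpy(y, h, k3));
    State out(y.size());
    for (std::size_t i = 0; i < y.size(); ++i)
        out[i] = y[i] + h / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
    return out;
}
```

The four gradient evaluations inside each step are what capture within-time-step variation that a single Euler update misses.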
- R0, Gbar, r
- Main existing pain-points:
- Applying default parameters to calibration/forecasting scenarios can lead to bad results, and fix-par functionality provides some protection
- Details:
- R0, Gbar, r for general models (next-generation matrix?)
- Priors on these aggregate summarizing parameters
- Generic Time Step Units
- Main existing pain-points:
- The natural time-step is not always one day, but this is currently a hard requirement
- Details:
- Each step always means one day
- But more often we only have data at less granular time scales
- Flows without per-capita Rates
- Main existing pain-points:
- Some flows that we encounter are unnatural or impossible to specify as per capita rates (e.g. Vital dynamics with waning; vaccination administration)
- Details:
- e.g. vaccination
- e.g. for convenience when the per-capita denominator is the sum of multiple boxes, as it often is with birth rates
- This is a convenience issue for birth rates but becomes a necessity when both birth and waning are in the model, because then waning needs to have an associated outflow but not birth -- and the engine has no ability to restrict outflow from some states back to S but not from others
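
A minimal sketch of how absolute flows and flows without an outflow could sit alongside per-capita flows, assuming a hypothetical `Flow` record and `apply_flows` function. This is one possible design, not the proposed interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class FlowKind { PerCapita, Absolute };

// Hypothetical flow description; none of these names come from the package.
struct Flow {
    std::size_t from;   // source compartment
    std::size_t to;     // destination compartment
    double rate;        // per-capita rate, or absolute amount per unit time
    FlowKind kind;
    bool has_outflow;   // false for births: nothing is removed from `from`
};

// Apply all flows over one step of size dt, using the state at the start of
// the step to compute every amount.
std::vector<double> apply_flows(const std::vector<double>& x,
                                const std::vector<Flow>& flows, double dt) {
    std::vector<double> out = x;
    for (const Flow& f : flows) {
        assert(f.from < x.size() && f.to < x.size());
        const double amount = (f.kind == FlowKind::PerCapita)
                                  ? f.rate * x[f.from] * dt
                                  : f.rate * dt;
        out[f.to] += amount;
        if (f.has_outflow) out[f.from] -= amount;
    }
    return out;
}
```

With this layout, birth is an absolute inflow to S with `has_outflow = false`, vaccination is an absolute flow with an outflow, and waning stays an ordinary per-capita flow back to S.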
- MCMC
- Main existing pain-points:
- I'm not sure that there are any existing pain-points
- Details:
- tmbstan vs rstan engine as an alternative to tmb
- Probably should start with tmbstan as we already have a prototype
- Uneven Lagged Diff
- Main existing pain-points:
- Reports are not always provided every `n` days, and so the lagged differencing needs to account for variable `n`
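
A minimal sketch of lagged differencing with a variable lag, assuming a cumulative series indexed by time step and a strictly increasing list of report indices. The function name `uneven_lagged_diff` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Difference a cumulative series between consecutive report times, so each
// output element covers however many steps elapsed since the previous report.
std::vector<double> uneven_lagged_diff(const std::vector<double>& cumulative,
                                       const std::vector<std::size_t>& report_idx) {
    std::vector<double> out;
    for (std::size_t k = 1; k < report_idx.size(); ++k) {
        assert(report_idx[k] > report_idx[k - 1]);   // strictly increasing
        assert(report_idx[k] < cumulative.size());   // within the simulation
        out.push_back(cumulative[report_idx[k]] - cumulative[report_idx[k - 1]]);
    }
    return out;
}
```

A fixed lag of `n` is the special case where consecutive report indices always differ by `n`.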
- Technical Debt
- Modularization & Software Stability
- Main existing pain-points:
- Too many parts of the codebase need to be modified when adding new features
- The Clean up Existing Features aspect of technical debt grows faster when Modularization is insufficient
- Details:
- No new 'user' functionality
- After all I've learned, redesign the codebase so that it is easier to extend in the future
- Big value for adoption and community growth
- C++ modularity versus R-side modularity -- which is more important?
- Design principles
- Single responsibility -- each function should have one reason to change
- Open-closed -- functions should be open for extension but closed for modification
- Substitution -- alternative versions of a method should have the same input and output types, but the computation differs
- Interface segregation -- interfaces should not depend on methods that they do not use, and therefore we should tend to have many use-case focused interfaces rather than few general-purpose interfaces
- Dependency inversion -- interfaces should depend on abstract function definitions, not concrete implementations
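
A minimal sketch of the substitution and dependency-inversion principles on the C++ side, assuming a hypothetical `Stepper` interface for the toy equation dx/dt = -rate * x. The class and function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Abstract interface: the simulation loop depends only on this.
struct Stepper {
    virtual ~Stepper() = default;
    // Advance x by one step of size dt for dx/dt = -rate * x.
    virtual double step(double x, double rate, double dt) const = 0;
};

// Two interchangeable implementations with the same input and output types.
struct EulerStepper : Stepper {
    double step(double x, double rate, double dt) const override {
        return x * (1.0 - rate * dt);
    }
};

struct ExactStepper : Stepper {
    double step(double x, double rate, double dt) const override {
        return x * std::exp(-rate * dt);
    }
};

// Closed for modification: adding a new stepper does not touch this function.
double simulate(const Stepper& stepper, double x0, double rate, double dt, int n_steps) {
    assert(n_steps >= 0);
    double x = x0;
    for (int i = 0; i < n_steps; ++i) x = stepper.step(x, rate, dt);
    return x;
}
```

The same pattern could apply on the R side with functions that share a signature, which is part of what the C++ versus R modularity question has to weigh.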
- Clean up Existing Features
- Main existing pain-points:
- The list is growing, which increases developer stress
- Details:
- Enforce Ordering of Model Definition Functions
- Ensemble Forecasting
- Choose any combination of parameter uncertainty, observation error, and process error (when process error is possible)
- Flexibly omit error for particular time-series or omit uncertainty for certain parameters
- Ensemble forecasting should run faster in loops within simulation macros on the C++ side, as opposed to generating samples from the distribution of parameters on the R side and then passing back to C++
- Smoothing out the empirical quantiles so that forecast envelopes are not as bumpy
- Symbolic Matrix Algebra Engine
- Poorly documented
- If we do a cleanup of the Arithmetic Engine then the matrix stuff should be done at that time; if we do not, then the matrix stuff needs to be done sooner
- Make State Cleanup
- Option to not use eigenvector and just populate the initial state with elements of the parameter vector
- More intuitive interface, or at least detailed documentation
- Condensation
- Gaussian convolution -- make sure that it is robust when sd is large (set a threshold for an error -- in general we need this for all convolutions)
- Mass loss problem: the q-vector that we choose will cause us to lose mass at the tails
- Could renormalize (probably should)
- Take the difference of the cumulative convolution distribution of the two end-points, and if this is above a threshold then throw an error/warn/fix??
- We could also use this approach to check for negativity
- Cumulative sums in condensation on the C++ side
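
A minimal sketch of the mass-loss check and renormalization described for Gaussian convolution, assuming the kernel is discretized into `q` unit-width bins starting at zero. The function names and the choice of binning are assumptions, not the current implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Normal CDF via the complementary error function.
inline double normal_cdf(double x, double mean, double sd) {
    return 0.5 * std::erfc(-(x - mean) / (sd * std::sqrt(2.0)));
}

// Hypothetical kernel builder: weight k is the Gaussian mass on [k, k + 1)
// for k = 0, ..., q - 1. The captured mass is the CDF difference between the
// two end-points; if more than `tol` is lost at the tails an error is thrown,
// otherwise the weights are renormalized to sum to one.
std::vector<double> gaussian_kernel(double mean, double sd, std::size_t q, double tol) {
    assert(sd > 0.0 && q > 0);
    const double captured =
        normal_cdf(static_cast<double>(q), mean, sd) - normal_cdf(0.0, mean, sd);
    if (1.0 - captured > tol)
        throw std::runtime_error("Gaussian kernel loses too much mass at the tails");
    std::vector<double> w(q);
    for (std::size_t k = 0; k < q; ++k)
        w[k] = (normal_cdf(static_cast<double>(k + 1), mean, sd) -
                normal_cdf(static_cast<double>(k), mean, sd)) / captured;
    return w;
}
```

Whether a large `sd` should produce an error, a warning, or a silent renormalization is the open question above; the sketch throws only to make the threshold visible.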
- Ability to fit prior hyperparameters
- Guides
- Main existing pain-points:
- We have a draft but reasonably helpful guide for users, but the contributors guide is totally insufficient
- Details:
- Part of the problem is that the previous technical debt items demotivate me from finishing the above
- Naming conventions for value vector?
- troubleshoot test_calibrate
- more docs for new params_timevar structure
- issue/document testLevel stuff in README
- update method for calibrations
- names? Symbol.Date values -> time_params
- REAL CONFIDENCE INTERVALS!!
- we need to get importance weighting going etc. (or MCMC, but imp weighting should work if anything will)
- instabilities etc.
- rewrite as gradient
- Runge-Kutta steps?
- fancy flows/transformations
- why are hospitalizations bouncy?
- simulate from simple model
- simulate from spline model
- test spline variations: number of spline parameters? effects of var choice on knot placement? effects of (strong) penalization on spline coeffs?
- document !!!!
- document Rt calculation (add to McMasterReport.pdf) etc.
- allow choice of Wald vs DEoptim cov matrices in forecast_ensemble; allow control of `pop_pred_samp`
- document invlink_trans better
- secret plots; writeup for secret
- save fluxes
- calibrate uncertainty
- DEoptim pop/imp wt experiments
- full Bayes?
- pop_pred_samp
- match up with EpiEstim
- LTCF: data, model compartments
- TMB core
- testing flow
- should log_mu be logit_mu instead?
- debug update/parallel DE_cores
- allow browser()/debugging in mle_fun (circumvent mle2 problems)?
- condense shouldn't be so hard-coded; refactor. Enforce conventions for var names (I vs ICU)?
- think about condensation/diff/cum (cumRep, hosp admission. Can cumulate during or post-run; need something that makes obs error correct and is efficient)
- Erlangization/chain trick
- TMB/Stan implementations
- compartmental
- renewal
- testing (distributions, compartmental)
- change/document that parameters should not contain underscores (or use a different delimiter for invlink prefixes, e.g. .. or __ or |)
- rename fix_pars to fix_params?
- print start, end dates in print.fit_pansim method
- arrange consistent factor ordering/colour palette throughout
- simulations with per-variable obs error, ¿process error?
- hospital admissions
- pipeline for copying up-to-date calibrations and data into package
- priors on r/R0/etc.?
- MCMC
- carry along last date as metadata in calibrations (what data were used?)
- process error
- implemented but not well-documented
- overdispersion (beyond demog stoch) is global, not per-transition or per-state variable ...
- allowing per-trans overdisp means thinking about including vectors in a parameter list - what if anything will this break?
- R(t)
- delay-convolve beta curve
- consider other estimation machinery?
- check hospital-only calibration (don't try to calibrate)
- Makiness:
- depend on package version
- ont_all from clean, not calib?
- dotdir
- add testing intensity?
- make `get_r` work for r<1 ...
- distributions in `calibrate`
- implement Poisson?
- allow var-specific nb params
- `calibrate`/`mle_fun`: better error handling if dates/vars don't match
- `calibrate`: substitute mle2 for optim?
- importance weights for `forecast_ensemble`?
- include hessian=TRUE by default
- process error version
- transition TODO list to GH issues??? (items marked Z have been posted)
- fix make rules
- Z write_params
- incorporate het/behavioural change: `beta*(S/N)^het_alpha*I`
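
A minimal sketch of the heterogeneity term `beta*(S/N)^het_alpha*I` as a standalone function, assuming scalar compartments. The function name `infection_flow` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Infection flow beta * (S/N)^het_alpha * I, as written in the note above.
// With het_alpha = 1 this is the usual mass-action term beta * S * I / N;
// with het_alpha > 1 the flow falls off faster as S is depleted.
inline double infection_flow(double beta, double S, double I, double N, double het_alpha) {
    assert(N > 0.0 && S >= 0.0 && S <= N);
    return beta * std::pow(S / N, het_alpha) * I;
}
```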
- multi-run plots (aggregate + bind_rows)
- improve documentation!
- add hospital admissions 'stock'
- Z translate label names for graphs ("D" -> "Dead", etc.)
- Z include start dates as metadata for params, states, runs ...
- check `do_hazard` option and switch to this as default
- un-FRY `update_foi` function
- Z `write_params`, `read_params`; keep labels, unevaluated values, etc. as attributes, get start date as attribute. (Store as separate column, or commented row, or ... ???) [store params as JSON???]
- S3 method for params
- Z pretty-printing (print rates as reciprocals?)
- full stochastic version (now does obs stoch)
- demographic: reulermultinom etc.
- random-walk of parameters
- Bayesian/importance-sampling solution
- Erlang-izing transition matrix
- age structure
- spatial structure
- incorporate testing (see `testing_flow.md`)
- porting to other platforms for speed/latent variable handling
- TMB-ize ?
- generate enums of state names and params
- changing transition matrix in place will be easier