Estimate true numbers of cases from Johns Hopkins data.
The "ascertainment" function does most of the calculation.
The first step is to estimate new infections by shifting new confirmed cases back in time by 7 days. The figure 7 comes from:
- 5.5 days infection-to-symptoms (https://docs.google.com/spreadsheets/d/1yzVSp71jiCsoD_L6sXchfg8L9OF0tGHWEnCdfvF73Ac/edit#gid=1242721729)
- 4 days symptoms-to-confirmation (also https://docs.google.com/spreadsheets/d/1yzVSp71jiCsoD_L6sXchfg8L9OF0tGHWEnCdfvF73Ac/edit#gid=1242721729)
- Subtract two days, because under exponential growth, cases that are confirmed quickly are overrepresented (5.5 + 4 - 2 = 7.5, or roughly 7)
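Here is a minimal sketch of the time shift. It assumes the Johns Hopkins series has already been reduced to a plain list of cumulative confirmed cases for one region. The names `daily_new`, `shift_back`, and `LAG_DAYS` are hypothetical and are not the notebook's `ascertainment` function.

```python
# Hypothetical sketch: estimate daily new infections by shifting daily new
# confirmations back by a fixed lag. Names are illustrative, not the notebook's.

LAG_DAYS = 7  # 5.5 + 4 - 2 = 7.5, taken as roughly 7 per the text


def daily_new(cumulative):
    """Convert a cumulative count series into daily increments.

    The first day's value is treated as entirely new.
    """
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def shift_back(series, lag=LAG_DAYS):
    """Attribute confirmations on day t + lag to infections on day t.

    Element i of the result is the estimate for day i of the input. The last
    `lag` days have no confirmations observed yet, so the result is shorter.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return list(series[lag:])
```

For cumulative counts `[1, 2, 4, ..., 512]` (doubling daily), `daily_new` gives `[1, 1, 2, ..., 256]`, and `shift_back` with a 7-day lag leaves `[64, 128, 256]`, the estimated new infections on days 0 to 2.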
We could in principle calculate this properly, as we do with time to death, but for estimating the true number of cases it matters less than getting the infection-to-death delay right.
There may also be a lag in the reporting of deaths, and this would have a significant impact on results.
The number of deaths on day
Suppose that there is a fixed ascertainment rate
Note: this is actually a lagged ascertainment rate; it asks "what fraction of today's new cases are detected 7 days later?". I don't directly estimate an instantaneous ascertainment rate, because that is very sensitive to the short-term trajectory of infections, which, as I note above, I can't yet forecast. What we can in fact calculate is:
The method doesn't work well with small numbers of deaths per day. There are probably better ways to handle this, such as variance-weighted averaging, but I simply remove days with small numbers of deaths.
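Below is a minimal sketch of the lagged-ascertainment calculation together with the small-death filter. The text does not spell out the exact model behind this step, so the one used here is an assumption: deaths on day t are `ifr` times true infections convolved with an infection-to-death delay distribution `delay_pmf`, and the time-shifted confirmations are a constant fraction `a` of true infections. `ifr`, `delay_pmf`, `min_deaths`, and every function name are hypothetical. The pooled estimate weights each day by its death count. That is roughly what inverse-variance weighting gives if deaths are Poisson and `a` is close to constant, but it is one option among several and not the author's method.

```python
# Hypothetical sketch: lagged ascertainment from deaths, dropping low-death days.
# Assumed model (not taken from the source):
#   deaths[t] ~= ifr * sum_s true_infections[t - s] * delay_pmf[s]
#   shifted_confirmed[t] = a * true_infections[t]
# so a ~= ifr * (shifted_confirmed convolved with delay_pmf)[t] / deaths[t].


def convolve_delay(series, delay_pmf):
    """out[t] = sum_s series[t - s] * delay_pmf[s], using only terms with t - s >= 0."""
    return [
        sum(series[t - s] * p for s, p in enumerate(delay_pmf) if s <= t)
        for t in range(len(series))
    ]


def lagged_ascertainment(shifted_confirmed, deaths, delay_pmf, ifr, min_deaths=5):
    """Per-day ascertainment estimates, or None on days with too few deaths.

    Both series must start on the same calendar day. min_deaths is a
    hypothetical cutoff; the text does not state the threshold it uses.
    """
    n = min(len(shifted_confirmed), len(deaths))
    conv = convolve_delay(shifted_confirmed[:n], delay_pmf)
    return [
        ifr * conv[t] / deaths[t] if deaths[t] > 0 and deaths[t] >= min_deaths else None
        for t in range(n)
    ]


def pooled_ascertainment(estimates, deaths):
    """Death-weighted mean of the per-day estimates, skipping None entries."""
    pairs = [(a, d) for a, d in zip(estimates, deaths) if a is not None]
    total = sum(d for _, d in pairs)
    if total == 0:
        return None
    return sum(a * d for a, d in pairs) / total
```

As a check, with constant true infections, `ifr=0.01`, and a fraction `a=0.2` confirmed, each kept day returns an estimate of about 0.2. Early days look unreliable because the convolution ignores infections before the start of the series. The same convolution produces the deaths, though, so in a synthetic test the two errors cancel.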
This is for use if you are trying to determine initial conditions for an SEIR model.
The infectious class is given by
and the exposed class by
Finally, the recovered class is given by
Where
To make the infectious class match standard SEIR treatment, we take
To get a recovery rate that matches our use of GLEAM, we could take
Another possibility is to take the "serial interval" distribution from https://www.imperial.ac.uk/media/imperial-college/medicine/mrc-gida/2020-03-30-COVID19-Report-13.pdf (note that this seems to be the distribution of the probability of infection, not of first infection, so it is not actually the serial interval). However, we would need to turn this non-exponential distribution into an exponential infectious class, and working out how to approximate it appropriately seems hard.
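Here is a minimal sketch of SEIR initial conditions built from a series of estimated true daily infections (for example, time-shifted confirmations divided by an ascertainment rate). It assumes exponentially distributed latent and infectious periods with rates `sigma` and `gamma`. Both are hypothetical inputs, and `sigma` must differ from `gamma`. For each day's cohort, the sketch sums the probability of still being exposed, or of being infectious, on the final day. R is whatever remains, so it counts everyone removed. This is the standard closed form for that compartment model, not necessarily the equations used here. `rate_from_mean` shows one crude way to turn a non-exponential distribution, such as the serial-interval curve, into an exponential rate: match the mean. It ignores the shape of the distribution entirely, which is exactly the difficulty noted above.

```python
import math


def rate_from_mean(pmf):
    """Crude moment match: the exponential rate whose mean equals the pmf's mean.

    pmf[k] is the (possibly unnormalised) weight on day k; its mean must be > 0.
    """
    total = sum(pmf)
    mean = sum(k * p for k, p in enumerate(pmf)) / total
    return 1.0 / mean


def seir_initial_conditions(new_infections, sigma, gamma):
    """Return (E, I, R) on the last day of `new_infections`.

    Assumes an exponential latent period (rate sigma) and an exponential
    infectious period (rate gamma), with each day's infections treated as
    occurring at the start of that day. R means removed (recovered or dead),
    so E + I + R equals total infections.
    """
    if math.isclose(sigma, gamma):
        raise ValueError("sigma and gamma must differ for this closed form")
    t_last = len(new_infections) - 1
    exposed = 0.0
    infectious = 0.0
    for s, n in enumerate(new_infections):
        age = t_last - s  # days since this cohort was infected
        exposed += n * math.exp(-sigma * age)
        infectious += n * sigma / (sigma - gamma) * (
            math.exp(-gamma * age) - math.exp(-sigma * age)
        )
    removed = sum(new_infections) - exposed - infectious
    return exposed, infectious, removed
```

A cohort infected on the final day is entirely in E. Older cohorts move into I and then into R, so the three classes always sum to cumulative infections.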