Docs overhaul #431
base: dev
Conversation
/preview-docs

🚀 Deployed on https://67e478bce7f35616d80fb137--epipredict.netlify.app
Our setup is generating docs in … Also, FYI: the bot edits its own comment for links. Each preview is a separate link, and the links stick around for about 90 days. You can see the previous links in the comment edit history.

Edit: this has been fixed on main, so this is no longer necessary.
Force-pushed from 8044b98 to d35363e
/preview-docs

So something weird is happening with the plot for … I added an option to replace the data for the autoplot, so you can compare with new data instead.
Draft of the getting started vignette is ready; moving on to a draft of the "guts" page (the name is a placeholder), which is an overview of creating workflows by hand.
/preview-docs |
Including 0.5 in the user's selection sounds simple and reasonable to me. They can always filter out what they don't want.
/preview-docs |
/preview-docs

@nmdefries this also updates the backtesting vignette; I'm dropping the Canadian example because it basically had no revisions.
/preview-docs |
``` r
two_week_ahead <- arx_forecaster(
  covid_case_death_rates,
four_week_ahead <- arx_forecaster(
```
issue: This gives me an error:

Error in prep.epi_recipe(blueprint$recipe, training = training, fresh = blueprint$fresh, :
  object 'validate_training_data' not found

I'm using the dev version of epipredict, installed today.
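For reproducibility, one way to install the dev branch being described (assuming {pak} is available; this setup step is not quoted from the thread):

``` r
# Install the dev branch of epipredict from GitHub.
pak::pak("cmu-delphi/epipredict@dev")
```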
That's internal to {recipes} (unexported). The DESCRIPTION may need to depend on a higher version; the {recipes} NEWS.md suggests 1.1.0, but it's worth checking.
(The long-term goal is to remove any dependence on internal functions from other packages.)
recipes 1.1.1 worked, along with an update to hardhat.
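For concreteness, a sketch of recording that floor in the DESCRIPTION (assumes {usethis}; 1.1.1 is the version reported to work in this thread, while no floor was given for hardhat):

``` r
# Bump the minimum {recipes} version in DESCRIPTION;
# this produces the line:  Imports: recipes (>= 1.1.1)
usethis::use_package("recipes", min_version = "1.1.1")
```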
Comments/questions so far. I just need to finish the custom_workflows vignette.
vignettes/backtesting.Rmd
Outdated
As truth data, we'll use `epix_as_of()` to generate a snapshot of the archive at the last date[^1].
question: why are we comparing our forecast to the finalized value?
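For context, a minimal sketch of the snapshot step in question (the archive name `percent_cli_archive` is an assumption, not taken from the vignette):

``` r
library(epiprocess)

# "Finalized" truth data: a snapshot of the archive as of the most recent
# version it contains.
truth_data <- epix_as_of(percent_cli_archive, percent_cli_archive$versions_end)
```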
data = percent_cli_data |> filter(geo_value == geo_choose),
aes(x = time_value, y = percent_cli, color = factor(version)),
inherit.aes = FALSE, na.rm = TRUE
issue: On the version-faithful plot, the "finalized" line is bold and pink, vs. gray on the version-unfaithful plot. This makes them confusing to compare.

The version-faithful finalized data also doesn't cover the full time period.
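A minimal sketch of one way to make the panels match (data and column names are reused from the plotting code quoted above; the styling choices are hypothetical):

``` r
library(dplyr)
library(ggplot2)

# Draw the finalized series with the same gray color and line width in both
# panels so the version-faithful and un-faithful plots compare cleanly.
finalized <- percent_cli_data |>
  filter(geo_value == geo_choose, version == max(version))

ggplot(finalized, aes(x = time_value, y = percent_cli)) +
  geom_line(color = "grey50", linewidth = 0.5, na.rm = TRUE)
```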
vignettes/custom_epiworkflows.Rmd
Outdated
``` r
library(epidatr)
```

To get a better handle on custom `epi_workflow()`s, let's recreate and then
suggestion: please explain what an `epi_workflow` is and why you'd want to use it.
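For what such an explanation might cover, a minimal sketch of the object in question (the steps, layers, and engine here are illustrative choices, not the vignette's exact pipeline):

``` r
library(epipredict)

# An epi_workflow bundles preprocessing (an epi_recipe), a model spec, and
# postprocessing (a frosting) into a single object you can fit and predict.
r <- epi_recipe(covid_case_death_rates) %>%
  step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 14) %>%
  step_epi_naomit()

f <- frosting() %>%
  layer_predict() %>%
  layer_add_forecast_date() %>%
  layer_add_target_date()

wf <- epi_workflow(r, linear_reg(), f) %>%
  fit(covid_case_death_rates)
```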
versions, you should assume performance is worse than what the test would otherwise suggest.

[^4]: Until we have a time machine
praise: 😆
The version-faithful and un-faithful forecasts look moderately similar except for the 1-day horizons (although neither approach produces amazingly accurate forecasts).

In the version-faithful case for California, the March 2021 forecast (turquoise) starts at a value just above 10, which lines up well with the reported values leading up to that forecast. The measured and forecasted trends are also concordant (both increasing moderately fast).

Because the data for this time period was later adjusted down with a decreasing trend, the March 2021 forecast looks quite bad compared to finalized data.

The equivalent version-unfaithful forecast starts at a value of 5, which is in line with the finalized data but would have been out of place compared to the version data.

The Canadian example being dropped in this PR:

### Example using case data from Canada

<details>
<summary>Data and forecasts. Similar to the above.</summary>

By leveraging the flexibility of `epiprocess`, we can apply the same techniques to data from other sources. Since some collaborators are in British Columbia, Canada, we'll do essentially the same thing for Canada as we did above.

The [COVID-19 Canada Open Data Working Group](https://opencovid.ca/) collects daily time series data on COVID-19 cases, deaths, recoveries, testing and vaccinations at the health region and province levels. Data are collected from publicly available sources such as government datasets and news releases. Unfortunately, there is no simple versioned source, so we have created our own from the GitHub commit history.

First, we load versioned case rates at the provincial level. After converting these to 7-day averages (due to highly variable provincial reporting mismatches), we then convert the data to an `epi_archive` object, and extract the latest version from it. Finally, we run the same forecasting exercise as for the American data, but here we compare the forecasts produced from using simple linear regression with those from using boosted regression trees.

```{r get-can-fc, warning = FALSE}
aheads <- c(7, 14, 21, 28)

canada_archive <- can_prov_cases
canada_archive_faux <- epix_as_of(canada_archive, canada_archive$versions_end) %>%
  mutate(version = time_value) %>%
  as_epi_archive()

# This function will add the 7-day average of the case rate to the data
# before forecasting.
smooth_cases <- function(epi_df) {
  epi_df %>%
    group_by(geo_value) %>%
    epi_slide_mean("case_rate", .window_size = 7, na.rm = TRUE, .suffix = "_{.n}dav")
}

forecast_dates <- seq.Date(
  from = min(canada_archive$DT$version),
  to = max(canada_archive$DT$version),
  by = "1 month"
)
```

</details>
comment: I heavily modified this section to focus more on why/how version-faithful and un-faithful forecasts differ, and to "advertise" backtesting as a useful tool. The previous blurb made it sound like version un-faithful forecasts performed better, which they do on finalized data of course, but that is not what we're trying to say here.

Please check over the new version @dsweber2 to see if I left anything out.
vignettes/custom_epiworkflows.Rmd
Outdated
So there are 6 steps we will need to recreate.
One thing to note about the extracted recipe is that it has already been
suggestion: confused about `recipe` vs `workflow` (this is probably not the right spot to explain). We should probably link to the {recipes} documentation of `recipe`s so that we don't have to get into a lot of detail here.
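For the distinction at issue, a minimal sketch (the dataset name is reused from earlier in this thread; the step choice is illustrative):

``` r
library(epipredict)

# A recipe holds only the preprocessing specification...
r <- epi_recipe(covid_case_death_rates) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14))

# ...while a workflow wraps that recipe together with a model spec.
wf <- epi_workflow(r, linear_reg())

# The recipe can be pulled back out of the workflow:
workflows::extract_preprocessor(wf)
```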
engines (such as `quantile_reg()`) are

- `layer_quantile_distn()`: adds the specified quantiles.
  If they differ from the ones actually fit, they will be interpolated and/or
suggestion: please clarify: differ in what way?
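For reference, a sketch of the layer being discussed (the quantile levels are illustrative):

``` r
library(epipredict)

# Request quantile levels in postprocessing; levels not actually fit by the
# engine get interpolated and/or extrapolated from the fitted ones.
f <- frosting() %>%
  layer_predict() %>%
  layer_quantile_distn(quantile_levels = c(0.1, 0.5, 0.9))
```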
vignettes/custom_epiworkflows.Rmd
Outdated
## Predicting

To do a prediction, we need to first narrow the dataset down to the relevant
suggestion: clarify why they need to be removed. Won't unused obs just be ignored?
vignettes/custom_epiworkflows.Rmd
Outdated
The resulting tibble is 800 rows long, however.
This produces forecasts for not just the actual `forecast_date`, but for every
question: where did we set the `forecast_date`? How does the workflow know what it is? What if we want to use a different forecast date? Do we have to re-define and re-compile the whole workflow?
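One place a forecast date can be pinned down in epipredict is the frosting; a sketch (the date is illustrative, and whether the vignette relies on this or on inference from the test data isn't shown in this hunk):

``` r
library(epipredict)

# If forecast_date is NULL, layer_add_forecast_date() infers it from the test
# data; setting it explicitly pins the date without rebuilding the recipe.
f <- frosting() %>%
  layer_predict() %>%
  layer_add_forecast_date(forecast_date = as.Date("2021-03-01"))
```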
This can be useful for cases where `get_test_data()` doesn't pull sufficient data.
issue: This sounds suspicious. If `get_test_data()` decided there wasn't sufficient data, it sounds like the `predict` -> `filter` approach is doing something wrong (predicting with insufficient data).
I assumed that the `predict` -> `filter` approach would return the same predictions for the `forecast_date`, but it sounds like not.
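For clarity, a sketch of the two approaches being contrasted (`wf`, `r`, and the dataset are reused from earlier sketches in this thread, so hypothetical here):

``` r
library(dplyr)
library(epipredict)

# Approach 1: get_test_data() assembles exactly the rows the recipe needs.
preds_a <- predict(wf, get_test_data(r, covid_case_death_rates))

# Approach 2: predict on all the data, then keep only the latest time value,
# standing in for the forecast date.
preds_b <- predict(wf, covid_case_death_rates) %>%
  filter(time_value == max(time_value))
```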
Checklist

Please:

- Request a review from one of the current epipredict main reviewers: dajmcdon.
- Make sure to bump the version number in `DESCRIPTION` and `NEWS.md`. Always increment the patch version number (the third number), unless you are making a release PR from dev to main, in which case increment the minor version number (the second number).
- Describe changes made in `NEWS.md`, making sure breaking changes (backwards-incompatible changes to the documented interface) are noted. Collect the changes under the next release number (e.g. if you are on 0.7.2, then write your changes under the 0.8 heading).
- Consider pinning the `epiprocess` version in the `DESCRIPTION` file if
  - anticipating breaking changes in `epiprocess` soon
  - working on both `epipredict` and `epiprocess` together
Change explanations for reviewer

Draft ready for review:

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch:

- `symmetrize` for residuals #264
- `nafill_buffer` usage #320