-
Notifications
You must be signed in to change notification settings - Fork 0
initial nssp covid forecast #203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
R/aux_data_utils.R
Outdated
as_epi_archive(compactify = TRUE) %>% | ||
extract2("DT") %>% | ||
# End of week to midweek correction. | ||
mutate(time_value = time_value + 3) %>% | ||
mutate(time_value = floor_date(time_value, "week", week_start = 4)-1) %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing the previous code assumed time_value is always a Wednesday and this doesn't rely on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also question: floor_date(week_start = 4) - 1
brings the date back to the previous Thursday and we subtract one to go to Wednesday, why not go direct to week_start = 3
? Also what is time_value usually in these datasets anyway? Do we need to worry about rounding to the wrong week (previous vs current)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I'm getting really tired of leaving brittle date arithmetic all over the place. Though now that you point it out, this is going the wrong direction, since it should be moving forward from Sunday to Wednesday.
week_start=4
vs week_start=3)-1
, here's a comparison:
> its_been <- tibble(time_value = seq.Date(from = as.Date("2025-06-15"), to = as.Date("2025-06-22"), by = 1))
> its_been %>% mutate(rounded = floor_date(time_value, unit = "week", week_start = 4) - 1, wkday = wday(time_value))
# A tibble: 8 × 3
time_value rounded wkday
<date> <date> <dbl>
1 2025-06-15 2025-06-11 1
2 2025-06-16 2025-06-11 2
3 2025-06-17 2025-06-11 3
4 2025-06-18 2025-06-11 4
5 2025-06-19 2025-06-18 5
6 2025-06-20 2025-06-18 6
7 2025-06-21 2025-06-18 7
8 2025-06-22 2025-06-18 1
> its_been %>% mutate(rounded = floor_date(time_value, unit = "week", week_start = 3), wkday = wday(time_value))
# A tibble: 8 × 3
time_value rounded wkday
<date> <date> <dbl>
1 2025-06-15 2025-06-11 1
2 2025-06-16 2025-06-11 2
3 2025-06-17 2025-06-11 3
4 2025-06-18 2025-06-18 4
5 2025-06-19 2025-06-18 5
6 2025-06-20 2025-06-18 6
7 2025-06-21 2025-06-18 7
8 2025-06-22 2025-06-18 1
week_start=4)-1
offsets where the boundary occurs by one. I shouldn't have used either of them though; what we really need is anything from Sunday through Saturday to round to Wednesday, or floor_date(time_value, unit = "week", week_start = 7) + 3
:
> its_been %>% mutate(rounded = floor_date(time_value, unit = "week", week_start = 7) + 3, wkday = wday(time_value))
# A tibble: 8 × 3
time_value rounded wkday
<date> <date> <dbl>
1 2025-06-15 2025-06-18 1
2 2025-06-16 2025-06-18 2
3 2025-06-17 2025-06-18 3
4 2025-06-18 2025-06-18 4
5 2025-06-19 2025-06-18 5
6 2025-06-20 2025-06-18 6
7 2025-06-21 2025-06-18 7
8 2025-06-22 2025-06-25 1
We really should be baking this logic into epiprocess, rather than having to re-derive it all the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea or at least document each of these brittle transformations. I barely remember any of the assumptions or goals of the +3's. I vaguely remember it being a "bump Wed to Sat" move, since our week aggregation labeled the week with the Wed contained. So your logic here focusing on Wednesday labeling is not what I expect to see.
I started writing some wrapper functions for lubridate with the rounding behavior explicitly marked at some point, but then I didn't like that all I did was just wrap the function and comment it. Maybe that's worth something though, so then we don't have to comment these edge case assumptions everywhere we use floor_date and instead comment in the wrapper docstring.
Confirmed that extra_sources is somehow making wildly off-base predictions, so I'm just going to drop it for now. Maybe we can come back and debug it eventually; it may actually provide some different info. Edit: But also, I had a bug related to the climate_linear ensemble |
…ata timestamp checker
This adds nssp forecasts for covid on prod. The ensemble has something weird going on at the 0 ahead, and it still needs an actual CSV output function.