Skip to content

initial nssp covid forecast #203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 25, 2025
Merged

initial nssp covid forecast #203

merged 8 commits into from
Jun 25, 2025

Conversation

dsweber2
Copy link
Contributor

This adds nssp forecasts for covid on prod. The ensemble has something weird going on at the 0 ahead, and it still needs an actual CSV output function.

@dsweber2 dsweber2 requested a review from dshemetov June 24, 2025 22:56
as_epi_archive(compactify = TRUE) %>%
extract2("DT") %>%
# End of week to midweek correction.
mutate(time_value = time_value + 3) %>%
mutate(time_value = floor_date(time_value, "week", week_start = 4)-1) %>%
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing the previous code assumed time_value is always a Wednesday and this doesn't rely on that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also question: floor_date(week_start = 4) - 1 brings the date back to the previous Thursday and we subtract one to go to Wednesday, why not go direct to week_start = 3? Also what is time_value usually in these datasets anyway? Do we need to worry about rounding to the wrong week (previous vs current)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'm getting really tired of leaving brittle date arithmetic all over the place. Though now that you point it out, this is going the wrong direction, since it should be moving forward from Sunday to Wednesday.

week_start=4 vs week_start=3)-1, here's a comparison:

> its_been <- tibble(time_value = seq.Date(from = as.Date("2025-06-15"), to = as.Date("2025-06-22"), by = 1))
> its_been %>% mutate(rounded = floor_date(time_value, unit = "week", week_start = 4) - 1, wkday = wday(time_value))
# A tibble: 8 × 3
  time_value rounded    wkday
  <date>     <date>     <dbl>
1 2025-06-15 2025-06-11     1
2 2025-06-16 2025-06-11     2
3 2025-06-17 2025-06-11     3
4 2025-06-18 2025-06-11     4
5 2025-06-19 2025-06-18     5
6 2025-06-20 2025-06-18     6
7 2025-06-21 2025-06-18     7
8 2025-06-22 2025-06-18     1
>  its_been %>% mutate(rounded = floor_date(time_value, unit = "week", week_start = 3), wkday = wday(time_value))
# A tibble: 8 × 3
  time_value rounded    wkday
  <date>     <date>     <dbl>
1 2025-06-15 2025-06-11     1
2 2025-06-16 2025-06-11     2
3 2025-06-17 2025-06-11     3
4 2025-06-18 2025-06-18     4
5 2025-06-19 2025-06-18     5
6 2025-06-20 2025-06-18     6
7 2025-06-21 2025-06-18     7
8 2025-06-22 2025-06-18     1

week_start=4)-1 offsets where the boundary occurs by one. I shouldn't have used either of them though; what we really need is anything from Sunday through Saturday to round to Wednesday, or floor_date(time_value, unit = "week", week_start = 7) + 3:

> its_been %>% mutate(rounded = floor_date(time_value, unit = "week", week_start = 7) + 3, wkday = wday(time_value))
# A tibble: 8 × 3
  time_value rounded    wkday
  <date>     <date>     <dbl>
1 2025-06-15 2025-06-18     1
2 2025-06-16 2025-06-18     2
3 2025-06-17 2025-06-18     3
4 2025-06-18 2025-06-18     4
5 2025-06-19 2025-06-18     5
6 2025-06-20 2025-06-18     6
7 2025-06-21 2025-06-18     7
8 2025-06-22 2025-06-25     1

We really should be baking this logic into epiprocess, rather than having to re-derive it all the time.

Copy link
Contributor

@dshemetov dshemetov Jun 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea or at least document each of these brittle transformations. I barely remember any of the assumptions or goals of the +3's. I vaguely remember it being a "bump Wed to Sat" move, since our week aggregation labeled the week with the Wed contained. So your logic here focusing on Wednesday labeling is not what I expect to see.

I started writing some wrapper functions for lubridate with the rounding behavior explicitly marked at some point, but then I didn't like that all I did was just wrap the function and comment it. Maybe that's worth something though, so then we don't have to comment these edge case assumptions everywhere we use floor_date and instead comment in the wrapper docstring.

@dsweber2
Copy link
Contributor Author

dsweber2 commented Jun 25, 2025

Confirmed that extra_sources is somehow making wildly off-base predictions, so I'm just going to drop it for now. Maybe we can come back and debug it eventually; it may actually provide some different info.

Edit: But also, I had a bug related to the climate_linear ensemble

@dsweber2 dsweber2 merged commit cc54d1b into main Jun 25, 2025
1 check passed
@dshemetov dshemetov deleted the nsspForecast branch June 26, 2025 02:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants