Working on the sensitivity chapter has led me down a bit of a deep dive into using some variables from touringplans::parks_metadata_raw to better capture crowd flow at Magic Kingdom. After spending some time with it, I think I'm overloading this section of the sensitivity chapter. This started with me creating two new variables for an alternative DAG (is_weekend and is_holiday) but is now getting a bit too nuanced for this section.
I think a better approach would be to present this more complex confounding structure in the ML chapter, partly to help justify using more flexible modeling approaches. Then, we can pick up that thread later without spending so much time on the idea in the sensitivity chapter.
So, for now, I'm going to stick with the two examples above and rework this in a few months. Here's some copy and code related to that. (I think this could be expanded to be even more sophisticated, e.g., by including happenings at other parks.)
In particular, we want to capture baseline crowd flow. We'll approximate it with a few new variables: the previous day's wait time at the same hour, the number of schools in session, whether the day is a weekend or a holiday (and, for holidays, how they relate to crowd size), some variables related to events at the Magic Kingdom like fireworks and parades, and a marker of the ride capacity lost due to attraction shutdowns in the park.
Consider this expanded DAG in @fig-dag-extra-days. For simplicity, we're presenting all of these confounders as a single supernode called crowd flow. We're assuming that all of them are causes of both whether there is an Extra Magic Morning and of wait times.
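The supernode structure can be sketched with ggdag. This is a minimal sketch, not the figure code itself; the node names (emm, wait, crowd_flow) are placeholders I'm assuming here, not names used elsewhere:

```r
library(ggdag)

# Supernode DAG: "crowd flow" stands in for the full set of
# confounders listed above (school sessions, holidays, events,
# lost capacity, etc.), affecting both treatment and outcome.
crowd_flow_dag <- dagify(
  wait ~ emm + crowd_flow,
  emm ~ crowd_flow,
  exposure = "emm",
  outcome = "wait",
  labels = c(
    emm = "Extra Magic Morning",
    wait = "Posted wait time",
    crowd_flow = "Crowd flow"
  )
)

ggdag(crowd_flow_dag, text = FALSE, use_labels = "label")
```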
metadata <- parks_metadata_raw |>
  filter(year == 2018) |>
  select(
    # some of these are precision variables
    date, insession, insession_sqrt_dlr, mkevent, holiday, holidaym, mkprdday,
    mkprddt1, mkprddt2, mkfirewk, mkfiret1, mkfiret2,
    # maybe a post-outcome variable, technically. should we lag it?
    capacitylost_mk
  )
seven_dwarfs_with_days <- seven_dwarfs_train_2018 |>
  filter(wait_hour == 9) |>
  mutate(
    is_holiday = park_date %in% holidays,
    is_weekend = timeDate::isWeekend(park_date),
    prev_wait = lag(wait_minutes_posted_avg, order_by = park_date)
  ) |>
  left_join(metadata, by = c("park_date" = "date")) |>
  mutate(
    insession = parse_number(insession),
    insession_sqrt_dlr = parse_number(insession_sqrt_dlr)
  )
fit_ipw_effect(
  park_extra_magic_morning ~ park_temperature_high + park_close +
    park_ticket_season + is_weekend + insession + insession_sqrt_dlr +
    mkevent + holiday + holidaym + mkprdday + mkfirewk + capacitylost_mk,
  .data = seven_dwarfs_with_days
)
calculate_coef2 <- function(n_days_lag) {
  distinct_emm <- seven_dwarfs_with_days |>
    arrange(park_date) |>
    transmute(
      park_date,
      prev_park_extra_magic_morning = lag(park_extra_magic_morning, n = n_days_lag),
      prev_park_temperature_high = lag(park_temperature_high, n = n_days_lag),
      prev_park_close = lag(park_close, n = n_days_lag),
      prev_park_ticket_season = lag(park_ticket_season, n = n_days_lag),
      prev_is_weekend = lag(is_weekend, n = n_days_lag),
      prev_insession = lag(insession, n = n_days_lag),
      prev_insession_sqrt_dlr = lag(insession_sqrt_dlr, n = n_days_lag),
      prev_mkevent = lag(mkevent, n = n_days_lag),
      prev_holiday = lag(holiday, n = n_days_lag),
      prev_holidaym = lag(holidaym, n = n_days_lag),
      prev_mkprdday = lag(mkprdday, n = n_days_lag),
      prev_mkfirewk = lag(mkfirewk, n = n_days_lag),
      prev_capacitylost_mk = lag(capacitylost_mk, n = n_days_lag)
    )

  seven_dwarfs_with_days_lag <- seven_dwarfs_with_days |>
    left_join(distinct_emm, by = "park_date") |>
    filter(!is.na(prev_park_extra_magic_morning))

  fit_ipw_effect(
    prev_park_extra_magic_morning ~ prev_park_temperature_high + prev_park_close +
      prev_park_ticket_season + prev_is_weekend + prev_insession +
      prev_insession_sqrt_dlr + prev_mkevent + prev_holiday + prev_holidaym +
      prev_mkprdday + prev_mkfirewk + prev_capacitylost_mk,
    .data = seven_dwarfs_with_days_lag,
    .trt = "prev_park_extra_magic_morning",
    .outcome_fmla = wait_minutes_posted_avg ~ prev_park_extra_magic_morning +
      park_extra_magic_morning
  )
}
calculate_coef2(63)

coefs <- purrr::map_dbl(1:63, calculate_coef2)

ggplot(data.frame(coefs = coefs, x = 1:63), aes(x = x, y = coefs)) +
  geom_hline(yintercept = 0) +
  geom_point() +
  geom_smooth() +
  labs(
    y = "difference in wait times (minutes)\non day (i) for EMM on day (i - n)",
    x = "day (i - n)"
  )