diff --git a/episodes/simple-analysis.Rmd b/episodes/simple-analysis.Rmd
index d2cb41c8..388025a9 100644
--- a/episodes/simple-analysis.Rmd
+++ b/episodes/simple-analysis.Rmd
@@ -23,6 +23,19 @@ exercises: 10
 
 Understanding the trend in case data is crucial for various purposes, such as forecasting future case counts, implementing public health interventions, and assessing the effectiveness of control measures. By analyzing the trend, policymakers and public health experts can make informed decisions to mitigate the spread of diseases and protect public health. This episode focuses on how to perform a simple early analysis on incidence data. It uses the same dataset of **Covid-19 case data from England** that utilized it in [Aggregate and visualize](../episodes/describe-cases.Rmd) episode.
 
+::::::::::::::::::: checklist
+
+### The double-colon
+
+The double-colon `::` in R lets you call a specific function from a package without loading the entire package into the current environment.
+
+For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package.
+
+This helps us remember package functions and avoid namespace conflicts.
+
+:::::::::::::::::::
+
+
 ## Simple model
 
 Aggregated case data over a specific time unit, or incidence data, typically represent the number of cases occurring within that time frame. These data can often be assumed to follow either `Poisson distribution` or a `negative binomial (NB) distribution`, depending on the specific characteristics of the data and the underlying processes generating them. When analyzing such data, one common approach is to examine the trend over time by computing the rate of change, which can indicate whether there is exponential growth or decay in the number of cases. Exponential growth implies that the number of cases is increasing at an accelerating rate over time, while exponential decay suggests that the number of cases is decreasing at a decelerating rate.
@@ -31,24 +44,39 @@ The `i2extras` package provides methods for modelling the trend in case data, ca
 
 
 ```{r, warning=FALSE, message=FALSE}
-# loads the i2extras package, which provides methods for modeling
+# load packages which provide methods for modeling
 library("i2extras")
-# This line loads the i2extras package, which provides methods for modeling
 library("incidence2")
-# subset the covid19_eng_case_data to include only the first 3 months of data
+
+# read data from {outbreaks} package
 covid19_eng_case_data <- outbreaks::covid19_england_nhscalls_2020
+
+# subset the covid19_eng_case_data to include only the first 3 months of data
 df <- base::subset(
   covid19_eng_case_data,
   covid19_eng_case_data$date <= min(covid19_eng_case_data$date) + 90
 )
+
 # uses the incidence function from the incidence2 package to compute the
 # incidence data
-df_incid <- incidence2::incidence(df, date_index = "date", groups = "sex")
+df_incid <- incidence2::incidence(
+  df,
+  date_index = "date",
+  groups = "sex"
+)
 
 # fit a curve to the incidence data. The model chosen is the negative binomial
 # distribution with a significance level (alpha) of 0.05.
-fitted_curve_nb <- i2extras::fit_curve(df_incid, model = "negbin", alpha = 0.05)
-base::plot(fitted_curve_nb, angle = 45) + ggplot2::labs(x = "Date", y = "Cases")
+fitted_curve_nb <-
+  i2extras::fit_curve(
+    df_incid,
+    model = "negbin",
+    alpha = 0.05
+  )
+
+# plot fitted curve
+base::plot(fitted_curve_nb, angle = 45) +
+  ggplot2::labs(x = "Date", y = "Cases")
 ```
 
 
@@ -61,8 +89,13 @@ Repeat the above analysis using Poisson distribution?
 :::::::::::::::::::::::: solution
 
 ```{r, warning=FALSE, message=FALSE}
-fitted_curve_poisson <- i2extras::fit_curve(df_incid, model = "poisson",
-                                            alpha = 0.05)
+fitted_curve_poisson <-
+  i2extras::fit_curve(
+    x = df_incid,
+    model = "poisson",
+    alpha = 0.05
+  )
+
 base::plot(fitted_curve_poisson, angle = 45) +
   ggplot2::labs(x = "Date", y = "Cases")
 ```
@@ -99,6 +132,7 @@ The **Peak time ** is the time at which the highest number of cases is observed
 ```{r, message=FALSE, warning=FALSE}
 peaks_nb <- i2extras::estimate_peak(df_incid, progress = FALSE) |>
   subset(select = -c(count_variable, bootstrap_peaks))
+
 base::print(peaks_nb)
 ```
 
@@ -109,10 +143,17 @@ A moving or rolling average calculates the average number of cases within a spec
 
 ```{r, warning=FALSE, message=FALSE}
 library("ggplot2")
+
 moving_Avg_week <- i2extras::add_rolling_average(df_incid, n = 7L)
+
 base::plot(moving_Avg_week, border_colour = "white", angle = 45) +
-  ggplot2::geom_line(ggplot2::aes(x = date_index, y = rolling_average,
-                                  color = "red")) +
+  ggplot2::geom_line(
+    ggplot2::aes(
+      x = date_index,
+      y = rolling_average,
+      color = "red"
+    )
+  ) +
   ggplot2::labs(x = "Date", y = "Cases")
 ```
 
@@ -126,9 +167,19 @@ Compute and visualize the monthly moving average of cases on `df_incid`?
 
 ```{r, warning=FALSE, message=FALSE}
 moving_Avg_mont <- i2extras::add_rolling_average(df_incid, n = 30L)
-base::plot(moving_Avg_mont, border_colour = "white", angle = 45) +
-  ggplot2::geom_line(ggplot2::aes(x = date_index, y = rolling_average,
-                                  color = "red")) +
+
+base::plot(
+  moving_Avg_mont,
+  border_colour = "white",
+  angle = 45
+) +
+  ggplot2::geom_line(
+    ggplot2::aes(
+      x = date_index,
+      y = rolling_average,
+      color = "red"
+    )
+  ) +
   ggplot2::labs(x = "Date", y = "Cases")
 ```
 