From 19bf82b8f08b8d319f4fe45c24fbfc7740c73213 Mon Sep 17 00:00:00 2001
From: Degoot-AM <degoot@aims.ac.za>
Date: Mon, 23 Sep 2024 11:08:23 +0000
Subject: [PATCH 1/5] add minimal {simulist} linelist configuration

---
 episodes/describe-cases.Rmd                 | 107 ++++-------------
 renv/profiles/lesson-requirements/renv.lock | 124 ++++++++++----------
 2 files changed, 89 insertions(+), 142 deletions(-)

diff --git a/episodes/describe-cases.Rmd b/episodes/describe-cases.Rmd
index 330d8205..3f1f9b9c 100644
--- a/episodes/describe-cases.Rmd
+++ b/episodes/describe-cases.Rmd
@@ -24,18 +24,18 @@ exercises: 10
 In an analytic pipeline, exploratory data analysis (EDA) is an important step before formal modelling. EDA helps 
 determine relationships between variables and summarize their main characteristics often by means of data visualization. 
 
-This episode focuses on  EDA of outbreaks and epidemic data, and how to achieved that using a couples of handy `R` 
-packages. A key observation in EDA of epidemic analysis is capturing the relationship between time and the number of 
-reported cases, spanning various categories (confirmed, hospitalized, deaths, and recoveries), locations, and other 
-demographic factors such as gender, age, etc.  
+This episode focuses on EDA of outbreak data using a few essential R packages. 
+A key aspect of EDA in epidemic analysis is identifying the relationship between time and the  observed epidemic outcome, such as confirmed cases, hospitalizations, deaths, and recoveries across different locations and demographic factors, including gender, age, and more.
 
-Let's start by loading the package `{incidence2}` to aggregate linelist data by groups and visualize epicurves. We'll use `{simulist}` to simulate outbreak data, `{epiparameter}` to access delays for this simulation, and `{tracetheme}` for complementary figure formatting. We'll use the pipe `%>%` to connect some of their functions, including others from the packages `{dplyr}` and `{ggplot2}`, so let's also call to the tidyverse package:
+Let's start by loading the package `{incidence2}` to aggregate linelist data by groups and visualize epicurves.
+ We'll use `{simulist}` to simulate outbreak data,  and `{tracetheme}` for complementary figure formatting.
+ We'll use the pipe `%>%` to connect some of their functions, including others from the packages `{dplyr}` and 
+ `{ggplot2}`, so let's also load the tidyverse package:
 
 ```{r,eval=TRUE,message=FALSE,warning=FALSE}
 # Load packages
 library(incidence2) # to aggregate and visualise
 library(simulist) # to simulate linelist data
-library(epiparameter) # to access delays
 library(tracetheme) # for figure formatting
 library(tidyverse) # for {dplyr} and {ggplot2} functions and the pipe %>%
 ```
@@ -47,7 +47,6 @@ library(tidyverse) # for {dplyr} and {ggplot2} functions and the pipe %>%
 The double-colon `::` in R let you call a specific function from a package without loading the entire package into the current environment. 
 
 For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package.
-
 This help us remember package functions and avoid namespace conflicts.
 
 :::::::::::::::::::
@@ -56,83 +55,30 @@ This help us remember package functions and avoid namespace conflicts.
 ## Synthetic outbreak data
 
 To illustrate the process of conducting EDA on outbreak data, we will generate a line list 
-for a hypothetical Ebola outbreak utilizing the `{simulist}` package. This line list dataset offers individual-level 
-information about the outbreak. For our simulation, we will assume that the dynamics of this outbreak are influenced by 
-several factors: the contact distribution (average number of contacts for an infected case), distribution of contact 
-intervals (time period between contacts), and the delay distributions of onset to hospitalization and onset to death. 
-These latter distributions can be sourced from literature and are conveniently available in the `{epiparameter}` 
-package, see the below code chunk.
+for a hypothetical disease outbreak utilizing the `{simulist}` package. `{simulist}` generates simulated outbreak data according to a given configuration.
+Its minimal configuration can generate a  linelist as shown in the below code chunk 
 
 ```{r, warning=FALSE, message=FALSE}
-# Define contact distribution
-contact_dist <- epiparameter::epidist(
-  disease = "Ebola",
-  epi_dist = "contact distribution",
-  prob_distribution = "pois",
-  prob_distribution_params = c(mean = 2)
-)
+set.seed(1)
+sim_data <- simulist::sim_linelist(outbreak_size = c(1000, 2000))
+head(sim_data)
+```
 
-# Define  distribution for interval between contact
-cont_interval <- epiparameter::epidist(
-  disease = "Ebola",
-  epi_dist = "contact interval",
-  prob_distribution = "gamma",
-  prob_distribution_params = c(shape = 1, scale = 1)
-)
+ This linelist dataset offers individual-level information about the outbreak. 
 
-# Define onset to hospitalized distribution
-onset_to_hosp <- contact_dist <- epiparameter::epidist(
-  disease = "Ebola",
-  epi_dist = "onset to hospitalisatio",
-  prob_distribution = "pois",
-  prob_distribution_params = c(mean = 7)
-)
+::::::::::::::::::: spoiler
 
-# get onset to death from {epiparameter} database
-onset_to_death <- epiparameter::epidist_db(
-  disease = "Ebola",
-  epi_dist = "onset to death",
-  single_epidist = TRUE
-)
+## Additional Resources on Outbreak Data
+
+ - This is the default configuration of `{simulist}`. Should you want to know more about its functionalities,
+check [middle](https://github.com/epiverse-trace/tutorials-middle/) and [late](https://epiverse-trace.github.io/tutorials-late/) tutorials.
+
+ - You can find more information at the [`{outbreak}` documentation](https://outbreak-info.github.io/R-outbreak-info/)
+
+:::::::::::::::::::
 
-# Define distribution for infectious period
-infect_period <- epiparameter::epidist(
-  disease = "Ebola",
-  epi_dist = "Infectious period",
-  prob_distribution = "gamma",
-  prob_distribution_params = c(shape = 1, scale = 1)
-)
-```
 
-Additionally, we assume that the outbreak started at the beginning of 2023, is highly contagious with a probability of 
-infection of $80\%$, and its minimum and maximum sizes are 1000 and 10,000, respectively. Combining these assumptions with 
-the mentioned distributions, the code chunk below generates a simulated line list:
 
-```{r, warning=FALSE, message=FALSE}
-# Set seed to 1 to  have the same results
-base::set.seed(1)
-
-# Generate simulation data using the defined distribution.
-linelist <- simulist::sim_linelist(
-  contact_dist,
-  infect_period,
-  prob_infect = 0.6,
-  onset_to_hosp,
-  onset_to_death,
-  hosp_risk = 0.2,
-  hosp_death_risk = 0.5,
-  non_hosp_death_risk = 0.05,
-  outbreak_start_date = as.Date("2023-01-01"),
-  outbreak_size = c(1000, 10000),
-  population_age = c(1, 90),
-  case_type_probs = c(suspected = 0.2, probable = 0.1, confirmed = 0.7),
-  config = simulist::create_config()
-) %>%
-  dplyr::as_tibble() # for a simple data frame output
-
-# View first few rows of the generated data
-linelist
-```
 ## Aggregating
 
 Downstream analysis involves working with aggregated data rather than individual cases. This requires grouping linelist 
@@ -144,7 +90,7 @@ simulated  Ebola `linelist` data based on the  date of onset.
 ```{r, message=FALSE, warning=FALSE}
 # create incidence object by aggregating case data  based on the date of onset
 dialy_incidence_data <- incidence2::incidence(
-  linelist,
+  sim_data,
   date_index = "date_onset",
   interval = 1
 )
@@ -158,7 +104,7 @@ more factors. Below is a code snippet demonstrating weekly cases grouped by the
 ```{r}
 # Grouping data by week
 weekly_incidence_data <- incidence2::incidence(
-  linelist,
+  sim_data,
   date_index = "date_onset",
   interval = 7,
   groups = c("sex", "case_type")
@@ -177,7 +123,7 @@ resulting `incidence2` object. The `incidence2` package provides a function call
 ```{r, message=FALSE, warning=FALSE}
 # Create incidence object
 dialy_incidence_data_2 <- incidence2::incidence(
-  linelist,
+  sim_data,
   date_index = "date_onset",
   groups = "sex",
   interval = 1
@@ -340,9 +286,6 @@ ggplot2::ggplot(data = dialy_incidence_data_2) +
 ```
 
 
-
-
-
 ::::::::::::::::::::::::::::::::::::: challenge 
 
 ## Challenge 1: Can you do it?
diff --git a/renv/profiles/lesson-requirements/renv.lock b/renv/profiles/lesson-requirements/renv.lock
index 77cdee04..3f13b958 100644
--- a/renv/profiles/lesson-requirements/renv.lock
+++ b/renv/profiles/lesson-requirements/renv.lock
@@ -4,7 +4,7 @@
     "Repositories": [
       {
         "Name": "CRAN",
-        "URL": "https://cran.rstudio.com"
+        "URL": "https://cloud.r-project.org"
       }
     ]
   },
@@ -66,7 +66,7 @@
     },
     "MASS": {
       "Package": "MASS",
-      "Version": "7.3-60.2",
+      "Version": "7.3-61",
       "Source": "Repository",
       "Repository": "CRAN",
       "Requirements": [
@@ -77,7 +77,7 @@
         "stats",
         "utils"
       ],
-      "Hash": "2f342c46163b0b54d7b64d1f798e2c78"
+      "Hash": "0cafd6f0500e5deba33be22c46bf6055"
     },
     "Matrix": {
       "Package": "Matrix",
@@ -146,7 +146,7 @@
       "Package": "R6",
       "Version": "2.5.1",
       "Source": "Repository",
-      "Repository": "RSPM",
+      "Repository": "CRAN",
       "Requirements": [
         "R"
       ],
@@ -297,7 +297,7 @@
       "Package": "base64enc",
       "Version": "0.1-3",
       "Source": "Repository",
-      "Repository": "RSPM",
+      "Repository": "CRAN",
       "Requirements": [
         "R"
       ],
@@ -496,14 +496,14 @@
     },
     "cli": {
       "Package": "cli",
-      "Version": "3.6.2",
+      "Version": "3.6.3",
       "Source": "Repository",
-      "Repository": "RSPM",
+      "Repository": "CRAN",
       "Requirements": [
         "R",
         "utils"
       ],
-      "Hash": "1216ac65ac55ec0058a6f75d7ca0fd52"
+      "Hash": "b21916dd77a27642b447374a5d30ecf3"
     },
     "clipr": {
       "Package": "clipr",
@@ -585,7 +585,7 @@
     },
     "crayon": {
       "Package": "crayon",
-      "Version": "1.5.2",
+      "Version": "1.5.3",
       "Source": "Repository",
       "Repository": "CRAN",
       "Requirements": [
@@ -593,7 +593,7 @@
         "methods",
         "utils"
       ],
-      "Hash": "e8a1e41acf02548751f45c718d55aa6a"
+      "Hash": "859d96e65ef198fd43e82b9628d593ef"
     },
     "curl": {
       "Package": "curl",
@@ -674,14 +674,14 @@
     },
     "digest": {
       "Package": "digest",
-      "Version": "0.6.35",
+      "Version": "0.6.36",
       "Source": "Repository",
       "Repository": "CRAN",
       "Requirements": [
         "R",
         "utils"
       ],
-      "Hash": "698ece7ba5a4fa4559e3d537e7ec3d31"
+      "Hash": "fd6824ad91ede64151e93af67df6376b"
     },
     "distcrete": {
       "Package": "distcrete",
@@ -760,14 +760,14 @@
     },
     "epiparameter": {
       "Package": "epiparameter",
-      "Version": "0.1.0.9000",
+      "Version": "0.2.0.9000",
       "Source": "GitHub",
       "RemoteType": "github",
       "RemoteHost": "api.github.com",
       "RemoteUsername": "epiverse-trace",
       "RemoteRepo": "epiparameter",
       "RemoteRef": "main",
-      "RemoteSha": "0f805b90f984def4851a78148f1cf44c3d480845",
+      "RemoteSha": "71d2329363604d18c36ebbc8b0588939ec3296d9",
       "Requirements": [
         "R",
         "checkmate",
@@ -775,29 +775,30 @@
         "distcrete",
         "distributional",
         "graphics",
+        "lifecycle",
         "pillar",
         "rlang",
         "stats",
         "utils"
       ],
-      "Hash": "739fc2a8d826daffda31a1e639a7dcfa"
+      "Hash": "3c012d19a0947674e1908d92e51153fa"
     },
     "evaluate": {
       "Package": "evaluate",
-      "Version": "0.23",
+      "Version": "0.24.0",
       "Source": "Repository",
       "Repository": "CRAN",
       "Requirements": [
         "R",
         "methods"
       ],
-      "Hash": "daf4a1246be12c1fa8c7705a0935c1a0"
+      "Hash": "a1066cbc05caee9a4bf6d90f194ff4da"
     },
     "fansi": {
       "Package": "fansi",
       "Version": "1.0.6",
       "Source": "Repository",
-      "Repository": "RSPM",
+      "Repository": "CRAN",
       "Requirements": [
         "R",
         "grDevices",
@@ -837,7 +838,7 @@
       "Package": "fontawesome",
       "Version": "0.5.2",
       "Source": "Repository",
-      "Repository": "CRAN",
+      "Repository": "RSPM",
       "Requirements": [
         "R",
         "htmltools",
@@ -1309,7 +1310,7 @@
       "Package": "jquerylib",
       "Version": "0.1.4",
       "Source": "Repository",
-      "Repository": "RSPM",
+      "Repository": "CRAN",
       "Requirements": [
         "htmltools"
       ],
@@ -1319,7 +1320,7 @@
       "Package": "jsonlite",
       "Version": "1.8.8",
       "Source": "Repository",
-      "Repository": "RSPM",
+      "Repository": "CRAN",
       "Requirements": [
         "methods"
       ],
@@ -1327,7 +1328,7 @@
     },
     "knitr": {
       "Package": "knitr",
-      "Version": "1.47",
+      "Version": "1.48",
       "Source": "Repository",
       "Repository": "CRAN",
       "Requirements": [
@@ -1339,7 +1340,7 @@
         "xfun",
         "yaml"
       ],
-      "Hash": "7c99b2d55584b982717fcc0950378612"
+      "Hash": "acf380f300c721da9fde7df115a5f86f"
     },
     "labeling": {
       "Package": "labeling",
@@ -1393,7 +1394,7 @@
     },
     "linelist": {
       "Package": "linelist",
-      "Version": "1.1.3",
+      "Version": "1.1.4",
       "Source": "Repository",
       "Repository": "CRAN",
       "Requirements": [
@@ -1404,7 +1405,7 @@
         "rlang",
         "tidyselect"
       ],
-      "Hash": "04c2949bdf494d59a1909b923c56a24a"
+      "Hash": "742c211230f8ebc3a9c543263097dddf"
     },
     "listenv": {
       "Package": "listenv",
@@ -1418,7 +1419,7 @@
     },
     "lme4": {
       "Package": "lme4",
-      "Version": "1.1-35.3",
+      "Version": "1.1-35.4",
       "Source": "Repository",
       "Repository": "CRAN",
       "Requirements": [
@@ -1440,7 +1441,7 @@
         "stats",
         "utils"
       ],
-      "Hash": "862f9d995f528f3051f524791955b20c"
+      "Hash": "a6f5390caceaa1b23b68f57d663b2061"
     },
     "loo": {
       "Package": "loo",
@@ -1474,7 +1475,7 @@
       "Package": "magrittr",
       "Version": "2.0.3",
       "Source": "Repository",
-      "Repository": "RSPM",
+      "Repository": "CRAN",
       "Requirements": [
         "R"
       ],
@@ -1518,7 +1519,7 @@
       "Package": "memoise",
       "Version": "2.0.1",
       "Source": "Repository",
-      "Repository": "RSPM",
+      "Repository": "CRAN",
       "Requirements": [
         "cachem",
         "rlang"
@@ -1546,7 +1547,7 @@
       "Package": "mime",
       "Version": "0.12",
       "Source": "Repository",
-      "Repository": "RSPM",
+      "Repository": "CRAN",
       "Requirements": [
         "tools"
       ],
@@ -1593,9 +1594,9 @@
     },
     "nlme": {
       "Package": "nlme",
-      "Version": "3.1-164",
+      "Version": "3.1-165",
       "Source": "Repository",
-      "Repository": "RSPM",
+      "Repository": "CRAN",
       "Requirements": [
         "R",
         "graphics",
@@ -1603,17 +1604,14 @@
         "stats",
         "utils"
       ],
-      "Hash": "a623a2239e642806158bc4dc3f51565d"
+      "Hash": "2769a88be217841b1f33ed469675c3cc"
     },
     "nloptr": {
       "Package": "nloptr",
-      "Version": "2.0.3",
+      "Version": "2.1.0",
       "Source": "Repository",
       "Repository": "CRAN",
-      "Requirements": [
-        "testthat"
-      ],
-      "Hash": "277c67a08f358f42b6a77826e4492f79"
+      "Hash": "2f436f0a4e224ae7542c68f1896a00d9"
     },
     "numDeriv": {
       "Package": "numDeriv",
@@ -1743,24 +1741,25 @@
     },
     "pkgload": {
       "Package": "pkgload",
-      "Version": "1.3.4",
+      "Version": "1.4.0",
       "Source": "Repository",
       "Repository": "CRAN",
       "Requirements": [
         "R",
         "cli",
-        "crayon",
         "desc",
         "fs",
         "glue",
+        "lifecycle",
         "methods",
         "pkgbuild",
+        "processx",
         "rlang",
         "rprojroot",
         "utils",
         "withr"
       ],
-      "Hash": "876c618df5ae610be84356d5d7a5d124"
+      "Hash": "2ec30ffbeec83da57655b850cf2d3e0e"
     },
     "plogr": {
       "Package": "plogr",
@@ -1860,14 +1859,14 @@
     },
     "ps": {
       "Package": "ps",
-      "Version": "1.7.6",
+      "Version": "1.7.7",
       "Source": "Repository",
       "Repository": "CRAN",
       "Requirements": [
         "R",
         "utils"
       ],
-      "Hash": "dd2b9319ee0656c8acf45c7f40c59de7"
+      "Hash": "878b467580097e9c383acbb16adab57a"
     },
     "purrr": {
       "Package": "purrr",
@@ -1912,7 +1911,7 @@
       "Package": "rappdirs",
       "Version": "0.3.3",
       "Source": "Repository",
-      "Repository": "RSPM",
+      "Repository": "CRAN",
       "Requirements": [
         "R"
       ],
@@ -2197,16 +2196,16 @@
     },
     "simulist": {
       "Package": "simulist",
-      "Version": "0.3.0",
+      "Version": "0.3.0.9000",
       "Source": "GitHub",
-      "RemoteType": "github",
       "Remotes": "epiverse-trace/epiparameter",
+      "RemoteType": "github",
       "RemoteHost": "api.github.com",
       "RemoteRepo": "simulist",
       "RemoteUsername": "epiverse-trace",
       "RemotePkgRef": "epiverse-trace/simulist",
       "RemoteRef": "HEAD",
-      "RemoteSha": "4dde87cde1c5a4310abe6b39c2c04ceac79b5761",
+      "RemoteSha": "2e45a42f03f0dfc1c06f64f0b6fced431443a89c",
       "Requirements": [
         "R",
         "checkmate",
@@ -2215,7 +2214,7 @@
         "rlang",
         "stats"
       ],
-      "Hash": "c452128ef4252d6d71b9a4e37e903c3e"
+      "Hash": "a51541bbe21de05740ecb86b5c00786e"
     },
     "snakecase": {
       "Package": "snakecase",
@@ -2231,9 +2230,14 @@
     },
     "socialmixr": {
       "Package": "socialmixr",
-      "Version": "0.3.2",
-      "Source": "Repository",
-      "Repository": "CRAN",
+      "Version": "0.3.2.9000",
+      "Source": "GitHub",
+      "RemoteType": "github",
+      "RemoteHost": "api.github.com",
+      "RemoteRepo": "socialmixr",
+      "RemoteUsername": "epiforecasts",
+      "RemoteRef": "HEAD",
+      "RemoteSha": "753bef4f60890ebde5c62d2a95fe91e91beba3de",
       "Requirements": [
         "R",
         "countrycode",
@@ -2249,7 +2253,7 @@
         "wpp2017",
         "xml2"
       ],
-      "Hash": "45662d7e3ca41647455b89c74eb28abf"
+      "Hash": "e5c1bfa7e7ba72f1f58176597bc4140a"
     },
     "spam": {
       "Package": "spam",
@@ -2297,7 +2301,7 @@
     },
     "survival": {
       "Package": "survival",
-      "Version": "3.5-8",
+      "Version": "3.7-0",
       "Source": "Repository",
       "Repository": "CRAN",
       "Requirements": [
@@ -2309,7 +2313,7 @@
         "stats",
         "utils"
       ],
-      "Hash": "184d7799bca4ba8c3be72ea396f4b9a3"
+      "Hash": "5aaa9cbaf4aba20f8e06fdea1850a398"
     },
     "sys": {
       "Package": "sys",
@@ -2682,7 +2686,7 @@
     },
     "xfun": {
       "Package": "xfun",
-      "Version": "0.44",
+      "Version": "0.45",
       "Source": "Repository",
       "Repository": "CRAN",
       "Requirements": [
@@ -2690,13 +2694,13 @@
         "stats",
         "tools"
       ],
-      "Hash": "317a0538d32f4a009658bcedb7923f4b"
+      "Hash": "ca59c87fe305b16a9141a5874c3a7889"
     },
     "xml2": {
       "Package": "xml2",
       "Version": "1.3.6",
       "Source": "Repository",
-      "Repository": "RSPM",
+      "Repository": "CRAN",
       "Requirements": [
         "R",
         "cli",
@@ -2707,10 +2711,10 @@
     },
     "yaml": {
       "Package": "yaml",
-      "Version": "2.3.8",
+      "Version": "2.3.9",
       "Source": "Repository",
-      "Repository": "RSPM",
-      "Hash": "29240487a071f535f5e5d5a323b7afbd"
+      "Repository": "CRAN",
+      "Hash": "9cb28d11799d93c953f852083d55ee9e"
     }
   }
 }

From 619a088794c0947c827e066ac61b62416b45d06e Mon Sep 17 00:00:00 2001
From: Degoot-AM <degoot@aims.ac.za>
Date: Mon, 23 Sep 2024 12:06:54 +0000
Subject: [PATCH 2/5] add more challenges

---
 episodes/describe-cases.Rmd | 79 +++++++++++++++++++++++++------------
 1 file changed, 54 insertions(+), 25 deletions(-)

diff --git a/episodes/describe-cases.Rmd b/episodes/describe-cases.Rmd
index 3f1f9b9c..96a1a1d2 100644
--- a/episodes/describe-cases.Rmd
+++ b/episodes/describe-cases.Rmd
@@ -60,7 +60,7 @@ Its minimal configuration can generate a  linelist as shown in the below code ch
 
 ```{r, warning=FALSE, message=FALSE}
 set.seed(1)
-sim_data <- simulist::sim_linelist(outbreak_size = c(1000, 2000))
+sim_data <- simulist::sim_linelist(outbreak_size = c(1000, 1500))
 head(sim_data)
 ```
 
@@ -89,21 +89,21 @@ simulated  Ebola `linelist` data based on the  date of onset.
 
 ```{r, message=FALSE, warning=FALSE}
 # create incidence object by aggregating case data  based on the date of onset
-dialy_incidence_data <- incidence2::incidence(
+dialy_incidence <- incidence2::incidence(
   sim_data,
   date_index = "date_onset",
   interval = 1
 )
 
 # View the first incidence data for the first 5 days
-dialy_incidence_data
+head(dialy_incidence, 5)
 ```
 Furthermore, with the `{incidence2}` package, you can specify the desired interval and categorize cases by one or 
 more factors. Below is a code snippet demonstrating weekly cases grouped by the date of onset and gender.
 
 ```{r}
 # Grouping data by week
-weekly_incidence_data <- incidence2::incidence(
+weekly_incidence <- incidence2::incidence(
   sim_data,
   date_index = "date_onset",
   interval = 7,
@@ -111,18 +111,18 @@ weekly_incidence_data <- incidence2::incidence(
 )
 
 # View incidence data for the first 5 weeks
-weekly_incidence_data
+head(weekly_incidence, 5)
 ```
 
 ::::::::::::::::::::::::::::::::::::: callout
-## Notes 
+## Dates Completion  
 When cases are grouped by different factors, it's possible that these groups may have different date ranges in the 
 resulting `incidence2` object. The `incidence2` package provides a function called `complete_dates()` to ensure that an
  incidence object has the same range of dates for each group. By default, missing counts will be filled with 0.
 
 ```{r, message=FALSE, warning=FALSE}
 # Create incidence object
-dialy_incidence_data_2 <- incidence2::incidence(
+dialy_incidence_2 <- incidence2::incidence(
   sim_data,
   date_index = "date_onset",
   groups = "sex",
@@ -131,7 +131,7 @@ dialy_incidence_data_2 <- incidence2::incidence(
 
 # Complete missing dates in the incidence object
 incidence2::complete_dates(
-  x = dialy_incidence_data_2,
+  x = dialy_incidence_2,
   expand = TRUE,
   fill = 0L, by = 1L,
   allow_POSIXct = FALSE
@@ -139,15 +139,24 @@ incidence2::complete_dates(
 ```
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
+
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 1: Can you do it?
+ - **Task**: Aggregate the `sim_data` linelist based on admission date and case outcome in __biweekly__
+  intervals, and save the results in an object called `biweekly_incidence`.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
 ## Visualization
 
-"The `incidence2` object can be visualized using the `plot()` function from the base R package. 
+The `incidence2` object can be visualized using the `plot()` function from the base R package. 
 The resulting graph is referred to as an epidemic curve, or epi-curve for short. The following code 
-snippets generate epi-curves for the `dialy_incidence_data` and `weekly_incidence_data` incidence objects mentioned above."
+snippets generate epi-curves for the `dialy_incidence` and `weekly_incidence` incidence objects mentioned above.
 
 ```{r, message=FALSE, warning=FALSE}
 # Plot daily incidence data
-base::plot(dialy_incidence_data) +
+base::plot(dialy_incidence) +
   ggplot2::labs(
     x = "Time (in days)",
     y = "Dialy cases"
@@ -159,7 +168,7 @@ base::plot(dialy_incidence_data) +
 ```{r, message=FALSE, warning=FALSE}
 # Plot weekly incidence data
 
-base::plot(weekly_incidence_data) +
+base::plot(weekly_incidence) +
   ggplot2::labs(
     x = "Time (in weeks)",
     y = "weekly cases"
@@ -167,12 +176,19 @@ base::plot(weekly_incidence_data) +
   tracetheme::theme_trace()
 ``` 
 
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 2: Can you do it?
+ - **Task**: Visualize `biweekly_incidence` object.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
 ## Curve of cumulative cases
 
 The cumulative number of cases can be calculated using the `cumulate()` function from an `incidence2` object and visualized, as in the example below.
 
 ```{r, message=FALSE, warning=FALSE}
-cum_df <- incidence2::cumulate(dialy_incidence_data)
+cum_df <- incidence2::cumulate(dialy_incidence)
 
 base::plot(cum_df) +
   ggplot2::labs(
@@ -182,7 +198,15 @@ base::plot(cum_df) +
   tracetheme::theme_trace()
 ```
 
-Note that this function preserves grouping, i.e., if the `incidence2` object contains groups, it will accumulate the cases accordingly. Give it a try with the `weekly_incidence_data` object!
+Note that this function preserves grouping, i.e., if the `incidence2` object contains groups, it will accumulate the cases accordingly.
+
+
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 3: Can you do it?
+ - **Task**: Visualize the cumulative cases from the `biweekly_incidence` object.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
 
 ##  Peak estimation
 
@@ -191,7 +215,7 @@ This function employs a bootstrapping method to determine the peak time.
 
 ```{r, message=FALSE, warning=FALSE}
 peak <- incidence2::estimate_peak(
-  dialy_incidence_data,
+  dialy_incidence,
   n = 100,
   alpha = 0.05,
   first_only = TRUE,
@@ -203,6 +227,14 @@ peak
 This example demonstrates how to estimate the peak time using the `estimate_peak()` function at $95%$ 
 confidence interval and using 100 bootstrap samples. 
 
+::::::::::::::::::::::::::::::::::::: challenge 
+
+## Challenge 4: Can you do it?
+ - **Task**: Estimate the peak time from `biweekly_incidence` object.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+
 ## Visulaziantion with ggplot2
 
 
@@ -212,16 +244,16 @@ The example below demonstrates how to configure these three elements for a simpl
 
 ```{r, message=FALSE, warning=FALSE}
 breaks <- seq.Date(
-  from = min(as.Date(dialy_incidence_data$date_index,
+  from = min(as.Date(dialy_incidence$date_index,
     na.rm = TRUE
   )),
-  to = as.Date(max(dialy_incidence_data$date_index,
+  to = as.Date(max(dialy_incidence$date_index,
     na.rm = TRUE
   )),
-  by = 1
+  by = 20
 )
 
-ggplot2::ggplot(data = dialy_incidence_data) +
+ggplot2::ggplot(data = dialy_incidence) +
   geom_histogram(
     mapping = aes(
       x = as.Date(date_index),
@@ -254,7 +286,7 @@ ggplot2::ggplot(data = dialy_incidence_data) +
 Use the `group` option in the mapping function to visualize an epicurve with different groups. If there is more than one grouping factor, use the `facet_wrap()` option, as demonstrated in the example below:
 
 ```{r, message=FALSE, warning=FALSE}
-ggplot2::ggplot(data = dialy_incidence_data_2) +
+ggplot2::ggplot(data = dialy_incidence_2) +
   geom_histogram(
     mapping = aes(
       x = as.Date(date_index),
@@ -288,11 +320,8 @@ ggplot2::ggplot(data = dialy_incidence_data_2) +
 
 ::::::::::::::::::::::::::::::::::::: challenge 
 
-## Challenge 1: Can you do it?
-
- - Using suitable distributions for contacts, contact interval, infectious period, onset to hospitalized, and onset to 
- death, generate a simulated linelist data for  Marburg outbreak that has the probability of $0.5$ infection?
- - Aggregate the generated linelist and produce some epidemic curves?
+## Challenge 5: Can you do it?
+ - **Task**: Produce an annotated figure for biweekly_incidence using `{ggplot2}`.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 

From 0e6f1d6867e6f7d8dbba37285fa4111873b7ee41 Mon Sep 17 00:00:00 2001
From: Degoot-AM <degoot@aims.ac.za>
Date: Mon, 23 Sep 2024 12:36:12 +0000
Subject: [PATCH 3/5] commenting code chunks

---
 episodes/describe-cases.Rmd | 156 ++++++++++++++++++++----------------
 1 file changed, 87 insertions(+), 69 deletions(-)

diff --git a/episodes/describe-cases.Rmd b/episodes/describe-cases.Rmd
index 96a1a1d2..2d8f6f88 100644
--- a/episodes/describe-cases.Rmd
+++ b/episodes/describe-cases.Rmd
@@ -34,10 +34,10 @@ Let's start by loading the package `{incidence2}` to aggregate linelist data by
 
 ```{r,eval=TRUE,message=FALSE,warning=FALSE}
 # Load packages
-library(incidence2) # to aggregate and visualise
-library(simulist) # to simulate linelist data
-library(tracetheme) # for figure formatting
-library(tidyverse) # for {dplyr} and {ggplot2} functions and the pipe %>%
+library(incidence2) # For aggregating and visualising
+library(simulist) # For simulating linelist data
+library(tracetheme) # For formatting figures
+library(tidyverse) # For {dplyr} and {ggplot2} functions and the pipe %>%
 ```
 
 ::::::::::::::::::: checklist
@@ -59,8 +59,11 @@ for a hypothetical disease outbreak utilizing the `{simulist}` package. `{simuli
 Its minimal configuration can generate a  linelist as shown in the below code chunk 
 
 ```{r, warning=FALSE, message=FALSE}
-set.seed(1)
+# Simulate linelist data for an outbreak with size between 1000 and 1500
+set.seed(1) # Set seed for reproducibility
 sim_data <- simulist::sim_linelist(outbreak_size = c(1000, 1500))
+
+# Display the first few rows of the simulated dataset
 head(sim_data)
 ```
 
@@ -88,29 +91,30 @@ and/or other factors. The code chunk provided below demonstrates the creation of
 simulated  Ebola `linelist` data based on the  date of onset.
 
 ```{r, message=FALSE, warning=FALSE}
-# create incidence object by aggregating case data  based on the date of onset
+# Create an incidence object by aggregating case data based on the date of onset
 dialy_incidence <- incidence2::incidence(
   sim_data,
   date_index = "date_onset",
-  interval = 1
+  interval = 1 # Aggregate by daily intervals
 )
 
-# View the first incidence data for the first 5 days
+# View the first 5 rows of the incidence data
 head(dialy_incidence, 5)
+
 ```
 Furthermore, with the `{incidence2}` package, you can specify the desired interval and categorize cases by one or 
 more factors. Below is a code snippet demonstrating weekly cases grouped by the date of onset and gender.
 
 ```{r}
-# Grouping data by week
+# Group incidence data by week, accounting for sex and case type
 weekly_incidence <- incidence2::incidence(
   sim_data,
   date_index = "date_onset",
-  interval = 7,
-  groups = c("sex", "case_type")
+  interval = 7, # Aggregate by weekly intervals
+  groups = c("sex", "case_type") # Group by sex and case type
 )
 
-# View incidence data for the first 5 weeks
+# View the incidence data for the first 5 weeks
 head(weekly_incidence, 5)
 ```
 
@@ -121,20 +125,21 @@ resulting `incidence2` object. The `incidence2` package provides a function call
  incidence object has the same range of dates for each group. By default, missing counts will be filled with 0.
 
 ```{r, message=FALSE, warning=FALSE}
-# Create incidence object
+# Create an incidence object grouped by sex, aggregating daily
 dialy_incidence_2 <- incidence2::incidence(
   sim_data,
   date_index = "date_onset",
   groups = "sex",
-  interval = 1
+  interval = 1 # Aggregate by daily intervals
 )
 
 # Complete missing dates in the incidence object
-incidence2::complete_dates(
+dialy_incidence_2_complete <- incidence2::complete_dates(
   x = dialy_incidence_2,
-  expand = TRUE,
-  fill = 0L, by = 1L,
-  allow_POSIXct = FALSE
+  expand = TRUE, # Expand to fill in missing dates
+  fill = 0L,     # Fill missing values with 0
+  by = 1L,       # Fill by daily intervals
+  allow_POSIXct = FALSE # Ensure that dates are not in POSIXct format
 )
 ```
 ::::::::::::::::::::::::::::::::::::::::::::::::
@@ -158,10 +163,10 @@ snippets generate epi-curves for the `dialy_incidence` and `weekly_incidence` in
 # Plot daily incidence data
 base::plot(dialy_incidence) +
   ggplot2::labs(
-    x = "Time (in days)",
-    y = "Dialy cases"
+    x = "Time (in days)", # x-axis label
+    y = "Daily cases" # y-axis label
   ) +
-  tracetheme::theme_trace()
+  tracetheme::theme_trace() # Apply the custom trace theme
 ``` 
 
 
@@ -170,10 +175,10 @@ base::plot(dialy_incidence) +
 
 base::plot(weekly_incidence) +
   ggplot2::labs(
-    x = "Time (in weeks)",
-    y = "weekly cases"
+    x = "Time (in weeks)", # x-axis label
+    y = "weekly cases" # y-axis label
   ) +
-  tracetheme::theme_trace()
+  tracetheme::theme_trace() # Apply the custom trace theme
 ``` 
 
 ::::::::::::::::::::::::::::::::::::: challenge 
@@ -188,14 +193,16 @@ base::plot(weekly_incidence) +
 The cumulative number of cases can be calculated using the `cumulate()` function from an `incidence2` object and visualized, as in the example below.
 
 ```{r, message=FALSE, warning=FALSE}
+# Calculate cumulative incidence
 cum_df <- incidence2::cumulate(dialy_incidence)
 
+# Plot cumulative incidence data using ggplot2
 base::plot(cum_df) +
   ggplot2::labs(
-    x = "Time (in days)",
-    y = "weekly cases"
+    x = "Time (in days)", # x-axis label
+    y = "Cumulative cases" # y-axis label
   ) +
-  tracetheme::theme_trace()
+  tracetheme::theme_trace() # Apply the custom trace theme
 ```
 
 Note that this function preserves grouping, i.e., if the `incidence2` object contains groups, it will accumulate the cases accordingly.
@@ -214,15 +221,17 @@ One can estimate the peak --the time with the highest number of recorded cases--
 This function employs a bootstrapping method to determine the peak time.
 
 ```{r, message=FALSE, warning=FALSE}
+# Estimate the peak of the daily incidence data
 peak <- incidence2::estimate_peak(
   dialy_incidence,
-  n = 100,
-  alpha = 0.05,
-  first_only = TRUE,
-  progress = FALSE
+  n = 100,         # Number of simulations for the peak estimation
+  alpha = 0.05,    # Significance level for the confidence interval
+  first_only = TRUE, # Return only the first peak found
+  progress = FALSE  # Disable progress messages
 )
 
-peak
+# Display the estimated peak
+print(peak)
 ```
 This example demonstrates how to estimate the peak time using the `estimate_peak()` function at $95%$ 
 confidence interval and using 100 bootstrap samples. 
@@ -235,7 +244,7 @@ confidence interval and using 100 bootstrap samples.
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
 
-## Visulaziantion with ggplot2
+## Visualization with ggplot2
 
 
 `{incidence2}` produces basic plots for epicurves, but additional work is required to create well-annotated graphs. However, using the `{ggplot2}` package, you can generate more sophisticated and better-annotated epicurves.
@@ -243,16 +252,14 @@ confidence interval and using 100 bootstrap samples.
 The example below demonstrates how to configure these three elements for a simple `{incidence2}` object.
 
 ```{r, message=FALSE, warning=FALSE}
+# Define date breaks for the x-axis
 breaks <- seq.Date(
-  from = min(as.Date(dialy_incidence$date_index,
-    na.rm = TRUE
-  )),
-  to = as.Date(max(dialy_incidence$date_index,
-    na.rm = TRUE
-  )),
-  by = 20
+  from = min(as.Date(dialy_incidence$date_index, na.rm = TRUE)),
+  to = max(as.Date(dialy_incidence$date_index, na.rm = TRUE)),
+  by = 20 # every 20 days
 )
 
+# Create the plot
 ggplot2::ggplot(data = dialy_incidence) +
   geom_histogram(
     mapping = aes(
@@ -260,32 +267,37 @@ ggplot2::ggplot(data = dialy_incidence) +
       y = count
     ),
     stat = "identity",
-    color = "blue",
-    width = 1
+    color = "blue", # bar border color
+    fill = "lightblue", # bar fill color
+    width = 1 # bar width
   ) +
-  theme_minimal() + # simple theme
+  theme_minimal() + # apply a minimal theme for clean visuals
   theme(
-    plot.title = element_text(face = "bold", hjust = 0.5),
-    plot.caption = element_text(face = "italic", hjust = 0),
-    axis.title = element_text(face = "bold"),
-    axis.text.x = element_text(angle = 45)
+    plot.title = element_text(face = "bold",
+                              hjust = 0.5), # center and bold title
+    plot.subtitle = element_text(hjust = 0.5), # center subtitle
+    plot.caption = element_text(face = "italic",
+                                hjust = 0), # italicized caption
+    axis.title = element_text(face = "bold"), # bold axis titles
+    axis.text.x = element_text(angle = 45, vjust = 0.5) # rotated x-axis text
   ) +
   labs(
-    x = "Date", # x-label
-    y = "Number of cases", # y-label,
-    title = "Daily outbreak cases", # title
-    subtitle = "subtitle", # subtitle
-    caption = "informative caption"
+    x = "Date", # x-axis label
+    y = "Number of cases", # y-axis label
+    title = "Daily Outbreak Cases", # plot title
+    subtitle = "Epidemiological Data for the Outbreak", # plot subtitle
+    caption = "Data Source: Simulated Data" # plot caption
   ) +
   scale_x_date(
-    breaks = breaks,
-    label = scales::label_date_short()
+    breaks = breaks, # set custom breaks on the x-axis
+    labels = scales::label_date_short() # shortened date labels
   )
 ```
 
 Use the `group` option in the mapping function to visualize an epicurve with different groups. If there is more than one grouping factor, use the `facet_wrap()` option, as demonstrated in the example below:
 
 ```{r, message=FALSE, warning=FALSE}
+# Plot daily incidence by sex with facets
 ggplot2::ggplot(data = dialy_incidence_2) +
   geom_histogram(
     mapping = aes(
@@ -296,32 +308,37 @@ ggplot2::ggplot(data = dialy_incidence_2) +
     ),
     stat = "identity"
   ) +
-  theme_minimal() + # simple theme
+  theme_minimal() + # apply minimal theme
   theme(
-    plot.title = element_text(face = "bold", hjust = 0.5),
-    plot.caption = element_text(face = "italic", hjust = 0),
-    axis.title = element_text(face = "bold"),
-    axis.text.x = element_text(angle = 45)
+    plot.title = element_text(face = "bold",
+                              hjust = 0.5), # bold and center the title
+    plot.subtitle = element_text(hjust = 0.5), # center the subtitle
+    plot.caption = element_text(face = "italic", hjust = 0), # italic caption
+    axis.title = element_text(face = "bold"), # bold axis labels
+    axis.text.x = element_text(angle = 45,
+                               vjust = 0.5) # rotate x-axis text for readability
   ) +
   labs(
-    x = "Date", # x-label
-    y = "Number of cases", # y-label,
-    title = "Daily outbreak cases", # title
-    subtitle = "subtitle", # subtitle
-    caption = "informative caption"
+    x = "Date", # x-axis label
+    y = "Number of cases", # y-axis label
+    title = "Daily Outbreak Cases by Sex", # plot title
+    subtitle = "Incidence of Cases Grouped by Sex", # plot subtitle
+    caption = "Data Source: Simulated Data" # caption for additional context
   ) +
-  facet_wrap(~sex) +
+  facet_wrap(~sex) + # create separate panels by sex
   scale_x_date(
-    breaks = breaks,
-    label = scales::label_date_short()
-  )
+    breaks = breaks, # set custom date breaks
+    labels = scales::label_date_short() # short date format for x-axis labels
+  ) +
+  scale_fill_manual(values = c("lightblue",
+                               "lightpink")) # custom fill colors for sex
 ```
 
 
 ::::::::::::::::::::::::::::::::::::: challenge 
 
 ## Challenge 5: Can you do it?
- - **Task**: Produce an annotated figure for biweekly_incidence using `{ggplot2}`.
+ - **Task**: Produce an annotated figure for `biweekly_incidence` using the `{ggplot2}` package.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
@@ -329,5 +346,6 @@ ggplot2::ggplot(data = dialy_incidence_2) +
 
 - Use `{simulist}` package to generate synthetic outbreak data
 - Use `{incidence2}` package to aggregate case data based on a date event, and produce epidemic curves. 
+- Use `{ggplot2}` package to produce better-annotated epicurves. 
 
 ::::::::::::::::::::::::::::::::::::::::::::::::

From dee2d40633deadd29bf426ecd5cdf55c2cda5f7a Mon Sep 17 00:00:00 2001
From: Degoot-AM <degoot@aims.ac.za>
Date: Mon, 23 Sep 2024 17:17:07 +0000
Subject: [PATCH 4/5] adding extra comments

---
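Reviewer note (a suggestion, not part of this patch): dropping
`message=FALSE, warning=FALSE` from every chunk means that package messages
and warnings will appear in the rendered episode. If that is not intended, one
possible alternative is to set the options once instead of per chunk. A minimal
sketch, assuming the episode has (or can gain) a setup chunk; the placement is
an assumption, `knitr::opts_chunk$set()` is the standard knitr call:

```r
# Hypothetical setup chunk near the top of episodes/describe-cases.Rmd.
# Sets defaults for all later chunks; a single chunk can still override them.
knitr::opts_chunk$set(
  message = FALSE, # hide informational messages (e.g. package start-up)
  warning = FALSE  # hide warnings in the rendered lesson
)
```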
 episodes/describe-cases.Rmd | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/episodes/describe-cases.Rmd b/episodes/describe-cases.Rmd
index 2d8f6f88..57d87c74 100644
--- a/episodes/describe-cases.Rmd
+++ b/episodes/describe-cases.Rmd
@@ -32,7 +32,7 @@ Let's start by loading the package `{incidence2}` to aggregate linelist data by
  We'll use the pipe `%>%` to connect some of their functions, including others from the packages `{dplyr}` and 
  `{ggplot2}`, so let's also call to the tidyverse package:
 
-```{r,eval=TRUE,message=FALSE,warning=FALSE}
+```{r}
 # Load packages
 library(incidence2) # For aggregating and visualising
 library(simulist) # For simulating linelist data
@@ -58,7 +58,7 @@ To illustrate the process of conducting EDA on outbreak data, we will generate a
 for a hypothetical disease outbreak utilizing the `{simulist}` package. `{simulist}` generates simulation data for outbreak according to a given configuration. 
 Its minimal configuration can generate a  linelist as shown in the below code chunk 
 
-```{r, warning=FALSE, message=FALSE}
+```{r}
 # Simulate linelist data for an outbreak with size between 1000 and 1500
 set.seed(1) # Set seed for reproducibility
 sim_data <- simulist::sim_linelist(outbreak_size = c(1000, 1500))
@@ -90,7 +90,7 @@ package offers an essential function, called `incidence`, for grouping case data
 and/or other factors. The code chunk provided below demonstrates the creation of an `incidence2` object from the 
 simulated  Ebola `linelist` data based on the  date of onset.
 
-```{r, message=FALSE, warning=FALSE}
+```{r}
 # Create an incidence object by aggregating case data based on the date of onset
 dialy_incidence <- incidence2::incidence(
   sim_data,
@@ -124,7 +124,7 @@ When cases are grouped by different factors, it's possible that these groups may
 resulting `incidence2` object. The `incidence2` package provides a function called `complete_dates()` to ensure that an
  incidence object has the same range of dates for each group. By default, missing counts will be filled with 0.
 
-```{r, message=FALSE, warning=FALSE}
+```{r}
 # Create an incidence object grouped by sex, aggregating daily
 dialy_incidence_2 <- incidence2::incidence(
   sim_data,
@@ -159,7 +159,7 @@ The `incidence2` object can be visualized using the `plot()` function from the b
 The resulting graph is referred to as an epidemic curve, or epi-curve for short. The following code 
 snippets generate epi-curves for the `dialy_incidence` and `weekly_incidence` incidence objects mentioned above.
 
-```{r, message=FALSE, warning=FALSE}
+```{r}
 # Plot daily incidence data
 base::plot(dialy_incidence) +
   ggplot2::labs(
@@ -170,7 +170,7 @@ base::plot(dialy_incidence) +
 ``` 
 
 
-```{r, message=FALSE, warning=FALSE}
+```{r}
 # Plot weekly incidence data
 
 base::plot(weekly_incidence) +
@@ -192,7 +192,7 @@ base::plot(weekly_incidence) +
 
 The cumulative number of cases can be calculated using the `cumulate()` function from an `incidence2` object and visualized, as in the example below.
 
-```{r, message=FALSE, warning=FALSE}
+```{r}
 # Calculate cumulative incidence
 cum_df <- incidence2::cumulate(dialy_incidence)
 
@@ -220,7 +220,7 @@ Note that this function preserves grouping, i.e., if the `incidence2` object con
 One can estimate the peak --the time with the highest number of recorded cases-- using the `estimate_peak()` function from the {incidence2} package. 
 This function employs a bootstrapping method to determine the peak time.
 
-```{r, message=FALSE, warning=FALSE}
+```{r}
 # Estimate the peak of the daily incidence data
 peak <- incidence2::estimate_peak(
   dialy_incidence,
@@ -251,7 +251,7 @@ confidence interval and using 100 bootstrap samples.
 `{ggplot2}` is a comprehensive package with many functionalities. However, we will focus on three key elements for producing epicurves: histogram plots, scaling date axes and their labels, and general plot theme annotation.
 The example below demonstrates how to configure these three elements for a simple `{incidence2}` object.
 
-```{r, message=FALSE, warning=FALSE}
+```{r}
 # Define date breaks for the x-axis
 breaks <- seq.Date(
   from = min(as.Date(dialy_incidence$date_index, na.rm = TRUE)),
@@ -296,7 +296,7 @@ ggplot2::ggplot(data = dialy_incidence) +
 
 Use the `group` option in the mapping function to visualize an epicurve with different groups. If there is more than one grouping factor, use the `facet_wrap()` option, as demonstrated in the example below:
 
-```{r, message=FALSE, warning=FALSE}
+```{r}
 # Plot daily incidence by sex with facets
 ggplot2::ggplot(data = dialy_incidence_2) +
   geom_histogram(

From b9656bfc0d1dee128a1f5d2682e2ba750635e12b Mon Sep 17 00:00:00 2001
From: Andree Valle Campos <avallecam@gmail.com>
Date: Tue, 1 Oct 2024 03:04:57 +0100
Subject: [PATCH 5/5] add after edit commits

---
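Reviewer note on `interval = 7` versus `interval = "week"`: as far as the
`{incidence2}` documentation describes it, a number is treated as a count of
days to group, while `"week"` refers to calendar (ISO) weeks, so the two need
not produce the same bins. The base R sketch below only illustrates that
difference on toy dates; the dates, the names `bin_start` and `iso_week`, and
the choice of anchoring the 7-day bins at the first observed date are all
assumptions for illustration, not `{incidence2}` internals.

```r
# Toy onset dates (hypothetical); 2024-01-04 is a Thursday
onset <- as.Date(c("2024-01-04", "2024-01-07", "2024-01-08", "2024-01-10"))

# Fixed 7-day bins counted from the first observed date (assumed anchor)
days_since_first <- as.integer(onset - min(onset))
bin_start <- min(onset) + 7 * (days_since_first %/% 7)

# ISO 8601 week labels (weeks start on Monday)
iso_week <- format(onset, "%G-W%V")

table(bin_start) # all four cases fall into one 7-day bin
table(iso_week)  # the same cases are split across two ISO weeks
```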
 episodes/describe-cases.Rmd | 59 ++++++++++++++++++++++++-------------
 1 file changed, 39 insertions(+), 20 deletions(-)
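
Reviewer note on `complete_dates = TRUE`: the callout says groups can end up
with different date ranges and that missing counts are filled with 0. The base
R sketch below shows that idea on a toy table. It is only a conceptual
illustration under assumed data (`counts`, `grid` and `completed` are
hypothetical names), not how `{incidence2}` implements it.

```r
# Toy daily counts by sex (hypothetical); some date/group pairs are absent
counts <- data.frame(
  date_index = as.Date(c("2024-01-01", "2024-01-03", "2024-01-02")),
  sex = c("f", "f", "m"),
  count = c(2L, 1L, 4L)
)

# Every date in the overall range, crossed with every group
all_dates <- seq(min(counts$date_index), max(counts$date_index), by = "day")
grid <- expand.grid(
  date_index = all_dates,
  sex = unique(counts$sex),
  stringsAsFactors = FALSE
)

# Left join the observed counts onto the grid, then fill the gaps with 0
completed <- merge(grid, counts, by = c("date_index", "sex"), all.x = TRUE)
completed$count[is.na(completed$count)] <- 0L
completed
```

After completion every group covers the same three dates, and the total number
of cases is unchanged.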

diff --git a/episodes/describe-cases.Rmd b/episodes/describe-cases.Rmd
index 57d87c74..68168235 100644
--- a/episodes/describe-cases.Rmd
+++ b/episodes/describe-cases.Rmd
@@ -32,7 +32,7 @@ Let's start by loading the package `{incidence2}` to aggregate linelist data by
  We'll use the pipe `%>%` to connect some of their functions, including others from the packages `{dplyr}` and 
  `{ggplot2}`, so let's also call to the tidyverse package:
 
-```{r}
+```{r,eval=TRUE,message=FALSE,warning=FALSE}
 # Load packages
 library(incidence2) # For aggregating and visualising
 library(simulist) # For simulating linelist data
@@ -61,10 +61,11 @@ Its minimal configuration can generate a  linelist as shown in the below code ch
 ```{r}
 # Simulate linelist data for an outbreak with size between 1000 and 1500
 set.seed(1) # Set seed for reproducibility
-sim_data <- simulist::sim_linelist(outbreak_size = c(1000, 1500))
+sim_data <- simulist::sim_linelist(outbreak_size = c(1000, 1500)) %>%
+  dplyr::as_tibble() # for a simple data frame output
 
-# Display the first few rows of the simulated dataset
-head(sim_data)
+# Display the simulated dataset
+sim_data
 ```
 
  This linelist dataset offers individual-level information about the outbreak. 
@@ -73,10 +74,10 @@ head(sim_data)
 
 ## Additional Resources on Outbreak Data
 
- - This is the default configuration of `{simulist}`, should want to know more about its functionalities, 
+This is the default configuration of `{simulist}`. Should you want to know more about its functionalities, 
 check [middle](https://github.com/epiverse-trace/tutorials-middle/) and [late](https://epiverse-trace.github.io/tutorials-late/) tutorials.
 
- - You can find more information at the [`{outbreak}` documentation](https://outbreak-info.github.io/R-outbreak-info/)
+You can find datasets from past real emergencies in the [`{outbreak}` R package documentation](https://outbreak-info.github.io/R-outbreak-info/).
 
 :::::::::::::::::::
 
@@ -86,21 +87,20 @@ check [middle](https://github.com/epiverse-trace/tutorials-middle/) and [late](h
 
 Downstream analysis involves working with aggregated data rather than individual cases. This requires grouping linelist 
 data in the form of incidence data. The [incidence2]((https://www.reconverse.org/incidence2/articles/incidence2.html){.external target="_blank"}) 
-package offers an essential function, called `incidence`, for grouping case data, usually centered around dated events 
-and/or other factors. The code chunk provided below demonstrates the creation of an `incidence2` object from the 
-simulated  Ebola `linelist` data based on the  date of onset.
+package offers an essential function, called `incidence2::incidence()`, for grouping case data, usually centered around dated events 
+and/or other factors. The code chunk provided below demonstrates the creation of an `<incidence2>` class object from the 
+simulated  Ebola `linelist` data based on the date of onset.
 
 ```{r}
 # Create an incidence object by aggregating case data based on the date of onset
 dialy_incidence <- incidence2::incidence(
   sim_data,
   date_index = "date_onset",
-  interval = 1 # Aggregate by daily intervals
+  interval = "day" # Aggregate by daily intervals
 )
 
-# View the first 5 rows of the incidence data
-head(dialy_incidence, 5)
-
+# View the incidence data
+dialy_incidence
 ```
 Furthermore, with the `{incidence2}` package, you can specify the desired interval and categorize cases by one or 
 more factors. Below is a code snippet demonstrating weekly cases grouped by the date of onset and gender.
@@ -110,12 +110,12 @@ more factors. Below is a code snippet demonstrating weekly cases grouped by the
 weekly_incidence <- incidence2::incidence(
   sim_data,
   date_index = "date_onset",
-  interval = 7, # Aggregate by weekly intervals
+  interval = "week", # Aggregate by weekly intervals
   groups = c("sex", "case_type") # Group by sex and case type
 )
 
-# View the incidence data for the first 5 weeks
-head(weekly_incidence, 5)
+# View the incidence data
+weekly_incidence
 ```
 
 ::::::::::::::::::::::::::::::::::::: callout
@@ -123,6 +123,8 @@ head(weekly_incidence, 5)
 When cases are grouped by different factors, it's possible that these groups may have different date ranges in the 
 resulting `incidence2` object. The `incidence2` package provides a function called `complete_dates()` to ensure that an
  incidence object has the same range of dates for each group. By default, missing counts will be filled with 0.
+ 
+This functionality is also available as an argument within `incidence2::incidence()` by adding `complete_dates = TRUE`.
 
 ```{r}
 # Create an incidence object grouped by sex, aggregating daily
@@ -130,10 +132,12 @@ dialy_incidence_2 <- incidence2::incidence(
   sim_data,
   date_index = "date_onset",
   groups = "sex",
-  interval = 1 # Aggregate by daily intervals
+  interval = "day", # Aggregate by daily intervals
+  complete_dates = TRUE # Complete missing dates in the incidence object
 )
+```
 
-# Complete missing dates in the incidence object
+```{r,echo=FALSE,eval=FALSE}
 dialy_incidence_2_complete <- incidence2::complete_dates(
   x = dialy_incidence_2,
   expand = TRUE, # Expand to fill in missing dates
@@ -142,13 +146,15 @@ dialy_incidence_2_complete <- incidence2::complete_dates(
   allow_POSIXct = FALSE # Ensure that dates are not in POSIXct format
 )
 ```
+
+
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
 
 ::::::::::::::::::::::::::::::::::::: challenge 
 
 ## Challenge 1: Can you do it?
- - **Task**:Aggregate sim_data linelist based on admission date and case outcome in __biweekly__
+ - **Task**: Aggregate `sim_data` linelist based on admission date and case outcome in __biweekly__
   intervals, and save the results in an object called `biweekly_incidence`.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
@@ -172,7 +178,6 @@ base::plot(dialy_incidence) +
 
 ```{r}
 # Plot weekly incidence data
-
 base::plot(weekly_incidence) +
   ggplot2::labs(
     x = "Time (in weeks)", # x-axis label
@@ -181,6 +186,20 @@ base::plot(weekly_incidence) +
   tracetheme::theme_trace() # Apply the custom trace theme
 ``` 
 
+:::::::::::::::::::::::: callout
+
+#### Easy aesthetics
+
+We invite you to skim the `{incidence2}` package ["Get started" vignette](https://www.reconverse.org/incidence2/articles/incidence2.html). Find out how you can use arguments within `plot()` to customize the aesthetics of your `<incidence2>` class objects!
+
+```{r}
+base::plot(weekly_incidence, fill = "sex")
+```
+
+Some of them include `show_cases = TRUE`, `angle = 45`, and `n_breaks = 5`. Give them a try!
+
+::::::::::::::::::::::::
+
 ::::::::::::::::::::::::::::::::::::: challenge 
 
 ## Challenge 2: Can you do it?