full review post-trial 01 and new episodes in tutorials-early #70

Merged
merged 26 commits into from
Jun 17, 2024
26 commits
79d7d69
Update episodes/describe-cases.Rmd
Degoot-AM Jun 3, 2024
6b4c918
fix learning objective
avallecam Jun 12, 2024
4f79a87
fix typo
avallecam Jun 12, 2024
e4ed011
fix typo edit
avallecam Jun 13, 2024
45f3a6a
fix typos in clean file
avallecam Jun 13, 2024
1ffd6d9
fix typos in delay episodes
avallecam Jun 13, 2024
e653b74
add fixed proposed for clean episode
avallecam Jun 13, 2024
07373b7
fix links for the early repository
avallecam Jun 13, 2024
4bb9ead
fix writing
avallecam Jun 14, 2024
49314af
Update episodes/clean-data.Rmd
Degoot-AM Jun 14, 2024
bb62e71
Update episodes/clean-data.Rmd
Degoot-AM Jun 14, 2024
dfad0a2
fix code for tutorial to run
avallecam Jun 14, 2024
7588529
required to make the latest version to run
avallecam Jun 14, 2024
2917a86
Update episodes/simple-analysis.Rmd
Degoot-AM Jun 17, 2024
e2e4191
fix setup page naming
avallecam Jun 17, 2024
3406626
fix typo in setup
avallecam Jun 17, 2024
91e52b2
fit website title
avallecam Jun 17, 2024
a710666
add content on table in setup
avallecam Jun 17, 2024
37dd49c
add edit suggestions to read-cases
avallecam Jun 17, 2024
fec84e1
add edit suggestion to clean-data
avallecam Jun 17, 2024
895d671
add edit suggestion to describe-cases
avallecam Jun 17, 2024
7d52b30
add edit suggestion to simple-analysis
avallecam Jun 17, 2024
ba6504d
add namespace after intro
avallecam Jun 17, 2024
76e7cbd
add namespace reminder
avallecam Jun 17, 2024
8fb5d66
add add early task packages + links to setup
avallecam Jun 17, 2024
400775c
fix lintr checks
avallecam Jun 17, 2024
2 changes: 1 addition & 1 deletion config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
carpentry: 'incubator'

# Overall title for pages.
title: 'Using delays to quantify transmission'
title: 'Read and clean case data, and make linelist for outbreak analytics with R'

# Date the lesson was created (YYYY-MM-DD, this is empty by default)
created:
70 changes: 47 additions & 23 deletions episodes/clean-data.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -15,11 +15,33 @@

- Explain how to clean, curate, and standardize case data using the `{cleanepi}` package
- Demonstrate how to convert case data to `linelist` data
- Perform essential data-cleaning operations on a raw case dataset.

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::: prereq

This episode requires you to:

- Download the [simulated_ebola_2.csv](https://epiverse-trace.github.io/tutorials-early/data/simulated_ebola_2.csv)
- Save it in the `data/` folder.

:::::::::::::::::::::

## Introduction
In the process of analyzing outbreak data, it's essential to ensure that the dataset is clean, curated, standardized, and validate to facilitate accurate and reproducible analysis. This episode focuses on cleaning epidemics and outbreaks data using the [cleanepi](https://epiverse-trace.github.io/cleanepi/) package, and validate it using the [linelist](https://epiverse-trace.github.io/linelist/) package. For demonstration purposes, we'll work with a simulated dataset of Ebola cases.
In the process of analyzing outbreak data, it's essential to ensure that the dataset is clean, curated, standardized, and valid to facilitate accurate and reproducible analysis. This episode focuses on cleaning epidemic and outbreak data using the [cleanepi](https://epiverse-trace.github.io/cleanepi/) package, and validating it using the [linelist](https://epiverse-trace.github.io/linelist/) package. For demonstration purposes, we'll work with a simulated dataset of Ebola cases.

::::::::::::::::::: checklist

### The double-colon

The double-colon `::` in R lets you call a specific function from a package without loading the entire package into the current environment.

For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package.

This helps us remember package functions and avoid namespace conflicts.

:::::::::::::::::::
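
As a small illustration of the checklist above, the sketch below uses only `{stats}`, which ships with R. The `values` vector and the locally defined `median()` are hypothetical and exist only to show how a qualified call sidesteps a name clash.

```r
values <- c(2, 4, 4, 5, 10)

# Qualified call: no library() call is needed for this to work
stats::median(values)

# A user-defined function with the same name masks the unqualified call ...
median <- function(x) "masked"
masked_result <- median(values)

# ... but the qualified call still reaches the {stats} version
qualified_result <- stats::median(values)
```

The same pattern applies to any package function, for example `dplyr::filter()` versus `stats::filter()`.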


The first step is to import the dataset following the guidelines outlined in the [Read case data](../episodes/read-cases.Rmd) episode. This involves loading the dataset into our environment and viewing its structure and content.
Expand All @@ -30,9 +52,9 @@
library("here")

# Read data
# e.g.: if path to file is data/raw-data/simulated_ebola_2.csv then:
# e.g.: if path to file is data/simulated_ebola_2.csv then:
raw_ebola_data <- rio::import(
here::here("data", "raw-data", "simulated_ebola_2.csv")
here::here("data", "simulated_ebola_2.csv")
)
```

Expand Down Expand Up @@ -83,14 +105,18 @@

If you want to maintain certain column names without subjecting them to the standardization process, you can utilize the `keep` parameter of the `standardize_column_names()` function. This parameter accepts a vector of column names that are intended to be kept unchanged.

**Exercise:** Standardize the column names of the input dataset, but keep the “V1” column as is.
::::::::::::::::::::::::::::::::::::: challenge

Standardize the column names of the input dataset, but keep the “V1” column as it is.

::::::::::::::::::::::::::::::::::::::::::::::::
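
To make the idea of a `keep` argument concrete, here is a base-R sketch of what standardizing names while protecting one column could look like. It is not the `{cleanepi}` implementation: `standardize_names()` and the example column names are hypothetical, and the cleaning rules (lower case, underscores) are an assumption.

```r
# Hypothetical helper: lower-case names and replace runs of
# non-alphanumeric characters with "_", except for names listed in `keep`
standardize_names <- function(x, keep = character()) {
  cleaned <- tolower(x)
  cleaned <- gsub("[^a-z0-9]+", "_", cleaned)
  cleaned <- gsub("^_+|_+$", "", cleaned) # trim leading/trailing underscores
  ifelse(x %in% keep, x, cleaned)
}

raw_names <- c("V1", "case id", "Date.Onset", " Age ")
clean_names <- standardize_names(raw_names, keep = "V1")
clean_names
```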

### Removing irregularities

Raw data may contain irregularities such as duplicated and empty rows and columns, as well as constant columns. The `remove_duplicates()` and `remove_constants()` functions from `{cleanepi}` remove such irregularities, as demonstrated in the code chunk below.

```{r}
sim_ebola_data <- cleanepi::remove_constant(sim_ebola_data)
sim_ebola_data <- cleanepi::remove_constants(sim_ebola_data)
sim_ebola_data <- cleanepi::remove_duplicates(sim_ebola_data)
```
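
For intuition about what these two steps do, the base-R sketch below drops constant columns and duplicated rows from a toy data frame. The data and object names are hypothetical, and this is only an approximation of the behaviour, not how `{cleanepi}` implements it.

```r
toy <- data.frame(
  id = c(1, 2, 2, 3),
  region = c("A", "A", "A", "A"), # constant column
  age = c(30, 41, 41, 25)
)

# A column is constant when it holds at most one distinct value
is_constant <- vapply(toy, function(col) length(unique(col)) <= 1, logical(1))
toy <- toy[, !is_constant, drop = FALSE]

# Keep only the first occurrence of each fully duplicated row
toy <- toy[!duplicated(toy), , drop = FALSE]
toy
```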

Expand All @@ -101,24 +127,24 @@
In addition to irregularities, raw data can contain missing values that may be encoded by different strings, including the empty string. To ensure robust analysis, it is good practice to replace all missing values with `NA` in the entire dataset. Below is a code snippet demonstrating how you can achieve this in `{cleanepi}`:

```{r}
sim_ebola_data <- cleanepi::replace_missing_values(sim_ebola_data)
sim_ebola_data <- cleanepi::replace_missing_values(
data = sim_ebola_data,
na_strings = ""
)
```
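
The base-R sketch below shows the underlying idea on a toy data frame: every value matching one of the missing-value strings becomes `NA`. The data and the choice of `"-99"` as a second missing-value code are hypothetical, and this is not the `{cleanepi}` implementation.

```r
toy <- data.frame(
  id = c("1", "2", "3"),
  status = c("confirmed", "", "-99"),
  stringsAsFactors = FALSE
)

# Strings that should be treated as missing (assumed for this example)
na_strings <- c("", "-99")

# Apply the replacement column by column, keeping the data frame structure
toy[] <- lapply(toy, function(col) {
  col[col %in% na_strings] <- NA
  col
})
toy
```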

### Validating subject IDs

Each entry in the dataset represents a subject and should be distinguishable by a specific column formatted in a particular way, such as falling within a specified range, containing certain prefixes and/or suffixes, or containing a specific number of characters. The `{cleanepi}` package offers the `check_subject_ids()` function designed precisely for this task, as shown in the code chunk below. This function validates whether the IDs are unique and meet the required criteria.

```{r}
# remove this chunk code once {cleanepi} is updated.
# The coercion made here will be accounted for within {cleanepi}
sim_ebola_data$case_id <- as.character(sim_ebola_data$case_id)
```

```{r}
sim_ebola_data <- cleanepi::check_subject_ids(sim_ebola_data,
target_columns = "case_id",
range = c(0, 15000)
)
sim_ebola_data <-
cleanepi::check_subject_ids(
data = sim_ebola_data,
target_columns = "case_id",
range = c(0, 15000)
)
```

Note that our simulated dataset does contain duplicated subject IDs.
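
The two checks described above, uniqueness and range, can be sketched in base R as follows. The `ids` vector is hypothetical and the range mirrors the `c(0, 15000)` used in this episode. This is an illustration of the logic, not the `{cleanepi}` implementation.

```r
ids <- c("12", "105", "105", "20000", "abc")

# Flag every occurrence of a duplicated ID, not just the repeats
is_duplicated <- duplicated(ids) | duplicated(ids, fromLast = TRUE)

# Non-numeric IDs become NA and are treated as out of range
numeric_ids <- suppressWarnings(as.numeric(ids))
in_range <- !is.na(numeric_ids) & numeric_ids >= 0 & numeric_ids <= 15000

data.frame(ids, is_duplicated, in_range)
```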
Expand Down Expand Up @@ -168,8 +194,7 @@
```{r, warning=FALSE}
sim_ebola_data <- cleanepi::check_date_sequence(
data = sim_ebola_data,
target_columns = c("date_onset", "date_sample"),
remove = TRUE
target_columns = c("date_onset", "date_sample")
)
```
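
Conceptually, a date-sequence check asks whether each row's dates appear in the expected chronological order. A base-R sketch with hypothetical toy data, assuming onset should not come after sampling:

```r
toy <- data.frame(
  date_onset = as.Date(c("2023-01-01", "2023-01-10", "2023-02-01")),
  date_sample = as.Date(c("2023-01-03", "2023-01-05", "2023-02-01"))
)

# A row is out of order when symptom onset is later than the sample date
out_of_order <- toy$date_onset > toy$date_sample

# Inspect the offending rows before deciding how to handle them
toy[out_of_order, ]
```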

Expand Down Expand Up @@ -212,7 +237,7 @@
until the date this document was generated (`r Sys.Date()`).

```{r}
sim_ebola_data <- cleanepi::span(
sim_ebola_data <- cleanepi::timespan(
sim_ebola_data,
target_column = "date_sample",
end_date = Sys.Date(),
Expand All @@ -234,13 +259,11 @@

Furthermore, you can combine multiple data cleaning tasks via the pipe operator `|>`, as shown in the code snippet below.
```{r}
# remove the line below once Karim has updated cleanepi
raw_ebola_data$`case id` <- as.character(raw_ebola_data$`case id`)
# PERFORM THE OPERATIONS USING THE pipe SYNTAX
cleaned_data <- raw_ebola_data |>
cleanepi::standardize_column_names(keep = "V1", rename = NULL) |>
cleanepi::replace_missing_values(target_columns = NULL) |>
cleanepi::remove_constant(cutoff = 1.0) |>
cleanepi::replace_missing_values(na_strings = "") |>
cleanepi::remove_constants(cutoff = 1.0) |>
cleanepi::remove_duplicates(target_columns = NULL) |>
cleanepi::standardize_dates(
target_columns = c("date_onset", "date_sample"),
Expand All @@ -266,7 +289,7 @@

You can view the report using the `cleanepi::print_report()` function.

![Example of data cleaning report generated by `{cleanepi}`](fig/report_demo.png)

Check warning on line 292 in episodes/clean-data.Rmd

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/report_demo.png

## Validating and tagging case data
In outbreak analysis, once you have completed the initial steps of reading and cleaning the case data,
Expand All @@ -280,7 +303,8 @@

```{r,warning=FALSE}
library("linelist")
data <- linelist::make_linelist(cleaned_data,
data <- linelist::make_linelist(
x = cleaned_data,
id = "case_id",
age = "age",
date_onset = "date_onset",
12 changes: 7 additions & 5 deletions episodes/delays-functions.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -90,9 +90,11 @@

### The double-colon

The double-colon `::` in R is used to access functions or objects from a specific package without loading the entire package into the current environment. This allows for a more targeted approach to using package components and helps avoid namespace conflicts.
The double-colon `::` in R lets you call a specific function from a package without loading the entire package into the current environment.

`::` lets you call a specific function from a package by explicitly mentioning the package name. For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package without loading the entire package.
For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package.

This helps us remember package functions and avoid namespace conflicts.

:::::::::::::::::::

Expand All @@ -111,7 +113,7 @@

If you need it, read in detail about the [R probability functions for the normal distribution](https://sakai.unc.edu/access/content/group/3d1eb92e-7848-4f55-90c3-7c72a54e7e43/public/docs/lectures/lecture13.htm#probfunc), each of their definitions, and identify in which part of a distribution they are located!

![The four probability functions for the normal distribution ([Jack Weiss, 2012](https://sakai.unc.edu/access/content/group/3d1eb92e-7848-4f55-90c3-7c72a54e7e43/public/docs/lectures/lecture13.htm#probfunc))](fig/fig5a-normaldistribution.png)

Check warning on line 116 in episodes/delays-functions.Rmd

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/fig5a-normaldistribution.png

::::::::::::::::::::

Expand Down Expand Up @@ -150,7 +152,7 @@

::::::::::::::::::::::::::::::::: challenge

### Window for contact tracing and the Serial interval
### Window for contact tracing and the serial interval

The **serial interval** is important in the optimisation of contact tracing since it provides a time window for the containment of a disease spread ([Fine, 2003](https://academic.oup.com/aje/article/158/11/1039/162725)). Depending on the serial interval, we can evaluate the need to expand the number of days pre-onset to consider in the contact tracing to include more backwards contacts ([Davis et al., 2020](https://assets.publishing.service.gov.uk/media/61e9ab3f8fa8f50597fb3078/S0523_Oxford_-_Backwards_contact_tracing.pdf)).

Expand Down Expand Up @@ -247,7 +249,7 @@

::::::::::::::::::::::::::::::::: challenge

### Length of quarantine and Incubation period
### Length of quarantine and incubation period

The **incubation period** distribution is a useful delay to assess the length of active monitoring or quarantine ([Lauer et al., 2020](https://www.acpjournals.org/doi/10.7326/M20-0504)). Similarly, delays from symptom onset to recovery (or death) will determine the required duration of health care and case isolation ([Cori et al., 2017](https://royalsocietypublishing.org/doi/10.1098/rstb.2016.0371)).

Expand Down Expand Up @@ -406,7 +408,7 @@

::::::::::::::::::::::::::::::::: challenge

### Use an Incubation period for COVID-19 to estimate Rt
### Use an incubation period for COVID-19 to estimate Rt

Estimate the time-varying reproduction number for the first 60 days of the `example_confirmed` data set from `{EpiNow2}`. Access an incubation period for COVID-19 from `{epiparameter}` and use it as a reporting delay.

21 changes: 17 additions & 4 deletions episodes/delays-reuse.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@

Infectious diseases follow an infection cycle, which usually includes the following phases: presymptomatic period, symptomatic period and recovery period, as described by their [natural history](../learners/reference.md#naturalhistory). These time periods can be used to understand transmission dynamics and inform disease prevention and control interventions.

![Definition of key time periods. From [Xiang et al, 2021](https://www.sciencedirect.com/science/article/pii/S2468042721000038)](fig/time-periods.jpg)

Check warning on line 38 in episodes/delays-reuse.Rmd

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/time-periods.jpg


::::::::::::::::: callout
Expand All @@ -61,6 +61,19 @@
library(tidyverse)
```

::::::::::::::::::: checklist

### The double-colon

The double-colon `::` in R lets you call a specific function from a package without loading the entire package into the current environment.

For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package.

This helps us remember package functions and avoid namespace conflicts.

:::::::::::::::::::


## The problem

If we want to estimate the transmissibility of an infection, it's common to use a package such as `{EpiEstim}` or `{EpiNow2}`. However, both require some epidemiological information as an input. For example, in `{EpiNow2}` we use `EpiNow2::dist_spec()` to specify a [generation time](../learners/reference.md#generationtime) as a probability `distribution` adding its `mean`, standard deviation (`sd`), and maximum value (`max`). To specify a `generation_time` that follows a _Gamma_ distribution with mean $\mu = 4$, standard deviation $\sigma = 2$, and a maximum value of 20, we write:
Expand Down Expand Up @@ -99,12 +112,12 @@

The generation time, jointly with the reproduction number ($R$), provides valuable insights into the strength of transmission and informs the implementation of control measures. Given an $R>1$, the shorter the generation time, the earlier the incidence of disease cases will grow.
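
A back-of-the-envelope calculation illustrates this point. The sketch below is a deliberate simplification: it assumes a single initial case, non-overlapping generations, and that each case infects exactly $R$ others, so cases per generation grow as $R^{t/T_g}$. All values are hypothetical.

```r
R <- 2      # reproduction number (assumed)
days <- 28  # time horizon in days (assumed)

# Two hypothetical generation times, in days
generation_time <- c(short = 4, long = 7)

# Number of completed generations within the time horizon
generations <- days / generation_time

# New cases in the latest generation, starting from one case
cases <- R^generations
cases
```

With the same $R$, the shorter generation time packs more generations into the same period, so incidence rises sooner.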

![Video from the MRC Centre for Global Infectious Disease Analysis, Ep 76. Science In Context - Epi Parameter Review Group with Dr Anne Cori (27-07-2023) at <https://youtu.be/VvpYHhFDIjI?si=XiUyjmSV1gKNdrrL>](fig/reproduction-generation-time.png)

Check warning on line 115 in episodes/delays-reuse.Rmd

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/reproduction-generation-time.png

In calculating the effective reproduction number ($R_{t}$), the *generation time* distribution is often approximated by the [serial interval](../learners/reference.md#serialinterval) distribution.
This frequent approximation is because it is easier to observe and measure the onset of symptoms than the onset of infectiousness.

![A schematic of the relationship of different time periods of transmission between an infector and an infectee in a transmission pair. Exposure window is defined as the time interval having viral exposure, and transmission window is defined as the time interval for onward transmission with respect to the infection time ([Chung Lau et al., 2021](https://academic.oup.com/jid/article/224/10/1664/6356465)).](fig/serial-interval-observed.jpeg)

Check warning on line 120 in episodes/delays-reuse.Rmd

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/serial-interval-observed.jpeg

However, using the *serial interval* as an approximation of the *generation time* is primarily valid for diseases in which infectiousness starts after symptom onset ([Chung Lau et al., 2021](https://academic.oup.com/jid/article/224/10/1664/6356465)). In cases where infectiousness starts before symptom onset, the serial intervals can have negative values, which is the case for diseases with pre-symptomatic transmission ([Nishiura et al., 2020](https://www.ijidonline.com/article/S1201-9712(20)30119-3/fulltext#gr2)).
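
To see how a negative serial interval arises, recall that it is the difference between the infectee's and the infector's symptom onset dates. The sketch below uses hypothetical transmission pairs. In the second pair the infectee develops symptoms before the infector, which is what pre-symptomatic transmission can produce.

```r
pairs <- data.frame(
  infector_onset = as.Date(c("2020-02-01", "2020-02-03", "2020-02-10")),
  infectee_onset = as.Date(c("2020-02-05", "2020-02-02", "2020-02-14"))
)

# Serial interval in days: infectee onset minus infector onset
serial_interval <- as.numeric(pairs$infectee_onset - pairs$infector_onset)
serial_interval
```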

Expand All @@ -116,13 +129,13 @@

When we calculate the *serial interval*, we see that not all case pairs have the same time length. We will observe this variability for any case pair and individual time period, including the [incubation period](../learners/reference.md#incubation) and [infectious period](../learners/reference.md#infectiousness).

![Serial intervals of possible case pairs in (a) COVID-19 and (b) MERS-CoV. Pairs represent a presumed infector and their presumed infectee plotted by date of symptom onset ([Althobaity et al., 2022](https://www.sciencedirect.com/science/article/pii/S2468042722000537#fig6)).](fig/serial-interval-pairs.jpg)

Check warning on line 132 in episodes/delays-reuse.Rmd

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/serial-interval-pairs.jpg

To summarise these data from individual and pair time periods, we can find the **statistical distributions** that best fit the data ([McFarland et al., 2023](https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2023.28.27.2200806)).

<!-- add a reference about good practices to estimate distributions -->

![Fitted serial interval distribution for (a) COVID-19 and (b) MERS-CoV based on reported transmission pairs in Saudi Arabia. We fitted three commonly used distributions, Lognormal, Gamma, and Weibull distributions, respectively ([Althobaity et al., 2022](https://www.sciencedirect.com/science/article/pii/S2468042722000537#fig5)).](fig/seria-interval-fitted-distributions.jpg)

Check warning on line 138 in episodes/delays-reuse.Rmd

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/seria-interval-fitted-distributions.jpg

Statistical distributions are summarised in terms of their **summary statistics** like the *location* (mean and percentiles) and *spread* (variance or standard deviation) of the distribution, or with their **distribution parameters** that inform about the *form* (shape and rate/scale) of the distribution. These estimated values can be reported with their **uncertainty** (95% confidence intervals).
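
Summary statistics and distribution parameters are two views of the same distribution. For a Gamma distribution, the mean is $\text{shape} \times \text{scale}$ and the standard deviation is $\sqrt{\text{shape}} \times \text{scale}$. The sketch below uses hypothetical parameter values and base R's `{stats}` functions to move between the two descriptions.

```r
# Hypothetical Gamma distribution parameters
shape <- 4
scale <- 1

# Summary statistics implied by the parameters
gamma_mean <- shape * scale
gamma_sd <- sqrt(shape) * scale

# Going the other way: recover the parameters from mean and sd
shape_from_stats <- gamma_mean^2 / gamma_sd^2
scale_from_stats <- gamma_sd^2 / gamma_mean

# Percentiles (2.5%, 50%, 97.5%) of the same distribution
stats::qgamma(c(0.025, 0.5, 0.975), shape = shape, scale = scale)
```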

Expand All @@ -141,7 +154,7 @@
| MERS-CoV | 14.08(13.1–15.2) | 2.58(2.50–2.68) | 0.44(0.39–0.5) |
| COVID-19 | 5.2(4.2–6.5) | 1.45(1.31–1.61) | 0.63(0.54–0.74) |

Table: Serial interval estimates using Gamma, Weibull, and Log normal distributions. 95% confidence intervals for the shape and scale (logmean and sd for Log normal) parameters are shown in brackets ([Althobaity et al., 2022](https://www.sciencedirect.com/science/article/pii/S2468042722000537#tbl3)).
Table: Serial interval estimates using Gamma, Weibull, and Log Normal distributions. 95% confidence intervals for the shape and scale (logmean and sd for Log Normal) parameters are shown in brackets ([Althobaity et al., 2022](https://www.sciencedirect.com/science/article/pii/S2468042722000537#tbl3)).

:::::::::::::::::::::::::

Expand All @@ -151,12 +164,12 @@

Assume that COVID-19 and SARS have similar reproduction number values and that the serial interval approximates the generation time.

Given the Serial interval of both infections in the figure below:
Given the serial interval of both infections in the figure below:

- Which one would be harder to control?
- Why do you conclude that?

![Serial interval of novel coronavirus (COVID-19) infections overlaid with a published distribution of SARS. ([Nishiura et al., 2020](https://www.ijidonline.com/article/S1201-9712(20)30119-3/fulltext))](fig/serial-interval-covid-sars.jpg)

Check warning on line 172 in episodes/delays-reuse.Rmd

GitHub Actions / Build markdown source files if valid

[image missing alt-text]: fig/serial-interval-covid-sars.jpg

::::::::::::::::: hint

Expand Down Expand Up @@ -251,7 +264,7 @@

::::::::::::::::: spoiler

### Why do we have a 'NA' entry?
### Why do we have an 'NA' entry?

Entries with a missing value (`<NA>`) in the `prob_distribution` column are *non-parameterised* entries. They have summary statistics but no probability distribution. Compare these two outputs:

Expand Down Expand Up @@ -633,7 +646,7 @@

::::::::::::::::: discussion

### The distribution Zoo
### The distribution zoo

Explore this Shiny app called **The Distribution Zoo**!

40 changes: 28 additions & 12 deletions episodes/describe-cases.Rmd
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
title: 'Aggregate and visulaize'
title: 'Aggregate and visualize'
teaching: 20
exercises: 10
---
Expand Down Expand Up @@ -29,6 +29,19 @@ packages. A key observation in EDA of epidemic analysis is capturing the relatio
reported cases, spanning various categories (confirmed, hospitalized, deaths, and recoveries), locations, and other
demographic factors such as gender, age, etc.

::::::::::::::::::: checklist

### The double-colon

The double-colon `::` in R lets you call a specific function from a package without loading the entire package into the current environment.

For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package.

This helps us remember package functions and avoid namespace conflicts.

:::::::::::::::::::


## Synthetic outbreak data

To illustrate the process of conducting EDA on outbreak data, we will generate a line list
Expand Down Expand Up @@ -101,8 +114,6 @@ linelist <- simulist::sim_linelist(
hosp_death_risk = 0.5,
non_hosp_death_risk = 0.05,
outbreak_start_date = as.Date("2023-01-01"),
add_names = TRUE,
add_ct = TRUE,
outbreak_size = c(1000, 10000),
population_age = c(1, 90),
case_type_probs = c(suspected = 0.2, probable = 0.1, confirmed = 0.7),
Expand Down Expand Up @@ -166,7 +177,8 @@ dialy_incidence_data_2 <- incidence2::incidence(
)

# Complete missing dates in the incidence object
incidence2::complete_dates(dialy_incidence_data_2,
incidence2::complete_dates(
x = dialy_incidence_data_2,
expand = TRUE,
fill = 0L, by = 1L,
allow_POSIXct = FALSE
Expand All @@ -186,21 +198,25 @@ library("ggplot2")
library("tracetheme")

# Plot daily incidence data
base::plot(dialy_incidence_data) + ggplot2::labs(
x = "Time (in days)",
y = "Dialy cases"
) + tracetheme::theme_trace()
base::plot(dialy_incidence_data) +
ggplot2::labs(
x = "Time (in days)",
y = "Dialy cases"
) +
tracetheme::theme_trace()
```


```{r, message=FALSE, warning=FALSE}

# Plot weekly incidence data

base::plot(weekly_incidence_data) + ggplot2::labs(
x = "Time (in days)",
y = "weekly cases"
) + tracetheme::theme_trace()
base::plot(weekly_incidence_data) +
ggplot2::labs(
x = "Time (in days)",
y = "weekly cases"
) +
tracetheme::theme_trace()
```

::::::::::::::::::::::::::::::::::::: challenge
6 changes: 4 additions & 2 deletions episodes/quantify-transmissibility.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -61,9 +61,11 @@ library(tidyverse)

### The double-colon

The double-colon `::` in R is used to access functions or objects from a specific package without loading the entire package into the current environment. This allows for a more targeted approach to using package components and helps avoid namespace conflicts.
The double-colon `::` in R lets you call a specific function from a package without loading the entire package into the current environment.

`::` lets you call a specific function from a package by explicitly mentioning the package name. For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package without loading the entire package.
For example, `dplyr::filter(data, condition)` uses `filter()` from the `{dplyr}` package.

This helps us remember package functions and avoid namespace conflicts.

:::::::::::::::::::
