-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.Rmd
141 lines (106 loc) · 5.72 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
---
output: github_document
---
```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
library(ECOTOXr)
```
# `ECOTOXr`
> Harness information from the [US EPA ECOTOXicology Knowledgebase](https://cfpub.epa.gov/ecotox/)
<!-- badges: start -->
[![R build status](https://github.com/pepijn-devries/ECOTOXr/workflows/R-CMD-check/badge.svg)](https://github.com/pepijn-devries/ECOTOXr/actions)
[![version](https://www.r-pkg.org/badges/version/ECOTOXr)](https://CRAN.R-project.org/package=ECOTOXr)
![cranlogs](https://cranlogs.r-pkg.org/badges/ECOTOXr)
<!--[![Codecov test coverage](https://codecov.io/gh/pepijn-devries/ECOTOXr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/pepijn-devries/ECOTOXr?branch=main)-->
<!-- badges: end -->
## Overview
<a href="https://github.com/pepijn-devries/ECOTOXr/"><img src="man/figures/logo.png" alt="ECOTOXr logo" align="right" class="pkgdown-hide" /></a>
`ECOTOXr` can be used to explore and analyse data from the [US EPA ECOTOX database](https://cfpub.epa.gov/ecotox/).
More specifically you can:
* Build a local SQLite copy of the [US EPA ECOTOX database](https://cfpub.epa.gov/ecotox/)
* Search and extract data from the local database
* Use experimental features to search the on-line dashboards: [ECOTOX](https://cfpub.epa.gov/ecotox/search.cfm) and
[CompTox](https://comptox.epa.gov/dashboard/batch-search)
## Why use `ECOTOXr`?
The `ECOTOXr` package allows you to search and extract data from the [ECOTOXicological Knowledgebase](https://cfpub.epa.gov/ecotox/)
and import it directly into `R`. This will allow you to formalize and document the search- and extract-procedures in `R` code.
This makes it easier to share and reproduce such procedures and its results. Moreover, you can directly apply any statistical
analysis offered in `R`.
## Installation
> Get CRAN version
```{r eval=FALSE}
install.packages("ECOTOXr")
```
> Get development version from r-universe
```{r eval=FALSE}
install.packages("ECOTOXr", repos = c("https://pepijn-devries.r-universe.dev", "https://cloud.r-project.org"))
```
## Usage
### Preparing the database
Although `ECOTOXr` has experimental features to search the on-line database. The package will
reach its full potential when you build a copy of the database on your local machine.
> Download and build a local copy of the latest ASCII export of the US EPA ECOTOX database
```{r eval=FALSE}
download_ecotox_data()
```
### Searching the local database for species and substances
Obviously, searching the local database is only possible after the download and build is
ready (see previous section).
> Search the local database for tests of water flea Daphnia magna exposed to benzene
```{r eval=FALSE}
search_ecotox(
list(
latin_name = list(terms = "Daphnia magna", method = "exact"),
chemical_name = list(terms = "benzene", method = "exact")
)
)
```
### Three ways of querying the local database
Let's have a look at 3 different approaches for retrieving a specific record from the local database, using the
unique identifier `result_id`. The first option is to use the build in `search_ecotox` function. It uses
simple `R` syntax and allows you to search and collect any field from any table in the database. Furthermore,
all requested output fields are automatically joined to the result without the end-user needing to know anything
about the database structure.
> Using the prefab function `search_ecotox` packaged by `ECOTOXr`
```{r warning = FALSE}
search_ecotox(
list(
result_id = list(terms = "401386", method = "exact")
),
as_data_frame = F
)
```
If you like to use [`dplyr`](https://dplyr.tidyverse.org/) verbs, you are in luck. SQLite database can be approached using
`dplyr` verbs. This approach will only return information from the `results` table. The end-user will have to join other information
(like test species and test substance) manually. This does require knowledge of the database structure.
> Using `dplyr` verbs
```{r warning = FALSE}
con <- dbConnectEcotox()
dplyr::tbl(con, "results") |>
dplyr::filter(result_id == "401386") |>
dplyr::collect()
```
If you prefer working using `SQL` directly, that is fine too. The [`RSQLite`](https://cran.r-project.org/package=RSQLite) package
allows you to get queries using `SQL` statements. The result is identical to that of the previous approach. Here too the end-user
needs knowledge of the database structure in order to join additional data.
> Using `SQL` syntax
```{r warning = FALSE}
dbGetQuery(con, "SELECT * FROM results WHERE result_id='401386'") |>
dplyr::as_tibble()
```
## Disclaimers
It is the end-users own responsibility to check the quality of collected data, using the original referenced source in order to evaluate its
fitness for use, see also: <https://cfpub.epa.gov/ecotox/help.cfm#info-limitations>.
Note that the package maintainer is not affiliated with the US EPA, this package is therefore **not** official US EPA software.
## Resources
* De Vries, P. (2024) ECOTOXr: An R package for reproducible and transparent retrieval of data from EPA's ECOTOX database. _Chemosphere_ 364 143078 <https://doi.org/10.1016/j.chemosphere.2024.143078>
* [Manual of the CRAN release](https://CRAN.R-project.org/package=ECOTOXr)
* EPA ECOTOX help <https://cfpub.epa.gov/ecotox/help.cfm>
* Olker, Jennifer H.; Elonen, Colleen M.; Pilli, Anne; Anderson, Arne; Kinziger, Brian; Erickson, Stephen; Skopinski, Michael;
Pomplun, Anita; LaLone, Carlie A.; Russom, Christine L.; Hoff, Dale. (2022): The ECOTOXicology Knowledgebase: A Curated Database of
Ecologically Relevant Toxicity Tests to Support Environmental Research and Risk Assessment. _Environmental Toxicology and Chemistry_
41(6) 1520-1539 <https://doi.org/10.1002/etc.5324>