Commit

Merge pull request #37 from pepijn-devries/work-in-progress
Added test to compare online search with results from local search
pepijn-devries authored May 24, 2024
2 parents b15528d + 312bc1e commit 0cde415
Showing 8 changed files with 154 additions and 31 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/pkgdown.yaml
@@ -22,7 +22,7 @@ jobs:
permissions:
contents: write
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- uses: r-lib/actions/setup-pandoc@v2

@@ -41,7 +41,7 @@ jobs:

- name: Deploy to GitHub pages 🚀
if: github.event_name != 'pull_request'
uses: JamesIves/github-pages-deploy-action@v4.4.1
uses: JamesIves/github-pages-deploy-action@v4.5.0
with:
clean: false
branch: gh-pages
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Package: ECOTOXr
Type: Package
Title: Download and Extract Data from US EPA's ECOTOX Database
Version: 1.0.9
Date: 2024-01-07
Version: 1.1.0
Date: 2024-02-10
Authors@R: c(person("Pepijn", "de Vries", role = c("aut", "cre", "dtc"),
email = "[email protected]",
comment = c(ORCID = "0000-0002-7961-6646")))
5 changes: 5 additions & 0 deletions NEWS.md
@@ -1,3 +1,8 @@
ECOTOXr v1.1.0 (Release date: 2024-02-10)
-------------

* Added `pkgdown` generated website

ECOTOXr v1.0.9 (Release date: 2024-01-07)
-------------

26 changes: 13 additions & 13 deletions README.Rmd
@@ -11,26 +11,26 @@ knitr::opts_chunk$set(
library(ECOTOXr)
```

> `{ECOTOXr}` Harness information from the [US EPA ECOTOXicology Knowledgebase](https://cfpub.epa.gov/ecotox/)
> `ECOTOXr` Harness information from the [US EPA ECOTOXicology Knowledgebase](https://cfpub.epa.gov/ecotox/)
[![R build status](https://github.com/pepijn-devries/ECOTOXr/workflows/R-CMD-check/badge.svg)](https://github.com/pepijn-devries/ECOTOXr/actions)
[![version](https://www.r-pkg.org/badges/version/ECOTOXr)](https://CRAN.R-project.org/package=ECOTOXr)
![cranlogs](https://cranlogs.r-pkg.org/badges/ECOTOXr)


## Overview

<a href="https://github.com/pepijn-devries/ECOTOXr/"><img src="man/figures/logo.png" alt="ECOTOXr logo" align="right" /></a>
`{ECOTOXr}` can be used to explore and analyse data from the [US EPA ECOTOX database](https://cfpub.epa.gov/ecotox/).
<a href="https://github.com/pepijn-devries/ECOTOXr/"><img src="man/figures/logo.png" alt="ECOTOXr logo" align="right" class="pkgdown-hide" /></a>
`ECOTOXr` can be used to explore and analyse data from the [US EPA ECOTOX database](https://cfpub.epa.gov/ecotox/).
More specifically you can:

* Build a local SQLite copy of the [US EPA ECOTOX database](https://cfpub.epa.gov/ecotox/)
* Search and extract data from the local database
* Use experimental features to search the on-line dashboards: [ECOTOX](https://cfpub.epa.gov/ecotox/search.cfm) and
[CompTox](https://comptox.epa.gov/dashboard/batch-search)

## Why use `{ECOTOXr}`?
## Why use `ECOTOXr`?

The `{ECOTOXr}` package allows you to search and extract data from the [ECOTOXicological Knowledgebase](https://cfpub.epa.gov/ecotox/)
The `ECOTOXr` package allows you to search and extract data from the [ECOTOXicological Knowledgebase](https://cfpub.epa.gov/ecotox/)
and import it directly into `R`. This will allow you to formalize and document the search- and extract-procedures in `R` code.
This makes it easier to share and reproduce such procedures and its results. Moreover, you can directly apply any statistical
analysis offered in `R`.
@@ -42,16 +42,16 @@ analysis offered in `R`.
install.packages("ECOTOXr")
```

> Get development version on github
> Get development version from r-universe
```{r eval=FALSE}
devtools::install_github('pepijn-devries/ECOTOXr')
install.packages("ECOTOXr", repos = c("https://pepijn-devries.r-universe.dev", "https://cloud.r-project.org"))
```

## Usage

### Preparing the database

Although `{ECOTOXr}` has experimental features to search the on-line database. The package will
Although `ECOTOXr` has experimental features to search the on-line database. The package will
reach its full potential when you build a copy of the database on your local machine.

> Download and build a local copy of the latest ASCII export of the US EPA ECOTOX database
@@ -84,7 +84,7 @@ simple `R` syntax and allows you to search and collect any field from any table
all requested output fields are automatically joined to the result without the end-user needing to know anything
about the database structure.

> Using the prefab function `search_ecotox` packaged by `{ECOTOXr}`
> Using the prefab function `search_ecotox` packaged by `ECOTOXr`
```{r warning = FALSE}
search_ecotox(
@@ -95,11 +95,11 @@ search_ecotox(
)
```

If you like to use [`{dplyr}`](https://dplyr.tidyverse.org/) verbs, you are in luck. SQLite database can be approached using
`{dplyr}` verbs. This approach will only return information from the `results` table. The end-user will have to join other information
If you like to use [`dplyr`](https://dplyr.tidyverse.org/) verbs, you are in luck. SQLite database can be approached using
`dplyr` verbs. This approach will only return information from the `results` table. The end-user will have to join other information
(like test species and test substance) manually. This does require knowledge of the database structure.

> Using `{dplyr}` verbs
> Using `dplyr` verbs
```{r warning = FALSE}
con <- dbConnectEcotox()
@@ -108,7 +108,7 @@ dplyr::tbl(con, "results") |>
dplyr::collect()
```

If you prefer working using `SQL` directly, that is fine too. The [`{RSQLite}`](https://cran.r-project.org/package=RSQLite) package
If you prefer working using `SQL` directly, that is fine too. The [`RSQLite`](https://cran.r-project.org/package=RSQLite) package
allows you to get queries using `SQL` statements. The result is identical to that of the previous approach. Here too the end-user
needs knowledge of the database structure in order to join additional data.

28 changes: 14 additions & 14 deletions README.md
@@ -1,14 +1,14 @@

> `{ECOTOXr}` Harness information from the [US EPA ECOTOXicology
> `ECOTOXr` Harness information from the [US EPA ECOTOXicology
> Knowledgebase](https://cfpub.epa.gov/ecotox/) [![R build
> status](https://github.com/pepijn-devries/ECOTOXr/workflows/R-CMD-check/badge.svg)](https://github.com/pepijn-devries/ECOTOXr/actions)
> [![version](https://www.r-pkg.org/badges/version/ECOTOXr)](https://CRAN.R-project.org/package=ECOTOXr)
> ![cranlogs](https://cranlogs.r-pkg.org/badges/ECOTOXr)
## Overview

<a href="https://github.com/pepijn-devries/ECOTOXr/"><img src="man/figures/logo.png" alt="ECOTOXr logo" align="right" /></a>
`{ECOTOXr}` can be used to explore and analyse data from the [US EPA
<a href="https://github.com/pepijn-devries/ECOTOXr/"><img src="man/figures/logo.png" alt="ECOTOXr logo" align="right" class="pkgdown-hide" /></a>
`ECOTOXr` can be used to explore and analyse data from the [US EPA
ECOTOX database](https://cfpub.epa.gov/ecotox/). More specifically you
can:

@@ -19,9 +19,9 @@ can:
[ECOTOX](https://cfpub.epa.gov/ecotox/search.cfm) and
[CompTox](https://comptox.epa.gov/dashboard/batch-search)

## Why use `{ECOTOXr}`?
## Why use `ECOTOXr`?

The `{ECOTOXr}` package allows you to search and extract data from the
The `ECOTOXr` package allows you to search and extract data from the
[ECOTOXicological Knowledgebase](https://cfpub.epa.gov/ecotox/) and
import it directly into `R`. This will allow you to formalize and
document the search- and extract-procedures in `R` code. This makes it
@@ -36,17 +36,17 @@ you can directly apply any statistical analysis offered in `R`.
install.packages("ECOTOXr")
```

> Get development version on github
> Get development version from r-universe
``` r
devtools::install_github('pepijn-devries/ECOTOXr')
install.packages("ECOTOXr", repos = c("https://pepijn-devries.r-universe.dev", "https://cloud.r-project.org"))
```

## Usage

### Preparing the database

Although `{ECOTOXr}` has experimental features to search the on-line
Although `ECOTOXr` has experimental features to search the on-line
database. The package will reach its full potential when you build a
copy of the database on your local machine.

@@ -84,7 +84,7 @@ from any table in the database. Furthermore, all requested output fields
are automatically joined to the result without the end-user needing to
know anything about the database structure.

> Using the prefab function `search_ecotox` packaged by `{ECOTOXr}`
> Using the prefab function `search_ecotox` packaged by `ECOTOXr`
``` r
search_ecotox(
@@ -110,14 +110,14 @@ search_ecotox(
#> # exposure_duration_mean_op <chr>, exposure_duration_mean <chr>, …
```

If you like to use [`{dplyr}`](https://dplyr.tidyverse.org/) verbs, you
are in luck. SQLite database can be approached using `{dplyr}` verbs.
This approach will only return information from the `results` table. The
If you like to use [`dplyr`](https://dplyr.tidyverse.org/) verbs, you
are in luck. SQLite database can be approached using `dplyr` verbs. This
approach will only return information from the `results` table. The
end-user will have to join other information (like test species and test
substance) manually. This does require knowledge of the database
structure.

> Using `{dplyr}` verbs
> Using `dplyr` verbs
``` r
con <- dbConnectEcotox()
@@ -138,7 +138,7 @@ dplyr::tbl(con, "results") |>
```

If you prefer working using `SQL` directly, that is fine too. The
[`{RSQLite}`](https://cran.r-project.org/package=RSQLite) package allows
[`RSQLite`](https://cran.r-project.org/package=RSQLite) package allows
you to get queries using `SQL` statements. The result is identical to
that of the previous approach. Here too the end-user needs knowledge of
the database structure in order to join additional data.
1 change: 1 addition & 0 deletions man/ECOTOXr-package.Rd

Binary file added tests/testthat/test_data/insecticides.rdata
Binary file not shown.
117 changes: 117 additions & 0 deletions tests/testthat/test_online.r
@@ -0,0 +1,117 @@
library(dplyr)

check_db <- function() {
if (!check_ecotox_availability()) {
skip("ECOTOX database not available")
}
}

test_that("Online and local search yield the same results", {
check_db()
skip_if_offline()
expect_true({
load(file.path(testthat::test_path(), "test_data", "insecticides.rdata"))
insecticides$cas <- format(as.cas(insecticides$cas), hyphenate = FALSE)
unit_conversion <-
data.frame(what = c(rep("mass", 8), rep("volume", 2)),
unit = c("pg", "ng", "ug", "mg", "g", "nmol", "umol", "mmol", "L", "m3"),
conversion = 10^c(-12, -9, -6, -3, 0, -9, -6, -3, 0, 3))

insecticedes_search <- search_ecotox(
list(
test_cas = list(terms = insecticides$cas, method = "exact"),
endpoint = list(terms = c("EC50", "LC50"), method = "contains"),
latin_name = list(terms = "Daphnia magna", method = "exact"),
effect = list(terms = c("ITX", "MOR"), method = "contains")
),
c(list_ecotox_fields(), "results.obs_duration_mean", "results.obs_duration_unit",
"results.result_id")) |>
mutate(
duration_corr = case_match(
obs_duration_unit, "d" ~ 1, "h" ~ 1/24, "mi" ~ 1/(60*24), "NR" ~ NA, "wk" ~ 7),
duration_corr = suppressWarnings(as.numeric(obs_duration_mean)*duration_corr),
test_cas = as.character(ECOTOXr::as.cas(test_cas)),
conc1_mean = suppressWarnings({as.numeric(gsub("[*]", "", conc1_mean))})
) |>
filter(duration_corr == 2 & conc1_mean != "NR" & !grepl("org", conc1_unit)) |>
left_join(insecticides |> distinct(), c(test_cas = "cas")) |>
mutate(
conc1_unit_fix = trimws(gsub("AI", "", conc1_unit)),
conc1_unit_fix =
case_match(
conc1_unit_fix,
"mM" ~ "mmol/L",
"uM" ~ "umol/L",
"nM" ~ "nmol/L",
"mg/kg" ~ "mg/L",
"ppm" ~ "mg/L",
"ppb" ~ "ug/L",
"ppt" ~ "ng/L",
.default = conc1_unit_fix),
conc1_conversion_factor = {
do.call(rbind, strsplit(conc1_unit_fix, "/")) |>
as.data.frame() |>
rename_with(~c("mass", "volume")) |>
left_join(unit_conversion, c(mass = "unit")) |>
rename(mass_conversion = "conversion") |>
mutate(
## If mass is reported as volume (1 case) use specific gravity to convert to actual mass
mass_conversion = ifelse(mass == "ul" & test_cas == "333-41-5", 1.117e-6, mass_conversion)
) |>
left_join(unit_conversion, c(volume = "unit")) |>
rename(volume_conversion = "conversion") |>
mutate(molar_conversion = ifelse(grepl("mol", mass), molweight, 1),
total_conversion = molar_conversion*mass_conversion/volume_conversion) |>
pull(total_conversion)
},
conc1_ug_l = 1e6*conc1_mean*conc1_conversion_factor
) |>
suppressWarnings() |>
suppressMessages()

websearch <- list_ecotox_web_fields(
txAdvancedChemicalEntries = paste(insecticides$cas,
collapse = "\r\n"),
RBCHEMSEARCHTYPE = "EXACT",
txAdvancedSpecEntries = "daphnia magna",
RBSPECSEARCHTYPE = "EXACT",
cbResultsGroup12a = "LC50",
cbResultsGroup13a = "EC50",
cbResultsGroup6 = "MOR",
cbResultsGroup7c = "ITX",
txExposureDurationStd = "2",
cbResult_number = "Result Number")

websearch <- suppressWarnings(websearch_ecotox(websearch))
websearch <- websearch$`Aquatic-Export` |>
dplyr::filter(!is.na(`Conc 1 Mean (Standardized)`) &
`Conc 1 Units (Standardized)` == "AI mg/L") |>
select(result_id = "Result Number",
conc1_ug_l = "Conc 1 Mean (Standardized)",
test_cas = "CAS Number") |>
mutate(test_cas = as.character(as.cas(test_cas)), ## hyphenate the CAS numbers
conc1_ug_l = 1e3*conc1_ug_l)
conc_check <-
full_join(
websearch |>
select(web_conc = "conc1_ug_l", "result_id"),
insecticedes_search |>
select(local_conc = "conc1_ug_l", "result_id"),
by = "result_id"
) |>
mutate(
diff = 1 - web_conc/local_conc,
check = diff < 1e-3
)
result <-
## Number of records differ no more than 2
(abs(nrow(websearch) - nrow(insecticedes_search)) <= 2) &&
## Retrieved websearch cas numbers are also in local cas numbers
all(websearch$test_cas %in% insecticedes_search$test_cas) &&
## Retrieved local cas numbers are also in websearch cas numbers
all(insecticedes_search$test_cas %in% websearch$test_cas) &&
## concentrations are identical
all(na.omit(conc_check$check))
result
})
})
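
The final step of the test above joins web and local results on `result_id` and checks that the concentrations agree within a relative tolerance. A minimal base-R sketch of that comparison follows. The helper `compare_concentrations()`, the `conc` column name and the toy data are hypothetical and not part of `ECOTOXr`; taking `abs()` of the relative difference is also an assumption, since the committed test compares the signed difference.

```r
# Hypothetical helper (not part of ECOTOXr): compare concentrations from two
# sources that share a record identifier, using a relative tolerance.
compare_concentrations <- function(web, local, tolerance = 1e-3) {
  # Full outer join on result_id, analogous to dplyr::full_join() in the test
  joined <- merge(web, local, by = "result_id", all = TRUE,
                  suffixes = c("_web", "_local"))
  # Relative difference; NA when a record is missing from either source
  joined$rel_diff <- 1 - joined$conc_web / joined$conc_local
  # Assumption: use the absolute value so large negative differences also fail
  joined$check <- abs(joined$rel_diff) < tolerance
  joined
}

# Toy data: records 1 and 2 appear in both sources, 3 and 4 in only one
web   <- data.frame(result_id = c(1, 2, 3), conc = c(10, 20.001, 5))
local <- data.frame(result_id = c(1, 2, 4), conc = c(10, 20, 7))

res <- compare_concentrations(web, local)
# Records present in only one source yield NA and are dropped before all()
all(na.omit(res$check))
```

As in the test, unmatched records do not fail the concentration check; they are covered there by the separate row-count and CAS-number conditions.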
