Bahadzie/issue7 (#8)
* Handles text containing leading or trailing whitespace.
Fixes #7

* Undo unnecessary changes in English word mappings
bahadzie authored Apr 17, 2024
1 parent 6645e34 commit fc32825
Showing 4 changed files with 37 additions and 21 deletions.
6 changes: 4 additions & 2 deletions R/numberize.R
@@ -50,7 +50,8 @@ digits_from <- function(text, lang = "en") {
 )

 # clean and prep
-text <- tolower(text)
+text <- tolower(text) # converts to string as a side effect
+text <- trimws(text)
 text <- gsub("\\sand|-|,|\\bet\\b|\\sy\\s", " ", text) # all lang

 if (lang == "es") {
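
The cleaning step in this hunk can be tried in isolation. The sketch below is a minimal illustration in base R, assuming a hypothetical helper named `clean_text()` that copies the three cleaning lines shown above; it is not part of the package's API, and the input string is made up for the example.

```r
# Hypothetical helper: copies the cleaning lines from digits_from() so they
# can be run on their own. The name clean_text() is an assumption.
clean_text <- function(text) {
  text <- tolower(text) # also coerces non-character input to character
  text <- trimws(text)  # strips leading/trailing spaces, tabs and newlines
  gsub("\\sand|-|,|\\bet\\b|\\sy\\s", " ", text) # separators become spaces
}

# Illustrative input with leading spaces and a trailing newline.
cleaned <- clean_text("  Nine hundred and ninety-nine\n ")

# Split on runs of whitespace to inspect the remaining number words.
tokens <- strsplit(cleaned, "\\s+")[[1]]
print(tokens)
```

Because `trimws()` runs before the split, the first token is a number word rather than an empty string, which appears to be the situation the whitespace fix targets.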
@@ -123,7 +124,7 @@ number_from <- function(digits) {
if (is.na(text)) {
return(NA)
}

# convert to numeric. Numeric values will pass and non numeric values will be
# coerced to NA and converted into numbers.
tmp_text <- suppressWarnings(as.numeric(text))
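
The coercion described in the comment above can be checked directly in base R. This is a small sketch on an illustrative vector (the values are assumptions, not package test data): strings that already look numeric convert cleanly, and everything else becomes `NA` with the coercion warning suppressed.

```r
# Illustrative input only; not taken from the package.
text <- c("17", "dix", "3.5")

# Numeric-looking strings convert; other strings become NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
tmp_text <- suppressWarnings(as.numeric(text))

# TRUE where the input was already a number written in digits.
already_numeric <- !is.na(tmp_text)

print(tmp_text)
print(already_numeric)
```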
@@ -145,6 +146,7 @@ number_from <- function(digits) {
 #'
 #' @param text Vector containing spelled numbers in a supported language.
 #' @param lang The text's language. Currently one of `"en" | "fr" | "es"`.
+#' Default is "en"
 #'
 #' @return A vector of numeric values.
 #'
33 changes: 19 additions & 14 deletions README.md
@@ -1,8 +1,12 @@

 <!-- README.md is generated from README.Rmd. Please edit that file. -->
+
 <!-- The code to render this README is stored in .github/workflows/render-readme.yaml -->
+
 <!-- Variables marked with double curly braces will be transformed beforehand: -->
+
 <!-- `packagename` is extracted from the DESCRIPTION file -->
+
 <!-- `gh_repo` is extracted via a special environment variable in GitHub Actions -->

 # *numberize* <img src="man/figures/logo.svg" align="right" width="120" />
@@ -21,9 +25,10 @@ status](https://www.r-pkg.org/badges/version/numberize)](https://CRAN.R-project.
 <!-- badges: end -->

 *numberize* is an R package to convert numbers written as English,
-French or Spanish words from `"zero"` to
-`"nine hundred and ninety nine trillion, nine hundred and ninety nine billion, nine hundred and ninety nine million, nine hundred and ninety nine thousand, nine hundred and ninety nine"`
-from a character string to a numeric value.
+French or Spanish words from `"zero"` to `"nine hundred and ninety nine
+trillion, nine hundred and ninety nine billion, nine hundred and ninety
+nine million, nine hundred and ninety nine thousand, nine hundred and
+ninety nine"` from a character string to a numeric value.

 <!-- This sentence is optional and can be removed -->

@@ -79,17 +84,17 @@ numberize(

 ## Related packages and Limitations

-- [`{numberwang}`](https://github.com/coolbutuseless/numberwang)
-converts numbers to words and vice versa. Limitation: English only,
-not on CRAN.
-- [`{nombre}`](https://cran.r-project.org/web/packages/nombre/index.html)
-converts numerics into words. Limitation: English only, no word to
-number conversion.
-- [`{english}`](https://cran.r-project.org/web/packages/english/index.html)
-converts numerics into words. Limitation: English only, no word to
-number conversion.
-- [`{spanish}`](https://cran.r-project.org/web/packages/spanish/index.html)
-converts numbers to words and vice versa. Limitation: Spanish only.
+- [`{numberwang}`](https://github.com/coolbutuseless/numberwang)
+converts numbers to words and vice versa. Limitation: English only,
+not on CRAN.
+- [`{nombre}`](https://cran.r-project.org/web/packages/nombre/index.html)
+converts numerics into words. Limitation: English only, no word to
+number conversion.
+- [`{english}`](https://cran.r-project.org/web/packages/english/index.html)
+converts numerics into words. Limitation: English only, no word to
+number conversion.
+- [`{spanish}`](https://cran.r-project.org/web/packages/spanish/index.html)
+converts numbers to words and vice versa. Limitation: Spanish only.

 *numberize* is released as a standalone package in the hope that it will
 be useful to the R community at large. *numberize* was created in
3 changes: 2 additions & 1 deletion man/numberize.Rd

Generated file; its diff is not rendered.

16 changes: 12 additions & 4 deletions tests/testthat/test-numberize.R
@@ -64,17 +64,17 @@ test_df <- data.frame(
 )
 )

-test_that("translating English numbers works", {
+test_that("translating vector of English numbers works", {
 res <- numberize(test_df[["en"]])
 expect_identical(res, test_df[["num"]])
 })

-test_that("translating French numbers works", {
+test_that("translating vector of French numbers works", {
 res <- numberize(test_df[["fr"]], lang = "fr")
 expect_identical(res, test_df[["num"]])
 })

-test_that("translating Spanish numbers works", {
+test_that("translating vector of Spanish numbers works", {
 res <- numberize(test_df[["es"]], lang = "es")
 expect_identical(res, test_df[["num"]])
 })
@@ -84,15 +84,23 @@ test_that("translating single french text works", {
 expect_identical(res, 1515)
 })

-test_that("non digit word returns NA", {
+
+test_that("text with non digit word returns NA", {
 res <- numberize("epiverse", lang = "en")
 expect_true(is.na(res))
 })

+# NB: this vector is coerced into character by R
 test_that("vector with number and words and NA is properly handled", {
 res <- numberize(
 c(17, "dix", "soixante-cinq", "deux mille vingt-quatre", NA),
 lang = "fr"
 )
 expect_identical(res, c(17, 10, 65, 2024, NA))
 })
+
+test_that("text with leading and trailing whitespace works", {
+res <- numberize(" mille cinq cent quinze
+", lang = "fr")
+expect_identical(res, 1515)
+})
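
The `NB` comment on the mixed-vector test refers to base R's coercion rules for `c()`. The sketch below, which uses only base R and a shortened version of that vector, shows that combining a number, strings, and `NA` yields a character vector in which the number becomes a string and the missing value stays missing.

```r
# Shortened version of the vector used in the mixed-input test.
mixed <- c(17, "dix", "soixante-cinq", NA)

# c() promotes every element to the most general type present: character.
print(class(mixed))

# The number is now the string "17"; NA is still a missing value.
print(mixed[1])
print(is.na(mixed))
```

This is why the function receives `"17"` rather than `17` for the first element of that test input.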
