Data is often published with shorthand and symbols, and these markers
are regularly found in the same container (e.g. a spreadsheet or table
cell) as the numeric value. The aim of {shrthnd} is to process character
vectors of numerical data that also contain non-numeric shorthand and
symbols, and to ensure both pieces of information can be easily retained
and worked with.
{shrthnd} is not yet on CRAN, but binary versions can be installed
from R-universe:
install.packages(
  "shrthnd",
  repos = c("https://mattkerlogue.r-universe.dev", "https://cran.r-project.org")
)
You can install the development version of {shrthnd} from GitHub like so:
# install.packages("remotes")
remotes::install_github("mattkerlogue/shrthnd")
Use shrthnd_num() to convert a character vector to a shrthnd_num
vector. In effect, a shrthnd_num is a pair of vectors: a numeric vector
holding the values and a character vector storing the non-numeric
components of the input vector. By default a shrthnd_num will try to
behave as a numeric vector, and it can be explicitly coerced into a
numeric vector with as.numeric(). You can use shrthnd_tags(), amongst
other functions, to interact with the non-numeric (“tag”) component of
the input vector.
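To make the "pair of vectors" idea concrete, here is a minimal base R sketch. The helper `split_tagged()` and its regular expression are assumptions made for illustration only; they are not part of {shrthnd} and will not match every format that shrthnd_num() handles.

```r
# Illustrative helper, not part of {shrthnd}: split each string into a
# leading number and whatever non-numeric text is left over.
split_tagged <- function(x) {
  # Group 1: optional minus sign, digits, optional decimal part.
  # Group 3: everything after the number (the candidate tag).
  pattern <- "^[[:space:]]*(-?[0-9]+(\\.[0-9]+)?)[[:space:]]*(.*)$"
  has_num <- grepl(pattern, x)

  # Keep the numeric text where there is one, otherwise record a missing value.
  num <- ifelse(has_num, sub(pattern, "\\1", x), NA_character_)
  # With no leading number, the whole (trimmed) string is treated as the tag.
  tag <- ifelse(has_num, sub(pattern, "\\3", x), trimws(x))
  # An empty remainder means the value carried no tag.
  tag[!is.na(tag) & tag == ""] <- NA_character_

  list(value = as.numeric(num), tag = tag)
}

parts <- split_tagged(c("12", "56.78 [e]", "[c]", "321.09*", NA))
parts$value
parts$tag
```

The two elements of the result stay position-aligned, so the tag at index `i` always describes the value at index `i`.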
{shrthnd} also provides for the annotation of data frames,
specifically of the tibble::tibble() flavour.
Full usage details are available on the {shrthnd} documentation
website.
library(shrthnd)
x <- c("12", "34.567", "[c]", "NA", "56.78 [e]", "78.9", "90.123[e]",
       "321.09*", "987.564 \u2021", ".", "..")
sh_x <- shrthnd_num(x)
sh_x
#> <shrthnd_num[11]>
#> [1] 12.00 34.57 NA [c] NA 56.78 [e] 78.90
#> [7] 90.12 [e] 321.09 * 987.56 ‡ NA . NA ..
shrthnd_list(sh_x)
#> <shrthnd_list[6]>
#> [c] (1 location): 3
#> [e] (2 locations): 5, 7
#> * (1 location): 8
#> ‡ (1 location): 9
#> . (1 location): 10
#> .. (1 location): 11
tbl <- tibble::tibble(
  x = x,
  sh_x = sh_x,
  as_num = as.numeric(sh_x),
  as_char = as.character(sh_x),
  tag = shrthnd_tags(sh_x),
  as_shrthnd = as_shrthnd(sh_x),
  as_shrthnd2 = as_shrthnd(sh_x, digits = 3)
)
tbl
#> # A tibble: 11 × 7
#>    x         sh_x      as_num as_char tag   as_shrthnd as_shrthnd2
#>    <chr>     <sh_dbl>   <dbl> <chr>   <chr> <chr>      <chr>
#>  1 12        12.00       12   12      <NA>  12.00      12.000
#>  2 34.567    34.57       34.6 34.567  <NA>  34.57      34.567
#>  3 [c]       NA [c]      NA   <NA>    [c]   NA [c]     NA [c]
#>  4 NA        NA          NA   <NA>    <NA>  NA         NA
#>  5 56.78 [e] 56.78 [e]   56.8 56.78   [e]   56.78 [e]  56.780 [e]
#>  6 78.9      78.90       78.9 78.9    <NA>  78.90      78.900
#>  7 90.123[e] 90.12 [e]   90.1 90.123  [e]   90.12 [e]  90.123 [e]
#>  8 321.09*   321.09 *   321.  321.09  *     321.09 *   321.090 *
#>  9 987.564 ‡ 987.56 ‡   988.  987.564 ‡     987.56 ‡   987.564 ‡
#> 10 .         NA .        NA   <NA>    .     NA .       NA .
#> 11 ..        NA ..       NA   <NA>    ..    NA ..      NA ..
sh_tbl <- shrthnd_tbl(
  tbl,
  title = "Example table",
  notes = c("Note 1", "Note 2"),
  source_note = "Shrthnd documentation, 2023"
)
sh_tbl
#> # Title: Example table
#> # A tibble: 11 × 7
#>    x         sh_x      as_num as_char tag   as_shrthnd as_shrthnd2
#>    <chr>     <sh_dbl>   <dbl> <chr>   <chr> <chr>      <chr>
#>  1 12        12.00       12   12      <NA>  12.00      12.000
#>  2 34.567    34.57       34.6 34.567  <NA>  34.57      34.567
#>  3 [c]       NA [c]      NA   <NA>    [c]   NA [c]     NA [c]
#>  4 NA        NA          NA   <NA>    <NA>  NA         NA
#>  5 56.78 [e] 56.78 [e]   56.8 56.78   [e]   56.78 [e]  56.780 [e]
#>  6 78.9      78.90       78.9 78.9    <NA>  78.90      78.900
#>  7 90.123[e] 90.12 [e]   90.1 90.123  [e]   90.12 [e]  90.123 [e]
#>  8 321.09*   321.09 *   321.  321.09  *     321.09 *   321.090 *
#>  9 987.564 ‡ 987.56 ‡   988.  987.564 ‡     987.56 ‡   987.564 ‡
#> 10 .         NA .        NA   <NA>    .     NA .       NA .
#> 11 ..        NA ..       NA   <NA>    ..    NA ..      NA ..
#> # ☰ Source: Shrthnd documentation, 2023
#> # ☰ There are 2 notes, use `annotations(x)` to view
annotations(sh_tbl)
#> ── Notes for `sh_tbl` ──────────────────────────────────────────────────────────
#> Title: Example table
#> Source: Shrthnd documentation, 2023
#> Notes:
#> • Note 1
#> • Note 2
Datasets, especially statistical data published by governments, international institutions and academia, often come with symbols and markers that provide further detail about the values: that a value is estimated, why a value is missing, or that a value has a given statistical significance level.
The most common approach to processing data that contains both numeric
and non-numeric components is to scrub the non-numeric content so that
the input can be coerced into a numeric vector. However, this
non-numeric content (“tags”) often conveys information that is worth
retaining. If you later want to access it, you may need to re-import
your dataset or change your processing. This creates opportunity for
error and, critically, for de-linking the numeric and non-numeric
components. The shrthnd_num() data type builds on vctrs::new_rcrd() to
separate, but keep linked, these numeric and non-numeric components of a
vector.
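As a rough illustration of how a record type keeps the two components linked, the sketch below builds a small record vector directly with {vctrs}. The class name `tagged_num` and the field names `num` and `tag` are hypothetical choices for this example; they are not {shrthnd}'s own constructor or internal layout.

```r
library(vctrs)

# Hypothetical class and field names, chosen for illustration only.
x <- new_rcrd(
  fields = list(
    num = c(12, 56.78, NA, 321.09),   # numeric component
    tag = c(NA, "[e]", "[c]", "*")    # non-numeric component
  ),
  class = "tagged_num"
)

# A record vector has one element per record, not one per field.
length(x)

# Subsetting slices every field together, so values and tags cannot drift apart.
kept <- x[2:3]
field(kept, "num")
field(kept, "tag")
```

Because both fields live inside one vector, any operation that reorders or filters the vector applies to the values and their tags at the same time, which is the property that scrubbing the tags up front gives away.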
The {shrthnd} package logo is a combination of the word “shorthand”
written in Pitman shorthand alongside an asterisk. The image was drawn
by hand with plot points then adjusted for plotting in {ggplot2}. The
“shorthand” shape is based on the representation in Arthur Reynold’s
Pitman’s English and Shorthand Dictionary, retrieved from the Internet
Archive on 2023-05-11.