---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# shrthnd <img src="man/figures/shrthnd_hex.png" align="right" alt="shrthnd package logo" width="120" />
<!-- badges: start -->
[![R-CMD-check](https://github.com/mattkerlogue/shrthnd/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/mattkerlogue/shrthnd/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
Data is often published with shorthand and symbols, and these tags regularly
share a container (e.g. a spreadsheet or table cell) with the numeric value.
The aim of `{shrthnd}` is to process character vectors of numerical data that
also contain non-numeric shorthand and symbols, and to ensure both pieces of
information can be easily retained and worked with.
## Installation
`{shrthnd}` is not yet on CRAN, but binary versions can be installed from
[R-universe](https://mattkerlogue.r-universe.dev/shrthnd):
```r
install.packages(
  "shrthnd",
  repos = c("https://mattkerlogue.r-universe.dev", "https://cran.r-project.org")
)
```
You can install the development version of `{shrthnd}` from GitHub:
```r
# install.packages("remotes")
remotes::install_github("mattkerlogue/shrthnd")
```
## Usage
Use `shrthnd_num()` to convert a character vector to a `shrthnd_num` vector.
In effect a `shrthnd_num` vector is a pair of vectors: a numeric vector, and a
character vector that stores the non-numeric components of the input. By
default a `shrthnd_num` vector will try to behave as a numeric vector, and it
can be explicitly coerced into a numeric vector with `as.numeric()`. You can
use `shrthnd_tags()`, amongst other functions, to interact with the
non-numeric ("tag") component. `{shrthnd}` also provides for the annotation of
data frames, specifically of the `tibble::tibble()` flavour.

Full usage details are available on the `{shrthnd}`
[documentation website](https://mattkerlogue.github.io/shrthnd/).
```{r}
library(shrthnd)

x <- c("12", "34.567", "[c]", "NA", "56.78 [e]", "78.9", "90.123[e]",
       "321.09*", "987.564 \u2021", ".", "..")

sh_x <- shrthnd_num(x)
sh_x

shrthnd_list(sh_x)

tbl <- tibble::tibble(
  x = x,
  sh_x = sh_x,
  as_num = as.numeric(sh_x),
  as_char = as.character(sh_x),
  tag = shrthnd_tags(sh_x),
  as_shrthnd = as_shrthnd(sh_x),
  as_shrthnd2 = as_shrthnd(sh_x, digits = 3)
)

tbl

sh_tbl <- shrthnd_tbl(
  tbl,
  title = "Example table",
  notes = c("Note 1", "Note 2"),
  source_note = "Shrthnd documentation, 2023"
)

sh_tbl

annotations(sh_tbl)
```
## Philosophy
Datasets, especially statistical data published by governments, international
institutions and academia, often come with symbols and markers that provide
further details about the values: that a value is estimated, the reason why a
value is missing, or that a value has a given statistical significance level.

The most common approach to processing data that contains both numeric and
non-numeric components is to scrub the non-numeric content, so that the input
can be coerced into a numeric vector. However, this non-numeric content
("tags") often conveys information worth retaining. If you later want to
access it, you may need to re-import your dataset or change your processing.
This creates opportunities for error and, critically, risks de-linking the
numeric and non-numeric components. The `shrthnd_num()` data type builds on
[`vctrs::new_rcrd()`](https://vctrs.r-lib.org/reference/new_rcrd.html) to
separate, but keep linked, these numeric and non-numeric components of a
vector.
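
To make the contrast concrete, here is a minimal base R sketch of the two
approaches. It is an illustration only, not how `{shrthnd}` is implemented:
the input vector, the regular expression and the variable names (`scrubbed`,
`num_pattern`, `linked`) are all assumptions made for this example, and the
pattern only handles simple decimal numbers.

```r
# Hypothetical input mixing numbers with shorthand markers
x <- c("12", "34.567", "[c]", "56.78 [e]", "321.09*")

# Approach 1: scrub everything that cannot be part of a number, then coerce.
# The markers are discarded, so "[c]" becomes an NA with no explanation.
scrubbed <- as.numeric(gsub("[^0-9.-]", "", x))

# Approach 2: split each element into a numeric part and a tag part.
# Illustrative pattern (an assumption): optional sign, digits, optional decimals.
num_pattern <- "^[[:space:]]*-?[0-9]*\\.?[0-9]+"
m <- regexpr(num_pattern, x)

# Fill in the numeric part only where a leading number was found
num_chr <- rep(NA_character_, length(x))
num_chr[m != -1] <- regmatches(x, m)

# Whatever remains after removing the number is treated as the tag
tag <- trimws(sub(num_pattern, "", x))
tag[tag == ""] <- NA_character_

# Keeping both vectors in one structure keeps them linked by position
linked <- data.frame(
  num = as.numeric(num_chr),
  tag = tag,
  stringsAsFactors = FALSE
)

scrubbed
linked
```

In the second approach the missing value in position three still carries its
`[c]` marker, which is the information the scrubbing approach throws away.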
## Logo
The `{shrthnd}` package logo is a combination of the word "shorthand" written
in [Pitman shorthand](https://en.wikipedia.org/wiki/Pitman_shorthand) alongside
an asterisk. The image was drawn by hand with plot points then adjusted for
plotting in `{ggplot2}`. The "shorthand" shape is based on the representation
in Arthur Reynold's *Pitman's English and Shorthand Dictionary*, retrieved from the
[Internet Archive on 2023-05-11](https://archive.org/details/in.ernet.dli.2015.449114/page/n641/mode/1up).