Extended R package for the 2016 Australian Census
This is the development site for an R package containing all 2016 Census data released by the ABS through its data packs. The data will be too large to host through CRAN.
I think the best way to explain my motivation for producing this package is to show you a variable name from Table G38 of the data pack:
Se_d_r_or_t_h_t_Tot_NofB_0_ib
There are two problems with the data packs:
- The variable names are arcane.
- The data is not tidy: subtotals and subvariables lurk among the variable names.
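To make the second problem concrete, here is a minimal base R sketch of what tidying involves. The table, its column names (`M_0_14`, `F_Tot`, and so on), and the counts are all made up for illustration; they are not taken from the data packs, and this is not necessarily how the package does it.

```r
# Hypothetical wide table in the style of a data pack: one row per region,
# one column per sex/age combination, plus subtotal columns (*_Tot).
wide <- data.frame(
  LGA_CODE  = c("LGA1", "LGA2"),
  M_0_14    = c(10L, 20L),
  M_15_plus = c(30L, 40L),
  M_Tot     = c(40L, 60L),
  F_0_14    = c(11L, 21L),
  F_15_plus = c(31L, 41L),
  F_Tot     = c(42L, 62L),
  stringsAsFactors = FALSE
)

# Drop the subtotal columns: they can be recomputed from the other cells.
value_cols <- setdiff(names(wide), "LGA_CODE")
value_cols <- value_cols[!grepl("_Tot$", value_cols)]

# Stack the remaining columns into one row per (region, age, sex) cell,
# splitting each column name into its two subvariables.
long <- do.call(rbind, lapply(value_cols, function(col) {
  data.frame(
    LGA_CODE = wide$LGA_CODE,
    Age      = sub("^[^_]*_", "", col),  # everything after the first underscore
    Sex      = sub("_.*$", "", col),     # everything before the first underscore
    persons  = wide[[col]],
    stringsAsFactors = FALSE
  )
}))

# Order by the key, then by the measure columns.
long <- long[order(long$LGA_CODE, long$Age, long$Sex), ]
rownames(long) <- NULL
```

The result has one key column, two measure columns, and one value column, and the subtotals can still be recovered with `aggregate(persons ~ LGA_CODE + Sex, data = long, FUN = sum)`.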
The goals of this package are:
- To tidy the data so that the tables are normalized.
- To provide readable variable names at all costs.
- To use predictable table names and structure to support autocompletion.
- Measure columns are in CamelCase, with an optional suffix for upper/lower bounds (`.min` and `.max`).
- All table names:
  - start with `[A-Z0-9]{3}`, representing the geographic extent of the key (e.g. tables starting with `LGA` are summaries of Local Government Areas, those starting with `STE` are summaries of states/territories)
  - followed by two underscores
  - followed by the names of the measure columns in CamelCase separated by underscores
  - and finish with an underscore and the value column name (unless the value column is `persons`, in which case it is omitted).
- The measure columns are in alphabetical order (except for subitems).
- The value columns are in lower snake_case. (TODO)
- Tables never contain subtotals.
- Tables are ordered by the key and then by the measure columns.
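The naming rules above can be sketched as a small helper. `table_name()` is hypothetical and not part of the package, and the measure and value names in the calls (`Age`, `Sex`, `CountryOfBirth`, `median_income`) are invented for illustration. The sketch also ignores the subitem exception to alphabetical ordering.

```r
# Hypothetical helper (not part of the package): assemble a table name from
# the geographic prefix, the measure columns, and the value column.
table_name <- function(geo, measures, value = "persons") {
  # The prefix is exactly three upper-case letters or digits.
  stopifnot(grepl("^[A-Z0-9]{3}$", geo), length(measures) >= 1L)
  # Prefix, two underscores, then the measure columns in alphabetical order.
  out <- paste0(geo, "__", paste(sort(measures), collapse = "_"))
  # The value column is appended unless it is the default, `persons`.
  if (value != "persons") {
    out <- paste0(out, "_", value)
  }
  out
}

table_name("LGA", c("Sex", "Age"))
# "LGA__Age_Sex"
table_name("STE", "CountryOfBirth", value = "median_income")
# "STE__CountryOfBirth_median_income"
```

Because every name begins with the geographic prefix and two underscores, typing `LGA__` and pressing tab in an editor with autocompletion should list every table keyed by Local Government Area.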
In addition:
- The package tarball should be under 100 MB (so that it can be uploaded to a drat repository).