Move Data Package compatibility to vignettes #246

Merged
merged 41 commits into from
Aug 27, 2024
Changes from 38 commits
Commits
1d09d94
Ignore vignette extras
peterdesmet Jul 9, 2024
c9b9bf5
Copy paste read_resource info to vignettes
peterdesmet Jul 9, 2024
c64d19d
Describe all Data Resource properties
peterdesmet Jul 9, 2024
364c9c0
Update data-resource and table-dialect
peterdesmet Jul 10, 2024
a2df6e6
Merge branch 'main' into vignettes
peterdesmet Jul 10, 2024
1fc0edc
Update table-schema.Rmd
peterdesmet Jul 10, 2024
5470999
Add alt to logo to avoid pkgdown 2.1.0 warning
peterdesmet Jul 10, 2024
fffc5b3
a bit more concise
sannegovaert Jul 10, 2024
46e4043
more clear
sannegovaert Jul 10, 2024
4b5b2b7
small update
sannegovaert Jul 10, 2024
0923df9
3rd group Mutating added
sannegovaert Jul 10, 2024
41f1d7d
more concise
sannegovaert Jul 10, 2024
250c838
update links
sannegovaert Jul 10, 2024
08d7f79
use present tense
sannegovaert Jul 10, 2024
d546e47
use present tense
sannegovaert Jul 10, 2024
a6ac11f
Merge branch 'main' into vignettes
peterdesmet Jul 15, 2024
7ebee57
Add URL, remove author properties
peterdesmet Jul 15, 2024
88f042e
Update Data Resource and Table Dialect
peterdesmet Jul 15, 2024
d438907
Rework data-resource
peterdesmet Aug 22, 2024
90c628c
Update data-resource and table-dialect
peterdesmet Aug 22, 2024
1040c5f
Update title
peterdesmet Aug 22, 2024
8977690
Add author + update some phrasing
peterdesmet Aug 23, 2024
58aecce
Finalize table-schema vignette
peterdesmet Aug 23, 2024
c6f4631
Create data-package.Rmd
peterdesmet Aug 23, 2024
19969b4
Merge branch 'main' into vignettes
peterdesmet Aug 23, 2024
0526367
Fix todo and make use of observations_1.tsv
peterdesmet Aug 23, 2024
d3fe50f
Rephrase "support" as "implementation"
peterdesmet Aug 23, 2024
eff34a2
Finalize Data Package vignette
peterdesmet Aug 23, 2024
cff1cdc
Simplify titles and use custom navbar
peterdesmet Aug 23, 2024
8189dc7
Avoid "will"
peterdesmet Aug 23, 2024
aaeb785
Rather than custom navbar, group articles
peterdesmet Aug 23, 2024
5491f26
Link to articles from README
peterdesmet Aug 23, 2024
adfdf39
Avoid use of "pkg" + link to package as {pkg}
peterdesmet Aug 23, 2024
5d0b565
Clarify v1 vs v2 support
peterdesmet Aug 23, 2024
aeb670c
Link functions to vignettes instead of verbosely describing
peterdesmet Aug 23, 2024
e38d56e
Use [function()]
peterdesmet Aug 23, 2024
6f1c61b
Indicate warn
peterdesmet Aug 23, 2024
fe30d7d
Describe change
peterdesmet Aug 23, 2024
e1ba4ec
replace with working link
sannegovaert Aug 26, 2024
55511cc
Review suggestions
peterdesmet Aug 26, 2024
5e46bed
Remove link to issues
peterdesmet Aug 27, 2024
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,8 @@
# produced vignettes
vignettes/*.html
vignettes/*.pdf
vignettes/*.R
inst/doc

# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth
Expand Down
3 changes: 2 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,8 @@
* `add_resource()` now allows to replace an existing resource (#227).
* `read_resource()` now returns error if both `path` and `data` are provided (#143).
* `write_package()` no longer writes to `"."` by default, since this is not allowed by CRAN policies. The user needs to explicitly define a directory (#205).
* `write_package()` now writes incoming `null` values back to `NULL` in `datapackage.json`, rather than empty an empty lists. Properties that are assigned `NA` and `NULL` by the user, remain being written as `null` and removed respectively (#203).
* `write_package()` now writes incoming `null` values back to `NULL` in `datapackage.json`, rather than empty lists. Properties that are assigned `NA` and `NULL` by the user, remain being written as `null` and removed respectively (#203).
* New vignettes `vignette("data-package")`, `vignette("data-resource")`, `vignette("table-dialect")` and `vignette("table-schema")` describe how frictionless implements the Data Package standard. The (verbose) function documentation of `read_resource()` and `create_schema()` has been moved to these vignettes, improving readability and maintenance (#208, #246).
* The included dataset `example_package` is removed in favour of the function `example_package()`. This function allows to reproducibly provide a _local Data Package_, while before it needed to be a remote package. The `observations` resource was also changed from a remote to a local resource - allowing the entire example Data Package to be read locally - and from CSV to TSV - allowing to test for dialect. Examples and tests were updated (#114, #253).

## Changes for developers
Expand Down
18 changes: 10 additions & 8 deletions R/add_resource.R
Original file line number Diff line number Diff line change
@@ -1,19 +1,22 @@
#' Add a Data Resource
#'
#' Adds a [Data Resource](https://specs.frictionlessdata.io/data-resource/) to a
#' Data Package.
#' Adds a Data Resource to a Data Package.
#' The resource will be a [Tabular Data Resource](
#' https://specs.frictionlessdata.io/tabular-data-resource/).
#' The resource name can only contain lowercase alphanumeric characters plus
#' `.`, `-` and `_`.
#'
#' See `vignette("data-resource")` (and to a lesser extent
#' `vignette("table-dialect")`) to learn how this function implements the
#' Data Package standard.
#'
#' @inheritParams read_resource
#' @param data Data to attach, either a data frame or path(s) to CSV file(s):
#' - Data frame: attached to the resource as `data` and written to a CSV file
#' when using [write_package()].
#' - One or more paths to CSV file(s) as a character (vector): added to the
#' resource as `path`.
#' The **last file will be read** with [readr::read_delim()] to create or
#' The last file will be read with [readr::read_delim()] to create or
#' compare with `schema` and to set `format`, `mediatype` and `encoding`.
#' The other files are ignored, but are expected to have the same structure
#' and properties.
Expand All @@ -24,11 +27,10 @@
#' resource with the same name.
#' @param delim Single character used to separate the fields in the CSV file(s),
#' e.g. `\t` for tab delimited file.
#' Will be set as `delimiter` in the resource [CSV
#' dialect](https://specs.frictionlessdata.io/csv-dialect/#specification), so
#' read functions know how to read the file(s).
#' @param ... Additional [metadata
#' properties](https://specs.frictionlessdata.io/data-resource/#metadata-properties)
#' Will be set as `delimiter` in the resource Table Dialect, so read functions
#' know how to read the file(s).
#' @param ... Additional [metadata properties](
#' https://docs.ropensci.org/frictionless/articles/data-resource.html#properties-implementation)
#' to add to the resource, e.g. `title = "My title", validated = FALSE`.
#' These are not verified against specifications and are ignored by
#' [read_resource()].
Expand Down
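As a minimal sketch of the `add_resource()` behaviour documented above, assuming the frictionless package is installed: a data frame is attached as inline `data`, and an extra metadata property is passed through `...`. The data frame `df`, the resource name `my_data` and the `title` value are hypothetical and only serve as illustration.

```r
library(frictionless)

# Hypothetical data frame used for illustration
df <- data.frame(
  id = c(1L, 2L),
  label = c("a", "b")
)

# Start from an empty Data Package and attach the data frame as a resource.
# `title` is an additional metadata property passed via `...`.
package <- create_package()
package <- add_resource(package, "my_data", data = df, title = "My title")

# List the resource names and inspect the extra property
resource_names <- resources(package)
resource_title <- package$resources[[1]]$title

# Read the resource back as a tibble
my_data <- read_resource(package, "my_data")
```

Because the additional properties are not verified against the specifications, a misspelled property name would be stored as is rather than rejected.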
12 changes: 6 additions & 6 deletions R/create_package.R
Original file line number Diff line number Diff line change
@@ -1,18 +1,18 @@
#' Create a Data Package
#'
#' Initiates a [Data Package](https://specs.frictionlessdata.io/data-package/)
#' object, either from scratch or from an existing list.
#' Initiates a Data Package object, either from scratch or from an existing
#' list.
#' This Data Package object is a list with the following characteristics:
#' - A `datapackage` subclass.
#' - All properties of the original `descriptor`.
#' - A [`resources`](
#' https://specs.frictionlessdata.io/data-package/#required-properties)
#' property, set to an empty list if undefined.
#' - A `resources` property, set to an empty list if undefined.
#' - A `directory` property, set to `"."` for the current directory if
#' undefined.
#' It is used as the base path to access resources with [read_resource()].
#'
#' The function will run [check_package()] on the created package to make sure
#' See `vignette("data-package")` to learn how this function implements the
#' Data Package standard.
#' [check_package()] is automatically called on the created package to make sure
#' it is valid.
#'
#' @param descriptor List to be made into a Data Package object.
Expand Down
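The characteristics listed in the `create_package()` documentation can be checked with a short sketch, assuming the frictionless package is installed. The variable names are hypothetical.

```r
library(frictionless)

# Create a Data Package object from scratch (no descriptor provided)
package <- create_package()

# The object is a list with a `datapackage` subclass
has_subclass <- inherits(package, "datapackage")

# `resources` is set to an empty list when undefined
n_resources <- length(package$resources)

# `directory` is set to "." (the current directory) when undefined
base_directory <- package$directory
```

Passing an existing list as `descriptor` should keep its properties and only fill in `resources` and `directory` when they are missing.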
51 changes: 5 additions & 46 deletions R/create_schema.R
Original file line number Diff line number Diff line change
@@ -1,56 +1,15 @@
#' Create a Table Schema for a data frame
#'
#' Creates a [Table Schema](https://specs.frictionlessdata.io/table-schema/) for
#' a data frame, listing all column names and types as field names and
#' (converted) types.
#' Creates a Table Schema for a data frame, listing all column names and types
#' as field names and (converted) types.
#'
#' See `vignette("table-schema")` to learn how this function implements the
#' Data Package standard.
#'
#' @param data A data frame.
#' @return List describing a Table Schema.
#' @family create functions
#' @export
#' @section Table schema properties:
#' The Table Schema will be created from the data frame columns:
#'
#' - `name`: contains the column name.
#' - `title`: not set.
#' - `description`: not set.
#' - `type`: contains the converted column type (see further).
#' - `format`: not set and can thus be considered `default`.
#' This is also the case for dates, times and datetimes, since
#' [readr::write_csv()] used by [write_package()] will format those to ISO8601
#' which is considered the default.
#' Datetimes in local or non-UTC timezones will be converted to UTC before
#' writing.
#' - `constraints`: not set, except for factors (see further).
#' - `missingValues`: not set.
#' [write_package()] will use the default `""` for missing values.
#' - `primaryKey`: not set.
#' - `foreignKeys`: not set.
#'
#' ## Field types
#'
#' The column type will determine the field `type`, as follows:
#'
#' - `character` as
#' [string](https://specs.frictionlessdata.io/table-schema/#string).
#' - `Date` as [date](https://specs.frictionlessdata.io/table-schema/#date).
#' - `difftime` as
#' [number](https://specs.frictionlessdata.io/table-schema/#number).
#' - `factor` as
#' [string](https://specs.frictionlessdata.io/table-schema/#string) with
#' factor levels as `enum`.
#' - [hms::hms()] as
#' [time](https://specs.frictionlessdata.io/table-schema/#time).
#' - `integer` as
#' [integer](https://specs.frictionlessdata.io/table-schema/#integer).
#' - `logical` as
#' [boolean](https://specs.frictionlessdata.io/table-schema/#boolean).
#' - `numeric` as
#' [number](https://specs.frictionlessdata.io/table-schema/#number).
#' - `POSIXct`/`POSIXlt` as
#' [datetime](https://specs.frictionlessdata.io/table-schema/#datetime).
#' - Any other type as
#' [any](https://specs.frictionlessdata.io/table-schema/#any).
#' @examples
#' # Create a data frame
#' df <- data.frame(
Expand Down
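The column type conversion described in the removed "Field types" section (now moved to `vignette("table-schema")`) can be illustrated with a small sketch, assuming the frictionless package is installed. The data frame and its column names are hypothetical.

```r
library(frictionless)

# Hypothetical data frame covering several column types
df <- data.frame(
  id = c(1L, 2L),                                   # integer
  name = c("a", "b"),                               # character
  active = c(TRUE, FALSE),                          # logical
  measured = as.Date(c("2024-01-01", "2024-01-02")), # Date
  size = factor(c("small", "large"), levels = c("small", "large")),
  stringsAsFactors = FALSE
)

# Create the Table Schema from the data frame columns
schema <- create_schema(df)

# Collect the field names and (converted) field types
field_names <- vapply(schema$fields, function(field) field$name, character(1))
field_types <- vapply(schema$fields, function(field) field$type, character(1))

# Factor levels are stored as an `enum` constraint on the string field
size_enum <- schema$fields[[5]]$constraints$enum
```

Following the documented mapping, the `integer`, `character`, `logical` and `Date` columns become `integer`, `string`, `boolean` and `date` fields, while the factor becomes a `string` field with its levels as `enum`.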
9 changes: 5 additions & 4 deletions R/get_schema.R
Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
#' Get the Table Schema of a Data Resource
#'
#' Returns the [Table Schema](https://specs.frictionlessdata.io/table-schema/)
#' of a Data Resource (in a Data Package), i.e. the content of its `schema`
#' property, describing the resource's fields, data types, relationships, and
#' missing values.
#' Returns the Table Schema of a Data Resource (in a Data Package), i.e. the
#' content of its `schema` property, describing the resource's fields, data
#' types, relationships, and missing values.
#' The resource must be a [Tabular Data Resource](
#' https://specs.frictionlessdata.io/tabular-data-resource/).
#'
#' See `vignette("table-schema")` to learn more about Table Schema.
#'
#' @inheritParams read_resource
#' @return List describing a Table Schema.
#' @family accessor functions
Expand Down
3 changes: 3 additions & 0 deletions R/read_package.R
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,9 @@
#' https://specs.frictionlessdata.io/data-package/#descriptor) file that
#' describes the Data Package metadata and its Data Resources.
#'
#' See `vignette("data-package")` to learn how this function implements the
#' Data Package standard.
#'
#' @param file Path or URL to a `datapackage.json` file.
#' @return A Data Package object, see [create_package()].
#' @family read functions
Expand Down
167 changes: 6 additions & 161 deletions R/read_resource.R
Original file line number Diff line number Diff line change
@@ -1,15 +1,18 @@
#' Read data from a Data Resource into a tibble data frame
#'
#' Reads data from a [Data Resource](
#' https://specs.frictionlessdata.io/data-resource/) (in a Data Package) into a
#' tibble (a Tidyverse data frame).
#' Reads data from a Data Resource (in a Data Package) into a tibble (a
#' Tidyverse data frame).
#' The resource must be a [Tabular Data Resource](
#' https://specs.frictionlessdata.io/tabular-data-resource/).
#' The function uses [readr::read_delim()] to read CSV files, passing the
#' resource properties `path`, CSV dialect, column names, data types, etc.
#' Column names are taken from the provided Table Schema (`schema`), not from
#' the header in the CSV file(s).
#'
#' See `vignette("data-resource")`, `vignette("table-dialect")` and
#' `vignette("table-schema")` to learn how this function implements the
#' Data Package standard.
#'
#' @param package Data Package object, as returned by [read_package()] or
#' [create_package()].
#' @param resource_name Name of the Data Resource.
Expand All @@ -22,164 +25,6 @@
#' frame.
#' @family read functions
#' @export
#' @section Resource properties:
#' The [Data Resource properties](
#' https://specs.frictionlessdata.io/data-resource/) are handled as follows:
#'
#' ## Path
#'
#' [`path`](https://specs.frictionlessdata.io/data-resource/#data-location) is
#' required.
#' It can be a local path or URL, which must resolve.
#' Absolute path (`/`) and relative parent path (`../`) are forbidden to avoid
#' security vulnerabilities.
#'
#' When multiple paths are provided (`"path": [ "myfile1.csv", "myfile2.csv"]`)
#' then data are merged into a single data frame, in the order in which the
#' paths are listed.
#'
#' ## Data
#'
#' If `path` is not present, the function will attempt to read data from the
#' `data` property.
#' **`schema` will be ignored**.
#'
#' ## Name
#'
#' `name` is [required](https://specs.frictionlessdata.io/data-resource/#name).
#' It is used to find the resource with `name` = `resource_name`.
#'
#' ## Profile
#'
#' `profile` is [required](
#' https://specs.frictionlessdata.io/tabular-data-resource/#specification) to
#' have the value `tabular-data-resource`.
#'
#' ## File encoding
#'
#' `encoding` (e.g. `windows-1252`) is [required](
#' https://specs.frictionlessdata.io/data-resource/#optional-properties) if the
#' resource file(s) is not encoded as UTF-8.
#' The returned data frame will always be UTF-8.
#'
#' ## CSV Dialect
#'
#' `dialect` properties are [required](
#' https://specs.frictionlessdata.io/csv-dialect/#specification) if the resource
#' file(s) deviate from the default CSV settings (see below).
#' It can either be a JSON object or a path or URL referencing a JSON object.
#' Only deviating properties need to be specified, e.g. a tab delimited file
#' without a header row needs:
#' ```json
#' "dialect": {"delimiter": "\t", "header": "false"}
#' ```
#'
#' These are the CSV dialect properties.
#' Some are ignored by the function:
#' - `delimiter`: default `,`.
#' - `lineTerminator`: ignored, line terminator characters `LF` and `CRLF` are
#' interpreted automatically by [readr::read_delim()], while `CR` (used by
#' Classic Mac OS, final release 2001) is not supported.
#' - `doubleQuote`: default `true`.
#' - `quoteChar`: default `"`.
#' - `escapeChar`: anything but `\` is ignored and it will set `doubleQuote` to
#' `false` as these fields are mutually exclusive.
#' You can thus not escape with `\"` and `""` in the same file.
#' - `nullSequence`: ignored, use `missingValues`.
#' - `skipInitialSpace`: default `false`.
#' - `header`: default `true`.
#' - `commentChar`: not set by default.
#' - `caseSensitiveHeader`: ignored, header is not used for column names, see
#' Schema.
#' - `csvddfVersion`: ignored.
#'
#' ## File compression
#'
#' Resource file(s) with `path` ending in `.gz`, `.bz2`, `.xz`, or `.zip` are
#' automatically decompressed using default [readr::read_delim()]
#' functionality.
#' Only `.gz` files can be read directly from URL `path`s.
#' Only the extension in `path` can be used to indicate compression type,
#' the `compression` property is [ignored](
#' https://specs.frictionlessdata.io/patterns/#specification-3).
#'
#' ## Ignored resource properties
#'
#' - `title`
#' - `description`
#' - `format`
#' - `mediatype`
#' - `bytes`
#' - `hash`
#' - `sources`
#' - `licenses`
#' @section Table schema properties:
#' `schema` is required and must follow the [Table Schema](
#' https://specs.frictionlessdata.io/table-schema/) specification.
#' It can either be a JSON object or a path or URL referencing a JSON object.
#'
#' - Field `name`s are used as column headers.
#' - Field `type`s are used as column types (see further).
#' - [`missingValues`](
#' https://specs.frictionlessdata.io/table-schema/#missing-values) are used to
#' interpret as `NA`, with `""` as default.
#'
#' ## Field types
#'
#' Field `type` is used to set the column type, as follows:
#'
#' - [string](https://specs.frictionlessdata.io/table-schema/#string) as
#' `character`; or `factor` when `enum` is present.
#' `format` is ignored.
#' - [number](https://specs.frictionlessdata.io/table-schema/#number) as
#' `double`; or `factor` when `enum` is present.
#' Use `bareNumber: false` to ignore whitespace and non-numeric characters.
#' `decimalChar` (`.` by default) and `groupChar` (undefined by default) can
#' be defined, but the most occurring value will be used as a global value for
#' all number fields of that resource.
#' - [integer](https://specs.frictionlessdata.io/table-schema/#integer) as
#' `double` (not integer, to avoid issues with big numbers); or `factor` when
#' `enum` is present.
#' Use `bareNumber: false` to ignore whitespace and non-numeric characters.
#' - [boolean](https://specs.frictionlessdata.io/table-schema/#boolean) as
#' `logical`.
#' Non-default `trueValues/falseValues` are not supported.
#' - [object](https://specs.frictionlessdata.io/table-schema/#object) as
#' `character`.
#' - [array](https://specs.frictionlessdata.io/table-schema/#array) as
#' `character`.
#' - [date](https://specs.frictionlessdata.io/table-schema/#date) as `date`.
#' Supports `format`, with values `default` (ISO date), `any` (guess `ymd`)
#' and [Python/C strptime](
#' https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior)
#' patterns, such as `%a, %d %B %Y` for `Sat, 23 November 2013`.
#' `%x` is `%m/%d/%y`.
#' `%j`, `%U`, `%w` and `%W` are not supported.
#' - [time](https://specs.frictionlessdata.io/table-schema/#time) as
#' [hms::hms()].
#' Supports `format`, with values `default` (ISO time), `any` (guess `hms`)
#' and [Python/C strptime](
#' https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior)
#' patterns, such as `%I%p%M:%S.%f%z` for `8AM30:00.300+0200`.
#' - [datetime](https://specs.frictionlessdata.io/table-schema/#datetime) as
#' `POSIXct`.
#' Supports `format`, with values `default` (ISO datetime), `any`
#' (ISO datetime) and the same patterns as for `date` and `time`.
#' `%c` is not supported.
#' - [year](https://specs.frictionlessdata.io/table-schema/#year) as `date`,
#' with `01` for month and day.
#' - [yearmonth](https://specs.frictionlessdata.io/table-schema/#yearmonth) as
#' `date`, with `01` for day.
#' - [duration](https://specs.frictionlessdata.io/table-schema/#duration) as
#' `character`.
#' Can be parsed afterwards with [lubridate::duration()].
#' - [geopoint](https://specs.frictionlessdata.io/table-schema/#geopoint) as
#' `character`.
#' - [geojson](https://specs.frictionlessdata.io/table-schema/#geojson) as
#' `character`.
#' - [any](https://specs.frictionlessdata.io/table-schema/#any) as `character`.
#' - Any other value is not allowed.
#' - Type is guessed if not provided.
#' @examples
#' # Read a datapackage.json file
#' package <- read_package(
Expand Down
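The following sketch shows how the properties described above (`path`, `dialect`, `schema` field types and `missingValues`) might combine when reading a resource, assuming the frictionless package is installed. The directory `demo_package`, the file `obs.csv` and all field names are hypothetical and are written to a temporary directory only for this illustration.

```r
library(frictionless)

# Write a small semicolon-delimited file and a matching datapackage.json
# to a temporary directory (all names are hypothetical)
dir <- file.path(tempdir(), "demo_package")
dir.create(dir, showWarnings = FALSE)
writeLines(
  c("id;seen", "1;2024-01-01", "2;NA"),
  file.path(dir, "obs.csv")
)
writeLines(
  '{
    "name": "demo",
    "resources": [{
      "name": "obs",
      "path": "obs.csv",
      "profile": "tabular-data-resource",
      "dialect": {"delimiter": ";"},
      "schema": {
        "fields": [
          {"name": "id", "type": "integer"},
          {"name": "seen", "type": "date"}
        ],
        "missingValues": ["", "NA"]
      }
    }]
  }',
  file.path(dir, "datapackage.json")
)

# Read the descriptor, then read the resource using its dialect and schema
package <- read_package(file.path(dir, "datapackage.json"))
obs <- read_resource(package, "obs")
```

Per the documentation, the `integer` field is read as `double`, the `date` field as a date column, and the value `NA` listed in `missingValues` is interpreted as missing.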
4 changes: 2 additions & 2 deletions R/remove_resource.R
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
#' Remove a Data Resource
#'
#' Removes a [Data Resource](https://specs.frictionlessdata.io/data-resource/)
#' from a Data Package, i.e. it removes one of the described `resources`.
#' Removes a Data Resource from a Data Package, i.e. it removes one of the
#' described `resources`.
#'
#' @inheritParams read_resource
#' @return `package` with one fewer resource.
Expand Down
2 changes: 1 addition & 1 deletion R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ unique_sorted <- function(x) {
#' Clean list
#'
#' Removes all elements from a list that meet a criterion function, e.g.
#' `is.null(x)` for empty elements.
#' [is.null()] for empty elements.
#' Removal can be recursive to guarantee elements are removed at any level.
#' Function is copied and adapted from `rlist::list.clean()` (MIT licensed), to
#' avoid requiring full `rlist` dependency.
Expand Down
2 changes: 1 addition & 1 deletion R/write_package.R
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
#' location of file(s).
#' - Resource `path` has only URL(s): resource stays as is.
#' - Resource has inline `data` originally: resource stays as is.
#' - Resource has inline `data` as result of adding data with `add_resource()`:
#' - Resource has inline `data` as result of adding data with [add_resource()]:
#' data are written to a CSV file using [readr::write_csv()], `path` points to
#' location of file, `data` property is removed.
#' Use `compress = TRUE` to gzip those CSV files.
Expand Down
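A minimal sketch of the last case above (inline `data` added with `add_resource()`, written with `compress = TRUE`), assuming the frictionless package is installed. The data frame, the resource name `my_data` and the output directory `my_package` are hypothetical.

```r
library(frictionless)

# Hypothetical data frame attached as inline data
df <- data.frame(id = c(1L, 2L), label = c("a", "b"))
package <- add_resource(create_package(), "my_data", data = df)

# Write the package to an explicitly defined directory.
# With compress = TRUE the CSV file is gzipped.
dir <- file.path(tempdir(), "my_package")
write_package(package, dir, compress = TRUE)

# Inspect what was written
written_files <- list.files(dir)
```

The directory should then contain `datapackage.json` and a gzipped CSV file for the resource, with `path` in the written descriptor pointing to that file.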