frictionlessdata · peterdesmet · Aug 27, 2024 · Jul 9, 2024 · Jul 9, 2024 · Jul 9, 2024
diff --git a/.gitignore b/.gitignore
@@ -24,6 +24,8 @@
 # produced vignettes
 vignettes/*.html
 vignettes/*.pdf
+vignettes/*.R
+inst/doc
 
 # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
 .httr-oauth

diff --git a/NEWS.md b/NEWS.md
@@ -5,7 +5,8 @@
 * `add_resource()` now allows to replace an existing resource (#227).
 * `read_resource()` now returns error if both `path` and `data` are provided (#143).
 * `write_package()` no longer writes to `"."` by default, since this is not allowed by CRAN policies. The user needs to explicitly define a directory (#205).
-* `write_package()` now writes incoming `null` values back to `NULL` in `datapackage.json`, rather than empty an empty lists. Properties that are assigned `NA` and `NULL` by the user, remain being written as `null` and removed respectively (#203).
+* `write_package()` now writes incoming `null` values back to `NULL` in `datapackage.json`, rather than empty lists. Properties that the user assigns `NA` or `NULL` are still written as `null` or removed, respectively (#203).
+* New vignettes `vignette("data-package")`, `vignette("data-resource")`, `vignette("table-dialect")` and `vignette("table-schema")` describe how frictionless implements the Data Package standard. The (verbose) function documentation of `read_resource()` and `create_schema()` has been moved to these vignettes, improving readability and maintainability (#208, #246).
 * The included dataset `example_package` is removed in favour of the function `example_package()`. This function allows to reproducibly provide a _local Data Package_, while before it needed to be a remote package. The `observations` resource was also changed from a remote to a local resource - allowing the entire example Data Package to be read locally - and from CSV to TSV - allowing to test for dialect. Examples and tests were updated (#114, #253).
 
 ## Changes for developers

diff --git a/R/add_resource.R b/R/add_resource.R
@@ -1,19 +1,22 @@
 #' Add a Data Resource
 #'
-#' Adds a [Data Resource](https://specs.frictionlessdata.io/data-resource/) to a
-#' Data Package.
+#' Adds a Data Resource to a Data Package.
 #' The resource will be a [Tabular Data Resource](
 #' https://specs.frictionlessdata.io/tabular-data-resource/).
 #' The resource name can only contain lowercase alphanumeric characters plus
 #' `.`, `-` and `_`.
 #'
+#' See `vignette("data-resource")` (and to a lesser extent
+#' `vignette("table-dialect")`) to learn how this function implements the
+#' Data Package standard.
+#'
 #' @inheritParams read_resource
 #' @param data Data to attach, either a data frame or path(s) to CSV file(s):
 #'   - Data frame: attached to the resource as `data` and written to a CSV file
 #'     when using [write_package()].
 #'   - One or more paths to CSV file(s) as a character (vector): added to the
 #'     resource as `path`.
-#'     The **last file will be read** with [readr::read_delim()] to create or
+#'     The last file will be read with [readr::read_delim()] to create or
 #'     compare with `schema` and to set `format`, `mediatype` and `encoding`.
 #'     The other files are ignored, but are expected to have the same structure
 #'     and properties.
@@ -24,11 +27,10 @@
 #'   resource with the same name.
 #' @param delim Single character used to separate the fields in the CSV file(s),
 #'   e.g. `\t` for tab delimited file.
-#'   Will be set as `delimiter` in the resource [CSV
-#'   dialect](https://specs.frictionlessdata.io/csv-dialect/#specification), so
-#'   read functions know how to read the file(s).
-#' @param ... Additional [metadata
-#'   properties](https://specs.frictionlessdata.io/data-resource/#metadata-properties)
+#'   Will be set as `delimiter` in the resource Table Dialect, so read functions
+#'   know how to read the file(s).
+#' @param ... Additional [metadata properties](
+#'   https://docs.ropensci.org/frictionless/articles/data-resource.html#properties-implementation)
 #'   to add to the resource, e.g. `title = "My title", validated = FALSE`.
 #'   These are not verified against specifications and are ignored by
 #'   [read_resource()].
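
A minimal sketch of the `delim` and `...` parameters documented above. It assumes the frictionless package is installed and that `add_resource()` behaves as this documentation describes. The file name, resource name and title are made up for illustration.

```r
library(frictionless)

# Write a small tab-delimited file to a temporary location (illustrative data)
path <- file.path(tempdir(), "observations.tsv")
write.table(
  data.frame(id = 1:2, taxon = c("Larus", "Anas")),
  file = path, sep = "\t", row.names = FALSE, quote = FALSE
)

# Add the file as a resource, stating the delimiter explicitly
package <- create_package()
package <- add_resource(
  package,
  resource_name = "observations",
  data = path,
  delim = "\t",
  title = "My title" # extra metadata property passed via `...`
)

# Inspect the resource as stored in the package
str(package$resources[[1]])
```

Per the parameter description, the delimiter should then be recorded in the resource's dialect, so that read functions know how to read the file.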

diff --git a/R/create_package.R b/R/create_package.R
@@ -1,18 +1,18 @@
 #' Create a Data Package
 #'
-#' Initiates a [Data Package](https://specs.frictionlessdata.io/data-package/)
-#' object, either from scratch or from an existing list.
+#' Initiates a Data Package object, either from scratch or from an existing
+#' list.
 #' This Data Package object is a list with the following characteristics:
 #' - A `datapackage` subclass.
 #' - All properties of the original `descriptor`.
-#' - A [`resources`](
-#'   https://specs.frictionlessdata.io/data-package/#required-properties)
-#'   property, set to an empty list if undefined.
+#' - A `resources` property, set to an empty list if undefined.
 #' - A `directory` property, set to `"."` for the current directory if
 #'   undefined.
 #'   It is used as the base path to access resources with [read_resource()].
 #'
-#' The function will run [check_package()] on the created package to make sure
+#' See `vignette("data-package")` to learn how this function implements the
+#' Data Package standard.
+#' [check_package()] is automatically called on the created package to make sure
 #' it is valid.
 #'
 #' @param descriptor List to be made into a Data Package object.

diff --git a/R/create_schema.R b/R/create_schema.R
@@ -1,56 +1,15 @@
 #' Create a Table Schema for a data frame
 #'
-#' Creates a [Table Schema](https://specs.frictionlessdata.io/table-schema/) for
-#' a data frame, listing all column names and types as field names and
-#' (converted) types.
+#' Creates a Table Schema for a data frame, listing all column names and types
+#' as field names and (converted) types.
+#'
+#' See `vignette("table-schema")` to learn how this function implements the
+#' Data Package standard.
 #'
 #' @param data A data frame.
 #' @return List describing a Table Schema.
 #' @family create functions
 #' @export
-#' @section Table schema properties:
-#' The Table Schema will be created from the data frame columns:
-#'
-#' - `name`: contains the column name.
-#' - `title`: not set.
-#' - `description`: not set.
-#' - `type`: contains the converted column type (see further).
-#' - `format`: not set and can thus be considered `default`.
-#'   This is also the case for dates, times and datetimes, since
-#'   [readr::write_csv()] used by [write_package()] will format those to ISO8601
-#'   which is considered the default.
-#'   Datetimes in local or non-UTC timezones will be converted to UTC before
-#'   writing.
-#' - `constraints`: not set, except for factors (see further).
-#' - `missingValues`: not set.
-#'   [write_package()] will use the default `""` for missing values.
-#' - `primaryKey`: not set.
-#' - `foreignKeys`: not set.
-#'
-#' ## Field types
-#'
-#' The column type will determine the field `type`, as follows:
-#'
-#' - `character` as
-#'   [string](https://specs.frictionlessdata.io/table-schema/#string).
-#' - `Date` as [date](https://specs.frictionlessdata.io/table-schema/#date).
-#' - `difftime` as
-#'   [number](https://specs.frictionlessdata.io/table-schema/#number).
-#' - `factor` as
-#'   [string](https://specs.frictionlessdata.io/table-schema/#string) with
-#'   factor levels as `enum`.
-#' - [hms::hms()] as
-#'   [time](https://specs.frictionlessdata.io/table-schema/#time).
-#' - `integer` as
-#'   [integer](https://specs.frictionlessdata.io/table-schema/#integer).
-#' - `logical` as.
-#'   [boolean](https://specs.frictionlessdata.io/table-schema/#boolean).
-#' - `numeric` as
-#'   [number](https://specs.frictionlessdata.io/table-schema/#number).
-#' - `POSIXct`/`POSIXlt` as
-#'   [datetime](https://specs.frictionlessdata.io/table-schema/#datetime).
-#' - Any other type as
-#'   [any](https://specs.frictionlessdata.io/table-schema/#any).
 #' @examples
 #' # Create a data frame
 #' df <- data.frame(
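
The column type to field type mapping that this hunk moves to `vignette("table-schema")` can be sketched in base R as follows. This is an illustration of the documented rules, not the package's implementation: `field_type()` and `sketch_schema()` are hypothetical helpers.

```r
# Hypothetical helper: map an R column to a Table Schema field type,
# following the list in the removed "Field types" section
field_type <- function(x) {
  # hms objects also inherit from difftime, so test for hms first
  if (inherits(x, "hms")) return("time")
  if (inherits(x, "difftime")) return("number")
  if (inherits(x, "Date")) return("date")
  if (inherits(x, c("POSIXct", "POSIXlt"))) return("datetime")
  # Factors are stored as integers, so test for factor before integer
  if (is.factor(x)) return("string")
  if (is.character(x)) return("string")
  if (is.integer(x)) return("integer")
  if (is.logical(x)) return("boolean")
  if (is.numeric(x)) return("number")
  "any"
}

# Hypothetical helper: build a list shaped like a Table Schema,
# setting `constraints` only for factors (levels become `enum`)
sketch_schema <- function(data) {
  fields <- lapply(names(data), function(nm) {
    col <- data[[nm]]
    field <- list(name = nm, type = field_type(col))
    if (is.factor(col)) field$constraints <- list(enum = levels(col))
    field
  })
  list(fields = fields)
}

df <- data.frame(
  id = c(1L, 2L),
  size = factor(c("small", "large")),
  seen = as.Date(c("2024-07-09", "2024-08-27"))
)
schema <- sketch_schema(df)
str(schema)
```

The order of the checks matters: a class test has to come before the storage-type test it would otherwise be shadowed by.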

diff --git a/R/get_schema.R b/R/get_schema.R
@@ -1,12 +1,13 @@
 #' Get the Table Schema of a Data Resource
 #'
-#' Returns the [Table Schema](https://specs.frictionlessdata.io/table-schema/)
-#' of a Data Resource (in a Data Package), i.e. the content of its `schema`
-#' property, describing the resource's fields, data types, relationships, and
-#' missing values.
+#' Returns the Table Schema of a Data Resource (in a Data Package), i.e. the
+#' content of its `schema` property, describing the resource's fields, data
+#' types, relationships, and missing values.
 #' The resource must be a [Tabular Data Resource](
 #' https://specs.frictionlessdata.io/tabular-data-resource/).
 #'
+#' See `vignette("table-schema")` to learn more about Table Schema.
+#'
 #' @inheritParams read_resource
 #' @return List describing a Table Schema.
 #' @family accessor functions

diff --git a/R/read_package.R b/R/read_package.R
@@ -4,6 +4,9 @@
 #' https://specs.frictionlessdata.io/data-package/#descriptor) file that
 #' describes the Data Package metadata and its Data Resources.
 #'
+#' See `vignette("data-package")` to learn how this function implements the
+#' Data Package standard.
+#'
 #' @param file Path or URL to a `datapackage.json` file.
 #' @return A Data Package object, see [create_package()].
 #' @family read functions

diff --git a/R/read_resource.R b/R/read_resource.R
@@ -1,15 +1,18 @@
 #' Read data from a Data Resource into a tibble data frame
 #'
-#' Reads data from a [Data Resource](
-#' https://specs.frictionlessdata.io/data-resource/) (in a Data Package) into a
-#' tibble (a Tidyverse data frame).
+#' Reads data from a Data Resource (in a Data Package) into a tibble (a
+#' Tidyverse data frame).
 #' The resource must be a [Tabular Data Resource](
 #' https://specs.frictionlessdata.io/tabular-data-resource/).
 #' The function uses [readr::read_delim()] to read CSV files, passing the
 #' resource properties `path`, CSV dialect, column names, data types, etc.
 #' Column names are taken from the provided Table Schema (`schema`), not from
 #' the header in the CSV file(s).
 #'
+#' See `vignette("data-resource")`, `vignette("table-dialect")` and
+#' `vignette("table-schema")` to learn how this function implements the
+#' Data Package standard.
+#'
 #' @param package Data Package object, as returned by [read_package()] or
 #'   [create_package()].
 #' @param resource_name Name of the Data Resource.
@@ -22,164 +25,6 @@
 #'   frame.
 #' @family read functions
 #' @export
-#' @section Resource properties:
-#' The [Data Resource properties](
-#' https://specs.frictionlessdata.io/data-resource/) are handled as follows:
-#'
-#' ## Path
-#'
-#' [`path`](https://specs.frictionlessdata.io/data-resource/#data-location) is
-#' required.
-#' It can be a local path or URL, which must resolve.
-#' Absolute path (`/`) and relative parent path (`../`) are forbidden to avoid
-#' security vulnerabilities.
-#'
-#' When multiple paths are provided (`"path": [ "myfile1.csv", "myfile2.csv"]`)
-#' then data are merged into a single data frame, in the order in which the
-#' paths are listed.
-#'
-#' ## Data
-#'
-#' If `path` is not present, the function will attempt to read data from the
-#' `data` property.
-#' **`schema` will be ignored**.
-#'
-#' ## Name
-#'
-#' `name` is [required](https://specs.frictionlessdata.io/data-resource/#name).
-#' It is used to find the resource with `name` = `resource_name`.
-#'
-#' ## Profile
-#'
-#' `profile` is [required](
-#' https://specs.frictionlessdata.io/tabular-data-resource/#specification) to
-#' have the value `tabular-data-resource`.
-#'
-#' ## File encoding
-#'
-#' `encoding` (e.g. `windows-1252`) is [required](
-#' https://specs.frictionlessdata.io/data-resource/#optional-properties) if the
-#' resource file(s) is not encoded as UTF-8.
-#' The returned data frame will always be UTF-8.
-#'
-#' ## CSV Dialect
-#'
-#' `dialect` properties are [required](
-#' https://specs.frictionlessdata.io/csv-dialect/#specification) if the resource
-#' file(s) deviate from the default CSV settings (see below).
-#' It can either be a JSON object or a path or URL referencing a JSON object.
-#' Only deviating properties need to be specified, e.g. a tab delimited file
-#' without a header row needs:
-#' ```json
-#' "dialect": {"delimiter": "\t", "header": "false"}
-#' ```
-#'
-#' These are the CSV dialect properties.
-#' Some are ignored by the function:
-#' - `delimiter`: default `,`.
-#' - `lineTerminator`: ignored, line terminator characters `LF` and `CRLF` are
-#'   interpreted automatically by [readr::read_delim()], while `CR` (used by
-#'   Classic Mac OS, final release 2001) is not supported.
-#' - `doubleQuote`: default `true`.
-#' - `quoteChar`: default `"`.
-#' - `escapeChar`: anything but `\` is ignored and it will set `doubleQuote` to
-#'   `false` as these fields are mutually exclusive.
-#'   You can thus not escape with `\"` and `""` in the same file.
-#' - `nullSequence`: ignored, use `missingValues`.
-#' - `skipInitialSpace`: default `false`.
-#' - `header`: default `true`.
-#' - `commentChar`: not set by default.
-#' - `caseSensitiveHeader`: ignored, header is not used for column names, see
-#'   Schema.
-#' - `csvddfVersion`: ignored.
-#'
-#' ## File compression
-#'
-#' Resource file(s) with `path` ending in `.gz`, `.bz2`, `.xz`, or `.zip` are
-#' automatically decompressed using default [readr::read_delim()]
-#' functionality.
-#' Only `.gz` files can be read directly from URL `path`s.
-#' Only the extension in `path` can be used to indicate compression type,
-#' the `compression` property is [ignored](
-#' https://specs.frictionlessdata.io/patterns/#specification-3).
-#'
-#' ## Ignored resource properties
-#'
-#' - `title`
-#' - `description`
-#' - `format`
-#' - `mediatype`
-#' - `bytes`
-#' - `hash`
-#' - `sources`
-#' - `licenses`
-#' @section Table schema properties:
-#' `schema` is required and must follow the [Table Schema](
-#' https://specs.frictionlessdata.io/table-schema/) specification.
-#' It can either be a JSON object or a path or URL referencing a JSON object.
-#'
-#' - Field `name`s are used as column headers.
-#' - Field `type`s are use as column types (see further).
-#' - [`missingValues`](
-#'   https://specs.frictionlessdata.io/table-schema/#missing-values) are used to
-#'   interpret as `NA`, with `""` as default.
-#'
-#' ## Field types
-#'
-#' Field `type` is used to set the column type, as follows:
-#'
-#' - [string](https://specs.frictionlessdata.io/table-schema/#string) as
-#'   `character`; or `factor` when `enum` is present.
-#'   `format` is ignored.
-#' - [number](https://specs.frictionlessdata.io/table-schema/#number) as
-#'   `double`; or `factor` when `enum` is present.
-#'   Use `bareNumber: false` to ignore whitespace and non-numeric characters.
-#'   `decimalChar` (`.` by default) and `groupChar` (undefined by default) can
-#'   be defined, but the most occurring value will be used as a global value for
-#'   all number fields of that resource.
-#' - [integer](https://specs.frictionlessdata.io/table-schema/#integer) as
-#'   `double` (not integer, to avoid issues with big numbers); or `factor` when
-#'   `enum` is present.
-#'   Use `bareNumber: false` to ignore whitespace and non-numeric characters.
-#' - [boolean](https://specs.frictionlessdata.io/table-schema/#boolean) as
-#'   `logical`.
-#'   Non-default `trueValues/falseValues` are not supported.
-#' - [object](https://specs.frictionlessdata.io/table-schema/#object) as
-#'   `character`.
-#' - [array](https://specs.frictionlessdata.io/table-schema/#array) as
-#'   `character`.
-#' - [date](https://specs.frictionlessdata.io/table-schema/#date) as `date`.
-#'   Supports `format`, with values `default` (ISO date), `any` (guess `ymd`)
-#'   and [Python/C strptime](
-#'   https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior)
-#'   patterns, such as `%a, %d %B %Y` for `Sat, 23 November 2013`.
-#'   `%x` is `%m/%d/%y`.
-#'   `%j`, `%U`, `%w` and `%W` are not supported.
-#' - [time](https://specs.frictionlessdata.io/table-schema/#time) as
-#'   [hms::hms()].
-#'   Supports `format`, with values `default` (ISO time), `any` (guess `hms`)
-#'   and [Python/C strptime](
-#'   https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior)
-#'   patterns, such as `%I%p%M:%S.%f%z` for `8AM30:00.300+0200`.
-#' - [datetime](https://specs.frictionlessdata.io/table-schema/#datetime) as
-#'   `POSIXct`.
-#'   Supports `format`, with values `default` (ISO datetime), `any`
-#'   (ISO datetime) and the same patterns as for `date` and `time`.
-#'   `%c` is not supported.
-#' - [year](https://specs.frictionlessdata.io/table-schema/#year) as `date`,
-#'   with `01` for month and day.
-#' - [yearmonth](https://specs.frictionlessdata.io/table-schema/#yearmonth) as
-#'   `date`, with `01` for day.
-#' - [duration](https://specs.frictionlessdata.io/table-schema/#duration) as
-#'   `character`.
-#'   Can be parsed afterwards with [lubridate::duration()].
-#' - [geopoint](https://specs.frictionlessdata.io/table-schema/#geopoint) as
-#'   `character`.
-#' - [geojson](https://specs.frictionlessdata.io/table-schema/#geojson) as
-#'   `character`.
-#' - [any](https://specs.frictionlessdata.io/table-schema/#any) as `character`.
-#' - Any other value is not allowed.
-#' - Type is guessed if not provided.
 #' @examples
 #' # Read a datapackage.json file
 #' package <- read_package(
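
The description above states that column names come from the Table Schema rather than from the file header, and that the dialect controls how the file is parsed. A minimal base R sketch of that idea, using `utils::read.table()` in place of the `readr::read_delim()` call the function actually makes; the `dialect` and `field_names` values are hypothetical stand-ins for a resource's `dialect` and `schema`:

```r
# Hypothetical dialect and schema field names for a tab-delimited resource
dialect <- list(delimiter = "\t", header = TRUE)
field_names <- c("id", "taxon")

# File content whose header row differs from the schema field names
tsv <- "ID\tSpecies\n1\tLarus\n2\tAnas\n"

df <- utils::read.table(
  text = tsv,
  sep = dialect$delimiter,
  header = FALSE,
  # Skip the header row when present: names come from the schema instead
  skip = if (isTRUE(dialect$header)) 1 else 0,
  col.names = field_names,
  stringsAsFactors = FALSE
)
names(df)
```

The header row is skipped rather than parsed, which is why a mismatch between header and schema does not affect the resulting column names.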

diff --git a/R/remove_resource.R b/R/remove_resource.R
@@ -1,7 +1,7 @@
 #' Remove a Data Resource
 #'
-#' Removes a [Data Resource](https://specs.frictionlessdata.io/data-resource/)
-#' from a Data Package, i.e. it removes one of the described `resources`.
+#' Removes a Data Resource from a Data Package, i.e. it removes one of the
+#' described `resources`.
 #'
 #' @inheritParams read_resource
 #' @return `package` with one fewer resource.

diff --git a/R/utils.R b/R/utils.R
@@ -19,7 +19,7 @@ unique_sorted <- function(x) {
 #' Clean list
 #'
 #' Removes all elements from a list that meet a criterion function, e.g.
-#' `is.null(x)` for empty elements.
+#' [is.null()] for empty elements.
 #' Removal can be recursive to guarantee elements are removed at any level.
 #' Function is copied and adapted from `rlist::list.clean()` (MIT licensed), to
 #' avoid requiring full `rlist` dependency.

diff --git a/R/write_package.R b/R/write_package.R
@@ -10,7 +10,7 @@
 #'   location of file(s).
 #' - Resource `path` has only URL(s): resource stays as is.
 #' - Resource has inline `data` originally: resource stays as is.
-#' - Resource has inline `data` as result of adding data with `add_resource()`:
+#' - Resource has inline `data` as result of adding data with [add_resource()]:
 #'   data are written to a CSV file using [readr::write_csv()], `path` points to
 #'   location of file, `data` property is removed.
 #'   Use `compress = TRUE` to gzip those CSV files.