diff --git a/inst/doc/YAML_CONFIG.R b/inst/doc/YAML_CONFIG.R index 0eadbd5..f5d5757 100644 --- a/inst/doc/YAML_CONFIG.R +++ b/inst/doc/YAML_CONFIG.R @@ -1,8 +1,7 @@ ## ---- echo = FALSE, results = 'hide'------------------------------------- library(DataPackageR) library(yaml) -yml <- DataPackageR::construct_yml_config(code = "subsetCars.Rmd", - data = "cars_over_20") +yml <- DataPackageR::construct_yml_config(code = "subsetCars.Rmd", data = "cars_over_20") ## ---- echo = FALSE, comment=""------------------------------------------- cat(yaml::as.yaml(yml)) diff --git a/inst/doc/YAML_CONFIG.html b/inst/doc/YAML_CONFIG.html index a31f227..9d61508 100644 --- a/inst/doc/YAML_CONFIG.html +++ b/inst/doc/YAML_CONFIG.html @@ -12,7 +12,7 @@ - + The DataPackageR YAML configuration file. @@ -280,7 +280,7 @@

The DataPackageR YAML configuration file.

Greg Finak gfinak@fredhutch.org

-

2018-07-31

+

2018-08-01

@@ -309,7 +309,7 @@

The datapackager.yml file.

enabled: yes objects: cars_over_20 render_root: - tmp: '138283' + tmp: '978673'

YAML config file properties.

@@ -399,7 +399,7 @@
Example
enabled: no objects: cars_over_20 render_root: - tmp: '739294' + tmp: '756425'
@@ -416,7 +416,7 @@
Example
enabled: yes objects: cars_over_20 render_root: - tmp: '739294' + tmp: '756425'
@@ -435,7 +435,7 @@
Example
enabled: yes objects: cars_over_20 render_root: - tmp: '739294' + tmp: '756425'
@@ -456,7 +456,7 @@
Example
- cars_over_20 - another_object render_root: - tmp: '739294' + tmp: '756425'
@@ -475,7 +475,7 @@
Example
- cars_over_20 - another_object render_root: - tmp: '739294' + tmp: '756425'
@@ -492,7 +492,7 @@
Example
enabled: yes objects: cars_over_20 render_root: - tmp: '739294' + tmp: '756425'
diff --git a/inst/doc/usingDataPackageR.R b/inst/doc/usingDataPackageR.R index ecf94cd..d57bc39 100644 --- a/inst/doc/usingDataPackageR.R +++ b/inst/doc/usingDataPackageR.R @@ -26,7 +26,9 @@ DataPackageR::datapackage_skeleton(name = "mtcars20", force = TRUE, code_files = processing_code, r_object_names = "cars_over_20", - path = tempdir() + path = tempdir() + #dependencies argument is empty + #raw_data_dir argument is empty. ) ## ----dirstructure,echo=FALSE--------------------------------------------- @@ -42,8 +44,7 @@ df <- data.frame(pathString = file.path( as.Node(df) ## ---- echo=FALSE--------------------------------------------------------- -cat(yaml::as.yaml(yaml::yaml.load_file( - file.path(tempdir(),"mtcars20","datapackager.yml")))) +cat(yaml::as.yaml(yaml::yaml.load_file(file.path(tempdir(),"mtcars20","datapackager.yml")))) ## ----eval=TRUE----------------------------------------------------------- # Run the preprocessing code to build cars_over_20 @@ -61,11 +62,13 @@ df <- data.frame(pathString = file.path( )) as.Node(df) +## ----rebuild_docs-------------------------------------------------------- +document(file.path(tempdir(),"mtcars20")) + ## ------------------------------------------------------------------------ # Let's use the package we just created. -install.packages(file.path(tempdir(),"mtcars20_1.0.tar.gz"), - type = "source", repos = NULL) -if (!"package:mtcars20" %in% search()) +install.packages(file.path(tempdir(),"mtcars20_1.0.tar.gz"), type = "source", repos = NULL) +if(!"package:mtcars20"%in%search()) attachNamespace('mtcars20') #use library() in your code data("cars_over_20") # load the data diff --git a/inst/doc/usingDataPackageR.Rmd b/inst/doc/usingDataPackageR.Rmd index 5c857e9..10bf2b8 100644 --- a/inst/doc/usingDataPackageR.Rmd +++ b/inst/doc/usingDataPackageR.Rmd @@ -41,7 +41,8 @@ The user needs to provide: - R or Rmd code files that do data processing. - A list of R object names created by those code files. 
- +- Optionally a path to a directory of raw data (will be copied into the package). +- Optionally a list of additional code files that may be dependencies of your R scripts. ```{r minimal_example, results='hide'} @@ -65,7 +66,9 @@ DataPackageR::datapackage_skeleton(name = "mtcars20", force = TRUE, code_files = processing_code, r_object_names = "cars_over_20", - path = tempdir() + path = tempdir() + #dependencies argument is empty + #raw_data_dir argument is empty. ) ``` @@ -120,9 +123,11 @@ Further information on the contents of the YAML configuration file, and the API Raw data (provided the size is not prohibitive) can be placed in `inst/extdata`. +The `datapackage_skeleton()` API has the `raw_data_dir` argument, which will copy the contents of `raw_data_dir` (and its subdirectories) into `inst/extdata` automatically. + In this example we are reading the `mtcars` data set that is already in memory, rather than from the file system. -#### An API to read raw data sets from within an R or Rmd processing script. +### An API to read raw data sets from within an R or Rmd processing script. As stated in the README, in order for your processing scripts to be portable, you should not use absolute paths to files. DataPackageR provides an API to point to the data package root directory and the `inst/extdata` and `data` subdirectories. @@ -139,6 +144,23 @@ Similarly: Raw data sets that are stored externally (outside the data package source tree) can be constructed relative to the `project_path()`. +### YAML header metadata for R files and Rmd files. + +If your processing scripts are Rmd files, the usual yaml header for rmarkdown documents should be present. + +If your processing scripts are R files, you can still include a yaml header, but it should be commented with `#'` and it should be at the top of your R file. 
For example, a test R file in the DataPackageR package looks as follows: + +``` +#'--- +#'title: Sample report from R script +#'author: Greg Finak +#'date: August 1, 2018 +#'--- +data <- runif(100) +``` + +This will be converted to an Rmd file with a proper yaml header, which will then be turned into a vignette and properly indexed in the built package. + ## Build the data package. @@ -150,6 +172,21 @@ Once the skeleton framework is set up, DataPackageR:::package_build(file.path(tempdir(),"mtcars20")) ``` +### Documenting your data set changes in NEWS.md + +When you build a package in interactive mode, you will be +prompted to input text describing the changes to your data package (one line). + +This text will appear in the NEWS.md file in the following format: + +``` +DataVersion: xx.yy.zz +======== +A description of your changes to the package + +[The rest of the file] +``` + ### Why not just use R CMD build? @@ -183,12 +220,22 @@ df <- data.frame(pathString = file.path( as.Node(df) ``` -#### Update the autogenerated documentation. +### Update the autogenerated documentation. After the first build, the `R` directory contains `mtcars20.R` that has autogenerated `roxygen2` markup documentation for the data package and for the packaged data `cars_over_20`. The processed `Rd` files can be found in `man`. +The autogenerated documentation source is in the `documentation.R` file in `data-raw`. + +You should update this file to properly document your objects. Then rebuild the documentation: + +```{r rebuild_docs} +document(file.path(tempdir(),"mtcars20")) +``` + +This is done without reprocessing the data. + #### Don't forget to rebuild the package. You should update the documentation in `R/mtcars20.R`, then call `package_build()` again. diff --git a/inst/doc/usingDataPackageR.html b/inst/doc/usingDataPackageR.html index 52d8354..41ce053 100644 --- a/inst/doc/usingDataPackageR.html +++ b/inst/doc/usingDataPackageR.html @@ -12,7 +12,7 @@ - + Using DataPackageR @@ -280,7 +280,7 @@

Using DataPackageR

Greg Finak gfinak@fredhutch.org

-

2018-07-31

+

2018-08-01

@@ -290,11 +290,15 @@

2018-07-31

  • What’s in the package skeleton structure?
  • A few words about the YAML config file
  • Where do I put my raw datasets?
  • +
  • An API to read raw data sets from within an R or Rmd processing script.
  • +
  • YAML header metadata for R files and Rmd files.
  • Build the data package.
  • Installing and using the new data package
  • What’s in the package skeleton structure?

    This has created a data package source tree named “mtcars20” (in a temporary directory). For a real use case you would pick a path on your filesystem where you could then initialize a new GitHub repository for the package.

    @@ -387,7 +396,7 @@

    A few words about the YAML config file

    enabled: yes objects: cars_over_20 render_root: - tmp: '791000' + tmp: '534320'

    The two main pieces of information in the configuration are a list of the files to be processed and the data sets the package will store.

    This example packages an R data set named cars_over_20 (the name was passed in to datapackage_skeleton()). It is created by the subsetCars.Rmd file.

    The objects must be listed in the yaml configuration file. datapackage_skeleton() ensures this is done for you automatically.

    @@ -397,62 +406,98 @@

    A few words about the YAML config file

    Where do I put my raw datasets?

    Raw data (provided the size is not prohibitive) can be placed in inst/extdata.

    +

    The datapackage_skeleton() API has the raw_data_dir argument, which will copy the contents of raw_data_dir (and its subdirectories) into inst/extdata automatically.

    In this example we are reading the mtcars data set that is already in memory, rather than from the file system.

    -
    -

    An API to read raw data sets from within an R or Rmd processing script.

    +
    +
    +

    An API to read raw data sets from within an R or Rmd processing script.

    As stated in the README, in order for your processing scripts to be portable, you should not use absolute paths to files. DataPackageR provides an API to point to the data package root directory and the inst/extdata and data subdirectories. These are useful for constructing portable paths in your code to read files from these locations.

    For example: to construct a path to a file named “mydata.csv” located in inst/extdata in your data package source tree:

      -
    • use DataPackageR::project_extdata_path("mydata.csv") in your R or Rmd file. This would return: e.g., /var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T//RtmpilrC9a/mtcars20/inst/extdata/mydata.csv
    • +
    • use DataPackageR::project_extdata_path("mydata.csv") in your R or Rmd file. This would return: e.g., /var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T//RtmpO9s7M4/mtcars20/inst/extdata/mydata.csv

    Similarly:

      -
    • DataPackageR::project_path() constructs a path to the data package root directory. (e.g., /var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T//RtmpilrC9a/mtcars20)
    • -
    • DataPackageR::project_data_path() constructs a path to the data package data subdirectory. (e.g., /var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T//RtmpilrC9a/mtcars20/data)
    • +
    • DataPackageR::project_path() constructs a path to the data package root directory. (e.g., /var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T//RtmpO9s7M4/mtcars20)
    • +
    • DataPackageR::project_data_path() constructs a path to the data package data subdirectory. (e.g., /var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T//RtmpO9s7M4/mtcars20/data)

    Paths to raw data sets that are stored externally (outside the data package source tree) can be constructed relative to project_path().
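
As a minimal sketch of how these path helpers might be used inside a processing script, the hypothetical helper `read_raw_csv()` below (not part of DataPackageR) wraps `project_extdata_path()`. It assumes that a file such as `mydata.csv` exists in `inst/extdata`, and that the script runs in a context where DataPackageR can resolve the package root, for example during `package_build()`.

```r
# Hypothetical helper for use in a data-raw/ processing script.
# It is not part of DataPackageR; only project_extdata_path() is.
read_raw_csv <- function(filename) {
  # Portable path to inst/extdata/<filename> in the package source tree.
  path <- DataPackageR::project_extdata_path(filename)
  utils::read.csv(path, stringsAsFactors = FALSE)
}

# Usage inside a processing script (assumes inst/extdata/mydata.csv exists):
# mydata <- read_raw_csv("mydata.csv")
```

Because the path is resolved at call time, the same script should work unchanged on any machine where the package source tree is checked out.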

    +
    +

    YAML header metadata for R files and Rmd files.

    +

    If your processing scripts are Rmd files, the usual yaml header for rmarkdown documents should be present.

    +

    If your processing scripts are R files, you can still include a yaml header, but it should be commented with #' and placed at the top of your R file. For example, a test R file in the DataPackageR package looks as follows:

    +
    #'---
    +#'title: Sample report  from R script
    +#'author: Greg Finak
    +#'date: August 1, 2018
    +#'---
    +data <- runif(100)
    +

    This will be converted to an Rmd file with a proper yaml header, which will then be turned into a vignette and properly indexed in the built package.
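
One way to preview what such a conversion might produce is `knitr::spin()`, which turns `#'`-prefixed lines into document text and the remaining lines into code chunks. Whether DataPackageR performs the conversion in exactly this way is an assumption here; the sketch only illustrates that the commented header becomes an ordinary yaml header.

```r
library(knitr)

# The example R script shown earlier, as a character vector.
script <- c(
  "#'---",
  "#'title: Sample report from R script",
  "#'author: Greg Finak",
  "#'date: August 1, 2018",
  "#'---",
  "data <- runif(100)"
)

# knit = FALSE stops after the R -> Rmd conversion; with `text` supplied,
# spin() returns the Rmd source instead of writing a file.
rmd <- knitr::spin(text = script, knit = FALSE, format = "Rmd")
cat(rmd, sep = "\n")
```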

    Build the data package.

    Once the skeleton framework is set up,

    -
    # Run the preprocessing code to build cars_over_20
    -# and reproducibly enclose it in a package.
    -DataPackageR:::package_build(file.path(tempdir(),"mtcars20"))
    -INFO [2018-07-31 11:26:17] Logging to /private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpilrC9a/mtcars20/inst/extdata/Logfiles/processing.log
    -INFO [2018-07-31 11:26:17] Processing data
    -INFO [2018-07-31 11:26:17] Reading yaml configuration
    -INFO [2018-07-31 11:26:17] Found /private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpilrC9a/mtcars20/data-raw/subsetCars.Rmd
    -INFO [2018-07-31 11:26:17] Processing 1 of 1: /private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpilrC9a/mtcars20/data-raw/subsetCars.Rmd
    -processing file: subsetCars.Rmd
    -output file: subsetCars.knit.md
    -/usr/local/bin/pandoc +RTS -K512m -RTS subsetCars.utf8.md --to html4 --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash+smart --output /private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpilrC9a/mtcars20/inst/extdata/Logfiles/subsetCars.html --email-obfuscation none --self-contained --standalone --section-divs --template /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable 'theme:bootstrap' --include-in-header /var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T//RtmpilrC9a/rmarkdown-str1228b57905547.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' 
    -
    -Output created: /private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpilrC9a/mtcars20/inst/extdata/Logfiles/subsetCars.html
    -INFO [2018-07-31 11:26:17] 1 required data objects created by subsetCars.Rmd
    -INFO [2018-07-31 11:26:17] Saving to data
    -INFO [2018-07-31 11:26:17] Copied documentation to /private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpilrC9a/mtcars20/R/mtcars20.R
    -
    -✔ Creating 'vignettes/'
    -✔ Creating 'inst/doc/'
    -INFO [2018-07-31 11:26:17] Done
    -INFO [2018-07-31 11:26:17] DataPackageR succeeded
    -INFO [2018-07-31 11:26:17] Building documentation
    -First time using roxygen2. Upgrading automatically...
    -Updating roxygen version in /private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpilrC9a/mtcars20/DESCRIPTION
    -Writing NAMESPACE
    -Loading mtcars20
    -Writing mtcars20.Rd
    -Writing cars_over_20.Rd
    -INFO [2018-07-31 11:26:17] Building package
    -'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
    -  --no-environ --no-save --no-restore --quiet CMD build  \
    -  '/private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpilrC9a/mtcars20'  \
    -  --no-resave-data --no-manual --no-build-vignettes 
    -
    -Reloading installed mtcars20
    -[1] "/private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpilrC9a/mtcars20_1.0.tar.gz"
    +
    # Run the preprocessing code to build cars_over_20
    +# and reproducibly enclose it in a package.
    +DataPackageR:::package_build(file.path(tempdir(),"mtcars20"))
    +INFO [2018-08-01 11:21:24] Logging to /private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpO9s7M4/mtcars20/inst/extdata/Logfiles/processing.log
    +INFO [2018-08-01 11:21:24] Processing data
    +INFO [2018-08-01 11:21:24] Reading yaml configuration
    +INFO [2018-08-01 11:21:24] Found /private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpO9s7M4/mtcars20/data-raw/subsetCars.Rmd
    +INFO [2018-08-01 11:21:24] Processing 1 of 1: /private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpO9s7M4/mtcars20/data-raw/subsetCars.Rmd
    +processing file: subsetCars.Rmd
    +output file: subsetCars.knit.md
    +/usr/local/bin/pandoc +RTS -K512m -RTS subsetCars.utf8.md --to html4 --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash+smart --output /private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpO9s7M4/mtcars20/inst/extdata/Logfiles/subsetCars.html --email-obfuscation none --self-contained --standalone --section-divs --template /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable 'theme:bootstrap' --include-in-header /var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T//RtmpO9s7M4/rmarkdown-str1527273c3b6f8.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' 
    +
    +Output created: /private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpO9s7M4/mtcars20/inst/extdata/Logfiles/subsetCars.html
    +INFO [2018-08-01 11:21:24] 1 required data objects created by subsetCars.Rmd
    +INFO [2018-08-01 11:21:24] NEWS.md file not found, creating!
    +Enter a text description of the changes for the NEWS file.
    +INFO [2018-08-01 11:21:30] Saving to data
    +INFO [2018-08-01 11:21:30] Copied documentation to /private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpO9s7M4/mtcars20/R/mtcars20.R
    +
    +✔ Creating 'vignettes/'
    +✔ Creating 'inst/doc/'
    +INFO [2018-08-01 11:21:30] Done
    +INFO [2018-08-01 11:21:30] DataPackageR succeeded
    +INFO [2018-08-01 11:21:30] Building documentation
    +First time using roxygen2. Upgrading automatically...
    +Updating roxygen version in /private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpO9s7M4/mtcars20/DESCRIPTION
    +Writing NAMESPACE
    +Loading mtcars20
    +Writing mtcars20.Rd
    +Writing cars_over_20.Rd
    +INFO [2018-08-01 11:21:31] Building package
    +'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
    +  --no-environ --no-save --no-restore --quiet CMD build  \
    +  '/private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpO9s7M4/mtcars20'  \
    +  --no-resave-data --no-manual --no-build-vignettes 
    +
    +Reloading installed mtcars20
    +Next Steps 
    +1. Update your package documentation. 
    +   - Edit the documentation.R file in the package source data-raw subdirectory and update the roxygen markup. 
    +   - Rebuild the package documentation with  document() . 
    +2. Add your package to source control. 
    +   - Call  git init .  in the package source root directory. 
    +   -  git add  the package files. 
    +   -  git commit  your new package. 
    +   - Set up a github repository for your pacakge. 
    +   - Add the github repository as a remote of your local package repository. 
    +   -  git push  your local repository to gitub. 
    +[1] "/private/var/folders/jh/x0h3v3pd4dd497g3gtzsm8500000gn/T/RtmpO9s7M4/mtcars20_1.0.tar.gz"
    +
    +

    Documenting your data set changes in NEWS.md

    +

    When you build a package in interactive mode, you will be prompted to input text describing the changes to your data package (one line).

    +

    This text will appear in the NEWS.md file in the following format:

    +
    DataVersion: xx.yy.zz
    +========
    +A description of your changes to the package
    +
    +[The rest of the file]
    +
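
To make this layout concrete, the sketch below prepends an entry in the same format to a NEWS.md file using base R only. The helper `prepend_news_entry()` is hypothetical and is not DataPackageR's implementation; the version string and description are placeholder values.

```r
# Hypothetical helper: prepend a "DataVersion" entry to a NEWS.md file.
prepend_news_entry <- function(news_file, version, description) {
  # Keep whatever is already in the file so it ends up below the new entry.
  old <- if (file.exists(news_file)) readLines(news_file) else character(0)
  entry <- c(
    paste0("DataVersion: ", version),
    "========",
    description,
    ""
  )
  writeLines(c(entry, old), news_file)
  invisible(news_file)
}

# Demonstration on a throwaway file in the session's temporary directory.
news_file <- file.path(tempdir(), "NEWS.md")
writeLines("[The rest of the file]", news_file)
prepend_news_entry(news_file, "0.1.0",
                   "A description of your changes to the package")
cat(readLines(news_file), sep = "\n")
```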

    Why not just use R CMD build?

    If the processing script is time consuming or the data set is particularly large, then R CMD build would run the code each time the package is installed. In such cases, raw data may not be available, or the environment to do the data processing may not be set up for each user of the data. DataPackageR decouples data processing from package building/installation for data consumers.

    @@ -490,16 +535,24 @@

    A note about the package source directory after building.

    20 ¦ ¦--cars_over_20.Rd 21 ¦ °--mtcars20.Rd 22 ¦--NAMESPACE -23 ¦--R -24 ¦ °--mtcars20.R -25 ¦--Read-and-delete-me -26 °--vignettes -27 °--subsetCars.Rmd -
    -

    Update the autogenerated documentation.

    +23 ¦--NEWS.md +24 ¦--R +25 ¦ °--mtcars20.R +26 ¦--Read-and-delete-me +27 °--vignettes +28 °--subsetCars.Rmd +
    +
    +

    Update the autogenerated documentation.

    After the first build, the R directory contains mtcars20.R that has autogenerated roxygen2 markup documentation for the data package and for the packaged data cars_over_20.

    The processed Rd files can be found in man.

    -
    +

    The autogenerated documentation source is in the documentation.R file in data-raw.

    +

    You should update this file to properly document your objects. Then rebuild the documentation:

    + +

    This is done without reprocessing the data.

    Don’t forget to rebuild the package.

    You should update the documentation in R/mtcars20.R, then call package_build() again.

    @@ -514,46 +567,46 @@

    Accessing vignettes, data sets, and data set documentation.

    When the package is installed, these will be accessible via the vignette() API.

    The vignette will detail the processing performed by the subsetCars.Rmd processing script.

    The data set documentation will be accessible via ?cars_over_20, and the data sets via data().

    - +
    @@ -563,22 +616,22 @@

    Migrating old data packages.

    @@ -607,19 +660,19 @@

    The new way

    Partial builds

    We can also perform partial builds of a subset of files in a package by toggling the enabled key in the config file.

    This can be done with the following API:

    - +

    Note that the modified configuration needs to be written back to the package source directory in order for the changes to take effect.

    The consequence of toggling a file to enabled: no is that it will be skipped when the package is rebuilt, but the data will still be retained in the package, and the documentation will not be altered.

    This is useful in situations where we have multiple data sets, and want to re-run one script to update a specific data set, but not the other scripts because they may be too time consuming, for example.
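
A minimal sketch of such a toggle follows. It assumes the `yml_find()`, `yml_disable_compile()` and `yml_write()` functions exported by DataPackageR; these names are not shown in the text above, so check them against your installed version. The wrapper `disable_script()` is a hypothetical convenience function.

```r
# Hypothetical wrapper: mark one processing script as `enabled: no`
# and write the modified configuration back to the package source.
disable_script <- function(pkg_dir, script) {
  # Read datapackager.yml from the package source directory.
  config <- DataPackageR::yml_find(pkg_dir)
  # Flip the `enabled` key for the named script.
  config <- DataPackageR::yml_disable_compile(config, filenames = script)
  # Persist the change so the next build picks it up.
  DataPackageR::yml_write(config, path = pkg_dir)
  invisible(config)
}

# Usage (assumes the mtcars20 skeleton from earlier exists):
# disable_script(file.path(tempdir(), "mtcars20"), "subsetCars.Rmd")
```

Writing the configuration back inside the wrapper keeps the "modify, then persist" steps together, which is the part that is easy to forget.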

    @@ -692,7 +745,7 @@

    DESCRIPTION

    LazyData: true ByteCompile: true DataVersion: 0.1.0 -Date: 2018-07-31 +Date: 2018-08-01 Suggests: knitr, rmarkdown