- If you'd like to learn more about the benefits of using project files, working with directories in R, and having project-oriented workflows, I highly recommend reading the entire second chapter of the online book What They Forgot to Teach You About R
- The here package by Kirill Müller and Jenny Bryan
As I admitted in the Making Data Pipelines in R Talk, I introduced data validation late in my journey, and that is how I learned how important it is to ensure that my data is clean and as expected. Consequently, I am by no means an expert in data validation and am still learning about it. The following resources helped me figure out what I needed at the time, and they are also on my to-do list to revisit and read more thoroughly.
- The validate package and The Data Validation Cookbook by Mark van der Loo
- A Lightweight Data Validation Ecosystem with R, GitHub, and Slack by Emily Riederer
- The data.validator package by Appsilon
- The janitor package by Sam Firke
- Various Tidyverse Packages like:
- What is Data Validation and When Do You Do It? by James Phoenix
- Methodology for Data Validation by Zio et al.
As mentioned in the Talk, sustainability can look very different depending on the context of the pipeline. In my case, it meant not only documenting the pipeline but also making the code understandable to non-programmers through non-technical documents. Below are links to versions of documents I have used myself, along with further reading that may be helpful when thinking about the sustainability of your pipeline in R.
- The dataReporter package (formerly known as dataMaid) by Claus Ekstrøm and Anne Petersen - Useful for generating codebooks and reports on your data.
- The flowr package by Sahil Seth - Useful for experimenting with visualizing workflows in R.
- Codebook Template - Useful for thinking about what to put into a codebook. This is a Word document. If you'd like, you could recreate it in R Markdown or other programs.
- Workflow Reference - Useful for inspiration about visualizing and describing workflows in your pipeline. This was originally created in Canva, but any visualization software, or even Microsoft PowerPoint, would suffice.
- Data Map Template - An editable template for visualizing datasets at a more granular level, similar to SQL schema visualizations. It can be created in any visualization software; it was created here in PowerPoint for ease of sharing.