A template file and folder structure for a data analysis project/paper done with R/Quarto/Github.

ahgroup/data-analysis-template

Overview

A template file and folder structure for a data analysis project/paper done with R/Quarto/GitHub. Other components (e.g., other programming languages) can be added as needed.

Pre-requisites

This is a template for a data analysis project using R, Quarto, GitHub and a reference manager that can handle BibTeX. Our recommendation for the reference manager is Zotero, with the Better BibTeX plugin/extension. It is also assumed that you have a word processor installed (e.g. MS Word or LibreOffice). You need this software stack to make use of the template. To produce PDF output, you also need a TeX distribution installed. You can use tinytex, following these instructions.
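As a quick way to see which command-line parts of this stack are visible to R, the following minimal sketch looks for a few executables on the system path. The function name `check_prerequisites` and the list of tool names are assumptions made for illustration and are not part of the template. A tool can be installed and still not be on the path, so treat a `FALSE` as a hint rather than proof.

```r
# Hypothetical helper (not part of the template): report which
# command-line tools R can find on the system PATH.
check_prerequisites <- function(tools = c("quarto", "git", "pdflatex")) {
  # Sys.which() returns "" for tools it cannot find
  found <- nzchar(Sys.which(tools))
  names(found) <- tools
  found
}

status <- check_prerequisites()
print(status)

# If no TeX distribution is found, the tinytex R package is one option:
# install.packages("tinytex")
# tinytex::install_tinytex()
```

R packages and the reference manager are not covered by this check; it only covers executables that can be found by name.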

Template structure and content

The template comes with a folder structure and example files to illustrate the kinds of content you would place in the different folders. The following is a brief description of the contents. See the readme files in each folder for more details.

  • The assets folder contains static assets like manually generated schematics/diagrams, BibTeX files, CSL style files, PDFs of references, and other such content. These assets are not code-based and are not generated by code. Basically, add anything that you want to be part of this repo but that doesn't fit into the other categories.

  • All code goes into the code folder and subfolders. Currently, there are 3 sub-folders, each handling a different part of an analysis. You can reorganize them in whatever way makes the most sense for your project. The folders contain files that do some data cleaning and analysis to illustrate the overall setup and workflow. See the readme files in those folders for details.

  • All data goes into the data folder and subfolders. Currently, there are 2 sub-folders that contain different versions of a simple example data set. You can reorganize them in whatever way makes the most sense for your project.

  • The products folder and its subfolders contain deliverables, such as the manuscript/report, the supplement, slide decks, posters, Shiny web apps, etc. Those should generally be made with Quarto/R. As needed, other formats can be used. There is an example manuscript and an example slide deck.

    • The manuscript subfolder contains a template for a report written as a Quarto file. If you access this repository as part of my Modern Applied Data Science course, the sections are guides for your project. If you found your way to this repository outside the course, you might only be interested in seeing how the file pulls in results and references and generates a Word document as output, without paying attention to the detailed structure. There is also a sub-folder containing an example template for a supplementary material file.
    • The slides subfolder contains a basic example of slides made with Quarto.
  • The results folder contains automatically generated (code-generated) output. This includes figures, tables saved as serialized R data (.Rds) files, computed values and other outputs. All content in this folder and its subfolders should be automatically generated by code. Manually generated results should be avoided as much as possible. If absolutely necessary, they go into the assets folder.

  • There are multiple special files in the repo.

    • readme.md: this file contains instructions or details about the folder it is located in. You are reading the project-level README.md file right now. There is a readme in almost every folder.
    • data-analysis-template.Rproj is a file that tells RStudio that this is the main folder for a project. Rename if you want.
    • a few "hidden" files and folders (they start with a . and depending on how your OS is configured, you might not see them). Those are for R/RStudio and Git/GitHub and you can ignore them.
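The results workflow described in the list above (analysis code writes a table as an .Rds file, and a product such as the manuscript reads it back) can be sketched as follows. This is a minimal illustration, not the template's actual code: the file name `summary-table.Rds`, the use of the built-in `mtcars` data, and the temporary directory standing in for the results folder are all assumptions.

```r
# A temporary directory stands in for the template's results folder
# so that this sketch runs anywhere (assumption for illustration).
results_dir <- file.path(tempdir(), "results", "tables")
dir.create(results_dir, recursive = TRUE, showWarnings = FALSE)

# Analysis step: compute a small summary table and save it as .Rds.
summary_table <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
table_file <- file.path(results_dir, "summary-table.Rds")
saveRDS(summary_table, file = table_file)

# Product step (e.g. inside a Quarto code chunk): load the saved table
# instead of recomputing it.
loaded_table <- readRDS(table_file)
print(loaded_table)
```

Saving results to files keeps the analysis code and the products decoupled: the manuscript only needs the saved file, not the code that produced it.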

Naming conventions

We try to follow these naming conventions for folders and files:

  • Somewhat descriptive and easy to understand names.
  • Only lower-case letters (and numbers if needed). Words separated by a -.

For instance, there is a folder called analysis-code with a file called exploratory-analysis-v2.qmd in it. We don't use _ or blank spaces as separators. We also don't use CamelCase, only lower-case. Exceptions are made for standard file endings; for instance, R scripts end in .R (instead of .r).
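The conventions above can be checked with a regular expression. The following is a rough sketch under one reading of the rules: the name before the file ending must consist of lower-case letters and digits separated by single hyphens, while the file ending may contain upper-case letters. The function `is_valid_name` is hypothetical and not part of the template.

```r
# Hypothetical helper (not part of the template): TRUE if a file or
# folder name follows the naming conventions described above.
is_valid_name <- function(x) {
  # stem: lower-case letters/digits, words separated by single hyphens
  # ending: optional, may contain upper-case letters (e.g. .R, .Rproj)
  pattern <- "^[a-z0-9]+(-[a-z0-9]+)*(\\.[A-Za-z0-9]+)?$"
  grepl(pattern, x)
}

is_valid_name(c(
  "analysis-code",
  "exploratory-analysis-v2.qmd",
  "Exploratory_Analysis.R",
  "my file.R"
))
```

Under this reading, the first two names pass and the last two fail because of the upper-case letters, the underscore, and the blank space.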

Package management

We recommend using renv to manage R packages and increase the chances of future reproducibility. This is required if you are using the template as part of a research project for our group. Otherwise, you can decide whether or not to implement renv. This can happen at any stage, though earlier in the project is generally better.

If you plan to use renv, start by reading the introduction to renv article so you know how to use it.
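For orientation, the three renv calls most projects rely on are `renv::init()`, `renv::snapshot()` and `renv::restore()`. The sketch below wraps them in a small dispatcher so that nothing runs until you ask for a specific step. The wrapper `renv_step` is hypothetical and only meant as an illustration; the renv article remains the authoritative guide.

```r
# Hypothetical wrapper (not part of the template) around the three
# core renv calls. Nothing is executed until the function is called.
renv_step <- function(step = c("init", "snapshot", "restore"),
                      project = getwd()) {
  step <- match.arg(step)
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("renv is not installed; run install.packages('renv') first.")
  }
  switch(step,
    init     = renv::init(project = project),     # create project library and lockfile
    snapshot = renv::snapshot(project = project), # record current package versions
    restore  = renv::restore(project = project)   # reinstall versions from the lockfile
  )
}

# Typical order over the life of a project:
# renv_step("init")      # once, when adopting renv
# renv_step("snapshot")  # after installing or updating packages
# renv_step("restore")   # on a new machine or after cloning
```

In day-to-day use you would normally call the renv functions directly from the R console in the project folder; the wrapper only makes the order of steps explicit.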

Getting started

This is a GitHub template repository. The best way to get it and start using it is by following these steps.

Once you have the repository, you can check out the examples by executing them in order. First run the processing code, which will produce the processed data. Then run the analysis scripts, which will take the processed data and produce some results. Then you can run the manuscript, poster and slides example files in any order. Those files pull in the generated results and display them. These files also pull in references from the BibTeX file and format them according to the CSL style.
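The run order described above can be made explicit with a small driver function. This is a sketch under stated assumptions: `run_in_order` is a hypothetical helper, and the file paths in the comments are placeholders rather than the template's actual file names.

```r
# Hypothetical helper (not part of the template): run a list of files
# one after another, stopping early if a file is missing.
run_in_order <- function(files, runner = source) {
  for (f in files) {
    if (!file.exists(f)) {
      stop("Missing file: ", f)
    }
    message("Running ", f)
    runner(f)
  }
  invisible(files)
}

# Placeholder paths, listed in the order described above:
# run_in_order(c(
#   "code/processing-code/processing.R",
#   "code/analysis-code/analysis.R"
# ))
#
# Quarto products could be rendered afterwards with a different runner,
# assuming the quarto R package is installed:
# run_in_order("products/manuscript/manuscript.qmd",
#              runner = quarto::quarto_render)
```

Keeping the order in one place makes it harder to render a product against stale or missing results.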
