Proj6

DAND Project 6 - Explore and Summarize Data

Project Purpose and Notes

Use R to apply exploratory data analysis techniques: explore relationships ranging from a single variable to multiple variables, and examine a selected data set for distributions, outliers, and anomalies.

This project uses the ggplot2, dplyr, lubridate, and scales packages, together with an R Markdown "notebook", to analyze a data set and communicate the findings. The data set is provided by Udacity.

This project was created and tested on 64-bit Windows 10 using R 3.5.1 with the following libraries:

  • dplyr 0.7.6
  • ggplot2 3.0.0
  • lubridate 1.7.4
  • scales 0.5.0

In addition, the use of RStudio is strongly recommended. Version 1.1.453 was used to create and run the R Markdown "notebook" along with:

  • knitr 1.2.0
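
To see how a local setup compares with the versions listed above, something like the following sketch may help. The helper name `version_status` is purely illustrative and not part of this project, and newer package versions have not been tested here, so a "newer" status is only informational.

```r
# Versions listed in this README, treated here as the tested baseline.
tested <- c(dplyr = "0.7.6", ggplot2 = "3.0.0",
            lubridate = "1.7.4", scales = "0.5.0")

# Hypothetical helper: classify one installed version against the tested one.
# Returns "missing", "older", "same", or "newer".
version_status <- function(installed, tested) {
  if (is.na(installed)) return("missing")
  switch(as.character(utils::compareVersion(installed, tested)),
         "-1" = "older", "0" = "same", "1" = "newer")
}

# Look up the installed version of each package (NA if it is not installed).
installed <- vapply(names(tested), function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}, character(1))

report <- data.frame(
  package   = names(tested),
  tested    = unname(tested),
  installed = unname(installed),
  status    = unname(mapply(version_status, installed, tested)),
  stringsAsFactors = FALSE
)
print(report, row.names = FALSE)
```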

Installation and Requirements

  • Install R
    • Note: R v3.5 or later recommended
  • Install dplyr, ggplot2, lubridate, and scales
  • Download the Udacity data set
  • Clone this repo
  • Open the .Rmd file in RStudio
  • Run or "re-knit" the .Rmd file from RStudio
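
The package installation step above can be sketched from the R console as follows. This is a minimal sketch, not the project's own setup script: the helper `missing_packages` and the file path are hypothetical, and the `rmarkdown` package is an assumption (it is not listed in this README, but it is what RStudio typically uses to knit a document).

```r
# Packages named in this README, plus rmarkdown (an assumption, see above).
required <- c("dplyr", "ggplot2", "lubridate", "scales", "knitr", "rmarkdown")

# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

to_install <- missing_packages(required)

if (length(to_install) == 0) {
  message("All required packages are already installed.")
} else if (interactive()) {
  # Only install when run from an interactive session such as the RStudio console.
  install.packages(to_install)
} else {
  message("Missing packages: ", paste(to_install, collapse = ", "))
}

# Knitting outside of RStudio. The path is a placeholder for the project's .Rmd file.
# rmarkdown::render("path/to/project.Rmd")
```

Installing the latest releases this way will likely produce newer versions than the ones this project was tested with, so plots or warnings may differ slightly.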

Project Requirements

  • Using the provided project template RMD file, perform an analysis on the selected data set
    • A stream-of-consciousness analysis and exploration of the data
    • A final plots and summary section at the end
    • A final reflection section capturing struggles, successes, and future ideas about the analysis of the data set

Resource Attribution

  • The following resources were used in developing the solution for this project:
    • StackOverflow
    • ggplot2 documentation/web site
    • knitr documentation/wiki
    • RStudio documentation and references on R Markdown and knitr options
    • "The Art of R Programming" (many sections referenced)
    • "R Graphics Cookbook" (many chapters referenced)

Project Solution Documents

License

MIT License