DAND Project 6 - Explore and Summarize Data
Use R and apply exploratory data analysis techniques to explore relationships ranging from a single variable to multiple variables, and to examine a selected data set for distributions, outliers, and anomalies.
This project uses the ggplot2, dplyr, lubridate, and scales packages, as well as an R Markdown "notebook", to analyze a data set and communicate findings. The data set is provided by Udacity.
This project was created and tested on Windows 10 64-bit using R 3.5.1 with the following libraries:
- dplyr 0.7.6
- ggplot2 3.0.0
- lubridate 1.7.4
- scales 0.5.0
In addition, the use of RStudio is strongly recommended. Version 1.1.453 was used to create and run the R Markdown "notebook" along with:
- knitr 1.2.0
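To compare a local setup against the versions listed above, something like the following sketch may help. The version strings are copied from the list as written; the helper name `installed_version` is hypothetical and only used for illustration.

```r
# Versions this project was tested with, copied from the list above.
tested <- c(
  dplyr     = "0.7.6",
  ggplot2   = "3.0.0",
  lubridate = "1.7.4",
  scales    = "0.5.0",
  knitr     = "1.2.0"
)

# Hypothetical helper: return the installed version of a package as a
# string, or NA if the package is not installed.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Side-by-side table of tested and locally installed versions.
report <- data.frame(
  package          = names(tested),
  tested           = unname(tested),
  installed        = vapply(names(tested), installed_version, character(1)),
  row.names        = NULL,
  stringsAsFactors = FALSE
)

print(R.version.string)
print(report)
```

A mismatch does not necessarily mean the notebook will fail to knit; the table only shows where the local environment differs from the one used for testing.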
- Install R
- Note: R v3.5 or later recommended
- Install dplyr, ggplot2, lubridate, and scales
- Download the Udacity data set
- Clone this repo
- Open the rmd file in RStudio
- Run or "re-knit" the rmd file from RStudio
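The install and knit steps above can also be run from the R console. The following is a minimal sketch, not the project's own script: the function names and the `.Rmd` file name in the example call are hypothetical placeholders, and it assumes the rmarkdown package is used for knitting outside RStudio's Knit button.

```r
# Packages named in the install step above.
required <- c("dplyr", "ggplot2", "lubridate", "scales")

# Return the subset of `pkgs` that is not currently installed.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!is_installed])
}

# Install whatever is missing, then knit the notebook to HTML.
# `rmd_path` is a placeholder; use the actual file name from the cloned repo.
setup_and_knit <- function(rmd_path, pkgs = required) {
  to_install <- missing_packages(c(pkgs, "rmarkdown"))
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  stopifnot(file.exists(rmd_path))
  rmarkdown::render(rmd_path, output_format = "html_document")
}

# Example call (not run here; the file name is hypothetical):
# setup_and_knit("project6.Rmd")
```

Checking for missing packages first avoids reinstalling, and possibly upgrading, packages that are already present at the tested versions. In a non-interactive session, `install.packages()` may need a CRAN mirror set through its `repos` argument.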
- Using the provided project template RMD file, perform an analysis on the selected data set
- A stream-of-consciousness analysis and exploration of the data
- A final plots and summary section at the end
- A final reflection section capturing struggles, successes, and future ideas about the analysis of the data set
- The following resources were used in developing the solution for this project:
- StackOverflow
- ggplot2 documentation/web site
- knitr documentation/wiki
- RStudio documentation and references on R Markdown and knitr options
- Referenced many sections from "The Art of R Programming"
- Referenced many chapters from "R Graphics Cookbook"
- Project 6 - R Markdown Notebook - R Markdown Notebook for Project
- Project 6 - Knitted R Markdown Notebook as HTML web page - Web Page (HTML) from knitted R Markdown Notebook for Project
- Prosper Loan Data - Variable Definitions - Microsoft Excel Spreadsheet with feature definitions and additional sheets used to facilitate the data set analysis