This repository has been archived by the owner on Jun 23, 2020. It is now read-only.

Commit

Add script to clean and combine data, and add data
- Update survey data dictionary with left out questions
- Update survey data dictionary with variable/column names for questions
- Add script `clean-data.R` to clean and combine the two survey datasets into
  one for ease of analysis
- Create the combined survey dataset after running `clean-data.R`
- Create README.md file to explain cleaned data and the script to produce it
- Update root README.md file to briefly explain data
- Change `data/` directory to `raw-data/`
erictleung committed May 11, 2016
1 parent 97ba361 commit 7f185c3
Showing 8 changed files with 16,975 additions and 37 deletions.
11 changes: 10 additions & 1 deletion README.md
@@ -5,9 +5,18 @@ We announced on [March 29th,

Survey development was led by [Quincy Larson](https://twitter.com/ossia) with Free Code Camp and [Saron Yitbarek](https://twitter.com/saronyitbarek) with Code Newbie. For more about why we made this survey: ["How we crafted a survey for thousands of people who are learning to code"](https://medium.freecodecamp.com/we-just-launched-the-biggest-ever-survey-of-people-learning-to-code-cac81dadf1ea#.8g9ts8gm5).

## Table of Contents

- [About the Data](#about-the-data)
- [How to Contribute](#how-to-contribute)
- [Analysis of other relevant recent data](#analysis-of-other-relevant-recent-data)
- [License](#license)

## About the Data

-The survey results are located in the [`data/`](data/) directory, in .csv format.
+The raw survey results are located in the [`raw-data/`](raw-data/) directory, in `.csv` format.

We have cleaned and combined the data for the convenience of downstream analyses and visualizations. The cleaned data is located in the [`clean-data/`](clean-data/) directory.

## How to Contribute

15,654 changes: 15,654 additions & 0 deletions clean-data/2016-FCC-New-Coders-Survey-Part-1.csv


65 changes: 65 additions & 0 deletions clean-data/README.md
@@ -0,0 +1,65 @@
# Cleaning and Combining Free Code Camp Survey Data

## Introduction

The survey data was split into two parts, which need to be combined into one
for ease of downstream analyses. Additionally, both data sets need some
cleaning, because free-text survey answers vary in format (ranges, units,
spelling) and include some implausible values.

## Notable Data Transformations

### Obvious Outliers

In some of the numeric free-text answers, values beyond a reasonable threshold
were filtered out. For example, an answer claiming 100,000 months (over 8,000
years) of coding would be removed.
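A minimal sketch of this kind of filter, using `awk` on one answer per line. The function name `drop_outliers` and the 600-month (50-year) cutoff are hypothetical; the thresholds actually used in `clean-data.R` may differ per question. Out-of-range values are blanked rather than deleted so that rows stay aligned.

```shell
# drop_outliers MAX (hypothetical helper): read one numeric answer per line
# on stdin; print it unchanged if it is <= MAX, otherwise print an empty
# line so the value is treated as missing but the row count is preserved.
drop_outliers() {
  awk -v max="$1" '{
    # $1 + 0 coerces the answer to a number before comparing
    if ($1 != "" && $1 + 0 > max + 0) print ""
    else print $1
  }'
}

# Usage: 24 is kept, 100000 is blanked (600 is an assumed cutoff)
printf '24\n100000\n' | drop_outliers 600
```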

### Numeric Ranges

Some answers were given as ranges. For example, an answer to a question about
months of programming might have been "9-10". Where possible, the range was
replaced by its average (9.5 in this example).
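As an illustration (not the code in `clean-data.R`), a range such as "9-10" can be detected with a regular expression and replaced by its midpoint. `range_midpoint` is a hypothetical helper; answers that are not a simple `LOW-HIGH` numeric range pass through unchanged.

```shell
# range_midpoint (hypothetical helper): read one answer per line; if it
# looks like "LOW-HIGH" (e.g. "9-10"), print the average of the two
# bounds, otherwise echo the answer unchanged.
range_midpoint() {
  awk '{
    if ($0 ~ /^[0-9]+(\.[0-9]+)?-[0-9]+(\.[0-9]+)?$/) {
      split($0, bounds, "-")
      print (bounds[1] + bounds[2]) / 2
    } else {
      print $0
    }
  }'
}

# Usage: "9-10" becomes 9.5, a plain "12" is left alone
printf '9-10\n12\n' | range_midpoint
```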

### Years to Months

Some answers to questions asking for a number of months were given in years.
These were converted to months (multiplying by 12) where possible.
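One way to sketch this conversion, assuming year answers look like `2 years` or `1.5 yrs`. Both the accepted spellings and the helper name `years_to_months` are assumptions for illustration; the patterns `clean-data.R` recognizes may differ.

```shell
# years_to_months (hypothetical helper): read one answer per line; if it is
# a number followed by "year(s)" or "yr(s)", print that number times 12,
# otherwise echo the answer unchanged.
years_to_months() {
  awk '{
    line = tolower($0)
    if (line ~ /^[0-9]+(\.[0-9]+)? *(years?|yrs?)$/) {
      # awk converts the leading numeric prefix ("2" in "2 years")
      print (line + 0) * 12
    } else {
      print $0
    }
  }'
}

# Usage: "2 years" -> 24, "1.5 yrs" -> 18, a bare "18" is left alone
printf '2 years\n1.5 yrs\n18\n' | years_to_months
```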

### Normalization of Answers

Some of the free-text answers were nearly identical, differing only by a space
or two. Unless you account for this, they register as different answers. For
example, "Cybersecurity" and "Cyber Security" mean the same thing and were
changed to a single consistent spelling. Some variants may have been missed.
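A sketch of this normalization, assuming two answers should be merged when they differ only in letter case or spaces. The helper `normalize_answers` is hypothetical, and it keeps the first spelling it sees as the canonical one; `clean-data.R` may instead map variants to a hand-picked spelling.

```shell
# normalize_answers (hypothetical helper): read one answer per line and
# build a comparison key by lowercasing and removing spaces/tabs. The first
# spelling seen for each key is printed for every later variant.
normalize_answers() {
  awk '{
    key = tolower($0)
    gsub(/[ \t]+/, "", key)
    if (!(key in canon)) canon[key] = $0
    print canon[key]
  }'
}

# Usage: all three variants are printed as "Cybersecurity"
printf 'Cybersecurity\nCyber Security\ncyber security\n' | normalize_answers
```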


## Prerequisites to Rerun Data Manipulations

- [R][RProj] (>= 3.2.3)
- [dplyr][dplyrGH] (>= 0.4.3) [CRAN][dplyrCRAN]
- [Rcpp][RcppGH] (>= 0.12.4) [CRAN][RcppCRAN]

[RProj]: https://www.r-project.org/
[dplyrGH]: https://github.com/hadley/dplyr
[RcppGH]: https://github.com/RcppCore/Rcpp
[dplyrCRAN]: https://cran.r-project.org/web/packages/dplyr/index.html
[RcppCRAN]: https://cran.r-project.org/web/packages/Rcpp/index.html


## Reproduce Cleaning and Combining of Data

Running the following commands creates a new file, `2016-New-Coders-Survey.csv`,
in this directory (`clean-data/`).

```shell
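# Clone the repository and move into the directory that holds clean-data.R
git clone https://github.com/FreeCodeCamp/2016-new-coder-survey.git
cd 2016-new-coder-survey/clean-data
# Requires R, dplyr, and Rcpp (see the prerequisites above)
Rscript clean-data.R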
```


## Cleaning Pipeline

1. Rename column names
2. Clean free-text fields for the appropriate questions
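Step 1 could look roughly like the following, which rewrites only the header row of a CSV. The helper `rename_columns` and the new names `Age` and `MonthsProgramming` are hypothetical stand-ins for the variable names listed in the survey data dictionary, and the naive comma splitting assumes the header contains no quoted commas.

```shell
# rename_columns NAMES (hypothetical helper): replace the header row of a
# CSV on stdin with the comma-separated NAMES, in column order; all data
# rows pass through unchanged.
rename_columns() {
  awk -v names="$1" 'BEGIN { FS = OFS = "," }
    NR == 1 {
      # assigning to $i rebuilds the header line using OFS
      n = split(names, new, ",")
      for (i = 1; i <= n; i++) $i = new[i]
    }
    { print }'
}

# Usage: long question text in the header becomes short variable names
printf 'What is your age?,How many months have you been programming?\n28,24\n' |
  rename_columns 'Age,MonthsProgramming'
```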
