This repository was archived by the owner on Jun 23, 2020. It is now read-only.

Commit 01ac7a1

Add script to clean and combine data, and add data
- Update survey data dictionary with left out questions
- Update survey data dictionary with variable/column names for questions
- Add script `clean-data.R` to clean and combine the two survey datasets into one for ease of analysis
- Create the combined survey dataset after running `clean-data.R`
- Create README.md file to explain cleaned data and the script to produce it
- Update root README.md file to briefly explain data
- Change `data/` directory to `raw-data/`
1 parent 97ba361 commit 01ac7a1

8 files changed (+16975, -37 lines)

README.md (+10, -1)
@@ -5,9 +5,18 @@ We announced on [March 29th,
 
 Survey development was led by [Quincy Larson](https://twitter.com/ossia) with Free Code Camp and [Saron Yitbarek](https://twitter.com/saronyitbarek) with Code Newbie. For more about why we made this survey: ["How we crafted a survey for thousands of people who are learning to code"](https://medium.freecodecamp.com/we-just-launched-the-biggest-ever-survey-of-people-learning-to-code-cac81dadf1ea#.8g9ts8gm5).
 
+## Table of Contents
+
+- [About the Data](#about-the-data)
+- [How to Contribute](#how-to-contribute)
+- [Analysis of other relevant recent data](#analysis-of-other-relevant-recent-data)
+- [License](#license)
+
 ## About the Data
 
-The survey results are located in the [`data/`](data/) directory, in .csv format.
+The raw survey results are located in the [`raw-data/`](raw-data/) directory, in `.csv` format.
+
+We have cleaned and combined the data for the convenience of downstream analyses and visualizations. The cleaned data is located in the [`clean-data/`](clean-data/) directory.
 
 ## How to Contribute
 
clean-data/2016-FCC-New-Coders-Survey-Part-1.csv (+15,654 lines; large diff not rendered)

clean-data/README.md (new file, +65 lines)

# Cleaning and Combining Free Code Camp Survey Data

## Introduction

The survey data was split into two parts, which need to be combined into one
dataset for ease of downstream analyses. Both datasets also need some cleaning,
because survey responses, especially free-text answers, are often inconsistently
formatted.

## Notable Data Transformations

### Obvious Outliers

In some numeric free-text answers, values beyond a reasonable threshold were
removed. For example, an answer claiming 100,000 months of coding experience
would be removed.
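
The threshold rule can be sketched in Python as follows. This is an illustration, not the repository's method: the actual cleaning is done in R by `clean-data.R`, and both the 500-month cutoff and the `drop_outlier` helper are hypothetical.

```python
from typing import Optional

# Hypothetical cutoff (500 months is roughly 41 years); the real thresholds
# are whatever clean-data.R uses and are not reproduced here.
MAX_MONTHS_PROGRAMMING = 500


def drop_outlier(months: Optional[float],
                 limit: float = MAX_MONTHS_PROGRAMMING) -> Optional[float]:
    """Return the answer unchanged, or None (i.e. missing) if it exceeds
    the plausibility threshold."""
    if months is None or months > limit:
        return None
    return months


answers = [6, 24, 100_000, None, 480]
cleaned = [drop_outlier(a) for a in answers]
print(cleaned)  # [6, 24, None, None, 480]
```

Replacing an implausible value with a missing value, rather than dropping the whole response, keeps the respondent's other answers available for analysis; whether `clean-data.R` works the same way is an assumption.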

### Numeric Ranges

Some answers were given as ranges; for example, "9-10" months of programming.
Where possible, such a range was replaced with its average (here, 9.5).
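
A minimal Python sketch of range averaging follows. The accepted formats (two numbers separated by a hyphen, with optional spaces) and the `parse_months` name are assumptions; the answer formats actually handled by `clean-data.R` are not documented here.

```python
import re
from typing import Optional

# Matches answers such as "9-10" or "9 - 10" (an assumed format).
RANGE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*$")


def parse_months(answer: str) -> Optional[float]:
    """Parse a free-text month answer, replacing a range with its average."""
    match = RANGE_PATTERN.match(answer)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        return (low + high) / 2
    try:
        return float(answer)
    except ValueError:
        return None  # neither a number nor a recognizable range


print(parse_months("9-10"))     # 9.5
print(parse_months("12"))       # 12.0
print(parse_months("a while"))  # None
```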

### Years to Months

Some answers to a question asking about months were given in years. These were
converted to months if possible.
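
The conversion can be sketched in Python like this. The recognized unit spellings ("year", "years", "yr", "yrs") and the `to_months` helper are assumptions for illustration; plain numbers are assumed to already be in months.

```python
import re
from typing import Optional

# Assumed phrasings: a number followed by "year", "years", "yr" or "yrs".
YEARS_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:years?|yrs?)\s*$",
                           re.IGNORECASE)


def to_months(answer: str) -> Optional[float]:
    """Convert an answer given in years to months; plain numbers are
    assumed to already be in months."""
    match = YEARS_PATTERN.match(answer)
    if match:
        return float(match.group(1)) * 12
    try:
        return float(answer)
    except ValueError:
        return None  # leave unparseable answers as missing


print(to_months("2 years"))  # 24.0
print(to_months("1.5 yrs"))  # 18.0
print(to_months("18"))       # 18.0
```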

### Normalization of Answers

Some free-text answers differed from each other only by a space or two, so they
would be counted as different answers unless handled explicitly. For example,
"Cybersecurity" and "Cyber Security" mean the same thing and were changed to a
single, consistent spelling. Some such variants may have been missed.
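
One way to sketch this normalization in Python is to group answers by a key with whitespace removed and then map every variant to the most common spelling in its group. This is an assumed technique, not necessarily what `clean-data.R` does: the key also ignores letter case, because "Cybersecurity" and "Cyber Security" differ in capitalization as well as spacing, and picking the most frequent spelling (ties going to the first seen) is a hypothetical rule.

```python
import re
from collections import Counter, defaultdict


def normalize_key(answer: str) -> str:
    """Drop all whitespace and fold case so near-duplicates share a key."""
    return re.sub(r"\s+", "", answer).casefold()


def unify_answers(answers: list[str]) -> list[str]:
    """Replace each answer with the most common spelling among answers
    that share its normalized key (ties go to the first one seen)."""
    groups: dict[str, Counter] = defaultdict(Counter)
    for answer in answers:
        groups[normalize_key(answer)][answer.strip()] += 1
    canonical = {key: counts.most_common(1)[0][0]
                 for key, counts in groups.items()}
    return [canonical[normalize_key(a)] for a in answers]


responses = ["Cybersecurity", "Cyber Security", "cybersecurity ", "Data Science"]
print(unify_answers(responses))
# ['Cybersecurity', 'Cybersecurity', 'Cybersecurity', 'Data Science']
```

Removing every space is aggressive and could in principle merge answers that really differ, which is one reason variants like these are often reviewed by hand.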

## Prerequisites to Rerun Data Manipulations

- [R][RProj] (>= 3.2.3)
- [dplyr][dplyrGH] (>= 0.4.3) [CRAN][dplyrCRAN]
- [Rcpp][RcppGH] (>= 0.12.4) [CRAN][RcppCRAN]

[RProj]: https://www.r-project.org/
[dplyrGH]: https://github.com/hadley/dplyr
[RcppGH]: https://github.com/RcppCore/Rcpp
[dplyrCRAN]: https://cran.r-project.org/web/packages/dplyr/index.html
[RcppCRAN]: https://cran.r-project.org/web/packages/Rcpp/index.html

## Reproduce Cleaning and Combining of Data

Running the following commands will create a new file,
`2016-New-Coders-Survey.csv`, in the `clean-data/` directory.

```shell
# Clone the repository, then run the cleaning script from clean-data/
git clone https://github.com/FreeCodeCamp/2016-new-coder-survey.git
cd 2016-new-coder-survey/clean-data
Rscript clean-data.R
```

## Cleaning Pipeline

1. Rename columns
2. Clean the free-text fields of the appropriate questions
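
The two pipeline steps can be sketched as small composable Python functions. Everything here is hypothetical: the question texts and short column names are invented for illustration (the real ones are in the survey data dictionary), and the per-column cleaners stand in for transformations like those described above.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

# Hypothetical data-dictionary excerpt: raw question text -> short column name.
COLUMN_NAMES = {
    "How many months have you been programming?": "MonthsProgramming",
    "Which career are you most interested in?": "JobRoleInterest",
}


def rename_columns(rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """Step 1: replace raw question text with short variable names;
    columns missing from the mapping keep their original names."""
    return [{mapping.get(col, col): value for col, value in row.items()}
            for row in rows]


def clean_fields(rows: List[Row],
                 cleaners: Dict[str, Callable[[str], str]]) -> List[Row]:
    """Step 2: apply a cleaning function only to columns that have one."""
    return [{col: cleaners[col](value) if col in cleaners else value
             for col, value in row.items()}
            for row in rows]


raw = [{"How many months have you been programming?": " 6 ",
        "Which career are you most interested in?": "Cyber Security"}]
cleaners = {
    "MonthsProgramming": str.strip,
    "JobRoleInterest": lambda s: s.replace("Cyber Security", "Cybersecurity"),
}
result = clean_fields(rename_columns(raw, COLUMN_NAMES), cleaners)
print(result)  # [{'MonthsProgramming': '6', 'JobRoleInterest': 'Cybersecurity'}]
```

Renaming first lets the cleaning step refer to short, stable column names instead of full question text.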
