Update cleaning of dataset #72

erictleung · 2016-08-02T06:18:27Z

- Change commute times >300 minutes to NA
- Change minimum mortgage to $1000 and maximum mortgage to $1000000
- Move data dictionary into `clean-data/` directory
- Change minimum student debt to $1000 and maximum debt to $500000
- Add changelog to clean data README
- Remove missing data encoding information in README
- Add example exploration in clean data README
- Add figure of distribution of ages in dataset
- Rename some columns that were originally extracted from "Other" columns in the dataset, to reflect that they were derived rather than present in the original data
- Clean data for children to make sure the number of children and the yes/no answer to having children are consistent
- Fix spelling mistake in `IsReceiveDisabilitiesBenefits` (original: `IsReceiveDiabilitiesBenefits`)
- Use `ungroup()` in `time_diff_check` because of `dplyr` version changes
- Separate polishing steps for podcasts, resources, and so on to make it easier to see what is being polished
- Update survey data dictionary description with details on the two datasets and parts of the survey
- Update survey data
- Update version numbers for R packages
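
The range and consistency rules in the list above can be sketched roughly as follows. This is a minimal Python illustration, not the project's R cleaning script, and it rests on several assumptions:

- Out-of-range values are set to NA (`None` here) rather than clamped to the limits.
- The column names `CommuteTime`, `HomeMortgageOwe`, `StudentDebtOwe`, `HasChildren` and `ChildrenNumber` are assumed.
- The exact children-consistency rule is a guess at one reasonable interpretation.

```python
from typing import Optional

# Limits taken from the change list; column names below are assumptions.
COMMUTE_MAX_MINUTES = 300
MORTGAGE_RANGE = (1_000, 1_000_000)
STUDENT_DEBT_RANGE = (1_000, 500_000)


def within(value: Optional[float], low: float, high: float) -> Optional[float]:
    """Return value if it lies in [low, high]; otherwise None (i.e. NA)."""
    if value is None or value < low or value > high:
        return None
    return value


def clean_row(row: dict) -> dict:
    """Apply the assumed cleaning rules to one survey response."""
    out = dict(row)  # copy so the caller's record is not mutated

    # Commute times above 300 minutes become NA (300 itself is kept).
    commute = out.get("CommuteTime")
    if commute is not None and commute > COMMUTE_MAX_MINUTES:
        out["CommuteTime"] = None

    # Amounts outside the allowed ranges become NA.
    out["HomeMortgageOwe"] = within(out.get("HomeMortgageOwe"), *MORTGAGE_RANGE)
    out["StudentDebtOwe"] = within(out.get("StudentDebtOwe"), *STUDENT_DEBT_RANGE)

    # Assumed consistency rule: a positive count implies "yes" (1),
    # a count of zero implies "no" (0) ...
    n = out.get("ChildrenNumber")
    if n is not None:
        out["HasChildren"] = 1 if n > 0 else 0
    # ... and respondents without children get no count (NA).
    if out.get("HasChildren") == 0:
        out["ChildrenNumber"] = None
    return out
```

Setting suspicious values to NA, rather than clamping them, avoids inventing a plausible-looking number for a response that was most likely a typo.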

cc @evaristoc @SamAI-Software @QuincyLarson

Close #33

QuincyLarson · 2016-08-02T08:19:41Z

@erictleung I am unfamiliar with R so I don't feel qualified to QA this, but all of these changes you described sound sane :)

SamAI-Software · 2016-08-02T08:25:13Z

The issue is that some people have already done their analyses, so changing variable names will break their code. Mine, too.
I understand why Eric renamed variables and fixed typos, but it might be too late for that.

erictleung · 2016-08-02T14:52:17Z

Right, I understand that I've changed those variable names and that it will break some people's code. @evaristoc and I discussed the reason for renaming the variables with "Other" in their names. And I agree, it might be too late at this point.

I guess it is not too urgent that those variable names be changed. I can revert them and just make a note of it in the README file.

The most important part of the change is the normalization part to address issue #33.

SamAI-Software · 2016-08-02T15:13:39Z

> The most important part of the change is the normalization part to address issue #33.

This part seems to be fine.

If you revert the old variable names and add a note into README, then we should be good to go.

erictleung · 2016-08-02T16:37:34Z

@SamAI-Software awesome, I'll try to get to it later tonight.

erictleung · 2016-08-04T10:04:43Z

@SamAI-Software updated my PR!

I reverted the major change of adding "Other" to the variable names. I did, however, keep the rename to `IsReceiveDisabilitiesBenefits`, since the original name, `IsReceiveDiabilitiesBenefits`, has a typo.
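
If existing analysis code needs to work with both the old and the new column name, one option is to normalize the header when loading the data. The sketch below is purely illustrative:

- It is written in Python, not R.
- `read_survey` and `RENAMED_COLUMNS` are hypothetical names.
- It assumes the cleaned data is a CSV file with a header row.

```python
import csv
import io
from typing import Dict, List

# Hypothetical mapping from the old (misspelled) column name to the fixed one.
RENAMED_COLUMNS = {"IsReceiveDiabilitiesBenefits": "IsReceiveDisabilitiesBenefits"}


def read_survey(text: str) -> List[Dict[str, str]]:
    """Parse CSV text, renaming any old column names to their new spelling."""
    reader = csv.reader(io.StringIO(text))
    header = [RENAMED_COLUMNS.get(name, name) for name in next(reader)]
    return [dict(zip(header, row)) for row in reader]
```

With this approach, the same downstream code can read both the old and the new version of the file.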

Feel free to pull down my PR and QA check the dataset. Let me know if there's anything else of concern 😃

- Change commute times >300 minutes to NA
- Change minimum mortgage to $1000 and maximum mortgage to $1000000
- Move data dictionary into `clean-data/` directory
- Change minimum student debt to $1000 and maximum debt to $500000
- Add changelog to clean data README
- Remove missing data encoding information in README
- Add example exploration in clean data README
- Add figure of distribution of ages in dataset
- Clean data for children to make sure the number of children and the yes/no answer to having children are consistent
- Fix spelling mistake in `IsReceiveDisabilitiesBenefits` (original: `IsReceiveDiabilitiesBenefits`)
- Use `ungroup()` in `time_diff_check` because of `dplyr` version changes
- Separate polishing steps for podcasts, resources, and so on to make it easier to see what is being polished
- Update survey data dictionary description with details on the two datasets and parts of the survey
- Update survey data
- Update version numbers for R packages

SamAI-Software · 2016-08-04T13:22:58Z

LGTM

erictleung force-pushed the update-clean-dataset branch from 896a533 to 3069b4a on August 4, 2016 09:57

erictleung force-pushed the update-clean-dataset branch from 3069b4a to 5890a46 on August 4, 2016 10:05

QuincyLarson merged commit f5f3d21 into freeCodeCamp:master May 27, 2017

erictleung deleted the update-clean-dataset branch May 28, 2017 02:52