Normalizing data #33
Comments
Hi @SamAI-Software, please contact @erictleung to confirm a copy of the file is available. cc @erictleung
@SamAI-Software thanks for looking at the data. I did realize this and started changing the salaries, removing dollar signs and averaging the ranges I found. I see you've already found my PR on this over in #29. I think we can close this issue and just carry the conversation over there.
@erictleung cool, closing this issue.
@erictleung, I'm reopening this issue because the PR has already been merged. Everything seems to be good except for 3 variables: CommuteTime, HomeMortgageOwe, StudentDebtOwe. I investigated CommuteTime for a bit.
There are 84 answers of more than 5 hours, and 71 of them are from non-native English speakers.
So my bet is that many non-native speakers misread the question:
And they thought that we were asking how long their working day is.
So for CommuteTime I suggest converting all answers greater than 300 (5 hours) into NA, not into 300, because we have no idea what their real commute time is: they either misunderstood the question or entered a totally unrealistic number, like 600 or 1000.
For HomeMortgageOwe and StudentDebtOwe we just need min and max values, because mortgages like "35" or "10 000 000" don't look trustworthy.
Summary: CommuteTime
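A minimal Python sketch of the cutoff rule proposed above, for illustration only. The helper name `out_of_range_to_na` and the sample answers are made up; the only value taken from this thread is the 300-minute limit for CommuteTime. No bounds have been agreed for HomeMortgageOwe or StudentDebtOwe, so none are shown.

```python
# Only the 300-minute cutoff comes from the discussion; everything else is hypothetical.
COMMUTE_MAX_MINUTES = 300


def out_of_range_to_na(value, lower=None, upper=None):
    """Return value unchanged if it lies within [lower, upper], else None (NA).

    Out-of-range answers become NA instead of being clipped to the bound,
    because the real value is unknown.
    """
    if value is None:
        return None
    if lower is not None and value < lower:
        return None
    if upper is not None and value > upper:
        return None
    return value


# Invented sample answers in minutes, not real survey data.
commute_times = [15, 45, 300, 600, 1000, None]
cleaned = [out_of_range_to_na(v, upper=COMMUTE_MAX_MINUTES) for v in commute_times]
```

The same helper could take both `lower` and `upper` for the two debt variables once min and max values are agreed.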
Sorry, this has been long overdue. I've got most of the code ready. I was going to try and get an updated data dictionary at the same time, but maybe I should just settle on the data first and then the data dictionary later. I'm away from my primary development environment until next week. So I'll try to get a PR in by the end of next week dealing with these normalization issues. I also found some spelling mistakes, which I fixed as well.
@erictleung cool, but also consider that the data dictionary was already PR-ed with missing variables by @M0nica
Are we planning to normalize data for questions with open answers?
For example, 5. About how much money do you expect to earn per year at your first developer job (in US Dollars)?
Avg.$ = $53K per year, but some answers have $800K and more.
If we set the upper bound to $200K per year, then avg.$ = $48K annually.
Also, some answers are written with "K" ($70K), and some with full numbers ($70000).
And some answers are too small to be annual, but plausible as monthly. In many countries people never use annual salaries, only monthly ones. So lots of people were probably confused and wrote their monthly expectations.
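To make the formatting problem concrete, here is a rough Python sketch of how mixed answers such as "$70K" and "$70000" could be brought to one numeric form. The function name `parse_money` is hypothetical, and replacing a range with the average of its ends is an assumption borrowed from the approach mentioned in the comments. It does not try to detect monthly answers, since no threshold for that has been agreed.

```python
import re


def parse_money(text):
    """Convert a free-text money answer to a float in dollars, or None.

    Handles "$", thousands separators (commas or spaces) and a trailing "K".
    If several numbers are found (e.g. a range), their average is returned.
    """
    # Drop whitespace, dollar signs and commas, then lowercase so "K" == "k".
    cleaned = re.sub(r"[\s$,]", "", text).lower()
    # Each match is (number, optional "k" suffix meaning thousands).
    matches = re.findall(r"(\d+(?:\.\d+)?)(k?)", cleaned)
    if not matches:
        return None
    values = [float(num) * (1000 if suffix == "k" else 1) for num, suffix in matches]
    return sum(values) / len(values)


# Illustrative inputs only, not taken from the survey file.
examples = ["$70K", "$70000", "50,000-60,000", "no idea"]
parsed = [parse_money(answer) for answer in examples]
```

A mixed range like "50-60K" is not handled sensibly by this sketch (the first number is read as 50 dollars), so answers like that would still need manual review.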
I did some normalizing, but in the end the average expectation didn't change a lot.
Original avg.$ = $53K/year VS $52K/year for normalized data.
Normalizing is a good practice, but it didn't change much in this example, so are we planning to do it? If yes, then we need to agree on conditions.
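One way to make "conditions" concrete is to compare the average with and without an upper bound. The Python sketch below uses invented numbers, not the survey data, and the helper name `mean_with_cap` is hypothetical. It assumes answers above the bound are dropped rather than clipped to it, which is itself one of the conditions that would need agreement.

```python
from statistics import mean


def mean_with_cap(values, upper):
    """Mean of the answers at or below `upper`; larger answers are dropped."""
    kept = [v for v in values if v <= upper]
    return mean(kept) if kept else None


# Invented sample of expected annual salaries in dollars (not survey data).
expected_salaries = [40_000, 50_000, 60_000, 70_000, 800_000]

raw_avg = mean(expected_salaries)
capped_avg = mean_with_cap(expected_salaries, upper=200_000)
```

With only a handful of answers, a single extreme value shifts the average a lot; how much it matters on the full data set depends on how many such answers there are.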
2016FCC_moneyAnnual_v.0.2.xlsx