Commit

Merge pull request #412 from worldbank/la-word-search
Add marias suggestions
kbjarkefur authored Feb 26, 2020
2 parents f30a0e7 + c783462 commit 8ee1277
Showing 1 changed file with 8 additions and 6 deletions.
14 changes: 8 additions & 6 deletions chapters/data-analysis.tex
@@ -90,7 +90,8 @@ \subsection{Organizing your folder structure}

\subsection{Breaking down tasks}

-We divide the process of transforming raw datasets to analysis-ready datasets to research results into four steps:
+We divide the process of transforming raw datasets to research outputs into
+four steps:
de-identification, data cleaning, variable construction, and data analysis.
Though they are frequently implemented concurrently,
creating separate scripts and datasets prevents mistakes.
@@ -206,7 +207,8 @@ \section{De-identifying research data}
as you can always go back and remove variables from the list of variables to be dropped,
but you can not go back in time and drop a PII variable that was leaked
because it was incorrectly kept.
-Examples include respondent names and phone numbers, enumerator names, tax payer numbers, and addresses.
+Examples include respondent names and phone numbers, enumerator names, taxpayer
+numbers, and addresses.
For each confidential variable that is needed in the analysis, ask yourself:
\textit{can I encode or otherwise construct a variable that masks the confidential component, and
then drop this variable?}
@@ -362,7 +364,7 @@ \subsection{Documenting data cleaning}
or that you intend to release as part of a replication package or data publication.

Another important component of data cleaning documentation are the results of data exploration.
-As clean your dataset, take the time to explore the variables in it.
+As you clean your dataset, take the time to explore the variables in it.
Use tabulations, summary statistics, histograms and density plots to understand the structure of data,
and look for potentially problematic patterns such as outliers,
missing values and distributions that may be caused by data entry errors.
@@ -380,9 +382,9 @@ \section{Constructing analysis datasets}
as planned during research design\index{Research design},
and using the pre-analysis plan as a guide.\index{Pre-analysis plan}
During this process, the data points will typically be reshaped and aggregated
-so that level of the dataset goes from the unit of observation
-(one item in the bundle) in the survey to the unit of analysis (the household).\sidenote{
-\url{https://dimewiki.worldbank.org/Unit\_of\_Observation}}
+so that level of the dataset goes from the unit of observation in the survey
+to the unit of analysis.\sidenote{\url{
+https://dimewiki.worldbank.org/Unit\_of\_Observation}}


A constructed dataset is built to answer an analysis question.