\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{natbib}
\usepackage{graphicx}
\usepackage[]{endfloat}
\usepackage[dvipsnames]{xcolor}
\usepackage{longtable,pdflscape, graphicx, booktabs, dcolumn, listings, amsmath,bbm,courier,pgffor,fixltx2e, natbib,enumerate,amssymb,amsfonts,color,hyperref,array,calc,multirow,tikz,amsthm,anyfontsize,bbm}
\DeclareDelayedFloatFlavour*{longtable}{table}
\title{Public-private wage gap in Brazil: a counterfactual quantile analysis using longitudinal data}
\author{Vítor Costa}
\date{}
\begin{document}
\maketitle
\tableofcontents
\section{Introduction and motivation}
Anecdotal evidence in Brazil suggests public sector workers are overcompensated compared to their private sector counterparts. In fact, (brief list of references for the Brazilian economy and their estimates). (Highlight the discrepancy of these estimates). (highlight their lack of treatment for endogenous selection).
Add
\begin{enumerate}
\item Segmentation (the two different legal frameworks, CLT vs Statutory)
\item Endogenous selection (selection via highly competitive civil exams)
\item Increasing share of personnel expense in the national budget
\item Results for other countries
\end{enumerate}
This paper's contribution to this literature is twofold: I apply a quantile Oaxaca-Blinder decomposition method and use the panel structure of the data to estimate worker fixed effects. The decomposition allows me to estimate the composition and structural effects of the public-private gap along the distribution of wages, while the correction for endogenous selection relies on \cite{canay_simple_2011}.
Add
\begin{enumerate}
\item conclusions
\item sections layout
\end{enumerate}
\section{Previous Evidence}\label{section:lit-review}
\section{Brazilian Civil Service Panorama}
\section{Data}
The data come from the \emph{Relatório Anual de Informações Sociais}, or simply, RAIS. The RAIS is a yearly data set of matched employer-employee administrative records kept by the Brazilian federal government, primarily to consolidate the information necessary for the execution of social programs, such as social security and benefits for low earners. It also aims to support policy makers who depend on labor market statistics. \\
\noindent
Data collection from business establishments was mandated by law in 1975 and has been carried out electronically since 1997. RAIS covers the whole universe of the formal labor market in Brazil. As of 2007, it contained information on approximately 6.9 million establishments and 37.6 million workers. \\
\noindent
RAIS offers a wide range of variables. Each worker and each establishment has its own unique identifier. An employer-employee observation contains data on monthly average earnings, occupation, industry sector, municipality of the establishment, date of worker admission, date of worker separation, cause of separation (retirement, death, just cause, etc.), and tenure of the worker in the current job. The researcher can also observe employees' demographic variables, such as race, gender, age, schooling, nationality, and disability status. \\
\subsection{Sample selection}
\noindent
The data span the years 2003 to 2014, amounting to approximately 781 million observations. Because estimating the model on the full data set is computationally infeasible, I select a 1\% sample based on the worker unique identifier. Following standard practice in the literature, I restrict the sample to prime-age workers, i.e. those aged between 25 and 54. For workers with more than one job, I keep only the record of the highest paying appointment. I also drop rural workers and employees under temporary contracts. The final sample has approximately 3.95 million job-year observations and 683,821 workers.
\subsection{Characterization of Public Employees}
\noindent
I consider as public employees all workers whose employers are part of the federal administration, which includes the judicial, legislative, and executive branches. The choice of the federal sphere is motivated by two considerations. First, it is the most relevant for policy: the federal budget is subject to greater scrutiny from the other branches of government, the public, and international organizations. Second, owing to the greater transparency of the National Treasury's bookkeeping, I am able to provide a more reliable estimate of the fiscal impact of the public-private wage gap.
\subsection{RAIS versus PNAD and PME}
All the papers mentioned in the literature review of Section \ref{section:lit-review} rely on one of two sources of employment records. The most widely used is the \emph{Pesquisa Nacional por Amostra de Domicílios} - or simply, PNAD - which is a yearly household survey. The main drawback of PNAD relative to RAIS is that the household survey does not follow individuals across distinct time periods, whereas, as I show in Section \ref{section:end-sel}, my correction for endogenous selection relies on the identification of worker fixed effects.
On the other hand, PNAD covers the informal sector of the labor market, which is absent from RAIS. Using PNAD records, \cite{filho_evolucao_2015} show that informal workers accounted for 32.5\% of the total work force in 2012, down from 42.3\% in 2003. From a policy perspective, however, the public sector wage premium should be estimated relative to the formal private sector, for two main reasons. First, the federal government cannot post informal job vacancies: it must offer its employees a formal arrangement. Second, and more importantly, the composition of observable attributes of public sector employees is much closer to that of the formal private sector pool than to that of the informal one\footnote{\textcolor{red}{I have to add more evidence on this, but \cite{corseuil_criterios_2015} find that informal sector workers are more likely to be women, less likely to be white, and have lower educational attainment and labor attachment.}}. \\
\noindent
\cite{emilio_evaluating_2012}, on the other hand, estimate the wage gap from the \emph{Pesquisa Mensal do Emprego} - PME. Like PNAD, the PME surveys workers in the informal sector, but each household is also re-surveyed after a period of one year. Using demographic variables, \cite{emilio_evaluating_2012} match individuals within the same household across the two periods of observation and estimate an individual fixed effect to net out the effect of endogenous selection into the public sector. \\
\noindent
The disadvantages of the PME are related to its territorial coverage - only 30\% of the population in the 2002-2004 interval examined by \cite{emilio_evaluating_2012} - and the reduced number of time periods available to estimate the individual specific effects - $T=2$. As \cite{canay_simple_2011} shows, the consistency of worker fixed effects relies on large $T$ asymptotics. With my sample of RAIS I cover the whole of the Brazilian territory and I am able to estimate worker specific effects with up to 12 yearly observations\footnote{\textcolor{red}{Upon update of the data to include years 2015, 2016 and 2017, I'll be able to re-estimate individual fixed effects with up to 15 yearly observations.}}.
\section{Equations for estimations}
\subsection{Naïve equations}
Throughout this paper I refer to the model and estimation of the premium in the absence of any correction for endogenous selection as naïve. If we ignore the possibility that individuals with higher unobserved ability sort into the public sector, we can consistently estimate the public-private premium by means of an Oaxaca-Blinder decomposition \citep{oaxaca_male-female_1973,blinder_wage_1973}. \\
\noindent
Let $y^{1}_{it}$ denote the observed wage of individual $i$ in year $t$ when working in the public sector, and $y^{0}_{it}$ the wage of individual $i$ in year $t$ when working in the private sector. Given a set $X_{it}$ of observed individual characteristics, and sector-specific error terms with zero conditional mean, the wage in each sector is given by
\begin{align}
y_{it}^{1} &= X_{it}^{1}\beta^{1}+u_{it}^{1}\\
y_{it}^{0} &= X_{it}^{0}\beta^{0}+u_{it}^{0} \label{wage-priv}
\end{align}
which, after taking averages over individuals within each sector and adding and subtracting $\bar{X}_{t}^{1}\beta^{0}$, can be rewritten as
\begin{align}
\bar{y}_{t}^{1} - \bar{y}_{t}^{0} &= \bar{X}_{t}^{1}\beta^{1} -\bar{X}_{t}^{0}\beta^{0} \\
&= \underbrace{\left(\bar{X}_{t}^{1}-\bar{X}_{t}^{0}\right)\beta^{0}}_{\text{Characteristics}} +\underbrace{\bar{X}_{t}^{1}\left(\beta^{1}-\beta^{0}\right)}_{\text{Structural}} \label{ob-str}
\end{align}
\noindent
The left-hand side of equation (\ref{ob-str}) is the observed mean wage difference between public and private sector workers. The right-hand side breaks this difference into two terms that I will refer to as the \emph{characteristics} and \emph{structural} effects. The characteristics effect is the part of the observed mean wage difference that is due to differences in the observed attributes of workers in the two groups, holding fixed the coefficients of the wage equation for group $0$. It is larger when, holding fixed the parameters of equation (\ref{wage-priv}), workers in the public sector have higher (lower) mean values of the attributes associated with positive (negative) coefficients in that equation. The structural effect is the remainder: the part due to differences in coefficients between the two sectors, evaluated at the mean attributes of public sector workers. \\
\noindent
This is indeed the case in the sample. Table~\ref{table:all} shows that workers in the public sector have higher educational attainment and longer tenure in the current job, are about four years older on average, and are marginally more likely to be white.
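
\noindent
A caveat on equation (\ref{ob-str}) is that the decomposition is not unique: it depends on which sector's coefficients serve as the reference. As an illustrative sketch, not a specification used elsewhere in this draft, adding and subtracting $\bar{X}_{t}^{0}\beta^{1}$ instead of $\bar{X}_{t}^{1}\beta^{0}$ yields
\begin{align*}
\bar{y}_{t}^{1} - \bar{y}_{t}^{0} &= \left(\bar{X}_{t}^{1}-\bar{X}_{t}^{0}\right)\beta^{1} + \bar{X}_{t}^{0}\left(\beta^{1}-\beta^{0}\right),
\end{align*}
which prices attribute differences at public sector coefficients and evaluates coefficient differences at private sector mean attributes. The two versions differ by the interaction term $\left(\bar{X}_{t}^{1}-\bar{X}_{t}^{0}\right)\left(\beta^{1}-\beta^{0}\right)$, so reporting both is one way to gauge how sensitive the split is to the choice of reference group.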
\begin{longtable}[c]{@{}lccc@{}}
\caption{Descriptive statistics by sector}\label{table:all}\tabularnewline
\toprule
 & Public (N=797128) & Private (N=3151791) & Total (N=3948919)\tabularnewline
\midrule
\endfirsthead
\toprule
 & Public (N=797128) & Private (N=3151791) & Total (N=3948919)\tabularnewline
\midrule
\endhead
Hourly wage (2011 US\$) & & &\tabularnewline
- Mean (SD) & 10.328 (14.181) & 5.823 (14.201) & 6.733 (14.311)\tabularnewline
- Median (Q1, Q3) & 6.100 (3.111, 11.958) & 3.191 (2.306, 5.386) & 3.452 (2.387, 6.549)\tabularnewline
- Range & 0.285 - 1189.725 & 0.269 - 5497.925 & 0.269 - 5497.925\tabularnewline
Age & & &\tabularnewline
- Mean (SD) & 40.072 (8.129) & 35.858 (7.991) & 36.708 (8.196)\tabularnewline
- Median (Q1, Q3) & 40.000 (33.000, 47.000) & 34.000 (29.000, 42.000) & 36.000 (30.000, 43.000)\tabularnewline
- Range & 25.000 - 54.000 & 25.000 - 54.000 & 25.000 - 54.000\tabularnewline
Tenure & & &\tabularnewline
- Mean (SD) & 123.947 (102.908) & 41.529 (55.963) & 58.166 (75.708)\tabularnewline
- Median (Q1, Q3) & 95.300 (32.900, 200.900) & 20.900 (7.800, 50.900) & 26.800 (9.500, 72.900)\tabularnewline
- Range & 0.000 - 487.000 & 0.000 - 486.900 & 0.000 - 487.000\tabularnewline
Gender & & &\tabularnewline
- Male & 321731 (40.4\%) & 1973298 (62.6\%) & 2295029 (58.1\%)\tabularnewline
- Female & 475397 (59.6\%) & 1178493 (37.4\%) & 1653890 (41.9\%)\tabularnewline
Race & & &\tabularnewline
- N-Miss & 207504 & 16583 & 224087\tabularnewline
- White & 375817 (63.7\%) & 1981624 (63.2\%) & 2357441 (63.3\%)\tabularnewline
- Nonwhite & 213807 (36.3\%) & 1153584 (36.8\%) & 1367391 (36.7\%)\tabularnewline
Education & & &\tabularnewline
- Low & 194169 (24.4\%) & 1386563 (44.0\%) & 1580732 (40.0\%)\tabularnewline
- Medium & 315961 (39.6\%) & 1410160 (44.7\%) & 1726121 (43.7\%)\tabularnewline
- High & 286998 (36.0\%) & 355068 (11.3\%) & 642066 (16.3\%)\tabularnewline
\bottomrule
\end{longtable}
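
\noindent
As a back-of-the-envelope check on the left-hand side of equation (\ref{ob-str}), the raw gap can be computed directly from the hourly wage rows of Table~\ref{table:all}. The figures below pool all years and assume, purely for illustration, that wages are compared in levels rather than in logs:
\begin{align*}
\bar{y}^{1}-\bar{y}^{0} &= 10.328 - 5.823 = 4.505, & \bar{y}^{1}/\bar{y}^{0} &= 10.328/5.823 \approx 1.77,\\
\operatorname{med}(y^{1})-\operatorname{med}(y^{0}) &= 6.100-3.191 = 2.909, & \operatorname{med}(y^{1})/\operatorname{med}(y^{0}) &= 6.100/3.191 \approx 1.91.
\end{align*}
The unconditional public-to-private wage ratio is thus about 1.77 at the mean and 1.91 at the median. These raw figures adjust neither for observed attributes nor for selection.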
\subsection{Distribution of the wage gap}
There is no reason \emph{a priori} to believe that the premium is uniform across low and high wage levels. I am interested not only in describing the \emph{mean} wage gap but also in the distribution of the public sector premium along the whole wage distribution. In order to estimate the distribution of the premium, I use a quantile version of the Oaxaca-Blinder decomposition as proposed by \cite{chernozhukov_inference_2013}.\\
\noindent
Going back to equation (\ref{ob-str}), we can interpret the expression $\bar{X}_{t}^{1}\beta^{0}$ as the counterfactual wage of an average employee in group $1$ were she rewarded according to the $\beta$ estimates from group $0$. Similarly, we can compute the counterfactual distribution - over different quantiles - of wages of public employees via the method in \cite{chernozhukov_inference_2013}. An exposition of the actual procedure to obtain counterfactual quantiles lies beyond the scope of this paper, but I refer the reader to \cite{hospido_public_2016} for a step-by-step explanation of the method in the Oaxaca-Blinder context. \cite{chernozhukov_inference_2013} also lay out the bootstrap procedure used to construct confidence intervals around the quantile estimates of the characteristics and structural effects in equation (\ref{ob-str}).
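
\noindent
Without going into estimation details, the objects being compared can be stated compactly. The notation below is shorthand in the spirit of \cite{chernozhukov_inference_2013} and is not used elsewhere in this draft. Let $F_{Y\langle j\mid k\rangle}$ denote the distribution of wages that would prevail if workers with the attributes of group $k$ were paid according to the wage structure of group $j$,
\begin{equation*}
F_{Y\langle j\mid k\rangle}(y) = \int F_{Y^{j}\mid X^{j}}(y\mid x)\, dF_{X^{k}}(x), \qquad j,k\in\{0,1\},
\end{equation*}
and let $Q_{\tau}\langle j\mid k\rangle$ be its $\tau$-th quantile. The quantile analogue of equation (\ref{ob-str}) is then
\begin{equation*}
Q_{\tau}\langle 1\mid 1\rangle - Q_{\tau}\langle 0\mid 0\rangle = \underbrace{Q_{\tau}\langle 0\mid 1\rangle - Q_{\tau}\langle 0\mid 0\rangle}_{\text{Characteristics}} + \underbrace{Q_{\tau}\langle 1\mid 1\rangle - Q_{\tau}\langle 0\mid 1\rangle}_{\text{Structural}},
\end{equation*}
where $Q_{\tau}\langle 0\mid 1\rangle$ is the counterfactual quantile: the wage at quantile $\tau$ that public employees would earn under the private sector wage structure. In practice the conditional distribution $F_{Y^{j}\mid X^{j}}$ has to be estimated, for instance by quantile regression or distribution regression.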
\subsection{Accounting for endogenous selection} \label{section:end-sel}
\begin{enumerate}
\item Discuss \cite{canay_simple_2011} and outline equation for worker effects estimation
\item Outline net of FEs equation
\end{enumerate}
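
\noindent
A possible outline for the first item, under the assumptions in \cite{canay_simple_2011} that the worker effect $\alpha_i$ is a pure location shift (the same at every quantile $\tau$) and that the number of time periods is large. The symbols $\theta(\tau)$, $\hat{\theta}_{\mu}$, $T_i$ and $\tilde{y}_{it}$ are placeholders and may differ from the final notation; allowing the number of observed years $T_i$ to vary by worker is also an assumption made here to accommodate an unbalanced panel:
\begin{align*}
y_{it} &= X_{it}\theta(\tau) + \alpha_{i} + u_{it}(\tau), \\
\hat{\alpha}_{i} &= \frac{1}{T_i}\sum_{t}\left(y_{it} - X_{it}\hat{\theta}_{\mu}\right), \\
\tilde{y}_{it} &= y_{it} - \hat{\alpha}_{i},
\end{align*}
where $\hat{\theta}_{\mu}$ is the within (fixed effects) estimator of the conditional mean equation and the sum runs over the $T_i$ years in which worker $i$ is observed. The second step is a standard quantile regression of $\tilde{y}_{it}$ on $X_{it}$, which would supply the wage equation net of worker effects for the decomposition.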
\section{Estimation Results}
\subsection{Worker Effects}
The estimated distribution of worker effects is depicted in Figure \ref{fig:we}. The set of possible values for \textcolor{red}{$\alpha_i$ in equation XX is similar across both sectors}. What changes between the two sectors is not the range of individual effects but their probability density. One can find public and private workers with practically any value of the worker effect in that range, but the density of above-average values is notably higher among public sector workers.
\begin{figure}[h!]
\caption{Worker Effects by Sector}
\label{fig:we}
\includegraphics[scale=0.7]{graphs/001_fe_sector_onepc.pdf}
\end{figure}
\subsection{Oaxaca-Blinder (OB)}
\begin{enumerate}
\item Report overall OB
\item Plot yearly OB in both cases
\end{enumerate}
The observed wage gap in the whole sample is x\%, of which y\% is due to differences in workers' observed attributes and the remaining z\% stems from differences in coefficients. Figure \ref{fig:yearly} shows that, for the first four years in the sample, the coefficients effect in the Oaxaca-Blinder decomposition was negative for public employees, meaning that, holding fixed their observed characteristics, their wages would have been higher, on average, in the private sector. \\
\noindent
However, once we account for endogenous selection, not only does the overall premium get compressed but...
\begin{figure}[h!]
\caption{Oaxaca-Blinder Decomposition by Year}
\includegraphics[scale=0.65]{graphs/002_yearly_ob_onepc.pdf}
\label{fig:yearly}
\end{figure}
\subsection{Distribution of the Premium}
\begin{enumerate}
\item Plot overall distribution in two cases
\item Show distribution estimates broken by demographic groups
\end{enumerate}
\section{Conclusion}
\begin{enumerate}
\item what do we learn from the quantile decomposition? where does the premium phase out?
\item is self-selection important? by how much?
\item what is the size of premium once we account for selection?
\item show expenses with inactive civil employees and its evolution in time.
\item state the discrepancy of retirement entitlements for both types of workers and problems for future research.
\end{enumerate}
\newpage
\bibliographystyle{chicago}
{\scriptsize
\bibliography{library} %bibtex file name without .bib
}
\end{document}