
\chapter{State-Building through Public Land Disposal? An Application of Matrix Completion for Counterfactual Prediction}\label{land-reform}

\begin{quote}  
	\textbf{Summary:} How would the frontier have evolved in the absence of homestead policies? I apply a matrix completion method to predict the counterfactual time-series of frontier state capacity had there been no homesteading. In placebo tests, the matrix completion method outperforms synthetic controls and other regression-based estimators in terms of minimizing prediction error. Causal estimates indicate that homestead policies had significant and long-lasting negative impacts on state government expenditure and revenue. These results are similar to difference-in-differences estimates that exploit variation in the timing and intensity of homestead entries aggregated from 1.46 million individual land patent records.
\end{quote}

\clearpage

\section{Introduction}
\noindent
Political scientists are increasingly interested in patterns of state development across time and place. Several scholars \citep[e.g.,][]{bensel1990,murtazashvili2013political,frymer2014rush} theorize a relationship between mid-nineteenth century public land policies and the development of the state, arguing that policies designed to transfer public land to private individuals increased the bureaucratic capacity of the U.S. federal government to administer land. 

Public land policies had long-lasting impacts on state capacity, or the ability of governments to finance and implement policies \citep{besley2010state}. I explore the role of two U.S. public land policies in shaping state capacity: the Homestead Act (HSA) of 1862, which opened for settlement hundreds of millions of acres of western frontier land, and the Southern Homestead Act (SHA) of 1866, which opened over 46 million acres of land for homesteading. I provide evidence that homesteads authorized under these laws had significant long-run impacts on the capacity of frontier state governments. 

The view that the western frontier had long-lasting impacts on the evolution of democratic institutions can be traced to \citet{turner1956significance}. Turner's ``frontier thesis'' posits that homestead policies acted as a ``safety valve'' for relieving pressure from congested urban labor markets in eastern states. The view of the frontier as a ``safety valve'' has been explored by \citet{ferrie1997migration}, who finds evidence in a linked census sample of substantial migration to the frontier by unskilled workers and considerable gains in wealth for these migrant workers. Homestead policies not only offered greater economic opportunities to eastern migrants, but also the sparse population on the western frontier meant that state and local governments competed with each other to attract migrants in order to lower local labor costs and to increase land values and tax revenues. Frontier governments offered migrants broad access to cheap land and property rights, unrestricted voting rights, and a more generous provision of schooling and other public goods \citep{engerman2005evolution}.

\citet{garcia2008myth} test the frontier thesis in a global context and conclude that the economic effect of the frontier depends on the quality of political institutions at the time of frontier expansion. Frontier expansion promotes equitable outcomes only when societies are initially democratic. When institutional quality is weak, the existence of frontier land can yield worse developmental outcomes because non-democratic political elites can monopolize frontier lands. Historical scholars have noted that public land policies were often exploited by land speculators, ranchers, miners, and loggers to accumulate public land and extract natural resources during the early stages of capitalist development \citep{gates1942role,murtazashvili2013political}. According to this view, homesteading laws were \emph{de jure} social policies but \emph{de facto} corporate welfarism. 

The paper makes a methodological contribution in applying an alternative method for estimating causal impacts of policy interventions on time-series cross-section data. Building on a new literature that uses machine learning algorithms such as L1-regularized linear regression \citep{doudchenko2016balancing} or deep neural networks \citep{poulos2017rnn} for counterfactual prediction, I apply a matrix completion method to predict the treated unit time-series in the absence of the intervention. I perform placebo tests and find that the matrix completion method outperforms the synthetic control method and other regression-based estimators in terms of minimizing prediction error. In addition, I show how to evaluate the overall effect of the policy intervention using a randomization inference procedure in which approximately unbiased $p$-values are obtained under minimal assumptions.

The paper proceeds as follows: in Section \ref{history-ch1}, I review the historical context of homestead policies and their relationship to state capacity and land inequality; Section \ref{estimation} describes the method of matrix completion for counterfactual prediction, benchmarks the method against alternative estimators, and describes the inferential procedure. In Section \ref{state-capacity}, I report the results of placebo tests to verify the consistency of the matrix completion estimator. I then present estimates of the long-run impacts of homestead policies on state capacity. Section~\ref{DID} reports DID estimates of the effect of homesteads on state capacity and land inequality, and Section~\ref{discussion-ch1} concludes. 

\section{Historical background} \label{history-ch1}

The 1862 HSA opened up hundreds of millions of acres of western public land for settlement. The HSA provided that any adult citizen (including women, immigrants who had applied for citizenship, and freed slaves following the passage of the Fourteenth Amendment) could apply for a homestead grant of 160 acres of frontier land. Applicants were required to live and make improvements on the land for five years before filing to claim a homestead land grant. 

Under the HSA, the bulk of newly surveyed land on the western frontier was reserved for homesteads, although the law did not end sales of public land. The explicit goal of the HSA was to liberalize the homesteading requirements set by the Preemption Act of 1841, which permitted individuals already inhabiting public land to purchase up to 160 acres at \$1.25 per acre before the land was put up for sale. The implicit goal was to promote rapid settlement on the western frontier and reduce the federal government's enforcement costs \citep{allen1991homesteading}. 

In the pre-Reconstruction South, public land was not open to homesteading but rather to unrestricted cash entry, which permitted the direct sale of public land to private individuals of 80 acres or more for at least \$1.25 an acre. The 1866 SHA restricted cash entry and reserved for homesteading over 46 million acres of public land, or about one-third of the total land area in the five southern public land states (PLS) \citep[p.~13]{lanza1999agrarianism}. PLS are states created out of the public domain. In the South, these states include Alabama, Arkansas, Florida, Louisiana, and Mississippi. Western PLS include the 25 states that comprise the Midwestern, Southwestern, and Western U.S. (except Hawaii). 

Homestead policies were often exploited by speculators and corporations through fraudulent filings. Speculators and corporations engaged in the practice of paying individuals to stake out homesteads in order to extract resources from the land with no intention of filing for the final patent. In the South, these ``dummy entry-men'' were used by timber and mining companies to extract resources while the cash entry restriction of the SHA was in effect. When the restriction was removed, there was no need for fraudulent filings because the larger companies could buy land in unlimited amounts at a nominal price \citep{gates1940federal, gates1979federal}. The same pattern of fraudulent filings existed in the West, where \citet{murtazashvili2013political} argues that speculators benefited disproportionately from public land policies because the economic balance of power tilted toward the wealthy. \citet{gates1942role} characterizes western speculators who bought land in bulk prior to the 1889 restriction as being influential in state and local governments, resistant to paying taxes, and opposed to government spending. 

\section{Matrix completion for counterfactual prediction} \label{estimation}

An important problem in the social sciences is estimating the effect of a binary intervention on an outcome over time. When interventions take place at an aggregate level (e.g., a state), researchers make causal inferences by comparing the post-intervention (``post-period'') outcomes of affected (``treated'') units against the outcomes of unaffected (``control'') units. A common approach to the problem is the synthetic control method, which predicts the counterfactual outcomes of treated units by finding a convex combination of control units that match the treated units in terms of lagged outcomes. Correlations across units are assumed to remain constant over time. 

This paper applies the method of matrix completion via nuclear norm minimization (MC-NNM) proposed by \citet{athey2017matrix} to predict counterfactual outcomes. Matrix completion methods \citep[e.g.,][]{mazumder2010spectral} exploit correlations within and across units, but ignore the temporal dimension of the data. These methods typically assume missing values are sampled uniformly at random \citep{yoon2018estimating}; in contrast, the MC-NNM estimator allows for patterns of missing data to have a time-series dependency structure. 

Let $Y$ denote a $\text{N} \times \text{T}$ matrix of outcomes for each unit $i =1, \ldots, \text{N}$ at time $t = 1, \ldots, \text{T}$. $Y$ is incomplete because we observe each element $Y_{it}$ for only the control units and the treated units prior to time of initial treatment exposure, $\text{T}_0 < \text{T}$. Let $\mathcal{O}$ denote the set of $(it)$ values that are observed and $\mathcal{M}$ the set of $(it)$ missing values. Let the values of the $\text{N} \times \text{T}$ complete matrix $M$ be $M_{it} =1 $ if $(it) \in \mathcal{M}$ and $M_{it} = 0$ if $(it) \in \mathcal{O}$.\footnote{Note that the process that generates $M$ is referred to as the assignment mechanism in the causal inference literature \citep{imbens2015causal} and the missing data mechanism in missing data analysis \citep{little2014}.}

We cannot directly observe counterfactual outcomes and instead wish to impute missing values in $Y$ for treated units with $M_{it}=1$. In an observational setting, units are part of the assignment mechanism that generates $M$ and patterns of missing data follow one of two specific structures. In the case of simultaneous adoption of treatment, a subset of units are exposed to treatment at time $\text{T}_0$ and every subsequent period. The second structure arises from the staggered adoption setting, where $\text{T}_0$ may vary across treated units. In either case, there are selection biases because the probability of missingness may depend on the unobserved data. The goal is to accurately estimate the effect of a policy intervention despite incomplete data subject to selection bias. 
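
To make the two missingness structures concrete, consider a purely illustrative panel (not drawn from the data analyzed here) with $\text{N}=4$ units and $\text{T}=5$ periods, in which the first two units are controls and entries are missing for $t > \text{T}_0$. The superscripts ``sim'' and ``stag'' are labels introduced only for this example. Under simultaneous adoption with $\text{T}_0 = 3$ for both treated units, and under staggered adoption with $\text{T}_0 = 2$ for the third unit and $\text{T}_0 = 4$ for the fourth, the matrix $M$ would take the forms

\[
M^{\text{sim}} = \left(\begin{array}{ccccc}
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1
\end{array}\right),
\qquad
M^{\text{stag}} = \left(\begin{array}{ccccc}
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1
\end{array}\right).
\]

\noindent
Entries equal to one mark the post-period outcomes of treated units that must be imputed; in both cases the missing entries form contiguous blocks in time rather than being scattered at random.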

\subsection{Matrix completion estimator}

Matrix completion methods attempt to impute missing entries in a low-rank matrix by solving a convex optimization problem via NNM, even when relatively few values are observed in $Y$ \citep{candes2009exact,candes2010matrix}. The MC-NNM estimator is 

\begin{equation}
Y_{it} = L_{it}^{*} + \sum_{p=1}^{\text{P}} X_{ip} \upbeta_{p}^{*} + \upgamma_{i}^{*} + \updelta_{t}^{*} + \upepsilon_{it} \label{eq:mc-Y}
\end{equation}

\noindent
where $L^{*}$ is a low-rank matrix to be estimated, $X$ is a $\text{N} \times \text{P}$ matrix of normalized, unit-specific covariates, and $\boldsymbol{\upgamma}^{*}$ and $\boldsymbol{\updelta}^{*}$ are vectors of unit and time effects, respectively. The identifying condition is that, conditional on $L^{*}$, the error vector $\boldsymbol{\upepsilon}$ is independent across rows (units) and $\E[\boldsymbol{\upepsilon} | L^{*} + \boldsymbol{\upbeta}^{*} + \boldsymbol{\upgamma}^{*} + \boldsymbol{\updelta}^{*}] = 0$. Estimating $L^{*}$ involves minimizing the sum of squared errors via nuclear norm regularized least squares:

\begin{equation}
\hat{L}, \, \hat{\upbeta} = \arg\min_{L, \,\boldsymbol{\upbeta}} \Bigg[\sum_{(it) \in \mathcal{O}} \frac{1}{|\mathcal{O}|} \bigg(Y_{it} - L_{it} - \sum_{p=1}^{\text{P}} X_{ip} \upbeta_p - \upgamma_{i} - \updelta_{t} \bigg)^2 + \lambda \norm{L}_\star \Bigg], \label{eq:mc-opt}
\end{equation}

\noindent
where $\lambda$ is the regularization term on the nuclear norm $\norm{\cdot}_\star$ --- i.e., sum of singular values --- and its value is selected by cross-validation. The algorithm for (\ref{eq:mc-opt}) iteratively replaces missing values with those recovered from a singular value decomposition (SVD) \citep{mazumder2010spectral}.\footnote{\citet{amjad2018robust} propose an alternative approach of approximating $L^{*}$ via SVD, and then using linear regression on the ``de-noised'' matrix, rather than relying on matrix norm regularizations.}
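
For intuition, the iteration can be sketched in the simplified case without covariates or unit and time effects. The notation $P_{\mathcal{O}}$, $P_{\mathcal{O}}^{\perp}$, and $\text{shrink}_{\theta}$ is introduced here only for illustration, and the sketch follows the general soft-impute scheme of \citet{mazumder2010spectral} rather than documenting the exact implementation used in this paper. Let $P_{\mathcal{O}}(A)$ set to zero the entries of a matrix $A$ outside $\mathcal{O}$, and let $P_{\mathcal{O}}^{\perp}(A) = A - P_{\mathcal{O}}(A)$. For a matrix with SVD $A = U \Sigma V^{\top}$ and singular values $\sigma_1, \sigma_2, \ldots$, define $\text{shrink}_{\theta}(A) = U \tilde{\Sigma} V^{\top}$, where $\tilde{\Sigma}$ replaces each $\sigma_j$ with $\max(\sigma_j - \theta, 0)$. Starting from $L_0 = P_{\mathcal{O}}(Y)$, the updates take the form

\[
L_{k+1} = \text{shrink}_{\lambda |\mathcal{O}| / 2} \Big( P_{\mathcal{O}}(Y) + P_{\mathcal{O}}^{\perp}(L_k) \Big), \hspace{10mm} k = 0, 1, 2, \ldots,
\]

\noindent
and are repeated until the sequence converges. The threshold $\lambda |\mathcal{O}| / 2$ follows from the $1/|\mathcal{O}|$ scaling of the squared-error term in (\ref{eq:mc-opt}): multiplying the objective by $|\mathcal{O}|/2$ gives the standard form of one half the squared error plus $(\lambda |\mathcal{O}| / 2) \norm{L}_\star$, whose minimizer is obtained by soft-thresholding the singular values at that level.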

\citet{athey2017matrix} note two drawbacks of the MC-NNM estimator: first, it penalizes the errors for each value with $M_{it} = 0$ equally without regard to the fact that $\Pr (M_{it} = 1)$ (i.e., the propensity score) increases with $t$. \citet{athey2017matrix} suggest weighting the loss function by the propensity score, which is similar to the importance weighting scheme proposed by \citet{cortes2008sample} to address the problem of covariate shift, which occurs when training and test data are drawn from different distributions. Second, the columns of $\boldsymbol{\upepsilon}$ may be autocorrelated because the estimator does not account for time-series dependencies in the observed data. The estimator detects patterns row- and column-wise, but treats the data as perfectly synchronized \citep{yoon2018estimating}. 

\subsection{Simulations} \label{sims}

In this section, I evaluate the accuracy of the MC-NNM estimator on the following three datasets common to the synthetic control literature, with the actual treated unit removed from each dataset: \possessivecite{abadie2003economic} study of the economic impact of terrorism in the Basque Country during the late 1960s ($\text{N}=16$, $\text{T}=43$); \possessivecite{abadie2010synthetic} study of the effects of a large-scale tobacco control program implemented in California in 1988 ($\text{N}=38$, $\text{T}=31$); and \possessivecite{abadie2015comparative} study of the economic impact of the 1990 German reunification on West Germany ($\text{N}=16$, $\text{T}=44$). For each trial run, I randomly select half of the control units to be treated and predict their counterfactual outcomes for periods following a given $\text{T}_0$. I compare the predicted values to the observed values by calculating the root-mean-square error, $\text{RMSE} = \sqrt{\sum_{it} (L^{*}_{it} -\hat{L}_{it})^2 / (\text{NT})}$. 

I benchmark the MC-NNM estimator against the following methods:

\begin{description}
		{\setlength\itemindent{1mm}
			\item[(a) DID] Regression of $Y$ on $\boldsymbol{\upgamma}$ and $\boldsymbol{\updelta}$ and a binary treatment variable 
			\item[(b) HR-EN] Horizontal regression with elastic net regularization \citep{zou2005regularization, athey2017matrix} 
			\item[(c) PCA] Regularized iterative principal components analysis \citep{ilin2010practical}
			\item[(d) SC-ADH] Synthetic control approach via exponentiated gradient descent \citep{abadie2010synthetic}
			\item[(e) SVD] Low-rank SVD approximation estimated by expectation maximization \citep{troyanskaya2001missing}
			\item[(f) VT-EN] Vertical regression with elastic net regularization, Ibid. (b).
		}
\end{description}

Figure \ref{synth-stag} reports the average prediction error of the estimators in a staggered treatment adoption setting, with the estimates jittered horizontally to reduce overlap. Error bars represent 95\% prediction intervals calculated using the standard deviation of the error distribution for 20 trial runs. 

Across all estimators, the average RMSE decreases and prediction intervals narrow as $\text{T}_0/\text{T}$ approaches unity because the estimators have more information to generate counterfactual predictions. The MC-NNM estimator generally outperforms all other estimators in terms of average RMSE across different ratios $\text{T}_0/\text{T}$.  The strong performance of the MC-NNM estimator can be attributed to the fact that it is capable of using additional information in the form of pre-intervention (``pre-period'') observations of the treated units, whereas the regression-based estimators rely only on the pre-period observations of control units to predict counterfactuals. Figure \ref{synth-sim} presents a similar pattern of results in a simultaneous adoption setting.

\begin{figure}[htbp]
	\centering
	\begin{subfigure}[t]{0.42\textwidth}
		\centering
		\includegraphics[width=\textwidth]{/media/jason/Dropbox/github/land-reform/paper/plots/basque-stag.png}
		\caption{Basque Country terrorism data, $\text{N}_t = 8$} 
	\end{subfigure}
	~ 
	\begin{subfigure}[t]{0.42\textwidth}
		\centering
		\includegraphics[width=\textwidth]{/media/jason/Dropbox/github/land-reform/paper/plots/california-stag.png}
		\caption{California smoking ban data, $\text{N}_t = 19$}
	\end{subfigure}
	~ 
	\begin{subfigure}[t]{0.42\textwidth}
		\centering
		\includegraphics[width=\textwidth]{/media/jason/Dropbox/github/land-reform/paper/plots/germany-stag.png}
		\caption{West German reunification data, $\text{N}_t = 8$}
	\end{subfigure}
	\caption{Placebo tests under staggered treatment adoption: {\protect\tikz \protect\draw[color={rgb:red,4;green,0;yellow,1}] (0,0) -- plot[mark=o, mark options={scale=1.5}] (0.25,0) -- (0.5,0);}, DID;
		{\protect\tikz \protect\draw[color={rgb:orange,4;yellow,2;pink,3}] (0,0) -- plot[mark=triangle*, mark options={scale=1.5,fill=white}] (0.25,0) -- (0.5,0);}, HR-EN; 
		{\protect\tikz \protect\draw[color={rgb:red,0;green,5;blue,1}] (0,0) -- plot[mark=+, mark options={scale=1.5}] (0.25,0) -- (0.5,0);}, MC-NNM;
		{\protect\tikz \protect\draw[color={rgb:red,0;green,4;blue,2}] (0,0) -- plot[mark=x, mark options={scale=1.5}] (0.25,0) -- (0.5,0);}, PCA;
		{\protect\tikz \protect\draw[color=cyan] (0,0) -- plot[mark=diamond, mark options={scale=1.5}] (0.25,0) -- (0.5,0);}, SC-ADH;
		{\protect\tikz \protect\draw[color={rgb:red,100;pink,100;blue,200}] (0,0) -- plot[mark=triangle, mark options={scale=1.5, rotate=180}] (0.25,0) -- (0.5,0);}, SVD;
		{\protect\tikz \protect\draw[color=magenta] (0,0) -- plot[mark=square, mark options={scale=1.5}] (0.25,0) -- (0.5,0);}, VT-EN.
		\label{synth-stag}}
\end{figure}

\subsection{Hypothesis testing} \label{hyp-test}

Consider a setup with $\text{J}$ control units indexed by $i=1, \ldots, \text{J}$ and $\text{Q}$ treated units indexed by $i = \text{J}+1, \ldots, \text{N}$. The optimization program (\ref{eq:mc-opt}) imputes the missing entries in $Y$:

\[
\hat{Y}_{it} = \hat{L}_{it} \hspace{10mm} \text{for } \text{J}+1 \leq i \leq \text{N} \text{ and } \text{T}_0+1 \leq t \leq \text{T}.
\] The inferred causal effect of the intervention on the treated group is the difference between the observed outcomes of the treated units and the counterfactual outcomes that would have been observed in the absence of the intervention, 

\[
\hat{\alpha}_{it} = Y_{it} - \hat{Y}_{it} \hspace{10mm} \text{for } \text{J}+1 \leq i \leq \text{N} \text{ and } \text{T}_0+1 \leq t \leq \text{T}.
\] Taking the difference-in-means between treated unit observed outcomes and predicted outcomes gives the per-period estimated average causal effect across treated units:

\begin{equation}
\hat{\bar{\alpha}}_{t}  = \frac{1}{\text{Q}} \sum^{\text{N}}_{i=\text{J}+1} \hat{\alpha}_{it} \hspace{10mm}  \text{for } \text{T}_0+1 \leq t \leq \text{T}.  \label{eq:avg-pointwise}
\end{equation}

\citet{chernozhukov2017exact} propose a randomization inference approach for testing the sharp null hypothesis $\text{H}_0: \boldsymbol{\hat{\bar{\alpha}}} = \boldsymbol{\bar{\alpha}^{o}}$, where $\left\{\bar{\alpha}^{o}_{t}\right\}_{t= \text{T}_0+1}^\text{T}$ is a trajectory of per-period average effects under the null. The test statistic suggested by the authors is constructed so that higher values lead to rejection of the null:

\begin{equation} \label{tstat}
\text{S}_q (\boldsymbol{\hat{\bar{\alpha}}}) = \left(\frac{1}{\sqrt{\text{T}_\star}} \sum_{t= \text{T}_0 +1}^{\text{T}} |\hat{\bar{\alpha}}_{t}|^q\right)^{1/q},
\end{equation}
%
where $\text{T}_\star = \text{T}- \text{T}_0$ and $q$ is a constant. 

Letting $\boldsymbol{\hat{\bar{\alpha}}_{\pi}}$ denote the vector of per-period average causal effects estimated for each permutation $\pi \in \Pi$, the randomization $p$-value is

\begin{equation}
\hat{p} = 1 - \frac{1}{|\Pi|} \sum_{\pi \in \Pi} I \left\{\text{S}_q (\boldsymbol{\hat{\bar{\alpha}}_{\pi}}) < \text{S}_q (\boldsymbol{\hat{\bar{\alpha}}}) \right\}, \label{eq:pvalue}
\end{equation}
%
where $I (\cdot)$ denotes the indicator function. 
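
As a purely hypothetical numerical illustration of (\ref{eq:pvalue}) (the figures below are not results from this paper), suppose that 1,000 permutations are drawn and that the permuted statistic is strictly smaller than the observed statistic in 962 of them. Then

\[
\hat{p} = 1 - \frac{962}{1000} = 0.038,
\]

\noindent
so the null hypothesis would be rejected at the 0.05 level but not at the 0.01 level.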

Following \citet{chernozhukov2017exact}, (\ref{eq:pvalue}) is estimated by permuting $Y$ across the time dimension. The rationale for permuting time periods rather than treatment assignment, as proposed by \citet{abadie2010synthetic}, is that if the data are stationary and weakly dependent, which is often the case in an aggregate time-series setting, then the distribution of the error term $\boldsymbol{\upepsilon}$ in (\ref{eq:mc-Y}) should be the same in the pre- and post-periods. \citet{chernozhukov2017exact} prove that the $p$-values resulting from their inferential procedure are approximately unbiased under consistent estimation. 

\paragraph{Permutation structures} In the tests described below, three types of permutations are used: i.i.d. random permutations of the time index $t$; i.i.d. block random permutations of $\text{K}=\text{T}/b$ non-overlapping blocks, where $b$ is selected according to the optimal block length for the dependent bootstrap \citep{politis2004automatic}; and moving block permutations that circularly shift $t$ by one period, resulting in $\text{T}-1$ permutations. The latter two permutations are capable of preserving the dependence structure of the data and are thus appropriate for weakly dependent data. 
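
One way to write the moving block scheme, using a shift index $j$ introduced here only for illustration, is $\pi_j(t) = ((t + j - 1) \bmod \text{T}) + 1$ for $j = 1, \ldots, \text{T}-1$. With a hypothetical $\text{T} = 5$, the original ordering $(1, 2, 3, 4, 5)$ is mapped to

\[
\pi_1: (2, 3, 4, 5, 1), \quad \pi_2: (3, 4, 5, 1, 2), \quad \pi_3: (4, 5, 1, 2, 3), \quad \pi_4: (5, 1, 2, 3, 4),
\]

\noindent
which gives the $\text{T}-1 = 4$ distinct shifts other than the identity. Each shift keeps adjacent periods adjacent, apart from the single wrap-around point, which is why the serial dependence of the data is largely preserved.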

\section{Impact of homestead policies on state capacity} \label{state-capacity}

In this section, I estimate the causal impacts of homestead policies on state capacity, as measured by state government spending and revenue. I create measures of total expenditure and revenue collected from the records of 48 state governments during the period of 1783 to 1932 \citep{sylla1993sources} and the records of 16 state governments during the period of 1933 to 1937 \citep{sylla1995sourcesa,sylla1995sourcesb}. Comparable measures for 48 states are drawn from U.S. Census special reports for the years 1902, 1913, 1932, 1942, 1962, 1972, and 1982 \citep{haines2010}. The expenditure measure includes state government spending on education, social welfare programs, and transportation. The revenue measure incorporates state government income streams such as tax revenue and non-tax revenues such as land sales.

The data pre-processing steps are as follows. Each measure is inflation-adjusted according to the U.S. Consumer Price Index \citep{williamson2017seven} and scaled by the total free population in the decennial census \citep{haines2010}. Missing values are imputed separately in the pre- and post-periods by carrying the last observation forward and remaining missing values are imputed by carrying the next observation backward. The raw outcomes data are log-transformed to alleviate exponential effects. Lastly, I remove states with no variance in the pre-period outcomes, resulting in complete $\text{N} \times \text{T}$ matrices of size $33 \times 159$ and $34 \times 158$ for the expenditures and revenues outcomes, respectively. 

In this application, PLS are the treated units and state-land states --- i.e., states that were not crafted from the public domain and were therefore not directly affected by homestead policies --- serve as control units. This group includes states of the original 13 colonies, Maine, Tennessee, Texas, Vermont, and West Virginia. The staggered adoption setting is appropriate for the current application because $\text{T}_0$ varies across states that were exposed to homesteads following the passage of the HSA. I aggregate to the state level approximately 1.46 million individual land patent records authorized under the HSA.\footnote{Land patent records provide information on the initial transfer of land titles from the federal government and are made accessible online by the U.S. General Land Office (\url{https://glorecords.blm.gov}).} Using these records, I determine that the earliest homestead entries occurred in 1869 in about half of the western frontier states, about seven years following the enactment of the HSA. In 1872, the first homesteads were filed in southern PLS. The timing and intensity of homestead entries is graphed in Figure \ref{fig:homestead-heatmap}. 

\subsection{Sources of bias}

I anticipate two possible sources of bias when estimating (\ref{eq:mc-Y}): selection bias and interference. In this context, selection bias occurs when the probability of treatment depends on unobserved data. We should expect that after homestead policy was implemented, eastern migrants were attracted to more productive land on the frontier and thus frontier states with higher levels of pre-intervention agricultural productivity were more likely to be exposed to treatment. To control for bias arising from selective migration, I include in the model pre-intervention measures of agricultural productivity that include 1860 average farm sizes and 1850 and 1860 average farm values \citep{haines2010}. 

To control for selection bias arising from differences in access to frontier lands, I create a measure of railroad access using digitized railroad maps provided by \citet{atack2013use}, which contain information on the year that each rail line was built. Overlaying the railroad track map over historical county borders, I calculate the total miles of operational track per square mile and aggregate the measure to the state-level. Excluding these unit-specific covariates may result in overestimating the actual impact of homestead policies.

A second potential source of bias arises from spillover effects. This approach, in line with modern frameworks of causal inference, assumes that control units are unaffected by the effects of treatment \citep{rubin1990,imbens2015causal}. While the no spillover assumption cannot directly be tested, it is likely in the present application that the outcomes of state-land states (i.e., control units) were indirectly affected by the out-migration of homesteaders from frontier states (i.e., treated units). When assuming the absence of spillover effects, the use of indirectly affected states as control units would underestimate treatment effects because it would make the counterfactual and observed treated unit observations in the post-period more similar. 

Spillover effects might also arise if state-land state governments increase public investments in order to dissuade workers from migrating to the frontier in the first place. The historical evidence, however, suggests that labor-scarce frontier states were more strongly motivated to attract migrants and stimulate population growth than long-settled state-land states \citep{engerman2005evolution}. For example, the adoption of compulsory primary education laws and support for public education in general in western states has been considered as a means to attract potential migrants to the frontier \citep{meyer1979public,bandiera2018nation}. Spillover effects arising from competition among state governments would also underestimate the effect of treatment. 

\subsection{Placebo tests}

Prior to presenting the main results, I assess the validity of the key assumption underlying the approach by discarding post-period observations from the data. Treating $t = \left\{1, \ldots,  \text{T}_0 - \tau\right\}$ as the pre-period, I estimate (\ref{tstat}) and test the zero effect null hypothesis 

\begin{equation} \label{h0-placebo}
\text{H}_0: \text{S}_q (\hat{\bar{\alpha}}_{t}) = 0  \, \hspace{5mm} \,  \text{for} \, \hspace{2mm}  \text{T}_0 - \tau +1 \leq t \leq  \text{T}_0,
\end{equation}
%\frac{1}{\tau +1} \sum_{ \text{T}_0 - \tau +1}^{ \text{T}_0} \hat{\bar{\alpha}}_{t} 
\noindent
where $\tau \in \left\{1, 10, 25\right\}$ and $q \in \left\{1, 2\right\}$. 

Table \ref{placebo-tests} reports randomization $p$-values corresponding to each permutation structure and value of $\tau$ and $q$. The i.i.d. and i.i.d. block $p$-values are calculated using $|\Pi| = 1,000$ permutations. Moving block $p$-values are based on $|\Pi| = \text{T}-1$ permutations. When considering the revenue outcome, placebo tests yield two-sided $p$-values greater than the significance level of $\alpha = 0.05$ (shaded cells), regardless of the value of $q$ or permutation structure. These placebo results provide evidence that the model is correctly specified. However, we can only reject the null in the case of $\tau=1$ and for i.i.d. and i.i.d. block structures when considering the expenditure outcome. The latter results underscore the need for sufficient pre-intervention observations when fitting the model. Moving block permutations are more likely to yield false positives in the placebo tests, where the number of available permutations used to calculate the $p$-values is relatively low. 

\begin{table}[htbp]
	\captionsetup{font=normalsize}
	\caption{Placebo test $p$-values.\label{placebo-tests}}
	\begin{center}
		\scalebox{.8}{\input{/media/jason/Dropbox/github/land-reform/paper/placebo-tests}}
	\end{center}
\end{table}

Further evidence of the unbiasedness of the estimator is provided in Figure \ref{mc-sim}, which presents the results of placebo tests on control units using both pre- and post-period observations. Similar to the simulations on the synthetic control datasets discussed in Section \ref{sims}, there are no missing entries because the actual treated units are removed prior to the placebo tests. I randomly choose about half of the remaining control units as hypothetical treated units and predict their values for time periods following a randomly selected $\text{T}_0$. The MC-NNM estimator outperforms DID and SVD estimators in terms of minimizing RMSE for each ratio $\text{T}_0/\text{T}$. At $\text{T}_0/\text{T} \geq 0.5$, the estimator generally yields comparable error rates to PCA, synthetic control, and vertical regression estimators. 

\subsection{Main estimates} \label{main-estimates}

In the main analyses, I fit the MC-NNM estimator described in (\ref{eq:mc-Y}) on the entirety of observed entries in $Y$ to recover its missing entries; i.e., the counterfactual outcomes of PLS. The top panel of Figure \ref{mc-estimates-exp-pc} compares the observed time-series of treated units and control units along with the predicted outcomes of treated units. The dashed vertical line represents the initial treatment year of 1869. The observed means of the treated and control units are essentially identical in the post-period. However, we are interested primarily in the difference between the observed and predicted treated unit outcomes, $\hat{\bar{\alpha}}_{t}$, which corresponds to the estimated per-period average causal effect of treatment exposure on the treated units. These per-period causal impacts are plotted in the bottom panels, with 95\% confidence intervals estimated by taking $\hat{\bar{\alpha}}_{t} \pm 1.96$ times the standard error of the distribution of 1,000 block bootstrap replicates of $\hat{\bar{\alpha}}_{t}$, with optimal block lengths selected by the procedure described by \citet{politis2004automatic}. 

The per-period impact time-series for both outcomes are essentially zero during the pre-period and within the bounds of the bootstrap confidence intervals, which demonstrates that the model is closely fitting the pre-period observations. Per-period impacts on state government spending peak in 1870, at the same time most PLS were first exposed to homesteads, representing a 0.18 [-0.35, 0.71] log increase in per-capita expenditure. By 1876, after most PLS had been exposed to homesteads, homestead exposure decreases expenditure by 0.51 [-1.67, 0.66] log points, and the trajectory of causal impacts remains negative for the rest of the time-series.

\begin{figure}[htbp]
	\centering
	\includegraphics[width=0.9\textwidth]{/media/jason/Dropbox/github/land-reform/paper/plots/mc-exp-pc.png}
	\caption{MC-NNM estimates of treatment exposure on state government expenditure, 1809 to 1982:
		{\color{Darjeeling15}{\sampleline{}}}, observed treated;
		{\color{Darjeeling11}{\sampleline{dashed}}}, observed control;
		{\color{Darjeeling15}{\sampleline{dotted}}}, counterfactual treated;
		{\color{Darjeeling15}{\sampleline{dash pattern=on .7em off .2em on .05em off .2em}}}, $\hat{\bar{\alpha}}_{t}$.\label{mc-estimates-exp-pc}} 
\end{figure}

A similar pattern of results emerges when estimating the impacts of homesteads on state government revenue (Figure \ref{mc-estimates-rev-pc}). Per-period impacts on revenue peak in 1873, representing a 0.43 [-0.57, 1.44] log increase in per-capita revenues, at the same time southern PLS are exposed to homesteads. The causal impacts on revenue quickly decrease and remain negative for the remaining time-series; in 1877, exposure to homesteads confers a 0.45 [-1.51, 0.61] log point decrease in per-capita revenue. 

\begin{figure}[htbp]
	\centering
	\includegraphics[width=0.9\textwidth]{/media/jason/Dropbox/github/land-reform/paper/plots/mc-rev-pc.png}
	\caption{MC-NNM estimates of treatment exposure on state government revenue, 1809 to 1982:
		{\color{Darjeeling15}{\sampleline{}}}, observed treated;
		{\color{Darjeeling11}{\sampleline{dashed}}}, observed control;
		{\color{Darjeeling15}{\sampleline{dotted}}}, counterfactual treated;
		{\color{Darjeeling15}{\sampleline{dash pattern=on .7em off .2em on .05em off .2em}}}, $\hat{\bar{\alpha}}_{t}$.\label{mc-estimates-rev-pc}} 
\end{figure}

The estimated bootstrap confidence intervals are useful for evaluating per-period causal impacts but are not helpful in evaluating the overall effect of homestead policies. Table \ref{mc-estimates} reports the results of testing the null hypothesis:

\begin{equation} \label{h0}
\text{H}_0: \text{S}_q (\hat{\bar{\alpha}}_{t}) = 0  \, \hspace{5mm} \,  \text{for} \, \hspace{2mm}  \text{T}_0 + 1 \leq t \leq \text{T}.
\end{equation}
\noindent

In the table, $\text{S}_q (\boldsymbol{\hat{\bar{\alpha}}})$ corresponds to the test statistic described in (\ref{tstat}) and each value beneath is the randomization $p$-value corresponding to each permutation structure. We can reject the null hypothesis (\ref{h0}) at the 5\% level for both outcomes, both values of $q$, and all three permutation schemes. Note that the relevant test statistic $\text{S}_q (\hat{\bar{\alpha}}_{t})$ measures the trajectory of average causal effects in absolute terms and thus does not provide information on the direction or evolution of the causal effects over time.
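
To illustrate how a randomization $p$-value of this kind is computed, the sketch below makes two assumptions that are not confirmed by this section: the statistic takes the form $\text{S}_q(\boldsymbol{\alpha}) = \big( (\text{T} - \text{T}_0)^{-1/2} \sum_{t > \text{T}_0} |\alpha_t|^q \big)^{1/q}$, and the permutation structure consists of cyclic shifts of the series (a moving block scheme). The function names are hypothetical.

```python
def s_q(effects, q):
    """Assumed form of the test statistic: absolute q-th powers of the
    post-period effects, scaled by the square root of their count."""
    t_star = len(effects)
    return (sum(abs(a) ** q for a in effects) / t_star ** 0.5) ** (1.0 / q)

def moving_block_p_value(effects, t0, q=1):
    """effects: per-period values for all T periods (pre and post).
    The p-value is the share of cyclic shifts whose post-period statistic
    is at least as large as the observed one (shift 0 is the observed data)."""
    T = len(effects)
    observed = s_q(effects[t0:], q)
    count = 0
    for shift in range(T):
        permuted = effects[shift:] + effects[:shift]
        if s_q(permuted[t0:], q) >= observed:
            count += 1
    return count / T
```

Because the statistic aggregates absolute values, a small $p$-value indicates that the post-period effects are jointly large in magnitude but says nothing about their sign.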

\begin{table}[htbp]
	\captionsetup{font=normalsize}
	\caption{Testing the null hypothesis (\ref{h0}).\label{mc-estimates}}
	\begin{center}
		\input{/media/jason/Dropbox/github/land-reform/paper/mc-estimates}
	\end{center}
\end{table}

\subsection{Sensitivity to imputation method}

The previously described estimates imply that homestead policy had a statistically significant and long-lasting negative impact on state government capacity. How much of this conclusion depends on the imputation procedure? I compare the following four imputation methods used for time-series analysis in the presence of missing values:
%
\begin{description}
	{\setlength\itemindent{1mm}
		\item[(a) Linear interpolation] Use linear interpolation to replace missing values
		\item[(b) LOCF] Replace each missing value with the most recent non-missing value prior to it (Last Observation Carried Forward); remaining missing values are imputed by LOCF in reverse
		\item[(c) Median replacement] Replace missing values with the median of the training set
		\item[(d) Random replacement] Replace each missing value with a random draw between the minimum and the maximum non-missing values in the data
	}
\end{description}
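
The four methods listed above can be sketched for a single series as follows. This is an illustration only: the function names are hypothetical, missing values are represented by \texttt{None}, the median in (c) is taken over whatever series is passed in (which would be the training set in the setting described here), and the random draws in (d) are assumed to be uniform.

```python
import random
import statistics

def linear_interpolate(x):
    """(a) Linearly interpolate interior gaps; edge gaps are left missing."""
    out = list(x)
    known = [i for i, v in enumerate(x) if v is not None]
    for lo, hi in zip(known, known[1:]):
        for i in range(lo + 1, hi):
            w = (i - lo) / (hi - lo)
            out[i] = x[lo] + w * (x[hi] - x[lo])
    return out

def locf(x):
    """(b) Carry the last observation forward, then fill any remaining
    leading gaps by carrying the next observation backward."""
    out, last = [], None
    for v in x:
        last = v if v is not None else last
        out.append(last)
    nxt = None
    for i in range(len(out) - 1, -1, -1):
        if out[i] is None:
            out[i] = nxt
        else:
            nxt = out[i]
    return out

def median_replace(x):
    """(c) Replace missing values with the median of the observed values."""
    med = statistics.median(v for v in x if v is not None)
    return [med if v is None else v for v in x]

def random_replace(x, seed=0):
    """(d) Replace missing values with uniform draws between the minimum
    and maximum observed values (uniformity is an assumption)."""
    rng = random.Random(seed)
    obs = [v for v in x if v is not None]
    lo, hi = min(obs), max(obs)
    return [rng.uniform(lo, hi) if v is None else v for v in x]
```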

Note that LOCF (b) is the imputation method used in the previous section, and that each imputation procedure is performed separately on the training and test sets to ensure that the model does not learn from the unseen test data. I fit the MC-NNM model on each differently imputed dataset and present the results in Table \ref{mc-sens}.\footnote{Figures \ref{mc-estimates-exp-pc-imp} and \ref{mc-estimates-rev-pc-imp} plot the observed outcomes and the MC-NNM estimates on the differently imputed data.} The results are generally insensitive to the choice of imputation method, except that the effect on revenue is not significant when the data are imputed with random replacement and $q=2$. 

\begin{table}[htbp]
	\captionsetup{font=normalsize}
	\caption{Testing the null hypothesis (\ref{h0}) on differently imputed data.\label{mc-sens}}
	\begin{center}
		\scalebox{.9}{\input{/media/jason/Dropbox/github/land-reform/paper/mc-sens.tex}}
	\end{center}
\end{table}

\section{DID estimation} \label{DID}

The matrix completion approach estimates the impact of a binary exposure to treatment on a continuous outcome. However, in this application a continuous form of treatment is available in the form of homestead entries. Equation (\ref{eq:dd}) estimates a continuous version of the DID estimator described in Section \ref{sims}, where the first difference comes from variation in the date of initial exposure to homesteads, and the second difference comes from variation in the intensity of homestead entries:

\begin{equation} 
Y_{it} =  \gamma_i + \delta_t + \psi \, M_{it} + \phi \, (M_{it} \cdot H_{it}) + X_{it}  + \epsilon_{it}. \label{eq:dd} 
\end{equation}

In this model, $X$ is a matrix of unit- and time-varying covariates included to control for parallel trends in agricultural productivity and access to frontier lands. Entries in the treatment indicator $M$ are set to $M_{it} = 1$ at $t \geq  \text{T}_0$, where $\text{T}_0$ varies across units. The continuous treatment exposure variable $H_{it}$ measures the per-capita statewide sum of homestead entries in state $i$ and year $t$. The coefficient corresponding to the interaction term, $\hat{\phi}$, is the estimated average causal effect of exposure to homesteads. I use unit-stratified bootstrapped samples to construct nonparametric standard errors for $\hat{\phi}$. The model assumes i.i.d. errors, which understates the standard errors for $\hat{\phi}$ when the regression errors are serially correlated, or $\mathrm{Corr} (\epsilon_{it}, \epsilon_{i,t-1}) \neq 0$, which can arise when the time-series lengths are not sufficiently long to reliably estimate the data generating process.\footnote{\citet{bertrand2004much} show that the stratified bootstrap can be used to compute consistent standard errors when the number of units is sufficiently large.}
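
To make the roles of $\psi$ and $\phi$ concrete, the following sketch spells out what (\ref{eq:dd}) implies when the fixed effects and covariates are held constant and the model is assumed to be correctly specified. For a unit that has been exposed ($M_{it} = 1$), the expected outcome differs from the unexposed case ($M_{it} = 0$) by
\[
\mathrm{E}[Y_{it} \mid M_{it} = 1, H_{it}] - \mathrm{E}[Y_{it} \mid M_{it} = 0, H_{it}] = \psi + \phi \, H_{it},
\]
so that the marginal effect of homestead intensity is
\[
\frac{\partial \, \mathrm{E}[Y_{it}]}{\partial H_{it}} = \phi \, M_{it},
\]
which equals $\phi$ from the date of initial exposure onward and zero before it.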

Similar to the case of binary treatment, the continuous DID estimator is adapted to a setting of staggered adoption because the initial date of exposure to homesteads varies across PLS. It should be emphasized that estimating (\ref{eq:dd}) in a staggered adoption setting relies on several strong assumptions regarding both the assignment mechanism (in this application, the distribution of $\text{T}_0$) and the counterfactual outcomes of the treated units. The framework of \citet{athey2018design}, for instance, assumes the distribution of $\text{T}_0$ is completely random conditional on the covariates. In the current application, this assumption ignores the possibility that initial exposure to homesteads might be determined by unobserved factors. The framework also assumes that the counterfactual outcomes at time $t$ do not depend on the future date of treatment exposure if $t <  \text{T}_0$ or the history of treatment exposure if $t >  \text{T}_0$. Violations of these assumptions would arise if the homestead policies are anticipated prior to $\text{T}_0$ or if the size of frontier state government is determined by whether the state was exposed early or late to homesteads. 

\subsection{DID estimates on state capacity}

I estimate (\ref{eq:dd}) on balanced state-year panel datasets of state government finances from the years 1783 to 1982. The covariate matrix $X_{it}$ includes measures of railroad access, farm sizes, and farm values. Missing values in $X_{it}$ are imputed separately in the time periods before and after 1868 by carrying the last observation forward and then imputing any remaining missing values by carrying the next observation backward.

Table \ref{dd-estimates} reports the DID treatment effect estimates and corresponding 95\% confidence intervals constructed using 1,000 state-stratified bootstrap samples. The estimates indicate that a 10\% increase in log per-capita homesteads is expected to significantly decrease log per-capita state government finances by about 0.1\%. The point estimates are considerably smaller in magnitude than, albeit in the same direction as, the per-period MC-NNM estimates presented in Section \ref{main-estimates}. The bootstrap confidence intervals around the DID estimates are considerably narrower than those for the MC-NNM per-period impacts displayed in Figures \ref{mc-estimates-exp-pc} and \ref{mc-estimates-rev-pc} and are potentially overoptimistic due to serial correlation in the DID regression errors.

\begin{table}[htbp]
	\begin{center}
		\caption{DID estimates: Impact of homestead entries on outcomes. \label{dd-estimates}}
		\resizebox{.8\width}{!}{\input{/media/jason/Dropbox/github/land-reform/paper/dd-estimates}}
	\end{center}
\end{table}

\subsection{Land inequality as a causal mechanism}  \label{mechanisms}

Inequality is a potential causal mechanism underlying the relationship between homesteads and state capacity. Median voter-based theories that assume parity in the political influence of voters predict a positive relationship between inequality and the size of governments \citep{meltzer1981rational}. In settings with high inequality, the median voter is poorer than the average voter, which in turn increases demand for redistribution in majority-rule elections. 

However, models that allow for economic differences in political influence predict a nonlinear or inverse relationship between inequality and government size. In \possessivecite{benabou2000unequal} model, for instance, the pivotal voter is wealthier than the median and has the power to block redistribution as inequality increases. But when inequality is too high, the poor can impose redistribution on elites through `universal' majority voting \citep{perotti1993political,saint1993education}. In \possessivecite{besley2009origins} framework, greater economic power of the ruling class reduces investment in state capacity. Similarly, \citet{galor2009inequality} propose a model where wealthy landowners block education reforms because education favors industrial labor productivity and decreases the value of farm rents. Inequality in this context can be thought of as a proxy for the amount of \emph{de facto} political influence elites have to block reforms and limit the capacity of the state \citep{acemoglu2008persistence}. 

To test whether homesteads affected future land inequality in frontier counties, I calculate a measure of land inequality based on the Gini coefficient of census farm sizes. Gini-based land inequality measures are commonly used as a proxy for the \emph{de facto} bargaining power of landed elites \citep[e.g.,][]{boix2003democracy,ziblatt2008does,ansell2015}.\footnote{Note that the Gini coefficient will underestimate land inequality in counties with high shares of propertyless farmers because tenant farms are included in the farm size data, which is problematic because farms can be operated by different tenants but owned by the same landlord. I correct for this problem by adjusting the farm Gini coefficient by the ratio of farms to adult males, as recommended by \citet{vollrath2013inequality}.} A bivariate regression model yields a positive relationship between land inequality and state government finances during the period of 1860 to 1950, especially at higher levels of inequality (Figure \ref{fig:ineq-capacity}). 
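
The land inequality measure described above can be sketched as follows. The sketch rests on assumptions that should be read as such: the function names are hypothetical, farm sizes are treated as individual observations rather than binned census counts, and the adjustment is assumed to take the form $p \, G + (1 - p)$, where $G$ is the farm Gini and $p$ is the ratio of farms to adult males. That form is what results from treating adult males without a farm as holding zero land; the exact adjustment used here may differ.

```python
def gini(sizes):
    """Population Gini coefficient: mean absolute difference between all
    ordered pairs, divided by twice the mean."""
    n = len(sizes)
    mean = sum(sizes) / n
    abs_diffs = sum(abs(a - b) for a in sizes for b in sizes)
    return abs_diffs / (2 * n * n * mean)

def adjusted_gini(farm_sizes, adult_males):
    """Assumed adjustment: p * G + (1 - p), with p = farms / adult males.
    Equivalent to computing the Gini after adding landless adult males
    to the distribution with a holding of zero."""
    p = len(farm_sizes) / adult_males
    return p * gini(farm_sizes) + (1 - p)
```

For example, four farms of 10, 20, 30, and 40 acres have a farm Gini of 0.25; if the county has eight adult males, the adjusted measure rises to 0.625, reflecting the four who hold no land.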

The direction of this relationship contrasts with recent empirical studies that establish a negative relationship based on county-level analyses in specific regions and over a single census year. \citet{ramcharan2010inequality}, for instance, finds an inverse relationship between land inequality and county-level property tax revenues in 1890. The study finds that the negative relationship is especially large in rural counties, where landownership tends to be more concentrated. \citet{vollrath2013inequality} establishes a negative relationship between land inequality and local property tax revenues in 1890 in northern rural counties. The present findings, in contrast, are based on state-level expenditure and revenue data collected over a much longer time span. 

\begin{figure}[htbp]
	\begin{center}
		\includegraphics[width=0.9\textwidth]{/media/jason/Dropbox/github/land-reform/paper/plots/ineq-capacity.png} 
	\end{center}
	\caption{Land inequality (lagged by 10 years) vs. log per-capita revenue and expenditure, 1860-1950. Each point is a state-year observation. Lines represent generalized additive model (GAM) fits to the data and shaded regions represent corresponding 95\% confidence intervals.   \label{fig:ineq-capacity}}
\end{figure} 

Table \ref{dd-estimates} presents DID estimates of the impact of log per-capita homesteads on land inequality at the state level during the period of 1870 to 1950. Since land inequality is measured once per decade, in census years, I aggregate homesteads to the next decennial year; e.g., the number of homesteads measured in 1880 is the total for the years 1871 to 1880. Average farm values are included in the regression as a proxy for agricultural productivity, which might be associated with farm sizes approaching ideal scale and therefore land inequality. I estimate that homesteads significantly decreased land inequality in frontier states: a 1\% increase in log per-capita homesteads is expected to lower land inequality by $4.81\,\cdot\,10^{-6}$ points. The magnitude of this estimate implies that homestead policy did not fundamentally alter the long-run distribution of landownership, which may be explained by qualitative evidence that suggests public land laws were exploited by land speculators and natural resource companies and that the rents from public land were appropriated by the private sector.  
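
The aggregation of annual homestead entries to decennial years can be sketched as below. The function names are hypothetical and any input counts are illustrative rather than drawn from the land patent records.

```python
from collections import defaultdict

def decade_end(year):
    """Map a year to the decennial year that closes its decade,
    so that 1871 through 1880 all map to 1880."""
    return ((year - 1) // 10 + 1) * 10

def aggregate_to_decennial(entries_by_year):
    """Sum annual homestead entries into the next decennial year."""
    totals = defaultdict(int)
    for year, entries in entries_by_year.items():
        totals[decade_end(year)] += entries
    return dict(totals)
```

Under this mapping a decennial year is assigned to its own decade, which matches the example in the text where the 1880 total covers 1871 to 1880.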

\section{Conclusion} \label{discussion-ch1} 

The findings of this paper signify that mid-nineteenth century homestead policies had long-lasting impacts that can potentially explain contemporary differences in state government capacity. MC-NNM and DID estimates imply that homestead policies, or the homestead entries authorized by those policies, had significant and negative impacts on state government expenditure and revenue that lasted a century following their implementation.  

I explore land inequality as a possible causal mechanism underlying the relationship between land reform and state capacity. First, I provide evidence of a positive relationship between land inequality and state government finances, with a slope that increases at higher levels of inequality. A nonlinearity in the relationship between inequality and state capacity can arise in theoretical models that incorporate economic differences in political influence: greater income inequality reduces investments in fiscal capacity when elites have a monopoly on political power; when inequality gets too high, however, the poor can impose redistribution through majority voting. Second, I present DID estimates that reveal per-capita homesteads significantly lowered land inequality in frontier states, although the magnitude of the effect is negligible. The failure of homestead policy to fundamentally alter the long-run distribution of landownership may be explained by qualitative evidence that suggests public land laws were \emph{de facto} corporate welfarism. 

This paper makes a methodological contribution in applying matrix completion --- a machine learning method commonly used for user recommendation tasks --- for estimating causal impacts of policy interventions on time-series cross-sectional data. The promise of the method is three-fold. First, the method can be easily understood within the frameworks of modern causal inference and missing data imputation: we cannot directly observe the counterfactual outcomes of treated units and wish to impute these values on the basis of the observed values. Second, the method allows for patterns of missing data to have a time-series dependency structure and is thus adaptable to settings with staggered treatment adoption. Third, the method outperforms several other regression-based estimators in a battery of placebo tests. The performance advantage can be attributed to the fact that it is capable of using additional information in the form of pre-period observations of the treated units, whereas other estimators rely only on the pre-period observations of control units to predict counterfactuals. 

Further research is needed to determine the conditions under which consistency holds. Estimator consistency is required to obtain approximately unbiased $p$-values under the randomization inference procedure.