name | topic | maintainer | version | source | |
---|---|---|---|---|---|
Econometrics | Econometrics | Achim Zeileis, Grant McDermott, Kevin Tappe | 2022-12-29 | | |
Base R ships with a lot of functionality useful for (computational) econometrics,
in particular in the stats package. This functionality is complemented by many
packages on CRAN; a brief overview is given below. There is also a certain
overlap between the tools for econometrics in this view and those in the task
views on r view("Finance"), r view("TimeSeries"), and r view("CausalInference").
The packages in this view can be roughly structured into the following topics. If you think that some package is missing from the list, please file an issue in the GitHub repository or contact the maintainer.
- Estimation and standard inference: Ordinary least squares (OLS) estimation for linear models is provided by lm() (from stats) and standard tests for model comparisons are available in various methods such as summary() and anova().
- Further inference and nested model comparisons: Functions analogous to the basic summary() and anova() methods that also support asymptotic tests (z instead of t tests, and Chi-squared instead of F tests) and plug-in of other covariance matrices are coeftest() and waldtest() in r pkg("lmtest", priority = "core"). (Non)linear hypothesis testing for a wide range of R packages can be implemented through the deltamethod() function of r pkg("marginaleffects", priority = "core"). This expands on older (non)linear hypothesis test functions like linearHypothesis() and deltaMethod() from r pkg("car", priority = "core").
- Robust standard errors: HC, HAC, clustered, and bootstrap covariance matrices are available in r pkg("sandwich", priority = "core") and can be plugged into the inference functions mentioned above.
- Nonnested model comparisons: Various tests for comparing non-nested linear models are available in r pkg("lmtest") (encompassing test, J test, Cox test). The Vuong test for comparing other non-nested models is provided by r pkg("nonnest2") (and specifically for count data regression in r pkg("pscl")).
- Diagnostic checking: The packages r pkg("car") and r pkg("lmtest") provide a large collection of regression diagnostics and diagnostic tests.
- Miscellaneous: Much of the above functionality is bundled together in r pkg("fixest", priority = "core"), which provides a number of built-in convenience features that users may find attractive. These include robust standard error specification, multi-model estimation, custom hypothesis testing, etc.
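As a minimal sketch of the workflow described above, the following base R snippet fits OLS models with lm(), compares nested models with anova(), and computes a heteroscedasticity-consistent (HC0) covariance matrix by hand. The built-in mtcars data and the choice of regressors are assumptions made purely for illustration; in practice one would normally take the covariance matrix from sandwich and pass it to coeftest() from lmtest instead of coding it manually.

```r
# Illustrative data: mtcars ships with base R (an assumption of this sketch)
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ wt + hp, data = mtcars)

# Standard inference: t tests per coefficient, F test for nested models
summary(fit_large)
anova(fit_small, fit_large)

# Hand-rolled HC0 sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1
X     <- model.matrix(fit_large)
e     <- residuals(fit_large)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)   # each row of X is scaled by its residual
vcov_hc0 <- bread %*% meat %*% bread

# Robust standard errors and asymptotic z statistics
se_hc0 <- sqrt(diag(vcov_hc0))
z_hc0  <- coef(fit_large) / se_hc0
cbind(estimate = coef(fit_large), se_hc0 = se_hc0, z = z_hc0)
```

The manual calculation is only meant to show what "plugging in another covariance matrix" amounts to; small-sample corrections (HC1 to HC3), HAC, and clustered variants are better left to the dedicated packages.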
- Generalized linear models (GLMs): Many standard microeconometric models belong to the family of generalized linear models and can be fitted by glm() from package stats. This includes in particular logit and probit models for modeling choice data and Poisson models for count data.
- Effects and marginal effects: Effects for typical values of regressors in GLMs and various other probabilistic regression models can be obtained and visualized using r pkg("effects"). Marginal effect tables and corresponding visualizations for a wide range of models can be produced with r pkg("marginaleffects", priority = "core"). Other implementations of marginal effects for certain models are in r pkg("margins") and r pkg("mfx"). Interactive visualizations of both effects and marginal effects are possible in r pkg("LinRegInteractive").
- Binary responses: The standard logit and probit models (among many others) for binary responses are GLMs that can be estimated by glm() with family = binomial. Bias-reduced GLMs that are robust to complete and quasi-complete separation are provided by r pkg("brglm"). Discrete choice models estimated by simulated maximum likelihood are implemented in r pkg("Rchoice"). r pkg("bife") provides binary choice models with fixed effects. Heteroscedastic probit models (and other heteroscedastic GLMs) are implemented in r pkg("glmx") along with parametric link functions and goodness-of-link tests for GLMs.
- Count responses: The basic Poisson regression is a GLM that can be estimated by glm() with family = poisson as explained above. Negative binomial GLMs are available via glm.nb() in package r pkg("MASS"). Another implementation of negative binomial models is provided by r pkg("aod"), which also contains other models for overdispersed data. Zero-inflated and hurdle count models are provided in package r pkg("pscl"). A reimplementation by the same authors is currently under development in r rforge("countreg") on R-Forge, which also encompasses separate functions for zero-truncated regression, finite mixture models, etc.
- Multinomial responses: Multinomial models with individual-specific covariates only are available in multinom() from package r pkg("nnet"). An implementation with both individual- and choice-specific variables is r pkg("mlogit"). Generalized multinomial logit models (e.g., with random effects etc.) are in r pkg("gmnl"). A flexible framework of various customizable choice models (including multinomial logit and nested logit among many others) is implemented in the r pkg("apollo") package. The newer r pkg("logitr") package combines many of the features from these preceding packages and also offers some meaningful performance improvements for fast estimation of multinomial and mixed logit models. Generalized additive models (GAMs) for multinomial responses can be fitted with the r pkg("VGAM") package. A Bayesian approach to multinomial probit models is provided by r pkg("MNP"). Various Bayesian multinomial models (including logit and probit) are available in r pkg("bayesm"). Furthermore, the package r pkg("RSGHB") fits various hierarchical Bayesian specifications based on direct specification of the likelihood function.
- Ordered responses: Proportional-odds regression for ordered responses is implemented in polr() from package r pkg("MASS"). The package r pkg("ordinal") provides cumulative link models for ordered data, which encompass proportional odds models but also include more general specifications. Bayesian ordered probit models are provided by r pkg("bayesm").
- Censored responses: Basic censored regression models (e.g., tobit models) can be fitted by survreg() in r pkg("survival"); a convenience interface tobit() is in package r pkg("AER", priority = "core"). Further censored regression models, including models for panel data, are provided in r pkg("censReg"). Censored regression models with conditional heteroscedasticity are in r pkg("crch"). Furthermore, hurdle models for left-censored data at zero can be estimated with r pkg("mhurdle"). Models for sample selection are available in r pkg("sampleSelection") and r pkg("ssmrob") using classical and robust inference, respectively. Package r pkg("matchingMarkets") corrects for selection bias when the sample is the result of a stable matching process (e.g., a group formation or college admissions problem).
- Truncated responses: r pkg("crch") fits models for truncated (and potentially heteroscedastic) Gaussian, logistic, and t responses. Homoscedastic Gaussian responses are also available in r pkg("truncreg").
- Fraction and proportion responses: Beta regression for responses in (0, 1) is in r pkg("betareg") and r pkg("gamlss").
- Duration responses: Many classical duration models can be fitted with r pkg("survival"), e.g., Cox proportional hazard models with coxph() or Weibull models with survreg(). Many more refined models can be found in the r view("Survival") task view.
- High-dimensional fixed effects: Linear and generalized linear models with potentially high-dimensional fixed effects, also for multiple groups, can be fitted with r pkg("fixest", priority = "core"), using optimized parallel C++ code. Other implementations of high-dimensional fixed effects are in r pkg("lfe") and r pkg("alpaca") for linear and generalized linear models, respectively.
- Miscellaneous: Further, more refined tools for microeconometrics are provided in the r pkg("micEcon") family of packages: Analysis with Cobb-Douglas, translog, and quadratic functions is in r pkg("micEcon"); the constant elasticity of substitution (CES) function is in r pkg("micEconCES"); the symmetric normalized quadratic profit (SNQP) function is in r pkg("micEconSNQP"). The almost ideal demand system (AIDS) is in r pkg("micEconAids"). Stochastic frontier analysis (SFA) is in r pkg("frontier"). Semiparametric SFA is available in r pkg("semsfa") and spatial SFA in r pkg("ssfa"). The package r pkg("bayesm") implements a Bayesian approach to microeconometrics and marketing. Inference for relative distributions is contained in package r pkg("reldist").
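The GLM-based models in the list above can be sketched with base R and two recommended packages (MASS and survival). The data sets (mtcars, quine) and the simulated tobit example are assumptions chosen for illustration only, and the hand-computed average marginal effect is a simplified stand-in for what packages such as marginaleffects provide.

```r
library(MASS)      # recommended package: glm.nb() and the quine data
library(survival)  # recommended package: survreg() and Surv()

# Binary response: logit and probit differ only in the link function
logit  <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
probit <- glm(am ~ wt, data = mtcars, family = binomial(link = "probit"))

# Average marginal effect of wt in the logit model, computed by hand:
# mean over observations of the logistic density at x'b, times the coefficient
ame_wt <- mean(dlogis(predict(logit, type = "link"))) * coef(logit)[["wt"]]

# Count response: Poisson GLM versus a negative binomial model
pois <- glm(Days ~ Eth + Sex, data = quine, family = poisson)
nb   <- glm.nb(Days ~ Eth + Sex, data = quine)
c(poisson = AIC(pois), negbin = AIC(nb))

# Censored response: tobit model, left-censored at zero, via survreg().
# The simulated data below are hypothetical.
set.seed(1)
x     <- rnorm(500)
ystar <- 1 + 2 * x + rnorm(500)   # latent outcome
y     <- pmax(ystar, 0)           # observed outcome, censored at 0
tobit_fit <- survreg(Surv(y, y > 0, type = "left") ~ x, dist = "gaussian")
coef(tobit_fit)
```

In the Surv() call, the second argument flags uncensored observations, so rows with y equal to zero are treated as left-censored.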
We review packages related to some common research designs for causal
inference below. This section is necessarily brief and should be paired with
the r view("CausalInference") task view, since there is a high degree of
overlap.
- Basic difference-in-differences (DiD): The canonical 2x2 DiD model (two units, two periods) can be estimated as a simple interaction between two factor variables in lm() or glm(), etc. Similarly, the equivalent two-way fixed effects (TWFE) design can be obtained using factors to control for unit and time fixed effects. However, for high-dimensional datasets TWFE is more conveniently estimated using a dedicated panel data package like r pkg("fixest") or r pkg("plm"). The former even provides a convenience i() operator for constructing and interacting factors in TWFE settings.
- Advanced DiD and TWFE corrections: Despite its long-standing popularity, recent research has uncovered various problems with (naive) TWFE; for example, severe bias in the presence of staggered treatment rollout. A cottage industry of workarounds and alternative estimators now exists to address these problems. R package implementations include: r pkg("bacondecomp"), r pkg("did"), r pkg("did2s"), r pkg("DRDID"), r pkg("etwfe"), r pkg("fixest") (via the sunab() function), and r pkg("gsynth").
- Synthetic control: The original synthetic control (SC) implementation is available through r pkg("Synth"), while r pkg("tidysynth") offers a newer SC implementation with various enhancements (speed, inspection, etc.). Similarly, r pkg("gsynth") generalizes the original SC implementation to multiple treated units and variable treatment periods, and also supports additional estimation methods like the EM algorithm and matrix completion.
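The basic 2x2 DiD design mentioned above can be illustrated with a short base R sketch. The simulated data, the variable names, and the true effect of 2 are all hypothetical; the point is only that the interaction coefficient from lm() coincides with the difference-in-differences of the four cell means.

```r
# Hypothetical simulated 2x2 design: treated/control groups, pre/post periods
set.seed(42)
n   <- 2000
dat <- data.frame(
  treated = rep(c(0, 1), each = n / 2),
  post    = rep(c(0, 1), times = n / 2)
)
true_effect <- 2
dat$y <- 1 + 0.5 * dat$treated + 1.5 * dat$post +
  true_effect * dat$treated * dat$post + rnorm(n)

# The DiD estimate is the coefficient on the interaction term
did_fit <- lm(y ~ treated * post, data = dat)
did_hat <- coef(did_fit)[["treated:post"]]

# Equivalent calculation from the four cell means
m <- with(dat, tapply(y, list(treated, post), mean))
did_means <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

c(lm = did_hat, cell_means = did_means)
```

With staggered treatment timing this simple equivalence no longer holds, which is where the dedicated estimators listed above come in.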
- Basic instrumental variables (IV) regression: Two-stage least squares (2SLS) is provided by r pkg("ivreg", priority = "core"), which separates out the dedicated 2SLS routines previously found in r pkg("AER"). Another implementation is available as tsls() in package r pkg("sem").
- Binary responses: An IV probit model via GLS estimation is available in r pkg("ivprobit"). The r pkg("LARF") package estimates local average response functions for binary treatments and binary instruments.
- Panel data: Several panel data model packages (see below) provide their own dedicated IV routines for efficient estimation in the presence of high-dimensional data. These include r pkg("fixest") and r pkg("lfe") for fixed effects, and r pkg("plm") for first-difference, between, and multiple random effects methods.
- Miscellaneous: r pkg("REndo") fits linear models with endogenous regressors using various latent instrumental variable approaches. r pkg("SteinIV") provides semi-parametric IV estimators, including JIVE and SPS.
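To make the mechanics of 2SLS concrete, here is a minimal base R sketch on hypothetical simulated data with one endogenous regressor and one instrument. It reproduces only the point estimate: the standard errors reported by the second-stage lm() are not valid, which is one reason to use a dedicated routine such as those listed above.

```r
# Hypothetical simulated data: one endogenous regressor, one instrument
set.seed(123)
n <- 1000
z <- rnorm(n)                       # instrument
u <- rnorm(n)                       # unobserved confounder
x <- 0.8 * z + u + rnorm(n)         # endogenous regressor (correlated with u)
y <- 1 + 2 * x + 2 * u + rnorm(n)   # structural equation, true slope = 2

# OLS is biased because x is correlated with the error term
ols <- coef(lm(y ~ x))[["x"]]

# Stage 1: project the endogenous regressor on the instrument
x_hat <- fitted(lm(x ~ z))
# Stage 2: regress the outcome on the first-stage fitted values
tsls <- coef(lm(y ~ x_hat))[["x_hat"]]

# Just-identified case: the same estimate in closed form, (Z'X)^-1 Z'y
Z <- cbind(1, z)
X <- cbind(1, x)
iv_closed <- solve(crossprod(Z, X), crossprod(Z, y))[2]

c(ols = ols, tsls = tsls, closed_form = iv_closed)
```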
- Regression discontinuity design (RDD) methods are implemented in r pkg("rdrobust") (offering robust confidence interval construction and bandwidth selection), r pkg("rddensity") (density discontinuity testing, also known as manipulation testing), r pkg("rdlocrand") (inference under local randomization), and r pkg("rdmulti") (analysis with multiple cutoffs or scores).
- Tools to perform power, sample size and minimum detectable effects (MDE) calculations are available in r pkg("rdpower"), while r pkg("RATest") provides a collection of randomization tests, including a permutation test for the continuity assumption of the baseline covariates in the sharp RDD.
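For intuition, a sharp RDD estimate can be sketched in base R as a kernel-weighted local linear regression around the cutoff. Everything below is hypothetical: the simulated data, the fixed bandwidth h, and the triangular kernel are assumptions of this sketch, whereas the packages above choose bandwidths in a data-driven way and provide bias-corrected, robust inference.

```r
# Hypothetical simulated sharp RDD: cutoff at 0, true jump of 3
set.seed(2023)
n <- 1000
x <- runif(n, -1, 1)                  # running variable
d <- as.numeric(x >= 0)               # treatment indicator
y <- 1 + 0.5 * x + 3 * d + rnorm(n, sd = 0.3)

# Fixed bandwidth and triangular kernel weights (both assumed, not estimated)
h <- 0.5
w <- pmax(0, 1 - abs(x) / h)

# Local linear regression with separate slopes on each side of the cutoff;
# the coefficient on d estimates the discontinuity at x = 0
rd_fit <- lm(y ~ d * x, weights = w, subset = w > 0)
rd_hat <- coef(rd_fit)[["d"]]
rd_hat
```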
- Panel standard errors: A simple approach for panel data is to fit the pooling (or independence) model (e.g., via lm() or glm()) and only correct the standard errors. Different types of clustered, panel, and panel-corrected standard errors are available in r pkg("sandwich") (incorporating prior work from r pkg("multiwayvcov")), r pkg("clusterSEs"), r pkg("pcse"), r pkg("clubSandwich"), r pkg("plm", priority = "core"), and r pkg("geepack"), respectively. The latter two require estimation of the pooling/independence models via plm() and geeglm() from the respective packages (which also provide other types of models, see below).
- Linear panel models: r pkg("fixest", priority = "core") provides very efficient fixed-effect routines that scale to high-dimensional data and multiple fixed-effects. r pkg("plm") provides a wide range of within, between, and random-effect methods (among others) along with corrected standard errors, tests, etc. Various dynamic panel models are available in r pkg("plm"), with estimation based on moment conditions in r pkg("pdynmc"), and dynamic panel models with fixed effects in r pkg("OrthoPanels"). r pkg("feisr") provides fixed effects individual slope (FEIS) models. Panel vector autoregressions are implemented in r pkg("panelvar").
- GLMs and generalized estimating equations (GEE): The aforementioned r pkg("fixest") supports a variety of GLM-like models in addition to linear panel models. This includes efficient fixed-effect estimation of logit, probit, Poisson, and negative binomial models. Similar functionality is provided by r pkg("alpaca") (which also accounts for incidental parameter problems) and r pkg("pglm"). GEE models for panel data (or longitudinal data in statistical jargon) are available in r pkg("geepack").
- Mixed effects models: Linear and nonlinear models for panel data (and more general multi-level data) are available in r pkg("lme4") and r pkg("nlme").
- Instrumental variables: r pkg("fixest"). See also above.
- Miscellaneous: Threshold regression and unit root tests are in r pkg("pdR"). The panel data approach method for program evaluation is available in r pkg("pampe"). Dedicated fast data preprocessing for panel data econometrics is provided by r pkg("collapse").
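The fixed-effects ("within") estimator that the panel packages above implement efficiently can be illustrated in base R. The balanced panel below is hypothetical simulated data; the sketch shows that demeaning by unit gives the same slope as including one dummy per unit (LSDV), while pooled OLS is biased when the regressor is correlated with the unit effects.

```r
# Hypothetical balanced panel: 50 units observed over 10 periods
set.seed(7)
n_id <- 50
n_t  <- 10
id    <- rep(seq_len(n_id), each = n_t)
alpha <- rnorm(n_id)[id]                  # unit-specific fixed effect
x     <- 0.5 * alpha + rnorm(n_id * n_t)  # regressor correlated with alpha
y     <- alpha + 1.5 * x + rnorm(n_id * n_t)

# Pooled OLS ignores the fixed effects and is biased here
pooled_hat <- coef(lm(y ~ x))[["x"]]

# Least-squares dummy variables (LSDV): one dummy per unit
lsdv_hat <- coef(lm(y ~ x + factor(id)))[["x"]]

# Within transformation: subtract unit means, then OLS without intercept
x_w <- x - ave(x, id)
y_w <- y - ave(y, id)
within_hat <- coef(lm(y_w ~ x_w - 1))[["x_w"]]

c(pooled = pooled_hat, lsdv = lsdv_hat, within = within_hat)
```

The LSDV approach becomes impractical with many units, which is the motivation for the specialized high-dimensional fixed-effects routines.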
- Nonlinear least squares modeling: nls() in package stats.
- Quantile regression: r pkg("quantreg") (including linear, nonlinear, censored, locally polynomial and additive quantile regressions).
- Generalized method of moments (GMM) and generalized empirical likelihood (GEL): r pkg("gmm").
- Spatial econometric models: The r view("Spatial") view gives details about handling spatial data, along with information about (regression) modeling. In particular, spatial regression models can be fitted using r pkg("spatialreg") and r pkg("sphet") (the latter using a GMM approach). r pkg("splm") is a package for spatial panel models. Spatial probit models are available in r pkg("spatialprobit") and spatial seemingly unrelated regression (SUR) models in r pkg("spsur").
- Bayesian model averaging (BMA): A comprehensive toolbox for BMA is provided by r pkg("BMS") including flexible prior selection, sampling, etc. A different implementation is in r pkg("BMA") for linear models, generalized linear models and survival models (Cox regression).
- Linear structural equation models: r pkg("lavaan") and r pkg("sem"). See also the r view("Psychometrics") task view for more details.
- Machine learning: There are several packages that combine machine learning techniques with econometric inference (especially for identifying causal effects). These include r pkg("grf") for causal random forests and estimation of heterogeneous treatment effects, r pkg("DoubleML") for double machine learning of a wide range of models from the mlr3 family, and r pkg("hdm") for selected high-dimensional econometric models. For a more general overview see the r view("MachineLearning") task view.
- Simultaneous equation estimation: r pkg("systemfit").
- Nonparametric methods: r pkg("np") using kernel smoothing and r pkg("NNS") using partial moments.
- Linear and nonlinear mixed-effect models: r pkg("nlme") and r pkg("lme4").
- Generalized additive models (GAMs): r pkg("mgcv"), r pkg("gam"), r pkg("gamlss") and r pkg("VGAM").
- Design-based inference: r pkg("estimatr") contains fast procedures for several design-appropriate estimators with robust standard errors and confidence intervals including linear regression, instrumental variables regression, difference-in-means, among others.
- Extreme bounds analysis: r pkg("ExtremeBounds").
- Miscellaneous: The packages r pkg("VGAM"), r pkg("rms") and r pkg("Hmisc") provide several tools for extended handling of (generalized) linear regression models.
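As a small illustration of the first item in the list above, nls() from stats can be used as follows. The exponential model, the simulated data, and the starting values are assumptions of this sketch.

```r
# Hypothetical simulated data from an exponential growth curve
set.seed(1)
x <- seq(0, 2, length.out = 100)
y <- 2 * exp(0.8 * x) + rnorm(100, sd = 0.1)

# Nonlinear least squares needs starting values for every parameter
nls_fit <- nls(y ~ a * exp(b * x), start = list(a = 1, b = 1))
coef(nls_fit)

# Log-linearizing gives an OLS alternative, but it implicitly assumes
# multiplicative rather than additive errors
ols_fit <- lm(log(y) ~ x)
c(a = exp(coef(ols_fit)[[1]]), b = coef(ols_fit)[[2]])
```

Poor starting values are the most common reason for nls() to fail, so it is often worth deriving them from a linearized fit like the one shown.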
- The r view("TimeSeries") task view provides much more detailed information about both basic time series infrastructure and time series models. Here, only the most important aspects relating to econometrics are briefly mentioned. Time series models for financial econometrics (e.g., GARCH, stochastic volatility models, or stochastic differential equations, etc.) are described in the r view("Finance") task view.
- Infrastructure for regularly spaced time series: The class "ts" in package stats is R's standard class for regularly spaced time series (especially annual, quarterly, and monthly data). It can be coerced back and forth without loss of information to "zooreg" from package r pkg("zoo", priority = "core").
- Infrastructure for irregularly spaced time series: r pkg("zoo") provides infrastructure for both regularly and irregularly spaced time series (the latter via the class "zoo") where the time information can be of arbitrary class. This includes daily series (typically with "Date" time index) or intra-day series (e.g., with "POSIXct" time index). An extension based on r pkg("zoo") geared towards time series with different kinds of time index is r pkg("xts"). Further packages aimed particularly at finance applications are discussed in the r view("Finance") task view.
- Classical time series models: Simple autoregressive models can be estimated with ar() and ARIMA modeling and Box-Jenkins-type analysis can be carried out with arima() (both in the stats package). An enhanced version of arima() is in r pkg("forecast", priority = "core").
- Linear regression models: A convenience interface to lm() for estimating OLS and 2SLS models based on time series data is r pkg("dynlm"). Linear regression models with AR error terms can be fitted via GLS using gls() from r pkg("nlme").
- Structural time series models: Standard models can be fitted with StructTS() in stats. Further packages are discussed in the r view("TimeSeries") task view.
- Filtering and decomposition: decompose() and HoltWinters() in stats. The basic function for computing filters (both rolling and autoregressive) is filter() in stats. Many extensions to these methods, in particular for forecasting and model selection, are provided in the r pkg("forecast") package.
- Vector autoregression: Simple models can be fitted by ar() in stats; more elaborate models are provided in package r pkg("vars") along with suitable diagnostics, visualizations etc. Panel vector autoregressions are available in r pkg("panelvar").
- Unit root and cointegration tests: r pkg("urca", priority = "core"), r pkg("tseries", priority = "core"), r pkg("CADFtest"). See also r pkg("pco") for panel cointegration tests and r pkg("plm", priority = "core") for panel unit root tests.
- Miscellaneous: r pkg("tsDyn"): threshold and smooth transition models. r pkg("midasr"): MIDAS regression and other econometric methods for mixed frequency time series data analysis. r pkg("gets"): GEneral-To-Specific (GETS) model selection for either ARX models with log-ARCH-X errors, or a log-ARCH-X model of the log variance. r pkg("bimets"): econometric modeling of time series data using flexible specifications of simultaneous equation models. r pkg("dlsem"): distributed-lag linear structural equation models. r pkg("lpirfs"): local projections impulse response functions. r pkg("apt"): asymmetric price transmission models.
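The base R time series tools named in the list above can be tried out on data sets that ship with R. The choice of AirPassengers and lh, the AR(1) order, and the three-term moving average are illustrative assumptions only.

```r
# "ts" objects carry their own time index: start, end, and frequency
frequency(AirPassengers)   # monthly data
start(AirPassengers)

# Classical decomposition into trend, seasonal, and remainder components
dec <- decompose(AirPassengers, type = "multiplicative")

# AR model with order selected by AIC, and an explicit AR(1) via arima()
ar_fit    <- ar(lh)
arima_fit <- arima(lh, order = c(1, 0, 0))
coef(arima_fit)

# Forecast the next 5 observations (predictions plus standard errors)
fc <- predict(arima_fit, n.ahead = 5)
fc$pred

# Centered moving average via filter(); stats:: avoids clashes with
# other packages that also export a function called filter()
ma3 <- stats::filter(lh, rep(1 / 3, 3))
```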
- Textbooks and journals: Packages r pkg("AER"), r pkg("Ecdat"), and r pkg("wooldridge") contain comprehensive collections of data sets from various standard econometric textbooks (including Greene, Stock & Watson, Wooldridge, Baltagi, among others) as well as several data sets from the Journal of Applied Econometrics and the Journal of Business & Economic Statistics data archives. r pkg("AER") and r pkg("wooldridge") additionally provide extensive sets of examples reproducing analyses from the textbooks/papers, illustrating various econometric methods. In r pkg("pder") a wide collection of data sets for "Panel Data Econometrics with R" (Croissant & Millo 2018) is available. The r github("ccolonescu/PoEdata") package on GitHub provides the data sets from "Principles of Econometrics" (4th ed, by Hill, Griffiths, and Lim 2011).
- Canadian monetary aggregates: r pkg("CDNmoney").
- Penn World Table: r pkg("pwt") provides versions 5.6, 6.x, 7.x. Version 8.x and 9.x data are available in r pkg("pwt8") and r pkg("pwt9"), respectively.
- Time series and forecasting data: The packages r pkg("expsmooth"), r pkg("fma"), and r pkg("Mcomp") are data packages with time series data from the books "Forecasting with Exponential Smoothing: The State Space Approach" (Hyndman, Koehler, Ord, Snyder, 2008, Springer) and "Forecasting: Methods and Applications" (Makridakis, Wheelwright, Hyndman, 3rd ed., 1998, Wiley) and the M-competitions, respectively.
- Empirical Research in Economics: Package r pkg("erer") contains functions and datasets for the book "Empirical Research in Economics: Growing up with R" (Sun 2015).
- Panel Study of Income Dynamics (PSID): r pkg("psidR") can build panel data sets from the Panel Study of Income Dynamics (PSID).
- World Bank data and statistics: The r pkg("wbstats") package provides programmatic access to the World Bank API.
- Model tables: A flexible implementation of side-by-side summary tables for a wide range of statistical models along with corresponding visualizations and data summary tables is provided in r pkg("modelsummary"). Other implementations as well as further utilities for integrating econometric and statistical results in scientific papers etc. are discussed in the r view("ReproducibleResearch") task view.
- Matrix manipulations: As a vector- and matrix-based language, base R ships with many powerful tools for doing matrix manipulations, which are complemented by the packages r pkg("Matrix") and r pkg("SparseM").
- Optimization and mathematical programming: R and many of its contributed packages provide many specialized functions for solving particular optimization problems, e.g., in regression as discussed above. Further functionality for solving more general optimization problems, e.g., likelihood maximization, is discussed in the r view("Optimization") task view.
- Bootstrap: In addition to the recommended r pkg("boot") package, there are some other general bootstrapping techniques available in r pkg("bootstrap") or r pkg("simpleboot") as well as some bootstrap techniques designed for time-series data, such as the maximum entropy bootstrap in r pkg("meboot") or tsbootstrap() from r pkg("tseries"). The r pkg("fwildclusterboot") package provides a fast wild cluster bootstrap implementation for linear regression models, especially when the number of clusters is low.
- Inequality: For measuring inequality, concentration and poverty the package r pkg("ineq") provides some basic tools such as Lorenz curves, Pen's parade, the Gini coefficient, Herfindahl-Hirschman index and many more.
- Structural change: R is particularly strong when dealing with structural changes and changepoints in parametric models, see r pkg("strucchange") and r pkg("segmented").
- Exchange rate regimes: Methods for inference about exchange rate regimes, in particular in a structural change setting, are provided by r pkg("fxregime").
- Global value chains: Tools and decompositions for global value chains are in r pkg("gvc") and r pkg("decompr").
- Regression discontinuity design: A variety of methods are provided in the r pkg("rdd"), r pkg("rdrobust"), and r pkg("rdlocrand") packages. The r pkg("rdpower") package offers power calculations for regression discontinuity designs, and r pkg("rdmulti") implements analysis with multiple cutoffs or scores.
- Gravity models: Estimation of log-log and multiplicative gravity models is available in r pkg("gravity").
- z-Tree: r pkg("zTree") can import data from the z-Tree software for developing and carrying out economic experiments.
- Numerical standard errors: r pkg("nse") implements various numerical standard errors for time series data, especially in simulation experiments with correlated outcome sequences.
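As an example of the general-purpose tools listed above, the recommended boot package can be used for a simple nonparametric (pairs) bootstrap of a regression slope. The model, the mtcars data, and the number of replications are assumptions made for illustration.

```r
library(boot)  # recommended package shipped with R

# Statistic: OLS slope recomputed on each resampled set of rows
slope_fun <- function(data, idx) {
  coef(lm(mpg ~ wt, data = data[idx, ]))[["wt"]]
}

set.seed(99)
bs <- boot(data = mtcars, statistic = slope_fun, R = 999)

bs$t0                       # slope on the original data
sd(bs$t[, 1])               # bootstrap standard error
boot.ci(bs, type = "perc")  # percentile confidence interval
```

Resampling whole rows in this way is appropriate for independent observations; clustered or time-series data call for the specialized bootstrap variants mentioned in the list.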
- Articles: Special Volume on "Econometrics in R" in JSS (2008)
- Book: Applied Econometrics with R (Kleiber & Zeileis; 2008)
- Book: Introduction to Econometrics with R (Hanck, Arnold, Gerber, & Schmelzer; 2021)
- Book: Introduction to Econometrics with R (Oswald, Robin, & Viers; 2020)
- Book: Causal Inference: The Mixtape (Cunningham; 2021)
- Book: Hands-On Intermediate Econometrics Using R (Vinod; 2008)
- Book: Learning Microeconometrics with R (Adams; 2021)
- Book: Panel Data Econometrics with R (Croissant & Millo; 2018)
- Book: Principles of Econometrics with R (Colonescu; 2016)
- Book: Spatial Econometrics (Kelejian & Piras; 2017)
- Book: Statistical Inference via Data Science (Ismay & Kim; 2022)
- Book: The Effect (Huntington-Klein; 2022)
- Book: Using R for Introductory Econometrics (Heiss; 2019)
- Course: Applied Empirical Methods (Goldsmith-Pinkham; 2021)
- Course: Data Science for Economists (McDermott; 2021)
- Course: Econometrics In-Class Labs (Ransom; 2021)
- Course: Introduction to Econometrics (Rubin; 2021)
- Course: PhD Econometrics (Rubin; 2022)
- Course: Program Evaluation for Public Service (Heiss; 2022)
- Course: Statistical Rethinking (McElreath; 2022)
- Website: Stata2R