Development #66

Merged
merged 26 commits on Jul 5, 2024
Changes from all commits
Commits
26 commits
dbb101d
Replaced cross validation with log heuristic to select number of neurons
dscolby Jun 22, 2024
111c8b1
Merge pull request #63 from dscolby/hotfix-v0.6.1
dscolby Jun 22, 2024
2252b14
Updated release notes
dscolby Jun 22, 2024
d942210
Merge pull request #64 from dscolby/hotfix-v0.6.1
dscolby Jun 22, 2024
a1f7840
Made inference optional
dscolby Jun 23, 2024
98a4bd1
Checking if issue is local
dscolby Jun 26, 2024
e486c60
Implemented ELM ensembles with bagging
dscolby Jun 28, 2024
fe540c0
Fixed double machine learning estimation
dscolby Jun 30, 2024
da335b1
Added multithreading for generating null distributions
dscolby Jun 30, 2024
e905506
Removed redundant W argument
dscolby Jun 30, 2024
03eb8aa
Changed default number of features for estimators
dscolby Jul 1, 2024
28d0a0a
Moved generate_folds to utilities.jl
dscolby Jul 1, 2024
c95d6ff
Fixed R-learning
dscolby Jul 1, 2024
2ee91b8
Implemented probabilistic predictions for binary outcomes
dscolby Jul 1, 2024
4c45278
Made better keys for dictionary returned by counterfactual_consistency
dscolby Jul 2, 2024
c97ea4e
Fixed R-learning again
dscolby Jul 2, 2024
c6139e1
Shuffled data in DML, DRE, and RLearner constructors
dscolby Jul 2, 2024
84008b5
Made swish the default activation function
dscolby Jul 3, 2024
bfc9795
Changed the default number of machines to 50
dscolby Jul 4, 2024
cb0845f
Changed how noise is calculated to test counterfactual consistency
dscolby Jul 4, 2024
2c34658
Added parallel execution to calculate null distributions
dscolby Jul 4, 2024
2c2adb7
Added multithreading in counterfactual_consistency
dscolby Jul 5, 2024
fe62069
Cleaned up docs
dscolby Jul 5, 2024
5803950
Fixed documentation
dscolby Jul 5, 2024
d39f14d
Updated version
dscolby Jul 5, 2024
2df0032
Tested persistent tasks
dscolby Jul 5, 2024
8 changes: 4 additions & 4 deletions Project.toml
@@ -1,20 +1,20 @@
name = "CausalELM"
uuid = "26abab4e-b12e-45db-9809-c199ca6ddca8"
authors = ["Darren Colby <[email protected]> and contributors"]
version = "0.6"
version = "0.7.0"

[deps]
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[compat]
LinearAlgebra = "1.7"
Random = "1.7"
julia = "1.7"
Aqua = "0.8"
DataFrames = "1.5"
Documenter = "1.2"
LinearAlgebra = "1.7"
Random = "1.7"
Test = "1.7"
julia = "1.7"

[extras]
Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
Expand Down
34 changes: 18 additions & 16 deletions README.md
Expand Up @@ -41,11 +41,11 @@ series analysis, G-computation, and double machine learning; average treatment e
treated (ATT) with G-computation; cumulative treatment effect with interrupted time series
analysis; and the conditional average treatment effect (CATE) via S-learning, T-learning,
X-learning, R-learning, and doubly robust estimation. Underlying all of these estimators are
extreme learning machines, a simple neural network that uses randomized weights instead of
using gradient descent. Once a model has been estimated, CausalELM can summarize the model,
including computing p-values via randomization inference, and conduct sensitivity analysis
to validate the plausibility of modeling assumptions. Furthermore, all of this can be done
in four lines of code.
ensembles of extreme learning machines, a simple neural network that uses randomized weights
and least squares optimization instead of gradient descent. Once a model has been estimated,
CausalELM can summarize the model and conduct sensitivity analysis to validate the
plausibility of modeling assumptions. Furthermore, all of this can be done in four lines of
code.
</p>

<h2>Extreme Learning Machines and Causal Inference</h2>
Expand Down Expand Up @@ -73,37 +73,39 @@ to adjust the initial estimates. This approach has three advantages. First, it i
efficient with high dimensional data than conventional methods. Metalearners take a similar
approach to estimate the CATE. While all of these models are different, they have one thing
in common: how well they perform depends on the underlying model they fit to the data. To
that end, CausalELMs use extreme learning machines because they are simple yet flexible
enough to be universal function approximators.
that end, CausalELMs use bagged ensembles of extreme learning machines because they are
simple yet flexible enough to be universal function approximators with lower variance than
single extreme learning machines.
</p>
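<p>
For intuition, the sketch below shows a single extreme learning machine and a bagged ensemble of them in a few lines of Julia. This is a conceptual illustration rather than CausalELM's internal implementation; the function names are made up, and the swish activation and the 50-machine default simply mirror the description above.
</p>

```julia
using LinearAlgebra, Random

swish(x) = x / (1 + exp(-x))

# One extreme learning machine: random hidden weights, output weights fit by least squares
function fit_elm(X, y; neurons=32, activation=swish)
    W = randn(size(X, 2), neurons)   # random input-to-hidden weights (never trained)
    b = randn(1, neurons)            # random hidden biases
    H = activation.(X * W .+ b)      # random nonlinear features
    β = H \ y                        # least squares solve for the output weights
    return x -> activation.(x * W .+ b) * β
end

# A bagged ensemble averages ELMs fit to bootstrap samples, which lowers variance
function fit_bagged_elms(X, y; machines=50, kwargs...)
    n = size(X, 1)
    models = map(1:machines) do _
        idx = rand(1:n, n)           # bootstrap sample of the rows
        fit_elm(X[idx, :], y[idx]; kwargs...)
    end
    return x -> sum(m(x) for m in models) / machines
end

X, y = rand(100, 5), rand(100)
ensemble = fit_bagged_elms(X, y)
ŷ = ensemble(X)                      # in-sample predictions
```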

<h2>CausalELM Features</h2>
<ul>
<li>Estimate a causal effect, get a summary, and validate assumptions in just four lines of code</li>
<li>All models automatically select the best number of neurons and L2 penalty</li>
<li>Bagging improves performance and reduces variance without the need to tune a regularization parameter</li>
<li>Enables using the same structs for regression and classification</li>
<li>Includes 13 activation functions and allows user-defined activation functions</li>
<li>Most inference and validation tests do not assume functional or distributional forms</li>
<li>Implements the latest techniques from statistics, econometrics, and biostatistics</li>
<li>Works out of the box with DataFrames or arrays</li>
<li>Works out of the box with arrays or any data structure that implements the Tables.jl interface</li>
<li>Codebase is high-quality, well tested, and regularly updated</li>
</ul>

<h2>What's New?</h2>
<ul>
<li>Now includes doubly robust estimator for CATE estimation</li>
<li>Uses generalized cross validation with successive halving to find the best ridge penalty</li>
<li>Double machine learning, R-learning, and doubly robust estimators support specifying confounders and covariates of interest separately</li>
<li>Counterfactual consistency validation simulates outcomes that violate the assumption rather than the previous binning approach</li>
<li>Standardized and improved docstrings and added doctests</li>
<li>All estimators now implement bagging to improve predictive performance and reduce variance</li>
<li>Counterfactual consistency validation simulates more realistic violations of the counterfactual consistency assumption</li>
<li>Uses a simple heuristic to choose the number of neurons, which reduces training time and still works well in practice</li>
<li>Probability clipping for classifier predictions and residuals is no longer necessary due to the bagging procedure</li>
<li>CausalELM talk has been accepted to JuliaCon 2024!</li>
</ul>

<h2>What's Next?</h2>
<p>
Newer versions of CausalELM will hopefully support using GPUs and provide textual
interpretations of the results of calling validate on a model that has been estimated.
However, these priorities could also change depending on feedback received at JuliaCon.
Newer versions of CausalELM will hopefully support using GPUs and provide interpretations of
the results of calling validate on a model that has been estimated. In addition, some
estimators will also support using instrumental variables. However, these priorities could
also change depending on feedback received at JuliaCon.
</p>

<h2>Disclaimer</h2>
Expand Down
27 changes: 6 additions & 21 deletions docs/src/api.md
@@ -1,7 +1,7 @@
# CausalELM
Most of the methods and structs here are private and not exported; they should not be called
by the user and are documented only to aid in developing CausalELM or understanding the
implementation.
```@docs
CausalELM.CausalELM
```

## Types
```@docs
Expand All @@ -15,9 +15,8 @@ RLearner
DoublyRobustLearner
CausalELM.CausalEstimator
CausalELM.Metalearner
CausalELM.ExtremeLearningMachine
CausalELM.ExtremeLearner
CausalELM.RegularizedExtremeLearner
CausalELM.ELMEnsemble
CausalELM.Nonbinary
CausalELM.Binary
CausalELM.Count
Expand All @@ -41,28 +40,15 @@ elish
fourier
```

## Cross Validation
```@docs
CausalELM.generate_folds
CausalELM.generate_temporal_folds
CausalELM.validation_loss
CausalELM.cross_validate
CausalELM.best_size
CausalELM.shuffle_data
```

## Average Causal Effect Estimators
```@docs
CausalELM.g_formula!
CausalELM.causal_loss!
CausalELM.predict_residuals
CausalELM.make_folds
CausalELM.moving_average
```

## Metalearners
```@docs
CausalELM.causal_loss
CausalELM.doubly_robust_formula!
CausalELM.stage1!
CausalELM.stage2!
Expand Down Expand Up @@ -94,7 +80,6 @@ CausalELM.e_value
CausalELM.binarize
CausalELM.risk_ratio
CausalELM.positivity
CausalELM.var_type
```

## Validation Metrics
Expand All @@ -114,17 +99,17 @@ CausalELM.fit!
CausalELM.predict
CausalELM.predict_counterfactual!
CausalELM.placebo_test
CausalELM.ridge_constant
CausalELM.set_weights_biases
```

## Utility Functions
```@docs
CausalELM.var_type
CausalELM.mean
CausalELM.var
CausalELM.one_hot_encode
CausalELM.clip_if_binary
CausalELM.@model_config
CausalELM.@standard_input_data
CausalELM.@double_learner_input_data
CausalELM.generate_folds
```
10 changes: 5 additions & 5 deletions docs/src/contributing.md
Expand Up @@ -27,15 +27,15 @@ code follows the guidelines below.

* Most new structs for estimating causal effects should have mostly the same fields. To
reduce the burden of repeatedly defining all these fields, it is advisable to use the
model_config, standard_input_data, and double_learner_input_data macros to
programmatically generate fields for new structs. Doing so will ensure that with little
to no effort the new structs will work with the summarize and validate methods.
model_config and standard_input_data macros to programmatically generate fields for new
structs. Doing so will ensure that with little to no effort the new structs will work
with the summarize and validate methods.

* There are no repeated code blocks. If there are repeated code blocks, then they should be
consolidated into a separate function.

* Methods should generally include types and be type stable. If there is a strong reason
to deviate from this point, there should be a comment in the code explaining why.
* Internal methods can contain types and be parametric, but public methods should be as
general as possible.

* Minimize use of new constants and macros. If they must be included, the reason for their
inclusion should be obvious or included in the docstring.
Expand Down
84 changes: 31 additions & 53 deletions docs/src/guide/doublemachinelearning.md
Expand Up @@ -4,13 +4,8 @@ estimating causal effects when the dimensionality of the covariates is too high
regression or the treatment or outcomes cannot be easily modeled parametrically. Double
machine learning estimates models of the treatment assignment and outcome and then combines
them in a final model. This is a semiparametric model in the sense that the first stage
models can take on any functional form but the final stage model is linear.

!!! note
If regularized is set to true then the ridge penalty will be estimated using generalized
cross validation where the maximum number of iterations is 2 * folds for the successive
halving procedure. However, if the penalty in on iteration is approximately the same as in
the previous penalty, then the procedure will stop early.
models can take on any functional form but the final stage model is a linear combination of
the residuals from the first stage models.
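To make the partialling-out idea concrete, here is a minimal sketch of the final-stage regression on first-stage residuals. It is purely illustrative: the first stages below are ordinary least squares stand-ins, whereas CausalELM fits them with cross-fitted ensembles of extreme learning machines.

```julia
# Residual-on-residual (partialling-out) sketch with made-up data; the first-stage
# fits are plain least squares stand-ins, not CausalELM's ELM ensembles.
X, T, Y = rand(500, 5), rand(500), rand(500)

t_res = T .- X * (X \ T)   # treatment residualized on the covariates
y_res = Y .- X * (X \ Y)   # outcome residualized on the covariates

θ = (t_res' * y_res) / (t_res' * t_res)   # final-stage linear coefficient
```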

!!! note
For more information see:
Expand All @@ -19,70 +14,53 @@ models can take on any functional form but the final stage model is linear.
Whitney Newey, and James Robins. "Double/debiased machine learning for treatment and
structural parameters." (2018): C1-C68.


## Step 1: Initialize a Model
The DoubleMachineLearning constructor takes at least three arguments, an array of
covariates, a treatment vector, and an outcome vector. This estimator supports binary, count,
or continuous treatments and binary, count, continuous, or time to event outcomes. You can
also specify confounders that you do not want to estimate the CATE for by passing a parameter
to the W argument. Otherwise, the model assumes all possible confounders are contained in X.
The DoubleMachineLearning constructor takes at least three arguments—covariates, treatment
statuses, and outcomes, all of which may be either an array or any struct that
implements the Tables.jl interface (e.g. DataFrames). This estimator supports binary, count,
or continuous treatments and binary, count, continuous, or time to event outcomes.

!!! note
Internally, the outcome and treatment models are treated as a regression since extreme
learning machines minimize the MSE. This means that predicted treatments and outcomes
under treatment and control groups could fall outside [0, 1], although this is not likely
in practice. To deal with this, predicted binary variables are automatically clipped to
[0.0000001, 0.9999999]. This also means that count outcomes will be predicted as continuous
variables.
Non-binary categorical outcomes are treated as continuous.

!!! tip
You can also specify the following options: whether the treatment vector is categorical ie
not continuous and containing more than two classes, whether to use L2 regularization, the
activation function, the validation metric to use when searching for the best number of
neurons, the minimum and maximum number of neurons to consider, the number of folds to use
for cross validation, the number of iterations to perform cross validation, and the number
of neurons to use in the ELM used to learn the function from number of neurons to validation
loss. These arguments are specified with the following keyword arguments: t\_cat,
regularized, activation, validation\_metric, min\_neurons, max\_neurons, folds, iterations,
and approximator\_neurons.
You can also specify the number of folds to use for cross-fitting, the number of
extreme learning machines to incorporate in the ensemble, the number of features to
consider for each extreme learning machine, the activation function to use, the number
of observations to bootstrap in each extreme learning machine, and the number of neurons
in each extreme learning machine. These arguments are specified with the folds,
num_machines, num_features, activation, sample_size, and num\_neurons keywords.

```julia
# Create some data with a binary treatment
X, T, Y, W = rand(100, 5), [rand()<0.4 for i in 1:100], rand(100), rand(100, 4)

# We could also use DataFrames
# We could also use DataFrames or any other package implementing the Tables.jl API
# using DataFrames
# X = DataFrame(x1=rand(100), x2=rand(100), x3=rand(100), x4=rand(100), x5=rand(100))
# T, Y = DataFrame(t=[rand()<0.4 for i in 1:100]), DataFrame(y=rand(100))
# W = DataFrame(w1=rand(100), w2=rand(100), w3=rand(100), w4=rand(100))

# W is optional and means there are confounders that you are not interested in estimating
# the CATE for
dml = DoubleMachineLearning(X, T, Y, W=W)
dml = DoubleMachineLearning(X, T, Y)
```
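
The example below shows how the keyword arguments listed in the tip above might be supplied. The keyword names come from that tip, but the values, and the use of the swish activation function, are arbitrary illustrations rather than recommended settings.

```julia
# X, T, and Y as created above; every value here is illustrative, not a default
dml_custom = DoubleMachineLearning(X, T, Y; folds=5, num_machines=100, num_features=3,
                                   activation=swish, sample_size=75, num_neurons=16)
```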

## Step 2: Estimate the Causal Effect
To estimate the causal effect, we call estimatecausaleffect! on the model above.
To estimate the causal effect, we call estimate_causal_effect! on the model above.
```julia
# we could also estimate the ATT by passing quantity_of_interest="ATT"
estimate_causal_effect!(dml)
```

## Step 3: Get a Summary
We can get a summary that includes a p-value and standard error estimated via asymptotic
randomization inference by passing our model to the summarize method.

Calling the summarize method returns a dictionary with the estimator's task (regression or
classification), the quantity of interest being estimated (ATE), whether the model uses an
L2 penalty (always true for DML), the activation function used in the model's outcome
predictors, whether the data is temporal (always false for DML), the validation metric used
for cross validation to find the best number of neurons, the number of neurons used in the
ELMs used by the estimator, the number of neurons used in the ELM used to learn a mapping
from number of neurons to validation loss during cross validation, the causal effect,
standard error, and p-value.
We can get a summary of the model by passing the model to the summarize method.

!!! note
To calculate the p-value and standard error for the treatment effect, you can set the
inference argument to true. However, p-values and standard errors are calculated via
randomization inference, which can take a long time but can be sped up by launching
Julia with a higher number of threads.

```julia
# Can also use the British spelling
# summarise(dml)

summarize(dml)
```
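
To get p-values and standard errors, the call might look like the sketch below. This assumes inference is exposed as a keyword argument of summarize, as the note above implies, so treat it as a sketch rather than a definitive signature.

```julia
# Assumes `inference` is a keyword argument; randomization inference is slow, so
# consider launching Julia with more threads, e.g. julia --threads=auto
summarize(dml, inference=true)
```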

Expand All @@ -94,12 +72,12 @@ tests do not provide definitive evidence of a violation of these assumptions. To
counterfactual consistency assumption, we simulate counterfactual outcomes that are
different from the observed outcomes, estimate models with the simulated counterfactual
outcomes, and take the averages. If the outcome is continuous, the noise for the simulated
counterfactuals is drawn from N(0, dev) for each element in devs, otherwise the default is
0.25, 0.5, 0.75, and 1.0 standard deviations from the mean outcome. For discrete variables,
each outcome is replaced with a different value in the range of outcomes with probability ϵ
for each ϵ in devs, otherwise the default is 0.025, 0.05, 0.075, 0.1. If the average
estimate for a given level of violation differs greatly from the effect estimated on the
actual data, then the model is very sensitive to violations of the counterfactual
counterfactuals is drawn from N(0, dev) for each element in devs and each outcome,
multiplied by the original outcome, and added to the original outcome. For discrete
variables, each outcome is replaced with a different value in the range of outcomes with
probability ϵ for each ϵ in devs, otherwise the default is 0.025, 0.05, 0.075, 0.1. If the
average estimate for a given level of violation differs greatly from the effect estimated on
the actual data, then the model is very sensitive to violations of the counterfactual
consistency assumption for that level of violation. Next, this method tests the model's
sensitivity to a violation of the exchangeability assumption by calculating the E-value,
which is the minimum strength of association, on the risk ratio scale, that an unobserved
Expand Down
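
As a rough illustration of the continuous-outcome procedure described above, the sketch below simulates violations at a few example noise levels. It is not the actual validate implementation, and the levels shown are only examples.

```julia
# For each dev, draw N(0, dev) noise, scale it by the observed outcome, and add it
# back to get a simulated counterfactual outcome vector (conceptual sketch only).
function simulate_violations(y::AbstractVector, devs)
    return [y .+ y .* (randn(length(y)) .* dev) for dev in devs]
end

y = rand(100)
simulated_outcomes = simulate_violations(y, (0.25, 0.5, 0.75, 1.0))  # example levels
```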
20 changes: 9 additions & 11 deletions docs/src/guide/estimatorselection.md
Expand Up @@ -5,15 +5,13 @@ given dataset and causal question.

| Model | Struct | Causal Estimands | Supported Treatment Types | Supported Outcome Types |
|----------------------------------|-----------------------|----------------------------------|---------------------------|------------------------------------------|
| Interrupted Time Series Analysis | InterruptedTimeSeries | ATE, Cumulative Treatment Effect | Binary | Continuous, Count[^2], Time to Event |
| G-computation | GComputation | ATE, ATT, ITT | Binary | Binary[^1],Continuous, Time to Event, Count[^2] |
| Double Machine Learning | DoubleMachineLearning | ATE | Binary[^1], Count[^2], Continuous | Binary[^1], Count[^2], Continuous, Time to Event |
| S-learning | SLearner | CATE | Binary | Binary[^1], Continuous, Time to Event, Count[^2] |
| T-learning | TLearner | CATE | Binary | Binary[^1], Continuous, Count[^2], Time to Event |
| X-learning | XLearner | CATE | Binary[^1] | Binary[^1], Continuous, Count[^2], Time to Event |
| R-learning | RLearner | CATE | Binary[^1], Count[^2], Continuous | Binary[^1], Count[^2], Continuous, Time to Event |
| Doubly Robust Estimation | DoublyRobustLearner | CATE | Binary | Binary[^1], Continuous, Count[^2], Time to Event |
| Interrupted Time Series Analysis | InterruptedTimeSeries | ATE, Cumulative Treatment Effect | Binary | Continuous, Count[^1], Time to Event |
| G-computation | GComputation | ATE, ATT, ITT | Binary | Binary,Continuous, Time to Event, Count[^1] |
| Double Machine Learning | DoubleMachineLearning | ATE | Binary, Count[^1], Continuous | Binary, Count[^1], Continuous, Time to Event |
| S-learning | SLearner | CATE | Binary | Binary, Continuous, Time to Event, Count[^1] |
| T-learning | TLearner | CATE | Binary | Binary, Continuous, Count[^1], Time to Event |
| X-learning | XLearner | CATE | Binary | Binary, Continuous, Count[^1], Time to Event |
| R-learning | RLearner | CATE | Binary, Count[^1], Continuous | Binary, Count[^1], Continuous, Time to Event |
| Doubly Robust Estimation | DoublyRobustLearner | CATE | Binary | Binary, Continuous, Count[^1], Time to Event |

[^1]: Models that use propensity scores or predict binary treatment assignment may, on very rare occasions, return values outside of [0, 1]. In that case, values are clipped to be between 0.0000001 and 0.9999999.

[^2]: Similar to other packages, predictions of count variables is treated as a continuous regression task.
[^1]: Similar to other packages, predictions of count variables are treated as a continuous regression task.