Development #66

Merged
merged 26 commits on Jul 5, 2024
Changes from all commits
Commits
26 commits
dbb101d
Replaced cross validation with log heuristic to select number of neurons
dscolby Jun 22, 2024
111c8b1
Merge pull request #63 from dscolby/hotfix-v0.6.1
dscolby Jun 22, 2024
2252b14
Updated release notes
dscolby Jun 22, 2024
d942210
Merge pull request #64 from dscolby/hotfix-v0.6.1
dscolby Jun 22, 2024
a1f7840
Made inference optional
dscolby Jun 23, 2024
98a4bd1
Checking if issue is local
dscolby Jun 26, 2024
e486c60
Implemented ELM ensembles with bagging
dscolby Jun 28, 2024
fe540c0
Fixed double machine learning estimation
dscolby Jun 30, 2024
da335b1
Added multithreading for generating null distributions
dscolby Jun 30, 2024
e905506
Removed redundant W argument
dscolby Jun 30, 2024
03eb8aa
Changed default number of features for estimators
dscolby Jul 1, 2024
28d0a0a
Moved generate_folds to utilities.jl
dscolby Jul 1, 2024
c95d6ff
Fixed R-learning
dscolby Jul 1, 2024
2ee91b8
Implemented probabilistic predictions for binary outcomes
dscolby Jul 1, 2024
4c45278
Made better keys for dictionary returned by counterfactual_consistency
dscolby Jul 2, 2024
c97ea4e
Fixed R-learning again
dscolby Jul 2, 2024
c6139e1
Shuffled data in DML, DRE, and RLearner constructors
dscolby Jul 2, 2024
84008b5
Made swish the default activation function
dscolby Jul 3, 2024
bfc9795
Changed the default number of machines to 50
dscolby Jul 4, 2024
cb0845f
Changed how noise is calculated to test counterfactual consistency
dscolby Jul 4, 2024
2c34658
Added parallel execution to calculate null distributions
dscolby Jul 4, 2024
2c2adb7
Added multithreading in counterfactual_consistency
dscolby Jul 5, 2024
fe62069
Cleaned up docs
dscolby Jul 5, 2024
5803950
Fixed documentation
dscolby Jul 5, 2024
d39f14d
Updated version
dscolby Jul 5, 2024
2df0032
Tested persistent tasks
dscolby Jul 5, 2024
8 changes: 4 additions & 4 deletions Project.toml
@@ -1,20 +1,20 @@
name = "CausalELM"
uuid = "26abab4e-b12e-45db-9809-c199ca6ddca8"
authors = ["Darren Colby <[email protected]> and contributors"]
version = "0.6"
version = "0.7.0"

[deps]
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[compat]
LinearAlgebra = "1.7"
Random = "1.7"
julia = "1.7"
Aqua = "0.8"
DataFrames = "1.5"
Documenter = "1.2"
LinearAlgebra = "1.7"
Random = "1.7"
Test = "1.7"
julia = "1.7"

[extras]
Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
Expand Down
34 changes: 18 additions & 16 deletions README.md
Expand Up @@ -41,11 +41,11 @@ series analysis, G-computation, and double machine learning; average treatment e
treated (ATT) with G-computation; cumulative treatment effect with interrupted time series
analysis; and the conditional average treatment effect (CATE) via S-learning, T-learning,
X-learning, R-learning, and doubly robust estimation. Underlying all of these estimators are
extreme learning machines, a simple neural network that uses randomized weights instead of
using gradient descent. Once a model has been estimated, CausalELM can summarize the model,
including computing p-values via randomization inference, and conduct sensitivity analysis
to validate the plausibility of modeling assumptions. Furthermore, all of this can be done
in four lines of code.
ensembles of extreme learning machines, a simple neural network that uses randomized weights
and least squares optimization instead of gradient descent. Once a model has been estimated,
CausalELM can summarize the model and conduct sensitivity analysis to validate the
plausibility of modeling assumptions. Furthermore, all of this can be done in four lines of
code.
</p>

<h2>Extreme Learning Machines and Causal Inference</h2>
Expand Down Expand Up @@ -73,37 +73,39 @@ to adjust the initial estimates. This approach has three advantages. First, it i
efficient with high dimensional data than conventional methods. Metalearners take a similar
approach to estimate the CATE. While all of these models are different, they have one thing
in common: how well they perform depends on the underlying model they fit to the data. To
that end, CausalELMs use extreme learning machines because they are simple yet flexible
enough to be universal function approximators.
that end, CausalELMs use bagged ensembles of extreme learning machines because they are
simple yet flexible enough to be universal function approximators with lower variance than
single extreme learning machines.
</p>
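<p>
For intuition, the sketch below shows a single extreme learning machine and a bagged ensemble of them in a few lines of Julia. This is a conceptual illustration rather than CausalELM's internal implementation; the function names are made up, and the swish activation and the 50-machine default simply mirror the description above.
</p>

```julia
using LinearAlgebra, Random

swish(x) = x / (1 + exp(-x))

# One extreme learning machine: random hidden weights, output weights fit by least squares
function fit_elm(X, y; neurons=32, activation=swish)
    W = randn(size(X, 2), neurons)   # random input-to-hidden weights (never trained)
    b = randn(1, neurons)            # random hidden biases
    H = activation.(X * W .+ b)      # random nonlinear features
    β = H \ y                        # least squares solve for the output weights
    return x -> activation.(x * W .+ b) * β
end

# A bagged ensemble averages ELMs fit to bootstrap samples, which lowers variance
function fit_bagged_elms(X, y; machines=50, kwargs...)
    n = size(X, 1)
    models = map(1:machines) do _
        idx = rand(1:n, n)           # bootstrap sample of the rows
        fit_elm(X[idx, :], y[idx]; kwargs...)
    end
    return x -> sum(m(x) for m in models) / machines
end

X, y = rand(100, 5), rand(100)
ensemble = fit_bagged_elms(X, y)
ŷ = ensemble(X)                      # in-sample predictions
```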

<h2>CausalELM Features</h2>
<ul>
<li>Estimate a causal effect, get a summary, and validate assumptions in just four lines of code</li>
<li>All models automatically select the best number of neurons and L2 penalty</li>
<li>Bagging improves performance and reduces variance without the need to tune a regularization parameter</li>
<li>Enables using the same structs for regression and classification</li>
<li>Includes 13 activation functions and allows user-defined activation functions</li>
<li>Most inference and validation tests do not assume functional or distributional forms</li>
<li>Implements the latest techniques from statistics, econometrics, and biostatistics</li>
<li>Works out of the box with DataFrames or arrays</li>
<li>Works out of the box with arrays or any data structure that implements the Tables.jl interface</li>
<li>Codebase is high-quality, well tested, and regularly updated</li>
</ul>

<h2>What's New?</h2>
<ul>
<li>Now includes doubly robust estimator for CATE estimation</li>
<li>Uses generalized cross validation with successive halving to find the best ridge penalty</li>
<li>Double machine learning, R-learning, and doubly robust estimators support specifying confounders and covariates of interest separately</li>
<li>Counterfactual consistency validation simulates outcomes that violate the assumption rather than the previous binning approach</li>
<li>Standardized and improved docstrings and added doctests</li>
<li>All estimators now implement bagging to improve predictive performance and reduce variance</li>
<li>Counterfactual consistency validation simulates more realistic violations of the counterfactual consistency assumption</li>
<li>Uses a simple heuristic to choose the number of neurons, which reduces training time and still works well in practice</li>
<li>Probability clipping for classifier predictions and residuals is no longer necessary due to the bagging procedure</li>
<li>CausalELM talk has been accepted to JuliaCon 2024!</li>
</ul>

<h2>What's Next?</h2>
<p>
Newer versions of CausalELM will hopefully support using GPUs and provide textual
interpretations of the results of calling validate on a model that has been estimated.
However, these priorities could also change depending on feedback received at JuliaCon.
Newer versions of CausalELM will hopefully support using GPUs and provide interpretations of
the results of calling validate on a model that has been estimated. In addition, some
estimators will also support using instrumental variables. However, these priorities could
also change depending on feedback received at JuliaCon.
</p>

<h2>Disclaimer</h2>
Expand Down
27 changes: 6 additions & 21 deletions docs/src/api.md
@@ -1,7 +1,7 @@
# CausalELM
Most of the methods and structs here are private and not exported; they should not be called
by the user and are documented only to aid in developing CausalELM or understanding the
implementation.
```@docs
CausalELM.CausalELM
```

## Types
```@docs
Expand All @@ -15,9 +15,8 @@ RLearner
DoublyRobustLearner
CausalELM.CausalEstimator
CausalELM.Metalearner
CausalELM.ExtremeLearningMachine
CausalELM.ExtremeLearner
CausalELM.RegularizedExtremeLearner
CausalELM.ELMEnsemble
CausalELM.Nonbinary
CausalELM.Binary
CausalELM.Count
Expand All @@ -41,28 +40,15 @@ elish
fourier
```

## Cross Validation
```@docs
CausalELM.generate_folds
CausalELM.generate_temporal_folds
CausalELM.validation_loss
CausalELM.cross_validate
CausalELM.best_size
CausalELM.shuffle_data
```

## Average Causal Effect Estimators
```@docs
CausalELM.g_formula!
CausalELM.causal_loss!
CausalELM.predict_residuals
CausalELM.make_folds
CausalELM.moving_average
```

## Metalearners
```@docs
CausalELM.causal_loss
CausalELM.doubly_robust_formula!
CausalELM.stage1!
CausalELM.stage2!
Expand Down Expand Up @@ -94,7 +80,6 @@ CausalELM.e_value
CausalELM.binarize
CausalELM.risk_ratio
CausalELM.positivity
CausalELM.var_type
```

## Validation Metrics
Expand All @@ -114,17 +99,17 @@ CausalELM.fit!
CausalELM.predict
CausalELM.predict_counterfactual!
CausalELM.placebo_test
CausalELM.ridge_constant
CausalELM.set_weights_biases
```

## Utility Functions
```@docs
CausalELM.var_type
CausalELM.mean
CausalELM.var
CausalELM.one_hot_encode
CausalELM.clip_if_binary
CausalELM.@model_config
CausalELM.@standard_input_data
CausalELM.@double_learner_input_data
CausalELM.generate_folds
```
10 changes: 5 additions & 5 deletions docs/src/contributing.md
Expand Up @@ -27,15 +27,15 @@ code follows the guidelines below.

* Most new structs for estimating causal effects should have mostly the same fields. To
reduce the burden of repeatedly defining all these fields, it is advisable to use the
model_config, standard_input_data, and double_learner_input_data macros to
programmatically generate fields for new structs. Doing so will ensure that with little
to no effort the new structs will work with the summarize and validate methods.
model_config and standard_input_data macros to programmatically generate fields for new
structs. Doing so will ensure that with little to no effort the new structs will work
with the summarize and validate methods.

* There are no repeated code blocks. If there are repeated code blocks, then they should be
consolidated into a separate function.

* Methods should generally include types and be type stable. If there is a strong reason
to deviate from this point, there should be a comment in the code explaining why.
* Internal methods can contain types and be parametric, but public methods should be as
general as possible.

* Minimize use of new constants and macros. If they must be included, the reason for their
inclusion should be obvious or included in the docstring.
Expand Down
84 changes: 31 additions & 53 deletions docs/src/guide/doublemachinelearning.md
Expand Up @@ -4,13 +4,8 @@ estimating causal effects when the dimensionality of the covariates is too high
regression or the treatment or outcomes cannot be easily modeled parametrically. Double
machine learning estimates models of the treatment assignment and outcome and then combines
them in a final model. This is a semiparametric model in the sense that the first stage
models can take on any functional form but the final stage model is linear.

!!! note
If regularized is set to true then the ridge penalty will be estimated using generalized
cross validation where the maximum number of iterations is 2 * folds for the successive
halving procedure. However, if the penalty in on iteration is approximately the same as in
the previous penalty, then the procedure will stop early.
models can take on any functional form but the final stage model is a linear combination of
the residuals from the first stage models.
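To make the partialling-out idea concrete, here is a minimal sketch of the final-stage regression on first-stage residuals. It is purely illustrative: the first stages below are ordinary least squares stand-ins, whereas CausalELM fits them with cross-fitted ensembles of extreme learning machines.

```julia
# Residual-on-residual (partialling-out) sketch with made-up data; the first-stage
# fits are plain least squares stand-ins, not CausalELM's ELM ensembles.
X, T, Y = rand(500, 5), rand(500), rand(500)

t_res = T .- X * (X \ T)   # treatment residualized on the covariates
y_res = Y .- X * (X \ Y)   # outcome residualized on the covariates

θ = (t_res' * y_res) / (t_res' * t_res)   # final-stage linear coefficient
```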

!!! note
For more information see:
Expand All @@ -19,70 +14,53 @@ models can take on any functional form but the final stage model is linear.
Whitney Newey, and James Robins. "Double/debiased machine learning for treatment and
structural parameters." (2018): C1-C68.


## Step 1: Initialize a Model
The DoubleMachineLearning constructor takes at least three arguments, an array of
covariates, a treatment vector, and an outcome vector. This estimator supports binary, count,
or continuous treatments and binary, count, continuous, or time to event outcomes. You can
also specify confounders that you do not want to estimate the CATE for by passing a parameter
to the W argument. Otherwise, the model assumes all possible confounders are contained in X.
The DoubleMachineLearning constructor takes at least three arguments—covariates, treatment
statuses, and outcomes, all of which may be either an array or any struct that
implements the Tables.jl interface (e.g. DataFrames). This estimator supports binary, count,
or continuous treatments and binary, count, continuous, or time to event outcomes.

!!! note
Internally, the outcome and treatment models are treated as a regression since extreme
learning machines minimize the MSE. This means that predicted treatments and outcomes
under treatment and control groups could fall outside [0, 1], although this is not likely
in practice. To deal with this, predicted binary variables are automatically clipped to
[0.0000001, 0.9999999]. This also means that count outcomes will be predicted as continuous
variables.
Non-binary categorical outcomes are treated as continuous.

!!! tip
You can also specify the following options: whether the treatment vector is categorical ie
not continuous and containing more than two classes, whether to use L2 regularization, the
activation function, the validation metric to use when searching for the best number of
neurons, the minimum and maximum number of neurons to consider, the number of folds to use
for cross validation, the number of iterations to perform cross validation, and the number
of neurons to use in the ELM used to learn the function from number of neurons to validation
loss. These arguments are specified with the following keyword arguments: t\_cat,
regularized, activation, validation\_metric, min\_neurons, max\_neurons, folds, iterations,
and approximator\_neurons.
You can also specify the number of folds to use for cross-fitting, the number of
extreme learning machines to incorporate in the ensemble, the number of features to
consider for each extreme learning machine, the activation function to use, the number
of observations to bootstrap in each extreme learning machine, and the number of neurons
in each extreme learning machine. These arguments are specified with the folds,
num_machines, num_features, activation, sample_size, and num\_neurons keywords.

```julia
# Create some data with a binary treatment
X, T, Y, W = rand(100, 5), [rand()<0.4 for i in 1:100], rand(100), rand(100, 4)

# We could also use DataFrames
# We could also use DataFrames or any other package implementing the Tables.jl API
# using DataFrames
# X = DataFrame(x1=rand(100), x2=rand(100), x3=rand(100), x4=rand(100), x5=rand(100))
# T, Y = DataFrame(t=[rand()<0.4 for i in 1:100]), DataFrame(y=rand(100))
# W = DataFrame(w1=rand(100), w2=rand(100), w3=rand(100), w4=rand(100))

# W is optional and means there are confounders that you are not interested in estimating
# the CATE for
dml = DoubleMachineLearning(X, T, Y, W=W)
dml = DoubleMachineLearning(X, T, Y)
```
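
The example below shows how the keyword arguments listed in the tip above might be supplied. The keyword names come from that tip, but the values, and the use of the swish activation function, are arbitrary illustrations rather than recommended settings.

```julia
# X, T, and Y as created above; every value here is illustrative, not a default
dml_custom = DoubleMachineLearning(X, T, Y; folds=5, num_machines=100, num_features=3,
                                   activation=swish, sample_size=75, num_neurons=16)
```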

## Step 2: Estimate the Causal Effect
To estimate the causal effect, we call estimatecausaleffect! on the model above.
To estimate the causal effect, we call estimate_causal_effect! on the model above.
```julia
# we could also estimate the ATT by passing quantity_of_interest="ATT"
estimate_causal_effect!(dml)
```

## Step 3: Get a Summary
We can get a summary that includes a p-value and standard error estimated via asymptotic
randomization inference by passing our model to the summarize method.

Calling the summarize method returns a dictionary with the estimator's task (regression or
classification), the quantity of interest being estimated (ATE), whether the model uses an
L2 penalty (always true for DML), the activation function used in the model's outcome
predictors, whether the data is temporal (always false for DML), the validation metric used
for cross validation to find the best number of neurons, the number of neurons used in the
ELMs used by the estimator, the number of neurons used in the ELM used to learn a mapping
from number of neurons to validation loss during cross validation, the causal effect,
standard error, and p-value.
We can get a summary of the model by passing the model to the summarize method.

!!! note
To calculate the p-value and standard error for the treatment effect, you can set the
inference argument to true. However, p-values and standard errors are calculated via
randomization inference, which can take a long time but can be sped up by launching
Julia with a higher number of threads.

```julia
# Can also use the British spelling
# summarise(dml)

summarize(dml)
```
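
To get p-values and standard errors, the call might look like the sketch below. This assumes inference is exposed as a keyword argument of summarize, as the note above implies, so treat it as a sketch rather than a definitive signature.

```julia
# Assumes `inference` is a keyword argument; randomization inference is slow, so
# consider launching Julia with more threads, e.g. julia --threads=auto
summarize(dml, inference=true)
```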

Expand All @@ -94,12 +72,12 @@ tests do not provide definitive evidence of a violation of these assumptions. To
counterfactual consistency assumption, we simulate counterfactual outcomes that are
different from the observed outcomes, estimate models with the simulated counterfactual
outcomes, and take the averages. If the outcome is continuous, the noise for the simulated
counterfactuals is drawn from N(0, dev) for each element in devs, otherwise the default is
0.25, 0.5, 0.75, and 1.0 standard deviations from the mean outcome. For discrete variables,
each outcome is replaced with a different value in the range of outcomes with probability ϵ
for each ϵ in devs, otherwise the default is 0.025, 0.05, 0.075, 0.1. If the average
estimate for a given level of violation differs greatly from the effect estimated on the
actual data, then the model is very sensitive to violations of the counterfactual
counterfactuals is drawn from N(0, dev) for each element in devs and each outcome,
multiplied by the original outcome, and added to the original outcome. For discrete
variables, each outcome is replaced with a different value in the range of outcomes with
probability ϵ for each ϵ in devs, otherwise the default is 0.025, 0.05, 0.075, 0.1. If the
average estimate for a given level of violation differs greatly from the effect estimated on
the actual data, then the model is very sensitive to violations of the counterfactual
consistency assumption for that level of violation. Next, this method tests the model's
sensitivity to a violation of the exchangeability assumption by calculating the E-value,
which is the minimum strength of association, on the risk ratio scale, that an unobserved
Expand Down
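
As a rough illustration of the continuous-outcome procedure described above, the sketch below simulates violations at a few example noise levels. It is not the actual validate implementation, and the levels shown are only examples.

```julia
# For each dev, draw N(0, dev) noise, scale it by the observed outcome, and add it
# back to get a simulated counterfactual outcome vector (conceptual sketch only).
function simulate_violations(y::AbstractVector, devs)
    return [y .+ y .* (randn(length(y)) .* dev) for dev in devs]
end

y = rand(100)
simulated_outcomes = simulate_violations(y, (0.25, 0.5, 0.75, 1.0))  # example levels
```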
20 changes: 9 additions & 11 deletions docs/src/guide/estimatorselection.md
Expand Up @@ -5,15 +5,13 @@ given dataset and causal question.

| Model | Struct | Causal Estimands | Supported Treatment Types | Supported Outcome Types |
|----------------------------------|-----------------------|----------------------------------|---------------------------|------------------------------------------|
| Interrupted Time Series Analysis | InterruptedTimeSeries | ATE, Cumulative Treatment Effect | Binary | Continuous, Count[^2], Time to Event |
| G-computation | GComputation | ATE, ATT, ITT | Binary | Binary[^1],Continuous, Time to Event, Count[^2] |
| Double Machine Learning | DoubleMachineLearning | ATE | Binary[^1], Count[^2], Continuous | Binary[^1], Count[^2], Continuous, Time to Event |
| S-learning | SLearner | CATE | Binary | Binary[^1], Continuous, Time to Event, Count[^2] |
| T-learning | TLearner | CATE | Binary | Binary[^1], Continuous, Count[^2], Time to Event |
| X-learning | XLearner | CATE | Binary[^1] | Binary[^1], Continuous, Count[^2], Time to Event |
| R-learning | RLearner | CATE | Binary[^1], Count[^2], Continuous | Binary[^1], Count[^2], Continuous, Time to Event |
| Doubly Robust Estimation | DoublyRobustLearner | CATE | Binary | Binary[^1], Continuous, Count[^2], Time to Event |
| Interrupted Time Series Analysis | InterruptedTimeSeries | ATE, Cumulative Treatment Effect | Binary | Continuous, Count[^1], Time to Event |
| G-computation | GComputation | ATE, ATT, ITT | Binary | Binary,Continuous, Time to Event, Count[^1] |
| Double Machine Learning | DoubleMachineLearning | ATE | Binary, Count[^1], Continuous | Binary, Count[^1], Continuous, Time to Event |
| S-learning | SLearner | CATE | Binary | Binary, Continuous, Time to Event, Count[^1] |
| T-learning | TLearner | CATE | Binary | Binary, Continuous, Count[^1], Time to Event |
| X-learning | XLearner | CATE | Binary | Binary, Continuous, Count[^1], Time to Event |
| R-learning | RLearner | CATE | Binary, Count[^1], Continuous | Binary, Count[^1], Continuous, Time to Event |
| Doubly Robust Estimation | DoublyRobustLearner | CATE | Binary | Binary, Continuous, Count[^1], Time to Event |

[^1]: Models that use propensity scores or predict binary treatment assignment may, on very rare occasions, return values outside of [0, 1]. In that case, values are clipped to be between 0.0000001 and 0.9999999.

[^2]: Similar to other packages, predictions of count variables is treated as a continuous regression task.
[^1]: Similar to other packages, predictions of count variables are treated as a continuous regression task.