0.5.0
The changes are summarized below:
Major updates
- Add OPE/OPL with continuous actions
- Add weight clipping to IPW and DR (#115)
- Add automatic hyperparameter tuning of OPE estimators [3] (#116, #131)
- Add arguments to the SyntheticBanditDataset class to generate more flexible synthetic data (#123)
- Add a subsample option to the OpenBanditDataset class (#118)
- Modify the input type of the `off_policy_objective` argument and add some hyperparameters to NNPolicyLearner (#132)
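The weight clipping added to IPW and DR (#115) caps each importance weight at a threshold, trading a small bias for a large variance reduction. A minimal self-contained sketch of a clipped IPW estimate (illustrative only; function and variable names are not obp's actual API):

```python
def clipped_ipw_estimate(rewards, actions, action_dist, pscore, lambda_=10.0):
    """Clipped inverse propensity weighting (IPW) value estimate.

    Illustrative sketch: each importance weight pi_e(a|x) / pi_b(a|x)
    is capped at `lambda_` before weighting the observed reward.
    """
    n = len(rewards)
    total = 0.0
    for i in range(n):
        w = action_dist[i][actions[i]] / pscore[i]  # importance weight
        total += min(w, lambda_) * rewards[i]       # clip, then weight the reward
    return total / n

# Toy logged data: 2 actions, behavior-policy propensities in `pscore`,
# evaluation-policy action probabilities in `action_dist`.
rewards = [1.0, 0.0, 1.0, 1.0]
actions = [0, 1, 0, 1]
pscore = [0.5, 0.5, 0.1, 0.9]
action_dist = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
```

A small `lambda_` shrinks the contribution of rounds where the behavior policy rarely took the logged action (e.g. `pscore = 0.1` above), which is exactly where the plain IPW estimate blows up.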
Minor updates
- Fix README (#119)
- Fix scalar value checking (#122)
- Add ValueError to the OffPolicyEvaluation class (#125)
- Fix error messages (#126)
- Add some error checks (#125, #129)
- Update quickstart examples (#127)
Cautions
- The hyperparameter name of `obp.ope.SwitchDoublyRobust` has changed from `tau` to `lambda_`
- The type of the `off_policy_objective` argument of `obp.policy.NNPolicyLearner` has changed from `callable` to `str`
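Existing code that passes the old `tau` keyword will break after upgrading. One way to migrate gradually is a small compatibility shim; the helper below is hypothetical (not part of obp) and returns the translated kwargs instead of constructing the real estimator:

```python
import warnings

def translate_switch_dr_kwargs(**kwargs):
    """Hypothetical migration helper for the tau -> lambda_ rename.

    Accepts the pre-0.5.0 `tau` keyword, emits a DeprecationWarning, and
    forwards it as `lambda_`. Returning the kwargs dict is a stand-in; in
    real code you would pass it on, e.g.
    obp.ope.SwitchDoublyRobust(**translate_switch_dr_kwargs(tau=100.0)).
    """
    if "tau" in kwargs:
        warnings.warn(
            "`tau` was renamed to `lambda_` in obp 0.5.0; use `lambda_` instead.",
            DeprecationWarning,
        )
        kwargs["lambda_"] = kwargs.pop("tau")
    return kwargs
```

The same pattern (pop the old key, warn, reinsert under the new key) works for any keyword rename, so callers can upgrade the library first and clean up call sites afterwards.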
References
- Nathan Kallus and Angela Zhou. "Policy Evaluation and Optimization with Continuous Treatments.", AISTATS 2018.
- Nathan Kallus and Masatoshi Uehara. "Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies.", NeurIPS 2020.
- Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, and Miroslav Dudik. "Doubly Robust Off-Policy Evaluation with Shrinkage.", ICML 2020.