0.5.0
The changes are summarized below:
Major updates
- Add OPE/OPL with continuous actions
- Add weight clipping to IPW and DR (#115)
- Add automatic hyperparameter tuning of OPE estimators [3] (#116, #131)
- Add arguments to the SyntheticBanditDataset class to generate more flexible synthetic data (#123)
- Add a subsample option to the OpenBanditDataset class (#118)
- Modify the input type of the `off_policy_objective` argument and add some hyperparameters to NNPolicyLearner (#132)
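The weight clipping added to IPW and DR (#115) caps each importance weight at a threshold, trading a small bias for a large variance reduction. A minimal self-contained sketch of a clipped IPW estimate (illustrative only; function and variable names are not obp's actual API):

```python
def clipped_ipw_estimate(rewards, actions, action_dist, pscore, lambda_=10.0):
    """Clipped inverse propensity weighting (IPW) value estimate.

    Illustrative sketch: each importance weight pi_e(a|x) / pi_b(a|x)
    is capped at `lambda_` before weighting the observed reward.
    """
    n = len(rewards)
    total = 0.0
    for i in range(n):
        w = action_dist[i][actions[i]] / pscore[i]  # importance weight
        total += min(w, lambda_) * rewards[i]       # clip, then weight the reward
    return total / n

# Toy logged data: 2 actions, behavior-policy propensities in `pscore`,
# evaluation-policy action probabilities in `action_dist`.
rewards = [1.0, 0.0, 1.0, 1.0]
actions = [0, 1, 0, 1]
pscore = [0.5, 0.5, 0.1, 0.9]
action_dist = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
```

A small `lambda_` shrinks the contribution of rounds where the behavior policy rarely took the logged action (e.g. `pscore = 0.1` above), which is exactly where the plain IPW estimate blows up.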
Minor updates
- Fix README (#119)
- Fix scalar value checking (#122)
- Add ValueError to the OffPolicyEvaluation class (#125)
- Fix error messages (#126)
- Add some error checks (#125, #129)
- Update quickstart examples (#127)
Cautions
- The hyperparameter name of `obp.ope.SwitchDoublyRobust` has changed from `tau` to `lambda_`
- The type of the `off_policy_objective` argument of `obp.policy.NNPolicyLearner` has changed from `callable` to `str`
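Existing code that passes the old `tau` keyword will break after upgrading. One way to migrate gradually is a small compatibility shim; the helper below is hypothetical (not part of obp) and returns the translated kwargs instead of constructing the real estimator:

```python
import warnings

def translate_switch_dr_kwargs(**kwargs):
    """Hypothetical migration helper for the tau -> lambda_ rename.

    Accepts the pre-0.5.0 `tau` keyword, emits a DeprecationWarning, and
    forwards it as `lambda_`. Returning the kwargs dict is a stand-in; in
    real code you would pass it on, e.g.
    obp.ope.SwitchDoublyRobust(**translate_switch_dr_kwargs(tau=100.0)).
    """
    if "tau" in kwargs:
        warnings.warn(
            "`tau` was renamed to `lambda_` in obp 0.5.0; use `lambda_` instead.",
            DeprecationWarning,
        )
        kwargs["lambda_"] = kwargs.pop("tau")
    return kwargs
```

The same pattern (pop the old key, warn, reinsert under the new key) works for any keyword rename, so callers can upgrade the library first and clean up call sites afterwards.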
References
- Nathan Kallus and Angela Zhou. "Policy Evaluation and Optimization with Continuous Treatments.", AISTATS 2018.
- Nathan Kallus and Masatoshi Uehara. "Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies.", NeurIPS 2020.
- Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, and Miroslav Dudik. "Doubly Robust Off-Policy Evaluation with Shrinkage.", ICML 2020.