Release 0.5.5 · st-tech/zr-obp

Updates

Add some advanced off-policy gradient estimators (#167)
Automatic candidate hyperparameter sorting for slope (#168)
Fixing the error checking about "p_e_a" in obp.ope.OffPolicyEvaluation (#169)
Fixing the expected reward factual in the independent reward structure (#170)
Allowing slope to use the true marginal importance weight for mips (#172)

Yuta Saito and Thorsten Joachims. "Off-Policy Evaluation for Large Action Spaces via Embeddings.", 2022.
Thorsten Joachims, Adith Swaminathan, and Maarten de Rijke. "Deep Learning with Logged Bandit Feedback.", 2018.
Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, and Miroslav Dudik. "Doubly Robust Off-Policy Evaluation with Shrinkage.", 2020.
Alberto Maria Metelli, Alessio Russo, and Marcello Restelli. "Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning.", 2021.