Updates
Add some advanced off-policy gradient estimators (#167)
Automatic candidate hyperparameter sorting for slope (#168)
Fixing the error checking about "p_e_a" in obp.ope.OffPolicyEvaluation (#169)
Fixing the expected reward factual in the independent reward structure (#170)
Allowing slope to use the true marginal importance weight for mips (#172)
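
For readers unfamiliar with the marginal importance weight mentioned in the last item, the following is a minimal NumPy sketch of the idea from Saito and Joachims (2022). It is not obp's implementation. The function names, argument names, and array shapes are assumptions made for illustration, and it assumes a single categorical embedding whose distribution p(e|a) depends only on the action.

```python
import numpy as np


def marginal_importance_weights(pi_b, pi_e, p_e_a, embed):
    """Illustrative marginal importance weights w(x, e) (hypothetical helper, not obp API).

    pi_b, pi_e: arrays of shape (n_rounds, n_actions), behavior and
        evaluation policy probabilities pi(a|x) for each logged context.
    p_e_a: array of shape (n_actions, n_categories), the true p(e|a).
    embed: integer array of shape (n_rounds,), the observed embedding category.
    """
    # Marginalize over actions: p(e|x, pi) = sum_a pi(a|x) * p(e|a)
    p_e_pi_b = pi_b @ p_e_a
    p_e_pi_e = pi_e @ p_e_a
    rounds = np.arange(embed.shape[0])
    # Ratio of marginal embedding probabilities at the observed embedding
    return p_e_pi_e[rounds, embed] / p_e_pi_b[rounds, embed]


def mips_estimate(reward, weights):
    """Average of weighted rewards, as in the MIPS estimator."""
    return float(np.mean(weights * reward))
```

When every action has its own embedding category (p_e_a is the identity matrix), these weights reduce to the usual importance weights pi_e(a|x) / pi_b(a|x). When all actions share one category, the weights are all 1.
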
References
Yuta Saito and Thorsten Joachims. "Off-Policy Evaluation for Large Action Spaces via Embeddings.", 2022.
Thorsten Joachims, Adith Swaminathan, and Maarten de Rijke. "Deep Learning with Logged Bandit Feedback.", 2018.
Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, and Miroslav Dudik. "Doubly Robust Off-Policy Evaluation with Shrinkage.", 2020.
Alberto Maria Metelli, Alessio Russo, and Marcello Restelli. "Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning.", 2021.