zmhammedi/mope (forked from n17s/mope)

mope

martingale off-policy evaluation

conda installation

conda env create -f environment.yml

outline

  • opebet.py contains the code for MOPE (wealth_lb_2d) and its ablations:
    • wealth_lb_1d: scalar betting
    • wealth_2d: exact wealth maximization
    • wealth_lb_2d_individual_qps: individual bets per value on a grid
  • opebetrp.py contains the code for reward predictors and gated deployment:
    • wealth_lb_rp: subtracts the reward-predictor control variate from w*r (importance weight times reward)
    • wealth_lb_rp_double_hedge: the double hedging strategy
    • wealth_lb_gd: confidence sequence for gated deployment
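To illustrate the scalar-betting idea behind wealth_lb_1d, here is a minimal Python sketch. It is not the repository's code: the function name betting_lower_cs, the grid of candidate values, and the fixed mixture of constant bets (lambdas) are hypothetical choices made for illustration. It assumes rewards r in [0, 1] and nonnegative importance weights w = π(a|x)/h(a|x), so that E_h[w*r] equals the target policy's value V(π). For each candidate value v, the wealth ∏(1 + λ(w_t r_t - v)) is a nonnegative martingale when v = V(π), so by Ville's inequality it reaches 1/α at any time with probability at most α. Candidates whose wealth ever crosses 1/α are rejected, and the largest rejected candidate is a time-uniform lower bound.

```python
import numpy as np


def betting_lower_cs(w, r, alpha=0.05, grid_size=201,
                     lambdas=(0.05, 0.1, 0.2, 0.4, 0.8)):
    """Time-uniform lower confidence sequence for V(pi) = E_h[w * r] by betting.

    Hypothetical sketch, not the repository's implementation. Assumes r in [0, 1]
    and importance weights w = pi(a|x) / h(a|x) >= 0. Uses O(T * grid * bets) memory.
    Returns an array whose t-th entry is the lower bound after t + 1 samples.
    """
    x = np.asarray(w, dtype=float) * np.asarray(r, dtype=float)
    if np.any(x < 0):
        raise ValueError("w * r must be nonnegative")
    lam = np.asarray(lambdas, dtype=float)        # constant bets on "V(pi) > v"
    if np.any((lam < 0) | (lam >= 1)):
        raise ValueError("each bet must satisfy 0 <= lambda < 1")
    vs = np.linspace(0.0, 1.0, grid_size)         # candidate values v in [0, 1]

    # Per-step log wealth factor log(1 + lam * (x_t - v)) for every (t, v, lam).
    # Since x_t >= 0 and v <= 1, each factor is at least 1 - lam > 0.
    steps = np.log1p(lam[None, None, :] * (x[:, None, None] - vs[None, :, None]))
    log_wealth = np.cumsum(steps, axis=0)         # shape (T, grid, n_bets)

    # Equal-weight mixture of the constant bets: still a nonnegative martingale at v = V(pi).
    log_mix = np.logaddexp.reduce(log_wealth, axis=2) - np.log(lam.size)

    # Ville's inequality: reject v once its wealth has ever reached 1 / alpha.
    rejected = np.maximum.accumulate(log_mix, axis=0) >= np.log(1.0 / alpha)

    # With lam >= 0 the wealth is decreasing in v, so (with prob. >= 1 - alpha)
    # every rejected v lies below V(pi); report the largest one, or 0 if none.
    return np.where(rejected, vs[None, :], 0.0).max(axis=1)
```

For example, with a uniform logging policy over two actions and a target policy that always picks action 0, w is 2 when the logged action is 0 and 0 otherwise, and betting_lower_cs(w, r) traces a nondecreasing lower bound on V(π). Mixing fixed bets keeps the wealth decreasing in v, which is what makes a single threshold on the grid valid. Per the outline, MOPE itself (wealth_lb_2d) bets in two dimensions and wealth_2d maximizes wealth exactly; this sketch attempts neither.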
