martingale off-policy evaluation
conda env create -f environment.yml
opebet.py contains the code for MOPE (wealth_lb_2d) and its ablations:
- wealth_lb_1d: scalar betting
- wealth_2d: exact wealth maximization
- wealth_lb_2d_individual_qps: individual bets per value on a grid
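
To give a feel for what "scalar betting" means here, the following is a minimal sketch of a betting-based lower confidence bound on the off-policy value v = E[w*r]. It is not the code in opebet.py. The function name `scalar_betting_lower_bound`, the fixed bet `lam`, and the bisection search are assumptions made for illustration, and rewards are assumed to lie in [0, 1] with importance weights w >= 0. For each candidate value v, the wealth prod_t (1 + lam*(w_t*r_t - v)) is a nonnegative martingale when v is the true value, so by Ville's inequality every v whose wealth reaches 1/alpha can be rejected.

```python
import numpy as np

def scalar_betting_lower_bound(w, r, alpha=0.05, lam=0.5, tol=1e-6):
    """Lower confidence bound on v = E[w * r] from a fixed-bet wealth process.

    Hypothetical helper for illustration; assumes r in [0, 1], w >= 0,
    a true value in [0, 1], and a constant bet lam in [0, 1).
    """
    assert 0.0 <= lam < 1.0, "lam < 1 keeps every wealth factor positive"
    x = np.asarray(w, dtype=float) * np.asarray(r, dtype=float)
    threshold = np.log(1.0 / alpha)

    def log_wealth(v):
        # log prod_t (1 + lam * (x_t - v)); decreasing in v because lam >= 0
        return np.sum(np.log1p(lam * (x - v)))

    if log_wealth(0.0) < threshold:
        return 0.0  # no candidate value can be rejected yet
    if log_wealth(1.0) >= threshold:
        return 1.0  # every candidate in [0, 1) is rejected
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_wealth(mid) >= threshold:
            lo = mid  # mid is rejected, so the bound is at least mid
        else:
            hi = mid
    return lo  # every value up to lo is rejected at level alpha

# Toy usage: weights in {0, 2} (mean 1) and Bernoulli(0.7) rewards.
rng = np.random.default_rng(0)
w = rng.choice([0.0, 2.0], size=2000)
r = rng.binomial(1, 0.7, size=2000).astype(float)
print(scalar_betting_lower_bound(w, r))
```

A constant bet keeps the sketch short; adaptive methods choose the bet from past data at every step, which keeps the same guarantee as long as the bet does not depend on the current observation.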
opebetrp.py contains code for reward predictors and gated deployment:
- wealth_lb_rp: subtracts the reward predictor control variate from w*r
- wealth_lb_rp_double_hedge: the double hedging strategy
- wealth_lb_gd: confidence sequence for gated deployment
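
The control variate idea can be illustrated with a small sketch. This is not the code in opebetrp.py. The helper `control_variate_terms`, the two-action toy problem, and the specific form of the control variate (w*rhat(a) minus the target policy's expected prediction, the usual doubly robust construction) are assumptions made for illustration. Because that quantity has mean zero under the logging policy, subtracting it from w*r leaves the expected value unchanged while it can reduce the variance when the predictor is reasonable.

```python
import numpy as np

def control_variate_terms(w, r, rhat_logged, rhat_target):
    """Per-sample terms w*r - (w*rhat_logged - rhat_target).

    Hypothetical helper for illustration:
      rhat_logged = predicted reward of the logged action,
      rhat_target = expected predicted reward under the target policy.
    """
    w = np.asarray(w, dtype=float)
    r = np.asarray(r, dtype=float)
    return w * r - (w * np.asarray(rhat_logged, dtype=float) - rhat_target)

# Toy problem with two actions, enumerated exactly (no sampling).
mu = np.array([0.5, 0.5])        # logging policy
pi = np.array([0.2, 0.8])        # target policy
reward = np.array([0.3, 0.9])    # true reward of each action
rhat = np.array([0.25, 1.0])     # imperfect reward predictor

w = pi / mu                      # importance weight of each action
rhat_target = np.sum(pi * rhat)  # target policy's expected prediction

plain = w * reward
adjusted = control_variate_terms(w, reward, rhat, rhat_target)

# Exact mean and variance under the logging policy.
for name, terms in [("plain", plain), ("adjusted", adjusted)]:
    mean = np.sum(mu * terms)
    var = np.sum(mu * (terms - mean) ** 2)
    print(name, mean, var)
```

The adjusted terms are no longer guaranteed to be nonnegative, so the admissible range of bets in a wealth process built on them has to be derived again rather than reused from the w*r case.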