Series of n-armed bandit environments for the OpenAI Gym
BanditTwoArmedDeterministicFixed-v0
: Simplest case where one bandit always pays, and the other always doesn't
BanditTwoArmedHighLowFixed-v0
: Stochastic version with a large difference between the payout probabilities of the two arms
BanditTwoArmedHighHighFixed-v0
: Stochastic version with a small difference between the payout probabilities, where both arms are good
BanditTwoArmedLowLowFixed-v0
: Stochastic version with a small difference between the payout probabilities, where both arms are bad
BanditTwoArmedUniform-v0
: Stochastic version where both arms pay between 0 and 1
BanditTenArmedRandomFixed-v0
: 10-armed bandit with random probabilities assigned to payouts
BanditTenArmedRandomRandom-v0
: 10-armed bandit with random probabilities assigned to both payouts and rewards
BanditTenArmedUniformDistributedReward-v0
: 10-armed bandit that always pays out, with a reward selected from a uniform distribution
BanditTenArmedGaussian-v0
: 10-armed bandit mentioned on page 30 of Reinforcement Learning: An Introduction (Sutton and Barto)
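The payout-style variants above differ mainly in how likely each arm is to pay. A minimal sketch of that idea, assuming each arm pays a fixed reward with some probability and pays 0 otherwise: BernoulliBandit, payout_probs and rewards are hypothetical names used for illustration only, not the package's actual classes, and the 0.8/0.2 probabilities in the stochastic example are assumed values because the list does not state them.

```python
import random
from typing import Optional, Sequence


class BernoulliBandit:
    """Hypothetical n-armed bandit: arm i pays rewards[i] with probability
    payout_probs[i] and pays 0 otherwise (illustrative, not gym-bandits code)."""

    def __init__(self, payout_probs: Sequence[float], rewards: Sequence[float],
                 seed: Optional[int] = None) -> None:
        if len(payout_probs) != len(rewards):
            raise ValueError("payout_probs and rewards must have the same length")
        self.payout_probs = list(payout_probs)
        self.rewards = list(rewards)
        self.rng = random.Random(seed)

    @property
    def n_arms(self) -> int:
        return len(self.payout_probs)

    def pull(self, arm: int) -> float:
        # Draw u uniformly from [0, 1); the arm pays when u < p, i.e. with probability p.
        if self.rng.random() < self.payout_probs[arm]:
            return self.rewards[arm]
        return 0.0


# Deterministic two-armed case: arm 0 always pays 1, arm 1 never pays.
deterministic = BernoulliBandit(payout_probs=[1.0, 0.0], rewards=[1.0, 1.0], seed=0)
print([deterministic.pull(0) for _ in range(3)])  # [1.0, 1.0, 1.0]
print([deterministic.pull(1) for _ in range(3)])  # [0.0, 0.0, 0.0]

# Stochastic "high/low" style case with assumed probabilities 0.8 and 0.2.
high_low = BernoulliBandit(payout_probs=[0.8, 0.2], rewards=[1.0, 1.0], seed=1)
pulls = 10_000
print(sum(high_low.pull(0) for _ in range(pulls)) / pulls)  # close to 0.8
```

Under this model, the "HighHigh" and "LowLow" variants would simply use two probabilities that are close together, which is what makes them harder for an agent to tell apart.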
git clone git@github.com:mimoralea/gym-bandits.git
cd gym-bandits
pip install .
or:
pip install git+https://github.com/mimoralea/gym-bandits#egg=gym-bandits
Usage in your Gym environment:
import gym
import gym_bandits  # importing this package registers the Bandit* environments with Gym
env = gym.make("BanditTenArmedGaussian-v0")