Reinforcement Learning Blackjack Algorithm (RL_Blackjack): Plays with an infinite deck against two players and, after a set number of iterations, creates a 3D graph comparing the state-value distribution without a usable ace (that is, when the ace's value changes from 11 to 1).
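The infinite-deck and usable-ace ideas mentioned above can be illustrated with a minimal Python sketch. It is not taken from RL_Blackjack itself: the names `CARD_VALUES`, `draw_card`, and `hand_value` are hypothetical, and the sketch assumes the usual convention that an ace is "usable" while it can count as 11 without the hand going over 21.

```python
import random

# Assumed card values for one suit: ace stored as 1, face cards as 10.
CARD_VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


def draw_card(rng=random):
    """Draw with replacement, which models an infinite deck:
    every draw has the same distribution regardless of earlier draws."""
    return rng.choice(CARD_VALUES)


def hand_value(cards):
    """Return (total, usable_ace) for a list of card values.

    An ace is usable if counting one ace as 11 instead of 1
    keeps the total at 21 or below."""
    total = sum(cards)
    usable_ace = 1 in cards and total + 10 <= 21
    return (total + 10 if usable_ace else total), usable_ace


if __name__ == "__main__":
    print(hand_value([1, 6]))      # ace counted as 11
    print(hand_value([1, 6, 10]))  # ace forced back to 1
```

Under these assumptions, a state can be described by the player's total, the dealer's visible card, and the usable-ace flag, which is why the state values are naturally plotted as separate surfaces for the two ace cases.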
Greedy Policy Blackjack Algorithm (GP_Blackjack): The player plays against the dealer and progressively prioritises short-term rewards over exploration. The final output reports the results as [Wins, Draws, Losses].
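One common way to shift gradually from exploration toward greedy play is an epsilon-greedy rule with a decaying epsilon. The sketch below assumes that approach; it is not confirmed to be how GP_Blackjack works, and the names `decayed_epsilon` and `choose_action`, along with the default decay parameters, are hypothetical.

```python
import random


def decayed_epsilon(episode, start=1.0, end=0.01, decay=0.999):
    """Exploration rate that shrinks geometrically with the episode
    number and never drops below `end` (assumed schedule)."""
    return max(end, start * decay ** episode)


def choose_action(q_values, epsilon, rng=random):
    """Pick an action from a dict mapping action name to estimated value.

    With probability epsilon explore (random action); otherwise act
    greedily by taking the action with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


if __name__ == "__main__":
    estimates = {"hit": 0.2, "stick": 0.5}  # illustrative values only
    for episode in (0, 1000, 10000):
        eps = decayed_epsilon(episode)
        print(episode, round(eps, 4), choose_action(estimates, eps))
```

Early episodes mostly pick random actions, while later episodes almost always pick the action with the best current estimate, which matches the described move toward short-term reward.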
Both algorithms were created with thanks to Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.