2021-03-18-chowdhury21b.md

File metadata and controls

58 lines (58 loc) · 2.63 KB
---
title: Reinforcement Learning in Parametric MDPs with Exponential Families
abstract: Extending model-based regret minimization strategies for Markov decision
  processes (MDPs) beyond discrete state-action spaces requires structural assumptions
  on the reward and transition models. Existing parametric approaches establish regret
  guarantees by making strong assumptions about either the state transition distribution
  or the value function as a function of state-action features, and often do not satisfactorily
  capture classical problems like linear dynamical systems or factored MDPs. This
  paper introduces a new MDP transition model defined by a collection of linearly
  parameterized exponential families with $d$ unknown parameters. For finite-horizon
  episodic RL with horizon $H$ in this MDP model, we propose a model-based upper confidence
  RL algorithm (Exp-UCRL) that solves a penalized maximum likelihood estimation problem
  to learn the $d$-dimensional representation of the transition distribution, balancing
  the exploitation-exploration tradeoff using confidence sets in the exponential family
  space. We demonstrate the efficiency of our algorithm by proving a frequentist (worst-case)
  regret bound that is of order $\tilde O(d\sqrt{H^3 N})$, sub-linear in total time
  $N$, linear in dimension $d$, and polynomial in the planning horizon $H$. This is
  achieved by deriving a novel concentration inequality for conditional exponential
  families that might be of independent interest. The exponential family MDP model
  also admits an efficient posterior sampling-style algorithm for which a similar
  guarantee on the Bayesian regret is shown.
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: chowdhury21b
month: 0
tex_title: Reinforcement Learning in Parametric MDPs with Exponential Families
firstpage: 1855
lastpage: 1863
page: 1855-1863
order: 1855
cycles: false
bibtex_author: Chowdhury, Sayak Ray and Gopalan, Aditya and Maillard, Odalric-Ambrym
author:
- given: Sayak Ray
  family: Chowdhury
- given: Aditya
  family: Gopalan
- given: Odalric-Ambrym
  family: Maillard
date: 2021-03-18
address:
container-title: Proceedings of The 24th International Conference on Artificial Intelligence
  and Statistics
volume: '130'
genre: inproceedings
issued:
  date-parts:
  - 2021
  - 3
  - 18