---
title: Reinforcement Learning in Parametric MDPs with Exponential Families
abstract: >-
  Extending model-based regret minimization strategies for Markov decision
  processes (MDPs) beyond discrete state-action spaces requires structural
  assumptions on the reward and transition models. Existing parametric
  approaches establish regret guarantees by making strong assumptions about
  either the state transition distribution or the value function as a
  function of state-action features, and often do not satisfactorily capture
  classical problems like linear dynamical systems or factored MDPs. This
  paper introduces a new MDP transition model defined by a collection of
  linearly parameterized exponential families with
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: chowdhury21b
month: 0
tex_title: Reinforcement Learning in Parametric MDPs with Exponential Families
firstpage: 1855
lastpage: 1863
page: 1855-1863
order: 1855
cycles: false
bibtex_author: Chowdhury, Sayak Ray and Gopalan, Aditya and Maillard, Odalric-Ambrym
author:
date: 2021-03-18
address:
container-title: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
volume: 130
genre: inproceedings
issued:
extras:
---
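For orientation, the transition model named in the abstract — a linearly parameterized exponential family over next states — typically takes the standard form sketched below. The sufficient statistics $\varphi$, base measure $h$, and log-partition function $Z_{s,a}$ are conventional notation assumed here for illustration, not symbols taken from the paper itself.

```latex
% Sketch of a linearly parameterized exponential family transition model
% (standard form; the symbols are illustrative assumptions, not the
% paper's notation). For each state-action pair (s, a), the next-state
% density is an exponential family with d unknown parameters theta:
\[
  p_{\theta}(s' \mid s, a)
    = h(s' \mid s, a)\,
      \exp\!\big( \langle \theta, \varphi(s, a, s') \rangle
                  - Z_{s,a}(\theta) \big),
\]
% where \varphi(s, a, s') \in \mathbb{R}^d is a known feature (sufficient
% statistic) map and \theta \in \mathbb{R}^d is the unknown parameter
% vector, with the log-partition function normalizing the density:
\[
  Z_{s,a}(\theta)
    = \log \int h(s' \mid s, a)\,
      e^{\langle \theta,\, \varphi(s, a, s') \rangle}\, \mathrm{d}s'.
\]
```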