| Field | Value |
|---|---|
| title | On the Linear Convergence of Policy Gradient Methods for Finite MDPs |
| abstract | We revisit the finite-time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. Some recent work has viewed this setting as an instance of smooth non-linear optimization and shown sub-linear convergence rates with small step-sizes. Here, we take a completely different perspective based on illuminating connections with policy iteration, and show how many variants of policy gradient algorithms succeed with large step-sizes and attain a linear rate of convergence. |
| layout | inproceedings |
| series | Proceedings of Machine Learning Research |
| publisher | PMLR |
| issn | 2640-3498 |
| id | bhandari21a |
| month | 0 |
| tex_title | On the Linear Convergence of Policy Gradient Methods for Finite MDPs |
| firstpage | 2386 |
| lastpage | 2394 |
| page | 2386-2394 |
| order | 2386 |
| cycles | false |
| bibtex_author | Bhandari, Jalaj and Russo, Daniel |
| author | |
| date | 2021-03-18 |
| address | |
| container-title | Proceedings of The 24th International Conference on Artificial Intelligence and Statistics |
| volume | 130 |
| genre | inproceedings |
| issued | |
| extras | |
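
To make the abstract's claim concrete, here is a minimal illustrative sketch, not taken from the paper: it runs a per-state projected ascent step on exact Q-values for a made-up two-state, two-action MDP. With a large step-size the projection onto the simplex lands on the greedy action, so the update coincides with a policy iteration step; the full policy gradient for the direct parameterization additionally weights each state by its discounted visitation frequency, which this simplified sketch omits. All names and numbers below are hypothetical.

```python
import numpy as np

def policy_value(P, r, pi, gamma):
    """Exact V^pi for a finite MDP via the Bellman linear system."""
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    r_pi = np.einsum("sa,sa->s", pi, r)     # expected one-step reward under pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def q_values(P, r, V, gamma):
    """Exact Q^pi obtained from V^pi."""
    return r + gamma * np.einsum("sat,t->sa", P, V)

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

# Toy 2-state, 2-action MDP (numbers are purely illustrative).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])   # P[s, a, s']
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])                   # r[s, a]
gamma, eta = 0.9, 50.0                        # large step-size

pi = np.full((2, 2), 0.5)                     # start from the uniform policy
for _ in range(20):
    V = policy_value(P, r, pi, gamma)
    Q = q_values(P, r, V, gamma)
    # Per-state projected ascent on Q^pi: with large eta each row of pi
    # jumps to the greedy action, mimicking a policy iteration update.
    pi = np.array([project_simplex(pi[s] + eta * Q[s]) for s in range(2)])

print("greedy actions:", pi.argmax(axis=1),
      "V:", np.round(policy_value(P, r, pi, gamma), 3))
```

Running the sketch, the policy becomes deterministic after the first few updates and the printed values stop changing, which is the rapid, policy-iteration-like behavior the abstract attributes to large step-size policy gradient variants.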