---
title: On the Linear Convergence of Policy Gradient Methods for Finite MDPs
abstract: "We revisit the finite time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. There has been some recent work viewing this setting as an instance of smooth non-linear optimization problems, to show sub-linear convergence rates with small step-sizes. Here, we take a completely different perspective based on illuminating connections with policy iteration, to show how many variants of policy gradient algorithms succeed with large step-sizes and attain a linear rate of convergence."
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: bhandari21a
month: 0
tex_title: On the Linear Convergence of Policy Gradient Methods for Finite MDPs
firstpage: 2386
lastpage: 2394
page: 2386-2394
order: 2386
cycles: false
bibtex_author: Bhandari, Jalaj and Russo, Daniel
author:
- given: Jalaj
  family: Bhandari
- given: Daniel
  family: Russo
date: 2021-03-18
address:
container-title: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
volume: '130'
genre: inproceedings
issued:
  date-parts:
  - 2021
  - 3
  - 18
pdf:
extras:
---