title | abstract | layout | series | publisher | issn | id | month | tex_title | firstpage | lastpage | page | order | cycles | bibtex_author | author | date | address | container-title | volume | genre | issued | extras | ||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling |
We consider the contextual bandit problem, where a player sequentially makes decisions based on past observations to maximize the cumulative reward. Although many algorithms have been proposed for contextual bandit, most of them rely on finding the maximum likelihood estimator at each iteration, which requires |
inproceedings |
Proceedings of Machine Learning Research |
PMLR |
2640-3498 |
ding21a |
0 |
An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling |
1585 |
1593 |
1585-1593 |
1585 |
false |
Ding, Qin and Hsieh, Cho-Jui and Sharpnack, James |
|
2021-03-18 |
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics |
130 |
inproceedings |
|
|