title | booktitle | year | volume | series | month | publisher | url | openreview | abstract | layout | issn | id | tex_title | firstpage | lastpage | page | order | cycles | bibtex_editor | editor | bibtex_author | author | date | address | container-title | genre | issued | extras |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Exploring Beyond Curiosity Rewards: Language-Driven Exploration in RL |
Proceedings of the 16th Asian Conference on Machine Learning |
2025 |
260 |
Proceedings of Machine Learning Research |
0 |
PMLR |
qHv7qTETsw |
Sparse rewards pose a significant challenge for many reinforcement learning algorithms, which struggle in the absence of a dense, well-shaped reward function. Drawing inspiration from the curiosity exhibited in animals, intrinsically driven methods overcome this drawback by incentivizing agents to explore novel states. Yet, in the absence of domain-specific priors, sample efficiency is hindered because most discovered novelty has little relevance to the true task reward. We present iLLM, a curiosity-driven approach that leverages the inductive bias of foundation models, specifically large language models, as a source of information about plausibly useful behaviors. Two tasks are introduced for shaping exploration: 1) action generation and 2) history compression, where the language model is prompted with a description of the state-action trajectory. We further propose a technique for mapping state-action pairs to pretrained token embeddings of the language model in order to alleviate the need for explicit textual descriptions of the environment. By distilling prior knowledge from large language models, iLLM encourages agents to discover diverse and human-meaningful behaviors without requiring direct human intervention. We evaluate the proposed method on BabyAI-Text, MiniHack, Atari games, and Crafter tasks, demonstrating higher sample efficiency compared to prior curiosity-driven approaches. |
inproceedings |
2640-3498 |
bougie25a |
{Exploring Beyond Curiosity Rewards}: {L}anguage-Driven Exploration in {RL} |
127 |
142 |
127-142 |
127 |
false |
Nguyen, Vu and Lin, Hsuan-Tien |
|
Bougie, Nicolas and Watanabe, Narimasa |
|
2025-01-14 |
Proceedings of the 16th Asian Conference on Machine Learning |
inproceedings |
|
|