
Awesome-LLM-EN

License: MIT

This repository is a curated compilation of work on "LLMs for Embodied Navigation," including state-of-the-art methods, benchmarks, and datasets.

In recent years, Large Language Models (LLMs) such as GPT have advanced rapidly, drawing increasing attention to their potential in practical applications. The integration of LLMs with Embodied Intelligence has emerged as a new focal area. Embodied Intelligence holds that intelligent behavior stems not only from computational models but also from the physical interactions between robots or intelligent agents and their environment: genuine intelligence requires the context of both a body and its surroundings, so systems must perceive through sensors and act via actuators. This integration is especially relevant for applications that demand language and image processing capabilities. Overall, combining LLMs with embodied intelligence offers immense potential and opens new avenues for AI applications, while raising fresh research challenges such as model interpretability and real-time performance.

Among the many applications of LLMs, navigation tasks stand out because they require deep environmental understanding and quick, accurate decision-making. Through their strong language and image processing capabilities, LLMs can enrich embodied intelligence systems with advanced environmental perception and decision-making support. This article focuses on navigation and offers a comprehensive summary of the integration between LLMs and embodied intelligence: it covers state-of-the-art models and research methodologies, and evaluates the pros and cons of current embodied navigation models and datasets. Finally, based on the latest research, it offers insights into the role of LLMs in embodied intelligence and projects future developments in the field.

News

😊 This project is under active development. Star and watch this repository to follow updates.

Overview

In this repository, we collect recent advances in unifying LLMs and embodied agents for navigation. We have identified two commonly used general approaches:

🤖 1) LLM as Planner

🤖 2) LLM for Semantic Understanding

Fig.1: The first type employs LLMs as planners that directly generate actions, leveraging exploration policies to control the agent.

Fig.2: The second type utilizes LLMs to analyze incoming visual or textual data and extract goal-relevant information, upon which exploration policies subsequently generate appropriate actions to guide the agent.
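
To make the distinction concrete, here is a minimal Python sketch contrasting the two approaches. It is an illustration only, not code from any of the listed papers: `query_llm`, `ExplorationPolicy`, and `Observation` are hypothetical placeholders standing in for a real LLM client, exploration policy, and perception stack.

```python
# Minimal sketch of the two approaches above. All names are hypothetical
# placeholders, not an API from any specific paper in this list.

from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Toy stand-in for the agent's sensor input at one timestep."""
    scene_caption: str   # e.g. output of a captioning / detection model
    instruction: str     # the natural-language navigation instruction


def query_llm(prompt: str) -> str:
    """Placeholder for a call to any LLM backend (replace with a real client)."""
    return "MOVE_FORWARD"  # dummy response so the sketch runs end to end


# --- Type 1: LLM as Planner -------------------------------------------------
# The LLM is prompted with the instruction and current scene description and
# asked to emit the next low-level action directly.
def llm_as_planner_step(obs: Observation, action_space: List[str]) -> str:
    prompt = (
        f"Instruction: {obs.instruction}\n"
        f"Current scene: {obs.scene_caption}\n"
        f"Choose the next action from {action_space}."
    )
    return query_llm(prompt)


# --- Type 2: LLM for Semantic Understanding ---------------------------------
# The LLM only extracts goal-relevant information (e.g. the target object);
# a separate exploration policy maps (observation, goal) to an action.
class ExplorationPolicy:
    def act(self, obs: Observation, goal: str) -> str:
        # A real policy would be learned or frontier-based; here we return a
        # fixed action so the sketch stays runnable.
        return "STOP" if goal.lower() in obs.scene_caption.lower() else "MOVE_FORWARD"


def llm_for_semantics_step(obs: Observation, policy: ExplorationPolicy) -> str:
    goal = query_llm(f"Extract the target object from: '{obs.instruction}'")
    return policy.act(obs, goal)


if __name__ == "__main__":
    obs = Observation(scene_caption="a hallway with a door on the left",
                      instruction="go to the kitchen and stop near the fridge")
    actions = ["MOVE_FORWARD", "TURN_LEFT", "TURN_RIGHT", "STOP"]
    print("Type 1 action:", llm_as_planner_step(obs, actions))
    print("Type 2 action:", llm_for_semantics_step(obs, ExplorationPolicy()))
```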

Table of Contents

  • Related Surveys
  • LLMs for Grounded Language Understanding
  • LLMs for Few-Shot Planning
  • Integrating Embodied Intelligence With Reinforcement Learning

Related Surveys

  • A Real 3D Embodied Dataset for Robotic Active Visual Learning (IEEE, 2022) [paper]
  • ProcTHOR: Large-Scale Embodied AI Using Procedural Generation (arXiv, 2022) [paper]
  • SOON: Scenario Oriented Object Navigation with Graph-based Exploration (arXiv, 2021) [paper]
  • Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding (arXiv, 2020) [paper]
  • REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments (arXiv, 2019) [paper]
  • Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments (CVPR, 2018) [paper]
  • Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments (CVPR, 2019) [paper]
  • March in Chat: Interactive Prompting for Remote Embodied Referring Expression (ICCV, 2023) [paper]
  • Matterport3D: Learning from RGB-D Data in Indoor Environments (arXiv, 2019) [paper]
  • Vision-and-Dialog Navigation (PMLR, 2019) [paper]
  • ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks (CVPR, 2020) [paper]
  • ViNG: Learning Open-World Navigation with Visual Goals (IEEE, 2021) [paper]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv, 2018) [paper]
  • Grounded Language-Image Pre-training (CVPR, 2022) [paper]
  • Hinge-Loss Markov Random Fields and Probabilistic Soft Logic (arXiv, 2017) [paper]
  • Dynamic Planning with a LLM (arXiv, 2023) [paper]
  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (NeurIPS, 2022) [paper]
  • From Machine Learning to Robotics: Challenges and Opportunities for Embodied Intelligence (arXiv, 2021) [paper]
  • Natural Language Processing (Springer, 2020) [paper]

LLMs for Grounded Language Understanding

  • ESC: Exploration with Soft Commonsense Constraints for Zero-Shot Object Navigation (arXiv, 2023) [paper]
  • SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments (arXiv, 2023) [paper]
  • L3MVN: Leveraging Large Language Models for Visual Target Navigation (arXiv, 2023) [paper]
  • ZSON: Zero-Shot Object-Goal Navigation Using Multimodal Goal Embeddings (NeurIPS, 2022) [paper]
  • CLIP-Nav: Using CLIP for Zero-Shot Vision-and-Language Navigation (arXiv, 2022) [paper]
  • CLIP on Wheels: Zero-Shot Object Navigation as Object Localization and Exploration (arXiv, 2023) [paper]
  • Vision-Based Navigation with Language-Based Assistance via Imitation Learning with Indirect Intervention (CVPR, 2019) [paper]
  • VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models (arXiv, 2023) [paper]
  • Visual Language Maps for Robot Navigation (IEEE, 2023) [paper]
  • Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents (ICML, 2022) [paper]

LLMs for Few-Shot Planning

  • A2Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models (arXiv, 2023) [paper]
  • March in Chat: Interactive Prompting for Remote Embodied Referring Expression (ICCV, 2023) [paper]
  • NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models (arXiv, 2023) [paper]
  • VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View (arXiv, 2023) [paper]
  • OVRL-V2: A Simple State-of-Art Baseline for ImageNav and ObjectNav (arXiv, 2023) [paper]
  • SQA3D: Situated Question Answering in 3D Scenes (arXiv, 2022) [paper]
  • LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models (ICCV, 2023) [paper]

Integrating Embodied Intelligence With Reinforcement Learning

  • Deep Reinforcement Learning-Based Online Domain Adaptation Method for Fault Diagnosis of Rotating Machinery (IEEE, 2021) [paper]
  • A Control Strategy of Robot Eye-Head Coordinated Gaze Behavior Achieved for Minimized Neural Transmission Noise (IEEE, 2022) [paper]
  • Cocktail LSTM and Its Application Into Machine Remaining Useful Life Prediction (IEEE, 2023) [paper]
  • Terrain Identification for Humanoid Robots Applying Convolutional Neural Networks (IEEE, 2020) [paper]
  • A Survey on NLP based Text Summarization for Summarizing Product Reviews (IEEE, 2020) [paper]
  • Prediction and Compensation of Contour Error of CNC Systems Based on LSTM Neural-Network (IEEE, 2021) [paper]
  • A Behavior-Based Reinforcement Learning Approach to Control Walking Bipedal Robots Under Unknown Disturbances (IEEE, 2021) [paper]
  • Localization for Multirobot Formations in Indoor Environment (IEEE, 2010) [paper]
  • Leader-Following Formation Control of Nonholonomic Mobile Robots With Velocity Observers (IEEE, 2020) [paper]
  • Motion Planning for Point-to-Point Navigation of Spherical Robot Using Position Feedback (IEEE, 2019) [paper]
  • Simultaneous Hand–Eye/Robot–World/Camera–IMU Calibration (IEEE, 2019) [paper]
  • The Influence of Preprocessing on Text Classification Using a Bag-of-Words Representation (PLOS, 2020) [paper]

Star History

Star History Chart