OmAgent

English | 中文

🗓️ Updates

11/12/2024: OmAgent v0.2.0 is officially released! We have completely rebuilt the underlying framework of OmAgent, making it more flexible and easier to extend. The new version adds the concept of devices, making it easier to develop quickly for smart hardware.
10/20/2024: We are actively engaged in developing version 2.0.0 🚧 Exciting new features are underway! You are welcome to join us on X and Discord~
09/20/2024: Our paper has been accepted by EMNLP 2024. See you in Miami!🏝
07/04/2024: The OmAgent open-source project has been unveiled. 🎉
06/24/2024: The OmAgent research paper has been published.

📖 Introduction

OmAgent is an open-source agent framework designed to streamline the development of on-device multimodal agents. Our goal is to enable agents that can empower a range of hardware devices, from smartphones, smart wearables (e.g. glasses), and IP cameras to futuristic robots. To that end, OmAgent creates an abstraction over various types of devices and simplifies the process of connecting them to state-of-the-art multimodal foundation models and agent algorithms, allowing everyone to build the most interesting on-device agents. Moreover, OmAgent focuses on optimizing the end-to-end computing pipeline in order to provide the most real-time user interaction experience out of the box.

In summary, key features of OmAgent include:

Easy Connection to Diverse Devices: We make it simple to connect to physical devices, e.g. phones, glasses and more, so that agent/model developers can build applications that run on devices rather than on a web page. We welcome contributions to support more devices!
Speed-optimized SOTA Multimodal Models: OmAgent integrates SOTA commercial and open-source foundation models to give application developers the most powerful intelligence. Moreover, OmAgent streamlines audio/video processing and computation to enable natural and fluid interaction between the device and its users.
SOTA Multimodal Agent Algorithms: OmAgent provides an easy workflow orchestration interface for researchers and developers to implement the latest agent algorithms, e.g. ReAct, DnC and more. We welcome contributions of any new agent algorithm to enable more complex problem-solving abilities.
Scalability and Flexibility: OmAgent provides an intuitive interface for building scalable agents, enabling developers to construct agents tailored to specific roles and highly adaptive to various applications.

🛠️ How To Install

1. Deploy the Workflow Orchestration Engine

OmAgent utilizes Conductor as its workflow orchestration engine. Conductor is an open-source, distributed, and scalable workflow engine that supports a variety of programming languages and frameworks. By default, it uses Redis for persistence and Elasticsearch (7.x) as the indexing backend.
It is recommended to deploy Conductor using Docker:

docker compose -f docker/conductor/docker-compose.yml up -d

Once deployed, you can access the Conductor UI at http://localhost:5001. (Note: macOS occupies port 5000 by default, so we use 5001 here. You can specify other ports when deploying Conductor.)
The Conductor API can be accessed via http://localhost:8080.
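
To confirm that both ports are accepting connections after deployment, a quick check along the following lines may help. This is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical and not part of OmAgent or Conductor, and the ports are the defaults mentioned above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    # Default ports from this README; adjust them if you changed the deployment.
    for name, port in (("Conductor UI", 5001), ("Conductor API", 8080)):
        status = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {status}")
```

An open port only shows that something is listening; it does not prove that Conductor has finished starting, so opening the UI in a browser remains the more reliable check.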

2. Install OmAgent

Python Version: Ensure Python 3.10 or higher is installed.
Install omagent_core:
```
pip install -e omagent-core
```
Install dependencies for the sample project:
```
pip install -r requirements.txt
```
Install Optional Components:
- Install Milvus VectorDB for enhanced support of long-term memory. OmAgent uses Milvus Lite as the default vector database for storing vector data related to long-term memory. To utilize the full Milvus service, you may deploy the Milvus vector database via Docker.
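
If an install step fails, it may be worth confirming that the interpreter meets the Python 3.10 requirement stated above. The following is a small sketch of such a check; `meets_minimum` and `MINIMUM_PYTHON` are hypothetical names introduced here for illustration.

```python
import sys
from typing import Sequence, Tuple

MINIMUM_PYTHON: Tuple[int, int] = (3, 10)  # minimum stated in this README


def meets_minimum(version: Sequence[int], minimum: Tuple[int, int] = MINIMUM_PYTHON) -> bool:
    """Compare the (major, minor) part of a version against the minimum."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    found = sys.version.split()[0]
    if meets_minimum(sys.version_info):
        print("Python version OK:", found)
    else:
        print("Python 3.10 or higher is required, found", found)
```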

3. Connect Devices

If you wish to use smart devices to access your agents, we provide a smartphone app and corresponding backend, allowing you to focus on agent functionality without worrying about complex device connection issues.

Deploy the app backend
TODO
Download, install, and debug the smartphone app
TODO

🚀 Quick Start

Hello World

Adjust Python Path: The script modifies the Python path to ensure it can locate necessary modules. Verify the path is correct for your setup:
```
CURRENT_PATH = Path(__file__).parents[0]
sys.path.append(os.path.abspath(CURRENT_PATH.joinpath('../../')))
```
- CURRENT_PATH: This is the path to the directory that contains the script.
- sys.path.append: This adds the directory two levels above CURRENT_PATH to the Python path, so that packages such as examples can be imported later.
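
To see what these two lines resolve to, the sketch below replays the same path arithmetic on a made-up location. The script path (including the file name `run_cli.py`) is hypothetical, and `posixpath.normpath` stands in for `os.path.abspath` so the result does not depend on the machine running it.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical script location, used only for illustration.
script = PurePosixPath("/home/user/OmAgent/examples/step1_simpleVQA/run_cli.py")

# Same expression as in the script: the directory that contains the file.
current_path = script.parents[0]

# Joining '../../' and normalizing climbs two levels up from that directory.
project_root = posixpath.normpath(str(current_path.joinpath("../../")))

print(current_path)  # /home/user/OmAgent/examples/step1_simpleVQA
print(project_root)  # /home/user/OmAgent
```

If your script lives at a different depth, the number of `../` segments needs to change so that the resulting directory is the one containing the `examples` package.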
Initialize Logging: The script sets up logging to track application events. You can adjust the logging level (INFO, DEBUG, etc.) as needed:
```
logging.init_logger("omagent", "omagent", level="INFO")
```
Create and Execute Workflow: The script creates a workflow and adds a task to it. It then starts the agent client to execute the workflow:
```
# ConductorWorkflow, simple_task, and DefaultClient are assumed to be imported from omagent_core earlier in the script.
from examples.step1_simpleVQA.agent.simple_vqa.simple_vqa import SimpleVQA
from examples.step1_simpleVQA.agent.input_interface.input_interface import InputIterface

# Define the workflow and its two tasks; task2 consumes task1's output.
workflow = ConductorWorkflow(name='example1')
task1 = simple_task(task_def_name='InputIterface', task_reference_name='input_task')
task2 = simple_task(task_def_name='SimpleVQA', task_reference_name='simple_vqa', inputs={'user_instruction': task1.output('user_instruction')})
workflow >> task1 >> task2

# Register the workflow with Conductor.
workflow.register(True)

# Start the client and its workers.
agent_client = DefaultClient(interactor=workflow, config_path='examples/step1_simpleVQA/configs', workers=[InputIterface()])
agent_client.start_interactor()
```
- Workflow: Defines the sequence of tasks. 'name' is the name of the workflow; please make sure it is unique.
- Task: Represents a unit of work; in this case, we use SimpleVQA from the examples. 'task_def_name' is the name of the corresponding class, and 'task_reference_name' is the task's name in Conductor.
- DefaultClient: Starts the agent client that executes the workflow. The code above uses DefaultClient, which is the CLI client; to interact through the app, use AppClient instead.
- agent_client.start_interactor(): This starts the workers corresponding to the registered tasks, in this case SimpleVQA, which then wait to be scheduled by Conductor.
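
To build intuition for the `workflow >> task1 >> task2` chaining and for passing one task's output to the next, here is a toy sketch in plain Python. It is not OmAgent's implementation and does not use Conductor; `ToyTask` and `ToyWorkflow` are hypothetical classes that run tasks in sequence within a single process.

```python
from typing import Any, Callable, Dict, List


class ToyTask:
    """A named step that reads the shared context and returns new values."""

    def __init__(self, name: str, func: Callable[[Dict[str, Any]], Dict[str, Any]]):
        self.name = name
        self.func = func


class ToyWorkflow:
    def __init__(self, name: str):
        self.name = name
        self.tasks: List[ToyTask] = []

    def __rshift__(self, task: ToyTask) -> "ToyWorkflow":
        # Returning self is what lets `workflow >> a >> b` chain.
        self.tasks.append(task)
        return self

    def run(self, initial: Dict[str, Any]) -> Dict[str, Any]:
        context = dict(initial)
        for task in self.tasks:
            # Each task's outputs become visible to the tasks after it.
            context.update(task.func(context))
        return context


input_task = ToyTask("input_task", lambda ctx: {"user_instruction": ctx["raw_input"].strip()})
simple_vqa = ToyTask("simple_vqa", lambda ctx: {"answer": f"(stub) received: {ctx['user_instruction']}"})

workflow = ToyWorkflow("example1")
workflow >> input_task >> simple_vqa

result = workflow.run({"raw_input": "  What is in the image? "})
print(result["answer"])
```

In the real framework the tasks are scheduled by Conductor and executed by workers, so the sketch only mirrors the ordering and data flow, not the distributed execution.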
Run the Script
Execute the script using Python:
```
python run_app.py
```
Ensure the workflow engine is running before executing the script.

🏗 Architecture

The design architecture of OmAgent adheres to three fundamental principles:

Graph-based workflow orchestration;
Native multimodality;
Device-centricity.

With OmAgent, you can build your own custom intelligent agent program.

To understand OmAgent better, here are its key terms:

Devices: Central to OmAgent's vision is empowering intelligent hardware devices with AI agents, which makes devices a core component of OmAgent. With the downloadable mobile app we provide, your phone can become the first node connected to OmAgent. Devices take in environmental input, such as images and sounds, and may provide feedback in response. We have built a streamlined backend to handle the app-side business logic, so developers can concentrate on building the agent's logic.
Workflow: In the OmAgent framework, the structure of an agent is expressed as a graph. Developers are free to create, configure, and sequence node functionality as they see fit. Currently, we use Conductor as the workflow orchestration engine, which supports complex operations such as switch-case, fork-join, and do-while.
Task and Worker: Task and Worker are the two central concepts in OmAgent workflow development. A Worker implements the actual operational logic of a workflow node, whereas a Task handles the orchestration of the workflow's logic. Tasks are divided into Operators, which manage workflow logic (e.g., looping, branching), and Simple Tasks, which represent nodes customized by developers. Each Simple Task corresponds to a Worker; when the workflow reaches a given Simple Task, the task is dispatched to the corresponding Worker for execution.
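
The relationship between Simple Tasks, Workers, and Operators can be sketched in a few lines of plain Python. This is an illustration of the idea only, not OmAgent's or Conductor's API: the registry `WORKERS`, the decorator `worker`, and the functions `run_simple_task` and `run_switch` are all hypothetical.

```python
from typing import Callable, Dict

# Registry mapping a task definition name to the worker that executes it.
WORKERS: Dict[str, Callable[[dict], dict]] = {}


def worker(name: str):
    """Register a function as the worker for the given task name."""
    def register(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        WORKERS[name] = func
        return func
    return register


@worker("Greeter")
def greeter(inputs: dict) -> dict:
    return {"text": f"Hello, {inputs['name']}!"}


@worker("Shouter")
def shouter(inputs: dict) -> dict:
    return {"text": f"HELLO, {inputs['name'].upper()}!"}


def run_simple_task(task_def_name: str, inputs: dict) -> dict:
    # A simple task carries no logic itself; it is dispatched to its worker.
    return WORKERS[task_def_name](inputs)


def run_switch(case: str, branches: Dict[str, str], inputs: dict) -> dict:
    # An operator decides which simple task runs, but does no work itself.
    return run_simple_task(branches[case], inputs)


branches = {"plain": "Greeter", "loud": "Shouter"}
print(run_switch("plain", branches, {"name": "Ada"}))
print(run_switch("loud", branches, {"name": "Ada"}))
```

The split keeps control flow (which branch to take) separate from node logic (what a branch does), which is the division the Task and Worker terms describe.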

Basic Principles of Building an Agent

Modularity: Break down the agent's functionality into discrete workers, each responsible for a specific task.
Reusability: Design workers to be reusable across different workflows and agents.
Scalability: Use workflows to scale the agent's capabilities by adding more workers or adjusting the workflow sequence.
Interoperability: Workers can interact with various backends, such as LLMs, databases, or APIs, allowing agents to perform complex operations.
Asynchronous Execution: The workflow engine and task handler manage the execution asynchronously, enabling efficient resource utilization.
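
As a rough illustration of asynchronous execution and the fork-join pattern, the sketch below runs two independent stand-in workers concurrently with `asyncio` and merges their results. It does not use Conductor or OmAgent; the worker functions and file names are hypothetical, and the short sleeps only imitate waiting on a model or API call.

```python
import asyncio
from typing import Dict


async def describe_image(inputs: Dict[str, str]) -> Dict[str, str]:
    """Stand-in worker; the sleep imitates waiting on a model call."""
    await asyncio.sleep(0.01)
    return {"caption": f"caption for {inputs['image']}"}


async def transcribe_audio(inputs: Dict[str, str]) -> Dict[str, str]:
    """Second stand-in worker that is independent of the first."""
    await asyncio.sleep(0.01)
    return {"transcript": f"transcript for {inputs['audio']}"}


async def fork_join(inputs: Dict[str, str]) -> Dict[str, str]:
    # Fork: start both workers at once. Join: wait for both, then merge.
    caption, transcript = await asyncio.gather(
        describe_image(inputs),
        transcribe_audio(inputs),
    )
    return {**caption, **transcript}


result = asyncio.run(fork_join({"image": "frame.jpg", "audio": "clip.wav"}))
print(result)
```

Because the two workers wait at the same time, the total wait is close to that of the slower one rather than the sum of both.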

Examples

We provide example projects that demonstrate how to build intelligent agents with OmAgent. You can find the full list in the examples directory. Here is the suggested order:

step1_simpleVQA illustrates the creation of a simple multimodal VQA agent with OmAgent. Documentation
step2_outfit_with_switch demonstrates how to build an agent with switch-case branches using OmAgent. Documentation
step3_outfit_with_loop shows the construction of an agent incorporating loops using OmAgent. Documentation
step4_outfit_with_ltm exemplifies using OmAgent to create an agent equipped with long-term memory. Documentation
dnc_loop demonstrates the development of an agent utilizing the DnC algorithm to tackle complex problems. Documentation
video_understanding showcases the creation of a video understanding agent for interpreting video content using OmAgent. Documentation

API Documentation

The API documentation is available here.

🔗 Related works

If you are interested in multimodal large language models and agent technologies, we invite you to explore our other research:
🔆 How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection (AAAI24)
🏠 GitHub Repository

🔆 OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network (IET Computer Vision)
🏠 GitHub Repository

⭐️ Citation

If you find our repository beneficial, please cite our paper:

@article{zhang2024omagent,
  title={OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer},
  author={Zhang, Lu and Zhao, Tiancheng and Ying, Heting and Ma, Yibo and Lee, Kyusong},
  journal={arXiv preprint arXiv:2406.16620},
  year={2024}
}
