fix: workflow index error (#173)
dongyuanjushi authored Jul 16, 2024
1 parent 16152da commit f1d9685
Showing 2 changed files with 135 additions and 3 deletions.
132 changes: 132 additions & 0 deletions README.md
@@ -34,6 +34,138 @@ Please see our ongoing [documentation](https://aios.readthedocs.io/en/latest/) f
- [Installation](https://aios.readthedocs.io/en/latest/get_started/installation.html)
- [Quickstart](https://aios.readthedocs.io/en/latest/get_started/quickstart.html)

### Installation

Clone AIOS
```bash
git clone https://github.com/agiresearch/AIOS.git
```
Create a conda environment and enter the repository
```bash
conda create -n AIOS python=3.11
source activate AIOS
cd AIOS
```
If you have a GPU environment, you can install the dependencies using
```bash
pip install -r requirements-cuda.txt
```
Otherwise, you can install the dependencies using
```bash
pip install -r requirements.txt
```

### Quickstart

#### Use with OpenAI API
Get your OpenAI API key from https://platform.openai.com/api-keys, then set it as an environment variable

```bash
export OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
```
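
Optionally, you can confirm the variable is visible in your current shell, for example:

```bash
python -c "import os; print('OPENAI_API_KEY set:', bool(os.getenv('OPENAI_API_KEY')))"
```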

Then run main.py with a model provided by the OpenAI API

```bash
python main.py --llm_name gpt-3.5-turbo # use gpt-3.5-turbo for example
```

#### Use with Gemini API
Get your Gemini API key from https://ai.google.dev/gemini-api and set it as an environment variable

```bash
export GEMINI_API_KEY=<YOUR_GEMINI_API_KEY>
```

Then run main.py with a model provided by the Gemini API

```bash
python main.py --llm_name gemini-1.5-flash # use gemini-1.5-flash for example
```

If you want to use **open-source** models provided by huggingface, here we provide three options:
* Use with ollama
* Use with native huggingface models
* Use with vllm

#### Use with ollama
You need to download ollama from https://ollama.com/.

Then start the ollama server, either from the ollama app or with the following command in the terminal

```bash
ollama serve
```
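
The ollama server listens on http://localhost:11434 by default; a quick way to check that it is running:

```bash
curl http://localhost:11434
```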

To use models provided by ollama, you need to pull them from https://ollama.com/library

```bash
ollama pull llama3:8b # use llama3:8b for example
```
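
You can list the models that have been pulled locally with:

```bash
ollama list
```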

ollama supports CPU-only environments, so even if you do not have a CUDA environment, you can still run AIOS with ollama models

```bash
python main.py --llm_name ollama/llama3:8b --use_backend ollama # use ollama/llama3:8b for example
```

However, if you have a GPU environment, you can also pass GPU-related parameters to speed up inference
using the following command

```bash
python main.py --llm_name ollama/llama3:8b --use_backend ollama --max_gpu_memory '{"0": "24GB"}' --eval_device "cuda:0" --max_new_tokens 256
```
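
For reference, the `--max_gpu_memory` value is a JSON string mapping GPU ids to memory budgets. Below is a minimal, hypothetical sketch (not the repository's actual argument parsing) of how such a flag can be turned into a Python dict:

```python
# Hypothetical sketch: parse a JSON-valued --max_gpu_memory flag into a dict.
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument(
    "--max_gpu_memory",
    type=json.loads,  # e.g. '{"0": "24GB"}' -> {"0": "24GB"}
    default=None,
    help="JSON mapping of GPU id to the maximum memory to use on that GPU",
)

args = parser.parse_args(['--max_gpu_memory', '{"0": "24GB"}'])
print(args.max_gpu_memory)  # {'0': '24GB'}
```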

#### Use with native huggingface models
Some huggingface models require authentication. If you want to use all of the models, create an access token at https://huggingface.co/settings/tokens
and set it as an environment variable using the following command

```bash
export HF_AUTH_TOKENS=<YOUR_TOKEN_ID>
```

You can then run main.py with the following command

```bash
python main.py --llm_name meta-llama/Meta-Llama-3-8B-Instruct --max_gpu_memory '{"0": "24GB"}' --eval_device "cuda:0" --max_new_tokens 256
```

By default, huggingface downloads models to the `~/.cache` directory.
If you want to designate a different download directory, set it using the following command

```bash
export HF_HOME=<YOUR_HF_HOME>
```
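
Assuming the standard Hugging Face cache layout (models cached under a `hub` subfolder of that directory), you can inspect what has been downloaded with, for example:

```bash
ls "${HF_HOME:-$HOME/.cache/huggingface}/hub"
```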

#### Use with vllm
If you want to speed up the inference of huggingface models, you can use vllm as the backend.

💡Note: vllm currently only supports Linux and GPU-enabled environments. If you do not have such an environment, you need to choose one of the other options.

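Assuming PyTorch is installed via the requirements above, one quick way to check whether your environment is GPU-enabled:

```bash
python -c "import torch; print(torch.cuda.is_available())"
```
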
Because vllm itself does not support passing designated GPU ids, you need to either set the environment variable

```bash
export CUDA_VISIBLE_DEVICES="0" # replace with your designated gpu ids
```

and then run the command

```bash
python main.py --llm_name meta-llama/Meta-Llama-3-8B-Instruct --use_backend vllm --max_gpu_memory '{"0": "24GB"}' --eval_device "cuda:0" --max_new_tokens 256
```

or pass `CUDA_VISIBLE_DEVICES` as a prefix to the command

```bash
CUDA_VISIBLE_DEVICES=0 python main.py --llm_name meta-llama/Meta-Llama-3-8B-Instruct --use_backend vllm --max_gpu_memory '{"0": "24GB"}' --eval_device "cuda:0" --max_new_tokens 256
```

### Supported LLM Endpoints
- [OpenAI API](https://platform.openai.com/api-keys)
- [Gemini API](https://ai.google.dev/gemini-api)
6 changes: 3 additions & 3 deletions pyopenagi/agents/react_agent.py
```diff
@@ -148,7 +148,7 @@ def run(self):
                 message = step["message"]
                 tool_use = step["tool_use"]
 
-                prompt = f"At step {self.rounds + 1}, you need to {message}. "
+                prompt = f"At step {i + 1}, you need to {message}. "
                 self.messages.append({
                     "role": "user",
                     "content": prompt
```
```diff
@@ -174,7 +174,7 @@ def run(self):
                 self.request_turnaround_times.extend(turnaround_times)
 
                 if tool_calls:
-                    for i in range(self.plan_max_fail_times):
+                    for _ in range(self.plan_max_fail_times):
                         actions, observations, success = self.call_tools(tool_calls=tool_calls)
 
                         action_messages = "[Action]: " + ";".join(actions)
```
```diff
@@ -198,7 +198,7 @@ def run(self):
                 if i == len(workflow) - 1:
                     final_result = self.messages[-1]
 
-                self.logger.log(f"At step {self.rounds + 1}, {self.messages[-1]}\n", level="info")
+                self.logger.log(f"At step {i + 1}, {self.messages[-1]}\n", level="info")
 
                 self.rounds += 1
```
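
For context, the fix labels each prompt with the enumerate index `i` of the current workflow step instead of the ever-growing `self.rounds` counter, and renames the inner retry loop variable so it no longer shadows `i`. A minimal standalone sketch (not the repository's code, using a made-up workflow) of why the index gives the intended step numbers:

```python
# Standalone illustration of the index fix; the workflow and counter here are made up.
workflow = [
    {"message": "search for relevant papers", "tool_use": True},
    {"message": "summarize the findings", "tool_use": False},
]

rounds = 3  # a persistent counter that earlier rounds have already advanced

for i, step in enumerate(workflow):
    buggy = f"At step {rounds + 1}, you need to {step['message']}. "  # drifts with the stale counter
    fixed = f"At step {i + 1}, you need to {step['message']}. "       # matches the position in the workflow
    print(buggy)
    print(fixed)
    rounds += 1
```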
