Commit
0 parents
commit 1f54b68
Showing 12 changed files with 2,027 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
.cache | ||
.history | ||
llm_cache.sqlite | ||
coding | ||
*.pyc |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
# Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning | ||
<p align="center"> | ||
<img width="1000px" alt="Sibyl System" src="https://github.com/Ag2S1/Sibyl-System/blob/main/imgs/Sibyl.png?raw=true"> | ||
</p> | ||
<p align="center"> | ||
<a href="https://arxiv.org/abs/2407.10718">[📄arXiv]</a> | ||
<a href="https://huggingface.co/papers/2407.10718">[🤗HF Paper]</a> | ||
<a href="https://github.com/Ag2S1/Sibyl-System">[🛠️Code]</a> | ||
</p> | ||
|
||
This is an experimental project. We are attempting to design a general assistant system that evolves from System1 to System2. The name Sibyl comes from the multi-agent system composed of numerous human brains in [Psycho-Pass](https://psychopass.fandom.com/wiki/Sibyl_System). | ||
|
||
## Citation | ||
If you find our work useful, please cite our paper: | ||
``` | ||
@article{wang2024sibyl, | ||
title={Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning}, | ||
author={Yulong Wang and Tianhao Shen and Lifeng Liu and Jian Xie}, | ||
year={2024}, | ||
eprint={2407.10718}, | ||
archivePrefix={arXiv}, | ||
primaryClass={cs.AI}, | ||
url={https://arxiv.org/abs/2407.10718}, | ||
} | ||
``` | ||
|
||
|
||
## Benchmark | ||
### GAIA | ||
|Model Name|Average score (%)|Level 1 score (%)| Level 2 score (%) | Level 3 score (%)| | ||
|-|-|-|-|-| | ||
|**Sibyl System v0.2**|34.55|47.31|32.7|16.33| | ||
|Multi-Agent Experiment v0.1 (powered by AutoGen)|32.33|47.31|28.93|14.58| | ||
|FRIDAY|24.25|40.86|20.13|6.12| | ||
|GPT4 + manually selected plugins|14.6|30.3|9.7|0| | ||
|GPT4 Turbo|6.67|9.68|6.92|0| | ||
|AutoGPT4|5|15.05|0.63|0| | ||
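The average score in the Sibyl System v0.2 row appears to be the question-weighted mean of the three per-level scores. The sketch below checks that arithmetic. It assumes the GAIA test split has 93, 159, and 49 questions at Levels 1, 2, and 3; those counts are not stated in this README and are an assumption here, as are the names `LEVEL_COUNTS`, `LEVEL_SCORES`, and `weighted_average`.

```python
# Assumed number of questions per GAIA level (not given in the README).
LEVEL_COUNTS = {1: 93, 2: 159, 3: 49}

# Per-level scores (%) for Sibyl System v0.2, copied from the table above.
LEVEL_SCORES = {1: 47.31, 2: 32.7, 3: 16.33}


def weighted_average(scores, counts):
    """Average the per-level scores, weighting each level by its question count."""
    total = sum(counts.values())
    return sum(scores[level] * counts[level] for level in counts) / total


average = weighted_average(LEVEL_SCORES, LEVEL_COUNTS)
print(round(average, 2))  # 34.55 under the assumed counts
```

Under those assumed counts the result agrees with the 34.55 reported for Sibyl System v0.2. Other rows may have been scored against a different number of questions, so the same check is not guaranteed to hold for them.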
|
||
## Philosophy | ||
### From System1 to System2 | ||
|
||
Currently popular assistant systems like ChatGPT are designed to solve human decision-making problems at the minute level. Even with methods such as CoT and ReAct, they encounter significant difficulties in handling problems at the 10-minute level. Our system aims to gradually solve problems from the minute level to the hour level and even the day level. | ||
|
||
### Complexity Control | ||
|
||
Decoder-only models have a beneficial characteristic of being pure functions, which allows us to better control complexity. However, as we evolve from System1 to System2, the introduction of states inevitably causes the system's complexity to gradually spiral out of control. Some existing Multi-Agent solutions introduce too many states, making the system difficult to scale sustainably. We aim to control this Multi-Agent characteristic within parts of the system or push it to the system's edges. | ||
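One way to read the "pure function" point is that each reasoning step takes the accumulated state as an explicit input and returns a new state, rather than mutating shared objects. The sketch below is only an illustration of that idea; `AgentState` and `step` are hypothetical names and are not taken from the Sibyl code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentState:
    """Immutable snapshot of what the system knows so far (hypothetical)."""
    question: str
    steps: tuple[str, ...] = ()


def step(state: AgentState, observation: str) -> AgentState:
    """Pure transition: returns a new state and leaves the input untouched."""
    return AgentState(state.question, state.steps + (observation,))


s0 = AgentState("Who wrote the paper?")
s1 = step(s0, "Opened the arXiv abstract page.")
print(len(s0.steps), len(s1.steps))  # 0 1
```

Because `step` has no side effects, any intermediate state can be logged, cached, or replayed in isolation, which is one way to keep the stateful part of the system small.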
|
||
## Contact | ||
|
||
If you have any inquiries, please feel free to raise an issue or reach out to us via email at: [email protected], [email protected] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
You are a helpful AI assistant. | ||
|
||
I'll give you a question and a set of tools. Tell me which function you would use to solve the problem (or if you don't need any tool). | ||
|
||
# Step History | ||
{steps} | ||
|
||
# Question | ||
```text | ||
{question} | ||
``` | ||
|
||
# Tools | ||
|
||
## Browser | ||
The browser functions share one session, which means the viewport persists between calls. | ||
Every function returns the text of the current viewport after the action is performed. For long pages (longer than one viewport), you can use the page_up() and page_down() functions to scroll the viewport. | ||
Since the page has been converted from HTML to Markdown, you cannot submit information using a form, nor can you enter information in any text boxes. If you want to use a form inside the page, try using the computer_terminal below to read the HTML content. | ||
When the page is very long, content truncation may occur due to the limited display capacity of the viewport. You need to carefully consider whether an additional page_down() call is needed to ensure that you have obtained the complete information. | ||
- informational_web_search(query: str) -> str: | ||
Perform an INFORMATIONAL web search query and return the search results. | ||
- navigational_web_search(query: str) -> str: | ||
Perform a NAVIGATIONAL web search query and immediately navigate to the top result. Useful, for example, to navigate to a particular Wikipedia article or other known destination. Equivalent to Google's "I'm Feeling Lucky" button. | ||
- visit_page(url: str) -> str: | ||
Visit a webpage at a given URL and return its text. | ||
- page_up() -> str: | ||
Scroll the viewport UP one page-length in the current webpage and return the new viewport content. | ||
- page_down() -> str: | ||
Scroll the viewport DOWN one page-length in the current webpage and return the new viewport content. | ||
- download_file(url: str) -> str: | ||
Download a file at a given URL and, if possible, return its text. File types that will be returned as text: .pdf, .docx, .xlsx, .pptx, .wav, .mp3, .jpg, .jpeg, .png (you can read the text content of files with these extensions). | ||
- find_on_page_ctrl_f(search_string: str) -> str: | ||
When the page is too long to be fully displayed in one viewport, you can use this function to scroll the viewport to the first occurrence of the search string. If the viewport already displays the entire page (Showing page 1 of 1.), there is no need to use this function. This is equivalent to Ctrl+F. The search string supports wildcards like '*'. | ||
- find_next() -> str: | ||
Scroll the viewport to the next occurrence of the search string. | ||
|
||
## Computer Terminal | ||
- computer_terminal(code: str) -> str | ||
You can use this function to run Python code. Use print() to output the result. | ||
|
||
Based on the question and the step history, tell me which function you would use to solve the problem in the next step. | ||
If you don't need any function or the question is very easy to answer, function "None" is also an option. | ||
Do not change the format and precision of the results (including rounding), as a dedicated person will handle the final formatting of the results. | ||
Use JSON format to answer. | ||
{format_instructions} |
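The placeholders `{steps}`, `{question}`, and `{format_instructions}` suggest the template is filled before each call and that the model's JSON reply is parsed afterwards. A minimal sketch of that round trip follows, using plain `str.format` and `json`. The shortened `TEMPLATE`, the `"tool"`/`"args"` reply schema, and the helper names are assumptions made for illustration; the actual `{format_instructions}` text is not shown in this diff.

```python
import json

# Shortened stand-in for the prompt above; only the placeholders matter here.
TEMPLATE = (
    "# Step History\n{steps}\n\n"
    "# Question\n{question}\n\n"
    "Use JSON format to answer.\n{format_instructions}"
)

# Hypothetical schema; the real format instructions are not part of this diff.
FORMAT_INSTRUCTIONS = 'Return a JSON object with the keys "tool" and "args".'

# Function names listed in the prompt, plus the "None" option.
ALLOWED_TOOLS = {
    "informational_web_search", "navigational_web_search", "visit_page",
    "page_up", "page_down", "download_file", "find_on_page_ctrl_f",
    "find_next", "computer_terminal", "None",
}


def build_prompt(question: str, steps: list[str]) -> str:
    """Fill the three placeholders of the template."""
    return TEMPLATE.format(
        steps="\n".join(steps) if steps else "(no steps yet)",
        question=question,
        format_instructions=FORMAT_INSTRUCTIONS,
    )


def parse_tool_choice(reply: str) -> tuple[str, dict]:
    """Parse the JSON reply and reject tool names the prompt never offered."""
    data = json.loads(reply)
    tool = data.get("tool", "None")
    if tool not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    return tool, data.get("args") or {}


prompt = build_prompt("What is the capital of France?", [])
tool, args = parse_tool_choice(
    '{"tool": "visit_page", "args": {"url": "https://example.com"}}'
)
print(tool, args)
```

Validating the tool name against the list in the prompt is a cheap guard against a reply that invents a function.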
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
Format the following answer according to these rules: | ||
|
||
1. **Numbers**: | ||
* If the answer contains a relevant number, return the number without commas, units, or punctuation. | ||
* If the number represents thousands, return the number in thousands. | ||
* Perform necessary unit conversions based on the context provided in the question. For example, convert picometers to Angstroms if the question implies this. | ||
* Retain the original precision of the number unless specific rounding instructions are given. | ||
* Numbers should be written as digits (e.g., 1000000 instead of "one million"). | ||
|
||
2. **Dates**: | ||
* If the answer contains a date, return it in the same format provided. | ||
|
||
3. **Strings**: | ||
* Exclude articles and abbreviations. | ||
* Write digits in numeric form unless specified otherwise. | ||
|
||
4. **Lists**: | ||
* If the answer is a comma-separated list, return it as a comma-separated list, applying the above rules for numbers and strings. | ||
|
||
5. **Sentences**: | ||
* If the answer is a full sentence and the question expects a detailed explanation, preserve the sentence as is. | ||
* If the answer can be reduced to "Yes" or "No", do so. | ||
|
||
Important: | ||
1. Carefully interpret the question to determine the appropriate format for the answer, including any necessary unit conversions. | ||
2. Return only the final formatted answer. | ||
3. The final formatted answer should be as concise as possible, directly addressing the question without any additional explanation or restatement. | ||
4. Exclude any additional details beyond the specific information requested. | ||
5. If unable to solve the question, make a well-informed EDUCATED GUESS based on the information we have provided. Your EDUCATED GUESS should be a number OR as few words as possible OR a comma separated list of numbers and/or strings. DO NOT OUTPUT 'I don't know', 'Unable to determine', etc. | ||
|
||
Here is the question: | ||
{question} | ||
|
||
Here is the answer to format: | ||
{answer} | ||
|
||
Formatted answer: |
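The first number rule (no commas, units, or punctuation, with the original precision kept) can be illustrated with a small rule-based helper. This is only a sketch of that single rule. In the prompt above the formatting is delegated to the model, and `normalize_number` is a hypothetical name that does not come from the repository.

```python
import re

# A number: optional sign, digits with optional thousands separators,
# and an optional decimal part whose digits are kept as written.
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def normalize_number(answer: str) -> str:
    """Return the first number in the answer without commas or units."""
    match = NUMBER_RE.search(answer)
    if match is None:
        # No number present: leave the answer for the other rules.
        return answer.strip()
    return match.group(0).replace(",", "")


print(normalize_number("$1,234.50 USD"))  # 1234.50
print(normalize_number("about 42 km"))    # 42
```

The helper returns a string rather than a float so that trailing zeros such as the one in `1234.50` survive, in line with the rule about retaining precision.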
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
Your ultimate goal is to find the answer to the question below. | ||
```text | ||
{question} | ||
``` | ||
|
||
# Step History | ||
```text | ||
{steps} | ||
``` | ||
|
||
The next step is running the following code: | ||
```python | ||
{code} | ||
``` | ||
|
||
Check this code and help me improve it. | ||
|
||
Respond in JSON format: | ||
{format_instructions} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
Your ultimate goal is to find the answer to the question below. | ||
```text | ||
{question} | ||
``` | ||
|
||
# Tools | ||
|
||
## Browser | ||
The browser functions share one session, which means the viewport persists between calls. | ||
Every function returns the text of the current viewport after the action is performed. For long pages (longer than one viewport), you can use the page_up() and page_down() functions to scroll the viewport. | ||
Since the page has been converted from HTML to Markdown, you cannot submit information using a form, nor can you enter information in any text boxes. If you want to use a form inside the page, try using the computer_terminal below to read the HTML content. | ||
When the page is very long, content truncation may occur due to the limited display capacity of the viewport. You need to carefully consider whether an additional page_down() call is needed to ensure that you have obtained the complete information. | ||
- informational_web_search(query: str) -> str: | ||
Perform an INFORMATIONAL web search query and return the search results. | ||
- navigational_web_search(query: str) -> str: | ||
Perform a NAVIGATIONAL web search query and immediately navigate to the top result. Useful, for example, to navigate to a particular Wikipedia article or other known destination. Equivalent to Google's "I'm Feeling Lucky" button. | ||
- visit_page(url: str) -> str: | ||
Visit a webpage at a given URL and return its text. | ||
- page_up() -> str: | ||
Scroll the viewport UP one page-length in the current webpage and return the new viewport content. | ||
- page_down() -> str: | ||
Scroll the viewport DOWN one page-length in the current webpage and return the new viewport content. | ||
- download_file(url: str) -> str: | ||
Download a file at a given URL and, if possible, return its text. File types that will be returned as text: .pdf, .docx, .xlsx, .pptx, .wav, .mp3, .jpg, .jpeg, .png (you can read the text content of files with these extensions). | ||
- find_on_page_ctrl_f(search_string: str) -> str: | ||
When the page is too long to be fully displayed in one viewport, you can use this function to scroll the viewport to the first occurrence of the search string. If the viewport already displays the entire page (Showing page 1 of 1.), there is no need to use this function. This is equivalent to Ctrl+F. The search string supports wildcards like '*'. | ||
- find_next() -> str: | ||
Scroll the viewport to the next occurrence of the search string. | ||
|
||
## Computer Terminal | ||
- computer_terminal(code: str) -> str | ||
You can use this tool to run Python code. Use print() to output the result. | ||
|
||
# Step History | ||
```text | ||
{steps} | ||
``` | ||
|
||
# Current Step Tool Result | ||
Tool: {tool} | ||
Args: {args} | ||
``` | ||
{tool_result} | ||
``` | ||
|
||
# Instructions | ||
1. Analyze the given tool result to extract relevant information directly contributing to answering the question. | ||
2. Verify the information against the original question to ensure accuracy. | ||
3. Record new facts only if they provide unique information not already found in the step history. | ||
4. If the current tool result directly answers the question, record the answer and explain why no further steps are necessary. | ||
5. If the current tool result is insufficient, plan a follow-up step to gather more data. | ||
6. Choose the next tool and query that efficiently leads to the ultimate goal. | ||
7. Minimize unnecessary steps by focusing on direct and efficient methods to gather required information. | ||
8. Explain why you chose the next step and how it contributes to answering the question. | ||
9. Do not change the format and precision of the results, as a dedicated person will handle the final formatting. | ||
10. Your reply will be sent to the next agent for further action, so it is necessary to record all the information needed by the next agent in the plan (such as the complete URL of the link that needs to be clicked). | ||
|
||
Response Format: | ||
```text | ||
Facts: | ||
1. Address: xxxx, Title: xxxx, Viewport position: xxxx | ||
xxxxx | ||
2. Address: xxxx, Title: xxxx, Viewport position: xxxx | ||
xxxxx | ||
Explanation: | ||
xxxx | ||
Plan: | ||
xxxx | ||
``` |
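Since the reply is passed on to the next agent, the three labelled sections of the response format probably need to be separated at some point. The sketch below shows one way to do that with `re.split`. It assumes each of `Facts:`, `Explanation:`, and `Plan:` sits on its own line, as in the template; `split_sections` is a hypothetical helper and not part of the repository.

```python
import re

# Section headers from the response format, each alone on its line.
SECTION_RE = re.compile(r"^(Facts|Explanation|Plan):[ \t]*$", re.MULTILINE)


def split_sections(reply: str) -> dict[str, str]:
    """Map each section name to the text that follows its header."""
    # With one capturing group, re.split returns
    # [preamble, name1, body1, name2, body2, ...].
    parts = SECTION_RE.split(reply)
    return {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}


reply = (
    "Facts:\n"
    "1. Address: https://example.com, Title: Example, Viewport position: page 1 of 3\n"
    "The page lists three authors.\n"
    "Explanation:\n"
    "The author list is incomplete in this viewport.\n"
    "Plan:\n"
    "Call page_down() to read the rest of the page.\n"
)
sections = split_sections(reply)
print(sections["Plan"])  # Call page_down() to read the rest of the page.
```

A section missing from the reply is simply absent from the returned dictionary, so callers can detect an incomplete response with a key check.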
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
beautifulsoup4==4.12.3 | ||
datasets==2.19.1 | ||
diskcache==5.6.3 | ||
easyocr==1.7.1 | ||
joblib==1.4.2 | ||
langchain_community==0.2.0 | ||
langchain_core==0.2.0 | ||
langchain_openai==0.1.7 | ||
mammoth==1.7.1 | ||
markdownify==0.12.1 | ||
numpy==1.26.4 | ||
openai_whisper==20231117 | ||
openpyxl==3.1.2 | ||
pandas==2.2.2 | ||
pathvalidate==3.2.0 | ||
pdfminer==20191125 | ||
pdfminer.six==20231228 | ||
Pillow==10.3.0 | ||
puremagic==1.23 | ||
pyautogen==0.2.27 | ||
pydub==0.25.1 | ||
python_pptx==0.6.23 | ||
ray==2.22.0 | ||
Requests==2.32.3 | ||
rich==13.7.1 | ||
SpeechRecognition==3.10.4 | ||
youtube_transcript_api==0.6.2 |